University of Tampere

Statistics
Statistics is not just numbers in a table or graphs on paper!
Statistics is how society, industry and science manage uncertainty and make discoveries!
Costello et al. A community effort to assess and improve drug sensitivity prediction algorithms. Nature Biotechnology, 2014.
Source: Morningstar Stock Report, morningstar.fi
The spatial patterns of the four leading interannual components extracted from climate data.
A. Ilin, H. Valpola and E. Oja. Exploratory Analysis of Climate Data Using Source Separation Methods. Neural Networks, 19(2):155-167, 2006.
José Caldas, Nils Gehlenborg, Ali Faisal, Alvis Brazma, and Samuel Kaski. Probabilistic retrieval and visualization of biologically relevant microarray experiments. Bioinformatics, 25:i145–i153, 2009.
Jaakko Peltonen and Samuel Kaski. Generative Modeling for Maximizing Precision and Recall in Information Visualization. In Geoffrey Gordon, David Dunson, and Miroslav Dudik, eds., Proceedings of AISTATS 2011, the 14th International Conference on Artificial Intelligence and Statistics. JMLR W&CP, vol. 15, 2011.
STATISTICS
The roots of statistics lie in probability theory, which began with the study of games of chance.
STATISTICS
Unskilled use of measurements and statistics can lead to false and misleading conclusions.
STATISTICS
Statistics is versatile data analysis that covers managing chance and variation, extracting information from data, and modeling.
Statistics is closely connected to data mining and machine learning.
An important current direction is computational statistics, which searches data for interesting nonlinear features and solves complex models using, among other things, advanced and distributed optimization and computation.
Our teaching introduces the core theory, the most important methods of data collection and analysis, and their computer-based application.
Distributions, prediction, hypothesis testing, time series analysis, multivariate methods, data visualization, learning from multiple sources...
Oakland A's GM Billy Beane is handicapped with the lowest salary constraint in baseball. If he ever wants to win the World Series, Billy must find a competitive advantage. Billy is about to turn baseball on its ear when he uses statistical data to analyze and place value on the players he picks for the team.
"geek-stats book turned into a movie with a lot of heart"
"persuasively exposed front office tension between ... old school 'eye-balling' of players and newer models of data-driven statistical analysis"
Texts from IMDB, Wikipedia
Carl Friedrich Gauss, b. 1777
Blaise Pascal, b. 1623
Thomas Bayes, b. 1702
Pierre-Simon Laplace, b. 1749
Ronald Fisher, b. 1890
Karl Pearson, b. 1857
Statisticians: Stephen L. Portnoy, Alan Agresti, Irene Gijbels, Noel Cressie, Christian P. Robert, Harvey Goldstein, Hirotugu Akaike, Jon A. Wellner, Jerome H. Friedman, Iain M. Johnstone, Peter Hall, Hira Lal Koul, Peter Diggle, Dan-Yu Lin, Gareth O. Roberts, David Donoho, Joseph G. Ibrahim, James Berger, Donald Rubin, James Stephen Marron, Norman R. Draper, Ingram Olkin, Jianqing Fan, Bernard W. Silverman, Michael B. Woodroofe, Peter J. Rousseeuw, Ole Barndorff-Nielsen, Enno Mammen, David B. Dunson, Nancy Reid, Kanti V. Mardia, Alexandre Tsybakov, Paul Rosenbaum, Marc Hallin, Marc Yor, Raymond Carroll, Bruce Lindsay, Bradley Efron, George Box, Hans-Georg Müller, Peter J. Bickel, Erich Leo Lehmann, Alan Gelfand, Murad Taqqu, William E. Strawderman, David O. Siegmund, Wolfgang Karl Härdle, Peter Bühlmann, Ricardo Fraiman, Adrian Raftery, Andrew Gelman, John W. Tukey, Persi Diaconis, David A. Freedman, Luc Devroye, Robert Tibshirani, David Ruppert, Peter M. Robinson, Theodore W. Anderson, Leo Breiman, Holger Dette, George Casella, Richard David Gill, Trevor Hastie
Affiliations: University of Illinois Urbana-Champaign, University of Florida, Catholic University of Leuven, Ohio State University, Paris Dauphine University, University of Bristol, Institute of Statistical Mathematics, University of Washington, The MITRE Corporation, Stanford University, University of Melbourne, Michigan State University, Lancaster University, University of North Carolina Chapel Hill, University of Warwick, Duke University, Harvard University, University of Wisconsin Madison, Princeton University, University of Oxford, University of Michigan, University of Antwerp, Aarhus University, University of Mannheim, University of Toronto, University of Leeds, CREST & Université Paris VI, University of Pennsylvania, Université Libre de Bruxelles, Pierre and Marie Curie University, Texas A&M University, Pennsylvania State University, University of California Davis, University of California Berkeley, Boston University, Rutgers, the State University of New Jersey, Humboldt University of Berlin, ETH Zurich, Universidad de San Andres, Columbia University, University of Buenos Aires, McGill University, Moscow State Pedagogical University, London School of Economics and Political Science, Ruhr University Bochum, Leiden University
You
University of Tampere
ON THE WORK OF A STATISTICIAN
A statistician works in collaboration with experts from other fields.
Application areas and some specialized fields of statistics:
● Engineering and natural sciences (technometrics, chemometrics)
● Biology (biometrics, see http://www.uta.fi/hes/tutkimus/tutkimusryhmat/Biometria.html)
● Medicine (epidemiology)
● Economics (econometrics)
● Social and behavioral sciences (demometrics, psychometrics)
ON THE WORK OF A STATISTICIAN
See also http://www.uta.fi/opiskelu/selvitykset/matematiikka_tilastotiede_sijoittuminen.pdf
Elective studies can influence which sector you end up working in.
For choosing elective studies, see the study guide, p. 51, or
http://www10.uta.fi/opas/koulutus.htm?opsId=102&uiLang=fi&lang=fi&lvv=2013&koulid=19
One example of a statistician's work:
http://www.luonnontieteet.fi/tyo/tilastotiede
Examples of job duties and employers:
http://www.uta.fi/rekrytointi/opiskelijalle_ja_tyonhakijalle/uraseuranta/oppiainekoosteet/tilastotiede.html
ON THE WORK OF A STATISTICIAN: GRADUATES' VIEWS
The University of Tampere Career and Recruitment Services (http://www.uta.fi/rekrytointi) track how graduates find employment:
http://www.uta.fi/opiskelu/tyoelama/seurannat/index.html
The most recent survey covers those who completed a Master's degree in 2011:
http://www.uta.fi/opiskelu/tyoelama/seurannat/maisterit/index/sijoittumisseuranta%202011.pdf
(One year after graduation, all statistics graduates were in permanent or fixed-term employment or working as grant-funded researchers.)
Accounts by mathematics and statistics graduates of their studies and of finding employment are collected in Loogista päättelyä ja tiedon analysointia ("Logical reasoning and data analysis"), a publication of the Office of Study and International Affairs:
http://www.uta.fi/opiskelu/selvitykset/matematiikka_tilastotiede_sijoittuminen.pdf
“researcher at a government research institute”, “mathematician in the research unit of a government agency”, “quality manager of a corporate group”, “Data Mining analyst”
CBDA
International Master's Degree Programme in Computational Big Data Analytics
Make BIG SENSE out of BIG DATA
● Lawyers Are Turning to Big Data Analysis (The National Law Journal, July 2015)
● Big data for big business - analytics are no longer optional (The Globe and Mail, August 2015)
● Intel Unveils Analytics Technologies for Big Data, IoT (eWeek, August 2015)
● Put big data to work with Cortana Analytics (TechRepublic, July 2015)
● How the age of Big Data made statistics the hottest job around (Canadian Business, April 2015)
● What can big data do for small startups? (VentureBurn, August 2015)
● Why big data isn't always the answer (ComputerWorld, August 2015)
● Data Scientist: The Sexiest Job of the 21st Century (Harvard Business Review, October 2012)
● Making Sense of Our Big Data World: Statistics for the 99% (Business 2 Community, August 2015)
● 'Big data' useful but caution is still needed (Daily Record, August 2015)
● Growth in big data draws women to statistics (FWC.com, February 2015)
● How To Identify A Good/Bad Data Scientist In A Job Interview? (LinkedIn, August 2015)
● Why your kids will want to be data scientists (CNBC, June 2014)
CBDA: International Master's Degree Programme in Computational Big Data Analytics
Statistics studies in the CBDA programme:
Large datasets contain many kinds of variation. Expertise is needed to get from raw measurements to models and understanding.
Just by looking, it is hard to tell which apparent trends are “real” and which are mere coincidence. Computers can search large sets of alternatives for possible trends, but they must be told how to evaluate the quality of what they find.
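The point about telling “real” trends from coincidence can be made concrete with a small simulation. The Python sketch below is illustrative only and is not taken from the CBDA course material. Every candidate feature and the target are pure random noise, so any correlation found is coincidence. Even so, searching 1,000 candidates of 20 observations each reliably turns up one that passes a naive single-test significance threshold. The thresholds use Fisher's z approximation and a Bonferroni correction, which are one standard choice assumed here. The function names, sample sizes and seed are hypothetical.

```python
import math
import random
from statistics import NormalDist


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def best_spurious_correlation(n_obs=20, n_candidates=1000, seed=1):
    """Search many pure-noise candidates for the one that best 'explains'
    a pure-noise target; any correlation found here is coincidence."""
    rng = random.Random(seed)
    target = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
    best = 0.0
    for _ in range(n_candidates):
        candidate = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        best = max(best, abs(pearson_r(candidate, target)))
    return best


def r_threshold(n_obs, alpha):
    """Two-sided critical |r| for one test, via Fisher's z approximation:
    atanh(r) * sqrt(n - 3) is roughly standard normal when the true correlation is 0."""
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return math.tanh(z_crit / math.sqrt(n_obs - 3))


if __name__ == "__main__":
    n_obs, m, alpha = 20, 1000, 0.05
    best = best_spurious_correlation(n_obs, m)
    naive = r_threshold(n_obs, alpha)            # ignores that m candidates were searched
    bonferroni = r_threshold(n_obs, alpha / m)   # corrects for the size of the search
    print(f"best |r| among {m} noise candidates: {best:.3f}")
    print(f"naive single-test threshold:        {naive:.3f}")
    print(f"Bonferroni threshold (m={m}):     {bonferroni:.3f}")
```

Running the script prints the best spurious |r| next to both thresholds. The naive threshold (about 0.44 for 20 observations) is easily exceeded by chance when 1,000 alternatives are searched. The Bonferroni threshold (about 0.75) takes the size of the search into account.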
The statistics studies in CBDA teach:
● what kinds of statistical structures and trends one could look for
● how to measure whether they are “real”
● tools for finding them and presenting the results
Master's programme in Computational Big Data Analytics (CBDA)
General Studies in Master's Degree Programmes given in English 2015-18, 1–22 ECTS
General studies in the Master's degree programmes given in English differ depending on the student's educational background.
Please choose only one of the three options A, B or C below.
A) General studies for international students, 12–22 cr
Compulsory studies, 12 cr
● SISYY006 Orientation, 2 cr
● SISYY005 Study Skills and Personal Study Planning, 2 cr
● KKENMP3 Scientific Writing, 5 cr
● KKSU1 Finnish Elementary Course 1, 3 cr
Free-choice studies, 0–10 cr
● YKYYKV1 Finnish Society and Culture, 3–5 cr
● YKYYV07 Introduction to Science and Research, 2–5 cr

B) General studies for students with education in Finnish and a BSc degree taken outside SIS, 9–18 cr
Compulsory studies, 9–13 cr
The Swedish course is required only if no Swedish studies were taken in the Bachelor's degree.
● SISYY006 Orientation, 2 cr
● SISYY005 Study Skills and Personal Study Planning, 2 cr
● KKENMP3 Scientific Writing, 5 cr
● KKRULUK Ruotsin kielen kirjallinen ja suullinen viestintä (Written and Oral Communication in Swedish), 4 cr
Free-choice studies, 0–5 cr
● YKYYV07 Introduction to Science and Research, 2–5 cr

C) General studies for students who have taken their BSc degree at SIS, 1–11 cr
Compulsory studies, 1 cr
Basics of Information Literacy (1 cr) is not required; only Personal Study Planning (1 cr) from SISYY005.
● SISYY005 Study Skills and Personal Study Planning, 2 cr
Free-choice studies, 0–10 cr
Scientific Writing is recommended if the Master's thesis is written in English.
● KKENMP3 Scientific Writing, 5 cr
● YKYYV07 Introduction to Science and Research, 2–5 cr
Master's programme in Computational Big Data Analytics (CBDA)
Advanced Studies in Big Data Analytics, 85 cr

Compulsory Advanced Courses in Big Data Analytics, 50 cr
● MTTTS11 Master's Seminar and Thesis, 40 cr
● MTTTS12 Introduction to Bayesian Analysis 1, 5 cr
● TIETS01 Algorithms, 5 cr

Advanced Courses in Methods of Computational Data-Analytics, 15– cr
● TIETS07 Neurocomputing, 5 cr
● TIETS11 Data Mining, 5 cr
● TIETS31 Knowledge Discovery, 5–10 cr
● TIETS39 Machine Learning Algorithms, 5 cr
● TIETS33 Advanced Course in Computer Science, 1–10 cr

Advanced Courses in Methods of Statistical Data-Analytics, 20– cr
● MTTTS13 Introduction to Bayesian Analysis 2, 5 cr
● MTTTS14 Statistical Modeling 1, 5 cr
● MTTTS15 Statistical Modeling 2, 5 cr
● MTTTS16 Learning from Multiple Sources, 5 cr
● MTTTS17 Dimensionality Reduction and Visualization, 5 cr
● MTTTS18 Time Series Analysis 1, 5 cr
● MTTTS19 Advanced Regression Methods, 5 cr
● MTTTS21 Statistical Inference 2, 5 cr
● MTTS1 Other course (advanced)
Master's programme in Computational Big Data Analytics (CBDA)
Other and Optional Studies in Big Data Analytics Programme, 13–29 cr

Compulsory Introductory Studies, 5 cr
● TIETA17 Introduction to Big Data Processing, 5 cr

Complementing Studies
Complementing studies are determined based on previous education.

Optional Studies
Recommended studies in Applications of Data-Analytics:
● TIETS05 Digital Image Processing, 5 cr
● MTTTS20 Basics of Financial Data-Analysis and Risk Theory, 5 cr
● ITIS13 Information retrieval methods, 5 cr
● ITIS16 Information practices literature, 5–20 cr
● MTTA3 Internship, 2–10 cr
CBDA Courses, Fall 2015

I: Introduction to Bayesian Analysis 1
Prior and posterior distributions, Bayes estimators, posterior predictive distribution, interval estimation and hypothesis testing, single-parameter models, simple multiparameter models.

I: Introduction to Big Data Processing
Typical characteristics and common applications of big data; basics of distributed file systems, databases and computing; practical data processing skills with MapReduce / Apache Hadoop.

I-II: Learning from Multiple Sources
Data fusion, transfer learning, multitask learning, multiview learning, and learning under covariate shift.

I-II: Knowledge Discovery
Phases of the knowledge discovery process and its nature; basic data preprocessing, data mining and postprocessing tasks and methods; application in practical knowledge discovery tasks; advanced methods in knowledge discovery; data management issues.

I-IV: Information practices literature
Literature package on one of: Information practices; Information retrieval systems; Interactive information retrieval; Task-based information retrieval.

II: Time Series Analysis 1
Simple time series models, stationary time series models (ARMA), nonstationary and seasonal time series models (SARIMA), time series regression, periodogram.

(The Master's thesis and seminar run every fall and spring.)
CBDA Courses, Spring 2016

III: Introduction to Bayesian Analysis 2
Markov chains, MCMC methods, model checking and comparison, commonly used statistical models such as hierarchical and regression models, binomial and count data models.

III: Data Mining
Premises, objectives, relevance, and basic methods of data mining; properties of data and measurements, preprocessing methods, some data mining algorithms and their applications, for instance, for classification and prediction of data.

III-IV: Dimensionality Reduction and Visualization
Properties of high-dim data; Feature Selection; Linear feature extraction; Graphical excellence; Human perception; Nonlinear dimensionality reduction; Neighbor embedding methods; Graph visualization.

I-IV: Information practices literature
Literature package on one of: Information practices; Information retrieval systems; Interactive information retrieval; Task-based information retrieval.

IV: Statistical Inference 2
Roles of Modeling in Statistical Inference, Principles of Data Reduction, Estimation: Risk, Loss of estimators,... Large sample properties, Likelihood-Based Methods, likelihood-based tests and confidence regions.

IV: Machine Learning Algorithms
Basic and advanced machine learning methods for data mining, pattern recognition and other problems.
CBDA Statistics Courses

Fall 2016 (preliminary!)

I: Introduction to Bayesian Analysis 1
Prior and posterior distributions, Bayes estimators, posterior predictive distribution, interval estimation and hypothesis testing, single-parameter models, simple multiparameter models.

I-II: Learning from Multiple Sources
Data fusion, transfer learning, multitask learning, multiview learning, and learning under covariate shift.

II: Possibly "Basics of financial data analysis and risk theory, 5 cr", or another course

Spring 2017 (preliminary!)

III: Statistical Modeling 1
Multinomial and ordinal regression, nonlinear regression, parametric survival analysis, counting process models, semiparametric hazard models.

III-IV: Dimensionality Reduction and Visualization
Properties of high-dim data; Feature Selection; Linear feature extraction; Graphical excellence; Human perception; Nonlinear dimensionality reduction; Neighbor embedding methods; Graph visualization.

IV: Statistical Modeling 2
Normal mixed model and extensions, growth curve models, models for panel discrete (binary, count, categorical) observations, analysis of missing data, mixture or latent class regression, hierarchical and latent structure models.
Statistics is the management of information and uncertainty.
As long as there is uncertainty in the world, there will be a need for statistics.