Machine learning: an introduction
Josefin Rosén, Senior Analytical Expert, SAS Institute
Twitter: @rosenjosefin
#SASFORUMSE
Copyright © 2015, SAS Institute Inc. All rights reserved.
Agenda
• What is machine learning?
• When, where, and how is machine learning used?
• Example: deep learning
• Machine learning in SAS
Machine learning: what is it?
Wikipedia: Machine learning, a branch of artificial intelligence,
concerns the construction and study of systems that can learn
from data.
SAS: Machine learning is a branch of artificial intelligence that
automates the building of systems that learn from data, identify
patterns, and make decisions – with minimal human intervention.
What is what, really?
[Diagram of related fields: Statistics, Pattern Recognition, Data Science, Data Mining, Machine Learning, Databases, Information Retrieval, Computational Neuroscience, AI]
Machine learning: what is it?
"Complicated methods, but useful results"
When is machine learning used?
When the model's predictive accuracy matters more than the interpretability of the model
When traditional approaches do not fit, for example when you have:
• more variables than observations
• many correlated variables
• unstructured data
• fundamentally nonlinear or unusual phenomena
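The first case in the list, more variables than observations, can be illustrated with a small sketch. It assumes synthetic data and uses ridge regression (one of the supervised methods named later in this deck) as the regularized alternative. The sizes, the penalty `lam`, and all variable names are illustrative choices, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_vars = 20, 50                      # fewer observations than variables
X = rng.normal(size=(n_obs, n_vars))        # synthetic design matrix (assumption)
true_coef = np.zeros(n_vars)
true_coef[:3] = [2.0, -1.0, 0.5]
y = X @ true_coef + 0.1 * rng.normal(size=n_obs)

# X has at most n_obs linearly independent columns, so the 50x50 matrix X'X
# is singular and ordinary least squares has no unique solution.
rank = np.linalg.matrix_rank(X)

# Ridge regression adds lam * I to X'X, which makes the system uniquely solvable.
lam = 1.0                                   # penalty strength (hypothetical value)
ridge_coef = np.linalg.solve(X.T @ X + lam * np.eye(n_vars), X.T @ y)
print(rank, ridge_coef.shape)
```

The penalty trades a little bias for a stable, unique coefficient vector, which is why regularized methods are a common choice when variables outnumber observations.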
[Figure: panels labeled Training data, Regression, Decision tree, and Neural network]
Where is machine learning used?
Some examples:
• Recommendation applications
• Fraud detection
• Predictive maintenance
• Text analytics
• Pattern and image recognition
• The self-driving Google car
SUPERVISED LEARNING (know y)
• Regression
• LASSO regression
• Logistic regression
• Ridge regression
• Decision tree
• Gradient boosting
• Random forests
• Neural networks
• SVM
• Naïve Bayes
• Neighbors
• Gaussian processes

UNSUPERVISED LEARNING (don't know y)
• A priori rules
• Clustering
• k-means clustering
• Mean shift clustering
• Spectral clustering
• Kernel density estimation
• Nonnegative matrix factorization
• PCA
• Kernel PCA
• Sparse PCA
• Singular value decomposition
• SOM

SEMI-SUPERVISED LEARNING (sometimes know y)
• Prediction and classification*
• Clustering*
• EM
• TSVM
• Manifold regularization
• Autoencoders
• Multilayer perceptron
• Restricted Boltzmann machines

*In semi-supervised learning, supervised prediction and classification algorithms are often combined with clustering.
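The difference between "know y" and "don't know y" can be sketched on synthetic data. The example below is an illustration under stated assumptions: it uses a 1-nearest-neighbor classifier for the supervised case and k-means for the unsupervised case, with made-up two-dimensional data and hypothetical function names. It is not how any SAS procedure is implemented.

```python
import numpy as np

rng = np.random.default_rng(1)
# Two well-separated synthetic groups in 2-D (illustrative data, not from the slides).
X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
               rng.normal(5.0, 0.5, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

def nearest_neighbor_predict(X_train, y_train, X_new):
    """Supervised: uses the known labels y_train to label new points."""
    dists = np.linalg.norm(X_new[:, None, :] - X_train[None, :, :], axis=2)
    return y_train[dists.argmin(axis=1)]

def k_means(X, k=2, n_iter=10):
    """Unsupervised: groups the rows of X without ever seeing y."""
    # Farthest-point initialization: start from row 0, then repeatedly add
    # the row that is farthest from the centers chosen so far.
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Keep the old center if a cluster happens to be empty.
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels, centers

new_points = np.array([[0.2, -0.1], [4.8, 5.3]])
predicted = nearest_neighbor_predict(X, y, new_points)   # needs y
cluster_labels, centers = k_means(X, k=2)                # never sees y
```

The cluster numbers returned by `k_means` are arbitrary, so they should be compared with `y` only up to a relabeling.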
Deep learning
• Deep learning: using neural networks with more than two hidden layers
• Used successfully in pattern recognition, among other areas
• Good at extracting features from a data set
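As a rough sketch of what "more than two hidden layers" means in practice, the forward pass below runs a batch through a network with three hidden layers. The layer sizes, the ReLU activation, and the random weights are all assumptions made for illustration. Nothing here is trained, and none of it comes from the slides.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def forward(x, layers):
    """Pass a batch through each (weights, bias) pair; ReLU on hidden layers only."""
    activation = x
    for i, (W, b) in enumerate(layers):
        z = activation @ W + b
        is_output_layer = (i == len(layers) - 1)
        activation = z if is_output_layer else relu(z)
    return activation

rng = np.random.default_rng(0)
sizes = [784, 128, 64, 32, 10]      # input, three hidden layers, output (hypothetical)
layers = [(rng.normal(0.0, 0.05, size=(m, n)), np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]
n_hidden_layers = len(layers) - 1   # every layer except the output layer

batch = rng.random((5, 784))        # five stand-in input vectors
output = forward(batch, layers)
```

Each hidden layer transforms the output of the previous one, which is what lets deeper networks build up features in stages.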
MNIST training data
• 784 variables form a 28x28 digital grid
• 784-dimensional input vector X = (x1, …, x784)
• Gray scale values ranging from 0 to 255
• 60,000 training images with labels
• 10,000 test images without labels
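The mapping from a 28x28 image to a 784-dimensional input vector can be sketched as follows. The image here is random stand-in data rather than an actual MNIST digit, and scaling to the range 0 to 1 is a common preprocessing choice that is assumed here, not stated in the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for one digit image: 28x28 gray scale values between 0 and 255.
image = rng.integers(0, 256, size=(28, 28), dtype=np.uint8)

# Flatten row by row into X = (x1, ..., x784), then scale to [0, 1].
x = image.reshape(-1).astype(np.float64) / 255.0

# The original grid can be recovered from the vector.
restored = np.rint(x * 255.0).astype(np.uint8).reshape(28, 28)
```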
MNIST example
• Train a stacked denoising autoencoder
• Extract representative features from the MNIST data
• Compare with PCA, using two principal components
Stacked denoising autoencoder
[Diagram: an input layer of partially corrupted input features feeds five hidden layers of neurons, h1 to h5. The middle hidden layer, h3, provides the extractable features. The target layer holds the uncorrupted output features.]
Extractable features from hidden layer h3:

Record ID   Hidden Unit 1   Hidden Unit 2
1           0.98754         0.32453
2           0.76854         0.87345
3           0.87435         0.05464
⋮           ⋮               ⋮

[Diagram: the input layer of partially corrupted input features feeds hidden layers h1, h2, and h3. The features are extracted from h3.]

Input layer:

Record ID   Pixel 1   Pixel 2   Pixel 3   Pixel 4   Pixel 5   Pixel 6   Pixel 7   Pixel 8   Pixel 9   Pixel 10   …
1           0         0         0         0         0         5         8         11        6         3          …
2           0         0         0         0         10        20        45        46        36        24         …
3           0         25        37        32        40        64        107       200       67        46         …
⋮           ⋮         ⋮         ⋮         ⋮         ⋮         ⋮         ⋮         ⋮         ⋮         ⋮          ⋱
Feature extraction – denoising autoencoder
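A minimal sketch of the idea, assuming a single hidden layer rather than the stacked five-layer network in the deck: the input is partially corrupted, the network is trained to reproduce the uncorrupted input, and the hidden-layer activations are then used as extracted features. The masking noise, sigmoid units, learning rate, and synthetic data are all illustrative assumptions. The SAS example itself was built with SAS procedures, not with this code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_denoising_autoencoder(X, n_hidden=2, corruption=0.3,
                                lr=0.5, n_epochs=300, seed=0):
    """Single-hidden-layer denoising autoencoder, full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, size=(d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, size=(n_hidden, d))
    b2 = np.zeros(d)
    for _ in range(n_epochs):
        # Partially corrupt the input: set a random subset of entries to zero.
        mask = rng.random(X.shape) > corruption
        X_noisy = X * mask
        H = sigmoid(X_noisy @ W1 + b1)      # hidden units (extractable features)
        X_hat = sigmoid(H @ W2 + b2)        # reconstruction
        # Gradient of the squared error against the UNCORRUPTED input.
        d_out = (X_hat - X) * X_hat * (1.0 - X_hat) / n
        d_hid = (d_out @ W2.T) * H * (1.0 - H)
        W2 -= lr * (H.T @ d_out)
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * (X_noisy.T @ d_hid)
        b1 -= lr * d_hid.sum(axis=0)
    return W1, b1, W2, b2

def encode(X, W1, b1):
    """Hidden-unit values for each record, one column per hidden unit."""
    return sigmoid(X @ W1 + b1)

def reconstruction_error(X, W1, b1, W2, b2):
    X_hat = sigmoid(encode(X, W1, b1) @ W2 + b2)
    return float(np.mean((X_hat - X) ** 2))

# Synthetic stand-in data with values in (0, 1); not MNIST.
rng = np.random.default_rng(42)
latent = rng.normal(size=(200, 2))
X = sigmoid(latent @ rng.normal(size=(2, 20)) - 1.0)

untrained = train_denoising_autoencoder(X, n_epochs=0)
trained = train_denoising_autoencoder(X)
features = encode(X, trained[0], trained[1])   # shape: (records, hidden units)
```

A stacked version would repeat this step layer by layer, feeding each layer's hidden-unit values into the next one. With two hidden units, `features` has the same layout as the Record ID, Hidden Unit 1, Hidden Unit 2 table.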
Feature extraction - PCA
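For comparison, extracting two principal components can be sketched with a singular value decomposition of the centered data. The data below is random stand-in input with 784 columns, not MNIST, and the function name is hypothetical.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Project the rows of X onto the first principal components."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    scores = X_centered @ components.T
    explained = (S[:n_components] ** 2) / np.sum(S ** 2)
    return scores, components, explained

rng = np.random.default_rng(0)
X = rng.random((100, 784))          # stand-in for 100 flattened 28x28 images
scores, components, explained = pca_scores(X)
```

Each record gets two scores, which can be plotted against each other in the same way as the two hidden-unit values from the autoencoder. PCA is a linear projection, whereas the autoencoder features pass through nonlinear units.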
SAS machine learning algorithms
• Neural networks
• Decision trees
• Random forests
• Associations and sequence discovery
• Gradient boosting and bagging
• Support vector machines
• Nearest-neighbor mapping
• K-means clustering
• DBSCAN
• Self-organizing maps
• Local search optimization techniques such as genetic algorithms
• Expectation maximization
• Multivariate adaptive regression splines
• Bayesian networks
• Kernel density estimation
• Principal components analysis
• Singular value decomposition
• Gaussian mixture models
• Sequential covering rule building
• Model ensembles
• Recommendations
SAS products that use machine learning
• SAS Enterprise Miner
• SAS Text Miner
• SAS In-Memory Statistics for Hadoop
• SAS Visual Statistics
• SAS/STAT
• SAS/OR
• SAS Factory Miner
Supervised learning algorithms

Algorithm | SAS EM nodes | SAS procedures
Regression | High Performance Regression, LARS, Partial Least Squares, Regression | ADAPTIVEREG, GAM, GENMOD, GLMSELECT, HPGENSELECT, HPLOGISTIC, HPQUANTSELECT, HPREG, LOGISTIC, QUANTREG, QUANTSELECT, REG
Decision tree | Decision Tree, High Performance Tree | ARBORETUM, HPSPLIT
Random forest | High Performance Tree | HPFOREST
Gradient boosting | Gradient Boosting | ARBORETUM
Neural networks | AutoNeural, DMNeural, High Performance Neural, Neural Network | HPNEURAL, NEURAL
Support vector machine | High Performance Support Vector Machine | HPSVM
Naïve Bayes | (none listed) | HPBNET*
Neighbors | Memory Based Reasoning | DISCRIM

*PROC HPBNET can learn different network structures (naïve, TAN, PC, and MB) and automatically select the best model.
Unsupervised learning algorithms

Algorithm | SAS EM nodes | SAS procedures
A priori rules | Association, Link Analysis | (none listed)
K-means clustering | Cluster, High Performance Cluster | FASTCLUS, HPCLUS
Spectral clustering | (none listed) | Custom solution through Base SAS and the DISTANCE and PRINCOMP procedures
Kernel density estimation | (none listed) | KDE
Kernel PCA | (none listed) | Custom solution through Base SAS and the CORR, PRINCOMP, and SCORE procedures
Singular value decomposition | (none listed) | HPTMINE, IML
Self-organizing maps | SOM/Kohonen | (none listed)
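The custom spectral clustering solution is described only as a combination of a distance computation and a principal components step. The sketch below shows the general technique in Python under stated assumptions: a Gaussian affinity built from pairwise distances, the unnormalized graph Laplacian, and a split into two clusters by the sign of the second eigenvector. It illustrates the idea and is not a reproduction of the SAS solution. The data, `sigma`, and the function name are hypothetical.

```python
import numpy as np

def spectral_two_clusters(X, sigma=1.0):
    """Split the rows of X into two clusters using the graph Laplacian."""
    # Pairwise squared distances, turned into Gaussian affinities.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Unnormalized graph Laplacian: degree matrix minus affinity matrix.
    laplacian = np.diag(W.sum(axis=1)) - W
    # eigh returns eigenvalues in ascending order; the eigenvector of the
    # second-smallest eigenvalue (the Fiedler vector) separates the groups.
    eigvals, eigvecs = np.linalg.eigh(laplacian)
    fiedler = eigvecs[:, 1]
    return (fiedler > 0).astype(int)

rng = np.random.default_rng(3)
# Two synthetic groups (illustrative data only).
X = np.vstack([rng.normal(0.0, 0.5, size=(40, 2)),
               rng.normal(4.0, 0.5, size=(40, 2))])
group = np.array([0] * 40 + [1] * 40)   # used only to check the result
labels = spectral_two_clusters(X)
```

For more than two clusters, the usual approach is to take several eigenvectors and run k-means on them. The sign split is the simplest special case.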
Semi-supervised learning algorithms

Algorithm | SAS EM nodes | SAS procedures
Denoising autoencoders | (none listed) | HPNEURAL, NEURAL
Why has interest in machine learning increased?
• Big data
• Computing resources
• Powerful computers
• Cheap data storage
"Space is big. You just won't believe how vastly, hugely, mind-bogglingly big it is"
Douglas Adams in "The Hitchhiker's Guide to the Galaxy"
More reading
• White papers
  http://www.sas.com/en_us/whitepapers/machine-learning-with-sas-enterprise-miner-107521.html
  http://support.sas.com/resources/papers/proceedings14/SAS313-2014.pdf
• SAS links
  http://www.sas.com/en_us/insights/analytics/machine-learning.html
  http://www.sas.com/en_us/insights/articles/analytics/introduction-to-machine-learning-five-things-the-quants-wish-weknew.html
• SAS Data Mining Community
  https://communities.sas.com/community/support-communities/sas_data_mining_and_text_mining/
• Big Data Matters Webinar Series:
  www.sas.com/bigdatamatters
Thank you!