Machine learning – an introduction
Josefin Rosén, Senior Analytical Expert, SAS Institute
Twitter: @rosenjosefin
#SASFORUMSE
Copyright © 2015, SAS Institute Inc. All rights reserved.

Agenda
• What is machine learning?
• When, where, and how is machine learning used?
• Example – deep learning
• Machine learning in SAS

Machine learning – what is it?
Wikipedia: Machine learning, a branch of artificial intelligence, concerns the construction and study of systems that can learn from data.
SAS: Machine learning is a branch of artificial intelligence that automates the building of systems that learn from data, identify patterns, and make decisions – with minimal human intervention.

What is what, really?
[Figure: overlapping fields labelled Statistics, Pattern Recognition, Data Science, Data Mining, Machine Learning, Databases, Information Retrieval, Computational Neuroscience, AI]

Machine learning – what is it?
“Complicated methods, but useful results”

When is machine learning used?
• When the model's predictive accuracy matters more than the interpretation of the model
• When traditional approaches do not fit, for example when you have:
  • more variables than observations
  • many correlated variables
  • unstructured data
  • fundamentally nonlinear or unusual phenomena

[Figure: panels titled Training data, Regression, Decision tree, Neural network]

Where is machine learning used?
Some examples:
• Recommendation applications
• Fraud detection
• Predictive maintenance
• Text analytics
• Pattern and image recognition
• The self-driving Google car
[Figure repeated: the overlapping fields (Statistics, Pattern Recognition, Data Science, Data Mining, Machine Learning, Databases, Information Retrieval, Computational Neuroscience, AI), with the labels Machine Learning and Data Mining shown again]

SUPERVISED LEARNING (know y)
• Regression
  • LASSO regression
  • Logistic regression
  • Ridge regression
• Decision tree
  • Gradient boosting
  • Random forests
• Neural networks
• SVM
• Naïve Bayes
• Neighbors
• Gaussian processes

UNSUPERVISED LEARNING (don't know y)
• A priori rules
• Clustering
  • k-means clustering
  • Mean shift clustering
  • Spectral clustering
• Kernel density estimation
• Nonnegative matrix factorization
• PCA
  • Kernel PCA
  • Sparse PCA
• Singular value decomposition
• SOM

SEMI-SUPERVISED LEARNING (sometimes know y)
• Prediction and classification*
• Clustering*
• EM
• TSVM
• Manifold regularization
• Autoencoders
  • Multilayer perceptron
  • Restricted Boltzmann machines

*In semi-supervised learning, supervised prediction and classification algorithms are often combined with clustering.

Deep learning
• Deep learning – using neural networks with more than two hidden layers
• Used successfully in pattern recognition, among other areas
• Good at extracting features from a data set

MNIST training data
• 784 variables form a 28x28 digital grid
• 784-dimensional input vector X = (x1,…,x784)
• Grayscale values ranging from 0 to 255
• 60,000 training images with labels
• 10,000 test images without labels

MNIST example
• Train a stacked denoising autoencoder
• Extract representative features from the MNIST data
• Compare with PCA, using two principal components (PCs)

Stacked denoising autoencoder
[Figure: network layers, listed from top to bottom]
• Target layer: uncorrupted output features
• Hidden layers:
  • h5: hidden neurons
  • h4: hidden neurons
  • h3: hidden neurons (extractable features)
  • h2: hidden neurons
  • h1: hidden neurons
• Input layer: partially corrupted input features
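The data preparation implied by the MNIST and denoising autoencoder slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used in the presentation: the slides do not say how the input is corrupted, so the sketch assumes masking noise (a random fraction of the inputs set to 0), and the image is a made-up stand-in rather than a real MNIST digit. The function names `flatten`, `scale`, and `corrupt` are hypothetical.

```python
import random

IMAGE_SIDE = 28                      # MNIST images are 28x28 pixels
N_INPUTS = IMAGE_SIDE * IMAGE_SIDE   # 784 input variables


def flatten(image):
    """Turn a 28x28 grid (list of rows) into the input vector X = (x1, ..., x784)."""
    return [pixel for row in image for pixel in row]


def scale(x):
    """Map grayscale values from 0..255 to 0.0..1.0."""
    return [value / 255 for value in x]


def corrupt(x, fraction, rng):
    """Masking noise (an assumed choice): set a random `fraction` of the inputs to 0."""
    n_masked = round(fraction * len(x))
    masked = set(rng.sample(range(len(x)), n_masked))
    return [0.0 if i in masked else value for i, value in enumerate(x)]


# Hypothetical stand-in for one MNIST image: a bright 10x10 square on a black background.
image = [[255 if 9 <= r < 19 and 9 <= c < 19 else 0 for c in range(IMAGE_SIDE)]
         for r in range(IMAGE_SIDE)]

rng = random.Random(0)               # fixed seed so the sketch is repeatable
target = scale(flatten(image))       # uncorrupted output features the network should reproduce
noisy_input = corrupt(target, fraction=0.3, rng=rng)  # partially corrupted input features

print(len(target), len(noisy_input))  # 784 784
```

A denoising autoencoder is trained to map `noisy_input` back to `target`; the activations of the middle hidden layer (h3 in the diagram) are then read out as the extracted features.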
[Figure: the stacked denoising autoencoder from the input layer up to h3, with example values at both ends]

Extractable features (hidden layer h3):

Record ID   Hidden Unit 1   Hidden Unit 2
1           0.98754         0.32453
2           0.76854         0.87345
3           0.87435         0.05464
⋮           ⋮               ⋮

Hidden layers h2 and h1: hidden neurons

Partially corrupted input features (input layer):

Record ID   Pixel 1   Pixel 2   Pixel 3   Pixel 4   Pixel 5   Pixel 6   Pixel 7   Pixel 8   Pixel 9   Pixel 10   …
1           0         0         0         0         0         5         8         11        6         3          …
2           0         0         0         0         10        20        45        46        36        24         …
3           0         25        37        32        40        64        107       200       67        46         …
⋮           ⋮         ⋮         ⋮         ⋮         ⋮         ⋮         ⋮         ⋮         ⋮         ⋮          ⋱

Feature extraction – denoising autoencoder
[Figure: features extracted with the denoising autoencoder]

Feature extraction – PCA
[Figure: features extracted with PCA]

SAS machine learning algorithms
• Neural networks
• Decision trees
• Random forests
• Associations and sequence discovery
• Gradient boosting and bagging
• Support vector machines
• Nearest-neighbor mapping
• K-means clustering
• DBSCAN
• Self-organizing maps
• Local search optimization techniques such as genetic algorithms
• Expectation maximization
• Multivariate adaptive regression splines
• Bayesian networks
• Kernel density estimation
• Principal components analysis
• Singular value decomposition
• Gaussian mixture models
• Sequential covering rule building
• Model ensembles
• Recommendations

SAS products that use machine learning
• SAS Enterprise Miner
• SAS Text Miner
• SAS In-Memory Statistics for Hadoop
• SAS Visual Statistics
• SAS/STAT
• SAS/OR
• SAS Factory Miner
Supervised learning algorithms

Regression
  SAS EM nodes: High Performance Regression, LARS, Partial Least Squares, Regression
  SAS procedures: ADAPTIVEREG, GAM, GENMOD, GLMSELECT, HPGENSELECT, HPLOGISTIC, HPQUANTSELECT, HPREG, LOGISTIC, QUANTREG, QUANTSELECT, REG
Decision tree
  SAS EM nodes: Decision Tree, High Performance Tree
  SAS procedures: ARBORETUM, HPSPLIT
Random forest
  SAS EM nodes: High Performance Tree
  SAS procedures: HPFOREST
Gradient boosting
  SAS EM nodes: Gradient Boosting
  SAS procedures: ARBORETUM
Neural networks
  SAS EM nodes: AutoNeural, DMNeural, High Performance Neural, Neural Network
  SAS procedures: HPNEURAL, NEURAL
Support vector machine
  SAS EM nodes: High Performance Support Vector Machine
  SAS procedures: HPSVM
Naïve Bayes
  SAS procedures: HPBNET*
Neighbors
  SAS EM nodes: Memory Based Reasoning
  SAS procedures: DISCRIM

*PROC HPBNET can learn different network structures (naïve, TAN, PC, and MB) and automatically select the best model.

Unsupervised learning algorithms

A priori rules
  SAS EM nodes: Association, Link Analysis
K-means clustering
  SAS EM nodes: Cluster, High Performance Cluster
  SAS procedures: FASTCLUS, HPCLUS
Spectral clustering
  SAS procedures: custom solution using Base SAS and the DISTANCE and PRINCOMP procedures
Kernel density estimation
  SAS procedures: KDE
Kernel PCA
  SAS procedures: custom solution using Base SAS and the CORR, PRINCOMP, and SCORE procedures
Singular value decomposition
  SAS procedures: HPTMINE, IML
Self-organizing maps
  SAS EM nodes: SOM/Kohonen

Semi-supervised learning algorithms

Denoising autoencoders
  SAS procedures: HPNEURAL, NEURAL

Why has interest in machine learning increased?
• Big data
• Computing resources
• Powerful computers
• Cheap data storage

“Space is big. You just won't believe how vastly, hugely, mind-bogglingly big it is”
Douglas Adams in “The Hitchhiker's Guide to the Galaxy”
Further reading
• White papers
  http://www.sas.com/en_us/whitepapers/machine-learning-with-sas-enterprise-miner-107521.html
  http://support.sas.com/resources/papers/proceedings14/SAS313-2014.pdf
• SAS links
  http://www.sas.com/en_us/insights/analytics/machine-learning.html
  http://www.sas.com/en_us/insights/articles/analytics/introduction-to-machine-learning-five-things-the-quants-wish-weknew.html
• SAS Data Mining Community
  https://communities.sas.com/community/support-communities/sas_data_mining_and_text_mining/
• Big Data Matters Webinar Series: www.sas.com/bigdatamatters

Thank you!