SAS High Performance Analytics and Architecture

make connections • share ideas • be inspired
Börje Edlund
SAS Institute Nordic EIA
[email protected], Twitter @BorjeEdlund
Copyright © 2013, SAS Institute Inc. All rights reserved.
Agenda: High Performance Analytics and Architecture
• Examples of today's problem areas
• Architecture for managing data for analysis, high-performance analytics, and using the results to realize their value
• Some details on what is new in HPA, including the new HPA capabilities in the SAS analytics products
• Demonstration of a simple example
Problem areas (Finance, Telecom, Retail, Public sector, Industry)
[Figure from SAS Forum 2011: high capacity vs. low capacity; Nutek: 5 billion (2007); Naturvårdsverket (Swedish Environmental Protection Agency): 2.5 billion (2005)]
More data, faster analytics, faster value, new opportunities
Compare with a weather forecast: a new forecast every two hours, or only once a week?
The value is not that things run faster and more precisely; the value lies in what this then does for the business:
• More models are built and maintained: more individualized offers, better premium pricing.
• Analysts have time to test and build better models against all the data: a better model means better profit.
• New prices are ready every morning instead of every month, so nobody has to work with outdated prices.
• Daily inventory forecasts, or signals during the day, instead of weekly.
• Analyze all bad payers instead of a sample.
• Free up time for other work, such as analysis for strategic initiatives.
• Do things that were not possible before: a better business model and better execution.
• Change the customer offer in real time, continuously, by means of scoring.
• Analyzing new data such as social media together with existing data gives better insight into customer behavior.
• Run financial risk stress tests ad hoc, not once per night or per week: a market advantage.
• View and visualize all the data at once, regardless of size, to find relationships and understand what stands out, without writing an SQL query.
Example logical architecture
[Figure components: Source data, ETL/ELT, Event Stream Processing, Hadoop, Data Appliance, SAS HPA, SAS Visual Analytics, SAS EG, SAS DM, SAS EM, Data Gov., Analytics lifecycle management, Deploy model, SAS Decision Manager/RtDM, Business process]
Example logical architecture (callout: SAS Data Management Console)
Data management in brief
Connectors & Access Engines
Transparent access to data stored on a variety of platforms and formats (more than 60 different sources):
• Data residing in applications as well as metadata stores
• Structured, semi-structured and unstructured data
• Data ‘at rest’ and data streams
Event Stream Processing
Continuously analyzes data as it is received, for real-time decision making:
• Transaction fraud detection
• Real-time analysis of social data streams
• Personalized online offers based on navigation
SAS Data Management
SAS 9.4: A SUBSET OF THE SAS/ACCESS ENGINES
• SAS/Access to PC Files
• SAS/Access to Teradata
• SAS/Access to Oracle
• SAS/Access to SQL Server
• SAS/Access to DB2
• SAS/Access to Vertica
• SAS/Access to PostgreSQL
• SAS/Access to Sybase IQ
• SAS/Access to Greenplum
• SAS/Access to Netezza
• SAS/Access to Hadoop
• SAS/Access to Aster nCluster
Example logical architecture (callouts: SAS/Access, SAS In-Database, SAS Embedded Process, Scoring Accelerator, SAS co-located storage)
ESP INTRODUCTION
WHAT IS SAS EVENT STREAM PROCESSING?
[Figure: DATA IN (called events) → ENGINE: processes data "on the flow", at very high speed and low latency → DATA OUT (events)]
ESP INTRODUCTION
"ON THE FLOW"?
BATCH ENGINE
1. Prepare data
2. Run process
3. Get results
4. Go to step 1
STREAM ENGINE
1. Run process
2. Continuous loop:
   a. Receive data in
   b. Process data
   c. Push results out
ESP INTRODUCTION
PROCESS DATA
[Figure: DATA IN (called events) → SAS ESP → DATA OUT (events)]
SAS ESP operations:
• Filtering
• Calculations
• Aggregation
• Joins
• Procedural
• Thresholding
• Pattern detection
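The following minimal sketch chains four of these operations (filtering, calculations, aggregation, thresholding) as Python generators, so that each event passes through every step as it arrives. The event fields (`account`, `amount`), the conversion rate and the limit are all hypothetical. SAS ESP itself defines such steps as windows in a continuous query, not as Python functions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Set


def filter_events(events: Iterable[dict], min_amount: float) -> Iterator[dict]:
    """Filtering: pass on only events at or above a minimum amount."""
    for e in events:
        if e["amount"] >= min_amount:
            yield e


def calculate(events: Iterable[dict], rate: float) -> Iterator[dict]:
    """Calculation: add a derived field (here a converted amount) to each event."""
    for e in events:
        yield {**e, "converted": e["amount"] * rate}


def aggregate(events: Iterable[dict]) -> Iterator[dict]:
    """Aggregation: keep a running total per account, updated on every event."""
    totals: Dict[str, float] = defaultdict(float)
    for e in events:
        totals[e["account"]] += e["converted"]
        yield {**e, "running_total": totals[e["account"]]}


def threshold(events: Iterable[dict], limit: float) -> Iterator[dict]:
    """Thresholding: alert once per account, on the event that pushes it over the limit."""
    alerted: Set[str] = set()
    for e in events:
        if e["running_total"] > limit and e["account"] not in alerted:
            alerted.add(e["account"])
            yield {"account": e["account"], "total": e["running_total"]}


def pipeline(events: Iterable[dict]) -> Iterator[dict]:
    """Chain the steps; hypothetical parameters: min amount 10, rate 0.5, limit 150."""
    return threshold(aggregate(calculate(filter_events(events, 10.0), 0.5)), 150.0)
```

Because every stage is lazy, an alert is emitted on the very event that crosses the limit, rather than after a batch has been collected.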
ESP concept
"DATAFLOW-CENTRIC", i.e. not ETL: data in motion
[Figure: three example continuous queries. Events (DATA IN) enter SOURCE windows and flow as event streams between windows to DATA OUT (events). Example 1: SOURCE, FILTER. Example 2: SOURCE, JOIN, CALCULATE. Example 3: SOURCE, JOIN, CALCULATE, THRESHOLD.]
Design of the rule model (called a "continuous query") using components (called "windows").
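The window idea can be sketched as a small push-based graph in Python. Each window receives an event, transforms it, and pushes the result to the windows connected downstream of it. The classes `Window`, `ComputeWindow`, `JoinWindow` and `SinkWindow` are hypothetical illustrations of the concept, not SAS ESP classes. The join is deliberately simple: it keeps only the latest event per key from each side.

```python
from typing import Callable, Dict, List, Optional


class Window:
    """Base window; used directly as a SOURCE window that forwards injected events."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.subscribers: List["Window"] = []

    def connect(self, downstream: "Window") -> "Window":
        """Add an event stream from this window to a downstream window."""
        self.subscribers.append(downstream)
        return downstream

    def emit(self, event: dict) -> None:
        for sub in self.subscribers:
            sub.receive(event, source=self)

    def receive(self, event: dict, source: Optional["Window"] = None) -> None:
        self.emit(event)


class ComputeWindow(Window):
    """Applies a calculation to each event."""

    def __init__(self, name: str, fn: Callable[[dict], dict]) -> None:
        super().__init__(name)
        self.fn = fn

    def receive(self, event: dict, source: Optional[Window] = None) -> None:
        self.emit(self.fn(event))


class JoinWindow(Window):
    """Joins two upstream windows on a key, keeping the latest event per key and side."""

    def __init__(self, name: str, left: Window, right: Window, key: str) -> None:
        super().__init__(name)
        self.left, self.right, self.key = left, right, key
        self.latest: Dict[Window, Dict[object, dict]] = {left: {}, right: {}}
        left.connect(self)
        right.connect(self)

    def receive(self, event: dict, source: Optional[Window] = None) -> None:
        k = event[self.key]
        self.latest[source][k] = event
        if k in self.latest[self.left] and k in self.latest[self.right]:
            self.emit({**self.latest[self.left][k], **self.latest[self.right][k]})


class SinkWindow(Window):
    """Collects output events (stands in for DATA OUT)."""

    def __init__(self, name: str) -> None:
        super().__init__(name)
        self.received: List[dict] = []

    def receive(self, event: dict, source: Optional[Window] = None) -> None:
        self.received.append(event)


# Hypothetical wiring: two SOURCE windows -> JOIN -> CALCULATE -> output.
customers = Window("customers")
transactions = Window("transactions")
join = JoinWindow("join", customers, transactions, key="cust_id")
calc = join.connect(ComputeWindow("calc", lambda e: {**e, "weighted": e["amount"] * e["risk"]}))
out = calc.connect(SinkWindow("out"))

customers.receive({"cust_id": 1, "risk": 2})
transactions.receive({"cust_id": 1, "amount": 10})
transactions.receive({"cust_id": 2, "amount": 99})  # no customer 2 yet: nothing emitted
```

After these three events, `out.received` holds a single joined and calculated event for customer 1. The transaction for customer 2 waits in the join state until a matching customer event arrives.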
SAS Decision Manager
USE CASES
• Identify fraudulent activity
• Process new loan applications
• Personalize the online experience
• Recommend drugs & dosage
• Identify dangerous driving
Infrastructure capable of big analytics
[Figure: Reliable Analytics Infrastructure. Analytics engine. High-performance computing: grid computing, in-database, in-memory analytics. Architecture flexibility: desktop, SMP, MPP, grid. Deployment flexibility: on premise, cloud, appliance.]
Many SAS solutions use HPA technology:
SAS Enterprise Miner, SAS Visual Analytics, SAS Fraud Framework, SAS Integrated Marketing, SAS Forecasting, SAS Anti-Money Laundering, SAS HPRisk, SAS Price Optimization, SAS Data Management, SAS Credit Scoring, SAS Social Media Analytics, SAS Data Quality, SAS Scoring Accelerator, Base SAS, SAS/STAT and more.
SAS® High-Performance Analytics: SOME ANALYTICAL AREAS AND EXAMPLE USES
• Statistics: binary-target & continuous-number predictions; linear, nonlinear & mixed linear modeling
• Data Mining: complex relationships; tree-based classification; variable selection
• Text Mining: parsing large-scale text collections; entity extraction; automatic stemming & synonym detection
• Forecasting: large-scale, multiple-hierarchy problems
• Econometrics: probability of events; severity of random events
• Optimization: local search optimization; large-scale linear & mixed integer problems
SAS 9.4 (June)
SAS® HPA PROCEDURE EXAMPLES (RELEASE 12.3)
• SAS® High-Performance Statistics: HPLOGISTIC, HPREG, HPLMIXED, HPNLIN, HPSPLIT, HPGENSELECT
• SAS® High-Performance Econometrics: HPCOUNTREG, HPSEVERITY, HPQLIM
• SAS® High-Performance Optimization: HPLSO; select features in OPTMILP, OPTLP, OPTMODEL
• SAS® High-Performance Data Mining¹: HPREDUCE, HPNEURAL, HPFOREST, HP4SCORE, HPDECIDE
• SAS® High-Performance Text Mining: HPTMINE, HPTMSCORE
• SAS® High-Performance Forecasting²: HPFORECAST
• Common set: HPDS2, HPDMDB, HPSAMPLE, HPSUMMARY, HPIMPUTE, HPBIN, HPCORR
These are available at no extra cost to customers running SAS 9.4 in SMP mode (if you have SAS with SAS/STAT, SAS Enterprise Miner, SAS/ETS, SAS/OR, etc.).
Further HPA-enabled functionality is coming in Q4 (SAS 9.4M1).
HIGH-PERFORMANCE NODES IN SAS ENTERPRISE MINER
SAS 9.4 HP-enabled nodes:
• HP Explore
• HP Transform
• HP Variable Selection
• HP Impute
• HP Regression
• HP Neural Network
• HP Data Partition
• HP Forest
• HP Text Miner
• HP Decision Tree
HP PROCs on a single server (SMP mode)
libname disk base "/filesys";   /* SAS data sets on local disk */
proc hpreg data=disk.source;
   /* analytic stuff… */
run;
[Figure: a single server (operating system) running the SAS process, with disks ("/filesys") holding the SAS data sets and the temporary/utility files that support SAS]
Process steps:
• SAS starts the HPREG procedure.
• Multiple threads are launched to process the incoming data.
• As execution continues, temporary data is written out to utility files on disk.
Company Confidential - For Internal Use Only
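The multithreaded step can be sketched in Python for a simple regression y = a + b·x. The rows are split into chunks, each thread computes partial sums over its own chunk, and the partial sums are added up before the coefficients are solved. This sketches the general data-parallel technique only; it is not HPREG's actual implementation. Also note that CPython threads do not run pure-Python arithmetic in parallel because of the GIL, so the sketch shows the decomposition, not the speed-up.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

# Partial sums for one chunk: n, sum(x), sum(y), sum(x*x), sum(x*y)
Stats = Tuple[int, float, float, float, float]


def partial_stats(xs: Sequence[float], ys: Sequence[float]) -> Stats:
    """Work done by one thread on its own chunk of rows."""
    return (
        len(xs),
        sum(xs),
        sum(ys),
        sum(x * x for x in xs),
        sum(x * y for x, y in zip(xs, ys)),
    )


def threaded_fit(xs: Sequence[float], ys: Sequence[float], n_threads: int = 4) -> Tuple[float, float]:
    """Least-squares fit of y = a + b*x; needs at least two distinct x values."""
    size = -(-len(xs) // n_threads)  # ceiling division: rows per chunk
    chunks = [(xs[i:i + size], ys[i:i + size]) for i in range(0, len(xs), size)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        parts: List[Stats] = list(pool.map(lambda c: partial_stats(*c), chunks))
    # Combine: the partial sums simply add up across chunks.
    n, sx, sy, sxx, sxy = (sum(p[i] for p in parts) for i in range(5))
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b
```

The result does not depend on how the rows are chunked. This is why the work can be spread over threads here, or over machines in the distributed case.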
HP PROCs in a distributed architecture (MPP)
HADOOP HDAT: SHARED-RACK EXAMPLE
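libname a sashdat;                /* SASHDAT engine: tables stored in HDFS */
option set=gridhost="NAMENODE";   /* grid host used by the HP procedures */
proc hpreg data=a.source;
   /* analytic stuff… */
   performance nodes=all;         /* run on all nodes of the grid */
run;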
[Figure: HADOOP NAMENODE and grid nodes NODE 1, NODE 2, …, NODE N, each running a SAS process against its local data]
SAS process steps:
• SAS starts the HPREG procedure.
• Because of the GRIDHOST setting and the appropriate access engine setting, multithreaded processes are started on the grid nodes.
• As the processes start up, data is lifted into RAM from HDFS.
• Processing occurs in parallel against the in-memory data.
• Results are returned to the initiating process on the SAS server.
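The same pattern can be sketched in Python by simulating the nodes as lists: each node works on the data it holds in memory, and only small summaries travel back to the initiating process. The row layout, the round-robin distribution and the per-group mean are hypothetical choices for illustration. Real HDFS block placement and the communication between HP procedure processes are more involved.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, float]                  # hypothetical (group, value) row
Summary = Dict[str, Tuple[int, float]]   # group -> (count, sum)


def distribute(rows: Iterable[Row], n_nodes: int) -> List[List[Row]]:
    """Spread rows round-robin over nodes; each list is one node's in-memory data."""
    nodes: List[List[Row]] = [[] for _ in range(n_nodes)]
    for i, row in enumerate(rows):
        nodes[i % n_nodes].append(row)
    return nodes


def node_summary(partition: List[Row]) -> Summary:
    """Runs on a single node, against only the rows that node holds."""
    counts: Dict[str, int] = defaultdict(int)
    sums: Dict[str, float] = defaultdict(float)
    for group, value in partition:
        counts[group] += 1
        sums[group] += value
    return {g: (counts[g], sums[g]) for g in counts}


def merge(summaries: Iterable[Summary]) -> Dict[str, float]:
    """Runs in the initiating process: combine the small node results into group means."""
    counts: Dict[str, int] = defaultdict(int)
    sums: Dict[str, float] = defaultdict(float)
    for summary in summaries:
        for group, (count, subtotal) in summary.items():
            counts[group] += count
            sums[group] += subtotal
    return {g: sums[g] / counts[g] for g in counts}
```

Counts and sums are sent back instead of means because they can be added across nodes. Averaging the node means directly would give a wrong answer when nodes hold different numbers of rows per group.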
SAS® SCORING ACCELERATOR: BUSINESS VALUE
Past approach (OpSys1 → flat file extract → SQL Server → SAS customer selection):
• Daily process begins with flat file creation at 6:30am; SLA delivered at ~9:30am.
• File transferred to SQL Server, limited to ~350K customer records based on specific criteria.
• 300-step process to support the data mining life cycle.
• 30 minutes to score ~350K customers.
• Runs in ~3 hours.
In-database approach (OpSys1 → Teradata EDW with SAS Scoring Accelerator):
• Daily process begins at 4:00am with the EDW load.
• All operational data loaded directly to the EDW; no flat file or intermediate processing is needed.
• 10-step process.
• Scoring and customer selection done in-database against ALL customer rows.
• 4 minutes to score ~40M customers.
• Runs in 12 minutes.
Business value:
• Scope of customer analysis: 350K vs. 40M
• Monthly collections: $1M-$3M per month
• Approximately 13% incremental lift
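The reported figures can be turned into throughput ratios with a few lines of Python. The inputs are the approximate values from the slide. The two approaches also differ in process (300 steps versus 10), so the ratios describe the whole setup, not the scoring engine alone.

```python
# Approximate figures from the slide above.
PAST_ROWS, PAST_SCORE_MIN = 350_000, 30        # ~350K customers scored in 30 minutes
INDB_ROWS, INDB_SCORE_MIN = 40_000_000, 4      # ~40M customers scored in 4 minutes
PAST_RUN_MIN, INDB_RUN_MIN = 3 * 60, 12        # ~3 hours vs. 12 minutes end to end


def per_minute(rows: float, minutes: float) -> float:
    """Scoring throughput in customers per minute."""
    return rows / minutes


past_rate = per_minute(PAST_ROWS, PAST_SCORE_MIN)   # about 11,667 customers/minute
indb_rate = per_minute(INDB_ROWS, INDB_SCORE_MIN)   # 10,000,000 customers/minute
throughput_gain = indb_rate / past_rate             # about 857x
scope_gain = INDB_ROWS / PAST_ROWS                  # about 114x more customers
runtime_gain = PAST_RUN_MIN / INDB_RUN_MIN          # 15x shorter daily run

print(f"{past_rate:,.0f} vs {indb_rate:,.0f} customers/min "
      f"({throughput_gain:.0f}x); scope {scope_gain:.0f}x; runtime {runtime_gain:.0f}x")
```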
Demonstration of the difference between running analytics the old way and the new HPA way
Questions?