Revolution Analytics R

Revolution Analytics / Why Work with Revolution R
WanHee, Kim | ISBC Inc 2013
2013-05-10
Contents
• Who Is Revolution Analytics
• Present and Future
• 1st-Generation Analytics
• What Is R? (For Those Who Don't Know)
• 2nd-Generation Analytics
• Differences Between R and RevoR
• The New Beside/Inside Architecture
• Introducing the New Version
ISBC Korea
Who we are
Leading provider of a commercial analytics platform based on the open-source R statistical computing language
Our Software Delivers
• Power: Distributed, scalable, high-performance advanced analytics
• Productivity: Easier to build and deploy analytic applications
• Enterprise Readiness: Multi-platform
Our Services Deliver
• Knowledge: Our experts enable you to be experts
• Time-to-Value: Our Quickstart projects give you a jumpstart
• Guidance: Our customer support team is here to help you
Customers
• 200+ Global 2000
Global Presence
• North America / EMEA / APAC
Global Industries Served
• Financial Services / Retail / Telco
• Digital Media / Government
• Health & Life Sciences
• High Tech / Manufacturing
Our Philosophy
• Customer-centric innovation
• Easy to do business with
Present and Future
• 2007-2013, Chapter 1: Capture Mindshare
• 2013-2015, Chapter 2: Mobilize with Market Focus
• 2015-2017, Chapter 3: Scalable Growth
Milestones shown on the timeline: company founding (2007); HQ relocated to Palo Alto; North American offices in NYC and Dallas; ISBC-RevoR partnership; Revolution R Enterprise V1 through V6.1, V6.2 through V9, and V10 through V11; 250, 500, and 1,000 customers.
200 Corporate Customers and Growing
• Finance & Insurance
• Academic & Gov't
• Healthcare & Life Sciences
• Consumer & Info Services
• Manufacturing & Technology
Comprehensive Partner Ecosystem
• Marketing Service Providers
• Advanced Analytics
• ETL
• Data Service Providers
• SI / Services
• Deployment / Consumption
• Data / Infrastructure
Partners named on the slide include Corios.
1st-Generation Predictive Analytics
What Is R?
• Data analysis software
• A programming language
– Development platform designed by and for statisticians
• An environment
– Huge library of algorithms for data access, data manipulation, analysis and graphics
• An open-source software project
– Free, open, and active
• A community
– Thousands of contributors, 2 million users
– Resources and help in every domain
Download the white paper "R is Hot": bit.ly/r-is-hot
The R User Community
Source: The R Ecosystem (bit.ly/R-ecosystem)
R is exploding in popularity and functionality
Scholarly Activity
Google Scholar hits ('05-'09 CAGR)
• R: 46%
• SAS: -11%
• SPSS: -27%
• S-Plus: 0%
• Stata: 10%
"I've been astonished by the rate at which R has been adopted. Four years ago, everyone in my economics department [at the University of Chicago] was using Stata; now, as far as I can tell, R is the standard tool, and students learn it first."
Deputy Editor for New Products at Forbes
Package Growth
[Chart: number of R packages listed on CRAN, 2002-2010]
"A key benefit of R is that it provides near-instant availability of new and experimental methods created by its user base, without waiting for the development/release cycle of commercial software. SAS recognizes the value of R to our customer base…"
Product Marketing Manager, SAS Institute, Inc.
Why R?
• Every data analysis technique at your fingertips
• Create beautiful and unique data visualizations
• Get better results faster
• Draw on the talents of data scientists worldwide
• R is hot, and growing fast
Two Big Data problems: capacity and speed
• Capacity: problems handling the size of data sets or models
– Data too big to fit into memory
– Even if it fits, there are limits on what can be done
– Even simple data management can be extremely challenging
• Speed: even without a capacity limit, computation may be too slow to be useful
PEMAs Beat In-Memory Algorithms
• Parallel external memory algorithms (PEMAs)
– Exploit distributed and streaming data
– Deliver scalability and performance
– Split computations so that not all data has to be in memory at one time
– "Automatically" parallelize and distribute algorithms
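To make the PEMA idea concrete, here is a minimal sketch in Python. It does not use RevoScaleR's API. Ordinary least squares is computed chunk by chunk. Each chunk contributes only its small sufficient statistics (X'X and X'y), and those partial results merge by addition. That makes the computation both external-memory (only the current chunk is in RAM) and parallelizable (chunks can be processed on different cores or nodes and merged afterwards). The function names and `simulated_chunks` are hypothetical; `simulated_chunks` stands in for reading blocks of a large file from disk.

```python
import numpy as np


def chunk_stats(X, y):
    """Sufficient statistics for least squares on one chunk: (X'X, X'y, row count)."""
    return X.T @ X, X.T @ y, len(y)


def combine(a, b):
    """Merge two partial results. Addition is associative and commutative, so
    chunks can be processed on different cores or nodes and merged in any order."""
    return a[0] + b[0], a[1] + b[1], a[2] + b[2]


def accumulate(chunks):
    """Stream over (X, y) chunks; only the p x p and p x 1 totals stay in memory."""
    total = None
    for X, y in chunks:
        part = chunk_stats(X, y)
        total = part if total is None else combine(total, part)
    return total


def solve_ols(total):
    """Final step: solve the normal equations (X'X) beta = X'y."""
    xtx, xty, n = total
    return np.linalg.solve(xtx, xty), n


def simulated_chunks(n_chunks, rows, coef, seed=0):
    """Hypothetical stand-in for reading blocks of a large file from disk."""
    rng = np.random.default_rng(seed)
    for _ in range(n_chunks):
        X = np.column_stack([np.ones(rows), rng.normal(size=(rows, len(coef) - 1))])
        y = X @ coef + rng.normal(scale=0.1, size=rows)
        yield X, y


if __name__ == "__main__":
    coef = np.array([1.0, 2.0, -3.0])
    chunks = list(simulated_chunks(20, 1_000, coef))
    # Two "workers" each handle half of the chunks; their partial results are merged.
    beta, n = solve_ols(combine(accumulate(chunks[:10]), accumulate(chunks[10:])))
    print(n, beta)
```

The merged result matches an in-memory fit on the stacked data. Solving the normal equations directly is the simplest choice for a sketch. A production implementation would likely use a numerically more robust decomposition.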
2nd-Generation Predictive Analytics
• Big Data
• Machine Learning
• Quick to Fail
• Lift
Revolution Scales R to the Enterprise…
Power
• RevoR – Performance-enhanced open-source R & CRAN
• ScaleR – High-performance analytics
• PlatformR – Distributed processing
Productivity
• DesignerR – Analytic applications
• DeployR – Web services
• DevelopR – IDE
Enterprise Readiness
• ConnectR – High-speed data connectors
• Support & Qualification
• QuickstartR – 10 days to prototype
• Services & Training
Revolution R Enterprise: High-Performance, Multi-Platform Analytics Platform
• DeployR – Web Services Software Development Kit
• DevelopR – Integrated Development Environment
• ConnectR – High-Speed & Direct Connectors: Teradata, HDFS (both), HBase, SAS, SPSS, CSV, ODBC
• ScaleR – High-Performance Big Data Analytics
• DistributedR – Streaming, In-Memory Distributed Computing Framework
• RevoR – Performance-Enhanced Open-Source R + Open-Source R packages
Platforms: IBM PureData, IBM Platform LSF, HPC Server, MS Azure Burst, Windows & Red Hat Servers
Why Revolution R?
Open-Source R
RRE6 Workstation
RRE6 Server
✓
✓✓
✓✓
Exploratory data analysis
✓✓
✓✓
✓✓
Wide range of statistical methods
✓✓
✓✓
✓✓
Parallel Programming
✓
✓
✓✓
Multi-threaded performance
✘
✓
✓✓
Big Data Analytics
✘
✓
✓✓
Distributed Analytics (Grid / Cluster)
✘
Client
✓✓
Cloud Computing
✘
Client
✓✓
Hadoop Integration
✘
Client
✓✓
Multi-user support
✘
✘
✓✓
Scheduled, monitored batch production
✘
✘
✓✓
Secure code deployment, management
✘
✘
✓✓
Integration into Data Apps
✘
✘
✓✓
Interface with multiple data sources
HPA Benchmarking comparison*: Logistic Regression

                    HPA             Revolution R Enterprise
Rows of data        1 billion       1 billion
Parameters          "just a few"    7
Time                80 seconds      44 seconds
Data location       In memory       On disk
Nodes               32              5
Cores               384             20
RAM                 1,536 GB        80 GB

Revolution R is faster on the same amount of data, despite using approximately one-twentieth as many cores, one-twentieth as much RAM, and about one-sixth as many nodes, and without pre-loading data into RAM.
Revolution R Enterprise Delivers Performance at 2% of the Cost
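The resource ratios behind the "a 20th as many cores" and "a 6th as many nodes" statements can be checked directly from the benchmark figures above. This small Python sketch does that arithmetic. It also computes core-seconds (cores multiplied by wall-clock time), which is only a rough proxy for total compute consumed. Core-seconds is an added illustration, not a metric reported on the slide, and it is not the basis of the slide's cost claim.

```python
# Figures copied from the benchmark comparison above.
hpa = {"nodes": 32, "cores": 384, "ram_gb": 1536, "seconds": 80}
rre = {"nodes": 5, "cores": 20, "ram_gb": 80, "seconds": 44}


def resource_ratios(a, b):
    """How many times more of each resource configuration `a` used than `b`."""
    return {k: a[k] / b[k] for k in a}


def core_seconds(cfg):
    """Cores x wall-clock seconds: a rough proxy for total compute consumed."""
    return cfg["cores"] * cfg["seconds"]


if __name__ == "__main__":
    for k, v in resource_ratios(hpa, rre).items():
        print(f"{k}: {v:.1f}x")
    print(f"core-seconds: {core_seconds(hpa) / core_seconds(rre):.1f}x")
```

The HPA configuration uses 19.2 times the cores and RAM and 6.4 times the nodes. Those figures are consistent with the slide's "approximately a 20th" and "a 6th".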
Revolution R Enterprise Propels Enterprises into the Future
• Decision: Analytic Applications
• Integration: Middleware
• Analytics: Revolution R Enterprise, High Performance Analytics Platform
• Data: Hadoop, Data Warehouse, Other Data Sources
One Big Data Predictive Analytics Platform, Two Architectures
[Diagram: the Beside architecture (Decision / Integration / Analytics / Data layers) and the Inside architecture (Decision / Integration / Data + Analytics layers). Components shown: Rules Engine, interactive web & mobile applications, desktop applications (e.g., Excel), business intelligence, DeployR, ScaleR, DevelopR, RevoR + Distributed R, local data mart, ConnectR high-speed connectors, Hadoop, data warehouse, and other data sources.]
Architecture Ecosystem for Big Data Predictive Analytics
[Diagram: Revolution R Enterprise and the rHadoop package in the Beside architecture (Decision / Integration / Analytics / Data layers) and the Inside architecture (Decision / Integration / Data + Analytics layers).]
Leveraging R with Teradata Today
• Beside Teradata
• Model Building
– Directly access Teradata data via Revolution R Enterprise ConnectR (TPI connector)
– Build the model in Revolution R Enterprise
• Inside Teradata
• In-Place Data Distillation
– SQL
• In-Place ETL
– SQL and/or third-party products
• Scoring
– Deploy/score the model inside Teradata using PMML via Zementis, or convert it to SQL
Leveraging R with Hadoop Today
• Beside Hadoop
• Model Building
– Directly access HDFS data, or extract data from HBase or HDFS via Revolution R Enterprise ConnectR
– Build the model in Revolution R Enterprise using CRAN and/or ScaleR
• Inside Hadoop
• In-Place ETL
– What? Transformations
– How? R and/or R + CRAN
• In-Place Data Distillation
– What? Distill the data to a smaller data set to pull out of Hadoop into the Beside server for model building
– How? Write mappers & reducers in R using rmr in the rHadoop package, and/or use CRAN
• Roll-your-own parallel analytics
– What? Write R-based algorithms using MapReduce
– How? Use rmr to write mappers/reducers for parallelized algorithms
• Simulation/Experimentation
– What? Parallel execution of simulations
– How? Use rmr and/or CRAN to run simulations in parallel (map only)
• Scoring
– What? Deploy/score the model inside Hadoop
– How? Use rmr and/or CRAN to score (map only)
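The "Inside Hadoop" patterns above rely on the MapReduce model: a map phase that emits key/value pairs, a shuffle that groups values by key, and an optional reduce phase. The sketch below imitates that flow in a single Python process. It is not Hadoop and not the rmr API, only an illustration of the pattern. It shows two of the listed uses: data distillation (map + reduce, collapsing raw records into a small per-key summary) and scoring (map only). The transaction records and the linear scoring model are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def run_mapreduce(records, mapper, reducer=None):
    """Tiny single-process imitation of the MapReduce flow (not Hadoop, not rmr)."""
    mapped = chain.from_iterable(mapper(r) for r in records)  # map phase
    if reducer is None:                                        # map-only job
        return list(mapped)
    groups = defaultdict(list)                                 # shuffle: group by key
    for key, value in mapped:
        groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}  # reduce phase


# In-place data distillation: collapse raw transactions into per-customer totals.
def distill_mapper(record):
    customer, amount = record
    yield customer, amount


def distill_reducer(customer, amounts):
    return {"n": len(amounts), "total": sum(amounts)}


# Scoring: a map-only job applying an already-fitted (hypothetical) linear model.
def make_score_mapper(intercept, slope):
    def score_mapper(record):
        customer, amount = record
        yield customer, intercept + slope * amount
    return score_mapper


if __name__ == "__main__":
    records = [("a", 10.0), ("b", 5.0), ("a", 2.5)]
    print(run_mapreduce(records, distill_mapper, distill_reducer))
    print(run_mapreduce(records, make_score_mapper(1.0, 0.5)))
```

Scoring needs no reduce step because each record is scored independently. That independence is why the slide marks both scoring and simulation as "map only".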
RHadoop Connectors
[Diagram: Revolution R Enterprise connects to Hive (via RODBC), HBase (via rhbase), and HDFS (via rhdfs, or ScaleR directly to HDFS); MapReduce is also shown.]
Diverging data paradigms
DeployR makes R accessible
[Diagram: DeployR links the R / statistical modeling expert and the deployment expert, delivering R to data analysis, business intelligence, mobile web apps, and cloud / SaaS.]
• Seamless: Bring the power of R to any web-enabled application
• Simple: Leverage common APIs, including JavaScript, Java, and .NET
• Scalable: Robustly scale user and compute workloads
• Secure: Manage enterprise security with LDAP & SSO
Revolution Analytics Professional Services
• Training
• Consulting
• Quick Start Programs
Highlights
– Comprehensive topics
– Self-paced & classroom
– Remote & on-site
– Customizable
– Projects & staff augmentation
– Entire project lifecycle support
– Bundle them together
Get an Early Win with Quick Start
• On-Call Technical Support
• Consulting
– Migration | Analytics | Applications | Validation
• Training
– R | Revolution R | Statistical Topics
• Systems Integration
– BI | ERP | Databases | Cloud
Why customers choose Revolution R Enterprise
• Innovation
• Multi-Platform
• Time-to-Value
• Value
Announcement: Revolution v6.2, March 2013
• High-speed Teradata data connectivity
– Teradata is the first database for which Revolution R Enterprise has a dedicated parallel connection. Customers can seamlessly extract data from a Teradata database using the Teradata Parallel Transporter and write it to a high-performance XDF format file, or simply analyze the data directly. The increased speed with which Revolution R Enterprise users can move the data saves a significant amount of time when working with a large dataset.
• Stepwise regression for 'Big Data' linear models
– This feature allows users to automate the process of building a model by using a rigorous method to test and select from among a range of variables that are available for use in the model. The result is a dramatic reduction in the total time needed to fit a model.
• High-speed random number generation
– This new functionality provides an R interface to the parallel random number generators supplied with the Intel MKL libraries. These allow high-quality parallel random numbers to be used in distributed computations.
• RevoDeployR web deployment framework update
– New APIs for script management and new priority scheduling features improve the management and operation of deployed R routines. Updated Java, JavaScript and .NET client libraries provide additional support for application developers, making it easier to integrate on-demand R-based computations with desktop, web-based and mobile apps.
Version update: starting with v6.2, Revolution R Enterprise is built on the R 2.15.3 engine.
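The v6.2 announcement says stepwise regression automates testing and selecting among candidate variables, but it does not state the selection criterion. The Python sketch below assumes one common choice: greedy forward selection scored by AIC. It starts from an intercept-only model and repeatedly adds the variable that lowers AIC the most, stopping when no candidate improves it. This is an illustration of the general technique, not Revolution's implementation. The function names are hypothetical.

```python
import numpy as np


def aic(X, y):
    """Gaussian AIC (up to an additive constant) for an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, k = X.shape
    return n * np.log(resid @ resid / n) + 2 * k


def forward_stepwise(X, y, names):
    """Greedy forward selection: repeatedly add the candidate column that most
    lowers AIC; stop when no remaining candidate improves it."""
    n = len(y)
    current = np.ones((n, 1))          # start from the intercept-only model
    best = aic(current, y)
    remaining = list(range(X.shape[1]))
    selected = []
    while remaining:
        trials = [(aic(np.column_stack([current, X[:, j]]), y), j) for j in remaining]
        score, j = min(trials)
        if score >= best:
            break
        best = score
        current = np.column_stack([current, X[:, j]])
        selected.append(names[j])
        remaining.remove(j)
    return selected, best


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(500, 5))
    y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(scale=0.5, size=500)
    print(forward_stepwise(X, y, ["x0", "x1", "x2", "x3", "x4"]))
```

On Big Data, each candidate fit could itself be computed from chunked X'X and X'y totals rather than with an in-memory `lstsq`.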
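The high-speed random number generation feature in v6.2 targets distributed computations. In that setting, each worker needs its own stream of random numbers that does not repeat or correlate with the others. Intel MKL's generators are not used here. This Python sketch swaps in an analogous technique: NumPy's `SeedSequence.spawn`, which derives independent child seeds from one root seed. The sketch uses those streams for a parallel Monte Carlo estimate of pi. The worker count, draw count, and seed are arbitrary choices.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def count_inside(rng, draws):
    """Monte Carlo worker: count random points in the unit square that fall
    inside the quarter circle x^2 + y^2 <= 1."""
    xy = rng.random((draws, 2))
    return int(np.count_nonzero((xy ** 2).sum(axis=1) <= 1.0))


def parallel_pi(n_workers=4, draws_per_worker=250_000, seed=2013):
    # One root seed is spawned into child streams designed to be statistically
    # independent, so each worker gets its own reproducible sequence.
    children = np.random.SeedSequence(seed).spawn(n_workers)
    rngs = [np.random.default_rng(child) for child in children]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        hits = list(pool.map(count_inside, rngs, [draws_per_worker] * n_workers))
    return 4.0 * sum(hits) / (n_workers * draws_per_worker)


if __name__ == "__main__":
    print(parallel_pi())
```

Because each worker's stream comes from a fixed root seed, the whole parallel run is reproducible regardless of how the threads are scheduled.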
Announcement: Revolution v6.0, June 2012
• Integrated with the IBM LSF scheduler.
– The RxLsfCluster command has been added to RevoScale, enabling analyses to be scheduled from Revolution R.
• Microsoft Azure for HPC Cluster
– The RxAzureBurst command has been added to RevoScale, integrating Microsoft's resource-distribution technology.
• New models for Big Data.
– With the addition of Poisson regression, Gamma regression, Tweedie models, and more, the product can be used across industries such as insurance, finance, and life sciences.
• Direct analysis of external data
– Data from SAS, SPSS, and similar software can be analyzed directly, without a license for that software.
– Making a local copy in XDF format is no longer required.
• Version update: starting with v6.0, Revolution R Enterprise is built on the R 2.14.2 engine.
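Poisson regression, one of the models added in v6.0, is a generalized linear model (GLM) for count data. It uses a log link, so that log E[y] = X·beta. GLMs like this are commonly fitted by iteratively reweighted least squares (IRLS). The Python sketch below shows that algorithm for the Poisson case only; Gamma and Tweedie models use different variance and link functions. It is a minimal in-memory illustration, not RevoScaleR's implementation. Each IRLS step is a weighted least-squares solve, so it can be accumulated chunk by chunk in the same way as an ordinary regression.

```python
import numpy as np


def poisson_irls(X, y, max_iter=50, tol=1e-10):
    """Fit a Poisson regression (log link) by iteratively reweighted least squares."""
    mu = y + 0.1                    # starting fitted means, as R's glm() does for Poisson
    eta = np.log(mu)                # linear predictor on the log scale
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        z = eta + (y - mu) / mu     # working response
        w = mu                      # working weights for the canonical log link
        XtW = X.T * w               # X'W without forming the n x n diagonal matrix
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        eta = X @ beta_new
        mu = np.exp(eta)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta


if __name__ == "__main__":
    rng = np.random.default_rng(7)
    X = np.column_stack([np.ones(2000), rng.normal(size=2000)])
    y = rng.poisson(np.exp(X @ np.array([0.5, 0.8])))
    print(poisson_irls(X, y))
```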
Partner Certification
The leading commercial provider of software and support for the popular open-source R statistics language.