Revolution Analytics / Why Work with Revolution R
WanHee, Kim | [email protected] | ISBC Inc 2013
2013-05-10

Contents
• Who is Revolution Analytics?
• Present and future
• 1st-generation analytics
• What is R? (For Those Who Don’t Know)
• 2nd-generation analytics
• Differences between R and RevoR
• The new Beside/Inside architecture
• Introduction to the new versions

ISBC Korea

Who we are
Leading provider of a commercial analytics platform based on the open source R statistical computing language

Our Software Delivers
• Power: Distributed, scalable, high performance advanced analytics
• Productivity: Easier to build and deploy analytic applications
• Enterprise Readiness: Multi-platform

Our Services Deliver
• Knowledge: Our experts enable you to be experts
• Time-to-Value: Our Quickstart projects give you a jumpstart
• Guidance: Our customer support team is here to help you

Our Philosophy
• Customer-centric innovation
• Easy to do business with

Customers: 200+ Global 2000
Global Presence: North America / EMEA / APAC
Global Industries Served: Financial Services / Retail / Telco / Digital Media / Government / Health & Life Sciences / High Tech / Manufacturing

Present and Future
[Timeline: 2007, 2013, 2015, 2017]
• Chapters: Chapter 1 Capture Mindshare; Chapter 2 Mobilize with Market Focus; Chapter 3 Scalable Growth
• Product: Revolution R Enterprise V1 through V6.1; V6.2 through V9; V10 through V11
• Milestones: Company Founding; Relocate HQ to Palo Alto; NA Offices (NYC, Dallas); ISBC RevoR Partnership
• Customers: 250; 500; 1000

200 Corporate Customers and Growing
• Finance & Insurance
• Academic & Gov’t
• Healthcare & Life Sciences
• Consumer & Info Svcs
• Manuf & Tech

Comprehensive Partner Ecosystem
• Marketing Service Providers
• Advanced Analytics (Corios)
• ETL
• Data Service Providers
• SI / Services
• Deployment / Consumption
• Data / Infrastructure

1st-Generation Predictive Analytics

What Is R?
• Data analysis software
• A programming language
  – Development platform designed by and for statisticians
• An environment
  – Huge library of algorithms for data access, data manipulation, analysis and graphics
• An open-source software project
  – Free, open, and active
• A community
  – Thousands of contributors, 2 million users
  – Resources and help in every domain
Download the White Paper: R is Hot (bit.ly/r-is-hot)

The R User Community
From: The R Ecosystem (bit.ly/R-ecosystem)

R is exploding in popularity and functionality

Scholarly Activity: Google Scholar hits (’05-’09 CAGR)
  R       46%
  SAS     -11%
  SPSS    -27%
  S-Plus  0%
  Stata   10%

“I’ve been astonished by the rate at which R has been adopted. Four years ago, everyone in my economics department [at the University of Chicago] was using Stata; now, as far as I can tell, R is the standard tool, and students learn it first.”
Deputy Editor for New Products at Forbes

Package Growth: Number of R packages listed on CRAN
[Chart: x-axis 2002, 2004, 2006, 2008, 2010]

“A key benefit of R is that it provides near-instant availability of new and experimental methods created by its user base, without waiting for the development/release cycle of commercial software. SAS recognizes the value of R to our customer base…”
Product Marketing Manager, SAS Institute, Inc

Why R?
• Every data analysis technique at your fingertips
• Create beautiful and unique data visualizations
• Get better results faster
• Draw on the talents of data scientists worldwide
• R is hot, and growing fast

Two Big Data problems: capacity and speed
• Capacity: problems handling the size of data sets or models
  – Data too big to fit into memory
  – Even if it can fit, there are limits on what can be done
  – Even simple data management can be extremely challenging
• Speed: even without a capacity limit, computation may be too slow to be useful

PEMAs Beat In-Memory Algorithms
Parallel external memory algorithms (PEMAs)
• Exploit distributed and streaming data
• Deliver scalability and performance
• Split computations so not all data has to be in memory at one time
• “Automatically” parallelize and distribute algorithms

2nd-Generation Predictive Analytics
• Big Data
• Machine Learning
• Quick to Fail
• Lift

Revolution Scales R to the Enterprise…
• Power
  – RevoR: Performance enhanced open source R & CRAN
  – ScaleR: High performance analytics
  – PlatformR: Distributed processing
• Productivity
  – DesignerR: Analytic applications
  – DeployR: Web services
  – DevelopR: IDE
• Enterprise Readiness
  – ConnectR: High speed data connectors
  – Support & Qualification
  – QuickstartR: 10 days to prototype
  – Services & Training

RevoR Enterprise - High Performance, Multi-Platform Analytics Platform
Revolution R Enterprise
• DeployR: Web Services Software Development Kit
• DevelopR: Integrated Development Environment
• ConnectR: High Speed & Direct Connectors
  – Teradata, HDFS (both), Hbase, SAS, SPSS, CSV, ODBC
• ScaleR: High Performance Big Data Analytics
• DistributedR: Streaming, In-Memory Distributed Computing Framework
  – IBM PureData, IBM Platform LSF, HPC Server, MS Azure Burst, Windows & Red Hat Servers
• RevoR: Performance Enhanced Open Source R + Open Source R packages

Why Revolution R?
Feature                                   Open-Source R   RRE6 Workstation   RRE6 Server
Exploratory data analysis                 ✓✓              ✓✓                 ✓✓
Wide range of statistical methods         ✓✓              ✓✓                 ✓✓
Parallel Programming                      ✓               ✓                  ✓✓
Multi-threaded performance                ✘               ✓                  ✓✓
Big Data Analytics                        ✘               ✓                  ✓✓
Distributed Analytics (Grid / Cluster)    ✘               Client             ✓✓
Cloud Computing                           ✘               Client             ✓✓
Hadoop Integration                        ✘               Client             ✓✓
Multi-user support                        ✘               ✘                  ✓✓
Scheduled, monitored batch production     ✘               ✘                  ✓✓
Secure code deployment, management        ✘               ✘                  ✓✓
Integration into Data Apps                ✘               ✘                  ✓✓
Interface with multiple data sources      ✓               ✓✓                 ✓✓

HPA Benchmarking comparison*
Logistic Regression   Comparison system   Revolution R Enterprise
Rows of data          1 billion           1 billion
Parameters            “just a few”        7
Time                  80 seconds          44 seconds
Data location         In memory           On disk
Nodes                 32                  5
Cores                 384                 20
RAM                   1,536 GB            80 GB

Revolution R is faster on the same amount of data, despite using approximately a 20th as many cores, a 20th as much RAM, a 6th as many nodes, and not pre-loading data into RAM.
Revolution R Enterprise Delivers Performance at 2% of the Cost

Revolution R Enterprise Propels Enterprises into the Future
[Diagram: layers Decision, Integration, Analytics, Data; components Analytic Applications, Middleware, Revolution R Enterprise High Performance Analytics Platform, Hadoop, Data Warehouse, Other Data Sources]

One Big Data Predictive Analytics Platform, Two Architectures
• Beside Architecture
• Inside Architecture
[Diagram: each architecture shows Decision, Integration, Analytics (or Data + Analytics) and Data layers. Components: Rules Engine, Interactive Web & Mobile Applications, Business Intelligence, Desktop Applications (i.e. Excel), DeployR, DevelopR, ScaleR, RevoR + Distributed R, ConnectR High Speed Connectors, Local Data Mart, Hadoop, Data Warehouse, Other Data Sources]

Architecture Ecosystem for Big Data Predictive Analytics
[Diagram: Beside Architecture and Inside Architecture, each with Decision, Integration, Analytics (or Data + Analytics) and Data layers; components are Revolution R Enterprise and the rHadoop package]

Leveraging R with Teradata Today
• Beside Teradata
  • Model Building
    – Directly access Teradata data via Revolution R Enterprise ConnectR (TPT connector)
    – Build model in Revolution R Enterprise
• Inside Teradata
  • In-Place Data Distillation
    – SQL
  • In-Place ETL
    – SQL and/or third party products
  • Scoring
    – Deploy/score model inside Teradata using PMML via Zementis or convert to SQL

Leveraging R with Hadoop Today
• Beside Hadoop
  • Model Building
    – Directly access HDFS data OR extract data from Hbase or HDFS via Revolution R Enterprise ConnectR
    – Build model in Revolution R Enterprise using CRAN and/or ScaleR
• Inside Hadoop
  • In-Place ETL
    – What? Transformations
    – How? R and/or R + CRAN
  • In-Place Data Distillation
    – What? Distill data to a smaller data set to pull out of Hadoop into the Beside server for model building
    – How? Write mappers & reducers in R using rmr in the rHadoop package and/or use CRAN
  • Roll-your-own parallel analytics
    – What? Write R-based algorithms using MR
    – How? Use rmr to write mappers/reducers for parallelized algorithms
  • Simulation/Experimentation
    – What? Parallel execution of simulations
    – How? Use rmr and/or CRAN to thread parallel execution of simulations (map only)
  • Scoring
    – What? Deploy/score model inside Hadoop
    – How? Use rmr and/or CRAN to score (map only)

RHadoop Connectors
[Diagram: Revolution R Enterprise connects through RODBC, rhbase, and rhdfs or ScaleR to HDFS; Hadoop side shows HIVE, MapReduce, HBASE, HDFS]

Diverging data paradigms

DeployR makes R accessible
[Diagram: Data Analysis; R / Statistical Modeling Expert; DeployR; Deployment Expert; Business Intelligence; Mobile Web Apps; Cloud / SaaS]
• Seamless: Bring the power of R to any web enabled application
• Simple: Leverage common APIs including JS, Java, .NET
• Scalable: Robustly scale user and compute workloads
• Secure: Manage enterprise security with LDAP & SSO

Revolution Analytics Professional Services
• Training: Comprehensive Topics; Self Paced & Classroom; Remote & On site; Customizable
• Consulting: Projects & Staff Aug; Entire project lifecycle support
• Quick Start Programs: Bundle them together

Get an Early Win with Quick Start

• On-Call Technical Support
• Consulting
  – Migration | Analytics | Applications | Validation
• Training
  – R | Revolution R | Statistical Topics
• Systems Integration
  – BI | ERP | Databases | Cloud

Why customers choose Revolution R Enterprise
• INNOVATION
• MULTI-PLATFORM
• TIME-to-VALUE
• VALUE

Announcement: Revolution v6.2, March 2013
• High-speed Teradata data connection.
  – Teradata is the first database for which Revolution R Enterprise has a dedicated parallel connection. Customers can seamlessly extract data from a Teradata database using the Teradata Parallel Transporter and write it to a high performance XDF format file, or simply analyze the data directly. The increased speed with which Revolution R Enterprise users can move the data saves a significant amount of time when working with a large dataset.
• Stepwise regression for ‘Big Data’ linear models.
  – This feature allows users to automate the process of building a model by using a rigorous method to test and select from among a range of variables that are available for use in the model. The result is a dramatic reduction in the total time needed to fit a model.
• High-speed random number generation.
  – This new functionality provides an R interface to the parallel random number generators supplied with the Intel MKL libraries. These allow high quality parallel random numbers to be used in distributed computations.
• RevoDeployR Web Deployment framework update.
  – New APIs for script management and new priority scheduling features improve the management and operation of deployed R routines. Updated Java, JavaScript and .NET client libraries provide additional support for application developers, making it easier to integrate on-demand R-based computations with desktop, web-based and mobile apps.
• Version update: from 6.2 onward, the product is built on the R 2.15.3 engine.

Announcement: Revolution v6.0, June 2012
• Integration with the IBM LSF Scheduler.
  – The RxLsfCluster command was added to RevoScale, so analysis jobs can be scheduled from within Revolution R.
• Microsoft Azure for HPC Cluster
  – The RxAzureBurst command was added to RevoScale, integrating Microsoft-based resource allocation technology.
• New model support for Big Data.
  – Poisson regression, gamma regression, and Tweedie models were added, making the product usable across industries such as insurance, finance, and biotechnology.
• Direct analysis of external data
  – Data from software such as SAS and SPSS can be analyzed directly, without a license for that software.
  – There is no need to create a local copy in XDF format.
• Version update: from 6.0 onward, the product is built on the R 2.14.2 engine.

Partner Certification

The leading commercial provider of software and support for the popular open source R statistics language.
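
Backup sketch: the external memory idea
The PEMA slide says computations are split "so not all data has to be in memory at one time". A minimal sketch of that idea follows, written in Python rather than Revolution R and not taken from ScaleR. It assumes a one-predictor linear fit, and every name in it (chunks, chunk_stats, combine, fit_line) is hypothetical. Each block of rows is reduced to a few sums, the sums are added together, and the fit is computed from the totals, so only one block is held in memory at a time.

```python
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[float, float]                          # one observation: (x, y)
Stats = Tuple[int, float, float, float, float]     # n, sum x, sum y, sum x*x, sum x*y


def chunks(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Yield blocks of at most `size` rows, as if streamed from disk."""
    block: List[Row] = []
    for row in rows:
        block.append(row)
        if len(block) == size:
            yield block
            block = []
    if block:
        yield block


def chunk_stats(block: List[Row]) -> Stats:
    """Reduce one block to its sufficient statistics."""
    n = len(block)
    sx = sum(x for x, _ in block)
    sy = sum(y for _, y in block)
    sxx = sum(x * x for x, _ in block)
    sxy = sum(x * y for x, y in block)
    return n, sx, sy, sxx, sxy


def combine(a: Stats, b: Stats) -> Stats:
    """Merge two partial results; order does not matter, so blocks
    could also be processed on different cores or nodes."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3], a[4] + b[4])


def fit_line(rows: Iterable[Row], size: int = 1000) -> Tuple[float, float]:
    """Least-squares slope and intercept computed one block at a time."""
    total: Stats = (0, 0.0, 0.0, 0.0, 0.0)
    for block in chunks(rows, size):
        total = combine(total, chunk_stats(block))
    n, sx, sy, sxx, sxy = total
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


if __name__ == "__main__":
    # Rows come from a generator, so the full data set is never materialized.
    data = ((float(i), 2.0 * i + 1.0) for i in range(10))
    print(fit_line(data, size=3))
```

Because combine is a plain element-wise sum, the per-block step is the part that could be distributed, which is the sense in which such algorithms parallelize "automatically".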