Big Data – New Reference Architectures for Information Management
Luis Campos
Big Data Solutions Lead, Oracle EMEA
@luigicampos
Copyright © 2013, Oracle and/or its affiliates. All rights reserved.
AGENDA
- The New Information
- From 3-Tier to N-Tier Architecture
- What about High Performance Computing?
- The New Reference Architectures
- New Technologies and the role of Oracle Corp.
- Challenges of the main industries.
Where’s the New Information?
• New Sources from Any Data
• New Analytics on All Data
• New Integrations of Data
• New Orchestrations, Any Computing model
What does “New Data” really mean?
Any Data, Any Source
Absorb All Dimensions of Data = 360º
What does “All Data” really mean?
Any Data, Any Source
Stop Throwing Data Away = Know More About What’s Going On in your Business
What does “Any Data” really mean?
Any Data, Any Source
Tap Any Data = New Revenue Streams
From 3-Tier to N-Tier Architecture
Presentation Tier, Logic Tier, Data Tier
• Created partly to split the Presentation and Logic Layers
• Pushing Logic away from the Data created new challenges
• Data would need to be moved around in massive amounts, using a plethora of protocols and caching layers
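The cost of separating logic from data can be illustrated with a small sketch. It is not taken from the slides: it assumes an in-memory SQLite table (hypothetical `sales` schema) standing in for the Data Tier, and compares aggregating in application code against pushing the aggregation down to where the data lives.

```python
import sqlite3


def build_demo_db(n_rows=10_000):
    """Create an in-memory table standing in for the Data Tier (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    rows = [("north" if i % 2 == 0 else "south", i % 100) for i in range(n_rows)]
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()
    return conn


def total_in_logic_tier(conn):
    """Pull every row out of the data tier, then aggregate in application code."""
    rows = conn.execute("SELECT region, amount FROM sales").fetchall()
    totals = {}
    for region, amount in rows:
        totals[region] = totals.get(region, 0) + amount
    return totals, len(rows)  # second value: rows moved between tiers


def total_in_data_tier(conn):
    """Push the aggregation down to the data tier; only the summary moves."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"
    ).fetchall()
    return dict(rows), len(rows)
```

Both functions return the same totals, but the first moves every row across the tier boundary while the second moves one row per region.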
What about High Performance Computing?
Does everyone have the need for Supercomputing?
• HPC: solving extraordinary real-life problems with extraordinary computing power
• Vertical Computing: Supercomputers
• Distributed Computing: Massive Computer Clusters
The New Reference Architectures
Emerging Challenges Call for New Solution Mix
• Low Latency Systems
• Pattern Recognition
• Data Science as a Service
Low Latency Systems
Real Time processing for the masses
• Mobile Computing
• Critical element in User Experience
• Element of responsiveness in any user interface
Users don’t need this message anymore: “Your request is being processed...”
Pattern Recognition
Predictive systems
• Act on patterns rather than react
• Elements:
  • Agents (Sensors)
  • Event Processing engine
  • Rules Engine
  • Action Broadcast system
• Self Learning mixed with Supervised Learning
Input: Lots of Low Density Data
Output: Immediate Actions inside a context
Examples: Guided navigation, While-you-browse recommendations, manufacturing lines, retail in-store promos
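The four elements listed on this slide can be wired together as a minimal sketch. Everything here is an assumption made for illustration: the class names, the sliding-window mean used as "event processing", and the fixed threshold rule. The learning component mentioned on the slide is not modelled.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Event:
    """A single low-density reading emitted by an agent (sensor)."""
    sensor_id: str
    value: float


class EventProcessor:
    """Event processing engine: keeps a sliding window per sensor and emits its mean."""

    def __init__(self, window=3):
        self.window = window
        self.buffers = {}

    def process(self, event):
        buf = self.buffers.setdefault(event.sensor_id, deque(maxlen=self.window))
        buf.append(event.value)
        return event.sensor_id, sum(buf) / len(buf)


class RulesEngine:
    """Rules engine: maps a windowed reading to zero or more named actions."""

    def __init__(self, rules):
        self.rules = rules  # list of (predicate, action_name) pairs

    def evaluate(self, sensor_id, mean_value):
        return [action for predicate, action in self.rules
                if predicate(sensor_id, mean_value)]


class ActionBroadcaster:
    """Action broadcast system: fans each action out to all subscribers."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def broadcast(self, sensor_id, action):
        for callback in self.subscribers:
            callback(sensor_id, action)


def run_pipeline(events, processor, rules, broadcaster):
    """Feed events through processing, rules, and broadcast, in that order."""
    for event in events:
        sensor_id, mean_value = processor.process(event)
        for action in rules.evaluate(sensor_id, mean_value):
            broadcaster.broadcast(sensor_id, action)
```

A manufacturing-line use might subscribe a handler, register a rule such as "windowed mean above 80 triggers `slow_down_line`" (a hypothetical action name), and pass the sensor stream to `run_pipeline`.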
Data Science as a Service
Lend the power of science and technology to everyday problems
• Incorporate non-deterministic data
When you can’t ask questions outside the function
• Generation G: “I need the system to tell them what I want”
Enterprise Applications:
• Government Intelligence
• Enterprise Security
• Fraud Detection
Oracle Big Data Reference Architecture
[Figure: layered reference architecture diagram. Labels recoverable from the diagram:]
• Layers: Source Data Layer, Staging Data Layer, Foundation Layer, Performance Layer, Knowledge Discovery Layer, Information Access
• Source labels: Processes, COTS/ERP, External Data, Social/Text, Sensors, Strongly Typed Data, Weakly Typed Data
• Enterprise Data Warehouse: Enterprise Data with full history, Embedded Data Marts
• Knowledge Discovery Layer: Data Mining Sandbox, Rapid Dev Sandbox
• Information Access: Performance Management; Alerts, Dashboards, Reporting; Services; Information Discovery; Advanced Analysis & Data Science
• Other labels: Quality, Streaming, Data Integration, Security and Metadata, BI Abstraction & Query Federation
Translated into Oracle Product Architecture
[Figure: the reference architecture layers (Source Data, Staging Data, Foundation, Performance, Knowledge Discovery, Information Access) annotated with Oracle products:]
• Oracle Database: Advanced Analytics & OLAP, Spatial and Graph, Industry Models
• Oracle NoSQL Database
• CDH
• Big Data Connectors
• Oracle Endeca Information Discovery
• Oracle BI Foundation
Translated into Oracle Engineered Systems
[Figure: the reference architecture layers (Source Data, Staging Data, Foundation, Performance, Knowledge Discovery, Information Access) annotated with Oracle engineered systems:]
• Oracle Exadata
• Oracle Big Data Appliance
• Big Data Connectors
• Endeca
• Oracle Exalytics
Big Data Appliance
Hadoop Ecosystem for the Enterprise
Oracle Big Data Appliance
• Cloudera Dist. Hadoop
• Oracle NoSQL
• BD Connectors
Configurations
• 18 Nodes: 648TB, 288 CPUs
• 12 Nodes (U)
• 6 Nodes: 216TB, 96 CPUs
Oracle’s Big Data Connectors
Unlock the power of Hadoop integration
Hadoop
Oracle Database
Oracle Big Data
Connectors
(1) Oracle Data Integrator Application Adapters for Hadoop
Improving Productivity and Efficiency for Big Data
Benefits
• Consistent tooling across BI/DW, SOA, Integration and Big Data
• Reduces complexity: graphical tooling
• Improves productivity
[Figure: Oracle Data Integrator transforms via MapReduce (Hive) and activates Oracle Loader for Hadoop, which loads into Oracle Database]
(2) Oracle SQL Connector for Hadoop
Accessing HDFS Data from Oracle Database
Features
• Access or load into the database in parallel using the external table mechanism
• Access and analyze data in place on HDFS
• Query and join data on HDFS with database resident data
• Load into the database using SQL if required
• Automatic load balancing to maximize performance
[Figure: a SQL Query in Oracle Database reads an External Table, served by parallel ODCH processes acting as HDFS Clients against HDFS]
(3) Oracle R Connector for Hadoop
R Analytics leveraging Hadoop and HDFS
• Linearly Scale a Robust Set of R Algorithms
• Leverage MapReduce for R Calculations
• Compute Intensive Parallelism for Simulations
[Figure: Oracle R Client submitting MAP and REDUCE tasks to Hadoop over HDFS]
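The map/reduce pattern behind this slide can be sketched independently of the connector. The example below is an illustration only, written in Python rather than R, and does not use any Oracle or Hadoop API: it computes a mean and population variance from per-chunk partial summaries, where the chunks stand in for HDFS blocks and all function names are hypothetical.

```python
from functools import reduce


def split_into_chunks(values, n_chunks):
    """Stand-in for HDFS blocks: partition the data into roughly equal chunks."""
    size = max(1, -(-len(values) // n_chunks))  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def map_chunk(chunk):
    """MAP: summarise one chunk independently as (count, sum, sum of squares)."""
    return (len(chunk), sum(chunk), sum(x * x for x in chunk))


def combine(a, b):
    """REDUCE: merge two partial summaries; the merge is associative."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def mean_and_variance(values, n_chunks=4):
    """Compute mean and population variance from per-chunk partial summaries."""
    if not values:
        raise ValueError("values must not be empty")
    # Each map_chunk call is independent, so a cluster could run them in parallel.
    partials = [map_chunk(c) for c in split_into_chunks(values, n_chunks)]
    n, s, ss = reduce(combine, partials)
    mean = s / n
    variance = ss / n - mean * mean
    return mean, variance
```

Because each chunk is summarised on its own and the merge step is associative, adding nodes lets more chunks be processed at once, which is the sense in which such calculations scale out.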
What is Oracle R?
• Brings R’s statistical functionality to the Oracle Database
• Eliminates R’s memory constraints
• Allows R to run on very large data sets
• Oracle R is architected for enterprise production infrastructure
• Automatically exploits database parallelism without requiring parallel R programming
• Oracle R leverages the latest R algorithms and packages
• R is an embedded component of the DBMS server
• Part of Oracle Advanced Analytics (+ODM)
Oracle R Architecture
[Figure: Development (R workspace console), Production (Oracle statistics engine, with function push-down of data transformation and statistics), Consumption (OBIEE, Web Services)]
• Leverages SQL for data prep, analysis and enhanced statistics engine
• R engine runs on database nodes for production enablement of R models
• Leverages Exadata: Oracle R workloads run in-database and can be bound to database nodes for workload isolation
• Enriches OBIEE dashboards with Oracle R statistics and analytics
Oracle Data Mining (ODM)
Data mining can answer questions that cannot be addressed through simple query and reporting techniques.
• Data Mining: Insight from discovering relationships
  • Knowledge about what happened in the past
  • Characterization, segmentation, comparisons, discrimination
  • Descriptive models of patterns
• Predictive Analytics: Making better decisions and forecasts
  • Knowledge about what is happening right now and in the future
  • Classification and prediction of patterns
  • Rule-and-model driven
Data Mining – Some Definitions
Supervised Learning
Problem Classification: Sample Problem
• Classification: Predict customer response to an affinity card program
• Regression: Predict customer’s age
• Attribute Importance: Find the most significant predictors, data preparation
[Figure: attributes A1 to A7]
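As a concrete instance of the regression row, the sketch below fits ordinary least squares with a single predictor. It is not Oracle Data Mining's algorithm or API; the predictor (for example, years as a customer) and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # assumes the xs are not all identical
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) model to a new predictor value."""
    slope, intercept = model
    return slope * x + intercept
```

The training pairs play the role of labelled examples: the known ages supervise the fit, and `predict` is then applied to customers whose age is unknown.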
Data Mining – Some Definitions
Unsupervised Learning
Problem Classification: Sample Problem
• Anomaly Detection: Identify customer purchasing behavior that is significantly different from the norm
• Association Rules: Find the items that tend to be purchased together and specify their relationship (market basket analysis)
• Clustering: Segment demographic data into clusters and rank the probability that an individual will belong to a given cluster
• Feature Extraction: Group the attributes into general characteristics of the customers
[Figure: features F1 to F4]
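Market basket analysis, named in the association rules row, can be sketched by counting how often item pairs occur together. This is a simplified illustration, not Oracle Data Mining's implementation: it considers only pairs, and the function name and the `min_support` threshold are assumptions.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.5):
    """Return {(a, b): (support, confidence)} for rules 'a implies b' over item pairs.

    support    = share of baskets containing both a and b
    confidence = share of baskets containing a that also contain b
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # de-duplicate and fix pair ordering
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # pair is too rare to report
        rules[(a, b)] = (support, count / item_counts[a])
        rules[(b, a)] = (support, count / item_counts[b])
    return rules
```

No labels are needed here: the rules are derived from co-occurrence alone, which is what places the technique under unsupervised learning.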
Endeca Information Discovery
Sandbox and Production mode
• Endeca Information Discovery Studio
• Endeca MDEX Server
• Integration Suite
What is the world doing today
Large Spanish Clothes Manufacturer
• Automation
• Sensory Event Processing
• Quality Assurance
What is the world doing today
Second Largest Bank in the United States of America
• Analysis of data xLoB: Loans, Insurance, on-line banking, card products
• Market assessment
• Risk Analysis
• Revenue lift for new & existing products
Telco Industry
Deep, Big and Fast
Deep
• SNA*, Find Influencers, RA**
Big
• Network Optimization
• CDR Analysis
Fast
• Sentiment Analysis
• Location Based Services
• Click stream Analysis
© 2012 Oracle Corporation – Proprietary and Confidential
* Social Network Analysis (Rate plan optimization)
** Revenue Assurance
Retail Industry
Marketing, Merchandising and Supply Chain
Marketing
• In-store behaviour analysis
• Sentiment Analysis + Micro segmentation
Merchandising
• Assortment optimization
Supply Chain
• Distribution and logistics optimization
• Informing supplier negotiations
Oil and Gas Use Cases
Hadoop and Seismic Data Processing
Life Sciences / Pharmaceutical
Life Sciences
• DNA Sequencing, Diseases Correlation
Pharmaceutical
• Clinical Trial – meds simulation
Wrap Up
• New Challenges and the New Information
• N-Tier, HPC
• New Reference Architectures for New Data
• The role of Oracle Corp in developing New Technologies
• Challenges across all industries
Thank You
@luigicampos