Breakout B1-Big Data as a Service-Glen Campbell

The Power
To Do More
Big Data
at Ops Speed
Apps, Bits, and Ops
Data Ingestion without Ops Indigestion
Glen Campbell – Dell IT Summit 2015
April 9, 2015
Dell World
Executive Summit
No… Not THAT
Glen Campbell
Net Positive Effort…
Some people never open their mouths without subtracting from the sum of human knowledge.
Thomas Brackett Reed
I’m working not to resemble that statement…
#DellPTDM
Question #1 – What are You Trying to Do in BD Land?
Make Money?
Or SAVE Money?
A prime consideration for many financial decision makers is how to tighten their CapEx:OpEx ratios.
…and then shrink their absolute CapEx costs.
Apps and Ops are part of the same conversation.
Uno, Dos, Tres (Catorce…)
1. Anecdotal or Scientific: Can we really ignore this?
2. Industry
   a. VMware’s BDE
   b. Microsoft’s HDInsight
   Community
   a. OpenStack Sahara
3. Start the conversation in your Org
How Do These Things Relate?
Ops is the Reality of an Idea Over Time
~1/20th of a second
~7,000 / second
~7 billion CRUD activities / day
Venn
Ops
What Should I Do?
What Can I Do?
Sometimes…
How Do I Do It?
Common Elements: Extract
Get, buy, steal every last piece of information you can.
Even if you think it ISN’T related.
Common Elements: Transform
Learn and perform the business alchemy that transforms that data into something known to be useful.
Common Elements: Load
Feed the business beast, NEVER sated with enough information, nor enough of it, fast enough.
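The Extract, Transform, and Load elements above can be sketched end to end. What follows is a minimal illustration in Python using only the standard library, not a pattern taken from the talk: the sample records, the field names, and the `events` table are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; in practice this would come from files, feeds, or APIs.
RAW_CSV = """user,action,amount
alice,BUY,10.50
bob,view,
alice,buy,4.25
"""


def extract(raw_text):
    """Extract: read every row, even ones that look incomplete."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: normalize casing and turn amounts into numbers."""
    cleaned = []
    for row in rows:
        amount = float(row["amount"]) if row["amount"] else 0.0
        cleaned.append((row["user"].strip(), row["action"].strip().lower(), amount))
    return cleaned


def load(records, connection):
    """Load: write the cleaned records where the business can query them."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS events (user TEXT, action TEXT, amount REAL)"
    )
    connection.executemany("INSERT INTO events VALUES (?, ?, ?)", records)
    connection.commit()


def run_pipeline(raw_text):
    """Run extract, transform, and load against an in-memory database."""
    connection = sqlite3.connect(":memory:")
    load(transform(extract(raw_text)), connection)
    return connection


if __name__ == "__main__":
    conn = run_pipeline(RAW_CSV)
    query = "SELECT user, SUM(amount) FROM events WHERE action = 'buy' GROUP BY user"
    for user, total in conn.execute(query):
        print(user, total)
```

Keeping the three stages as separate functions means any one of them can be swapped (a different source, a different store) without touching the other two.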
The Most Successful Pattern-Matching Species in History (but we still need help)
The Elephants in the Room – VMware and Hadoop
Project Serengeti
Tadahari!
What’s New in the New Two Zoo?
› Exposed Cloudera Manager / Ambari endpoint configuration in UI
› HBase-only Clusters
 ▪ Allowing integration with EXISTING HDFS
› Compute-only Clusters for Hadoop Data (worker) Nodes
 ▪ Allowing integration with an EXISTING phys/virt Hadoop implementation
› New Integration with Apache Bigtop
 ▪ Excellent means of building / customizing / smoke-testing Hadoop builds for a customer-specific environment
› Upgrade Engine
 ▪ Allowing the upgrade of BDE from rev-to-rev, leaving data undisturbed and configuration intact
Offering: Cloudera Hadoop
› Consumer
 ▪ VMware portal
  • Add Cloud users
  • Monitor resource consumption
  • Provision new Hadoop clusters
  • Scale up & down Hadoop nodes
 ▪ HUE Portal & Cloudera Manager
  • Operate Cloudera Hadoop
  • RBAC to Hadoop / Datasets
  • Submit data
  • Submit processing jobs
 ▪ Toad or other Hadoop client tools
  • Submit data
  • Submit jobs
[Diagram: Alfa and Bravo End Users and Admins; Toad, Portal, and HUE; vRealize Automation Portal and VRO; BDE; vCenter; ESXi; VM Nodes; CPU, NET, DSK; Provider; L3 IP; Logical]
Windows Azure HDInsight
HDInsight - Overview
• Microsoft’s Hadoop Distribution in the Cloud
• Offers Hadoop on Windows Platform
• Tightly integrated with Microsoft Technology Stack
• Based on Hortonworks Data Platform (HDP)
HDInsight - Architecture
Microsoft Data Platform and Enterprise BI Ecosystem
HDInsight Versions
COMPONENT                        VERSION 1.6  VERSION 2.1       VERSION 3.0       VERSION 3.1 (Current/Default)
Hortonworks Data Platform (HDP)  1.1          1.3               2.0               2.1.7
Apache Hadoop & YARN             1.0.3        1.2.0             2.2.0             2.4.0
Tez                              -            -                 -                 0.4.0
Apache Pig                       0.9.3        0.11.0            0.12.0            0.12.1
Apache Hive & HCatalog           0.9.0        0.11.0            0.12.0            0.13.1
HBase                            -            -                 -                 0.98.0
Apache Sqoop                     1.4.2        1.4.3             1.4.4             1.4.4
Apache Oozie                     3.2.0        3.3.2             4.0.0             4.0.0
Apache HCatalog                  0.4.1        Merged with Hive  Merged with Hive  Merged with Hive
Apache Templeton                 0.1.4        Merged with Hive  Merged with Hive  Merged with Hive
Ambari                           -            API v1.0          1.4.1             >=1.5.1
Zookeeper                        -            -                 3.4.5             3.4.5
Storm                            -            -                 -                 0.9.1
Mahout                           -            -                 -                 0.9.0
Phoenix                          -            -                 -                 4.0.0.2.1.7.0-2162
HDInsight Use Case - ETL Automation
HDInsight Use Case - BI Integration
Typical Implementation
[Diagram: Social, Web Logs, Clickstream, Transactional, and Warehouse data; Azure Blob files (TXT, XML, JSON, ..); Multi-Node HDInsight Cluster running MapReduce (Hive, Java); Collaboration, Reporting and Analytics via SSRS, Excel, Power BI]
Typical Implementation (Contd…)
PowerShell / SSIS / SQL Agent
Subscription & Cluster Management | Data Movement | Job Execution
[Diagram: Customers, E-Commerce, Social, and Web Logs data; Internal Systems (OLTP, Transactional, Warehouse, Team); Sqoop or AzCopy; Azure Blob Storage; Hive Metastore; Multi-Node HDInsight Cluster running MapReduce (Hive, Pig, Java, Python); Office 365 / SharePoint; Collaboration, Reporting, and Analytics via SSRS, Excel, Power BI]
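Since the slide lists Python among the MapReduce options and Web Logs among the inputs, the map, shuffle, and reduce steps can be illustrated in plain Python. This is only a local sketch of the idea, not code that runs on an HDInsight cluster: the log layout, the sample lines, and every function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical web log lines in a simplified "ip method path status" layout.
SAMPLE_LOG = [
    "10.0.0.1 GET /home 200",
    "10.0.0.2 GET /cart 200",
    "10.0.0.1 GET /home 200",
    "10.0.0.3 GET /missing 404",
]


def map_line(line):
    """Map: emit (path, 1) for each well-formed, successful request."""
    parts = line.split()
    if len(parts) == 4 and parts[3] == "200":
        yield parts[2], 1


def shuffle(pairs):
    """Shuffle: group emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce: collapse all values for one key into a single total."""
    return key, sum(values)


def count_hits(lines):
    """Run map, shuffle, and reduce in sequence over the given lines."""
    pairs = (pair for line in lines for pair in map_line(line))
    return dict(reduce_counts(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(count_hits(SAMPLE_LOG))
```

On a real cluster the framework performs the shuffle across nodes; here it is a dictionary so the whole flow fits in one process.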
With Open Source, if you’re USING the boat, you’re participating in how it moves.
The dynamic of Open Source software is one where participation in that community, not solely the usage of its technology, is part of the bargain.
The viability of many pieces of software is increasingly being dictated NOT merely by its functionality or the vendor, but by the community and ecosystem around it.
Multi-Voice, Multi-Need
The Community Approach
The Importance of EDP (Elastic Data Processing) in the Open Source Cloud
The OpenStack Sahara Project
Whose Hadoop, What Versions, What Jobs…?
• Vanilla Apache Hadoop
  - 1.2.1, 2.3.0, and 2.4.1 (2.6 just out)
• Cloudera Distribution of Hadoop (CDH)
  - CDH5
• Hortonworks Data Platform (HDP)
  - 1.3.2 and 2.0.6
• Spark
  - 0.9.1, 1.0.0, 1.0.2…
Supported Job Types:
• Jar, Pig, Hive
Supported Workflows:
• Oozie
• Mistral…?
OpenStack “Ironic” Pending for Bare Metal
Plus
• Cloudera Manager
• Apache Ambari
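One way to make the "whose Hadoop, what versions, what jobs" question concrete is a small lookup that checks a request against the lists on this slide. The sketch below is illustrative only and does not use or imitate the Sahara API: the dictionary layout and the `check_request` helper are hypothetical, and the values are simply transcribed from the slide.

```python
# Values transcribed from the slide; the dictionary layout is an assumption.
# The slide's Spark list ends with an ellipsis, so this set is not exhaustive.
SUPPORTED_VERSIONS = {
    "Vanilla Apache Hadoop": {"1.2.1", "2.3.0", "2.4.1"},
    "CDH": {"CDH5"},
    "HDP": {"1.3.2", "2.0.6"},
    "Spark": {"0.9.1", "1.0.0", "1.0.2"},
}
SUPPORTED_JOB_TYPES = {"Jar", "Pig", "Hive"}


def check_request(distribution, version, job_type):
    """Return a list of problems; an empty list means no conflict was found."""
    problems = []
    versions = SUPPORTED_VERSIONS.get(distribution)
    if versions is None:
        problems.append(f"unknown distribution: {distribution}")
    elif version not in versions:
        problems.append(f"{distribution} {version} is not on the slide's list")
    if job_type not in SUPPORTED_JOB_TYPES:
        problems.append(f"unsupported job type: {job_type}")
    return problems


if __name__ == "__main__":
    print(check_request("HDP", "2.0.6", "Hive"))
    print(check_request("HDP", "9.9", "Shell"))
```

Returning a list of problems rather than a single boolean lets a caller report every mismatch in one pass.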
Apps, Ops, and DATA are Elements of the SAME Conversation
Dell’s 360° In the Analytics and Big Data Ecosystem
Design, Analyse
Manage, Change
Pull, Push
Diagnose, Resolve
Confidential
Us, Them
Software Group
A Rich Portfolio of Software Assets to Drive Your Big Data Needs
• software.dell.com/Dell-Statistica
• software.dell.com/solutions/big-data-analytics
• software.dell.com/products/boomi-atomsphere/
• software.dell.com/products/toad-intelligencecentral/
• software.dell.com/products/toad-data-point/
• dell.com/bigdata
• dell.cloudera.com/
• dell.com/learn/us/en/555/solutions/hadoop-bigdata-solution
Talk to your Dell Account Team
Marry the Models
Consider Your BDaaS Self
Thank You!
Questions
SharePlex
Enterprise Technologists – Content with Context
Global Marketing