Data Virtualization Nitin Gokhale

Customer Solution Architect, APJ
What is Cisco Data Virtualization?
Cisco Data Virtualization is agile data integration software that
makes it easy to access data, no matter where it’s managed. Our
integrated data platform lets you query all types of data across the
network as if it were in a single place.
The terms virtualization, federation, and integration are used practically interchangeably to describe
the technique of making disparate data sources look like a single source.
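As a rough illustration of that idea (a toy sketch, not Cisco's product or API), the Python below joins two separate sources at query time and returns one result set, without copying either source into a new store. Every name in it (`orders_db`, `CUSTOMERS_CSV`, `read_customers`, `customer_totals`) and all the sample data are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: a relational database holding orders.
orders_db = sqlite3.connect(":memory:")
orders_db.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
orders_db.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 120.0), (2, 75.5), (1, 30.0)],
)

# Hypothetical source 2: a flat file holding customer names.
CUSTOMERS_CSV = "customer_id,name\n1,Acme\n2,Globex\n"


def read_customers():
    """Read the flat-file source at query time; nothing is copied elsewhere."""
    reader = csv.DictReader(io.StringIO(CUSTOMERS_CSV))
    return {int(row["customer_id"]): row["name"] for row in reader}


def customer_totals():
    """A 'virtual view': join both sources on demand into one result set."""
    names = read_customers()
    rows = orders_db.execute(
        "SELECT customer_id, SUM(amount) FROM orders "
        "GROUP BY customer_id ORDER BY customer_id"
    ).fetchall()
    return [(names[customer_id], total) for customer_id, total in rows]


print(customer_totals())  # [('Acme', 150.0), ('Globex', 75.5)]
```

Because the join runs each time the view is queried, a new row in either source shows up in the next call to `customer_totals()` with no reload step.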
Customers Realizing Significant Business Outcomes
Financial Services
• Putnam’s marketing campaign needed information from a wide variety of sources in order to be effective. - 5X productivity improvement
Pharmaceuticals
• Research scientists and drug portfolio managers needed to access and analyze a complex set of data from multiple, disparate sources. - 10% reduction in operational infrastructure costs
Energy
• “The Cisco solution made possible the strategy of ‘virtually’ keeping all data in one place, resulting in a 30-month project being completed in 18 months, and business productivity increase.” - Anestacio Rios, Project Lead
Wireless Communications
• Qualcomm project managers needed easy access to personnel information related to engineering, technical, and managerial resources. - 10X faster data integration development cycle
Today’s discussion
• The data challenge
• What is Data Virtualization
• Data Virtualization Architecture
• Customer Use Cases
• Summary
The Data Challenge
• Organizations “dump” data into their Enterprise Data Warehouse and are facing huge expenses for management and capacity upgrades.
• User demands for more data and quicker response times are increasing.
• Exponential increases in data volumes and additional data sources are adding to the complexity.
• “Data Rich, Information Poor” - Comcast
[Chart: Cost for Last Year, This Year, Next Year]
The Data Challenge Options: Traditional ETL Process
Data Warehouse
• Data repository for business reporting
• Avoids impacting live systems
• Involves copying data from sources
• DW design, batch process, and scripting
[Diagram: ETL process (Extract, Transform, Load). Labels: Operational Stores, Transactional Applications, SaaS Applications, ESBs and App Servers, Data Warehouses and Marts, Business Reports, Business Intelligence, Self-service Analytics]
Issues
• Slow development cycle
• Replicated data
• Batch latencies
• Physical stores overhead
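The replication and batch-latency issues can be seen in a minimal sketch of a batch ETL run. This is a toy illustration in Python, not any vendor's tooling; the `sales` and `sales_by_region` tables, the helper functions, and the sample figures are all hypothetical.

```python
import sqlite3

# Hypothetical live system and warehouse, each its own in-memory database.
source = sqlite3.connect(":memory:")
warehouse = sqlite3.connect(":memory:")
source.execute("CREATE TABLE sales (region TEXT, amount_cents INTEGER)")
warehouse.execute("CREATE TABLE sales_by_region (region TEXT, total_usd REAL)")
source.executemany(
    "INSERT INTO sales VALUES (?, ?)", [("APJ", 12000), ("EMEA", 8000)]
)


def run_etl_batch():
    """One batch run: extract from the source, transform, load a copy."""
    # Extract: read raw rows from the live system.
    rows = source.execute("SELECT region, amount_cents FROM sales").fetchall()
    # Transform: total per region and convert cents to dollars.
    totals = {}
    for region, cents in rows:
        totals[region] = totals.get(region, 0) + cents
    transformed = [(region, cents / 100) for region, cents in sorted(totals.items())]
    # Load: replace the replicated copy held in the warehouse.
    warehouse.execute("DELETE FROM sales_by_region")
    warehouse.executemany("INSERT INTO sales_by_region VALUES (?, ?)", transformed)
    warehouse.commit()


def report():
    """Business report reads only the warehouse copy."""
    return warehouse.execute(
        "SELECT region, total_usd FROM sales_by_region ORDER BY region"
    ).fetchall()


run_etl_batch()
before = report()
source.execute("INSERT INTO sales VALUES ('APJ', 5000)")  # new live transaction
stale = report()   # warehouse has not seen the new row yet
run_etl_batch()
after = report()   # visible only after the next batch run
print(before, stale, after)
```

Here `stale` equals `before` even though the source has changed: the report reads a copy, and the copy is only as fresh as the last batch.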
Business Pain: Today and Anticipated
Data Silos Proliferating, Data Is Now Distributed Everywhere
How Does the Business Leverage All the Data?
[Diagram labels: Traditional Data Sources, Big Data / IoE Sources, Cloud Data Sources]
Data Virtualization Solution
A unified, business-friendly view of all data for better business outcomes
[Diagram labels: Business Intelligence, Analytics, Cisco Data Virtualization on UCS, Traditional Data Sources, Big Data / IoE Sources, Cloud Data Sources]
Logical Data Warehouse
• One logical place to go for data
• More complete business view of data
• Any data source, any location
• Supports different security systems
• Dashboard / Analytics feed
• No data replication
[Diagram: Data Virtualization Platform (Abstract, Federate, Cache). Labels: Self-service Analytics, Business Intelligence, ESBs and App Servers, “Big Data” and NoSQL, Analytic Stores, Operational Stores, Data Warehouses and Marts, Transactional Applications, SaaS Applications, Web Services]
Cisco Data Virtualization Platform
• Business Intelligence
• Customer Experience Management
• Governance, Risk & Compliance
• Human Capital Management
• Mergers & Acquisitions
• Single View of Enterprise Data
• Supply Chain Management
• Analytics
Cisco Data Virtualization Suite
Environments: Development Environment, Management Environment, Runtime Server Environment
Components: Discovery, Manager, Studio, Cisco Information Server, Monitor, Active Cluster, Adapters
Data sources: Packaged Apps, RDBMS, Excel Files, Data Warehouse, OLAP Cubes, Hadoop / “Big Data”, XML Docs, Flat Files, Web Services
Cisco Data Virtualization
Better Business Outcomes, Faster, for Less
• Higher Impact (Immediate Access): Empower people to achieve better business outcomes with instant access to all the data they want, the way they want it.
• More Agile (5-10x Faster): Respond faster to your ever-changing business conditions and analytics/BI needs.
• Less Expensive (Up to 75% Cost Savings): Data virtualization’s streamlined approach reduces complexity and saves money.
Data Virtualization Architecture Overview
Cisco Data Virtualization Platform
Environments: Development Environment, Runtime Server Environment, Management Environment
Components: Discovery, Studio, Adapters, Cisco Information Server, Manager, Monitor, Active Cluster
Front-end Applications
• SQL (ODBC, JDBC, ADO.NET)
• Web Services (HTTP, REST, SOAP, JSON, OData)
• Messaging (JMS)
• Hadoop (Input Format)
Cisco Information Server
• Security
• Federation Engine
• Cost-based Optimizer
• Rules-based Optimizer
• Views, SQLScript (Database Centric)
• XQuery, Java, WSDL, SCA (Services Centric)
• Caching
• Quality
• Governance
Data source interfaces
• SQL (ODBC, JDBC)
• Web Services (REST, SOAP)
• Messaging (JMS)
• URI
• Hadoop (HiveDB)
• Java
• MF Adapter
• Application APIs
Data sources: Applications, Big Data Stores, Excel Files, Flat Files, Mainframes, OLAP Cubes, Messages, XML Docs, RDBMS, Web Services
Data Virtualization Cluster Deployed on UCS
• UCS outperforms competition on TPS benchmarks, optimizing query transaction processing.
• UCS 10GE Unified Fabric maximizes throughput from data sources.
• High-throughput east-west interconnects maximize cluster performance.
• UCS Common Platform Architecture scales to support any workload and configuration.
• UCS component resiliency ensures high-availability operations.
• UCS unified server management simplifies deployment and operations in multi-cluster environments.
[Diagram: three UCS nodes, each labeled Cisco Information Server]
Where Do You Use Data Virtualization? Different Approaches
• Data Federation
• Big Data Integration
• DW Extension
• Data Virtualization Layer
• Cloud Data Integration
Role in Big Data Architecture: Integrate Hadoop & Enterprise
1. Make Hadoop data available to traditional SQL-based BI tools.
2. Combine Hadoop data with enterprise data from traditional data sources.
3. Provide enterprise data from traditional data sources directly to MapReduce jobs in Hadoop.
[Diagram labels: Data Virtualization Platform, HDFS]
© 2013-2014 Cisco and/or its affiliates. All rights reserved. Cisco Confidential
Customer Use Cases
Overcome Complex Data Challenges
NYSE Euronext
Customer Challenge
• Improve performance of “big” data access
• Increase availability and accessibility of data for users
• Increase integration of reference data
Data Virtualization Solution
• Used the Cisco Data Virtualization Platform as a virtual data store for Trades, Orders, Reports, Quotes, Cancels, Admins (TORQCA) data
Impact on Customer
• Reduced development costs (anticipated savings of over $4.5 million annually)
• Improved time to solution
Shorten Drug Discovery Cycle
Customer Challenge
• Shorten drug discovery cycle
• Remove data integration bottlenecks for researchers
• Maintain data quality and security
Data Virtualization Solution
• Created agile data integration methodology using the Cisco Information Server
• Provided quicker, iterative access to data from multiple, disparate sources
Impact on Customer
• Reduced time to new information from months to days
• Improved data quality by 5%
• Decreased missed R&D project dates by 60%
Drive Effective Marketing Campaigns
Customer Challenge
• Find customer trends to drive effective marketing campaigns
• Analyze an integrated collection of data for PlayStation 3
• Combine different data sources and make them look like a unified source; limit replication
Data Virtualization Solution
• Built a flexible data model / abstraction layer across all source systems
• Enhanced agility through a flexible data delivery infrastructure
Impact on Customer
• $9M in revenue acceleration
• IT staff savings of $415,000 on first project
• Infrastructure cost avoidance of $304,000 on first project
Summary: Cisco Data Virtualization
• Gain more business insights by leveraging all your data – Empower your people with instant access to all the data they want, the way they want it.
• Respond faster to your ever-changing analytics and business intelligence needs – Five to ten times faster time to solution than traditional data integration.
• Save 50-75% over data replication and consolidation – Stop copying so much data. Data virtualization’s streamlined approach reduces complexity and saves money.
• Increase utilization of existing server and storage investments – Resulting in substantial hardware and governance savings.
• Reduce risk – Use proven software, network, and computing infrastructure to adopt big data and logical data warehousing.