Document 234810

Oracle Big Data SQL
杨长水
Master Principal Sales Consultant
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for
information purposes only, and may not be incorporated into any contract. It is not a
commitment to deliver any material, code, or functionality, and should not be relied upon
in making purchasing decisions. The development, release, and timing of any features or
functionality described for Oracle’s products remains at the sole discretion of Oracle.
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted
3
Agenda
1
Oracle Big Data SQL: 一个新的架构
2
技术概要
3
演示
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted
4
Big Data Customers
Big Data Analytic Services
• R&D, Supply Chain, Customer & Consumer
• Centralized Data Science Organization
BDA
Exadata
BDA
Exadata
BDA
Exadata
Business Transformation
• Leading Spanish Bank > 13M customers
• Collect & unify all relevant information
Innovative Network Defense
• Hadoop and NoSQL DB for data of different speeds
• Detect 0-days, uncover intrusions
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted
5
Oracle Big Data 平台
BUSINESS
ANALYTICS
BIG DATA
APPLICATIONS
Big Data Management System
BY INDUSTRY &z LINE OF BUSINESS
DISCOVERY
BUSINESS ANALYTICS
DATA RESERVOIR
DATA WAREHOUSE
SOURCES
BIG DATA
MANAGEMENT
ORACLE BIG DATA SQL
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
从技术创新推进商业价值
Use the Right Tool for the Job and benefit from the Power of “AND”
Hadoop
NoSQL
改变业务模式
拓展业务
Relational
业务运营
 瓦解竞争对手
 更快的提供数据
 和现有系统集成
 直接投资供应链
 迎接移动互联的挑战
 支持关键业务系统
 利用新的模式
 经济地横向扩展能力
 保护现有投资
 开拓新的分析
 确保相关技能
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
7
采用新技术克服障碍
集成
把大数据加入到现
有的架构中比较复
杂
技能
缺乏工具和培训去
利用大数据
安全
没有明确的规章制
度驱动和强制
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Confidential
8
采用新技术克服障碍
集成
技能
安全
SQL
工程化
一体机
运行在所有
数据上的SQL
针对所有数据
的数据库安全
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Confidential
9
要是你能够…
• 使所有的数据都可以被你Oracle 数据库应用轻而易举地访问
• 同时支持Oracle SQL查询语言的所有功能
• 可以利用到Oracle Database 12c的所有安全特性
• 不用在你的Hadoop集群和关系型数据库之间移动数据
• 并且拥有快速的查询性能
• 还可以用上你已经有数据库技能
• 同时还可以使用最新的Hadoop的创新功能
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted
10
Oracle Big Data SQL – 一个新的架构
• 运行在Hadoop上的强大的高性能SQL
– Hadoop上的完全实现Oracle SQL功能
– 在Hadoop节点上本地处理SQL查询
• Oracle数据库和Hadoop之间实现简化数据集成
• 单一的SQL入口访问所有的数据
– 数据的连接操作可以扩展到Hadoop和关系型数据之间
• 优化的硬件
– 使用高速的Infiniband网络连接 Exadata和Hadoop
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted
11
Oracle Big Data Management System
ORACLE BIG DATA SQL
DATA RESERVOIR
DATA WAREHOUSE
Cloudera Hadoop
Oracle
Database
Oracle
Database
Oracle Big Data
Connectors
Oracle NoSQL
Oracle R Advanced
Analytics for Hadoop
Oracle R Distribution
Oracle Data
Integrator
Oracle Industry
In-Memory,
Multi-tenant
Models
Oracle Industry Models
Oracle Advanced
Analytics
Oracle
Advanced
Analytics
Oracle Spatial & Graph
Oracle Spatial & Graph
Big Data Appliance
Apache
Flume
Oracle
GoldenGate
Oracle Data
Oracle
Oracle Event
Integrator GoldenGate Processing
SOURC
ES
Oracle Event
Processing
Exadata
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
我们如何做到这一点
1. 统一的元数据,查询语言和安全性
2. 优化性能: 在Oracle 大数据机上实现智能扫描
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
13
统一元数据,查询语言和安全性
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
14
存在Hadoop中的数据
Example: Files with JSON data
Hadoop/NoSQL Ecosystem
{"custId":1185972,"movieId":null,"genreId":null,"time":"2012-07-01:00:00:07","recommended":null,"activity":8}
{"custId":1354924,"movieId":1948,"genreId":9,"time":"2012-07-01:00:00:22","recommended":"N","activity":7}
{"custId":1083711,"movieId":null,"genreId":null,"time":"2012-07-01:00:00:26","recommended":null,"activity":9}
{"custId":1234182,"movieId":11547,"genreId":44,"time":"2012-07-01:00:00:32","recommended":"Y","activity":7}
{"custId":1010220,"movieId":11547,"genreId":44,"time":"2012-07-01:00:00:42","recommended":"Y","activity":6}
{"custId":1143971,"movieId":null,"genreId":null,"time":"2012-07-01:00:00:43","recommended":null,"activity":8}
{"custId":1253676,"movieId":null,"genreId":null,"time":"2012-07-01:00:00:50","recommended":null,"activity":9}
{"custId":1351777,"movieId":608,"genreId":6,"time":"2012-07-01:00:01:03","recommended":"N","activity":7}
{"custId":1143971,"movieId":null,"genreId":null,"time":"2012-07-01:00:01:07","recommended":null,"activity":9}
{"custId":1363545,"movieId":27205,"genreId":9,"time":"2012-07-01:00:01:18","recommended":"Y","activity":7}
{"custId":1067283,"movieId":1124,"genreId":9,"time":"2012-07-01:00:01:26","recommended":"Y","activity":7}
{"custId":1126174,"movieId":16309,"genreId":9,"time":"2012-07-01:00:01:35","recommended":"N","activity":7}
{"custId":1234182,"movieId":11547,"genreId":44,"time":"2012-07-01:00:01:39","recommended":"Y","activity":7}}
{"custId":1346299,"movieId":424,"genreId":1,"time":"2012-07-01:00:05:02","recommended":"Y","activity":4}
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted
15
Hive
• 提供类SQL的界面访问存储在HDFS中的数据
• 允许应用处理任意格式存储的数据
• 表可以捕获所需要的元数据去定位和解析数据
• SQL查询可以生成一个MapReduce 任务去处理数据
• Big Data SQL可以使用Hive的元数据简化管理但不是
必须的
• Big Data SQL 不使用Hive来执行查询
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted
16
Hive Metastore
SQL Execution Engines Share Metadata
Oracle Big Data SQL
Shark
Hive
Hive Metastore
Impala
…
Table Definitions:
movieapp_log_json
movielog
movieapp_log_avro
Metastore maps DDL to Java access classes
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted
19
统一元数据: 发布Hive Metadata 到Oracle Catalog中
Oracle Catalog
Hive Metastore
Hive metadata
CREATE TABLE movieapp_log_json
(click VARCHAR2(4000))
ORGANIZATION EXTERNAL
(TYPE ORACLE_HIVE
DEFAULT DIRECTORY DEFAULT_DIR
)
REJECT LIMIT UNLIMITED;
External Table
External Table
Exadata
Big Data Appliance
+
Oracle Database
+
Hadoop/NoSQL
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
20
Automation: 自动化:Oracle 数据模型
把Hive的定义导入到模型中
为导入的表自动生成Oracle DDL
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted
21
成为Oracle外部表新的数据源
CREATE TABLE movielog
(click VARCHAR2(4000))
ORGANIZATION EXTERNAL
( TYPE ORACLE_HIVE
DEFAULT DIRECTORY Dir1
ACCESS PARAMETERS
(
com.oracle.bigdata.tablename logs
com.oracle.bigdata.cluster mycluster)
)
REJECT LIMIT UNLIMITED
• 新的特性集
– ORACLE_HIVE 和ORACLE_HDFS 访问驱动
– 标识一个Hadoop集群,数据源,字段映射,错误
处理,溢出处理,日志
• 在查询执行时新表的metadata 从Oracle DDL传
递到Hadoop reader中
• 架构的可延伸性
– StorageHandler 能够在以后支持其它的数据源
– Examples: MongoDB, Hbase, Oracle NoSQL DB
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
22
在所有数据上使用Oracle SQL丰富的特有功能
Snapshot of Oracle SQL Analytic Functions
•
Ranking functions
•
– rank, dense_rank, cume_dist, percent_rank, ntile
•
•
•
•
Statistical Aggregates
•
Linear regression
– Fitting of an ordinary-least-squares regression line to a set of number
pairs.
Hypothesis Testing
– Student t-test , F-test, Binomial test, Wilcoxon Signed Ranks test, Chi-square, Mann
Whitney test, Kolmogorov-Smirnov test, One-way ANOVA
– Correlation, linear regression family, covariance
•
Cross Tabs
– Enhanced with % statistics: chi squared, phi coefficient, Cramer's V, contingency
coefficient, Cohen's kappa
Reporting Aggregate functions
– Sum, avg, min, max, variance, stddev, count, ratio_to_report
•
Correlations
– Pearson’s correlation coefficients, Spearman's and Kendall's (both nonparametric).
LAG/LEAD functions
– Direct inter-row reference using offsets
•
– DBMS_STAT_FUNCS: summarizes numerical columns of a table and returns count,
min, max, range, mean, stats_mode, variance, standard deviation, median,
quantile values, +/- n sigma values, top/bottom 5 values
Window Aggregate functions (moving and cumulative)
– Avg, sum, min, max, count, variance, stddev, first_value, last_value
Descriptive Statistics
•
Distribution Fitting
– Kolmogorov-Smirnov Test, Anderson-Darling Test, Chi-Squared Test, Normal,
Uniform, Weibull, Exponential
– Frequently combined with the COVAR_POP, COVAR_SAMP, and CORR
functions
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
管理所有的数据
Oracle Big Data Appliance
Oracle Database 12c
SQL
Store JSON data unconverted
in Hadoop
在Hadoop上应用高级的安全管理
− Masking/Redaction
− Virtual Private Database
− Fine-grained Access Control
Store business-critical data in
Oracle
Data analyzed via SQL
DBMS_REDACT.ADD_POLICY(
object_schema => 'hr',
object_name => 'employee',
column_name => 'social_sec_num',
policy_name => 'redact_ssn',
function_type => DBMS_REDACT.FULL,
expression => '1=1'
);
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
25
用于快速查询处理的智能扫描
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
26
智能查询优化
一个查询可以横跨Oracle数据库,Hadoop和NoSQL
Query Data in RDBMS,
Hadoop & NoSQL
快速
海量并行
Oracle SQL
本地过滤
最小化的数据移动
OracleNoSQLDB
BDS Server
HDFS Data Node
BDS Server
OracleDatabase
StorageServer
OracleNoSQLDB
BDS Server
HDFS Data Node
BDS Server
OracleDatabase
StorageServer
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Query Execution on Hadoop 在Hadoop上的查询执行
select last_name, state,
movie, genre
from movielog m, customer c
where genre=‘comedy’
and c.custid = m.custid
1
• 数据定位
• 数据结构
• 并行度
2
3
1
2
3
HDFS
NameNode
HiveMetastore
查询编译器确定:
HDFS Data Node
BDS Server
HDFS Data Node
BDS Server
使用 Big Data SQL 服务器并行读:
• 并行单位: PQ Slaves & InputSplit
• 过滤行和项目列
处理过滤后的结果
• 移动相关数据到数据库
• 和数据库表连接
• 应用数据库安全策略
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Big Data SQL 服务器使数据移动最小化
Big Data SQL Server
1.
2.
3.
1.
2.
Hadoop Smart Scan
Apply filter predicates
Apply column projections
Apply row-level functions
• JSON Parsing
External Table Services
Read using Hadoop Classes
Convert to Oracle Data
Stream
Data Node
• 工作更靠近数据
– 从hadoop类中扫描和串行化
– 转换到Oracle的数据流中
• 智能扫描: 只发布相关数据
– 应用过滤条件
• Include complex predicates, e.g. JSON_EXISTS包括复
杂的查询条件,e.g. JSON_EXISTS
• Bloom filters 用于更快的表连接操作
• 给数据挖掘模型打分
– 预期的字段
• 从嵌入式结构中获得预期的字段
Disk
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted
31
Oracle Big Data SQL
One fast SQL query, on all your data.
Oracle SQL on Hadoop and beyond
• 使用和Exadata中相同的智能扫描技术
• 使用本地的SQL操作
• 使用Oracle数据库的安全特性和确定性
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted
32
高级查询&分析
具有SQL和高级分析的完整功能
利用你所有的数据
Relational, Hadoop and NoSQL
Secure 安全
统一治理所有的数据
最快的性能
利用跨平台的SQL处理
应用透明
不需要改变应用代码
Big Data Management System
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted
38