Oracle Big Data SQL 杨长水 Master Principal Sales Consultant Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted Safe Harbor Statement The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted 3 Agenda 1 Oracle Big Data SQL: 一个新的架构 2 技术概要 3 演示 Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted 4 Big Data Customers Big Data Analytic Services • R&D, Supply Chain, Customer & Consumer • Centralized Data Science Organization BDA Exadata BDA Exadata BDA Exadata Business Transformation • Leading Spanish Bank > 13M customers • Collect & unify all relevant information Innovative Network Defense • Hadoop and NoSQL DB for data of different speeds • Detect 0-days, uncover intrusions Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted 5 Oracle Big Data 平台 BUSINESS ANALYTICS BIG DATA APPLICATIONS Big Data Management System BY INDUSTRY &z LINE OF BUSINESS DISCOVERY BUSINESS ANALYTICS DATA RESERVOIR DATA WAREHOUSE SOURCES BIG DATA MANAGEMENT ORACLE BIG DATA SQL Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | 从技术创新推进商业价值 Use the Right Tool for the Job and benefit from the Power of “AND” Hadoop NoSQL 改变业务模式 拓展业务 Relational 业务运营 瓦解竞争对手 更快的提供数据 和现有系统集成 直接投资供应链 迎接移动互联的挑战 支持关键业务系统 利用新的模式 经济地横向扩展能力 保护现有投资 开拓新的分析 确保相关技能 Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | 7 采用新技术克服障碍 集成 把大数据加入到现 有的架构中比较复 杂 技能 缺乏工具和培训去 利用大数据 安全 没有明确的规章制 度驱动和强制 Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Confidential 8 采用新技术克服障碍 集成 技能 安全 SQL 工程化 一体机 运行在所有 数据上的SQL 针对所有数据 的数据库安全 Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Confidential 9 要是你能够… • 使所有的数据都可以被你Oracle 数据库应用轻而易举地访问 • 同时支持Oracle SQL查询语言的所有功能 • 可以利用到Oracle Database 12c的所有安全特性 • 不用在你的Hadoop集群和关系型数据库之间移动数据 • 并且拥有快速的查询性能 • 还可以用上你已经有数据库技能 • 同时还可以使用最新的Hadoop的创新功能 Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted 10 Oracle Big Data SQL – 一个新的架构 • 运行在Hadoop上的强大的高性能SQL – Hadoop上的完全实现Oracle SQL功能 – 在Hadoop节点上本地处理SQL查询 • Oracle数据库和Hadoop之间实现简化数据集成 • 单一的SQL入口访问所有的数据 – 数据的连接操作可以扩展到Hadoop和关系型数据之间 • 优化的硬件 – 使用高速的Infiniband网络连接 Exadata和Hadoop Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted 11 Oracle Big Data Management System ORACLE BIG DATA SQL DATA RESERVOIR DATA WAREHOUSE Cloudera Hadoop Oracle Database Oracle Database Oracle Big Data Connectors Oracle NoSQL Oracle R Advanced Analytics for Hadoop Oracle R Distribution Oracle Data Integrator Oracle Industry In-Memory, Multi-tenant Models Oracle Industry Models Oracle Advanced Analytics Oracle Advanced Analytics Oracle Spatial & Graph Oracle Spatial & Graph Big Data Appliance Apache Flume Oracle GoldenGate Oracle Data Oracle Oracle Event Integrator GoldenGate Processing SOURC ES Oracle Event Processing Exadata Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | 我们如何做到这一点 1. 统一的元数据,查询语言和安全性 2. 优化性能: 在Oracle 大数据机上实现智能扫描 Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | 13 统一元数据,查询语言和安全性 Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | 14 存在Hadoop中的数据 Example: Files with JSON data Hadoop/NoSQL Ecosystem {"custId":1185972,"movieId":null,"genreId":null,"time":"2012-07-01:00:00:07","recommended":null,"activity":8} {"custId":1354924,"movieId":1948,"genreId":9,"time":"2012-07-01:00:00:22","recommended":"N","activity":7} {"custId":1083711,"movieId":null,"genreId":null,"time":"2012-07-01:00:00:26","recommended":null,"activity":9} {"custId":1234182,"movieId":11547,"genreId":44,"time":"2012-07-01:00:00:32","recommended":"Y","activity":7} {"custId":1010220,"movieId":11547,"genreId":44,"time":"2012-07-01:00:00:42","recommended":"Y","activity":6} {"custId":1143971,"movieId":null,"genreId":null,"time":"2012-07-01:00:00:43","recommended":null,"activity":8} {"custId":1253676,"movieId":null,"genreId":null,"time":"2012-07-01:00:00:50","recommended":null,"activity":9} {"custId":1351777,"movieId":608,"genreId":6,"time":"2012-07-01:00:01:03","recommended":"N","activity":7} {"custId":1143971,"movieId":null,"genreId":null,"time":"2012-07-01:00:01:07","recommended":null,"activity":9} {"custId":1363545,"movieId":27205,"genreId":9,"time":"2012-07-01:00:01:18","recommended":"Y","activity":7} {"custId":1067283,"movieId":1124,"genreId":9,"time":"2012-07-01:00:01:26","recommended":"Y","activity":7} {"custId":1126174,"movieId":16309,"genreId":9,"time":"2012-07-01:00:01:35","recommended":"N","activity":7} {"custId":1234182,"movieId":11547,"genreId":44,"time":"2012-07-01:00:01:39","recommended":"Y","activity":7}} {"custId":1346299,"movieId":424,"genreId":1,"time":"2012-07-01:00:05:02","recommended":"Y","activity":4} Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted 15 Hive • 提供类SQL的界面访问存储在HDFS中的数据 • 允许应用处理任意格式存储的数据 • 表可以捕获所需要的元数据去定位和解析数据 • SQL查询可以生成一个MapReduce 任务去处理数据 • Big Data SQL可以使用Hive的元数据简化管理但不是 必须的 • Big Data SQL 不使用Hive来执行查询 Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted 16 Hive Metastore SQL Execution Engines Share Metadata Oracle Big Data SQL Shark Hive Hive Metastore Impala … Table Definitions: movieapp_log_json movielog movieapp_log_avro Metastore maps DDL to Java access classes Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted 19 统一元数据: 发布Hive Metadata 到Oracle Catalog中 Oracle Catalog Hive Metastore Hive metadata CREATE TABLE movieapp_log_json (click VARCHAR2(4000)) ORGANIZATION EXTERNAL (TYPE ORACLE_HIVE DEFAULT DIRECTORY DEFAULT_DIR ) REJECT LIMIT UNLIMITED; External Table External Table Exadata Big Data Appliance + Oracle Database + Hadoop/NoSQL Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | 20 Automation: 自动化:Oracle 数据模型 把Hive的定义导入到模型中 为导入的表自动生成Oracle DDL Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted 21 成为Oracle外部表新的数据源 CREATE TABLE movielog (click VARCHAR2(4000)) ORGANIZATION EXTERNAL ( TYPE ORACLE_HIVE DEFAULT DIRECTORY Dir1 ACCESS PARAMETERS ( com.oracle.bigdata.tablename logs com.oracle.bigdata.cluster mycluster) ) REJECT LIMIT UNLIMITED • 新的特性集 – ORACLE_HIVE 和ORACLE_HDFS 访问驱动 – 标识一个Hadoop集群,数据源,字段映射,错误 处理,溢出处理,日志 • 在查询执行时新表的metadata 从Oracle DDL传 递到Hadoop reader中 • 架构的可延伸性 – StorageHandler 能够在以后支持其它的数据源 – Examples: MongoDB, Hbase, Oracle NoSQL DB Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | 22 在所有数据上使用Oracle SQL丰富的特有功能 Snapshot of Oracle SQL Analytic Functions • Ranking functions • – rank, dense_rank, cume_dist, percent_rank, ntile • • • • Statistical Aggregates • Linear regression – Fitting of an ordinary-least-squares regression line to a set of number pairs. Hypothesis Testing – Student t-test , F-test, Binomial test, Wilcoxon Signed Ranks test, Chi-square, Mann Whitney test, Kolmogorov-Smirnov test, One-way ANOVA – Correlation, linear regression family, covariance • Cross Tabs – Enhanced with % statistics: chi squared, phi coefficient, Cramer's V, contingency coefficient, Cohen's kappa Reporting Aggregate functions – Sum, avg, min, max, variance, stddev, count, ratio_to_report • Correlations – Pearson’s correlation coefficients, Spearman's and Kendall's (both nonparametric). LAG/LEAD functions – Direct inter-row reference using offsets • – DBMS_STAT_FUNCS: summarizes numerical columns of a table and returns count, min, max, range, mean, stats_mode, variance, standard deviation, median, quantile values, +/- n sigma values, top/bottom 5 values Window Aggregate functions (moving and cumulative) – Avg, sum, min, max, count, variance, stddev, first_value, last_value Descriptive Statistics • Distribution Fitting – Kolmogorov-Smirnov Test, Anderson-Darling Test, Chi-Squared Test, Normal, Uniform, Weibull, Exponential – Frequently combined with the COVAR_POP, COVAR_SAMP, and CORR functions Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | 管理所有的数据 Oracle Big Data Appliance Oracle Database 12c SQL Store JSON data unconverted in Hadoop 在Hadoop上应用高级的安全管理 − Masking/Redaction − Virtual Private Database − Fine-grained Access Control Store business-critical data in Oracle Data analyzed via SQL DBMS_REDACT.ADD_POLICY( object_schema => 'hr', object_name => 'employee', column_name => 'social_sec_num', policy_name => 'redact_ssn', function_type => DBMS_REDACT.FULL, expression => '1=1' ); Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | 25 用于快速查询处理的智能扫描 Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | 26 智能查询优化 一个查询可以横跨Oracle数据库,Hadoop和NoSQL Query Data in RDBMS, Hadoop & NoSQL 快速 海量并行 Oracle SQL 本地过滤 最小化的数据移动 OracleNoSQLDB BDS Server HDFS Data Node BDS Server OracleDatabase StorageServer OracleNoSQLDB BDS Server HDFS Data Node BDS Server OracleDatabase StorageServer Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Query Execution on Hadoop 在Hadoop上的查询执行 select last_name, state, movie, genre from movielog m, customer c where genre=‘comedy’ and c.custid = m.custid 1 • 数据定位 • 数据结构 • 并行度 2 3 1 2 3 HDFS NameNode HiveMetastore 查询编译器确定: HDFS Data Node BDS Server HDFS Data Node BDS Server 使用 Big Data SQL 服务器并行读: • 并行单位: PQ Slaves & InputSplit • 过滤行和项目列 处理过滤后的结果 • 移动相关数据到数据库 • 和数据库表连接 • 应用数据库安全策略 Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Big Data SQL 服务器使数据移动最小化 Big Data SQL Server 1. 2. 3. 1. 2. Hadoop Smart Scan Apply filter predicates Apply column projections Apply row-level functions • JSON Parsing External Table Services Read using Hadoop Classes Convert to Oracle Data Stream Data Node • 工作更靠近数据 – 从hadoop类中扫描和串行化 – 转换到Oracle的数据流中 • 智能扫描: 只发布相关数据 – 应用过滤条件 • Include complex predicates, e.g. JSON_EXISTS包括复 杂的查询条件,e.g. JSON_EXISTS • Bloom filters 用于更快的表连接操作 • 给数据挖掘模型打分 – 预期的字段 • 从嵌入式结构中获得预期的字段 Disk Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted 31 Oracle Big Data SQL One fast SQL query, on all your data. Oracle SQL on Hadoop and beyond • 使用和Exadata中相同的智能扫描技术 • 使用本地的SQL操作 • 使用Oracle数据库的安全特性和确定性 Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted 32 高级查询&分析 具有SQL和高级分析的完整功能 利用你所有的数据 Relational, Hadoop and NoSQL Secure 安全 统一治理所有的数据 最快的性能 利用跨平台的SQL处理 应用透明 不需要改变应用代码 Big Data Management System Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted 38
© Copyright 2024