Oracle Exadata v2 Fast Track Not x2 Hüsnü Şensoy

Global Maksimum Data & Information Technologies
Global Maksimum Data & Information Tech
Founder, VLDB Expert
Agenda
Why do we need Exadata v2?
Exadata Hardware
Exadata Software
Better to show than to tell.
Conclusion
Who am I?
• Data & Information expert on VLDB environments:
  - DWH
  - Data Mining
  - Inference Systems
  - Data Archiving Solutions
  - Niche Storage Technologies
  - Recovery Strategies & Solutions
  - HA Systems
• Oracle ACE Director (ACED) in the BI field
  - The only one in Turkey
  - Still the youngest in the whole community
• DBA of the Year 2009
  - 7th, and still the youngest worldwide
  - The only one in Turkey
• Member of the Oracle CAB for 12g DWH development
• Presenter at Oracle conferences and user group events worldwide
• Before the end of the year:
  - HrOUG, in two weeks:
    - Optimized Analytical Processing Capabilities of 11g Release 2
    - Database Consolidation Best Practices
    - ACED session with Jose Senegačnik and Denes Kubiček
  - UKOUG, in December:
    - Optimized Analytical Processing Capabilities of 11g Release 2
Global Maksimum Data & Information Technologies and Oracle Exadata v2
• The only company in Turkey with InfiniBand-interconnected RAC 11g implementation experience on Linux x86-64
• The only company in Turkey with substantial Exadata v2 consultancy experience (more than 120 TB of conventional-system data), covering:
  - Physical & Architecture Design
  - Migration
  - Performance Optimization
  - Backup & Recovery Architecture Design
• Trains customers, Oracle partners, and Oracle employees across Europe
• Strong relationships with Oracle Platinum Partners, the Oracle Development Team head office, and InfiniBand technology leaders
• X-Migrator service provider for high-capacity customers
Oracle Exadata v2
Don't think of Exadata as yet another product sold by the SALES guys.
• As a customer, see it as an effortless solution to hardware/software integration.
• As an engineer, see it as an elegant solution to the so-called unsolvable I/O problem of Oracle databases.
Who needs Exadata v2?
• Engineers
  - To learn that «the mechanic with a hammer thinks every problem is a nail»
• Customers
  - Shorter setup time
• Non-Exadata customers
  - More stable Oracle releases
• Oracle
  - Easier to manage and standardize its code repository
Oracle Exadata v2
Hardware
The best thing about Exadata is that there is nothing magical about its hardware:
• A few Sun Fire X4170 x86-64 servers.
• A few Sun Fire X4275 x86-64 servers.
• A few InfiniBand (IB) switches.
Exadata v2 X-Ray
Components of the 42U rack:
• Sun Datacenter 36-port Managed QDR IB Switches
• Exadata Storage Servers
• Sun Fire™ X4170 Oracle Database Servers
• KVM IP Console Switch
• Rackmount KMM keyboard with TFT monitor
• 48-port Gigabit Ethernet Switch
Interconnect Network Hardware
• IB switches
  - 3 × 36-port managed switches (2 + 1), as opposed to Exadata v1:
    - 2 “leaf” switches
    - 1 “spine” switch
  - The spine switch is only available with the Full Rack, because its purpose is to connect multiple full racks side by side.
  - A subnet manager running on one switch discovers the network topology.
• HCA
  - Each node (RAC node and storage cell) has a PCIe x8, 40 Gbit/s HCA with two ports
  - Active-standby intra-card bonding
RAC Node
• Sun Fire X4170 server
  - 2 sockets
  - Quad-core
  - 2.53 GHz
  - 2 hyper-threads per core
  - So, CPU_COUNT = 16
• 18 DDR3 DIMM slots
  - 72 GB @ 800 MHz (2x3x3x4 GB)
• 4 10/100/1000Base-T Ethernet ports
  - NET0: Management
  - NET1: Public network
  - NET2: Public network
  - NET3: (unassigned)
• PCIe PES24T6G2 switch
  - x8
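The CPU_COUNT and memory figures on this slide are plain multiplication. A minimal Python sketch, assuming CPU_COUNT equals sockets × cores per socket × hardware threads per core, and reading "2x3x3x4 GB" as sockets × memory channels per socket × DIMMs per channel × DIMM size (our interpretation of the slide's factors); the function names are hypothetical.

```python
def cpu_count(sockets: int, cores_per_socket: int, threads_per_core: int) -> int:
    """Logical CPUs seen by the OS, and hence by Oracle's CPU_COUNT (assumption)."""
    return sockets * cores_per_socket * threads_per_core


def memory_gb(sockets: int, channels_per_socket: int,
              dimms_per_channel: int, dimm_gb: int) -> int:
    """Installed memory, assuming every slot in the pattern is populated."""
    return sockets * channels_per_socket * dimms_per_channel * dimm_gb


# Sun Fire X4170 database node, factors taken from the slide
print(cpu_count(2, 4, 2))     # 16
print(memory_gb(2, 3, 3, 4))  # 72
```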
Storage Node
• Sun Fire X4275 server
  - 2 sockets
  - Quad-core
  - 2.53 GHz
• 6 DDR3 DIMM slots
  - 24 GB @ 1066 MHz (2x3x1x4 GB)
• HDD storage, either:
  - 12 × 3.5-inch 600 GB 15K RPM SAS disks, or
  - 12 × 3.5-inch 2 TB 7.2K RPM SATA disks
• 4 Sun Flash Accelerator F20 PCIe cards
Storage Node Software
• CELLSRV
  - Multithreaded block server
  - Buffer cache reads
  - Smart Scans
  - Performs I/O Resource Management (IORM)
  - Gathers operational statistics
  - Communicates with its clients over iDB
• MS (Management Server)
  - OC4J application
  - Provides functionality for:
    - Cell management
    - Cell administration
    - Alert generation
• RS (Restart Server)
  - The first process to come up in a storage cell
  - Works as a hang analyzer for CELLSRV and MS
HDD Sequential Read Performance
[Chart: 600 GB 15K RPM SAS vs. 2 TB 7.2K RPM SATA; plotted values 204 MB/s, 122 MB/s, 144 MB/s and 90 MB/s]
HDD Random Read Performance
[Chart: 600 GB 15K RPM SAS vs. 2 TB 7.2K RPM SATA; plotted values 175, 380, 79 and 182 IOPS @ 2 KB]
F20 PCIe Card
• Not a SATA/SAS SSD drive but an x8 PCIe device that provides a SATA/SAS interface
• 4 solid-state Flash Disk Modules (FMods), 24 GB each
• 256 MB cache
• A SuperCap power reserve (Energy Storage Module, ESM) enables write-back operation
  - ESM should be enabled for optimal write performance
  - It should be replaced every two years
  - It can be monitored with various tools such as ILOM
• The embedded SAS/SATA configuration exposes 16 (4 cards × 4 FMods) Linux devices per cell
  - e.g. /dev/sdn
• FMods use a 4K sector boundary
• Each FMod consists of several NAND modules, so the best performance is reached with heavy multithreading (e.g. 32+ threads per FMod)
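The device arithmetic and the two performance hints on this slide (4K alignment, many concurrent I/Os) can be sketched in Python. This assumes only the slide's figures (4 cards per cell, 4 FMods per card, 24 GB per FMod); the helper names are hypothetical, and `parallel_read` is a generic concurrent `os.pread` pattern (POSIX only) on a regular file, not an Exadata tool and not a benchmark, since a regular file is usually served from the page cache.

```python
import os
from concurrent.futures import ThreadPoolExecutor

SECTOR = 4096          # FMods use a 4K sector boundary
CARDS_PER_CELL = 4     # Sun Flash Accelerator F20 cards per storage cell
FMODS_PER_CARD = 4
FMOD_GB = 24


def flash_devices_per_cell() -> int:
    """Each FMod is exposed as its own Linux block device."""
    return CARDS_PER_CELL * FMODS_PER_CARD


def flash_gb_per_cell() -> int:
    return flash_devices_per_cell() * FMOD_GB


def is_4k_aligned(offset: int, length: int) -> bool:
    """True if an I/O starts and ends on a 4K sector boundary."""
    return offset % SECTOR == 0 and length % SECTOR == 0


def parallel_read(path: str, threads: int = 32, block: int = SECTOR) -> int:
    """Read a file with many concurrent, 4K-aligned pread() calls; return bytes read."""
    size = os.path.getsize(path)
    fd = os.open(path, os.O_RDONLY)
    try:
        with ThreadPoolExecutor(max_workers=threads) as pool:
            sizes = pool.map(lambda off: len(os.pread(fd, block, off)),
                             range(0, size, block))
            return sum(sizes)
    finally:
        os.close(fd)


print(flash_devices_per_cell())   # 16
print(flash_gb_per_cell())        # 384
print(is_4k_aligned(8192, 4096))  # True
print(is_4k_aligned(512, 4096))   # False
```

Keeping offsets and lengths on 4K multiples avoids I/Os that straddle two sectors, and the thread pool keeps several requests in flight at once, which is what lets the multiple NAND modules behind each FMod work in parallel.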
Performance of F20
F20 PCIe card (4 FMods):
• Sequential
  - Read: 1.1 GB/s
  - Max write: 567 MB/s (~145K IOPS @ 4K)
• Random @ 4K
  - Read: 101K IOPS
  - Write: 88K IOPS peak, 37K IOPS average
Random write performance degradation:
• As the flash cache gets full (sustained writes), write performance degrades because of write amplification caused by:
  - Wear leveling
  - The SLC update mechanism: delete (erase) + write
  - The garbage collector
• That's why you are advised not to put files with real-time performance demands on the flash cards, e.g.
  - Online redo logs
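The ~145K IOPS figure is the 567 MB/s sequential write rate divided by the 4K I/O size, taking 1 MB = 1024 KB. The same back-of-the-envelope style gives the write amplification factor, defined generically as bytes physically written to NAND divided by bytes written by the host. A minimal sketch; the function names are hypothetical and the write-amplification inputs in the last line are made-up numbers, not F20 measurements.

```python
def iops_from_throughput(mb_per_s: float, io_kb: float) -> float:
    """Convert MB/s into IOPS for a fixed I/O size (1 MB = 1024 KB)."""
    return mb_per_s * 1024 / io_kb


def write_amplification(nand_bytes_written: int, host_bytes_written: int) -> float:
    """A factor above 1 means the flash writes more than the host asked for."""
    return nand_bytes_written / host_bytes_written


print(round(iops_from_throughput(567, 4)))    # 145152, i.e. ~145K IOPS
print(write_amplification(3 * 2**30, 2**30))  # 3.0 (hypothetical inputs)
```

A write amplification factor of 3 would mean that only a third of the raw NAND write bandwidth reaches the host, which is why sustained random writes fall from the peak toward the average figure.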
Aggregate Capacity

Capacity               Quarter Rack   Half Rack   Full Rack
Raw HDD (SAS)          21 TB          50 TB       100 TB
Raw HDD (SATA)         72 TB          168 TB      336 TB
Raw Flash              1.1 TB         2.6 TB      5.3 TB
User Data (SAS)        6 TB           14 TB       28 TB
User Data (SATA)       21 TB          50 TB       100 TB

Performance            Quarter Rack   Half Rack   Full Rack
HDD Throughput (SAS)   4.5 GB/s       10.5 GB/s   21 GB/s
HDD Throughput (SATA)  2.5 GB/s       6 GB/s      12 GB/s
Flash Throughput       11 GB/s        25 GB/s     50 GB/s
Flash IOPS             225,000        500,000     1,000,000
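The raw capacity rows follow from the per-cell hardware. A minimal sketch, assuming the standard v2 configurations of 3, 7 and 14 storage cells for the quarter, half and full rack (these cell counts are not on the slide), 12 disks per cell, and 4 × 4 × 24 GB of flash per cell; sizes are in decimal GB. The results (21.6, 50.4, 100.8 TB SAS; 1.152, 2.688, 5.376 TB flash) match the table once truncated.

```python
CELLS_PER_RACK = {"Quarter": 3, "Half": 7, "Full": 14}  # assumption: standard v2 configs
DISKS_PER_CELL = 12
SAS_GB, SATA_GB = 600, 2000
FLASH_GB_PER_CELL = 4 * 4 * 24  # 4 F20 cards x 4 FMods x 24 GB


def raw_capacity_gb(cells: int) -> dict:
    """Raw (unmirrored) capacity of a rack with the given number of cells."""
    return {
        "Raw HDD SAS": cells * DISKS_PER_CELL * SAS_GB,
        "Raw HDD SATA": cells * DISKS_PER_CELL * SATA_GB,
        "Raw Flash": cells * FLASH_GB_PER_CELL,
    }


for rack, cells in CELLS_PER_RACK.items():
    caps = raw_capacity_gb(cells)
    print(rack, {name: f"{gb / 1000:.3f} TB" for name, gb in caps.items()})
```

User data capacity is much lower than raw capacity because ASM mirroring and reserved space are subtracted; that part is not modeled here.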
Oracle Exadata v2
Software
The Exadata hardware alone is almost enough to beat any other hardware configuration that can run Oracle Database.
But why stop there when you can do more with:
• Smart Scan
• Storage Indexes
• I/O Resource Manager
• EHCC (Exadata Hybrid Columnar Compression)
Soft Components of Exadata v2
• Open soft pieces
  1. Oracle Enterprise Linux 5.3
  2. Oracle-defined set of RPMs
  3. Oracle OFED (bug-fixed version)
• Oracle Exadata Storage Software
  - Smart Scan
  - Smart Flash Cache
  - HCC
  - Storage Index
  - I/O Resource Manager (IORM)
  - Oracle Exadata Bundle Patches
• Common soft pieces
  - Oracle RDBMS 11.2.0.1
    - Partitioning
    - Pruning
    - Parallel Hash Join
    - Bloom Filtering
    - Pairwise/Semi-pairwise Join
    - Compression
    - HCC
    - Encrypted Data
    - Data Mining
    - DBFS
  - Oracle Grid Infrastructure 11.2.0.1
    - ASM
    - Clusterware
    - Oracle Exadata Bundle Patches
  - iDB
Smart Scan
• Smart Scan was initially designed to do column and row filtering based on projections and predicates.
• But that was just the seed idea. Today Smart Scan can do:
  - Projection (column) filtering
  - Predicate (row) filtering
  - Offloaded SQL function evaluation; list the offloadable functions with:
    SELECT * FROM v$sqlfn_metadata WHERE offloadable = 'YES';
  - Preparation of Bloom filters for joins
  - Smart incremental backup
  - Scans on encrypted data
  - Smart file creation
    - RMAN restore
    - Tablespace creation
    - File growth
  - Scoring for Data Mining
    - All data mining scoring functions are offloaded
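To make projection and predicate filtering concrete, here is a toy Python model: a hypothetical `cell_smart_scan` function plays the storage cell and applies the WHERE predicate and the SELECT list before anything is shipped back, so only matching rows and requested columns would cross the interconnect. The table, its columns and the function are illustrative; this is not how CELLSRV is implemented.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


def cell_smart_scan(rows: Iterable[Row], columns: List[str],
                    predicate: Callable[[Row], bool]) -> List[Row]:
    """Predicate (row) filtering, then projection (column) filtering, on the 'cell' side."""
    return [{c: row[c] for c in columns} for row in rows if predicate(row)]


table = [
    {"A": 1, "B": 1, "C": "x", "D": "large payload"},
    {"A": 2, "B": 7, "C": "y", "D": "large payload"},
    {"A": 3, "B": 0, "C": "z", "D": "large payload"},
]

# Equivalent of: select A, C from table where B < 2
result = cell_smart_scan(table, ["A", "C"], lambda r: r["B"] < 2)
print(result)  # [{'A': 1, 'C': 'x'}, {'A': 3, 'C': 'z'}]
```

Without offloading, all three rows with every column (including the large `D` payload) would travel to the database node and be filtered there.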
Smart Scan OFF. Why?
• CELL_OFFLOAD_PROCESSING = FALSE
• The table or partition is small.
• The CBO doesn't choose to use direct path read.
• ROWDEPENDENCIES is enabled or ORA_ROWSCN is fetched.
• Rows are fetched in ROWID order.
• CREATE INDEX ... NOSORT
• LOB or LONG columns are fetched.
• Scan on a flashback table.
• Cell-based decryption is disabled.
• The tablespace is not completely on Exadata storage.
• More than 255 columns are queried.
• Predicate evaluation on a virtual column.
• Dirty blocks are encountered.
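The list above can be read as a triage checklist. The sketch below encodes a subset of those conditions as fields of a hypothetical `ScanRequest` record and reports which ones would turn Smart Scan off; the field names are invented for illustration, and this is a simplified mental model, not Oracle's actual decision logic.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScanRequest:
    cell_offload_processing: bool = True   # CELL_OFFLOAD_PROCESSING parameter
    direct_path_read: bool = True          # full scan done via direct path read
    segment_is_small: bool = False
    rowdependencies_or_rowscn: bool = False
    fetches_lob_or_long: bool = False
    tablespace_fully_on_exadata: bool = True
    columns_queried: int = 10


def smart_scan_blockers(req: ScanRequest) -> List[str]:
    """Return the slide's reasons that apply to this (hypothetical) scan."""
    reasons = []
    if not req.cell_offload_processing:
        reasons.append("CELL_OFFLOAD_PROCESSING = FALSE")
    if req.segment_is_small:
        reasons.append("table or partition is small")
    if not req.direct_path_read:
        reasons.append("no direct path read")
    if req.rowdependencies_or_rowscn:
        reasons.append("ROWDEPENDENCIES / ORA_ROWSCN")
    if req.fetches_lob_or_long:
        reasons.append("LOB or LONG fetch")
    if not req.tablespace_fully_on_exadata:
        reasons.append("tablespace not completely on Exadata")
    if req.columns_queried > 255:
        reasons.append("more than 255 columns queried")
    return reasons


print(smart_scan_blockers(ScanRequest()))                     # []
print(smart_scan_blockers(ScanRequest(columns_queried=300)))  # ['more than 255 columns queried']
```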
Storage Index
• Smart Scan is about saving RAC node CPUs during I/O processing, whereas the storage index is about saving the processors of the Exadata storage cells.
• Either way, if we accept that T = E + W (response time = execution time + wait time), decreasing E in any layer decreases T. This means faster queries, or more queries within the same period.
• The storage index was not first used in Exadata; the idea is borrowed from Netezza zone maps.
• Oracle's storage index (SI) is kept in memory.
• It filters the data down to a superset of the actual result set: regions that cannot contain matching rows are skipped, but the remaining rows still have to be checked.
First Execution
select A,B,C from T1 where B<2;
[Figure: CELLSRV performs a Smart Scan over the AUs; results travel via RDS over RDMA (RDSoRDMA)]
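Next Executions
select A,B,C from T1 where B<2;
[Figure: CELLSRV consults a storage index holding min/max values of column B per region (1/5, 3/10, 5/10, 9/10, 3/10, 2/10) before the Smart Scan over the AUs; results are returned over iDB]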
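A storage index behaves like a per-region min/max summary. The Python sketch below uses the B min/max pairs drawn on this slide (1/5, 3/10, 5/10, 9/10, 3/10, 2/10) with hypothetical helpers: for `B < 2`, only regions whose minimum is below 2 can hold matching rows, so every other region is skipped. The surviving regions are a superset of the result: they still have to be read and filtered, and may turn out to contain no match at all.

```python
from typing import List, Tuple

MinMax = Tuple[int, int]


def build_storage_index(regions: List[List[int]]) -> List[MinMax]:
    """Summarize each region by the min and max value of the column."""
    return [(min(r), max(r)) for r in regions]


def regions_to_scan_lt(index: List[MinMax], bound: int) -> List[int]:
    """Regions that may contain rows with value < bound (their min is below bound)."""
    return [i for i, (lo, _hi) in enumerate(index) if lo < bound]


# B column min/max per storage region, as drawn on the slide
storage_index_b = [(1, 5), (3, 10), (5, 10), (9, 10), (3, 10), (2, 10)]
print(regions_to_scan_lt(storage_index_b, 2))  # [0]
```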
More Storage Index Information
• A storage index may not have been built by CELLSRV yet.
• Storage regions are not created on all columns; CELLSRV picks out suitable columns to index.
• Column types must be suitable (byte-level comparison must match type-level comparison).
• NLS types are not allowed.
Tips
• Keep an eye on the "cell physical IO bytes saved by storage index" statistic in V$SYSSTAT or V$SESSTAT.
• Remember that to fully utilize storage indexes, data should be physically clustered on the most heavily queried column.
  - Think of it as: which column would you index if you could?
  - Then modify your ETL accordingly.
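Why physical clustering matters can be shown with the same min/max idea. In this sketch the same 100 values are loaded in two orders, interleaved and sorted, into hypothetical regions of 10 rows each, and we count how many regions a range predicate `0 <= value <= 9` cannot skip. The helper names and the data are illustrative only.

```python
from typing import List, Tuple


def storage_index(values: List[int], region_size: int) -> List[Tuple[int, int]]:
    """Min/max per fixed-size region, in physical (load) order."""
    return [(min(values[i:i + region_size]), max(values[i:i + region_size]))
            for i in range(0, len(values), region_size)]


def regions_read(index: List[Tuple[int, int]], lo: int, hi: int) -> int:
    """Regions that cannot be skipped for the predicate lo <= value <= hi."""
    return sum(1 for mn, mx in index if mx >= lo and mn <= hi)


# The same values 0..99, loaded interleaved (unclustered) or sorted (clustered)
interleaved = [(i % 10) * 10 + i // 10 for i in range(100)]
clustered = sorted(interleaved)

print(regions_read(storage_index(interleaved, 10), 0, 9))  # 10
print(regions_read(storage_index(clustered, 10), 0, 9))    # 1
```

With interleaved loading every region's min/max range covers the predicate, so nothing is skipped; loading the data sorted on the queried column lets the index skip 9 of the 10 regions.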