The Tuning Effect: Lightning-Fast PeopleSoft EPM (Without the Exas)
Session #33261
March 10, 2014

Your Presenter
• Reggie Gentle
• Extensive BI experience (before BI was BI, circa 1999)
• Former BI architecture consultant for a major Oracle BI competitor
• Responsible for ensuring consistent, efficient, and accurate delivery of BI/DW information to the campus community
• Architect for the full BI/ETL/DW stack
  • New environment build-out/performance tuning
  • End-user empowerment
• Experience with data warehouses exceeding 25 TB in size, on both Oracle and Netezza, and tables in excess of 10 billion rows
• Oracle training/certifications include:
  • DBA Performance Tuning Expert
  • Oracle SQL Expert

Florida State University is a premier, comprehensive, graduate research university with both law and medical schools.
• Annual operating budget: $1.1B
• Over 41,000 students
• Over 13,000 employees
• Over 13,000 biweekly paychecks
• Over $18 million in biweekly payroll
• "Most Efficient" high-quality university two years in a row (2013/2014, US News)

FSU's BI Profile
• Deployed the BI/DW solution in 2008
• Solution meets the reporting needs of our major administrative and academic organizations on campus
• Consists of 225 structured dashboards and 86 subject areas
• Over 1,855 distinct users and 7+ million requests
• Dashboard consumption and self-service reporting
• Currently completing Phase V of our CS deployment (CSW Admissions implementation)
• 2009 Oracle Innovation Award recipient

The Vision for BI @ FSU
FSU BI Vision: To deliver world-class analytic and business intelligence solutions that empower and inform the user community.
BI Primary Objectives:
• Change the paradigm of reporting to one of enabling power users and analysts to analyze structured summary and detailed information in an easy-to-access format
• Establish a unified analytic information platform for Financial, Human Resource, and Student data based on OBIEE
• Maintain accurate, high-quality, timely, and trustworthy information within the BI solutions so that University business decisions can be made and actions taken

FSU and Oracle
• Implemented Financials 8.4, Portal 8.8, and EPM 8.8 in June 2004
• Implemented HR/Payroll 8.8 in Dec 2004
• Upgraded the EPM & Portal suites to 9.0 in Nov 2007
• Upgraded the HR suite to 9.0 in Oct 2008
• Upgraded the FI suite to 9.1 in Jan 2011
• Deployed EPM 9.0 & OBIEE 10.1.3.3 (Windows) in Mar 2008
• Upgraded to OBIEE 10.1.3.4 (Linux) in April 2009
• Upgraded the EPM suite from 9.0 to 9.1 in Feb 2012
• Upgraded DataStage from 7.5 to 8.1, including a platform migration from Windows to Linux, in Feb 2012
• Upgraded OBIEE 10.x to 11.1.1.6 in July 2013
• Currently upgrading to IBM DataStage 8.5
• Q2/Q3 upgrades planned: OBIEE 11.1.1.8.x (if available) and a migration from AIX to Red Hat Enterprise Linux (RHEL) for the data warehouse

The Challenge...
Financials Nightly Load
• Must complete in under 6 hours each night to meet the Service Level Objective (SLO)
• Data must be accurate
• Data must reconcile to the penny against the transactional system
• Must be done without the use of any Exas
• Must be done with minimal to no human interaction
OBIEE Dashboards
• All university-approved dashboards must display data in 15 seconds or less (excluding real-time dashboards)
• Data must be timely and accurate
• Ad-hoc analyses (Answers) must be speedy, providing similar response times when building new documents
• Must be done without the use of any Exas
The Exas (The Competition)

What Makes Them So Fast?
• Flash Cache – generally comprised of internal SSDs; stores "hot" (frequently accessed) data on flash
• Storage Indexes – track min/max values for up to 8 columns per storage region. Queries consult the storage index first to determine which regions must be read to satisfy the query
• Hybrid Columnar Compression (HCC) – Oracle compresses the data, leaving a much smaller amount of data to scan during query execution
• Figures based on a quarter-rack X3-2: 16 DB cores, 36 storage (SQL offload) cores, 512 GB RAM, 2.4 TB flash cache, 7 TB SAS disks

What Makes Them So Fast?
In summary, the biggest key point from the previous slide: smaller data coupled with proper, equally small indexes = faster results.

The Home Team

Current Architecture

Data Warehouse Size
• ~1.5 TB on disk (excluding undo, archive, etc.)
• 1,920,996,115 rows spread across 902 tables (PSFT schema only; excludes EPM system tables)
• B-tree indexes (excluding unique) – 936 total, 2,220,448,600 rows
• Bitmap indexes – 472 total, 183,819,844 rows

Data Warehouse Size (cont.)
• The two largest tables:
  • PS_KK_ACTIVITY_LOG – 88,337,954 rows (growing ~1 million/month)
  • PS_HR_ACCTG_LINE – 85,497,468 rows (growing ~1 million/month)
• Largest fact table: PS_FSU_F_LEDKK_SUM – 50,820,071 rows (fact for the KK ledger)
• Largest dimension table: PS_D_JRNL_SOURCE – 38,271,118 rows
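Counts like these can be pulled straight from the data dictionary. A minimal sketch, assuming the warehouse objects live under an owner named PSFT (substitute your own schema) and that optimizer statistics are reasonably fresh, since NUM_ROWS reflects the last stats gather:

-- Table count and total rows for the warehouse schema
SELECT COUNT(*) AS table_count, SUM(num_rows) AS total_rows
FROM   dba_tables
WHERE  owner = 'PSFT';

-- Non-unique index counts and row totals, split by type (NORMAL = b-tree)
SELECT index_type, COUNT(*) AS index_count, SUM(num_rows) AS total_rows
FROM   dba_indexes
WHERE  owner = 'PSFT'
AND    uniqueness = 'NONUNIQUE'
GROUP  BY index_type;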
Is this speed achievable without the Exas?

First Things First!
• One must first understand WHAT goals are being sought.
• EPM ETL loads are NOT data-intensive loads, contrary to popular belief.
• Generally speaking, a faster DW does not equate to faster load times during the ETL window.

The E-T-L
Let's take a look at what the delivered ETLs actually do:
• Extract – siphon off new data from the respective transactional systems
• Transform – transform the data into a usable format geared toward the warehouse
• Load – load the newly enriched data into the target warehouse environment

The E-T-L Breakdown
Assuming a well-tuned environment, an average breakdown of EPM load time by stage:
• Extract – 15%
• Transform – 70%
• Load – 15%
Hmmm... that's interesting!

The Logic Test
If 70% of the ETL load happens during the transform stage, and the transform stage happens on the DataStage host, why would we expect to see 100x improvements by implementing Exadata for the data warehouse environment?

DataStage Job Tuning

Efficient Loading: Ledger KK
• Load times were generally inconsistent, with a tendency to time out (mutex timeouts)
• Heavily dependent on disk I/O speed due to heavy hashed-file usage
• Delivered with one hashed file for all prior-year data; once a hashed file exceeds 2 GB it must be converted to a 64-bit hashed file
• Sees volumes of 40k-100k rows nightly

Ledger KK (delivered)

Ledger KK (current)

Ledger KK cont.
• Avg. ~3 min run time nightly
• Volume of 40k-100k rows nightly
• Stage statistics:
46498 rows read from Trans_Sid_Lkps1_out
2008 rows read from HASH_PS_FSU_F_LEDGER_KK_2014_Exists
5620 rows read from HASH_PS_FSU_F_LEDGER_KK_2013_Exists
6445 rows read from HASH_PS_FSU_F_LEDGER_KK_2012_Exists
9199 rows read from HASH_PS_FSU_F_LEDGER_KK_2011_Exists
10171 rows read from HASH_PS_FSU_F_LEDGER_KK_2010_Exists
7213 rows read from HASH_PS_FSU_F_LEDGER_KK_2009_Exists
2902 rows read from HASH_PS_FSU_F_LEDGER_KK_2008_Exists
483 rows read from HASH_PS_FSU_F_LEDGER_KK_2007_Exists
156 rows read from HASH_PS_FSU_F_LEDGER_KK_2006_Exists
62 rows read from HASH_PS_FSU_F_LEDGER_KK_2005_Exists
2239 rows written to Trans_Assign_values_out_insert
44259 rows written to Trans_Assign_values_out_update
46498 rows written to UPDATE_PREV_ROWS_BATCH_SID
0 rows written to Trans_Load_Error_in
11.740 CPU seconds used, 13.000 seconds elapsed.

Ledger KK cont.
• Insert/update job
  • The insert takes ~12 seconds to run, with average volumes of 2k-50k rows
  • The update takes ~3 minutes to run, with average volumes of 10k-100k rows
44259 rows read from Trans_Assign_values_out
44259 rows written to Trans_Assign_values_out
3.200 CPU seconds used, 193.000 seconds elapsed.
• Total job (transform + load) run time: ~7-10 minutes for Ledger KK

Ledger KK (future)

Ledger KK (future)
• Each XFM will run as an individual process, allowing for "poor man's" parallel processing
• Reduces the memory footprint of each XFM process
• Data flows through the stream faster, reducing timeout issues
• Also removing all IPCs and using sequential files to buffer data

Database Tuning

Financials Milestones

Database Storage
• 32k-block-size tablespaces
• Tables with 1-5 million rows share a 32k tablespace
• Tables with more than 5 million rows get a dedicated tablespace, with a corresponding index tablespace for their indexes
• All surrogate ID (SID) columns get a single bitmap index (with more than 26 SID columns, determine the best candidates by query patterns)
• Housekeeping fields (update dttm, create dttm, data origin) on OWS tables get bitmap indexes (see the appendix for gotchas with bitmaps and deadlocking)
• Partition OWS tables based on the ETL tool's load needs, and MDW tables based on the BI tool's query needs
• SQL Profiles for query-path optimization
• Frequently review Tuning Advisor recommendations and implement profiles/recommendations whenever warranted (see the OBIEE trick in the appendix that allows OBIEE to utilize SQL profiles)
• Partitioned tables get dedicated tablespaces for the table and its indexes

Partitioning
• Take the approach of reducing query run times based on need; i.e., OWS tables are partitioned differently than MDW tables would be
• The WRH$/ASH tables are your best friend when determining frequently used queries
• Partition similar tables on the same key; e.g., header/line/source tables should share a partition key
• Recommended: compile all queries against a given table in a spreadsheet and identify the items commonly used in WHERE clauses
• Use local bitmap indexes on partitions when needed
• Goal: ~100k-250k rows per partition
• Over a million partitions per table are supported as of Oracle 11.2.0.4
• Interval partitions are auto-managed, so the old DBA management nightmare is no longer a reason to have fewer partitions with more data

Do we need medical assistance for any of the DBAs in the room?

Partitioning
• Yes, you heard me right: 100k-250k rows per partition.
• Oracle 11g R2 supports over 1 million partitions per table, so why not utilize what you've been given?
• If interval partitioning with auto-range is used, new partitions are added automatically, without any DBA involvement at all (see the sketch below).
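To make the interval idea concrete, here is a minimal sketch modeled on PS_KK_ACTIVITY_LOG. The _PART table name, seed boundary, index name, and monthly interval are illustrative assumptions, not delivered EPM DDL; in practice you would swap the rebuilt table in via a rename or DBMS_REDEFINITION.

-- Rebuild the table range-partitioned on KK_TRAN_DT; Oracle then creates
-- one new partition per month automatically as data arrives.
CREATE TABLE ps_kk_activity_log_part
PARTITION BY RANGE (kk_tran_dt)
INTERVAL (NUMTOYMINTERVAL(1, 'MONTH'))
( PARTITION p_seed VALUES LESS THAN (DATE '2005-01-01') )  -- only the seed partition is declared
AS SELECT * FROM ps_kk_activity_log;

-- Local bitmap index: one small bitmap segment per partition, so a query
-- pruned to one month scans a correspondingly tiny index. (Bitmap indexes
-- on partitioned tables must be LOCAL.)
CREATE BITMAP INDEX psb_kk_activity_log_fy
  ON ps_kk_activity_log_part (fiscal_year) LOCAL;

At ~1 million new rows per month on this table, a monthly interval overshoots the 100k-250k per-partition goal; something like NUMTODSINTERVAL(7, 'DAY') would land closer.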
Partitioning
First up: KK Activity Log
• ~88 million rows
• Heavily utilized for selecting "changed data" after OWS loads have pushed new rows into the table
• May be selected against multiple times throughout the load window
• Used heavily for reconciliation

Partitioning

TABLE_NAME          TABLESPACE_NAME       PART?  NUM_ROWS    CANDIDATE?  EXECS (LAST 90 DAYS)  AVG EXEC SECS  TOTAL EXEC SECS  POTENTIAL PART KEY  INDEXED FIELDS (assume a separate index per field)
PS_KK_ACTIVITY_LOG  FSU_OWS_KKACTLOG_32K  NO     88,337,954  Yes         2022                  379            764,013.43       KK_TRAN_DT          KK_TRAN_ID, KK_TRAN_LN, DEPT_ID, FISCAL_YEAR, ACCOUNTING_PERIOD, FUND_CODE, LASTUPD_EW_DTTM
PS_KK_SOURCE_LN     FSU_KKSRCLN_32K       NO     27,977,243  Yes         635                   832            121,540.00       KK_TRAN_DT          KK_TRAN_ID, KK_TRAN_LN, DATA_ORIGIN, SRC_SYS_ID, LASTUPD_EW_DTTM
PS_KK_SOURCE_HDR    FSU_KKSRCHDR_32K      NO     12,444,537  Yes         2274                  335            760,836.22       KK_TRAN_DT          KK_TRAN_ID, KK_DATETIME_STAMP, LASTUPD_EW_DTTM
PS_HR_ACCTG_LINE    FSU_OWS_HR_ACCTG_32K  NO     85,497,468  Yes         440                   343            162,043.00       RUN_DT              SEQNUM, KK_AMOUNT_TYPE, LASTUPD_EW_DTTM

Partitioning Cont.
PS_KK_ACTIVITY_LOG – top statements over the last 90 days, with the columns each uses in its predicates (Y); predicate usage of DEPT_ID, FISCAL_YEAR, ACCOUNTING_PERIOD, and FUND_CODE was tracked as well:

SQL_ID         EXECS (90 DY)  AVG ELAPSED SECS  TOTAL SECS   KK_TRAN_DT  KK_TRAN_ID  KK_TRAN_LN
bx68tnvc9uyfc  771            117.0859883       90273.297    Y           Y           Y
as745pchbankc  475            160.1717642       76081.588    Y           Y           Y
18vdbyt0hpq04  439            405.3001207       177926.753
gdf92xbmb84ua  110            98.41661818       10825.828    Y           Y
gbdrdma44d0s9  68             2749.091735       186938.238   Y           Y
20n0bndtk7bbc  66             2880.288136       190099.017   Y           Y
ctr1twfqc8t20  56             559.9732321       31358.501
2xzs117sd358s  17             1.766             30.022       Y
35sxczd4au591  16             5.6280625         90.049       Y
83czfyy7aa1mv  2              195.07            390.14

Initial Thoughts
• Partition key: KK_TRAN_DT
• Indexes: KK_TRAN_ID, KK_TRAN_LN, DEPT_ID, FISCAL_YEAR, ACCOUNTING_PERIOD, FUND_CODE

Partitioning
-- Pull the ASH samples for statements that touch PS_KK_ACTIVITY_LOG
select distinct sh.snap_id,
       h.sql_id,
       dbms_lob.substr(sql_text, 100, 1) sqlbit,
       dbms_lob.substr(sql_text, 100, dbms_lob.getlength(sql_text) - 100) namech,
       u.username,
       sh.sql_plan_hash_value,
       sh.sample_time,
       sh.session_id
from   sys.wrh$_active_session_history sh,
       dba_hist_sqltext h,
       dba_users u,
       dba_hist_ash_snapshot dhas
where  dbms_lob.instr(h.sql_text, 'PS_KK_ACTIVITY_LOG', 1, 1) > 1
and    h.sql_id = sh.sql_id
and    u.user_id = sh.user_id
--and  h.sql_id = <the sql_id you'd like to analyze further>
and    dhas.snap_id = sh.snap_id
and    u.username in (<the user ID your ETLs run under>)
and    sh.machine = <your university ETL server name>
and    dhas.begin_interval_time between to_date('06-01-2013:00', 'MM-DD-YYYY:HH24')
                                    and to_date('07-15-2013:23', 'MM-DD-YYYY:HH24')
order  by sample_time desc

Partitioning

OBIEE Tuning

OBIEE Tuning
• Store the web catalog on SSD/high-speed storage
• Store user/query cache files on a RAM drive
• Turn off the query identifier to reduce hard parsing and plan flopping
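The SQL profiles mentioned under Database Storage come out of exactly this kind of analysis. A minimal sketch of the Tuning Advisor/profile workflow (the Oracle Tuning Pack license is required); the SQL_ID is taken from the table above, and the task and profile names are illustrative:

DECLARE
  l_task VARCHAR2(64);
BEGIN
  -- Create and run a tuning task against a captured SQL_ID
  l_task := DBMS_SQLTUNE.CREATE_TUNING_TASK(
              sql_id    => 'bx68tnvc9uyfc',
              task_name => 'tune_kk_activity_log');
  DBMS_SQLTUNE.EXECUTE_TUNING_TASK(task_name => l_task);
END;
/

-- Review the advisor's findings
SELECT DBMS_SQLTUNE.REPORT_TUNING_TASK('tune_kk_activity_log') FROM dual;

-- If a profile is recommended, accept it so the optimizer starts using it
BEGIN
  DBMS_SQLTUNE.ACCEPT_SQL_PROFILE(
    task_name => 'tune_kk_activity_log',
    name      => 'profile_kk_activity_log');
END;
/

A profile only matches future executions whose SQL text normalizes to the same signature, which is why the OBIEE query-identifier trick in the appendix matters.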
The Future?

Future Architecture

Questions?

Contacts
• Reggie Gentle
• BI/EDW Architect
• Reporting and Information Services
• Florida State University
• E-mail: [email protected]

This presentation and all Alliance 2014 presentations are available for download from the conference site at www.alliance-conference.com. Presentations from previous meetings are also available.

Appendix – DW Config
• Heavy usage of profiles – around 150+ SQL profiles are used to influence the optimizer
• Profiles may differ between environments, but are generally migratable
• Memory settings:
  • SGA target set to 0 (automatically managed)
  • PGA target set to 0 (automatically managed)
• Auto parallel is disabled, but manual parallel statement hinting is allowed and is utilized for large table scans where index usage is not possible; also used for rebuilding bitmap indexes and materialized views

Appendix – Partitioning SQL
SELECT sql_id, COUNT(*), AVG(seconds_diff), sqlbit
FROM (
  SELECT session_id,
         sql_plan_hash_value,
         sql_id,
         MIN(sample_time) start_time,
         -- Elapsed time per (session, plan, statement) as seconds
         ((EXTRACT(HOUR   FROM (MAX(sample_time) - MIN(sample_time))) * 60 * 60)
        + (EXTRACT(MINUTE FROM (MAX(sample_time) - MIN(sample_time))) * 60)
        +  EXTRACT(SECOND FROM (MAX(sample_time) - MIN(sample_time)))) seconds_diff,
         (MAX(sample_time) - MIN(sample_time)) time_diff,
         sqlbit
  FROM (
    SELECT DISTINCT sh.snap_id,
           h.sql_id,
           DBMS_LOB.substr(sql_text, 100, 1) sqlbit,
           DBMS_LOB.substr(sql_text, 100, DBMS_LOB.getlength(sql_text) - 100) namech,
           u.username,
           sh.sql_plan_hash_value,
           sh.sample_time,
           sh.session_id
    FROM   sys.wrh$_active_session_history sh,
           sys.dba_hist_sqltext h,
           dba_users u,
           dba_hist_ash_snapshot dhas
    WHERE  sh.snap_id BETWEEN 1 AND 100
    AND    DBMS_LOB.instr(h.sql_text, 'PS_KK_ACTIVITY_LOG', 1, 1) > 0
    AND    h.sql_id = sh.sql_id
    AND    u.user_id = sh.user_id
    AND    dhas.snap_id = sh.snap_id
  ) a
  GROUP BY session_id, sql_plan_hash_value, sqlbit, sql_id
  HAVING (MAX(sample_time) - MIN(sample_time)) < INTERVAL '0 06:00:00.49' DAY TO SECOND
  ORDER BY MIN(sample_time) DESC, session_id DESC
)
GROUP BY sql_id, sqlbit

Appendix – Index Rebuild SPROC
-- Mark the table's bitmap indexes unusable before the load
SET SERVEROUTPUT ON SIZE 1000000
BEGIN
  FOR t IN (SELECT index_name
            FROM   all_indexes
            WHERE  table_name IN ('PS_F_CLASS_ENRLMT')
            AND    index_type = 'BITMAP')
  LOOP
    -- Loop through the list of indexes and mark each bitmap unusable
    EXECUTE IMMEDIATE 'ALTER INDEX ' || t.index_name || ' UNUSABLE';
    dbms_output.put_line(t.index_name);
  END LOOP;
END;
/

-- Rebuild the previously unusable indexes after the load
SET SERVEROUTPUT ON SIZE 1000000
BEGIN
  FOR t IN (SELECT index_name
            FROM   all_indexes
            WHERE  table_name IN ('PS_F_CLASS_ENRLMT')
            AND    index_type = 'BITMAP')
  LOOP
    -- Loop through the list of indexes and rebuild those which are bitmaps
    EXECUTE IMMEDIATE 'ALTER INDEX ' || t.index_name || ' REBUILD';
    dbms_output.put_line(t.index_name);
  END LOOP;
END;
/

Appendix – OBIEE SQL Profile Usage
• Config setting to remove the per-request identifier from the SQL sent to the database (also described in Oracle bug 14722496): the variable DISABLE_SQL_HASH is set in opmn.xml for the OBIS process.

<ias-component id="coreapplication_obis1" inherit-environment="true">
  <environment>
    <variable id="DISABLE_SQL_HASH" value="1"/>
    <!-- remaining environment entries unchanged -->
  </environment>
</ias-component>
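For reference, the manual parallel hinting described in the DW Config appendix looks like the following. This is a sketch only; the table, index, column, and degree of parallelism are illustrative assumptions, not FSU's actual objects or settings.

-- Manual parallel hint for a large full scan where no index can help
SELECT /*+ FULL(f) PARALLEL(f 8) */ fiscal_year, SUM(posted_total_amt)
FROM   ps_fsu_f_ledkk_sum f
GROUP  BY fiscal_year;

-- Parallel rebuild of a bitmap index, then reset the index's degree so
-- ordinary queries are not parallelized unintentionally
ALTER INDEX psb_ledkk_sum_fy REBUILD PARALLEL 8;
ALTER INDEX psb_ledkk_sum_fy NOPARALLEL;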