Configuring an optimal parallel query processing environment in ASE 15.0 using partitions

Configuring an optimal
parallel query processing
environment
in ASE 15.0 using partitions
Sudipto R. Chowdhuri
Sr. Staff Engineer, Query Optimizer Group,
Sybase Inc.
Agenda
 Partitions for performance
 Types of Parallelism in 15.0
 Basis of Parallel Querying
 Semantic partitioning
 How to chose your partitioning
 Case Study using TPC-H
Partitions for performance – why ?
A brief recap of execution plan
 Serial Execution Engine in ASE 15.0
• Showplan output is now a tree of operators
select a1, a2 from A, B where a1 > 10 and a2=b1
EMIT
NLJOIN
TSCAN
A
ISCAN IB
B
The type of query is SELECT.
ROOT:EMIT Operator
|NESTED LOOP JOIN Operator (Join Type: Inner
Join)
| |SCAN Operator
| | FROM TABLE A
| | Table Scan..
| |SCAN Operator
| | FROM TABLE B
| | Index : IB
| | Forward Scan.
| | Positioning by key.
| | Keys are:
| | b1 ASC
Types of parallelism
 Intra operator parallelism
• clones of the operator works concurrently on different data
fragments
select a1, a2 from A where a1 > 10
QUERY PLAN FOR STATEMENT 1 (at line 1)
EMIT
XCHG(p=2,c=1)
SCAN
A
Executed in parallel ………..
3 operator(s) under root
The type of query is SELECT
ROOT:EMIT Operator
|EXCHANGE Operator (Merged)
|….2 Producer and 1 Consumer processes.
| |EXCHANGE:EMIT Operator
| | |SCAN Operator
| | | FROM TABLE A
| | | Table Scan …with a 2-way partition scan.
Types of parallelism
 Inter operator parallelism (Single Pipeline)
• One or more clones of one or more operators working
concurrently on different data fragments
select a1, count(*) from A where a2 > 10 group by a1
P
I
P
E
L
I
N
E
EMIT
XCHG (p=1,c=1)
GROUPHASH
XCHG(p=2,c=1)
SCAN
A
The type of query is SELECT.
ROOT:EMIT Operator
|EXCHANGE Operator (Merged)
|in parallel by 1 Producer and 1 Consumer process.
……………………………………………
| | |HASH VEC AGGR Operator GROUP BY
| | | Evaluate Grouped COUNT AGGR.
| | | |EXCHANGE Operator (Merged)
| | | |..in parallel by 2 Producer and 1 Consumer..
……………………………………………
| | | | | |SCAN Operator
| | | | | | FROM TABLE A
| | | | | | Table Scan.
| | | | | | ..in parallel with a 2-way partition scan.
Types of parallelism
 Inter operator parallelism (Multiple Pipelines)
• One or more pipelines of operators working concurrently
select a1, b2 from A, B, C where a1 = b1 and b1 = c2
P3
EMIT
Pipe P3
MJOIN
Pipe P1
XCHG (p=1,c=1)
SORT
XCHG
TSCAN
A
P1
MJOIN
XCHG (p=1,c=1)
SORT
XCHG
TSCAN
B
Pipe P2
XCHG(p=2,c=1)
ISCAN
C
P2
Pipeline
Dependency
Tree
Pre-15.0 parallel plans
 There are two relevant limitations with pre 15.0 query
plan
• Redundant scans
– Same slice of a table can be scanned multiple times
• Bottlenecks
– Parent thread may need to do a final table level merge, which
can be expensive
Pre-15.0 Parallel Execution (Joins)
Redundant scans
Table A
Slice 1
Slice 2
Thread 1
Slice 1 Slice1
Slice 1 Slice2
Join clones
Slice1
Slice2
Table B
Slice 2 Slice1
Slice 2 Slice2
Thread 2
Redundant scans
Thread 3
Thread 4
Pre-15.0 Parallel Execution (Grouping)
Slice 1
Slice 1
Slice 2
Slice 2
Slice3
Slice3
Slice4
Slice4
Table A
Grouping
Clones
Thread 1
Thread 2
Grouping with
Thread 3
merge
Thread 4 Parent Thread
bottleneck
Basis of parallel Querying in ASE 15.0
Divide and conquer by using partitions
INTELLIGENT DIVISION OF DATA
Basis of Parallel Querying in ASE 15.0
 What is a partition


A partitioning is a pair (a,h) where a is an attribute and h(a,N) is a fn
that maps any value of a to a non-negative integer k where 1<= k <= N, N
being the # of partitions
h(Name,2) : Name < ‘O’; ‘O’<=Name < ‘Z’
Name
Nelly
P’Diddy
Dept
Optimizer
Execution
Emminem
Execution
SnoopDog
Access
JaRule
Recovery
Nelly
Emminem
Optimizer
Execution
JaRule
Recovery
P’Diddy
Execution
SnoopDog
Access
Basis of Parallel Querying in ASE 15.0
 Attribute Sensitive Operators

Two relations S and T are partitionable according to a fn h into
partitions S0, S1,..Sk and T1,T2,..Tk respectively, then a unary operator
is said to be “attribute sensitive if
 F(S) = F(S0) U…UF(Sk)

and a binary operator is “attribute sensitive” if
F(S,T)=F(S0,T0)U…F(Sk,Tk)
Joins in ASE 15.0 using partitions
C1
C2
10
abc
14
15
13
14
xz
sob
b2c
c
Table C
D1
D2
8
8
14
15
14
15
12
15
14
20
Table D
C1
C2
10
14
14
abc
xz
c
15
13
sob
b2c
D1
D2
8
8
14
14
15
15
12
14
15
20
Select c1,d2 From C, D
where c1=d1
14
14
14
14
12
14
12
14
15
15
15
20
Join as an Attribute
Sensitive Operator
removes redundancy
Groupings in ASE 15.0 using partitions
C1
C2
10
abc
14
15
13
14
xz
sob
b2c
c
Table C
Select count(*),c1 From C
group by c1
C1
C2
10
14
14
abc
xz
c
15
13
sob
b2c
1
2
1
1
10
14
15
13
Groupby as an Attribute
Sensitive Operator
removes bottlenecks
What is Semantic partitioning ?
 Given a row and a partitioning function h(a1,..,aN) on
attributes a1,..,aN, we can tell which partition the row
belongs to.
 ASE 15.0 supports different types of semantic based
partitioning
• Range : partition by range(col1, col2) (p1 values <= (10, 1000),
p2 values <= (20, 2000),.., pN values <= (100, 10000))
• Hash : partition by hash(col1, col2) (p1, p2, .. , pN)
• List : partition by list(col1) (p1 values in (10, 20, 30), p2 values in
(40, 15), .. , pN values in (3, 5))
 Local indexes (index with same partitioning as data) are
supported in ASE 15.0, which means that they can support
the same model.
Query operations that benefit from partitioning
 Query with predicates:
• select * from A where a1 in (100,200) and a1 < 40
Table A is range partitioned on column a1 as (p1 values <= (90),
p2 values <= (180), p3 values <= (270), p4 values <= (360))
QUERY PLAN FOR STATEMENT 1 (at line 1).
The type of query is SELECT.
ROOT:EMIT Operator
|RESTRICT Operator
|
| |SCAN Operator
| | FROM TABLE A
| | [ Eliminated Partitions : 1 2 3 4 ]
| | Table Scan.
| | Forward Scan.
Query operations that benefit from partitioning
 Query with joins:
• select * from A, B where a1 = b1 and a2 = b2
• Case A:
Table A and B are partitioned on columns a1 and b1 respectively
AND
The partitioning is exactly same (say both are hash partitioned 4
ways on a1 and b1 respectively)
• Inference:
The partitioning on both sides are a subset of the joining columns
AND the two tables join on the partitioning columns.
Equi-partitioned
Query with joins (contd)
The type of query is SELECT.
ROOT:EMIT Operator
|EXCHANGE Operator (Merged)
|Executed in parallel by 4 Producer and 1
Consumer processes.
| |EXCHANGE:EMIT Operator
| | |MERGE JOIN Operator (Join Type:
Inner Join)
| | | Key Count: 2
| | | Key Ordering: ASC ASC
| | | |SORT Operator
| | | | |SCAN Operator
| | | | | FROM TABLE A
| | | | | Table Scan.
| | | | | Executed in parallel with a 4-way
partition scan.
| | | |SORT Operator
| | | | |SCAN Operator
| | | | | FROM TABLE B
| | | | | Table Scan.
| | | | | Executed in parallel with a 4-way
partition scan.
EMIT
XCHG (p=4, c=1)
MJOIN
SORT
TSCAN
A
SORT
TSCAN
B
Query operations that benefit from partitioning
 Query with joins:
• select * from A, B where a1 = b1 and a2 = b2
• Case B:
Table A and B are partitioned on columns a2 and b1 respectively
AND
The partitioning is exactly same (say both are hash partitioned 4
ways on a2 and b1 respectively)
• Inference:
The partitioning on both sides are a subset of the joining columns
BUT the two tables DO NOT join on the partitioning columns.
Query with joins (contd)
Non equi-partitioned
ROOT:EMIT Operator
|EXCHANGE Operator (Merged)
|Executed in parallel by 4 Producer and 1
Consumer processes.
| |MERGE JOIN Operator (Join Type: Inner
Join)
| | |SORT Operator
| | |EXCHANGE Operator (Repartitioned)
| | |..in parallel by 4 Producer and 4
Consumer processes.
| | | |SCAN Operator
| | | |FROM TABLE A
| | | |.. Parallel with a 4-way partition scan
| | | |SORT Operator
| | | | |SCAN Operator
| | | | | FROM TABLE B
| | | | | ..in parallel with a 4-way partition
scan.
EMIT
XCHG (p=4, c=1)
MJOIN
SORT
XCHG(p=4, c=4)
TSCAN
A
SORT
TSCAN
B
Query operations that benefit from partitioning
 Query with groupbys and distincts:
• select a1, a4, count(*) from A group by a1, a4
• select distinct a1, a4 from A
• Case A:
Table A is partitioned on column a1.
• Inference:
The partitioning column is a subset of the grouping columns
The partitioning column is a subset of the distinct columns
Query with groupbys (contd)
ROOT:EMIT Operator
|EXCHANGE Operator (Merged)
|Executed in parallel by 4 Producer and 1
Consumer processes.
| |EXCHANGE:EMIT Operator
| |
| | |HASH VECTOR AGGREGATE
Operator
| | | GROUP BY
| | | Evaluate Grouped COUNT
AGGREGATE.
| | | Using Worktable1 for internal storage.
| | |
| | | |SCAN Operator
| | | | FROM TABLE
| | | | A
| | | | Table Scan.
| | | | Executed in parallel with a 4-way
partition scan.
In-partitioned
EMIT
XCHG (p=4, c=1)
GROUPHASHING
TSCAN
A
Query operations that benefit from partitioning
 Query with groupbys and distincts:
• select a1, a4, count(*) from A group by a1, a4
• select distinct a1, a4 from A
• Case A:
Table A is partitioned on column a2.
• Inference:
The partitioning column is NOT a subset of the grouping columns
The partitioning column is NOT a subset of the distinct columns
Query with groupbys (contd)
ROOT:EMIT Operator
|EXCHANGE Operator (Merged)
|Executed in parallel by 2 Producer and 1
Consumer processes.
| |EXCHANGE:EMIT Operator
| |
| | |HASH VECTOR AGGREGATE Operator
| | | GROUP BY
| | | Evaluate Grouped COUNT
AGGREGATE.
| | | |EXCHANGE Operator (Repartitioned)
| | | |Executed in parallel by 4 Producer and
2 Consumer processes.
| | | | |SCAN Operator
| | | | | FROM TABLE
| | | | |A
| | | | | Table Scan.
| | | | | Executed in parallel with a 4-way
partition scan.
Re-partitioned
EMIT
XCHG (p=2, c=1)
GROUPHASHING
XCHG (p=4, c=2)
TSCAN
A
Query with groupbys (contd)
ROOT:EMIT Operator
|HASH VECTOR AGGREGATE Operator
| GROUP BY
| Evaluate Grouped SUM OR AVERAGE
AGGREGATE.
| |EXCHANGE Operator (Merged)
| |Executed in parallel by 4 Producer and 1
Consumer processes.
| | | |HASH VECTOR AGGREGATE
Operator
| | | | GROUP BY
| | | | Evaluate Grouped COUNT
AGGREGATE.
| | | | |SCAN Operator
| | | | | FROM TABLE
| | | | | A
| | | | | Table Scan.
| | | | | Executed in parallel with a 4-way
partition scan.
2-phased
EMIT
GROUPBY
XCHG (p=2, c=1)
GROUPBY
TSCAN
A
How to chose the columns on which to partition ?
 Partitioning is like a low cardinality index
• Use it as a first order of lookup
• Use a primary key for partitioning if your query is not using the primary key
to lookup a single row
 Look for columns that join
• foreign keys are a good candidate, when primary keys are not.
sel ect s_acctbal , s_name, n_name, p_partk ey, p_mf gr,
s_address, s_phone, s_comment
f rom suppl i er s, partsupp ps, part p, nati on, regi on
w here
p_partk ey = ps_partk ey and s_suppk ey = ps_suppk ey
and p_si ze = 4 and p_type l i k e '%COPPER'
and s_nati onk ey = n_nati onk ey
and n_regi onk ey = r_regi onk ey and r_name = 'A SI A '
• part(p_partkey) and supplier(s_suppkey) seems ideal.
How to chose the columns on which to partition ?
 Look for columns where partition elimination can be
done.
sel ect l _orderk ey, sum(l _extendedpri ce * (1 - l _di scount)) as revenue, o_orderdate,
o_shi ppri ori ty
f rom customer, orders, l i nei tem
w here
c_mk tsegment = 'H OUSEH OLD ‘ and c_custk ey = o_custk ey
and l _orderk ey = o_orderk ey and o_orderdate < '1995-03-06'
and l _shi pdate > '1995-03-06'
group by
l _orderk ey, o_orderdate, o_shi ppri ori ty
• lineitem(l_shipdate), orders(o_orderdate) are possible partitioning
columns
How to chose the columns on which to partition ?
 Look for columns on which grouping or distinctness is required
 Select list columns in a union query
 Do not worry about order by clauses
sel ect l _orderk ey, sum(l _extendedpri ce * (1 - l _di scount)) as revenue, o_orderdate,
o_shi ppri ori ty
f rom customer, orders, l i nei tem
w here
c_mk tsegment = 'H OUSEH OLD ‘ and c_custk ey = o_custk ey
and l _orderk ey = o_orderk ey and o_orderdate < '1995-03-06'
and l _shi pdate > '1995-03-06'
group by
l _orderk ey, o_orderdate, o_shi ppri ori ty
order by
l _orderk ey, o_orderdate, o_shi ppri ori ty
• lineitem(l_orderkey) helps in the join as well as the grouping clause
• orders(o_orderdate) helps in grouping and partition elimination
How to chose the partitioning type
 Hash partitions are easy to set up.
• Balanced – use sp_help to check out the average skew which is
(size of largest partition / average size of partition)
• Anything greater than 1.8 is considered bad for parallel operation
 Do not use large list partitions unless the IN list is reasonably small
• Uses a lot of memory
• Processing is very CPU intensive because of the IN list being not
sorted
 Attribute sensitive operation like joins, groupbys, distincts, unions
do well with hash based partitioning
 Partition elimination due to predicates is best done using range
partitions as inequality predicates can be used
How to chose the partitioning degree
 The parameter “max parallel degree” that controls how much
parallelism can be applied to a query is not enforced.
 A large partitioning degree will flood the system with worker
threads.
• Check out I/O characteristics and see how much parallelism it
supports.
• Run something like:
– time dd if=“device name” of=“dev/null” bs=2k skip=40 count=80000
• Run N number of such commands in the background to see where
the elapsed time seems to be deviating from being constant.
 Do not exceed the number of engines configured in the system
• Re-partitioning is CPU intensive and cannot be avoided
Case Study using TPC-H database
PART (200,000)
PARTSUPP (800,000)
LINEITEM (6,000,000)
ORDERS (1,500,000)
PARTKEY
PARTKEY
ORDERKEY
ORDERKEY
NAME
SUPPKEY
PARTKEY
CUSTKEY
BRAND
AVAILQTY
SUPPKEY
STATUS
TYPE
SUPPLYCOST
LINENUMBER
TOT-PRICE
EXTENDEDPRICE
ORD-DATE
CUSTKEY
DISCOUNT
ORD-PRIORITY
NAME
TAX
ADDRESS
SHIPDATE
NATIONKEY
RECEIPTDATE
ACCTBAL
RETURNFLAG
MKTSEG
LINESTATUS
NATION (25)
REGION (5)
SIZE
CONTAINER
SUPPLIER (10,000)
PARTKEY
NAME
BRAND
CUSTOMER (150,000)
TYPE
SIZE
NATIONKEY
NATIONKEY
NAME
REGIONKEY
REGIONKEY
NAME
Case Study using TPC-H database (partitioning)
PART [HASH]
PARTSUPP [HASH]
LINEITEM [RANGE]
ORDERS [RANGE]
PARTKEY
PARTKEY
ORDERKEY
ORDERKEY
NAME
SUPPKEY
PARTKEY
CUSTKEY
BRAND
AVAILQTY
SUPPKEY
STATUS
TYPE
SUPPLYCOST
LINENUMBER
TOTALPRICE
EXTENDEDPRICE
ORDERDATE
CUSTKEY
DISCOUNT
PRIORITY
NAME
TAX
ADDRESS
SHIPDATE
NATIONKEY
RECEIPTDATE
ACCTBAL
RETURNFLAG
MKTSEG
LINESTATUS
SIZE
CONTAINER
SUPPLIER [HASH]
SUPPKEY
NAME
ADDRESS
CUSTOMER [HASH]
PHONE
ACCTBAL
NATIONKEY
NATION [NONE]
NATIONKEY
NAME
REGIONKEY
REGION [NONE]
REGIONKEY
NAME
Performance Analysis
6 engines
# of threads = 20
# max pll degree = 10
ms
PE: partition elimination
PJ: equi-partitioned join
PG: in-partitioned grouping
PE PJ
PE PE
PJ
PG
Q1
Q2
PE PE PJ
PJ
PJ
Q3 Q4 Q5 Q6 Q7 Q8
Unpartitioned
tables in 15.0
PJ
PE
PG
Q9
Q10
Partitioned tables
in 15.0