How to Quantify Oracle Database Scalability
Examples
Dr. Neil J. Gunther
Perfdynamics
Peter Stalder
[email protected]
Hotsos Symposium
March 7 – 11, 2010
Basel · Baden · Bern · Lausanne · Zurich · Düsseldorf · Frankfurt/M. · Freiburg i. Br. · Hamburg · Munich · Stuttgart · Vienna
About me
Senior Consultant at Trivadis AG in Zurich, Switzerland
→ ZH-IMS (Infrastructure Managed Services)
→ [email protected]
Focus
→ Application Performance Management (APM)
→ Predictive Performance Management (PPM)
→ Capacity Management
Recent presentations
→ DOAG, Dec 2009 – Server Consolidation using analytical modeling
→ DOAG ITIL Days, Sept. 2009 – Ressourcen und Kapazitätsanalysen im Oracle-Umfeld (Resource and Capacity Analyses in the Oracle Environment)
→ ukCMG, May 2009 – Oracle Metrics for Sizing
Universal Law of Computational Scaling
2
© 2010
How to Quantify Oracle Database Scalability
Why quantifying? How quantifying?
Response Time Scalability
Conclusion
Data are always
part of the game.
Swingbench example from a blog
Results are taken from
→ http://oracledoug.com/.../1470-Time-Matters-Throughput-vs.-Response-Time.html

Concurrent   Avg. Response   Transactions
Sessions     Time (ms)       Completed
 1            79              4,203
 2           108              6,772
 4           133             10,481
 8           198             13,346
12           244             13,639
16           310             14,798
20           337             14,749
24           369             14,176
28           428             15,181
32           563             13,278
36           533             14,151
40           587             13,302
Blog comments, opinions and discussions (1)
“The results for the throughput of the system are not consistent”
“Typically throughput will start to drop when we are approaching the limit of a resource, disk, CPU, software etc. This is just simple queuing theory”
“Once the individual response time issue is improved, the overall throughput also improves.”
“The retrograde behavior of your throughput swingbench tests is fairly typical. It comes from coherency issues as opposed to contention”
“Do I care more about throughput or response times?”
Efficiency - consistent and valid
The efficiency does not exceed 100%
→ this gives a first degree of confidence in the measurements

Users (N)   Measured Trx / Sec X(N)   RelCap C=X(N)/X(1)   Efficiency C/N
 1           4'203                    1.00                 1.00
 2           6'772                    1.61                 0.81
 4          10'481                    2.49                 0.62
 8          13'346                    3.18                 0.40
12          13'639                    3.25                 0.27
16          14'798                    3.52                 0.22
20          14'749                    3.51                 0.18
24          14'176                    3.37                 0.14
28          15'181                    3.61                 0.13
32          13'278                    3.16                 0.10
36          14'151                    3.37                 0.09
40          13'302                    3.16                 0.08
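The RelCap and Efficiency columns follow directly from the measured throughput. A minimal Python sketch of that arithmetic, using the measured values from the table above (the function and variable names are illustrative and not taken from the slides):

```python
# Measured throughput X(N) from the Swingbench blog data (users -> trx/sec).
measured = {
    1: 4203, 2: 6772, 4: 10481, 8: 13346, 12: 13639, 16: 14798,
    20: 14749, 24: 14176, 28: 15181, 32: 13278, 36: 14151, 40: 13302,
}


def relative_capacity(x_n, x_1):
    """Relative capacity C(N) = X(N) / X(1)."""
    return x_n / x_1


def efficiency(c_n, n):
    """Efficiency C(N) / N; a value above 1.0 would signal suspect data."""
    return c_n / n


x1 = measured[1]
rows = []
for n, x in measured.items():
    c = relative_capacity(x, x1)
    rows.append((n, x, round(c, 2), round(efficiency(c, n), 2)))

for n, x, c, e in rows:
    print(f"{n:>3} {x:>6} {c:>5.2f} {e:>5.2f}")

# Consistency check used on the slide: efficiency never exceeds 100%.
all_valid = all(e <= 1.0 for _, _, _, e in rows)
```

If any efficiency came out above 1.0, the single-user baseline X(1) or one of the load points would deserve a second look before any model is fitted.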
Error Bars - consistent and valid
The error bars show the error spread in the data
→ The spread gives us confidence in the measurements
Deviation - consistent and valid
A deviation of less than 10% is fair
→ This gives us further confidence in the measurements

Users (N)   Measured Trx / Sec X(N)   Modeled Capacity   Error %
 1           4203                      4203               0.00
 2           6772                      6961               2.80
 4          10481                     10270              -2.01
 8          13346                     13168              -1.33
12          13639                     14221               4.27
16          14798                     14566              -1.56
20          14749                     14585              -1.11
24          14176                     14437               1.84
28          15181                     14200              -6.46
32          13278                     13916               4.81
36          14151                     13608              -3.83
40          13302                     13290              -0.08
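The Modeled Capacity and Error % columns can be reproduced to within rounding by the sketch below. It rests on an assumption inferred from the numbers, not stated in the slides: the contention value 0.2027 multiplies (N-1), and the quadratic trendline coefficient a = 2.40E-03 from the Trendline Parameters table multiplies N(N-1). Plugging the printed β = 0.0118 into that position does not reproduce the column, which suggests the spreadsheet uses a different parameterization for β. All names are illustrative.

```python
# Measured throughput X(N) (users -> trx/sec) from the table above.
measured = {
    1: 4203, 2: 6772, 4: 10481, 8: 13346, 12: 13639, 16: 14798,
    20: 14749, 24: 14176, 28: 15181, 32: 13278, 36: 14151, 40: 13302,
}

ALPHA = 0.2027   # contention parameter as printed on the slides
KAPPA = 2.40e-3  # ASSUMPTION: trendline coefficient "a" multiplies N*(N-1)


def usl_capacity(n, alpha=ALPHA, kappa=KAPPA):
    """Relative capacity C(N) = N / (1 + alpha*(N-1) + kappa*N*(N-1))."""
    return n / (1.0 + alpha * (n - 1) + kappa * n * (n - 1))


def error_percent(modeled, observed):
    """Signed deviation of the model from the measurement, in percent."""
    return 100.0 * (modeled - observed) / observed


x1 = measured[1]
report = {}
for n, x in measured.items():
    modeled = x1 * usl_capacity(n)
    report[n] = (modeled, error_percent(modeled, x))

for n, (modeled, err) in report.items():
    print(f"{n:>3} {measured[n]:>6} {modeled:>8.0f} {err:>6.2f}")

# The slide's validity criterion: every deviation stays below 10%.
within_tolerance = all(abs(err) < 10.0 for _, err in report.values())
```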
Blog comments, opinions and discussions (2)
“The results for the throughput of the system are not consistent”
“Typically throughput will start to drop when we are approaching the limit of a resource, disk, CPU, software etc. This is just simple queuing theory.”
“Once the individual response time issue is improved, the overall throughput also improves.”
“The retrograde behavior of your throughput swingbench tests is fairly typical. It comes from coherency issues as opposed to contention”
“Do I care more about throughput or response times?”
Ideal Load Test Case
Both throughput data X and latency data R are nonlinear functions of load N
Load N is the independent variable, representing the number of sessions
Oracle data should look like this
[Graph: throughput (X) data and latency (R) data plotted against load N; annotations "Saturation" and "Queueing really kicks in"]
$R(N) = \frac{N}{X(N)} - Z$
Near origin, X and R appear to be independent.
But this is an illusion due to no queueing (waiting).
Graph provided by Neil Gunther. Thanks!
Real data from the blog
Throughput and response time data from load-limited systems have to have this characteristic
→ Throughput is limited and represented by a concave curve
→ Response Time is unlimited and represented by a convex curve
Relationship between X and R
Therefore, reducing R means increasing X at constant N
→ Tuning an individual session results in a higher throughput
→ A higher throughput can be achieved by tuning
   - individual sessions
   - parts of the application, e.g. interest engine
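Rearranging the Interactive Response Time Law R(N) = N / X(N) - Z for throughput gives X = N / (R + Z), which is why a shorter R at the same N yields a higher X. A small sketch with hypothetical numbers (the session count, think time, and response times below are made up for illustration and are not measurements from the slides):

```python
def throughput(n_sessions, response_time_s, think_time_s=0.0):
    """Interactive Response Time Law solved for X: X = N / (R + Z)."""
    return n_sessions / (response_time_s + think_time_s)


# Hypothetical values, for illustration only.
N = 10        # constant number of sessions
Z = 0.003     # think time in seconds

before = throughput(N, 0.012, Z)  # R = 12 ms before tuning
after = throughput(N, 0.009, Z)   # R = 9 ms after tuning a session

print(f"before tuning: {before:.1f} trx/s, after tuning: {after:.1f} trx/s")
```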
Blog comments, opinions and discussions (3)
“The results for the throughput of the system are not consistent”
“Typically throughput will start to drop when we are approaching the limit of a resource, disk, CPU, software etc. This is just simple queuing theory”
“Once the individual response time issue is improved, the overall throughput also improves.”
“The retrograde behavior of your throughput swingbench tests is fairly typical. It comes from coherency issues as opposed to contention”
“Do I care more about throughput or response times?”
Really fairly typical?
Contention α: 20%
→ Very high
→ starving on CPU in this case?
→ Not responsible for a retrograde throughput

Trendline (Quadratic)          Super Serial
Parameters   Coefficients      Parameter   Values
a            2.40E-03          α           0.2027
b            0.2051            β           0.0118
c            0.0000            Nmax        20
                               Nopt        5
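The slides do not show how the trendline coefficients were obtained. One plausible reading, assumed here, is a quadratic least-squares fit with zero intercept (c = 0.0000) of N/C(N) - 1 against N - 1. The mapping α = b - a and β = a / α is likewise an inference: it happens to reproduce the printed 0.2027 and 0.0118 from a and b, but the spreadsheet may compute them differently. All function names are hypothetical.

```python
def to_trendline_coordinates(measured):
    """Map throughput data to x = N - 1 and y = N / C(N) - 1."""
    x1 = measured[1]
    xs, ys = [], []
    for n, x in measured.items():
        c = x / x1  # relative capacity C(N)
        xs.append(n - 1)
        ys.append(n / c - 1)
    return xs, ys


def fit_quadratic_through_origin(xs, ys):
    """Least-squares fit of y = a*x**2 + b*x with the intercept forced to 0."""
    s4 = sum(x ** 4 for x in xs)
    s3 = sum(x ** 3 for x in xs)
    s2 = sum(x ** 2 for x in xs)
    t2 = sum(x * x * y for x, y in zip(xs, ys))
    t1 = sum(x * y for x, y in zip(xs, ys))
    det = s4 * s2 - s3 * s3  # determinant of the 2x2 normal equations
    a = (t2 * s2 - t1 * s3) / det
    b = (s4 * t1 - s3 * t2) / det
    return a, b


def super_serial_parameters(a, b):
    """ASSUMED mapping from trendline coefficients to (alpha, beta)."""
    alpha = b - a
    beta = a / alpha
    return alpha, beta


# Coefficients as printed on the slide.
alpha, beta = super_serial_parameters(2.40e-3, 0.2051)
print(f"alpha = {alpha:.4f}, beta = {beta:.4f}")
```

To fit measured data, pass the output of `to_trendline_coordinates` to `fit_quadratic_through_origin` and feed the resulting (a, b) into `super_serial_parameters`.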
Yes, fairly typical, but ..
Coherency β: 0.0118
→ Fair in this case
→ Responsible for the retrograde throughput

Trendline (Quadratic)          Super Serial
Parameters   Coefficients      Parameter   Values
a            2.40E-03          α           0.2027
b            0.2051            β           0.0118
c            0.0000            Nmax        20
                               Nopt        5
Blog comments, opinions and discussions (4)
“The results for the throughput of the system are not consistent”
“Typically throughput will start to drop when we are approaching the limit of a resource, disk, CPU, software etc. This is just simple queuing theory”
“Once the individual response time issue is improved, the overall throughput also improves.”
“The retrograde behavior of your throughput swingbench tests is fairly typical. It comes from coherency issues as opposed to contention”
“Do I care more about throughput or response times?”
This is the real core of the Blog
Trendline (Quadratic)          Super Serial
Parameters   Coefficients      Parameter   Values
a            2.40E-03          α           0.2027
b            0.2051            β           0.0118
c            0.0000            Nmax        20
                               Nopt        5
Recap
The USL tells us
→ If the benchmark is consistent and valid
→ If the workload is contention-limited and/or coherency-limited
→ The theoretical maximum throughput
→ The optimal number of sessions
→ The maximal number of sessions
The USL quantification reduces qualitative discussion
We are forced to explain the shape of the curves
We are forced to explain the size of α and β
Equation: USL = BAAG (Battle against any guess)
How to Quantify Oracle Database Scalability
Why quantifying? How quantifying?
Response Time Scalability
Conclusion
Data are always
part of the game.
Case study – another Swingbench experiment
Transactions
→ Customer Registration   Load Ratio 10
→ Browse Products         Load Ratio 50
→ Order Products          Load Ratio 10
→ Process Orders          Load Ratio 10
→ Browse Orders           Load Ratio 50
Think Time: 3ms

UserLoad   TPS      RTT (s)   Think (s)
 1         132.61   0.0070    0.0030
 2         239.31   0.0079    0.0030
 3         358.18   0.0080    0.0030
 6         602.09   0.0099    0.0030
 9         732.31   0.0120    0.0030
12         779.26   0.0154    0.0030
15         807.16   0.0180    0.0030
24         803.63   0.0290    0.0030
36         815.00   0.0440    0.0030
Interactive Response Time Law
$R(N) = \frac{N}{X(N)} - Z$
[Diagram: users 1 … N with think time Z sending requests to the DB, which has response time R]
Z is included in Swingbench's R (it's the round trip time or RTT)
Response Time Measurements – User Load 1

Users (N)   Measured RTT   Measured RTT - Z
 1          0.0070         0.0040
 2          0.0079         0.0049
 3          0.0080         0.0050
 6          0.0099         0.0069
 9          0.0120         0.0090
12          0.0154         0.0124
15          0.0180         0.0150
24          0.0290         0.0260
36          0.0440         0.0410

It's not the R in the DB, it's the RTT!
Response Time Measurements – User Load 9

Users (N)   Measured RTT   Measured RTT - Z
 1          0.0070         0.0040
 2          0.0079         0.0049
 3          0.0080         0.0050
 6          0.0099         0.0069
 9          0.0120         0.0090
12          0.0154         0.0124
15          0.0180         0.0150
24          0.0290         0.0260
36          0.0440         0.0410

We doubled the R in the DB
Response Time Measurements – User Load 36

Users (N)   Measured RTT   Measured RTT - Z
 1          0.0070         0.0040
 2          0.0079         0.0049
 3          0.0080         0.0050
 6          0.0099         0.0069
 9          0.0120         0.0090
12          0.0154         0.0124
15          0.0180         0.0150
24          0.0290         0.0260
36          0.0440         0.0410
Apply Interactive Response Time Law
$R(N) = \frac{N}{X(N)} - Z$

Users (N)   Predicted C(N)   Predicted Capacity   Measured Capacity   Error %
 1          1.00             132.61               133                   0.00
 2          1.84             243.47               239                   1.74
 3          2.54             336.64               358                  -6.01
 6          4.06             538.39               602                 -10.58
 9          4.99             662.30               732                  -9.56
12          5.57             738.64               779                  -5.21
15          5.92             784.72               807                  -2.78
24          6.24             827.72               804                   3.00
36          6.01             797.55               815                  -2.14

Users (N)   Measured R   Predicted R (USL)   Calculated R from t/p   Error %
 1          0.0040       0.0045              =L50/O50-0.003
 2          0.0049       0.0052              0.0054                   -5.30
 3          0.0050       0.0059              0.0054                  -14.78
 6          0.0069       0.0081              0.0070                  -15.88
 9          0.0090       0.0106              0.0093                  -14.72
12          0.0124       0.0132              0.0124                   -6.52
15          0.0150       0.0161              0.0156                   -6.92
24          0.0260       0.0260              0.0269                    0.02
36          0.0410       0.0421              0.0412                   -2.70
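The "Calculated R from t/p" column is the Interactive Response Time Law applied to the measured throughput, and the "Measured R" column is the measured RTT minus the think time. A minimal sketch using the case-study values (TPS, RTT, and Z = 3 ms as given in the slides; the names are illustrative):

```python
THINK_TIME = 0.003  # Z in seconds (3 ms think time from the case study)

# users -> (measured TPS, measured RTT in seconds) from the case-study table
case_study = {
    1: (132.61, 0.0070), 2: (239.31, 0.0079), 3: (358.18, 0.0080),
    6: (602.09, 0.0099), 9: (732.31, 0.0120), 12: (779.26, 0.0154),
    15: (807.16, 0.0180), 24: (803.63, 0.0290), 36: (815.00, 0.0440),
}


def response_time_from_throughput(n, x, z=THINK_TIME):
    """Interactive Response Time Law: R(N) = N / X(N) - Z."""
    return n / x - z


results = {}
for n, (tps, rtt) in case_study.items():
    measured_r = rtt - THINK_TIME          # Swingbench reports RTT = R + Z
    calculated_r = response_time_from_throughput(n, tps)
    results[n] = (round(measured_r, 4), round(calculated_r, 4))

for n, (measured_r, calculated_r) in results.items():
    print(f"{n:>3} measured R = {measured_r:.4f}  calculated R = {calculated_r:.4f}")
```

The same function yields the "Predicted R (USL)" column when it is fed the predicted capacity instead of the measured TPS.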
[Chart: Measured Response Time]
[Chart: Calculated Response Time from Throughput]
[Chart: Predicted Response Time by USL]
Big picture: Throughput and Response Time Scalability

Trendline (Quadratic)          Super Serial
Parameters   Coefficients      Parameter   Values
a            0.0016            α           0.0862
b            0.0878            β           0.0181
c            0.0000            Nmax        25
                               Nopt        12
Recap Response Time Scalability
The Response Time can be calculated by using the Interactive Response Time Law
→ Derived from Little's Law
R is given by
$R(N) = \frac{N}{X(N)} - Z$
If the Response Time is measured, it can be used to validate the model
If the Response Time is not measured, we still can rely on the math
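The step from Little's Law to the Interactive Response Time Law is short. The sketch below assumes a closed system in which each of the N users alternates between waiting for a response (R) and thinking (Z), so one full cycle lasts R + Z:

```latex
% Little's Law applied to the whole closed loop (users plus database):
% N requests in circulation = throughput * time per cycle.
\begin{align}
  N &= X(N)\,\bigl(R(N) + Z\bigr) \\
  \frac{N}{X(N)} &= R(N) + Z \\
  R(N) &= \frac{N}{X(N)} - Z
\end{align}
```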
How to Quantify Oracle Database Scalability
Why quantifying? How quantifying?
Response Time Scalability
Conclusion
Data are always
part of the game.
Conclusion (1)
Neil Gunther’s model adds a new parameter to the more familiar Amdahl’s law
The additional parameter β, representing coherence-related delays, enables Gunther’s formula to model behavior where the performance of a parallel program can actually degrade at higher and higher levels of parallelization
→ If β = 0, the USL reduces to Amdahl
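The reduction to Amdahl's law can be written out explicitly. The form below is the one commonly given for the USL, with α as the contention term and β as the coherency term; the spreadsheet parameterization behind the numeric values on the earlier slides may differ:

```latex
\begin{align}
  C(N) &= \frac{N}{1 + \alpha\,(N-1) + \beta\,N\,(N-1)}
       && \text{(USL)} \\
  \beta = 0 \;\Rightarrow\; C(N) &= \frac{N}{1 + \alpha\,(N-1)}
       && \text{(Amdahl's law)} \\
  \lim_{N \to \infty} \frac{N}{1 + \alpha\,(N-1)} &= \frac{1}{\alpha}
\end{align}
```

With β = 0 the capacity rises monotonically toward the ceiling 1/α. Only a nonzero β term, which grows with N(N-1), can make the curve turn down again, which is the retrograde behavior discussed above.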
Conclusion (2)
The useful information hidden behind the data becomes visible only through the USL model
→ Theoretical maximum throughput
→ Number of economically sensible Users or CPUs or Nodes (in case of RAC)
→ Whether the controlled measurements are consistent
The law indicates whether scalability would be limited by contention and/or coherency effects
Conclusion (3)
No complex queueing theory is needed
Response Time can easily be predicted from throughput
The USL quantification reduces qualitative discussion
We are forced to explain the shape of the curves
We are forced to explain the size of α and β
USL = BAAG
Resources
Literature
→ Book: Guerrilla Capacity Planning (2007), Neil J. Gunther
→ Book: Analyzing Computer System Performance with Perl::PDQ (2005), Neil J. Gunther
Online Google doc
→ http://spreadsheets.google.com/ccc?key=0AslFTeSsTP15dGVjLWxBVUI2WHFmWTU1UVhmSjVXcFE&hl=en
Downloads
→ EXCEL sscalc.xls spreadsheet
→ www.perfdynamics.com/Classes/Materials/sscalc-class.xls
Thank you!
Peter Stalder
Peter stalder@trivadis com
[email protected]