How to Quantify Oracle Database Scalability Examples Dr. Neil J. Gunther Perfdynamics

How to Quantify Oracle Database Scalability
Examples
Dr. Neil J. Gunther
Perfdynamics
Peter Stalder
[email protected]
Hotsos Symposium
March 7 – 11, 2010
Basel · Baden · Bern · Lausanne · Zurich · Düsseldorf · Frankfurt/M. · Freiburg i. Br. · Hamburg · Munich · Stuttgart · Vienna
About me
ƒ Senior Consultant at Trivadis AG in Zurich, Switzerland
à ZH-IMS (Infrastructure Managed Services)
à [email protected]
ƒ Focus
à Application Performance Management (APM)
à Predictive Performance Management (PPM)
à Capacity Management
ƒ Recent presentations
à DOAG, Dec 2009 – Server Consolidation using analytical modeling
à DOAG ITIL Days, Sept. 2009 – Ressourcen und Kapazitätsanalysen im Oracle-Umfeld (Resource and Capacity Analyses in the Oracle Environment)
à ukCMG, May 2009 – Oracle Metrics for Sizing
Universal Law of Computational Scaling
2
© 2010
How to Quantify Oracle Database Scalability
ƒ Why quantify? How to quantify?
ƒ Response Time Scalability
ƒ Conclusion
Data are always
part of the game.
Swingbench example from a blog
ƒ Results are taken from
à http://oracledoug.com/.../1470-Time-Matters-Throughput-vs.-Response-Time.html
Concurrent Sessions   Avg. Response Time (ms)   Transactions Completed
 1                     79                        4,203
 2                    108                        6,772
 4                    133                       10,481
 8                    198                       13,346
12                    244                       13,639
16                    310                       14,798
20                    337                       14,749
24                    369                       14,176
28                    428                       15,181
32                    563                       13,278
36                    533                       14,151
40                    587                       13,302
Blog comments, opinions and discussions (1)
ƒ “The results for the throughput of the system are not consistent”
ƒ “Typically throughput will start to drop when we are approaching the limit of a resource, disk, CPU, software etc. This is just simple queuing theory”
ƒ “Once the individual response time issue is improved, the overall throughput also improves.”
ƒ “The retrograde behavior of your throughput swingbench tests is fairly typical. It comes from coherency issues as opposed to contention”
ƒ “Do I care more about throughput or response times?”
Efficiency - consistent and valid
ƒ The efficiency does not exceed 100%
à This gives us initial confidence in the measurements

Measured     Trx / Sec   RelCap          Efficiency
Users (N)    X(N)        C=X(N)/X(1)     C/N
 1            4'203      1.00            1.00
 2            6'772      1.61            0.81
 4           10'481      2.49            0.62
 8           13'346      3.18            0.40
12           13'639      3.25            0.27
16           14'798      3.52            0.22
20           14'749      3.51            0.18
24           14'176      3.37            0.14
28           15'181      3.61            0.13
32           13'278      3.16            0.10
36           14'151      3.37            0.09
40           13'302      3.16            0.08
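The RelCap and Efficiency columns can be recomputed directly from the measured throughput. The following is a minimal Python sketch of that check; the names `measured`, `relative_capacity` and `efficiency` are illustrative and do not come from the original deck.

```python
# Measured Swingbench throughput X(N), keyed by the number of users N.
measured = {
    1: 4203, 2: 6772, 4: 10481, 8: 13346, 12: 13639, 16: 14798,
    20: 14749, 24: 14176, 28: 15181, 32: 13278, 36: 14151, 40: 13302,
}


def relative_capacity(x, n):
    """Relative capacity C(N) = X(N) / X(1)."""
    return x[n] / x[1]


def efficiency(x, n):
    """Efficiency C(N) / N; a value above 1.0 would make the data suspect."""
    return relative_capacity(x, n) / n


rows = [(n, relative_capacity(measured, n), efficiency(measured, n))
        for n in sorted(measured)]

for n, c, e in rows:
    print(f"{n:>3}  C = {c:4.2f}  C/N = {e:4.2f}")

# First sanity check from the slide: no efficiency may exceed 100%.
all_valid = all(e <= 1.0 for _, _, e in rows)
print("efficiency never exceeds 100%:", all_valid)
```

Running the sketch reproduces the two derived columns to two decimal places, which is a quick way to catch transcription errors in the measured data.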
Error Bars - consistent and valid
ƒ The error bars show the error spread in the data
à The spread gives us confidence in the measurements
Deviation - consistent and valid
ƒ A deviation of less than 10% is fair
à This gives us further confidence in the measurements

Users (N)   Measured Trx / Sec X(N)   Modeled Capacity   Error %
 1           4203                      4203               0.00
 2           6772                      6961               2.80
 4          10481                     10270              -2.01
 8          13346                     13168              -1.33
12          13639                     14221               4.27
16          14798                     14566              -1.56
20          14749                     14585              -1.11
24          14176                     14437               1.84
28          15181                     14200              -6.46
32          13278                     13916               4.81
36          14151                     13608              -3.83
40          13302                     13290              -0.08
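The modeled values can be approximated with a short Python sketch. It assumes that the model is used in its super-serial form, C(N) = N / (1 + α[(N − 1) + β N (N − 1)]), with the α = 0.2027 and β = 0.0118 quoted on the parameter slides of this deck. That form is an assumption, chosen because it reproduces the modeled values to within rounding; the function and variable names are illustrative.

```python
ALPHA = 0.2027   # contention parameter as quoted in the deck
BETA = 0.0118    # coherency parameter as quoted in the deck
X1 = 4203        # measured throughput at N = 1

measured = {
    1: 4203, 2: 6772, 4: 10481, 8: 13346, 12: 13639, 16: 14798,
    20: 14749, 24: 14176, 28: 15181, 32: 13278, 36: 14151, 40: 13302,
}


def capacity(n, alpha=ALPHA, beta=BETA):
    """Relative capacity C(N) in the (assumed) super-serial form."""
    return n / (1.0 + alpha * ((n - 1) + beta * n * (n - 1)))


def modeled_throughput(n):
    """Modeled X(N) = X(1) * C(N)."""
    return X1 * capacity(n)


def error_pct(n):
    """Deviation of the model from the measurement, in percent."""
    return 100.0 * (modeled_throughput(n) - measured[n]) / measured[n]


for n in sorted(measured):
    print(f"{n:>3}  measured {measured[n]:>6}  "
          f"modeled {modeled_throughput(n):8.0f}  error {error_pct(n):6.2f}%")

worst = max(abs(error_pct(n)) for n in measured)
print(f"largest deviation: {worst:.2f}%")
```

Because the quoted parameters are rounded to four decimals, the recomputed values differ from the slide in the last digits; the largest deviation still stays below the 10% threshold.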
Blog comments, opinions and discussions (2)
ƒ “The results for the throughput of the system are not consistent”
ƒ “Typically throughput will start to drop when we are approaching the limit of a resource, disk, CPU, software etc. This is just simple queuing theory.”
ƒ “Once the individual response time issue is improved, the overall throughput also improves.”
ƒ “The retrograde behavior of your throughput swingbench tests is fairly typical. It comes from coherency issues as opposed to contention”
ƒ “Do I care more about throughput or response times?”
Ideal Load Test Case
ƒ Both t/put data X and latency data R are nonlinear functions of load N
ƒ Load N is the independent variable, representing the number of sessions
ƒ Oracle data should look like this
[Figure: Throughput (X) data and Latency (R) data versus load N, with annotations "Saturation" and "Queueing really kicks in"]
$$ R(N) = \frac{N}{X(N)} - Z $$
where Z is the think time.
Near the origin, X and R appear to be independent. But this is an illusion due to no queueing (waiting).
Graph provided by Neil Gunther. Thanks!
Real data from the blog
ƒ Throughput and response time data from load-limited systems must have these characteristics
à Throughput is bounded and follows a concave curve
à Response Time is unbounded and follows a convex curve
Relationship between X and R
ƒ Therefore, reducing R means increasing X at constant N
à Tuning an individual session results in a higher throughput
à A higher throughput can be achieved by
   ƒ tuning individual sessions
   ƒ tuning parts of the application, e.g. interest engine
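The step behind "therefore" can be made explicit by solving the Interactive Response Time Law, R(N) = N / X(N) − Z, for the throughput:

$$ X(N) = \frac{N}{R(N) + Z} $$

With N and the think time Z held constant, a smaller R makes the denominator smaller and X larger. As a purely hypothetical illustration (these numbers are not from the measurements): with N = 9 and Z = 0.003 s, cutting R from 0.010 s to 0.005 s would raise X from about 692 to 1125 transactions per second.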
Blog comments, opinions and discussions (3)
ƒ “The results for the throughput of the system are not consistent”
ƒ “Typically throughput will start to drop when we are approaching the limit of a resource, disk, CPU, software etc. This is just simple queuing theory”
ƒ “Once the individual response time issue is improved, the overall throughput also improves.”
ƒ “The retrograde behavior of your throughput swingbench tests is fairly typical. It comes from coherency issues as opposed to contention”
ƒ “Do I care more about throughput or response times?”
Really fairly typical?
ƒ Contention α: 20%
à Very high
à Starving on CPU in this case?
à Not responsible for a retrograde t/put

Trendline Parameters     Quadratic Coefficients
a                        2.40E-03
b                        0.2051
c                        0.0000

Super Serial Parameter   Values
α                        0.2027
β                        0.0118
Nmax                     20
Nopt                     5
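The two parameter blocks appear to be linked by simple arithmetic. The sketch below assumes that the trendline is a quadratic a·x² + b·x + c fitted to N/C(N) − 1 against x = N − 1, and that the super-serial form C(N) = N / (1 + α[(N − 1) + β N (N − 1)]) is used, which gives a = αβ and b = α + αβ. Both points are inferred from the numbers on the slide rather than stated in the deck, and the function name is hypothetical.

```python
def super_serial_from_quadratic(a, b):
    """Recover (alpha, beta) from the quadratic trendline coefficients.

    Assumes a = alpha * beta and b = alpha + alpha * beta, which follows
    from the super-serial form when N/C(N) - 1 is regressed on N - 1.
    """
    alpha = b - a
    beta = a / alpha
    return alpha, beta


# Coefficients as shown on the slide (c is 0 and is not needed).
alpha, beta = super_serial_from_quadratic(a=2.40e-3, b=0.2051)
print(f"alpha = {alpha:.4f}")   # contention
print(f"beta  = {beta:.4f}")    # coherency

# Assumption: the slide's Nopt is consistent with 1/alpha, rounded.
n_opt = round(1.0 / alpha)
print(f"Nopt  = {n_opt}")
```

Under these assumptions the sketch returns the α, β and Nopt values shown in the parameter table.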
Yes, maybe fairly typical, but ..
ƒ Coherency β: 0.0118
à Fair in this case
à Responsible for the retrograde t/put

Trendline Parameters     Quadratic Coefficients
a                        2.40E-03
b                        0.2051
c                        0.0000

Super Serial Parameter   Values
α                        0.2027
β                        0.0118
Nmax                     20
Nopt                     5
Blog comments, opinions and discussions (4)
ƒ “The results for the throughput of the system are not consistent”
ƒ “Typically throughput will start to drop when we are approaching the limit of a resource, disk, CPU, software etc. This is just simple queuing theory”
ƒ “Once the individual response time issue is improved, the overall throughput also improves.”
ƒ “The retrograde behavior of your throughput swingbench tests is fairly typical. It comes from coherency issues as opposed to contention”
ƒ “Do I care more about throughput or response times?”
This is the real core of the Blog
Trendline Parameters     Quadratic Coefficients
a                        2.40E-03
b                        0.2051
c                        0.0000

Super Serial Parameter   Values
α                        0.2027
β                        0.0118
Nmax                     20
Nopt                     5
Recap
ƒ The USL tells us
à Whether the benchmark is consistent and valid
à Whether the workload is contention-limited and / or coherency-limited
à The theoretical maximum throughput
à The optimal number of sessions
à The maximal number of sessions
ƒ The USL quantification reduces qualitative discussion
ƒ We are forced to explain the shape of the curves
ƒ We are forced to explain the size of α and β
ƒ Equation: USL = BAAG (Battle against any guess)
How to Quantify Oracle Database Scalability
ƒ Why quantify? How to quantify?
ƒ Response Time Scalability
ƒ Conclusion
Data are always part of the game.
Case study – another Swingbench experiment
ƒ Transactions
à Customer Registration   Load Ratio 10
à Browse Products         Load Ratio 50
à Order Products          Load Ratio 10
à Process Orders          Load Ratio 10
à Browse Orders           Load Ratio 50
ƒ Think Time: 3ms

UserLoad   TPS      RTT (s)   Think (s)
 1         132.61   0.0070    0.0030
 2         239.31   0.0079    0.0030
 3         358.18   0.0080    0.0030
 6         602.09   0.0099    0.0030
 9         732.31   0.0120    0.0030
12         779.26   0.0154    0.0030
15         807.16   0.0180    0.0030
24         803.63   0.0290    0.0030
36         815.00   0.0440    0.0030
Interactive Response Time Law
$$ R(N) = \frac{N}{X(N)} - Z $$
[Diagram: user sessions 1 … N, each with think time Z, send requests to the DB, which has response time R]
Z is included in Swingbench's R (it's the round trip time or RTT)
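Applied to the case-study measurements, the law gives a response time from throughput alone, which can then be set against the measured round trip time minus the think time. The following Python sketch shows the arithmetic; the dictionary and function names are illustrative.

```python
Z = 0.003  # think time in seconds (3 ms)

# Case-study measurements: user load N -> transactions per second.
tps = {1: 132.61, 2: 239.31, 3: 358.18, 6: 602.09, 9: 732.31,
       12: 779.26, 15: 807.16, 24: 803.63, 36: 815.00}

# Case-study measurements: user load N -> round trip time in seconds.
rtt = {1: 0.0070, 2: 0.0079, 3: 0.0080, 6: 0.0099, 9: 0.0120,
       12: 0.0154, 15: 0.0180, 24: 0.0290, 36: 0.0440}


def response_time(n, x, z=Z):
    """Interactive Response Time Law: R(N) = N / X(N) - Z."""
    return n / x - z


for n in sorted(tps):
    measured_r = rtt[n] - Z                  # Swingbench RTT includes Z
    calculated_r = response_time(n, tps[n])  # derived from throughput only
    print(f"{n:>3}  RTT - Z = {measured_r:.4f} s  "
          f"N/X - Z = {calculated_r:.4f} s")
```

The two columns agree closely at higher loads, which is the sense in which throughput data alone is enough to estimate the response time.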
Response Time Measurements – User Load 1

Users (N)   Measured RTT   Measured RTT - Z
 1          0.0070         0.0040
 2          0.0079         0.0049
 3          0.0080         0.0050
 6          0.0099         0.0069
 9          0.0120         0.0090
12          0.0154         0.0124
15          0.0180         0.0150
24          0.0290         0.0260
36          0.0440         0.0410

It's not the R in the DB, it's the RTT!
Response Time Measurements – User Load 9

Users (N)   Measured RTT   Measured RTT - Z
 9          0.0120         0.0090

We doubled the R in the DB
Response Time Measurements – User Load 36

Users (N)   Measured RTT   Measured RTT - Z
36          0.0440         0.0410
Apply Interactive Response Time Law
$$ R(N) = \frac{N}{X(N)} - Z $$

Users (N)   Predicted C(N)   Predicted Capacity   Measured Capacity   Error %
 1          1.00             132.61               133                   0.00
 2          1.84             243.47               239                   1.74
 3          2.54             336.64               358                  -6.01
 6          4.06             538.39               602                 -10.58
 9          4.99             662.30               732                  -9.56
12          5.57             738.64               779                  -5.21
15          5.92             784.72               807                  -2.78
24          6.24             827.72               804                   3.00
36          6.01             797.55               815                  -2.14

Users (N)   Measured R   Predicted R (USL)   Calculated from t/p   Error %
 1          0.0040       0.0045              =L50/O50-0.003
 2          0.0049       0.0052              0.0054                 -5.30
 3          0.0050       0.0059              0.0054                -14.78
 6          0.0069       0.0081              0.0070                -15.88
 9          0.0090       0.0106              0.0093                -14.72
12          0.0124       0.0132              0.0124                 -6.52
15          0.0150       0.0161              0.0156                 -6.92
24          0.0260       0.0260              0.0269                  0.02
36          0.0410       0.0421              0.0412                 -2.70

The "Calculated from t/p" cell for N = 1 shows its spreadsheet formula, =L50/O50-0.003, instead of a value.
Measured Response Time
Calculated Response Time from Throughput
Predicted Response Time by USL
Big picture: Throughput and Response Time Scalability

Trendline Parameters     Quadratic Coefficients
a                        0.0016
b                        0.0878
c                        0.0000

Super Serial Parameter   Values
α                        0.0862
β                        0.0181
Nmax                     25
Nopt                     12
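With these fitted parameters, both the predicted capacity and the predicted response time of the case study can be recomputed. The Python sketch below assumes the super-serial form C(N) = N / (1 + α[(N − 1) + β N (N − 1)]); this is an assumption that matches the predicted values in the deck to within rounding, and all names are illustrative.

```python
ALPHA = 0.0862   # contention parameter from the table above
BETA = 0.0181    # coherency parameter from the table above
X1 = 132.61      # measured TPS at user load 1
Z = 0.003        # think time in seconds


def capacity(n, alpha=ALPHA, beta=BETA):
    """Relative capacity C(N) in the (assumed) super-serial form."""
    return n / (1.0 + alpha * ((n - 1) + beta * n * (n - 1)))


def predicted_throughput(n):
    """Predicted X(N) = X(1) * C(N)."""
    return X1 * capacity(n)


def predicted_response_time(n):
    """Interactive Response Time Law applied to the predicted throughput."""
    return n / predicted_throughput(n) - Z


for n in (1, 2, 3, 6, 9, 12, 15, 24, 36):
    print(f"{n:>3}  C = {capacity(n):4.2f}  "
          f"X = {predicted_throughput(n):7.2f}  "
          f"R = {predicted_response_time(n):.4f} s")
```

The predicted throughput rises up to the mid-twenties of users and is lower again at 36, so the response time keeps growing even where adding users no longer adds throughput.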
Recap Response Time Scalability
ƒ The Response Time can be calculated by using the Interactive Response Time Law
à Derived from Little's Law
ƒ R is given by
$$ R(N) = \frac{N}{X(N)} - Z $$
ƒ If the Response Time is measured, it can be used to validate the model
ƒ If the Response Time is not measured, we can still rely on the math
How to Quantify Oracle Database Scalability
ƒ Why quantify? How to quantify?
ƒ Response Time Scalability
ƒ Conclusion
Data are always part of the game.
Conclusion (1)
ƒ Neil Gunther's model adds a new parameter to the more familiar Amdahl's law
ƒ The additional parameter β, representing coherence-related delays, enables Gunther's formula to model behavior where the performance of a parallel program can actually degrade at higher and higher levels of parallelization
à If β = 0, the USL reduces to Amdahl's law
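The reduction can be written out in one step. Taking the USL in the form given in Guerrilla Capacity Planning (the formula itself is not printed on the slides):

$$ C(N) = \frac{N}{1 + \alpha (N - 1) + \beta N (N - 1)} $$

Setting β = 0 removes the quadratic term and leaves Amdahl's law:

$$ C(N) = \frac{N}{1 + \alpha (N - 1)} $$

This curve rises monotonically towards the ceiling 1/α and never turns down. Only the β N (N − 1) term grows faster than the numerator, which is why a nonzero β is needed to describe retrograde throughput.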
Conclusion (2)
ƒ The useful information hidden in the data becomes visible only through the USL model
à Theoretical maximum throughput
à Number of economically sensible Users or CPUs or Nodes (in case of RAC)
à Whether the controlled measurements are consistent
ƒ The law indicates whether scalability would be limited by contention and / or coherency effects
Conclusion (3)
ƒ No complex queueing theory is needed
ƒ Response Time can easily be predicted from throughput
ƒ The USL quantification reduces qualitative discussion
ƒ We are forced to explain the shape of the curves
ƒ We are forced to explain the size of α and β
ƒ USL = BAAG
Resources
ƒ Literature
à Book: Guerrilla Capacity Planning (2007), Neil J. Gunther
à Book: Analyzing Computer System Performance with Perl::PDQ (2005), Neil J. Gunther
ƒ Online Google doc
à http://spreadsheets.google.com/ccc?key=0AslFTeSsTP15dGVjLWxBVUI2WHFmWTU1UVhmSjVXcFE&hl=en
ƒ Downloads
à EXCEL sscalc.xls spreadsheet
à www.perfdynamics.com/Classes/Materials/sscalc-class.xls
Thank you!
Peter Stalder
Peter.stalder@trivadis.com