Workload Management through Characterization and Idleness Detection

A Thesis Presented to the Department of Computer Science, Faculty of Basic & Applied Sciences, in Partial Fulfillment of the Requirement for the Degree of PhD (Computer Science)

By
Abdul Mateen (41-FBAS/PHDCS/F08)
International Islamic University, Islamabad (2012)

International Islamic University, Islamabad
Faculty of Basic & Applied Sciences
Department of Computer Science

Dated: ____________

FINAL APPROVAL

It is certified that we have read the thesis entitled "Workload Management through Characterization and Idleness Detection", submitted by Abdul Mateen, Reg. No. 41-FBAS/PHDCS/F08. It is our judgment that this thesis is of sufficient standard to warrant its acceptance by the International Islamic University, Islamabad for the PhD degree in Computer Science.

PROJECT EVALUATION COMMITTEE

External Examiner: _______________________

Internal Examiner: _______________________, Department of Computer Science, Faculty of Basic and Applied Sciences, International Islamic University, Islamabad

Supervisor: Dr. Muhammad Sher, Professor and Chairman, Department of Computer Science & Software Engineering, International Islamic University, Islamabad, Pakistan

Co-Supervisor: Dr. Mian Muhammad Awais, Associate Professor, School of Science & Engineering, Lahore University of Management Sciences (LUMS), Lahore, Pakistan

Acknowledgement

I would like to acknowledge the many people who have supported, guided and advised me during my PhD. First, I would like to express my sincere gratitude to my supervisor Dr. Muhammad Sher, Professor, Department of Computer Science & Software Engineering, International Islamic University, and my co-supervisor Dr. Mian Muhammad Awais, Associate Professor, Lahore University of Management Sciences (LUMS), Lahore, for providing me the opportunity to start this study and research.
Through the course of my study, I have had the great fortune to get to know and interact with them. Their valuable comments, suggestions and motivation for further development, as well as their assistance during the writing of this PhD thesis, have been invaluable to me. They always guided me whenever I was stuck during the study and encouraged me throughout the whole period. I am also thankful to Dr. Said Elnaffar, Associate Professor, UAE University, and Dr. Norwati Mustapha, Associate Professor, Faculty of Computer Science & Information Technology, University Putra Malaysia (UPM), Malaysia, for their useful suggestions and great ideas that helped me to move in the right direction. To all the others who have directly or indirectly helped me in the completion of my work, I thank you all.

Declaration

I hereby declare and affirm that this thesis, neither as a whole nor as a part thereof, has been copied from any source. It is further declared that I have completed this thesis entirely on the basis of my personal effort, made under the sincere guidance of my supervisors. If any part of this report is proven to be copied or found to be a reproduction of some other work, I shall stand by the consequences. No portion of the work presented in this report has been submitted in support of an application for another degree or qualification at this or any other university or institute of learning.

Abdul Mateen
41-FBAS/PHDCS/F08

Dedication

This work is dedicated to my parents and family, who always encouraged, supported and guided me to achieve this milestone.

Abdul Mateen

Abstract

Database Management Systems (DBMSs) are used as a data source for financial, educational, telecom and web applications.
Users remain connected to the DBMS at all times, updating existing records and generating reports by executing complex workloads or queries. As a result, the workload in DBMSs fluctuates continuously, and sometimes this fluctuation is sudden and difficult to handle. To attain a sufficient level of performance, management of the workload is necessary, especially when workloads are large and complex. Rapid growth in data, expanding functionality and changing behavior make database workloads more complex and tricky. Organizations observe complex types of database workloads that are difficult for humans to manage; human experts take a long time to gain enough experience to manage workloads efficiently, and in some cases the task becomes impossible and leads to mismanagement. The versatility of database workloads presents database practitioners, vendors and researchers with new challenges. Efficient execution and resource allocation for a workload depend on the workload type, which may be either On-Line Transaction Processing (OLTP) or Decision Support System (DSS). An OLTP workload consists of small transactions and accesses a small amount of data. A DSS workload, on the other hand, comprises long, highly intensive queries that need many resources and much processing. To achieve a satisfactory level of performance, either the Database Administrator (DBA) or the DBMS must have knowledge of workload shifts. We introduce a way to manage the workload in DBMSs on the basis of the workload type. The research presents a novel approach to database workload management through Case-Based Reasoning (CBR) characterization and classification, impact-based fuzzy-logic scheduling, and CPU idleness detection. The proposed approach is validated through experiments performed on real-time and benchmark workloads to reveal its effectiveness and efficiency.
The ultimate goal of the research is to manage the workload in DBMSs through characterization, scheduling and idleness detection.

Achievements and Contributions

Journal Publications
1. Abdul Mateen, Basit Raza, Mian M. Awais, Muhammad Sher, Norwati Mustapha, "Workload Management: A Technology Perspective with respect to Self-* Characteristics", Artificial Intelligence Review (Impact Factor 1.213), 2012.
2. Basit Raza, Abdul Mateen, Mian M. Awais, Muhammad Sher, "A Survey on Autonomic Workload Management: Algorithms, Techniques and Models", Journal of Computing (Impact Factor 0.2), Vol. 3, No. 6, pp. 29-38, June 2011.
3. Abdul Mateen, Basit Raza, Mian Muhammad Awais, Muhammad Sher, "Selection of the Best DBMS: A Fuzzy based Multi-objective Decision Making Approach", Lecture Notes in Engineering and Computer Science (ISI Indexed), Vol. 2188, No. 1, pp. 287-292, 2011.
4. Basit Raza, Abdul Mateen, Muhammad Sher, Mian Muhammad Awais, "Self-Prediction of Performance Metrics for the Database Management System Workload", International Journal of Computer Theory and Engineering (INSPEC), Vol. 4, No. 2, 2012.

Conference Publications
1. Abdul Mateen, Basit Raza, Muhammad Sher, Mian M. Awais, "Database Workload Management – Characterization to Idleness Detection", International Conference on Computing and Information Technology, Al-Madinah, Saudi Arabia, March 2012.
2. Abdul Mateen, Basit Raza, Mian M. Awais, Muhammad Sher, "Selection of the Best DBMS: A Fuzzy Based Multi-Objective Decision Making Approach", IAENG International Conference on Computer Science, Hong Kong, 16-18 March 2011.
3. Abdul Mateen, Basit Raza, Muhammad Sher, Mian M. Awais, T. Hussain, "Evolution of Autonomic Database Management Systems", The 2nd International Conference on Computer and Automation Engineering, Singapore, 26-28 Feb 2010.
4. Basit Raza, Abdul Mateen, Muhammad Sher, Mian M.
Awais, "Self-Prediction of Performance Metrics for the Database Management System Workload", 2011 International Conference on Information and Computer Applications (ICICA 2011), Dubai, UAE, 18-20 March 2011.
5. Abdul Mateen, Basit Raza, Tauqeer Hussain, Mian M. Awais, "Autonomicity in Universal Database DB2", 8th IEEE International Conference on Computer and Information Science (ICIS 09), Shanghai, China, pp. 445-450, June 1-3, 2009.
6. Basit Raza, Abdul Mateen, Muhammad Sher, Mian Muhammad Awais, Tauqeer Hussain, "Autonomic View of Query Optimizers in Database Management Systems", Software Engineering Research, Management & Applications, Canada, 2010.
7. B. Raza, A. Mateen, M. Sher, M. M. Awais, and T. Hussain, "Autonomicity in Oracle Databases Management System", International Conference on Data Storage and Data Engineering (DSDE 2010), India, 2010.
8. B. Raza, A. Mateen, T. Hussain, M. M. Awais, "Autonomic Success in Databases Management Systems", 8th IEEE International Conference on Computer and Information Science (ICIS 09), Shanghai, China, pp. 439-444, June 1-3, 2009.

Table of Contents

Chapter 1: Introduction & Background
1.1 Introduction
1.2 What is Workload?
1.3 Workload Evolution
1.4 Workload Management Methods
1.5 Artificial Intelligence (AI) Techniques
1.6 Autonomic Computing
1.7 Autonomic Workload Management
1.8 Problem Statement
1.9 Research Questions

Chapter 2: Related Work
2.1 State of the Art in Workload Management
2.1.1 Workload Characterization Techniques
2.1.2 Scheduling Techniques
2.1.3 Limitations in Previous Research Work

Chapter 3: Research Methodology
3.1 Research Problem Overview
3.2 Proposed Solution
3.2.1 Characterization
3.2.2 Scheduler
3.2.3 Idleness Detection & Exploitation
3.3 Experimental Setup & Results Validation
3.3.1 Experimental Setup
3.3.2 Database Management System
3.3.3 Proposed Technique Scenario
3.3.4 Evaluation Metrics
Summary

Chapter 4: Workload Characterization
4.1 Introduction and Background
4.2 Problems in Previous Workload Management Techniques
4.3 Proposed Workload Management Solution
4.3.1 Workload Characterization
4.3.1.1 Key Attributes Selection
4.3.1.2 CBR Development
4.3.2 Detection of the DB Workload
4.4 Experimental Setup & Results Validation
4.4.1 Workload Editor
4.4.2 Workload Database
4.4.4 Results Discussion
Summary

Chapter 5: Fuzzy Logic Based Scheduling
5.1 Introduction and Background
5.1.1 Scheduling
5.1.2 Fuzzy Logic
5.2 Fuzzy Based Scheduler (FBS)
5.2.1 Fuzzification
5.2.2 Fuzzification Rules
5.2.3 Defuzzification
Summary

Chapter 6: Idleness Detection
6.1 CBR Development for Idleness Detection
6.1.1 Retrieve
6.1.2 Reuse
6.1.3 Revise
6.1.4 Retain
Summary

Chapter 7: Results and Discussion
7.1 Experimental Setup
7.2 Results and Discussion
7.2.1 Proposed Vs FIFO Based Workload Management
7.2.2 Proposed Vs Priority Based Workload Management
7.2.3 Proposed Workload Management Vs SJF
Summary
Conclusion
Future Work
References
Appendix – Database Structure

List of Figures

Figure 1.1 Autonomic Computing Elements
Figure 1.2 Workload Management Activities
Figure 1.3 Autonomic Workload Manager
Figure 2.1 Query Shuffler (Mumtaz et al., 2009)
Figure 3.1 Components of Workload Management Framework
Figure 3.2 Steps of Characterization Module
Figure 3.3 Experimental Setup for the Proposed Solution
Figure 4.1 Steps to Characterize the DBMS/DW Workload
Figure 4.2 OLTP and DSS workload execution with respect to Key_writes
Figure 4.3 OLTP and DSS workload execution w.r.t Key_reads
Figure 4.4 OLTP and DSS workload execution w.r.t Table_lock_immediate
Figure 4.5 Steps to Characterize and Classify the Workload via CBR
Figure 6.1 Steps to Identify the CPU Idleness
Figure 7.1 Waiting Time of the Proposed Vs FIFO Workload Management
Figure 7.2 Average Waiting Time of the Proposed Vs FIFO Workload Management
Figure 7.3 Waiting Time of the Proposed Vs FIFO Workload Management
Figure 7.4 Average Waiting Time of the Proposed Vs FIFO Workload Management
Figure 7.5 Waiting Time of the Proposed Vs PB Workload Management
Figure 7.6 Average Waiting Time of PB Vs Proposed Approach
Figure 7.7 Waiting Time of the Proposed Vs PB Workload Management
Figure 7.8 Average Waiting Time of PB Vs Proposed Approach
Figure 7.9 Waiting Time of the Proposed Approach Vs SJF Workload Management
Figure 7.10 Average Waiting Time of SJF Vs Proposed Approach

List of Tables

Table 2.1 Analysis of the Workload Characterization Techniques in DBMSs
Table 2.2 Analysis of the Workload Management Techniques in DBMSs
Table 4.1 CBR with Sample Data
Table 4.2 Euclidean Difference Calculation for Incoming Workload
Table 4.3 Euclidean Difference Calculation for Incoming Workload (Example 2)
Table 4.4 CBR Data with Hash Values
Table 5.1 The Workload Involving Mix OLTP and DSS Queries
Table 5.2 Percentage of OLTP and DSS in the Workload Involving Mix Queries
Table 5.3 Initial Fuzzification Rules
Table 5.4 Initial Fuzzification Rules in Compact Form
Table 5.5 Final Fuzzification Rules in Compact Form
Table 5.6 Workloads with their Identified Workload Class
Table 6.1 CBR Data for CPU Idleness Detection
Table 6.2 Euclidean Difference Calculation (Example 1)
Table 7.1 Comparative Results of the Proposed and FIFO Workload Management for Scenario 1
Table 7.2 Comparative Results of the Proposed and FIFO Workload Management for Scenario 2
Table 7.3 Comparative Results of the Proposed and FIFO Workload Management for Scenario 3
Table 7.4 Average Waiting Time of the Proposed and FIFO Workload Management
Table 7.5 Comparative Results of the Proposed and FIFO Workload Management for Scenario 1
Table 7.6 Comparative Results of the Proposed and FIFO Workload Management for Scenario 2
Table 7.7 Comparative Results of the Proposed and FIFO Workload Management for Scenario 3
Table 7.8 Average Waiting Time of the Proposed and FIFO Workload Management
Table 7.9 Comparative Results of the Proposed and PB Workload Management for Scenario 1
Table 7.10 Comparative Results of the Proposed and PB Workload Management for Scenario 2
Table 7.11 Comparative Results of the Proposed and PB Workload Management for Scenario 3
Table 7.12 Average Waiting Time of the Proposed and PB Workload Management
Table 7.13 Comparative Results of the Proposed and PB Workload Management for Scenario 1
Table 7.14 Comparative Results of the Proposed and PB Workload Management for Scenario 2
Table 7.15 Comparative Results of the Proposed and PB Workload Management for Scenario 3
Table 7.16 Average Waiting Time of the Proposed and PB Workload Management
Table 7.17 Comparative Results of the Proposed and SJF Workload Management for Scenario 1
Table 7.18 Comparative Results of the Proposed and SJF Workload Management for Scenario 2
Table 7.19 Comparative Results of the Proposed and SJF Workload Management for Scenario 3
Table 7.20 Average Waiting Time of the Proposed and SJF Workload Management

List of Abbreviations

AC Autonomic Computing
ADDM Automatic Database Diagnostic Monitor
AI Artificial Intelligence
ANN Artificial Neural Network
AWM Autonomic Workload Management
AWR Automatic Workload Repository
BI Business Intelligence
CBR Case Based Reasoning
CBMG Customer Behavior Model Graph
CHAID CHi-squared Automatic Interaction Detector
COLT Continuous On-Line Tuning
CRT Classification & Regression Tree
DAD Disk Array Designer
DBA Database Administrator
DBMS Database Management System
DSS Decision Support System
DW Data Warehouse
DTA Database Tuning Advisor
FBS Fuzzy Based Scheduler
FCFS First Come First Served
FIFO First In First Out
KB Knowledge Base
LMP Largest Memory Priority
MARS Multivariate Adaptive Regression Splines
ML Machine Learning
MPL Multiprogramming Level
MV Materialized View
MWS Mixed Workload Scheduler
NN Neural Network
ODRM Oracle Database Resource Manager
OLAP On-Line Analytical Processing
OLTP On-Line Transaction Processing
PAQRS Priority Adaptation Query Resource Scheduling
PBS Priority Based Scheduler
PGM Priority Gradient Multiprogramming
PI Progress Indicator
PMM Priority Memory Management
QEP Query Execution Plan
QP Query Patroller
QShuffler Query Shuffler
QUEST Quick, Unbiased & Efficient Statistical Trees
QUIET Continuous Query-Driven Index Tuning
REDWAR RElational Database Workload AnalyzeR
rFEED Fair, Effective, Efficient and Differentiated
SAA SQL Access Advisor
SDD SemiDiscrete Decomposition
SJF Smallest Job First
SLA Service Level Agreement
SLG Service Level Goal
SLO Service Level Objective
SVD Singular Value Decomposition
SVM Support Vector Machine
TCO Total Cost of Ownership

Chapter 1: Introduction and Background

1.1 Introduction

DBMSs have been used to manage data in various organizations for many years and are now among an organization's core components. According to one report, the worldwide market for DBMSs was more than $15 billion in 2005, with an estimated 10% annual growth [IDC Competitive Analysis (2007)]. The complexity of DBMSs increases due to the varied functionality demanded by users, complex data types, diverse workloads and data volumes that grow with the passage of time. These factors cause brittleness and unmanageability in DBMSs. To handle this problem, organizations hire a number of expert DBAs and spend a lot of money to obtain the expected improvement in throughput and response.
Usually DBAs have to take care of all such tasks as setting policies for workload priorities, memory, configuration and other DBMS-related matters. Currently, workload complexity has increased to a level where it cannot be handled by the DBA or other database experts. On the other hand, the cost of hardware is decreasing compared to the cost of management. Performing workload management activities manually, by hiring experts, increases the Total Cost of Ownership (TCO) [Huebscher et al. (2008), IBM Corporation (2003)]. Moreover, with the advent of distributed systems and data warehouses, it has become difficult, and in some cases impossible, for a DBA to manually organize, optimize and configure day-to-day tasks. To achieve better workload management, different approaches are used; for example, executing queries may be stopped for a while and restarted later. Contemporary DBMSs face the challenges of managing various types of workloads, demands for maximum functionality, users with different priorities, and OLTP, DSS and highly intensive On-Line Analytical Processing (OLAP) queries that involve complex decision making. The main challenge in workload management in DBMSs is the handling of complex, tricky and large workloads consisting of complex OLAP queries. These queries contain mathematical functions, numbers of joins and nested queries, and usually operate on large data sets. Such workloads answer different business questions, consume many resources, access huge amounts of data and require a long time to execute completely. For this reason, they impact the performance of smaller and more important workloads which need an immediate response.
Along with the regular user-oriented workload, a DBMS also has to take care of system-oriented tasks that execute permanently, such as database backups, index re-organization, statistics updates, etc. The proposed research was started with the aim of managing the database workload in DBMSs. There is a need to take care of all the underlying issues and to arrange the database workload in such a way that each request has to wait for a minimum time with maximum CPU utilization. The research is concerned with the workload and its management; Machine Learning (ML) techniques (as the workload management problem is solved through ML techniques); and Autonomic Computing. Therefore, in the following sections we discuss these topics in detail.

1.2 What is Workload?

A workload is defined as the task that is given to some worker to perform in a specific amount of time. In Computer Science it is defined as the processing time to finish a job, which may further involve some programming and a number of users [Niu et al. (2006)]. The work that is produced, or can be produced, by a DBMS in a specified time interval is known as the database workload. A more concrete definition of the DB workload is the set of Structured Query Language statements, which may be either OLTP or DSS, together with the sessions, commands and other operations that run in an interval. An OLTP workload usually consists of small transactions that may involve updates, deletes, etc., while a DSS workload comprises queries that are long, highly resource-intensive and involve many computations.

Example of an OLTP Query: An OLTP query consists of transactions that take a small amount of time for execution and processing, and access a small amount of data.
Update Region
Set R_Name = 'Islamabad', R_Comment = 'Capital of Pakistan'
Where R_Regionkey = 7;

Example of a DSS Query:

Select S_Name, S_Address
From Supplier, Nation
Where S_Suppkey In (
        Select Ps_Suppkey From Partsupp
        Where Ps_Partkey In (
                Select P_Partkey From Part Where P_Name Like '[Color]%'
        )
        And Ps_Availqty > (
                Select 0.5 * Sum(L_Quantity) From Lineitem
                Where L_Partkey = Ps_Partkey
                  And L_Suppkey = Ps_Suppkey
                  And L_Shipdate >= Date('[Date]')
                  And L_Shipdate < Date('[Date]') + Interval '1' Year
        )
)
And S_Nationkey = N_Nationkey
And N_Name = '[Nation]'
Order By S_Name;

The above DSS query is an example of an ad-hoc query; it shows supplier names and their addresses in ascending order from the Supplier and Nation tables on the basis of some conditions. The conditions in this query in turn contain other sub-queries. This type of query consumes many resources (such as memory, CPU cycles, etc.) and affects the performance of the DBMS.

1.3 Workload Evolution

Workload has evolved through three phases: capacity planning, resource sharing and performance orientation [Niu et al. (2006)]. Capacity planning was introduced before the 1980s with the purposes of cost sharing and providing a way to determine an organization's future demand. It is used to estimate the future requirements of the organization. The word capacity represents the maximum amount of work that an organization can perform in a specific time. In workload management, capacity planning always guarantees the functions to achieve the production goal within the budget. The main concerns of capacity planning with respect to workload change are the resources, their management and configuration, and what the impact will be.
Capacity planning is performed through characterization and performance analysis [Niu (2008)]. Later, research on capacity planning became sparse and shifted to resource-oriented workload management due to the invention of minicomputers. Here the concept of service level was maintained through maximum resource utilization, by allocating resources to various applications. So the main idea behind the resource-oriented workload was maximum resource utilization. Heavy growth and versatility in workloads then introduced the concept of the performance-oriented workload, where the focus is on business goals and objectives. Performance-oriented workloads work in a loop (measure, plan and modify) under certain Service Level Agreements (SLAs). The aim of the performance-oriented workload is the best use of resources without laborious activity. It performs adaptation by configuring and optimizing without any human intervention, and is also known as autonomic workload management [Lightstone et al. (2002)]. In summary, the workload has evolved from capacity planning to resource orientation and finally to performance orientation. The basis of this workload evolution is cost sharing and Service Level Objectives (SLOs) driven by technology. An SLO defines the objective of the agreement and is the main entity of an SLA between the service provider and the customer.

1.4 Workload Management Methods

The workload in DBMSs is managed through different approaches such as characterization, scheduling, multiprogramming level control and resource allocation.

1.4.1 Characterization

Characterization is a process in which things are partitioned into groups on the basis of similarities or specific characteristics. In the case of a database workload, it is the process whereby various workloads are grouped into workload types such as OLTP, DSS, etc.
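As a simple illustration of grouping a workload into OLTP and DSS classes, a rule-of-thumb classifier might label a snapshot of server counters by its write ratio. This is only a sketch, not the CBR-based technique developed later in the thesis; the counter names echo the status variables mentioned in Chapter 4 (Key_reads, Key_writes), and the 0.3 threshold is a hypothetical value chosen for illustration:

```python
# Hypothetical sketch: label a workload interval as OLTP or DSS from
# counter deltas sampled over that interval. OLTP workloads are
# update-heavy with short transactions; DSS workloads are read-heavy
# scans and joins, so a low write ratio suggests DSS.

def characterize(snapshot):
    """Return 'OLTP' or 'DSS' for a dict of counter deltas."""
    writes = snapshot.get("Key_writes", 0)
    reads = snapshot.get("Key_reads", 0)
    total = reads + writes
    if total == 0:
        return "OLTP"  # idle interval: default to the lighter class
    write_ratio = writes / total
    # Illustrative threshold only; a real system would learn this.
    return "OLTP" if write_ratio >= 0.3 else "DSS"

print(characterize({"Key_writes": 800, "Key_reads": 1200}))  # OLTP
print(characterize({"Key_writes": 20, "Key_reads": 5000}))   # DSS
```

A production characterizer would of course use many attributes and a learned model rather than one fixed threshold, which is exactly the gap the CBR approach in Chapter 4 addresses.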
Before characterizing the workload, detection of the workload is performed. The basic purpose of workload detection is to find changes and trends in the workload in order to predict future workloads. In general, the workload is detected through two methods: performance monitoring and characterization. The former is a reactive approach that takes action when performance has degraded, while the latter is a proactive approach that tracks workload changes [Niu et al. (2006), (2007), (2007)]. Workload characterization is achieved through partitioning, analysis and resource-demand estimation after monitoring the system: workload partitioning reduces complexity, workload analysis supports efficient workload management, and monitoring is used to observe performance. Identifying the workload type is not an easy task, as the size and type of the database workload change at different times of day. The behavior of the database workload also varies by day, week and month. For example, in the case of a stock exchange workload, at the start of the day the server experiences more DSS-type workload than OLTP, turns to a more OLTP-like workload during the middle of the day, and finally tends toward DSS again overnight for whole-day analysis and reporting.

1.4.2 Scheduling

Scheduling is a way of making decisions about workload execution by arranging user requests and allocating resources properly. Usually a DBMS executes all requests in time without compromising other work, but when a huge number of sessions, requests and other system jobs arrive simultaneously, the DBMS either crashes due to overload or management has to wait a long time for the reports needed for decision making.
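A common safeguard against such overload is an admission queue placed in front of the DBMS that lets only a fixed number of requests run at once and holds the rest back. The sketch below is purely illustrative: the concurrency limit, the class name and the request labels are invented, and no real DBMS interface is used.

```python
from collections import deque

class ExternalScheduler:
    """Admission control outside the DBMS: at most `mpl` requests
    run concurrently; the rest wait in a FIFO queue."""

    def __init__(self, mpl):
        self.mpl = mpl                # concurrency limit (hypothetical value)
        self.running = set()
        self.waiting = deque()

    def submit(self, request):
        if len(self.running) < self.mpl:
            self.running.add(request)     # admitted to the DBMS
        else:
            self.waiting.append(request)  # held back; DBMS internals untouched

    def complete(self, request):
        self.running.discard(request)
        if self.waiting:                  # admit the next queued request
            self.running.add(self.waiting.popleft())

sched = ExternalScheduler(mpl=2)
for r in ["q1", "q2", "q3", "q4"]:
    sched.submit(r)
sched.complete("q1")
print(sorted(sched.running), list(sched.waiting))  # → ['q2', 'q3'] ['q4']
```

Because the queue sits outside the database, this kind of control needs no change to the DBMS internals, which is the point made about external scheduling below.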
An efficient scheduler is one that balances user-oriented and system workloads with respect to size and importance over time, ultimately maximizing CPU utilization. There are two ways to schedule user requests or workload in a DBMS: internal or external to the DBMS [Schroeder et al. (2006)]. Internal scheduling is achieved by modifying the internals of the DBMS, while external scheduling requires no such change. DBMS researchers, vendors and practitioners prefer external scheduling for a number of reasons: they do not need to learn the DBMS's internal architecture, it is not necessary to change the DBMS internals, and it is general rather than specific to a certain DBMS, hence portable. Moreover, external scheduling can easily be modified to suit the requirements and objectives of the organization. It has been applied to web, storage and database servers for many years.

1.4.3 Multiprogramming Level

The Multiprogramming Level (MPL) limits the number of requests or transactions sent to the DBMS for execution at a time. Selecting the right MPL is not an easy task: decreasing the MPL too far reduces throughput, and increasing it too much also degrades throughput. MPL selection must therefore avoid values so low or so high that they reduce system performance. External scheduling is mostly used to limit the workload or number of concurrent requests, i.e. the multiprogramming level.

1.4.4 Resource Allocation

In general there are two types of resources in DBMSs: physical and logical. Physical resources include memory, storage, CPU cycles, I/Os, etc., while logical resources include table locks, the system catalog, etc. All DBMS operations complete successfully through the interaction of both types of resources.
However, this interaction often increases DBMS complexity, as does the heterogeneous nature of the resources.

1.5 Artificial Intelligence (AI) Techniques

As this research proposes an autonomous solution for workload management in DBMSs, the following section elaborates some well-known AI and Machine Learning (ML) techniques that solve problems intelligently. These techniques have been used for many years, not only in databases but in many other fields, to handle various problems. The concept of Artificial Intelligence was introduced by Turing in 1950 [Stuart et al. (2008)], who also discussed how intelligence could be incorporated into machines. Since the advent of the AI field, much research has been done in this area and a number of AI techniques have been introduced; we discuss a few of them in the following sections.

1.5.1 Decision Tree

A decision tree represents different parameters in the form of nodes, which may be parents or children, and is used for decision making. Decision trees are also used in decision analysis, prediction and learning. The nodes of a decision tree consist of actions and their related decisions. Problems are solved through decision trees by applying algorithms such as ID3, C4.5, CHi-squared Automatic Interaction Detector (CHAID) and Multivariate Adaptive Regression Splines (MARS). Decision trees present their solutions as a visual display and are used in many fields such as Data Mining, Machine Learning, DBMSs, DWs and Networks. The decision tree has numerous advantages: it is simple, inexpensive, able to coordinate with other techniques, handles numerical as well as categorical data, is robust and is less time consuming.
Decision trees also have some disadvantages: they can grow very long and complex and require pruning; operations such as XOR, parity or multiplexer problems cannot be represented compactly; and, because a greedy algorithm is used in many cases, only locally optimal decisions are achieved.

1.5.2 Bayesian Algorithm

The Bayesian algorithm uses Bayes' theorem, introduced by Thomas Bayes, and works on the basis of previous history or training data. It is mostly applied to probability, prediction and classification problems. The Bayesian algorithm is widely used in many applications and fields including statistics, science, medicine, control theory, engineering and law, and it can start working from small training data.

1.5.3 Neural Network

A network of neurons or artificial neurons is called a Neural Network (NN) or Artificial Neural Network (ANN) [Hopfield J, (1982)]. An NN contains a large number of processing elements that are strongly connected with each other and work at the same time to reach a solution. The basic concept of the NN is taken from the neurons of the human brain, where neural processing creates intelligence. Tasks performed by a human arise from the coordination of certain neurons, whose connections become stronger when the same task is performed again, finally leading to the formation of memory. NNs solve problems the way humans do and learn from examples, just like humans; however, great care is needed in the selection of training examples for an NN. Compared to conventional computing, an NN solves problems differently: the conventional approach follows explicit steps or an algorithm, while an NN uses its learned knowledge.
Since its advent, various applications in medicine, business and other areas have used NNs due to their adaptive learning, organization of information and ability to correct problems on their own. With respect to architecture, NNs have two types: Feed-Forward and Feedback networks. The former, also known as a bottom-up or top-down approach, passes the solution from input to output, and the output of a layer cannot affect that layer or previous layers; in a Feedback network the output of any layer can affect any other layer, i.e. signals flow in both directions. For this reason, Feedback NNs are dynamic in nature but also very complex. However, as an NN finds the solution on its own, it can sometimes produce unpredictable solutions.

1.5.4 Case Based Reasoning

Case Based Reasoning (CBR) solves a problem by using the solutions of similar past problems stored in a knowledge base. The technique was introduced by Roger Schank to develop the dynamic memory model [Roger S, (1982)]. The mechanism behind CBR is the same one humans use in daily life: when humans face a problem, they try to relate it to past problems, find similar ones and reuse their solutions. There are four steps to solve a given problem through CBR: Retrieve, Reuse, Revise and Retain. In the Retrieve step, a case is formed for the problem to be solved, consisting of the problem with its related attributes, and the most similar case is extracted from the knowledge base. In Reuse, the solution of that similar case is taken and mapped to the given problem. The solution is tested in the Revise step and modified if necessary. Finally, in Retain, the adapted solution is stored in the knowledge base as a new case that can be used for future problem solving.
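The four steps above can be sketched over numeric workload-parameter vectors using nearest-neighbour retrieval. This is a minimal illustration, not the system built in later chapters: the two-attribute cases, their values and the stored labels are all hypothetical.

```python
# Minimal CBR cycle: retrieve the nearest case, reuse its solution,
# optionally revise it, then retain the new case in the case base.

case_base = [
    {"params": (0.9, 0.1), "solution": "OLTP"},
    {"params": (0.2, 0.8), "solution": "DSS"},
]

def distance(a, b):
    """Euclidean distance between two parameter vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def retrieve(params):
    return min(case_base, key=lambda c: distance(c["params"], params))

def solve(params, revise=None):
    case = retrieve(params)          # Retrieve: nearest stored case
    solution = case["solution"]      # Reuse: adopt its solution
    if revise is not None:           # Revise: correct it if needed
        solution = revise(solution)
    case_base.append({"params": params, "solution": solution})  # Retain
    return solution

print(solve((0.85, 0.2)))  # → OLTP
```

Note how the Retain step makes the case base grow with every solved problem; this is exactly why, as the text observes below, retrieval time increases as new cases are added.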
CBR appears similar to rule induction algorithms, as both start from a set of cases. However, CBR generalizes only after the target problem is known, while rule induction performs its generalizations beforehand. CBR also has certain problems; for example, retrieval time increases as new cases are added.

1.5.5 Fuzzy Logic

In Boolean logic there are only two truth values, True and False; in fuzzy logic a truth value can lie anywhere in the range [0, 1]. Fuzzy logic was introduced by Lotfi Zadeh and is based on the concept of partial truth and approximation rather than fixed values [Zadeh, (1965)]. Fuzzy logic is similar to probabilistic logic in that both use truth values from 0 to 1, but fuzzy logic represents a degree of truth while probability represents likelihood. Since its introduction, it has been used in various fields including Control Theory, Engineering, Networks, DBMSs and Artificial Intelligence. Fuzzy reasoning is performed through four steps: Fuzzification, Fuzzy Rule Evaluation, Aggregation and Defuzzification. In the fuzzification step, crisp input values are evaluated by applying membership functions that return a value between 0 and 1; the membership functions (e.g. Gaussian, Triangular or Trapezoidal) must be simple and easily convertible to a fuzzy set. In Rule Evaluation, the fuzzified output is checked against predefined fuzzy rules that portray expert decisions and are written in the form of fuzzy words. Aggregation is applied to each output variable, combining the rule results into a single fuzzy set. Finally, in Defuzzification the aggregated fuzzy set is converted into a crisp output value, usually by the Maximum or Centroid method.
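The four steps can be condensed into a few lines for a single crisp input. The sketch below is illustrative only: the input (a "DSS percentage"), the two triangular membership functions, the two rules and the output priorities are all made up, and defuzzification is done as a weighted average over singleton rule outputs (a simplified centroid).

```python
# Fuzzification -> rule evaluation -> aggregation -> defuzzification,
# all on one hypothetical crisp input.

def tri(x, a, b, c):
    """Triangular membership function rising from a, peaking at b, falling to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def schedule_priority(dss_pct):
    # Fuzzification: degree to which the workload is "low" or "high" DSS
    low  = tri(dss_pct, -1, 0, 60)
    high = tri(dss_pct, 40, 100, 101)
    # Rule evaluation (made-up rules): low DSS -> priority 80, high DSS -> priority 20.
    # Aggregation and centroid-style defuzzification over the two singleton outputs:
    return (low * 80 + high * 20) / (low + high)

print(round(schedule_priority(0), 1))   # purely OLTP-like input
print(round(schedule_priority(50), 1))  # evenly mixed input
```

A pure OLTP workload (0% DSS) gets the full "low DSS" priority, while a 50/50 input lands between the two rule outputs, showing how fuzzy logic yields graded rather than all-or-nothing decisions.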
1.5.6 Support Vector Machine

The Support Vector Machine (SVM) is a machine learning algorithm, similar in purpose to Neural Networks, that is used for classification. It was introduced by Vladimir N. Vapnik and later modified by Vapnik and Cortes [Cortes et al. (1995)]. It provides a solution for classification and regression analysis through data analysis and pattern recognition, learning and making predictions on the basis of training data that may consist of a small number of examples. The basis of SVM is the decision plane: classification is performed by constructing a hyperplane in N-dimensional space that separates the data into two categories. SVM is often selected because it is fast, robust and more accurate than many other machine learning algorithms.

1.6 Autonomic Computing

Systems that can execute, adjust and tune themselves in the presence of a workload are called autonomic systems [Horn (2001)]. Autonomic systems focus on what rather than how. The term autonomic computing was first introduced by IBM in 2001 to describe systems that can manage themselves without human interaction. Autonomic systems always work within a certain environment. The inspiration for autonomic computing is taken from the human nervous system, which performs different activities without conscious thought [Parashar et al. (2005)]. An autonomic computing system controls the functioning of computer applications and systems with minimal or no human intervention, in the same way that the human autonomic nervous system regulates the body without conscious input from the individual.

1.6.1 AC Architecture

Generally, an AC system consists of eight elements that work in a loop, as shown in Figure 1.4. These elements are the managed element(s), sensor, monitor, analyze, plan, execute, Knowledge Base (KB) and effector [White et al. (2004), Mateen et al. (2008)].
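One pass of this loop can be sketched as plain functions wired together over a shared knowledge base. This is only an illustrative skeleton under invented assumptions: the sensor value, the target of 70, the `reduce_mpl`/`raise_mpl` actions and the dictionary-shaped managed element are all hypothetical.

```python
# One pass of the monitor -> analyze -> plan -> execute loop, with a shared
# knowledge base recording what each stage observed and decided.

kb = []  # knowledge base: filtered observations and decisions

def monitor(sensor_value):
    kb.append(("observed", sensor_value))   # filter and store useful data
    return sensor_value

def analyze(value, target=70):
    deviation = value - target              # compare intended vs. measured
    kb.append(("deviation", deviation))
    return deviation

def plan(deviation):
    action = "reduce_mpl" if deviation > 0 else "raise_mpl"
    kb.append(("plan", action))
    return action

def execute(action, element):
    element[action] = True                  # effector adjusts the managed element
    kb.append(("executed", action))
    return element

managed_element = {}
execute(plan(analyze(monitor(85))), managed_element)
print(managed_element)  # → {'reduce_mpl': True}
```

In a real autonomic system this loop runs continuously, and the knowledge base persists across iterations so that the analyze and plan stages can learn from history.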
The managed element is the main element in an autonomic system, the one that needs to be managed. The sensor collects information through interaction with one or more managed elements. The monitor examines the information provided by the sensor, filters it and stores the useful part in the KB. The analyze component compares the intended parameters with the parameters provided by the monitor component; the results of the comparison are also stored in the KB for future use. The purpose of the planning component is to identify possible ways to correct trends on the basis of previously provided data. The execute component uses the effectors to adjust the parameters of the managed element(s) and stores the values in the KB. The knowledge base is a repository that contains all the data or information, whether in raw or refined form.

Figure 1.4 Autonomic Computing Elements [White et al. (2004)]

1.6.2 AC Characteristics

An AC system has specific characteristics such as Self-Configuration, Self-Optimization, Self-Healing, Self-Protection, Self-Organization and Self-Inspection. Self-Configuration is the capability to configure itself according to the environment and desired goals. Self-Optimization is the characteristic by which an autonomic system performs its tasks and executes utilities in the best way. Self-Healing is the capability to keep the system in a consistent state at all times: whenever a failure threatens, the autonomic system remains consistent or recovers from failure using logs and backups. Self-Protection includes security and privacy auditing mechanisms. Self-Organization dynamically reorganizes and restructures the layout of stored data, indexes and other structures to improve system performance.
Through Self-Inspection, an autonomic system knows itself in order to make intelligent decisions; this includes information about its components, available resources, environment, current status, capabilities, limits, boundaries and interdependencies with other systems.

1.6.3 AC Levels

The evolutionary process of autonomic computing has five levels: Basic, Managed, Predictive, Adaptive and finally Autonomic. The Basic level is the starting point, where an individual manages the various jobs: setting up the IT infrastructure, monitoring it and making changes as required. At the Managed level, system management technologies are used to collect information from the systems, which the administrator uses and analyzes for future predictions and decisions. The Predictive level provides new technologies to find correlations among system components; the components are mature and advanced enough to initiate pattern recognition, prediction and suggestions for optimal solutions. At the Adaptive level, components use the available information and knowledge of the system not only to predict but to take different actions automatically. Finally, at the Autonomic level, business policies and objectives are monitored, and the system can change the policies, the objectives or both.

1.7 Autonomic Workload Management

In workload management there are three interrelated entities: workload, resources and objectives. The workload uses resources to meet the objectives of an organization; equivalently, resources are allocated through different approaches to a workload that has management objectives. The main concerns in workload management are workload frequency patterns, composition, intensity and required resources.
The adaptation of a workload consists of workload detection and workload control. Three techniques are used to derive a workload control plan: performance models, heuristics and thresholds. The functions performed by a workload manager include inspection, configuration, optimization, prediction, organization and adaptation of the workload. Generally, in DBMSs all these tasks are performed manually, as shown in Figure 1.5; large organizations hire database experts to take care of all workload activities.

Figure 1.5 Workload Management Activities

Workload management is not an easy task, especially in a large-scale enterprise where thousands of users generate workload or requests to the DBMS. This unmanageability can be addressed by making the DBMS self-managing (autonomic), so that it performs tasks such as memory, configuration and storage management and resource allocation automatically according to the current environment. Autonomic technology has high potential for incorporation into current DBMSs as well as DWs. The benefit of autonomic computing is that the system manages complexity itself according to set goals and objectives; it plays an important role in managing systems, database management systems and workloads. Workload management is a main feature of DBMSs and should be autonomic to improve DBMS efficiency. Autonomic computing technology can be combined with workload management to execute the workload in an efficient and responsive way. What is needed is a workload manager that manages the workload without affecting other requests, utilizes resources efficiently and handles all other workload-related matters.
The Autonomic Workload Manager (AWM) performs database tasks by collecting information about the workload type, intensity, resource demand, etc., with minimal human intervention. Previously, database vendors and practitioners have developed different techniques, models and tools to handle the workload in an automatic, self-managed way; these tools relate to workload scheduling, multiprogramming, prediction, adaptation and resource allocation. The functions performed by the AWM are shown in Figure 1.6: database tasks such as inspection, configuration and optimization are performed automatically, without human intervention, through monitoring, analyzing, planning and finally fixing.

Figure 1.6 Autonomic Workload Manager

1.8 Problem Statement

Previous database workload characterization, classification and scheduling techniques did not identify the effective workload parameters. Most techniques are based on assumptions, or take values from the query optimizer rather than calculating the actual values; in reality, there is an enormous difference between guessed and actual values. These techniques use status variables to manage the workload, but their values are taken during or after workload execution. They are therefore not feasible for the workload management task and are only useful for identifying the workload type. Nor did these techniques identify the percentage of OLTP or DSS in the database workload through Fuzzy Logic. For these reasons, previous workload management techniques give inefficient results and often lead to starvation, especially under a diverse and huge workload.
There is a need for a high-level workload management technique that can manage the database workload without affecting other requests, with efficient resource utilization and maximum CPU consumption. The main objective of this research is to develop a workload management approach consisting of characterization, scheduler and idleness detector modules, and to test the proposed approach on various real-time and benchmark workloads.

1.9 Research Questions

1) How will effective workload parameters be extracted from a given DB workload?
2) What will be the core components of the proposed DB workload management approach?
3) How will these components interact with each other to manage the DB workload?

1.10 Research Objectives

The aim of this research is to propose an autonomic workload management framework for DBMSs that manages the workload through characterization, scheduling and idleness detection techniques. To achieve this, the research proposes and adopts the following ideas and methods.

1. Selection of effective workload characterization and classification parameters to identify the DB workload type, i.e. OLTP and DSS
2. A workload characterization technique to detect and classify the workload
3. A scheduling technique to arrange the DB workload
4. Identification of idleness to rearrange the workload for efficient execution

1.11 Research Contribution

The main contributions of the proposed research include:

1. Design and implementation of workload management with characterization, scheduling and idleness detection modules.
2.
Selection of effective workload characterization parameters, and identification of the workload as OLTP or DSS through a Case Based Reasoning approach.
3. A Fuzzy Logic based scheduling technique in which incoming workloads are arranged according to workload type percentage and impact.
4. Detection of DBMS idleness in order to use free CPU cycles and improve the workload management technique.

Summary

In this chapter, the background and related concepts of workload management, AI techniques and autonomic computing were discussed. The autonomic computing part describes the origin, architecture, characteristics and levels of autonomic computing. The workload management part starts with workload basics: how the workload has evolved with the passage of time and technology, and the detection methods used for workload management. The chapter also discusses manual and autonomous workload management. Finally, it concludes with the statement of the problem, research questions, research objectives, research contribution, research plan and organization of the thesis.

Organization of the Thesis

Chapter 1 provides an introduction to and background on database management systems, the workload, its management and autonomic computing. The first part of the chapter discusses the workload, its entities, how the workload has evolved, workload detection methods, and both general and autonomic workload management in DBMSs. The autonomic workload management part gives an overview of the AC architecture, components and levels. Finally, the chapter provides the statement of the problem, research questions, objectives, applications and contribution.
Chapter 2 is dedicated to a detailed description of the state of the art in workload management techniques for Database Management Systems and Data Warehouses, particularly workload classification, characterization, scheduling and management. The chapter discusses the considerable research in these areas carried out over many years; after discussing the workload management techniques, the problems that exist in them are identified and highlighted. Chapter 3 elaborates the methodology, i.e. the steps followed in the proposed research, each of which is briefly explained. The objective of the chapter is to prepare readers for the details of the proposed system, which are explained in later chapters. The three steps of the proposed research, i.e. workload characterization, scheduling and idleness detection, are explained. The experimental design is also specifically discussed here, since for this type of research it is very important: this part of the chapter describes the database management system, the database workload and the benchmark dataset used during the experiments. Moreover, it explains the experimental setup, in which various OLTP- and DSS-like workloads are designed and executed; the proposed solution for managing the database workload in a DBMS is evaluated by running these experiments and obtaining their results. Chapter 4 discusses the proposed solution for workload characterization, where the workload is classified into OLTP and DSS on the basis of key workload parameters. These parameters are identified by executing a workload consisting of OLTP- and DSS-like queries.
During their execution, snapshots are recorded at an interval of 5 seconds to find changes in the workload parameters. The recorded snapshots are analyzed by observing these changes, and on the basis of this observation the workload is classified into OLTP and DSS. After proposing the CBR-based workload characterization approach, experiments are performed by executing various workloads to show its effectiveness and efficiency. Chapter 5 is dedicated to the workload scheduling technique, which operates on the workload type identified in the characterization process. Scheduling is performed by finding the degree or percentage of OLTP and DSS in a particular workload, calculated using Fuzzy Logic. The Fuzzy Based Scheduler (FBS) consists of three main steps: Fuzzification, Fuzzy Rules and Defuzzification. In the Fuzzification step, the membership function is computed from the workload parameter values and converted into a fuzzy set. Eight fuzzy rules decide whether a particular workload lies in the low, medium or high class with respect to its size and importance; the membership functions and fuzzy rules were identified by executing various workloads. A fuzzy set is constructed that associates each workload with its class. Once the workloads are grouped into classes, scheduling is performed on the basis of workload importance and the percentage of DSS and OLTP. Chapter 6 discusses the idleness detection approach and how it improves scheduling with maximum CPU utilization. A CBR contains training data based on the workload parameters; this training data is developed by executing various workloads, extracting parameter values, identifying CPU idleness, and storing the parameters together with the CPU idleness in the CBR for future workloads.
The idleness of an incoming workload is identified by searching for the exact or nearest (almost equal) workload in the CBR; once a match is found, the idleness of that workload is taken from the CBR. If there is no match, the workload is executed without idleness information the first time and stored as a new case in the CBR after execution. Once the CPU idleness is found, that time period is used for other processes, such as defragmentation and index rebuilding, and also to improve scheduling. The defragmentation process executes only during the detected idle time; when the idle time ends, the defragmentation or other process either has completed or is temporarily stopped again. Chapter 7 presents the experiments and a discussion of the results achieved with the proposed workload management approach. A comprehensive experimental study is elaborated, including several experiments, the experimental environment and the workload benchmarks. The effectiveness of the proposed research is experimentally evaluated, and its efficiency in terms of runtime and waiting time is compared with other techniques. The chapter also compares the approach with other well-known workload management schemes: First In First Out (FIFO), also called First Come First Served (FCFS), Priority Based Scheduling (PBS) and Smallest Job First (SJF). All results are shown in tables and graphs. The last section of the chapter is dedicated to the conclusion and future work. Finally, the Appendix (Database Structure) presents the table structures and the queries required to build the database for the experiments, including the columns, types, and primary and foreign keys.
Chapter 2 Related Work

In Chapter 1, we described the concept of the workload, its evolution and workload management techniques, along with autonomic computing and how traditional workload management can evolve into self-managed workload management. Chapter 2 is dedicated to the state of the art in the existing workload characterization, classification, scheduling and management techniques that have been introduced and used in DBMSs and DWs. For convenience and easy understanding, the literature is divided into characterization and scheduling sub-sections: the characterization section discusses the various approaches used to characterize and classify the database workload, while the scheduling section discusses the techniques used in DBMSs and DWs to arrange the workload in a specific manner to improve efficiency.

2.1 State of the Art in Workload Management

Research on workload management in DBMSs and DWs has been carried out for many years. A large number of white papers, research papers and case studies related to characterization, scheduling and management in DBMSs and DWs have been surveyed, and many researchers, practitioners and vendors are contributing to make these systems more self-managed and reliable with maximum efficiency. Here, we divide the literature review into two sections, on characterization and on scheduling techniques. The following sections discuss the works most closely related to the proposed solution for workload characterization and scheduling in DBMSs and DWs.

2.1.1 Workload Characterization Techniques

The characterization process is used to identify workload classes by partitioning the global workload into smaller sets or components composed of transactions or requests.
The following section discusses previous research on workload characterization in DBMSs. A model is proposed [Zewdu et al. (2009)] to identify and characterize the workload and to identify the variables that affect it. Workload classification into OLTP and Decision Support System (DSS) is performed through two algorithms, i.e. Quick, Unbiased & Efficient Statistical Trees (QUEST) and Classification & Regression Tree (CRT). The degree of correlation is determined through hierarchical clustering. The key status variables that affect the workload are identified and used to measure the workload cost. The proposed approach is validated by analyzing and experimenting with the TPC workload benchmark queries and transactions in a MySQL 5.1 database. After performing a number of experiments, four (4) status variables are selected that are vital in identifying the workload type: Com_ratio, Innodb_log_writes, Qcache_hits and Questions. [Elnaffar et al. (2002), Elnaffar et al. (2004)] proposed a workload characterization model that automatically classifies the workload into OLTP and DSS. The classifier works on the basis of workload characteristics and arranges them through decision tree induction. The proposed classification model is validated through a number of experiments performed on the browsing and ordering profiles of the TPC-W benchmark. The contribution of the research is to analyze the performance measurements and introduce a classifier for OLTP and DSS workloads. However, there is no discussion of which distinct workload parameters are most effective for distinguishing the workload. REDWAR (RElational Database Workload AnalyzeR) by [Yu et al.
(1992)] is an offline tool for the DB2 environment, used to characterize an SQL trace; it provides structural information along with statistics of the query under execution. It analyzes SQL statements and views with respect to their composition, and classifies the data. During analysis, it uses statistical summaries (correlation, distribution, variation) and the runtime behavior of the workload. The efficiency of the database system is increased by identifying criteria for SQL statements or the workload. REDWAR generates a report that can be used to plan the physical design and build benchmark workloads. However, the research does not provide automatic recommendations for the physical design. [Wasserman et al. (2004)] presented a characterization technique for Business Intelligence (BI) workloads based on clustering, where the Singular Value Decomposition (SVD) and SemiDiscrete Decomposition (SDD) clustering algorithms are applied. Their approach is based on resource-related attributes such as CPU consumption, sequential and random I/O rates, and join degree. The sizing technique works by collecting input data from the user; for each workload class the data is validated and the resource demand is identified. The total resource demand is calculated by aggregating the resource demand of each class with the hardware configuration. Finally, a report is generated that reveals a priority-wise list of hardware configurations. The characterization of the workload is done in five steps: component identification, selection of characterizing parameters, input data normalization, workload grouping into classes, and finally identification of each class. The proposed characterization technique is performed on TPC-H-benchmark-like data, but with some assumption-based parameters.
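Two of the five characterization steps just listed — input-data normalization and grouping into classes — can be sketched as below. This is a simplified stand-in using min–max scaling and nearest-centroid grouping; the actual work applied SVD/SDD clustering, and the attribute columns and centroids here are illustrative assumptions.

```python
# Min-max normalization followed by nearest-centroid grouping of workload
# measurements into classes. Purely illustrative of the pipeline shape.

def min_max_normalize(rows):
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(row, lo, hi)] for row in rows]

def group_by_centroid(rows, centroids):
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(centroids)), key=lambda i: dist2(row, centroids[i]))
            for row in rows]

# columns: CPU consumption, random-I/O rate, join degree (assumed attributes)
raw = [[90, 5, 1], [85, 7, 2], [10, 80, 6], [12, 75, 5]]
norm = min_max_normalize(raw)
labels = group_by_centroid(norm, [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]])
# The first two measurements group into one class, the last two into another.
```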
The technique is also limited in that it considers only one parameter for characterization, i.e. user resource demand. A characterization technique in which workload models for an E-commerce environment are built is introduced by [Menasce et al. (1999)]. In this research, a CBMG (Customer Behavior Model Graph), or state transition graph, is introduced for characterizing the workload. This model graph represents a similar navigational pattern for a group of customers who perform the same activities. The workload model with related parameters is identified, and workload characterization is performed through a clustering algorithm. The technique is evaluated with the help of different experiments. The proposed technique has some problems: there are many session drops under a large number of sessions or maximum load, and there is no mechanism to manage or recover these dropped sessions. An approach [Holze et al. (2008)] to monitor the workload for autonomous databases is introduced that is based on an n-Gram model. The proposed technique is lightweight due to the use of a simple API and has the ability to find workload shifts during execution. The monitoring process is divided into two sub-processes: first, continuous monitoring, where the currently executed workload is compared with the typical workload model, and second, configuration analysis, performed when a major shift is found. The technique is validated through experiments performed on the DVD Shop and reporting data, taking different workload shift scenarios. However, the research did not discuss how the workload can be managed more efficiently. Black-box attributes can also be used to characterize DB workloads. The work in [Liu et al. (2004)] uses black-box attributes, which include response time, CPU usage, page references and disk utilization, to characterize the E-commerce workload.
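The n-Gram monitoring idea described above — compare the current workload against a typical workload model and flag a shift — can be sketched as follows. The distance measure, threshold and statement-type alphabet are assumptions for illustration, not Holze et al.'s exact design.

```python
# Model the "typical" workload as a distribution over n-grams of statement
# types; flag a shift when the current window's distribution diverges.
from collections import Counter

def ngrams(seq, n=2):
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))

def shift_detected(typical, current, n=2, threshold=0.5):
    a, b = ngrams(typical, n), ngrams(current, n)
    total_a, total_b = sum(a.values()), sum(b.values())
    keys = set(a) | set(b)
    # L1 distance between the two n-gram frequency distributions (0..2)
    dist = sum(abs(a[k] / total_a - b[k] / total_b) for k in keys)
    return dist > threshold

typical = ["SELECT", "UPDATE"] * 20          # steady OLTP-style mix
shifted = ["SELECT", "SELECT", "JOIN"] * 20  # reporting-style mix
```

A shift from `typical` to `shifted` would trigger the second sub-process, the configuration analysis.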
The authors argued that CPU utilization can be changed by varying the dynamic cache. For the experiments, they used various types of dynamic caches, namely Query, Table and Hybrid caches. It was found that the response time of the database server and the number of references are reduced due to the dynamic cache, even in peak hours. The authors also concluded that the efficiency of the Hybrid cache is better than that of the Table cache, which in turn is better than the Query cache. The experiments are performed on the TPC-W benchmark to validate the black-box approach. A summarized analysis of the database workload characterization techniques for workload management in DBMSs is presented in Table 2.1 (where the last column "Exp/Imp" indicates whether the particular research is experimented with or implemented in some database management system).

Table 2.1 Analysis of the Workload Characterization Techniques in DBMSs

| Characterization Technique | Purpose | W Type/ Benchmark | Static/ Dynamic | Algorithm/ Model | Exp/ Imp |
| Zewdu et al. (2009) | A model is introduced to identify and characterize the workload, and to identify the effective variables that can be used for workload characterization. | TPC-C, TPC-H | Dynamic | Hierarchical clustering, QUEST and CRT | Imp |
| Elnaffar et al. (2002), (2004) | A workload characterization model is proposed that classifies the workload into OLTP & DSS automatically. | TPC-W | Dynamic | SPRINT – Scalable Decision-Tree based Algorithm | Exp |
| Yu et al. (1992) | A tool that is used for the characterization of the workload and to plan the physical design. | Accounting Type Data | Static | - | Imp |
| Wasserman et al. (2004) | A characterization technique for the BI workload is presented. | TPC-H | Dynamic | SVD & SDD Algorithm | Imp |
| Menasce et al. (1999) | Manage workload in E-commerce type applications. | TPC-W | Dynamic | Queuing Model | Exp |
| Holze et al. (2008) | An approach to monitor the workload for autonomous databases that uses an n-Gram model & can find workload shifts during execution. | DVD Shop & Reporting Data | Dynamic | n-Gram Model, Suffix Tree | Imp |
| Liu et al. (2004) | Black-box attributes (response time, CPU usage, page references & disk utilization) to characterize the E-commerce workload. | TPC-W | Dynamic | Least Frequently Used & Least Recently Used | Imp |

2.1.2 Scheduling Techniques

Scheduling is a way to arrange the queries or workload in some specific order to decrease their average waiting time and speed up the execution process. The previous research on workload scheduling techniques is discussed in the following section. [Surajit et al. (2005, 2007)] proposed a framework to stop and restart queries during their execution in order to manage the workload efficiently. The restart approach re-executes a stopped query from the position where it was stopped. This technique does not save all of the execution information, but only selected information from the past execution, to reduce memory overhead. This method also reduces the running time of re-executed queries. The proposed technique is validated by performing experiments on real and benchmark data. The proposed approach is limited to handling a single Query Execution Plan (QEP) and does not consider parallel QEPs. There is also no dynamic way in the proposed technique to maintain the restart plan when the source records (past executed records) are modified. The research discussed in [Chandramouli et al. (2007, 2007)] addresses query suspension and resumption with minimum overhead, where the authors proposed the induction of asynchronous checkpoints for each cardinality in a query. The technique solves the optimization problem using mixed-integer programming. An optimized plan for suspension is proposed, which dumps the current state to disk and goes back to the previous checkpoint.
The optimized plan performs its tasks (suspension or resumption) with less overhead by observing the time constraint during suspension. In case of query suspension, all resources are released, while in case of resumption the required resources are re-acquired. The technique is also implemented in a tool named PREDATOR, and it is shown that this tool gives better results than others. The technique is proven experimentally for simple and heavy workloads; it is observed that it meets the suspend time constraint, thereby reducing the overhead. The technique uses a hybrid approach for query suspension where the suspend-time overheads are negligible, and due to this better results can be seen. The technique allows the whole query to be suspended, in contrast to previous techniques where switching is performed between individual operators; memory wastage is higher in those techniques due to switching, and they show worse results for unexpected suspends. However, the proposed technique does not re-optimize the given query. An external scheduling technique for the OLTP workload is proposed by Schroeder et al. (2006), in which they identified the main parameters and used a feedback controller to automatically select an appropriate MPL, based on a queueing-theoretic model and a Markov chain. Priorities are also used for the external scheduling. The technique is validated through experiments, and it is observed that external scheduling can be as effective as internal scheduling when a suitable MPL is selected. The research provides a mechanism that selects a suitable MPL on the basis of two parameters, i.e. disk and CPU. The technique does not consider the impact of queries on each other. Low-priority transactions are executed only when there are no high-priority transactions.
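The feedback-controlled MPL selection described above can be illustrated with a toy controller: admit more transactions while there is headroom, fewer when response time degrades past a target. The step size, target and bounds are illustrative assumptions, not Schroeder et al.'s queueing-model controller.

```python
# A toy feedback controller for the multiprogramming level (MPL):
# shrink it when the measured response time exceeds the target,
# grow it when there is clear headroom, otherwise hold steady.

def adjust_mpl(mpl, response_time, target_rt, lo=1, hi=64):
    if response_time > target_rt and mpl > lo:
        return mpl - 1   # system overloaded: admit fewer transactions
    if response_time < 0.8 * target_rt and mpl < hi:
        return mpl + 1   # headroom available: admit more
    return mpl           # within the acceptable band: hold steady

mpl = adjust_mpl(10, response_time=2.5, target_rt=2.0)  # -> 9
```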
Moreover, the paper does not provide a mechanism for assigning priorities to different transactions. The technique improves high-priority transactions by a factor of 12.1, while the average penalty for low-priority transactions is about 16%. [Mumtaz et al. (2008, 2008, 2009)] discussed the impact of query interaction on the workload using a planned experimental and statistical modeling approach. This approach is portable, as it works without prior assumptions about the internal knowledge of the database or the cause of query interaction. A framework, Query Shuffler (Qshuffler) – an interaction-aware query scheduler for Business Intelligence workload management – is introduced and shown in Figure 2.1. Qshuffler considers the impact of queries on each other, as it has a vital role in performance, and proposes an experimental technique to handle the query interaction problem. Different requests are given by the users in the form of queries; Qshuffler classifies these queries according to their type and arranges them in a way that minimizes the total execution time of the workload. It gives an optimal solution for scheduling, as it is based on linear programming, and it minimizes dropped requests using a non-preemptive shortest-remaining-time-first scheduling technique. Qshuffler is implemented in IBM DB2 and evaluated with the TPC-H benchmark workload. It achieves four times the performance of the FIFO scheduler of database systems. However, in Qshuffler large jobs have to wait for a long time, as it uses the SJF algorithm for scheduling. Moreover, the average execution time is larger, as it uses a non-preemptive SJF approach.
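The SJF behavior criticized above can be seen in a few lines: a non-preemptive shortest-job-first order minimizes the mean waiting time but makes the largest job wait behind every smaller one. The run times are illustrative and assumed distinct.

```python
# Compare per-job waiting times under SJF and FIFO orderings.

def waiting_times(run_times, policy="SJF"):
    order = sorted(run_times) if policy == "SJF" else list(run_times)
    waits, elapsed = {}, 0
    for t in order:
        waits[t] = elapsed   # a job waits for everything scheduled before it
        elapsed += t
    return waits

jobs = [30, 2, 4, 1]                # query run times (e.g. seconds)
sjf = waiting_times(jobs, "SJF")    # the 30-second job waits 7 s
fifo = waiting_times(jobs, "FIFO")  # under this arrival order it waits 0 s
```

Summing the waits shows SJF's lower mean waiting time, while the large job's wait grows with every small job ahead of it — exactly the starvation risk noted in the critique.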
Figure 2.1 Query Shuffler (Mumtaz et al., 2009)

[Mehta et al. (2009)] define the design criteria for a Mixed Workload Scheduler (MWS) and use them to design rFEED, i.e. an MWS that is fair, effective, efficient, and differentiated. They proposed a non-preemptive approach for scheduling, since it can be expensive to preempt the really small queries that make up the bulk of a BI workload. The approach uses the optimizer's estimated execution cost as an approximation, as the authors consider the approximation sufficient, with no need for precise values. Moreover, a single queue is used for the scheduler and multiple queues for execution. They also assumed that all queries have the same normalized service level. The authors simulated real workloads and compared this system with models of the current best-of-breed commercial systems. The suggested methodology uses a non-preemptive scheduling scheme that gives inefficient results for time-critical systems due to its poor responsiveness, and ultimately there is a chance of starvation and hanging. The optimizer's estimated execution cost is used, but in real life approximate values never give the actual results. The proposed methodology assumes the same service level, whereas in real life the workload or queries do not have the same service level. The approach sets the MPL statically and does not consider the interaction among the queries in the workload. A BI workload manager [Mehta et al. (2008, 2008)] for the enterprise data warehouse is introduced, in which queries are admitted in the form of batches. It consists of three components: admission control, scheduler, and execution control. Admission control works on the basis of memory requirements, in the form of batches, where a batch consists of those queries whose combined memory requirement fits within the available memory of the system.
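The batch-admission idea just described can be sketched as a greedy fill: admit queries into a batch until their combined memory requirement would exceed the memory currently available. This is an illustrative assumption about the batching rule; the real admission control in the BI batch manager is more involved.

```python
# Greedily form one admission batch from pending queries,
# bounded by the currently available system memory.

def form_batch(queries, available_memory):
    """queries: list of (query_id, memory_required) pairs."""
    batch, used = [], 0
    for qid, mem in queries:
        if used + mem <= available_memory:
            batch.append(qid)
            used += mem
    return batch     # queries not admitted wait for the next batch

batch = form_batch([("q1", 400), ("q2", 700), ("q3", 300)], 1000)
# q2 does not fit alongside q1, so it waits for a later batch.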
For scheduling, the authors proposed that the query with the maximum memory requirement has the highest priority. The manager uses Priority Gradient Multiprogramming (PGM) to prevent underload and overload problems. Finally, some experimental results are discussed to evaluate the performance of the BI batch manager. The workload is executed in the form of batches to avoid thrashing, providing an optimal solution for all types of workload. This approach requires no changes to the internals of the DBMS to manage small and heavy workloads. The manager benefits from the added predictability of queries, stabilizes the system, and is less sensitive to estimation errors. It uses a feed-forward loop that stabilizes the system (against underload or overload) with maximum ability to tolerate prediction errors. The feedback control technique samples the performance metric and controls the incoming requests optimally: when the performance metric exceeds the threshold, the rate of admitting requests is reduced, and vice versa. It gives high throughput for commercial and enterprise-class DBMSs; however, it has no ability to handle interactive and ad-hoc queries. The sub-batches are created on the basis of a single parameter, memory. As the query with the largest memory requirement gets the highest priority, queries with small memory requirements have to wait for a long time; as a result, throughput decreases and starvation occurs. Since PGM executes first the query that requires the largest memory, the same problem as with Largest Memory Priority (LMP) is experienced in PGM. Query Progress Indicators (PIs) are used to provide the step-by-step status of a query's execution. A single-query PI [Chaudhuri et al. (2005)] is proposed with a graphical user interface for relational DBMSs that keeps track of work completion and estimates the remaining execution time.
The proposed PI starts by taking the estimated cost from the query optimizer and calculates the remaining query execution time using statistics. It also monitors the query execution speed continuously, and the remaining query execution time is estimated by dividing the estimated remaining query cost by the query execution speed. The proposed PI is limited to a single query and does not consider the impact of one query on another, so during the estimation of the remaining query execution time, the load and query progress are considered in isolation. It therefore often provides inaccurate estimates, since in concurrent query execution one query can significantly slow down the progress of another. In single-query PIs, a greedy algorithm is used to speed up the process: an optimal victim query is selected, then the next optimal victim query is chosen, and the process continues until all victim queries are obtained. Later, a multi-query PI [Luo et al. (2006)] was proposed, which represents the progress of running queries and considers the impact of queries on each other, in contrast to the previous single-query PI techniques. The multi-query PI works by considering the remaining execution time of the concurrent queries with the available statistics, and predicts the future execution speed of incoming queries. On the basis of this estimation, it can also predict future queries and has the ability to manage the current workload efficiently. The indicator not only provides a visualization of the running queries but is also helpful in managing the database workload efficiently. This technique takes a workload management problem as input and provides its solution through the information provided by the multi-query PI. The proposed technique is implemented in PostgreSQL and examined with respect to the remaining query execution time. The technique is the first attempt at a multi-query PI.
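The single-query PI estimate described above reduces to one division: remaining time equals the remaining optimizer cost divided by the observed execution speed. The numbers and units below are illustrative.

```python
# Single-query progress-indicator estimate:
# remaining time = remaining cost / observed execution speed.

def remaining_time(total_cost, work_done, speed):
    """speed: cost units processed per second, measured so far."""
    return (total_cost - work_done) / speed

# 10,000 cost units total, 4,000 done, processing 500 units/s:
eta = remaining_time(10_000, 4_000, 500)   # 12.0 seconds
```

The critique in the text follows directly: `speed` is measured in isolation, so if a concurrent query slows this one down, the divisor is stale and the estimate is wrong.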
Compared to single-query PIs, the multi-query PI considers the impact of concurrent queries on each other and predicts the incoming queries, considering priority and cost. The information provided by these indicators is helpful for workload management tools in taking more intelligent decisions. The indicator is able to predict future queries accurately, even when the initial estimates are wrong, by detecting and correcting its estimates. It monitors the system at all times and manages the workload more dynamically. The multi-query PI is adaptive, as it revises its decisions when it finds significant changes compared with the predicted results. This adaptive behavior of the multi-query PI is consistent with the trends of Autonomic Computing. DB2 Query Patroller (QP) [IBM Corporation (2003), Lightstone et al. (2002)] is a management tool used to streamline requests according to the available resources. It can accept the database workload from the user and analyze it. On the basis of this analysis, it prioritizes and schedules the workload according to different query classes. A class is built by considering a cost range and a Multi-Programming Level (MPL) threshold. The cost range is provided by the query optimizer, which calculates the resource demands. The MPL threshold is the maximum number of requests in a class that can execute at one time. Remaining queries are placed in a queue when the threshold is reached, and are released for execution in a class whenever the level drops below the threshold. It also informs the user about the status of tasks. QP provides sufficient resources for the given workload, and by using a profile (created by the administrator) saturation by long-running queries can be avoided. QP controls the flow of requests proactively.
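The class-based admission just described — route a query to a class by cost range, run at most the class's MPL threshold at once, queue the rest — can be sketched as follows. The class boundaries and MPL values are illustrative assumptions, not QP's actual configuration.

```python
# Sketch of per-class MPL admission: run up to `mpl` queries at once,
# queue the rest, and promote a waiter when a running query finishes.

class QueryClass:
    def __init__(self, cost_range, mpl):
        self.cost_range, self.mpl = cost_range, mpl
        self.running, self.queued = [], []

    def submit(self, query_id):
        if len(self.running) < self.mpl:
            self.running.append(query_id)   # slot free: run immediately
        else:
            self.queued.append(query_id)    # threshold reached: wait

    def finish(self, query_id):
        self.running.remove(query_id)
        if self.queued:                     # promote the next waiter
            self.running.append(self.queued.pop(0))

cheap = QueryClass(cost_range=(0, 1_000), mpl=2)
for q in ["a", "b", "c"]:
    cheap.submit(q)          # "c" queues because the class is full
cheap.finish("a")            # "c" moves from the queue to running
```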
It provides information about the completion of requests and finds trends in queries and user workloads, as well as the frequently used indexes and tables. QP enhances performance by monitoring system utilization, canceling or rescheduling queries, and identifying trends in database usage. A query submitter uses it to monitor submitted queries, store query results for future use, and customize query submission. The submitter assigns higher priorities to some users so that their queries run with less delay within the class. QP suspends high-load queries so that they can be cancelled or scheduled to run after peak hours, and tracks the query process. By performing these steps, smaller queries do not get stuck and system resources are used properly. QP is based on a client-server architecture and consists of three components: the Query Patroller server, the Query Patroller Center, and Query Patroller command line support. The DBA uses QP to assign resource privileges at the user and system level. Query Patroller monitors the given workload, performs analysis, and prioritizes and schedules the incoming requests from the users. It limits the flow of long-running queries to avoid saturation and ensures better resource utilization on the basis of a profile (created by the administrator). In Oracle, automatic SQL tuning is performed through the query optimizer and the SQL Tuning Advisor [Dageville et al. (2004)]. Query optimization has great importance, especially in the case of complex workloads. The SQL Tuning Advisor is an interface between the query optimizer and the user. It generates tuning recommendations for SQL statements or the workload. These recommendations are provided to the user, who either accepts or rejects them.
When the user accepts the recommendations, they are stored in the SQL profile, which is further utilized by the optimizer for the generation of the best Query Execution Plan (QEP). The SQL Tuning Advisor makes decisions on the basis of information provided by the query optimizer, the Automatic Database Diagnostic Monitor (ADDM) and the Automatic Workload Repository (AWR). The approach sets the MPL statically and does not consider the interaction among different queries. The SQL Tuning Advisor improves execution plans through the SQL Profiling concept; on the basis of cost-based access paths and what-if analysis, it generates tuning recommendations for the incoming workload. It has very strong analysis capabilities, performing a number of analyses such as estimate, access, parameter-setting, statistical and SQL structure analysis. Whenever the query optimizer fails due to a heavy or complex workload, it assists by stabilizing the system and generating the QEPs. The Oracle Database Resource Manager (ODRM) by Rhee et al. (2001) allows the DBA to logically divide the workload into distinct units and allocates CPU resources to these units without extra overhead. During peak hours the OLTP workload should be given preference over the DSS workload, and vice versa. ODRM has a scheduling mechanism that is used for fixed time intervals and controls the number of active sessions executing at a time. When the available slots for active sessions are filled, the remaining sessions are queued until a slot becomes available. Through ODRM, the administrator can define and set scheduling policies for the workload based on the predicted execution time of a query. The major components of ODRM are the Resource Consumer Group, the Resource Plan and the Resource Plan Directive, which interact to achieve the objective. [Krompass et al.
(2007, 2008, 2009)] introduced an adaptive QoS management technique, in which they used an economic model to handle individual requests of BI and OLTP workloads proactively. They provide a systematic way to arrange different requests by dividing them into classes based on cost and time limit. They also proposed a model that calculates the cost of a request by differentiating the underachievement and marginal gains of a Service Level Objective (SLO) threshold. The framework is evaluated for effectiveness by performing experiments on different workloads. The framework is scalable, as it can implement new workload management concepts alongside previously implemented policies. The framework uses an economic model with two economic cost functions (opportunity cost and marginal gains), where penalty information is attached to the queries. The penalty information is used to order pending query execution efficiently. The scheduling policy used for the OLTP workload in this framework is enhanced by considering the combined effect of priorities and SLOs, rather than considering priority alone. A Priority Adaptation Query Resource Scheduling (PAQRS) algorithm is devised by [Pang et al. (1995)], which is based on the Priority Memory Management (PMM) algorithm and deals with multi-class query workloads. The algorithm reduces missed deadlines according to the miss distribution defined by the administrator. It works by allocating memory and assigning priorities while considering the resource usage, workload characteristics, and observed performance. Whenever the workload changes, a new MPL is calculated, and memory reallocation and priority adjustment are performed accordingly. Two techniques, miss ratio projection and resource utilization heuristics, are used to calculate the new MPL. In the miss ratio projection method, the previous MPL and miss ratio are used as the parameters.
It is used to schedule complex types of workload and reduces the number of missed deadlines, thereby making efficient use of system resources. It has a bias control mechanism, which regulates the distribution of missed deadlines among different query classes. The MPL and memory are allocated on the basis of regular and reserve group quotas, and the priority of regular queries is higher than that of reserve queries. By adopting this policy, PAQRS makes adjustments between the miss ratio and the target distribution. However, it cannot handle heterogeneous transactions and is limited to workloads consisting of homogeneous queries only. Multiple service classes with diverse goals are also not considered. Its performance degrades whenever there is an increase in workload fluctuation. The research also suggests how to adjust the policy threshold, but there is no discussion of automatic policy updates with respect to changing goals and objectives. A framework for workload adaptation by Niu et al. (2006) is proposed that has two components, i.e. workload detection and workload control. Workload detection finds the changes and provides information about the workload. The framework also has four functional components, namely workload characterization, performance modeling, system monitoring and workload control. The effectiveness of the framework is demonstrated by implementing the query scheduler and performing a large number of experiments. The query scheduler can directly handle the OLAP workload, whereas the OLTP workload cannot be handled directly; the authors proposed a technique of indirectly controlling the OLTP workload through direct control of the OLAP workload. Due to this enhancement of the query scheduler, both workloads can be handled to achieve the performance goals.
Performance is improved by a prediction process using a Kalman filter, as performance prediction plays a vital role in workload adaptation. The Kalman filter is a very powerful tool used to estimate past, present and future states. The filter provides an optimal solution for linear processes and sub-optimal solutions for non-linear processes. Through experiments, more accurate prediction results are obtained with fewer unpredicted SLO violations. In short, the research contributes a general framework for performance-oriented workload adaptation, a prototype implementation of the framework (the Query Scheduler), a cost-based performance model for the workload control plans, and improved prediction accuracy through the Kalman filter. The experiments on the Query Scheduler are performed on a stable workload, which does not represent a real environment, as in practice the database workload changes rapidly. During the experiments, the total cost of a query, instead of its detailed cost, is used as a parameter, which may lead to wrong results. Moreover, the scheduler is confined to linear workloads; however, in a real environment the database workload behavior mostly remains non-linear. A summarized analysis of the database workload management techniques in DBMSs is presented in Table 2.2 (where the last column "Imp/Exp" indicates whether the particular research is experimented with or implemented in some database management system).

Table 2.2 Analysis of the Workload Management Techniques in DBMSs

| Technique | Purpose | W Type/ Benchmark | Static/ Dynamic | Algorithm/ Model | Imp/ Exp |
| Surajit et al. (2005, 2007) | A framework to stop and restart the queries during their execution to manage the workload efficiently. | TPC-H | Dynamic | Heuristic Hash Join | Imp |
| Chandramouli et al. (2007, 2007) | Query suspension and resumption is discussed, where asynchronous checkpoints are used for all the cardinalities in a query. | TPC-H | Dynamic | Mixed-Integer Programming | Imp |
| Schroeder et al. (2006) | A technique introduced for external scheduling to handle the OLTP workload. | OLTP/ TPC-C | Dynamic | Queuing Theoretic and Markov Chain Model | Exp |
| Mumtaz et al. (2008, 2008, 2009) | Schedule the workload by considering the impact of the queries on each other. | TPC-H | Dynamic | Statistical Modeling (Linear & Quadratic Models, Regression Trees), Smallest Job First | Imp |
| Mehta et al. (2009) | A design criterion for an MWS is presented with simulation results. | BI/ OLAP | Static (MPL Selection) | Non-preemptive Priority Scheduling | - |
| Mehta et al. (2008, 2008) | A workload management technique to execute the workload in batches. | TPC-H | Dynamic | PGM | - |
| Chaudhuri et al. (2005) | Display the step-by-step status of query execution. | TPC-R | Dynamic | Greedy Algorithm | Imp |
| Luo et al. (2006) | Manage the workload by providing the step-by-step status of query execution, considering the impact of concurrent queries. | TPC-R | Dynamic | Greedy Algorithm | Imp |
| Lightstone et al. (2002) | Prioritize and schedule the workload according to query classes. | OLTP/ TPC-C | Dynamic | Administrator assigns privileges at user & system level | Imp |
| Dageville et al. (2004) | Tunes the process of QEP generation and provides recommendations for tuning. | OLTP/ TPC-C | Dynamic | What-if Model, Cost-Based Model | Imp |
| Rhee et al. (2001) | Logically divides the workload into small parts and allocates the resources. | OLTP, OLAP | Dynamic | DBA divides the workload & allocates resources to these units | Imp |
| Krompass et al. (2007, 2008, 2009) | An adaptive QoS management technique to handle the BI and OLTP workload. | BI/ OLAP | Dynamic | - | - |
| Pang et al. (1995) | An algorithm that reduces missed deadlines, allocates memory & assigns priorities. | DSS/ TPC-H | Dynamic | PMM Algorithm | - |
| Niu et al. (2006) | A performance-oriented workload adaptation framework. | OLTP/ TPC-C, BI/ OLAP | Dynamic | Kalman Filter, Queuing Model | Proto Imp |

2.1.3 Limitations in Previous Research Work

In the sections above, we have discussed the research that has been conducted to manage the workload with various approaches. Each paragraph elaborates a summary of the particular research and a critical review: the summary discusses the approach, workload and environment in which the research was carried out, while the critical review lists the problems identified. Some common problems identified in the previous workload management techniques are as follows. Previous workload characterization and scheduling techniques did not identify the effective workload parameters. Most of the techniques are based on assumptions, or take values from the query optimizer rather than calculating the actual values; in reality, there is a large difference between guessed and actual values. None of the previous characterization techniques for workload management in DBMSs identified the OLTP or DSS percentage in the workload using Fuzzy Logic. There is a need for a workload management approach that can manage the database workload without affecting other requests, with efficient resource utilization, while handling all other related matters. The work is followed by testing the proposed research on various real-time and benchmark workloads.

Summary

In this chapter, the problem of workload management and its complexity were reviewed. The basic methodologies, frameworks and algorithms that have been used for workload management were analyzed to find the main ways in which the complexity can be reduced and the workload managed in a better way.
The summary of each research work is provided and critically analyzed to find its strengths and weaknesses. The results gained by studying and analyzing the related works are also discussed. At the end of each section, a table is given which summarizes the previous research that has been done in the context of workload management.

Chapter 3 Research Methodology

This chapter is dedicated to providing a general idea of the proposed solution to manage the database workload. The chapter starts by providing an overview of the problem and discusses the research methodology to achieve the proposed workload management task through characterization, scheduling and idleness detection. In fact, it explains the road map and the steps that will be carried out to achieve the specified goals of the research. Finally, the experimental design is explained, including hardware and software details, workloads, performance metrics and the evaluation methods.

3.1 Research Problem Overview

Several database researchers, practitioners and vendors are working to manage the workload in DBMSs using different techniques. However, there are certain problems with the previous workload management techniques. The techniques based on decision trees lead to several problems: first, handling a decision tree becomes difficult as the tree grows; secondly, tree pruning is performed to obtain the result, which may lead to wrong results; and lastly, such techniques are difficult to set up in these circumstances. The research performed on the basis of status variables has the problem that most techniques take workload parameter values on an assumption basis, or from the query optimizer, rather than calculating their actual values.
However, in reality there is a large difference between the guessed and actual values. The other problem with these techniques is that the values of the status variables are recorded during the workload execution, which also slows down the process. These techniques use the status variables to manage the workload, but their values are taken during or after the workload execution. For this reason, these techniques are only useful to identify the workload type and are infeasible for the workload management task.

3.2 Proposed Solution

The proposed solution to manage the workload in a DBMS consists of three modules, namely Workload Characterization, Scheduler and Idleness Detection, as shown in Figure 3.1. The responsibility of the characterization module is to divide the workload into OLTP and DSS after identifying the effective workload parameters, and to pass its output, that is, the workload parameters' values, to the next module. The scheduler module, after getting the parameter values, will calculate the impact of each workload and schedule them. Finally, the idleness detection module will find the CPU idleness for the particular workload and assign the idle period to some other suspended process. These are the abstract steps; the next chapter is dedicated to explaining them in detail.

Figure 3.1 Components of Workload Management Framework

3.2.1 Characterization

The responsibility of the characterization module is to analyze the incoming workload to get the effective parameters which are vital to know the workload type. The workload will be observed at the query level with respect to workload parameters such as type, cost, priority and dependency (with other queries), as shown in Figure 3.2. The steps to achieve the workload characterization are explained as follows:

1.
The incoming database workload is first divided into two types (OLTP and DSS) by identifying the key workload parameters. The parameter identification will be performed by analyzing the workload parameters and their respective values, which change from time to time with the execution of the workload. After careful analysis, useful and effective workload parameters will be extracted.

2. During the identification and classification of the workload, it is necessary to consider the trade-offs of all workload parameters. The DSS type of workload needs an immediate response, as such requests are usually generated by higher management; on the other hand these requests are very large and require many resources. OLTP transactions are generated by lower management but are very small and require minimal resources. Similar trade-offs should be analyzed in other workload-related attributes. Previous research on workload classification over MySql is discussed in [Zewdu et al. (2009)], where the workload parameters used include Com_ratio, Innodb_log_writes, Qcache_hits and Questions. Their work identified the workload type on the basis of these parameters.

3. An AI or machine learning technique such as Case-Based Reasoning (CBR), the Bayesian theorem, a clustering algorithm etc. will be used for workload classification.

4. Before applying the technique, training data will be prepared that will provide the basis and help to make a decision about the workload type. The training data will be constructed by executing the database workload and storing its associated workload parameters' values at a specific interval. The suitable time interval for snapshots will be determined through experiments.

5. The impact of the workload will be calculated after the division of the workload into OLTP and DSS.

6. The output of this module will be the workload type and the parameters' values.
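The training-data construction described in steps 4 and 5 can be sketched as follows. This is a minimal illustration: the status-variable names follow MySQL, but the snapshot values and the labelling are invented for the example.

```python
# Sketch: build one labelled training case by snapshotting status
# variables before and after a workload runs (step 4).
# Variable names follow MySQL status variables; values are illustrative.

def delta(before: dict, after: dict) -> dict:
    """Net effect of a workload on each status variable
    (value after execution minus value before)."""
    return {k: after[k] - before[k] for k in before}

def record_training_case(before, after, workload_type):
    """One labelled case for the training data."""
    case = delta(before, after)
    case["type"] = workload_type
    return case

before = {"Key_reads": 1200, "Key_writes": 50, "Table_locks_immediate": 10}
after  = {"Key_reads": 1200, "Key_writes": 940, "Table_locks_immediate": 52}

# A mostly-write delta with no physical key reads suggests OLTP.
case = record_training_case(before, after, "OLTP")
print(case)
# {'Key_reads': 0, 'Key_writes': 890, 'Table_locks_immediate': 42, 'type': 'OLTP'}
```

Repeating this for many workloads at the chosen snapshot interval yields the labelled data that the classification technique of step 3 is trained on.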
Figure 3.2 Steps of Characterization Module

3.2.2 Scheduler

1. The Scheduler will take the workload and the related parameter values. The impact of the queries will be calculated on the basis of the parameters and the workload type as identified in the workload characterization module.

2. We will use fuzzy logic for scheduling the workload, which will handle the workload dynamically. In Boolean logic each decision has two outputs, true or false; in fuzzy logic, however, the truth value has a range from 0 to 1. So fuzzy logic works on the concept of partial truth, where the truth value may range between completely true and completely false.

3. The steps of the fuzzy-based scheduler are as follows:

a. Fuzzification: In this step crisp input values are evaluated by applying the membership functions, which return a value between 0 and 1. A membership function must be simple and easily convertible to a fuzzy set. There are three common methods for this purpose: Gaussian, triangular and trapezoidal. An input value is in the fuzzy set when the value received from the membership function is close to 1.

b. Rule Evaluation: The evaluated output received from the fuzzification step is checked against fuzzy rules, which portray the expert decision and are written in the form of fuzzy words.

c. Aggregation: For each output variable, the aggregation step converts the result into a fuzzy set.

d. Defuzzification: At the end, the fuzzified rules are used to produce the final output by changing the fuzzy set into output values. Usually two methods are used, i.e. the Maximum method and the Centroid method.

3.2.3 Idleness Detection & Exploitation

1.
There will be a Case-Based Reasoning (CBR) store that contains training data based on workload parameters such as workload id, number of OLTP queries, number of DSS queries, execution time, response time, memory, disk I/Os, communication cost etc. The training data will be developed by executing various workloads, extracting parameter values, identifying CPU idleness and storing the parameters with the CPU idleness into the CBR for future executable workloads.

2. The idleness of the incoming workload will be identified by searching for the exact or nearest (almost equal) workload in the CBR.

a. After getting the exact or nearest match, the idleness of that workload will be taken from the CBR.

b. However, if a match for the particular workload is not found in the CBR, then the workload will execute without taking the idleness for the first time, and after its successful execution it will be stored as a new case in the CBR with its associated idleness.

3. After the detection of CPU idleness, the idle period will be used for some other processes, such as defragmentation, index rebuilding etc., and also to improve the scheduling.

4. The defragmentation or other resumed process will remain in execution up to the detected idleness time.

5. When the idleness time is finished, the defragmentation or other process may complete its execution or otherwise be temporarily stopped again.

3.3 Experimental Setup & Results Validation

As the experimental design and evaluation have a vital role in proving the proposed research, we discuss here the experimental design, the Database Management System, the performance metrics and the evaluation methods.

3.3.1 Experimental Setup

The following hardware will be required to build the experimental setup for the experimentation and validation of the proposed solution:

i. Database Server (01)
ii. Backup Server (01)
iii. Personal Computers (06)
iv.
Switch (01)

All the hardware devices will be connected through a network as shown in Figure 3.3.

Figure 3.3 Experimental Setup for the Proposed Solution

3.3.2 Database Management System

We have selected the MySql Database Management System to execute the various workloads and validate the proposed technique. MySql was selected because it is freely available and open source, i.e. the source code is available and can be analyzed and modified accordingly. MySql Server will be installed on the server machine and the MySql client on the client machines. The main database will be stored on the server machine, while the backup server machine will be responsible for maintaining the backup of the database. Each client machine will have the ability to generate thousands of queries, or a workload, at a time. The workload will be generated via threads and sent to the database server for execution. There will also be a database on the server as well as on the backup machine that will contain various OLTP and DSS queries, which will be taken arbitrarily by the threads generated from the clients to form a workload.

3.3.3 Proposed Technique Scenario

The proposed solution for workload management in a DBMS can be achieved by performing the following steps:

a. The workload is initiated through threads from the client machines and sent to the database server for execution. A client machine has the ability to initiate thousands of requests at a time to the database server. The workload consists of OLTP and DSS like queries that are stored in the database at the database server.

b.
In the workload management technique, the workload is first taken by the characterization module, which will partition the workload and observe it for further analysis by taking and storing snapshots.

c. After recording and analyzing the snapshots, effective workload parameters will be identified and extracted, which will be used to classify the workload into two types, i.e. OLTP and DSS.

d. Once the workload is classified, the OLTP and DSS percentages of the workload will be calculated to arrange it for better execution.

e. The next step is to find the DBMS idleness for that workload. The DBMS idleness will be detected, and some other system process (a background process such as defragmentation, index rebuilding etc.) or user-oriented process will be scheduled with the workload for execution.

f. The resumed processes will execute until the idleness period is finished. The restarted process will finish, or stop once again, after the completion of the identified idleness period.

3.3.4 Evaluation Metrics

The proposed solution to manage the workload will be compared with other well-known techniques. The results obtained from the proposed approach will be compared with the First In First Out (FIFO), Priority Based (PB) and Smallest Job First (SJF) approaches. The comparison will show how much better the proposed solution is than the previous techniques with respect to workload importance and average waiting time. The proposed idleness technique will be validated by observing the CPU utilization, which will be recorded by executing the workload in the MySql environment and taking snapshots. After the execution of the workload in MySql, the same workload will be executed through the proposed technique. A comparison of these techniques will be made and discussed, which will show the effectiveness of the proposed technique.
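To illustrate the average-waiting-time metric used in this comparison, the following sketch computes it for a sample query batch under FIFO and SJF ordering. The query costs are invented for the example; they are not measured values.

```python
# Sketch: average waiting time of a query batch under FIFO and SJF.
# Costs (execution times) are illustrative, not measured values.

def avg_waiting_time(costs):
    """Average time each query waits before it starts executing,
    given the order in which the queries are run."""
    waits, elapsed = [], 0
    for c in costs:
        waits.append(elapsed)
        elapsed += c
    return sum(waits) / len(costs)

arrival_order = [8, 1, 3]                       # FIFO runs queries as they arrive
fifo = avg_waiting_time(arrival_order)
sjf  = avg_waiting_time(sorted(arrival_order))  # SJF runs the cheapest query first

print(fifo)  # (0 + 8 + 9) / 3 = 5.666...
print(sjf)   # (0 + 1 + 4) / 3 = 1.666...
```

Even on this tiny batch, SJF cuts the average waiting time sharply, which is why it is a natural baseline for the proposed scheduler.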
All the comparisons will be presented through graphs that will show how effective the proposed solution for workload management is.

Summary

The chapter started by providing an overview of the research problem. The steps to achieve the proposed research for workload management in DBMSs were briefly explained. The proposed idea for workload management in DBMSs consists of three main modules: Characterization, Scheduler and Idleness Detection. The abstract-level steps of these three modules were explained in this chapter. As the experimental design plays a main role in validating the research, the experimental setup was discussed, including the experimental environment (hardware, DBMS, workloads) and the evaluation method to validate the proposed research.

Chapter 4 Workload Characterization

The chapter starts by providing an overview of the workload and its complexity, the problems in the previous research, and the proposed workload management solution, which consists of the workload characterization, scheduling and idleness detection modules. This chapter is dedicated to the discussion of the proposed workload characterization approach. It identifies the workload classes by partitioning the workload into smaller sets composed of transactions or requests. The workload-related parameters are identified after executing the workload and analyzing the changes in the workload parameters. The module is validated through various experiments over real-time and benchmark workloads that consist of OLTP and DSS like queries.

4.1 Introduction and Background

Complexity in a Database Management System (DBMS) increases due to various factors: functionality demands from users, complex data types, diverse workloads and huge data volumes that increase with the passage of time.
These factors cause brittleness and unmanageability in DBMSs. Moreover, today's DBMSs also have to work as a Data Warehouse (DW), i.e. providing summarized and consolidated reports over heterogeneous, complex and ad-hoc workloads that involve a number of aggregations, table joins, sorts and concurrent I/Os. To handle these problems, organizations hire a number of expert Database Administrators (DBAs) and spend a lot of money to get the expected improvement, throughput and response. Usually the DBA has to take care of all the database tasks, such as making policies for workload priorities, memory and storage configuration and other such tasks. The cost of hardware is decreasing, but the cost of management is increasing. Performing workload management activities manually, by hiring experts, increases the TCO [Huebscher et al. (2008), IBM Corporation (2003)]. Moreover, with the advent of distributed systems and Data Warehouses (DWs), it became difficult, and in some cases even impossible, for a DBA to manually organize, optimize and configure all the database tasks. For efficient workload management, another approach is to stop the workload for a while and resume it later. However, when queries are stopped during their execution, the executed part is stored for a while in memory and used later on. The style of database workload has changed from offline analysis to online adaptation. In workload management there are three units: workload, resources and objectives. All three units are co-related with each other. The workload uses some resources to meet the objectives of an organization; in other words, resources are allocated through different approaches to a workload which has some management objectives. Workload management has evolved through three phases, which are capacity planning, resource sharing and performance-oriented workload management [Niu et al. (2006)].
Capacity-planning workload management is based on cost sharing; the idea behind the resource-oriented approach is maximum resource utilization; while in the performance-oriented approach the focus is on business goals and objectives. In workload management, the main concerns are the workload's frequency patterns, composition, intensity and required resources. Identification of the workload type is not an easy task, as workload size and type change at different spans of the day, week and month. For example, the stock exchange experiences more DSS workload than OLTP workload at the start of the day, turns to more OLTP than DSS during the middle of the day, and finally tends once again towards DSS overnight for whole-day analysis and reporting. In general, the workload is detected through two methods: performance monitoring and characterization. The former is a reactive approach that takes action when the performance has degraded, while the latter is a proactive approach that works by tracking the workload changes.

4.2 Problems in Previous Workload Management Techniques

The database workload consists of SQL statements, transactions, scripts, commands, sessions and jobs. Over the last many decades, different researchers, practitioners and vendors have been working on workload management in DBMSs and DWs. In this regard, various techniques have been introduced to manage the workload, including Linear Programming, Mixed-Integer Programming, the Queuing Theoretic Model, Largest Memory Priority (LMP), Priority Gradient Multiprogramming (PGM) and the Shortest Job First (SJF) algorithm.
However, these techniques have certain limitations: the workload optimization techniques that use a non-preemptive scheduling scheme provide inefficient results due to their poor responsiveness, which may lead to starvation and hanging; most of the previous techniques use approximate values rather than calculating actual values; they consider all queries at the same level; and they divide the workload into batches, or set the multiprogramming level (MPL), on the basis of only one or two parameters. Taking approximate values, considering all queries at the same level, ignoring query interaction and fixing the query types lead to inefficient results. Dividing batches or setting the MPL on the basis of a single parameter is not a sufficient criterion to obtain adequate performance.

4.3 Proposed Workload Management Solution

As discussed in the previous chapter, the proposed solution for workload management in DBMSs consists of three main modules: Workload Characterization, Scheduler and Idleness Detector. We now discuss these in detail.

4.3.1 Workload Characterization

The incoming workload is characterized and classified as OLTP or DSS on the basis of the key attributes of the workload. An overview of the proposed approach is shown in Figure 4.1. For this purpose, we have developed a Case-Based Reasoning (CBR) store that contains the effective workload parameters with their associated workload type. To develop the training data, these effective workload parameters are identified with their associated workload type. The proposed solution works by extracting the above parameters for each query of the incoming workload, and the values of these parameters are searched in the CBR. When an exact or nearest match against the parameters is found, the workload type of the matched case is taken. After the identification of the workload type, the workload is stored in the CBR as a new case for future use. These steps are explained later in the chapter.
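The characterization cycle just outlined — extract the parameters, search the CBR for the nearest stored case, adopt its workload type, then retain the newly classified workload as a new case — can be sketched as follows. This is a minimal illustration; the stored cases and their normalized values are invented for the example.

```python
# Sketch of the proposed CBR characterization cycle: search for the
# nearest stored case, reuse its workload type, then retain the newly
# classified workload as a case. Case values here are illustrative.
import math

case_base = [
    # (Key_reads, Key_writes, Table_locks_immediate), normalized to [0, 1]
    ((0.00, 1.00, 0.007), "OLTP"),
    ((1.00, 0.00, 0.001), "DSS"),
]

def distance(p, q):
    """Euclidean distance between two parameter vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def classify(params):
    """Reuse the type of the nearest stored case, then retain the
    new workload as a case for future matching."""
    nearest, wtype = min(case_base, key=lambda c: distance(c[0], params))
    case_base.append((params, wtype))
    return wtype

print(classify((0.001, 1.0, 0.0004)))  # OLTP
```

A write-dominated parameter vector lands nearest the stored OLTP case, so the incoming workload is classified as OLTP and added to the case base.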
Figure 4.1 Steps to Characterize the DBMS/DW Workload

The proposed technique for the characterization of the database workload works as the user writes the workload, or query, in the workload editor. The characterization process works in parallel with the user input to the workload editor. By adopting this approach, the workload is characterized and classified before its submission to the DBMS. In previous characterization techniques, the characterization process starts after the submission of the workload to the DBMS. In contrast to the previous approaches, we characterize the workload in advance. For this reason, the proposed CBR workload characterization saves precious execution time, which ultimately improves the efficiency of the DBMS. The detailed steps of the proposed workload characterization are:

4.3.1.1 Key Attributes Selection

In MySQL 5.1, there are 291 status variables which are used to distinguish the different operations. The values of these status variables change according to the specific operation as transactions or a workload are executed. However, not all status variables are useful to distinguish the workload type, i.e. OLTP or DSS. After experiments and analysis over various OLTP and DSS like workloads, 10 status variables were selected at the initial stage to distinguish the workload type. These status variables are Com_ratio, Innodb_log_writes, Key_reads, Key_read_requests, Key_writes, Key_write_requests, Opened_files, Qcache_hits, Questions, Sort_rows and Table_locks_immediate. Among these, four status variables (Com_ratio, Innodb_log_writes, Qcache_hits and Questions) were already used by [Zewdu et al. (2009)] to classify the database workload.
Once the key status variables were selected, training data for the CBR was constructed that includes the selected status variables with their values and the particular workload type. After various experiments over OLTP and DSS like workloads, 7 of the 10 status variables were eliminated, as their values did not affect the workload type much, and only the 3 that are vital in distinguishing the workload type were kept. These 3 selected status variables are Key_reads, Key_writes and Table_locks_immediate. In the following section, these 3 selected status variables are discussed with their recorded values after the execution of OLTP and DSS like workloads in isolation. These values are recorded at an interval of 5 seconds and are also represented through graphs for better understanding. To take the values of the status variables, snapshots are taken at an interval of 5 seconds during the workload execution. Finally, the recorded values before execution are subtracted from the values after execution in order to obtain the actual effect of each status variable. After executing various OLTP and DSS like workloads and observing their associated snapshots, we selected 5 seconds as the snapshot interval, as on average at least one OLTP or DSS statement completes within it.

Key_writes: This variable represents the number of key blocks written to disk. Its value remains very small in the DSS workload as compared to the OLTP workload. As the workload starts its execution, the value of this variable increases, but in DSS types of workloads this increase remains at most 0.2. However, in the case of the OLTP workload the value of Key_writes increases at a very high rate compared to DSS, and its benchmark value reaches 1. Thirty consecutive snapshots were taken and are shown in Figure 4.2.
The graph shows a major difference between the OLTP and DSS workloads, as the value of Key_writes increases with the execution of the OLTP workload but remains constant, or changes little, during DSS types of workloads.

Figure 4.2 OLTP and DSS workload execution with respect to Key_writes

Key_reads: This status variable represents the number of physical reads of a key block from the disk. The value of the variable remains unchanged during the execution of the OLTP workload; however, its value increases rapidly in the DSS workload, as shown in Figure 4.3, while the value of Key_reads remains constant, or changes very little, during the OLTP type of workload.

Figure 4.3 OLTP and DSS workload execution with respect to Key_reads

Table_locks_immediate: This variable represents the number of times that a request for a table lock could be granted immediately. This variable behaves somewhat similarly to the Key_reads status variable, as its value increases rapidly during the execution of DSS workloads while only a very small increase is observed during OLTP workloads, as shown in Figure 4.4. The reason for more locks in the DSS workload is the involvement of a larger number of tables and longer-running transactions as compared to the OLTP workload.

Figure 4.4 OLTP and DSS workload execution with respect to Table_locks_immediate

After the selection of these 3 status variables, their recorded values are made concise and simple by converting them into the range [0, 1]. The conversion is made by performing the following steps:

a. Find the final value (or value of the last snapshot) of each individual status variable.

b. Divide all the recorded snapshot values of that particular status variable by its maximum, or final, value as identified in step a.
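Steps a and b amount to dividing each snapshot series by its final (maximum) value. A minimal sketch, with invented snapshot values:

```python
# Sketch of normalization steps a and b: scale a status variable's
# snapshot series into [0, 1] by its final (maximum) value.
# Snapshot values below are illustrative, not measured.

def normalize(snapshots):
    """Divide every snapshot value by the final (maximum) value."""
    final = snapshots[-1]
    return [v / final for v in snapshots]

key_writes = [220, 460, 700, 880]  # cumulative 5-second snapshots
print(normalize(key_writes))       # [0.25, 0.522..., 0.795..., 1.0]
```

After this scaling, series recorded on very different absolute scales (e.g. Key_reads in the millions versus Table_locks_immediate in the tens) become directly comparable.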
4.3.1.2 CBR Development

After the selection of the key status variables, the sample data for the CBR is constructed from the execution of OLTP and DSS like workloads; both benchmarks are used to represent the OLTP and DSS workloads. Raw snapshots are recorded against the three selected variables, with their workload type, during the execution of the workloads. These values are taken at an interval of 5 seconds. The average of each selected parameter is calculated after getting the values. We standardized the collected data through the Z score, considering all parameters to have the same weight or influence; the standardized values are then scaled into the range [0, 1]. Finally the normalized data is used to construct the CBR, as shown in Table 4.1. The values represented in the above graphs can also be converted into the range 0 to 1 by dividing each value of a column by the highest value of that column. For example, in the first column the maximum value is 1406831, by which each value of the column is divided to convert all the values into the range [0, 1].

Table 4.1 CBR with Sample Data

Key_reads   Key_writes   Table_locks_immediate   Workload Type
0           5382         39                      OLTP
11312       82           66                      DSS
105976      0            78                      DSS
0           61754        41                      OLTP
180164      65           78                      DSS
121         102198       42                      OLTP
1406831     0            98                      DSS
0           156979       48                      OLTP
234         221727       49                      OLTP
0           283629       50                      OLTP
...         ...          ...                     ...

There are four steps in the proposed database workload characterization technique to find the particular workload type, as shown in Figure 4.5. These steps are Retrieve, Reuse, Revise and Retain.

Retrieve: This is the main step in the CBR, where matching between incoming and stored cases is performed. It is also important in the sense that missing a similar case can lead towards an inaccurate case, which ultimately produces wrong decisions.
This step is also important because accuracy depends on the number of cases stored and on the retrieval of matching cases. The similarity of cases can be determined through various techniques, such as Euclidean distance, similarity matrices etc. The retrieval is performed from the problem description, and similar cases are found. The retrieval process is further divided into identification of the features, initial matching, searching, and selection of a case in general. In our case, as shown in Figure 4.5, after the arrival of the workload at the DBMS, the values of the workload-related parameters are taken. In CBR, the retrieval phase spends much of its time in the searching process.

Here, we have adopted the Euclidean distance method [Deza et al. (2009), Danielsson (1980)] to find the matched cases from the CBR. In this method, the distance between two points with co-ordinates (p1, p2) and (q1, q2) is calculated as:

d(p, q) = sqrt((q1 - p1)^2 + (q2 - p2)^2)

However, if we have to find the distance between two points which have more than two co-ordinates, then the Euclidean distance is computed as the root of the sum of the squared differences between the coordinates of the pair of objects. For example, if we have p = (p1, p2, ..., pn) and q = (q1, q2, ..., qn), then the distance from p to q (or from q to p) is:

d(p, q) = sqrt((q1 - p1)^2 + (q2 - p2)^2 + ... + (qn - pn)^2)

The Euclidean distance method is selected here to identify the matched cases from the CBR as it is simple, fast and based on the comparison of actual values.

Figure 4.5 Steps to Characterize and Classify the Workload via CBR

Example 1

Suppose the DBMS encounters a new workload with the values 117, 102099 and 45 against the three status variables. Now we have to find which of the already stored cases it matches.
In this case, first of all, the status variable values shown in Table 4.1 and those of the new workload are converted into the range [0, 1]. The CBR values are converted by dividing the three values of each row by that row's highest value, and the status variable values of the new workload are divided by 102099. The converted values of the new workload become 0.0011, 1.0000 and 0.0004 respectively; the converted CBR records are shown in Table 4.2. The difference between the new workload and every record of the CBR is then calculated using the Euclidean distance formula above. The workload type of the new workload is taken from the row/record that has the minimum distance value. In this example, the minimum Euclidean distance is 0.0001 (row 6, which is highlighted), and the workload type of that row is OLTP. Therefore, the workload type of the incoming workload is OLTP.

Table 4.2 Euclidean Difference Calculation for Incoming Workload (Example 1)

CBR Data (in [0, 1])                          Workload   New Workload (in [0, 1])                      Euclidean
Key_reads  Key_Writes  Table_locks_immediate  Type       Key_reads  Key_Writes  Table_locks_immediate  Difference
0.0000     1.0000      0.0072                 OLTP       0.0011     1.0000      0.0004                 0.0069
1.0000     0.0072      0.0058                 DSS        0.0011     1.0000      0.0004                 1.408
1.0000     0.0000      0.0007                 DSS        0.0011     1.0000      0.0004                 1.413
0.0000     1.0000      0.0007                 OLTP       0.0011     1.0000      0.0004                 0.0011
1.0000     0.0004      0.0004                 DSS        0.0011     1.0000      0.0004                 1.413
0.0012     1.0000      0.0004                 OLTP       0.0011     1.0000      0.0004                 0.0001
1.0000     0.0000      0.0001                 DSS        0.0011     1.0000      0.0004                 1.413
0.0000     1.0000      0.0003                 OLTP       0.0011     1.0000      0.0004                 0.0011
0.0011     1.0000      0.0002                 OLTP       0.0011     1.0000      0.0004                 0.0002
0.0000     1.0000      0.0002                 OLTP       0.0011     1.0000      0.0004                 0.0011

Example 2

In the second example, the incoming workload has the values 324563, 98 and 34 for the three status variables respectively. Again, the status variable values shown in Table 4.1 and those of the incoming workload are converted into the range [0, 1].
The converted values of the incoming workload become 1.0000, 0.0003 and 0.0001 respectively; the converted CBR records are shown in Table 4.3. The difference between the incoming workload and every record of the CBR is calculated using the Euclidean distance formula above. The workload type of the incoming workload is taken from the row/record that has the minimum distance value. In this example, the minimum Euclidean distance is 0.00030398 (row 7, which is highlighted). Therefore, the workload type of the incoming workload is DSS.

Table 4.3 Euclidean Difference Calculation for Incoming Workload (Example 2)

CBR Data (in [0, 1])                          Workload   New Workload (in [0, 1])                      Euclidean
Key_reads  Key_Writes  Table_locks_immediate  Type       Key_reads  Key_Writes  Table_locks_immediate  Difference
0.0000     1.0000      0.0072                 OLTP       1.0000     0.0003      0.0001                 1.41401811
1.0000     0.0072      0.0058                 DSS        1.0000     0.0003      0.0001                 0.00900505
1.0000     0.0000      0.0007                 DSS        1.0000     0.0003      0.0001                 0.00069976
0.0000     1.0000      0.0007                 OLTP       1.0000     0.0003      0.0001                 1.41400018
1.0000     0.0004      0.0004                 DSS        1.0000     0.0003      0.0001                 0.00033342
0.0012     1.0000      0.0004                 OLTP       1.0000     0.0003      0.0001                 1.41316303
1.0000     0.0000      0.0001                 DSS        1.0000     0.0003      0.0001                 0.00030398
0.0000     1.0000      0.0003                 OLTP       1.0000     0.0003      0.0001                 1.41400009
0.0011     1.0000      0.0002                 OLTP       1.0000     0.0003      0.0001                 1.41325391
0.0000     1.0000      0.0002                 OLTP       1.0000     0.0003      0.0001                 1.41400007

We have explained the method by which the nearest matching record is retrieved from the CBR. However, a problem with this method is that the search takes too much time, especially when the CBR is large. The complexity of the search algorithm is O(n), as the Euclidean distance is calculated for every record of the CBR. In order to minimize the search time, a hash search technique is adopted.
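Before turning to hashing, the O(n) linear retrieval just described can be sketched as follows. This is a minimal Python sketch: the function names are illustrative, and the case base holds a few of the normalized rows of Table 4.2.

```python
import math

def euclidean(p, q):
    # root of the sum of squared coordinate differences
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def retrieve_linear(case_base, new_case):
    # O(n) scan: compute the distance to every stored case and return the
    # workload type of the closest one
    return min(case_base, key=lambda case: euclidean(case[0], new_case))[1]

# a few normalized cases from Table 4.2: (vector, workload type)
case_base = [
    ((0.0000, 1.0000, 0.0072), "OLTP"),
    ((1.0000, 0.0072, 0.0058), "DSS"),
    ((0.0012, 1.0000, 0.0004), "OLTP"),  # row 6, nearest case of Example 1
    ((1.0000, 0.0000, 0.0001), "DSS"),   # row 7, nearest case of Example 2
]
```

With the normalized workload of Example 1, (0.0011, 1.0000, 0.0004), `retrieve_linear` returns OLTP, in agreement with Table 4.2.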
Hashing is the fastest search technique; it takes just unit time in the search process, i.e. O(1), when there is no hash collision. The nearest stored case is determined by computing a hash value from the selected parameters of the workload. In the preparation of the training data, a hash value is determined for each row from the values of the selected workload parameters. The hash value of each row consists of four digits, obtained by concatenating the number of digits of each of the first two columns (i.e. Key_reads and Key_writes) with the value obtained by Table_locks_immediate mod 100:

H(Key) = NumberOfDigits(Key_reads) . NumberOfDigits(Key_writes) . (Table_locks_immediate mod 100)

where "." denotes digit concatenation. Under this scheme, the first row of Table 4.1 has the hash value 1439 (1 digit in 0, 4 digits in 5382, and 39 from 39). The same strategy is adopted to calculate the hash values of the remaining rows, which are 5266, 6178, 1541, 6278, 3642, 7198, 1648, 3649 and 1650. These hash values are stored with their associated workload type in the CBR for future use, as shown in Table 4.4.

Table 4.4 CBR Data with Hash Values

H (key)   Key_reads   Key_Writes   Table_locks_immediate   Workload Type
1439      0           5382         39                      OLTP
5266      11312       82           66                      DSS
6178      105976      0            78                      DSS
1541      0           61754        41                      OLTP
6278      180164      65           78                      DSS
3642      121         102198       42                      OLTP
7198      1406831     0            98                      DSS
1648      0           156979       48                      OLTP
3649      234         221727       49                      OLTP
1650      0           283629       50                      OLTP
...       ...         ...          ...                     ...

In the case of Example 1, where the workload to be searched has the parameter values 117, 102099 and 45, the hash value is calculated first, i.e. 3645. There is no record exactly matching 3645, therefore the most similar records, 3642 and 3649, are identified from the CBR.
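The four-digit hashing scheme just described, together with the nearest-bucket fallback used when no exact hash match exists, can be sketched as follows. This is a minimal Python sketch: the function names, the zero-padding of the mod value to two digits, and the choice of two candidate buckets (matching the two candidates retrieved in each example) are assumptions.

```python
import heapq

def cbr_hash(key_reads, key_writes, table_locks_immediate):
    # Four digits: digit count of Key_reads, digit count of Key_writes,
    # then Table_locks_immediate mod 100 (zero-padded to two digits).
    return int(f"{len(str(key_reads))}{len(str(key_writes))}"
               f"{table_locks_immediate % 100:02d}")

def candidate_cases(cbr_index, h, k=2):
    # cbr_index maps a hash value to the stored cases in that bucket.
    # An exact bucket hit costs O(1); otherwise fall back to the k
    # numerically nearest hash keys.
    if h in cbr_index:
        return list(cbr_index[h])
    keys = heapq.nsmallest(k, cbr_index, key=lambda key: abs(key - h))
    return [case for key in keys for case in cbr_index[key]]

# hash values of Table 4.4, buckets abbreviated to the workload type
cbr_index = {1439: ["OLTP"], 5266: ["DSS"], 6178: ["DSS"], 1541: ["OLTP"],
             6278: ["DSS"], 3642: ["OLTP"], 7198: ["DSS"], 1648: ["OLTP"],
             3649: ["OLTP"], 1650: ["OLTP"]}
```

For Example 1, `cbr_hash(117, 102099, 45)` gives 3645, and the fallback retrieves the buckets 3642 and 3649; the Euclidean comparison then decides between them.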
The values of these two selected rows and of the incoming workload are converted into the range [0, 1], and the Euclidean distance is calculated as discussed previously. In the second example, the parameter values are 324563, 98 and 34, and the associated hash value is 6234. Again, there is no exact match, while the nearest hash values are 6278 and 6178. The Euclidean distance to these candidate records is computed, and the workload type of the nearest one is assigned to the current workload.

Reuse

The Reuse step starts after the completion of Retrieve and is achieved by performing two sub-steps: first, finding the differences between the current and the retrieved case; and second, identifying the portion of the retrieved case that must be stored as a new case in the CBR for future use. The Reuse step proposes the workload type from the solution retrieved in the previous step. Reusing a retrieved case can be as simple as returning the retrieved solution unchanged, since here we only have to use the workload type of the most similar case from the Retrieve step. This methodology is mostly used in classification problems, where each workload class or type is likely to be represented frequently in the case base, so the most similar retrieved case, if sufficiently similar, most probably contains the right solution. However, it becomes more difficult when there are major differences between the incoming workload and the retrieved cases. In those circumstances the retrieved solution may need to be adapted to account for these important differences.

Revise

The Revise step is initiated only when the solution found in the previous step (Reuse) is inaccurate. The revise process is achieved by evaluating the reused solution and correcting it by applying domain knowledge.
In this step an adaptation method is needed to revise the solution, by observing what changed in the retrieval process and how the retrieved solution can be changed. We have used the substitution adaptation method proposed by Kolodner (1993); in contrast, transformation adaptation alters the structure of the solution. There are two ways of adaptation. In the first, the user is involved at each substitution step and decides whether the substitution meets his requirement, i.e. the user accepts or rejects each substitution. In the second, the system performs all the substitutions on its own, and the substituted result is finally sent to the user for evaluation. For the training data, we have adopted the first way in order to achieve accuracy.

Retain

The Retain phase is the final step in CBR; it inserts the valuable information of the current case into the knowledge base, which is helpful for solving future problems. However, duplicate or redundant information should be avoided while incorporating a new case, otherwise it slows down the retrieval process. The retain process also determines how the information will be stored and indexed in the current memory structure.

4.3.2 Detection of the DB Workload

The basic purpose of workload detection is to discover, analyze and classify the database workload. As discussed above, the values of the workload-associated status variables change according to the specific operations performed during the execution of transactions or workloads. In MySQL 5.1, there are 291 status variables that are used to distinguish the different operations, and from these, 3 status variables are selected in order to distinguish the workload type, i.e. OLTP or DSS.
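These status variables are cumulative counters in MySQL, so the activity within a sampling interval is the difference between successive snapshots. A minimal Python sketch, with illustrative counter values:

```python
# snapshots of the three status variables taken every 5 seconds
snapshots = [
    {"Key_reads": 100, "Key_writes": 5000, "Table_locks_immediate": 40},
    {"Key_reads": 100, "Key_writes": 10382, "Table_locks_immediate": 79},
    {"Key_reads": 211, "Key_writes": 10382, "Table_locks_immediate": 120},
]

# subtract each snapshot from its predecessor to get per-interval activity
deltas = [{name: curr[name] - prev[name] for name in curr}
          for prev, curr in zip(snapshots, snapshots[1:])]
```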
In order to prepare the training data, we have taken snapshots of these status variables during workload execution at an interval of 5 seconds. In each snapshot, the values of the status variables are recorded and stored. Finally, the recorded values of each snapshot are subtracted from those of its predecessor snapshot to capture the actual effect during execution. The recorded values of the three status variables are then converted into the range [0, 1].

4.4 Experimental Setup & Results Validation

The hardware is the same as discussed in the previous chapter: a Database Server, a Backup Server, six Personal Computers and a Switch that connects all devices to form a network. The MySql Database Management System is used to execute the various workloads. The MySql Server is installed on the server machine and the MySql client on the client machines. The main database is stored on the server machine, while the backup server machine is responsible for maintaining the backup of the database.

4.4.1 Workload Editor

The workload editor is developed in VB.Net with a MySql database at the back end. It is used to generate the various types of workloads that users want to execute. A number of daily-used workloads are stored in the database on the server machine and are available to users. The workload editor allows users either to select a workload or query stored in the database or to write their own. The proposed technique for characterizing the database workload operates while the user writes the workload in the workload editor or selects one of the stored workloads: the characterizing process runs in parallel with user input to the workload editor, so the workload is characterized and classified before the actual start of workload execution.
In previous characterization techniques, the characterizing process starts after the submission of the workload to the system. In contrast, the proposed technique characterizes the workload in advance, i.e. before the workload is submitted to the system. For this reason, the proposed characterizing technique saves precious execution time and ultimately manages the workload more efficiently.

4.4.2 Workload Database

The main purpose of the workload database is to provide the common workloads that are usually used by database users to obtain useful information in the form of reports, or to perform many other operations such as insertions, deletions, updates, etc. The workload database is stored on the Database Server so that all database users can use it. The sample workloads consist of OLTP- and DSS-like queries, including insert, update, delete, select, create, load, alter and many other types of queries. New workloads are also inserted into the workload database by the administrator, or whenever a user writes a new workload that is entirely different from the stored workloads or queries. As a result, there is no need to write the same workload again: it is stored in the database after its successful execution and is available to all database users. The proposed approach saves the users' precious time by providing the most-used workloads in the workload editor; otherwise they would have to type even the same workload or query again and again, which is simply a waste of their useful time. The following queries, used in the experiments, are taken from the TPC-H benchmark.
Sample Queries Available in Workload Editor

create table nation ( n_nationkey decimal(3,0) not null, n_name char(25) not null, n_regionkey decimal(2,0) not null, n_comment varchar(152) ) TYPE=MyISAM;

create table region ( r_regionkey decimal(2,0) not null, r_name char(25) not null, r_comment varchar(152) ) TYPE=MyISAM;

create table part ( p_partkey decimal(10,0) not null, p_name varchar(55) not null, p_mfgr char(25) not null, p_brand char(10) not null, p_type varchar(25) not null, p_size decimal(2,0) not null, p_container char(10) not null, p_retailprice decimal(6,2) not null, p_comment varchar(23) not null ) TYPE=MyISAM;

create table supplier ( s_suppkey decimal(8,0) not null, s_name char(25) not null, s_address varchar(40) not null, s_nationkey decimal(3,0) not null, s_phone char(15) not null, s_acctbal decimal(7,2) not null, s_comment varchar(101) not null ) TYPE=MyISAM;

create table partsupp ( ps_partkey decimal(10,0) not null, ps_suppkey decimal(8,0) not null, ps_availqty decimal(5,0) not null, ps_supplycost decimal(6,2) not null, ps_comment varchar(199) not null ) TYPE=MyISAM;

create table customer ( c_custkey decimal(9,0) not null, c_name varchar(25) not null, c_address varchar(40) not null, c_nationkey decimal(3,0) not null, c_phone char(15) not null, c_acctbal decimal(7,2) not null, c_mktsegment char(10) not null, c_comment varchar(117) not null ) TYPE=MyISAM;

create table orders ( o_orderkey decimal(12,0) not null, o_custkey decimal(9,0) not null, o_orderstatus char(1) not null, o_totalprice decimal(8,2) not null, o_orderdate date not null, o_orderpriority char(15) not null, o_clerk char(15) not null, o_shippriority decimal(1,0) not null, o_comment varchar(79) not null ) TYPE=MyISAM;

create table lineitem ( l_orderkey decimal(12,0) not null, l_partkey decimal(10,0) not null, l_suppkey decimal(8,0) not null, l_linenumber decimal(1,0) not null,
l_quantity decimal(2,0) not null, l_extendedprice decimal(8,2) not null, l_discount decimal(3,2) not null, l_tax decimal(3,2) not null, l_returnflag char(1) not null, l_linestatus char(1) not null, l_shipdate date not null, l_commitdate date not null, l_receiptdate date not null, l_shipinstruct char(25) not null, l_shipmode char(10) not null, l_comment varchar(44) not null ) TYPE=MyISAM;

create index fkey_nation_1 on nation(n_regionkey);
create index fkey_supplier_1 on supplier(s_nationkey);
create index fkey_customer_1 on customer(c_nationkey);
create index fkey_partsupp_1 on partsupp(ps_suppkey);
create index fkey_partsupp_2 on partsupp(ps_partkey);
create index fkey_orders_1 on orders(o_custkey);
create index fkey_lineitem_1 on lineitem(l_orderkey);
create index fkey_lineitem_2 on lineitem(l_partkey,l_suppkey);
create index fkey_lineitem_3 on lineitem(l_suppkey);
create index xxx1 on lineitem(l_shipdate);
create index xxx2 on customer(c_mktsegment);
create index xxx3 on orders(o_orderdate);
create index xxx4 on region(r_name);
create index xxx5 on lineitem(l_discount);
create index xxx6 on lineitem(l_quantity);
create index xxx7 on lineitem(l_returnflag);
create index xxx8 on lineitem(l_shipmode);
create index xxx9 on lineitem(l_commitdate);
create index xxx10 on lineitem(l_receiptdate);
create index xxx11 on lineitem(l_partkey);
create index xxx12 on part(p_size);
create index xxx13 on part(p_type);
create index xxx14 on partsupp(ps_supplycost);
create index xxx15 on nation(n_name);
create index xxx16 on part(p_name);
create index xxx17 on orders(o_clerk);
create index xxx18 on part(p_brand);
create index xxx19 on part(p_container);

LOAD DATA INFILE "D://implementation//DBGen//region.tbl" INTO TABLE region FIELDS TERMINATED BY "|" LINES TERMINATED BY "\r\n" (r_regionkey, r_name, r_comment);
LOAD DATA INFILE "D://implementation//DBGen//nation.tbl" INTO TABLE
nation FIELDS TERMINATED BY '|' LINES TERMINATED BY '\r\n';
LOAD DATA INFILE "D://implementation//DBGen//supplier.tbl" INTO TABLE supplier FIELDS TERMINATED BY '|' LINES TERMINATED BY '\r\n';
LOAD DATA INFILE "D://implementation//DBGen//part.tbl" INTO TABLE part FIELDS TERMINATED BY '|' LINES TERMINATED BY '\r\n';
LOAD DATA INFILE "D://implementation//DBGen//partsupp.tbl" INTO TABLE partsupp FIELDS TERMINATED BY '|' LINES TERMINATED BY '\r\n';
LOAD DATA INFILE "D://implementation//DBGen//customer.tbl" INTO TABLE customer FIELDS TERMINATED BY '|' LINES TERMINATED BY '\r\n';
LOAD DATA INFILE "D://implementation//DBGen//orders.tbl" INTO TABLE orders FIELDS TERMINATED BY '|' LINES TERMINATED BY '\r\n';
LOAD DATA INFILE "D://implementation//DBGen//lineitem.tbl" INTO TABLE lineitem FIELDS TERMINATED BY '|' LINES TERMINATED BY '\r\n';

ALTER TABLE region ADD CONSTRAINT pkey_region PRIMARY KEY(r_regionkey);
ALTER TABLE nation ADD CONSTRAINT pkey_nation PRIMARY KEY(n_nationkey);
ALTER TABLE part ADD CONSTRAINT pkey_part PRIMARY KEY(p_partkey);
ALTER TABLE supplier ADD CONSTRAINT pkey_supplier PRIMARY KEY(s_suppkey);
ALTER TABLE partsupp ADD CONSTRAINT pkey_partsupp PRIMARY KEY(ps_partkey,ps_suppkey);
ALTER TABLE customer ADD CONSTRAINT pkey_customer PRIMARY KEY(c_custkey);
ALTER TABLE lineitem ADD CONSTRAINT pkey_lineitem PRIMARY KEY(l_orderkey,l_linenumber);
ALTER TABLE orders ADD CONSTRAINT pkey_orders PRIMARY KEY(o_orderkey);

analyze table nation;
analyze table region;
analyze table part;
analyze table supplier;
analyze table partsupp;
analyze table customer;
analyze table orders;
analyze table lineitem;

/* TPC_H Query 1 - Pricing Summary Report */
SELECT L_RETURNFLAG, L_LINESTATUS, SUM(L_QUANTITY) AS SUM_QTY, SUM(L_EXTENDEDPRICE) AS SUM_BASE_PRICE, SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)) AS SUM_DISC_PRICE, SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)*(1+L_TAX)) AS SUM_CHARGE,
AVG(L_QUANTITY) AS AVG_QTY, AVG(L_EXTENDEDPRICE) AS AVG_PRICE, AVG(L_DISCOUNT) AS AVG_DISC, COUNT(*) AS COUNT_ORDER
FROM LINEITEM
-- WHERE L_SHIPDATE <= dateadd(dd, -90, cast('1998-12-01' as date))
WHERE L_SHIPDATE <= '1998-12-01'
GROUP BY L_RETURNFLAG, L_LINESTATUS
ORDER BY L_RETURNFLAG, L_LINESTATUS

/* TPC_H Query 2 - Minimum Cost Supplier */
SELECT S_ACCTBAL, S_NAME, N_NAME, P_PARTKEY, P_MFGR, S_ADDRESS, S_PHONE, S_COMMENT
FROM PART, SUPPLIER, PARTSUPP, NATION, REGION
WHERE P_PARTKEY = PS_PARTKEY AND S_SUPPKEY = PS_SUPPKEY AND P_SIZE = 15 AND P_TYPE LIKE '%%BRASS' AND S_NATIONKEY = N_NATIONKEY AND N_REGIONKEY = R_REGIONKEY AND R_NAME = 'EUROPE' AND PS_SUPPLYCOST = (SELECT MIN(PS_SUPPLYCOST) FROM PARTSUPP, SUPPLIER, NATION, REGION WHERE P_PARTKEY = PS_PARTKEY AND S_SUPPKEY = PS_SUPPKEY AND S_NATIONKEY = N_NATIONKEY AND N_REGIONKEY = R_REGIONKEY AND R_NAME = 'EUROPE')
ORDER BY S_ACCTBAL DESC, N_NAME, S_NAME, P_PARTKEY
LIMIT 100

/* TPC_H Query 3 - Shipping Priority */
select l_orderkey, sum(l_extendedprice*(1-l_discount)) as revenue, o_orderdate, o_shippriority
from customer, orders, lineitem
where c_mktsegment = '[SEGMENT]' and c_custkey = o_custkey and l_orderkey = o_orderkey and o_orderdate < date '[DATE]' and l_shipdate > date '[DATE]'
group by l_orderkey, o_orderdate, o_shippriority
order by revenue desc, o_orderdate;

/* TPC_H Query 4 - */
select o_orderpriority, count(*) as order_count
from orders
where o_orderdate >= date '[DATE]' and o_orderdate < date '[DATE]' + interval '3' month and exists ( select * from lineitem where l_orderkey = o_orderkey and l_commitdate < l_receiptdate )
group by o_orderpriority
order by o_orderpriority;

/* TPC_H Query 5 - Local Supplier Volume */
SELECT N_NAME, SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)) AS REVENUE
FROM CUSTOMER, ORDERS, LINEITEM, SUPPLIER, NATION, REGION
WHERE C_CUSTKEY = O_CUSTKEY AND L_ORDERKEY = O_ORDERKEY AND
L_SUPPKEY = S_SUPPKEY AND C_NATIONKEY = S_NATIONKEY AND S_NATIONKEY = N_NATIONKEY AND N_REGIONKEY = R_REGIONKEY AND R_NAME = 'ASIA' AND O_ORDERDATE >= '1994-01-01' AND O_ORDERDATE < DATEADD(YY, 1, cast('1994-01-01' as date))
GROUP BY N_NAME
ORDER BY REVENUE DESC

/* TPC_H Query 6 - not executed */
select sum(l_extendedprice*l_discount) as revenue
from lineitem
where l_shipdate >= date '[DATE]' and l_shipdate < date '[DATE]' + interval '1' year and l_discount between [DISCOUNT] - 0.01 and [DISCOUNT] + 0.01 and l_quantity < [QUANTITY];

/* TPC_H Query 7 - */
select supp_nation, cust_nation, l_year, sum(volume) as revenue
from (select n1.n_name as supp_nation, n2.n_name as cust_nation, extract(year from l_shipdate) as l_year, l_extendedprice * (1 - l_discount) as volume from supplier, lineitem, orders, customer, nation n1, nation n2 where s_suppkey = l_suppkey and o_orderkey = l_orderkey and c_custkey = o_custkey and s_nationkey = n1.n_nationkey and c_nationkey = n2.n_nationkey and ( (n1.n_name = '[NATION1]' and n2.n_name = '[NATION2]') or (n1.n_name = '[NATION2]' and n2.n_name = '[NATION1]') ) and l_shipdate between date '1995-01-01' and date '1996-12-31' ) as shipping
group by supp_nation, cust_nation, l_year
order by supp_nation, cust_nation, l_year;

/* TPC_H Query 8 - National Market Share - not executed */
SELECT O_YEAR, SUM(CASE WHEN NATION = 'BRAZIL' THEN VOLUME ELSE 0 END)/SUM(VOLUME) AS MKT_SHARE
FROM (SELECT datepart(yy,O_ORDERDATE) AS O_YEAR, L_EXTENDEDPRICE*(1-L_DISCOUNT) AS VOLUME, N2.N_NAME AS NATION FROM PART, SUPPLIER, LINEITEM, ORDERS, CUSTOMER, NATION N1, NATION N2, REGION WHERE P_PARTKEY = L_PARTKEY AND S_SUPPKEY = L_SUPPKEY AND L_ORDERKEY = O_ORDERKEY AND O_CUSTKEY = C_CUSTKEY AND C_NATIONKEY = N1.N_NATIONKEY AND N1.N_REGIONKEY = R_REGIONKEY AND R_NAME = 'AMERICA' AND S_NATIONKEY = N2.N_NATIONKEY AND O_ORDERDATE BETWEEN '1995-01-01' AND '1996-12-31'
AND P_TYPE = 'ECONOMY ANODIZED STEEL') AS ALL_NATIONS
GROUP BY O_YEAR
ORDER BY O_YEAR

/* TPC_H Query 9 - */
select nation, o_year, sum(amount) as sum_profit
from ( select n_name as nation, extract(year from o_orderdate) as o_year, l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount from part, supplier, lineitem, partsupp, orders, nation where s_suppkey = l_suppkey and ps_suppkey = l_suppkey and ps_partkey = l_partkey and p_partkey = l_partkey and o_orderkey = l_orderkey and s_nationkey = n_nationkey and p_name like '%[COLOR]%' ) as profit
group by nation, o_year
order by nation, o_year desc;

/* TPC_H Query 10 - */
select c_custkey, c_name, sum(l_extendedprice * (1 - l_discount)) as revenue, c_acctbal, n_name, c_address, c_phone, c_comment
from customer, orders, lineitem, nation
where c_custkey = o_custkey and l_orderkey = o_orderkey and o_orderdate >= date '[DATE]' and o_orderdate < date '[DATE]' + interval '3' month and l_returnflag = 'R' and c_nationkey = n_nationkey
group by c_custkey, c_name, c_acctbal, c_phone, n_name, c_address, c_comment
order by revenue desc;

/* TPC_H Query 11 - Important Stock Identification */
SELECT PS_PARTKEY, SUM(PS_SUPPLYCOST*PS_AVAILQTY) AS VALUE
FROM PARTSUPP, SUPPLIER, NATION
WHERE PS_SUPPKEY = S_SUPPKEY AND S_NATIONKEY = N_NATIONKEY AND N_NAME = 'GERMANY'
GROUP BY PS_PARTKEY
HAVING SUM(PS_SUPPLYCOST*PS_AVAILQTY) > (SELECT SUM(PS_SUPPLYCOST*PS_AVAILQTY) * 0.0001000000 FROM PARTSUPP, SUPPLIER, NATION WHERE PS_SUPPKEY = S_SUPPKEY AND S_NATIONKEY = N_NATIONKEY AND N_NAME = 'GERMANY')
ORDER BY VALUE DESC

/* TPC_H Query 12 - */
select l_shipmode, sum(case when o_orderpriority = '1-URGENT' or o_orderpriority = '2-HIGH' then 1 else 0 end) as high_line_count, sum(case when o_orderpriority <> '1-URGENT' and
o_orderpriority <> '2-HIGH' then 1 else 0 end) as low_line_count
from orders, lineitem
where o_orderkey = l_orderkey and l_shipmode in ('[SHIPMODE1]', '[SHIPMODE2]') and l_commitdate < l_receiptdate and l_shipdate < l_commitdate and l_receiptdate >= date '[DATE]' and l_receiptdate < date '[DATE]' + interval '1' year
group by l_shipmode
order by l_shipmode;

/* TPC_H Query 13 - */
select c_count, count(*) as custdist
from ( select c_custkey, count(o_orderkey) from customer left outer join orders on c_custkey = o_custkey and o_comment not like '%[WORD1]%[WORD2]%' group by c_custkey ) as c_orders (c_custkey, c_count)
group by c_count
order by custdist desc, c_count desc;

/* TPC_H Query 14 - */
select 100.00 * sum(case when p_type like 'PROMO%' then l_extendedprice*(1-l_discount) else 0 end) / sum(l_extendedprice * (1 - l_discount)) as promo_revenue
from lineitem, part
where l_partkey = p_partkey and l_shipdate >= date '[DATE]' and l_shipdate < date '[DATE]' + interval '1' month;

/* TPC_H Query 15 - */
select l_suppkey, sum(l_extendedprice * (1 - l_discount))
from lineitem
where l_shipdate >= date '[DATE]' and l_shipdate < date '[DATE]' + interval '3' month
group by l_suppkey;
select s_suppkey, s_name, s_address, s_phone, total_revenue
from supplier, revenue[STREAM_ID]
where s_suppkey = supplier_no and total_revenue = (select max(total_revenue) from revenue[STREAM_ID])
order by s_suppkey;
drop view revenue[STREAM_ID];

/* TPC_H Query 16 - Parts/Supplier Relationship */
SELECT P_BRAND, P_TYPE, P_SIZE, COUNT(DISTINCT PS_SUPPKEY) AS SUPPLIER_CNT
FROM PARTSUPP, PART
WHERE P_PARTKEY = PS_PARTKEY AND P_BRAND <> 'Brand#45' AND P_TYPE NOT LIKE 'MEDIUM POLISHED%%' AND P_SIZE IN (49, 14, 23, 45, 19, 3, 36, 9) AND PS_SUPPKEY NOT IN (SELECT S_SUPPKEY FROM SUPPLIER WHERE S_COMMENT LIKE '%%Customer%%Complaints%%')
GROUP BY P_BRAND, P_TYPE, P_SIZE
ORDER BY SUPPLIER_CNT DESC,
P_BRAND, P_TYPE, P_SIZE

/* TPC_H Query 17 - Small-Quantity-Order Revenue */
SELECT SUM(L_EXTENDEDPRICE)/7.0 AS AVG_YEARLY
FROM LINEITEM, PART
WHERE P_PARTKEY = L_PARTKEY AND P_BRAND = 'Brand#23' AND P_CONTAINER = 'MED BOX' AND L_QUANTITY < (SELECT 0.2*AVG(L_QUANTITY) FROM LINEITEM WHERE L_PARTKEY = P_PARTKEY)

/* TPC_H Query 18 - Large Volume Customer */
SELECT C_NAME, C_CUSTKEY, O_ORDERKEY, O_ORDERDATE, O_TOTALPRICE, SUM(L_QUANTITY)
FROM CUSTOMER, ORDERS, LINEITEM
WHERE O_ORDERKEY IN (SELECT L_ORDERKEY FROM LINEITEM GROUP BY L_ORDERKEY HAVING SUM(L_QUANTITY) > 300) AND C_CUSTKEY = O_CUSTKEY AND O_ORDERKEY = L_ORDERKEY
GROUP BY C_NAME, C_CUSTKEY, O_ORDERKEY, O_ORDERDATE, O_TOTALPRICE
ORDER BY O_TOTALPRICE DESC, O_ORDERDATE
LIMIT 100

/* TPC_H Query 19 - */
select sum(l_extendedprice * (1 - l_discount)) as revenue
from lineitem, part
where ( p_partkey = l_partkey and p_brand = '[BRAND1]' and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG') and l_quantity >= [QUANTITY1] and l_quantity <= [QUANTITY1] + 10 and p_size between 1 and 5 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ) or ( p_partkey = l_partkey and p_brand = '[BRAND2]' and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK') and l_quantity >= [QUANTITY2] and l_quantity <= [QUANTITY2] + 10 and p_size between 1 and 10 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ) or ( p_partkey = l_partkey and p_brand = '[BRAND3]' and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG') and l_quantity >= [QUANTITY3] and l_quantity <= [QUANTITY3] + 10 and p_size between 1 and 15 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' );

/* TPC_H Query 20 - */
select s_name, s_address
from supplier, nation
where
s_suppkey in ( select ps_suppkey from partsupp where ps_partkey in ( select p_partkey from part where p_name like '[COLOR]%' ) and ps_availqty > ( select 0.5 * sum(l_quantity) from lineitem where l_partkey = ps_partkey and l_suppkey = ps_suppkey and l_shipdate >= date '[DATE]' and l_shipdate < date '[DATE]' + interval '1' year ) ) and s_nationkey = n_nationkey and n_name = '[NATION]'
order by s_name;

/* TPC_H Query 21 - Suppliers Who Kept Orders Waiting */
SELECT S_NAME, COUNT(*) AS NUMWAIT
FROM SUPPLIER, LINEITEM L1, ORDERS, NATION
WHERE S_SUPPKEY = L1.L_SUPPKEY AND O_ORDERKEY = L1.L_ORDERKEY AND O_ORDERSTATUS = 'F' AND L1.L_RECEIPTDATE > L1.L_COMMITDATE AND EXISTS (SELECT * FROM LINEITEM L2 WHERE L2.L_ORDERKEY = L1.L_ORDERKEY AND L2.L_SUPPKEY <> L1.L_SUPPKEY) AND NOT EXISTS (SELECT * FROM LINEITEM L3 WHERE L3.L_ORDERKEY = L1.L_ORDERKEY AND L3.L_SUPPKEY <> L1.L_SUPPKEY AND L3.L_RECEIPTDATE > L3.L_COMMITDATE) AND S_NATIONKEY = N_NATIONKEY AND N_NAME = 'SAUDI ARABIA'
GROUP BY S_NAME
ORDER BY NUMWAIT DESC, S_NAME
LIMIT 100

/* TPC_H Query 22 - */
select cntrycode, count(*) as numcust, sum(c_acctbal) as totacctbal
from (select substring(c_phone from 1 for 2) as cntrycode, c_acctbal from customer where /* substring(c_phone from 1 for 2) in ('[I1]','[I2]','[I3]','[I4]','[I5]','[I6]','[I7]') and */ c_acctbal > ( select avg(c_acctbal) from customer where c_acctbal > 0.00 /* and substring(c_phone from 1 for 2) in ('[I1]','[I2]','[I3]','[I4]','[I5]','[I6]','[I7]') */ ) and not exists (select * from orders where o_custkey = c_custkey ) ) as custsale
group by cntrycode
order by cntrycode;

4.4.3 How the Proposed Workload Technique Works

We have executed OLTP- and DSS-like workloads. During their execution, snapshots of the three selected status variables are collected at an interval of 5 seconds.
To build the training data, these snapshots are marked as OLTP or DSS according to their type and are used for workload characterization. However, the training snapshots must be sufficient to build the workload characterization, and the selected interval must span a time period in which at least one SQL statement completes its execution. To find a sufficient snapshot interval, 1 second was tried first; we observed that many OLTP SQL statements execute completely within it, but this does not hold for the DSS-like SQL statements, which are large and take longer to complete. For this reason, we incremented the snapshot time and observed the query completion behaviour within each candidate interval. After executing various OLTP- and DSS-like workloads and observing their associated snapshots, we selected 5 seconds as the snapshot interval, since at least one OLTP or DSS statement completes within this time.

After building the training data, we executed OLTP- and DSS-like workloads for 20 to 30 minutes. During workload execution, snapshots are stored at an interval of 5 seconds. The proposed workload characterization technique is then applied to distinguish and classify the workload into OLTP and DSS.

4.4.4 Results Discussion

The proposed database workload characterization technique is validated and found to be effective. For the experiments, we have taken OLTP- and DSS-like workloads (the ordering profile as OLTP and the browsing profile as DSS). The values of the three selected MySql status variables are obtained through the show status statement of MySql and stored in the database to identify the workload type. The workload is generated from the client machines in the form of threads, and during workload execution the status variable values are recorded at an interval of 5 seconds.
The workload characterization during the browsing workload reports that 89.3% of the workload is DSS and 11.7% is OLTP. In the case of the ordering workload, it reports 8.5% of the workload as DSS and the remainder as OLTP. After that, the workload related to the shopping profile is executed; its analysis shows 77% DSS and 23% OLTP workload in the shopping profile. The results obtained from these experiments match our anticipation, given the benchmark description and our understanding of the selected attributes. The same experiments were repeated multiple times to confirm the results. This method works well to identify the workload type, but some issues remain. First, the workload parameters are only available during or after the complete execution of the workload. For this reason, small transactions cannot be managed, as they complete before their workload type is identified. In the case of large transactions, efficient execution may require stopping some transactions, and their already-executed part must be stored in memory for later use (after resume). The other problem with the approach is that it cannot be used for mixed workloads, i.e. workloads that consist of both OLTP and DSS queries.

Summary

In this chapter, we have discussed the workload characterization approach based on Case Based Reasoning that makes the decision about the workload type. The approach starts by identifying the key workload parameters that are most effective for characterization and classification. On the basis of these parameters, the workload is classified into the OLTP and DSS types. The approach is validated by executing various OLTP and DSS like workloads and is found to be effective in identifying the workload type.
Chapter 5
Fuzzy Logic Based Scheduling

This chapter discusses the Fuzzy logic based technique for scheduling the DBMS workload. The first section of the chapter provides an overview and background of the scheduling technique and Fuzzy Logic. Here the workloads are arranged on the basis of the OLTP and DSS percentage of each workload; subsequently, the entire workload is arranged for execution on the basis of that percentage. The proposed Fuzzy based scheduling considers the workload type, size, and importance.

5.1 Introduction and Background

In this section, scheduling and Fuzzy Logic are discussed.

5.1.1 Scheduling

In a DBMS, scheduling is the process by which the workloads generated by users and the system are arranged in a specific order for faster execution. It plays an important role in taking care of all the underlying issues and arranges the database workload in such a way that each request waits for the least time (wait time) while CPU utilization is maximized. Scheduling in a DBMS can be performed either internally or externally [Schroeder et al. (2006)]. Internal scheduling is achieved by modifying the internals of the DBMS, while external scheduling requires no such changes. DBMS researchers, vendors, and practitioners prefer external scheduling due to a number of benefits: they do not need to learn the DBMS's internal architecture, it is not necessary to change the DBMS internals, and it is general rather than specific to a certain DBMS, hence portable. Moreover, external scheduling can be modified easily as per the requirements and objectives of the organization. It has been applied to web, storage, and database servers for many years. However, external scheduling is mostly used for limiting the workload or the number of requests, i.e. the multiprogramming level.
In a small organization, the DBMS executes all requests in time without compromising the work of higher management. However, when a huge number of sessions, requests, and other system jobs arrive for execution simultaneously, the DBMS either crashes due to overload or higher management has to wait a long time for the important reports needed for decision making. There is also a need to manage the workload not only efficiently but also by giving high priority to important workload compared to the workload generated by lower management. All these factors contribute to the need for an efficient scheduler, which may be either internal or external. The scheduler will send the workload to the DBMS in an order by which both starvation and underload problems can be avoided. Hence, the proposed scheduler is required to take care of user and system oriented workloads with respect to size and importance at all times. This strategy also helps to use the maximum CPU cycles and ultimately enhances the efficiency of the DBMS.

5.1.2 Fuzzy Logic

In Boolean logic there are two values, True and False, while in Fuzzy logic the truth value lies in the range [0, 1]. Fuzzy logic was introduced by Lotfi Zadeh in 1965 and is based on the concept of partial truth and approximation rather than fixed values [Zadeh, (1965)]. It is similar to probabilistic logic in that in both cases the truth value ranges from 0 to 1, but a Fuzzy value represents a degree of truth while a probability represents a likelihood. Since its introduction, it has been used in various fields including Control Theory, Engineering, Networks, DBMS, and Artificial Intelligence. Fuzzy logic is performed through four steps: Fuzzification, Fuzzy Rules Evaluation, Aggregation, and Defuzzification.
In the fuzzification step, crisp input values are evaluated by applying membership functions that return a value between 0 and 1. A membership function must be simple and easily convertible to a fuzzy set. Different methods are used for this purpose, such as Gaussian, Triangular, or Trapezoidal. In Rule Evaluation, the output received from the fuzzification step is checked against predefined fuzzy rules which portray the expert decision and are written in the form of fuzzy words. Aggregation is applied to each output variable and converts the result into a fuzzy set. Finally, in Defuzzification, the fuzzified rules are used to produce the final output by changing the fuzzy set into output values. Usually two methods (the Maximum method and the Centroid method) are used in this step.

5.2 Fuzzy Based (FB) Scheduler

It is not necessary that a DBMS always experiences a solely OLTP or solely DSS workload. By solely we mean the crisp logic, i.e. the workload is 100% OLTP and 0% DSS, or 100% DSS and 0% OLTP. Most of the time a DBMS experiences a mixed workload, so we cannot achieve better performance from the DBMS if the workload is treated with crisp logic. The behavior of the workload in a DBMS mostly remains non-deterministic; for example, w1 and w2 may consist of 20% OLTP and 80% DSS transactions, and 60% OLTP and 40% DSS transactions, respectively. There is a need for a scheduler that can find the percentage of OLTP and DSS in the incoming workload. By doing this, database workloads can be arranged in a better way. Fuzzy logic is adopted to find the workload type percentage of the incoming workload. Fuzzy logic is related to Fuzzy set theory [Zadeh, (1965)], which gives an approximate value within the range (0, 1).
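As a minimal illustration of the fuzzification step, a Gaussian membership function maps a crisp input to a degree of membership in (0, 1]; the center and spread values used in any call are arbitrary assumptions of this sketch, not values from the thesis.

```python
import math

def gaussian_membership(x, center, sigma):
    """Degree of membership of crisp input x in a fuzzy set centered
    at `center` with spread `sigma`; equals 1.0 exactly at the center
    and decays smoothly toward 0 away from it."""
    return math.exp(-((x - center) ** 2) / (2 * sigma ** 2))
```

An input is considered to belong to the fuzzy set when the returned value is close to 1, which is how the Gaussian method is used in the scheduler below.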
It is used in various fields including Control Theory, Engineering, Networks, DBMS, and Artificial Intelligence. Crisp logic represents a value as solely true or false, whereas Fuzzy logic represents the degree of truth or falsehood. The following section discusses the architecture of the proposed scheduler for the DBMS, which uses Fuzzy logic to calculate the OLTP and DSS percentages. Finally, on the basis of these percentages, the FB scheduler arranges the partitioned workload through the Fuzzy rules and membership functions. The scheduler takes the workload related parameter values from the characterization module. The impact of each workload is calculated on the basis of the parameters and the identified workload type. We use Fuzzy logic to schedule the workload dynamically. In Boolean logic each decision has two outputs, true or false, whereas in Fuzzy logic the truth value ranges from 0 to 1. Fuzzy logic works on the concept of partial truth, where the truth value may range between completely true and completely false. The advantages of using Fuzzy logic for scheduling the workload include easy understanding, incremental growth of the knowledge with the arrival of new queries and workloads, and no need to know the inner details of the workload or queries. There are three steps in the proposed FB management: Fuzzification, Fuzzification Rules, and Defuzzification. The following sections describe these steps in detail.

5.2.1 Fuzzification

In this first step, the crisp input values are evaluated by applying the membership functions, which return a value for the OLTP and DSS between 0 and 1. There are various choices for the membership function, such as the Gaussian, triangular, or trapezoidal method.
However, for this particular problem, we have selected the Gaussian method as it is simple and easily convertible to a Fuzzy set. An input value belongs to the Fuzzy set when the value received from the membership function is close to 1. Table 5.1 represents the various queries involved in the database workload 'w' that was executed over MySQL 5.1. The workload 'w' consists of 5 mixed queries (Q1 - Q5) that each involve both OLTP and DSS statements. To find the OLTP and DSS percentage of each query, we first calculated the total points of each query on the basis of four counts: the number of tables, conditions, group-by fields, and math functions involved in the query (counted for both the main query and any inner query).

Table 5.1 The Workload Involving Mix OLTP and DSS Queries

SQL Statements in workload     Total Points   OLTP %   DSS %
Q1 (Insert, Select)            8              0.125    0.875
Q2 (Update, Select)            10             0.5      0.5
Q3 (Delete, Select)            10             0.9      0.1
Q4 (Insert, Select, Select)    11             0.091    0.909
Q5 (Delete, Select)            12             0.75     0.25
Average                                       0.473    0.527

The values for these counts are taken from each query and then summed up. For example, in the case of query 1 the total points are 1 + 4 + 3 = 8. After that, the percentage of OLTP and DSS in the query is calculated. For the Insert statement of Q1 the percentage is calculated as 1/8 + 0/8 + 0/8 + 0/8 = 0.125, where 1, 0, 0, 0 are the values of the tables, conditions, group-by, and function parameters of the statement respectively, and 8 represents the total points of the query.
For the Select statement of Q1, the percentage is calculated as 4/8 + 3/8 + 0/8 + 0/8 = 0.875, where 4, 3, 0, 0 are the values of the tables, conditions, group-by, and function parameters of the statement respectively, and 8 represents the total points of the query. In the same way, the OLTP and DSS percentages are calculated for the remaining queries, and finally the total OLTP and DSS percentage of the entire workload is calculated by averaging the OLTP and DSS percentages of all the queries. In the case represented in Table 5.1, the workload is found to be 0.473 (47.3%) OLTP and 0.527 (52.7%) DSS. Finally, the OLTP and DSS percentage is calculated for the whole workload generated during a particular time. Table 5.2 represents a workload generated by the users during some specific time t. The entire workload consists of 6 sub-workloads generated by various users with different priorities. The percentage of OLTP and DSS is calculated for each sub-workload according to the above methodology. In the next step, these sub-workloads are arranged so that the average waiting time of each sub-workload is minimized while respecting higher priority.

Table 5.2 Percentage of OLTP and DSS in the Workload Involving Mix Queries

Workload   User Priority   OLTP %   DSS %
w1         2               0.473    0.527
w2         6               0.85     0.15
w3         2               0.34     0.66
w4         0               0        1
w5         1               0.23     0.77
w6         3               0.1      0.9

5.2.2 Fuzzification Rules

The output of the Fuzzification step is processed through Fuzzy rules that portray the expert decision and are written in the form of Fuzzy words. Fuzzy rules are defined in the form "If Condition then Action". There are 8 different user priority levels that can be assigned to a user: priority 0 represents the user with the highest priority and 7 the lowest, while the value of DSS is in the range [0, 1].
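The point-counting computation described above can be sketched in a few lines; this is a minimal Python illustration using the totals and per-query fractions of Table 5.1 (the helper name `statement_fraction` is introduced here for illustration only).

```python
def statement_fraction(tables, conditions, group_by, functions, total_points):
    """Fraction of a query's total points contributed by one SQL statement."""
    return (tables + conditions + group_by + functions) / total_points

# Q1: the Insert statement contributes 1 table; the Select statement
# contributes 4 tables and 3 conditions; total points = 1 + 4 + 3 = 8.
oltp_q1 = statement_fraction(1, 0, 0, 0, 8)  # OLTP share of Q1
dss_q1 = statement_fraction(4, 3, 0, 0, 8)   # DSS share of Q1

# Workload-level shares: average the per-query fractions of Table 5.1.
oltp_parts = [0.125, 0.5, 0.9, 0.091, 0.75]
dss_parts = [0.875, 0.5, 0.1, 0.909, 0.25]
oltp_pct = sum(oltp_parts) / len(oltp_parts)
dss_pct = sum(dss_parts) / len(dss_parts)
```

Averaging the five per-query fractions reproduces the workload-level split of 0.473 OLTP and 0.527 DSS reported for Table 5.1.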
Initially, the database workloads were divided into 8 classes (Class 0 to 7) for execution. The workloads are divided into these classes according to their size and importance. The workloads in class 0 execute first, then class 1's workloads, and so on. The Fuzzy rules are shown in Table 5.3.

Table 5.3 Initial Fuzzification Rules

1.  If Up = 0 and DSS <= 0.5 then "Class 0"
2.  If Up = 0 and DSS > 0.5 then "Class 1"
3.  If Up = 1 and DSS <= 0.3 then "Class 0"
4.  If Up = 1 and DSS > 0.3 and DSS <= 0.6 then "Class 1"
5.  If Up = 1 and DSS > 0.6 and DSS <= 0.8 then "Class 2"
6.  If Up = 1 and DSS > 0.8 then "Class 3"
7.  If Up = 2 and DSS <= 0.2 then "Class 0"
8.  If Up = 2 and DSS > 0.2 and DSS <= 0.4 then "Class 1"
9.  If Up = 2 and DSS > 0.4 and DSS <= 0.7 then "Class 2"
10. If Up = 2 and DSS > 0.7 then "Class 3"
11. If Up = 3 and DSS <= 0.2 then "Class 0"
12. If Up = 3 and DSS > 0.2 and DSS <= 0.4 then "Class 1"
13. If Up = 3 and DSS > 0.4 and DSS <= 0.6 then "Class 2"
14. If Up = 3 and DSS > 0.6 and DSS <= 0.8 then "Class 3"
15. If Up = 3 and DSS > 0.8 then "Class 4"
16. If Up = 4 and DSS <= 0.1 then "Class 0"
17. If Up = 4 and DSS > 0.1 and DSS <= 0.3 then "Class 1"
18. If Up = 4 and DSS > 0.3 and DSS <= 0.5 then "Class 2"
19. If Up = 4 and DSS > 0.5 and DSS <= 0.8 then "Class 3"
20. If Up = 4 and DSS > 0.8 then "Class 4"
21. If Up = 5 and DSS <= 0.1 then "Class 0"
22. If Up = 5 and DSS > 0.1 and DSS <= 0.3 then "Class 1"
23. If Up = 5 and DSS > 0.3 and DSS <= 0.5 then "Class 2"
24. If Up = 5 and DSS > 0.5 and DSS <= 0.6 then "Class 3"
25. If Up = 5 and DSS > 0.6 and DSS <= 0.8 then "Class 4"
26. If Up = 5 and DSS > 0.8 then "Class 5"
27. If Up = 6 and DSS <= 0.1 then "Class 0"
28. If Up = 6 and DSS > 0.1 and DSS <= 0.3 then "Class 1"
29. If Up = 6 and DSS > 0.3 and DSS <= 0.4 then "Class 2"
30. If Up = 6 and DSS > 0.4 and DSS <= 0.5 then "Class 3"
31. If Up = 6 and DSS > 0.5 and DSS <= 0.7 then "Class 4"
32. If Up = 6 and DSS > 0.7 and DSS <= 0.8 then "Class 5"
33. If Up = 6 and DSS > 0.8 then "Class 6"
34. If Up = 7 and DSS <= 0.1 then "Class 0"
35. If Up = 7 and DSS > 0.1 and DSS <= 0.2 then "Class 1"
36. If Up = 7 and DSS > 0.2 and DSS <= 0.3 then "Class 2"
37. If Up = 7 and DSS > 0.3 and DSS <= 0.4 then "Class 3"
38. If Up = 7 and DSS > 0.4 and DSS <= 0.5 then "Class 4"
39. If Up = 7 and DSS > 0.5 and DSS <= 0.7 then "Class 5"
40. If Up = 7 and DSS > 0.7 and DSS <= 0.8 then "Class 6"
41. If Up = 7 and DSS > 0.8 then "Class 7"

This is one way to represent the Fuzzy rules, but other approaches exist for representing them. One such method is the Fuzzy Associative Matrix (FAM), where the Fuzzy rules are written in tabular form. Usually the FAM table consists of two input parameters and the associated output. With this approach, the rules are written in compact form and are easy to understand. We have converted the above Fuzzy rules into the FAM shown in Table 5.4.

Table 5.4 Initial Fuzzification Rules in Compact Form

DSS \ Up           =0   =1   =2   =3   =4   =5   =6   =7
<=0.1               0    0    0    0    0    0    0    0
>0.1 and <=0.2      0    0    0    0    0    0    1    1
>0.2 and <=0.3      0    1    1    1    1    1    1    2
>0.3 and <=0.4      0    1    1    1    2    1    2    3
>0.4 and <=0.5      0    2    2    2    2    2    3    4
>0.5 and <=0.6      1    2    2    2    2    3    3    5
>0.6 and <=0.7      1    2    2    3    3    4    4    5
>0.7 and <=0.8      1    3    3    3    3    4    5    6
>0.8 and <=0.9      1    3    3    4    4    5    6    7
>0.9 and <=1        1    3    3    4    4    5    6    7

However, after the experiments it was observed that this scheme takes too much calculation time due to the large number of Fuzzy rules and their computation. For this reason, we have reduced the number of classes to three: High, Medium, and Low. Each workload is assigned to the High, Medium, or Low class on the basis of user priority and the percentage of DSS. The High class consists of those workloads that will be executed first, Medium class workloads execute subsequently, and Low class workloads are executed last.
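The three-class assignment just described, using the final rules given in Table 5.5, can be sketched as a small table lookup. This is a minimal Python illustration; the handling of values falling exactly on the 0.3 and 0.5 boundaries is an assumption of the sketch, and the thesis's worked example applies a slightly different boundary for Up = 2.

```python
def classify(up, dss):
    """Look up the final FAM (Table 5.5): user priority `up` (0-7, lower
    is more important) and DSS fraction `dss` in [0, 1] map to a
    scheduling class. Boundary handling is an assumption of this sketch."""
    col = 0 if up <= 1 else (1 if up <= 4 else 2)
    row = 0 if dss <= 0.3 else (1 if dss <= 0.5 else 2)
    fam = [
        ["High",   "High",   "Medium"],  # DSS <= 0.3
        ["High",   "Medium", "Low"],     # 0.3 < DSS <= 0.5
        ["Medium", "Low",    "Low"],     # DSS > 0.5
    ]
    return fam[row][col]
```

For instance, a highest-priority, purely DSS workload (Up = 0, DSS = 1) lands in the Medium class, while a mid-priority, mostly OLTP workload (Up = 3, DSS = 0.2) lands in High.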
Here, the High class does not mean that the workload takes only the smallest time; rather, it represents those workloads that take a small time and have a higher user priority (Up). The Fuzzy rules have been developed after executing various OLTP and DSS like workloads and by taking the opinions of database experts. Each class is identified by the Fuzzification output and the Fuzzy rules. These rules are written in the form of a Fuzzy Associative Matrix (FAM) as shown in Table 5.5.

Table 5.5 Final Fuzzification Rules in Compact Form

DSS \ Up     <=1       >=2 and <=4   >=5
<=0.3        High      High          Medium
0.3 to 0.5   High      Medium       Low
>0.5         Medium    Low           Low

5.2.3 Defuzzification

The fuzzified rules are used in the defuzzification step to produce the final output by changing the Fuzzy set into output values. According to the OLTP and DSS percentages received from step 1 and the fuzzified rules of step 2, we have identified the workload class for each workload of Table 5.2; the result is shown in Table 5.6. For example, w1 falls in the Medium class: its user priority is 2 and its DSS value is 0.527, so the rule "If Up <= 2 and DSS > 0.5 then Medium" of Table 5.5 applies.

Table 5.6 Workloads with their Identified Workload Class

Workload   Workload Class
w1         Medium
w2         High
w3         Medium
w4         Medium
w5         High
w6         Low

Finally, after the identification of the workload class for each sub-workload, all the sub-workloads are grouped into their respective classes. In this example, w2 and w5 fall in the High class; w1, w3, and w4 in the Medium class; and w6 in the Low class.

High Class = {w2, w5}
Medium Class = {w1, w3, w4}
Low Class = {w6}

Summary

In this chapter we discussed the scheduling technique that arranges the workload for efficient execution.
The scheduling is performed by finding the degree or percentage of OLTP and DSS in each workload. The percentage is calculated using the Fuzzy Logic approach, which has three main steps: Fuzzification, Fuzzy Rules, and Defuzzification. In the Fuzzification step, the membership function is calculated on the basis of the workload parameter values and converted into a Fuzzy set. Each workload is assigned to the Low, Medium, or High class with respect to its size and importance. The membership functions and the Fuzzy rules were identified after executing various workloads. Once the workload is grouped into classes, the scheduling is performed on the basis of workload importance and the percentage of DSS and OLTP.

Chapter 6
Idleness Detection

This chapter discusses the proposed idleness detection approach, in which a CBR is built that works on the basis of training data. The data for the CBR is developed by executing various OLTP and DSS like workloads, identifying the number of OLTP and DSS requests and the associated CPU idleness, and finally storing the number of OLTP and DSS queries along with their CPU idleness.

6.1 CBR Development for Idleness Detection

The sample data for idleness detection is constructed for the CBR. The database workload is generated by threads, where each thread is responsible for executing OLTP or DSS like workloads. The parameters stored in the CBR are Workload ID, OLTP Transactions, DSS Transactions, and CPU Idleness. The Workload ID column is a unique, automatically generated number for workload identification; OLTP Transactions is the number of OLTP like queries and DSS Transactions the number of DSS like queries in the particular workload; and CPU Idleness is the percentage of CPU cycles that remain free during the execution of the particular workload.
Table 6.1 CBR Data for CPU Idleness Detection

Workload ID   OLTP Transactions   DSS Transactions   CPU Idleness (%)
1             12                  5                  28
2             5                   3                  57
3             7                   11                 19
4             15                  5                  33
5             31                  3                  26
6             2                   5                  19
...           ...                 ...                ...

In order to find the CPU idleness for the incoming workload, the proposed idleness detection technique has four steps: Retrieve, Reuse, Revise, and Retain.

6.1.1 Retrieve

The first step in finding the CPU idleness from the CBR is to match the parameters (OLTP and DSS Transactions) of the currently executing workload against the already stored cases. The retrieval is performed from the problem description. It is important to note that missing similar cases can lead to an inaccurately matched case, which ultimately produces wrong decisions; accuracy depends on the number of cases stored and on retrieving the matching cases. The similarity of cases can be determined through various techniques, such as finding the distance between two cases through similarity matrices provided by experience or experts. The retrieval step has four sub-steps: identification of the features, initial match, search, and selection of cases. Here, again, the Euclidean distance method [Deza et al. (2009), Danielsson (1980)] is used to find the matched cases, so the same Euclidean distance formula as discussed in Chapter 4 is used to find the distance between the incoming workload and the stored cases. It is one of the simplest and fastest methods.

Figure 6.2 Steps to Identify the CPU Idleness

Example 1

Suppose the DBMS encounters a new workload with 9 OLTP and 13 DSS transactions. We now have to find the CPU idleness for this workload by taking the matched cases from the CBR.
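The nearest-case retrieval for Example 1 can be sketched as follows; a minimal Python illustration using the cases of Table 6.1.

```python
import math

# (OLTP transactions, DSS transactions) -> CPU idleness %, from Table 6.1.
cases = [
    ((12, 5), 28), ((5, 3), 57), ((7, 11), 19),
    ((15, 5), 33), ((31, 3), 26), ((2, 5), 19),
]

def retrieve(oltp, dss):
    """Return (distance, idleness) of the stored case nearest to the
    incoming workload, by Euclidean distance over the two counts."""
    return min(
        (math.dist((oltp, dss), params), idleness)
        for params, idleness in cases
    )

# Example 1: new workload with 9 OLTP and 13 DSS transactions.
distance, idleness = retrieve(9, 13)  # nearest case is (7, 11) -> 19
```

The minimum over (distance, idleness) pairs compares distances first, so the idleness of the closest case is returned alongside its distance.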
Here, the workload parameter values of the CBR and the new workload are not converted into the range [0, 1]; the raw values are used, as shown in Table 6.2. The difference between the new workload and every stored record of the CBR is calculated using the Euclidean distance formula. The CPU idleness for the new workload is taken from the record with the minimum distance value. In this example, the minimum Euclidean distance is 2.82842712 (row 3) and the associated CPU idleness is 19; therefore the CPU idleness of the incoming workload is also taken as 19. The identified CPU idleness is then used for the system processes. However, when no exact or sufficiently near match is found, the workload executes without a predicted CPU idleness, and after its completion it is stored as a new case with the actual CPU idleness. A record is considered a near match when the Euclidean difference is:
- <= 0.5, when the numbers of OLTP and DSS transactions are less than or equal to 1;
- <= 10, when the numbers of OLTP and DSS transactions are less than or equal to 100;
- <= 100, when the numbers of OLTP and DSS transactions are less than or equal to 1000.

Table 6.2 Euclidean Difference Calculation (Example 1)

CBR Data                                        New Workload
OLTP Trans.   DSS Trans.   CPU Idleness (%)     OLTP Trans.   DSS Trans.   Euclidean Difference
12            5            28                   9             13           8.54400375
5             3            57                   9             13           10.77032961
7             11           19                   9             13           2.82842712
15            5            33                   9             13           10.00000000
31            3            26                   9             13           24.16609195
2             5            19                   9             13           10.63014581

We have now explained the method by which the nearest matching record is retrieved from the CBR.

6.1.2 Reuse

The Reuse step starts after the completion of Retrieve and is achieved by performing two sub-steps.
First, finding the differences between the current and the retrieved case; and second, identifying the portion of the retrieved case that must be stored as a new case in the CBR for future use. The Reuse step proposes the CPU idleness from the solution retrieved in the previous step. Reusing a retrieved case can be as simple as returning the retrieved solution unchanged, as here we just use the value of the most similar case from the Retrieve step. This methodology is mostly used in classification problems, where each workload class or type is likely to be represented frequently in the case base, so the most similar retrieved case, if sufficiently similar, most probably contains the right solution. However, it becomes more difficult when there is a major difference between the incoming workload and the retrieved cases; in those circumstances the retrieved solution may need to be adapted to account for the important differences.

6.1.3 Revise

The Revise step is initiated only when the solution found in the previous step (Reuse) is inaccurate. The revise process is achieved by evaluating the reused solution and correcting it by applying domain knowledge. This step needs an adaptation method that revises the solution by observing the change in the retrieve process and how the retrieved solution can be changed. We have used the Substitution adaptation method as proposed by [Kolodner (1993)], whereas transformation adaptation alters the structure of the solution. There are two ways of adaptation: in the first, the user is involved at each substitution step and decides whether the substitution meets his requirement, i.e. the user accepts or rejects the substitution; in the second, the system performs all the substitutions on its own, and the substituted result is finally sent to the user for evaluation.
For the training data, we adopted the first approach in order to obtain accuracy.

6.1.4 Retain

The Retain phase is the final step in CBR; it inserts the valuable information of the current case into the knowledge base, where it is helpful for solving future problems. However, duplicate or extra information should be avoided when incorporating a new case, otherwise it slows down the retrieval process. The retain process also determines how the information will be stored and indexed in the current memory structure.

Summary

In this chapter, the proposed idleness detection approach to utilize free CPU cycles for improving the workload management process is discussed. The approach consists of a CBR containing training data obtained by executing OLTP and DSS like workloads, identifying the CPU idleness, and storing the workload related parameters. In case of no exact match, the workload is executed without a predicted idleness the first time and is stored as a new case in the CBR after execution. Once the CPU idleness is found, the idle period is used for other processes such as defragmentation, index rebuilding, etc. The defragmentation process runs up to the detected idleness time; when the idleness time expires, the defragmentation or other process may have completed, otherwise it is stopped again.

Chapter 7
Results and Discussion

This chapter is dedicated to the results achieved after the implementation of the proposed and other workload management approaches. The correctness and performance of the proposed approach were discussed theoretically in the previous chapters. The objective of this chapter is to validate the proposed research by comparing it with other well-known approaches.
Section 7.1 explains the experimental environment. Section 7.2 calculates the wait time of the proposed and other workload management approaches to show their effectiveness. Different workload scenarios are taken in these experiments to demonstrate the effectiveness of the proposed research.

7.1 Experimental Setup

The proposed approach is implemented in VB.Net, and all the experiments are executed on a machine having an Intel Core 2 Duo 3.0 GHz processor, a 280 GB hard disk, and 2 GB of main memory, running Windows 7 Ultimate 32-bit and MySQL server version 5.1. There are two databases on the MySQL server, namely TPCH (1.03 GB), having 8 tables, and TPCC, as per the specifications of the Transaction Processing Performance Council (TPC). The data in the tables of these two databases are created through the DBGen [TPC Benchmark, 2011] and Dbt2 [Database Test 2 (2011)] tools. DBGen is used to generate the TPC-H benchmark data, while Dbt2 is used for the TPC-C benchmark to populate the workload or transactions executed by different threads from the client machines. The hardware for experimentation and validation of the proposed solution consists of the database server, a backup server, six personal computers, and a switch, as discussed in Chapter 3. All the devices are connected through a network. The MySql Database Management System is used as the data source; it is selected due to its free availability and open source nature, i.e. the source code is available and can be analyzed and modified accordingly. MySql Server is installed on the server machine and the MySql client on the client machines. The main database is stored on the server machine, while the backup server is responsible for maintaining the backup of the database. Each client machine has the ability to generate a number of workloads at a time. The workload is generated through threads and sent to the database server for execution.
The database is stored on the server as well as on the backup machine, which also holds the various OLTP and DSS queries that are picked arbitrarily by the threads. In each experiment, the proposed method and three previously used workload management approaches are executed separately in the same experimental environment. The results reported here were selected after multiple executions. As explained in previous chapters, the experimental evaluation is conducted by executing benchmark workloads that consist of OLTP and DSS like queries. All the comparisons are presented through graphs that show how effective the proposed solution for workload management in DBMSs is.

7.2 Results and Discussion

To check the effectiveness of the proposed workload management technique, we have performed a comparison with the three most commonly used techniques: First In First Out (FIFO), Priority Based (PB) workload management, and Shortest Job First (SJF). The comparison is based on the average waiting time obtained by executing various OLTP and DSS like workloads under these four scheduling techniques. The FIFO technique follows the rule that the workload that arrives first is executed first; SJF scheduling executes the requests in order from smallest to largest; whereas PB executes the request with the highest priority first, even if it takes the longest time. There are two main problems with FIFO based workload management: first, a small request has to wait a long time if it arrives after a long one, and second, it does not consider priority or importance, i.e. whether the request is generated by higher management or simply by an operator. Like FIFO, SJF considers all requests at the same level and executes them from smallest to largest without considering their impact on the organization or how important they are for higher management.
On the other hand, PB workload management executes the requests on the basis of priority only, so the requests generated by lower management have to wait for a long time even if they are small in size.

7.2.1 Proposed Vs FIFO Based Workload Management

In order to evaluate the performance of the proposed approach, a number of experiments have been executed in which the database workloads (consisting of OLTP- and DSS-like queries) are executed under both the FIFO and the proposed approach.

Experiment 1

Consider the workloads presented in Table 7.1 to Table 7.3 below, where six (6) mixed workloads are generated by users having different priorities. In this experiment, we have taken only those workloads that take 0 – 15 seconds to complete. After the execution of these workloads, performance behavior is evaluated by observing the workload execution time and waiting time.

Table 7.1 Comparative Results of the Proposed and FIFO Workload Management for Scenario 1

| W | Arrival order | Priority | OLTP % | DSS % | OLTP ET | DSS ET | Total Time | FIFO WT | PA Order | PA WT |
|---|---|---|---|---|---|---|---|---|---|---|
| w1 | 2 | 2 | 0.473 | 0.527 | 3 | 6.4 | 9.4 | 10.9 | 4 | 20.6 |
| w2 | 4 | 6 | 0.85 | 0.15 | 4.2 | 2.1 | 6.3 | 27.5 | 1 | 0 |
| w3 | 1 | 2 | 0.34 | 0.66 | 2.7 | 8.2 | 10.9 | 0 | 5 | 30 |
| w4 | 3 | 0 | 0 | 1 | 0 | 7.2 | 7.2 | 20.3 | 3 | 13.4 |
| w5 | 6 | 1 | 0.23 | 0.77 | 0.2 | 6.9 | 7.1 | 40.03 | 2 | 6.3 |
| w6 | 5 | 3 | 0.1 | 0.9 | 0.03 | 6.2 | 6.23 | 33.8 | 6 | 40.9 |
| Average Wait Time | | | | | | | | 22.1 | | 18.53 |

Table 7.2 Comparative Results of the Proposed and FIFO Workload Management for Scenario 2

| W | Arrival order | Priority | OLTP % | DSS % | OLTP ET | DSS ET | Total Time | FIFO WT | PA Order | PA WT |
|---|---|---|---|---|---|---|---|---|---|---|
| w1 | 3 | 2 | 0.473 | 0.527 | 1 | 6.4 | 7.4 | 17 | 4 | 18.5 |
| w2 | 6 | 6 | 0.85 | 0.15 | 2 | 2.2 | 4.2 | 35.63 | 1 | 0 |
| w3 | 1 | 2 | 0.34 | 0.66 | 1.6 | 8.2 | 9.8 | 0 | 5 | 25.9 |
| w4 | 2 | 0 | 0 | 1 | 0 | 7.2 | 7.2 | 9.8 | 3 | 11.3 |
| w5 | 5 | 1 | 0.23 | 0.77 | 0.2 | 6.9 | 7.1 | 28.53 | 2 | 4.2 |
| w6 | 4 | 3 | 0.1 | 0.9 | 0.03 | 4.1 | 4.13 | 24.4 | 6 | 35.7 |
| Average Wait Time | | | | | | | | 19.2 | | 15.9 |

Table 7.3 Comparative Results of the Proposed and FIFO Workload Management for Scenario 3

| W | Arrival order | Priority | OLTP % | DSS % | OLTP ET | DSS ET | Total Time | FIFO WT | PA Order | PA WT |
|---|---|---|---|---|---|---|---|---|---|---|
| w1 | 1 | 2 | 0.473 | 0.527 | 1 | 6.4 | 7.4 | 0 | 4 | 16.5 |
| w2 | 2 | 6 | 0.85 | 0.15 | 2 | 2.9 | 4.9 | 7.4 | 1 | 0 |
| w3 | 5 | 2 | 0.34 | 0.66 | 1.6 | 8.2 | 9.8 | 22.73 | 5 | 23.9 |
| w4 | 3 | 0 | 0 | 1 | 0 | 6.3 | 6.3 | 12.3 | 3 | 10.2 |
| w5 | 6 | 1 | 0.23 | 0.77 | 0.2 | 5.1 | 5.3 | 32.53 | 2 | 4.9 |
| w6 | 4 | 3 | 0.1 | 0.9 | 0.03 | 4.1 | 4.13 | 18.6 | 6 | 33.7 |
| Average Wait Time | | | | | | | | 15.6 | | 14.9 |

After executing the OLTP- and DSS-like workloads, it is observed that our approach outperforms FIFO in two respects in all three scenarios, as presented in Tables 7.1, 7.2 and 7.3. In scenario 1, the average waiting time of the proposed approach (18.53) is less than that of FIFO (22.1), saving 16% of the time of FIFO-based workload management. In scenario 2, presented in Table 7.2, the average waiting time of the proposed approach (15.9) is less than that of FIFO (19.2); here the proposed approach again takes 17% less time than FIFO. In the third scenario, the average waiting time of the proposed approach (14.9) is less than that of FIFO (15.6), i.e. 4% less time. Averaged over these three scenarios, the proposed workload management approach saves 12.3% compared to FIFO. Besides the lower average waiting time, the second perspective is the importance of the workload. Strategic decisions are made by the higher authorities in every organization, so it is necessary to give priority to their workloads, and other workloads can be paused for a while. The FIFO approach does not consider the priority or importance of the workload, while our approach does.
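The per-scenario savings quoted above follow directly from the average waiting times in Tables 7.1 to 7.3, computed as the reduction relative to FIFO:

```python
# Average waiting times from Tables 7.1-7.3 (FIFO vs the proposed approach, PA)
fifo_awt = [22.1, 19.2, 15.6]
pa_awt = [18.53, 15.9, 14.9]

# Percent reduction relative to FIFO, per scenario, and the overall average saving
savings = [round((f - p) / f * 100) for f, p in zip(fifo_awt, pa_awt)]
overall = round(sum(savings) / len(savings), 1)
```

Here `savings` evaluates to [16, 17, 4] and `overall` to 12.3, matching the figures quoted in the text.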
To illustrate the tables above, the results are presented as graphs in Figure 7.1. These graphs show the waiting times of the proposed and FIFO techniques for the three scenarios, and portray that the proposed workload management approach is more efficient than FIFO.

Figure 7.1 Waiting Time of the Proposed Vs FIFO Workload Management

The average waiting times of the above three scenarios under the FIFO and proposed approaches are shown in Table 7.4, and the corresponding graph is shown in Figure 7.2, to illustrate the efficiency of the proposed workload management over the FIFO-based approach.

Table 7.4 Average Waiting Time of the Proposed and FIFO Workload Management

| Scenario | FIFO AWT | PA AWT |
|---|---|---|
| 1 | 22.1 | 18.53 |
| 2 | 19.2 | 15.9 |
| 3 | 15.6 | 14.9 |

Figure 7.2 Average Waiting Time of the Proposed Vs FIFO Workload Management

Experiment 2

In this experiment, we take the workloads shown in Tables 7.5 to 7.7 below, where six (6) mixed sub-workloads are generated by users having different priorities. We have taken only those workloads that take 0 – 1500 seconds (i.e. up to 25 minutes) to complete. After the execution of these workloads, performance behavior is evaluated by observing the workload execution time and waiting time.
Table 7.5 Comparative Results of the Proposed and FIFO Workload Management for Scenario 1

| W | Arrival order | Priority | OLTP % | DSS % | OLTP ET | DSS ET | Total Time | FIFO WT | PA Order | PA WT |
|---|---|---|---|---|---|---|---|---|---|---|
| w1 | 2 | 2 | 0.62 | 0.38 | 105.2 | 312.4 | 417.6 | 780.15 | 4 | 1292.36 |
| w2 | 4 | 6 | 1 | 0 | 24.2 | 0 | 24.2 | 1909.66 | 1 | 0 |
| w3 | 1 | 2 | 0.21 | 0.79 | 30.12 | 750.03 | 780.15 | 0 | 5 | 1709.96 |
| w4 | 3 | 0 | 0 | 1 | 0 | 711.91 | 711.91 | 1197.75 | 3 | 580.45 |
| w5 | 6 | 1 | 0.5 | 0.95 | 0.13 | 556.12 | 556.25 | 2237.4 | 2 | 24.2 |
| w6 | 5 | 3 | 0.54 | 0.46 | 102.34 | 201.2 | 303.54 | 1933.86 | 6 | 2490.11 |
| Average Wait Time | | | | | | | | 1343.14 | | 1016.18 |

Table 7.6 Comparative Results of the Proposed and FIFO Workload Management for Scenario 2

| W | Arrival order | Priority | OLTP % | DSS % | OLTP ET | DSS ET | Total Time | FIFO WT | PA Order | PA WT |
|---|---|---|---|---|---|---|---|---|---|---|
| w1 | 3 | 2 | 0.57 | 0.43 | 136.43 | 106.12 | 242.55 | 7184.2 | 4 | 2166.39 |
| w2 | 6 | 6 | 0.92 | 0.08 | 111.4 | 119 | 230.4 | 8483.26 | 1 | 0 |
| w3 | 1 | 2 | 0.002 | 0.998 | 19.6 | 6062 | 6081.6 | 0 | 5 | 2408.94 |
| w4 | 2 | 0 | 0.01 | 0.99 | 10.6 | 1092 | 1102.6 | 6081.6 | 3 | 1063.79 |
| w5 | 5 | 1 | 0.08 | 0.92 | 13.2 | 820.19 | 833.39 | 7649.87 | 2 | 230.4 |
| w6 | 4 | 3 | 0.79 | 0.21 | 96.12 | 127 | 223.12 | 7426.75 | 6 | 8490.54 |
| Average Wait Time | | | | | | | | 6137.61 | | 2393.34 |

Table 7.7 Comparative Results of the Proposed and FIFO Workload Management for Scenario 3

| W | Arrival order | Priority | OLTP % | DSS % | OLTP ET | DSS ET | Total Time | FIFO WT | PA Order | PA WT |
|---|---|---|---|---|---|---|---|---|---|---|
| w1 | 1 | 2 | 0.42 | 0.58 | 87.21 | 116.4 | 203.61 | 0 | 4 | 370.3 |
| w2 | 2 | 6 | 0.85 | 0.15 | 102 | 32 | 134 | 203.61 | 1 | 0 |
| w3 | 5 | 2 | 0.31 | 0.69 | 61.6 | 58.2 | 119.8 | 616.76 | 5 | 573.91 |
| w4 | 3 | 0 | 0.05 | 0.95 | 6.7 | 146.3 | 153 | 337.61 | 3 | 217.3 |
| w5 | 6 | 1 | 0.21 | 0.79 | 10.2 | 73.1 | 83.3 | 736.56 | 2 | 134 |
| w6 | 4 | 3 | 0.19 | 0.91 | 12.05 | 114.1 | 126.15 | 490.61 | 6 | 693.71 |
| Average Wait Time | | | | | | | | 397.53 | | 331.54 |

After executing the OLTP- and DSS-like workloads, it is observed that our approach outperforms FIFO in two respects in all three scenarios, as presented in Tables 7.5, 7.6 and 7.7. In scenario 1, the average waiting time of the proposed approach (1016.18) is less than that of FIFO (1343.14), so the proposed approach saves about 24% of the time of FIFO-based workload management for this workload.
In scenario 2, presented in Table 7.6, the average waiting time of the proposed approach (2393.34) is less than that of FIFO (6137.61); here the proposed workload management approach takes about 61% less time than FIFO. In the third scenario, the average waiting time of the proposed approach (331.54) is less than that of FIFO (397.53), i.e. about 17% less time. Averaged over these three scenarios, the proposed workload management approach saves roughly 34% compared to FIFO. Besides the lower average waiting time, the second perspective is the importance of the workload. Strategic decisions are made by the higher authorities in every organization, so it is necessary to give priority to their workloads over the others. Unlike the FIFO approach, our workload management approach also considers workload importance.

To illustrate the tables above, the results are presented as graphs in Figure 7.3. These graphs show the waiting times of the proposed and FIFO techniques for the three scenarios, and portray that the proposed workload management approach is more efficient than FIFO.

Figure 7.3 Waiting Time of the Proposed Vs FIFO Workload Management

The average waiting times of the above three scenarios under the FIFO and proposed workload management approaches are given in Table 7.8, and the corresponding graph is shown in Figure 7.4, to illustrate the efficiency of the proposed workload management.
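As a check, the percentage reductions implied by the average waiting times of Tables 7.5 to 7.7, computed the same way as in Experiment 1 (reduction relative to FIFO), are:

```python
# Average waiting times from Tables 7.5-7.7 (FIFO vs the proposed approach, PA)
fifo_awt = [1343.14, 6137.61, 397.53]
pa_awt = [1016.18, 2393.34, 331.54]

# Percent reduction relative to FIFO, per scenario, and the overall average saving
savings = [round((f - p) / f * 100) for f, p in zip(fifo_awt, pa_awt)]
overall = round(sum(savings) / len(savings))
```

Here `savings` evaluates to [24, 61, 17], for an overall saving of about 34% relative to FIFO.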
Table 7.8 Average Waiting Time of the Proposed and FIFO Workload Management

| Scenario | FIFO AWT | PA AWT |
|---|---|---|
| 1 | 1343.14 | 1016.18 |
| 2 | 6137.61 | 2393.34 |
| 3 | 397.53 | 331.54 |

Figure 7.4 Average Waiting Time of the Proposed Vs FIFO Workload Management

7.2.2 Proposed Vs PB Workload Management

The proposed and priority based (PB) workload management techniques are compared to check whether the proposed approach works better.

Experiment 1

Consider the workloads presented in Table 7.9 to Table 7.11, where six (6) mixed workloads are generated. In this experiment, we have taken only those workloads that take 0 – 15 seconds to complete. After the execution of these workloads, performance behavior is evaluated by observing the workload execution time and waiting time.

Table 7.9 Comparative Results of the Proposed and PB Workload Management for Scenario 1

| W | Arrival order | Priority | OLTP % | DSS % | OLTP ET | DSS ET | Total Time | PB Order | PB WT | PA WT |
|---|---|---|---|---|---|---|---|---|---|---|
| w1 | 2 | 2 | 0.473 | 0.527 | 3 | 6.4 | 9.4 | 3 | 14.3 | 20.6 |
| w2 | 4 | 6 | 0.85 | 0.15 | 4.2 | 2.1 | 6.3 | 6 | 40.83 | 0 |
| w3 | 1 | 2 | 0.34 | 0.66 | 2.7 | 8.2 | 10.9 | 4 | 23.7 | 30 |
| w4 | 3 | 0 | 0 | 1 | 0 | 7.2 | 7.2 | 1 | 0 | 13.4 |
| w5 | 6 | 1 | 0.23 | 0.77 | 0.2 | 6.9 | 7.1 | 2 | 7.2 | 6.3 |
| w6 | 5 | 3 | 0.1 | 0.9 | 0.03 | 6.2 | 6.23 | 5 | 34.6 | 40.9 |
| Average Wait Time | | | | | | | | | 20.1 | 18.5 |

Table 7.10 Comparative Results of the Proposed and PB Workload Management for Scenario 2

| W | Arrival order | Priority | OLTP % | DSS % | OLTP ET | DSS ET | Total Time | PB Order | PB WT | PA WT |
|---|---|---|---|---|---|---|---|---|---|---|
| w1 | 3 | 2 | 0.473 | 0.527 | 1 | 6.4 | 7.4 | 3 | 14.3 | 18.5 |
| w2 | 6 | 6 | 0.85 | 0.15 | 2 | 2.2 | 4.2 | 6 | 35.63 | 0 |
| w3 | 1 | 2 | 0.34 | 0.66 | 1.6 | 8.2 | 9.8 | 4 | 21.7 | 25.9 |
| w4 | 2 | 0 | 0 | 1 | 0 | 7.2 | 7.2 | 1 | 0 | 11.3 |
| w5 | 5 | 1 | 0.23 | 0.77 | 0.2 | 6.9 | 7.1 | 2 | 7.2 | 4.2 |
| w6 | 4 | 3 | 0.1 | 0.9 | 0.03 | 4.1 | 4.13 | 5 | 31.5 | 35.7 |
| Average Wait Time | | | | | | | | | 18.4 | 15.9 |

Table 7.11 Comparative Results of the Proposed and PB Workload Management for Scenario 3

| W | Arrival order | Priority | OLTP % | DSS % | OLTP ET | DSS ET | Total Time | PB Order | PB WT | PA WT |
|---|---|---|---|---|---|---|---|---|---|---|
| w1 | 1 | 2 | 0.473 | 0.527 | 1 | 6.4 | 7.4 | 3 | 11.6 | 16.5 |
| w2 | 2 | 6 | 0.85 | 0.15 | 2 | 2.9 | 4.9 | 6 | 32.93 | 0 |
| w3 | 5 | 2 | 0.34 | 0.66 | 1.6 | 8.2 | 9.8 | 4 | 19 | 23.9 |
| w4 | 3 | 0 | 0 | 1 | 0 | 6.3 | 6.3 | 1 | 0 | 10.2 |
| w5 | 6 | 1 | 0.23 | 0.77 | 0.2 | 5.1 | 5.3 | 2 | 6.3 | 4.9 |
| w6 | 4 | 3 | 0.1 | 0.9 | 0.03 | 4.1 | 4.13 | 5 | 28.8 | 33.7 |
| Average Wait Time | | | | | | | | | 16.4 | 14.9 |

After the workload execution, it is found that the proposed method again works better than PB workload management, as shown in Tables 7.9, 7.10 and 7.11. In the first scenario, the average waiting time of the proposed approach (18.5) is less than that of PB (20.1), so the proposed approach is better than PB as it saves 8% of the time. In the second scenario, shown in Table 7.10, the average waiting time of the proposed approach (15.9) is less than that of PB (18.4); here it takes 14% less time than PB. In the third scenario, shown in Table 7.11, the average waiting time of the proposed approach is 14.9 while PB takes 16.4, so in this case the proposed method takes 10% less time than PB. Averaged over the three scenarios, our approach saves 10.3% compared to PB. The other important perspective of our research, compared to PB, is the consideration of workload importance.

To illustrate the tables above, these results are shown as graphs in Figure 7.5, which demonstrate the efficiency of our proposed scheduling approach.

Figure 7.5 Waiting Time of the Proposed Vs PB Workload Management

The average waiting times of the above three cases under PB and the proposed approach are given in Table 7.12, and the corresponding graph is shown in Figure 7.6, to illustrate the efficiency of the proposed approach over PB.
Table 7.12 Average Waiting Time of the Proposed and PB Workload Management

| Scenario | PB AWT | PA AWT |
|---|---|---|
| 1 | 20.1 | 18.5 |
| 2 | 18.4 | 15.9 |
| 3 | 16.4 | 14.9 |

Figure 7.6 Average Waiting Time of PB Vs Proposed Approach

Experiment 2

Consider the workloads presented in Table 7.13 to Table 7.15, where six (6) mixed workloads are generated. In this experiment, we have taken the workloads that take 0 – 1500 seconds to complete. After the execution of these workloads, performance behavior is evaluated by observing the workload execution time and waiting time.

Table 7.13 Comparative Results of the Proposed and PB Workload Management for Scenario 1

| W | Arrival order | Priority | OLTP % | DSS % | OLTP ET | DSS ET | Total Time | PB Order | PB WT | PA WT |
|---|---|---|---|---|---|---|---|---|---|---|
| w1 | 2 | 2 | 0.62 | 0.38 | 105.2 | 312.4 | 417.6 | 3 | 1268.16 | 1268.16 |
| w2 | 4 | 6 | 1 | 0 | 24.2 | 0 | 24.2 | 6 | 2769.45 | 0 |
| w3 | 1 | 2 | 0.21 | 0.79 | 30.12 | 750.03 | 780.15 | 4 | 1685.76 | 1709.96 |
| w4 | 3 | 0 | 0 | 1 | 0 | 711.91 | 711.91 | 1 | 0 | 1268.16 |
| w5 | 6 | 1 | 0.55 | 0.45 | 256.13 | 300.1 | 556.25 | 2 | 711.91 | 711.91 |
| w6 | 5 | 3 | 0.54 | 0.46 | 102.34 | 201.2 | 303.54 | 5 | 2465.91 | 2769.45 |
| Average Wait Time | | | | | | | | | 1483.53 | 1287.94 |

Table 7.14 Comparative Results of the Proposed and PB Workload Management for Scenario 2

| W | Arrival order | Priority | OLTP % | DSS % | OLTP ET | DSS ET | Total Time | PB Order | PB WT | PA WT |
|---|---|---|---|---|---|---|---|---|---|---|
| w1 | 3 | 2 | 0.57 | 0.43 | 136.43 | 106.12 | 242.55 | 3 | 1935.99 | 1935.99 |
| w2 | 6 | 6 | 0.92 | 0.08 | 111.4 | 109 | 220.4 | 6 | 8483.26 | 0 |
| w3 | 1 | 2 | 0.002 | 0.998 | 19.6 | 6062 | 6081.6 | 4 | 2178.54 | 2398.94 |
| w4 | 2 | 0 | 0.77 | 0.33 | 610.6 | 492 | 1102.6 | 1 | 0 | 1935.99 |
| w5 | 5 | 1 | 0.68 | 0.32 | 420.2 | 413.19 | 833.39 | 2 | 1102.6 | 1102.6 |
| w6 | 4 | 3 | 0.79 | 0.21 | 96.12 | 127 | 223.12 | 5 | 8260.14 | 8483.26 |
| Average Wait Time | | | | | | | | | 3660.09 | 2642.80 |

Table 7.15 Comparative Results of the Proposed and PB Workload Management for Scenario 3

| W | Arrival order | Priority | OLTP % | DSS % | OLTP ET | DSS ET | Total Time | PB Order | PB WT | PA WT |
|---|---|---|---|---|---|---|---|---|---|---|
| w1 | 1 | 2 | 0.42 | 0.58 | 87.21 | 116.4 | 203.61 | 3 | 236.3 | 236.3 |
| w2 | 2 | 6 | 0.85 | 0.15 | 102 | 32 | 134 | 6 | 685.86 | 0 |
| w3 | 5 | 2 | 0.31 | 0.69 | 61.6 | 58.2 | 119.8 | 4 | 439.91 | 573.91 |
| w4 | 3 | 0 | 0.35 | 0.65 | 76.7 | 76.3 | 153 | 1 | 0 | 236.3 |
| w5 | 6 | 1 | 0.21 | 0.79 | 10.2 | 73.1 | 83.3 | 2 | 153 | 153 |
| w6 | 4 | 3 | 0.19 | 0.91 | 12.05 | 114.1 | 126.15 | 5 | 559.71 | 685.86 |
| Average Wait Time | | | | | | | | | 345.80 | 314.23 |

After the workload execution, it is found that the proposed method again works better than PB workload management, as shown in Tables 7.13, 7.14 and 7.15. In the first scenario, the average waiting time of the proposed approach (1287.94) is less than that of PB (1483.53), so the proposed approach is better than PB as it saves 15% of the time. In the second scenario, shown in Table 7.14, the average waiting time of the proposed approach (2642.80) is less than that of the Priority Based workload management (3660.09); here it takes 39% less time than PB. In the third scenario, shown in Table 7.15, the average waiting time of the proposed approach is 314.23 while PB takes 345.80, so in this case the proposed method takes 10% less time than PB. Averaged over the three scenarios, our approach saves about 21% compared to PB. The other important perspective of our research, compared to PB, is the consideration of workload importance.

To illustrate the tables above, these results are shown as graphs in Figure 7.7, which demonstrate the efficiency of our proposed workload management approach compared to the priority based one.

Figure 7.7 Waiting Time of the Proposed Vs PB Workload Management

The average waiting times of the above three cases under the proposed and priority based approaches are presented in Table 7.16. The corresponding graph is shown in Figure 7.8 to illustrate the efficiency of the proposed approach over PB.
Table 7.16 Average Waiting Time of the Proposed and PB Workload Management

| Scenario | PB AWT | PA AWT |
|---|---|---|
| 1 | 1483.53 | 1287.94 |
| 2 | 3660.09 | 2642.80 |
| 3 | 345.80 | 314.23 |

Figure 7.8 Average Waiting Time of PB Vs Proposed Approach

7.2.3 Proposed Workload Management Vs SJF

Here, the proposed and Shortest Job First (SJF) workload management approaches are compared for the same workloads as discussed previously. After the workload execution, it is found that the average waiting time of the proposed approach is greater than that of SJF, as shown in Tables 7.17, 7.18 and 7.19, because SJF gives higher priority to the smallest workloads. In the first scenario, the average waiting time of the proposed approach (18.5) is greater than that of SJF (15.9), so the proposed approach takes 14% more time than SJF. In the second scenario, shown in Table 7.18, the average waiting time of the proposed approach (15.9) is greater than that of SJF (12.7), i.e. 21% more time. In the third scenario, shown in Table 7.19, the proposed approach takes 14.9 while SJF takes 12, i.e. 20% more time. Averaged over these three scenarios, the proposed approach takes 18.3% more time than SJF. However, SJF does not care about the importance of the workload, which is a very important factor for decision making.

Table 7.17 Comparative Results of the Proposed and SJF Workload Management for Scenario 1

| W | Arrival order | Priority | OLTP % | DSS % | OLTP ET | DSS ET | Total Time | SJF Order | SJF WT | PA Order | PA WT |
|---|---|---|---|---|---|---|---|---|---|---|---|
| w1 | 2 | 2 | 0.473 | 0.527 | 3 | 6.4 | 9.4 | 5 | 26.83 | 4 | 20.6 |
| w2 | 4 | 6 | 0.85 | 0.15 | 4.2 | 2.1 | 6.3 | 2 | 6.23 | 1 | 0 |
| w3 | 1 | 2 | 0.34 | 0.66 | 2.7 | 8.2 | 10.9 | 6 | 36.23 | 5 | 30 |
| w4 | 3 | 0 | 0 | 1 | 0 | 7.2 | 7.2 | 4 | 13.4 | 3 | 13.4 |
| w5 | 6 | 1 | 0.23 | 0.77 | 0.2 | 6.9 | 7.1 | 3 | 12.53 | 2 | 6.3 |
| w6 | 5 | 3 | 0.1 | 0.9 | 0.03 | 6.2 | 6.23 | 1 | 0 | 6 | 40.9 |
| Average Wait Time | | | | | | | | | 15.9 | | 18.5 |

Table 7.18 Comparative Results of the Proposed and SJF Workload Management for Scenario 2

| W | Arrival order | Priority | OLTP % | DSS % | OLTP ET | DSS ET | Total Time | SJF Order | SJF WT | PA Order | PA WT |
|---|---|---|---|---|---|---|---|---|---|---|---|
| w1 | 3 | 2 | 0.473 | 0.527 | 1 | 6.4 | 7.4 | 5 | 22.63 | 4 | 18.5 |
| w2 | 6 | 6 | 0.85 | 0.15 | 2 | 2.2 | 4.2 | 2 | 4.13 | 1 | 0 |
| w3 | 1 | 2 | 0.34 | 0.66 | 1.6 | 8.2 | 9.8 | 6 | 30.03 | 5 | 25.9 |
| w4 | 2 | 0 | 0 | 1 | 0 | 7.2 | 7.2 | 4 | 11.3 | 3 | 11.3 |
| w5 | 5 | 1 | 0.23 | 0.77 | 0.2 | 6.9 | 7.1 | 3 | 8.33 | 2 | 4.2 |
| w6 | 4 | 3 | 0.1 | 0.9 | 0.03 | 4.1 | 4.13 | 1 | 0 | 6 | 35.7 |
| Average Wait Time | | | | | | | | | 12.7 | | 15.9 |

Table 7.19 Comparative Results of the Proposed and SJF Workload Management for Scenario 3

| W | Arrival order | Priority | OLTP % | DSS % | OLTP ET | DSS ET | Total Time | SJF Order | SJF WT | PA Order | PA WT |
|---|---|---|---|---|---|---|---|---|---|---|---|
| w1 | 1 | 2 | 0.473 | 0.527 | 1 | 6.4 | 7.4 | 5 | 20.63 | 4 | 16.5 |
| w2 | 2 | 6 | 0.85 | 0.15 | 2 | 2.9 | 4.9 | 2 | 4.13 | 1 | 0 |
| w3 | 5 | 2 | 0.34 | 0.66 | 1.6 | 8.2 | 9.8 | 6 | 28.03 | 5 | 23.9 |
| w4 | 3 | 0 | 0 | 1 | 0 | 6.3 | 6.3 | 4 | 10.2 | 3 | 10.2 |
| w5 | 6 | 1 | 0.23 | 0.77 | 0.2 | 5.1 | 5.3 | 3 | 9.03 | 2 | 4.9 |
| w6 | 4 | 3 | 0.1 | 0.9 | 0.03 | 4.1 | 4.13 | 1 | 0 | 6 | 33.7 |
| Average Wait Time | | | | | | | | | 12.0 | | 14.9 |

To illustrate the tables above, the results of both scheduling schemes are shown as graphs in Figure 7.9. These graphs show that the proposed approach is less efficient than SJF, but only because SJF considers the size of the workload and not its importance. For this reason, SJF is not suitable for the DBMS environment, as otherwise higher management would have to wait a long time for decision making.

Figure 7.9 Waiting Time of the Proposed Approach Vs SJF Workload Management

The average waiting times of the above three cases under SJF and the proposed approach are given in Table 7.20, and the corresponding graph is shown in Figure 7.10.
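The trade-off described above can be reproduced from the scenario 1 durations: SJF yields a lower average wait, but the most important workload (w4, priority 0) waits longer under SJF than under the proposed ordering. A sketch follows, recomputing the waits from the Table 7.17 durations (small differences from the printed per-workload values are possible):

```python
# Total Times and the proposed (PA) execution order from Table 7.17
jobs = {"w1": 9.4, "w2": 6.3, "w3": 10.9, "w4": 7.2, "w5": 7.1, "w6": 6.23}

def waits(order):
    """Waiting time of each job when executed back-to-back in the given order."""
    out, clock = {}, 0.0
    for name in order:
        out[name] = clock
        clock += jobs[name]
    return out

sjf_waits = waits(sorted(jobs, key=jobs.get))            # shortest job first
pa_waits = waits(["w2", "w5", "w4", "w1", "w3", "w6"])   # PA order from Table 7.17

def avg(d):
    return sum(d.values()) / len(d)
```

`avg(sjf_waits)` comes out below `avg(pa_waits)`, yet `sjf_waits["w4"]` exceeds `pa_waits["w4"]`, which is exactly the importance-versus-throughput trade-off the text describes.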
Table 7.20 Average Waiting Time of the Proposed and SJF Workload Management

| Scenario | SJF AWT | PA AWT |
|---|---|---|
| 1 | 15.9 | 18.5 |
| 2 | 12.7 | 15.9 |
| 3 | 12 | 14.9 |

Figure 7.10 Average Waiting Time of SJF Vs Proposed Approach

Summary

This chapter discussed the results achieved after implementing the proposed and other well-known workload management approaches. The objective of the chapter is to validate the proposed research by comparing it with other well-known approaches. The experimental environment was described, and the wait times of the proposed and other workload management techniques were calculated to show the effectiveness of the proposed research. In order to show this effectiveness, different workload scenarios were used in these experiments (their execution times are in the range of 0 to 30 minutes).

Conclusion

This research proposed a solution for managing the workload in DBMSs that makes decisions about workload execution by identifying the OLTP and DSS percentages. The workload type is identified through the CBR approach, while the OLTP and DSS percentages are determined using Fuzzy Logic. The workload parameters that are most effective for workload characterization and classification are identified. After the identification of the workload type, we calculate the percentages of OLTP and DSS. Once these percentages are known, the entire workload is arranged for better execution. The results achieved after experimenting with OLTP- and DSS-like workloads reveal that the proposed characterization and scheduling techniques are effective and less time consuming, while taking workload size and importance into consideration.
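As an illustration of the fuzzy-logic step, the sketch below estimates a DSS share from a single input feature using triangular membership functions and a weighted-average (Sugeno-style) defuzzification. The input feature, set boundaries and rule outputs are all hypothetical stand-ins, not the thesis's actual rule base:

```python
def tri(x, a, b, c):
    """Triangular membership function: rises from a to b, falls from b to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def dss_share(feature):
    """Fuzzify one workload feature (0..1), apply three rules, defuzzify."""
    low = tri(feature, -0.5, 0.0, 0.5)    # rule: low feature  -> mostly OLTP
    med = tri(feature, 0.0, 0.5, 1.0)     # rule: medium       -> mixed workload
    high = tri(feature, 0.5, 1.0, 1.5)    # rule: high feature -> mostly DSS
    # Sugeno-style defuzzification: weighted average of the rule outputs
    num = low * 0.1 + med * 0.5 + high * 0.9
    den = low + med + high
    return num / den if den else 0.5
```

A larger feature value yields a larger estimated DSS share, and a mid-range input defuzzifies to an even 0.5 split.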
The study started with an introduction and background covering workload management, AI techniques, autonomic computing, the problem statement, research questions, research objectives, applications and contributions. Existing database workload management work related to workload classification, characterization and scheduling was discussed; this discussion also highlighted the problems that exist in previous work. After the discussion of the workload background and related work, the methodology, i.e. the main steps of the proposed research, was explained together with the experimental setup. The proposed solution consists of the characterizer, the scheduler and the idleness detector. Workload characterization is achieved by executing OLTP- and DSS-like workloads; the values of the related parameters are captured, and classification is performed through CBR. However, a problem with CBR-based characterization is that it cannot by itself be used for workload management, because the workload is classified during or after its complete execution. So, a solution is needed that can classify the workload before its execution by finding the OLTP and DSS percentage of each workload. The percentages of OLTP and DSS in a workload are calculated using Fuzzy Logic, which consists of the fuzzification, fuzzy rules and defuzzification steps. Scheduling is performed on the basis of workload importance and the percentages of DSS and OLTP. Finally, the CPU idleness of the incoming workload is identified by searching for the exact or nearest (almost equal) workload in the CBR case base. After getting the exact or nearest match, the idleness of that workload is retrieved, and that period is used for defragmentation or any other maintenance process. The proposed approach is validated through analyses and experiments.
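The nearest-workload lookup described above can be sketched as a Euclidean-distance retrieval over the case base. The parameter vectors and idle-period values below are hypothetical placeholders for the thesis's actual case attributes:

```python
import math

def distance(a, b):
    """Euclidean distance between two workload parameter vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical case base: each case stores a workload parameter vector
# (e.g. OLTP share, DSS share, total time) and its observed CPU idle period (s)
case_base = [
    {"params": (0.47, 0.53, 9.4), "idle": 1.2},
    {"params": (0.85, 0.15, 6.3), "idle": 0.4},
    {"params": (0.00, 1.00, 7.2), "idle": 2.1},
]

def nearest_case(query):
    """Retrieve the stored case nearest to the incoming workload's parameters."""
    return min(case_base, key=lambda case: distance(case["params"], query))
```

The retrieved case's idle period is then reused as the idleness estimate for the incoming workload, e.g. for scheduling defragmentation.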
The effectiveness of the proposed research is experimentally evaluated, and its efficiency in terms of waiting time is compared with other workload management techniques. The chapter also provides a comparison with other well-known workload management schemes: the results obtained from the approach are compared with FIFO, Priority Based (PB) and Shortest Job First (SJF) workload management. All the results are shown in tables and graphs, which reveal that the proposed approach works better in all the scenarios.

Future Work

In the future, this type of research can be performed on other DBMSs, which will be helpful in improving those DBMSs. In the current research, human intervention in previous workload management is identified and reported; in the future this human intervention can be removed and the autonomic level of DBMSs can be improved. The proposed research is also a basis for improving the workload management task in data warehouses. For this purpose, intelligent algorithms, techniques and models can be introduced to minimize human intervention.

References

Chandramouli B, Bond CN, Babu S, Yang J (2007). On Suspending and Resuming Dataflows. ICDE, pp. 1289–1291.
Chandramouli B, Bond CN, Babu S, Yang J (2007). Query Suspend and Resume. Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 557–568.
Dageville B, Das D, Dias K, Yagoub K, Zait M, Ziauddin M (2004). Automatic SQL Tuning in Oracle 10g. Proceedings of the 30th International VLDB Conference, pp. 1098–1109.
Database Test 2 (2011), Database Test Suite.
http://osdldbt.sourceforge.net/
Deza MM, Deza E (2009). Encyclopedia of Distances. Springer. ISBN 978-3-642-00233-5 (hbk), ISBN 978-3-642-00234-2 (ebook).
Danielsson P (1980). Euclidean Distance Mapping. Computer Graphics and Image Processing, pp. 227–248.
Elnaffar S, Martin P, Horman R (2002). Automatically Classifying Database Workloads. Proceedings of CIKM '02, Washington, DC.
Elnaffar S, Martin P (2004). An Intelligent Framework for Predicting Shifts in the Workloads of Autonomic Database Management Systems. Proceedings of the IEEE International Conference on Advances in Intelligent Systems.
Holze M, Ritter N (2008). Autonomic Databases: Detection of Workload Shifts with n-Gram-Models. LNCS, Vol. 5207, Springer, Heidelberg, pp. 127–142.
Horn P (2001). Autonomic Computing: IBM's Perspective on the State of Information Technology. IBM.
Huebscher MC, McCann JA (2008). A Survey of Autonomic Computing: Degrees, Models and Applications. ACM 0360-0300/2008/08-ART7.
Hopfield J (1982). Neural Networks and Physical Systems with Emergent Collective Computational Abilities. Proceedings of the National Academy of Sciences, Vol. 79, pp. 2554–2558, USA.
IBM Corporation (2003). DB2 Query Patroller Guide: Installation, Administration, and Usage.
IDC Competitive Analysis: Worldwide RDBMS 2005 Vendor Shares: Preliminary Results for the Top 5 Vendors Show Continued Growth. http://www.oracle.com/corporate/analyst/reports/infrastructure/dbms/idc-201692.pdf
Krompass S, Scholz A, Albutiu MC, Kuno H, Wiener J, Dayal U, Kemper A (2008). Quality of Service Enabled Management of Database Workload. IEEE Data Engineering Bulletin (DEBU), Vol. 31(1), pp. 20–27.
Krompass S, Kuno H, Dayal U, Kemper A (2007). Dynamic Workload Management for Very Large Data Warehouses: Juggling Feathers and Bowling Balls. Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB).
Krompass S, Kuno H, Wiener J, Wilkinson K, Dayal U, Kemper A (2009). Managing Long-Running Queries. Proceedings of the 12th International Conference on Extending Database Technology (EDBT).
Lightstone S, Lohman G, Zilio D (2002). Toward Autonomic Computing with DB2 Universal Database. SIGMOD Record, Vol. 31(3), pp. 55–61.
Liu F, Zhao Y, Wang W, Makaroff D (2004). Database Server Workload Characterization in an E-Commerce Environment. Proceedings of the IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS), pp. 475–483.
Mateen A, Raza B, Hussain T, Awais MM (2008). Autonomic Computing in SQL Server. 7th International Conference on Computer and Information Science, pp. 113–118.
Mehta A, Gupta C, Wang S, Dayal U (2009). rFEED: A Mixed Workload Scheduler for Enterprise Data Warehouses. ICDE, pp. 1455–1458.
Mehta A, Gupta C, Dayal U (2008). BI Batch Manager: A System for Managing Batch Workloads on Enterprise Data Warehouses. International Conference on Extending Database Technology (EDBT), pp. 640–651.
Mehta A, Gupta C, Wang S, Dayal U (2008). Automated Workload Management for Enterprise Data Warehouses. Bulletin of the IEEE Computer Society Technical Committee on Data Engineering.
Menasce DA, Almeida VAF, Fonseca R, Mendes MA (1999). A Methodology for Workload Characterization of E-Commerce Sites. Proceedings of the 1st ACM Conference on Electronic Commerce, pp. 119–128.
Ahmad M, Aboulnaga A, Babu S (2008). Modeling and Exploiting Query Interactions in Database Systems. CIKM, pp. 183–192.
Ahmad M, Aboulnaga A, Babu S, Munagala K (2008). QShuffler: Getting the Query Mix Right. ICDE, pp. 1415–1417.
Ahmad M, Aboulnaga A, Babu S (2009). Query Interactions in Database Workloads. DBTest.
Niu B, Martin P, Powley W, Horman R, Bird P (2006). Workload Adaptation in Autonomic DBMSs. Proceedings of the Centre for Advanced Studies on Collaborative Research Conference (CASCON '06). ACM Press.
Niu B (2008). Workload Adaptation in Autonomic Database Management Systems. PhD Thesis, Queen's University, Canada.
Niu B, Martin P, Powley W, Bird P (2007). Quantifying Workload Importance. 4th International Workshop on Engineering Autonomic Software Systems, Canada.
Niu B, Martin P, Powley W, Bird P, Horman R (2007). Adapting Mixed Workloads to Meet SLOs in Autonomic DBMSs. Proceedings of the 2nd International Workshop on Self-Managing Database Systems (SMDB), in conjunction with the 23rd International Conference on Data Engineering (ICDE), Istanbul, Turkey.
Pang HH, Carey MJ, Livny M (1995). Multiclass Query Scheduling in Real-Time Database Systems. IEEE Transactions on Knowledge and Data Engineering, Vol. 7(4), pp. 533–551.
Parashar M, Hariri S (2005). Autonomic Computing: An Overview. Springer-Verlag, Berlin Heidelberg, pp. 247–259.
Rhee A, Chatterjee S, Lahiri T (2001). The Oracle Database Resource Manager: Scheduling CPU Resources at the Application Level. High Performance Transaction Systems Workshop.
Russell S, Norvig P (2008). Artificial Intelligence: A Modern Approach, 2nd Edition. ISBN 81-7758-367-0.
Schroeder B, Harchol-Balter M, Iyengar A, Nahum EM, Wierman A (2006). How to Determine a Good Multi-Programming Level for External Scheduling. ICDE, pp. 60–71.
Schank R (1982). Dynamic Memory: A Theory of Learning in Computers and People. New York: Cambridge University Press.
Chaudhuri S, Kaushik R, Ramamurthy R (2005). When Can We Trust Progress Estimators for SQL Queries? Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 575–586.
Chaudhuri S, Kaushik R, Ramamurthy R, Pol A (2007). Stop-and-Restart Style Execution for Long Running Decision Support Queries. VLDB, pp. 735–745.
TPC Benchmark (2011), TPC-H Standard Specification, Revision 2.6.2. http://www.tpc.org
Wasserman TJ, Martin P, Skillicorn DB, Rizvi H (2004).
Developing a Characterization of Business Intelligence Workloads for Sizing New Database Systems. In Proceedings of the 7th ACM International Workshop on Data Warehousing and OLAP, ACM Press, pp. 7–13. White S R, Hanson J E, Whalley I, Chess D M, Kephart J O (2004). An Architectural Approach to Autonomic Computing. Proceeding of the International Conference on Volume, Autonomic Computing, pp. 2–9. Yu PS, Chen M, Heiss H, Lee S (1992). On Workload Characterization of Relational Database Environments. Software Engineering. Vol. 18(4), pp. 347–355. Zadeh L, (1965). Fuzzy Sets and Systems, Fox Journal, editor. System Theory, Brooklyn, NY: Polytechnic Press, pp. 29–39. Workload Management through Characterization and Idleness Detction 123 References Zewdu Z, Denko M K, Libsie M (2009). Workload Characterization of Autonomic DBMSs Using Statistical and Data Mining Techniques. AIN, pp. 244–249. Workload Management through Characterization and Idleness Detction 124 References Appendix Database Structure Workload Management through Characterization and Idleness Detction 125 Appendix – Database Structure Database and Relevant Table Creation Database Creation CREATE DATABASE Workload_Database; use Workload_Database; Tables Creation create table nation ( n_nationkey decimal(3,0) not null, n_name char(25) not null, n_regionkey decimal(2,0) not null, n_comment varchar(152) ); create table region ( r_regionkey decimal(2,0) not null, r_name char(25) not null, r_comment varchar(152) ); create table part ( p_partkey decimal(10,0) not null, p_name varchar(55) not null, p_mfgr char(25) not null, p_brand char(10) not null, p_type varchar(25) not null, p_size decimal(2,0) not null, p_container char(10) not null, p_retailprice decimal(6,2) not null, p_comment varchar(23) not null ); create table supplier ( s_suppkey decimal(8,0) not null, s_name char(25) not null, s_address varchar(40) not null, s_nationkey decimal(3,0) not null, s_phone char(15) not null, s_acctbal decimal(7,2) not 
null,
    s_comment varchar(101) not null
);

create table partsupp (
    ps_partkey decimal(10,0) not null,
    ps_suppkey decimal(8,0) not null,
    ps_availqty decimal(5,0) not null,
    ps_supplycost decimal(6,2) not null,
    ps_comment varchar(199) not null
);

create table customer (
    c_custkey decimal(9,0) not null,
    c_name varchar(25) not null,
    c_address varchar(40) not null,
    c_nationkey decimal(3,0) not null,
    c_phone char(15) not null,
    c_acctbal decimal(7,2) not null,
    c_mktsegment char(10) not null,
    c_comment varchar(117) not null
);

create table orders (
    o_orderkey decimal(12,0) not null,
    o_custkey decimal(9,0) not null,
    o_orderstatus char(1) not null,
    o_totalprice decimal(8,2) not null,
    o_orderdate date not null,
    o_orderpriority char(15) not null,
    o_clerk char(15) not null,
    o_shippriority decimal(1,0) not null,
    o_comment varchar(79) not null
);

create table lineitem (
    l_orderkey decimal(12,0) not null,
    l_partkey decimal(10,0) not null,
    l_suppkey decimal(8,0) not null,
    l_linenumber decimal(1,0) not null,
    l_quantity decimal(2,0) not null,
    l_extendedprice decimal(8,2) not null,
    l_discount decimal(3,2) not null,
    l_tax decimal(3,2) not null,
    l_returnflag char(1) not null,
    l_linestatus char(1) not null,
    l_shipdate date not null,
    l_commitdate date not null,
    l_receiptdate date not null,
    l_shipinstruct char(25) not null,
    l_shipmode char(10) not null,
    l_comment varchar(44) not null
);

Load Data to Tables

LOAD DATA INFILE "D://implementation//DBGen//region.tbl" INTO TABLE region
    FIELDS TERMINATED BY "|" LINES TERMINATED BY "\r\n"
    (r_regionkey, r_name, r_comment);

LOAD DATA INFILE "D://implementation//DBGen//nation.tbl" INTO TABLE nation
    FIELDS TERMINATED BY '|' LINES TERMINATED BY '\r\n';

LOAD DATA INFILE "D://implementation//DBGen//supplier.tbl" INTO TABLE supplier
    FIELDS TERMINATED BY '|' LINES TERMINATED BY '\r\n';

LOAD DATA
INFILE "D://implementation//DBGen//part.tbl" INTO TABLE part
    FIELDS TERMINATED BY '|' LINES TERMINATED BY '\r\n';

LOAD DATA INFILE "D://implementation//DBGen//partsupp.tbl" INTO TABLE partsupp
    FIELDS TERMINATED BY '|' LINES TERMINATED BY '\r\n';

LOAD DATA INFILE "D://implementation//DBGen//customer.tbl" INTO TABLE customer
    FIELDS TERMINATED BY '|' LINES TERMINATED BY '\r\n';

LOAD DATA INFILE "D://implementation//DBGen//orders.tbl" INTO TABLE orders
    FIELDS TERMINATED BY '|' LINES TERMINATED BY '\r\n';

LOAD DATA INFILE "D://implementation//DBGen//lineitem.tbl" INTO TABLE lineitem
    FIELDS TERMINATED BY '|' LINES TERMINATED BY '\r\n';

Creation of Primary Keys

ALTER TABLE region ADD CONSTRAINT pkey_region PRIMARY KEY(r_regionkey);
ALTER TABLE nation ADD CONSTRAINT pkey_nation PRIMARY KEY(n_nationkey);
ALTER TABLE part ADD CONSTRAINT pkey_part PRIMARY KEY(p_partkey);
ALTER TABLE supplier ADD CONSTRAINT pkey_supplier PRIMARY KEY(s_suppkey);
ALTER TABLE partsupp ADD CONSTRAINT pkey_partsupp PRIMARY KEY(ps_partkey, ps_suppkey);
ALTER TABLE customer ADD CONSTRAINT pkey_customer PRIMARY KEY(c_custkey);
ALTER TABLE lineitem ADD CONSTRAINT pkey_lineitem PRIMARY KEY(l_orderkey, l_linenumber);
ALTER TABLE orders ADD CONSTRAINT pkey_orders PRIMARY KEY(o_orderkey);

Creation of Indexes

create index fkey_nation_1 on nation(n_regionkey);
create index fkey_supplier_1 on supplier(s_nationkey);
create index fkey_customer_1 on customer(c_nationkey);
create index fkey_partsupp_1 on partsupp(ps_suppkey);
create index fkey_partsupp_2 on partsupp(ps_partkey);
create index fkey_orders_1 on orders(o_custkey);
create index fkey_lineitem_1 on lineitem(l_orderkey);
create index fkey_lineitem_2 on lineitem(l_partkey, l_suppkey);
create index fkey_lineitem_3 on lineitem(l_suppkey);
create index xxx1 on lineitem(l_shipdate);
create index xxx2 on customer(c_mktsegment);
create index xxx3 on orders(o_orderdate);
create
index xxx4 on region(r_name);
create index xxx5 on lineitem(l_discount);
create index xxx6 on lineitem(l_quantity);
create index xxx7 on lineitem(l_returnflag);
create index xxx8 on lineitem(l_shipmode);
create index xxx9 on lineitem(l_commitdate);
create index xxx10 on lineitem(l_receiptdate);
create index xxx11 on lineitem(l_partkey);
create index xxx12 on part(p_size);
create index xxx13 on part(p_type);
create index xxx14 on partsupp(ps_supplycost);
create index xxx15 on nation(n_name);
create index xxx16 on part(p_name);
create index xxx17 on orders(o_clerk);
create index xxx18 on part(p_brand);
create index xxx19 on part(p_container);

Table Structure

After the execution of the above statements, the tables CUSTOMER, LINEITEM, NATION, ORDERS, PART, PARTSUPP, REGION and SUPPLIER are created with their associated primary and secondary keys. (The original appendix shows a structure listing for each of these tables.)
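The load performed above by MySQL's LOAD DATA INFILE can be sketched portably for readers without a MySQL installation. The snippet below is an illustrative sketch only: it uses SQLite in place of MySQL, and the two sample rows are invented stand-ins for dbgen output (TPC-H .tbl files end each record with a trailing '|', which is why the final delimiter is stripped before splitting).

```python
import sqlite3

# Sketch: loading TPC-H style pipe-delimited ".tbl" records, mirroring
# the LOAD DATA INFILE statements above. SQLite used for illustration;
# the sample rows below are invented, not real dbgen output.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE region (
    r_regionkey INTEGER NOT NULL PRIMARY KEY,
    r_name      CHAR(25) NOT NULL,
    r_comment   VARCHAR(152))""")

# dbgen emits one record per line; fields are both separated and
# terminated by '|', so the trailing delimiter must be dropped.
sample_lines = [
    "0|AFRICA|lar deposits blithely|",
    "1|AMERICA|hs use ironic requests|",
]
rows = [line.rstrip("\r\n").rstrip("|").split("|") for line in sample_lines]
conn.executemany("INSERT INTO region VALUES (?, ?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM region").fetchone()[0]
print(count)  # → 2
```

In a full reconstruction the same parse-and-insert loop would be applied to each of the eight .tbl files, mapping each file to its table and column list.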