Workload Management through Characterization and Idleness Detection A Thesis Presented to Department of Computer Science Faculty of Basic & Applied Sciences In Partial Fulfillment of the requirement for the degree Of PhD (Computer Science) By ABDUL MATEEN (41-FBAS/PHDCS/F08) International Islamic University Islamabad (2012) International Islamic University, Islamabad Faculty of Basic & Applied Sciences Department of Computer Science Dated: FINAL APPROVAL It is certified that we have read the thesis, entitled “Workload Management through Characterization and Idleness Detection” submitted by Abdul Mateen, Reg. No. 41FBAS/PHDCS/F08. It is our judgment that this thesis is of sufficient standard to warrant its acceptance by the International Islamic University Islamabad for PhD Degree in Computer Science. PROJECT EVALUATION COMMITTEE External Examiner: _______________________ ------------------------------------------------------------------Internal Examiner: --------------, ----------------, Department of Computer Science, Faculty of Basic and Applied Sciences, International Islamic University Islamabad Supervisor: Dr. Muhammad Sher Chairman, Professor Computer Science & Software Engineering Department International Islamic University Islamabad Pakistan Co-Supervisor: Dr. Mian Muhammad Awais Associate Professor School of Science & Engineering Lahore University of Management Sciences (LUMS) Islamabad Pakistan Workload Management through Characterization and Idleness Detection ii Acknowledgement: I would like to acknowledge many people who have supported, guided and advised me during the PhD thesis. First, I would like to express my sincere gratitude to my supervisor Dr. Muhammad Sher, Professor, Department of Computer Science & Software Engineering, International Islamic University and co-supervisor Dr. Mian Muhammad Awais, Associate Professor, University Management Sciences (LUMS), Lahore for providing me an opportunity to start this study and research. Through the course of my study, I have had the great fortune to get to know and interact with them. Their valuable comments, suggestions and motivations for further development as well as her assistance during writing PhD thesis are invaluable to me. They always guided whenever I stuck during the study and encouraged me during the whole period. I am also thankful to Professor Dr. Said Elnaffar, Associate Professor, UAE University and Dr Norwati Mustapha, Associate Professor, Faculty of Computer Science & Information Technology, University Putra Malaysia (UPM), Malaysia for their useful suggestions and great ideas that helped me to move in the right direction. For the all others who have directly or indirectly helped me in the completion of my work, I thank you all. Workload Management through Characterization and Idleness Detection iii Declaration: I hereby declare and affirm that this thesis neither as a whole no as part thereof has been copied out from any source. It is further declared that I have completed this thesis entirely on the basis of my personal effort, made under the sincere guidance of my supervisors. If any part of this report is proven to be copied our or found to be a reproduction of some other, we shall stand by the consequences. No portion of the work presented in this report has been submitted in support of an application for other degree or qualification of this or any other University or Institute of learning. Abdul Mateen 41-FBAS/PHDCS/F08 Workload Management through Characterization and Idleness Detection iv Dedication: This work is dedicated to my parents and family who always encourage, support and guide me to achieve this milestone. Abdul Mateen Workload Management through Characterization and Idleness Detection v Abstract Database Management Systems (DBMSs) are used as a data source with financial, educational, telecomm and web applications. Number of users all the time connected with the DBMS to update the existing records and taking reports by executing complex workloads or queries. Due to this reason the workload in DBMSs fluctuates all the time and sometime this fluctuation is sudden and difficult to handle. In order to get the sufficient level of performance, arrangement of workload is necessary especially when there are huge and complex workloads. Rapid growth in data, maximum functionality and changing behavior tends the database workload to be more complex and tricky. Every organization observes complex types of database workloads that are difficult to manage by the humans; human experts take much time to get sufficient experience to manage workload efficiently; and even in some cases this task becomes impossible and leads towards malnourishment. The versatility in database workload leads database practitioners, vendors and researchers towards the new challenges. Efficient execution and resource allocation for the workload is dependent on the workload type that may be either On Line Transaction Processing (OLTP) or Decision Support System (DSS). OLTP workload consists of small transactions and accesses a small amount of data. On the other hand DSS workload comprises of queries that are long, highly intensive, need many resources and processing. To achieve a satisfactory level of performance, either Database Administrator (DBA) or DBMSs must have the knowledge about the workload shifts. We are introducing a way to manage the workload in DBMSs on the basis of the workload type. The research presents a novel approach for the database workload management through the Case Base Reasoning Characterization and classification approach, impact based Fuzzy Logic Scheduling and CPU Idleness Detection. The proposed approach is validated through the experiments which are performed on the real time and benchmark workload to reveal the effectiveness and efficiency. The ultimate goal of the research is to manage the workload in DBMSs through Characterization, Scheduler and Idleness Detection. Achievements and Contribution Journal Publications 1. Abdul Mateen, Basit Raza, Mian M. Awais, Muhammad Sher, Norwatti Mustapha, Workload Management: A Technology Perspective with respect to Self-* Characteristics, Artificial Intelligence Review (Impact Factor = 1.213), 2012. 2. Basit Raza, Abdul Mateen, Mian M. Awais, Muhammad Sher, A Survey on Autonomic Workload Management: Algorithms, Techniques and Models, Journal of Computing (Impact Factor 0.2), 2011, Volume 3(6), pp. 29-38, June 2011. 3. Abdul Mateen, Basit Raza, Mian Muhammad Awais, Muhammad Sher, Selection of the Best DBMS: A Fuzzy based Multi-objective Decision Making Approach, Lecture Notes in Engineering and Computer Science (ISI Indexed), 2011, Volume 2188 (1), Pages/record No : 287-292 4. Basit Raza, Abdul Mateen, Muhammad Sher, and Mian Muhammad Awais, SelfPrediction of Performance Metrics for the Database Management System Workload, International Journal of Computer Theory and Engineering, INSPEC, Vol. 4, No. 2, 2012. Conference Publications 1. Abdul Mateen, Basit Raza, Muhammad Sher, Mian M. Awais, “Database Workload Management – Characterization to Idleness Detection”, International Conference on Computing and Information Technology Al-Madinah, Saudi Arabia, March 2012. 2. Abdul Mateen, Basit Raza, Mian M. Awais, Muhammad Sher “Selection of the Best DBMS: A Fuzzy Based Multi-Objective Decision Making Approach” IAENG International Conference on Computer Science, Hong Kong, 16-18 March, 2011. 3. Abdul Mateen, Basit Raza, Muhammad Sher, Mian M. Awais, T. Hussain, “Evolution of Autonomic Database Management Systems”, The 2nd International Conference on Computer and Automation Engineering, Singapore, 26-28 Feb 2010. 4. Basit Raza, Abdul Mateen, Muhammad Sher, Mian M. Awais, “Self-Prediction of Performance Metrics for the Database Management System Workload” 2011 International Conference on Information and Computer Applications (ICICA 2011) 18-20 March, Dubai, UAE, 2011. 5. Abdul Mateen, Basit Raza, Tauqeer Hussain, Mian M. Awais, “Autonomicity in Universal Database DB2”, 8th IEEE International Conference on Computer and Information Science (ICIS 09), Shanghai, China, Pp. 445-450, June 1-3 2009. 6. Basit Raza, Abdul Mateen, Muhammad Sher, Mian Muhammad Awais, Tauqeer Hussain, “Autonomic View of Query Optimizers in Database Management Systems”, Software Engineering, Software Engineering Research, Management & Applications, Canada, 2010. 7. B. Raza, A. Mateen, M. Sher, M.M. Awais, and T. Hussain, “Autonomicity in Oracle Databases Management System”, International Conference on Data Storage and Data Engineering (DSDE 2010), India. 2010. 8. B. Raza, A. Mateen, T. Hussain, M. M. Awais, “Autonomic Success in Databases Management Systems”, 8th IEEE International Conference on Computer and Information Science (ICIS 09), Shanghai, China, Pp. 439-444, June 1-3 2009. Workload Management through Characterization and Idleness Detection viii Table of Contents Chapter 1: Introduction & Background ..................................................................................... 1 1.1 Introduction ............................................................................................................................... 2 1.2 What is Workload? ................................................................................................................... 3 1.3 Workload Evolution .................................................................................................................. 4 1.4 Workload Management Methods.............................................................................................. 5 1.5 Artificial Intelligence (AI) Techniques..................................................................................... 7 1.6 Autonomic Computing ........................................................................................................... 11 1.7 Autonomic Workload Management........................................................................................ 13 1.8 Problem Statement .................................................................................................................. 16 1.9 Research Questions ................................................................................................................. 17 Chapter 2: Related Work .............................................................................................................. 24 2.1 State of the Art in Workload Management ............................................................................. 25 2.1.1 Workload Characterization Techniques............................................................................... 25 2.1.2 Scheduling Techniques ........................................................................................................ 29 2.1.3 Limitations in Previous Research Work .............................................................................. 39 Chapter 3: ReSearch Methodology ............................................................................................... 41 3.1 Research Problem Overview................................................................................................... 42 3.2 Proposed Solution ................................................................................................................... 43 3.2.1 Characterization ................................................................................................................... 43 3.2.2 Scheduler.............................................................................................................................. 45 3.2.3 Idleness Detection & Exploitation ....................................................................................... 46 3.3 Experimental Setup & Results Validation .............................................................................. 47 3.3.1 Experimental Setup .............................................................................................................. 47 3.3.2 Database Management System ............................................................................................ 48 3.3.3 Proposed Technique Scenario .............................................................................................. 48 3.3.4 Evaluation Metrics ............................................................................................................... 49 Summary ....................................................................................................................................... 50 Chapter 4: Workload Characterization ......................................................................................... 51 4.1 Introduction and Background ................................................................................................. 52 4.2 Problems in Previous Workload Management Techniques .................................................... 53 4.3 Proposed Workload Management Solution ............................................................................ 54 4.3.1 Workload Characterization .................................................................................................. 54 4.3.1.1 Key Attributes Selection ................................................................................................... 55 4.3.1.2 CBR Development ............................................................................................................ 58 4.3.2 Detection of the DB Workload ............................................................................................ 65 4.4 Experimental Setup & Results Validation .............................................................................. 65 4.4.1 Workload Editor................................................................................................................... 66 4.4.2 Workload Database .............................................................................................................. 66 4.4.4 Results Discussion ............................................................................................................... 79 Summary ....................................................................................................................................... 80 Chapter 5: Fuzzy Logic Based Scheduling .................................................................................. 81 5.1 Introduction and Background ................................................................................................. 82 5.1.1 Scheduling............................................................................................................................ 82 5.1.1 Fuzzy Logic ......................................................................................................................... 83 5.2 Fuzzy Based Scheduler (FBS) ................................................................................................ 84 5.1.1 Fuzzification ........................................................................................................................ 85 5.1.2 Fuzzification Rules .............................................................................................................. 87 5.1.3 Defuzzification..................................................................................................................... 89 Summary ....................................................................................................................................... 90 Chapter 6: Idleness Detection ....................................................................................................... 91 6.1 CBR Development for Idleness Detection .............................................................................. 92 6.1.1 Retreive ................................................................................................................................ 92 6.1.2 Reuse .................................................................................................................................... 95 6.1.3 Revise ................................................................................................................................... 95 6.1.4 Retain ................................................................................................................................... 96 Summary ....................................................................................................................................... 96 Chapter 7: Results and Discussion ............................................................................................... 97 7.1 Experimental Setup ................................................................................................................. 98 7.2 Results and Discussion ........................................................................................................... 99 7.2.1 Proposed Vs FIFO Based Workload Management ............................................................ 100 7.2.2 Proposed Vs Priority Based Workload Management ........................................................ 106 7.2.3 Proposed Workload Management Vs SJF ......................................................................... 113 Summary ..................................................................................................................................... 116 Conclusion ................................................................................................................................. 117 Future Work ................................................................................................................................ 119 Refrences .................................................................................................................................... 120 Appendix – Database Structure .................................................................................................. 125 List of Figures Figure 1.1 Autonomic Computing Elements …………………………………………. 12 Figure 1.2 Workload Management Activities ………………………………………... 14 Figure 1.3 Autonomic Workload Manager …………………………………………... 15 Figure 2.1 Query Shuffler (Mumtaz et al., 2009) …………………………………….. 31 Figure 3.1 Components of Workload Management Framework ……………………... 43 Figure 3.2 Steps of Characterization Module ………………………………………… 45 Figure 3.3 Experimental Setup for the Proposed Solution …………………………… 48 Figure 4.1 Steps to Characterize the DBMS/DW Workload …………………………. 55 Figure 4.2 OLTP and DSS workload execution with respect to Key_writes ………… 57 Figure 4.3 OLTP and DSS workload execution w.r.t Key_reads ……………………. 57 Figure 4.4 OLTP and DSS workload execution w.r.t Table_lock_immediate ……….. 58 Figure 4.5 Steps to Characterize and Classify the Workload via CBR ………………. 60 Figure 6.1 Steps to Identify the CPU Idleness ……………………………………….. 93 Figure 7.1 Waiting Time of the Proposed Vs FIFO Workload Management ………... 102 Figure 7.2 Average Waiting Time of the Proposed Vs FIFO Workload Management . 103 Waiting Time of the Proposed Vs FIFO Workload Management ………... 105 Figure 7.3 Figure 7.4 Figure 7.5 Average Waiting Time of the Proposed Vs FIFO Workload Management . 106 Waiting Time of the Proposed Vs PB Workload Management …………... 108 Figure 7.6 Average Waiting Time of PB Vs Proposed Approach …………………… 109 Figure 7.7 Waiting Time of the Proposed Vs PB Workload Management …………... 111 Figure 7.8 Average Waiting Time of PB Vs Proposed Approach …………………… 112 Figure 7.9 Waiting Time of the Proposed Approach Vs SJF Workload Management . 114 Figure 7.10 Average Waiting Time of SJF Vs Proposed Approach …………………... 115 Workload Management through Characterization and Idleness Detection xii List of Tables Table 2.1 Analysis of the Workload Characterization Techniques in DBMSs 28 Table 2.2 Analysis of the Workload Management Techniques in DBMSs 38 Table 4.1 CBR with Sample Data 59 Table 4.2 Euclidean Difference Calculation for Incoming Workload 61 Table 4.3 Euclidean Difference Calculation for Incoming Workload (Example 2) 62 Table 4.4 CBR Data with Hash Values 63 Table 5.1 The Workload Involving Mix OLTP and DSS Queries 85 Table 5.2 Percentage of OLTP and DSS in the Workload Involving Mix Queries 86 Table 5.3 Initial Fuzzification Rules 87 Table 5.4 Initial Fuzzification Rules in Compact Form 88 Table 5.5 Final Fuzzification Rules in Compact Form 89 Table 5.6 Workloads with their Identified Workload Class 89 Table 6.1 CBR DATA for CPU Idleness Detection 92 Table 6.2 Euclidean Difference Calculation (Example 1) 94 Table 7.1 Comparative Results of the Proposed and FIFO Workload Management 100 for Scenario 1 Table 7.2 Comparative Results of the Proposed and FIFO Workload Management 100 for Scenario 2 Table 7.3 Comparative Results of the Proposed and FIFO Workload Management 101 for Scenario 3 Table 7.4 Average Waiting Time of the Proposed and FIFO Workload 103 Management Table 7.5 Comparative Results of the Proposed and FIFO Workload Management 103 for Scenario 1 Workload Management through Characterization and Idleness Detection xiii Table 7.6 Comparative Results of the Proposed and FIFO Workload 104 Management for Scenario 2 Table 7.7 Comparative Results of the Proposed and FIFO Workload 104 Management for Scenario 3 Table 7.8 Average Waiting Time of the Proposed and FIFO Workload Management Table 7.9 106 Comparative Results of the Proposed and PB Workload Management 107 for Scenario 1 Table 7.10 Comparative Results of the Proposed and PB Workload Management 107 for Scenario 2 Table 7.11 Comparative Results of the Proposed and PB Workload Management 107 for Scenario 3 Table 7.12 Average Waiting Time of the Proposed and PB Workload Management 109 Table 7.13 Comparative Results of the Proposed and PB Workload Management 110 for Scenario 1 Table 7.14 Comparative Results of the Proposed and PB Workload Management 110 for Scenario 2 Table 7.15 Comparative Results of the Proposed and PB Workload Management 110 for Scenario 3 Table 7.16 Average Waiting Time of the Proposed and PB Workload Management 112 Table 7.17 Comparative Results of the Proposed and SJF Workload Management 113 for Scenario 1 Table 7.18 Comparative Results of the Proposed and SJF Workload Management 113 for Scenario 2 Table 7.19 Comparative Results of the Proposed and SJF Workload Management 114 for Scenario 3 Table 7.20 Average Waiting Time of the Proposed and SJF Workload Management 115 Workload Management through Characterization and Idleness Detection xiv List of Abbreviation AC Autonomic Computing ADDM Automatic Database Diagnostic Monitor AI Artificial Intelligence ANN Artificial Neural Network AWM Autonomic Workload Management AWR Automatic Workload Repository BI Business Intelligence CBR Case Based Reasoning CBMG Customer Behavior Model Graph CHAID CHi-squared Automatic Interaction Detector COLT Continuous On-Line Tuning CRT Classification & Regression Tree DAD Disk Array Designer DBA Database Administrator DBMS Database Management System DSS Decision Support System DW Data Warehouses DTA Database Tuning Advisor FBS Fuzzy Based Scheduler FCFS First Come First Server FIFO First In First Out KB Knowledge Base Workload Management through Characterization and Idleness Detection xv LMP Largest Memory Priority MARS Multivariate Adaptive Regression Splines ML Machine Learning MPL Multiprogramming Level MV Materialized View MWS Mixed Workload Scheduler NN Neural Network ODRM Oracle Database Resource Manager OLAP On Line Analytical Processing OLTP On Line Transaction Processing PAQRS Priority Adaptation Query Resource Scheduling PBS Priority Based Scheduler PGM Priority Gradient Multiprogramming PI Progress Indicator PMM Priority Memory Management QEP Query Execution Plan QP Query Patroller QShuffler Query Shuffler QUEST Quick, Unbiased & Efficient Statistical Trees QUIET Continuous Query-Driven Index Tuning REDWAR RElational Database Workload AnalyzeR rFEED Fair, Effective, Efficient and Differentiated SAA SQL Access Advisor SDD SemiDiscrete Decomposition Workload Management through Characterization and Idleness Detection xvi SJF Smallest Job First SLA Service Level Agreement SLG Service Level Goal SLO Service Level Objective SVD Singular Value Decomposition SVM Support Vector Machine TCO Total Cost of Ownership Workload Management through Characterization and Idleness Detection xvii Chapter 1: Introduction & Background Chapter 1 Introduction and Background 1.1 Introduction DBMSs are used to manage the data in various organizations from the last many years and currently included in core components of the organization. According to a report the worldwide market for DBMS was more than $15 billion in 2005 with an estimated 10% annual growth [IDC Competitive Analysis (2007)]. The complexity in DBMSs increases due to the various functionality demands from the users, complex data types, diverse workload and data volume that is increasing with the passage of time. These factors cause brittleness and unmanageability in DBMSs. To handle this problem organizations hire number of expert DBAs and spending lot of money to get expected improvement, throughput and response. Usually DBAs have to take care of all the tasks such as making policy for workload priorities, memory, configuration and other DBMS related tasks. Currently, the workload complexity has been increased up to the level where it cannot be handled by the DBA or other database expert. On the other hand, cost of the hardware is decreasing as compared to the management cost. Performing workload management activities manually, by hiring experts causes increase in Total Cost of Ownership (TCO) [Huebscher et al. (2008), IBM Corporation (2003)]. Moreover with the advent of distributed systems and data ware house, it became difficult and even some cases impossible for DBA to manually organize, optimize and configure day to day tasks. To achieve better workload management different approaches are used such as executing queries may be stopped for a while and later these can be restarted. Contemporary Database Management Systems are facing challenges of managing various types of workload, demands for maximum functionality, users with different priorities, OLTP, DSS and highly intensive and On Line Analytical Processing (OLAP) queries that involve complex decision making. Main challenge involved with the workload management in DBMSs is the handling of complex, tricky and large workloads that consists of complex OLAP queries. These queries have mathematical functions, number of joins, nested queries and usually operate on large data sets. These workloads answer different business questions, consume many resources, Workload Management through Characterization and Idleness Detction 2 Chapter 1 Introduction and Background access huge data and require large time for complete execution. Due to this reason, they impact the performance of smaller and more important workloads which need an immediate response. With the regular user oriented workload, DBMS also has to take care of the system oriented tasks that execute permanently such as database backup, index re-organization, statistics updation etc. The proposed research is started with the aim to manage the database workload in DBMSs. There is need to take care of all the underlying issues and arranges the database workload in such a way that each request has to wait for a minimum time with maximum CPU utilization. The research is related with the workload, its management; Machine Learning (ML) techniques (as the workload management problem is solved through ML techniques); and the Autonomic Computing. Therefore in the following section, we are discussing these topics in detail. 1.2 What is Workload? Workload is defined as the task that is given to some worker to perform in specific amount of time. In Computer Science it is defined as the processing time to finish a job that further may involves some programming and number of users [Niu et al. (2006)]. The work that is produced or can be produced by the DBMS in a specified time interval is known as the database workload. More concrete definition of the DB workload may be the set of structured query language statements that may be either OLTP or DSS, sessions, commands and other operations that run in an interval. OLTP workload usually consists of small transactions that may involve update, delete etc while the DSS workload comprises of queries that are long, highly resource intensive and involve many computations. Example of OLTP Query: The OLTP query consists of the transactions that take small amount for execution, processing as well as access small amount of data. Update Region set R_Name = 'Islamabad', R_Comment = 'Capital of Pakistan' where R_Regionkey = 7 Workload Management through Characterization and Idleness Detction 3 Chapter 1 Introduction and Background Example of DSS Query: Select S_Name, S_Address From Supplier, Nation Where S_Suppkey In ( Select Ps_Suppkey From Partsupp Where Ps_Partkey In ( Select P_Partkey From Part Where P_Name Like '[Color]%' ) And Ps_Availqty > ( Select 0.5 * Sum(L_Quantity) From Lineitem Where L_Partkey = Ps_Partkey and L_Suppkey = Ps_Suppkey and L_Shipdate >= Date('[Date]’) and L_Shipdate < Date('[Date]’) + Interval ‘1’ Year ) ) and S_Nationkey = N_Nationkey and N_Name = '[Nation]' Order By S_Name; The above DSS query is an example of the ad-hoc queries that is used to show the supplier name and their related address in ascending order from the supplier and nation tables on the basis of some conditions. The conditions involve in this query subsequently consists of other sub-queries. This type of query consumes many resources (such as memory, CPU cycles etc) as well as affects the performance of the DBMS. 1.3 Workload Evolution Workload has evolved through three phases which are capacity planning, resource sharing and performance oriented [Niu et al. (2006)]. Capacity planning was introduced before 1980s with the purpose of cost sharing and provides a way to find out the future organization’s demand. It is used to estimate the future requirements of the organization. The word capacity represents the maximum amount of work that an organization can perform in specific time. In workload management, the capacity planning always guarantees the functions to achieve the production Workload Management through Characterization and Idleness Detction 4 Chapter 1 Introduction and Background goal within the budget. The main concern of the capacity planning with respect to workload change is with resources, their management, configuration and what will be the impact. Capacity planning is performed through characterization and performance [Niu (2008)]. Later on the research over the capacity planning was sparse and converted into the resourceoriented workload due to the invention of minicomputers. In this the concept of service level was maintained through maximum resource utilization by allocating the resources to various applications. So the main idea behind the resource oriented workload was the maximum resource utilization. Heavy growth and versatility in workload introduced the concept of performance oriented workload where the focus is over the business goals and objectives. The performance oriented workloads work in a loop (measure, plan and modify) under the certain Service Level Agreements (SLAs). The aim of the performance oriented workload is to best use of resources without laborious activity. It performs the adaptation by configuring and optimizing without any human intervention and also known as autonomic workload management [Lightstone et al. (2002)]. In summary, the workload has been evolved from the capacity planning to the resource oriented and finally performance oriented. The basis of the workload evolution is cost sharing and Service Level Objectives (SLOs) due to the technology. SLO defines the objective of the agreement and is the main entity of a SLA between the service provider and customer. 1.4 Workload Management Methods There workload in DBMSs is managed through different approaches such as characterization, scheduling, multiprogramming level and resource allocation. 1.4.1 Characterization It is a process in which things are partitioned into the groups on the basis of similarities or specific characteristics. In case of database workload it is the process where various workloads are grouped into workload types such as OLTP, DSS etc. Workload Management through Characterization and Idleness Detction 5 Chapter 1 Introduction and Background Before characterizing the workload, detection of the workload is performed. Basic purpose of the workload detection is to find out changes and trends in the workload for predicting future workloads. In general, workload is detected through two methods i.e. performance monitoring and characterization. Former is the reactive approach that takes action when performance has been degraded while the later is proactive approach to track the workload changes [Niu et al. (2006), (2007), (2007)]. Workload characterization is achieved by partitioning, analysis and resource demand that is provided after monitoring the system where workload partitioning reduces the complexity, workload analysis is useful for efficient workload management and monitoring is used to observe the performance. Identification of the workload type is not an easy task as size and type of the database workload changes at different spans of the day. Behavior of the database workload also varies with respect to day, week and month. For example, in case of stock exchange workload, at the start of the day the server experiences more DSS type of workload than OLTP, turns to more OLTP like workload than DSS during mid of the day and finally once again tends to DSS workload overnight to observe the whole day analysis and reporting. 1.4.2 Scheduling It is a way to make decision about the workload execution by arranging the user requests and proper resource allocation. Usually DBMSs execute all the requests in time without compromising the other works. But when there is huge amount of sessions, requests, and other system jobs arrive simultaneously, DBMS either crashes due to the overload problem or management has to wait for a long time to generate useful reports for decision making. An efficient scheduler is one that balances the user oriented and system workload with respect to size and importance throughout the time that will ultimately maximize the CPU utilization. There are two ways to schedule the user requests or workload in DBMS that may be either internal or external from DBMS [Schroeder et al. (2006)]. In case of internal scheduling, one achieves it by updating the internals of the DBMS while in case of external scheduling there is no need to change the internals. DBMS researchers, vendors, and practitioners prefer the external scheduling due to a number of benefits. For example, they do not need to learn about the Workload Management through Characterization and Idleness Detction 6 Chapter 1 Introduction and Background DBMS’s internal architecture, not necessary to change the DBMS internals, and it is general, not specific to certain DBMS hence portable. Moreover, external scheduling can be modified easily as per requirements and objectives of the organization. It has been applied on the web storage and database servers for many years. 1.4.3 Multiprogramming Level It is the way in which numbers of requests or transactions are limited before sending to DBMS for execution. However, the task to select the right MPL is not any easy tasks as decreasing MPL also decrease the throughput and too much increasing also downs the throughput. So in MPL selection there is a need to take care about both that it should not so low or high that decreases the system performance. Mostly the external scheduling is used for limiting the workload or number of requests i.e. multiprogramming level. 1.4.4 Resource Allocation In general there are two types of resources in DBMSs, i.e. physical and logical. The physical resources include the memory, storage, CPU cycles, I/Os etc while the logical resources include the table locks, system catalog etc. All the operations of the DBMS complete successfully with the interaction of both type of resources. However, this interaction often leads to increase the DBMS complexity. The other reason that has increased the DBMS complexity is the heterogeneous nature of the resources. 1.5 Artificial Intelligence (AI) Techniques As in this research, we are proposing an autonomous solution to handle the workload management in DBMSs so the following section is reserved to elaborate some of the well renowned AI and ML techniques that solves the problems intelligently. These AI and ML techniques have been used from many years not only in databases but many other fields to handle various problems. Workload Management through Characterization and Idleness Detction 7 Chapter 1 Introduction and Background The concept of the Artificial Intelligence is introduced by the Turing in 1950 [Stuart et al. (2008)] and also discuss that how intelligence can be incorporated in machines. After the advent of the AI field much of the research has been done in this area and number of AI techniques has been introduced. We are discussing a few of them in the following section. 1.5.1 Decision Tree Decision tree represents different parameters in the form of nodes that may be parent or child and are used for the decision making. Moreover, they are also used in decision analysis, prediction and learning. The nodes of the decision tree consist of actions and their related decision. Problems are solved through the decision trees by applying some algorithms that may be ID3, C4.5, CHi-squared Automatic Interaction Detector (CHAID) and MARS (Multivariate Adaptive Regression Splines). Decision trees solve the problems with visual display and are used in many fields such as Data Mining, Machine Learning, DBMS, DWs, Networks etc. Decision Tree has numerous advantages such as simple, inexpensive, ability to coordinate with other techniques, handles numerical as well as categorical data, robust and less time consuming. There are also some disadvantages of the Decision trees such as sometime it represents a very long and complex tree that needs pruning, operations such as XOR, parity or multiplexer cannot be performed and locally optimal decisions are achieved due to the use of greedy algorithm in many cases. 1.5.2 Bayesian Algorithm The algorithm is introduced that uses Bayesian theorem that was introduced by Thomas Bayes and works on the basis of previous history or training data. It is mostly applied in the probability, prediction and classification related problems. Bayesian algorithm is widely used in many applications and fields including statistics, science, medicine, control theory, engineering and law. Bayesian algorithm can start working from small training data. 1.5.3 Neural Network A network of neurons or artificial neurons is called a Neural Network (NN) or Artificial Neural Network (ANN) [Hopfield J, (1982)]. There are large number of processing elements in the NN Workload Management through Characterization and Idleness Detction 8 Chapter 1 Introduction and Background which are strongly connected with each other and works at the same time in order to get the solution. The basic concept of NN is taken from the human brain neurons where the neural processing creates the AI. The tasks performed by the human are due to the coordination of certain neurons that become stronger when same task is performed again and finally leads to the formation of memory. NNs solve the problem in the same way as we human do and also learn from the examples just like human. However, there need high care in the selection of examples for NN. As compare to the conventional computing NN solves the problem in different way as conventional approach uses the steps or algorithm to solve the problem while NN uses its learning power or knowledge to solve the problem. After its advent, various applications that are related with medicine, business etc are using it due to their adaptive learning, organizing of information and ability to correct the problems at its own. NN has two types with respect to the architecture, i.e. Feed-Forward and Feedback Networks. Former is also known as bottom-up or top-down approach where the solution solving of starts from input to output and output of any layer cannot effect itself or previous layers while in case Feedback the output of any layer can effect to any other layer, i.e. both directions. Due to this reason Feedback NNs are dynamic in nature but also very complex. However, as NN finds the solution at its own so sometime it can produce unpredictable solution. 1.5.4 Case Based Reasoning The Case Base reasoning (CBR) solves the problem by using the solution of the similar past problems that are stored in the knowledge base. The technique is introduced by the Roger Schank to develop the dynamic memory model [Roger S, (1982)]. The mechanism behind the CBR to solve the problem is just like the same as we human do in our daily life. When humans face any problem, they try to relate it their past problems, find the similar past problems and use its solution. There are four steps to solve the give problems through CBR, i.e. Retrieve, Reuse, Retain and Revise. In retrieve step the case of problem to be solved is formed which consists of the problem with related attributes and most similar problem is extracted from the knowledge base. In Reuse, the solution of the similar case is taken and mapped it to the given problem. The solution is tested in the Revise step and certain modifications are made if necessary. Finally, in Workload Management through Characterization and Idleness Detction 9 Chapter 1 Introduction and Background Retain the adopted solution is stored in the knowledge base as a new case that can be used for future problem solving. It looks that CBR is similar to the rule induction algorithms as both starts towards their solution with a set of cases. But CBR made the generalization after the knowledge of target problem while the rule induction algorithm performs generalizations before. There are certain problems with CBR such as retrieval time increases with the addition of the new cases. 1.5.5 Fuzzy Logic In Boolean logic there are two values True and False, however in Fuzzy logic there is either truth or false value which is in the range of [0, 1]. Fuzzy logic was introduced by Lotfi Zadeh that is based on the concept of partial truth and approximation rather than fixed [Zadeh, (1965)]. Fuzzy logic is similar to probabilistic logic as in both cases the truth value has range from 0 to 1, but Fuzzy represents the degree of truth while the probability shows the likelihood. After its introduction, it has been used in various fields including Control Theory, Engineering, Networks, DBMS, and Artificial Intelligence etc. The Fuzzy logic is performed through three steps which are Fuzzification, Fuzzy Rules Evaluation, Aggregation and Defuzzification. In fuzzification step, crisp input values are evaluated by applying the membership functions that returns a value in between 0 and 1. Membership function must be simple and easily convertible to fuzzy set. The methods in this step may be Gaussian, Triangular or Trapezoidal. In Rule Evaluation the evaluated output received from the fuzzification step is checked through some already defined fuzzy rules which portray the expert decision and written in the form of fuzzy words. The Aggregation is applied to each output variable that performs the conversion of result into a fuzzy set. Finally, in Defuzzification, the fuzzified rules are used to produce the final output by changing the fuzzy set into output values. Usually two methods (Maximum method and Centroid methods) are used in this step. Workload Management through Characterization and Idleness Detction 10 Chapter 1 Introduction and Background 1.5.6 Support Vector Machine Support Vector Machine (SVM) is a machine learning algorithm that is similar to Neural Networks and used for classification. It was introduced by Vladimir N. Vapnik and later on modified by Vapnik and Corinna [Cortes et al. (1995)]. It provides a solution for the classification and regression analysis by the data analysis and pattern recognition. It provides a way of learning and makes prediction on the basis of training data that may consists of small number of examples. The basis of the SVM is decision planes where classification is performed by making an N-dimensional plane and separating the data into two categories. SVM is selected due to its fastness, robustness and accuracy than other machine learning algorithms. 1.6 Autonomic Computing The systems, which can execute, adjust and tune themselves in the presence of workload, are called autonomic systems [Horn (2001)]. The autonomic systems focus on what rather than how. The term autonomic computing was first time introduced by IBM in 2001 to describe the systems that can manage themselves without any human interaction. Autonomic systems always work in certain environment. The inspiration of the autonomic computing is taken from the human nervous system that does different activities without conscious thought [Parashar et al. (2005)]. An autonomic computing system would control the functioning of computer applications and systems without or minimal human intervention, in the same way human autonomic nervous system regulates body system without conscious input from the individual. 1.6.1 AC Architecture Generally the AC system consists of eight elements that work in a loop as shown in Figure 1.1. These elements are managed element(s), sensor, monitor, analyze, plan, execute, Knowledge Base (KB) and effecter [White et al. (2004), Mateen et al. (2008)]. Managed element is the main element (s) in an autonomic system that requires to be managed. Sensor is responsible to collect the information after the interaction with one or more managed elements. Monitor examines the information provided from the sensor, filters and stores the useful information into the KB. Analyze component compares the intended parameters with the parameters provided by the Workload Management through Characterization and Idleness Detction 11 Chapter 1 Introduction and Background monitor component. The results of comparison are also stored in KB for future use. The purpose of the planning component is to identify the possible ways to correct trends on the basis of previously provided data. Execute component uses the effectors to adjust the parameter of the managed element (s) and stores the values in KB. Knowledge base is a repository that contains all the data or information whether it is in raw form or in refined form. Figure 1.4 Autonomic Computing Elements [White et al. (2004)] 1.6.2 AC Characteristics An AC system has some specific characteristics such as Self-Configuration, Self-Optimization, Self-Healing, Self-Protection, Self-Organization and Self-Inspection. Self-Configuration is the characteristic which has the capability to configure itself according to the environment and desired goals. Self-optimization is the characteristic due to which autonomic system performs its tasks and executes utilities in a best way. Self-healing is the capability to maintain the system in a consistent state all the times. Whenever there is an attempt to failure, autonomic system remains in the consistent state or recovers from failure using logs and back up. Self-Protection of an autonomic system includes the security and privacy auditing mechanisms. SelfReorganization in an autonomic system dynamically reorganizes and restructures the layout of the stored data, indexes and other to improve system performance. Through Self-Inspection, autonomic system knows itself in order to make intelligent decisions. It includes the information about its components, available resources, environment, current status, capabilities, limits, boundaries and interdependencies with other systems. Workload Management through Characterization and Idleness Detction 12 Chapter 1 Introduction and Background 1.6.3 AC Levels The evolutionary process of autonomic computing has five levels which are Basic, Managed, predictive, Adoptive and finally Autonomic. Basic level is a starting point where an individual manages various jobs. These jobs may be set up for IT environment infrastructure, monitor and make changes as per requirements. In Managed level system management technologies are used to collect information from the systems. The system collected information is used and analyzed by the administrator for future predictions and decisions. The predictive level provides new technologies to find correlation among system components. The components are so mature and advance that they can initiate pattern recognition, prediction and suggestions for optimal solutions. At the Adoptive level components use the available information and knowledge of the system to not only predict but can take different actions automatically. Finally at the Autonomic level, the business policies and objectives are monitored to change the business policies, objective or both. 1.7 Autonomic Workload Management In workload management there are three entities which are workload, resources and objectives that are co-related with each other. The workload uses some resources to meet the objectives of an organization or we can also say that resources are allocated through different approaches to the workload which has some management objective. In workload management the main functions are workload frequency patterns, composition, intensity and required resources. The adaptation of workload consists of workload detection and workload control. Three techniques are used to derive workload control plan, which are performance model, heuristic and threshold technique. The functions performed by the workload manager include the inspection, configuration, optimization, prediction, organization and adoption of the workload. Generally, in DBMSs as shown in Figure 1.2 all the tasks are performed manual. Large organizations hire database expert(s) to take care of all the activities of the workload. Workload Management through Characterization and Idleness Detction 13 Chapter 1 Introduction and Background Figure 1.5 Workload Management Activities Workload management is not any easy task especially in large scale enterprise where thousands of users generating the workload or requests to DBMS. The unmanageability in workload management can be handled by making the DBMS to self-manage (Autonomic) that can perform its tasks such as memory, configuration, storage management and resource allocation automatically according to the current environment. The Autonomic technology has a high potential to be incorporated in current DBMSs as well DWs. The benefit of autonomic computing is to manage complexity itself according to the set goals and objectives. It contributes an important role in managing systems, database management systems and workload management. Workload management is a main feature of DBMSs and should be autonomic to improve the efficiency of DBMS. Technology of the Autonomic computing can be incorporated with workload management to execute the workload in an efficient and responsive way. There is need of workload manager that manage workload without affecting other requests, efficient resources utilization and handle all other matter related with workload. The Autonomic Workload Manager (AWM) will perform the database tasks by collecting information about the workload type, intensity, resource demand etc with minimal human intervention. Previously, different techniques, models and tools have been developed by the database vendors and practitioners to handle workload in an automatic and self-managed way. These tools are related with workload scheduling, multiprogramming, prediction, adoption and resource allocation. Workload Management through Characterization and Idleness Detction 14 Chapter 1 Introduction and Background The functions performed by the AWM are shown in Figure 1.3 which reveals that the database tasks such as inspection, configuration, optimization etc will be performed automatically any without any human intervention by the AWM through monitoring, analyzing, planning and finally fixing. Figure 1.6 Autonomic Workload Manager Workload Management through Characterization and Idleness Detction 15 Chapter 1 Introduction and Background 1.8 Problem Statement Previous database workload characterization, classification and scheduling techniques did not identify the effective workload parameters. Most of the techniques are based on the assumptions or taking values from the query optimizers rather than calculating their actual values. However, in reality there is enormous difference between the guessed and actual values. These techniques used the status variables to mange the workload but there values are taken during or after the workload execution. So these techniques are not feasible for workload management task and only useful to identify the workload type. These techniques did not identify the percentage of the OLTP or DSS percentage in the database workload through Fuzzy Logic. Due to these reasons, previous workload management techniques give inefficient results and often lead to starvation specially when there is diverse and huge workload. There is a need of high level workload management technique that can manage the database workload without affecting other requests, efficient resources utilization and maximum CPU consumption. The main objective of the research is to develop the workload management approach that consists of characterization, scheduler and idleness detector module. The work is followed by testing the proposed approach over the various real-time and benchmark workloads. Workload Management through Characterization and Idleness Detction 16 Chapter 1 Introduction and Background 1.9 Research Questions 1) How effective workload parameters will be extracted from the given DB workload? 2) What will be the core components of the proposed DB workload management approach? 3) How these components will interact with each other to manage the DB workload? Workload Management through Characterization and Idleness Detction 17 Chapter 1 Introduction and Background 1.10 Research Objectives The aim of the research is to propose an autonomic workload management framework for DBMSs that manage the workload through characterization, scheduling and idleness detection techniques. To achieve the objectives, this research proposes and adopts the following ideas or methods. 1. Selection of the effective workload characterization and classification parameters to identify the DB workload type i.e. OLTP and DSS 2. Propose the workload characterization technique to detect and classify the workload 3. Propose the scheduling technique to arrange the DB workload 4. Identify the idleness to rearrange the workload for efficient execution Workload Management through Characterization and Idleness Detction 18 Chapter 1 Introduction and Background 1.11 Research Contribution The main contribution of the proposed research includes: 1. Design and implementation of the workload management with characterization, scheduling and idleness detection modules. 2. Selection of the effective workload characterization parameters; identification of the workload into OLTP and DSS through Case Base Reasoning Approach. 3. A Fuzzy Logic based scheduling technique where incoming workloads are arranged according to the workload type percentage and impact. 4. Detection of the DBMS idleness to use the free CPU cycles and improve the workload management technique. Workload Management through Characterization and Idleness Detction 19 Chapter 1 Introduction and Background Summary In this chapter, background and related concepts of the workload management, AI techniques and autonomic computing are discussed. The first part of the section describes the origin, architecture, characteristics and levels of the autonomic computing. Workload management section starts with some basics about the workload; how workload evolved with the passage of time and technology; detection methods for workload management. The chapter also discusses about the manual workload management and autonomous workload management. Finally, the chapter concludes with the statement of the problem, research questions, research objectives, research contribution, research plan and organization of the thesis. Workload Management through Characterization and Idleness Detction 20 Chapter 1 Introduction and Background Organization of the Thesis In Chapter 1, there is introduction and background of the database management systems, workload, its management and autonomic computing. In first part of the chapter, there is discussion about the workload, its entities, how workload is evolved, workload detection methods and general as well as autonomic workload management in DBMSs. In the autonomic workload management part an overview of the AC architecture, components and levels are discussed. Finally, the chapter provides the statement of the problem, research questions, objectives, applications and contribution. Chapter 2 is dedicated for the detailed description of the state of the art that is related with the workload management techniques in Database Management Systems and Data Warehouses. These techniques are specially related with the workload classification, characterization, scheduling and management. The chapter discussed the enormous research work related with these areas that have been carried out from last many years. After the discussion of the workload management techniques, the problems are identified and highlighted which exist in these techniques. Chapter 3 elaborates the methodology or steps that will be followed to achieve the proposed research. Here, steps of the research are briefly explained. The objective of the chapter is to prepare readers about the proposed system details which will be explained in later chapter of the thesis. Three steps of the proposed research i.e. workload characterization, scheduling and idleness detection are explained. Experimental design is also specifically discussed here, as we know for this type of research experimental design is very important. This part of the chapter explains the database management system, database workload and the benchmark dataset that will be used during the experiments. Moreover, it explains the experimental setup where various OLTP and DSS like workloads will be designed and executed. The proposed solution to manage the database workload in DBMS will be evaluated by executing these experiments and obtaining their results. Workload Management through Characterization and Idleness Detction 21 Chapter 1 Introduction and Background In Chapter 4, we have discussed the proposed solution for the workload characterization where the workload is classified into OLTP and DSS on the basis of key workload parameters. These workload parameters are identified by executing the workload that consists of OLTP and DSS like queries. During their execution, the snapshots are recoded with an interval of 5 seconds to find change in workload parameters. The recorded snapshots are analyzed by observing the change in workload parameters and on the basis of this observation the workload is classified into OLTP and DSS. After proposing the CBR based workload characterization approach, experiments are performed by executing the various workloads to show the effectiveness and efficiency. Chapter 5 is dedicated for the workload scheduling technique that is performed on the basis of workload type which has been identified in the workload characterization process. The scheduling is performed by finding the degree or percentage of the OLTP and DSS in particular workload. Percentage of the OLTP and DSS in workload is calculated by using the Fuzzy Logic. The Fuzzy Based Scheduler (FBS) consists of three main steps which are Fuzzification, Fuzzy Rules and Defuzzification. In the Fuzzification step, the membership function is calculated on the basis of workload parameter values and converted into the Fuzzy set. Eight (8) Fuzzy rules are designed to decide whether the particular workload lies in low, medium, or high class with respect to its size and importance. The membership functions and the fuzzy rules are identified after executing the various workloads. The Fuzzy set is constructed that consists of the workload with its relative class. Once the workload is grouped into classes, scheduling is performed on the basis of workload importance and percentage of the DSS and OLTP. Chapter 6 discusses the idleness detection approach and how it is useful to improve the scheduling with maximum CPU utilization. There is a CBR that contains training data on the basis of the workload parameters. Training data for the CBR is developed by executing various workloads, extracting parameter’s values, identifying CPU idleness and storing parameters with CPU idleness into the CBR for future executable workloads. The idleness of the incoming workload is identified by searching the exact or nearest (almost equal) workload in the CBR. After getting the exact or nearest match, idleness of that workload is taken from the CBR. In case of no match, the workload is executed without taking the idleness for the first time and stored as Workload Management through Characterization and Idleness Detction 22 Chapter 1 Introduction and Background a new case into the CBR after execution. Once the CPU idleness is found, the time period is used for some other processes such as defragmentation, index rebuilding etc and also used to improve the scheduling. The defragmentation process will execute up to the detected idleness time. After completing the idleness time, the de-fragmentation or other process may complete otherwise it will again temporarily stop. Chapter 7 presents the experiments and discussion over the results that are achieved from the proposed workload management approach. A comprehensive experimental study is elaborated in this chapter including several experiments. It also shows the experimental environment and the workload benchmarks. Effectiveness of the proposed research is experimentally evaluated and the efficiency in terms of the runtime and waiting time is compared with other techniques. The chapter also provides a comparison with the other well-known workload management schemes. The results obtained from the approach are compared with the First In First Out (FIFO) or First Come First Server (FCFS), Priority Based (PBS), and Smallest Job First (SJF). All the results are shown in tables and graphs. The last section of the chapter is dedicated for the conclusion and future work. Finally, the Appendix – Database Structure reveals the table structure and the queries that are required to build the database for experiments. The database with columns, type, primary and foreign keys are also shown. Workload Management through Characterization and Idleness Detction 23 Chapter 2: Related Work Chapter 2 Related Work In chapter 1, we have described the concept of the workload, its evolution, workload management techniques along with Autonomic Computing and how traditional workload management can be evolved to the self-manage workload. The chapter 2 is dedicated to discuss the state of the art that is related with the existing workload characterization, classification, scheduling and management techniques that has been introduced and used in DBMSs and DWs. For convenience and easy understanding, the literature is divided into characterization and scheduling sub-sections. The characterization section discusses the various approaches that are used to characterize and classify the database workload while scheduling part discusses the techniques that are used in DBMSs and DWs to arrange the workload in some specific manner to improve the efficiency. 2.1 State of the Art in Workload Management The research on the workload management in DBMSs and DWs has been carried out from the last many years. A large number of white papers, research papers and case studies that are related with the characterization, scheduling and management in DBMSs and DWs are surveyed. Number of researchers, practitioners and vendors are contributing to make them more selfmanaged and reliable with maximum efficiency. Here, we are dividing the literature review into two sections, i.e. characterization and scheduling techniques sections. Following section discusses few of these which are closely related to the proposed solution for workload characterization and scheduling in DBMSs and DWs. 2.1.1 Workload Characterization Techniques The characterization process is used to identify the workload classes by partitioning the global workload into smaller sets or components that composed of the transactions or requests. Following section discusses the previous research that has been done on the workload characterization in DBMSs. Workload Management through Characterization and Idleness Detction 25 Chapter 2 Related Work A model is proposed [Zewdu et al. (2009)] to identify and characterize the workload; and identified the variables that affect the workload. Workload classification into OLTP and Decision Support System (DSS) is performed through two algorithms, i.e. Quick, Unbiased & Efficient Statistical Trees (QUEST) and Classification & Regression Tree (CRT). Degree of the correlation is determined through the Hierarchical clustering. The key status variables are identified that affect the workload and used these to measure the workload cost. The proposed approach is validated by analyzing and experimenting over the TPC workload benchmark queries and transactions in MySql 5.1 database. After performing number of experiments, four (4) status variables are selected that are vital in the identification of workload type. These include status variables Com_ratio, nnodb_log_writes, Qcache_hits and Questions. [Elnaffar et al. (2002), Elnaffar et al. (2004)] proposed a workload characterization model that automatically classifies the workload into OLTP and DSS. The classifier works on the basis of workload characteristics and arranges the workload characteristics through decision tree induction. The proposed classification model is validated through number of experiments that are performed over the browsing and ordering profiles of TPC-W benchmark. The contribution of the research is to analyze the performance measurements and introduce the classifier for OLTP and DSS workload. However, there is no discussion about the distinct workload parameters which are most affective to distinguish the workload. REDWAR (RElational Database Workload AnalyzeR) by [Yu et al. (1992)] is an offline tool for DB2 environment, which is used for the characterization of SQL trace and provides the structural information with statistics of the query under execution. It analyzes the SQL statements and views with respect to their composition; and classifies the data. During analysis, it uses statistical summaries (correlation, distribution, variation) and runtime behavior of the workload. The efficiency of the database system is increased by identifying the criteria for a SQL statements or workload. REDWAR generates a report that can be used to plan the physical design and build benchmark workload. However the research does not provide the automatic recommendations for the physical design. Workload Management through Characterization and Idleness Detction 26 Chapter 2 Related Work [Wasserman et al. (2004)] presented an analysis of characterization technique for Business Intelligence (BI) workload through clustering technique where the Singular Value Decomposition (SVD) and SemiDiscrete Decomposition (SDD) clustering algorithms are applied. Their approach is based on resource related attributes such as CPU consumption, sequential and random I/O rate and joins degree. The sizing technique works by collecting the input data from the user and for each workload class data is validated and resource demand is identified. Total resource demand is calculated by aggregating the resource demand for each class with hardware configuration. Finally a report is generated that reveals a priority wise list of hardware configuration. The characterization of workload is done by performing five steps which are components identification, selection of characterizing parameters, input data normalization, workload grouping into classes and finally identification of each class. The proposed characterization technique is performed on TPC-H benchmark like data but with some assumption based parameter. The technique is also limited as for characterization it only considers the one parameter i.e. user resource demand. A technique for characterization is introduced where workload models for E-commerce environment is built [Menasce et al. (1999)]. In this research, a CBMG (Customer Behavior Model Graph) or state transition graph is introduced for characterizing the workload. This model graph represents similar navigational pattern for group of customers who perform same activities. The workload model with related parameters is identified and workload characterization is performed through clustering algorithm. The technique is evaluated with the help of different experiments. The proposed technique had some problems such as there will be maximum session drops when there are huge sessions or maximum load. Moreover there was no mechanism to manage or recover these session drops. An approach [Holze et al. (2008)] to monitor the workload for autonomous databases is introduced that is based on n-Gram model. The proposed technique is lightweight due to use of simple API and has ability to find workload shifts during the execution. The monitoring process is divided into two sub-processes; first continuous monitoring where currently executed workload compares with the typical workload model and second performs the configuration analysis when major shift is found. The technique is validated through experiments which Workload Management through Characterization and Idleness Detction 27 Chapter 2 Related Work performed over the DVD Shop and reporting data by taking different workload shift scenarios. However, the research did not discussed that how the workload can be managed more efficiently. The black box attributes can also be used to characterize the DB workloads. The work in [Liu et al. (2004)] the black box attributes which include response time, CPU usage, page reference and disk utilization to characterize the E-commerce workload. Authors argued that CPU utilization can be changed by varying the dynamic cache. For experiments, they have used various types of dynamic caches such as Query, Table and Hybrid cache. After experiments it has been found that response time of the Database server and number of references reduces due to the dynamic cache even in peak hours. Authors also concluded that the efficiency of the Hybrid cache is better than Table cache which in turn better than the Query cache. The experiments are performed on the TPC-W benchmark to validate the black box approach. Summarized analyses of the database workload characterization techniques for workload management in DBMSs are presented in Table 2.1 (where the last column “Exp/ Imp” represents whether the particular research is experimented or implemented in some database management system). Table 2.1 Analysis of the Workload Characterization Techniques in DBMSs Characterization Technique Zewdu et al. (2009) Elnaffar et al. (2002), (2004) Yu et al. (1992) Wasserman et al. (2004) Purpose A model is introduced to identify and characterize the workload; and identified the affective variables that can be used for workload characterization. A workload characterization model is proposed that classifies the workload into OLTP & DSS automatically. A tool that is used for the characterization of the workload and plan the physical design. A characterization technique for the BI workload is presented. W Type/ Benchmark Static/ Dynamic Algorithm/ Model Exp/ Imp TPC-C TPC-H Dynamic Hierarchical clustering, QUEST and CRT Imp SPRINT – Scalable Decision-Tree based Algorithm TPC-W Dynamic Exp Accounting Type Data Static - Imp TPC-H Dynamic SVD & SDD Algorithm Imp Workload Management through Characterization and Idleness Detction 28 Chapter 2 Related Work Menasce et al. (1999) Holze et al. (2008) Liu et al. (2004) Mange workload in Commerce type application. Eof An approach to monitor the workload for autonomous databases that uses n-Gram model & has ability to find workload shifts during execution. Black box attributes which include response time, CPU usage, page reference & disk utilization to characterize the E-commerce workload. TPC-W Dynamic Queuing Model Exp DVD Shop & Reporting Data Dynamic n-Gram Model Suffix Tree Imp TPC-W Dynamic Least Frequently Used & Least Recently Used Imp 2.1.2 Scheduling Techniques Scheduling is a way to arrange the queries or workload in some specific order to decrease their average waiting time and speed up the execution process. The previous research over the workload scheduling techniques is discussed in the coming section. [Surajit et al. 2005, 2007] proposed a framework to stop and restart the queries during their execution to manage the workload efficiently. The restart approach re-executes the stopped query from the position where it was stopped. This technique does not save all the executed information but save only the selected information from the past execution to reduce memory overhead. This method also reduces the running time of re-executed queries. The proposed technique is validated by performing the experiments over the real and benchmark data. The proposed approach is limited to handle single Query Execution Plan (QEP) and does not consider the parallel QEPs. There is no dynamic way in proposed technique to maintain the restart plan during the modification of source records (past executed records). The research discussed in [Chandramouli, (2007, 2007)] for query suspension and resumption with minimum overhead where authors have proposed induction of asynchronous checkpoints for each cardinality in a query. The technique solves the optimization problem by using mixedinteger programming. An optimized plan for suspension is proposed which dumps the current state to disc and going back to previous checkpoint. The optimized plan performs its tasks (suspension or resumption) with less overhead by observing the time constraint during Workload Management through Characterization and Idleness Detction 29 Chapter 2 Related Work suspension. In case of query suspension all the resources are released while in case of resumption the required resources are resumed. The technique is also implemented in a tool named as PREDATOR and proved that this tool has better results than others. The technique is proven experimentally for simple and heavy workload; and it is observed that it meets suspend time constraint; and thereby reducing the overhead. The technique uses hybrid approach for query suspension where suspend time overheads are negligible and due to this better results can be seen. The technique allows to suspend whole query as compared to previous techniques where switching is performed between individual operators. The memory wastage is higher in previous techniques due to switching and shown worse results for unexpected suspend. However the proposed technique does not re-optimize the given query. Memory wastage is less as compare to the previous approaches as this technique allows to suspend the whole query. An external scheduling technique for the OLTP workload is proposed y Schroeder et al. (2006) where they identified main parameters and used feedback controller to select the appropriate MPL automatically that is based on queuing theoretic model and Markov chain. Priorities are also used for the external scheduling. The technique is validated through experiments and observed that external scheduling can be equally effective to internal scheduling when suitable MPL is selected. The research has provided a mechanism that selects a suitable MPL on the basis of two parameters i.e. disc and CPU. The technique does not consider the impact of queries on each other. The low priority transactions are executed only when there are no high priority transactions. Moreover, the paper does not provide a mechanism to give priorities to different transactions. The technique improves the high priority transactions by the factor of 12.1 while the low priority average suffer is about 16%. [Mumtaz et al.,(2008, 2008, 2009)] discussed the impact of query interaction over the workload by using planning experiment and statistical modeling approach. This approach is portable as it works without previous assumption about the internal knowledge of database and cause of query interaction. A framework Query Shuffler (Qshuffler) – Interaction aware query scheduler for workload management for Business Intelligence is introduced and shown in Figure 2.1. Qshuffler considers the impact of queries over each other as it has vital role in performance and proposed an experimental technique to handle query interaction problem. It considers impact of Workload Management through Characterization and Idleness Detction 30 Chapter 2 Related Work queries over each other as it has vital role in performance. Different requests are given by the users in the form of queries; the Qshuffler classifies these queries according to their type and arranges them in a way to minimize the total execution time of the workload. It gives optimal solution for scheduling as it is based on linear programming and is used to minimize the dropping requests using non-preemptive shortest remaining time first scheduling technique. The Qshuffler is implemented in the IBM DB2 and evaluated with the TPC-H benchmark workload. The proposed approach is portable as it works without previous assumption about the internal knowledge of database and the cause of query interaction. It gives four times performance over the FIFO scheduler of database systems. However in Qshuffler, the large jobs have to wait for a long time as it uses SJF algorithm for scheduling. Moreover, the average execution time is larger as it uses non-preemptive SJF approach. Figure 2.1 Query Shuffler (Mumtaz et al., 2009) [Mehta et al. (2009)] define the design criteria to develop a Mixed Workload Scheduler (MWS) and uses it to design rFEED, i.e. MWS that is fair, effective, efficient, and differentiated. They proposed a non-preemptive approach for scheduling since it can be expensive to preempt really small queries that make the bulk of a BI workload. The approach uses optimizer’s estimated execution cost as an approximation as authors think that approximation is sufficient and no need to use precise value. Moreover, a single queue is used for scheduler and multiple queues for execution. They also assumed that all queries have the same normalized service level. The Workload Management through Characterization and Idleness Detction 31 Chapter 2 Related Work authors simulated the real workloads and compare this system with models of the current best of breed commercial systems. The suggested methodology uses non-preemptive scheduling scheme that gives inefficient results for time critical systems due to its poor responsiveness and ultimately there is a chance of starvation and hanging. The authors used the approximate values that never give the actual results. Optimizer’s estimated execution cost is used but in real life approximate values never give the actual results. The proposed methodology assumes the same service level but in real life the workload or queries do not have the same service level. The approach set the MPL statically and does not consider the interaction among the queries in the workload. BI workload manager [Mehta et al. (2008, 2008)] for enterprise data warehouse is introduced in which queries are admitted in the form of batches. It consists of three components which are admission control, scheduler, and execution control. Admission control works on the basis of memory requirement in the form of batches where a batch consists of those queries whose memory requirement is equal to the available memory of the system. For scheduling, the authors proposed that the query with maximum memory requirement will have the highest priority. The manager uses Priority Gradient Multiprogramming (PGM) to prevent underload and overload problems. Finally some experimental results are discussed to evaluate the performance of BI batch manager. Workload is executed in the form of batches to avoid thrashing and provides the optimal solution for all types of workload. This approach does not change in the internals of DBMS to manage small and heavy workload. The manager gets benefit of the added predictability of queries; stabilize the system and less sensitive to estimation errors. It uses feed forward loop that stabilizes the system (from underload or overload) with maximum ability to tolerate the prediction errors. Feedback control technique samples the performance metric and controls the incoming request optimally. When performance metric exceeds from the threshold, the rate of admitting requests reduces and vice versa. It gives high throughput for commercial and enterprise class DBMS; however it has no ability to handle interactive and ad-hoc queries. The sub-batches are created on the basis of one parameter that is memory. As the query with largest memory gets higher priority so the queries with small memory will have to wait for a long time. Due to this reason throughput will decrease and starvation will occur. As the PGM Workload Management through Characterization and Idleness Detction 32 Chapter 2 Related Work executes that query first which requires largest memory so the same problem as with Largest Memory Priority (LMP) will be experienced in PGM. Query Progress Indicators (PI) are used to provides step-by-step status of a query execution. Single Query PI Chaudhuri et al. (2005) is proposed with a graphical user interface for the relational DBMS that keeps the track of work completion and estimates the remaining execution time. The proposed PI starts its working by taking the estimated cost from the query optimizer and calculates the remaining query execution time using statistics. It also monitors the query execution speed continuously and remaining query execution time is estimated by dividing the estimated remaining query cost over the query execution speed. The proposed PI is limited to single query without considering the impact of one query over the other. So during the estimation of the query execution remaining time, the load and query progress is considered in isolation. It often provides inaccurate and wrong estimates. As in most of the concurrent queries execution, one query can significantly slow down the progress of other query. In single query PIs, greedy algorithm is used to speed up the process, where an optimal victim query is selected and the next optimal victim query is chosen. The process continues up to get all the victim queries. Later on, a multi-query PI Luo et al. (2006) has been proposed which represents the progress of running queries and considers the impact of queries on each other as opposite to the previous techniques of single query PIs. The technique of Multi-query PI works by considering the remaining execution time of the concurrent queries with available statistics and predicts the future execution speed of incoming queries. On the basis of estimation, it can also predict the future queries and has the ability to manage the current workload efficiently. The indicator not only provides the visualization of the running queries but helpful to manage the database workload efficiently. This technique takes the workload management problem as an input and provides its solution through the information that is provided by the Multi-query PI. The proposed technique is implemented in POSTGRE SQL and examined with the remaining query execution time. The technique is the first attempt for a multi-query PI. As compare to the single query PIs, the Multi-query PI considers the impact of the concurrent queries over each other and predicts the Workload Management through Characterization and Idleness Detction 33 Chapter 2 Related Work incoming queries with considering the priority and cost. The information provided by these indicators helpful for the workload management tools to take more intelligent decisions. The indicator is able to predict the accurate future queries even initial estimates are wrong by detecting and correcting their estimates. It monitors the system at all the times and manages the workload more dynamically. Multi-query PI is adoptive as it revises its decisions when it found some significant changes as compared to the predicted results. This adaptive behavior of MultiQuery PI shows the consistency with the trends of automatic and AC. DB2 Query Patroller (QP) IBM Corporation (2003), Lightstone et al. (2002) is a management tool, which is used to streamline the requests according to the available resources. It is possible to accept the database workload from user and analyze it. On the basis of analysis it prioritizes and schedules the workload according to different query classes. A class is build by considering cost range and Multi-Programming Level (MPL) threshold. Cost range is provided by the query optimizer that calculates the resource demands. MPL threshold is the maximum number of requests in a class that can execute in one time. Remaining queries are placed in a queue when the threshold level is reached and are placed for execution in a class whenever threshold level goes down. It also gives the information to the user about the status of tasks. QP provides sufficient resources for given workload and by using profile (that is created by administrator) saturations for long terms queries can be avoided. QP controls the flow of requests proactively. It provides the information about the completion of request and finds trends of queries, workload of users as well as the frequently used indexes and tables. QP enhances the performance by monitoring the system utilization, canceling or rescheduling the queries and identifying the trends of database usage. Query submitter uses it to monitor the submitted queries, store query result for future perspective and query submission customization. Submitter assigns the higher priorities to some user so that in the class these queries run with less delay. QP suspends high load queries so that they can be cancelled or scheduled to run after peak hours and track the query process. By performing these steps the smaller queries may not stuck and system recourses are used properly. It is based on client-server architecture and consists of three components. These components are Query Patroller server, Query Patroller Center and Query Patroller command line support. DBA uses QP to assign privileges of resources at user and system level. Workload Management through Characterization and Idleness Detction 34 Chapter 2 Related Work Query Patroller monitors the given workload; perform analysis and prioritize it schedules for the incoming requests from the users. It limits the flow of long running queries to avoid saturation and ensures better resource utilization on the basis of profile (created by the administrator). In Oracle, the automatic SQL tuning is performed through the query optimizer and SQL Tuning Advisor Dageville et al. (2004). Query optimization has great importance especially in case of complex workload. The SQL Tuning Advisor is an interface between the query optimizer and user. It generates tuning recommendations for the SQL statements or workload. These recommendations are provided to user, who either select or reject. When user selects the recommendations, these will be stored in the SQL profile that is further utilized by the optimizer for generation of best Query Execution Plan (QEP). SQL Tuning Advisor makes decisions on the basis of information that is provided by the query optimizer, Automatic Database Diagnostic Monitor (ADDM) and Automatic Workload Repository (AWR). The approach set the MPL statically and does not consider the interaction among different queries. The SQL Tuning Advisor improves the execution plans through SQL Profiling concept; and on the basis of costbased access paths and what-if analysis it generates tuning recommendations for the incoming workload. It has a very strong analysis capability where it performs a number of analyses such as estimate, access, parameter setting, statistical and SQL structure analysis. Whenever query optimizer fails due to heavy or complex workload, it assists by stabilizing the system and generating the QEPs. Oracle Database Resource Manager (ODRM) by Rhee et al. (2001) allows DBA to logically divide the workload into distinct units and allocates CPU resources to these units without extra overhead. During the peak hours OLTP workload should be given preference over the DSS workload and vice versa. ODRM has scheduling mechanism that is used for fixed time interval and controls the number of active sessions executing at a time. When the available slots for active sessions are filled with new sessions, the remaining sessions will be queued until some slot is available. Through ODRM, administrator can define and set the scheduling policies for the workload based on the predicted execution time of a query. The major components of the ODRM are Resource Consumer Group, Resource Plan and Resource Plan directive that interact to achieve the objective. Workload Management through Characterization and Idleness Detction 35 Chapter 2 Related Work [Krompass et al. 2007, 2008, 2009] introduced an adaptive QoS management technique, in which they used economic model to handle individual request of BI and OLTP workload proactively. They provide a systematic way to arrange different requests by dividing these into different classes based on cost and time limit. They also proposed a model which calculates the cost of a request by differentiating underachieving and marginal gains of a Service Level Objectives (SLOs) threshold. The framework is evaluated to observe its effectiveness by performing experiments on different workloads. The framework is scalable as it can implement the new workload management concepts with already previously implemented policies. The framework uses economic model with two economic cost functions (Opportunity Cost, Marginal Gains), where penalty information is added with the queries. The penalty information is used to make the efficient order of pending query execution. The scheduling policy used for OLTP workload in this framework is enhanced by considering the combine effect of priorities and SLOs rather than considering merely priority. A Priority Adaptation Query Resource Scheduling (PAQRS) algorithm is devised by [Pang et al. (1995)] that is based on the Priority Memory Management (PMM) algorithm and deals with multi-class query workload. The algorithm reduces the missed deadlines according to the miss distribution defined by the administrator. It works by allocating memory and assign priorities by considering the resource usage, workload characteristics, and performance experienced. Whenever the workload change, new MPL is calculated, memory reallocation and priority adjustment is done accordingly. Two techniques, the miss ratio projection and resource utilization heuristics are used to calculate the new MPL. In case of miss ratio projection method, previous MPL and miss ratio are used as the parameters. It is used to schedule the complex type of workload and reduces the number of missed deadline thereby making the efficient use of system resources. It has bias control mechanism, which regulate the distribution of missed deadlines among different query classes. The MPL and memory is allocated on the basis of regular and reserve group quota. The priority of regular queries is higher than reserve queries. By adopting this policy, PAQRS make adjustments between the miss ratio and the target distribution. However, it cannot handle the heterogeneous transactions and is limited to the workload consisting of homogenous queries only. Multiple service classes are also not considered that have diverse goals. Its performance degrades whenever there is increase in the Workload Management through Characterization and Idleness Detction 36 Chapter 2 Related Work workload fluctuation. The research work also suggests how to adjust the policy threshold but there is no discussion about the automatic policy updations with respect to the changing goals and objectives. A framework for workload adaptation by Niu et al. (2006) is proposed that has two components, i.e. workload detection and workload control. The workload detection finds the changes and provides information of the workload. The framework has also four functional components, namely workload characterization, performance modeling, system monitoring and workload control. The effectiveness of the framework is proven by implementing the query scheduler through a large number of experiments. Query scheduler can directly handle OLAP workload where as OLTP workload cannot be handled directly. They proposed a technique by using indirect control of the OLTP through directly controlling OLAP workload. So due to this enhancement in query scheduler both workload can be handled to achieve the performance goals. Performance is improved by prediction process using Kalman filter as performance prediction plays vital role in workload adaptation. Kalman filter is very powerful tool that is used to make the estimation of past, present and future states. The filter provides an optimal solution for linear processes and sub-optimal solutions for the non-linear processes. Through experiments, more accurate prediction results are obtained with less unpredicted SLO violations. In short, the research contribute by designing a general framework for performance oriented workload adaptation, prototype implementation of framework (Query Scheduler), a cost-based performance model for the workload control plans and improves the accuracy of the prediction through Kalman filter. The experiments on the Query Scheduler are performed on the stable workload, which is not representing the real environment as in actual the database workload changes rapidly. During the experiments, total cost of a query instead of detailed cost is used as a parameter that may leads to wrong results. Moreover, the scheduler is confined to the linear workload however, in a real environment most of the time database workload behavior remains non-linear. Summarized analyses of the database workload management techniques in DBMSs are presented in Table 2.2 (where the last column “Imp/ Exp” represents whether the particular research is experimented or implemented in some database management system). Workload Management through Characterization and Idleness Detction 37 Chapter 2 Related Work Table 2.2 Technique Surajit et al. 2005, 2007 Chandramouli et al. (2007, 2007) Schroeder et al. (2006) Mumtaz et al. (2003, 2008, 2009) Mehta et al. (2009) Analysis of the Workload Management Techniques in DBMSs Purpose A framework to stop and restart the queries during their execution to manage the workload efficiently. Query suspension and resumption is discussed where asynchronous checkpoints are used for all the cardinalities in a query. A technique introduced for external scheduling to handle the OLTP workload. Schedule the workload by considering the impact of the queries over each other. A design criterion for a MWS is presented with simulation results. Dageville et al. (2004) A workload management technique to execute workload in batches. Display the step by step status of query execution. The research provides a way to manage the workload by providing the step by step status of query execution with the consideration of the impact in concurrent queries. Prioritize and schedule the workload according to query classes. Tunes the process of QEP and provides the recommendations for tuning. Rhee et al. (2001) Logically divides the workload into small parts and allocates the resources Mehta et al. (2008, 2008) Luo et al. (2004) Chaudhuri et al. (2005) Lightstone et al. (2002) Krompass et al. (2007, 2008, 2009) Pang et al. (1995) An adaptive QoS management technique to handle the BI and OLTP workload. Proposed an algorithm that reduces the missed deadlines, allocates memory & assigns privileges. W Type/ Benchmark Static/ Dynamic Algorithm/ Model Imp/ Exp TPC-H Dynamic Heuristic Hash Join Imp TPC-H Dynamic Mixed-Integer Programming Imp OLTP/ TPCC Dynamic TPC-H Dynamic Queuing Theoretic and Markov Chain Model Statistical Modeling (Linear & Quadratic Models, Regression Trees) Smallest Job First Exp Imp Dynamic Static (MPL Selection) Non-preemptive Priority scheduling - BI/ OLAP TPC-H Dynamic PGM - TPC-R Dynamic TPC-R Dynamic OLTP TPC-C OLTP/ TPCC BI/ OLAP OLTP/ TPCC DSS/ TPCH Greedy Algorithm Greedy Algorithm Imp Imp Dynamic Administrator assigns privileges at user & system level Imp Dynamic What-if Model, Cost Based Model Imp OLTP/ TPCC DSS/TPC-H - DBA divides workload & allocates resources to these units. Imp OLTP OLAP Dynamic - - - Dynamic PMM Algorithm - Workload Management through Characterization and Idleness Detction 38 Chapter 2 Related Work Niu et al. (2006) The research discussed the performance oriented workload adaptation framework. OLTP/ TPCC BI/ OLAP Dynamic Kalman Filter Queuing Model Proto Imp 2.1.3 Limitations in Previous Research Work In the above sections we have discussed the research that has been conducted to manage the workload with various approaches. Each paragraph elaborates the summary of the particular research and the critical review. The summary part discusses the approach, workload and the environment in which the research is carried out while the critical review part list down the problems that occurs. In the following section, there is discussion about some common problems that have been identified in the previous workload management techniques. Previous workload characterization and scheduling techniques did not identify the effective workload parameters. Most of the techniques are based on the assumptions or taking values from the query optimizers rather than calculating their actual values. However, in reality there is a large difference between the guessed and actual values. None of the previous characterization technique for the workload management in DBMSs identified the OLTP or DSS percentage in the workload using Fuzzy Logic. There is a need of workload management approach that can manage the database workload without affecting other requests, efficient resources utilization and handle all other related matters. The work is followed by testing the proposed research over the various real-time and benchmark workloads. Summary In this chapter, the problem of workload management and its complexity were reviewed. The basic methodologies, frameworks and algorithms that have been used for workload management were analyzed to find main ways, which can reduce the complexity and manage workload in a better way. The summary of the each research provided and critically analyzed to find their strengths and weaknesses. The results gained by studying and analyzing the related works are Workload Management through Characterization and Idleness Detction 39 Chapter 2 Related Work also discussed. At the end of each section, a table is given which summarizes the previous research that has been done in the context of workload management. Workload Management through Characterization and Idleness Detction 40 Chapter 3: Research Methodology Chapter 3 Research Methodology This chapter is dedicated to provide a general idea of the proposed solution to manage the database workload. The chapter starts with providing an overview of the problem and discusses the research methodology to achieve the proposed workload management task through characterization, scheduling and idleness detection. In fact, there is explanation about the road map and steps that will be carried out to achieve the specified goals of the research. Finally, there is explanation about the experimental design including hardware and software detail, workloads, performance metrics and the evaluation methods. 3.1 Research Problem Overview Several database researchers, practitioners and vendors are working to manage the workload in DBMSs using different techniques. However, there are certain problems with the previous workload management techniques. The techniques that are based on the decision tree leads to some problems. First, handling of decision tree becomes difficult with the size of the decision tree and secondly, in such cases tree pruning is performed to get the result that may lead to wrong results and lastly difficult to set up in such circumstances. The research performed on the basis of status variables have the problems such as mostly techniques are taking workload parameter values on assumption basis or from the query optimizers rather than calculating their actual values. However, in reality there is a large difference between the guessed and actual values. The other problem with these techniques is that values of status variable are recorded during the workload execution that also slows down the process. These techniques are using the status variables to mange the workload but there values are taken during or after the workload execution. Due to this reason, these techniques are only useful to identify the workload type and infeasible for workload management task. Workload Management through Characterization and Idleness Detection 42 Chapter 3 Research Methodology 3.2 Proposed Solution The proposed solution to manage the workload in DBMS consists of three modules which are Workload Characterization, Scheduler and Idleness Detection as shown in Figure 3.1. Responsibility of the characterization module is to divide the workload into OLTP and DSS after identifying the effective workload parameters and passes its output that is workload parameter’s values to the next module. The scheduler module after getting the parameters values will calculate the impact of each workload and schedule them. Finally, the idleness detection module will find the CPU idleness for the particular workload and assign the idleness period to some other suspended process. These are the abstracted steps and next chapter are dedicated to explain these steps in detail. Figure 3.1 Components of Workload Management Framework 3.2.1 Characterization The responsibility of the characterization module is to analyze the incoming workload to get the effective parameters which are vital to know the workload type. The workload will be observed on the query level with respect to the workload parameters such as type, cost, priority and dependency (with other queries) as shown in Figure 3.2. Steps to achieve the workload characterization are explained as follows: 1. Incoming database workload is first of all divided into two types (OLTP and DSS) by identifying the key workload parameters. The parameters identification will be performed by analyzing the workload parameters and their respective values that change time to Workload Management through Characterization and Idleness Detction 43 Chapter 3 Research Methodology time with the execution of workload. After careful analysis, useful and effective workload parameters will be extracted. 2. During the identification and classification of the workload, it is necessary to consider the tradeoff of all workload parameters. The DSS type of workload needs an immediate response as usually these are generated by the higher management while on the other hand these requests are very large and requires many resources. OLTP transactions are generated through the lower management but very small and require minimum resources. Similar trade off should be analyzed in other workload related attributes. Previous research related with workload classification over the MySql is discussed in [Zewdu et al. (2009)] where the workload parameters used include Com_ratio, nnodb_log_writes, Qcache_hits and Questions. Their research work identified the workload type on the basis of these parameters. 3. Some AI and machine learning technique such as Case Base reasoning (CBR), Bayesian Theorem, clustering algorithm etc will be used for workload classification. 4. Before applying the technique, training data will be prepared that will provide the basis and helpful to make decision about the workload type. Training data will be constructed by executing the database workload and storing its associated workload parameter’s values with a specific interval. The suitable time interval for snapshot will be taken after the experiments. 5. Impact of the workload will be calculated after the distinction of workload into OLTP and DSS. 6. Output of this module will be the workload type and the parameter’s values. Workload Management through Characterization and Idleness Detction 44 Chapter 3 Research Methodology Figure 3.2 Steps of Characterization Module 3.2.2 Scheduler 1. The Scheduler will take workload and related parameters values. Impact of the queries will be calculated on the basis of parameters and workload type as identified in the workload characterization module. 2. We will use the fuzzy logic for scheduling the workload that will handle the workload dynamically. In Boolean logic each decision has two outputs true or false however in fuzzy logic truth value has a range starting from 0 to 1. So the fuzzy logic works on the concept of partial truth, where the truth value may range between completely true and completely false. 3. Steps of the fuzzy based scheduler are as follows: a. Fuzzification In this step crisp input values are evaluated by applying the membership functions that returns a value in between 0 and 1. Membership function must be simple and easily convertible to fuzzy set. There are three methods for this purpose which are Gaussian, triangular, trapezoidal. Input value is in fuzzy set when the value received from the membership function is closer to 1. Workload Management through Characterization and Idleness Detction 45 Chapter 3 Research Methodology b. Rule Evaluation The evaluated output received from the fuzzification step is checked through fuzzy rules which portray the expert decision and written in the form of fuzzy words. c. Aggregation For each output variable, the aggregation step performs the conversion of result into a fuzzy set. d. Defuzzification At the end fuzzified rules are used to produce the final output by changing the fuzzy set into output values. Usually two methods i.e. Maximum method and Centroid methods are used. 3.2.3 Idleness Detection & Exploitation 1. There will be a Case Base Reasoning (CBR) that contains training data on the basis of the workload parameters such as Workload id, Number of OLTP queries, Number of DSS queries, Execution time, Response Time, Memory, Disk I/Os, Communication cost etc. Training data will be developed by executing various workloads, extracting parameter’s values, identifying CPU idleness and storing parameters with CPU idleness into the CBR for future executable workloads. 2. The idleness of the incoming workload will be identified by searching the exact or nearest (almost equal) workload in the CBR. a. After getting the exact or nearest match, idleness of that workload will be taken from the CBR. b. However if the match for the particular workload is not found in the CBR then the workload will execute without taking the idleness for the first time and after its successful execution it will be stored as a new case into the CBR with its associated idleness. Workload Management through Characterization and Idleness Detction 46 Chapter 3 Research Methodology 3. After the detection of CPU idleness, the time period will be used for some other processes such as defragmentation, index rebuilding etc and also used to improve the scheduling. 4. The defragmentation or other resumed process will remain in execution up to the detected idleness time. 5. After idleness time is finished, the de-fragmentation or other process may complete its execution otherwise again temporarily stop. 3.3 Experimental Setup & Results Validation As the experimental design and evaluation has vital role in proving the proposed research, therefore here we are discussing the experimental design, Database Management System, performance metrics and the evaluation methods. 3.3.1 Experimental Setup Following hardware will be required to build the experimental setup for experimentation and validation of the proposed solution. i. Database Server (01) ii. Backup Server (01) iii. Personal Computers (06) iv. Switch (01) All the hardware devices will be connected through a network as shown in Figure 3.3. Workload Management through Characterization and Idleness Detction 47 Chapter 3 Research Methodology Figure 3.3 Experimental Setup for the Proposed Solution 3.3.2 Database Management System We have selected the MySql Database Management System to execute various workloads and validate the proposed technique. The selection of MySql database management system is due to its freely availability and open source feature, i.e. source code is available which can be further analyzed and modified accordingly. MySql Server will be installed on server machine and MySql client on client machine. The main database will be stored at the server machine while backup server machine will be responsible to maintain the backup of the database. Each client machine will have the ability to generate thousands of queries or workload at a time. The workload will be generated via threads and sent to the database server for execution. There will also be a database on the server as well as backup machine that will contain various OLTP and DSS queries which will be taken arbitrarily by the threads that will generated from client to form a workload. 3.3.3 Proposed Technique Scenario Proposed solution for workload management in DBMS can be achieved by performing the following steps: a. Workload is initiated through threads from the client machines and sent to the database server for execution. A client machine has the ability to initiate Workload Management through Characterization and Idleness Detction 48 Chapter 3 Research Methodology thousands of requests at a time to the database server. The workload consists of the OLTP and DSS like queries that are stored in the database which is stored at the database server. b. In the workload management technique, first of all it is taken by the characterization module which will partition the workload and observe it for further analysis by taking and storing the snapshots. c. After recording and analyzing the snapshots, effective workload parameters will be identified and extracted which will be used to classify the workload into two types of workload, i.e. OLTP and DSS. d. Once the workload is classified, the OLTP and DSS percentage of the workload will be calculated to arrange it for better execution. e. Next step is to find out the DBMS idleness for that workload. The DBMS idleness will be detected and some other system process (background process such as Defragmentation, index rebuild etc) or user oriented process will be scheduled with the workload for execution. f. The resumed processes will execute until the idleness period is finished. The restarted process will finish or stop once again after the completion of identified idleness period. 3.3.4 Evaluation Metrics The proposed solution to manage the workload will be compared with the other wellknown techniques. The results obtained from the proposed approach will be compared with the First In First Out (FIFO), Priority Based (PB) and Smallest Job First (SJF) approach. The comparison will show how much the proposed solution is better than the previous techniques with respect to the workload importance and average waiting time. The proposed Idleness technique will be validated by observing the CPU utilization that will be recorded by executing Workload Management through Characterization and Idleness Detction 49 Chapter 3 Research Methodology the workload in MySql environment and taking their snapshots. After the execution of workload in MySql, same workload will be executed through the proposed technique. Comparison of these techniques will be made and discussed that will show the effectiveness of the proposed technique. All the comparisons will be presented through the graphs that will show how much the proposed solution for the workload management is effective. Summary The chapter starts with providing an overview of the research problem. The steps to achieve the proposed research for the workload management in DBMSs are briefly explained. The proposed idea for the workload management in DBMss consists of three main modules which are Characterization, Scheduler and Idleness Detection. The abstract level steps of these three modules are explained in this chapter. As the experimental design play main role to validate the research, therefore experimental setup is discussed that includes experimental environment (hardware, DBMS, workloads) and the evaluation method to validate the proposed research. Workload Management through Characterization and Idleness Detction 50 Chapter 4: Workload Characterization Chapter 4 Workload Characterization The chapter starts with providing an overview of the workload and its complexity; problems in the previous research and proposed workload management solution that further consist of the workload characterization, scheduling and idleness detection modules. This chapter is dedicated to provide the discussion about the proposed workload characterization approach. It identifies the workload classes by partitioning the workload into smaller sets composed of transactions or requests. The workload related parameters are identified after executing the workload and analyzing the changes in the workload parameters. The module is validated through various experiments over the real-time and benchmark workloads that consist of OLTP and DSS like queries. 4.1 Introduction and Background Complexity in Database Management System (DBMS) increases due to the various factors which are functionality demands from the users, complex data types, diverse workload and huge data volume that is increasing with the passage of time. These factors cause brittleness and unmanageability in DBMSs. Moreover, today’s DBMSs have also to work as a Data Warehouse (DW), i.e. providing summarized and consolidated reports; heterogeneous, complex and ad-hoc workloads that involve number of aggregations, table joins, sorts and concurrent I/O’s. To handle these problems organizations hire number of expert Database Administrators (DBA) and spending lot of money to get expected improvement, throughput and response. Usually DBA have to take care of all the database tasks such as making policy for workload priorities, memory and storage configuration and such other tasks. The cost of hardware is decreasing but the cost of management is increasing. Performing workload management activities manually, by hiring experts causes increase in TCO [Huebscher et al. (2008), IBM Corporation (2003)]. Moreover with the advent of distributed systems and Data Warehouses (DWs), it became difficult and even some cases impossible for DBA to manually organize, optimize and configure all the database tasks. For efficient workload management, the other approach is to stop the workload for a while and resumed later on. However, when queries will Workload Management through Characterization and Idleness Detection 52 Chapter 4 Workload Characterization be stopped during their execution then executed part will be stored for a while in memory and used later on. Style of the database workload has been changed from the offline analysis to online adaptation. In workload management there are three units which are workload, resources and objectives. All three workload units are co-related with each other. The workload uses some resources to meet the objectives of an organization or we can say resources are allocated through different approaches to workload which has some management objectives. Workload has evolved through three phases which are capacity planning, resource sharing and performance oriented workload [Niu et al. (2006)]. Capacity planning workload management is based on cost sharing; the idea behind the resource oriented approach is maximum resource utilization while in case of performance oriented approach the focus is on the business goals and objectives. In workload management, the main functions are workload frequency patterns, composition, intensity and required resources. Identification of the workload type is not an easy task as workload size and type changes at the different spans of the day, week and months. For example, the stock exchange experiences more DSS workload than OLTP workload at the start of the day, turns to more OLTP than DSS during mid of the day and finally once again tends to DSS overnight to observe the whole day analysis and reporting. In general, workload is detected through two methods that are performance monitoring and characterization. Former is the reactive approach that takes action when the performance has been degraded while the later is proactive approach that works by tracking the workload changes. 4.2 Problems in Previous Workload Management Techniques Database workload consists of SQL statements, transactions, scripts, commands, sessions and Jobs. From the last many decays, different researchers, practitioners and vendors have been working on the workload management in DBMSs and DWs. In this regard, various techniques have been introduced to manage the workload that includes Linear Programming, Mixed-Integer Programming, Queuing Theoretic Model, Largest Memory Priority (LMP), Priority Gradient Workload Management through Characterization and Idleness Detction 53 Chapter 4 Workload Characterization Multiprogramming (PGM), and Shortest Job First (SJF) algorithm. However, these techniques have certain limitations: the workload optimization techniques that are using non-preemptive scheduling scheme provide inefficient results due to their poor responsiveness that may lead to starvation and hanging; most of the previous techniques use approximate values rather than calculating actual values; considering all the queries at same level; dividing the workload into batches or taking multiprogramming (MPL) level on the basis of single or two parameters. By taking the approximate values, considering all queries at same level, avoiding query interaction and fixed query types lead to inefficient results. Dividing batches or MPL setting on the basis of single parameter is not a sufficient criterion to get sufficient performance. 4.3 Proposed Workload Management Solution As discussed in previous chapter that the proposed solution for the workload management in DBMSs consists of three main modules. These are the Workload Characterization, Scheduler and Idleness Detector module and now we are discussing these in detail. 4.3.1 Workload Characterization The incoming workload is characterized and classified as OLTP and DSS on the basis of key attributes of the workload. An overview of the proposed approach is shown in Figure 4.2. For this purpose, we have developed a Case Base Reasoning (CBR) that contains the effective workload parameters with their associative workload type. To develop the training data, these effective workload parameters are identified with their associated workload type. The proposed solution works by extracting the above parameters for each query of the incoming workload and values of these parameters are searched in the CBR. When exact or nearest match against the parameters is found then the workload type of the searched case is taken. After the identification of the workload type it is stored in the CBR as a new case for future use. These steps are explained in the chapter later. Workload Management through Characterization and Idleness Detction 54 Chapter 4 Workload Characterization Figure 4.1 Steps to Characterize the DBMS/DW Workload The proposed technique for the characterization of database workload works as the user writes the workload or query in the workload editor. The characterizing process works in parallel with the user input to the workload editor. By adopting this approach, the workload is characterized and classified before the submission of the workload to DBMS. In previous characterization techniques, the characterizing process starts after the submission of the workload to DBMS. In contrast to the previous approaches, we characterized the workload in advance. Due to this reason, the proposed CBR workload characterization saves the precious execution time that ultimately improves the efficiency of DBMS. Detailed steps of the proposed workload characterization are: 4.3.1.1 Key Attributes Selection In MySQL 5.1, there are 291 status variables which are used to distinguish the different operations. The values of these status variables change according to the specific operation with the experience of some transactions or workload. However, all status variables are not used to distinguish the workload type, i.e. OLTP or DSS. After experimenting and analysis over the various OLTP and DSS like workloads, 10 status variables are selected at the initial stage that were used to distinguish the workload type. These status variables are Com_ratio, Innodb_log_writes, key_reads, key_read_rquests, key_writes, key_write_requests, opened_files and Qcache_hits, Questions, sort_rows and table_locks_immediate. Among these four status Workload Management through Characterization and Idleness Detction 55 Chapter 4 Workload Characterization variables (Com_ratio, Innodb_log_writes, Qcache_hits and Questions) are those which were already used by [Zewdu et al. (2009)] to classify the database workload. Once the key status variables are selected, training data for CBR is constructed that includes these selected status variables with their values and particular workload type. After the various experiments over OLTP and DSS like workloads, 7 status variables among the 10 are eliminated as their values were not affecting the workload type too much and took only 3 of them which are vital in distinguishing the workload type. These 3 selected status variables are key_reads, key_writes and table_locks. In the following section, these 3 selected status variables are discussed with their recoded values after the execution of the OLTP and DSS like workloads in isolation. These values are recorded with an interval of 5 seconds and also represented through graphs for better understanding. To take the values of status variables, snapshots are taken with an interval of 5 seconds during the workload execution. Finally, the recorded values before execution are subtracted from the values after execution in order to take the actual effect of each status variable. After executing various OLTP and DSS like workloads and observing their associative snapshots, we have selected the 5 seconds as the snapshot interval as in average at least one of the OLTP or DSS statement gets its completion. Key_Writes: This variable is dedicated to represent the number of key blocks to disk. The value of this variable remains very small in the DSS workload as compared to the OLTP workload. As the workload starts its execution, the value of this variable increases but this increase always remains maximum up to 0.2 in DSS types of workloads. However, in case of the OLTP workload the value of key_writes increases with a very high rate as compare to DSS and its bench mark value is up to 1. The consecutive 30 snapshots are taken and shown in Figure 4.3. The graph represents a major difference between the OLTP and DSS workloads as the value of Key_writes increases with the execution of the OLTP workload but remain constant or little change during the DSS types of workloads. Workload Management through Characterization and Idleness Detction 56 Chapter 4 Workload Characterization Figure 4.2 OLTP and DSS workload execution with respect to Key_writes Key_reads: This status variable represents the number of physical reads of a key block from the disk. The value of the variable remains unchanged during the execution of the OLTP workload; however its value increases rapidly in DSS workload as shown in Figure 4.4, while value of key_reads remains constant or very little change is seen during the OLTP type of workload. Figure 4.3 OLTP and DSS workload execution with respect to Key_reads Table_locks_immediate: This variable represents the number of times that a request for a table lock could be allowed immediately. This variable has a little bit similar behavior to Key_reads status variable, as its value increases rapidly during the execution of the DSS workloads while a very small increase has been observed during the OLTP workloads as shown in Figure 4.5. The reason of more locks in the DSS workload is due to the involvement of more number of tables and long term transactions as compared to the OLTP workload. Workload Management through Characterization and Idleness Detction 57 Chapter 4 Workload Characterization Figure 4.4 OLTP and DSS workload execution with respect to Table_lock_immediate After the selection of these 3 status variables, their recorded values are made concise and simple by converting them into the range of [0, 1]. The conversion is made by performing the following steps: a. Find the final value (or value of the last snapshot) of each individual status variable. b. Divide all the recorded snapshot values of that particular status variable by its maximum or final value as identified in step a. 4.3.1.2 CBR Development After the selection of key status variables, the sample data is constructed for CBR that is taken by the execution of workload that is generated by executing the OLTP and DSS like workloads. Both the benchmarks are used to represent the OLTP and DSS workloads. Raw Snapshots are recorded against the three selected variables with their workload type during the execution of workloads. These values are taken with the interval of 5 seconds. Average of each selected parameter is calculated after getting their values. We standardized the collected data through Z score and considered all the parameters have same weight or influence. Z score is the measure that converts the values into the range of [0, 1]. Finally normalized data is used to construct the CBR as shown in Table 4.1. The values represented in above graphs can also be converted in the range of 0 to 1 by dividing each value of the column to the highest value of that column. For example, in first column the maximum value is 1406831 that will be divided to each value of the column to convert all the values in the range of [0, 1]. Workload Management through Characterization and Idleness Detction 58 Chapter 4 Workload Characterization Table 4.1 CBR with Sample Data Key_reads Key_Writes 0 11312 105976 0 180164 121 1406831 0 234 0 . . . . . 5382 82 0 61754 65 102198 0 156979 221727 283629 . . . . . Table_locks_ Workload immediate Type 39 66 78 41 78 42 98 48 49 50 . . . . . OLTP DSS DSS OLTP DSS OLTP DSS OLTP OLTP OLTP . . . . . There are four steps of the proposed database workload characterization technique to find the particular workload type as shown in Figure 4.5. These steps include Retrieve, Reuse, Revise and Retain. Retrieve It is the main step in the CBR where matching between incoming and stored cases is performed. It is also important in the sense that missing similar case can lead towards inaccurate case that ultimately makes wrong decisions. This is also important as the accuracy depend on the number of cases stored and matching cases retrieval. The similarity of cases can be determined through various techniques such as Euclidean distance, similarity matrices etc. The retrieval is performed from the problem description and similar cases are found. The process of retrieval is further divided into identification of the features, initially match, search, and selection of case in general. In our case, as shown in Figure 4.6 after arrival of the workload to the DBMS, values of the workload related parameters are taken. In CBR, retrieval phase takes much of the time in spending the process of searching. Workload Management through Characterization and Idleness Detction 59 Chapter 4 Workload Characterization Here, we have adopted the Euclidean distance method [Deza et al. (2009), Danielsson (1980)] to find the matched cases from CBR. In this method, distance between two points with co-ordinates (p1, p2) and (q1, q2) is calculated as: However, if we have to find the distance between two points which have more than two co-ordinates then the formula for Euclidean distance is computed as the root of square differences between coordinates of a pair of objects. For example, we have p = (p1, p2, ..., pn) and q = (q1, q2, ..., qn). In this case the distance from p to q or q to p is: The Euclidean distance method is selected here to identify the matched cases from the CBR as it is simple, faster and based on the actual ratings comparison. Figure 4.5 Steps to Characterize and Classify the Workload via CBR Workload Management through Characterization and Idleness Detction 60 Chapter 4 Workload Characterization Example 1 Let’s DBMS encounter a new workload with 117,102099 and 45 values against the three status variables. Now, we have to find that it matches which of the already stored cases. In this case first of all, the values of the status variables shown in Table 4.1 and new workload are converted into the range of [0, 1]. The CBR values are converted by dividing all three values of each row to the row’s highest value and status variables values of new workload are divided by 102099. The converted values of the new workload will become 0.0011, 1.000, 0.0004 respectively and the CBR records are shown in Table 4.2. Now, difference of the new workload with all the records of CBR is calculated using above Euclidean Distance formula. The workload type of the new workload is taken from the row/ record that have minimum distance value. In this example, minimum value of the Euclidean distance is 0.0001 (row 6, which is also highlighted) and the workload type in the row is OLTP. Therefore, the workload type of the incoming workload is OLTP. Table 4.2 Euclidean Difference Calculation for Incoming Workload (Example 1) New Workload (in the range of [0, 1]) CBR Data (In the range of [0, 1]) Workload Type Key_reads Key_Writes Table_locks Euclidean Difference Key_reads Key_Writes Table_locks _immediate 0.0000 1.0000 0.0072 OLTP 0.0011 1.0000 0.0004 0.0069 1.0000 0.0072 0.0058 DSS 0.0011 1.0000 0.0004 1.408 1.0000 0.0000 0.0007 DSS 0.0011 1.0000 0.0004 1.413 0.0000 1.0000 0.0007 OLTP 0.0011 1.0000 0.0004 0.0011 1.0000 0.0004 0.0004 DSS 0.0011 1.0000 0.0004 1.413 0.0012 1.0000 0.0004 OLTP 0.0011 1.0000 0.0004 0.0001 1.0000 0.0000 0.0001 DSS 0.0011 1.0000 0.0004 1.413 0.0000 1.0000 0.0003 OLTP 0.0011 1.0000 0.0004 0.0011 0.0011 1.0000 0.0002 OLTP 0.0011 1.0000 0.0004 0.0002 0.0000 1.0000 0.0002 OLTP 0.0011 1.0000 0.0004 0.0011 _immediate Example 2 In second example the incoming workload have the values 324563, 98 and 34 against three status variables respectively. Again in this case, values of the status variables as shown in Table 4.2 and incoming workload are converted into the range of [0, 1]. The converted values of the incoming workload will become 1.0000, 0.0003, 0.0001 respectively and the CBR records are Workload Management through Characterization and Idleness Detction 61 Chapter 4 Workload Characterization shown in Table 4.2. Now, difference of the new workload with all the records of CBR is calculated using above Euclidean Distance formula. The workload type of the incoming workload is taken from the row/ record that have minimum distance value. In this example, minimum value of the Euclidean distance is 0.00030398 (row 7, which is also highlighted). Therefore, the workload type of the incoming workload is DSS. Table 4.3 Euclidean Difference Calculation for Incoming Workload (Example 2) New Workload (in the range of [0, 1]) CBR Data (In the range of [0, 1]) Euclidean Difference Key_Write Table_locks s _immediate Euclidean Difference 1.0000 0.0003 0.0001 1.41401811 DSS 1.0000 0.0003 0.0001 0.00900505 0.0007 DSS 1.0000 0.0003 0.0001 0.00069976 1.0000 0.0007 OLTP 1.0000 0.0003 0.0001 1.41400018 1.0000 0.0004 0.0004 DSS 1.0000 0.0003 0.0001 0.00033342 0.0012 1.0000 0.0004 OLTP 1.0000 0.0003 0.0001 1.41316303 1.0000 0.0000 0.0001 DSS 1.0000 0.0003 0.0001 0.00030398 0.0000 1.0000 0.0003 OLTP 1.0000 0.0003 0.0001 1.41400009 0.0011 1.0000 0.0002 OLTP 1.0000 0.0003 0.0001 1.41325391 0.0000 1.0000 0.0002 OLTP 1.0000 0.0003 0.0001 1.41400007 Key_reads Key_Writes Table_locks _immediate Workload Type Key_reads 0.0000 1.0000 0.0072 OLTP 1.0000 0.0072 0.0058 1.0000 0.0000 0.0000 We have explained the method by which the nearest matching records from the CBR is retrieved. However, a problem with the above method is that the search algorithm takes too much time specially when the size of CBR is large. The complexity of above search Algorithm is O(n) as Euclidean distance is calculated for all the records of the CBR. In order to minimize the search time, Hash search technique is adopted. Hashing is the fastest search technique that takes just unit time in searching process, i.e. O (1) when there is no hash collision. The nearest searched case are determined by taking the Hash value from the selected parameters of the workload. In the preparation of training data, the hash value for each row is determined that is assigned from the selected workload parameter’s values. The Hash value for each row consists of four digits that are taken by selecting the number of digits from Workload Management through Characterization and Idleness Detction 62 Chapter 4 Workload Characterization the first two columns (i.e. Key_reads and Key_writes) and value obtained by Table_locks_immediate Mod 100. H (Key) = Number of Digits (Key_reads) + Number of Digits (Key_writes) + Table_locks_immediate Mod 100 By adopting this scheme, in case of Table 4.1 first row will have the Hash value 1439 (1 digit in 0, 4 digits in 5382 and 39 from 39). The same strategy is adopted to calculate the hash value for the remaining rows and now their hash values are 5266, 6178, 1541, 6278, 3642, 7198, 1648, 3649 and 1650. These Hash values are stored with their associative workload type into the CBR for future use as shown in Table 4.4. Table 4.4 CBR Data with Hash Values H (key) Key_reads Key_Writes 1439 5266 6178 1541 6278 3642 7198 1648 3649 1650 . . . . . 0 11312 105976 0 180164 121 1406831 0 234 0 . . . . . 5382 82 0 61754 65 102198 0 156979 221727 283629 . . . . . Table_locks_ Workload immediate Type 39 66 78 41 78 42 98 48 49 50 . . . . . OLTP DSS DSS OLTP DSS OLTP DSS OLTP OLTP OLTP . . . . . In case of example 1 where the workload to be searched has the parameter values 117,102099 and 45, first of all the hash value is calculated i.e. 3645. There is no exact matching record of 3645, therefore most similar records are identified from the CBR which are 3642 and 3649. The values of these two selected rows and the incoming workload are converted into the range of [0, 1] and Euclidean distance is calculated as discussed previously. In second example, the parameter values are 324563, 98 and 34 and its associated hash value will be 6234. Again, Workload Management through Characterization and Idleness Detction 63 Chapter 4 Workload Characterization the exact match is not there while the nearest hash values are 6278 and 6178. Therefore, the associated workload type is assigned to the current workload. Reuse The Reuse step starts after the completion of the Retrieve and can achieve by performing two sub-steps. First, finding the differences among the current and the retrieved case; and second identification of the portion of the retrieved case that must be stored as a new case into the CBR for future use. The reuse step proposes the workload type from the solution that retrieved in previous step. Reusing a retrieved cases can be as easy as returning the retrieved solution, unchanged as here we just have to use the workload type of the most similar case from the retrieve step. This methodology is mostly used in the classification problems where each workload class or type is likely to be represented frequently in the case base and therefore the most similar retrieved case if sufficiently similar is most probably contain the right solution. However it becomes more difficult when there are huge and major differences between the incoming workload and the retrieved cases. In those circumstances the retrieved solution may need to be adapted to account for these important differences. Revise The revise step is initiated only when the solution found from the previous step (reuse) is inaccurate. The revise process is achieved by evaluating the reuse solution and corrects it by applying the domain knowledge. In this step there is a need of some adaptation method that is applied to revise the solution by observing the change in the retrieve process and how the retrieve solution can be changed. We have used the Substitution adaptation method as proposed by (Kolodner, 1993), where the transformation adaptation alters the structure of the solution. There are two ways of adaptation, in first case there is involvement of the user at each substitution step who decides whether the substitution is according to his requirement or not. So user decides about the substitution accepts or rejection. While in second case, the system performs all the substitution at its own and finally the substituted result is sent to the user for final evaluation. For training data, we have adopted the first one in order to get accuracy. Workload Management through Characterization and Idleness Detction 64 Chapter 4 Workload Characterization Retain The retain phase is the final step in CBR that insert the valuable information of the current case into the knowledge base that is helpful for future problems solving. However, duplicate or extra information should be avoided during the incorporation of new case otherwise it slows down the retrieval process. The retain process also involves that how information will be stored and its associated index in current memory structure. 4.3.2 Detection of the DB Workload Basic purpose of the workload detection is to discover, analyze and classify the database workload. The values of the workload associated status variables change according to the specific operation with the execution of some transactions or workload as we have discussed above. In MySQL 5.1, there are 291 status variables which are used to distinguish the different operations and from these 3 status variables are selected in order to distinguish the workload type i.e. OLTP or DSS. In order to prepare the training data, we have taken the snapshots of these status variables during the workload execution with an interval of 5 seconds. In each snapshot, values of the status variables are recorded and stored. Finally, the recorded values of each snapshot are subtracted from its predecessor snapshot to take the actual effect during the execution. The recorded values for the three status variables are converted into the range of [0, 1]. 4.4 Experimental Setup & Results Validation The hardware is same as we have discussed in the previous chapter that is a Database Server, Backup Server, six Personal Computers and a Switch that connects all devices to form a network. MySql Database Management System is used to execute various workloads. MySql Server is installed on the server machine and MySql client on the client machine. The main database is stored on the server machine while backup server machine is responsible to maintain the backup of the database. Workload Management through Characterization and Idleness Detction 65 Chapter 4 Workload Characterization 4.4.1 Workload Editor The workload editor is developed in VB.Net with MySql database at backend. It is used to generate the various types of workloads that user wants to execute. There are number of daily used workloads are stored in the database on the server machine that are available to users. The workload editor allows users to either select the workload or query that are stored in the database or write their own workload or query. The proposed technique for characterization of the database workload works as the user writes the workload in the workload editor or selects the workloads from the already stored workloads. The characterizing process starts in parallel with user input to the workload editor. So the workload is characterized and classified before the actual start of workload execution. In previous characterization techniques, the characterizing process starts after the submission of workload to the system. In contrast to the previous approaches, the proposed technique characterizes the workload in advance, i.e. before the submission of workload to system. Due to this reason, the proposed characterizing technique saves the precious execution time that ultimately manages the workload in more efficient manner. 4.4.2 Workload Database Main purpose of the workload database is to provide the common workloads that are usually used by the database users to get useful information in the form of reports or allows many other operations such as insertions, deletions, updations etc. The workload database is stored at the Database Server, so that all the database users can use it. The sample workloads consist of the OLTP and DSS like queries that include insert, update, delete, select, create, load, alter and many other types of queries. New workloads are also inserted into the workload database from the administrator or as users write some new workload that is entirely different from the stored workloads or queries. So by doing this there is no need to write the same workload again as it is stored in the database after its successful execution and is available to all database users. The proposed approach saved the precious time of the users by providing the mostly used workloads in the workload editor Workload Management through Characterization and Idleness Detction 66 Chapter 4 Workload Characterization otherwise they have type even the same workload or query again and again that is just wastage of their useful time. Following are the queries that we have used for the experiments and are taken from the TPC-H benchmarks. Sample Queries Available in Workload Editor create table nation ( n_nationkey decimal(3,0) not null, n_name char(25) not null, n_regionkey decimal(2,0) not null, n_comment varchar(152) ) TYPE=MyISAM; create table region ( r_regionkey decimal(2,0) not null, r_name char(25) not null, r_comment varchar(152) ) TYPE=MyISAM; create table part ( p_partkey decimal(10,0) not null, p_name varchar(55) not null, p_mfgr char(25) not null, p_brand char(10) not null, p_type varchar(25) not null, p_size decimal(2,0) not null, p_container char(10) not null, p_retailprice decimal(6,2) not null, p_comment varchar(23) not null ) TYPE=MyISAM; create table supplier ( s_suppkey decimal(8,0) not null, s_name char(25) not null, s_address varchar(40) not null, s_nationkey decimal(3,0) not null, s_phone char(15) not null, s_acctbal decimal(7,2) not null, s_comment varchar(101) not null ) TYPE=MyISAM; create table partsupp ( ps_partkey decimal(10,0) not null, ps_suppkey decimal(8,0) not null, ps_availqty decimal(5,0) not null, Workload Management through Characterization and Idleness Detction 67 Chapter 4 Workload Characterization ps_supplycost decimal(6,2) not null, ps_comment varchar(199) not null ) TYPE=MyISAM; create table customer ( c_custkey decimal(9,0) not null, c_name varchar(25) not null, c_address varchar(40) not null, c_nationkey decimal(3,0) not null, c_phone char(15) not null, c_acctbal decimal(7,2) not null, c_mktsegment char(10) not null, c_comment varchar(117) not null ) TYPE=MyISAM; create table orders ( o_orderkey decimal(12,0) not null, o_custkey decimal(9,0) not null, o_orderstatus char(1) not null, o_totalprice decimal(8,2) not null, o_orderdate date not null, o_orderpriority char(15) not null, o_clerk char(15) not null, o_shippriority decimal(1,0) not null, o_comment varchar(79) not null ) TYPE=MyISAM; create table lineitem ( l_orderkey decimal(12,0) not null, l_partkey decimal(10,0) not null, l_suppkey decimal(8,0) not null, l_linenumber decimal(1,0) not null, l_quantity decimal(2,0) not null, l_extendedprice decimal(8,2) not null, l_discount decimal(3,2) not null, l_tax decimal(3,2) not null, l_returnflag char(1) not null, l_linestatus char(1) not null, l_shipdate date not null, l_commitdate date not null, l_receiptdate date not null, l_shipinstruct char(25) not null, l_shipmode char(10) not null, l_comment varchar(44) not null ) TYPE=MyISAM; create create create create create index index index index index fkey_nation_1 on nation(n_regionkey); fkey_supplier_1 on supplier(s_nationkey); fkey_customer_1 on customer(c_nationkey); fkey_partsupp_1 on partsupp(ps_suppkey); fkey_partsupp_2 on partsupp(ps_partkey); Workload Management through Characterization and Idleness Detction 68 Chapter 4 Workload Characterization create create create create create create create create create create create create create create create create create create create create create create create index index index index index index index index index index index index index index index index index index index index index index index fkey_orders_1 on orders(o_custkey); fkey_lineitem_1 on lineitem(l_orderkey); fkey_lineitem_2 on lineitem(l_partkey,l_suppkey); fkey_lineitem_3 on lineitem(l_suppkey); xxx1 on lineitem(l_shipdate); xxx2 on customer(c_mktsegment); xxx3 on orders(o_orderdate); xxx4 on region(r_name); xxx5 on lineitem(l_discount); xxx6 on lineitem(l_quantity); xxx7 on lineitem(l_returnflag); xxx8 on lineitem(l_shipmode); xxx9 on lineitem(l_commitdate); xxx10 on lineitem(l_receiptdate); xxx11 on lineitem(l_partkey); xxx12 on part(p_size); xxx13 on part(p_type); xxx14 on partsupp(ps_supplycost); xxx15 on nation(n_name); xxx16 on part(p_name); xxx17 on orders(o_clerk); xxx18 on part(p_brand); xxx19 on part(p_container); LOAD DATA INFILE "D://implementation//DBGen//region.tbl" INTO TABLE region FIELDS TERMINATED BY "|" LINES TERMINATED BY "\r\n" (r_regionkey, r_name, r_comment); LOAD DATA INFILE "D://implementation//DBGen//nation.tbl" INTO TABLE nation FIELDS TERMINATED BY '|' LINES TERMINATED BY '\r\n'; LOAD DATA INFILE "D://implementation//DBGen//supplier.tbl" INTO TABLE supplier FIELDS TERMINATED BY '|' LINES TERMINATED BY '\r\n'; LOAD DATA INFILE "D://implementation//DBGen//part.tbl" INTO TABLE part FIELDS TERMINATED BY '|' LINES TERMINATED BY '\r\n'; LOAD DATA INFILE "D://implementation//DBGen//partsupp.tbl" INTO TABLE partsupp FIELDS TERMINATED BY '|' LINES TERMINATED BY '\r\n'; LOAD DATA INFILE "D://implementation//DBGen//customer.tbl" INTO TABLE customer FIELDS TERMINATED BY '|' LINES TERMINATED BY '\r\n'; LOAD DATA INFILE "D://implementation//DBGen//orders.tbl" INTO TABLE orders FIELDS TERMINATED BY '|' LINES TERMINATED BY '\r\n'; LOAD DATA INFILE "D://implementation//DBGen//lineitem.tbl" INTO TABLE lineitem FIELDS TERMINATED BY '|' LINES TERMINATED BY '\r\n'; ALTER TABLE region ADD CONSTRAINT pkey_region PRIMARY KEY(r_regionkey); ALTER TABLE nation ADD CONSTRAINT pkey_nation PRIMARY KEY(n_nationkey); ALTER TABLE part ADD CONSTRAINT pkey_part PRIMARY KEY(p_partkey); ALTER TABLE supplier ADD CONSTRAINT pkey_supplier PRIMARY KEY(s_suppkey); ALTER TABLE partsupp ADD CONSTRAINT pkey_partsupp PRIMARY KEY(ps_partkey,ps_suppkey); Workload Management through Characterization and Idleness Detction 69 Chapter 4 Workload Characterization ALTER TABLE customer ADD CONSTRAINT pkey_customer PRIMARY KEY(c_custkey); ALTER TABLE lineitem ADD CONSTRAINT pkey_lineitem PRIMARY KEY(l_orderkey,l_linenumber); ALTER TABLE orders ADD CONSTRAINT pkey_orders PRIMARY KEY(o_orderkey); analyze analyze analyze analyze analyze analyze analyze analyze table table table table table table table table nation; region; part; supplier; partsupp; customer; orders; lineitem; /* TPC_H Query 1 - Pricing Summary Report */ SELECT L_RETURNFLAG, L_LINESTATUS, SUM(L_QUANTITY) AS SUM_QTY, SUM(L_EXTENDEDPRICE) AS SUM_BASE_PRICE, SUM(L_EXTENDEDPRICE*(1L_DISCOUNT)) AS SUM_DISC_PRICE, SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)*(1+L_TAX)) AS SUM_CHARGE, AVG(L_QUANTITY) AS AVG_QTY, AVG(L_EXTENDEDPRICE) AS AVG_PRICE, AVG(L_DISCOUNT) AS AVG_DISC, COUNT(*) AS COUNT_ORDER FROM LINEITEM // WHERE L_SHIPDATE <= dateadd(dd, -90, cast('1998-12-01' as date)) WHERE L_SHIPDATE <= '1998-12-01' GROUP BY L_RETURNFLAG, L_LINESTATUS ORDER BY L_RETURNFLAG,L_LINESTATUS /* TPC_H Query 2 - Minimum Cost Supplier */ SELECT S_ACCTBAL, S_NAME, N_NAME, P_PARTKEY, P_MFGR, S_ADDRESS, S_PHONE, S_COMMENT FROM PART, SUPPLIER, PARTSUPP, NATION, REGION WHERE P_PARTKEY = PS_PARTKEY AND S_SUPPKEY = PS_SUPPKEY AND P_SIZE = 15 AND P_TYPE LIKE '%%BRASS' AND S_NATIONKEY = N_NATIONKEY AND N_REGIONKEY = R_REGIONKEY AND R_NAME = 'EUROPE' AND PS_SUPPLYCOST = (SELECT MIN(PS_SUPPLYCOST) FROM PARTSUPP, SUPPLIER, NATION, REGION WHERE P_PARTKEY = PS_PARTKEY AND S_SUPPKEY = PS_SUPPKEY AND S_NATIONKEY = N_NATIONKEY AND N_REGIONKEY = R_REGIONKEY AND R_NAME = 'EUROPE') ORDER BY S_ACCTBAL DESC, N_NAME, S_NAME, P_PARTKEY TOP 100 Workload Management through Characterization and Idleness Detction 70 Chapter 4 Workload Characterization /* TPC_H Query 3 - Minimum Cost Supplier */ Select l_orderkey, sum(l_extendedprice*(1-l_discount)) as revenue,o_orderdate, o_shippriority from customer, orders, lineitem Where c_mktsegment = '[SEGMENT]' and c_custkey = o_custkey and l_orderkey = o_orderkey and o_orderdate < date '[DATE]' and l_shipdate > date '[DATE]' group by l_orderkey, o_orderdate, o_shippriority order by revenue desc, o_orderdate; /* TPC_H Query 4 - */ Select o_orderpriority, count(*) as order_count from orders where o_orderdate >= date '[DATE]' and o_orderdate < date '[DATE]' + interval '3' month and exists ( select * from lineitem where l_orderkey = o_orderkey and l_commitdate < l_receiptdate ) group by o_orderpriority order by o_orderpriority; /* TPC_H Query 5 - Local Supplier Volume */ SELECT N_NAME, SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)) AS REVENUE FROM CUSTOMER, ORDERS, LINEITEM, SUPPLIER, NATION, REGION WHERE C_CUSTKEY = O_CUSTKEY AND L_ORDERKEY = O_ORDERKEY AND L_SUPPKEY = S_SUPPKEY AND C_NATIONKEY = S_NATIONKEY AND S_NATIONKEY = N_NATIONKEY AND N_REGIONKEY = R_REGIONKEY AND R_NAME = 'ASIA' AND O_ORDERDATE >= '1994-01-01' AND O_ORDERDATE < DATEADD(YY, 1, cast('1994-01-01' as date)) GROUP BY N_NAME ORDER BY REVENUE DESC /* TPC_H Query 6 – not execute*/ Select sum (l_extendedprice*l_discount) as revenue from lineitem where l_shipdate >= date '[DATE]' and l_shipdate < date '[DATE]' + interval '1' year and l_discount between [DISCOUNT] - 0.01 Workload Management through Characterization and Idleness Detction 71 Chapter 4 Workload Characterization and [DISCOUNT] + 0.01 and l_quantity < [QUANTITY]; /* TPC_H Query 7 - */ Select supp_nation, cust_nation, l_year, sum(volume) as revenue from (select n1.n_name as supp_nation, n2.n_name as cust_nation, extract(year from l_shipdate) as l_year, l_extendedprice * (1 - l_discount) as volume from supplier, lineitem, orders, customer, nation n1, nation n2 where s_suppkey = l_suppkey and o_orderkey = l_orderkey and c_custkey = o_custkey and s_nationkey = n1.n_nationkey and c_nationkey = n2.n_nationkey and ( (n1.n_name = '[NATION1]' and n2.n_name = '[NATION2]') or (n1.n_name = '[NATION2]' and n2.n_name = '[NATION1]') ) and l_shipdate between date '1995-01-01' and date '1996-12-31' ) as shipping group by supp_nation, cust_nation, l_year order by supp_nation, cust_nation, l_year; /* TPC_H Query 8 - National Market Share – not executed*/ SELECT O_YEAR, SUM(CASE WHEN NATION = 'BRAZIL' THEN VOLUME ELSE 0 END)/SUM(VOLUME) AS MKT_SHARE FROM (SELECT datepart(yy,O_ORDERDATE) AS O_YEAR, L_EXTENDEDPRICE*(1L_DISCOUNT) AS VOLUME, N2.N_NAME AS NATION FROM PART, SUPPLIER, LINEITEM, ORDERS, CUSTOMER, NATION N1, NATION N2, REGION WHERE P_PARTKEY = L_PARTKEY AND S_SUPPKEY = L_SUPPKEY AND L_ORDERKEY = O_ORDERKEY AND O_CUSTKEY = C_CUSTKEY AND C_NATIONKEY = N1.N_NATIONKEY AND N1.N_REGIONKEY = R_REGIONKEY AND R_NAME = 'AMERICA' AND S_NATIONKEY = N2.N_NATIONKEY AND O_ORDERDATE BETWEEN '1995-01-01' AND '1996-12-31' AND P_TYPE= 'ECONOMY ANODIZED STEEL') AS ALL_NATIONS GROUP BY O_YEAR ORDER BY O_YEAR Workload Management through Characterization and Idleness Detction 72 Chapter 4 Workload Characterization /* TPC_H Query 9 - */ select nation, o_year, sum(amount) as sum_profit from ( select n_name as nation, extract(year from o_orderdate) as o_year, l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount from part, supplier, lineitem, partsupp, orders, nation where s_suppkey = l_suppkey and ps_suppkey = l_suppkey and ps_partkey = l_partkey and p_partkey = l_partkey and o_orderkey = l_orderkey and s_nationkey = n_nationkey and p_name like '%[COLOR]%' ) as profit group by nation, o_year order by nation, o_year desc; /* TPC_H Query 10 - */ Select c_custkey, c_name, sum(l_extendedprice * (1 - l_discount)) as revenue,c_acctbal, n_name, c_address, c_phone, c_comment from customer, orders, lineitem, nation where c_custkey = o_custkey and l_orderkey = o_orderkey and o_orderdate >= date '[DATE]' and o_orderdate < date '[DATE]' + interval '3' month and l_returnflag = 'R' and c_nationkey = n_nationkey group by c_custkey, c_name, c_acctbal, c_phone, n_name, c_address, c_comment order by revenue desc; /* TPC_H Query 11 - Important Stock Identification */ SELECT PS_PARTKEY, SUM(PS_SUPPLYCOST*PS_AVAILQTY) AS VALUE FROM PARTSUPP, SUPPLIER, NATION WHERE PS_SUPPKEY = S_SUPPKEY AND S_NATIONKEY = N_NATIONKEY AND N_NAME = 'GERMANY' GROUP BY PS_PARTKEY HAVING SUM(PS_SUPPLYCOST*PS_AVAILQTY) > (SELECT SUM(PS_SUPPLYCOST*PS_AVAILQTY) * 0.0001000000 Workload Management through Characterization and Idleness Detction 73 Chapter 4 Workload Characterization FROM PARTSUPP, SUPPLIER, NATION WHERE PS_SUPPKEY = S_SUPPKEY AND S_NATIONKEY = N_NATIONKEY AND N_NAME = 'GERMANY') ORDER BY VALUE DESC /* TPC_H Query 12 - */ Select l_shipmode, sum(case when o_orderpriority ='1-URGENT' or o_orderpriority ='2-HIGH' then 1 else 0 end) as high_line_count, sum(case when o_orderpriority <> '1-URGENT' and o_orderpriority <> '2-HIGH' then 1 else 0 end) as low_line_count from orders, lineitem where o_orderkey = l_orderkey and l_shipmode in ('[SHIPMODE1]', '[SHIPMODE2]') and l_commitdate < l_receiptdate and l_shipdate < l_commitdate and l_receiptdate >= date '[DATE]' and l_receiptdate < date '[DATE]' + interval '1' year group by l_shipmode order by l_shipmode; /* TPC_H Query 13 - */ select c_count, count(*) as custdist from ( select c_custkey, count(o_orderkey) from customer left outer join orders on c_custkey = o_custkey and o_comment not like ‘%[WORD1]%[WORD2]%’ group by c_custkey )as c_orders (c_custkey, c_count) group by c_count order by custdist desc, c_count desc; Workload Management through Characterization and Idleness Detction 74 Chapter 4 Workload Characterization /* TPC_H Query 14 - */ Select 100.00 * sum (case when p_type like 'PROMO%' then l_extendedprice*(1-l_discount) else 0 end) / sum(l_extendedprice * (1 - l_discount)) as promo_revenue from lineitem, part where l_partkey = p_partkey and l_shipdate >= date '[DATE]' and l_shipdate < date '[DATE]' + interval '1' month; /* TPC_H Query 15 - */ select l_suppkey, sum(l_extendedprice * (1 - l_discount)) from lineitem where l_shipdate >= date '[DATE]' and l_shipdate < date '[DATE]' + interval '3' month group by l_suppkey; select s_suppkey, s_name, s_address, s_phone, total_revenue from supplier, revenue[STREAM_ID] where s_suppkey = supplier_no and total_revenue = (select max(total_revenue) from revenue[STREAM_ID]) order by s_suppkey; drop view revenue[STREAM_ID]; /* TPC_H Query 16 - Parts/Supplier Relationship */ SELECT P_BRAND, P_TYPE, P_SIZE, COUNT(DISTINCT PS_SUPPKEY) AS SUPPLIER_CNT FROM PARTSUPP, PART WHERE P_PARTKEY = PS_PARTKEY AND P_BRAND <> 'Brand#45' AND P_TYPE NOT LIKE 'MEDIUM POLISHED%%' AND P_SIZE IN (49, 14, 23, 45, 19, 3, 36, 9) AND PS_SUPPKEY NOT IN (SELECT S_SUPPKEY FROM SUPPLIER WHERE S_COMMENT LIKE '%%Customer%%Complaints%%') GROUP BY P_BRAND, P_TYPE, P_SIZE ORDER BY SUPPLIER_CNT DESC, P_BRAND, P_TYPE, P_SIZE /* TPC_H Query 17 - Small-Quantity-Order Revenue */ SELECT SUM(L_EXTENDEDPRICE)/7.0 AS AVG_YEARLY FROM LINEITEM, PART WHERE Workload Management through Characterization and Idleness Detction 75 Chapter 4 Workload Characterization P_PARTKEY = L_PARTKEY AND P_BRAND = 'Brand#23' AND P_CONTAINER = 'MED BOX' AND L_QUANTITY < (SELECT 0.2*AVG(L_QUANTITY) FROM LINEITEM WHERE L_PARTKEY = P_PARTKEY) /* TPC_H Query 18 - Large Volume Customer */ SELECT TOP 100 C_NAME, C_CUSTKEY, O_ORDERKEY, O_ORDERDATE, O_TOTALPRICE, SUM(L_QUANTITY) FROM CUSTOMER, ORDERS, LINEITEM WHERE O_ORDERKEY IN (SELECT L_ORDERKEY FROM LINEITEM GROUP BY L_ORDERKEY HAVING SUM(L_QUANTITY) > 300) AND C_CUSTKEY = O_CUSTKEY AND O_ORDERKEY = L_ORDERKEY GROUP BY C_NAME, C_CUSTKEY, O_ORDERKEY, O_ORDERDATE, O_TOTALPRICE ORDER BY O_TOTALPRICE DESC, O_ORDERDATE /* TPC_H Query 19 - */ Select sum(l_extendedprice * (1 - l_discount) ) as revenue from lineitem, part where ( p_partkey = l_partkey and p_brand = ‘[BRAND1]’ and p_container in ( ‘SM CASE’, ‘SM BOX’, ‘SM PACK’, ‘SM PKG’) and l_quantity >= [QUANTITY1] and l_quantity <= [QUANTITY1] + 10 and p_size between 1 and 5 and l_shipmode in (‘AIR’, ‘AIR REG’) and l_shipinstruct = ‘DELIVER IN PERSON’ ) or ( p_partkey = l_partkey and p_brand = ‘[BRAND2]’ and p_container in (‘MED BAG’, ‘MED BOX’, ‘MED PKG’, ‘MED PACK’) and l_quantity >= [QUANTITY2] and l_quantity <= [QUANTITY2] + 10 and p_size between 1 and 10 and l_shipmode in (‘AIR’, ‘AIR REG’) and l_shipinstruct = ‘DELIVER IN PERSON’ ) or ( p_partkey = l_partkey and p_brand = ‘[BRAND3]’ and p_container in ( ‘LG CASE’, ‘LG BOX’, ‘LG PACK’, ‘LG PKG’) and l_quantity >= [QUANTITY3] and l_quantity <= [QUANTITY3] + 10 and p_size between 1 and 15 Workload Management through Characterization and Idleness Detction 76 Chapter 4 Workload Characterization and l_shipmode in (‘AIR’, ‘AIR REG’) and l_shipinstruct = ‘DELIVER IN PERSON’ ); /* TPC_H Query 20 - */ select s_name, s_address from supplier, nation where s_suppkey in ( select ps_suppkey from partsupp where ps_partkey in ( select p_partkey from part where p_name like '[COLOR]%' ) and ps_availqty > ( select 0.5 * sum(l_quantity) from lineitem where l_partkey = ps_partkey and l_suppkey = ps_suppkey and l_shipdate >= date('[DATE]’) and l_shipdate < date('[DATE]’) + interval ‘1’ year ) ) and s_nationkey = n_nationkey and n_name = '[NATION]' order by s_name; /* TPC_H Query 21 - Suppliers Who Kept Orders Waiting */ SELECT TOP 100 S_NAME, COUNT(*) AS NUMWAIT FROM SUPPLIER, LINEITEM L1, ORDERS, NATION WHERE S_SUPPKEY = L1.L_SUPPKEY AND O_ORDERKEY = L1.L_ORDERKEY AND O_ORDERSTATUS = 'F' AND L1.L_RECEIPTDATE> L1.L_COMMITDATE AND EXISTS (SELECT * FROM LINEITEM L2 WHERE L2.L_ORDERKEY = L1.L_ORDERKEY AND L2.L_SUPPKEY <> L1.L_SUPPKEY) AND NOT EXISTS (SELECT * FROM LINEITEM L3 WHERE L3.L_ORDERKEY = L1.L_ORDERKEY AND L3.L_SUPPKEY <> L1.L_SUPPKEY AND L3.L_RECEIPTDATE > L3.L_COMMITDATE) AND S_NATIONKEY = N_NATIONKEY AND N_NAME = 'SAUDI ARABIA' GROUP BY S_NAME ORDER BY NUMWAIT DESC, S_NAME Workload Management through Characterization and Idleness Detction 77 Chapter 4 Workload Characterization /* TPC_H Query 22 - */ select cntrycode, count(*) as numcust, sum(c_acctbal) as totacctbal from (select substring(c_phone from 1 for 2) as cntrycode, c_acctbal from customer where /*substring(c_phone from 1 for 2) in ('[I1]','[I2]’,'[I3]','[I4]','[I5]','[I6]','[I7]') and */ c_acctbal > ( select avg(c_acctbal) from customer where c_acctbal > 0.00 /*and substring (c_phone from 1 for 2) in ('[I1]','[I2]','[I3]','[I4]','[I5]','[I6]','[I7]')*/ ) and not exists (select * from orders where o_custkey = c_custkey ) ) as custsale group by cntrycode order by cntrycode; 4.4.3 How Proposed Workload Technique Works? We have executed the OLTP and DSS like workloads. During their execution snapshots are collected against the three selected status variables with an interval of 5 seconds. To build the training data, these snapshots are marked as OLTP or DSS according to their type and are used for the workload characterization. However, the training snapshots must be sufficient to build a workload characterization and selected interval must contain the time period in which at least one SQL statement must complete its execution. To get the sufficient snapshot interval first of all 1 second is selected, where we observed that many OLTP SQL statements completely executes but this is not true for the DSS like SQL statements as they are large and take large time in their completion. Due to this reason, we have increment the snapshot time and observing query completion behavior in that particular snapshot time. After executing various OLTP and DSS like workloads; and observing their associative snapshots, we have selected the 5 seconds as snapshot interval period as at least one OLTP or DSS statement get its completion in this time. Workload Management through Characterization and Idleness Detction 78 Chapter 4 Workload Characterization After building the training data, we have executed the OLTP and DSS like workloads from 20 to 30 minutes. During the workload execution, snapshots are stored with an interval of 5 seconds. Then the proposed workload characterization technique is applied to distinguish and classify the workload into OLTP and DSS. 4.4.4 Results Discussion The proposed database workload characterization technique is validated and found to be effective. For experiments, we have taken the OLTP and DSS like workloads (ordering profile as OLTP and browsing profiles as DSS workloads). Values of three selected MySql status variables are taken through the show status statement of MySql and stored in the database to identify the workload type. The workload is generated from the client machines in the form of threads and during the workload execution the status variable values are recorded with an interval of 5 seconds. The workload characterization during the browsing workload provides the information that 89.3% of the workload is DSS and 11.7% is OLTP. In case of the ordering workload, it reports that 8.5% of the workload as DSS and remaining as OLTP. After that, the workload that is related with the shopping is executed. After its execution and analysis, it is found that there is 77% DSS and 23% OLTP workload in shopping profile. The results obtained from these experiments are as per our anticipation due to the benchmark description and understandings of the selected attributes. Same experiments are repeated multiple times to get the accurate results. This method worked well to identify the workload type but still there are some issues. First, the workload parameters are only available during or after the complete execution of the workload. Due to this reason, small transactions cannot be managed as they get completion before the identification of their workload type. While in case of large transactions, where for efficient execution some transactions are necessary to stop and their executed part requires to be stored in memory for later use (after resume). The other problem with the approach is that it cannot be used for mix type of workloads, i.e. the workloads that consists of both OLTP and DSS queries. Workload Management through Characterization and Idleness Detction 79 Chapter 4 Workload Characterization Summary In this chapter, we have discussed the workload characterization approach based on Case Base Reasoning that makes the decision about the workload type. The approach starts its working with the identification of the key workload parameters that are more effective for the characterization and classification. On the basis of these parameters, workload is classified into the OLTP and DSS types of workload. The approach is validated by executing various OLTP and DSS like workloads and found to be effective in identifying the workload type. Workload Management through Characterization and Idleness Detction 80 Chapter 5: Fuzzy Logic Based Scheduling Chapter 5 Fuzzy Logic Based Scheduling The chapter discusses the Fuzzy logic based technique for scheduling the DBMS. First section of the chapter provides an overview and background of the scheduling technique and Fuzzy Logic. Here the workloads are arranged on the basis of the OLTP and DSS percentage of each workload. Subsequently, the entire workload is arranged for execution on the basis of that percentage. The proposed Fuzzy based scheduling considers the workload type, size and importance. 5.1 Introduction and Background In this section, scheduling and Fuzzy Logic are discussed. 5.1.1 Scheduling In DBMS, scheduling is a process in which the workloads that are generated by the users and system arranged in some specific way for faster execution. It plays an important role to take care of all the underlying issues and arranges the database workload in such a way that each request has to wait for a least time (wait time) with maximum CPU utilization. Scheduling in DBMS can be performed either internally or externally [Schroeder et al. (2006)]. In case of internal scheduling, one achieves it by updating the internals of the DBMS while in case of external scheduling there is no need to change the internals. DBMS researchers, vendors, and practitioners prefer the external scheduling due to a number of benefits. For example, they do not need to learn about the DBMS’s internal architecture, not necessary to change the DBMS internals, and it is general, not specific to certain DBMS hence portable. Moreover, external scheduling can be modified easily as per requirements and objectives of the organization. It has been applied on the web storage and database servers for many years. However, mostly the external scheduling is used for limiting the workload or number of requests i.e. multiprogramming level. In a small organization, DBMS executes all the requests in time without compromising the higher management work. However, in the case where huge amount of sessions, requests, Workload Management through Characterization and Idleness Detction 82 Chapter 5 Fuzzy Logic Based Scheduling and other system jobs arrive for execution simultaneously, DBMS either crashes due to the overload problem or higher management has to wait for a long time to get important and useful reports for decision making. There is also a need to manage the workload not only in an efficient way but giving high priority to important workload as compared to the workload generated from lower management. All these factors contribute in need of an efficient scheduler that may be either internal or external. The scheduler will send the workload to the DBMS in an order by which both starvation and underload problems can be avoided. Hence, the proposed scheduler is required to take care of the user and system oriented workloads with respect to size and importance all the time. This strategy also helps to use maximum CPU cycles and ultimately enhance the efficiency of the DBMSs. 5.1.1 Fuzzy Logic In Boolean logic there are two values True and False, while in Fuzzy logic there the truth or false value is in the range of [0, 1]. Fuzzy logic was introduced by Lotfi Zadeh in 1965 that is based on the concept of partial truth and approximation rather than fixed [Zadeh, (1965)]. It is similar to probabilistic logic as in both cases the truth value has range from 0 to 1, but Fuzzy represents the degree of truth while the probability shows the likelihood. After its introduction, it has been used in various fields including Control Theory, Engineering, Networks, DBMS, and Artificial Intelligence etc. The Fuzzy logic is performed through four steps which are Fuzzification, Fuzzy Rules Evaluation, Aggregation and Defuzzification. In fuzzification step, crisp input values are evaluated by applying the membership functions that returns a value in between 0 and 1. Membership function must be simple and easily convertible to fuzzy set. There different methods that are used for this purpose such as Gaussian, Triangular or Trapezoidal. In Rule Evaluation the evaluated output received from the fuzzification step is checked through some already defined fuzzy rules which portray the expert decision and written in the form of fuzzy words. The Aggregation is applied to each output variable that performs the conversion of result into a fuzzy set. Finally, in Defuzzification, the fuzzified rules are used to produce the final output by Workload Management through Characterization and Idleness Detction 83 Chapter 5 Fuzzy Logic Based Scheduling changing the fuzzy set into output values. Usually two methods (Maximum method and Centroid methods) are used in this step. 5.2 Fuzzy Based (FB) Scheduler It is not necessary that DBMS always experiences solely OLTP or DSS workload. By solely we mean the crisp logic i.e. workload is 100% OLTP and 0% DSS or 100% DSS and 0% OLTP. Most of the time DBMS experiences mix type of workload so we cannot achieve better performance from the DBMS if workload is considered to be based on crisp logic. Behavior of the workload in DBMS mostly remains the non-deterministic for example, w1 and w2 consists of 20% OLTP and 80% DSS transactions, 60% OLTP and 40% DSS transactions, respectively. There is a need of a scheduler that can find the percentage of the OLTP and DSS of the incoming workload. By doing this, database workloads are arranged in a better way. The Fuzzy logic is adopted to find out the workload type percentage for the incoming workload. Fuzzy logic is related with the Fuzzy set theory [Zadeh, (1965)] that gives approximate value within the range of (0, 1). It is used in various fields including Control Theory, Engineering, Networks, DBMS, and Artificial Intelligence etc. The crisp logic represents the value either true or false solely while in the other case Fuzzy logic represents the degree of the truthiness or falseness. The following section discusses the architecture of the proposed scheduler for DBMS that uses the Fuzzy logic to calculate the OLTP and DSS percentage. Finally, the FBS on the basis of OLTP and DSS percentage arranges the partitioned workload through the Fuzzy rules and membership functions. The scheduler takes the workload related parameter’s values from the characterization module. Impact of each workload is calculated on the basis of parameters and workload type as identified. We are using the Fuzzy logic for scheduling the workload that will handle the workload dynamically. In Boolean logic each decision has two outputs true or false however in Fuzzy logic truth value has a range starting from 0 to 1. The Fuzzy logic works on the concept of the partial truth, where the truth value may range between completely true and completely false. Advantage to use the Fuzzy logic for scheduling the workload includes easy understanding, Workload Management through Characterization and Idleness Detction 84 Chapter 5 Fuzzy Logic Based Scheduling incremental growing of the knowledge with the arrival of new queries and workload and not necessary to know the inner detail of the workload or queries. There are three steps of the proposed FB management which are Fuzzification, Fuzzification Rules and Defuzzification. Following section describes these steps in detail that: 5.1.1 Fuzzification In this first step the crisp input values are evaluated by applying the membership functions that returns a value for the OLTP and DSS in between 0 and 1. There are various choices for the membership function which may be Gaussian, triangular, or trapezoidal method. However, for this particular problem, we have selected the Gaussian method as it is simple and easily convertible to the Fuzzy set. Input value is in the Fuzzy set when the value received from the membership function is closer to 1. Table 5.1 represents the various queries involved in the database workload ‘w’ that executed over the MySQL 5.1. The workload ‘w’ consists of 5 mix type of queries (Q1 - Q5) that involve both the OLTP and DSS type of workload in each query. To find out the OLTP and DSS percentage of each query, first we have calculated the total points of each query on the basis of four columns which are the number of tables, conditions, group by fields, and math functions involved in the workload. Table 5.1 The Workload Involving Mix OLTP and DSS Queries Tables SQL Statements in workload Q1(Insert, Select) Q2 (Update, Select) Q3 (Delete, Select) Q4 (Insert, Select, Select) Q5 (Delete, Select) Total Conditions Group By Columns Main Inner Query Query Math Functions Main Inner Query Query Total Points OLT P% DSS % 0 0 0 8 10 10 0.125 0.5 0.9 0.875 0.5 0.1 0 2 11 0.091 0.91 0 0 12 0.75 0.473 0.25 0.527 Main Query Inner Query Main Query Inner Query 1 2 4 4 3 1 0 3 5 3 2 0 0 0 0 0 0 0 0 0 0 1 3 0 3 0 2 2 1 7 2 0 0 Workload Management through Characterization and Idleness Detction 85 Chapter 5 Fuzzy Logic Based Scheduling The values for these columns are taken from each query and then summed up. For example, in case of query 1 the total points will be 1+4+3 = 8. After that the percentage of the OLTP and DSS in query is calculated as: For Insert query (Q1) the percentage is calculated as 1/8 + 0/8 + 0/8 + 0/8 = 0.125 Where 1, 0, 0, 0 are the values of tables, conditions, group by, and function parameters of the query respectively and 8 represents the total points of the query. While in case of Select query, the percentage is calculated as (Q1) 4/8 + 3/8 + 0/8 + 0/8 = 0.875 Where 4, 3, 0, 0 are the values of tables, conditions, group by, and function parameters of the query respectively and 8 represents the total points of the query. In the same way the OLTP and DSS percentage is calculated for the remaining queries and finally the total percentage of the OLTP and DSS is calculated for the entire workload by taking the sum and average of the OLTP and DSS percentage of the all the queries. In the above case as represented in Table 5.1, 0.473% OLTP and 0.527% DSS workload is found. Finally, the percentage of the OLTP and DSS percentage for the whole workload is calculated that is generated during the particular time. Table 5.2 represents a particular workload that is generated by the users during some specific time t. The entire workload consists of 6 sub-workloads that are generated by the various users with different priorities. Percentage of the OLTP and DSS is calculated for each sub-workload according to the above methodology. Now in the next step, these sub-workloads are arranged in a way so that the average waiting time of each sub-workload should be minimized with respect to higher priority. Table 5.2 Percentage of OLTP and DSS in the Workload Involving Mix Queries Workload w1 w2 w3 w4 w5 w6 User Priority 2 6 2 0 1 3 OLTP % 0.473 0.85 0.34 0 0.23 0.1 DSS % 0.527 0.15 0.66 1 0.77 0. 9 Workload Management through Characterization and Idleness Detction 86 Chapter 5 Fuzzy Logic Based Scheduling 5.1.2 Fuzzification Rules Output of the Fuzzification step is taken through the Fuzzy rules that portray the expert decision and written in the form of Fuzzy words. Fuzzy rules are defined in the form of “If Condition then Action”. There are 8 different user priority levels that are assigned to each user. The priority 0 represents the user that has the highest priority and 7 for the lowest while value of the DSS is in the range of [0, 1]. Initially, the database workloads were divided into 8 classes (Class 0 to 7) for execution. The workloads are divided into these classes according to their size and importance. The workloads in class 0 execute first, next class 1’s workloads and so on. The Fuzzy rules are shown in Table 5.7. Table 5.3 Initial Fuzzification Rules S No 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 Rule If Up = 0 and DSS <= 0.5 then “Class 0” If Up = 0 and DSS > 0.5 then “Class 1” If Up = 1 and DSS <= 0.3 then “Class 0” If Up = 1 and DSS > 0.3 and DSS <= 0.6 then “Class 1” If Up = 1 and DSS > 0.6 and DSS <= 0.8 then “Class 2” If Up = 1 and DSS > 0.8 then “Class 3” If Up = 2 and DSS <= 0.2 then “Class 0” If Up = 2 and DSS > 0.2 and DSS <= 0.4 then “Class 1” If Up = 2 and DSS > 0.4 and DSS <= 0.7 then “Class 2” If Up = 2 and DSS > 0.7 then “Class 3” If Up = 3 and DSS <= 0.2 then “Class 0” If Up = 3 and DSS > 0.2 and DSS <= 0.4 then “Class 1” If Up = 3 and DSS > 0.4 and DSS <= 0.6 then “Class 2” If Up = 3 and DSS > 0.6 and DSS <= 0.8 then “Class 3” If Up = 3 and DSS > 0.8 then “Class 4” If Up = 4 and DSS <= 0.1 then “Class 0” If Up = 4 and DSS > 0.1 and DSS <= 0.3 then “Class 1” If Up = 4 and DSS > 0.3 and DSS <= 0.5 then “Class 2” If Up = 4 and DSS > 0.5 and DSS <= 0.8 then “Class 3” If Up = 4 and DSS > 0.8 then “Class 4” If Up = 5 and DSS <= 0.1 then “Class 0” If Up = 5 and DSS > 0.1 and DSS <= 0.3 then “Class 1” If Up = 5 and DSS > 0.3 and DSS <= 0.5 then “Class 2” If Up = 5 and DSS > 0.5 and DSS <= 0.6 then “Class 3” If Up = 5 and DSS > 0.6 and DSS <= 0.8 then “Class 4” If Up = 5 and DSS > 0.8 then “Class 5” If Up = 6 and DSS <= 0.1 then “Class 0” If Up = 6 and DSS > 0.1 and DSS <= 0.3 then “Class 1” If Up = 6 and DSS > 0.3 and DSS <= 0.4 then “Class 2” If Up = 6 and DSS > 0.4 and DSS <= 0.5 then “Class 3” If Up = 6 and DSS > 0.5 and DSS <= 0.7 then “Class 4” If Up = 6 and DSS > 0.7 and DSS <= 0.8 then “Class 5” Workload Management through Characterization and Idleness Detction 87 Chapter 5 Fuzzy Logic Based Scheduling 33 34 35 36 37 38 39 40 41 If Up = 6 and DSS > 0.8 then “Class 6” If Up = 7 and DSS <= 0.1 then “Class 0” If Up = 7 and DSS > 0.1 and DSS <= 0.2 then “Class 1” If Up = 7 and DSS > 0.2 and DSS <= 0.3 then “Class 2” If Up = 7 and DSS > 0.3 and DSS <= 0.4 then “Class 3” If Up = 7 and DSS > 0.4 and DSS <= 0.5 then “Class 4” If Up = 7 and DSS > 0.5 and DSS <= 0.7 then “Class 5” If Up = 7 and DSS > 0.7 and DSS <= 0.8 then “Class 6” If Up = 7 and DSS > 0.8 then “Class 7” This is one way to represent the Fuzzy rules but there exists some other approaches that are used for the representations of Fuzzy rules. One of the methods is the Fuzzy Associative Matrix (FAM) where the Fuzzy rules are written in tabular form. Usually the FAM table consists of two input parameter and the associative output. By using this approach, the rules are written in compact form and easy to understand. We have converted the above Fuzzy rules into the FAM which are shown in Table 5.8. Table 5.4 Initial Fuzzification Rules in Compact Form DSS/ UP <=0.1 >0.1 and <=0.2 >0.2 and <=0.3 >0.3 and <=0.4 >0.4 and <=0.5 >0.5 and <=0.6 >0.6 and <=0.7 >0.7 and <=0.8 >0.8 and <=0.9 >0.9 and <=1 =0 0 0 0 0 0 1 1 1 1 1 =1 0 0 1 1 2 2 2 3 3 3 =2 0 0 1 1 2 2 2 3 3 3 =3 0 0 1 1 2 2 3 3 4 4 =4 0 0 1 2 2 2 3 3 4 4 =5 0 0 1 1 2 3 4 4 5 5 =6 0 1 1 2 3 3 4 5 6 6 =7 0 1 2 3 4 5 5 6 7 7 However after the experiments, it has observed that it takes too much calculation time due to large number of Fuzzy rules and their computation. Due to this reason, we have minimized the number of classes up to three, i.e. High, Medium and Low. Each workload is assigned into High, Medium, and Low class on the basis of user priority and percentage of the DSS. The High class consists of those workloads that will be executed first. Medium class workloads will execute subsequently and low class workloads will be executed at last. Here, the high class does not mean that the workload takes only the smallest time but it represents those workloads that take small time and have higher user priority (UP). Workload Management through Characterization and Idleness Detction 88 Chapter 5 Fuzzy Logic Based Scheduling The Fuzzy rules have been developed after executing the various OLTP and DSS like workloads and by taking the opinions from the database experts. Each class is identified by the Fuzzification output and Fuzzy rules. These rules are written in the form of Fuzzy Associative Matrix (FAM) as shown in Table 5.9. Table 5.5 Final Fuzzification Rules in Compact Form DSS/ UP <=0.3 0.3 to 0.5 > 0.5 <=1 High High Medium >=2 and <= 4 High Medium Low >= 5 Medium Low Low 5.1.3 Defuzzification The fuzzified rules are used in defuzzification step to produce the final output by changing the Fuzzy set into output values. According to the OLTP and DSS percentage as received from step 1 and fuzzified rules of step 2, we have identified the specific workload type classes for workloads in Table 5.2 and the result is shown in Table 5.4. For example, w1 falls in medium class as the values of user priority is 2 and DSS is 0.527, therefore the rule of the Table 5.3 is applied, i.e. If Up <=2 and DSS > 0.5 then “Medium”. Table 5.6 Workloads with their Identified Workload Class Workload w1 w2 w3 w4 w5 w6 Workload Class Medium High Medium Medium High Low Finally, all the sub-workloads are grouped into their relative classes after the identification of workload class for each sub-workload. In this example, w2 and w5 falls in high class workload; w1, w3, and w4 in medium class; and w6 in low class. High Class = {w2, w5} Medium Class = {w1, w3, w4} Low Class = {w6} Workload Management through Characterization and Idleness Detction 89 Chapter 5 Fuzzy Logic Based Scheduling Summary In this chapter there is discussion about the scheduling technique to arrange the workload for efficient execution. The scheduling is performed by finding the degree or percentage of the OLTP and DSS in each workload. The percentage is calculated by using the Fuzzy Logic approach where there are three main steps i.e. Fuzzification, Fuzzy Rules and Defuzzification. In the Fuzzification step, the membership function is calculated on the basis of workload parameter values and converted into the Fuzzy set. Each workload is assigned low, medium, or high class with respect to its size and importance. The membership functions and the Fuzzy rules are identified after the executing of various workloads. Once the workload is grouped into classes, the scheduling is performed on the basis of workload importance and percentage of the DSS and OLTP. Workload Management through Characterization and Idleness Detction 90 Chapter 6: Idleness Detection Chapter 6 Idleness Detection The chapter discusses the proposed idleness detection approach where a CBR is built that works on the basis of training data. The data for the CBR is developed by executing the various OLTP and DSS like workloads, identifying the number of OLTP and DSS requests and associative CPU idleness. Finally, storing the number of OLTP and DSS queries along with their CPU idleness. 6.1 CBR Development for Idleness Detection The sample data for the idleness detection is constructed for the CBR. The database workload is generated by threads where each thread is responsible to execute the OLTP or DSS like workloads. The parameters that are stored into the CBR include Workload ID, OLTP Transactions, DSS Transactions and the CPU Idleness. The Workload ID column represents the unique number for workload identification and is generated automatically, OLTP Transactions represents the number of OLTP like queries and DSS Transactions represents the number of DSS like queries in the particular workload while CPU Idleness is the CPU cycles which will remain free during the execution of particular workload. Table 6.1 CBR DATA for CPU Idleness Detection Workload ID 1 2 3 4 5 6 . . . . OLTP Transactions 12 5 7 15 31 2 . . . . DSS Transactions 5 3 11 5 3 5 . . . . CPU Idleness (%) 28 57 19 33 26 19 . . . . In order to find the CPU idleness for the incoming workload, there are four steps of the proposed idleness detection technique. These steps include Retrieve, Reuse, Revise and Retain. Workload Management through Characterization and Idleness Detction 92 Chapter 6 Idleness Detection 6.1.1 Retrieve First step to find the CPU idleness from the CBR is to match the parameters (OLTP and DSS Transactions) of the currently executing workload with already stored cases. The retrieval is performed from the problem description. It is important that missing similar cases can lead towards inaccurate matched case that ultimately produces the wrong decisions. This is also important as the accuracy depend on the number of cases stored and matching cases retrieval. The similarity of cases can be determined through various techniques such as finding the distance between two cases through similarity matrices which can be provided by the experience or experts etc. The retrieval step has further four sub-steps, i.e. identification of the features, initial match, search, and selection of cases. Here, again the Euclidean distance method [Deza et al. (2009), Danielsson (1980)] is used to find the matched cases. So the same Euclidean distance formula as discussed in chapter 4 is used to find the distance between the incoming workload and stored cases. It is one of the simplest, fastest methods. Figure 6.2 Steps to Identify the CPU Idleness Workload Management through Characterization and Idleness Detction 93 Chapter 6 Idleness Detection Example 1 Let’s DBMS encounter a new workload with 9 and 13 OLTP and DSS transactions respectively. Now, we have to find the CPU idleness against this workload by tacking the match cases from the CBR. Here, the workload parameter’s values of the CBR and new workload are not converted into the range of [0, 1] as shown in Table 6.2. Difference of the new workload with all the stored records of the CBR is calculated using the above Euclidean Distance formula. The CPU idleness for the new workload is taken from the row/ record that have minimum distance value. In this example, minimum value of the Euclidean distance is 2.82842712 (row 3, which is also highlighted) and the associated CPU idleness is 19. Therefore, the CPU Idleness of the incoming workload will also be 19. The identified CPU idleness is used for the system processes. However, when there is no exact or nearest match is found then the workload executes without finding the CPU Idleness. After the completion of workload, it is stored with the actual CPU Idleness. The nearest records are considered that have the Euclidean Difference <= 0.5 when number of OLTP and DSS transactions are less than equal to 1, 10 when number of OLTP and DSS transactions are less than equal to 100 and 100 when number of OLTP and DSS transactions are less than equal to 1000. Table 6.2 Euclidean Difference Calculation (Example 1) New Workload CBR Data (In the range of [0, 1]) (in the range of [0, 1]) Euclidean Difference OLTP Transactions DSS Transactions CPU Idleness (%) OLTP Transactions DSS Transactions 12 5 28 9 13.0000 8.54400375 5 3 57 9 13.0000 10.77032961 7 11 19 9 13.0000 2.82842712 15 5 33 9 13.0000 10.00000000 31 3 26 9 13.0000 24.16609195 2 5 19 9 13.0000 10.63014581 . . . . . . . . . . . . . . . . . . . . . . . . We have explained the method by which the nearest matching records from the CBR is retrieved. Workload Management through Characterization and Idleness Detction 94 Chapter 6 Idleness Detection 6.1.2 Reuse The Reuse step starts after the completion of the Retrieve and can achieve by performing two sub-steps. First, finding the differences among the current and the retrieved case; and second identification of the portion of the retrieved case that must be stored as a new case into the CBR for future use. The reuse step proposes the CPU idleness from the solution that retrieved in previous step. Reusing a retrieved case can be as easy as returning the retrieved solution, unchanged as here we just have to use the workload type of the most similar case from the retrieve step. This methodology is mostly used in the classification problems where each workload class or type is likely to be represented frequently in the case base and therefore the most similar retrieved case if sufficiently similar is most probably contain the right solution. However it becomes more difficult when there is major difference between the incoming workload and the retrieved cases. In those circumstances the retrieved solution may need to be adapted to account for these important differences. 6.1.3 Revise The revise step is initiated only when the solution found from the previous step (reuse) is inaccurate. The revise process is achieved by evaluating the reuse solution and corrects it by applying the domain knowledge. In this step there is a need of some adaptation method that is applied to revise the solution by observing the change in the retrieve process and how the retrieve solution can be changed. We have used the Substitution adaptation method as proposed by (Kolodner, 1993), where the transformation adaptation alters the structure of the solution. There are two ways of adaptation, in first case there is involvement of the user at each substitution step who decides whether the substitution is according to his requirement or not. So user decides about the substitution accepts or rejection. While in second case, the system performs all the substitution at its own and finally the substituted result is sent to the user for final evaluation. For the training data, we have adopted the first one in order to get accuracy. Workload Management through Characterization and Idleness Detction 95 Chapter 6 Idleness Detection 6.1.4 Retain The retain phase is the final step in CBR that insert the valuable information of the current case into the knowledge base that is helpful for future problems solving. However, duplicate or extra information should be avoided during the incorporation of new case otherwise it slows down the retrieval process. The retain process also involves that how information will be stored and its associated index in current memory structure. Summary In this chapter the proposed idleness detection approach to utilize the free CPU cycles is discussed to improve the workload management process. The approach consists of a CBR that contains the training data that is taken by executing the OLTP and DSS like workload with the identification of the CPU idleness and storing the workload related parameters. In case of no exact match, the workload is executed without taking the idleness for the first time and stored as a new case into the CBR after execution. Once the CPU idleness is found, the time period is used for some other processes such as defragmentation, index rebuilding etc. The defragmentation process executes up to the detected idleness time. After completing the idleness time, the defragmentation or other process may complete otherwise it is stopped again. Workload Management through Characterization and Idleness Detction 96 Chapter 7: Results and Discussion Appendix – Database Structure The chapter is dedicated for the results that are achieved after the implementation of the proposed and other workload management approach. The correctness and performance of the proposed approach is discussed theoretically in the last chapters. The objective of the chapter is to prove the proposed research by comparing it with other well-known approaches. Section 7.1 explains the experimental environment. The Section 7.2 calculates the wait time of the proposed and other workload management approaches to show the effectiveness. In order to show the effectiveness of the proposed research different workload scenarios are taken in these experiments. 7.1 Experimental Setup The proposed approach is implemented in VB.Net and all the experiments are executed on the machine having Intel Core 2 Duo 3.0 GHZ with 280 GB hard disk, main memory of 2 GB with operating system Windows 7 Ultimate 32-bit and MySQL server version 5.1. There are two databases at MySQL server, namely TPCH (1.03 GB) having 8 tables and TPCC as per specification of the Transaction Processing Council (TPC). The data in tables of these two databases are created through DBGen [TPC Benchmark, 2011] and Dbt2 tool [Database Test 2 (2011)]. DBGen is used to generate the TPC-H benchmark while Dbt2 is used for TPC-C benchmark to populate the workload or transactions executed by different threads from the client machines. The hardware for experimentation and validation of the proposed solution consists of the Database Server, Backup Server, six personal computers and a Switch as discussed in Chapter 3. All the devices are connected through a network. The MySql Database Management System is selected that is used as a data source. It is selected due to its freely availability and open source feature i.e. source code is available which can be analyzed and modified accordingly. MySql Server is installed on the server machine and MySql client on the client machines. The main database is stored on the server machine while backup server is responsible to maintain the Workload Management through Characterization and Idleness Detction 98 Chapter 7 Results and Discussion backup of the database. Each client machine has the ability to generate number of workloads at a time. The workload is generated through threads and sent to the database server for execution. The database is stored on the server as well as backup machine that also have various OLTP and DSS queries which are taken arbitrarily by the threads. In each experiment, the proposed method and three previously used workload management approaches are separately executed in same experimental environment. The results that are reported here are selected after their multiple executions. As explained in previous chapters, the experimental evaluation is conducted by executing the benchmark workloads that consists of OLTP and DSS like queries. All the comparisons are presented through the graphs that will show how much the proposed solution for the workload management in DBMSs is effective. 7.2 Results and Discussion To check the effectiveness of the proposed workload management technique, we have performed a comparison with other three most commonly used techniques which are First In First Out (FIFO), Priority Based (PB) workload management, and Shortest Job First (SJF). The comparison is based on the average waiting time that is taken by executing the various OLTP and DSS like workloads over these four scheduling techniques. FIFO technique works by observing the rule that the workload which encounter first will be executed first, SJF scheduling executes the requests starting from the smallest one to largest order where as in case of PB the requests with highest priority executes first even it takes the largest time. There are two main problems with the FIFO based workload management; firstly, the smallest requests have to wait for a long time if arrive after a long one, and secondly, it does not consider the priority or importance i.e. whether the request is generated from the higher management or simple from an operator. Same like FIFO, SJF considers all the requests at the same level and executes from smallest to longest request without considering its impact for the organization or how much it is important for the higher management. On the other hand, the PB workload management executes the requests on the basis of the priority only and the requests generated from the low management have to wait for a long time even small in size. Workload Management through Characterization and Idleness Detction 99 Chapter 7 Results and Discussion 7.2.1 Proposed Vs FIFO Based Workload Management In order to evaluate the performance of the proposed approach, number of experiments has been executed where the database workloads (consisting of the OLTP and DSS like queries) are executed by adopting the FIFO and proposed approach. Experiment 1 Consider the workload presented in Table 7.1 to Table 7.3 below where there are six (6) mix workloads are generated by the users having different priority. In this experiment, we have taken only those workloads that take 0 – 15 seconds in their completion. After the execution of these workloads, performance behavior is evaluated by observing the workload execution time and waiting time. 7.1 Comparative Results of the Proposed and FIFO Workload Management for Scenario 1 W w1 w2 w3 w4 w5 w6 Arrival order 2 4 1 3 6 5 Priority 2 6 2 0 1 3 OLTP % 0.473 0.85 0.34 0 0.23 0.1 DSS % 0.527 0.15 0.66 1 0.77 0. 9 OLT DSS Total P ET ET Time 3 6.4 9.4 4.2 2.1 6.3 2.7 8.2 10.9 0 7.2 7.2 0.2 6.9 7.1 0.03 6.2 6.23 Average Wait Time FIFO WT 10.9 27.5 0 20.3 40.03 33.8 22.1 PA Order 4 1 5 3 2 6 PA WT 20.6 0 30 13.4 6.3 40.9 18.53 7.2 Comparative Results of the Proposed and FIFO Workload Management for Scenario 2 W Arrival order Priority OLTP % w1 w2 w3 w4 w5 w6 3 6 1 2 5 4 2 6 2 0 1 3 0.473 0.85 0.34 0 0.23 0.1 DSS % 0.52 7 0.15 0.66 1 0.77 0. 9 OLTP ET DSS ET Total Time 1 6.4 7.4 2 2.2 4.2 1.6 8.2 9.8 0 7.2 7.2 0.2 6.9 7.1 0.03 4.1 4.13 Average Wait Time FIFO WT PA Order PA WT 17 35.63 0 9.8 28.53 24.4 19.2 4 1 5 3 2 6 18.5 0 25.9 11.3 4.2 35.7 15.9 Workload Management through Characterization and Idleness Detction 100 Chapter 7 Results and Discussion 7.3 Comparative Results of the Proposed and FIFO Workload Management for Scenario 3 W Arrival order Priority OLTP % w1 w2 w3 w4 w5 w6 1 2 5 3 6 4 2 6 2 0 1 3 0.473 0.85 0.34 0 0.23 0.1 DSS % 0.52 7 0.15 0.66 1 0.77 0. 9 OLTP ET DSS ET Total Time 1 6.4 7.4 2 2.9 4.9 1.6 8.2 9.8 0 6.3 6.3 0.2 5.1 5.3 0.03 4.1 4.13 Average Wait Time FIFO WT PA Order 0 7.4 22.73 12.3 32.53 18.6 15.6 4 1 5 3 2 6 PA WT 16.5 0 23.9 10.2 4.9 33.7 14.9 After execution of the OLTP and DSS like workloads, it is observed that our approach works better than the FIFO in two perspectives in all three scenarios which are presented in Table 7.1, 7.2, and 7.3. In scenario 1, the average waiting time of the proposed approach (18.53) is less than the FIFO (22.1); saving 16% time than the FIFO based workload management. In scenario 2 presented in Table 7.2, average waiting time of the proposed approach (15.9) is less than the FIFO (19.2). Here, again the proposed workload management approach takes 17% less time than the FIFO. In 3rd scenario, average waiting time of the approach (14.9) is less than the FIFO (15.6). Again, it takes 4% less time than the FIFO. If we take average of these three cases, the proposed workload management approach saves 12.3 % as compared to FIFO. Despite of minimum average waiting time, second perspective is the importance of the workload. The strategic decisions are made by the higher authorities in every organization, so it is necessary to give priority to their workloads and other workloads can be stopped for a while. The FIFO approach does not consider the priority or importance of the workload while our approach considers the importance. To understand above tables, we have demonstrated the results in the form of graphs shown in Figure 7.1. These graphs represent the waiting time of the proposed and FIFO technique based on three different scenarios. These scenarios portray that the proposed workload management approach is efficient than FIFO. Workload Management through Characterization and Idleness Detction 101 Chapter 7 Results and Discussion Figure 7.1 FIFO WT PA WT Waiting Time of the Proposed Vs FIFO Workload Management The Average waiting time of above three scenarios against the FIFO and proposed approach are calculated. These values are shown in Table 7.4 and their respected graph is shown in Figure 7.2 to illustrate the efficiency of the proposed workload management over the FIFO based approach. Workload Management through Characterization and Idleness Detction 102 Chapter 7 Results and Discussion 7.4 Average Waiting Time of the Proposed and FIFO Workload Management Scenario 1 2 3 Figure 7.2 FIFO AWT 22.1 19.2 15.6 PA AWT 18.53 15.9 14.9 FIFO AWT PA AWT Average Waiting Time of the Proposed Vs FIFO Workload Management Experiment 2 In this experiment, we are taking the workloads shown in Table 7.5 to 7.7 below where there are six (6) mix sub-workloads which are generated by the users having different priority. In this experiment, we have taken only those workloads that take 0 – 1500 seconds (i.e. less than 30 minutes) in their completion. After the execution of these workloads, performance behavior is evaluated by observing the workload execution time and waiting time. 7.5 Comparative Results of the Proposed and FIFO Workload Management for Scenario 1 W w1 w2 w3 w4 w5 w6 Arrival order 2 4 1 3 6 5 Priority 2 6 2 0 1 3 OLTP % 0.62 1 0.21 0 0.5 0.54 DSS % 0.38 0 0.79 1 0.95 0.46 OLTP DSS Total ET ET Time 105.2 312.4 417.6 24.2 0 24.2 30.12 750.03 780.15 0 711.91 711.91 0.13 556.12 556.25 102.34 201.2 303.54 Average Wait Time FIFO WT 780.15 1909.66 0 1197.75 2237.4 1933.86 1343.14 PA Order 4 1 5 3 2 6 PA WT 1292.36 0 1709.96 580.45 24.2 2490.11 1016.18 Workload Management through Characterization and Idleness Detction 103 Chapter 7 Results and Discussion 7.6 Comparative Results of the Proposed and FIFO Workload Management for Scenario 2 W w1 w2 w3 w4 w5 w6 Arrival order 3 6 1 2 5 4 Priority 2 6 2 0 1 3 OLT P% 0.57 0.92 0.002 0.01 0.08 0.79 DSS % 0.43 0.08 0.998 0.99 0.92 0.21 OLTP DSS Total ET ET Time 136.43 106.12 242.55 111.4 119 230.4 19.6 6062 6081.6 10.6 1092 1102.6 13.2 820.19 833.39 96.12 127 223.12 Average Wait Time FIFO WT 7184.2 8483.26 0 6081.6 7649.87 7426.75 6137.61 PA Order 4 1 5 3 2 6 PA WT 2166.39 0 2408.94 1063.79 230.4 8490.54 2393.34 7.7 Comparative Results of the Proposed and FIFO Workload Management for Scenario 3 W w1 w2 w3 w4 w5 w6 Arrival order 1 2 5 3 6 4 Priority 2 6 2 0 1 3 OLTP % 0.42 0.85 0.31 0.05 0.21 0.19 DSS % 0.58 0.15 0.69 0.95 0.79 0. 91 OLTP DSS Total ET ET Time 87.21 116.4 203.61 102 32 134 61.6 58.2 119.8 6.7 146.3 153 10.2 73.1 83.3 12.05 114.1 126.15 Average Wait Time FIFO WT 0 203.61 616.76 337.61 736.56 490.61 397.53 PA Order 4 1 5 3 2 6 PA WT 370.3 0 573.91 217.3 134 693.71 331.54 After execution of the OLTP and DSS like workloads, it is observed that our approach works better than the FIFO in two perspectives in all three scenarios which are presented in Table 7.5, 7.6, and 7.7. In scenario 1, the average waiting time of the approach (1016.18) is less than the FIFO (1343.14), so the proposed approach saves 32% time than the FIFO based workload management for this workload. In scenario 2 presented in Table 7.2, average waiting time of the proposed approach (2393.34) is less than the FIFO (6011.76). Here, again the proposed workload management approach takes 151% less time than the FIFO workload management. In 3rd scenario, average waiting time of the approach (331.54) is less than the FIFO (397.53). Again, our proposed approach takes 20% less time than the FIFO workload management. If we take the average of these three scenarios, the proposed workload management approach saves 110 % as compared to the FIFO. Despite of minimum average waiting time, second perspective is the importance of the workload. The strategic decisions are made by the higher authorities in every organization, so it is necessary to give priority to their workloads as compared to other workloads. As compared to the FIFO approach, our workload management approach also considers the workload importance. Workload Management through Characterization and Idleness Detction 104 Chapter 7 Results and Discussion To understand above tables, we have demonstrated the results in the form of graphs as shown in Figure 7.3. These graphs represent the waiting time of the proposed and FIFO technique based on three different scenarios. These scenarios portray that the proposed workload management approach is efficient than FIFO. Figure 7.3 FIFO WT PA WT Waiting Time of the Proposed Vs FIFO Workload Management Workload Management through Characterization and Idleness Detction 105 Chapter 7 Results and Discussion The Average waiting time of above three scenarios against the FIFO and proposed workload management approach are calculated. Their associated values are represented in Table 7.8 and graph is shown in Figure 7.4 to illustrate the efficiency of the proposed workload management. 7.8 Average Waiting Time of the Proposed and FIFO Workload Management Scenario 1 2 3 Figure 7.4 FIFO AWT 1016.18 2393.34 331.54 PA AWT 1343.14 6137.61 397.53 FIFO AWT PA AWT Average Waiting Time of the Proposed Vs FIFO Workload Management 7.2.2 Proposed Vs PB Workload Management The proposed workload management and priority based techniques are compared to check whether it works better or not. Experiment 1 Consider the workloads which are presented in Table 7.9 to Table 7.11 where there are six (6) mix workloads are generated. In this experiment, we have taken only those workloads that take 0 – 15 seconds in their completion. After the execution of these workloads, performance behavior is evaluated by observing the workload execution time and waiting time. Workload Management through Characterization and Idleness Detction 106 Chapter 7 Results and Discussion 7.9 Comparative Results of the Proposed and PB Workload Management for Scenario 1 W w1 w2 w3 w4 w5 w6 Arrival order 2 4 1 3 6 5 Priority 2 6 2 0 1 3 OLTP % 0.473 0.85 0.34 0 0.23 0.1 DSS % 0.527 0.15 0.66 1 0.77 0. 9 OLTP ET 3 4.2 2.7 0 0.2 0.03 DSS Total PB ET Time Order 6.4 9.4 3 2.1 6.3 6 8.2 10.9 4 7.2 7.2 1 6.9 7.1 2 6.2 6.23 5 Average Wait Time PB WT 14.3 40.83 23.7 0 7.2 34.6 20.1 PA WT 20.6 0 30 13.4 6.3 40.9 18.5 7.10 Comparative Results of the Proposed and PB Workload Management for Scenario 2 W w1 w2 w3 w4 w5 w6 Arrival order 3 6 1 2 5 4 Priority 2 6 2 0 1 3 OLTP % 0.473 0.85 0.34 0 0.23 0.1 DSS % 0.527 0.15 0.66 1 0.77 0. 9 OLTP ET 1 2 1.6 0 0.2 0.03 DSS Total PB ET Time Order 6.4 7.4 3 2.2 4.2 6 8.2 9.8 4 7.2 7.2 1 6.9 7.1 2 4.1 4.13 5 Average Wait Time PB WT 14.3 35.63 21.7 0 7.2 31.5 18.4 PA WT 18.5 0 25.9 11.3 4.2 35.7 15.9 7.11 Comparative Results of the Proposed and PB Workload Management for Scenario 2 W w1 w2 w3 w4 w5 w6 Arrival order 1 2 5 3 6 4 Priority 2 6 2 0 1 3 OLTP % 0.473 0.85 0.34 0 0.23 0.1 DSS % 0.527 0.15 0.66 1 0.77 0. 9 OLTP ET 1 2 1.6 0 0.2 0.03 DSS Total PB ET Time Order 6.4 7.4 3 2.9 4.9 6 8.2 9.8 4 6.3 6.3 1 5.1 5.3 2 4.1 4.13 5 Average Wait Time PB WT 11.6 32.93 19 0 6.3 28.8 16.4 PA WT 16.5 0 23.9 10.2 4.9 33.7 14.9 After the workload execution, it has been found that the proposed method works better than the PB workload management again as shown in Table 7.9, 7.10 and 7.11. In first scenario, average waiting time of the PA (18.5) is less than the PB approach (20.1), so proposed approach is better than PB as it saves 8% time. In second scenario which is shown in Table 7.10, average waiting time of the PA (15.9) is less than the PB (18.4). Here, again it takes 14% less time than the PB. In third scenario shown in Table 7.11, the average waiting time of the PA (14.9) while PB approach takes (16.4) and again in this case the proposed method takes 10 % less time than Workload Management through Characterization and Idleness Detction 107 Chapter 7 Results and Discussion PB. Now take the average of three scenarios, our approach saves 10.3% as compared to PB. The other important perspective of our research is the consideration of the workload importance as compared to the PB. In order to understand above tables easily we have demonstrated these results in following graphs shown in Figure 7.3. These graphs show the efficiency of our proposed scheduling approach. Workload Management through Characterization and Idleness Detction 108 Chapter 7 Results and Discussion Figure 7.5 PB WT PA WT Waiting Time of the Proposed Vs PB Workload Management The Average waiting time of above three cases against the PB and PA are represented in Table 7.8 and their respective graph is shown in Figure 7.4 to illustrate the efficiency of the PA over the PB. 7.12 Average Waiting Time of the Proposed and PB Workload Management Scenario 1 2 3 Figure 7.6 PB AWT 20.1 18.4 16.4 PA AWT 18.5 15.9 14.9 PB AWT PA AWT Average Waiting Time of PB Vs Proposed Approach Workload Management through Characterization and Idleness Detction 109 Chapter 7 Results and Discussion Experiment 2 Consider the workloads which are presented in Table 7.13 to Table 7.15 where there are six (6) mix workloads are generated. In this experiment, we have taken the workloads that take 0 – 1500 seconds in their completion. After the execution of these workloads, performance behavior is evaluated by observing the workload execution time and waiting time. 7.13 Comparative Results of the Proposed and PB Workload Management for Scenario 1 W w1 w2 w3 w4 w5 w6 Arrival order 2 4 1 3 6 5 Priority 2 6 2 0 1 3 OLT P% 0.62 1 0.21 0 0.55 0.54 DSS % 0.38 0 0.79 1 0.45 0.46 OLTP ET 105.2 24.2 30.12 0 256.13 102.34 DSS ET 312.4 0 750.0 711.9 300.1 201.2 Total PB Time Order 417.6 3 24.2 6 780.15 4 711.91 1 556.25 2 303.54 5 Average Wait Time PB WT PA WT 1268.16 2769.45 1685.76 0 711.91 2465.91 1483.53 1268.16 0 1709.96 1268.16 711.91 2769.45 1287.94 7.14 Comparative Results of the Proposed and PB Workload Management for Scenario 2 W w1 w2 w3 w4 w5 w6 Arrival order 3 6 1 2 5 4 Priority 2 6 2 0 1 3 OLT P% 0.57 0.92 0.002 0.77 0.68 0.79 DSS % 0.43 0.08 0.998 0.33 0.32 0.21 OLTP ET 136.43 111.4 19.6 610.6 420.2 96.12 DSS Total PB ET Time Order 106.12 242.55 3 109 220.4 6 6062 6081.6 4 492 1102.6 1 413.19 833.39 2 127 223.12 5 Average Wait Time PB WT PA WT 1935.99 8483.26 2178.54 0 1102.6 8260.14 3660.09 1935.99 0 2398.94 1935.99 1102.6 8483.26 2642.80 7.15 Comparative Results of the Proposed and PB Workload Management for Scenario 2 W w1 w2 w3 w4 w5 w6 Arrival order 1 2 5 3 6 4 Priority 2 6 2 0 1 3 OLT P% 0.42 0.85 0.31 0.35 0.21 0.19 DSS % 0.58 0.15 0.69 0.65 0.79 0. 91 OLTP ET 87.21 102 61.6 76.7 10.2 12.05 DSS ET 116.4 32 58.2 76.3 73.1 114.1 Total PB Time Order 203.61 3 134 6 119.8 4 153 1 83.3 2 126.15 5 Average Wait Time PB WT 236.3 685.86 439.91 0 153 559.71 345.80 PA WT 236.3 0 573.91 236.3 153 685.86 314.23 Workload Management through Characterization and Idleness Detction 110 Chapter 7 Results and Discussion After the workload execution, it has been found that the proposed method works better than the PB workload management again as shown in Table 7.13, 7.14 and 7.15. In first scenario, average waiting time of the PA (1287.94) is less than the PB approach (1483.53), so proposed approach is better than PB as it saves 15% time. In second scenario which is shown in Table 7.13, average waiting time of the proposed approach (2642.80) is less than the Priority Based workload management (3660.09). Here, again it takes 39% less time than the PB workload management. In third scenario shown in Table 7.15, the average waiting time of the PA (314.23) while PB approach takes (345.80) and again in this case the proposed method takes 10 % less time than PB. Now take the average of three scenarios, our approach saves 10.3% as compared to PB. The other important perspective of our research is the consideration of the workload importance as compared to the PB. In order to understand above tables, we have demonstrated these results in the graphs shown in Figure 7.3. These graphs show the efficiency of our proposed workload management approach as compare to the priority based. Workload Management through Characterization and Idleness Detction 111 Chapter 7 Results and Discussion Figure 7.7 PB WT PA WT Waiting Time of the Proposed Vs PB Workload Management The Average waiting time of above three cases against the proposed and priority based approach are presented in Table 7.16. Their respective graph is shown in Figure 7.8 to illustrate the efficiency of the PA over the PB. 7.16 Average Waiting Time of the Proposed and PB Workload Management Scenario 1 2 3 Figure 7.8 PB AWT 1483.53 3660.09 345.80 PA AWT 1287.94 2642.80 314.23 PB AWT PA AWT Average Waiting Time of PB Vs Proposed Approach Workload Management through Characterization and Idleness Detction 112 Chapter 7 Results and Discussion 7.2.3 Proposed Workload Management Vs SJF Here, the proposed and shortest job first workload management approach for the same workload as discussed previously. After the workload execution, it has been found that the average waiting time of PA is greater than the SJF as shown in Table 7.17, 7.18, and 7.19 as SJF gives higher priority to smallest workloads. In first scenario, the average waiting time of the PA (18.5) is greater than the SJF (15.9), so PA takes 14% more time than SJF. In second scenario shown in Table 7.18, average waiting time of the PA (15.9) is greater than SJF (12.7) and PA takes 21% more time. In third scenario as shown in Table 7.19, average waiting time of the PA takes 14.9 while SJF takes 12 and in this case PA takes 20% more time than SJF. After taking the average these three scenarios, proposed approach takes 18.3% more time than the SJF. However, SJF does not care about the importance of the workload which is very important factor for decision making. 7.17 Comparative Results of the Proposed and SJF Workload Management for Scenario 1 W Arrival order Priority OLTP % w1 2 2 0.473 w2 w3 w4 w5 w6 4 1 3 6 5 6 2 0 1 3 0.85 0.34 0 0.23 0.1 DSS % 0.52 7 0.15 0.66 1 0.77 0. 9 OLTP ET DSS ET Total Time SJF Order SJF WT PA Order PA WT 3 6.4 9.4 5 26.83 4 20.6 6.23 36.23 13.4 12.53 0 15.9 1 5 3 2 6 0 30 13.4 6.3 40.9 18.5 4.2 2.7 0 0.2 0.03 2.1 6.3 2 8.2 10.9 6 7.2 7.2 4 6.9 7.1 3 6.2 6.23 1 Average Wait Time Comparative Results of the Proposed and SJF Workload Management for Scenario 2 W Arrival order Priority OLTP % DSS % OLTP ET w1 w2 w3 w4 w5 w6 3 6 1 2 5 4 2 6 2 0 1 3 0.473 0.85 0.34 0 0.23 0.1 0.527 0.15 0.66 1 0.77 0. 9 1 2 1.6 0 0.2 0.03 DSS ET Total Time SJF Order SJF WT 6.4 7.4 5 2.2 4.2 2 8.2 9.8 6 7.2 7.2 4 6.9 7.1 3 4.1 4.13 1 Average Wait Time 22.63 4.13 30.03 11.3 8.33 0 12.7 PA Orde r 4 1 5 3 2 6 PA WT 18.5 0 25.9 11.3 4.2 35.7 15.9 Workload Management through Characterization and Idleness Detction 113 Chapter 7 Results and Discussion 7.18 Comparative Results of the Proposed and SJF Workload Management for Scenario 3 W Arrival order Priority OLTP % w1 1 2 0.473 w2 w3 w4 w5 w6 2 5 3 6 4 6 2 0 1 3 0.85 0.34 0 0.23 0.1 DSS % 0.52 7 0.15 0.66 1 0.77 0. 9 OLTP ET DSS ET Total Time SJF Order SJF WT PA Orde r PA WT 1 6.4 7.4 5 20.63 4 16.5 4.13 28.03 10.2 9.03 0 12.0 1 5 3 2 6 0 23.9 10.2 4.9 33.7 14.9 2 1.6 0 0.2 0.03 2.9 4.9 2 8.2 9.8 6 6.3 6.3 4 5.1 5.3 3 4.1 4.13 1 Average Wait Time In order to understand above tables results achieved from both the scheduling schemes are represented by the graphs which are shown in Figure 7.5. These graphs show that the proposed approach is inefficient than SJF but it is due to the reason that SJF only considers the size of the workload but not importance of the workload. Due to this reason, SJF is not suitable for the DBMS environment otherwise higher management has to wait for a long time in decision making. Workload Management through Characterization and Idleness Detction 114 Chapter 7 Results and Discussion Figure 7.9 PB WT PA WT Waiting Time of the Proposed Approach Vs SJF Workload Management Average waiting time of above three cases against the SJF and PA are represented in Table 7.12 and their respected graph is shown in Figure 7.6. 7.19 Average Waiting Time of the Proposed and SJF Workload Management Scenario 1 2 3 Figure 7.10 PA AWT 18.5 15.9 14.9 SJF AWT 15.9 12.7 12 FIFO AWT PA AWT Average Waiting Time of SJF Vs Proposed Approach Workload Management through Characterization and Idleness Detction 115 Chapter 7 Results and Discussion Summary The results are discussed in that are achieved after the implementation of the proposed and other well known workload management approach. The objective of the chapter is to prove the proposed research by comparing it with other well-known approaches. The experimental environment is discussed and calculated the wait time of proposed and other workload management techniques to show the effectiveness of the proposed research. In order to show the effectiveness of the proposed research, different workload scenarios are taken in these experiments (their execution time is in the range of 0 to 30 minutes). Workload Management through Characterization and Idleness Detction 116 Chapter 7 Results and Discussion Conclusion The research proposed a solution for managing the workload in DBMSs that makes decision about the workload execution with identifying the OLTP and DSS percentage. The workload Type is identified through the CBR approach while OLTP and DSS percentage is determined by using the Fuzzy Logic. The workload parameters are identified that are more effective for the workload characterization and classification. After the identification of the workload type, we have calculated the percentage of the OLTP and DSS. Once the OLTP and DSS percentage is known, the entire workload is arranged for better execution. Results achieved after experimenting over the OLTP and DSS like workload reveals that the proposed characterization and scheduling techniques are proved to be effective and less time consuming with consideration of the workload size and importance. The study starts with the introduction and background about the workload management, AI techniques, autonomic computing, problem statement, research questions, research objectives, applications and contribution. Existing database workload management work is discussed that are related with the workload classification, characterization and scheduling. This discussion also highlighted the problems which exist in the previous work. After the discussion of the workload background and related work, the methodology or brief steps of the proposed research are explained with the experimental setup. The proposed solution consists of the characterization, scheduler and idleness detector. The workload characterization is achieved by executing the OLTP and DSS like workload; values of the related parameters are taken and classification is performed through CBR. However, a problem with the CBR based characterization is that it cannot be used for workload management as workload is classified during or after its complete execution. So, there is a need of the solution which can classify the workload before its execution with finding the OLTP and DSS percentage of each workload. The percentage of the OLTP and DSS in workload is calculated by using the Fuzzy Logic that consists of the Fuzzification, Fuzzy Rules and Defuzzification steps. The scheduling is performed on the basis of workload importance and percentage of the DSS and OLTP. Finally, the CPU idleness of the incoming workload is identified by searching the exact or nearest (almost equal) workload from the CBR. Workload Management through Characterization and Idleness Detction 117 Chapter 7 Results and Discussion After getting the exact or nearest match, idleness of that workload is taken and that period is used for de-fragmentation or any other process. The proposed approach is validated through analyses and experiments. Effectiveness of the proposed research is experimentally evaluated and the efficiency in terms of the waiting time is compared with other workload management techniques. The chapter also provides a comparison with the other well-known workload management schemes. The results obtained from the approach are compared with the FIFO, Priority Based, and Smallest Job First (SJF). All the results are shown in tables and graphs that reveal that the proposed approach works better in all the scenarios. Workload Management through Characterization and Idleness Detction 118 Chapter 7 Results and Discussion Future Work In future this type of research can be performed on the other DBMSs that will be helpful in improving these DBMSs. In the current research, human intervention in the previous workload management is identified and reported; in future this human intervention can be removed and the autonomic level of the DBMSs can be improved. The proposed research is also the basis for improving the workload management task in Data Warehouses. For this purpose, the intelligent algorithms, techniques and models can be introduced to minimize the human intervention. Workload Management through Characterization and Idleness Detction 119 Appendix – Database Structure References Workload Management through Characterization and Idleness Detction 120 References Chandramouli B, Christopher N. Bond, B. Shivnath, Y. Jun, (2007). On Suspending and Resuming Dataflows. ICDE, pp. 1289–1291. Chandramouli B, Bond CN, Babu S, Yang J (2007). Query Suspend and Resume. Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 557-568. Dageville B, Das D, Dias K, Yagoub K, Zait M, Ziauddin M (2004). Automatic SQL tuning in Oracle 10g. Proceedings of the 30th International VLDB Conference on: 1098–1109. Database Test 2 (2011), Database Test Suite. http://osdldbt.sourceforge.net/ Deza M.M., Deza E., (2009), Encyclopedia of Distances, ISBN 978-3-642-00233-5/hbk; ISBN 978-3642-00234-2/ebook, Springer. Danielsson P. (1980), Euclidean Distance Mapping, Computer Graphics and Image Processing, pp.227248. Elnaffar S, Martin P, Horman R, (2002). Automatically Classifying Database Workloads. Proceedings of the CIKM’02, Washington, DC. Elnaffar S, Martin P (2004). An Intelligent Framework for Predicting Shifts in the Workloads of Autonomic Database Management Systems. Proceedings of the IEEE International Conference on Advances in Intelligent Systems. Holze M., Ritter N, (2008). Autonomic databases: Detection of workload shifts with n-gram-models, LNCS, Vol. 5207, Springer, Heidelberg, pp. 127–142. Horn (2001). Autonomic Computing: IBM’s Perspective on the State of Information Technology. IBM Journal Paper. Huebscher MC, McCann JA (2008). A Survey of Autonomic Computing-Degrees, Models and Applications. ACM 0360-0300/2008/08-ART7. Hopfield J, (1982). Neural Networks and Physical Systems with Emergent Collective Computational Abilities, Proceedings of the National Acad. Science, Vol. 79, pp. 2554–2558, USA. IBM Corporation (2003). DB2 Query Patroller Guide: Installation, Administration, and Usage. IDC Competitive Analysis: Worldwide RDBMS 2005 Vendor Shares: Preliminary Results for the Top 5 Vendors Show Continued Growth. http://www.oracle.com/corporate/analyst/reports/infrastructure/dbms/idc-201692.pdf Krompass S, Scholz A, Albutiu MC, Kuno H, Wiener J, Dayal U, Kemper A (2008). Quality of Service Enabled Management of Database Workload. IEEE Database Engineering Bulletin (DEBU). Vol. 31(1): 20–27. Workload Management through Characterization and Idleness Detction 121 References Krompass S, Kuno H, Dayal U, Kemper A (2007). Dynamic Workload Management for Very Large Data Warehouses: Juggling Feathers and Bowling Balls. In Proceedings of the 33rd International. Conference on Very Large Data Bases (VLDB). Krompass S, Kuno H, Wiener J, Wilkinson K, Dayal U, Kemper A (2009). Managing Long-Running Queries. In the Proceedings of the 12th International Conference on Extending Database Technology (EDBT). Lightstone S, Lohman G, Zilio D (2002). Toward Autonomic Computing with DB2 Universal Database. SIGMOD. Vol. 31(3), pp. 55–61. Liu F, Zhao Y, Wang W, Andmakaroff D (2004). Database Server Workload Characterization in an ECommerce Environment. In Proceedings of the IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS), pp.475–483. Mateen A, Raza B, Hussain T, Awais MM (2008). Autonomic Computing in SQL Server. 7th International Conference on Computer and Information Science, pp. 113–118. Mehta A, Gupta C, Wang S, Dayal U (2009). rFEED: A Mixed Workload Scheduler for Enterprise Data Warehouses. ICDE, pp. 1455–1458. Mehta A, Gupta C, Dayal U (2008). BI Batch Manager: A System for Managing Batch Workloads on Enterprise Data Warehouses. International Conference on Extended Database Technology (EDBT), pp.640–651. Mehta A, Gupta C, Wang S, Dayal U (2008). Automated Workload Management for Enterprise Data Warehouses”. Bulletin of the IEEE Computer Society Technical Committee on Data Engineering. Menasce DA, Almeida VAF, Fonseca R, Mendes MA (1999). A Methodology for Workload Characterization of E-commerce Sites. Proceedings of the 1st ACM Conference on Electronic Commerce, pp. 119–128. Mumtaz A, Ashraf A, Shivnath B (2008). Modeling and Exploiting Query Interactions in Database Systems. CIKM. pp. 183–192. Mumtaz A, Ashraf A, Shivnath B, Munagala K (2008). QShuffler: Getting the Query Mix Right. ICDE, pp. 1415–1417. Mumtaz A, Ashraf A, Shivnath B (2009). Query interactions in database workloads, DBTest. Niu B, Martin P, Powwley W, Horman R, Bird P (2006). Workload adaptation in autonomic DBMSs. Proceedings of the Center for Advanced Studies on Collaborative Research Conference (CASCON’06). ACM Press, 13. Niu B (2008). Workload adaptation in autonomic database management systems, PhD Thesis, Queen's University, Canada. Workload Management through Characterization and Idleness Detction 122 References Niu B, Martin P, Powley W, Bird P (2007). Quantifying Workload Importance. 4th International Workshop on Engineering Autonomic Software Systems, Canada. Niu B, Martin P, Powley W, Bird P, Horman R (2007). Adapting Mixed Workloads to Meet SLOs in Autonomic DBMSs. Proceedings of the 2nd International Workshop on Self-Managing Database Systems (SMDB) in Conjunction with the 23rd International Conference on Data Engineering (ICDE), Istanbul, Turkey. Pang H H, Carey MJ, Livny M (1995). Multiclass Query Scheduling in Real-Time Database Systems. IEEE Transactions on Knowledge and Data Engineering. Vol. 7(4), pp. 533–551. Parashar M, Hariri S (2005). Autonomic Computing: An Overview. Springer-Verlag Berlin Heidelberg: 247–259. Rhee A, Chatterjee S, Lahiri T (2001). The Oracle Database Resource Manager: Scheduling CPU Resources at the Application. High Performance Transaction System Workshop. Stuart R, Peter N. (2008). Artificial Intelligence. A Modern Approach, 2nd Edition, ISBN No. 81-7758367-0. Schroeder B, Harchol-Balter M, Iyengar A, Nahum EM, Wierman A (2006). How to Determine a Good Multi-Programming Level for External Scheduling. ICDE, pp. 60–71. Roger S (1982). Dynamic Memory: A Theory of Learning in Computers and People. New York: Cambridge University Press. Surajit C, Raghav K, Ravishankar R (2005). When Can We Trust Progress Estimators For SQL Queries?. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 575–586. Surajit C, Raghav K, Ravishankar R, Abhijit P (2007). Stop-and-Restart Style Execution for Long Running Decision Support Queries. VLDB, pp. 735–745. TPC Benchmark (2011), TM Standard Specification Revision 2.6.2. http://www.tpc.org Wasserman T J, Martin P, Skillicorn D B, Rizvi H (2004). Developing a Characterization of Business Intelligence Workloads for Sizing New Database Systems. In Proceedings of the 7th ACM International Workshop on Data Warehousing and OLAP, ACM Press, pp. 7–13. White S R, Hanson J E, Whalley I, Chess D M, Kephart J O (2004). An Architectural Approach to Autonomic Computing. Proceeding of the International Conference on Volume, Autonomic Computing, pp. 2–9. Yu PS, Chen M, Heiss H, Lee S (1992). On Workload Characterization of Relational Database Environments. Software Engineering. Vol. 18(4), pp. 347–355. Zadeh L, (1965). Fuzzy Sets and Systems, Fox Journal, editor. System Theory, Brooklyn, NY: Polytechnic Press, pp. 29–39. Workload Management through Characterization and Idleness Detction 123 References Zewdu Z, Denko M K, Libsie M (2009). Workload Characterization of Autonomic DBMSs Using Statistical and Data Mining Techniques. AIN, pp. 244–249. Workload Management through Characterization and Idleness Detction 124 References Appendix Database Structure Workload Management through Characterization and Idleness Detction 125 Appendix – Database Structure Database and Relevant Table Creation Database Creation CREATE DATABASE Workload_Database; use Workload_Database; Tables Creation create table nation ( n_nationkey decimal(3,0) not null, n_name char(25) not null, n_regionkey decimal(2,0) not null, n_comment varchar(152) ); create table region ( r_regionkey decimal(2,0) not null, r_name char(25) not null, r_comment varchar(152) ); create table part ( p_partkey decimal(10,0) not null, p_name varchar(55) not null, p_mfgr char(25) not null, p_brand char(10) not null, p_type varchar(25) not null, p_size decimal(2,0) not null, p_container char(10) not null, p_retailprice decimal(6,2) not null, p_comment varchar(23) not null ); create table supplier ( s_suppkey decimal(8,0) not null, s_name char(25) not null, s_address varchar(40) not null, s_nationkey decimal(3,0) not null, s_phone char(15) not null, s_acctbal decimal(7,2) not null, s_comment varchar(101) not null ); Workload Management through Characterization and Idleness Detction 126 References create table partsupp ( ps_partkey decimal(10,0) not null, ps_suppkey decimal(8,0) not null, ps_availqty decimal(5,0) not null, ps_supplycost decimal(6,2) not null, ps_comment varchar(199) not null ); create table customer ( c_custkey decimal(9,0) not null, c_name varchar(25) not null, c_address varchar(40) not null, c_nationkey decimal(3,0) not null, c_phone char(15) not null, c_acctbal decimal(7,2) not null, c_mktsegment char(10) not null, c_comment varchar(117) not null ); create table orders ( o_orderkey decimal(12,0) not null, o_custkey decimal(9,0) not null, o_orderstatus char(1) not null, o_totalprice decimal(8,2) not null, o_orderdate date not null, o_orderpriority char(15) not null, o_clerk char(15) not null, o_shippriority decimal(1,0) not null, o_comment varchar(79) not null ); create table lineitem ( l_orderkey decimal(12,0) not null, l_partkey decimal(10,0) not null, l_suppkey decimal(8,0) not null, l_linenumber decimal(1,0) not null, l_quantity decimal(2,0) not null, l_extendedprice decimal(8,2) not null, l_discount decimal(3,2) not null, l_tax decimal(3,2) not null, l_returnflag char(1) not null, l_linestatus char(1) not null, l_shipdate date not null, l_commitdate date not null, Workload Management through Characterization and Idleness Detction 127 References l_receiptdate date not null, l_shipinstruct char(25) not null, l_shipmode char(10) not null, l_comment varchar(44) not null ); Load Data to Tables LOAD DATA INFILE "D://implementation//DBGen//region.tbl" INTO TABLE region FIELDS TERMINATED BY "|" LINES TERMINATED BY "\r\n" (r_regionkey, r_name, r_comment); LOAD DATA INFILE "D://implementation//DBGen//nation.tbl" INTO TABLE nation FIELDS TERMINATED BY '|' LINES TERMINATED BY '\r\n'; LOAD DATA INFILE "D://implementation//DBGen//supplier.tbl" INTO TABLE supplier FIELDS TERMINATED BY '|' LINES TERMINATED BY '\r\n'; LOAD DATA INFILE "D://implementation//DBGen//part.tbl" INTO TABLE part FIELDS TERMINATED BY '|' LINES TERMINATED BY '\r\n'; LOAD DATA INFILE "D://implementation//DBGen//partsupp.tbl" INTO TABLE partsupp FIELDS TERMINATED BY '|' LINES TERMINATED BY '\r\n'; LOAD DATA INFILE "D://implementation//DBGen//customer.tbl" INTO TABLE customer FIELDS TERMINATED BY '|' LINES TERMINATED BY '\r\n'; LOAD DATA INFILE "D://implementation//DBGen//orders.tbl" INTO TABLE orders FIELDS TERMINATED BY '|' LINES TERMINATED BY '\r\n'; LOAD DATA INFILE "D://implementation//DBGen//lineitem.tbl" INTO TABLE lineitem FIELDS TERMINATED BY '|' LINES TERMINATED BY '\r\n'; Creation of Primary Key ALTER TABLE region ADD CONSTRAINT pkey_region PRIMARY KEY(r_regionkey); ALTER TABLE nation ADD CONSTRAINT pkey_nation PRIMARY KEY(n_nationkey); ALTER TABLE part ADD CONSTRAINT pkey_part PRIMARY KEY(p_partkey); ALTER TABLE supplier ADD CONSTRAINT pkey_supplier PRIMARY KEY(s_suppkey); ALTER TABLE partsupp ADD CONSTRAINT pkey_partsupp PRIMARY KEY(ps_partkey,ps_suppkey); ALTER TABLE customer ADD CONSTRAINT pkey_customer PRIMARY KEY(c_custkey); ALTER TABLE lineitem ADD CONSTRAINT pkey_lineitem PRIMARY KEY(l_orderkey,l_linenumber); ALTER TABLE orders ADD CONSTRAINT pkey_orders PRIMARY KEY(o_orderkey); Workload Management through Characterization and Idleness Detction 128 References Creation of Primary Key create index fkey_nation_1 on nation(n_regionkey); create index fkey_supplier_1 on supplier(s_nationkey); create index fkey_customer_1 on customer(c_nationkey); create index fkey_partsupp_1 on partsupp(ps_suppkey); create index fkey_partsupp_2 on partsupp(ps_partkey); create index fkey_orders_1 on orders(o_custkey); create index fkey_lineitem_1 on lineitem(l_orderkey); create index fkey_lineitem_2 on lineitem(l_partkey,l_suppkey); create index fkey_lineitem_3 on lineitem(l_suppkey); create index xxx1 on lineitem(l_shipdate); create index xxx2 on customer(c_mktsegment); create index xxx3 on orders(o_orderdate); create index xxx4 on region(r_name); create index xxx5 on lineitem(l_discount); create index xxx6 on lineitem(l_quantity); create index xxx7 on lineitem(l_returnflag); create index xxx8 on lineitem(l_shipmode); create index xxx9 on lineitem(l_commitdate); create index xxx10 on lineitem(l_receiptdate); create index xxx11 on lineitem(l_partkey); create index xxx12 on part(p_size); create index xxx13 on part(p_type); create index xxx14 on partsupp(ps_supplycost); create index xxx15 on nation(n_name); create index xxx16 on part(p_name); create index xxx17 on orders(o_clerk); create index xxx18 on part(p_brand); create index xxx19 on part(p_container); Workload Management through Characterization and Idleness Detction 129 References Table Structure After the execution of the above statements, the following tables are created with the associative primary and secondary keys: CUSTOMER Table LineItem Table Workload Management through Characterization and Idleness Detction 130 References NATION Table ORDERS Table PART Table Workload Management through Characterization and Idleness Detction 131 References PARTSUPP Table REGION Table SUPPLIER Table Workload Management through Characterization and Idleness Detction 132
© Copyright 2025