White Paper
Intel® Service Assurance Administrator

Meet Cloud Performance Goals with Intel® Service Assurance Administrator
Maintain consistent performance levels to meet enterprise virtual workload requirements in the cloud

Overview
Enterprise IT is turning to the cloud to meet demands ranging from faster service provisioning to lower costs. But shared cloud environments and service level agreements (SLAs) bring new management challenges. This white paper examines better methods for providing service level objectives for cloud virtual machines (VMs), detecting noisy neighbors, and delivering business value in OpenStack*-based private cloud environments. Intel® Service Assurance Administrator (Intel® SAA) provides essential software tools for OpenStack, enabling creation of a software-defined infrastructure with enhanced service level objectives.

Contents
Overview
Deploying Cloud Workloads: The Challenges
Managing Cloud Workloads: The Solution
Performance Assurance: Why Service Compute Units Matter
Intel SAA Advantages
Intel SAA Use Cases
How Intel SAA Works
Shared Resources, Performance Contention, and Noisy Neighbors
Conclusion

Deploying Cloud Workloads: The Challenges

OpenStack Scheduling Today (Before Implementing Intel® SAA)
Cloud computing datacenters provide computational services and resources such as data processing, networking, and storage. Typically they run large numbers of servers arranged in groups, each controlled by a group controller. The workloads handled by the datacenter tap the server hardware resources to varying extents. End users of such datacenter services expect consistent workload performance, measurable by industry metrics and standard benchmarks. Besides SPEC* benchmarks, UnixBench* and specialized usage workloads are often used to measure the performance of a virtual infrastructure instance allocated on an Intel or other CPU. Benchmarks developed by end users measure compute and I/O performance to inform the build-out of virtual infrastructure for the end user application.

In public Infrastructure-as-a-Service (IaaS) environments, resources are offered as a number of cores or vCPUs. In OpenStack, for example, any of the available scheduler filters can be used to shortlist the servers suited to a workload; the compute filters for capabilities, capacity, and similar attributes let the scheduler place a workload on the system best able to serve it (a minimal sketch of this filter-and-sort flow appears below).

A cloud end user runs applications whose workloads predominantly use either memory and data storage or the server processors. In a static workload pattern, consistent performance is a must, or performance will degrade over time; with no knowledge of these workload patterns, the initial sizing of the infrastructure becomes critical. In a dynamically changing environment, an elastic approach that scales capacity up or down based on performance requirements and assurance must be built in.
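The sketch below is illustrative only, not OpenStack or Intel SAA code: it shows the filter-and-sort flow described above, dropping hosts that lack a required capability or sufficient capacity and ordering the survivors by a simple fitness weight. The host fields, the hypothetical SCU capacity values, and the fitness function are assumptions used to make the flow concrete.

```python
# Illustrative filter-and-weigh pass over candidate hosts, mirroring how an
# OpenStack-style scheduler shortlists servers by capability and capacity.
# The "scu" fields are hypothetical placeholders for a capacity metric.
from dataclasses import dataclass
from typing import List

@dataclass
class Host:
    name: str
    vcpus_free: int
    ram_free_gb: int
    scu_free: float          # hypothetical SCU headroom left on the host
    has_txt: bool            # capability flag (e.g., Intel TXT)

@dataclass
class Request:
    vcpus: int
    ram_gb: int
    scu_needed: float        # SCUs per vCPU x vCPU count, from the flavor SLO
    require_txt: bool

def passes(host: Host, req: Request) -> bool:
    """Capability and capacity filter: drop hosts that cannot serve the request."""
    if req.require_txt and not host.has_txt:
        return False
    return (host.vcpus_free >= req.vcpus and
            host.ram_free_gb >= req.ram_gb and
            host.scu_free >= req.scu_needed)

def fitness(host: Host, req: Request) -> float:
    """Simple 'fitness' weight: prefer hosts with the most headroom left after
    placement, which keeps noisy-neighbor pressure low."""
    return host.scu_free - req.scu_needed

def shortlist(hosts: List[Host], req: Request) -> List[Host]:
    candidates = [h for h in hosts if passes(h, req)]
    return sorted(candidates, key=lambda h: fitness(h, req), reverse=True)

if __name__ == "__main__":
    hosts = [
        Host("node-1", vcpus_free=16, ram_free_gb=64, scu_free=10.0, has_txt=True),
        Host("node-2", vcpus_free=4, ram_free_gb=32, scu_free=3.0, has_txt=False),
    ]
    req = Request(vcpus=4, ram_gb=24, scu_needed=8.0, require_txt=False)
    print([h.name for h in shortlist(hosts, req)])  # ['node-1']
```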
vCPU vs. SCU: Getting Accurate Performance Measurements
A virtual CPU (vCPU) is a virtual processor allocated to a virtual machine and backed by a physical CPU. Gartner sums up the difficulty of obtaining accurate performance measurements based on the vCPU: "AWS is a massive scale cloud provider, with a wide variety mix of servers and processor architectures in existence. Therefore, two instances each with 2 vCPU will not necessarily be equivalent. One instance could reside on top of a 2012-based processor while the other could reside on top of a 2014-based processor. Many people have written about the fact that EC2 processor architecture varies across instance types and across regions, even those described as having the 'same specs.'"2

A service compute unit (SCU) provides a uniform number of compute units for performance assurance as a defined and measurable service level objective (SLO). These units are part of the machine flavor definitions. With Intel's SCU, IT has a measure of server capacity and accurate metrics for assessing performance on Intel-based processors. (An illustrative sketch of such a normalized compute unit appears at the end of this section.)

Cloud Services Catalog
Service       Server                          Data Center   Test Score
Amazon EC2    ec2-us-west.linux.c1.medium     CA, US        3.44
Amazon EC2    ec2-us-west.linux.c1.xlarge     CA, US        6.12
Amazon EC2    ec2-us-west.linux.m1.large      CA, US        3.37
Amazon EC2    ec2-us-west.linux.m1.small      CA, US        N/A
Amazon EC2    ec2-us-west.linux.m1.xlarge     CA, US        4.18
Amazon EC2    ec2-us-west.linux.m2.2xlarge    CA, US        12.99
Amazon EC2    ec2-us-west.linux.m2.4xlarge    CA, US        22.81
Table 1. Cloud Harmony, March 14 "Pi Day," aggregate CPU performance metric.

It is necessary to know the end user workload requirements and to categorize them as static patterns (e.g., batch-mode applications) or dynamically changing patterns (e.g., interactive workloads) in order to arrive at the application performance requirements for the workload. Whether the pattern is static or dynamic, the user has no means to specify a unit of required compute performance that is measurable and portable. Likewise, a service provider currently has no means to commit to a performance SLA that stays constant across platform generations, sites, and varying power and thermal conditions.

Data presented in published research1 shows that performance predictability in Amazon EC2 varies between 10% and 35%: there is a small variation within the same processor type, and a larger variation across different processor types for the same instance request. In Table 1, we see 1% to 17% variation in CPU performance at the same location for the same instance type.

Once the virtual machine and image are selected, the VM is assigned to bare-metal hardware in a rack (Figure 1).

Intel® Service Assurance Administrator (Intel® SAA)
Intel® Service Assurance Administrator (Intel® SAA) is service management software. It is targeted at environments where enterprise IT administrators want to:
• Start an on-premise enterprise cloud initiative to run workloads on virtualized and cloud infrastructure with performance service level objective (SLO) guarantees
• Explore integrated hardware and software cloud system solutions to reduce time to deploy

Intel SAA maximizes the capabilities of Intel® Xeon®-based platforms to enable a virtual machine contention detection solution via scheduler models, providing assured performance with noisy neighbor detection as well as the ability to port workloads between datacenters without compromising performance.
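The paper does not publish the formula behind the SCU, so the sketch below only illustrates the idea of a generation-independent compute unit: normalize each host's measured per-core throughput against a fixed reference score so that one unit corresponds to the same amount of measured work on any platform. The reference score, function names, and sample numbers are assumptions, not Intel's definition.

```python
# Illustrative normalization only; Intel's actual SCU formula is not given here.
REFERENCE_SCORE_PER_UNIT = 10.0   # hypothetical benchmark score defined as 1 unit

def units_per_core(measured_score_per_core: float) -> float:
    """Convert a measured per-core benchmark score into normalized compute units."""
    return measured_score_per_core / REFERENCE_SCORE_PER_UNIT

def units_per_host(measured_score_per_core: float, physical_cores: int) -> float:
    """Total normalized capacity of a host, independent of CPU generation."""
    return units_per_core(measured_score_per_core) * physical_cores

# A newer generation scores higher per core, so it exposes more units per host,
# but one unit still represents the same measured throughput on either host.
older = units_per_host(measured_score_per_core=18.0, physical_cores=8)    # 14.4 units
newer = units_per_host(measured_score_per_core=26.0, physical_cores=12)   # 31.2 units
print(round(older, 1), round(newer, 1))
```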
Figure 1. End user service flow: cloud services customers submit service requests with service level objectives and choose machine flavors and applications; the cloud scheduler, extended with the Service Assurance Administrator plug-in, places machines on the datacenter infrastructure (SDI cloud machines) to match resource needs and service management objectives, supported by service assurance functions for monitoring, usage tracking, reporting, diagnostics, and capacity planning.

Managing Cloud Workloads: The Solution
On server platforms there are partitionable shared resources: resources can be partitioned between threads in the same core and between different cores in the same CPU. There are also multiple generations of CPUs and instruction sets. To provide consistent performance on any Intel® server platform, Intel created the service compute unit (SCU). Part of Intel SAA, the SCU is a consistent unit of measurement for use by the server and its VMs. With the SCU, the service provider can monitor virtual machine SLOs in the form of Intel SAA units.

Performance Assurance: Why Service Compute Units Matter
In Intel SAA, an SLO is defined for each virtual instance as a particular number of SCUs, which is multiplied by the allocated virtual cores to give the "allowed" total consumption for the VM. For example, if instance flavors and SLOs are defined as in Table 2, the SCU SLO management engine contains:
1. A compute consumption metric based on the CPU's ability to execute instructions.
2. A compute performance metric to qualify the SCUs consumed.

Instance Type and RAM   Sample SLO (Virtual Cores x SCUs)   Level
Extra large + 30 GB     4 virtual cores with 4 SCUs         High
Extra large + 15 GB     4 virtual cores with 2 SCUs         Medium
Table 2. Sample SLOs.

Each generation provides increasing performance throughput, based on more SCUs. One SCU signifies the same amount of performance in any generation of an Intel Xeon CPU: 1 SCU on Sandy Bridge (SNB) is equal to 1 SCU on Haswell (HSX). However, because each new CPU generation improves in performance, it offers a greater number of SCUs.

Figure: SCU growth per generation. Comparison of the total number of SCUs per CPU generation (Nehalem, Westmere, Sandy Bridge, Haswell) and the corresponding performance, showing the impact of the greater number of SCUs per CPU generation.

Intel® SAA Advantages
Intel SAA is a service management framework built on and integrated into the OpenStack cloud management environment. It brings advanced management capabilities to private cloud enterprise deployments, including:
1. Creating a service management environment to define service level objectives (SLOs) and "tether" them to OpenStack flavors prior to launching the workload; Intel SAA defines where the SCU is integrated into an application (the flavor arithmetic from Table 2 is sketched after this list).
2. Building an OpenStack list of compatible hosts able to run the image, based on capacity as measured with the SCU metric and on system characteristics, and sorting hosts by "fitness" (a metric for prioritizing hosts).
3. Matching workloads to platforms based on capability, capacity, and SLO.
4. Monitoring allocated machines and workloads using deep platform telemetry to detect performance bottlenecks, noisy neighbors, and SLO violations, so the cloud administrator can remediate performance issues.
5. Assigning workloads to run on secure servers with Intel® Trusted Execution Technology (Intel® TXT)3, providing hardware trust assurance with boot attestation and whitelisting for trusted pools.
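As a brief illustration of the rule above (allowed consumption equals virtual cores multiplied by the SCUs in the SLO), the sketch below encodes the two Table 2 flavors and computes each VM's total SCU budget. The data mirrors Table 2; the dictionary layout and helper name are illustrative, not an Intel SAA interface.

```python
# Table 2 flavors and the "allowed consumption" rule described in the text:
# total SCU budget = virtual cores x SCUs per virtual core.
FLAVORS = {
    # name:             (RAM GB, virtual cores, SCUs per virtual core, SLO level)
    "extra-large-high":   (30, 4, 4, "High"),
    "extra-large-medium": (15, 4, 2, "Medium"),
}

def allowed_consumption_scus(flavor_name: str) -> int:
    """Total SCU budget a VM of this flavor is allowed to consume."""
    _ram_gb, vcores, scus_per_core, _level = FLAVORS[flavor_name]
    return vcores * scus_per_core

for name in FLAVORS:
    print(name, "->", allowed_consumption_scus(name), "SCUs")
# extra-large-high -> 16 SCUs
# extra-large-medium -> 8 SCUs
```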
Intel® SAA Use Cases
• Create VM flavors: A flavor is an available hardware configuration for a server. The SCU is used to create VM flavors for the end user to choose from (Table 2). Once the cloud user chooses a flavor (Figure 1), a VM of that size and SCU allocation is placed on a server.
• Plan platform allocations: SCUs are also used by the cloud provider to plan the instance sizes that can be allocated to each hardware platform.

How Intel SAA Works
Intel SAA is built to work with the OpenStack cloud orchestrator. The architecture and control flow are shown in Figure 2. The workload is bundled into one or more virtual machines by choosing a flavor with an SLO measurement. The workload SLO is used by the Intel SAA controller agent to create a filter for the OpenStack Nova* scheduler, which schedules the VMs onto the appropriate CPU sockets in the nodes. The workload placement module hosts the SLO database, the data collection and analysis agents, and the reporting of workload characteristics, VM usage, and performance trends to the Intel SAA UI. At each host, a node agent uses a reservation, share, and limit mechanism to allocate the appropriate Intel SAA SCUs and resources according to the chosen flavor. It monitors the data on each VM and reports SLO violations, which are also captured in the log files. (A hedged sketch of such a share-and-limit control appears at the end of this section.)

Figure 2. Intel® SAA control flow in OpenStack: a workload with an SLA definition enters OpenStack (Nova, Nova-API, flavors, nova-scheduler); the SAA components comprise a UI and flavor creator, workload placement, SLO data collection and analysis, and reporting, plus a node agent on Linux/KVM that monitors utilization (IPC, SCUs) and applies control through cgroups toward the SCU target.

Shared Resources, Performance Contention, and Noisy Neighbors
Within a server host there are two broad classes of resources: partitionable resources (e.g., CPU cycles) and non-partitionable resources (e.g., cache and bandwidth). Operating systems have strong support for allocating and scheduling partitionable resources, which ensures VMs get their fair share. The cloud controllers in OpenStack allocate resources across hundreds of machines in the cluster using the scheduler filters (Figure 3). They also use conventional partitionable resources, such as available memory, to make scheduling decisions.

Figure 3. Scheduling in OpenStack: a request (flavor: 8 vCPUs, SCUs, 24 GB RAM) is matched through filters against the capabilities of candidate host servers (vCPUs, Xeon processor model, Intel TXT, PCIe, 10G networking), and the best-fitting server is selected for placing the VM.

The primary challenge lies in allocating non-partitionable shared resources. These are currently unmanaged by their operating environments, which makes their allocation nondeterministic. For example, the share of cache that a VM uses depends not only on its own requirements but also on the neighboring VMs residing on the same socket at any point in time. Noisy neighbors are VMs that cause performance degradation and contention side effects for other VMs running on the same server. They are a publicly acknowledged problem with few current solutions.

Figure 4. For the victim VM under noisy neighbors: (a) throughput going down; (b) contention going up; and (c) transactions becoming slower.
Figure 5. The victim VM slows down nearly 50% when running with contention.

Intel SAA monitors, identifies, and meets the SLO in a noisy neighbor situation (Figures 6a and 6b).
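The node agent's reservation, share, and limit mechanism is not specified in detail in the paper; the Figure 2 callouts mention cgroups and an SCU target, so the sketch below shows one plausible way such control could be expressed on a Linux/KVM host using the cgroup v1 cpu controller (cpu.shares, cpu.cfs_quota_us, cpu.cfs_period_us), together with a simple SLO-violation check like the one an agent might report to the dashboard. The cgroup path, the SCU-to-CPU ratio, and the tolerance are assumptions, not Intel SAA internals.

```python
# Hedged sketch of a node-agent-style control step; not Intel SAA's implementation.
import os

CGROUP_CPU_ROOT = "/sys/fs/cgroup/cpu"   # cgroup v1 cpu controller mount (assumed)
CPU_PERIOD_US = 100_000                  # default CFS scheduling period
SCU_TO_CPU_FRACTION = 0.25               # hypothetical: 1 SCU = 25% of one core

def _write(path: str, value: str) -> None:
    with open(path, "w") as f:
        f.write(value)

def apply_scu_limit(vm_cgroup: str, scu_budget: float) -> None:
    """Cap the VM's CPU time at a quota proportional to its SCU budget and
    give it a proportional relative share (requires root)."""
    group = os.path.join(CGROUP_CPU_ROOT, vm_cgroup)
    quota_us = int(scu_budget * SCU_TO_CPU_FRACTION * CPU_PERIOD_US)
    _write(os.path.join(group, "cpu.cfs_period_us"), str(CPU_PERIOD_US))
    _write(os.path.join(group, "cpu.cfs_quota_us"), str(quota_us))
    _write(os.path.join(group, "cpu.shares"), str(int(scu_budget * 1024)))

def slo_violated(measured_scus: float, slo_scus: float, tolerance: float = 0.9) -> bool:
    """Flag a violation when consumption stays below the SLO target, for example
    because a noisy neighbor is squeezing shared resources."""
    return measured_scus < slo_scus * tolerance

if __name__ == "__main__":
    # apply_scu_limit("machine/vm0", scu_budget=8.0)   # needs root and an existing cgroup
    print(slo_violated(measured_scus=1.32, slo_scus=2.0))  # True: would be flagged
```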
Noisy Neighbors: Example #1
Client A's virtual machine (VM0) lands on a host with performance matching its SLO of 2 SCUs (Figure 4). As new VMs from this or other accounts land on the host, the throughput starts dropping (measured as a performance score) while, at the same time, the contention score goes up (Figure 5). As a result, transactions of VM0 slow down. As seen in Figure 6, the transactions of VM0, the victim, slow down by 700 seconds, an approximately 100% slowdown when all neighbors are aggressors.

Noisy Neighbors: Example #2
1. VM1 hosts a task scheduler used in a mail server; it finds the shortest path in the network to deliver email.
2. The graph in Figure 5 shows the victim VM slowing down almost 50% when running with contention.
3. After taking action, the victim's completion time becomes equal to that of its solo run.

When aggressors are added, the SCU score comes down and workload throughput is reduced by 50%. This is an extreme case in which multiple aggressors are present. Intel SAA starts flagging the aggressor behavior on its dashboard, and the service provider administrator can then take action to move the workload and bring SCU consumption back to the desired level (Figure 6b). In Figure 6a, the victim is allocated 2 SCUs and reaches a consumption of 1.32 when it runs without contention.

Figures 6a and 6b. SCU consumption of the victim VM without contention, and after remediation returns consumption to the desired level.

Conclusion
The SCU is a more accurate way of measuring VM performance and server capacity when datacenters are populated with different generations of Intel® CPU platforms. Service providers should consider deploying Intel SAA to gain better insight into how customer workloads are performing against end user contract SLAs, especially if customer workloads are overusing the resources stated in the contract. Using the SCU as a capacity management tool, rather than over-provisioning servers to anticipate VM bursting performance requirements, can also increase server VM density. Enhancing physical and virtual cloud infrastructure with Intel SAA results in increased manageability and efficiency for enterprise workloads. Intel SAA provides essential software tools for OpenStack, enabling creation of a software-defined infrastructure with enhanced service level objectives.

Learn more at: intel.com/assurance. View a related white paper from Intel and Redapt, Inc., Integrated OpenStack™ Cloud Solution with Service Assurance, at: http://www.intel.com/content/www/us/en/software/intel-serviceassurance-redapt-white-paper.html

1. https://ebooksgenius.com/pdf/hpc-in-cloud-eda-20901094.html and https://www.usenix.org/conference/hotcloud12/workshop-program/presentation/ou
2. Gartner, "AWS Moves from ECU to vCPU," 2014. http://blogs.gartner.com/kyle-hilgendorf/2014/04/16/aws-moves-from-ecu-to-vcpu/
3. No computer system can provide absolute security under all conditions. Intel® Trusted Execution Technology (Intel® TXT) requires a computer with Intel® Virtualization Technology, an Intel TXT-enabled processor, chipset, BIOS, Authenticated Code Modules, and an Intel TXT-compatible measured launched environment (MLE). Intel TXT also requires the system to contain a TPM v1.2. For more information, visit www.intel.com/content/www/us/en/data-security/security-overview-general-technology.html.

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT.
EXCEPT AS PROVIDED IN INTEL’S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. A “Mission Critical Application” is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL’S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS’ FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS. Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked “reserved” or “undefined.” Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information. The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. © 2014, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. 0914/FF/CMD/PDF Please Recycle 331237-001US