White Paper
Intel® Service Assurance Administrator

Meet Cloud Performance Goals with Intel® Service Assurance Administrator
Maintain consistent performance levels to meet enterprise virtual workload requirements in the cloud
Overview
Enterprise IT is turning to the cloud to meet demands ranging from faster service provisioning to lower costs. But shared cloud environments and service level agreements (SLAs) bring new management challenges.
This white paper examines better methods for providing service level objectives for cloud virtual machines (VMs), detecting noisy neighbors, and delivering business value in OpenStack*-based private cloud environments.
Intel® Service Assurance Administrator provides essential software tools for OpenStack*, enabling creation of a software-defined infrastructure with enhanced service level objectives.
Contents
Overview
Deploying cloud workloads: the challenges
Managing Cloud Workloads: The Solution
Performance Assurance: Why Service Compute Units Matter
Intel SAA Advantages
Intel SAA Use Cases
How Intel SAA Works
Shared Resources, Performance Contention, and Noisy Neighbors
Conclusion
Deploying cloud workloads: the challenges
OpenStack Scheduling Today (Before Implementing Intel® SAA)
Cloud computing datacenters provide
computational services and resources,
such as data processing, networking,
and storage. Typically, they run large
numbers of servers arranged in groups
controlled by a group controller. The
workloads handled by the datacenter
tap the server hardware resources to
various extents.
End users of such datacenter
services expect consistent workload
performance, measurable by industry
metrics and standard benchmarks.
Besides SPEC* benchmarks, Unixbench*
and specialized usage workloads are
often used to measure the performance
of a virtual infrastructure instance
allocated to an Intel or other CPU.
Benchmarks developed by end users
measure compute and I/O performance
to inform the build out of virtual
infrastructure for the end
user application.
In public Infrastructure-as-a-Service (IaaS) environments, resources are provided as some number of cores or vCPUs. For example, in OpenStack you can use any of the available scheduler filters to shortlist the servers that can serve the workload. The compute filters for capabilities, capacity, and so on allow the scheduler to allocate a workload on the system that is best able to serve it.
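To make the filtering idea concrete, here is a minimal, standalone Python sketch (not OpenStack code): hosts that lack the requested capabilities or free capacity are dropped before one is chosen. The host records, field names, and the txt capability flag are illustrative assumptions.

```python
# Minimal standalone sketch of the filter idea used by the OpenStack scheduler:
# hosts without the required capabilities or free capacity are dropped before
# a server is chosen. Host data and field names are illustrative only.

hosts = [
    {"name": "node1", "free_vcpus": 16, "free_ram_mb": 65536, "capabilities": {"txt": True}},
    {"name": "node2", "free_vcpus": 2,  "free_ram_mb": 8192,  "capabilities": {"txt": False}},
    {"name": "node3", "free_vcpus": 8,  "free_ram_mb": 32768, "capabilities": {"txt": True}},
]

request = {"vcpus": 4, "ram_mb": 24576, "require": {"txt": True}}

def host_passes(host, req):
    """Return True if the host has the capacity and capabilities requested."""
    if host["free_vcpus"] < req["vcpus"]:
        return False
    if host["free_ram_mb"] < req["ram_mb"]:
        return False
    return all(host["capabilities"].get(k) == v for k, v in req["require"].items())

candidates = [h for h in hosts if host_passes(h, request)]
print([h["name"] for h in candidates])   # node1 and node3 remain on the shortlist
```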
A cloud end user has applications
with workloads either predominantly
utilizing memory and data storage
or the server processors. In a static workload pattern, consistent performance is a must, or performance will degrade over time.
With no knowledge of these workload
patterns, the initial sizing of the
infrastructure becomes important. In a
dynamically changing environment, an
elastic approach to scale up or down
based on performance requirements
and assurance needs to be built in.
vCPU vs. SCU: Getting
Accurate Performance
Measurements
A virtual CPU (vCPU) is a share of a physical CPU allocated to a virtual machine. Gartner sums up the
difficulty of obtaining accurate
performance measurements
based on the vCPU: “AWS is a
massive scale cloud provider,
with a wide variety mix of servers
and processor architectures
in existence. Therefore, two
instances each with 2 vCPU will
not necessarily be equivalent.
One instance could reside on
top of a 2012-based processor
while the other could reside on
top of a 2014-based processor.
Many people have written about
the fact that EC2 processor
architecture varies across
instance types and across
regions, even those described as having the ‘same specs.’”2
A service compute unit (SCU)
provides a uniform number of
compute units for performance
assurance as a defined and
measurable service level
objective (SLO). These units
are part of the machine flavor
definitions.
With Intel’s SCU, IT has accurate metrics for assessing server capacity and performance on Intel-based processors.
Service        Server                          Data Center   Test Score
Amazon EC2     ec2-us-west.linux.c1.medium     CA, US        3.44
Amazon EC2     ec2-us-west.linux.c1.xlarge     CA, US        6.12
Amazon EC2     ec2-us-west.linux.m1.large      CA, US        3.37
Amazon EC2     ec2-us-west.linux.m1.small      CA, US        N/A
Amazon EC2     ec2-us-west.linux.m1.xlarge     CA, US        4.18
Amazon EC2     ec2-us-west.linux.m2.2xlarge    CA, US        12.99
Amazon EC2     ec2-us-west.linux.m2.4xlarge    CA, US        22.81

Table 1. CloudHarmony cloud services catalog, March 14 "Pi day," aggregate CPU performance metric.
It is necessary to know the end user
workload requirements and categorize
static patterns (e.g., batch-mode
applications) or dynamically changing
patterns (e.g., interactive workloads) to
arrive at the application performance
requirements for the workload.
Whether static or dynamic, the user
has no means to specify a unit of
required compute performance that is
measurable and portable. Currently, a
service provider has no means to commit to a performance SLA that is constant across platform generations, sites, and varying power and thermal conditions.
Data presented in published research1 shows that performance in Amazon EC2 varies between 10% and 35%. There is a small variation within
the same processor type, and a larger
variation based on different processor
types for the same instance request. In
Table 1, we see 1% to 17% variation in
CPU performance at the same location
for the same instance type.
Once the virtual machine and image are selected, they are assigned to bare-metal hardware in a rack (Figure 1).
Figure 1. End user service flow: cloud services customers request services from a catalog of machine flavors with service level objectives; the cloud scheduler, guided by the Service Assurance Administrator plug-in, places machines on the datacenter infrastructure to match resource needs and service management objectives, while the service assurance layer provides monitoring, usage tracking, reporting, diagnostics, and capacity planning.

Managing Cloud Workloads: The Solution
Intel® Service Assurance Administrator (Intel® SAA)
Intel® Service Assurance Administrator (Intel® SAA) is service management software. It is targeted at environments where enterprise IT administrators want to:
• Start an on-premise enterprise
cloud initiative to run workloads on
virtualized and cloud infrastructure
with performance service level
objectives (SLO) guarantees
• Explore integrated hardware and
software cloud systems solutions to
reduce time to deploy
Intel SAA maximizes the capabilities of Intel® Xeon®-based platforms to enable a virtual machine contention detection solution via scheduler models, providing assured performance with noisy neighbor detection, as well as the ability to port workloads between datacenters without compromising performance.
Performance Assurance: Why Service Compute Units Matter
On server platforms, there are partitionable shared resources. Resources are partitionable for the same CPU and threads in the same core, and for different cores in the same CPU. There are also multiple generations of CPUs and instruction sets.
To provide consistent performance on any Intel® server platform, Intel created the service compute unit (SCU). Part of Intel SAA, the SCU is a consistent unit of measurement for use by the server and VMs. With the SCU, the service provider can monitor virtual machines' SLOs in the form of Intel SAA units.
In Intel SAA, an SLO is defined for each virtual instance as a particular number of SCUs, which is multiplied by the allocated virtual cores to give the VM its "allowed" total consumption. For example, if the instance flavors and SLOs are defined as in Table 2, the SCU SLO management engine contains:
1. A compute consumption metric, based on the CPU's ability to execute instructions.
2. A compute performance metric, to qualify the SCUs consumed.
SLO      Instance Type and RAM   SCU
High     Extra large + 30 GB     4 virtual cores with 4 SCUs
Medium   Extra large + 15 GB     4 virtual cores with 2 SCUs

Table 2. Sample SLOs (allowed consumption = virtual cores × SCUs).
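As a minimal sketch of the arithmetic described above, the following Python snippet multiplies the per-core SCU SLO by the allocated virtual cores to get the VM's allowed total consumption, using the Table 2 flavors; the function and field names are ours, not Intel SAA's.

```python
# Worked sketch of the SLO arithmetic described above: the per-core SCU SLO is
# multiplied by the allocated virtual cores to give the VM's allowed total
# consumption. Flavor values come from Table 2; names are illustrative.

FLAVORS = {
    "high":   {"ram_gb": 30, "virtual_cores": 4, "scu_per_core": 4},
    "medium": {"ram_gb": 15, "virtual_cores": 4, "scu_per_core": 2},
}

def allowed_scu(flavor_name):
    """Allowed total consumption for a VM booted with this flavor."""
    f = FLAVORS[flavor_name]
    return f["virtual_cores"] * f["scu_per_core"]

def scu_utilization(flavor_name, measured_scu):
    """Measured consumption as a fraction of the allowed total (one possible check)."""
    return measured_scu / allowed_scu(flavor_name)

print(allowed_scu("high"))                 # 16 SCUs allowed
print(allowed_scu("medium"))               # 8 SCUs allowed
print(scu_utilization("medium", 5.0))      # 0.625 of the allowance being consumed
```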
Each generation provides increasing performance throughput, based on a greater number of SCUs. One SCU signifies the same amount of performance in any generation of Intel Xeon CPU; for example, 1 SCU on Sandy Bridge (SNB) is the same as 1 SCU on Haswell (HSX). However, as each new CPU generation improves in performance, it offers a greater number of SCUs.
Figure. SCU growth per generation: comparison of the total number of SCUs per CPU generation and corresponding performance across Nehalem, Westmere, Sandy Bridge, and Haswell (total SCUs per CPU growing from 53.28 to 172.8), showing the impact of the greater number of SCUs per CPU generation.

Intel® SAA Advantages
Intel SAA—a service management
framework built and integrated into
the OpenStack cloud management
environment—brings advanced
management capabilities to private
cloud enterprise deployments,
including:
1. Creating a service management environment to define service level objectives (SLOs) and "tether" them to OpenStack flavors prior to launching the workload. This is where Intel SAA integrates the SCU into the application's flavor.
2. Building an OpenStack list of compatible hosts able to run the image, based on capacity as measured with the SCU metric and system characteristics, and sorting hosts by "fitness" (a metric for prioritizing hosts; a minimal sketch of this sorting follows the list).
3. Matching workloads to platforms based on capability, capacity, and SLO.
4. Monitoring allocated machines and workloads using deep platform telemetry to detect performance bottlenecks, noisy neighbors, and SLO violations so that the cloud administrator can remediate performance issues.
5. Assigning workloads to run on secure servers with Intel® Trusted Execution Technology (Intel® TXT)3, providing hardware trust assurance with boot attestation and white-listing for trusted pools.
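As a minimal sketch of the "fitness" sorting mentioned in item 2, the following Python snippet ranks compatible hosts by remaining SCU headroom; the scoring formula and host data are illustrative assumptions, not Intel SAA's actual metric.

```python
# Minimal sketch of sorting compatible hosts by "fitness" (item 2 above).
# The scoring formula is illustrative only, not Intel SAA's actual metric:
# it simply prefers hosts with the most spare SCU capacity after placement.

hosts = [
    {"name": "node1", "free_scus": 40.0, "total_scus": 160.0},
    {"name": "node2", "free_scus": 12.0, "total_scus": 80.0},
    {"name": "node3", "free_scus": 90.0, "total_scus": 140.0},
]

requested_scus = 16.0

def fitness(host):
    """Higher is better: fraction of the host's SCUs still free after placement."""
    remaining = host["free_scus"] - requested_scus
    return remaining / host["total_scus"] if remaining >= 0 else float("-inf")

compatible = [h for h in hosts if h["free_scus"] >= requested_scus]
ranked = sorted(compatible, key=fitness, reverse=True)
print([h["name"] for h in ranked])   # best-fit host first
```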
White Paper: Intel® Service Assurance Administrator
Intel® SAA Use Cases
• Create VM Flavors: A flavor is an available hardware configuration for a server. The SCU is used to create VM flavors for the end user to choose from (Table 2). Once the cloud user chooses a flavor (Figure 1), a VM of that size, with those SCUs, is allocated on a server. (A hedged sketch of creating such a flavor follows this list.)
• Plan Platform Allocations: SCUs are also used by the cloud provider to plan the instance sizes that can be allocated to each hardware platform.
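As a hedged sketch of the "Create VM Flavors" use case, the snippet below uses the 2014-era python-novaclient to create a flavor and attach an SCU value as a flavor extra spec. It assumes a reachable OpenStack endpoint and credentials in the environment; the scu_per_vcpu extra-spec key is hypothetical, standing in for whatever key the Intel SAA flavor creator actually uses.

```python
# Hedged sketch of the "Create VM Flavors" use case with the 2014-era
# python-novaclient v2 API. Assumes an OpenStack endpoint and credentials in
# the environment; the "scu_per_vcpu" extra-spec key is hypothetical.
import os
from novaclient import client

nova = client.Client("2",
                     os.environ["OS_USERNAME"],
                     os.environ["OS_PASSWORD"],
                     os.environ["OS_TENANT_NAME"],
                     os.environ["OS_AUTH_URL"])

# An "extra large" flavor similar to Table 2: 4 virtual cores, 30 GB RAM.
flavor = nova.flavors.create(name="xlarge.scu4", ram=30720, vcpus=4, disk=40)

# Attach the SCU SLO to the flavor as an extra spec so placement can use it.
flavor.set_keys({"scu_per_vcpu": "4"})

# The end user then boots against this flavor, e.g.:
# nova.servers.create(name="app-vm", image=IMAGE_ID, flavor=flavor.id)
```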
How Intel SAA Works
Figure 2. Intel® SAA control flow in OpenStack*: workloads with SLA definitions enter through OpenStack Nova (nova-api, flavors, nova-scheduler); the Intel SAA flavor creator, workload placement, data collection and analysis, and reporting components manage the SLO; and a node agent on each Linux*/KVM host monitors utilization (IPC and related counters) and controls SCUs through cgroups.
Intel SAA is built to work with the OpenStack cloud orchestrator. The architecture and control flow are shown in Figure 2. The workload is bundled into one or more virtual machines by choosing a flavor with an SLO measurement. The workload SLO is used by the Intel SAA controller agent to create a filter for the OpenStack Nova* scheduler, which schedules the VMs to the appropriate CPU sockets in the nodes. The workload placement module hosts the SLO database, the data collection and analysis agents, and the reporting of workload characteristics, VM usage, and performance trends to the Intel SAA UI. At each host, a node agent uses a reservation, share, and limit mechanism to allocate the appropriate Intel SAA SCUs and resources according to the flavor chosen. It monitors the data on each VM and reports SLO violations, which are captured in the log files.
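As a minimal sketch of the node agent's reservation, share, and limit idea, the snippet below caps a VM's CPU time with the Linux cgroup v1 cpu controller (cgroups are referenced in Figure 2). The cgroup layout and the SCU-to-quota conversion factor are assumptions for illustration, not Intel SAA's implementation.

```python
# Minimal sketch of a share/limit mechanism using the Linux cgroup v1 cpu
# controller. The mount point, group layout, and SCU-to-quota conversion are
# illustrative assumptions, not Intel SAA's actual implementation.
import os

CGROUP_CPU = "/sys/fs/cgroup/cpu"   # assumes the cgroup v1 cpu controller is mounted here
CFS_PERIOD_US = 100000              # default CFS scheduling period
US_PER_SCU = 50000                  # assumed conversion: 1 SCU = half a core of CPU time

def apply_scu_limit(vm_name, scus, pid):
    """Create a cgroup for the VM and cap its CPU time according to its SCUs."""
    group = os.path.join(CGROUP_CPU, "saa", vm_name)
    os.makedirs(group, exist_ok=True)
    with open(os.path.join(group, "cpu.cfs_period_us"), "w") as f:
        f.write(str(CFS_PERIOD_US))
    with open(os.path.join(group, "cpu.cfs_quota_us"), "w") as f:
        f.write(str(int(scus * US_PER_SCU)))   # hard limit proportional to the SCU SLO
    with open(os.path.join(group, "cpu.shares"), "w") as f:
        f.write(str(int(scus * 1024)))         # relative share when the host is contended
    with open(os.path.join(group, "tasks"), "w") as f:
        f.write(str(pid))                      # move the VM's process into the group

# Example (requires root and a cgroup v1 host):
# apply_scu_limit("vm0", scus=8, pid=12345)
```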
Figure 3. Scheduling in OpenStack*: filters shortlist host servers by capability (vCPUs, Xeon generation, Intel TXT, PCIe, 10G networking) and capacity against the requested flavor (e.g., 8 vCPUs, SCUs, 24 GB RAM), and the scheduler selects the server for placing the VM.

Shared Resources, Performance Contention, and Noisy Neighbors

Within a server host there are two broad classes of resources: partitionable resources (e.g., CPU cycles) and non-partitionable resources (e.g., cache, bandwidth). Operating systems have strong support for allocation and scheduling of partitionable resources, which ensures VMs get their fair share.
The cloud controllers in OpenStack allocate resources across hundreds of machines in the cluster using these filters (Figure 3). They also use conventional partitionable resources, such as available memory, to make scheduling decisions. The primary challenge lies in allocating non-partitionable shared resources, which are currently unmanaged by the operating environment. This means allocation is nondeterministic in nature. For example, the share of cache that a VM uses depends not only on its own requirements, but also on the neighboring VMs on the same socket at any point in time.
Noisy neighbors are VMs that degrade the performance of, and cause contention side effects for, other VMs running on the same server. They are a publicly acknowledged problem with few current solutions.
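As a standalone sketch of the kind of check involved, the snippet below compares each VM's current performance score against its solo (uncontended) baseline and flags the drop that marks a victim; the threshold, metric, and sample data are illustrative.

```python
# Standalone sketch of a noisy-neighbor check in the spirit described above:
# compare each VM's achieved performance against its solo (uncontended)
# baseline and flag the socket's other VMs as suspected aggressors when a
# victim drops. The 20% threshold, metric, and sample data are illustrative.

samples = {
    # vm_name: (solo_baseline_score, current_score)
    "vm0": (2.00, 1.05),   # victim: well below its baseline
    "vm1": (1.50, 1.48),
    "vm2": (1.50, 1.47),
}

CONTENTION_THRESHOLD = 0.20   # flag a drop of more than 20% from the solo baseline

def find_victims(metrics):
    return [vm for vm, (solo, now) in metrics.items()
            if (solo - now) / solo > CONTENTION_THRESHOLD]

victims = find_victims(samples)
aggressors = [vm for vm in samples if vm not in victims] if victims else []
print("victims:", victims)               # ['vm0']
print("suspected aggressors:", aggressors)
```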
Figure 4: (a) throughput going down; (b) contention going up; and (c) transactions becoming slower for the victim due to noisy neighbors.
Figure 5: The victim VM slows down nearly 50% when running with contention. Intel SAA
monitors, identifies, and meets SLO in a noisy neighbor situation (Figures 6a & 6b).
Noisy Neighbors: Example #1
Client A's virtual machine (VM0) lands on a host with performance matching its SLO of 2 SCUs (Figure 4). As new VMs from this or other accounts land on this host, the throughput starts dropping (measured as a performance score). At the same time, the contention score goes up (Figure 5). As a result, you may see transactions of VM0 slow down. As seen in Figure 6, the transaction of VM0, the victim, slows down by 700 seconds, an approximately 100% slowdown if all neighbors are aggressors.
Noisy Neighbors: Example #2
1. VM1 hosts a task scheduler used in a mail server. It finds the shortest path in the network to deliver emails.
2. The graph in Figure 5 shows the victim VM slowing down almost 50% when running with contention.
3. After taking action, we see that the victim's completion time becomes equal to that of its solo run.
When aggressors are added, the SCU
score comes down and workload
throughput is reduced by 50%. This
is an extreme case, where multiple
aggressors are present. Intel SAA will
start flagging aggressor behavior on
its dashboard. The service provider
administrator can then take an action
to move the workload and get the SCU
consumption back to the desired level
(Figure 6b).
In Figure 6a the victim is allocated 2
SCUs, and reaches a consumption of
1.32 when it runs without contention.
Conclusion
SCU is a more accurate way of measuring the performance
of VM and server capacity when datacenters are populated
with different generations of Intel® CPU platforms, Service
providers should consider deploying Intel SAA to provide
better customer insight on how workloads are performing
against end user contract SLAs, especially, if customer
workloads are overusing the stated resource outlined in
the contract. Using SCU as a capacity management tool—
rather than over-provisioning servers to anticipate VM
bursting performance requirements—can also increase
server VM density.
Enhancing physical and virtual cloud infrastructure with Intel
SAA results in increased manageability and efficiency for
enterprise workloads. Intel SAA provides essential software
tools for OpenStack, enabling creation of a software-defined
infrastructure with enhanced service level objectives.
Learn more at: intel.com/assurance.
View a related white paper from Intel and Redapt,
Inc., Integrated OpenStack™ Cloud Solution with
Service Assurance, at: http://www.intel.com/
content/www/us/en/software/intel-serviceassurance-redapt-white-paper.html
1. https://ebooksgenius.com/pdf/hpc-in-cloud-eda-20901094.html and https://www.usenix.org/conference/hotcloud12/workshop-program/presentation/ou
2. Gartner, AWS moves from ECU to vCPU, 2014. http://blogs.gartner.com/kyle-hilgendorf/2014/04/16/aws-moves-from-ecu-to-vcpu/
3. No computer system can provide absolute security under all conditions. Intel® Trusted Execution Technology (Intel® TXT) requires a computer with Intel®
Virtualization Technology, an Intel TXT-enabled processor, chipset, BIOS, Authenticated Code Modules, and an Intel TXT-compatible measured launched
environment (MLE). Intel TXT also requires the system to contain a TPM v1.2. For more information, visit www.intel.com/content/www/us/en/data-security/
security-overview-general-technology.html.
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR
OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL’S TERMS AND CONDITIONS OF
SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO
SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY,
OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
A “Mission Critical Application” is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death.
SHOULD YOU PURCHASE OR USE INTEL’S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND
ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS
COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS’ FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY,
PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR
WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS.
Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics
of any features or instructions marked “reserved” or “undefined.” Intel reserves these for future definition and shall have no responsibility whatsoever for
conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this
information.
The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published
specifications. Current characterized errata are available on request.
© 2014, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.
*Other names and brands may be claimed as the property of others.
0914/FF/CMD/PDF
Please Recycle
331237-001US