Citrix XenDesktop and NetApp VDI Best Practice Guide

Delivering 5000 Desktops with Citrix XenDesktop
Validation Report and Recommendations for a Scalable VDI Deployment using Citrix
XenDesktop and Provisioning Services, NetApp Storage and VMWare Server
Virtualization
www.citrix.com
TABLE OF CONTENTS
INTRODUCTION
CITRIX XENDESKTOP OVERVIEW
EXECUTIVE SUMMARY
  Key Findings
METHODOLOGY AND WORKLOAD
  Workload Details – LoginVSI from Login Consultants
  Component Scalability Results
    XENDESKTOP DESKTOP DELIVERY CONTROLLER (DDC)
    SINGLE SERVER SCALABILITY
    PROVISIONING SERVICES SCALABILITY
FINDINGS
  The Desktop and Desired User Experience
  Citrix XenDesktop Desktop Delivery Controller
  Storage Recommendations
  Server Hardware Findings
  Server Virtualization Findings
  Additional Implications for Scalability Design
LARGE SCALE TEST RESULTS
  Test Details
  Summary of Large Scale Test Results
  Session Performance and Session Start-up Times
  Desktop Delivery Controller and Provisioning Services Performance
    DESKTOP DELIVERY CONTROLLER PERFORMANCE
    CITRIX PROVISIONING SERVICES (PVS) PERFORMANCE
  NetApp Storage Performance
  VMWare Virtual Center and ESX Performance
    ESX PERFORMANCE
SUMMARY
APPENDIX A – BLADE SERVER HARDWARE AND DEPLOYMENT
APPENDIX B – NETWORK DIAGRAM
REFERENCES
Introduction
This document is intended to provide advanced technical personnel (architects, engineers and
consultants) with data to assist in the planning, design and deployment of a Citrix XenDesktop hosted
VM-based (VDI) solution that scales to 5000 desktops. This document presents the findings of internal
Citrix testing that simulates a large enterprise deployment of VDI desktops.
This document provides data generated from a sample deployment, in which a single OS image is
provisioned to 5,000 unique desktop users. This document is not intended to provide definitive guidance
on scalability, and the data should be interpreted and adapted for your specific environment. To help
you understand the data, some examples of possible recommendations are made throughout the
document to adjust to different scenarios.
The information gathered from this testing is part of a comprehensive and constantly growing guidebook
to scalability. Please reference the XenDesktop Scalability Guidelines at
http://support.citrix.com/proddocs/topic/xendesktop-bdx/cds-scalability-wrapper-bdx.html for an
understanding of how to scale in building blocks to many tens of thousands of desktops.
Citrix XenDesktop Overview
IT organizations today are looking for new ways to address their desktop challenges, whether it be rapid
provisioning, Windows 7 migrations, security, patching and updating, or remote access. They are
exploring solutions for current business initiatives, such as outsourcing, compliance and globalization.
Many are interested in “bring your own computer” policies, to enable IT to get out of the business of
managing hardware and focus on the core software and intellectual property that is central to the line of
business.
Citrix XenDesktop offers the most powerful and flexible desktop virtualization solution available on the
market, enabling organizations to start delivering desktops as a service to users on any device, anywhere.
With FlexCast delivery technology, XenDesktop can match the virtual desktop model to the
performance, security, flexibility and cost requirements of each group of users across the enterprise.
This document focuses on the scalability and test results of one of the six FlexCast delivery models:
hosted VM-based desktops, or VDI.
Executive Summary
Citrix internally tested a sample VDI deployment designed for high-availability and simulated real-world
workloads using XenDesktop 4. The end-to-end environment included more than 3300 Windows XP
virtual desktops. In addition, key components were individually tested to determine their ability to
support more than 5000 desktops. Combining the complete system results with the individual
component tests enabled Citrix to extrapolate results to support a single virtual desktop infrastructure
design that can deliver at least 5000 desktops.
The full VDI infrastructure was built using the following components:
o Desktop Delivery Controller for brokering, remoting and managing the virtual desktop
o Citrix Provisioning Services for OS provisioning
o NetApp centralized storage for storing user profiles, write cache and relevant databases
o HP Blade servers for hosting the VMs
o VMWare ESX and vCenter as the server virtualization infrastructure
o Cisco datacenter network switches
Key Findings
o Workloads and Boot or Logon Storms (from rapid concurrent or simultaneous user logons) have
the largest impact on how you scale and size this VDI design
o Desktop Delivery Controllers can be virtualized and have roles divided amongst them for best
scalability and resiliency
o Citrix Provisioning Services, with the release of 5.1 SP2, has demonstrated unparalleled scale
(with over 3000 users per physical server) and reliability in this VDI deployment.
o Virtual Machine density will vary with OS, workload and of course server hardware
Methodology and Workload
Testing was done in two phases - individual component scalability and full-system scalability. Central to
both phases of testing is the use of a tool that simulates real-world workloads, as well as an internally
built tool to measure session startup times (providing expected user logon times).
Workload Details – LoginVSI from Login Consultants
One of the most critical factors of designing a scalable VDI deployment is understanding the true user
workflow and planning adequately in terms of server and storage capacity, while setting a standard for
the user experience throughout.
To accurately represent a real-world user workflow, the third-party tools from Login Consultants were
used throughout the full system testing. These tools also take measures of in-session response time,
providing a way to measure the expected user experience in accessing their desktop throughout large
scale testing, including login storms.
The widely available workload simulation tool, LoginVSI 1.x, was also coupled with the use of the idle
pool (feature in XenDesktop) to spin up sessions, simulating a scenario of all users coming in to work at
the same time and logging on. (Login VSI is freeware and can be downloaded from
www.loginconsultants.com.)
Login VSI is a benchmarking methodology that calculates an index based on the number of
simultaneous sessions that can be run on a single machine. The objective is to find the point at which
the number of sessions generates so much load that the end user experience would be noticeably
degraded.
Login VSI simulates a medium-heavy workload user (intensive knowledge worker) running generic
applications like: Microsoft Office 2007, Internet Explorer including Flash applets and Adobe Acrobat
Reader (Note: For the purposes of this test, applications were installed locally, not streamed or hosted).
Like real users, the scripted session will leave multiple applications open at the same time. Every session
will average about 20% minimal user activity, similar to real world usage. Note that during each 18
minute loop users open and close files a couple of times per minute, which is probably more intensive
than most users.
Each loop will open and use:
• Outlook 2007, browse 10 messages & type new message.
• Internet Explorer, one instance is left open, one instance is browsed to Microsoft.com,
VMware.com and Citrix.com (locally cached copies of these websites).
• Word 2007, one instance to measure response time (9 times), one instance to review, edit and
print a random document.
• Solidata PDF writer & Acrobat Reader, the word document is printed to PDF and reviewed.
• Excel 2007, a very large randomized sheet is opened and edited.
• PowerPoint 2007, a random presentation is reviewed and edited.
3 Breaks (40, 20 & 40 seconds) are included to emulate real world usage.
Component Scalability Results
The following components were tested for individual scalability:
• Desktop Delivery Controller (DDC)
• VMWare ESX Server on blade servers
• Provisioning Services (note this testing was done as part of full-scale system tests)
XenDesktop Desktop Delivery Controller (DDC)
The DDCs were virtualized on ESX server and some of the roles of the DDC were assigned to specific
DDCs, an approach often taken in Citrix XenApp deployments. The DDCs were configured such that:
DDC 1: Farm Master and Pool Management
DDC 2 & 3: VDA Registrations and XML Brokering
In this environment, 3 DDCs (4vCPU, 4GB RAM) were shown to be able to sustain a farm size of 6000
desktops and proved stable handling over 120k logons from a pool of 5650 users.
It was necessary to have multiple Virtual Center instances to support this scale; each VC instance
required a new XenDesktop desktop group. In the testing 5 VCs were used with the following
distribution:
• 2 x 2000 Desktops
• 2 x 700 Desktops
• 1 x 600 Desktops
The stability of the deployment was validated using the following method:
• All VMs were powered on using Idle Pool Management. This feature of XenDesktop allows the
environment to be automatically brought up in a controlled manner in advance of peak user
activity.
• An initial logon storm was created by logging users on at a rate of ~3 per second.
• This was followed by a steady load as users logged off, rebooted and VDAs re-registered.
Single Server Scalability
The Single Server Scalability tests are focused on determining the number of Virtual Desktops a given
target machine can support. There are many permutations of tests that could be performed to evaluate
specific features or architectures. As this testing is a precursor to more comprehensive scalability tests
and guidelines, exploring a broad set of configurations was not within the scope of the project.
The methodology used is based on Project Virtual Reality Check (Project VRC:
http://www.virtualrealitycheck.net). Project VRC was a collaboration between two consulting companies
(Login Consultants and PQR) with the objective of measuring hypervisor scalability using Login VSI 1.0.
The key differences between the testing methodology used at Citrix and that of Project VRC are:
• Provisioning Services was included and enables Pooled XP Desktops running from a single
common vDisk. Some changes were made to the session logon scripts to prevent unnecessary
file copy operations that would impact the PVS Write Cache; this operation was intended for
XenApp environments.
• Connections are brokered via the XenDesktop DDC, not direct connections.
• The XP Virtual Desktops have been allocated 512MB RAM, compared to 1GB in the case of
ProjectVRC.
• Roaming users have been used instead of local profiles, as this would be representative of a VDI
deployment.
Each of the hardware platforms tested was intended to show scalability in memory and CPU bound
conditions, along with cases where the environment was rich in memory and CPU resources.
VM Density used in Large Scale Testing
The following specifications were used.
o Windows XP pooled desktops
o 1vCPU and 512MB RAM.
o 1.5 GB PVS Cache on NFS (NetApp 3170HA)
o ESX 3.5 Update 4

Host                                                  VMs/Host   VMs/Core
HP BL460c Dual Quad Core (1.86GHz L5320) 16GiB RAM    28         3.5
HP BL460c Dual Quad Core (2.5GHz L5420) 32GB RAM      50         6.25
Note that at smaller scale, slightly higher single server density was possible; however, at large scale we
noticed some degradation of performance. Testing showed that with 34 desktops on the BL460c 16GB
blade, ballooning was occurring but was unable to free enough memory. This caused the ESX host
to start to swap the guest memory to the storage tier. This impacted the end user experience, as pages the
guest believed to be in memory were actually on disk, causing an increase in latency for accessing those
pages. A reduction in the number of guests per host removed the swapping behavior and removed the
impact on the end user experience that was seen when the environment was being scaled out.
With 32GB RAM, 52 desktops were possible, though the system was close to becoming CPU
bound. To avoid the risk of impacting user experience, we slightly reduced the density used in the large-scale
tests.
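The density figures above can be checked with a little arithmetic. The following is a minimal sketch, assuming eight cores per host (dual quad-core blades, as listed in the specifications) and a simple rounding-up rule for host counts; the function names are illustrative and not part of any Citrix or VMware tooling.

```python
import math

CORES_PER_HOST = 8  # dual quad-core blades, per the specifications above


def vms_per_core(vms_per_host, cores=CORES_PER_HOST):
    """Density per physical core for a given per-host VM count."""
    return vms_per_host / cores


def hosts_needed(desktops, vms_per_host):
    """Hosts required for a desktop count, rounding up to whole hosts."""
    return math.ceil(desktops / vms_per_host)


# Per-host densities used in the large-scale test
densities = {"BL460c 16GB (L5320)": 28, "BL460c 32GB (L5420)": 50}
for blade, density in densities.items():
    print(blade, vms_per_core(density), hosts_needed(5000, density))
```

This ignores spare capacity for host failures, so a real design would add headroom on top of these counts.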
Provisioning Services Scalability
The scalability of Provisioning Services builds on the results from the Single Server Scalability (SSS) testing. As we increased the
number of desktops being streamed from PVS, we monitored the Login VSI score and the logon time
to ensure that the end user experience remained acceptable. Standard Perfmon metrics were also
captured to understand the characteristics of PVS and streaming pooled desktops.
As the full-system scalability testing was conducted and users added to the maximum capacity of the
hardware, it was observed that ONE physical Provisioning Server could easily support the 3300
desktops. This is a significant improvement from earlier testing of previous versions of the technology.
Findings
To build a 5000 VDI desktop deployment, the findings of this round of testing indicate some new
guidance in our overall approach to scalability, to be captured in a comprehensive scalability guide in the
near future:
The Desktop and Desired User Experience
Ensuring proper design of a large-scale VDI deployment requires that you have a good understanding of
how the users on average will be using their desktops and applications. The two critical elements are
login storms and the in-session workload.
The test environment is capable of supporting a login storm of 5000 desktops based on test data.
The LoginVSI workload modeled a medium type of user, as described in the Methodology section.
If the user workload varies greatly on average from the one described in this design, then you need to
model the workload on at least a single-server basis to gain approximations for sizing servers and storage
components differently.
Citrix XenDesktop Desktop Delivery Controller
XenDesktop Desktop Delivery Controller configuration was an enterprise installation with the following
adjustments to allow distribution of roles to 3 virtualized brokers:
Farm master (DDC1)
• Registry configured so that the DDC rejects VDA registrations.
• Pool Management throttling was configured at 40 desktops, overriding the default of 10% of the
pool size (~160-170 desktops depending on the group).
• Configured as the preferred Farm Master.
VDA registration and XML brokering (DDC2 and DDC3)
• The above pool management configuration change was made in case pool management failed
over to a different DDC.
This configuration was tested to support 5000 sessions.
Storage Recommendations
For a large VDI deployment, a scalable storage solution is a cost-effective and reliable solution. The
NetApp FAS3170HA was used with 2 controllers, 70 x 300GB drives for storage and PAM II cards.
The PAM II modules in the NetApp FAS3170HA filer did not offer any gains as the workload on the
storage was write focused. For this version of XenDesktop and VDI design, the PAM II cards are not
required and are not recommended.
Otherwise, this particular configuration of NetApp is recommended as designed here for 5000 users,
with the assumption that there will be some potential degradation in a complete failover situation (where
one NetApp controller fails completely, or a similar failure occurs). To tune the NetApp sizing for your particular
failover/recovery needs, it’s recommended to work with a NetApp sales engineer.
The FAS3170 was running OnTap version 7.3.2 with PAM II cards enabled. One aggregate was created per
controller, with multiple volumes created on each aggregate per the layout shown below.
Server Hardware Findings
For hosting the actual virtual desktops, a blade server configuration is recommended.
In this design, up to approximately 50 VMs/host were achieved using the following:
HP BL460c
• 2 x 1.86GHz Intel Xeon L5320 Quad Core (8MiB L2 Cache 1066MHz Bus)
• 1 x 36GB HDD SAS 10K rpm
• 16 GB RAM 667MHz
• Dual Broadcom 1Gb NICs
• QLogic QMH2462 Dual Port Fibre Channel HBA
HP BL460c
• 2 x 2.5GHz Intel Xeon L5420 Quad Core (12MiB L2 Cache 1333MHz Bus)
• 1 x 72GB HDD SAS 10K rpm
• 32 GB RAM 667MHz
• Dual Broadcom 1Gb NICs
• QLogic QMH2462 Dual Port Fibre Channel HBA
A similar hardware configuration with newer Intel Nehalem processors (55xx series)
and 64-96GB of memory should provide significantly increased VM density.
For Provisioning Services, dedicated servers were used and over-specified for this design of 5000
desktops. An HP BL680 was used:
Citrix PVS Server camb5e1b02
OS: Windows 2008 64bit
Service Pack: 1
Make: HP
Model: BL680
CPU: 4 x Intel E7450 2.4GHz
RAM: 64GiB
Disk: 2 x 72GB 10k SAS
Network: 8 x 1GbE
Provisioning Services 5.1 SP2
From the test data, this server was highly underutilized.
The 24 core server is clearly over specified. With a peak CPU utilization of < 30%, the load would equate to 7.2 cores. A dual
quad core server would be expected to handle this load, though it may be too close to the maximum
utilization; hence instead of two 24 core servers, three 8 core servers would be sufficient.
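The sizing argument above can be sketched as follows. This assumes CPU load scales linearly with core count and spreads evenly across servers, which the test data does not verify. One reading of the three-server recommendation (an assumption, not stated in the text) is that two servers remain after a single failure. The function names are illustrative.

```python
def busy_cores(total_cores, peak_percent):
    """Cores' worth of work implied by a peak CPU percentage."""
    return total_cores * peak_percent / 100


def utilization_percent(load_cores, servers, cores_per_server):
    """Average CPU utilization if the load spreads evenly (assumed linear)."""
    return round(100 * load_cores / (servers * cores_per_server), 1)


load = busy_cores(24, 30)  # observed: 24 cores at a peak below 30%
print("equivalent busy cores:", load)
for servers in (1, 2, 3):
    # 8-core (dual quad-core) servers, as proposed in the text
    print(servers, "x 8 cores ->", utilization_percent(load, servers, 8), "%")
```

Under these assumptions a single 8 core server would run close to its limit, while three servers leave each one lightly loaded even if one of them fails.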
Server Virtualization Findings
In our testing, two desktop groups were configured, pointing at two different VMWare Virtual Center
servers.
Virtual Center 1 would run 1604 desktop sessions on 32 blades.
Virtual Center 2 would run 1708 desktop sessions on 61 blades.
Based on VMware best practice for the software versions used (VMWare ESX 3.5 Update 4) and
published maximums (2000 VMs per Virtual Center) the environment had to be split over 2 Virtual
Center instances.
Since then, VMWare has released version 4.0, which has higher limits than the 2000 VMs tested in version
3.5 (note that in version 4, the limits are 3000 and 4500 for 32bit and 64bit guests respectively). In
general, the recommendation would be to configure as few Virtual Centers as possible.
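As a rough illustration of how the per-instance VM limit drives the number of Virtual Center instances, the sketch below divides the desktop count by the limit and rounds up. It considers only the VM count limits quoted above; other constraints (hosts per instance, management load) are outside its scope, and the function name is hypothetical.

```python
import math


def virtual_centers_needed(desktops, vm_limit_per_vc):
    """Minimum Virtual Center instances based on the VM count limit alone."""
    return math.ceil(desktops / vm_limit_per_vc)


print(virtual_centers_needed(3312, 2000))  # large-scale test, ESX 3.5 limit
print(virtual_centers_needed(5000, 2000))  # 5000-desktop design, ESX 3.5 limit
print(virtual_centers_needed(5000, 3000))  # same design with the lower version 4 limit
```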
No changes were made or recommended from a standard installation. Servers were placed into logical
clusters, with one cluster matching one blade enclosure.
VMware ESX 3.5.0 build 176894 was used on all ESX hosts in the environment. Each host is
configured with a single virtual switch with both vmnic0 and vmnic1 connected.
• The VM Network is configured with vmnic0 as active and vmnic1 as standby.
o This is used for ICA, PVS and general network traffic.
• The Service Console is not bound to a specific vmnic.
• VMotion is configured with vmnic1 as active and vmnic0 as standby.
o This is used for NFS and VMotion traffic.
• Service Console was allocated 800MiB.
• NFS configuration changes were made as per current NetApp guidance in the NetApp Technical
Report TR-3428.
• NTP was configured to sync time.
• ESX hosts were installed with the latest HP ESX utilities for monitoring hardware.
• Due to interrupt sharing issues between the vmkernel and the service console, USB was disabled in
the BIOS. See VMWare KB article 1003710. Note that while USB was disabled in the BIOS, USB was
still available from the iLO, so remote keyboard access was still available.
Additional Implications for Scalability Design
• Don’t place the PVS vDisk on a CIFS share.
o Windows does not cache files from file shares in memory, so each time a call is made
to the PVS server, it in turn has to reach out to the shared storage.
• Ensure VMware Virtual Center hasn’t set a resource limit on your Virtual Machine.
o When we moved from the DDC testing, which used 256MiB guests, to the large-scale test,
we increased the VM memory back to 512MiB. However, for some reason a 256MiB limit was
placed on the memory resources available to the guest. This resulted in a VM
which appeared to have 512MiB RAM but was limited to using only 256MiB of physical
RAM, with the rest held in the VMware swap file. This led to a huge increase in our
storage IO to the SAN, which crippled the large scale environment down to less than 100
desktops. Check: “Virtual Machine Properties -> Resources -> Memory -> Limit:”
• Don’t place too many Virtual Machines on VMFS volumes.
o Not applicable to the NFS implementation, but seen with SSS testing using local VMFS
volumes and also FC attached VMFS volumes. The impact was most noticeable on user
logon time, which increased quickly with more than 40 active VMs on a single VMFS
volume. Splitting the VMs across multiple volumes on the same number of disks alleviated the
problem.
• .NET 3.5 SP1 (+ later Windows updates) is necessary to improve scalability of the DDC.
o Without this update applied we would see VDAs deregister as users began to log in to the
system. This was seen with ~1500 desktops and higher. The Microsoft fixes to .NET
addressed the problem and allowed testing to achieve ~6000 desktops.
• By default Pool Management will attempt to start 10% of the total pool size. In a large
environment this may be more than Virtual Center can cope with.
o The number of concurrent requests can be throttled by editing the Pool Management
Service configuration file:
C:\Program Files\Citrix\VMManagement\CdsPoolMgr.exe.config
o Modify the <appSettings> section by adding the line:
<add key="MaximumTransitionRate" value="20"/>
o The Pool Management service needs to be restarted to read the new configuration.
o If VMware DRS is being used, a lower value should be set, as DRS needs additional time
to determine guest placement before powering it on. In our testing with DRS enabled
the rate of 20 was used.
o In our testing we allowed DRS to do the initial VM placement through a full run. DRS
was then disabled, which allowed the MaximumTransitionRate to be increased to 40
without VC becoming overloaded.
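The configuration change above could be scripted rather than made by hand. The following is a minimal sketch, assuming the file follows the standard .NET layout with an appSettings element directly under the root. The sample XML string is a hypothetical stand-in, since the real CdsPoolMgr.exe.config contains other settings not shown here, and the helper name is illustrative.

```python
import xml.etree.ElementTree as ET

KEY = "MaximumTransitionRate"


def set_transition_rate(config_xml, rate):
    """Return config XML with the throttle key added or updated."""
    root = ET.fromstring(config_xml)
    settings = root.find("appSettings")
    if settings is None:
        # Create the section if the file does not have one yet
        settings = ET.SubElement(root, "appSettings")
    for entry in settings.findall("add"):
        if entry.get("key") == KEY:
            entry.set("value", str(rate))  # update an existing entry
            break
    else:
        # No existing entry was found, so append a new one
        ET.SubElement(settings, "add", {"key": KEY, "value": str(rate)})
    return ET.tostring(root, encoding="unicode")


# Hypothetical minimal file content, for illustration only
sample = "<configuration><appSettings></appSettings></configuration>"
with_drs = set_transition_rate(sample, 20)       # rate used with DRS enabled
without_drs = set_transition_rate(with_drs, 40)  # rate used once DRS was disabled
print(without_drs)
```

Working on a string keeps the sketch self-contained. In practice the file would be backed up first, and the Pool Management service restarted afterwards as noted above.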
• Details on assigning the farm master roles can be found in CTX117477. Note that the XenDesktop
PowerShell SDK can also be used to configure the preferred farm master.
• To stop the farm master handling connections, see the MaxWorkers registry key in CTX117446.
• PVS NIC teaming can simplify the deployment of the PVS server.
o NIC teaming also improves reliability. Because one PVS server has one IP address, if a
network connection fails, the remaining connections take over the load and the PVS
server continues to operate on its current IP. This is especially useful for failover and HA,
as only one IP address needs to be specified for the login server per host. This also
allows the network layer to handle the load balancing of client connections over the
available NICs.
Large Scale Test Results
Test Details
The test run of 3312 desktops consisted of an idle pool spin up with the following details:
o All sessions launched within approximately 60 minutes.
o Individual logon times tracked to ensure logon performance did not degrade
significantly.
o All sessions running the Login VSI 1.1 workload, with their response times logged.
o At the end of the VSI workload phase the users would log out. This triggers Pool
Management to shut down then restart the desktop.
o PVS HA testing to ensure all desktops would continue to run in the event of a PVS
server failure.
o Use of the various product management consoles during the test to ensure they remain
responsive to general admin tasks.
Environment: Two desktop groups, pointing at two different Virtual Center servers.
o Virtual Center 1 ran 1604 desktop sessions on 32 blades.
o Virtual Center 2 ran 1708 desktop sessions on 61 blades.
o Based on VMware best practice and published maximums the environment had to be
split over 2 Virtual Center instances. Within the Virtual Center individual clusters are
created for each blade chassis (of up to 16 blade servers).
o Virtual Center 1 has clusters for two chassis of the more powerful blade servers. Virtual
Center 2 hosts clusters for the other four chassis of blades.
Summary of Large Scale Test Results
• Powering on all 3312 desktops ready for users to log in took less than 60 minutes using
XenDesktop Idle Pool Management capability.
• Using a launch rate of 107/minute, 99% of users logged on in 31 minutes.
• PVS was shown to be able to run 3312 desktops from an HA pair of servers. In a separate test
one of the PVS servers was shut down, triggering an HA failover. The ~1600 sessions transferred
to the other server within 8 minutes.
• The scalability of the environment was verified through analysis of the logon times, Login VSI
test response times and performance metrics gathered from all the major components.
• The perfmon data confirms that a number of the servers were oversized and could easily handle
more load than was placed on them in this test.
• It took on average 19 seconds from launching the ICA file to having a fully running desktop.
• Login VSI response times indicate the system remained at an acceptable performance level for
all users during the test.
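The timing figures in this summary follow from simple rate arithmetic. The sketch below assumes a constant launch rate throughout the run, which is a simplification of the real logon storm; the function names are illustrative.

```python
def minutes_to_launch(sessions, launches_per_minute):
    """Time to launch all sessions at a constant rate (assumed steady)."""
    return sessions / launches_per_minute


def average_rate(items, minutes):
    """Average per-minute rate needed to finish within a time window."""
    return items / minutes


# 3312 sessions at 107 launches per minute: roughly 31 minutes
print(round(minutes_to_launch(3312, 107), 1))
# Powering on 3312 desktops in under 60 minutes implies at least this average rate
print(round(average_rate(3312, 60), 1))
```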
Session Performance and Session Start-up Times
LoginVSI results illustrate the capture of response time against the count of sessions launched.
You can observe that the max response time increases nominally as session count increases, but that
overall, average response times stay within 2000 ms for the duration.
[Figure: Login VSI response time (ms) against active sessions, showing Max, Min and Average Response_Time series.]
o Total Sessions Launched: 3312
o Uncorrected Optimal Performance Index (UOPI): 3312
o Stuck Session Count before UOPI (SSC): 0
o Lost Session Count before UOPI (LSC): 44
o Corrected Optimal Performance Index (COPI = UOPI - (SSC*50%) - LSC): 3268
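The COPI formula above can be applied directly to the reported counts. This is a minimal sketch of that one formula as written in this document; the function name is illustrative and is not part of the Login VSI tooling.

```python
def corrected_optimal_performance_index(uopi, stuck_sessions, lost_sessions):
    """COPI = UOPI - (SSC * 50%) - LSC, as defined in the results above."""
    return uopi - stuck_sessions * 0.5 - lost_sessions


copi = corrected_optimal_performance_index(uopi=3312, stuck_sessions=0, lost_sessions=44)
print(copi)
```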
Session start-up time is a measure of the time taken from starting the ICA client on the client launcher,
having received the ICA file from a successful XML brokering request, to the session loading and the
STAT mini agent (a .NET application loaded by the Windows start-up folder) loading. This method of
calculating start-up time is the closest approximation to true user logon time in such a test environment.
Logon times can be seen to fit mostly in a band between 15 and 22 seconds, though with some stray sessions
taking close to 40 seconds near the end of the logon storm, when the earlier users were part way through
their first workload run.
Min: 11 secs
Max: 39 secs
Average: 19 secs
Desktop Delivery Controller and Provisioning Services Performance
Where available, the data is presented for the environment during the spin up phase, which is controlled
via XenDesktop Idle Pool Management, and during the test run. In the test run, the Login VSI 1.x workload is
run in all the sessions until all desktops have run the full set of scripts at least once. A file is then dropped
on a network share, which triggers the Login VSI scripts to initiate a logoff when they next complete a
full run of the scripts.
As the desktops were configured to reboot on logoff, additional load is placed on the systems when
users begin to log off and idle pool management then powers them back on again.
Standard Microsoft Windows perfmon counters were used to collect the following performance metrics.
Desktop Delivery Controller Performance
As mentioned previously, 3 DDCs were used in this test with specific roles assigned. All ran as
Virtual Machines on a separate ESX server from the desktop VMs. Each was configured with 4vCPU and 4GB
RAM, running on an HP BL460c with 2 x 1.86 GHz Quad Core L5320 CPUs and 16 GB RAM.
DDC1: Farm Master + Pool Management
[Figure: % Processor Time: _Total (4vCPU), in two panels: Pool Spin Up and During Test Run.]
[Figure: XenDesktop Services % Processor Time, in two panels: Pool Spin Up and During Test Run. Series: CdsPoolMgr, CdsImaProxy, CdsController, CitrixManagementServer, ImaSrv.]
The main item to note is that during pool spin up the high usage process is the CdsPoolMgr process. This is expected, as it drives Virtual
Center to start the guests up. The two peaks of the IMA service during pool spin up are caused by the UI taking the two desktop groups
out of Maintenance Mode.
During the test run itself ImaSrv is responsible for brokering all the desktops, and so the zone master takes the most load while making
the decision on desktop assignment. In the later stage of the run desktops are starting to log off, and so the Pool Management Service is
starting to shut down and restart the desktops.
[Figure: Memory – Committed Megabytes. Panels: Pool Spin Up, During Test Run. X-axis: time of day; y-axis: 0 to 4,000 MiB.]
The memory usage on this DDC grows significantly towards the end of the run as users log off. Each logoff triggers the tainting detection code
to shut down the VM; once it is shut down, pool management powers it back on again.
Further investigation is required to better understand the dramatic memory increase at the end of the test. It’s suspected that, given enough
time, garbage collection would correct the spike.
[Figure: PhysicalDisk -- % Idle Time -- _Total. Panels: Pool Spin Up, During Test Run. X-axis: time of day; y-axis: 50 to 100%.]
[Figure: Context Switches (per second). Panels: Pool Spin Up, During Test Run. X-axis: time of day; y-axis: 0 to 10000.]
[Figure: Network Utilisation (Mbps). Panels: Pool Spin Up, During Test Run. X-axis: time of day; y-axis: 0 to 5 Mbps. Series: Mbps Received, Mbps Sent.]
The spikes in network traffic at the end of the test correspond to the desktops being shut down and restarted by the pool
management service. This traffic is between the DDC and the Virtual Center servers, as can be seen by the corresponding
increase in traffic on both VC servers at this time.
DDC2: XML + VDA registration
[Figure: % Processor Time: _Total (4vCPU). Panels: Pool Spin Up, During Test Run. X-axis: time of day; y-axis: 0 to 50%.]
[Figure: XenDesktop Services % Processor Time. Panels: Pool Spin Up, During Test Run. X-axis: time of day; y-axis: 0 to 200%. Series (Process -- % Processor Time): CdsPoolMgr, CdsImaProxy, CdsController, CitrixManagementServer, ImaSrv.]
In contrast to DDC1 the load is noticeably lower. The main active process is the CdsController which handles communication with the
VDA including heartbeats and initial registration.
[Figure: Memory – Committed Megabytes. Panels: Pool Spin Up, During Test Run. X-axis: time of day; y-axis: 0 to 4,000 MiB.]
[Figure: PhysicalDisk -- % Idle Time -- _Total. Panels: Pool Spin Up, During Test Run. X-axis: time of day; y-axis: 50 to 100%.]
Because of earlier memory leak tracing for the IMA Service, a user mode stack trace database was being created for imasrv.exe. This
extra tracing caused the higher than normal disk utilisation, showing a steady baseline of 20% utilisation.
[Figure: Context Switches (per second). Panels: Pool Spin Up, During Test Run. X-axis: time of day; y-axis: 0 to 10000.]
[Figure: Network Utilisation (Mbps). Panels: Pool Spin Up, During Test Run. X-axis: time of day; y-axis: 0 to 5 Mbps. Series: Mbps Received, Mbps Sent.]
DDC3: XML + VDA registration
[Figure: % Processor Time: _Total (4vCPU). Panels: Pool Spin Up, During Test Run. X-axis: time of day; y-axis: 0 to 50%.]
[Figure: XenDesktop Services % Processor Time. Panels: Pool Spin Up, During Test Run. X-axis: time of day; y-axis: 0 to 200%. Series (Process -- % Processor Time): CdsPoolMgr, CdsImaProxy, CdsController, CitrixManagementServer, ImaSrv.]
As expected, the load profile is similar to DDC2. In contrast to DDC1 the load is noticeably lower. The main active process is the
CdsController, which handles communication with the VDA including heartbeats and initial registration.
[Figure: Memory – Committed Megabytes. Panels: Pool Spin Up, During Test Run. X-axis: time of day; y-axis: 0 to 4,000 MiB.]
[Figure: PhysicalDisk -- % Idle Time -- _Total. Panels: Pool Spin Up, During Test Run. X-axis: time of day; y-axis: 50 to 100%.]
[Figure: Context Switches (per second). Panels: Pool Spin Up, During Test Run. X-axis: time of day; y-axis: 0 to 10000.]
[Figure: Network Utilisation (Mbps). Panels: Pool Spin Up, During Test Run. X-axis: time of day; y-axis: 0 to 5 Mbps. Series: Mbps Received, Mbps Sent.]
Citrix Provisioning Services (PVS) Performance
There are 2 PVS servers handling the 3312 desktops in the environment. The processor and memory
configuration for these servers can clearly be seen to be significantly over-specified. Each server’s 8 gigabit
NICs were configured as a NIC team, and the blade chassis had a 4 x 10GbE uplink to the core switch.
The PVS servers each run on BL680c blades with 4 x E7450 2.40 GHz hex core CPUs and
64GB RAM.
PVS Server 1
[Figure: % Processor Time: _Total (4 x 6 Core CPUs). Panels: Pool Spin Up, During Test Run. X-axis: time of day; y-axis: 0 to 100%.]
[Figure: Memory – Committed Megabytes. Panels: Pool Spin Up, During Test Run. X-axis: time of day; y-axis: 5,000 to 6,000 MiB.]
[Figure: PhysicalDisk -- % Idle Time -- _Total. Panels: Pool Spin Up, During Test Run. X-axis: time of day; y-axis: 0 to 100%.]
[Figure: Network Utilisation (Mbps) (8 Teamed 1GbE NICs). Panels: Pool Spin Up, During Test Run. X-axis: time of day; y-axis: 0 to 4000 Mbps. Series: Mbps Received, Mbps Sent.]
Peak traffic occurs during the user logon phase of the test run with a peak close to 2.3Gbps.
PVS Server 2
[Figure: % Processor Time: _Total (4 x 6 Core CPUs). Panels: Pool Spin Up, During Test Run. X-axis: time of day; y-axis: 0 to 100%.]
The 24 core server is clearly over-specified. With a peak of < 30%, this would equate to 7.2 cores. A dual quad core server
would be expected to handle this load, though it may be too close to maximum utilisation; hence, instead of two 24
core servers, three 8 core servers would be expected to be sufficient.
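The sizing arithmetic in this paragraph can be written out as a short sketch. It assumes that load scales linearly with core count and ignores differences in clock speed and CPU generation; the function names are illustrative and not part of any Citrix tool.

```python
def busy_core_equivalent(total_percent, core_count):
    """Convert a _Total % Processor Time reading into an equivalent number of fully busy cores."""
    return total_percent * core_count / 100


def projected_utilisation(busy_cores, servers, cores_per_server):
    """Projected average utilisation (%) if the same work were spread over a different server pool."""
    return busy_cores / (servers * cores_per_server) * 100


if __name__ == "__main__":
    # Peak of just under 30% on one 24 core PVS server.
    per_server = busy_core_equivalent(30, 24)
    print(f"Busy cores per PVS server at peak: {per_server:.1f}")

    # The same load on a single dual quad core (8 core) server.
    print(f"One 8 core server: {projected_utilisation(per_server, 1, 8):.0f}%")

    # Both PVS servers' peak load spread over three 8 core servers.
    print(f"Three 8 core servers: {projected_utilisation(2 * per_server, 3, 8):.0f}%")
```

Under these assumptions a single 8 core server would sit at roughly 90% at peak, which is why the text prefers three such servers (about 60% each) in place of the two 24 core machines.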
[Figure: Memory – Committed Megabytes. Panels: Pool Spin Up, During Test Run. X-axis: time of day; y-axis: 5,000 to 6,000 MiB.]
[Figure: PhysicalDisk -- % Idle Time -- _Total. Panels: Pool Spin Up, During Test Run. X-axis: time of day; y-axis: 0 to 100%.]
[Figure: Network Utilisation (Mbps) (8 Teamed 1GbE NICs). Panels: Pool Spin Up, During Test Run. X-axis: time of day; y-axis: 0 to 4000 Mbps. Series: Mbps Received, Mbps Sent.]
This network load mirrors the load seen on the other PVS server, with a peak close to 2.2Gbps.
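To put the two network peaks in context, they can be compared with the nominal capacity of the NIC team. This is a rough sketch that assumes the 8 teamed 1GbE NICs aggregate to 8 Gbps in one direction and that the quoted peaks are team-wide figures; real teaming throughput depends on the load-balancing mode.

```python
def team_utilisation_percent(peak_mbps, nic_count, nic_speed_mbps=1000):
    """Peak traffic as a percentage of the nominal one-way capacity of a NIC team."""
    return peak_mbps / (nic_count * nic_speed_mbps) * 100


if __name__ == "__main__":
    # Peaks quoted in the text: ~2.3 Gbps on PVS Server 1, ~2.2 Gbps on PVS Server 2.
    for name, peak_mbps in (("PVS Server 1", 2300), ("PVS Server 2", 2200)):
        share = team_utilisation_percent(peak_mbps, nic_count=8)
        print(f"{name}: {share:.1f}% of nominal team capacity")
```

On these assumptions both servers peak below 30% of nominal team capacity.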
NetApp Storage Performance
Analysis concentrates on the actual test run rather than the spin up phase, as the load is significantly
higher. The following summary (courtesy of NetApp) captures the critical read/write and IOPS information for
the 3312 desktop test.
Averages for 3312 Virtual Desktops
                                  Reads     Writes
Mean Network Read/Write ratio     11.5%     88.5%
Max Network Read/Write ratio      20.5%     79.5%
Mean Disk Read/Write ratio        14.2%     85.8%
Max Disk Read/Write ratio         17.8%     82.2%

                                  IOPS
Mean IOPS per desktop             4.4
Max Average IOPS per desktop      27.7
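The per-desktop figures above can be scaled to the whole environment with simple multiplication. The sketch below assumes the per-desktop averages apply uniformly across all 3312 desktops, which the summary does not state explicitly; it also checks that each read/write pair sums to 100%.

```python
DESKTOPS = 3312

# Values taken from the summary table.
MEAN_IOPS_PER_DESKTOP = 4.4
MAX_AVG_IOPS_PER_DESKTOP = 27.7
READ_WRITE_RATIOS = {
    "Mean Network": (11.5, 88.5),
    "Max Network": (20.5, 79.5),
    "Mean Disk": (14.2, 85.8),
    "Max Disk": (17.8, 82.2),
}


def aggregate_iops(per_desktop, desktops=DESKTOPS):
    """Scale a per-desktop IOPS figure to the full desktop count (assumes uniform load)."""
    return per_desktop * desktops


def ratios_are_consistent(ratios, tolerance=1e-6):
    """Check that every reads/writes pair adds up to 100%."""
    return all(abs(reads + writes - 100.0) < tolerance for reads, writes in ratios.values())


if __name__ == "__main__":
    print(f"Aggregate mean IOPS: {aggregate_iops(MEAN_IOPS_PER_DESKTOP):,.0f}")
    print(f"Aggregate IOPS at max average: {aggregate_iops(MAX_AVG_IOPS_PER_DESKTOP):,.0f}")
    print(f"Ratios consistent: {ratios_are_consistent(READ_WRITE_RATIOS)}")
```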
Analysis
o The storage controllers never had more than 2 of their 4 CPUs fully utilised,
staying well within normal operating limits with significant headroom for further growth if
performance during a cluster failover is not required.
o The average latency for all protocols was well within reasonable limits, which would
provide an excellent end user experience.
o During the start and end of the test run the CIFS workload accounted for about 50% of protocol
usage. This is seen as a large amount of reads at the beginning of the test (when user
profiles are loaded) and a large amount of writes at the end of the test (when profiles are
written back).
o For the remaining duration of the test NFS played the predominant role, being used for the PVS
client side cache.
o FCP (Fibre Channel) played little if any part in the workload seen on the filer. FCP was
limited to database traffic for the various components in the environment.
o The majority of all I/Os were writes across all protocols.
o Average and maximum disk utilisation was never more than 40%, which suggests there could be
headroom to accept more virtual machines on these controllers.
o In the event of a cluster failure the data indicates the filer could handle 3000-4000 desktops
with minimal or no performance degradation.
VMware Virtual Center and ESX Performance
Two blade servers have been installed as physical Virtual Center servers. Within each VC a cluster is
created for each blade chassis of up to 16 ESX hosts. As there are two different hardware specs in the
lab, the number of virtual desktops hosted on each VC isn’t quite balanced.
Virtual Center 1
Blade Chassis    # Hosts    # Virtual Machines
camb4e1          16         898
camb4e2          16         800
Total            32         1698
During the testing only 1604 desktops were actively used. The remaining VMs remained powered off
though would still be enumerated by Virtual Center and XenDesktop Pool Management. These
additional VMs are present from earlier broker scalability testing.
Virtual Center 2
Blade Chassis    # Hosts    # Virtual Machines
camr3e1          16         481
camr3e2          14         392
camr5e1          16         480
camr5e2          15         420
Total            61         1773
In addition to the desktops above, VC2 also manages camr5e2b13 which hosts some infrastructure
VMs, e.g. 3 x Brokers and 1 x NetApp performance monitor.
Out of the 1773 desktop VMs only 1708 were powered on. As with VC1 these additional VMs were
present from earlier testing at higher host densities.
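The host and VM counts for the two Virtual Center servers can be cross-checked against the 3312 desktop total with a few lines of arithmetic. This is an illustrative sketch; the per-host averages assume the active desktops are spread evenly across hosts, which the report does not state.

```python
# (hosts, virtual machines) per blade chassis, as listed for each Virtual Center.
VC1 = {"camb4e1": (16, 898), "camb4e2": (16, 800)}
VC2 = {
    "camr3e1": (16, 481),
    "camr3e2": (14, 392),
    "camr5e1": (16, 480),
    "camr5e2": (15, 420),
}

# Desktops actually in use during the test, per Virtual Center.
ACTIVE_DESKTOPS = {"VC1": 1604, "VC2": 1708}


def totals(chassis):
    """Return (total hosts, total VMs) for one Virtual Center."""
    hosts = sum(h for h, _ in chassis.values())
    vms = sum(v for _, v in chassis.values())
    return hosts, vms


if __name__ == "__main__":
    for name, chassis in (("VC1", VC1), ("VC2", VC2)):
        hosts, vms = totals(chassis)
        active = ACTIVE_DESKTOPS[name]
        print(f"{name}: {hosts} hosts, {vms} VMs defined, {active} active, "
              f"{active / hosts:.1f} active desktops per host on average")
    print(f"Active desktops in total: {sum(ACTIVE_DESKTOPS.values())}")
```

The active counts add up to the 3312 desktops used throughout the report, and the difference in average density reflects the two hardware specifications.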
camr3e2b15: Virtual Center 1
[Figure: % Processor Time: _Total (2 x 4 Core CPUs). Panels: Pool Spin Up, During Test Run. X-axis: time of day; y-axis: 0 to 100%.]
[Figure: Process -- % Processor Time -- vpxd. Panels: Pool Spin Up, During Test Run. X-axis: time of day; y-axis: 0 to 250%.]
The vpxd service is exercised when XenDesktop Pool Management is requesting VMs be powered up or shut down. This can be seen
during the spin up phase and at the end of the test run.
As this server has 8 cores, the peak at ~200% would be equivalent to 2 cores being fully utilised.
[Figure: Memory – Committed Megabytes. Panels: Pool Spin Up, During Test Run. X-axis: time of day; y-axis: 1,500 to 2,000 MiB.]
[Figure: PhysicalDisk -- % Idle Time -- _Total. Panels: Pool Spin Up, During Test Run. X-axis: time of day; y-axis: 86 to 100%.]
[Figure: Network Utilisation (Mbps). Panels: Pool Spin Up, During Test Run. X-axis: time of day; y-axis: 0 to 5 Mbps. Series: NIC1 and NIC2, Mbps Received and Mbps Sent.]
camr3e2b16: Virtual Center 2
[Figure: % Processor Time: _Total (2 x 4 Core CPUs). Panels: Pool Spin Up, During Test Run. X-axis: time of day; y-axis: 0 to 100%.]
[Figure: Process -- % Processor Time -- vpxd. Panels: Pool Spin Up, During Test Run. X-axis: time of day; y-axis: 0 to 250%.]
The load on vpxd is consistent between the two VC servers.
As this server has 8 cores, the peak at ~230% would be equivalent to a little more than 2 cores being fully utilised.
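The conversion used for the vpxd peaks follows from how the per-process counter is scaled: 100% corresponds to one fully busy core, so values can exceed 100% on a multi-core server. A minimal sketch of that conversion, with illustrative function names, is:

```python
def process_percent_to_cores(process_percent):
    """Per-process % Processor Time is scaled per core: 100% equals one fully busy core."""
    return process_percent / 100


def share_of_server(process_percent, core_count):
    """Express a per-process reading as a percentage of the whole server's CPU capacity."""
    return process_percent / core_count


if __name__ == "__main__":
    # Peaks quoted in the text: ~200% on Virtual Center 1, ~230% on Virtual Center 2.
    for name, peak in (("Virtual Center 1", 200), ("Virtual Center 2", 230)):
        print(f"{name}: {process_percent_to_cores(peak):.1f} cores busy, "
              f"{share_of_server(peak, 8):.2f}% of an 8 core server")
```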
[Figure: Memory – Committed Megabytes. Panels: Pool Spin Up, During Test Run. X-axis: time of day; y-axis: 1,500 to 2,000 MiB.]
The memory used on each VC is similar, though VC2 is ~300MiB higher. This is to be expected as it’s managing almost twice the number of ESX
hosts (61 versus 32) and a higher number of VM guests.
[Figure: PhysicalDisk -- % Idle Time -- _Total. Panels: Pool Spin Up, During Test Run. X-axis: time of day; y-axis: 91 to 100%.]
[Figure: Network Utilisation (Mbps). Panels: Pool Spin Up, During Test Run. X-axis: time of day; y-axis: 0 to 5 Mbps. Series: NIC1 and NIC2, Mbps Received and Mbps Sent.]
ESX Performance
The test environment consists of 2 different hardware configurations running the desktop workload.
The data below is from a BL460c with 32 GB RAM and 2 x L5420 Quad Core CPU.
[Figure: CPU Usage (2 x L5420 Quad Core 2.5GHz CPU). Panels: Pool Spin Up, During Test Run. X-axis: time of day; y-axis: 0 to 700 percent. Series: CPU Usage (Average) for cores 0 to 7.]
[Figure: Memory Usage. Panels: Pool Spin Up, During Test Run. X-axis: time of day; y-axes: 0 to 30000 MiB and 0 to 60 percent. Series: Memory Balloon (Average), Memory Shared Common (Average), Memory Granted (Average), Memory Swap Used (Average), Memory Active (Average), Average Memory Usage (%).]
[Figure: Disk Usage – Kilobytes/second. Panels: Pool Spin Up, During Test Run. X-axis: time of day; y-axis: 0 to 2000 KBps. Series: Disk Read Rate - vmhba0:0:0, Disk Write Rate - vmhba0:0:0.]
This traffic is on the local physical disk of the ESX host, rather than tracking the activity of the VMs as these are on NFS shared storage.
The frequency of the disk activity would suggest some logging, perhaps of performance data from the VMs. The rate of traffic appears to
be proportional to the number of running virtual machines.
[Figure: Network Utilisation (Mbps). Panels: Pool Spin Up, During Test Run. X-axis: time of day; y-axis: 0 to 200 Mbps. Series: vmnic0 and vmnic1, Mbps Sent and Mbps Receive.]
Summary
• Spend extra time and care on how you simulate the user workload, as it strongly affects all
design recommendations.
o Don’t forget to consider the entire user population and how and when login storms will
occur.
• Use free and reputable tools like LoginVSI from Login Consultants to simulate real-world-like user workloads.
• Design for failover; your infrastructure size will depend on what user experience you want
during failover (degraded or not, and by how much).
o Use central storage and blade servers for scale and reliability.
• Virtualize most major components of XenDesktop.
o The Provisioning Services server in this design was not virtualized and, given the high scalability, you
should dedicate a physical server to it in your design. Running PVS virtualized will be an option,
but look for recommendations on this in an upcoming document.
Appendix A – Blade Server Hardware and Deployment
The test environment consists primarily of HP Blade servers. Some additional servers hosting
infrastructure of specific test components are detailed later in this report. VMware ESX was installed on
BL460 servers of 2 different specifications, labelled (v1) and (v2), which were used to host both
Windows XP desktops and a small number of VMs for XenDesktop Brokers (DDCs).
The BL680 servers were used to host two Citrix Provisioning Services servers and a Microsoft SQL Server.
These machines were somewhat over-specified for their roles.
BL460c (v1) – 1.86GHz Dual Processor Quad Core 16GB RAM
• 2 x 1.86GHz Intel Xeon L5320 Quad Core (8MiB L2 Cache, 1066MHz Bus)
• 1 x 36GB HDD SAS 10K rpm
• 16 GB RAM 667MHz
• Dual Broadcom 1Gb NICs
• QLogic QMH2462 Dual Port Fibre Channel HBA
Product Overview: http://h18000.www1.hp.com/products/servers/proliant-bl/cclass/460c/index.html
BL460c (v2) – 2.5GHz Dual Processor Quad Core 32GB RAM
• 2 x 2.5GHz Intel Xeon L5420 Quad Core (12MiB L2 Cache, 1333MHz Bus)
• 1 x 72GB HDD SAS 10K rpm
• 32 GiB RAM 667MHz
• Dual Broadcom 1Gb NICs
• QLogic QMH2462 Dual Port Fibre Channel HBA
Product Overview: http://h18000.www1.hp.com/products/servers/proliant-bl/cclass/460c/index.html
BL680 G5 – 2.4GHz Quad Processor Hex Core 64GB RAM
• 4 x 2.4GHz Intel Xeon E7450 Hex Core (9MiB L2 Cache, 12MiB L3 Cache, 1000MHz Bus)
• 2 x 72GB HDD SAS 10K rpm
• 64 GiB RAM 667MHz
• 8 x Broadcom 1Gb NICs
• QLogic QMH2462 Dual Port Fibre Channel HBA
Product Overview: http://h18000.www1.hp.com/products/servers/proliant-bl/cclass/680c/index.html
Blade Deployment
Appendix B – Network Diagram
This is a predominantly HP blade based environment running the virtual machines. Dell 1950 1U servers
are used to run many ICA clients on the same server to connect into the environment.
The environment was originally designed to use Fibre Channel for storage traffic; however, in this testing
NFS was used as it offers greatly simplified management and scalability.
All traffic is passed to either a top of rack Cisco 2960-G switch or via the Cisco blade switch modules in
the blades back to a central Cisco 4510 chassis. This chassis houses multiple 1GbE and 10GbE line
cards in addition to the supervisor modules. Where the blade switches support stacking this feature has
been used.
Fibre Channel Storage Network
The Fibre Channel network is only used for databases on the SQL Server instance running on one of the BL680 blade
servers.
All other storage traffic uses NFS over Ethernet links.
REFERENCES
Citrix (Knowledgebase Articles)
Separating the Roles of Farm Master and Controller in the XenDesktop Farm (CTX117477)
Registry Key Entries Used by XenDesktop (CTX117446)
NetApp:
Deployment Guide for XenDesktop 3.0 and VMware ESX Server on NetApp (TR-3795)
NetApp and VMware Virtual Infrastructure 3 Storage Best Practices (TR-3428)
Citrix XenServer 5.0 and NetApp Storage Best Practices (TR-3732)
Citrix XenDesktop 2.0 with NetApp Storage— Pilot Deployment Overview (TR-3711)
2,000-Seat VMware View on NetApp Deployment Guide Using NFS (TR-3770)
Project VRC / Login Consultants:
VRC, VSI and Clocks Reviewed
VMware Platform Performance Index v1.1
XenServer Platform Performance Index v1.0
VMware:
VMware Virtual Infrastructure 3.5 Configuration Maximums
Comparison of Storage Protocol Performance
NetApp FAS2020HA Unified Storage
About Citrix
Citrix Systems, Inc. (NASDAQ:CTXS) is the leading provider of virtualization, networking and software as a service
technologies for more than 230,000 organizations worldwide. Its Citrix Delivery Center, Citrix Cloud Center (C3) and Citrix
Online Services product families radically simplify computing for millions of users, delivering applications as an on-demand
service to any user, in any location on any device. Citrix customers include the world’s largest Internet companies, 99 percent
of Fortune Global 500 enterprises, and hundreds of thousands of small businesses and prosumers worldwide. Citrix partners
with over 10,000 companies worldwide in more than 100 countries. Founded in 1989, annual revenue in 2008 was $1.6
billion.
©2010 Citrix Systems, Inc. All rights reserved. Citrix®, Access Gateway™, Branch Repeater™, Citrix Repeater™, HDX™,
XenServer™, XenApp™, XenDesktop™ and Citrix Delivery Center™ are trademarks of Citrix Systems, Inc. and/or one or
more of its subsidiaries, and may be registered in the United States Patent and Trademark Office and in other countries. All
other trademarks and registered trademarks are property of their respective owners.