Delivering 5000 Desktops with Citrix XenDesktop

Validation Report and Recommendations for a Scalable VDI Deployment using Citrix XenDesktop and Provisioning Services, NetApp Storage and VMWare Server Virtualization

www.citrix.com

TABLE OF CONTENTS

INTRODUCTION
CITRIX XENDESKTOP OVERVIEW
EXECUTIVE SUMMARY
    Key Findings
METHODOLOGY AND WORKLOAD
    Workload Details – LoginVSI from Login Consultants
    Component Scalability Results
        XENDESKTOP DESKTOP DELIVERY CONTROLLER (DDC)
        SINGLE SERVER SCALABILITY
        PROVISIONING SERVICES SCALABILITY
FINDINGS
    The Desktop and Desired User Experience
    Citrix XenDesktop Desktop Delivery Controller
    Storage Recommendations
    Server Hardware Findings
    Server Virtualization Findings
    Additional Implications for Scalability Design
LARGE SCALE TEST RESULTS
    Test Details
    Summary of Large Scale Test Results
    Session Performance and Session Start-up Times
    Desktop Delivery Controller and Provisioning Services Performance
        DESKTOP DELIVERY CONTROLLER PERFORMANCE
        CITRIX PROVISIONING SERVICES (PVS) PERFORMANCE
    NetApp Storage Performance
    VMWare Virtual Center and ESX Performance
        ESX PERFORMANCE
SUMMARY
APPENDIX A – BLADE SERVER HARDWARE AND DEPLOYMENT
APPENDIX B – NETWORK DIAGRAM
REFERENCES

Introduction

This document is intended to provide advanced technical personnel (architects, engineers and consultants) with data to assist in the planning, design and deployment of a Citrix XenDesktop hosted VM-based (VDI) solution that scales to 5000 desktops.
This document presents the findings of internal Citrix testing that simulates a large enterprise deployment of VDI desktops. It provides data generated from a sample deployment in which a single OS image is provisioned to 5,000 unique desktop users. This document is not intended to provide definitive guidance on scalability, and the data should be interpreted and adapted for your specific environment. To help you understand the data, examples of possible recommendations are made throughout the document to show how to adjust to different scenarios.

The information gathered from this testing is part of a comprehensive and constantly growing guidebook to scalability. Please reference the XenDesktop Scalability Guidelines at http://support.citrix.com/proddocs/topic/xendesktop-bdx/cds-scalability-wrapper-bdx.html for an understanding of how to scale in building blocks to many tens of thousands of desktops.

Citrix XenDesktop Overview

IT organizations today are looking for new ways to address their desktop challenges, whether it be rapid provisioning, Windows 7 migrations, security, patching and updating, or remote access. They are exploring solutions for current business initiatives, such as outsourcing, compliance and globalization. Many are interested in “bring your own computer” policies, to enable IT to get out of the business of managing hardware and focus on the core software and intellectual property that is central to the line of business.

Citrix XenDesktop offers the most powerful and flexible desktop virtualization solution available on the market, enabling organizations to start delivering desktops as a service to users on any device, anywhere. With FlexCast delivery technology, XenDesktop can match the virtual desktop model to the performance, security, flexibility and cost requirements of each group of users across the enterprise.
This document focuses on the scalability and test results of one of the six FlexCast delivery models: hosted VM-based desktops, or VDI.

Executive Summary

Citrix internally tested a sample VDI deployment, designed for high availability, under simulated real-world workloads using XenDesktop 4. The end-to-end environment included more than 3300 Windows XP virtual desktops. In addition, key components were individually tested to determine their ability to support more than 5000 desktops. Combining the complete system results with the individual component tests enabled Citrix to extrapolate results to support a single virtual desktop infrastructure design that can deliver at least 5000 desktops.

The full VDI infrastructure was built using the following components:
o Desktop Delivery Controller for brokering, remoting and managing the virtual desktop
o Citrix Provisioning Services for OS provisioning
o NetApp centralized storage for storing user profiles, write cache and relevant databases
o HP Blade servers for hosting the VMs
o VMware ESX and vCenter as the server virtualization infrastructure
o Cisco datacenter network switches

Key Findings

o Workloads and boot or logon storms (from rapid concurrent or simultaneous user logons) have the largest impact on how you scale and size this VDI design.
o Desktop Delivery Controllers can be virtualized and have roles divided amongst them for best scalability and resiliency.
o Citrix Provisioning Services, with the release of 5.1 SP2, has demonstrated unparalleled scale (with over 3000 users per physical server) and reliability in this VDI deployment.
o Virtual machine density will vary with OS, workload and server hardware.

Methodology and Workload

Testing was done in two phases: individual component scalability and full-system scalability.
Central to both phases of testing is the use of a tool that simulates real-world workloads, as well as an internally built tool to measure session startup times (providing expected user logon times).

Workload Details – LoginVSI from Login Consultants

One of the most critical factors in designing a scalable VDI deployment is understanding the true user workflow and planning adequately in terms of server and storage capacity, while setting a standard for the user experience throughout. To accurately represent a real-world user workflow, the third-party tools from Login Consultants were used throughout the full system testing. These tools also measure in-session response time, providing a way to gauge the expected user experience when accessing desktops throughout large scale testing, including login storms.

The widely available workload simulation tool, Login VSI 1.x, was coupled with the idle pool (a feature in XenDesktop) to spin up sessions, simulating a scenario in which all users come in to work at the same time and log on. (Login VSI is freeware and can be downloaded from www.loginconsultants.com.)

Login VSI is a benchmarking methodology that calculates an index based on the number of simultaneous sessions that can be run on a single machine. The objective is to find the point at which the number of sessions generates so much load that the end user experience would be noticeably degraded.

Login VSI simulates a medium-heavy workload user (intensive knowledge worker) running generic applications such as Microsoft Office 2007, Internet Explorer including Flash applets, and Adobe Acrobat Reader. (Note: For the purposes of this test, applications were installed locally, not streamed or hosted.) Like real users, the scripted session will leave multiple applications open at the same time. Every session will average about 20% minimal user activity, similar to real world usage.
Note that during each 18 minute loop users open and close files a couple of times per minute, which is probably more intensive than most users. Each loop will open and use:
• Outlook 2007, browse 10 messages & type new message.
• Internet Explorer, one instance is left open, one instance is browsed to Microsoft.com, VMware.com and Citrix.com (locally cached copies of these websites).
• Word 2007, one instance to measure response time (9 times), one instance to review, edit and print a random document.
• Solidata PDF writer & Acrobat Reader, the Word document is printed to PDF and reviewed.
• Excel 2007, a very large randomized sheet is opened and edited.
• PowerPoint 2007, a random presentation is reviewed and edited.

Three breaks (40, 20 & 40 seconds) are included to emulate real world usage.

Component Scalability Results

The following components were tested for individual scalability:
• Desktop Delivery Controller (DDC)
• VMware ESX Server on blade servers
• Provisioning Services (note this testing was done as part of full-scale system tests)

XenDesktop Desktop Delivery Controller (DDC)

The DDCs were virtualized on ESX server and some of the roles of the DDC were assigned to specific DDCs, an approach often taken in Citrix XenApp deployments. The DDCs were configured such that:

DDC 1: Farm Master and Pool Management
DDC 2 & 3: VDA Registrations and XML Brokering

In this environment, 3 DDCs (4vCPU, 4GB RAM) were shown to be able to sustain a farm size of 6000 desktops and proved stable handling over 120k logons from a pool of 5650 users.

It was necessary to have multiple Virtual Center instances to support this scale; each VC instance required a new XenDesktop desktop group. In the testing 5 VCs were used with the following distribution:
• 2 x 2000 Desktops
• 2 x 700 Desktops
• 1 x 600 Desktops

The stability of the deployment was validated using the following method:
• All VMs were powered on using Idle Pool Management.
  This feature of XenDesktop allows the environment to be automatically brought up in a controlled manner in advance of peak user activity.
• An initial logon storm was created by logging users on at a rate of ~3 per second.
• This was followed by a steady load as users logged off, desktops rebooted and VDAs re-registered.

Single Server Scalability

The Single Server Scalability tests are focused on determining the number of virtual desktops a given target machine can support. There are many permutations of tests that could be performed to evaluate specific features or architectures. As this testing is a precursor to more comprehensive scalability tests and guidelines, exploring a broad set of configurations was not within the scope of the project.

The methodology used is based on Project Virtual Reality Check (Project VRC: http://www.virtualrealitycheck.net). Project VRC was a collaboration between two consulting companies (Login Consultants and PQR) with the objective of measuring hypervisor scalability using Login VSI 1.0.

The key differences between the testing methodology used at Citrix and that of Project VRC are:
• Provisioning Services was included and enables pooled XP desktops running from a single common vDisk. Some changes were made to the session logon scripts to prevent unnecessary file copy operations that would impact the PVS write cache; this operation was intended for XenApp environments.
• Connections are brokered via the XenDesktop DDC, not direct connections.
• The XP virtual desktops have been allocated 512MB RAM, compared to 1GB in the case of Project VRC.
• Roaming users have been used instead of local profiles, as this would be representative of a VDI deployment.

Each of the hardware platforms tested was intended to show scalability in memory and CPU bound conditions, along with cases where the environment was rich in memory and CPU resources.

VM Density used in Large Scale Testing

The following specifications were used.
o Windows XP pooled desktops
o 1vCPU and 512MB RAM
o 1.5 GB PVS Cache on NFS (NetApp 3170HA)

ESX 3.5 Update 4 host                                  VMs/Host   VMs/Core
HP BL460c Dual Quad Core (1.86GHz L5320) 16GiB RAM     28         3.5
HP BL460c Dual Quad Core (2.5GHz L5420) 32GB RAM       50         6.25

Note that at smaller scale, slightly higher single server density was possible; at large scale, however, we noticed some degradation of performance. Testing showed that with 34 desktops on the 16GB BL460c blade, ballooning was occurring but was unable to free enough memory. This caused the ESX host to start to swap the guest memory to the storage tier. This impacted the end user experience, as pages the guest believed to be in memory were actually on disk, causing an increase in latency for accessing those pages. A reduction in the number of guests per host removed the swapping behavior and removed the impact on the end user experience that was seen when the environment was being scaled out.

With 32GB RAM, 52 desktops were possible, though the system was close to becoming CPU bound. To avoid the risk of impacting user experience, we slightly reduced the density used in the large-scale tests.

Provisioning Services Scalability

The scalability of Provisioning Services builds on the results from the Single Server Scalability (SSS) testing. As we increased the number of desktops being streamed from PVS, we monitored the Login VSI score and the logon time to ensure that the end user experience remained acceptable. Standard Perfmon metrics were also captured to understand the characteristics of PVS and streaming pooled desktops.

As the full-system scalability testing was conducted and users were added up to the maximum capacity of the hardware, it was observed that one physical Provisioning Server could easily support the 3300 desktops. This is a significant improvement over earlier testing of previous versions of the technology.
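The density figures in the VM density table reduce to simple arithmetic. The following is a minimal Python sketch, not part of the original test tooling, that reproduces the VMs/Core values and estimates blade counts for a target number of desktops. The function names are hypothetical; the per-host densities (28 and 50), the 8 cores per dual quad-core blade and the 512MB guest size are taken from the text above. The RAM helper counts configured guest memory only and ignores hypervisor overhead and page sharing, which is a simplifying assumption.

```python
import math

CORES_PER_BLADE = 8  # dual quad-core blades, per the specifications above


def vms_per_core(vms_per_host: int, cores: int = CORES_PER_BLADE) -> float:
    """Return the VM density per physical core."""
    return vms_per_host / cores


def hosts_needed(desktops: int, vms_per_host: int) -> int:
    """Return the number of hosts required, rounded up to a whole host."""
    return math.ceil(desktops / vms_per_host)


def configured_guest_ram_gb(vm_count: int, vm_ram_mb: int = 512) -> float:
    """Return total configured guest RAM in GB (assumption: no overhead)."""
    return vm_count * vm_ram_mb / 1024
```

For example, `configured_guest_ram_gb(34)` gives 17.0, which exceeds the 16GB blade and is consistent with the ballooning and swapping observed at 34 desktops, while `configured_guest_ram_gb(28)` gives 14.0. Treat `hosts_needed(5000, 50)` as a lower bound for planning, since it leaves no headroom for host failures.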
Findings

To build a 5000-desktop VDI deployment, the findings of this round of testing indicate some new guidance in our overall approach to scalability, to be captured in a comprehensive scalability guide in the near future.

The Desktop and Desired User Experience

Ensuring proper design of a large-scale VDI deployment requires that you have a good understanding of how the users on average will be using their desktops and applications. The two critical elements are login storms and the in-session workload. Based on test data, the test environment is capable of supporting a login storm of 5000 desktops. The Login VSI workload represented a medium type of user, as described in the Methodology section. If your average user workload varies greatly from the one described in this design, you need to model the workload on at least a single-server basis to gain approximations for sizing servers and storage components differently.

Citrix XenDesktop Desktop Delivery Controller

The XenDesktop Desktop Delivery Controller configuration was an enterprise installation with the following adjustments to allow distribution of roles to 3 virtualized brokers:

Farm master (DDC1)
• Registry configured so that the DDC rejects VDA registrations.
• Pool Management throttling was configured at 40 desktops, overriding the default of 10% of the pool size (~160-170 desktops depending on the group).
• Configured as the preferred Farm Master.

VDA registration and XML brokering (DDC2 and DDC3)
• The above pool management configuration change was made in case pool management failed over to a different VDA.

This configuration was tested to support 5000 sessions.

Storage Recommendations

For a large VDI deployment, a scalable storage solution is a cost-effective and reliable solution. The NetApp FAS3170HA was used with 2 controllers, 70 x 300GB drives for storage and PAM II cards. The PAM II modules in the NetApp FAS3170HA filer did not offer any gains, as the workload on the storage was write focused.
For this version of XenDesktop and VDI design, the PAM II cards are not required and are not recommended. Otherwise, this particular configuration of NetApp is recommended as designed here for 5000 users, with the assumption that there will be some potential degradation in a complete failover situation (where one NetApp controller fails completely, or a similar failure occurs). To tune the NetApp sizing for your particular failover/recovery needs, it’s recommended to work with a NetApp sales engineer.

The FAS3170 was running OnTap version 7.3.2 with PAM II cards enabled. One aggregate was created per controller, with multiple volumes created on each aggregate per the layout shown below.

Server Hardware Findings

For hosting the actual virtual desktops, a blade server configuration is recommended. In this design, approximately 28 to 50 VMs/host were achieved, depending on the blade model, using the following:

HP BL460
• 2 x 1.86GHz Intel Xeon L5320 Quad Core (8MiB L2 Cache 1066MHz Bus)
• 1 x 36GB HDD SAS 10K rpm
• 16 GB RAM 667MHz
• Dual Broadcom 1Gb NICs
• QLogic QMH2462 Dual Port Fibre Channel HBA

HP BL460c
• 2 x 2.5GHz Intel Xeon L5420 Quad Core (12MiB L2 Cache 1333MHz Bus)
• 1 x 72GB HDD SAS 10K rpm
• 32 GB RAM 667MHz
• Dual Broadcom 1Gb NICs
• QLogic QMH2462 Dual Port Fibre Channel HBA

Using a similar hardware configuration but with newer Intel Nehalem processors (55xx series) and memory configurations of 64-96GB should provide significantly increased VM density.

For Provisioning Services, dedicated servers were used and were over-specified for this design of 5000 desktops. An HP BL680 was used:

Citrix PVS Server (camb5e1b02)
OS:    Windows 2008 64bit          Service Pack: 1
Make:  HP                          Model:        BL680
CPU:   4 x Intel E7450 2.4GHz      RAM:          64GiB
Disk:  2 x 72GB 10k SAS            Network:      8 x 1GbE
Provisioning Services 5.1 SP2

From the test data, this server was highly underutilized. The 24 core server is clearly over-specified. With peak CPU utilization below 30%, the load equates to at most 7.2 cores.
A dual quad-core server would be expected to handle this load, though it may run too close to maximum utilization; hence, instead of two 24-core servers, three 8-core servers would be sufficient.

Server Virtualization Findings

In our testing, two desktop groups were configured, pointing at two different VMware Virtual Center servers. Virtual Center 1 would run 1604 desktop sessions on 32 blades. Virtual Center 2 would run 1708 desktop sessions on 61 blades. Based on VMware best practice for the software versions used (VMware ESX 3.5 Update 4) and published maximums (2000 VMs per Virtual Center), the environment had to be split over 2 Virtual Center instances. Since then, VMware has released version 4.0, which has higher limits than the 2000 VMs tested in version 3.5 (note that in version 4, the limits are 3000 and 4500 for 32bit and 64bit guests respectively). In general, the recommendation would be to configure as few Virtual Centers as possible.

No changes were made or recommended from a standard installation. Servers were placed into logical clusters, with one cluster matching one blade enclosure. VMware ESX 3.5.0 build 176894 was used on all ESX hosts in the environment.

Each host is configured with a single virtual switch with both vmnic0 and vmnic1 connected.
The VM Network is configured with vmnic0 as active and vmnic1 as standby.
  o This is used for ICA, PVS and general network traffic
The Service Console is not bound to a specific vmnic.
VMotion is configured with vmnic1 as active and vmnic0 as standby.
  o This is used for NFS and VMotion traffic
Service Console was allocated 800MiB.
NFS configuration changes were made as per current NetApp guidance in the NetApp Technical Report TR-3428.
NTP was configured to sync time.
ESX hosts were installed with the latest HP ESX utilities for monitoring hardware.
Due to interrupt sharing issues between the vmkernel and the service console, USB was disabled in the BIOS.
See VMware KB article 1003710. Note that while USB was disabled in the BIOS, it was still available from the iLO, so remote keyboard access was still available.

Additional Implications for Scalability Design

Don’t place the PVS vDisk on a CIFS share.
  o Windows does not cache files from file shares in memory, so each time a call is made to the PVS server, it in turn has to reach out to the shared storage.

Ensure VMware Virtual Center hasn’t set a resource limit on your virtual machine.
  o When we moved from the DDC testing, which used 256MiB guests, to the large-scale test, we increased the VM memory back to 512MiB; however, for some reason a limit of 256MiB was placed on the memory resources available to the guest. This resulted in a VM which appeared to have 512MiB RAM but was limited to using only 256MiB of physical RAM, with the rest held in the VMware swap file. This led to a huge increase in our storage IO to the SAN, which crippled the large scale environment down to fewer than 100 desktops. Check: “Virtual Machine Properties -> Resources -> Memory -> Limit:”

Don’t place too many virtual machines on VMFS volumes.
  o Not applicable to the NFS implementation, but seen with single server scalability (SSS) testing using local VMFS volumes and also FC attached VMFS volumes. The impact was most noticeable on user logon time, which increased quickly with more than 40 active VMs on a single VMFS volume. Splitting the VMs across multiple volumes on the same number of disks alleviated the problem.

.NET 3.5 SP1 (+ later Windows updates) is necessary to improve scalability of the DDC.
  o Without this update applied we would see VDAs deregister as users began to log in to the system. This was seen with ~1500 desktops and higher. The Microsoft fixes to .NET addressed the problem and allowed testing to achieve ~6000 desktops.

By default Pool Management will attempt to start 10% of the total pool size. In a large environment this may be more than Virtual Center can cope with.
  o The number of concurrent requests can be throttled by editing the Pool Management Service configuration file:
    C:\Program Files\Citrix\VMManagement\CdsPoolMgr.exe.config
  o Modify the <appSettings> section by adding the line:
    <add key="MaximumTransitionRate" value="20"/>
  o The Pool Management service needs to be restarted to read the new configuration.
  o If VMware DRS is being used, a lower value should be set, as DRS needs additional time to determine guest placement before powering it on. In our testing with DRS enabled, the rate of 20 was used.
  o In our testing we allowed DRS to do the initial VM placement through a full run. DRS was then disabled, which allowed the MaximumTransitionRate to be increased to 40 without VC becoming overloaded.

Details on assigning the farm master roles can be found in CTX117477. Note that the XenDesktop PowerShell SDK can also be used to configure the preferred farm master. To stop the farm master handling connections, see the MaxWorkers registry key in CTX117446.

PVS NIC teaming can simplify the deployment of the PVS server.
  o NIC teaming also improves reliability: because one PVS server has one IP address, if a network connection fails, the remaining connections take over the load and the PVS server continues to operate on its current IP. This is especially useful for failover and HA, as only one IP address needs to be specified for the login server per host. This also allows the network layer to handle the load balancing of client connections over the available NICs.

Large Scale Test Results

Test Details

The test run of 3312 desktops comprised an idle pool spin-up with the following details:
o All sessions launched within approximately 60 minutes.
o Individual logon times tracked to ensure logon performance did not degrade significantly.
o All sessions running the Login VSI 1.1 workload, with their response times logged.
o At the end of the VSI workload phase the users would log out.
  This triggers Pool Management to shut down and then restart the desktop.
o PVS HA testing to ensure all desktops would continue to run in the event of a PVS server failure.
o Use of the various product management consoles during the test to ensure they remain responsive to general admin tasks.

Environment:

Two desktop groups, pointing at two different Virtual Center servers.
o Virtual Center 1 ran 1604 desktop sessions on 32 blades.
o Virtual Center 2 ran 1708 desktop sessions on 61 blades.
o Based on VMware best practice and published maximums, the environment had to be split over 2 Virtual Center instances.

Within each Virtual Center, individual clusters are created for each blade chassis (of up to 16 blade servers).
o Virtual Center 1 has clusters for two chassis of the more powerful blade servers. Virtual Center 2 hosts clusters for the other four chassis of blades.

Summary of Large Scale Test Results

• Powering on all 3312 desktops ready for users to log in took less than 60 minutes using the XenDesktop Idle Pool Management capability.
• Using a launch rate of 107/minute, 99% of users logged on in 31 minutes.
• PVS was shown to be able to run 3312 desktops from an HA pair of servers. In a separate test one of the PVS servers was shut down, triggering an HA failover. The ~1600 sessions transferred to the other server within 8 minutes.
• The scalability of the environment was verified through analysis of the logon times, Login VSI test response times and performance metrics gathered from all the major components.
• The perfmon data confirms that a number of the servers were oversized and could easily handle more load than was placed on them in this test.
• It took on average 19 seconds from launching the ICA file to having a fully running desktop.
• Login VSI response times indicate the system remained at an acceptable performance level for all users during the test.
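The launch and pool start-up figures reported above can be cross-checked with a short calculation. The following Python sketch is illustrative only: the function names are hypothetical, a constant launch rate is assumed, and rounding the 10% default down to a whole desktop is an assumption, since the document states only "10% of the total pool size".

```python
def minutes_to_launch(sessions: int, launches_per_minute: float) -> float:
    """Return the minutes needed to launch every session at a constant rate."""
    if launches_per_minute <= 0:
        raise ValueError("launch rate must be positive")
    return sessions / launches_per_minute


def default_start_batch(pool_size: int, percent: int = 10) -> int:
    """Return the desktops started at once under a percent-of-pool default.

    Rounding down is an assumption made for this sketch.
    """
    return pool_size * percent // 100
```

At 107 launches per minute, `minutes_to_launch(3312, 107)` comes to roughly 31 minutes, in line with the reported logon window. Applying `default_start_batch` to the two desktop groups (1604 and 1708 desktops) gives 160 and 170, which matches the ~160-170 range quoted for the default Pool Management behavior and shows why a throttle of 20 to 40 is a substantial reduction.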
Session Performance and Session Start-up Times

The Login VSI results capture response time against the count of sessions launched. The maximum response time increases nominally as session count increases, but overall, average response times stay within 2000 ms for the duration.

[Figure: Login VSI response time (ms) against active sessions, with Max Response_Time, Min Response_Time and Average Response_Time series.]

Total Sessions Launched                                              3312
Uncorrected Optimal Performance Index (UOPI)                         3312
Stuck Session Count before UOPI (SSC)                                0
Lost Session Count before UOPI (LSC)                                 44
Corrected Optimal Performance Index (COPI = UOPI - (SSC*50%) - LSC)  3268

Session start-up time is a measure of the time taken from starting the ICA client on the client launcher, having received the ICA file from a successful XML brokering request, to the session loading and the STAT mini agent (a .NET application loaded by the Windows start-up folder) loading. This method of calculating start-up time is the closest approximation to true user logon time in such a test environment.

Logon times mostly fall in a band between 15 and 22 seconds, though some stray sessions took close to 40 seconds near the end of the logon storm, when the earlier users were part way through their first workload run.
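The Corrected Optimal Performance Index reported in the Login VSI results above follows directly from the stated formula, COPI = UOPI - (SSC*50%) - LSC. A minimal Python sketch of that calculation is shown here; the function name is hypothetical and is not part of Login VSI itself.

```python
def corrected_optimal_performance_index(uopi: int, ssc: int, lsc: int) -> float:
    """Apply COPI = UOPI - (SSC * 50%) - LSC as defined in the results above.

    uopi: Uncorrected Optimal Performance Index
    ssc:  Stuck Session Count before UOPI (each counted at 50%)
    lsc:  Lost Session Count before UOPI (each counted in full)
    """
    return uopi - (ssc * 0.5) - lsc


# Values from this test run: UOPI 3312, SSC 0, LSC 44.
copi = corrected_optimal_performance_index(3312, 0, 44)
print(copi)  # prints 3268.0
```

With no stuck sessions, the only correction in this run is the 44 lost sessions, which gives the reported COPI of 3268.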
[Figure: session start-up times.] Min: 11 seconds; Max: 39 seconds; Average: 19 seconds.

Desktop Delivery Controller and Provisioning Services Performance

Where available, the data is presented for the environment during two phases. The first is the spin-up phase, which is controlled via XenDesktop Idle Pool Management. The second is the test run, in which the Login VSI 1.x workload runs in all the sessions until all desktops have run the full set of scripts at least once. A file is then dropped on a network share, which triggers the Login VSI scripts to initiate a logoff when they next complete a full run of the scripts. As the desktops were configured to reboot on logoff, additional load is placed on the systems when users begin to log off and idle pool management powers the desktops back on again.

Standard Microsoft Windows perfmon counters were used to collect the following performance metrics.

Desktop Delivery Controller Performance

As mentioned previously, 3 DDCs were used in this test with specific roles assigned. All ran as virtual machines on a separate ESX server from the desktop VMs. Each was configured with 4vCPU and 4GB RAM, running on an HP BL460c with 2 x 1.86 GHz Quad Core L5320 CPUs and 16 GB RAM.
DDC1: Farm Master + Pool Management Pool Spin Up During Test Run % Processor Time: _Total (4vCPU) 15:11:22 15:15:45 15:20:08 15:24:31 15:28:54 15:33:17 15:37:40 15:42:03 15:46:26 15:50:49 15:55:11 15:59:33 16:03:56 16:08:18 16:12:41 16:17:03 16:21:26 16:25:49 16:30:12 16:34:35 16:38:59 50 45 40 35 30 25 20 15 10 5 0 13:10:54 13:14:48 13:18:43 13:22:38 13:26:33 13:30:27 13:34:21 13:38:15 13:42:09 13:46:03 13:49:57 13:53:51 13:57:45 14:01:39 14:05:33 14:09:27 14:13:21 14:17:15 14:21:09 14:25:03 14:28:57 50 45 40 35 30 25 20 15 10 5 0 XenDesktop Services % Processor Time 15:11:22 15:15:59 15:20:36 15:25:14 15:29:51 15:34:28 15:39:05 15:43:43 15:48:19 15:52:56 15:57:33 16:02:09 16:06:46 16:11:23 16:15:59 16:20:36 16:25:13 16:29:50 16:34:28 16:39:06 200 180 160 140 120 100 80 60 40 20 0 13:10:54 13:14:55 13:18:58 13:23:00 13:27:01 13:31:02 13:35:03 13:39:04 13:43:06 13:47:07 13:51:08 13:55:09 13:59:10 14:03:11 14:07:13 14:11:14 14:15:15 14:19:16 14:23:17 14:27:18 200 180 160 140 120 100 80 60 40 20 0 Process ‐‐ % Processor Time ‐‐ CdsPoolMgr Process ‐‐ % Processor Time ‐‐ CdsPoolMgr Process ‐‐ % Processor Time ‐‐ CdsImaProxy Process ‐‐ % Processor Time ‐‐ CdsImaProxy Process ‐‐ % Processor Time ‐‐ CdsController Process ‐‐ % Processor Time ‐‐ CdsController Process ‐‐ % Processor Time ‐‐ CitrixManagementServer Process ‐‐ % Processor Time ‐‐ CitrixManagementServer Process ‐‐ % Processor Time ‐‐ ImaSrv Process ‐‐ % Processor Time ‐‐ ImaSrv The main item to note is that during pool spin up the high usage process is the CdsPoolMgr process. This is expected is it drives Virtual Center to start the guests up. The two peaks of the IMA service during Pool Spin up are caused by the UI taking the two desktop groups out of Maintenance Mode. During the test run itself IMASrv is responsible for brokering all the desktops, and so the zone master takes the most load while making 20 the decision on desktop assignment. 
In the later stage of the run, desktops start to log off, so the Pool Management service starts to shut down and restart the desktops.

[Figure: Memory – Committed Megabytes (MiB); panels: Pool Spin Up, During Test Run; y-axis 0 to 4,000 MiB.]

The memory usage on this DDC grows significantly towards the end of the run as users log off. Each logoff triggers the tainting detection code to shut down the VM; once it is shut down, pool management powers it back on again. It is suspected that, given enough time, garbage collection would correct the spike. Further investigation is required to better understand the dramatic memory increase at the end of the test.

[Figure: PhysicalDisk -- % Idle Time -- _Total; y-axis 50 to 100%.]

[Figure: Context Switches (per second); y-axis 0 to 10,000.]

[Figure: Network Utilisation (Mbps); y-axis 0 to 5 Mbps. Series: Mbps Received, Mbps Sent.]

The spikes in network traffic at the end of the test correspond to the desktops being shut down and restarted by the pool management service. This traffic is between the DDC and the Virtual Center servers, as can be seen from the corresponding increase in traffic on both VC servers at this time.

DDC2: XML + VDA registration

[Figure: % Processor Time: _Total (4 vCPU); panels: Pool Spin Up, During Test Run; y-axis 0 to 50%.]

[Figure: XenDesktop Services % Processor Time; panels: Pool Spin Up, During Test Run; y-axis 0 to 200%. Series: CdsPoolMgr, CdsImaProxy, CdsController, CitrixManagementServer, ImaSrv.]
In contrast to DDC1, the load is noticeably lower. The main active process is CdsController, which handles communication with the VDA, including heartbeats and initial registration.

[Figure: Memory – Committed Megabytes (MiB); panels: Pool Spin Up, During Test Run; y-axis 0 to 4,000 MiB.]

[Figure: PhysicalDisk -- % Idle Time -- _Total; y-axis 50 to 100%.]

Due to earlier memory leak tracing for the IMA Service, a user mode stack trace database was being created for imasrv.exe. This extra tracing caused the higher than normal disk utilisation, showing a steady baseline of 20% utilisation.

[Figure: Context Switches (per second); y-axis 0 to 10,000.]

[Figure: Network Utilisation (Mbps); y-axis 0 to 5 Mbps. Series: Mbps Received, Mbps Sent.]

DDC3: XML + VDA registration

[Figure: % Processor Time: _Total (4 vCPU); panels: Pool Spin Up, During Test Run; y-axis 0 to 50%.]

[Figure: XenDesktop Services % Processor Time; panels: Pool Spin Up, During Test Run; y-axis 0 to 200%. Series: CdsPoolMgr, CdsImaProxy, CdsController, CitrixManagementServer, ImaSrv.]
The load profile is, as expected, similar to DDC2. In contrast to DDC1, the load is noticeably lower. The main active process is CdsController, which handles communication with the VDA, including heartbeats and initial registration.

[Figure: Memory – Committed Megabytes (MiB); panels: Pool Spin Up, During Test Run; y-axis 0 to 4,000 MiB.]

[Figure: PhysicalDisk -- % Idle Time -- _Total; y-axis 50 to 100%.]

[Figure: Context Switches (per second); y-axis 0 to 10,000.]

[Figure: Network Utilisation (Mbps); y-axis 0 to 5 Mbps. Series: Mbps Received, Mbps Sent.]

Citrix Provisioning Services (PVS) Performance

There are 2 PVS servers handling the 3312 desktops in the environment. The processor and memory configuration of these servers can clearly be seen to be significantly over-specified. Each server's 8 gigabit NICs were configured as a NIC team, and the blade chassis had a 4 x 10GbE uplink to the core switch. The PVS servers each run on a BL680c blade with 4 x E7450 2.40 GHz hex core CPUs and 64 GB RAM.

PVS Server 1

[Figure: % Processor Time: _Total (4 x 6 Core CPUs); panels: Pool Spin Up, During Test Run; y-axis 0 to 100%.]

[Figure: Memory – Committed Megabytes (MiB); y-axis 5,000 to 6,000 MiB.]
[Figure: PhysicalDisk -- % Idle Time -- _Total; y-axis 0 to 100%.]

[Figure: Network Utilisation (Mbps) (8 Teamed 1GbE NICs); y-axis 0 to 4,000 Mbps. Series: Mbps Received, Mbps Sent.]

Peak traffic occurs during the user logon phase of the test run, with a peak close to 2.3 Gbps.

PVS Server 2

[Figure: % Processor Time: _Total (4 x 6 Core CPUs); panels: Pool Spin Up, During Test Run; y-axis 0 to 100%.]

The 24-core server is clearly over-specified. A peak of less than 30% equates to at most 7.2 cores.
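The core-equivalent figure above can be reproduced with simple arithmetic. The sketch below is illustrative only: the 30% peak, the 24-core count and the two PVS servers come from the text, while the function names and the 75% target utilisation ceiling are assumptions chosen for the example, not figures from the test.

```python
import math


def equivalent_cores(total_cores, peak_percent):
    # Number of fully busy cores that a given peak utilisation represents
    return total_cores * peak_percent / 100


def servers_needed(demand_cores, cores_per_server, ceiling_percent):
    # Servers required so that no server exceeds the chosen ceiling.
    # ceiling_percent is a planning assumption, not a measured value.
    usable_cores = cores_per_server * ceiling_percent / 100
    return math.ceil(demand_cores / usable_cores)


# Upper bound per PVS server, since the observed peak was below 30%
per_server = equivalent_cores(24, 30)

# Utilisation if that same load ran on one 8-core (dual quad core) server
utilisation_on_8_cores = per_server / 8

# Demand of both PVS servers combined, spread over 8-core servers
total_demand = 2 * per_server
count = servers_needed(total_demand, 8, 75)

print(per_server, utilisation_on_8_cores, count)
```

Under these assumptions, one 8-core server per existing PVS server would run at about 90% at peak, which is why a lower ceiling leads to a larger server count.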
A dual quad-core server would be expected to handle this load, though it may run too close to maximum utilisation; hence, instead of two 24-core servers, three 8-core servers would be expected to be sufficient.

[Figure: Memory – Committed Megabytes (MiB); y-axis 5,000 to 6,000 MiB.]

[Figure: PhysicalDisk -- % Idle Time -- _Total; y-axis 0 to 100%.]

[Figure: Network Utilisation (Mbps) (8 Teamed 1GbE NICs); y-axis 0 to 4,000 Mbps. Series: Mbps Received, Mbps Sent.]

This network load mirrors the load seen on the other PVS server, with a peak close to 2.2 Gbps.

NetApp Storage Performance

Analysis concentrates on the actual test run rather than the spin up phase, as the load is significantly higher. The following summary (courtesy of NetApp) captures the critical read/write and IOPS information for the 3312 desktop test.
Averages for 3312 Virtual Desktops

                                  Reads     Writes
Mean Network Read/Write ratio     11.5%     88.5%
Max Network Read/Write ratio      20.5%     79.5%
Mean Disk Read/Write ratio        14.2%     85.8%
Max Disk Read/Write ratio         17.8%     82.2%

                                  IOPS
Mean IOPS per desktop             4.4
Max Average IOPS per desktop      27.7

Analysis

o The storage controllers never had more than 2 of their 4 CPUs fully utilised, staying well within normal operating limits, with significant headroom for further growth if performance during a cluster failover is not required.
o The average latency for all protocols was well within reasonable limits, which would provide an excellent end user experience.
o During the start and end of the test run, the CIFS workload accounted for 50% of protocol usage. This is seen as a large number of reads at the beginning of the test (when user profiles are loaded) and a large number of writes at the end of the test (when profiles are written back).
o For the remaining duration of the test, NFS played the predominant role, being used for the PVS client-side cache.
o FCP (Fibre Channel) played little, if any, part in the workload seen on the filer. FCP was limited to database traffic for the various components in the environment.
o The majority of all I/Os were writes across all protocols.
o Average and maximum disk utilisation never exceeded 40%, which suggests there could be headroom to accept more virtual machines on these controllers.
o In the event of a cluster failure, the data indicates the filer could handle 3000-4000 desktops with minimal or no performance degradation.

VMWare Virtual Center and ESX Performance

Two blade servers were installed as physical Virtual Center servers. Within each VC, a cluster is created for each blade chassis of up to 16 ESX hosts. As there are two different hardware specifications in the lab, the number of virtual desktops hosted on each VC isn't quite balanced.
Virtual Center 1

Blade Chassis     # Hosts     # Virtual Machines
camb4e1           16          898
camb4e2           16          800
Total             32          1698

During the testing only 1604 desktops were actively used. The remaining VMs stayed powered off, though they would still be enumerated by Virtual Center and XenDesktop Pool Management. These additional VMs are present from earlier broker scalability testing.

Virtual Center 2

Blade Chassis     # Hosts     # Virtual Machines
camr3e1           16          481
camr3e2           14          392
camr5e1           16          480
camr5e2           15          420
Total             61          1773

In addition to the desktops above, VC2 also manages camr5e2b13, which hosts some infrastructure VMs, e.g. 3 x Brokers and 1 x NetApp performance monitor. Of the 1773 desktop VMs, only 1708 were powered on. As with VC1, these additional VMs were present from earlier testing at higher host densities.

camr3e2b15: Virtual Center 1

[Figure: % Processor Time: _Total (2 x 4 Core CPUs); panels: Pool Spin Up, During Test Run; y-axis 0 to 100%.]

[Figure: Process -- % Processor Time -- vpxd; panels: Pool Spin Up, During Test Run; y-axis 0 to 250%.]

The vpxd service is exercised when XenDesktop Pool Management requests that VMs be powered up or shut down. This can be seen during the spin up phase and at the end of the test run.
As this server has 8 cores, the peak at ~200% is equivalent to 2 cores being fully utilised.

[Figure: Memory – Committed Megabytes (MiB); y-axis 1,500 to 2,000 MiB.]

[Figure: PhysicalDisk -- % Idle Time -- _Total.]

[Figure: Network Utilisation (Mbps); y-axis 0 to 5 Mbps. Series: NIC1 and NIC2, each with Mbps Received and Mbps Sent.]

camr3e2b16: Virtual Center 2

[Figure: % Processor Time: _Total (2 x 4 Core CPUs); panels: Pool Spin Up, During Test Run; y-axis 0 to 100%.]

[Figure: Process -- % Processor Time -- vpxd; panels: Pool Spin Up, During Test Run; y-axis 0 to 250%.]

The load on vpxd is consistent between the two VC servers. As this server has 8 cores, the peak at ~230% is equivalent to a little more than 2 cores being fully utilised.

[Figure: Memory – Committed Megabytes (MiB); y-axis 1,500 to 2,000 MiB.]

The memory used on each VC is similar, though VC2 is ~300 MiB higher. This is to be expected, as it is managing roughly twice the number of ESX hosts and a higher number of VM guests.
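The host and VM counts reported for the two Virtual Center servers, and the vpxd peaks, can be cross-checked with a few lines of arithmetic. This is only a sketch: the numbers are copied from the tables and text above, while the dictionary layout and helper names are made up for the example. It assumes, as the text does, that 100% of process % Processor Time corresponds to one fully utilised core.

```python
# (ESX hosts, virtual machines) per blade chassis, as listed in the tables
vc1 = {"camb4e1": (16, 898), "camb4e2": (16, 800)}
vc2 = {
    "camr3e1": (16, 481),
    "camr3e2": (14, 392),
    "camr5e1": (16, 480),
    "camr5e2": (15, 420),
}


def totals(chassis):
    # Sum the host and VM columns for one Virtual Center server
    hosts = sum(h for h, _ in chassis.values())
    vms = sum(v for _, v in chassis.values())
    return hosts, vms


def cores_busy(process_percent):
    # Process % Processor Time is summed across cores: 100% == 1 core
    return process_percent / 100


vc1_hosts, vc1_vms = totals(vc1)
vc2_hosts, vc2_vms = totals(vc2)

# Desktops actually in use on VC1 plus desktops powered on under VC2
active_desktops = 1604 + 1708

# How many times more hosts VC2 manages than VC1
host_ratio = vc2_hosts / vc1_hosts

print(vc1_hosts, vc1_vms, vc2_hosts, vc2_vms)
print(active_desktops, round(host_ratio, 2))
print(cores_busy(200), cores_busy(230))
```

The per-chassis rows add up to the stated totals, and the two active desktop counts add up to the 3312 desktops quoted for the test.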
[Figure: PhysicalDisk -- % Idle Time -- _Total.]

[Figure: Network Utilisation (Mbps); y-axis 0 to 5 Mbps. Series: NIC1 and NIC2, each with Mbps Received and Mbps Sent.]

ESX Performance

The test environment consists of 2 different hardware configurations running the desktop workload. The data below is from a BL460c with 32 GB RAM and 2 x L5420 Quad Core CPUs.

[Figure: CPU Usage (2 x L5420 Quad Core 2.5GHz CPU), in percent; panels: Pool Spin Up, During Test Run; y-axis 0 to 700. Series: CPU Usage (Average) for cores 0 to 7.]

[Figure: Memory Usage; axes in MiB (0 to 30,000) and percent (0 to 60). Series: Memory Balloon (Average), Memory Shared Common (Average), Memory Granted (Average), Memory Swap Used (Average), Memory Active (Average), Average Memory Usage (%).]

[Figure: Disk Usage – Kilobytes/second; y-axis 0 to 2,000 KBps. Series: Disk Read Rate - vmhba0:0:0, Disk Write Rate - vmhba0:0:0.]

This traffic is on the local physical disk of the ESX host; it does not track the activity of the VMs, as these are on NFS shared storage. The frequency of the disk activity suggests some logging, perhaps of performance data from the VMs. The rate of traffic appears to be proportional to the number of running virtual machines.

[Figure: Network Utilisation (Mbps); y-axis 0 to 200 Mbps. Series: vmnic0 and vmnic1, each with Mbps Sent and Mbps Receive.]

Summary

• Spend extra time and care on how you simulate the user workload, as it strongly affects all design recommendations.
  o Don't forget to consider the entire user population and how and when login storms will occur.
  o Use free and reputable tools like LoginVSI from Login Consultants to simulate real-world-like user workloads.
• Design for failover: your infrastructure size will depend on the user experience you want during failover (degraded or not, and by how much).
  o Use central storage and blade servers for scale and reliability.
• Virtualize most major components of XenDesktop.
  o The Provisioning server in this design was not virtualized and, given its high scalability, you should dedicate a physical server to it in your design. It will be an option to run PVS virtualized, but look for recommendations on this in an upcoming document.

Appendix A – Blade Server Hardware and Deployment

The test environment consists primarily of HP Blade servers. Some additional servers hosting infrastructure for specific test components are detailed later in this report. VMware ESX was installed on BL460 servers of 2 different specifications, labelled (v1) and (v2), which were used to host both Windows XP desktops and a small number of VMs for XenDesktop Brokers (DDCs). The BL680 servers were used to host two Citrix Provisioning Services servers and a Microsoft SQL Server. These machines were somewhat over-specified for their roles.
BL460c (v1) – 1.86GHz Dual Processor Quad Core 16GB RAM
• 2 x 1.86GHz Intel Xeon L5320 Quad Core (8MiB L2 Cache, 1066MHz Bus)
• 1 x 36GB HDD SAS 10K rpm
• 16 GB RAM 667MHz
• Dual Broadcom 1Gb NICs
• QLogic QMH2462 Dual Port Fibre Channel HBA
Product Overview: http://h18000.www1.hp.com/products/servers/proliant-bl/cclass/460c/index.html

BL460c (v2) – 2.5GHz Dual Processor Quad Core 32GB RAM
• 2 x 2.5GHz Intel Xeon L5420 Quad Core (12MiB L2 Cache, 1333MHz Bus)
• 1 x 72GB HDD SAS 10K rpm
• 32 GiB RAM 667MHz
• Dual Broadcom 1Gb NICs
• QLogic QMH2462 Dual Port Fibre Channel HBA
Product Overview: http://h18000.www1.hp.com/products/servers/proliant-bl/cclass/460c/index.html

BL680 G5 – 2.4GHz Quad Processor Hex Core 64GB RAM
• 4 x 2.4GHz Intel Xeon E7450 Hex Core (9MiB L2 Cache, 12MiB L3 Cache, 1000MHz Bus)
• 2 x 72GB HDD SAS 10K rpm
• 64 GiB RAM 667MHz
• 8 x Broadcom 1Gb NICs
• QLogic QMH2462 Dual Port Fibre Channel HBA
Product Overview: http://h18000.www1.hp.com/products/servers/proliant-bl/cclass/680c/index.html

Blade Deployment

Appendix B – Network Diagram

This is a predominantly HP blade-based environment running the virtual machines. Dell 1950 1U servers are used to run many ICA clients on the same server to connect into the environment. The environment was originally designed to use Fibre Channel for storage traffic; however, in this testing NFS was used, as it offers greatly simplified management and scalability. All traffic is passed either to a top-of-rack Cisco 2960-G switch or via the Cisco blade switch modules in the blades back to a central Cisco 4510 chassis. This chassis houses multiple 1GbE and 10GbE line cards in addition to the supervisor modules. Where the blade switches support stacking, this feature has been used.

Fibre Channel Storage Network

The Fibre Channel network is used only for databases on the SQL Server running on one of the BL680 blade servers. All other storage traffic uses NFS over Ethernet links.
REFERENCES

Citrix (Knowledgebase Articles):
• Separating the Roles of Farm Master and Controller in the XenDesktop Farm (CTX117477)
• Registry Key Entries Used by XenDesktop (CTX117446)

NetApp:
• Deployment Guide for XenDesktop 3.0 and VMware ESX Server on NetApp (TR-3795)
• NetApp and VMware Virtual Infrastructure 3 Storage Best Practices (TR-3428)
• Citrix XenServer 5.0 and NetApp Storage Best Practices (TR-3732)
• Citrix XenDesktop 2.0 with NetApp Storage: Pilot Deployment Overview (TR-3711)
• 2,000-Seat VMware View on NetApp Deployment Guide Using NFS (TR-3770)

Project VRC / Login Consultants:
• VRC, VSI and Clocks Reviewed
• VMware Platform Performance Index v1.1
• XenServer Platform Performance Index v1.0

VMware:
• VMware Virtual Infrastructure 3.5 Configuration Maximums
• Comparison of Storage Protocol Performance
• NetApp FAS2020HA Unified Storage

About Citrix

Citrix Systems, Inc. (NASDAQ:CTXS) is the leading provider of virtualization, networking and software as a service technologies for more than 230,000 organizations worldwide. Its Citrix Delivery Center, Citrix Cloud Center (C3) and Citrix Online Services product families radically simplify computing for millions of users, delivering applications as an on-demand service to any user, in any location, on any device. Citrix customers include the world's largest Internet companies, 99 percent of Fortune Global 500 enterprises, and hundreds of thousands of small businesses and prosumers worldwide. Citrix partners with over 10,000 companies worldwide in more than 100 countries. Founded in 1989, annual revenue in 2008 was $1.6 billion.

©2010 Citrix Systems, Inc. All rights reserved. Citrix®, Access Gateway™, Branch Repeater™, Citrix Repeater™, HDX™, XenServer™, XenApp™, XenDesktop™ and Citrix Delivery Center™ are trademarks of Citrix Systems, Inc. and/or one or more of its subsidiaries, and may be registered in the United States Patent and Trademark Office and in other countries. All other trademarks and registered trademarks are property of their respective owners.