VMware Capacity Management
Presented by Metron using Athene

Agenda
• Key Metrics for VI3
• Increasing Efficiency
• Intelligent Reporting
• Intelligent Trending / Forecasting
• Intelligent Modeling
• ITIL Capacity Management

Reasons for Capacity Management
• Ensure the Enterprise receives ROI for its IT resources
• Ensure capacity levels support established service level targets
  – Quality of Service
• Ensure capacity is forecasted based on business events

Key Metrics for VI3
Monitoring and Alerting

VI3 Metrics
• Out of all the metrics available
  – What should I look at, and how should I use it?
• What should I monitor?
  – For what reason? Problems / Transient? Trends?
• What should I alert on?
  – >80%?

VMware metrics
• Host / guest metric families
  – CPU %busy, %ready etc.
  – Memory used/free, reclaimed, swapped etc.
  – I/O rate and response times by disk
  – NIC packets in/out, data rate in/out
  – Datastore size/free/used (host only)
  – Logical disk size/free/used (guest only)
• Resource pools
  – Definitions, limits, shares
• Clusters
  – CPU and Memory available for VMs

Resource Pool - Key Metrics
• Resource Pools
  – CPU Available for VM Reservation
  – CPU Usage
  – Memory Available for VM Reservation
  – Memory Usage
  – Is the Resource Pool expandable?
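The resource pool metrics above can be combined into a simple admission check. The sketch below is illustrative only: the `ResourcePool` class, its field names and the MHz figures are assumptions, not Athene or VMware API objects. It also uses a simplified model in which an expandable pool may draw on whatever its parent has not yet committed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResourcePool:
    """Hypothetical snapshot of one resource pool (field names are illustrative)."""
    name: str
    cpu_reservation_mhz: int      # CPU reserved for this pool
    cpu_committed_mhz: int        # part already promised to child VMs and child pools
    expandable: bool = False      # may the pool borrow from its parent?
    parent: Optional["ResourcePool"] = None

    def cpu_available_for_vm_reservation(self) -> int:
        """CPU (MHz) that can still be reserved for new VMs in this pool."""
        own = self.cpu_reservation_mhz - self.cpu_committed_mhz
        if self.expandable and self.parent is not None:
            # Simplifying assumption: an expandable pool can draw on whatever
            # its parent has not yet committed.
            own += self.parent.cpu_available_for_vm_reservation()
        return own


def can_reserve(pool: ResourcePool, request_mhz: int) -> bool:
    """True if a new VM reservation of request_mhz fits in the pool."""
    return request_mhz <= pool.cpu_available_for_vm_reservation()


# Example figures are invented for illustration.
root = ResourcePool("root", cpu_reservation_mhz=8000, cpu_committed_mhz=5000)
fixed = ResourcePool("gold", 2000, 1500, expandable=False, parent=root)
flexible = ResourcePool("silver", 2000, 1500, expandable=True, parent=root)

print(can_reserve(fixed, 1000))      # fixed pool has only 500 MHz left
print(can_reserve(flexible, 1000))   # expandable pool can also borrow from root
```

The same shape applies to memory. The point is that "available for VM reservation" and "expandable" have to be read together before deciding whether a pool is full.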
Proven Practice - Resource Pool Capacity Management with Metron Athene

Cluster - Key Metrics
• Clusters
  – Effective CPU Available for VMs
  – Effective Memory Available for VMs
  – Total Number of Hosts
  – Total VM Migrations
Proven Practice: Capacity Management Reporting for VMware Clusters with Metron Athene
Proven Practice - Creating a VMware Capacity Management Dashboard with Metron Athene

Host - Key Metrics
• Host
  – Physical CPU utilisation
  – Memory In Use
  – Memory Allocated to VMs
  – Memory Swapped In
  – NIC utilisation
  – Datastore Utilisation

VM - Key Metrics
• VM
  – Ready Time
  – Ballooning Driver Activity
  – CPU & Memory Usage
  – Disk Occupancy
• Resource Pool
  – Limit, Reservation, Utilization

Storage - Key Metrics
• Storage Metrics
  – Total IOPS (I/Os per second)
  – Data Reads and Writes
  – Disk Capacity
  – Disk Freespace
  – Not a metric, but review the placement of the disks within the enterprise and their access

VMware tool requirements
• Data capture/collect/storage
  – Simple, consistent data retrieval
  – Scalable, accessible database
  – Auto-manage data (aggregate, copy, delete)
• Automated reporting
  – How it was yesterday / last week / this year
  – How it might be if things carry on the same way
  – Create reports in HTML, Word, Excel, PDF, …
• Hands-on tool needed
• Capacity data lives on different levels and you need a tool to bring all the data together

Increasing Efficiency

Increasing Efficiency
• Influencing the design and implementation of VI within the Enterprise
  – 1 vCPU vs. multiple vCPUs
  – Transparent Page Sharing
  – Rogue and Under-Utilized VMs
  – From Metron paper
    » Using Athene to recycle VMs
    » Monitor VMs < 10% CPU?
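The "Monitor VMs < 10% CPU?" question can be sketched as a simple filter over collected CPU samples. This is a minimal illustration, not Athene's method: the function name, the sample data and the choice of average utilisation as the test are assumptions. The peak is reported alongside so that a VM that is mostly idle but has a real burst can be reviewed before it is recycled.

```python
from statistics import mean


def find_underutilized(cpu_samples, threshold_pct=10.0):
    """Return (vm_name, average %, peak %) for VMs averaging below the threshold.

    cpu_samples maps a VM name to a list of CPU utilisation percentages.
    """
    flagged = []
    for vm_name, samples in cpu_samples.items():
        if not samples:
            continue  # no data collected: cannot judge this VM
        average = mean(samples)
        if average < threshold_pct:
            flagged.append((vm_name, average, max(samples)))
    return sorted(flagged)


# Invented sample data for illustration.
samples = {
    "vm-a": [2.0, 3.0, 4.0],
    "vm-b": [40.0, 55.0, 60.0],
    "vm-c": [1.0, 1.0, 70.0],
    "vm-d": [5.0, 9.0, 8.0],
}

for name, average, peak in find_underutilized(samples):
    print(f"{name}: average {average:.1f}%, peak {peak:.1f}%")
```

Here `vm-c` is not flagged because its single burst lifts the average above 10%. Whether that is the right behaviour depends on the reporting interval, so the threshold and the statistic should both be treated as tunable.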

Virtual CPU Considerations
[Charts: "CPU ready with 1 vCPU" and "CPU ready with 1 vs 2 vCPUs", dated May 25, 2007. Series: Number of CPUs; CPU ready time Utilization %; shown for VM1GBVAP199 (under PlateSpin control) and VM1GBVIF021.]

Memory
• Avg. 53 MB used by VM
• About 70 MB used to support this VM
• Over 272 MB shared with other VMs
• No memory reclamation has been necessary

Rogue or Under-Utilized VMs

Intelligent Reporting
• Daily / Weekly / Monthly
• Dashboard
• Web Portal Publishing
• Intelligent and not time consuming

Resource Pool Report

Cluster Report - Dashboard

VMware Cluster Report

Host Report (Overhead)

VM CPU Usage

Capacity Planning proven practices
• What should everyone be doing
  – Design / 1 vCPU – avoid SMP contention, get more density (4x more VMs per Host)
  – Design / Use Transparent Page Sharing, get more density (reduce memory needs for VMs)
  – Metric / %READY – highlight saturated hosts (over-dense)
    » Understanding Ready Time
  – Process / Track impact of idle + rogue VMs (remove resources not being used)
  – Process / Capacity management hooked into release and change procedures

Alerting
• Determine what to alert on
• Determine how often to alert
• Determine what reports to have available when an alert is received
  – Have a tool box of reports that you run when an alert is received
  – Review past reports to determine if it is an anomaly or an indication of a future problem where action needs to take place

Thresholds for alerting

Intelligent Trending

Forecasts
• Forecasting
  – When will I run out of capacity?
  – Business data needed
• Trending
  – Straight-line trends
  – Trends with a point-in-time increase ("Dog Leg Trend")

VMware Trend Report

VMware Dog-leg Trend

Intelligent Modeling

Forecasts

Design and Placement
• Where do I put the next VMs?
  – Assume all things equal, only capacity is different between two clusters
    » What are the goals in terms of capacity (e.g. Cluster 1 <50% full Gold Standard, Cluster 2 <80% full Low Cost Standard)?
    » How can I get a suggestion "at a glance" without having to study reports?

VMware Host Modeling
• More detailed than cluster level
• Unit of planning = Host
• Make workloads = VMs
• Must be simple to set up and use
• Must be able to incorporate business information or application data, if available

Growing virtual workloads
• 4-CPU ESX Server currently running 5 virtual machines
• Requirement
  – Grow all workloads by 90% over 10 quarters
• Second go…
  – When needed, add a new disk to the host and move some of the I/O to it
• Third go…
  – As second go, but when needed upgrade CPU power by 50%

VMware Modeling summary
• Deal with physical things
  – Guest = workload
  – Hosts = physical machine
  – Cluster = several physical machines
• Use trends where trends will do
• Use intelligent trends when nonlinear changes occur
• Incorporate business info/needs

ITIL Capacity Management

What is ITIL? Why do I need ITIL?
• Best practice that will fit my business
• Repeatable processes
• Integration
• Standards

ITIL® Capacity Management
[Diagram. Levels: Business, Service, Component. Activities: Monitor, Analyze, Tune, Implement, Demand Management, Modeling, Application Sizing. Also shown: Capacity Plan; Capacity Management Information System (CMIS).]

Virtualized Demand Management
• Physical World
  – Operational hours
  – Differential charging
  – Restrict usage
• Virtual World
  – Operational Demand management
    » Utilizes "new" technology e.g. DRS, VMotion
  – Strategic Demand management
  – Much easier, optimal resource consumption

Service Response Times
• Capturing Service Response Times
  – Why are they important?
    » Provide a focus on the end-user experience
    » Provide the Service Level manager with service performance information
    » A valuable feed into the Service Level Agreements (SLA)
  – How are they captured?
    » GUI simulation tools
    » Network sniffers
    » ARMS
    » vCenter AppSpeed

Modeling: Performance Prediction
• Responses are non-linear
• Traffic-related queuing
• Lists, cache, freeslots
• Constraints of OS and network
• Constraints of RDBMS etc.
• Feedback loops
• Non-intuitive
[Chart: Response Time R against Utilization U, showing a non-linear change in Response Time R]

Data collection and management
• Data collection
  – Central vs guest/VM
    » Performance overhead associated with collecting at the VM level
    » Awareness of logical limits that have been applied
• Capacity Management Information System (CMIS)
  – Storage of all performance data across the enterprise
  – Provide central storage point
    » Performance data
    » Business data
    » Appropriate configuration data

Key Performance Indicators (KPIs)
• Business CM
  – % Reduction in physical estate
  – % Reduction in power consumption
  – % Reduction in cost
• Service CM
  – Service Response Times
    » How this has changed over time
    » Potential reduction following virtualization
• Component CM
  – No. of warning/critical threshold breaches
  – No. of performance-related incidents

Capacity Management Reports
• Business CM
  – Datacenter level
  – Avoid technical jargon
  – QoS, BMI etc.
• Service CM
  – Cluster/Resource pool
  – Technical utilization vs service response time
  – Provides the "user" perspective
• Component CM
  – Host/VM
  – Technical in nature, i.e. utilization, memory consumption

7 Point Plan for Effective Capacity Management
• People
  – Have the VMware team talk with the Capacity team. Use this slide deck & VI:OPS
  – Improve your knowledge of ITIL; VI is part of a larger entity.
    See VI:OPS and Metron training webinars
• Tools
  – Automate laborious activities with a tool such as Athene. See the whole picture and make informed decisions. Fast payback.
• Monitoring
  – Focus on the key metrics and create processes and reports around them.
• Reporting
  – Set up the reports shown in this presentation and meet regularly with stakeholders to review them.
• Create charts of your capacity trends
  – Use this presentation and VI:OPS as a guide and automate it using a tool such as Athene
• Modeling
  – Run scenarios on a regular basis. It's easy to do with a tool. See when capacity and service levels are impacted. How long do you have to react and what decisions need to be made?
• Improve
  – Use your new skills and knowledge to improve the efficiency of the infrastructure. Measure it, using the KPIs in this presentation ... Take a bow & get all the glory!

And finally…
• VI:OPS
  – http://viops.vmware.com/home/index.jspa
  – VI3.Blueprint Capacity Workshop
• Metron
  – www.metron-athene.com
• Tools
  – Athene Virtual Appliance
  – http://www.vmware.com/appliances/directory/13 24
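
Worked example: the "Growing virtual workloads" requirement from the modeling slides (grow all workloads by 90% over 10 quarters) can be turned into a rough first-pass projection. The sketch below is not Athene's modeling method. It assumes compound growth per quarter, a hypothetical starting host CPU utilisation of 50%, and an 80% threshold borrowed from the ">80%?" alerting question. It also ignores the non-linear response-time effects that the performance prediction slide warns about, so it only indicates when utilisation crosses a line, not when service levels degrade.

```python
def quarterly_growth_factor(total_growth, quarters):
    """Per-quarter multiplier that compounds to (1 + total_growth) overall."""
    return (1.0 + total_growth) ** (1.0 / quarters)


def first_quarter_over(start_util_pct, total_growth, quarters, threshold_pct):
    """Return the first quarter in which utilisation reaches the threshold.

    Returns None if the threshold is not reached within the planning horizon.
    """
    factor = quarterly_growth_factor(total_growth, quarters)
    utilisation = start_util_pct
    for quarter in range(1, quarters + 1):
        utilisation *= factor
        if utilisation >= threshold_pct:
            return quarter
    return None


# Starting utilisation and threshold are assumptions for illustration.
factor = quarterly_growth_factor(0.9, 10)
print(f"growth per quarter: {(factor - 1.0) * 100:.2f}%")
print("threshold reached in quarter:", first_quarter_over(50.0, 0.9, 10, 80.0))
```

Re-running the function with a lower starting utilisation, for example after moving I/O to a new disk or upgrading CPU power as in the second and third scenarios, shows how much reaction time each change buys.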