How to Build an IAAS Cloud
GlassHouse Whitepaper
By Dick Benton, GlassHouse Technologies, Principal Consultant

Can you build your own internal cloud? Or, even better, can you transition your existing environment to an internal cloud model? At first blush this may seem a herculean task, but in reality it is a matter of following a few simple rules, adopting a few disciplines, and finding some good tools.

Why an Internal Cloud – Benefits and Drivers?

In many organizations today, a sea change is rolling over what is considered cost-effective deployment of IT capabilities. Senior management, executive management, financial management, and many business unit “go getters” are all reading up on and getting excited about Cloud. Why is this so?

For the “go getters”, cloud is a way to get what they need quickly and painlessly, without the impediments of IT process, albeit at a price. Recently, we saw an article in an industry publication quoting a team leader who spoke in glowing terms about how Cloud had helped his team achieve their goals. Alas, that organization’s IT department was totally unaware that outside Cloud resources had been used until the article was published.

Senior corporate management sees promise in a future where IT presents as service-oriented and business focused rather than as an overhead providing back room number crunching. Financial managers are attracted to the potential for cost savings through the automation of IT administration functions, and to the potential for supplying peak needs through transitory operating expenses instead of long term capital expenses.

Coping with Cloud – Internal IT Strategies

How can the beleaguered CIO cope with this rapidly emerging and apparently unstoppable sea change while retaining and building relevance within the organization as a key enabler? As always, the answer is that it is better to run really fast in front of the truck than to be dragged in the dirt behind it.
This means the prudent CIO must first make access to an external Cloud resource easy and encourage it where it is most effective. Additionally, the CIO must start a rapid transition to allow internal resources to be deployed in a similar billable manner, just as easily, just as quickly, and most importantly, just as cost effectively as the external cloud.

This CIO strategy has a three-fold impact. First, it confounds the expectation that IT will always be an impediment; second, it starts building the service-oriented face of IT; and third, and perhaps most important, it acknowledges that IT resources come at a price. Historically, recognition of this last fact of life has been sadly lacking in most IT resource deployments. When IT is free, it isn’t valued. So how does the CIO start the journey to providing internal services in a manner similar to that of the external Cloud competitors?

What Type of Cloud Should You Build?

We suggest an initial focus on the Infrastructure as a Service (IaaS) cloud. IaaS is a deployment model whereby consumers of infrastructure services can quickly and easily self-provision their own needs, and equally quickly and easily release resources when they are no longer needed. Perhaps most importantly, however, they pay for the resources they consume. In this scenario, IT needs to be able to offer provisioning and de-provisioning automation, typically via a web-based front end, and be able to track usage with sufficient granularity to produce a monthly bill for only those services rendered.

© Copyright 2011 GlassHouse Technologies, Inc. All rights reserved.

But first, let’s visit one of the key secrets of cloud success: you must be able to track and charge resource utilization appropriately. As previously mentioned, if a resource is free, it will usually be consumed without cause or concern for tomorrow. Using storage as an example, multiple copies will be generated because they can be.
No one is going to release anything when they have finished using it if its use is free. Few but the most altruistic will care how much they use or for how long they keep it unless a price is associated with the resource usage. Free resources are consumed and retained without concern. Look what happens to the file shares in your MS Office environment. Unless resource use is priced, and priced competitively, lack of discipline over resource consumption can quickly destroy the most elegant cloud and produce the dreaded RPE (Resume Producing Event).

Your Cloud deployment MUST contain the capability to track usage and bill it under a transparent and reasonably fair cost model. It may not be possible to actually issue an invoice because of corporate policy or inertia; however, it is critical that usage and cost of usage by consumer and business unit be made visible as a key performance indicator in the IT presentation of monthly metrics.

Who are the Cloud Consumers?

So who are the consumers of cloud services? Which groups have the driving need to quickly provision infrastructure services, use the heck out of them, and then release them back into the pool when no longer needed? Here is another cloud secret: we have met the enemy and he is us. Yes, in most organizations the major consumers of infrastructure services are, in fact, other IT groups, either within the IT organization or within business units. Corporate IT is the infrastructure service supplier to applications and database service suppliers.

Typically, developer teams are always in need of another computer and more storage for another iteration of another version or the latest hot application idea. And database folks know that happiness is all the storage you can get: they not only consume large amounts, but often need many copies of the original amount for testing various versions.
Developers and DBAs can feel, not unreasonably, that resources should be available in a relatively short time, like today or tomorrow, not 3 weeks out. These groups in the IT supply chain are totally dependent on the infrastructure service supplier to provide them with the wherewithal to do their job in a timely and effective manner.

Who needs ad hoc access to resources outside IT? Typically, we find the technologically literate business unit making use of analytics and decision support. They want to access and even create “big” data and crunch it through compute-intensive engines to find better ways to do business. Having to wait days, if not weeks, before a critical analysis can be run is not conducive to high levels (or even any levels) of satisfaction with IT. In fact, IT is nearly always having to say “we’re sorry” while being unable to address the root cause, i.e. the inability to quickly (and that means automatically) provision resources to meet short term needs and short term surges. This leads us to another cloud secret: it is prudent to ensure you have the ability to cloud burst out of your environment and expand onto a handy third party IaaS provider to handle peaks that would otherwise be uneconomical to accommodate internally with Cap Ex.

Benefits of an Internal Cloud

What are the major benefits of transitioning your internal infrastructure to a Cloud model? What can you expect to be different if you have an Infrastructure as a Service Cloud? What does Future IT look like?

Let us look at the situation after the sea change has hit. Your most aggressive users are now happily provisioning themselves, and because they are paying a monthly charge, they are equally happily de-provisioning themselves and saving money. Because you have eliminated IT as a gate on the path to provisioning, you have removed the perception of IT as a major impediment or road block to progress.
IT no longer has to say “we’re sorry” and consumers are now getting what they need, when they need it, at a reasonably competitive price.

So how do you transition an IT operation from 7x24 crisis, with a backlog of incidents and service requests a mile long, to this smoothly functioning Cloud machine? The foundational answer has been around for some years. It is called the service provider model. Instead of managing 5,000 servers running 5,000 or more applications, the service provider model transitions the management effort to some 5 +/- tiers of service. Managing 5 entities is doable; even the most dedicated of us find it difficult to manage 5,000.

Building the Internal Cloud

There are seven major steps on the journey to transitioning your infrastructure to support an internal IaaS Cloud.

Step 1 – Build a Service Catalog

Re-vision your infrastructure capabilities into 5 +/- tiers of service with attributes based on performance, scalability, and protection. Often, we see names such as a platinum tier, a gold tier, and a silver tier. Each grouping of attributes will probably require specific technologies, and these technology differences will drive cost differences. High availability and quick recovery will prove expensive because of additional servers and replication. High performance based on IOPS may dictate certain storage and certain front end and back end port configurations. Scalability may demand an enterprise class device or a vendor who can scale out as well as up without adding administrative complexity.

Once the tiers of service are defined and published, we find rapid acceptance of the performance, scalability and protection attributes, as those are the ones consumers care about, whereas the technology specifications for the infrastructure that delivers those attributes really only need to be visible within IT engineering.
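
As a rough illustration of how such a catalog might be represented, the Python sketch below models three tiers and picks the cheapest one that meets a consumer’s stated needs. The tier attributes, the IOPS and recovery figures, the per-GB prices and the cheapest_tier helper are all assumptions invented for this example; they are not figures from any real catalog.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceTier:
    """One catalog entry; every value used below is made up for illustration."""
    name: str
    min_iops: int               # performance attribute
    recovery_hours: float       # protection attribute (time to recover)
    scales_out: bool            # scalability attribute
    price_per_gb_month: float   # hypothetical unit price in dollars


# Hypothetical three-tier catalog.
CATALOG = [
    ServiceTier("platinum", 20000, 1.0, True, 3.00),
    ServiceTier("gold", 5000, 8.0, True, 2.00),
    ServiceTier("silver", 1000, 24.0, False, 1.00),
]


def cheapest_tier(catalog, needed_iops, max_recovery_hours):
    """Return the lowest-priced tier that meets the consumer's requirements."""
    candidates = [
        tier for tier in catalog
        if tier.min_iops >= needed_iops
        and tier.recovery_hours <= max_recovery_hours
    ]
    if not candidates:
        raise LookupError("no tier satisfies the request")
    return min(candidates, key=lambda tier: tier.price_per_gb_month)


choice = cheapest_tier(CATALOG, needed_iops=3000, max_recovery_hours=12)
print(choice.name, choice.price_per_gb_month)  # prints: gold 2.0
```

The consumer asks in terms of performance and protection, and the price falls out of the tier; nothing about the underlying technology appears in the request.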

In addition, we find that differences in supporting technology create differences in the cost per GB or cost per GHz of the consumable resource. This results in a cost differential between tiers of perhaps 50% or more. Now, with costs per unit of resource visible in the catalog, IT is not only providing the consumer with a choice of services, but doing so in a manner that allows the consumer to choose the most appropriate service tier at the most appropriate price. This is a major change in corporate responsibility, as selection and consumption decisions are now made in a cost conscious manner.

So now we have a service catalog containing several tiers of service, and we have a price for the consumable resource of each tier.

Step 2 – Create a Service Level Agreement

The next step is to create a little “marketing” awareness by offering a formal service level agreement that guarantees delivery of the services to the attributes defined in the service catalog. When IT is prepared to offer a written guarantee on the services it provides, the whole IT/consumer dynamic transitions away from IT as an impediment to IT as a concerned and involved partner providing defined resources for a defined amount of money and to a defined level of service. The service level agreement should identify both parties to the contract and set out each party’s responsibilities, mutual responsibilities, and escalation and remediation clauses.

Step 3 – Build and Report on Key Performance Indicators

To make all this work, IT needs to know definitively what it has got, how it is being used, and whether more may soon be needed. Metrics make things happen. If you don’t know what’s happening, you will always be surprised.
This means IT must make a conscious and effective effort to implement key performance indicators that allow it to monitor and alert on its service delivery capability and, through trending, to anticipate supply and demand in time to avoid disruptive outages. It is critical that these metrics be empirically based and not subjective assessment or opinion. For example, what is the actual IOPS demand of a server? Without that and similar metrics, how can you size bandwidth? How do you know where to look for performance bottlenecks? How can you determine whether usage levels are rising beyond norms and may shortly drive a resource constraint outage? Interfaces are needed with incident management applications to track incidents against specific hardware or software components. Resource allocation, availability, consumption and resource release are all mission critical key performance indicators.

Step 4 – Build an Inventory of Infrastructure Components

To cost effectively administer your infrastructure, including orderly change management and release management, to say nothing of new product introduction or life cycle transitions, it is mission critical to know exactly what is on the floor, what it is connected to, and what is running on it, along with knowledge of inter- and intra-dependencies. Without such a configuration management database (CMDB), effective change management and release management are impossible, and changes will only generate more incidents and outages.

At this stage in your transition to the internal IaaS Cloud you will have made a substantial impact in simplifying and disciplining your IT infrastructure and developing a manageable, measured and monitored environment with mature change and release management. All this should reduce the number of alligators sufficiently to continue to drain the swamp.
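
One way to picture the Step 4 inventory is as a map of which components depend on which. The Python sketch below walks such a map to list everything a planned change could touch. The component names and the impacted_by helper are hypothetical, and a real CMDB would hold far more than this single relationship.

```python
from collections import deque

# Hypothetical inventory: each component maps to the components that run on
# or connect through it (its direct dependents).
DEPENDENTS = {
    "san-array-01": ["esx-host-01", "esx-host-02"],
    "esx-host-01": ["vm-db-01", "vm-web-01"],
    "esx-host-02": ["vm-web-02"],
    "vm-db-01": ["app-billing"],
    "vm-web-01": ["app-billing"],
    "vm-web-02": ["app-portal"],
}


def impacted_by(component, dependents):
    """Breadth-first walk returning everything downstream of a component."""
    seen = set()
    queue = deque([component])
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen)


print(impacted_by("esx-host-01", DEPENDENTS))
# prints: ['app-billing', 'vm-db-01', 'vm-web-01']
```

Asking the same question of the storage array returns every host, virtual machine and application in this toy inventory, which is the kind of answer change management needs before a maintenance window is approved.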

Step 5 – Implement Back End Billing

Now that you have a service catalog and are deploying your resources as priced services under a service level agreement, it’s time to use your key performance indicators to track and cost the month’s usage of those resources for each business unit. Preferably, this should be used in a cost recovery model, but that is probably beyond the tolerance of most financial departments, despite the obvious benefits of linking usage to a dollar figure. The adoption of cost conscious behaviors can have a dramatic effect on resource selection and usage. There are many startups offering unique and innovative ways to go about this function.

It is important to work with finance and develop a workable financial policy for charge back or at least cost visibility. It may not be corporately appropriate to issue an invoice internally, but you can achieve substantially the same thing by publishing costs by resource and user. Make friends with finance and figure out an appropriate set of policies. Will you recover only the usage cost, or the allocated cost? Will you simply recover your costs, add a markup to cover inflation, or mark up according to the trending data and its indication of what you will need to buy next year? Ensure your costs are fully loaded, and include the costs of facilities and staffing as well as hardware, software, and associated maintenance.

Step 6 – Rationalize the Infrastructure

First, figure out what can be automated. Not all resources can be automatically provisioned; the technology is just not there yet. Big Box Unix and mainframe may benefit from procedurally based scripting but will require some manual IT effort to deploy new or additional resources. The obvious target today for automating provisioning is the virtualized x86 platform and the network attached storage (NAS) environment.
Developing the policies, procedures and technical skills necessary to virtualize the x86 environment and its associated storage is absolutely key to implementing the most visible piece of your internal cloud services: automated provisioning. Remember, the Cloud deployment model is not only automated provisioning; it is also critical to be able to support de-provisioning when the consumer has finished with the resources. Without the virtualized platform, it is too clumsy to execute the many steps involved in provisioning and de-provisioning the physical environment by conventional means, even if supported by scripting. Virtualization technology lends itself to the automated provisioning of both compute and storage resources as well as the automated recognition and installation of added resources. Without virtualization, your Cloud deployment program will be simply vapor.

Now here we enter some controversy. Cloud deployment is about transitioning IT from a techno-centric “build it and they will come” organization that is slow moving and unresponsive to an organization that can provide rapid response to consumer driven services at lowest cost and with improved service levels over time. Nothing new here. This is the mantra of every successful business, and isn’t the IT Cloud supposed to be a successful business? But every technology platform requires a skill set, skill sets are carried by people, and people carry cost. Every documented process is written by people, executed by people step by step, and usually checked by people. So this leads us to another cloud secret: people are a key cost driver in the IT financial model. To reduce costs related to people, we must reduce the number of redundant vendors and platforms to a rationalized set of trusted partners. And more importantly, we must automate everything.
This means any policy or procedure that needs to be given voice should be implemented through automation. Every required checkpoint needs to be checked, alerted and escalated through automation. New hardware, once racked, should be automatically included in available configurations and resource pools. This level of automation is critical to a successful cloud, and it is often the most challenging component.

Step 7 – Automate Provisioning

This is the last and, from the end consumer viewpoint, the most important step: rapid self-provisioning. How is this front-end automation going to work? The essence of cloud is self-service selection of resources displayed in a service catalog at a defined price, just like buying something on the web from a catalog. And so we should be looking for a web-based front end and some stateless application code that allows a consumer to search the catalog, select the service or services required, receive a price, and then consummate the transaction, perhaps by entering the corporate equivalent of a credit card (e.g. department/employee). Consummation should result in a process that makes the resources immediately available to the consumer. Again, there is a plethora of startups with some quite amazing innovation offering this type of functionality. Understanding what platforms and APIs such applications will need to interface with is key to the selection process.

Some organizations that cut down to a single vendor infrastructure will venture bravely into self-development of web-based applications using APIs to native functionality. If needs are simple, this may be an appropriate starting point until the market matures and a smaller number of leaders emerge. It also avoids lock-in to the hardware vendors and the embarrassment of having your selected tool vendor fail in the marketplace.
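
To make the flow concrete, here is a minimal Python sketch of the catalog-to-transaction path described above: quote a price, provision against a cost center, de-provision, and total the monthly charge. The Portal class, the prices and the cost center name are assumptions for illustration only. The allocations live in memory here, whereas a real front end would call the provisioning APIs of the underlying platform and keep no state of its own.

```python
from dataclasses import dataclass, field
from itertools import count

# Hypothetical catalog: tier name -> price per GB per month, in dollars.
PRICE_PER_GB_MONTH = {"gold": 2.00, "silver": 1.00}


@dataclass
class Portal:
    """In-memory stand-in for a self-service front end (illustrative only)."""
    allocations: dict = field(default_factory=dict)
    _ids: count = field(default_factory=lambda: count(1))

    def quote(self, tier, size_gb):
        """Price shown to the consumer before the order is confirmed."""
        return PRICE_PER_GB_MONTH[tier] * size_gb

    def provision(self, tier, size_gb, cost_center):
        """Record the order against a cost center and return its id."""
        order_id = next(self._ids)
        self.allocations[order_id] = (tier, size_gb, cost_center)
        return order_id

    def deprovision(self, order_id):
        """Release the resource so it stops accruing charges."""
        del self.allocations[order_id]

    def monthly_charges(self):
        """Total the charge per cost center for resources still held."""
        totals = {}
        for tier, size_gb, cost_center in self.allocations.values():
            charge = self.quote(tier, size_gb)
            totals[cost_center] = totals.get(cost_center, 0.0) + charge
        return totals


portal = Portal()
first = portal.provision("gold", 100, "dept-analytics")
second = portal.provision("silver", 500, "dept-analytics")
portal.deprovision(second)       # released, so it is no longer billed
print(portal.monthly_charges())  # prints: {'dept-analytics': 200.0}
```

Because the charge is computed only from what is still allocated, releasing a resource shows up directly as a lower bill, which is the behavior the priced catalog is meant to encourage.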

A number of hardware vendors and independent third parties are developing or have already released front end web-based platforms that provide the end consumer with an “IT free” provisioning process from service selection through the actual allocation and assignment of access. We suggest you research what is out there and make a list of the features and functions most important to your organization. Use this list of requirements to compare vendors in an RFP process.

Conclusion

Front end provisioning, back end invoicing and virtualization of your x86 platform are the hard parts. But what you can do right now is build the disciplines and the services under the service provider model, ensuring you have a priced service catalog, service level agreements, key performance indicators, and mature processes. This takes you from managing 5,000+ entities to managing 5 +/- tiers of service and provides a disciplined framework in which you know what you’ve got and what’s running on it, and have access to the metrics to manage and monitor it all cost effectively. This takes you a long way toward improving service and satisfaction levels and reducing costs.

Once you virtualize the storage and compute environment and automate the front-end provisioning process, you dramatically improve your time to market and make a major leap ahead in servicing your consumers and becoming a valued and visible member of the organization. Automating back end billing allows the organization to transition to a new cost conscious awareness in the consumption of valuable IT resources and to the recognition of IT as a competitive service supplier.

The result? IT now operates as a competitive and cost effective internal supplier that can also rapidly deploy external Cloud suppliers for peak loads and transitory demand.

GlassHouse Technologies is a global provider of data center consulting and managed services.
In a rapidly changing data center environment, GlassHouse partners with customers to define a strategy, execute that plan and operate their environment. Our constant focus is on cost efficiency, risk mitigation and service improvement. We provide this through Transom, our unique business model comprising proprietary software tools, methodologies and domain expertise. We help deliver on the promise of agility and usage-based spending in the next generation infrastructure paradigm. This journey is significant and requires an experienced and pragmatic guide to help you achieve your goals. Our experience is based on thousands of projects in addition to ongoing daily operations of customer environments.