The Technology Enabling 100:1 Data Efficiency
Whitepaper

SimpliVity's Data Virtualization Platform (DVP) is the market-leading hyperconverged infrastructure that delivers triple-digit data efficiency rates. The DVP was designed from the ground up to simplify IT by solving the data problem and dramatically improving overall data efficiency.

The Data Problem

IOPS have become one of the most expensive resources in the data center. Application demands on IT infrastructure have accelerated in the age of virtualization, and the legacy infrastructure stack can't keep up. In the typical data center, many disparate products are required to deliver critical capabilities to enterprise applications: performance, data protection, data efficiency, and global unified management.

Why so many different products? Enterprise applications require these critical capabilities, and as data centers expanded and data volumes grew, these point technologies were the only options available to keep the lights on. At first, it may have been one deduplication device for backup, or an optimization appliance for the WAN. But over time, the number of boxes started to multiply, and most were not originally designed for virtualized applications and workloads.

Technology Refresh Cycle

Generally, each of these products is purchased from a different vendor, each requires separate training, each is managed from a separate management screen, each requires a separate support and maintenance contract, and each is bought on a different refresh cycle.

One specific example of the proliferation of disparate devices and technologies is data efficiency. When introduced to the market in the mid-2000s, deduplication was designed specifically for backup. In this use case, optimizing for capacity is crucial, given the massive redundancy of data and the ever-increasing volume of data to be backed up and retained. Deduplication then spread to other isolated phases of the data lifecycle as IT organizations saw the benefits:

• Enhanced data mobility. A fundamental principle of server virtualization is the mobility of VMs, but coarse-grained data structures (large block sizes, LUNs, etc.) significantly inhibit mobility in a traditional infrastructure environment. When the data is deduplicated, it is easier to move VMs from one data center to another, and easier to move data in and out of the cloud for the same reason.
• Enhanced performance, given that less actual data needs to be written to or read from disk. This is amplified in application environments such as Virtual Desktop Infrastructure (VDI), where a "boot storm" can generate multiple gigabytes of random reads from disk.
• Efficient storage utilization. Required capacity can be reduced 2-3x in standard primary storage cases through the effective use of deduplication, compression, and optimization.
• More efficient use of the SSD storage cache. A deduplication process that operates at the right point in the data stream can reduce the footprint in the cache, increasing the effective capacity and overall system-wide performance.
• Dramatic bandwidth reduction on replication between sites. Twenty years ago, the IT organization was dedicated to a single primary data center; today, almost all IT teams manage multiple sites, so efficient data transfer among sites has become a fundamental infrastructure requirement. Deduplicating data before it is sent to a remote site makes the transfer itself more efficient and saves significant bandwidth.
Despite the maturity of data efficiency over the past decade, and the capacity and performance benefits it brings, no vendor has thus far comprehensively solved the deduplication challenge. As mentioned previously, deduplication was originally designed for backup, which occupies a stage in the data lifecycle long after data has originally been created. Intuitively, deduplicating, compressing, and optimizing data at the point of inception (long before backup, transfer, archive, etc.) would solve the performance and capacity challenges before redundancy and wasted resources proliferate throughout the infrastructure.

Legacy technologies to date have been unable to deliver on this ideal, typically as a result of performance tradeoffs that IT teams are not willing to make. Instead, legacy technologies have provided deduplication and/or compression as point products or appliances at various stages in the lifecycle. Some products apply deduplication only within the SSD tier, and therefore offer only limited benefits in terms of overall efficiency. Others apply compression technology and incorrectly use the term "deduplication". In primary storage systems, optimizing for disk capacity is a relatively low priority: hard disk IOPS and CPU cycles are much more expensive system resources than HDD capacity. Because of the latency that deduplication imposes, many vendors have deployed it as a "post-process," which severely limits other operations such as replication and backup. Most of these sub-optimal implementations are the result of adding deduplication to an existing legacy architecture, rather than developing it as the foundation of an overall 21st-century architecture.

The various fragmented work-arounds that vendors have delivered have varying levels of value, but fall short of solving the underlying problem; they ultimately do not deliver a truly fine-grained and mobile data infrastructure. IT teams can be left with higher acquisition costs and even more complexity as they manage a patchwork of incomplete data efficiency solutions amid their other infrastructure burdens. Many IT organizations have thus invested in up to nine separate data efficiency products, each designed to provide some level of efficiency (deduplication, compression, or optimization) at a specific stage within the data lifecycle. These stages include, but are not limited to:

1. Flash cache in the server
2. DRAM and/or flash cache or tier in the storage array
3. Disk in the storage array
4. Flash array
5. Backup appliance in the primary data center
6. Archive or secondary storage array
7. DR storage array
8. WAN optimization appliance
9. Cloud gateway appliance

Traditional Data Efficiency Technologies

In today's data center, the primary concern is IOPS, not HDD capacity. With traditional hard disk-based storage devices, storage performance can be hard to achieve. Here's why: when a user decides to save a document, the mechanical components inside the hard drive on which that document will be stored jump into action. The write head, which is the device responsible for saving that data, positions itself a fraction of a hair's width above a magnetic platter that spins at between 7,200 and 15,000 revolutions per minute. Once positioned, the write head starts to write the file to the storage. However, there may not be enough contiguous space on the platter to write the whole file at once, so the write head saves portions of the file all over the disk and keeps a comprehensive index so that it knows where the pieces of every file are stored. The more time it takes to write the file, the longer the user has to wait for the computer to save it. The time between the command to write the file and the acknowledgement that the file has been written is known as latency.

Latency is one of the most serious problems in today's data centers, and it is exacerbated by new kinds of business applications that place ever greater performance demands on storage systems. Eventually, latency becomes so bad that it is a detriment to ongoing business operations.
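The arithmetic behind this latency ceiling is straightforward. The short sketch below is illustrative only; the drive parameters are generic assumptions, not SimpliVity figures. It estimates the random-IOPS limit of a single spinning disk from its seek time and rotational latency:

    # Illustrative estimate of the random-IOPS ceiling of one spinning disk.
    # The RPM and seek-time values below are generic assumptions.

    def random_iops_estimate(rpm, avg_seek_ms):
        # Average rotational latency is half a revolution.
        rotational_latency_ms = (60_000 / rpm) / 2
        service_time_ms = avg_seek_ms + rotational_latency_ms
        return 1_000 / service_time_ms

    for rpm, seek_ms in [(7_200, 8.5), (15_000, 3.5)]:
        print(f"{rpm:>6} RPM drive: ~{random_iops_estimate(rpm, seek_ms):.0f} random IOPS")

Under these assumptions, a 7,200 RPM drive delivers on the order of 80 random IOPS and a 15,000 RPM drive roughly 180, regardless of whether the drive holds 1TB or 6TB. That is why IOPS, rather than raw capacity, is the resource data centers run out of first.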
To combat this growing problem, a new kind of storage has entered the market. Called flash storage or solid state storage, it does not suffer from the latency issues that plague traditional hard drives. However, where modern hard drives can hold 6 terabytes (6 trillion bytes) or more per disk, solid state disks typically store only a fraction of that amount (1TB, for example). Further, solid state disks remain very expensive compared to hard disks, at least when it comes to capacity.

The data efficiency technologies developed for the appliances described in the previous section were historically designed to address a capacity problem. Since HDDs were expensive, a number of appliances were developed to conserve as much HDD capacity as possible by sacrificing CPU resources. But as we just discussed, the real challenge is IOPS. Let us review the technical advantages and disadvantages of today's prevailing data efficiency technologies.

Compression: Compression is the process of reducing the size of any given data element. Not all data compresses well; already-compressed content such as video or audio gains little, while text compresses very well. The challenge is that there is no way to know exactly how well data will compress without actually compressing it, as the short example following these bullets illustrates.

• Inline Compression:
o Resource intensive; requires a significant amount of CPU.
o A time-consuming process that increases latency.
o Benefit: a reduction in used HDD capacity.

• Post-Process Compression: first writes data to HDD, then reads the data back from the HDD later and attempts to compress it.
o Additional HDD IOPS are required after the initial write to read and then potentially rewrite the data.
o CPU is required after the initial write to read the data.
o CPU is required to process the data.
o More HDD IOPS are needed to write the data back to disk.
o Benefit: eventual HDD capacity savings.
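To make the point about content-dependent compressibility concrete, the minimal sketch below (illustrative only, using Python's standard zlib module rather than any vendor's compressor) compresses a block of text and a block of random bytes, which stands in for media or encrypted data, and reports the resulting ratios:

    import os
    import zlib

    samples = {
        "text":   b"the quick brown fox jumps over the lazy dog " * 200,
        "random": os.urandom(8192),   # stands in for media or encrypted data
    }

    for name, data in samples.items():
        compressed = zlib.compress(data)
        ratio = len(data) / len(compressed)
        print(f"{name:>6}: {len(data)} -> {len(compressed)} bytes (~{ratio:.1f}:1)")

The text shrinks by an order of magnitude or more, while the random block does not shrink at all; the ratio is only known once the compressor has actually run.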
Deduplication: Deduplication is a specialized data efficiency technique for eliminating duplicate copies of repeating data. It is used to improve storage utilization: unique chunks of data, or byte patterns, are identified and stored during a process of analysis, and subsequent copies are replaced with references to the stored chunk. Most systems deduplicate at only one phase of the data lifecycle; by definition, this deduplication happens after some other event.

• Inline Deduplication:
o Imposes a performance penalty on all I/O, as it requires CPU resources.
o Storage system: data is first written by the application to the server, then transferred over a storage area network to the array.
o Backup system: data is first written to servers and storage before being backed up and deduplicated on the backup appliance.
o Benefit: a reduction in used HDD capacity.

• Post-Process Deduplication:
o Data is first written to disk, and then deduplicated after the fact.
o Requires sufficient space upfront to first ingest the data before deduplicating.
o Requires additional IOPS to write the data, then read it to perform the deduplication, and then write it again in deduplicated form.
o Requires additional WAN bandwidth to transfer the blocks over the wire before deduplication occurs.
o Benefit: a reduction in used HDD capacity will eventually be realized.

• Read Cache Deduplication:
o Requires sacrificing CPU upfront.
o Benefit: an increase in effective cache size and in application IOPS.

Each of the data efficiency technologies described above shares some of the same fundamental flaws:

1. They require a tradeoff: whether it's sacrificing CPU up front or IOPS on the back end, they're robbing Peter to pay Paul.
2. They steal application resources: within any converged environment, they all use CPU, memory, and disk that should be reserved for applications.
3. They have hidden costs: within any converged environment, more CPU resources may be required to support the applications, which could mean more hypervisor and database licensing fees.
4. They were designed for capacity: fundamentally, these technologies were designed to solve a capacity problem, not an IOPS problem.
5. They were designed for different stages of the data lifecycle: each technology is implemented on one device at one point in time, so every time a piece of data moves to the next stage of its lifecycle, it needs to be processed all over again.

All of this points in one direction: 21st-century data has to be deduplicated, compressed, and optimized at inception and maintained in that state through the entire lifecycle. When data is deduplicated across all tiers right from the point of inception, it has significant resource-saving ramifications downstream and opens up the advanced functionality required for today's virtualized world. SimpliVity's OmniCube platform leverages hybrid storage right out of the box, providing customers with the best of both worlds when it comes to storage.
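As a conceptual illustration of what "deduplicated at inception" means in practice, consider the following minimal sketch. It is not SimpliVity's implementation; the DedupStore name, the fixed 8KB chunk size, SHA-256 fingerprints, and zlib compression are all assumptions made for the example. Data is split into chunks as it is written, each chunk is identified by its fingerprint, and only chunks never seen before are compressed and stored, so the data never exists in an unoptimized form:

    import hashlib
    import zlib

    CHUNK_SIZE = 8192  # illustrative granularity (the paper cites 4KB-8KB blocks)

    class DedupStore:
        def __init__(self):
            self.chunks = {}    # fingerprint -> compressed unique chunk
            self.objects = {}   # object name -> ordered list of fingerprints

        def write(self, name, data):
            refs = []
            for i in range(0, len(data), CHUNK_SIZE):
                chunk = data[i:i + CHUNK_SIZE]
                fp = hashlib.sha256(chunk).hexdigest()
                if fp not in self.chunks:          # store unique data only once
                    self.chunks[fp] = zlib.compress(chunk)
                refs.append(fp)
            self.objects[name] = refs

        def read(self, name):
            return b"".join(zlib.decompress(self.chunks[fp])
                            for fp in self.objects[name])

    store = DedupStore()
    vm_image = b"base operating system block " * 2000
    store.write("vm1.vmdk", vm_image)
    store.write("vm2.vmdk", vm_image)        # an identical VM adds no new chunks
    physical = sum(len(c) for c in store.chunks.values())
    print(f"logical: {2 * len(vm_image)} bytes, physical: {physical} bytes")
    assert store.read("vm2.vmdk") == vm_image

Writing a second, identical VM image adds only metadata, no new chunks. At a high level, that is the property the whitepaper argues must be preserved across every tier and every stage of the lifecycle.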
The SimpliVity Solution - Hyperconvergence

SimpliVity's solution is a revolutionary hyperconverged platform: a scalable, modular 2U building block of x86 resources that offers all the functionality of traditional IT infrastructure in one device. It is an all-in-one IT infrastructure platform that includes storage, compute, networking, hypervisor, real-time deduplication, compression, and optimization, along with powerful global unified management, data protection, and disaster recovery capabilities. SimpliVity's hyperconverged infrastructure delivers the best of both worlds: x86 cloud economics with critical enterprise capabilities. SimpliVity refers to this level of integration as hyperconvergence. Hyperconvergence has a few critical design elements:

1. High availability: the solution is designed with no single point of failure.
2. Scale-out: new systems are seamlessly added into the network, adding linear performance and capacity.
3. Global unified management: all resources are managed from the same pane of glass, globally.
4. TCO savings: the solution provides at least 3x Total Cost of Ownership savings when compared to the traditional infrastructure "legacy stack".
5. Operational efficiency: the solution ties back to SimpliVity's vision to "Simplify IT", which includes not only the hardware and software, but also the operational elements, from provisioning, to backup and restore, to ongoing management and maintenance.

The Data Virtualization Platform (DVP) is SimpliVity's foundational technology, essential to delivering data efficiency while minimizing the impact of the I/O Gap. (The I/O Gap is the increasing divergence between the available capacity of an HDD and the IOPS it provides; over the last decade, disk drive capacity has increased to 6TB, yet performance has increased only about 1.5x.) The DVP enables SimpliVity to transform all IT infrastructure below the hypervisor, hyperconverging 8-12 previously disparate devices, and their associated data efficiency technologies, into one scalable building block. The key innovation is SimpliVity's ability to provide enterprise data efficiency without the tradeoffs of legacy technology: all of the benefits of deduplication without the penalties inherent in the legacy data efficiency equations. The SimpliVity equation is a result of its underlying architecture. The DVP includes two components:

1. Inline Deduplication, Compression and Optimization: SimpliVity's DVP performs inline data deduplication, compression, and optimization on all data at inception, across all phases of the data lifecycle (primary, backup, WAN, archive, and cloud) and across all tiers within a system (DRAM, flash/SSD, and HDD), all handled with fine data granularity of just 4KB-8KB.

In order to achieve what was previously thought to be contradictory, deduplicating and compressing data inline without performance penalties or resource tradeoffs, SimpliVity developed the OmniStack Accelerator Card, a PCIe card with an FPGA, flash, and DRAM, protected by super capacitors. Deduplication and compression are processor-intensive activities, and with that in mind, SimpliVity's Accelerator Card provides the processing power and reserves the Intel CPUs to run business applications. This architecture allows data processing at near-wire speeds, delivering enterprise-class performance and reducing latency.

All data is deduplicated, compressed, and optimized before it ever hits a hard disk. This is critical. Not only does this save capacity and prevent administrators from over-provisioning hardware, it also increases performance. Why? Because SimpliVity eliminates IOPS. HDD performance is critical when managing data and supporting business applications, and even with the aid of DRAM and flash caches, HDD performance will eventually suffer. SimpliVity avoids this tradeoff by eliminating or compressing write operations before they impact the HDDs. The Data Virtualization Platform also aggregates writes and performs full-stripe writes to disk. Furthermore, because sequential writes are faster than random I/O, SimpliVity accepts random writes and commits them to disk sequentially, boosting overall system performance and efficiency.
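The write-aggregation idea described above can be illustrated with a simplified sketch. The WriteCoalescer name, the stripe size, and the in-memory "disk" are assumptions for the example, not SimpliVity's code: random logical writes are buffered and then flushed as one large sequential write, with a small map recording where each logical block landed.

    import random

    STRIPE_SIZE = 8              # logical blocks gathered per full-stripe write

    class WriteCoalescer:
        def __init__(self):
            self.pending = []    # (logical_block, data) waiting to be flushed
            self.disk = []       # append-only list: models purely sequential writes
            self.block_map = {}  # logical block -> position in the sequential log
            self.stripes_written = 0

        def write(self, logical_block, data):
            self.pending.append((logical_block, data))
            if len(self.pending) >= STRIPE_SIZE:
                self.flush()

        def flush(self):
            # One sequential stripe write takes the place of many random writes.
            for logical_block, data in self.pending:
                self.block_map[logical_block] = len(self.disk)
                self.disk.append(data)
            self.pending.clear()
            self.stripes_written += 1

        def read(self, logical_block):
            return self.disk[self.block_map[logical_block]]

    coalescer = WriteCoalescer()
    for block in random.sample(range(1000), 16):   # 16 writes to random addresses
        coalescer.write(block, f"data-{block}".encode())
    print(f"{len(coalescer.disk)} blocks landed on disk in "
          f"{coalescer.stripes_written} sequential stripe writes")

Instead of sixteen scattered writes, the toy "disk" sees two sequential stripe writes; on a real HDD, that difference is what turns random-I/O latency into something much closer to sequential throughput.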
While deduplication is the fundamental core, the Data Virtualization Platform further enhances CAPEX and OPEX savings by delivering remarkable efficiencies through operating-system- and virtualization-aware optimizations. These optimizations deliver effects similar to deduplication in a different way: they identify data that need not be copied or replicated, and take data-specific actions to improve the overall efficiency of the system. Given that SimpliVity's hyperconverged infrastructure today is optimized for the VMware environment, most such optimizations stem from awareness of VMware-specific content or commands. For example, .vswp files, though important to the functionality of each individual VM, do not need to be backed up locally or replicated across sites. Thus, when preparing to back up or replicate a given VM from one site to another, the Data Virtualization Platform recognizes the .vswp file associated with the VM and eliminates that data from the transfer, saving time, bandwidth, and capacity. Another important optimization is SimpliVity's ability to serialize, or "sequentialize", random I/O into sequential I/O; because sequential writes are much more efficient, and therefore faster, than random writes on an HDD, this makes the HDD even more efficient in terms of IOPS consumption. Other optimizations are similar in nature, leveraging the Data Virtualization Platform's ability to find and make real-time decisions on common data types within a VMware environment.

2. Global Unified Management: The second critical element of the DVP is global unified management: an intelligent network of collaborative systems that provides massive scale-out capabilities and VM-centric management through a single unified interface for the entire global infrastructure, the Federation. A key differentiator for SimpliVity is that the Graphical User Interface (GUI) is fully integrated with VMware vCenter as a plug-in. A single administrator can manage all aspects of the global SimpliVity Federation from within vCenter, a tool that all VMware administrators already know and use regularly. SimpliVity's management is at the VM level, globally. Gone are the days of managing LUNs and volumes; instead, an administrator manages, moves, and protects VMs with the click of a button within the vCenter plug-in. Moreover, SimpliVity's DVP was designed from the outset to allow data to be mobile, and remain mobile, throughout its lifecycle within the global Federation.

Global unified management combined with SimpliVity's data efficiency equation makes for a powerful data efficiency solution. Not only can SimpliVity provide unique deduplication, compression, and optimization within one site, it can do so globally. When SimpliVity backs up, restores, or moves a VM to a remote site, it never transfers duplicate data; it transfers only the data that the remote site needs. If the data is already there (in a backup of the VM being transferred, or in any other VM), then the data will not be sent.
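The sketch below illustrates this kind of deduplication-aware replication in the simplest possible terms. It is an illustrative protocol, not SimpliVity's wire format, and the replicate() function and dictionary-based remote store are assumptions for the example: the source site offers fingerprints of the blocks it wants to protect remotely, the remote site reports which ones it does not already hold, and only those blocks cross the WAN.

    import hashlib

    def replicate(source_blocks, remote_store):
        """Send only the blocks the remote site does not already hold."""
        digests = [hashlib.sha256(b).hexdigest() for b in source_blocks]
        missing = {d for d in digests if d not in remote_store}  # remote's reply
        bytes_sent = 0
        for block, digest in zip(source_blocks, digests):
            if digest in missing:
                remote_store[digest] = block   # the only data put on the wire
                bytes_sent += len(block)
                missing.discard(digest)        # each unique block is sent once
        return bytes_sent

    remote = {}                                # the remote site's chunk store
    vm_backup = [b"os block" * 512] * 100 + [b"app data" * 512] * 10
    first = replicate(vm_backup, remote)
    second = replicate(vm_backup, remote)      # nothing new, so nothing is sent
    print(f"first replication sent {first} bytes, second sent {second} bytes")

A second replication of the same VM sends nothing at all, which is the behavior described above for backups, restores, and moves across the Federation.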
SimpliVity's Functionality Elevates Data Efficiency

Additional increases in data efficiency are gained by taking full advantage of SimpliVity's comprehensive functionality. Efficiency continues to improve as administrators do more with the platform: the more you do, the more efficiency you gain.

• Backups: As you back up more VMs and data, overall efficiency increases while availability improves. A higher backup frequency effectively provides a much lower Recovery Point Objective (RPO). SimpliVity gives IT the ability to back up applications that previously could not be protected within the allocated budget.
• Cloning: The ability to perform rapid cloning with SimpliVity brings more agility to test and development teams. Data efficiency continues to improve as more clones are created.
• Retention: Retaining data longer improves the ability to restore further back in time, improving overall data protection while increasing agility and efficiency.
• Similarity: The more applications are based on the same operating system (Windows, for example), the higher the efficiency rates.
• Applications: Certain applications naturally see higher deduplication and compression ratios. For example, text-heavy applications compress very well.

Benefits of Data Efficiency with SimpliVity

SimpliVity addresses all aspects of the data problem with one single 2U building block, recognizing that data efficiency equates to improved performance, capacity savings, and, as a result, a lower TCO. SimpliVity provides the following benefits directly related to data efficiency:

• Performance: SimpliVity significantly increases the performance of all applications by eliminating IOPS that would have to be written to disk in traditional infrastructures. Within the Data Virtualization Platform, deduplication makes each storage media tier (DRAM, flash, and HDD) more efficient, because the data is processed only once and every tier of the system benefits from the resulting efficiency. SimpliVity's DVP removes the performance tradeoff that previously prevented legacy technologies from solving the data problem at inception: instead of sacrificing performance to save capacity, SimpliVity actually increases performance.
• Capacity: SimpliVity solves for the I/O Gap and therefore requires fewer HDDs than legacy systems, as writes will have already been deduplicated. The capacity benefit that SimpliVity provides is positively compounded with hyperconvergence. No longer are resources split across every separate stage of the data lifecycle (primary, secondary, backup, etc.); all resources are hyperconverged into one scalable, modular building block.
• Operational Efficiency: SimpliVity drastically simplifies IT. It starts with data efficiency (reducing capacity while increasing performance) and extends to global unified management, VM-centricity, and mobility. As a whole, SimpliVity's hyperconverged solution simplifies all aspects of IT management, producing operational efficiency that allows administrators and staff to focus their efforts on new innovation instead of maintenance. SimpliVity simplifies management across all sites, reduces the need for additional IT resources across multiple sites, and maximizes the productivity of IT staff by automating mundane tasks.
• 3x Lower TCO: Hyperconvergence combines previously disparate products and appliances into a single, scalable, modular 2U building block. SimpliVity's solution provides not only server and storage but also natively includes data protection and backup at the VM level. There is no longer a need for separate appliances for data efficiency, data performance, and data protection, producing immediate Capital Expense (CAPEX) benefits as less physical hardware is required. Perhaps more significant, though, are the Operational Expense (OPEX) benefits of hyperconvergence: the smaller physical footprint and simplified operations result in a lower overall Total Cost of Ownership (TCO), accounting for support, maintenance, bandwidth, power and cooling, and administrative expenses.
• Data Protection: SimpliVity's native VM-level backup, restore, and disaster recovery capabilities ensure data protection locally and remotely. The data protection features benefit directly from the data efficiency enabled by the DVP, as the sketch after this list illustrates:
o All backups and restores are full logical backups with no performance impact. When SimpliVity creates a new local backup, no IOPS are consumed, since the creation of the backup is a metadata update. At the same time, backups are managed as logically independent objects, allowing for unique policies without the integrity risks of traditional snapshots tied to a master or gold copy.
o All backups are deduplicated, compressed, and optimized across the global Federation. If a backup is sent to a remote site, only unique blocks are sent across the wire, saving WAN bandwidth as well as remote processing and IOPS.
o Cloning operations, even on running VMs, are instantaneous. The DVP deduplicates data once and keeps it that way forever, and this extends to test and development environments: simply right-click a VM within vCenter and press "clone", and a powered-off copy of that running VM is created immediately, without any I/O or impact to production.
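To make the "backup as a metadata update" idea concrete, here is a deliberately tiny sketch. The structures are assumptions for the example, not SimpliVity's implementation: when every VM is represented as a list of references into a shared, deduplicated chunk store, taking a backup or a clone means copying that reference list, and no chunks are read, written, or duplicated.

    # Shared, deduplicated chunk store: fingerprints -> unique data (abbreviated).
    chunk_store = {"c1": b"...", "c2": b"...", "c3": b"..."}
    vm_metadata = {"vm1": ["c1", "c2", "c3", "c2"]}  # a VM is an ordered list of refs
    backups = {}

    def backup(vm_name, backup_name):
        # Copying a short list of references is the whole backup: no chunk is
        # read, written, or duplicated, so no data IOPS are consumed.
        backups[backup_name] = list(vm_metadata[vm_name])

    def clone(vm_name, clone_name):
        # A clone works the same way and immediately shares every chunk.
        vm_metadata[clone_name] = list(vm_metadata[vm_name])

    backup("vm1", "vm1-backup-001")
    clone("vm1", "vm1-dev-clone")
    print(len(chunk_store), "unique chunks shared by",
          len(vm_metadata) + len(backups), "logical objects")

Because each backup or clone is an independent set of references, it can carry its own policies without the chained-snapshot integrity risks described above.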
Data Efficiency Customer Testimonials

Central One Federal Credit Union achieved massive data efficiency, in both capacity and performance, with SimpliVity. Central One was running a mixed workload, including a banking application, test and development, an Oracle 11g transactional database, Microsoft Exchange, Microsoft SQL Server, and Active Directory, on a non-virtualized environment of 30 disparate servers. This was costly and complex, and generated concerns about outgrowing the data center. After installing the SimpliVity OmniCube hyperconverged solution and migrating the workloads, Central One saw a dramatic improvement in performance and resource optimization, and was able to store 1.9PB of data on only 7.9TB of physical capacity.

"The OmniCube's native DR ability with built-in dedupe is a gift to clients. No other solution can do this." - Adam Winter, IT Consultant

Francis Drilling Fluids, Ltd. (FDF), a provider of midstream services within the Oil & Gas industry, deployed SimpliVity's OmniCube to relieve VM server sprawl, improve disaster recovery, and streamline its virtualized data center. With SimpliVity, FDF realized a data efficiency rate of 288:1 and saw capacity savings of over 260TB.

"In the beginning, I thought that OmniCube seemed too good to be true. It really was inconceivable that we could deploy a 2U solution and gain all the server, storage and data protection functionality, along with the data deduplication, replication and WAN optimization we needed. Scalability was a huge factor as well, as was ease of use and management all integrated from within vCenter." - Steve Schaaf, CIO, Francis Drilling Fluids, Ltd.

To read more, please view our case studies and our TechValidate SimpliVity OmniCube results.
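For context, a data efficiency ratio such as the ones quoted above is generally computed as logical data under management divided by the physical capacity it actually consumes; the precise accounting can vary, so the figures below are simply a worked example using the Central One numbers cited above.

    # Worked example using the Central One figures quoted above.
    logical_tb = 1.9 * 1000    # 1.9 PB of logical data, expressed in TB
    physical_tb = 7.9          # physical capacity actually consumed
    print(f"data efficiency ~ {logical_tb / physical_tb:.0f}:1")   # about 240:1

That works out to roughly 240:1, in the same triple-digit range as the 288:1 rate FDF reports.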
Solution Summary

The result of SimpliVity's revolutionary Data Virtualization Platform is the market's most efficient infrastructure for the modern, agile data center: a globally federated, hyperconverged IT platform that enables VM-centric global management of all VMs, data, and the underlying infrastructure. The key to SimpliVity's triple-digit data efficiency rates, achieved while simultaneously increasing performance, is the DVP and its fundamentally unique architecture.

SimpliVity puts an end to appliance sprawl with true hyperconvergence encompassing all core infrastructure functions, including the enterprise capabilities of data efficiency, performance, data protection, and global unified management. SimpliVity also improves operational efficiency with global unified management, natively integrates data protection, and lowers TCO by at least 3x. OmniCube improves performance while saving capacity. SimpliVity is the only solution proven to deliver over 100:1 data efficiency rates without any of the penalties.
© Copyright 2024