The Technology Enabling 100:1 Data Efficiency
Whitepaper
SimpliVity’s Data Virtualization Platform (DVP) is the market-leading hyperconverged
infrastructure that delivers triple digit data efficiency rates. The DVP was designed from the
ground up to simplify IT by solving the data problem and dramatically improving overall data
efficiency.
The Data Problem
IOPS have become one of the most expensive resources
in the data center. Application demands on IT
infrastructure have accelerated in the age of virtualization
and the legacy infrastructure stack can’t keep up.
In the typical data center, many disparate products are
required in order to deliver critical capabilities to
enterprise applications: performance, data protection,
data efficiency and global unified management.
Why so many different products?
Enterprise applications require these critical capabilities.
As data centers expanded and data volumes grew, these
point technologies were the only options available to
keep the lights on. At first, it may have been one deduplication device for backup or it may have
been an optimization appliance for the WAN. But over time, the number of boxes started to
multiply. And most were not originally designed for virtualization applications and workloads.
Technology Refresh Cycle
Generally, each of these products is purchased from a
different vendor, each requires separate training, each is
managed from a separate management screen, each
requires a separate support and maintenance contract,
and each is bought on a different refresh cycle.
One specific example of the proliferation of disparate
devices and technologies is in data efficiency. When
introduced to the market in the mid-2000s, deduplication
was designated specifically for backup. In this use case,
optimizing for capacity is crucial, given massive
redundancy of data and the ever-increasing volume of data
to be backed up and retained.
Deduplication then spread to other isolated phases of the data lifecycle as IT organizations saw the
benefits:
• Enhanced data mobility. A fundamental principle of server virtualization is the mobility
of the VMs, but coarse-grained data structures (large block sizes, LUNs, etc.) significantly
inhibit mobility in a traditional infrastructure environment. When the data is deduplicated,
it is easier to move VMs from one data center to another, and it is easier to move data in
and out of the cloud for the same reason.
• Enhanced performance given that less actual data needs to be written to disk or read
from disk. This is amplified in application environments such as a Virtual Desktop
Infrastructure (VDI), where a “boot storm” can generate multiple GB of random reads
from disk.
• Efficient storage utilization. Required capacity can be reduced 2-3X in standard
primary storage cases based on the effective use of deduplication, compression, and
optimization.
• More efficient use of the SSD storage cache. A deduplication process that operates at
the right point in the data stream can reduce the footprint in the cache, increasing the
effective capacity and overall system-wide performance.
• Dramatic bandwidth reduction on replication between sites. Twenty years ago, the
IT organization was dedicated to a single primary data center, but today, almost all IT
teams manage multiple sites, so efficient data transfer among sites has become a
fundamental infrastructure requirement. Deduplicating data before it is sent
to a remote site makes the transfer itself more efficient and saves significant bandwidth
resources.
Despite the maturity of data efficiency over the past decade, and the great capacity and
performance benefits therein, no vendor has thus far comprehensively solved the deduplication
challenge.
As mentioned previously, deduplication was originally designed for backup, which occupies a stage
in the data lifecycle long after data has originally been created.
Intuitively, deduplicating, compressing and optimizing data at the point of inception – long before
backup, transfer, archive, etc. – would solve the performance and capacity challenges before
redundancy and wasted resources proliferate throughout the infrastructure. Legacy technologies to
date have been unable to deliver on this ideal, typically because of performance tradeoffs
– tradeoffs that IT teams are not willing to make.
Instead, legacy technologies have provided deduplication and/or compression as point products or
appliances at various stages in the lifecycle.
Some products apply deduplication only within the SSD tier, and therefore only offer limited
benefits in terms of overall efficiency. Others apply compression technology and incorrectly use
the term “deduplication”. In primary storage systems, optimizing for disk capacity is a relatively
lower priority. Hard Disk IOPS and CPU cycles are a much more expensive system resource than
HDD capacity.
As a result of the latency that deduplication imposes, many have deployed it as a “post-process,”
which severely limits other operations such as replication and backup. Most of these sub-optimal
implementations are a result of adding deduplication to existing legacy architecture, rather than
developing it as the foundation for the overall 21st Century architecture.
The various fragmented work-arounds that vendors have delivered have varying levels of value,
but fall short of solving the underlying problem; they ultimately do not deliver a truly fine-grained
and mobile data infrastructure. IT teams can be left with higher acquisition costs and even more
complexity as they manage a patchwork of incomplete data efficiency solutions amidst their other
infrastructure burdens.
Many IT organizations have thus invested in up to nine separate data efficiency products, each
designed to provide some level of efficiency (deduplication, compression, or optimization) at a
specific stage within the data lifecycle. These stages include, but are not limited to:
1. Flash cache in the server
2. DRAM and/or Flash cache or tier in the storage array
3. Disk in the storage array
4. Flash array
5. Backup appliance in the primary data center
6. Archive or secondary storage array
7. DR storage array
8. WAN optimization appliance
9. Cloud gateway appliance
Traditional Data Efficiency Technologies
In today’s data center, the primary concern is IOPS, not HDD capacity.
With traditional hard disk-based storage devices, storage performance can be hard to achieve.
Here’s why: When a user decides to save a document, the mechanical components inside the
hard drive on which that document will be stored jump into action. The write head, which is the
device responsible for saving that data, positions itself a fraction of a hair’s width above a
magnetic platter that spins between 7,200 and 15,000 revolutions per minute. Once positioned,
the write head starts to write the file to the storage. However, there may not be enough
continuous space on the platter to write the whole file at once, so the write head saves portions
of the file all over the disk. It then keeps a comprehensive index so that it knows where the
pieces of every file are stored.
The more time it takes to write the file, the longer the user has to wait for the computer to save
the file. The time between the command to write the file and the acknowledgement that the file
has been written is known as latency. Latency is one of the most serious problems in today’s
data centers, and it is exacerbated by new kinds of business applications. Many of these
applications place heavier performance demands on storage systems, driving latency up until it
becomes a detriment to ongoing business operations.
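To make the term concrete, the sketch below measures latency exactly as defined above: the elapsed time between issuing a write and receiving its durable acknowledgement. It is a minimal host-side illustration in Python; the test file path and block size are assumptions, not a prescribed benchmark.

```python
import os
import time

# Hypothetical test file; any writable path will do for illustration.
TEST_FILE = "/tmp/latency_probe.bin"
BLOCK = os.urandom(8 * 1024)  # one 8 KB block of random data

def timed_write(fd: int, data: bytes) -> float:
    """Return elapsed seconds between issuing a write and its durable acknowledgement."""
    start = time.perf_counter()
    os.write(fd, data)
    os.fsync(fd)  # wait until the device acknowledges the write as durable
    return time.perf_counter() - start

fd = os.open(TEST_FILE, os.O_WRONLY | os.O_CREAT, 0o644)
try:
    samples = [timed_write(fd, BLOCK) for _ in range(100)]
finally:
    os.close(fd)

print(f"average write latency: {sum(samples) / len(samples) * 1000:.2f} ms")
```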
To combat this growing problem, a new kind of storage has entered the market. Called flash
storage or solid state storage, this kind of storage doesn’t suffer from the latency issues that
plague traditional hard drives. However, where modern hard drives can each hold 6 terabytes (6
trillion bytes) or more per disk, solid state disks can only store a fraction of that amount (1 TB
or 1 trillion bytes, for example). Further, solid state disks remain very expensive compared to
hard disks, at least when it comes to capacity.
The data efficiency technologies developed for the appliances described in the previous section
were historically designed to address a capacity problem. Since HDDs were expensive, a
number of appliances were developed for the purpose of conserving as much capacity on HDDs
as possible by sacrificing CPU resources. But as we just discussed, the real challenge is IOPS.
Let us review the technical advantages and disadvantages of today’s prevailing data efficiency
technologies.
Compression:
Compression is the process of reducing the size of any given data element. Not all data
compresses well: already-compressed formats such as video or audio yield little additional
savings, while text compresses very well. The challenge is that there is no way to know exactly
how well data will compress without actually compressing it.
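As a minimal illustration of that point (the samples below are invented), compressing repetitive text versus random bytes with Python's standard zlib module shows how widely ratios can vary:

```python
import os
import zlib

# Illustrative samples: repetitive text vs. random bytes (a stand-in for
# already-compressed media, which has little exploitable redundancy).
text_sample = b"The quick brown fox jumps over the lazy dog. " * 2000
random_sample = os.urandom(len(text_sample))

for name, data in [("text", text_sample), ("random", random_sample)]:
    compressed = zlib.compress(data, 6)          # standard DEFLATE, level 6
    print(f"{name}: {len(data)} -> {len(compressed)} bytes "
          f"({len(data) / len(compressed):.1f}:1)")
```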
• Inline Compression:
o Resource intensive, requires a significant amount of CPU
o Time consuming process that increases latency
o Benefit: a reduction in used HDD capacity
• Post-Process Compression:
First writes data to HDD and then reads the data back from the HDD later and attempts
to compress it.
o Additional HDD IOPS are required after the initial write to read and then potentially
rewrite the data.
o CPU is required after the initial write to read the data.
o CPU is required to process the data.
o More HDD IOPS are needed to write the data back to disk.
o Benefit: Eventual HDD capacity savings.
Deduplication:
Deduplication is a specialized data efficiency technique for eliminating duplicate copies or
repeating data. It is used to improve storage utilization where unique chunks of data, or byte
patterns, are identified and stored during a process of analysis. Most systems deduplicate at
only one phase of the data lifecycle; by definition, that deduplication happens after the data has
already been created and moved at least once.
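As a conceptual sketch of block-level deduplication in general (not SimpliVity's implementation; the chunk size and hash function are assumptions chosen for illustration), unique chunks can be identified by fingerprint and stored only once:

```python
import hashlib

CHUNK_SIZE = 8 * 1024  # assumed fixed 8 KB chunks, purely for illustration

def deduplicate(data: bytes, store: dict) -> list:
    """Split data into chunks, keep one copy of each unique chunk in `store`,
    and return the ordered list of fingerprints that describes the data."""
    recipe = []
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        fingerprint = hashlib.sha256(chunk).hexdigest()
        store.setdefault(fingerprint, chunk)   # store the chunk only if never seen
        recipe.append(fingerprint)
    return recipe

store = {}
vm_image = b"A" * 32768 + b"B" * 16384 + b"A" * 32768   # toy data with redundancy
recipe = deduplicate(vm_image, store)
print(f"logical chunks: {len(recipe)}, unique chunks stored: {len(store)}")
```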
• Inline Deduplication:
o Imposes a performance penalty on all I/O as it requires CPU resources.
o Storage system: data is first written by the application to the server, then transferred
over a storage area network to the array.
o Backup system: data is first written to servers and storage before being backed up and
deduplicated by the backup appliance.
o Benefit: a reduction in used HDD capacity.
• Post-Process Deduplication:
o Data is first written to disk, and then deduplicated after the fact.
o Requires sufficient space upfront to first ingest the data before deduplicating.
o Requires additional IOPS to write data, then read the data to perform the deduplication,
and then write it again in deduplicated form.
o Requires additional WAN bandwidth to transfer the blocks over the wire before
deduplication occurs.
o Benefit: a reduction in used HDD capacity will eventually be realized.
• Read Cache Deduplication:
o Requires sacrificing CPU upfront.
o Benefit: an increase in effective cache size and in application IOPS.
Each of the data efficiency technologies described above share some of the same fundamental
flaws:
1. They require a tradeoff: whether it’s sacrificing CPU up front or IOPS on the backend,
they’re robbing Peter to pay Paul.
2. They steal application resources: within any converged environment, they all use CPU,
memory and disk that should be reserved for applications.
3. They have hidden costs: within any converged environment, more CPU resources may be
required to support the applications, which could mean more hypervisor and database
licensing fees.
4. They were designed for capacity: fundamentally, these technologies were designed to solve
a capacity problem, not an IOPS problem.
5. They were designed for different stages of the data lifecycle: each technology today is
implemented in one device at one point in time, so every time a piece of data moves to the
next stage in its lifecycle, it must be processed all over again.
All of this points in one direction: 21st century data has to be deduplicated, compressed, and
optimized at inception and maintained in that state through the entire lifecycle. When data is
deduplicated across all tiers right from the point of inception, it has significant resource-saving
ramifications downstream, and opens up the advanced functionality required for today’s virtualized
world.
SimpliVity’s OmniCube platform leverages hybrid storage right out of the box, providing
customers with the best of both worlds when it comes to storage.
The SimpliVity Solution - Hyperconvergence
SimpliVity’s solution is the revolutionary hyperconverged platform – a scalable, modular 2U
building block of x86 resources that offers all the functionality of traditional IT infrastructure in
one device. It is an all-in-one, IT infrastructure platform that includes storage, compute,
networking, hypervisor, real-time deduplication, compression, and optimization along with
powerful global unified management, data protection and disaster recovery capabilities.
SimpliVity’s hyperconverged infrastructure delivers the best of both worlds: x86 cloud
economics with critical enterprise capabilities. SimpliVity refers to this level of integration as
Hyperconvergence.
Hyperconvergence has a few critical design elements:
1. High availability: the solution is designed with no single point of failure.
2. Scale-out: new systems are seamlessly added into the network, adding linear performance
and capacity.
3. Global Unified Management: all resources are managed from the same pane of glass,
globally.
4. TCO savings: the solution provides at least 3x Total Cost of Ownership savings when
compared to the traditional infrastructure “legacy stack”.
5. Operational Efficiency: the solution ties back to SimpliVity’s vision to “Simplify IT”, which
includes not only the hardware and software, but also the operational elements from
provisioning, to backup and restore, to ongoing management and maintenance.
The Data Virtualization Platform (DVP) is SimpliVity’s foundational technology, essential to
delivering data efficiency while minimizing the impact of the I/O Gap (The I/O Gap is the
increasing divergence between the available capacity of an HDD vs. the IOPS provided. For
example, over the last decade, disk drive capacity has increased to 6TB, yet performance has
only increased ~1.5x.). The DVP enables SimpliVity to transform all IT infrastructure below the
hypervisor, hyper-converging 8-12 previously disparate devices – and their associated data
efficiency technologies – into one scalable building block. The key innovation is SimpliVity’s
ability to provide enterprise data efficiency without the tradeoffs of the legacy technology.
SimpliVity is able to provide all of the benefits of deduplication without the penalties that are
apparent in the legacy data efficiency equations.
The SimpliVity Equation is a result of its underlying architecture. The DVP includes two
components:
1. Inline Deduplication, Compression and Optimization:
SimpliVity’s DVP performs inline data deduplication, compression, and optimization on all data
at inception across all phases of the data lifecycle (primary, backup, WAN, archive, and on the
cloud), across all tiers within a system (DRAM, Flash/SSD, and HDD), all handled with fine data
granularity of just 4KB-8KB.
In order to achieve what was once previously thought to be contradictory
– deduplicating and compressing data inline without performance
penalties or resource tradeoffs– SimpliVity developed the OmniStack
Accelerator Card, a PCIe card with an FPGA, flash, and DRAM, protected
with super capacitors. Deduplication and compression are processor-intensive activities and,
with that in mind, SimpliVity’s Accelerator Card provides the processing power and reserves
the Intel CPUs to run business applications. This architecture allows data processing at
near-wire speeds, delivering enterprise-class performance and reducing
latency.
All data is deduplicated, compressed, and optimized before it ever hits a hard disk. This is
critical. Not only does this save capacity and prevent administrators from over-provisioning
hardware, it also increases performance. Why? Because SimpliVity eliminates IOPS.
HDD performance is critical when managing data and supporting business applications, and
even with the aid of DRAM and flash caches, heavy random write traffic will eventually cause it
to suffer. SimpliVity avoids this tradeoff by eliminating or compressing write operations before
they ever reach the HDDs. The Data Virtualization Platform also aggregates writes and performs
full stripe writes to disk. Because sequential writes are faster than random I/O, SimpliVity
accepts random writes and commits them to disk sequentially, boosting overall system
performance and efficiency.
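The sketch below illustrates the general idea of coalescing random writes into large sequential writes. It is a hypothetical illustration, not SimpliVity's implementation; the stripe size, buffering policy, and file name are assumptions.

```python
STRIPE_SIZE = 64 * 1024  # assumed full-stripe size, purely for illustration

class WriteCoalescer:
    """Accepts random (logical offset, data) writes and flushes them to a
    log-style backing file in large sequential bursts, tracking where each
    logical offset landed so reads could be redirected later."""

    def __init__(self, log_path: str):
        self.log = open(log_path, "wb")    # fresh sequential log file
        self.buffer = []                   # pending (logical_offset, data) pairs
        self.buffered_bytes = 0
        self.index = {}                    # logical offset -> position in the log

    def write(self, logical_offset: int, data: bytes) -> None:
        self.buffer.append((logical_offset, data))
        self.buffered_bytes += len(data)
        if self.buffered_bytes >= STRIPE_SIZE:
            self.flush()

    def flush(self) -> None:
        # One large sequential write replaces many small random ones.
        for logical_offset, data in self.buffer:
            self.index[logical_offset] = self.log.tell()
            self.log.write(data)
        self.log.flush()
        self.buffer.clear()
        self.buffered_bytes = 0

# Scattered logical offsets arrive in random order but land on disk sequentially.
wc = WriteCoalescer("/tmp/stripe_log.bin")
for offset in (1048576, 4096, 524288):
    wc.write(offset, b"x" * 32 * 1024)
wc.flush()
print(wc.index)
```

The point of the design is that many small, scattered writes reach the disk as a single large sequential burst, which an HDD can service far more efficiently.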
While deduplication is the fundamental core, the Data Virtualization Platform further enhances
CAPEX and OPEX savings by delivering remarkable efficiencies through operating system and
virtualization-aware optimizations. The optimizations within SimpliVity’s Data Virtualization Platform
deliver similar effects to deduplication in a different way—they identify data that need not be
copied, or replicated, and take data-specific actions to improve the overall efficiency of the system.
Given that SimpliVity’s hyperconverged infrastructure today is optimized for the VMware
environment, most such optimizations stem from awareness of VMware specific content or
commands. For example, .vswp files, though important to the functionality of each individual VM,
do not need to be backed up locally or replicated across sites. Thus, when preparing to back up or
replicate a given VM from one site to another, the Data Virtualization Platform recognizes the .vswp
file associated with a VM, and eliminates that data from the transfer - saving time, bandwidth and
capacity.
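A purely illustrative sketch of that kind of optimization (the file names and the single filter rule are assumptions, not SimpliVity's actual logic) might simply drop .vswp files from a transfer manifest:

```python
# Hypothetical transfer manifest for one VM's files (names are illustrative).
vm_files = [
    "vm01/vm01.vmx",
    "vm01/vm01.vmdk",
    "vm01/vm01-flat.vmdk",
    "vm01/vm01.vswp",      # VM swap file: needed locally, recreated on power-on
]

def transfer_manifest(files):
    """Drop .vswp files, which add no value to a backup or remote replica."""
    return [f for f in files if not f.endswith(".vswp")]

print(transfer_manifest(vm_files))
```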
Another important optimization is SimpliVity’s ability to serialize or “sequentialize” random I/O into
sequential I/O. Sequential writes are much more efficient – and therefore faster – than random
writes on an HDD. SimpliVity’s optimization makes the HDD even more efficient in terms of IOPS
consumption. Other optimizations are similar in nature— leveraging the Data Virtualization
Platform’s ability to find and make real-time decisions on common data types within a VMware
environment.
2. Global Unified Management:
The second critical element of the DVP is global unified
management – an intelligent network of collaborative systems
that provides massive scale-out capabilities and VM-centric
management through a single unified interface for the entire
global infrastructure, the Federation. A key differentiator for
SimpliVity is that the Graphical User Interface (GUI) is fully
integrated with VMware vCenter as a plug-in. A single
administrator can manage all aspects of the global SimpliVity
federation from within vCenter, a tool that all VMware
administrators already know and use regularly.
SimpliVity’s management is at the VM-level, globally. Gone
are the days of managing LUNs and Volumes. Instead, an
administrator manages, moves, and protects VMs with the
click of a button within the vCenter plug-in. Moreover,
SimpliVity’s DVP was designed from creation to allow data to
be mobile and remain mobile throughout its lifecycle within the Global Federation.
Global unified management combined with SimpliVity’s data efficiency equation creates a powerful
solution for data efficiency. Not only can SimpliVity provide unique deduplication, compression,
and optimization in one site, it can do it globally. When SimpliVity backs up, restores, or moves
a VM to a remote site, it will never transfer duplicate data; it will only transfer the data that the
remote site needs. If the data is already there (in a backup of the VM being transferred or any
other VM), then the data will not be sent.
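As a conceptual sketch of this style of deduplicated replication (an illustration under assumed data structures, not SimpliVity's protocol), the source sends only the chunks whose fingerprints the remote site does not already hold:

```python
import hashlib

def fingerprint(chunk: bytes) -> str:
    return hashlib.sha256(chunk).hexdigest()

def replicate(chunks, remote_store) -> int:
    """Replicate a backup described by `chunks` to a remote site that already
    holds `remote_store`; transfer only chunks the remote is missing.
    Returns the number of bytes that actually cross the wire."""
    sent = 0
    for chunk in chunks:
        fp = fingerprint(chunk)
        if fp not in remote_store:     # the remote reports this chunk as missing
            remote_store[fp] = chunk   # stands in for the WAN transfer
            sent += len(chunk)
    return sent

# Toy example: the remote already holds most chunks from earlier backups.
base = [b"A" * 8192, b"B" * 8192, b"C" * 8192]
remote = {fingerprint(c): c for c in base}
new_backup = base + [b"D" * 8192]                   # one new chunk since last backup
print(replicate(new_backup, remote), "bytes sent")  # only the new chunk is sent
```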
SimpliVity’s Functionality Elevates Data Efficiency
Additional increases in data efficiency are gained by taking full advantage of SimpliVity’s
comprehensive functionality. Efficiency will continue to improve as administrators do more with
the platform – the more you do, the more efficiency you gain.
• Backups: As you back up more VMs and data, the overall efficiency increases while
improving availability. A high frequency of backups effectively provides a much lower
Recovery Point Objective (RPO). SimpliVity gives IT the ability to back up applications that
previously could not be protected within the allocated budget.
• Cloning: The ability to perform rapid cloning with SimpliVity brings more agility to test &
development teams. Data efficiency continues to improve as more clones are created.
• Retention: When you retain data longer, you will improve your ability to restore further back
in time, improving overall data protection, increasing agility and increasing efficiency.
• Similarity: The more applications are based on the same operating system – Windows,
for example – the higher the efficiency rates will be.
• Applications: Certain applications will naturally see higher deduplication and compression
ratios. For example, any text-heavy applications will compress well.
Benefits of Data Efficiency with SimpliVity
In order to address all aspects of the data problem with one single 2U building block, SimpliVity
recognizes that addressing data efficiency equates to improved performance, capacity savings,
and as a result, a lower TCO. SimpliVity provides the following benefits directly related to data
efficiency:
• Performance: SimpliVity significantly increases performance of all applications by
eliminating IOPS that would have to be written to disk in traditional infrastructures. Within
the Data Virtualization Platform, deduplication makes each storage media tier more efficient
(DRAM, Flash, and HDD) because the data is processed only once and every tier of the
system benefits from more efficient data. SimpliVity’s DVP removes the performance
tradeoff that previously prevented legacy technologies from solving the data problem at
inception. Now, instead of sacrificing performance to save capacity, SimpliVity actually
increases performance.
• Capacity: SimpliVity solves for the I/O gap and therefore requires fewer HDDs in the system
than legacy systems, as writes will have already been deduplicated. The capacity benefit
that SimpliVity provides is positively compounded with hyperconvergence. No longer are
resources split across every separate stage of the data lifecycle – primary,
secondary, backup, etc. Now, all resources are hyperconverged into one scalable, modular
building block.
• Operational Efficiency: SimpliVity drastically simplifies IT. It starts with data efficiency
(reducing capacity while increasing performance) but extends with Global Unified
Management and VM-centricity and mobility. As a whole, SimpliVity’s hyperconverged
solution simplifies all aspects of IT management, producing operational efficiency that allows
administrators and staff to focus their efforts on new innovation instead of maintenance.
SimpliVity simplifies management across all sites, reduces the need for additional IT
resources across multiple sites, and maximizes productivity of IT staff by automating
mundane tasks.
• 3x Lower TCO: Hyperconvergence combines previously disparate products and
applications into a single, scalable, modular 2U building block. SimpliVity’s solution provides
not only server and storage but also natively includes data protection and backup at the
VM-level. There is no longer a need for separate appliances for data efficiency, data
performance, and data protection, producing immediate Capital Expense (CAPEX) benefits
as less physical hardware is required. Perhaps more significant, though, are the Operational
Expense (OPEX) benefits of hyperconvergence. The smaller physical footprint and
simplified operations result in a lower overall Total Cost of Ownership (TCO), accounting for
support, maintenance, bandwidth, power and cooling, and administrative expenses.
• Data Protection: SimpliVity’s native VM-level backup, restore and disaster recovery
capabilities ensure data protection locally and remotely. The data protection features benefit
from the data efficiency enabled by the DVP:
o All backups and restores are full logical backups with no performance impact. When
SimpliVity creates a new local backup, no IOPS are consumed since the creation of the
backup is a metadata update (see the sketch after this list). At the same time, backups are
managed as logically independent objects, allowing for unique policies without the integrity
risks of traditional snaps tied to a master or gold copy.
o All backups are deduplicated, compressed and optimized across the global federation.
If a backup is sent to a remote site, only unique blocks are sent across the wire, saving
on WAN bandwidth and remote processing / IOPS.
o Cloning operations – on running VMs! – are instantaneous. The DVP is once and
forever, which extends to test and development environments. Simply right-click a VM
within vCenter and press “clone”, and a powered-off copy of that running VM will be
created immediately, without any I/O or impact to production.
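As a conceptual sketch of the metadata-update idea referenced in the first point above (an illustration under assumed data structures, not SimpliVity's implementation): if a VM is already described by an ordered list of chunk fingerprints, creating a backup amounts to copying that list, with no data blocks read or rewritten.

```python
import copy
import time

# Assumed in-memory catalog: each VM maps to the ordered chunk fingerprints that
# describe its current state (the chunks themselves live elsewhere, deduplicated).
catalog = {"vm01": ["a3f1", "9bc0", "77de"]}   # illustrative fingerprints
backups = {}

def create_backup(vm_name: str) -> str:
    """A 'backup' here is just a copy of the fingerprint list plus a timestamp -
    a metadata update, with no chunk data read or written."""
    backup_id = f"{vm_name}-{int(time.time())}"
    backups[backup_id] = {
        "vm": vm_name,
        "recipe": copy.copy(catalog[vm_name]),   # snapshot of the metadata only
        "created": time.time(),
    }
    return backup_id

print(create_backup("vm01"))
```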
Data Efficiency Customer Testimonials:
Central One Federal Credit Union achieved massive data efficiency,
in both capacity and performance, with the use of SimpliVity. Central
One was running a mixed workload including a banking application, test and development,
Oracle 11g transactional database, Microsoft Exchange, Microsoft SQL Server and Active
Directory on a non-virtualized environment of 30 disparate servers. This was costly, complex
and generated concerns about outgrowing the data center. After installing the SimpliVity OmniCube
hyperconverged solution and migrating the workloads, Central One saw a dramatic
improvement in performance and resource optimization, and was able to store 1.9PB of data on
only 7.9TB of physical capacity.
“The OmniCube’s native DR ability with built-in dedupe is a gift to clients. No other
solution can do this.” – Adam Winter, IT Consultant
Francis Drilling Fluids, Ltd. (FDF), a provider of midstream services within the
Oil & Gas industry, deployed SimpliVity’s OmniCube to relieve VM server
sprawl, improve disaster recovery and streamline its virtualized data center.
With SimpliVity, FDF realized a data efficiency rate of 288:1 and saw a
capacity savings of over 260TB.
“In the beginning, I thought that OmniCube seemed too good to be true. It really was
inconceivable that we could deploy a 2U solution and gain all the server, storage and
data protection functionality, along with the data deduplication, replication and WAN
optimization we needed. Scalability was a huge factor as well, as was ease of use and
management all integrated from within vCenter.” – Steve Schaaf, CIO, Francis Drilling
Fluids, Ltd.
To read more, please view our case studies and our TechValidate SimpliVity OmniCube results.
Solution Summary
The result of SimpliVity’s revolutionary Data Virtualization Platform is the market’s most
efficient infrastructure for the modern, agile data center – a globally federated hyperconverged
IT platform that enables VM-centric global management of all VMs, data, and the underlying
infrastructure. The key to SimpliVity's triple digit data efficiency rates - while simultaneously
increasing performance - is the DVP and its fundamentally unique architecture. SimpliVity puts
an end to appliance sprawl with true hyperconvergence encompassing all main
infrastructure functions, including enterprise capabilities of data efficiency, performance,
protection, and global unified management. SimpliVity also improves operational efficiency with
global unified management, natively integrates data protection, and lowers TCO by at least 3x
with true hyperconvergence. OmniCube improves performance while saving on capacity.
SimpliVity is the only solution that is proven to deliver over 100:1 data efficiency rates without
any of the penalties.