How to get the most from storage while staying within budget

CW Buyer's guide: Storage & back-up

In 2011, storage represented an average of 15% of the total IT budget. Making the wrong bets on how you architect your storage environment could cause this number to grow more than your finance chief might like. In this buyer's guide, we assess the changing storage landscape and identify how the various technologies can be implemented to maximum effect.

Contents

Derive maximum value from storage
Forrester analysts Andrew Reichman and Vanessa Alvarez identify the top trends in a rapidly changing storage market and methods for implementing relevant technologies to deliver and manage business data effectively and within budget

Cut costs with primary deduplication
Chris Evans looks at how primary storage deduplication works, what it can achieve and how its use is set to increase

How storage provision must change with virtual desktop infrastructure
Centrally run storage can suck all benefits from a VDI deployment if it is not sufficiently provisioned, writes Cliff Saran

These articles were originally published in the Computer Weekly ezine.

A whitepaper from Computer Weekly
CW Buyer's guide: Storage & back-up

Derive maximum value from storage

Forrester analysts Andrew Reichman and Vanessa Alvarez identify the top trends in a rapidly changing storage market and methods for implementing relevant technologies to deliver and manage business data effectively and within budget
In 2011, storage represented an average of 15% of the total IT budget. Making the wrong bets on how you architect your storage environment could cause this number to grow more than your finance chief might like. We assess the changing storage landscape and identify how the various technologies can be implemented to maximum effect.
Storage will no longer
be run as an island
The traditional model for an infrastructure and operations (I&O)
organisation is to have a distinct
server, storage and network team,
with different budgets and priorities – and the result is often strained
relationships and poor communication among the groups.
Because most firms don’t have effective chargeback, there is little visibility into the overall IT impact of
any cross-group strategy moves. Add
in the complexity of technical interactions between these silos, and you
get a real mess.
Change in this approach has been
sorely needed for years, and we are
starting to see it happen. We expect
2012 to be a banner year for convergence across these silos, and cooperation will bring storage out of the vacuum. Because storage is so expensive,
CIOs and CFOs are paying more attention to purchase decisions, and this
trend is pushing those purchases towards greater consistency and fit with
the wider IT strategy.
The consolidation of applications, increased use of virtual server technology, and the emergence of application-specific appliances and bundles mean that it is more viable to buy consistent solutions for stacks such as Oracle databases and applications, VMware and Microsoft applications and virtual servers, among other workloads.
Forrester’s advice: Break down
the organisational and budgetary
walls that prevent I&O people from
cooperating.
Consider aligning teams by the
major workload stacks rather than
technology components; you may see
much better communication as a result. Make storage technology decisions in concert with server, network
and application strategies, and you
will likely start to optimise around
the thing you care most about: supporting the business.
Storage to become more
specialised for big firms
For years, many firms have simply chosen the "highest common denominator" as their single tier of storage – in other words, if some data needed top-tier block storage, then in many cases this was the only flavour to be deployed.
As data volumes have grown over
the years, the penalty for such a simple environment has grown, when
much of the data does not really need
top-tier storage. Additionally, unique
requirements for specific workloads
vary significantly, so the single flavour is often not well suited to big
portions of the data being stored.
Major workload categories that demand optimisation include virtual
servers, virtual desktops, Oracle databases, Microsoft applications, files,
data warehouse/business intelligence, mainframe, archives, and
back-ups. Each of these has unique
performance and availability profiles,
and each has major applications that
need close integration to the storage
they use.
Forrester's advice: I&O professionals should be clear about which of these workloads are major consumers of data in your large storage environment and see if an optimised architecture would make more sense than a generic solution.

Once you start measuring and strategising along those lines, develop a set of scenarios about what you could buy and how you could staff along workload-optimised lines, and a strategy will emerge from there.
Cloud storage to become a
viable enterprise option
In 2010 and 2011, I&O professionals
saw a great deal of attention being
paid to multiple forms of cloud, storage included, but still, few large enterprises had jumped on board.
With more enterprise-class cloud
storage service provider options, better service level agreements (SLAs),
the emergence of cloud storage gateways, and more understanding of the
workloads that make sense, 2012 is
likely to be a big year for enterprises
moving data that matters into the
public cloud. I&O professionals will have to assess what data they can move to the cloud on a workload-by-workload basis.
There will not be a dramatic “tear
down this datacentre” moment any
time soon, but I&O professionals
will quietly shift individual data to
the cloud in situations that make
sense, while other pieces of data
will remain in a more traditional
setting. The appropriate place for
your data will depend on its performance, security and geographic access requirements, as well as integration with other applications in
your environment.
Forrester’s advice: I&O teams
should evaluate their workloads to
see if they have some that might
make sense to move now.
Develop a set of detailed requirements that would enable a move to
the cloud, then evaluate service providers to determine what is feasible.
Focus on files, archives, low-performance applications and back-up
workloads as likely cloud storage
candidates, and develop scenarios of
how they could run in cloud models currently on the market.

Make sure you think about fallback strategies in case the results are poor, so that you are insulated should your provider change its offering or go out of business.

SSD to play a larger part in enterprise storage

While application performance demands continue to increase, spinning disk drives are not getting any faster; they have reached a plateau at 15,000rpm. To fill the gap, the industry has coalesced around solid state disk (SSD) based on flash memory – the same stuff that's in your iPod (for the most part).

Flash memory is fast, keeps data even when it loses power, and recent improvements in hardware and software have increased the reliability profile to effectively meet enterprise needs. However, SSD remains far more expensive than traditional spinning disk, so it is still challenging to figure out how and where to use it. In 2012, Forrester expects to see existing and promising new suppliers showcase more mature offerings in a variety of forms, including SSD tiers within disk arrays supported in some cases by automated tiering, SSD data caches, and SSD-only storage systems.

Because SSD is fast but relatively expensive, the long-term media mix is likely to include cheap, dense drives for the bulk of data that is not particularly performance sensitive, and a small amount of SSD targeted only at the data that truly needs it. I&O professionals have another option in leveraging the performance power of SSD to enable better deduplication that could bring storage cost down, but these options are still newer to the market.

If you currently use custom performance-enhancing configurations such as "short stroking", then that data is likely to be a good candidate to get better results on SSD. If you have applications that are struggling to deliver the needed levels of performance, then SSD might be your best option to house their data.

Forrester's advice: You need to understand the performance requirements and characteristics of your workloads to make effective use of SSD. Don't overspend on SSD where traditional disk will do – carry out rigorous performance analysis to find out where the bottlenecks are, and pick the tools that will address the gaps you uncover.
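To put that trade-off in concrete terms, it helps to compare cost per gigabyte against cost per IOPS. The drive capacities, IOPS figures and prices below are illustrative assumptions, not vendor quotes; the point is the shape of the comparison rather than the exact numbers.

# Illustrative $/GB vs $/IOPS comparison (all figures are assumptions)
drives = {
    # name: (usable_gb, iops, price_usd)
    "15k rpm SAS disk": (600, 180, 400),
    "enterprise SSD": (400, 20000, 2000),
}

for name, (gb, iops, price) in drives.items():
    print(f"{name:>16}: ${price / gb:.2f}/GB  ${price / iops:.3f}/IOPS")

# Typical outcome: spinning disk wins on $/GB, SSD wins on $/IOPS,
# which is why SSD is best targeted only at data that is IOPS-bound.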
Automated tiering will
become widely adopted
I&O teams have dreamed of an easy way to put the right data on to the right tier of storage media, but a cost-effective, reliable way of doing so has remained elusive.

Tiering, information lifecycle management (ILM) and hierarchical storage management (HSM) look promising, but few firms have managed to get it right and spend less money on storage as a result.
Compellent, now owned by Dell,
was a pioneer in sub-volume, automated tiering – a method that takes
the responsibility away from the administrator to make decisions about
what should live where and has
enough granularity to address the
varied performance needs within
volumes. Almost every supplier in
the space is eagerly working on a
tool that can accomplish this goal,
and we are likely to see results in
2012, leading to increased maturity
and wider adoption.
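As a rough illustration of what sub-volume automated tiering does, the sketch below promotes the most frequently accessed extents to an SSD tier and leaves the rest on cheaper disk. It is a toy heuristic for illustration only, not Compellent's or any other supplier's actual algorithm, and the extent counts and SSD capacity are invented.

# Toy sub-volume tiering sketch: promote the busiest extents to SSD,
# leave the rest on cheap spinning disk. Real array policies are far
# more sophisticated than a simple access count.
from collections import Counter

SSD_EXTENTS = 4                # assumed SSD capacity, in extents
access_counts = Counter()      # extent id -> number of I/Os observed

def record_io(extent_id: int):
    access_counts[extent_id] += 1

def plan_placement():
    hot = {eid for eid, _ in access_counts.most_common(SSD_EXTENTS)}
    return {eid: ("ssd" if eid in hot else "sata") for eid in access_counts}

# Simulate a skewed workload: extents 0-3 are hot, 4-19 are cold
for _ in range(100):
    for eid in range(4):
        record_io(eid)
for eid in range(4, 20):
    record_io(eid)

print(plan_placement())        # extents 0-3 -> 'ssd', the rest -> 'sata'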
However, some application providers say block storage systems
don’t have enough context of data to
effectively predict performance
needs and that the applications
should do this, rather than the storage systems. They also argue that the added central processing unit (CPU) burden outweighs the benefits, or that
SSD will eventually be cheap enough
with advanced deduplication that a
tierless SSD architecture will replace
the need for tiering altogether. Suppliers such as NetApp also prefer a
caching approach.
Forrester’s advice: There is some
validity in some of these arguments,
but there is little doubt that automated tiering will play a bigger role in
enterprise storage, along with alternatives such as application-driven data
management, advanced caching, and
SSD-centric systems. ■
This is an extract from the report: “Top 10
Storage Predictions For I&O Professionals”
(Feb, 2012) by Forrester analysts Andrew
Reichman and Vanessa Alvarez, both of
whom are speaking at Forrester’s upcoming
Infrastructure & Operations EMEA Forum
2012 in Paris (19-20 June).
CW Buyer's guide: Storage & back-up

Cut costs with primary deduplication

Chris Evans looks at how primary storage deduplication works, what it can achieve and how its use is set to increase
Disk space reduction is a key consideration for many organisations that want to reduce storage costs. With this aim in mind, data deduplication has been deployed widely on secondary systems, such as for data back-up, but primary storage deduplication has yet to reach this level of adoption.

Data deduplication is the process of identifying and removing identical pieces of information in a batch of data. Compression removes redundant data to reduce the size of a file but doesn't do anything to cut the number of files it encounters. Data deduplication, meanwhile, takes a broader view, comparing files or blocks in files across a much larger data set and removing redundancies from that.

In a data deduplication hardware setting, rather than store two copies of the same data, the array retains metadata and pointers to indicate which further instances of data map to the single instance already held.

In instances such as back-up operations, where the same static data may be backed up repeatedly, deduplication can reduce physical storage consumption by ratios as high as 10-to-1 or 20-to-1 (equalling 90% and 95% savings in disk space respectively).

Clearly, the potential savings in physical storage are significant. If primary storage could be reduced by up to 90%, this would represent huge savings for organisations that deploy large numbers of storage arrays.
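The relationship between a deduplication ratio and the space saved is simple arithmetic: the saving is one minus one divided by the ratio. A short Python illustration of the figures quoted in this article (the ratios themselves are the article's examples, nothing more):

def dedupe_saving(ratio: float) -> float:
    """Return the fraction of physical capacity saved for a given
    deduplication ratio (logical data : physical data stored)."""
    return 1 - 1 / ratio

# Ratios mentioned in this article, from primary storage to back-up to VDI
for ratio in (2, 10, 20, 100):
    print(f"{ratio}-to-1 deduplication -> {dedupe_saving(ratio):.0%} less disk")
# 2-to-1 -> 50%, 10-to-1 -> 90%, 20-to-1 -> 95%, 100-to-1 -> 99%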
Unfortunately, the reality is not that straightforward. The use case for deduplicated data fits well with back-up but not always so well with primary storage.
Compared with large back-up
streams, the working data sets in primary storage are much smaller and
contain far fewer redundancies. Consequently, ratios for primary storage
deduplication can be as low as 2-to-1,
depending on the type of data the algorithm gets to work on.
Having said that, as more organisations turn towards server and desktop virtual infrastructures, the benefits of primary storage deduplication
implementations re-appear.
Virtual servers and desktops are
typically cloned from a small number
of master images and a workgroup
will often run from a relatively small
set of spreadsheets and Word documents, resulting in highly efficient
deduplication opportunities that can
bring ratios of up to 100-to-1.
The deduplication saving can even
justify the use of solid-state drives
(SSDs), where their raw cost would
have been previously unjustifiable.
Pros and cons
Of course, primary storage deduplication is no panacea for solving storage growth issues and there are some
disadvantages alongside the obvious
capacity and cost savings.
There are two key data deduplication techniques in use by suppliers
today. Identification of duplicate data
can be achieved either inline in real
time, or asynchronously at a later
time, known as post-processing.
Inline deduplication requires more
resources and can suffer from latency
issues as data is checked against
metadata before being committed to
disk or flagged as a duplicate.
Increases in CPU processing power
help to mitigate this issue and, with
efficient search algorithms, performance can actually be improved if a
large proportion of the identified data
is duplicated, as this data doesn’t
need to be written to disk and metadata can simply be updated.
Post-processing data deduplication
requires a certain amount of storage
to be used as an overhead until the
deduplication process can be executed and the duplicates removed. In
environments with high data growth
rates, this overhead starts to cut into
the potential savings.
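To make the mechanics concrete, here is a minimal, illustrative sketch of inline block deduplication: each incoming block is hashed and checked against a metadata table before anything is committed, and duplicates become pointer updates only. It is a toy model, not any supplier's implementation, and the 4KB block size is an assumption.

import hashlib

BLOCK_SIZE = 4096  # illustrative fixed block size

class InlineDedupeStore:
    """Toy inline deduplication: hash each incoming block before it is
    committed; duplicates are recorded as pointers to the existing block."""

    def __init__(self):
        self.blocks = {}   # hash digest -> stored block (one physical copy)
        self.volume = []   # logical view: ordered list of hash digests (pointers)

    def write(self, data: bytes):
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:
                self.blocks[digest] = block   # unseen block: commit it to "disk"
            # seen or not, the logical volume only records a pointer (the hash)
            self.volume.append(digest)

    def ratio(self) -> float:
        logical = len(self.volume) * BLOCK_SIZE
        physical = sum(len(b) for b in self.blocks.values())
        return logical / physical if physical else 0.0

store = InlineDedupeStore()
store.write(b"A" * BLOCK_SIZE * 8 + b"B" * BLOCK_SIZE * 2)   # highly redundant data
print(f"Deduplication ratio: {store.ratio():.1f}-to-1")       # ~5.0-to-1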
For both implementations, deduplicated data produces random I/O for read requests, which can be an issue for some storage arrays. Storage array suppliers spent many years optimising their products to make use of sequential I/O and prefetch. Deduplication can work counter to this because, over time, it pulls apart the "natural" sequence of blocks found in unreduced data, making gaps here and placing pointers there and spreading parts of a file across many spindles.

Users can deal with this issue by adding flash as a top tier for working data, which provides rapid enough access to combat the type of randomisation that is an issue for spinning disk. Some suppliers mentioned in the panel below – the SSD start-ups – have seen the boost that flash can give to primary data deduplication and designed it into their product architectures from the start. ■

Supplier implementations of deduplication technology

How have suppliers implemented deduplication technology in their primary storage systems?

• NetApp: NetApp was the first supplier to offer primary storage deduplication in its filer products five years ago, in May 2007. Originally called A-SIS (advanced single-instance storage), the feature performs post-processing deduplication on NetApp volumes. Many restrictions were imposed on volumes configured with A-SIS; as volume sizes increased, the effort required to find and eliminate duplicate blocks could have significant performance impacts. These restrictions have been eased as newer filers have been released with faster hardware. A-SIS is a free add-on feature and has been successful in driving NetApp in the virtualisation market.

• EMC: Although EMC has had deduplication in its back-up products for some time, the company's only array platform currently offering primary storage deduplication is the VNX. This capability is restricted to file-based deduplication, traced to the part of the product that was the old Celerra. EMC has talked about block-level primary storage deduplication for some time, and we expect to see that in a future release.

• Dell: In July 2010 Dell acquired Ocarina Networks. Ocarina offered a standalone deduplication appliance that sat in front of traditional storage arrays to provide inline deduplication functionality. Since the acquisition, Dell has integrated Ocarina technology into the DR4000 for disk-to-disk back-up and the DX6000G Storage Compression Node, providing deduplication functionality for object data. Dell is rumoured to be working on deploying primary storage deduplication in its Compellent products.

• Oracle and suppliers that support ZFS: As the owner of ZFS, Oracle has had the ability to use data deduplication in its storage products since 2009. The Sun ZFS Storage Appliance supports inline deduplication and compression. The deduplication feature also appears in software from suppliers that use ZFS in their storage platforms. These include Nexenta Systems, which incorporated data deduplication into NexentaStor 3.0 in 2010, and GreenBytes, a start-up specialising in SSD-based storage arrays that also makes use of ZFS for inline data deduplication.

• SSD array start-ups: SSD-based arrays are suited to coping with the impacts of deduplication, including random I/O workloads. SSD array start-ups Pure Storage, Nimbus Data Systems and SolidFire all support inline primary data deduplication as a standard feature. In fact, on most of these platforms, deduplication cannot be disabled and is integral to the product.

• Suppliers targeting virtualisation: Tintri and NexGen Storage offer arrays optimised specifically for virtualisation environments, and both utilise data deduplication. NexGen has taken a different approach from some of the other recent start-ups and implements post-processing deduplication with its Phased Data Reduction feature.

Primary storage data deduplication offers the ability to reduce storage utilisation significantly for certain use cases and has specific benefits for virtual server and desktop environments. The major storage suppliers have struggled to implement deduplication in their flagship products – NetApp is the only obvious exception – perhaps because it reduces their ability to maximise disk sales.

However, new storage start-ups, especially those that offer all- or heavily SSD-reliant arrays, have used that performance boost to leverage data deduplication as a means of justifying the much higher raw storage cost of their devices. So it looks as if primary storage deduplication is here to stay, albeit largely as a result of its incorporation into new forms of storage array.
CW Buyer's guide: Storage & back-up

Storage hardware

How storage provision must change with virtual desktop infrastructure

Centrally run storage can suck all benefits from a VDI deployment if it is not sufficiently provisioned, writes Cliff Saran
If 100 people tried to access the same piece of data on conventional PC infrastructure simultaneously, it would result in a denial of service. And this is exactly the case with virtual desktop infrastructure (VDI).
There is a growing realisation among IT professionals that desktop virtualisation has an Achilles' heel: the way conventional storage works in VDI. Virtualising hundreds, if not
thousands, of desktop computers
may make sense from a security and
manageability perspective. But each
machine has local processing, graphics processors and storage.
Server CPUs may be up to the task
of running most desktop applications
and modern VDI offers local graphics
accelerators. But storage needs to run
centrally. So if each physical PC has
120GB of local storage, a 1,000 virtual
desktop deployment needs at least
120TB of enterprise storage.
However, even this is not enough.
For a good user experience on VDI,
the infrastructure must minimise latency. It boils down to I/O operations per second (IOPS) – the number of data reads and writes to disks.
Theoretical models of usage claim a desktop PC spends 70-80% of its time performing disk reads and 10-30% of its time writing to disk. But Ruben Spruijt, CTO at IT infrastructure specialist PQR, believes these numbers are underestimates.
“In my experience a user’s PC
spends 20-40% of the time doing
reads, and 60-80% on disk writes,”
says Spruijt. And writing to disk can
be difficult in VDI.
The more IOPS virtual desktops need, the greater the cost. Consider streaming media, desktop video conferencing and any application that makes frequent disk reads and writes.
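A back-of-the-envelope sizing sketch shows why these numbers matter. The 1,000-desktop and 120GB figures come from the example above; the per-desktop IOPS figure and the 70% write fraction (taken from within Spruijt's range) are assumptions for illustration only.

# Rough VDI storage sizing sketch. Per-desktop IOPS is an illustrative
# assumption, not a figure from the article; real workloads vary widely.
DESKTOPS = 1000
LOCAL_DISK_GB = 120                   # per-desktop capacity, as in the article
STEADY_STATE_IOPS_PER_DESKTOP = 25    # assumed average
WRITE_FRACTION = 0.7                  # Spruijt: 60-80% of desktop I/O is writes

capacity_tb = DESKTOPS * LOCAL_DISK_GB / 1000
total_iops = DESKTOPS * STEADY_STATE_IOPS_PER_DESKTOP
write_iops = total_iops * WRITE_FRACTION
read_iops = total_iops - write_iops

print(f"Raw capacity needed: {capacity_tb:.0f} TB")
print(f"Steady-state load: {total_iops} IOPS "
      f"({write_iops:.0f} writes, {read_iops:.0f} reads)")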
Josh Goldstein, vice-president of marketing and product management at XtremIO, says: "Since successful selling of VDI storage requires keeping the cost of a virtual machine in line with or lower than a physical machine, storage vendors artificially lower their IOPS planning assumptions to keep costs in line.

"This is one of the reasons many VDI projects stall or fail. Storage that worked great in a 50-desktop pilot falls apart with 500 desktops."
Innovation in disk technology

Storage expert Hamish MacArthur, founder of MacArthur Stroud, says: "If you read and write every time it builds up a lot of traffic. One way manufacturers of disk controllers are tackling this problem is to hold the data. If you can hold a lot of data in the disk controller before writing to the disk, it reduces the number of writes to the disk."

A new breed of disk controllers is now tailored to virtualised environments. Some products try to sequence disk drive access to minimise the distance the disk heads need to move. Others perform data deduplication, to prevent multiple copies of data being stored on disk. This may be combined with a large cache and tiering to optimise access to frequently used data.

Today, the most talked-about move in disk technology is solid state disks (SSDs), which can be used for tier one storage to maximise IOPS. SSDs from companies such as Kingston improve VDI performance by boosting IOPS.

Graham Gordon is operations director at ISP and datacentre company Internet for Business. The firm is expanding its product portfolio to offer clients VDI. Gordon says: "SSD is still relatively expensive but it is getting cheaper. There is still some way to go before it becomes an option for companies in the mid-market."
Exploiting SSD flash niche
EMC recently revealed elements of its
upcoming product release resulting
from its acquisition of XtremIO.
It will use the start-up’s technology
to create an entirely flash-based storage array to give enormous – albeit
expensive – performance, compared
with traditional disk drives.
Goldstein claimed it could achieve unlimited IOPS and, in a short demonstration, the array reached 150,000 write and 300,000 read IOPS.
The key to the box is its ability to
scale out and link up to other XtremIO arrays. Goldstein has demonstrated eight working together as a cluster,
which had the potential to achieve
over 2.3 million IOPS.
He also showed the array creating
100 10TB volumes in 20 seconds,
configuring 1PB overall.
However, flash-based technologies
and solid state drives are too expensive to run as primary storage in enterprise environments. Instead, companies are deploying tiered storage
arrays, using SSD for immediate access to important data and cheaper
hard drives for mass storage.
Taking this a step further, in the
Ovum report, 2012 Trends to watch:
Storage, analyst Tim Stammers notes
storage vendors are planning to create flash-based caches of data physically located inside servers, to eliminate the latency introduced by SANs.
“Despite their location within
third-party servers, these caches
would be under the control of disk
arrays. EMC has been the most vocal
proponent of this concept, which is
sometimes called host-side caching.
EMC’s work in this area is called Project Lightning,” writes Stammers.
Project Lightning has now become
a product called VFCache, which
places flash memory onto a PCIe card
that plugs into the server. It allows a
copy of data to be taken immediately
at the server level, rather than before
it gets to the storage, upping performance yet again. ■
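For illustration, host-side caching amounts to keeping recently read blocks on flash inside the server so that repeat reads avoid the round trip across the SAN. The sketch below is a toy LRU read cache built on that idea; it is not VFCache's design or API, and the block size and capacity are assumptions.

from collections import OrderedDict

class HostSideReadCache:
    """Toy host-side cache: keep recently read blocks in server flash so
    repeat reads avoid a trip to the SAN. Not VFCache's actual design."""

    def __init__(self, backend_read, capacity_blocks: int = 1024):
        self.backend_read = backend_read      # function: block id -> bytes
        self.capacity = capacity_blocks
        self.cache = OrderedDict()            # block id -> bytes, in LRU order

    def read(self, block_id: int) -> bytes:
        if block_id in self.cache:            # cache hit: served from server flash
            self.cache.move_to_end(block_id)
            return self.cache[block_id]
        data = self.backend_read(block_id)    # cache miss: fetch from the array
        self.cache[block_id] = data
        if len(self.cache) > self.capacity:   # evict the least recently used block
            self.cache.popitem(last=False)
        return data

# Usage with a stand-in for the back-end array
array = HostSideReadCache(backend_read=lambda bid: b"\x00" * 4096, capacity_blocks=2)
array.read(1)
array.read(1)   # second read is served from the host-side cache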
This is an edited excerpt; the full article is available online.