The Why And How Of SSD Over Provisioning White Paper WP004

October 2012
Corporate Headquarters: 39672 Eureka Dr., Newark, CA 94560, USA ♦ Tel:(510) 623-1231 ♦ Fax:(510) 623-1434 ♦ E-mail: [email protected]
Flash Design Center: 2 Robbins Road, Westford, MA 01886, USA ♦ Tel:(978) 303-8500 ♦ Fax:(978) 303-8757
Flash Design Center: 2600 W. Geronimo, Chandler, AZ 85244, USA ♦ Tel:(480) 792-8900 ♦ Fax:(480) 792-8901
Asia: Plot 18, Lrg Jelawat 4, Kawasan Perindustrian Seberang Jaya 13700, Prai, Penang, Malaysia ♦ Tel:+604-3992909 ♦ Fax:+604-3992903
Table of Contents
1 Overview
2 Flash Management Algorithms
3 The Impact of Over Provisioning
4 Write Amplification
5 Optimus SAS SSD family
1 Overview
Solid State Disks (SSDs) reserve a portion of their total flash address space for “Over Provisioning” (OP):
physical memory that is held back by the SSD and is not part of the device’s logical address space. The level
of OP affects both write performance and endurance (operational lifetime); a higher OP improves both, at the
cost of logical capacity. The differences in performance and endurance that result from changes in OP are a
“natural” consequence of the amount of reserved physical address space. There are no algorithmic changes;
the drive simply runs more “efficiently” when the OP is higher.
The HDD equivalent to this is called “short stroking” and the implications of the change are identical. The exact
same device running the exact same firmware simply accesses a larger or smaller range of the physical address
space.
The benefit for a storage OEM is that qualifying an SSD with one OP configuration is in essence the same as
qualifying all OP configurations. The firmware and the hardware of the drive remain the same; the only
difference will be seen in its performance, logical capacity and endurance specifications. Separate qualification
processes for different levels of Over Provisioning within the same product family are not required.
SMART Storage Systems offers the Optimus family of products in three specific levels of OP, allowing the
Optimus SSDs to address workloads ranging from 10 “Drive Writes Per Day” (DWPD) to 50 DWPD
with a level of performance and “Total Cost of Ownership” (TCO) that is better than that of any comparable
competing product.
2 Flash Management Algorithms
When an SSD is in a FOB (Fresh-Out-of-Box) state¹, almost the entire available flash memory space is in an
unallocated/free pool². As host writes come in, the drive allocates enough flash memory from this pool to hold
the host data, writes the data to those physical locations, and records a logical-block-to-physical-address entry
in a mapping table. As new writes come in for previously unwritten logical blocks, more memory is allocated
from the free pool to store the new data. This process continues until the entire logical capacity of the drive
has been written once.
The order and addresses of user data sent to the drive do not necessarily correlate with the physical locations
of the data in the flash. The physical location used to store data for specific logical addresses is more a function
of when the command was received than what the targeted logical block address was. Regardless of a host
command’s logical block addresses, the associated data gets written to the next-available flash block in the
free pool. The process is straightforward as long as new writes always go to logical addresses that
haven’t been written before.
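The allocation behavior described above can be illustrated with a minimal sketch. It is not the drive's firmware: the class name `SimpleFTL`, the page-granular free pool, and the dictionary used as the mapping table are all assumptions made for illustration. It only shows that physical placement follows arrival order rather than the logical address.

```python
from collections import deque


class SimpleFTL:
    """Toy logical-to-physical mapper (hypothetical, for illustration only)."""

    def __init__(self, physical_pages: int):
        # In the FOB state every physical page sits in the free pool.
        self.free_pool = deque(range(physical_pages))
        self.l2p = {}  # logical block address -> physical page

    def write_new(self, lba: int) -> int:
        """Handle a host write to an LBA that has not been written before."""
        if lba in self.l2p:
            raise ValueError("overwrites are not covered by this sketch")
        physical = self.free_pool.popleft()  # next-available location
        self.l2p[lba] = physical
        return physical


ftl = SimpleFTL(physical_pages=8)
# Placement follows arrival order, not the logical address.
placements = [(lba, ftl.write_new(lba)) for lba in (5, 0, 3)]
print(placements)  # [(5, 0), (0, 1), (3, 2)]
```

Logical blocks 5, 0 and 3 land in physical pages 0, 1 and 2 simply because that is the order in which the writes arrived.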
Things get more complicated when newly received write commands overlap previously written locations. Flash
is not direct-rewritable memory. Data is stored in Blocks consisting of Pages that contain the User Data. Flash is
written a Page at a time, but must be erased a Block at a time. So once written, the entire Block containing the
memory location must be explicitly erased before that location can be written again. This means that write
commands that overlap data that has already been written require a Read/Modify/Write cycle in order to be
completed. These Read/Modify/Write cycles are extremely costly from a performance and endurance
standpoint. To minimize them, the SSD keeps a fixed amount of physical flash memory in reserve, called the
Over Provisioning (OP) space. This memory is not part of the user-addressable logical space (the physical
capacity of the drive is greater than the logical capacity). Where an HDD can just overwrite previously written
areas of the media to update those locations, an SSD puts newly received data into newly allocated locations
from the pool of free flash. Furthermore, depending on alignment and transfer size, the drive may also need to
read data from old adjacent locations and combine it into a full page to write to the new location. When this
happens, old flash blocks that now contain the “stale” data are marked as invalid and returned to the free pool
(where they are erased so that they are ready to be used again).
¹ FOB state can also be achieved through a FORMAT function.
² Some overhead is required for drive operation and is excluded from the “unallocated” space.
It is worth mentioning that logical block addresses that have never been written don’t actually “exist”. That is,
there is no physical location containing data assigned to that logical block location. As a result, read
performance on commands accessing these addresses will be poor. There is no flash location to read, and the
drive has to generate the data “manually”. Protection Information (PI) can add additional overhead, slowing
read performance further. For this reason, it is critically important to write all addresses at least once prior to
reading in order to maximize read performance.
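The overwrite and unwritten-read behavior described in this section can be sketched as follows. This is an illustrative model under stated assumptions, not the Optimus implementation: the class `OverwriteFTL` and the 4096-byte `PAGE_SIZE` are hypothetical, each page is treated as its own erase unit for brevity (real flash erases whole Blocks), and returning zeros for a never-written address is an assumption about what "generated" data looks like.

```python
from collections import deque

PAGE_SIZE = 4096  # bytes per page; an illustrative value


class OverwriteFTL:
    """Toy out-of-place update model (hypothetical, for illustration only)."""

    def __init__(self, physical_pages: int):
        self.flash = {}                        # physical page -> stored bytes
        self.free_pool = deque(range(physical_pages))
        self.l2p = {}                          # LBA -> physical page
        self.invalid = set()                   # stale pages awaiting erase

    def write(self, lba: int, data: bytes) -> None:
        new_page = self.free_pool.popleft()    # never rewrite in place
        self.flash[new_page] = data
        old_page = self.l2p.get(lba)
        if old_page is not None:
            self.invalid.add(old_page)         # the old copy is now stale
        self.l2p[lba] = new_page

    def read(self, lba: int) -> bytes:
        page = self.l2p.get(lba)
        if page is None:
            # Never-written LBA: there is no flash location to read, so
            # the data is synthesized (zeros here, as an assumption).
            return bytes(PAGE_SIZE)
        return self.flash[page]

    def erase_invalid(self) -> None:
        """Erase stale pages and return them to the free pool."""
        for page in sorted(self.invalid):
            del self.flash[page]
            self.free_pool.append(page)
        self.invalid.clear()


ftl = OverwriteFTL(physical_pages=4)
ftl.write(7, b"v1")
ftl.write(7, b"v2")          # overwrite goes to a fresh page
print(ftl.read(7))           # b'v2'
print(sorted(ftl.invalid))   # [0]
```

The second write to logical block 7 consumes a new page and leaves page 0 stale until it is erased and returned to the pool.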
3 The Impact of Over Provisioning
Like any system that dynamically allocates and de-allocates resources from a shared pool, the larger the size of
the pool the higher the operating efficiency and the better the performance. Since the pool starts off
completely full of erased flash blocks, new allocations occur quickly, with little-to-no overhead. The allocation
routine simply supplies the next available physical address range from the pool. The resulting drive
performance is high, but not representative of its sustainable performance. In typical operating environments,
the capacity of the drive is quickly written and the initial performance level falls to a lower steady-state level.
“Preconditioning” (writing to the drive until it reaches steady state before measurements are taken) is designed
to address this issue during performance testing.
As the pool is used, it becomes “dirty”. The ongoing process of allocation and de-allocation results in
fragmentation of the large contiguous unallocated regions in the physical flash free space. This can cause the
allocation routine to have to work harder to find a region (or regions) sufficient to meet each new request,
slowing performance. So the allocation routine must also periodically “garbage collect”, moving content
around to pack used regions together in order to maximize the extent of the free regions in the pool. This can
also cause slower performance.
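As a rough illustration of why garbage collection costs performance, the sketch below uses a greedy victim-selection policy: reclaim the block with the fewest valid pages, since it needs the least data copied before it can be erased. The policy, the function names, and the 4-page block size are assumptions; the source does not say which policy the drive uses. The sketch only counts the pages that would have to be relocated and does not model where they go.

```python
PAGES_PER_BLOCK = 4  # illustrative; real blocks hold many more pages


def pick_victim(blocks: dict[int, list[bool]]) -> int:
    """Return the id of the block with the fewest valid pages (greedy policy)."""
    return min(blocks, key=lambda block_id: sum(blocks[block_id]))


def collect(blocks: dict[int, list[bool]]) -> tuple[int, int]:
    """Reclaim one block; return (victim id, valid pages that must be copied)."""
    victim = pick_victim(blocks)
    relocated = sum(blocks[victim])              # extra internal writes
    blocks[victim] = [False] * PAGES_PER_BLOCK   # erased: nothing valid remains
    return victim, relocated


# True marks a page that still holds valid data.
blocks = {
    0: [True, True, True, False],
    1: [True, False, False, False],
    2: [True, True, False, False],
}
print(collect(blocks))  # (1, 1)
```

Every relocated page is a flash write that the host never requested, which is the overhead that a larger free pool helps to reduce.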
Over provisioning is generally defined as follows:
$$\mathrm{OP\ (Over\ Provisioning)} = \frac{\text{Physical Capacity}}{\text{Logical Capacity}} - 1$$
A higher percentage of overprovisioning means that there is a higher probability of finding an available block
for each new write. It also means that the SSD can perform garbage collection less “aggressively”, reducing its
impact on latency and improving the overall performance of the drive.
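The definition above translates directly into a small helper. The function name is hypothetical, and the example inputs assume that the nominal capacities quoted in Section 5 (512GB of raw flash exposed as 400GB, 300GB or 200GB) are the right values to plug in.

```python
def over_provisioning(physical_gb: float, logical_gb: float) -> float:
    """OP = physical capacity / logical capacity - 1, returned as a fraction."""
    if logical_gb <= 0 or physical_gb < logical_gb:
        raise ValueError("expected 0 < logical capacity <= physical capacity")
    return physical_gb / logical_gb - 1


# 512 GB of raw flash exposed at three nominal logical capacities.
for logical in (400, 300, 200):
    print(f"{logical} GB -> OP = {over_provisioning(512, logical):.0%}")
# 400 GB -> OP = 28%
# 300 GB -> OP = 71%
# 200 GB -> OP = 156%
```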
As discussed in Section 2, when the drive starts out in a FOB state all the flash blocks are in the free
unallocated pool. In this condition, essentially all of the flash behaves as over provisioning, which significantly
benefits write performance. As writes are completed, capacity is consumed and the effective OP decreases,
eventually reaching the drive’s specified OP level once the full logical capacity has been written. The specified
OP determines the maximum logical capacity of the drive, but the logical capacity actually consumed
determines how much OP is effectively available. The same physical drive could be reduced in logical size by
any arbitrary additional amount, and the logical space “surrendered” becomes part of the OP. By using less of
the total logical address space (by reducing the logical block address range accessed by the host), the OP is
increased, which results in higher performance and increased endurance.
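Inverting the OP definition gives the logical capacity that corresponds to a target OP, and therefore how much logical space would have to be surrendered. The sketch below is illustrative only; the function names and the example figures (512GB raw, 400GB logical, a 100% target) are assumptions, not values from a product specification.

```python
def logical_capacity_for(physical_gb: float, target_op: float) -> float:
    """Largest logical capacity that still leaves target_op reserved."""
    if target_op < 0:
        raise ValueError("target_op must be non-negative")
    return physical_gb / (1 + target_op)


def surrendered(physical_gb: float, current_logical_gb: float,
                target_op: float) -> float:
    """Logical space the host must stop using to reach target_op."""
    target_logical = logical_capacity_for(physical_gb, target_op)
    return max(0.0, current_logical_gb - target_logical)


print(logical_capacity_for(512, 1.0))   # 256.0
print(surrendered(512, 400, 1.0))       # 144.0
```

Under these assumptions, a host that confines itself to 256GB of a 400GB drive built on 512GB of flash would be running at an effective OP of 100%.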
4 Write Amplification
The performance of an SSD is in large part a function of how efficiently it manages the OP pool. Unfortunately,
the allocation and de-allocation routines, garbage collection, checkpoint data, and the Read/Modify/Write
operations all result in more data being written to the flash than is actually received from the host. Write
Amplification (WA) is defined as the ratio between these two, as shown below.
$$\mathrm{Write\ Amplification} = \frac{\text{Amount of Data Written to Flash}}{\text{Amount of Data Written by Host}}$$
Because WA represents overhead and extra writes to flash, the lower the Write Amplification³, the higher the
performance of the SSD and the longer the drive will last. The reverse is true as well: a high Write Amplification
results in reduced performance and accelerated drive wear out. (Optimus includes a performance monitoring
parameter that can be read by the host to determine current WA. This parameter can be configured to
generate a S.M.A.R.T. warning if WA increases above the chosen threshold.)
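The WA ratio and a threshold check of the kind described above can be sketched as follows. This does not show how the Optimus monitoring parameter is actually read or configured, which the source does not describe; the function names and the counter values are hypothetical.

```python
def write_amplification(flash_bytes: int, host_bytes: int) -> float:
    """WA = data written to flash / data written by the host."""
    if host_bytes <= 0:
        raise ValueError("host_bytes must be positive")
    return flash_bytes / host_bytes


def wa_warning(flash_bytes: int, host_bytes: int, threshold: float) -> bool:
    """True when the measured WA exceeds the chosen threshold."""
    return write_amplification(flash_bytes, host_bytes) > threshold


# Hypothetical counter values sampled over some interval.
host_written = 1_000_000
flash_written = 2_730_000
print(write_amplification(flash_written, host_written))        # 2.73
print(wa_warning(flash_written, host_written, threshold=3.0))  # False
```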
The OP has a direct effect on the Write Amplification, as it allows the flash management algorithms to run
more efficiently. Figure 1 below shows the correlation between Write Amplification and over provisioning.
Figure 1: Write Amplification vs Over Provisioning
[Plot of Write Amplification (vertical axis, 1 to 10) against Overprovisioning (%) (horizontal axis, 0% to 160%).
Labeled points: OP=7%, WA=8.60; OP=28%, WA=2.73; OP=156%, WA=1.22.]
³ Unless the SSD controller deploys a compression scheme, write amplification will always be larger than 1.
As can be seen from the graph above, higher OP has a significant impact on WA. A 7% OP drive has an average
WA of 8.6. Every full write of the drive generates 8.6 times as many internal writes to Flash. A 156% OP drive
has a WA of only 1.22. If the same drive type is subjected to the same workload at these two OP levels,
the higher-OP drive would last roughly 7x as long.
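The 7x figure follows from the two WA values quoted above, assuming the same physical flash, the same host workload, and therefore a fixed budget of flash writes before wear out:

```latex
\frac{\text{Lifetime}_{\mathrm{OP}=156\%}}{\text{Lifetime}_{\mathrm{OP}=7\%}}
  = \frac{\mathrm{WA}_{\mathrm{OP}=7\%}}{\mathrm{WA}_{\mathrm{OP}=156\%}}
  = \frac{8.60}{1.22} \approx 7.05
```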
Drive Writes Per Day (DWPD) describes how many times per day the full logical capacity of the SSD can be
written over its rated lifetime before reaching wear out. It is important to note that this rating accounts for the
extra writes imposed by WA as well as the user data, but that it is independent of the physical capacity of the drive.
$$\mathrm{Drive\ Writes\ Per\ Day\ (DWPD)} = \frac{\text{Endurance} \times (1 + \mathrm{OP})}{\text{Days Per Life} \times \mathrm{WA}}$$
As a result of this relationship, DWPD can be increased significantly, simply by increasing OP!
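A short sketch of the DWPD relationship is given below. The paper does not define the units of "Endurance" or "Days Per Life"; the sketch assumes they mean rated program/erase cycles and the rated lifetime in days, and the 30,000-cycle and 1825-day inputs are hypothetical. The OP and WA pairs are the two end points from Figure 1. Because endurance and lifetime cancel in the ratio, the comparison between the two OP levels does not depend on those assumed values.

```python
def dwpd(endurance_cycles: float, op: float,
         days_per_life: float, wa: float) -> float:
    """DWPD = Endurance * (1 + OP) / (Days Per Life * WA)."""
    if days_per_life <= 0 or wa <= 0:
        raise ValueError("days_per_life and wa must be positive")
    return endurance_cycles * (1 + op) / (days_per_life * wa)


# Hypothetical inputs: 30,000 P/E cycles over a 5-year (1825-day) life.
low = dwpd(30_000, op=0.07, days_per_life=1825, wa=8.60)
high = dwpd(30_000, op=1.56, days_per_life=1825, wa=1.22)
ratio = high / low
print(f"{ratio:.1f}x")  # 16.9x
```

The DWPD gain is larger than the lifetime gain alone because a higher OP both lowers WA and shrinks the logical capacity that counts as one "drive write".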
5 Optimus SAS SSD family
SMART Storage Systems Optimus, Optimus Ultra and Optimus Ultra+ SAS SSDs take advantage of the effects of
Write Amplification and Over Provisioning to deliver configurations ranging from 10 DWPD to 50 DWPD. The
only difference between these models (besides the Label, Inquiry string/VPD identifying the type of drive, and
raw capacity) is the logical address space they report to the host system.
For example, an Optimus 400GB SSD has a physical flash capacity of 512GB (and a logical capacity of ~374GB).
The “missing” 138GB is the 28% OP, which results in an endurance spec of 10 DWPD. The same 512GB raw
capacity space can also be configured into an Optimus Ultra 300GB, effectively over provisioning the drive by
an additional 100GB for a total of 71% OP space. This configuration supports 25 DWPD. Finally, by increasing the
over provisioning even further to 156%, the Optimus Ultra+ provides a capacity of 200GB and endurance
capability of 50 DWPD.
These configurations are set by limiting the maximum LBA the drive reports it will accept (“MaxLBA”).
This is accomplished via a single data-value change, the default setting of a field in a SAS Mode Page⁴.
SMART Storage Systems makes this change for customers when they order the specific model types, although
it is possible for the customer to make this change in their configuration process or on their host systems and
not be restricted to the three configurations offered by SMART Storage Systems.
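To illustrate what a MaxLBA limit means in practice, the sketch below converts a desired logical capacity into the highest addressable LBA and checks whether a transfer stays within it. It does not show the Mode Page field itself, which the source does not detail. The 512-byte logical block size, the function names, and the use of a decimal 200GB capacity are assumptions.

```python
SECTOR_BYTES = 512  # assumed logical block size; some drives use 4096


def max_lba_for_capacity(capacity_bytes: int,
                         sector_bytes: int = SECTOR_BYTES) -> int:
    """Highest addressable LBA for a given logical capacity."""
    if capacity_bytes < sector_bytes:
        raise ValueError("capacity must hold at least one sector")
    return capacity_bytes // sector_bytes - 1


def within_limit(lba: int, block_count: int, max_lba: int) -> bool:
    """Host-side guard: does this transfer stay at or below max_lba?"""
    return lba >= 0 and block_count > 0 and lba + block_count - 1 <= max_lba


limit = max_lba_for_capacity(200 * 10**9)
print(limit)                          # 390624999
print(within_limit(limit, 1, limit))  # True
print(within_limit(limit, 2, limit))  # False
```

A host applying a guard like `within_limit` to every command would get the same effective OP as a drive configured with that MaxLBA.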
This architecture allows our customers to qualify one model from the Optimus family, and then purchase one or
all of the models to use in a variety of different workload environments. The qualification burden is
significantly reduced because the different models do not run different FW, and share 100% identical HW. The
only difference is the maximum LBA range they are configured to accept.
⁴ The host can achieve the same effect by just not accessing any logical block addresses above the desired maximum.
Disclaimer:
No part of this document may be copied or reproduced in any form or by any means, or transferred to any third party,
without the prior written consent of an authorized representative of SMART Storage Systems (“SMART”). The information
in this document is subject to change without notice. SMART assumes no responsibility for any errors or omissions that
may appear in this document, and disclaims responsibility for any consequences resulting from the use of the information
set forth herein. SMART makes no commitments to update or to keep current information contained in this document.
The products listed in this document are not suitable for use in applications such as, but not limited to, aircraft control
systems, aerospace equipment, submarine cables, nuclear reactor control systems and life support systems. Moreover,
SMART does not recommend or approve the use of any of its products in life support devices or systems or in any
application where failure could result in injury or death. If a customer wishes to use SMART products in applications not
intended by SMART, said customer must contact an authorized SMART representative to determine SMART's willingness
to support a given application. The information set forth in this document does not convey any license under the
copyrights, patent rights, trademarks or other intellectual property rights claimed and owned by SMART. The information
set forth in this document is considered to be “Proprietary” and “Confidential” property owned by SMART.
ALL PRODUCTS SOLD BY SMART ARE COVERED BY THE PROVISIONS APPEARING IN SMART'S TERMS AND CONDITIONS OF
SALE ONLY, INCLUDING THE LIMITATIONS OF LIABILITY, WARRANTY AND INFRINGEMENT PROVISIONS. SMART MAKES NO
WARRANTIES OF ANY KIND, EXPRESS, STATUTORY, IMPLIED OR OTHERWISE, REGARDING INFORMATION SET FORTH
HEREIN OR REGARDING THE FREEDOM OF THE DESCRIBED PRODUCTS FROM INTELLECTUAL PROPERTY INFRINGEMENT,
AND EXPRESSLY DISCLAIMS ANY SUCH WARRANTIES INCLUDING WITHOUT LIMITATION ANY EXPRESS, STATUTORY OR
IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
©2012 SMART Storage Systems. All rights reserved.