Session 4: Storage (09:00 - 10:15, April 23)

An Efficient FTL to Optimize Address Translation in Flash Memory

You Zhou∗, Fei Wu∗, Ping Huang†, Xubin He†, Changsheng Xie∗, and Jian Zhou∗

∗ Huazhong University of Science and Technology, China
† Virginia Commonwealth University, USA
Introduction

NAND flash memory gains popularity due to its high performance, low energy consumption, and decreasing $/GB. To emulate a standard block device, a flash translation layer (FTL) is required to perform address translation and garbage collection. The FTL uses a mapping cache to accelerate address translation. However, inefficient cache management introduces significant overhead, which degrades the performance and lifetime of flash memory.

Analytical Models

We have developed a performance model and a write amplification (WA) model to analyze the address translation overhead in a demand-based page-level FTL.

1. Address translation overhead comes from:
• cache loadings and replacements,
• collecting trans. blocks.
2. The overhead can be reduced by:
• increasing the cache hit ratio,
• lowering the probability of replacing a dirty entry.

Observation 1

To investigate how to lower the probability of replacing a dirty entry, we have conducted experiments to study the distribution of cached entries in DFTL.

Observation 2

To investigate how to increase the hit ratio, we have studied the spatial locality in several representative enterprise workloads.
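As a rough sketch of the accounting such a WA model performs (the function and variable names below are our own illustration, not the poster's model):

```python
def write_amplification(user_writes, trans_writes, gc_writes):
    """Flash-level pages written per user page written.

    user_writes  -- data pages written on behalf of the host
    trans_writes -- translation-page writes caused by cache
                    loadings and replacements of dirty entries
    gc_writes    -- valid pages copied while collecting data
                    and translation blocks
    All three arguments are counts of physical page writes.
    """
    return (user_writes + trans_writes + gc_writes) / user_writes

# 1000 user writes that additionally incur 150 translation-page
# writes and 100 GC copies amplify each user write by 1.25x.
print(write_amplification(1000, 150, 100))  # 1.25
```

The model makes the two reduction levers visible: a higher hit ratio shrinks the loadings behind `gc_writes` pressure, and fewer dirty-entry replacements shrink `trans_writes`.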
[Figure: DFTL-style architecture — the mapping cache holds active entries (LPN → PPN); on a miss, entries are loaded from translation pages (Trans. Page) of the mapping table in flash memory, and dirty entries are written back when replaced.]
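The translation path in such a demand-based FTL can be sketched as follows (a simplification with our own names, a toy page size, and a placeholder eviction policy, not the poster's FTL; a write path would additionally mark entries dirty):

```python
ENTRIES_PER_TP = 4   # entries per translation page (tiny, for illustration)
CACHE_CAPACITY = 8   # mapping cache size in entries

mapping_table = {}   # on-flash mapping table: tp_number -> {lpn: ppn}
cache = {}           # mapping cache: lpn -> ppn (subset of the table)
dirty = set()        # cached entries not yet written back

def translate(lpn):
    """Return the PPN for an LPN, loading the entry on a cache miss."""
    if lpn in cache:                       # cache hit: no flash access
        return cache[lpn]
    if len(cache) >= CACHE_CAPACITY:       # cache full: replace an entry
        victim = next(iter(cache))         # placeholder policy (oldest first)
        if victim in dirty:                # dirty victim costs an extra
            tp = victim // ENTRIES_PER_TP  # translation-page write
            mapping_table.setdefault(tp, {})[victim] = cache[victim]
            dirty.discard(victim)
        del cache[victim]
    tp = lpn // ENTRIES_PER_TP             # load the entry from its
    ppn = mapping_table.get(tp, {}).get(lpn)  # translation page in flash
    cache[lpn] = ppn
    return ppn

mapping_table[0] = {1: 335}
print(translate(1))  # 335 (miss: loaded from translation page 0)
print(translate(1))  # 335 (hit: served from the mapping cache)
```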
Two conclusions:
• Less than 15% of entries in a cached translation page are recently accessed.
• Cached trans. pages with multiple dirty entries may be written back repeatedly when replacements occur.

[Figure: TPFTL mapping cache — cached entries (LPN:PPN) belonging to the same translation page are clustered into a TP node (TVPN:TPPN); TP nodes, and the entries within each node, are each kept in LRU order from MRU to LRU.]
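The TP-node clustering can be sketched with nested LRU lists, here via Python's OrderedDict (names and the page size are our illustration):

```python
from collections import OrderedDict

ENTRIES_PER_TP = 512  # mapping entries held by one translation page

class TPCache:
    """Two-level LRU: an LRU list of TP nodes, each holding an LRU
    list of cached entries that belong to the same translation page."""

    def __init__(self):
        self.tp_list = OrderedDict()  # tvpn -> OrderedDict(lpn -> ppn)

    def insert(self, lpn, ppn):
        tvpn = lpn // ENTRIES_PER_TP        # translation page of this entry
        node = self.tp_list.setdefault(tvpn, OrderedDict())
        node[lpn] = ppn
        node.move_to_end(lpn, last=False)   # entry becomes MRU in its node
        self.tp_list.move_to_end(tvpn, last=False)  # node becomes MRU

cache = TPCache()
for lpn in (335, 334, 4360, 4361):  # entries from two translation pages
    cache.insert(lpn, 0)
print(len(cache.tp_list))  # 2 TP nodes (translation pages 0 and 8)
```

Clustering per translation page is what later lets a single write-back flush every dirty entry of a node at the cost of one translation-page write.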
Motivation and Design

We propose a novel demand-based page-level FTL with a translation page-level caching mechanism, called TPFTL.

• Architecture: Two-level LRU Lists
Cached entries that belong to the same translation page, which is the access unit of the mapping table in flash memory, are clustered in a TP node.

• A Workload-adaptive Loading Policy
To improve the hit ratio, two techniques are proposed, based on Observation 2:
1. Request-level prefetching leverages the sequentiality in each large request.
2. Selective prefetching leverages the sequentiality among requests.
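One way the two prefetching rules could look in code (a sketch under our own simplifications; the real policy's sequentiality detection is more involved):

```python
def entries_to_load(request_lpns, seq_detected, depth=4):
    """Decide which mapping entries to load on a miss.

    request_lpns -- LPNs touched by the current (possibly large) request
    seq_detected -- whether recent requests have looked sequential
    depth        -- how many successor entries to prefetch (our knob)
    """
    lpns = set(request_lpns)          # request-level prefetching: load
                                      # entries for the whole request
    if seq_detected:                  # selective prefetching: also load
        last = max(request_lpns)      # the next few entries, anticipating
        lpns.update(range(last + 1, last + 1 + depth))  # sequential access
    return sorted(lpns)

print(entries_to_load([100, 101, 102], seq_detected=True))
# [100, 101, 102, 103, 104, 105, 106]
```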
• An Efficient Replacement Policy
To lower the probability of replacing a dirty entry, two techniques are proposed, based on Observation 1:
1. Batch-update replacement writes back all dirty entries in a TP node when replacing one of them.
2. Clean-first replacement postpones the replacements of dirty entries.
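The batch-update and clean-first rules can be combined as in this sketch (our simplified structures: each TP node carries its entries plus a set of dirty LPNs):

```python
from collections import OrderedDict

def pick_victim_and_writeback(tp_list):
    """tp_list: OrderedDict tvpn -> {'entries': dict, 'dirty': set},
    kept in MRU -> LRU order. Returns (evicted tvpn, page writes).

    Clean-first: scan from the LRU end for a node with no dirty
    entries and evict it for free. Batch-update: if every node is
    dirty, evict the LRU node and flush ALL of its dirty entries
    together, costing one translation-page write instead of one
    write per dirty entry.
    """
    for tvpn in reversed(tp_list):           # LRU end first
        if not tp_list[tvpn]['dirty']:       # clean node: drop, no write
            tp_list.pop(tvpn)
            return tvpn, 0
    tvpn, node = tp_list.popitem(last=True)  # all nodes dirty: evict LRU;
    return tvpn, 1                           # its dirty entries share one
                                             # translation-page write

tps = OrderedDict()
tps[3] = {'entries': {106: 1}, 'dirty': {106, 107}}
tps[5] = {'entries': {436: 2}, 'dirty': set()}
print(pick_victim_and_writeback(tps))  # (5, 0): clean node evicted free
```

Together the two rules lower the probability that a replacement lands on a dirty entry, and amortize the write-backs that remain.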
Acknowledgment

We are grateful to our shepherd Michael Swift and the anonymous reviewers for their insightful feedback. This work was sponsored in part by the National Basic Research Program of China (973 Program) under Grant No. 2011CB302303 and the National Natural Science Foundation of China under Grant No. 61300047. The work conducted at VCU was partially sponsored by the U.S. National Science Foundation under Grants CNS-1320349 and CNS-1218960.
Results

We use trace-driven simulation to evaluate TPFTL and compare it with DFTL, S-FTL, and the optimal FTL under four enterprise workloads.
1. TPFTL significantly lowers the probability of replacing a dirty entry and maintains a relatively high hit ratio.
2. TPFTL reduces the response time by up to 24% and WA by up to 17%.

Conclusion

• Inefficient cache management results in significant address translation overhead, which degrades the performance and lifetime of flash memory and can be reduced by increasing the cache hit ratio or lowering the probability of replacing a dirty entry.
• The change in the number of cached translation pages can be leveraged to recognize sequential accesses, which are common in workloads.
• To minimize the address translation overhead, we propose a new demand-based page-level FTL, called TPFTL, which employs a two-level cache mechanism with a workload-adaptive loading policy and an efficient replacement policy.
• Extensive evaluations with enterprise workloads show that TPFTL can efficiently use a small mapping cache to perform fast address translation at low overhead.
Author Information
Corresponding author:
[email protected]
Lab website:
http://en.wnlo.cn/