Session 4: Storage (09:00-10:15, April 23)

An Efficient FTL to Optimize Address Translation in Flash Memory

You Zhou∗, Fei Wu∗, Ping Huang†, Xubin He†, Changsheng Xie∗, and Jian Zhou∗
∗ Huazhong University of Science and Technology, China
† Virginia Commonwealth University, USA

Introduction
• NAND flash memory gains popularity due to its high performance, low energy consumption, and decreasing $/GB.
• To emulate a standard block device, a flash translation layer (FTL) is required to perform address translation and garbage collection.
• The FTL uses a mapping cache to accelerate address translation. However, inefficient cache management introduces significant overhead, which degrades the performance and lifetime of flash memory.

Analytical Models
We have developed a performance model and a write amplification (WA) model to analyze the address translation overhead in a demand-based page-level FTL.
(Figure: active LPN-to-PPN entries are loaded from translation pages of the mapping table in flash memory into the mapping cache, and dirty entries are written back.)
Two conclusions:
• Inefficient cache management results in significant address translation overhead, which degrades the performance and lifetime of flash memory.
• The overhead can be reduced by increasing the cache hit ratio or lowering the probability of replacing a dirty entry.

Observation 1
To investigate how to lower the probability, we have conducted experiments to study the distribution of cached entries in DFTL.
• Less than 15% of the entries in a cached translation page are recently accessed.
• Cached translation pages with multiple dirty entries may be written back repeatedly when replacements occur.

Observation 2
To investigate how to increase the hit ratio, we have studied the spatial locality in several representative enterprise workloads.
• Sequential accesses are common in these workloads.
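As background for the observations above, the baseline demand-based translation path can be sketched as follows. This is a minimal illustration in the style of DFTL, not the authors' code; the cache size, ENTRIES_PER_TP, and the fake PPN values are assumptions. A cache miss loads a mapping entry from its translation page, and evicting a dirty entry forces a translation-page write-back, which are exactly the overheads the analytical models quantify.

```python
# Minimal sketch (not the authors' implementation) of demand-based page-level
# address translation with a mapping cache, DFTL-style.
from collections import OrderedDict

class MappingCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()   # lpn -> (ppn, dirty), kept in LRU order
        self.flash_reads = 0         # translation-page reads (entry loads)
        self.flash_writes = 0        # translation-page writes (write-backs)

    def translate(self, lpn, new_ppn=None):
        """Return the PPN for lpn; a non-None new_ppn marks the entry dirty."""
        if lpn not in self.cache:
            self.flash_reads += 1            # load entry from its trans. page
            if len(self.cache) >= self.capacity:
                victim, (_, dirty) = self.cache.popitem(last=False)
                if dirty:
                    self.flash_writes += 1   # write back the dirty entry
            self.cache[lpn] = (lpn * 7 % 4096, False)  # fake PPN for the demo
        self.cache.move_to_end(lpn)          # mark as most recently used
        ppn, dirty = self.cache[lpn]
        if new_ppn is not None:
            self.cache[lpn] = (new_ppn, True)
            ppn = new_ppn
        return ppn

cache = MappingCache(capacity=4)
for lpn in [1, 2, 3, 4, 5]:
    cache.translate(lpn, new_ppn=100 + lpn)   # five writes, capacity of four
print(cache.flash_reads, cache.flash_writes)  # 5 loads, 1 dirty write-back
```

Even this toy run shows the cost structure: every miss costs a flash read, and every eviction of a dirty entry costs a flash write, which motivates raising the hit ratio and lowering the dirty-replacement probability.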
Motivation and Design
1. Address translation overhead comes from:
• cache loadings and replacements,
• collecting translation blocks.
2. The overhead can be reduced by:
• increasing the cache hit ratio,
• lowering the probability of replacing a dirty entry.

We propose a novel demand-based page-level FTL with a translation page-level caching mechanism, called TPFTL.

• Architecture: Two-level LRU Lists
Cached entries that belong to the same translation page, which is the access unit of the mapping table in flash memory, are clustered in a TP node.
(Figure: a page-level LRU list of TP nodes, each identified by its TVPN/TPPN and clustering the entry nodes of one translation page; a global translation directory (GTD) locates the translation pages in flash.)

• An Efficient Replacement Policy
To lower the probability, two techniques are proposed, based on Observation 1.
1. Batch-update replacement writes back all dirty entries in a TP node when replacing one of them.
2. Clean-first replacement postpones the replacements of dirty entries.

• A Workload-adaptive Loading Policy
To improve the hit ratio, two techniques are proposed, based on Observation 2. The number change of cached translation pages can be leveraged to recognize the sequential accesses, which are common in workloads.
1. Request-level prefetching leverages the sequentiality in each large request.
2. Selective prefetching leverages the sequentiality among requests.

Acknowledgment
We are grateful to our shepherd Michael Swift and the anonymous reviewers for their insightful feedback. This work was sponsored in part by the National Basic Research Program of China (973 Program) under Grant No. 2011CB302303 and the National Natural Science Foundation of China under Grant No. 61300047. The work conducted at VCU was partially sponsored by the U.S. National Science Foundation under Grants CNS-1320349 and CNS-1218960.
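TPFTL's two-level caching and replacement policy can be sketched as a simplified model. This is not the paper's implementation; the capacity (counted in entries), ENTRIES_PER_TP, and the eviction order are assumptions. Entries of one translation page are clustered in a TP node on a page-level LRU list; clean-first replacement prefers evicting a TP node with no dirty entries, and batch-update replacement writes back all dirty entries of a victim TP node in a single translation-page write.

```python
# Sketch (assumed parameters, not the paper's code) of TPFTL's translation
# page-level caching with clean-first and batch-update replacement.
from collections import OrderedDict

ENTRIES_PER_TP = 512  # illustrative mapping entries per translation page

class TPFTLCache:
    def __init__(self, capacity):
        self.capacity = capacity      # total cached entries
        self.size = 0
        self.tp_list = OrderedDict()  # tvpn -> OrderedDict(lpn -> dirty), LRU
        self.tp_writes = 0            # translation-page write-backs

    def _evict_one_tp(self):
        # Clean-first: prefer the coldest TP node holding no dirty entries.
        victim = next((t for t, entries in self.tp_list.items()
                       if not any(entries.values())), None)
        if victim is None:
            # Batch-update: one translation-page write covers all dirty
            # entries of the coldest TP node.
            victim = next(iter(self.tp_list))
            self.tp_writes += 1
        self.size -= len(self.tp_list.pop(victim))

    def access(self, lpn, is_write):
        tvpn = lpn // ENTRIES_PER_TP
        hit = tvpn in self.tp_list and lpn in self.tp_list[tvpn]
        if not hit:
            while self.size >= self.capacity:
                self._evict_one_tp()
            self.tp_list.setdefault(tvpn, OrderedDict())[lpn] = False
            self.size += 1
        node = self.tp_list[tvpn]
        node.move_to_end(lpn)          # entry-level LRU inside the TP node
        self.tp_list.move_to_end(tvpn) # page-level LRU across TP nodes
        if is_write:
            node[lpn] = True

cache = TPFTLCache(capacity=4)
for lpn, is_write in [(0, True), (1, True), (512, False), (513, False),
                      (1024, True)]:
    cache.access(lpn, is_write)
print(cache.tp_writes)  # 0: clean-first evicted the clean TP node instead
```

In the demo, the final miss evicts the clean TP node even though a dirty one is colder, so no translation-page write is needed; when only dirty TP nodes remain, one batch write-back retires all dirty entries of the victim at once.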
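The loading policy's two prefetching techniques can also be sketched. The detector below is an illustrative stand-in: the paper recognizes sequential accesses from the number change of cached translation pages, whereas this sketch approximates that with a simple back-to-back streak counter; ENTRIES_PER_TP and the streak threshold are assumptions.

```python
# Hedged sketch of request-level and selective prefetching; the detection
# heuristic here is an assumption, not the paper's exact algorithm.
ENTRIES_PER_TP = 512  # illustrative

class Loader:
    def __init__(self):
        self.last_end = None   # LPN right after the previous request
        self.seq_streak = 0    # consecutive back-to-back requests seen

    def entries_to_load(self, start_lpn, length):
        # Request-level prefetching: a large request spans several mapping
        # entries, so load every entry the request itself will touch.
        lpns = list(range(start_lpn, start_lpn + length))
        # Selective prefetching: once requests arrive back-to-back, treat the
        # workload as sequential and speculatively load the rest of the last
        # translation page touched.
        if self.last_end == start_lpn:
            self.seq_streak += 1
        else:
            self.seq_streak = 0
        self.last_end = start_lpn + length
        if self.seq_streak >= 2:  # stream confirmed (threshold is assumed)
            last = lpns[-1]
            tp_end = (last // ENTRIES_PER_TP + 1) * ENTRIES_PER_TP
            lpns.extend(range(last + 1, tp_end))
        return lpns

ldr = Loader()
print(len(ldr.entries_to_load(0, 8)))   # 8: no stream detected yet
print(len(ldr.entries_to_load(8, 8)))   # 8: streak of one
print(len(ldr.entries_to_load(16, 8)))  # 496: rest of the page prefetched
```

A random request resets the streak, so the policy stays conservative on non-sequential workloads while amortizing translation-page reads on sequential ones.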
Results
We use trace-driven simulation to evaluate TPFTL and compare it with DFTL, S-FTL, and the optimal FTL under four enterprise workloads.
1. TPFTL significantly lowers the probability of replacing a dirty entry while maintaining a relatively high hit ratio.
2. TPFTL reduces the response time by up to 24% and the WA by up to 17%.

Conclusion
• To minimize the address translation overhead, we propose a new demand-based page-level FTL, called TPFTL, which employs a two-level cache mechanism with a workload-adaptive loading policy and an efficient replacement policy.
• Extensive evaluations with enterprise workloads show that TPFTL can efficiently use a small mapping cache to perform fast address translation at low overhead.

Author Information
Corresponding author: [email protected]
Lab website: http://en.wnlo.cn/