The Design and Implementation of a Log-Structured File System

Mendel Rosenblum
and John K. Ousterhout
Contents
Overview
Motivation
Design and implementation of LFS
Cleaning policy
Evaluation of real implementation
Concluding comments
Overview
Goal
Improve the efficiency of overall disk usage
Method
Log-structured disk layout: every write is "appended" to the log
Many small random writes become one large sequential write
Key issue
Cleaning Policy
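As a minimal sketch of this batching idea (hypothetical names and sizes, not the Sprite LFS code): dirty blocks accumulate in an in-memory segment buffer and are flushed to disk with one large sequential write.

/* Minimal sketch of LFS-style write batching (hypothetical, not Sprite code):
 * dirty blocks accumulate in an in-memory segment and are flushed with one
 * large sequential write, replacing many small random writes. */
#include <string.h>

#define BLOCK_SIZE 4096
#define SEG_BLOCKS 128                    /* 128 * 4 KB = 512 KB segment */

struct segment {
    char data[SEG_BLOCKS][BLOCK_SIZE];    /* in-memory segment buffer */
    int  nblocks;                         /* blocks buffered so far */
};

/* Flush the whole segment in a single sequential disk write (stubbed here). */
static void seg_flush(struct segment *seg)
{
    /* e.g. write(disk_fd, seg->data, seg->nblocks * BLOCK_SIZE); */
    seg->nblocks = 0;
}

/* Append one block; many small "random" writes land here back-to-back. */
static void seg_append(struct segment *seg, const void *block)
{
    memcpy(seg->data[seg->nblocks++], block, BLOCK_SIZE);
    if (seg->nblocks == SEG_BLOCKS)
        seg_flush(seg);
}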
Contents
Overview
Motivation
Design and implementation of LFS
Cleaning policy
Evaluation of real implementation
Concluding comments
Motivation
Technology trends
CPU: performance improving exponentially
Main memory: size growing exponentially
Larger file caches possible: absorb a greater fraction of the read requests
Disk traffic becomes dominated by writes
The cache also acts as a write buffer -> more data is lost in a crash
Disk: improved in cost and capacity
Disk transfer bandwidth: improved sufficiently (disk arrays, parallel-head disks)
Access time (limited by drive mechanics): little improvement
File system workload
Worst case workload : Office and engineering applications
Small random disk I/O
LFS is mainly focused on small-file workloads
Motivation
Problems of other file systems
Information spread widely over the disk
Causes many tiny accesses (directory entry, inode, data blocks)
(Ex.) Unix FFS: at least 5 disk I/Os to create a new file
Synchronous metadata writes
Metadata structures are written synchronously
For small-file workloads, disk traffic is dominated by these synchronous metadata writes
Contents
Overview
Motivation
Design and implementation of LFS
Cleaning policy
Evaluation of real implementation
Basic concept & Issues
The basic idea of the LFS system
[Figure: traditional FS vs. LFS — on-disk layout of the metadata for a small file]
Two key issues
How to retrieve information from the log
How to manage the free space on disk
How to retrieve information from the log
Adopts an indexed structure, the same as Unix FFS
Each inode is at a fixed location on disk in Unix FFS
In Sprite LFS, each inode is written to the log and located via the inode map
[Figure: index structure — directory entry (file name, inode number) -> inode (block pointers) -> data blocks. In Unix FFS the inode's location is fixed on disk; in Sprite LFS the inode map gives the inode's current location in the log.]
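A minimal sketch of this lookup chain, under hypothetical types (not the actual Sprite code): the cached inode map gives the inode's log address, and the inode's block pointers give the data.

/* Sketch of the LFS read path (hypothetical types, not the Sprite code):
 * in-memory inode map -> inode's address in the log -> inode -> data block. */
#include <stdint.h>

typedef uint64_t blkaddr_t;              /* disk address of a block */

struct inode_lfs {
    blkaddr_t block_ptr[12];             /* direct block pointers */
};

extern blkaddr_t inode_map[];            /* inode number -> inode's log address */
extern void disk_read(blkaddr_t addr, void *buf);

/* Read logical block 'lbn' of file 'ino' into 'buf'. */
void lfs_read_block(uint32_t ino, uint32_t lbn, void *buf)
{
    struct inode_lfs ip;
    disk_read(inode_map[ino], &ip);      /* one lookup in the cached inode map */
    disk_read(ip.block_ptr[lbn], buf);   /* then the data block, as in FFS */
}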
Major data structures stored on disk by Sprite LFS
[Figures: physical layout on disk; example of creating two files in different directories]
Free space management (1/2)
Threading and copying
Goal: maintain large free extents for writing new data
Sprite LFS uses threading and copying together
Threading between segments: leave the live data in place
Copying within a segment: copy live data out of the log
Free space management (2/2)
Segment: the unit of writing and cleaning
512 KB ~ 1024 KB
Disk: consists of segments plus the checkpoint region
[Layout: segment 0 | segment 1 | ... | segment n | checkpoint region]
Segment summary block
Contains each block's identity: <inode number, offset>
Used to check the liveness of each block
Modified times for each block
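A sketch of a summary entry and a liveness check (hypothetical layout; the slide specifies only the <inode number, offset> identity; types reused from the read-path sketch above):

/* Hypothetical segment-summary entry (reusing blkaddr_t, inode_lfs,
 * inode_map, and disk_read from the read-path sketch above). */
struct seg_summary_entry {
    uint32_t ino;      /* inode number of the owning file */
    uint32_t offset;   /* logical block offset within that file */
};

/* A block is live iff the file's inode still points at this disk address. */
int block_is_live(const struct seg_summary_entry *e, blkaddr_t addr)
{
    struct inode_lfs ip;
    disk_read(inode_map[e->ino], &ip);   /* current inode via the inode map */
    return ip.block_ptr[e->offset] == addr;
}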
Operations
Read a block
inode map block ptr (cached in memory) -> inode map block -> inode -> data block
From the inode on, same as FFS
Write a block
The data block, inode, inode map block, and segment usage table block are appended to the current segment in memory
Then update the inode map pointer, segment summary block, and segment usage table
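A sketch of this write path under the same hypothetical structures (seg_next_addr is an assumed helper): nothing is updated in place; the new data block and the updated metadata are all appended to the current segment.

/* Sketch of the LFS write path (hypothetical, reusing the structures above). */
extern blkaddr_t seg_next_addr(struct segment *seg);  /* assumed helper */

void lfs_write_block(struct segment *seg, uint32_t ino, uint32_t lbn,
                     const void *data)
{
    struct inode_lfs ip;
    char buf[BLOCK_SIZE] = {0};

    disk_read(inode_map[ino], &ip);

    ip.block_ptr[lbn] = seg_next_addr(seg);  /* address the data will get */
    seg_append(seg, data);                   /* 1. append the new data block */

    inode_map[ino] = seg_next_addr(seg);     /* inode's new place in the log */
    memcpy(buf, &ip, sizeof ip);
    seg_append(seg, buf);                    /* 2. append the updated inode */

    /* 3. the dirty inode-map block and the segment usage table entry are
     *    appended / updated in the same way (omitted here). */
}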
Crash recovery
Checkpoint
Taken periodically, or at the user's request
Data blocks, indirect blocks, inodes, inode map blocks, and segment usage table blocks are flushed to the log
Then the checkpoint region, at a fixed location on disk, is written
Consistent state: no modified data is left in memory
Roll-forward
If a crash occurs, scan the log written after the most recent checkpoint
[Figure: timeline — checkpoints, crash, roll-forward]
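A rough sketch of recovery under these assumptions (hypothetical helpers; the real roll-forward also handles directory-operation log entries):

/* Sketch of crash recovery (hypothetical helpers, not the Sprite code). */
extern blkaddr_t checkpoint_read(void);        /* position saved at checkpoint */
extern int  log_next_segment(blkaddr_t *pos);  /* returns 0 at end of the log */
extern void replay_segment(blkaddr_t pos);     /* re-apply summary-block info */

void lfs_recover(void)
{
    /* The checkpoint restores a consistent inode map and usage table. */
    blkaddr_t pos = checkpoint_read();
    /* Roll-forward: scan segments written after the checkpoint and
     * recover the new inodes and data they contain. */
    while (log_next_segment(&pos))
        replay_segment(pos);
}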
Contents
Overview
Motivation
Design and implementation of LFS
Cleaning policy
Evaluation of real implementation
Cleaning policy
Cleaning: a simple three-step process
Read segments into memory
Identify the live data
Write the live data back to a smaller number of clean segments
Problems
To identify which blocks of each segment are live, so that they can be written out again
To identify the file to which each block belongs and the position of the block within the file, in order to update the file's inode to point to the block's new location
Solution
Write a segment summary block as part of each segment
UID = <inode number, file version>: deleting the file changes the version, so stale blocks are caught -> no free-block bitmap or free list is needed
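The three steps could look like the following sketch (hypothetical helpers; the liveness check comes from the segment-summary sketch above):

/* Sketch of the three-step cleaner (hypothetical helpers). */
#define ENTRIES_PER_SEG 128

extern struct seg_summary_entry summary_of(blkaddr_t seg_addr, int i);

void clean_segment(struct segment *out, blkaddr_t seg_addr)
{
    char block[BLOCK_SIZE];
    for (int i = 0; i < ENTRIES_PER_SEG; i++) {        /* 1. read the segment */
        struct seg_summary_entry e = summary_of(seg_addr, i);
        blkaddr_t addr = seg_addr + (blkaddr_t)i;      /* addresses in blocks */
        if (!block_is_live(&e, addr))                  /* 2. keep only live data */
            continue;
        disk_read(addr, block);
        lfs_write_block(out, e.ino, e.offset, block);  /* 3. write it back */
    }
    /* The whole source segment is now free and clean. */
}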
Cleaning policy
Four policy questions
When to clean? -> when free segments fall below a threshold value
How many segments at a time?
Segment selection policy: the most fragmented segments?
Block redistribution policy
Try to enhance the locality of future reads, e.g., group files in the same directory
Age sort: sort blocks by last modified time -> group blocks of similar age -> assign each group to a new segment
Measurement: write cost
Write cost
Average disk time spent per byte of new data written, including the cleaning overhead
UNIX FFS overhead: seek/rotational time
LFS overhead: cleaning
A write cost of 10.0 means 90% of the disk time is wasted
Ideal case: 1.0 (data written at full disk bandwidth with no cleaning overhead)
Write cost of LFS
No seek/rotational cost in LFS
The write cost is determined by the total data copied during cleaning
u (utilization): fraction of data still live in cleaned segments
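The write-cost formula from the paper follows directly from these definitions: to reclaim one segment the cleaner reads the whole segment (cost 1), writes back the live fraction u, and gains 1 - u of space for new data:

write cost = (bytes read and written) / (bytes of new data)
           = (1 + u + (1 - u)) / (1 - u)
           = 2 / (1 - u)    (for u > 0; exactly 1.0 if empty segments can be reclaimed without reading)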
Goal: reduce the amount of live data in the segments that get cleaned
Tradeoff: cost and utilization
In LFS, cost-performance trades off against disk capacity utilization
Bimodal segment distribution
Desired: most segments nearly full, with the cleaner working on nearly empty ones
[Figure: write cost vs. disk utilization; "FFS improved" assumes up to 25% improvement from logging, delayed writes, and disk request sorting]
Simulation-based research
Simulator
Fills the disk with 4-KB files
Runs until a given disk capacity utilization is reached
Two access patterns:
Uniform: random access pattern
Hot-and-cold: 90% of writes go to 10% "hot" files, 10% of writes to 90% "cold" files -> adds locality
Simulated policy
Segment selection
Greedy: choose the least-utilized segments (minimum u)
Block redistribution
Uniform: no redistribution
Hot-and-cold: age sorting
Sort by the file's last modified time -> separate segments by age -> expecting a bimodal distribution
First result
Locality and block redistribution performed "worse" than expected
[Graph: write cost vs. disk utilization
---- : hot-and-cold (age sorting)
___ : uniform
FFS and FFS-improved (logging, delayed writes, disk request sorting) shown for reference]
Analysis
Hot segments are cleaned more frequently
The utilization of cleaned segments is higher under hot-and-cold than under uniform
Cost-benefit selection policy
Cost-benefit segment selection:
benefit / cost = (free space generated * age of data) / cost
              = ((1 - u) * age) / (1 + u)
cost = 1 (read the segment) + u (write back the live data)
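A sketch of how the cleaner might pick segments under this policy, assuming the segment usage table stores live bytes and a modified time per segment (hypothetical layout; a greedy cleaner would instead just pick the minimum u):

/* Sketch of cost-benefit segment selection (hypothetical usage-table layout). */
#include <stdint.h>

#define NSEGS     1024
#define SEG_BYTES (128 * 4096)            /* 512 KB segment */

struct seg_usage {
    uint32_t live_bytes;                  /* N bytes still live in the segment */
    uint32_t mtime;                       /* newest modified time of any block */
};

extern struct seg_usage usage_table[NSEGS];

/* Pick the segment maximizing benefit/cost = (1-u)*age / (1+u). */
int pick_segment(uint32_t now)
{
    int best = -1;
    double best_score = -1.0;
    for (int s = 0; s < NSEGS; s++) {
        double u     = (double)usage_table[s].live_bytes / SEG_BYTES;
        double age   = (double)(now - usage_table[s].mtime);
        double score = (1.0 - u) * age / (1.0 + u);
        if (score > best_score) { best_score = score; best = s; }
    }
    return best;
}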
Segment usage table
Per segment: N bytes (bytes still live) and M time (most recent modified time); block size: 512 bytes
Unused segment -> N bytes: null, M time: null
System time 2034, create file 1 -> blocks 1, 2: segment holds F1/block1, F1/block2; N bytes: 1024, M time: 2034
System time 3034, update file 1 -> block 2: the old copy dies and a new copy is appended; N bytes: 1024, M time: 3034
System time 5134, create file 2 -> blocks 1, 2: segment holds F1, F1, F2, F2 blocks; N bytes: 2048, M time: 5134
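As a sketch, the usage-table bookkeeping behind this example might look like the following (hypothetical helper names, reusing struct seg_usage from the selection sketch above):

/* Sketch: usage-table bookkeeping for the example above (block size 512). */
void usage_note_write(int seg, uint32_t nbytes, uint32_t now)
{
    usage_table[seg].live_bytes += nbytes;  /* newly written live data */
    usage_table[seg].mtime = now;
}

void usage_note_dead(int seg, uint32_t nbytes)
{
    usage_table[seg].live_bytes -= nbytes;  /* an old copy became garbage */
}

/* Replaying the example on one segment:
 *   t=2034  create file1 blocks 1,2 : live = 1024, mtime = 2034
 *   t=3034  update file1 block 2    : -512 (old copy) +512 (new copy)
 *                                     -> live = 1024, mtime = 3034
 *   t=5134  create file2 blocks 1,2 : live = 2048, mtime = 5134 */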
Result
[Graph: simulation results for the cost-benefit policy]
Contents
Overview
Motivation
Design and implementation of LFS
Cleaning policy
Evaluation of real implementation
Implementation study
Implementation complexity
Mostly the same as FFS
Extra in LFS: segment cleaner, checkpoint/roll-forward code
Extra in FFS: allocation bitmaps, layout policies, fsck code
But an FFS implementation can reuse existing code
Implemented in the Sprite network operating system
Installed in 5 different disk partitions used by about 30 users
Micro-benchmarks
Small file workload, no cleaning happened (best-case performance)
create/delete: roughly 10x faster than FFS
Expectation of further performance improvement with a faster processor
FFS is disk-bound: disk 85% utilized (cf. LFS: 17%)
Micro-benchmarks
Large file workload, no cleaning happened
100 MB file, write & read performance (5 phases run in sequence)
New write: creating the file
Overwrite: writing to an existing file
Long-term usage statistics
Collected over a 4-month period
About 70% of the disk's write bandwidth utilized (write cost 1.2~1.6, i.e., 63~83% of bandwidth)
Segment utilization of the /user6 partition
Large numbers of fully utilized and totally empty segments (the desired bimodal distribution)
Critics on LFS
Are LFS's performance gains really the best case?
LFS excels on metadata-intensive workloads
Typical read/write I/O performance is similar to or lower than Sun-FFS
LFS read performance is generally lower than FFS
Cleaning overhead degrades performance
Sun-FFS's implementation cost is far lower than LFS's