The Design and Implementation of a Log-Structured File System
Mendel Rosenblum and John K. Ousterhout

Contents
- Overview
- Motivation
- Design and implementation of LFS
- Cleaning policy
- Evaluation of real implementation
- Concluding comments

Overview
- Goal: raise the overall efficiency of disk usage
- Method: a log structure on disk
  - Every write is appended to the log
  - Many small random writes -> one large sequential write
- Key issue: cleaning policy

Motivation: technology trends
- CPU: performance improves exponentially
- Main memory: size grows exponentially
  - Larger file caches become possible and absorb a greater fraction of read requests
  - Disk traffic becomes dominated by writes
  - Memory acts as a write buffer, so more data is lost on a crash
- Disk: improved in cost and capacity
  - Transfer bandwidth: improved sufficiently (disk arrays, parallel-head disks)
  - Access time (governed by the drive's mechanical motion): no major improvement
- File system workload
  - Worst-case workload: office and engineering applications
  - Small random disk I/O
  - LFS focuses mainly on small-file workloads

Motivation: problems of other file systems
- Information is spread widely across the disk
  - Causes a very large number of small accesses (directory entry, inode, data block)
  - Example:
    Unix FFS: at least 5 disk I/Os to create a new file
- Synchronous metadata writes
  - Metadata structures are written synchronously
  - For workloads with many small files, disk traffic is dominated by the synchronous metadata writes

Basic concept & issues
- The basic idea of the LFS system
- Traditional FS vs. LFS: metadata or small files
- Two key issues
  - How to retrieve information from the log
  - How to manage the free space on disk

How to retrieve information from the log
- Adopts an indexed structure: same as Unix FFS
- In Unix FFS, each inode is at a fixed location on disk
- In Sprite LFS, each inode is written to the log and located through the inode map
[Figure: directory entry (file name, inode number) -> inode (metadata, block pointers) -> index block (block pointers) -> data blocks. In Unix FFS the inode location is fixed on disk.]

Major data structures stored on disk by Sprite LFS

Physical layout on disk
- Example of creating 2 files in different directories

Free space management (1/2)
- Goal: maintain large free extents for writing new data
- Threading: leave the live data in place
- Copying: copy live data out of the log
- Sprite LFS uses threading and copying together
  - Between segments -> threading
  - Live data -> copying

Free space management (2/2)
- Segment: unit of writing and cleaning, 512 KB ~ 1024 KB
- Disk: consists of segments + checkpoint region
[Figure: segment 0, segment 1, ..., segment n, checkpoint region]
- Segment summary block
  - Contains each block's identity: <inode number, offset>
  - Used to check the validity of each block
  - Modified times for each block

Operations
- Read a block
  - Inode map block pointer (in memory) -> inode map block -> inode -> data block
  - From the inode onward, same as FFS
- Write a block
  - Data block, inode, inode map block, and segment usage table block go into the current segment held in memory
  [Figure: current segment in memory, used / not used]
  - Update the inode map table pointer, segment summary block, and segment usage table

Crash recovery
- Checkpoint
  - Periodically or on user request, the data blocks, indirect blocks, inodes, inode map blocks, and segment usage
    table blocks are written out, and the checkpoint region at a fixed location on disk is updated
  - Consistent state: no modified data remains in memory
- Roll-forward
  - If a crash occurs, the log written after the most recent checkpoint is examined
[Figure: timeline of checkpoints, crash, and roll-forward]

Cleaning policy
- Cleaning: a simple three-step process
  - Read segments into memory
  - Identify the live data
  - Write the live data back to a smaller number of clean segments
- Problems
  - Identify which blocks of each segment are live, so that they can be written out again
  - Identify the file to which each block belongs and the position of the block within the file, in order to update the file's inode to point to the new location of the block
- Solution
  - Write a segment summary block as part of each segment
  - UID (inode number + file revision); the revision changes when the file is deleted
  - Note: no bitmap or free list needed

Cleaning policy: 4 problems
  - When to clean? Threshold value
  - How many segments at a time?
  - Segment selection policy: the most fragmented?
  - Block redistribution policy
    - Try to enhance the locality of future reads: files in the same directory
    - Age sort: sort by most recent modification time, group blocks of similar age, and assign them to new segments

Measurement: write cost
- Write cost: average time the disk is busy per byte of new data written, including cleaning overhead
  - Unix FFS: seek/rotational time
  - LFS: cleaning overhead
- Write cost 10.0: 90% of the time is wasted
- Ideal case: 1.0 (data written at full disk bandwidth with no cleaning overhead)

Write cost of LFS
- No seek/rotational time in LFS
- Write cost is determined by the total amount of data copied during cleaning
- u (utilization): fraction of data still live in the cleaned segments
- Goal: reduce the valid data in the segments being cleaned

Tradeoff: cost & utilization
- In LFS, cost-performance and utilization trade off against each other
- Bimodal segment distribution
- FFS improved: up to about 25% of disk bandwidth using logging, delayed writes, and disk request sorting

Simulation-based research
- Simulator
  - The disk is filled with 4-KB files
  - Produces a specified disk capacity utilization
- Two access patterns
  - Uniform: random access pattern
  - Hot-and-cold: 90% of writes go to 10% "hot" files, 10% of writes go to 90% "cold" files (adds locality)

Simulated policy
- Segment selection
  - Greedy: choose the least-utilized segments (lowest u)
- Block redistribution
  - Uniform: no redistribution
  - Hot-and-cold: age sorting
    - Sort by last modified time of the file
    - Separate segments by age
    - A bimodal distribution is expected

First result
- Locality and redistribution produced worse performance
[Figure: write-cost curves for FFS, FFS improved (logging, delayed write, disk request sorting), hot-and-cold with age sorting (dashed), and uniform (solid)]

Analysis
- Hot segments are cleaned more frequently
- In hot-and-cold, the utilization of the cleaned segments is higher than in uniform

Cost-benefit selection policy
- Cost-benefit segment selection
  - 1: cost to read the segment
  - u: cost to write back the live data

Segment usage table
[Figure: segment usage table example with 512-byte blocks. An unused segment has a null modified time (M time) and a null live-byte count (N byte). Recoverable entries: system time 3034, file 1 -> block 1 and block 2, N byte 1024; update of file 1 -> block 2 with M time 5134, N byte 2048.]
[Figure, continued: system time 2034, M time 2034; system time 5134, create file 2 -> block 1 and file 2 -> block 2; the segment then holds F1 block 1, F1 block 2, F2 block 1, F2 block 2.]

Result

Implementation study
- Implementation complexity: mostly the same as FFS

  LFS                         FFS
  Segment cleaner             Allocation bitmaps, layout policies
  Checkpoint/roll-forward     fsck code

  - But FFS can reuse code
- Implemented in the Sprite network operating system
- Installed in 5 different disk partitions used by about 30 users

Micro-benchmarks: small-file workload, no cleaning (best-case performance)
- Create/delete is roughly 10 times faster than FFS
- Performance is expected to improve further with faster processors
- FFS is disk-bound: disk 85% utilized (cf. LFS: 17%)

Micro-benchmarks: large-file workload, no cleaning
- 100 MB file, write & read performance (5 phases run in sequence)
- New write: creating the file
- Overwrite: writing to the existing file

Long-term usage statistics
- Collected over a 4-month period
- About 70% of bandwidth utilized (write cost 1.2~1.6: bandwidth 63~83%)
- Segment utilization of the /user6 partition
  - Large number of fully utilized and totally empty segments

Critique of LFS
- Is the performance gain of LFS the best achievable?
- LFS excels on metadata-intensive workloads
- General read/write I/O performance is similar to or lower than Sun-FFS
- LFS read performance is generally lower than FFS
- Cleaning overhead degrades performance
- The implementation cost of Sun-FFS is much lower than that of LFS
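
Sketch: write cost and segment selection

The write-cost figures quoted in the slides (a cost of 10.0 wastes 90% of disk time; long-term costs of 1.2~1.6 correspond to 63~83% of bandwidth) and the two segment selection policies can be checked with a small Python sketch. It assumes the formulas from the original Rosenblum and Ousterhout paper, which these slides do not spell out: write cost = 2 / (1 - u) for u > 0, and a cost-benefit ratio of (1 - u) * age / (1 + u). The names `write_cost`, `bandwidth_fraction`, `Segment`, `greedy_pick`, and `cost_benefit_pick` are illustrative only and are not part of Sprite LFS.

```python
from dataclasses import dataclass


def write_cost(u: float) -> float:
    """Write cost when cleaned segments have live fraction u.

    Assumed model (from the paper, not the slides): cleaning reads a whole
    segment (cost 1), writes back the live data (cost u), and the freed
    space (1 - u) is then filled with new data, giving 2 / (1 - u).
    """
    if not 0.0 <= u < 1.0:
        raise ValueError("u must satisfy 0 <= u < 1")
    if u == 0.0:
        return 1.0  # an empty segment does not need to be read at all
    return 2.0 / (1.0 - u)


def bandwidth_fraction(cost: float) -> float:
    """Fraction of disk bandwidth that carries new data at a given write cost."""
    return 1.0 / cost


@dataclass
class Segment:
    name: str    # hypothetical label, for illustration only
    u: float     # live fraction, derived from the live-byte count
    age: float   # time since the most recent modification in the segment


def greedy_pick(segments):
    """Greedy policy: clean the least-utilized segment."""
    return min(segments, key=lambda s: s.u)


def cost_benefit_pick(segments):
    """Cost-benefit policy: maximize (1 - u) * age / (1 + u)."""
    return max(segments, key=lambda s: (1.0 - s.u) * s.age / (1.0 + s.u))
```

With u = 0.8 the write cost is about 10, so only about one tenth of disk time carries new data. Given a hot segment (u = 0.60, age 10) and a cold segment (u = 0.75, age 1000), `greedy_pick` chooses the hot one while `cost_benefit_pick` chooses the cold one, because the free space reclaimed from a cold segment is likely to stay free longer.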