ZFS: The Last Word in File Systems - Is It?
Swaminathan Sundararaman, Sriram Subramanian

ZFS: Zettabyte File System
• The last word in file systems
• "We've rethought everything and rearchitected it," - Jeff Bonwick, Sun distinguished engineer and chief architect of ZFS.
• "We've thrown away 20 years of old technology that was based on assumptions no longer true today."

Our Goal
• To uncover interesting policies of ZFS
• Focus on
  • How ZFS automatically chooses multiple block sizes to match workload
  • Policy and performance analysis of ZFS during synchronous workloads

Methodology
• Semantic Block Analysis [Prabhakaran et al. '05]
[Diagram labels: Workload, Application, File System, Pseudo Device Driver, OS, Disk, Block Inference]

Preliminary Results
• Naïve block allocation policy
  • Does not work well for random workloads
• Dynamically merges small block writes
  • Suffers from Read-Modify-Write for some workloads
• Poor ZFS Intent Log block allocation policy
  • Dynamically changes the block writing mechanism based on workload (under investigation)

Outline
• Infrastructure
  • Block Classification Strategy
• Policies
  • Block Allocation
  • Dynamic block resizing
  • ZFS Intent Log (ZIL)
• Conclusion

Infrastructure
• Pseudo Device Driver
  • Implemented a block driver using the Layered Device Interface (LDI)
  • Ioctls to control collection of statistics
    • Issue: Solaris did not allow us to issue ioctls to pseudo block drivers
    • Solution: Indirection
      • Wrote a dummy character driver and redirected the ioctl requests to our block device

Infrastructure (Contd.)
• Selective classification
• Log files for offline block analysis
• Negligible performance overheads
  • Asynchronously written to the log file

Block Classification Strategy
• Uber blocks
  • 1024-byte blocks
  • Identified by their magic flag
• Data blocks
  • Identified by a special pattern
    • Pattern repeated at every 512-byte offset
  • Individual data blocks identified by sequentially increasing numbers

Block Classification Strategy
• ZIL blocks
  • Identified by their magic flag
• Meta-data blocks
  • Rest of the blocks
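
The classification rules on these two slides can be sketched as a small offline classifier. This is a minimal illustration, not the authors' tool: the function names are hypothetical, the magic values and the data pattern are passed in as parameters because the slides do not give them, and where each magic value sits inside a block (start of block for uber blocks, anywhere for ZIL blocks) is an assumption.

```python
SECTOR = 512  # the slides say the data pattern repeats at every 512-byte offset


def is_data_block(block: bytes, pattern: bytes) -> bool:
    """Return True if every 512-byte sector of the block starts with the pattern."""
    if not block or len(block) % SECTOR != 0:
        return False
    return all(
        block[off:off + len(pattern)] == pattern
        for off in range(0, len(block), SECTOR)
    )


def classify_block(block: bytes, uber_magic: bytes, zil_magic: bytes,
                   pattern: bytes) -> str:
    """Classify one logged block using the rules from the slides.

    Assumptions (not stated in the slides): the uber block magic is at the
    start of the block, and the ZIL magic may appear anywhere in the block.
    """
    if len(block) == 1024 and block.startswith(uber_magic):
        return "uber"
    if is_data_block(block, pattern):
        return "data"
    if zil_magic in block:
        return "zil"
    # Everything that matches no rule is treated as meta-data.
    return "metadata"
```

With placeholder values such as `uber_magic=b"UBER"`, `zil_magic=b"ZILM"` and `pattern=b"DATA"`, a 4096-byte block whose eight sectors each begin with `b"DATA"` is classified as "data", and an all-zero block falls through to "metadata".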

Sequential Write of 1 GB file
• Block size: 4 KB
• ZFS caches small block writes
• Large sequential 128 KB block writes

Random writes inside 4 GB file
• Block size: 4 KB
• Large 128 KB block write for every small 4 KB write

Random Writes of 4 KB blocks

Offset (KB)   Block size (KB)
36            40
20            40
84            88
0             88
20            88
52            88
16            88
4             88

Random Writes of 512 bytes

Offset (KB)   Block size (KB)
0             0.5
16            16.5
32            32.5
64            64.5
127.5         128
150           128

Inference
• Block Allocation
  • Purely based on file offsets
    • Block size is set to 128 KB for offsets >= 128 KB
    • Block size is a multiple of 512 bytes for offsets < 128 KB
  • NOT based on dynamic workload characteristics
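
The inferred rule can be written down as a short model. This is a sketch of one reading of the two tables, not ZFS source code: it assumes the block size follows the highest end offset written so far, rounded up to 512 bytes and capped at 128 KB. The names `inferred_block_size` and `replay_block_sizes` are hypothetical.

```python
SECTOR = 512              # block sizes below the cap are multiples of 512 bytes
MAX_BLOCK = 128 * 1024    # cap reported on the slides


def inferred_block_size(end_offset: int) -> int:
    """Block size implied by the highest end offset written so far (assumed model)."""
    if end_offset >= MAX_BLOCK:
        return MAX_BLOCK
    # Round up to the next multiple of 512 bytes.
    return -(-end_offset // SECTOR) * SECTOR


def replay_block_sizes(writes):
    """Replay (offset, length) writes in bytes; return the block size after each."""
    highest_end = 0
    sizes = []
    for offset, length in writes:
        highest_end = max(highest_end, offset + length)
        sizes.append(inferred_block_size(highest_end))
    return sizes
```

Replaying 4 KB writes at offsets 36, 20, 84, 0, 20, 52, 16 and 4 KB yields block sizes of 40, 40, 88, 88, 88, 88, 88 and 88 KB, which matches the measured table; the same holds for the 512-byte table.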

Small Sequential Writes of 4 KB
• Write a 4 KB block
• Sleep 10 sec
• Write next block
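
A workload of this shape could be generated with a few lines of Python. This is a hedged sketch, not the authors' test program: `write_small_blocks` is a hypothetical name, stamping each block with its sequence number is an assumption borrowed from the classification slides, and `flush()` only hands the data to the operating system (the slides do not say whether the original workload forced it to disk).

```python
import time


def write_small_blocks(path, count, block_size=4096, pause_s=10.0):
    """Write `count` blocks sequentially, pausing between writes.

    Returns the total number of bytes written.
    """
    written = 0
    with open(path, "wb") as f:
        for seq in range(count):
            # Fill the block with its 8-byte sequence number so that
            # individual blocks can be told apart in the block-level log.
            stamp = seq.to_bytes(8, "little")
            written += f.write(stamp * (block_size // len(stamp)))
            f.flush()            # pass the data to the OS; no fsync is issued
            time.sleep(pause_s)  # the slide specifies a 10-second pause
    return written
```

Setting `pause_s=0` is convenient for a dry run; the 10-second default mirrors the slide.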

Small Seq. Writes of 32 KBytes

Unmount after every write

Dynamic Resizing of Blocks
• Applies only while the file size is < 128 KB
• Appending data to small files is inefficient
  • If the data is not in memory
    • A small append is converted to a Read-Modify-Write
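
The cost of such an append can be estimated with a simple model. This is an illustrative sketch under stated assumptions, not a measurement: it assumes a small file occupies a single block sized to the file (rounded up to 512 bytes), that an uncached append reads the whole old block, and that the whole resized block is then written. The name `append_io` is hypothetical.

```python
SECTOR = 512
MAX_BLOCK = 128 * 1024  # resizing applies only below this file size


def round_up(n: int, unit: int = SECTOR) -> int:
    """Round n up to the next multiple of unit."""
    return -(-n // unit) * unit


def append_io(file_size: int, append_len: int, cached: bool = False):
    """Return (bytes_read, bytes_written) for one append to a small file.

    Assumed model: the old block is read back unless it is already in
    memory, and the whole resized block is written out.
    """
    new_size = file_size + append_len
    if new_size > MAX_BLOCK:
        raise ValueError("model only covers files up to 128 KB")
    bytes_read = 0 if cached or file_size == 0 else round_up(file_size)
    bytes_written = round_up(new_size)
    return bytes_read, bytes_written
```

Under this model, appending 4 KB to an uncached 36 KB file reads 36 KB and writes 40 KB, so 76 KB of I/O is spent on 4 KB of new data.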

COW in ZFS
• Copy-on-write design makes most disk writes sequential
• Multiple block sizes, automatically chosen to match workload

ZIL Block Chaining

ZIL Block Allocation

ZIL Block Allocation 33 K

Conclusions
• Block Allocation
  • Purely based on file offsets
  • NOT based on dynamic workload characteristics
• Dynamic Resizing of Blocks
  • Applies only while the file size is < 128 KB
  • Appending data to small files is inefficient
• ZFS Intent Log
  • Internal fragmentation
  • Bad block allocation policy
  • Block chaining mechanism

Conclusion
• ZFS: The last word in file systems
  • Might be the latest word
  • Definitely not the last word!

Questions ?