University of Wisconsin Madison RELIABILITY ANALYSIS OF ZFS
CS 736 Project
Summary
• Goal: perform a reliability analysis of ZFS and test its existing reliability claims.
• Method: a layered driver interface that simulates transient block corruptions at various levels of the ZFS on-disk hierarchy.
• Results:
  - Classes of faults handled by ZFS.
  - A measure of the robustness of ZFS.
  - Lessons on building a reliable, robust file system.
Coming Up: Outline of the Talk
• ZFS organization
  - ZFS on-disk format
  - ZFS features and specifications regarding reliability
• Experimental setup and experiments
• Results and conclusions
• Future work
ZFS Organization: Pooled Storage Model
[Figure: ZFS pooled storage model]
• The disk forms a ZFS pool that hosts many file systems, all drawing on the pool's shared storage.
ZFS Organization: Object-Based, Transactional File System
• Every structure is an object.
• An operation on one or more objects is a transaction.
• Transactions are grouped into transaction groups (txgs).
• All data and metadata blocks are checksummed, so corruption is never silent.
• Modifications are always copy-on-write (COW), so the on-disk state is always consistent.
• All metadata, and optionally data, is compressed.
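The copy-on-write and transaction-group points above can be sketched with a toy model. This is a minimal sketch under stated assumptions, not ZFS code: the names `block_t`, `pool_init`, `txg_commit` and `read_leaf` are hypothetical, the tree has only one level of indirection, and the single assignment to `active_root` stands in for ZFS writing a new uberblock. Old blocks are never overwritten, so the previously committed tree stays readable until the commit point.

```c
#include <assert.h>

/* Toy model (hypothetical, not ZFS code): a "disk" of blocks that are
 * never overwritten once written. */
#define NBLOCKS 64
#define FANOUT 4

typedef struct {
    int child[FANOUT];   /* indirect block: addresses of children, -1 = hole */
    int data;            /* leaf block payload */
} block_t;

static block_t disk[NBLOCKS];
static int next_free = 0;      /* append-only allocator */
static int active_root = -1;   /* stand-in for the uberblock's root pointer */

static int alloc_write(const block_t *b) {
    assert(next_free < NBLOCKS);
    disk[next_free] = *b;      /* every write goes to a fresh location */
    return next_free++;
}

static void pool_init(void) {
    block_t root;
    for (int i = 0; i < FANOUT; i++) root.child[i] = -1;
    root.data = 0;
    next_free = 0;
    active_root = alloc_write(&root);
}

/* One transaction group: all updates modify an in-memory copy of the root,
 * new leaves go to new locations, and the new root is written last. */
static void txg_commit(const int *slots, const int *values, int n) {
    block_t new_root = disk[active_root];          /* dirty in-memory copy */
    for (int i = 0; i < n; i++) {
        block_t leaf = { .data = values[i] };
        for (int j = 0; j < FANOUT; j++) leaf.child[j] = -1;
        new_root.child[slots[i]] = alloc_write(&leaf);
    }
    /* Commit point: until this assignment, readers still see the old tree. */
    active_root = alloc_write(&new_root);
}

static int read_leaf(int root, int slot) {
    int c = disk[root].child[slot];
    return c < 0 ? -1 : disk[c].data;
}
```

Because the old root is still intact after a commit, a crash before the commit point simply leaves the previous consistent tree in place; that is the property the slide summarizes as "always on-disk consistent".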
ZFS Structures
The entire file system is represented as:
• Objects: dnode_phys_t
• Object sets: arrays of dnode_phys_t (dnode_phys_t [ ])
Programming-language analogy: each object is a template, and its bonus buffer describes the object's specific attributes.
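A minimal sketch of the "object set = array of dnodes" idea, assuming a heavily simplified layout: the field names echo `dn_type`, `dn_bonustype`, `dn_bonuslen` and `dn_bonus` in the real `dnode_phys_t`, but the sizes are illustrative, and the `toy_*` types and helper functions are hypothetical. An object number is just an index into the array, and the bonus buffer holds type-specific attributes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins (hypothetical; not the real on-disk layout). */
enum obj_type { OT_NONE = 0, OT_PLAIN_FILE, OT_DIRECTORY };

typedef struct {
    uint8_t  type;        /* what kind of object this dnode describes */
    uint8_t  bonustype;   /* how to interpret the bonus buffer */
    uint16_t bonuslen;    /* bytes of the bonus buffer in use */
    uint8_t  bonus[64];   /* type-specific attributes */
} toy_dnode_t;

typedef struct {          /* hypothetical file attributes kept in the bonus buffer */
    uint64_t size;
    uint32_t mode;
} toy_file_attrs_t;

#define OBJSET_MAX 16
typedef struct {
    toy_dnode_t dnodes[OBJSET_MAX];   /* object number == index into this array */
} toy_objset_t;

/* Instantiate the "template": mark dnode objnum as a file and fill its bonus buffer. */
static void set_file(toy_objset_t *os, uint64_t objnum, uint64_t size, uint32_t mode) {
    assert(objnum < OBJSET_MAX);
    toy_dnode_t *dn = &os->dnodes[objnum];
    toy_file_attrs_t attrs = { size, mode };
    dn->type = OT_PLAIN_FILE;
    dn->bonustype = OT_PLAIN_FILE;
    dn->bonuslen = sizeof attrs;
    memcpy(dn->bonus, &attrs, sizeof attrs);
}

/* Read the size attribute back out of the bonus buffer; -1 if not a file. */
static int get_file_size(const toy_objset_t *os, uint64_t objnum, uint64_t *size) {
    if (objnum >= OBJSET_MAX || os->dnodes[objnum].type != OT_PLAIN_FILE) return -1;
    toy_file_attrs_t attrs;
    memcpy(&attrs, os->dnodes[objnum].bonus, sizeof attrs);
    *size = attrs.size;
    return 0;
}
```

Every dnode has the same generic shape; only the bonus buffer's interpretation changes with the object type, which is what the template analogy refers to.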
ZFS Structures: Blocks and Block Pointers
• Data is transferred to disk in blocks.
• Block pointers (blkptr_t) are used to locate, verify and describe blocks.
  - A block pointer contains checksum and compression information.
  - A block's physical size may differ from its logical size.
  - Gang blocks: a large block stored as several smaller fragments when contiguous space is unavailable.
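To illustrate why the checksum lives in the block pointer rather than in the block itself, here is a minimal sketch. The `toy_blkptr_t` struct is a hypothetical simplification (the real `blkptr_t` packs DVAs, sizes, type, compression and a 256-bit checksum into 128 bytes), and `fletcher4` follows the Fletcher-4 accumulation pattern over 32-bit words without claiming byte-for-byte compatibility with ZFS. The parent records the checksum at write time; on read, the data is accepted only if it matches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified block pointer (hypothetical layout). */
typedef struct {
    uint64_t offset;     /* location on the (single) vdev */
    uint32_t psize;      /* physical size: bytes actually stored on disk */
    uint32_t lsize;      /* logical size: bytes after decompression */
    uint64_t cksum[4];   /* checksum of the psize bytes on disk */
} toy_blkptr_t;

/* Fletcher-4 style checksum over native 32-bit words; len must be a multiple of 4. */
static void fletcher4(const void *buf, size_t len, uint64_t out[4]) {
    const uint8_t *p = buf;
    uint64_t a = 0, b = 0, c = 0, d = 0;
    assert(len % 4 == 0);
    for (size_t i = 0; i < len; i += 4) {
        uint32_t w;
        memcpy(&w, p + i, sizeof w);
        a += w; b += a; c += b; d += c;
    }
    out[0] = a; out[1] = b; out[2] = c; out[3] = d;
}

/* Writer side: the parent's pointer records location, sizes and checksum. */
static void bp_fill(toy_blkptr_t *bp, uint64_t off, const void *data,
                    uint32_t psize, uint32_t lsize) {
    bp->offset = off;
    bp->psize = psize;
    bp->lsize = lsize;
    fletcher4(data, psize, bp->cksum);
}

/* Reader side: returns 0 if the on-disk bytes match the parent's checksum,
 * -1 (a checksum error) otherwise. */
static int bp_verify(const toy_blkptr_t *bp, const void *data) {
    uint64_t actual[4];
    fletcher4(data, bp->psize, actual);
    return memcmp(actual, bp->cksum, sizeof actual) == 0 ? 0 : -1;
}
```

The checksum covers `psize` bytes (what is on disk), so verification happens before any decompression back to `lsize` bytes.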
ZFS Structures: Block Pointers
• Data Virtual Address (DVA): a combination of fields in blkptr_t (vdev, asize, offset) that locates a block on disk.
• Wideness: a blkptr_t can store up to three DVAs, each pointing to a copy of the same data. These copies are called "ditto blocks".
  - Three for pool-wide metadata
  - Two for file-system-wide metadata
  - One for data (configurable)
[Figure: blkptr_t layout: DVA 1 (vdev 1, asize, offset 1), DVA 2 (vdev 2, asize, offset 2), DVA 3 (vdev 3, asize, offset 3), followed by lvl, typ, cksum, comp, psize, lsize]
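The 3-2-1 wideness rule and the read path over ditto blocks can be sketched as follows. Assumptions: a single toy vdev, copies written to adjacent blocks purely for brevity (ZFS tries to place ditto copies far apart), a 64-bit FNV-1a hash as a stand-in for the real checksum, and hypothetical names throughout. One checksum in the parent covers every copy, so a reader can try each DVA in turn and use the first copy that verifies.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLK 16      /* toy block size */
#define NSECT 32    /* toy disk with NSECT blocks */

typedef struct { uint32_t vdev; uint64_t offset; } toy_dva_t;  /* asize omitted */
typedef struct {
    int ndvas;          /* wideness: number of copies */
    toy_dva_t dva[3];
    uint64_t cksum;     /* one checksum covers every copy */
} toy_bp_t;

enum category { POOL_META, FS_META, USER_DATA };

static uint8_t disk[NSECT][BLK];

/* Default wideness from the slide: 3 / 2 / 1, with data copies configurable. */
static int wideness(enum category c, int data_copies) {
    switch (c) {
    case POOL_META: return 3;
    case FS_META:   return 2;
    default:        return data_copies;
    }
}

/* FNV-1a stand-in for the real checksum (fletcher / SHA-256). */
static uint64_t toy_cksum(const uint8_t *p) {
    uint64_t h = 1469598103934665603ULL;
    for (int i = 0; i < BLK; i++) { h ^= p[i]; h *= 1099511628211ULL; }
    return h;
}

/* Write n copies starting at block `first` and record one checksum for all. */
static void ditto_write(toy_bp_t *bp, const uint8_t *data, int n, uint64_t first) {
    assert(n >= 1 && n <= 3 && first + (uint64_t)n <= NSECT);
    bp->ndvas = n;
    for (int i = 0; i < n; i++) {
        bp->dva[i].vdev = 0;              /* single vdev, as in the experiments */
        bp->dva[i].offset = first + i;
        memcpy(disk[first + i], data, BLK);
    }
    bp->cksum = toy_cksum(data);
}

/* Try each copy; return the index of the first valid one, or -1 if all are bad. */
static int ditto_read(const toy_bp_t *bp, uint8_t *out) {
    for (int i = 0; i < bp->ndvas; i++) {
        const uint8_t *copy = disk[bp->dva[i].offset];
        if (toy_cksum(copy) == bp->cksum) { memcpy(out, copy, BLK); return i; }
    }
    return -1;
}
```

Note that `ditto_read` only recovers: it returns a good copy but leaves the bad one on disk, which matches the "recovery yes, correction no" pattern in the results.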
ZFS Structures: Wideness
[Figure: wideness (ditto blocks)]
ZFS Structures: Attributes on Disk
ZAP (ZFS Attribute Processor): ZAP objects handle arbitrary (name, object) associations within an object set (objset).
• Most commonly used to implement directories.
• Also used extensively throughout the DSL (Dataset and Snapshot Layer).
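A directory is the most common ZAP use: a name maps to an object number. The sketch below is a flat table loosely modeled on a small ("micro") ZAP with a 50-byte name field; the real fat ZAP uses hashing, and `toy_zap_t`, `zap_add` and `zap_lookup` are hypothetical names, not the ZFS API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ZAP_NAME_LEN 50
#define ZAP_MAX 8

typedef struct {
    uint64_t value;               /* e.g. object number of a directory entry */
    char     name[ZAP_NAME_LEN];  /* NUL-terminated key */
} toy_zap_ent_t;

typedef struct {
    toy_zap_ent_t ent[ZAP_MAX];
    int n;                        /* entries in use */
} toy_zap_t;

/* Add a (name, value) pair; fails if the name is too long, already present,
 * or the table is full. */
static int zap_add(toy_zap_t *z, const char *name, uint64_t value) {
    if (strlen(name) >= ZAP_NAME_LEN || z->n == ZAP_MAX) return -1;
    for (int i = 0; i < z->n; i++)
        if (strcmp(z->ent[i].name, name) == 0) return -1;   /* names are unique */
    strcpy(z->ent[z->n].name, name);
    z->ent[z->n].value = value;
    z->n++;
    return 0;
}

/* Look up a name; on success store its value (an object number) in *value. */
static int zap_lookup(const toy_zap_t *z, const char *name, uint64_t *value) {
    for (int i = 0; i < z->n; i++)
        if (strcmp(z->ent[i].name, name) == 0) { *value = z->ent[i].value; return 0; }
    return -1;   /* a real file system would report ENOENT */
}
```

The same (name, number) shape serves the DSL as well, for example a dataset directory's child map from child names to directory objects.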
Putting It All Together: Objects
• Everything in ZFS is an object.
• A dnode describes and organizes a collection of blocks making up an object.
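How a dnode organizes its blocks can be made concrete by computing which block pointer to follow at each level of indirection for a given data block. This sketch assumes the real constants that a blkptr_t is 128 bytes (so an indirect block of size 2^indblkshift holds 2^(indblkshift - 7) pointers) and that a dnode holds up to three top-level pointers; the function `blkid_path` itself is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define BLKPTR_SHIFT 7   /* log2(sizeof(blkptr_t)) = log2(128) */

/* For data block `blkid` of an object with `nlevels` levels (level 0 = data),
 * fill path[0..nlevels-1] top-down: path[0] indexes the dnode's own block
 * pointer array, each later entry indexes an indirect block one level lower.
 * Returns -1 if the object would need another level of indirection. */
static int blkid_path(uint64_t blkid, int nlevels, int indblkshift,
                      int nblkptr, uint64_t *path) {
    int epbs = indblkshift - BLKPTR_SHIFT;      /* log2(pointers per indirect block) */
    assert(nlevels >= 1 && epbs > 0 && epbs * (nlevels - 1) < 64);
    for (int i = 0; i < nlevels; i++) {
        int level = nlevels - 1 - i;            /* level of the block this slot points to */
        uint64_t idx = blkid >> (epbs * level); /* which level-`level` block covers blkid */
        path[i] = (i == 0) ? idx : (idx & ((1ULL << epbs) - 1));
    }
    return path[0] < (uint64_t)nblkptr ? 0 : -1;
}
```

With 16 KB indirect blocks (indblkshift = 14), each indirect block holds 128 pointers, so data block 300 in a two-level object is reached through dnode slot 2 and then slot 44 of that indirect block.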
Putting It All Together: Object Sets
• Related objects are grouped to form objsets.
• File systems, volumes, clones and snapshots are objsets.
Putting It All Together: Datasets
• A dataset encapsulates an objset and provides:
  - Space usage (space map)
  - Snapshot information
Putting It All Together: Dataset Directories
• Group datasets.
• Hold properties such as quotas and compression.
• Record dataset relationships (child map).
[Figure: dataset directory with dataset properties and child map, pointing to a dataset with space map and snapshot information, which in turn points to an object set of objects]
A Road Less Travelled: From Vdev Label to Data
[Figure: path from the vdev label to file data]
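The first step on the road from vdev label to data is choosing which uberblock to trust. A minimal sketch, assuming the real uberblock magic value `0x00bab10c` and the rule that the active uberblock is the valid one with the highest txg; the `toy_uberblock_t` fields and the `cksum_ok` flag (standing in for actually verifying the embedded checksum) are hypothetical simplifications. From the chosen uberblock, the walk continues through the MOS, the DSL, the file system's object set and the directory ZAPs down to file blocks.

```c
#include <assert.h>
#include <stdint.h>

#define UBERBLOCK_MAGIC 0x00bab10cULL

typedef struct {
    uint64_t magic;
    uint64_t txg;       /* transaction group that wrote this uberblock */
    uint64_t rootbp;    /* stand-in for ub_rootbp: where the MOS lives */
    int      cksum_ok;  /* stand-in for verifying the embedded checksum */
} toy_uberblock_t;

/* Return the index of the valid uberblock with the highest txg, or -1.
 * A corrupt newest entry is skipped, so the pool falls back to an
 * older, still-consistent txg. */
static int select_uberblock(const toy_uberblock_t *ub, int n) {
    int best = -1;
    for (int i = 0; i < n; i++) {
        if (ub[i].magic != UBERBLOCK_MAGIC || !ub[i].cksum_ok) continue;
        if (best < 0 || ub[i].txg > ub[best].txg) best = i;
    }
    return best;
}
```

Falling back to an older txg is safe only because of copy-on-write: the blocks that the older uberblock points to were never overwritten.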
To Sum Up: Moving Forward
• Layers of indirection
• End-to-end checksums, stored separately from the data they cover
• Wideness (ditto blocks: 3-2-1)
• Compression
• Copy-on-write
• Scrub facility
Experimental Setup: Corruption Framework
• Corrupter (driver): modifies physical disk blocks.
• Analyzer app: understands on-disk ZFS structures.
• Consumer app: monitors ZFS responses and error codes.
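The corrupter's role can be modeled in user space as a layer stacked on a lower read routine. This is a sketch under assumptions, not the project's driver: the real corrupter was a kernel layered driver on Solaris, whereas here `media`, `disk_read`, `policy` and `layered_read` are hypothetical, and only reads are intercepted. The analyzer would set `policy.blkno` after locating a target structure; the buffer is corrupted in flight, so the medium stays intact and the fault is transient.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BS 512
#define NB 16
static uint8_t media[NB][BS];   /* the "physical disk" */

/* Lower driver: plain read from the medium. */
static int disk_read(uint64_t blkno, uint8_t *buf) {
    if (blkno >= NB) return -1;
    memcpy(buf, media[blkno], BS);
    return 0;
}

/* Corruption policy, set by the analyzer once it has found a target block. */
static struct {
    uint64_t blkno;
    int      armed;
    uint8_t  xor_mask;
} policy;

/* Layered driver: forwards the read, then corrupts the returned buffer if it
 * is the targeted block. The medium itself is never modified. */
static int layered_read(uint64_t blkno, uint8_t *buf) {
    int rc = disk_read(blkno, buf);
    if (rc == 0 && policy.armed && blkno == policy.blkno)
        for (int i = 0; i < BS; i++) buf[i] ^= policy.xor_mask;
    return rc;
}
```

The consumer side would then issue file-system operations and record whether ZFS detects the fault, recovers from a replica, or returns an error.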
Experimental Setup: Simplifications
• Setup on a Solaris 10 VM.
• Only one physical vdev (disk): no striping, mirroring or RAID.
• Initial target: pointer corruption
  - Reduced sample space
  - Interesting cases
• Disable compression as much as possible.
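Initial Finding
• All metadata is compressed, and metadata compression cannot be disabled.
• Pointer corruption is therefore not feasible: block pointers sit inside compressed metadata blocks, so a single pointer field cannot be corrupted in isolation.
• Instead, perform corruptions on compressed objects.
  - This is representative of the effects of disk faults on ZFS.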
Corruption Experiments
TYPE: type-aware object corruptions
TARGET (targeted on-disk objects):
• Vdev labels [@Pool]
• Uberblocks [@Pool]
• Object sets:
  - Meta Object Set (MOS) [@Pool]
  - myfs object set [@FS]
  - Within each: objset_phys_t (describing the object set), indirect blkptr objects, object array
• ZIL [@FS]
• File data [@FS]
• Directory data [@FS]
Results
Target              | Detection      | Recovery          | Correction
vdev label          | Yes (checksum) | Yes (replica)     | No (relies on COW)
Uberblock           | Yes (checksum) | Yes (replica)     | No (relies on COW)
MOS object set      | Yes (checksum) | Yes (replica)     | No (relies on COW)
FS object           | Yes (checksum) | Yes (replica)     | No (relies on COW)
FS indirect objects | Yes (checksum) | Yes (replica)     | No (relies on COW)
FS object set       | Yes (checksum) | Yes (replica)     | No (relies on COW)
ZIL                 | Yes (checksum) | No                | No
Directory data      | Yes (checksum) | No (configurable) |
File data           | Yes (checksum) | No (configurable) |
Summary (Using the IRON Taxonomy)
• Detection: checksums in parent blkptrs
• Recovery: replication in parent blkptrs (ditto blocks)
Conclusion
• Integrating the file system and the volume manager saves an additional translation layer.
• Using one generic block pointer for both checksums and replication forms a Merkle tree, which provides robustness.
• Replication and compression are viable in a commodity file system.
• COW can be used effectively.
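Why checksums in parent pointers amount to a Merkle tree: each parent stores its children's checksums, and the parent is itself checksummed by its own parent, so one trusted root checksum authenticates everything below it. A minimal two-level sketch, assuming an FNV-1a hash as a stand-in for ZFS's checksums and hypothetical names (`toy_tree_t`, `tree_seal`, `tree_verify`):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LEAVES 4
#define LEAF_SZ 8

/* FNV-1a stand-in for fletcher / SHA-256. */
static uint64_t h64(const void *p, size_t n) {
    const uint8_t *b = p;
    uint64_t h = 1469598103934665603ULL;
    for (size_t i = 0; i < n; i++) { h ^= b[i]; h *= 1099511628211ULL; }
    return h;
}

typedef struct {
    uint8_t  leaf[LEAVES][LEAF_SZ];  /* data blocks */
    uint64_t ind[LEAVES];            /* indirect block: checksums of the leaves */
    uint64_t root;                   /* kept by the "uberblock": checksum of ind */
} toy_tree_t;

/* Compute checksums bottom-up, as a write would. */
static void tree_seal(toy_tree_t *t) {
    for (int i = 0; i < LEAVES; i++) t->ind[i] = h64(t->leaf[i], LEAF_SZ);
    t->root = h64(t->ind, sizeof t->ind);
}

/* Verify top-down, as a read would: -1 = all valid, -2 = indirect block
 * damaged, otherwise the index of the first corrupt leaf. */
static int tree_verify(const toy_tree_t *t) {
    if (h64(t->ind, sizeof t->ind) != t->root) return -2;
    for (int i = 0; i < LEAVES; i++)
        if (h64(t->leaf[i], LEAF_SZ) != t->ind[i]) return i;
    return -1;
}
```

Because each block's checksum is stored in its parent and never next to the block itself, a block that is corrupted together with its own stored checksum still fails verification against the parent.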
Observations/Questions
• No correction of ditto blocks: ZFS relies on COW.
  - Consecutive failures (n = wideness) without a transaction group commit?
  - Snapshot corruption?
• Explicit scrubbing corrects ditto blocks in place.
  - Potential for corruption?
• Space/performance hit due to redundancy/compression.
  - A 2% hit in terms of space/IO? (Banham & Nash)
  - No page cache; ZFS uses the ARC (Adaptive Replacement Cache).
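The scrub question concerns in-place repair. A minimal sketch of what scrubbing one block pointer involves, assuming a toy single-vdev disk, an FNV-1a stand-in checksum and hypothetical names (`toy_bp_t`, `scrub_bp`): find a copy that verifies, then overwrite every bad copy with it. Unlike the COW write path, this rewrites live blocks in place, which is the window the slide asks about.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLK 16
#define NSECT 32
static uint8_t disk[NSECT][BLK];

typedef struct { int ndvas; uint64_t off[3]; uint64_t cksum; } toy_bp_t;

/* FNV-1a stand-in for the real checksum. */
static uint64_t toy_cksum(const uint8_t *p) {
    uint64_t h = 1469598103934665603ULL;
    for (int i = 0; i < BLK; i++) { h ^= p[i]; h *= 1099511628211ULL; }
    return h;
}

/* Scrub one block pointer: returns the number of copies repaired in place,
 * or -1 if no copy verifies (nothing to repair from). */
static int scrub_bp(const toy_bp_t *bp) {
    int good = -1, repaired = 0;
    for (int i = 0; i < bp->ndvas && good < 0; i++)
        if (toy_cksum(disk[bp->off[i]]) == bp->cksum) good = i;
    if (good < 0) return -1;
    for (int i = 0; i < bp->ndvas; i++) {
        if (toy_cksum(disk[bp->off[i]]) != bp->cksum) {
            /* In-place overwrite of a live block: not copy-on-write. */
            memcpy(disk[bp->off[i]], disk[bp->off[good]], BLK);
            repaired++;
        }
    }
    return repaired;
}
```

A second scrub of the same pointer finds nothing to repair, while a block whose copies are all bad can only be reported, not corrected.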
Future Work
• Snapshot corruptions
• Multiple-device configurations
  - Striping
  - Mirroring
  - RAID-Z