ANATOMY OF LINUX JOURNALING FILE SYSTEMS
M. Tim Jones, Emulex

Overview
• Paper surveys past and current Linux journaling file systems
• Presents three modes of operation
  – Writeback mode
  – Ordered mode
  – Data mode
• Discusses tail packing

History
• IBM's JFS
  – First released in 1990
  – Updated since then (JFS2)
• Silicon Graphics' XFS
  – Released in 1994
  – Ported to Linux in 2001

History
• Smart FS
  – Developed for the Amiga
  – Supported by Linux until 2005
• ext3fs
  – Most commonly used
  – Extension of ext2 with journaling
  – Supported by Linux since 2001

History
• Reiser File System
  – Introduced many new features
  – Its author is now serving a sentence of 15 years to life for second-degree homicide
    • Killed his estranged wife
    • Plea-bargained his first-degree homicide conviction

Variations on Journaling
• Writeback mode:
  – Only journals metadata
  – Makes no guarantee that data updates will be written to disk before the associated metadata are marked as committed
• Ordered mode:
  – Makes that guarantee
• Data mode:
  – Journals both data and metadata
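The ordering rules behind the three modes can be shown with a minimal Python sketch. It assumes a toy update that touches one data block and one i-node. The labels `journal:*` (a write to the log), `home:*` (a write to the block's final location) and `journal:commit` are invented for illustration. Real implementations batch many updates into one transaction and are freer to reorder writes.

```python
from enum import Enum


class Mode(Enum):
    WRITEBACK = "writeback"
    ORDERED = "ordered"
    DATA = "data"


def write_order(mode: Mode) -> list[str]:
    """Return one possible order of disk writes for a single file update.

    Toy model (hypothetical labels): "journal:*" goes to the log, "home:*" goes
    to the block's final location, "journal:commit" marks the transaction done.
    """
    if mode is Mode.WRITEBACK:
        # Only metadata are journaled; nothing forces the data write to happen
        # before the commit, so this shows the risky ordering.
        return ["journal:metadata", "journal:commit", "home:metadata", "home:data"]
    if mode is Mode.ORDERED:
        # Still metadata-only journaling, but data reach their home location
        # before the commit record is written.
        return ["home:data", "journal:metadata", "journal:commit", "home:metadata"]
    # Data mode: data and metadata are both logged first, then copied home.
    return ["journal:data", "journal:metadata", "journal:commit",
            "home:data", "home:metadata"]


for m in Mode:
    print(m.value, write_order(m))
```

In every mode, the home copy of the metadata is written only after the commit. The modes differ only in where the data write falls relative to that commit.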

Writeback mode issues
• Metadata can be marked as committed before the data they point to are written to disk
  – File system can be corrupted if the system crashes
    • After some metadata are marked as committed
    • Before the data they point to are written to disk

Writeback mode issues
[Figure: a committed i-node points to block B; due to a crash, block B was never written to disk]
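The failure in this figure can be replayed with a small simulation. It assumes that the crash happens after an arbitrary prefix of a fixed write sequence, and that recovery replays every committed transaction. The sequences and the `dangling_after_crash` helper are hypothetical. They only encode the ordering rule that each mode enforces.

```python
# Hypothetical write sequences for one update (one data block, one i-node).
WRITEBACK = ["journal:metadata", "journal:commit", "home:metadata", "home:data"]
ORDERED = ["home:data", "journal:metadata", "journal:commit", "home:metadata"]


def dangling_after_crash(writes: list[str], completed: int) -> bool:
    """Crash after the first `completed` writes; report whether recovery leaves
    a committed i-node pointing at a data block that was never written."""
    done = writes[:completed]
    # Recovery replays committed metadata from the journal, so a committed
    # i-node survives the crash even if its home copy was never written.
    committed = "journal:commit" in done
    data_written = "home:data" in done or "journal:data" in done
    return committed and not data_written


for name, seq in (("writeback", WRITEBACK), ("ordered", ORDERED)):
    unsafe = [n for n in range(len(seq) + 1) if dangling_after_crash(seq, n)]
    print(name, "unsafe crash points:", unsafe)
```

In this model, writeback mode is unsafe after two or three completed writes, which is the situation in the figure. Ordered mode has no unsafe crash point.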

Data mode issues
• Most reliable
• Slowest:
  – All data must be written twice: once to the journal, then to their final location
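A rough cost model makes the "written twice" point concrete. It assumes that every mode writes metadata once to the journal and once to their home location, and it ignores commit records and the batching of several updates into one transaction. `bytes_written` is a hypothetical helper, not a measurement of any real file system.

```python
def bytes_written(data_bytes: int, metadata_bytes: int, mode: str) -> int:
    """Rough bytes written for one update (ignores commit records and batching)."""
    # Every mode journals metadata: one copy in the journal, one at home.
    total = 2 * metadata_bytes
    if mode == "data":
        # Data mode also sends the file data through the journal first.
        total += 2 * data_bytes
    elif mode in ("ordered", "writeback"):
        # Data go straight to their home location.
        total += data_bytes
    else:
        raise ValueError(f"unknown mode: {mode}")
    return total


MiB, KiB = 1 << 20, 1 << 10
for mode in ("writeback", "ordered", "data"):
    print(mode, bytes_written(1 * MiB, 4 * KiB, mode))
```

For a 1 MiB write, data mode writes roughly twice as many bytes as the other two modes, because the data cost dominates.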

JFS2
• Supports
  – Ordered journaling
  – Extent-based allocation:
    • Allocates contiguous sets of blocks
      – Better read and write performance
      – Metadata are updated once per extent rather than once per block
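The following sketch shows why extents shrink metadata. It compares a per-block map (one pointer per data block, as in a block-mapped file system such as ext2) with a list of (start, length) extents. The block numbers are invented for illustration.

```python
def block_map_entries(block_numbers: list[int]) -> int:
    """A per-block map needs one pointer per data block."""
    return len(block_numbers)


def to_extents(block_numbers: list[int]) -> list[tuple[int, int]]:
    """Collapse physical block numbers, in file order, into (start, length) extents."""
    extents: list[tuple[int, int]] = []
    for block in block_numbers:
        if extents and extents[-1][0] + extents[-1][1] == block:
            start, length = extents[-1]
            extents[-1] = (start, length + 1)  # block continues the current run
        else:
            extents.append((block, 1))  # gap in the layout: start a new extent
    return extents


contiguous = list(range(1000, 1256))  # 256 blocks laid out back to back
print(block_map_entries(contiguous), to_extents(contiguous))
print(to_extents([5, 6, 7, 20, 21]))  # two runs -> two extents
```

A contiguous 256-block file needs 256 map entries but only a single extent. Fragmentation shows up directly as extra extents.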

JFS2
• Uses B+ trees for
  – Fast directory lookups
  – Managing extent descriptors
• Has no internal journal commit policy
  – Relies on timeouts of the kupdate daemon
    • Daemon that periodically writes modified buffers to disk
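A kupdate-like daemon can be sketched as a cache that records when each buffer was first dirtied. On each periodic wake-up, it writes out the buffers that are older than an age threshold. The 30-second threshold and the `BufferCache` class are assumptions for illustration, not kernel code. A file system with no commit policy of its own, like JFS2 here, inherits this timing.

```python
from dataclasses import dataclass, field


@dataclass
class Buffer:
    block: int
    dirtied_at: float  # seconds; time of the first unflushed modification


@dataclass
class BufferCache:
    max_age: float = 30.0  # assumed age threshold in seconds (illustrative)
    dirty: dict[int, Buffer] = field(default_factory=dict)

    def modify(self, block: int, now: float) -> None:
        # Re-dirtying a dirty buffer keeps its original timestamp, so a block
        # that is updated constantly still gets written out eventually.
        if block not in self.dirty:
            self.dirty[block] = Buffer(block, now)

    def periodic_flush(self, now: float) -> list[int]:
        """One wake-up of a kupdate-like daemon: write buffers older than max_age."""
        expired = sorted(b for b, buf in self.dirty.items()
                         if now - buf.dirtied_at >= self.max_age)
        for b in expired:
            del self.dirty[b]  # a real kernel would submit the disk write here
        return expired


cache = BufferCache()
cache.modify(1, now=0.0)
cache.modify(2, now=20.0)
for t in (25.0, 30.0, 50.0):  # times at which the daemon wakes up
    print(t, cache.periodic_flush(now=t))
```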

XFS
• Supports full 64-bit addressing
• Uses B+ trees for both directories and file allocation
• Uses extent-based allocation with variable block size support (512 B to 64 KB)
• Uses delayed allocation for extents
  – Extent is not allocated until blocks are ready to be written on disk
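Delayed allocation can be sketched as a file that only counts cached blocks on each write. It asks the allocator for space at flush time, when its size is known. The bump allocator and the `DelayedAllocFile` class are hypothetical stand-ins; XFS's real allocator is far more elaborate.

```python
from typing import Callable


def make_bump_allocator(first_free: int = 0) -> Callable[[int], int]:
    """Hypothetical allocator: hands out the next n free blocks in order."""
    next_free = first_free

    def allocate(n: int) -> int:
        nonlocal next_free
        start = next_free
        next_free += n
        return start

    return allocate


class DelayedAllocFile:
    """Toy file that defers block allocation until its cached data are flushed."""

    def __init__(self, allocate: Callable[[int], int]):
        self.allocate = allocate
        self.pending_blocks = 0  # cached data with no disk location yet
        self.extents: list[tuple[int, int]] = []

    def write_blocks(self, n: int) -> None:
        # No disk blocks are chosen here; we only count what is waiting.
        self.pending_blocks += n

    def flush(self) -> None:
        # The amount of pending data is known now, so one contiguous extent is allocated.
        if self.pending_blocks:
            start = self.allocate(self.pending_blocks)
            self.extents.append((start, self.pending_blocks))
            self.pending_blocks = 0


alloc = make_bump_allocator(100)
a, b = DelayedAllocFile(alloc), DelayedAllocFile(alloc)
for _ in range(4):  # two files written in small, interleaved pieces
    a.write_blocks(1)
    b.write_blocks(1)
a.flush()
b.flush()
print(a.extents, b.extents)
```

With eager, per-write allocation, the same interleaved writes would have alternated blocks between the two files. Deferring the decision gives each file one contiguous extent.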

Extent-based allocation
• When a process creates a file, the file system allocates a set of contiguous physical blocks to the file
  – Improves access times for large files
  – Reduces file fragmentation
• Large files can occupy multiple extents
  – ext4 extents can go up to 128 MB with a 4 KB block size
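The 128 MB figure follows from ext4 limiting an initialized extent to 32,768 blocks: 32,768 × 4 KB = 128 MB. The sketch below splits a long contiguous run into maximal extents. It assumes the free space really is contiguous; `split_into_extents` is a hypothetical helper.

```python
BLOCK_SIZE = 4096          # bytes (4 KB blocks, as on the slide)
MAX_EXTENT_BLOCKS = 32768  # longest initialized ext4 extent, in blocks


def split_into_extents(start_block: int, n_blocks: int) -> list[tuple[int, int]]:
    """Describe a contiguous run of blocks with as few maximal extents as possible."""
    extents = []
    while n_blocks > 0:
        length = min(n_blocks, MAX_EXTENT_BLOCKS)
        extents.append((start_block, length))
        start_block += length
        n_blocks -= length
    return extents


print(MAX_EXTENT_BLOCKS * BLOCK_SIZE // (1 << 20), "MiB per extent")
file_blocks = 300 * (1 << 20) // BLOCK_SIZE  # a hypothetical 300 MiB file
print(split_into_extents(0, file_blocks))
```

Even a perfectly contiguous 300 MiB file therefore needs three extents.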

Ext3fs
• Compatible with the non-journaling ext2 FS
• Supports three journaling modes
  – Writeback
  – Ordered
  – Journal (data)
• Does not support extents
  – Not as fast as JFS, XFS and ReiserFS

A parenthesis: Ext2
• Essentially analogous to the UNIX fast file system we have discussed
  – Fifteen block addresses per i-node
  – Cylinder groups are called block groups
• Major differences include
  – Larger maximum file size: 16 GB to 2 TB, depending on the block size
  – Various extensions
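The 16 GB to 2 TB range can be reproduced from the fifteen block addresses (twelve direct, then single, double and triple indirect) plus one more limit. ext2's `i_blocks` field counts 512-byte sectors in 32 bits, which caps any file at 2 TiB. The sketch below assumes 32-bit block numbers and gives an upper bound only. In a real file system, `i_blocks` also counts indirect blocks, so the true maximum is slightly lower.

```python
def ext2_max_file_size(block_size: int) -> int:
    """Approximate upper bound on ext2 file size (bytes) for a given block size."""
    ptrs = block_size // 4  # 32-bit block numbers per indirect block
    # 12 direct pointers + single, double and triple indirect = 15 addresses.
    blocks = 12 + ptrs + ptrs**2 + ptrs**3
    by_pointers = blocks * block_size
    # i_blocks counts 512-byte sectors in a 32-bit field: at most 2 TiB.
    by_i_blocks = 2**32 * 512
    return min(by_pointers, by_i_blocks)


for bs in (1024, 2048, 4096):
    print(bs, round(ext2_max_file_size(bs) / 2**30, 1), "GiB")
```

With 1 KB blocks, the pointer tree is the limit (about 16 GiB). With 4 KB blocks, the tree could address about 4 TiB, so the `i_blocks` cap of 2 TiB wins.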

ReiserFS
• Introduced in 2001
• Now dead
• Default mode is ordered
• Includes tail packing
  – Uses the empty space at the end of files' last blocks
  – Reduces internal fragmentation

Tail packing
• Also known as tail merging
• Tail here refers to the last block of a file
  – Rarely full
• Tail packing stores in the same block
  – Tails of several files
  – Very small files
• Reduces internal fragmentation
• Adds complexity
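This sketch counts blocks with and without tail packing. It assumes 4 KB blocks and uses hypothetical file sizes. The packing strategy (first-fit decreasing) is an illustrative choice, not ReiserFS's actual placement algorithm.

```python
BLOCK_SIZE = 4096  # assumed block size in bytes


def blocks_without_packing(sizes: list[int]) -> int:
    # Every file is rounded up to whole blocks, so each tail gets its own block.
    return sum(-(-size // BLOCK_SIZE) for size in sizes)


def blocks_with_packing(sizes: list[int]) -> int:
    full_blocks = sum(size // BLOCK_SIZE for size in sizes)
    tails = sorted((size % BLOCK_SIZE for size in sizes if size % BLOCK_SIZE),
                   reverse=True)
    free: list[int] = []  # bytes still free in each shared tail block
    for tail in tails:  # first-fit decreasing: place the largest tails first
        for i, room in enumerate(free):
            if tail <= room:
                free[i] -= tail
                break
        else:
            free.append(BLOCK_SIZE - tail)  # no room anywhere: open a new block
    return full_blocks + len(free)


files = [5000, 1000, 300]  # hypothetical sizes in bytes
print(blocks_without_packing(files), blocks_with_packing(files))
```

Here three files drop from four blocks to two: the 904-byte tail of the large file and the two small files share one block.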

Without tail packing
[Figure: files A, B and C, each ending in its own partly filled block; too much wasted space]

With tail packing
[Figure: files A, B and C with their tails packed; one file now occupies a single block, and another shares the last block of file A]

Reiser4
• Was designed from scratch
• Was to use
  – Wandering logs
  – Delayed allocation of extents
    • As in XFS

Ext4fs (I)
• Evolution from ext3fs
  – Can mount an ext4fs partition as ext3fs or an ext3fs partition as ext4fs
• 64-bit file system
  – 48-bit block addresses
• Can support very large volumes
  – One exabyte, that is, 2^30 gigabytes!
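The exabyte figure is the number of addressable blocks times the block size. A quick check, assuming 4 KB blocks:

```python
ADDRESS_BITS = 48   # ext4 block-address width, from the slide
BLOCK_SIZE = 2**12  # assumed 4 KiB blocks

max_volume = 2**ADDRESS_BITS * BLOCK_SIZE  # 2^48 blocks * 2^12 bytes = 2^60 bytes
print(max_volume == 2**60)                 # True: one exbibyte
print(max_volume // 2**30 == 2**30)        # True: 2^30 GiB
```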

Ext4fs (II)
• Can support extents
  – Once extents are used, the file system is no longer compatible with ext3fs
• Uses delayed extent allocation
  – Reduces file fragmentation
    • Especially when a file grows
• Checksums the contents of the journal
  – More reliable

Ext4fs (III)
• Uses H-trees instead of B+ or B* trees for indexes
• Includes an online defragmenting tool
  – e4defrag
  – Can defragment individual files or entire file systems
• Timestamps have a resolution of one nanosecond
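An H-tree indexes directory entries by a hash of the file name rather than by the name itself. The one-level sketch below assumes a stand-in hash (the first four bytes of MD5, not ext4's own hash functions such as half-MD4) and a made-up leaf capacity of four entries. `HashedDirectory` is a hypothetical class.

```python
import bisect
import hashlib


def name_hash(name: str) -> int:
    # Stand-in 32-bit hash: first 4 bytes of MD5. This is NOT ext4's hash.
    return int.from_bytes(hashlib.md5(name.encode()).digest()[:4], "big")


class HashedDirectory:
    """One-level hashed index: sorted leaf 'blocks', keyed by their lowest hash."""

    def __init__(self, names: list[str], leaf_capacity: int = 4):
        entries = sorted((name_hash(n), n) for n in names)
        self.leaves = [entries[i:i + leaf_capacity]
                       for i in range(0, len(entries), leaf_capacity)]
        self.bounds = [leaf[0][0] for leaf in self.leaves]

    def lookup(self, name: str) -> bool:
        h = name_hash(name)
        # Binary-search the index for the last leaf whose lowest hash is <= h.
        i = bisect.bisect_right(self.bounds, h) - 1
        while i >= 0:
            leaf = self.leaves[i]
            if (h, name) in leaf:
                return True
            if leaf[0][0] < h:
                return False  # earlier leaves hold only smaller hashes
            i -= 1  # equal hashes may spill over into the previous leaf
        return False


d = HashedDirectory([f"file{i}" for i in range(50)])
print(len(d.leaves), d.lookup("file7"), d.lookup("missing"))
```

A lookup reads the small index and then one leaf, rather than scanning every directory entry.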

Conclusions
• Journaling file systems
  – Protect the file system against computer crashes and power failures
  – Allow faster file system recovery after a crash
    • No need to fsck the whole file system
  – Have become the new standard
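Faster recovery comes from replaying only the journal instead of scanning the whole file system. The sketch below assumes a hypothetical record format in which each write carries a transaction id and a separate record marks the commit. `recover` is an illustrative helper, not any real file system's recovery code.

```python
def recover(journal: list[tuple], disk: dict[int, str]) -> dict[int, str]:
    """Replay committed transactions from a write-ahead journal onto `disk`.

    Hypothetical records: ("write", txid, block, value) or ("commit", txid).
    Writes from transactions that never committed are discarded.
    """
    committed = {rec[1] for rec in journal if rec[0] == "commit"}
    for rec in journal:
        if rec[0] == "write" and rec[1] in committed:
            _, _, block, value = rec
            disk[block] = value
    return disk


journal = [
    ("write", 1, 10, "inode v2"),
    ("write", 1, 11, "bitmap v2"),
    ("commit", 1),
    ("write", 2, 12, "inode v3"),  # crash hit before transaction 2 committed
]
print(recover(journal, {10: "inode v1", 11: "bitmap v1", 12: "inode v2"}))
```

The work done is proportional to the length of the journal, not to the size of the disk. Transaction 2 is dropped as a whole, so block 12 keeps its old, consistent value.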