CENG 334 Introduction to Operating Systems
Filesystems: Case Studies
Topics: FAT, UNIX V7
Erol Sahin, Dept. of Computer Eng., Middle East Technical University, Ankara, TURKEY

FAT (MS-DOS) Filesystems
• Used by MS-DOS, Windows 98, and Windows ME.
• Still supported in Windows NT, XP, and Vista.
• Its use has shifted towards embedded devices such as digital cameras, MP3 players, and the iPod (where it is the default filesystem).

Directories
• Filenames are limited to 8+3 characters (an 8-character name and a 3-character extension). Shorter names are left-justified and padded with spaces.
• Attributes: read-only, archive, hidden, system.
• Time is stored in 2 bytes, so it is accurate only to ±2 seconds.
• Date is stored in three fields: day (5 bits), month (4 bits), and year since 1980 (7 bits). The year field overflows after 2107, which gives a Y2108 problem.
• File size: 4 bytes (32 bits). The theoretical limit is 4 GB, but other factors limit it to 2 GB.
• 10 bytes are reserved for future use.
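The packed date and time words can be decoded with shifts and masks. This Python sketch assumes the usual MS-DOS bit layout (date: year-since-1980 in bits 15-9, month in bits 8-5, day in bits 4-0; time: hours in bits 15-11, minutes in bits 10-5, seconds/2 in bits 4-0); the function names are hypothetical. It also shows where the Y2108 problem comes from: 1980 + 127 = 2107 is the last representable year.

```python
def decode_fat_date(word: int) -> tuple[int, int, int]:
    """Unpack a 16-bit FAT date into (year, month, day)."""
    year = ((word >> 9) & 0x7F) + 1980   # 7 bits: years since 1980
    month = (word >> 5) & 0x0F           # 4 bits
    day = word & 0x1F                    # 5 bits
    return year, month, day


def decode_fat_time(word: int) -> tuple[int, int, int]:
    """Unpack a 16-bit FAT time into (hours, minutes, seconds)."""
    hours = (word >> 11) & 0x1F          # 5 bits
    minutes = (word >> 5) & 0x3F         # 6 bits
    seconds = (word & 0x1F) * 2          # 5 bits hold seconds/2: 2-second resolution
    return hours, minutes, seconds


def encode_fat_date(year: int, month: int, day: int) -> int:
    """Pack (year, month, day) into 16 bits; years after 2107 do not fit."""
    if not 1980 <= year <= 1980 + 0x7F:
        raise ValueError("year not representable in 7 bits (the Y2108 problem)")
    return ((year - 1980) << 9) | (month << 5) | day
```

For example, `decode_fat_date(encode_fat_date(2008, 3, 15))` gives back `(2008, 3, 15)`, while an odd number of seconds such as 31 cannot be stored exactly.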

File Allocation Table
• MS-DOS keeps track of a file's blocks through the FAT (File Allocation Table), which is held in memory.
• The first block number (2 bytes) in the directory entry is used as an index into the FAT, which has 64K entries.
• Block size can be set to a multiple of 512 bytes.
• There are three versions of FAT, depending on the number of bits a disk address contains:
  • FAT-12
  • FAT-16
  • FAT-32 (which should really be called FAT-28, since only 28 bits of each address are used)
• The FAT is also used for keeping track of free blocks.
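Reading a file means starting at the first block number from its directory entry and repeatedly indexing the FAT until an end-of-chain marker appears. A minimal Python sketch, assuming the FAT is a list in memory where 0 marks a free block and a hypothetical `EOF = -1` ends a chain (real FAT variants reserve specific marker values instead):

```python
FREE = 0   # entry value for an unused block
EOF = -1   # hypothetical end-of-chain marker for this sketch


def file_blocks(fat: list[int], first_block: int) -> list[int]:
    """Follow a file's chain through the FAT, starting at its first block."""
    blocks: list[int] = []
    seen: set[int] = set()
    block = first_block
    while block != EOF:
        if block in seen or fat[block] == FREE:
            raise ValueError(f"corrupt chain at block {block}")  # loop or free block
        seen.add(block)
        blocks.append(block)
        block = fat[block]          # the entry holds the next block number
    return blocks


def free_blocks(fat: list[int]) -> list[int]:
    """The same table doubles as the free list: free entries hold FREE."""
    return [b for b, entry in enumerate(fat) if entry == FREE]
```

With `fat = [FREE, 4, FREE, 1, EOF, FREE]`, a file whose directory entry says "first block 3" occupies blocks 3, 1, and 4, in that order.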

FAT-12
• Block size: 512 bytes.
• Maximum partition size: (2^12 - 10) × 512 bytes ≈ 2 MB, because 10 disk addresses are used as special markers (end of file, bad block, etc.).
• FAT table size in memory: 4096 entries of 2 bytes each.
• Worked well for floppy disks.
• The limit was extended by using larger block sizes (1 KB, 2 KB, 4 KB), supporting partitions of up to 16 MB.
• Still too limited for hard disks.
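The FAT-12 partition-size arithmetic above can be checked in a few lines of Python (sizes printed in MiB, i.e. 2^20 bytes; the function name is hypothetical):

```python
def fat12_max_partition(block_size: int) -> int:
    """Largest FAT-12 partition in bytes: 2**12 addresses minus 10 reserved markers."""
    usable_addresses = 2**12 - 10
    return usable_addresses * block_size


for size in (512, 1024, 2048, 4096):
    print(f"{size:5d}-byte blocks -> {fat12_max_partition(size) / 2**20:.1f} MiB")
```

The 4 KB case works out to 4086 × 4096 = 16,736,256 bytes, which is the "16 MB" quoted above.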

The MS-DOS File System (2)
Figure 4-32. Maximum partition size for different block sizes. The empty boxes represent forbidden combinations.
Tanenbaum, Modern Operating Systems 3e, (c) 2008 Prentice-Hall, Inc. All rights reserved. 0-13-6006639

The UNIX V7 File System (1)
A UNIX V7 directory entry. (Tanenbaum, Modern Operating Systems 3e)
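A V7 directory entry is 16 bytes: a 2-byte i-node number followed by a 14-byte file name, padded with NUL bytes when shorter. The Python sketch below parses a directory block with `struct`, assuming little-endian byte order (as on the PDP-11) and treating i-node number 0 as an unused slot; the helper names are hypothetical.

```python
import struct

DIRENT = struct.Struct("<H14s")  # 2-byte i-node number, 14-byte name


def parse_directory(block: bytes) -> dict[str, int]:
    """Map names to i-node numbers for every used 16-byte entry in a block."""
    entries = {}
    for offset in range(0, len(block) - DIRENT.size + 1, DIRENT.size):
        inode, raw_name = DIRENT.unpack_from(block, offset)
        if inode == 0:                      # free slot (e.g. a deleted file)
            continue
        entries[raw_name.rstrip(b"\0").decode("ascii")] = inode
    return entries


def make_entry(name: str, inode: int) -> bytes:
    """Build one entry; struct pads short names with NULs and truncates long ones."""
    return DIRENT.pack(inode, name.encode("ascii"))
```

Because the name field is fixed at 14 bytes, V7 file names could not be longer than 14 characters.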

The UNIX V7 File System (2)
A UNIX i-node. (Tanenbaum, Modern Operating Systems 3e)
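The i-node holds the addresses of the first few blocks directly and reaches the rest through single, double, and triple indirect blocks. This Python sketch classifies a logical block number, assuming 10 direct addresses and, as an illustrative choice, 1 KB blocks holding 256 four-byte addresses; these parameters are assumptions rather than V7's exact on-disk format.

```python
DIRECT = 10        # direct block addresses stored in the i-node
PER_BLOCK = 256    # addresses per indirect block (assumed 1 KB block / 4-byte address)


def locate(n: int) -> tuple[str, list[int]]:
    """Return which i-node pointer reaches logical block n, plus the index path."""
    if n < DIRECT:
        return "direct", [n]
    n -= DIRECT
    if n < PER_BLOCK:
        return "single", [n]
    n -= PER_BLOCK
    if n < PER_BLOCK**2:
        return "double", [n // PER_BLOCK, n % PER_BLOCK]
    n -= PER_BLOCK**2
    if n < PER_BLOCK**3:
        return "triple", [n // PER_BLOCK**2, (n // PER_BLOCK) % PER_BLOCK, n % PER_BLOCK]
    raise ValueError("beyond the maximum file size")


# Largest file, in blocks, under these assumptions.
MAX_BLOCKS = DIRECT + PER_BLOCK + PER_BLOCK**2 + PER_BLOCK**3
```

Small files never touch an indirect block, so most reads need no extra disk access beyond the i-node itself.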

The UNIX V7 File System (3)
Figure 4-35. The steps in looking up /usr/ast/mbox. (Tanenbaum, Modern Operating Systems 3e)
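The lookup walks the path one component at a time: fetch the current directory's i-node, read its directory contents, find the next component, and repeat. A Python sketch over a toy in-memory filesystem; the i-node numbers and the root i-node number are illustrative assumptions.

```python
# Toy filesystem: directory i-node -> {name: i-node}. Plain files have no entry.
DIRECTORIES = {
    1: {".": 1, "..": 1, "usr": 6},
    6: {".": 6, "..": 1, "ast": 26},
    26: {".": 26, "..": 6, "mbox": 60},
}
ROOT_INODE = 1  # assumed i-node number of the root directory


def lookup(path: str) -> int:
    """Resolve an absolute path to an i-node number, one component at a time."""
    if not path.startswith("/"):
        raise ValueError("this sketch only handles absolute paths")
    inode = ROOT_INODE
    for component in path.split("/"):
        if component == "":
            continue                        # leading or doubled slash
        entries = DIRECTORIES.get(inode)
        if entries is None:
            raise NotADirectoryError(path)  # tried to descend into a plain file
        if component not in entries:
            raise FileNotFoundError(path)
        inode = entries[component]
    return inode
```

Since "." and ".." are ordinary directory entries here, a path such as `/usr/ast/../ast/mbox` resolves with no special-case code.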

The ISO 9660 File System
The ISO 9660 directory entry. (Tanenbaum, Modern Operating Systems 3e)

Rock Ridge Extensions
Rock Ridge extension fields:
• PX - POSIX attributes.
• PN - Major and minor device numbers.
• SL - Symbolic link.
• NM - Alternative name.
• CL - Child location.
• PL - Parent location.
• RE - Relocation.
• TF - Time stamps.
(Tanenbaum, Modern Operating Systems 3e)

Joliet Extensions
Joliet extension fields:
• Long file names.
• Unicode character set.
• Directory nesting deeper than eight levels.
• Directory names with extensions.
(Tanenbaum, Modern Operating Systems 3e)

File System Mounting
• A file system must be mounted before it can be accessed.
• An unmounted file system is mounted at a mount point.
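Once file systems are mounted, each path must be attributed to one of them. A simple way to sketch this is to pick the longest mount point that is a prefix of the path. The mount table and device names below are hypothetical, and real kernels resolve mount points during component-by-component lookup rather than by string matching.

```python
# Hypothetical mount table: mount point -> device holding that file system.
MOUNTS = {
    "/": "disk0",           # the root file system
    "/home": "disk1",
    "/media/cdrom": "cd0",
}


def resolve(path: str) -> tuple[str, str]:
    """Return (device, path relative to that device's root) for an absolute path."""
    best = "/"
    for point in MOUNTS:
        if point == "/":
            continue
        covers = path == point or path.startswith(point + "/")
        if covers and len(point) > len(best):
            best = point                    # a deeper mount point hides the one above
    relative = path if best == "/" else path[len(best):]
    return MOUNTS[best], relative or "/"
```

Matching on `point + "/"` keeps `/homework` on the root file system even though it begins with the string `/home`.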

Virtual File Systems
• VFS: another layer of abstraction.
• Upper interface: for processes, implementing the POSIX interface.
• Lower interface: for the concrete file systems.
• The VFS translates POSIX calls into calls to the file systems under it.

VFS (2)
• At boot time, the root filesystem is registered with the VFS.
• When other filesystems are mounted, they must also register with the VFS.
• When a filesystem registers, it provides a list of the addresses of the functions the VFS requires, such as reading a block.
• After registration, when a process opens a file, e.g. open("/usr/include/unistd.h", O_RDONLY), the VFS creates a v-node and calls the concrete filesystem to return all the information needed.
• The new v-node also contains pointers to the function table of the concrete filesystem on which the file resides.
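The registration and dispatch steps can be sketched in Python, with a dictionary of callables standing in for the table of function addresses. The operation names (`lookup`, `read`), the `RamFS` class, and the `VNode` fields are hypothetical; a kernel VFS would use structures of function pointers instead.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VNode:
    path: str
    ops: dict[str, Callable]   # function table supplied by the concrete FS
    handle: object             # FS-specific information (e.g. an i-node number)


class VFS:
    def __init__(self):
        self.mounted: dict[str, dict[str, Callable]] = {}

    def register(self, mount_point: str, ops: dict[str, Callable]) -> None:
        """A concrete file system hands over its function table when mounted."""
        self.mounted[mount_point] = ops

    def open(self, path: str) -> VNode:
        # Choose the file system with the longest matching mount point.
        point = max((p for p in self.mounted
                     if path == p or path.startswith(p.rstrip("/") + "/")), key=len)
        ops = self.mounted[point]
        handle = ops["lookup"](path[len(point.rstrip("/")):] or "/")
        return VNode(path, ops, handle)

    def read(self, vnode: VNode, offset: int, count: int) -> bytes:
        # The VFS does not know how to read; it calls through the v-node's table.
        return vnode.ops["read"](vnode.handle, offset, count)


class RamFS:
    """A toy concrete file system that keeps whole files in a dict."""

    def __init__(self, files: dict[str, bytes]):
        self.files = files

    def ops(self) -> dict[str, Callable]:
        return {"lookup": self.lookup, "read": self.read}

    def lookup(self, path: str) -> str:
        if path not in self.files:
            raise FileNotFoundError(path)
        return path

    def read(self, handle: str, offset: int, count: int) -> bytes:
        return self.files[handle][offset:offset + count]
```

Usage: after `vfs.register("/", root.ops())` and `vfs.register("/usr", usr.ops())`, opening `/usr/include/unistd.h` produces a v-node whose reads are routed to the file system mounted at `/usr`.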

Virtual File Systems (2)
A simplified view of the data structures and code used by the VFS and a concrete file system to do a read.

Filesystem Corruption and Backups

Filesystem Corruption
What happens if the system crashes while you are making changes to a filesystem?
• Example: modifying block 5 of a large directory, adding lots of new file entries.
  • The system crashes while the block is being written.
  • The new files are "lost"!
• On reboot, the system runs the fsck program.
  • It scans through the entire filesystem and locates corrupted i-nodes and directories.
  • It can typically find the bad directory, but may not be able to repair it: the directory could have been left in any state during the write.
• fsck can take a very long time on large filesystems.
• And there is no guarantee that it fixes the problems anyway.

Example: removing a file
Removing a file requires three steps:
1. Remove the file from its directory.
2. Release the i-node to the pool of free i-nodes.
3. Return all the disk blocks to the pool of free disk blocks.
In the absence of crashes, the order of these steps does not matter. In the presence of crashes, however, it does!

Example: removing a file (crash after removing the directory entry)
If the system crashes right after the file is removed from its directory: the i-node and file blocks are no longer accessible from any file, yet they are not available for reassignment.

Example: removing a file (crash after releasing the i-node)
If the i-node is released first and the system crashes before the directory entry is removed: the directory entry points to an invalid i-node or, if the i-node has been reassigned, to a different file. The blocks of the file are not available for reassignment.

Example: removing a file (crash after returning the blocks)
If the disk blocks are returned first and the system crashes before the other steps: the file points to free blocks or, once those blocks are reassigned, shares them with the files they were given to.
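The crash scenarios above can be reproduced with a small simulation: perform only one step of the removal, "crash", and inspect what is left. The toy state (one file `a.txt` with i-node 7 owning blocks 20 and 21) and all function names are hypothetical; the checker only reports the anomalies described on the slides.

```python
def fresh_state() -> dict:
    """Toy file system: file a.txt, i-node 7, owning blocks 20 and 21."""
    return {
        "directory": {"a.txt": 7},
        "inode_allocated": {7: True},
        "inode_blocks": {7: [20, 21]},
        "free_blocks": {22, 23},
    }


def remove_entry(s: dict) -> None:      # step 1
    del s["directory"]["a.txt"]


def release_inode(s: dict) -> None:     # step 2
    s["inode_allocated"][7] = False


def return_blocks(s: dict) -> None:     # step 3
    s["free_blocks"].update(s["inode_blocks"][7])


def problems(s: dict) -> list[str]:
    """Report the inconsistencies a crash in mid-removal can leave behind."""
    found = []
    referenced = set(s["directory"].values())
    for inode, allocated in s["inode_allocated"].items():
        blocks = set(s["inode_blocks"][inode])
        if allocated and inode not in referenced:
            found.append(f"i-node {inode} allocated but unreachable")
        if not allocated and inode in referenced:
            found.append(f"directory entry points to free i-node {inode}")
        if inode in referenced and blocks & s["free_blocks"]:
            found.append(f"file with i-node {inode} uses free blocks")
        if not allocated and not blocks <= s["free_blocks"]:
            found.append(f"blocks of i-node {inode} are lost")
    return found


def crash_after(*steps) -> list[str]:
    """Run the given steps, then 'crash' and check the state."""
    state = fresh_state()
    for step in steps:
        step(state)
    return problems(state)
```

Running all three steps leaves no problems; running any single step first leaves exactly the inconsistency its slide describes.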

File System Consistency
• A consistency check is typically run after a crash.
  • UNIX: fsck
  • Windows: scandisk
• It uses the redundant information in the filesystem to check the blocks and the files.
• Checking blocks: build two tables, each with one counter per block, initialized to 0.
  • Blocks in use: how many times each block is present in a file.
    • Read all the i-nodes using the raw device (not through the filesystem calls).
    • For each block referenced in an i-node, increment that block's in-use counter by one.
  • Free blocks:
    • Examine the free-block list or the free-block bitmap.
    • Each appearance of a block as free increments its free counter by one.

File System Consistency (2)
File system states: (a) Consistent. (b) Missing block. (c) Duplicate block in free list. (d) Duplicate data block. (Tanenbaum, Modern Operating Systems 3e)

Missing block
• Harmless, but wastes space.
• Action: add the missing block to the free list.
(Tanenbaum, Modern Operating Systems 3e)

Duplicate block in free list
• Can only occur with a linked-list representation of the free blocks; a bitmap representation does not have this problem.
• Action: rebuild the free list.
(Tanenbaum, Modern Operating Systems 3e)

Duplicate data block
• The worst thing that can happen! A data block appears in two different files.
• Action:
  • Allocate a free block.
  • Copy the contents of the duplicated block into the new block.
  • Change the links so that each file refers to its own copy.
  • The contents of at least one of the files are almost certainly garbled, but the filesystem is made consistent.
  • The user is informed.
(Tanenbaum, Modern Operating Systems 3e)
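The block check and the repair actions above can be sketched together in Python. The sketch assumes the i-nodes have already been read from the raw device into a mapping from i-node number to block list, and it models the disk as a list of byte strings; these data layouts and all function names are hypothetical. Its repair rebuilds the free list from the in-use counters, which covers both "add the missing block" and "rebuild the free list".

```python
def block_counters(n_blocks: int, inode_blocks: dict[int, list[int]],
                   free_list: list[int]) -> tuple[list[int], list[int]]:
    """Build the two tables: in-use and free counters, one entry per block."""
    in_use = [0] * n_blocks
    free = [0] * n_blocks
    for blocks in inode_blocks.values():
        for b in blocks:              # every block reference in every i-node
            in_use[b] += 1
    for b in free_list:               # every appearance in the free list
        free[b] += 1
    return in_use, free


def classify(used: int, free: int) -> str:
    if used + free == 1:
        return "consistent"
    if used == 0 and free == 0:
        return "missing block"
    if used == 0:
        return "duplicate block in free list"
    if free == 0:
        return "duplicate data block"
    return "both in use and free"     # a case not among the figure's four states


def repair(n_blocks: int, inode_blocks: dict[int, list[int]],
           disk: list[bytes]) -> list[int]:
    """Give each file its own copy of shared blocks; return a rebuilt free list."""
    in_use, _ = block_counters(n_blocks, inode_blocks, [])
    new_free = [b for b in range(n_blocks) if in_use[b] == 0]
    seen: set[int] = set()
    for inode in sorted(inode_blocks):
        blocks = inode_blocks[inode]
        for i, b in enumerate(blocks):
            if b not in seen:
                seen.add(b)
                continue
            if not new_free:
                raise RuntimeError("no free block left to resolve a duplicate")
            new_block = new_free.pop(0)      # allocate a free block
            disk[new_block] = disk[b]        # copy the contents
            blocks[i] = new_block            # this file now uses its own copy
            print(f"i-node {inode}: shared block {b} copied to block {new_block}")
    return new_free
```

The `print` stands in for informing the user; which of the two files ends up garbled depends on what was in the shared block.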

Consistency check for directories
• Uses a table of counters per file (rather than per block).
• Starts from the root and traverses the directory tree.
  • For each directory entry, it increments the counter of the i-node the entry refers to.
  • Remember that, due to hard links, a file can appear in more than one directory.
• It then compares the link count stored in each i-node with its counter.
• If the link count > counter:
  • Even after the file is removed from all of its directories, its link count will not reach zero, so the file will continue to exist.
  • Solution: set the link count to the counter value.
• If the counter > link count:
  • Although the file is linked from, say, two directories, removing it from one could drop the link count to zero and delete the i-node, leaving the other entry pointing to an invalid i-node.
  • Solution: set the link count to the counter value.
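The directory check can be sketched as a tree traversal that counts, for each file i-node, how many directory entries refer to it, and then forces each stored link count to that value. In this Python sketch the directory representation, the root i-node number, and the function name are assumptions, and only file i-nodes are checked (directory link counts, which also involve "." and "..", are left out for simplicity).

```python
def check_link_counts(directories: dict[int, dict[str, int]],
                      link_counts: dict[int, int], root: int = 1) -> dict[int, int]:
    """Count directory entries per file i-node and correct stored link counts.

    `directories` maps a directory's i-node to its entries (name -> i-node);
    any i-node not in `directories` is treated as a plain file.
    Returns {i-node: corrected link count} for every i-node that was wrong.
    """
    counter = {inode: 0 for inode in link_counts}
    stack, visited = [root], {root}
    while stack:
        for inode in directories[stack.pop()].values():
            if inode in directories:          # subdirectory: visit it once
                if inode not in visited:
                    visited.add(inode)
                    stack.append(inode)
            else:                             # file: one more hard link seen
                counter[inode] = counter.get(inode, 0) + 1
    fixes = {}
    for inode, seen in counter.items():
        if link_counts.get(inode) != seen:
            link_counts[inode] = seen         # link count := counter
            fixes[inode] = seen
    return fixes
```

An i-node whose corrected count is 0 is referenced by no directory, so a real checker would then free it.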

Journaling Filesystems
• Ensure that changes to the filesystem are made atomically.
  • That is, a group of changes is made all together, or not at all.
• In the directory modification example, this means that after the system reboots:
  • either the directory looks exactly as it did before the block was modified,
  • or it looks exactly as it did after the block was modified.
  • An FS entity (data block, i-node, directory, etc.) cannot be left in an intermediate state!
• Idea: maintain a log of all changes to the filesystem.
  • The log contains entries that indicate what was done, e.g., "Directory 2841 had i-nodes 404, 407, and 408 added to it".
• To make a filesystem change:
  1. Write an intent-to-commit record to the log.
  2. Write the appropriate changes to the log. Do not modify the filesystem data directly!
  3. Write a commit record to the log.
• This is essentially the same as the notion of database transactions.
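The three-step protocol can be sketched as appending records to a log. In this Python sketch the record names (`BEGIN`, `UPDATE`, `COMMIT`), the `crash_before_commit` flag, and the in-memory list standing in for the on-disk log are all assumptions; a real journal would also force each record to stable storage before continuing.

```python
from dataclasses import dataclass, field


@dataclass
class Journal:
    records: list[tuple] = field(default_factory=list)  # stands in for the on-disk log
    next_tx: int = 0

    def transaction(self, changes: dict[str, object],
                    crash_before_commit: bool = False) -> int:
        """Log a group of changes; the filesystem data itself is not touched."""
        tx = self.next_tx
        self.next_tx += 1
        self.records.append(("BEGIN", tx))                   # 1. intent to commit
        for key, value in changes.items():
            self.records.append(("UPDATE", tx, key, value))  # 2. the changes
        if crash_before_commit:
            return tx                                        # simulated crash
        self.records.append(("COMMIT", tx))                  # 3. commit record
        return tx
```

For the slide's example, `Journal().transaction({"dir 2841": [404, 407, 408]})` appends a BEGIN record, one UPDATE record, and a COMMIT record.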

Journaling FS Recovery
• What happens when the system crashes?
  • The filesystem data has not actually been modified, only the log!
  • So the FS itself reflects only what happened before the crash.
• Periodically synchronize the log with the filesystem data.
  • This is called a checkpoint.
  • It ensures that the FS data reflects all of the changes in the log.
• No need to scan the entire filesystem after a crash: only the log entries since the last checkpoint need to be examined.
• For each log entry, check whether its commit record is there.
  • If it is, apply the logged changes to the filesystem.
  • If not, consider the changes incomplete and do not try to make them.
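Recovery can be sketched as a replay of the log from the last checkpoint: collect each transaction's updates and apply them only when its commit record is reached. The record format (`BEGIN`/`UPDATE`/`COMMIT` tuples), the dictionary standing in for filesystem data, and the assumption that the checkpoint falls on a transaction boundary are all hypothetical choices for this Python sketch.

```python
def recover(fs: dict, log: list[tuple], checkpoint: int) -> dict:
    """Replay committed transactions logged at or after index `checkpoint`.

    Records: ("BEGIN", tx), ("UPDATE", tx, key, value), ("COMMIT", tx).
    `fs` already reflects everything before the checkpoint.
    """
    pending: dict[int, list[tuple[str, object]]] = {}
    for record in log[checkpoint:]:
        kind, tx = record[0], record[1]
        if kind == "BEGIN":
            pending[tx] = []
        elif kind == "UPDATE":
            pending[tx].append((record[2], record[3]))
        elif kind == "COMMIT":
            for key, value in pending.pop(tx):   # committed: apply its changes
                fs[key] = value
    # Whatever is still pending never committed, so it is simply ignored.
    return fs
```

Because each UPDATE records a final value rather than a delta, replaying the same committed transaction twice leaves the filesystem unchanged, so a crash during recovery is harmless.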

Journaling FS Example
(Figure: File 1, File 2, the checkpoint, and the log.)

Journaling FS Example
• The filesystem reflects changes up to the last checkpoint.
• fsck scans the change log from the last checkpoint forward.
• It does not find a commit record, so the changes are simply ignored.

File System Backups (1)
Backups are generally made to handle one of two potential problems:
• Recovering from disaster.
• Recovering from stupidity (e.g., accidentally deleted files); trash bins also help here.
(Tanenbaum, Modern Operating Systems 3e)

Backup what?
For storage efficiency, one can choose not to back up:
• Executable files (since they can usually be restored from the manufacturer's CD-ROMs).
• Temporary files (/tmp).
• Special files, which correspond to I/O devices (/dev).
(Tanenbaum, Modern Operating Systems 3e)

Backup Issues
• Full backup: back up the entire filesystem, providing a snapshot of the system at that point that can be fully restored.
  • Typically done weekly or monthly.
• Incremental backup: back up only the files that have changed since the most recent backup.
  • Typically, a weekly full backup is followed by daily incremental backups.
  • Smaller backup size, and faster.
• Compressed backups:
  • Reduce storage.
  • Less robust: a single bad byte can ruin the whole backup.
• Backing up an active filesystem is tricky, since files and directories are being modified during the backup.
• Backups make the system less secure:
  • Each backup tape/disk needs to be as well protected as the server itself.
  • Even the most secure computer system is of no use if the backup tapes are left lying around.
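Choosing what goes into an incremental backup comes down to comparing each file's modification time with the time of the previous backup. A Python sketch using the standard `os.walk` and `os.lstat`, assuming modification times can be trusted; real backup tools keep more state than a single timestamp, and the function name is hypothetical.

```python
import os
import stat
import tempfile


def changed_since(root: str, last_backup: float) -> list[str]:
    """Regular files under root modified after last_backup (epoch seconds)."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)                 # lstat: do not follow symlinks
            if stat.S_ISREG(st.st_mode) and st.st_mtime > last_backup:
                changed.append(path)
    return sorted(changed)


if __name__ == "__main__":
    # Demo: two files, only one modified after the (pretend) last backup.
    with tempfile.TemporaryDirectory() as d:
        old, new = os.path.join(d, "old.txt"), os.path.join(d, "new.txt")
        for path in (old, new):
            open(path, "w").close()
        os.utime(old, (1000, 1000))             # (atime, mtime) in epoch seconds
        os.utime(new, (3000, 3000))
        print(changed_since(d, last_backup=2000))
```

Skipping non-regular files with `S_ISREG` also keeps device files and named pipes out of the backup.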

Dumping strategies: physical dump
• Algorithm: start at block 0 of the disk and copy all the data onto a tape/disk.
• Pros:
  • Simple to implement in a bug-free way.
  • Fast.
• Cons:
  • Backs up the free blocks as well: backing up an empty disk takes as much storage/time as backing up a full disk.
  • Dumping of "bad blocks" is a concern.
    • Typically, disk controllers replace bad blocks transparently, without the OS even knowing about it.
    • If a block goes bad after formatting, the OS typically creates a "file" consisting of all the bad blocks, and the dump program must avoid reading them.
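A physical dump is little more than a loop over block numbers. This Python sketch models the disk as a list of byte strings and assumes the set of bad blocks is known in advance; both functions are hypothetical. Because bad blocks are skipped, each block is written with its number so the restore can put it back in place.

```python
def physical_dump(disk: list[bytes], bad_blocks: set[int]) -> list[tuple[int, bytes]]:
    """Copy every block from block 0 upward, skipping blocks known to be bad."""
    tape = []
    for number, data in enumerate(disk):
        if number in bad_blocks:
            continue                 # reading a bad block would fail
        tape.append((number, data))  # free blocks are copied too
    return tape


def restore(tape: list[tuple[int, bytes]], n_blocks: int) -> list[bytes]:
    """Write each dumped block back to its original position."""
    disk = [b""] * n_blocks
    for number, data in tape:
        disk[number] = data
    return disk
```

Note that the dump grows with the size of the disk, not with the amount of data on it.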

Dumping strategies: logical dump
• Pros:
  • Allows a single file to be restored.
  • If directories that lie on the path to the file being restored were deleted, they are restored as well.
• Cons:
  • Slow and complicated.

Dumping strategies: logical dump
• Algorithm:
  • Full backup: traverse the filesystem as a tree and create the same filesystem structure on the backup disk/tape.
  • Partial backup: all the directories on the path to a particular file need to be saved. For instance, backing up file 9 requires saving directories 1, 5, 6, and 7.
(Figure legend: squares are directories, circles are files. Shaded items have been modified since the last dump. Each directory and file is labeled by its i-node number.)
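Finding the directories a partial backup must include amounts to walking parent links up to the root. In this Python sketch the parent map is a hypothetical hierarchy chosen to match the slide's example (file 9 inside 7, inside 6, inside 5, inside the root 1).

```python
def directories_to_save(parent: dict[int, int], inode: int) -> list[int]:
    """I-node numbers of every directory on the path from the root to `inode`."""
    path = []
    while inode in parent:       # the root has no parent entry
        inode = parent[inode]
        path.append(inode)
    return path[::-1]            # root first, as they must be recreated
```

Listing the root first matters on restore: each directory must exist before anything inside it can be recreated.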

Dumping strategies: incremental logical dump
• Bitmaps indexed by i-node number are used.
• Phase 1: examine all files and directories below the starting directory (the root in this case), mark the modified files, and mark ALL the directories.

Dumping strategies: incremental logical dump
• Phase 2: recursively walk the tree again, and UNMARK all the directories that do not have any modified files under them.
• Note: 10 and 11 are unmarked; 5 and 6 remain marked.

Dumping strategies: incremental logical dump
• Phase 3: scan the i-nodes in numerical order and dump all the marked directories.

Dumping strategies: incremental logical dump
• Phase 4: dump the marked files.
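The four phases can be sketched in Python over an in-memory tree, with a set of i-node numbers standing in for the bitmap. The `Node` class and function name are hypothetical, and Phase 2 follows the slide's rule literally: a directory stays marked only if some modified file lies somewhere beneath it.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    inode: int
    is_dir: bool
    modified: bool = False
    children: list["Node"] = field(default_factory=list)


def incremental_dump(root: Node) -> tuple[list[int], list[int]]:
    """Return (directories, files) to dump, each in i-node order."""
    marked: set[int] = set()          # stands in for the bitmap
    nodes: dict[int, Node] = {}

    def phase1(node: Node) -> None:
        nodes[node.inode] = node
        if node.is_dir or node.modified:     # all directories, modified files
            marked.add(node.inode)
        for child in node.children:
            phase1(child)

    def phase2(node: Node) -> bool:
        """True if a modified file lies at or below node."""
        if not node.is_dir:
            return node.modified
        found = False
        for child in node.children:
            found = phase2(child) or found   # visit every child
        if not found:
            marked.discard(node.inode)       # nothing modified below: unmark
        return found

    phase1(root)
    phase2(root)
    # Phase 3: marked directories in i-node order; Phase 4: marked files.
    dirs = [i for i in sorted(marked) if nodes[i].is_dir]
    files = [i for i in sorted(marked) if not nodes[i].is_dir]
    return dirs, files
```

Dumping directories before files is what lets the restore recreate the directory skeleton first and then drop the files into it.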

Dumping strategies: incremental logical dump
• Restoring:
  • Restore all the directories that were backed up.
  • Then restore all the files.

Issues
• Links: if a file is linked from more than one directory, only one copy should be saved.
• Holes: in UNIX, some files, such as core files, may contain holes. Such a file is created by writing a few bytes, seeking to a distant file offset, and writing some more bytes. The empty blocks in between were never written and should not be dumped or stored. Core files typically have many megabytes of such empty blocks.
• Special files, such as named pipes (which can appear anywhere in the filesystem), should not be dumped.
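The link and special-file rules can be sketched as a filter over directory-walk results: a file is identified by its (device, i-node) pair, so later hard links to the same pair are skipped, and named pipes are dropped via `stat.S_ISFIFO`. The `Entry` tuple and function name are hypothetical; with real files, each entry could be built from `os.lstat` as `Entry(path, st.st_dev, st.st_ino, st.st_mode)`.

```python
import stat
from typing import NamedTuple


class Entry(NamedTuple):
    path: str
    dev: int
    ino: int
    mode: int


def select_for_dump(entries: list[Entry]) -> list[str]:
    """Paths to dump: each (device, i-node) only once, and no named pipes."""
    seen: set[tuple[int, int]] = set()
    selected = []
    for e in entries:
        if stat.S_ISFIFO(e.mode):
            continue                 # named pipe: no data worth saving
        key = (e.dev, e.ino)
        if key in seen:
            continue                 # another hard link to an already-saved file
        seen.add(key)
        selected.append(e.path)
    return selected
```

On restore, the skipped link names would still need to be recreated as links to the saved copy, so a real dump records them rather than discarding them entirely.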

Filesystem Caching
Topics: Disks

File System Caching
• Most filesystems cache significant amounts of disk data in memory.
  • e.g., Linux tries to use all "free" physical memory as a giant cache.
  • This avoids the huge overhead of going to disk for every I/O.
• Issues:
  • When do you commit a write to the disk?
  • What happens if you write only to the memory cache and then the system crashes?
  • How do you keep the memory and disk views of a file consistent?
  • What if the file metadata (i-nodes, etc.) is modified before the data blocks?
• Read-ahead:
  • Read a few extra blocks into memory when you do one read operation.
  • Amortizes the cost of the seek.
  • Useful if the blocks of a file are laid out contiguously.
  • Takes advantage of sequential access patterns on the file.

Caching
• Reading a 32-bit word from memory takes 10 nsec.
• Hard disks can transfer data at 100 MB/sec, that is, 40 nsec per 32-bit word, PLUS 5-10 msec of seek time!
• Caching aims to fill this gap: often thousands of blocks are kept in the cache.
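The numbers above can be worked through in Python. This sketch takes 1 MB = 10^6 bytes and uses the low end (5 msec) of the seek range; both are assumptions made for the arithmetic, not measurements.

```python
WORD = 4                          # bytes in a 32-bit word
MEMORY_NS_PER_WORD = 10           # from the slide
DISK_BYTES_PER_SEC = 100_000_000  # 100 MB/s, taking 1 MB = 10**6 bytes
SEEK_NS = 5_000_000               # 5 ms, low end of the slide's seek range

transfer_ns_per_word = WORD * 1_000_000_000 / DISK_BYTES_PER_SEC  # 40.0


def disk_read_ns(n_bytes: int) -> float:
    """One seek plus the transfer time for n_bytes."""
    return SEEK_NS + n_bytes * 1_000_000_000 / DISK_BYTES_PER_SEC


ratio = disk_read_ns(WORD) / MEMORY_NS_PER_WORD
print(f"one word from disk is about {ratio:,.0f}x slower than from memory")
print(f"a 4 KB block: {disk_read_ns(4096) / 1e6:.2f} ms, almost all of it seek")
```

The seek dominates so heavily that even a 4 KB block costs barely more than a single word, which is why caches work in whole blocks.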

Caching (1)
• Hash the device and disk address, and look up the result in a hash table.
• All the blocks with the same hash value are chained together in a linked list.
• In addition, a bidirectional list runs through all the blocks, implementing an LRU list.
(Tanenbaum, Modern Operating Systems 3e)
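In Python, `collections.OrderedDict` can stand in for both structures at once: it is a hash table keyed by (device, block number), and it keeps its entries in a doubly linked order that `move_to_end` and `popitem(last=False)` use for LRU. The class and the `read_block` callback are hypothetical; a kernel would implement the hash chains and LRU list by hand.

```python
from collections import OrderedDict


class BufferCache:
    """Block cache keyed by (device, block number) with LRU replacement."""

    def __init__(self, capacity: int, read_block):
        self.capacity = capacity
        self.read_block = read_block     # callback that fetches from the disk
        self.blocks = OrderedDict()      # (device, block) -> data, LRU first
        self.misses = 0

    def get(self, device: int, block: int) -> bytes:
        key = (device, block)
        if key in self.blocks:
            self.blocks.move_to_end(key)     # now the most recently used
            return self.blocks[key]
        self.misses += 1
        data = self.read_block(device, block)
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used
        self.blocks[key] = data
        return data
```

Usage: `BufferCache(1000, read_fn)`, where `read_fn(device, block)` performs the real disk read.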

Caching (2)
• Some blocks, such as i-node blocks, are rarely referenced twice within a short interval.
• Consider a modified LRU scheme that takes two factors into account:
  • Is the block likely to be needed again soon?
  • Is the block essential to the consistency of the file system?
• For both questions, blocks can be divided into categories:
  • i-node blocks
  • Indirect blocks
  • Directory blocks
  • Data blocks (full or partially full)
• Blocks that are essential for filesystem consistency should be written to disk immediately when modified: write-through caches.
• Forcing modified blocks to disk:
  • UNIX: sync (every 30 seconds)
  • Windows: in the past, none; recent versions: FlushFileBuffers
(Tanenbaum, Modern Operating Systems 3e)
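The write policy can be sketched as a cache that writes consistency-critical blocks through to the disk immediately and leaves ordinary data blocks dirty until the next `sync`. The category names, the `WriteCache` class, and the dictionary standing in for the disk are hypothetical; the periodic daemon is simulated by calling `sync()` explicitly.

```python
from dataclasses import dataclass, field

# Categories whose modification must reach the disk at once (write-through).
CRITICAL = {"inode", "indirect", "directory"}


@dataclass
class WriteCache:
    disk: dict[int, bytes] = field(default_factory=dict)   # simulated disk
    cache: dict[int, bytes] = field(default_factory=dict)
    dirty: set[int] = field(default_factory=set)

    def write(self, block: int, data: bytes, category: str) -> None:
        self.cache[block] = data
        if category in CRITICAL:
            self.disk[block] = data      # essential for consistency: write through
            self.dirty.discard(block)
        else:
            self.dirty.add(block)        # data block: write back later

    def sync(self) -> None:
        """Flush every dirty block, as a periodic sync would."""
        for block in sorted(self.dirty):
            self.disk[block] = self.cache[block]
        self.dirty.clear()
```

If the system crashes between syncs, only recently written data blocks are lost; the i-nodes, indirect blocks, and directories on disk remain consistent.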

Block read-ahead
• The system tries to get blocks into the cache before they are accessed, to increase the hit rate.
• Sequential access: performance improves.
• Random access: performance degrades (blocks are read but never used).
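One refinement, assumed here rather than taken from the slide, is to read ahead only while a file is being accessed sequentially, which avoids the random-access penalty. In this Python sketch the class name and the `read_block` callback are hypothetical; `misses` counts demand reads that had to wait for the disk, and `disk_reads` counts all disk reads, including prefetches.

```python
class ReadAheadFile:
    """Prefetch block k+1 after reading block k, but only during sequential access."""

    def __init__(self, n_blocks: int, read_block):
        self.n_blocks = n_blocks
        self.read_block = read_block   # callback standing in for a disk read
        self.cache: dict[int, bytes] = {}
        self.last = None               # previously requested block number
        self.misses = 0                # demand reads not found in the cache
        self.disk_reads = 0            # all disk reads, including prefetches

    def _fetch(self, k: int) -> None:
        if k not in self.cache:
            self.cache[k] = self.read_block(k)
            self.disk_reads += 1

    def read(self, k: int) -> bytes:
        if k not in self.cache:
            self.misses += 1
        self._fetch(k)
        sequential = self.last is not None and k == self.last + 1
        if sequential and k + 1 < self.n_blocks:
            self._fetch(k + 1)         # read ahead: expect block k + 1 next
        self.last = k
        return self.cache[k]
```

Reading an 8-block file front to back causes only 2 demand misses (blocks 0 and 1); a random pattern causes a miss on every read but triggers no wasted prefetches.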

Reducing Disk Arm Motion
Figure 4-29. (a) I-nodes placed at the start of the disk. (b) Disk divided into cylinder groups, each with its own blocks and i-nodes. (Tanenbaum, Modern Operating Systems 3e)