CPS 110: I/O and file systems
Landon Cox
April 8, 2008

Virtual/physical interfaces
  Applications
  OS
  Hardware

Multiple updates and reliability
  • Reliability is only an issue in file systems
    • Don't care about losing an address space after a crash
    • Your files shouldn't disappear after a crash
    • Files should be permanent
  • Multi-step updates cause problems
    • Can crash in the middle

Multi-step updates
  • Transfer $100 from Melissa's account to mine
    1. Deduct $100 from Melissa's account
    2. Add $100 to my account
  • Crash between 1 and 2, and we lose $100

Multiple updates and reliability
  • Seems obvious, right?
    • No modern OS would make this mistake, right?
  • Video evidence suggests otherwise
    • Directory with 3 files
    • Want to move them to an external drive
    • Drive "fails" during the move
    • Don't want to lose data due to the failure
  • Roll film …

The lesson?
  1. Building an OS is hard.
  2. OSes are software, not religions (one is not morally superior to another).
  3. "Fanboys are stupid, but you are not." (Anil Dash, dashes.com/anil)

Multi-step updates
  • Back to business …
  • Move a file from one directory to another
    1. Delete it from the old directory ("/home/lpcox/names")
    2. Add it to the new directory ("/home/chase/names")
  • Crash between 1 and 2, and we lose the file

Multi-step updates
  • Create an empty new file
    1. Point the directory to the new file header
    2. Create the new file header
  • What happens if we crash between 1 and 2?
    • The directory will point to an uninitialized header
    • The kernel will crash if you try to access it
  • How do we fix this?
    • Re-order the writes

Multi-step updates
  • Create an empty new file
    1. Create the new file header
    2. Point the directory to the new file header
  • What happens if we crash between 1 and 2?
    • The file doesn't exist
    • The file system won't point to garbage

Multi-step updates
  • What if we also have to update a map of free blocks?
    1. Create the new file header
    2. Point the directory to the new file header
    3. Update the free block map
  • Does this work?
    • Bad if we crash between 2 and 3
    • The free block map will still think the new file header's block is free

Multi-step updates
  • What if we also have to update a map of free blocks?
    1. Create the new file header
    2. Update the free block map
    3. Point the directory to the new file header
  • Does this work?
    • Better, but still bad if we crash between 2 and 3
    • Leads to a disk block leak
    • Could scan the disk after a crash to recompute the free map
      • Older versions of Unix and Windows do this
    • (now we have journaling file systems …)
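
The safer ordering above can be simulated in a few lines. This is a hypothetical in-memory sketch (the `create_file` helper, the header set, free map, and directory dict are all made up for illustration, not a real disk): crashing between steps 2 and 3 leaves a block that is marked used but referenced by no directory, i.e. a leak rather than a dangling pointer.

```python
def create_file(crash_after):
    """Apply the three writes in the safer order, stopping after
    `crash_after` steps to simulate a crash. Returns disk state."""
    headers = set()        # file headers written to disk
    free_map = {1: True}   # block 1 starts out free
    directory = {}         # file name -> header block #

    steps = [
        lambda: headers.add(1),                     # 1. create file header
        lambda: free_map.__setitem__(1, False),     # 2. mark block used
        lambda: directory.__setitem__("names", 1),  # 3. point directory at it
    ]
    for step in steps[:crash_after]:
        step()
    return headers, free_map, directory

# Crash between steps 2 and 3: block 1 is marked used but unreachable
# from any directory -- a disk block leak, but no dangling pointer.
headers, free_map, directory = create_file(crash_after=2)
leaked = (not free_map[1]) and 1 not in directory.values()
```

A post-crash scan, as on the slide, could walk every directory and recompute the free map, reclaiming the leaked block.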

Careful ordering is limited
  • Transfer $100 from Melissa's account to mine
    1. Deduct $100 from Melissa's account
    2. Add $100 to my account
  • Crash between 1 and 2, and we lose $100
  • Could reverse the ordering
    1. Add $100 to my account
    2. Deduct $100 from Melissa's account
  • Crash between 1 and 2, and we gain $100
  • What does this remind you of?
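
The symmetry above can be sketched directly. A hypothetical simulation (the balances and the `transfer` helper are invented for illustration): whichever write goes first, a crash between the two writes leaves the total wrong, which is why ordering alone cannot make the transfer atomic.

```python
def transfer(melissa, mine, order, crash_between):
    """Move $100 between accounts using the given write order;
    optionally 'crash' between the two writes."""
    if order == "deduct_first":
        melissa -= 100                 # write 1
        if not crash_between:
            mine += 100                # write 2
    else:  # "add_first"
        mine += 100                    # write 1
        if not crash_between:
            melissa -= 100             # write 2
    return melissa, mine

# Deduct first, crash in the middle: $100 vanishes.
m1, a1 = transfer(500, 0, "deduct_first", crash_between=True)
# Add first, crash in the middle: $100 appears from nowhere.
m2, a2 = transfer(500, 0, "add_first", crash_between=True)
```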

Atomic actions
  • Crashes are a lot like pre-emptions in a critical section
    • Race conditions allow threads to see data in an inconsistent state
    • Crashes allow the OS to see the file system in an inconsistent state
  • Want to make actions atomic
    • All operations are applied, or none are

Atomic actions
  • With threads
    • Built larger atomic operations (lock, wait, signal)
    • Using atomic hardware operations (test-and-set, interrupt enable/disable)
  • Same idea for persistent storage
    • Transactions

Transactions
  • Fundamental to databases
    • (except MySQL, until recently)
  • Several important properties: "ACID" (atomic, consistent, isolated, durable)
    • We only care about atomicity (all or nothing)
  • A transaction groups a sequence of disk writes:
      BEGIN
      disk write 1
      …
      disk write n
      END
    • Ending the transaction is called "committing" it

Transactions
  • Basic atomic unit provided by the hardware
    • Writing a single disk sector/block
    • (not actually atomic, but close)
  • How to make sequences of updates atomic?
  • Two styles
    • Shadowing
    • Logging

Transactions: shadowing
  • Easy to explain, not widely used
    1. Update a copy (the shadow version)
    2. Switch the pointer to the new version
  • Each individual step is atomic
    • Step 2 commits the transaction
  • Why doesn't anyone do this?
    • Double the storage overhead
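
For a single file, shadowing can be approximated with an atomic rename, since `os.replace` (POSIX `rename`) atomically switches the name to the new version. A minimal sketch (the `shadow_update` helper and file name are hypothetical, not from the slides):

```python
import os
import tempfile

def shadow_update(path, new_contents):
    """Update `path` atomically: write a full shadow copy, then switch."""
    dir_name = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=dir_name)   # shadow copy, same filesystem
    with os.fdopen(fd, "w") as f:
        f.write(new_contents)                  # step 1: update the copy
        f.flush()
        os.fsync(f.fileno())                   # force the copy to disk
    os.replace(tmp, path)                      # step 2: atomic pointer switch

# Usage: a crash before os.replace leaves the old version intact;
# a crash after it leaves the new version -- never a mix.
with open("shadow_demo.txt", "w") as f:
    f.write("old contents")
shadow_update("shadow_demo.txt", "new contents")
result = open("shadow_demo.txt").read()
os.remove("shadow_demo.txt")
```

Note the storage cost the slide mentions: both versions exist on disk until the rename completes.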

Transactions: logging
  1. Begin the transaction
  2. Append info about the modifications to a log
  3. Append "commit" to the log to end the transaction
  4. Write the new data to the normal database
  • The single-sector "commit" write (step 3) commits the transaction
  • Log contents: Begin | Write 1 | … | Write N | Commit → transaction complete
  • Invariant: append new data to the log before applying it to the DB
    • Called "write-ahead logging"

Transactions: logging
  • Log contents: Begin | Write 1 | … | Write N | Commit
  • What if we crash between steps 3 and 4 (commit logged, database not yet updated)?
    • On reboot, reapply committed updates in log order.

Transactions: logging
  • Log contents: Begin | Write 1 | … | Write N (no commit record)
  • What if we crash before "commit" reaches the log?
    • On reboot, discard the uncommitted updates.
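
Both crash cases can be sketched with an in-memory log. This is a hypothetical toy (a Python list stands in for the on-disk log, a dict for the database); the point is that recovery reapplies committed transactions in log order and discards uncommitted ones.

```python
def recover(log):
    """Rebuild the database from the write-ahead log after a crash."""
    db = {}
    pending = None
    for rec in log:
        if rec == "BEGIN":
            pending = []                  # start buffering a transaction
        elif rec == "COMMIT":
            for key, val in pending:      # reapply committed updates in order
                db[key] = val
            pending = None
        else:
            pending.append(rec)           # logged but not yet applied
    return db                             # uncommitted updates are discarded

# Crash after COMMIT reached the log but before the DB write: survives.
committed = ["BEGIN", ("x", 1), ("y", 2), "COMMIT"]
# Crash before COMMIT reached the log: whole transaction thrown away.
uncommitted = ["BEGIN", ("x", 1), ("y", 2)]
```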

Transactions
  • Most file systems use transactions to modify meta-data
  • Why not use them for data?
    • Related updates are program-specific
    • Would have to modify programs
    • The OS doesn't know how to group updates

Virtual/physical interfaces
  Applications
  OS
  Hardware

Naming and directories
  • How to locate a file?
    • Need to tell the OS which file header
    • Use a symbolic name or click on an icon
  • Could also describe the contents
    • Like Google Desktop and Mac Spotlight
    • Naming in databases works this way

Name translation
  • Maps a user-specified name to a file's on-disk location
  • Lots of possible data structures
    • (hash table, list of pairs, tree, etc.)
  • Once you have the header, the rest is easy
  • E.g., "/home/lpcox/" is resolved through FS translation data (a directory)

Directories
  • A directory contains a mapping for a set of files
    • Name → file header's disk block #
  • Often just a table of <file name, disk block #> pairs
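
A minimal sketch of that table (the names and block numbers below are made up for illustration):

```python
# A directory as a table of <file name, header block #> pairs.
directory = {
    "names":  113,   # file "names" has its header at disk block 113
    "grades": 204,
    "cps110":  57,   # subdirectories map the same way
}

def lookup(directory, name):
    """Translate a file name to its header's disk block number."""
    return directory.get(name)   # None if the name is absent
```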

Directories
  • Stored on disk along with the actual files
  • The disk stores lots of kinds of data
    • End-user data (i.e., data blocks)
    • Meta-data that describes end-user data
      • Inodes, directories, indirect blocks
  • Can often treat files and directories the same
    • Both can use the same storage structure
      • E.g., multi-level indexed files
    • Allows directories to be larger than one block

Directories
  • Directories can point to each other
    • Name → file's header block #
    • Name → directory's header block #
  • Can users read/write directories like files?
    • The OS has no control over the content of data blocks
    • The OS must control the content of meta-data blocks
    • Why? The OS interprets meta-data, so it must be well-formatted

Directories
  • Users still have to modify directories
    • E.g., to create and delete files
  • How do we control these updates?
    • Where else have we seen this issue?
    • Use a narrow interface (e.g., createfile())
    • Like using system calls to modify the kernel

Directories
  • Typically a hierarchical structure
    • Directory A has mappings to files and directories
  • Example: /lpcox/cps110/names
    • / is the root directory
    • /lpcox is a directory within the / directory
    • /lpcox/cps110 is a directory within the /lpcox directory

How many disk I/Os?
  • Read the first byte of /lpcox/cps110/names:
    1. Read the file header for / (at a fixed spot on disk)
    2. Read the first data block for /
    3. Read the file header for /lpcox
    4. Read the first data block for /lpcox
    5. Read the file header for /lpcox/cps110
    6. Read the first data block for /lpcox/cps110
    7. Read the file header for /lpcox/cps110/names
    8. Read the first data block for /lpcox/cps110/names
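
The count generalizes: two reads for the root, two per intermediate directory, two for the file itself. A small sketch (the `count_ios` helper is hypothetical and ignores caching):

```python
def count_ios(path):
    """Count disk reads needed to fetch the first byte of `path`,
    assuming two reads (header + first data block) per component."""
    components = [c for c in path.split("/") if c]  # e.g. lpcox, cps110, names
    ios = 2                      # root: fixed-location header + data block
    for _ in components[:-1]:
        ios += 2                 # each intermediate directory
    ios += 2                     # the file itself: header + first data block
    return ios
```

So the eight steps above are exactly `count_ios("/lpcox/cps110/names")`.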

How many disk I/Os?
  • Caching is the only way to make this efficient
    • If the file header block # of /lpcox/cps110 is cached, we can skip steps 1-4
  • Current working directory
    • Allows users to specify file names instead of full paths
    • Allows the system to avoid walking from the root on each access
    • Eliminates steps 1-4

Mounting multiple disks
  • Can easily combine multiple disks
  • Basic idea:
    • Each disk has a file system (a / directory)
    • An entry in one directory can point to the root of another FS
    • This is called a "mount point"
  • For example, on my machine crocus:
    • /bin is part of one local disk
    • /tmp is part of another local disk
    • /afs is part of the distributed file system AFS

Mounting multiple disks
  • Requires a new mapping type
    • Name → file's header block #
    • Name → directory's header block #
    • Name → device name, file system type
  • Use drivers to walk multiple file systems
    • Windows: disks visible under My Computer as {C, D, E}
    • Unix: disks visible anywhere in the tree

Course administration
  • Project 3: out today, due in two weeks
    • Amre is the expert on these
    • I can help, but he is the one in charge
  • Discussion sections
    • Will focus on Project 3
    • Will be hard to do P3 without them

Virtual/physical interfaces
  Applications
  OS
  Hardware

File caching
  • File systems have lots of data structures
    • Data blocks
    • Meta-data blocks
      • Directories
      • File headers
      • Indirect blocks
      • Bitmap of free blocks
  • Accessing all of these is really slow

File caching
  • Caching is the main thing that improves performance
    • Random I/O is much slower than sequential I/O
    • Accessing disk is much slower than accessing memory
  • Should the file cache be in virtual or physical memory?
    • Should put it in physical memory
    • Could put it in virtual memory, but it might get paged out
      • Worst case: each file is on disk twice
  • Could also use memory-mapped files
    • This is what Windows does

Memory-mapped files
  • Basic idea: use the VM system to cache files
    • Map file contents into the virtual address space
    • Set the backing store of the region to the file
    • Can now access the file using load/store
  • When memory is paged out
    • Updates go back to the file instead of swap space
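
Python's `mmap` module exposes this mechanism directly. A minimal sketch (the file name is arbitrary): reads through the mapping pull file bytes in on demand, and stores go back to the file rather than to swap.

```python
import mmap
import os

# Create a small backing file to map.
with open("mapped.dat", "wb") as f:
    f.write(b"hello world")

with open("mapped.dat", "r+b") as f:
    mem = mmap.mmap(f.fileno(), 0)      # map the whole file into memory
    first_word = bytes(mem[0:5])        # a load: reads file bytes via the mapping
    mem[0:5] = b"HELLO"                 # a store: dirties the mapped page
    mem.flush()                         # write dirty pages back to the file
    mem.close()

on_disk = open("mapped.dat", "rb").read()   # the store went to the file
os.remove("mapped.dat")
```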

File read in Windows NT
  • Application (user mode) calls Read(file, b)
  • I/O Manager (kernel mode) wraps it in an IRP(file, b) and passes Read(file, b) to the FS driver
  • Cache Manager checks whether the file is cached at some address A
    • Cached: map <A, file>, copy A → b
    • Not cached: the resulting page fault makes the VM Manager look up <A, file>;
      the disk driver issues Read(file) to the disk and copies file → A

File caching issues
  • Normal design considerations for caches
    • Cache size, block size, replacement, etc.
  • Write-back or write-through?
    • Write-through: writes go to disk immediately
      • Slow, but loss of power won't lose updates
    • Write-back: delay writes for a period
      • Fast, but loss of power can lose updates
  • Most systems use a 30-second write-back delay
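
The two policies can be contrasted with toy caches (both classes are hypothetical; a dict stands in for the disk, and a "crash" is simply never calling `flush`):

```python
class WriteThroughCache:
    def __init__(self):
        self.cache, self.disk = {}, {}

    def write(self, block, data):
        self.cache[block] = data
        self.disk[block] = data        # every write goes straight to disk

class WriteBackCache:
    def __init__(self):
        self.cache, self.disk, self.dirty = {}, {}, set()

    def write(self, block, data):
        self.cache[block] = data
        self.dirty.add(block)          # disk write deferred (e.g. ~30 s)

    def flush(self):
        for block in self.dirty:       # periodic write-back of dirty blocks
            self.disk[block] = self.cache[block]
        self.dirty.clear()

wt, wb = WriteThroughCache(), WriteBackCache()
wt.write(7, "data")
wb.write(7, "data")
# A crash now (before any flush): write-through kept the update on disk,
# write-back lost it.
```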

Distributed file systems
  • A distributed file system
    • Data stored on many different computers
    • One, unified view of all the data
  • Same interface to access
    • Local files
    • Remote files

Distributed file systems
  • Examples: AFS, NFS, Samba, the web(?)
  • Why use them?
    • Easier to share data among people
    • Uniform view of files from different machines
    • Easier to manage (backup, etc.)

Basic implementation
  • Classic client/server model: many clients, one server
  • Going over the network for data makes performance bad
    • Cache at the client to improve it

Caching
  • I'm sitting at crocus and want to access /usr/project/courses/cps110/bin/submit110
    • Which is stored at linux.cs.duke.edu
  • What should happen to the file?
    • Transfer the sole copy from server to client?
    • Make a copy of the file instead (replication)

Caching
  • What happens if I modify the file?
    • Other clients' copies will be out of date
  • All copies must be kept consistent
    1. Invalidate all other copies, or
    2. Update all other copies
  • Exam analogy: all exams must be the same
    • Could have everyone update their copy (broadcast)
    • Could make everyone throw out their copy (invalidation)

Caching
  • When do I give up my copy? (invalidation)
    • Only when someone else modifies it

Cached file states
  • Invalid → Shared when I read the file
  • Invalid → Exclusive when I write the file
  • Shared → Exclusive when I write the file
  • Shared → Invalid when someone else writes the file
  • Exclusive → Invalid when someone else writes the file
  • (Reading or writing a file whose copy I already hold leaves the state unchanged)
  • What else from 110 does this remind you of?
    • Reader-writer locks: allow parallel reads of shared data, one writer
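
The protocol can be written as a transition table. A sketch in the spirit of a reader-writer lock (the event names are invented, and the self-loop entries are assumptions: holding the latest copy means a repeated read or write changes nothing):

```python
# State machine for one client's cached copy of a file.
# Events: i_read / i_write (my accesses), other_write (another client writes).
TRANSITIONS = {
    ("Invalid",   "i_read"):      "Shared",     # fetch a read-only copy
    ("Invalid",   "i_write"):     "Exclusive",  # fetch a writable copy
    ("Shared",    "i_read"):      "Shared",     # already have the latest copy
    ("Shared",    "i_write"):     "Exclusive",  # upgrade: invalidate others
    ("Shared",    "other_write"): "Invalid",    # my copy is now stale
    ("Exclusive", "i_read"):      "Exclusive",
    ("Exclusive", "i_write"):     "Exclusive",
    ("Exclusive", "other_write"): "Invalid",
}

def step(state, event):
    """Return the next state of my cached copy."""
    return TRANSITIONS[(state, event)]
```

Many clients may sit in Shared at once (parallel readers); at most one holds Exclusive (the single writer), mirroring a reader-writer lock.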