File Management

Operating System Components
[Figure: the Operating System (Process & Resource Manager, Memory Manager, Device Manager, File Manager) sits above the Computer Hardware (Processor(s), Main Memory, Devices)]

Why Programmers Need Files
[Figure: an HTML editor stores <head> … </head> <body> … </body> in foo.html through the File Manager, and a web browser reads the same file through the File Manager]
• Persistent storage
• Shared device
• Structured information
• Can be read by any application
• Accessibility
• Protocol

Fig 13-2: The External View of the File Manager
[Figure: an application program calls the File Mgr, which sits alongside the Memory Mgr, Process Mgr, and Device Mgr above the hardware]
• Windows: CreateFile(), ReadFile(), WriteFile(), SetFilePointer(), CloseHandle()
• UNIX: open(), read(), write(), lseek(), close(), mount()

Introduction
• What is a file?
• Where is a file located physically?
• What are the steps to access a file?

File Management
• A file is a named, ordered collection of information
• The file manager administers the collection by:
  – Storing the information on a device
  – Mapping the block storage to a logical view
  – Allocating/deallocating storage
  – Providing file directories
• What abstraction should be presented to the programmer?

File system context

Levels in a file system

Levels of data abstraction

Logical structures in a file

Information Structure
[Figure, top to bottom: Applications; Records; Structured Record Files; Record-Stream Translation; Byte Stream Files; Stream-Block Translation; Storage device]

Byte Stream File Interface
• Implements the block-stream interface
• Info on file held in file descriptor (described later)
• Typical operations on file
  – fileID = open(fileName)
  – close(fileID)
  – read(fileID, buffer, length)
  – write(fileID, buffer, length)
  – seek(fileID, filePosition)
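A minimal sketch of these operations in C, assuming a POSIX system where the slide's generic open, read, write, seek, and close correspond to open(), read(), write(), lseek(), and close(). The helper names write_sample and read_at are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create (or truncate) a file and write len bytes.
   Returns 0 on success, -1 on failure. */
int write_sample(const char *path, const char *data, size_t len)
{
    assert(path != NULL && data != NULL);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, data, len);
    int rc = close(fd);
    return (n == (ssize_t)len && rc == 0) ? 0 : -1;
}

/* Hypothetical helper: open the file, move the file position to pos,
   read up to len bytes into buf, and close the file.
   Returns the number of bytes read, or -1 on failure. */
ssize_t read_at(const char *path, off_t pos, char *buf, size_t len)
{
    assert(path != NULL && buf != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = -1;
    if (lseek(fd, pos, SEEK_SET) != (off_t)-1)
        n = read(fd, buf, len);
    close(fd);
    return n;
}
```

The file position is the only state the caller manipulates here; everything about blocks stays hidden behind the byte-stream view.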

Low Level Files
Application code:
    fid = open("fileName", …);
    …
    read(fid, buf, buflen);
    …
    close(fid);
File manager entry points:
    int open(…) {…}
    int close(…) {…}
    int read(…) {…}
    int write(…) {…}
    int seek(…) {…}
[Figure: the byte stream b0 b1 b2 … bi … passes through Stream-Block Translation to the storage device, which responds to commands]

File meta-data
• A file contains data plus information about the data, that is, meta-data
  – Meta-data is kept in a file descriptor

File Descriptor Information
• External name
• Current state
• Sharable
• Owner
• User
• Locks
• Protection settings
• Length
• Time of creation
• Time of last modification
• Time of last access
• Reference count
• Storage device details

File Descriptor in Unix
• The file descriptor in UNIX is called an inode (index node)
[Table of inode entries]

Structured Files
• A file is a stream of bytes
• Usually want to access in a structured manner
  – May have no structure imposed (UNIX)
    • Must be provided by application
  – May have a structure imposed (VMS)
• Need to maintain additional information
  – Type of file
  – Access methods
  – Other information

Block Record Translation
[Figure: Records pass through Record-Block Translation]

Record-Oriented Sequential Files
[Figure: a logical record]
    fileID = open(fileName)
    close(fileID)
    getRecord(fileID, record)
    putRecord(fileID, record)
    seek(fileID, position)
• A structured sequential file is a named sequence of logical records, indexed by nonnegative integers
• Records may be of fixed size or variable size
  – This is determined by the file manager

Record-Oriented Sequential Files
[Figure: each logical record is stored as an H-byte header followed by a k-byte logical record, then the next header]
• Header contains record descriptor information (occupies H bytes)
• Logical record takes up k bytes (fixed size)

Record-Oriented Sequential Files
[Figure: H-byte headers and k-byte logical records laid over physical storage blocks, with a fragment marked where records and block boundaries do not line up]
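The layout on the last two slides implies simple offset arithmetic. The sketch below assumes records are packed back to back, each as an H-byte header followed by a k-byte payload, and that blocks have a fixed size; all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset of the header of record i, with H-byte headers and
   k-byte payloads packed back to back. */
size_t record_header_offset(size_t i, size_t H, size_t k)
{
    return i * (H + k);
}

/* Byte offset of the payload of record i (just past its header). */
size_t record_data_offset(size_t i, size_t H, size_t k)
{
    return record_header_offset(i, H, k) + H;
}

/* Number of the storage block that holds a given byte offset. */
size_t block_of(size_t offset, size_t block_size)
{
    assert(block_size > 0);
    return offset / block_size;
}

/* Returns 1 if record i (header plus payload) crosses a block boundary,
   0 if it lies entirely inside one block. */
int record_spans_blocks(size_t i, size_t H, size_t k, size_t block_size)
{
    assert(H + k > 0);
    size_t first = record_header_offset(i, H, k);
    size_t last = first + H + k - 1;
    return block_of(first, block_size) != block_of(last, block_size);
}
```

With H = 4, k = 60, and 100-byte blocks, record 1 occupies bytes 64 through 127 and therefore straddles blocks 0 and 1.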

Electronic Mail Example
    struct message {            /* The mail message */
        address to;
        address from;
        line    subject;
        address cc;
        string  body;
    };

    /* Write one message as a record, field by field */
    void putRecord(struct message *msg) {
        putAddress(msg->to);
        putAddress(msg->from);
        putAddress(msg->cc);
        putLine(msg->subject);
        putString(msg->body);
    }

    /* Read one record back, in the same field order it was written */
    struct message *getRecord(void) {
        struct message *msg;
        msg = allocate(sizeof(struct message));
        msg->to = getAddress(...);
        msg->from = getAddress(...);
        msg->cc = getAddress(...);
        msg->subject = getLine();
        msg->body = getString();
        return(msg);
    }

Record-Oriented Sequential Files
• Fixed size records can be a problem
  – Applications requiring large record sizes would require that the programmer break the records into smaller pieces
  – Applications only requiring small record sizes would waste space
• A solution is for the file system to be enhanced to include a function to define the record size for a file (encoded in header)

Indexed Sequential File
• Suppose we want to directly access records
• Add an index to the file
    fileID = open(fileName)
    close(fileID)
    getRecord(fileID, index)
    index = putRecord(fileID, record)
    deleteRecord(fileID, index)

Indexed Sequential File (cont)
[Figure: the application structure is keyed by Account # (0123456, 294376, …, 529366, …, 965987); the index maps each key to a record position (index = i, index = k, index = j)]
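One way to picture the index is as a table sorted by key that maps each account number to a record position. The sketch below is an assumption about how such a lookup could work, not the file manager's actual structure; struct index_entry and index_lookup are hypothetical names, and the record positions in the usage note are invented.

```c
#include <assert.h>
#include <stddef.h>

/* One index entry: an application key and the position of its record. */
struct index_entry {
    long key;     /* e.g. an account number */
    long record;  /* record position inside the file */
};

/* Binary search over a table sorted by ascending key.
   Returns the record position, or -1 if the key is absent. */
long index_lookup(const struct index_entry *tab, size_t n, long key)
{
    assert(tab != NULL || n == 0);
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (tab[mid].key == key)
            return tab[mid].record;
        if (tab[mid].key < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return -1;
}
```

A table such as { {123456, 7}, {294376, 2}, {529366, 9}, {965987, 0} } lets getRecord go straight to record 9 for account 529366 instead of scanning the file.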

More Abstract Files
• Inverted files
  – System index for each datum in the file
  – Records accessed based on appearance in table rather than their logical location
    • Company accounts may be accessed by customer name, but a customer may have several accounts
    • Set up external index table by name with pointers to the main table
• Multimedia storage
  – Records contain radically different types
  – Access methods must be general

Database Management Systems
• A database is a very highly structured set of information
  – Stored across different files
  – Optimized to minimize access time
• DBMS implementation
  – Some DBMSs use the normal files provided by the OS for generic use
  – Some manage their own storage device blocks

File systems
• File system
  – A data structure on a disk that holds files
    • actually a file system is in a disk partition
    • a technical term different from a “file system” as the part of the OS that implements files
• File systems in different OSs have different internal structures

A file system layout

Implementing Low Level Files
• Process needs to be able to read from and write to storage devices
• Simplest system is byte stream file system
  – (will consider record-oriented systems later)
• Storage device may be accessed 2 ways
  – Sequentially, like a tape drive
  – Randomly, like a magnetic disk

Low-level File System Architecture
[Figure: blocks b0, b1, b2, b3, …, bn-1 laid out on a sequential device and on a randomly accessed device]

Low Level Files Management
• Secondary storage device contains:
  – Volume directory (sometimes a root directory for a file system)
  – External file descriptor for each file
  – The file contents
• Manages blocks
  – Assigns blocks to files (descriptor keeps track)
  – Keeps track of available blocks
• Maps to/from byte stream

File Manager Data Structures
[Figure: Process-File Session, Open File Descriptor, External File Descriptor]
1. Copy info from the external file descriptor to the open file descriptor
2. Keep the state of the process-file session
3. Return a reference to the data structure

An open Operation
• Locate the on-device (external) file descriptor
• Extract info needed to read/write file
• Authenticate that process can access the file
• Create an internal file descriptor in primary memory
• Create an entry in a “per process” open file status table
• Allocate resources, e.g., buffers, to support file usage

A close Operation
• Complete all pending operations
• Release I/O buffers
• Release locks process holds on file
• Update external file descriptor
• Deallocate file status table entry

Opening a UNIX File
    fid = open("fileA", flags);
    …
    read(fid, buffer, len);
[Figure: the open file table (entries 0 stdin, 1 stdout, 2 stderr, 3 …), the file structure, the internal file descriptor (inode), and the on-device file descriptor]
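The numbering in the per-process table follows the POSIX rule that open() returns the lowest-numbered descriptor not currently in use, which is why a fresh process that already holds 0, 1, and 2 gets 3. The sketch below models only that slot allocation; struct process_files, alloc_fd, and free_fd are hypothetical names and the table size is an arbitrary assumption.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_OPEN 16  /* assumed table size, for illustration only */

/* A toy per-process open file table: in_use[i] is nonzero when
   descriptor i refers to an open file. */
struct process_files {
    int in_use[MAX_OPEN];
};

/* Claim the lowest-numbered free slot, as open() does.
   Returns the descriptor, or -1 if the table is full. */
int alloc_fd(struct process_files *p)
{
    assert(p != NULL);
    for (int i = 0; i < MAX_OPEN; i++) {
        if (!p->in_use[i]) {
            p->in_use[i] = 1;
            return i;
        }
    }
    return -1;
}

/* Release a slot, as close() does. */
void free_fd(struct process_files *p, int fd)
{
    assert(p != NULL && fd >= 0 && fd < MAX_OPEN);
    p->in_use[fd] = 0;
}
```

Closing a descriptor makes its number available again, so the next open reuses it.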

Block Management
• The job of selecting & assigning storage blocks to the file
• For a fixed block size of k bytes
  – File of length m bytes requires N = ⌈m/k⌉ blocks
  – Byte bi is stored in block ⌊i/k⌋
• The logical file is divided into logical blocks
• Each logical block is mapped to a physical disk block
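The block arithmetic on this slide can be sketched in C, reading k as the block size in bytes and m as the file length in bytes. The count rounds up because a partially filled final block still occupies a whole block. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Number of k-byte blocks needed to hold m bytes (rounds up). */
size_t blocks_needed(size_t m, size_t k)
{
    assert(k > 0);
    return m / k + (m % k != 0);
}

/* Logical block that holds byte i. */
size_t block_of_byte(size_t i, size_t k)
{
    assert(k > 0);
    return i / k;
}

/* Position of byte i inside its block. */
size_t offset_in_block(size_t i, size_t k)
{
    assert(k > 0);
    return i % k;
}
```

For a 10,000-byte file with 4,096-byte blocks this gives 3 blocks, and byte 5,000 sits at offset 904 of logical block 1.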

Locating file data
• The file descriptor contains data on how to perform this mapping
  – there are many methods for performing this mapping
• Three basic strategies:
  – Contiguous allocation
  – Linked lists
  – Indexed allocation

Dividing a file into blocks

Disk Organization
[Figure: the Boot Sector and Volume Directory are followed by blocks laid out track by track: Blk 0 … Blk k-1 on Track 0, Cylinder 0; Blk k … Blk 2k-1 on Track 0, Cylinder 1; and so on through Track 1, Cylinder 0 up to Track N-1, Cylinder M-1]

Contiguous Allocation
• Maps the N blocks into N contiguous blocks on the secondary storage device
  – Simple to implement
  – Random access
• Does not provide for dynamic file sizes
  – If you want to extend a file, hope there is an empty block following, or recopy the entire file to a larger group of unallocated contiguous blocks
[Figure: file descriptor fields: Head position 237, …, First block 785, Number of blocks 25]
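Random access is cheap under contiguous allocation because the mapping from logical to physical block is a single addition. A minimal sketch, assuming the descriptor stores only the first block and the block count as in the figure; struct contig_file and contig_physical_block are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* The two descriptor fields that contiguous allocation needs. */
struct contig_file {
    long first_block;  /* physical block number of logical block 0 */
    long num_blocks;   /* length of the file in blocks */
};

/* Map a logical block number to a physical block number.
   Returns -1 if the logical block lies outside the file. */
long contig_physical_block(const struct contig_file *f, long logical)
{
    assert(f != NULL);
    if (logical < 0 || logical >= f->num_blocks)
        return -1;
    return f->first_block + logical;
}
```

Using the figure's values (first block 785, 25 blocks), logical block 24 is physical block 809 and logical block 25 does not exist.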

A contiguous file

Keeping a file in pieces
• We need a block pointer for each logical block, an array of block pointers
  – block mapping indexes into this array
  – Each file is a linked list of disk blocks
• But where do we keep this array?
  – usually it is not kept as a contiguous array
  – the array of disk pointers is like a second related file (that is 1/1024 as big)

Block pointers in the file descriptor

Block pointers in contiguous disk blocks

Linked Lists
• Each block contains a header with
  – Number of bytes in the block
  – Pointer to next block
• Blocks need not be contiguous
• Files can expand and contract
• Seeks can be slow
[Figure: the file descriptor's First block field (Head: 417) points to Block 0; each block holds a Length, a pointer to the next block, and data bytes 0 … 4095; Block N-1 holds a NULL pointer]
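Seeks are slow with linked allocation because reaching logical block n means following n pointers, and each pointer lives in a block that must be read from the device first. The sketch below simulates the disk with an in-memory array next[], where next[b] stands for the pointer stored in block b's header; linked_seek and END_OF_CHAIN are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define END_OF_CHAIN (-1)  /* stands in for the NULL pointer in the last block */

/* Walk the chain from block head to the logical-th block of the file.
   next[b] models the "pointer to next block" stored in block b's header.
   *hops receives the number of pointers followed, which in a real system
   is the number of block reads the seek costs.
   Returns the physical block number, or END_OF_CHAIN if the file is shorter. */
long linked_seek(const long *next, long nblocks, long head,
                 long logical, long *hops)
{
    assert(next != NULL && hops != NULL);
    *hops = 0;
    if (logical < 0)
        return END_OF_CHAIN;
    long b = head;
    while (logical > 0 && b >= 0 && b < nblocks) {
        b = next[b];
        (*hops)++;
        logical--;
    }
    return (b >= 0 && b < nblocks) ? b : END_OF_CHAIN;
}
```

The cost grows linearly with the seek distance, in contrast to the single addition that contiguous allocation needs.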

Linked Lists – cont.

Indexed Allocation
• Extract headers and put them in an index
• Simplify seeks
• May link indices together (for large files)
[Figure: the file descriptor (Head: 417) points to an index block whose entries point to Block 0, Block 1, …, Block N-1, each holding a Length and data bytes 0 … 4095]

Block pointers in an index block

Block pointers in an index block – cont.

Chained index blocks

Two-level index blocks

Two-level index blocks – cont.
[Figure: primary index, secondary index table, data blocks]

File system layout variations
• New UNIX file systems use cylinder groups (mini-file systems) to achieve better locality of file data
• MS/DOS uses a FAT (file allocation table) file system
  – so does the Macintosh OS (although the MacOS layout is different)

UNIX Files
[Figure: the inode holds mode, owner, …, direct blocks 0 through 11, and single, double, and triple indirect pointers. Direct pointers lead to data blocks; the indirect pointers lead through one, two, or three levels of index blocks to data blocks]
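Given the 12 direct pointers in the figure, the level of indirection needed for a logical block follows from how many pointers fit in one index block. The sketch below takes that number as a parameter p, since the slide does not state it; the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_DIRECT 12ULL  /* direct blocks 0 through 11, as in the figure */

enum inode_level {
    DIRECT, SINGLE_INDIRECT, DOUBLE_INDIRECT, TRIPLE_INDIRECT, TOO_LARGE
};

/* Decide which inode pointer covers logical block n, assuming each
   index block holds p block pointers (p is an assumption, e.g. 1024). */
enum inode_level inode_level_for(unsigned long long n, unsigned long long p)
{
    assert(p > 0);
    if (n < NUM_DIRECT)
        return DIRECT;
    n -= NUM_DIRECT;
    if (n < p)
        return SINGLE_INDIRECT;   /* one index block: p data blocks */
    n -= p;
    if (n < p * p)
        return DOUBLE_INDIRECT;   /* two levels: p * p data blocks */
    n -= p * p;
    if (n < p * p * p)
        return TRIPLE_INDIRECT;   /* three levels: p * p * p data blocks */
    return TOO_LARGE;
}
```

Small files never touch an index block, while each extra level multiplies the reachable file size by p at the cost of one more block read per access.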

DOS FAT Files
[Figure: the file descriptor points to disk block 43, which links to disk block 254, …, which links to disk block 107, forming a logical linked list]

DOS FAT Files
[Figure: the same chain with the links moved out of the disk blocks and into the File Allocation Table (FAT): the file descriptor names block 43, the FAT entry for 43 holds 254, …, and the chain ends at 107]
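Following a file through the FAT is the same walk as a linked list, except that the pointers sit in one table that can be kept in memory rather than inside the data blocks. The sketch below uses the block numbers from the figure (43, 254, 107) and invents its own markers for free entries and end of chain; it does not reproduce the real on-disk FAT encoding.

```c
#include <assert.h>
#include <stddef.h>

#define FAT_FREE 0     /* assumed marker: entry not part of any file */
#define FAT_EOF  (-1)  /* assumed marker: last block of a file */

/* Return the physical block holding the logical-th block of a file that
   starts at block first, by following fat[] entries. Block 0 is treated
   as reserved. Returns FAT_EOF if the chain ends early or is corrupt. */
int fat_nth_block(const int *fat, int fat_len, int first, int logical)
{
    assert(fat != NULL && fat_len > 0);
    if (logical < 0)
        return FAT_EOF;
    int b = first;
    while (logical > 0) {
        if (b <= 0 || b >= fat_len)
            return FAT_EOF;
        b = fat[b];
        logical--;
    }
    return (b > 0 && b < fat_len) ? b : FAT_EOF;
}
```

No data block is read during the walk, which is the main advantage over keeping the links in the block headers.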

Unallocated Blocks
• How should unallocated blocks be managed?
• Need a data structure to keep track of them
  – Block status map (or disk bitmap)
    • Small enough to be held in primary memory
  – Linked list (or free list)
    • Very large
    • Hard to manage spatial locality (need to scan list to find blocks ‘close to’ each other)

Free-Space Management
• Bit vector (n blocks), bits numbered 0, 1, 2, …, n-1
  – bit[i] = 1: block[i] free
  – bit[i] = 0: block[i] occupied
• First free block number = (number of bits per word) * (number of 0-value words) + offset of first 1 bit
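The first-free-block formula translates directly into a scan that skips all-zero words and then locates the first 1 bit. The sketch assumes 32-bit words and that the least significant bit of each word stands for the lowest-numbered block in that word; both are assumptions, since the slide fixes neither.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_WORD 32

/* Find the first free block in a bit vector where 1 means free.
   Returns (bits per word) * (number of leading 0-value words)
   + (offset of the first 1 bit), or -1 if no block is free. */
long first_free_block(const uint32_t *map, size_t nwords)
{
    assert(map != NULL || nwords == 0);
    for (size_t w = 0; w < nwords; w++) {
        if (map[w] == 0)
            continue;                       /* every block in this word is occupied */
        int off = 0;
        while (((map[w] >> off) & 1u) == 0) /* word is nonzero, so this terminates */
            off++;
        return (long)(w * BITS_PER_WORD) + off;
    }
    return -1;
}
```

With two all-zero words followed by a word whose bit 4 is set, the result is 32 * 2 + 4 = 68.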

Free-Space Management - cont.
• Bit map requires extra space. Example:
    block size = 2^12 bytes
    disk size = 2^30 bytes (1 gigabyte)
    n = 2^30/2^12 = 2^18 bits (or 32 K bytes)
• Easy to get contiguous files
• Linked list (free list)
  – Cannot get contiguous space easily
  – No waste of space

Free list organization

Free-Space Management – cont.
• Need to protect:
  – Pointer to free list
  – Bit map
    • Must be kept on disk
    • Copy in memory and disk may differ
    • Cannot allow for block[i] to have a situation where bit[i] = 0 in memory and bit[i] = 1 on disk
  – Solution:
    • Set bit[i] = 0 on disk
    • Allocate block[i]
    • Set bit[i] = 0 in memory

Marshalling the Byte Stream
• Must read at least one buffer ahead on input
• Must write at least one buffer behind on output
• Seek requires flushing the current buffer and finding the correct one to load into memory
• Inserting/deleting bytes in the interior of the stream

Buffering
• Storage devices use block I/O
• Files place an explicit order on the bytes
• Therefore, it is possible to predict what will be read after byte i
• When file is opened, manager reads as many blocks ahead as feasible
• After a block is logically written, it is queued for writing behind, whenever the disk is available
• Buffer pool
  – usually variably sized, depending on virtual memory needs
  – Interaction with the device manager and memory manager

Supporting Other Storage Abstractions
• Low-level file systems avoid encoding record-level functionality
  – If applications use very large or very small records, a generic file manager may be inefficient
  – Some operating systems provide a higher-layer file system to support applications with large or small files
  – Database management systems and multimedia documents are examples

Other Storage Abstractions
• Modern, open operating systems tend towards low-level file systems
• Proprietary operating systems designed for specific applications implement higher-layer file systems
• Structured Sequential Records
  – Contain collections of logical records
  – Need to read from or write to entire records

Other Storage Abstractions
• Indexed sequential files
  – File manager keeps table for each open file and maps index to block containing the record
  – Consumes space
  – Read/write operations more complex
  – Buffering is not of much value (records accessed in arbitrary order)
• Multimedia
  – Requires large files and high bandwidth
    • Use larger block sizes
    • Try to use contiguous block allocation

Directories
• A directory is a set of logically associated files and other directories of files
  – Directories are the mechanism we use to organize files
• The file manager provides a set of commands to manage directories
  – Traverse a directory
  – Enumerate a list of all files and nested directories

Directories
• Directory commands
  – enumerate
  – copy
  – rename
  – delete
  – traverse
  – etc.

Directory Structures
• How should files be organized within a directory?
  – Flat name space
    • All files appear in a single directory
  – Hierarchical name space
    • Directory contains files and subdirectories
    • Each file/directory appears as an entry in exactly one other directory -- a tree
    • Popular variant: All directories form a tree, but a file can have multiple parents.

Directory Structures

Directory Structures – cont.

A directory tree

Directory Implementation
• Device Directory
  – A device can contain a collection of files
  – Easier to manage if there is a root for every file on the device -- the device root directory
• File Directory
  – Typical implementations have directories implemented as a file with a special format
  – Entries in a file directory are handles for other files (which can be files or subdirectories)

Directory Implementation – cont.
• Sorted linear list of file names with pointers to the data blocks
  – simple to program
  – time-consuming to execute
• Hash table
  – linear list with hash data structure
  – decreases directory search time
  – collisions: situations where two file names hash to the same location
  – fixed size
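A fixed-size hash table with linear probing shows both properties named on the slide: lookups usually touch one slot, and two names that hash to the same location must be resolved somehow. The sketch below is one illustrative scheme, not a description of any particular file system; the table size, the hash function, and all names are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DIR_SLOTS 8  /* the table has a fixed size */

/* One directory slot: a file name and a handle for its descriptor.
   A NULL name marks an empty slot. */
struct dir_entry {
    const char *name;
    long descriptor;
};

/* Simple multiplicative string hash reduced to a slot number. */
static unsigned dir_hash(const char *s)
{
    unsigned h = 5381u;
    while (*s)
        h = h * 33u + (unsigned char)*s++;
    return h % DIR_SLOTS;
}

/* Insert or update a name. On a collision, probe the following slots.
   The caller keeps the name string alive. Returns 0, or -1 if full. */
int dir_insert(struct dir_entry *tab, const char *name, long descriptor)
{
    assert(tab != NULL && name != NULL);
    unsigned h = dir_hash(name);
    for (unsigned i = 0; i < DIR_SLOTS; i++) {
        struct dir_entry *e = &tab[(h + i) % DIR_SLOTS];
        if (e->name == NULL || strcmp(e->name, name) == 0) {
            e->name = name;
            e->descriptor = descriptor;
            return 0;
        }
    }
    return -1;
}

/* Look a name up, probing in the same order as dir_insert.
   Returns the descriptor handle, or -1 if the name is absent. */
long dir_lookup(const struct dir_entry *tab, const char *name)
{
    assert(tab != NULL && name != NULL);
    unsigned h = dir_hash(name);
    for (unsigned i = 0; i < DIR_SLOTS; i++) {
        const struct dir_entry *e = &tab[(h + i) % DIR_SLOTS];
        if (e->name == NULL)
            return -1;
        if (strcmp(e->name, name) == 0)
            return e->descriptor;
    }
    return -1;
}
```

Because the table cannot grow, a ninth distinct name is rejected, which is the "fixed size" drawback in concrete form. Deletion is omitted because removing entries under linear probing needs extra care.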

Directory Implementation – cont.
• Physical disk may be divided into two or more logical disks
  – Bitmap table doesn’t need to be as large
  – Easier to archive
  – Can handle several operating systems
• Requires partitioning at device driver level

Mounting file systems
• Each file system has a root directory
• We can combine file systems by mounting
  – that is, link a directory in one file system to the root directory of another file system
• This allows us to build a single tree out of several file systems
• This can also be done across a network, mounting file systems on other machines

Mounting a file system

UNIX mount Command
[Figure: two separate directory trees. One is rooted at / with entries bin, usr, etc, bill, foo, and nutt; the other is rooted at bar with entries abc, cde, xyz, and blah]

UNIX mount Command
[Figure: the two trees before and after "mount bar at foo"; after the mount, the tree rooted at bar (abc, cde, xyz, blah) is reached through foo in the first tree]

More on Files

File names
• Directory
  – Maps component names into objects (files or directories)
• Path name
  – A sequence of component names specifying a path of directories
    • absolute path: starts at the root directory
    • relative path: starts at the working directory
• File name extension: suffix of a component name that indicates the type of the file
• Alias: alternate path names for a file
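A path name lookup starts by deciding where to begin (root or working directory) and then consumes one component name at a time. The sketch below covers only that string handling for UNIX-style paths with '/' separators; the directory search for each component is left out, and path_is_absolute and next_component are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* An absolute UNIX path starts at the root directory. */
int path_is_absolute(const char *path)
{
    assert(path != NULL);
    return path[0] == '/';
}

/* Copy the next component name of *cursor into out (truncating to fit)
   and advance *cursor past it. Returns 1 if a component was found,
   0 when the path is exhausted. */
int next_component(const char **cursor, char *out, size_t outsz)
{
    assert(cursor != NULL && *cursor != NULL && out != NULL && outsz > 0);
    const char *p = *cursor;
    while (*p == '/')                /* skip separators */
        p++;
    if (*p == '\0') {
        *cursor = p;
        return 0;
    }
    size_t len = strcspn(p, "/");    /* length of this component */
    size_t n = len < outsz - 1 ? len : outsz - 1;
    memcpy(out, p, n);
    out[n] = '\0';
    *cursor = p + len;
    return 1;
}
```

A lookup loop would call next_component repeatedly, searching the current directory for each name and descending into the object it maps to.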

File name space topologies

Some common file extensions
• file.c -- a C program
• file.txt -- a text file
• file.s -- an assembly language file
• file.obj -- an object file (in MS/DOS)
• file.o -- an object file (in UNIX)
• file.exe -- an executable file (in MS/DOS)
• file.wk1 -- spreadsheet worksheet
• file.tex -- tex or latex (a text formatter)
• file.mif -- Framemaker interchange file
• file.scm -- Scheme program
• file.tmp -- a temporary file

Path name examples
• /home/faculty/egle/os/book/ch02 – UNIX
• /home/student/jdoe/os/proj2 – UNIX
• book/ch02 – UNIX relative path name
• E:\egle\class\os\book\ch02 – MS/DOS
• users:u1:egle:os:book:ch02 – Macintosh
• disk$faculty:[egle.os.book]ch02 – VMS
• [.book]ch02 – VMS relative path name

Open Files
• When a file is opened, the file manager keeps additional dynamic information
  – File position

File Operations
• openFile = open(file name)
• openFile = create(file name)
• fileMetaData = status(file name)
• okay = access(file name, access type)
• okay = change mode(file name, new mode)
  – changes protection information
• okay = change owner(file name, new owner)

Open file operations
• bytesRead = read(open file)
• bytesWritten = write(open file)
• newFilePos = seek(open file, how much, how) -- move file position
• close(open file)
• openFile = duplicate(open file)
• fileLock(open file)
• fileControl(open file)
• twoOpenFiles = pipe()

Directory operations in UNIX
• link(file name, alias name)
• unlink(file name): delete file
• rename(old name, new name)
• makeDirectory(directory name)
• removeDirectory(directory name)

Two major parts of a file system
• The file manager needs to map a filename to a collection of physical blocks on the storage devices

File system data structures

Flow of control for an open

Flow of control for a read

Connecting files and devices

Special files
• Special files are not ordinary files
  – e.g. directories, devices, pipes, message queues, remote files, mounted directories, etc.
• They are marked by flags in the file descriptor
• The read and write operations are directed to code for that type of special file
  – a case statement determines how to handle a read or write operation
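The case statement mentioned above can be sketched as a switch on a type flag stored with the open file. The handlers here are stubs that only return a distinguishing tag, and the type names and structure are hypothetical; a real file manager would move data in each branch.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed type flags; a real descriptor would carry more kinds. */
enum file_type { FT_REGULAR, FT_DIRECTORY, FT_DEVICE, FT_PIPE };

struct open_file {
    enum file_type type;  /* flag copied from the file descriptor */
};

/* Stub handlers: each returns a tag so the dispatch is observable. */
static long read_regular(char *buf, size_t len)   { (void)buf; (void)len; return 1; }
static long read_directory(char *buf, size_t len) { (void)buf; (void)len; return 2; }
static long read_device(char *buf, size_t len)    { (void)buf; (void)len; return 3; }
static long read_pipe(char *buf, size_t len)      { (void)buf; (void)len; return 4; }

/* Direct a read to the code for this kind of file. */
long file_read(const struct open_file *f, char *buf, size_t len)
{
    assert(f != NULL);
    switch (f->type) {
    case FT_REGULAR:   return read_regular(buf, len);
    case FT_DIRECTORY: return read_directory(buf, len);
    case FT_DEVICE:    return read_device(buf, len);
    case FT_PIPE:      return read_pipe(buf, len);
    default:           return -1;  /* unknown type flag */
    }
}
```

The caller uses the same read call for every kind of file; only the branch taken differs.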

Fork data structure changes

System call data structure changes

Duplicate data structure changes

Pipe data structure changes

Avoiding data copies

Path name lookup algorithm