INF 1060: Introduction to Operating Systems and Data Communication
Operating Systems: Storage: Disks & File Systems
Pål Halvorsen
5/10 - 2005

Overview
• Disks
• Disk scheduling
• Memory caching
• File systems
INF 1060 – introduction to operating systems and data communication 2005 Kjell Åge Bringsrud & Pål Halvorsen

Disks

Disks
• Disks ...
  - are used to have a persistent system
  - (+) are cheaper compared to main memory
  - (+) have more capacity
  - (-) are orders of magnitude slower
• Two resources of importance
  - storage space
  - I/O bandwidth
• Because ...
  - ... there is a large speed mismatch compared to main memory (ms vs. ns, a factor of 10^6), and this gap is still increasing
  - ... disk I/O is often the main performance bottleneck
  ... we must look closer at how to manage disks

Mechanics of Disks
• Platters: circular platters covered with magnetic material to provide nonvolatile storage of bits
• Spindle: the axis around which the platters rotate
• Tracks: concentric circles on a single platter
• Disk heads: read or alter the magnetism (bits) passing under them. The heads are attached to an arm that moves them across the platter surface
• Sectors: segments of the track circle, usually containing 512 bytes each, separated by non-magnetic gaps. The gaps are often used to identify the beginning of a sector
• Cylinders: corresponding tracks on the different platters are said to form a cylinder

Disk Specifications
• Some existing (Seagate) disks today:
                                    Barracuda 180   Cheetah 36   Cheetah X15.3
  Capacity (GB)                     181.6           36.4         73.4
  Spindle speed (RPM)               7,200           10,000       15,000
  #cylinders                        24,247          9,772        18,479
  average seek time (ms)            7.4             5.7          3.6
  min (track-to-track) seek (ms)    0.8             0.6          0.2
  max (full stroke) seek (ms)       16              12           7
  average latency (ms)              4.17            3            2
  internal transfer rate (Mbps)     282 - 508       520 - 682    609 - 891
  disk buffer cache                 16 MB           4 MB         8 MB
Note 1: disk manufacturers usually denote GB as 10^9 bytes, whereas computer quantities often are powers of 2, i.e., GB is 2^30 bytes
Note 2: there is a difference between internal and formatted transfer rate. The internal rate is measured at the platter only. The formatted rate is what remains after the signals have passed through the electronics (cabling loss, interference, retransmissions, checksums, etc.)
Note 3: there is usually a trade-off between speed and capacity

Disk Capacity
• The size (storage space) of the disk depends on
  - the number of platters
  - whether the platters use one or both sides
  - the number of tracks per surface
  - the (average) number of sectors per track
  - the number of bytes per sector
• Example (Cheetah X15):
  - 4 platters using both sides: 8 surfaces
  - 18497 tracks per surface
  - 617 sectors per track (average)
  - 512 bytes per sector
  - Total capacity = 8 x 18497 x 617 x 512 ≈ 4.67 x 10^10 bytes ≈ 43.5 GB (with 1 GB = 2^30 bytes)
  - Formatted capacity = 36.7 GB
Note: there is a difference between formatted and total capacity. Some of the capacity is used for storing checksums, spare tracks, gaps, etc.
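The capacity product in the example can be checked with a few lines of C. This is a minimal sketch: the function names are made up for illustration, and the geometry values are the ones quoted for the Cheetah X15 example.

```c
#include <assert.h>
#include <stdint.h>

/* Raw (unformatted) capacity in bytes, computed from the disk geometry. */
uint64_t disk_capacity_bytes(uint64_t surfaces, uint64_t tracks_per_surface,
                             uint64_t avg_sectors_per_track,
                             uint64_t bytes_per_sector)
{
    assert(bytes_per_sector > 0);
    return surfaces * tracks_per_surface * avg_sectors_per_track
           * bytes_per_sector;
}

/* Convert bytes to binary gigabytes (1 GB = 2^30 bytes). */
double bytes_to_gb(uint64_t bytes)
{
    return (double)bytes / (double)(1ULL << 30);
}
```

Calling `disk_capacity_bytes(8, 18497, 617, 512)` evaluates the product from the example; dividing by 10^9 instead of 2^30 would give the manufacturer-style figure.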

Disk Access Time
• How do we retrieve data from disk?
  - position the head over the cylinder (track) on which the block (consisting of one or more sectors) is located
  - read or write the data block as the sectors move under the head when the platters rotate
• The time between the moment a disk request is issued and the moment the block is resident in memory is called disk latency or disk access time

Disk Access Time
[Figure: a request "I want block X" goes to the disk (disk platter, disk head, disk arm labeled) and ends with block X in memory]
Disk access time = Seek time + Rotational delay + Transfer time + Other delays

Disk Access Time: Seek Time
• Seek time is the time to position the head
  - the heads require a minimum amount of time to start and stop moving
  - some time is used for actually moving the head, roughly proportional to the number of cylinders traveled
  - Time to move head: [formula lost in extraction; its labeled terms are: number of tracks, seek time constant, fixed overhead]
• "Typical" average: 10 ms - 40 ms
  - 7.4 ms (Barracuda 180)
  - 5.7 ms (Cheetah 36)
  - 3.6 ms (Cheetah X15)
[Figure: seek time vs. cylinders traveled (1 to N); a one-cylinder seek takes time x, a full-stroke seek roughly 10x - 20x]
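To make the "fixed overhead plus distance-dependent part" idea concrete, here is a rough sketch that assumes a strictly linear model fitted through two data-sheet values (track-to-track and full-stroke seek). The linear shape and the function name are assumptions for illustration only; real heads accelerate, so measured curves are not straight lines.

```c
#include <assert.h>

/* Linear seek model fitted through two data-sheet points:
 * (1 cylinder, min_ms) and (num_cylinders - 1 cylinders, max_ms). */
double seek_time_ms(long distance, long num_cylinders,
                    double min_ms, double max_ms)
{
    assert(num_cylinders > 2 && distance >= 0 && distance < num_cylinders);
    if (distance == 0)
        return 0.0;                       /* head is already in place */
    double per_cyl = (max_ms - min_ms) / (double)(num_cylinders - 2);
    return min_ms + per_cyl * (double)(distance - 1);
}
```

With the Barracuda 180 figures (24,247 cylinders, 0.8 ms minimum, 16 ms maximum), `seek_time_ms(1, 24247, 0.8, 16.0)` returns the track-to-track time and a seek across all cylinders approaches the full-stroke time.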

Disk Access Time: Rotational Delay
• Time for the disk platters to rotate so that the first of the required sectors is under the disk head
• Average delay is 1/2 revolution
• "Typical" averages:
  - 8.33 ms (3,600 RPM)
  - 5.56 ms (5,400 RPM)
  - 4.17 ms (7,200 RPM)
  - 3.00 ms (10,000 RPM)
  - 2.00 ms (15,000 RPM)
[Figure: platter with the head position ("here") and the wanted block ("block I want") marked]
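The listed averages follow directly from the spindle speed: one revolution takes 60,000/RPM milliseconds, and on average half a revolution passes before the wanted sector arrives. A small sketch (function names are illustrative):

```c
#include <assert.h>

/* Time for one full revolution in milliseconds. */
double rotation_time_ms(double rpm)
{
    assert(rpm > 0.0);
    return 60000.0 / rpm;
}

/* Average rotational delay: half a revolution. */
double avg_rotational_delay_ms(double rpm)
{
    return rotation_time_ms(rpm) / 2.0;
}
```

For example, `avg_rotational_delay_ms(15000.0)` corresponds to the 15,000 RPM entry in the list.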

Disk Access Time: Transfer Time
• Time for data to be read by the disk head, i.e., the time it takes the sectors of the requested block to rotate under the head
• Transfer rate = amount of data per track / time per rotation
• Transfer time = amount of data to read / transfer rate
• Example, Barracuda 180: 406 KB per track x 7,200 RPM ≈ 47.58 MB/s
• Example, Cheetah X15: 316 KB per track x 15,000 RPM ≈ 77.15 MB/s
  Note: one might achieve these transfer rates reading continuously on disk, but time must be added for seeks, etc.
• Transfer time depends on data density and rotation speed
• If we have to change track, time must also be added for moving the head
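The two formulas above translate into a short sketch. The function names are made up for illustration; KB is taken as 1024 bytes and MB as 1024 KB, which is the convention that reproduces the example figures.

```c
#include <assert.h>

/* Sustained transfer rate in KB/s: data per track times rotations per second. */
double transfer_rate_kb_per_s(double kb_per_track, double rpm)
{
    assert(rpm > 0.0);
    return kb_per_track * (rpm / 60.0);
}

/* Time in ms to read `kb` kilobytes at the given rate
 * (no seeks and no track changes included). */
double transfer_time_ms(double kb, double rate_kb_per_s)
{
    assert(rate_kb_per_s > 0.0);
    return 1000.0 * kb / rate_kb_per_s;
}
```

Dividing `transfer_rate_kb_per_s(406.0, 7200.0)` by 1024 gives the MB/s figure for the Barracuda 180 example; `transfer_time_ms(4.0, rate)` then estimates the time to read one 4 KB block at that rate.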

Disk Access Time: Other Delays
• There are several other factors which might introduce additional delays:
  - CPU time to issue and process I/O
  - contention for controller
  - contention for bus
  - contention for memory
  - verifying block correctness with checksums (retransmissions)
  - waiting in scheduling queue
  - ...
• Typical values: "0" (except, perhaps, for waiting in the queue)

Writing and Modifying Blocks
• A write operation is analogous to a read operation
  - must add time for block allocation
  - a complication occurs if the write operation has to be verified: we must wait another rotation and then read the block to see if it is the block we wanted to write
  - Total write time ≈ read time (+ time for one rotation)
• A modification operation is similar to read and write operations
  - cannot modify a block directly:
    * read the block into main memory
    * modify the block
    * write the new content back to disk
    * (verify the write operation)
  - Total modify time ≈ read time (+ time to modify) + write time

Disk Controllers
• To manage the different parts of the disk, we use a disk controller, which is a small processor capable of:
  - controlling the actuator moving the head to the desired track
  - selecting which platter and surface to use
  - knowing when the right sector is under the head
  - transferring data between main memory and disk
• New controllers act like small computers themselves
  - both disk and controller now have their own buffer, reducing disk access time
  - data on damaged disk blocks/sectors are just moved to spare room on the disk; the system above (OS) does not know this, i.e., a block may lie elsewhere than the OS thinks

Efficient Secondary Storage Usage
• Must take into account the use of secondary storage
  - there are large access time gaps, i.e., a disk access will probably dominate the total execution time
  - there may be huge performance improvements if we reduce the number of disk accesses
  - a "slow" algorithm with few disk accesses will probably outperform a "fast" algorithm with many disk accesses
• Several ways to optimize (technique: typical choice) ...
  - block size: 4 KB
  - file management / data placement: various
  - disk scheduling: SCAN derivative
  - multiple disks: a specific RAID level
  - prefetching: read-ahead prefetching
  - memory caching / replacement algorithms: LRU variant
  - ...

Disk Scheduling

Disk Scheduling
• Seek time is a dominant factor of total disk I/O time
• Let the operating system or disk controller choose which request to serve next depending on the head's current position and the requested block's position on disk (disk scheduling)
• Note that disk scheduling ≠ CPU scheduling
  - a mechanical device: hard to determine (accurate) access times
  - disk accesses cannot/should not be preempted: they run until they finish
  - disk I/O is often the main performance bottleneck
• General goals
  - short response time
  - high overall throughput
  - fairness (equal probability for all blocks to be accessed in the same time)
• Tradeoff: seek and rotational delay vs. maximum response time

Disk Scheduling
• Several traditional algorithms
  - First-Come-First-Serve (FCFS)
  - Shortest Seek Time First (SSTF)
  - SCAN (and variations)
  - LOOK (and variations)
  - ...
• A LOT of different algorithms exist depending on the expected access pattern

First-Come-First-Serve (FCFS)
FCFS serves the first arriving request first:
• Long seeks
• "Short" average response time
incoming requests (in order of arrival, denoted by cylinder number): 12 14 2 7 21 8 24
[Figure: scheduling queue and head position (cylinder number, 1 to 25) over time when the requests are served in arrival order]

SCAN
SCAN (elevator) moves the head edge to edge and serves requests on the way:
• bi-directional
• compromise between response time and seek time optimizations
• several optimizations: C-SCAN, LOOK, C-LOOK, ...
incoming requests (in order of arrival): 12 14 2 7 21 8 24
[Figure: scheduling queue and head position (cylinder number, 1 to 25) over time when the head sweeps across the disk]

SCAN vs. FCFS
incoming requests (in order of arrival): 12 14 2 7 21 8 24
• Disk scheduling makes a difference!
• In this case, we see that SCAN requires much less head movement compared to FCFS (here 37 vs. 75 tracks)
[Figure: head position (cylinder number, 1 to 25) over time for FCFS and for SCAN]
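The comparison can be reproduced in spirit with a small sketch. It makes several assumptions that the transcript does not confirm: all requests are already queued when scheduling starts, the head starts at a cylinder chosen by the caller, and the sweep uses the LOOK variant (reversing at the last pending request instead of the disk edge) and moves upward first. Because the start position and arrival timing behind the 37 vs. 75 figures are not recoverable here, the totals from this sketch will not match them exactly.

```c
#include <assert.h>
#include <stdlib.h>

/* Total cylinders traveled when serving requests strictly in arrival order. */
long fcfs_head_movement(int start, const int *req, size_t n)
{
    long total = 0;
    int pos = start;
    for (size_t i = 0; i < n; i++) {
        total += labs((long)req[i] - pos);
        pos = req[i];
    }
    return total;
}

/* LOOK variant of SCAN: sweep upward to the highest pending request,
 * then reverse and sweep down to the lowest one. */
long look_head_movement(int start, const int *req, size_t n)
{
    assert(n > 0);
    int lo = req[0], hi = req[0];
    for (size_t i = 1; i < n; i++) {
        if (req[i] < lo) lo = req[i];
        if (req[i] > hi) hi = req[i];
    }
    long up = hi > start ? hi - start : 0;     /* upward sweep */
    int turn = hi > start ? hi : start;        /* where the head reverses */
    long down = lo < start ? turn - lo : 0;    /* downward sweep, if needed */
    return up + down;
}
```

Feeding both functions the request list 12, 14, 2, 7, 21, 8, 24 with the same start cylinder shows the same qualitative result as the slide: the sweep covers far fewer cylinders than serving in arrival order.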

Modern Disk Scheduling
• Disks used to be simple devices, and disk scheduling used to be performed by the OS (file system or device driver) only ...
• ... but new disks are more complex. They
  - hide their true layout, e.g.,
    * only logical block numbers are exposed
    * the real number of surfaces, cylinders, sectors, etc. differs from the OS view
  - transparently move blocks to spare cylinders, e.g., due to bad disk blocks
  - have different zones
    * Constant angular velocity (CAV) disks: constant rotation speed, equal amount of data in each track, thus constant transfer time
    * Zoned CAV disks: constant rotation speed; zones are ranges of tracks; typically few zones; the different zones hold different amounts of data, i.e., more on the outer tracks; thus variable transfer time
    * NB! the illustration shows transfer time, not rotation speed
  - have heads that accelerate, whereas most algorithms assume linear movement overhead
  - have on-device buffer caches that may use read-ahead prefetching
  => are "smart", with a built-in low-level scheduler (usually a SCAN derivative)
  => cannot be fully controlled by us (black box)
• The OS could (should?) focus on high-level scheduling only
[Figures: OS view vs. real view of the disk layout; seek time vs. cylinders traveled (1 to N, x to roughly 10x - 20x); disk with on-device buffer]

Memory Caching

Data Path (Intel Hub Architecture)
[Figure: data path between application, file system, communication system, disk, and network card through the Pentium 4 processor (registers, cache(s)), the memory controller hub with RDRAM, the I/O controller hub, and the PCI slots]

Buffer Caching
[Figure: application on top of a cache ("caching possible"), above the file system and communication system, above disk and network card ("expensive")]
How do we manage a cache?
• how much memory to use?
• how much data to prefetch?
• which data item to replace?
• how to do lookups quickly?
• ...

Buffer Caching: Windows XP
• The I/O manager performs caching
  - centralized facility for all components (not only file data)
• I/O request processing:
  1. I/O request from process
  2. I/O manager forwards to cache manager
  - in cache:
    3. cache manager locates the data and copies it to the process buffer via VMM
    4. VMM notifies process
  - on disk:
    3. cache manager generates a page fault
    4. VMM makes a non-cached service request
    5. I/O manager makes request to file system
    6. file system forwards to disk
    7. disk finds data
    8. data is read into the cache
    9. cache manager copies to process buffer via VMM
    10. virtual memory manager notifies process
[Figure: process above the kernel; in the kernel: I/O manager, cache manager, virtual memory manager (VMM), file system drivers, disk drivers]

Buffer Caching: Linux / Unix
• The file system performs caching
  - caches disk data (blocks) only
  - may hint on caching decisions
  - prefetching
• I/O request processing:
  1. I/O request from process
  2. virtual file system forwards to local file system
  3. local file system finds requested block number
  4. requests block from buffer cache
  5. data located ...
     - ... in cache:
       a. return buffer memory address
     - ... on disk:
       a. make request to disk driver
       b. data is found on disk and transferred to buffer
       c. return buffer memory address
  6. file system copies data to process buffer
  7. process is notified
[Figure: process above the kernel; in the kernel: virtual file system over local file systems (FAT32 (Windows), Linux ext2fs, HFS (Macintosh), ...), buffers, and disk drivers]

File Systems

Files??
• A file is a collection of data, often for a specific purpose
  - unstructured files, e.g., Unix and Windows
  - structured files, e.g., MacOS (to some extent) and MVS
• In this course, we consider unstructured files
  - for the operating system, a file is only a sequence of bytes
  - it is up to the application/user to interpret the meaning of the bytes
  - simpler file systems

File Systems
• File systems organize data in files and manage access regardless of device type, e.g.:
  - file management: providing mechanisms for files to be stored, referenced, shared, secured, ...
  - auxiliary storage management: allocating space for files on secondary storage
  - file integrity mechanisms: ensuring that information is not corrupted and holds the intended content only
  - access methods: providing methods to access stored data
  - ...

Organizing Files - Directories
• A system usually has a large number of different files
• To organize and quickly locate files, file systems use directories
  - a directory contains no data itself
  - it is a file containing the names and locations of other files
  - several types
    * single-level (flat) directory structure
    * hierarchical directory structure

Single-level Directory Systems
[Figure: a root directory containing four files]
• CP/M
  - Microcomputers
  - Single user system
• VM
  - Host computers
  - "Minidisks": one partition per user

Hierarchical Directory Systems
• Tree structure
  - nodes = directories, root node = root directory (/)
  - leaves = files
• Directories
  - stored on disk
  - attributes just like files
  - subdirectories need names
• To access a file
  - must test all directories in the path for
    * existence
    * being a directory
    * permissions
  - similar tests on the file itself
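The per-directory tests listed above can be mimicked from user space with POSIX calls. This is only an illustrative sketch: the kernel performs these checks itself while resolving a path, and the function name `check_path_dirs` is hypothetical. It also simplifies permissions, since `access()` checks against the real user ID.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Check every directory on an absolute path: it must exist, be a directory,
 * and be searchable. The final component (the file itself) is not checked.
 * Returns 0 on success, otherwise an errno value. */
int check_path_dirs(const char *path)
{
    char dir[4096];
    struct stat st;
    size_t len = strlen(path);

    assert(path[0] == '/' && len < sizeof dir);
    for (size_t i = 0; i < len; i++) {
        if (path[i] != '/')
            continue;
        size_t n = (i == 0) ? 1 : i;   /* "/" itself, else the text before the slash */
        memcpy(dir, path, n);
        dir[n] = '\0';
        if (stat(dir, &st) == -1)
            return errno;              /* does not exist or cannot be reached */
        if (!S_ISDIR(st.st_mode))
            return ENOTDIR;            /* exists but is not a directory */
        if (access(dir, X_OK) == -1)
            return errno;              /* no search permission */
    }
    return 0;
}
```

For a path such as /cdrom/doc/Howto, the loop tests "/", "/cdrom", and "/cdrom/doc" in turn, which is why deep paths cost more lookups than shallow ones.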

Hierarchical Directory Systems
• Windows: one tree per partition or device
[Figure: separate trees for Device C and Device D; Device C contains WINNT, which contains EXPLORER.EXE]
Complete filename example: C:\WinNT\EXPLORER.EXE

Hierarchical Directory Systems
• Unix: single acyclic graph spanning several devices
[Figure: tree with root /, containing cdrom, containing doc, containing Howto]
Complete filename example: /cdrom/doc/Howto

INF 1060: Introduction to Operating Systems and Data Communication
Operating Systems: Storage: Disks & File Systems (cnt'd)
Pål Halvorsen
19/10 - 2005

File & Directory Operations
• File: create, delete, open, close, read, write, append, seek, get/set attributes, rename, link, unlink, ...
• Directory: create, delete, opendir, closedir, readdir, rename, link, unlink, ...
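As a counterpart to the file operations, the directory operations opendir, readdir, and closedir can be exercised with the POSIX `<dirent.h>` interface. A minimal sketch; the helper name `count_dir_entries` is made up for illustration.

```c
#include <assert.h>
#include <dirent.h>
#include <stddef.h>

/* Count the entries in a directory (including "." and ".." where the
 * file system reports them). Returns -1 if the directory cannot be opened. */
long count_dir_entries(const char *path)
{
    assert(path != NULL);
    DIR *d = opendir(path);            /* opendir */
    if (d == NULL)
        return -1;
    long n = 0;
    while (readdir(d) != NULL)         /* readdir: one entry per call */
        n++;
    closedir(d);                       /* closedir */
    return n;
}
```

Each `readdir()` call returns a `struct dirent` whose `d_name` field holds the entry name; the sketch only counts entries, but a listing tool would print that field.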

Example: open(), read() and close()
    #include <fcntl.h>      /* open(), O_RDONLY */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>     /* read(), close() */

    #define BUFSIZE 4096    /* assumed value; the slide does not define it */

    int main(void)
    {
        int fd, n;
        char buffer[BUFSIZE];
        char *buf = buffer;

        if ((fd = open("my.file", O_RDONLY, 0)) == -1) {
            printf("Cannot open my.file!\n");
            exit(1);    /* EXIT_FAILURE */
        }
        while ((n = read(fd, buf, BUFSIZE)) > 0) {
            <<USE DATA IN BUFFER>>
        }
        close(fd);
        exit(0);    /* EXIT_SUCCESS */
    }

Open (BSD example)
open(name, mode, perm) -> system call handling as described earlier -> sys_open() -> vn_open():
1. Check if valid call
2. Allocate file descriptor
3. If file exists, open for read. Otherwise, create a new file. Must get directory inode. May require disk I/O.
4. Set access rights, flags and pointer to vnode
5. Return index to file descriptor table (fd)

Read (BSD example)
read(fd, *buf, len) -> system call handling as described earlier -> sys_read() -> dofileread() -> (*fp_read==vn_read)():
1. Check if valid call and mark file as used
2. Use file descriptor as index in file table to find corresponding file pointer
3. Use data pointer in file structure to find vnode
4. Find current offset in file
5. Call local file system: VOP_READ(vp, len, offset, ...)

Read
• VOP_READ(...) is a pointer to a read function in the corresponding file system, e.g., Fast File System (FFS)
VOP_READ(vp, len, offset, ...) -> READ():
1. Find corresponding inode
2. Check if valid call: file size vs. len + offset
3. Loop and find corresponding blocks
   • find logical blocks from inode, offset, length
   • do block I/O, fill buffer structure, e.g., bread(...) -> bio_doread(...) -> getblk(vp, blkno, size, ...)
   • return and copy block to user

Read
getblk(vp, blkno, size, ...):
1. Search for block in buffer cache, return if found (hash vp and blkno and follow linked hash list)
2. Get a new buffer (LRU, age)
3. Call disk driver, VOP_STRATEGY(bp), and sleep or do something else
4. Reorganize LRU chain and return buffer
[Figure: buffer cache holding blocks A to L; block M is requested]
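The lookup-then-replace logic of getblk() can be sketched in a few lines. This is a deliberately simplified stand-in, not the BSD code: it scans a tiny fixed array instead of following a hash list, tracks recency with a logical clock instead of an LRU chain, and skips the disk driver call. All names (`NBUF`, `struct buf_sketch`, `getblk_sketch`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NBUF 4                       /* hypothetical cache size */

struct buf_sketch {
    int valid;                       /* does the buffer hold a block? */
    long blkno;                      /* which block it holds */
    unsigned long last_used;         /* logical clock value for LRU */
};

static struct buf_sketch cache[NBUF];    /* zero-initialized: all invalid */
static unsigned long clock_tick;

/* Return the buffer for blkno; *hit reports whether it was already cached.
 * On a miss the least recently used buffer is recycled (a real kernel would
 * now call the disk driver and sleep until the data arrives). */
struct buf_sketch *getblk_sketch(long blkno, int *hit)
{
    struct buf_sketch *victim = &cache[0];

    assert(hit != NULL);
    for (size_t i = 0; i < NBUF; i++) {
        if (cache[i].valid && cache[i].blkno == blkno) {
            cache[i].last_used = ++clock_tick;   /* step 1: found in cache */
            *hit = 1;
            return &cache[i];
        }
        if (cache[i].last_used < victim->last_used)
            victim = &cache[i];                  /* remember the oldest buffer */
    }
    victim->valid = 1;                           /* step 2: reuse LRU buffer */
    victim->blkno = blkno;
    victim->last_used = ++clock_tick;            /* step 4: now most recent */
    *hit = 0;
    return victim;
}
```

Unused buffers keep `last_used == 0`, so they are always chosen before any block that has actually been referenced is evicted.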

Read
• VOP_STRATEGY(...) is a pointer to the corresponding driver depending on the hardware, e.g., SCSI
VOP_STRATEGY(bp) -> sdstrategy(...) -> sdstart(...):
1. Check buffer parameters, size, blocks, etc.
2. Convert to raw block numbers
3. Sort requests according to SCAN: disksort_blkno(...)
4. Start device and send request

Read
[Figure: inode with file attributes and data pointers; one data pointer leads to block M on disk]

Read
• Interrupt to notify end of disk I/O
• Kernel may awaken sleeping process
getblk() continues:
1. Search for block in buffer cache, return if found (hash vp and blkno and follow linked hash list)
2. Get a new buffer (LRU, age)
3. Call disk driver, sleep or do something else
4. Reorganize LRU chain (not shown) and return buffer
[Figure: buffer cache holding blocks A to L; block M has now been read into a buffer]

Read
READ():
1. Find corresponding inode
2. Check if valid call: file size vs. len + offset
3. Loop and find corresponding blocks
   • find logical blocks from inode, offset, length
   • do block I/O, e.g., bread(...) -> bio_doread(...) -> getblk()
   • return and copy block to user
[Figure: block M is copied from the operating system to the application's buffer]

Management of File Blocks
[Figure: file metadata consisting of file attributes followed by data pointers to the file's blocks]

Management of File Blocks
• Many files consist of several blocks
  - relate blocks to files
  - how to locate a given block
  - maintain order of blocks
• Approaches
  - chaining in the media
  - chaining in a map
  - table of pointers
  - extent-based allocation

Chaining in the Media
[Figure: metadata pointing to the first of a chain of file blocks]
• Metadata points to chain of used file blocks
• Free blocks may also be chained
  (-) expensive to search (random access)
  (-) must read block by block

Chaining in a Map
[Figure: metadata points into a map; the map chains the entries that correspond to the file blocks]

FAT Example
• FAT: File Allocation Table
• Versions FAT12, FAT16, FAT32
  - the number indicates the number of bits used to identify blocks in a partition (2^12, 2^16, 2^32)
  - FAT12: block sizes 512 bytes - 8 KB: max 32 MB partition size
  - FAT16: block sizes 512 bytes - 64 KB: max 4 GB partition size
[Figure: partition layout: boot sector, FAT 1, FAT 2 (backup), root directory, other directories and files. FAT entries 0000 - 0009 are shown with the values … 0000 0003 0004 FFFF 0006 0008 FFFF 0000 …; the corresponding blocks are labeled empty, File 1, File 2, File 3, File 2, empty, empty, empty]
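Following a file through the allocation table amounts to walking a linked list stored in an array. The sketch below assumes a simplified FAT16-style table in which FFFF marks the end of a chain and 0000 marks a free block, as in the figure; real FAT16 reserves further values, and the function name and table contents used here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FAT_EOF  0xFFFFu   /* end-of-chain marker, as in the figure */
#define FAT_FREE 0x0000u   /* unused block */

/* Follow a chain starting at block `first`, storing block numbers in `out`.
 * Returns the number of blocks in the file, or 0 on a corrupt chain. */
size_t fat_chain(const uint16_t *fat, size_t fat_len, uint16_t first,
                 uint16_t *out, size_t out_cap)
{
    size_t count = 0;
    uint16_t cur = first;

    assert(fat != NULL && out != NULL);
    while (cur != FAT_EOF) {
        if (cur >= fat_len || fat[cur] == FAT_FREE || count == out_cap)
            return 0;              /* out of range, free block, or loop */
        out[count++] = cur;
        cur = fat[cur];            /* the entry names the next block */
    }
    return count;
}
```

Reading block k of a file therefore costs k table lookups, but no disk I/O as long as the table itself is cached in memory, which is the advantage over chaining in the media.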

Table of Pointers
[Figure: metadata points to a table of pointers; each pointer refers to one file block]
  (+) good random and sequential access
  (+) main structure small, extra blocks if needed
  (-) uses one indirect block regardless of size
  (-) can be too small

Unix/Linux Example: FFS, UFS, ...
• inode contents:
  - mode, owner, ...
  - direct block 0, direct block 1, ..., direct block 10, direct block 11
  - single indirect
  - double indirect
  - triple indirect
• Flexible block size, e.g., 4 KB
• ca. 1000 entries per index block
[Figure: direct pointers lead straight to data blocks; the indirect pointers lead through one, two, or three levels of index blocks to data blocks]
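How many index blocks must be read to reach a given block follows from the inode layout. The sketch below uses the 12 direct pointers listed above and takes the number of pointers per index block as a parameter; using 1024 for a 4 KB block assumes 4-byte block pointers, which is an assumption consistent with "ca. 1000 entries". Function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define NDIRECT 12u    /* direct block 0..11 */

/* Number of index blocks to traverse for logical block `lbn`:
 * 0 = direct, 1 = single, 2 = double, 3 = triple indirect, -1 = too large.
 * `ptrs` is the number of pointers per index block. */
int inode_indirection_level(uint64_t lbn, uint64_t ptrs)
{
    assert(ptrs > 0);
    if (lbn < NDIRECT) return 0;
    lbn -= NDIRECT;
    if (lbn < ptrs) return 1;
    lbn -= ptrs;
    if (lbn < ptrs * ptrs) return 2;
    lbn -= ptrs * ptrs;
    if (lbn < ptrs * ptrs * ptrs) return 3;
    return -1;
}

/* Largest file, in blocks, that the inode can address. */
uint64_t inode_max_blocks(uint64_t ptrs)
{
    return NDIRECT + ptrs + ptrs * ptrs + ptrs * ptrs * ptrs;
}
```

Small files are served entirely from the direct pointers, while each additional level of indirection adds one more block read before the data block itself can be fetched.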

Extent-based Allocation
• Observation: indirect block reads introduce disk I/O and break access locality
[Figure: metadata, a list of extents, and the file blocks, with extents numbered 1, 3, 2]
(+) faster block allocation (many at a time)
(+) higher performance reading large data elements
(+) less file system metadata
(+) fewer lookups when reading a file
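A list of extents resolves a file-relative block number with a short scan of the metadata and no index block reads. The following is a minimal sketch; the `Extent` type, its field names and the example values are assumptions made for illustration.

```python
from typing import NamedTuple


class Extent(NamedTuple):
    start: int    # first physical block of the run
    length: int   # number of contiguous blocks in the run


def physical_block(extents: list[Extent], logical_block: int) -> int:
    """Map a file-relative block number to a physical block number."""
    if logical_block < 0:
        raise IndexError("negative block number")
    offset = logical_block
    for extent in extents:
        if offset < extent.length:
            return extent.start + offset
        offset -= extent.length
    raise IndexError("logical block beyond end of file")


# Hypothetical file made of three extents, listed in file order.
file_extents = [Extent(100, 4), Extent(300, 2), Extent(50, 3)]
print([physical_block(file_extents, i) for i in range(9)])
```

Nine blocks are described by three metadata entries here; a table of pointers would need nine.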

Linux Example: XFS, JFS, …
• Count-augmented address indexing in the extent sections
• Introduce a new inode structure
  - add a counter field to the original direct entries
    - direct points to a disk block
    - count indicates how many other blocks follow the first block (contiguously)
[Figure: inode with attributes, twelve direct/count entry pairs (direct 0 / count 0 … direct 11 / count 11) pointing to data, and single, double and triple indirect pointers]
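The count field can be illustrated by compressing a plain list of block numbers into (direct, count) pairs, where count is the number of additional blocks that follow the first one contiguously. This is a sketch of the idea only, not the on-disk format of XFS or JFS; the function names are hypothetical.

```python
def to_direct_entries(blocks: list[int]) -> list[tuple[int, int]]:
    """Compress block numbers into (direct, count) pairs.

    count is the number of blocks that follow `direct` contiguously,
    so a single isolated block gets count 0.
    """
    entries: list[tuple[int, int]] = []
    for block in blocks:
        if entries and block == entries[-1][0] + entries[-1][1] + 1:
            first, count = entries[-1]
            entries[-1] = (first, count + 1)   # extend the current run
        else:
            entries.append((block, 0))         # start a new run
    return entries


def expand(entries: list[tuple[int, int]]) -> list[int]:
    """Inverse operation: list every block covered by the entries."""
    return [b for first, count in entries
            for b in range(first, first + count + 1)]


blocks = [10, 11, 12, 13, 40, 41, 7]
entries = to_direct_entries(blocks)
print(entries)
print(expand(entries) == blocks)
```

Seven blocks fit in three entries in this example, which leaves more of the twelve direct slots free before any indirect block is needed.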

Windows Example: NTFS
• Each partition contains a master file table (MFT)
  - a linear sequence of 1 KB records
  - each record describes a directory or a file (attributes and disk addresses)
  - the first 16 records are reserved for NTFS metadata
[Figure: MFT records for one file: record 24 is the base record, holding the record header, standard info, file name, data header and info about data blocks (run 1, run 2, run 3, …); record 26 is the first extension record and record 27 the second extension record, holding further runs up to run k, followed by unused space]
A file can be …
• stored within the record (immediate file, < few 100 B)
• represented by disk block addresses (which hold data): runs of consecutive blocks (<addr, no>, like extents)
• spread over several records if more runs are needed

Recovery & Journaling
• When data is written to a file, both metadata and data must be updated
  - metadata is written asynchronously, data may be written earlier
  - if a system crashes, the file system may be corrupted and data may be lost
• Journaling file systems provide improved consistency and recoverability
  - they keep a log to track changes
  - the log can be used to undo partially completed operations
  - e.g., ReiserFS, JFS, XFS and ext3 (all Linux)
  - NTFS (Windows) provides journaling properties where all changes to the MFT and file system structure are logged
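The idea of using a log to undo partially completed operations can be shown with a toy model. The sketch below keeps the "disk" and the log in memory, records the old contents of a block before each change, and on recovery rolls back every change whose transaction never committed. It is an illustration only: the class and method names are hypothetical, it assumes transactions do not overlap on the same block, and it does not describe how any of the file systems named above lay out their journals.

```python
class ToyJournal:
    """In-memory toy: 'disk' is a dict, the log stores old values for undo."""

    def __init__(self) -> None:
        self.disk: dict[str, str] = {}   # block name -> contents
        self.log: list[tuple[int, str, str | None]] = []
        self.committed: set[int] = set()

    def write(self, txn: int, block: str, new_value: str) -> None:
        # Log the old value first (None if the block did not exist),
        # then apply the change.
        self.log.append((txn, block, self.disk.get(block)))
        self.disk[block] = new_value

    def commit(self, txn: int) -> None:
        self.committed.add(txn)

    def recover(self) -> None:
        """Undo, newest first, every change of an uncommitted transaction."""
        for txn, block, old in reversed(self.log):
            if txn in self.committed:
                continue
            if old is None:
                self.disk.pop(block, None)
            else:
                self.disk[block] = old
        self.log = [entry for entry in self.log if entry[0] in self.committed]


fs = ToyJournal()
fs.write(1, "inode 7", "size=1")
fs.write(1, "block 42", "hello")
fs.commit(1)
fs.write(2, "inode 7", "size=2")   # crash: the matching data block is never written
fs.recover()
print(fs.disk)
```

After recovery the inode again describes a one-block file, so metadata and data agree even though the second operation was interrupted.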

The End: Summary

Summary
• Disks are the main persistent secondary storage device
• The main bottleneck is often disk I/O performance due to disk mechanics: seek time and rotational delays
• Much work has been done to optimize disk performance
  - scheduling algorithms try to minimize seek overhead (most systems use SCAN derivatives)
  - memory caching can save disk I/Os
  - additionally, there are many other techniques (e.g., block sizes, placement, prefetching, striping, …)
  - the world today is more complicated (different access patterns, unknown disk characteristics, …)
  - new disks are "smart", so we cannot fully control the device
• File systems provide
  - file management: store, share, access, …
  - storage management: management of physical storage
  - access methods: functions to read, write, seek, …
  - …