SE292 High Performance Computing File Systems Sathish Vadhiyar

FILE SYSTEMS
• What is a file?
  - Storage that continues to exist beyond the lifetime of the program (persistent)
  - Named sequence of bytes stored on disk

Moving-head Disk Mechanism (figure)

About Disks
• Platter: metal disk covered with magnetic material
• Multiple platters rotate together on a common spindle
• Read/write head: electromagnet used to read and write data
• Tracks: concentric circular recording paths on a platter surface
• Sector/block: unit of a track that is read or written
• Each head is mounted on a disk arm, which is attached to an actuator
• Cylinder: all tracks associated with a given actuator position
• Our view of the disk: a linear address space of fixed-size sectors/blocks numbered from 0 upward

Other Disk Components
• Disk drive is connected to the computer by an I/O bus
• Data transfers on the bus are carried out by special processors: the host controller on the host side and the disk controller on the disk side

Disk Performance
• Transfer rate: rate of data flow between disk drive and computer (few megabytes per second)
  - Data is transferred between memory and disk in units of blocks; each block consists of one or more sectors
• Seek time/latency: time to move the disk arm to the desired cylinder (few milliseconds)
• Rotational time/latency: time for the desired sector to rotate into position under the head (few milliseconds)
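
The three components above add up to the time needed to service one request. A rough sketch of that arithmetic follows; the drive parameters (5 ms seek, 7200 RPM, 50 MB/s, 4 KB request) are assumed values chosen for illustration, not figures from the slides.

```python
def access_time_ms(seek_ms, rpm, transfer_mb_per_s, request_kb):
    """Estimate the time to service one disk request, in milliseconds."""
    # On average the desired sector is half a rotation away from the head.
    rotation_ms = 60_000 / rpm
    rotational_latency_ms = rotation_ms / 2
    # Time to stream the requested bytes once the head is in position.
    transfer_ms = (request_kb / 1024) / transfer_mb_per_s * 1000
    return seek_ms + rotational_latency_ms + transfer_ms


# Assumed drive parameters (illustrative only).
t = access_time_ms(seek_ms=5.0, rpm=7200, transfer_mb_per_s=50.0, request_kb=4)
print(f"{t:.2f} ms")
```

With these assumed numbers the seek and rotational delays dominate: the transfer itself takes under 0.1 ms of the roughly 9.2 ms total, which is why small random requests are expensive.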

Disk Attachment
• Host-attached: DVD, CD, hard disk, connected by special buses and protocols
  - Protocols: SATA, SCSI (they differ in number of disk drives supported, address space, and speed of transfers)
• Network-attached: e.g. NFS
• Storage Area Network
  - Prevents storage traffic from interfering with other network traffic
  - Uses a specialized network

Operations on Files
• fd = open(name, operation)
• fd = creat(name, mode)
• status = close(fd)
• bytecount = read(fd, buffer, bytecount)
• bytecount = write(fd, buffer, bytecount)
• offset = lseek(fd, offset, whence)
• status = link(oldname, newname)
• status = unlink(name)
• status = stat(name, buffer)
• status = chown(name, owner, group)
• status = chmod(name, mode)
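
Most of the calls listed above can be tried from Python, whose `os` module wraps the same system calls. The sketch below assumes a POSIX-like system with hard-link support; the file names and contents are made up for the example.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "demo.txt")    # hypothetical file name
alias = os.path.join(workdir, "alias.txt")  # hypothetical second name

# open() with O_CREAT plays the role of creat(name, mode).
fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o644)
written = os.write(fd, b"hello, file system")  # returns the byte count
os.lseek(fd, 7, os.SEEK_SET)                   # move the offset to byte 7
data = os.read(fd, 4)                          # read 4 bytes from there
os.close(fd)

os.link(path, alias)              # second name for the same file
links = os.stat(path).st_nlink    # link count reported by stat
os.chmod(path, 0o600)             # owner read/write only
os.unlink(alias)                  # remove one name; the file survives
size = os.stat(path).st_size

os.unlink(path)                   # remove the last name
os.rmdir(workdir)
```

Note that `unlink` removes a name, not necessarily the file: the data is freed only when the last link is gone.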

Common File Access Patterns
• Sequential access: bytes of the file are read in order from start to finish
• Random access: bytes of the file are read in some (random) order

File System Design Issues
• Disk management: efficient use of disk space
• Name management: how users select files for use
• Protection: of files from users

Disk Management Issues
1. Allocation: How are disk blocks associated with a file?
2. Arm scheduling: Which disk I/O request should be sent to the disk next?
   - FCFS, Shortest Seek Time First (SSTF), SCAN, C-SCAN
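
The arm scheduling policies named above can be compared by the total head movement they cause. The sketch below is one possible formulation: the request queue, the starting cylinder 53, and the 0 to 199 cylinder range are assumed example values, SCAN is taken to sweep toward cylinder 0 first and travel to the disk edge, and C-SCAN is taken to sweep upward and to count the return seek.

```python
def fcfs(head, requests):
    """Total head movement when requests are served in arrival order."""
    total = 0
    for cyl in requests:
        total += abs(cyl - head)
        head = cyl
    return total


def sstf(head, requests):
    """Always serve the pending request closest to the current position."""
    pending = list(requests)
    total = 0
    while pending:
        nearest = min(pending, key=lambda c: abs(c - head))
        total += abs(nearest - head)
        head = nearest
        pending.remove(nearest)
    return total


def scan(head, requests, lowest=0):
    """SCAN variant: sweep down to cylinder `lowest`, then reverse upward."""
    if not requests:
        return 0
    total = head - lowest
    above = [c for c in requests if c > head]
    if above:
        total += max(above) - lowest
    return total


def c_scan(head, requests, lowest=0, highest=199):
    """C-SCAN variant: sweep up to `highest`, jump to `lowest`, sweep up again."""
    below = [c for c in requests if c < head]
    above = [c for c in requests if c >= head]
    if not below:
        return (max(above) - head) if above else 0
    return (highest - head) + (highest - lowest) + (max(below) - lowest)


queue = [98, 183, 37, 122, 14, 124, 65, 67]  # assumed example queue
for policy in (fcfs, sstf, scan, c_scan):
    print(policy.__name__, policy(53, queue))
```

Other textbooks define SCAN and C-SCAN slightly differently (for example, reversing at the last request instead of the disk edge), so the totals depend on the variant chosen.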

Disk Block Allocation: Contiguous
• File descriptor: OS structure that describes which blocks on disk represent a file
• File is stored in contiguous blocks on disk
  - File descriptor: first block address, file size
• Example
  - File 1: Size 4 blocks; Blocks 17, 18, 19, 20; descriptor: Start 17, Size 4
  - File 2: Size 6 blocks; Blocks 94, 95, 96, 97, 98, 99; descriptor: Start 94, Size 6

Disk Block Allocation: Linked
• Each block contains the disk address of the next block of the file
  - File descriptor: first block address
• Example
  - File 1: Size 4 blocks; Blocks 17, 84, 14, 99; descriptor: Start 17
  - Chain: 17 → 84 → 14 → 99 → nil
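
The practical difference between contiguous and linked allocation shows up when locating the i-th block of a file. A minimal sketch, using the block numbers from the two examples; the dictionary `next_block` is a stand-in for the pointer stored inside each disk block, and the function names are made up for illustration.

```python
def contiguous_block(start, size, i):
    """Contiguous allocation: the i-th block is found by simple arithmetic."""
    if not 0 <= i < size:
        raise IndexError("logical block out of range")
    return start + i


# Linked allocation: each block stores the address of the next (None = nil).
next_block = {17: 84, 84: 14, 14: 99, 99: None}


def linked_block(start, i):
    """Linked allocation: follow i pointers, one block read per step."""
    block = start
    for _ in range(i):
        block = next_block[block]
        if block is None:
            raise IndexError("logical block out of range")
    return block


print(contiguous_block(94, 6, 3))  # File 2, fourth block
print(linked_block(17, 2))         # File 1, third block
```

Random access is therefore cheap with contiguous allocation but requires walking the chain with linked allocation.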

FAT system
• File Allocation Table
• Combines linked and indexed ideas: the next-block pointers of linked allocation are gathered into one table indexed by block number
• A portion of the disk is used for the FAT
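
One way to picture the table: entry b holds the number of the block that follows block b in its file. The sketch below is a toy model, not the on-disk FAT format; the `FREE` and `EOF` marker values and the 128-entry table size are hypothetical, and the chain reuses the blocks 17, 84, 14, 99 from the linked allocation example.

```python
FREE, EOF = 0, -1  # hypothetical marker values for this toy table

fat = [FREE] * 128  # one entry per disk block (block 0 treated as reserved)
for block, nxt in [(17, 84), (84, 14), (14, 99), (99, EOF)]:
    fat[block] = nxt


def file_blocks(fat, start):
    """List a file's blocks by following the chain inside the table."""
    blocks = []
    block = start
    while block != EOF:
        blocks.append(block)
        block = fat[block]
    return blocks


print(file_blocks(fat, 17))
```

Because the whole chain lives in the table, it can be followed without reading the data blocks themselves, and the table can be cached in memory.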

Disk Block Allocation: Indexed
• File index is an array containing the addresses of the 1st, 2nd, etc. blocks of the file
  - File descriptor: the index
• Example: File 1: Size 4 blocks; Blocks 17, 84, 14, 99

  INDEX entry:   1   2   3   4
  Disk block:   17  84  14  99

• Problem: size of the index? Some schemes?

UNIX Version of Indexed Allocation
• File descriptor holds 12 disk block addresses of the file (entries 1 to 12)
• Indirect disk block address: address of a disk block containing more disk block addresses of the file
• Doubly indirect disk block address
• Triply indirect disk block address
• Assume disk block size: 1 KB, disk block address size: 4 B (so one block holds 256 addresses)
• Maximum file size: 9 KB + 256 KB + 256*256 KB + 256*256*256 KB (9 direct addresses, plus one indirect, one doubly indirect and one triply indirect address)
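
The maximum file size can be checked by counting how many data blocks each kind of address can reach. The sketch below assumes the 12 entries are split as 9 direct addresses plus one indirect, one doubly indirect and one triply indirect address (consistent with the 9 KB term on the slide); the function name is made up for illustration.

```python
BLOCK_SIZE = 1024  # bytes per disk block (from the slide)
ADDR_SIZE = 4      # bytes per disk block address (from the slide)
N_DIRECT = 9       # assumption: 12 entries = 9 direct + 3 indirect kinds

PTRS = BLOCK_SIZE // ADDR_SIZE  # addresses that fit in one block: 256

# Data blocks reachable directly and through 1, 2 and 3 levels of indirection.
REACH = [N_DIRECT, PTRS, PTRS ** 2, PTRS ** 3]
MAX_KB = sum(REACH) * BLOCK_SIZE // 1024


def indirection_level(logical_block):
    """Number of indirect blocks read before reaching the given data block."""
    for level, count in enumerate(REACH):
        if logical_block < count:
            return level
        logical_block -= count
    raise ValueError("beyond maximum file size")


print(MAX_KB, "KB")
print(indirection_level(8), indirection_level(9), indirection_level(265))
```

Under these assumptions the limit is a little over 16 GB, almost all of it contributed by the triply indirect address.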

Combined Scheme: UNIX (4 KB per block) (figure)
• Pointers can occupy significant space
• Performance can be improved: disk controller cache, buffer cache

Name Management
• Issues: How do users refer to files? How does the OS find a file, given a name?
• Directory: mapping between file name and file descriptor
  - Could have a single directory for the whole disk, or a separate directory for each user
  - UNIX: tree-structured directory hierarchy
• Directories are stored on disk like regular files
• Each contains (filename, i-number) pairs
• Each contains an entry with name . for itself (and .. for its parent)
• Special (nameless) directory called the root
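
Finding a file from its path name amounts to repeated lookups in (filename, i-number) tables, starting at the root. The sketch below is a toy in-memory model: the i-numbers, the names, and the function `namei` are hypothetical, and a real file system would read each directory from disk.

```python
ROOT_INO = 2  # hypothetical i-number of the root directory

# i-number -> directory (name -> i-number) or file contents (bytes).
inodes = {
    2: {".": 2, "..": 2, "home": 3},
    3: {".": 3, "..": 2, "notes.txt": 4},
    4: b"file contents",
}


def namei(path):
    """Translate an absolute path name to an i-number, one component at a time."""
    ino = ROOT_INO
    for name in path.split("/"):
        if not name:  # skip the empty pieces produced by leading or double "/"
            continue
        directory = inodes[ino]
        if not isinstance(directory, dict):
            raise NotADirectoryError(path)
        if name not in directory:
            raise FileNotFoundError(path)
        ino = directory[name]
    return ino


print(namei("/home/notes.txt"))
print(namei("/home/.."))
```

The `.` and `..` entries need no special handling here: they are ordinary pairs in the directory, which is what makes relative navigation work.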

Protection
• Objective: to prevent accidental or intentional misuse of a file system
• Aspects of a protection mechanism
  - User identification (authentication)
  - Authorization determination: determining what the user is entitled to do
  - Access enforcement
• UNIX
  - 3 sets of 3 access permission bits in each descriptor (read, write, execute for owner, group, and others)
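
The UNIX permission check can be sketched as picking one of the three sets of bits and testing the requested right. This is a simplified model: it ignores the superuser and supplementary groups, and the function name, user ids and group ids are made up for the example.

```python
import stat


def may_access(mode, file_uid, file_gid, uid, gid, want):
    """Check one right ('r', 'w' or 'x') against the 9 permission bits."""
    # Exactly one set applies: owner first, then group, then others.
    if uid == file_uid:
        shift = 6
    elif gid == file_gid:
        shift = 3
    else:
        shift = 0
    bits = (mode >> shift) & 0o7
    needed = {"r": 4, "w": 2, "x": 1}[want]
    return bool(bits & needed)


mode = 0o640  # rw- for owner, r-- for group, --- for others
print(stat.filemode(stat.S_IFREG | mode))
print(may_access(mode, 1000, 100, 1000, 100, "w"))  # owner writing
print(may_access(mode, 1000, 100, 2000, 100, "w"))  # group member writing
```

Only one set of bits is consulted per request, so an owner whose bits deny access is refused even if the group or others bits would allow it.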

File System Structure
• Layered file system structure consisting of the following layers (top to bottom)
• Logical file system
  - Contains inodes or file control blocks (FCBs); an FCB contains information about a file, including ownership, permissions, and location
• File organization
  - Translation between logical and physical blocks
• Basic file system
  - Manages buffers and caches
• I/O control
  - Contains device drivers

File System Implementation
• On disk: FCB (contains pointers to blocks)
• In memory: system-wide open file table and per-process file table (thus 2 tables)
• Operations on a file use a pointer to an entry in the per-process file table
• This entry is referred to as the file descriptor
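
The two-table arrangement can be observed from user code. On POSIX systems the current file offset is kept in the shared open-file entry, so two descriptors that refer to the same entry share an offset, while a second `open` of the same file gets its own. The sketch below assumes that behaviour; the temporary file and its contents are made up for the example.

```python
import os
import tempfile

tmp_fd, path = tempfile.mkstemp()
os.write(tmp_fd, b"0123456789")
os.close(tmp_fd)

fd_a = os.open(path, os.O_RDONLY)
fd_b = os.dup(fd_a)                # new descriptor, same open-file entry
fd_c = os.open(path, os.O_RDONLY)  # separate open, separate entry

os.read(fd_a, 4)                   # advances the offset of fd_a's entry

shared_offset = os.lseek(fd_b, 0, os.SEEK_CUR)       # follows fd_a
independent_offset = os.lseek(fd_c, 0, os.SEEK_CUR)  # unaffected

for fd in (fd_a, fd_b, fd_c):
    os.close(fd)
os.unlink(path)

print(shared_offset, independent_offset)
```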

In-Memory File System Structures (figures: file open, file read)

UNIX I/O Kernel Structure (figure)

Life Cycle of An I/O Request (figure)

File System Performance Ideas
• Caching or buffering
  - System keeps in main memory a disk cache of recently used disk blocks
  - Could be managed using an LRU-like policy
• Pre-fetching
  - If a file is being read sequentially, a few blocks can be read ahead from the disk
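
Both ideas can be combined in a small block cache. The sketch below is a toy model under stated assumptions: the class `BufferCache`, its parameters, and the fake `read_block` function are all hypothetical, the cache uses strict LRU eviction, and read-ahead is triggered on every miss (a real system would first detect a sequential pattern).

```python
from collections import OrderedDict


class BufferCache:
    """Toy disk block cache with LRU eviction and fixed read-ahead."""

    def __init__(self, capacity, read_block, readahead=0):
        self.capacity = capacity
        self.read_block = read_block  # function that fetches a block from "disk"
        self.readahead = readahead
        self.blocks = OrderedDict()   # oldest entry first, newest last
        self.hits = 0
        self.misses = 0

    def _load(self, block_no):
        data = self.read_block(block_no)
        self.blocks[block_no] = data
        self.blocks.move_to_end(block_no)
        while len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict least recently used
        return data

    def get(self, block_no):
        if block_no in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block_no)  # mark as recently used
            return self.blocks[block_no]
        self.misses += 1
        data = self._load(block_no)
        # Pre-fetch the next few blocks in anticipation of sequential reads.
        for nxt in range(block_no + 1, block_no + 1 + self.readahead):
            if nxt not in self.blocks:
                self._load(nxt)
        return data


disk_reads = []


def fake_read_block(n):
    disk_reads.append(n)
    return f"block{n}"


cache = BufferCache(capacity=4, read_block=fake_read_block, readahead=2)
results = [cache.get(n) for n in range(6)]  # sequential read of blocks 0..5
print(cache.hits, cache.misses, disk_reads)
```

In this sequential run only two requests wait for the disk; the other four are satisfied from blocks that were read ahead.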

Memory Mapped Files
• Traditional open/lseek/read/write/close are inefficient due to system calls and data copying
• Alternative: map the file into the process virtual address space
• Access file contents using memory addresses
• Can result in a page fault if that part of the file is not in memory
• Applications can access and update data in the file directly and in place (instead of using seeks)
• System call: mmap(addr, len, prot, flags, fd, off)
• Some OSs: cat, cp use mmap for file access
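
A short sketch of in-place access through a mapping, using Python's `mmap` module (a wrapper around the `mmap` system call). The temporary file and its contents are made up for the example; note that a mapping cannot change the file's length, so the replacement has the same size as the bytes it overwrites.

```python
import mmap
import os
import tempfile

fd, path = tempfile.mkstemp()
os.write(fd, b"hello world")
os.close(fd)

with open(path, "r+b") as f:
    # Length 0 maps the whole file into the address space.
    with mmap.mmap(f.fileno(), 0) as m:
        first = bytes(m[0:5])  # read through memory, no read() call
        m[6:11] = b"there"     # update in place, no lseek()/write()
        m.flush()              # push modified pages back to the file

with open(path, "rb") as f:
    contents = f.read()
os.unlink(path)

print(first, contents)
```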

Asynchronous I/O
• Objective: allow the programmer to write a program so that the process can perform I/O without blocking
• E.g.: SunOS aioread, aiowrite library calls
  - aioread(fd, buff, numbytes, offset, whence, result)
• Reads numbytes of data from fd into buff, starting at the position specified by whence and offset
• The buffer should not be referenced until the operation has completed; until then it is in use by the OS
• Notification of completion may be obtained through aiowait, or asynchronously by handling the signal SIGIO

Two I/O Methods (figures: synchronous, asynchronous)

Blocking and Non-Blocking I/O
• Blocking: the process is moved from the ready queue to a wait queue; execution of the application is suspended
• Non-blocking: computation and I/O overlap, e.g. by using threads
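
Overlapping computation and I/O with threads can be sketched as follows. The example hands a blocking read to a helper thread so that the main thread keeps computing; the file, its size, and the computation are placeholders chosen for illustration.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def read_file(path):
    """Ordinary blocking read, executed in a helper thread."""
    with open(path, "rb") as f:
        return f.read()


fd, path = tempfile.mkstemp()
os.write(fd, b"x" * 1_000_000)
os.close(fd)

with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(read_file, path)         # I/O starts in the background
    partial = sum(i * i for i in range(100_000))  # main thread keeps computing
    data = future.result()                        # wait only when data is needed

os.unlink(path)
print(len(data), partial)
```

Only the helper thread blocks on the read; the application as a whole continues to make progress until it actually needs the data.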

DMA (Direct Memory Access)
• It is wasteful for the CPU to engage in I/O between device and memory
• Many systems have a special-purpose processor called the DMA controller
• CPU writes the "I/O details" to memory
• CPU sends the address of these details to the DMA controller
• Thereafter, the DMA controller carries out the transfer of data between device and memory
• Once complete, the DMA controller informs (interrupts) the CPU

Six Step Process to Perform DMA Transfer (figure)

RAID
• Redundant Array of Independent Disks (RAID): multiple disks used to improve performance and reliability
• Performance
  - Striping can be used to increase simultaneous access and transfer rate
• Reliability
  - MTBF (Mean Time Between Failures) of the array decreases with more disks
  - Hence data has to be stored redundantly
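
Both points can be illustrated with a little arithmetic. The sketch below assumes block-level striping with consecutive blocks placed on consecutive disks, and independent disk failures; the function names and the 100,000-hour MTBF figure are assumptions made for the example.

```python
def stripe_location(logical_block, n_disks):
    """Block-level striping: return (disk index, block index on that disk)."""
    return logical_block % n_disks, logical_block // n_disks


def array_mtbf(disk_mtbf_hours, n_disks):
    """Approximate time to the first failure among n independent disks."""
    return disk_mtbf_hours / n_disks


# Blocks 0..7 spread over 4 disks: neighbours land on different disks,
# so a large sequential read can use all disks at once.
layout = [stripe_location(b, 4) for b in range(8)]
print(layout)

# Assumed single-disk MTBF of 100,000 hours, array of 100 disks.
print(array_mtbf(100_000, 100), "hours")
```

With these assumed figures some disk in the array fails about every 1,000 hours (roughly six weeks), which is why striping alone is not enough and redundancy is needed.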
