Chapter 16 - File Systems

– Persistent storage: storage that will continue to exist after a program that uses or creates it completes.
– Sometimes called secondary storage, since the devices commonly used to store permanent objects are farther down the storage hierarchy. Examples: disks, CD-ROMs, tapes, etc.
– Disks are user-unfriendly (imagine having to access info by sector numbers only).

• Files and File Systems
  – A file system provides a convenient way for users to manage their data.
  – A file is a sequence of bytes of arbitrary length.
  – Files are implemented by the operating system to provide persistent storage.
  – A file system provides a way of storing, naming and protecting files.
  – Accessing file data occurs through layers: file system interface (system calls), device driver interface and disk hardware interface (Figure 16.1).
  – By tracing the file concept down the line you can see what each part manages (Figure 16.2).

• Files and File Systems
  – A useful abstraction is typical in a file system:
    • File name space (directory structure): human-readable strings
    • File data space: actual data blocks of the files
  – Note how the system calls reflect this abstraction:
    • open() maps a string to a local file ID.
    • read(), write(), close() use the local file ID, not a string.
    • Other (UNIX) file system calls that deal with the file name space also use strings (mkdir(), unlink(), etc.).
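
The split between name space and data space can be seen from Python's os module, which is a thin wrapper over the UNIX calls named above. This is a minimal sketch; the file name notes.txt and the function name are hypothetical.

```python
import os
import tempfile


def demo_name_vs_data_space():
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "notes.txt")  # name space: a string

    # open() maps the string to a small integer, the local file ID.
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)

    # Data-space calls take the file ID, never the name.
    os.write(fd, b"hello")
    os.lseek(fd, 0, os.SEEK_SET)  # move the file position back to the start
    data = os.read(fd, 5)
    os.close(fd)

    # Name-space calls take strings again.
    os.unlink(path)
    os.rmdir(workdir)
    return fd, data
```

Once the file is open, renaming it would not disturb the reads and writes, because they go through the file ID rather than the string.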

• Logical File Structure
  – Files are referenced by the operating system in three possible atomic forms (Figure 16.4):
    • Bytes (“flat file”) - typical of UNIX
    • Fixed-length records - think: tapes
    • Variable-length records - think: database records
  – In general, older OSes tended to support multiple file formats. The trend is towards treating files as mere byte streams, letting the application layers decide how to impose structure. High-end database servers can even manage the drive directly, bypassing the file system altogether (Figure 16.5).
  – Files vary from 0 bytes to very large (usually limited by the “native math” of the machine’s registers).
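
As an illustration of an application layer imposing structure on a flat byte stream, the sketch below stores fixed-length records in plain bytes. The record layout (a 4-byte id plus a 16-byte name) is an assumption made up for this example.

```python
import struct

# Hypothetical record layout: 4-byte unsigned id + 16-byte padded name.
RECORD = struct.Struct("<I16s")


def pack_records(rows):
    """Turn (id, name) pairs into one flat byte string."""
    return b"".join(RECORD.pack(rid, name.encode()) for rid, name in rows)


def read_record(flat, index):
    """Fetch record number `index`; the OS only ever sees byte offsets."""
    offset = index * RECORD.size
    rid, raw = RECORD.unpack_from(flat, offset)
    return rid, raw.rstrip(b"\0").decode()
```

Because every record has the same size, record i always starts at byte i * RECORD.size, so no help from the operating system is needed to find it.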

• Logical File Structure
  – 32 bits allows for a theoretical 4 GB file (assuming byte-addressed files). Notice on xi, however, that the “df” command displays some file systems that “break” the 32-bit barrier :)
      df -k /real/barracuda9
  – As discussed earlier, files also have metadata (name, type, size, owner, group(s), permissions, timestamps, disk & data block pointers, etc.).
  – UNIX: stat() and fstat() return metadata; the ls command displays them.
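
A small sketch of both points: the 4 GB figure is just 2^32 bytes, and stat() (by name) and fstat() (by open file ID) return the same metadata. The function and constant names here are illustrative only.

```python
import os
import tempfile

# Largest size addressable with a 32-bit byte offset: 2**32 bytes = 4 GiB.
MAX_32BIT_FILE = 2**32


def file_metadata():
    with tempfile.NamedTemporaryFile() as f:
        f.write(b"x" * 1000)
        f.flush()
        by_name = os.stat(f.name)      # stat(): look up metadata by path
        by_fd = os.fstat(f.fileno())   # fstat(): look up metadata by file ID
        return by_name.st_size, by_name.st_ino == by_fd.st_ino
```

Both calls reach the same file descriptor on disk, which is why the inode numbers match.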

• File Naming
  – Virtually all modern OSes use a hierarchical file naming system.
  – Note that the separator character distinguishing path components is different (/ = UNIX, \ = DOS/Win, : = Macintosh).
  – Different limits exist as to what are legal file names and how long they can be:
    • DOS FAT filesystem: case insensitive; “8+3”/component.
    • Win95 FAT: kludge of the first order; up to 255 chars.
    • WinNT NTFS: case insensitive; up to 255 chars.
    • UNIX: case sensitive; from 14 (old limit) to 255 chars/component.
    • Macintosh: case insensitive (even though stored sensitively); up to 31 chars/component.

• File Naming
  – Tree example: Figure 16.6. Note the presence of an alias used to connect children of different parents. In UNIX this is done with the ln command, which has two types of links: hard and soft (aka symbolic). In Win this is called a shortcut; Macintoshes call it an alias.
  – Absolute path name: full path from the “root” of the file system tree (UNIX: “/” prefix; DOS/Win: “\” prefix). Note that for DOS/Win the drive letters represent roots of separate trees.
  – Current (working) directory: allows use of relative path names by use of an absolute prefix. Displayed via the pwd command in UNIX and CD in DOS/Win.
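
The difference between the two kinds of UNIX link can be sketched with os.link() (hard) and os.symlink() (soft), the calls behind ln and ln -s. The file names are hypothetical, and the sketch assumes a file system that supports both link types.

```python
import os
import tempfile


def link_demo():
    d = tempfile.mkdtemp()
    target = os.path.join(d, "target.txt")
    hard = os.path.join(d, "hard.txt")
    soft = os.path.join(d, "soft.txt")
    with open(target, "w") as f:
        f.write("data")

    os.link(target, hard)     # hard link: a second name for the same file
    os.symlink(target, soft)  # soft link: a small file holding a path string

    same_inode = os.stat(target).st_ino == os.stat(hard).st_ino
    link_count = os.stat(target).st_nlink
    soft_is_link = os.path.islink(soft)

    os.unlink(target)                        # remove the original name
    hard_survives = os.path.exists(hard)     # data still reachable
    soft_dangles = not os.path.exists(soft)  # stored path no longer resolves

    os.unlink(hard)
    os.unlink(soft)
    os.rmdir(d)
    return same_inode, link_count, soft_is_link, hard_survives, soft_dangles
```

A hard link shares the file itself, so the data survives until the last name is removed; a soft link only remembers a path, so it breaks when that path goes away.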

• File Naming
  – Note how the current working directory obeys the locality model we saw in the memory chapters: file objects that are used by a program tend to hang around together.
  – The hierarchical file system can allow for variations (Figure 16.7), but it can be dangerous (16.7-c).
  – UNIX (shells and web servers, actually) uses the ~ character in a path name to indicate the home directory of a particular user (~jtbauer == /home/cs46/jtbauer or wherever my home may be).
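
Since the ~ shorthand is handled by programs such as the shell rather than by the file system, it amounts to a string rewrite done before any system call. Below is a rough sketch of such a rewrite; the function name, the home_dirs table and the home path are illustrative assumptions, and Python's own os.path.expanduser() does the real thing.

```python
def expand_tilde(path, home_dirs, current_user):
    """Replace a leading ~ or ~user with that user's home directory."""
    if not path.startswith("~"):
        return path
    head, sep, rest = path.partition("/")  # head is "~" or "~user"
    user = head[1:] or current_user        # bare "~" means the current user
    home = home_dirs.get(user)
    if home is None:
        return path                        # unknown user: leave path untouched
    return home + sep + rest
```

For example, with home_dirs = {"jtbauer": "/home/cs46/jtbauer"}, the path ~jtbauer/notes becomes /home/cs46/jtbauer/notes.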

• File Naming Conventions
  – File naming conventions are sometimes a necessary part of the operating system semantics (.COM and .EXE files in DOS/Win) or merely a set of conventions (most UNIX file extensions).
  – Typical extensions exist for a variety of OSes, programs and applications: .c, .txt, .s, .OBJ, .o, .a, .LIB, .EXE, .COM, .tex, .gif, .jpg, .mov, .avi, .ps, .Z, .gz, .mif, .DOC, .h, .cpp, .c++, .pas, etc.!
• File system operations
  – Figures 16.8, 16.9 & 16.10 categorize file system operations into three groups: operations on files, operations on open files & operations on directories.

• File System Implementation
  – File systems are typically layered, to provide useful abstractions at various levels.
  – Figure 16.11 diagrams typical file system data structures:
    • The process descriptor contains an open file pointer array, used to point a process’s open files to entries in the open file table.
    • The open file table is a system-wide, OS-managed table of entries for all opened files. It typically contains:
      – Current file position
      – File status info (R/W, locks, file type, etc.)
      – Pointer to the file descriptor/device driver/pipe data structure
    • The file descriptor table is an in-memory copy of disk-resident file descriptors.

• File System Implementation
  – The file descriptor table points to information about a particular file:
    • owner, file protection info, timestamps, location on disk
  – Note that the disk drive also contains other information:
    • File system info
    • File descriptors
    • Directories
    • File data
  – In some file systems these data structures are intermingled.
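
The three levels of data structure described above can be modelled in a few lines. This is only a toy sketch: the class and field names are invented for illustration and do not come from Figure 16.11.

```python
from dataclasses import dataclass, field


@dataclass
class FileDescriptor:
    """In-memory copy of a disk-resident file descriptor."""
    owner: str
    mode: int
    blocks: list  # location of the file's data on disk


@dataclass
class OpenFile:
    """One entry in the system-wide open file table."""
    descriptor: FileDescriptor
    position: int = 0
    writable: bool = False


@dataclass
class Process:
    """Process descriptor holding the open file pointer array."""
    open_files: list = field(default_factory=list)

    def open(self, descriptor, writable=False):
        self.open_files.append(OpenFile(descriptor, 0, writable))
        return len(self.open_files) - 1  # index = local file ID
```

Opening the same file twice yields two open file table entries with independent positions, both pointing at one shared file descriptor.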

• File System Implementation
  – Control/data flow for open() (Figure 16.12) and read() (Figure 16.13):
    • Left-hand side shows the data structures involved.
    • Right-hand side shows the flow of control through the file system layers.
  – Notice the distinction between the logical file system and the physical file system:
    • The logical file system deals with logical byte offsets, logical blocks and logical block numbers in a disk-independent fashion.
    • The physical file system deals with physical blocks on actual disk drives.
  – Notice that memory caching is used to improve performance.
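
The logical-to-physical step can be worked as arithmetic. The sketch below assumes a 512-byte block and a per-file list of physical block numbers; both are illustrative assumptions, not values from the text.

```python
BLOCK_SIZE = 512  # assumed block size in bytes


def locate(byte_offset, block_map, block_size=BLOCK_SIZE):
    """Map a logical byte offset to (logical block, physical block, offset)."""
    # Logical file system: pure arithmetic, independent of any disk.
    logical_block, offset_in_block = divmod(byte_offset, block_size)
    # Physical file system: look up where that block actually lives.
    physical_block = block_map[logical_block]
    return logical_block, physical_block, offset_in_block
```

With block_map = [40, 17, 93], byte 1300 falls in logical block 2 (1300 // 512), which is physical block 93, at offset 276 within the block.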

• File System Implementation
  – Physical file systems connect to the appropriate I/O system, identified by a device number.
  – A device switch (jump table) maps the device number to the address of the corresponding device driver (Figure 16.14).
  – UNIX-style operating systems use special files to address the device drivers (try “ls -l /dev”).
  – The fork() system call duplicates the parent’s open files (Figure 16.15).
  – Other system calls modify various parts of the file system data structure, depending on the operation being performed (Figs. 16.16-16.18).
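
A device switch is essentially a table of function addresses indexed by device number. The following sketch mimics that with a dictionary of functions; the device numbers and driver names are made up for illustration.

```python
def disk_read(block):
    return f"disk: block {block}"


def tape_read(block):
    return f"tape: block {block}"


# Hypothetical device numbers mapped to their driver entry points.
DEVICE_SWITCH = {3: disk_read, 9: tape_read}


def device_read(device_number, block):
    driver = DEVICE_SWITCH[device_number]  # jump-table lookup
    return driver(block)                   # indirect call into the driver
```

The physical file system never names a driver directly; it only carries the device number, so adding a driver means adding a table entry.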

• File System Implementation
  – Notice how use of the VM system’s page tables allows for copy avoiding (Fig. 16.19).
  – File system directory implementation:
    • Maps component names to file descriptors.
    • Sometimes the FDs are in the directory, other times the directory contains pointers to the FDs elsewhere.
    • Name/path resolution algorithm: Figure 16.20.
    • Notice that typically directories are implemented as files.
    • UNIX: directory contains name to inode number mappings; the inode (information node) is the UNIX term for a file descriptor and it contains the metadata of the file. Try “od -cx dirname” and “ls -i” on UNIX.
  – Skip section 16.6 (Example File System Implementation).
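
A path can be resolved by walking one directory per component, each directory mapping names to inode numbers. The sketch below is a simplified illustration of that idea, not a reproduction of Figure 16.20; the tree contents and the root inode number are assumptions.

```python
ROOT_INODE = 2  # assumed root inode number for this toy tree

# Hypothetical tree: directory inode -> {component name: inode number}.
DIRECTORIES = {
    2: {"home": 11, "etc": 12},
    11: {"jtbauer": 21},
    12: {},
    21: {"notes.txt": 30},
}


def resolve(path, directories=DIRECTORIES, root=ROOT_INODE):
    """Walk an absolute path one component at a time; return its inode."""
    inode = root
    for component in path.strip("/").split("/"):
        if not component:
            continue  # the path was just "/"
        entries = directories.get(inode)
        if entries is None:
            raise NotADirectoryError(component)  # tried to descend into a file
        if component not in entries:
            raise FileNotFoundError(component)
        inode = entries[component]
    return inode
```

Here resolve("/home/jtbauer/notes.txt") visits inodes 2, 11 and 21 before returning 30, one directory lookup per path component.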