File Layer and Virtual File System Chapter Three

Digital UNIX Internals II

Topics
- File System Abstractions
- File System Layers
- The File Layer
- The Virtual File System
- Selected File Related Calls

UNIX File Abstraction
- The File
  - Stream of bytes
    - any record structure is imposed by application
  - Sequential or Random Access
- The Directory Structure
  - Tree-like directory hierarchy
  - File sharing
    - hard links - multiple names for same disk file
    - soft (symbolic) links - stored path shortcut
  - Access control associated with the file

File Related System Calls
- open(), close()
- creat(), unlink()
- read(), write()
- seek()
- getattr(), setattr()
- mmap()
- ioctl()
- fsync()
- dup(), dup2()
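As a rough illustration of how several of the calls listed above fit together, the following user-space sketch creates a file, writes to it, rewinds, reads the data back through a duplicated descriptor, and removes the file. The function name `roundtrip` and its parameters are hypothetical and not part of the course material. The slide says seek(); the POSIX call used here is lseek().

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write msg to a new file at path, read it back
 * through a dup()'d descriptor, then remove the file.
 * Returns the number of bytes read, or -1 on error. */
ssize_t roundtrip(const char *path, const char *msg, char *out, size_t outsz)
{
    assert(path && msg && out);
    size_t len = strlen(msg);
    if (len > outsz)
        return -1;

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    ssize_t n = -1;
    int fd2 = -1;
    if (write(fd, msg, len) == (ssize_t)len   /* advances the file offset */
        && fsync(fd) == 0                     /* push data to stable storage */
        && lseek(fd, 0, SEEK_SET) == 0        /* rewind to the start */
        && (fd2 = dup(fd)) >= 0)              /* second descriptor, same open file */
        n = read(fd2, out, outsz);

    if (fd2 >= 0)
        close(fd2);
    close(fd);
    unlink(path);                             /* remove the directory entry */
    return n;
}
```

The read through the duplicate starts at offset 0 only because the rewind on the original descriptor moved the offset that both descriptors share.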

File Descriptor
- Application's name for an open file
  - small integer returned by open()
- The first three file descriptors are
  - 0 -- standard input
  - 1 -- standard output
  - 2 -- standard error
  - These are usually associated with a terminal
- Each has an associated offset or file position pointer

Types of Files
- Regular
- Directory
- Block Special (Device) File
- Character Special (Device) File
- FIFO (Named Pipe)
- Symbolic Link
- Socket (In AF_UNIX Domain)
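From user space, the file types above can be distinguished with the POSIX file-type macros applied to the st_mode field returned by lstat(). The sketch below is illustrative only; the helper names `file_type_name` and `path_type_name` are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <sys/stat.h>

/* Map an st_mode value to a label for its file type. */
const char *file_type_name(mode_t mode)
{
    if (S_ISREG(mode))  return "regular";
    if (S_ISDIR(mode))  return "directory";
    if (S_ISBLK(mode))  return "block special";
    if (S_ISCHR(mode))  return "character special";
    if (S_ISFIFO(mode)) return "fifo";
    if (S_ISLNK(mode))  return "symbolic link";
    if (S_ISSOCK(mode)) return "socket";
    return "unknown";
}

/* lstat() rather than stat(), so a symbolic link is reported as a
 * link instead of as the file it points to. */
const char *path_type_name(const char *path)
{
    struct stat st;
    assert(path);
    return lstat(path, &st) == 0 ? file_type_name(st.st_mode) : "error";
}
```

For example, `path_type_name("/")` reports a directory, and a path that does not exist yields "error".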

UNIX Disk Abstraction
- Partitions
  - Subsets of the disk that may be treated as logical disk drives.
- Partitioning a Large Disk
  - Overcomes 32-bit UNIX limit problems
  - Isolates directories
  - Decreases fsck time
- disklabel utility writes/edits disk label
- Partition identified by a special file
  - block: /dev/disk/dsk[number][partition_letter]
  - character: /dev/rdisk/dsk[number][partition_letter]

UNIX File System Abstraction
- Two Senses
  - A mountable directory hierarchy administered in /etc/fstab.
  - A specific implementation of the UNIX file abstraction (UFS, NFS, AdvFS, CDFS, etc).
- One file system is the root file system
- Other file systems are grafted onto the root by mounting.

File System Calls
- mount(), unmount()
- sync()

The Virtual File System: Transparent Access
[Figure: DOS drives A:, B:, and C: contrasted with a UNIX tree built from partitions rz0a, rz0g, and rz3c]

The Virtual File System: Uniform Access
[Figure: an application process issues common system calls (open(), close(), read(), write(), seek()) to the VFS, which passes each call to the specific file system type's implementation of the call]

File System Management Layers
  System Call          read(), write() etc.
  File Layer           Manage file access state for a given process
  Virtual File System  Represent filesystem and files generically
  File System(s)       Specific file system implementation: UFS, AdvFS, NFS, MFS, etc.
  Cache                In-memory block storage for a file system. Could be traditional buffer cache, unified buffer cache or home grown.
  Device               Local block device, network interface or a logical volume.

Digital UNIX File Systems
- "True" Data File Systems
  - UNIX File System (UFS)
  - Network File System (NFS)
  - Advanced File System (AdvFS)
  - Memory File System (MFS)
  - CD File System, ISO 9660:1988 (CDFS)
  - Universal Disk Format (UDF) -- DVDFS
- Pseudo-File Systems or Layers
  - Proc File System (procfs)
  - File Descriptor File System (FDFS)
  - File-on-File System

CDFS
- Compact Disk File System
- Support for common extensions
  - ISO 9660 Standard with Rock Ridge Extensions
  - Joliet (Microsoft) extensions
  - Multi-session (Kodak) CD format
- Can be exported by NFS

CDFS: ISO 9660 Layout
- ISO 9660 layout consists of
  - Primary and Secondary volume descriptors (a.k.a. super blocks)
  - Path Tables
  - Directory and File Data
- Directory records contain
  - Location of file or directory
  - Size
  - Length of extended attribute record (XAR)
  - Interleave attributes
  - Flags
  - File Name

CDFS: Interleaved and Noninterleaved Data Layout
[Figure: non-interleaved layout showing XAR, gap, and data regions]
XAR contents:
- UID and GID
- Access Permissions
- Creation/Modification dates

Memory File System (MFS)
- Memory Only - No Permanent Storage
- UFS format (in-memory)
- Created with newfs
- not wired - backed by swap
- use: fast temporary directories
  - system /tmp
  - build areas
  - etc.

MFS and swap
[Figure: physical memory and swap]

The /proc File System
- The /proc file system is useful for process tracing or debugging utilities, such as truss or dbx
- Structures used by the /proc file system include:
    prstatus   Status of a traced task or thread
    prrun      Actions to be taken before a stopped task or thread is run
    prpsinfo   Information reported by ps

File on File Mounting FS
- Layer allowing mounting on a regular file of:
  - regular files
  - character device files
  - block special device files
- Provided for SVID Conformance
- FIFOs are given names as files
- see fattach(3) and fdetach(3)

File Layer and VFS Structures
[Figure: relationships among the proc, utask, uthread, and session structures (cdir, rdir, ttyvp), the descriptor entries uf_entry[ ][ ].ufe_ofile, the file structure (f_data), UNIX domain sockets, the VFS vnode and mount structures, and the UFS inode and ufs_mount structures]

Per Process File Descriptor “table”
- Relates a task to an open file
- Referenced through the utask structure
  - two-level tree structures
- Beginning with V5.0
  - entries are allocated when
    - a file is opened or a pipe or socket is created
    - inherited in a fork()
    - a descriptor is copied via dup()
  - entries are deallocated when
    - a file, pipe or socket is closed
    - a process terminates

File structure
- Records state of access to a file
  - Access mode (R/W)
  - Offset into file
- Two tasks may share a single file structure
  - File structures inherited by child processes
- Two uses of the file structure
  - for regular files
    - includes an ops vector for manipulating regular files
    - includes a pointer to a vnode
  - for sockets
    - includes an ops vector for manipulating sockets
    - includes a pointer to a socket
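Because the offset lives in the shared file structure rather than in the descriptor, it can be observed from user space: a write through a dup()'d descriptor, or by a forked child, moves the offset seen by the original descriptor. The sketch below demonstrates this on any POSIX system; the function name `shared_offset_demo` is hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the parent's file offset after a 3-byte write through a
 * dup()'d descriptor and a 2-byte write by a forked child.
 * Returns -1 on error. */
off_t shared_offset_demo(const char *path)
{
    assert(path);
    off_t pos = -1;
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    int fd2 = dup(fd);                      /* new descriptor, same file structure */
    if (fd2 >= 0 && write(fd2, "abc", 3) == 3) {
        pid_t pid = fork();                 /* child inherits the file structure */
        if (pid == 0)
            _exit(write(fd, "de", 2) == 2 ? 0 : 1);
        int status = 0;
        if (pid > 0 && waitpid(pid, &status, 0) == pid
            && WIFEXITED(status) && WEXITSTATUS(status) == 0)
            pos = lseek(fd, 0, SEEK_CUR);   /* offset seen by the original descriptor */
    }
    if (fd2 >= 0)
        close(fd2);
    close(fd);
    unlink(path);
    return pos;
}
```

A second, independent open() of the same path would instead get its own file structure and therefore its own offset.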

File descriptor table within utask
- Substructure of utask
    struct ufile_state uu_file_state;
- Ultimately references file entry structures
    struct ufile_entry {
        struct file *ufe_ofile;
        struct socket_sel_queue *ufe_so_sel;
        int ufe_unused;
        int ufe_oflags;
        udecl_simple_lock_data(, ufe_ofile_lock)
    };

ufile_state structure (1)
    struct ufile_state {
        udecl_simple_lock_data(, uf_ofile_lock)
        int utask_need_to_lock;
        int uf_first_available;   /* First available file descriptor */
        int uf_of_count;          /* Number of overflow entries */
        int uf_flags;             /* Marks pending changes in file descriptor table */
        int uf_references;        /* Used to block table shrink */
        ...

ufile_state structure (2)
        ...
        /* Open file bit arrays -- indicates open file */
        u_long uf_open_bits_lvl1;
        u_long *uf_popen_bits_lvl0;
        u_long *uf_popen_bits_lvl1;
        u_long uf_open_bits_lvl0;
        /* Pointers to the file entries */
        struct ufile_entry *uf_entry[U_FE_ARRAY_SIZE];
        struct ufile_entry **uf_of_entry;
    }
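The slides do not spell out how the level-0 and level-1 bit arrays are used. One plausible reading, offered here only as an assumption, is a summary bitmap: a level-1 bit records that a whole level-0 word is full, so finding the first available descriptor takes two short scans. The names `fd_bitmap`, `fd_alloc`, and `fd_free` below are hypothetical and this is not the kernel's code.

```c
#include <assert.h>
#include <stdint.h>

#define WORD_BITS 64
#define NWORDS    64                    /* 64 words x 64 bits = 4096 descriptors */

struct fd_bitmap {
    uint64_t lvl1;                      /* bit w set: lvl0[w] has no free slot */
    uint64_t lvl0[NWORDS];              /* bit b of word w set: fd w*64+b is open */
};

static int lowest_zero_bit(uint64_t x)
{
    for (int i = 0; i < WORD_BITS; i++)
        if (!(x & ((uint64_t)1 << i)))
            return i;
    return -1;                          /* every bit is set */
}

/* Allocate the lowest free descriptor, or return -1 if none is left. */
int fd_alloc(struct fd_bitmap *m)
{
    assert(m);
    int w = lowest_zero_bit(m->lvl1);   /* first word that still has room */
    if (w < 0)
        return -1;
    int b = lowest_zero_bit(m->lvl0[w]);
    m->lvl0[w] |= (uint64_t)1 << b;
    if (m->lvl0[w] == UINT64_MAX)       /* word just became full */
        m->lvl1 |= (uint64_t)1 << w;
    return w * WORD_BITS + b;
}

/* Mark a descriptor as closed; its word certainly has room again. */
void fd_free(struct fd_bitmap *m, int fd)
{
    assert(m && fd >= 0 && fd < WORD_BITS * NWORDS);
    m->lvl0[fd / WORD_BITS] &= ~((uint64_t)1 << (fd % WORD_BITS));
    m->lvl1 &= ~((uint64_t)1 << (fd / WORD_BITS));
}
```

Like open(), this allocator always hands out the lowest-numbered free descriptor, so a freed slot is reused before higher numbers are touched.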

file structure (1)
    struct file {
        udecl_simple_lock_data(, f_incore_lock)
        int f_flag;
        uint_t f_count;          /* reference count */
        int f_type;              /* descriptor type */
        int f_msgcount;          /* references from message queue */
        struct ucred *f_cred;    /* descriptor's credentials */
        struct fileops *f_ops;   /* operations on f_data */
        caddr_t f_data;          /* vnode or socket */
        ...

file structure (2)
        ...
        union {                  /* offset or next free file struct */
            off_t fu_offset;
            struct file *fu_freef;
        } f_u;
        uint_t f_io_lock;        /* I/O lock (lower half of thread ptr) */
        int f_io_waiters;        /* number of waiters on i/o lock */
    };

struct fileops
    struct fileops {
        int (*fo_read)();
        int (*fo_write)();
        int (*fo_ioctl)();
        int (*fo_select)();
        int (*fo_close)();
    };

struct fileops Implementations
Regular Files: vfs/vfs_vnops.c
    struct fileops vnops =
        { vn_read, vn_write, vn_ioctl, vn_select, vn_close };
Sockets: bsd/sys_socket.c
    struct fileops socketops =
        { soo_read, soo_write, soo_ioctl, soo_select, soo_close };
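The point of the two ops vectors is that the file layer never needs to know whether a descriptor refers to a vnode or a socket; it simply calls through f_ops. The user-space sketch below shows the idea with a single read operation. The signatures are simplified assumptions, the `toy_` names are hypothetical, and a plain string stands in for the vnode behind f_data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct file;
struct fileops {                          /* simplified: one operation only */
    long (*fo_read)(struct file *fp, char *buf, size_t n);
};
struct file {
    struct fileops *f_ops;                /* operations on f_data */
    void           *f_data;               /* vnode or socket in the kernel */
};

/* Toy "vnode" read: f_data is a NUL-terminated string acting as the file. */
static long toy_vn_read(struct file *fp, char *buf, size_t n)
{
    const char *src = fp->f_data;
    size_t len = strlen(src);
    if (len > n)
        len = n;
    memcpy(buf, src, len);
    return (long)len;
}

/* Toy "socket" read: pretend no data is queued. */
static long toy_soo_read(struct file *fp, char *buf, size_t n)
{
    (void)fp; (void)buf; (void)n;
    return 0;
}

static struct fileops toy_vnops     = { toy_vn_read };
static struct fileops toy_socketops = { toy_soo_read };

#define FOP_READ(fp, buf, n) ((*(fp)->f_ops->fo_read)((fp), (buf), (n)))

/* What a read() implementation does once it has found the file structure. */
long toy_read(struct file *fp, char *buf, size_t n)
{
    assert(fp && fp->f_ops && fp->f_ops->fo_read);
    return FOP_READ(fp, buf, n);
}
```

The caller of `toy_read` is identical for both kinds of file; only the vector installed when the file structure was created differs.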

Virtual File System
- Originally designed for UNIX by Sun Microsystems, Inc., to support the Network File System (NFS)
- Object-oriented support of multiple file system types:
  - struct vnode is a generic representation of a file for all types of file system implementations
  - struct mount is a generic representation of a whole mountable file system for all implementations
  - a file system implements its own set of:
    - member functions for vnodes and mount structures
    - data structures to combine with generic vnode and mount structures
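The "object-oriented" arrangement can be sketched in plain C: a generic vnode carries a pointer to its operations and a private part that only the owning file system interprets. In this sketch the operation signature is a simplified assumption (it just returns a size), the `toy_` types are hypothetical, and v_data is a pointer for brevity rather than the in-line placeholder the real structure uses.

```c
#include <assert.h>
#include <stddef.h>

struct vnode;
struct vnodeops {
    long (*vn_getattr)(struct vnode *vp);     /* simplified: returns file size */
};
struct vnode {
    struct vnodeops *v_op;                    /* member functions */
    void            *v_data;                  /* file-system-private data */
};
#define VOP_GETATTR(vp) ((*(vp)->v_op->vn_getattr)(vp))

struct toy_inode   { long i_size; };          /* hypothetical disk FS data */
struct toy_nfsnode { long n_cached_size; };   /* hypothetical network FS data */

static long toy_ufs_getattr(struct vnode *vp)
{
    return ((struct toy_inode *)vp->v_data)->i_size;
}

static long toy_nfs_getattr(struct vnode *vp)
{
    return ((struct toy_nfsnode *)vp->v_data)->n_cached_size;
}

static struct vnodeops toy_ufs_vnodeops = { toy_ufs_getattr };
static struct vnodeops toy_nfs_vnodeops = { toy_nfs_getattr };

/* Generic code: works for any file system type without knowing which. */
long file_size(struct vnode *vp)
{
    assert(vp && vp->v_op && vp->v_op->vn_getattr);
    return VOP_GETATTR(vp);
}
```

Each file system casts v_data back to its own type inside its own functions, so the generic layer never depends on a file system's private layout.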

struct vnode (1)
  <locks>           multiprocessor exclusion
  v_flag            vnode flags
  v_usecount        reference count of users
  v_holdcnt         page & buffer references
  lock counts       user-level lock counts
  v_lastr           last read (read-ahead)
  v_id              capability identifier
  v_type            vnode type
  v_tag             type of underlying data
  v_mount           ptr to vfs we are in
  v_mountedhere     ptr to mounted vfs
  v_op              vnode operations
  v_freef           vnode freelist forward
  v_freeb           vnode freelist back
  v_mountf          vnode mountlist forward
  v_mountb          vnode mountlist back
  v_buflists_lock   protect clean/dirty heads
  v_cleanblkhd      clean blocklist head
  v_dirtyblkhd      dirty blocklist head
  ...
[Figure: arrows from the pointer fields to the mount, vnodeops, vnode, and buf structures]

struct vnode (2)
  ...
  v_ncache_time         last cache activity
  v_free_time           time on vnode free_list
  v_output_lock         protect numoutput, outflag
  v_numoutput           num of writes in progress
  v_outflag             output flags
  v_cache_lookup_refs
  v_rdcnt               count of readers
  v_wrcnt               count of writers
  [field name lost]     snapshot count of dirty blocks
  v_dirtyblkpush        snapshot count of pushed blocks
  v_un                  ptr to sock, dev specinfo, pipe
  v_object              VM object for vnode
  v_secops              security ops
  v_data[ ]             placeholder, private data
[Figure: arrows from v_object and v_secops to the vm_object and vnsecops structures]

Types of Vnodes
  Type    Description
  -----   -----------------------------------
  VNON    Allocated, but as-yet untyped vnode
  VREG    Vnode representing a regular file
  VDIR    Directory vnode
  VBLK    Block device vnode
  VCHR    Character device vnode
  VLNK    Symbolic link vnode
  VSOCK   UNIX domain socket vnode
  VFIFO   FIFO special file vnode

struct vnodeops (1)
  Operation    Function
  vn_lookup    Looks up a file
  vn_create    Creates a regular file
  vn_mknod     Creates a fifo or device special file
  vn_open      Opens a file
  vn_close     Closes a file
  vn_access    Checks the access for a file
  vn_getattr   Gets file attributes
  vn_setattr   Sets file attributes
  vn_read      Reads a file
  vn_write     Writes to a file
  vn_ioctl     Controls a device
  vn_select    For synchronous I/O multiplexing

struct vnodeops (2)
  Operation     Function
  vn_mmap       Map memory of a character device
  vn_fsync      Synchronize file data and statistics
  vn_seek       Sets position on a file
  vn_remove     Removes a file
  vn_link       Creates a hard link to a file
  vn_rename     Renames a file
  vn_mkdir      Creates a directory
  vn_rmdir      Removes a directory
  vn_symlink    Creates a symbolic link to a file
  vn_readdir    Reads a directory
  vn_readlink   Reads contents of a symbolic link
  vn_abortop    Aborts operation

struct vnodeops (3)
  Operation     Function
  vn_inactive   Sets inactive
  vn_reclaim    Reclaims a vnode
  vn_bmap       Maps to file system block
  vn_strategy   Calls device strategy routine
  vn_print      Prints the contents of an inode
  vn_pgrd       Reads a page
  vn_pgwr       Writes a page
  vn_swap       Swap handler
  vn_bread      Reads buffer
  vn_brelse     Releases buffer
  vn_lockctl    Provides file locking
  vn_syncdata   Synchronizes range in open file

struct vnodeops (4)
  Operation        Function
  vn_lock          Locks an inode
  vn_unlock        Unlocks an inode
  vn_getproplist   Gets extended attributes
  vn_setproplist   Sets extended attributes
  vn_delproplist   Deletes extended attributes
  vn_pathconf      Checks path

struct mount
  m_lock             Lock for synchronization (SMP)
  m_flag             Flags
  m_funnel           Flag for SMP
  m_next             Next in mount list
  m_prev             Previous in mount list
  m_op               Operations on file system
  m_vnodecovered     Vnode we are mounted on
  m_mounth           List of vnodes this mount
  m_vlist_lock       Lock for vnode list
  m_exroot           Exported mapping for UID 0
  m_uid              UID of mounter
  m_stat             File system statistics
  m_data             Private data
  m_nfs_errmsginfo   NFS error information
  m_unmount_lock     Lock for synchronization
[Figure: arrows from the pointer fields to the mount, vfsops, and vnode structures]

struct vfsops
  Operation        Function
  vfs_mount        Mounts the file system
  vfs_start        Starts the file system
  vfs_unmount      Unmounts the file system
  vfs_root         Returns vnode for the root of the file system
  vfs_quotactl     Performs operations associated with quotas
  vfs_statfs       Updates file system statistics
  vfs_sync         Synchronizes the file system
  vfs_fhtovp       Returns the vnode pointer, given a file handle
  vfs_vptofh       Returns a file handle, given a vnode pointer
  vfs_init         Initializes the file system
  vfs_mountroot    Mounts the root file system
  vfs_smoothsync   Gently syncs the file system

VFS Switch Table
- Identifies file system types that have been implemented.
- Contains an entry point for file system operations for each supported file system type.
    struct vfsops *vfssw[MOUNT_MAXTYPE];

Setting Up File System Operations
[Figure: vfssw entries (NULLPTR, &ufs_vfsops, &nfs_vfsops); a mount structure's m_op points to the selected vfsops vector, whose members include ufs_mount, ufs_start, ufs_unmount, ufs_root, and ufs_quotactl]
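A minimal sketch of the setup step, under stated assumptions: the type number indexes the switch table, an empty slot means the file system type is not configured, and the selected vector is stored in m_op before the type-specific mount routine is called. The `toy_` names, the type constants, the small MOUNT_MAXTYPE value, and the single-argument vfs_mount signature are all hypothetical simplifications.

```c
#include <assert.h>
#include <stddef.h>

#define MOUNT_MAXTYPE 4                        /* small value for the sketch */
enum { TOY_MOUNT_UFS = 1, TOY_MOUNT_NFS = 2 }; /* hypothetical type numbers */

struct mount;
struct vfsops {
    int (*vfs_mount)(struct mount *mp);        /* simplified signature */
};
struct mount {
    struct vfsops *m_op;                       /* operations on file system */
    const char    *m_data;                     /* private data (a label here) */
};

static int toy_ufs_mount(struct mount *mp) { mp->m_data = "ufs"; return 0; }
static int toy_nfs_mount(struct mount *mp) { mp->m_data = "nfs"; return 0; }

static struct vfsops toy_ufs_vfsops = { toy_ufs_mount };
static struct vfsops toy_nfs_vfsops = { toy_nfs_mount };

/* The switch table: a NULL slot is a type that is not configured. */
static struct vfsops *vfssw[MOUNT_MAXTYPE] = {
    NULL, &toy_ufs_vfsops, &toy_nfs_vfsops, NULL
};

/* Returns 0 on success, -1 if the type is out of range or not configured. */
int toy_mount(struct mount *mp, int type)
{
    assert(mp);
    if (type < 0 || type >= MOUNT_MAXTYPE || vfssw[type] == NULL)
        return -1;
    mp->m_op = vfssw[type];                    /* bind operations to this mount */
    return (*mp->m_op->vfs_mount)(mp);         /* the VFS_MOUNT step */
}
```

After a successful `toy_mount`, every later operation on that mount goes through m_op without consulting the table again.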

Mounted File System Structures
[Figure: rootfs heads the mount table, whose entries are linked by m_next and m_prev and carry m_data; each vnode points back to its mount through v_mount and holds file system specific file information in v_data]

Recording Mount Points: How are they mounted? (1)
[Figure: file systems A, B, and C]

Recording Mount Points: How are they mounted? (2)
[Figure: rootfs and the chain of mount structures (next, m_mounth); a covered VDIR vnode points to the mounted file system through v_mountedhere, and the mount structure points back to it through m_vnodecovered; the mounted file system's root vnode is marked VROOT]

File System Operations
- namei()          Interprets a pathname
- mount()          Mounts a file system
- open()           Opens a file
- read()/write()   Reads or writes a file

Namei (1)
- VFS routine that maps pathnames to vnodes
  - performs access checks on each component of that pathname.
- Uses VOP_LOOKUP to move down the path
- Special Cases
  - Symbolic Links
  - Mount Points
  - Process-Specific root (chroot())
- Special Care - unmounts

Namei (2)
- An LRU hash table maps <parent-vnode, component-name> to <target-vnode, capabilities>
- Capabilities are "tags" assigned to vnodes
  - prevent cache entries from referring to out-of-date associations
- Related data structures include:
  - namecache - namei cache
  - nchash - hash list for cache
  - nchsize - size of cache
  - nchsz - size of hash list
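The capability check can be sketched as follows: each cache entry remembers the identifier (v_id) its vnodes had when the entry was made, and a lookup only succeeds if those identifiers still match. This toy version keeps one entry per hash bucket and has no LRU replacement. The nc_ field names follow 4.4BSD conventions and are an assumption here, as are the sizes and function names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NCHSIZE 8                              /* toy hash-table size */
#define NAMELEN 32

struct vnode { unsigned long v_id; };          /* capability identifier */

struct namecache {
    struct vnode *nc_dvp;  unsigned long nc_dvpid;   /* parent and its capability */
    struct vnode *nc_vp;   unsigned long nc_vpid;    /* target and its capability */
    char nc_name[NAMELEN];
};
static struct namecache nchash[NCHSIZE];

static unsigned bucket(const struct vnode *dvp, const char *name)
{
    unsigned h = (unsigned)((uintptr_t)dvp >> 4);
    while (*name)
        h = h * 31 + (unsigned char)*name++;
    return h % NCHSIZE;
}

/* Record <dvp, name> -> vp together with both current capabilities. */
void cache_enter(struct vnode *dvp, const char *name, struct vnode *vp)
{
    assert(dvp && vp && name && strlen(name) < NAMELEN);
    struct namecache *nc = &nchash[bucket(dvp, name)];
    nc->nc_dvp = dvp;  nc->nc_dvpid = dvp->v_id;
    nc->nc_vp  = vp;   nc->nc_vpid  = vp->v_id;
    strcpy(nc->nc_name, name);
}

/* Return the cached target, or NULL on a miss or a stale entry. */
struct vnode *cache_lookup(struct vnode *dvp, const char *name)
{
    assert(dvp && name);
    struct namecache *nc = &nchash[bucket(dvp, name)];
    if (nc->nc_dvp != dvp || strcmp(nc->nc_name, name) != 0)
        return NULL;                           /* miss */
    if (nc->nc_dvpid != dvp->v_id || nc->nc_vpid != nc->nc_vp->v_id)
        return NULL;                           /* stale: a vnode was reused */
    return nc->nc_vp;
}
```

Invalidating every cache entry that mentions a vnode then costs nothing more than assigning the vnode a new v_id when it is recycled.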

namei() flow
[Flowchart, summarized as steps]
  1. Start: copy name into local buffer.
  2. Copy next component to buffer.
  3. Is the component ".."? Yes: find parent vnode. No: call the file system specific lookup routine, VOP_LOOKUP().
  4. Symbolic link? Yes: copy name to buffer.
  5. Mounted on? Yes: find root vnode of mounted file system with VFS_ROOT().
  6. More components? Yes: return to step 2. No: done.
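The component-by-component walk can be sketched over a small in-memory tree. This is a toy model, not the kernel routine: it omits symbolic links, access checks, the name cache, and chroot handling; `toy_lookup` stands in for VOP_LOOKUP(), following `mounted_root` stands in for VFS_ROOT(), and all structure and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAXKIDS 4
struct vnode {
    const char   *name;
    struct vnode *parent;
    struct vnode *kids[MAXKIDS];
    struct vnode *mounted_root;   /* non-NULL if a file system is mounted here */
};

/* Stand-in for VOP_LOOKUP(): search one directory for one component. */
static struct vnode *toy_lookup(struct vnode *dvp, const char *comp, size_t len)
{
    for (int i = 0; i < MAXKIDS && dvp->kids[i]; i++)
        if (strlen(dvp->kids[i]->name) == len &&
            memcmp(dvp->kids[i]->name, comp, len) == 0)
            return dvp->kids[i];
    return NULL;
}

/* Resolve path relative to root; returns NULL if a component is missing. */
struct vnode *toy_namei(struct vnode *root, const char *path)
{
    assert(root && path);
    struct vnode *vp = root;
    while (vp && *path) {
        while (*path == '/')                      /* skip separators */
            path++;
        size_t len = strcspn(path, "/");          /* next component */
        if (len == 0)
            break;
        if (len == 2 && path[0] == '.' && path[1] == '.')
            vp = vp->parent ? vp->parent : vp;    /* "..": find parent vnode */
        else if (!(len == 1 && path[0] == '.'))
            vp = toy_lookup(vp, path, len);
        if (vp && vp->mounted_root)               /* mounted on? */
            vp = vp->mounted_root;                /* cross into mounted FS root */
        path += len;                              /* more components? */
    }
    return vp;
}
```

In this model the root vnode of a mounted file system has its parent pointer set to the directory that contains the mount point, which is how ".." crosses back out of the mounted file system.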

mount() flow
[Figure: call flow involving mount(), namei(), vmountset(), the Mount Table, and VFS_MOUNT, which for UFS reaches ufs_mount()]

open() flow
[Figure: call flow involving open(), falloc() and the File Table, vn_open(), namei(), VOP_LOOKUP(), and VOP_CREATE()]

read() and write() flow
[Figure: read() and write() use getf() and rwuio(), then FOP_READ()/FOP_WRITE(), vn_read(), VOP_READ()/VOP_WRITE(), and ufs_read()/ufs_write(), passing through the Process Descriptor Table, File Table, and Vnode Table to the file]

Source Reference (1 of 2)
- kernel/sys/users.h
  - open file table ufile_state in struct utask
- kernel/sys/file.h
  - defines a struct file and fileops
- kernel/vfs_vnops.c
  - implementation of fileops for vnode file structs
- kernel/bsd/sys_socket.c
  - implementation of fileops for socket file structs
- kernel/sys/vnode.h
  - definition of vnode and vnodeops

Source Reference (2 of 2)
- kernel/sys/mount.h
  - definition of struct mount and vfsops
- kernel/vfs_syscalls.c
  - vfs_switch[]
- kernel/vfs_lookup.c
  - implementation of namei()
- kernel/vfs_syscalls.c
  - implementation of mount() and open() calls
- kernel/bsd/sys_generic.c
  - implementation of read() and write() calls