DISTRIBUTED FILE SYSTEMS 1
Topics
- Introduction
- File Service Architecture
- Case Study: Sun NFS
Source: Coulouris, Dollimore and Kindberg, Distributed Systems: Concepts & Design, Edn. 4, Pearson Education 2005
Introduction
- A distributed file system is a client/server-based application that allows clients to access and process data stored on the server as if it were on their own computer.
- File systems were originally developed for centralized computer systems and desktop computers.
- A file system was an operating system facility providing a convenient programming interface to disk storage.
- What are the advantages of keeping persistent storage at a few servers?
  - It reduces the need for local disk storage.
  - It enables economies to be made in the management and archiving of the persistent data owned by an organization.
  - Other services (name service, user authentication service and print service) can be more easily implemented.
- Distributed file systems support the sharing of information in the form of files and hardware resources.
- A well-designed file service provides access to files stored at a server with performance and reliability similar to, and in some cases better than, files stored on local disks.
- Figure 1 provides an overview of types of storage system.
Types of storage system
Figure 1. Storage systems and their properties
Columns: Sharing | Persistence | Distributed cache/replicas | Consistency maintenance | Example
- Main memory: RAM (consistency maintenance: 1)
- File system: UNIX file system (consistency maintenance: 1)
- Distributed file system: Sun NFS
- Web: Web server
- Distributed shared memory: Ivy (Ch. 18)
- Remote objects (RMI/ORB): CORBA (consistency maintenance: 1)
- Persistent object store: CORBA Persistent Object Service (consistency maintenance: 1)
- Peer-to-peer storage system: OceanStore (Ch. 10)
Types of consistency between copies:
- 1: strict one-copy consistency
- √: approximate consistency
- X: no automatic consistency
- A persistent object store is a computer storage system that records and retrieves complete objects, or provides the illusion of doing so. Simple examples store the serialized object in binary format.
- Distributed shared memory (DSM) is a form of memory architecture where the (physically separate) memories can be addressed as one (logically shared) address space. It provides an emulation of a shared memory by the replication of memory pages or segments at each host.
Consistency
- Consistency indicates whether mechanisms exist for maintaining consistency between multiple copies of data when updates occur.
- Caching was first applied to main memory and non-distributed file systems, for which consistency is strict (denoted by a '1' in Figure 1) because programs cannot observe any discrepancies between copies after updates.
- When distributed replicas are used, strict consistency is more difficult to achieve. Distributed file systems such as Sun NFS adopt specific consistency mechanisms to maintain an approximation to strict consistency.
- In the Web, the consistency between the copies stored at web proxies and client caches and the original server is maintained only by explicit user action.
- Clients are not notified when a page stored at the original server is updated; they must perform explicit checks to keep their local copies up to date.
Characteristics of file systems
- File systems are responsible for the organization, storage, retrieval, naming, sharing and protection of files.
- Files contain both data and attributes.
- File data consists of a sequence of data items (typically 8-bit bytes), accessible by operations to read and write any portion of the sequence.
- The attributes are held as a single record containing information such as the length of the file, timestamps, file type, owner's identity and access control list.
File attribute record structure
- Updated by the system: file length, creation timestamp, read timestamp, write timestamp, attribute timestamp, reference count
- Updated by the owner: owner, file type, access control list (e.g. for UNIX: rw-rw-r--)
- File systems are designed to store and manage large numbers of files, with facilities for creating, naming and deleting files. The naming of files is supported by the use of directories.
- A directory is a file that provides a mapping from text names to internal file identifiers.
- The term metadata is often used to refer to all of the extra information stored by a file system that is needed for the management of files. It includes:
  - File attributes
  - Directories
  - All other persistent information used by the file system
- Figure 2 shows a typical layered module structure for the implementation of a non-distributed file system in a conventional operating system.
- Each layer depends only on the layers below it.
- The implementation of a distributed file system requires all of the components shown there, with additional components to deal with client-server communication and with the distributed naming and location of files.
Figure 2. File system modules
- Figure 4 summarizes the main operations on files that are available to applications in UNIX systems.
Figure 4. UNIX file system operations
filedes = open(name, mode)
    Opens an existing file with the given name.
filedes = creat(name, mode)
    Creates a new file with the given name.
    Both operations deliver a file descriptor referencing the open file. The mode is read, write or both.
status = close(filedes)
    Closes the open file filedes.
count = read(filedes, buffer, n)
    Transfers n bytes from the file referenced by filedes to buffer.
count = write(filedes, buffer, n)
    Transfers n bytes to the file referenced by filedes from buffer.
    Both operations deliver the number of bytes actually transferred and advance the read-write pointer.
pos = lseek(filedes, offset, whence)
    Moves the read-write pointer to offset (relative or absolute, depending on whence).
status = unlink(name)
    Removes the file name from the directory structure. If the file has no other names, it is deleted.
status = link(name1, name2)
    Adds a new name (name2) for a file (name1).
status = stat(name, buffer)
    Gets the file attributes for file name into buffer.
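The operations in Figure 4 can be tried out through Python's `os` module, which wraps the same system calls on UNIX-like hosts. The following is only an illustrative sketch: the file names and the temporary directory are assumptions made for the example, and `os.open` with `O_CREAT` stands in for `creat`.

```python
import os
import tempfile


def unix_file_ops_demo():
    """Exercise open, write, lseek, read, close, link, stat and unlink."""
    with tempfile.TemporaryDirectory() as directory:
        name1 = os.path.join(directory, "a.txt")   # hypothetical file names
        name2 = os.path.join(directory, "b.txt")

        filedes = os.open(name1, os.O_CREAT | os.O_RDWR, 0o644)
        written = os.write(filedes, b"hello world")   # advances the pointer
        pos = os.lseek(filedes, 6, os.SEEK_SET)       # absolute reposition
        data = os.read(filedes, 5)                    # reads from the pointer
        os.close(filedes)

        os.link(name1, name2)                  # second name for the same file
        link_count = os.stat(name1).st_nlink   # attributes, including link count
        os.unlink(name1)                       # removes one name only
        other_name_survives = os.path.exists(name2)
    return written, pos, data, link_count, other_name_survives


result = unix_file_ops_demo()
```

Because the file still has a second name after `unlink(name1)`, its contents are not deleted, which matches the description of `unlink` in the figure.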
Distributed file system requirements
The requirements for distributed file systems include:
1. Transparency
2. Concurrent file updates: changes to a file by one client should not interfere with the operation of other clients.
3. File replication: in a file service that supports replication, a file may be represented by several copies at different locations.
4. Hardware and operating system heterogeneity: the service interfaces should be defined so that client and server software can be implemented for different operating systems and computers.
5. Fault tolerance: the central role of the file service in distributed systems makes it essential that the service continues to operate in the face of client and server failures.
6. Consistency
7. Security
8. Efficiency
File Service Architecture
- An architecture that offers a clear separation of the main concerns in providing access to files is obtained by structuring the file service as three components:
  - A flat file service
  - A directory service
  - A client module
- The relevant modules and their relationships are shown in Figure 5.
Figure 5. File service architecture (client computer: application program and client module; server computer: directory service and flat file service)
- The client module uses the interfaces exported by the flat file and directory services on the server side.
- The responsibilities of the modules can be defined as follows.
- Flat file service:
  - Concerned with the implementation of operations on the contents of files. Unique File Identifiers (UFIDs) are used to refer to files in all requests for flat file service operations. UFIDs are long sequences of bits chosen so that each file has a UFID that is unique among all of the files in a distributed system.
  - When the flat file service receives a request to create a file, it generates a new UFID and returns it to the requester.
- Directory service:
  - Provides a mapping between text names for files and their UFIDs.
  - Clients may obtain the UFID of a file by quoting its text name to the directory service.
  - The directory service provides the functions needed to generate directories, to add new files to directories and to obtain UFIDs from directories.
- Client module:
  - It runs on each client computer, integrating and extending the operations of the flat file service and the directory service under a single application programming interface that is available to user programs on the client.
  - For example, on UNIX hosts, a client module emulates the full set of UNIX file operations.
  - It holds information about the network locations of the flat file and directory server processes.
  - It can achieve better performance by keeping a cache of recently used file blocks at the client.
- Flat file service interface:
  - Figure 6 contains a definition of the interface to a flat file service. This is the RPC interface used by the client module.
Figure 6. Flat file service operations
Read(FileId, i, n) -> Data, throws BadPosition
    If 1 ≤ i ≤ Length(File): reads a sequence of up to n items from a file starting at item i and returns it in Data.
Write(FileId, i, Data), throws BadPosition
    If 1 ≤ i ≤ Length(File)+1: writes a sequence of Data to a file, starting at item i, extending the file if necessary.
Create() -> FileId
    Creates a new file of length 0 and delivers a UFID for it.
Delete(FileId)
    Removes the file from the file store.
GetAttributes(FileId) -> Attr
    Returns the file attributes for the file.
SetAttributes(FileId, Attr)
    Sets the file attributes (only those attributes that are not shaded in Figure 3).
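To make the position rules in Figure 6 concrete, here is a minimal in-memory sketch of Read, Write, Create and Delete. It is not how a real flat file service is built: the class name, the use of a simple counter in place of long unique UFIDs, and the omission of attributes are all assumptions made to keep the example short.

```python
import itertools


class BadPosition(Exception):
    """Raised when position i lies outside the range the interface allows."""


class FlatFileService:
    def __init__(self):
        self._files = {}                   # UFID -> file contents
        self._ufids = itertools.count(1)   # assumption: counter stands in for UFIDs

    def create(self):
        ufid = next(self._ufids)
        self._files[ufid] = bytearray()    # new file of length 0
        return ufid

    def read(self, ufid, i, n):
        data = self._files[ufid]
        if not 1 <= i <= len(data):        # items are numbered from 1
            raise BadPosition(i)
        return bytes(data[i - 1:i - 1 + n])   # up to n items

    def write(self, ufid, i, new_data):
        data = self._files[ufid]
        if not 1 <= i <= len(data) + 1:    # Length+1 allows appending
            raise BadPosition(i)
        # Slice assignment overwrites existing items and extends the file.
        data[i - 1:i - 1 + len(new_data)] = new_data

    def delete(self, ufid):
        del self._files[ufid]


service = FlatFileService()
ufid = service.create()
service.write(ufid, 1, b"hello")
service.write(ufid, 4, b"XYZW")        # overwrites two items, extends by two
contents = service.read(ufid, 1, 100)
```

Every request names the file and the position explicitly, so the service holds no per-client state such as an open-file table or a read-write pointer.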
Comparison with UNIX
- Our interface and the UNIX file system primitives are functionally equivalent.
- It is a simple matter to construct a client module that emulates the UNIX system calls.
Flat file service | UNIX system
No open and close operations | Files are accessed through open and close (Figure 4)
Read and write requests include parameter i | Each read or write operation starts at the current position of the read-write pointer; the seek operation repositions the pointer
The operations are idempotent (at-least-once RPC semantics) | Operations are not idempotent
The interface is suitable for implementation by a stateless server | No requirement for stateless implementation
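The idempotency row can be illustrated with a small experiment in which the same read request is delivered twice, as could happen when a reply is lost and the client retries. This is only a sketch: `positional_read` is a hypothetical helper modelled on the flat file Read operation, and `io.BytesIO` stands in for an open UNIX file with a read-write pointer.

```python
import io

content = b"abcdefgh"


def positional_read(data, i, n):
    """Flat-file style read: the position i (1-based) travels with the request."""
    return data[i - 1:i - 1 + n]


# Flat file style: a duplicated request returns the same items.
first = positional_read(content, 3, 2)
retry = positional_read(content, 3, 2)

# UNIX style: each read starts at the pointer left behind by the previous one.
stream = io.BytesIO(content)   # stands in for an open file
stream.seek(2)
first_with_pointer = stream.read(2)
retry_with_pointer = stream.read(2)   # the pointer has already advanced
```

The repeated positional read is harmless, while the repeated pointer-based read silently returns different data, which is why the pointer-based form does not suit at-least-once RPC semantics.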
- Access control
  - In the UNIX file system, the user's access rights are checked against the access mode (read or write) requested when a file is opened.
  - In distributed implementations, access rights checks have to be performed at the server because the server RPC interface is an otherwise unprotected point of access to files.
- Directory service interface
  - Figure 7 contains a definition of the RPC interface to a directory service.
Figure 7. Directory service operations
Lookup(Dir, Name) -> FileId, throws NotFound
    Locates the text name in the directory and returns the relevant UFID. If Name is not in the directory, throws an exception.
AddName(Dir, Name, File), throws NameDuplicate
    If Name is not in the directory, adds (Name, File) to the directory and updates the file's attribute record. If Name is already in the directory, throws an exception.
UnName(Dir, Name)
    If Name is in the directory, the entry containing Name is removed from the directory. If Name is not in the directory, throws an exception.
GetNames(Dir, Pattern) -> NameSeq
    Returns all the text names in the directory that match the regular expression Pattern.
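The four operations in Figure 7 map naturally onto a dictionary per directory. The sketch below is illustrative only: the class and method names are assumptions, directories are identified by arbitrary keys rather than real UFIDs, and the update of the file's attribute record in AddName is left out.

```python
import re


class NotFound(Exception):
    pass


class NameDuplicate(Exception):
    pass


class DirectoryService:
    def __init__(self):
        self._dirs = {}   # directory id -> {text name: file UFID}

    def add_name(self, dir_id, name, file_id):
        entries = self._dirs.setdefault(dir_id, {})
        if name in entries:
            raise NameDuplicate(name)
        entries[name] = file_id   # attribute record update omitted here

    def lookup(self, dir_id, name):
        try:
            return self._dirs[dir_id][name]
        except KeyError:
            raise NotFound(name) from None

    def un_name(self, dir_id, name):
        if name not in self._dirs.get(dir_id, {}):
            raise NotFound(name)
        del self._dirs[dir_id][name]

    def get_names(self, dir_id, pattern):
        regex = re.compile(pattern)
        return sorted(n for n in self._dirs.get(dir_id, {}) if regex.fullmatch(n))


directory = DirectoryService()
directory.add_name("root", "a.txt", 1)     # hypothetical names and UFIDs
directory.add_name("root", "b.txt", 2)
directory.add_name("root", "notes.md", 3)
text_files = directory.get_names("root", r".*\.txt")
found = directory.lookup("root", "notes.md")
```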
- Hierarchic file system
  - A hierarchic file system such as the one that UNIX provides consists of a number of directories arranged in a tree structure.
- File group
  - A file group is a collection of files that can be located on any server or moved between servers while maintaining the same names.
    - A similar construct is used in a UNIX file system.
    - It helps with distributing the load of file serving between several servers.
    - File groups have identifiers which are unique throughout the system (and hence, for an open system, they must be globally unique).
- To construct a globally unique ID we use some unique attribute of the machine on which the file group is created, e.g. its IP address, even though the file group may subsequently move.
File group ID: | 32 bits: IP address | 16 bits: date |
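The 48-bit layout above can be packed and unpacked with a couple of bit operations. The slide does not say how the 16-bit date is encoded, so this sketch assumes the number of days since an arbitrary epoch, truncated to 16 bits; the function names and the epoch are likewise assumptions.

```python
import ipaddress
from datetime import date


def make_file_group_id(ip, created, epoch=date(1970, 1, 1)):
    """Pack a 32-bit IPv4 address and a 16-bit date into one 48-bit identifier."""
    ip_bits = int(ipaddress.IPv4Address(ip))       # 32 bits
    date_bits = (created - epoch).days & 0xFFFF    # 16 bits (assumed encoding)
    return (ip_bits << 16) | date_bits


def split_file_group_id(group_id):
    """Recover the creating host's address and the date field."""
    return str(ipaddress.IPv4Address(group_id >> 16)), group_id & 0xFFFF


group_id = make_file_group_id("192.168.0.1", date(1970, 1, 11))
parts = split_file_group_id(group_id)
```

The identifier records where the group was created, not where it currently lives, so a separate mapping from group ID to server is still needed once a group has moved.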
DFS: Case Studies
- NFS (Network File System)
  - Developed by Sun Microsystems (in 1985).
  - Most popular, open, and widely used.
  - NFS protocol standardized through the IETF (RFC 1813).
- AFS (Andrew File System)
  - Developed by Carnegie Mellon University as part of the Andrew distributed computing environment (in 1986).
  - A research project to create a campus-wide file system.
  - A public domain implementation is available on Linux (LinuxAFS).
  - It was adopted as a basis for the DCE/DFS file system in the Open Software Foundation (OSF, www.opengroup.org) DCE (Distributed Computing Environment).
Case Study: Sun NFS
- NFS (Network File System) allows hosts to mount partitions on a remote system and use them as though they are local file systems. This allows the system administrator to store resources in a central location on the network, providing authorized users continuous access to them.
- The Sun Network File System (NFS) provides transparent, remote access to file systems.
- NFS is designed to be easily portable to other operating systems and machine architectures.
- NFS is implemented on top of a Remote Procedure Call (RPC) package to help simplify protocol definition, implementation, and maintenance.
- In order to build NFS into the UNIX kernel in a way that is transparent to applications, the designers added a new interface to the kernel which separates generic file system operations from specific file system implementations.
- The "file system interface" consists of two parts:
  1. The Virtual File System (VFS) interface defines the operations that can be done on a file system.
  2. The virtual node (vnode) interface defines the operations that can be done on a file within that file system.
- This new interface allows new file systems to be implemented and installed in much the same way as new device drivers are added to the kernel.
NFS architecture
Figure 8. NFS architecture (client computer: application program above the UNIX kernel, which contains the UNIX system call layer, the virtual file system, the UNIX file system, other file systems and the NFS client; server computer: UNIX kernel containing the virtual file system, the NFS server and the UNIX file system; operations on local files go to the UNIX file system, and operations on remote files go from the NFS client to the NFS server over the NFS protocol)
Virtual file system
- It is clear that NFS provides access transparency: user programs can issue file operations for local or remote files without distinction.
- Other distributed file systems may be present that support UNIX system calls, and if so, they could be integrated in the same way.
- The integration is achieved by a virtual file system (VFS) module, which has been added to the UNIX kernel to distinguish between local and remote files and to translate between the UNIX-independent file identifiers used by NFS and the internal file identifiers normally used in UNIX and other file systems.
- In addition, VFS keeps track of the file systems that are currently available both locally and remotely, and it passes each request to the appropriate local system module (the UNIX file system, the NFS client module or the service module for another file system).
NFS protocol
- The NFS protocol uses the Sun Remote Procedure Call (RPC) mechanism. For the same reasons that procedure calls simplify programs, RPC helps simplify the definition, organization, and implementation of remote services.
- The NFS protocol is defined in terms of a set of procedures, their arguments and results, and their effects. Remote procedure calls are synchronous, that is, the client application blocks until the server has completed the call and returned the results. This makes RPC very easy to use and understand because it behaves like a local procedure call.
- NFS uses a stateless protocol. The parameters to each procedure call contain all of the information necessary to complete the call, and the server does not keep track of any past requests. This makes crash recovery easy: when a server crashes, the client resends NFS requests until a response is received, and the server does no crash recovery at all. When a client crashes, no recovery is necessary for either the client or the server.
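The crash-recovery behaviour described above, where the client simply resends a self-contained request until a reply arrives, can be sketched as a retry loop. Everything here is a toy model rather than the Sun RPC implementation: `call_with_retry`, `make_flaky_server`, the request tuple and the use of `TimeoutError` to represent a lost reply are all assumptions.

```python
def call_with_retry(send, request, max_attempts=5):
    """Resend the same self-contained request until a reply arrives."""
    for attempt in range(1, max_attempts + 1):
        try:
            return send(request), attempt
        except TimeoutError:
            continue   # no reply: the server may have crashed, so try again
    raise TimeoutError("no reply after %d attempts" % max_attempts)


def make_flaky_server(store, failures):
    """Build a stand-in server that drops the first `failures` requests."""
    remaining = [failures]

    def send(request):
        if remaining[0] > 0:
            remaining[0] -= 1
            raise TimeoutError
        # The request carries everything needed: handle, offset and count.
        file_handle, offset, count = request
        return store[file_handle][offset:offset + count]

    return send


send = make_flaky_server({"fh1": b"0123456789"}, failures=2)
reply, attempts = call_with_retry(send, ("fh1", 2, 3))
```

Because the request names the file handle, offset and count explicitly, the server needs no memory of earlier calls to answer the third attempt correctly.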
- The file identifiers used in NFS are called file handles.
- A file handle is opaque to clients and contains whatever information the server needs to distinguish an individual file.
- In UNIX implementations of NFS, the file handle is derived from the file's i-node number by adding two extra fields, as follows:
File handle: | file system identifier | i-node number of file | i-node generation number |
- The file system identifier field is a unique number that is allocated to each file system when it is created.
- The i-node number of a file is a number that serves to identify and locate the file within the file system in which the file is stored.
- The i-node generation number is needed because in the conventional UNIX file system i-node numbers are reused after a file is removed.
- The virtual file system layer has one VFS structure for each mounted file system and one v-node per open file.
- A VFS structure relates a remote file system to the local directory on which it is mounted.
- The v-node contains an indicator to show whether a file is local or remote. If the file is local, the v-node contains a reference to the index of the local file (an i-node in a UNIX implementation). If the file is remote, it contains the file handle of the remote file.
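The role of the generation number is easiest to see when an i-node number is reused. The sketch below models a file handle as the three fields described above and rejects handles whose generation no longer matches. The class names, field names and the rule of incrementing the generation on each allocation are assumptions made for illustration; clients would treat the handle as opaque.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileHandle:
    fs_id: int        # file system identifier
    inode: int        # i-node number of the file
    generation: int   # i-node generation number


class InodeTable:
    """Toy server-side table for a single file system."""

    def __init__(self, fs_id):
        self.fs_id = fs_id
        self.generation = {}   # i-node number -> current generation
        self.live = set()      # i-node numbers currently in use

    def allocate(self, inode):
        # Assumption: the generation is bumped every time the i-node is (re)used.
        self.generation[inode] = self.generation.get(inode, 0) + 1
        self.live.add(inode)
        return FileHandle(self.fs_id, inode, self.generation[inode])

    def remove(self, inode):
        self.live.discard(inode)

    def is_valid(self, handle):
        return (handle.fs_id == self.fs_id
                and handle.inode in self.live
                and self.generation.get(handle.inode) == handle.generation)


table = InodeTable(fs_id=7)
old_handle = table.allocate(42)
table.remove(42)                  # file deleted, i-node 42 becomes free
new_handle = table.allocate(42)   # a different file reuses i-node 42
```

Without the generation field, a client still holding `old_handle` would silently operate on the new file that happens to occupy the same i-node.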
Client integration
- The NFS client module cooperates with the virtual file system in each client machine.
- It operates in a similar manner to the conventional UNIX file system, transferring blocks of files to and from the server and caching the blocks in local memory whenever possible.
- It shares the same buffer cache that is used by the local input-output system.
- But since several clients in different host machines may simultaneously access the same remote file, a new and significant cache consistency problem arises.
Access control and authentication
- Unlike the conventional UNIX file system, the NFS server is stateless and does not keep files open on behalf of its clients. So the server must check the user's identity against the file's access permission attributes afresh on each request, to see whether the user is permitted to access the file in the manner requested.
- The Sun RPC protocol requires clients to send user authentication information with each request, and this is checked against the access permission in the file attributes.
- An NFS server provides a conventional RPC interface at a well-known port on each host, and any process can behave as a client, sending requests to the server to access or update a file.
- The client can modify the RPC calls to include the user ID of any user, impersonating the user without their knowledge or permission.
NFS access control and authentication
- The NFS server is a stateless server, so the user's identity and access rights must be checked by the server on each request.
  - In the local file system they are checked only when the file is opened, against the file's access permission attributes.
- Every client request is accompanied by the userID and groupID.
- Kerberos has been integrated with NFS to provide a stronger and more comprehensive security solution.
- A simplified representation of the RPC interface provided by NFS version 3 servers is shown in Figure 9.
Figure 9. NFS server operations (NFS Version 3 protocol, simplified)
- read(fh, offset, count) -> attr, data
- write(fh, offset, count, data) -> attr
- create(dirfh, name, attr) -> newfh, attr
- remove(dirfh, name) -> status
- getattr(fh) -> attr
- setattr(fh, attr) -> attr
- lookup(dirfh, name) -> fh, attr
- rename(dirfh, name, todirfh, toname)
- link(newdirfh, newname, dirfh, name)
- readdir(dirfh, cookie, count) -> entries
- symlink(newdirfh, newname, string) -> status
- readlink(fh) -> string
- mkdir(dirfh, name, attr) -> newfh, attr
- rmdir(dirfh, name) -> status
- statfs(fh) -> fsstats
Mount service
- Mount operation: mount(remotehost, remotedirectory, localdirectory)
- The server maintains a table of clients who have mounted file systems at that server.
- Each client maintains a table of mounted file systems holding: <IP address, port number, file handle>
- Remote file systems may be hard-mounted or soft-mounted in a client computer.
- Figure 10 illustrates a client with two remotely mounted file stores.
- The mounting of subtrees of remote file systems by clients is supported by a separate mount service process that runs at user level on each NFS server computer.
- On each server, there is a file with a well-known name containing the names of local file systems that are available for remote mounting.
- An access list is associated with each file system name, indicating which hosts are permitted to mount the file system.
Note: The file system mounted at /usr/students in the client is actually the sub-tree located at /export/people in Server 1; the file system mounted at /usr/staff in the client is actually the sub-tree located at /nfs/users in Server 2.
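The two mounts in the note can be modelled as a small client-side table that maps a local pathname onto a server and a remote pathname. This is a simplified sketch: a real client table holds <IP address, port number, file handle> entries rather than server names and directory strings, and the function name and the sample user directory are assumptions.

```python
import posixpath

# Local mount point -> (server, exported directory), taken from the note above.
MOUNTS = {
    "/usr/students": ("Server 1", "/export/people"),
    "/usr/staff": ("Server 2", "/nfs/users"),
}


def resolve(path, mounts=MOUNTS):
    """Return (host, pathname on that host) for a client pathname."""
    path = posixpath.normpath(path)
    # Try the longest mount point first so that nested mounts would win.
    for mount_point in sorted(mounts, key=len, reverse=True):
        if path == mount_point or path.startswith(mount_point + "/"):
            server, remote_dir = mounts[mount_point]
            return server, remote_dir + path[len(mount_point):]
    return "local", path


remote_file = resolve("/usr/students/jon/notes.txt")   # hypothetical user file
remote_root = resolve("/usr/staff")
local_file = resolve("/usr/studentsX/a")               # not under any mount point
```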
Server caching
- Similar to UNIX file caching for local files:
  - Pages (blocks) from disk are held in a main memory buffer cache until the space is required for newer pages, with read-ahead and delayed-write optimizations.
  - For local files, writes are deferred to the next sync event (30-second intervals).
  - This works well in the local context, where files are always accessed through the local cache, but in the remote case it does not offer the necessary synchronization guarantees to clients.
- NFS servers use the cache at the server machine just as it is used for other file accesses.
- The use of the server's cache to hold recently read disk blocks does not raise any consistency problems.
- But when a server performs write operations, extra measures are needed to ensure that clients can be confident that the results of the write operations are persistent, even when server crashes occur.
- NFS v3 servers offer two strategies for updating the disk:
  - Write-through: altered pages are written to disk as soon as they are received at the server. When a write() RPC returns, the NFS client knows that the page is on the disk.
  - Delayed commit: pages are held only in the cache until a commit() call is received for the relevant file. This is the default mode used by NFS v3 clients. A commit() is issued by the client whenever a file is closed.
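The difference between the two strategies shows up when the server crashes between a write and a commit. The following toy model is an assumption-laden sketch, not server code: the `ServerFile` class, its dictionaries standing in for the memory cache and the disk, and the `crash` method are all invented for the illustration.

```python
class ServerFile:
    """Toy model of one file's pages at the server."""

    def __init__(self, write_through):
        self.write_through = write_through
        self.cache = {}   # page number -> data held in server memory
        self.disk = {}    # page number -> data on stable storage

    def write(self, page, data):
        self.cache[page] = data
        if self.write_through:
            self.disk[page] = data   # on disk before the reply is sent

    def commit(self):
        self.disk.update(self.cache)   # flush everything cached for this file

    def crash(self):
        self.cache.clear()   # memory contents are lost


write_through_file = ServerFile(write_through=True)
write_through_file.write(0, b"a")
write_through_file.crash()
survived_write_through = dict(write_through_file.disk)

delayed_file = ServerFile(write_through=False)
delayed_file.write(0, b"a")
on_disk_before_commit = dict(delayed_file.disk)
delayed_file.commit()   # e.g. issued when the client closes the file
delayed_file.crash()
survived_after_commit = dict(delayed_file.disk)
```

With delayed commit the client can rely on its data being persistent only once commit() has returned, so it must be prepared to resend uncommitted writes if the server restarts.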
Client caching
- Server caching does nothing to reduce RPC traffic between client and server.
  - Further optimization is essential to reduce server load in large networks.
  - The NFS client module caches the results of read, write, getattr, lookup and readdir operations.
  - Synchronization of file contents (one-copy semantics) is not guaranteed when two or more clients are sharing the same file.
- Timestamp-based validity check
  - It reduces inconsistency but does not eliminate it.
  - A cache entry at the client is considered valid when: $(T - T_c < t) \lor (Tm_{client} = Tm_{server})$
    - $t$: freshness guarantee
    - $T_c$: time when the cache entry was last validated
    - $Tm$: time when the block was last updated at the server
    - $T$: current time
  - t is configurable (per file) but is typically set to 3 seconds for files and 30 seconds for directories.
  - It remains difficult to write distributed applications that share files with NFS.
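The validity condition can be written as a short function that contacts the server only when the freshness interval has expired. This is a sketch of the condition as stated on the slide, not the client implementation: the function name, the callable used to fetch Tm from the server, and the sample timestamps are assumptions.

```python
def cache_entry_valid(T, Tc, t, tm_client, get_tm_server):
    """Evaluate (T - Tc < t) or (Tm_client == Tm_server) for one cache entry."""
    if T - Tc < t:
        return True                        # still fresh: no server contact
    return tm_client == get_tm_server()    # one getattr-style check


server_calls = []


def get_tm_server():
    """Stand-in for a getattr() call that returns the server's Tm."""
    server_calls.append("getattr")
    return 100


# Validated 2 seconds ago with t = 3: accepted without asking the server.
fresh = cache_entry_valid(T=10.0, Tc=8.0, t=3.0, tm_client=100,
                          get_tm_server=get_tm_server)
calls_after_fresh = len(server_calls)

# Validated 4 seconds ago: the modification times must be compared.
unchanged = cache_entry_valid(T=12.0, Tc=8.0, t=3.0, tm_client=100,
                              get_tm_server=get_tm_server)
stale = cache_entry_valid(T=12.0, Tc=8.0, t=3.0, tm_client=99,
                          get_tm_server=get_tm_server)
```

During the first t seconds after validation, an update made by another client goes unnoticed, which is why the check reduces inconsistency without eliminating it.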
Other NFS optimizations
- Sun RPC runs over UDP by default (it can use TCP if required).
- Uses the UNIX BSD Fast File System with 8-kbyte blocks.
- read() and write() transfers can be of any size (negotiated between client and server).
- The guaranteed freshness interval t is set adaptively for individual files to reduce the getattr() calls needed to update Tm.
- File attribute information (including Tm) is piggybacked in replies to all file requests.
NFS performance
- Early measurements (1987) established that:
  - write() operations are responsible for only 5% of server calls in typical UNIX environments, hence write-through at the server is acceptable.
  - lookup() accounts for 50% of operations, due to the step-by-step pathname resolution necessitated by the naming and mounting semantics.
- More recent measurements (1993) show high performance.
  - See www.spec.org for more recent measurements.
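The high share of lookup() calls follows from resolving a pathname one component at a time, with one request per component. The sketch below counts those requests for a three-component path. The directory table, the file handle strings and the function names are hypothetical, and the local `lookup` function merely stands in for the remote operation of the same name.

```python
# Hypothetical server-side directory contents: (directory handle, name) -> handle.
DIRECTORY_ENTRIES = {
    ("fh_root", "usr"): "fh_usr",
    ("fh_usr", "staff"): "fh_staff",
    ("fh_staff", "notes.txt"): "fh_notes",
}


def lookup(dir_fh, name):
    """Stand-in for the remote lookup(dirfh, name) operation."""
    return DIRECTORY_ENTRIES[(dir_fh, name)]


def resolve_path(root_fh, path):
    """Resolve a pathname step by step; returns (file handle, lookup count)."""
    file_handle, calls = root_fh, 0
    for name in path.split("/"):
        if name:   # skip the empty string before the leading "/"
            file_handle = lookup(file_handle, name)
            calls += 1
    return file_handle, calls


result = resolve_path("fh_root", "/usr/staff/notes.txt")
```

Caching the results of lookup at the client, as described under client caching, is what keeps this per-component cost from dominating in practice.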
NFS summary
- NFS is an excellent example of a simple, robust, high-performance distributed service.
- Achieving the transparencies is a further goal of NFS:
  - Access transparency: the API is the UNIX system call interface for both local and remote files.
  - Location transparency: naming of file systems is controlled by client mount operations, but transparency can be ensured by an appropriate system configuration.
  - Mobility transparency: hardly achieved; relocation of files is not possible, and relocation of file systems is possible but requires updates to client configurations.
  - Scalability transparency: file systems (file groups) may be subdivided and allocated to separate servers. Ultimately, the performance limit is determined by the load on the server holding the most heavily used file system (file group).
  - Replication transparency: limited to read-only file systems; for writable files, the Sun Network Information Service (NIS) runs over NFS and is used to replicate essential system files.
  - Hardware and operating system heterogeneity: NFS has been implemented for almost every known operating system and hardware platform and is supported by a variety of filing systems.
  - Fault tolerance: limited but effective; service is suspended if a server fails. Recovery from failures is aided by the simple stateless design.
  - Consistency: NFS provides a close approximation to one-copy semantics and meets the needs of the vast majority of applications. But the use of file sharing via NFS for communication or close coordination between processes on different computers cannot be recommended.
  - Security: recent developments include the option to use a secure RPC implementation for authentication and for the privacy and security of the data transmitted with read and write operations.
  - Efficiency: NFS protocols can be implemented for use in situations that generate very heavy loads.