DISTRIBUTED SYSTEMS: Principles and Paradigms, Second Edition
ANDREW S. TANENBAUM, MAARTEN VAN STEEN

Chapter 11: DISTRIBUTED FILE SYSTEMS

Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5
DISTRIBUTED FILE SYSTEMS
Objectives:
• Architecture
• Processes
• Communication
• Synchronization
• Consistency and Replication
• Fault Tolerance
DISTRIBUTED FILE SYSTEMS
Distributed file systems allow multiple processes to share data over long periods of time in a secure and reliable way.
DISTRIBUTED FILE SYSTEMS
ARCHITECTURE:
- Client-Server Architectures
- Cluster-Based Distributed File Systems
- Symmetric Architectures (P2P-based File Systems)
DISTRIBUTED FILE SYSTEMS
ARCHITECTURE:
- Client-Server Architectures, e.g., the Network File System (NFS) from Sun Microsystems
File System Model:
- To access a file, a client must first look up its name in a naming service and obtain the associated file handle.
- Each file has a number of attributes whose values can be looked up and changed.
Client-Server Architectures
Figure 11-1. (a) The remote access model. (b) The upload/download model.
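The difference between the two models in Figure 11-1 can be illustrated with a toy, in-memory server that counts how many requests reach it. This is a minimal sketch, not NFS code: `ToyFileServer`, `remote_access_upper` and `upload_download_upper` are hypothetical names, and "the network" is just a method call. In the remote access model the client sends every read and write to the server; in the upload/download model it transfers the whole file once, works on a local copy, and sends the result back.

```python
class ToyFileServer:
    """In-memory stand-in for a remote file server (hypothetical, for illustration)."""

    def __init__(self):
        self.files = {}     # file name -> bytearray
        self.requests = 0   # number of client requests that reached the server

    # Remote access model: fine-grained operations executed at the server
    def read(self, name, offset, count):
        self.requests += 1
        return bytes(self.files[name][offset:offset + count])

    def write(self, name, offset, data):
        self.requests += 1
        buf = self.files.setdefault(name, bytearray())
        end = offset + len(data)
        if len(buf) < end:
            buf.extend(b"\0" * (end - len(buf)))   # grow the file if needed
        buf[offset:end] = data

    # Upload/download model: whole-file transfers only
    def download(self, name):
        self.requests += 1
        return bytes(self.files[name])

    def upload(self, name, data):
        self.requests += 1
        self.files[name] = bytearray(data)


def remote_access_upper(server, name, size, block=4):
    """Upper-case a file block by block; every step is a request to the server."""
    for offset in range(0, size, block):
        data = server.read(name, offset, block)
        server.write(name, offset, data.upper())


def upload_download_upper(server, name):
    """Fetch the whole file, modify the local copy, and send it back."""
    local = server.download(name)
    server.upload(name, local.upper())
```

For a 12-byte file processed in 4-byte blocks, the remote access version issues six requests (three reads, three writes), while the upload/download version always issues two, at the cost of transferring the entire file and working on a copy the server does not see until upload.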
Client-Server Architectures
Figure 11-2. The basic NFS architecture for UNIX systems.
File System Model
Figure 11-3. An incomplete list of file system operations supported by NFS.
DISTRIBUTED FILE SYSTEMS
Cluster-Based Distributed File Systems
- When dealing with very large data collections, a simple client-server approach does not work.
- To speed up file accesses, apply file-striping techniques: when a single file is distributed across multiple servers, different parts of it can be fetched in parallel.
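The striping idea can be sketched as follows, assuming a simple round-robin placement of fixed-size stripe units over the servers. The stripe unit of 4 bytes and the names `stripe` and `fetch_parallel` are hypothetical choices for illustration; each "server" is a dictionary, and a thread pool stands in for concurrent network requests.

```python
from concurrent.futures import ThreadPoolExecutor

STRIPE_UNIT = 4  # bytes per stripe unit; tiny on purpose (hypothetical value)


def stripe(data: bytes, num_servers: int, unit: int = STRIPE_UNIT):
    """Place consecutive stripe units round-robin on the servers."""
    servers = [dict() for _ in range(num_servers)]  # per server: unit index -> bytes
    for start in range(0, len(data), unit):
        index = start // unit
        servers[index % num_servers][index] = data[start:start + unit]
    return servers


def fetch_from_server(server: dict) -> dict:
    # Stands in for one network request to one storage server.
    return dict(server)


def fetch_parallel(servers) -> bytes:
    """Fetch all servers' parts concurrently, then reassemble in unit order."""
    with ThreadPoolExecutor(max_workers=max(1, len(servers))) as pool:
        parts = list(pool.map(fetch_from_server, servers))
    units = {}
    for part in parts:
        units.update(part)
    return b"".join(units[i] for i in sorted(units))
```

With three servers, units 0, 3, 6, ... land on the first server, units 1, 4, 7, ... on the second, and so on, so a large sequential read is spread over all servers instead of queuing at one.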
Cluster-Based Distributed File Systems
Figure 11-4. The difference between (a) distributing whole files across several servers and (b) striping files for parallel access.
Cluster-Based Distributed File Systems
Example: Google has developed its own file system, the Google File System (GFS).
The Google solution: divide files into large 64 MB chunks, and distribute/replicate the chunks across many servers:
- The master maintains only a (file name, chunk server) table in main memory, so it performs minimal I/O.
- Files are replicated using a primary-backup scheme.
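Because chunks have a fixed size, a client can turn a (file name, byte offset) pair into a chunk index on its own and ask the master only for the chunk's location; in GFS the file data itself is then read directly from a chunk server, not through the master. The sketch below assumes a simplified master table keyed by (file name, chunk index); `Master`, `locate`, and the handle and server names are hypothetical.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunks, as described above


class Master:
    """Toy GFS-style master holding only metadata in memory (simplified layout)."""

    def __init__(self):
        # (file name, chunk index) -> (chunk handle, list of chunk servers holding replicas)
        self.table = {}

    def lookup(self, name, chunk_index):
        return self.table[(name, chunk_index)]


def locate(master, name, offset):
    """Translate a byte offset into a chunk handle, its replicas, and an offset in the chunk."""
    chunk_index = offset // CHUNK_SIZE
    offset_in_chunk = offset % CHUNK_SIZE
    handle, servers = master.lookup(name, chunk_index)
    return handle, servers, offset_in_chunk
```

Since a lookup returns only a handle and a short server list, the master's work per request stays small, which is what lets a single master keep its whole table in main memory.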
Cluster-Based Distributed File Systems
Figure 11-5. The organization of a Google cluster of servers.
Symmetric Architectures: P2P-based File Systems
Example: Ivy, a distributed file system built on top of a Chord DHT-based system.
Basic idea: store data blocks in the underlying P2P system:
- Every data block with content D is stored on the node responsible for hash h(D). This allows for an integrity check.
- Public-key blocks are signed with the associated private key and looked up with the public key.
- A local log of file operations keeps track of {blockID, h(D)} pairs.
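Content-hash blocks make the integrity check almost free: the key under which a block is stored is h(D), so a reader recomputes the hash of whatever it receives and rejects any mismatch. The sketch below assumes SHA-1 as h (chosen here for illustration), replaces the DHT with a local dictionary, and uses a Chord-style successor rule to decide which node is responsible for a key. `BlockStore` and `responsible_node` are hypothetical names; signed public-key blocks are not shown.

```python
import hashlib


class BlockStore:
    """Toy content-addressed block store; in Ivy the blocks live in a Chord-based DHT."""

    def __init__(self):
        self.blocks = {}  # h(D) as hex string -> D

    def put(self, data: bytes) -> str:
        key = hashlib.sha1(data).hexdigest()  # the block's key is h(D)
        self.blocks[key] = data
        return key

    def get(self, key: str) -> bytes:
        data = self.blocks[key]
        # Integrity check: the content must hash to the key it was fetched under.
        if hashlib.sha1(data).hexdigest() != key:
            raise ValueError("integrity check failed")
        return data


def responsible_node(key_hex: str, node_ids: list, bits: int = 160) -> int:
    """Chord-style successor: the first node id >= key on the ring, wrapping around."""
    key = int(key_hex, 16) % (2 ** bits)
    for node in sorted(node_ids):
        if node >= key:
            return node
    return min(node_ids)
```

A malicious or faulty node cannot substitute different content for a block without the reader noticing, because it cannot produce other data with the same hash.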
Symmetric Architectures
Figure 11-6. The organization of the Ivy distributed file system.
DISTRIBUTED FILE SYSTEMS
Processes: the most interesting aspect concerning file system processes is whether or not they should be stateless.
Example: earlier versions of NFS used stateless servers, whereas the latest version (NFSv4) is stateful.
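The trade-off can be made concrete with two toy servers. In the stateless one, every request carries everything needed to serve it (file handle, offset, count), so a restarted server can answer the next request immediately. The stateful one keeps an open-file table with a per-client file pointer, which saves the client from sending offsets but is lost when the server crashes. Both classes are hypothetical sketches, not NFS server code.

```python
class StatelessServer:
    """Each request is self-contained: (file handle, offset, count)."""

    def __init__(self, files):
        self.files = files  # file handle -> file contents (bytes)

    def read(self, handle, offset, count):
        return self.files[handle][offset:offset + count]


class StatefulServer:
    """Keeps an open-file table with a current position per (client, file)."""

    def __init__(self, files):
        self.files = files
        self.open_files = {}  # (client, handle) -> current offset; lost on a crash

    def open(self, client, handle):
        self.open_files[(client, handle)] = 0

    def read(self, client, handle, count):
        pos = self.open_files[(client, handle)]  # KeyError if the file was never opened
        data = self.files[handle][pos:pos + count]
        self.open_files[(client, handle)] = pos + len(data)
        return data
```

Simulating a crash by creating a fresh server instance over the same files shows the difference: the stateless server continues unaffected, while the stateful one has forgotten every open file and its clients must recover that state.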
DISTRIBUTED FILE SYSTEMS
COMMUNICATION:
- Communication in distributed file systems is based on remote procedure calls (RPCs).
- The main reason is to make the system independent of the underlying operating systems, networks, and transport protocols.
- RPC in NFS: every NFS operation can be implemented as a single remote procedure call to a file server.
Remote Procedure Calls in NFS
Figure 11-7. (a) Reading data from a file in NFS version 3. (b) Reading data using a compound procedure in version 4.
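The benefit of the version 4 compound procedure is fewer round trips: a lookup followed by a read costs two RPCs in the version 3 style, but only one when both operations travel in a single compound request that the server executes in order, stopping at the first failure. The sketch below is heavily simplified and hypothetical: the file name doubles as the file handle, there is no OPEN or state handling, and the status strings are illustrative only.

```python
class ToyNFSServer:
    """Counts round trips for v3-style single calls versus a v4-style compound."""

    def __init__(self, files):
        self.files = files      # name -> contents; the name doubles as a toy file handle
        self.round_trips = 0

    # v3 style: one RPC per operation
    def lookup(self, name):
        self.round_trips += 1
        if name not in self.files:
            raise FileNotFoundError(name)
        return name

    def read(self, fh, offset, count):
        self.round_trips += 1
        return self.files[fh][offset:offset + count]

    # v4 style: one RPC carrying a list of operations, executed in order
    def compound(self, ops):
        self.round_trips += 1
        current_fh, results = None, []
        for op, *args in ops:
            if op == "LOOKUP":
                if args[0] not in self.files:
                    results.append(("LOOKUP", "ENOENT"))
                    break                   # stop at the first failing operation
                current_fh = args[0]
                results.append(("LOOKUP", "OK"))
            elif op == "READ":
                if current_fh is None:
                    results.append(("READ", "NOFILEHANDLE"))
                    break
                offset, count = args
                results.append(("READ", self.files[current_fh][offset:offset + count]))
        return results
```

Usage: `ToyNFSServer({"notes.txt": b"distributed"}).compound([("LOOKUP", "notes.txt"), ("READ", 0, 4)])` performs the lookup and the read in one round trip, whereas calling `lookup` and then `read` on a fresh server costs two. On a wide-area link, where latency dominates, this difference matters most.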
DISTRIBUTED FILE SYSTEMS
Synchronization: File sharing semantics
• When dealing with distributed file systems, we need to take into account the ordering of concurrent read/write operations and the expected semantics (i.e., consistency).
• When two or more users share the same file at the same time, the semantics of reading and writing must be defined precisely to avoid problems.
DISTRIBUTED FILE SYSTEMS
Synchronization: File sharing semantics
• UNIX semantics: a read operation returns the effect of the last write operation. This can only be implemented for remote access models in which there is only a single copy of the file.
• Transaction semantics: the file system supports transactions on a single file; the issue is how to allow concurrent access to a physically distributed file.
• Session semantics: the effects of read and write operations are seen only by the client that has opened (a local copy of) the file; the issue is what happens when a file is closed (only one client may actually win).
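Session semantics can be sketched in a few lines: opening a file gives the client a private copy, writes change only that copy, and closing a modified file overwrites the server's version, so when several clients modify the same file, the last one to close wins. `SessionServer` and `Session` are hypothetical names; the server holds a single file to keep the example small.

```python
class SessionServer:
    """Holds the authoritative copy of a single file (toy example)."""

    def __init__(self, content: bytes):
        self.content = content


class Session:
    """One client's open file under session semantics."""

    def __init__(self, server: SessionServer):
        self.server = server
        self.local = server.content  # private copy taken at open()
        self.dirty = False

    def read(self) -> bytes:
        return self.local            # sees only this client's own writes

    def write(self, data: bytes) -> None:
        self.local = data            # not visible to other clients yet
        self.dirty = True

    def close(self) -> None:
        if self.dirty:
            self.server.content = self.local  # the last modified copy to close wins
```

If clients A and B open the same file, A's writes stay invisible to B even after A closes, because B keeps reading its own copy; when B then writes and closes, B's version silently replaces A's.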
Semantics of File Sharing
Figure 11-16. (a) On a single processor, when a read follows a write, the value returned by the read is the value just written.
Semantics of File Sharing
Figure 11-16. (b) In a distributed system with caching, obsolete values may be returned.
Semantics of File Sharing
Figure 11-17. Four ways of dealing with shared files in a distributed system.
DISTRIBUTED FILE SYSTEMS
CONSISTENCY AND REPLICATION:
• In modern distributed file systems, client-side caching is the preferred technique for attaining performance; server-side replication is done for fault tolerance.
• Clients are allowed to keep (large parts of) a file, and will be notified when control is withdrawn; servers are now generally stateful.
DISTRIBUTED FILE SYSTEMS
CONSISTENCY AND REPLICATION
Example: caching in NFS. Clients cache file data, attributes, file handles, and directories. Different strategies exist to handle the consistency of cached data, cached attributes, etc.
Client-Side Caching
Figure 11-21. Client-side caching in NFS.
Client-Side Caching
Figure 11-22. Using the NFSv4 callback mechanism to recall file delegation.
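The recall mechanism of Figure 11-22 can be sketched as follows: the server delegates a file to the client that opens it, that client may then update its cached copy locally, and when another client wants the file the server calls back the current holder, which flushes its changes and gives up its copy before the new client is served. This is a simplified, hypothetical model: only exclusive write delegations are modeled, the callback is a direct method call, and there are no leases or timeouts.

```python
class DelegatingServer:
    """Toy server that hands out one exclusive delegation per file."""

    def __init__(self):
        self.data = {}          # file name -> contents
        self.delegations = {}   # file name -> client currently holding the delegation

    def open_for_write(self, client, name):
        holder = self.delegations.get(name)
        if holder is not None and holder is not client:
            holder.recall(name)          # callback: holder flushes and drops its copy
            del self.delegations[name]
        self.delegations[name] = client  # delegate the file to the new client
        return self.data.get(name, b"")


class CachingClient:
    def __init__(self, server):
        self.server = server
        self.cache = {}         # file name -> locally cached contents

    def open(self, name):
        self.cache[name] = self.server.open_for_write(self, name)

    def write(self, name, data):
        self.cache[name] = data  # purely local: the delegation makes this safe

    def recall(self, name):
        # Invoked by the server's callback: return modified data, invalidate the copy.
        self.server.data[name] = self.cache.pop(name)
```

While client A holds the delegation, its writes never reach the server; only when client B opens the same file does the recall push A's version back, so B starts from A's latest data.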
DISTRIBUTED FILE SYSTEMS
FAULT TOLERANCE: replication is deployed to create fault-tolerant server groups.
• Handling Byzantine (arbitrary) failures: a quorum certificate proves that a sufficiently large number of processes (2k+1) have stored the same request, and that it is thus safe to proceed.
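A quorum certificate check can be sketched under the usual assumption that a group of n = 3k+1 replicas tolerates up to k Byzantine members. Any two sets of 2k+1 replicas then overlap in at least 2(2k+1) - (3k+1) = k+1 replicas, so at least one correct replica is in both, which is why 2k+1 matching reports are enough to proceed. The function names below are hypothetical; each replica is counted at most once because replies are keyed by replica id.

```python
from collections import Counter


def max_faulty(n: int) -> int:
    """Largest k such that n >= 3k + 1."""
    return (n - 1) // 3


def quorum_certificate(replies: dict, n: int):
    """replies maps replica id -> digest of the request that replica stored.

    Returns the digest if at least 2k+1 distinct replicas agree on it, else None.
    """
    if not replies:
        return None
    k = max_faulty(n)
    digest, votes = Counter(replies.values()).most_common(1)[0]
    return digest if votes >= 2 * k + 1 else None
```

With n = 4 (so k = 1), three matching replies form a certificate even if the fourth replica reports something arbitrary; two matching replies do not.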
Handling Byzantine Failures
Figure 11-26. The different phases in Byzantine fault tolerance.
Summary
• Distributed file systems form an important paradigm for building distributed systems. They are generally organized according to the client-server model, with client-side caching and server replication to meet scalability requirements.
• Caching and replication are also needed to achieve high availability.
• More recently, symmetric architectures such as those in peer-to-peer file-sharing systems have emerged; an important issue there is whether whole files or data blocks are distributed.
Summary
• All operations can be expressed as RPCs to a file server instead of having to use primitive message-passing operations.
• What makes distributed file systems different from nondistributed file systems is the semantics of file sharing: UNIX semantics, session semantics, immutable files, and transactions.
• To achieve acceptable performance, distributed file systems generally allow clients to cache an entire file.