Distributed File Systems NFS AFS SMB
DISTRIBUTION
Distribution
- Distributed File System (DFS)
  - Network file system with clients, servers, and storage devices
    - Spread throughout the machines of a distributed system or intranet
  - Service activity occurs across the network
  - System has multiple and independent storage devices
    - Some DFS servers run on dedicated machines
    - Some machines can be both a server and a client
  - A DFS can be implemented:
    - As part of a distributed operating system
    - By a software layer
      - Managing the communication between conventional operating systems and file systems

Distribution
- Distinctive feature of a DFS:
  - System with many autonomous clients and servers
TRANSPARENCY
Transparency
- A DFS should appear to its users to be a conventional, centralized file system
  - Multiplicity and dispersion of its servers and storage devices should be invisible
  - Client interface used by programs should not distinguish between local and remote files
    - DFS locates the files and arranges the transport of data
PERFORMANCE
Performance
- Most important performance measurement of a DFS:
  - Amount of time needed to satisfy service requests
- Typical performance constraints:
  - Conventional file systems (local drives):
    - Disk-access time
    - Small amount of CPU-processing time
  - DFS:
    - Additional overhead due to the distributed structure
    - Includes:
      - Time to deliver the request to a server
      - Time to deliver the response to the client
      - For each direction, a CPU overhead of running the communication protocol software
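The cost components listed on this slide can be added up in a few lines. The sketch below is only an illustration: the class name `RequestCost`, its field names, and any numbers passed to it are assumptions, not measurements from a real system.

```python
from dataclasses import dataclass


@dataclass
class RequestCost:
    """Hypothetical per-request cost model; all times are in seconds."""
    disk_access: float               # time spent on the disk
    cpu_processing: float            # file-system CPU work
    request_delivery: float = 0.0    # time to deliver the request to the server
    response_delivery: float = 0.0   # time to deliver the response to the client
    protocol_cpu: float = 0.0        # protocol CPU overhead, paid once per direction

    def local_time(self) -> float:
        # Conventional file system: disk access plus a little CPU time.
        return self.disk_access + self.cpu_processing

    def dfs_time(self) -> float:
        # DFS: the local cost plus delivery time and protocol CPU in both directions.
        network = self.request_delivery + self.response_delivery
        return self.local_time() + network + 2 * self.protocol_cpu


# Illustrative numbers only.
cost = RequestCost(disk_access=0.008, cpu_processing=0.001,
                   request_delivery=0.0005, response_delivery=0.0005,
                   protocol_cpu=0.0002)
overhead = cost.dfs_time() - cost.local_time()
```

Here `overhead` is the part of the service time that exists only because the file system is distributed.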
Performance
- Performance of a DFS can be viewed as one dimension of its transparency
  - Ideally, it would be comparable to that of a conventional file system
CONCURRENT FILE UPDATES
Concurrent file updates
- DFS should allow
  - Multiple client processes
    - On multiple machines
    - To access and update the same files
  - Updates to the file from one client should not interfere with access and updates from other clients
  - Concurrency control or locking may be:
    - Built into the file system
    - Provided by an add-on protocol
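As a minimal sketch of what locking looks like from a client program, the example below uses the POSIX advisory lock exposed by Python's standard `fcntl` module (Unix only). The helper name `locked_append` is hypothetical, and whether such a lock is honored across machines depends on the particular DFS and its lock protocol, so treat this as an assumption rather than a guarantee.

```python
import fcntl
import os


def locked_append(path: str, text: str) -> None:
    """Append text to a file while holding an exclusive advisory lock."""
    with open(path, "a") as handle:
        # Blocks until no other cooperating process holds the lock.
        fcntl.lockf(handle, fcntl.LOCK_EX)
        try:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())  # push the update out before releasing
        finally:
            fcntl.lockf(handle, fcntl.LOCK_UN)
```

The lock is advisory: it only protects against other processes that also take the lock before updating the file.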
Distributed File Systems
Sample File Systems
- Client-server file systems
  - 9P
  - Amazon S3
  - Coda
  - Secure File System (SFS)
  - Self-certifying File System (SFS)

Sample File Systems
- Distributed file systems
  - Andrew File System (AFS), OpenAFS
  - DCE Distributed File System
    - Remote file access protocol used with the Distributed Computing Environment
  - Google File System
    - File system that is optimized for Google's core storage needs
  - Hadoop
    - Java software framework that supports distributed applications running on large clusters of commodity computers
  - Kosmos FS
    - High performance distributed file system. It is also integrated with Hadoop.
  - Lustre
    - Distributed file system for large scale cluster computing
  - Microsoft Distributed File System
    - Set of client and server services for organizing many distributed SMB file shares into a distributed file system
  - GlusterFS
    - Platform for developing clustering applications
  - Parallel Virtual File System
    - Parallel file system that focuses on high performance access to large data sets
NFS PROTOCOL http://en.wikipedia.org/wiki/Network_File_System_%28protocol%29
Network File System Protocol
- Network file system protocol:
  - Allows a user on a client computer to access files over a network
    - As if the network devices were attached to its local disks
  - Originally developed by Sun Microsystems in 1984
- NFS builds on the Open Network Computing Remote Procedure Call system
  - ONC RPC
- Network File System protocol specified in:
  - RFC 1094
  - RFC 1813
  - RFC 3530 (which obsoletes RFC 3010)
VERSIONS AND VARIATIONS
Versions and variations
- Version 1
  - Used only for Sun in-house experimental purposes
  - Not released to public

Versions and variations
- Version 2
  - Originally operated entirely over UDP
  - Defined in RFC 1094
    - March 1989
  - Designers meant to keep the protocol stateless
    - Locking (for example) implemented outside of the core protocol

Versions and variations
- Version 3 (RFC 1813, June 1995) added:
  - Support for 64-bit file sizes and offsets
    - Handle files larger than 4 gigabytes (GB)
  - Support for asynchronous writes on the server
    - Improve write performance
  - Additional file attributes in many replies
    - Avoid the need to re-fetch them
  - READDIRPLUS operation
    - Get file handles and attributes
    - Along with file names when scanning a directory
  - Assorted other improvements

Versions and variations
- At the introduction of Version 3
  - Vendor support for TCP as a transport-layer protocol began increasing
  - Several vendors had already added support for NFS Version 2 with TCP as a transport
    - Sun Microsystems added support for TCP as a transport for NFS at the same time it added support for Version 3
  - Using TCP as a transport made using NFS over a WAN more feasible

Versions and variations
- Version 4
  - First version developed with the IETF
  - After Sun Microsystems handed over the development of the NFS protocols

Versions and variations
- Version 4 includes:
  - Performance improvements
  - Mandates strong security
  - Introduces a stateful protocol
- Defined by
  - RFC 3010, December 2000
  - Revised in RFC 3530, April 2003

Versions and variations
- Various side-band protocols have become associated with NFS, including:
  - Byte-range advisory Network Lock Manager (NLM) protocol
    - Added to support UNIX System V file-locking APIs
  - Remote quota reporting (RQUOTAD) protocol
    - Allows NFS users to view their data-storage quotas on NFS servers

Versions and variations
- WebNFS
  - An extension to Version 2 and Version 3
  - Allows NFS to
    - Integrate more easily into Web browsers
    - Enable operation through firewalls
- Sun Microsystems open sourced their WebNFS implementation
  - via https://yanfs.dev.java.net/
NFS GENERAL http://en.wikipedia.org/wiki/Network_file_system
Network File System (NFS)
- Any computer file system that supports sharing of files and other resources as persistent storage over a computer network
  - First file servers were developed in the 1970s
  - Network File System (NFS)
    - Created at Sun Microsystems in 1985
    - First widely used distributed file system
- Other notable distributed file systems
  - Andrew File System (AFS)
  - Server Message Block (SMB)
    - AKA CIFS
PLATFORMS
Platforms
- NFS
  - Most commonly used with UNIX
  - Other software platforms:
    - Mac OS
    - Microsoft Windows
    - Novell NetWare
    - IBM AS/400

Platforms
- Alternative remote file access protocols include
  - Server Message Block (SMB) protocol
    - Also known as CIFS
  - Apple Filing Protocol (AFP)
  - NetWare Core Protocol (NCP)
  - OS/400 File Server file system (QFileSvr.400)
- SMB and NetWare Core Protocol (NCP) are most common on Microsoft Windows
- AFP is most common on Macintosh systems
- QFileSvr.400 is most common on AS/400 systems
TYPICAL IMPLEMENTATION
Typical implementation
- Unix-style scenario:
  - Client
    - Requires access to data
  - Server
    - Serves data

Typical implementation
- Server Side
  - Server
    - NFS daemon (nfsd)
    - Makes data generically available to clients
  - Server administrator
    - Determines what resources to make available
      - Exporting names and parameters of directories
      - Typically using the
        - /etc/exports configuration file
        - exportfs command
  - Server security-administration
    - Recognizes and approves validated clients
  - Server network configuration
    - Lets appropriate clients negotiate with it through any firewall system

Typical implementation
- Client Side
  - Client machine
    - Requests access to exported data
    - Issues a mount command
  - Users on the client machine
    - View and interact with mounted file systems on the server
    - Within the parameters permitted
- Note: automating the NFS mounting process is typical
  - E.g. using /etc/fstab and/or automounting facilities
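To make the two configuration files concrete, the sketch below builds one line for each in the common Linux-style syntax (other Unix systems use different formats). The helper names, the host name `fileserver`, the paths, and the option choices are all hypothetical and only illustrate the shape of the entries.

```python
def exports_line(directory: str, client: str, options: list[str]) -> str:
    """Format one /etc/exports entry: a directory, then client(options)."""
    return f"{directory} {client}({','.join(options)})"


def fstab_line(server: str, export: str, mount_point: str,
               options: str = "defaults") -> str:
    """Format one /etc/fstab entry that mounts an NFS export at boot."""
    return f"{server}:{export} {mount_point} nfs {options} 0 0"


# Server side: export /srv/share read-only to one subnet.
server_entry = exports_line("/srv/share", "192.168.1.0/24", ["ro", "sync"])

# Client side: mount that export automatically under /mnt/share.
client_entry = fstab_line("fileserver", "/srv/share", "/mnt/share")
```

The server entry names who may mount the directory and with which parameters; the client entry automates the mount command that a user would otherwise issue by hand.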
PROTOCOL DEVELOPMENT VERSUS COMPETING PROTOCOLS
Protocol development versus competing protocols
- 1980s
  - NFS and ONC figured prominently in the network-computing war between Sun Microsystems and Apollo Computer, and later the UNIX wars (ca. 1987-1996) between AT&T and Sun on one side, and Digital Equipment, HP, and IBM on the other. During the development of the ONC protocol (called SunRPC at the time), only Apollo's Network Computing System (NCS) offered comparable functionality. Two competing groups developed over fundamental differences in the two remote procedure call systems. Arguments focused on the method for data-encoding: ONC's External Data Representation (XDR) always rendered integers in big-endian order, even if both peers of the connection had little-endian machine architectures, whereas NCS's method attempted to avoid byte-swapping whenever two peers shared a common endianness in their machine architectures. An industry group called the Network Computing Forum formed (March 1987) in an (ultimately unsuccessful) attempt to reconcile the two network-computing environments.
  - Later, Sun and AT&T announced that the two firms would jointly develop AT&T's next version of UNIX: System V Release 4. This caused many of AT&T's other licensees of UNIX System V to become concerned that this would put Sun in an advantaged position, and it ultimately led to Digital Equipment, HP, IBM, and others forming the Open Software Foundation (OSF) in 1988. Ironically, Sun and AT&T had previously competed over Sun's NFS versus AT&T's Remote File System (RFS), and the quick adoption of NFS over RFS by Digital Equipment, HP, IBM, and many other computer vendors tipped the majority of users in favor of NFS. OSF solicited proposals for various technologies, including the remote procedure call (RPC) system and the remote file access protocol. In the end, the proposals for these two requirements, called respectively the Distributed Computing Environment (DCE) and the Distributed File System (DFS), won over Sun's proposed ONC and NFS. DCE derived from a suite of technologies, including NCS and Kerberos. DFS used DCE as the RPC and derived from AFS.
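The data-encoding argument is easy to see in code. The sketch below encodes a 32-bit signed integer the XDR way (always big-endian) using Python's standard `struct` module. The function names are hypothetical, and `native_encode_int` only stands in for the idea of sending data in the sender's own byte order; it is not the actual NCS wire format.

```python
import struct
import sys


def xdr_encode_int(value: int) -> bytes:
    """Encode a 32-bit signed integer as XDR does: always big-endian."""
    return struct.pack(">i", value)


def xdr_decode_int(data: bytes) -> int:
    """Decode a 4-byte big-endian XDR integer."""
    (value,) = struct.unpack(">i", data)
    return value


def native_encode_int(value: int) -> bytes:
    """Encode in the sender's own byte order (illustrative stand-in only)."""
    return struct.pack("=i", value)


encoded = xdr_encode_int(1)

# On a little-endian machine the XDR bytes differ from the native ones, so
# both peers must swap bytes even when they share the same endianness.
needs_swap = sys.byteorder == "little"
```

Two little-endian peers using XDR each pay for a byte swap; an approach that sends native-order data avoids the swap when the peers agree, at the cost of having to describe the byte order in the message.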
Protocol development versus competing protocols
- 1990s
  - Sun Microsystems and the Internet Society (ISOC) reached an agreement to cede "change control" of ONC RPC so that ISOC's engineering-standards body, the Internet Engineering Task Force (IETF), could publish standards documents (RFCs) documenting the ONC RPC protocols and could extend ONC RPC. OSF attempted to make DCE RPC an IETF standard, but ultimately proved unwilling to give up change control. Later, the IETF chose to extend ONC RPC by adding a new authentication flavor, RPCSEC_GSS, in order to meet IETF's requirements that protocol standards have adequate security.
  - Later, Sun and ISOC reached a similar agreement to give ISOC change control over NFS, although writing the contract carefully to exclude NFS version 2 and version 3. Instead, ISOC gained the right to add new versions to the NFS protocol, which resulted in IETF specifying NFS version 4 in 2003.
- 2000s
  - By the 21st century, neither DFS nor AFS had achieved any major commercial success as compared to CIFS or NFS. IBM, which had previously acquired the primary commercial vendor of DFS and AFS, Transarc, donated most of the AFS source code to the free software community in 2000. The OpenAFS project lives on. In early 2005, IBM announced end of sales for AFS and DFS.
Present
- NFSv4.1 presents new opportunities for the NAS community
  - Primary addition is Parallel NFS (pNFS)
    - A method of introducing data access parallelism
  - Protocol defines a method of separating the meta-data (names and attributes) of a filesystem from the location of the file data
    - Goes beyond simple name/data separation
    - Defines a method of striping the data amongst a set of data servers
  - Very different from the traditional NFS server
    - Holds the names of files and their data under the single umbrella of the server
  - Products exist that are multi-node NFS servers
    - Limited participation by the client in separation of meta-data and data
  - NFSv4.1 client can be enabled to be a direct participant in the exact location of file data
    - Avoids sole interaction with the single NFS server when moving data

Present
- NFSv4.1 pNFS server:
  - A collection or community of server resources or components
    - Community members are controlled by the meta-data server
- pNFS client still accesses a single meta-data server for traversal or interaction with the namespace
  - Client moves data to and from the server
    - Directly interacts with the set of data servers belonging to the pNFS server community
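Striping across data servers can be pictured with a simple round-robin mapping from a file offset to a data server. This is a sketch under stated assumptions: the function name `locate`, the server names, and the fixed stripe unit are hypothetical, and a real pNFS client obtains its layout from the meta-data server rather than computing it this way.

```python
def locate(offset: int, stripe_unit: int, data_servers: list[str]) -> tuple[str, int]:
    """Map a file offset to (data server, offset within that server's data).

    Assumes plain round-robin striping with a fixed stripe unit.
    """
    stripe_index = offset // stripe_unit
    server = data_servers[stripe_index % len(data_servers)]
    # Stripes already stored on this server, plus the position inside the stripe.
    full_stripes_here = stripe_index // len(data_servers)
    local_offset = full_stripes_here * stripe_unit + offset % stripe_unit
    return server, local_offset


servers = ["ds0", "ds1", "ds2"]       # hypothetical data servers
first = locate(0, 4, servers)         # stripe 0 lands on ds0
second = locate(5, 4, servers)        # stripe 1 lands on ds1
wrapped = locate(13, 4, servers)      # stripe 3 wraps around to ds0
```

Because consecutive stripes land on different servers, a client that knows the mapping can read or write several stripes in parallel instead of funnelling all data through one server.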
Present
- NFSv4.1 provides:
  - Sessions
  - Directory Delegation and Notifications
  - Multi-server Namespace
  - ACL/SACL/DACL
  - Retention Attributions
  - SECINFO_NO_NAME
AFS http://en.wikipedia.org/wiki/Andrew_File_System
Andrew File System (AFS)
- Distributed networked file system
  - Developed by Carnegie Mellon University
    - Part of the Andrew Project
  - Named for
    - Andrew Carnegie
    - Andrew Mellon
- Primarily used in distributed computing
FEATURES
Features
- AFS has several benefits over traditional networked file systems
  - Particularly in the areas of security and scalability
  - Not uncommon for enterprise AFS cells to exceed 50,000 clients
  - Uses Kerberos for authentication
    - Implements access control lists on directories for users and groups
  - Each client caches files on the local filesystem for increased speed on subsequent requests for the same file
    - Allows limited file system access in the event of a server crash or a network outage

Features
- Read and write operations on an open file are directed only to the locally cached copy
  - When a modified file is closed
    - Only the changed portions are copied back to the file server
  - Cache consistency maintained by callback
  - When a file is cached
    - Server makes a note of this
    - Will inform the client if the file is updated by someone else
  - After any client, server, or network failure (including a time-out)
    - Callbacks are discarded and must be re-established
  - Re-establishing a callback involves:
    - Status check
    - Does not require re-reading the file itself
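The callback idea can be sketched with two small classes. This is a toy model, not AFS code: the class and method names are hypothetical, whole files are stored on close (the slide notes that real AFS copies back only the changed portions), and failure handling is left out.

```python
class FileServer:
    """Toy server that remembers which clients cache which file."""

    def __init__(self):
        self.files = {}       # path -> contents
        self.callbacks = {}   # path -> set of clients promised a callback

    def fetch(self, client, path):
        # Note that this client now caches the file.
        self.callbacks.setdefault(path, set()).add(client)
        return self.files[path]

    def store(self, writer, path, data):
        self.files[path] = data
        # Inform every other caching client that its copy is stale.
        for client in self.callbacks.get(path, set()):
            if client is not writer:
                client.break_callback(path)
        self.callbacks[path] = {writer}


class Client:
    """Toy client that works on its locally cached copy."""

    def __init__(self, server):
        self.server = server
        self.cache = {}       # path -> contents
        self.valid = set()    # paths whose callback is still intact

    def open(self, path):
        if path not in self.valid:        # no callback: go back to the server
            self.cache[path] = self.server.fetch(self, path)
            self.valid.add(path)
        return self.cache[path]

    def close(self, path, data):
        self.cache[path] = data
        self.valid.add(path)
        self.server.store(self, path, data)

    def break_callback(self, path):
        self.valid.discard(path)
```

A client with an intact callback never contacts the server on open, which is where the caching speed-up comes from; only a broken callback forces a fresh fetch.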
Features
- Consequence of the whole file locking strategy
  - AFS does not support large shared databases or record updating within files shared between client systems
  - Deliberate design decision
    - Based on the perceived needs of the university computing environment
  - Leads to the use of a single file per message in the original email system for the Andrew Project, the Andrew Message System, rather than the more customary single file per mailbox

Features
- A significant feature of AFS is the volume
  - Tree of files
  - Sub-directories
  - AFS mountpoints
    - Links to other AFS volumes
- Volumes are created by administrators
  - Linked at a specific named path in an AFS cell
  - Once created, users of the filesystem may create directories and files as usual without concern for the physical location of the volume
- A volume may have a quota assigned to it in order to limit the amount of space consumed
  - As needed, AFS administrators can move that volume to another server and disk location without the need to notify users
    - Moving can occur even while files in that volume are being used

Features
- AFS volumes can be replicated to read-only cloned copies
  - When accessing files in a read-only volume, a client system will retrieve data from any read-only copy
  - If a copy becomes unavailable, clients will look for any remaining copies
    - Users of that data are unaware of the location of the read-only copy
    - Administrators can create and relocate copies as needed
  - AFS command suite guarantees that all read-only volumes contain exact copies of the original read-write volume at the time the read-only copy was created

Features
- File name space on an Andrew workstation is partitioned into a shared and local name space
  - Shared name space identical on all workstations
    - Usually mounted as /afs on the Unix file system
  - Local name space unique to each workstation
    - Contains only temporary files needed for workstation initialization and symbolic links to files in the shared name space

Features
- Andrew File System
  - Heavily influenced Version 4 of Sun Microsystems' NFS
  - Variant of AFS adopted by the Open Software Foundation
    - Distributed File System (DFS)
    - Part of their Distributed Computing Environment
IMPLEMENTATIONS
Implementations
- There are three major implementations
  - Transarc (IBM)
  - OpenAFS
  - Arla
- Transarc software is losing support and is deprecated
- AFS (version two)
  - Predecessor of the Coda file system
- Fourth implementation in the Linux kernel source code since at least version 2.6.10
  - Committed by Red Hat
    - A fairly simple implementation
    - Still in its early stages of development
    - Incomplete
AVAILABLE PERMISSIONS
Available permissions
- The following Access Control List (ACL) permissions can be granted:
  - Lookup (l)
    - Allows users to:
      - List the contents of the AFS directory
      - Examine the ACL associated with the directory
      - Access subdirectories
  - Insert (i)
    - Allows users to add new files or subdirectories to the directory
  - Delete (d)
    - Allows users to remove files and subdirectories from the directory
  - Administer (a)
    - Allows users to change the ACL for the directory
    - Users always have this right on their home directory
      - Even if they accidentally remove themselves from the ACL

Available permissions
- Permissions that affect files and subdirectories include:
  - Read (r)
    - Allows users to look at the contents of files in a directory and list files in subdirectories
    - Files that are to be granted read access to any user, including the owner, need to have the standard UNIX "owner read" permission set
  - Write (w)
    - Allows users to modify files in a directory
    - Files that are to be granted write access to any user, including the owner, need to have the standard UNIX "owner write" permission set
  - Lock (k)
    - Allows the processor to run programs that need to "flock" files in the directory
    - See the UNIX man page for "flock" for more details
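The seven single-letter rights above lend themselves to a small lookup table. The sketch below expands a string of letters into named rights; the names `AFS_RIGHTS` and `parse_rights` are hypothetical helpers for illustration, not part of any AFS tool, and only the seven rights from these slides are modeled.

```python
# The seven directory ACL rights from the slides, keyed by their letter.
AFS_RIGHTS = {
    "l": "lookup",
    "i": "insert",
    "d": "delete",
    "a": "administer",
    "r": "read",
    "w": "write",
    "k": "lock",
}


def parse_rights(letters: str) -> set[str]:
    """Expand a string such as "rl" into the set of named rights."""
    unknown = set(letters) - set(AFS_RIGHTS)
    if unknown:
        raise ValueError(f"unknown right(s): {sorted(unknown)}")
    return {AFS_RIGHTS[letter] for letter in letters}


read_only = parse_rights("rl")        # look at files and list the directory
everything = parse_rights("rlidwka")  # all seven rights
```

Since the rights attach to a directory, a read-only grant pairs `r` with `l`: without lookup, users cannot list the directory whose files they may read.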
Available permissions
- AFS includes Application ACLs (A)-(H), which have no effect on access to files
SMB http://en.wikipedia.org/wiki/Server_Message_Block
Server Message Block (SMB)
- Operates as an application-level network protocol
  - Applied to shared access to
    - Files
    - Printers
    - Serial ports
    - Miscellaneous communications
  - Between nodes on a network
- Provides an authenticated inter-process communication mechanism
- Most usage of SMB involves computers running Microsoft Windows:
  - In Microsoft environments, users often know it simply as "Microsoft Windows Network"

Server Message Block (SMB)
- When discussing SMB, one should distinguish:
  - SMB the protocol
  - SMB services that run on the protocol
  - NetBIOS
  - DCE/RPC services that use SMB as an authenticated inter-process communication channel (over named pipes)
  - "Network Neighborhood" protocols which primarily (but not exclusively) run as datagram services directly on the NetBIOS transport
HISTORY
History
- Barry Feigenbaum originally invented SMB at IBM
  - Aim: turn DOS "Interrupt 33" (21h) local file-access into a networked file-system
- Microsoft made considerable modifications to the most commonly used version
  - Merged the SMB protocol with the LAN Manager
    - Developing with 3Com (circa 1990)
  - Continued to add features to the protocol
    - Windows for Workgroups (circa 1992)
    - Later versions of Windows

History
- Original design of SMB envisaged it running on top of the NetBIOS and NetBEUI APIs
  - Typically implemented with
    - NBF (NetBIOS Frame)
    - NetBIOS over IPX/SPX
    - NBT (NetBIOS over TCP/IP)
  - SMB can also run directly on the TCP/IP protocols
    - Introduced with Windows 2000

History
- Around the time Sun Microsystems announced WebNFS
  - Microsoft launched an initiative in 1996 to rename SMB to Common Internet File System (CIFS)
  - Added more features, including
    - Support for:
      - Symbolic links
      - Hard links
      - Larger file sizes
    - Attempt at supporting direct connections without NetBIOS
    - Largely experimental effort that required further refinement
- Microsoft submitted some partial specifications as Internet-Drafts to the IETF
  - Submissions have expired

History
- Windows Vista introduced Server Message Block 2.0

History
- Side issue:
  - Samba project originated with the aim to
    - Reverse engineer SMB
    - Provide a free implementation of a compatible SMB client and server for use with non-Microsoft operating systems
  - Driven by the importance of the SMB protocol in interacting with the widespread Microsoft Windows platform
Resume 7/31
IMPLEMENTATION
Implementation
- Client-server approach
  - SMB works through a client-server approach
    - Client makes specific requests
    - Server responds accordingly
  - One section of the SMB protocol specifically deals with access to file systems
    - Clients may make requests to a file server
  - Other sections of the SMB protocol specialize in inter-process communication (IPC)
  - Developers have optimized the SMB protocol for local subnet usage
    - Users have also put SMB to work to access different subnets across the Internet
      - Involving file-sharing or print-sharing in MS Windows environments

Implementation
- Client-server approach
  - SMB servers make their file systems and other resources available to clients on the network
    - Client computers may want access to the shared file systems and printers on the server
      - For this primary function, SMB is best-known and most heavily used
    - SMB file-server aspect would count for little without the NT Domains suite of protocols
      - Provide NT-style domain-based authentication
    - NT Domains protocols offer MSRPC services available almost exclusively on SMB IPC "named pipes"
      - Almost all implementations of SMB servers use NT Domain authentication to validate user-access to resources

Implementation
- Performance issues
  - Many people believe that the SMB protocol makes heavy use of network bandwidth because each client broadcasts its presence to the whole subnet
    - SMB itself does not use broadcasts
      - Broadcast problems commonly associated with SMB actually originate with the NetBIOS service location protocol
    - Microsoft Windows server will use NetBIOS to advertise and locate services
    - NetBIOS functions by broadcasting services available on a particular host at regular intervals
    - Usually an acceptable default in a network with fewer than 20 hosts
      - Broadcast traffic will cause problems as the number of hosts increases
    - Proper implementation of a NetBIOS Name Server (NBNS) can mitigate this problem
      - Windows Internet Naming Service (WINS) offers a suitable solution in Microsoft environments
    - WINS uses a much more advanced system of registration and centralized service requests
      - Imposes its own complexity upon the design and maintenance of the network
    - Microsoft recommends the use of Dynamic DNS
      - In Microsoft Active Directory environments

Implementation
- Performance issues
  - Network designers should expect that network latency will have a significant impact on the performance of the SMB protocol
    - Monitoring reveals this is mostly due to navigating among directories through SMB
      - When significant network latency exists between hosts
    - For example, a VPN connection over the Internet will often introduce network latency
      - Can make for a frustrating experience
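A back-of-the-envelope calculation shows why latency dominates directory browsing. The sketch assumes each request waits for its reply before the next one is sent; the function name and every number below are illustrative assumptions, not measurements of any SMB implementation.

```python
def browse_time(round_trips: int, rtt_seconds: float) -> float:
    """Time spent waiting on the network for a series of sequential requests."""
    return round_trips * rtt_seconds


# Hypothetical workload: opening a folder costs 200 sequential round trips.
on_lan = browse_time(200, 0.0005)   # 0.5 ms round trip on a local subnet
over_vpn = browse_time(200, 0.080)  # 80 ms round trip over an Internet VPN
slowdown = over_vpn / on_lan
```

With these assumed figures the same folder takes about a tenth of a second on the LAN and about sixteen seconds over the VPN, even though the amount of data moved is identical.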
Implementation
- Microsoft's modifications
  - Microsoft has added several extensions to its own SMB implementation
    - Added NT LAN Manager (NTLM) Version 2
      - NTLM version 1 used DES in a flawed manner
      - Derived from the original legacy SMB specification's requirement to use IBM "LanManager" passwords
    - NT 4.0 Domain Logon protocols use 40-bit encryption outside of the United States of America
      - Does not conform with modern security standards

Implementation
- Points of interest
  - SMB's "Inter-Process Communication" mechanism deserves a specific mention
    - SMB "IPC" system provides named pipes
      - Provides one of the first few inter-process mechanisms commonly available to programmers
      - Provides a means for services to inherit the authentication carried out when a client first connected to an SMB server
      - Inherited authentication in named pipes has become so ubiquitous and transparent that both Windows users and programmers who use the Windows API often simply take it for granted

Implementation
- Points of interest
  - Some services that operate over named pipes also allow MSRPC client programs to perform authentication
    - Overrides the authorization provided by the SMB server
    - Only in the context of the MSRPC client program that successfully makes the additional authentication

Implementation
- Points of interest
  - SMB uses opportunistic locking
    - Special type of locking mechanism
    - Improves performance
  - SMB serves as the basis for Microsoft's Distributed File System implementation
VERSIONS AND IMPLEMENTATIONS
Versions and implementations
- The following list explicitly refers to "SMB" as including an SMB client or an SMB server
  - Including the various protocols that extend SMB
  - For simplicity, conciseness, and vagueness, however, the list omits mention of the extent or completeness of the reimplementation or porting status for any of these implementations
  - Lumps them all together simply as "SMB"

Versions and implementations
- List…
  - Samba includes an SMB server and a command-line SMB client
    - Free re-implementation of the SMB protocol and of the Microsoft extensions to it
  - Samba TNG: a fork of Samba
  - The Linux kernel includes two SMB client implementations that use the Linux VFS
    - Provides access to files on an SMB server through the standard file system API: smbfs and cifs
  - ONStor Inc. offers an SMB implementation that also supports the NFS protocol
    - Users can access the same data through both protocols
  - Novell NetWare version 6 and later has a CIFS server implementation
    - Provides access to NetWare volumes for Microsoft Network clients
  - FreeBSD includes an SMB client implementation
    - smbfs that uses its VFS
  - NetBSD and Mac OS X include SMB client implementations:
    - smbfs originally derived from the FreeBSD smbfs; they use the NetBSD and OS X VFS
  - Solaris has a project called CIFS client for Solaris
    - Based on the Mac OS X smbfs
  - Sun Microsystems added in-kernel CIFS support to Solaris in October 2007
  - Sun Microsystems Cascade (aka PC-Netlink) represents a port of Advanced Server for Unix
    - Sun took over two years making the code useful, due to the poor quality of the original port

Versions and implementations
- Continued…
  - FreeNAS runs FreeBSD for Network-attached storage (NAS) services
    - Supports protocols including CIFS/Samba
  - Advanced Server for Unix (AS/U) comprises a port of Windows NT 3.51's SMB server code to Unix
    - Microsoft licensed the code to AT&T, which then licensed it to major Unix vendors
    - The poor quality of the original port (allegedly carried out by Microsoft itself) has caused any vendor sub-licensing it significant grief
  - VERITAS Software has an implementation of SMB
  - SCO has a port of Advanced Server for Unix
  - SCO has VisionFS, a Microsoft-independent re-implementation of SMB developed through reference to Samba source code
  - Network Appliance has an SMB server implementation
  - Objective Development's Sharity provides an SMB file-system client for Unix
  - The Alfresco content-management system includes a Java implementation of SMB
  - JCIFS offers an implementation of SMB in Java
  - RTSMB, a CIFS/SMB implementation written in ANSI C
    - EBS designed RTSMB from scratch, independently of MS or SAMBA design reference, to run in embedded devices
  - Visuality Systems NQ CIFS, a CIFS (SMB) server and client solution for embedded devices, ported to many popular real-time operating systems (RTOSs)
  - Thursby Software Systems offers a commercial implementation of SMB/CIFS for Mac OS