Distributed Systems: General Services (January 2000)
Overview of chapters • Introduction • Co-ordination models and languages • General services (Ch 11 Time Services 11.1-11.3, Ch 9 Name Services, Ch 8 Distributed File Systems, Ch 10 Peer-to-peer Systems) • Distributed algorithms • Shared data (November 2005)
This chapter: overview • Introduction • Time services • Name services • Distributed file systems • Peer-to-peer systems
Introduction • Objectives – Study existing subsystems (services offered, architecture, limitations) – Learn about non-functional requirements (techniques used, results, limitations)
Time services: overview • What & why • Clock synchronization • Logical clocks (see chapter on algorithms)
Time services • Definition of time – International Atomic Time (TAI): a second is defined as 9,192,631,770 periods of the transition between two hyperfine levels of the ground state of Caesium-133 – Astronomical time: based on the rotation of the earth about its axis and its orbit around the sun; due to tidal friction the earth's rotation is gradually slowing down – Coordinated Universal Time (UTC): an international standard based on atomic time, but a leap second is occasionally inserted or deleted to keep in step with astronomical time
Time services (cont.) • Why is time important? – Accounting (e.g. logins, files, ...) – Performance measurement – Resource management – Some algorithms depend on it: • transactions using timestamps • authentication (e.g. Kerberos) • Example: make
Time services (cont.) • An example: the Unix tool "make" – large program: • multiple source files: graphics.c • multiple object files: graphics.o • change of one file → recompilation of the files that depend on it – use of make to handle changes: • graphics.c changed at 3h45 • make examines the creation time of graphics.o • e.g. 3h43 → make recompiles graphics.c to create an up-to-date graphics.o
Time services (cont.) • Make in a distributed environment: – source file on system A, object file on system B – clock A runs behind clock B: graphics.o is created at 3h48 (clock B), and graphics.c is modified afterwards at 3h43 (clock A) – although the modification happened later in real time, its timestamp is smaller, so make considers graphics.o up to date → no effect!!
Time services (cont.) • Hardware clocks – each computer has a physical clock • H(t): value of the hardware clock/counter • different clocks → different values • e.g. crystal-based clocks drift due to different shapes, temperatures, ... • Typical drift rates: – 10^-6 sec/sec → 1 sec in 11.6 days – high-precision quartz clocks: 10^-7 to 10^-8 sec/sec • Software clocks – C(t) = α H(t) + β – e.g. the number of nanoseconds elapsed since a reference time
Time services (cont.) • Changing local time – Hardware: influence H(t) – Software: change α and/or β – Forward: • set the clock forward • some clock ticks appear to have been missed – Backward: • set the clock backward? unacceptable • instead, let the clock run slower for a short period of time
Time services (cont.) • Properties – External synchronization to a UTC source S: |S(t) - C(t)| < D – Internal synchronization: |Ci(t) - Cj(t)| < D – Correctness: drift rate bounded by ρ – Monotonicity: t' > t ⇒ C(t') > C(t)
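The software clock C(t) = αH(t) + β and the monotonicity requirement can be illustrated with a small sketch: rather than stepping a fast clock backwards, the correction is amortised by temporarily lowering α, so the clock keeps running forward, only slower. The class and method names are illustrative, not part of any real OS interface.

```python
class SoftwareClock:
    """Software clock C(t) = alpha * H(t) + beta on top of a hardware counter H."""

    def __init__(self, hardware_clock, alpha=1.0, beta=0.0):
        self.h = hardware_clock   # callable returning the hardware counter value
        self.alpha = alpha
        self.beta = beta

    def read(self):
        return self.alpha * self.h() + self.beta

    def adjust(self, correction, horizon):
        """Apply `correction` seconds (negative if the clock is fast) gradually
        over the next `horizon` hardware seconds, without ever stepping back."""
        now_h = self.h()
        new_alpha = self.alpha + correction / horizon
        # keep C continuous at the switch point, then run at the new rate
        self.beta = self.read() - new_alpha * now_h
        self.alpha = new_alpha
```

As long as the adjusted α stays positive, t' > t still implies C(t') > C(t) while the offset is worked away.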
Time services: overview • What & why • Synchronising physical clocks – Synchronising with UTC – Cristian's algorithm – The Berkeley algorithm – Network Time Protocol • Logical clocks (see chapter on algorithms)
Time services: synchronising with UTC • Broadcast by radio stations – WWV (USA) (accuracy 0.1 - 10 msec) • Broadcast by satellites – GEOS (0.1 msec) – GPS (about 1 µsec) • Receivers attached to workstations
Time services: Cristian's algorithm • Assumption: a time server process Ts runs on a computer with a UTC receiver • A process P synchronises by request/reply with the time server: P sends a time request to Ts, which answers with a reply containing the current time t
Time services: Cristian's algorithm • How should P set its clock? – the time t is inserted in the reply by Ts – time for P: t + Ttrans – Ttrans = min + x • min = minimum transmission time • x? depends on network delay and process scheduling
Time services: Cristian's algorithm • In a synchronous system – upper bound on the delay: max – P sets its time to t + (min + max)/2 – accuracy: ±(max - min)/2 • In an asynchronous system? there is no upper bound max
Time services: Cristian's algorithm • Practical approach – P measures the round-trip time Tround – P sets its time to t + Tround/2 – accuracy? • the time at Ts when the reply arrives at P is in the range [t + min, t + Tround - min] • accuracy = ±(Tround/2 - min) • problems: – single point of failure – impostor time server
Time services: the Berkeley algorithm • Used on computers running Berkeley Unix • An active master is elected among the computers whose clocks have to be synchronised – no central time server – upon failure of the master: re-election
Time services: the Berkeley algorithm • Algorithm of the master: – periodically polls the clock values of the other computers – estimates each local clock time (based on round-trip delays) – averages the obtained values – returns the amount by which each individual slave clock needs adjustment – a fault-tolerant average (strange values excluded) may be used
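The master's step, including the fault-tolerant average, can be sketched as below; the tolerance threshold and the function name are assumptions for the example.

```python
def berkeley_adjustments(master_time, slave_times, tolerance):
    """One polling round of the Berkeley master: average the clock values
    that lie within `tolerance` of the master's own clock (a fault-tolerant
    average), then return the adjustment each participant should apply."""
    clocks = {"master": master_time, **slave_times}
    usable = [v for v in clocks.values() if abs(v - master_time) <= tolerance]
    avg = sum(usable) / len(usable)
    # every clock, including a faulty one, is told how far to move
    return {name: avg - v for name, v in clocks.items()}
```

Note that a clock with a strange value is excluded from the average but still receives an adjustment.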
Time services: Network Time Protocol • A time service standard for the Internet • Design aims and features: – accurate synchronization despite variable message delays • statistical techniques to filter timing data from different servers – a reliable service that can survive losses of connectivity • redundant servers • a hierarchy of servers in synchronised subnets
Time services: Network Time Protocol • Servers are organised in levels called strata: stratum 1 servers have a UTC receiver, stratum 2 servers synchronise with stratum 1 servers, stratum 3 with stratum 2, and so on down the synchronisation subnet
Time services: Network Time Protocol • Three working modes for NTP servers – Multicast mode • used on a high-speed LAN • slaves set their clocks assuming a small delay – Procedure-call mode • similar to Cristian's algorithm – Symmetric mode • used by master servers and the higher strata • an association is maintained between servers
Time services: Network Time Protocol • Procedure-call and symmetric mode – each message bears the timestamps of recent message events: server A sends message m at Ti-3, server B receives it at Ti-2; server B sends m' at Ti-1, server A receives it at Ti
Time services: Network Time Protocol – Compute approximations of offset and delay • with transmission times t and t' and true offset o: Ti-2 = Ti-3 + t + o and Ti = Ti-1 + t' - o • so a = Ti-2 - Ti-3 = t + o and b = Ti-1 - Ti = o - t' • delay (total transmission time): di = a - b = t + t' • offset estimate: oi = (a + b)/2, and the true offset o lies in the interval oi - di/2 ≤ o ≤ oi + di/2
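The offset and delay computation maps directly onto code; t0..t3 correspond to Ti-3, Ti-2, Ti-1 and Ti.

```python
def ntp_offset_delay(t0, t1, t2, t3):
    """Offset estimate o_i and delay d_i from one NTP message pair:
    t0 = send at A (Ti-3), t1 = receive at B (Ti-2),
    t2 = send at B (Ti-1), t3 = receive at A (Ti)."""
    a = t1 - t0              # = t + o
    b = t2 - t3              # = o - t'
    delay = a - b            # d_i = t + t'
    offset = (a + b) / 2     # o_i, within +/- d_i/2 of the true offset o
    return offset, delay
```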
Time services: Network Time Protocol • Procedure-call and symmetric mode – data filtering is applied to successive <oi, di> pairs – NTP has a complex peer-selection algorithm; favoured are peers with: • a lower stratum number • a lower synchronisation dispersion – Accuracy: • 10 msec over the Internet • 1 msec on a LAN
Name services • Overview – Introduction – Basic service – Case study: Domain Name System – Directory services
Name services: introduction • Types of name: – users – files, devices – service names (e.g. RPC) – port, process, group identifiers • Attributes associated with names: – email address, login name, password – disk, block number – network address of a server – ...
Name services: introduction • Goal: binding between names and attributes • History – originally: • quite a simple problem • scope: a single LAN • naive solution: all names + attributes in a single file – now: • interconnected networks • different administrative domains in a single name space
Name services: introduction • Why a naming service separate from other services? – unification • a general naming scheme for different objects • e.g. local + remote files in the same scheme • e.g. files + devices + pipes in the same scheme • e.g. URLs – integration • the scope of sharing is difficult to predict • merging of different administrative domains • requires an extensible name space
Name services: introduction • General requirements: – scalability – long lifetime – high availability – fault isolation – tolerance of mistrust
Name services • Overview – Introduction – Basic service – Case study: Domain Name System – Directory services
Name services: basic service • Operations: – lookup (name, context) → identifier – register (name, context, identifier) – delete (name, context) • From a single server to multiple servers: – hierarchical names – efficiency – availability
Name services: basic service • Hierarchical name space – introduction of directories or subdomains • /users/pv/edu/distri/services.ppt • cs.kuleuven.ac.be – easier administration • delegation of authority • fewer name conflicts: names must be unique per directory only • unit of distribution: a server per directory + a single root server
Name services: basic service • Hierarchical name space – navigation • the process of locating naming data from more than one server • iterative or recursive – the result of a lookup is either: • the attributes, or • a reference to another name server
Name services: basic service • Example of iterative navigation, step 1: P asks the root server lookup("/users/pv/edu/...", ...); the root server (holding entries such as users and sys) returns a reference to the server for users
Name services: basic service • Step 2: P asks the users server lookup("pv/edu/...", ...); this server (holding entries such as pv, src and ann) returns a reference to the server for pv, and so on until the attributes are found
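The iterative navigation above can be sketched with toy name servers; the class layout and the "attrs"/"referral" result tags are invented for the example.

```python
class NameServer:
    """Toy name server (sketch): holds attributes for local names and
    referrals to the servers responsible for subtrees."""

    def __init__(self, attributes, children=None):
        self.attributes = attributes      # name -> attributes held here
        self.children = children or {}    # name -> NameServer for a subtree

    def lookup(self, parts):
        head, rest = parts[0], parts[1:]
        if head in self.children:         # client must continue elsewhere
            return ("referral", self.children[head], rest)
        return ("attrs", self.attributes[head])

def iterative_lookup(server, path):
    """Iterative navigation: the client itself follows each referral
    until some server returns the attributes."""
    parts = [p for p in path.split("/") if p]
    while True:
        result = server.lookup(parts)
        if result[0] == "attrs":
            return result[1]
        _, server, parts = result         # follow the referral
```

In recursive navigation the loop would instead live inside each server, which forwards the remaining name on the client's behalf.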
Name services: basic service • Efficiency → caching – the hierarchy introduces multiple servers → more communication overhead → more processing capacity needed – caching at the client side: • <name, attributes> pairs are cached • fewer requests → higher lookup performance • inconsistency? – data is rarely changed – a limit is put on validity
Name services: basic service • Availability → replication – at least two failure-independent servers – primary server: accepts updates – secondary server(s): • get a copy of the data of the primary • periodically check for updates at the primary – level of consistency?
Name services • Overview – Introduction – Basic service – Case study: Domain Name System – Directory services
Name services: Domain Name System • DNS = the Internet name service • Originally an ftp-able file: – not scalable – centralised administration • Now a general name service for computers and domains: – partitioning – delegation – replication and caching
Name services: Domain Name System • Domain names form a hierarchy, e.g. nix.cs.kuleuven.ac.be: – be: country (Belgium) – ac: academic – kuleuven: K.U.Leuven – cs: dept. of computer science – nix: name of the computer system
Name services: Domain Name System • 3 kinds of Top Level Domains (TLD) – 2-letter country codes (ISO 3166) – generic names (similar organisations): • com: commercial organisations • org: non-profit organisations • int: international organisations (NATO, EU, ...) • net: network providers – USA-oriented names: • edu: universities • gov: American government • mil: American military – new generic names: • biz, info, name, aero, museum, ...
Name services: Domain Name System • For each TLD: – an administrator (assigns names within the domain) – for "be": DNS BE vzw (previously: the dept. of computer science) • Each organisation with a name is: – responsible for new (sub)names – e.g. cs.kuleuven.ac.be • Hierarchical naming + delegation → a workable structure
Name services: Domain Name System • Name servers – a root name server + a server per (group of) subdomains – replication → high availability – caching → acceptable performance – a time-to-live limits inconsistencies
Name services: Domain Name System • The name server of cs.kuleuven.ac.be holds records for the systems/subdomains of cs.kuleuven.ac.be (type A = address): nix A 134.58.42.36, idefix A 134.58.41.7, droopy A 134.58.41.10, stevin A 134.58.41.16, ...
Name services: Domain Name System • The name server of kuleuven.ac.be holds records for the machines/subdomains of kuleuven.ac.be (type NS = name server): cs NS 134.58.39.1, esat NS ..., www A ..., ...
Name services: Domain Name System • Example: resolving www.cs.vu.nl from cs.kuleuven.ac.be – the client asks its local name server (at cs.kuleuven.ac.be) – the local name server queries in turn the root NS, the NS of nl, the NS of vu.nl and the NS of cs.vu.nl – the resulting address 130.37.24.11 is returned to the client
Name services: Domain Name System • Extensions: – mail host location • used by electronic mail software • indicates where mail for a domain should be delivered – reverse resolution (IP address → domain name) – host information • Weak point: security
Name services: Domain Name System • Good system design: – partitioning of data • multiple servers – replication of servers • high availability • limited inconsistencies • but NO load balancing and NO server preference – caching • acceptable performance • limited inconsistencies
Name services • Overview – Introduction – Basic service – Case study: Domain Name System – Directory services
Name services: directory services • Name service: exact name → attributes • Directory service: some attributes of an object → all attributes • Example: – request: the e-mail address of Janssen at KULeuven – result: a list of the names of all persons called Janssen – select a person and read the attributes
Name services: directory services • Directory information base – a list of entries – each entry: • a set of type/list-of-values pairs • optional and mandatory pairs • some pairs determine the name – distributed implementation
Name services: directory services • X.500 – CCITT standard • LDAP – Lightweight Directory Access Protocol – Internet standard
Distributed file systems • Overview – File service architecture – Case study: NFS – Case study: AFS – Comparison NFS <> AFS
Distributed file systems: file service architecture • Definitions (cf. operating systems): – file – directory – file system
Distributed file systems: file service architecture • Requirements: – addressed by most file systems: • access transparency • location transparency • failure transparency • performance transparency – related to distribution: • concurrent updates • hardware and operating system heterogeneity • scalability
Distributed file systems: file service architecture • Requirements (cont.): – scalable to a very large number of nodes • replication transparency • migration transparency – future: • support for fine-grained distribution of data • tolerance to network partitioning
Distributed file systems: file service architecture • File service components: a user program calls the client module (the API); the client module resolves names via the directory service, obtains a UFID, and uses it to access the flat file service over the network
Distributed file systems: file service architecture • Flat file service – file = data + attributes – data = a sequence of items – operations: simple read & write – attributes: kept partly in the flat file service and partly in the directory service – UFID: unique file identifier
Distributed file systems: file service architecture • Flat file service: operations – Read (FileID, Position, n) → Data – Write (FileID, Position, Data) – Create () → FileID – Delete (FileID) – GetAttributes (FileID) → Attr – SetAttributes (FileID, Attr)
Distributed file systems: file service architecture • Flat file service: fault tolerance – straightforward for simple servers – idempotent operations: repeating a request gives the same result – stateless servers: no per-client state to recover after a crash
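A stateless, idempotent flat file service can be sketched as below: every request carries the FileID and an absolute position, so a duplicated request (e.g. a retry after a lost reply) leaves the same state, and the server keeps no per-client open-file state. The class is a toy in-memory model, not any real server's interface.

```python
class FlatFileServer:
    """Toy flat file service: Read/Write address files by UFID and
    absolute position, so requests are idempotent and the server
    holds no open-file state for clients."""

    def __init__(self):
        self.files = {}       # FileID -> bytearray
        self.next_id = 0

    def create(self):
        self.next_id += 1
        self.files[self.next_id] = bytearray()
        return self.next_id   # Create() -> FileID

    def write(self, file_id, position, data):
        f = self.files[file_id]
        if len(f) < position + len(data):               # grow the file if needed
            f.extend(b"\0" * (position + len(data) - len(f)))
        f[position:position + len(data)] = data

    def read(self, file_id, position, n):
        return bytes(self.files[file_id][position:position + n])
```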
Distributed file systems: file service architecture • Directory service – translates a file name into a UFID – substitute for open – responsible for access control
Distributed file systems: file service architecture • Directory service: operations – Lookup (Dir, Name, AccessMode, UserID) → UFID – AddName (Dir, Name, FileID, UserID) – UnName (Dir, Name) – ReName (Dir, OldName, NewName) – GetNames (Dir, Pattern) → list of names
Distributed file systems: file service architecture • Implementation techniques: – known techniques from OS experience remain important – a distributed file service should be comparable in • performance • reliability
Distributed file systems: file service architecture • Implementation techniques: overview – file groups – space leaks – capabilities and access control – access modes – file representation – file location – group location – file addressing – caching
Distributed file systems: file service architecture • Implementation techniques: file groups – (similar: file system, partition) = a collection of files mounted on a server computer – the unit of distribution over servers – transparent migration of file groups – once created, a file is locked into its file group – UFID = file group identifier + file identifier
Distributed file systems: file service architecture • Implementation techniques: space leaks – 2 steps to create a file: • create an (empty) file and get a new UFID • enter name + UFID in a directory – failure after step 1: • the file exists in the file server • but is unreachable: the UFID is not in any directory → lost space on disk – detection requires co-operation between • the file server • the directory server
Distributed file systems: file service architecture • Implementation techniques: capabilities = a digital key: access to a resource is granted on presentation of the capability – request to the directory server: file name + user id + mode of access → a UFID including the permitted access modes – construction of the UFID: • unique • encodes the access rights • unforgeable
Distributed file systems: file service architecture • Implementation techniques: capability layouts – <file group id, file nr, random nr>: safe against reuse, but no access check – <file group id, file nr, random nr, access bits>: access can be checked, but the capability is forgeable (a client can change the access bits) – <file group id, file nr, encrypted(access bits + random number), access bits>: unforgeable without the server's key
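The unforgeable layout can be sketched with a keyed MAC in place of the encryption of access bits plus random number (a deliberate substitution with the same goal: a client cannot upgrade its own access bits without the server's secret key). All names and the key are invented for the example.

```python
import hmac
import hashlib

SERVER_KEY = b"secret-known-only-to-the-file-server"   # hypothetical server secret

def make_ufid(group_id, file_nr, access_bits):
    """Capability-style UFID: the server binds the permitted access modes
    to the identifier with a keyed MAC over all fields."""
    payload = f"{group_id}:{file_nr}:{access_bits}".encode()
    tag = hmac.new(SERVER_KEY, payload, hashlib.sha256).hexdigest()
    return (group_id, file_nr, access_bits, tag)

def check_ufid(ufid):
    """Server-side check: recompute the MAC and compare in constant time."""
    group_id, file_nr, access_bits, tag = ufid
    payload = f"{group_id}:{file_nr}:{access_bits}".encode()
    expected = hmac.new(SERVER_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(tag, expected)
```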
Distributed file systems: file service architecture • Implementation techniques: file location – from a UFID → the location of the file server – uses a replicated group location database: file group id → PortId – why replication? – why is the location not encoded in the UFID?
Distributed file systems: file service architecture • Implementation techniques: caching – server cache: reduces the delay of disk I/O • selecting blocks for release • coherence: – dirty flags – write-through caching – client cache: reduces network delay • always use write-through • synchronisation problems with multiple caches
Distributed file systems • Overview – File service model – Case study: Network File System (NFS) – Case study: AFS – Comparison NFS <> AFS
Distributed file systems: NFS • Background and aims – the first file service product – emulates the UNIX file system interface – de facto standard: • key interfaces published in the public domain • source code available for a reference implementation – supports diskless workstations • not important anymore
Distributed file systems: NFS • Design characteristics – client and server modules can be in any node; large installations include a few servers – clients: • on Unix: emulation of the standard UNIX file system • also for MS-DOS, Windows, Apple, ... – integrated file and directory service – integration of remote file systems into the local one: mount → remote mount
Distributed file systems: NFS: configuration • The Unix mount system call – each disk partition contains a hierarchical file system – how to integrate them? • name the partitions: a:/usr/students/john • or glue the partitions together – invisible to the user – partitions remain useful for system managers
Distributed file systems: NFS: configuration • The Unix mount system call: example – the root partition holds / (root) with e.g. vmunix and usr, with subdirectories such as students – partition c holds its own root with e.g. pierre and ann (ann containing network and proj) – after mounting partition c on /usr/staff, the directory staff is the root of c, and a file is reached as /usr/staff/ann/network
Distributed file systems: NFS: configuration • Remote mount – the client holds its own root file system; Server 1 holds a file system with a directory users containing e.g. pierre and ann – after a remote mount of the server's users directory into the client's name space, ann's files are reached on the client as e.g. /usr/staff/ann/...
Distributed file systems: NFS: configuration • Mount service on the server – enables clients to integrate (part of) a remote file system into the local name space – exported file systems are listed in /etc/exports, with an access list (= the hosts permitted to mount; secure?) • On the client side – the file systems to mount are enumerated in /etc/rc – typically mounted at start-up time
Distributed file systems: NFS: configuration • Mounting semantics – hard: • the client waits until the request for a remote file succeeds • possibly forever – soft: • a failure is returned if the request does not succeed after n retries • breaks Unix failure semantics
Distributed file systems: NFS: configuration • Automounter – principle: • empty mount points on clients • mount on the first request for a remote file – acts as a server for a local client: • receives references to empty mount points • maps mount points to remote file systems • the referenced file system is mounted on the mount point via a symbolic link, to avoid redundant requests to the automounter
Distributed file systems: NFS: implementation • In UNIX: the client and server modules are implemented in the kernel • Virtual file system: – the internal key interface, based on file handles for remote files
Distributed file systems: NFS: implementation • Architecture: on the client computer, a user process makes a system call into the UNIX kernel; the virtual file system dispatches it either to the local UNIX file system or to the NFS client; the NFS client talks to the NFS server on the server computer via the NFS protocol over the network, and the NFS server in turn uses the server's UNIX file system
Distributed file systems: NFS: implementation • Virtual file system – added to the UNIX kernel to distinguish between • local files • remote files – file handles: the file IDs of NFS • base: the inode number – the file ID within a partition on a UNIX system • extended with: – a file system identifier – an inode generation number (to enable reuse of inodes)
Distributed file systems: NFS: implementation • Client integration: – the NFS client module is integrated in the kernel: • offers the standard UNIX interface • no client recompilation/reloading • a single client module serves all user-level processes • encryption can be done in the kernel • Server integration: – in the kernel only for performance reasons – a user-level implementation reaches about 80% of the kernel-level version
Distributed file systems: NFS: implementation • Directory service – name resolution is co-ordinated by the client – a step-by-step process for multi-part file names – mapping tables in the server: high overhead, reduced by caching • Access control and authentication – based on the UNIX user ID and group ID – included and checked for every NFS request – secure NFS 4.0: thanks to the use of DES encryption
Distributed file systems: NFS: implementation • Caching – Unix caching: • based on disk blocks • delayed write (why?) • read-ahead (why?) • a periodic sync flushes the buffers in the cache – Caching in NFS: • server caching • client caching
Distributed file systems: NFS: implementation • Server caching in NFS – based on standard UNIX caching; 2 modes: – write-through (instead of delayed write) • good failure semantics • poor write performance – delayed write • data is stored in the cache until a commit operation is received – a close on the client → a commit operation on the server • failure semantics? • better performance
Distributed file systems: NFS: implementation • Client caching – cached are the results of • read, write, getattr, lookup, readdir – problem: multiple copies of the same data at different NFS clients – NFS clients use read-ahead and delayed write
Distributed file systems: NFS: implementation • Client caching (cont.) – handling of writes: • a block of the file is fetched and updated • the changed block is marked dirty • dirty pages of files are flushed to the server asynchronously – on close of the file – on a sync operation on the client – by a bio-daemon (when a block is filled) • dirty pages of directories are flushed to the server – by the bio-daemon without further delay
Distributed file systems: NFS: implementation • Client caching (cont.) – consistency checks: • based on timestamps indicating the last modification of the file on the server • validation checks: – when the file is opened – when a new block is fetched – a check is assumed to remain valid for a fixed time (3 sec for a file, 30 sec for a directory) – the next operation causes another check • a costly procedure
Distributed file systems: NFS: implementation • Caching – a cached entry is valid if (T - Tc) < t or (Tmclient = Tmserver) – T: current time – Tc: time when the cache entry was last validated – t: freshness interval (3..30 secs in Solaris) – Tm: time when the block was last modified at the client/server – consistency level: • acceptable • most UNIX applications do not depend critically on the synchronisation of file updates
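The validity condition translates directly into code; the function name and argument names are illustrative.

```python
def cache_entry_valid(now, t_checked, freshness, tm_client, tm_server):
    """NFS client-cache check: an entry is valid if it was validated
    recently ((T - Tc) < t), or if the server's last-modification time
    still matches the client's copy (Tm_client == Tm_server)."""
    return (now - t_checked) < freshness or tm_client == tm_server
```

Only when the freshness interval has expired does the client pay for a getattr call to the server to compare the Tm values.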
Distributed file systems: NFS: implementation • Performance – reasonable performance: • remote files on a fast disk can beat local files on a slow disk • RPC packets are 9 Kb, to contain 8 Kb disk blocks – the lookup operation accounts for about 50% of the server calls – drawbacks: • frequent getattr calls for timestamps (cache validation) • poor performance of the (relatively infrequent) writes (because of write-through)
Distributed file systems • Overview – File service model – Case study: NFS – Case study: Andrew File System (AFS) – Comparison NFS <> AFS
Distributed file systems: AFS • Background and aims – base: observations of UNIX file systems: • files are small (< 10 Kb) • reads are much more common than writes • sequential access is common, random access is not • most files are not shared • shared files are often modified by one user • file references come in bursts – aim: combine the best of personal computers and time-sharing systems
Distributed file systems: AFS • Background and aims (cont.) – assumptions about the environment: • secured file servers • public workstations with a local disk • no private files on the local disk – key targets: scalability and security • CMU 1991: 800 workstations, 40 servers
Distributed file systems: AFS • Design characteristics – whole-file serving and whole-file caching: entire files are transmitted, not blocks – the client cache is realised on the (relatively large) local disk → a lower number of open requests on the network – separation between file and directory service
Distributed file systems: AFS • Configuration – a single (global) name space – local files only for: • temporary files • system files for start-up – volume = the unit of configuration and management – each server maintains a replicated location database (volume → server mappings)
Distributed file systems: AFS • File name space – local: / (root) with e.g. tmp and bin – shared: under /afs (a symbolic link on each workstation), containing e.g. users (with ann, pierre, ...) and bin – the directory afs is the root of the shared file system
Distributed file systems: AFS • Implementation – Vice = the file server software • runs on secured systems, controlled by system management • understands only file identifiers • runs in user space – Venus = the user/client software • runs on the workstations • workstations keep their autonomy • implements the directory service – kernel modifications only for open and close
Distributed file systems: AFS • Architecture: each workstation runs user programs and a Venus process on top of its Unix kernel; each server runs a Vice process; the Venus processes communicate with the Vice servers over the network
Distributed file systems: AFS • On a workstation: a user program issues Unix file system calls to the Unix kernel; non-local file operations are passed to Venus, local ones to the Unix file system
Distributed file systems: AFS • Implementation of file system calls – open, close involve: • the UNIX kernel of the workstation • Venus (on the workstation) • Vice (on the server) – read, write involve only: • the UNIX kernel of the workstation
Distributed file systems AFS • Open system call open( File. Name, …) kernel if File. Name in shared space /afs then pass request to Venus if file not present in local cache OR file present with invalid callback then pass request to Vice server venus network vice transfer copy of file and valid callback place copy of file in local file system and store File. Name open local file and return descriptor to user process November 2005 Distributed systems: general services 107
Distributed file systems AFS • Read / Write system call read(FileDescriptor, …) – kernel: perform normal UNIX read on the local copy of the file … November 2005 Distributed systems: general services 108
Distributed file systems AFS • Close system call close(FileDescriptor, …) – kernel: close local copy and inform Venus about the close – Venus (over network): if local copy is changed then send copy to Vice server – Vice: store copy of file and send a callback (cancelling the promise) to other clients holding a callback promise . . . November 2005 Distributed systems: general services 109
Distributed file systems AFS • Caching – callback principle • service supplies callback promise at open • promise is stored with file in client cache – callback promise can be valid or cancelled • initially valid • server sends message to cancel callback promise – to all clients that cache the file – whenever file is updated November 2005 Distributed systems: general services 110
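The callback principle above can be sketched from the server side: on an update, the server cancels the promise at every other client that caches the file. The class and method names here are hypothetical, not the real Vice interface:

```python
# Sketch of the server-side callback principle: the server tracks which
# clients hold a callback promise for each file, and breaks those
# promises whenever the file is updated.
class Server:
    def __init__(self):
        self.promises = {}  # filename -> set of client ids holding a promise

    def grant_promise(self, filename, client):
        """Issued together with the file copy at open."""
        self.promises.setdefault(filename, set()).add(client)

    def update(self, filename, writer):
        """File updated by `writer`: cancel every other client's promise."""
        cancelled = self.promises.get(filename, set()) - {writer}
        for client in cancelled:
            pass  # here: send "cancel callback promise" message to client
        self.promises[filename] = {writer}
        return cancelled
```

For example, if clients A and B both cache a file and B closes an updated copy, only A's promise is cancelled.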
Distributed file systems AFS • Caching maintenance – when client workstation reboots • cache validation necessary because of missed messages • cache validation requests are sent for each valid promise – valid callback promises are renewed • on open • when no communication has occurred with the server during a period T November 2005 Distributed systems: general services 111
Distributed file systems AFS • Update semantics – guarantee after successful open of file F on server S: latest(F, S, 0) or (lostCallback(S, T) and inCache(F) and latest(F, S, T)) – no other concurrency control • 2 copies can be updated at different workstations • all updates except those from the last close are (silently) lost • <> normal UNIX operation November 2005 Distributed systems: general services 112
Distributed file systems AFS • Performance: impressive <> NFS – benchmark: load on server • 40% AFS • 100% NFS – whole-file caching: • reduction of load on servers • minimises effect of network latency – read-only volumes are replicated (master copy for occasional updates) Andrew optimised for a specific pattern of use!! November 2005 Distributed systems: general services 113
Distributed file systems • Overview – File service model – case study: NFS – case study AFS – comparison NFS <> AFS November 2005 Distributed systems: general services 114
Distributed file systems NFS <>AFS • Access transparency – both in NFS and AFS – Unix file system interface is offered • Location transparency – uniform view on shared files in AFS – in NFS • mounting freedom • same view possible if same mounting; discipline! November 2005 Distributed systems: general services 115
Distributed file systems NFS <>AFS • Failure transparency – NFS • no state of clients stored in servers • idempotent operations • transparency limited by soft mounting – AFS • state about clients stored in servers • cache maintenance protocol handles server crashes • limitations? November 2005 Distributed systems: general services 116
Distributed file systems NFS <>AFS • Performance transparency – NFS • acceptable performance degradation – AFS • only delay for first open operation on file • better than NFS for small files November 2005 Distributed systems: general services 117
Distributed file systems NFS <>AFS • Migration transparency – limited: update of locations required – NFS • update configuration files on all clients – AFS • update of replicated database November 2005 Distributed systems: general services 118
Distributed file systems NFS <>AFS • Replication transparency – NFS • not supported – AFS • limited support; for read-only volumes • one master for exceptional updates; manual procedure for propagating changes to other volumes November 2005 Distributed systems: general services 119
Distributed file systems NFS <>AFS • Concurrency transparency – not supported (in UNIX) • Scalability transparency – AFS better than NFS November 2005 Distributed systems: general services 120
Distributed file systems • Overview – File service architecture – case study: NFS – case study AFS – comparison NFS <> AFS November 2005 Distributed systems: general services 121
Distributed file systems • Conclusions – Standards <> quality – evolution to standards and common key interface • AFS-3 incorporates Kerberos, vnode interface, large messages for file blocks (64 KB) – network (mainly access) transparency causes inheritance of weakness: e.g. no concurrency control – evolution mainly performance driven November 2005 Distributed systems: general services 122
Distributed file systems • Good system design: – partitioning • different storage volumes – replication of servers • • high availability limited inconsistencies load balancing? server preference? – caching • acceptable performance • limited inconsistencies November 2005 Distributed systems: general services 123
This chapter: overview • Introduction • Time services • Name services • Distributed file systems • Peer-to-peer systems November 2005 Distributed systems: general services 124
Peer-to-peer systems • Overview – Introduction – Napster – Middleware – Routing algorithms – OceanStore / Pond file store November 2005 Distributed systems: general services 125
Peer-to-peer systems Introduction • Definition 1: – Support useful distributed services – Using data and computer resources in the PCs and workstations available on the Internet • Definition 2: – Applications exploiting resources available at the edges of the Internet November 2005 Distributed systems: general services 126
Peer-to-peer systems Introduction • Characteristics: – Each user contributes resources to the system – All nodes have the same functional capabilities and responsibilities – Correct operation does not depend on the existence of any centrally administered systems – Can offer limited degree of anonymity – Efficient algorithms for the placement of data across many hosts and subsequent access – Efficient algorithms for load balancing and availability of data (diagonal overlay on slide: "Availability of participating computers is unpredictable") November 2005 Distributed systems: general services 127
Peer-to-peer systems Introduction • 3 generations of peer-to-peer systems and application development: – Napster: music exchange service – File-sharing applications offering greater scalability, anonymity and fault tolerance: Freenet, Gnutella, Kazaa – Middleware for the application-independent management of distributed resources on a global scale: Pastry, Tapestry, … November 2005 Distributed systems: general services 128
Peer-to-peer systems Napster • Download digital music files • Architecture: – Centralised indices – Files stored and accessed on PCs • Method of operation (next slide) November 2005 Distributed systems: general services 129
Peer-to-peer systems Napster [Figure: Napster method of operation] – step 5: file sharing expected! November 2005 Distributed systems: general services 130
Peer-to-peer systems • Conclusion: Napster – Feasibility demonstrated – Simple load balancing techniques – Replicated unified index of all available music files – No updates of files – Availability not a hard concern – Legal issues: anonymity November 2005 Distributed systems: general services 131
Peer-to-peer systems • Overview – Introduction – Napster – Middleware – Routing algorithms – OceanStore / Pond file store November 2005 Distributed systems: general services 132
Peer-to-peer systems middleware • Goal: – Automatic placement and subsequent location of distributed objects • Functional requirements: – Locate and communicate with any resource – Add and remove resources – Add and remove hosts – API independent of type of resource November 2005 Distributed systems: general services 133
Peer-to-peer systems middleware • Non-functional requirements: – Global scalability (millions of objects on hundreds of thousands of hosts) – Optimization for local interactions between neighbouring peers – Accommodating to highly dynamic host availability – Security of data in an environment with heterogeneous trust – Anonymity, deniability and resistance to censorship November 2005 Distributed systems: general services 134
Peer-to-peer systems middleware • Approach: – Knowledge of locations of objects • Partitioned • Distributed • Replicated (x 16) throughout the network = Routing overlay: • Distributed algorithm for locating objects and nodes November 2005 Distributed systems: general services 135
Peer-to-peer systems middleware November 2005 Distributed systems: general services 136
Peer-to-peer systems middleware • Resource identification: – GUID = globally unique identifier – Secure hash from • all state – self-certifying – for immutable objects • part of state – e.g. name + … November 2005 Distributed systems: general services 137
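A GUID derived from a secure hash of an object's state can be sketched directly with a standard hash function. SHA-1 is used here purely for illustration; a deployed system would choose its own hash:

```python
# Computing a GUID as a secure hash of an object's state. Hashing all of
# an immutable object's state makes the GUID self-certifying: anyone
# holding the object can recompute the hash and check it against the GUID.
import hashlib

def guid(state: bytes) -> str:
    return hashlib.sha1(state).hexdigest()  # 160-bit GUID, hex-encoded

g = guid(b"some immutable object state")
assert guid(b"some immutable object state") == g  # recompute to verify
assert guid(b"tampered state") != g               # any change is detected
```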
Peer-to-peer systems middleware • API for a DHT in Pastry put(GUID, data) The data is stored in replicas at all nodes responsible for the object identified by GUID. remove(GUID) Deletes all references to GUID and the associated data. value = get(GUID) The data associated with GUID is retrieved from one of the nodes responsible for it. DHT = distributed hash table November 2005 Distributed systems: general services 138
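The DHT interface above can be modelled with a toy in-memory stand-in. This sketch only illustrates the API shape; in real Pastry the data is replicated at the nodes whose GUIDs are numerically closest to the object's GUID:

```python
# Toy stand-in for the DHT API (put / remove / get).
class ToyDHT:
    def __init__(self):
        self.store = {}  # guid -> data (one node's view; no replication here)

    def put(self, guid, data):
        self.store[guid] = data     # in Pastry: stored at responsible nodes

    def remove(self, guid):
        self.store.pop(guid, None)  # delete all references to guid

    def get(self, guid):
        return self.store.get(guid) # in Pastry: from a responsible node
```

A caller never learns which nodes hold the data — only the GUID is exposed by the API.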
Peer-to-peer systems middleware • API for a DOLR in Tapestry publish(GUID) This function makes the node performing a publish operation the host for the object corresponding to GUID. unpublish(GUID) Makes the object corresponding to GUID inaccessible. sendToObj(msg, GUID, [n]) An invocation message is sent to an object in order to access it. The optional parameter [n] requests the delivery of the same message to n replicas of the object. DOLR = distributed object location and routing November 2005 Distributed systems: general services 139
Peer-to-peer systems • Overview – Introduction – Napster – Middleware – Routing algorithms – OceanStore / Pond file store November 2005 Distributed systems: general services 140
Peer-to-peer systems Routing algorithms • Prefix routing – Used in Pastry, Tapestry • GUID: 128-bit – Host: secure hash of public key – Object: secure hash of name or part of state • N participating nodes – O(log N) steps to route a message to any GUID – O(log N) messages to integrate a new node November 2005 Distributed systems: general services 141
Peer-to-peer systems Routing algorithms • Simplified algorithm: circular routing – Leaf set in each active node – Element in leaf set: GUID–IP address pair of the nodes whose GUIDs are numerically closest on either side of the node's own GUID – Size L = 2*l – Leaf sets are updated when nodes • Join • Leave November 2005 Distributed systems: general services 142
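The circular routing step can be sketched as "forward to the leaf-set member numerically closest to the destination on the ring". A tiny GUID space is used here for readability (real Pastry uses 2^128); the function names are illustrative:

```python
# Sketch of the simplified circular algorithm: pick the live node whose
# GUID is numerically closest to the destination on the circular GUID space.
RING = 2**8  # tiny GUID space for illustration; Pastry uses 2**128

def circular_distance(a, b):
    """Shortest distance between two GUIDs on the ring."""
    d = abs(a - b)
    return min(d, RING - d)

def closest_in_leaf_set(leaf_set, dest):
    """Next hop: the leaf-set member numerically closest to dest."""
    return min(leaf_set, key=lambda n: circular_distance(n, dest))

# node 100's leaf set with l = 2 neighbours on each side:
leaf_set = [90, 95, 105, 110]
assert closest_in_leaf_set(leaf_set, 108) == 110
```

Repeating this step at each hop delivers the message to the live node with the GUID closest to the destination, which is why the simplified scheme is correct but slow (O(N) hops in the worst case).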
Peer-to-peer systems Routing algorithms • Circular space for GUIDs • Dot = live node • Leaf set size = 8 • Shown: routing of a message from node 65A1FC to D46A1C November 2005 Distributed systems: general services 143
Peer-to-peer systems Routing algorithms • Full algorithm (Pastry) – Tree-structured routing table • GUID–IP pairs spread throughout the entire range of GUIDs • Increased density of coverage for GUIDs numerically close to the GUID of the local node • GUIDs viewed as hexadecimal values • Table classifies GUIDs based on hexadecimal prefixes • As many rows as hexadecimal digits November 2005 Distributed systems: general services 144
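The prefix-routing step over such a table can be sketched as follows. The table layout (`table[i][d]`) and all names are a simplified illustration; real Pastry also falls back to the leaf set and handles missing entries:

```python
# Sketch of a Pastry-style prefix-routing hop: find how many hex digits
# the destination shares with the local GUID, then forward to a known
# node that matches one digit more.
def shared_prefix_len(a, b):
    """Number of leading hex digits shared by GUIDs a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def next_hop(local_guid, table, dest):
    """table[i][d] = GUID of a known node sharing i digits with
    local_guid and having d as its (i+1)-th digit; None if unknown."""
    p = shared_prefix_len(local_guid, dest)
    return table.get(p, {}).get(dest[p])  # match one more digit of dest

local = "65A1FC"
table = {3: {"9": "65A900"}}  # row 3: nodes sharing the prefix "65A"
assert next_hop(local, table, "65A9BD") == "65A900"
```

Each hop lengthens the matched prefix by at least one hex digit, which is what gives the ~log16(N) hop count on a well-populated table.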
Peer-to-peer systems Routing algorithms • Full algorithm (Pastry) November 2005 Distributed systems: general services 145
Peer-to-peer systems Routing algorithms • Routing a message from 65A1FC to D46A1C • Well-populated table: ~log16(N) hops November 2005 Distributed systems: general services 146
Peer-to-peer systems Routing algorithms • Full algorithm (Pastry) November 2005 Distributed systems: general services 147
Peer-to-peer systems Routing algorithms • Full algorithm (Pastry): host integration – New node X: • computes GUID_X • X should know at least one nearby Pastry node A with GUID_A • X sends a special join message to A (destination GUID_X) – A despatches the join message via Pastry • Route of message: A, B, …, Z (GUID_Z numerically closest to GUID_X) – Each node sends the relevant part of its routing table and leaf set to X November 2005 Distributed systems: general services 148
Peer-to-peer systems Routing algorithms • Full algorithm (Pastry): host integration – Route of message: A, B, …, Z – Routing table of X: • Row 0 of A → row 0 of X • B and X share i hexadecimal digits, so row i of B → row i of X • … • Choice for entry? proximity neighbour selection algorithm (metric: IP hops, measured latency, …) – Leaf set of Z → leaf set of X – X sends its leaf set and routing table to all nodes in its routing table and leaf set November 2005 Distributed systems: general services 149
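The row-copying step of host integration can be sketched as follows. The table representation is hypothetical and the proximity-based choice between candidate entries is omitted:

```python
# Sketch of how joining node X seeds its routing table from the nodes
# A, B, ..., Z visited by its join message: the node at hop i shares
# (at least) i hex digits with X, so its row i seeds row i of X.
def build_table_for_x(route_tables):
    """route_tables: the routing tables (dicts row -> entries) of the
    nodes visited by the join message, in route order A, B, ..."""
    x_table = {}
    for hop, table in enumerate(route_tables):
        if hop in table:
            x_table[hop] = dict(table[hop])  # row `hop` of this node -> X
    return x_table

a_table = {0: {"1": "1F00", "2": "2A00"}}  # A: row 0 covers all first digits
b_table = {0: {}, 1: {"3": "6300"}}        # B shares 1 digit with X
x = build_table_for_x([a_table, b_table])
```

After this, X still has to take its leaf set from Z and announce its own leaf set and routing table to every node it now knows about.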
Peer-to-peer systems Routing algorithms • Full algorithm (Pastry): host failure – Repair actions to update leaf sets: leaf set requested from node(s) close to the failed node – Routing table: repairs on a “when discovered” basis November 2005 Distributed systems: general services 150
Peer-to-peer systems Routing algorithms • Full algorithm (Pastry): fault tolerance – All nodes in leaf set alive? • Each node sends heartbeat messages to the nodes in its leaf set – Malicious nodes? • Clients can use an at-least-once delivery mechanism • Use limited randomness in node selection in the routing algorithm November 2005 Distributed systems: general services 151
Peer-to-peer systems Routing algorithms • Full algorithm (Pastry): dependability – Include acks in the routing algorithm; on a missing ack: • try an alternative route • mark the node as a suspected failure – Heartbeat messages – Suspected failed nodes in routing tables • Probed • If they fail: alternative nodes – Simple gossip protocol to exchange routing table info between nodes November 2005 Distributed systems: general services 152
Peer-to-peer systems Routing algorithms • Full algorithm (Pastry): evaluation – Message loss – Performance: Relative Delay Penalty (RDP) = Pastry delivery delay / IP-UDP delivery delay – Results over 100,000 messages:
IP loss | Pastry loss | wrong node | RDP
0%      | 1.5         | 0          | 1.8
5%      | 3.3         | 1.6        | 2.2
November 2005 Distributed systems: general services 153
Peer-to-peer systems • Overview – Introduction – Napster – Middleware – Routing algorithms – OceanStore / Pond file store November 2005 Distributed systems: general services 154
Distributed file systems OceanStore • Objectives: – Very large scale, incrementally-scalable, persistent storage facility – Mutable data objects with long-term persistence and reliability – Environment of network and computing resources constantly changing • Intended use • Mechanisms used November 2005 Distributed systems: general services 155
Distributed file systems OceanStore • Intended use: – NFS-like file system – Electronic mail hosting • Mechanisms needed: – Consistency between replicas • Tailored to application needs by a Bayou-like system – Privacy and integrity • Encryption of data • Byzantine agreement protocol for updates November 2005 Distributed systems: general services 156
Distributed file systems OceanStore • Storage organization [Figure: an object's AGUID certificate points to the VGUID of the current version; each version's root block points to indirection blocks and data blocks d1…d5; with copy on write, version i+1 shares unchanged blocks with version i, and each root block links to the VGUID of the previous version] November 2005 Distributed systems: general services 157
Distributed file systems OceanStore • Storage organization (cont)
Name  | Meaning      | Description
BGUID | block GUID   | Secure hash of a data block
VGUID | version GUID | BGUID of the root block of a version
AGUID | active GUID  | Uniquely identifies all the versions of an object
November 2005 Distributed systems: general services 158
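The BGUID/VGUID relationship above can be sketched with a hash function: a BGUID is a secure hash of a block, and a VGUID is simply the BGUID of a version's root block. The root-block encoding below is illustrative, not OceanStore's actual format:

```python
# Sketch of OceanStore-style block naming: hash-named blocks form a
# version tree, and unchanged blocks keep their BGUID across versions
# (the basis of copy on write).
import hashlib

def bguid(block: bytes) -> str:
    """BGUID = secure hash of a data block."""
    return hashlib.sha1(block).hexdigest()

def make_version(data_blocks, prev_vguid=None):
    """Build a version: the root block lists the data blocks' BGUIDs plus
    a link to the previous version; its own BGUID is the version's VGUID."""
    entries = [bguid(b) for b in data_blocks]
    root = ("|".join(entries) + "|prev:" + str(prev_vguid)).encode()
    return bguid(root)

v1 = make_version([b"d1", b"d2", b"d3"])
v2 = make_version([b"d1", b"d2", b"d3", b"d4"], prev_vguid=v1)
assert v1 != v2                       # a new version gets a new VGUID
assert bguid(b"d1") == bguid(b"d1")   # unchanged blocks keep their BGUID
```

The AGUID is the one stable name: its signed certificate is what gets updated (by the inner ring) to point at the latest VGUID.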
Distributed file systems OceanStore • Storage organization (cont) – New object AGUID • small set of hosts acts as inner ring • publish(AGUID) • Refers to a signed certificate recording the sequence of versions of the object – Update of object • Byzantine agreement between hosts in the inner ring (primary copy) • Result disseminated to secondary replicas November 2005 Distributed systems: general services 159
Distributed file systems OceanStore • Performance (times in seconds)
Phase | LAN Linux NFS | LAN Pond | WAN Linux NFS | WAN Pond | Predominant operations in benchmark
1     | 0.0           | 1.9      | 0.9           | 2.8      | Read and write
2     | 0.3           | 11.0     | 9.4           | 16.8     | Read and write
3     | 1.1           | 1.8      | 8.3           | 1.8      | Read
4     | 0.5           | 1.5      | 6.9           | 1.5      | Read
5     | 2.6           | 21.0     | 21.5          | 32.0     | Read and write
Total | 4.5           | 37.2     | 47.0          | 54.9     |
November 2005 Distributed systems: general services 160
Distributed file systems OceanStore • Performance (cont) – Benchmark phases: • Create subdirectories recursively • Copy a source tree • Examine the status of all files in the tree • Examine every byte of data in all files • Compile and link the files November 2005 Distributed systems: general services 161
This chapter: overview • Introduction • Time services • Name services • Distributed file systems • Peer-to-peer systems November 2005 Distributed systems: general services 162
Distributed Systems: General Services November 2005 Distributed systems: general services 163