
Introduction to Distributed Systems
CO 1: Gain understanding of the fundamental principles of distributed systems

Contents
• Introduction
• Distributed Computing Models
• Software Concepts
• Issues in designing Distributed System
• Client-Server Model

What is a distributed system?
A distributed system is a collection of independent computers that appears to its users as a single coherent system.
Examples:
• Network of workstations
• Internet
• Distributed manufacturing system (e.g., automated assembly line)
• Network of branch office computers (intranet)

What is a distributed system?
A distributed system is one that looks to its users like an ordinary centralized system but runs on multiple, independent processors.
The key concept is transparency, i.e., the use of multiple processors should be invisible (transparent) to the users.
Put another way, users view the system as a virtual uniprocessor, not as a collection of distinct machines.

Why distributed systems?
• Resource sharing
• Scalability
• Reliability
• Need for higher processing speed
• Concurrent transactions

Advantages of Distributed Systems over Independent PCs:
• Economics: A collection of microprocessors offers a better price/performance ratio than mainframes.
• Speed: A distributed system may have more total computing power than a mainframe.
• Inherent distribution: Some applications are inherently distributed, e.g., a supermarket chain.
• Reliability: If one machine crashes, the system as a whole can still survive. This gives higher availability and improved reliability.
• Incremental growth: Computing power can be added in small increments (modular expandability).
• Data sharing: Many users can access a common database.
• Resource sharing: Expensive peripherals such as color printers can be shared.
• Communication: Human-to-human communication is enhanced, e.g., email and chat.
• Flexibility: The workload can be spread over the available machines.

Disadvantages of Distributed Systems
• Software: It is difficult to develop software for distributed systems.
• Network: The network can saturate.
• Security: Easy access also applies to secret data.
• Failures: Programs may not be able to detect whether the network has failed or has become unusually slow.
• No global clock: The only communication is by sending messages over the network.

Architectures for Distributed Systems
• Shared memory architectures (tightly coupled systems)
• Distributed memory architectures (loosely coupled systems)

Architectures for Distributed Systems
FIG: Shared memory architecture (processing elements PE 1 to PE 4 connected through an interconnection network to a memory shared by all processing elements)
FIG: Distributed memory architecture (figure labels: Memory, CPU)

Distributed Computing Models
1. Workstation Model
2. Workstation-Server Model
3. Processor-pool Model

Distributed Computing Models
1. Workstation Model
It consists of a network of personal computers, each with its own hard disk and local file system, interconnected over the network (diskful workstations).
FIG: Workstation model (workstations connected by a 100 Mbps LAN)

Distributed Computing Models
2. Workstation-Server Model
It consists of multiple workstations coupled with powerful servers that have extra hardware to store the file systems and other software such as databases (diskless workstations).
FIG: Workstation-server model (workstations plus minicomputers acting as file server, HTTP server, and cycle server, connected by a 100 Gbps LAN)

Distributed Computing Models
3. Processor-pool Model
It consists of multiple processors (a pool of processors) and a group of workstations.
FIG: Processor-pool model (Server 1 through Server N on a 100 Gbps LAN)

Comparison of the Distributed Computing Models

Comparison between Workstation model and Workstation-server model
• It is economically more viable to use a few costly high-end servers and many diskless workstations.
• Diskless workstations are easier to maintain than diskful ones.
• It is simpler to install new releases of costly software on a few servers.
• In the workstation-server model, the request-response protocol means that the client is not burdened and process migration becomes unnecessary.
• Since all files are managed by the servers, they remain unharmed even if the user's home workstation fails.
• The client-server model is suitable for sharing resources between different systems in a modular fashion.

Comparison between Processor-pool model and Workstation-server model
• The processor-pool model uses computing resources more effectively, because all the resources of the system are available to the currently working users.
• In contrast, the workstation-server model offers services only to individual clients, so some workstations may sit idle because their processing power cannot be used by other users.
• In the processor-pool model, some processors can work as servers if the load increases or if more users log in and demand new services.
• The workstation-server model performs better for high-performance interactive applications.

Software Concepts
• The software that manages the entire distributed system consists of open services such as file service, name service, networking, and electronic mail, and support for distributed programming such as RPC and group communication.
• Three types:
1. Network Operating Systems
2. (True) Distributed Systems
3. Multiprocessor Time Sharing

1. Network Operating System
• It provides an environment where users are aware of the multiplicity of machines.
• Users can access remote resources by
o logging into the remote machine, or
o transferring data from the remote machine to their own machine.
• Users must know where the required files and directories are and mount them.

1. Network Operating Systems cont'd.
• Loosely-coupled software on loosely-coupled hardware
• A network of workstations connected by a LAN
• Each machine has a high degree of autonomy
• Sharing can be obtained by a few commands
o rlogin machine
o rcp machine1:file1 machine2:file2
• File servers: client and server model
• Clients mount directories on file servers
• Best known network OS:
o Sun's NFS (Network File System) for shared file systems

1. Network Operating System cont'd.
• OSes can be different (Windows or Linux)
• Typical services: rlogin, rcp
• Fairly primitive way to share files

2. Distributed Operating System
• Runs on a cluster of machines with no shared memory
• Users get the feel of a single processor
• Transparency is the driving force
• Requires:
o A single global IPC mechanism
o A global protection mechanism

2. (True) Distributed Systems
• Tightly-coupled software on loosely-coupled hardware
• Provide a single-system image or a virtual uniprocessor
• A single, global interprocess communication mechanism, process management, and file system; the same system call interface everywhere
• Ideal definition: "A distributed system runs on a collection of computers that do not have shared memory, yet looks like a single computer to its users."

2. Distributed Operating Systems
• No shared memory
• Provide message passing

3. Multiprocessor Operating Systems
• Tightly-coupled software on tightly-coupled hardware
• Examples: high-performance servers
• Shared memory
• Single run queue

Characteristics of a Distributed System
• Consists of several computers that do not share memory.
• Computers communicate with each other by exchanging messages over a communication network.
• Each computer has its own memory and runs its own operating system.
• The resources owned and controlled by a computer are said to be local to it.
• Resources owned and controlled by other computers are said to be remote.

Design Issues of Distributed Systems
1. Connecting Users and Resources
2. Transparency
3. Flexibility
4. Reliability
5. Performance
6. Scalability

1. Connecting Users and Resources
• A distributed system should make it easy for users to access remote resources and to share them with other users.
• Examples of resources: printers, data files, web pages.
• Connecting users and resources also makes it easier to collaborate and exchange information.
• However, as connectivity and sharing increase, security becomes more important.

2. Transparency
• An important goal of a distributed system is to hide the fact that its processes and resources are physically distributed across multiple computers.
• It should provide a single system image to its users.
• A distributed system that is able to present itself to its users and applications as if it were only a single computer system is said to be transparent.

Forms of Transparency in a Distributed System
Transparency   Description
Access         Hide differences in data representation and how a resource is accessed
Location       Hide where a resource is located
Migration      Hide that a resource may move to another location
Relocation     Hide that a resource may be moved to another location while in use
Replication    Hide that a resource is replicated
Concurrency    Hide that a resource may be shared by several competitive users
Failure        Hide the failure and recovery of a resource

2.1 Access Transparency
• The user should not need to know whether a resource (hardware or software) is remote or local.
• The distributed operating system (DOS) allows users to access remote resources in the same way as local resources.
• It is the responsibility of the DOS to locate resources and to arrange for servicing the user in a user-transparent manner.

2.2 Location Transparency
• The location of a resource should be hidden from the user, i.e., resources accessed by a user can be anywhere on the network without the user knowing where they are located.
• The name of the resource should not reveal its location.
• E.g., the URL http://www.prehall.com/index.html gives no clue about the location of Prentice Hall's main web server.

2.3 Migration Transparency
• Resources can be moved from one node to another without the user or client noticing it.
• E.g., mobile users can continue to use their wireless laptops while moving from one place to another without ever being disconnected.

2.4 Replication Transparency
• For better performance and reliability, the DOS should be able to create replicas of resources on different nodes.
• Both the existence of multiple copies and the replication activity should be transparent to the users.
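The idea of hiding both the copies and the replication activity can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `ReplicatedStore` and its methods are hypothetical names, and in-memory dictionaries stand in for copies held on different nodes.

```python
class ReplicatedStore:
    """Hypothetical key-value store that hides its replicas from the caller."""

    def __init__(self, replica_count=3):
        # Each dict stands in for a copy of the data on a different node.
        self._replicas = [dict() for _ in range(replica_count)]
        self._next = 0

    def write(self, key, value):
        # Replication activity is hidden: one call updates every copy.
        for replica in self._replicas:
            replica[key] = value

    def read(self, key):
        # Reads rotate over the replicas to spread the load.
        replica = self._replicas[self._next]
        self._next = (self._next + 1) % len(self._replicas)
        return replica[key]


store = ReplicatedStore()
store.write("report.txt", "v1")
print(store.read("report.txt"))  # v1
```

The caller uses `write` and `read` exactly as it would on a single copy; nothing in the interface reveals how many replicas exist.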

2.5 Concurrency Transparency
• Users are unaware that resources are being accessed concurrently.
• Concurrent access to a shared resource should leave the resource in a consistent state.
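As a minimal single-machine sketch of keeping a shared resource consistent under concurrent access, the example below serializes updates with a lock. Threads stand in for concurrent users, and `SharedCounter` is a hypothetical name chosen for illustration; a real distributed system would need a distributed locking or transaction mechanism instead.

```python
import threading


class SharedCounter:
    """A shared resource whose updates are serialized by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Only one thread at a time may run the read-modify-write step.
        with self._lock:
            self.value += 1


def worker(counter, times):
    for _ in range(times):
        counter.increment()


counter = SharedCounter()
threads = [threading.Thread(target=worker, args=(counter, 1000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value)  # 4000
```

Each caller just calls `increment()`; the locking that keeps the counter consistent is invisible to it.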

2.6 Failure Transparency
• Deals with masking failures in the system from the user, e.g., machine failure or a storage device crash.
• The user does not notice that a resource fails to work properly and that the system subsequently recovers from that failure.
• Complete failure transparency is not achievable, because not all types of failures can be handled in a user-transparent manner, e.g., failure of the communication network.
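One common way to mask a machine failure is to retry the request on another node. The sketch below assumes that approach; `ReplicaDown`, `make_replica`, and `call_with_failover` are hypothetical names, and plain functions stand in for remote nodes.

```python
class ReplicaDown(Exception):
    """Raised by a (simulated) node that has crashed."""


def make_replica(name, alive):
    # Returns a function that stands in for a remote node.
    def handle(request):
        if not alive:
            raise ReplicaDown(name)
        return f"{name} handled {request}"
    return handle


def call_with_failover(replicas, request):
    # Try each node in turn; the caller never sees an individual crash.
    for replica in replicas:
        try:
            return replica(request)
        except ReplicaDown:
            continue
    # When every node is down, the failure can no longer be masked.
    raise RuntimeError("all replicas failed")


replicas = [make_replica("node-a", alive=False), make_replica("node-b", alive=True)]
print(call_with_failover(replicas, "read x"))  # node-b handled read x
```

The final `RuntimeError` mirrors the point above: some failures cannot be hidden from the user.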


3. Flexibility
A DOS should be flexible and should provide:
• Ease of modification: It should be easy to incorporate changes (due to fixes or new user requirements) in the system in a user-transparent manner.
• Ease of enhancement: It should be easy to add new functionality to the system from time to time.

4. Scalability
Refers to the capability of a system to adapt to increased service load. Scalability can be measured in three different dimensions:
• Scalable administratively: It should be easy to manage even if the system spans many different administrative organizations.
• Scalable geographically: Users and resources may lie far apart.
• Scalability w.r.t. size:

4. Scalability (issues)
• Systems grow with time or become obsolete. Techniques whose resource requirements grow linearly with the size of the system are not scalable.
• E.g., a broadcast-based query won't work for large distributed systems.
• Examples of bottlenecks
o Centralized components: a single mail server for all users
o Centralized tables: a single URL address book, a single on-line telephone book
o Centralized algorithms: routing based on complete information
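A short worked calculation shows why broadcast-based queries stop scaling. It assumes a simple cost model that is not in the slides: each query is sent once to every other node, and every node issues one query. `broadcast_messages` is a hypothetical helper.

```python
def broadcast_messages(nodes, queries):
    # Each broadcast query reaches every node except the sender.
    return queries * (nodes - 1)


for n in (10, 1_000, 100_000):
    # One query per node: the total traffic grows roughly as n squared.
    print(n, broadcast_messages(n, queries=n))
```

Under this model the per-query cost grows linearly with the number of nodes, and the total traffic grows quadratically.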

5. Security
• For users to trust and rely on the system, its resources should be protected against destruction and unauthorized access.
• Enforcing security in a DOS is difficult.
• In a DOS:
o It should be possible for the sender to know that the message was received by the receiver.
o It should be possible for the receiver to know who the sender is.
o The contents of the message should not get changed.
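Two of these requirements, knowing who sent a message and detecting changed contents, can be illustrated with a message authentication code. The sketch uses Python's standard `hmac` module and assumes the sender and receiver already share a secret key; `SHARED_KEY`, `sign`, and `verify` are hypothetical names. Confirming receipt would need a separate acknowledgement and is not shown.

```python
import hashlib
import hmac

# Assumption: both parties obtained this key in advance over a safe channel.
SHARED_KEY = b"demo-key-known-to-sender-and-receiver"


def sign(message: bytes) -> bytes:
    # The sender attaches this tag to the message.
    return hmac.new(SHARED_KEY, message, hashlib.sha256).digest()


def verify(message: bytes, tag: bytes) -> bool:
    # The receiver recomputes the tag; compare_digest avoids timing leaks.
    return hmac.compare_digest(sign(message), tag)


message = b"transfer 100 to account 42"
tag = sign(message)
print(verify(message, tag))                        # True
print(verify(b"transfer 900 to account 42", tag))  # False
```

A valid tag shows that the message came from a holder of the key and was not altered on the way.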

Client-Server Model

Basic Concepts

Basic Concepts
Connectionless request-reply protocol
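A connectionless request-reply exchange can be sketched with UDP datagrams, where no connection is set up before the request is sent. The example is a minimal sketch that runs both sides on the local machine; the message contents and the two-second timeout are arbitrary choices.

```python
import socket
import threading


def serve_once(sock):
    # Wait for one request datagram and reply to the address it came from.
    request, client_addr = sock.recvfrom(1024)
    sock.sendto(b"reply: " + request.upper(), client_addr)


server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
server_addr = server.getsockname()
thread = threading.Thread(target=serve_once, args=(server,))
thread.start()

client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.settimeout(2.0)  # without a connection, a lost message shows up only as a timeout
client.sendto(b"hello", server_addr)   # request
reply, _ = client.recvfrom(1024)       # reply
print(reply.decode())  # reply: HELLO

thread.join()
client.close()
server.close()
```

The client sends a request and blocks until the reply arrives; there is no connection to open or close around the exchange.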


Client-Server Addressing

1. Machine addressing
• The client sends the address as part of the message, and the server extracts it at the receiving end.
• This works well if only one process is running on the server machine.
• If multiple processes are running on the server, the process ID should also be sent as part of the message.
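The sketch below illustrates why the process ID is needed once a machine runs several server processes. `Message`, `Machine`, and the address and ID values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Message:
    machine: str      # address of the destination machine
    process_id: int   # which process on that machine should get the message
    body: str


class Machine:
    def __init__(self, address):
        self.address = address
        self.processes = {}  # process ID -> handler function

    def deliver(self, msg):
        # The receiving side extracts the address fields from the message.
        if msg.machine != self.address:
            raise ValueError("message sent to the wrong machine")
        return self.processes[msg.process_id](msg.body)


server = Machine("243")
server.processes[4] = lambda body: f"file service got {body}"
server.processes[7] = lambda body: f"print service got {body}"
print(server.deliver(Message("243", 4, "open notes.txt")))  # file service got open notes.txt
```

With only the machine address, the server could not tell whether the request was meant for the file service or the print service.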

2. Process addressing
• Messages are sent to processes, not to machines.
• A name comprising the machine ID and the process ID is used for addressing.
• The sender's kernel broadcasts a message that contains the destination process address.
• All kernels check their process addresses, and the machine that owns the process sends its network address back to the sender.
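The broadcast lookup described above can be simulated in a few lines. In this sketch a loop over in-memory `Kernel` objects stands in for a network broadcast; `Kernel`, `locate`, and the process and network addresses are hypothetical.

```python
class Kernel:
    def __init__(self, network_address, local_processes):
        self.network_address = network_address
        self.local_processes = set(local_processes)

    def on_locate(self, process_address):
        # Answer only if this machine owns the requested process.
        if process_address in self.local_processes:
            return self.network_address
        return None


def locate(kernels, process_address):
    # Simulated broadcast: every kernel sees the query.
    for kernel in kernels:
        answer = kernel.on_locate(process_address)
        if answer is not None:
            return answer
    return None


kernels = [Kernel("10.0.0.1", {"m1.p3"}), Kernel("10.0.0.2", {"m2.p9"})]
print(locate(kernels, "m2.p9"))  # 10.0.0.2
```

Every kernel has to inspect every lookup, which is the extra load that broadcasting places on the system.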

3. Name server technique
• An extra machine is used to map ASCII-level names to machine addresses.
• The ASCII strings are embedded in the program.
• This special server is called a name server.
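A name server is essentially a lookup table from names to addresses. The sketch below keeps that table in memory; `NameServer`, the service name, and the addresses are hypothetical, and a real name server would be reached over the network.

```python
class NameServer:
    """Maps ASCII service names to machine addresses."""

    def __init__(self):
        self._table = {}

    def register(self, name, address):
        self._table[name] = address

    def lookup(self, name):
        return self._table[name]


ns = NameServer()
ns.register("file_server", ("10.0.0.5", 2049))

# The client program embeds only the ASCII name, not the address.
print(ns.lookup("file_server"))  # ('10.0.0.5', 2049)

# If the service moves, only the table entry changes; clients stay the same.
ns.register("file_server", ("10.0.0.9", 2049))
print(ns.lookup("file_server"))  # ('10.0.0.9', 2049)
```

Because programs refer to the service only by name, moving it to another machine requires no change to the clients.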

Client-Server Implementation

How can you differentiate the client and the server?
Three levels:
• User interface level
• Processing level
• Data level
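The three levels can be sketched as three small functions, one per level. The account data and the function names are hypothetical; how the levels are split between client and server machines is a design choice that this sketch does not fix.

```python
# Data level: stores and retrieves the data.
_ACCOUNTS = {"alice": 120, "bob": 45}


def fetch_balance(user):
    return _ACCOUNTS[user]


# Processing level: the application logic.
def balance_report(user):
    balance = fetch_balance(user)
    status = "ok" if balance >= 100 else "low"
    return {"user": user, "balance": balance, "status": status}


# User interface level: presents the result to the user.
def render(report):
    return f"{report['user']}: {report['balance']} ({report['status']})"


print(render(balance_report("alice")))  # alice: 120 (ok)
```

Each level calls only the one below it, so any boundary between two levels is a candidate for the client-server split.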


Client-Server Architecture