
Chapter 1: Introduction
Grand tour of the Distributed Systems
Thanks to the authors of the textbook [TS] for providing the base slides. I made several changes/additions. These slides may incorporate materials kindly provided by Prof. Dakai Zhu. So I would like to thank him, too.
Turgay Korkmaz
korkmaz@cs.utsa.edu
Distributed Systems 1.1 TS

Chapter 1: Introduction
* DEFINITION OF A DISTRIBUTED SYSTEM
* GOALS
  * Making Resources Accessible
  * Distribution Transparency
  * Openness
  * Scalability
  * Pitfalls
* TYPES OF DISTRIBUTED SYSTEMS
  * Distributed Computing Systems
  * Distributed Information Systems
  * Distributed Pervasive Systems

Objectives
* To provide a grand tour of the key issues in distributed systems

Computer System Revolution
* Computers: large/expensive -> small/cheap
* Networks: LAN -> WAN, bps -> Kbps -> Gbps
* Now it is easy to bring together many computers and people to:
  * Solve problems
  * Share resources
  * Increase collaboration
* Centralized systems -> Distributed systems -> Cloud

What are Distributed Systems?
[Figure: intranets, ISPs, a backbone, and a satellite link; legend: desktop computer, server, network link]
A collection of networked independent computers that appears to its users as a single coherent system

Distributed Systems: Definition
* A distributed system (DS) is a piece of software that ensures that a collection of independent computers appears to its users as a single coherent system
* But HOW can we
  * hide the differences between independent computers, and
  * provide a single-system view?
* Solutions for distributed systems
  * Distributed OS
  * Network OS
  * Middleware

Distributed Operating Systems
* The OS essentially tries to maintain a single, global view of the resources it manages (tightly-coupled OS).
* Full transparency: users perceive one big system and are not aware of the multiple different machines
* Access to remote services is similar to access to local resources
[From Wikipedia]

Network Operating Systems
* Collection of independent OSes augmented by network services (loosely-coupled OS)
* No transparency: users are aware of the multiplicity of the machines
  * They explicitly log in to remote machines, or copy files from other machines
* Apps use network services to access resources
- No single view of the distributed system
- Need multiple passwords, multiple access permissions
- Only means of communication is message passing
+ Adding or removing a machine is relatively simple

Middleware-Based Distributed Systems
* Most modern distributed systems are designed to provide a level of transparency through a software layer on top of local OSes
* This software layer is called Middleware

Middleware-Based DS (cont.)
* Middleware
  * A higher level of programming abstraction
    * Examples: RPC, RMI
  * It hides the differences between various computers and the ways in which they communicate
  * It provides a single-system view
  * ...
* As a result, middleware facilitates the integration and interaction of various networked applications in a consistent and uniform manner.
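To make the RPC idea concrete, here is a minimal sketch of how a middleware layer can make a remote call look like a local one. It is not the API of any real RPC library: the names `Calculator`, `Stub`, and `serve` are hypothetical, the wire format (JSON) is an assumption, and the "network" is just an in-process function call.

```python
import json


class Calculator:
    """Server-side object whose methods are offered to remote clients."""

    def add(self, a, b):
        return a + b


def serve(target, request_bytes):
    """Server-side skeleton: unmarshal the request, invoke the method, marshal the reply."""
    request = json.loads(request_bytes.decode("utf-8"))
    method = getattr(target, request["method"])
    result = method(*request["args"])
    return json.dumps({"result": result}).encode("utf-8")


class Stub:
    """Client-side stub: turns a local-looking call into a request message."""

    def __init__(self, transport):
        self._transport = transport  # callable taking request bytes, returning reply bytes

    def __getattr__(self, name):
        # Called only for attributes that are not found normally, i.e. remote method names.
        def remote_call(*args):
            request = json.dumps({"method": name, "args": list(args)}).encode("utf-8")
            reply = self._transport(request)
            return json.loads(reply.decode("utf-8"))["result"]

        return remote_call


calculator = Calculator()
# Assumption: the transport is a plain function call here; a real one would use sockets.
stub = Stub(lambda request: serve(calculator, request))
total = stub.add(2, 3)  # looks like a local call, but travels as a message
print(total)
```

The caller only sees `stub.add(2, 3)`; marshalling, transport, and dispatch are hidden behind the stub, which is the kind of abstraction the slide attributes to middleware.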

GOALS OF DISTRIBUTED SYSTEMS
* Make resources available/accessible
* Distribution transparency
* Openness
* Scalability

Make resources available/accessible
A7: Anytime, Anywhere, Affordable Access to Anything by Anyone Authorized (Jeannette M. Wing, 2008)
+ share resources (economics)
+ increase collaboration
- security and privacy
- unwanted traffic

Distribution Transparency
Users and applications see the DS as a single coherent system.
Transparency   Description
Access         Hides differences in data representation and invocation mechanisms
Location       Hides where an object resides
Migration      Hides from an object the ability of a system to change that object's location
Relocation     Hides that a resource may be moved to another location while in use
Replication    Hides the fact that an object or its state may be replicated at different locations
Concurrency    Hides coordination of activities between objects to achieve consistency at a higher level
Failure        Hides failure and possible recovery of objects
Distribution transparency is a nice goal, but achieving it is a different story.

Degree of Transparency
* Aiming at full distribution transparency is good, but too much of it might hurt (like food :)
* Full transparency will cost performance
  * Keeping Web caches exactly up-to-date with the master
  * Immediately flushing write operations to disk for fault tolerance
* Completely hiding failures of networks and nodes is (theoretically and practically) impossible
  * Can we distinguish a slow computer from a failing one?
  * Can we be sure that a server actually performed an operation before a crash?
* Moreover, some things cannot be hidden (e.g., propagation delay)
* Solution: Expose the distribution of the system
  * Let the user be aware of distribution at various levels of transparency
  * For example, would you prefer a busy printer in CS or the idle one in ECE?
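The question of whether a slow computer can be told apart from a failing one can be illustrated with a timeout. The following is a minimal sketch, with hypothetical names (`slow_server`, `call_with_timeout`) and with a sleeping thread standing in for a remote server: all the client learns is that no reply arrived in time.

```python
import concurrent.futures
import time


def slow_server(delay_s):
    """Stand-in for a remote server that needs delay_s seconds to answer."""
    time.sleep(delay_s)
    return "done"


def call_with_timeout(delay_s, timeout_s):
    """Return the reply, or 'suspected failure' if nothing arrives within timeout_s."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(slow_server, delay_s)
        try:
            return future.result(timeout=timeout_s)
        except concurrent.futures.TimeoutError:
            # The server may have crashed, or it may merely be slow:
            # from the client's side the two cases look identical.
            return "suspected failure"


fast = call_with_timeout(delay_s=0.0, timeout_s=0.5)
slow = call_with_timeout(delay_s=0.2, timeout_s=0.05)
print(fast, "|", slow)
```

In the second call the server is healthy and does finish its work, yet the client reports a suspected failure. This is also why the client cannot be sure whether an operation was performed before a crash.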

Openness of Distributed Systems
* Offer services according to standard rules that describe the syntax and semantics of those services
* So that different open systems are able to interact and use services from each other
* How to achieve openness
  * Conform to well-defined interfaces (often described using IDL)
    * Syntax is easy to define, but semantics are hard, so in practice they are described in natural language
  * Support portability
    * The same implementation (source code) should work on different machines
  * Easily interoperate
    * Two different implementations should work together irrespective of their environments
* A distributed system should be independent of the heterogeneity of the underlying environment: hardware, software platforms, and languages

Implementing Openness: Separate Policies and Mechanisms
* What are they? Take web-caching as an example:
* Policies:
  * What level of consistency do we require?
  * Which operations in downloaded code do we allow?
  * Which QoS requirements do we adjust?
  * What level of secrecy do we require for communication?
* Mechanisms:
  * Allow (dynamic) setting of caching policies
  * Support different levels of trust for mobile code
  * Provide adjustable QoS parameters per data stream
  * Offer different encryption algorithms
* Separate them for flexibility and efficiency
* For this, design the system as a collection of small components instead of one large monolithic program
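The separation can be sketched with the web-caching example: the cache below is the mechanism, and the freshness rule passed into it is the policy. This is a minimal illustration; the class `Cache`, the `is_fresh` parameter, and the timestamps are hypothetical and chosen only for this example.

```python
class Cache:
    """Mechanism: stores entries and asks a pluggable policy whether an entry is still usable."""

    def __init__(self, is_fresh):
        self._is_fresh = is_fresh  # policy: callable(age_in_seconds) -> bool
        self._entries = {}         # key -> (value, time the value was stored)

    def put(self, key, value, now):
        self._entries[key] = (value, now)

    def get(self, key, now):
        if key not in self._entries:
            return None
        value, stored_at = self._entries[key]
        if self._is_fresh(now - stored_at):
            return value
        del self._entries[key]  # the policy rejected the entry, so drop it
        return None


# Two policies, one mechanism.
strict = Cache(is_fresh=lambda age: age == 0)    # demands exact consistency
relaxed = Cache(is_fresh=lambda age: age <= 60)  # tolerates up to 60 s of staleness

for cache in (strict, relaxed):
    cache.put("page", "<html>v1</html>", now=100)

strict_hit = strict.get("page", now=130)
relaxed_hit = relaxed.get("page", now=130)
print(strict_hit, relaxed_hit)
```

Changing the consistency requirement means swapping a one-line policy, while the cache code itself stays untouched.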

Scalability in Distributed Systems
* Many developers of modern distributed systems readily use the adjective "scalable" without making clear why their system actually scales.
* Three aspects of scalability
  * Size: number of users and/or processes
  * Geographical: maximum distance between nodes
  * Administrative: number of administrative domains
* Most systems account for size scalability: powerful servers (supercomputers)
* Challenges: geographical and administrative scalability

Problems with Size Scalability
* What happens when more users/resources are added?
* Limitations of centralized systems
  * Service (e.g., single server) -> overloaded servers
  * Data (e.g., single phone book) -> saturated communication links
  * Algorithm (e.g., routing based on global info) -> too much traffic
* Use distributed services, databases, and algorithms
  * No machine has complete info
  * Make decisions based on local info
  * Failure of one node should not affect the others (although this does not always hold)
  * No global clock (it can be done on LANs but is tricky in WANs)

Problems with Geographical Scalability
* Suppose we have an interactive application working on a LAN. Can we use it over a WAN?
* Delay
  * Blocking read/write might be OK on a LAN but not on a WAN
* Reliability
  * The longer the distance, the higher the chance of losing messages
* Bandwidth
* Locating a service by broadcasting is OK on a LAN (e.g., ARP) but not on a WAN

Problems with Administrative Scalability
* In a single domain:
  * We can try to optimize resource usage because each entity belongs to the same domain and can be trusted
* In the case of multiple, independent administrative domains:
  * We do not own all resources and cannot trust others
  * So, we try to get things done based on some policies and agreements rather than optimization (e.g., BGP vs. OSPF)
  * But there are several problems
    * Conflicting policies (who uses what and pays how much)
    * Management
    * Security (access rights and trust management)

Techniques for Scalability to solve performance problems
* Use asynchronous communication
  * Use a separate handler for the incoming response and do something else while waiting
  + hides communication latencies
  - what if there is nothing else to do?
* Partition data and computations into smaller parts and distribute them across multiple machines
  * Decentralized naming services (DNS)
  * Decentralized data, information systems (WWW)
  * Decentralized algorithms (Distance Vector)
* Move computations to clients (Java applets)
* Minimize packet format and protocol overheads
* Use forward error coding instead of re-transmission
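Asynchronous communication can be sketched with Python's `asyncio`. This is only an illustration under stated assumptions: `remote_request` is a hypothetical coroutine, and `asyncio.sleep` stands in for network latency. The requests are issued first, local work is done while they are in flight, and the replies are collected afterwards.

```python
import asyncio


async def remote_request(name, latency_s):
    """Hypothetical remote call; asyncio.sleep stands in for the network round trip."""
    await asyncio.sleep(latency_s)
    return f"reply from {name}"


async def main():
    # Issue all requests without blocking on any of them.
    pending = [asyncio.create_task(remote_request(name, 0.2)) for name in ("A", "B", "C")]
    # Do something useful while the replies are in flight.
    local_work = sum(range(1000))
    # Collect the replies; gather returns them in the order the tasks were passed in.
    replies = await asyncio.gather(*pending)
    return replies, local_work


replies, local_work = asyncio.run(main())
print(replies, local_work)
```

The three round trips overlap with each other and with the local computation, so the latency is hidden rather than removed. If there were no local work to do, the benefit would shrink accordingly, which is the drawback noted on the slide.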

Techniques for Scalability (cont'd) to solve performance problems
* Use replication/caching, which makes multiple copies of the same services or data available at different machines
  * Mirrored Web sites
  * Replicated file servers and databases
  * Web caches (in browsers and proxies)
  * File caching (at server and client)
  + increases availability
  + improves load balance and performance
  + hides communication latency
  - Inconsistencies arise when one copy is modified
  - Global synchronization is needed to keep copies consistent, but it precludes large-scale solutions
  - Tolerance to inconsistencies depends on the application
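The inconsistency problem can be seen in a few lines. The sketch below assumes one particular scheme that the slide does not prescribe: writes go to a primary copy and are pushed to the replicas later. The names `ReplicatedStore`, `write`, `read`, and `propagate` are hypothetical.

```python
class ReplicatedStore:
    """A primary copy plus replicas that are updated lazily (an assumed scheme)."""

    def __init__(self, replica_count):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        self._pending = []  # updates not yet pushed to the replicas

    def write(self, key, value):
        """Apply the update at the primary only; replicas catch up later."""
        self.primary[key] = value
        self._pending.append((key, value))

    def read(self, replica_index, key):
        """Read from a nearby replica: fast, but possibly stale."""
        return self.replicas[replica_index].get(key)

    def propagate(self):
        """Push all pending updates to every replica, in write order."""
        for key, value in self._pending:
            for replica in self.replicas:
                replica[key] = value
        self._pending.clear()


store = ReplicatedStore(replica_count=2)
store.write("x", 1)
store.propagate()
store.write("x", 2)            # the primary now holds 2
stale = store.read(0, "x")     # the replica has not seen the update yet
store.propagate()
fresh = store.read(0, "x")
print(stale, fresh)
```

Calling `propagate` after every write would remove the stale read, but that is the global synchronization the slide warns about; how long a stale value can be tolerated depends on the application.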

Techniques for Scalability (cont'd)
* All the techniques discussed so far deal with performance problems due to size and geographical scalability
* How about administrative scalability?
  * The most challenging one (why?)
  * The problems are often non-technical (Politics!)

Developing Distributed Systems: Pitfalls
* A complex task: sound software engineering principles help
* But a lot of mistakes are made because
  * the dispersion of the many components is not taken into account during design
* Mistakes are often due to false assumptions:
  * The same global time
  * Perfect network/communication
    * Latency is zero
    * Bandwidth is infinite
    * The network is reliable
    * The network is secure
    * The network is homogeneous
  * The topology does not change
  * There is one administrator

DIFFERENT TYPES OF DISTRIBUTED SYSTEMS
* Distributed Computing Systems (DCS)
  * Cluster computing, Grid computing, Cloud computing
* Distributed Information Systems (DIS)
  * Web servers, Distributed database applications
* Distributed Pervasive Systems (DPS)
  * Smart home systems, Electronic health systems, Sensor networks: surveillance systems

DCS: Distributed Computing Systems
Cluster Computing Systems
* A group of high-end systems connected through a LAN
  * Homogeneous: same OS, near-identical hardware
  * Single managing node

DCS: Distributed Computing Systems
Grid Computing Systems
* Lots of nodes from everywhere share resources and collaborate
  * Heterogeneous
  * Dispersed across several organizations
  * Can easily span a wide-area network
* To allow for collaborations, grids generally use virtual organizations.
  * In essence, this is a grouping of users (or better: their IDs) that have the same access rights
  * The key questions are
    * how to authorize users from different administrative domains, and
    * how to provide these authorized users with access to resources

DCS: Distributed Computing Systems
Grid Computing Systems (cont'd)
* Application:
  * Uses the grid computing environment
* Collectivity layer:
  * Handles access to multiple resources (resource discovery)
* Connectivity layer:
  * Communication protocols (access a remote resource, security)
* Resource layer:
  * Manages a single resource (create a process)
* Fabric layer:
  * Interface to local resources (query, locking)

DCS: Distributed Computing Systems
Cloud Computing Systems
* "Cloud computing has become another buzzword after Web 2.0."
* "We won't compute on local computers, but on centralized facilities operated by third-party compute and storage utilities"
* "There are dozens of different definitions for cloud computing and there seems to be no consensus on what a cloud is." Here is one definition:
  "A large-scale distributed computing paradigm that is driven by economies of scale, in which a pool of abstracted, virtualized, dynamically-scalable, managed computing power, storage, platforms, and services are delivered on demand to external customers over the Internet."
* "Cloud computing is not a completely new concept; it has intricate connection to the relatively new but thirteen-year established grid computing paradigm, and other relevant technologies such as utility computing, cluster computing, and distributed systems in general."
I. Foster, Cloud Computing and Grid Computing 360-Degree Compared, Grid Computing Environments Workshop, 2008. GCE '08

DIS: Distributed Information Systems
* Organizations have legacy networked applications, but it is hard to make them interoperate
* Middleware can help
* Integration can take place at several levels
  * Clients wrap a number of requests to servers into one and have it executed as a Distributed Transaction (either all of the requests are executed or none of them is)
  * Applications can be detached from their databases or divided into several components; these applications then need to communicate directly instead of using request/reply: Enterprise Application Integration (EAI)

DIS: Distributed Information Systems
Transaction Processing Systems
Characteristic properties of transactions (ACID)
* Atomic: To the outside world, the transaction happens indivisibly
  * Either all operations succeed, or all of them fail
* Consistent: The transaction does not violate system invariants
  * This does not exclude the possibility of invalid intermediate states
* Isolated: Concurrent transactions do not interfere with each other
* Durable: Once a transaction commits, the changes are permanent
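Atomicity and consistency can be tried out on a single machine with Python's built-in `sqlite3` module. This is a local sketch, not a distributed transaction; the `account` table, the names, and the amounts are made up for the example. A transfer consists of two updates, and a `CHECK` constraint plays the role of the system invariant.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Invariant: no balance may become negative.
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; either both updates take effect or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the invariant would be violated, so the transaction was rolled back


ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # credit is undone when the debit fails
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(ok, failed, balances)
```

In the failed transfer the credit to `bob` has already been applied when the debit violates the invariant, which is an invalid intermediate state. The rollback ensures that the outside world never sees it.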

DIS: Distributed Information Systems
Distributed Databases: Transaction Process
* TP Monitor: coordinates the execution of a transaction (and its subtransactions) when data is distributed across several servers

DIS: Distributed Information Systems
Enterprise Application Integration
* A TP monitor doesn't separate apps from their databases.
* But we can do that and allow these applications to communicate directly
  * Use RPC or RMI
    * both applications must be up and running
    * they must know exactly how to refer to each other
  * Message-Oriented Middleware (MOM)
    * send data to a logical contact
    * publish/subscribe
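The contrast with RPC/RMI can be sketched with a toy in-memory broker. It is not the API of any real MOM product: `Broker`, `subscribe`, `publish`, and `fetch` are hypothetical names. Publishers address a topic (the logical contact) rather than a specific application, and messages wait in a queue until the subscriber picks them up.

```python
from collections import defaultdict, deque


class Broker:
    """Toy message broker: one queue per (topic, subscriber) pair."""

    def __init__(self):
        self._queues = defaultdict(dict)  # topic -> {subscriber name: queue of messages}

    def subscribe(self, topic, subscriber):
        self._queues[topic].setdefault(subscriber, deque())

    def publish(self, topic, message):
        """Deliver a copy of the message to every subscriber's queue; the sender never blocks."""
        for queue in self._queues[topic].values():
            queue.append(message)

    def fetch(self, topic, subscriber):
        """Drain and return the messages queued for this subscriber."""
        queue = self._queues[topic][subscriber]
        messages = list(queue)
        queue.clear()
        return messages


broker = Broker()
broker.subscribe("orders", "billing")
broker.subscribe("orders", "shipping")

# The publisher does not know who the subscribers are or whether they are running right now.
broker.publish("orders", {"id": 1, "item": "book"})

billing_messages = broker.fetch("orders", "billing")
shipping_messages = broker.fetch("orders", "shipping")
print(billing_messages, shipping_messages)
```

Unlike the RPC case, neither side needs a reference to the other, and the receiver can fetch its messages later. In this sketch a subscriber must have subscribed before a message is published in order to receive it.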

DPS: Distributed Pervasive Systems
* So far we considered stable distributed systems (fixed nodes, good connections)
* But this is not the case for the emerging next generation of distributed systems in which mobile and embedded devices are used
* Some requirements (Expose distribution instead of hiding it!)
  * Computing anywhere and anytime
  * Contextual change: environment changes should be immediately accounted for.
  * Ad hoc composition: Each node may be used in very different ways by different users. Requires ease-of-configuration.
  * Sharing is the default: Nodes come and go, providing sharable services and information. Calls again for simplicity.

DPS: Distributed Pervasive Systems
Home Systems
* Should be completely self-organizing:
* There should be no system administrator
* Provide a personal space for each of its users
* Simplest solution:
  * a centralized home box?
  * But how to access what you want?
    * Recommender programs will help…

DPS: Distributed Pervasive Systems
Electronic Health Care Systems
* Devices are physically close to a person
  * Where and how should monitored data be stored?
  * How can we prevent loss of crucial data?
  * How can security be enforced?
  * How can physicians provide online feedback?

DPS: Distributed Pervasive Systems
Sensor Networks
The nodes to which sensors are attached are:
* Many (10s-1000s)
* Simple (small memory/compute/communication capacity)
* Often battery-powered (or even battery-less)

END