Major Application Areas in Cyberspace
Joel Crichlow, Ph.D.

Areas
◦ Distributed File Systems
◦ Distributed Database Systems
◦ Distributed Computation Systems
◦ Distributed Real-Time Systems
◦ Distributed Multimedia Systems
◦ Distributed Operating Systems

Distributed File Systems
Structure
◦ Client-Server
◦ Peer-to-Peer
Issues
◦ Unit of Access
  ◦ File
  ◦ Page/Block
  ◦ Record
  ◦ Word/Byte

Distributed File Systems
Issues
◦ Division of Labor
  ◦ Clients maintain own file system
  ◦ Server maintains a global file system
  ◦ All file commands are channeled to the server
  ◦ Use mounting to combine local file systems with global file system

Distributed File Systems
Client maintains file system
Maps local textual names onto global FIDs

Client file map (the figure shows maps for User 1, User 2 and User 3)
Filename   FID
Friends    100179
Foes       428761

Server page map entry for FID 100179
Page   Block
0      4
1      7
2      3
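The two-level mapping on this slide can be sketched in a few lines of Python. This is only an illustration, assuming both maps are plain dictionaries; the names `CLIENT_FILE_MAP`, `SERVER_PAGE_MAP`, `resolve_fid` and `resolve_block` are hypothetical, while the table values are the ones shown on the slide.

```python
# Client-side file map: local textual name -> global file identifier (FID).
# Values are those shown on the slide.
CLIENT_FILE_MAP = {"Friends": 100179, "Foes": 428761}

# Server-side page map: FID -> {page number: disk block}.
# Only the entry for FID 100179 appears on the slide.
SERVER_PAGE_MAP = {100179: {0: 4, 1: 7, 2: 3}}


def resolve_fid(filename: str) -> int:
    """Client step: translate a local name into a global FID."""
    return CLIENT_FILE_MAP[filename]


def resolve_block(fid: int, page: int) -> int:
    """Server step: translate (FID, page) into a physical block number."""
    return SERVER_PAGE_MAP[fid][page]
```

For example, `resolve_block(resolve_fid("Friends"), 1)` returns block 7: the client supplies only the FID and page number, and the server holds the physical layout.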

Distributed File Systems
§ Clients maintain own file system
§ Global file naming is done at the server level
§ If the file server provides automatic backup and recovery facilities, then files can be classified as recoverable, robust or ordinary
§ The unit of access available to the client will determine how much data are stored at the server for mapping the client’s logical request onto the physical address

Distributed File Systems
Use mounting to combine local file systems with global file system
Server 1 has a directory ‘play’; server 2 has a directory ‘work’. Client 1 places ‘play’ and ‘work’ at the same level; client 2 places ‘work’ in a sub-directory of ‘play’.
[Figure: directory trees. Server 1: play (playhard, playeasy). Server 2: work (workhard, workeasy). Client 1: play and work side by side. Client 2: work beneath play.]
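The effect of the two mounting choices can be sketched with a per-client mount table. This is a hypothetical illustration, not a real file system interface: the names `MOUNTS` and `resolve`, the identifiers `server1`, `server2`, `client1`, `client2`, and the local mount points are assumptions, and client 2 is read here as mounting ‘work’ directly beneath ‘play’.

```python
from typing import Tuple

# Each client's mount table: local mount point -> (server, exported directory).
# Client 1 places play and work side by side; client 2 places work under play.
MOUNTS = {
    "client1": {"/play": ("server1", "play"), "/work": ("server2", "work")},
    "client2": {"/play": ("server1", "play"), "/play/work": ("server2", "work")},
}


def resolve(client: str, path: str) -> Tuple[str, str]:
    """Map a client-local path to (server, path on that server).

    The longest matching mount point wins, so on client 2 a path under
    /play/work goes to server 2 rather than server 1.
    """
    table = MOUNTS[client]
    for mount_point in sorted(table, key=len, reverse=True):
        if path == mount_point or path.startswith(mount_point + "/"):
            server, export = table[mount_point]
            remainder = path[len(mount_point):]
            return server, export + remainder
    raise FileNotFoundError(path)
```

The same server file is reached under different local names: `resolve("client1", "/work/workhard")` and `resolve("client2", "/play/work/workhard")` both give `("server2", "work/workhard")`.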

Google File System (GFS)
The latest version is called Colossus.
Two of the key issues addressed by the designers were
(a) The frequency of component failures.
(b) The management of very large data sets.

Google File System (GFS)
§ GFS runs on thousands of storage machines built from inexpensive commodity parts, and it is accessed by an equivalent number of client machines
§ Failure is viewed as the norm rather than the exception
§ The system must constantly monitor itself to detect, tolerate and recover from failure

Google File System (GFS)
The system supports millions of files of any size, but multi-GB files are common.
Many of the accesses to these files are large streaming reads, each typically transferring hundreds of KB to 1 MB or more.

Google File System (GFS)
§ There are many large sequential writes that append multiple KB to MB of data to files
§ Multiple clients can append atomically to the same file concurrently
§ There are also small reads of a few KB at any offset and small writes to arbitrary positions in a file

Google File System (GFS)
The GFS architecture comprises a single master, multiple chunkservers and multiple clients.
Files are divided into fixed-size blocks called chunks, each 64 MB (current size).
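Because chunks are fixed-size, a byte offset within a file maps to a chunk by integer division. The sketch below assumes the 64 MB chunk size from the slide; the function names `chunk_index` and `chunk_range` are hypothetical and are not part of any GFS client library.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the chunk size given on the slide


def chunk_index(offset: int) -> int:
    """Index of the chunk that contains the byte at `offset`."""
    return offset // CHUNK_SIZE


def chunk_range(offset: int, length: int) -> range:
    """Indices of all chunks touched by an access of `length` bytes at `offset`."""
    if length <= 0:
        return range(0)
    first = chunk_index(offset)
    last = chunk_index(offset + length - 1)
    return range(first, last + 1)
```

A 2 MB read starting 63 MB into a file spans two chunks, so `list(chunk_range(63 * 1024 * 1024, 2 * 1024 * 1024))` is `[0, 1]`.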

Google File System (GFS)
The master keeps informed of the current state of the system by exchanging HeartBeat messages periodically with each chunkserver.
The GFS client provides the interface for applications to use the file system.
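One way periodic messages let a master notice failed chunkservers is to record when each server was last heard from and to flag any that stay silent too long. The sketch below is a simplified assumption, not the actual GFS protocol: the class `Master`, its methods, and the 30-second timeout are all hypothetical, and times are passed in explicitly so the logic is easy to follow.

```python
class Master:
    """Tracks the last time each chunkserver answered a heartbeat."""

    def __init__(self, timeout_s: float = 30.0):
        self.timeout_s = timeout_s  # assumed value, not taken from GFS
        self.last_seen = {}         # chunkserver id -> time of last reply

    def record_heartbeat(self, server_id: str, now: float) -> None:
        """Note that `server_id` replied to a heartbeat at time `now`."""
        self.last_seen[server_id] = now

    def failed_servers(self, now: float) -> list:
        """Servers whose last reply is older than the timeout, sorted by id."""
        return sorted(
            server_id
            for server_id, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )
```

With replies from `cs1` at time 0 and `cs2` at time 20, calling `failed_servers(40)` reports only `cs1`, which the master could then treat as failed and work around.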

Distributed Database Systems
Distribution Problem and Pattern
◦ Volume and Activity
◦ Number of Participating Hosts
◦ Storage Facilities
◦ Communication Load
◦ Replication and Partitioning

Distributed Database Systems
Queries and Updates Phases
Query phases
◦ Copy identification phase
◦ Query decomposition
◦ Response composition
Update phases
◦ Copy Identification
◦ Pessimistic/Optimistic approach

Distributed Database Systems
Queries
[Figure: three relations held at Sites 1, 2 and 3, with one site labelled "Query is made here".]

Supplier relation
S#    Name   City
100   JOHN   POS
200   DOE    NY

Parts relation
P#     Pname   Quant.
1011   Bolt    400
1123   Nut     400
1246   Screw   600
1300   Nail    500

Unit price relation
S#    P#     Price
100   1011   $0.50
      1300   $1.50
200   1123   $0.60
      1246   $0.70

Dictionary
Relat.   Locat.   #Tups   T-size
Sup.     Site 1   800     10
Part     Site 2   1500    10
Price    Site 3   10000   3

What are the names of suppliers in NY who supply screws at a unit price of less than $1.00?
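The example query can be decomposed into one sub-query per relation, with the partial results combined at the end. The sketch below holds the three relations as in-memory lists and is only an illustration: the function name `ny_suppliers_of` is hypothetical, and the pairing of supplier numbers with part numbers in the unit price relation (supplier 100 with parts 1011 and 1300, supplier 200 with parts 1123 and 1246) is an assumption, since the slide lists only two S# values for four price rows.

```python
# Supplier relation (the dictionary locates it at Site 1).
SUPPLIER = [
    {"S#": 100, "Name": "JOHN", "City": "POS"},
    {"S#": 200, "Name": "DOE", "City": "NY"},
]

# Parts relation (Site 2).
PARTS = [
    {"P#": 1011, "Pname": "Bolt", "Quant": 400},
    {"P#": 1123, "Pname": "Nut", "Quant": 400},
    {"P#": 1246, "Pname": "Screw", "Quant": 600},
    {"P#": 1300, "Pname": "Nail", "Quant": 500},
]

# Unit price relation (Site 3). The S#/P# pairing is an assumption.
UNIT_PRICE = [
    {"S#": 100, "P#": 1011, "Price": 0.50},
    {"S#": 100, "P#": 1300, "Price": 1.50},
    {"S#": 200, "P#": 1123, "Price": 0.60},
    {"S#": 200, "P#": 1246, "Price": 0.70},
]


def ny_suppliers_of(pname: str, max_price: float) -> list:
    """Names of NY suppliers selling `pname` below `max_price`."""
    # Sub-query on the parts relation: part numbers with the given name.
    part_ids = {p["P#"] for p in PARTS if p["Pname"] == pname}
    # Sub-query on the unit price relation: suppliers pricing those parts low enough.
    cheap = {
        u["S#"]
        for u in UNIT_PRICE
        if u["P#"] in part_ids and u["Price"] < max_price
    }
    # Sub-query on the supplier relation, then response composition.
    return [s["Name"] for s in SUPPLIER if s["City"] == "NY" and s["S#"] in cheap]
```

Under the stated pairing assumption, `ny_suppliers_of("Screw", 1.00)` returns `["DOE"]`. Only small sets of part and supplier numbers need to move between sites in this ordering, which is the kind of saving the dictionary's tuple counts and sizes are meant to help estimate.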

Distributed Database Systems
Updates
◦ Integrity
◦ Concurrency
◦ Replication

Big Data management
§ Handles very large amounts of data distributed over many servers
§ Highly available service with no single point of failure
§ Key-value store
§ Different levels of consistency
§ Automatic replication of data to multiple nodes

Google BigTable
§ Google’s NoSQL distributed data management system.
§ BigTable is a sparse map or (key, value) store distributed over multiple servers.
§ It is designed to scale to clusters comprising thousands of commodity servers storing petabytes of data.

Google BigTable
The data or values stored in BigTable are treated as uninterpreted strings.
The BigTable key is three-dimensional: it contains a row key, a column key and a timestamp.
Therefore the mapping takes the form: (row key, column key, timestamp) → value.
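The three-part key can be illustrated with an in-memory dictionary keyed by a tuple. This is a toy sketch of the data model only, assuming integer timestamps and byte strings as the uninterpreted values; the class `SparseMap` and its `put` and `get` methods are hypothetical and are not the BigTable API.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, str, int]  # (row key, column key, timestamp)


class SparseMap:
    """Toy (row key, column key, timestamp) -> value store."""

    def __init__(self) -> None:
        # Only cells that were written exist, which is what makes the map sparse.
        self._cells: Dict[Key, bytes] = {}

    def put(self, row: str, column: str, timestamp: int, value: bytes) -> None:
        """Store an uninterpreted byte string under the three-part key."""
        self._cells[(row, column, timestamp)] = value

    def get(self, row: str, column: str,
            timestamp: Optional[int] = None) -> Optional[bytes]:
        """Value at an exact timestamp, or the most recent version if none is given."""
        if timestamp is not None:
            return self._cells.get((row, column, timestamp))
        versions = [ts for (r, c, ts) in self._cells if r == row and c == column]
        if not versions:
            return None
        return self._cells[(row, column, max(versions))]
```

Writing two versions of the same cell keeps both: after `put("row1", "col:a", 1, b"old")` and `put("row1", "col:a", 2, b"new")`, `get("row1", "col:a")` returns `b"new"` while `get("row1", "col:a", 1)` still returns `b"old"`.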

Distributed Computation Systems
§ Networked computers cooperate in the execution of a computationally intensive program
§ The Network Platform
§ Algorithm Design and Implementation
§ Languages, Standards and Tools

Distributed Computation Systems
The Network Platform
◦ Cluster Computing
◦ The Internet
◦ The Lambdagrid
Algorithm Design and Implementation
◦ control parallelism
◦ data parallelism
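The difference between the two styles of parallelism can be shown on a single machine, with threads standing in for the networked computers. This is only a sketch: the function names are hypothetical, and a real distributed program would distribute the work with a tool such as PVM or MPI rather than a local thread pool.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(chunk):
    """Sum of squares over one partition of the data."""
    return sum(x * x for x in chunk)


def data_parallel_sum_of_squares(values, workers=4):
    """Data parallelism: the same operation runs on different partitions."""
    size = max(1, (len(values) + workers - 1) // workers)
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, chunks))


def control_parallel_stats(values):
    """Control parallelism: different operations run concurrently on the same data."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        lowest = pool.submit(min, values)
        highest = pool.submit(max, values)
        return lowest.result(), highest.result()
```

In the first function every worker executes `partial_sum` on its own slice and the results are combined; in the second, one worker computes the minimum while another computes the maximum.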

Distributed Computation Systems
Languages, Standards and Tools
◦ PVM
◦ MPI
◦ DCE
◦ CORBA
◦ Globus Toolkit

Distributed Computation
Tasks (T) interact with each other in a PVM running context. PVM uses network protocols (N) for communication among the computers.
[Figure: on each computer, tasks (T) run above PVM, which runs above the network protocols (N); the computers are joined by a network.]
Distributed applications use MIDDLEWARE tools to interoperate over a network of heterogeneous computers.
[Figure: distributed applications layered above MIDDLEWARE, which is layered above the host OS and network service.]

XSEDE
§ The Extreme Science and Engineering Discovery Environment, XSEDE, tightly integrates supercomputing resources, storage and scientific instruments across geographically dispersed major research centers
§ The interconnection network includes a backbone of hubs allowing interhub transmission capacity of 40 Gbps.
§ Border routers, linked to the hubs, are the interfaces between the grid and the sites.
§ Each site has up to 10 Gbps dedicated transmission capacity

XSEDE
The XSEDE interconnection network is hierarchical.

Distributed Real-Time Systems
§ Environment
§ Geographic Range
§ Communication Traffic
§ Computer Processing

Distributed Real-Time Systems
Computer Processing
Distributed real-time processing may be hierarchical, involving a low-level network of sensors feeding data to data-aggregation nodes which feed high-level servers.
[Figure: three tiers, from top to bottom: Server, Data Aggregation, Network of Sensors.]
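The three-tier flow can be sketched as two small functions, one for an aggregation node and one for the server. This is an illustrative assumption about what "aggregation" might compute (a count, mean and maximum); the function names `aggregate` and `server_view` and the choice of summary fields are hypothetical.

```python
from statistics import mean


def aggregate(readings):
    """Aggregation node: reduce raw sensor readings to a compact summary."""
    return {"count": len(readings), "mean": mean(readings), "max": max(readings)}


def server_view(summaries):
    """High-level server: combine the summaries sent by the aggregation nodes."""
    total = sum(s["count"] for s in summaries)
    # Weight each node's mean by its reading count to recover the overall mean.
    overall_mean = sum(s["mean"] * s["count"] for s in summaries) / total
    return {
        "count": total,
        "mean": overall_mean,
        "max": max(s["max"] for s in summaries),
    }
```

The server never sees individual readings, only one summary per aggregation node, which keeps the traffic toward the top of the hierarchy small.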

Distributed Multimedia Systems
The Signals
◦ Stereo quality audio CD would require up to 1.411 Mbps.
◦ Video: flash discrete images at a rate of 50 or more images per second
◦ The images in video can be represented as a sequence of frames (a frame is a rectangular grid of pixels)
◦ Twenty-four bits per pixel with a frame of 1024 × 768 pixels is illustrative of present high-resolution technology
◦ Transmitting at 25 frames per second would require transmission capacity in excess of 400 Mbps
Peer-to-Peer Multimedia Systems
Media On Demand (MOD)
◦ Video on Demand (VOD) or On Demand (OD)
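The bandwidth figures quoted under The Signals follow from simple multiplication. The sketch below reproduces them for uncompressed signals; the CD parameters (44,100 samples per second, 16 bits per sample, 2 channels) are the standard CD audio values and are not stated on the slide, and the function names are hypothetical.

```python
def audio_bitrate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Uncompressed audio bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def video_bitrate(width: int, height: int, bits_per_pixel: int, fps: int) -> int:
    """Uncompressed video bit rate in bits per second."""
    return width * height * bits_per_pixel * fps


cd_bps = audio_bitrate(44_100, 16, 2)         # 1,411,200 bps, about 1.411 Mbps
video_bps = video_bitrate(1024, 768, 24, 25)  # 471,859,200 bps, about 472 Mbps
```

The video result of roughly 472 Mbps is the "in excess of 400 Mbps" figure for a 1024 × 768 frame at 24 bits per pixel and 25 frames per second.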

Distributed Multimedia
Media On Demand (MOD)
The MOD server maintains a digital repository of videos which home users can access via communication networks and view immediately.
[Figure: homes connected through a network to the MOD server.]

Distributed Multimedia
Media On Demand (MOD)
Massive storage must be arranged as a hierarchical structure.
[Figure: client viewers connect over a network to a server whose storage hierarchy comprises RAM, magnetic disk, optical disk and magnetic tape.]

Distributed Operating Systems
§ Network Operating System
§ Distributed Operating System
§ Issues
§ Threads

Network Operating System
Network operating system of agents and different local operating systems
[Figure: each host runs its own local OS with an agent; the agents communicate over the network.]

Distributed Operating Systems
Homogeneous network-wide operating system
Issues
◦ Fundamental OS problems
◦ Data integrity
◦ Fail-Soft operation
◦ Security
◦ Performance
◦ Scalability
Threads

Windows NT family
The Windows NT family comprises a series of releases of operating systems that support distributed system applications.
The NT architecture comprises a number of layers
◦ Hardware Abstraction Layer (HAL)
◦ Kernel
◦ Executive
◦ Subsystems

Windows NT family Architecture

Mach microkernel
Mach is a distributed operating system project whose kernel has been used in several Unix-like operating systems and in the Mac OS X operating system.

Conclusion
We looked at:
◦ Distributed File Systems
◦ Distributed Database Systems
◦ Distributed Computation Systems
◦ Distributed Real-Time Systems
◦ Distributed Multimedia Systems
◦ Distributed Operating Systems