Data Technologies 3 1 Distributed Relational Database Architecture

Data Technologies 3. 1. Distributed Relational Database Architecture (DRDA) 1

DRDA basics • DRDA is an open, published architecture that enables communication between ‒ applications and ‒ database systems • on disparate platforms, whether those applications and database systems are provided by the same or different vendors and whether the platforms are the same or different hardware/software architectures. • DRDA is a combination of other architectures and the environmental rules and process model for using them. The architectures that actually comprise DRDA are • Distributed Data Management (DDM) and • Formatted Data Object Content Architecture(FD: OCA) 2

Distributed Data Management • The Distributed Data Management (DDM) architecture provides the overall command reply structure used by the distributed database. • Fewer than 20 commands are required to implement all of the distributed database functions for communication between the application requester (client) and the application server. 3

Formatted Data Object Content Architecture • The Formatted Data Object Content Architecture (FD: OCA) provides the data definition architectural base for DRDA. Descriptors defined by DRDA provide layout and data type information for all the information routinely exchanged between the application requesters and servers. • A descriptor organization is defined by DRDA to allow dynamic definition of user data that flows as part of command or reply data. DRDA also specifies that the descriptors only have to flow once per answer set, regardless of the number of rows actually returned, thus minimizing data traffic on the wire. 4

DRDA Reference • The DRDA Reference describes the necessary connection between ‒ an application and ‒ a relational database management system in a distributed environment. Application 1 Application 2 DBMS 1 DBMS 2 OS 1 OS 2 Proc 1 Proc 2 DB 1 Network Zurich 5

DRDA Reference • The DRDA Reference describes the necessary connection between ‒ an application and ‒ a relational database management system in a distributed environment. • It describes the responsibilities of these participants, and specifies when the flows should occur. • It describes the formats and protocols required for distributed database management system processing. 6

Unit of work (Uo. W) • Unit of work (Uo. W), unit of recovery (UR), or logical transaction are different ways to refer to the same concept. The DRDA terminology prefers the term unit of work to transaction, but the meaning is the same. • Unit of work refers to a sequence of database requests that carry out a particular task, such as in a banking application when you transfer money from your savings account to your checking account. • This task has its logical independence and should be treated “atomically”, which means that either all its components are executed or none of them are. You do not want your savings balance to be updated without your checking balance being updated too. • A unit of work is generally terminated by a commit operation if the entire task completes successfully. 7

Unit of work concept 8

Extension of DML SQL language 9

SQL as a common DRDA database access language • SQL has become the most common data access language for relational databases in the industry. • SQL was chosen as part of DRDA because of its high degree of standardization and portability. • In a distributed environment, where you want to access data at remote locations, the SQL requests are routed to the remote systems and they are executed remotely. • Prior to sending the remote SQL request, a DRDA application must establish a connection with the remote relational database where the data is located. • This is the purpose of the CONNECT SQL statement provided by DRDA. 10

Connect statement • An application using the CONNECT statement directs the function receiving the application request to establish connectivity with a named relational database system. • The term that DRDA uses to represent the name of the relational database (RDB) is RDB_NAME. 11

Types of distribution 12

Types of Distribution • There are (level 0, 1, 2, 3 ) four levels of distribution of database management system functions. • Each level of distribution has different DRDA requirements. 13

Level 0 – Remote request 14

Level 0 – Remote request 15

Level 0 – Remote request Uo. W SQL Request @Zurich Commit DBMS 2 DBMS 1 OS 2 Proc 1 Proc 2 Network DB 1 Zurich 16

Level 0, Remote Request (RR) • One request • Within one Uo. W • To one DBMS 17

Data flow between a DB 2 Connect server and a host At this level no DBMS required on the workstation! 18

Application requester and application server • In a distributed relational database environment, the system running the application and sending the SQL requests across the network is called an application requester (AR). • Any remote system that executes SQL requests coming from the application requester is also known as an application server (AS). • Some platforms can participate in a distributed database environment as both an application requester and an application server. 19

Application requester and Application server • An application requester (AR) is the code that handles the application end of a distributed connection. • The AR is the application that is requesting data. • SQL Connect acts as an application requester on behalf of application programs which can be local to the DB 2 Connect workstation or on a separate client remote to DB 2 Connect. • An application server (AS) is the code that handles the database end of the connection. 20

Level 0 – Remote request Uo. W SQL Request @Zurich Commit Application Requestor Application Server DBMS 1 OS 2 Proc 1 Proc 2 DBMS 2 Network DB 1 Zurich 21

DRDA Level 0 – Remote request 22

Remote request 23

Application-Directed Remote Unit of Work (DRDA Level 1) 24

Level 1 – Remote unit of work 25

Level 1 – Remote unit of work Uo. W SQL Request 1 @Zurich SQL Request 2 @Zurich Commit DBMS 2 DBMS 1 OS 2 Proc 1 Proc 2 Network DB 1 Zurich 26

Level 1 - Remote Unit of Work (RUW) • One or more SQL requests • within one Uo. W • to a single DBMS • Application initiates commit • Commitment at a single DBMS Switching to a different location is possible, but a new Uo. W must be started and the previous one must be completed. 27

Remote unit of work A remote unit of work lets a user or application program read or update data at one location per unit of work. It supports access to one database within a unit of work. While an application program can update several remote databases, it can only access one database within a unit of work. 28

Level 1 – Remote Uo. W 29

Remote unit of work 30

Level 1 - Application Requester (AR) • The AR accepts SQL requests from an application and sends them to the appropriate application servers for processing. Using this function, application programs can access remote data. 31

Level 1 - Application Server (AS) • The AS receives requests from application requesters and processes them. • The AR and the AS communicate through a protocol called the Application Support Protocol which handles data representation conversion. 32

Level 1 – Database server Database Server (DS) • The DS receives requests from AS or other DS servers. The DS supports distributed requests • The AS and the DS among themselves communicate through a protocol called the Database Support Protocol. 33

Application-Directed Remote Unit of Work Commit • With Remote Unit of Work, an application program executing in one system can access data at a remote database management system using the SQL supported by that remote database management system. • Remote Unit of Work supports access to one database management system within a unit of work. • The application can perform multiple SQL statements within the unit of work. • When the application is ready to commit the work, it initiates the commit at the database management system that is accessed for the unit of work. • In the next unit of work, the application can access the same database management system or another database management system. 34

Example shows a database client running a funds transfer application that accesses a database containing checking and savings account tables, as well as a transaction fee schedule. 35

Example The application must: • Accept the amount to transfer from the user interface. • Subtract (in main memory) the amount from the savings account, and determine the new balance. • Read the fee schedule to determine the transaction fee for a savings account with the given balance. • Subtract the transaction fee from the savings account. • Add the amount of the transfer to the checking account. • Commit the transaction (unit of work). 36

Tables to be created To set up such an application, you must: Create the tables for the • savings account, • checking account and • transaction fee schedule in the same database. If physically remote, set up the database server to use the appropriate communications protocol. If physically remote, catalog the node and the database to identify the database on the database server. Precompile your application program to specify a type 1 connection; that is, specify CONNECT(1) on the PREP command. 37

Application-Directed Distributed Unit of Work (DRDA Level 2) 38

Level 2 – Distributed Unit of Work 39

Level 2 – Distributed Unit of work Uo. W SQL Request 1 @Zurich SQL Request 2 @Rochester Commit DBMS 1 DBMS 2 OS 1 OS 2 Proc 1 Proc 2 Network Rochester Zurich 40

Application-Directed Distributed Unit of Work (DUW) • Several DBMSs per unit of work • Multiple requests per unit of work • One DBMS per request • One unit of work per DBMS • Application directs the distribution of work • Application initiates commit • NEW! - Commitment coordination across multiple DBMSs 41

Distributed Uo. W functionalities For Isolation • Distributed two phase locking: we will not go into detail For Atomicity and Durability and Termination • Distributed two phase commit 42

Application-Directed Distributed Unit of Work (DUW) • Several DBMSs per unit of work • Multiple requests per unit of work • One DBMS per request • One unit of work per DBMS • Application directs the distribution of work • Application initiates commit • NEW! - Commitment coordination across multiple DBMSs • NEW! Distributed protocol for commit! Two phase commit 43

Distributed unit of work 44

Level 2 – Distributed Uo. W New!! Distributed Isolation & termination transaction management 2 PC 45

Remote Unit of Work (DRDA-1) versus Distributed Unit of Work (DRDA-2) REMARK i. Series is an IBM technology • Figure shows three units of work. The arrows pointing to the left indicate the only possible way to access data in a DRDA-1 application. • The arrows pointing to the right show the new flexibility of a DRDA 2 application accessing multiple systems in the same Uo. W. Let's consider, for example, the Rochester system on the right-hand side. It issues three requests: Request 2, Request 4, and Request 6. • Each of these requests belong to a different unit of work. • The Rochester system on the lefthand side also issues three requests (Request 3, Request 4, and Request 5), each targeting a different application server within one Uo. W. Level 1 Level 2 46

Function types • Application Requester Functions. Support SQL and program preparation services from applications. • Application Server Functions. Support requests that application requesters have sent and routes requests to database servers by connecting as an application requester. • Database Server Functions. Support requests from application servers. Support the propagation of special register settings. 47

Distributed Uo. W interoperability • Any database management systems could be in any position in this figure. Figure shows three places where databases (DBMS) from different vendors—for example, IBM, Microsoft, Oracle, and so on—may be used. • By the way, a developer might choose to implement either DRDA Remote Unit of Work or Distributed Unit of Work. 48

Distributed two phase commit in more detail 49

Need for a distributed commit – 1 (Remark: here for simplicity we do not represent Appl. Requestor etc. ) Transaction T 2 DB 1 DBMS Processing Unit 1 Processing Unit 2 DB 2 Network 50

Need for a distributed commit - 2 Local Trans. t 1 update (b) commit Global Transaction T = [example: update(a); update (b); commit] Local Trans. t 2 update (a) commit DBMS b DBMS Processing Unit 1 Processing Unit 2 a Network One global transaction T Two local transactions t 2 and t 1! 51

In absence of a global clock…. …. we need a distributed commit Local Trans. t 1 update (b) commit b Global Transaction T = [example: update(a); update (b); commit] DBMS Processing Unit 1 Processing Unit 2 Local Trans. t 2 update (a) commit a Network 52

Two-phase commit protocol • Synchronizing multiple databases requires additional effort compared to the process of keeping data consistent on a single system. • Because multiple physical locations are involved, the synchronization process is split into two phases to ensure data atomicity across multiple locations. • The database managers involved in the distributed unit of work must make sure that either all of them commit their changes or roll all the changes back consistently. • The protocol by which multiple database managers can keep their data in sync is called two-phase commitment control or two phase commit. 53

Two phase commit • In an application using two-phase commit, the COMMIT statement generates a rather complex sequence of operations ( ) that allows the various agents in the network to keep their data in a consistent state. • Also, a two-phase commit must protect your applications against network or system failures that may occur during the transaction ( durability). In these cases, the database managers involved in the unit of work automatically roll back their changes. 54

Coordinator & Participant(s) PARTICIPANT Local Trans. t 1 update (b); DBMS commit b COORDINATOR Global Transaction T = [example: update(a); update (b); commit] DBMS Processing Unit 1 Processing Unit 2 Local Trans. t 2 update (a); commit a Network 55

Two phase commit protocol 56

Two phase commit protocol 57

Two phase commit protocol 58

Two phase commit protocol 59

Two phase commit protocol 60

Level 2 implementation example 61

Distributed request (DRDA Level 3) 62

Level 3 - Distributed Request 63

Level 2 – Distributed Unit of work Uo. W SQL Request 1 @Zurich and @Rochester (example: Join) Commit DBMS 2 DBMS 1 OS 2 Proc 1 Proc 2 Network Rochester Zurich 64

Distributed Request (DRDA Level 3) • Several DBMSs per unit of work • Application directs requests to a DBMS • The DBMS distributes the unit of work to multiple DBMSs 65

Level 3 – Database Direct Access SQL Request Application Requestor DBMS 1 Application Server OS 1 OS 2 Proc 1 Proc 2 DBMS 2 Network DB 1 Zurich 66

Database-Directed Access (DRDA Level 3) • Several DBMSs per unit of work • Application directs requests to a DBMS • The DBMS distributes the unit of work to multiple DBMSs • Multiple requests per unit of work • Multiple DBMS per request • Application initiates commit • Commitment coordination across multiple DBMSs 67

Level 3 – Distributed request New!! Query optimization 68

New Transparency layer at level 3 and other properties 69

Location transparency • Distributed request provides location transparency for database objects. If information (in tables and views) is moved, references to that information (called nicknames) can be updated without any changes to applications that request the information. Location Transparency Logical Transparency Physical Transparency 70

Compensation • Distributed request also provides compensation for DBMSs that do not support e. g. all of the IBM DB 2 SQL dialect, or certain optimization capabilities. Operations that cannot be performed under such a DBMS (such as recursive SQL) are run under DB 2 Connect. 71

Semi autonomy • Distributed request function in a semi-autonomous manner. • For example, IBM DB 2 queries containing references to Oracle objects can be submitted while Oracle applications are accessing the same server. • Distributed request does not monopolize or restrict access (beyond integrity and locking constraints) to Oracle or other DBMS objects. 72

Distributed request 73

Database-Directed Distributed Unit of Work • In database-directed requests, an application connects to a relational database management system (RDB) that can execute one or more SQL requests locally or route some or all of the SQL requests to other RDBs. • The RDB determines which system manages the data referenced by the SQL statement and automatically propagates any special registers set by the application and directs the request to that system. 75

Global schema and local schemas 76

At every node we may represent • A local schema that represens tables stored at the node • A common global schema result of the integration of the local schemas 77

Example of distributed DBMS Which is the global schema? Exams (#Stud, #Course, Grade) 20. 000 Exams (200 cum Laude) DBMS Processing Unit 1 Processing Unit 2 Students(#Stud, Surname) 2. 000 Students Network DBMS Processing Unit 3 78

In every node we may represent a global schema Global schema Students(#Stud, Surname) Exams (#Stud, #Course, Grade) Local schema 2 Global schema Students(#Stud, Surname) Exams (#Stud, #Course, Grade) Local schema 1 DBMS Processing Unit 1 Processing Unit 2 Network Global schema Students(#Stud, Surname) Exams (#Stud, #Course, Grade) Local schema 3 DBMS Processing Unit 3 79

We may define also local schemas seen at nodes Which is the local schema at node 2? Global schema Students(#Stud, Surname) Exams (#Stud, #Course, Grade) Local schema 2 Global schema Students(#Stud, Surname) Exams (#Stud, #Course, Grade) Local schema 1 DBMS Processing Unit 1 Processing Unit 2 Network Global schema Students(#Stud, Surname) Exams (#Stud, #Course, Grade) Local schema 3 DBMS Processing Unit 3 80

We may define also local schemas seen at nodes Which is the local schema at node 1? Global schema Students(#Stud, Surname) Exams (#Stud, #Course, Grade) Local schema 2 Void Global schema Students(#Stud, Surname) Exams (#Stud, #Course, Grade) Local schema 1 DBMS Processing Unit 1 Processing Unit 2 Network Global schema Students(#Stud, Surname) Exams (#Stud, #Course, Grade) Local schema 3 DBMS Processing Unit 3 81

Local schema at node 3? Global schema Students(#Stud, Surname) Exams (#Stud, #Course, Grade) Local schema 1 Exams (#Stud, #Course, Grade) Global schema Students(#Stud, Surname) Exams (#Stud, #Course, Grade) Local schema 2 Void DBMS Processing Unit 1 Processing Unit 2 Network Global schema Students(#Stud, Surname) Exams (#Stud, #Course, Grade) Local schema 3 DBMS Processing Unit 3 82

Global and local schemas Global schema Students(#Stud, Surname) Exams (#Stud, #Course, Grade) Local schema 1 Exams (#Stud, #Course, Grade) Global schema Students(#Stud, Surname) Exams (#Stud, #Course, Grade) Local schema 2 Void DBMS Processing Unit 1 Processing Unit 2 Network Global schema Students(#Stud, Surname) Exams (#Stud, #Course, Grade) Local schema 3 Students (#Stud, Surname) DBMS Processing Unit 3

Queries can be expressed in terms of the global schema Select students with surnames and exams where grade = «cum laude» Global schema Students(#Stud, Surname) Exams (#Stud, #Course, Grade) Local schema 1 Exams (#Stud, #Course, Grade) Global schema Students(#Stud, Surname) Exams (#Stud, #Course, Grade) Local schema 2 Void DBMS Processing Unit 1 Processing Unit 2 Network Global schema Students(#Stud, Surname) Exams (#Stud, #Course, Grade) Local schema 3 Student(#Stud, Surname) DBMS Processing Unit 3

A new issue: distributed query optimization I will introduce with an exercise 85

Exercise: Find the execution plan of Students enrolled@Bicocca, with surnames and exams passed (#Course) such that grade = «cum laude» ) Cost of communication (one tuple) = 10 * Cost of transfer Secondary Mem <--> CPU (one tuple) Find all student and exams such that grade = cum laude Exams (#Stud, #Course, Grade) 20. 000 Exams (200 cum Laude) Result DBMS Processing Unit 1 Processing Unit 2 Communication Students(#Stud, Surname) 2. 000 Students Network DBMS Processing Unit 3 Transfer Sec. Mem <--> CPU 86

The execution plan is made of a sequence of two types of tasks • Transfer Table from node i to node j • Execute in memory at node i operation O = [project, select, join] Find all student and exams Exams (#Stud, #Course, Grade) 20. 000 Exams (200 cum Laude) DBMS Processing Unit 1 Processing Unit 2 Communication Students(#Stud, Surname) 2. 000 Students Network DBMS Processing Unit 3 Transfer Sec. Mem <--> CPU 87

The execution plan is made of a sequence of two types of tasks • Transfer Table from node i to node j • Execute in memory at node i operation O = [project, select, join] Find all student and exams 1. SELECT Exams with G=CL Exams (#Stud, #Course, Grade) 20. 000 Exams (200 cum Laude) DBMS Processing Unit 1 2. Transfer 1, 3 Exams. CL Processing Unit 2 4. Transfer 3, 2 St. CL Network Students(#Stud, Surname) 2. 000 Students DBMS Processing Unit 3 3. Join (Exams. CL; Students) 88

Which is the cost of operation 1? Find all student and exams 1. SELECT Exams with G=CL Exams (#Stud, #Course, Grade) 20. 000 Exams (200 cum Laude) DBMS Processing Unit 1 2. Transfer 1, 3 Exams. CL Processing Unit 2 4. Transfer 3, 2 St. CL Network Students(#Stud, Surname) 2. 000 Students DBMS Processing Unit 3 3. Join (Exams. CL; Students) 89

Which is the cost of operation 2? Find all student and exams 1. SELECT Exams with G=CL Exams (#Stud, #Course, Grade) 20. 000 Exams (200 cum Laude) DBMS Processing Unit 1 2. Transfer 1, 3 Exams. CL Processing Unit 2 4. Transfer 3, 2 St. CL Network Students(#Stud, Surname) 2. 000 Students DBMS Processing Unit 3 3. Join (Exams. CL; Students) 90

Which is the cost of operation 3? Find all student and exams 1. SELECT Exams with G=CL Exams (#Stud, #Course, Grade) 20. 000 Exams (200 cum Laude) DBMS Processing Unit 1 2. Transfer 1, 3 Exams. CL Processing Unit 2 4. Transfer 3, 2 St. CL Network Students(#Stud, Surname) 2. 000 Students DBMS Processing Unit 3 3. Join (Exams. CL; Students) 91

Which is the cost of operation 4? Find all student and exams 1. SELECT Exams with G=CL Exams (#Stud, #Course, Grade) 20. 000 Exams (200 cum Laude) DBMS Processing Unit 1 2. Transfer 1, 3 Exams. CL Processing Unit 2 4. Transfer 3, 2 St. CL Network Students(#Stud, Surname) 2. 000 Students DBMS Processing Unit 3 3. Join (Exams. CL; Students) 92

Summary on the four levels 93

DRDA Network 94

Distributed DBMS – level with highest transparency Fragmentation transparency 95

Two types of fragmentation Horizontal fragmentation • Student(#St, Surname, Region) 1. Lombardy. Student (#St, Surname) 2. Piedimont. Student (#St, Surname) Vertical fragmentation • Student(#St, Surname, Region) 1. Student(#St, Surname) 2. Student(#St, Region) 96

Example of Fragmentation Transparency Distributed DBMSs Centralized DBMSs Fragmentation Transparency Network (Location & Naming) Transparency Logical Data Independence (Transparency) Physical Data Independence (Transparency) E. g. Table Students (St#, Surname, Region of Birth) Table Lombady_Students (St#, Surname, Region of Birth= Lombardy) at node 1 Table Other_Students (St#, Surname, Region of Birth <> Lombardy) at node 2 97

Distributed DBMS – level with highest transparency Investigated in the literature (e. g. book of Ozsu Valduriez Principles of Distributed Database Systems accessible in the library, but never implemented in the DRDA standard (we will see that Oracle and other DBMSs have typically horizontal fragmentation) 98

Evolution to Data Federation 99
- Slides: 98