D A T Database Management Systems A B

D A T Database Management Systems A B Chapter 11 A Distributed Databases S Jerry Post E Copyright © 2013 1

Objectives ² ² ² ² Why do you need a distributed database? What are distributed databases? How is data distributed with client/server systems? Can a Web approach solve the data distribution issues? How much data can you send to a client form? How do you transfer data across diverse systems? How will Sally’s employees access the database? 2

Distributed Databases ² ² ² SELECT Sales FROM Britain. Sales UNION SELECT Sales FROM France. Sales UNION SELECT Sales FROM Italy. Sales Definition Advantages / Uses Problems / Complications Client-Server / SQL Server Microsoft Access Germany Britain France Italy 3

Distributed Database Definition ² Multiple independent databases ª Each DBMS is a complete DBMS (engine, queries, locking, transactions, etc. ) ª Usually on different machines. ª Usually in different locations. ² Connected by a network. ² Might be different environments ª Hardware ª Operating System ª DBMS Software Database Apollo Database Zeus England France Database Athena United States 4

Distributed Database Rules ² C. J. Date ² Rule 0: Transparency: the user should not know or care that the database is distributed. ª Local autonomy. ª No reliance on a central site. ª Continuous operation. ª Location independence. ª Fragmentation independence (physical storage). ª Replication independence. ª Distributed query processing. ª Distributed transaction management. ª Hardware independence. ª Operating system independence. ª Network independence. ª DBMS independence. 5

Distributed Features ² Each database can continue to run even if portion fails. ² Data and hardware can be moved without affecting operations or users. ª Expanding operations. ª Performance issues. ² System expansion and upgrades. ª Add new section without affecting others. ª Upgrade hardware, network and DBMS. 6

Advantages and Applications local transactions ² Business operations are often distributed ª Work and data are segmented by department. ª Work and data are segmented by geographical location. ² Improved performance ª Most updates and queries are performed locally. ª Maintain local control and responsibility over data. ² Can still combine data across the system. ² Scalability and expansion future expansion ª Add on, not replacement. 7

Creating a Distributed Database ² ² ² ² Design administration plan. Choose hardware and DBMS vendor, and network. Set up network and DBMS connections. Choose locations for data. Choose replication strategy. Create backup plan and strategy. Create local views and synonyms. Perform stress test: loads and failures. 8

Network Transfer Rates ² ² ² Drives: 60 - 400 MB per sec. (with SSD or RAID) LANs: 10 -100 MB per sec (100 -1000 mbps). WANs: 0. 2 - 300 MB per sec. Faster is possible but expensive! Goal is to minimize transmissions. ª Each system must be capable of evaluating queries --preferably SQL. ª Results depend heavily on how the system joins tables. WAN 0. 2 - 300 MB 60 – 400 MB Disk drive 10 -100 MB LAN 9

Distributed Query Processing ² Example ª ª NY: Customers: 1 M rows LA: Production: 10 M rows Chicago: Sales: 20 M rows Query: List customers who bought blue products on March 1 ª Bad idea #1 © Transfer all rows to Chicago © Then JOIN and select. ª Better idea #2 (probably) NY Customers(C#, …) 1, 000 C# list from desired P# Chicago Matching Sales(S#, C#, Sdate) Customer data 20, 000 Sale. Item(S#, P#, …) 50, 000 P# sold on March 1 © Transfer blue products from LA to Chicago ª Better idea #3 © Get sale items on March 1 © Get blue products from LA © Send C# to NY LA Blue P# sold on March 1 Products(P#, Color…) 10, 000 10

Data Replication: Publish/Subscribe Main subscribe Replica Published changes Changed data 11

Data Replication Britain ² Goals ª Minimize transmissions ª Improve performance ª Support heavy multiuser access. ² Problems ª Updating copies © Bulk transmissions © Site unavailable ª Concurrency © Easier for two people to change the same data at the same time. ² Decision support systems. ² Data warehouse. Britain: Customers & Sales Market research & data corrections. France: Customers & Sales Spain: Customers & Sales Periodic updates Spain Britain: Customers & Sales France: Customers & Sales Spain: Customers & Sales Update data. 12

Concurrency and Locks ² Each DBMS must maintain lock facility. ² To update, each DBMS must utilize and recognize other lock mechanisms and return codes. ² Each DBMS must have a deadlock resolution protocol that recognizes the distributed databases. ª Random wait. ª Optimistic updates. ª Two-phase commit. DBMS #1 Accounts Jones 8898 Transaction A Locked Waiting DBMS #2 Accounts Jones 3561 Transaction B Waiting Locked 13

Transactions & Two-Phase Commit ² Two (or more) separate lock managers. ² DBMS initiating update serves as the Database 1 coordinator. Initiate Transaction ² Two phases 1. Prepare to commit. ª Coordinator sends message and data to all machines to “get ready. ” ª Local machines save data in logs, verify update status and return message. ª If all locals report OK, then coordinator writes log and instructs others to proceed. If any fail, it sends Rollback message. All agree? 2. Commit Database 2 Lock tables. Save log. Update all tables. Database 3 14

Distributed Transaction Managers Transaction Manager Resource Manager DBMS Transaction Manager Transaction Processing Monitor Resource Manager DBMS The distributed transaction coordinator/transaction processing monitor handles the transaction decisions and coordinates across the participating systems. 15

Distributed Design Questions 16

Client-Server Shared Database Front-end User Interface Clients Older, but common for retail stores/checkout computers. 18

LAN File Server: Slow File Server My. File. mdb Cust. ID Name … 115 Jenkins… Forms 125 Juarez. . . Order. . . DBMS software transferred. Application and query transferred. SELECT * FROM Customer WHERE City = “Sandy” One row at a time transferred, until all rows are examined. 20

Client-Server Databases ² One machine is dominant (server) and handles data for many clients. ² Client machines handle front-end tasks and small data tables that are not shared. File Server DBMS Send SQL statement. . . T C E SEL . Shared SQL Server Data Return matching data. application 21

Three-Tier Client-Server ² Server Databases ² Client front-end ² Middle ª Locate databases ª Business rules ª Program code Databases. Transactions. Legacy applications. Database links. Business rules. Program code. Application. Front-end. User Interface. Database Servers Middleware Client 23

Database Independence on the Client Original DBMS ADO New DBMS ADO Application 24

Database Independence with Queries Independent Application Query: works with any DBMS SELECT Sale. ID, Sale. Date, Customer. ID, Customer. Name FROM Sale. Customer Saved Oracle Query SELECT Sale. ID, Sale. Date, Customer. ID, Last. Name || ‘, ‘ || First. Name AS Customer. Name FROM Sale, Customer WHERE Sale. Customer. ID=Customer. ID Saved SQL Server Query SELECT Sale. ID, Sale. Date, Customer. ID, Last. Name + ‘, ‘ + First. Name AS Customer. Name FROM Sale INNER JOIN Customer ON Sale. Customer. ID = Customer. ID 25

The Internet as Client-Server Internet Router Client Browser Router request http: //server. location/page result page Server Web Server HTML forms SQL Database Server 26

HTML Limited Clients <HTML> <HEAD> <TITLE>My main page</TITLE></HEAD> <BODY BACKGROUND=“graphics/back 0. jpg”> <P>My text goes in paragraphs. </P> <P>Additional tags set <B>boldface</B> and <I>Italic</I>. <P>Tables are more complicated and use a set of tags for rows and columns. </P> <TABLE BORDER=1> <TR><TD>First cell</TD><TD>Second cell</TD></TR> <TR><TD>Next row</TD><TD>Second column</TD></TR> </TABLE> <P>There are form tags to create input forms for collecting data. But you need CGI program code to convert and use the input data. </P> </BODY> </HTML> 27

HTML Output 28

Web Server Database Fundamentals DBMS Web Server SQL <body> <form id="form 1" runat="server"> <asp: Label ID="Page. Title. Label" runat="server" … Data <asp: Sql. Data. Source ID="Customer. Sql. Data. Source“ Delete. Command="DELETE FROM [Customer] … Select. Command="SELECT [Customer. ID], … Update. Command="UPDATE [Customer] SET… Data Object <Delete. Parameters> <asp: Parameter Name="Customer. ID" Type="Int 32" /> </Delete. Parameters> <asp: Form. View ID="Customers. Form. View" runat="server“… Data. Source. ID="Customer. Sql. Data. Source"> … </asp: Form. View> </form> </body> Web Browser Customers Customer. ID Last. Name First. Name … 1653 Jones Mary Save CSS Style Sheet. Page. Title { font-weight: bold; font-size: larger; text-align: center; } 30

Database Example: Client Side 0 Request Server/Form. html Server 1 Initial form e 2 Cal l We ag p b 3 Results 31

Client-Server Data Transfer Order Form Order ID Customer Order Date 1015 Jones, Martha 12 -Aug What if there are 10, 000 customers? How much time to load the combo box? How do you refresh/reload the combo box? Alternatives? 32

Latency Server Generate form Receive form data Transmission delay time Form received Client User delay 33

Cloud Computing Data is replicated to multiple, connected servers in the Internet cloud. Client requests are filled from the nearest available server, spreading the bandwidth and processor demands across the network. 34

Cloud Databases: Amazon S 3 Web server HTML, Code, Bucket +Key Amazon S 3 Bucket Name Key, Object HTML Page User Developer opens Amazon S 3 account and creates/names a bucket. Developer uploads content (objects) to bucket and assigns a key to each object. Developer writes Web server code with HTML page and background code that calls the S 3 service with the bucket name and object key. Web page is constructed on request and delivered with the object. 35

Web Databases (and Cloud) ² Many cloud systems are proprietary and not relational. ² Designed to handle data that rarely changes—typically through bulk uploads. ² So little fear of concurrency or transaction issues. ² Microsoft Azure SQL is an exception. ² Need to handle concurrency (and transactions/logs) in your own code. ² Always use optimistic concurrency (not pessimistic locking). ² See Chapter 7, but keep the original values when you read data and use it in the WHERE clause when you alter data to see if it has changed. 36
- Slides: 32