Database Conceptual and Logical Design
Zachary G. Ives
University of Pennsylvania
CIS 550 – Database & Information Systems
October 4, 2005
Some slide content courtesy of Susan Davidson & Raghu Ramakrishnan

Administrivia
§ Homework 2 due now
§ Homework 3 handed out (due on the 13th)

Modifying the Database: Inserting Data
§ Inserting a new literal tuple is easy, if wordy:
    INSERT INTO PROFESSOR(fid, name)
    VALUES (4, 'Simpson')
§ But we can also insert the results of a query!
    INSERT INTO PROFESSOR(fid, name)
    SELECT sid AS fid, name
    FROM STUDENT
    WHERE sid < 20

Deleting and Modifying Tuples
§ Deletion is a fairly simple operation:
    DELETE FROM STUDENT S
    WHERE S.sid < 25
§ So is modification:
    UPDATE STUDENT S
    SET S.sid = 1 + S.sid, S.name = 'Janet'
    WHERE S.name = 'Jane'

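The insert, delete, and update statements from the last two slides can be tried end to end. The following is a minimal sketch, assuming Python's built-in sqlite3 module and an in-memory database rather than the course DBMS. The sample rows are made up for illustration, and the table aliases are dropped to keep the statements portable across SQL dialects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE STUDENT (sid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE PROFESSOR (fid INTEGER PRIMARY KEY, name TEXT);
    -- Hypothetical sample data
    INSERT INTO STUDENT VALUES (1, 'Jill'), (19, 'Sam'), (30, 'Jane');
""")

# Insert a literal tuple, then insert the result of a query.
conn.execute("INSERT INTO PROFESSOR(fid, name) VALUES (4, 'Simpson')")
conn.execute("""
    INSERT INTO PROFESSOR(fid, name)
    SELECT sid AS fid, name FROM STUDENT WHERE sid < 20
""")
professors = conn.execute(
    "SELECT fid, name FROM PROFESSOR ORDER BY fid").fetchall()

# Delete some tuples, then modify the ones that match a predicate.
conn.execute("DELETE FROM STUDENT WHERE sid < 25")
conn.execute(
    "UPDATE STUDENT SET sid = 1 + sid, name = 'Janet' WHERE name = 'Jane'")
students = conn.execute(
    "SELECT sid, name FROM STUDENT ORDER BY sid").fetchall()

print(professors)
print(students)
```

Note that the right-hand sides of SET are evaluated against the old tuple, so `1 + sid` uses the value from before the update.
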
I'm Building an App: How Do I Talk to the DB?
§ Generally, apps are in a different ("host") language with embedded SQL statements
  § Static: SQLJ, embedded SQL in C
  § Dynamic: ODBC, JDBC, ADO, OLE DB, …
§ Typically, predefined mappings between host language types and SQL types (e.g., VARCHAR to String or char[])

The Impedance Mismatch and Cursors
§ SQL is set-oriented – it returns relations
§ There's no relation type in most languages!
§ Solution: result sets and cursors that are opened and read as if from a file

JDBC: Dynamic SQL for Java
§ See Chapter 6 of the text for more info
    import java.sql.*;
    Connection conn = DriverManager.getConnection(…);
    PreparedStatement stmt = conn.prepareStatement("SELECT * FROM STUDENT");
    …
    ResultSet rs = stmt.executeQuery();
    while (rs.next()) {
        sid = rs.getInt(1);
        …
    }

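The open, read, close cycle of a cursor looks much the same outside JDBC. Here is a minimal sketch, assuming Python's built-in sqlite3 module in place of JDBC and a made-up STUDENT table; it reads the result set one tuple at a time, the way one would read records from a file.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE STUDENT (sid INTEGER PRIMARY KEY, name TEXT);
    -- Hypothetical sample data
    INSERT INTO STUDENT VALUES (1, 'Sam'), (23, 'Nitin'), (45, 'Jill');
""")

# "Open" the cursor by executing a parameterized query.
cursor = conn.execute(
    "SELECT sid, name FROM STUDENT WHERE sid > ? ORDER BY sid", (10,))

sids = []
while True:
    row = cursor.fetchone()  # returns None once the result set is exhausted
    if row is None:
        break
    sid, name = row          # each row arrives as an ordinary tuple
    sids.append(sid)

cursor.close()
print(sids)
```

The host program never sees a relation as a single value; it only sees one tuple per fetch, which is the impedance mismatch in miniature.
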
Database-Backed Web Sites
§ We all know traditional static HTML web sites:
  [Diagram: the web browser sends an HTTP request (GET ...) to the web server, which loads an HTML file from the file system and returns it]

Interaction Is Achieved via HTML Forms
    <html>
    <form action="http://my.com/some-handler-url" method="POST">
      <input type="text" name="value1" />
      <input type="submit" value="Send" />
      <input type="reset" value="Cancel" />
    </form>
    </html>

DB Access with Java Applets and Server Processes
[Diagram: a Java applet running in the browser's JVM talks over TCP/UDP/IP to a Java server process, which goes through the JDBC driver manager and JDBC drivers to reach Sybase, Oracle, ...]

Java Applets: Discussion
§ Advantages:
  § Can take advantage of client processing
  § Platform independent – assuming standard Java
§ Disadvantages:
  § Requires JVM on client; self-contained
  § Inefficient: loading can take a long time...
  § Resource intensive: client needs to be state of the art
  § Restrictive: can only connect to the server the applet was loaded from (for security … can be configured)
§ A common alternative is to run code on the server side
  § CGI, ASP/PHP/JSP, ASP.NET, servlets

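Server Pages (*P) and Servlets (IIS, Tomcat, …)
[Diagram: the web server receives an HTTP request and loads the file from the file system. If the file is HTML, it is output as is. If it is a script, a server extension runs it, with access to I/O, network, and DB, and the output is returned.]
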
ASP/JSP/PHP "Escapes"
    <html>
    <head><title>Sample</title></head>
    <body>
    <h1>Sample</h1>
    <% myClass.Process(request.getParameter("test")); %>
    <%= request.getParameter("value") %>
    </body>
    </html>

Servlets
    class MyClass extends HttpServlet {
        public void doGet(HttpServletRequest req, HttpServletResponse res) … {
            …
            res.getWriter().println("<html><head><title>Test</title></head></html>");
        }
    }

ASP/JSP/PHP Versus Servlets
§ The goal: combine direct HTML (or XML) output with program code that's executed at the server
  § The code is responsible for generating more HTML, e.g., to output the rows of a database table as HTML table elements
§ How might I do this?
  § HTML with embedded code (*P)
  § Code that prints out HTML (Servlets)

Now: How Do We Get the Database in the First Place?
§ Database design theory!
§ Neat outcome: we can actually prove that we have an optimal design, in a manner of speaking…
§ But first we need to understand how to visualize designs in pretty pictures…

Databases Anonymous: A 6-Step Program
1. Requirements Analysis: what data, apps, critical operations
2. Conceptual DB Design: high-level description of data and constraints – typically using ER model
3. Logical DB Design: conversion into a schema
4. Schema Refinement: normalization (eliminating redundancy)
5. Physical DB Design: consider workloads, indexes and clustering of data
6. Application/Security Design

Entity-Relationship Diagram (based on our running example)
[Diagram: entity sets STUDENTS (sid, name), PROFESSORS (fid, name), and COURSES (serno, subj, cid). Relationship set Takes (exp-grade) connects STUDENTS and COURSES; relationship set Teaches (semester) connects PROFESSORS and COURSES. Underlined attributes are keys; recall that attributes have domains.]

Conceptual Design Process
§ What are the entities being represented?
§ What are the relationships?
§ What info (attributes) do we store about each?
§ What keys & integrity constraints do we have?
[Diagram fragment: STUDENTS (sid, name) and Takes (exp-grade)]

Translating Entity Sets to Logical Schemas & SQL DDL
Fairly straightforward to generate a schema…
    CREATE TABLE STUDENTS
      (sid INTEGER,
       name VARCHAR(15),
       PRIMARY KEY (sid) )
    CREATE TABLE COURSES
      (serno INTEGER,
       subj VARCHAR(30),
       cid CHAR(15),
       PRIMARY KEY (serno) )

Translating Relationship Sets
Generate schema with attributes consisting of:
§ Key(s) of each associated entity (foreign keys)
§ Descriptive attributes
    CREATE TABLE Takes
      (sid INTEGER,
       serno INTEGER,
       "exp-grade" CHAR(1),
       PRIMARY KEY (?),
       FOREIGN KEY (serno) REFERENCES COURSES,
       FOREIGN KEY (sid) REFERENCES STUDENTS)

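One plausible way to fill in the PRIMARY KEY (?) for Takes is the combination of the two foreign keys, which fits a many-to-many relationship. The sketch below assumes Python's sqlite3 module and made-up rows; the helper try_insert is hypothetical and exists only to report whether the DBMS accepts a tuple.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
    CREATE TABLE STUDENTS (sid INTEGER PRIMARY KEY, name VARCHAR(15));
    CREATE TABLE COURSES (serno INTEGER PRIMARY KEY,
                          subj VARCHAR(30), cid CHAR(15));
    CREATE TABLE Takes (
        sid INTEGER,
        serno INTEGER,
        "exp-grade" CHAR(1),
        PRIMARY KEY (sid, serno),   -- assumed answer to the slide's "?"
        FOREIGN KEY (serno) REFERENCES COURSES,
        FOREIGN KEY (sid) REFERENCES STUDENTS
    );
    -- Hypothetical sample data
    INSERT INTO STUDENTS VALUES (1, 'Sam');
    INSERT INTO COURSES VALUES (100, 'DB', '550'), (101, 'OS', '505');
    INSERT INTO Takes VALUES (1, 100, 'A');
""")

def try_insert(row):
    """Hypothetical helper: report whether a Takes tuple is accepted."""
    try:
        conn.execute("INSERT INTO Takes VALUES (?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"

valid = try_insert((1, 101, "B"))      # new (sid, serno) pair
duplicate = try_insert((1, 100, "C"))  # repeats the key (1, 100)
dangling = try_insert((2, 100, "A"))   # sid 2 is not in STUDENTS
print(valid, duplicate, dangling)
```

With this key, a student can take many courses and a course can have many students, but each (student, course) pair carries at most one expected grade.
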
… OK, But What about Connectivity in the E-R Diagram?
§ Attributes can only be connected to entities or relationships
§ Entities can only be connected via relationships
§ As for the edges, let's consider kinds of relationships and integrity constraints…
  [Diagram: PROFESSORS, Teaches, COURSES]
(warning: the book has a slightly different notation here!)

Logical Schema Design
§ Roughly speaking, each entity set or relationship set becomes a table (this is not always the case; see Thursday)
§ Attributes associated with each entity set or relationship set become attributes of the relation; the key is also copied (ditto with foreign keys in a relationship set)

Binary Relationships & Participation
§ Binary relationships can be classified as 1:1, 1:Many, or Many:Many, as in:
  [Diagram: examples of 1:1, 1:n, and m:n relationships]

1:Many (1:n) Relationships
§ Place an arrow in the many → one direction, i.e., towards the entity that's referenced via a foreign key
§ Suppose profs teach multiple courses, but may not have taught yet:
  [Diagram: PROFESSORS, Teaches, COURSES; partial participation (0 or more…)]
§ Suppose profs must teach to be on the roster:
  [Diagram: PROFESSORS, Teaches, COURSES; total participation (1 or more…)]

Many-to-Many Relationships
§ Many-to-many relationships have no arrows on edges
§ The "relationship set" relation has a key that includes the foreign keys, plus any other attributes specified as key
  [Diagram: STUDENTS, Takes, COURSES]

Examples
§ Suppose courses must be taught to be on the roster
§ Suppose students must have enrolled in at least one course

Representing 1:n Relationships in Tables
• Key of relationship set:
    CREATE TABLE Teaches(
        fid INTEGER,
        serno INTEGER,
        semester CHAR(4),
        PRIMARY KEY (serno),  -- key of relationship set
        FOREIGN KEY (fid) REFERENCES PROFESSORS,
        FOREIGN KEY (serno) REFERENCES COURSES)
• Or embed relationship in "many" entity set:
    CREATE TABLE Teaches_Course(
        serno INTEGER,
        subj VARCHAR(30),
        cid CHAR(15),
        fid INTEGER,
        semester CHAR(4),
        PRIMARY KEY (serno),
        FOREIGN KEY (fid) REFERENCES PROFESSORS)

1:1 Relationships
If you borrow money or have credit, you might get:
[Diagram: entity sets CreditReport and Borrower joined by the relationship set Describes; attributes shown: rid, delinquent?, debt, ssn, name]
What are the table options?

Roles: Labeled Edges
Sometimes a relationship connects the same entity, and the entity has more than one role:
[Diagram: entity set Parts (id, name) connected to relationship set Includes (qty) by two edges labeled Assembly and Subpart]
This often indicates the need for recursive queries

DDL for Role Example
    CREATE TABLE Parts
      (Id INTEGER,
       Name CHAR(15),
       …
       PRIMARY KEY (Id) )
    CREATE TABLE Includes
      (Assembly INTEGER,
       Subpart INTEGER,
       Qty INTEGER,
       PRIMARY KEY (Assembly, Subpart),
       FOREIGN KEY (Assembly) REFERENCES Parts,
       FOREIGN KEY (Subpart) REFERENCES Parts)

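To see why roles on a single entity set tend to call for recursive queries, consider listing every part contained in an assembly at any depth. This is a rough sketch, assuming Python's sqlite3 module and SQLite's WITH RECURSIVE support; the bike data and the name sub_parts are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Parts (id INTEGER PRIMARY KEY, name CHAR(15));
    CREATE TABLE Includes (
        assembly INTEGER REFERENCES Parts,
        subpart  INTEGER REFERENCES Parts,
        qty      INTEGER,
        PRIMARY KEY (assembly, subpart)
    );
    -- Hypothetical data: a bike has 2 wheels; a wheel has 32 spokes and 1 tire.
    INSERT INTO Parts VALUES (1, 'bike'), (2, 'wheel'), (3, 'spoke'), (4, 'tire');
    INSERT INTO Includes VALUES (1, 2, 2), (2, 3, 32), (2, 4, 1);
""")

rows = conn.execute("""
    WITH RECURSIVE sub_parts(part, total_qty) AS (
        -- Base case: direct subparts of the chosen assembly.
        SELECT subpart, qty FROM Includes WHERE assembly = ?
        UNION ALL
        -- Recursive case: subparts of parts found so far, scaling quantities.
        SELECT i.subpart, s.total_qty * i.qty
        FROM sub_parts AS s JOIN Includes AS i ON i.assembly = s.part
    )
    SELECT p.name, SUM(s.total_qty)
    FROM sub_parts AS s JOIN Parts AS p ON p.id = s.part
    GROUP BY p.name
    ORDER BY p.name
""", (1,)).fetchall()

print(rows)
```

A fixed number of self-joins on Includes only reaches a fixed depth, which is why a plain select-project-join query cannot answer this for arbitrary part hierarchies.
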
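Roles vs. Separate Entities
What is the difference between these two representations?
[Diagram 1: separate entity sets Husband and Wife, with attributes such as id and name, joined by the relationship set Married]
[Diagram 2: a single entity set Person (id, name) connected to Married by two edges labeled Husband and Wife]
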
ISA Relationships: Subclassing (Structurally)
§ Inheritance states that one entity is a "special kind" of another entity: "subclass" should be member of "base class"
  [Diagram: Employees (salary) ISA People (id, name)]

But How Does this Translate into the Relational Model?
Compare these options:
§ Two tables, disjoint tuples
§ Two tables, disjoint attributes
§ One table with NULLs
§ Object-relational databases

Weak Entities
A weak entity can only be identified uniquely using the primary key of another (owner) entity.
§ Owner and weak entity sets must participate in a one-to-many relationship set, 1 owner : many weak entities
§ Weak entity set must have total participation
  [Diagram: People (ssn, name), Feeds (weeklyCost), Pets (name, species)]

Translating Weak Entity Sets
Weak entity set and identifying relationship set are translated into a single table; when the owner entity is deleted, all owned weak entities must also be deleted
    CREATE TABLE Feed_Pets (
        pname VARCHAR(20),
        species INTEGER,
        weeklyCost REAL,
        ssn CHAR(11) NOT NULL,
        PRIMARY KEY (pname, ssn),
        FOREIGN KEY (ssn) REFERENCES People
            ON DELETE CASCADE)

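The effect of ON DELETE CASCADE on a weak entity set can be checked directly. The sketch below assumes Python's sqlite3 module, a People owner table keyed on ssn, and invented rows (the species codes are made up); it deletes one owner and looks at which pets survive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE People (ssn CHAR(11) PRIMARY KEY, name VARCHAR(20));
    CREATE TABLE Feed_Pets (
        pname VARCHAR(20),
        species INTEGER,
        weeklyCost REAL,
        ssn CHAR(11) NOT NULL,
        PRIMARY KEY (pname, ssn),
        FOREIGN KEY (ssn) REFERENCES People ON DELETE CASCADE
    );
    -- Hypothetical data: two owners, and two different pets both named Rex.
    INSERT INTO People VALUES ('111-11-1111', 'Ann'), ('222-22-2222', 'Bob');
    INSERT INTO Feed_Pets VALUES
        ('Rex', 1, 12.5, '111-11-1111'),
        ('Rex', 2, 8.0, '222-22-2222'),
        ('Tom', 2, 7.0, '111-11-1111');
""")

# Deleting the owner removes every weak entity that owner identifies.
conn.execute("DELETE FROM People WHERE ssn = ?", ("111-11-1111",))
remaining = conn.execute(
    "SELECT pname, ssn FROM Feed_Pets ORDER BY pname, ssn").fetchall()
print(remaining)
```

The two pets named Rex coexist because the pet name is only a partial key; it is unique only together with the owner's ssn.
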
N-ary Relationships
§ Relationship sets can relate an arbitrary number of entity sets:
  [Diagram: relationship set Indep Study connecting entity sets Student, Project, and Advisor]

Summary of ER Diagrams
§ One of the primary ways of designing logical schemas
§ CASE tools exist built around ER (e.g., ERWin, PowerBuilder, etc.)
  § Translate the design automatically into DDL, XML, UML, etc.
  § Use a slightly different notation that is better suited to graphical displays
  § Some tools support constraints beyond what ER diagrams can capture
§ Can you get different ER diagrams from the same data?

Schema Refinement & Design Theory
§ ER Diagrams give us a start in logical schema design
§ Sometimes need to refine our designs further
  § There's a system and theory for this
  § Focus is on redundancy of data
§ Let's briefly touch on one key concept in preparation for Thursday's lecture on normalization…

Not All Designs are Equally Good
Why is this a poor schema design?
    Stuff(sid, name, cid, subj, grade)
And why is this one better?
    Student(sid, name)
    Course(cid, subj)
    Takes(sid, cid, exp-grade)

Focus on the Bad Design
    sid  name   cid  subj  exp-grade
    1    Sam    570  AI    B
    23   Nitin  550  DB    A
    45   Jill   505  OS    A
    1    Sam    505  OS    C
§ Certain items (e.g., name) get repeated
§ Some information requires that a student be enrolled (e.g., courses) due to the key

Functional Dependencies Describe "Key-Like" Relationships
A key is a set of attributes where:
    If keys match, then the tuples match
A functional dependency (FD) is a generalization:
    If an attribute set determines another, written A → B, then if two tuples agree on A, they must agree on B:
    sid → Address
What other FDs are there in this data?
§ FDs are independent of our schema design choice

Formal Definition of FDs
Def. Given a relation scheme R (a set of attributes) and subsets X, Y of R:
An instance r of R satisfies FD X → Y if, for any two tuples t1, t2 ∈ r, t1[X] = t2[X] implies t1[Y] = t2[Y]
§ For an FD to hold for scheme R, it must hold for every possible instance r of R
§ (Can a DBMS verify this? Can we determine this by looking at an instance?)

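The definition above translates almost directly into a check on a single instance. The following is an illustrative sketch in Python; the function name satisfies_fd and the dictionary representation of tuples are assumptions, and the rows are the ones from the bad-design table earlier in the lecture.

```python
def satisfies_fd(instance, lhs, rhs):
    """Return True if any two tuples that agree on lhs also agree on rhs."""
    seen = {}  # maps each lhs value combination to the rhs values first seen
    for row in instance:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y the first time x appears, else returns the old y
        if seen.setdefault(x, y) != y:
            return False
    return True

# Instance r of Stuff(sid, name, cid, subj, exp-grade)
stuff = [
    {"sid": 1, "name": "Sam", "cid": 570, "subj": "AI", "exp-grade": "B"},
    {"sid": 23, "name": "Nitin", "cid": 550, "subj": "DB", "exp-grade": "A"},
    {"sid": 45, "name": "Jill", "cid": 505, "subj": "OS", "exp-grade": "A"},
    {"sid": 1, "name": "Sam", "cid": 505, "subj": "OS", "exp-grade": "C"},
]

sid_to_name = satisfies_fd(stuff, ["sid"], ["name"])
cid_to_subj = satisfies_fd(stuff, ["cid"], ["subj"])
sid_to_cid = satisfies_fd(stuff, ["sid"], ["cid"])
key_to_grade = satisfies_fd(stuff, ["sid", "cid"], ["exp-grade"])
print(sid_to_name, cid_to_subj, sid_to_cid, key_to_grade)
```

A check like this can show that an instance violates an FD, but a passing result only says the FD holds for this instance; whether it holds for the scheme R is a statement about all possible instances and has to come from knowledge of the application.
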
General Thoughts on Good Schemas
We want all attributes in every tuple to be determined by the tuple's key attributes
What does this say about redundancy?
But:
§ What about tuples that don't have keys (other than the entire value)?
§ What about the fact that every attribute determines itself?
§ Stay tuned for Thursday!