Lecture 14 Database Design SIMS 202 Information Organization
Lecture 14: Database Design SIMS 202: Information Organization and Retrieval Prof. Ray Larson & Prof. Marc Davis UC Berkeley SIMS Tuesday and Thursday 10: 30 am - 12: 00 pm Fall 2002 http: //www. sims. berkeley. edu/academics/courses/is 202/f 02/ IS 202 – FALL 2002. 10. 17 - SLIDE 1
Lecture Overview • Review – Databases and Database Design – Database Life Cycle – ER Diagrams • Database Design • Normalization • Web-Enabled Databases IS 202 – FALL 2002. 10. 17 - SLIDE 2
Lecture Overview • Review – Databases and Database Design – Database Life Cycle – ER Diagrams • Database Design • Normalization • Web-Enabled Databases IS 202 – FALL 2002. 10. 17 - SLIDE 3
Models (1) Application 1 External Model Application 2 Application 3 Application 4 External Model Application 1 Conceptual requirements Application 2 Conceptual requirements Application 3 Conceptual requirements Conceptual Model Logical Model Internal Model Application 4 Conceptual requirements IS 202 – FALL 2002. 10. 17 - SLIDE 4
Database System Life Cycle Physical Creation 2 Conversion 3 Design 1 Growth, Change, & Maintenance 6 Integration 4 Operations 5 IS 202 – FALL 2002. 10. 17 - SLIDE 5
Another View of the Life Cycle Integration 4 Operations 5 Design Physical 1 Creation Conversion Growth, 2 Change 3 6 IS 202 – FALL 2002. 10. 17 - SLIDE 6
Database Design Process Application 1 External Model Application 2 Application 3 Application 4 External Model Application 1 Conceptual requirements Application 2 Conceptual requirements Application 3 Conceptual requirements Conceptual Model Logical Model Internal Model Application 4 Conceptual requirements IS 202 – FALL 2002. 10. 17 - SLIDE 7
Entity • An Entity is an object in the real world (or even imaginary worlds) about which we want or need to maintain information – Persons (e. g. : customers in a business, employees, authors) – Things (e. g. : purchase orders, meetings, parts, companies) Employee IS 202 – FALL 2002. 10. 17 - SLIDE 8
Attributes • Attributes are the significant properties or characteristics of an entity that help identify it and provide the information needed to interact with it or use it (This is the Metadata for the entities) Birthdate First Middle Last IS 202 – FALL 2002 Age Name Employee SSN Projects 2002. 10. 17 - SLIDE 9
Relationships • Relationships are the associations between entities • They can involve one or more entities and belong to particular relationship types – One to One – One to Many – Many to Many IS 202 – FALL 2002. 10. 17 - SLIDE 10
Relationships Student Attends Class Project Supplier IS 202 – FALL 2002 Supplies project parts Part 2002. 10. 17 - SLIDE 11
Types of Relationships • Concerned only with cardinality of relationship Employee 1 Assigned n Assigned 1 1 m Assigned n Truck Project Chen ER notation IS 202 – FALL 2002. 10. 17 - SLIDE 12
More Complex Relationships Manager 1/1/1 Employee 1/n/n Evaluation n/n/1 Project SSN Date Project Employee 4(2 -10) Assigned 1 Manages Employee IS 202 – FALL 2002 Is Managed By Project 1 Manages n 2002. 10. 17 - SLIDE 13
Weak Entities • Owe existence entirely to another entity Part# Invoice # Order Invoice# Contains Quantity Order-line Rep# IS 202 – FALL 2002. 10. 17 - SLIDE 14
Supertype and Subtype Entities Employee Sales-rep Sold Is one of Manages Clerk Other Invoice IS 202 – FALL 2002. 10. 17 - SLIDE 15
Many to Many Relationships SSN Proj# Hours Project Assignment Is Assigned Project Assigned Employee IS 202 – FALL 2002 SSN 2002. 10. 17 - SLIDE 16
Lecture Overview • Review – Databases and Database Design – Database Life Cycle – ER Diagrams • Database Design • Normalization • Web-Enabled Databases IS 202 – FALL 2002. 10. 17 - SLIDE 17
Database Design Process Application 1 External Model Application 2 Application 3 Application 4 External Model Application 1 Conceptual requirements Application 2 Conceptual requirements Application 3 Conceptual requirements Conceptual Model Logical Model Internal Model Application 4 Conceptual requirements IS 202 – FALL 2002. 10. 17 - SLIDE 18
Database Design Process Application 1 External Model Application 2 Application 3 Application 4 External Model Application 1 Conceptual requirements Application 2 Conceptual requirements Application 3 Conceptual requirements Conceptual Model Logical Model Internal Model Application 4 Conceptual requirements IS 202 – FALL 2002. 10. 17 - SLIDE 19
Requirements Analysis • Conceptual Requirements – Systems Analysis Process • Examine all of the information sources used in existing applications • Identify the characteristics of each data element – – Numeric Text Date/time Etc. • Examine the tasks carried out using the information • Examine results or reports created using the information IS 202 – FALL 2002. 10. 17 - SLIDE 20
Database Design Process Application 1 External Model Application 2 Application 3 Application 4 External Model Application 1 Conceptual requirements Application 2 Conceptual requirements Application 3 Conceptual requirements Conceptual Model Logical Model Internal Model Application 4 Conceptual requirements IS 202 – FALL 2002. 10. 17 - SLIDE 21
Conceptual Design • Conceptual Model – Merge the collective needs of all applications – Determine what Entities are being used • Some object about which information is to maintained – What are the Attributes of those entities? • Properties or characteristics of the entity • What attributes uniquely identify the entity – What are the Relationships between entities • How the entities interact with each other? IS 202 – FALL 2002. 10. 17 - SLIDE 22
Developing a Conceptual Model • Overall view of the database that integrates all the needed information discovered during the requirements analysis • Elements of the Conceptual Model are represented by diagrams, Entity-Relationship or ER Diagrams, that show the meanings and relationships of those elements independent of any particular database systems or implementation details • Can also be represented using other modeling tools (such as UML) IS 202 – FALL 2002. 10. 17 - SLIDE 23
Database Design Process Application 1 External Model Application 2 Application 3 Application 4 External Model Application 1 Conceptual requirements Application 2 Conceptual requirements Application 3 Conceptual requirements Conceptual Model Logical Model Internal Model Application 4 Conceptual requirements IS 202 – FALL 2002. 10. 17 - SLIDE 24
Logical Design • Logical Model – How is each entity and relationship represented in the Data Model of the DBMS • • Hierarchic? Network? Relational? Object-Oriented? IS 202 – FALL 2002. 10. 17 - SLIDE 25
Database Design Process Application 1 External Model Application 2 Application 3 Application 4 External Model Application 1 Conceptual requirements Application 2 Conceptual requirements Application 3 Conceptual requirements Conceptual Model Logical Model Internal Model Application 4 Conceptual requirements IS 202 – FALL 2002. 10. 17 - SLIDE 26
Physical Design • Internal Model – Choices of index file structure – Choices of data storage formats – Choices of disk layout IS 202 – FALL 2002. 10. 17 - SLIDE 27
Database Design Process Application 1 External Model Application 2 Application 3 Application 4 External Model Application 1 Conceptual requirements Application 2 Conceptual requirements Application 3 Conceptual requirements Conceptual Model Logical Model Internal Model Application 4 Conceptual requirements IS 202 – FALL 2002. 10. 17 - SLIDE 28
Database Application Design • External Model – User views of the integrated database – Making the old (or updated) applications work with the new database design IS 202 – FALL 2002. 10. 17 - SLIDE 29
Lecture Overview • Review – Databases and Database Design – Database Life Cycle – ER Diagrams • Database Design • Normalization • Web-Enabled Databases IS 202 – FALL 2002. 10. 17 - SLIDE 30
Normalization • Normalization theory is based on the observation that relations with certain properties are more effective in inserting, updating and deleting data than other sets of relations containing the same data • Normalization is a multi-step process beginning with an “unnormalized” relation – Hospital example from Atre, S. Data Base: Structured Techniques for Design, Performance, and Management IS 202 – FALL 2002. 10. 17 - SLIDE 31
Normal Forms • • • First Normal Form (1 NF) Second Normal Form (2 NF) Third Normal Form (3 NF) Boyce-Codd Normal Form (BCNF) Fourth Normal Form (4 NF) Fifth Normal Form (5 NF) IS 202 – FALL 2002. 10. 17 - SLIDE 32
Normalization No transitive dependency between nonkey attributes All determinants are candidate keys - Single multivalued dependency IS 202 – FALL 2002 Boyce. Codd and Higher Functional dependency of nonkey attributes on the primary key - Atomic values only Full Functional dependency of nonkey attributes on the primary key 2002. 10. 17 - SLIDE 33
Unnormalized Relations • First step in normalization is to convert the data into a two-dimensional table • In unnormalized relations data can repeat within a column IS 202 – FALL 2002. 10. 17 - SLIDE 34
Unnormalized Relations IS 202 – FALL 2002. 10. 17 - SLIDE 35
First Normal Form • To move to First Normal Form a relation must contain only atomic values at each row and column – No repeating groups – A column or set of columns is called a Candidate Key when its values can uniquely identify the row in the relation IS 202 – FALL 2002. 10. 17 - SLIDE 36
First Normal Form IS 202 – FALL 2002. 10. 17 - SLIDE 37
1 NF Storage Anomalies • Insertion: A new patient has not yet undergone surgery -- hence no surgeon # -- Since surgeon # is part of the key we can’t insert • Insertion: If a surgeon is newly hired and hasn’t operated yet -- there will be no way to include that person in the database • Update: If a patient comes in for a new procedure, and has moved, we need to change multiple address entries • Deletion (type 1): Deleting a patient record may also delete all info about a surgeon • Deletion (type 2): When there are functional dependencies (like side effects and drug) changing one item eliminates other information IS 202 – FALL 2002. 10. 17 - SLIDE 38
Second Normal Form • A relation is said to be in Second Normal Form when every nonkey attribute is fully functionally dependent on the primary key – That is, every nonkey attribute needs the full primary key for unique identification IS 202 – FALL 2002. 10. 17 - SLIDE 39
Second Normal Form IS 202 – FALL 2002. 10. 17 - SLIDE 40
Second Normal Form IS 202 – FALL 2002. 10. 17 - SLIDE 41
Second Normal Form IS 202 – FALL 2002. 10. 17 - SLIDE 42
1 NF Storage Anomalies Removed • Insertion: Can now enter new patients without surgery • Insertion: Can now enter Surgeons who haven’t operated • Deletion (type 1): If Charles Brown dies the corresponding tuples from Patient and Surgery tables can be deleted without losing information on David Rosen • Update: If John White comes in for third time, and has moved, we only need to change the Patient table IS 202 – FALL 2002. 10. 17 - SLIDE 43
2 NF Storage Anomalies • Insertion: Cannot enter the fact that a particular drug has a particular side effect unless it is given to a patient • Deletion: If John White receives some other drug because of the penicillin rash, and a new drug and side effect are entered, we lose the information that penicillin cause a rash • Update: If drug side effects change (a new formula) we have to update multiple occurrences of side effects IS 202 – FALL 2002. 10. 17 - SLIDE 44
Third Normal Form • A relation is said to be in Third Normal Form if there is no transitive functional dependency between nonkey attributes – When one nonkey attribute can be determined with one or more nonkey attributes there is said to be a transitive functional dependency • The side effect column in the Surgery table is determined by the drug administered – Side effect is transitively functionally dependent on drug so Surgery is not 3 NF IS 202 – FALL 2002. 10. 17 - SLIDE 45
Third Normal Form IS 202 – FALL 2002. 10. 17 - SLIDE 46
Third Normal Form IS 202 – FALL 2002. 10. 17 - SLIDE 47
2 NF Storage Anomalies Removed • Insertion: We can now enter the fact that a particular drug has a particular side effect in the Drug relation • Deletion: If John White recieves some other drug as a result of the rash from penicillin, but the information on penicillin and rash is maintained • Update: The side effects for each drug appear only once IS 202 – FALL 2002. 10. 17 - SLIDE 48
Boyce-Codd Normal Form • Most 3 NF relations are also BCNF relations • A 3 NF relation is NOT in BCNF if: – Candidate keys in the relation are composite keys (they are not single attributes) – There is more than one candidate key in the relation, and – The keys are not disjoint, that is, some attributes in the keys are common IS 202 – FALL 2002. 10. 17 - SLIDE 49
Most 3 NF Relations Are Also BCNF – Is This One? IS 202 – FALL 2002. 10. 17 - SLIDE 50
BCNF Relations IS 202 – FALL 2002. 10. 17 - SLIDE 51
Fourth Normal Form • Any relation is in Fourth Normal Form if it is BCNF and any multivalued dependencies are trivial • Eliminate non-trivial multivalued dependencies by projecting into simpler tables IS 202 – FALL 2002. 10. 17 - SLIDE 52
Fifth Normal Form • A relation is in 5 NF if every join dependency in the relation is implied by the keys of the relation • Implies that relations that have been decomposed in previous NF can be recombined via natural joins to recreate the original relation IS 202 – FALL 2002. 10. 17 - SLIDE 53
Normalizing to Death • Normalization splits database information across multiple tables • To retrieve complete information from a normalized database, the JOIN operation must be used • JOIN tends to be expensive in terms of processing time, and very large joins are very expensive IS 202 – FALL 2002. 10. 17 - SLIDE 54
Lecture Overview • Review – Databases and Database Design – Database Life Cycle – ER Diagrams • Database Design • Normalization • Web-Enabled Databases IS 202 – FALL 2002. 10. 17 - SLIDE 55
Overview • Why use a database system for Web design and e-commerce? • What systems are available? • Pros and Cons of different Web database systems? • Text retrieval in database systems • Search engines for Intranet and Intrasite searching IS 202 – FALL 2002. 10. 17 - SLIDE 56
Why Use a Database System? • Simple Web sites with only a few pages don’t need much more than static HTML files IS 202 – FALL 2002. 10. 17 - SLIDE 57
Simple Web Applications Web Server Internet Files Server Clients IS 202 – FALL 2002. 10. 17 - SLIDE 58
Adding Dynamic Content to the Site • Small sites can often use simple HTML and CGI scripts accessing data files to create dynamic content for small sites IS 202 – FALL 2002. 10. 17 - SLIDE 59
Dynamic Web Applications 1 Web Server Files CGI Internet Server Clients IS 202 – FALL 2002. 10. 17 - SLIDE 60
Issues For Scaling Up Web Applications • • • Performance Scalability Maintenance Data integrity Transaction support IS 202 – FALL 2002. 10. 17 - SLIDE 61
Why Use a Database System? • Database systems have concentrated on providing solutions for all of these issues for scaling up Web applications – Performance – Scalability – Maintenance – Data integrity – Transaction support • While systems differ in their support, most offer some support for all of these IS 202 – FALL 2002. 10. 17 - SLIDE 62
Dynamic Web Applications 2 Web Server Internet Files CGI DBMS Server database Clients IS 202 – FALL 2002. 10. 17 - SLIDE 63
Server Interfaces SQL HTML DHTML Web Server Java. Script Native DB Interfaces Database Web DB CGI App ODBC Web Server API’s Cold. Fusion Native DB interfaces JDBC Ph. P Perl Web Application Server Adapted from John P. Ashenfelter, Choosing a Database for Your Web Site IS 202 – FALL 2002 Java ASP 2002. 10. 17 - SLIDE 64
Web Application Server Software • • Cold. Fusion PHP ASP All of these are server-side scripting languages that embed code in HTML pages IS 202 – FALL 2002. 10. 17 - SLIDE 65
Cold. Fusion • Developing WWW sites typically involved a lot of programming to build dynamic sites – E. g. , pages generated as a result of catalog searches, etc. • Cold. Fusion was designed to permit the construction of dynamic Web sites with only minor extensions to HTML through a DBMS interface IS 202 – FALL 2002. 10. 17 - SLIDE 66
What Cold. Fusion Is Good For • Putting up databases onto the Web • Handling dynamic databases (frequent updates, etc. ) • Making databases searchable and updateable by users IS 202 – FALL 2002. 10. 17 - SLIDE 67
CFML Cold. Fusion Markup Language • Read data from and update data to databases and tables • Create dynamic data-driven pages • Perform conditional processing • Populate forms with live data • Process form submissions • Generate and retrieve email messages • Perform HTTP and FTP function • Perform credit card verification and authorization • Read and write client-side cookies IS 202 – FALL 2002. 10. 17 - SLIDE 68
Templates • Assume we have a database named contents_of_my_shopping_cart. mdb -single table called contents. . . • Create an HTML page (uses extension. cfm), before <HEAD>. . . • <CFQUERY NAME= ”cart" DATASOURCE=“contents_of_my_shoppi ng_cart"> SELECT * FROM contents ; </CFQUERY> IS 202 – FALL 2002. 10. 17 - SLIDE 69
Templates (cont. ) • • • <HEAD> <TITLE>Contents of My Shopping Cart</TITLE> </HEAD> <BODY> <H 1>Contents of My Shopping Cart</H 1> <CFOUTPUT QUERY= ”cart"> <B>#Item#</B> <BR> #Date_of_item# <BR> $#Price# <P> </CFOUTPUT> </BODY> </HTML> IS 202 – FALL 2002. 10. 17 - SLIDE 70
Templates (cont. ) Contents of My Shopping Cart Bouncy Ball with Psychedelic Markings 12 December 1998 $0. 25 Shiny Blue Widget 14 December 1998 $2. 53 Large Orange Widget 14 December 1998 $3. 75 IS 202 – FALL 2002. 10. 17 - SLIDE 71
CFIF and CFELSE <CFOUTPUT QUERY= ”cart"> Item: #Item# <BR> <CFIF #Picture# EQ""> <IMG SRC=“generic_picture. jpg"> <BR> <CFELSE> <IMG SRC="#Picture#"> <BR> </CFIF> </CFOUTPUT> IS 202 – FALL 2002. 10. 17 - SLIDE 72
Photo Browser • The current photo browser uses a combination of – Javascript for expandable hierarchies – Database in MS Access – Cold. Fusion to search the database when one of the facets is selected • The database design for the photo database currently looks like… IS 202 – FALL 2002. 10. 17 - SLIDE 73
Photo Browser ER IS 202 – FALL 2002. 10. 17 - SLIDE 74
Photo Database • Lets look at the photo database in the Access interface – Multi-Facet queries – Queries for multiple descriptors in the same facet (harder) IS 202 – FALL 2002. 10. 17 - SLIDE 75
Assignment 7 (Database Design) • Involves – Examining a Web Site (probably) using a DBMS for E-commerce to sell books – Inferring the structure and kinds of entities and attributes used in that site (book info only) – Creating your own design using ER diagrams showing the entities and relationships that you inferred IS 202 – FALL 2002. 10. 17 - SLIDE 76
Next Week • Introduction to Information Retrieval IS 202 – FALL 2002. 10. 17 - SLIDE 77
- Slides: 77