Database Design From Conceptual Design to Physical Implementation

Database Design: From Conceptual Design to Physical Implementation University of California, Berkeley School of Information Management and Systems SIMS 202: Information Organization and Retrieval 9/9/1999 Information Organization and Retrieval

Review • Database Design Process • Normalization 9/9/1999 Information Organization and Retrieval

Database Design Process Application 1 External Model Application 2 Application 3 Application 4 External Model Application 1 Conceptual requirements Application 2 Conceptual requirements Application 3 Conceptual requirements Conceptual Model Logical Model Application 4 Conceptual requirements 9/9/1999 Information Organization and Retrieval Internal Model

Normalization • Normalization theory is based on the observation that relations with certain properties are more effective in inserting, updating and deleting data than other sets of relations containing the same data • Normalization is a multi-step process beginning with an “unnormalized” relation – Hospital example from Atre, S. Data Base: Structured Techniques for Design, Performance, and Management. 9/9/1999 Information Organization and Retrieval

Normal Forms • • • First Normal Form (1 NF) Second Normal Form (2 NF) Third Normal Form (3 NF) Boyce-Codd Normal Form (BCNF) Fourth Normal Form (4 NF) Fifth Normal Form (5 NF) 9/9/1999 Information Organization and Retrieval

Normalization No transitive dependency between nonkey attributes All determinants are candidate keys - Single multivalued dependency 9/9/1999 Boyce. Codd and Higher Information Organization and Retrieval Functional dependencyof nonkey attributes on the primary key - Atomic values only Full Functional dependencyof nonkey attributes on the primary key

Unnormalized Relations • First step in normalization is to convert the data into a two-dimensional table • In unnormalized relations data can repeat within a column 9/9/1999 Information Organization and Retrieval

Unnormalized Relation 9/9/1999 Information Organization and Retrieval

First Normal Form 9/9/1999 Information Organization and Retrieval

Second Normal Form 9/9/1999 Information Organization and Retrieval

Second Normal Form 9/9/1999 Information Organization and Retrieval

Second Normal Form 9/9/1999 Information Organization and Retrieval

Third Normal Form 9/9/1999 Information Organization and Retrieval

Third Normal Form 9/9/1999 Information Organization and Retrieval

Most 3 NF Relations are also BCNF 9/9/1999 Information Organization and Retrieval

ER Diagram Symbols Primary key Attribute Entity Relationship 9/9/1999 Ovals are used to indicate the attributes associated with an entity or relationship (That is, the pieces of information recorded in the database about the entity or relationship) An underlined name indicates that the attribute is a primary key (That is, it can uniquely identify the entity) Rectangles are used to indicate entities (That is, the representatives or records describing persons, things, or events in the database) Diamonds are used to indicate relationships between entities. (That is, some association between the data records of different entities) Information Organization and Retrieval

Today: New Design • Today we will build the COOKIE database from needs (rough) through the conceptual model, logical model and finally physical implementation in Access. 9/9/1999 Information Organization and Retrieval

Cookie Requirements • Cookie is a bibliographic database that contains information about a hypothetical union catalog of several libraries. • Need to record which books are held by which libraries • Need to search on bibliographic information – Author, title, subject, call number for a given library, etc. • Need to know who publishes the books for ordering, etc. 9/9/1999 Information Organization and Retrieval

Cookie Database • There are currently 5 main types of entities in the database – – – 9/9/1999 Books (bibfile) Local Call numbers (callfile) Libraries (libfile) Publishers (pubfile) Subject headings (subfile) Links between subject and books (indxfile) Information Organization and Retrieval

BIBFILE • Books (BIBFILE) contains information about particular books. It includes one record for each book. The attributes are: – – – – – 9/9/1999 accno -- an “accession” or serial number author -- The author’s name (not realistic -- one author per book) title -- The title of the book loc -- Location of publication (where published) date -- Date of publication price -- Price of the book pagination -- Number of pages ill -- What type of illustrations (maps, etc) if any height -- Height of the book in centimeters Information Organization and Retrieval

Books/BIBFILE Author Title accno Price Loc Books Date Pagination Height 9/9/1999 Information Organization and Retrieval Ill

CALLFILE • CALLFILE contains call numbers and holdings information linking particular books with particular libraries. Its attributes are: – accno -- the book accession number – libid -- the id of the holding library – callno -- the call number of the book in the particular library – copies -- the number of copies held by the particular library 9/9/1999 Information Organization and Retrieval

Local. Info/CALLFILE libid Callno accno Copies CALLFILE 9/9/1999 Information Organization and Retrieval

LIBFILE • LIBFILE contain information about the libraries participating in this union catalog. Its attributes include: – – – – 9/9/1999 libid -- Library id number library -- Name of the library laddress -- Street address for the library lcity -- City name lstate -- State code (postal abbreviation) lzip -- zip code lphone -- Phone number mop - suncl -- Library opening and closing times for each day of the week. Information Organization and Retrieval

Libraries/LIBFILE lstate laddress lcity lzip Library lphone Libid Sat. Cl LIBFILE Sun. Op Sat. Op Suncl FCl MOp FOp Th. Cl Mcl Tu. Op 9/9/1999 Tu. Cl WOp Information Organization and Retrieval WCl Th. Op

PUBFILE • PUBFILE contain information about the publishers of books. Its attributes include – – – – 9/9/1999 pubid -- The publisher’s id number publisher -- Publisher name paddress -- Publisher street address pcity -- Publisher city pstate -- Publisher state pzip -- Publisher zip code pphone -- Publisher phone number ship -- standard shipping time in days Information Organization and Retrieval

Publisher/PUBFILE paddress Publisher pcity PUBFILE pubid pzip Ship pphone 9/9/1999 pstate Information Organization and Retrieval

SUBFILE • SUBFILE contains each unique subject heading that can be assigned to books. Its attributes are – subcode -- Subject identification number – subject -- the subject heading/description 9/9/1999 Information Organization and Retrieval

Subjects/SUBFILE subid Subject SUBFILE 9/9/1999 Information Organization and Retrieval

INDXFILE • INDXFILE provides a way to allow manyto-many mapping of subject headings to books. Its attributes consist entirely of links to other tables – subcode -- link to subject id – accno -- link to book accession number 9/9/1999 Information Organization and Retrieval

Linking Subjects and Books subid accno INDXFILE 9/9/1999 Information Organization and Retrieval

Some examples of Cookie Searches • Who wrote Microcosmographia Academica? • How many pages long is Alfred Whitehead’s The Aims of Education and Other Essays? • Which branches in Berkeley’s public library system are open on Sunday? • What is the call number of Moffitt Library’s copy of Abraham Flexner’s book Universities: American, English, German? • What books on the subject of higher education are among the holdings of Berkeley (both UC and City) libraries? • Print a list of the Mechanics Library holdings, in descending order by height. • What would it cost to replace every copy of each book that contains illustrations (including graphs, maps, portraits, etc. )? • Which library closes earliest on Friday night? 9/9/1999 Information Organization and Retrieval

pubid Cookie ER diagram accno BIBFILE Has call CALLFILE accno Has copy LIBFILE libid publishes Has index INDXFILE accno 9/9/1999 PUBFILE pubid Has subject subcode Information Organization and Retrieval SUBFILE subcode Note: diagram contains only attributes used for linking

What Problems? • What sorts of problems and missing features arise given the previous ER diagram? 9/9/1999 Information Organization and Retrieval

Problems Identified • Field sizes inappropriate • Author doesn’t allow multiple authors (editors, etc). • Subtitles, parallel titles • Edition information • Series information • lending status • material type designation • Genre, class information • Better codes (ISBN? ) 9/9/1999 • Missing information (ISBN) • Authority control for authors • Missing/incomplete data • Data entry problems • Ordering information • Illustrations • Subfield separation (such as last_name, first_name) • Separate personal and corporate authors Information Organization and Retrieval

Problems (Cont. ) • Location field inconsistent • No notes field • No language field • Zipcode doesn’t support plus-4 • No publisher shipping addresses 9/9/1999 • No (indexable) keyword search capability • No support for multivolume works • No support for URLs – to online version – to libraries – to publishers Information Organization and Retrieval

pubid Original Cookie ER diagram accno BIBFILE CALLFILE Has call accno Callno publishes Has index INDXFILE accno 9/9/1999 PUBFILE LIBFILE Has copy Libid libid pubid Has subject subid Information Organization and Retrieval Address, etc SUBFILE subid Library subject

Cookie 2: Separate Name Authorities pubid accno BIBFILE CALLFILE accno AUTHBIB LIBFILE libid nameid PUBFILE authtype INDXFILE pubid SUBFILE AUTHFILE accno nameid 9/9/1999 name subcode Information Organization and Retrieval subcode

Cookie 3: Keywords termid accno termid pubid accno BIBFILE TERMS CALLFILE LIBFILE accno AUTHBIB KEYMAP libid nameid PUBFILE authtype INDXFILE pubid SUBFILE AUTHFILE accno nameid 9/9/1999 name subcode Information Organization and Retrieval subcode

ser_title SERIES seriesid Cookie 4: Series termid accno termid seriesid pubid accno BIBFILE TERMS CALLFILE LIBFILE accno AUTHBIB KEYMAP libid nameid PUBFILE authtype INDXFILE pubid SUBFILE AUTHFILE accno nameid 9/9/1999 name subcode Information Organization and Retrieval subcode

ser_title SERIES Cookie 5: Circulation seriesid termid accno termid seriesid pubid accno BIBFILE TERMS CALLFILE LIBFILE accno AUTHBIB KEYMAP libid circid libid nameid PUBFILE authtype pubid PATRON INDXFILE SUBFILE CIRC AUTHFILE accno nameid 9/9/1999 name subcode Information Organization and Retrieval copynumpatronid circid

Mapping to Relations • Take each entity – BIBFILE – LIBFILE – CALLFILE – SUBFILE – PUBFILE – INDXFILE • And make it a table. . . 9/9/1999 Information Organization and Retrieval

Implementing the Physical Database. . . • For each of the entities, we will build a table… • Start up access… • Use “New” in Tables… • Loading data • Entering data • Data entry forms 9/9/1999 Information Organization and Retrieval
- Slides: 43