Introduction to Database Design SIMS 202 Information Organization
Introduction to Database Design SIMS 202: Information Organization and Retrieval Prof. Ray Larson UC Berkeley SIMS IS 202 – FALL 2005 -10 -06 - SLIDE 1
Lecture Overview • • • Databases and Database Design Database Life Cycle Database Design ER Diagrams Discussion IS 202 – FALL 2005 -10 -06 - SLIDE 2
Lecture Overview • • • Databases and Database Design Database Life Cycle Database Design ER Diagrams Discussion IS 202 – FALL 2005 -10 -06 - SLIDE 3
What is a Database? IS 202 – FALL 2005 -10 -06 - SLIDE 4
Files and Databases • File: A collection of records or documents dealing with one organization, person, area or subject (Rowley) – Manual (paper) files – Computer files • Database: A collection of similar records with relationships between the records (Rowley) – Bibliographic, statistical, business data, images, etc. IS 202 – FALL 2005 -10 -06 - SLIDE 5
Database • A Database is a collection of stored operational data used by the application systems of some particular enterprise (C. J. Date) – Paper “Databases” • Still contain a large portion of the world’s knowledge – File-Based Data Processing Systems • Early batch processing of (primarily) business data – Database Management Systems (DBMS) IS 202 – FALL 2005 -10 -06 - SLIDE 6
Why DBMS? • History – 50’s and 60’s all applications were custom built for particular needs – File based – Many similar/duplicative applications dealing with collections of business data – Early DBMS were extensions of programming languages – 1970 - E. F. Codd and the Relational Model – 1979 - Ashton-Tate and first Microcomputer DBMS IS 202 – FALL 2005 -10 -06 - SLIDE 7
From File Systems to DBMS • Problems with file processing systems – Inconsistent data – Inflexibility – Limited data sharing – Poor enforcement of standards – Excessive program maintenance IS 202 – FALL 2005 -10 -06 - SLIDE 8
DBMS Benefits • • • Minimal data redundancy Consistency of data Integration of data Sharing of data Ease of application development Uniform security, privacy, and integrity controls Data accessibility and responsiveness Data independence Reduced program maintenance IS 202 – FALL 2005 -10 -06 - SLIDE 9
Terms and Concepts • Data independence – Physical representation and location of data and the use of that data are separated • The application doesn’t need to know how or where the database has stored the data, but just how to ask for it • Moving a database from one DBMS to another should not have a material effect on application program • Recoding, adding fields, etc. in the database should not affect applications IS 202 – FALL 2005 -10 -06 - SLIDE 10
Database Environment CASE Tools Repository IS 202 – FALL 2005 User Interface DBMS Application Programs Database 2005 -10 -06 - SLIDE 11
Database Components DBMS ======== Design tools Database contains: User’s Data Metadata Indexes Application Metadata IS 202 – FALL 2005 Table Creation Form Creation Query Creation Report Creation Procedural language compiler (4 GL) ======= Run time Form processor Query processor Report Writer Language Run time Application Programs User Interface Applications 2005 -10 -06 - SLIDE 12
Types of Database Systems • • • PC databases Centralized database Client/server databases Distributed databases Database models IS 202 – FALL 2005 -10 -06 - SLIDE 13
PC Databases E. g. : Access Fox. Pro Dbase Etc. IS 202 – FALL 2005 -10 -06 - SLIDE 14
Centralized Databases Central Computer IS 202 – FALL 2005 -10 -06 - SLIDE 15
Client Server Databases Client Network Database Server Client IS 202 – FALL 2005 -10 -06 - SLIDE 16
Distributed Databases Location C computer Location B computer Homogeneous Databases Location A IS 202 – FALL 2005 -10 -06 - SLIDE 17
Distributed Databases Client Heterogeneous Or Federated Databases Database Server Remote Comp. Local Network Comm Server Client IS 202 – FALL 2005 Remote Comp. 2005 -10 -06 - SLIDE 18
Terms and Concepts • A “database application” is an application program (or set of related programs) that is used to perform a series of database activities: – Create • Add new data to the database – Read • Read current data from the database – Update • Update or modify current database data – Delete • Remove current On behalf of database users IS 202 – FALL 2005 -10 -06 - SLIDE 19
Terms and Concepts • Enterprise – Organization • Entity – Person, Place, Thing, Event, Concept. . . • Attributes – Data elements (facts) about some entity – Also sometimes called fields or items or domains • Data values – Instances of a particular attribute for a particular entity IS 202 – FALL 2005 -10 -06 - SLIDE 20
Terms and Concepts • Key – An attribute or set of attributes used to identify or locate records in a file • Primary Key – An attribute or set of attributes that uniquely identifies each record in a file IS 202 – FALL 2005 -10 -06 - SLIDE 21
Terms and Concepts • Models – (1) Levels or views of the Database • Conceptual, logical, physical – (2) DBMS types • Relational, Hierarchic, Network, Object-Oriented, Object-Relational IS 202 – FALL 2005 -10 -06 - SLIDE 22
Models (1) Application 1 External Model Application 2 Application 3 Application 4 External Model Application 1 Conceptual requirements Application 2 Conceptual requirements Application 3 Conceptual requirements Conceptual Model Logical Model Internal Model Application 4 Conceptual requirements More later on this… IS 202 – FALL 2005 -10 -06 - SLIDE 23
Data Models(2): History • Hierarchical Model (1960’s and 1970’s) – Similar to data structures in programming languages Books (id, title) Authors (first, last) IS 202 – FALL 2005 Publisher Subjects 2005 -10 -06 - SLIDE 24
Data Models(2): History • Network Model (1970’s) – Provides for single entries of data and navigational “links” through chains of data. Authors Subjects Books Publishers IS 202 – FALL 2005 -10 -06 - SLIDE 25
Data Models(2): History • Relational Model (1980’s) – Provides a conceptually simple model for data as relations (typically considered “tables”) with all data visible IS 202 – FALL 2005 -10 -06 - SLIDE 26
Data Models(2): History • Object Oriented Data Model (1990’s) – Encapsulates data and operations as “Objects” Books (id, title) Authors (first, last) IS 202 – FALL 2005 Publisher Subjects 2005 -10 -06 - SLIDE 27
Data Models(2): History • Object-Relational Model (1990’s) – Combines the well-known properties of the Relational Model with such OO features as: • User-defined datatypes • User-defined functions • Inheritance and sub-classing • All of the major enterprise DBMS systems are now Object-Relational or incorporate Object-Relational features IS 202 – FALL 2005 -10 -06 - SLIDE 28
Lecture Overview • • • Databases and Database Design Database Life Cycle Database Design ER Diagrams Discussion IS 202 – FALL 2005 -10 -06 - SLIDE 29
Database System Life Cycle Physical Creation 2 Conversion 3 Design 1 Growth, Change, & Maintenance 6 Integration 4 Operations 5 IS 202 – FALL 2005 -10 -06 - SLIDE 30
Design (more later) • Determination of the needs of the organization – That is: “what are the business rules of this organization”, what information does it use and need? • Development of the Conceptual Model of the database – In Database design this is typically done with Entity. Relationship diagramming techniques • Construction of a Data Dictionary • Development of the Logical Model IS 202 – FALL 2005 -10 -06 - SLIDE 31
Physical Creation • Development of the Physical Model of the Database – Data formats and types – Determination of indexes, etc. • Load a prototype database and test • Determine and implement security, privacy and access controls • Determine and implement integrity constraints IS 202 – FALL 2005 -10 -06 - SLIDE 32
Conversion • Convert existing data sets and applications to use the new database – May need programs, conversion utilities to convert old data to new formats IS 202 – FALL 2005 -10 -06 - SLIDE 33
Integration • Overlaps with Phase 3 • Integration of converted applications and new applications into the new database IS 202 – FALL 2005 -10 -06 - SLIDE 34
Operations • All applications run full-scale • Privacy, security, access control must be in place • Recovery and Backup procedures must be established and used IS 202 – FALL 2005 -10 -06 - SLIDE 35
Growth, Change, and Maintenance • Change is a way of life – Applications, data requirements, reports, etc. will all change as new needs and requirements are found – The Database and applications and will need to be modified to meet the needs of changes IS 202 – FALL 2005 -10 -06 - SLIDE 36
Another View of the Life Cycle Integration 4 Operations 5 Design Physical 1 Creation Conversion Growth, 2 Change 3 6 IS 202 – FALL 2005 -10 -06 - SLIDE 37
Lecture Overview • • • Databases and Database Design Database Life Cycle Database Design ER Diagrams Discussion IS 202 – FALL 2005 -10 -06 - SLIDE 38
Database design process Application 1 External Model Application 2 Application 3 Application 4 External Model Application 1 Conceptual requirements Application 2 Conceptual requirements Application 3 Conceptual requirements Conceptual Model Logical Model Internal Model Application 4 Conceptual requirements IS 202 – FALL 2005 -10 -06 - SLIDE 39
Conceptual Design • Conceptual Model – Merge the collective needs of all applications – Determine what Entities are being used • Some object about which information is to maintained – What are the Attributes of those entities? • Properties or characteristics of the entity • What attributes uniquely identify the entity – What are the Relationships between entities • How the entities interact with each other? IS 202 – FALL 2005 -10 -06 - SLIDE 40
Developing a Conceptual Model • Overall view of the database that integrates all the needed information discovered during the requirements analysis • Elements of the Conceptual Model are represented by diagrams, Entity-Relationship or ER Diagrams, that show the meanings and relationships of those elements independent of any particular database systems or implementation details • Can also be represented using other modeling tools (such as UML) IS 202 – FALL 2005 -10 -06 - SLIDE 41
Database Design Process Application 1 External Model Application 2 Application 3 Application 4 External Model Application 1 Conceptual requirements Application 2 Conceptual requirements Application 3 Conceptual requirements Conceptual Model Logical Model Internal Model Application 4 Conceptual requirements IS 202 – FALL 2005 -10 -06 - SLIDE 42
Logical Design • Logical Model – How is each entity and relationship represented in the Data Model of the DBMS • • Hierarchic? Network? Relational? Object-Oriented? IS 202 – FALL 2005 -10 -06 - SLIDE 43
Database Design Process Application 1 External Model Application 2 Application 3 Application 4 External Model Application 1 Conceptual requirements Application 2 Conceptual requirements Application 3 Conceptual requirements Conceptual Model Logical Model Internal Model Application 4 Conceptual requirements IS 202 – FALL 2005 -10 -06 - SLIDE 44
Physical Design • Internal Model – Choices of index file structure – Choices of data storage formats – Choices of disk layout IS 202 – FALL 2005 -10 -06 - SLIDE 45
Database Design Process Application 1 External Model Application 2 Application 3 Application 4 External Model Application 1 Conceptual requirements Application 2 Conceptual requirements Application 3 Conceptual requirements Conceptual Model Logical Model Internal Model Application 4 Conceptual requirements IS 202 – FALL 2005 -10 -06 - SLIDE 46
Database Application Design • External Model – User views of the integrated database – Making the old (or updated) applications work with the new database design IS 202 – FALL 2005 -10 -06 - SLIDE 47
Lecture Overview • • • Databases and Database Design Database Life Cycle Database Design ER Diagrams Discussion IS 202 – FALL 2005 -10 -06 - SLIDE 48
ER Diagrams • You have already seen a UML form of one… IS 202 – FALL 2005 -10 -06 - SLIDE 49
ER Diagrams • Entity-Relationship (ER) diagrams are concerned with the entities involved in the organization/database and the relationships between those entities (as well as the attributes of each entity IS 202 – FALL 2005 -10 -06 - SLIDE 50
Entity • An Entity is an object in the real world (or even imaginary worlds) about which we want or need to maintain information – Persons (e. g. : customers in a business, employees, authors) – Things (e. g. : purchase orders, meetings, parts, companies) Employee IS 202 – FALL 2005 -10 -06 - SLIDE 51
Attributes • Attributes are the significant properties or characteristics of an entity that help identify it and provide the information needed to interact with it or use it (this is the Metadata for the entities) Birthdate First Middle Last IS 202 – FALL 2005 Age Name Employee SSN Projects 2005 -10 -06 - SLIDE 52
Relationships • Relationships are the associations between entities • They can involve one or more entities and belong to particular relationship types IS 202 – FALL 2005 -10 -06 - SLIDE 53
Relationships Student Attends Class Project Supplier IS 202 – FALL 2005 Supplies project parts Part 2005 -10 -06 - SLIDE 54
Types of Relationships • Concerned only with cardinality of relationship Employee 1 Assigned n Assigned 1 1 m Assigned n Truck Project Chen ER notation IS 202 – FALL 2005 -10 -06 - SLIDE 55
Other Notations Employee Assigned Truck Employee Assigned Project “Crow’s Foot” IS 202 – FALL 2005 -10 -06 - SLIDE 56
Other Notations Employee Assigned Truck Employee Assigned Project IDEFIX Notation IS 202 – FALL 2005 -10 -06 - SLIDE 57
More Complex Relationships Manager 1/1/1 Employee 1/n/n Evaluation n/n/1 Project SSN Date Project Employee 4(2 -10) Assigned 1 Manages Employee IS 202 – FALL 2005 Is Managed By Project 1 Manages n 2005 -10 -06 - SLIDE 58
Weak Entities • Owe existence entirely to another entity Part# Invoice # Order Invoice# Contains Quantity Order-line Rep# IS 202 – FALL 2005 -10 -06 - SLIDE 59
Supertype and Subtype Entities Employee Sales-rep Sold Is one of Manages Clerk Other Invoice IS 202 – FALL 2005 -10 -06 - SLIDE 60
Many to Many Relationships SSN Proj# Hours Project Assignment Is Assigned Project Assigned Employee IS 202 – FALL 2005 SSN 2005 -10 -06 - SLIDE 61
UML diagraming • ER diagrams are often now expressed using UML instead… IS 202 – FALL 2005 -10 -06 - SLIDE 62
UML Class Diagram DIVEORDS Order No Customer No Sale Date Shipvia Payment. Method CCNumber No of People Depart Date Return Date Destination Vacation Cost Calc. Total. Invoice() Calc. Equipment() IS 202 – FALL 2005 Class Name List of Attributes List of operations 2005 -10 -06 - SLIDE 63
Differences from Entities in ER • Entities can be represented by Class diagrams • But Classes of objects also have additional operations associated with them IS 202 – FALL 2005 -10 -06 - SLIDE 64
Operations • Three basic types for database – Constructor – Query – Update IS 202 – FALL 2005 -10 -06 - SLIDE 65
Associations • An association is a relationship that describes a set of links between or among objects. • An association can have a name that describes the nature of this relationship. You can put a triangle next to this name to indicate the direction in which the name should be read. IS 202 – FALL 2005 -10 -06 - SLIDE 66
Associations • An association contains an ordered list of association ends. – An association with exactly two association ends is called a binary association – An association with more than two ends is called an n-ary association. IS 202 – FALL 2005 -10 -06 - SLIDE 67
Associations: Unary relationships * 0. . 1 Person 0. . 1 IS 202 – FALL 2005 Is-married-to manages Employee 0. . 1 manager 2005 -10 -06 - SLIDE 68
Associations: Binary Relationship Employee 0. . 1 Is-assigned Parking Place 0. . 1 One-to-one Product Line 1 contains * Product One-to-many Student * Registers-for * Course Many-to-many IS 202 – FALL 2005 -10 -06 - SLIDE 69
Associations: Ternary Relationships Part * Vendor IS 202 – FALL 2005 * Supplies * Warehouse 2005 -10 -06 - SLIDE 70
Association Classes Registers-for Student * Course * Computer Account Registration _________ acct. ID Term issues Password * 0. . 1 Grade Server. Space ________ Check. Eligibility() IS 202 – FALL 2005 -10 -06 - SLIDE 71
Derived Attributes, Associations, and Roles Course Student Course Offering ____________ Scheduled-for name Registers-for crse. Code term ssn * crse. Title * * 1 section date. Of. Birth credit. Hrs time Derived /age location attribute * * /participant Derived role {age = current. Date – date. Of. Birth} /Takes Derived association IS 202 – FALL 2005 -10 -06 - SLIDE 72
Generalization Employee ______ emp. Name emp. Number address date. Hired ______ print. Label() Hourly Employee ________ Hourly. Rate ________ compute. Wages() IS 202 – FALL 2005 Salaried Employee ________ Annual Sal stockoption ________ Contributepension() Consultant ________ contract. Number billing. Rate ________ compute. Fees() 2005 -10 -06 - SLIDE 73
Lecture Overview • • Databases and Database Design Database Life Cycle ER Diagrams Discussion IS 202 – FALL 2005 -10 -06 - SLIDE 74
Discussion • Why use DBMS for web-based system development? • Why Not use IR systems? • Can you use both? • Other Questions? IS 202 – FALL 2005 -10 -06 - SLIDE 75
Next Time • Normalization and the Relational Model IS 202 – FALL 2005 -10 -06 - SLIDE 76
- Slides: 76