INFORMATION SYSTEMS X Chapter 13 Database Design INFO
INFORMATION SYSTEMS @ X Chapter 13 Database Design INFO 425: Systems Design
INFORMATION SYSTEMS @ X Learning Objectives § Describe the differences and similarities between relational and object-oriented database management systems § Design a relational database schema based on an entity -relationship diagram ¨ High Level ¨ Detailed § Design an object database schema based on a class diagram INFO 425: Systems Design
INFORMATION SYSTEMS @ X Learning Objectives (continued) § Design a relational schema to implement a hybrid object-relational database § Describe the different architectural models for distributed databases INFO 425: Systems Design
INFORMATION SYSTEMS @ X Overview § This chapter describes design of relational and OO data models § Developers transform conceptual data models into detailed database models ¨ ¨ Entity-relationship diagrams (ERDs) for traditional analysis Class diagrams for object-oriented (OO) analysis § Detailed database models are implemented with database management system (DBMS) INFO 425: Systems Design
INFORMATION SYSTEMS @ X Databases and Database Management Systems § Databases (DB) – integrated collections of stored data that are centrally managed and controlled § Database management system (DBMS) – system software that manages and controls access to database § Databases described by a schema – description of structure, content, and access controls § XML: Data and Schema are integrated INFO 425: Systems Design
INFORMATION SYSTEMS @ X XML Example Elements <PROPERTYLIST> <PROPERTYNO>PA 14</PROPERTYNO> <ADDRESS> <STREET>16 Holland</STREET> <CITY>Aberdeen</CITY> <POSTCODE>AB 7 5 SU</POSTCODE> </ADDRESS> <RENT>650</RENT> </PROPERTYLIST> INFO 425: Systems Design
INFORMATION SYSTEMS @ X Database Models § Impacted by technology changes since 1960 s § Model types ¨ ¨ Hierarchical Network Relational Object-oriented § Most current systems use relational and (rarely) object-oriented data models INFO 425: Systems Design
INFORMATION SYSTEMS @ X INFO 425: Systems Design
INFORMATION SYSTEMS @ X Network Data Model INFO 425: Systems Design
INFORMATION SYSTEMS @ X Relational Data Model INFO 425: Systems Design
INFORMATION SYSTEMS @ X Multidimensional Data Model INFO 425: Systems Design
INFORMATION SYSTEMS @ X Object Oriented Data Model INFO 425: Systems Design
INFORMATION SYSTEMS @ X Components of a DB and DBMS Virtually all actions in a relational DB managed through Structured Query Language INFO 425: Systems Design
INFORMATION SYSTEMS @ X Important DBMS Capabilities § Simultaneous access by multiple users and applications § Access to data without application programs (via a query language) § Organizational data management with uniform access and content controls INFO 425: Systems Design
INFORMATION SYSTEMS @ X Objectives of SQL § Ideally, database language should allow user to: ¨ ¨ ¨ create the database and relation structures; perform insertion, modification, deletion of data; perform simple and complex queries. § Should perform these tasks with minimal user effort and command structure/syntax must be easy to learn. § Should be portable. § The relational model provides Structured Query Language (SQL) to meet the above objectives INFO 425: Systems Design
INFORMATION SYSTEMS @ X SQL Benefits § SQL is relatively easy to learn: ¨ ¨ it is non-procedural - you specify what information you require, rather than how to get it; it is essentially free-format. § Relatively easy to learn and use – but also very powerful § Consists of standard English words: CREATE TABLE Staff(staff. No VARCHAR(5), l. Name VARCHAR(15), salary DECIMAL(7, 2)); INSERT INTO Staff VALUES ('SG 16', 'Brown', 8300); INFO 425: Systems Design
INFORMATION SYSTEMS @ X SQL § SQL has 3 major components: ¨ ¨ ¨ A Data Definition Language (DDL) for defining database structure. A Data Manipulation Language (DML) for retrieving and updating data. Commands to control access to data (security) INFO 425: Systems Design
INFORMATION SYSTEMS @ X Relational Databases § Relational database management system (RDBMS) organizes data into tables or relations § Tables are two dimensional data structures ¨ ¨ Tuples – rows or records – represents a specific occurrence of an entity Fields – columns or attributes § Tables have primary key field(s) that can be used to identify unique records § Keys relate tables to each other… foreign keys INFO 425: Systems Design
INFORMATION SYSTEMS @ X Partial Display of Relational Database Table INFO 425: Systems Design
INFORMATION SYSTEMS @ X Database design § High Level – “logical” § Detailed – “physical” INFO 425: Systems Design
INFORMATION SYSTEMS @ X High Level Design § Create table for each entity type § Choose or invent primary key for each table § Add foreign keys to represent one-to-many relationships § Create new tables to represent many-to-many relationships INFO 425: Systems Design
INFORMATION SYSTEMS @ X High Level Design § Define referential integrity constraints § Evaluate schema quality and make necessary improvements…. normalization § Choose appropriate data types and value restrictions (if necessary) for each field ¨ Valid values ¨ Valid ranges INFO 425: Systems Design
INFORMATION SYSTEMS @ X RMO Entity-Relationship Diagram INFO 425: Systems Design
INFORMATION SYSTEMS @ X Relationship Between Data in Two Tables INFO 425: Systems Design
INFORMATION SYSTEMS @ X Representing Relationships § Relational databases use foreign keys to represent relationships § One-to-many relationship ¨ Add primary key field of “one” entity type as foreign key in table that represents “many” entity type § Many-to-many relationship ¨ ¨ Use the primary key field(s) of both entity types Use (or create) an associative entity table to represent relationship INFO 425: Systems Design
INFORMATION SYSTEMS @ X Entity Tables with Primary Keys INFO 425: Systems Design (Figure 12 -7)
INFORMATION SYSTEMS @ X Represent One-to-Many Relationships by Adding Foreign Keys (in italics) (Figure 12 -8) INFO 425: Systems Design
INFORMATION SYSTEMS @ X Enforcing Referential Integrity § Consistent relational database state § Every foreign key value also exists as a primary key value § DBMS enforces referential integrity automatically after schema designer identifies primary and foreign keys INFO 425: Systems Design
INFORMATION SYSTEMS @ X DBMS Referential Integrity Enforcement § When rows containing foreign keys are created ¨ DBMS ensures that value also exists as a primary key in a related table § When row is deleted ¨ DBMS ensures no foreign keys in related tables have same value as primary key of deleted row § When primary key value is changed ¨ DBMS ensures no foreign key values in related tables contain the same value INFO 425: Systems Design
INFORMATION SYSTEMS @ X RI in DDL RI is defined when defining the foreign keys associated with a table INFO 425: Systems Design
INFORMATION SYSTEMS @ X Evaluating Schema Quality § High-quality data model has ¨ ¨ ¨ Uniqueness of table rows and primary keys Ease of implementing future data model changes (flexibility and maintainability) Lack of redundant data (database normalization) § Database design is not objective or quantitatively measured; it is experience and judgment based INFO 425: Systems Design
INFORMATION SYSTEMS @ X Normalization §A good data model: ¨ ¨ ¨ Is simple – all attributes in an entity describe only that entity. Is non-redundant. All attributes other than foreign keys describe one entity Is flexible and adaptive. §Normalization is a data analysis technique that organizes data attributes such that they are grouped to form non-redundant, stable, flexible, and adaptive entities. INFO 425: Systems Design
INFORMATION SYSTEMS @ X Database Normalization § Normal forms minimize data redundancy ¨ ¨ First normal form (1 NF) – no repeating fields or groups of fields Functional dependency – one-to-one relationship between the values of two fields 2 NF – in 1 NF and if each non-key element is functionally dependent on entire primary key 3 NF – in 2 NF and if no non-key element is functionally dependent on any other non-key element INFO 425: Systems Design
INFORMATION SYSTEMS @ X The first normal form states that each field is granular and there are no repeating groups First Normal Form AND… INFO 425: Systems Design
INFORMATION SYSTEMS @ X Second/Third Normal Form INFO 425: Systems Design The second normal form states that each field in a multiple field primary key table must be directly related to the entire primary key. Third normal form is the same as second normal form except that it only refers to tables that have a single field as their primary key.
INFORMATION SYSTEMS @ X Exercise § Define what would need to be done place data in first normal form (look carefully at the data) § Place this data in 3 rd normal form INFO 425: Systems Design
INFORMATION SYSTEMS @ X Detailed Design § Design physical representation ¨ ¨ Analyze transactions Choose file organizations Choose indexes Estimate disk space requirements § Design user views § Design security mechanisms § Consider the introduction of controlled redundancy § Monitor and tune the operational system INFO 425: Systems Design
INFORMATION SYSTEMS @ X Analyze Transactions § Assume that a transaction = use case § Use this information to identify the parts of the database that may cause performance problems. § To select appropriate file organizations and indexes, also need to know high-level functionality of the transactions, such as: ¨ ¨ attributes that are updated in an update transaction criteria used to restrict rows that are retrieved in a query. INFO 425: Systems Design
INFORMATION SYSTEMS @ X Analyze Transactions § Often not possible to analyze all expected transactions, so investigate most ‘important’ ones – use Paretto Principle § To help identify which transactions to investigate, can use: ¨ ¨ transaction/relation cross-reference matrix, showing relations that each transaction accesses, and/or Access frequency analysis. INFO 425: Systems Design
INFORMATION SYSTEMS @ X Cross-referencing transactions and relations Which relations require the most attention? INFO 425: Systems Design
INFORMATION SYSTEMS @ X (2) Determine frequency information § We need to identify the key transactions that access those relations § Key information includes: ¨ ¨ ¨ Data volumes anticipated in each relation Average and maximum number of times period that a transaction executes Peak access times for each transaction Why is peak access time important to know? INFO 425: Systems Design
INFORMATION SYSTEMS @ X (2) Determine frequency information § Peak times are key to understanding if transactions conflict; that is, have similar peak times. If so, potential for performance issues is greater. INFO 425: Systems Design
INFORMATION SYSTEMS @ X Transaction usage map for some sample transactions INFO 425: Systems Design
INFORMATION SYSTEMS @ X Choose File Organizations Purpose: To determine an efficient file organization for each table. § § § Determines how records physically ordered on disk File organizations type: unordered, hash. Rule of Thumb: Use DBMS default organization INFO 425: Systems Design
INFORMATION SYSTEMS @ X Choose Indexes Purpose: To determine whether adding indexes will improve the performance of the system. § Need to consider both primary and secondary indexes ¨ ¨ Primary Index – how data will be physically ordered on disk Secondary Index – additional indexes created to speed access to data INFO 425: Systems Design
INFORMATION SYSTEMS @ X Why do indexes improve performance? § § § Without an index, need to scan tables Index = table with very few attributes. Use the index to find the record(s) that meet query criteria. INFO 425: Systems Design
INFORMATION SYSTEMS @ X Estimate Disk Space Requirements Purpose: To estimate the amount of disk space that will be required by the database. § We need to do this step to ensure that we have sufficient hardware and disk capacity both now and in the future to manage the database. § To calculate storage, need to consider: ¨ ¨ Amount of data required by each table (width of each row * count of rows) Indexes to be maintained Log files to be maintained Growth over a defined time period (1 – 2 years) INFO 425: Systems Design
INFORMATION SYSTEMS @ X Design User Views Purpose: To design the user views that were identified during the requirements stage of the relational database application lifecycle. § On multi-user systems, views are often used to manage security, reduce complexity for users INFO 425: Systems Design
INFORMATION SYSTEMS @ X Other Detailed Design Tasks § Design security mechanisms § Consider the introduction of controlled redundancy § Monitor and tune the operational system INFO 425: Systems Design
INFORMATION SYSTEMS @ X Object-Oriented Databases § Direct extension of OO design and programming paradigm § ODBMS stores data as objects § Direct support for method storage, inheritance, nested objects, object linking, and programmerdefined data types § Object Definition Language (ODL) ¨ Standard language for describing structure and content of an object database INFO 425: Systems Design
INFORMATION SYSTEMS @ X Designing Object Databases § Determine which classes require persistent storage § Define persistent classes § Represent relationships among persistent classes § Choose appropriate data types and value restrictions (if necessary) for each field INFO 425: Systems Design
INFORMATION SYSTEMS @ X Representing Classes § Transient classes ¨ ¨ Objects exist only during lifetime of program or process Examples: view layer window, pop-up menu § Persistent classes ¨ ¨ ¨ Objects not destroyed when program or process ceases execution. State must be remembered. Exist independently of program or process Examples: customer information, employee information INFO 425: Systems Design
INFORMATION SYSTEMS @ X Representing Relationships § Object identifiers ¨ ¨ ¨ Used to identify objects uniquely Physical storage address or reference Relate objects of one class to another § ODBMS uses attributes containing object identifiers to find objects that are related to other objects ¨ Similar to…. . ? § Keyword relationship can be used to declare relationships between classes INFO 425: Systems Design
INFORMATION SYSTEMS @ X Representing Relationships (continued) § Advantages include ¨ ¨ ODBMS assumes responsibility for determining connection among objects ODBMS assumes responsibility for maintaining referential integrity § Type of relationships ¨ ¨ 1: 1, 1: M, M: M (one-to-one, one-to-many, many-to-many) Association class used with M: M INFO 425: Systems Design
INFORMATION SYSTEMS @ X RMO Domain Model Class Diagram (Figure 12 -15) INFO 425: Systems Design
INFORMATION SYSTEMS @ X One-to-One Relationship Represented with Attributes Containing Object Identifiers INFO 425: Systems Design
INFORMATION SYSTEMS @ X One-to-Many Relationship Between Customer and Order Classes INFO 425: Systems Design
INFORMATION SYSTEMS @ X One-to-Many Relationship Represented with Attributes Containing Object Identifiers INFO 425: Systems Design
INFORMATION SYSTEMS @ X Generalization Hierarchy within the RMO Class Diagram (Figure 12 -21) INFO 425: Systems Design
INFORMATION SYSTEMS @ X Object DBMS Realities (from Wikipedia) § Object databases based on persistent programming acquired a niche in application areas such as: ¨ engineering and spatial databases, telecommunications, and scientific areas such as high energy physics and molecular biology. § They have made little impact on mainstream commercial data processing, INFO 425: Systems Design
INFORMATION SYSTEMS @ X Object DBMS Realities (from Wikipedia) § ODBMS suggest that pointer-based techniques are optimized for very specific "search routes" or viewpoints. However, for general-purpose queries on the same information, pointer-based techniques will tend to be slower and more difficult to formulate than relational. § Other things that work against ODBMS: ¨ ¨ lack of interoperability with a great number of tools/features that are taken for granted in the SQL world including but not limited to industry standard connectivity, reporting tools, OLAP tools and backup and recovery standards. Lack a formal mathematical foundation, unlike the relational model, INFO 425: Systems Design
INFORMATION SYSTEMS @ X Hybrid Object-Relational Database Design § RDBMS (hybrid DBMS) used to store object attributes and relationships § Design complete relational schema and simultaneously design equivalent set of classes § Mismatches between relational data and OO ¨ ¨ ¨ Class methods cannot be directly stored or automatically executed Relationships are restricted compared to ODBMS can represent wider range of data types INFO 425: Systems Design
INFORMATION SYSTEMS @ X Classes and Attributes § Designers store classes and object attributes in RDBMS by table definition § Relational schema can be designed based on class diagram § Table is created for each class § Fields of each table same as attributes of class § Row holds attribute values of single object § Key field is chosen for each table INFO 425: Systems Design
INFORMATION SYSTEMS @ X Relationships § Relationships are represented with foreign keys § Foreign key values serve same purpose as object identifiers in ODBMS § 1: M relationship – add primary key field of class on “one” side of the relationship to table representing class on “many” side § M: M relationship – create new table that contains primary key fields of related class tables and attributes of the relationship itself INFO 425: Systems Design
INFORMATION SYSTEMS @ X Data Access Classes § OO design based on a three-layer architecture § Data access classes are implementation bridge between data stored in program objects and data in relational database § Methods add, update, find, and delete fields and rows in table or tables that represent the class § Methods encapsulate logic needed to copy data values from problem domain class to database and vice versa INFO 425: Systems Design
INFORMATION SYSTEMS @ X Interaction Among a Domain Class, a Data Access Class, and the DBMS (Figure 12 -25) INFO 425: Systems Design
INFORMATION SYSTEMS @ X Data Types § Storage format and allowable content of program variable, object state variable, or database field or attribute § Primitive data types – directly implemented ¨ Memory address (pointer), Boolean, integer, and so on § Complex data types – user-defined ¨ Dates, times, audio streams, video images, URLs INFO 425: Systems Design
INFORMATION SYSTEMS @ X Relational DBMS Data Types § Designer must choose appropriate data type for each field in relational database schema § Choice for many fields is straightforward ¨ ¨ ¨ Names and addresses use a set of fixed- or variable-length character arrays Inventory quantities can use integers Item prices can use real numbers § Complex data types (DATE, LONGRAW) INFO 425: Systems Design
INFORMATION SYSTEMS @ X Subset of Oracle RDBMS Data Types INFO 425: Systems Design
INFORMATION SYSTEMS @ X Object DBMS Data Types § Use set of primitive and complex data types comparable to RDBMS data types § Schema designer can create new data types and associated constraints § Classes are complex user-defined data types that combine traditional concept of data with processes (methods) to manipulate data INFO 425: Systems Design
INFORMATION SYSTEMS @ X Distributed Databases § Rare for all organizational data to be stored in a single database in one location § Different information systems in an organization are developed at different times § Parts of an organization’s data may be owned and managed by different units § System performance is improved when data is near primary applications INFO 425: Systems Design
INFORMATION SYSTEMS @ X Single Database Server Architecture (Figure 12 -27) INFO 425: Systems Design
INFORMATION SYSTEMS @ X Replicated Database Server Architecture (Figure 12 -28) INFO 425: Systems Design
INFORMATION SYSTEMS @ X Partitioned Database Server Architecture INFO 425: Systems Design
INFORMATION SYSTEMS @ X Advantages of DDBMSs § Reflects organizational structure § Improved shareability and local autonomy § Improved availability § Improved reliability § Improved performance § Economics § Modular growth INFO 425: Systems Design
INFORMATION SYSTEMS @ X Disadvantages of DDBMSs § Complexity § Cost § Security § Integrity control more difficult § Lack of standards § Lack of experience § Database design more complex INFO 425: Systems Design
INFORMATION SYSTEMS @ X Functions of a DDBMS § Expect DDBMS to have at least the functionality of a DBMS. § Also to have following functionality: ¨ ¨ ¨ Extended communication services. Extended Data Dictionary. Distributed query processing. Extended concurrency control. Extended recovery services. INFO 425: Systems Design
- Slides: 77