Data Warehouse Fundamentals Chapter 6 Relational Data Modeling

  • Slides: 146
Download presentation
Data Warehouse Fundamentals Chapter 6: Relational Data Modeling -- CONCEPTS, PRINCIPLES & APPROACHES-- Paul

Data Warehouse Fundamentals Chapter 6: Relational Data Modeling -- CONCEPTS, PRINCIPLES & APPROACHES-- Paul Chen www. cs 522. com (Please reference white papers on Data Modeling at Seattle U teaching materials website)

Topics Levels of Modeling 2. Relational Data Modeling—What is it? Types of Models and

Topics Levels of Modeling 2. Relational Data Modeling—What is it? Types of Models and Pre -Modeling Activities 3. Understand Terms and Terminology –Tool Demonstration 4. Conceptual Data Modeling: What, Why, When, Who? Activity Description 5. Logical Data Modeling: What, Why, When, Who? Activity Description 6. Physical Data Modeling- An Overview 7 Prototyping and RAD 1.

Databases & Modeling Type of Database Relational Database Constructs ERD & EER Characteristics Row/

Databases & Modeling Type of Database Relational Database Constructs ERD & EER Characteristics Row/ Column Multi-Dimensional Database Dimensional Modeling Cube Distributed Database Distributed Component Object Model Client Object (DCOM) Object-Oriented Database Class Diagram New Trend OLAP DW Object = Data + Operations(Services); Entity = Data only XML UML

Topic 1: Level of Modeling Descriptive: The dealer sold 200 cars last month. Operational

Topic 1: Level of Modeling Descriptive: The dealer sold 200 cars last month. Operational (OLTP) Primarily Two Dimensional Database System Explanatory: For every increase in 1 % in the interest, auto sales decrease by 5 %. Traditional DW (OLAP) Star Schema Cube Predictive: Predictions about future buyer behavior. Data Mining Cube + sophisticated analytical tools

Level of Analytical Processing Descriptive SIMPLE QUERIES & REPORTS Explanatory Predictive “WHAT IF” PROCESSING

Level of Analytical Processing Descriptive SIMPLE QUERIES & REPORTS Explanatory Predictive “WHAT IF” PROCESSING ANALYZE WHAT HAS PREVIOUSLY OCCURRED TO BRING ABOUT THE CURRENT STATE OF THE DATA Normalized Tables Query Dimensional Tables Roll-up; Drill Down + DETERMINE IF ANY PATTERNS EXIST BY REVIEWING DATA RELATIONSHIPS Statistical Analysis/Expert System/ Artificial Intelligence Classification & Value Prediction

DESCRIPTIVE MODELING Relational Data Modeling using ER Diagram u Conceptual Data Model (Analysis -

DESCRIPTIVE MODELING Relational Data Modeling using ER Diagram u Conceptual Data Model (Analysis - Requirements Gathering; What’s it? ) u Logical Data Model (Design-How is it? ) u Physical Data Model (Implementation)

EXPLANATORY MODELING u u u Also called Dimensional Modelling (to be discussed in chapter

EXPLANATORY MODELING u u u Also called Dimensional Modelling (to be discussed in chapter 7) Ways to derive the database component of a data warehouse Every dimensional model (DM) is composed of one table with a composite primary key, called the fact table, and a set of smaller tables called dimension tables.

PREDICTIVE MODELING (to be discussed in chapter 10) u Similar to the human learning

PREDICTIVE MODELING (to be discussed in chapter 10) u Similar to the human learning experience – Uses observations to form a model of the important characteristics of some phenomenon. u Uses generalizations of ‘real world’ and ability to fit new data into a general framework. u Can analyze a database to determine essential characteristics (model) about the data set.

Topic 2: Relational Data Modeling-What’s it? A data model is a collection of constructs,

Topic 2: Relational Data Modeling-What’s it? A data model is a collection of constructs, business rules and sample data which together supports a dynamic representation of real world objects and events. u u u Constructs: entity-relationship diagrams (conceptual & logical or functional) and tables Business rules: constraints such as referential integrity rules and operators (add, update, delete) Sample Data for verification and prototyping: Verifying the accuracy of the model.

Objectives u Testing the real system before building it. u Assisting in understanding an

Objectives u Testing the real system before building it. u Assisting in understanding an organization’s data requirements. u Facilitating physical data base design.

Types of Models u Conceptual Data Model (Analysis - Requirements Gathering; What’s it? )

Types of Models u Conceptual Data Model (Analysis - Requirements Gathering; What’s it? ) u Logical Data Model (Design-How is it? ) u Physical Data Model (Implementation)

Pre-modeling Activities u Data collection phase u Facilitation techniques (for ex. JAD session) u

Pre-modeling Activities u Data collection phase u Facilitation techniques (for ex. JAD session) u Roles and responsibilities u Tools and repository Naming standards Modeling convention (What methods to use) Data protection/backup and recovery procedures u u u

Data Collection Phase * Sampling and Investigating Hard Data u The Needs for Sampling:

Data Collection Phase * Sampling and Investigating Hard Data u The Needs for Sampling: Containing costs; Speeding up the data gathering; Improving effectiveness; Reducing bias u Sampling Design -Four steps: 1. Determine the data to be collected or described 2. Determine the population to be sampled 3. Choose the type of sample 4. Decide on the sample size u * Please review chapter 5

Kinds of Information Sought in Investigation Types of hard Data (other than interviewing and

Kinds of Information Sought in Investigation Types of hard Data (other than interviewing and Observation) u u u Memos Signs on bulletins boards or in work areas Corporate Web sites Manuals Policy handbooks Record layout

Retina Scan “That recent Tom Cruise movie, Minority Report, shows advertising that targets each

Retina Scan “That recent Tom Cruise movie, Minority Report, shows advertising that targets each individual consumer as they pass by the signage. That’s the extreme, but I can see it going that way, ” said St. Denis.

Five Steps in Interview Preparation u Reading background material u Establishing interview objectives u

Five Steps in Interview Preparation u Reading background material u Establishing interview objectives u Deciding when to interview u Preparing the interviewee u Deciding on question type and structure

Two Types of Questions – Open-End Questions vs. Closed Questions Closed interview questions Such

Two Types of Questions – Open-End Questions vs. Closed Questions Closed interview questions Such as “ How many subordinates do you have? Benefits: u Getting to relevant data u Keeping control over the interview Drawbacks: u Failing to obtain rich detail u Intimidating the interviewee

Three Basic ways of Structuring Interviews u Pyramid Structure: Starting from closed questions, then

Three Basic ways of Structuring Interviews u Pyramid Structure: Starting from closed questions, then gradually expand into open territory. u Funnel Structure: The reverse of pyramid structure approach. u Diamond-Shaped: A combination of the two above structures.

JAD (Joint Application Development) u JAD sessions (also called facilitated session) are used to

JAD (Joint Application Development) u JAD sessions (also called facilitated session) are used to gather information and feedback and confirm the results of requirements gathering. u JAD sessions replace the traditional way of conducting a series of interviews on a one-to-one basis with the users. Advantages: Achieving consensus during the session when multiple sources of information exist, raising and addressing issues or assigning them for resolution, and immediately confirming information.

Topic 3: Understand Terms and Terminology u Independent entity u Dependent entity u Associative

Topic 3: Understand Terms and Terminology u Independent entity u Dependent entity u Associative Entity u Identifying relationship u Non-identifying relationship

Understand Terms and Terminology Identifier An attribute distinctly identifies each occurrence of an entity.

Understand Terms and Terminology Identifier An attribute distinctly identifies each occurrence of an entity. For ex. , bank account Id. , and student Id. u Association (relationships) An association is a relationship between two or more entities. Employee works for company Part has item u

Understand Terms and Terminology Cardinality(the form of relationship) Associations occur in there forms: one-to-one;

Understand Terms and Terminology Cardinality(the form of relationship) Associations occur in there forms: one-to-one; one-to-many; many-to-many u Tables A table is a two-dimensional representation of data consisting of columns and rows. u Primary Key Used to identify entities. Unique identification for a row in a table. Allow no nulls and no duplicates. May be system assigned. u

Understand Terms and Terminology Foreign key A foreign key is one or more data

Understand Terms and Terminology Foreign key A foreign key is one or more data elements whose value is based on the primary identifier of another entity, thus allowing the system to ‘join’ and get related information from other entities. The ‘joining’ of different entities in this manner eliminates the need of data repetition and redundancy. u Normalization A technique to make sure that the data in a logical model is defined once and only once. Normalization helps minimum data redundancy, and minimize update abnormalities. u

Topic 4: Conceptual Data Modeling: What? What is it? 1. It is a conceptual

Topic 4: Conceptual Data Modeling: What? What is it? 1. It is a conceptual representation of data without concern for its logical (functional) or physical aspects. 2. It is a set of high-level business data models which provides a framework for the data modeling activities at the next level.

Conceptual Data Modeling: When? When should it be done? 1. In support of the

Conceptual Data Modeling: When? When should it be done? 1. In support of the data requirements of a process model under development at the corresponding level. or 2. Outside the system application lifecycle on a department, division, or company wide basis.

Conceptual Data Modeling: Who? Who should do it? 1. The group responsible for assuring

Conceptual Data Modeling: Who? Who should do it? 1. The group responsible for assuring that data structure reflects business policies and rules. 2. It should be a joint effort between the owners and custodians of the data, the users of the data, and the analysts.

Conceptual Data Modeling: Why? u Documents the type of data (information) which must be

Conceptual Data Modeling: Why? u Documents the type of data (information) which must be represented in a system independent of specific application, organizations, or technology. u Maximizes data sharing; minimizing redundancy. u Provides foundations for physical database design. u Describes the unique business enterprise specifically.

Conceptual Data Modeling: Why? u Outside of application life cycle on a company-wide basis.

Conceptual Data Modeling: Why? u Outside of application life cycle on a company-wide basis. u Data modeling expresses inherent associations which are the most part, independent of anyone application. u Data entities change very little even through the way they are used can change for each application. u A complete maintained conceptual data model should shorten the requirements definition phases of system development life cycle.

Data Modeling: Approach Data partitioning Use a top-down approach to define the data requirements

Data Modeling: Approach Data partitioning Use a top-down approach to define the data requirements of a system. The purpose is to divide and conquer (from subject to entity), and to evolve from the conceptual level to logical level until physical database is derived. u Standard deliverables For each of the levels, there is a set of standard deliverables that must be produced. The documentation items must be well defined so that the data at each level is well understood. u

Data Partitioning Via Modeling (How) (What, Why, Who, Where) Conceptual Level Subjects Entities Technical

Data Partitioning Via Modeling (How) (What, Why, Who, Where) Conceptual Level Subjects Entities Technical considerations Relationships Logical Level Data Elements Physical Level Frequencies Data Definition Language -DDL(create, Alter, drop tables) Data Manipulation Language (select, insert, delete, update)

Conceptual Data Modeling -Activity Description 1. Define the system boundary – Data Context Diagram.

Conceptual Data Modeling -Activity Description 1. Define the system boundary – Data Context Diagram. 2. Partition a subject into entities. 3. Discover entities -super-type and subtype; part and whole. 4. Define associations between entities. 5. Define major attributes. 6. Define unique identifier for the entity. 7. Assign cardinalities and set up relationships between the entities. 8. Validate the model with the users.

1. Define the System Boundary by Subject Context Diagram A subject (a group of

1. Define the System Boundary by Subject Context Diagram A subject (a group of entities with strong affinity) is a class of data objects representing the mission and resources of the organization. u “Subjects” provides mechanisms for controlling how much of a model a reader (user, analyst, manager) is able to consider and comprehend at a time. For a small system, go directly to define entities. u For ex: Library (subject) decomposed into book, member, and account. (entities).

ATM ERD –Subject Context Level Customer uses ATM Has Owns Bank Consortium Consists of

ATM ERD –Subject Context Level Customer uses ATM Has Owns Bank Consortium Consists of Affiliated Bank Account Holds

Data (Subject) Context Diagram u A Data context diagram is a special case of

Data (Subject) Context Diagram u A Data context diagram is a special case of ERD in which a single diagram represents the problem domain in terms of data requirements. For example: The ATM diagram illustrates and highlights Several important characteristics of the system: The people and organization with which the system communicates. u It documents the significant connections (relationships) between the data objects within as well as outside the problem domain. u

2. Partition a Subject into Entities Criteria for partitioning a subject into entities are:

2. Partition a Subject into Entities Criteria for partitioning a subject into entities are: 1. The entities included in a subject all tend to describe the subject and have a strong affinity with the subject. 2. The entities of a subject should be of equal importance, as measured by the range, complexities or importance of their data. 3. Each entity should belong to only one subject.

3. Discover Entities u Identifying data objects via business event analysis An individual stimulus

3. Discover Entities u Identifying data objects via business event analysis An individual stimulus from one data object to another is an event. Events are discovered by investigating the external influences that act upon a system, and the data transformation that occurs within a system that converts inputs into outputs. Thus, source and target data objects that interact with an event can be identified. For example, an event “customer buys a product” as depicted below is initiated by the data object “customer” who requires a response from the data object “sale”, “product”, and “payment”.

Customer Sale Customer buys A Product Payment Product

Customer Sale Customer buys A Product Payment Product

Types of Entities To find potential entities (entities are nouns), look for: 1. 2.

Types of Entities To find potential entities (entities are nouns), look for: 1. 2. 3. Objects can be generalized or specialized The entities of a subject should be of equal importance, as measured by the range, complexities or importance of their data. Each entity should belong to only one subject

Subtype and Super-type Faculty Full Time Faculty Part (day) Time Faculty Part (Night)Time Faculty

Subtype and Super-type Faculty Full Time Faculty Part (day) Time Faculty Part (Night)Time Faculty

Generalization & Specialization Generalization Commercial 747 Specialization Aircraft Military 777 B 52 B-1 B

Generalization & Specialization Generalization Commercial 747 Specialization Aircraft Military 777 B 52 B-1 B

4. Define Associations Between Entities Use a verb to describe associations between two or

4. Define Associations Between Entities Use a verb to describe associations between two or more entities that the user wants to keep track of. Each relationship must be specific as possible so that its meaning is clear. An individual owns a building. Owns: possession, rental; or management?

5. Define Major Attributes All the attributes of an entity must have meaning for

5. Define Major Attributes All the attributes of an entity must have meaning for each and every one of the occurrence of the entity. Only elementary data are included in the model. Attributes resulting from process algorithms should not be included in the model. For ex. Entity ‘individual’ has attributes such as name, address; sex (male or female); no. of dependents, etc. But Derived attributes (such as percentage) are not attributes.

6. Define Unique Identifier for the Entity This is an attribute that unambiguously identifies

6. Define Unique Identifier for the Entity This is an attribute that unambiguously identifies each occurrence of each entity. Some entities do no have their own identifier. These entities are qualified as dependent or week entities. The identifier of the entity to which the dependent entities are associated with must be used to uniquely identify their occurrences. For ex. , a ‘child’ entity must have his or her parent identifier to be uniquely identified.

7. Assign Cardinalities & Set Up Relationships Between the Entities Cardinality is the minimum

7. Assign Cardinalities & Set Up Relationships Between the Entities Cardinality is the minimum and maximum number of times an occurrence of an entity occurs in relationship to another entity. The minimum number: 0 or 1 The maximum number: 1 or M There are three types of associations: One-to-one; one-to-many; many to many.

Types of Relationships By Degree (# of Attributes the Relation Contains) The relationships (representing

Types of Relationships By Degree (# of Attributes the Relation Contains) The relationships (representing business rules) could be either as binary, recursive, or ternary. Binary Doctor Patient Part Order Ternary Part/Order Recursive Organization

Recursive Association A recursive association is one in which there is a relationship between

Recursive Association A recursive association is one in which there is a relationship between An entity and itself. Information Dev Accounting Payable Engineering Receivable Facility Product Nuclear Coal

8. Validate the Model with the Users u The approach requires that the users

8. Validate the Model with the Users u The approach requires that the users be involved at the outset of the data modeling activity until the end of its implementation. They work side-by-side with system developers, providing input and validating the accuracy of the data requirements. This will ensure that the delivered end-product meets the users’ need.

Validate the Model with the Users – To identify and document the integrity constraints

Validate the Model with the Users – To identify and document the integrity constraints given in the user’s view of the enterprise. This includes identifying: » Required data » Referential integrity » Attribute domain constraints » Enterprise constraints » Entity integrity

Validate the Model With These Check Points u u u Attribute allocation Rules followed

Validate the Model With These Check Points u u u Attribute allocation Rules followed Cardinality necessity Relationship necessity Achievement of organization goals Contributions to expected benefits

Topic 5: Logical Data Modeling: What, Why, When, Who? And Activity Description What is

Topic 5: Logical Data Modeling: What, Why, When, Who? And Activity Description What is it? u u u It is a representation of data required to support the complete business needs for a particular business area, system or project. It is a set of data models that provides a framework for the physical database construction activities. It is a graphical representation of data objects that shows the relationship between the tables, views and functional core services used by modules in the application system.

Logical Data Modeling: What, Why, When, Who? Why do it? u u Document the

Logical Data Modeling: What, Why, When, Who? Why do it? u u Document the type of data which must be represented in a system with regard to specific system applications, organizations, or technologies. Assist in the orderly creation of a physical database design. Specifically describe the unique business enterprise. Accelerate and clarify communications between the functional analysis and DBA’s.

Logical Data Modeling: What, Why, When, Who? When should it be done? u u

Logical Data Modeling: What, Why, When, Who? When should it be done? u u u Part of the system application lifecycle. In parallel with process modeling activities. Upon the completion of the conceptual data model to produce a first-cut database design that includes definitions of tables, columns, and constraints.

Logical Data Modeling: What, Why, When, Who? Who should do it? u The group

Logical Data Modeling: What, Why, When, Who? Who should do it? u The group responsible for ensuring that data structure reflects business data requirements. u It should be a joint effort between the functional analysts and data administrators.

Logical Data Modeling: What, Why, When, Who? Benefits u Provide a definition of the

Logical Data Modeling: What, Why, When, Who? Benefits u Provide a definition of the data architecture of how the target system will be implemented. u Model parts of the database schema that show data structures are related to the processes. u Provide program designers with the detail for the part of the database design that their modules use.

Logical Data Modeling: Activity Description 1. Define Data Architectural Standards 2. Position the conceptual

Logical Data Modeling: Activity Description 1. Define Data Architectural Standards 2. Position the conceptual data modeling and revise the definitions of the entities. 3. Define integrity rules for entities and relationships and Apply normalization rules to each entity. 4. Complete and standardize the data elements. 5. Package the model for physical data modeling and system construction. 6. Evaluate quality of data for conversion. 7. Validate and Verify the Data Model

1. Define Data Architectural Standards Data Access/retrieval guidelines Naming convention; SQL coding standards; Data

1. Define Data Architectural Standards Data Access/retrieval guidelines Naming convention; SQL coding standards; Data integrity (package; trigger; commit; rollback) Error handling; record locking rule; update collision. • Data Security Data access rule; data separation rule; Data recovery/backup • Data base refresh/performance tuning •

2. Position the Conceptual Data Modeling and Revise the Definitions of the Entities u

2. Position the Conceptual Data Modeling and Revise the Definitions of the Entities u Position the conceptual data modeling and revise the definitions of the entities. By taking architectural standards into consideration, the complete CDM is reexamined. As a result, new entities and relationships my be discovered. u Partition data into entities at the table level.

Continued u Map Local Conceptual Data Model to Local Logical Data Model u To

Continued u Map Local Conceptual Data Model to Local Logical Data Model u To refine the local conceptual data model to remove undesirable features and to map this model to a local logical data model. This involves: u (1) Remove M: N relationships. u (2) Remove complex relationships. u (3) Remove recursive relationships. u (4) Remove relationships with attributes. u (5) Remove multi-valued attributes. u (6) Re-examine 1: 1 relationships. u (7) Remove redundant relationships.

Removing M: N Relationship

Removing M: N Relationship

Removing Complex Relationship

Removing Complex Relationship

Removing Recursive Relationship

Removing Recursive Relationship

Removing Relationship with Attribute

Removing Relationship with Attribute

Removing Multi-valued Attribute

Removing Multi-valued Attribute

Re-examine 1: 1 relationships. Remove redundant relationships

Re-examine 1: 1 relationships. Remove redundant relationships

3. Define Integrity Rules and Apply Normalization Rules Integrity rules for entities indicate the

3. Define Integrity Rules and Apply Normalization Rules Integrity rules for entities indicate the context in which an entity occurrence may be created, modified, or deleted. They also ensure that the entity is consistent with other entities. This is accomplished by placing referential attributes in each appropriate entity on the model. u For example, a Client (entity) holds an Account (entity). A client cannot be deleted if at least one of his accounts has a balance greater than 0. u Apply normalization rules to each entity.

Formalizing a One-to-one Relationship with Referential Attribute Husband *Husband name Other attributes Wife Married

Formalizing a One-to-one Relationship with Referential Attribute Husband *Husband name Other attributes Wife Married to Referential Attribute *Wife name Other attributes Husband name

Formalizing a One-to-Many Relationship with Referential Attribute Dog * Dog Id Other attributes Dog

Formalizing a One-to-Many Relationship with Referential Attribute Dog * Dog Id Other attributes Dog Owner Id Dog Owner (1: M) (1: 1) Referential Attribute * Dog Owner Id Other attributes

Formalizing a Many-to-Many Relationship with Referential Attribute Part Order *Part Id Other attributes *Order

Formalizing a Many-to-Many Relationship with Referential Attribute Part Order *Part Id Other attributes *Order No Other attributes Referential Attributes Order/Part *Order No *Part Id Other attributes An associative entity may Participate in relationship With other entity.

A Many-to-Many Relationship A many-to-many relationships will result in the creation of a new

A Many-to-Many Relationship A many-to-many relationships will result in the creation of a new entity. Order # Part # 1: M Part/Order Part #/Order #

Referential Integrity Three options: u u u Restrict: A primary key can not be

Referential Integrity Three options: u u u Restrict: A primary key can not be deleted if there any dependent foreign key rows. Cascade: Deleting a primary key row causes the deletion of all dependent foreign key rows. Set Null: Deleting a primary key row causes all dependent foreign keys values to be set null.

Apply Normalization Rules A technique to make sure the data in a logical data

Apply Normalization Rules A technique to make sure the data in a logical data models is defined once and only once. Normalization helps minimum data redundancy, and minimize update abnormalities. Three forms: u u u First Normal Form Second Normal Form Third Normal Form

Normalization u First Normal Form: Relationships between primary key and each attribute must be

Normalization u First Normal Form: Relationships between primary key and each attribute must be one-to-one; ie. , remove repeating group. u Second Normal Form: All non-key elements are dependent upon the entire primary key rather than any part thereof. u Third Normal Form: Elimination of the dependence of non-key field upon any other field excepts the primary keys.

PK: Primary Key FK: Foreign Key NN: No Null ND: No duplicate Order Part

PK: Primary Key FK: Foreign Key NN: No Null ND: No duplicate Order Part Relationship Order/Part Order. No (PK) part-no Qty PK PK NN partname ND FK + FK 1 1 123 Nut 3 5 123 Bolt

First Normal Form Item Table Item No Qty-Store-1 Qty-Store-2 Qty-Store-3 PK 101 3000 4000

First Normal Form Item Table Item No Qty-Store-1 Qty-Store-2 Qty-Store-3 PK 101 3000 4000 5000 The above is an violation of first normal form because there exists a repeated group.

Rule Number 1 u For each occurrence of an entity, there is only one

Rule Number 1 u For each occurrence of an entity, there is only one and only one value for each its attributes. Attributes with repeating values form at least one new entity. u N other words, relationship between primary key and each attribute must be one-to-one.

Possible Solution Store/Item Store ID PK S 1 S 2 Item. No Qty Sold

Possible Solution Store/Item Store ID PK S 1 S 2 Item. No Qty Sold + FK FK S 1 101 3000 S 2 102 4000

Second Normal Form Student/Course No Student No PK FK ST 01 ST 02 Grade

Second Normal Form Student/Course No Student No PK FK ST 01 ST 02 Grade Teacher code Course Name + FK FK 100 200 3. 0 4. 0 T 2 T 1 Math CS Both course name and student name should be removed because They are not related to the entire student/course primary key. Lee Doe

Possible Solution Student No Student Name Course No Student/Course Name

Possible Solution Student No Student Name Course No Student/Course Name

Rule Number 2 u Each attribute must be related to the entire primary key.

Rule Number 2 u Each attribute must be related to the entire primary key.

Second Normal Process Order No PK 1 3 Part Name Part. No Order-Dt Pt-price

Second Normal Process Order No PK 1 3 Part Name Part. No Order-Dt Pt-price PK 1/2/01 1 Nut 1. 5 1/3/01 5 Bolts 2. 0 Order/Part Orde r No PK 11 3 Partno QTY + 1 123 5 123 How about Putting Part. Name In Order/part Table?

Third Normal Form COURSE Course Id Course Name Dept -Id Teacher Code Dept Name

Third Normal Form COURSE Course Id Course Name Dept -Id Teacher Code Dept Name Teacher Name PK MH 400 Math A 1 CS 401 DB CS T 1 T 2 Math DOE CS Lee The relationship between any two non-primary key components must not be one-to-one. What’s wrong with the above?

Rule Number 3 u The relationship between any two non-primary key components must not

Rule Number 3 u The relationship between any two non-primary key components must not be one-t-one; ie. , remove tables within tables.

The Normal Process Order Customer Cust-Id Cust-Name PK 1 Lee 3 Sato Order ID

The Normal Process Order Customer Cust-Id Cust-Name PK 1 Lee 3 Sato Order ID PK 1 5 Order Cust-Id DT FK 1/2/ 01 1 1/5/21 3 It would be a violation of third normal form to place cust-name in the order table.

Why Reasons: 1. 2. 3. 4. One-to-one relationship between two non-primary key columns (Cus-Id

Why Reasons: 1. 2. 3. 4. One-to-one relationship between two non-primary key columns (Cus-Id and Cust-name). Redundancy An update anomaly (when a customer name was changed) Worse yet when a new name was added (the name could not be stored until the customer placed at least one order)

Identify Integrity Constraints –To identify and document the integrity constraints given in the user’s

Identify Integrity Constraints –To identify and document the integrity constraints given in the user’s view of the enterprise. This includes identifying: » Required data » Referential integrity » Attribute domain constraints » Enterprise constraints » Entity integrity

4. Complete and Standardize the Data Elements As a result of the modeling process

4. Complete and Standardize the Data Elements As a result of the modeling process via the preceding procedure, an information model will emerge. The information model can be used as input (for ex. , via Erwin Tool) to generate a data definition language (DDL) which in turn is used as input for physical data model. The information model (also called logical data model) fulfills the data requirements of the system.

Complete and Standardize the Data Elements (Continued) For each entity, identify the associated list

Complete and Standardize the Data Elements (Continued) For each entity, identify the associated list of attributes. For each element, specify the following: » Permitted value » Coding and editing rules » Dimensions » Length » Value » Frequency

5. Package the model for physical data modeling and system construction. To do this,

5. Package the model for physical data modeling and system construction. To do this, one must have: 1. 2. 3. A normalized entity relationship diagram. A description of table and column definitions. A description of data architecture standards.

6. Evaluate Quality of Data For Conversion If the data modeling is part of

6. Evaluate Quality of Data For Conversion If the data modeling is part of re-engineering efforts, we must also document: u u u Condition of the data of the existing system Impacts on the new and enhanced system. Conversion rules

7. Validate and Verify the Data Model u Validation (dynamic): Prototyping is used to

7. Validate and Verify the Data Model u Validation (dynamic): Prototyping is used to validate and refine the model. u Verification (static): Inspection or walk-through Inspections for entity necessity, relationship necessity, and attribute allocation.

7 Validate and Verify the Data Model (continued) u Validate Model against User Transactions

7 Validate and Verify the Data Model (continued) u Validate Model against User Transactions To ensure that the logical data model supports the transactions that are required by the user view. (Prototyping is a good tool) u Draw Entity-Relationship Diagram To draw an Entity-Relationship (ER) diagram that is a logical representation of the data given in the user’s view of the enterprise.

Validate Model Against User Transaction u Example transactions (a) Insert details for new members

Validate Model Against User Transaction u Example transactions (a) Insert details for new members of staff. (b) Delete details of a member of staff, given the staff number.

Topic 6: Physical Data Modeling-An Overview Step 1 Translate global logical data model for

Topic 6: Physical Data Modeling-An Overview Step 1 Translate global logical data model for target DBMS u Step 2 Design physical representation u Step 3 Design security mechanisms u

Step 1: Translate global logical data model for target DBMS u To produce a

Step 1: Translate global logical data model for target DBMS u To produce a basic working relational database schema from the global logical data model u Design base relations for target DBMS u To decide how to represent the base relations we have identified in the global logical data model in the target DBMS. u Design enterprise constraints for target DBMS u To design the enterprise constraints for the target DBMS.

Step 2 : Design physical representation To determine the file organizations and access methods

Step 2 : Design physical representation To determine the file organizations and access methods that will be used to store the base relations; that is, the way in which relations and tuples will be held on secondary storage. u u u 2. 1 Analyze transactions 2. 2 Choose file organizations 2. 3 Choose secondary indexes 2. 4 Consider the introduction of controlled redundancy 2. 4 Estimate disk space requirements

Step 2 : Design physical representation (Continued) u 2. 1 Analyze transactions u To

Step 2 : Design physical representation (Continued) u 2. 1 Analyze transactions u To understand the functionality of the transactions that will run on the database and to analyze the important transactions. u 2. 2 Choose file organizations u To determine an efficient file organization for each base relation.

Typical Disk Configuration

Typical Disk Configuration

Analyze Transactions u For each Transaction associated with the components of the data model

Analyze Transactions u For each Transaction associated with the components of the data model (usually predefined queries including view, trigger, procedure, function and package), it needs to be broken down into further smaller units of work:

Transactions Analysis (continued) u A. Transformation Rules: Describe the rules (R, U, I, D)

Transactions Analysis (continued) u A. Transformation Rules: Describe the rules (R, U, I, D) or algorithms used to transform data received into data generated. u B. Edit and Error Rules: Define the rules validating data received and the method of processing erroneous data. u C. Sequence Analysis: Describe under what conditions this transaction is performed and what rules determine which transaction will be performed next.

Cross-referencing Transactions and Relations

Cross-referencing Transactions and Relations

Transactions Analysis (continued) u D. Audit Rules: Describe the rules required to audit the

Transactions Analysis (continued) u D. Audit Rules: Describe the rules required to audit the activity performed within this transaction. u E. Security Rules: Define the security required to invoke the transaction or various facets of the transaction.

Transactions Analysis (continued) u F. Frequency of execution: Define the number of times this

Transactions Analysis (continued) u F. Frequency of execution: Define the number of times this transaction is performed in a fixed period of time. u G. Type of transaction mode: Describe whether the transaction is batch, on demand, or interactive.

Example - Sample Transactions

Example - Sample Transactions

ER Model for Sample Transactions showing Expected Occurrences

ER Model for Sample Transactions showing Expected Occurrences

Analysis of Selected Transaction C

Analysis of Selected Transaction C

Step 2 Design Physical Representation (continued) u 2. 3 Choose secondary indexes u To

Step 2 Design Physical Representation (continued) u 2. 3 Choose secondary indexes u To determine whether adding secondary indexes will improve the performance of the system. u 2. 4 Consider the introduction of controlled redundancy u To determine whether introducing redundancy in a controlled manner by relaxing the normalization rules will improve the performance of the system.

Step 2. 3 Choose secondary indexes Data File: The file contains the logical record.

Step 2. 3 Choose secondary indexes Data File: The file contains the logical record. Index File: The file contains the index file. u The values in the index file are ordered per the indexing field which is usually based on a single attribute.

Indexes u u u Primary index: The indexing field is guaranteed to have a

Indexes u u u Primary index: The indexing field is guaranteed to have a unique value. Secondary Index: An index that is defined on a nonordering field of of the data. Clustering index: If the index field is not a key field of the file, so that there can be more than one record corresponding to a value of the indexing field.

Step 2. 4 Consider the introduction of controlled redundancy u u Simplified Relation with

Step 2. 4 Consider the introduction of controlled redundancy u u Simplified Relation with Derived Attribute Duplicating Attribute Setting up Lookup Table Duplicating Foreign Key

Step 2 Design Physical Representation (Continued) u 2. 5 Estimate disk space requirements u

Step 2 Design Physical Representation (Continued) u 2. 5 Estimate disk space requirements u To estimate the amount of disk space that will be required by the database.

Step 3 Design Security Mechanisms u 3. 1 Design user views u To design

Step 3 Design Security Mechanisms u 3. 1 Design user views u To design the user views that were identified in Step 1 of the conceptual database design methodology. u 3. 2 Design access rules u To design the access rules to the base relations and user views.

Use Hotel Case for illustration Guest Hotel_no Guest_no Date_from Date_to Room_no Guest_name Guest_address Hotel_No

Use Hotel Case for illustration Guest Hotel_no Guest_no Date_from Date_to Room_no Guest_name Guest_address Hotel_No Hotel_name City Registration Room 1: 1 1: M Identifying Relationship Room_no Hotel_no Type Price Dependent Entity (Attribute Entity)

Data Partitioning –using Hotel as a case study (How) (What, Why, Who, Where) Conceptual

Data Partitioning –using Hotel as a case study (How) (What, Why, Who, Where) Conceptual Level Subjects (Hotel) Entities Technical considerations Relationships Logical Level Data Elements Frequencies Data Definition Language -DDL(create, Alter, drop tables) Data Manipulation Language (select, insert, delete, update)

Conceptual Data Modeling (Breaking the Subject Hotel into several entities. Guest_no 1: M Hotel_No

Conceptual Data Modeling (Breaking the Subject Hotel into several entities. Guest_no 1: M Hotel_No Books 1: M Room 1: 1 Has 1: M Identifying Relationship Room_no Dependent Entity (Attribute Entity)

Logical Data Modeling Booking Guest Hotel_no Guest_no Date_from Date_to Room_no Guest_name Guest_address Hotel_No Hotel_name

Logical Data Modeling Booking Guest Hotel_no Guest_no Date_from Date_to Room_no Guest_name Guest_address Hotel_No Hotel_name City Room 1: 1 1: M Identifying Relationship Room_no Hotel_no Type Price Dependent Entity (Attribute Entity)

Resolving Referential Attributes & Normalization Item (such as TV, Bed) Item No Qty_Hotel _no-1

Resolving Referential Attributes & Normalization Item (such as TV, Bed) Item No Qty_Hotel _no-1 Qty_ Hotel _no -2 Qty_ Hotel _no -3 PK 101 6 9 14 The above is an violation of first normal form because there exists a repeated group. Relationships between primary key and each attribute must be one-to-one.

Possible Solution Hotel ID Hotel/Item Hotel name PK PK Min-nan H 1 H 2

Possible Solution Hotel ID Hotel/Item Hotel name PK PK Min-nan H 1 H 2 Hotel ID FK H 1 Xiamen H 2 Item. No Qty + FK 101(TV) 102 6 5

Second Normal Form Room/Hotel Room No 101 Hotel No 4 Type double Price 100

Second Normal Form Room/Hotel Room No 101 Hotel No 4 Type double Price 100 Hotel name Xiamen Hotel Name should be removed because it is not related to the entire room/hotel primary key. What happens if one of the hotel names is being changed?

Third Normal Form u The relationship between any two non-primary key components must not

Third Normal Form u The relationship between any two non-primary key components must not be one-t-one; ie. , remove tables within tables.

Third Normal Form City ID Hotel/City name C 2 Hotel name City Id PK

Third Normal Form City ID Hotel/City name C 2 Hotel name City Id PK PK C 1 Hotel ID Las Vegas Seattle H 0 H 1 H 2 Circus Flemingo Holiday C 1 C 2 What happens if a new City name is added to the Hotel/City Table?

What’s the Referential Integrity Hotel vs Room Tables? u Restrict u Cascade u Set

What’s the Referential Integrity Hotel vs Room Tables? u Restrict u Cascade u Set Null

Physical Data Modeling • Data Definition Language -DDL(create, Alter, drop tables) u Data Manipulation

Physical Data Modeling • Data Definition Language -DDL(create, Alter, drop tables) u Data Manipulation Language (select, insert, delete, update)

Data Manipulation Language u u u SELECT DISTINCT COUNT(Guest_No) FROM Booking WHERE Date_From >

Data Manipulation Language u u u SELECT DISTINCT COUNT(Guest_No) FROM Booking WHERE Date_From > 08/01/2000 AND < 08/31/2000);

Data Definition Language -DDL u u CREATE TABLE hotel ( hotel_no char(18) NOT NULL,

Data Definition Language -DDL u u CREATE TABLE hotel ( hotel_no char(18) NOT NULL, hotel_name char(18) NULL ) Note: Primary key is not null.

Topic 7: Prototyping Purposes: To validate and refine the model of a system. Characteristics:

Topic 7: Prototyping Purposes: To validate and refine the model of a system. Characteristics: Prototyping is a discipline of interactive experimentation to stimulate user feedback. Approach: A prototype is a materialization of modeling process by building rapidly and simulating the essential aspects of the system.

Software Modeling/Prototyping With Built-in Testing Validation/Project Evaluation & Control Existing System Target System Physical

Software Modeling/Prototyping With Built-in Testing Validation/Project Evaluation & Control Existing System Target System Physical Model Built-In Testing Validation Prototyping Project Evaluation/Control Logical Model Approach: Structured or Spiral or Iterative Approach Rapid Application Development CASE tools; Deliverable Templates

Types of Prototyping u Nonworking Scale Prototyping (Mock-up) u Straw man (exploratory) prototyping (scale

Types of Prototyping u Nonworking Scale Prototyping (Mock-up) u Straw man (exploratory) prototyping (scale model approach) u First Full-Scale Prototyping (Real world model)

Types of Prototyping Evaluative (mock-up) prototyping (Blue print approach) Use: Visualization; demonstration; inspection u

Types of Prototyping Evaluative (mock-up) prototyping (Blue print approach) Use: Visualization; demonstration; inspection u Advantages. Reduce risk; low cost; resolve uncertainty Early; fast; flexible How: Focuses on a relatively small number of questions to avoid becoming too complex. Generally thought of a throw-away.

Types of Prototyping Straw man (exploratory) prototyping (scale model approach) Use: Experiment; validating and

Types of Prototyping Straw man (exploratory) prototyping (scale model approach) Use: Experiment; validating and refining the conceptual as well as logical data model. u Advantages. Used to improve design; discover things about a proposed system that would otherwise not be revealed. How: Evaluate the impact on work flows; validate ease of use.

Types of Prototyping Evolution prototyping (real world model) Use: Production use. u Advantages. Part

Types of Prototyping Evolution prototyping (real world model) Use: Production use. u Advantages. Part of production exercise. How: Explore functionality of subsystem and system; probe technical feasibility the performance and the integrity of the system.

Stages of Prototyping u Planning u Initial analysis and design u Construction u Tryout

Stages of Prototyping u Planning u Initial analysis and design u Construction u Tryout u Evaluation u Disposal

Dimensions of Prototyping Relationships of any prototyping to its eventual system are characterized along

Dimensions of Prototyping Relationships of any prototyping to its eventual system are characterized along four dimensions: u Focus – for example, a prototype may focus on the functionality, user interface, system integration, reliability, and performance. u Scope– Is a measure of how much of the eventual system the prototype represents.

Dimensions of Prototyping u Depth – is a measure of how deeply it represents

Dimensions of Prototyping u Depth – is a measure of how deeply it represents the behavior of the eventual system. For ex. , a shallow prototype of a message system might display ‘canned’ messages, where a deeper prototype might actually perform communications to provide a more realistic surrogate for the eventual system. u Scale– Is a measure of how its size or performance compares with that of the eventual system.

Use Hotel case for illustration Guest Hotel_no Guest_no Date_from Date_to Room_no Guest_name Guest_address Hotel_No

Use Hotel case for illustration Guest Hotel_no Guest_no Date_from Date_to Room_no Guest_name Guest_address Hotel_No Hotel_name City Room 1: 1 1: M Identifying Relationship Room_no Hotel_no Type Price Dependent Entity (Attribute Entity)

Guest Registration

Guest Registration

Prototyping for Hotel Case u u u Straw man (exploratory) prototyping (scale model approach)-use

Prototyping for Hotel Case u u u Straw man (exploratory) prototyping (scale model approach)-use Microsoft Access to build a form to simulate the registration of guests. Straw man (exploratory) prototyping (scale model approach)- build a small set of tables to accept the data. Evolution prototyping (real world model) –actually building a mini-system with real tables to simulate the system.

Rapid Application Development (RAD) u RAD, or rapid application development, is an object-oriented approach

Rapid Application Development (RAD) u RAD, or rapid application development, is an object-oriented approach to systems development that includes a method of development as well as software tools

RAD Phases u There are three broad phases to RAD: u Requirements planning u

RAD Phases u There are three broad phases to RAD: u Requirements planning u RAD design workshop u Implementation

Requirements Planning Phase u Users and analysts meet to identify objectives of the application

Requirements Planning Phase u Users and analysts meet to identify objectives of the application or system u Oriented toward solving business problems

RAD Design Workshop u Design and refine phase u Use group decision support systems

RAD Design Workshop u Design and refine phase u Use group decision support systems to help users agree on designs u Programmers and analysts can build and show visual representations of the designs and workflow to users u Users respond to actual working prototypes u Analysts refine designed modules based on user responses

Implementation Phase u u As the systems are built and refined, the new systems

Implementation Phase u u As the systems are built and refined, the new systems or partial systems are tested and introduced to the organization When creating new systems, there is no need to run old systems in parallel

RAD and the SDLC u u u RAD tools are used to generate screens

RAD and the SDLC u u u RAD tools are used to generate screens and exhibit the overall flow of the application Users approve the design and sign off on the visual model Implementation is less stressful since users helped to design the business aspects of the system

When to Use RAD u RAD is used when u The team includes programmers

When to Use RAD u RAD is used when u The team includes programmers and analysts who are experienced with it u There are pressing reasons for speeding up application development u The project involves a novel ecommerce application and needs quick results u Users are sophisticated and highly engaged with the goals of the company

Using RAD Within the SDLC u u RAD is very powerful when used within

Using RAD Within the SDLC u u RAD is very powerful when used within the SDLC It can be used as a tool to update, improve, or innovate selected portions of the system

Disadvantages of RAD u u May try and hurry the project too much Loosely

Disadvantages of RAD u u May try and hurry the project too much Loosely documented May not address pressing business problems Potentially steep learning curve for programmers inexperienced with RAD tools

Final Words u Transform data into information by understanding the process u Transform information

Final Words u Transform data into information by understanding the process u Transform information into decisions with knowledge u Transform decisions into results with actions