Data Modeling Dr Anil Maheshwari Learning Objectives Why
Data Modeling Dr. Anil Maheshwari
Learning Objectives Why Data Modeling tools Entity Relationship Diagrams
Information Systems, and Context Business Analyst Designers INPUT PROCESS TECHNICAL SIDE BUSINESS SIDE Business Units OUTPUT DATA (Information Systems) End Users Developers Picture developed by Dr. Anil K. Maheshwari
Types of Data Names (nominal) Rank Order (ordinal) Numeric (ratio) Numeric for subjective data (Interval) Alphabetical text – language, finite or unlimited Sound – tone Picture – color, quality, content Video – color, quality, content Metadata – data definitions
Why Data Management? “Data-fication” of the world everything is becoming data Data can be analyzed to predict the future Quality of data is important GIGO: Garbage In, Garbage Out Good quality data will give good quality information Quantity of data generated is doubling every 18 -24 months Big Data: High volume, variety & velocity of data Emails, tweets, transactions, documents, videos, etc Choice: Be buried under the avalanche, or use it for competitive advantage.
Top 10 Requirements for Good Data Management Data should be: 1. Accurate: with consistent values across data stores 2. Persistent: non-volatile, stored and managed 3. Available: when where and how users want to access 4. Accessible: in desired formats, with easy tools 5. Comprehensive: gathered from all relevant sources 6. Analyzable: for historical and predictive purposes 7. Flexible: data variety – small/large, text/video, etc 8. Secure: backed up and protected against loss 9. Cost-effective: cost proportional to estimated value 10. Scalable: to meet emergent demands
Top 10 Terms used in data management 1. Data. Base (DB) – organized data, made of tables/files 2. Data Normalization – technique for organizing data 3. SQL – Structured Query language for searching DB 4. Database management system (DBMS) - software 5. Database Administration – provisioning, tuning of sw 6. Database server – data manager in n-tier architecture 7. Data cleansing – for quality of data 8. Data warehouse – organized data for analytics 9. Data Analytics – data/text/web mining approach 10. Data Mining – statistical & machine learning techniques to generate insights from data 11. Big Data – catch-all for emergent data-driven world
Relational Database Model A relational database is composed of a set of relations (tables), Tables can be JOINed using shared attributes, SELECTed attributes can be PROJECTed for specific purpose It is a mathematical-theory driven model (Codd, 1970) Each data ‘table’ is a set of rows (or records) Each row has a unique key attribute
Accessing information from a Relational Database Data tables can be JOINed using the shared ‘key’ attributes to create larger temporary tables which can be queried to fetch information across tables Joins can be simple as between two tables Joins can also be complex with AND, OR, UNION or INTERSECTION etc of many joins High level commands in Structured Query Language (SQL) can be used to perform joins, selection and organizing of records
Conceptual Data Modeling Data as Entities and Relationships An Entity is any object or event about which someone chooses to collect data A Relationship between entities … involving two or more entities Data structures can be represented as an Entity. Relationship Diagram (ERD)
Entities and Attributes Entity Any object or event about which someone chooses to collect data E. g. Sales person, City, Product, Vehicle, Employee Attributes are data items that have something in common with the entity. E. g. Student ID, Student Name, and Student Address represent details for a Student entity 11
Relationships have degree, cardinality and participation constraints Degree of relationships can be Unary (e. g. Employee and Manager-as-employee) Binary (e. g. Student and course) Ternary (e. g. Vendor, Part, Warehouse) Cardinality represents the extent of relationship of each entity One-to-one (e. g. Employee and Parking Space) One-to-many (e. g. Customer and Orders) Many-to-many (e. g. Student and Course) Participation may be the optional or mandatory Customer and Order (mandatory) Employee and Course (optional)
Key fields A key attribute(s) is one that can be used to identify a particular instance. E. g. Student ID Key types are: Primary key—unique attribute value for the instance (e. g. Student ID) Candidate key—an attribute(s), that can serve as a primary key (Student Address) Secondary key—a key which may not be unique, used to select a group of records (Student city) Composite key—a combination of two or more attributes representing the key (e. g. Flight number and Flight date)
Attribute Values Attributes can be single-valued or multi-valued Single-valued e. g. Student Name Multi-valued e. g. Price history of a product Attribute can be simple or composite Simple e. g. Student name Composite e. g. Student address (composed of street, city, State, etc)
Determine Keys using ERD For a one-to-many relationship the primary key of the file at the one end of the relationship should be contained as a foreign key on the file at the many end of the relationship. For a many-to-many relationship It should be converted into two one-to-many relationships with an associative entity in the middle. The combination of primary keys of the entities participating in the relationship will form the primary key for the associative entity
Sample E-R Diagram 13 -16
Data Modeling : Tennis League A local tennis league wants to create a computer-based system to manage a tournament. A tennis player will fill out an application form to the club. The league will evaluate the application and, if suitable, will send a letter of acceptance to the player. Referees are registered with the league the same way. The league generates a schedule of matches and sends it to the players and referees. After each match, the referee send the results to the league. The league produces an updated tournament table, and sends to all the players.
Thank you.
- Slides: 18