Fundamentals of Databases Database Design Computer Science Data

Fundamentals of Databases Database Design Computer Science

Data Models • Databases are used whenever we want to store lots of data Literally the base of our data • Can come in lots of different formats (each one changing how we use it) Relational Factual Flat-format • We will be looking at relational databases 2 Fundamentals of Databases: Database Design

Data Models • Before creating any data, or any structures for the data, we first analyse the requirements of the database • This comes into play in two ways – The data requirements • What data do things need in total • For example, a classroom needs a classroom code, year group, and teacher – The data constraints • What can the data not do, or not be grouped with • For example, a classroom cannot have more than one teacher 3 Fundamentals of Databases: Database Design

Data Models • Relational databases work based off of two things The entities that the database contains The relationship between these entities • These types of databases are designed to accurately reflect the world By identifying relationships between entities Entity A single place, person, or thing about which data can be stored • We can use these to plan out a Conceptual Data Model of the database 4 Fundamentals of Databases: Database Design

Data Models • Each entity can also have data associated with them Each piece of data fitting into a property, also known as an attribute • For example, a Person entity might have the following attributes Name Age Gender • Databases contain many records of an entity for each individual person/place/thing – Each record is known as an instance, or occurrence 5 Fundamentals of Databases: Database Design

Data Models • We can write entities out, with their attributes, in a simple description • For example, a Student has a First Name, Surname, Date of Birth, and Current Form Group • We can describe this entity using the following: Student(First. Name, Surname, Date. Of. Birth, Current. Form. Group) 6 Fundamentals of Databases: Database Design

Data Models • All entities need one attribute (or a combination of them) the uniquely identify them • This is known as the Entity Identifier • We can either add an attribute specifically for this (like an ID), or use a combination of attributes (like a first and last name) • Note that entity identifiers should be unique among all entities of one type • Here’s an example (the underlined attributes combine to make the entity identifier) Student(First. Name, Surname, Date. Of. Birth, Current. Form. Group) 7 Fundamentals of Databases: Database Design

Data Models • The individual attributes on their own would be too weak to use as an identifier – As there could by many students with the same first name • However, lots of them combined are more appropriate – As the chance of finding a student with all three being the same are negligible Student(First. Name, Surname, Date. Of. Birth, Current. Form. Group) 8 Fundamentals of Databases: Database Design

• Create a data model for the following situation, and write the entity descriptions the model would use: A system administrator is creating a database for a university. They need to store information about the modules they offer, the professors that teach those modules, and the students that enrol onto those modules. Module have their module ID, their name, and branching subject (like Maths, Computer Science, Art, and so on). Professors have their name and faculty ID. Students have their name, date of birth, course enrolment (like Microbiology, Computer Science and Maths, Software Engineering) and module enrolment. 9 Fundamentals of Databases: Database Design

Data Models • We can represent entities (and their attributes) in a visual format • Each rectangle is an entity, and the lines inside them are their attributes – Where the entity identifier is underlined • Can be referred to as an entity diagram 10 Fundamentals of Databases: Database Design

• For the data model you made in the previous exercise • Create an entity diagram for each entity • Remember to underline the entity identifier for that entity 11 Fundamentals of Databases: Database Design

Entity Relationships • A relationship is an association between two entities Like a House and a Person A House could contain multiple People • Relationships have degrees (known as cardinalities), defining the maximum number of entities one entity can be linked to One-to-One One-to-Many-to-Many 12 Fundamentals of Databases: Database Design

Entity Relationships • Here are some examples of cardinalities in action • Primary care physicians will help many patients One-to-Many • Surgeons perform operations Many-to-Many • Patients have health-care cards One-to-One 13 Fundamentals of Databases: Database Design

• For your data model from before, think about what relationships it would use – And the cardinalities of them A system administrator is creating a database for a university. They need to store information about the modules they offer, the professors that teach those modules, and the students that enrol onto those modules. Module have their module ID, their name, and branching subject (like Maths, Computer Science, Art, and so on). Professors have their name and faculty ID. Students have their name, date of birth, course enrolment (like Microbiology, Computer Science and Maths, Software Engineering) and module enrolment. 14 Fundamentals of Databases: Database Design

Entity Relationships • We can represent relationships between entities in a visual format – Making use of entity diagrams 15 Fundamentals of Databases: Database Design

Entity Relationships • These lines have specific start and end points – Each representing a different cardinality 16 Fundamentals of Databases: Database Design

Entity Relationships • We can flesh out the relationship diagram with entity diagrams to create the full Entity Relationship (ER) Diagram 17 Fundamentals of Databases: Database Design

• Create an ER Diagram using the data model, entity descriptions, and relationships you made for this situation A system administrator is creating a database for a university. They need to store information about the modules they offer, the professors that teach those modules, and the students that enrol onto those modules. Module have their module ID, their name, and branching subject (like Maths, Computer Science, Art, and so on). Professors have their name and faculty ID. Students have their name, date of birth, course enrolment (like Microbiology, Computer Science and Maths, Software Engineering) and module enrolment. 18 Fundamentals of Databases: Database Design

• Create data models (using ER Diagrams) for the following situations – Think about what attributes the entities may need as well A blind person may be given a guide dog. A guide dog will look after just one blind person and a blind person will only have one guide dog. Every car registered in the UK has a unique registration number and one registered keeper. The driver and Vehicle Licensing Agency records the keeper’s name and address. One person may keep several vehicles. A lending library lends books to borrowers. Each borrower may borrow several books. There may be more than one copy of popular books. The library records names and addresses of the borrowers and each borrower has a unique number identifying them on the library’s system. 19 Fundamentals of Databases: Database Design

Relational Data Models • 20 Fundamentals of Databases: Database Design

Relational Data Models • In these models, we would represent entities as tables – Every column in the table being an attribute of that entity – Every row in the table being an instance/occurrence of that entity – The underlined column is the entity identifier 21 Module. Code Module. Name Total. Credits Number. Of. Lessons Hours. Per. Lesson PRO 101 Programming 101 10 12 1 LIN 001 Linear Algebra 10 12 1 GEN 001 Genetic Computing 20 24 2 Fundamentals of Databases: Database Design

• Based off this relational data model for a Module table – Come up with a relational data model for a Teacher table – Should include the teacher’s ID, name, start date, and assigned modules Module. Code Module. Name Total. Credits Number. Of. Lessons Hours. Per. Lesson PRO 101 Programming 101 10 12 1 LIN 001 Linear Algebra 10 12 1 GEN 001 Genetic Computing 20 24 2 • Come up with at least three occurrences to put in the table 22 Fundamentals of Databases: Database Design

Relational Data Models • You may have come up with something like this for the Teacher table ID First. Name Last. Name Start. Date Assigned. Modules 00123456 Johnathan Joseph 02/04/1999 PRO 101, MAL 002 00654321 Abi Pranav 16/06/2010 LIN 001, ALG 001 00524163 Clive Sinclair 21/03/1990 GEN 001 • With both of these tables together, we have an implicit relation – As the Teacher table refers to Assigned. Modules – Which are the Module. Code values from the Module table 23 Fundamentals of Databases: Database Design

Relational Data Models • We have to link data from one table to another to create a relation • The type of relation depends on where and how the values are used • The entity identifier is referred to as the Primary Key (when in its own table) • When this is used in another table, we call it the Foreign Key – In our example earlier, the Assigned. Modules is the Foreign Key for the Module. Code 24 Fundamentals of Databases: Database Design

Relational Data Models • You may have spotted the use of multiple values in a single occurrence for a single column ID First. Name Last. Name Start. Date Assigned. Modules 00123456 Johnathan Joseph 02/04/1999 PRO 101, MAL 002 00654321 Abi Pranav 16/06/2010 LIN 001, ALG 001 00524163 Clive Sinclair 21/03/1990 GEN 001 • This is often frowned upon, as it makes a search for a particular value take longer – After finding the specific occurrence, we would then need to search through those multiple values 25 Fundamentals of Databases: Database Design

Relational Data Models • Having inefficiencies like this are very much a bad thing • So, we often need to think about how we can make our tables more efficient • This is through the act of Normalisation 26 Fundamentals of Databases: Database Design

Normalisation • Databases can store data in any format or organisation style we want • However, some formats are more efficient than others • Normalisation has two goals: Eliminating redundant data Ensuring data dependencies make sense. • The overall aim of this is to reduce the amount of memory space a database takes And reduce the time it takes to find something in a database 27 Fundamentals of Databases: Database Design

Normalisation • There are different levels of normalisation (known as normal forms) • 1 st to 3 rd Normal Form are practically applied in all modern databases • 4 th Normal Form is occasionally found in larger databases • 5 th Normal Form is very rare • We only need to know 1 st to 3 rd 28 Fundamentals of Databases: Database Design

Normalisation (1 st) • 1 st Normal Form starts the efficiency process by eliminating repeating groups from the same table • Then creates separate tables for each group of related data (entity) • Finally, a primary key is identifiable in the table The field that uniquely defines a record Each record must have a unique value for the primary key field 29 Fundamentals of Databases: Database Design

Normalisation (1 st) • Here is a un-normalised example Sales. Order. No Order. Date Customer. No Customer. Na me Customer. Add ress Clerk. No Clerk. Name Item. No Description Unit. Price Quantity 01112 01/01/2009 123 NT Pvt. Ltd. 1, High Street, Maidenhead 101 Fred 101 8697 3456 Ring Binder Divider Stapler 1. 50 0. 50 2. 99 3 4 5 031245 10/02/2009 144 Premier Co. 10, London Road, Reading 110 Gregg 9684 3456 Scissors Stapler 2. 99 2 4 • Note the repeated values in Item. No, Description, Unit. Price, and Quantity 30 Fundamentals of Databases: Database Design

Normalisation (1 st) • Here is the 1 st Normal Form Sales. Order. No Order. Date Customer. No Customer. Na me Customer. Add ress Clerk. No Clerk. Name Item. No Description Unit. Price Quantity 01112 01/01/2009 123 NT Pvt. Ltd. 1, High Street, Maidenhead 101 Fred 101 Ring Binder 1. 50 3 01112 01/01/2009 123 NT Pvt. Ltd. 1, High Street, Maidenhead 101 Fred 8697 Divider 0. 50 4 01112 01/01/2009 123 NT Pvt. Ltd. 1, High Street, Maidenhead 101 Fred 3456 Stapler 2. 99 5 031245 10/02/2009 144 Premier Co. 10, London Road, Reading 110 Gregg 9684 Scissors 2. 99 2 031245 10/02/2009 144 Premier Co. 10, London Road, Reading 110 Gregg 3456 Stapler 2. 99 4 • All we’ve done is unfold the repeated groups onto their own occurrences 31 Fundamentals of Databases: Database Design

Normalisation (2 nd) • 2 nd Normal Form carries on where the 1 st leaves from • However, we also remove any non-primary key dependencies Fields that don’t rely on the primary key to add context • Also creates relationships between tables using foreign keys A primary key from one table that is used in another table to link them 32 Fundamentals of Databases: Database Design

Normalisation (2 nd) • There are quite a few attributes here not relevant to the primary key Sales. Order. No Order. Date Customer. No Customer. Na me Customer. Add ress Clerk. No Clerk. Name Item. No Description Unit. Price Quantity 01112 01/01/2009 123 NT Pvt. Ltd. 1, High Street, Maidenhead 101 Fred 101 Ring Binder 1. 50 3 01112 01/01/2009 123 NT Pvt. Ltd. 1, High Street, Maidenhead 101 Fred 8697 Divider 0. 50 4 01112 01/01/2009 123 NT Pvt. Ltd. 1, High Street, Maidenhead 101 Fred 3456 Stapler 2. 99 5 031245 10/02/2009 144 Premier Co. 10, London Road, Reading 110 Gregg 9684 Scissors 2. 99 2 031245 10/02/2009 144 Premier Co. 10, London Road, Reading 110 Gregg 3456 Stapler 2. 99 4 • They are: – Customer. Name, Customer. Address (only relevant to Customer. No) – Clerk. Name (only relevant to Clerk. No) – Description, Unit. Price (only relevant to Item. No) 33 Fundamentals of Databases: Database Design

Normalisation (2 nd) • We can take those non-relevant attributes and move them into their own tables 34 Customer. No Customer. Name Customer. Address 123 NT Pvt. Ltd. 1, High Street, Maidenhead 144 Premier Co. 10, London Road, Reading Clerk. No Item. No Description Unit. Price 101 Ring Binder 1. 50 8697 Divider 0. 50 Clerk. Name 3456 Stapler 2. 99 101 Fred 9684 Scissors 2. 99 110 Gregg Fundamentals of Databases: Database Design

Normalisation (2 nd) • The original table will then need to be adjusted (with the removal of those attributes) 35 Sales. Order. No Order. Date Customer. No Clerk. No Item. No Quantity 01112 01/01/2009 123 101 3 01112 01/01/2009 123 101 8697 4 01112 01/01/2009 123 101 3456 5 031245 10/02/2009 144 110 9684 2 031245 10/02/2009 144 110 3456 4 Fundamentals of Databases: Database Design

Normalisation (3 rd) • The final of the commonly seen normal forms (once again carries on from 2 nd Normal Form) • Now we eliminate all non-key dependencies Completely throwing away any data that is not relevant to a particular record • Most tables/databases tend to perform 2 nd and 3 rd Normal Forms at the same time Removes non-key dependencies while stripping away partial ones 36 Fundamentals of Databases: Database Design

Normalisation (3 rd) • For example, this table still have repeated records – The Quantity attribute is not only reliant on Sales. Order. No (the Primary Key) – But also on Item. No (as that’s where to get the total price from) 37 Sales. Order. No Order. Date Customer. No Clerk. No Item. No Quantity 01112 01/01/2009 123 101 3 01112 01/01/2009 123 101 8697 4 01112 01/01/2009 123 101 3456 5 031245 10/02/2009 144 110 9684 2 031245 10/02/2009 144 110 3456 4 Fundamentals of Databases: Database Design

Normalisation (3 rd) • We can split Item. No and Quantity into their own table – Using the Sales. Order. No as a Foreign Key Sales. Order. No Item. No Quantity Sales. Order. No Order. Date Customer. No Clerk. No 01112 101 3 01112 01/01/2009 123 101 01112 8697 4 031245 10/02/2009 144 110 01112 3456 5 031245 9684 2 031245 3456 4 • Now the only value that is repeated is the Foreign Key of Sales. Order. No 38 Fundamentals of Databases: Database Design

• Take this database through 1 st, 2 nd, and 3 rd Normal Forms Dept. No Dept. Name Location Mgr. Name Mgr. IDNo Tel. Extn Cust. No Cust. Name Date. Of. Complaint Nature. Of. Complaint 11232 Soap Division Cincinnati Mary Samuel S 11 7711 P 10451 Robert Drumtree 12/01/1998 Poor Service P 10480 Steven Parks 14/01/1998 Discourteous Attendant P 10687 Jenny Baltimore 07/04/1997 Poor Service 11230 Shampoo Division Cincinnati Sam Andrews S 31 7703 • Once done, create an ER Diagram for the 3 rd Normal Form of the database 39 Fundamentals of Databases: Database Design

- Slides: 40