6 830 Lecture 4 Normalization Recap Simple DB
6. 830 Lecture 4 Normalization Recap Simple. DB Overview Database Internals 3/1/2021
Hobby Schema SSN Name Address Hobby Cost 123 john main st dolls $ 123 john main st bugs $ 345 mary lake st tennis $$ 456 joe first st dolls $ Table key is Hobby, SSN “Wide” schema – has redundancy and anomalies in the presence of updates, inserts, and deletes SSN Entity Relationship Diagram Name Address Person n: n Hobby Cost Name
BCNFify Example for Hobbies Iter 1 Schema S = SSN, H = Hobby, N = Name, A = Addr, C = Cost FDs Iter 2 (S, H, N, A, C) S, H N, A, C S N, A H C violates bcnf Schema FDs (S, N, A) S N, A Schema FDs (S, H, C) S, H C H C key violates bcnf Iter 3 Schema FDs (H, C) H C Schema FDs (S, H)
Non-BCNF Example Account Client Office A Joe 1 B Mary 1 A John 1 C Joe 2 Redundancy! Primary Key is Client, Office Each Account is handled by exactly 1 Office Within a given Office, a given Client has exactly 1 Account FDs: Client, Office Account Office Violates BCNF! (account not a superkey) Splitting into 1. Account, Office and 2. Client Loses correspondence between clients, offices, and accounts
Third Normal Form • This is a minimal example of a table in “third normal form” (3 NF) but not BCNF • All tables in BCNF are also in 3 NF • But 3 NF allows redundancy to “preserve FDs” – Meaning that all FDs can be checked in a single table • Not going to study 3 NF algorithms Account Client Office A Joe 1 B Mary 1 A John 1 C Joe 2 FDs: Client, Office Account Office
DB Core Components • • • Major Components: • --------------------(Query System)-------------- • • • • Admission Control Connection Management (sql) | Parser Memory Management (parse tree) | Rewriter Disk Space Management | (parse tree) Planner Replication Services | (query plan) Optimizer Admin Utilities | (query plan) Executor ----------------(Storage System)--------------Access Methods Buffer Manager Lock Manager Log Manager
Flow of a Query
Query Processing Steps • • Admission Control Query Rewriting Plan Formulation (SQL Tree) Optimization
Query Rewriting
Study Break (Tricky) Flatten this query (departments where number of machines is more than number of employees): SELECT dept. name FROM dept WHERE dept. num-of-machines ≥ (SELECT COUNT(emp. *) FROM emp WHERE dept. name=emp. dept_name) What happens if there is a department with no employees?
Answer SELECT dept. name FROM dept LEFT OUTER JOIN emp ON (dept. name=emp. dept_name ) GROUP BY dept. name HAVING dept. num-of-machines < COUNT(emp. *)
Plan Formulation emp (eno, ename, sal, dno) dept (dno, dname, bldg) kids (kno, eno, kname, bday) Π ename, count �� count > 7 �� agg: count(*), group by ename SELECT ename, count(*) FROM emp, dept, kids AND emp. dno=dept. dno AND kids. eno=emp. eno AND emp. sal > 50000 AND dept. name = 'eecs' GROUP BY ename HAVING count(*) > 7 � eno=eno kids � dno=dno �� name=‘eecs’ dept �� sal>50 k emp
Query Optimization Π ename, count �� count > 7 �� Logical planning: operator ordering (exponential search space) agg: count(*), group by ename � Order? eno=eno kids � Physical planning: operator implementation & access methods (indexes vs heap files) dno=dno �� name=‘eecs’ dept �� Implementation? emp Storage model & access methods? sal>50 k
Physical Layout • Arrangement of records on disk / in memory • Disk / memory are linear, tables are 2 D – ”Row Major” - Row at a time A B C Disk D 1 2 3 0 1 2 3 4 5 6 7 8 9 10 11
Physical Layout • Arrangement of records on disk / in memory • Disk / memory are linear, tables are 2 D – ”Row Major” - Row at a time – “Column Major” Column at a time A A B C Disk D 0 1 2 3 4 5 6 7 8 9 10 11
- Slides: 16