d de lu ti nc SQL DDL DML

Database: Review What is a database? Introduction to the database systems Sept. 2009 The

Database: Review The main characteristics of the database approach: single repository of data •

Database: Review Data modeling using ER-model Sept. 2009 Entity-relationship model - Entity types -

Database: Review ER-model: fname lname minit name sex ssn name number address startdate employee

Database: Review Database schema, Schema evolution, Database state Concepts and Architecture Working process with

Database: Review Database schema Course CName CNo Cr. Hrs Dept Database 8803 3 CS

Database: Review Working process with a database system: Definition • record structure • data

Database: Review Database Management System (DBMS) • collection of software facilitating the definition, construction

Database: Review Three-schema architecture External view Describes the whole database for all users Conceptual

Database: Review external hashing static hashing & dynamic hashing hash function mathematical function that

Database: Review External hashing: the data are on the disk. Static hashing: using a

Database: Review Linear hashing: 1. What is a phase? 2. When to split a

Database: Review Linear hashing: initially hash file contains M buckets hi = key mod

Database: Review phase 2: insertion using h 1 = key mod (2 M) splitting

Database: Review Linear Hashing including two Phases: - collision resolution strategy: chaining - split

Database: Review The first phase – phase 0 • when inserting the sixth record

Database: Review insert(5) 8 0 Sept. 2009 1 1 1 5 1 2 14

Database: Review insert(10) 8 0 8 1 5 1 1 5 2 14 3

Database: Review 8 0 1 1 2 14 3 2 3 overflow 4 4

Database: Review insert(7) 8 1 2 14 3 4 n=2 load factor: 9/12=0. 75

Database: Review 8 1 2 10 3 7 4 5 14 n=3 load factor:

Database: Review 8 24 1 2 10 3 7 4 5 14 n=3 load

Database: Review 8 24 1 2 10 3 4 5 14 7 n=4 The

Database: Review 8 24 1 17 2 10 3 4 5 14 7 n=0

Database: Review 8 24 1 17 2 10 3 5 13 4 14 7

Database: Review insert(15) 1 17 2 10 3 3 4 5 13 14 7

Database: Review tree - root, internal, leaf, subtree - parent, child, sibling Multi-level index

Database: Review B+-tree Structure non-leaf node (internal node or a root) • < P

Database: Review B+-tree Structure leaf node (terminal node) • < (K 1, Pr 1),

Database: Review A B+-tree pinternal = 3, pleaf = 2. 5 3 1 3

Database: Review B+-tree insertion: leaf node splitting, internal node splitting Leaf splitting When a

Database: Review Internal node splitting If an internal node splits and it is not

Database: Review Internal node splitting When a new root is formed, a key value

Database: Review Deleting nodes from a B+-tree: 1. When deleting a key from a

Database: Review Entry deletion - deletion sequence: 8, 12, 9, 7 5 7 3

Database: Review Entry deletion - deletion sequence: 8, 12, 9, 7 5 3 1

Database: Review Entry deletion - deletion sequence: 8, 12, 9, 7 5 3 A

Database: Review Entry deletion - deletion sequence: 8, 12, 9, 7 3 1 Sept.

Database: Review Relational Data Model Data modeling using Relational model Relational algebra - relation

Database: Review Integrity Constraints • any database will have some number of constraints that

Database: Review Integrity Constraints • If a relation schema has more than one key,

Database: Review Integrity Constraints 3. entity integrity • no part of a PK can

Database: Review Updating and constraints insert • Insert the following tuple into EMPLOYEE: <‘Cecilia’,

Database: Review Updating and constraints delete • Delete the WORK_ON tuple with Essn =

Database: Review cascade – a strategy to enforce referential integrity Employee ssn . .

Database: Review Modify – a strategy to enforce referential integrity Employee ssn . .

Database: Review Relational Algebra a set of relations relation specific a set of operations

Database: Review Relational algebra Retrieve for each female employee a list of the names

Database: Review Query: Retrieve the name of employees who work on all the projects

Database: Review Division The DIVISION operator can be expressed as a sequence of ,

Database: Review DDL - creating schemas - modifying schemas SQL DML - select-from-where clause

Database: Review DDL - Examples: • Create schema: Create schema COMPANY authorization JSMITH; •

Database: Review DDL - Examples: • drop schema DROP SCHEMA CAMPANY CASCADE; DROP SCHEMA

Database: Review DML - select-from-where clause Retrieve a list of employees and the projects

Slides: 60

Download presentation

d de lu ti nc SQL: DDL, DML Relational algebra, Relational data model B+-tree Hashing ER-model, Data modeling, 1 ACS-3902 Yangjun Chen Sept. 2009 system architecture, Basic concepts, Introduction no Database: Review Database

Database: Review What is a database? Introduction to the database systems Sept. 2009 The main characters of a database The basic database design method The entity-relationship data model for application modeling Yangjun Chen ACS-3902 2

Database: Review The main characteristics of the database approach: single repository of data • sharable by multiple users • concurrency control and transaction concept • security and integrity constraints • self-describing - system catalogue contains meta data • program-data independence • some changes to the database are transparent to programs/users • multiple views of data - to support individual needs of programs/users Sept. 2009 Yangjun Chen ACS-3902 3

Database: Review Data modeling using ER-model Sept. 2009 Entity-relationship model - Entity types - strong entities - weak entities - Relationships among entities - Attributes - attribute classification - Constraints - cardinality constraints - participation constraints ER-to-Relation-mapping Yangjun Chen ACS-3902 4

Database: Review ER-model: fname lname minit name sex ssn name number address startdate employee bdate 1 department 1 1 M hours supervisee 1 N works on supervision M controls manages M supervisor works for N salary 1 location dependents of number of employees N project name number location N dependent name Sept. 2009 sex birthdate relationship Yangjun Chen ACS-3902 5

Database: Review Database schema, Schema evolution, Database state Concepts and Architecture Working process with a database system Database system architecture Data independence concept Sept. 2009 Yangjun Chen ACS-3902 6

Database: Review Database schema Course CName CNo Cr. Hrs Dept Database 8803 3 CS Relation schema Schema evolution Database state C Student Name St. No Class Major Smith 17 1 CS Brown 8 2 CS Section SId CNo 32 8803 25 8803 43 2606 Sept. 2009 Semester Yr Spring 2000 Winter 2000 Spring 2000 Yangjun Chen 2606 3 CS Grades St. No Sid Grade 17 25 A 17 43 B Instructor Smith Jones ACS-3902 7

Database: Review Working process with a database system: Definition • record structure • data elements • names • data types • constraints etc Sept. 2009 Construction • create database files • populate the database with records Yangjun Chen ACS-3902 Manipulation • querying • updating 8

Database: Review Database Management System (DBMS) • collection of software facilitating the definition, construction and manipulation of databases DBMS Request manager Users/ actors Sept. 2009 Storage manager, Query evaluation Yangjun Chen ACS-3902 Meta data Stored database 9

Database: Review Three-schema architecture External view Describes the whole database for all users Conceptual schema Physical storage structures and details Internal schema Sept. 2009 Yangjun Chen A specific user or groups view of the database ACS-3902 10

Database: Review external hashing static hashing & dynamic hashing hash function mathematical function that maps a key to a Hashing technique bucket address collision resolution scheme - open addressing - chaining - multiple hashing linear hashing Sept. 2009 Yangjun Chen ACS-3902 11

Database: Review External hashing: the data are on the disk. Static hashing: using a hashing function to map keys to bucket addresses primary area can not be changed collision resolusion scheme: open addressing chaining multiple hashing Dynamic hashing: primary area can be changed linear hashing Sept. 2009 Yangjun Chen ACS-3902 12

Database: Review Linear hashing: 1. What is a phase? 2. When to split a bucket? 3. How to split a bucket? 4. What bucket will be chosen to split next? 5. How do we find a record inserted into a linear hashing file? Sept. 2009 Yangjun Chen ACS-3902 13

Database: Review Linear hashing: initially hash file contains M buckets hi = key mod (2 i M) (i = 0, 1, 2, . . . ) insertion process can be divided into several phases phase 1: insertion using h 0 = key mod M splitting using h 1 = key mod (2 M) splitting rule: overflow of a bucket or if load factor > constant (e. g. , 0. 70) overflow will be put in the overflow area or redistributed through splitting a bucket splitting buckets from n = 0 to n = M- 1 (after each splitting n is increased by 1. Phase 1 finishes when n = M (in this case, the primary area becomes 2 M buckets long) Sept. 2009 Yangjun Chen ACS-3902 14

Database: Review phase 2: insertion using h 1 = key mod (2 M) splitting using h 2 = key mod (4 M) splitting rule: overflow of a bucket or if load factor > constant (e. g. , 0. 70) overflow will be put in the overflow area or redistributed through splitting a bucket splitting buckets from n = 0 to n = 2 M- 1 (after each splitting n is increased by 1. Phase 1 finishes when n = 2 M (in this case, the primary area will contain 4 M buckets. ) phase 3: . . . … h 2 = …, h 3 = …, . . . Sept. 2009 Yangjun Chen ACS-3902 15

Database: Review Linear Hashing including two Phases: - collision resolution strategy: chaining - split rule: load factor > 0. 7 - initially M = 4 (M: size of the primary area) - hash functions: hi(key) = key mod 2 i M (i = 0, 1, 2, …) - bucket capacity = 2 Trace the insertion process of the following keys into a linear hashing file: 3, 2, 4, 1, 8, 14, 5, 10, 7, 24, 17, 13, 15. Sept. 2009 Yangjun Chen ACS-3902 16

Database: Review The first phase – phase 0 • when inserting the sixth record we would have 4 8 1 0 1 2 14 3 2 3 n=0 before the split (n is the point to the bucket to be split. ) • but the load factor 6/8= 0. 75 > 0. 70 and so bucket 0 must be split (using h 1 = Key mod 2 M): 8 2 14 1 0 Sept. 2009 1 3 2 4 3 Yangjun Chen 4 ACS-3902 n=1 after the split load factor: 6/10=0. 6 no split 17

Database: Review insert(5) 8 0 Sept. 2009 1 1 1 5 1 2 14 3 2 3 Yangjun Chen 4 4 4 n=1 load factor: 7/10=0. 7 no split 4 ACS-3902 18

Database: Review insert(10) 8 0 8 1 5 1 1 5 2 14 3 4 4 4 n=1 load factor: 8/10=0. 8 split using h 1. overflow 10 Sept. 2009 Yangjun Chen ACS-3902 19

Database: Review 8 0 1 1 2 14 3 2 3 overflow 4 4 5 5 10 n=2 load factor: 8/12=0. 66 no split Sept. 2009 Yangjun Chen ACS-3902 20

Database: Review insert(7) 8 1 2 14 3 4 n=2 load factor: 9/12=0. 75 split using h 1. overflow 10 8 0 1 1 2 14 3 7 2 3 overflow 5 4 4 5 5 10 Sept. 2009 Yangjun Chen ACS-3902 21

Database: Review 8 1 2 10 3 7 4 5 14 n=3 load factor: 9/14=0. 642 no split. insert(24) 8 Sept. 2009 1 2 10 3 7 Yangjun Chen 4 ACS-3902 5 14 22

Database: Review 8 24 1 2 10 3 7 4 5 14 n=3 load factor: 10/14=0. 71 split using h 1. 8 24 Sept. 2009 1 2 10 3 Yangjun Chen 4 ACS-3902 5 14 7 23

Database: Review 8 24 1 2 10 3 4 5 14 7 n=4 The second phase – phase 1 n = 0; using h 1 = Key mod 2 M to insert and h 2 = Key mod 4 M to split. insert(17) 8 24 Sept. 2009 1 2 10 3 Yangjun Chen 4 ACS-3902 5 14 7 24

Database: Review 8 24 1 17 2 10 3 4 5 14 7 n=0 load factor: 11/16=0. 687 no split. insert(13) 8 24 Sept. 2009 1 17 2 10 3 Yangjun Chen 4 ACS-3902 5 14 7 26

Database: Review 8 24 1 17 2 10 3 5 13 4 14 7 n=0 load factor: 12/16=0. 75 split bucket 0, using h 2. 1 17 Sept. 2009 2 10 3 4 Yangjun Chen 5 13 14 ACS-3902 7 8 24 27

Database: Review insert(15) 1 17 2 10 3 3 4 5 13 14 7 8 24 14 7 15 8 24 n=1 load factor: 13/18=0. 722 split bucket 1, using h 2. 1 17 Sept. 2009 2 10 3 4 Yangjun Chen 5 13 14 ACS-3902 7 15 8 24 28

Database: Review tree - root, internal, leaf, subtree - parent, child, sibling Multi-level index balanced, unbalanced b+-tree - splits on overflow; merge on underflow - in practice it is usually 3 or 4 levels deep search, insert, delete algorithms Sept. 2009 Yangjun Chen ACS-3902 29

Database: Review B+-tree Structure non-leaf node (internal node or a root) • < P 1, K 1, P 2, K 2, …, Pq-1, Kq-1, Pq > (q pinternal) • K 1 < K 2 <. . . < Kq-1 (i. e. it’s an ordered set) • For any key value, X, in the subtree pointed to by Pi • Ki-1 < X Ki for 1 < i < q • X K 1 for i = 1 • Kq-1 < X for i = q • Each internal node has at most pinternal pointers. • Each node except root must have at least pinternal/2 pointers. • The root, if it has some children, must have at least 2 pointers. Sept. 2009 Yangjun Chen ACS-3902 30

Database: Review B+-tree Structure leaf node (terminal node) • < (K 1, Pr 1), (K 2, Pr 2), …, (Kq-1, Prq-1), Pnext > • K 1 < K 2 <. . . < Kq-1 • Pri points to a record with key value Ki, or Pri points to a page containing a record with key value Ki. • Maximum of pleaf key/pointer pairs. • Each leaf has at least pleaf/2 keys. • All leaves are at the same level (balanced). • Pnext points to the next leaf node for key sequencing. Sept. 2009 Yangjun Chen ACS-3902 31

Database: Review A B+-tree pinternal = 3, pleaf = 2. 5 3 1 3 5 7 6 7 8 8 9 12 Records in a file Sept. 2009 Yangjun Chen ACS-3902 32

Database: Review B+-tree insertion: leaf node splitting, internal node splitting Leaf splitting When a leaf splits, a new leaf is allocated • the original leaf is the left sibling, the new one is the right sibling • key and pointer pairs are redistributed: the left sibling will have smaller keys than the right sibling • a 'copy' of the key value which is the largest of the keys in the left sibling is promoted to the parent 22 33 33 12 22 33 44 48 55 12 22 31 33 44 48 55 insert 31 Sept. 2009 Yangjun Chen ACS-3902 33

Database: Review Internal node splitting If an internal node splits and it is not the root, • insert the key and pointer and then determine the middle key • a new 'right' sibling is allocated • everything to its left stays in the left sibling • everything to its right goes into the right sibling • the middle key value along with the pointer to the new right sibling is promoted to the parent (the middle key value 'moves' to the parent to become the discriminator between this left and right sibling) 26 55 55 22 33 22 Insert 26 Sept. 2009 Yangjun Chen ACS-3902 33 34

Database: Review Internal node splitting When a new root is formed, a key value and two pointers must be placed into it. 40 26 55 Insert 40 Sept. 2009 Yangjun Chen ACS-3902 35

Database: Review Deleting nodes from a B+-tree: 1. When deleting a key from a node A, check whether the number of the remaining keys (or pointers) is p/2. 2. If it is not the case, redistribute the keys in the left sibling B or in the right sibling C if it is possible. Otherwise, merge A and B or merge A and C. 3. When redistributing or merging, change the key values in the parent node so that the following condition is satisfied: • < P 1, K 1, P 2, K 2, …, Pq-1, Kq-1, Pq > • K 1 < K 2 <. . . < Kq-1 (i. e. it is an ordered set) • for the key values, X, in the subtree pointed to by Pi • Ki-1 < X <= Ki for 1 < i < q • X <= K 1 for i = 1 • Kq-1 < X for i = q Sept. 2009 Yangjun Chen ACS-3902 36

Database: Review A b+-tree pinternal = 3, pleaf = 2. 5 3 1 3 5 7 6 7 8 8 9 12 Records Sept. 2009 Yangjun Chen ACS-3902 37

Database: Review Entry deletion - deletion sequence: 8, 12, 9, 7 5 7 3 1 3 5 6 7 9 9 12 Deleting 8 causes the node redistribute. Sept. 2009 Yangjun Chen ACS-3902 38

Database: Review Entry deletion - deletion sequence: 8, 12, 9, 7 5 3 1 3 5 7 6 7 9 12 is removed. Sept. 2009 Yangjun Chen ACS-3902 39

Database: Review Entry deletion - deletion sequence: 8, 12, 9, 7 5 3 1 3 5 6 6 7 9 is removed. Sept. 2009 Yangjun Chen ACS-3902 40

Database: Review Entry deletion - deletion sequence: 8, 12, 9, 7 5 3 1 3 5 6 6 Deleting 7 makes this pointer no use. Therefore, a merge at the level above the leaf level occurs. Sept. 2009 Yangjun Chen ACS-3902 41

Database: Review Entry deletion - deletion sequence: 8, 12, 9, 7 5 3 A 5 This point becomes useless. The corresponding node should also be removed. B 1 3 5 6 C For this merge, 5 will be taken as a key value in A since any key value in B is less than or equal to 5 but any key value in C is larger than 5. Sept. 2009 Yangjun Chen ACS-3902 42

Database: Review Entry deletion - deletion sequence: 8, 12, 9, 7 3 1 Sept. 2009 3 5 Yangjun Chen 5 6 ACS-3902 43

Database: Review Relational Data Model Data modeling using Relational model Relational algebra - relation schema, relations - database schema (relational schema), database state - integrity constraints and updating Relational algebra - select, project, join, cartesian product - division - set operations: union, intersection, difference, Sept. 2009 Yangjun Chen ACS-3902 44

Database: Review Integrity Constraints • any database will have some number of constraints that must be applied to ensure correct data (valid states) 1. domain constraints • a domain is a restriction on the set of valid values • domain constraints specify that the value of each attribute A must be an atomic value from the domain dom(A). 2. key constraints • a superkey is any combination of attributes that uniquely identify a tuple: t 1[superkey] t 2[superkey]. - Example: <Name, SSN> (in Employee) • a key is superkey that has a minimal set of attributes - Example: <SSN> (in Employee) Sept. 2009 Yangjun Chen ACS-3902 45

Database: Review Integrity Constraints • If a relation schema has more than one key, each of them is called a candidate key. • one candidate key is chosen as the primary key (PK) • foreign key (FK) is defined as follows: i) Consider two relation schemas R 1 and R 2; ii) The attributes in FK in R 1 have the same domain(s) as the primary key attributes PK in R 2; the attributes FK are said to reference or refer to the relation R 2; iii) A value of FK in a tuple t 1 of the current state r(R 1) either occurs as a value of PK for some tuple t 2 in the current state r(R 2) or is null. In the former case, we have t 1[FK] = t 2[PK], and we say that the tuple t 1 references or refers to the tuple t 2. Example: FK Employee(SSN, …, Dno) Sept. 2009 Yangjun Chen Dept(Dno, … ) ACS-3902 46

Database: Review Integrity Constraints 3. entity integrity • no part of a PK can be null 4. referential integrity • domain of FK must be same as domain of PK • FK must be null or have a value that appears as a PK value 5. semantic integrity • other rules that the application domain requires: • state constraint: gross salary > net income • transition constraint: Widowed can only follow Married; salary of an employee cannot decrease Sept. 2009 Yangjun Chen ACS-3902 47

Database: Review Updating and constraints insert • Insert the following tuple into EMPLOYEE: <‘Cecilia’, ‘F’, ‘Kolonsky’, ‘ 677678989’, ‘ 1960 -04 -05’, ‘ 6357 Windy Lane, Katy, TX’, F, 40000, null, 4> • When inserting, the integrity constraints should be checked: domain, key, entity, referential, semantic integrity update • Update the SALARY of the EMPLOYEE tuple with ssn = ‘ 999887777’ to 30000. • When updating, the integrity constraints should be checked: domain, key, entity, referential, semantic integrity Sept. 2009 Yangjun Chen ACS-3902 48

Database: Review Updating and constraints delete • Delete the WORK_ON tuple with Essn = ‘ 999887777’ and pno = 10. • When deleting, the referential constraint will be checked. - The following deletion is not acceptable: Delete the EMPLOYEE tuple with ssn = ‘ 999887777’ - reject, cascade, modify Sept. 2009 Yangjun Chen ACS-3902 49

Database: Review cascade – a strategy to enforce referential integrity Employee ssn . . . 123456789 . . . Works-on Essn 123456789 . . . Sept. 2009 delete Pno 5 . . . Yangjun Chen delete ACS-3902 50

Database: Review cascade – a strategy to enforce referential integrity Employee ssn . . . supervisor 234589710 123456789 . . . 234589710 null delete Employee ssn . . . supervisor 234589710 123456789 . . . delete 234589710 Sept. 2009 not reasonable null Yangjun Chen ACS-3902 delete 51

Database: Review Modify – a strategy to enforce referential integrity Employee ssn . . . 123456789 . . . Works-on Essn 123456789 . . . delete Works-on Essn Pno null 5 . . . Pno 5 . . . This violates the entity constraint. Sept. 2009 Yangjun Chen ACS-3902 52

Database: Review Relational Algebra a set of relations relation specific a set of operations set operations Sept. 2009 Yangjun Chen ACS-3902 select project join division union intersection difference cartesian product 53

Database: Review Relational algebra Retrieve for each female employee a list of the names of her dependents: FEMALE_EMPS SEX = ‘F’ (EMPLOYEE) EMPNAMES FNAME, LNAME, SSN(FEMALE_EMPS) ACTUAL_DEPENDENTS EMPNAMES DEPENDENT SSN = ESSN RESULT FNAME, LNAME, DEPENDENT_NAME(ACTUAL_DEPENDENTS ) Sept. 2009 Yangjun Chen ACS-3902 54

Database: Review Query: Retrieve the name of employees who work on all the projects that ‘John Smith’ works on. SMITH FNAME = ‘John’ and LNAME = ‘Smith’(EMPLOYEE) SMITH_PNOs PNO(WORK_ON ESSN = SSNSMITH) SSN_PNO ESSN, PNO(WORK_ON) SSNS(SSN) SSN_PNO : SMITH_PNOs RESULT FNAME, LNAME(SSNS * EMPLOYEE) Sept. 2009 Yangjun Chen ACS-3902 55

Database: Review Division The DIVISION operator can be expressed as a sequence of , , and - operations as follows: Z = {A 1, …, An, B 1, …, Bm}, X = {B 1, …, Bm}, Y = Z - X = {A 1, …, An}, T 1 Y( R) R(Z) : S(X) T 2 Y((S T 1) - R) T T 1 - T 2 result Sept. 2009 Yangjun Chen ACS-3902 56

Database: Review DDL - creating schemas - modifying schemas SQL DML - select-from-where clause - group by, having, order by - update - view Sept. 2009 Yangjun Chen ACS-3902 57

Database: Review DDL - Examples: • Create schema: Create schema COMPANY authorization JSMITH; • Create table: Create table EMPLOYEE (FNAME VARCHAR(15) NOT NULL, MINIT CHAR, LNAME VARCHAR(15) NOT NULL, SSN CHAR(9) NOT NULL, BDATE, ADDRESS VARCHAR(30), SEX CHAR, SALARY DECIMAL(10, 2), SUPERSSN CHAR(9), DNO INT NOT NULL, PRIMARY KEY(SSN), FOREIGN KEY(SUPERSSN) REFERENCES EMPLOYEE(SSN), FOREIGN KEY(DNO) REFERENCES DEPARTMENT(DNUMBER)); Sept. 2009 Yangjun Chen ACS-3902 58

Database: Review DDL - Examples: • drop schema DROP SCHEMA CAMPANY CASCADE; DROP SCHEMA CAMPANY RESTRICT; • drop table DROP TABLE DEPENDENT CASCADE; DROP TABLE DEPENDENT RESTRICT; • alter table ALTER TABLE COMPANY. EMPLOYEE ADD JOB VARCHAR(12); ALTER TABLE COMPANY. EMPLOYEE DROP ADDRESS CASCADE; Sept. 2009 Yangjun Chen ACS-3902 59

Database: Review DML - select-from-where clause Retrieve a list of employees and the projects they are working on, ordered by department, within each department, ordered alphabetically by last name, first name: SELECT DNAME, LNAME, FNAME, PNAME FROM DEPARTMENT, EMPLOYEE, WORKS_ON, PROJECT WHERE DNUMBER = DNO AND SSN = ESSN AND PNO = PNUMBER ORDER BY DNAME, LNAME, FNAME Sept. 2009 Yangjun Chen ACS-3902 60