RDBMS RELATIONAL DATABASE MANAGEMENT SYSTEM A Brief History
RDBMS � RELATIONAL � DATABASE � MANAGEMENT � SYSTEM
A Brief History of Data Models � 1950 s file systems, punched cards � 1960 s hierarchical � 1970 s network � 1980 s relational ◦ IMS ◦ CODASYL, IDMS ◦ INGRES, ORACLE, DB 2, Sybase ◦ Paradox, d. Base � 1990 s object oriented and object relational ◦ O 2, Gem. Stone, Ontos
Relational Model � Sets ◦ collections of items of the same type ◦ no order ◦ no duplicates domain � Mappings 1: many: 1 1: 1 many: many range
Definitions � � Database A database consists of a collection of interrelated data. RDBMS A relational database management system (RDBMS) is a database engine/system based on the relational model specified by Edgar F. Codd--the father of modern relational database design--in 1970. A relational database refers to a database that stores data in a structured format, using rows and columns. Relation a subset of the cartesian product of its domains. Given a relation schema R, a relation on that schema r, a set of attributes A 1. . An for that relation then r(R) (dom(A 1) dom(A 2) . . . dom(An)) � Relation Schema denoted by R(A 1, A 2, …, An), is made up of relation name R and list of attributes A 1, A 2, …, An.
� (N)-tuple a set of (n) attribute-value pairs representing a single instance of a relation’s mapping between its domains. � Degree the number of attributes a relation has. � Cardinality a number of tuples a relation has. � Attribute a function on a domain for each instance of the mapping or tuple � Attribute Value the result of the attribute function. Each instance of the mapping is represented by one attribute value drawn from each domain or a special NULL value. Given a tuple t and an attribute A for a relation r, t[A]--> a, where a is the attribute’s value for that tuple. � Domain set of all possible values for an attribute; for attribute A, the domain is represented as dom(A). A domain has a format and a base data type.
Keys � Super. Key a set of attributes whose values together uniquely identify a tuple in a relation � Candidate Key a superkey for which no proper subset is a superkey…a key that is minimal. Can be more than one for a relation � Primary Key a candidate key chosen to be the main key for the relation. One for each relation � Keys can be � Foreign Key composite A foreign key is a column or group of columns in a relational database table that provides a link between data in two tables. It acts as a cross-reference between tables because it references the primary key of another table, thereby establishing a link between them.
Integrity Constraints Integrity constraints guard against accidental damage to the database, by ensuring that authorized changes to the database do not result in a loss of data consistency. � � � Domain Constraints Referential Integrity Assertions Triggers Functional Dependencies
Functional Dependencies � Functional dependencies (FDs) are used to specify formal measures of the "goodness" of relational designs � FDs and keys are used to define normal forms for relations � FDs are constraints that are derived from the meaning and interrelationships of the data attributes
Functional Dependencies (2) set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y � X Y holds if whenever two tuples have the same value for X, they must have the same value for Y If t 1[X]=t 2[X], then t 1[Y]=t 2[Y] in any relation �A instance r(R) Y in R specifies a constraint on all relation instances r(R) � FDs are derived from the real-world constraints on the attributes �X
Inference Rules for FDs a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold � Armstrong's inference rules � Given A 1. (Reflexive) If Y subset-of X, then X Y A 2. (Augmentation) If X Y, then XZ YZ (Notation: XZ stands for X U Z) A 3. (Transitive) If X Y and Y Z, then X Z A 2, A 3 form a sound and complete set of inference rules � A 1,
Introduction to Normalization � Normalization: Process of decomposing unsatisfactory "bad" relations by breaking up their attributes into smaller relations � Normal form: Condition using keys and FDs of a relation to certify whether a relation schema is in a particular normal form ◦ 2 NF, 3 NF, BCNF based on keys and FDs of a relation schema ◦ 4 NF based on keys, multi-valued dependencies
First Normal Form We say a relation is in 1 NF if all values stored in the relation are singlevalued and atomic. 1 NF places restrictions on the structure of relations. Values must be simple. 91. 2914 12
First Normal Form The following in not in 1 NF Emp. Num 123 333 679 Emp. Phone Emp. Degrees 233 -9876 233 -1231 BA, BSc, Ph. D 233 -1231 BSc, MSc Emp. Degrees is a multi-valued field: employee 679 has two degrees: BSc and MSc employee 333 has three degrees: BA, BSc, Ph. D 91. 2914 13
Second Normal Form A relation is in 2 NF if it is in 1 NF, and every non-key attribute is fully dependent on each candidate key. (That is, we don’t have any partial functional dependency. ) • 2 NF (and 3 NF) both involve the concepts of key and non-key attributes. • A key attribute is any attribute that is part of a key; any attribute that is not a key attribute, is a non-key attribute. • Relations that are not in BCNF have data redundancies • A relation in 2 NF will not have any partial dependencies 91. 2914 14
Third Normal Form • A relation is in 3 NF if the relation is in 2 NF and all determinants of non-key attributes are candidate keys That is, for any functional dependency: X Y, where Y is a nonkey attribute (or a set of non-key attributes), X is a candidate key. • This definition of 3 NF differs from BCNF only in the specification of non-key attributes - 3 NF is weaker than BCNF. (BCNF requires all determinants to be candidate keys. ) • A relation in 3 NF will not have any transitive dependencies of non-key attribute on a candidate key through another non-key attribute. 91. 2914 15
Boyce-Codd Normal Form BCNF is defined very simply: a relation is in BCNF if it is in 1 NF and if every determinant is a candidate key. If our database will be used for OLTP (on line transaction processing), then BCNF is our target. Usually, we meet this objective. However, we might denormalize (3 NF, 2 NF, or 1 NF) for performance reasons. 91. 2914 16
In 3 NF, but not in BCNF: student_no course_no instr_no Instructor teaches one course only. Student takes a course and has one instructor. {student_no, course_no} instr_no course_no since we have instr_no course-no, but instr_no is not a Candidate key. 91. 2914 17
Relational Algebra � Formalism for creating new relations from existing ones � Its place in the big picture: Declartive query language SQL, relational calculus Algebra Relational algebra Relational bag algebra Implementation
Relational Algebra � Five ◦ ◦ ◦ operators: Union: Difference: Selection: s Projection: P Cartesian Product: � Derived or auxiliary operators: ◦ Intersection, complement ◦ Joins (natural, equi-join, theta join, semi-join) ◦ Renaming: r
1. Union and 2. Difference � R 1 R 2 � Example: ◦ Active. Employees Retired. Employees � R 1 – R 2 � Example: ◦ All. Employees -- Retired. Employees
What about Intersection ? � It is a derived operator � R 1 R 2 = R 1 – (R 1 – R 2) � Also expressed as a join (will see later) � Example ◦ Unionized. Employees Retired. Employees
3. Selection � Returns all tuples which satisfy a condition � Notation: sc(R) � Examples ◦ s. Salary > 40000 (Employee) ◦ sname = “Smith” (Employee) � The condition c can be =, <, , >, , <>
SSN Name Salary 1234545 John 200000 5423341 Smith 600000 4352342 Fred 500000 SSN Name Salary 5423341 Smith 600000 4352342 Fred 500000 s. Salary > 40000 (Employee)
4. Projection � Eliminates columns, then removes duplicates � Notation: P A 1, …, An (R) � Example: project social-security number and names: ◦ P SSN, Name (Employee) ◦ Output schema: Answer(SSN, Name)
P Name, Salary SSN Name Salary 1234545 John 200000 5423341 John 600000 4352342 John 200000 (Employee) Name Salary John 20000 John 60000
5. Cartesian Product � Each tuple in R 1 with each tuple in R 2 � Notation: R 1 R 2 � Example: ◦ Employee Dependents � Very joins rare in practice; mainly used to express
Natural Join � Notation: � Meaning: � Where: R 1 | | R 2 = PA(s. C(R 1 R 2)) ◦ The selection s. C checks equality of all common attributes ◦ The projection eliminates the duplicate common attributes
Natural Join � R= �R | | S= S= A B B C X Y Z U X Z V W Y Z Z V A B C X Z U X Z V Y Z U Y Z V W
Theta Join A join that involves a predicate � R 1 | | q R 2 = s q (R 1 R 2) � Here q can be any condition � Eq-join A theta join where q is an equality � R 1 | | A=B R 2 = s A=B (R 1 R 2) � Example: � ◦ Employee | | � SSN=SSN Dependents Most useful join in practice
Semijoin �R | S = P A 1, …, An (R | | S) � Where A 1, …, An are the attributes in R � Example: ◦ Employee | Dependents
Finally: RA has Limitations ! � Cannot � Find compute “transitive closure” Name 1 Name 2 Relationship Fred Mary Father Mary Joe Cousin Mary Bill Spouse Nancy Lou Sister all direct and indirect relatives of Fred � Cannot express in RA !!! Need to write C program
- Slides: 31