Chapter 9 Database Systems Introduction to CS 1

Chapter 9 Database Systems Introduction to CS 1 st Semester, 2014 Sanghyun Park

Outline ± ± ± ± Database Fundamentals Data Models Relational Model Object-Oriented Databases Maintaining Database Integrity Traditional File Structures Data Mining Social Impact of Database Technology (skip)

Database Fundamentals ± ± ± A database is a collection of data that is ____, not necessarily physically, _____ A database management system (_______) defines, creates, and maintains a database The DBMS also allows users _____ access to data in the database relying on schemas and subschemas A ____ is a description of the entire database structure used by the DBMS A _____ is a description of only that portion of the database ____ to a particular user’s needs Databases have evolved as a means of _____ data storage systems

File vs. Database Organization (1/2) ± ± _____ data ______ problem

File vs. Database Organization (2/2)

Layered Approach to Database Implementation ± A typical database system consists of ____ layers – an _____ layer and a database management layer ± Advantages of this dichotomy ® Allows for the construction and use of _____ tools ® Provides a means for controlling access to the databases ® Achieves data _______

Database Models ± A database model defines the _______ design of data ± The model also describes the ______ between different parts of data ± In the history of database design, three models have been in use: ® Hierarchical model ® Network model ® Relational model

Hierarchical Model ± ± Data are organized as a ____ Each entity has only ___ parent but can have several children

Network Model ± ± Entities are organized in a _____, where some entities can be accessed through _______ paths There is __ hierarchy

Relational Model (1/2) ± ± Data are organized in two-dimensional ______ called _____ There is no hierarchical or network structure imposed on the data

Relational Model (2/2) ± Each column in a relation is called an _____ ± The total number of attributes for a relation is called the ______ of the relation ± Each row in a relation is called a _____ ± The total number of rows in a relation is called the _____ of the relation ± The cardinality of a relation changes when tuples are added or deleted; this makes the database _____

Issues of Relational Design (1/4) ± ± ± Lack of efficiency due to _____ Possibility of information ____ when deleting a tuple The source of the problems is that we have combined more than one concept into a ______ relation

Issues of Relational Design (2/4)

Issues of Relational Design (3/4) ASSIGNMENT relation Job Id Start Date Term Date S 25 X F 5 S 25 Z 3 -1 -1999 10 -1 -2001 5 -1 -2001 4 -30 -2001 * * ± This database contains information about employees in EMPLOYEE relation, about available jobs in JOB relation, and about job history in ASSIGNMENT relation ± Additional information is implicitly available by ____ the information from _____ relations

Issues of Relational Design (4/4) ± Finding the department in which employee 23 Y 34 has worked

Relational Operations: SELECT

Relational Operations: PROJECT

Relational Operations: JOIN (1/3)

Relational Operations: JOIN (2/3)

Relational Operations: JOIN (3/3)

Issues of Implementation ± ± ± The data in a database are actually stored in a ____ storage system A DBMS allows the application software to be written in terms of a _______ It is the duty of the ______ to accept commands in terms of the relational _____ and convert them into actions relative to the actual storage structure The simplest way for a DBMS to implement a relation is to store it as a _____ file To provide rapid access to entries, the DBMS would store the relation as an _______ file or utilize ____ techniques

Structured Query Language (1/2) ± The language called ____ (Structured Query Language) is used extensively by application software programmers for manipulating databases ± A query involving a combination of SELECT, PROJECT, and JOIN operations can be expressed as a _____ SQL statement ± We should read a SQL statement as a description of the information ______ rather than a _____ of activities to be performed not procedural but _____

Structured Query Language (2/2) ± SELECT statement select from where ± INSERT statement insert values ± into EMPLOYEE (‘ 42 Z 12’, ‘Sue A. Burt’, ‘ 33 Fair St. ’, ‘ 444661111’) DELETE statement delete where ± EMPLOYEE. Name, ASSIGNMENT. Start. Date EMPLOYEE, ASSIGNMENT EMPLOYEE. Empl. Id = ASSIGNMENT. Empl. Id from EMPLOYEE Name = ‘G. Jerry Smith’ UPDATE statement update set where EMPLOYEE Address = ‘ 1812 Napoleon Ave. ’ Name = ‘Joe E. Baker’

Maintaining Database Integrity (1/2) ± ______ is a sequence of operations that must all happen together (e. g. , money transfer between accounts) ± Before a transaction is allowed to alter the database, the alteration to be performed is first recorded in the ___ ± The point at which all the steps in a transaction have been recorded in the log is called the ______ ± At this point DBMS becomes ____ to the transaction ± If problems arise before a transaction has reached its commit point, the log can be used to _______ (undo) the activities actually performed by the transaction

Maintaining Database Integrity (2/2) ± Simultaneous access problems ® Incorrect summary problem ® Lost update problem ± ______ is used for preventing others from accessing data being used by a transaction ® Shared lock is used when ____ data ® Exclusive lock is used when _____ data

Traditional File Structures ± A ___ is an external collection of related data treated as a unit ± The primary purpose of a file is to _____ data ± Files are stored in what are known as ____ or _____ storage devices such as disk and tape ± For our purposes, a file is a collection of data _______ with each record consisting of one or more _____ ± The _____ method determines how records can be retrieved: sequentially or randomly

Taxonomy of File Structures

Sequential Files While Not EOF { Read the next record Process the record }

Applications of Sequential Files ± The sequential file is used in applications that need to access ___ records from beginning to end ± For example, if personal information about each employee is stored in a file, we can use sequential access to retrieve each record at the end of the month to print the paychecks ± The sequential file is not efficient for _______ access

Indexed Files ± To access a record in a file randomly, we need to know the _______ of the record ± An indexed file is made of a data file, which is a sequential file, and an _____ ± The index itself is a very ____ file with only two fields: the ___ of the sequential file and the ____ of the corresponding record on the disk

Mapping in an Index File

Logical View of an Indexed File

Hashed Files ± In an indexed file, the index _____ the key to the address; A hashed file uses a ____ to accomplish this mapping ± The hashed file eliminates the need for an ____ file (index) ± However, we will see that hashed files have their own problems

Modulo Division Hashing Method

Collision (1/2) ± The population of keys is greater than the ______ of records in the data file ± Because there are many keys for each address in the file, there is a possibility that more than one key will hash to the _____ address in the file ± We call the set of keys that hash to the same address ____ ± A _______ is the event that occurs when a hashing algorithm produces an address for an insertion key, and that address is already occupied

Collision (2/2)

Collision Resolution: Open Addressing

Collision Resolution: Linked List Resolution

Collision Resolution: Bucket Hashing
- Slides: 39