Introduction to Database CHAPTER 1 INTRODUCTION n n
Introduction to Database CHAPTER 1 INTRODUCTION n n n n n Edited: Wei-Pang Yang, IM. NDHU, 2005 Purpose of Database Systems View of Data Models Data Definition Language Data Manipulation Language Database Users Database Administrator Transaction Management Storage Management Overall System Structure Source: Database System Concepts, Silberschatz etc. 2001 1
Contents Chapter 1: Introduction PART 1 DATA MODELS Chapter 2: Entity-Relationship Model Chapter 3: Relational Model PART 2 RELATIONAL DATABASES Chapter 4: SQL Chapter 5: Other Relational Languages Chapter 6: Integrity and Security Chapter 7: Relational Database Design PART 3 OBJECT-BASED DATABASES AND XML PART 4 DATA STORAGE AND QUERYING Chapter 11: Storage and File Structure Chapter 12: Indexing and Hashing Edited: Wei-Pang Yang, IM. NDHU, 2005 Source: Database System Concepts, Silberschatz etc. 2001 2
Database System: Introduction n Database Management System (DBMS) n Contains a large bodies of information n Collection of interrelated data (database) n Set of programs to access the data Goal of a DBMS: n provides a way to store and retrieve database information that is both • convenient and • efficient. Functions of DBMS: Management of Data (MOD) n Defining structure for storage data n Providing mechanisms for manipulation of data n Ensure safety of data (system crashes, unauthorized access, misused, …) n Concurrent control in multi-user environment § Computer Scientists: developed a lot of concepts and technique for MOD § concepts and technique form the focus of this book, and this course Edited: Wei-Pang Yang, IM. NDHU, 2005 Source: Database System Concepts, Silberschatz etc. 2001 3
1. 1 Database System Applications n n Database Applications: n Banking: all transactions n Airlines: reservations, schedules n Universities: registration, grades, student profile, . . n Sales: customers, products, purchases n Manufacturing: production, inventory, orders, supply chain n Human resources: employee records, salaries, tax deductions Databases touch all aspects of our lives Edited: Wei-Pang Yang, IM. NDHU, 2005 Source: Database System Concepts, Silberschatz etc. 2001 4
1. 2 Database Systems vs. File Systems n n In the early days, database applications were built on top of file systems Drawbacks of using file systems to store data: n n Data redundancy and inconsistency • Multiple file formats, duplication of information in different files Difficulty in accessing data • Need to write a new program to carry out each new task Data isolation — multiple files and formats Integrity problems • Integrity constraints (e. g. account balance > 0) become part of program code • Hard to add new constraints or change existing ones Edited: Wei-Pang Yang, IM. NDHU, 2005 Source: Database System Concepts, Silberschatz etc. 2001 5
Drawbacks of using file systems (cont. ) n Drawbacks of using file systems to store data: (cont. ) n n Atomicity of updates 原子性 • Failures may leave database in an inconsistent state with partial updates carried out • E. g. transfer of funds from one account to another should either complete or not happen at all Concurrent access by multiple users • Concurrent accessed needed for performance • Uncontrolled concurrent accesses can lead to inconsistencies § E. g. two people reading a balance and updating it at the same time n Security problems Solution Database systems offer solutions to all the above problems Edited: Wei-Pang Yang, IM. NDHU, 2005 Source: Database System Concepts, Silberschatz etc. 2001 6
1. 3 View of Data: Levels of Abstraction n Physical level: describes how a record (e. g. , customer nformation) is stored in disk. n n By sequential file, pointer, or hash structure, … Logical level: describes data stored in database, and the relationships among the data. type customer = record name : string; street : string; city : string; income : integer; end; n View level: application programs hide details of data types. Views can also hide information (e. g. , income) for security purposes. Edited: Wei-Pang Yang, IM. NDHU, 2005 Source: Database System Concepts, Silberschatz etc. 2001 7
View of Data: Three Levels An architecture for a database system Edited: Wei-Pang Yang, IM. NDHU, 2005 Source: Database System Concepts, Silberschatz etc. 2001 8
1. 3. 2 Instances and Schemas n Schema – the logical structure of the database n e. g. , the database consists of information about a set of customers and accounts and the relationship between them n Analogous to type information of a variable in a program n Physical schema: database design at the physical level n Logical schema: database design at the logical level account create table account (account-number char(10), balance integer) type customer = record name : string; street : string; city : integer; end; Edited: Wei-Pang Yang, IM. NDHU, 2005 Source: Database System Concepts, Silberschatz etc. 2001 9
Instances and Schemas (cont. ) n Instance – the actual content of the database at a particular point in time n Analogous to the value of a variable Instance Schema create table account (account-number char(10), balance integer) n Physical Data Independence – the ability to modify the physical schema without changing the logical schema n Applications depend on the logical schema n In general, the interfaces between the various levels and components should be well defined so that changes in some parts do not seriously influence others. Edited: Wei-Pang Yang, IM. NDHU, 2005 Source: Database System Concepts, Silberschatz etc. 2001 10
View of Data: Three Levels An architecture for a database system Physical Data Independence Edited: Wei-Pang Yang, IM. NDHU, 2005 Source: Database System Concepts, Silberschatz etc. 2001 11
0. 2 Architecture for a Database System View 2: Three Levels (補) User A 1 Host Language + DSL 1 User A 2 Host Language + DSL 2 External View @ # & External schema A External/conceptual mapping A < External schema B External View B External/conceptual mapping B Conceptual View Database management system Dictionary (DBMS) e. g. system catalog Conceptual/internal mapping DBA (Build and maintain schemas and mappings) User B 2 User B 3 Host C, C++ Language + DSL DSL (Data Sub. Language) e. g. SQL 3 Conceptual schema User B 1 Host Language + DSL Storage structure definition (Internal schema) Edited: Wei-Pang Yang, IM. NDHU, 2005 Stored database (Internal View) 1 2 3 # Source: Database System Concepts, Silberschatz etc. 2001 . . . 100 & @ 12
1. 4 Data Models n n A collection of tools for describing n data (entities, objects) n data relationships n data semantics n data constraints Data Models: n Entity-Relationship model n Relational model n Object-oriented model n Semi-structured data models n Older models: • Network model and • Hierarchical model Edited: Wei-Pang Yang, IM. NDHU, 2005 Source: Database System Concepts, Silberschatz etc. 2001 13
1. 4. 1 Entity-Relationship Model § Example: Schema in the Entity-Relationship model 帳戶 客戶(存款戶, 貸款戶, 信用卡戶) Edited: Wei-Pang Yang, IM. NDHU, 2005 存款戶 Source: Database System Concepts, Silberschatz etc. 2001 14
Relational Database: A Sample Account A-101 is held by customer Johnson Edited: Wei-Pang Yang, IM. NDHU, 2005 Source: Database System Concepts, Silberschatz etc. 2001 15
Entity Relationship Model (cont. ) n n E-R model of real world n Entities (objects) • E. g. customers, accounts, bank branch n Relationships between entities • E. g. Account A-101 is held by customer Johnson • Relationship set depositor associates customers with accounts Widely used for database design n Database design in E-R model usually converted to design in the Relational model (coming up next) which is used for storage and processing n E-R model (ch. 2) n Relational Model (ch. 3) Edited: Wei-Pang Yang, IM. NDHU, 2005 Source: Database System Concepts, Silberschatz etc. 2001 16
1. 4. 2 Relational Model n Example: Tabular data (instants) in the Relational model Customer-id customername 192 -83 -7465 Johnson 019 -28 -3746 Smith 192 -83 -7465 Johnson 321 -12 -3123 Jones 019 -28 -3746 Smith Edited: Wei-Pang Yang, IM. NDHU, 2005 customerstreet Attributes customercity accountnumber Alma Palo Alto A-101 North Rye A-215 Alma Palo Alto A-201 Main Harrison A-217 North Rye A-201 Source: Database System Concepts, Silberschatz etc. 2001 17
Relational Database: A Sample Edited: Wei-Pang Yang, IM. NDHU, 2005 Source: Database System Concepts, Silberschatz etc. 2001 18
1. 4. 3 Other Data Models n Hierarchical Data Model n Network Data Model n Object-oriented Data Model n Object-relational Data Model n Extensible Markup Language (XML) n … Edited: Wei-Pang Yang, IM. NDHU, 2005 Source: Database System Concepts, Silberschatz etc. 2001 19
Introduction to Database CHAPTER 1 INTRODUCTION n n n n n Edited: Wei-Pang Yang, IM. NDHU, 2005 Purpose of Database Systems View of Data Models Data Definition Language Data Manipulation Language Database Users Database Administrator Transaction Management Storage Management Overall System Structure Source: Database System Concepts, Silberschatz etc. 2001 20
1. 5 Database Languages n Data Definition Language (DDL): n Specification notation for defining the database schema n E. g. create table account (account-number char(10), balance integer) n Data Manipulation Language (DML) n To express database queries or updates n E. g. Select account-number from account where balance >1000 n SQL (Structured Query Language): a single language for both Edited: Wei-Pang Yang, IM. NDHU, 2005 Source: Database System Concepts, Silberschatz etc. 2001 21
1. 5. 1 Data Definition Language (DDL) n Specification notation for defining the database schema E. g. create table account (account-number char(10), balance integer) DDL compiler generates a set of tables stored in a data dictionary (See p. 1 -12) Data dictionary contains metadata (i. e. , data about data) n n n Database schema Data storage and definition language • Language in which the storage structure and access methods used by the database system are specified • Usually an extension of the data definition language Edited: Wei-Pang Yang, IM. NDHU, 2005 Source: Database System Concepts, Silberschatz etc. 2001 22
1. 5. 2 Data Manipulation Language (DML) n Language for accessing and manipulating the data organized by the appropriate data model n DML also known as query language For retrieval, insertion, deletion, modification (update) Two classes of languages n n n Procedural – user specifies what data is required and how to get those data • E. g. … in C Declarative DML (Nonprocedural) – user specifies what data is required without specifying how to get those data • E. g. In SQL: Select account-number from account where balance > 700 SQL is the most widely used query language n n Edited: Wei-Pang Yang, IM. NDHU, 2005 Source: Database System Concepts, Silberschatz etc. 2001 23
SQL (Structured Query Language) n SQL: widely used non-procedural language n E. g. find the name of the customer with customer-id 192 -83 -7465 select customer-name from customer where customer-id = ‘ 192 -83 -7465’ customer Output: customer-name Johnson Edited: Wei-Pang Yang, IM. NDHU, 2005 Source: Database System Concepts, Silberschatz etc. 2001 24
SQL (Structured Query Language) n n E. g. find the balances of all accounts held by the customer with customer-id 192 -83 -7465 select account. balance from depositor, account where depositor. customer-id = ‘ 192 -83 -7465’ and depositor. account-number = account-number Application programs generally access databases through one of n Language extensions to allow embedded SQL n Application program interface (e. g. ODBC/JDBC) which allow SQL queries to be sent to a database Edited: Wei-Pang Yang, IM. NDHU, 2005 Source: Database System Concepts, Silberschatz etc. 2001 25
1. 6 Database Users and Administrators User A 1 Host Language + DSL 1 User A 2 Host Language + DSL 2 External View @ # & External schema A External/conceptual mapping A < External schema B External View B External/conceptual mapping B Conceptual View Database management system Dictionary (DBMS) e. g. system catalog Conceptual/internal mapping DBA (Build and maintain schemas and mappings) User B 2 User B 3 Host C, C++ Language + DSL DSL (Data Sub. Language) e. g. SQL 3 Conceptual schema User B 1 Host Language + DSL Storage structure definition (Internal schema) Edited: Wei-Pang Yang, IM. NDHU, 2005 Stored database (Internal View) 1 2 Source: Database System Concepts, Silberschatz etc. 2001 3 # . . . 100 & @ 26
1. 6. 1 Database Users n n n 單純的 n Application programmers n interact with system through DML calls Sophisticated users 複雜, 多用途 n Submit query without write program n E. g. OLAP (Online analytical processing), data mining tools Specialized users n write specialized database applications that do not fit into the traditional data processing framework n E. g. CAD, expert system, complex data type (graphics, audio) Naive users (end user) n invoke one of the permanent application programs that have been written previously n E. g. people accessing database over the web, bank tellers, clerical staff 辦事員 Edited: Wei-Pang Yang, IM. NDHU, 2005 Source: Database System Concepts, Silberschatz etc. 2001 27
1. 6. 2 Database Administrator n n Database Administrator: n Coordinates all the activities of the database system; n has a good understanding of the enterprise’s information resources and needs. Database Administrator's Duties: n Schema definition n Storage structure and access method definition n Schema and physical organization modification n Granting of authorization for data access n Routine maintenance • Periodically backup database • Upgrade system e. g. disk • Monitoring performance n … Edited: Wei-Pang Yang, IM. NDHU, 2005 Source: Database System Concepts, Silberschatz etc. 2001 28
Introduction to Database CHAPTER 1 INTRODUCTION n n n n n Edited: Wei-Pang Yang, IM. NDHU, 2005 Purpose of Database Systems View of Data Models Data Definition Language Data Manipulation Language Database Users Database Administrator Transaction Management Storage Management Overall System Structure Source: Database System Concepts, Silberschatz etc. 2001 29
1. 7 Transaction Management n n n Transaction: n A transaction is a collection of operations that performs a single logical function in a database application n Atomicity: all or nothing Transaction-management component n ensures that the database remains in a consistent (correct) state, n Failure recovery manager n Failure: • system failures (e. g. , power failures and operating system crashes) • transaction failures. Concurrency-control manager n controls the interaction among the concurrent transactions, to ensure the consistency of the database. Edited: Wei-Pang Yang, IM. NDHU, 2005 Source: Database System Concepts, Silberschatz etc. 2001 30
1. 8 Database System Structure n Components of Database System n Storage Manager • • n Query Require a large amount of space Can not store in main memory Disk speed is slower Minimize the need to move data between disk and main memory DBMS Language Processor Optimizer Operation Processor Query Processor • • • Helps to simplify to access data High-level view Users are not be burdened unnecessarily with the physical details Access Method File Manager Goal of a DBMS: provides a way to store and retrieve data that is both convenient and efficient. p. 1 Edited: Wei-Pang Yang, IM. NDHU, 2005 Source: Database System Concepts, Silberschatz etc. 2001 Database 31
Overall System Structure low-level data stored database. Source: Database System Concepts, Silberschatz etc. 2001 Edited: Wei-Pang Yang, IM. NDHU, 2005 32
1. 8. 1 Storage Management n n n Storage Manager n is a program module n that provides the interface between the low-level data stored and the application programs and queries submitted to the system. Tasks of the Storage Manager: n interaction with the file manager (part of Operating System) n Translates DML into low-level file-system commands, n i. e. responsible for storing, retrieving and updating of data in database Data Structures of the Storage Manager n Data files: store database itself n Data Dictionary: store metadata n Indices: provide fast access to data items that hold particular values Edited: Wei-Pang Yang, IM. NDHU, 2005 Source: Database System Concepts, Silberschatz etc. 2001 33
Storage Management (cont. ) n Components of Storage manager: n Authorization and Integrity Manager • Tests for the satisfaction of integrity constraints • Checks the authority of users to access data n Transaction Manager • Ensure the database in a consistent state (correct) after failures • Ensure that concurrent transaction executions proceed without conflicting n File Manager • Manages the allocation of space on disk • Manages the data structures used to representation data stored n Buffer manager • Fetches data from disk into main memory • Decides what data to cache in main memory Edited: Wei-Pang Yang, IM. NDHU, 2005 Source: Database System Concepts, Silberschatz etc. 2001 34
1. 8. 2 The Query Processor n n DDL Interpreter n Interprets DDL statements n write the definitions (schema, view, . . ) into the data dictionary DML Compiler n n n Translates DML statements into an evaluation plan (or some evaluation plans) which consists low-level instructions Query Optimization: picks the lowest cost evaluation plan Query Evaluation Engine: n execute low-level instructions generated by the DML Compiler Edited: Wei-Pang Yang, IM. NDHU, 2005 Source: Database System Concepts, Silberschatz etc. 2001 35
Example: A Simple Query Processing (補) Query in SQL: DBMS SELECT CUSTOMER. NAME FROM CUSTOMER, INVOICE WHERE REGION = 'N. Y. ' AND AMOUNT > 10000 AND CUTOMER. C#=INVOICE. C Language Processor Internal Form : ( (S Optimizer SP) Operator : SCAN C using region index, create C SCAN I using amount index, create I SORT C? and I? on C# JOIN C? and I? on C# EXTRACT name field Language Processor Query Processor Access Method Storage Manager Operator Processor Calls to Access Method: OPEN SCAN on C with region index GET next tuple. . . Calls to file system: GET 10 th to 25 th bytes from block #6 of file #5 Access Method e. g. B-tree; Index; Hashing File System Edited: Wei-Pang Yang, IM. NDHU, 2005 Source: Database System Concepts, Silberschatz etc. 2001 database 36
1. 9 Application Architectures n Application Structure n User uses database at the site n Users uses database through a network • Client: remote database users work • Sever: database system runs here n Partition of Database Application n Two-tier architecture n Three-tier architecture Edited: Wei-Pang Yang, IM. NDHU, 2005 Source: Database System Concepts, Silberschatz etc. 2001 37
1. 9 Application Architectures ODBC/JDBC n n Two-tier Architecture: e. g. client programs using ODBC/JDBC to communicate with a database Three-tier Architecture: e. g. web-based applications, and applications built using “middleware” Edited: Wei-Pang Yang, IM. NDHU, 2005 Source: Database System Concepts, Silberschatz etc. 2001 38
1. 10 History of Database Systems n n n 1950 s – early 1960: n Tapes: sequentially n Application: Payroll, n Input: punched decks, Output: printer Late 1960 s -- 1970 s: n Disk: direct access n Codd proposed Relational Model, … Turing Award 1980 s: n System R: IBM Res. Lab. IBM DB 2, Oracle, Ingress, DEC Rdb n Replaced Network/Hierarchical model n Research: parallel database, distributed database, object-oriented, … Early 1990 s: n Parallel database n Object-Relational Late 1990 s: n World Wide Web was explosive growth n Database were used much more than ever before n Database had to support Web interfaces to data Edited: Wei-Pang Yang, IM. NDHU, 2005 Source: Database System Concepts, Silberschatz etc. 2001 39
History of Database Systems (補) 1950 -1965 Data Model Network Hierarchical 1980 -1989 1990 -1995 Semantic Object-oriented Logic Relation proposed Relation Merging data models, knowledge-base Relation 1965 -1979 Network Hierarchical Database Hardware Mainframes User Interface None Forms DL/I COBOL+DL/I Program Interface Procedural Presentation and display processing Reports Processing data Edited: Wei-Pang Yang, IM. NDHU, 2005 Procedural Reports Processing data Mainframes Minis PCs Graphics, Menus SQL, QUEL Query-by-forms Embedded Query non-Procedural Report generators Information and transaction processing 1995 -present Object-Oriented OO-relation XML Relation Faster PCs Workstations Parallel Database machines Optical memories Natural language Speech input WWW Web interface Integrated database 4 GL and programming Logic programming language Business graphics Image output Knowledge processing Source: Database System Concepts, Silberschatz etc. 2001 Multimedia 40
- Slides: 42