Chapter 1 Introduction n Purpose of Database Systems

Chapter 1: Introduction n Purpose of Database Systems n View of Data n Data Models n Data Definition Language n Data Manipulation Language n Transaction Management n Storage Management n Database Administrator n Database Users n Database Management System Structure 1. 1 ©Silberschatz, Korth and Sudarshan

What is a Database? n According to the book: Ø Collection of interrelated data Ø Set of programs to access the data Ø A DBMS contains information about a particular enterprise Ø DBMS provides an environment that is both convenient and efficient to use. n Another definition: Ø A database is a collection of organized, interrelated data, typically relating to a particular enterprise Ø A Database Management System (DBMS) is a set of programs for managing and accessing databases 1. 2 ©Silberschatz, Korth and Sudarshan

Some Popular Database Management Systems n Commercial “off-the-shelf” (COTS): Ø Oracle Ø IBM DB 2 (IBM) Ø SQL Server (Microsoft) Ø Sybase Ø Informix (IBM) Ø Access (Microsoft) n Open Source: Ø My. SQL Ø Postgre. SQL Note: This is not a course on any particular DBMS! 1. 3 ©Silberschatz, Korth and Sudarshan

Some Database Applications n Databases touch all aspects of our lives: Ø Banking – accounts, loans, customers Ø Airlines - reservations, schedules Ø Universities - registration, grades Ø Sales - customers, products, purchases Ø Manufacturing - production, inventory, orders, supply chain Ø Human resources - employee records, salaries, tax deductions n Anywhere there is data, there could be a database. n Course context is an “enterprise” that has requirements for: Ø Storage and management of 100’s of gigabytes or terabytes of data Ø Support for 100’s or more of concurrent users and transactions Ø Traditional supporting platform, e. g, Sun Enterprise server, 2 GB RAM, 10 TB of disk space 1. 4 ©Silberschatz, Korth and Sudarshan

Purpose of Database System n In the early days, database applications were built on top of file systems – coded from the ground up. Sometimes this approach is still advocated. n Drawbacks of this approach: Ø Data redundancy and inconsistency Ø Multiple files and formats Ø Difficulty accessing data • A new program is required to carry out each new task Ø Data integrity problems • Integrity constraints (e. g. account balance > 0) become embedded throughout program code • Hard to add new constraints or change existing ones Ø Plus others… 1. 5 ©Silberschatz, Korth and Sudarshan

Purpose of Database Systems (Cont. ) n Database systems offer solutions for the above problems. n Database systems also support: Ø Atomicity of updates - Failures may leave database in an inconsistent state with partial updates • E. g. , transfer of funds from one account to another should either complete or not happen at all Ø Concurrent access by multiple users - Concurrent accessed is needed for performance - Uncontrolled concurrent accesses can lead to inconsistencies • E. g. two people reading a balance and updating it at the same time Ø Data security n Recoding this functionality from scratch is not easy! 1. 6 ©Silberschatz, Korth and Sudarshan

Levels of Abstraction n Physical level: defines low-level details about how a data item (e. g. , customer) is stored on disk. n Logical level: describes data stored in a database, and the relationships among the data. type customer = record name : string; street : string; city : integer; end; n View level: defines how information is presented to users. Views can also hide details of data types, and information (e. g. , salary) for security purposes. 1. 7 ©Silberschatz, Korth and Sudarshan

View of Data An “architecture” for a database system: 1. 8 ©Silberschatz, Korth and Sudarshan

View of Data, Cont. n In general, the interfaces between the three levels should be defined so that changes in some parts do not seriously influence others. n Carefully defining the interfaces helps enhance Physical Data Independence, which is the ability to modify the physical schema without changing the logical schema. 1. 9 ©Silberschatz, Korth and Sudarshan

Instances vs. Schemas n Similar to types and variables in programming languages n Schema – defines the structure or design of a database Ø Analogous to type information of a variable in a program - E. g. , a database might consists of information about a set of customers and accounts and the relationship between them Ø More precisely: - Logical schema: database design at the logical level - Physical schema: database design at the physical level n Instance – a database and its’ contents at one point in time Ø Analogous to a variable and its’ value 1. 10 ©Silberschatz, Korth and Sudarshan

What is a Data Model? n The phrase “data model” is used in a couple of different ways. n The phrase (use #1) is frequently used to refer to an overall approach or philosophy for database design and development. n For those individuals, groups and corporations that subscribe to a specific data model, that model permeates all aspects of database design, development, implementation, etc. n Current data models: Ø Entity-Relationship model Ø Relational model Ø Object-oriented model Ø Object-relational model Ø Semi, and non-structured data models n Legacy models: Ø Network Ø Hierarchical 1. 11 ©Silberschatz, Korth and Sudarshan

What is a Data Model, Cont? n During the early phases of database design and development, a “data model” is frequently developed (use #2). n The purpose of developing the data model is to define: Ø Data Ø Relationships between data items Ø Semantics of data items Ø Constraints on data items In other words, a data model defines the logical schema, i. e. , the logical level of design of a database. n A data model is typically conveyed as one or more diagrams. n The type of diagrams used depends on the overall approach or philosophy (i. e. , the data model, as defined in the first sense). n This early phase is referred to as data modeling. 1. 12 ©Silberschatz, Korth and Sudarshan

Entity-Relationship Model Example of an entity-relationship diagram: n Widely used for database modelling. n An ER model is converted to tables in a relational database. 1. 13 ©Silberschatz, Korth and Sudarshan

Relational Model Attributes n Example of tabular data in the relational model customer-id n customer-name customer-street customer-city account-number 192 -83 -7465 Johnson Alma Palo Alto A-101 019 -28 -3746 Smith North Rye A-215 192 -83 -7465 Johnson Alma Palo Alto A-201 321 -12 -3123 Jones Main Harrison A-217 019 -28 -3746 Smith North Rye A-201 From a data modeling perspective, which approach is preferable? ER model, or the relational model? 1. 14 The ©Silberschatz, Korth and Sudarshan

A Sample Relational Database 1. 15 ©Silberschatz, Korth and Sudarshan

Data Definition Language (DDL) n Notation and program for defining a (physical) database schema (see page 129 for a more complete example): create table account ( account-number char(10), balance integer) n Given a DDL file, the DDL compiler generates a set of tables n A description of those tables is stored in a data dictionary: Ø Contains information from the database schema Ø Frequently referred to as metadata (i. e. , data about data) n Data storage and definition language: Ø Language in which the storage structure and access methods used by the database system are specified Ø Usually an extension of the data definition language 1. 16 ©Silberschatz, Korth and Sudarshan

Data Manipulation Language (DML) n Language for accessing and manipulating the data Ø DML is also known as query language n Two classes of DML languages: Ø Procedural – user specifies what data is required and how to get that data Ø Non-procedural – user specifies what data is required without specifying how to get that data n SQL is the most widely used query language Ø Usually referred to as a non-procedural query language 1. 17 ©Silberschatz, Korth and Sudarshan

SQL Examples n Find the name of the customer with customer-id 192 -83 -7465: select from where customer-name customer-id = ‘ 192 -83 -7465’ n Find the balances of all accounts held by the customer with customer-id 19283 -7465: select account. balance from depositor, account where depositor. customer-id = ‘ 192 -83 -7465’ and depositor. account-number = account-number n Databases are typically accessed by: Ø Users through a command line interface Ø Application programs that (generally) access them through: • Language extensions to allow embedded SQL • An application program interface (e. g. ODBC/JDBC) which allow SQL queries to be sent to a database 1. 18 ©Silberschatz, Korth and Sudarshan

Database Users n Users are differentiated by the way they interact with the system n Naïve users – invoke application programs that have been written previously Ø E. g. people accessing a database over the web, bank tellers, clerical staff, ATM users n Application programmers – interact with the system by making DML calls through an API, e. g. , ODBC or JDBC from within a computer program n Sophisticated users – form requests in a database query language, typically submitted at the command-line n Specialized users – write specialized database applications that do not fit into the traditional data processing framework 1. 19 ©Silberschatz, Korth and Sudarshan

Database Administrator (DBA) n Coordinates all the activities of the database system; the DBA has a good understanding of the enterprise’s information resources and needs. n DBA duties include: Ø Granting user authority to access the database Ø Acting as liaison with users Ø Installing and maintaining DBMS software Ø Monitoring performance and performance tuning Ø Backup and recovery n According to the book, the DBA is also responsible for: Ø Schema definition Ø Storage structure and access method definition Ø Schema and physical organization modification Ø Specifying integrity constraints Ø Responding to changes in requirements n These latter tasks are typically performed by a DB designer. 1. 20 ©Silberschatz, Korth and Sudarshan

Transaction Management n A transaction is a collection of operations that performs a single logical function in a database application n The backup and recovery components of a DBMS ensure that the database remains in a consistent (correct) state despite failures: Ø system, power, network failures Ø operating system crashes Ø transaction failures. n The concurrency-control manager in a DBMS controls the interaction among the concurrent transactions, to ensure the consistency of the database. 1. 21 ©Silberschatz, Korth and Sudarshan

Storage Management n The storage manager in a DBMS provides the interface between the low-level data stored in the database and the application programs and queries submitted to the system. n The storage manager is responsible to the following tasks: Ø interaction with the file manager Ø efficient storing, retrieving and updating of data n Note that the DBMS may or may not make use of the facilities of the operating systems’ file management facilities. 1. 22 ©Silberschatz, Korth and Sudarshan

Overall System Structure Query Optimizer 1. 23 ©Silberschatz, Korth and Sudarshan

Application Architectures: Ø Mainframe – client programs and DBMS reside on one platform. Ø Two Tier – client programs and DBMS reside on different platforms; clients connect to DBMS via an API such as ODBC/JDBC. Ø Three Tier – client programs, application server (or other “middleware”), and DBMS; clients connect to DBMS indirectly through the application server (also via an API). Typically used in web-based applications. Ø N Tier – recent generalization of 2 and 3 tier architectures. 1. 24 ©Silberschatz, Korth and Sudarshan
- Slides: 24