Database Management Systems CSE 590 DB Introduction March
Database Management Systems CSE 590 DB Introduction March 30, 1998
Staff z. Instructor: Alon Levy y. Sieg, Room 310, alon@cs. washington. edu y. Office hours: by appointment z. TA: Rachel Pottinger y. Sieg 223, rap@cs. washington. edu y. Office hours: TBA z Mailing list: cse 590 db@cs z Web page: http: //www. cs. washington. edu/education/courses/590 db/98 sp/
Purpose and Format z. Purpose: y. Foundations of database management systems. y. Introduction to current research issues in databases. z. Format: y. Lectures introducing the main topics y. Student presentations of selected research papers. y. Projects (for 3 -credit takers)
Textbooks (none required) z Database Management Systems (Ramakrishnan) z Foundations of Databases (Abiteboul, Hull & Vianu) z Fundamentals of Database Systems (Elmasri and Navathe) z Database Systems (Silberschatz, Korth and Sudarshan) z Data and Knowledge based Systems (volumes I, II) (Ullman) z Readings in Database Systems (Stonbraker) z Proceedings of SIGMOD, VLDB, PODS confrences.
Prerequisites
Real Prerequisites z Operating systems z Data structures and algorithms z Distributed systems z Complexity theory z Mathematical Logic z Knowledge Representation z User interface design z Programming languages z Artificial Intelligence (Search) z Greek, Hebrew, French
Why Use a DBMS? All programs manipulate data, so why use a database? • Large • • • amounts of data (Giga’s) Data is very structured Persistent data Valuable data Performance requirements Concurrent access to the data Restricted access to data
Functionality of a DBMS z. Persistent storage management z. Multiple abstraction levels of the data (in particular, provides a logical view). z. High level query and data manipulation language z. Efficient query processing z. Transaction management z. Resiliency: recovery from crashes. z. Interface with programming languages
Persistent Storage z. Becomes a hard problem because of the interaction with the other levels of the DBMS: y. What are we storing? y. Efficient indexing y. Special issues due to resiliency requirements y. Exploit “semantic” knowledge z. Issue: interaction with the operating system. Should we rely on the OS?
Levels of Abstraction External Schema 1 External Schema 2 Conceptual Schema • Conceptual schema: tables and their attributes • Physical schema: files, indexes hash tables. • External schema: views of the different applications, classes of users. Physical Schema Disk System catalog: la The component of the database that manages the meta data about the different levels of abstraction.
The Relational Model Data is organized into tables with attributes. Rows in the tables are tuples. The power of simplicity!
Logical Model Issues z. What data model should we use? y. Relational, object-oriented, object-relational, deductive database model, semi-structured z. How do we design a good conceptual schema? (normal forms, index selection) z. Are we really providing an abstraction? z. How does this abstraction interact with the programming language? (the impedance mismatch).
Querying a Database z. Find all the students who have taken CSE 444 in Winter, 1998. z. S(tructured) Q(uery) L(anguage) yselect E. name yfrom Enroll E ywhere E. course=CSE 444 and y E. quarter=“Winter, 1998” z. SQL also provides an update facilities. z. SQL: an acquired taste (try datalog first)
Issues in Query Languages z. Does it provide the appropriate functionality? y. SQL books get thicker and thicker. z. Expressive power of a query language. z. Ease of use (query by example) z. Declarativity z. Provide guidance in writing “good” queries?
Query Optimization z. A query is a declarative specification of “what” you want. z. A query execution plan is an imperative program to produce the answer. z. Query optimization: produce an efficient query execution plan. z. Issues: large search space of plans, cost estimation, semantic transformations z. Real goal: avoid the bad plans.
Transaction Processing and Recovery z. For efficient use of resources, we want concurrent access to data. z. Systems sometimes crash. z. A “real” database guarantees ACID: ACID y. Atomicity: all or nothing of a transaction. y. Consistency: always leave the DB consistent. y. Isolation: every transaction runs as if it’s the only one in the system. y. Durability: if committed, then we really mean it.
Database Industry z. Relational databases are a great success of theoretical ideas. z“Big 3” DBMS companies are among the largest software companies in the world. z. IBM (with DB 2) and Microsoft (SQL Server, Microsoft Access) are also important players. z$20 B industry z. Moving to warehousing, decision support.
Why Use a DBMS? z. Data independence and efficient access. z. Reduced application development time. z. Data integrity and security. z. Uniform data administration z. Concurrent access and recovery from crashes.
DBMS Development z. Issues in scaleup: y. Indexing and storing large amounts of data. y. Algorithms: sorting, joins z“Novel” issues: y. Modeling data (models, constraints, schema design). y. Query languages y. Optimization: from a declarative specification to an efficient program.
Course (Rough) Outline z. Data models and their associated query languages: y. Relational: SQL, datalog, relational algebra y. Object-oriented: OQL y. Object-relational: novel features in SQL 3. y. Semi-structured: languages for querying graphs. z. Storage (very briefly) z. Query optimization: foundations and current limitations.
Outline (continued) z. Semantic analysis: query containment, using views. z. Decision support, data warehousing: data warehouse design, maintainability issues. z. Data integration: querying heterogeneous sources in a uniform fashion. z. Data mining z. SIGMOD/PODS
- Slides: 21