Introduction to Database Systems 67506 1 Pop Quiz
Introduction to Database Systems 67506 1
Pop Quiz • Question 1: How often do you use a database system or database system application? a) At least once a day (LOVE my db apps) b) About once a week (anything to get out of cleaning my room) c) Very rarely, if ever (only at gunpoint) 2
Pop Quiz • Question 2: Who are the people below and how are they related to our course? • Some clues: – 4 out of 5 are multi-billionaires – 4 out of 5 dropped out of school Conclusions: 1. Dropping out of school may be a good strategy 2. Patent your ideas! 3. Understanding and developing databases can make you rich or famous (or both) – (4 out of 5 are Jewish) 3
What is a database? • Database: A collection of data • Data Base Management System (DBMS): A software system that facilitates the creation, maintenance and use of an electronic database • Examples on the Web: – <you fill this in> • Examples off the Web: – <you fill this in> 4
Some History 5
History: 1960 s Databases Taken By John Taken By Jane Courses AI • Required by companies with data processing needs: banking, airline reservations, corporate record keeping • Databases were navigational • Accessed by entire programs 6
History: 1970 s Course Studies DB S 1 DB S 2 AI S 1 Stud. Id Name Phone Number S 1 John 234 -5678 S 2 Jane 444 -2218 • Codd (at IBM) introduces the relational model • Conceptually, data is stored in tables, connected by keys • Codd introduces relational algebra, for querying, which became the basis for SQL • End 1970 s, first DBMS-s hit the market 7
Moore’s Law Reversed • Moore’s Law: Processing power doubles every 18 months • Amount of data doubles every 9 months – Disk sales (# of bits) doubles every 9 months – Parkinson’s Law: Data expands to fill the space available for storage • Moore’s Law reversed: – Time to process all data doubles every 18 months! • Does your attention span double every 18 months? – No, so we need smarter data management techniques 8
Some of the Challenges 9
Designing the Schema • Real world requirements for data are complex • How do you choose which tables to store data in? Does it even matter? • Example: University Database – Students have names, tz, addresses, phone numbers – Courses have numbers, names, number of credits – Students register for courses • What kind of tables do we need? 10
Is this Good? Student. Course. Info Stud Name TZ Address Stud Phone Course Name Num Credits 11
Is this Good? Students Stud Name TZ Address Courses Stud Phone Registered Stud Course TZ Num Course Name Num Credits Can the notion of “good” be quantified in some way? 12
Processing the Data • How would we find all courses for which the student with TZ 12345678 is registered? Registered • Can sorting help? Stud Course TZ Num • How can we sort efficiently? • Sorting Data Interview at Google 13
Persistent Data • Jane has a joint bank account containing 150 shekels • Jane goes to the ATM ( )כספומט to withdraw 100 shekels 1) prompt user for credit card; 2) get balance from bank database; 3) if balance >= 100 then balance : = balance - 100; 4) emit the money 5) if OK then put new balance into database; else print "sorry" • What happens if the ATM crashes after line 3? line 4? • Would exchanging lines 4 and 5 help? 14
Multiple Concurrent Users • Jane and John have a joint bank account containing 150 shekels • Jane goes to the ATM ( )כספומט to withdraw 100 shekels • John goes to a different ATM ( )כספומט to withdraw 100 shekels • Do they both succeed win withdrawing 100 shekels? • What is the balance at the end? 15
Main Features of a DBMS 16
Some of the Main Features (1) • Levels of Abstraction: The conceptual schema (= manner in which the data seems to be stored to the user, usually in tables), is independent from the physical schema (= actual method of storage on disk, e. g. , using hash tables, etc). – Makes changes to physical storage easier – Makes access for user easier 17
Some of the Main Features (2) • ACID Properties: Atomicity, consistency, isolation and durability – Basically implies that system will behave “as expected” even if there are multiple users and system crashes • Efficiency: Although data will typically be huge (not fit in main memory), special access methods are available to make data retrieval efficient – We need efficiency in practice and not only an O(…) theoretical efficiency guarantee – “In theory there's no difference between theory and practice. In practice there is” 18
Administration 19
Administration • Lecturers: – Prof. Sara Cohen – Prof. Yehoshua Sagiv • TA: same as above • Course homepage: http: //www. cs. huji. ac. il/~db 20
Exercises • Approximately 10 assignments • 25% of the final grade • Exercises are done individually • Theoretical and practical assignments, some in Gradience • Each exercise has a grader in charge. Any, and all questions about the exercise should be sent to this grader • ALL EXERCISES ARE MANDATORY • EXERCISES CANNOT BE GIVEN IN LATE 21
Gradience System • This year several of our exercises will be submitted through the Gradience System. • You will now be given the secret token for registration. Write this down. It does not appear online. – Registration instructions are available online 22
Using Gradience • Gradience is a system for automatic grading of exercises • The question are mostly multiple choice • You should complete all questions and then submit (there is no partial submission) • You will then get feedback on the number of questions you got right/wrong • Typically, you will have unlimited chances to submit • Each time you redo the exercise, you will have the same questions, but new answers. 23
Books • H. Garcia-Molina, J. Ullman, J. Widom, Database Systems: The Complete Book • R. Ramakrishnan and J. Gehrke, Database Management Systems, 3 rd edition, 2002, Mc. Graw Hill. – A lot of online supporting material is available at http: //www. cs. wisc. edu/~dbbook/ • J. Ullman, Principles of Database & Knowledge. Base Systems Vol. 1, Computer Science Press 24
Topics • Modeling: ER Diagrams (1 week, Cohen) • Querying: Relational Algebra (1 week, Sagiv) • Querying: SQL (2 weeks, Cohen) • Programming: JDBC, PLSQL (1 week, Cohen) • Design Theory (3 weeks, Cohen) • Query Evaluation and Optimization (3 weeks, Sagiv) • Concurrency Control (2 weeks, Sagiv) • Recovery (1 week, Sagiv) 25
Teaching • The lectures and tirgulim will be given by myself and Prof. Sagiv • WARNING: If you miss class, you should make up the material before the tirgul, and if you miss a tirgul, you should make it up before class! 26
- Slides: 26