Advanced Database Techniques Martin Kersten cwi nl Stefan
Advanced Database Techniques Martin. Kersten @ cwi. nl Stefan Manegold@cwi. nl Sandor Heman @ cwi. nl Jennie Zhang @ cwi. nl Romulo Goncalves @cwi. nl
Administrative details • The website evolves as during the course • Exam material is marked explicitly • Lab work deadlines are strict • Email is the preferred way to communicate • Tomorrow the assistants will be available in person between 11: 00 -12: 00, room REC-P. 123
Relational systems • A database system should simplify the organization, validation, sharing, and bookkeeping of information • Prerequisite knowledge – Relational data model and algebra – Data structures (B-tree, hash) – Operating system concepts – Using a SQL database system • What is your practical experience? [Ruby on Rails expertise needed]
Applications • Bread-and-butter applications? – Web-shop – Banking systems – Inventory systems – Production systems – Shopping systems – Government systems – Health systems – Multimedia systems – Science systems …
Advanced Applications • Bread-and-butter applications ? ? ? – Banking systems • What happens if you install a stock trading system which should handle >100 K transactions/minute • How to derive trading advice using compute intensive applications • How to warn thousands of users about their trading opportunity – …. Need for parallel, distributed main-memory database technology…
Advanced application requirements • Bread-and-butter applications – Inventory applications • How to install a battlefield inventory systems • How to deliver goods just in time? • How to keep track of moving objects/persons ? • … need for sensor-based database support and RFID tags … need for a new DBMS ? …
Advanced Applications • Production systems – How to interact with component suppliers – How to manage the production workflow – How to avoid bad production steps – How to maintain a database with 12000 tables (SAP) • … need for interoperability between autonomous systems… datamining and knowledge discovery…
Advanced Applications • Health information systems – How to monitor your health over 30 years – How to enable quick response to a heart attack • …need for interoperable database systems …
The Ambient Home HELP
The Ambient Home HELP 911 called
Monet. DB Data. Cell
Monet. DB Data. Cell A Shared Tuple Space using an SQL DBMS nucleus 911 called
Advanced applications • Government systems – Security • Biometric data management issues, finger/image matching – Public safety • Forensics, manipulate complex objects using proprietary algorithms • …need for extensible database technology…need to support unstructured data…
Advanced Applications • Science systems – The new accelerator in CERN • how to handle >1 PTByte files – The Sloan Digital Skyserver schema is 200 pages and the catalogued data 2. 5 Tb • How to query this efficiently –. . need for P 2 P and … a novel way to organize data…
LOFAR central processor specs • Streaming Data – Input: 320 Gbit/s – Internally within correlator: 20 Tbit/s – Into storage: 25 Gbit/s = 250 TByte/day – Final products: 1 -3 TByte/day • High Performance Computing – Correlation: 15 Tflops – Pre processing and filtering: 5 Tflops – Off-line processing (calibration, analysis): 5 -10 Tflops – Visualisation, control, scheduling etc: 2 Tflops • Storage – On-line temporal storage: 500 TByte – Archive: PByte range of data stored in Grid
Technological challenges • Data is often not structured as tables – XML and XQuery • Data does not always fit on one system – Distributed and parallel databases • Querying is more like world-wide searching – Continuous and streaming queries • A database tells more than facts – Datamining and knowledge discovery
Code bases • Database management systems are BIG software systems – Oracle, SQL-server, DB 2 >1 M lines – Postgre. SQL 300 K lines – My. SQL 500 K lines – Monet. DB 200 -800 K lines – SQLite 40 K lines • Programmer teams for DBMS kernels range from a few to a few hundred
Performance components • • • Hardware platform Data structures Algebraic optimizer SQL parser Application code – What is the total cost of execution ? – How many tasks can be performed/minute ? – How good is the optimizer? – What is the overhead of the datastructures ?
Not all are equal
Gaining insight • Study the code base (inspection + profiling) – Often not accessible outside development lab • Study individual techniques (data structures + simulation) – Focus of most Ph. D research in DBMS • Detailed knowledge becomes available, but ignores the total cost of execution. • Study as a functional black box – Analyse a small application framework
The Jack The Ripper Project • Study the snippet of the database technology and design an XQuery and SQL application • What is the schema? • What are the queries? • What are unorthodox solutions?
Learning points • My poor knowledge on relational database? Read the chapters on SQL and relational algebra. Knowledge on data structures comes in handy. • Database systems are much more than administrative bookkeeping systems
Learning points – Advanced application challenge the technology provided by a DBMS – Many techniques do not easily scale in size, complexity, functionality – Effectiveness of a DBMS is determined by many tightly interlocked components
- Slides: 29