Database Activities and Trends Jim Gray Microsoft Research
Database Activities and Trends Jim Gray Microsoft Research 2 June 2006, Microsoft, Tech. Net, London Outline • What I have been doing 15 minutes • BIG Changes in DBs 15 minutes • Q&A 20 minutes
Scalability Projects • Terra. Server: Geospatial data online – Now part of Virtual Earth http: //local. live. com/ • SATA disk evaluation – Copy 1. 5 Petabytes (count types of errors) MSR-TR-2005 -166 • Disk and Network performance move 1 GB/s from CERN to Pasadena • Bricks MSR-TR-2004 -62 – BI-Bricks: cheap boxes/disks for BI – Server Bricks: Terra. Server Bricks: MSR-TR-2004 -107
DB Projects • Spatial data access inside SQL – Gives a good example of using CLR to extend SQL – Sample is part of SQL 2005 programming samples. – Many papers …, MSR-TR-2005 -122, MSR-TR-2006 -52 • To Blob or NOT to Blob? – Explored what is the break-even point of Blobs vs Files. Guess what! Almost all files should be blobs. MSR-TR-2006 -45 • GPU Tera. Sort: – You have been hearing about Many-Core from Intel – Nvidia & ATI give you 100 cores today (2 x next year) Not me, but 10 x the operations per second than the CPU very cool! 10 x the memory bandwidth of the CPU – How to program them? – Sort represents IO, memory, processing. – GPU Terra. Sort demos this MSR-TR-2005 -183 – Accelerator: C# extension is a GPU compiler. MSR-TR-2005 -184
e. Science Projects • Sky. Server: Astronomy data online http: //skyserver. sdss. org/ – A real Data Grid app – Web services are popular – Sky. Query and Cas. Jobs use web services. http: //casjobs. sdss. org/Cas. Jobs/ – Spatial access built as SQL 2005 C# extensions. • Doing Finite Element Analysis with a DB and Vis tools Supporting Finite Element Analysis with a Relational Database Backend; Part I: There is Life beyond Files MSR-TR-2005 -49 • Ecological sensors (soil, water, ocean, …) – Only public thing so far: http: //lifeunderyourfeet. org/ – Many papers coming • Starting Bio. Info efforts (Portable Pub. Med Central, …. )
Portable Pub. Med. Central • “Information at your fingertips” • Helping build Portable. Pub. Med. Central • Deployed US, China, England, Italy, South Africa, (Japan soon). • Each site can accept documents • Archives replicated • Federate thru web services • Working to integrate Word/Excel/… with Pubmed. Central – e. g. Word. ML, XSD, • To be clear: NCBI is doing 99% of the work, but it is very cool and very significant.
Outline • What I have been doing 15 minutes • BIG Changes in DBs 15 minutes • Q&A 20 minutes
DB System Architecture sets records os Worked, but applications wanted to query other data types Added: sets utilities Notification Space Time Data Mine Cubes Text ETL Replication XML Queues Procedures +Text, Time, Space + Cubes, Data mining + XML, XQuery + Programming Languages + Triggers and queues + Replication, Pub/sub + Extract-Transform-Load + Many more extensions coming utilities • The classic DBMS model … records os A Mess?
DB Systems evolved to be containers for information services develop, deploy, and execution environment • Classic ++ – – – – + Triggers and queues + Replication, Pub/sub + Extract-Transform-Load + Text, Time, Space + Cubes, Data mining + XML, XQuery + Many more extensions coming sets records os utilities – + Programming Languages • DBMS is an ecosystem OO is the key structuring strategy: – – – Everything is a class Database is a complex object Core object is Data. Set Classes publish/consume them Depends on strong Object Model Data. Set
The Object-Relational World marry programming languages and DBMSs • Stored procedures evolve to “real” languages Java, C#, . . With real object models. • Data encapsulated: a class with methods • Classes may be persistent Business Objects • Tables are enumerable & index-able record sets with foreign keys • Records are vectors of objects • Opaque or transparent types • Set operators on transparent classes • Transactions: – Preserve invariants – A composition strategy – An exception strategy • Ends Inside-DB Outside-DB dichotomy
Ask not “How to add objects to databases? ”, Ask “What kind of object is a database? ” Q: Given an object model, what is a DB? A: Data. Set class and methods (nested relation with metadata) This is the basis for the ecosystem Distributed DB Extensible DB Interoperable DB Tables Question or Text …. or cube Dataset Or…. . This was implicit in ODBC but is now explicit within the DBMS ecosystem Input: Command (any language) Output: Dataset Entity Set in ADO. NET 3. 0
Queues & Workflows SODA (Service Oriented Data Architecture) Service Oriented Database Architecture: App Server-Lite? MSR TR 2005 129 • Apps are loosely connected via Queued messages Workflow: • Queues are databases. Script Execute • Basis for workflow Administer & • Queues: the first class to Expedite add to an OR DBMS all built on queues • Queues fire triggers. Active databases • Synergy with DBMS security, naming, persistence, types, query, …
Text, Temporal, and Spatial Data Access select Title, Abstract, Rank • Q: What comes after queues? from Books join (Title, A: Basic types: text, time, space, … Free. Text. Table Abstract, ‘XML semistructured') T • Great application of OR technology on Book. ID = T. Key • Key idea: table valued functions == indices An index is a table, organized differently Query executor uses index to map: select galaxy, distance from Get. Nearby. Obj. EQ(22, 37) Key → set (aka sequence of rows) • Table valued function can do this map Optimizer can use it. select store, holiday, sum(sales) • +extras: cost function, cardinality, … from Sales join Holiday. Dates(2004) T on Sales. day = T. day group by store, holiday • BIG DEAL: Approximate answers: Rank and Support
What’s new here? Question Dataset • DBMS have tight-integration with language classes (Java, C#, VB, . . ) • The DB is a class • You can add classes to DB. • Adding indices is “easy” If you have a new idea. • Now have solid Queue systems Adding workflow is “easy” If you have a new idea. • This is a vehicle for publishing data on the Web. Tables or Text or cube Or…. .
Cubes • Data cubes now standard • MDX is very powerful (Multi-Dimensional e. Xpressions) SELECT <axis_spec> FROM <cube_spec> WHERE <slicer_spec> • Dimension, Measure, Operator concepts highly evolved beyond snowflake schema • Cube stores cohabit with row stores ROLAP + MOLAP + ( x x. OLAP) FO CH RD EV Y (relational +multidimensional online analytic processing) • Very sophisticated algorithms • A big part of the ecosystem 0 199 991 1 2 199 93 19 RED WHITE BLUE
Semi-Structured Data • “Everyone starts with the same schema: <stuff/>. ” Then they refine it. ” J. Widom • We are a “strong schema” community • That has pros-and-cons. • Files <stuff/> and XML <<foo/> <bar/>> are here to stay. Get over it! • File directories are becoming databases; – Pivot on any attribute – Folders are standing queries. – Freetext+schema search (better precision/recall) • XSD (xml schema) and x. Query are transitional; But we have to do them to get to the real answer. • Cohabit with row-stores. • Challenge: figure out what comes after XSD+x. Query
Data Mining and Machine Learning • Tasks: classification, association, prediction • Tools: Decision trees, Bayes, A Priori, clustering, regression, Neural net, … • now unified with DBs – Create table T (x, y, z, u, v, w) Learn “x, y, z” from “u, v, w” using <algorithm> – Train T with data. – Then can ask: • Probability x, y, z, u, v, w • What are the u, v, w probabilities given x, y, z – Example: Learn height from age. • Anyone with a data mining algorithm has full access to the DBMS infrastructure. • Challenge: Better learning algorithms.
DM – DB Synergy Create the model: CREATE MINING MODEL Height. From. Age. Sex ( ID long key, Gender text discrete, Age long continuous, Height long continuous PREDICT) USING Decision_Trees Train a data mining model: INSERT INTO Height SELECT ID, Gender, Age, Height FROM People learn height from Gender + Age DB verbs to drive Modeler Predict height from model: SELECT height, Predict. Probability(height) FROM Height PREDICTION JOIN New ON New. Gender = Height. Gender AND New. Age = Height. Age Probabilistic Reasoning
Notification, Stream Processing, and Sensor Processing • Traditionally: Query billions of facts • Streams: millions of queries one new fact Q? – New protein compare to all DNA – Change in price or time A! facts • Implications – New aggregation operators (extension) – New programming style – Streams in products: • Queries represented as records • New query optimizations. • Sensor networks – push queries out to sensors. – Simpler programming model – Optimizes power & bandwidth fact, fact… Q QQ Q Q Notification
Restatement: DB Systems evolved to be containers for information services develop, deploy, and execution environment records os sets records os utilities – Everything is a class – Database is a complex object – Core object is Data. Set sets utilities • DBMS is an ecosystem Key structuring strategy: • The architecture lets you add your new ideas. Data. Set
Language + DB Integration (the Microsoft contribution) • LINQ are a BIG deal (SQL and XML) http: //msdn. microsoft. com/data • Entity Sets are next step in Data Sets ADO. NET V 3 automates entities
Data access today Explicit DB connections command text Connection Rows void Emps. By. Date(Date. Time date) { using( Sql. Connection con = new Sql. Connection( Settings. Default. Adventure. Works. SQL)) { con. Open(); Opaque Command Data. Reader Sql. Command cmd = con. Create. Command(); cmd. Command. Text = @" SELECT Sales. Person. ID, First. Name, Hire. Date Relational Engine Customer Sales. Person FROM Sales. Person sp INNER JOIN Employee e ON sp. Sales. Person. ID = e. Employee. ID INNER JOIN Contact c ON e. Employee. ID = c. Contact. ID Entities ≠ Rows WHERE e. Hire. Date < @date"; cmd. Parameters. Add. With. Value("@date", date); Db. Data. Reader r = cmd. Execute. Reader(); Untyped resultsets while(r. Read()) { Console. Write. Line( "{0: d}: t{1}", r["Hire. Date"], r["First. Name"]); } }
void Emps. By. Date(Date. Time date) { Auto-Gen classes No explicit connections Query<T> Connection Command Data. Reader Relational Engine foreach (Sales. Person p in people) { Console. Write. Line("{0: d}t{1}", p. Hire. Date, p. First. Name ); } } Object. Context Map. Data. Reader Rows Strongly typed commands Order Map. Command using (Adventure. Works. DB aw = new Adventure. Works. DB()) { var people = from p in aw. Sales. People where p. Hire. Date < date select p; Sales. Data Map. Connection Entities public partial class Adventure. Works. DB : System. Data. Objects. Object. Context { public System. Data. Objects. Query<Sales. Order> Sales. Orders { … } public System. Data. Objects. Query<Sales. Person> Sales. People { … } } Objects Domain Objects Data access tomorrow Customer Strongly typed results Sales. Person
Outline • What I have been doing 15 minutes • BIG Changes in DBs 15 minutes • Q&A 20 minutes
- Slides: 29