Microsoft SQL Server Scalability Database Research Jim Gray
Microsoft SQL Server, Scalability, & Database Research Jim Gray Researcher Microsoft Corporation Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ 3 October 1999 Chicago, Ill.
Outline 4 Summary of what you heard. (10 min) 4 The database scene in general. (10 min) 4 Scaleability: Farms, Clones, Parts & Packs (15 min) 4 Microsoft DB research focus. (15 min) • Terra. Server (design and ops). • RAGS. • Data Mining 4 Q&A Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ (10 min) 3 October 1999 Chicago, Ill.
Organizations Are Going Online 4 Building a digital nervous system. 4 Inexpensive hardware means huge databases are possible. 4 But, we are drowning in data. 4 Databases help organize information. 4 Microsoft’s goal: • Information at your fingertips. • Make it easy to capture, manage, and analyze information. Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ 3 October 1999 Chicago, Ill.
Microsoft SQL Server 7 Goals u u Easy u u Scalability u u u Data Warehousing u u Dynamic self management Multi-site management Operation Scripting Job scheduling and execution Alert/response management Scriptable Install+upgrade DBA profiling/tuning tools Unicode English Language Query Integrated with NT Security Integrated with NT files Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ 4 3 October 1999 Chicago, Ill.
Scalability u u u Data Warehousing u Easy Win 9 x/NTW version Dynamic row-level locking Improved query optimizer Intra-query parallelism VLDB improvements Replication improvements Distributed query High Availability Clusters Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ 11 3 October 1999 Chicago, Ill.
Scale Down to Windows 95 -98 4 Full function (same as NTW) 4 Integration with Access 97 4 MSDE in Office 2000 and MSDN 4 Win. CE version demonstrated Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ 3 October 1999 Chicago, Ill.
Replication Publisher 4 Transactional and Merge 4 Remote update 4 ODBC and OLE DB subscribers 4 Wizards 4 Performance OS 390 VSAM DB 2 Subscriber CICS 2 PC, RPC Distributor DB 2 Subscriber Updating Subscriber Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ (immediate updates) 3 October 1999 Chicago, Ill.
Query Processor Enhancements Focus on Complex Queries 4 Parallelism 4 Improved scan, fetch, & sort 4 Smart hash & merge join 4 Large joins & grouping 4 Better query optimization 4 Multi-index operations 4 Automatic statistics maintenance 4 Distributed Query 4 Heterogeneous Query Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ 3 October 1999 Chicago, Ill.
Parallel Query SMP & Disk Parallelism Global Agg. + Result 50 rows 4 x 50 rows Local Agg. + + Disks + + 50, 000 rows • # of emp. per group • total inc. per group 4 Plus Distributed 4 Plus Hash Join (fanciest on the planet) 4 Plus Optimized Partitioned views Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ 3 October 1999 Chicago, Ill.
Distributed Heterogeneous Queries Data Fusion / Integration Join spread sheets, databases, directories, Text DBs etc. Any source that exposes OLE DB interfaces SQL Server as gateway, even on the desktop Directory Service Database (DB 2, VSAM, Oracle, …) Spreadsheet SQL 7. 0 Query Processor Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ Photos Mail Maps Documents and the Web 3 October 1999 Chicago, Ill.
Utilities The Key to LARGE Databases 4 Auto-Repair 4 Index creation 4 Backup • Fuzzy • Parallel • Incremental • Restartable ~2 x faster than 6. 5 4 DBCC • not required, • a good practice • 5 x - 100 x faster 4 Recovery 4 Reorganize • shrinks file • reclusters file Recovery time (secs) • Fast • File granularity Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ # of indices 17 3 October 1999 Chicago, Ill.
Data Warehousing u u Warehousing Framework Visual data modeler Microsoft repository Data transformation services Scalability (DTS) u Plato & Dcube - Multi Dimensional Data Cubes Data Warehousing u English query 2. 0 Easy Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ 20 3 October 1999 Chicago, Ill.
Data Warehouse / Data Analysis 4 Data Transformation Services to get data into the warehouse 4 CUBE (OLE/DB OLAP) to analyze data Operational Data Extact & Load OLAP Data Warehouse Storage Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ 3 October 1999 Chicago, Ill.
Plato and Data Cube and HOLAP CH EV FO Y RD By Year 0 199 991 1 992 1 993 1 By Make & Year Source table Europe RED WHITE BLUE By Color & Year Sum Partition 1 By Color “Plato” User 1 ROLAP Asia Designer Partition 2 MD SQL Client app “Plato” server Partition 3 Dcu b e USA Dcu b e SQL User 2 ROLAP Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ Client app 25 3 October 1999 Chicago, Ill.
English Query Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ 26 3 October 1999 Chicago, Ill.
Easy Scalable Data Warehousing Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ 27 3 October 1999 Chicago, Ill.
“Shiloh” The Next SQL Server 4 Shiloh (H 1’ 00) - Strengthen Position • Data Warehousing leadership - Materialized Views - Cascading Referential Integrity (#1 requested user-group feature) - XML support • Scalability - Win. CE support - W 2 K VLM (36 and 64 bit) - Multi-instance support 4 Yukon – Next Big Step • Scalability (Clusters, Partitions) • Programmability • Ease of Use (Self Tuning, Auto Config) Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ 3 October 1999 Chicago, Ill.
Outline 4 Summary of what you heard. (10 min) 4 The database scene in general. (10 min) 4 Scaleability: Farms, Clones, Parts & Packs (15 min) 4 Microsoft DB research focus. (15 min) • Terra. Server (design and ops). • RAGS. • Data Mining 4 Q&A Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ (10 min) 3 October 1999 Chicago, Ill.
Info Capture Yotta Zetta Exa 4 You can record everything you see or hear or read. 4 What will you do with it? 4 How will you organize & analyze it? 4 Most data will never be seen 4 Analysis an summarization are key technologies Everything ! Recorded All Books Multi. Media Peta Tera All books (words) Giga . Movie Mega Kilo A Photo A Book Video Audio Read or write: 8 PB per lifetime (10 GBph) 30 TB (10 KBps) 8 GB (words) See: http: //www. lesk. com/mlesk/ksg 97/ksg. html Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ 3 October 1999 Chicago, Ill.
Data Tidal Wave 4 Seagate 47 GB drive @ 783$ (= 1. 7 ¢/mb) • 100 GB penny per MB drive coming in 2000 4 10 $/GB = 10 k$/ Terabyte! • “Everyone” can afford one 4 What’s a terror bite? • • • If you sell ten billion items a year (e. g Wal-Mart) And you record 100 bytes on each one Then you get a Tera. Byte/year 4 Where will the terror bytes come from? • Multimedia (like the Terra. Server) and. . . Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ 3 October 1999 Chicago, Ill.
Reducing Data’s Cost-of-Ownership Self-Managing data Cost of ownership: One admin/TB (100 K$ vs 10 K$) Admin cost exceeds storage cost. SQL 7: 4 Suggests indices 4 Migrates data away from end of file 4 Truncates file Someday: 4 Automatic move files to balance disks 4 Online defragmentation & restructuring 4 Online physical redesign Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ 3 October 1999 Chicago, Ill.
OBJECT RELATIONAL The Next Great DBMS Wave 4 All DB vendors have added objects to DB 4 Microsoft is adding DBs to Objects 4 Integration with COM+ 4 Gives user-defined types and objects 4 Plug-ins will be Billion dollar industry • Blades for SQL Server razor Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ 3 October 1999 Chicago, Ill.
Why Is XML Important? Self-describing data Data stream in a typical interface… “ABC 47 -Z”, “ 100”, “STL”, “C”, “ 3”, “ 28” Same data stream in XML… <INVENTORY> <PART_NUM>ABC 47 -Z</PART_NUM> <QUANTITY>100</QUANTITY> <WAREHOUSE>STL</WAREHOUSE> <ZONE>C</ZONE> <AISLE>3</AISLE> <BIN>28</BIN> </INVENTORY> Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ 3 October 1999 Chicago, Ill.
table. xsl bar. xsl art. xsl Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ 35 3 October 1999 Chicago, Ill.
XML Applications 4 Exposing Software as a “Service” • Websites without UI’s • Exposed services with common scheme • Integration points at the enterprise, valuechain, workgroup, desktop and intelligent gizmo “levels” 4 B 2 B value chains • Uses XML to transmit wide range of date to a broad set of stakeholders (regulatory agencies, suppliers, customers, etc. ). • Leverage for prior efforts like EDI • Biz. Talk a key industry effort in this regard Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ 3 October 1999 Chicago, Ill.
XML: Biz. Talk Framework www. biztalk. org XML schema XML MVS CICS SAP R/3 Library Order Processing Service Interface XML XML Message XML Another Service Document XML Message XML Document JD Edwards Browser Client Apps New Form Factors Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ 3 October 1999 Chicago, Ill.
Outline 4 Summary of what you heard. (20 min) 4 The database scene in general. (10 min) 4 Scaleability: Farms, Clones, Parts & Packs (10 min) 4 Microsoft DB research focus. (15 min) • Terra. Server (design and ops). • RAGS. • Data Mining 4 Q&A Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ (15 min) 3 October 1999 Chicago, Ill.
Terminology for scaleability Farm 4 Farms of servers: • Clones: identical - Scaleability + availability Clone Partition • Partitions: - Scaleability • Packs Pack - Partition availability via fail-over Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ 3 October 1999 Chicago, Ill.
Unpredictable Growth 4 The Terra. Server Story: • • We expected 5 M hits per day We got 50 M hits on day 1 We peak at 15 -20 M hpd on a “hot” day Average 5 M hpd after 1 year 4 Most of us cannot predict demand • Must be able to deal with NO demand • Must be able to deal with HUGE demand Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ 3 October 1999 Chicago, Ill.
An Architecture for Internet Services? 4 Need to be able to add capacity • New processing • New storage • New networking 4 Need continuous service • Online change of all components (hardware and software) • Multiple service sites • Multiple network providers 4 Need great development tools • Change the application several times per year. • Add new services several times per year. Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ 3 October 1999 Chicago, Ill.
Premise: Each Site is a Farm 4 Buy computing by the slice (brick): • Rack of servers + disks. 4 Grow by adding slices • Spread data and computation to new slices 4 Two growth styles: • Clones: anonymous servers • Parts+Packs: Partitions fail over within a pack 4 In both cases, remote farm for disaster recovery Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ 3 October 1999 Chicago, Ill.
Scaleable Systems Scale UP and Scale OUT 4 Everyone does both. 4 Choice is • Size of a brick • Clones or partitions • Size of a pack Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ 3 October 1999 Chicago, Ill.
Everyone scales out What’s the Brick? 41 M$/slice • IBM S 390? • Sun E 10, 000? 4 100 K$/slice • Wintel 8 X 4 10 K$/slice • Wintel 4 x 41 K$/slice • Wintel 1 x Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ 3 October 1999 Chicago, Ill.
Clones: Availability+Scalability 4 Some applications are • Read-mostly • Low consistency requirements • Modest storage requirement (less than 1 TB) 4 Examples: • HTTP web servers (IP sprayer/sieve + replication) • LDAP servers (replication via gossip) • App/compute servers or firewalls 4 Replicate app at all nodes (clones) 4 Spray requests across nodes. 4 Grow by adding clones 4 Fault tolerance: stop sending to dead clone. 4 Growth: add a clone. Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ 3 October 1999 Chicago, Ill.
Facilities Clones Need 4 Automatic replication • Applications (and system software) • Data 4 Automatic request routing • Spray or sieve 4 Management: • Who is up? • Update management & propagation • Application monitoring. 4 Clones are very easy to manage: • Rule of thumb: 100’s of clones per admin Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ 3 October 1999 Chicago, Ill.
Partitions for Scalability 4 Clones are not appropriate for some apps. • Statefull apps do not replicate well • high update rates do not replicate well • Huge DBs (disk to expensive to clone) 4 Examples • Email / chat / … • Databases 4 Partition state among servers 4 Scalability (online): • Partition split/merge • Partitioning must be transparent to client. Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ 3 October 1999 Chicago, Ill.
Partitioned (aka. Clustered) Apps 4 Mail servers • Perfectly partitionable 4 Business Object Servers • Partition by set of objects. 4 Parallel Databases - Transparent access to partitioned tables - Parallel Query Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ 3 October 1999 Chicago, Ill.
Packs for Availability 4 Each partition may fail (independent of others) 4 Partitions migrate to new node via fail-over • Fail-over in seconds 4 Pack: the nodes supporting a partition • • • VMS Cluster Tandem Process Pair SP 2 HACMP Sysplex™ Win. NT MSCS (wolfpack) 4 Cluster In A Box now commodity 4 Partitions grow in packs. Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ 3 October 1999 Chicago, Ill.
What Parts+Packs Need 4 Automatic partitioning (in dbms, mail, files, …) • Location transparent • Partition split/merge • Grow without limits (100 x 10 TB) 4 Simple failover model • Partition migration is transparent • MSCS-like model for services 4 Application-centric request routing 4 Management: • Who is up? • Automatic partition management (split/merge) Gray, Research and Microsoft SQL Server • Jim Application monitoring. Microsoft Research http: //research. Microsoft. com/~gray/talks/ 3 October 1999 Chicago, Ill.
Services on Clones & Partitions 4 Application provides a set of services 4 If cloned: • Services are on subset of clones 4 If partitioned: • Services run at each partition 4 System load balancing routes request to • Any clone • Correct partition. • Routes around failures. Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ 3 October 1999 Chicago, Ill.
Farm pairs: Always Up 4 Two farms 4 Changes from one sent to other 4 When one farm fails other provides service 4 Masks • Hardware/Software faults • Operations tasks (reorganize, upgrade move • Environmental faults (power fail) Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ 3 October 1999 Chicago, Ill.
Availabilty for a simple web site Clones for availability Packs for availability Web File Store SQL Database SQL Temp State Front End Load Balance Web Clients Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ 53 3 October 1999 Chicago, Ill.
Farm Scale Out Scenarios The FARM: Clones and Packs of Partitions Packed Partitions: Database Transparency SQL Partition 3 SQL Partition 2 replication Web File Store. A Web File Store. B SQLPartition 1 Database SQL Temp State Cloned Packed file servers Web Clients Load Balance Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ 54 Cloned Front Ends (firewall, sprayer, 3 October 1999 web server )Chicago, Ill.
Reliable, Scalable, Modular Network Clients Load Balancing Clones 1 Component Load Balancing (COM+) Clones 1 2 3 4 Cluster Service Pack 2 3 … … 32 8 COM+ Components IIS Web Server Application Servers Jim Gray, IP Research and Microsoft SQL Server or other based services Microsoft Research http: //research. Microsoft. com/~gray/talks/ Data Servers SQL, Exchange, 3 October File 1999 55 Chicago, Ill.
Talk 2 (if there is time) 4 Terminology for scaleability 4 Farms of servers: Farm • Clones: identical - Scaleability + availability Clone Partition • Partitions: - Scaleability • Packs Pack - Partition availability via fail-over Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ 3 October 1999 Chicago, Ill.
Scalability: COM+ progress serving 1, 000 -statement ASP’s (servelets) 4 Poor SMP Scaleability on IIS 4 NT 4 4 Big SPS: servelets per second (ASPs served per second by IIS, improvements 1, 000 statement VBscript) from standard Transaction Processing tricks Shift from 4 x 200 Mhz to 8 450 Mhz 4 Out of Proc (safe execution) now much faster than In Proc was on IIS 4 Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ 3 October 1999 Chicago, Ill.
Scaleability: So, What about the death of NT/Alpha? Two simultaneous Compaq TPC-C numbers § Intel Profusion § Alpha § § NT/SQL/COM+ 550 Mhz 8 Processors 4 GB memory § 40, 368 TPM-C @ 18. 46$/tpm. C § 745 K$ 5 -year cost § Avail: 12/31/99 § § Unix/Sybase/Tuexdo 700 Mhz 8 Processors 16 GB memory § 42, 437 TPM-C @ 55. 45 $/tpm. C § $2. 35 M$ 5 -year cost Avail: 10/18/99 § 200% more expense for 5% more performance? Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ 3 October 1999 Chicago, Ill.
Outline 4 Summary of what you heard. (10 min) 4 The database scene in general. (10 min) 4 Scaleability: Farms, Clones, Parts & Packs (15 min) 4 Microsoft DB research focus. (15 min) • Terra. Server (design and ops). • RAGS. • Data Mining 4 Q&A Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ (10 min) 3 October 1999 Chicago, Ill.
The Terra. Server http: //www. terraserver. microsoft. com/ Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ 3 October 1999 Chicago, Ill.
Database & application UI 4 Coverage: Range from 70ºN to 70ºS today: 35% U. S. , 1% outside U. S. 4 Source Imagery: • 4 Concept: User navigates an ‘almost seamless’ image of 4 TB 1 sq meter/pixel Aerial (USGS - 60, 000 earth 46 Mb B&W- 151 Mb Color IR files) • 1 TB 1. 56 meter/pixel Satellite (Spin-2 - 2400 300 Mb B&W) 4 Display Imagery: 200 x 200 pixel images, subsample to build image pyramid 4 Store 5 x compressed data 4 Nav Tools: • • 1. 5 m place names “Click-on” Coverage map Expedia & Virtual Globe Pick of the week Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ 200 x 200 m tile. 4 x. 4 km browse. 8 x. 8 km 8 m thumbnail 1. 6 x 1. 6 km “city view” 3 October 1999 Chicago, Ill.
Software: Classic 3 Tier Design Image Server Active Server Pages (ADO) 24 Web Client Internet Information Server 5. 0 46 HTML Java Viewer browser MTS The Internet 20 (8/12) Terra-Server Stored Procedures Fire wall SQL Server 7 46 Terra. Server DB Terra. Server Web Site Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ Internet Information Server 4. 0 Microsoft Site Server EE Image Delivery SQL Server Application 7 1999 Image Provider 3 October Site(s) 62 Chicago, Ill.
Logical Schema Load Mgmt Famous Category Country Name State Name Famous Place Source. Meta Scale Job Place Type Place Name Image Search Image Load Job Pyramid Small Place Name External Geo Image Type Terra. Server External Link Imagery Gazetteer External Group Search Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ Terra Database Terra. Admin 63 3 October 1999 Admin Chicago, Ill.
Terra. Server File Group Layout 4 Convert 324 disks to 28 RAID 5 sets plus 28 spare drives 4 Make 4 NT volumes (RAID 50) 595 GB per volume 4 Build 30 20 GB files on each volume 4 DB is File Group of 120 files E: F: Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ G: H: 3 October 1999 Chicago, Ill.
Hardware Internet Map Site Servers SPIN-2 100 Mbps Ethernet Switch DS 3 Web Servers 2. 9 TB Database Server Alpha. Server 8400 8 x 400. 10 GB RAM 324 Storage. Works disks 10 drive tape library (STC Timber Wolf DLT 7000 ) Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ 65 3 October 1999 Chicago, Ill.
Load & Backup&Recovery 4 Backup and Recovery • STK 9710 Tape robot • SQL Server Backup & Restore + • Legato Networker • Fast, incremental, differential, online • Clocked at 80 MBps (peak) (~ 200 GB/hr) 4 Restore • Fast, incremental (file oriented), not online. Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ 3 October 1999 Chicago, Ill.
BAD OLD Load DLT Tape “tar” NT Drop’N’ Do. Job Load. Mgr DB Wait 4 Load Backup Load. Mgr . . . Load. Mgr ESA Alpha Server 4100 100 mbit Ether. Switch 60 4. 3 GB Drives Alpha Server 4100 Img. Cutter Drop’N’ Images 10: Img. Cutter 20: Partition 30: Thumb. Img 40: Browse. Img 45: Jump. Img 50: Tile. Img 55: Meta Data 60: Tile Meta 70: Img Meta 80: Update Place Enterprise Storage Array STC DLT Tape Library 108 9. 1 GB Drives Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ 108 9. 1 GB Drives Alpha Server 8400 3 October 1999 Chicago, Ill.
New Image Load and Update DLT Tape “tar” Staging Disk Active Server Pages Cut & Load Scheduling System Metadata Load DB Dither Image Pyramid From base Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ JPEG tiles Image Cutter Merge ODBC Tx Terra. Loader ODBC TX ODBC Tx 68 Terra. Server SQL DBMS 3 October 1999 Chicago, Ill.
Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ 99 6/ 22 /9 9 5/ 22 /9 9 4/ 22 /9 9 3/ 99 2/ 22 /9 9 1/ /9 8 12 /2 2 /9 8 11 /2 2 /9 8 98 10 /2 2 9/ 22 / 98 8/ 22 / 98 6/ 22 / 98 Count 4 1 TB of data 750 M records 10 M 4 2. 3 billion Hits 4 2. 0 billion DB Queries 0 4 1. 7 billion Images sent 4 368 million Page Views 4 99. 93% DB Availability 4 3 rd design now Online 4 Built and operated by team of 4 people Sessions Hit Page View DB Query Image 20 M 7/ 22 / After a Year: Terra. Server Daily Traffic Jun 22, 1998 thru June 22, 1999 30 M 3 October 1999 Chicago, Ill.
What Terra. Server Shows 4 Can serve huge databases on Internet for about a penny a page view mostly phone bill (!). Advertising pays more than a penny a page. 4 Commodity tools do scale fairly far. 4 A few people (3 developers, 1 operator) using power tools can build an impressive web site Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ 3 October 1999 Chicago, Ill.
Outline 4 Summary of what you heard. (20 min) 4 The database scene in general. (10 min) 4 Scaleability: Packs & Mobs (10 min) 4 Microsoft DB research focus. (15 min) • Terra. Server (design and ops). • RAGS. • Data Mining 4 Q&A Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ (15 min) 3 October 1999 Chicago, Ill.
Automatic Testing 4 60% of Microsoft R&D is testing. 4 What can research do to help? • beyond joining the 500, 000 Win 2 K beta testers 4 Test generation robot: • Make up SQL queries • Send them to SQL Server, Oracle, DB 2, Informix, … • If answer is the same, great, if not there is a problem Case W X Y Z 1672 232 234 241 31 15 12 28 1 12 5 116 0 29 32 4 18 18 19 25 18 113 45 19 Error 4 Also good for stress tests 4 Found MANY bugs in our products (all fixed). 4 Found MANY bugs in other’s products. 4 Very valuable tool. All four agree 84% W, X, and agree 95% Problem with intermediate table. 4 MSR-TR-98 -21 Massive Stochastic Testing of SQL, Slutz, Don http: //research. microsoft. com/scripts/pub. DB/pubsasp. asp? Record. ID=175 Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ 3 October 1999 Chicago, Ill.
Kilo Mega Giga Tera Peta Some Tera-Byte Databases 4 The Web: 1 TB of HTML 4 Terra. Server 1 TB of images 4 Several other 1 TB (file) servers 4 Hotmail: 20 TB of email 4 Sloan Digital Sky Survey: 40 TB raw, 2 TB cooked 4 EOS/DIS (picture of planet each week) • 15 PB by 2007 Exa 4 Federal Clearing house: images of checks • 15 PB by 2006 (7 year history) Zetta 4 Nuclear Stockpile Stewardship Program • 10 Exabytes (? ? ? !!) Yotta Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ 3 October 1999 Chicago, Ill.
Data Mining 4 Find interesting structure (patterns, relationships) in data • Prediction • Segmentation (clustering) • Dependency modeling (find distribution) • Summarization • Trend and change detection and modeling 4 Allow user to state the query in terms of the business logic • User does not speak statistics or SQL 4 Use data to build predictors • regression, classification, segmentation etc. 4 Generate summaries and reports for insight • find “easy to describe” segments in data automatically • find segments not known to analyst Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ 3 October 1999 Chicago, Ill.
Data Mining: Microsoft Site. Server Commerce 3. 0 Http: //www. holtoutlet. com/outlet 4 4 Intelligent Cross-sell 4 Based on: • Historical sales baskets in stores • Contents of current shopper basket • Browsing behavior of shopper 4 Predict: ranking of products in store likely to be most interesting to shopper. Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ 3 October 1999 Chicago, Ill.
Mail to 25% and capture 40% 400% improved response! Real. Jimdata drawn from a Microsoft marketing example 3 October 1999 Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ 80 Chicago, Ill.
How do people use www. microsoft. com? 100 M hits per day 14 M users/week User browsing data X segments Data Mining (Clustering) Engine Cluster Visualizer Wizard Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ 81 3 October 1999 Chicago, Ill.
Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ 82 3 October 1999 Chicago, Ill.
Outline 4 Summary of what you heard. (10 min) 4 The database scene in general. (10 min) 4 Scaleability: Farms, Clones, Parts & Packs (15 min) 4 Microsoft DB research focus. (15 min) • Terra. Server (design and ops). • RAGS. • Data Mining 4 Q&A Jim Gray, Research and Microsoft SQL Server Microsoft Research http: //research. Microsoft. com/~gray/talks/ (10 min) 3 October 1999 Chicago, Ill.
- Slides: 68