Toto, We’re Not in Kansas Anymore…
On Transitioning from Research to the Real World
Mike Carey
Fellow, Platform Engineering
carey@propel.com
Today’s Talk
• Background information
• Lessons from the “Road to Propel”
  § The UW-Madison years
  § The IBM Almaden years
  § The Propel (web) years
• Database research in the new millennium
  § Maturity brings its own challenges
  § Research opportunities in e-commerce
  § Some operational recommendations
Part One: Background information
Background Info
• UW-Madison CS Professor (1983-1995)
  § Concurrency control algorithms
  § Query processing performance
  § Main memory databases
  § Extensible database systems (Exodus)
  § Real-time database systems
  § Client-server O-O database systems (Shore)
  § Online algorithms, DBMS performance
Background Info (cont.)
• IBM Almaden Research Staff Member and Manager (1995-2000)
  § Heterogeneous database systems (Garlic)
  § Object middleware (Component Broker)
  § Object-relational databases (DB2 UDB)
• Propel Platform Engineering Fellow (2000-?)
  § Scalable e-commerce infrastructure software
Part Two: Lessons from the "Road to Propel"
UW-Madison Years
Lesson #1: Awareness is key
• Be “plugged in” to current technologies & issues
  § Hardware and OS characteristics
    Ø CPU, memory, disk, and network performance
    Ø Path lengths (e.g., TCP/IP messages)
  § DBMS software characteristics
    Ø DBMS internal components
    Ø Layers/calls: SQL, records, pages, …
    Ø Interactions, e.g., concurrency & recovery
  § Application characteristics
    Ø “Typical” workload characteristics
    Ø What systems can or cannot know (when/how)
UW-Madison Years
Lesson #2: Students are the product
• Having industrial impact is a laudable goal, but
  § It’s hard (in general) to be fully plugged in
    Ø Details of systems and workloads
  § The algorithms may not be the hard part
    Ø More about this shortly
• Students are our biggest accomplishment
  § Well-trained students are incredibly valuable
    Ø Systems sense; ability to think, learn, adapt
• I’m extremely proud of my former students!
  § That’s what I miss the most in industry
UW-Madison Years
The wake-up call: A house of cards?
• [ACL 85]: Blindly following colleagues
  § Ten years later, some papers were still using the same hardware and software parameters
• RTDBS: The blind following the blind?
  § We basically stated and then solved these research problems ourselves
• SIGMOD-94: The SIGMOD chair’s lunchtime analysis of SIGMOD paper production
  § Not clear to me that “most SIGMOD papers in the last ten years” was such a good thing
The First Transition
From UW-Madison to IBM Almaden
• Intellectual reasons
  § Weary of inventing and then solving problems
  § Wanted access to real problems and systems
  § Also just needed a change after 12 years
• IBM Almaden reasons
  § Terrific environment & colleagues for DB research
  § “Development from the safety of a research lab”
• Personal reasons
  § Wanted to “have a life” again outside work
  § Wanted to live in the Bay Area (Silicon Valley)
IBM Almaden Years
Context: Extending DB2 UDB
• From 1996 to 2000, I worked on adding object extensions to SQL and DB2 UDB (V5.2-V7.1)
  § Object-relational data model extensions
    Ø Types, OIDs, references, subtables, object views
  § Corresponding query language extensions
    Ø Substitutability, path expressions, constraints and triggers, type predicates, subtable access rules
  § System extensions
    Ø Storage & query processing for all of the above
• DB2 UDB work is geographically distributed
  § IBM Toronto, Santa Teresa, and Almaden labs
IBM Almaden Years
Lesson #1: Products are hard to build
• Products are very different from prototypes
  § Someone else wrote the first 1M+ lines of code
    Ø System has many nooks and crannies
    Ø No one person understands the whole thing
    Ø 100 or so people are working on it with you
  § You have to do the other 80-90% of the work
    Ø Testing, code reviews, testing, docs, testing, …
    Ø System catalogs: no big deal, right…?
• The engine is just one aspect of a product
  § Import/export, bulk load, control center, visual explain, query tools, design tools, replication, …
IBM Almaden Years
Lesson #1: Products are hard (cont.)
• It’s difficult to make some kinds of changes
  § Customers already have terabytes of data
    Ø Data migration is a no-no (at least at IBM)
    Ø Catalog migration is a pain and a time sink
• It’s not just your own product that’s affected
  § 3rd-party vendors may also be a factor
    Ø Ex. 1: Physical load utilities (table hierarchies)
    Ø Ex. 2: Logical & physical database design tools
  § Market share & standards come into play here
IBM Almaden Years
Lesson #2: Adding to a language is hard
• SQL is a 25-year-old language that was never intended to do everything we want it to do today
  § World was simple tables, basic retrievals
  § Various assumptions made for “convenience”
    Ø Ex. 1: Subqueries: scalar- or table-valued?
    Ø Ex. 2: Nulls: inconsistent (e.g., WHERE vs. MAX)
• SQL changes must be monotonic in nature
  § Can’t change meaning of existing queries (!)
  § Extensions must all peacefully co-exist
  § Language is getting “full” (> 1000 pages)
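To make the “Nulls: inconsistent” example concrete, here is a small sketch using Python’s built-in sqlite3 module. SQLite stands in for DB2 here, and the emp table and its rows are hypothetical. A comparison with NULL in a WHERE clause evaluates to UNKNOWN, so the NULL row drops out of both a predicate and its negation, while aggregates such as MAX and COUNT(column) simply skip NULLs yet still return NULL over an empty input.

```python
import sqlite3


def null_semantics_demo():
    """Contrast NULL handling in WHERE predicates vs. in aggregates."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emp (name TEXT, salary INTEGER)")
    conn.executemany(
        "INSERT INTO emp VALUES (?, ?)",
        [("ann", 100), ("bob", None), ("cy", 300)],
    )
    # WHERE: NULL > 150 is UNKNOWN, and so is NOT (NULL > 150),
    # so bob's row is filtered out of BOTH queries (1 + 1 != 3 rows).
    above = conn.execute(
        "SELECT COUNT(*) FROM emp WHERE salary > 150").fetchone()[0]
    not_above = conn.execute(
        "SELECT COUNT(*) FROM emp WHERE NOT (salary > 150)").fetchone()[0]
    # Aggregates: MAX and COUNT(column) silently ignore the NULL...
    max_sal = conn.execute("SELECT MAX(salary) FROM emp").fetchone()[0]
    count_col = conn.execute("SELECT COUNT(salary) FROM emp").fetchone()[0]
    # ...but MAX over an empty input yields NULL (None in Python).
    max_empty = conn.execute(
        "SELECT MAX(salary) FROM emp WHERE salary > 1000").fetchone()[0]
    conn.close()
    return above, not_above, max_sal, count_col, max_empty


print(null_semantics_demo())
```

Any extension to SQL has to coexist with exactly these rules, which is part of why the language keeps growing rather than being cleaned up.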
IBM Almaden Years
Lesson #2: Adding is hard (cont.)
• “Cool new SQL features” are a double-edged sword
  § Can add real value for advanced applications
    Ø Consider OLAP, O-R, and temporal extensions
  § “Different” or “proprietary” = “bad”?
    Ø To 3rd-party vendors, also to nervous customers
  § And, tools may hide them anyway
    Ø Query builders, EJB programming model, …
• SQL standardization is an interesting world
  § Serious extensions must someday fly with ANSI & ISO
  § SQL standard is in some ways a corporate battleground
  § Vendors only want the extensions on their radar screen
IBM Almaden Years
Lesson #3: Listen to users’ needs
• So many features, so little time…!
  § Potential users help you prioritize your work
    Ø Ex: Subtable triggers & constraints in DB2
  § They also help you make “safe” initial decisions
    Ø Ex: Internal storage for DB2 table hierarchies
• Potential users can help you see things you might otherwise miss (at least initially)
  § Ex. 1: Advantages of DB2 user-defined OIDs
    Ø Customers already “simulate” objects today
    Ø Access to system-generated OID values?
    Ø Object caching and efficient write-back
  § Ex. 2: DB2 object view functionality
    Ø Virtual table hierarchies, same authorization model
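The “object caching and efficient write-back” point can be sketched as a client-side cache keyed by OIDs. This is a hypothetical illustration, not DB2’s mechanism: a plain dict stands in for the database table, and the class and method names are invented. It shows why user-defined OIDs help: the client can name a new object itself (no round trip to learn a system-generated OID), and repeated updates are coalesced into one write per object at flush time.

```python
class WriteBackObjectCache:
    """Hypothetical client-side object cache keyed by user-defined OIDs."""

    def __init__(self, table):
        self.table = table      # stand-in for a DB table: oid -> row dict
        self.objects = {}       # cached objects, keyed by OID
        self.dirty = set()      # OIDs modified since the last flush
        self.db_reads = 0
        self.db_writes = 0

    def get(self, oid):
        # Fetch from the "database" only on a cache miss.
        if oid not in self.objects:
            self.db_reads += 1
            self.objects[oid] = dict(self.table[oid])
        return self.objects[oid]

    def put(self, oid, obj):
        # With user-defined OIDs the client names the new object itself,
        # so no round trip is needed to obtain a system-generated OID.
        self.objects[oid] = obj
        self.dirty.add(oid)

    def update(self, oid, **changes):
        self.get(oid).update(changes)
        self.dirty.add(oid)

    def flush(self):
        """Write back each dirty object once, however often it changed."""
        for oid in sorted(self.dirty):
            self.table[oid] = dict(self.objects[oid])
            self.db_writes += 1
        flushed = len(self.dirty)
        self.dirty.clear()
        return flushed
```

For example, two updates to "emp:1" plus one new object "emp:2" cost one read and two writes, and the table is untouched until flush() is called.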
The Second Transition
From IBM Almaden to Propel
• Some triggering events
  § Working on XML middleware layer for DB2 UDB
    Ø After spending nearly 20 years “under the hood”
  § Almaden management discussions: connecting to the Valley
  § Personal belief that this was a unique period for CS
  § Call (out of the blue) from Steve Kirsch, CEO
• Given a 4-year paid scholarship to “e-school”
  § Chance to learn about
    Ø Using database system technology
    Ø Web and e-commerce applications
    Ø The startup company experience
  § Excellent senior team to learn from at Propel
  § Unemployment risk “low” in Silicon Valley
Propel (Web) Years
Context: E-commerce infrastructure
• Propel is developing two software products
  § E-Commerce Suite
    Ø “Amazon-in-a-box” product
  § Distributed Services Platform
    Ø Infrastructure product for the above (and other data-centric, mission-critical internet applications)
• Platform = Scalable 24x7 “e-commerce OS”
  § Online data management, caching, search, messaging, live deployment, monitoring, …
Propel (Web) Years
Context: E-C infrastructure (cont.)
[Architecture diagram showing: Firewall; Load Balancer; multiple Web Server/App Server pairs; Order Mgmt, ERP, and Payment services; and the Propel Platform (Message Service, Data Management & Search Service, Caching Service, Admin & Monitoring Service)]
Propel (Web) Years
Lesson #1: Standards vs. innovation
• What a marketing person will likely tell you after asking a customer for their input
  § Customers want standards-based solutions
    Ø “We want DB access via SQL and JDBC”
    Ø “We want our programmers to use EJBs (J2EE)”
    Ø “We want to use JSPs for our dynamic pages”
  § I.e., a typical customer dictionary entry says
    Ø Proprietary: see “bad”
• This poses obvious challenges for innovation!
  § Luckily…
    Ø XML is also considered “standards-based”
    Ø Performance and ease of use are still compelling in web-land
Propel (Web) Years
Lesson #2: Oracle is a de facto standard
• Talking to dot-coms with Oracle DBAs is an interesting experience for the academic-minded
  § Academic point of view
    Ø Whatever; it’s just a database system…
  § Oracle DBA point of view
    Ø Do my Oracle utilities work with your solution?
    Ø Do my Oracle sequences work with your solution?
    Ø You mean it’s not Oracle? (said with a whine)
• Again, this poses obvious challenges for innovation (not to mention other DB vendors!)
  § Luckily…
    Ø Saying “Oracle inside” seems to help
    Ø Oracle is not a cheap, perfect, or limitless solution
Propel (Web) Years
Lesson #3: VCs, dot-coms, and ASPs
• Oracle+Sun+Solaris are to web sites what IBM was to corporate IS departments 15+ years ago
  § Some VC firms prescribe(d) them to dot-coms
  § Some IS departments pre-approve (just) them
  § They are a favorite managed stack for ASPs
• Thus, today’s “technology brakes” include
  § Corporate and VC comfort zones
  § ASP system management expertise
  § Developer and DBA skill set availability
Part Three: Database research in the new millennium
The DB Field Has Matured
Bringing a new set of challenges
• SQL DB systems are becoming a commodity
  § ISVs produce DBMS-independent packages
    Ø Ex: ERP systems (SAP, PeopleSoft, Baan, …)
    Ø SQL + ODBC/JDBC is just a “given”
  § New features face a huge uphill battle
    Ø Witness the rate of object-relational adoption
    Ø Hopefully SQL99 will help, but…?
  § A SQL DBMS has truly become a component
    Ø Transactional storage for ERP
    Ø On-line data repository for e-commerce
    Ø I.e., just a place to put your data
• So where does that leave our community…?
The DB Field Has Matured
Bringing new challenges (cont.)
• Interesting questions remain! For example:
  § A good component is easy to manage
    Ø DB systems have way too many knobs
    Ø They’re virtually impossible to hide as a result
  § A good component plugs in well with others
    Ø Better, faster interfaces would be nice
    Ø Cache interaction hooks would be nice
    Ø Workflow hooks would be nice
    Ø (Your application hooks go here)
  § XML appears poised for interoperation success
    Ø W3C XML Schema, Query, & Protocol efforts
    Ø Our community should keep playing a big role
The DB Field Has Matured
Bringing new challenges (cont.)
• Interesting questions remain (cont.)
  § Major applications are worth studying
    Ø Ex: Kemper, Kossmann, et al. SAP study
    Ø Sources of “typical” workload info, database characteristics, and feature use (or disuse) info
  § Bottom line from a component perspective
    Ø We need to understand how our technologies are being utilized (or not) and respond accordingly
      - Ex. 1: Queries with parameter markers
      - Ex. 2: SQL’s approach to authorization
      - Ex. 3: Actual usage-driven interoperation hooks
  § And, of course, we must continue to innovate!
    Ø Somehow…?!?
E-Commerce DB Research
A Propel Perspective
• The Propel Distributed Services Platform
  § Scalable, 24x7 e-business infrastructure
    Ø Array of inexpensive Sun or Intel boxes
    Ø Exploitation of low main memory cost
  § High-performance and highly available
    Ø Data management and search capabilities
    Ø Transparent data replication & partitioning
    Ø Caching of page fragments, objects, and data
    Ø Scalable messaging & queuing infrastructure
    Ø Built from best-of-breed components
  § XML-enabled (for the future of e-business)
  § Unified administration and on-line deployment
E-Commerce DB Research
Problem #1: Caching
• What to cache and where to cache it?
  § Fragments of dynamic HTML pages
    Ø Personalization ruins basic page caching
    Ø Commonly used fragments assured, though
  § XML objects used to create HTML fragments
    Ø If applicable, probably less bulky
  § Java objects materialized on app servers
    Ø Avoids database re-access cost
    Ø Issues: load balancing, memory duplication
  § Database objects accessed from DB server(s)
    Ø Lowers database access cost
    Ø Where: app servers, DB server(s), or both?
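Why personalization “ruins basic page caching” while fragment caching still pays off can be sketched in a few lines. This is a hypothetical, single-process illustration using Python’s functools.lru_cache; the fragment functions and the render counter are invented for the example. Shared fragments (the header, a product’s description) are cached; the personalized fragment is recomputed per user, so a whole-page cache would almost never hit, but most of the page still comes from cache.

```python
from functools import lru_cache

# Counts how often each (hypothetical) fragment is actually rendered.
render_counts = {"header": 0, "product": 0, "recs": 0}


@lru_cache(maxsize=1024)
def header_fragment() -> str:
    """Identical for every visitor: rendered once per process."""
    render_counts["header"] += 1
    return "<div class='header'>Store</div>"


@lru_cache(maxsize=1024)
def product_fragment(product_id: int) -> str:
    """Identical for every visitor of this product: cached per product."""
    render_counts["product"] += 1
    return f"<div class='product'>Product {product_id}</div>"


def recommendations_fragment(user_id: str) -> str:
    """Personalized: recomputed on every request, never shared."""
    render_counts["recs"] += 1
    return f"<div class='recs'>Picks for {user_id}</div>"


def render_page(user_id: str, product_id: int) -> str:
    # The full page differs per user, so it is assembled from fragments.
    return (header_fragment()
            + product_fragment(product_id)
            + recommendations_fragment(user_id))
```

After two different users view product 42, the header and product fragments have each been rendered once and the recommendations twice. An in-process cache like this sidesteps the harder questions of where to cache and how to keep many such caches consistent across servers.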
E-Commerce DB Research
Problem #1: Caching (cont.)
• How to keep caches consistent?
  § Multiple web servers and app servers
  § DB rows -> Java objects -> XML -> HTML
    Ø How to uniquely identify objects?
    Ø How to keep track of what’s where?
    Ø How to keep track of data dependencies?
    Ø How/when to propagate updates?
    Ø How to maintain consistency?
    Ø In fact, how to define consistency…?
    Ø What about queries and query results?
• And, just to up the ante a bit further
  § Want all this to work across continents…!
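One piece of this, tracking data dependencies along the DB rows -> Java objects -> XML -> HTML chain, can be sketched as a dependency graph with transitive invalidation. This is a hypothetical single-node sketch (the class name and key scheme are invented); it deliberately ignores the harder questions on the slide, such as distribution, propagation timing, and what “consistent” should mean.

```python
from collections import defaultdict


class DependencyTracker:
    """Hypothetical cache that records which entries derive from which."""

    def __init__(self):
        self.cache = {}
        self._dependents = defaultdict(set)  # source key -> derived keys

    def put(self, key, value, derived_from=()):
        self.cache[key] = value
        for source in derived_from:
            self._dependents[source].add(key)

    def invalidate(self, key):
        """Drop key and, transitively, everything derived from it."""
        stack, dropped = [key], set()
        while stack:
            k = stack.pop()
            if k in dropped:
                continue
            dropped.add(k)
            self.cache.pop(k, None)
            stack.extend(self._dependents.pop(k, ()))
        return dropped
```

Usage: cache a product row, the Java object built from it, the XML built from the object, and an HTML page built from the XML plus a shared header. Updating the row and calling invalidate("row:product:42") evicts the object, XML, and page, while the unrelated header fragment stays cached.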
E-Commerce DB Research
Problem #2: Consistency & transactions
• Not all e-business data is equally “valuable”
  § Want to trade off reliability & performance
    Ø Products: hot, may be read-only once deployed
    Ø Shopping carts: read/write, “best effort” durability
    Ø Orders: also read/write, require full durability
• Similar considerations arise w.r.t. consistency
  § Would like well-defined choices available
    Ø Auctions: okay to bid using slightly outdated info
    Ø Orders: real-time inventory requires transactions
• Need good, architecturally appropriate solutions
  § Caching, replication, failover, smart load balancing, …
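The reliability/performance trade-off above can be sketched as a per-data-class durability policy. This is a hypothetical illustration, not Propel’s design: a list stands in for a forced (synced) log, and the policy table simply mirrors the three examples on the slide. Best-effort writes (carts) are buffered and flushed in batches; a fully durable write (an order) forces a flush immediately; deployed product data rejects writes.

```python
from enum import Enum


class Durability(Enum):
    READ_ONLY = "read_only"      # e.g., deployed product catalog
    BEST_EFFORT = "best_effort"  # e.g., shopping carts
    FULL = "full"                # e.g., orders


# Hypothetical policy table, one entry per kind of data.
POLICY = {
    "product": Durability.READ_ONLY,
    "cart": Durability.BEST_EFFORT,
    "order": Durability.FULL,
}


class TieredStore:
    """Buffers best-effort writes; forces a flush for fully durable ones."""

    def __init__(self, batch_size=100):
        self.memory = {}
        self.pending = []      # acknowledged but not yet on stable storage
        self.stable_log = []   # stand-in for a synced log on disk
        self.flushes = 0
        self.batch_size = batch_size

    def write(self, kind, key, value):
        level = POLICY[kind]
        if level is Durability.READ_ONLY:
            raise PermissionError(f"{kind} data is read-only once deployed")
        self.memory[(kind, key)] = value
        self.pending.append((kind, key, value))
        # FULL writes must reach stable storage before returning; buffered
        # best-effort writes ride along (like a group commit).
        if level is Durability.FULL or len(self.pending) >= self.batch_size:
            self._flush()

    def _flush(self):
        self.stable_log.extend(self.pending)
        self.pending.clear()
        self.flushes += 1
```

A crash before the next flush would lose buffered cart updates but never an acknowledged order, which is the “best effort” vs. “full durability” distinction expressed as code.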
E-Commerce DB Research
Problem #3: Queries and search
• W3C’s XML Schema recommendation
  § How to store richly typed XML data?
    Ø Sparse/variant data, repeating elements, subtyping, text, …
    Ø Would like to map it into (object-?) relational databases
• W3C’s XML Query recommendation
  § How to process XML queries efficiently?
    Ø SQL-appropriate processing model
    Ø Pushdown and other optimizations
  § How to handle search-oriented queries?
    Ø Want transaction-consistent text indexing
    Ø Also want relevance ranking and various IR “goodies”
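One simple way to map arbitrary XML into a relational database is a generic “edge table” mapping: one row per element, recording its parent and its position among siblings. This sketch uses Python’s xml.etree.ElementTree and sqlite3; the node table layout is an assumption for illustration, and it ignores attributes, mixed content, and typing, which are exactly what make richly typed XML Schema data hard. Repeating elements (two authors) become sibling rows, and document order survives in the ordinal column.

```python
import sqlite3
import xml.etree.ElementTree as ET


def shred(xml_text):
    """Edge-style shredding: (node_id, parent_id, ordinal, tag, text) rows."""
    rows = []

    def visit(elem, parent_id, ordinal):
        node_id = len(rows) + 1  # preorder numbering
        text = (elem.text or "").strip() or None
        rows.append((node_id, parent_id, ordinal, elem.tag, text))
        for i, child in enumerate(elem):
            visit(child, node_id, i)

    visit(ET.fromstring(xml_text), None, 0)
    return rows


def load(rows):
    """Store the shredded rows in a single relational table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE node (node_id INTEGER PRIMARY KEY, parent_id INTEGER, "
        "ordinal INTEGER, tag TEXT, text TEXT)"
    )
    conn.executemany("INSERT INTO node VALUES (?, ?, ?, ?, ?)", rows)
    return conn


doc = "<book><title>T</title><author>A</author><author>B</author></book>"
conn = load(shred(doc))
# A path-like query (book/author) becomes a self-join on parent_id.
authors = [r[0] for r in conn.execute(
    "SELECT c.text FROM node p JOIN node c ON c.parent_id = p.node_id "
    "WHERE p.tag = 'book' AND c.tag = 'author' ORDER BY c.ordinal")]
print(authors)
```

Every path step costs a self-join in this scheme, which hints at why efficient XML query processing over relational storage needs the pushdown and optimization work mentioned above.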
E-Commerce DB Research
Problem #4: Content management
• E-business web sites are rich in content
  § HTML fragments (e.g., logos and other goodies)
  § Images (e.g., pictures of products)
  § Text (e.g., descriptions of products)
  § Database data (e.g., product attributes, pricing)
  § JSP pages (e.g., a product page)
  § Personalization rules (i.e., what to show me)
  § Business logic (i.e., Java code)
  § Data -> object mappings (e.g., Java classes)
  § And the list goes on…
E-Commerce DB Research
Problem #4: Content mgmt. (cont.)
• This poses a number of problems
  § Versioning of file-based artifacts
    Ø Not unlike CAD or document versioning
    Ø Multiple editors working on the content base
    Ø Several companies do this (e.g., Interwoven)
  § Versioning of DB-based artifacts
    Ø Not clear how to handle & integrate this part
    Ø No winning solutions out there yet (that I know of)
  § Versioning of code-based artifacts
    Ø How to keep all this stuff mutually consistent?
    Ø And, how to deploy online in a 24x7 world…?
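One common way to keep heterogeneous artifacts mutually consistent during online deployment is to pin a version of each artifact in a named release and switch the live release with a single pointer swap. The sketch below is hypothetical (class, method, and artifact names are invented) and treats every artifact as an opaque versioned blob, so it sidesteps the harder DB-based versioning question above. Editors keep committing new versions; visitors see only what the live release pins.

```python
class ContentRepo:
    """Hypothetical sketch: versioned artifacts plus atomically deployed releases."""

    def __init__(self):
        self.history = {}   # artifact name -> list of versions (index = version)
        self.releases = {}  # release name -> {artifact name: pinned version}
        self.live = None    # name of the release currently served

    def commit(self, artifact, content):
        versions = self.history.setdefault(artifact, [])
        versions.append(content)
        return len(versions) - 1

    def tag_release(self, name, artifacts):
        # Pin the latest version of each listed artifact so they ship together.
        self.releases[name] = {a: len(self.history[a]) - 1 for a in artifacts}

    def deploy(self, name):
        if name not in self.releases:
            raise KeyError(name)
        self.live = name  # one reference swap switches every artifact at once

    def read(self, artifact, release=None):
        # A request that captures `live` once and passes it to every read
        # sees a single consistent release even if a deploy happens mid-request.
        manifest = self.releases[release or self.live]
        return self.history[artifact][manifest[artifact]]
```

Usage: after deploying release "r1" (product.jsp v0, rules.xml v0), an editor can commit product.jsp v1 without affecting the live site; the new page and any matching rule change go live together only when "r2" is tagged and deployed.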
E-Commerce DB Research
Problem #5: The sun never sets anymore
• The web brings a clear need for 24x7 solutions
  § Asynchronous replication techniques
  § Online schema evolution (w/replication)
  § Online data loading and deployment
  § Online management of rolling history data
• Design for administration/monitoring is also key
  § Online backup/restore
  § Failure & performance monitoring
  § Would like system to be self-tuning & self-scaling
    Ø Reassign boxes between services as needed
    Ø Even give and take boxes from ASP infrastructure
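Asynchronous replication, the first item above, can be sketched as log shipping: the primary commits locally and appends to an ordered log without waiting, and each replica applies log records in order at its own pace, tracking the last log sequence number (LSN) it has applied. This is a hypothetical in-memory sketch; real systems add durable logs, network transport, and failover, none of which is modeled here.

```python
class Primary:
    """Hypothetical primary: commits locally and logs each write."""

    def __init__(self):
        self.data = {}
        self.log = []  # (lsn, key, value) records in commit order

    def write(self, key, value):
        # The write "commits" immediately; replicas are NOT waited on.
        self.data[key] = value
        self.log.append((len(self.log), key, value))


class Replica:
    """Hypothetical replica that pulls and applies the primary's log."""

    def __init__(self):
        self.data = {}
        self.applied_lsn = -1  # highest LSN applied so far

    def lag(self, primary):
        """Number of committed records this replica has not yet applied."""
        return len(primary.log) - 1 - self.applied_lsn

    def pull(self, primary, max_records=None):
        """Apply the next batch of log records in order; return how many."""
        start = self.applied_lsn + 1
        end = len(primary.log)
        if max_records is not None:
            end = min(end, start + max_records)
        for lsn, key, value in primary.log[start:end]:
            self.data[key] = value
            self.applied_lsn = lsn
        return end - start
```

Because records are applied strictly in LSN order, a lagging replica always shows some earlier state of the primary, never a mix. That is the “slightly outdated info” an auction bid might tolerate but a real-time inventory check would not.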
The Propel Platform
We’re attacking all of these issues
• Programming model
  § Objects with (truly!) universal OIDs
  § Java classes, derived from XML Schema objects
• Caching
  § Multilevel cache hierarchy (w/partitioning)
  § Mini-caches, global cache, MM-DBMS, DB-DBMS
• Consistency and transactions
  § Can trade off ACID-ity vs. performance
• Queries and search
  § XML-influenced query language, integrated search
  § Transparency for cached, partitioned, & replicated data
The Propel Platform
We’re attacking all of these issues (cont.)
• Platform messaging support
  § Clustered IPC for Platform components
    Ø Load balancing & failover
    Ø System monitoring
  § Persistent queues as database objects
    Ø Think “active tables” (enqueue/dequeue, queries)
    Ø Good foundation for transactional workflows
• Content management
  § Currently focused on deployment problems
  § Partnering for content management today
• System monitoring and administration
  § Separate software stack with agents everywhere
  § JSP-based console to oversee & integrate activities
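The “persistent queues as database objects” idea can be sketched with an ordinary table: enqueue is an INSERT, dequeue removes the oldest row inside one transaction, and because the queue is just a table it can also be queried with SQL. This is a hypothetical sketch using Python’s sqlite3, not the Propel implementation; the table name is an assumption, and it is trusted because it is interpolated into SQL.

```python
import sqlite3


class TableQueue:
    """Hypothetical persistent FIFO queue stored as an ordinary table."""

    def __init__(self, path=":memory:", name="order_events"):
        # isolation_level=None lets us issue BEGIN/COMMIT explicitly.
        self.conn = sqlite3.connect(path, isolation_level=None)
        self.name = name  # assumed trusted: interpolated into SQL below
        self.conn.execute(
            f"CREATE TABLE IF NOT EXISTS {name} ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
        )

    def enqueue(self, payload):
        # A single INSERT is atomic on its own in autocommit mode.
        self.conn.execute(
            f"INSERT INTO {self.name} (payload) VALUES (?)", (payload,))

    def dequeue(self):
        # BEGIN IMMEDIATE takes SQLite's write lock up front, so two
        # consumers cannot both read the same head row before one deletes it.
        self.conn.execute("BEGIN IMMEDIATE")
        try:
            row = self.conn.execute(
                f"SELECT id, payload FROM {self.name} ORDER BY id LIMIT 1"
            ).fetchone()
            if row is not None:
                self.conn.execute(
                    f"DELETE FROM {self.name} WHERE id = ?", (row[0],))
            self.conn.execute("COMMIT")
        except Exception:
            self.conn.execute("ROLLBACK")
            raise
        return None if row is None else row[1]

    def pending(self, like="%"):
        # The queue is just a table, so it can be queried like one.
        return self.conn.execute(
            f"SELECT COUNT(*) FROM {self.name} WHERE payload LIKE ?", (like,)
        ).fetchone()[0]
```

Usage: enqueue "order:1 placed", "order:2 placed", and "order:1 shipped"; pending("order:1%") reports the two events for order 1, and successive dequeue() calls return the events in arrival order, then None. Since dequeue runs in a transaction, it could be combined with other table updates to form the transactional workflows the slide mentions.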
Conclusion
Lessons from the “Road to Propel”
• UW-Madison lessons: Know what matters!
  § Awareness is key
  § Students are the product
• IBM Almaden lessons: What’s really hard?
  § Products are hard to build
  § Adding to a language is hard
  § Listen to users’ needs
• Propel lessons: Commoditization brings roadblocks.
  § Standards vs. innovation
  § Oracle is a de facto standard
  § Dot-coms, VCs, and ASPs
Conclusion
DB research in the new millennium
• SQL databases are becoming commodity parts
  § ISVs strive for DBMS vendor-independence
  § This makes (visible) innovation hard
  § Lots of interesting research questions, though
    Ø Component hooks, usage scenarios, XML, …
• E-commerce problems are ripe for the picking
  § Examples that have arisen at Propel include
    Ø Caching, transactions & consistency
    Ø Queries and search
    Ø Content management
    Ø Online everything for a 24x7 world
Conclusion
Some operational recommendations
• Understand the real problems out there
  § Industrial friends can be very helpful
  § Your students will benefit tremendously
  § So will the companies who hire them
• Recognize that commoditization is happening
  § Consider working within the constraints that it brings
  § Many important open problems remain
  § E-commerce is one fun/interesting example here
• Also keep in mind what really matters
  § It’s actually not any of this stuff, in the end…!