Saying Yes to No SQL Overview n n

  • Slides: 25
Download presentation
Saying “Yes” to No. SQL Overview: n n The Relational Model Structured Query Language

Saying “Yes” to No. SQL Overview: n n The Relational Model Structured Query Language (SQL) The “original” No. SQL Movement No. SQL Today Inspiration for this talk: n n n Dr. Ford Dr. Kaner Dr. Menezes

The Relational Model E. F. Codd: (1923 -2003) n n n Developed the relational

The Relational Model E. F. Codd: (1923 -2003) n n n Developed the relational model while at IBM San Jose Research Laboratory IBM Fellow 1976 Turing Award 1981 ACM Fellow 1994 British, by birth Associations: n n n Raymond F. Boyce Hugh Darwen C. J. Date Nikos Lorentzos David Mc. Goveran Fabian Pascal 2

The Relational Model “A Relational Model of Data for Large Shared Data Banks, ”

The Relational Model “A Relational Model of Data for Large Shared Data Banks, ” E. F. Codd, Communications of the ACM, Vol. 13, No. 6, June, 1970. “Further Normalization of the Data Base Relational Model, ” E. F. Codd, Data Base Systems, Proceedings of 6 th Courant Computer Science Symposium, May, 1971. “Relational Completeness of Data Base Sublanguages, ” E. F. Codd, Data Base Systems, Proceedings of 6 th Courant Computer Science Symposium, May, 1971. Plus others… 3

The Relational Model The basic data model: n n n Relations, tuples, attributes, domains

The Relational Model The basic data model: n n n Relations, tuples, attributes, domains Primary & foreign keys Normal forms “Employee” ID 15394 21621 17852 32904 Last-Name Jones Smith Brown Carson Date-of-Birth 11/3/75 6/24/69 8/14/72 10/29/64 : : Job-Category Software Management Hardware Software Query model: n n Relational algebra – cartesian product, selection, projection, union, set-difference Relational calculus A primary theme: n Physical data independence 4

Relational Database Management Systems (RDBMS) Database Management Systems Based on the Relational Model: n

Relational Database Management Systems (RDBMS) Database Management Systems Based on the Relational Model: n n n n System R – IBM research project (1974) Ingres – University of California Berkeley (early 1970’s) Oracle – Rational Software, now Oracle Corporation (1974) SQL/DS – IBM’s first commercial RDBMS (1981) Informix – Relational Database Systems, now IBM (1981) DB 2 – IBM (1984) Sybase SQL Server – Sybase, now SAP (1988) 5

Structure Query Language (SQL) SQL is a language for querying relational databases. History: n

Structure Query Language (SQL) SQL is a language for querying relational databases. History: n n Developed at IBM San Jose Research Laboratory, early 1970’s, for System R Credited to Donald D. Chamberlin and Raymond F. Boyce Based on relational algebra and tuple calculus Originally called SEQUEL Language Elements: n Clauses, expressions, predicates, queries, statements, transactions, operators, nesting etc. select o_orderpriority, count(*) as order_count from orders where o_orderdate >= date '[DATE]‘ and o_orderdate < date '[DATE]' + interval '3' month and exists (select * from lineitem where l_orderkey = o_orderkey and l_commitdate < l_receiptdate) group by o_orderpriority order by o_orderpriority; 6

SQL and the Relational Model A text search of E. F. Codd’s early papers

SQL and the Relational Model A text search of E. F. Codd’s early papers for “SQL” (or SEQUEL) reveals: 7

Relational Query Languages Other Relational Query Languages: n n n Datalog QUEL Query By

Relational Query Languages Other Relational Query Languages: n n n Datalog QUEL Query By Example (QBE) SQL variations shell scripts, with relational extensions 8

The No. SQL RDBMS One of first uses of the phrase No. SQL is

The No. SQL RDBMS One of first uses of the phrase No. SQL is due to Carlo Strozzi, circa 1998. No. SQL: n n n A fast, portable, open-source RDBMS A derivative of the RDB database system (Walter Hobbs, RAND) Not a full-function DBMS, per se, but a shell-level tool User interface – Unix shell Based on the “operator/stream paradigm” http: //www. strozzi. it/cgi-bin/CSA/tw 7/I/en_US/nosql/Home%20 Page 9

Operator/stream Paradigm Commonly referenced papers: n n “The Next Generation, ” E. Schaffer and

Operator/stream Paradigm Commonly referenced papers: n n “The Next Generation, ” E. Schaffer and M. Wolf, UNIX Review, March, 1991, page 24. “The UNIX Shell as a Fourth Generation Language, ” E. Schaffer and M. Wolf, Revolutionary Software. Regarding Database Management Systems: “…almost all are software prisons that you must get into and leave the power of UNIX behind. ” “…large, complex programs which degrade total system performance, especially when they are run in a multi-user environment. ” “…put walls between the user and UNIX, and the power of UNIX is thrown away. ” In summary: n n Relational model => yes UNIX => big yes Big, COTS, relational DBMS => no SQL => no 10

The No. SQL RDBMS Getting back to Strozzi’s No. SQL RDBMS: n n n

The No. SQL RDBMS Getting back to Strozzi’s No. SQL RDBMS: n n n Based on the relational model Based on UNIX and shell scripts Does not have an SQL interface In that sense, and interpreted literally, No. SQL means “no sql, ” i. e. , we are not using the SQL language. 11

No. SQL Today More recently: § The term has taken on different meanings §

No. SQL Today More recently: § The term has taken on different meanings § One common interpretation is “not only SQL” Most modern No. SQL systems diverge from the relational model or standard RDBMS functionality: The data model: tuples attributes domains normalization relations vs. The query model: tuple calculus relational algebra vs. text search map/reduce The implementation: rigid schemas ACID compliance graphs key/values documents graph traversal vs. (schema-less) flexible schemas vs. BASE In that sense, No. SQL today is more commonly meant to be something like “non-relational” 12

No. SQL Today Motivation for recent No. SQL systems is also quite varied: §

No. SQL Today Motivation for recent No. SQL systems is also quite varied: § “…there are significant advantages to building our own storage solution at Google, ” Chang et. al. , 2006 § Scalability, performance, availability, flexibility § Speculation - $$$, control My. SQL vs. Mongo. DB: • http: //www. youtube. com/watch? v=b 2 F-DIt. Xt. Zs How “big” is the No. SQL movement? Will they eventually eliminate the need for relational databases? Is this another grand conspiracy by the government and, you know, that guy…. 13

No. SQL Today (a partial, unrefined list) Hbase Cassandra Hypertable Accumulo Amazon Simple. DB

No. SQL Today (a partial, unrefined list) Hbase Cassandra Hypertable Accumulo Amazon Simple. DB Sci. DB Stratosphere flare Cloudata Big. Table QD Technology Smart. Focus KDI Alterian Cloudera C-Store Vertica Qbase–Meta. Carta Open. Neptune HPCC Mongo DB Couch. DB Clusterpoint Server. Terrastore Jackrabbit Orient. DB Perservere Coud. Kit Djondb Schema. Free. DB SDB Raptor. DB Thru. DB Raven. DB Dynamo. DB Azure Table Storage Couchbase Server Riak Level. DB Chordless Genie. DB Scalaris Tokyo Kyoto Cabinet Tyrant Scalien Berkeley DB Voldemort Dynomite KAI Memcache. DB Faircom C-Tree Hamster. DB STSdb Tarantool/Box Maxtable Pincaster Raptor. DB TIBCO Active Spaces allegro-C ness. DBHyper. Dex Mnesia Light. Cloud Hibari Bang. DB Open. LDAP/MDB/Lightning Scality Redis Ka. Tree Tom. P 2 P Kumofs Treap. DB NMDB luxio actord Keyspace schema-free RAMCloud Sub. Record Mo 8 on. Db Dovetaildb JDBM Neo 4 Infinite. Graph Sones Info. Grid Hyper. Graph. DB DEX Graph. Base Trinity Allegro. Graph Brightstar. DB Bigdata Meronymy Open. Link Virtuoso Vertex. DB Flock. DB Execom IOG Java Univ Netwrk/Graph Framework Open. RDF/Sesame Filament OWLim i. Graph Jena SPARQL Orient. Db Arango. DB Alchemy. DB Soft No. SQL Systems Db 4 o Versant Objectivity Starcounter ZODB Magma NEO siaqodb Sterling Morantex Eye. DB HSS Database Framer. D Ninja Database Pro Stupid. DB Kioku. DB Perl solution Durus Giga. Spaces Infinispan Queplix Grid. Gain Galaxy Space. Base Joafip. Coherence e. Xtreme. Scale Mark. Logic Server EMC Documentum x. DB e. Xist Sedna Base. X Qizx Network. X Pico. List Hazelcast Jas. DB Berkeley DB XML Xindice Tamino Globals Intersystems Cache GT. M EGTM U 2 Open. Insight Reality Open. QM ESENT j. BASE Multi. Value Lotus/Domino e. Xtreme. DB RDM Embedded ISIS Family Prevayler Yserial Vmware v. Fabric Gem. Fire Btrieve Kirby. Base Tokutek Recutils File. DB Armadillo illuminate Correlation Database Fluid. DB Fleet DB Twisted Storage Rindo Sherpa tin Dryad Sky. Net Disco MUMPS Adabas XAP In-Memory Grid e. Xtreme Scale Mckoi. DDB Mckoi SQL Database Innostore No-List KDI Perst Oracle Big Data Appliance Fleet. DB IODB 14

No. SQL Today It is easy to find diagrams that look like this: •

No. SQL Today It is easy to find diagrams that look like this: • http: //www. vertabelo. com/blog/vertabelo-news/jdd-2013 -what-we-found-out-about-databases It is easy to find diagrams that look like this: • http: //db-engines. com/en/ranking_categories It is easy to find diagrams that look like this: • http: //www. odbms. org/2014/11/gartner-2014 -magic-quadrant-operational-database-management-systems-2/ 15

Primary No. SQL Categories General Categories of No. SQL Systems: n n Key/value store

Primary No. SQL Categories General Categories of No. SQL Systems: n n Key/value store (wide) Column store Graph store Document store Compared to the relational model: n n Query models are not as developed. Distinction between abstraction & implementation is not as clear. 16

Key/Value Store “Dynamo: Amazon’s Highly Available Key-value Store, ” De. Candia, G. , et

Key/Value Store “Dynamo: Amazon’s Highly Available Key-value Store, ” De. Candia, G. , et al. , SOSP’ 07, 21 st ACM Symposium on Operating Systems Principles. The basic data model: n n Database is a collection of key/value pairs The key for each pair is unique Primary operations: n n No requirement for normalization (and consequently dependency preservation or lossless join) insert(key, value) delete(key) update(key, value) lookup(key) Additional operations: n n variations on the above, e. g. , reverse lookup iterators Dynamo. DB Azure Table Storage Riak Rdis Aerospike Foundation. DB Level. DB Berkeley DB Oracle No. SQL Database Genie. Db Bang. DB Chordless Scalaris Tokyo Cabinet/Tyrant Scalien Voldemort Dynomite KAI Memcache. DB Faircom C-Tree LSM Kitaro. DB Hamster. DB STSdb Tarantool. Box Maxtable Quasardb Pincaster Raptor. DB TIBCO Active Spaces Allegro-C ness. DB Hyper. Dex Shared. Hash. File Symas LMDB Sophia Pickle. DB Mnesia Light. Cloud Hibari Open. LDAP Genomu Binary. Rage Elliptics Dbreeze Rocks. DB Treode. DB (www. nosql-database. org www. db-engines. com www. wikipedia. com) 17

Wide Column Store “Bigtable: A Distributed Storage System for Structured Data, ” Chang, F.

Wide Column Store “Bigtable: A Distributed Storage System for Structured Data, ” Chang, F. , et al. , OSDI’ 06: Seventh Symposium on Operating System Design and implementation, 2006. The basic data model: n n n Database is a collection of key/value pairs Key consists of 3 parts – a row key, a column key, and a time-stamp (i. e. , the version) Flexible schema - the set of columns is not fixed, and may differ from row-to-row One last column detail: n Column key consists of two parts – a column family, and a qualifier Warning #1! Accumulo Amazon Simple. DB Big. Table Cassandra Cloudata Cloudera Druid Flink Hbase Hortonworks HPCC Hyupertable KAI KDI Map. R Monet. DB Open. Neptune Qbase Splice Machine Sqrrl (www. nosql-database. org www. db-engines. com www. wikipedia. com) 18

Wide Column Store Column families Row key Personal data ID First Name Last Name

Wide Column Store Column families Row key Personal data ID First Name Last Name Professional data Date of Birth Job Category Salary Date of Hire Employer Column qualifiers 19

Wide Column Store Personal data Professional data ID First Name Last Name Date of

Wide Column Store Personal data Professional data ID First Name Last Name Date of Birth Job Category Salary Date of Hire ID First Name Middle Name Last Name Job Category Employer Hourly Rate ID First Name ID Last Name Job Category Salary Date of Hire Employer Group Employer Seniority Insurance ID Bldg # Office # Emergency Contact Medical data One “table” 20

Wide Column Store Row key t 1 t 0 ID First Name Last Name

Wide Column Store Row key t 1 t 0 ID First Name Last Name Date of Birth Job Category Personal data Salary Date of Hire Employer Professional data One “row” in a wide-column No. SQL database table = Many rows in several relations/tables in a relational database 21

Graph Store Neo 4 j - “The Neo Database – A Technology Introduction, ”

Graph Store Neo 4 j - “The Neo Database – A Technology Introduction, ” 2006. The basic data model: n n Directed graphs Nodes & edges, with properties, i. e. , “labels” Allegro. Graph Arango. DB Bigdata Bitsy Brightstar. DB DEX/Sparksee Execom IOG Fallen * Filament Flock. DB Graph. Base Graphd Horton Hyper. Graph. DB IBM System G Native Store Infinite. Graph Info. Grid j. Core. DB Graph Map. Graph Meronymy Neo 4 j Orly Open. Link virtuoso Oracle Spatial and Graph Oracle No. SQL Datbase Orient. DB OQGraph Ontotext OWLIM R 2 DF ROIS Sones Graph. DB SPARQLCity Sqrrl Enterprise Stardog Teradata Aster Titan Trinity Triple. Bit Velocity. Graph Vertex. DB White. DB (www. nosql-database. org www. db-engines. com www. wikipedia. com) 22

Document Store Mongo. DB - “How a Database Can Make Your Organization Faster, Better,

Document Store Mongo. DB - “How a Database Can Make Your Organization Faster, Better, Leaner, ” February 2015. The basic data model: n The general notion of a document – words, phrases, sentences, paragraphs, sections, subsections, footnotes, etc. n Flexible schema – subcomponent structure may be nested, and vary from document-to-document. n n Metadata – title, author, date, embedded tags, etc. Key/identifier. One implementation detail: n Formats vary greatly – PDF, XML, JSON, BSON, plain text, various binary, scanned image. Amisa. DB Arango. DB Base. X Cassandra Cloudant Clusterpoint Couchbase Couch. DB Densodb Djondb EJDB Elasticsearch e. Xist Fleet. DB i. Box. DB Inquire Jas. DB Mark. Logic Mongo. DB MUMPS Ne. DB No. SQL embedded db Orient. DB Raptor. DB Raven. DB Rethink. DB Siso. DB Terrastore Thru. DB (www. nosql-database. org www. db-engines. com www. wikipedia. com) 23

ACID vs. BASE Database systems traditionally support ACID requirements: n Atomicity, Consistency, Isolation, Durability

ACID vs. BASE Database systems traditionally support ACID requirements: n Atomicity, Consistency, Isolation, Durability In a distributed web applications the focus shifts to: n Consistency, Availability, Partition tolerance CAP theorem - At most two of the above can be enforced at any given time. n n Conjecture – Eric Brewer, ACM Symposium on the Principles of Distributed Computing, 2000. Proved – Seth Gilbert & Nancy Lynch, ACM SIGACT News, 2002. Reducing consistency, at least temporarily, maintains the other two. 24

ACID vs. BASE Thus, distributed No. SQL systems are typically said to support some

ACID vs. BASE Thus, distributed No. SQL systems are typically said to support some form of BASE: n n n Basic Availability Soft state Eventual consistency* “We’d really like everything to be structured, consistent and harmonious, …, but what we are faced with is a little bit of punk-style anarchy. And actually, whilst it might scare our grandmothers, it’s OK. . . ” -Julian Browne https: //www. youtube. com/watch? v=p. Oe 9 PJrbo 0 s 25