Relational vs NonRelational Databases RDBMS vs No SQL

Relational vs Non-Relational Databases RDBMS vs No. SQL Usman Ahmed - Presented at PASS Lahore General Conference 2017 – Sponsored by Confiz Limited. https: //lahore. pass. org

About Me • CTO at Confiz • 20 years of experience in IT – Product development and Services • Worked in US, Europe, Middle East and Pakistan with clients including General Electric, Ford Motors, Nestle, Qatar Exchange, P&G, Microsoft, Mobilink • Patents in US and EU on Artificial Intelligence and Computer Vision Usman Ahmed - Presented at PASS Lahore General Conference 2017 – Sponsored by Confiz Limited. https: //lahore. pass. org

Agenda • • Different database types Comparison Use cases Recommendations * Content has been taken from James Serra’s presentation Usman Ahmed - Presented at PASS Lahore General Conference 2017 – Sponsored by Confiz Limited. https: //lahore. pass. org

Definitions • Relational Databases – Example: Microsoft SQL Server, Oracle DB, DB 2 – Used in large enterprise scenarios • Non-relational databases – Example: Mongo. DB, Cassandra – Four categories: Key-value stores, wide-column stores, Document stores and graph stores Usman Ahmed - Presented at PASS Lahore General Conference 2017 – Sponsored by Confiz Limited. https: //lahore. pass. org

Relational Database • Pros – – – Works with structured data Transactional consistency Supports Joins Built-in Data integrity Relationships via constraints Strong SQL • Cons – – Does not scale out horizontally Normalized data means joins hence affecting speed Schema-on-write Cost Usman Ahmed - Presented at PASS Lahore General Conference 2017 – Sponsored by Confiz Limited. https: //lahore. pass. org

Non-Relational Database • Pros – – – – Works with semi-structured data Scales out horizontally High concurrency, high volume random reads and writes Massive data stores Scheme-free, schema-on-read Cost Simplicity of design: no “impedance mismatch” Speed • Cons – – Denormalized data, requiring mass updates No relationship enforcement Limited transaction support Uses 10 -50 x more space Usman Ahmed - Presented at PASS Lahore General Conference 2017 – Sponsored by Confiz Limited. https: //lahore. pass. org

ACID (RDBMS) • Atomicity: All data and commands in a transaction succeed or all fail and roll back • Consistency: All committed data must be consistent with all data rules including constraints, triggers, cascades, atomicity, isolation and durability • Isolation: Other operations cannot access data that has been modified during a transaction that has not been completed • Durability: Once a transaction is committed, data will survive system failures and can be recovered Usman Ahmed - Presented at PASS Lahore General Conference 2017 – Sponsored by Confiz Limited. https: //lahore. pass. org

BASE (No. SQL) • Basically Available: Guaranteed Availability • Soft-state: State of the system may change, even without a query (because of a node updates) • Eventually consistent: System will become consistent over time Usman Ahmed - Presented at PASS Lahore General Conference 2017 – Sponsored by Confiz Limited. https: //lahore. pass. org

Comparison RDBMS No. SQL Strong consistency Weak consistency – stale data OK Isolation Last Write Wins Transaction Programmer Managed Available / Consistent Available / Partition Tolerant Robust Database / Simpler Code Simpler Database / Harder Code Usman Ahmed - Presented at PASS Lahore General Conference 2017 – Sponsored by Confiz Limited. https: //lahore. pass. org

Relational Stores • Data Stored in tables • Tables contain some number of columns, each of a type • A schema describes the columns each table can have • Every table’s data is stored in one or more rows • Each row contains a value for every column in that table • Rows aren’t kept in any particular order Usman Ahmed - Presented at PASS Lahore General Conference 2017 – Sponsored by Confiz Limited. https: //lahore. pass. org

Key – Value Stores • Key-value stores offer very high speed via the least complicated data model – anything can be stored as a value, as long as each value is associated with a key or a name Usman Ahmed - Presented at PASS Lahore General Conference 2017 – Sponsored by Confiz Limited. https: //lahore. pass. org

Wide-Column Stores • Wide-column include a primary key, an optional secondary and anything stored as a value. Also called column stores. Usman Ahmed - Presented at PASS Lahore General Conference 2017 – Sponsored by Confiz Limited. https: //lahore. pass. org

Document Stores • Document stores contain data objects that are inherently hierarchical, tree-like structures (JSON or XML) Usman Ahmed - Presented at PASS Lahore General Conference 2017 – Sponsored by Confiz Limited. https: //lahore. pass. org

Graph store • Uses graphic structures for semantic queries with nodes, edges and properties to represent and store data Usman Ahmed - Presented at PASS Lahore General Conference 2017 – Sponsored by Confiz Limited. https: //lahore. pass. org
![Use cases for No. SQL categories • Key-value stores: [Redis] For cache, queues, fit-in Use cases for No. SQL categories • Key-value stores: [Redis] For cache, queues, fit-in](http://slidetodoc.com/presentation_image/234c561d49422161a23534e1bf644f4f/image-15.jpg)
Use cases for No. SQL categories • Key-value stores: [Redis] For cache, queues, fit-in memory, rapidy changing data, store blob data. (Shopping cart, Sessoion Data, Stock Prices). Fastest performance • Wide-column stores: [Cassandra] Real-time querying of random (non-sequential) data, huge number of writes, sensors. (Web analytics, real-time data analytics, time series analytics): Internet scale • Document stores: [Mongo. DB] Flexible schemas, dynamics queries, defined indexes, good performance on big DB: (Order data, customer data, log data, chat sessions, tweets, ratings, comments). Fastest development • Graph database: [Neo 4 j] Graph-style data, social network, master data management (fraud detection, graph search, gene sequencing) Usman Ahmed - Presented at PASS Lahore General Conference 2017 – Sponsored by Confiz Limited. https: //lahore. pass. org

Velocity Transactions Per Second Volume Per day Relational DB Document DB Key Value 100, 000 8 GB As Is 1 M 86 GB Tuned As is 5 M 432 GB Appliance Tuned As Is 10 M 864 GB Clustered Appliance Clustered Servers Tuned 100 M 8. 6 TB Cluster of Clusters Clustered Servers 500 M 43 TB Cluster of Clusters Usman Ahmed - Presented at PASS Lahore General Conference 2017 – Sponsored by Confiz Limited. https: //lahore. pass. org

Use case for different database techniques • Traditional OLTP business systems (ERP, CRM, In-house apps): Relational database (RDBMS) • Database warehouses (OLAP): Relational database (SMP / MPP) • Web and mobile global OLTP applications: Non -relational database (No. SQL) • Data Lake: Hadoop Usman Ahmed - Presented at PASS Lahore General Conference 2017 – Sponsored by Confiz Limited. https: //lahore. pass. org

CAP Theorem • Impossible for any shared data system to guarantee all three: • Consistency: Once data is written, all future requests will contain the data • Availability: Database is always responsive and available. • Partitioning: If part of database is unavailalbe, other parts are unaffected SQL Server with no replication: CA Cassandra, Coach. DB: AP Hbase, Mongo. DB, Redis: CP Usman Ahmed - Presented at PASS Lahore General Conference 2017 – Sponsored by Confiz Limited. https: //lahore. pass. org

DB Engines Ranking (May 2017) Ranking DBMS DB Model 1 Oracle Relational 2 My. SQL Relational 3 Microsoft SQL Server Relational 4 Postgre. SQL Relational 5. Mongo. DB Document Store 6. DB 2 Relational 7. MS Access Relational 8. Cassandra Wide column Store 9. Redis Key-value Store 10. SQLite Relational DBMS Db-engines. com/en/ranking Usman Ahmed - Presented at PASS Lahore General Conference 2017 – Sponsored by Confiz Limited. https: //lahore. pass. org

Summary Current market share has 80% Relational and 14% Non. Relational DBs Choose No. SQL when • Data has lot of volume and / or variety • Data is non-relational / semi-structured • You can relax transactional consistency and scalability and performance are more important • Want to serve large number of users vs enforcing business rules Relational databases provide strong consistency at the cost of speed and scalability. Usman Ahmed - Presented at PASS Lahore General Conference 2017 – Sponsored by Confiz Limited. https: //lahore. pass. org

Bottom line • RDBMS for enterprise OLTP and ACID compliance, or db’s under 5 TB • No. SQL for scaled OLTP and JSON documents • Hadoop for big data analytics (OLAP) Usman Ahmed - Presented at PASS Lahore General Conference 2017 – Sponsored by Confiz Limited. https: //lahore. pass. org
- Slides: 21