Taming the Big Data Fire Hose
John Hugg, Sr. Software Engineer, VoltDB
The NewSQL database you’ll never outgrow
Big Data Defined
§ Velocity
+ Moves at very high rates (think sensor-driven systems)
+ Valuable in its temporal, high-velocity state
§ Volume
+ Fast-moving data creates massive historical archives
+ Valuable for mining patterns, trends and relationships
§ Variety
+ Structured (logs, business transactions)
+ Semi-structured and unstructured
Example Big Data Use Cases
§ Financial trade monitoring (data source: capital markets)
+ High-frequency operations: write/index all trades, store tick data
+ Lower-frequency operations: show consolidated risk across traders
§ Telco call data record management (data source: call initiation requests)
+ High-frequency operations: real-time authorization
+ Lower-frequency operations: fraud detection/analysis
§ Website analytics, fraud detection (data source: inbound HTTP requests)
+ High-frequency operations: visitor logging, analysis, alerting
+ Lower-frequency operations: traffic pattern analytics
§ Online gaming microtransactions (data source: online game)
+ High-frequency operations: rank scores (defined intervals, player “bests”)
+ Lower-frequency operations: leaderboard lookups
§ Digital ad exchange services (data source: real-time ad trading systems)
+ High-frequency operations: match form factor, placement criteria, bid/ask
+ Lower-frequency operations: report ad performance from exhaust stream
§ Wireless location-based services (data source: mobile device location sensor)
+ High-frequency operations: location updates, QoS, transactions
+ Lower-frequency operations: analytics on transactions
Big Data and You
§ Incoming data streams are different from traditional business apps
+ You need to write data quickly and reliably, but …
§ It’s not just about high-speed writes
+ You need to validate in real time
+ You need to count and aggregate
+ You need to analyze in real time
+ You need to scale on demand
+ You may need to transact
Big Data Management Infrastructure
§ High-velocity workloads: online gaming, ad serving, sensor data, financial trade, Internet commerce, SaaS/Web 2.0, mobile platforms
§ High-velocity engines
+ NewSQL (VoltDB): structured data, ACID guarantees, relational/SQL, real-time analytics
+ NoSQL: unstructured data, eventual consistency, schemaless, KV/document
§ High-volume analytic datastores: other OLAP data stores
High Velocity Data Management
High Velocity DBMS Requirements
§ Ingest data at very high rates
§ Scale easily to meet growth and demand peaks
§ Support integrated fault tolerance
§ Support a wide range of real-time (or “near-time”) analytics
§ Integrate easily with high-volume analytic datastores
High-Speed Data Ingestion
§ Support millions of write operations per second at scale
§ Keep read and write latencies below 50 milliseconds
§ Provide ACID-level consistency guarantees (maybe)
§ Support one or more well-known application interfaces
+ SQL
+ Key/Value
+ Document
Scale to Meet Growth and Demand
§ Scale out on commodity hardware
§ Built-in database partitioning
+ Manual sharding and/or add-on solutions are brittle, require apps to do the “heavy lifting”, and can be an operational nightmare
§ Database must automatically implement the defined partitioning strategy
+ Application should “see” a single database instance
§ Database should encourage scalability best practices
+ For example, replicating reference data minimizes the need for multi-partition operations
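A minimal in-memory sketch of what “the application sees a single database instance” can mean, assuming a toy store with one partitioned table (orders, keyed on customer_id) and one replicated reference table (products). The class and method names are hypothetical, and the modulo routing is a stand-in for whatever key hashing a real database performs. It also shows why replicated reference data helps: a customer’s order lookup that needs product names stays on one partition because every partition holds its own copy of products.

```python
# Hypothetical in-memory sketch: the application calls one PartitionedStore
# object and never chooses a partition itself.


class PartitionedStore:
    def __init__(self, num_partitions: int) -> None:
        self.num_partitions = num_partitions
        # Partitioned table: each partition holds only its own customers' orders.
        self.orders = [[] for _ in range(num_partitions)]
        # Replicated reference table: every partition keeps a full copy.
        self.products = [{} for _ in range(num_partitions)]

    def _partition_for(self, customer_id: int) -> int:
        # Simple modulo stand-in for the database's hashing of the partition key.
        return customer_id % self.num_partitions

    def upsert_product(self, product_id: int, name: str) -> None:
        # Reference-data writes touch every copy (a multi-partition write);
        # that cost is acceptable because reference data changes rarely.
        for copy in self.products:
            copy[product_id] = name

    def insert_order(self, customer_id: int, order_id: int, product_id: int) -> int:
        # Routed to exactly one partition by the partition key.
        p = self._partition_for(customer_id)
        self.orders[p].append((customer_id, order_id, product_id))
        return p

    def product_names_for(self, customer_id: int) -> list:
        # Single-partition read: the lookup into products uses the local
        # replica, so no other partition is consulted.
        p = self._partition_for(customer_id)
        local_products = self.products[p]
        return [local_products[prod] for cust, _, prod in self.orders[p] if cust == customer_id]


store = PartitionedStore(num_partitions=3)
for product_id, name in [(1, "knife"), (2, "spoon"), (3, "fork")]:
    store.upsert_product(product_id, name)
store.insert_order(customer_id=3, order_id=303, product_id=2)
print(store.product_names_for(3))  # ['spoon']
```

The trade-off is deliberate: writes to the replicated table fan out to every partition, which is only a good deal when that table is small and rarely updated.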
A Look Inside Partitioning
§ table orders (partitioned): customer_id (partition key), order_id, product_id
§ table products (replicated): product_id, product_name
§ Example statements and how they are routed
+ select count(*) from orders where customer_id = 5: single-partition (the partition key is fixed)
+ select count(*) from orders where product_id = 3: multi-partition (product_id is not the partition key)
+ insert into orders (customer_id, order_id, product_id) values (3, 303, 2): single-partition (the partition key is fixed)
+ update products set product_name = 'spork' where product_id = 3: multi-partition (every copy of the replicated table must change)
[Figure: three partitions. Partition 1 holds the orders for customers 1 and 4, Partition 2 for customers 2 and 5, and Partition 3 for customers 3 and 6; every partition holds a full copy of products (1 knife, 2 spoon, 3 fork).]
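The routing rule behind the four example statements can be sketched as a small classifier. This is an illustration under stated assumptions, not VoltDB’s planner: it does not parse SQL (each statement is described by hand as a table, a read/write flag and the columns pinned to a single value), and the modulo function `partition_for` is hypothetical, chosen only so that customers 1 and 4 land on partition 1, 2 and 5 on partition 2, and 3 and 6 on partition 3.

```python
from dataclasses import dataclass, field

# Hypothetical schema metadata mirroring the slide: orders is partitioned on
# customer_id, products is replicated (None means "no partition key").
PARTITION_KEYS = {"orders": "customer_id", "products": None}
NUM_PARTITIONS = 3


@dataclass
class Statement:
    table: str
    is_write: bool
    # Columns pinned to one value, via WHERE col = v or INSERT ... VALUES.
    bound: dict = field(default_factory=dict)


def partition_for(customer_id: int) -> int:
    # Modulo stand-in chosen to reproduce the slide's layout:
    # customers 1 and 4 on partition 1, 2 and 5 on 2, 3 and 6 on 3.
    return (customer_id - 1) % NUM_PARTITIONS + 1


def route(stmt: Statement) -> tuple:
    """Return ("single", partition) or ("multi", None).

    For a read of a replicated table the partition is None: any copy will do.
    """
    key = PARTITION_KEYS[stmt.table]
    if key is None:
        # Replicated table: reads use one local copy; writes must update all copies.
        return ("multi", None) if stmt.is_write else ("single", None)
    if key in stmt.bound:
        return ("single", partition_for(stmt.bound[key]))
    # Partition key not pinned: every partition might hold matching rows.
    return ("multi", None)


EXAMPLES = [
    ("select count(*) from orders where customer_id = 5",
     Statement("orders", False, {"customer_id": 5})),
    ("select count(*) from orders where product_id = 3",
     Statement("orders", False, {"product_id": 3})),
    ("insert into orders (customer_id, order_id, product_id) values (3, 303, 2)",
     Statement("orders", True, {"customer_id": 3, "order_id": 303, "product_id": 2})),
    ("update products set product_name = 'spork' where product_id = 3",
     Statement("products", True, {"product_id": 3})),
]

for sql, stmt in EXAMPLES:
    print(route(stmt), sql)
```

Run as-is, the classifier routes the customer 5 count to partition 2 and the customer 3 insert to partition 3, and marks the other two statements multi-partition, matching the labels on the slide.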
Integrated Fault Tolerance
§ Database should transparently support built-in “Tandem-style” HA
+ Users should be able to easily increase or decrease fault-tolerance levels
§ Database should be easily and quickly recoverable in the event of severe hardware failures
§ Database should automatically detect and manage a variety of network-partition fault conditions
§ Downed nodes should be “rejoinable” without the need for service windows
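One way to make “adjustable fault-tolerance levels” concrete: keep k+1 copies of every partition on distinct nodes, so that the cluster survives any k node failures (VoltDB’s documentation calls this K-safety). The round-robin placement below is a hypothetical policy chosen for illustration, not VoltDB’s actual placement algorithm; the checker simply tries every combination of failed nodes.

```python
from itertools import combinations


def place_replicas(num_partitions: int, nodes: list, k: int) -> dict:
    """Assign k+1 copies of each partition to distinct nodes, round-robin.

    Hypothetical placement policy for illustration only.
    """
    if k + 1 > len(nodes):
        raise ValueError("need at least k+1 nodes to hold k+1 distinct copies")
    placement = {}
    slot = 0
    for p in range(num_partitions):
        owners = []
        # k+1 consecutive slots modulo len(nodes) are always distinct nodes.
        for _ in range(k + 1):
            owners.append(nodes[slot % len(nodes)])
            slot += 1
        placement[p] = owners
    return placement


def survives(placement: dict, failed: set) -> bool:
    # The cluster keeps serving only if every partition still has a live copy.
    return all(any(n not in failed for n in owners) for owners in placement.values())


def tolerates(placement: dict, nodes: list, failures: int) -> bool:
    # Exhaustively check every way `failures` nodes could go down.
    return all(survives(placement, set(down)) for down in combinations(nodes, failures))


nodes = ["A", "B", "C"]
placement = place_replicas(num_partitions=6, nodes=nodes, k=1)
print(tolerates(placement, nodes, failures=1))  # True
print(tolerates(placement, nodes, failures=2))  # False
```

Raising k buys tolerance for more simultaneous failures at the cost of proportionally more memory and write work per partition.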
Partition Detection & Recovery
§ Network fault protection
+ Detects the network-partition event
+ Determines which side of the fault to disable
+ Snapshots and disables orphaned node(s)
§ Live node rejoin
+ Allows “downed” nodes to rejoin the live cluster
+ Automatically re-syncs all node data
+ Coordinates transactions during re-sync
[Figure: a three-node cluster (Servers A, B and C), shown for both the network fault and the rejoin scenario.]
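“Determines which side of the fault to disable” needs a rule that lets at most one side keep running, otherwise both halves would accept conflicting writes (split brain). The sketch below uses a common majority rule with a hypothetical tie-breaker (on an exact 50/50 split, the side holding the lowest node ID survives); this is an assumption for illustration, not a description of VoltDB’s exact algorithm. It also assumes a clean split in which each side knows the pre-fault membership and sees exactly its own members.

```python
def decide(all_nodes: set, reachable: set) -> str:
    """Decide what this side of a network partition should do.

    all_nodes: cluster membership before the fault.
    reachable: nodes this side can still see, including itself.
    Assumed rule: a strict majority survives; on an exact 50/50 split the side
    holding the lowest node ID survives; any other side snapshots and stops.
    """
    if 2 * len(reachable) > len(all_nodes):
        return "continue"
    if 2 * len(reachable) == len(all_nodes) and min(all_nodes) in reachable:
        return "continue"
    return "snapshot-and-shutdown"


cluster = {"A", "B", "C"}
print(decide(cluster, {"A", "B"}))  # continue
print(decide(cluster, {"C"}))       # snapshot-and-shutdown
```

Because a strict majority can exist on only one side, and the tie-breaker picks exactly one side of an even split, the orphaned side always takes the snapshot-and-shutdown path, which is what later allows its nodes to rejoin and re-sync.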
Real-time Analytics
§ Database should support a wide variety of high-performance reads
+ High-frequency single-partition
+ Lower-frequency multi-partition
§ Common analytic queries should be optimized in the database
+ Multi-partition aggregations, limits, etc.
§ Database should accommodate a flexible range of relational data operations
+ Particularly relevant to structured data
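A multi-partition aggregation with a limit is usually executed as scatter-gather: each partition computes a partial result and a coordinator merges them. The sketch below assumes a hypothetical leaderboard of (player_id, score) rows spread over three partitions. Two details make the merge correct: each partition ships its local top n (any row in the global top n must be in its own partition’s top n), and the average is rebuilt from global sum and count rather than by averaging per-partition averages.

```python
import heapq

# Hypothetical per-partition data: (player_id, score) rows.
PARTITIONS = [
    [("p1", 50), ("p4", 95), ("p7", 20)],
    [("p2", 88), ("p5", 61)],
    [("p3", 99), ("p6", 10), ("p8", 75)],
]


def local_fragment(rows: list, n: int) -> dict:
    # Work done inside one partition: partial aggregates plus a local top-n.
    total = sum(score for _, score in rows)
    top = heapq.nlargest(n, rows, key=lambda r: r[1])
    return {"count": len(rows), "sum": total, "top": top}


def coordinate(partitions: list, n: int) -> dict:
    fragments = [local_fragment(rows, n) for rows in partitions]
    count = sum(f["count"] for f in fragments)
    total = sum(f["sum"] for f in fragments)
    # Merge every local top-n and re-apply the limit to get the global top-n.
    candidates = [row for f in fragments for row in f["top"]]
    top = heapq.nlargest(n, candidates, key=lambda r: r[1])
    # Global average from global sum and count, not an average of averages.
    return {"count": count, "avg": total / count, "top": top}


result = coordinate(PARTITIONS, n=3)
print(result)
# {'count': 8, 'avg': 62.25, 'top': [('p3', 99), ('p4', 95), ('p2', 88)]}
```

Each partition returns at most n rows plus two numbers, so the data shipped to the coordinator stays small no matter how large the partitions grow.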
Integration with Analytic Datastores
§ Database should offer high-performance, transactional export
§ Export should allow a wide variety of common data enrichment operations
+ Normalize and de-normalize
+ De-duplicate
+ Aggregate
§ Architecture should support loosely-coupled integrations
+ Impedance mismatches
+ Durability
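The enrichment operations listed above can be sketched as one pass over exported rows. Everything here is hypothetical: the row layout (txn_id, customer_id, product_id, amount), the reference table and the function name. The sketch de-duplicates on the transaction id (useful when the export channel delivers at least once and may resend rows), de-normalizes by replacing product ids with names, and aggregates spend per customer. Normalization, the reverse direction, is not shown.

```python
from collections import defaultdict

# Hypothetical exported rows: (txn_id, customer_id, product_id, amount).
# An at-least-once export channel may deliver the same row twice.
EXPORTED = [
    (1001, 3, 2, 4.50),
    (1002, 5, 3, 2.00),
    (1001, 3, 2, 4.50),  # redelivered duplicate
    (1003, 3, 1, 7.25),
]
PRODUCTS = {1: "knife", 2: "spoon", 3: "fork"}  # reference data


def enrich(rows: list, products: dict) -> tuple:
    seen = set()
    denormalized = []
    for txn_id, customer_id, product_id, amount in rows:
        if txn_id in seen:  # de-duplicate on the transaction id
            continue
        seen.add(txn_id)
        # De-normalize: carry the product name instead of the bare id.
        denormalized.append({"txn_id": txn_id, "customer_id": customer_id,
                             "product": products[product_id], "amount": amount})
    # Aggregate: total spend per customer.
    totals = defaultdict(float)
    for row in denormalized:
        totals[row["customer_id"]] += row["amount"]
    return denormalized, dict(totals)


rows, totals = enrich(EXPORTED, PRODUCTS)
print(len(rows), totals)  # 3 {3: 11.75, 5: 2.0}
```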
VoltDB Export Data Flow
[Figure: High Velocity Database Cluster]
§ Loosely-coupled, asynchronous
§ Queue must be durable
§ Bi-directional durability
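A minimal sketch of a loosely-coupled, acknowledged export queue, assuming one reading of “bi-directional durability”: the producer side never waits on the consumer, and the queue keeps each row until the downstream system acknowledges that it has stored it durably. The class is hypothetical and keeps rows in memory only; a real durable queue would also persist them to disk. Delivery is at least once, so the receiver should expect duplicates after a crash.

```python
from collections import deque
from itertools import islice


class AckedExportQueue:
    """Hypothetical in-memory stand-in for a durable export queue.

    A real implementation would persist rows to disk; this sketch only
    models the acknowledgment protocol.
    """

    def __init__(self) -> None:
        self._rows = deque()  # (offset, row) pairs not yet acknowledged downstream
        self._next_offset = 0

    def append(self, row) -> int:
        # Producer side: the database enqueues committed rows and moves on
        # without waiting for the consumer (asynchronous, loosely coupled).
        offset = self._next_offset
        self._rows.append((offset, row))
        self._next_offset += 1
        return offset

    def poll(self, max_rows: int) -> list:
        # Consumer side: rows are handed out but not removed yet.
        return list(islice(self._rows, max_rows))

    def ack(self, offset: int) -> None:
        # Downstream has durably stored everything up to `offset`,
        # so the queue may finally discard those rows.
        while self._rows and self._rows[0][0] <= offset:
            self._rows.popleft()

    def pending(self) -> int:
        return len(self._rows)


q = AckedExportQueue()
for i in range(5):
    q.append({"order_id": 300 + i})
first = q.poll(3)     # consumer crashes before acknowledging...
retry = q.poll(3)     # ...so the same rows are delivered again
q.ack(retry[-1][0])   # acknowledge offsets 0-2 after a durable write downstream
print(q.pending())    # 2
```

Because rows leave the queue only on acknowledgment, a slow or restarting consumer costs queue space rather than lost data, which is the impedance-mismatch buffering a loosely-coupled export path needs.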
Summary
§ Big Data infrastructures will usually require more than one engine
+ High-velocity engine for “fast” data
+ Analytic engine for “deep” data
§ Data characteristics will often determine which high-velocity engine to use
+ NewSQL is often well suited to structured data
+ NoSQL is often a good fit for unstructured data
§ Choose solutions that suit your needs and are designed for interoperability