Real-Time Big Data Use Cases John Leach CTO, Splice Machine
disruptive: Distributed Computing Across Commodity Servers. Before: PhDs; Data Expensive to Store. After: Java Programmers; Data Cheap to Store.
obstacles: MapReduce Java programmers are scarce and costly; limited use cases because of the batch nature of Hadoop.
Hadoop – not just for data scientists anymore: Moving Hadoop Beyond Batch Analytics to Power Real-Time Apps. From: Distributed File System; Java MapReduce Programs; Read-Only Batch Analytics. To: Real-Time Datastores (Distributed RDBMS); SQL-99 Queries; Real-Time Updates with ACID Transactions; Real-Time Apps and Analytics.
real-time Big Data use cases: Ad Technology; Digital Marketing; Fraud Detection; Internet of Things; Cyberthreat Security; Network Monitoring; Personalized Medicine.
case study: Rocket Fuel
case study: digital marketing. Diagram: Clients; Consumers; Cross-Channel Campaigns; Real-Time Personalization; Unica; Real-Time Actions; Oracle. Initial Results: Replaced Oracle RAC DB; Powers Unica app and Cognos; Scale-out with commodity servers; Made queries 3x-7x faster; Achieved over 10x price/performance improvement.
fraud detection. Road trip to Nevada: 1. Start in San Francisco. 2. Use credit card for gas in Sacramento. 3. Use credit card in Tahoe for lunch. 4. Credit card denied for gas because you left CA. 5. Spend 15 minutes on phone to get credit card reinstated. Benefits (Intelligent Fraud Detection): Correlate spending patterns based on real-time movements or trips; Move beyond simple rules; Prevent false positives; Catch fraud faster; Increase customer satisfaction.
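The road-trip scenario above can be sketched as a trip-aware check that replaces the simple "card left the home state" rule. This is a minimal illustration, not the method described in the talk: the `Swipe` type, the `flag_swipes` function, the 130 km/h speed limit, the coordinates, the timings, and the Reno stop are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass
class Swipe:
    city: str
    lat: float
    lon: float
    hour: float  # hours since the trip started (illustrative timings)


def km_between(a: Swipe, b: Swipe) -> float:
    """Great-circle (haversine) distance in kilometres."""
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = sin(dlat / 2) ** 2 + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def flag_swipes(swipes, max_kmh=130.0):
    """Return the cities of swipes that cannot be reached by road
    from the last accepted swipe in the time elapsed."""
    if not swipes:
        return []
    flagged = []
    last = swipes[0]  # the first swipe is treated as the trusted starting point
    for swipe in swipes[1:]:
        hours = swipe.hour - last.hour
        reachable = hours > 0 and km_between(last, swipe) / hours <= max_kmh
        if reachable:
            last = swipe  # the trip continues from here
        else:
            flagged.append(swipe.city)
    return flagged


trip = [
    Swipe("San Francisco", 37.77, -122.42, 0.0),
    Swipe("Sacramento", 38.58, -121.49, 2.0),
    Swipe("Tahoe", 38.94, -119.98, 4.5),
    Swipe("Reno", 39.53, -119.81, 6.0),  # hypothetical Nevada stop
]
stolen = Swipe("New York", 40.71, -74.01, 6.5)  # hypothetical fraudulent swipe

print(flag_swipes(trip))             # the road trip itself
print(flag_swipes(trip + [stolen]))  # road trip plus an implausible swipe
```

Under these assumptions the Nevada purchase is accepted because it continues a plausible drive, while a swipe on the other side of the country half an hour later is flagged. A production system would correlate many more signals than speed alone.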
IOT: network monitoring. Diagram: Cable Set-Top Boxes; Telemetry Data; Network Monitoring App; Remote Resets; Scaleout RDBMS. Benefits (Proactive Fault Response): Detect and isolate faults by trending real-time events; Perform remote resets; Increase customer satisfaction; Reduce costly calls and “truck rolls”.
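One way to picture "trending real-time events" is a sliding-window count of error events per set-top box, with a remote reset triggered once the count crosses a threshold. The sketch below is only an assumption about how such a check might look: the `FaultTrend` class, the window length, and the threshold are hypothetical and do not come from the talk.

```python
from collections import defaultdict, deque


class FaultTrend:
    """Tracks recent error timestamps per box in a sliding time window."""

    def __init__(self, window_s=300, threshold=3):
        self.window_s = window_s    # how far back to look, in seconds
        self.threshold = threshold  # errors in the window that trigger a reset
        self.errors = defaultdict(deque)

    def observe(self, box_id, ts, is_error):
        """Record one telemetry event; return True if the box should be reset."""
        recent = self.errors[box_id]
        if is_error:
            recent.append(ts)
        # Drop errors that have fallen out of the window.
        while recent and ts - recent[0] > self.window_s:
            recent.popleft()
        return len(recent) >= self.threshold


monitor = FaultTrend(window_s=60, threshold=3)
for ts in (0, 10, 20):
    needs_reset = monitor.observe("box-a", ts, is_error=True)
print(needs_reset)  # box-a has logged three errors within 60 seconds
```

Keeping the state per box means a single noisy device can be isolated and reset without affecting its neighbours.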
IOT: cyberthreat security. Diagram: Network Firewalls; Network Events; Security Monitoring App; Real-Time Responses; Scaleout RDBMS. Benefits (Real-Time Threat Response): Correlate millions of events/sec against 3-5 years of firewall history to identify “sleepers” waking up; Prevent loss of sensitive data such as credit cards; Reduce embarrassing public exposure.
IOT: personalized medicine. Diagram: Genomic Data; Electronic Medical Records (EMRs); Medical Monitoring Devices; Personalized Treatment App; Alerts; Doctors; Personalized Treatment Plans; Scaleout RDBMS. Benefits (Personalized Treatment Plans): Coordinate care with EMRs; Identify complications w/ genetic data; Drive real-time response w/ device data; Reduce hospital readmissions; Eliminate lost revenue under Obamacare.
scale up vs. scale-out: How do I scale? Scale Up: e.g., Exadata; Very expensive; Poor price/performance. Scale Out, NoSQL: e.g., MongoDB; Limited SQL; No transactions; May have weak consistency or no joins. Scale Out, NewSQL (Proprietary): e.g., NuoDB; Unproven scalability; No Hadoop. Scale Out, SQL-on-Hadoop, Analytic Engines: e.g., Impala; No transactions; No real-time updates; Can’t power a real-time app. Scale Out, SQL-on-Hadoop, Hadoop RDBMS: e.g., Splice Machine; Proven scale-out architecture; Transactional RDBMS; Power real-time apps.
Splice Machine: The only Hadoop RDBMS. Standard ANSI SQL; Horizontal Scale-Out; Real-Time Updates; ACID Transactions; Powers OLAP and OLTP; Seamless BI Integration.
proven building blocks: Apache Derby ≈ SQL Scale ≈
how we do it
distributed query processing: Parallelized computation across cluster; Moves computation to the data; Utilizes HBase co-processors; No MapReduce.
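The idea of moving computation to the data can be illustrated with a partial aggregation: each partition computes a small local result, and only those results travel to a coordinator that merges them. The sketch below uses Python threads as stand-ins for cluster nodes. It is not the HBase co-processor API and not Splice Machine's implementation; `local_partial` and `distributed_avg` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def local_partial(rows):
    """Runs next to one partition's data and returns a small (sum, count) pair."""
    return sum(row["amount"] for row in rows), len(rows)


def distributed_avg(partitions):
    """Average of 'amount' across all partitions without shipping the rows."""
    with ThreadPoolExecutor() as pool:
        # Each partition is aggregated in parallel where it lives.
        partials = list(pool.map(local_partial, partitions))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


# Three partitions standing in for regions on different servers.
partitions = [
    [{"amount": 10}, {"amount": 20}],
    [{"amount": 30}],
    [],
]
print(distributed_avg(partitions))  # → 20.0
```

Only two numbers per partition cross the network in this scheme, regardless of how many rows each partition holds, which is the property that makes the approach scale out.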
summary. Distributed Computing: Disruptive technology; Data now cheap to store. Real-Time Use Case Types: Port existing operational applications experiencing cost or scaling issues; Develop new applications that can leverage historical data in real time. Examples: Digital marketing; Ad Tech; Fraud Detection; Internet of Things.
Questions?