A Modern Approach to Data Management Regan Inkster

- Slides: 62
A Modern Approach to Data Management
Regan Inkster, Senior Manager, Enterprise Cloud Strategists, East/Central Canada
regan.Inkster@oracle.com
Copyright © 2018, Oracle and/or its affiliates. All rights reserved.
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
Agenda
• Overall Solution Strategy
• High Level Strategy for Data Management
• Dealing with data at scale: Sharding
  – Facebook, Twitter approaches to sharding
• Data Management and Integration
  – If Data were Food
• Quick Demo
• eBay Kappa Architecture
#ThinkAutonomous
Oracle’s Autonomous Enterprise – Architectural Layers (#ThinkAutonomous)
[Diagram: layered architecture. Labels recovered from the slide, by layer:]
• Hyper-Scale, Identity Graph-powered Data Service Provider: DaaS
• SaaS: ERP, EPM, SCM, HCM, CX, IOT Apps, Industry Apps
• Security + Management: Identity Mgmt, Access Control & Governance; Monitoring & Mgmt; ID Governance; Access Mgmt; Security Broker-CASB; Log Analytics; API + App Monitoring & Perf
• Autonomous Analytics: Data Prep & Visualization; AI/ML; Stream; Graph; Advanced Analytics; DataScience.com; Stream Analytics; Graph Analytics; Analytics Cloud
• Autonomous App Extensions: Self-Service; Trust; Process; Blockchain Cloud; Bots; Mobile; Docs; Sites; Process Cloud/RPA
• Autonomous Integration (PaaS): Real-Time; Batch; Governance; Event Hub; IOT Cloud; Data Integration Platform; API Mgmt; Integration Cloud; SOA
• Autonomous App Dev: Cloud-Native Microservices; API 1st; Serverless; Containers; K8s; CI/CD Pipeline; Enterprise Native; Fn; Java Cloud; Low-Code; Visual Builder
• Autonomous Data Management: Relational/Small Data; Multi-Dimensional/OLAP; NoSQL; MySQL; Oracle DB; Oracle DW; Essbase; Sparkline; Event Hub; Big Data (Speed Layer, Batch Layer)
• Hybrid IaaS & Engineered Systems: Compute (Bare + VM); Flat & Fast Network; Enterprise-Grade Public Cloud; On Premises Cloud @ Customer; Engineered Systems (Exadata, Exalogic + ZFS + ZDLRA)
Cloud: Autonomous Database Cloud
[Diagram labels: Backup & Load, Replicate, Clone & Migrate, Query, Update]
• High Scale, Elastic, Fully Managed
• BEST OLTP PERFORMANCE, LOWEST Cost
• BEST 99.995% AVAILABILITY
• Automated PATCH, UPGRADE, TUNING while RUNNING
Cloud: Autonomous Data Warehouse Cloud
[Diagram labels: Numbers, Images, Sounds, Video, Text, Sensor; Process & Load; Query]
• High Scale, Elastic, Fully Managed
• Best Query PERFORMANCE, Lowest COST
• ELASTIC INSTANTLY
• Automated PATCH, UPGRADE, TUNING while RUNNING
Cloud: Autonomous Application Platform
[Diagram labels: Lift & Shift, Cloud Native, Container Native, Serverless]
• AUTONOMOUS, DEPLOY, ACCESS, MONITOR, TUNE, PROTECT
• One PLATFORM, ALL STYLES of Development
Cloud: DevOps with Autonomous Application Platform
[Diagram labels: Source Control; Build, Test, Publish, Deploy Workflow; Create & Operate Environment; IaaS, PaaS, Containers, Serverless]
• Continuous INTEGRATION
• Continuous DELIVERY
• Developer is OPERATOR
Cloud: Container & Serverless Cloud
[Diagram labels: Any Oracle Cloud Service; Developer Cloud/CI-CD; Scheduler or Workflow; HTTP, ATOM, REST; Events/Streaming; Object Store]
• REACTIVE Infrastructure
• TRANSPARENT Scaling, ZERO Provisioning
• PAY FOR USE
Cloud: Autonomous Analytics
[Diagram labels: Numbers, Images, Sounds, Video, Text, Sensor; Replicate, Process, Summarize, Visualize, Analyze]
• ANY TYPE OF DATA from ANY SOURCE for EVERYONE
Cloud: Autonomous Big Data Cloud
[Diagram labels: Numbers, Images, Sounds, Video, Text, Sensor; Cleanse, Extract Entities, Enrich, Blend; Data Catalog, Data Science, Artificial Intelligence, Machine Learning; Hadoop, Spark, Streaming (Kafka); Process; Metadata Catalog, Model Catalog; Scalable Elastic Infrastructure]
• BATCH & STREAMING
• POLYGLOT DATA Types
• RICH PROCESSING Services
Cloud: Autonomous Analytics Cloud
• ANY DATA Source
• MODEL
• VISUALIZE
• COLLABORATE
Cloud: Ambient Human Interface
[Diagram labels: Web, Mobile, Messaging, Voice, Camera, Augmented Reality; Intelligent Agent, BOT; REST API; Business Logic; Business Process]
• Multiple MODES & DEVICES
• Intelligent ASSISTANT
• BEYOND the SCREEN
Cloud: Autonomous Mobile Cloud
[Diagram labels: Web Applications, Voice, Messaging; BOTS & Conversational AI Platform; Push, Sync, Location, Analytics, Identity, Persistence; Mobile Core; API Service]
• Any APPLICATION
• Any CHANNEL
• AMBIENT INTERFACE
Cloud: Autonomous Integration Cloud
[Diagram labels: API; Process Automation; Monitoring; Orchestration; Management; Enterprise Service Bus; Connectors & API Catalog]
• Any APPLICATION
• HYBRID & MULTI-CLOUD
• Any STYLE of Integration
Cloud: Autonomous AI & ML Cloud
[Diagram labels: Jupyter, Pillow, Pandas, Scikit, OpenCV, Numpy; Machine Learning, AI Algorithms; Keras, Caffe, TensorFlow; Deep Learning; Automated Packaging, Deployment, Scaling; Elastic AI & ML Infrastructure: GPU, Flash Storage, 25 Gig Ethernet]
• Experiment with DATA SETS & AI/ML ALGORITHMS
Strategy – Oracle Data Management Platform
Integrated Cloud Services for modern solutions
[Diagram labels: Events & Observations (IoT Service); HTAP; Integration Service; OLTP; Distributed Variable Data; BI/DW; Big Data Prep & Discovery; BI Service; BIG DATA; Cloud Data Management Platform Services]
Strategy – Unified Data Management Platform Service
Special purpose access with standards based integration for any data type
[Diagram labels: {JSON}, XML; Special Purpose Developer Enabled access drivers; NoSQL Database service; Data Lake (Hadoop & Spark service); Autonomous Database (OLTP, Warehousing, Graph, Documents) service; Oracle SQL]
• Cloud Services for any Data Type and Workload
• Integrated, SQL Standards based access to any data source
High Velocity, High Volume Data Management
Facebook, PayPal, LinkedIn, China Telecom WeChat examples
The Autonomous Database must meet the availability needs of even the largest cloud-scale applications
Native Database Sharding
• RAC and Data Guard meet the needs of over 99% of applications while preserving application transparency
• Some world-scale applications want a farm of independent databases: database sharding
  – Avoid scalability or availability edge cases of a gigantic single system image database
  – Willing to modify applications to help route workloads to specific databases in the farm
• NoSQL databases made it easy to deploy sharding; Oracle Database Native Sharding makes it easy for full-featured relational databases
[Diagram: One Giant DB to Many Small DBs. Table partitions A through L of a single database are spread across Shard DB #1, Shard DB #2, and Shard DB #3.]
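The routing idea on this slide (the application helps direct each request to one database in the farm) can be sketched in a few lines. This is only an illustration under stated assumptions: the shard names are hypothetical, and plain modulo hashing is used for brevity. It is not a description of how Oracle Sharding places data.

```python
import hashlib

# Hypothetical shard names, standing in for Shard DB #1..#3 on the slide.
SHARDS = ["shard_db_1", "shard_db_2", "shard_db_3"]


def shard_for(sharding_key: str, shards=SHARDS) -> str:
    """Map a sharding key to exactly one shard using a stable hash."""
    digest = hashlib.sha256(sharding_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[bucket]


def route(rows, shards=SHARDS):
    """Group (key, payload) rows by the shard that owns each key."""
    placement = {name: [] for name in shards}
    for key, payload in rows:
        placement[shard_for(key, shards)].append((key, payload))
    return placement
```

With modulo hashing, adding a shard remaps most keys, which is one reason production systems tend to prefer schemes such as consistent hashing or directory-based placement.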
Sharding 2.0
• Sharding 1.0 (Oracle 12.2) is great for internet-style applications
  – Easy to achieve Google-style availability and scalability
• Sharding 2.0 (Oracle 18c) expands range of use cases
  1. RAC Sharding
  2. Multi-shard queries supported for all sharding methods
  3. User-defined sharding and swim-lanes
  4. Geo-replicated databases
Oracle Database Native Sharding: a Customer Perspective
John Kanagaraj, Sr. Member of Technical Staff, PayPal Core Data Platform
Oracle SHARDING
Vinoth Govindaraj, www.linkedin.com/in/vgovindaraj
Implementation of Oracle Sharding for China Telecom’s IoT WeChat Customer Service
Data Management Strategy
Data Integration Solution Use Cases
• Data Profiling and Cleansing
• Data Governance
• Move a Data Warehouse into the Cloud
• Active-Active Databases
• Multi-Region Cloud Availability (Oracle or Amazon)
• Data High Availability
• Database record level sharding
• DW/Mart Automation
• Marketing Analytics on Big Data
• Cloud Data Migrations
• Data Catalog and Policies
• Serving Layer for Raw Data Access
• Customer 360 from Salesforce or Sales Cloud
• Oracle Database Migrations into 12c
• Streaming Integration
• PeopleSoft or Workday into Fusion HCM
• 3 Kinds of Data Lineage for LoB and IT Users
• Prepared Data Subscriptions for LoB
• Streaming ETL for Data Pipelines
• Migrate from Amazon RDS to Oracle Cloud
BUT: Data Management is going through a major transformation…
After 20 yrs Reign… Hub-and-Spoke is now a Legacy (Oracle Open World 2015)
LEGACY: Classical Data Management: Hub and Spoke
• Invasive on Sources
• High Latency / SLA
• Mainly Relational Views
• Heavy IT process overhead
• Vendor-centric software
• ODS & ETL Hubs
• EDW/Mart Hubs
• MDM/RDM Hubs
• Static Data Lake Hubs
[Diagram labels: ERP; App DB; ETL; Operational Data Store; Staging; Prod; EDW; MDM Hub; Marts; Discovery; Enterprise BI; Departmental BI; Less Governed to More Governed]
NEXT-GEN: Streaming Databus/Kappa
• Low impact on Sources
• Low Latency (< 1 second)
• Variety of Data Formats
• More Agile DevOps processes
• Open source centric software
• Pub/Sub for Staging
• ETL in Pipelines
• Analytics/CEP in Stream
• Data is in Motion
[Diagram labels: ERP; WebApps; Mobile; App DB; NoSQL / APIs; GoldenGate; ETL; RESTful API for Producers and Subscribers (events are pushed); Raw Data Topics; Schema Event Topics; Prepared Data Topics; Master Data Topics; Data Pipeline (ETL); topic counts shown as 1,000’s, 100’s, and 10’s; Marts; EDW; Hadoop / Spark; Apps / Mobile; Departmental BI; Enterprise BI; Discovery; Less Governed to More Governed]
• Physical Layer for Events = MPP Messaging (e.g., Apache Kafka)
• Physical Layer for ETL Pipelines = MPP Streaming (e.g., Apache Spark Streaming)
Core Design Pattern: Kappa-style Databus
Our Vision is to enable the modern ‘Kappa style’ data architecture for Enterprise Strength solutions
• Raw Data Layer: common ingestion point for all enterprise data sources
• Speed Layer: data processing for streaming data and ETL data pipelines, in-memory
• Batch Layer: data processing for huge data volumes, that may span long time periods, using MPP
• Serving Layer: technologies for easy access to any data, at any latency
[Diagram labels: Business Data; Data Streams; Social and Logs; Enterprise Data; Highly Available Databases; Bulk Data; Raw Data Layer (Raw Events, Changed Data, Schema Events, Pub / Sub, REST APIs, Databus with topic modeling); Speed Layer (ETL Data Pipelines, Stream Analytics); Batch Layer (Data Staging or Archive, Data Discovery, ETL Offload); Serving Layer (Apps, Analytics, NoSQL, EDWs)]
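The layering above can be illustrated with a toy, in-memory databus: producers append raw events to a topic, a speed-layer pipeline reads from an offset and publishes prepared events to another topic, and reprocessing is just a replay from offset 0. This is a minimal sketch under stated assumptions. The class, function, and topic names are hypothetical, and a real deployment would use a distributed log such as Apache Kafka rather than Python lists.

```python
from collections import defaultdict


class Databus:
    """Append-only topics: a toy in-memory stand-in for a pub/sub log."""

    def __init__(self):
        self._topics = defaultdict(list)

    def produce(self, topic, event):
        """Append an event and return its offset within the topic."""
        self._topics[topic].append(event)
        return len(self._topics[topic]) - 1

    def read(self, topic, offset=0):
        """Return a copy of all events in the topic from the given offset."""
        return list(self._topics[topic][offset:])


def run_pipeline(bus, source, target, transform, offset=0):
    """Read `source` from `offset`, transform, publish to `target`.

    Events for which `transform` returns None are dropped.
    Returns the next offset to resume from.
    """
    events = bus.read(source, offset)
    for event in events:
        prepared = transform(event)
        if prepared is not None:
            bus.produce(target, prepared)
    return offset + len(events)


def to_prepared(event):
    """Example transform: cast the amount to float, drop unparseable rows."""
    try:
        return {"id": event["id"], "amount": float(event["amount"])}
    except (KeyError, ValueError):
        return None
```

Because the raw topic is never mutated, a changed transform can be applied by running the same pipeline again from offset 0 into a new target topic, which is the central idea of the Kappa style.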
Oracle Approach: Blend of Commercial + Open Source
Modern Architecture will be a ‘Hybrid Open-Source’ pattern:
• Open Source at the core of speed and batch processing engines for general purpose data workloads
• Enterprise Vendors for connecting to legacy systems, strong governance, and for highly optimized workloads
• Cloud Platforms for Dev-Test (at least), rapid prototyping and eventually all production workloads
• SaaS & Applications are key data “producers” and will remain largely proprietary and/or highly customized
[Diagram labels: Business Data; Data Streams; Social and Logs; Enterprise Data; Highly Available Databases; Bulk Data; Raw Data Layer (Pub / Sub, REST APIs); Speed Layer; Batch Layer; Serving Layer (Apps, Analytics, NoSQL, EDWs)]
Proof this is a Pattern: Many Instantiations
[Diagram labels: MapReduce | Pig | Hive | Spark; Hive; Cassandra | HBase; Storm | Spark | Apex | Flink; Kafka; Firehose; Dynamo; EMR; Redshift; Kinesis; DMS; Event Hubs; Data Factory; Stream Analytics; Table Storage; Data Lake; SQL Server; Pub/Sub; Dataflow; BigTable; Dataproc; BigQuery]
Best-of-Breed: Oracle Platform for Kappa-style Architecture
Oracle Software can help customers Accelerate & Reduce Risk around adoption:
• Ingest Data: with lower latency, greater reliability and from any database using Oracle GoldenGate
• ETL Pipelines for Data: automate pipeline creation with zero-footprint using Oracle Data Integrator
• Analyze Data In-Motion: run temporal, spatial and predictive algorithms with Oracle Stream Analytics
• Foundation Services: for hosting Kafka (Event Hub), Spark/Hadoop (Big Data Cloud) or Relational (Database)
• Govern the data flowing through Kappa architecture with Oracle Metadata Management
[Diagram labels: Business Data; Data Streams; Social and Logs; Enterprise Data; Highly Available Databases; Bulk Data; GoldenGate; Raw Data Layer (Pub / Sub, REST APIs, Event Hub); Speed Layer (Data Integrator, Stream Analytics); Batch Layer (Big Data); Serving Layer (Apps, Analytics, NoSQL, EDWs, Database); Metadata Management (for Data Governance)]
Load the Data Warehouse with DIPC
Fully Documented, Best Practice Architecture
[Diagram labels: Replicated Data, APIs & SQL; Bulk Data Flow]
Oracle Doc Library: https://docs.oracle.com/en/solutions/load-data-warehouse-for-business-analytics-oracle-cloud/
DIPC Integration to ADWC
• Real-time replication with GoldenGate in DIPC Remote Agent
• Best Practice bulk load patterns
  – Object Storage & dbms_cloud orchestration
• ETL and complex transformation with ODI in DIPC
• End-to-end data warehouse lifecycle
• Customer managed DIPC
Synchronize Data with DIPC – Lab & Demo
Real-Time data movement for Real-Time Analytics
[Diagram labels: On Premise; 3rd Party Cloud; Oracle Cloud; Initial / Bulk Load; Real-Time Replication]
Kappa at Massive Scale Using eBay’s Rheos
Presented by
Rheos: A Business Focused Real-Time Data Platform
✓ Fully managed real-time streaming data platform built with Oracle GoldenGate, Kafka, MirrorMaker and Storm
✓ Provide shared, curated, “private” streams and stream processing computation running on eBay cloud
✓ Dynamic stream endpoint discovery
✓ Standardized data format & stream catalog
✓ Secure stream access control
✓ Data movement across security zones over a TLS connection
✓ Comprehensive monitoring, alerting and remediation
Business Motivation
Value
✓ Data-Driven Recommendation
✓ Data-Driven Business Models
✓ Higher Conversion Rates
✓ Data Democratization
✓ Real-Time Seller Insights
Method
✓ Standardized event header with Avro and stream namespaces
✓ A schema registry to store metadata or schema definition for each stream
✓ Logical to physical stream mapping
✓ Lifecycle Management Service for node provisioning, replacement, administering remediation SOPs
✓ End-to-end monitoring and alerting at the stream, node and cluster level
✓ Stream access authentication via Identity Service
✓ Data mirroring to support use cases’ HA model as well as their data movement requirements
Rheos Services
Lifecycle Management Service - a cloud service that provisions and provides full lifecycle management for Zookeeper, Kafka, Storm, MirrorMaker, and [soon-to-be-available] Flink clusters.
Core Service - consists of these components: Kafka Proxy Server, Schema Registry, Metadata System, and Management.
Health Check Service - monitors the health of each asset (for example, a Kafka, Zookeeper, or MirrorMaker node) that is provisioned through the Lifecycle Management Service in these aspects: node state, cluster health, source & sink traffic, lag, etc.
Mirroring Service - provides high data availability and integrity by mirroring data from a source cluster to one or more target clusters. This service is also used to perform data movement across security zones.
Fun Facts: Rheos @ Scale
• 2500+ compute nodes
• 200+ streams
• > 200 billion events per day
• 1400+ stream consumers
• 840+ stream producers
Alignment with Oracle
• 232+ OGG producers
• 90+ Oracle tables
• > 28 billion change events per day
• second(s) latency from DB to Kafka
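To put the daily volumes in perspective, they can be converted to average per-second rates. The sketch below reads the slide's figures as more than 200 billion events per day overall and more than 28 billion change events per day. These are lower bounds and daily averages, so peak rates would be higher. The function name is illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(events_per_day: float) -> float:
    """Average events per second for a given daily volume."""
    return events_per_day / SECONDS_PER_DAY


# Lower bounds taken from the slide.
all_events = per_second(200e9)     # roughly 2.3 million events per second
change_events = per_second(28e9)   # roughly 324 thousand change events per second
```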
What’s Next?
✓ Upgrade to Oracle Integrated Extract based solution
✓ Provide Flink as Rheos’ stream processing framework
✓ Full lifecycle management for stream processing applications
✓ Run Flink and Kafka as Kubernetes cloud-natives
Sushi Principle of Data: “Data is Best Served Raw”
Sushi Principle of Data: “Data is Best Served Raw”
Many customers want to consume their data “raw” …they prefer it close to the source of truth
[Diagram labels: All Enterprise Data Sources (PolyStructured, Relational); <produce>; RAW DATA; SCHEMA EVENTS; <subscribe>]
State of the Art Data Ingestion: GoldenGate + Kappa
Fastest, most scalable and non-invasive way to ingest data into Apache. Benefits of low-impact on Sources, micro-second access to transactions and ability to replicate schema (DDL) events for downstream automation of change impact.
• From user update to serving layer in <1 second & no impact on Source
• GG used with 4 of top 5 largest Kafka clusters in the world…
[Diagram labels: User Updates; Application; DBMS Updates; GG Capture; Trail; Route; Pump; Deliver; GoldenGate for Big Data Supported Platforms; Raw Data Layer (GG Kafka); Speed Layer (Streaming Analytics); Batch Layer (HDFS); Serving Layer (REST Services, Data Marts); Apps Layer (Visualization Tools, Reporting Tools)]
De-Coupling of the Database: Downstream Processing
[Diagram: two deployment options, each labeled “GoldenGate: Eliminate overhead on DBMS Primary Site”. Labels: Business Apps; Primary; Secondary; REDO Transport; WAN; Mid-Tier for Log Mine; Remote DR Host; Remote Standby; Active DataGuard; AlwaysOn; Log Mine; Capture; Trail; Route; Pump; Deliver]
…But Sometimes Fully Prepared / Cooked is Needed
Prepared Data: ETL to “Cook” the Data for Consumption
Business-oriented consumers usually prefer that IT prepare the data for them
[Diagram labels: All Enterprise Data Sources (PolyStructured, Relational); <produce>; RAW DATA; SCHEMA EVENTS; ETL; PREPARED DATA; ETL; MASTER DATA; <subscribe>]
ETL Pipelines with Data Integrator
Data Integrator for Big Data
✓ Batch data ingestion with Sqoop, native loaders & Oozie
✓ Generate data transformations in Hive, Pig, Spark & Spark Streaming
✓ Extract data into external DBs, Files or Cloud
Compare to Informatica / Talend
✓ No ETL Engine: native E-LT execution, 1000’s of references
✓ Zero Footprint: does not require any Oracle install on cluster
✓ Loosely Coupled: design time means you can reuse mapping logic in many big data languages
[Diagram labels: Oracle Data Integrator; GG Capture; Trail; Route; Pump; Deliver; API/File; SQOOP + Native Loaders; Raw Data Layer; Speed Layer (Streaming Analytics); Batch Layer; Serving Layer (REST Services, Data Marts); Visualization Tools; Reporting Tools]
A Common Data Pattern: Access Data from REST/Kafka
[Diagram labels: All Enterprise Data Sources (PolyStructured, Relational); <produce>; RAW DATA; SCHEMA EVENTS; ETL; PREPARED DATA; ETL; MASTER DATA; <subscribe>; consumers: Data Science, Data Analysts, Business Analyst, DBAs]
Kappa Data Flow Pattern using Oracle Tech Stack
[Diagram labels: Data Producers (Entire Enterprise Database Estate, DBMS Updates); GoldenGate; Oracle Event Hub (1 Topic : 1 Table); Schema Events (DDL); Raw Data (LCR); Prepared Data Topics; Master Data; Analytic Data; API Management; Data Integrator (<generate> ETL); Streaming Analytics (CQL & Spatial); Data Consumers (<subscribe>): Applications, Stream ODS (Data Store), Dev / Test Env., Big Data Lake (Oracle Big Data), Data Warehouses]
If Raw Transaction Data Were Food…
VERY RAW … Prepared … Seared … Fully Cooked
(SYNTACTIC PREPARATION … RECORD LEVEL VALIDATION … AGGREGATE DATA)

Native Source
LCR$_ROW_RECORD type (LONG, LONGRAW, or LOB) and contains the following attributes: source_database_name, command_type, object_owner, object_name, tag, transaction_id, scn, old_values, new_values
Raw Records: LCRs from Databases; Log Events from Web/Mobile; App Events from SaaS or ERP Applications

Events as JSON
gg.handler.kafkahandler.Format (JSON)
{ "address": { "streetAddress": "21 2nd Street", "city": "New York", "state": "NY", "postalCode": "10021" }, "ssn": "646554567" }
Raw Data: sparsely populated raw records (e.g., changes only) but syntactically normalized in JSON format

Validated JSON Topics
Topic Policy = phoneNumber(!NULL)
{ "firstName": "John", "lastName": "Smith", "age": 25, "address": { "streetAddress": "21 2nd Street", "city": "New York", "state": "NY", "postalCode": "10021" }, "phoneNumber": [ { "type": "home", "number": "212 555-1234" }, { "type": "fax", "number": "646 555-4567" } ] }
Validated Data: fully populated record, filter bad records or light transformations, records still 1:1 with Source

Aggregate Topics
gg.handler.kafkahandler.Format (JSON)
{ "firstName": "Jonathan", "lastName": "Smith", "age": 25, "address": { "streetAddress": "101 Main Street", "city": "San Francisco", "state": "CA", "postalCode": "27519" }, "phoneNumber": [ { "type": "cell", "number": "212 555-1234" }, { "type": "fax", "number": "646 555-4567" } ] }
Master Data: Composite records have had ETL aggregations and may have merged attributes from many sources/topics or joins back to DBs
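The record-level validation step (the topic policy requiring a non-null phone number) can be sketched as a small filter between a raw topic and a validated topic. This is an illustrative sketch only: the function names and the handling of rejects are assumptions, and it does not represent how any Oracle product evaluates topic policies.

```python
import json


def passes_policy(record: dict, required=("phoneNumber",)) -> bool:
    """True when every required field is present and not None or empty."""
    return all(record.get(field) not in (None, "", []) for field in required)


def validate_topic(raw_messages, required=("phoneNumber",)):
    """Split raw JSON messages into validated records and rejected messages.

    Records stay 1:1 with the source; nothing is merged or aggregated here.
    """
    validated, rejected = [], []
    for message in raw_messages:
        try:
            record = json.loads(message)
        except json.JSONDecodeError:
            rejected.append(message)  # not syntactically valid JSON
            continue
        if isinstance(record, dict) and passes_policy(record, required):
            validated.append(record)
        else:
            rejected.append(message)
    return validated, rejected
```

Applied to the slide's examples, the sparse record carrying only an address and an ssn would be rejected, while the fully populated John Smith record would pass through unchanged.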
If Transaction Data Were Food…How Will You Eat Yours?
Where does Canadian Tire want to start?