Oracle Big Data eSeminar Series
Copyright © 2011, Oracle and/or its affiliates. All rights reserved.

Oracle Big Data eSeminar Series
TITLE: DATE, TIME
• Oracle Big Data Appliance: April 10, 2012, 10 AM PST
• Oracle NoSQL Solutions: April 17, 2012, 10 AM PST
• Oracle Data Integrator Application Adapter for Hadoop: April 24, 2012, 10 AM PST
• Oracle Analytics for Big Data: May 3, 2012, 10 AM PST
• Integrating Big Data with the Enterprise: May 10, 2012, 10 AM PST

Announcing Oracle Big Data 3-Day Hands-On Technical Workshop
• When: May 8-10, 2012; registration details coming soon
• Where: Oracle Redwood Shores, CA, USA
• Agenda
  – Oracle Big Data Appliance technical architecture and its hardware and software features
  – Oracle NoSQL Database
  – Oracle R Connector for Hadoop and Oracle Analytics for Big Data
  – Oracle Loader for Hadoop and Oracle Direct Connector for HDFS

Oracle Big Data Platform

Oracle Integrated Solution Stack for Big Data
[Diagram: solution stack arranged across four stages]
• Stages: ACQUIRE, ORGANIZE, ANALYZE, DECIDE
• Components shown: HDFS, Oracle NoSQL Database, Enterprise Applications, Hadoop (MapReduce), Oracle Data Integrator, Oracle Big Data Connectors, Data Warehouse, In-Database Analytics, Analytic Applications

Oracle Big Data Platform
Marty Gubar, BI/DW Product Management

Agenda
• Use Cases
  – Demonstrate Sentiment Analysis of Tweets
• Introduce the Oracle Big Data Platform
• Review Steps to Building Application

Big Data Use Cases

Industry: Banking & Finance
Big Data Use Cases:
• Analysis of data sets across lines of business (loans, insurance, on-line banking, card products) for market assessment
• Risk analysis & revenue lift for new & existing products
• Analysis of stock portfolio trends & risk
Potential Benefits:
• Increased share of customer
• Increased customer loyalty
• Increased overall revenue
• Decreased financial risk

Industry: Healthcare
Big Data Use Cases:
• Analysis of unexpected health condition associations using electronic health records and visualization
Potential Benefits:
• Improved quality of care
• Reduced cost of care

Industry: High Tech / Manufacturing / Mobile Devices
Big Data Use Cases:
• Product failure analysis
• Patent records research
• Analysis of mobile device usage by location
Potential Benefits:
• Optimized manufacturing
• Lower cost of warranty claims
• Faster problem resolution

Industry: Retail
Big Data Use Cases:
• Location based targeted programs & promotions
• Social network buying analysis
Potential Benefits:
• Just-in-time promotions raising spend
• Understanding of customer sentiments

Sentiment Analysis Using Twitter Feeds
Demonstration

Social Media Sentiment Analysis
• Airlines actively monitoring and responding to Tweets
• Identify opportunities
  – “It’s really cold. I wish I were going to…”
  – Customer conversion
• Identify customer service issues
  – Keep customers happy. Nothing is private!
  – Avoid negative “buzz”
Millions of tweets – which ones are important?
http://online.wsj.com/article/SB10001424052702304173704575578321161564104.html

Oracle Big Data Platform

Oracle Big Data Platform
[Diagram: engineered systems mapped to the four stages]
• Stages: ACQUIRE, ORGANIZE, ANALYZE, DECIDE
• Systems: Big Data Appliance, Exadata, Exalytics, connected over InfiniBand
• Components shown: Hadoop, Oracle NoSQL Database, Open Source R, Applications, Oracle Data Integrator, Oracle Big Data Connectors, Oracle Database, Data Warehouse, In-Database Analytics, Oracle Advanced Analytics, Analytic Applications
• Delivery shown: Alerts, Dashboards, MD Analysis, Reports, Query, Web Services, BI Abstraction

Big Data in Action
[Diagram: cycle of ACQUIRE, ORGANIZE, ANALYZE, DECIDE]
ACQUIRE: Acquire all available data

Two Sets of Characteristics

Batch-Oriented:
• Process data to use
• Bulk storage
• Write once, read all

Real-Time:
• Deliver a service
• Fast access to specific record
• Read, write, delete, update

Best Choices

File System: Hadoop Distributed File System (HDFS)
• Parallel scanning
• No inherent structure
• High volume writes

Database: Oracle NoSQL Database
• Indexed storage
• Simple data structure
• High volume random reads and writes

Acquiring Twitter Data
Bulk Collect vs. Streaming

Fields returned:
• Search: Tweet, User Name, Time
• User/Lookup: User Name, # Followers, # Friends
• Streaming: Tweet, User Name, Time, # Followers, # Friends

• Bulk collect using Search & User/Lookup
  – Use search terms to acquire relevant tweets
  – Search does not return social importance metrics
  – Pair with User/Lookup for complete user details
• Continuous data collection using Streaming
  – Use search terms to acquire relevant tweets
  – Streaming returns all the key user metrics
• Save to HDFS (Oracle BDA Hadoop File System)
  – Hadoop delivers FileSystem API over HDFS
  – Use standard Java file I/O classes and methods for reading/writing to that file system

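The pairing of Search results with User/Lookup details can be sketched as a simple join on user name. This is an illustration only: the records are made-up in-memory samples, and the field names (`user_name`, `followers`, `friends`) and the function `pair_with_user_details` are assumptions, not the Twitter API schema or the code used in the demonstration.

```python
# Hypothetical sample records; field names are illustrative assumptions.
search_results = [
    {"user_name": "alice", "time": "2012-04-10T10:00:00",
     "tweet": "It's really cold. I wish I were going somewhere warm"},
    {"user_name": "carol", "time": "2012-04-10T10:05:00",
     "tweet": "Flight delayed again"},
]
user_lookup = [
    {"user_name": "alice", "followers": 1200, "friends": 300},
]


def pair_with_user_details(tweets, users):
    """Attach follower and friend counts to each tweet, matching on user name."""
    by_name = {u["user_name"]: u for u in users}
    enriched = []
    for t in tweets:
        details = by_name.get(t["user_name"], {})
        enriched.append({
            **t,
            # None marks a user that the lookup did not return.
            "followers": details.get("followers"),
            "friends": details.get("friends"),
        })
    return enriched


enriched = pair_with_user_details(search_results, user_lookup)
```

With Streaming, this join would be unnecessary because each record already carries the follower and friend counts.
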
Tweets Stored in HDFS
Tweet stream captured to XML file in HDFS

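Reading such an XML capture back might look like the sketch below. The slide does not show the file layout, so the element names (`tweets`, `tweet`, `user_name`, `time`, `followers`, `friends`, `text`) and the sample content are assumptions chosen for illustration, and the XML is parsed from a string rather than from HDFS.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout; the real capture format is not shown in the slides.
SAMPLE_XML = """<tweets>
  <tweet>
    <user_name>alice</user_name>
    <time>2012-04-10T10:00:00</time>
    <followers>1200</followers>
    <friends>300</friends>
    <text>It is really cold. I wish I were going somewhere warm.</text>
  </tweet>
</tweets>"""


def parse_tweets(xml_text):
    """Turn each <tweet> element into a plain dictionary."""
    root = ET.fromstring(xml_text)
    return [
        {
            "user_name": t.findtext("user_name"),
            "time": t.findtext("time"),
            "followers": int(t.findtext("followers")),
            "friends": int(t.findtext("friends")),
            "text": t.findtext("text"),
        }
        for t in root.findall("tweet")
    ]


tweets = parse_tweets(SAMPLE_XML)
```
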
Big Data in Action
[Diagram: cycle of ACQUIRE, ORGANIZE, ANALYZE, DECIDE]
ORGANIZE: Organize and distill big data using massive parallelism

Organize
Derive Meaning from Source Data
1. Map: Filters and interprets the source – producing key/value pairs
2. Reduce: Summarizes the sorted map results – producing the final key/value output

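The two steps can be sketched in plain Python, without Hadoop, to show the flow of key/value pairs. Everything here is an assumption made for illustration: the word lists, the sample tweets, and the function names `map_tweet`, `reduce_counts`, and `run_job` are hypothetical, and the keyword match is a stand-in for whatever sentiment logic the demonstration actually used.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical keyword lists standing in for real sentiment scoring.
POSITIVE = {"love", "great", "happy"}
NEGATIVE = {"delayed", "lost", "angry"}


def map_tweet(tweet):
    """Map: filter and interpret one tweet, emitting (sentiment, 1) pairs."""
    words = set(tweet.lower().split())
    if words & POSITIVE:
        yield ("positive", 1)
    if words & NEGATIVE:
        yield ("negative", 1)


def reduce_counts(key, values):
    """Reduce: summarize all values that share a key."""
    return (key, sum(values))


def run_job(tweets):
    """Run map, then sort by key (the shuffle/sort step), then reduce."""
    mapped = [pair for tweet in tweets for pair in map_tweet(tweet)]
    mapped.sort(key=itemgetter(0))
    return dict(
        reduce_counts(key, (value for _, value in group))
        for key, group in groupby(mapped, key=itemgetter(0))
    )


sample_tweets = [
    "Love the great service",
    "Flight delayed again",
    "My bag was lost and I am angry",
    "Boarding now",
]
totals = run_job(sample_tweets)
```

On a cluster, the map calls run in parallel across nodes and the framework performs the sort and grouping between the two steps; the single-process version above only mirrors the data flow.
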
Organize
Load Results into Oracle Database at 12 TB/hour
• Oracle Loader for Hadoop (OLH)
  – A MapReduce utility to optimize data loading from HDFS into Oracle Database
• Oracle Direct Connector for HDFS
  – Access data directly in HDFS using external tables
• ODI Application Adapter for Hadoop
  – ODI Knowledge Modules optimized for Hive and OLH

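As a rough sense of scale for the 12 TB/hour figure, the sketch below estimates load time for a given data volume. It assumes the quoted rate is sustained for the whole load, which the slide does not claim; the function name `load_hours` and the 36 TB example volume are hypothetical.

```python
QUOTED_RATE_TB_PER_HOUR = 12.0  # figure quoted on the slide


def load_hours(volume_tb, rate_tb_per_hour=QUOTED_RATE_TB_PER_HOUR):
    """Estimate hours to load volume_tb, assuming a constant load rate."""
    if rate_tb_per_hour <= 0:
        raise ValueError("rate must be positive")
    return volume_tb / rate_tb_per_hour


hours_for_36_tb = load_hours(36)  # 36 TB is an arbitrary example volume
```
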
Oracle Loader for Hadoop
Use the Cluster
[Diagram: MAP, SHUFFLE/SORT, and REDUCE stages feeding ORACLE LOADER FOR HADOOP]
• Last stage in MapReduce workflow
• Partitioned and nonpartitioned tables
• Online and offline loads

Oracle Direct Connector for HDFS
Direct Access from Oracle Database
[Diagram: SQL Query, External Table, DCH, HDFS Client, InfiniBand, HDFS]
• SQL access to HDFS
• External table view
• Data query or import

Big Data in Action
[Diagram: cycle of ACQUIRE, ORGANIZE, ANALYZE, DECIDE]
ANALYZE: Analyze all your data, at once

Oracle In-Database Analytics Platform
[Diagram: layered platform]
• Analytics: Spatial Analytics, Oracle R Enterprise, Oracle Data Mining, SQL Analytics, Text and Search
• Parallel Processing Engine
• Data Layer: XML, Relational, OLAP, Spatial, RDF, Media

R Statistical Programming Language
• Open source language and environment
• Used for statistical computing and graphics
• Strength in easily producing publication-quality graphs
• Highly extensible

Why R Wasn’t Ready for the Enterprise
• Small data models only, stored and run on the user’s laptop

Oracle R Enterprise Approach
• Models run in-database
• Processes large data sets
• Uses the power of Oracle Database 11g and Exadata
• Same code, much faster

Oracle R Hadoop Connector
Native R Access to Hadoop
[Diagram: Client Host (R Engine, ORE, ORHC) connected to Oracle Big Data Appliance (Hadoop Cluster Software, MapReduce Nodes, HDFS)]
• Native R MapReduce
• Native R HDFS access

Oracle Exalytics In-Memory Machine
Speed of Thought
• Interactive Analysis
• Free Exploration
• Dense Visualizations
• Fully Mobile

Big Data Technologies
• Time to Build?
• Required Optimizations?
• Cost and Difficulty Maintaining?

Oracle Big Data Appliance Hardware
Full Rack Configuration Only
• 18 Sun X4270 M2 Servers per BDA
  – 864 GB memory
  – 216 cores
  – 648 TB storage
• 40 Gb/s InfiniBand Fabric
  – Inter-rack Connectivity
  – Inter-node Connectivity
• 10 Gb/s Ethernet Connectivity
  – Data center connectivity

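The full-rack totals divide evenly across the 18 servers, which gives a quick check on the per-server configuration. The per-server figures below are derived by simple division and assume the resources are spread uniformly; the slide states only the rack totals.

```python
SERVERS = 18

# Full-rack totals as stated on the slide.
rack_totals = {
    "memory_gb": 864,
    "cores": 216,
    "storage_tb": 648,
}

# Derived per-server values, assuming an even split across servers.
per_server = {}
for name, total in rack_totals.items():
    share, remainder = divmod(total, SERVERS)
    if remainder:
        raise ValueError(f"{name} does not divide evenly across servers")
    per_server[name] = share
```
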
Horizontal Scale Out Model
[Diagram: racks linked through InfiniBand top-of-rack switches]
• Scale out by connecting racks to each other using InfiniBand
• Same way to connect Exadata machines in your configuration

Cloudera Distribution Including Apache Hadoop
• Fast evolution in critical features
  – Built by the Hadoop experts in the community
  – Practical instead of esoteric
  – Focus on what is needed for large clusters
• Proven at very large scale
  – In production at all the large consumers of Hadoop
  – Extremely stable in those environments
• Managed and Tested by Cloudera
  – Managed Open Source components
  – Contains a rich management GUI tool

Cloudera CDH3 Distribution Details
• Apache Hadoop
• Apache Hive
• Apache Pig
• Apache HBase
• Apache Zookeeper
• Apache Flume
• Apache Sqoop
• Apache Mahout
• Apache Whirr
• Apache Oozie
• Fuse-DFS
• Hue
Plus Cloudera Manager

Big Data Platform Summary
Big Data for the Enterprise
• Optimized and Complete
  – Everything you need to store and integrate your lower information density data
• Integrated with Oracle Exadata
• Analyze all your data
• Easy to Deploy
  – Risk Free, Quick Installation and Setup
• Single Vendor Support
  – Full Oracle support for the entire system and software set