Machine Learning
Oracle Hrvatska
Igor Rajić, Business Development Manager, Fusion Middleware & Business Analytics
Copyright © 2016 Oracle and/or its affiliates. All rights reserved.
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
Wanted: a "Data Scientist"!
• Harvard Business Review: "Data Scientist: The Sexiest Job of the 21st Century"
• Glassdoor: the best job in the USA, 2016
• Deloitte, IDC, Gartner, etc.
• Machine Learning?
Advanced analytics, predictive analytics, data mining, machine learning
Machine Learning
• Machine learning is a type of artificial intelligence (AI) that provides computers with the ability to learn without being explicitly programmed.
• Machine learning focuses on the development of computer programs that can teach themselves to grow and change when exposed to new data.
Machine Learning in the commercial sector
• Targeting the right customer with the right offer
• How is a customer likely to respond to an offer?
• Finding the most profitable growth opportunities
• Finding and preventing customer churn
• Maximizing cross-business impact
• Security and suspicious activity detection
• Understanding sentiment in customer conversations
• Reducing medical errors & improving quality of health
• Understanding influencers in social networks
• Spam filtering
ML in Education
• Content analytics that organize and optimize content modules
• Learning analytics that track student knowledge and recommend next steps:
  – Adaptive learning systems
  – Game-based learning
• Dynamic scheduling that matches students who need help with teachers who have time
• Grading systems that assess and score student responses to assessments and computer assignments at large scale, either automatically or via peer grading
Preparing the data, "Learning", Model selection, Evaluation, Deployment, Prediction
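The stages named on this slide can be walked through end to end on a toy problem. The sketch below is only an illustration under stated assumptions, not Oracle tooling: the house sizes and prices are made-up numbers, the helper names (fit_linear, fit_mean, predict, mse) are hypothetical, and "deployment" is reduced to serializing the model parameters.

```python
import json

def fit_linear(xs, ys):
    """'Learning': ordinary least squares for y = a + b*x (closed form)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return {"a": my - b * mx, "b": b}

def fit_mean(xs, ys):
    """Baseline candidate: always predict the average price."""
    return {"a": sum(ys) / len(ys), "b": 0.0}

def predict(model, x):
    return model["a"] + model["b"] * x

def mse(model, xs, ys):
    """Mean squared error of a model on (xs, ys)."""
    return sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Preparing the data: made-up (size in m2, price in thousands) pairs,
# split into a training half and a validation half.
data = [(50, 100), (60, 118), (70, 142), (80, 161),
        (90, 178), (100, 202), (110, 219), (120, 241)]
train_x, train_y = zip(*data[::2])
valid_x, valid_y = zip(*data[1::2])

# "Learning": fit each candidate on the training data only.
candidates = {"mean": fit_mean(train_x, train_y),
              "linear": fit_linear(train_x, train_y)}

# Model selection and evaluation: keep the candidate with the
# lowest error on the validation data.
scores = {name: mse(m, valid_x, valid_y) for name, m in candidates.items()}
best_name = min(scores, key=scores.get)

# Deployment: here just a JSON round trip of the parameters; a real
# system would publish them to wherever predictions are served.
deployed = json.loads(json.dumps(candidates[best_name]))

# Prediction: score a new, unseen house of 95 m2.
print(best_name, predict(deployed, 95))
```

Validation data is kept apart from training data so that the selection step measures how a candidate behaves on records it has not seen.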
Predicting house prices – model 1: linear
Predicting house prices – model 2: quadratic
Predicting house prices – model 3: high-order polynomial
Predicting house prices – overfitting
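The overfitting effect from these slides can be reproduced with a few numbers. The sketch below is an illustration only: the data points are made up (roughly y = 2x with alternating noise), and the "high-order polynomial" is stood in for by a Lagrange interpolant, which passes through every training point exactly. The function names are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares straight line; returns a prediction function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return lambda x: a + b * x

def fit_interpolant(xs, ys):
    """Degree len(xs)-1 polynomial through all points (Lagrange form)."""
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return p

def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Made-up training data: y = 2x plus alternating +1 / -1 noise.
train_x = [0, 1, 2, 3, 4, 5]
train_y = [1, 1, 5, 5, 9, 9]
# Held-out points the models never saw (noise-free y = 2x).
test_x = [0.5, 6]
test_y = [1, 12]

line = fit_line(train_x, train_y)
wiggly = fit_interpolant(train_x, train_y)

for name, model in [("linear", line), ("high-order", wiggly)]:
    print(name,
          "train MSE:", mse(model, train_x, train_y),
          "held-out MSE:", mse(model, test_x, test_y))
```

The high-order model has zero training error because it also fits the noise, and that is what makes its held-out predictions much worse than those of the straight line.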
Multiple features (input variables)
Other possible features (a few hundred is typical):
• Number of bedrooms
• Plot size
• Quality of neighborhood
• Year of construction
• …
Where does complexity come from?
• Model choice
• Number of "features" (input variables): F
  – Text: every word is a feature
  – Genetics: 20,000 features
• Number of records of data: R
• In general it is important that R >>> F
  – Big Data
• A high volume of data helps to avoid overfitting via cross-validation and other techniques
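Cross-validation, mentioned in the last bullet, estimates how a model behaves on unseen records by repeatedly holding part of the data out. The following is a minimal sketch of k-fold cross-validation, assuming a single feature (F = 1), ten made-up records (R = 10), and hypothetical helper names; it is not a description of any particular product's implementation.

```python
def fit_linear(xs, ys):
    """Least-squares line; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

def fit_mean(xs, ys):
    """Baseline that ignores the feature: (mean, zero slope)."""
    return sum(ys) / len(ys), 0.0

def k_fold_mse(xs, ys, fit, k=5):
    """Average squared error over k folds; record i goes to fold i % k."""
    n = len(xs)
    total = 0.0
    for fold in range(k):
        held_out = [i for i in range(n) if i % k == fold]
        kept = [i for i in range(n) if i % k != fold]
        # Fit only on the kept records, then score the held-out ones.
        a, b = fit([xs[i] for i in kept], [ys[i] for i in kept])
        total += sum((a + b * xs[i] - ys[i]) ** 2 for i in held_out)
    return total / n

# Made-up records: y = 3x + 1 with a small alternating disturbance.
xs = list(range(10))
ys = [3 * x + 1 + (0.5 if x % 2 == 0 else -0.5) for x in xs]

cv_linear = k_fold_mse(xs, ys, fit_linear)
cv_mean = k_fold_mse(xs, ys, fit_mean)
print("linear:", cv_linear, "mean baseline:", cv_mean)
```

Every record is used for evaluation exactly once and never by a model that was trained on it, which is why more records (larger R) make the estimate more trustworthy.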
More Data Variety: Better Predictive Models
• Increasing sources of relevant data can boost model accuracy
Model with "Big Data" and hundreds to thousands of input variables, including:
• Demographic data
• Purchase POS transactional data
• "Unstructured data", text & comments
• Spatial location data
• Long term vs. recent historical behavior
• Web visits
• Sensor data
• etc.
[Figure: ROC chart, true positive rate (0% to 100%) against false positive rate (0% to 100%); curves: naïve guess or random, model with 20 variables, model with 75 variables, model with 250 variables, model with "Big Data"]
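The chart on this slide plots true positive rate against false positive rate (an ROC curve). As a rough sketch of how such a curve and the area under it can be computed, assuming made-up labels and scores, distinct score values (ties are not handled), and hypothetical function names:

```python
def roc_points(labels, scores):
    """(false positive rate, true positive rate) points obtained by
    lowering the decision threshold one record at a time."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, label in ranked:
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the curve by the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Made-up example: 1 = responded to the offer, 0 = did not.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

curve = roc_points(labels, scores)
model_auc = auc(curve)
random_auc = auc([(0.0, 0.0), (1.0, 1.0)])  # the diagonal "random" line
print(curve)
print(model_auc, random_auc)
```

A curve that bows further toward the top-left corner has a larger area, which is how the models with more input variables on the slide compare against the random diagonal.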
Other models
• Identify the most important factor (Attribute Importance)
• Predict customer behavior (Classification)
• Predict or estimate a value (Regression)
• Find profiles of targeted people or items (Decision Trees)
• Segment a population (Clustering)
• Find fraudulent or "rare events" (Anomaly Detection)
• Determine co-occurring items in "baskets" (Associations)
Unsupervised machine learning – clustering
Find similar!
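One common way to "find similar" records without labels is k-means clustering. The slide does not name an algorithm, so k-means is an assumption here, as are the made-up 2-D points, the hand-picked starting centroids, and the function name.

```python
def kmeans(points, centroids, max_iter=100):
    """Plain k-means: assign each point to its nearest centroid, move
    each centroid to the mean of its members, repeat until stable."""
    centroids = [tuple(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)),
                key=lambda c: sum((p - q) ** 2
                                  for p, q in zip(point, centroids[c])))
            for point in points
        ]
        if new_labels == labels:
            break  # assignments no longer change
        labels = new_labels
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return labels, centroids

# Made-up data: two visibly separate groups of customers.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
labels, centroids = kmeans(points, centroids=[points[0], points[-1]])
print(labels)
print(centroids)
```

The result depends on the starting centroids; practical implementations usually try several starts and keep the best one.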
As a conclusion (so far)
• Machine learning in general is a complex subject area that requires highly skilled people (data scientists) with a combination of IT and mathematical/statistical skills
• The majority of technologies and implementations are batch (slow) or, more recently, near real-time, especially for the "learning" process of ML
• Can we simplify it?
• Can we do it in real time?
Data Miner Survey 2016 by Rexer Analytics
While 6 out of 10 data miners report that data is available for analysis within days of capture, deploying the models takes substantially longer. For 60% of the respondents, deployment time ranges between 3 weeks and 1 year.
Everyone forgets about deployment, but it is the most important component!
Traditional vs. Oracle Machine Learning/Predictive Analytics
• Traditional: "Move the data"
• Oracle: "Don't move the data!"
Traditional vs. Oracle Machine Learning/Predictive Analytics
• Traditional: "Move the data"
• Oracle: "Move the algorithms"
Simpler, Smarter Data Management + Analytics / Machine Learning Architecture
Zagrebačka Bank (biggest bank in Croatia) Increases Cash Loans by 15% Within 18 Months of Deployment
Objectives
§ Needed to speed up the entire advanced analytics process; data prep was taking 3 days; model building 24 hours
§ Faster time to "actionable analytics" for Credit Risk Modeling and Targeted Customer Campaigns
Solution
§ Zaba migrated from SAS to the Oracle Advanced Analytics platform for statistical modeling and predictive analytics
§ Increased prediction performance by leveraging the security, reliability, performance, and scalability of Oracle Database and Oracle Advanced Analytics for predictive analytics: running data preparation, transformation, model building, and model scoring within the database
"With Oracle Advanced Analytics we execute computations on thousands of attributes in parallel, impossible with open-source R. Analyzing in Oracle Database without moving data increases our agility. Oracle Advanced Analytics enables us to make quality decisions on time, increasing our cash loans business 15%." – Jadranka Novoselovic, Head of BI Dev., Zagrebačka Bank
"We chose Oracle because our entire data modeling process runs on the same machine with the highest performance and level of integration. With Oracle Database we simply switched on the Oracle Advanced Analytics option and needed no new tools," – Sinisa Behin, ICT coordinator at BI Dev., Zagrebačka Bank
Source: ZabaBank Oracle Customer Snapshot on OTN
Oracle RTD: Self-Learning RTD Complements Traditional Data Mining
Traditional learning process: models lag by weeks or months
• Source Databases, Analytical Mart, Data Mining Tools, Scores and Lists, Operational Applications
• Feedback: days or weeks
Continuous self-learning process: models are updated in real time
• Operational Applications and Self-Learning Analytics exchange events and decisions
• Input from external models and lists
• Feedback: immediate
Advantages:
• Automatic model creation
• Quick to react when behavior changes
• Both learning and scoring in real time
• Allows broader scope of analysis
• Simple to implement and run
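The contrast on this slide, models refreshed by a periodic batch job versus models updated on every feedback event, can be illustrated with a very small online learner. This is a toy sketch and not Oracle RTD's algorithm or API: the class, the offer names, and the smoothed acceptance-rate estimate are all assumptions made for illustration.

```python
class OnlineOfferModel:
    """Keeps a running acceptance estimate per offer and updates it
    as soon as feedback for a decision arrives."""

    def __init__(self, offers):
        self.shown = {offer: 0 for offer in offers}
        self.accepted = {offer: 0 for offer in offers}

    def rate(self, offer):
        # Smoothed estimate so offers never shown start at 0.5.
        return (self.accepted[offer] + 1) / (self.shown[offer] + 2)

    def decide(self):
        """Scoring: pick the offer with the best current estimate."""
        return max(self.shown, key=self.rate)

    def feedback(self, offer, was_accepted):
        """Learning: fold one event into the model immediately."""
        self.shown[offer] += 1
        self.accepted[offer] += int(was_accepted)

# Hypothetical offers and events.
model = OnlineOfferModel(["cash_loan", "credit_card", "savings"])
for offer, ok in [("cash_loan", True), ("credit_card", False),
                  ("cash_loan", True), ("credit_card", True),
                  ("savings", False)]:
    model.feedback(offer, ok)
first_choice = model.decide()

# Customer behavior changes: cash loans start being declined.
for _ in range(4):
    model.feedback("cash_loan", False)
second_choice = model.decide()
print(first_choice, second_choice)
```

Because every event changes the estimates at once, the next decision already reflects the change in behavior; there is no waiting for a retraining cycle.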
To a more dynamic approach: real-time & fully automated
• Define some basic marketing rules
• Supply all relevant data
• Interact with the individual customer (e.g. website, app, email, call, or mail)
• RTD is informed of the success or failure of each offer
• RTD self-learns based on success, all relevant data, and rules, and automatically adjusts future offers accordingly
[Figure legend: manual activity; automated (RTD)]
Oracle Cloud Day, May 12th 2016
Machine Learning in the financial industry: Oracle RTD implementation in PBZ
Maja Salamon, Project Manager, Privredna banka Zagreb, 12 May 2016
Businesses have a siloed view of channels
Buying Journey Has Become More Complex
[Figure: buying journey diagram; labels include Research, Shop, Buy, Pickup, Service, Follow-On Purchase, online search, browsing and writing reviews, chat, checking with friends, tweets and likes, email offers, in-store product inspection and pickup, delivery status checks, and product-related calls, across Web, Mobile, Tablet, Social, Store, and Call Center]
Siloed Channels Create Inconsistency
[Figure: separate stacks of Pricing, Promotions, Order Capture, Logic, and Data across the Web, Mobile, Tablet, Social, Store, and Call Center channels]
Betfair boosts revenue with Oracle analytics
James Knight, web capabilities product manager at Betfair, told delegates yesterday at Gartner's Business Intelligence Summit that an initial list of 23 potential suppliers was eventually cut down to one. "We went through a rigorous selection process and found 23 potential suppliers. We cut this to a shortlist of six, who performed technology presentations for us, after which we cut to three suppliers." After visiting firms around the US and Europe, Betfair eventually decided to select Oracle's RTD. "We've seen a 400 per cent uplift in click rate in the target group which is driven by RTD," explained Knight.
http://www.computing.co.uk/ctg/news/2144534/betfair-boosts-revenue-oracle-analytics
Oracle’s Advanced Analytics: Fastest Way to Deliver Scalable Enterprise-wide Predictive Analytics
Major Benefits
§ Data remains in Database & Hadoop
§ Model building and scoring occur in-database
§ Use R packages with data-parallel invocations
§ Leverage investment in Oracle IT
§ Eliminate data duplication
§ Eliminate separate analytical servers
§ Deliver enterprise-wide applications
§ GUI for Predictive Analytics & code gen
§ R interface leverages database as HPC engine
[Figure: time comparison. Traditional analytics (hours, days or weeks): data extraction, data prep & transformation, data mining model building, data mining model "scoring", data prep & transformation, data import. Oracle Advanced Analytics (secs, mins or hours): data preparation, model building, model "scoring" with embedded data prep. The difference is marked as savings.]
“We increased our revenue by 40%” – Mark Sucrese, Marketing IT Director, Dell
Customer & Brand 360
Powered by Oracle Real-Time Decisions, BI, Siebel, Eloqua
Improve revenues and operations through personalized predictive analytics
• $132M in net new revenue, FY 2012
• 40% reduction in cost of dispatch
"The insights are amazing. You can really see customers' buying patterns and interests, how they change over time, and we can take action on that." – Mark Sucrese, Marketing Director
Dell RTD is Live in 3 Channels, 4th is WIP
• Service Contact Centre
• Sales Contact Centre
• Email content personalization
Project #3: Email Personalization
• Personalize email subject headers to increase open rates
• Send emails at the optimal time based on learned behaviors
• Propose upsell/cross-sell offers at mail opening that have a high probability of converting to a sale
• Learn in real time from positive events: mail opening, click, cart, revenue generation
Oracle’s Unified Big Data Management and Analytics Strategy
• Experiment, Prototype, Collaborate
  – Quickly find, explore, transform, discover and share in BDD
  – Publish results to HDFS
  – Use to build predictive models with Oracle R for Hadoop
• Productize, Secure, Govern
  – Connect published HDFS files to secure Oracle DB using Oracle Big Data SQL
  – No data movement required
  – Seamlessly extends existing DWH and BI investments with nontraditional data in Hadoop
• Available as Engineered Systems
[Figure: architecture diagram; labels include Oracle BI Foundation Suite & Data V., Exalytics In-Memory Appliance, Oracle Real-Time Decisions, Oracle Advanced Analytics, Exadata, Oracle Database Data Warehouse (tables in DB), Oracle Big Data SQL (SQL join), Oracle Big Data Discovery, ORAAH, BDA Hadoop (HDFS) Data Reservoir (tables in Hadoop), Emerging Sources, Existing Sources]
The Result of Siloed Marketing is a Broken Customer Experience
• Marketers lean heavily on fragmented tools
• They pass fragmentation on to the customer: 78% of customers don't receive a consistent experience across channels. (Accenture)
• Bombarded, customers don't convert or they leave: 94% of customers have discontinued communication with a company because of irrelevant messages. (Blue Research)
Real-Time Interactions Optimization: Value Dimension
[Figure: chart with axes labeled Value/Effectiveness and Efficiency/TCO, positioning the following options]
• Doing nothing
• Option 1: Rules only
• Option 2: Offline data mining + rules
• Option 3: Real-time self-learning + offline data mining + rules + performance goals
Many Organizations Are Facing Similar Challenges
Objectives: Timeliness and Relevance; Multi-Channel Support; Ease of Integration
Problems today, each with the corresponding best practice:
• Over-reliance on business rules. Best practice: balance between model-based and user-defined decisions
• Long lead times between analysis and deployment. Best practice: high degree of automation / self-learning models
• Poor channel integration. Best practice: pervasive solution spanning all customer-facing applications
• Siloed solution for each channel. Best practice: common set of models and metadata for all channels
• Poor real-time performance and scalability. Best practice: service-oriented architecture with guaranteed response times
Source: Oracle Insight analysis
Classification
• Targeting the right customer with the right offer
• How is a customer likely to respond to an offer?
• Finding and preventing customer churn
• Security and suspicious activity detection
• Understanding sentiment in customer conversations
• Reducing medical errors & improving quality of health
• Spam filtering
Classification – Logistic regression
• Uses Generalized Linear Model for scoring
• Logit function
[Figure: plot with axes "# days negative account balance" and "# days positive account balance"]
Classification – Logistic regression
[Figure: plot of sigmoid(Score) against Score]
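The scoring step of logistic regression on these two slides is a linear score passed through the sigmoid. The sketch below shows just that step; the two feature names follow the axis labels of the earlier slide, while the weights, the intercept, and the meaning of the "positive class" are assumptions chosen for illustration (in practice the coefficients are learned from data).

```python
import math

def sigmoid(z):
    """Logistic function; maps any real score into (0, 1).
    Written in two branches to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

# Hypothetical coefficients of a generalized linear model.
WEIGHTS = {"days_negative_balance": 0.08, "days_positive_balance": -0.03}
INTERCEPT = -1.0

def score(features):
    """Linear part: intercept plus weighted sum of the features."""
    return INTERCEPT + sum(WEIGHTS[name] * value
                           for name, value in features.items())

def probability(features):
    """Probability of the positive class = sigmoid(score)."""
    return sigmoid(score(features))

customer_a = {"days_negative_balance": 60, "days_positive_balance": 5}
customer_b = {"days_negative_balance": 2, "days_positive_balance": 80}
p_a = probability(customer_a)
p_b = probability(customer_b)
print(p_a, p_b)
```

A score of zero maps to a probability of exactly 0.5, so the line where the score equals zero is the decision boundary between the two classes.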
Oracle’s Advanced Analytics (Machine Learning Platform)
Multiple interfaces across platforms: SQL, R, GUI, Dashboards, Apps
Users
• Information producers: R programmers; data & business analysts (R Client, SQL Developer/Oracle Data Miner)
• Information consumers: business analysts/managers; domain end users (OBIEE, applications)
Platform
• Hadoop: HQL; ORAAH parallel, distributed algorithms
• Oracle Database Enterprise Edition (Oracle Database 12c): Oracle Advanced Analytics database option; SQL data mining, ML & analytic functions + R integration for scalable, distributed, parallel in-DB ML execution
• Oracle Cloud
You Can Think of Oracle’s Advanced Analytics Like This…
Traditional SQL
– "Human-driven" queries
– Domain expertise
– Any "rules" must be defined and managed
SQL queries: SELECT, DISTINCT, AGGREGATE, WHERE, AND, OR, GROUP BY, ORDER BY, RANK
Oracle Advanced Analytics - SQL &
– Automated knowledge discovery, model building and deployment
– Domain expertise to assemble the "right" data to mine/analyze
+ Analytical SQL "verbs": PREDICT, DETECT, CLUSTER, CLASSIFY, REGRESS, PROFILE, IDENTIFY FACTORS, ASSOCIATE
R: Widely Popular
R is a statistics language and environment
• Strengths
  – Powerful & extensible
  – Graphical & extensive statistics
  – Free, open source
• Challenges
  – Memory constrained
  – Single threaded
  – Outer loop slows down process
  – Not industrial strength
Oracle Advanced Analytics Database Evolution: Analytical SQL in the Database
[Timeline: 1998, 1999, 2002, 2004, 2005, 2008, 2011, 2014; milestones from earliest to latest]
• Oracle acquires Thinking Machine Corp's dev. team + "Darwin" data mining software
• 7 Data Mining "Partners"
• Oracle Data Mining 9.2i launched: 2 algorithms (NB and AR) via Java API
• Oracle Data Mining 10gR2 SQL: 7 new SQL dm algorithms and new Oracle Data Miner "Classic" wizards-driven GUI
• SQL statistical functions introduced
• ODM 11g & 11gR2 adds AutoDataPrep (ADP), text mining, perf. improvements
• SQLDEV/Oracle Data Miner 3.2 "work flow" GUI launched
• Integration with "R" and introduction/addition of Oracle R Enterprise
• Product renamed "Oracle Advanced Analytics (ODM + ORE)"
• New algorithms (EM, PCA, SVD)
• Predictive Queries
• SQLDEV/Oracle Data Miner 4.0 SQL script generation and SQL Query node (R integration)
• OAA/ORE 1.3 + 1.4 adds NN, Stepwise, scalable R algorithms
• Oracle Adv. Analytics for Hadoop Connector launched with scalable BDA algorithms
Agenda
• According to many studies, "Data Scientist" is one of those jobs of the future for which there is already a large shortage of trained professionals.
• The work of a "Data Scientist" is closely tied to the field of Machine Learning (ML)
• What is ML and where is it used?
• What work do "Data Scientists" do?
• How can we apply ML in education?
• Examples of ML implementations in the financial sector in Croatia