Pentaho Business Analytics & Data Integration
Amjad.akkawi@zaponet.com

About Us – Zaponet data science solutions
• Zaponet is a service integrator and development shop providing solutions & professional services for building state-of-the-art data products which leverage big-data & data-science technologies.
• Zaponet architects, designs and builds big-data solutions: data warehouses, user-profile systems, recommendation engines, complex event processing and more.
• Some of our technology partners are: Pentaho, Cloudera, Infobright, Vertica, Kognitio, GigaSpaces
• More details: www.zaponet.com
*Future meetup: Pentaho Weka for data science

About Me – Amjad Akkawi
• Zaponet CTO
• Experience in Pentaho

Agenda
• Pentaho in business analytics & data integration
• Pentaho BI Demo
• Pentaho PDI Demo

About Pentaho
• Recognized leader in business analytics & data integration
• Subscription-based business model
• Achieved critical mass:
  – Over 1,200 commercial customers
  – Over 10,000 production deployments
  – Over 185 countries
• Stewardship of most important open source analytics projects
INDUSTRY RECOGNITION
OVER 160 PARTNERS GLOBALLY

Why Customers Love Pentaho
Speed of Deployment
• Marketing dashboard in less than 1 day
• 2 weeks time to market
• 8 weeks time to market
• Fully rolled out in budget in 4 months
Innovation & Scalability
• Music files from 20,000 sources
• Operational reports at all 1000 retail stores
• Analyzing buying patterns of 5 million members
• Analytics on 500,000 patient records
Superior Customer Service
• “… better functionality and more support”
• “… top-notch professional support”
• “Pentaho support is as good as its software”
• “… a great partner through every phase of our project”
Total Value
• €350K+ cost saving
• Less than 1 month ROI
• 75% lower acquisition costs
• “… ROI was almost immediate.”

Pentaho in the Big Data Fabric
Diagram labels:
• Applications; 3rd Party BI Tools; R
• Pentaho Business Analytics: Big Analytics
• Data Integration: Scheduling, Job Orchestration, High Performance Workflow, Visual IDE
• Big Data Mgmt: Hadoop (Java MapReduce, Pig, Pentaho MapReduce), NoSQL Databases, Analytic Databases
• 3rd Party Tools

High Level Feature/Functions (components are independent)
• Dashboards: Self-service Interactive KPI & Metrics and Visualization (Information Consumers)
• Reporting: Ad hoc and Operational Reports (Business Users)
• Analysis: Self-service Interactive and Ad Hoc Analysis (Knowledge Workers / Business Users)
• Data: High Performance Data Integration, BIG DATA, Cleansing and Presentation (Power Users, Developers & DBAs)
• Data Mining: Advanced Predictive Analysis (Advanced Power Users & Viewers)

Dashboards

Dashboards & Interactive Dashboards

Dashboards – Geo Location-Based

Reports – Interactive, Static, Distributed

Reports – Reporting Pack & House Styles

Enhanced In-Memory Analytics
• Enhanced in-memory caching for speed-of-thought visualization & analysis
  – More re-usability of in-memory data
  – Fewer trips to the database/disk
• Builds on existing unique extreme-scale in-memory analytics
  – Support for external data grids
    • Infinispan / JBoss Enterprise Data Grid and Memcached
    • Scale to caching hundreds of GBs (potentially TBs) of data in-memory
• Competition
  – Java heap or C++ memory space (a few GB at most; most BI products), or
  – Proprietary (hard to manage) in-memory technology (e.g. QlikView, MicroStrategy)
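
The "fewer trips to the database" point can be illustrated with a minimal sketch. This is not Pentaho's caching implementation or API; `QueryCache`, `fake_database` and the query text are hypothetical names made up for illustration, and a real deployment would use an external data grid rather than a Python dictionary.

```python
from typing import Callable, Dict


class QueryCache:
    """Hypothetical in-memory result cache keyed by query text (not a Pentaho API)."""

    def __init__(self, run_query: Callable[[str], list]):
        self._run_query = run_query       # function that actually hits the database
        self._results: Dict[str, list] = {}
        self.db_trips = 0                 # how many times the database was queried

    def get(self, query: str) -> list:
        # Only go to the database when this query has not been answered before.
        if query not in self._results:
            self.db_trips += 1
            self._results[query] = self._run_query(query)
        return self._results[query]


def fake_database(query: str) -> list:
    # Stand-in for a slow database call; returns a made-up row.
    return [("total_sales", len(query))]


cache = QueryCache(fake_database)
first = cache.get("SELECT SUM(sales) FROM facts")
second = cache.get("SELECT SUM(sales) FROM facts")  # served from memory
```

After both calls, `cache.db_trips` is 1: the second request reuses the in-memory result instead of going back to the database.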

Analyzer – Table format

Analyzer – Chart format

Analyzer: Geo Location-Based Analysis

Scenario 1
Diagram labels: Operational Database; Dashboard; Report

Scenario 2
Diagram labels: Data Mart(s) / Warehouse; Metadata; Dashboard; Report; Analyzer

Metadata – Schema Workbench
Complex calculations and multi-cube requirements may need more modeling

Scenario 3
Diagram labels: BIG DATA Technology and/or Structured Data; Unstructured Data; Staging Area & Data Vault; PDI; Data Mart(s) / Warehouse; PDI; Metadata; Dashboard; Report; Analyzer
Pentaho Data Integration:
• Source data acquisition
• Cleansing
• Initial consolidation as required
• Transformation
• Change Data Capture
• Data Warehouse Management
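
Of the PDI tasks listed for this scenario, Change Data Capture is the least self-explanatory. The sketch below shows one simple variant, comparing two snapshots of a source table by key. It is an assumption-laden illustration, not how PDI implements it: the function name `capture_changes`, the `id` key and the sample rows are all hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def capture_changes(
    previous: List[Row], current: List[Row], key: str
) -> Tuple[List[Row], List[Row], List[Row]]:
    """Return (inserted, updated, deleted) rows between two snapshots."""
    prev_by_key = {row[key]: row for row in previous}
    curr_by_key = {row[key]: row for row in current}

    # Keys present only in the new snapshot are inserts.
    inserted = [row for k, row in curr_by_key.items() if k not in prev_by_key]
    # Keys present only in the old snapshot are deletes.
    deleted = [row for k, row in prev_by_key.items() if k not in curr_by_key]
    # Keys present in both, with different content, are updates.
    updated = [
        row
        for k, row in curr_by_key.items()
        if k in prev_by_key and prev_by_key[k] != row
    ]
    return inserted, updated, deleted


# Made-up sample data for illustration only.
previous = [{"id": 1, "city": "Haifa"}, {"id": 2, "city": "Tel Aviv"}]
current = [{"id": 2, "city": "Jerusalem"}, {"id": 3, "city": "Nazareth"}]
inserted, updated, deleted = capture_changes(previous, current, key="id")
```

Only the three change lists would need to be loaded into the warehouse, rather than the full source table. Snapshot comparison is just one approach; timestamp columns or database logs are common alternatives.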

Variations on a Theme
Diagram labels: BIG DATA Technology and/or Structured Data; Unstructured Data; Ad-hoc Data; Staging Area & Data Vault; PDI; Data Mart(s) / Warehouse; PDI; Metadata; Dashboard; Report; Analyzer; Alerting (SMS, eMail & attachments)
Pentaho Data Integration:
• Source data acquisition
• Cleansing
• Initial consolidation as required
• Transformation
• Change Data Capture
• Data Warehouse Management

PDI Components
• Enterprise Edition Data Integration Server
  – Execution and remote monitoring
  – Integrated scheduling
  – Enterprise Security options
  – Enhanced content management including revision history and locking
  – Remote distributed cluster based processing

Kettle Conceptual Model

Pentaho Data Integration Step based processing engine with instant visualization of results
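
A step-based engine passes rows from one step to the next, so intermediate results can be inspected at any step. The sketch below imitates that idea with Python generators. It is only an analogy under stated assumptions, not Kettle's engine or API: the step functions `read_rows`, `cleanse` and `preview` and the sample input are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, str]


def read_rows(lines: Iterable[str]) -> Iterator[Row]:
    # Input step: split "name,amount" text lines into rows.
    for line in lines:
        name, amount = line.split(",")
        yield {"name": name, "amount": amount}


def cleanse(rows: Iterable[Row]) -> Iterator[Row]:
    # Cleansing step: trim whitespace and drop rows without a numeric amount.
    for row in rows:
        amount = row["amount"].strip()
        if amount.isdigit():
            yield {"name": row["name"].strip().title(), "amount": amount}


def preview(
    rows: Iterable[Row], step_name: str, seen: List[Tuple[str, Row]]
) -> Iterator[Row]:
    # Preview step: record each row as it passes, then hand it on unchanged.
    for row in rows:
        seen.append((step_name, row))
        yield row


# Made-up input for illustration only.
raw = [" alice ,10", "bob,n/a", "carol, 7 "]
seen: List[Tuple[str, Row]] = []
result = list(preview(cleanse(read_rows(raw)), "cleanse", seen))
```

Here `seen` holds the rows as they left the cleansing step, which is the kind of per-step view the slide refers to; the row with the non-numeric amount never reaches it.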

Pentaho Data Integration Step based performance

Pentaho Data Integration Integrated Metadata Creation

Pentaho and Big Data
Forrester Wave, Enterprise Hadoop Solutions, Q1 2012
• Only vendor in strong performer category: “an impressive Hadoop integration tool”
• Only business analytics vendor
• Richest functionality
• Most extensive integration with open source Apache Hadoop and major Hadoop distributions

Expanded Insight into Big and Diverse Data
• Improved support for Hadoop
  – Simpler deployment across Hadoop clusters
    • Support for the Hadoop cache
    • Debian RPM installer
  – Performance and ease of use enhancements for Pentaho MapReduce visual development
  – Support for Hadoop Security data access
• New NoSQL database support
  – Cassandra
  – MongoDB
• Growing the Pentaho big data community
  – Open sourced all big data components (Hadoop & NoSQL)
    • Apache License – same as used by leading Hadoop and NoSQL distros
  – New big data developer resources: how-to documents, videos, walk-throughs

Hadoop Data Management & Integration
Accessible by any ETL developer or data scientist
Pentaho MapReduce

NoSQL Data Management & Integration
Visual Job Orchestration
Any Data Source
Accessible by any ETL developer or data scientist

Visual Job Orchestration Any Data Source Scheduling Accessible to any ETL developer or data scientist

Pentaho Integration Options
Diagram labels: Pentaho BI Server; Other Application; Pentaho Custom Stuff; My Application; Pentaho Components

Integration: Bundled, Mashup, Extended, Embedded
Value
• Bundled: Fastest Way to Get Analytics that Have Your Look & Feel
• Mashup: An Integrated Experience for Your End User
• Extended: Customizing Pentaho for Your Experience
• Embedded: Ultimate Integration and Customization
What it Takes?
• Pentaho is a separate app, branded with Partner’s logo, look & feel
• Pentaho & Partner app have the same UI
• Pentaho’s core functionality is extended through plug-ins. Examples:
  – Connecting to custom data sources
  – Adding new visualizations
  – Customizing security
  – Replacing Pentaho rules engine
• Integrate with Partner’s App Server
• Optional: Partner app may include links to Pentaho reports, analysis and dashboards (popping new window)
• Pentaho User Console, or individual reports, analysis or dashboards are included in partner app
• Single sign-on creates a seamless experience
• Directly embedding Pentaho into your app
• Calling Pentaho Java APIs from your App
• Optional: Single sign-on creates a seamless experience
Skill Level
• Limited HTML skills
• HTML skills
• HTML skills
• Java skills
• Knowledge of Pentaho architecture

Q&A Pentaho PDI Demo Pentaho BI Demo

“Traditional” Database Support DATA ANALYSIS DATA INTEGRATION

Broadest Support for Big Data Platforms: Hadoop, NoSQL, Analytic Databases
- Slides: 42