Pentaho business analytics & data integration Amjad.akkawi@zaponet.com
About Us – Zaponet data science solutions
• Zaponet is a service integrator and development shop providing solutions and professional services for building state-of-the-art data products that leverage big-data and data-science technologies.
• Zaponet architects, designs and builds big-data solutions: data warehouses, user-profile systems, recommendation engines, complex event processing and more.
• Some of our technology partners: Pentaho, Cloudera, Infobright, Vertica, Kognitio, GigaSpaces
• More details: www.zaponet.com
* Future meetup: Pentaho Weka for data science
About Me – Amjad Akkawi • Zaponet CTO • Experience in Pentaho
Agenda • Pentaho in business analytics & data integration • Pentaho BI Demo • Pentaho PDI Demo
About Pentaho
• Recognized leader in business analytics & data integration
• Subscription-based business model
• Achieved critical mass:
  – Over 1,200 commercial customers
  – Over 10,000 production deployments
  – Over 185 countries
• Stewardship of the most important open source analytics projects
• Industry recognition
• Over 160 partners globally
Why Customers Love Pentaho
• Speed of Deployment
  – Marketing dashboard in less than 1 day
  – 2 weeks time to market
  – 8 weeks time to market
  – Fully rolled out in budget in 4 months
• Innovation & Scalability
  – Music files from 20,000 sources
  – Operational reports at all 1000 retail stores
  – Analyzing buying patterns of 5 million members
  – Analytics on 500,000 patient records
• Superior Customer Service
  – “… better functionality and more support”
  – “… top-notch professional support”
  – “Pentaho support is as good as its software”
  – “… a great partner through every phase of our project”
• Total Value
  – € 350K+ cost saving
  – Less than 1 month ROI
  – 75% lower acquisition costs
  – “… ROI was almost immediate.”
Pentaho in the Big Data Fabric [diagram]
• Big Data Mgmt: Hadoop, NoSQL Databases, Analytic Databases
• Data Integration: Scheduling, Job Orchestration, High Performance Workflow, Visual IDE, Java MapReduce, Pig, Pentaho MapReduce, 3rd Party Tools
• Big Analytics: Pentaho Business Analytics, R, 3rd Party BI Tools, Applications
High Level Feature/Functions (components are independent)
• Dashboards – Self-service interactive KPI & metrics and visualization (Information Consumers)
• Reporting – Ad hoc and operational reports (Business Users)
• Analysis – Self-service interactive and ad hoc analysis (Knowledge Workers / Business Users)
• Data – High performance data integration, big data, cleansing and presentation (Power Users, Developers & DBAs)
• Data Mining – Advanced predictive analysis (Advanced Power Users & Viewers)
Dashboards
Dashboards & Interactive Dashboards
Dashboards – Geo Location-Based
Reports – Interactive, Static, Distributed
Reports – Reporting Pack & House Styles
Enhanced In-Memory Analytics
• Enhanced in-memory caching for speed-of-thought visualization & analysis
  – More re-usability of in-memory data
  – Fewer trips to the database/disk
• Builds on existing unique extreme-scale in-memory analytics
  – Support for external data grids: Infinispan / JBoss Enterprise Data Grid and Memcached
  – Scale to caching hundreds of GBs (potentially TBs) of data in-memory
• Competition
  – Java heap or C++ memory space (a few GB at most; most BI products), or
  – Proprietary (hard to manage) in-memory technology (e.g. QlikView, MicroStrategy)
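To make "fewer trips to the database" concrete, here is a minimal sketch of a read-through cache with least-recently-used eviction. It is not Pentaho's caching code: the class `ReadThroughCache`, the function `slow_query`, and the sample figures are hypothetical names and data made up for illustration, and a real deployment would delegate storage to an external data grid rather than a local dictionary.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable


class ReadThroughCache:
    """Tiny LRU read-through cache (illustrative only, not Pentaho's cache)."""

    def __init__(self, loader: Callable[[Hashable], Any], capacity: int = 128) -> None:
        self._loader = loader          # called only on a cache miss
        self._capacity = capacity      # maximum number of cached entries
        self._entries: OrderedDict = OrderedDict()
        self.backend_trips = 0         # how often we had to hit the backend

    def get(self, key: Hashable) -> Any:
        if key in self._entries:
            # Cache hit: mark the entry as most recently used.
            self._entries.move_to_end(key)
            return self._entries[key]
        # Cache miss: one trip to the backend, then remember the result.
        self.backend_trips += 1
        value = self._loader(key)
        self._entries[key] = value
        if len(self._entries) > self._capacity:
            # Evict the least recently used entry.
            self._entries.popitem(last=False)
        return value


def slow_query(region: str) -> int:
    """Hypothetical stand-in for an aggregate query against a database."""
    sales = {"EMEA": 120, "APAC": 95, "AMER": 210}
    return sales[region]


if __name__ == "__main__":
    cache = ReadThroughCache(slow_query, capacity=2)
    for region in ["EMEA", "EMEA", "APAC", "EMEA", "AMER", "APAC"]:
        cache.get(region)
    print(f"6 lookups, {cache.backend_trips} backend trips")
```

With a larger capacity every repeated lookup is served from memory, which is the effect the slide attributes to scaling the cache out to a data grid.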
Analyzer – Table format
Analyzer – Chart format
Analyzer: Geo Location-Based Analysis
Scenario 1 – Operational Database → Dashboard, Report
Scenario 2 – Data Mart(s) / Warehouse → Metadata → Dashboard, Report, Analyzer
Metadata – Schema Workbench: complex calculations and multi-cube requirements may need more modeling
Scenario 3 – BIG DATA Technology and/or Structured Data, Unstructured Data → Staging Area & Data Vault → Data Mart(s) / Warehouse → Metadata → Dashboard, Report, Analyzer (PDI loads each stage)
Pentaho Data Integration
• Source data acquisition
• Cleansing
• Initial consolidation as required
• Transformation
• Change Data Capture
• Data Warehouse Management
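One of the PDI tasks listed above, change data capture, can be sketched as a comparison of two snapshots of a source table. This is only one of several CDC approaches (others rely on timestamps, triggers or database logs), and it is not a description of how PDI implements it. The function names `row_fingerprint` and `detect_changes` and the sample rows are hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple

Row = Dict[str, object]


def row_fingerprint(row: Row) -> str:
    """Hash a row's content so two versions can be compared cheaply."""
    canonical = "|".join(f"{column}={row[column]}" for column in sorted(row))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def detect_changes(previous: List[Row], current: List[Row], key: str) -> Tuple[list, list, list]:
    """Return (inserted, updated, deleted) key values between two snapshots."""
    before = {row[key]: row_fingerprint(row) for row in previous}
    after = {row[key]: row_fingerprint(row) for row in current}
    inserted = sorted(k for k in after if k not in before)
    deleted = sorted(k for k in before if k not in after)
    updated = sorted(k for k in after if k in before and after[k] != before[k])
    return inserted, updated, deleted


if __name__ == "__main__":
    # Hypothetical snapshots of a small customer table.
    yesterday = [
        {"id": 1, "name": "Ann", "city": "Haifa"},
        {"id": 2, "name": "Bob", "city": "Tel Aviv"},
        {"id": 3, "name": "Cy", "city": "Eilat"},
    ]
    today = [
        {"id": 1, "name": "Ann", "city": "Haifa"},
        {"id": 2, "name": "Bob", "city": "Jerusalem"},
        {"id": 4, "name": "Dee", "city": "Acre"},
    ]
    inserted, updated, deleted = detect_changes(yesterday, today, key="id")
    print("inserted:", inserted, "updated:", updated, "deleted:", deleted)
```

Only the changed keys need to be pushed on to the staging area, which keeps warehouse loads incremental instead of full reloads.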
Variations on a Theme – BIG DATA Technology and/or Structured Data, Unstructured Data, Ad-hoc Data → Staging Area & Data Vault → Data Mart(s) / Warehouse → Metadata → Dashboard, Report, Analyzer (PDI loads each stage)
Alerting: SMS, e-mail & attachments
Pentaho Data Integration
• Source data acquisition
• Cleansing
• Initial consolidation as required
• Transformation
• Change Data Capture
• Data Warehouse Management
PDI Components
• Enterprise Edition Data Integration Server
  – Execution and remote monitoring
  – Integrated scheduling
  – Enterprise security options
  – Enhanced content management including revision history and locking
  – Remote distributed cluster-based processing
Kettle Conceptual Model
Pentaho Data Integration – Step-based processing engine with instant visualization of results
Pentaho Data Integration – Step-based performance
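The idea of a step-based engine, where rows stream from one step to the next and each step reports its own throughput, can be sketched with Python generators. This is a simplified, single-threaded illustration and not Kettle's engine; the `Step` class, `run_pipeline`, the two example steps and the sample rows are all hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]


class Step:
    """One processing step that counts the rows it reads and writes."""

    def __init__(self, name: str, func: Callable[[Iterator[Row]], Iterator[Row]]) -> None:
        self.name = name
        self._func = func
        self.rows_read = 0
        self.rows_written = 0

    def run(self, rows: Iterable[Row]) -> Iterator[Row]:
        def counted_input() -> Iterator[Row]:
            for row in rows:
                self.rows_read += 1
                yield row

        for row in self._func(counted_input()):
            self.rows_written += 1
            yield row


def trim_names(rows: Iterator[Row]) -> Iterator[Row]:
    """Cleansing step: strip surrounding whitespace from the name field."""
    for row in rows:
        yield {**row, "name": str(row["name"]).strip()}


def drop_missing_amount(rows: Iterator[Row]) -> Iterator[Row]:
    """Filter step: discard rows without an amount."""
    for row in rows:
        if row.get("amount") is not None:
            yield row


def run_pipeline(source: Iterable[Row], steps: List[Step]) -> List[Row]:
    """Chain the steps lazily, then pull every row through the chain."""
    stream: Iterator[Row] = iter(source)
    for step in steps:
        stream = step.run(stream)
    return list(stream)


if __name__ == "__main__":
    source = [
        {"name": " Ann ", "amount": 10},
        {"name": "Bob", "amount": None},
        {"name": " Cy", "amount": 7},
    ]
    steps = [Step("Trim names", trim_names), Step("Filter rows", drop_missing_amount)]
    result = run_pipeline(source, steps)
    for step in steps:
        print(f"{step.name}: read={step.rows_read} written={step.rows_written}")
    print(result)
```

Per-step read and write counts are the kind of figures that let a developer spot which step slows a transformation down.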
Pentaho Data Integration – Integrated Metadata Creation
Pentaho and Big Data
Forrester Wave, Enterprise Hadoop Solutions, Q1 2012
• Only vendor in strong performer category: “an impressive Hadoop integration tool”
• Only business analytics vendor
• Richest functionality
• Most extensive integration with open source Apache Hadoop and major Hadoop distributions
Expanded Insight into Big and Diverse Data
• Improved support for Hadoop
  – Simpler deployment across Hadoop clusters: support for the Hadoop cache; Debian/RPM installer
  – Performance and ease-of-use enhancements for Pentaho MapReduce visual development
  – Support for Hadoop Security data access
• New NoSQL database support
  – Cassandra
  – MongoDB
• Growing the Pentaho big data community
  – Open sourced all big data components (Hadoop & NoSQL) under the Apache License, the same license used by leading Hadoop and NoSQL distros
  – New big data developer resources: how-to documents, videos, walk-throughs
Hadoop Data Management & Integration – Pentaho MapReduce • Accessible by any ETL developer or data scientist
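For readers new to the MapReduce model that Pentaho MapReduce targets, the three phases (map, shuffle, reduce) can be sketched in plain Python with a word count. This runs in a single process and only illustrates the programming model; it does not use Hadoop or any Pentaho API, and the function names and sample lines are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def mapper(line: str) -> Iterator[Tuple[str, int]]:
    """Map phase: emit a (word, 1) pair for every word in the line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle phase: group all emitted values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reducer(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce phase: collapse the values of one key into a total."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in mapper(line))
    grouped = shuffle(pairs)
    return dict(reducer(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    lines = ["big data big analytics", "data integration"]
    print(run_job(lines))
```

On a cluster, the mapper and reducer run in parallel across nodes and the framework performs the shuffle; a visual tool lets the developer define those two pieces without writing the surrounding job code by hand.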
NoSQL Data Management & Integration • Visual Job Orchestration • Any Data Source • Accessible by any ETL developer or data scientist
Visual Job Orchestration • Any Data Source • Scheduling • Accessible to any ETL developer or data scientist
Pentaho Integration Options Pentaho BI Server Other Application Pentaho Custom Stuff My Application Pentaho Components
Integration levels: Bundled, Mashup, Extended, Embedded
Value
• Bundled – Fastest way to get analytics that have your look & feel
• Mashup – An integrated experience for your end user
• Extended – Customizing Pentaho for your experience
• Embedded – Ultimate integration and customization
What it takes
• Bundled – Pentaho is a separate app, branded with Partner’s logo, look & feel. Optional: Partner app may include links to Pentaho reports, analysis and dashboards (popping new window)
• Mashup – Pentaho & Partner app have the same UI. Pentaho User Console, or individual reports, analysis or dashboards, are included in the partner app. Single sign-on creates a seamless experience
• Extended – Pentaho’s core functionality is extended through plug-ins. Examples: connecting to custom data sources, adding new visualizations, customizing security, replacing the Pentaho rules engine. Integrate with Partner’s App Server
• Embedded – Directly embedding Pentaho into your app. Calling Pentaho Java APIs from your app. Optional: single sign-on creates a seamless experience
Skill level
• Bundled – Limited HTML skills
• Mashup – HTML skills
• Extended / Embedded – HTML skills, Java skills, knowledge of Pentaho architecture
Q&A • Pentaho PDI Demo • Pentaho BI Demo
“Traditional” Database Support DATA ANALYSIS DATA INTEGRATION
Broadest Support for Big Data Platforms • Hadoop • NoSQL • Analytic Databases