Welcome to MakeBigDataWork.org
Today's webinar: Making Big Data Work in the Cloud
www.MakeBigDataWork.org
Introduction
• Who is Make Big Data Work?
• It's a consortium of StreamSets, Waterline Data, Trifacta, and Arcadia Data created to educate, share best practices, and increase the overall effectiveness of Big Data.
• Webinar logistics
• Everyone will be in "listen only" mode
• Please use the Chat function to ask questions
• Slides will be sent to each attendee later this week
Copyright (C) 2018 451 Research LLC
Presenters
Keynote Presenter: Matt Aslett, Research Director, Data Platforms and Analytics
Big Data Expert Panelists
• Data Ingestion: Clarke Patterson
• Data Preparation: Bertrand Cariou
• Data Catalog: Bob Hagenau
• Big Data Analytics: Steve Wooledge
Make Big Data Work in the Cloud: Top Trends and Technologies
Matt Aslett, Research Director, Data Platforms & Analytics
451 Research Is a Leading IT Research & Advisory Company
RESEARCH & DATA, ADVISORY, EVENTS, GO 2 MARKET
• Founded in 2000
• 250+ employees, including over 120 analysts
• 1,000+ clients: technology & service providers, corporate advisory, finance, professional services, and IT decision makers
• 85,000+ IT professionals, business users and consumers in our research community
• 2,000+ technology & service providers under coverage
• 451 Research and its sister company, Uptime Institute, are the two divisions of The 451 Group
• Headquartered in New York City, with offices in London, Boston, San Francisco, Washington DC, Austin, Mexico, Costa Rica, Brazil, Spain, UAE, Russia, Taiwan, Singapore and Malaysia
The cloud is becoming the default business environment
Public cloud (IaaS) functions currently in use
Base: IaaS/public cloud users; % of respondents (n = 322)
Relational database: 45%
Data/business analytics: 42%
Containers: 41%
Auto-scaling: 37%
Data warehouse: 33%
Serverless compute/function as a service: 30%
NoSQL database: 25%
Real-time/streaming data processing: 23%
Machine learning: 22%
Mobile services: 21%
IoT platform: 16%
Large-scale/batch data transfer: 14%
Other: 5%
None: 8%
Q4. Which of the following IaaS features is your organization using in connection with your IaaS/public cloud deployment? Please select all that apply.
Source: 451 Research, Voice of the Enterprise: Cloud, Hosting & Managed Services, Workloads and Key Projects 2018
Public cloud (IaaS) functions planned for implementation
Base: IaaS/public cloud users; % of respondents (n = 268)
Categories: Machine learning, Containers, Data/business analytics, Serverless compute/function as a service, Real-time/streaming data processing, Auto-scaling, IoT platform, Relational database, Data warehouse, Mobile services, NoSQL database, Large-scale/batch data transfer, Other, None
Chart values: 27% 19% 18% 16% 15% 13% 12% 12% 10% 2% 18%
Q5. Which of the following IaaS features is your organization planning to begin using in connection with IaaS/public cloud services during the next year? Please select all that apply.
Source: 451 Research, Voice of the Enterprise: Cloud, Hosting & Managed Services, Workloads and Key Projects 2018
Primary environment used to operate data processing, analytics, business intelligence functions today
Base: Respondents with data processing, analytics and business intelligence functions in place; % of respondents (n = 66)
On-premises 'traditional' resources and infrastructure: 35%
On-premises private cloud IT resources and infrastructure: 21%
Infrastructure as a service (IaaS)/platform as a service (PaaS)/public cloud: 12%
Hosted private cloud: 12%
Software as a service (SaaS) and hosted applications: 11%
Hosted, non-cloud infrastructure: 8%
None: 2%
Q18. Which of the following best describes the primary environment used to operate your organization's data processing, analytics, business intelligence today?
Source: 451 Research, Voice of the Enterprise: Cloud, Hosting & Managed Services, Workloads and Key Projects 2018
Primary environment used to operate data processing, analytics, business intelligence functions today/in two years
Base: Respondents with data processing, analytics and business intelligence functions in place; % of respondents (n = 66)
Chart (2018 and 2020 series): 35% On-premises 'traditional' resources and infrastructure 8% 21% 19% On-premises private cloud IT resources and infrastructure 12% Infrastructure as a service (IaaS)/platform as a service (PaaS)/public cloud 27% 12% Hosted private cloud 16% 11% Software as a service (SaaS) and hosted applications 22% 8% Hosted, non-cloud infrastructure None 16% 2% 2%
Q18. Which of the following best describes the primary environment used to operate your organization's data processing, analytics, business intelligence today?
Source: 451 Research, Voice of the Enterprise: Cloud, Hosting & Managed Services, Workloads and Key Projects 2018
Drivers of Deploying Workloads/Applications in Off-Premises Environments
Base: Respondents who plan to deploy the majority of workloads/applications in off-premises IT environments; % of respondents (n = 448)
Reduce IT costs: 38%
Enhance IT systems agility: 37%
Improved access to new technology resources/capabilities/features: 35%
Modernize IT infrastructure: 32%
Deploy new applications/features faster: 30%
Enhance application performance and resiliency: 27%
Move from capital-intensive IT to an operating expense model: 27%
Improve security: 19%
Overcome lack of in-house IT staff/expertise: 16%
Other: 1%
Q15. You indicated that the majority of your organization's workloads/applications will be deployed in off-premises cloud/hosted IT environments two years from now. What are the drivers behind this? Please select up to 3.
Source: 451 Research, Voice of the Enterprise: Digital Pulse, Workloads and Key Projects 2018
Migration patterns
Migration patterns
% of respondents (n = 1049); chart values: 11%, 12%, 44%, 14%, 18%
Retain: Keep current applications unchanged on existing on-premises infrastructure.
Lift and shift: Migrate applications to off-premises/cloud with minimal changes to the application code or business logic.
Modernize: Retain existing applications on-premises but move to more modern application and infrastructure architectures.
Repurchase and shift: Replace current on-premises applications with SaaS or off-premises hosted versions of the applications.
Refactor and shift: Re-architect or redesign existing applications using cloud-native frameworks and deploy in off-premises cloud environments.
Q7. Which of the following best describes your organization's overall IT infrastructure approach to mission-critical legacy applications and workloads going forward?
Source: 451 Research, Voice of the Enterprise: Digital Pulse, Workloads and Key Projects 2018
The separation of compute and storage [Diagram: DATA ANALYTICS AND BI; DATA WAREHOUSE]
The separation of compute and storage [Diagram: DATA ANALYTICS AND BI IN THE CLOUD; DATA WAREHOUSE IN THE CLOUD]
The separation of compute and storage [Diagram: DATA ANALYTICS AND BI IN THE CLOUD; DATA WAREHOUSE IN THE CLOUD; OTHER CLOUD DATA PLATFORMS]
Data gravity
• Rather than bring the data to the compute, bring the compute to the data.
• Analyze data in cloud storage
Cost, Scalability, Durability
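The data-gravity idea above can be illustrated with a toy comparison. This is a minimal, in-memory sketch and not any cloud provider's API: the `Site` class, the record layout, and the JSON-based byte estimate are all hypothetical names and assumptions introduced for illustration. It contrasts pulling every record to a central location with sending a small aggregation function to where the data lives and moving only the partial results.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, float]


def payload_bytes(obj) -> int:
    """Rough size of an object if it were sent over the wire as JSON (assumption)."""
    return len(json.dumps(obj).encode("utf-8"))


@dataclass
class Site:
    """Hypothetical storage location (for example, one cloud region) holding records."""
    name: str
    records: List[Record]

    def export_all(self) -> List[Record]:
        # Data-to-compute: every record leaves the site.
        return self.records

    def run_local(self, fn: Callable[[List[Record]], dict]) -> dict:
        # Compute-to-data: only the function's small result leaves the site.
        return fn(self.records)


def partial_sum(records: List[Record]) -> dict:
    """Partial aggregate that can be combined across sites."""
    return {"total": sum(r["amount"] for r in records), "count": len(records)}


def mean_by_moving_data(sites: List[Site]):
    moved = 0
    rows: List[Record] = []
    for site in sites:
        exported = site.export_all()
        moved += payload_bytes(exported)
        rows.extend(exported)
    return sum(r["amount"] for r in rows) / len(rows), moved


def mean_by_moving_compute(sites: List[Site]):
    moved = 0
    total, count = 0.0, 0
    for site in sites:
        part = site.run_local(partial_sum)
        moved += payload_bytes(part)
        total += part["total"]
        count += part["count"]
    return total / count, moved


sites = [
    Site("region-a", [{"amount": float(i)} for i in range(1000)]),
    Site("region-b", [{"amount": float(i)} for i in range(1000, 3000)]),
]
mean_data, bytes_data = mean_by_moving_data(sites)
mean_compute, bytes_compute = mean_by_moving_compute(sites)
print(mean_data, mean_compute, bytes_data, bytes_compute)
```

Both approaches return the same mean, but the second moves only two small dictionaries instead of 3,000 records. Real systems add query planning, security and egress pricing on top of this, none of which the sketch models.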
It's a multi-cloud world (increasingly deliberately)
Which Flavor of Multi-Cloud?
Q. Has your organization configured any of the following cloud deployments for interoperability for the seamless delivery of a business function?
Source: 451 Research, Voice of the Enterprise: Cloud Transformation, Vendor Evaluations, 2016
Multiple infrastructure environments in the operation of data processing, analytics, business intelligence functions
Base: Respondents with data processing, analytics and business intelligence functions in place; % of respondents (n = 64)
Yes: 52%
No, but we plan to: 9%
No, but we are considering it: 14%
No, and we don't plan to: 25%
Q20. Is your organization currently using multiple infrastructure environments in the operation of data processing, analytics, business intelligence?
Source: 451 Research, Voice of the Enterprise: Cloud, Hosting & Managed Services, Workloads and Key Projects 2018
Reasons for multiple infrastructure environments in the operation of data processing, analytics, business intelligence functions
Base: Respondents with data processing, analytics and business intelligence functions in place; % of respondents (n = 46)
Improving performance/availability: 48%
Optimizing for cost: 39%
Adding new functions to existing deployments: 37%
Migrating between infrastructure environments: 33%
Adding geographic diversity/reducing latency: 24%
Isolating sensitive business data: 24%
Meeting regulatory or data sovereignty requirements: 22%
Other: 11%
None of the above: 4%
Q21. Which, if any, of the following best describe your organization's reasons for using multiple infrastructure environments to operate data processing, analytics, business intelligence?
Source: 451 Research, Voice of the Enterprise: Cloud, Hosting & Managed Services, Workloads and Key Projects 2018
The trouble with data processing in a globally distributed, multi-location environment. In theory: one data warehouse, at the heart of the enterprise [Diagram: ENTERPRISE, EDW]
The trouble with data processing in a globally distributed, multi-location environment. In practice: data marts/departmental data warehouses/data lakes [Diagram: ENTERPRISE, EDW]
The trouble with data processing in a globally distributed, multi-location environment. In theory: cloud [Diagram: ENTERPRISE, EDW]
The trouble with data processing in a globally distributed, multi-location environment. In practice: multiple clouds, multiple database services [Diagram: ENTERPRISE, EDW]
The trouble with data processing in a globally distributed, multi-location environment. Data processing at the edge [Diagram: ENTERPRISE, EDW]
Standardization: uniformity across multiple clouds [Diagram: ENTERPRISE, EDW]
Catalog: index and discovery [Diagram: Data Catalog, ENTERPRISE, EDW]
[Diagram: SENIOR EXECUTIVES, BUSINESS ANALYSTS, DATA SCIENTISTS; ADVANCED ANALYTICS, SELF-SERVICE ANALYTICS; APPLICATIONS, DATA-AS-A-SERVICE, DATA LAKE, PARTNERS, SUPPLIERS; Backup, Disaster]
[Diagram: SENIOR EXECUTIVES, BUSINESS ANALYSTS, DATA SCIENTISTS, DATA ANALYSTS; ADVANCED ANALYTICS, SELF-SERVICE ANALYTICS; SELF-SERVICE DATA PREPARATION: Data cleansing, Data harmonization, Data discovery, Collaboration, Data enrichment, Data matching; APPLICATIONS, DATA-AS-A-SERVICE, DATA LAKE, PARTNERS, SUPPLIERS; DATA STEWARDS, IT; DATA GOVERNANCE: Data catalog, Data security, Data lineage, Data inventory, Data quality, Data pipelines; Backup, Disaster]
[Diagram: SENIOR EXECUTIVES, BUSINESS ANALYSTS, DATA SCIENTISTS, DATA ANALYSTS; ADVANCED ANALYTICS, SELF-SERVICE ANALYTICS; SCALE-OUT ANALYTICS ACCELERATION LAYER; SELF-SERVICE DATA PREPARATION: Data cleansing, Data harmonization, Data discovery, Collaboration, Data enrichment, Data matching; APPLICATIONS, DATA-AS-A-SERVICE, DATA LAKE, PARTNERS, SUPPLIERS; DATA STEWARDS, IT; DATA GOVERNANCE: Data catalog, Data security, Data lineage, Data inventory, Data quality, Data pipelines; Backup, Disaster]
Key Takeaways
• The cloud is becoming the default business environment for new analytics workloads, as more data is produced and stored in the cloud, driven by benefits including agility, efficiency, and modernization.
• The impact will be a considerable shift in the primary environment used for data processing, analytics and BI: from 35% traditional on-premises and 12% public cloud today to only 8% traditional on-premises and 27% public cloud in 2020.
• The future is hybrid. Analytics users and vendors both need to prepare, if they haven't already, for a future of multi-venue analytics deployments, with workloads spread across on-premises non-cloud, on-premises private cloud, IaaS, SaaS, hosted private cloud and hosted non-cloud.
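To make the size of the shift quoted in the takeaways concrete, the arithmetic can be sketched as follows. The four percentages are taken from the takeaway text; the dictionary layout and function names are assumptions made for this illustration. The sketch separates the change in percentage points from the change relative to the starting share, since the two are easy to confuse.

```python
# Share of respondents naming each venue as their primary analytics
# environment, as quoted in the takeaways (percent of respondents).
shares = {
    "traditional on-premises": {"2018": 35, "2020": 8},
    "public cloud": {"2018": 12, "2020": 27},
}


def point_change(before: int, after: int) -> int:
    """Change in percentage points: the simple difference of two shares."""
    return after - before


def relative_change(before: int, after: int) -> float:
    """Change relative to the starting share, as a fraction."""
    return (after - before) / before


summary = {
    venue: {
        "points": point_change(v["2018"], v["2020"]),
        "relative": relative_change(v["2018"], v["2020"]),
    }
    for venue, v in shares.items()
}

for venue, s in summary.items():
    print(f"{venue}: {s['points']:+d} points, {s['relative']:+.0%} relative")
```

Under these figures, traditional on-premises falls by 27 percentage points while public cloud gains 15 points, which is more than a doubling of its starting share.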
Thank You!
matthew.aslett@451research.com
@maslett
www.451research.com
How do you make big data work for self-service analytics in the cloud? Ingest Raw Data Find & Catalog Explore Transform & Improve Self-Service BI / Visualization AWS, Azure, Google, on-prem hybrid big-data platform Enterprise Metadata Repository
How do you make big data work for self-service analytics in the cloud? AWS, Azure, Google, on-prem hybrid big-data platform
How do you make big data work for self-service analytics in the cloud? How do you load all the data into the data lake in the first place? AWS, Azure, Google, on-prem hybrid big-data platform
How do you make big data work for self-service analytics in the cloud? Data Ingestion AWS, Azure, Google, on-prem hybrid big-data platform
By 2019, 60% of IT workloads will run in the cloud.
Drift Hinders Cloud Adoption Success
• Storage/Compute: Architectures are a complex web of services, legacy, cloud and big data platforms
• Structure/Semantics: The structure and semantics of upstream data are poorly governed
• Requirements: The business requires agility and wants data at the speed of need
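The structure/semantics kind of drift described above can be sketched with a simple record check. This is an illustrative, rule-based sketch only and does not represent any vendor's product or API: the `EXPECTED_SCHEMA` contents, field names, and `detect_drift` function are hypothetical. It compares an incoming record against the schema a pipeline was built for and reports missing fields, changed types, and unexpected new fields.

```python
from typing import Any, Dict, List

# Hypothetical schema the pipeline was built for: field name -> Python type.
EXPECTED_SCHEMA: Dict[str, type] = {"order_id": int, "amount": float, "country": str}


def detect_drift(record: Dict[str, Any], schema: Dict[str, type]) -> List[str]:
    """Return descriptions of how `record` departs from `schema`."""
    issues: List[str] = []
    # Fields the pipeline expects: are they present, with the expected type?
    for field, expected in schema.items():
        if field not in record:
            issues.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            actual = type(record[field]).__name__
            issues.append(
                f"type changed: {field} is {actual}, expected {expected.__name__}"
            )
    # Fields the upstream source added without notice.
    for field in record:
        if field not in schema:
            issues.append(f"new field: {field}")
    return issues


ok_record = {"order_id": 1, "amount": 9.99, "country": "US"}
drifted_record = {"order_id": "1", "amount": 9.99, "region": "EMEA"}

ok_issues = detect_drift(ok_record, EXPECTED_SCHEMA)
drift_issues = detect_drift(drifted_record, EXPECTED_SCHEMA)
print(ok_issues)
print(drift_issues)
```

A production pipeline would also need to decide what to do with a drifted record (route it aside, adapt the schema, or alert), which this sketch leaves out.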
StreamSets: The First Ever DataOps Platform
BUILD, EXECUTE, OPERATE, PROTECT
• Data Collector: Cross-Platform Execution
• Control Hub: Cross-Platform Automation & Monitoring
• DPM: Data Availability & Quality SLAs
• Data Collector Edge: Execute at the Edge
• Data Protector: Data Security SLAs
• Dataflow Sensors
How do you make big data work for self-service analytics in the cloud? How do users find, evaluate, understand, and acquire the data that they need to do their jobs? How do you ensure that only the right people get to view "sensitive" data? Data Ingestion AWS, Azure, Google, on-prem hybrid big-data platform
How do you make big data work for self-service analytics in the cloud? Data Catalog Data Ingestion AWS, Azure, Google, on-prem hybrid big-data platform
60% of Big Data Projects Fail
More data, more variety, more users, more complexity...
• Preventing data clutter is hard
• Finding data is hard
• Governing and securing data is hard
Core Problem: Obscure, Inconsistent or Missing Metadata Across Billions of Columns
Data Catalog: A Necessity in Big Data Architectures
Data must be automatically tagged with business terms, with high accuracy, at petabyte scale
AI-Driven Enterprise Data Catalog
Waterline Enterprise Data Catalog
Inventory and Profile; Discover and Tag; Multiple Use Cases with Open Operability
• Cloud, On-premise, & Relational @Petabyte Scale
• AI Driven Automation
• Crowdsourced Curation
Self Service Analytics, Collaboration, Secure Sensitive Data, Migrate & Rationalize
Unifying data discovery across ALL data sources and use cases
How do you make big data work for self-service analytics in the cloud? How do users clean and combine data from disparate data sets into a consistent format for analysis and reporting? Data Catalog Data Ingestion AWS, Azure, Google, on-prem hybrid big-data platform
How do you make big data work for self-service analytics in the cloud? Data Wrangling Data Catalog Data Ingestion AWS, Azure, Google, on-prem hybrid big-data platform
The Problem
"80 percent of data science is cleaning the data and 20 percent is complaining about cleaning the data" (Kaggle founder and CEO Anthony Goldbloom)
[Diagram: ANALYSIS & CONSUMPTION, 80%, DATA PLATFORMS]
Interactive Exploration and Predictive Transformations
• Automated visualization of data quality and distribution to understand content & anomalies
• User interaction with data provides instant suggestions & previews of potential transformations to apply
• Immediate visual feedback and profiling enable real-time validation
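The profiling-then-suggesting loop described in the bullets above can be sketched in a few lines. This is a deliberately simple, rule-based stand-in and not the predictive approach of any wrangling product: the function names, the thresholds, and the sample column are all assumptions for illustration. It profiles a raw column (missing values, distinct values, share of numeric entries) and turns the profile into candidate transformations a user could preview.

```python
from collections import Counter
from typing import Dict, List, Optional


def _is_number(text: str) -> bool:
    """True if the text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def profile_column(values: List[Optional[str]]) -> Dict[str, object]:
    """Summarize a column of raw string values for a quick quality check."""
    present = [v for v in values if v is not None and v.strip() != ""]
    numeric = [v for v in present if _is_number(v)]
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "numeric_share": len(numeric) / len(present) if present else 0.0,
        "top_values": Counter(present).most_common(3),
    }


def suggest_transformations(profile: Dict[str, object]) -> List[str]:
    """Map profile findings to candidate fixes (illustrative rules only)."""
    suggestions: List[str] = []
    if profile["missing"] > 0:
        suggestions.append("fill or drop missing values")
    # Mostly numeric but not entirely: likely a number column with stray text.
    if 0.5 <= profile["numeric_share"] < 1.0:
        suggestions.append("convert to number and flag non-numeric values")
    return suggestions


raw_amounts = ["10.5", "7", "", None, "n/a", "7", "12"]
profile = profile_column(raw_amounts)
suggestions = suggest_transformations(profile)
print(profile)
print(suggestions)
```

In an interactive tool the same profile would be recomputed after each applied step, which is what gives the user immediate feedback on whether a transformation helped.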
Requirement: Platform-Agnostic Data Wrangling On-Prem Data ADLS
How do you make big data work for self-service analytics in the cloud? How can hundreds of users perform ad-hoc analysis and visualize results without having to replicate all that data back to the traditional data warehouse? Data Wrangling Data Catalog Data Ingestion AWS, Azure, Google, on-prem hybrid big-data platform
How do you make big data work for self-service analytics in the cloud? Native BI and Data Visualization Data Wrangling Data Catalog Data Ingestion AWS, Azure, Google, on-prem hybrid big-data platform
Leading Companies are now Choosing Two Enterprise BI Standards
Arcadia Is Native BI Built from the Ground Up for Data Lakes
Self-Service Visual Analytics and BI for Business Users
▪ Intuitive and Visual UI that Anyone Can Use
▪ Accessed via web browser
▪ Easy to compose visuals, dashboards and apps via drag and drop
▪ Get recommendations via machine-assisted insights
Benefits
▪ Unlocks big data analytics for business users and analysts
▪ Agility and reduced time to insight
▪ Business self-sufficiency and reduced burden on IT
Q&A
For other questions or if you would like more information, please contact:
bhagenau@waterlinedata.com
clarke@streamsets.com
bcariou@trifacta.com
steve@arcadiadata.com
www.MakeBigDataWork.org