Introduction to cloud data warehousing with Snowflake Who

  • Slides: 26
Download presentation
Introduction to cloud data warehousing with Snowflake

Introduction to cloud data warehousing with Snowflake

Who am I? Asanka Padmakumara Business Intelligence Consultant • More than 10 years in

Who am I? Asanka Padmakumara Business Intelligence Consultant • More than 10 years in BI and Data Warehousing • Regular speaker at data-related events • Blog: asankap. wordpress. com • Linked In: linkedin. com/in/asankapadmaku mara • Twitter: @asanka_e

What is a Data Warehouse? Central Organized Integrated Read Optimized Reporting/ Analysis SMP/MPP

What is a Data Warehouse? Central Organized Integrated Read Optimized Reporting/ Analysis SMP/MPP

Symmetric Multi-Processing(SMP) • Multiple CPUs share • • OS Memory Hard Disk Network •

Symmetric Multi-Processing(SMP) • Multiple CPUs share • • OS Memory Hard Disk Network • Shared Everything/Shared Disk Architecture • Scale up • High concurrency • Expensive

Massively Parallel Processing (MPP) • • • Multiple processors, own Memory and Hard Disk

Massively Parallel Processing (MPP) • • • Multiple processors, own Memory and Hard Disk Shared Nothing Architecture Distributed processing Scale Out Cheap Low Concurrency

SMP or MPP? SMP MPP • Small to medium data sets • Many small

SMP or MPP? SMP MPP • Small to medium data sets • Many small read/write operations • Multiple row-by-row operations • • Big data, analytical, batch-oriented workloads Complex Queries

Main Cloud based MPP DW Services https: //gigaom. com/report/cloud-data -warehouse-performance-testing/ * Sponsored by Microsoft

Main Cloud based MPP DW Services https: //gigaom. com/report/cloud-data -warehouse-performance-testing/ * Sponsored by Microsoft

Why Cloud Data Warehousing? Cloud On-Premises Scalability High Scalability; Scale-up, Scaleout Time and resource

Why Cloud Data Warehousing? Cloud On-Premises Scalability High Scalability; Scale-up, Scaleout Time and resource consuming Availability High availability Depends on hardware/software quality. Expensive in DR Security Highly secure environments Depend on the competency of internal IT team Performance Depend on selected service, data center and capacity Depend of hardware, can achieve high performance Cost No Capex, Pay as you go, No maintenance cost. Need significant initial investments. Hardware, Team, Training Maintainability No maintenance required Need continuous maintenance, performance enhancements

Snowflake DATA WAREHOUSE AS A SERVICE (DWaa. S) BUILT FROM THE SCRATCH 3 EDITIONS:

Snowflake DATA WAREHOUSE AS A SERVICE (DWaa. S) BUILT FROM THE SCRATCH 3 EDITIONS: STANDARD, ENTERPRISE, BUSINESS CRITICAL AND…VIRTUAL PRIVATE SNOWFLAKE AVAILABLE ON PUBLIC CLOUDS(AZURE, AWS AND GOOGLE CLOUD) ON DEMAND (PAY AS YOU GO) OR CAPACITY BASED PRICING

20 19 On Google Cloud 3400+ customers, 12. 5 B USD 20 18 20

20 19 On Google Cloud 3400+ customers, 12. 5 B USD 20 18 20 12 20 On Amazon 20 On Azure 20 Founded 14 History

Found its place in Gartner Magic Quadrant 2019

Found its place in Gartner Magic Quadrant 2019

Availability

Availability

Architecture • Hybrid of Shared Nothing and Shared Disk Architectures • 3 Layers 1.

Architecture • Hybrid of Shared Nothing and Shared Disk Architectures • 3 Layers 1. 2. 3. Unlimited, cheap storage MPP Compute Clusters (Virtual warehouses) Cloud Service (handles everything for you) • Internal Optimised, highly compressed, columnar data storage • Support ANSI: SQL

Virtual Warehouse • A cluster with up to 128 nodes • Up to 10

Virtual Warehouse • A cluster with up to 128 nodes • Up to 10 Multi-Clusters • Auto Suspend, Auto Resume • Scale up and Scale out X-Small Medium 1 Node 2 Nodes 4 Nodes X-Large 2 X-Large 16 Nodes 32 Nodes Large 8 Nodes 3 X-Large 4 X-Large 64 Nodes 128 Nodes

What makes Snowflake special? • Data warehouse as a Service (Truly!!) • Cloud Platform

What makes Snowflake special? • Data warehouse as a Service (Truly!!) • Cloud Platform agnostic (Amazon, Azure, Google Cloud) • No Maintenance, No Optimisations • Auto-pause, auto-resume • Time Travel • Automated continuous data loading (Snowpipe) • Secure data sharing • Zero Copy Cloning

Loading Data • Bulk Loading • COPY INTO <table> • Load data from stages,

Loading Data • Bulk Loading • COPY INTO <table> • Load data from stages, both internal and external • Delimited, JSON, Avro, ORC, Parquet, XML • Snowpipe • Automated and Triggered Data Loading • Upload using Web Interface • 3 rd Party Tools

Bulk Loading External Internal Staging

Bulk Loading External Internal Staging

Snowpipe • Automated Data Loading using Cloud Messaging • Amazon Web Services (AWS) 1

Snowpipe • Automated Data Loading using Cloud Messaging • Amazon Web Services (AWS) 1 2 • Amazon S 3 • Microsoft Azure • Blob storage • Data Lake Storage Gen 2 • General-purpose v 2 • Triggered Data Loading calling REST API 3

Data Ingestion Tools

Data Ingestion Tools

Micro Partitioning and Clustering C 1 6 1 2 4 1 C 2 1

Micro Partitioning and Clustering C 1 6 1 2 4 1 C 2 1 6 9 2 5 2 C 1 1 C 2 6 2 1 8 5 2 8 1 3 9 5 5 C 1 2 1 5 4 8 2 C 2 8 6 3 2 5 1 MP 1 SELECT * FROM Fact. Sales WHERE C 1=2 MP 2 ALTER TABLE Fact. Sales CLUSTER BY (C 1, C 2) MP 3 C 1 C 2 1 5 1 6 2 1 C 2 2 8 2 9 4 2 C 1 C 2 5 3 6 1 8 5

Collaboration Zero Copy Cloning

Collaboration Zero Copy Cloning

Time Travel and Fail Safe SELECT * FROM Fact. Online. Sales at(timestamp => 'Mon,

Time Travel and Fail Safe SELECT * FROM Fact. Online. Sales at(timestamp => 'Mon, 01 May 2015 16: 20: 00 -0700': : timestamp);

Security • Network Level Security • IP whitelist/backlist • Private Link • User Level

Security • Network Level Security • IP whitelist/backlist • Private Link • User Level Security • • SCIM MFA OAuth SSO • Object Security • Warehouse, Database, User, Schema, Table, etc. • Data Security • Automatic data encryption, Periodic Rekeying, Customer Managed Keys, Data Masking

Things you want to consider ! • No Indexes, no manual partitioning, no manual

Things you want to consider ! • No Indexes, no manual partitioning, no manual distribution • Java Script based Stored Procedures • SPs can’t return a dataset • Unable to back-up to external storages • Privileges does not get combined across roles

It’s demo time!!!

It’s demo time!!!