Introduction to cloud data warehousing with Snowflake Who


























- Slides: 26
Introduction to cloud data warehousing with Snowflake
Who am I? Asanka Padmakumara Business Intelligence Consultant • More than 10 years in BI and Data Warehousing • Regular speaker at data-related events • Blog: asankap. wordpress. com • Linked In: linkedin. com/in/asankapadmaku mara • Twitter: @asanka_e
What is a Data Warehouse? Central Organized Integrated Read Optimized Reporting/ Analysis SMP/MPP
Symmetric Multi-Processing(SMP) • Multiple CPUs share • • OS Memory Hard Disk Network • Shared Everything/Shared Disk Architecture • Scale up • High concurrency • Expensive
Massively Parallel Processing (MPP) • • • Multiple processors, own Memory and Hard Disk Shared Nothing Architecture Distributed processing Scale Out Cheap Low Concurrency
SMP or MPP? SMP MPP • Small to medium data sets • Many small read/write operations • Multiple row-by-row operations • • Big data, analytical, batch-oriented workloads Complex Queries
Main Cloud based MPP DW Services https: //gigaom. com/report/cloud-data -warehouse-performance-testing/ * Sponsored by Microsoft
Why Cloud Data Warehousing? Cloud On-Premises Scalability High Scalability; Scale-up, Scaleout Time and resource consuming Availability High availability Depends on hardware/software quality. Expensive in DR Security Highly secure environments Depend on the competency of internal IT team Performance Depend on selected service, data center and capacity Depend of hardware, can achieve high performance Cost No Capex, Pay as you go, No maintenance cost. Need significant initial investments. Hardware, Team, Training Maintainability No maintenance required Need continuous maintenance, performance enhancements
Snowflake DATA WAREHOUSE AS A SERVICE (DWaa. S) BUILT FROM THE SCRATCH 3 EDITIONS: STANDARD, ENTERPRISE, BUSINESS CRITICAL AND…VIRTUAL PRIVATE SNOWFLAKE AVAILABLE ON PUBLIC CLOUDS(AZURE, AWS AND GOOGLE CLOUD) ON DEMAND (PAY AS YOU GO) OR CAPACITY BASED PRICING
20 19 On Google Cloud 3400+ customers, 12. 5 B USD 20 18 20 12 20 On Amazon 20 On Azure 20 Founded 14 History
Found its place in Gartner Magic Quadrant 2019
Availability
Architecture • Hybrid of Shared Nothing and Shared Disk Architectures • 3 Layers 1. 2. 3. Unlimited, cheap storage MPP Compute Clusters (Virtual warehouses) Cloud Service (handles everything for you) • Internal Optimised, highly compressed, columnar data storage • Support ANSI: SQL
Virtual Warehouse • A cluster with up to 128 nodes • Up to 10 Multi-Clusters • Auto Suspend, Auto Resume • Scale up and Scale out X-Small Medium 1 Node 2 Nodes 4 Nodes X-Large 2 X-Large 16 Nodes 32 Nodes Large 8 Nodes 3 X-Large 4 X-Large 64 Nodes 128 Nodes
What makes Snowflake special? • Data warehouse as a Service (Truly!!) • Cloud Platform agnostic (Amazon, Azure, Google Cloud) • No Maintenance, No Optimisations • Auto-pause, auto-resume • Time Travel • Automated continuous data loading (Snowpipe) • Secure data sharing • Zero Copy Cloning
Loading Data • Bulk Loading • COPY INTO <table> • Load data from stages, both internal and external • Delimited, JSON, Avro, ORC, Parquet, XML • Snowpipe • Automated and Triggered Data Loading • Upload using Web Interface • 3 rd Party Tools
Bulk Loading External Internal Staging
Snowpipe • Automated Data Loading using Cloud Messaging • Amazon Web Services (AWS) 1 2 • Amazon S 3 • Microsoft Azure • Blob storage • Data Lake Storage Gen 2 • General-purpose v 2 • Triggered Data Loading calling REST API 3
Data Ingestion Tools
Micro Partitioning and Clustering C 1 6 1 2 4 1 C 2 1 6 9 2 5 2 C 1 1 C 2 6 2 1 8 5 2 8 1 3 9 5 5 C 1 2 1 5 4 8 2 C 2 8 6 3 2 5 1 MP 1 SELECT * FROM Fact. Sales WHERE C 1=2 MP 2 ALTER TABLE Fact. Sales CLUSTER BY (C 1, C 2) MP 3 C 1 C 2 1 5 1 6 2 1 C 2 2 8 2 9 4 2 C 1 C 2 5 3 6 1 8 5
Collaboration Zero Copy Cloning
Time Travel and Fail Safe SELECT * FROM Fact. Online. Sales at(timestamp => 'Mon, 01 May 2015 16: 20: 00 -0700': : timestamp);
Security • Network Level Security • IP whitelist/backlist • Private Link • User Level Security • • SCIM MFA OAuth SSO • Object Security • Warehouse, Database, User, Schema, Table, etc. • Data Security • Automatic data encryption, Periodic Rekeying, Customer Managed Keys, Data Masking
Things you want to consider ! • No Indexes, no manual partitioning, no manual distribution • Java Script based Stored Procedures • SPs can’t return a dataset • Unable to back-up to external storages • Privileges does not get combined across roles
It’s demo time!!!