Yoav Dobrin - Azure Data Explorer: Not an Ordinary Database
Agenda
Big Data in Modern Business Environment
• Volume axis: Megabytes, Gigabytes, Terabytes, Petabytes
• Data complexity axis: velocity, variety
• ERP/CRM: payables, payroll, inventory, contacts, deal tracking, sales pipeline
• Web 2.0: web logs, digital marketing, search marketing, recommendations, advertising, collaboration, mobile, eCommerce
• Big Data: click streams, sensors/RFID/devices, social sentiment, wikis and blogs, audio/video, log files, spatial and GPS coordinates, data market feeds, eGov feeds, weather, text/image
• Append only, free text data
• More decision makers, new decisions daily
Understand the WHY behind the WHAT
• Why are these visitors abandoning their shopping carts? Understand the WHY behind the WHAT.
• Who are my most dissatisfied users? Understand patterns in my service deeply.
• Why aren't customers clicking on this content? Discover emergent behaviors and trends.
• What is impacting production quality? Discover unexpected patterns that can inspire new ideas.
Modern Data Warehouse
• 70+ connectors
• Cost-effective
• Lightning-fast Spark and complex analytics
• Fastest DW
• Best modern BI
What is Azure Data Explorer
• Any append-only stream of records
• High volume, high velocity, high variance (structured, semi-structured, free-text)
• Purposely built
• Relational query model: filter, aggregate, join, calculated columns, …
• Fully managed PaaS, Vanilla, Database
• Rapid iterations to explore the data
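The relational query model named on this slide (filter, calculated columns, aggregate) can be illustrated with a small plain-Python sketch over an append-only list of records. This is not how Azure Data Explorer executes queries; it only mirrors the shape of a query. The record fields, the `EVENTS` sample, and the `summarize_errors` helper are all hypothetical, and the query in the comment uses an invented table and column names.

```python
from collections import defaultdict

# Hypothetical append-only log records (semi-structured dicts).
EVENTS = [
    {"service": "web", "level": "Error", "duration_ms": 1200},
    {"service": "web", "level": "Info", "duration_ms": 300},
    {"service": "api", "level": "Error", "duration_ms": 800},
    {"service": "web", "level": "Error", "duration_ms": 400},
]

# Roughly the same shape as this query over a hypothetical Events table:
#   Events
#   | where level == "Error"
#   | extend duration_s = duration_ms / 1000.0
#   | summarize count(), avg(duration_s) by service


def summarize_errors(events):
    """Filter, add a calculated column, then aggregate per service."""
    durations_by_service = defaultdict(list)
    for event in events:
        if event["level"] != "Error":              # filter
            continue
        duration_s = event["duration_ms"] / 1000.0  # calculated column
        durations_by_service[event["service"]].append(duration_s)
    # aggregate: count and average per group
    return {
        service: {
            "count": len(durations),
            "avg_duration_s": sum(durations) / len(durations),
        }
        for service, durations in durations_by_service.items()
    }


if __name__ == "__main__":
    for service, stats in summarize_errors(EVENTS).items():
        print(service, stats)
```

Each step consumes the output of the previous one, which is what makes rapid, iterative exploration practical: a filter or an aggregation can be added or removed without restructuring the rest of the query.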
Key Usage Patterns
Multi-temperature data processing paths
• Hot (in-mem cube, stream analytics, …): seconds freshness, days retention; in-mem aggregated data; pre-defined standing queries; split-seconds query performance; data viewing
• Warm (column store, indexing, …): minutes freshness, months retention; raw data; ad-hoc queries; seconds-minutes query perf; data exploration
• Cold (distributed file system, map reduce, …): hours freshness, years retention; raw data; programmatic batch processing; minutes-hours query perf; data manipulation
Demo: Social / GitHub
Data Map (In / Out)
• APIs: .NET SDK, .NET Std SDK, Python SDK, Java SDK, JavaScript
• Ingestion (in): queued ingestion (Azure Blob & Queue, Blob & Event Grid), direct ingestion (bulk, streaming), Event Hub, IoT Hub, Kafka, ADLS Gen 2
• UX (out): Web UI, Desktop App, Jupyter Magic, Azure Notebooks, Monaco IDE, Power BI, Grafana, Tableau, Lens (in roadmap)
• Connectors: Power BI Direct Query, Microsoft Flow, ADF, Spark (in roadmap), Azure Logic App
• Protocols: REST API, MS-TDS
Big Data Interactive Analytics
• Comprehensive strength: metrics and time-series data; text search and text analytics; multidimensional/relational analysis
• Query language: simple and powerful; publicly available
• Data exploration: rich relational query language; full-text search; ML extensibility
• High performance over large data sets: scale out in hardware; scale out across geos; granular resource utilization control; cross-geo queries
• Data ingestion and management: low-latency ingestion; schema management for logs and tables; compression and indexing; retention; hot/cold resource allocation
Our Current Figures
• 46 regions in Azure
• 267 PB total data size
• 3.2K cluster pairs (Engine + DM)
• 7.2 PB data ingested per day
• 27.1K nodes
• 265K databases
• 500K cores
• 12.2B total queries
ADX Architecture
High Level Architecture
• Entry points: Azure portal; ADX Web UI (dataexplorer.azure.com)
• Customer subscription: management and ingestion; query via the "Kusto QL" SDK; query via TDS/SQL
• ADX subscription: ingestion endpoint ingest-<name>.<region>.kusto.windows.net
• Data flows: ingest, export
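The ingestion endpoint pattern on this slide, ingest-<name>.<region>.kusto.windows.net, can be turned into a small helper. The sketch below assumes that the query endpoint is the same host without the "ingest-" prefix and that both are reached over HTTPS; neither detail is stated on the slide. The function name and the sample cluster name and region are hypothetical.

```python
def cluster_endpoints(name: str, region: str) -> dict:
    """Build query and ingestion URIs for a cluster.

    Only the ingestion host pattern comes from the slide. The query
    host (no "ingest-" prefix) and the https scheme are assumptions.
    """
    if not name or not region:
        raise ValueError("name and region must be non-empty")
    host = f"{name}.{region}.kusto.windows.net"
    return {
        "query": f"https://{host}",
        "ingest": f"https://ingest-{host}",
    }


if __name__ == "__main__":
    # "mycluster" and "westeurope" are placeholder values.
    endpoints = cluster_endpoints("mycluster", "westeurope")
    print(endpoints["query"])
    print(endpoints["ingest"])
```

Keeping the two URIs separate in client configuration reflects the split shown in the diagram between the query path and the management and ingestion path.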
Enabling Pillars
Technologies
• Column store
• Sharded
• Indexed
• "Log" data (append only)
• Query language
Storage
• Data is stored in Blob Storage
• Proprietary format; ingest and query performance critically rely on this
• Data must be ingested prior to query
• Data is heavily cached, in RAM and on SSD (local VM or Managed Disk)
• The amount of "hot" SSD cache is usually the important factor in scale-out, and the driving factor of cost
Capabilities Takeaways
• Build powerful, interactive analytics solutions: used by most Microsoft products
• Quickly discover insights from large volumes of data: query 1B records in <1 second
• Get started and scale to terabytes of data in minutes: ingest 200 MB of data per second per node, with linear scaling
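The ingestion figure above (200 MB per second per node, scaling linearly) lends itself to a back-of-the-envelope sizing calculation. The sketch below is arithmetic only: it assumes decimal units (1 TB = 1,000,000 MB), perfectly sustained throughput, and no headroom for queries or peaks, so a real cluster would need more capacity. The function name is hypothetical.

```python
import math

SECONDS_PER_DAY = 86_400
MB_PER_TB = 1_000_000  # decimal units, an assumption


def nodes_for_daily_volume(tb_per_day: float,
                           mb_per_sec_per_node: float = 200.0) -> int:
    """Estimate nodes needed to ingest a daily volume at the quoted rate.

    Assumes linear scaling and sustained throughput with no headroom.
    """
    if tb_per_day <= 0:
        raise ValueError("tb_per_day must be positive")
    mb_per_day_per_node = mb_per_sec_per_node * SECONDS_PER_DAY
    required_mb = tb_per_day * MB_PER_TB
    return max(1, math.ceil(required_mb / mb_per_day_per_node))


if __name__ == "__main__":
    # One node at 200 MB/s sustains 200 * 86,400 MB = 17.28 TB per day.
    for volume in (1, 35, 100):
        print(volume, "TB/day ->", nodes_for_daily_volume(volume), "node(s)")
```

At the quoted rate a single node covers about 17.28 TB per day, so ingestion volume alone is rarely what sets node count; as the storage slide notes, the amount of hot SSD cache is usually the more important factor.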
Feedback, Please - ﻣﻦ ﻓﻀﻠﻚ - בבקשה
• Session's feedback form: https://www.sqlsaturday.com/823/Sessions/SessionEvaluation.aspx
• Event's feedback form: https://www.sqlsaturday.com/823/EventEval.aspx
Product links
• Product Page: http://aka.ms/AzureDataExplorer
• Announcement: https://azure.microsoft.com/en-in/blog/introducing-azure-data-explorer/
• Whitepaper: https://azure.microsoft.com/en-us/resources/azure-data-explorer/en-us/
• 101 blog: https://azure.microsoft.com/en-us/blog/azure-data-explorer-technology-101/
• Scott Guthrie's Keynote (ADX Announcement/Demo)
• Rohan Kumar's Keynote (ADX Announcement/Demo)
• Scott Guthrie at Techorama (Scott's demo)
• Docs: https://docs.microsoft.com/en-us/azure/data-explorer
• Pricing Page: https://azure.microsoft.com/en-us/pricing/details/data-explorer/
• Cost Estimator: http://aka.ms/ADX.CostEstimator
• Course: https://www.pluralsight.com/courses/kusto-query-language-kql-from-scratch