Inside SQL Server Polybase Bob Ward Principal Architect

Inside SQL Server Polybase Bob Ward, Principal Architect, Microsoft https: //aka. ms/bobsql https: //aka. ms/bobwardms https: //aka. ms/bobsqldemos


“It’s all about Data Virtualization” ü Distributed compute engine integrated with SQL Server ü Query data where it lives using T-SQL ü Distributed, scalable query performance Analytics T-SQL Apps ü Manual/deploy with SQL Server ü Auto deploy/optimize with Big Data Clusters SQL Server Poly. Base external tables ODBC Intelligence over all data No. SQL Relational databases Big data

Polybase comes to PDW 2012 MSFT Jim Gray Labs Project to query “big data” with SQL in 2011 Polybase ships with SQL Server 2016 Polybase “classic” Microsoft acquires Metanautix in 2015 bringing new connectors Project Aris commences in 2017 to take Polybase to the next level SQL Server 2019 includes Polybase classic, new ODBC data sources, and Big Data Clusters (BDC)

SQL Setup and configure Polybase Setup authentication Create EXTERNAL DATA SOURCE Create EXTERNAL TABLE Create statistics on key columns Query like any other table Create EXTERNAL FILE FORMAT

SQL Engine Polybase Engine Polybase Data Movement Service mpdwsvc. exe


Polybase “Classic” Connect to Hadoop: Cloudera or Horton. Works on Windows and Linux Azure Blob Storage Direct Steaming or Java Map. Reduce SQL Server or Azure SQL Data Warehouse Polybase and HDFS in Big Data Clusters Direct Access to HDFS via SQL Server Engine Hadoop cluster pre-installed with HDFS and Spark HDFS metadata handled within the cluster

No client software install required

Use EXTERNAL table to map common names familiar to your SQL Server Exclude fields or columns from data source that are not needed (Planned) Use SQL Server views to abstract joins and data source access Use UNION to join similar data from data sources and local SQL Server

External Tables Linked Servers Database Scoped Instance Scoped ODBC Drivers OLEDB Providers Read-only* Read/write Scale out queries with push-down Single threaded queries with push-down Failover with AG Requires separate config from AG Basic authentication Basic and integrated authentication Distributed Transactions not supported Distributed Transactions supported

SQL




mpdwsvc. exe uses SQLPAL Control Plane Controller Svc Azure FSM Engine Configuration Store (SQL Server) Kibana Grafana Elastic Search Influx. DB Custom apps BI Polybase compute nodes In Linux containers Compute pool SQL Compute Node “Built-in” Data Sources Compute pool SQL Compute Node Data mart Storage Node SQL Server master instance SQL Compute Node … SQL Compute Node Directly read from HDFS Storage pool SQL Data Node Spark SQL Server HDFS Data Node Io. T data Polybase head node In Linux container Compute pool SQL Compute Node SQL Data Node SQL Analytics Storage Spark SQL Server HDFS Data Node … Spark SQL Server HDFS Data Node Map. Reduce Not used Kubernetes pod Node Persistent storage Node


PROSE Code generation in Notebooks Import wizards External Table detection for HDFS

Download EAP

Questions? aka. ms/SQLBits 19

https: //aka. ms/bobsql https: //aka. ms/bobwardms https: //aka. ms/bobsqldemos documentation Git. Hub documentation Loading data into Azure SQL Data Warehouse with Polybase

- Slides: 22