SQL Server 2016 Poly Base Sean Werick Principal
SQL Server 2016 Poly. Base
Sean Werick Principal Consultant 502. 320. 2918 swerick@pragmaticworks. com
SQL Server 2016 - Industry leading TCO R SERVER IN-DB ADVANCED ANALYTICS MOBILE BI INTELLIGENCE & SELF-SERVICE BI BUSINESS $320 K SQL Server 2016 Everything built-in #1 TPC-H—DW DATA WAREHOUSING BUILT-IN ETL $1, 272, 000 $3, 745, 000 $3, 433, 000 $640, 000 $803, 000 11. 7 x more INDUSTRY OLTP LEADER—OLTP Built-in with SQL Server vs. expensive add-ons with Oracle In-memory built-in End-to-end security built-in Advanced Analytics built-in Complete mobile BI built-in
SQL Server 2016: Everything built-in Industry leader #1 performance 6 years in a row A fraction of the cost least vulnerable 80 $2, 230 SQL Server 60 SQL Server 49 50 43 34 40 SQL Server 29 30 2220 6 0 4 1 2010 SQL Server 2011 0 2012 Oracle 0 2013 My. SQL 5 $480 22 18 15 20 0 R + in-memory 69 70 10 built-in 3 2014 3 #1 #2 #3 Oracle is #5 2015 SAP HANA $120 Microsoft TPC-H Tableau Oracle Self-service BI per user at massive scale In-memory across all workloads Consistent experience from on-premises to cloud The above graphics were published by Gartner, Inc. as part of a larger research document and should be evaluated in the context of the entire document. The Gartner document is available upon request from Microsoft. Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner's research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose. National Institute of Standards and Technology Comprehensive Vulnerability Database update 10/2015 TPC-H 10 TB non-clustered results as of 04/06/15, 5/04/15, 4/15/14 and 11/25/13, respectively. http: //www. tpc. org/tpch/results/tpch_perf_results. asp? resulttype=noncluster 8
In-memory enhancements Operational analytics & enhanced performance Capability In-memory SQL Server Column. Store data warehouse 2 -24 hrs Benefits ETL Fraud detected Real-time fraud detection 010010110 In-memory SQL Server OLTP Mission critical performance Æ Unlike competition, you gain operational analytics & 30 x faster transactions & 100 x queries Æ In-memory for more of your applications
Built-in advanced analytics In-database analytics at massive scale Example Solutions Extensibility • Sales forecasting R Integration • Warehouse efficiency • Predictive maintenance R ? New R scripts • Credit risk protection 0100100 010101 Analytic Library 0100100 010101 Data Scientist Interact directly with data 0100100 010101 T-SQL Interface Relational Data Built-in to SQL Server Deeper insights across data Developer/DBA Manage data and analytics together 0100100 010101 Microsoft Azure Marketplace
R integration and advanced analytics Capability SQL Server Data Scientists Analytics library Publish algorithms, interact directly with data Share and collaborate Manage and deploy Analytical engines Full R integration Fully extensible DBAs R Data Management Layer Relational data T-SQL interface Stream data in-memory Advanced analytics + Manage storage and analytics together Business Analysts Analysis through TSQL, tools, and vetted algorithms Extensible in-database analytics, integrated with R, exposed through T-SQL Centralize enterprise library for analytic models Benefits No data movement, resulting in faster time to insights Real-time analytics on transactional data Integrate with existing workflows Unified governance across analytics and storage
Always Encrypted Help protect data at rest and in motion, on-premises & cloud Apps SQL Server Trusted Client side SELECT Name FROM Patients WHERE SSN=@SSN='198 -33 -0987' Column Master Key Result Set Name Jim Gray Mission critical performance Query SELECT Name FROM Patients WHERE SSN=@SSN=0 x 7 ff 654 ae 6 d Enhanced ADO. NET Library Result Set Column Encryption Key Name Jim Gray ciphertext dbo. Patients Name SSN Country Jane Doe 243 -24 -9812 USA 1 x 7 fg 655 se 2 e Jim Gray 198 -33 -0987 0 x 7 ff 654 ae 6 d USA John Smith 123 -82 -1095 0 y 8 fj 754 ea 2 c USA
RLS in three steps Two App user (e. g. , nurse) selects from Patients table Security Policy transparently rewrites query to apply filter predicate Nurse Database Three Security Policy Patients Application SELECT * FROM Patients Policy Manager Filter Predicate: INNER JOIN… CREATE FUNCTION dbo. fn_securitypredicate(@wing int) RETURNS TABLE WITH SCHEMABINDING AS return SELECT 1 as [fn_securitypredicate_result] FROM SELECT * FROM Patients Staff. Duties d INNER JOIN Employees e SEMIJOIN APPLY dbo. fn_securitypredicate(patients. Wing); ON (d. Emp. Id = e. Emp. Id) WHERE e. User. SID = SUSER_SID() AND @wing = d. Wing; SELECT Patients. * FROM Patients, CREATE SECURITY POLICY dbo. Sec. Pol Staff. Duties d INNER JOIN Employees e ON (d. Emp. Id = e. Emp. Id) ADD FILTER PREDICATE dbo. fn_securitypredicate(Wing) ON Patients WHERE e. User. SID = SUSER_SID() AND Patients. wing = d. Wing; WITH (STATE = ON) Security
Dynamic data masking walkthrough 2) Application user selects from Employee table 1) Dynamic Securitydata officer defines dynamic data masking policydata in T-SQL sensitive 3) masking policy obfuscates the sensitive in theover query results data in Employee table ALTER TABLE [Employee] ALTER COLUMN [Social. Security. Number] ADD MASKED WITH (FUNCTION = ‘SSN()’) ALTER TABLE [Employee] ALTER COLUMN [Email] ADD MASKED WITH (FUNCTION = ‘EMAIL()’) ALTER TABLE [Employee] ALTER COLUMN [Salary] ADD MASKED WITH (FUNCTION = ‘RANDOM(1, 20000)’) GRANT UNMASK to admin 1 SELECT [Name], [Social. Security. Number], [Email], [Salary] FROM [Employee] Security
Monitoring performance by using the Query Store Capability Query Store helps customers quickly find and fix query performance issues Query Store is a ‘flight data recorder’ for database workloads Benefits Greatly simplifies query performance troubleshooting Provides performance stability across SQL Server upgrades Allows deeper insight into workload performance Performance
Stretch SQL Server into Azure Stretch warm and cold tables to Azure with remote query processing Capability Microsoft Azure Jim Gray ox 7 ff 654 ae 6 d 3/18/2005 Order history Stretch to cloud Name SSN Jane Doe Date Customer data cm 61 ba 906 fd 2/28/2005 Jim Gray ox 7 ff 654 ae 6 d 3/18/2005 Product data John Smith Bill Brown i 2 y 36 cg 776 rg 4/10/2005 Order History nx 290 pldo 90 l 4/27/2005 Sue Daniels ypo 85 ba 616 rj Sarah Jones bns 51 ra 806 fd Hyperscale cloud Jake Marks mci 12 hh 906 fj 5/12/2005 5/22/2005 6/07/2005 Stretch cold database tables from on-premises SQL Server Databases to Azure with remote query processing Benefits Æ Cost effective historical data Æ Entire table is online and remains queryable from on-premises apps Æ Transparent to applications Æ Supports Always Encrypted & Row Level Security Query SQL Server App
New Portal
Updated Report Features
New KPI Features
Mobile Reporting
Brand the Portal
SQL Server 2016 Features So many great features, but which are included with which version? https: //www. microsoft. com/en-us/server-cloud/products/sql-server-editions/
Deeper Insights Across Data with Poly. Base
What is Poly. Base
What is Poly. Base
Poly. Base Query relational and non-relational data with T-SQL Capability Query relational and non-relational data, on-premises and in Azure T-SQL for querying relational and non-relational data across SQL Server and Hadoop Benefits T-SQL query New business insights across your data lake SQL Server Apps Access any data Hadoop Leverage existing skill sets and BI tools Faster time to insights and simplified ETL process
Poly. Base Query relational and non-relational data with T-SQL Capability Quote: ************ T-SQL query *********************** SQL Server Name DOB State Jim Gray 11/13/58 WA Ann Smith 04/29/76 ME Deeper insights across data Hadoop $658. 39 T-SQL for querying relational and non-relational data across SQL Server and Hadoop Benefits Æ New business insights across your data lake Æ Leverage existing skillsets and BI tools Æ Faster time to insights and simplified ETL process
Poly. Base Can…
Poly. Base Performance
Polybase scale-out groups
Poly. Base Requirements SQL Server (64 -bit) Java SE downloads Enable or Disable a Server Network Protocol
Setting up Poly. Base 1. Install Poly. Base a) Poly. Base Data Movement Service b) Poly. Base Engine 2. Configure SQL Server and enable the option 3. Configure Pushdown 4. Create external data source 5. Create external file format 6. Create Hadoop user 7. Create external table
Poly. Base Configuration First, is it enabled? SELECT SERVERPROPERTY ('Is. Polybase. Installed') -- 5 denotes the connection type EXEC sp_configure 'hadoop connectivity', 5; RECONFIGURE; Option 0: Disable Hadoop connectivity Option 1: Hortonworks HDP 1. 3 on Windows Server Option 1: Azure blob storage (WASB[S]) Option 2: Hortonworks HDP 1. 3 on Linux Option 3: Cloudera CDH 4. 3 on Linux Option 4: Hortonworks HDP 2. 0 on Windows Server Option 4: Azure blob storage (WASB[S]) Option 5: Hortonworks HDP 2. 0 on Linux Option 6: Cloudera 5. 1, 5. 2, 5. 3, 5. 4, and 5. 5 on Linux Option 7: Hortonworks 2. 1, 2. 2, and 2. 3 on Windows Server Option 7: Azure blob storage (WASB[S])
Poly. Base Configuration Restart: • SQL Server • Poly. Base Data Movement Service • Poly. Base Engine
Poly. Base Configuration
Poly. Base Configuration EXEC sp_polybase_join_group 'PQTH 4 A-CMP 01', 16450, 'MSSQLSERVER';
Poly. Base Configuration Create Scoped Credential USE [Adventureworks. DW] GO -- 2: Create a database scoped credential for Kerberos-secured Hadoop clusters. -- IDENTITY: the user name -- SECRET: the password CREATE DATABASE SCOPED CREDENTIAL HDPUser WITH IDENTITY = 'hue', Secret = ''; GO
Poly. Base Configuration Create External Data Source USE [Adventureworks. DW] GO CREATE EXTERNAL DATA SOURCE [HDP 2] WITH (TYPE = HADOOP, LOCATION = N'hdfs: //pwpchadoop. cloudapp. net: 8020', CREDENTIAL = HDPUser); GO
Poly. Base Configuration CREATE EXTERNAL FILE FORMAT TSV WITH ( FORMAT_TYPE = DELIMITEDTEXT, FORMAT_OPTIONS ( FIELD_TERMINATOR = 't', DATE_FORMAT = 'MM/dd/yyyy' ) )
Poly. Base Configuration CREATE EXTERNAL TABLE HDP_Fact. Internet. Sales ([Product. Key] [int], [Order. Date. Key] [int], [Due. Date. Key] [int], [Ship. Date. Key] [int], [Customer. Key] [int], …) WITH (LOCATION = '/apps/hive/warehouse/factinternetsales', DATA_SOURCE = HDP 2, FILE_FORMAT = TSV, REJECT_TYPE = value, REJECT_VALUE=0)
Using Poly. Base SELECT Insured_Customers. First. Name, Insured_Customers. Last. Name, Insured_Customers. Yearly. Income, Insured_Customers. Marital. Status INTO Fast_Customers from Insured_Customers INNER JOIN (SELECT * FROM Car. Sensor_Data where Speed > 35) AS Sensor. D ON Insured_Customers. Customer. Key = Sensor. D. Customer. Key
Using Poly. Base -- Enable INSERT into external table sp_configure 'allow polybase export', 1; Reconfigure; -- Create an external table. CREATE EXTERNAL TABLE [dbo]. [Fast. Customers 2009] ( [First. Name] char(25) NOT NULL, [Last. Name] char(25) NOT NULL, [Yearly. Income] float NULL, [Marital. Status] char(1) NOT NULL) WITH (LOCATION='/old_data/2009/customerdata. tbl', DATA_SOURCE = Hadoop. HDP 2, FILE_FORMAT = Text. File. Format, REJECT_TYPE = VALUE, REJECT_VALUE = 0);
Using Poly. Base SELECT DISTINCT Insured_Customers. First. Name, Insured_Customers. Last. Name, Insured_Customers. Yearly. Income, Car. Sensor_Data. Speed FROM Insured_Customers, Car. Sensor_Data WHERE Insured_Customers. Customer. Key = Car. Sensor_Data. Customer. Key and Car. Sensor_Data. Speed > 35 ORDER BY Car. Sensor_Data. Speed DESC OPTION (FORCE EXTERNALPUSHDOWN); -- or OPTION (DISABLE EXTERNALPUSHDOWN)
Using Poly. Base SELECT customer. name, customer. zip_code FROM customer WHERE customer. account_balance < 200000
Using Poly. Base
Poly. Base Troubleshooting -- Find the longest running query SELECT execution_id, st. text, dr. total_elapsed_time FROM sys. dm_exec_distributed_requests dr cross apply sys. dm_exec_sql_text(sql_handle) st ORDER BY total_elapsed_time DESC; -- Find the longest running step of the distributed query plan SELECT execution_id, step_index, operation_type, distribution_type, location_type, status, total_elapsed_time, command FROM sys. dm_exec_distributed_request_steps WHERE execution_id = 'QID 4547' ORDER BY total_elapsed_time DESC;
Poly. Base Troubleshooting -- Find the execution progress of SQL step SELECT execution_id, step_index, distribution_id, status, total_elapsed_time, row_count, command FROM sys. dm_exec_distributed_sql_requests WHERE execution_id = 'QID 4547' and step_index = 1;
Poly. Base Troubleshooting SELECT execution_id, step_index, dms_step_index, compute_node_id, type, input_name, length, total_elapsed_time, status FROM sys. dm_exec_external_work WHERE execution_id = 'QID 4547' and step_index = 7 ORDER BY total_elapsed_time DESC;
Poly. Base Troubleshooting
Start Playing Developer Edition https: //www. microsoft. com/en-us/server-cloud/products/sql-server-editions/sql-server-developer. aspx SQL Server Data Tools For Visual Studio 2015 https: //msdn. microsoft. com/en-us/library/mt 204009. aspx
Pragmatic Works Offers a Variety of Services to Help You with SQL Server - Architectural Design Sessions - Detailed Assessments and Roadmaps - Migrations and Upgrades Ask us about test-driven migrations and upgrades with Legitest Sean Werick swerick@pragmaticworks. com
- Slides: 50