Smith Matt F matthew smithdell com SQL Server
- Slides: 31
Smith, Matt F | matthew. smith@dell. com SQL Server Poly. Base and Dell EMC Isilon storage
A little bit about me. • I work for the Dell EMC Big Data/Io. T Solution Engineering Team as a Solutions Architect (consulting & delivery) • Started with SQL Server 7 back in the Dot-com days doing application development: ASP 3. 0, VB 6 and SQL Server. • Lots of work with SQL Server over the years: Dev, DBA, BI • Moved into the Big Data space a few years ago: Pivotal, Hortonworks, Cloudera.
Why I chose this topic. • • Personal interest in creating a performant entry-level Big Data solution for small to mid-sized Hadoop File System data sets (1 -5 TB). Reduce the common barriers of entry to Big Data: Training and experience, financial investment, infrastructure. Introduce Big Data into organizations through known & trusted products, leveraging existing skill sets. Realize value in helping DBA’s, developers and organizations start their big data journeys through low-risk pilot projects.
SQL Server Poly. Base – a quick overview • Introduced and initially a component of SQL Server 2012 Parallel Data Warehouse (PDW), now known as Analytics Platform System (APS) • Added to the SQL Server Product Family in 2016 • Allows you to access semi-structured data (Schema on Read) located in an HDFS compliant file-storage through SQL Server • Connect to Dell EMC Isilon, Azure blob storage, Hadoop or Cloudera
What is Isilon? • Scale-out, Multi-protocol Network Attached Storage • Write with NFS, SMB 3, FTP, HTTP and read immediately with another protocol • 7000+ customers, scales from 15 TB to 68 PB • One. FS file system – Isilon OS • Each node includes processors, ram, network and disk. Compute on every node. Network: 10 Gb. E, 40 Gb. E. • Wide selection of disk choices for nodes allow for tiering of hot data (Flash), Hybrid and Archive storage. Automate data movement across storage tiers • Active Directory & Kerberos integration
Integrated Isilon and Poly. Base NFS Web Click data Poly. Base Cluster Step 2: Jobs are run Decision Support Databases name node EDW Step 1: Much or all of the Data lives on the Isilon/Hadoop Cluster Head Compute data node OLAP SMB, NFS, HTTP, FTP, HDFS node info name node Isilon
POC Architecture – MS SQL with Isilon HDFS Clickstream Isilon services as HDFS and Name-Nodes Enables Microsoft T-SQL queries in the HADOOP environment Parallel operation on HADOOP and MS SQL in the same database DSS Sensor MS SQL Integration 1. Direct queries from SQL to Isilon HDFS 2. External Pushdown OLAP EDW MS SQL 2016 Enterprise Edition Poly. Base Scale. Out Group Very efficient methods (SMB, FTP, NFS, HDFS) for data import by DELL EMC ISILON Multiple Hadoop applications can even simultaneously access the same dataset (Isilon) at the same time
Configuring Isilon for Poly. Base POC Steps 1. Create an Access Zone 2. Configure HDFS Service 3. Configure Network 4. Create pdw_user 5. Add Active Directory NS record & test
Configuring Isilon – POC • • Verify Licenses Create an Access Zone
Configuring Isilon – POC Configure HDFS
Configuring Isilon – POC Configure Smart. Connect
Configuring Isilon – POC Create pdw_user
Configuring Isilon – POC Add DNS entry, test.
Configuring Isilon - Resources EMC Isilon Best Practices Guide for Hadoop Data Storage - Dell EMC https: //www. emc. com/collateral/. . . /h 12877 -wp-emcisilon-hadoop-best-practices. pdf One. FS with HDFS Reference Guide - Dell EMC https: //www. emc. com/collateral/Technical. Document/doc u 84284. pdf
Install & Enable Poly. Base on SQL Server Install the Oracle Java SE Runtime Environment (JRE) 7. 51 (x 64) or 8. Do not install JRE 9!
Poly. Base Data Sources • • Data sources include Isilon, Azure, Hortonworks and Cloudera Add Multiple Data Sources (pending support) • Hortonworks HDP 1. 3, 2. 0, 2. 1, 2. 2 • Cloudera CDH 4. 3, 5. 1
Poly. Base External File Format • • • File formats are required You create and define your own File Formats Supported File Formats include: • • • Text (delimited) Hive ORC – Optimized Row Columnar Hive RCFile - Record Columnar (key-value)
Poly. Base External Tables • • External Tables are schema on read You define external table columns
Table Statistics Add statistics to optimize query performance
Query Testing. . . Used Adam Machanic’s sp_Who. Is. Active so I could get a better idea of what was going on. Observations… • Lots of Temp Tables! • Some data moving back and forth • Things seemed to take a long time • Data came back!
Insert into / Select from worked best. Insert into / Select From to Load the data you want to work with locally • Create indexes (Columnstore or other) and then join to local tables for best performance • Joining from local tables to external tables just didn’t work well • Nightly ETL processes to move relevant data from HDFS External Tables into SQL Server makes sense Opportunities for Improvement • Add additional Scale-out compute nodes (I plan to test SQL Express!)…also Containers & Powershell provisioning • Temp. DB on flash or SSD
DMV’s for troubleshooting and analysis Many DMV’s exist. The link at the bottom of this slide includes a great process on how to use them. https: //msdn. microsoft. com/library/ce 9078 b 7 -a 750 -4 f 47 -b 23 e-90 b 83 b 783 d 80
Planning for Growth Data lakes tend to fill up.
HDFS: Standard Hadoop Cluster NFS reply Decision Support Databases HTTP CIFS FTP OLAP EDW Compute Data Name node Node node Web Click data NFS Step 1: Data is copied into the Landing Zone file node copy 2 copy 3 info Landing Zone Servers node file info HDFS Step 2: Data is copied into the Cluster (3 times) node file info copy 3 copy 2 3 X node file info copy 2 copy 3 MAP Reduce MAP Reduce MAP Reduce Step 3: Hadoop Jobs are run
The EMC Isilon Advantage for Analytics HDFS 1 Scale-Out Storage Platform – Multiple applications & workflows 2 No Single Point of Failure – Distributed Name. Node 3 End-to-End Data Protection – Snapshot. IQ, Sync. IQ, NDMP Backup 4 Industry-Leading Storage Efficiency – >80% Storage Utilization 5 Independent Scalability – Add compute & storage separately 6 Multi-Protocol – Industry standard protocols – NFS, CIFS, FTP, HTTP, HDFS
Hadoop Architecture with Isilon R (RHIPE) PIG Compute Node Mahout Job Tracker Compute Node Hive HBase Name. Node Task Tracker Compute Node name node Compute Node name node data node Ethernet Data. Node
Can you really do big data with SQL Server 2016+ ? • • Yes you can, to a point. Poly. Base and a Hadoop Distributed File System (HDFS) allows you to get started with Big Data Many organizations use Dell EMC Isilon storage, which supports HDFS This is a reasonable, workable solution, tested with 2. 5 TB of unstructured marketing analytics (clickstream) data
Your Next Step: Big data pilot project • • Acquire some space on your organization’s Isilon cluster. Alternatively, you could look into Azure Blob Storage and Azure Compute (SQL Server & Windows Server). Load your data sets into your newly acquired storage Install SQL Server 2016+ (Head) and one or two Compute Nodes Create your External Data Source, External File Format, External Tables and start experimenting
https: //www. emc. com/products-solutions/trial-software-download/isilon. htm
Special Thanks to: • • • Dell EMC Big Data team - @Dell. EMCbigdata Christian Scharrer - Dell EMC Senior Systems Engineer Rob Sonders – Dell EMC Microsoft Specialist – SQL Server: @Robert. Sonders Michael Wells - Dell EMC Senior Systems Engineer: @Sql. Tech. Mike Dell EMC - Denver, CO office - Isilon Engineering Team: @Dell. EMCStorage
Thank You Sponsors Platinum Gold Silver Bronze Swag Venue
- Cơm
- Bài thơ mẹ đi làm từ sáng sớm
- Difference between oracle and pl sql
- Oracle sql developer real time sql monitoring
- Sql server 2005 management studio
- Sql server roadmap
- Logo sql sorgu örnekleri
- Sql server 2016 management studio
- Sql server parallel data warehouse
- Ssis-314
- Sql server 2017 windows 7
- Ms sql server architecture
- Sql server create database
- Windows awe
- Azure stretch database
- Sp_cursorprepexec
- Self service bi
- Sql server 101
- Sql server security best practice
- Visio sql server
- Sql server unlock table
- Sql server unstructured data
- Sql server 101
- Sql server 2000 sp
- Advanced stored procedure examples in sql server
- Microsoft sql server 2005 analysis services
- Inside the sql server query optimizer
- Sql server 2016 security features
- Sql server excessive grant
- Sql server 2019 bdc deployment
- Storage engine in sql server
- Sql server crash dump analysis