Yossi Elkayam Sr BI Azure Architect Microsoft Services
Yossi Elkayam Sr. BI & Azure Architect Microsoft Services yelkayam@microsoft. com
Age of data ~50 Zettabytes
BI Database http: //db-engines. com/en/ranking
Cloud Relational Beyond Relational Azure Data Lake SQL Server Azure VM HDInsight Azure SQL DB Document. DB Azure SQL DW Microsoft Data Platform Power BI Azure Machine Learning Azure Data Factory On-premises Federated Query SQL Server APS Comprehensive Connected SQL Server Choice
Cortana Analytics
Azure Internet of Things (Io. T) Suite & Dynamics Devices RTOS, Linux, Windows, Android, i. OS Batch Analytics & Visualizations Azure HDInsight, Azure. ML, Power BI, Azure Data Factory Protocol Adaptation Hot Path Analytics Field Gateway Azure Stream Analytics, Azure HDInsight Storm Protocol Adaptation Presentation & Business Connectivity App Service, Websites Cloud Gateway Field Gateway Event Hubs & Io. T Hub Device Connectivity & Management Hot Path Business Logic Service Fabric & Actor Framework Dynamics, Biz. Talk Services, Notification Hubs Analytics & Operationalized Insights Presentation & Business Connectivity
7 Streaming Analytics Event producers Collection Event Queuing System Transformation Azure ML Long-term storage Live Dashboards Web/thick client dashboards Applications Devices Cloud gateways (web APIs) Storm HDInsight Event hubs Stream Analytics Sensors Kafka/Rabbit. MQ/ Active. MQ Web and social Field gateways Storage adapters Stream processing Presentation and action Apache HBase on HDInsight Document. DB Solr Azure Search Mongo. DB SQL Search and query Data analytics (Excel) Event hub Devices to take action
Io. T Scenario - Connected Cars / Devices Document. DB Document Store HBase SQL Azure No SQL Store Relational Store Event Hubs Queue Service Power. BI Get Data Cloud gateways Apache Storm Get Reference Data Business Logic Store Raw Data Store Reporting Data Queue Service Event Hubs Live Dashboard
CLOUD MICROSOFT DATAPLATFORM MICROSOFT DATA VISUALIZATION Analyze and Authoring ON PREMISES SQL Server in Azure VM AG -Async replica SQL Database + Elastic Scale LRS – Geo-Replication BI and Advanced Analytics Database Engine Document. DB Buffer Pool Ext. Query Store Stretch. DB Columnstore Row Store Resource Governor Transactional Replication In-Memory Row level security, Transparent Data Encryption, Always Encrypted, Data Masking, Auditing, Compliance Analysis Services Tabular, Multi-Dimensional Power. Pivot, BISM, Data Mining, KPI, BISM Reporting Services SQL Agent, Database Mail, Linked Servers, Managed Backup, Backup to Azure Analytical Reports Microsoft Excel Power BI Desktop Mobile Reports SQL Server Database Engine Relational, XML, JSON, Spatial, Full. Text, Binary, Image, File. Table, Filestream Azure Marketplace Native, Share. Point Integrated Table Storage Redis Cache Blob Storage Azure HDInsight Map Reduce, Pig, Hive, Hbase, Storm, Spark Mobile Report Publisher Paginated Reports Report Builder Report Designer Delivery MPP Azure SQL Data Warehouse Azure Data Lake Store R Services Cloud Power BI Service On-Premises Information Management & Data Orchestration Data Quality Services Master Data Services Integration Services HA/DR Always. On Replication Log Shipping Azure Data Factory Dimensional Modeling, Star, Snowflake, Polybase Reference Architectures Appliances Polybase APS Massively Parallel Processing Common Tools SQL Server Management Studio SQL Server Data Tools Azure Data Catalog SQL Reporting Services Azure Search Consume AG Polybase Physical / Virtual Deployment FCI Data Warehousing Command Line (Power. Shell, BCP, SQLCMD) Power BI Web Portal Event Hub Stream Analytics Azure Data Lake Analytics Azure ML Windows Phone App Android App i. OS App Reporting Services Portal Migration & Upgrade Tools (SSMA, Upgrade Advisor, Map Toolkit) Share. Point Cortana
“Big Data Reality Framework…” Reference Implementation – Products + Apache™ Hadoop™ on Windows Azure Blob Storage Lib. HDFS FTPS Stream. Insight HCatalog Oozie JDBC Connector Active Directory Ambari http: //static. googleusercontent. com/exte rnal_content/untrusted_dlcp/research. goo gle. com/en/us/pubs/archive/41344. pdf Map Reduce Need to Know Hadoop: The Definitive Guide 3 rd Ed. Good to Know - Tom White, O’Reilly Books
סיפור אישי Yossi Elkayam Sr. BI & Azure Architect Microsoft Services yelkayam@microsoft. com
Ever growing data, ever shrinking IT End-users DBAs Storage Admins & TDM’s
. . . להמציא את עצמנו מחדש Yossi Elkayam Sr. BI & Azure Architect Microsoft Services yelkayam@microsoft. com
SQL Server 2016: Everything built-in #1 performance 6 years in a row Industry leader built-in A fraction of the cost R + in-memory 0 0 -10 6 00 00 19 19 6/ 5/ 15 Oracle 1/ 1/ 1/ 20 Tableau $120 SQL Server 3 5 22 Microsoft 0 4 -20 -40 19 4/ 3/ 1/ 0 1 -30 00 00 19 2/ 1/ 19 00 least vulnerable 18 3 SQL Server $480 SQL Server 22 29 34 43 -50 49 -60 -70 #1 #2 #3 Oracle is #4 69 -80 SQL Server Oracle My. SQL 2 SAP HANA TPC-H non-clustered 10 TB $2, 230 Self-service BI per user In-memory across all workloads Consistent experience from on-premises to cloud The above graphic was published by Gartner, Inc. as part of a larger research document and should be evaluated in the context of the entire document. The Gartner document is available upon request from Microsoft. Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner's research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose. National Institute of Standards and Technology Comprehensive Vulnerability Database update 10/2015 TPC-H non-clustered results as of 04/06/15, 5/04/15, 4/15/14 and 11/25/13, respectively. http: //www. tpc. org/tpch/results/tpch_perf_results. asp? resulttype=noncluster at massive scale
SQL Data Warehouse Architecture Application or User connection SQL DB Control Node Data Loading (Poly. Base, ADF, SSIS, REST, OLE, ODBC, ADF, AZCopy, PS) SQL DB Control – “The Brain” An endpoint for connection and tools. Coordinates storage/compute activity. DMS Massively Parallel Processing (MPP) Engine DMS Compute Node SQL DB DMS Compute Node Azure Infrastructure and Storage Blob storage [WASB(S)] Compute – “The Brawn” Handles query processing, ability to scale up/ down Data Movement Services Coordinates data movement from nodes/storage Storage AddLoad data to WASB(S) without incurring compute costs
How does Stretch work? Internet boundary Source Database Hot/Active Data TRICKLE MIGRATION Cold/Historical Closed Data Source SQL Server Creates a secure connection between the Source SQL Server and Azure Remote Database Remote Table Provisions remote instance and begins migration Apps and Queries continue to run for both the local database and remote endpoint Security controls and maintenance remain local Microsoft Azure
Machine Learning (Mahout) Query (Hive) Distributed Processing (Map. Reduce) Distributed Storage (HDFS) Data Integration (Sqoop/REST/ODBC) Scripting (Pig) No. SQL Database (HBase) Workflow &Scheduling (Oozie) Coordination (Zoo. Keeper) Management & Monitoring (Ambari)
Azure Data Lake Batch, real-time and interactive analytics made easy Azure Data Lake Analytics service Managed clusters (HDInsight) ` YARN Web. HDFS Store Unstructured Semi-Structured Principles • Maximize return on accessible data • Reduce time to value • Reduce time to insight Approach • Productivity day one (Developers, Scientists, Analysts) • Open (Yarn, HDFS), designed for the cloud • All data available for analysis • Leverages existing skills, use SQL, Spark, Hive, Storm, Hbase • Dynamically scales to meet your business objectives • Managed and supported with an enterprise grade SLA
Store all data Ingest all data regardless of requirements Devices Do analysis in native format without schema definition Using analytic engines like Hadoop Batch queries Social Interactive queries LOB applications Devices Video Real-time analytics Social LOB applications Sensors Web Sensors Video Relational Web Relational Clickstream Machine Learning Data warehouse
Poly. Base Query relational and non-relational data with T-SQL Capability Quote: ************ T-SQL query *********************** SQL Server Name DOB Denny Usher 11/13/58 Usher Gina Burch 04/29/76 State WA ME Deeper insights across data Hadoop Data Lake Store $658. 39 T-SQL for querying relational and non-relational data across SQL Server and Hadoop Upcoming Æ Support for HDInsight and Azure Data Lake Store Æ 90 day certification SLA for Hortonworks and Cloudera distributions Æ HDFS Streaming (performance) Æ Kerberos support Æ String pushdown / collation support
CREATE EXTERNAL DATA SOURCE Hadoop. Cluster WITH (TYPE = Hadoop, LOCATION = 'hdfs: //10. 193. 26. 177: 8020', RESOURCE_MANAGER_LOCATION = '10. 193. 26. 178: 8050'); Once per Hadoop Cluster CREATE EXTERNAL FILE FORMAT Text. File WITH ( FORMAT_TYPE = DELIMITEDTEXT, DATA_COMPRESSION = 'org. apache. hadoop. io. compress. Gzip. Codec', FORMAT_OPTIONS (FIELD_TERMINATOR ='|', USE_TYPE_DEFAULT = TRUE)); Once per File Format CREATE EXTERNAL TABLE [dbo]. [Customer] ( [Sensor. Key] int NOT NULL, [Speed] float NOT NULL ) WITH (LOCATION='//Sensor_Data//May 2014/sensordata. tbl', DATA_SOURCE = Hadoop. Cluster, FILE_FORMAT = Text. File ) HDFS File Path
CREATE DATABASE SCOPED CREDENTIAL Hadoop. Credential WITH IDENTITY = 'hadoop. User. Name', Secret = 'hadoop. Password'; CREATE EXTERNAL DATA SOURCE Hadoop. Cluster WITH (TYPE = Hadoop, LOCATION = 'hdfs: //10. 193. 26. 177: 8020', RESOURCE_MANAGER_LOCATION = '10. 193. 26. 178: 8050', Hadoop. Credential); CREATE EXTERNAL FILE FORMAT Text. File WITH ( FORMAT_TYPE = DELIMITEDTEXT, DATA_COMPRESSION = 'org. apache. hadoop. io. compress. Gzip. Codec', FORMAT_OPTIONS (FIELD_TERMINATOR ='|', USE_TYPE_DEFAULT = TRUE)); Once per Hadoop User Once per Hadoop Cluster per user Once per File Format CREATE EXTERNAL TABLE [dbo]. [Customer] ( [Sensor. Key] int NOT NULL, [Speed] float NOT NULL ) WITH (LOCATION='//Sensor_Data//May 2014/', DATA_SOURCE = Hadoop. Cluster, FILE_FORMAT = Text. File ); HDFS File Path
Federated queries: Query data where it lives Easily query data in multiple Azure data stores without moving it to a single store Benefits • Avoid moving large amounts of data across the network between stores • Single view of data irrespective of physical location • Minimize data proliferation issues caused by maintaining multiple copies • Single query language for all data • Each data store maintains its own sovereignty • Design choices based on the need • Push SQL expressions to remote SQL sources • Filters • Joins y r ue Q Azure Storage Blobs e rit W Query U-SQL Query Qu e Azure Data Lake Analytics Azure SQL in VMs Qu e ry ry Azure SQL DB Azure SQL Data Warehouse
Handling variety of data and model changes Information are stored in JSON format Support complex analysis on JSON documents Pillars Benefits Drivers Modern services exchange data in JSON format Fast built-in JSON/relational data conversion Combination of relational and JSON data The power of T-SQL and SQL Server engine Integration with all SQL Server components
Built-in functions ISJSON_VALUE JSON_MODIFY JSON_QUERY [ { }, { ] } "Number": "SO 43659", "Date": "2011 -05 -31 T 00: 00" "Account. Number": "AW 29825", "Price": 59. 99, "Quantity": 1 "Number": "SO 43661", "Date": "2011 -06 -01 T 00: 00“ "Account. Number": "AW 73565“, "Price": 24. 99, "Quantity": 3 OPENJSON Transforms JSON text to table SO 43659 2011 -05 -31 T 00: 00 MSFT 59. 99 1 SO 43661 2011 -06 -01 T 00: 00 Nokia 24. 99 3 FOR JSON Formats result set as JSON text.
OPENJSON
{ "name": "Microsoft", "homepage_url": "www. microsoft. com", "blog_url": "blogs. microsoft. com/", "products": [ { "name": "Azure", "permalink": "azure. com" } ], "offices": [ { "address 1": "1 Redmond Way", "zip_code": "98052", "city": "Redmond", "state_code": "WA", "country_code": "USA" } ] Part of No. SQL family Built for simplicity cale and performance Non-relational o enforced schema }
Part of No. SQL family Built for simplicity cale and performance Non-relational o enforced schema { "id": "itemdata 2344", "data": "TWFu. IGlz. IGRpc 3 Rpbmd 1 a. XNo. ZWQs. I G 5 vd. CBvbmx 5 IGJ 5 IGhpcy. By. ZWFzb 24 s. IGJ 1 d. CBie. SB 0 a. Glz. IHNpbmd 1 b. GFy. IHB c 3 Npb 24 g. Zn. Jvb. SBvd. Ghlcibmlt. YWxz LCB 3 a. Glja. CBpcy. Bh. IGx 1 c 3 Qgb 2 Yd. Ghl. I G 1 pbm. Qs. IHRo. YXQg. Ynkg. YSBw. ZXJz. ZX Zlcm. Fu. Y 2 Ugb 2 Yg. ZGVsa. Wdod. CBpb B 0 a. GUg. Y 29 ud. Glud. WVk. IGFu. ZCBpbm Rl. Zm. F 0 a. Wdh. Ymxl. IGdlbm. Vy. YXRpb 24 gb 2 Yga 25 vd 2 xl. ZGdl. LCBle. GNl. ZWRz. IH RZSBza. G 9 yd. CB 2 ZWhlb. WVu. Y 2 Ugb 2 Yg. YW 55 IGNhcm 5 hb. CBwbhc 3 Vy. Z 4==" }
Part of No. SQL family Built for simplicity cale and performance Non-relational o enforced schema
{ Jill Ben Susan Andrew Sven Thomas {id: {id: "Jill" }, "Ben", manager: "Jill" }, "Susan", manager: "Jill" }, "Andrew", manager: "Ben" }, "Sven", manager: "Susan" }, "Thomas", manager: "Sven" } } To get the manager of any employee is trivial - SELECT manager FROM org WHERE id = "Susan"
{ id: "CDC 101", title: “The Fundamentals of Database Design", title. Words: ["database", "design", "database design"], credits: 10 } Consider using a Reg. Ex to transform words to lowercase and remove punctuation. Strip out stop words like “to”, “the”, “of” etc. Denormalize keywords in to key phrases SELECT books. title FROM books WHERE ARRAY_CONTAINS(books. title. Words, "database")
{ id: ". . . ", timestamp. Minute: ". . . ", readings: [ {minute: 0, reading: 123}, {minute: 1, reading: 456}, . . . {minute: 59, reading: 999} ] }
{ id: ". . . ", timestamp: ". . . ", log. Data: {attr 1: value 1, attr 2: value 2, . . . } }
Ever growing data, ever shrinking IT End-users DBAs Storage Admins & TDM’s
- Slides: 41