Microsoft Big Data Essentials
Module 1 - Introduction to Big Data
Saptak Sen, Microsoft
Bill Ramos, Advaiya
Agenda
• Why Big Data?
• Big Data Lambda Architecture
• Getting started with Windows Azure HDInsight Service
Big Data Lambda Architecture
• Batch layer
  • Stores the master dataset
  • Computes arbitrary views
• Speed layer
  • Fast, incremental algorithms
  • Batch layer eventually overrides speed layer
• Serving layer
  • Random access to batch views
  • Updated by batch layer
Batch Layer
• Stores master dataset (in append mode)
• Unrestrained computation
• Horizontally scalable
• High latency
[Diagram: Incoming data streams, Master dataset, Batch views]
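The batch layer bullets above can be illustrated with a minimal sketch, assuming an in-memory list as a stand-in for the master dataset and a simple per-page count as the batch view. The names `MasterDataset` and `compute_batch_view` and the `page` field are hypothetical, chosen only for illustration; they are not part of Hadoop or HDInsight.

```python
class MasterDataset:
    """Hypothetical append-only store: records are added, never updated or deleted."""

    def __init__(self):
        self._records = []

    def append(self, record):
        # Append mode: the only write operation is adding a new record.
        self._records.append(record)

    def scan(self):
        # Return a snapshot of every record ever stored.
        return list(self._records)


def compute_batch_view(master):
    """Recompute the view from scratch over the whole master dataset.

    A full scan is slow (high latency) but can express any computation.
    """
    view = {}
    for record in master.scan():
        page = record["page"]  # assumed field name
        view[page] = view.get(page, 0) + 1
    return view


master = MasterDataset()
for page in ["home", "about", "home"]:
    master.append({"page": page})

batch_view = compute_batch_view(master)
```

In a real deployment the scan would typically be a distributed job over files rather than a Python loop; the sketch only shows the append-then-recompute pattern.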
Speed Layer
• Stream processing of data
• Stores a limited window of data
• Dynamic computation
[Diagram: Incoming data streams, Process stream, Real-time increments, Increment views, Real-time views]
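As a rough sketch of the speed layer bullets above, the following keeps only a limited time window of events and updates its counts incrementally as each event arrives. The class name `RealtimeView`, the window length, and the assumption that timestamps arrive in non-decreasing order are all illustrative choices, not something the slides specify.

```python
from collections import deque


class RealtimeView:
    """Hypothetical real-time view over a sliding window of recent events."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._events = deque()  # (timestamp, key), oldest first
        self.counts = {}

    def add(self, timestamp, key):
        # Incremental update: touch only the affected key, no full recompute.
        self._events.append((timestamp, key))
        self.counts[key] = self.counts.get(key, 0) + 1
        self._expire(timestamp)

    def _expire(self, now):
        # Drop events that have fallen out of the window.
        cutoff = now - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            _, old_key = self._events.popleft()
            self.counts[old_key] -= 1
            if self.counts[old_key] == 0:
                del self.counts[old_key]


view = RealtimeView(window_seconds=10)
view.add(0, "home")
view.add(5, "about")
view.add(12, "home")  # the event at t=0 is now outside the 10-second window
```

Expiring old events is what lets the batch layer eventually override the speed layer: once data is covered by a batch view, the real-time copy can be discarded.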
Serving Layer
• Queries the batch and real-time views
• Merges the results
[Diagram: Batch views, Real-time views, Querying and merging, Output]
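The querying and merging step can be sketched as below, assuming both views are dictionaries of counts and that the real-time view holds only data not yet absorbed into the batch view, so the two can simply be added. The function name `query_merged` is hypothetical; other view types would need a different merge rule.

```python
def query_merged(key, batch_view, realtime_view):
    """Answer a query by combining the batch view with the real-time view.

    Assumes additive counts and no overlap between the two views.
    """
    return batch_view.get(key, 0) + realtime_view.get(key, 0)


batch_view = {"home": 100, "about": 40}  # computed by the batch layer
realtime_view = {"home": 3}              # recent increments from the speed layer

home_total = query_merged("home", batch_view, realtime_view)
about_total = query_merged("about", batch_view, realtime_view)
```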
[Diagram: Batch Layer, Speed Layer, and Serving Layer, with Apache Hadoop, Staging Database, SQL Server Analysis Services (SSAS), Microsoft Excel and PowerPivot, Other BI Tools and Custom Applications]
[Diagram labels: SQL Server Connector (Hadoop Hive ODBC); SQL Server Analysis Services (SSAS Cube); Hadoop Data; Third Party Database + Custom Applications; Microsoft Excel & PowerPivot for Excel]
[Diagram: Batch Layer, Speed Layer, and Serving Layer, with Data Feed from Smart Meters, Windows Azure HDInsight, Reactive Extensions (Rx), SQL Server Database (In-Memory OLTP), Microsoft Dynamics AX, SQL Server Analysis Services, SQL Server Reporting Services]
Windows Azure Storage
[Diagram: Batch Layer, Speed Layer, Serving Layer; Azure Storage Explorer; Windows Azure Blob storage]
• Store large amounts of unstructured text or binary data with the fastest read performance
• Highly scalable, durable, and available file system
• Blobs can be exposed publicly over HTTP
• Securely lock down permissions to blobs
Blob URL format: http://<account>.blob.core.windows.net/<container>/<blobname>
[Diagram: Account (Contoso) > Container (Images, Video) > Blob (PIC01.JPG, PIC02.JPG, VID1.AVI) > Blocks/Pages]
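To make the URL pattern above concrete, here is a small sketch that assembles a blob address from its account, container, and blob name. The helper `blob_url` is hypothetical and only formats a string; it does not call any Azure SDK or check that the blob exists. The example values come from the slide's diagram.

```python
from urllib.parse import quote


def blob_url(account, container, blob_name, scheme="http"):
    """Build a blob address following the <account>/<container>/<blobname> pattern."""
    # Percent-encode the blob name so spaces and similar characters are URL-safe;
    # "/" is left as-is because blob names may contain path-like segments.
    return f"{scheme}://{account}.blob.core.windows.net/{container}/{quote(blob_name)}"


picture_url = blob_url("contoso", "images", "PIC01.JPG")
video_url = blob_url("contoso", "video", "VID1.AVI", scheme="https")
```

Whether such a URL can be read anonymously depends on the container's permissions, as the bullets note.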
[Diagram: Batch Layer, Speed Layer, Serving Layer; Windows Azure HDInsight; Windows Azure Blob storage]
HDInsight Console: https://<ClusterName>.azurehdinsight.net/
[Diagram: Batch Layer, Speed Layer, Serving Layer; CSV files from local disk; Windows Azure HDInsight; Windows Azure Blob storage]
HDInsight Console: https://<ClusterName>.azurehdinsight.net/
Easy Access to Data, Big & Small
Key Features
• Search, Access & Shape
  • Simplify access to public & corporate data
  • Easily preview, shape, & format your data
• Combine with Unstructured
  • Combine and refine data across multiple sources
  • Gain insight across relational, unstructured, & semi-structured data
• Easily Manage & Query
  • Common management of structured & unstructured data
  • Query across relational DB & Hadoop with single T-SQL Query
Products shown: Power Query, Windows Azure Marketplace, Windows Azure HDInsight Service, Parallel Data Warehouse with Polybase
http://blogs.msdn.com/b/windowsazure/archive/2013/03/19/getting-started-with-hdinsight.aspx
http://blogs.msdn.com/b/windowsazure/archive/2013/03/21/azure-hdinsight-and-azure-storage.aspx
Questions?