Loading Data in Azure Data Factory

What is Azure Data Factory? Azure Data Factory is a cloud service that orchestrates, manages, and monitors the integration and transformation of structured and unstructured data from on-premises and cloud sources at scale.

What is Azure Data Factory? I'd call it PaaS (platform as a service).

Most like… SSIS, DTS, Informatica • Between other cloud services and on-prem • Sources, destinations, transformations

Is it just SSIS in the Cloud?

Another kind of MVP • Minimum Viable Product • Big data scenario • Emphasis on new tech, JSON based

Where: portal.azure.com, then New > Data + Analytics > Data Factory

Azure Pricing • Cloud/on-prem activities • Data movement units • https://azure.microsoft.com/en-us/pricing/details/data-factory/ • https://azure.microsoft.com/en-us/documentation/articles/data-factory-copy-activity-performance/#cloud-data-movement-units

Data Movement Units: The cloud data movement unit is a measure that represents the power (a combination of CPU, memory, and network resource allocation) of a single unit in the Azure Data Factory service that is used to perform a cloud-to-cloud copy operation. The number of units used for a copy is configurable.

Three Main Elements • Linked Services: think Connection Managers • Datasets: schemas; think of the mapping in Data Flows • Pipelines: think Data Flows; the activities inside a pipeline are the types of Data Flows
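To make the relationships between the elements concrete, here is a minimal Python sketch that models them as plain dictionaries shaped loosely like ADF (v1-era) JSON and checks that every name a pipeline activity references actually resolves: activity to dataset, dataset to linked service. All names (`NorthWindStg`, `AzureBlobStg`, `OnPremActorSrce`, `ActorBlobSink`, `CopyActorPipeline`) and the trimmed-down property layout are illustrative assumptions, not a complete ADF schema.

```python
"""Sketch: how linked services, datasets, and pipelines reference each other by name.

The dictionaries only loosely mirror ADF v1 JSON; all names are hypothetical.
"""

linked_services = {
    "NorthWindStg": {"type": "OnPremisesSqlServer"},
    "AzureBlobStg": {"type": "AzureStorage"},
}

# Each dataset points at a linked service (think: a connection manager).
datasets = {
    "OnPremActorSrce": {"linkedServiceName": "NorthWindStg", "type": "SqlServerTable"},
    "ActorBlobSink": {"linkedServiceName": "AzureBlobStg", "type": "AzureBlob"},
}

# A pipeline holds activities; each activity reads input datasets and writes output datasets.
pipeline = {
    "name": "CopyActorPipeline",
    "activities": [
        {
            "name": "CopyActor",
            "type": "Copy",
            "inputs": [{"name": "OnPremActorSrce"}],
            "outputs": [{"name": "ActorBlobSink"}],
        }
    ],
}


def find_broken_references(pipeline, datasets, linked_services):
    """Return a list of problems; an empty list means every referenced name resolves."""
    problems = []
    for activity in pipeline["activities"]:
        for ref in activity["inputs"] + activity["outputs"]:
            ds_name = ref["name"]
            if ds_name not in datasets:
                problems.append(f"{activity['name']}: unknown dataset {ds_name}")
                continue
            ls_name = datasets[ds_name]["linkedServiceName"]
            if ls_name not in linked_services:
                problems.append(f"{ds_name}: unknown linked service {ls_name}")
    return problems


if __name__ == "__main__":
    print(find_broken_references(pipeline, datasets, linked_services))  # → []
```

The check walks the chain in the same direction ADF resolves it, which is why a typo in a linked service name surfaces as a dataset problem rather than a pipeline problem.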

Getting around the ADF Interface

Main Dev Environments • Author and Deploy (Portal) • Copy Data (Portal, preview) • Diagram • Monitor and Manage • Visual Studio

Author and Deploy

Copy Data (wizard-ish) • Opens in a new browser tab

Monitor and Manage • Opens in a new browser tab

Diagram

Visual Studio Extension

JSON, pronounced "Jay-Sahn": JavaScript Object Notation, http://json.org/

JSON is built on two structures: • A collection of name/value pairs, written { }. In various languages, this is realized as an object, record, struct, dictionary, hash table, keyed list, or associative array. • An ordered list of values, written [ ]. In most languages, this is realized as an array, vector, list, or sequence. (JavaScript Object Notation, http://json.org)
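As a quick illustration using Python's standard `json` module (nothing ADF-specific), the two structures map directly onto a dictionary and a list. The table and column names here are hypothetical.

```python
import json

# An object { } holds name/value pairs; an array [ ] holds an ordered list of values.
text = '{"name": "Actor", "columns": ["actor_id", "first_name", "last_name"]}'

parsed = json.loads(text)

# In Python, a JSON object becomes a dict and a JSON array becomes a list.
print(type(parsed).__name__)             # → dict
print(type(parsed["columns"]).__name__)  # → list
print(parsed["columns"][0])              # → actor_id

# Serializing goes the other way: dict -> object, list -> array.
print(json.dumps({"tableName": "Actor", "interval": 1}))  # → {"tableName": "Actor", "interval": 1}
```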

JSON in ADF, Dataset Example
{
  "name": "OnPremActorSrce",
  "properties": {
    "published": false,
    "type": "SqlServerTable",
    "linkedServiceName": "NorthWindStg",
    "typeProperties": {
      "tableName": "Actor"
    },
    "availability": {
      "frequency": "Day",
      "interval": 1
    },
    "policy": {
      "externalData": {
        "retryInterval": "00:01:00",
        "retryTimeout": "00:10:00",
        "maximumRetry": 3
      }
    }
  }
}
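Because a dataset definition is just JSON, it can be inspected with ordinary tools. This minimal Python sketch loads the dataset example and reads the fields a reader usually cares about: the linked service, the table, the slice frequency, and the retry settings. Treating the retry strings as `HH:MM:SS` durations is an assumption based on their shape; `parse_hms` is a hypothetical helper.

```python
import json
from datetime import timedelta

# The dataset definition from the dataset example slide, as a string.
DATASET_JSON = """
{
  "name": "OnPremActorSrce",
  "properties": {
    "published": false,
    "type": "SqlServerTable",
    "linkedServiceName": "NorthWindStg",
    "typeProperties": { "tableName": "Actor" },
    "availability": { "frequency": "Day", "interval": 1 },
    "policy": {
      "externalData": {
        "retryInterval": "00:01:00",
        "retryTimeout": "00:10:00",
        "maximumRetry": 3
      }
    }
  }
}
"""


def parse_hms(value):
    """Convert an 'HH:MM:SS' string into a timedelta (assumed format)."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


dataset = json.loads(DATASET_JSON)
props = dataset["properties"]
retry = props["policy"]["externalData"]

print(props["linkedServiceName"])                                            # → NorthWindStg
print(props["typeProperties"]["tableName"])                                  # → Actor
print(props["availability"]["frequency"], props["availability"]["interval"])  # → Day 1
print(parse_hms(retry["retryInterval"]))                                     # → 0:01:00
print(parse_hms(retry["retryTimeout"]), retry["maximumRetry"])               # → 0:10:00 3
```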

JSON specific to ADF: https://msdn.microsoft.com/en-us/library/azure/dn835050.aspx

Data Gateways & ADF • ADF supplies a key • Install the gateway on each on-prem resource (server, laptop, etc.) • A resource can only store one key for use by ADF, so that usually means only one data factory can use that resource

Data Management Gateway Configuration Manager
• Download: http://www.microsoft.com/en-us/download/details.aspx?id=39717
• For on-prem machines.
• The gateway is for the entire server, that is, the entire machine. Linked services use that gateway, and a linked service must be configured for each data store, e.g. each SQL database.
• Be patient. The refresh rate is slow and can make it seem like the setup didn't work when it did.
• Instructions: https://azure.microsoft.com/en-us/documentation/articles/data-factory-move-data-between-onprem-and-cloud/#using-the-data-gateway-step-by-step-walkthrough
• Steps: install the gateway on the machine. Then go to the Azure Data Factory and create the gateway under Linked Services there. Copy the key from the ADF linked service and paste it into the final step of the gateway setup on the on-prem machine.
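The point that one machine-wide gateway serves several linked services can be sketched as follows: each on-prem SQL Server linked service names the same gateway. The `OnPremisesSqlServer` type and the `connectionString`/`gatewayName` properties are believed to follow the ADF v1 (classic) linked service JSON, so check them against the JSON reference before relying on them; the helper `sql_linked_service` and the server, database, and gateway names are hypothetical, and credentials are omitted.

```python
import json

GATEWAY_NAME = "MyOnPremGateway"  # hypothetical: the gateway installed on the on-prem machine


def sql_linked_service(name, server, database, gateway_name):
    """Build a linked-service definition (dict) for one on-prem SQL Server database.

    Property names are assumed to match ADF v1 JSON; authentication details are omitted.
    """
    connection_string = (
        f"Data Source={server};Initial Catalog={database};Integrated Security=True;"
    )
    return {
        "name": name,
        "properties": {
            "type": "OnPremisesSqlServer",
            "typeProperties": {
                "connectionString": connection_string,
                "gatewayName": gateway_name,
            },
        },
    }


# One gateway per machine, but one linked service per database on that machine.
services = [
    sql_linked_service("NorthWindStg", "SQLBOX01", "NorthWindStg", GATEWAY_NAME),
    sql_linked_service("NorthWindDW", "SQLBOX01", "NorthWindDW", GATEWAY_NAME),
]

for service in services:
    print(json.dumps(service, indent=2))
```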

Slices
• Each unit of data consumed and produced by an activity run is called a data slice.
• Each slice has a start time and an end time, and these are accessible to the pipeline activity via ADF system variables such as WindowStart and WindowEnd:
"sqlReaderQuery": "$$Text.Format('select * from MyTable where timestampcolumn >= \\'{0:yyyy-MM-dd HH:mm}\\' AND timestampcolumn < \\'{1:yyyy-MM-dd HH:mm}\\'', WindowStart, WindowEnd)"
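The slice idea can be sketched outside ADF: with daily availability (frequency Day, interval 1), each slice covers one day, and the query template is filled in with that slice's window start and end. This Python sketch only mimics what `$$Text.Format` with `WindowStart`/`WindowEnd` produces; the `daily_slices` and `slice_query` helpers and the dates are hypothetical, and `strftime("%Y-%m-%d %H:%M")` stands in for the .NET format `yyyy-MM-dd HH:mm`.

```python
from datetime import datetime, timedelta

# Stand-in for the sqlReaderQuery template; {0}/{1} become the window bounds.
QUERY_TEMPLATE = (
    "select * from MyTable where timestampcolumn >= '{0}' "
    "AND timestampcolumn < '{1}'"
)
FMT = "%Y-%m-%d %H:%M"  # Python equivalent of the .NET format 'yyyy-MM-dd HH:mm'


def daily_slices(start, end, interval_days=1):
    """Yield consecutive, non-overlapping (window_start, window_end) pairs of full length."""
    step = timedelta(days=interval_days)
    current = start
    while current + step <= end:
        yield current, current + step
        current += step


def slice_query(window_start, window_end):
    """Fill the template for one slice, mimicking $$Text.Format(..., WindowStart, WindowEnd)."""
    return QUERY_TEMPLATE.format(window_start.strftime(FMT), window_end.strftime(FMT))


if __name__ == "__main__":
    for ws, we in daily_slices(datetime(2016, 1, 1), datetime(2016, 1, 4)):
        print(slice_query(ws, we))
```

Because the predicate is `>=` the window start and `<` the window end, consecutive slices never load a row twice, even when its timestamp falls exactly on a boundary.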

Using Slices • http://blogs.msdn.com/b/bigdatasupport/archive/2016/01/24/incremental-data-load-from-azure-table-storage-to-azure-sql-using-azure-data-factory.aspx

Visual Studio Extension • Azure SDK 2.7 and above for Visual Studio 2013 • You get templates • You can reverse engineer an existing factory • You can connect to your factory and deploy from VS • Came out July 22, 2015 • ENABLES SOURCE CONTROL!

Resources
• Simple SIMPLE tutorial: https://azure.microsoft.com/en-us/documentation/articles/data-factory-get-started/
• Wee Hyong Tok's webcast: https://info.microsoft.com/Webnar-Introduction-to-Azure-Data-Factory.html
• Reza Rad's blog: http://www.radacad.com/blog
• Understanding Azure Storage: https://azure.microsoft.com/en-us/documentation/videos/azure-storage-5-minute-overview/

Loading ADL with ADF: https://azure.microsoft.com/en-us/blog/creating-big-data-pipelines-using-azure-data-lake-and-azure-data-factory/