Custom Activities in Azure Data Factory
Presented by Jared Zagelbaum, Senior Consultant, Blue Granite
Introduction
• About me:
▫ Microsoft Data Platform since 2008
▫ Azure since Azure MCSE Data & Analytics
▫ Microsoft Certificate in Data Science (R)
• Senior Consultant with Blue Granite
• Recent projects (last 6 months) – Manufacturing, Logistics
• Technologies implemented – Power BI, SQL DW, ADF, SSIS (BIML), SSAS, SQL Server, Azure Data Lake, DevOps / CI
Objectives
• Understand when to use a custom activity
• Know how to go about creating one
• Save you some pain with undocumented things I’ve encountered
• Appreciate the scope of what you can really do with ADFv2 orchestrating Azure services
Agenda
• Azure Prerequisites for Custom Activities
▫ Overview of Azure Batch
▫ Implementation of custom activities in Azure Data Factory (v1 and v2)
• Review the use cases for custom activities in Azure Data Factory
• ADFv1 Deep Dive
▫ Setting up development environment for ADFv1 custom activities
▫ Developing a custom activity for ADFv1
▫ Deployment and Debugging
• ADFv2 Deep Dive
▫ Developing a custom activity for ADFv2 (much more fun version)
▫ Deployment and Debugging
Azure Batch
• Azure Batch creates and manages a pool of compute nodes (virtual machines), installs the applications you want to run, and schedules jobs to run on the nodes.
• There is no cluster or job scheduler software to install, manage, or scale.
• There is no additional charge for using Batch. You only pay for the underlying resources consumed, such as the virtual machines, storage, and networking.
• Batch works well with intrinsically parallel (also known as "embarrassingly parallel") workloads, where the applications can run independently and each instance completes part of the work.
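To make "intrinsically parallel" concrete, here is a minimal local sketch in Python. It does not call Azure Batch at all: a thread pool stands in for the pool of compute nodes, and `process_chunk` is a hypothetical unit of work invented for illustration. The point it shows is that each chunk is processed without needing anything from the others, so the partial results can be combined at the end.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Split items into at most n_chunks contiguous, roughly equal slices."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        if end > start:  # skip empty slices when there are few items
            chunks.append(items[start:end])
        start = end
    return chunks


def process_chunk(chunk):
    """Hypothetical unit of work: it depends only on its own chunk."""
    return sum(x * x for x in chunk)


def run_parallel(items, n_workers=4):
    """Process every chunk independently, then combine the partial results."""
    chunks = split_into_chunks(items, n_workers)
    # The local thread pool is only a stand-in for a pool of Batch nodes.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(process_chunk, chunks))
    return sum(partials)
```

A workload with this shape maps naturally onto Batch: each chunk becomes a task, and no task waits on another.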
Custom Activities Compared: ADFv1 vs v2
• ADFv1
▫ Execution restricted to single activity run (no opportunity to scale within an activity definition)
• ADFv2
▫ Can run parallel / scale out easily via control activities
▫ Can run packaged executables if callable from command line (Linux or Windows) – not just scripts!
▫ Must use cloud hosted integration runtime and Azure Batch

Differences | Version 2 Custom Activity | Version 1 (Custom) DotNet Activity
How custom logic is defined | By providing an executable | By implementing a .NET DLL
Execution environment of the custom logic | Windows or Linux | Windows (.NET Framework 4.5.2)
Executing scripts | Supports executing scripts directly (for example "cmd /c echo hello world" on Windows VM) | Requires implementation in the .NET DLL
Dataset required | Optional | Required to chain activities and pass information
Pass information from activity to custom logic | Through ReferenceObjects (LinkedServices and Datasets) and ExtendedProperties (custom properties) | Through ExtendedProperties (custom properties), Input, and Output Datasets
Retrieve information in custom logic | Parses activity.json, linkedServices.json, and datasets.json stored in the same folder of the executable | Through .NET SDK (.NET Framework 4.5.2)
Logging | Writes directly to STDOUT | Implementing Logger in .NET DLL
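The comparison above says a v2 custom activity retrieves its information by parsing activity.json, linkedServices.json, and datasets.json from the executable's folder, and logs by writing to STDOUT. A minimal Python sketch of that pattern follows. The function names are hypothetical, and the `typeProperties.extendedProperties` path inside activity.json is an assumption about the file layout that should be checked against a real activity run.

```python
import json
from pathlib import Path

# File names taken from the comparison table above.
CONTEXT_FILES = ("activity.json", "linkedServices.json", "datasets.json")


def load_activity_context(folder="."):
    """Load the JSON files that sit next to the executable.

    Files that are missing are returned as None, which also makes it
    possible to run this locally with only a hand-written activity.json.
    """
    folder = Path(folder)
    context = {}
    for name in CONTEXT_FILES:
        path = folder / name
        if path.exists():
            context[name] = json.loads(path.read_text(encoding="utf-8"))
        else:
            context[name] = None
    return context


def get_extended_properties(activity):
    """Return the custom properties; the key path here is an assumption."""
    return (activity or {}).get("typeProperties", {}).get("extendedProperties", {})


def main(folder="."):
    context = load_activity_context(folder)
    props = get_extended_properties(context["activity.json"])
    # Logging is just STDOUT, as the table says for v2.
    for key, value in sorted(props.items()):
        print(f"{key}={value}")
    return 0
```

To try it locally, drop a hand-written activity.json into a scratch folder and call `main("path/to/folder")`; no Data Factory or Batch account is involved.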
Key Takeaways…
• ADFv1
▫ Custom (.NET) activities are designed to interact with datasets that require specific access methods / transformation rules.
▫ Azure Batch is used as an anonymizer of resources more than for its actual potential to scale.
▫ Requires .NET 4.5.2, the IDotNetActivity interface, and the NuGet package Microsoft.Azure.Management.DataFactories – if you need a custom activity, you’re basically building it from scratch
• ADFv2
▫ Run any executable: self-compiled, script, or packaged executable (with command arguments)… Windows or Linux OS.
▫ Control activities leverage the full power of Azure Batch to scale out parallel workloads
▫ “No holds barred” – not expected to produce or transform a dataset
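As a rough illustration of the two ADFv2 takeaways (run any command, and scale out through control activities), the sketch below builds the pipeline JSON as Python dictionaries. The helper names, the linked service name, and the pipeline parameter are hypothetical. The field names (`type: "Custom"`, `command`, `extendedProperties`, `isSequential`, `batchCount`) are recalled from the ADF v2 pipeline JSON format and should be verified against the current documentation before use.

```python
import json


def custom_activity(name, command, batch_linked_service, extended_properties=None):
    """Build a v2 Custom activity definition that runs `command` on Azure Batch."""
    return {
        "name": name,
        "type": "Custom",
        "linkedServiceName": {
            "referenceName": batch_linked_service,  # hypothetical Batch linked service
            "type": "LinkedServiceReference",
        },
        "typeProperties": {
            "command": command,
            "extendedProperties": extended_properties or {},
        },
    }


def for_each(name, items_expression, inner_activities, batch_count=20):
    """Wrap activities in a ForEach control activity that runs items in parallel."""
    return {
        "name": name,
        "type": "ForEach",
        "typeProperties": {
            "isSequential": False,      # let iterations run side by side
            "batchCount": batch_count,  # upper bound on concurrent iterations
            "items": {"value": items_expression, "type": "Expression"},
            "activities": inner_activities,
        },
    }


# Example: the "hello world" command from the comparison table, fanned out
# over a hypothetical pipeline parameter named "files".
hello = custom_activity("HelloWorld", "cmd /c echo hello world", "AzureBatchLinkedService")
fan_out = for_each("FanOut", "@pipeline().parameters.files", [hello])
pipeline_json = json.dumps({"name": "DemoPipeline", "properties": {"activities": [fan_out]}}, indent=2)
```

The design point is that the scaling lives in the control activity, not in the custom code: the executable stays a plain single-run program.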
Use cases for custom activities
• ADFv1
▫ You need to access a source or service not supported with native components
▫ You need to perform a specific compute task on “small data”
• ADFv2
▫ ADFv1 use cases
▫ You want to run an SMP application based on conditions / wall clock and possibly have the output of the application trigger additional actions
▫ You want batch processes logging all to a common system
▫ You are filling in the holes unsupported in current SSIS lift and shift: https://docs.microsoft.com/en-us/sql/integration-services/lift-shift/ssis-azure-validate-packages
ADFv1 Deep Dive
ADFv1
• Adding custom code to projects and deployment is fairly easy with Data Factory Tools for Visual Studio 2015
• Debugging a .NET class library requires additional work
• Developing pipelines and activities is all JSON
• Debugging in ADFv1 is centralized
ADFv2 Deep Dive
ADFv2
• Use any development environment you want, heck, even any framework as long as it’s SMP-based
• No slick tooling for deployment like in v1 = slightly more work
• Debugging locally doesn’t require much refactoring, if any
• Developing pipelines and activities is helped by the visual editor initially
• Debugging in ADFv2 is buggy – it’s still in preview!
Session Summary
• ADFv1 and v2 use Azure Batch to run custom activities
▫ v1 mostly for convenience
▫ v2 fer reelz yo – you can use it to run parallel tasks at enormous scale (along with your Azure bill)
• ADFv1 had a dream of things being nice, neat, tumbling windows where custom activities had a certain place in this tiny little world
• ADFv2 lets you run pretty much any workload and access pretty much any data source without restriction to platform or scale, and orchestrates everything into a single service. Kids, you can drive the car now.
Evaluation
• Thanks for attending and filling out the evaluations for all the sessions you go to today – they really matter to the presenters!