Near Real Time ETLs with Azure Serverless Architecture

Near Real Time ETLs with Azure Serverless Architecture Samara Soucy Innovative Architects 9/16/18

Samara Soucy
Microsoft Certified Specialist – Programming in C#
Software Development Consultant – Innovative Architects
samarasoucy@gmail.com
@oneangrypenguin
Oneangrypenguin.com

What does it mean when a service is serverless?
• There are still servers (duh)
• Next iteration after PaaS
• Consumers don’t have to worry about resource management.
• Dynamic Scaling
• Pricing is usage based
• (Usually) easier to develop against

ETL via Microservices
Pros
• Divide and Conquer
• Easier to scale just the parts that need it
• Removes the possibility of dependency hell
• Services are easy to understand and maintain.
Cons
• Lose performance because components must communicate rather than having a single process.
• Complicated system to maintain.

Service Roles
• Source & Destination
• Event – “We’ve got data to work on.”
• Manager – Decides when an event will happen or controls process flow.
• Router – Moves events between services. May transmit data as well.
• Worker – Moves data between source and destination, may perform transforms.
• Monitor – Makes sure that the processes are all running and may check data quality.
• Utility – Performs background maintenance tasks (e.g. moving old data to an archive).
** A service will often fit into multiple roles.
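
To make the roles concrete, here is a minimal in-memory sketch of a manager, router, worker, and monitor cooperating. Everything in it is hypothetical: the dictionaries stand in for a real source and destination, a deque stands in for a queue service, and the event type name is invented for illustration. It is not the presenter's demo code.

```python
from collections import deque

# Hypothetical in-memory stand-ins for a real source, destination, and queue.
source = {"order-1": {"amount": "12.50"}, "order-2": {"amount": "7.25"}}
destination = {}
router = deque()  # Router: carries events between services


def manager(pending_ids):
    """Manager: decides which events happen and hands them to the router."""
    for record_id in pending_ids:
        router.append({"type": "record.ready", "id": record_id})


def worker():
    """Worker: moves data from source to destination, transforming on the way."""
    while router:
        event = router.popleft()
        row = source[event["id"]]
        # Transform: parse the amount string into a number.
        destination[event["id"]] = {"amount": float(row["amount"])}


def monitor():
    """Monitor: reports source records that never reached the destination."""
    return sorted(set(source) - set(destination))


manager(list(source))
worker()
missing = monitor()
```

Note that the event only carries a record id; the worker fetches the data itself. That keeps events small, which matters once a real queue with message size limits sits in the router position.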

Basic Architecture

Separate Manager and/or Router

ETL via Microservices pt. 2
• Entrance to the ETL should be as close to the originating event as possible.
• Each service should perform a single task. Usually. When it doesn’t create unnecessary complexity. Balance is important.
• When chaining services together, consider whether you may want multiple things to trigger off a specific processing stage. If there is a possibility that you need to fan out or fan in at a specific point, make sure the data stream makes a stop at a router.
• Take advantage of the Visual Studio projects for Azure serverless. Keeping the code for all the services in a single solution makes it easier to pull up all the code for your ETL system than clicking around in the portal.
• If possible, keep event data small enough to pass through the Azure queue systems. Event Grid has the smallest limit at 64 KB per message.
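
The fan-out advice can be sketched with a tiny publish/subscribe router. This is an in-memory illustration only: the `Router` class, the `orders.cleaned` event type, and both subscribers are assumptions made up for the example, not an Azure API.

```python
from collections import defaultdict


class Router:
    """Minimal pub/sub router: one published event fans out to every subscriber."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        # Copy the list so handlers may subscribe others without affecting this pass.
        handlers = list(self._subscribers.get(event_type, []))
        for handler in handlers:
            handler(payload)
        return len(handlers)


router = Router()
warehouse_rows = []  # hypothetical first consumer: loads the row
audit_log = []       # hypothetical second consumer: records the id only

router.subscribe("orders.cleaned", warehouse_rows.append)
router.subscribe("orders.cleaned", lambda payload: audit_log.append(payload["id"]))

# The "cleaned" stage publishes once; both consumers are triggered.
delivered = router.publish("orders.cleaned", {"id": 42, "total": 19.99})
```

Because the cleaning stage only knows about the router, a third consumer can be added later with one more `subscribe` call and no change to the stage that produces the event.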

Azure Serverless Services Meet the toys!

The Workers pt. 1
Logic Apps
• Similar experience to SSIS (but better)
• Strongest at managing process flow.
• Limited transform capabilities, but it can call Functions to perform that task.
• Integrates with Azure services, HTTP, timers, and many 3rd party APIs out of the box.
• No code required (minus any Functions integrations).
Functions
• Extends the capabilities of Web Jobs.
• Integrates natively with many other Azure services, HTTP, and timer triggers.
• Small processes built in C# or JS. Experimental support for other languages like Python, PHP, PowerShell, and others.
• Strongest tool for transforms, most flexible.

The Workers pt. 2
Stream Analytics
• SQL-like queries against a data stream.
• Allows for measures to be computed in real time.
• Strong for finding values out of range, trends, and possible fraud.
• Primarily attaches to Event Hub or IoT Hub, paired with Azure Storage for contextual data.
Cognitive Services
• Machine Learning as an API
• Integration with Stream Analytics: call many of the APIs directly from your SQL query.
• Lots of tools like speech to text, sentiment analysis, search, natural language processing, and many more.
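
The "values out of range" use case usually comes down to aggregating over fixed time windows. The Python below is only an analogue of that idea, not Stream Analytics query syntax; the function name, the sample readings, and the thresholds are all invented for illustration.

```python
from collections import defaultdict


def out_of_range_windows(readings, window_seconds, low, high):
    """Group (timestamp, value) readings into fixed, non-overlapping windows and
    return {window_start: average} for windows whose average is outside [low, high]."""
    buckets = defaultdict(list)
    for timestamp, value in readings:
        window_start = (timestamp // window_seconds) * window_seconds
        buckets[window_start].append(value)

    flagged = {}
    for window_start in sorted(buckets):
        values = buckets[window_start]
        average = sum(values) / len(values)
        if average < low or average > high:
            flagged[window_start] = average
    return flagged


# Hypothetical sensor readings as (seconds since start, value).
readings = [(0, 20.0), (5, 22.0), (12, 90.0), (18, 110.0), (25, 21.0)]
alerts = out_of_range_windows(readings, window_seconds=10, low=0.0, high=50.0)
```

Here only the window starting at second 10 is flagged, since its two readings average 100.0. A real streaming job would emit each window as it closes rather than batching the whole input.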

The Routers
Event Grid
• HTTPS input and push to subscribers.
• Offers some basic filtering of which events get pushed to which subscribers.
Event Hub
• HTTPS (in) or AMQP based message routing.
• Handles both batched messages and streams of data.
• If you are going to use Stream Analytics, you want Event Hub in your pipeline.
• Caches messages for a set period of time so they can be replayed. Default is 24 hours.
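
The replay behaviour described for Event Hub can be pictured as an append-only log with a retention window. The sketch below is a toy in-memory model under that assumption; `ReplayableLog` and its methods are hypothetical names, not the Event Hubs SDK, and the 24-hour retention simply reuses the default quoted on the slide.

```python
class ReplayableLog:
    """Toy append-only log: messages stay readable until they age out."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self._entries = []  # (offset, enqueued_at, message)
        self._next_offset = 0

    def append(self, message, now):
        offset = self._next_offset
        self._entries.append((offset, now, message))
        self._next_offset += 1
        return offset

    def expire(self, now):
        """Drop messages older than the retention window."""
        cutoff = now - self.retention_seconds
        self._entries = [entry for entry in self._entries if entry[1] >= cutoff]

    def replay(self, from_offset=0):
        """Re-read retained messages starting at a given offset."""
        return [msg for offset, _, msg in self._entries if offset >= from_offset]


HOUR = 3600
log = ReplayableLog(retention_seconds=24 * HOUR)
log.append("a", now=0)
log.append("b", now=10 * HOUR)
log.append("c", now=20 * HOUR)

replayed_all = log.replay()                # everything is still retained
replayed_tail = log.replay(from_offset=1)  # a consumer resuming after "a"
log.expire(now=30 * HOUR)                  # "a" is now older than 24 hours
after_expiry = log.replay()
```

The point of the model is that reading does not remove a message. A new or recovering consumer can start from an earlier offset, as long as the messages it needs are still inside the retention window.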

The Destinations
Cosmos DB
• NoSQL successor to DocumentDB
• Multiple storage types, multiple API options
• Scale on demand
• Globally distributed
Azure Storage
• File and Blob storage
• Used as a destination and as a way to store data that will be used in multiple services rather than passing it in the routers.
• Will be used to store code and logs for most of the other serverless offerings
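
Storing shared data in Azure Storage instead of passing it through the routers is sometimes called the claim-check pattern. The sketch below illustrates it with a plain dictionary standing in for blob storage; `blob_store`, `to_event`, `from_event`, and the blob naming scheme are hypothetical, and the 64 KB threshold is the Event Grid figure quoted earlier in the deck.

```python
import json
import uuid

MAX_EVENT_BYTES = 64 * 1024  # per-message figure quoted for Event Grid

blob_store = {}  # hypothetical stand-in for Azure Blob storage


def to_event(payload):
    """Send small payloads inline; park large ones in storage and send a reference."""
    body = json.dumps(payload).encode("utf-8")
    # The check covers the payload body only, not the surrounding envelope.
    if len(body) <= MAX_EVENT_BYTES:
        return {"kind": "inline", "data": payload}
    blob_name = f"payloads/{uuid.uuid4()}.json"
    blob_store[blob_name] = body
    return {"kind": "reference", "blob": blob_name}


def from_event(event):
    """Resolve an event back to its payload, fetching from storage if needed."""
    if event["kind"] == "inline":
        return event["data"]
    return json.loads(blob_store[event["blob"]].decode("utf-8"))


small = {"id": 1}
large = {"id": 2, "rows": ["x" * 100] * 1000}  # serializes to roughly 100 KB

small_event = to_event(small)
large_event = to_event(large)
```

Every service that receives `large_event` can call `from_event` to load the same stored payload, so the data is written once even when the event fans out to several consumers.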

Application Insights (The Monitor)
• Comprehensive application monitoring and logging.
• Azure Functions are closely integrated with App Insights; almost all metrics run through there.
• When using App Insights with Azure Functions, consider turning down the sampling rate. Otherwise you will end up spending significantly more on App Insights than on your Functions.
• May also act as a source since it provides a real time feed of the status of whatever application it is monitoring. Set up continuous export to Azure Storage to allow real time ingestion by a real time analytics feed.

Links
• https://azure.microsoft.com/en-us/overview/serverlesscomputing/
• Presentation Links:
  • https://nrtdemoweb.azurewebsites.net/
  • https://github.com/serri588/NRT_ETL_Demo