Apache Flink
Stephan Ewen, Flink committer, co-founder / CTO @ data Artisans, @StephanEwen

Looking back one year

April 16, 2014

Stratosphere 0.4
§ Pact API (Java), DataSet API (Scala)
§ Stratosphere Optimizer
§ Stratosphere Runtime: Local, Remote
Batch processing on a pipelining engine, with iterations …

Looking at now…

What is Apache Flink?
§ Real-time data streams: event logs, Kafka, RabbitMQ, ...
§ Historic data: HDFS, JDBC, ...
§ Flink (master)
§ ETL, graphs, machine learning, relational, …
§ Low latency, windowing, aggregations, ...

What is Apache Flink?
§ Libraries and front ends: Hadoop M/R, Dataflow, SAMOA, Dataflow, ML, Table, Gelly, Python
§ APIs: DataSet (Java/Scala), DataStream (Java/Scala)
§ Flink Optimizer, Stream Builder
§ Flink Dataflow Runtime
§ Execution: Local, Remote, Yarn, Tez, Embedded
§ Data sources: HDFS, HBase, HCatalog, JDBC, Kafka, RabbitMQ, Flume

Batch / Streaming APIs

    case class Word (word: String, frequency: Int)

DataSet API (batch):

    val lines: DataSet[String] = env.readTextFile(...)
    lines.flatMap {line => line.split(" ").map(word => Word(word, 1))}
      .groupBy("word").sum("frequency")
      .print()

DataStream API (streaming):

    val lines: DataStream[String] = env.fromSocketStream(...)
    lines.flatMap {line => line.split(" ").map(word => Word(word, 1))}
      .window(Count.of(1000)).every(Count.of(100))
      .groupBy("word").sum("frequency")
      .print()
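To make the window(Count.of(1000)).every(Count.of(100)) step more concrete, here is a minimal plain-Scala sketch of a sliding count window over a finite sequence. It does not use Flink; the names SlidingWordCount and windowedCounts are hypothetical, and Flink's actual firing behavior on an unbounded stream may differ in detail.

```scala
object SlidingWordCount {
  // For every window of `size` words, advancing by `slide` words,
  // count how often each word occurs inside that window.
  // Assumption: a finite Seq stands in for the stream.
  def windowedCounts(words: Seq[String], size: Int, slide: Int): List[Map[String, Int]] =
    words
      .sliding(size, slide)
      .map(window => window.groupBy(identity).map { case (w, ws) => w -> ws.size })
      .toList
}
```

With size 1000 and slide 100, each result would cover the most recent 1000 words and a new result would appear every 100 words.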

Technology inside Flink

    case class Path (from: Long, to: Long)

    val tc = edges.iterate(10) { paths: DataSet[Path] =>
      val next = paths
        .join(edges).where("to").equalTo("from") {
          (path, edge) => Path(path.from, edge.to)
        }
        .union(paths)
        .distinct()
      next
    }

§ Pre-flight (Client): program, type extraction stack, cost-based optimizer, dataflow graph
§ Master: task scheduling, recovery metadata, deploy operators, track intermediate results
§ Workers: memory manager, out-of-core algos, batch & streaming, state & checkpoints
Figure: dataflow graph with DataSource orders.tbl, Filter, DataSource lineitem.tbl, Map, Join Hybrid Hash (buildHT / probe), GroupRed (sort), connected by forward and hash-part [0] edges
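The transitive-closure program can be mimicked on ordinary Scala collections to see what each iteration step does. This is only a sketch: the object name TransitiveClosure and the use of an in-memory Set instead of a DataSet are assumptions, and nothing here touches Flink's runtime.

```scala
object TransitiveClosure {
  case class Path(from: Long, to: Long)

  // Runs a fixed number of steps, like iterate(10) in the slide.
  // Each step joins the known paths with the edges on path.to == edge.from,
  // then unions the result with the paths found so far (a Set is already distinct).
  def closure(edges: Set[Path], iterations: Int): Set[Path] =
    (1 to iterations).foldLeft(edges) { (paths, _) =>
      val extended = for (p <- paths; e <- edges if p.to == e.from) yield Path(p.from, e.to)
      extended union paths
    }
}
```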

Flink by Feature / Use Case

Data Streaming Analysis

Life of data streams
§ Create: create streams from event sources (machines, databases, logs, sensors, …)
§ Collect: collect and make streams available for consumption (e.g., Apache Kafka)
§ Process: process streams, possibly generating derived streams (e.g., Apache Flink)

Stream Analysis in Flink
More at: http://flink.apache.org/news/2015/02/09/streaming-example.html

Defining windows in Flink
§ Trigger policy
  • When to trigger the computation on the current window
§ Eviction policy
  • When data points should leave the window
  • Defines window width/size
§ E.g., count-based policy
  • evict when #elements > n
  • start a new window every n-th element
§ Built-in: Count, Time, Delta policies
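The interplay of the two policies can be sketched in a few lines of plain Scala. This is an illustration under assumptions, not Flink's windowing API: CountWindow, run, size and every are hypothetical names, with size playing the role of a count-based eviction policy and every the role of a count-based trigger policy.

```scala
object CountWindow {
  // Eviction: keep at most `size` elements in the buffer.
  // Trigger: emit the current buffer after every `every` elements.
  def run[T](input: Seq[T], size: Int, every: Int): List[Vector[T]] = {
    var buffer = Vector.empty[T]
    var sinceTrigger = 0
    val out = List.newBuilder[Vector[T]]
    for (x <- input) {
      buffer = buffer :+ x
      // Eviction policy: evict the oldest elements when #elements > size.
      if (buffer.size > size) buffer = buffer.drop(buffer.size - size)
      sinceTrigger += 1
      // Trigger policy: fire the computation on the current window.
      if (sinceTrigger == every) {
        out += buffer
        sinceTrigger = 0
      }
    }
    out.result()
  }
}
```

Keeping the two policies separate is what allows sliding windows (every smaller than size) and tumbling windows (every equal to size) to share one mechanism.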

Checkpointing / Recovery
§ Flink acknowledges batches of records
  • Less overhead in failure-free case
  • Currently tied to fault-tolerant data sources (e.g., Kafka)
§ Flink operators can keep state
  • State is checkpointed
  • Checkpointing and record acks go together
§ Exactly-once semantics for state

Checkpointing / Recovery
Pushes checkpoint barriers through the data flow
§ Operator checkpoint starting, checkpoint in progress, checkpoint done
§ Before barrier = part of the snapshot
§ After barrier = not in snapshot (backup till next snapshot)
Chandy-Lamport algorithm for consistent asynchronous distributed snapshots
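A toy model of a single operator shows why a barrier cleanly separates "in the snapshot" from "not in the snapshot". The types Record and Barrier and the summing operator are hypothetical; a real implementation would also forward barriers downstream and align them across multiple inputs, which this sketch leaves out.

```scala
object BarrierSketch {
  sealed trait Event
  case class Record(value: Int) extends Event
  case class Barrier(checkpointId: Long) extends Event

  // A summing operator: records update the state, a barrier snapshots it.
  // Returns the final state and the snapshot taken for each checkpoint id.
  def run(stream: Seq[Event]): (Int, Map[Long, Int]) =
    stream.foldLeft((0, Map.empty[Long, Int])) {
      case ((sum, snaps), Record(v))   => (sum + v, snaps)           // normal processing
      case ((sum, snaps), Barrier(id)) => (sum, snaps + (id -> sum)) // state before the barrier
    }
}
```

Each snapshot reflects exactly the records that arrived before its barrier; records after the barrier only show up in the next snapshot.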

Heavy ETL Pipelines

Heavy Data Pipelines
Complex ETL programs
Apology: Graph had to be blurred for online slides, due to confidentiality

Memory Management
Flink contains its own memory management stack. Memory is allocated, de-allocated, and used strictly through an internal buffer pool implementation. To do that, Flink contains its own type extraction and serialization components.
§ Managed vs. unmanaged memory: user code objects; sorting, hashing, caching; shuffling, broadcasts

    public class WC {
      public String word;
      public int count;
    }

Figure: pool of memory pages, including empty pages
More at: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=53741525
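The idea of keeping records as bytes in fixed-size pages rather than as heap objects can be sketched with java.nio.ByteBuffer. The record layout (length, UTF-8 bytes, count) and the names PageSketch, write and readAll are assumptions made for illustration; Flink's own serializers and page format are not shown in the slides.

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets

object PageSketch {
  case class WC(word: String, count: Int)

  // Serializes a record into the page as [word length][word bytes][count].
  // Returns false, writing nothing, when the page has no room left;
  // a caller could then take a fresh page from the pool or spill to disk.
  def write(page: ByteBuffer, wc: WC): Boolean = {
    val bytes = wc.word.getBytes(StandardCharsets.UTF_8)
    if (page.remaining() < 4 + bytes.length + 4) false
    else {
      page.putInt(bytes.length).put(bytes).putInt(wc.count)
      true
    }
  }

  // Deserializes every record written so far, without disturbing the page itself.
  def readAll(page: ByteBuffer): List[WC] = {
    val view = page.duplicate()
    view.flip()
    val out = List.newBuilder[WC]
    while (view.hasRemaining) {
      val bytes = new Array[Byte](view.getInt())
      view.get(bytes)
      out += WC(new String(bytes, StandardCharsets.UTF_8), view.getInt())
    }
    out.result()
  }
}
```

Because a full page is reported explicitly instead of surfacing as an allocation failure, the caller decides what happens next, which is the property that makes going out-of-core predictable.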

Smooth out-of-core performance
Single-core join of 1 KB Java objects beyond memory (4 GB)
Blue bars are in-memory, orange bars (partially) out-of-core
More at: http://flink.apache.org/news/2015/03/13/peeking-into-Apache-Flinks-Engine-Room.html

Benefits of managed memory
§ More reliable and stable performance (fewer GC effects, easy to go to disk)

Table API

    val customers = env.readCsvFile(…)
      .as('id, 'mktSegment)
      .filter( 'mktSegment === "AUTOMOBILE" )

    val orders = env.readCsvFile(…)
      .filter( o => dateFormat.parse(o.orderDate).before(date) )
      .as('orderId, 'custId, 'orderDate, 'shipPrio)

    val items = orders.join(customers)
      .where('custId === 'id)
      .join(lineitems)
      .where('orderId === 'id)
      .select('orderId, 'orderDate, 'shipPrio,
              'extdPrice * (Literal(1.0f) - 'discount) as 'revenue)

    val result = items
      .groupBy('orderId, 'orderDate, 'shipPrio)
      .select('orderId, 'revenue.sum, 'orderDate, 'shipPrio)

Iterations in Data Flows
Machine Learning Algorithms

Iterate by looping
§ for/while loop in client submits one job per iteration step
§ Data reuse by caching in memory and/or disk
Figure: Client, Step, Step

Iterate in the Dataflow

Large-Scale Machine Learning
Factorizing a matrix with 28 billion ratings for recommendations (scale of Netflix or Spotify)
More at: http://data-artisans.com/computing-recommendations-with-flink.html

State in Iterations
Graphs and Machine Learning

Iterate natively with deltas
Figure labels: initial workset, workset (replace), initial solution, partial solution, delta set (merge deltas), other datasets, iteration result; operators A, B, X, Y
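The workset / solution-set loop can be illustrated with connected components on in-memory collections. This is a sketch under assumptions: DeltaIterationSketch and components are hypothetical names, every edge endpoint is assumed to be listed in vertices, and none of this is Flink's delta-iteration API.

```scala
object DeltaIterationSketch {
  // Labels every vertex with the smallest vertex id in its connected component.
  def components(vertices: Set[Int], edges: Set[(Int, Int)]): Map[Int, Int] = {
    // Undirected adjacency list (the "other datasets" joined in each step).
    val neighbours: Map[Int, Set[Int]] =
      (edges ++ edges.map(_.swap)).groupBy(_._1).map { case (v, es) => v -> es.map(_._2) }

    var solution: Map[Int, Int] = vertices.map(v => v -> v).toMap // initial solution
    var workset: Map[Int, Int] = solution                          // initial workset
    while (workset.nonEmpty) {
      // Only vertices that changed in the last step send their label onward.
      val candidates = for {
        (v, label) <- workset.toSeq
        n <- neighbours.getOrElse(v, Set.empty[Int]).toSeq
      } yield n -> label
      // Delta set: vertices whose label actually improves.
      val delta = candidates
        .groupBy(_._1)
        .map { case (n, cs) => n -> cs.map(_._2).min }
        .filter { case (n, label) => label < solution(n) }
      solution = solution ++ delta // merge deltas into the partial solution
      workset = delta              // replace the workset
    }
    solution
  }
}
```

The work per step is proportional to the size of the workset, not the whole graph, so later steps touch fewer and fewer elements.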

Effect of delta iterations…
Figure: # of elements updated per iteration

… very fast graph analysis
Performance competitive with dedicated graph analysis systems
… and mix and match ETL-style and graph analysis in one program
More at: http://data-artisans.com/data-analysis-with-flink.html

Closing

Flink Roadmap for 2015
§ Out-of-core state in streaming
§ Monitoring and scaling for streaming
§ Streaming Machine Learning with SAMOA
§ More additions to the libraries
  • Batch Machine Learning
  • Graph library additions (more algorithms)
§ SQL on top of expression language
§ Master failover

Flink community
Figure: # unique contributor ids by git commits (axis from 0 to 120), May-10 through Apr-15

flink.apache.org
@ApacheFlink

Backup

Cornerpoints of Flink Design
§ Flexible Data Streaming Engine
  • Low-latency stream processing
  • Highly flexible windows
§ Robust Algorithms on Managed Memory
  • No OutOfMemory errors
  • Scales to very large JVMs
  • Efficient and robust processing
§ High-level APIs, beyond key/value pairs
  • Java/Scala/Python (upcoming)
  • Relational-style optimizer
§ Pipelined Execution of Batch Programs
  • Better shuffle performance
  • Scales to very large groups
§ Active Library Development
  • Graphs / Machine Learning
  • Streaming ML (coming)
§ Native Iterations
  • Very fast graph processing
  • Stateful iterations for ML

Program optimization

A simple program

    val orders = …
    val lineitems = …

    val filteredOrders = orders
      .filter(o => dateFormat.parse(o.shipDate).after(date))
      .filter(o => o.shipPrio > 2)

    val lineitemsOfOrders = filteredOrders
      .join(lineitems)
      .where("orderId").equalTo("orderId")
      .apply((o, l) => new SelectedItem(o.orderDate, l.extdPrice))

    val priceSums = lineitemsOfOrders
      .groupBy("orderDate").sum("l.extdPrice")

Two execution plans
Best plan depends on relative sizes of input files
Figure: two alternative plans built from DataSource orders.tbl, Filter, DataSource lineitem.tbl, Map, Join Hybrid Hash (buildHT / probe), Combine and GroupRed (sort), differing in their shipping strategies (broadcast, forward, hash-part [0], hash-part [0, 1])
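Why the best plan depends on the input sizes can be seen with a toy cost model. Everything here is an assumption for illustration: the strategy names, the choose function, and a cost that counts only bytes shipped over the network. The slides do not describe the optimizer's actual cost formulas.

```scala
object JoinPlanSketch {
  sealed trait Strategy
  case object PartitionBoth extends Strategy  // hash-partition both inputs on the join key
  case object BroadcastLeft extends Strategy  // ship the whole left input to every worker
  case object BroadcastRight extends Strategy // ship the whole right input to every worker

  // Toy cost model (assumption): cost = bytes sent over the network.
  // Partitioning ships each input once; broadcasting ships one input `parallelism` times.
  def choose(leftBytes: Long, rightBytes: Long, parallelism: Int): Strategy = {
    val costs: List[(Strategy, Long)] = List(
      PartitionBoth  -> (leftBytes + rightBytes),
      BroadcastLeft  -> leftBytes * parallelism,
      BroadcastRight -> rightBytes * parallelism
    )
    costs.minBy(_._2)._1
  }
}
```

Under this model a tiny filtered input favors a broadcast join, while two inputs of similar size favor partitioning both.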

Examples of optimization
§ Task chaining
  • Coalesce map/filter/etc. tasks
§ Join optimizations
  • Broadcast/partition, build/probe side, hash or sort-merge
§ Interesting properties
  • Re-use partitioning and sorting for later operations
§ Automatic caching
  • E.g., for iterations
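Task chaining can be pictured as fusing consecutive record-at-a-time functions into a single pass. The sketch below uses plain Scala lists and hypothetical names (ChainingSketch, unchained, chained); it shows the idea only and says nothing about how Flink wires chained tasks internally.

```scala
object ChainingSketch {
  // Unchained: every step produces a full intermediate collection
  // (in a distributed setting, a handover between separate tasks).
  def unchained(xs: List[Int]): List[Int] =
    xs.map(_ + 1).filter(_ % 2 == 0).map(_ * 10)

  // Chained: each record flows through map, filter and map in one pass,
  // with no intermediate collection between the steps.
  def chained(xs: List[Int]): List[Int] =
    xs.flatMap { x =>
      val y = x + 1
      if (y % 2 == 0) List(y * 10) else Nil
    }
}
```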

Visualization

Visualization tools