What's new in Kafka 0.10.0: Introducing Kafka Streams

What's new in Kafka 0.10.0: Introducing Kafka Streams
Eno Thereska, eno@confluent.io, @enothereska
Kafka Meetup, July 21, 2016
Slide contributions: Michael Noll and Ismael Juma

What's new in Kafka 0.10.0
1. Lots of new KIPs:
   • KIP-4: Metadata
   • KIP-31: Relative offsets in compressed message sets
   • KIP-32: Add timestamps to Kafka messages
   • KIP-35: Retrieving protocol version
   • KIP-36: Rack-aware replica assignment
   • KIP-41: KafkaConsumer max records
   • KIP-42: Add producer and consumer interceptors
   • KIP-45: Standardize all client sequence interaction
   • KIP-43: Kafka SASL enhancements
   • KIP-57: Interoperable LZ4 framing
   • KIP-51: List Connectors REST API
   • KIP-52: Connector Control APIs
   • KIP-56: Allow cross-origin HTTP requests on all HTTP methods
2. Kafka Streams

Kafka Streams
• Powerful yet easy-to-use Java library
• Part of open source Apache Kafka, introduced in v0.10, May 2016
• Source code: https://github.com/apache/kafka/tree/trunk/streams
• Build your own stream processing applications that are:
  • highly scalable
  • fault-tolerant
  • distributed
  • stateful
  • able to handle late-arriving, out-of-order data

Kafka Streams (diagram)

When to use Kafka Streams (as of Kafka 0.10)

Recommended use cases:
• Application development
• "Fast Data" apps (small or big data)
• Reactive and stateful applications
• Linear streams
• Event-driven systems
• Continuous transformations
• Continuous queries
• Microservices

Questionable use cases:
• Data science / data engineering
• "Heavy lifting"
• Data mining
• Non-linear, branching streams (graphs)
• Machine learning, number crunching
• What you'd do in a data warehouse

Alright, can you show me some code now?
• API option 1: Kafka Streams DSL (declarative)

  KStream<Integer, Integer> input = builder.stream("numbers-topic");

  // Stateless computation
  KStream<Integer, Integer> doubled = input.mapValues(v -> v * 2);

  // Stateful computation
  KTable<Integer, Integer> sumOfOdds = input
      .filter((k, v) -> v % 2 != 0)
      .selectKey((k, v) -> 1)
      .reduceByKey((v1, v2) -> v1 + v2, "sum-of-odds");

Alright, can you show me some code now?
• API option 2: low-level Processor API (imperative)
• You implement callbacks for the processor lifecycle: startup, process a record, periodic action, shutdown
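The Processor API itself runs against a Kafka cluster, but its lifecycle can be sketched without any Kafka dependency. Below is a minimal plain-Java sketch whose class and method names are illustrative; it mirrors the shape of the real API (init / process / punctuate / close) rather than implementing it:

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java sketch (no Kafka dependency) of the Processor API lifecycle:
// init() on startup, process() per record, punctuate() as the periodic
// action, close() on shutdown. Illustrative only, not the real interface.
public class ProcessorSketch {

    // Running counts per key, standing in for a Kafka Streams state store.
    private final Map<String, Integer> store = new HashMap<>();
    private boolean initialized = false;

    // Startup: acquire state, schedule punctuation, etc.
    public void init() {
        initialized = true;
    }

    // Process a record: update state for each incoming (key, value).
    public void process(String key, int value) {
        store.merge(key, value, Integer::sum);
    }

    // Periodic action: e.g. emit a snapshot of current aggregates downstream.
    public Map<String, Integer> punctuate() {
        return new HashMap<>(store);
    }

    // Shutdown: release resources.
    public void close() {
        initialized = false;
    }

    public int countFor(String key) {
        return store.getOrDefault(key, 0);
    }
}
```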

How do I install Kafka Streams?
• There is, and there should be, no "install".
• It's a library. Add it to your app like any other library:

  <dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-streams</artifactId>
    <version>0.10.0.0</version>
  </dependency>

Do I need to install a CLUSTER to run my apps?
• No, you don't. Kafka Streams allows you to stay lean and lightweight.
• Unlearn bad habits: "doing cool stuff with data" != "must have a cluster"

How do I package and deploy my apps? How do I …?
• Whatever works for you. Stick to what you or your company think is the best way.
• Why? Because an app that uses Kafka Streams is… a normal Java app.
• Your Ops/SRE/InfoSec teams may finally start to love, not hate, you.

Kafka concepts recap (diagrams)

Kafka Streams concepts (diagrams)

Stream: an ordered, replayable, fault-tolerant sequence of immutable data records

Processor topology: the computational logic of an app's data processing

Stream partitions and stream tasks: units of parallelism

Streams meet Tables
• A stream is a changelog of a table.
• A table is a materialized view, at a point in time, of a stream.

Streams meet Tables – in the Kafka Streams DSL
• KStream interprets data as a record stream: every record is an independent event. Given the records (alice, 2), (bob, 10), (alice, 3), Alice's click count reads as "Alice clicked 2 times", then "Alice clicked 2 + 3 = 5 times".
• KTable interprets data as a changelog stream, i.e. a continuously updated materialized view: every record updates the previous value for its key. Given the same records, the table reads "Alice clicked 2 times", then "Alice clicked 3 times".
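The same duality can be shown without Kafka at all. The following plain-Java sketch (illustrative only, not the Kafka Streams API) replays the slide's records both ways: summing values when they are read as an event stream, and overwriting them when they are read as a changelog:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of the stream/table duality: the records (alice, 2),
// (bob, 10), (alice, 3) read as a record stream (each record is an
// independent event, so values for a key add up) versus as a changelog
// stream (each record replaces the previous value for its key).
public class StreamTableDuality {

    // KStream-like view: aggregate by summing, since each record is an event.
    public static Map<String, Integer> asStreamAggregate(List<String[]> records) {
        Map<String, Integer> sums = new HashMap<>();
        for (String[] r : records) {
            sums.merge(r[0], Integer.parseInt(r[1]), Integer::sum);
        }
        return sums;
    }

    // KTable-like view: last write wins, since each record is an update.
    public static Map<String, Integer> asTable(List<String[]> records) {
        Map<String, Integer> table = new HashMap<>();
        for (String[] r : records) {
            table.put(r[0], Integer.parseInt(r[1]));
        }
        return table;
    }

    // The slide's example data, in arrival order.
    public static List<String[]> sampleRecords() {
        List<String[]> records = new ArrayList<>();
        records.add(new String[] {"alice", "2"});
        records.add(new String[] {"bob", "10"});
        records.add(new String[] {"alice", "3"});
        return records;
    }
}
```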

Streams meet Tables – in the Kafka Streams DSL
• JOIN example: compute user clicks by region via KStream.leftJoin(KTable) (diagrams)

Streams meet Tables – in the Kafka Streams DSL
• JOIN example: compute user clicks by region via KStream.leftJoin(KTable)
• Input KStream of clicks: (alice, 13), (bob, 5). KTable of user regions: alice → europe, bob → europe.
• leftJoin() with the KTable, followed by map(), re-keys each click record by region: (europe, 13), (europe, 5).
• reduceByKey(_ + _) sums clicks per region into a KTable: europe → 13, then europe → 18.
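A plain-Java simulation of this pipeline (illustrative, not the Kafka Streams API) reproduces the slide's numbers: joining the click stream against the region table, re-keying by region, and summing yields europe → 18:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of the slide's leftJoin() + map() + reduceByKey()
// pipeline: a stream of (user, clicks) records is joined against a
// table of (user -> region), re-keyed by region, and summed per region.
public class ClicksPerRegion {

    public static Map<String, Integer> clicksPerRegion(
            List<String[]> clicks, Map<String, String> userRegions) {
        // leftJoin + map: (user, clicks) -> (region, clicks);
        // like a left join, unmatched users still produce a record.
        List<String[]> byRegion = new ArrayList<>();
        for (String[] c : clicks) {
            String region = userRegions.getOrDefault(c[0], "UNKNOWN");
            byRegion.add(new String[] {region, c[1]});
        }
        // reduceByKey(_ + _): sum clicks per region into a KTable-like map.
        Map<String, Integer> totals = new LinkedHashMap<>();
        for (String[] r : byRegion) {
            totals.merge(r[0], Integer.parseInt(r[1]), Integer::sum);
        }
        return totals;
    }
}
```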


Kafka Streams key features

Key features in 0.10
• Native, 100%-compatible Kafka integration
  • Also inherits Kafka's security model, e.g. to encrypt data in transit
  • Uses Kafka as its internal messaging layer, too
• Highly scalable
• Fault-tolerant
• Elastic

Key features in 0.10 (continued)
• Stateful and stateless computations (e.g. joins, aggregations)

Fault tolerance: state stores (diagram sequence)

Key features in 0.10 (continued)
• Time model

Time (diagrams)

Time
• You configure the desired time semantics through timestamp extractors.
• The default extractor yields event-time semantics: it extracts the embedded timestamps of Kafka messages (introduced in v0.10).
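The idea can be sketched in plain Java (illustrative names only; the real interface is TimestampExtractor in the kafka-streams artifact): swapping the extractor swaps the application's time semantics.

```java
// Plain-Java sketch (no Kafka dependency) of pluggable time semantics.
// An event-time extractor reads the timestamp embedded in the record
// (the 0.10 default); a processing-time extractor ignores it and uses
// the wall clock instead.
public class TimestampExtractors {

    // Minimal stand-in for a Kafka record with an embedded 0.10 timestamp.
    public static final class Record {
        final String value;
        final long embeddedTimestamp;

        public Record(String value, long embeddedTimestamp) {
            this.value = value;
            this.embeddedTimestamp = embeddedTimestamp;
        }
    }

    // Extractor interface mirroring the shape of the real TimestampExtractor.
    public interface Extractor {
        long extract(Record record);
    }

    // Event-time semantics: use the timestamp the producer embedded.
    public static final Extractor EVENT_TIME = record -> record.embeddedTimestamp;

    // Processing-time semantics: use the wall clock at processing time.
    public static final Extractor WALL_CLOCK = record -> System.currentTimeMillis();
}
```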

Key features in 0.10 (continued)
• Windowing

Key features in 0.10 (continued)
• Supports late-arriving and out-of-order data
• Millisecond processing latency, no micro-batching
• At-least-once processing guarantees (exactly-once is in the works)
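Windowing plus event-time semantics is what makes late-arriving data manageable: a record is assigned to a window by its own timestamp, not by arrival order. A minimal plain-Java sketch of a tumbling-window count (illustrative, not the Kafka Streams windowing API):

```java
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of tumbling-window counting keyed by event time.
// Because each record carries its own event timestamp, a late or
// out-of-order record still lands in (and updates) the window it
// belongs to, not the window that happens to be "current" on arrival.
public class TumblingWindowCount {

    private final long windowSizeMs;
    // window start -> count of records whose event time fell in that window
    private final Map<Long, Integer> counts = new TreeMap<>();

    public TumblingWindowCount(long windowSizeMs) {
        this.windowSizeMs = windowSizeMs;
    }

    // Assign the record to its window by event time and bump the count.
    public void add(long eventTimeMs) {
        long windowStart = (eventTimeMs / windowSizeMs) * windowSizeMs;
        counts.merge(windowStart, 1, Integer::sum);
    }

    public int countFor(long windowStart) {
        return counts.getOrDefault(windowStart, 0);
    }
}
```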

Where to go from here?
• Kafka Streams is available in Apache Kafka 0.10 and Confluent Platform 3.0
  • http://kafka.apache.org/
  • http://www.confluent.io/download (free + enterprise versions, tar/zip/deb/rpm)
• Kafka Streams demos at https://github.com/confluentinc/examples
  • Java 7, Java 8+ with lambdas, and Scala
  • WordCount, joins, Avro integration, top-N computation, windowing, …
• Apache Kafka documentation: http://kafka.apache.org/documentation.html
• Confluent documentation: http://docs.confluent.io/3.0.0/streams/
  • Quickstart, Concepts, Architecture, Developer Guide, FAQ
• Join our bi-weekly Ask Me Anything sessions on Kafka Streams
  • Contact me at eno@confluent.io for details

Some of the things to come
• Exactly-once semantics
• Queryable state: tap into the state of your applications (KIP-67: adopted)
• SQL interface
• Listen to and collaborate with the developer community
  • Your feedback counts a lot! Share it via users@kafka.apache.org

Want to contribute to Kafka and open source? Join the Kafka community: http://kafka.apache.org/
…in a great team with the creators of Kafka? Confluent is hiring: http://confluent.io/
Questions, comments? Tweet with #bbuzz and /cc @ConfluentInc

Backup

Details on other KIPs (slides contributed by Ismael Juma)

KIP-4 Metadata: update MetadataRequest and MetadataResponse
• Exposes new fields for KIP-4 (not used yet)
• Makes it possible to ask for cluster information with no topics
• Fixes a nasty bug where the request would be sent repeatedly if a producer was started and then unused for more than 5 minutes (KAFKA-3602)

KIP-31 Relative offsets in compressed message sets
• Message format change (affects FetchRequest, ProduceRequest, and the on-disk format)
• Avoids recompression when assigning offsets, which improves broker latency
• Should also improve throughput, but can affect producer batch sizes and so can reduce throughput in some cases; tune linger.ms and batch.size
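A sketch of the offset arithmetic involved (simplified: in the 0.10 format the wrapper message carries the absolute offset of the last inner message, and inner messages carry 0-based relative offsets): the broker can assign offsets to a whole compressed batch by touching only the wrapper, and consumers derive absolute offsets on read.

```java
// Simplified sketch of KIP-31 offset arithmetic. Inner messages of a
// compressed set store 0-based relative offsets; the wrapper stores the
// absolute offset of the last inner message. The broker can therefore
// assign offsets without decompressing the batch, and a consumer
// reconstructs each inner message's absolute offset as below.
public class RelativeOffsets {

    public static long absoluteOffset(long wrapperOffset, int lastRelativeOffset,
                                      int relativeOffset) {
        // The first inner message sits at wrapperOffset - lastRelativeOffset;
        // the rest follow from their relative offsets.
        return wrapperOffset - lastRelativeOffset + relativeOffset;
    }
}
```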

KIP-32 Add timestamps to Kafka messages
• CreateTime or LogAppendTime
• Increases message size by 8 bytes
• Small throughput degradation, particularly for small messages
• Be careful not to exceed network limits due to this increase

Migration from v1 to v2 message format
• Read the upgrade notes
• A 0.10 producer produces in the new format
• A 0.10 broker can store in the old or new format, depending on config
• 0.10 consumers can use either format
• 0.9 consumers only support the old format
• The broker can convert on the fly (with a performance impact)

KIP-35 Retrieving protocol version
• A new request type that returns all the requests and versions supported by the broker
• The aim is for clients to use this to support multiple broker versions
• Not used by the Java client yet
• Used by librdkafka and kafka-python

KIP-36 Rack-aware replica assignment
• Kafka can now run with a rack-awareness feature that isolates replicas so they are guaranteed to span multiple racks or availability zones. This allows all of Kafka's durability guarantees to be applied to these larger architectural units, significantly increasing availability.
• Old clients must be upgraded to 0.9.0.1 before going to 0.10.0.0
• Set broker.rack in server.properties
• Can be disabled when launching the reassignment tool

New consumer enhancements
• KIP-41: KafkaConsumer max records
• KIP-42: Add producer and consumer interceptors
• KIP-45: Standardize all client sequence interaction on java.util.Collection

KIP-43 Kafka SASL enhancements
• Multiple SASL mechanisms: PLAIN and Kerberos included
• Pluggable
• Added support for protocol evolution

KIP-57 Interoperable LZ4 framing
• LZ4 framing was broken; it was fixed in 0.10, taking advantage of the message format bump

Connect KIPs
• KIP-51: List Connectors REST API
• KIP-52: Connector Control APIs
• KIP-56: Allow cross-origin HTTP requests on all HTTP methods

Lots of bugs fixed
• Producer ordering, SocketServer leaks, the new consumer, offset handling in the broker
• Release notes: http://mirrors.muzzy.org.uk/apache/kafka/0.10.0.0/RELEASE_NOTES.html