What’s new in Kafka 0.10.0: Introducing Kafka Streams
- Slides: 53
What’s new in Kafka 0.10.0: Introducing Kafka Streams
Eno Thereska, eno@confluent.io, enothereska
Kafka Meetup, July 21, 2016
Slide contributions: Michael Noll and Ismael Juma
What’s new in Kafka 0.10.0
1. Lots of new KIPs in:
    1. KIP-4 Metadata
    2. KIP-31 Relative offsets in compressed message sets
    3. KIP-32 Add timestamps to Kafka message
    4. KIP-35 Retrieving protocol version
    5. KIP-36 Rack aware replica assignment
    6. KIP-41 KafkaConsumer Max Records
    7. KIP-42: Add Producer and Consumer Interceptors
    8. KIP-45 Standardize all client sequence interaction
    9. KIP-43 Kafka SASL enhancements
    10. KIP-57 - Interoperable LZ4 Framing
    11. KIP-51 - List Connectors REST API
    12. KIP-52: Connector Control APIs
    13. KIP-56: Allow cross origin HTTP requests on all HTTP methods
2. Kafka Streams
Kafka Streams
• Powerful yet easy-to-use Java library
• Part of open source Apache Kafka, introduced in v0.10, May 2016
• Source code: https://github.com/apache/kafka/tree/trunk/streams
• Build your own stream processing applications that are
    • highly scalable
    • fault-tolerant
    • distributed
    • stateful
    • able to handle late-arriving, out-of-order data
When to use Kafka Streams (as of Kafka 0.10)
Recommended use cases:
• Application Development
• “Fast Data” apps (small or big data)
• Reactive and stateful applications
• Linear streams
• Event-driven systems
• Continuous transformations
• Continuous queries
• Microservices
Questionable use cases:
• Data Science / Data Engineering
• “Heavy lifting”
• Data mining
• Non-linear, branching streams (graphs)
• Machine learning, number crunching
• What you’d do in a data warehouse
Alright, can you show me some code now?
• API option 1: Kafka Streams DSL (declarative)

    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.KStreamBuilder;
    import org.apache.kafka.streams.kstream.KTable;

    KStreamBuilder builder = new KStreamBuilder();
    KStream<Integer, Integer> input = builder.stream("numbers-topic");

    // Stateless computation
    KStream<Integer, Integer> doubled = input.mapValues(v -> v * 2);

    // Stateful computation
    KTable<Integer, Integer> sumOfOdds = input
        .filter((k, v) -> v % 2 != 0)
        .selectKey((k, v) -> 1)
        .reduceByKey((v1, v2) -> v1 + v2, "sum-of-odds");
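To see what these two computations produce without running a broker, they can be mimicked in plain Java over an in-memory list of values. This is a sketch, not Kafka Streams: the class `DslSemanticsSketch` and its methods are hypothetical, and the returned list of running sums stands in for the changelog of the `sum-of-odds` KTable (one update per incoming odd record).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java model of the DSL example (no Kafka involved).
class DslSemanticsSketch {

    // Stateless: each output depends only on the current record, like mapValues(v -> v * 2).
    static List<Integer> doubled(List<Integer> input) {
        List<Integer> out = new ArrayList<>();
        for (int v : input) {
            out.add(v * 2);
        }
        return out;
    }

    // Stateful: keep odd values, re-key everything to 1, and maintain a running sum.
    // Each element of the result is the table value for key 1 after one update,
    // i.e. the changelog of the "sum-of-odds" KTable.
    static List<Integer> sumOfOddsChangelog(List<Integer> input) {
        List<Integer> updates = new ArrayList<>();
        Integer sum = null; // no value for key 1 until the first odd record arrives
        for (int v : input) {
            if (v % 2 != 0) {
                sum = (sum == null) ? v : sum + v;
                updates.add(sum);
            }
        }
        return updates;
    }
}
```

For the input 1, 2, 3, 4, 5, `doubled` yields 2, 4, 6, 8, 10, while the sum-of-odds table is updated three times: 1, then 4, then 9. The stateless step needs no memory between records; the stateful step must remember the running sum, which is exactly the state Kafka Streams has to keep fault-tolerant.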
Alright, can you show me some code now?
• API option 2: low-level Processor API (imperative)
    • The slide’s code is annotated with four lifecycle steps: Startup, Process a record, Periodic action, Shutdown
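The four lifecycle steps correspond to the methods of the 0.10.0 `org.apache.kafka.streams.processor.Processor` interface: `init(ProcessorContext)`, `process(K, V)`, `punctuate(long)` and `close()`. So that the sketch runs without the Kafka jars, it declares its own stand-in interface, `MiniProcessor`, which is hypothetical: it keeps the same four steps but drops the `ProcessorContext`, and a plain list replaces `context.forward(...)`. The example counts words and emits the counts on every periodic call.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the Processor API lifecycle; the real interface
// receives a ProcessorContext in init() and forwards results through it.
interface MiniProcessor<K, V> {
    void init();                      // Startup
    void process(K key, V value);     // Process a record
    void punctuate(long timestamp);   // Periodic action
    void close();                     // Shutdown
}

// Counts occurrences of each word and "emits" a snapshot on every punctuate().
class WordCountProcessor implements MiniProcessor<String, String> {
    private Map<String, Long> counts;
    // Stands in for context.forward(...): entries look like "timestamp word=count".
    final List<String> emitted = new ArrayList<>();

    @Override
    public void init() {
        counts = new HashMap<>();
    }

    @Override
    public void process(String key, String word) {
        counts.merge(word, 1L, Long::sum);
    }

    @Override
    public void punctuate(long timestamp) {
        // Sort the words so the emitted order is deterministic.
        counts.keySet().stream().sorted()
              .forEach(w -> emitted.add(timestamp + " " + w + "=" + counts.get(w)));
    }

    @Override
    public void close() {
        counts = null; // release resources held by this processor
    }
}
```

Feeding "kafka", "streams", "kafka" and then calling `punctuate(1000L)` emits `1000 kafka=2` and `1000 streams=1`. The imperative style puts the developer in charge of state and timing, which is the trade-off against the declarative DSL.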
How do I install Kafka Streams?
• There is, and there should be, no “install”.
• It’s a library. Add it to your app like any other library:

    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka-streams</artifactId>
        <version>0.10.0.0</version>
    </dependency>
Do I need to install a CLUSTER to run my apps?
• No, you don’t. Kafka Streams allows you to stay lean and lightweight.
• Unlearn bad habits: “do cool stuff with data != must have cluster”
Ok.
How do I package and deploy my apps? How do I …?
• Whatever works for you. Stick to what you/your company think is the best way.
• Why? Because an app that uses Kafka Streams is…a normal Java app.
• Your Ops/SRE/InfoSec teams may finally start to love, not hate, you.
Kafka concepts recap
Kafka concepts
Kafka Streams concepts
Stream: ordered, re-playable, fault-tolerant sequence of immutable data records
Processor topology: computational logic of an app’s data processing
Stream partitions and stream tasks: units of parallelism
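As a rough model of “units of parallelism”: each input partition becomes one stream task, and the tasks are spread over the running application instances (or threads). The sketch below deals tasks out round-robin; the real assignment is performed by Kafka Streams’ own partition assignor and is more involved, so `TaskAssignmentSketch` and the `task-N` naming are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: one stream task per input partition, dealt round-robin to instances.
class TaskAssignmentSketch {

    // Returns, for each instance index, the ids of the tasks it would run.
    static List<List<String>> assignTasks(int partitions, int instances) {
        if (instances < 1) {
            throw new IllegalArgumentException("need at least one instance");
        }
        List<List<String>> assignment = new ArrayList<>();
        for (int i = 0; i < instances; i++) {
            assignment.add(new ArrayList<>());
        }
        for (int p = 0; p < partitions; p++) {
            // Task "task-p" processes partition p (and owns any state for it).
            assignment.get(p % instances).add("task-" + p);
        }
        return assignment;
    }
}
```

With 4 partitions and 3 instances, instance 0 runs `task-0` and `task-3` and the other two run one task each. With 5 instances the fifth gets no task at all, which illustrates why the number of input partitions caps how far an application can scale out.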
Streams meet Tables
• A stream is a changelog of a table.
• A table is a materialized view of a stream at a point in time.
Streams meet Tables – in the Kafka Streams DSL
(Figure: the records (alice, 2), (bob, 10), (alice, 3) arriving over time.)
• KStream interprets the data as a record stream: “Alice clicked 2 + 3 = 5 times.”
• KTable interprets the data as a changelog stream, i.e. a continuously updated materialized view: “Alice clicked 2 times,” then “Alice clicked 3 times.”
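The two interpretations can be reproduced with the slide’s three records in a few lines of plain Java. Read as a record stream, every record is an independent event, so alice’s clicks add up to 2 + 3 = 5; read as a changelog, each record replaces the previous value for its key, so the table ends with alice = 3. The class `StreamTableDuality` is a hypothetical illustration, not a Kafka Streams API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of KStream vs. KTable semantics on the same data.
class StreamTableDuality {

    // Record-stream view (KStream): every record is a new fact, so values accumulate.
    static Map<String, Integer> sumAsRecordStream(List<? extends Map.Entry<String, Integer>> records) {
        Map<String, Integer> totals = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> r : records) {
            totals.merge(r.getKey(), r.getValue(), Integer::sum);
        }
        return totals;
    }

    // Changelog view (KTable): each record is an update that overwrites the previous
    // value for its key; replaying the whole log materializes the table.
    static Map<String, Integer> materializeAsTable(List<? extends Map.Entry<String, Integer>> records) {
        Map<String, Integer> table = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> r : records) {
            table.put(r.getKey(), r.getValue());
        }
        return table;
    }
}
```

With (alice, 2), (bob, 10), (alice, 3), `sumAsRecordStream` gives alice = 5 and bob = 10, while `materializeAsTable` gives alice = 3 and bob = 10. Replaying the changelog to rebuild a table is also the idea behind restoring state after a failure.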
Streams meet Tables – in the Kafka Streams DSL
• JOIN example: compute user clicks by region via KStream.leftJoin(KTable)
(Figure: input KStream records alice 13 and bob 5; leftJoin() with the user-region KTable pairs each click count with the user’s region, e.g. (europe, 13) and (europe, 5); map() re-keys the records by region; reduceByKey(_ + _) sums them into a KTable whose europe entry goes from 13 to 18.)
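The clicks-per-region pipeline can be traced in plain Java, with a `Map` standing in for the user-region KTable. The sketch follows the three steps of the figure, `leftJoin()`, `map()` and `reduceByKey(_ + _)`, and returns the changelog of the resulting table. The class and record names, the in-memory data, and the `"UNKNOWN"` fallback for users missing from the table are all assumptions; in Kafka Streams the join and aggregation run continuously over topics, and a left join passes `null` for a missing table value.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory trace of clicks.leftJoin(regions) -> map() -> reduceByKey(_ + _).
class ClicksPerRegionSketch {

    // One record of the input KStream: (user, number of clicks).
    static final class Click {
        final String user;
        final long count;

        Click(String user, long count) {
            this.user = user;
            this.count = count;
        }
    }

    // regionTable stands in for the user -> region KTable.
    // Returns the changelog of the resulting region -> total KTable as "region=total".
    static List<String> run(List<Click> clicks, Map<String, String> regionTable) {
        Map<String, Long> totals = new HashMap<>();
        List<String> changelog = new ArrayList<>();
        for (Click c : clicks) {
            // leftJoin(): keep the click even if the user has no region (assumed fallback).
            String region = regionTable.getOrDefault(c.user, "UNKNOWN");
            // map(): re-key from user to region; reduceByKey(_ + _): add to the running total.
            long total = totals.merge(region, c.count, Long::sum);
            changelog.add(region + "=" + total);
        }
        return changelog;
    }
}
```

With alice and bob both mapped to europe, the inputs (alice, 13) and (bob, 5) produce the table updates `europe=13` and then `europe=18`, matching the figure.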
Kafka Streams key features
Key features in 0.10
• Native, 100%-compatible Kafka integration
    • Also inherits Kafka’s security model, e.g. to encrypt data-in-transit
    • Uses Kafka as its internal messaging layer, too
• Highly scalable
• Fault-tolerant
• Elastic
Key features in 0.10 (continued)
• Stateful and stateless computations (e.g. joins, aggregations)
Fault tolerance
• State stores (diagram slides; the figures show per-key counts for alice, bob and charlie kept in local state stores)
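Kafka Streams makes local state fault-tolerant by recording every update to a state store in a changelog topic, so a replacement instance can rebuild the store by replaying that log. The sketch below models the idea with an in-memory list in place of the topic; `ChangeloggedStore` and its methods are hypothetical, and a real deployment writes the changelog to Kafka rather than to a `List`.

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical key-value store whose updates are mirrored to a changelog.
class ChangeloggedStore {
    private final Map<String, Long> local = new HashMap<>();
    private final List<Map.Entry<String, Long>> changelog; // stands in for a Kafka topic

    ChangeloggedStore(List<Map.Entry<String, Long>> changelog) {
        this.changelog = changelog;
        restore();
    }

    // Rebuild local state by replaying the changelog from the beginning;
    // later entries for the same key overwrite earlier ones.
    private void restore() {
        for (Map.Entry<String, Long> e : changelog) {
            local.put(e.getKey(), e.getValue());
        }
    }

    void put(String key, long value) {
        local.put(key, value);
        changelog.add(new AbstractMap.SimpleEntry<>(key, value)); // durable copy of the update
    }

    Long get(String key) {
        return local.get(key);
    }
}
```

If one instance writes alice = 1, bob = 1, charlie = 3 and then alice = 2 before crashing, a new `ChangeloggedStore` built from the same changelog sees alice = 2 and charlie = 3. Because the log only needs the latest value per key, Kafka can keep such topics compact.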
Key features in 0.10 (continued)
• Time model
Time
• You configure the desired time semantics through timestamp extractors
• The default extractor yields event-time semantics
    • It extracts the timestamps embedded in Kafka messages (introduced in v0.10)
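In 0.10.0 an extractor implements `org.apache.kafka.streams.processor.TimestampExtractor` and is selected through `StreamsConfig.TIMESTAMP_EXTRACTOR_CLASS_CONFIG`; the built-in options include `ConsumerRecordTimestampExtractor` (the embedded message timestamp) and `WallclockTimestampExtractor` (processing time). To stay runnable without the Kafka jars, the sketch uses hypothetical stand-ins, `Rec` and `Extractor`, plus an assumed payload format for a third, custom extractor. The point is that the extractor alone decides which notion of time the rest of the topology sees.

```java
// Hypothetical stand-in for a Kafka record carrying an embedded timestamp.
class Rec {
    final long embeddedTimestamp; // e.g. CreateTime set by the producer (KIP-32)
    final String payload;         // assumed format: "<eventTimeMillis>,<data>"

    Rec(long embeddedTimestamp, String payload) {
        this.embeddedTimestamp = embeddedTimestamp;
        this.payload = payload;
    }
}

// Hypothetical stand-in for the TimestampExtractor idea.
interface Extractor {
    long extract(Rec record);
}

class Extractors {
    // Event time from the message metadata (the idea behind the 0.10 default).
    static final Extractor EMBEDDED = r -> r.embeddedTimestamp;

    // Processing time: ignore the record and use the local clock.
    static final Extractor WALLCLOCK = r -> System.currentTimeMillis();

    // Event time parsed from the payload itself (hypothetical payload format).
    static final Extractor FROM_PAYLOAD =
        r -> Long.parseLong(r.payload.substring(0, r.payload.indexOf(',')));
}
```

For `new Rec(1000L, "42,click")`, `EMBEDDED` returns 1000, `FROM_PAYLOAD` returns 42, and `WALLCLOCK` returns the current time, so the same record can land in very different windows depending on the configured extractor.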
Key features in 0.10 (continued)
• Windowing
Key features in 0.10 (continued)
• Supports late-arriving and out-of-order data
• Millisecond processing latency, no micro-batching
• At-least-once processing guarantees (exactly-once is in the works)
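Windowing reduces to arithmetic on the record timestamp. A hopping window of length `size` that advances by `advance` covers [start, start + size), with every start a multiple of `advance`; a tumbling window is the case `advance == size`. Because windowed results are kept as state keyed by window start, a late or out-of-order record simply updates the window it belongs to. The sketch below is hypothetical (names, in-memory state, no retention limit); Kafka Streams itself keeps old windows only for a configurable retention period.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical windowed counter: counts records per window start time.
class WindowedCountSketch {
    private final long size;
    private final long advance;
    private final Map<Long, Long> countsByWindowStart = new TreeMap<>();

    WindowedCountSketch(long size, long advance) {
        if (advance <= 0 || advance > size) {
            throw new IllegalArgumentException("need 0 < advance <= size");
        }
        this.size = size;
        this.advance = advance;
    }

    // All window starts whose [start, start + size) interval contains ts (assumes ts >= 0).
    List<Long> windowsFor(long ts) {
        List<Long> starts = new ArrayList<>();
        long latest = ts - (ts % advance); // newest window that contains ts
        for (long start = latest; start >= 0 && start > ts - size; start -= advance) {
            starts.add(0, start); // keep ascending order
        }
        return starts;
    }

    // Records may arrive in any order; each one updates every window it falls into.
    void add(long ts) {
        for (long start : windowsFor(ts)) {
            countsByWindowStart.merge(start, 1L, Long::sum);
        }
    }

    long count(long windowStart) {
        return countsByWindowStart.getOrDefault(windowStart, 0L);
    }
}
```

With one-minute tumbling windows, records at 65000, 130000, 5000 and 59999 ms (the last two arriving late) end up as counts of 2, 1 and 1 for the windows starting at 0, 60000 and 120000. With `size = 10` and `advance = 5`, timestamp 12 falls into the windows starting at 5 and 10.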
Where to go from here?
• Kafka Streams is available in Apache Kafka 0.10 and Confluent Platform 3.0
    • http://kafka.apache.org/
    • http://www.confluent.io/download (free + enterprise versions, tar/zip/deb/rpm)
• Kafka Streams demos at https://github.com/confluentinc/examples
    • Java 7, Java 8+ with lambdas, and Scala
    • WordCount, Joins, Avro integration, Top-N computation, Windowing, …
• Apache Kafka documentation: http://kafka.apache.org/documentation.html
• Confluent documentation: http://docs.confluent.io/3.0.0/streams/
    • Quickstart, Concepts, Architecture, Developer Guide, FAQ
• Join our bi-weekly Ask Me Anything sessions on Kafka Streams
    • Contact me at eno@confluent.io for details
Some of the things to come
• Exactly-once semantics
• Queryable state – tap into the state of your applications (KIP-67: adopted)
• SQL interface
• Listen to and collaborate with the developer community
    • Your feedback counts a lot! Share it via users@kafka.apache.org
Want to contribute to Kafka and open source? Join the Kafka community: http://kafka.apache.org/
…in a great team with the creators of Kafka? Confluent is hiring: http://confluent.io/
Questions, comments? Tweet with #bbuzz and /cc to @ConfluentInc
Backup
Details on other KIPs (Slides contributed by Ismael Juma)
KIP-4 Metadata
- Update MetadataRequest and MetadataResponse
- Expose new fields for KIP-4 (not used yet)
- Make it possible to ask for cluster information with no topics
- Fix a nasty bug where the request would be sent repeatedly if the producer was started and left unused for more than 5 minutes (KAFKA-3602)
KIP-31 Relative offsets in compressed message sets
- Message format change (affects FetchRequest, ProduceRequest and the on-disk format)
- Avoids recompression to assign offsets
- Improves broker latency
- Should also improve throughput, but it can affect producer batch sizes and so reduce throughput in some cases; tune linger.ms and batch.size
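The offset arithmetic behind KIP-31 is small enough to show directly. As the KIP describes it, the messages inside a compressed wrapper carry relative offsets 0, 1, ..., n-1, and the broker only stamps the wrapper with the absolute offset of the last inner message, so it no longer has to decompress and recompress the set. A consumer then derives each inner message’s absolute offset from the wrapper. `RelativeOffsets` is a hypothetical helper illustrating that formula, not code from the Kafka client.

```java
// Hypothetical helper illustrating KIP-31 offset arithmetic.
class RelativeOffsets {
    /**
     * @param wrapperOffset     absolute offset the broker assigned to the wrapper message,
     *                          i.e. the absolute offset of the last inner message
     * @param lastInnerRelative relative offset of the last inner message (n - 1)
     * @param innerRelative     relative offset of the inner message of interest
     * @return absolute offset of the inner message
     */
    static long absoluteOffset(long wrapperOffset, long lastInnerRelative, long innerRelative) {
        return wrapperOffset - (lastInnerRelative - innerRelative);
    }
}
```

If a compressed set of three messages is appended when the log end offset is 100, the broker gives the wrapper offset 102, and the inner messages with relative offsets 0, 1 and 2 resolve to 100, 101 and 102.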
KIP-32 Add timestamps to Kafka messages
- CreateTime or LogAppendTime
- Increases message size by 8 bytes
- Small throughput degradation, particularly for small messages
- Be careful not to exceed the network limit because of this increase
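Why the extra 8 bytes hurt mostly small messages becomes clear from the per-message overhead. The sketch assumes the pre-0.11 layout for an uncompressed message: 12 bytes of log overhead (offset and size) plus a 14-byte header (CRC, magic byte, attributes, key length, value length) in the old format, and the same plus an 8-byte timestamp in the new one. Those header sizes are an assumption to verify against the Kafka protocol guide; `TimestampOverhead` is a hypothetical calculator.

```java
// Hypothetical back-of-the-envelope calculator for the KIP-32 size increase.
class TimestampOverhead {
    static final int LOG_OVERHEAD = 8 + 4;            // offset + message size
    static final int OLD_HEADER = 4 + 1 + 1 + 4 + 4;  // crc, magic, attributes, key len, value len
    static final int TIMESTAMP = 8;                   // added by KIP-32

    static int oldSize(int keyBytes, int valueBytes) {
        return LOG_OVERHEAD + OLD_HEADER + keyBytes + valueBytes;
    }

    static int newSize(int keyBytes, int valueBytes) {
        return oldSize(keyBytes, valueBytes) + TIMESTAMP;
    }

    // Relative growth in percent for one uncompressed message.
    static double growthPercent(int keyBytes, int valueBytes) {
        return 100.0 * TIMESTAMP / oldSize(keyBytes, valueBytes);
    }
}
```

Under these assumptions, a keyless 10-byte message grows from 36 to 44 bytes (about 22%), while a 1000-byte message grows by less than 1%.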
Migration from V1 to V2 format
- Read the upgrade notes
- 0.10 producer produces in the new format
- 0.10 broker can store in the old or new format depending on config
- 0.10 consumers can use either format
- 0.9 consumers only support the old format
- Broker can do conversion on the fly (with a performance impact)
KIP-35 Retrieving protocol version
- Request type that returns all the requests and versions supported by the broker
- Aim is for clients to use this to help them support multiple broker versions
- Not used by the Java client yet
- Used by librdkafka and kafka-python
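A client that wants to support several broker versions can use the returned version ranges like this: for each request type, pick the highest version both sides support, and fall back or fail if the ranges do not overlap. `VersionNegotiation` is a hypothetical illustration of that selection logic; the numbers below are made up and are not the real ranges of any Kafka API.

```java
import java.util.Optional;

// Hypothetical version selection in the spirit of KIP-35.
class VersionNegotiation {
    // Highest version in the intersection of [clientMin, clientMax] and
    // [brokerMin, brokerMax], or empty if the two ranges do not overlap.
    static Optional<Integer> pick(int clientMin, int clientMax, int brokerMin, int brokerMax) {
        int high = Math.min(clientMax, brokerMax);
        int low = Math.max(clientMin, brokerMin);
        return high >= low ? Optional.of(high) : Optional.empty();
    }
}
```

A client supporting versions 0 to 2 of some request, talking to a broker that supports 0 to 1, would use version 1; a client that only speaks 3 to 4 against a broker with 0 to 2 gets no common version and must report the incompatibility.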
KIP-36 Rack aware replica assignment
- Kafka can now run with a rack awareness feature that isolates replicas so they are guaranteed to span multiple racks or availability zones. This allows all of Kafka’s durability guarantees to be applied to these larger architectural units, significantly increasing availability
- Old clients must be upgraded to 0.9.0.1 before going to 0.10.0.0
- broker.rack in server.properties
- Can be disabled when launching the reassignment tool
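The core idea can be shown with a simplified placement: order the brokers so that consecutive entries alternate between racks (as configured with `broker.rack`), then give each partition consecutive brokers from that list. When every rack holds the same number of brokers and there are at least as many racks as the replication factor, each partition’s replicas land on distinct racks. This is a simplification with hypothetical names; Kafka’s own implementation (in `kafka.admin.AdminUtils`) also randomizes starting positions and shifts follower placement to balance load.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified rack-aware replica placement.
class RackAwareSketch {

    // Interleave brokers by rack: first broker of each rack, then the second of each, ...
    static List<Integer> rackAlternated(Map<Integer, String> brokerRack) {
        Map<Integer, String> sorted = new TreeMap<>(brokerRack);
        Map<String, List<Integer>> byRack = new TreeMap<>();
        for (Map.Entry<Integer, String> e : sorted.entrySet()) {
            byRack.computeIfAbsent(e.getValue(), r -> new ArrayList<>()).add(e.getKey());
        }
        List<Integer> result = new ArrayList<>();
        for (int i = 0; result.size() < brokerRack.size(); i++) {
            for (List<Integer> brokers : byRack.values()) {
                if (i < brokers.size()) {
                    result.add(brokers.get(i));
                }
            }
        }
        return result;
    }

    // Partition p gets replicationFactor consecutive brokers starting at position p.
    static Map<Integer, List<Integer>> assign(Map<Integer, String> brokerRack,
                                              int partitions, int replicationFactor) {
        List<Integer> brokers = rackAlternated(brokerRack);
        if (brokers.size() < replicationFactor) {
            throw new IllegalArgumentException("not enough brokers");
        }
        Map<Integer, List<Integer>> assignment = new LinkedHashMap<>();
        for (int p = 0; p < partitions; p++) {
            List<Integer> replicas = new ArrayList<>();
            for (int r = 0; r < replicationFactor; r++) {
                replicas.add(brokers.get((p + r) % brokers.size()));
            }
            assignment.put(p, replicas);
        }
        return assignment;
    }
}
```

With brokers 1 and 2 in rack r1 and brokers 3 and 4 in rack r2, the alternated order is 1, 3, 2, 4, and four partitions with replication factor 2 get [1, 3], [3, 2], [2, 4] and [4, 1]: every pair spans both racks.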
New consumer enhancements
- KIP-41 KafkaConsumer Max Records
- KIP-42: Add Producer and Consumer Interceptors
- KIP-45 Standardize all client sequence interaction on j.u.Collection
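Both KIP-41 and KIP-42 are switched on through consumer configuration. The snippet builds the relevant `java.util.Properties`: `max.poll.records` caps how many records a single `poll()` returns, and `interceptor.classes` lists the `ConsumerInterceptor` implementations to load. The broker address, group id and the interceptor class `com.example.AuditInterceptor` are placeholders; the resulting properties would be passed to a `KafkaConsumer` constructor.

```java
import java.util.Properties;

// Consumer settings for KIP-41 and KIP-42; all values are placeholders.
class ConsumerSettings {
    static Properties build() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("group.id", "example-group");           // placeholder consumer group
        // KIP-41: return at most 100 records from each poll() call.
        props.put("max.poll.records", "100");
        // KIP-42: comma-separated list of interceptor classes
        // (com.example.AuditInterceptor is hypothetical).
        props.put("interceptor.classes", "com.example.AuditInterceptor");
        return props;
    }
}
```

Bounding `max.poll.records` keeps the processing time per poll predictable, which in 0.10.0 also helps the consumer call `poll()` often enough to stay in its group.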
KIP-43 Kafka SASL enhancements
- Multiple SASL mechanisms: PLAIN and Kerberos included
- Pluggable
- Added support for protocol evolution
KIP-57 - Interoperable LZ4 Framing
- It was broken; fixed in 0.10, taking advantage of the message format bump
Connect KIPs
- KIP-51 - List Connectors REST API
- KIP-52: Connector Control APIs
- KIP-56: Allow cross origin HTTP requests on all HTTP methods
Lots of bugs fixed
- Producer ordering, SocketServer leaks, New Consumer, Offset handling in the broker
- http://mirrors.muzzy.org.uk/apache/kafka/0.10.0.0/RELEASE_NOTES.html