Real-Time Streaming with Python ML Inference
Marko Topolnik

About Us
Hazelcast started as an In-Memory Data Grid project (distributed caching with data-local computation)
● Java codebase, polyglot clients
● we focus on simplicity
● in 2017 we added Hazelcast Jet: in-memory distributed stream processing
My role: tech lead of the Hazelcast Jet core team

Example: Salary Prediction
Python, scikit-learn, Random Forest

Sample Input and Output
Input:
    {
      "age": 25,
      "workclass": "Self-emp",
      "fnlwgt": 176756,
      "education": "HS-grad",
      "education-num": 9,
      "marital-status": "Never-married",
      "occupation": "Farming-fishing",
      "relationship": "Own-child",
      "capital-gain": 0,
      "capital-loss": 0,
      "hours-per-week": 35,
      "native-country": "United-States"
    }
Output:
    {
      "probability": 0.85,
      "income": "<=50K"
    }

(Showing Project Directory)

We have a Web Service Doing ML!
(Diagram: Client → REST service)

Data Science: The Hype
(Diagram: Do Data Science → Profit!)

Data Science: The Reality
(Diagram: the path from "Do Data Science" to "Profit (fingers crossed)" passes through Package, Deploy, Scale Up, Scale Out, Load-Balance, Monitor and Re-Train)

Productionizing the REST service
(Diagram: Requests #1, #2 and #3 all go to a single REST service)
Parallelism?

Productionizing the REST service
(Diagram: Requests #1, #2 and #3, each served by its own REST service instance)
Load-Balancing?

Productionizing the REST service
(Diagram: Requests #1, #2 and #3 pass through a Load Balancer to multiple REST service instances)
Batching?
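The load balancer's job can be sketched as round-robin dispatch over a fixed set of service instances. This is a minimal Java sketch with these assumptions: `RoundRobinBalancer` is a hypothetical class, and each backend is a local `Function` standing in for one REST instance. A production balancer would forward HTTP requests and track instance health.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical round-robin load balancer. Each backend is a local Function that
// stands in for one REST inference instance; real balancers forward HTTP requests.
class RoundRobinBalancer<Req, Resp> {
    private final List<Function<Req, Resp>> backends;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinBalancer(List<Function<Req, Resp>> backends) {
        if (backends.isEmpty()) {
            throw new IllegalArgumentException("at least one backend is required");
        }
        this.backends = List.copyOf(backends);
    }

    // Sends requests to backends 0, 1, ..., n-1, 0, 1, ... in turn.
    // floorMod keeps the index valid even after the counter overflows.
    Resp handle(Req request) {
        int index = Math.floorMod(counter.getAndIncrement(), backends.size());
        return backends.get(index).apply(request);
    }
}

class RoundRobinDemo {
    public static void main(String[] args) {
        List<Function<String, String>> instances =
                List.of(r -> "A:" + r, r -> "B:" + r, r -> "C:" + r);
        RoundRobinBalancer<String, String> balancer = new RoundRobinBalancer<>(instances);
        for (int i = 1; i <= 4; i++) {
            System.out.println(balancer.handle("request-" + i));
        }
    }
}
```

Round-robin spreads requests evenly. It does not account for how busy each instance currently is.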

Effect of Batching on Throughput
(Chart)
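The chart itself is not reproduced here. A simple cost model is one common way to reason about why batching raises throughput. Note that this model is an assumption of the sketch, not the data behind the chart. Each call into the model pays a fixed overhead `c0` (request handling, crossing into Python, and so on) plus a per-item cost `c1`. A batch of `b` items then takes `c0 + b * c1`, and throughput is `b / (c0 + b * c1)`, which approaches `1 / c1` as `b` grows. The method name `throughputPerSecond` and the 5 ms / 1 ms figures are hypothetical.

```java
// Hypothetical cost model: a batch of b items costs a fixed overhead plus a
// per-item cost, so throughput = b / (overheadMs + b * perItemMs) items per ms.
class BatchingModel {
    static double throughputPerSecond(int batchSize, double overheadMs, double perItemMs) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        double batchTimeMs = overheadMs + batchSize * perItemMs;
        return batchSize / batchTimeMs * 1000.0; // convert items/ms to items/s
    }

    public static void main(String[] args) {
        // Assumed numbers, not measurements: 5 ms overhead per call, 1 ms per item.
        for (int b : new int[] {1, 10, 100, 1000}) {
            System.out.printf("batch=%4d -> %.1f items/s%n", b, throughputPerSecond(b, 5.0, 1.0));
        }
    }
}
```

With these assumed costs, single-item calls reach about 167 items/s, while batches of 100 reach about 952 items/s. That is close to the 1000 items/s ceiling set by the per-item cost. The time per batch also grows with its size, so a larger batch trades extra latency for throughput.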

Productionizing the REST service
(Diagram: Requests #1, #2 and #3, a Batch Request Queue, a Load Balancer, REST service instances processing batches, and a Response Queue)
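The batch request queue and response queue can be sketched in-process. Callers enqueue single requests and each receives a `CompletableFuture`. A worker then drains whatever has accumulated (up to a maximum), runs the model once on the whole batch, and completes each caller's future. `MicroBatcher`, `processOneBatch` and the batch model function are hypothetical names. The model can be any function that returns one result per input, in input order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Hypothetical micro-batching front end for a batch-capable model.
class MicroBatcher<Req, Resp> {
    // A queued request together with the future its caller is waiting on.
    private static final class Pending<Q, R> {
        final Q request;
        final CompletableFuture<R> response = new CompletableFuture<>();
        Pending(Q request) { this.request = request; }
    }

    private final BlockingQueue<Pending<Req, Resp>> requestQueue = new LinkedBlockingQueue<>();
    private final Function<List<Req>, List<Resp>> batchModel;
    private final int maxBatch;

    MicroBatcher(Function<List<Req>, List<Resp>> batchModel, int maxBatch) {
        if (maxBatch < 1) throw new IllegalArgumentException("maxBatch must be >= 1");
        this.batchModel = batchModel;
        this.maxBatch = maxBatch;
    }

    // Called by request handlers; the future completes when the batch is done.
    CompletableFuture<Resp> submit(Req request) {
        Pending<Req, Resp> pending = new Pending<>(request);
        requestQueue.add(pending);
        return pending.response;
    }

    // Called in a loop by a worker thread. Waits up to timeoutMs for a first
    // request, then takes any others already queued, up to maxBatch in total.
    // Returns the number of requests processed (0 if none arrived).
    int processOneBatch(long timeoutMs) {
        Pending<Req, Resp> first;
        try {
            first = requestQueue.poll(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return 0;
        }
        if (first == null) return 0;
        List<Pending<Req, Resp>> batch = new ArrayList<>();
        batch.add(first);
        requestQueue.drainTo(batch, maxBatch - 1);

        List<Req> inputs = new ArrayList<>(batch.size());
        for (Pending<Req, Resp> p : batch) inputs.add(p.request);
        try {
            // One model call for the whole batch; results map back by position.
            List<Resp> outputs = batchModel.apply(inputs);
            if (outputs.size() != inputs.size()) {
                throw new IllegalStateException("model must return one result per input");
            }
            for (int i = 0; i < batch.size(); i++) batch.get(i).response.complete(outputs.get(i));
        } catch (RuntimeException e) {
            for (Pending<Req, Resp> p : batch) p.response.completeExceptionally(e);
        }
        return batch.size();
    }
}
```

A worker thread would call `processOneBatch` in a loop. Under light load, batches stay small. Under heavy load, they grow toward `maxBatch`, which is where batching pays off.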

Replace REST with Distributed Streaming
(Diagram: Requests #1, #2 and #3 → Kafka → Hazelcast Jet Cluster)
$ jet submit ML

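Hazelcast Jet Code

    // Kafka.source() and Kafka.sink() are slide shorthand for configured Kafka connectors;
    // mapUsingPython is statically imported from com.hazelcast.jet.python.PythonTransforms
    Pipeline p = Pipeline.create();
    p.readFrom(Kafka.source())
     .withoutTimestamps()   // a stream source stage must pick a timestamp policy before apply()
     .apply(mapUsingPython(new PythonServiceConfig()
             .setBaseDir("/Users/mtopol/dev/python/sklearn")
             .setHandlerModule("example_1_inference_jet")))
     .writeTo(Kafka.sink());
    jet.newJob(p);

    $ mvn package
    $ jet submit target/my-job.jar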

Jet Job's Execution Plan
(Diagram: Kafka Topic A → Jet Node 1: Source → Batch → mapUsingPython, backed by Python processes on the node → Sink → Kafka Topic B)

Let's Start a Jet Cluster!

Jet's Stream Operators
● windowed aggregation using event time
  ○ sliding, session window
  ○ count, sum, average, linear regression, ...
  ○ custom aggregate function
● rolling aggregation
● streaming join (co-grouping)
● hash join (enrichment)
● contact arbitrary external services
  ○ mapUsingPython uses this
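To make "windowed aggregation using event time" concrete, here is a sliding-window count written from scratch in Java. It is not how Jet implements windowing, since Jet aggregates incrementally over an unbounded stream. It only shows which windows each event lands in. `SlidingWindowCount` is a hypothetical name. Windows are half-open `[start, start + windowSize)`, and window starts are assumed to be multiples of `slideBy`.

```java
import java.util.List;
import java.util.TreeMap;

// Batch-style sketch of a sliding-window count over event time (not Jet's
// incremental implementation). A window covers [start, start + windowSize) and
// window starts are multiples of slideBy.
class SlidingWindowCount {
    // Maps each window start to the number of events whose timestamp falls in it.
    static TreeMap<Long, Long> count(List<Long> eventTimes, long windowSize, long slideBy) {
        if (windowSize <= 0 || slideBy <= 0) {
            throw new IllegalArgumentException("windowSize and slideBy must be positive");
        }
        TreeMap<Long, Long> counts = new TreeMap<>();
        for (long t : eventTimes) {
            // The latest window containing t starts at the largest multiple of slideBy <= t;
            // earlier windows still contain t as long as start > t - windowSize.
            long latestStart = Math.floorDiv(t, slideBy) * slideBy;
            for (long start = latestStart; start > t - windowSize; start -= slideBy) {
                counts.merge(start, 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Events at times 1, 6 and 12; 10-unit windows sliding by 5.
        System.out.println(count(List.of(1L, 6L, 12L), 10, 5)); // {-5=1, 0=2, 5=2, 10=1}
    }
}
```

Each event falls into `windowSize / slideBy` windows, here two. A session window differs: its boundaries come from gaps between events rather than from a fixed grid.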

Jet's Cooperative Multithreading (1/2)
(Diagram: a job DAG Source → Flat Map + Filter → Group + Accumulate → Combine → Sink, with edges labeled "local round-robin", "local partitioned" and "distributed partitioned")
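The edge labels in the diagram name routing policies between stages. The sketch below contrasts two of them:

● round-robin, which spreads items evenly without looking at them
● partitioned, which sends every item with the same key to the same consumer, so a grouping stage sees all items of a key in one place

"Local" and "distributed" refer to whether the consumers sit on the same node or are spread across the cluster; this sketch does not model the network. `EdgeRouting` is hypothetical, and `String.hashCode()` stands in for Jet's own partitioning function.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of two edge routing policies. Each inner list is
// the input queue of one downstream processor.
class EdgeRouting {
    // Round-robin: item i goes to consumer i mod n, regardless of its content.
    static List<List<String>> roundRobin(List<String> items, int consumers) {
        List<List<String>> queues = emptyQueues(consumers);
        for (int i = 0; i < items.size(); i++) {
            queues.get(i % consumers).add(items.get(i));
        }
        return queues;
    }

    // Partitioned: the consumer is chosen from the item's key, so equal keys
    // always meet at the same consumer (here the item itself is the key).
    static List<List<String>> partitioned(List<String> keys, int consumers) {
        List<List<String>> queues = emptyQueues(consumers);
        for (String key : keys) {
            queues.get(Math.floorMod(key.hashCode(), consumers)).add(key);
        }
        return queues;
    }

    private static List<List<String>> emptyQueues(int consumers) {
        if (consumers < 1) throw new IllegalArgumentException("need at least one consumer");
        List<List<String>> queues = new ArrayList<>();
        for (int i = 0; i < consumers; i++) queues.add(new ArrayList<>());
        return queues;
    }

    public static void main(String[] args) {
        System.out.println(roundRobin(List.of("a", "b", "c", "d"), 2));      // [[a, c], [b, d]]
        System.out.println(partitioned(List.of("x", "y", "x", "x", "y"), 3)); // [[x, x, x], [y, y], []]
    }
}
```

Round-robin balances load but scatters equal keys. Partitioning keeps each key together at the cost of possible imbalance, which is why the grouping and combining steps sit behind partitioned edges.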

Jet's Cooperative Multithreading (2/2)
(Diagram: the tasklets Source, Flat Map + Filter, Group + Acc, Combine and Sink assigned to Cooperative Threads 1 and 2 and Non-Cooperative Threads 1 and 2, with the edges "From Accumulate" and "To Combine")
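The core idea of cooperative multithreading fits in a few lines. Each tasklet does a small, non-blocking slice of work per call and reports whether it has finished. A single thread then cycles through many tasklets instead of dedicating an OS thread to each. `CoopTasklet`, `CooperativeWorker` and the producer/consumer pair are hypothetical simplifications, not Jet's internal API. Work that must block is what the non-cooperative threads in the diagram are for.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical model of a cooperative tasklet: call() must do a bounded amount
// of work without blocking, then return true once the tasklet is finished.
interface CoopTasklet {
    boolean call();
}

class CooperativeWorker {
    // Runs every tasklet on the calling thread, giving each one a short turn per
    // pass, until all of them report that they are finished.
    static void runAll(List<CoopTasklet> tasklets) {
        List<CoopTasklet> active = new ArrayList<>(tasklets);
        while (!active.isEmpty()) {
            active.removeIf(CoopTasklet::call);
        }
    }
}

class CooperativeDemo {
    // A producer and a consumer share one thread; neither ever blocks.
    // Returns the sum of 1..n as computed by the consumer.
    static long sumViaTasklets(int n) {
        Queue<Integer> queue = new ArrayDeque<>();
        int[] next = {1};
        boolean[] producerDone = {false};
        long[] sum = {0};
        CoopTasklet producer = () -> {
            // Emit at most 2 items per turn, then yield.
            for (int i = 0; i < 2 && next[0] <= n; i++) queue.add(next[0]++);
            producerDone[0] = next[0] > n;
            return producerDone[0];
        };
        CoopTasklet consumer = () -> {
            // Consume at most 2 items per turn, then yield.
            for (int i = 0; i < 2 && !queue.isEmpty(); i++) sum[0] += queue.poll();
            return producerDone[0] && queue.isEmpty();
        };
        CooperativeWorker.runAll(List.of(producer, consumer));
        return sum[0];
    }

    public static void main(String[] args) {
        System.out.println(sumViaTasklets(10)); // 55
    }
}
```

No locks are needed because both tasklets run on one thread. The queue is never accessed concurrently.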

Cluster Elasticity and Resilience
● Jet jobs are fault-tolerant
● nodes can join and leave the cluster, and jobs keep running
● jobs automatically rescale to the available hardware
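Fault tolerance of this kind typically rests on two ingredients. First, processing state is saved periodically together with the source position it corresponds to. Second, the source can replay from that position, as Kafka can from an offset. The single-process sketch below shows only that idea; it is not Jet's distributed snapshot protocol. `SnapshotReplay`, `Snapshot` and the simulated crash are hypothetical.

```java
import java.util.List;

// Hypothetical single-process model of snapshot-and-replay recovery.
// State is a running sum; the "source" is a replayable log read by offset.
class SnapshotReplay {
    // Durable snapshot: the source offset and the state as of that offset.
    static final class Snapshot {
        final int offset;
        final long sum;
        Snapshot(int offset, long sum) { this.offset = offset; this.sum = sum; }
    }

    static final class SimulatedCrash extends RuntimeException {
        SimulatedCrash() { super("simulated node failure"); }
    }

    // Sums the log, snapshotting every `snapshotEvery` items. If crashAt >= 0, the
    // job fails once just before processing that offset, loses its in-memory state,
    // and restarts from the last snapshot, replaying the log from the saved offset.
    static long run(List<Integer> log, int snapshotEvery, int crashAt) {
        if (snapshotEvery < 1) throw new IllegalArgumentException("snapshotEvery must be >= 1");
        Snapshot saved = new Snapshot(0, 0);
        boolean crashed = false;
        while (true) {
            int offset = saved.offset; // restore source position
            long sum = saved.sum;      // restore state
            try {
                for (; offset < log.size(); offset++) {
                    if (!crashed && offset == crashAt) {
                        crashed = true;
                        throw new SimulatedCrash();
                    }
                    sum += log.get(offset);
                    if ((offset + 1) % snapshotEvery == 0) {
                        saved = new Snapshot(offset + 1, sum);
                    }
                }
                return sum;
            } catch (SimulatedCrash e) {
                // Everything since the last snapshot is lost; the loop restarts from it.
            }
        }
    }

    public static void main(String[] args) {
        List<Integer> log = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        System.out.println(run(log, 3, -1)); // 55, no failure
        System.out.println(run(log, 3, 7));  // 55, failure at offset 7, resumed from offset 6
    }
}
```

Restoring the offset together with the state is what keeps the result correct. Without the saved offset, a restart would have to replay the whole log from the beginning, or else risk skipping or double-counting items.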

Cluster Self-Formation
Hazelcast Jet natively supports:
● Amazon AWS
● Google GCP
● Kubernetes
With simple configuration, the nodes self-discover in these environments.

Source and Sink Connectors
● Kafka
● Change Data Capture: MySQL, PostgreSQL, ...
● HTTP: WebSocket, Server-Sent Events
● Hadoop HDFS
● S3 bucket
● JDBC
● JMS queue and topic

Thanks for attending!
Q&A
marko@hazelcast.com
@mtopolnik
