StreamING models: Realtime model deployment of ML capabilities

Erik de Nooij, IT Chapter Lead Fraud & Cybersecurity

Who Am I?
• IT Chapter Lead within the Fraud & Cybersecurity department, based in Amsterdam
• Before ING: implemented enterprise software, mainly knowledge management and CRM related
• Background in: Scala, Java, C# (MCSD), Tomcat, Websphere, Oracle, Cassandra and now… Flink
https://www.linkedin.com/in/erik-de-nooij-93ab1a/
Erik.g.de.Nooij@ing.nl

About ING

Worldwide
• 35 million customers
• 51,000 employees
• Presence in over 40 countries

The Netherlands
• 9 million customers
• Billion logins yearly on https://www.ing.nl
• 1 million transactions per day

Segments: Market leaders Benelux, Challengers, Growth markets, Commercial Banking

Threats related to fraud & cybersecurity

Timeline, 2008 to 2017 (threat, criminal organization):
• 2008: Fake ID, individuals
• 2010: Skimming, small groups
• 2012: Phishing, worldwide groups
• 2014: APT, organized crime
• 2017: ?

Response: Manual detection, Rule based detection, Model based detection, Scanomaly detection

Carbanak APT (Advanced Persistent Threat)
• This started via a phishing email…

Goals
• Support various types of (ML) models
  • Tools to create models versus scoring models
• One codebase, SaaS deployment model
  • Pre-processor, decoupled architecture
• Make changes instantly (no downtime)
  • Use case
  • Feature extraction
  • Enriching streams
  • End user tooling
  • Demo
• Multiple domains
  • Examples

Support various types of models

Creating models offline, scoring online
• Offline: model creation (HDFS)
• Portable model: <PMML />, {PFA}
• Online: model execution (streaming platform)

Predictive Model Markup Language (PMML)
• The Predictive Model Markup Language (PMML) is an XML-based predictive model interchange format

Rule:
    if field1 > 500 AND field2 == 1 AND field3 > 1

The same rule in PMML:
    <SimpleRule score="Alert" weight="1.0">
      <CompoundPredicate booleanOperator="and">
        <SimplePredicate field="field1" operator="greaterThan" value="500"/>
        <SimplePredicate field="field2" operator="equal" value="1"/>
        <SimplePredicate field="field3" operator="greaterThan" value="1"/>
      </CompoundPredicate>
    </SimpleRule>

Machine learning tools supporting PMML

Model scoring using OpenScoring.io library
• Parse the PMML file(s)
• Pass on the feature set to the model(s)
• Run the 'predict' function, which returns the output of the model(s)

Diagram: control stream and data stream (feature sets) -> model scoring -> score
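The three steps above can be sketched with the rule set from the PMML slide. This is a minimal hand-rolled stand-in that uses only Python's standard library; it is not the OpenScoring.io API. The names `parse_rule`, `predict`, `OPERATORS` and the default output "OK" are assumptions made for illustration, and only the operators that appear on the slide are handled.

```python
import xml.etree.ElementTree as ET

PMML_RULE = """
<SimpleRule score="Alert" weight="1.0">
  <CompoundPredicate booleanOperator="and">
    <SimplePredicate field="field1" operator="greaterThan" value="500"/>
    <SimplePredicate field="field2" operator="equal" value="1"/>
    <SimplePredicate field="field3" operator="greaterThan" value="1"/>
  </CompoundPredicate>
</SimpleRule>
"""

# Assumption: only the two operators used on the slide are supported here.
OPERATORS = {
    "greaterThan": lambda a, b: a > b,
    "equal": lambda a, b: a == b,
}


def parse_rule(xml_text):
    """Step 1: parse the PMML fragment into (score, predicates)."""
    rule = ET.fromstring(xml_text)
    compound = rule.find("CompoundPredicate")
    if compound.get("booleanOperator") != "and":
        raise ValueError("this sketch only handles booleanOperator='and'")
    predicates = [
        (p.get("field"), p.get("operator"), float(p.get("value")))
        for p in compound.findall("SimplePredicate")
    ]
    return rule.get("score"), predicates


def predict(model, feature_set, default="OK"):
    """Steps 2 and 3: pass a feature set to the model and return its output."""
    score, predicates = model
    fired = all(
        OPERATORS[op](feature_set[field], value)
        for field, op, value in predicates
    )
    return score if fired else default


model = parse_rule(PMML_RULE)
```

Calling `predict(model, {"field1": 600, "field2": 1, "field3": 2})` returns "Alert"; changing `field2` to 0 returns the assumed default "OK".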

Supported models (*)
• Association rules
• Regression
• Cluster model
• Rule set
• General regression
• Scorecard
• Naive Bayes
• Support Vector Machine
• k-Nearest neighbours
• Tree model
• Neural network
• Ensemble model
(*) models supported by http://openscoring.io/

Goals
• Use of various types of models
• One codebase, SaaS deployment model
  • Pre-processor, decoupled architecture
• Make changes instantly (no downtime)
• Multiple domains

One Bank Strategy
Market leaders Benelux, Challengers, Growth markets, Commercial Banking

How flexible is this architecture?
Incoming events carry the same amount in different formats:
• Amount = "42,00"
• Amountincents = 4200
• Amount = 42.00
All of them feed directly into feature extraction & model scoring.

Decoupled architecture
• Input formats: Amount = "42.00", Amountincents = 4200, Amount = 42.00
• PreProcessor: converts every input into business events with Amountincents = 4200
• Feature extraction & model scoring: consumes the business events only
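A pre-processor of this kind could look roughly like the sketch below. It assumes events are plain dictionaries and that a comma in a string amount is a decimal separator; the function names `amount_in_cents` and `pre_process` are made up for illustration, while the field names `Amount` and `Amountincents` are taken from the slides.

```python
from decimal import Decimal


def amount_in_cents(raw):
    """Normalise an amount (string with comma or dot, or a number) to integer cents."""
    if isinstance(raw, str):
        # Assumption: a comma in a string amount is a decimal separator ("42,00").
        raw = raw.strip().replace(",", ".")
    # Going through str() avoids binary floating point artefacts for numeric input.
    return int(Decimal(str(raw)) * 100)


def pre_process(event):
    """Return a canonical business event; the incoming event is left untouched."""
    if "Amountincents" in event:
        return dict(event)
    canonical = {k: v for k, v in event.items() if k != "Amount"}
    canonical["Amountincents"] = amount_in_cents(event["Amount"])
    return canonical
```

With this in place the feature extraction only ever sees `Amountincents`, so a new source format means a change in the pre-processor and nothing downstream.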

Goals
• Use of various types of models
• One codebase, SaaS deployment model
• Make changes instantly (no downtime)
  • Use case
  • Feature extraction
  • Enriching streams
  • End user tooling
  • Demo
• Multiple domains

Use case
• Your phone with the banking app installed is stolen
• Limit on the banking app is 1,000
• Funds are transferred from your account (A) to a mule account (B)

Model features and model output
Features:
• Amount > 500
• Nr. of trxs last 1 h
• First trx < 24 h ago
Model output: Alert || OK

Stream with stateless operators
• Event Ev. 1: A -> B, 1000
• FeX (feature extraction) output (Amount, Unknown, Prev. Trxs): (1000, ?)
• PMML model scoring

Stream with stateful operators
• Events Ev. 1 and Ev. 2: A -> B, 1000
• FeX output (Amount, Unknown, Prev. Trxs): [...] 0), (1000, true, 1)
• State (key: value):
  • (A, B, FirstTrx): Ev. 1
  • (A, B, HistoricalTrxs): ev1 1000, then ev1 1000, ev2 1000
• PMML model scoring: Alert || OK
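A minimal sketch of such a stateful operator, assuming events are dictionaries with `source`, `dest`, `amount` and `time` (seconds) fields. In Flink this state would be keyed state managed by the framework; here a plain in-memory dictionary stands in for it. The class name and the exact feature semantics (first transfer less than 24 h ago, number of earlier transfers in the last hour) are assumptions based on the feature list shown earlier.

```python
from collections import defaultdict

HOUR = 3600          # seconds
DAY = 24 * HOUR


class StatefulFeatureExtractor:
    """Keeps state per (source, dest) pair and derives features from it."""

    def __init__(self):
        self.first_trx = {}                # (source, dest) -> time of first transfer
        self.history = defaultdict(list)   # (source, dest) -> [(time, amount), ...]

    def extract(self, event):
        key = (event["source"], event["dest"])
        now = event["time"]
        # The first event for a pair records itself as the first transfer.
        first = self.first_trx.setdefault(key, now)
        first_trx_recent = now - first < DAY
        # Count earlier transfers only, so an event never counts itself.
        trxs_last_hour = sum(1 for t, _ in self.history[key] if now - t <= HOUR)
        self.history[key].append((now, event["amount"]))
        return (event["amount"], first_trx_recent, trxs_last_hour)
```

Feeding two transfers of 1000 from A to B ten minutes apart yields `(1000, True, 0)` for the first and `(1000, True, 1)` for the second, which is the kind of feature set a stateless operator cannot produce.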

How to perform aggregate functions on a stream?
• Average amount last week: € 37,04
• Max amount last month: € 834,12
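One common way to answer this question is a time-bounded window per key that is updated on every event. The sketch below is an illustration under assumptions: timestamps are in seconds and arrive in order, and the class name `SlidingWindow` is invented here. It is not taken from the platform described in the talk.

```python
from collections import deque


class SlidingWindow:
    """Holds (timestamp, amount) pairs that are at most `span` seconds old."""

    def __init__(self, span):
        self.span = span
        self.items = deque()

    def add(self, timestamp, amount):
        self.items.append((timestamp, amount))
        # Evict everything that fell out of the window (assumes in-order timestamps).
        while self.items and timestamp - self.items[0][0] > self.span:
            self.items.popleft()

    def average(self):
        if not self.items:
            return None
        return sum(a for _, a in self.items) / len(self.items)

    def maximum(self):
        return max((a for _, a in self.items), default=None)
```

A window with `span=7 * 86400` gives the average of the last week, and a second window with a longer span gives the maximum of the last month.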

Enriching the stream based on multiple keys
[Diagram: event Ev. 1 (A, B, IP, 1000) is split into one message per key (A, B, A.B, IP); each message is routed to the task manager that holds the state for that key, where features are calculated; an aggregation step follows.]
• Accounts are distributed across the task managers

Aggregating and model scoring
[Diagram: the partial results for event Ev. 1 (A, B, IP, 1000), enriched per key (A.B', B', IP'), are aggregated and passed to model scoring.]
Features passed to the model:
1. Amount
2. (A.B).FirstTrx
3. (A.B).NrTrxs
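The split, enrich and aggregate flow can be sketched as follows. This is a single-process illustration under assumptions: events are dictionaries with `id`, `source`, `dest` and `ip` fields, and the "enrichment" is simply the number of earlier events seen for a key. The names `split`, `KeyedCounter` and `aggregate` are hypothetical; in a Flink job the routing by key would be done by the framework across task managers.

```python
from collections import defaultdict


def split(event):
    """Fan one event out into one message per key it has to be enriched on."""
    keys = [
        ("account", event["source"]),
        ("account", event["dest"]),
        ("pair", (event["source"], event["dest"])),
        ("ip", event["ip"]),
    ]
    return [(key, event) for key in keys]


class KeyedCounter:
    """Stand-in for a keyed operator: counts earlier events per key."""

    def __init__(self):
        self.seen = defaultdict(int)

    def enrich(self, key, event):
        nr_before = self.seen[key]
        self.seen[key] += 1
        return event["id"], key, nr_before


def aggregate(partials):
    """Join the partial results of each event back into one feature set."""
    merged = defaultdict(dict)
    for event_id, key, value in partials:
        merged[event_id][key] = value
    return dict(merged)
```

Each partial result carries the event id, which is what lets the aggregation step reassemble the feature set before it is handed to model scoring.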

Domain Specific Language (DSL)
A DSL is a domain specific language. We use it to define the behaviour of our operators:
• The persist rules (which data to store within state)
• Feature calculation rules
• Model definition rules

Definition instead of code - Persist rule
    history[double, 4 weeks, 100] @(sourceAccntNr.destAccntNr).Trxs := $amount

Feature Calculation rules
Nr. of trxs last 1 h:
    count(between @(sourceAccntNr.destAccntNr).Trxs, $eventtime-1 hour));
First trx A to B < 24 h:
    @(sourceAccntNr.destAccntNr).FirstUsed >= $eventtime-24 hours;
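To make the rules concrete, here is a rough Python reading of the persist rule and the two feature rules. It rests on assumptions the slides do not confirm: that `history[double, 4 weeks, 100]` means values stored as doubles, kept for at most 4 weeks, with at most 100 entries per key, and that the count includes the current event once it has been persisted. The class and function names are invented for illustration; the real DSL is interpreted by the platform, not translated to Python.

```python
from collections import deque

HOUR = 3600
WEEK = 7 * 24 * HOUR


class History:
    """Bounded history for one key: at most max_entries values within retention."""

    def __init__(self, retention=4 * WEEK, max_entries=100):
        self.retention = retention
        self.entries = deque(maxlen=max_entries)   # oldest entry drops out when full

    def persist(self, event_time, amount):
        self.entries.append((event_time, float(amount)))
        while self.entries and event_time - self.entries[0][0] > self.retention:
            self.entries.popleft()

    def count_between(self, start, end):
        return sum(1 for t, _ in self.entries if start <= t <= end)


def nr_of_trxs_last_1h(history, event_time):
    """Reading of: count(between ...Trxs, $eventtime-1 hour)."""
    return history.count_between(event_time - HOUR, event_time)


def first_trx_lt_24h(first_used, event_time):
    """Reading of: ...FirstUsed >= $eventtime-24 hours."""
    return first_used >= event_time - 24 * HOUR
```

Bounding the history by both age and size keeps the state per account pair small, which matters when the state lives inside the streaming operators.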

Creating models offline, scoring online
• Offline: model creation by a data scientist with offline tooling (HDFS)
• Portable definitions: DSL, <PMML />, {PFA}
• Online: model execution (streaming platform)

Control streams

Streaming in the definitions
DSL files:
• Persist rules
• Model definitions
• Feature calculation rules
Flow: DSL files -> split -> broadcast -> FeX & model scoring
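The idea of changing behaviour without downtime can be illustrated with an operator that reads control messages and data messages from one merged stream. This is a simplified single-operator sketch: rules are plain Python callables instead of DSL text, and the names `ScoringOperator` and `run` as well as the message shapes are assumptions made for the example.

```python
class ScoringOperator:
    """Processes control messages (new definitions) and data messages (events)."""

    def __init__(self):
        self.rules = {}      # rule name -> callable(event) -> value

    def on_control(self, name, rule):
        # A broadcast definition replaces the previous one; no restart involved.
        self.rules[name] = rule

    def on_data(self, event):
        return {name: rule(event) for name, rule in self.rules.items()}


def run(operator, messages):
    """messages: iterable of ("control", name, rule) or ("data", event)."""
    out = []
    for message in messages:
        if message[0] == "control":
            operator.on_control(message[1], message[2])
        else:
            out.append(operator.on_data(message[1]))
    return out
```

An event that arrives after a new definition has been broadcast is evaluated against the new definition, while the job itself keeps running.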

Demo

Goals
• Use of various types of models
• One codebase, SaaS deployment model
• Make changes instantly (no downtime)
• Multiple domains

Multiple domains – ponder on this
We have built a feature-extraction engine and used that to make a Fraud-Risk Engine.
Can we also build this?
• Customer notifications?
• Calculating RFQs for bond prices?
• Product fulfilment engine?
• Other?

Takeaways
• Decoupled architecture with preprocessor
• Enriching events with multiple keys
• End users making changes
• Multiple domains