StreamING models: Realtime model deployment of ML capabilities






































- Slides: 39
StreamING models: Realtime model deployment of ML capabilities § Erik de Nooij, IT Chapter Lead Fraud & Cybersecurity
Who Am I? § IT Chapter Lead within the Fraud & Cybersecurity department, based in Amsterdam § Before ING: implemented enterprise software, mainly knowledge management and CRM related § Background in: Scala, Java, C# (MCSD), Tomcat, WebSphere, Oracle, Cassandra and now… Flink § https://www.linkedin.com/in/erik-de-nooij-93ab1a/ § Erik.g.de.Nooij@ing.nl
About ING
About ING § Worldwide: 35 million customers, 51,000 employees, presence in over 40 countries § The Netherlands: 9 million customers, billion logins yearly on https://www.ing.nl, 1 million transactions per day § Market leaders Benelux § Challengers § Growth markets § Commercial Banking
Threats related to fraud & cybersecurity § 2008: Fake ID (individuals) § 2010: Skimming (small groups) § 2012: Phishing (worldwide groups) § 2014: APT (organized crime) § 2017: ? (criminal organization) § Response: Manual detection, Rule based detection, Model based detection, Anomaly detection
Carbanak APT (Advanced Persistent Threat) § This started via a phishing email…
Goals § Support various types of (ML) models § Tools to create models versus scoring models § One codebase, SaaS deployment model § Make changes instantly (no downtime) § Multiple domains
Goals § Support various types of (ML) models § One codebase, SaaS deployment model § Pre-processor, decoupled architecture § Make changes instantly (no downtime) § Multiple domains
Goals § Support various types of (ML) models § One codebase, SaaS deployment model § Make changes instantly (no downtime): use case, feature extraction, enriching streams, end user tooling, demo § Multiple domains
Goals § Support various types of (ML) models § One codebase, SaaS deployment model § Make changes instantly (no downtime) § Multiple domains: examples
Support various types of models
Creating models offline, scoring online § Offline: model creation (HDFS) § Portable model: <PMML />, {PFA} § Online: model execution (streaming platform)
Predictive Model Markup Language (PMML) § The Predictive Model Markup Language (PMML) is an XML-based predictive model interchange format § Rule: if field1 > 500 AND field2 == 1 AND field3 > 1 § PMML: <SimpleRule score="Alert" weight="1.0"> <CompoundPredicate booleanOperator="and"> <SimplePredicate field="field1" operator="greaterThan" value="500"/> <SimplePredicate field="field2" operator="equal" value="1"/> <SimplePredicate field="field3" operator="greaterThan" value="1"/> </CompoundPredicate> </SimpleRule>
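To make the mapping between the rule and its PMML form concrete, here is a minimal Python sketch that parses the `SimpleRule` from this slide and evaluates it against a feature set. It is a hand-rolled illustration, not the OpenScoring.io or any other evaluator API; the function name `evaluate_rule` and the operator table are assumptions, and only the two operators that appear on the slide are handled.

```python
import operator
import xml.etree.ElementTree as ET

# The rule from the slide, with identifiers restored.
RULE_XML = """
<SimpleRule score="Alert" weight="1.0">
  <CompoundPredicate booleanOperator="and">
    <SimplePredicate field="field1" operator="greaterThan" value="500"/>
    <SimplePredicate field="field2" operator="equal" value="1"/>
    <SimplePredicate field="field3" operator="greaterThan" value="1"/>
  </CompoundPredicate>
</SimpleRule>
"""

# Assumption: only the two PMML operators used on the slide are mapped.
OPERATORS = {"greaterThan": operator.gt, "equal": operator.eq}


def evaluate_rule(rule_xml, features):
    """Return the rule's score if every predicate holds, otherwise None."""
    rule = ET.fromstring(rule_xml)
    compound = rule.find("CompoundPredicate")
    if compound is None or compound.get("booleanOperator") != "and":
        raise ValueError("this sketch only handles one 'and' CompoundPredicate")
    for predicate in compound.findall("SimplePredicate"):
        compare = OPERATORS[predicate.get("operator")]
        actual = features[predicate.get("field")]
        if not compare(actual, float(predicate.get("value"))):
            return None
    return rule.get("score")
```

A production setup would hand the full PMML document to a dedicated evaluator rather than interpret predicates by hand; the sketch only shows what the XML encodes.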
Predictive Model Markup Language (PMML) § The Predictive Model Markup Language (PMML) is an XML-based predictive model interchange format
Machine learning tools supporting PMML
Model scoring using OpenScoring.io library § Parse the PMML file(s) § Pass on the feature set to the model(s) § Run the ‘predict’ function, which returns the output of the model(s) § Diagram: control stream and data stream (feature sets) feed model scoring, which outputs a score
Supported models(*) § Association rules § Regression § Cluster model § Rule set § General regression § Scorecard § Naive Bayes § Support Vector Machine § k-Nearest neighbours § Tree model § Neural network § Ensemble model § (*) models supported by http://openscoring.io/
Goals § Use of various types of models § One codebase, SaaS deployment model § Pre-processor, decoupled architecture § Make changes instantly (no downtime) § Multiple domains
One Bank Strategy § Market leaders Benelux § Challengers § Growth markets § Commercial Banking
How flexible is this architecture? § Incoming formats: Amount = “42,00”, Amountincents = 4200, Amount = 42.00 § Feature extraction & Model scoring
Decoupled architecture § Business events: Amount = “42.00”, Amountincents = 4200, Amount = 42.00 § PreProcessor output: Amountincents = 4200 § Feature extraction & Model scoring
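The pre-processor's job on these slides is to bring differently formatted amounts to one canonical field before feature extraction. A minimal Python sketch of that normalization follows; the function name `to_amount_in_cents` is hypothetical, and the sketch assumes a comma is always a decimal separator and that no thousands separators occur.

```python
from decimal import Decimal


def to_amount_in_cents(raw):
    """Normalize "42,00", "42.00" or 42.00 to an integer number of cents."""
    # Assumption: a comma is a decimal separator, never a thousands separator.
    text = str(raw).strip().replace(",", ".")
    return int((Decimal(text) * 100).to_integral_value())
```

Keeping this step outside the scoring job means a new source format only requires a change in the pre-processor, not in feature extraction or the models.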
Goals § Use of various types of models § One codebase, SaaS deployment model § Make changes instantly (no downtime): use case, feature extraction, enriching streams, end user tooling, demo § Multiple domains
Use case • Your phone with the banking app installed is stolen • Limit on the banking app is 1,000 • Funds are transferred from your account (A) to a mule account (B)
Model features and model output § Features: Amount > 500, Nr. of Trxs Last 1h, First Trx < 24h ago § Model output: Alert || OK
Stream with stateless operators § Event Ev.1: A, B, 1000 § Feature extraction (FeX): Amount, Unknown, Prev. Trxs: (1000, ?) § Model scoring (PMML)
Stream with stateful operators § Events Ev.1 and Ev.2: A, B, 1000 § Feature extraction (FeX): Amount, Unknown, Prev. Trxs: (…, 0), (1000, true, 1) § State (key: value): (A, B, FirstTrx): Ev.1; (A, B, HistoricalTrxs): ev1 1000, then ev1 1000, ev2 1000 § Model scoring (PMML): Alert || OK
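The difference between the stateless and stateful variants can be sketched in Python as an operator that keeps per-key state between events. This is an illustration only, not the Flink job from the talk: the class name `PairFeatureExtractor` is hypothetical, the state is a plain dictionary rather than Flink keyed state, and the feature tuple (amount, first transaction less than 24 hours ago, number of previous transactions) is an interpretation of the slide.

```python
from datetime import datetime, timedelta


class PairFeatureExtractor:
    """Keeps state per (source, destination) pair, as on the slide."""

    def __init__(self):
        self.first_trx = {}  # (source, dest) -> time of the first transaction
        self.history = {}    # (source, dest) -> list of (time, amount)

    def process(self, event_time, source, dest, amount):
        key = (source, dest)
        first = self.first_trx.setdefault(key, event_time)
        previous = self.history.setdefault(key, [])
        features = (
            amount,
            event_time - first < timedelta(hours=24),  # first trx < 24h ago
            len(previous),                             # trxs seen before this one
        )
        previous.append((event_time, amount))          # update state afterwards
        return features
```

Without the two dictionaries the operator could only report the amount, which is the "(1000, ?)" situation of the stateless slide.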
How to perform aggregate functions on a stream? § Average amount last week: €37.04 § Max amount last month: €834.12
Enriching the stream based on multiple keys § Event: A, B, IP, 1000, Ev.1 (3542321) § Split into one copy per key: A, B, A.B, IP § State per key lives on different task managers, e.g. accounts A, E, I..; B, D, F..; C, G, H..; J, K.. and IP addresses 192.x.x.1 … 192.x.x.7 § Enriched copies: A’, B’, A.B’, IP’ § Accounts are distributed across the task managers § Aggregation step § Calculating features
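The split, enrich and aggregate flow on this slide can be sketched in plain Python. All names here (`split_by_keys`, `enrich`, `aggregate`, the event dictionary layout) are hypothetical, and a single in-memory dictionary stands in for the state that the talk distributes across task managers.

```python
def split_by_keys(event):
    """Emit one (key, event) copy per enrichment key: A, B, A.B and IP."""
    src, dst, ip = event["source"], event["dest"], event["ip"]
    return [(src, event), (dst, event), (f"{src}.{dst}", event), (ip, event)]


def enrich(key, event, state):
    """Attach whatever state is held for this key (empty if none yet)."""
    return {"event_id": event["id"], "key": key, "state": state.get(key, {})}


def aggregate(parts):
    """Merge the per-key parts of one event back into a single record."""
    merged = {"event_id": parts[0]["event_id"], "enrichment": {}}
    for part in parts:
        if part["event_id"] != merged["event_id"]:
            raise ValueError("parts belong to different events")
        merged["enrichment"][part["key"]] = part["state"]
    return merged
```

In a distributed job each copy would be routed by its key to the task manager that owns that key's state, and the aggregation step would join the copies on the event id.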
Aggregating and model scoring § Aggregation: event (A, B, IP, 1000, Ev.1) combined with A.B’, B’ and IP’ § Model scoring inputs from A.B’: 1. Amount 2. (A.B).FirstTrx 3. (A.B).NrTrxs § From B’: 1. … § From IP’: 1. … 2. …
Domain Specific Language (DSL) § A DSL is a domain specific language. We use it to define the behaviour of our operators: § The persist rules (which data to store within state) § Feature calculation rules § Model definition rules
Definition instead of code - Persist rule § history[double, 4 weeks, 100] @(sourceAccntNr.destAccntNr).Trxs := $amount
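One plausible reading of `history[double, 4 weeks, 100]` is a per-key history of double values that is retained for four weeks and capped at 100 entries. The Python sketch below implements that reading; the semantics are an assumption inferred from the slide, not the DSL's documented behaviour, and the class name `History` is hypothetical. It also assumes events arrive in event-time order.

```python
from collections import deque
from datetime import datetime, timedelta


class History:
    """Bounded history: limited by age (retention) and by entry count."""

    def __init__(self, retention, max_entries):
        self.retention = retention
        # deque(maxlen=...) silently drops the oldest entry when full.
        self.entries = deque(maxlen=max_entries)

    def persist(self, event_time, value):
        self.entries.append((event_time, float(value)))
        cutoff = event_time - self.retention
        # Assumption: in-order events, so expired entries sit at the left.
        while self.entries and self.entries[0][0] < cutoff:
            self.entries.popleft()
```

For the rule on the slide, one such object would exist per `sourceAccntNr.destAccntNr` key, created as `History(timedelta(weeks=4), 100)` and fed with `$amount`.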
Feature Calculation rules § Nr. of Trxs Last 1h: count(between(@(sourceAccntNr.destAccntNr).Trxs, $eventtime-1 hour)); § First Trx A to B < 24h: @(sourceAccntNr.destAccntNr).FirstUsed >= $eventtime-24 hours;
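The two feature rules can be read as small functions over the persisted state. The Python sketch below mirrors them; the function names are hypothetical, and the inclusive window boundaries are an assumption, since the slide does not say whether `between` includes its endpoints.

```python
from datetime import datetime, timedelta


def nr_of_trxs_last_1h(trxs, event_time):
    """Mirror of count(between(Trxs, $eventtime - 1 hour))."""
    start = event_time - timedelta(hours=1)
    # Assumption: both ends of the window are inclusive.
    return sum(1 for time, _amount in trxs if start <= time <= event_time)


def first_trx_within_24h(first_used, event_time):
    """Mirror of FirstUsed >= $eventtime - 24 hours."""
    return first_used >= event_time - timedelta(hours=24)
```

Here `trxs` is a list of `(time, amount)` pairs for one source and destination pair, and `first_used` is the time that pair was first seen.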
Creating models offline, scoring online § Offline: model creation by a data scientist with offline tooling (HDFS) § Portable model: DSL, <PMML />, {PFA} § Online: model execution (streaming platform)
Control streams
Streaming in the definitions § DSL files: persist rules, model definitions, feature calculation rules § Split / Broadcast § FeX & Model scoring
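The idea of changing behaviour without downtime by streaming definitions in alongside the data can be sketched as follows. This is a simplified Python illustration, not Flink's API: the names `ScoringOperator` and `run`, the tuple-based stream and the dictionary-shaped definition are all assumptions made for the example.

```python
class ScoringOperator:
    """Scores events with whatever definition the control stream sent last."""

    def __init__(self):
        self.definition = None  # no model deployed until a control message arrives

    def on_control(self, definition):
        # Swap the active definition in place; no restart is needed.
        self.definition = definition

    def on_event(self, features):
        if self.definition is None:
            return None
        value = features[self.definition["field"]]
        return "Alert" if value > self.definition["greater_than"] else "OK"


def run(operators, stream):
    """Broadcast control messages to all operators; route events to one of them."""
    scores = []
    for kind, payload in stream:
        if kind == "control":
            for op in operators:
                op.on_control(payload)
        else:
            op = operators[hash(payload["account"]) % len(operators)]
            scores.append(op.on_event(payload))
    return scores
```

Because every parallel operator instance receives every control message, it does not matter which instance an event is routed to: all of them score with the same, most recent definition.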
Demo
Goals § Use of various types of models § One codebase, SaaS deployment model § Make changes instantly (no downtime) § Multiple domains
Multiple domains – ponder on this § We have built a feature-extraction engine and used that to make a Fraud-Risk Engine § Can we also build this? … § Customer Notifications? § Calculating RFQs for Bond Prices? § Product Fulfilment engine? § Other?
Take aways § Decoupled architecture with preprocessor § Enriching events with multiple keys § End users making changes § Multiple domains