Predicting Loan Delinquency at 1M Transactions per Second
David Smith (@revodavid), R Community Lead, Microsoft

It looks like you’ve created a predictive model… NOW WHAT?

TRAINING A MODEL IS EASY. OPERATIONALIZING IT IS HARDER.
http://hamiltonmusical.wikia.com/wiki/Right_Hand_Man

Generating Predictions
Batch Mode
  • Create many (millions!) of predictions at once
  • Time required is proportional to the number of predictions
Real Time
  • Only a few (maybe only one!) data points available to predict
    – There may be multiple requests in a short timeframe
  • Latency is the key metric here
    – Many applications require sub-second latency at the endpoint
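
The contrast between the two modes can be sketched in base R. This is only an illustration under stated assumptions: the data are synthetic, the model is a plain glm rather than a Microsoft R model, and the column names are borrowed from the Lending Club variables for readability.

```r
# Synthetic data (hypothetical; not the Lending Club data).
set.seed(42)
n <- 10000
loans <- data.frame(
  int_rate   = runif(n, 5, 25),
  revol_util = runif(n, 0, 100)
)
# Simulated outcome: higher rate and utilization raise the odds of a bad loan.
log_odds <- -4 + 0.12 * loans$int_rate + 0.01 * loans$revol_util
loans$is_bad <- rbinom(n, 1, plogis(log_odds))

model <- glm(is_bad ~ int_rate + revol_util, data = loans, family = binomial)

# Batch mode: one call scores every row; total time grows with the row count.
batch_time <- system.time(
  batch_scores <- predict(model, newdata = loans, type = "response")
)[["elapsed"]]

# Real time: one request carries a single row; per-request latency is what matters.
one_request <- loans[1, c("int_rate", "revol_util")]
request_time <- system.time(
  single_score <- predict(model, newdata = one_request, type = "response")
)[["elapsed"]]

cat(sprintf("batch: %d rows in %.3f s\n", n, batch_time))
cat(sprintf("single request: %.4f s\n", request_time))
```

Both paths return the same score for the same row; what differs is whether total run time or per-request latency is the number to watch.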

Real-Time Operationalization Options
  • Rewrite prediction code in some other language
    – PMML / C++ / Java / …
  • OR, use your R code:
    – Deploy as a web service with Microsoft R Server
    – Deploy as a stored procedure in SQL Server

Lending Club Loan Performance Data
  • www.lendingclub.com/info/download-data.action
    – Feature selection and generation: aka.ms/lendingclub
  LoanStatNew              Description
  all_util                 Balance to credit limit on all trades
  annual_inc_joint         The combined self-reported annual income provided by the co-borrowers during registration
  dti_joint                A ratio calculated using the co-borrowers' total monthly payments on the total debt obligations, excluding mortgages and the requested LC loan, divided by the co-borrowers' combined self-reported monthly income
  int_rate                 Interest rate on the loan
  mths_since_last_record   The number of months since the last public record
  revol_util               Revolving line utilization rate, or the amount of credit the borrower is using relative to all available revolving credit
  total_rec_prncp          Principal received to date
  is_bad (generated)       Late > 16 days, Default, or Charged Off
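
One way the generated is_bad flag could be derived is sketched below. It assumes a loan_status column whose labels match those in Lending Club's public files; both the label strings and the helper name flag_bad_loans are assumptions for illustration, not taken from the slides.

```r
# Status labels assumed to mirror Lending Club's loan_status column (assumption).
bad_statuses <- c("Late (16-30 days)", "Late (31-120 days)",
                  "Default", "Charged Off")

# Hypothetical helper: 1 for a bad loan, 0 otherwise.
flag_bad_loans <- function(loan_status) {
  as.integer(loan_status %in% bad_statuses)
}

# Small made-up sample to show the flag in use.
loans <- data.frame(
  loan_status = c("Current", "Fully Paid", "Late (16-30 days)",
                  "Charged Off", "In Grace Period", "Default"),
  stringsAsFactors = FALSE
)
loans$is_bad <- flag_bad_loans(loans$loan_status)
print(loans)
```

Statuses outside the listed set, including missing values, are scored 0 here; a real pipeline might prefer to drop or inspect them instead.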

Operationalization with Microsoft R Server
  • Roles: Data Scientist, Quant, IT Administrator, Developer
  • Deployment: Microsoft R Client
    – Publish R functions as web services: publishService (mrsdeploy package)
  • Consumption: Microsoft R Client
    – Explore and consume services in R directly: getService (mrsdeploy package)
  • Configuration: Microsoft R Server configured for operationalizing R analytics
    – In-cloud or on-prem
    – Add nodes to scale out
    – High availability & load balancing
  • Integration: Swagger-based APIs, reached through REST API calls
    – Consume with any programming language

Flexible vs Real-Time Deployment
Flexible Deployment
  • Publish any R script or function as a web service
  • R interpreter runs the script on demand via REST API
      library(mrsdeploy)
      publishService(
        serviceType='Script',
        code=<<R script or function>>)
Real-Time Deployment
  • Publish an R model object
  • RevoScaleR or MicrosoftML
  • Prediction engine generates scores from data via REST API
      library(mrsdeploy)
      publishService(
        serviceType='Realtime',
        model=<<R object>>)

Real-Time Deployment Models
  • Linear Regression (rxLinMod, rxFastLinear)
  • Logistic Regression (rxLogit, rxLogisticRegression)
  • Classification / Regression trees (rxDTree, rxFastTrees)
  • Classification / Regression forests (rxDForest, rxFastForest)
  • Stochastic gradient-boosted decision trees (rxBTrees)
  • One-class Support Vector Machines (rxOneClassSvm)
  • Convolutional Neural Networks (rxNeuralNet)
  • Also: pre-trained models for text sentiment and image featurization

Demonstration
FLEXIBLE AND REAL-TIME SCORING WITH MICROSOFT R SERVER
  • Server: Azure Data Science Virtual Machine, Azure GS5 instance (32 cores, 448 GB memory)
  • Client: Surface Book

Flexible vs Real-Time Performance Comparison
Server: Standard_D3_v2 (4 CPU cores, 14 GB RAM), Windows
  Algorithm                           Real time (ms)   Flexible (ms)
  RxLogit (model size 2K)             3.5              39.2
  RxNeuralNet (model size 8K)         2.5              122.0
  Model size (RxLogisticRegression)   Real time (ms)   Flexible (ms)
  2 MB                                5.0              9215.7
  43 MB                               5.4              20255.6
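
The size of the gap is easier to see as a ratio. The sketch below recomputes it from the latencies in the comparison above, assuming the figures pair up with the models in the order listed; the column names are made up for illustration.

```r
# Latencies in milliseconds, copied from the comparison above.
timings <- data.frame(
  model       = c("RxLogit (2K)", "RxNeuralNet (8K)",
                  "RxLogisticRegression (2 MB)", "RxLogisticRegression (43 MB)"),
  realtime_ms = c(3.5, 2.5, 5.0, 5.4),
  flexible_ms = c(39.2, 122.0, 9215.7, 20255.6)
)

# Speedup: how many times faster the real-time path answers one request.
timings$speedup <- round(timings$flexible_ms / timings$realtime_ms, 1)

# Rough ceiling on back-to-back requests per second from a single caller.
timings$realtime_req_per_sec <- round(1000 / timings$realtime_ms)

print(timings)
```

On these numbers the real-time path is about 11 times faster for the smallest model and roughly 3,750 times faster for the 43 MB model, while its own latency stays within a few milliseconds as the model grows.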

Deployment in SQL Server 2016
  • Microsoft R Client (RevoScaleR package): rxSerializeObject
  • SQL Server 2016
    – Flexible: sp_execute_external_script
    – Real-Time: sp_rxPredict

1M predictions/sec
  • Same benchmark, one-sixth the resources
  • SQL Server 2017
  • 8 sockets, 192 cores, 6 TB RAM
  • Flexible operationalization
  • blog.revolutionanalytics.com/2016/09/fraud-detection.html

Operationalization Overview
Flexible Operationalization
  • Any R function / package
  • SQL Server:
      EXEC sp_execute_external_script
        @language = N'R',
        @script = N'<<R script>>'
  • Microsoft R Server:
      library(mrsdeploy)
      publishService(
        serviceType='Script',
        code=<<R script or function>>)
Real-Time Operationalization
  • Specific RevoScaleR / MicrosoftML models
  • SQL Server:
      EXEC sp_rxPredict
        @model=<<serialized R object>>,
        @inputData=<<SQL query>>
  • Microsoft R Server:
      library(mrsdeploy)
      publishService(
        serviceType='Realtime',
        model=<<R object>>)
Summary
  • Use Microsoft R Server 9+ or SQL Server 2016+ as the deployment server
  • Flexible Operationalization supports any R code / package
  • Real-Time Operationalization supports Microsoft R models with improved latency

Thank You!
David Smith (@revodavid), R Community Lead, Microsoft
Special thanks: Pratik Palnitkar, Microsoft; Arun Gurunathan, Microsoft
  • Download Microsoft R Client: aka.ms/rclient
  • Data Science Virtual Machine: aka.ms/dsvm