CAS Predictive Modeling Seminar: Visualizing Predictive Modeling Results

CAS Predictive Modeling Seminar: Visualizing Predictive Modeling Results • Chuck Boucek • (312) 879-3859

Agenda • Data Validation • Hypothesis Building • Model Testing • Monitoring • Visualization as a Diagnostic Tool

Data Validation • Goals – Validate reasonableness of data – Understand key patterns in data – Understand changes in data and underlying business through time

Data Validation • A histogram is a simple tool for reasonability testing of the modeling database
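A minimal sketch of the histogram check, using only the standard library and rendering bars as text. The field (driver age), the bin width, and the 999 placeholder code are hypothetical; they are not taken from the presentation.

```python
from collections import Counter

def histogram_counts(values, bin_width, start=0.0):
    """Count how many values fall in each fixed-width bin [lo, lo + bin_width)."""
    counts = Counter()
    for v in values:
        index = int((v - start) // bin_width)
        counts[index] += 1
    # Key each bin by its lower edge, in ascending order.
    return {start + i * bin_width: counts[i] for i in sorted(counts)}

def render(counts):
    """Return one text bar per bin so isolated bins stand out at a glance."""
    return [f"{lo:>8.0f} | {'#' * n}" for lo, n in counts.items()]

# Hypothetical driver ages; 999 stands in for a placeholder code left in the data.
ages = [23, 25, 31, 34, 35, 38, 41, 44, 47, 52, 58, 63, 999]
age_counts = histogram_counts(ages, bin_width=10)
for line in render(age_counts):
    print(line)
```

An isolated bin far from the bulk of the distribution, such as the one holding 999 here, is the kind of pattern a reasonability review would question.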

Data Validation • Mosaic Plot shows the distribution of predictors in two dimensions

Data Validation • Missing Data plot shows the relationship of missing data elements
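One way to tabulate what a missing-data plot displays is to count, for each pair of fields, the records where both are missing. This is a sketch under assumptions: the field names and records are hypothetical, and missing values are assumed to be stored as None.

```python
from itertools import combinations

def co_missing_counts(records, fields):
    """For each pair of fields, count records where both values are missing (None)."""
    counts = {}
    for a, b in combinations(fields, 2):
        counts[(a, b)] = sum(
            1 for r in records if r.get(a) is None and r.get(b) is None
        )
    return counts

# Hypothetical policy records; None marks a missing element.
records = [
    {"credit_score": 700, "years_in_business": 12, "payroll": 1.2},
    {"credit_score": None, "years_in_business": None, "payroll": 0.8},
    {"credit_score": None, "years_in_business": None, "payroll": None},
    {"credit_score": 640, "years_in_business": 3, "payroll": None},
]
fields = ["credit_score", "years_in_business", "payroll"]
missing_pairs = co_missing_counts(records, fields)
for pair, n in missing_pairs.items():
    print(pair, n)
```

Pairs with high counts suggest the fields go missing together, for example because they come from the same upstream source.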

Data Validation • Time series plots identify consistency of data over time [Figure: Claims Match to Exposure, on a 0.0 to 1.0 scale, by year from 1994 to 2004 for Company 1, Company 2, and Company 3]
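The series behind such a plot can be sketched as a match rate per company and year. The input layout (company, year, matched flag) and the sample claims are assumptions made for illustration, not the presenter's data.

```python
from collections import defaultdict

def match_rate_by_year(claims):
    """Share of claims per (company, year) that matched an exposure record."""
    matched = defaultdict(int)
    total = defaultdict(int)
    for company, year, has_exposure in claims:
        total[(company, year)] += 1
        matched[(company, year)] += int(has_exposure)
    return {key: matched[key] / total[key] for key in sorted(total)}

# Hypothetical claims: (company, accident year, matched to an exposure record?).
claims = [
    ("Company 1", 1994, True), ("Company 1", 1994, True),
    ("Company 1", 1995, True), ("Company 1", 1995, False),
    ("Company 2", 1994, False), ("Company 2", 1994, False),
    ("Company 2", 1995, True), ("Company 2", 1995, True),
]
rates = match_rate_by_year(claims)
for (company, year), rate in rates.items():
    print(company, year, f"{rate:.2f}")
```

A company whose rate jumps or drops sharply between years would show up as a break in its line on the time series plot.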

Hypothesis Building • Goals – Perform initial analysis of potential predictor variables – Limit the list of predictor variables to be employed in subsequent phases of model building – Further reasonability testing of data

[Figure: panels for Premium ($MM), Exposure ($MM), Severity, Frequency, Loss Ratio, and Pure Premium by Demographic Variable 1]

Hypothesis Building • Quantile-Quantile plots help identify needed transformations of data
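A rough sketch of how a Q-Q comparison can point to a transformation: pair sorted observations with standard normal quantiles, then check how straight the resulting line is before and after taking logs. The severities are hypothetical, the (i + 0.5) / n plotting position is one common convention among several, and the correlation score is a stand-in for judging the plot by eye.

```python
import math
from statistics import NormalDist

def qq_points(sample):
    """Pair each sorted observation with the matching standard normal quantile."""
    ordered = sorted(sample)
    n = len(ordered)
    std_normal = NormalDist()
    # (i + 0.5) / n is an assumed plotting position; other conventions exist.
    return [(std_normal.inv_cdf((i + 0.5) / n), x) for i, x in enumerate(ordered)]

def straightness(points):
    """Pearson correlation of the Q-Q points; values near 1 mean a nearly straight line."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical right-skewed claim severities.
severities = [500, 800, 1200, 1500, 2200, 3000, 4500, 7000, 12000, 40000]
raw_fit = straightness(qq_points(severities))
log_fit = straightness(qq_points([math.log(s) for s in severities]))
print(f"raw: {raw_fit:.3f}  log: {log_fit:.3f}")
```

When the log-scale points lie much closer to a straight line than the raw points, a log transformation of the variable is a candidate worth testing.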

Hypothesis Building • Correlation Web concisely summarizes a correlation matrix
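The slide does not say how the correlation web is built. One plausible reading, sketched here as an assumption, is a graph that keeps only the variable pairs whose correlation exceeds a cutoff. The variable names, values, and the 0.5 threshold are hypothetical.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_edges(columns, threshold=0.5):
    """Return (var_a, var_b, r) for every pair with |r| at or above the threshold."""
    edges = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            edges.append((a, b, r))
    return edges

# Hypothetical predictor columns.
columns = {
    "payroll": [1, 2, 3, 4, 5],
    "employees": [2, 4, 6, 8, 10],
    "years_in_business": [5, 1, 4, 2, 3],
}
edges = correlation_edges(columns, threshold=0.5)
for a, b, r in edges:
    print(f"{a} -- {b}: {r:.2f}")
```

Drawing only the surviving edges turns a full matrix into a compact picture of which predictors move together.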

Model Building • Model building is an iterative process • Understanding patterns and relationships throughout this process is critical

Model Building • Partial Plots are a key tool to visualize predictor variables throughout the model building process • What is a “Partial Plot”?

$$\text{Linear Predictor} = k + b_1 X_1 + b_2 X_2 + b_3 X_3 + b_4 X_4$$

$$\text{Predicted value} = e^{k} \times e^{b_1 X_1} \times e^{b_2 X_2} \times e^{b_3 X_3} \times e^{b_4 X_4}$$

• Partial Plot demonstrates an individual predictor variable’s contribution to final prediction
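The factorization can be checked numerically. In this sketch the intercept k, the coefficients b, and the predictor values X are hypothetical; the only thing taken from the slide is the structure: the predicted value is the exponential of the linear predictor, so each variable contributes a multiplicative factor exp(b_i * X_i).

```python
import math

def predicted_value(k, coefs, xs):
    """Exponential of the linear predictor k + b1*X1 + ... + bn*Xn."""
    return math.exp(k + sum(b * x for b, x in zip(coefs, xs)))

def partial_factor(b, x):
    """Multiplicative contribution of a single predictor at value x."""
    return math.exp(b * x)

# Hypothetical fitted values, for illustration only.
k = 5.0
coefs = [0.02, -0.10, 0.30, 0.05]
xs = [10, 2, 1, 4]

product = math.exp(k)
for b, x in zip(coefs, xs):
    product *= partial_factor(b, x)

# Points of a partial plot for the first predictor over the range 0 to 30.
partial_curve = [(x, partial_factor(coefs[0], x)) for x in range(0, 31, 5)]
print(predicted_value(k, coefs, xs), product)
```

Plotting `partial_curve` against the predictor gives the partial plot for that variable: a factor of 1.00 leaves the prediction unchanged, and values above or below it scale the prediction up or down.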

Model Building • Partial Plot demonstrates an individual predictor variable’s contribution to final prediction [Figure: partial plot]

Model Building • Partial Plot with modified scatter plot of variable [Figure: partial plot with modified scatter plot]

Model Building • Time Consistency plot is a critical tool for numeric predictors [Figure: time consistency plot for 1997, 1998, 1999, and 2000]

Model Building • Partial Plot for a factor variable [Figure: partial plot for Credit Level 1 and Credit Level 2]

[Figure: panels for Premium ($MM), Exposure (Pred. Count), Severity, Frequency, Loss Ratio, and Pure Premium by Credit Variable 1 (No/Yes)]

Model Testing • Likely the most critical visualizations in predictive modeling work – Management’s perception of a project’s success will likely depend on these visualizations • Holdout tests • Cross validation tests

Model Testing • Lift Chart shows overall model performance [Figure: Loss Ratio Lift Chart - Holdout Sample, Predicted vs. Actual]
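A common way to build the table behind a loss ratio lift chart, sketched here as an assumption about the construction rather than the presenter's exact method: rank holdout policies by predicted loss ratio, split them into equal-count groups, and compare predicted and actual loss ratios per group. The policies are hypothetical, and three groups are used instead of deciles to keep the example small.

```python
def lift_table(policies, n_groups=10):
    """Return (predicted_loss_ratio, actual_loss_ratio) per group, lowest predicted first.

    policies: list of (predicted_loss, actual_loss, premium) tuples.
    Assumes at least n_groups policies so that no group is empty.
    """
    ranked = sorted(policies, key=lambda p: p[0] / p[2])
    n = len(ranked)
    groups = [[] for _ in range(n_groups)]
    for i, policy in enumerate(ranked):
        groups[i * n_groups // n].append(policy)
    table = []
    for group in groups:
        premium = sum(p[2] for p in group)
        predicted = sum(p[0] for p in group) / premium
        actual = sum(p[1] for p in group) / premium
        table.append((predicted, actual))
    return table

# Hypothetical holdout policies: (predicted loss, actual loss, premium).
holdout = [
    (40, 30, 100), (90, 120, 100), (60, 50, 100),
    (50, 70, 100), (80, 60, 100), (30, 20, 100),
]
table = lift_table(holdout, n_groups=3)
for predicted, actual in table:
    print(f"predicted {predicted:.2f}  actual {actual:.2f}")
```

If the actual loss ratio rises from group to group along with the predicted one, the model is separating better risks from worse ones on data it has not seen.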

Model Testing • ROC Curve shows overall model performance [Figure: Holdout Sample ROC Curve; legend: Null, 0; Perfect, 1; prem, 0.51; pred. loss, 0.56]
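For reference, ROC points and the area under the curve can be sketched as follows. The scores and labels are hypothetical, and the area under the curve computed here is not necessarily the statistic shown in the slide's legend, which the source does not define.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs, one per distinct score.

    labels must be 1 (event, e.g. a claim) or 0 (no event), with both present.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        tp += label
        fp += 1 - label
        # Emit a point only after the last record sharing this score, so ties are handled.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points

def area_under_curve(points):
    """Trapezoidal area under a curve given as ordered (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical holdout scores and outcomes.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]
curve = roc_points(scores, labels)
auc = area_under_curve(curve)
print(f"AUC = {auc:.3f}")
```

Under this definition a model with no discriminating power has an area near 0.5 and a perfect ranking has an area of 1.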

Model Testing • Classical Cross Validation exhibit [Figure: Prediction Error vs. Number of Predictors (5 to 25), showing In Sample Error and Out of Sample Error, with the number of variables in the final model marked]

Monitoring Model Results • The work does not end when the lift chart looks good • Monitoring tools – Decile management – Exception analysis – Model vs. Actual Results

Monitoring Model Results • Decile Management – Retention – Loss Ratio – Rate Action – Tier/Schedule Mod

Monitoring Model Results • Average score over time

Monitoring Model Results • Loss ratio of model exceptions

Visualization as Diagnostic Tool • Frequency and severity models have been developed • Model is underperforming in predicting loss ratio • Likely cause of underperformance is severity model

Visualization as Diagnostic Tool • Two different visualizations of the same model tell a very different story!