CAS Predictive Modeling Seminar Visualizing Predictive Modeling Results
- Slides: 31
Chuck Boucek, (312) 879-3859
Agenda • Data Validation • Hypothesis Building • Model Testing • Monitoring • Visualization as a Diagnostic Tool
Data Validation • Goals – Validate reasonableness of data – Understand key patterns in data – Understand changes in data and underlying business through time
Data Validation • Histogram is a simple tool for reasonability testing of the modeling database
Data Validation • Mosaic Plot shows the distribution of predictors in two dimensions
Data Validation • Missing Data plot shows the relationship of missing data elements
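
The relationship between missing data elements can be tabulated before it is plotted. The following is a minimal sketch in Python, assuming hypothetical policy records stored as dictionaries in which a missing element is `None`; the field names and data are illustrative and do not come from the presentation.

```python
from collections import Counter

def missing_patterns(rows, fields):
    """Count how often each combination of missing fields occurs."""
    patterns = Counter()
    for row in rows:
        # The pattern is the tuple of fields that are missing on this record.
        missing = tuple(f for f in fields if row.get(f) is None)
        patterns[missing] += 1
    return patterns

# Hypothetical records, for illustration only.
fields = ["age", "credit", "territory"]
rows = [
    {"age": 30, "credit": None, "territory": "A"},
    {"age": None, "credit": None, "territory": "B"},
    {"age": 41, "credit": 700, "territory": "C"},
    {"age": 25, "credit": None, "territory": "D"},
]
patterns = missing_patterns(rows, fields)
for pattern, count in patterns.most_common():
    print(pattern or "(complete)", count)
```

Patterns in which two fields tend to be missing together are the ones a missing data plot would make visible.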
Data Validation • Time series plots identify consistency of data over time [Figure: Claims Match to Exposure, on a scale from 0.0 to 1.0, by year from 1994 to 2004, for Company 1, Company 2, and Company 3]
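
A claims-match-to-exposure series of this kind can be computed with a simple grouping step. The sketch below is in Python and assumes hypothetical claim records with a `year` field and a `matched` flag that records whether the claim was matched to an exposure record; both names are assumptions, not part of the presentation.

```python
from collections import defaultdict

def match_rate_by_year(claims):
    """Return {year: share of claims matched to an exposure record}."""
    totals = defaultdict(int)
    matched = defaultdict(int)
    for claim in claims:
        totals[claim["year"]] += 1
        if claim["matched"]:
            matched[claim["year"]] += 1
    return {year: matched[year] / totals[year] for year in sorted(totals)}

# Hypothetical records, for illustration only.
claims = [
    {"year": 1994, "matched": True},
    {"year": 1994, "matched": False},
    {"year": 1995, "matched": True},
    {"year": 1995, "matched": True},
]
rates = match_rate_by_year(claims)
print(rates)
```

Running the same function separately for each company would give one line per company, and a sudden drop in a single year would flag a data issue worth investigating.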
Hypothesis Building • Goals – Perform initial analysis of potential predictor variables – Limit the list of predictor variables to be employed in subsequent phases of model building – Further reasonability testing of data
[Figure: panels for Premium ($MM), Exposure ($MM), Severity, Loss Ratio, Frequency, and Pure Premium, each plotted against Demographic Variable 1]
Hypothesis Building • Quantile-Quantile plots help identify needed transformations of data
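
To illustrate how a Quantile-Quantile comparison can point to a transformation, the following Python sketch pairs sorted observations with standard normal quantiles and measures how close the points are to a straight line. The loss amounts are hypothetical, the plotting positions `(i + 0.5) / n` are one common convention among several, and the log transformation is only an example of a candidate transformation.

```python
import math
from statistics import NormalDist

def normal_qq_points(values):
    """Pair each sorted observation with the matching standard normal quantile."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()
    # Plotting positions (i + 0.5) / n stay strictly inside (0, 1).
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))

def straightness(points):
    """Pearson correlation of the Q-Q points; values near 1 mean a straight line."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical right-skewed loss amounts, for illustration only.
losses = [120.0, 150.0, 200.0, 310.0, 450.0, 900.0, 2500.0, 8000.0]
raw_fit = straightness(normal_qq_points(losses))
log_fit = straightness(normal_qq_points([math.log(x) for x in losses]))
print(round(raw_fit, 3), round(log_fit, 3))
```

For this skewed sample the logged values lie closer to a straight line than the raw values, which is the kind of evidence a Q-Q plot provides when choosing a transformation.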
Hypothesis Building • Correlation Web concisely summarizes a correlation matrix
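
The presentation does not say how its correlation web is constructed. One plausible reading, sketched below in Python, is a graph whose nodes are predictors and whose edges are the variable pairs with a strong correlation. The threshold of 0.5, the variable names, and the data are all assumptions made for illustration.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_web_edges(columns, threshold=0.5):
    """Return (var_a, var_b, r) for every pair with |r| at or above the threshold."""
    edges = []
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            edges.append((a, b, r))
    return edges

# Hypothetical predictor columns, for illustration only.
columns = {
    "age": [1, 2, 3, 4, 5],
    "limit": [2, 4, 6, 8, 10],
    "territory": [3, 1, 3, 1, 3],
}
edges = correlation_web_edges(columns)
print(edges)
```

Only the strongly related pairs survive as edges, so a matrix with many predictors reduces to a short list that can be drawn as a web.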
Model Building • Model building is an iterative process • Understanding patterns and relationships throughout this process is critical
Model Building • Partial Plots are a key tool to visualize predictor variables throughout the model building process • What is a “Partial Plot”? Linear Predictor $= k + b_1 X_1 + b_2 X_2 + b_3 X_3 + b_4 X_4$; Predicted value $= e^{k} \times e^{b_1 X_1} \times e^{b_2 X_2} \times e^{b_3 X_3} \times e^{b_4 X_4}$ • Partial Plot demonstrates an individual predictor variable's contribution to the final prediction
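
The multiplicative form on this slide means each predictor contributes its own factor to the prediction, and a partial plot traces that factor across the predictor's range. The Python sketch below illustrates the decomposition; the intercept, coefficients, and record are hypothetical values chosen for illustration, not results from the presentation.

```python
import math

def partial_factors(intercept, coefficients, row):
    """Split a log-link prediction into one multiplicative factor per term."""
    factors = {"base": math.exp(intercept)}
    for name, beta in coefficients.items():
        factors[name] = math.exp(beta * row[name])
    return factors

def partial_curve(beta, x_values):
    """Points of a partial plot for one predictor: (x, exp(beta * x))."""
    return [(x, math.exp(beta * x)) for x in x_values]

# Hypothetical fitted values, for illustration only.
intercept = math.log(1000.0)
coefficients = {"x1": 0.02, "x2": -0.10}
row = {"x1": 10.0, "x2": 1.0}

factors = partial_factors(intercept, coefficients, row)
predicted = math.prod(factors.values())
curve_x1 = partial_curve(coefficients["x1"], range(0, 31, 5))
print(factors, predicted)
```

The product of the factors equals the exponential of the linear predictor, so each factor can be read as a relativity: a value above 1 raises the prediction and a value below 1 lowers it.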
Model Building • Partial Plot demonstrates an individual predictor variable's contribution to the final prediction [Figure: partial plot]
Model Building • Partial Plot with modified scatter plot of variable [Figure: partial plot with modified scatter plot]
Model Building • Time Consistency plot is a critical tool for numeric predictors [Figure: panels for 1997, 1998, 1999, and 2000]
Model Building • Partial Plot for a factor variable [Figure: Credit Level 1 and Credit Level 2, on a scale from 0.70 to 1.30]
[Figure: panels for Premium ($MM), Exposure (Pred. Count), Severity, Loss Ratio, Frequency, and Pure Premium, each shown for Credit Variable 1 = No and Yes]
Model Testing • Likely the most critical visualizations in predictive modeling work – Management’s perception of a project’s success will likely depend on these visualizations • Holdout tests • Cross validation tests
Model Testing • Lift Chart shows overall model performance [Figure: Loss Ratio Lift Chart, Holdout Sample; series: Predicted and Actual]
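
The numbers behind a loss ratio lift chart can be sketched as follows in Python. The sketch assumes holdout records carrying hypothetical `premium`, `predicted_loss`, and `actual_loss` fields, and it assumes that groups are formed with equal record counts after sorting by predicted loss ratio; the presentation does not state how its groups were built. The function also assumes there are at least as many records as groups.

```python
def lift_table(records, n_groups=10):
    """Sort by predicted loss ratio, split into equal-count groups,
    and compare predicted with actual loss ratio in each group."""
    ordered = sorted(records, key=lambda r: r["predicted_loss"] / r["premium"])
    n = len(ordered)
    table = []
    for g in range(n_groups):
        group = ordered[g * n // n_groups:(g + 1) * n // n_groups]
        premium = sum(r["premium"] for r in group)
        table.append({
            "group": g + 1,
            "predicted": sum(r["predicted_loss"] for r in group) / premium,
            "actual": sum(r["actual_loss"] for r in group) / premium,
        })
    return table

# Hypothetical holdout records, for illustration only.
records = [
    {"premium": 100.0, "predicted_loss": 90.0, "actual_loss": 80.0},
    {"premium": 100.0, "predicted_loss": 40.0, "actual_loss": 30.0},
    {"premium": 100.0, "predicted_loss": 70.0, "actual_loss": 80.0},
    {"premium": 100.0, "predicted_loss": 50.0, "actual_loss": 60.0},
]
table = lift_table(records, n_groups=2)
for row in table:
    print(row)
```

A model with lift shows actual loss ratios that rise from the first group to the last, tracking the predicted values.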
Model Testing • ROC Curve shows overall model performance [Figure: Holdout Sample ROC Curve; legend: Null, 0; Perfect, 1; prem, 0.51; pred. loss, 0.56]
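
For reference, the points of an ROC curve and the area under it can be computed as in the Python sketch below. It assumes a binary outcome label per record (for example, whether a policy produced a loss) and a model score; both the labels and the scores are hypothetical. The statistic reported in the slide's legend is not identified in the source, so the area under the curve here is shown only as one common summary.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points,
    sweeping the score threshold from high to low."""
    positives = sum(labels)
    negatives = len(labels) - positives
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Records tied at the same score cross the threshold together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points

def area_under_curve(points):
    """Trapezoidal area under a curve given as ordered (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical scores and outcomes, for illustration only.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]
points = roc_points(scores, labels)
auc = area_under_curve(points)
print(points, auc)
```

Plotting the points for several scores on the same axes, such as premium and predicted loss, allows the kind of side-by-side comparison the slide's legend suggests.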
Model Testing • Classical Cross Validation exhibit [Figure: Prediction Error versus Number of Predictors (5 to 25), showing In Sample Error and Out of Sample Error, with the number of variables in the final model marked]
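
The two error curves in such an exhibit come from fitting on part of the data and scoring on the rest. The Python sketch below computes average in-sample and out-of-sample mean squared error with k-fold cross validation; the one-variable least squares model, the fold assignment, and the data are assumptions chosen to keep the example short. Repeating the calculation for models with increasing numbers of predictors would produce the points of the two curves.

```python
def fit_line(xs, ys):
    """Least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope

def predict_line(model, x):
    intercept, slope = model
    return intercept + slope * x

def mse(model, predict, pairs):
    """Mean squared error of the model over (x, y) pairs."""
    return sum((predict(model, x) - y) ** 2 for x, y in pairs) / len(pairs)

def k_fold_errors(xs, ys, fit, predict, k=5):
    """Average in-sample and out-of-sample MSE over k folds."""
    n = len(xs)
    in_errors, out_errors = [], []
    for fold in range(k):
        test_idx = set(range(fold, n, k))  # every k-th record forms this fold
        train = [(xs[i], ys[i]) for i in range(n) if i not in test_idx]
        test = [(xs[i], ys[i]) for i in sorted(test_idx)]
        model = fit([x for x, _ in train], [y for _, y in train])
        in_errors.append(mse(model, predict, train))
        out_errors.append(mse(model, predict, test))
    return sum(in_errors) / k, sum(out_errors) / k

# Hypothetical noisy observations, for illustration only.
xs = [float(i) for i in range(10)]
ys = [1.1, 2.9, 5.2, 6.8, 9.1, 11.0, 12.9, 15.2, 16.8, 19.1]
in_error, out_error = k_fold_errors(xs, ys, fit_line, predict_line)
print(in_error, out_error)
```

In-sample error generally keeps falling as predictors are added, while out-of-sample error eventually stops improving, which is why the exhibit marks the number of variables chosen for the final model.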
Monitoring Model Results • The work does not end when the lift chart looks good • Monitoring tools – Decile management – Exception analysis – Model vs. Actual Results
Monitoring Model Results • Decile Management – Retention – Loss Ratio – Rate Action – Tier/Schedule Mod
Monitoring Model Results • Average score over time
Monitoring Model Results • Loss ratio of model exceptions
Visualization as Diagnostic Tool • Frequency and severity models have been developed • Model is underperforming in predicting loss ratio • Likely cause of underperformance is severity model
Visualization as Diagnostic Tool • Two different visualizations of the same model can tell very different stories!