CAS Predictive Modeling Seminar Evaluating Predictive Models Glenn Meyers ISO Innovative Analytics October 5, 2006
Choosing Models • Predicting losses for individual insurance policies involves: – Millions of policy records – Hundreds (or thousands) of variables • There are a number of models that provide good predictions – GLM, GAM, CART, MARS, Neural Nets, etc. • Business objectives influence the choice of model
The Modeling Process • Modeling process involves dimension reduction techniques – Clustering, Principal Components, Factor Analysis – Building submodels and using predicted values as input into a higher level model • The modeling cycle – 1. Build model with training data – 2. Evaluate model with test data – 3. Identify improvements in models and data – 4. Go back to Step 1
Hidden Parameters • Classic model building methods correct for the number of parameters using “degrees of freedom.” • The model exploration process “eats up degrees of freedom” in ways that cannot be captured by formal model adjustments. • In essence the “test” data gets merged into the “training” data.
What Is Significant? • Statistical packages will often identify improvements that are “statistically significant” but not “practically significant.” • This talk is about determining when a model identifies “practically significant” improvements. • Illustrate how to do this on a real example.
The Example: A Personal Auto Model Under Development (Preliminary Results) • Input – Address of insured vehicle • Output – Address Specific Loss Cost – 30 year old, single car with no SDIP points – 500 deductible or 25/50/25 policy limits – Symbol 8, model year 2006 – etc. • Model derived from over 1,200 variables reflecting weather, traffic, demographic, topographical and economic conditions.
Difference Between Address Specific and ISO Territory Loss Cost
Differences Abound Some Questions to Ask • Can the model output be used to improve insurer underwriting results? • Are the results statistically significant? Define ELI
Use Expected Loss Index for Risk Selection
Propose a Standard Way of Evaluating Lift – The Gini Index • Originally proposed by Corrado Gini in 1912 • Most often used to measure income and/or wealth inequality – Search for “Gini” in wikipedia.org • In insurance underwriting, we want to evaluate systematic methods of finding “loss” inequality.
Gini Index • Look at set of policy records below cutoff point, ELI < 1. • This set of records accounts for 59% of total ISO (full) loss cost. • This set of records accounts for 48% of total loss. • 1 − 48/59 → 19% reduction in loss ratio.
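The arithmetic on this slide can be checked directly. The sketch below assumes the ISO (full) loss cost plays the role of the premium base, so the loss ratio of the selected records relative to the whole book is the share of loss divided by the share of loss cost. The function name `loss_ratio_reduction` is illustrative, not from the talk.

```python
def loss_ratio_reduction(share_of_loss_cost, share_of_loss):
    """Relative reduction in loss ratio for the selected records.

    The selected records' loss ratio, relative to the full book's, is
    share_of_loss / share_of_loss_cost; the reduction is one minus that.
    """
    return 1.0 - share_of_loss / share_of_loss_cost


# Figures quoted on the slide for the ELI < 1 cutoff.
reduction = loss_ratio_reduction(share_of_loss_cost=0.59, share_of_loss=0.48)
print(f"Loss ratio reduction: {reduction:.1%}")  # 18.6%, which the slide rounds to 19%
```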
Gini Index • Do this calculation for other cutoff points. • The results make up what we call the Lorenz curve.
Gini Index • If ELI is random, the Lorenz curve will be on the diagonal line. • The Gini index is the percentage of the area under the “random” line that is above the Lorenz curve. • Higher Gini means better predictive model.
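A minimal sketch of the calculation described on these slides, under stated assumptions: records are sorted by ELI, the horizontal axis is the cumulative share of ISO loss cost, the vertical axis is the cumulative share of actual loss, and the area under the Lorenz curve is computed with the trapezoid rule. The function names and the four-record data set are hypothetical and chosen only to make the numbers easy to follow.

```python
def lorenz_curve(eli, iso_loss_cost, loss):
    """Return Lorenz curve points (xs, ys) with records ordered by ELI."""
    order = sorted(range(len(eli)), key=lambda i: eli[i])
    total_cost = sum(iso_loss_cost)
    total_loss = sum(loss)
    xs, ys = [0.0], [0.0]
    cum_cost = cum_loss = 0.0
    for i in order:
        cum_cost += iso_loss_cost[i]
        cum_loss += loss[i]
        xs.append(cum_cost / total_cost)   # share of ISO loss cost so far
        ys.append(cum_loss / total_loss)   # share of actual loss so far
    return xs, ys


def gini_index(xs, ys):
    """Share of the area under the diagonal that lies above the Lorenz curve."""
    area_under_lorenz = sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:])
    )
    # The area under the diagonal is 1/2, so (1/2 - A) / (1/2) = 1 - 2A.
    return 1.0 - 2.0 * area_under_lorenz


# Hypothetical toy data: four policies with equal ISO loss cost.
eli = [0.5, 0.8, 1.2, 1.5]
iso_loss_cost = [1.0, 1.0, 1.0, 1.0]
loss = [0.0, 1.0, 2.0, 5.0]

xs, ys = lorenz_curve(eli, iso_loss_cost, loss)
gini = gini_index(xs, ys)
print(gini)  # 0.5
```

If losses were exactly proportional to ISO loss cost, the curve would sit on the diagonal and the same function would return 0.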
A Gini Index Thought Experiment • If we had the ability to predict who will have losses, what would the Gini index be? • It would be 100% if only one risk had all the losses
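The thought experiment can be tried numerically. The sketch below assumes equal exposure for every risk and a perfect ranking (risks sorted by their actual loss); both are simplifying assumptions, and `gini_perfect_ranking` is a hypothetical helper. Under these assumptions, a finite book with one loss-bearing risk gives a value just below 100%, approaching 100% as the number of risks grows, and a book where more risks have losses gives a lower ceiling.

```python
def gini_perfect_ranking(losses):
    """Gini index when equal-exposure risks are ranked perfectly by loss."""
    ordered = sorted(losses)  # perfect foresight: lowest losses first
    n = len(ordered)
    total = sum(ordered)
    area = 0.0
    cum = 0.0
    for loss in ordered:
        prev = cum
        cum += loss
        # Trapezoid of width 1/n under the Lorenz curve.
        area += (prev + cum) / (2.0 * total * n)
    return 1.0 - 2.0 * area


one_risk = [0.0] * 99 + [1.0]        # a single risk has all the losses
five_percent = [0.0] * 95 + [1.0] * 5  # 5% of risks have equal losses

print(round(gini_perfect_ranking(one_risk), 4))      # 0.99
print(round(gini_perfect_ranking(five_percent), 4))  # 0.95
```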
Bodily Injury
Property Damage
Collision
Statistical Significance • How much random fluctuation is in the Gini index calculation? • Use bootstrapping to evaluate – Take a random sample of records, with replacement. – Calculate Gini index for the sample. – Repeat 250 times. • Plot a histogram of the results.
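The bootstrap steps above can be sketched as follows. This is an illustration under assumptions, not the talk's implementation: each record is a hypothetical `(eli, iso_loss_cost, loss)` tuple, the toy data are synthetic, and resamples with no losses at all are redrawn because the Gini index is undefined for them.

```python
import random
import statistics


def gini_index(records):
    """Gini index for (eli, iso_loss_cost, loss) records, ordered by ELI."""
    ordered = sorted(records, key=lambda r: r[0])
    total_cost = sum(r[1] for r in ordered)
    total_loss = sum(r[2] for r in ordered)
    area = 0.0
    cum_loss = 0.0
    for _, cost, loss in ordered:
        prev = cum_loss
        cum_loss += loss
        # Trapezoid under the Lorenz curve for this record.
        area += (cost / total_cost) * (prev + cum_loss) / (2.0 * total_loss)
    return 1.0 - 2.0 * area


def bootstrap_gini(records, n_reps=250, seed=0):
    """Resample records with replacement and recompute the Gini index."""
    rng = random.Random(seed)
    results = []
    while len(results) < n_reps:
        sample = rng.choices(records, k=len(records))
        if sum(r[2] for r in sample) == 0:
            continue  # no losses in this resample; draw again
        results.append(gini_index(sample))
    return results


# Hypothetical toy data: claims become more frequent as ELI increases.
records = []
for i in range(200):
    eli = 0.5 + i / 200
    loss = 1000.0 if i % (12 - i // 20) == 0 else 0.0
    records.append((eli, 100.0, loss))

ginis = bootstrap_gini(records)
print(f"mean {statistics.mean(ginis):.3f}, stdev {statistics.stdev(ginis):.3f}")
```

The list `ginis` holds the 250 replicate values; passing it to any histogram routine gives the kind of plot the slide describes.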
Bootstrap Results
Summary • Standard tests of statistical significance are suspect. – Informal model selection process – Statistical/Practical significance • Propose Gini index as a test of practical significance. • Divide data into three samples – 1. Training – Used to fit models – 2. Test – Used to evaluate fits – 3. Holdout – “Final” evaluation
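One way to carry out the three-sample division is a single random shuffle followed by slicing. The sketch below is an assumption-laden illustration: the 60/20/20 proportions, the function name `three_way_split`, and the integer policy IDs are hypothetical, since the talk does not state how the samples were sized.

```python
import random


def three_way_split(records, fractions=(0.6, 0.2, 0.2), seed=0):
    """Shuffle once, then slice into training, test and holdout samples."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = round(fractions[0] * len(shuffled))
    n_test = round(fractions[1] * len(shuffled))
    training = shuffled[:n_train]                # used to fit models
    test = shuffled[n_train:n_train + n_test]    # used to evaluate fits
    holdout = shuffled[n_train + n_test:]        # reserved for the final evaluation
    return training, test, holdout


# Hypothetical policy identifiers.
policy_ids = list(range(1000))
training, test, holdout = three_way_split(policy_ids)
print(len(training), len(test), len(holdout))
```

Keeping the holdout sample untouched until the end is what protects the final evaluation from the informal model selection described earlier.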