CAS Predictive Modeling Seminar Evaluating Predictive Models Glenn Meyers ISO Innovative Analytics October 5, 2006

Choosing Models • Predicting losses for individual insurance policies involves: – Millions of policy records – Hundreds (or thousands) of variables • There are a number of models that provide good predictions – GLM, GAM, CART, MARS, Neural Nets, etc. • Business objectives influence choice of model

The Modeling Process • Modeling process involves dimension reduction techniques – Clustering, Principal Components, Factor Analysis – Building submodels and using predicted values as input into a higher level model • The modeling cycle – 1. Build model with training data – 2. Evaluate model with test data – 3. Identify improvements in models and data – 4. Go back to Step 1

Hidden Parameters • Classic model building methods correct for the number of parameters using “degrees of freedom.” • The model exploration process “eats up degrees of freedom” in ways that cannot be captured by formal model adjustments. • In essence the “test” data gets merged into the “training” data.

What Is Significant? • Statistical packages will often identify improvements that are “statistically significant” but not “practically significant.” • This talk is about determining when a model identifies “practically significant” improvements. • Illustrate how to do this on a real example.

The Example A Personal Auto Model Under Development Preliminary Results • Input – Address of insured vehicle • Output – Address Specific Loss Cost – 30 year old, single car with no SDIP points – 500 deductible or 25/50/25 policy limits – Symbol 8, model year 2006 – etc. • Model derived from over 1,200 variables reflecting weather, traffic, demographic, topographical and economic conditions.

Difference Between Address Specific and ISO Territory Loss Cost

Differences Abound Some Questions to Ask • Can the model output be used to improve insurer underwriting results? • Are the results statistically significant? Define ELI

Use Expected Loss Index for Risk Selection

Propose a Standard Way of Evaluating Lift – The Gini Index • Originally proposed by Corrado Gini in 1912 • Most often used to measure income and/or wealth inequality – Search for “Gini” in wikipedia.org • In insurance underwriting, we want to evaluate systematic methods of finding “loss” inequality.

Gini Index • Look at set of policy records below cutoff point, ELI < 1. • This set of records accounts for 59% of total ISO (full) loss cost. • This set of records accounts for 48% of total loss. • 1 − 48/59 → 19% reduction in loss ratio.
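The arithmetic on this slide can be checked with a short sketch. The function name `loss_ratio_reduction` is hypothetical; the 59% and 48% shares are the figures quoted above. The reasoning is that the selected records carry 48% of the loss against 59% of the loss cost, so their loss ratio is 48/59 of the overall loss ratio.

```python
def loss_ratio_reduction(share_of_loss_cost, share_of_loss):
    """Relative drop in loss ratio for the records below the cutoff.

    share_of_loss_cost: fraction of total ISO loss cost in the selected set
    share_of_loss:      fraction of total actual loss in the selected set
    """
    # Selected loss ratio / overall loss ratio = share_of_loss / share_of_loss_cost
    return 1.0 - share_of_loss / share_of_loss_cost


# Figures from the slide: ELI < 1 covers 59% of loss cost and 48% of loss.
reduction = loss_ratio_reduction(0.59, 0.48)
print(f"Loss ratio reduction: {reduction:.0%}")
```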

Gini Index • Do this calculation for other cutoff points. • The results make up what we call the Lorenz curve.

Gini Index • If ELI is random, the Lorenz curve will be on the diagonal line. • The Gini index is the percentage of the area under the “random” line that is above the Lorenz curve. • Higher Gini means better predictive model.
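A minimal sketch of the Lorenz curve and Gini index as described on these slides, not the presenter's actual implementation. It assumes each policy record is a tuple `(eli, loss_cost, loss)`, that records are ordered by ELI from lowest to highest, and that the curve plots cumulative share of loss cost against cumulative share of loss. The function names and the three-record example are hypothetical.

```python
def lorenz_curve(records):
    """Return Lorenz curve points for (eli, loss_cost, loss) records.

    Each point is (cumulative share of loss cost, cumulative share of loss),
    with records taken in ascending order of ELI.
    Assumes total loss cost and total loss are both positive.
    """
    ordered = sorted(records, key=lambda r: r[0])
    total_cost = sum(r[1] for r in ordered)
    total_loss = sum(r[2] for r in ordered)
    points = [(0.0, 0.0)]
    cum_cost = cum_loss = 0.0
    for _, cost, loss in ordered:
        cum_cost += cost
        cum_loss += loss
        points.append((cum_cost / total_cost, cum_loss / total_loss))
    return points


def gini_index(records):
    """Share of the area under the diagonal that lies above the Lorenz curve."""
    points = lorenz_curve(records)
    # Trapezoid rule for the area under the Lorenz curve.
    area_under_lorenz = sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )
    area_under_diagonal = 0.5
    return (area_under_diagonal - area_under_lorenz) / area_under_diagonal


# Hypothetical example: losses rise with ELI, so the curve sags below the diagonal.
informative = [(0.5, 1.0, 0.0), (1.0, 1.0, 1.0), (1.5, 1.0, 2.0)]
# Losses proportional to loss cost: the curve sits on the diagonal.
uninformative = [(0.5, 1.0, 1.0), (1.0, 1.0, 1.0), (1.5, 1.0, 1.0)]
print(gini_index(informative), gini_index(uninformative))
```

With this convention a curve on the diagonal gives a Gini index of zero, and the index grows as low-ELI records account for less of the loss than of the loss cost.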

A Gini Index Thought Experiment • If we had the ability to predict who will have losses, what would the Gini index be? • It would be 100% if only one risk had all the losses.

Bodily Injury

Property Damage

Collision

Statistical Significance • How much random fluctuation is in the Gini index calculation? • Use bootstrapping to evaluate – Take a random sample of records, with replacement. – Calculate Gini index for the sample. – Repeat 250 times. • Plot a histogram of the results.
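The bootstrap steps above can be sketched as follows. This is an illustration under stated assumptions, not the presenter's code: records are `(eli, loss_cost, loss)` tuples, each resample has the same size as the original data (the slide does not say), and the data here are synthetic, generated only so the example runs. The names `gini_index` and `bootstrap_gini` are hypothetical.

```python
import random
import statistics


def gini_index(records):
    """Gini index from (eli, loss_cost, loss) records ordered by ELI."""
    ordered = sorted(records, key=lambda r: r[0])
    total_cost = sum(r[1] for r in ordered)
    total_loss = sum(r[2] for r in ordered)
    if total_cost <= 0 or total_loss <= 0:
        raise ValueError("sample needs positive total loss cost and total loss")
    area = 0.0
    x0 = y0 = 0.0
    for _, cost, loss in ordered:
        x1 = x0 + cost / total_cost
        y1 = y0 + loss / total_loss
        area += (x1 - x0) * (y0 + y1) / 2.0  # trapezoid under the Lorenz curve
        x0, y0 = x1, y1
    return (0.5 - area) / 0.5


def bootstrap_gini(records, n_resamples=250, seed=0):
    """Gini index of n_resamples samples drawn with replacement."""
    rng = random.Random(seed)
    n = len(records)  # assumption: resample size equals the original size
    return [
        gini_index([records[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    ]


# Synthetic data (assumption): 30% of records have a loss, larger on average
# when ELI is higher; every record has a loss cost of 1.
data_rng = random.Random(42)
records = []
for _ in range(500):
    eli = data_rng.uniform(0.5, 1.5)
    has_claim = data_rng.random() < 0.3
    loss = data_rng.expovariate(1.0 / eli) if has_claim else 0.0
    records.append((eli, 1.0, loss))

ginis = bootstrap_gini(records, n_resamples=250, seed=1)
print("mean:", statistics.mean(ginis), "stdev:", statistics.stdev(ginis))
```

The list `ginis` is what would be fed to a histogram; the spread of its values indicates how much of an observed Gini index could be random fluctuation.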

Bootstrap Results

Summary • Standard tests of statistical significance are suspect. – Informal model selection process – Statistical/Practical significance • Propose Gini index as a test of practical significance. • Divide data into three samples – 1. Training – Used to fit models – 2. Test – Used to evaluate fits – 3. Holdout – “Final” evaluation
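One way to divide data into the three samples is a single random shuffle followed by slicing. The sketch below is illustrative only: the function name `three_way_split` and the 60/20/20 proportions are assumptions, since the slides do not give sample sizes.

```python
import random


def three_way_split(records, fractions=(0.6, 0.2), seed=0):
    """Split records into training, test, and holdout lists.

    fractions gives the training and test shares (assumed values);
    the holdout sample receives whatever remains.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = round(n * fractions[0])
    n_test = round(n * fractions[1])
    training = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    holdout = shuffled[n_train + n_test:]
    return training, test, holdout


# Hypothetical policy identifiers stand in for policy records.
training, test, holdout = three_way_split(range(100))
print(len(training), len(test), len(holdout))
```

The holdout sample is set aside during model exploration and scored only once, so that the final evaluation is not affected by the repeated build-and-evaluate cycle on the training and test samples.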