A Rorschach Test
Variable Importance in Environmental Studies
S. Stanley Young, NISS
Jessie Q. Xia, NISS
Banff, Canada, Dec 15, 2011
Current Challenges in Statistical Learning
1. Statistical methods
2. Data quality
3. Invalid claims
   a. Multiple testing
   b. Multiple modeling
   c. Bias
Great Smog of '52, or Big Smoke
12,000 estimated deaths
Pope et al. 2009
Studied Variables
Life Expectancy (life-table methods)
Per capita income (in thousands of $)
Lung Cancer (age-standardized death rate)
COPD (age-standardized death rate)
High-school graduates (proportion of population)
PM2.5 (μg/m³)
Black population (proportion of population)
Population (in hundreds of thousands)
5-year in-migration (proportion of population)
Hispanic population (proportion of population)
Urban residence (proportion of population)
First Analysis, Regression
Variable      SS First   SS Last
Income          31.8      15.6
Lung Cancer     22.4       5.1
COPD            21.5       4.1
High School     15.9       0.0
Population       9.4       5.2
PM2.5            9.4       5.8
Hispanic         4.3       2.4
Black            3.1       1.7
Urban            1.4       0.8
Migration        0.0       0.8
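The slide does not define "SS First" and "SS Last". One common reading, assumed here, is that SS First is the sum of squares a variable explains when it is the only predictor, and SS Last is the extra sum of squares it explains when added after all the others. A minimal sketch under that assumption, on made-up synthetic data (not the study data), using NumPy:

```python
import numpy as np

def residual_ss(X, y):
    """Residual sum of squares from an OLS fit of y on X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)

def ss_first_and_last(X, y):
    """Return (ss_first, ss_last), one entry per column of X.

    Assumed definitions (not confirmed by the slides):
      ss_first[j] = SS explained by column j entered alone
      ss_last[j]  = drop in residual SS when column j is added last
    """
    n, p = X.shape
    total_ss = residual_ss(np.empty((n, 0)), y)  # intercept-only model
    full_rss = residual_ss(X, y)
    ss_first, ss_last = [], []
    for j in range(p):
        others = np.delete(X, j, axis=1)
        ss_first.append(total_ss - residual_ss(X[:, [j]], y))
        ss_last.append(residual_ss(others, y) - full_rss)
    return ss_first, ss_last

# Synthetic, correlated predictors (illustrative only).
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=200)
y = 2.0 * x1 + 0.5 * x2 + rng.normal(size=200)
ss_first, ss_last = ss_first_and_last(np.column_stack([x1, x2]), y)
```

With positively correlated predictors like these, a variable tends to look far more important entered first than entered last, which is the kind of gap the two columns of the table display.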
Recursive Partitioning
Variable Importance
Variable      Regression   RP
Income          0.3390    0.2108
COPD            0.1621    0.1199
Lung Cancer     0.1768    0.1467
PM2.5           0.0732    0.1302
High School     0.0997    0.1066
%Black          0.0537    0.0319
Pop Density     0.0418    0.0793
%Hispanic       0.0177    0.0136
Migration       0.0228    0.0202
Urban           0.0133    0.0105
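To compare the two columns, it can help to turn the importance scores into ranks. The sketch below uses the values from the table above; the dictionary layout and the helper name `ranks` are illustrative choices, not part of the original analysis.

```python
# Importance scores copied from the slide: (regression, recursive partitioning).
importance = {
    "Income": (0.3390, 0.2108),
    "COPD": (0.1621, 0.1199),
    "Lung Cancer": (0.1768, 0.1467),
    "PM2.5": (0.0732, 0.1302),
    "High School": (0.0997, 0.1066),
    "%Black": (0.0537, 0.0319),
    "Pop Density": (0.0418, 0.0793),
    "%Hispanic": (0.0177, 0.0136),
    "Migration": (0.0228, 0.0202),
    "Urban": (0.0133, 0.0105),
}

def ranks(column):
    """Rank variables (1 = most important) by the given score column."""
    ordered = sorted(importance, key=lambda v: importance[v][column], reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}

regression_rank = ranks(0)
rp_rank = ranks(1)

for name in importance:
    print(f"{name:12s} regression rank {regression_rank[name]:2d}, RP rank {rp_rank[name]:2d}")
```

Income ranks first under both methods in this table.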
East versus West
Krewski et al. 2000, Health Effects In.
Enstrom 2005, Inhalation Toxicology
Bell et al. 2007, Env Health Pers
Smith et al. 2009, Inhalation Toxicology
Jerrett 2010, CARB workshop
Fine particles and Mortality
Pope co-author, 2000.
Ozone and Mortality
Variable Importance
Longevity versus PM2.5 (East: gray; West: red)
Longevity versus Income
Hans Rosling's 200 Countries, 200 Years
http://www.youtube.com/watch?v=jbkSRLYSojo
Summary to this point
Income is very important.
PM2.5 is 4th or 5th in importance.
PM2.5 is not important in the West.
Pope knew or should have known of the East/West heterogeneity.
E1: Breakfast cereal and boy babies
P-value plot
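The slide gives only the plot title. One common construction of a p-value plot, assumed here, sorts the p-values and plots each against its rank: if nothing is going on, the points fall roughly on a straight line from near 0 to near 1, while a few real effects show up as points hugging the bottom left. A minimal sketch that computes the plot coordinates (the p-values are made up for illustration):

```python
def p_value_plot_points(p_values):
    """Return (rank, observed p, expected p under uniformity) for each test.

    The expected value rank / (m + 1) is the mean of the rank-th order
    statistic of m independent Uniform(0, 1) draws.
    """
    ordered = sorted(p_values)
    m = len(ordered)
    return [(rank, p, rank / (m + 1)) for rank, p in enumerate(ordered, start=1)]

# Hypothetical p-values from ten tests (illustrative only).
example = [0.81, 0.04, 0.33, 0.57, 0.92, 0.15, 0.68, 0.26, 0.44, 0.73]
points = p_value_plot_points(example)

for rank, observed, expected in points:
    print(f"rank {rank:2d}: observed {observed:.2f}, expected {expected:.2f}")
```

Plotting `observed` against `rank` with any plotting tool gives the figure. In this made-up example the smallest p-value (0.04) is below 0.05 yet sits close to its expected position (about 0.09), which is what chance alone would produce among ten tests.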
E2: Peto, NEJM, statins and cancer
Hypothesis: The (SEAS) trial has raised the hypothesis that adding ezetimibe to statin therapy might increase the incidence of cancer.
The claim fails to replicate. The confidence interval for the relative risk is wide (95% CI, 1.13 to 2.12; 99% CI, 1.02 to 2.33; uncorrected P = 0.006) before any allowance is made for this being the hypothesis-generating result.
NB: 16 x 0.006 = 0.096.
Figure panels: SEAS; New Studies
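The NB on this slide multiplies the uncorrected p-value by a count of 16, which is the arithmetic of a Bonferroni adjustment. The sketch below reproduces that step; the multiplier 16 and the p-value 0.006 come from the slide, while the function name is illustrative and what the 16 counts is not stated in this deck.

```python
def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted p-value: multiply by the number of tests, cap at 1."""
    return min(1.0, p_value * n_tests)

adjusted = bonferroni(0.006, 16)
print(f"adjusted p = {adjusted:.3f}")
print("significant at 0.05" if adjusted < 0.05 else "not significant at 0.05")
```

After adjustment the result is just under 0.1, so it no longer clears the conventional 0.05 threshold.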
E3: A multiple testing and modeling train wreck (JAMA)
1. 275 chemicals
2. 32 medical outcomes
3. 10 demographic covariates
275 x 32 = 8,800 questions; 8,800 x 2^10 = ~9 million models
A CDC “systems” train wreck in progress!
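The count on this slide can be checked directly. The sketch assumes, as the 2^10 factor suggests, that each of the 10 covariates can be either in or out of a model, giving 2^10 covariate subsets per chemical-outcome pair; the variable names are illustrative.

```python
chemicals = 275
outcomes = 32
covariates = 10

# One question per chemical-outcome pair.
questions = chemicals * outcomes

# Each covariate is either included or excluded: 2^10 possible adjustment sets.
covariate_subsets = 2 ** covariates

# Every question can be analyzed under every adjustment set.
total_models = questions * covariate_subsets

print(f"{questions:,} questions x {covariate_subsets:,} subsets = {total_models:,} models")
```

The exact product is 9,011,200, which the slide rounds to about 9 million.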
E4: Bias
Example: Lancet DAD study
Author Interpretation: There exists an increased risk of myocardial infarction in patients exposed to abacavir and didanosine within the preceding 6 months.
First drug use (Text, page 1422, and Table 3)
E4: BMJ versus JAMA (1)
BMJ 2010; 341: c4444
Conclusion: The risk of oesophageal cancer increased with 10 or more prescriptions for oral bisphosphonates and with prescriptions over about a five-year period.
E4: BMJ versus JAMA (2)
JAMA 2010; 304(6): 657-663
Conclusion: Oral bisphosphonate use was not significantly associated with incident esophageal or gastric cancer.
A Rorschach Test
With large, complex data sets, there is enough flexibility to get what you want/need.
Consumer Wishes
Honest science
Valid claims
Claims in context
Pluses and minuses of data and methods
What do we have? (Deming)
A systems failure.
Essentially no process control.
Journals operating by “quality by inspection”.
Workers are happy.
Management failure.
What to do?
Funding agencies need to require data access on publication.
Editors need to:
a. give up quality by inspection
b. require a split-sample strategy
c. require that the number of claims at issue be stated.
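The slides do not spell out the split-sample strategy. A common version, assumed here, is to divide the data at random before any analysis, explore freely on one part, and test only the resulting claims on the untouched part. A minimal sketch using the standard library; the function name, the 50/50 split, and the record IDs are illustrative assumptions.

```python
import random

def split_sample(records, holdout_fraction=0.5, seed=0):
    """Randomly split records into an exploration set and a holdout set.

    The holdout set is set aside until the claims found in the
    exploration set are fixed, then used once to test them.
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - holdout_fraction))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical record IDs standing in for study subjects.
explore, confirm = split_sample(range(100))
print(len(explore), "records for exploration,", len(confirm), "held out for confirmation")
```

Fixing the seed makes the split reproducible, so a reviewer with data access can recreate the same two halves.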
Statisticians
Eventually society will figure it out:
Scientific claims are (most) often wrong.
Essentially all claims are supported by statistics.
Society will ask, “Where were the statisticians?”
Contact
Stan Young
www.niss.org
young@niss.org
919 685 9328