STATISTICAL LEARNING FROM DATA
"LIES, DAMNED LIES, AND STATISTICS", Mark Twain.
Senate Approves Tighter Policing of Drug Makers, May 8, 2007
Mark van der Laan, www.stat.berkeley.edu/~laan

OVERVIEW
• How good is human statistical intuition?
• Statistics are a tool of modern life.
• The loose statistical world of data analysis and scientific publishing: "Why most published research findings are false" (John P. A. Ioannidis).
• The rigid statistical world of the FDA and drug manufacturers: the role of statistics in clinical trials.
• Advances in statistics can help but are largely unused: the challenge.
  - Can we reduce the cost of prescription drugs through improvements in statistical practice and design?
  - Can statistics improve safety reviews of drugs in use?
• Concluding remarks.

THE QUIZ-MASTER PROBLEM
Behind one of these doors is a car. Behind each of the other two is a goat. Click on the door that you think the car is behind.

YOU SELECT DOOR 1
To keep it exciting, I open one of the other two doors without a car behind it. Obviously the car is not behind door 3. But before I open door 1, the door you selected, I'm going to let you switch to door 2 if you like. Again, click on the door you think the car is behind.

Congratulations! You're a winner!
Recap: You originally picked door 1 and then switched to door 2. Here is a summary of how previous contestants have fared:

                 # of Players   Winners   Percent Winners
Switched         131            88        67.2
Didn't Switch    116            42        36.2

The quiz-master problem was raised by a reader in Marilyn vos Savant's Sunday Parade column. The problem was given the name "The Monty Hall Paradox" in honor of the long-time host of the television game show "Let's Make a Deal." Articles about the controversy appeared in the New York Times and other papers around the country. Marilyn's answer to the reader was that the contestant should switch doors, and she received nearly 10,000 responses from readers, most of them disagreeing with her. Hundreds of them were mathematics professors and scientists, whose responses ranged from hostility to disappointment at the nation's lack of mathematical skills.
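
Marilyn's answer can be checked by brute force. The Python sketch below assumes the standard rules (the host always opens an unpicked door with a goat behind it and always offers the switch); the function names are hypothetical. Exact enumeration over the nine equally likely (car, first pick) pairs gives 1/3 for staying and 2/3 for switching, and a seeded simulation lands near the same values, in line with the 67.2% and 36.2% win rates in the contestant table.

```python
from fractions import Fraction
import random

DOORS = range(3)


def exact_win_probabilities():
    """Enumerate the 9 equally likely (car, first pick) pairs."""
    stay = switch = Fraction(0)
    for car in DOORS:
        for pick in DOORS:
            weight = Fraction(1, 9)
            if pick == car:
                # Host opens one of two goat doors; switching lands on a goat.
                stay += weight
            else:
                # Host must open the only other goat door; switching wins.
                switch += weight
    return stay, switch


def simulate(n_games, seed=0):
    """Play n_games and return the empirical win rates (stay, switch)."""
    rng = random.Random(seed)
    stay_wins = switch_wins = 0
    for _ in range(n_games):
        car = rng.randrange(3)
        pick = rng.randrange(3)
        # Host opens a door that is neither the contestant's pick nor the car.
        opened = rng.choice([d for d in DOORS if d not in (pick, car)])
        # Switching means taking the single remaining closed door.
        other = next(d for d in DOORS if d not in (pick, opened))
        stay_wins += pick == car
        switch_wins += other == car
    return stay_wins / n_games, switch_wins / n_games


if __name__ == "__main__":
    print(exact_win_probabilities())  # (Fraction(1, 3), Fraction(2, 3))
    print(simulate(100_000))
```

The enumeration makes the asymmetry explicit: staying wins only when the first pick was already right (probability 1/3), while switching wins in every other case.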

Statistics are a tool of modern life
• Identifying associations, correlations and patterns
• Establishing causation based on randomized trials and observational studies
• Making predictions
• Shaping strategies and future behavior
  - Of people and societies
  - Of machines and computing
  - Of complex systems

Statistics are often cited to denote a certainty that does not, in fact, exist
• "There is increasing concern that in modern research, false findings may be the majority, or even the vast majority, of published research claims."
  - "Simulations show that for most study designs and settings it is more likely for a research claim to be false than true."
  - "For many current scientific fields, claimed research findings often may be simply accurate measures of the prevailing bias."
J. P. A. Ioannidis, "Why Most Published Research Findings are False", Chance, Vol. 18, No. 4, 2005
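
Ioannidis's argument rests on the positive predictive value (PPV) of a "significant" finding. In his basic model without bias, with pre-study odds R that a probed relationship is true, significance level α and power 1 - β, PPV = (1 - β)R / (R - βR + α). The Python sketch below computes that formula and checks it with a small Monte Carlo run; the R, α and power values used are illustrative assumptions, not figures from these slides, and the function names are hypothetical.

```python
import random


def ppv(R, alpha=0.05, power=0.8):
    """No-bias PPV: (1 - beta) R / (R - beta R + alpha)."""
    beta = 1 - power
    return (1 - beta) * R / (R - beta * R + alpha)


def simulated_ppv(R, alpha=0.05, power=0.8, n_hypotheses=200_000, seed=0):
    """Monte Carlo check: fraction of 'significant' claims that are true."""
    rng = random.Random(seed)
    p_true = R / (1 + R)  # pre-study odds R -> probability a relationship is true
    true_hits = false_hits = 0
    for _ in range(n_hypotheses):
        is_true = rng.random() < p_true
        # A true effect is detected with prob = power; a null one with prob = alpha.
        if rng.random() < (power if is_true else alpha):
            if is_true:
                true_hits += 1
            else:
                false_hits += 1
    return true_hits / (true_hits + false_hits)


if __name__ == "__main__":
    # Illustrative settings: even odds, 1-in-10 odds, and 1-in-20 odds with low power.
    for R, power in [(1.0, 0.8), (0.1, 0.8), (0.05, 0.2)]:
        print(R, power, round(ppv(R, power=power), 3))
```

With these assumed settings the PPV drops from about 0.94 to about 0.62 and then to about 0.17: once pre-study odds and power are both low, a significant claim is more likely false than true, which is the pattern the quoted simulations describe.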

Causal Inference and the Curse of Dimensionality
Causal Model and Data
[Text from Taubes NYTimes article]

False conclusions are expensive
• In medicine
  - False positives lead to expensive additional tests and anxiety
  - False negatives lead to delayed treatment with escalated costs and illness
• In drug discovery
  - False positives lead to failed trials
    · The average cost of a phase III clinical trial is $4m to $20m; some cost more than $100m
  - False negatives lead to failed trials
    · Missed contraindications, negative interactions and imprecise dosages
• In genomics, proteomics and chemoinformatics
  - False positives are abundant and lead to wasted time, effort and experimentation
  - False negatives lead to missed business opportunities
• In public policy
  - False positives and false negatives lead to action based on false premises and, frequently, public cynicism

Bias is a hazard of statistics
• Statistical data samples can be biased
  - The sample selected does not represent the population
  - Example: There are five redheads in a town of 100 people. Our sample of 20 people happens to include all five.
• Statistical methods for learning from data can be biased
  - The statistical model selected is not the one that best fits the data…
  - …for the question being asked!
• Statistical interpretations of findings can be biased.
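
The redhead example can be quantified. Assuming the 20 people are drawn uniformly at random without replacement (an assumption; the slide does not say how the sample was taken), the chance that all five redheads land in the sample is hypergeometric, C(95, 15) / C(100, 20) = (20·19·18·17·16) / (100·99·98·97·96), roughly 0.0002. The Python sketch below, with a hypothetical function name, computes it next to the population and sample shares.

```python
from math import comb


def prob_all_redheads_sampled(population=100, redheads=5, sample=20):
    """Hypergeometric P(sample contains every redhead), sampling without replacement."""
    return (
        comb(redheads, redheads)
        * comb(population - redheads, sample - redheads)
        / comb(population, sample)
    )


population_share = 5 / 100  # true share of redheads in the town
sample_share = 5 / 20       # share seen in the unlucky sample
p = prob_all_redheads_sampled()
print(f"population share {population_share:.0%}, sample share {sample_share:.0%}")
print(f"P(all five in a random sample of 20) = {p:.6f}")
```

The sample would put the redhead share at 25% instead of 5%; the tiny probability shows such a sample is rare under random sampling, so seeing one should raise suspicion about how the sample was selected.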

Variable Importance of HIV Resistance Mutations
• Goal: Rank a set of genetic mutations based on their importance for determining an outcome
  - Mutations (A) in the HIV protease enzyme, measured by sequencing
  - Outcome (Y) = change in viral load 12 weeks after starting a new regimen containing saquinavir
• How important is each mutation for viral resistance to this specific protease inhibitor drug?
  - Inform genotypic scoring systems

Stanford Drug Resistance Database
• All Treatment Change Episodes (TCEs) in the Stanford Drug Resistance Database
  - Patients drawn from 16 clinics in Northern CA
  - 333 patients on saquinavir regimen
[Figure: TCE timeline showing baseline viral load and viral genotype (<24 weeks), the change in regimen (TCE: change of >= 1 drug), and final viral load at 12 weeks]

Parameter of Interest
• Need to control for a range of other covariates W
  - Include: past treatment history, baseline clinical characteristics, non-protease mutations, other drugs in regimen
• Parameter of interest: variable importance
  ψ_j = E[E(Y | A_j = 1, W) - E(Y | A_j = 0, W)]
  - For each protease mutation (indexed by j)
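
For a single mutation, ψ_j can be estimated by substitution (G-computation): fit a regression for E(Y | A_j, W), predict every subject's outcome with A_j set to 1 and to 0, and average the difference. The Python sketch below does this on simulated data with one confounder; the data-generating values (true ψ = 0.7) and the linear form of the regression are assumptions chosen for illustration, and this plain plug-in estimate is not the targeted MLE the later slides describe.

```python
import numpy as np


def simulate(n, seed=0):
    """Hypothetical data: confounder W raises both mutation prevalence and Y."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=n)
    A = rng.binomial(1, 1 / (1 + np.exp(-1.5 * W)))  # mutation more likely when W is high
    Y = 1.0 + 0.7 * A + 2.0 * W + rng.normal(size=n)  # true psi = 0.7
    return W, A, Y


def g_computation(W, A, Y):
    """Plug-in estimate of psi = E[E(Y|A=1,W) - E(Y|A=0,W)] with a linear model."""
    X = np.column_stack([np.ones_like(W), A, W])
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    X1 = np.column_stack([np.ones_like(W), np.ones_like(W), W])   # everyone set to A=1
    X0 = np.column_stack([np.ones_like(W), np.zeros_like(W), W])  # everyone set to A=0
    return float(np.mean(X1 @ coef - X0 @ coef))


W, A, Y = simulate(5000)
crude = Y[A == 1].mean() - Y[A == 0].mean()  # unadjusted difference, confounded by W
print(f"crude {crude:.2f}  g-computation {g_computation(W, A, Y):.2f}")
```

With a linear model that has no A×W interaction the averaged difference reduces to the coefficient on A; the predict-and-average recipe is what carries over to flexible regressions of E(Y | A, W).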

Analytic approach
• Standard approach:
  - Fit a single multivariable regression E(Y|A, W), i.e., regress clinical response on mutations and covariates
• Is this the best approach for answering the scientific question of interest?
• What is the scientific question?
  - Construct the best predictor vs. estimate the importance of each mutation

Prediction vs. Importance
• Prediction: create a model that the clinician will use to help predict the patient's risk of a disease.
• Explanation: investigate the causal association between a treatment or risk factor and a disease outcome.

Targeted Maximum Likelihood
• MLE aims to do a good job of estimating the whole density
• Targeted MLE aims to do a good job of estimating the parameter of interest
  - General decrease in bias for the parameter of interest
  - Protection under the null hypothesis
  - Honest p-values, inference, multiple testing

Targeted Maximum Likelihood
• In the regression case, implementation just involves adding a covariate h(A, W) to the regression model
• Requires estimating g(A|W)
  - E.g., the distribution of each mutation given covariates
• Robust: the estimate of ψ is consistent if either
  - g(A|W) is estimated consistently, or
  - E(Y|A, W) is estimated consistently
• More on this later...
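
The steps above can be sketched for one binary mutation A and a continuous outcome Y. This Python sketch assumes a linear fluctuation (one common choice for continuous outcomes; bounded-outcome versions use a logistic fluctuation) and the usual covariate for this parameter, h(A, W) = A/g(1|W) - (1 - A)/(1 - g(1|W)). The initial fit of E(Y|A, W) deliberately ignores W, so it is misspecified, while g is modeled correctly; the update should still move the estimate close to the truth, illustrating the robustness bullet. All data-generating values and function names are hypothetical, and this is an illustration rather than the author's implementation.

```python
import numpy as np


def fit_logistic(X, a, iters=25):
    """Logistic regression by Newton-Raphson; returns coefficients."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X @ beta))
        grad = X.T @ (a - p)
        hess = X.T @ (X * (p * (1 - p))[:, None])
        beta += np.linalg.solve(hess, grad)
    return beta


def tmle_ate(W, A, Y):
    """Return (initial plug-in, targeted) estimates of psi for binary A."""
    n = len(Y)
    ones = np.ones(n)
    # Step 1: initial estimate of E(Y|A,W); deliberately misspecified (ignores W).
    Xq = np.column_stack([ones, A])
    bq, *_ = np.linalg.lstsq(Xq, Y, rcond=None)
    Q_A, Q1, Q0 = Xq @ bq, np.full(n, bq[0] + bq[1]), np.full(n, bq[0])
    # Step 2: estimate g(1|W) = P(A=1|W) by logistic regression on W.
    Xg = np.column_stack([ones, W])
    g1 = 1 / (1 + np.exp(-Xg @ fit_logistic(Xg, A)))
    # Step 3: covariate h(A,W) at the observed A, and at A=1 and A=0.
    h_A = A / g1 - (1 - A) / (1 - g1)
    h1, h0 = 1 / g1, -1 / (1 - g1)
    # Step 4: regress Y on h with the initial fit as offset (no intercept).
    eps = np.sum(h_A * (Y - Q_A)) / np.sum(h_A ** 2)
    # Step 5: update the predictions and average their difference.
    psi_tmle = np.mean((Q1 + eps * h1) - (Q0 + eps * h0))
    return float(np.mean(Q1 - Q0)), float(psi_tmle)


rng = np.random.default_rng(1)
n = 5000
W = rng.normal(size=n)
A = rng.binomial(1, 1 / (1 + np.exp(-0.8 * W)))
Y = 1.0 + 0.7 * A + W + rng.normal(size=n)  # true psi = 0.7 (hypothetical)
initial, targeted = tmle_ate(W, A, Y)
print(f"initial {initial:.2f}  targeted {targeted:.2f}")
```

Here the initial estimate carries the confounding bias of W, while the single added covariate removes most of it because g(A|W) is estimated consistently.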

Mutation Rankings Based on Variable Importance

Current Score  Mutation   VIM    p-value  Crude  p-value
35             90M         0.70  0.00      0.76  0.00
40             48VM        0.79  0.00      1.07  0.00
               30N        -0.78  0.00     -1.06  0.00
10             82AFST      0.46  0.01      0.35  0.03
10             54VA        0.46  0.01      0.31  0.11
10             73CSTA      0.67  0.03      0.80  0.00
2              20IMRTVL    0.32  0.07      0.26  0.18
1              36ILVTA     0.28  0.10      0.27  0.12
2              10FIRVY     0.27  0.13      0.48  0.00
5              88DTG      -0.23  0.24     -0.50  0.33
2              71TVI       0.18  0.29      0.14  0.37
5              32I        -0.18  0.58     -0.20  0.55
2              63P         0.06  0.77      0.11  0.56
5              46ILV       0.13  0.98      0.27  0.10

"Better Evaluation Tools – Biomarkers and Disease"
• #1 highly targeted research project in the FDA "Critical Path Initiative"
  - Requests "clarity on the conceptual framework and evidentiary standards for qualifying a biomarker for various purposes"
  - "Accepted standards for demonstrating comparability of results, … or for biological interpretation of significant gene expression changes or mutations"
• Proper identification of biomarkers can...
  - Identify patient risk or disease susceptibility
  - Determine the appropriate treatment regime
  - Detect disease progression and clinical outcomes
  - Assess therapy effectiveness
  - Determine the level of disease activity
  - etc.

Evaluation of Biomarker Discovery Methods
• Univariate linear regression
  - Importance measure: coefficient value with associated p-value
  - Measures marginal association
• randomForest (Breiman 2001)
  - Importance measures (no p-values)
  - RF1: variable's influence on error rate
  - RF2: mean improvement in node splits due to variable
• Variable Importance with LARS
  - Importance measure: causal effect
  - Formal inference, p-values provided
  - LARS used to fit the initial E[Y|A, W] estimate, with W = {marginally significant covariates}
  - All p-values are FDR adjusted

Simulation Study
• Test the methods' ability to identify the "true" variables under increasing correlation
  - Ranking by measure and by p-value
  - Minimal list length needed to capture all "true" variables?
• Variables
  - Block-diagonal correlation structure: 10 independent sets of 10
  - Multivariate normal distribution
  - Constant ρ, variance = 1
  - ρ = {0, 0.1, 0.2, 0.3, …, 0.9}
• Outcome
  - Main-effect linear model
  - 10 "true" biomarkers, one variable from each set of 10
  - Equal coefficients
  - Noise term with mean = 0, sigma = 10 ("realistic noise")
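
The simulation design and its "minimal list length" metric can be sketched as follows. The slide does not give the sample size, the common coefficient value, or which variable in each block is "true"; this Python sketch assumes a coefficient of 1 and makes the first variable of each block the true one. Ranking uses absolute marginal correlation as a stand-in for univariate regression screening; it does not implement randomForest or the LARS-based variable importance. Function names are hypothetical.

```python
import numpy as np

N_BLOCKS, BLOCK_SIZE = 10, 10


def simulate_design(n, rho, coef=1.0, noise_sd=10.0, seed=0):
    """Block-diagonal equicorrelated normals; one 'true' variable per block."""
    rng = np.random.default_rng(seed)
    p = N_BLOCKS * BLOCK_SIZE
    # A shared factor per block gives corr = rho within a block, 0 across, variance 1.
    factors = rng.normal(size=(n, N_BLOCKS))
    X = (np.sqrt(rho) * np.repeat(factors, BLOCK_SIZE, axis=1)
         + np.sqrt(1 - rho) * rng.normal(size=(n, p)))
    true_idx = np.arange(N_BLOCKS) * BLOCK_SIZE  # assumption: first variable of each block
    Y = X[:, true_idx] @ np.full(N_BLOCKS, coef) + rng.normal(scale=noise_sd, size=n)
    return X, Y, true_idx


def minimal_list_length(scores, true_idx):
    """Length of the ranked list (highest score first) needed to include every true variable."""
    order = np.argsort(-scores)
    positions = np.flatnonzero(np.isin(order, true_idx))
    return int(positions.max()) + 1


def marginal_scores(X, Y):
    """Absolute marginal correlation of each variable with Y (univariate screening)."""
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean()
    return np.abs(Xc.T @ Yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(Yc))


if __name__ == "__main__":
    for rho in (0.0, 0.4, 0.6):
        X, Y, true_idx = simulate_design(n=1000, rho=rho, seed=2)
        print(rho, minimal_list_length(marginal_scores(X, Y), true_idx))
```

Under this design each non-true variable in a block has marginal covariance ρ·coef with Y, so as ρ grows it competes with the true variable in a marginal ranking; that mechanism is consistent with the longer lists reported for univariate linear regression.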

Simulation Results
[Figure: minimal list length to obtain all 10 "true" variables]
• No appreciable difference between ranking by importance measure and ranking by p-value
  - The plot above is with respect to ranked importance measures
• List length for linear regression and randomForest increases with increasing correlation; Variable Importance with LARS stays near the minimum (10) through ρ = 0.6, with only small decreases in power
• Linear regression list length is 2× the Variable Importance list length at ρ = 0.4 and 4× at ρ = 0.6
• randomForest (RF2) list length is consistently shorter than linear regression but is still 50% longer than the Variable Importance list length at ρ = 0.4, and twice as long at ρ = 0.6
• Variable importance coupled with LARS estimates the true causal effect and outperforms both linear regression and randomForest

THE "RIGID" STATISTICAL WORLD OF THE FDA
• Clinical trials
• Rigid statistical methodology and designs required for FDA approval.

Clinical trials are expensive
• Time to market is critical
  - Half of the time-to-approval (currently 15.3 yrs) is spent in clinical trials
  - Each day of delay is expensive
    · Moderately successful drug: $1m per day in lost sales
    · Blockbuster drug: $3m per day
• A lot of money is involved
  - Spending on US-sponsored clinical trials was $25.6b in 2006
    · Biotech + Pharma = $22.6b, NIH = $3b
    · 9,937 trials this year
  - Pharma: 71% of R&D goes to drug development, and 45% of this goes to clinical trials
• Recruiting patients is expensive
  - Direct costs of patient recruitment are high: $440m per year
  - Indirect costs due to delays
    · #1 contributor to drug application delays
    · 94% of trials in the US miss their enrollment deadlines (Europe: 82%)
    · 80% are delayed at least one month
  - Drop-outs are a major problem
    · 1 of 4 volunteers drops out of a study after it begins
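
Two of the slide's figures combine into quick back-of-envelope numbers: the share of pharma R&D that reaches clinical trials is 0.71 × 0.45 ≈ 0.32, and a one-month delay multiplies the per-day lost-sales figures. The Python sketch below assumes a 30-day month for the delay; that choice, and the names, are illustrative only.

```python
# Figures from the slide; the 30-day month below is an illustrative assumption.
DRUG_DEVELOPMENT_SHARE = 0.71  # share of pharma R&D going to drug development
CLINICAL_TRIAL_SHARE = 0.45    # share of drug-development spend going to clinical trials
LOST_SALES_PER_DAY = {"moderately successful": 1e6, "blockbuster": 3e6}  # USD per day


def trial_share_of_rnd():
    """Fraction of total pharma R&D that ends up in clinical trials."""
    return DRUG_DEVELOPMENT_SHARE * CLINICAL_TRIAL_SHARE


def delay_cost(days, drug_type):
    """Lost sales (USD) from a delay of the given number of days."""
    return days * LOST_SALES_PER_DAY[drug_type]


print(f"clinical trials: about {trial_share_of_rnd():.0%} of pharma R&D")
for kind in LOST_SALES_PER_DAY:
    print(f"30-day delay, {kind} drug: ${delay_cost(30, kind) / 1e6:.0f}m")
```

Under these assumptions roughly a third of pharma R&D goes to clinical trials, and the one-month delay that 80% of trials experience costs on the order of $30m in lost sales for a moderately successful drug and $90m for a blockbuster.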

MOVING TOWARDS ADAPTIVE CLINICAL TRIALS
"A widely noted survey by Accenture provided some alarming figures a few years ago: Eighty-nine percent of all drug candidates from the initiation of Phase I through FDA approval fail, many of them in the clinic. Clearly, any techniques that could give an earlier read on these issues would be valuable. In too many cases, the chief result of a trial is to show that the trial itself was set up wrong, in ways that only became clear after the data were unblinded. Did the numbers show that your dosage was suboptimal partway into a two-year trial? Too bad: you probably weren't allowed to know that. Were several arms of your study obviously pointless from the start? Even if you knew, what could you do about it without harming the validity of the whole effort? Over the last few years, such concerns have stimulated an unprecedented amount of work on new approaches. Ideas have come from industry, academia, and regulatory agencies such as the FDA's Critical Path Initiative. A common theme in these efforts has been to move toward adaptive clinical trials."

Approval of Drugs, and Post-Market Safety Reviews of Drugs
• FDA approvals are based on inefficient and often biased statistical methods (e.g., in how they deal with informative drop-out).
• The FDA does not have the expertise to do post-market safety reviews, since these require expertise in the challenging field of causal inference.
• The FDA needs to be modernized and needs to form strong alliances with academic and industrial centers of excellence.
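
Why informative drop-out matters can be shown with a toy trial. In this hypothetical Python sketch, sicker patients (binary baseline covariate W = 1) have higher outcomes and drop out more often, so averaging only the completers understates the mean outcome. Inverse probability of censoring weighting (IPCW), one standard correction, reweights each completer by 1 / P(observed | W). The sketch assumes drop-out depends only on the measured W (missing at random); all numbers and names are illustrative and do not describe any FDA procedure.

```python
import numpy as np


def simulate_trial(n, seed=0):
    """Hypothetical data: sicker patients (W=1) have higher Y and drop out more."""
    rng = np.random.default_rng(seed)
    W = rng.binomial(1, 0.5, size=n)
    Y = 1.0 + 2.0 * W + rng.normal(size=n)  # true E[Y] = 2.0
    p_observed = np.where(W == 1, 0.3, 0.9)  # drop-out depends on W
    observed = rng.random(n) < p_observed
    return W, Y, observed


def complete_case_mean(Y, observed):
    """Naive analysis: average only the patients who stayed in the study."""
    return float(Y[observed].mean())


def ipcw_mean(W, Y, observed):
    """Weight each completer by 1 / P(observed | W), estimated within W strata."""
    p_hat = np.empty(len(W))
    for w in np.unique(W):
        p_hat[W == w] = observed[W == w].mean()
    weights = 1.0 / p_hat[observed]
    return float(np.sum(weights * Y[observed]) / np.sum(weights))


W, Y, observed = simulate_trial(20_000)
cc, ip = complete_case_mean(Y, observed), ipcw_mean(W, Y, observed)
print(f"complete-case {cc:.2f}, IPCW {ip:.2f}")
```

In this setup the completers are 25% sicker patients instead of 50%, so the complete-case mean sits near 1.5 while the weighted mean recovers roughly 2.0. If drop-out also depended on unmeasured factors, this simple weighting would not be enough.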

Senate Approves Tighter Policing of Drug Makers, May 8, 2007

Statistical Innovations are Available
• Statistical inference for adaptive designs
• Targeted (maximum likelihood) learning in biomarker discovery
• Targeted (maximum likelihood) learning of causal effects of drugs and other interventions
• Targeted learning of treatment effect modification due to genetic and genomic factors (multiple testing)
• Learning of individualized treatment rules (e.g., individualized medicine)
• Super Learning in prediction

CONCLUDING REMARKS:
• "Statistics" can do a lot of harm in the hands of people.
• Any published statistical analysis should be based on publicly available data.
• Any statistical analysis should be based on an a priori specified analysis plan (MACHINE LEARNING), and any HUMAN-INDUCED deviation from it should be documented (just like the FDA requires!).
• Statistical tools used in practice need to be improved: the FDA needs to be modernized with well-founded statistical innovations.