Adaptive Data Analysis without Overfitting


Vitaly Feldman, Accelerated Discovery Lab, IBM Research - Almaden
Joint work with Cynthia Dwork (Microsoft Res.), Moritz Hardt (Google Res.), Toni Pitassi (U. of Toronto), Omer Reingold (Samsung Res.), Aaron Roth (Penn, CS)

[Figure: analysis → results (parameter estimates, classifier, clustering, etc.)]

Data Analysis 101
Does student nutrition affect academic performance?
[Figure: student dataset with normalized grade; labels "50" and "100"]

Check correlations
[Figure: correlations with grade for foods 1 to 50; y-axis from -0.4 to 0.3]

Pick candidate foods
[Figure: correlations with grade for foods 1 to 50; y-axis from -0.4 to 0.3]

Fit linear function of 3 selected foods
[Figure: true vs. predicted grade]

SUMMARY OUTPUT

Regression Statistics
  Multiple R           0.4453533
  R Square             0.1983396
  Adjusted R Square    0.1732877
  Standard Error       1.0041891
  Observations         100

ANOVA
               df   SS            MS         F          Significance F
  Regression    3   23.95086544   7.983622   7.917151   8.98706E-05
  Residual     96   96.80600126   1.008396
  Total        99   120.7568667

               Coefficients   Standard Error   t Stat     P-value
  Intercept    -0.044248      0.100545016      -0.44008   0.660868
  Mushroom     -0.296074      0.10193011       -2.90468   0.004563
  Pumpkin       0.255769      0.108443069       2.358555  0.020373
  Nutella       0.2671363     0.095186165       2.806462  0.006066

[Stamped across the output: FALSE DISCOVERY]
Freedman’s Paradox [1983]
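The false discovery above can be reproduced with pure noise. The Python sketch below is illustrative only: it assumes 100 students and 50 foods (matching the 100 observations in the regression output and the 50 foods on the correlation plots), Gaussian data in which no food is related to the grade, and hypothetical helper names such as `select_then_check`. It uses correlations rather than a full regression. Foods picked because they correlate most strongly with the grade look predictive on the data used to pick them, and much weaker when re-measured on fresh data.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def noise_dataset(rng, n_students, n_foods):
    """Food columns and grades drawn independently, so no food truly matters."""
    foods = [[rng.gauss(0, 1) for _ in range(n_students)] for _ in range(n_foods)]
    grade = [rng.gauss(0, 1) for _ in range(n_students)]
    return foods, grade


def select_then_check(seed=0, n_students=100, n_foods=50, k=3):
    """Pick the k foods most correlated with grade, then re-measure them on fresh data."""
    rng = random.Random(seed)
    foods, grade = noise_dataset(rng, n_students, n_foods)
    corrs = [pearson(col, grade) for col in foods]
    # Adaptive step: which foods to keep is decided by looking at the same data.
    chosen = sorted(range(n_foods), key=lambda j: abs(corrs[j]), reverse=True)[:k]
    # Independent replication: same chosen foods, brand-new students.
    fresh_foods, fresh_grade = noise_dataset(rng, n_students, n_foods)
    fresh = [pearson(fresh_foods[j], fresh_grade) for j in chosen]
    return corrs, chosen, [corrs[j] for j in chosen], fresh


if __name__ == "__main__":
    _, chosen, in_sample, fresh = select_then_check(seed=1)
    print("chosen foods:", chosen)
    print("in-sample correlations:", [round(c, 3) for c in in_sample])
    print("fresh-data correlations:", [round(c, 3) for c in fresh])
```

Averaged over many seeds, the in-sample correlations of the chosen foods are noticeably larger in magnitude than their fresh-data correlations, even though every true correlation is zero.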

Statistical inference
[Diagram: data (“fresh” i.i.d. samples) → procedure (hypothesis tests, regression, learning) → result + generalization guarantees; techniques: CLT, VC dimension, Rademacher complexity, stability, …]

Holdout validation
[Diagram: data split into a training set and a holdout/testing set]

Data analysis is adaptive
[Diagram: data → analysis A → analysis B → …]
• Exploratory data analysis
• Variable selection
• Hyper-parameter tuning
• Shared datasets
• …
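Hyper-parameter tuning against a holdout set is a simple instance of this adaptivity. In the minimal Python sketch below, the labels are assumed to be fair coin flips, and each "candidate model" is a hypothetical random predictor standing in for one hyper-parameter setting. Under these assumptions every candidate's true accuracy is exactly 0.5. Choosing the best of many candidates by holdout accuracy still makes the winner's holdout score look well above chance, while its accuracy on fresh points stays near 0.5.

```python
import random


def best_on_holdout(n=200, n_candidates=500, seed=0):
    """Pick the candidate with the best holdout accuracy; return (holdout, fresh) accuracy."""
    rng = random.Random(seed)
    # Labels are fair coins independent of everything, so every candidate's
    # true accuracy is exactly 0.5.
    holdout_y = [rng.getrandbits(1) for _ in range(n)]
    fresh_y = [rng.getrandbits(1) for _ in range(n)]
    best_holdout, best_fresh = -1.0, None
    for _ in range(n_candidates):
        # Hypothetical candidate (e.g. one hyper-parameter setting): its
        # predictions on the holdout points and on fresh, unseen points.
        pred_holdout = [rng.getrandbits(1) for _ in range(n)]
        pred_fresh = [rng.getrandbits(1) for _ in range(n)]
        acc = sum(p == y for p, y in zip(pred_holdout, holdout_y)) / n
        if acc > best_holdout:  # the selection step reuses the same holdout
            best_holdout = acc
            best_fresh = sum(p == y for p, y in zip(pred_fresh, fresh_y)) / n
    return best_holdout, best_fresh


if __name__ == "__main__":
    print(best_on_holdout())
```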

Is this a real problem?
“Why Most Published Research Findings Are False” [Ioannidis 05]
1,000+ downloads; 1,400+ citations
Adaptive data analysis is one of the causes:
• Researcher degrees of freedom [Simmons, Nelson, Simonsohn 11]
• Garden of forking paths [Gelman, Loken 15]
Leads to invalid (cross-)validation error estimates [Reunanen 03; Rao, Fung 08; Cawley, Talbot 10]

Approaches to adaptive analysis
Abstinence, e.g. pre-registration and data splitting
Need safe techniques
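Data splitting stays valid because the data that answers a question never influences which question is asked. The minimal sketch below assumes each analysis step is a statistical query answered by an empirical mean. The class name `SplittingOracle` is hypothetical. Each query gets its own fresh, never-reused chunk. The cost is that k queries need k disjoint chunks, so each answer is computed from only n/k points.

```python
class SplittingOracle:
    """Answer each query with the empirical mean over a fresh, never-reused chunk."""

    def __init__(self, data, max_queries):
        self.chunk = len(data) // max_queries
        if self.chunk == 0:
            raise ValueError("not enough data for that many queries")
        self.data = data
        self.max_queries = max_queries
        self.used = 0

    def answer(self, phi):
        if self.used >= self.max_queries:
            raise RuntimeError("query budget exhausted")
        start = self.used * self.chunk
        part = self.data[start:start + self.chunk]
        self.used += 1  # this chunk is never touched again
        return sum(phi(x) for x in part) / len(part)


if __name__ == "__main__":
    oracle = SplittingOracle(list(range(10)), max_queries=2)
    print(oracle.answer(lambda x: x), oracle.answer(lambda x: x))
```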

Adaptive analysis
[Diagram: data analyst(s) and the data]

Adaptive statistical queries
[Diagram: data analyst(s) querying a statistical query oracle [Kearns 93]]
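In the statistical query model, the analyst asks for the expectation of a query function φ that maps a data point to [0, 1]. The oracle must answer within a tolerance τ of the true expectation over the data distribution. The sketch below uses hypothetical function names and assumes the oracle holds an i.i.d. sample. It answers with the empirical mean and checks the tolerance condition.

```python
from typing import Callable, Sequence, TypeVar

X = TypeVar("X")


def empirical_sq_answer(sample: Sequence[X], phi: Callable[[X], float]) -> float:
    """Answer the statistical query phi with its mean over the sample."""
    values = [phi(x) for x in sample]
    if any(v < 0.0 or v > 1.0 for v in values):
        raise ValueError("statistical queries must map into [0, 1]")
    return sum(values) / len(values)


def is_valid_answer(answer: float, true_mean: float, tau: float) -> bool:
    """An answer is tau-accurate if it is within tau of the population expectation."""
    return abs(answer - true_mean) <= tau


if __name__ == "__main__":
    print(empirical_sq_answer([0, 1, 2, 3], lambda x: 1.0 if x >= 2 else 0.0))
```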

Answering non-adaptive SQs
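When all k queries are fixed in advance, answering each one with its empirical mean needs relatively few samples. The sketch below shows the standard textbook calculation, which is not necessarily the one on the slide. For one [0, 1]-valued query, Hoeffding's inequality gives P(|empirical − true| > τ) ≤ 2·exp(−2nτ²). A union bound over k queries then shows that n ≥ ln(2k/β)/(2τ²) samples make all k answers τ-accurate with probability at least 1 − β. The function name is hypothetical.

```python
import math


def nonadaptive_sample_size(k: int, tau: float, beta: float) -> int:
    """Samples so that k fixed [0,1]-valued queries are all tau-accurate w.p. >= 1 - beta.

    Hoeffding: P(|empirical - true| > tau) <= 2 exp(-2 n tau^2) for one query;
    a union bound over k queries requires 2k exp(-2 n tau^2) <= beta.
    """
    return math.ceil(math.log(2 * k / beta) / (2 * tau ** 2))


if __name__ == "__main__":
    for k in (1_000, 1_000_000):
        print(k, nonadaptive_sample_size(k, tau=0.01, beta=0.05))
```

Increasing k a thousandfold (from 1,000 to 1,000,000) raises the requirement only from 52,984 to 87,522 samples, because the dependence on k is logarithmic. This guarantee does not cover queries chosen after seeing earlier answers.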

Answering adaptive SQs
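When queries may depend on earlier answers, empirical means can drift far from the truth. The Python sketch below shows a classic illustration. It assumes random ±1 features and labels with no real relationship between them, and the function name `adaptive_attack` is hypothetical. The analyst first asks d correlation queries, each rescaled to a [0, 1]-valued statistical query and answered by its empirical mean. The analyst then uses the answers to build a majority-vote classifier and asks for that classifier's accuracy on the same sample. With the default settings (n = 100 samples, d = 401 queries), the reported accuracy is typically far above 1/2, while accuracy on fresh data stays near 1/2.

```python
import random


def adaptive_attack(n=100, d=401, n_fresh=1000, seed=0):
    """Analyst adapts queries to earlier empirical answers; returns (empirical, fresh) accuracy."""
    rng = random.Random(seed)

    def draw(m):
        # Features and labels are independent fair +-1 coins: nothing is learnable.
        xs = [[2 * rng.getrandbits(1) - 1 for _ in range(d)] for _ in range(m)]
        ys = [2 * rng.getrandbits(1) - 1 for _ in range(m)]
        return xs, ys

    xs, ys = draw(n)
    signs = []
    for j in range(d):
        # Query j: phi_j(x, y) = (1 + x_j * y) / 2, a [0,1]-valued statistical
        # query answered by its empirical mean on the sample.
        answer = sum((1 + x[j] * y) / 2 for x, y in zip(xs, ys)) / n
        signs.append(1 if answer >= 0.5 else -1)

    def classify(x):
        # d is odd, so this sum of +-1 terms is never zero.
        return 1 if sum(s * xj for s, xj in zip(signs, x)) > 0 else -1

    # Query d + 1 depends on all earlier answers: accuracy of the combined classifier.
    empirical = sum(classify(x) == y for x, y in zip(xs, ys)) / n
    fresh_xs, fresh_ys = draw(n_fresh)
    fresh = sum(classify(x) == y for x, y in zip(fresh_xs, fresh_ys)) / n_fresh
    return empirical, fresh


if __name__ == "__main__":
    print(adaptive_attack())
```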

Our results

Algorithmic stability
“Small changes” to the dataset have a “small effect” on the output
[Rogers, Wagner 78; Devroye, Wagner 79]
[Bousquet, Elisseeff 02; Kutin, Niyogi 02; Rakhlin, Mukherjee, Poggio 05; Shalev-Shwartz, Shamir, Srebro, Sridharan 10]

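Differential privacy [Dwork, McSherry, Nissim, Smith 06]
[Diagram: dataset S with records of individuals (Cynthia, Frank, Aaron, Chris, Kobbi, Adam) fed to an algorithm; the ratio of output probabilities when one record changes is bounded]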
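The bounded ratio is the definition of differential privacy. A randomized algorithm A is ε-differentially private if, for every pair of datasets S, S' differing in one element and every set T of outputs, Pr[A(S) ∈ T] ≤ e^ε · Pr[A(S') ∈ T]. A standard way to answer one [0, 1]-valued statistical query with this guarantee is the Laplace mechanism. Changing one of n points moves the empirical mean by at most 1/n, so adding Laplace noise of scale 1/(εn) suffices. The sketch below uses hypothetical function names. Python's `random` module has no Laplace sampler, so the code builds one from two exponentials.

```python
import random


def laplace(rng: random.Random, scale: float) -> float:
    """Laplace(0, scale) sample: difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_sq_answer(sample, phi, epsilon, rng):
    """epsilon-DP answer to a [0,1]-valued statistical query via the Laplace mechanism.

    Replacing one of the n points changes the empirical mean by at most 1/n
    (its sensitivity), so noise of scale 1 / (epsilon * n) suffices.
    """
    n = len(sample)
    mean = sum(phi(x) for x in sample) / n
    return mean + laplace(rng, 1.0 / (epsilon * n))


if __name__ == "__main__":
    rng = random.Random(0)
    data = [i % 2 for i in range(10_000)]
    print(laplace_sq_answer(data, float, epsilon=1.0, rng=rng))
```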

DP composes adaptively
[Diagram: algorithms A and B composed]
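Adaptive composition can be made quantitative. The sketch below computes the two standard composition bounds for differential privacy. These are textbook results, stated here as assumptions rather than taken from the slide. Suppose k mechanisms are run, each ε-DP and each possibly chosen based on earlier outputs. The combination is kε-DP. For any δ' > 0 it is also (ε', δ')-DP with ε' = √(2k ln(1/δ'))·ε + kε(e^ε − 1). The function names are hypothetical.

```python
import math


def basic_composition(k: int, epsilon: float) -> float:
    """k adaptively chosen epsilon-DP steps are (k * epsilon)-DP."""
    return k * epsilon


def advanced_composition(k: int, epsilon: float, delta_prime: float) -> float:
    """Epsilon' of k adaptive epsilon-DP steps, at the cost of extra failure probability delta_prime."""
    return (math.sqrt(2 * k * math.log(1 / delta_prime)) * epsilon
            + k * epsilon * (math.exp(epsilon) - 1))


if __name__ == "__main__":
    for k in (1, 100, 10_000):
        print(k, basic_composition(k, 0.1), round(advanced_composition(k, 0.1, 1e-6), 3))
```

For small ε the second bound grows roughly like √k·ε instead of k·ε, so it pays off once k is large. For a single step it is worse than the basic bound.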

DP composes adaptively
DP implies generalization
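A simple version of this connection can be worked out in expectation. The derivation below is a standard argument, not necessarily the one used in the talk. It assumes an ε-DP algorithm A that, given S = (x_1, …, x_n) drawn i.i.d. from a distribution P, outputs a query φ_S: X → [0, 1]. Write φ_S(S) = (1/n) Σ_i φ_S(x_i) for the empirical value and φ_S(P) = E_{x∼P}[φ_S(x)] for the true value. Let x_i' be a fresh sample from P, and let S^i be S with x_i replaced by x_i'. Expectations are over S, x_i', and the randomness of A.

```latex
\begin{aligned}
\mathbb{E}\big[\phi_S(x_i)\big]
  &\le e^{\varepsilon}\,\mathbb{E}\big[\phi_{S^i}(x_i)\big]
  && \text{($S$, $S^i$ differ in one element; $\phi \in [0,1]$)}\\
  &= e^{\varepsilon}\,\mathbb{E}\big[\phi_S(x_i')\big]
   = e^{\varepsilon}\,\mathbb{E}\big[\phi_S(P)\big]
  && \text{(swap $x_i \leftrightarrow x_i'$; $x_i'$ is independent of $S$)}\\[4pt]
\Rightarrow\quad \mathbb{E}\big[\phi_S(S)\big]
  &= \frac{1}{n}\sum_{i=1}^{n}\mathbb{E}\big[\phi_S(x_i)\big]
   \le e^{\varepsilon}\,\mathbb{E}\big[\phi_S(P)\big].
\end{aligned}
```

The same bound applies to the query 1 − φ_S, which is still the output of an ε-DP algorithm. Combining the two gives |E[φ_S(S)] − E[φ_S(P)]| ≤ e^ε − 1. So for small ε, a query chosen by a differentially private analysis has, on average, nearly the same value on the sample as on the population. The papers under Further reading give high-probability versions.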

Proof ideas

Reusable holdout
[Diagram: data analyst(s) work with the training data and access the holdout data through a reusable holdout algorithm]

Reusable holdout

Thresholdout algorithm
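A minimal Python sketch of a Thresholdout-style reusable holdout follows, modeled on the pseudocode in the Science 2015 overview listed under Further reading. Several details are assumptions and should be checked against that paper: the parameter names, the Laplace noise scales (2σ for the threshold, 4σ for each comparison, σ for the released value), and returning `None` once the budget is spent. Each query is answered from the training set unless the training and holdout values differ by more than a noisy threshold. In that case, one unit of budget is spent and a noisy holdout value is released.

```python
import random


class Thresholdout:
    """Sketch of a Thresholdout-style reusable holdout for [0,1]-valued queries."""

    def __init__(self, train, holdout, threshold, sigma, budget, seed=None):
        self.train, self.holdout = train, holdout
        self.threshold, self.sigma, self.budget = threshold, sigma, budget
        self.rng = random.Random(seed)
        self.noisy_threshold = threshold + self._laplace(2 * sigma)

    def _laplace(self, scale):
        # Laplace(0, scale) as a difference of two exponentials with mean `scale`.
        if scale == 0:
            return 0.0
        return self.rng.expovariate(1 / scale) - self.rng.expovariate(1 / scale)

    @staticmethod
    def _mean(data, phi):
        return sum(phi(x) for x in data) / len(data)

    def query(self, phi):
        """Return an estimate of E[phi], or None once the overfitting budget is spent."""
        if self.budget < 1:
            return None
        train_val = self._mean(self.train, phi)
        holdout_val = self._mean(self.holdout, phi)
        if abs(holdout_val - train_val) > self.noisy_threshold + self._laplace(4 * self.sigma):
            # Training value looks overfit: spend budget and refresh the threshold noise.
            self.budget -= 1
            self.noisy_threshold = self.threshold + self._laplace(2 * self.sigma)
            return holdout_val + self._laplace(self.sigma)
        # Training and holdout agree: answer from the training set.
        return train_val
```

If the threshold is large relative to typical train/holdout gaps, most queries are answered from the training set, and the holdout is consulted only through the noisy comparison. The budget limits how many times a noisy holdout value can be released.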

Illustration

Further reading
• Preserving Statistical Validity in Adaptive Data Analysis, http://arxiv.org/abs/1411.2664 [STOC 2015]
• Generalization in Adaptive Data Analysis and Holdout Reuse, http://arxiv.org/abs/1506.02629 [NIPS 2015]
• Overview: Reusable holdout: Preserving validity in adaptive data analysis [Science, 2015]
Come to the workshop on “Adaptive Data Analysis” at NIPS 2015! wadapt.org