Understanding Generalization in Adaptive Data Analysis
Vitaly Feldman

Overview
• Adaptive data analysis
  o Motivation
  o Definitions
  o Basic techniques
  With Dwork, Hardt, Pitassi, Reingold, Roth [DFHPRR 14, 15]
• New results [F, Steinke 17]
• Adaptivity in convex optimization

[Figure labels: Results, Data, Analysis]

Statistical inference
• Data
• Theory
  o Concentration/CLT
  o Model complexity
  o Rademacher complexity
  o Stability
  o Online-to-batch

Data analysis is adaptive
Steps depend on previous analyses of the same dataset
• Data pre-processing
• Exploratory data analysis
• Feature selection
• Model stacking
• Hyper-parameter tuning
• Trial-and-error
• Shared datasets
• …
[Figure label: Data analyst(s)]

Thou shalt not test hypotheses suggested by data
“Quiet scandal of statistics” [Leo Breiman, 1992]

Reproducibility crisis?
• “Why Most Published Research Findings Are False” [Ioannidis 2005]
• “Irreproducible preclinical research exceeds 50%, resulting in approximately US$28B/year loss” [Freedman, Cockburn, Simcoe 2015]

Existing approaches
• Sample splitting
• Selective inference
  o Model selection + parameter estimation
  o Variable selection + regression
• Pre-registration
© Center for Open Science

ML practice
[Figure labels: Data; Training, Validation, Testing; XGBoost, SVRG, TensorFlow]

Adaptive data analysis [DFHPRR 14]
[Figure labels: Data analyst(s), Algorithm]

Adaptive statistical queries
[Figure labels: Data analyst(s), Statistical query oracle [Kearns 93]]
• Can measure correlations, moments, accuracy/loss, gradients
• Run any statistical query algorithm
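
As a rough illustration of the oracle interface: a statistical query is a function mapping one data point to [0, 1], and the oracle returns an estimate of its expectation. The sketch below simply answers with the empirical mean. The name `sq_oracle`, the `Point` type, and the toy dataset are assumptions made for this example and do not come from the slides.

```python
from typing import Callable, Sequence, Tuple

Point = Tuple[float, int]  # (feature, label): a toy data point, assumed for illustration


def sq_oracle(sample: Sequence[Point], query: Callable[[Point], float]) -> float:
    """Answer a statistical query with the empirical mean of query(x) over the sample."""
    values = [query(x) for x in sample]
    if any(v < 0.0 or v > 1.0 for v in values):
        raise ValueError("statistical queries must map into [0, 1]")
    return sum(values) / len(values)


sample = [(0.2, 0), (0.9, 1), (0.7, 1), (0.1, 0)]

# Accuracy of the threshold classifier "predict 1 if feature > 0.5".
accuracy = sq_oracle(sample, lambda p: float((p[0] > 0.5) == bool(p[1])))
# First moment of the feature (the toy features already lie in [0, 1]).
mean_feature = sq_oracle(sample, lambda p: p[0])
```

Accuracy, moments, and correlations all fit this one interface, which is why a guarantee for the oracle carries over to any analysis phrased as statistical queries.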

Answering non-adaptive SQs
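
The slide's own formula was lost in extraction. For context, the textbook guarantee for this setting follows from Hoeffding's inequality and a union bound; the symbols below ($n$ samples $x_1,\dots,x_n$ drawn i.i.d. from $P$, $k$ queries $\phi_1,\dots,\phi_k : X \to [0,1]$ fixed before seeing the data, tolerance $\tau$, failure probability $\beta$) are notation assumed here.

```latex
\Pr\left[\exists\, j \le k:\ \left|\frac{1}{n}\sum_{i=1}^{n}\phi_j(x_i) - \mathbb{E}_{x\sim P}\big[\phi_j(x)\big]\right| > \tau\right]
\;\le\; 2k\,e^{-2n\tau^{2}},
\qquad\text{so}\qquad
\tau = \sqrt{\frac{\ln(2k/\beta)}{2n}}
\ \text{ gives failure probability at most } \beta .
```

The tolerance grows only with $\sqrt{\ln k}$, so empirical means can answer a number of non-adaptive queries that is exponential in $n$.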

Answering adaptively-chosen SQs
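
A small simulation can show why empirical means stop being reliable once queries depend on earlier answers. This is an illustrative sketch, not the slide's content: features and labels are independent random signs, so no classifier has true accuracy above 0.5, yet a classifier assembled from the "significant" correlations looks accurate on the reused sample. The sizes, the selection threshold, and all function names are assumptions.

```python
import random


def make_data(rng, n, d):
    """n points with d uniform +/-1 features and an independent uniform +/-1 label."""
    xs = [[rng.choice((-1, 1)) for _ in range(d)] for _ in range(n)]
    ys = [rng.choice((-1, 1)) for _ in range(n)]
    return xs, ys


def correlation(xs, ys, j):
    """Empirical answer to the query 'correlation of feature j with the label'."""
    return sum(x[j] * y for x, y in zip(xs, ys)) / len(ys)


def accuracy(xs, ys, weights):
    """Fraction of points where the sign of the weighted feature sum matches the label."""
    hits = 0
    for x, y in zip(xs, ys):
        score = sum(w * x[j] for j, w in weights.items())
        hits += (1 if score >= 0 else -1) == y
    return hits / len(ys)


rng = random.Random(0)
n, d = 200, 400
xs, ys = make_data(rng, n, d)

# Round 1: d queries, each answered with the exact empirical mean.
answers = [correlation(xs, ys, j) for j in range(d)]

# Round 2 depends on round 1: keep only features whose answers look "significant".
threshold = 1 / n ** 0.5
weights = {j: (1 if a > 0 else -1) for j, a in enumerate(answers) if abs(a) >= threshold}

sample_accuracy = accuracy(xs, ys, weights)
fresh_xs, fresh_ys = make_data(rng, 1000, d)
fresh_accuracy = accuracy(fresh_xs, fresh_ys, weights)
print(f"accuracy on the reused sample: {sample_accuracy:.2f}")
print(f"accuracy on fresh data:        {fresh_accuracy:.2f}")
```

The gap between the two printed numbers is pure adaptive overfitting: only about 400 queries were asked, far fewer than the non-adaptive setting tolerates.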

Answering adaptive SQs

Differential privacy [Dwork, McSherry, Nissim, Smith 06]
[Figure labels: M, ratio bounded]
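
The "ratio bounded" label refers to the standard definition, restated here for reference. The notation is assumed: $M$ is a randomized algorithm, $S$ and $S'$ are datasets differing in a single element, and $Z$ is any set of outputs.

```latex
M \text{ is } (\epsilon,\delta)\text{-differentially private if for all such } S, S' \text{ and all } Z:
\qquad
\Pr[M(S) \in Z] \;\le\; e^{\epsilon}\,\Pr[M(S') \in Z] + \delta .
```

With $\delta = 0$ this says the ratio of output probabilities on neighboring datasets is bounded by $e^{\epsilon}$.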

DP implies generalization
• Differential privacy is stability
• Differential privacy limits information learned about the dataset
• DP composes adaptively

Value perturbation [DMNS 06]
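
The slide body did not survive extraction. A minimal sketch of value perturbation in the usual sense, assuming a query with values in [0, 1]: changing one data point moves the empirical mean by at most 1/n, so adding Laplace noise of scale 1/(n·ε) makes the answer ε-differentially private. The function names and parameter values are hypothetical; the Laplace sample is drawn as a difference of two exponentials because the standard library has no Laplace sampler.

```python
import random


def laplace_noise(rng: random.Random, scale: float) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_mean(values, epsilon, rng):
    """Empirical mean of values in [0, 1] plus Laplace noise of scale 1 / (n * epsilon)."""
    n = len(values)
    if any(v < 0.0 or v > 1.0 for v in values):
        raise ValueError("values must lie in [0, 1]")
    sensitivity = 1.0 / n  # changing one data point moves the mean by at most 1/n
    return sum(values) / n + laplace_noise(rng, sensitivity / epsilon)


rng = random.Random(1)
values = [rng.random() for _ in range(10_000)]
exact = sum(values) / len(values)
answer = noisy_mean(values, epsilon=0.5, rng=rng)
```

Here the noise scale is 1 / (10,000 × 0.5) = 0.0002, small next to the sampling error of the mean itself, which is why perturbation costs little accuracy for low-sensitivity queries.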

Beyond low-sensitivity

Stable Median

Median algorithms
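
The algorithms on this slide were lost in extraction. The sketch below shows one plausible instance in the spirit of the stable-median idea of [F, Steinke 17]: split the data into disjoint chunks, run an arbitrary real-valued estimator on each, and select an approximate median of the chunk values with the exponential mechanism. The candidate grid, the score function, and all names are assumptions of this example rather than the talk's exact construction.

```python
import math
import random


def stable_median(sample, estimator, k, candidates, epsilon, rng):
    """Split `sample` into k disjoint chunks, run `estimator` on each chunk, and pick
    a candidate near the median of the k values via the exponential mechanism."""
    chunks = [sample[i::k] for i in range(k)]
    values = [estimator(chunk) for chunk in chunks]

    def score(t):
        below = sum(v < t for v in values)
        above = sum(v > t for v in values)
        return -max(below, above)  # changes by at most 1 if one chunk's value changes

    scores = [score(t) for t in candidates]
    best = max(scores)
    # Exponential mechanism for a sensitivity-1 score: P(t) proportional to exp(eps * score / 2).
    weights = [math.exp(epsilon * (s - best) / 2.0) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]


def mean(chunk):
    """Example estimator; any real-valued function of a chunk could be used instead."""
    return sum(chunk) / len(chunk)


rng = random.Random(2)
sample = [rng.gauss(0.3, 1.0) for _ in range(20_000)]
candidates = [i / 100 for i in range(-100, 101)]  # grid on [-1, 1], an assumed output range

estimate = stable_median(sample, mean, k=200, candidates=candidates, epsilon=1.0, rng=rng)
```

One data point affects only one chunk, hence only one of the k values, so the selection is stable even when the estimator itself has unbounded sensitivity.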

Analysis

Limits

Stochastic convex optimization

Gradient descent
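
Each coordinate of the gradient of an expected loss is itself an expectation over the data, so (up to rescaling into [0, 1]) a gradient step can be phrased as a batch of statistical queries, and each query point depends on the earlier answers. The sketch below uses squared loss on synthetic linear data and answers with plain empirical means; the loss, step size, and names are assumptions for illustration, and a noise-adding oracle could be substituted for `gradient_oracle`.

```python
import random


def gradient_oracle(sample, w):
    """Answer 'expected gradient of the squared loss at w' with empirical means."""
    d = len(w)
    grad = [0.0] * d
    for x, y in sample:
        residual = sum(wj * xj for wj, xj in zip(w, x)) - y
        for j in range(d):
            grad[j] += residual * x[j] / len(sample)
    return grad


def gradient_descent(sample, d, steps, lr):
    """Run `steps` gradient steps from the origin, querying the oracle once per step."""
    w = [0.0] * d
    for _ in range(steps):
        # The query point w depends on all previous answers: the queries are adaptive.
        grad = gradient_oracle(sample, w)
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


rng = random.Random(3)
true_w = [0.5, -0.25]
sample = []
for _ in range(2000):
    x = [rng.uniform(-1, 1) for _ in true_w]
    y = sum(a * b for a, b in zip(true_w, x)) + rng.gauss(0, 0.1)
    sample.append((x, y))

w = gradient_descent(sample, d=2, steps=200, lr=0.5)
```

Running T steps in d dimensions issues T·d adaptively chosen queries against the same sample, which is the setting the adaptive-analysis guarantees are meant to cover.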

Conclusions
✓ Real-valued analyses (without any assumptions)
• Going beyond tools from DP
  o Other notions of stability for outcomes [BNSSSU 16; RRTWX 16; F., Steinke 17b]
  o Max/mutual information [DFHPRR 15; Russo, Zou 16; RRST 16; Bassily, Freund 16; Xu, Raginsky 17]
• Generalization beyond uniform convergence
• Using these techniques in practice