BOOTSTRAPS Techniques for the Computing-Capable Statistician 12/31/2021 Think hard about statistical properties of estimators. 1

THE BOOK
Bradley Efron and Robert J. Tibshirani, An Introduction to the Bootstrap. Monographs on Statistics and Applied Probability 57. Chapman & Hall/CRC.

WHEN PROBABILITY THEORY WORKS...
- ...it works very well.
  - sums of iid random variables ~ Normal (approximately, for large n)
  - min of iid random variables (e.g. Exponential ones) ~ Exponential
  - sums of standard Normal^2 ~ χ²
  - ratios of χ² ~ F
- See handout on transforms, etc.

STATISTICS
- Estimate a quantity
  - θ-hat estimates θ
- Predict the variability of the estimate
  - involves predicting the distribution (form, parameters) of the estimator θ-hat
- X-bar and σ-hat have known distributions
  - X-bar ~ Normal
  - σ²-hat ~ χ² (after scaling, for Normal data)

OTHER STATISTICS
- Median (an example of)
  - quartile estimates
  - order statistics
- Ratios, Transforms, non-polynomial Functions
- None of these have known distributions
- How can you assess the variability of an estimator?

PRELIMINARY DEFINITION AND NOTATION
- Given samples X_1, X_2, ..., X_n
- X(i) is the i-th smallest sample and is called the i-th order statistic
- X_α = X(i) such that αn ~ i
  - X_α is called the α-th percentile (p-tile)
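The definitions above can be sketched in a few lines of Python. This is only an illustration: the function names `order_statistic` and `percentile` are hypothetical, and rounding αn to the nearest integer is one assumed reading of "αn ~ i".

```python
def order_statistic(samples, i):
    """Return X(i), the i-th smallest sample (1-indexed)."""
    return sorted(samples)[i - 1]


def percentile(samples, alpha):
    """Return X_alpha = X(i), where i is alpha * n rounded and clamped to 1..n.

    Rounding to the nearest integer is an assumption; the slide only says
    that alpha * n should be approximately i.
    """
    n = len(samples)
    i = min(n, max(1, round(alpha * n)))
    return order_statistic(samples, i)


# Made-up illustrative data (8 samples).
data = [13.0, 10.0, 11.6, 12.9, 11.2, 11.0, 13.9, 12.8]
x_1 = order_statistic(data, 1)    # smallest sample, 10.0
x_half = percentile(data, 0.5)    # i = round(0.5 * 8) = 4, so X(4) = 11.6
```

With n = 1000, the same rule gives X_0.025 = X(25) and X_0.975 = X(975).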

EXAMPLE
- X(1) = 10
- X(2) = 11
- X(3) = 11.2
- X(4) = 11.6
- ...
- X(997) = 12.8
- X(998) = 12.9
- X(999) = 13.0
- X(1000) = 13.9
- X_0.025 = X(25)
- X_0.975 = X(975)

EMPIRICAL CONFIDENCE INTERVAL
- Empirical confidence interval for X is (X_0.025, X_0.975) = (X(25), X(975))
  - for X, not the mean or median, etc.
- Can use all 1000 samples to estimate the median
  - M = X(500) = 11.9
  - NO predictive value
  - How accurate is this estimate?

MORE VERNACULAR
- Call F the underlying distribution of the phenomenon being studied
  - F(x) = P(X <= x)
- Call F-hat the empirical (observed example) distribution of F
  - F-hat = {X_1, X_2, ..., X_n} weighted 1/n each
- BOOTSTRAPPING: Use F-hat as a sampling surrogate for F
  - don't oversell resulting reliability of estimates
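As a rough illustration of F-hat, the sketch below builds the empirical distribution function from a handful of made-up values and draws from it. The helper names `ecdf` and `draw_from_f_hat` are hypothetical, not from the slides.

```python
import random


def ecdf(samples):
    """Return F-hat as a function: the fraction of samples <= x (weight 1/n each)."""
    xs = sorted(samples)
    n = len(xs)

    def f_hat(x):
        return sum(1 for v in xs if v <= x) / n

    return f_hat


def draw_from_f_hat(samples, rng):
    """Draw one value from F-hat: each observed sample has probability 1/n."""
    return rng.choice(samples)


# Made-up illustrative data.
data = [10.0, 11.0, 11.2, 11.6, 12.8]
f_hat = ecdf(data)
p = f_hat(11.2)                    # 3 of the 5 samples are <= 11.2

rng = random.Random(0)             # seeded only for repeatability
draw = draw_from_f_hat(data, rng)  # always one of the observed values
```

Sampling from F-hat can only ever return values that were actually observed, which is one reason not to oversell the reliability of the resulting estimates.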

SMOOTHED F-HAT (figure)

BOOTSTRAPPING
- Given samples F-hat = {X_1, X_2, ..., X_n}
- b-th bootstrap sample x*(b)
  - sample n times from X_1, X_2, ..., X_n with replacement
  - let m*(b) be the median of the b-th set of samples
- m*(1), m*(2), ..., m*(B) is a sample of medians
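The resampling procedure above might be sketched as follows. The base data here are synthetic (drawn from a Normal distribution purely for illustration), and `bootstrap_medians` is a hypothetical helper name.

```python
import random
import statistics


def bootstrap_medians(samples, B, rng):
    """Return m*(1), ..., m*(B): medians of B resamples drawn with replacement."""
    n = len(samples)
    medians = []
    for _ in range(B):
        resample = rng.choices(samples, k=n)  # n draws with replacement from F-hat
        medians.append(statistics.median(resample))
    return medians


rng = random.Random(42)                            # seeded only for repeatability
base = [rng.gauss(12.0, 1.0) for _ in range(100)]  # synthetic stand-in for real data
m_star = bootstrap_medians(base, B=500, rng=rng)
```

Each resample has the same size n as the base sample, so every m*(b) is a median of n values, just like the original estimate.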

THE BASE SAMPLE FORMS THE POPULATION FOR THE BOOTSTRAP SAMPLE
- REAL WORLD: F -> X_1, X_2, ..., X_n -> Mbase (usual estimate)
- BOOTSTRAP WORLD: EMPIRICAL F -> X*_1, X*_2, ..., X*_n ... -> BOOTSTRAP estimate of Mbase's distribution

KEY EXCEPTION
- Are m*(1), m*(2), ..., m*(B) independent samples of the median?
  - With respect to F-hat but not with respect to F

- Mbase has nonparametric confidence interval (m*_0.025, m*_0.975)
- Standard error of Mbase estimated as the standard deviation of m*(1), m*(2), ..., m*(B)
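A minimal sketch of both quantities, assuming synthetic base data and the rounding rule i = round(αB) for picking percentiles of the bootstrap medians. The helper name `percentile_interval` and all numeric settings are illustrative assumptions.

```python
import random
import statistics


def percentile_interval(values, lo=0.025, hi=0.975):
    """Return (v_lo, v_hi) using order statistics with i = round(alpha * n)."""
    xs = sorted(values)
    n = len(xs)

    def pick(alpha):
        i = min(n, max(1, round(alpha * n)))  # clamp to a valid 1-indexed rank
        return xs[i - 1]

    return pick(lo), pick(hi)


rng = random.Random(1)                               # seeded only for repeatability
base = [rng.gauss(100.0, 15.0) for _ in range(100)]  # synthetic stand-in data
m_base = statistics.median(base)

# B = 500 bootstrap medians, each from a resample of size n with replacement.
m_star = [statistics.median(rng.choices(base, k=len(base))) for _ in range(500)]

ci_low, ci_high = percentile_interval(m_star)  # (m*_0.025, m*_0.975)
se = statistics.stdev(m_star)                  # bootstrap standard error of Mbase
```

The interval and the standard error describe the variability of Mbase under resampling from F-hat; they are only as good as F-hat is as a stand-in for F.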

PRACTICAL APPLICATION
- Bootstrap samples treated as independent
- B ~ 500
- Practical for ANY sample statistic
- Spreadsheet Bootstrap.xls does an estimate of the Median and IQR (X_0.75 - X_0.25) for IQ scores
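To illustrate "practical for ANY sample statistic", the sketch below passes the statistic in as a function and applies it to both the median and the IQR. It does not reproduce the spreadsheet: the IQ scores are synthetic (Normal with mean 100 and standard deviation 15 is an assumption), and `bootstrap` and `iqr` are hypothetical helper names.

```python
import random
import statistics


def iqr(samples):
    """Interquartile range X_0.75 - X_0.25, using order statistics."""
    xs = sorted(samples)
    n = len(xs)

    def pick(alpha):
        i = min(n, max(1, round(alpha * n)))
        return xs[i - 1]

    return pick(0.75) - pick(0.25)


def bootstrap(samples, statistic, B, rng):
    """Apply `statistic` to B resamples of size n drawn with replacement."""
    n = len(samples)
    return [statistic(rng.choices(samples, k=n)) for _ in range(B)]


rng = random.Random(7)                                 # seeded only for repeatability
iq_scores = [rng.gauss(100.0, 15.0) for _ in range(200)]  # synthetic, not real IQ data

median_star = bootstrap(iq_scores, statistics.median, B=500, rng=rng)
iqr_star = bootstrap(iq_scores, iqr, B=500, rng=rng)

se_median = statistics.stdev(median_star)
se_iqr = statistics.stdev(iqr_star)
```

Swapping in another statistic only requires passing a different function; the resampling loop stays the same.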

IS BOOTSTRAPPING CHEATING?
- Example:
  - 100 real datapoints, 200 Bootstrap samples
  - statistic M calculated for each Bootstrap sample
  - Standard (non-bootstrap) Error of Mbase is sqrt( Σ(M*_i - Mbase)² / 199 )

IS BOOTSTRAPPING CHEATING?
- If we had 100 x 200 = 20,000 independent samples
  - One large pool to estimate Mbase
  - Standard Error of M is ~ sqrt( Σ(M_i - Mbase)² / 19,999 )
- As the number of bootstrap samples increases, the standard error estimate stabilizes
- As the number of independent samples increases, the standard error estimate converges to 0!
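The contrast between "stabilizes" and "converges to 0" can be checked with a small simulation. This is a sketch under stated assumptions: the data are standard Normal, the statistic is the median, and the larger independent sample uses n = 2,500 rather than 20,000 to keep the run short. The helper names are hypothetical.

```python
import random
import statistics


def bootstrap_se(samples, B, rng):
    """Standard deviation of B bootstrap medians from one fixed base sample."""
    n = len(samples)
    meds = [statistics.median(rng.choices(samples, k=n)) for _ in range(B)]
    return statistics.stdev(meds)


def sampling_se(n, repeats, rng):
    """Standard deviation of the median over fresh independent N(0, 1) samples of size n."""
    meds = [
        statistics.median([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(repeats)
    ]
    return statistics.stdev(meds)


rng = random.Random(2021)                        # seeded only for repeatability
base = [rng.gauss(0.0, 1.0) for _ in range(100)]  # 100 "real" datapoints (synthetic here)

# More bootstrap samples from the SAME 100 points: the estimate settles down.
se_b200 = bootstrap_se(base, 200, rng)
se_b2000 = bootstrap_se(base, 2000, rng)

# More INDEPENDENT data: the standard error of the median itself shrinks.
se_n100 = sampling_se(100, 200, rng)
se_n2500 = sampling_se(2500, 200, rng)
```

Increasing B refines the estimate of the standard error at n = 100 but cannot reduce that standard error; only new independent data does.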

SUMMARY
- Bootstrapping allows us to estimate the variability of sample statistics where the statistic's probability distribution is unknown.