The sampling distribution of a statistic
Sampling distribution of the mean of a sample
Monte Carlo Estimates
The process for simulating the sampling distribution of a statistic:
1. Generate many random samples.
2. Compute the statistic for each sample.
3. Treat the collection of computed statistics as an approximate sampling distribution (ASD).
4. Analyze the ASD to draw conclusions.
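The four steps can also be sketched outside SAS. The following minimal Python sketch (standard library only) is an illustration, not the method used in these slides: the function name `approx_sampling_distribution` is hypothetical, U(0, 1) is assumed as the population, and Python's random-number generator will not reproduce SAS's RAND values even with the same seed.

```python
import random
import statistics

def approx_sampling_distribution(statistic, sample_size, reps, seed=54321):
    """Return `reps` values of `statistic`, each computed on a fresh
    random sample of `sample_size` draws from U(0, 1)."""
    rng = random.Random(seed)  # seeded generator for reproducible results
    asd = []
    for _ in range(reps):
        sample = [rng.random() for _ in range(sample_size)]  # 1. generate a sample
        asd.append(statistic(sample))                       # 2. compute the statistic
    return asd                                              # 3. the collection is the ASD

# 4. Analyze the ASD, here with simple summary statistics
asd = approx_sampling_distribution(statistics.mean, sample_size=10, reps=1000)
print(len(asd), round(statistics.mean(asd), 3), round(statistics.stdev(asd), 3))
```

Passing a different function, such as `statistics.median`, as `statistic` reuses the same loop for another statistic.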
Simulation Using the DATA Step
Sampling Distribution of the Mean, Uniform(0, 1)
1. Generate random samples.

    %let obs  = 10;      /* size of each sample */
    %let reps = 1000;    /* number of samples   */
    %let seed = 54321;
    data SimUni;
       call streaminit(&seed);
       do rep = 1 to &reps;
          do i = 1 to &obs;
             x = rand("Uniform");
             output;
          end;
       end;
    run;
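For comparison, a minimal Python sketch of step 1, assuming the same sizes as the %let statements. The list `sim_uni` is a hypothetical stand-in for the SimUni data set, holding one (rep, i, x) tuple per observation; `random.random()` draws from [0, 1), and its values will differ from SAS's.

```python
import random

OBS, REPS, SEED = 10, 1000, 54321  # same values as the %let statements

rng = random.Random(SEED)
# One tuple per observation, like one row of SimUni: (rep, i, x)
sim_uni = [
    (rep, i, rng.random())  # x ~ U(0, 1); random() returns values in [0, 1)
    for rep in range(1, REPS + 1)
    for i in range(1, OBS + 1)
]
print(len(sim_uni), sim_uni[0])
```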
2. Compute the mean of each sample.

    proc means data=SimUni noprint;
       by rep;                          /* one mean per sample */
       var x;
       output out=OutUni mean=MeanX;
    run;
    proc print data=OutUni(obs=10);
    run;
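The BY-group computation amounts to grouping the observations by rep and averaging each group. A minimal Python sketch, assuming the long-format data is regenerated in place; `out_uni` is a hypothetical dict standing in for OutUni.

```python
import random
from collections import defaultdict
from statistics import mean

# Regenerate long-format data: one (rep, x) pair per observation
rng = random.Random(54321)
sim_uni = [(rep, rng.random()) for rep in range(1, 1001) for _ in range(10)]

# BY rep: collect the x values of each sample, then average them
groups = defaultdict(list)
for rep, x in sim_uni:
    groups[rep].append(x)
out_uni = {rep: mean(xs) for rep, xs in groups.items()}  # rep -> MeanX

# Show the first 10 sample means, like PROC PRINT with obs=10
for rep in list(out_uni)[:10]:
    print(rep, round(out_uni[rep], 4))
```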
3. Analyze the ASD: summarize it and create a histogram.

    proc means data=OutUni N Mean Std P5 P95;
       var MeanX;
    run;
    proc univariate data=OutUni;
       label MeanX = "Sample Mean of U(0, 1) Data";
       histogram MeanX / normal;        /* overlay a fitted normal curve */
       ods select Histogram Moments GoodnessOfFit;
    run;
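A minimal Python sketch of the same summary, assuming the ASD is regenerated with Python's generator. Note that `statistics.quantiles` uses its own interpolation rule ("exclusive" by default), which is not SAS's default percentile definition, so P5 and P95 can differ slightly from PROC MEANS.

```python
import random
import statistics

# Regenerate the ASD: 1000 means of samples of size 10 from U(0, 1)
rng = random.Random(54321)
means = [statistics.fmean(rng.random() for _ in range(10)) for _ in range(1000)]

cuts = statistics.quantiles(means, n=20)  # 19 cut points: 5%, 10%, ..., 95%
summary = {
    "N": len(means),
    "Mean": statistics.fmean(means),
    "Std": statistics.stdev(means),  # sample standard deviation (n - 1 divisor)
    "P5": cuts[0],
    "P95": cuts[-1],
}
print({name: round(value, 4) for name, value in summary.items()})
```

By the central limit theorem the sample mean is roughly normal with mean 0.5 and standard deviation sqrt(1/120) ≈ 0.091, so P5 and P95 should land near 0.5 ∓ 1.645 × 0.091, about 0.35 and 0.65.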
Examine percentiles. The 2.5th and 97.5th percentiles bracket the central 95% of the simulated sample means.

    proc univariate data=OutUni noprint;
       var MeanX;
       output out=Pctl95 N=N mean=MeanX pctlpts=2.5 97.5 pctlpre=Pctl;
    run;
    proc print data=Pctl95 noobs;
    run;
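A minimal Python sketch of the percentile step, assuming a regenerated ASD. It also checks that roughly 95% of the sample means fall between the two percentiles; the interpolation rule of `statistics.quantiles` is an assumption that may differ from PROC UNIVARIATE's default.

```python
import random
import statistics

rng = random.Random(54321)
means = [statistics.fmean(rng.random() for _ in range(10)) for _ in range(1000)]

cuts = statistics.quantiles(means, n=40)  # 39 cut points: 2.5%, 5%, ..., 97.5%
pctl2_5, pctl97_5 = cuts[0], cuts[-1]

# About 95% of the simulated sample means should fall between the two percentiles
coverage = sum(pctl2_5 <= m <= pctl97_5 for m in means) / len(means)
print(round(pctl2_5, 4), round(pctl97_5, 4), coverage)
```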
Estimate probabilities from the ASD, e.g., the probability that the mean of a sample exceeds 0.7:

    proc sql;
       select sum(MeanX > 0.7) / count(*) as prob   /* proportion of sample means above 0.7 */
       from OutUni;
    quit;
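Because the sum of ten independent U(0, 1) variables follows the Irwin–Hall distribution, this particular probability can also be computed exactly and used to check the simulation. A minimal Python sketch; the function names `prob_mean_exceeds` and `irwin_hall_cdf` are hypothetical.

```python
import math
import random

def prob_mean_exceeds(cutoff, n=10, reps=1000, seed=54321):
    """Monte Carlo estimate of P(mean of n U(0, 1) draws > cutoff)."""
    rng = random.Random(seed)
    hits = sum(sum(rng.random() for _ in range(n)) / n > cutoff for _ in range(reps))
    return hits / reps  # proportion of the ASD above the cutoff

def irwin_hall_cdf(s, n):
    """Exact P(U1 + ... + Un <= s) for independent U(0, 1) draws, 0 <= s <= n."""
    return sum((-1) ** k * math.comb(n, k) * (s - k) ** n
               for k in range(math.floor(s) + 1)) / math.factorial(n)

prob = prob_mean_exceeds(0.7)
exact = 1 - irwin_hall_cdf(7, 10)  # mean > 0.7 is the same event as sum > 7
print(prob, round(exact, 5))
```

With only 1000 samples the Monte Carlo estimate has a standard error of roughly 0.004, so it can differ noticeably from the exact value of about 0.0135.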
Sampling Distribution of Statistics from Normal Data
1. Simulate data.

    %let obs  = 31;      /* size of each sample */
    %let reps = 10000;   /* number of samples   */
    %let seed = 54321;
    data Normals(drop=i);
       call streaminit(&seed);
       do rep = 1 to &reps;
          do i = 1 to &obs;
             x = rand("Normal");        /* N(0, 1) */
             output;
          end;
       end;
    run;
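A minimal Python sketch of the same simulation, assuming the sizes from the %let statements. The nested list `normals` is a hypothetical stand-in for the Normals data set (one inner list per sample), and `random.gauss` replaces SAS's RAND("Normal").

```python
import random
import statistics

OBS, REPS, SEED = 31, 10000, 54321  # same values as the %let statements

rng = random.Random(SEED)
# normals[r] is sample r: OBS independent N(0, 1) draws (gauss takes mu, sigma)
normals = [[rng.gauss(0.0, 1.0) for _ in range(OBS)] for _ in range(REPS)]

# Pooled check: all draws together should look like N(0, 1)
all_x = [x for sample in normals for x in sample]
pooled_mean = statistics.fmean(all_x)
pooled_var = statistics.fmean((x - pooled_mean) ** 2 for x in all_x)
print(len(normals), round(pooled_mean, 3), round(pooled_var, 3))
```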
2. Compute statistics for each sample.

    proc means data=Normals noprint;
       by rep;
       var x;
       output out=StatsNorm mean=SampleMean median=SampleMedian var=SampleVar;
    run;
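A minimal Python sketch of the per-sample statistics. PROC MEANS computes VAR with the n - 1 divisor by default, so the hypothetical helper `sample_var` does the same; `stats_norm` is a hypothetical stand-in for StatsNorm.

```python
import random
import statistics

rng = random.Random(54321)
normals = [[rng.gauss(0.0, 1.0) for _ in range(31)] for _ in range(10000)]

def sample_var(xs):
    """Sample variance with the n - 1 divisor."""
    m = statistics.fmean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# One record per sample, analogous to one row of StatsNorm
stats_norm = [
    {"SampleMean": statistics.fmean(s),
     "SampleMedian": statistics.median(s),
     "SampleVar": sample_var(s)}
    for s in normals
]
print(stats_norm[0])
```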
3. Analyze the approximate sampling distribution: compute the variances of the sampling distributions of the mean and the median.

    proc means data=StatsNorm Var;
       var SampleMean SampleMedian;
    run;
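The simulated variances can be compared with theory: for N(0, 1) data the sample mean has variance 1/n, and for large n the sample median has variance close to π/(2n). A minimal Python sketch of that comparison, assuming n = 31 and 10000 samples as on the slides.

```python
import math
import random
import statistics

OBS, REPS = 31, 10000
rng = random.Random(54321)
sample_means, sample_medians = [], []
for _ in range(REPS):
    s = [rng.gauss(0.0, 1.0) for _ in range(OBS)]
    sample_means.append(statistics.fmean(s))
    sample_medians.append(statistics.median(s))

var_mean = statistics.variance(sample_means)      # theory: 1/OBS
var_median = statistics.variance(sample_medians)  # large-sample theory: pi/(2*OBS)
print(round(var_mean, 4), round(1 / OBS, 4))
print(round(var_median, 4), round(math.pi / (2 * OBS), 4))
```

The median's sampling distribution is wider, by a factor of roughly π/2 ≈ 1.57 in variance for large samples, which is why the mean is the more efficient estimator of the center of normal data.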
3. Analyze the approximate sampling distribution: plot kernel density estimates.

    proc sgplot data=StatsNorm;
       title "Sampling Distributions of Mean and Median for N(0, 1) Data";
       density SampleMean / type=kernel legendlabel="Mean";
       density SampleMedian / type=kernel legendlabel="Median";
       refline 0 / axis=x;
    run;
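A kernel density estimate averages a smooth bump centered at each data value. A minimal Python sketch with Gaussian kernels; the bandwidth here is Silverman's rule of thumb, an assumption that is not necessarily the bandwidth PROC SGPLOT chooses, and only 2000 samples are used to keep the pure-Python loop fast. The function name `gaussian_kde` is hypothetical.

```python
import math
import random
import statistics

def gaussian_kde(data, grid):
    """Gaussian kernel density estimate of `data` at each point of `grid`.
    Bandwidth: Silverman's rule of thumb, 0.9 * min(sd, IQR / 1.34) * n ** (-1/5)."""
    n = len(data)
    q1, _, q3 = statistics.quantiles(data, n=4)
    h = 0.9 * min(statistics.stdev(data), (q3 - q1) / 1.34) * n ** (-0.2)
    scale = 1.0 / (n * h * math.sqrt(2.0 * math.pi))
    return [scale * sum(math.exp(-0.5 * ((g - x) / h) ** 2) for x in data)
            for g in grid]

rng = random.Random(54321)
means, medians = [], []
for _ in range(2000):  # fewer samples than the slides, to keep the KDE fast
    s = [rng.gauss(0.0, 1.0) for _ in range(31)]
    means.append(statistics.fmean(s))
    medians.append(statistics.median(s))

grid = [-0.8 + 0.02 * k for k in range(81)]  # x values from -0.8 to 0.8
dens_mean = gaussian_kde(means, grid)
dens_median = gaussian_kde(medians, grid)
print(round(max(dens_mean), 2), round(max(dens_median), 2))
```

The taller, narrower peak for the mean is the graphical counterpart of its smaller variance.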
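3. Analyze the approximate sampling distribution: examine the sampling distribution of the variance and fit a chi-square distribution. For normal data, (N - 1)s²/σ² follows a chi-square distribution with N - 1 degrees of freedom. Here N = 31 and σ² = 1, so the scaled variances should follow chi-square(30), which is the gamma distribution with shape 15 and scale 2.

    /* scale the sample variances by (N-1)/sigma^2, with sigma^2 = 1 */
    data StatsNorm;
       set StatsNorm;
       ScaledVar = SampleVar * (&obs-1)/1;
    run;
    /* fit chi-square(30) = gamma(alpha=15, scale=2) to the scaled variances */
    proc univariate data=StatsNorm;
       label ScaledVar = "Variance of Normal Data (Scaled)";
       histogram ScaledVar / gamma(alpha=15 sigma=2);
       ods select Histogram;
    run;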
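A quick numerical check of the chi-square claim: chi-square(30), equivalently gamma with shape 15 and scale 2, has mean 15 × 2 = 30 and variance 15 × 2² = 60. A minimal Python sketch, assuming the same N = 31 and 10000 samples.

```python
import random
import statistics

OBS, REPS, SIGMA2 = 31, 10000, 1.0
rng = random.Random(54321)

scaled_var = []
for _ in range(REPS):
    s = [rng.gauss(0.0, 1.0) for _ in range(OBS)]
    m = statistics.fmean(s)
    sample_var = sum((x - m) ** 2 for x in s) / (OBS - 1)
    scaled_var.append(sample_var * (OBS - 1) / SIGMA2)  # should be ~ chi-square(OBS - 1)

# chi-square(30) = gamma(shape=15, scale=2): mean 30, variance 60
m_hat = statistics.fmean(scaled_var)
v_hat = sum((v - m_hat) ** 2 for v in scaled_var) / (REPS - 1)
print(round(m_hat, 2), round(v_hat, 2))
```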
The Effect of Sample Size
Generate samples for several sample sizes.

    %let reps = 1000;    /* number of samples for each size */
    %let seed = 54321;
    data SimUniSize;
       call streaminit(&seed);
       do obs = 10, 30, 50, 100;         /* sample sizes */
          do rep = 1 to &reps;
             do i = 1 to obs;
                x = rand("Uniform");
                output;
             end;
          end;
       end;
    run;
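A minimal Python sketch of the same layout, assuming 1000 samples per size. The list `sim_uni_size` is a hypothetical stand-in for SimUniSize, with one (obs, rep, x) tuple per observation.

```python
import random

REPS, SEED = 1000, 54321
SIZES = (10, 30, 50, 100)  # the values of obs in the DO statement

rng = random.Random(SEED)
# One tuple per observation, like one row of SimUniSize: (obs, rep, x)
sim_uni_size = [
    (obs, rep, rng.random())
    for obs in SIZES
    for rep in range(1, REPS + 1)
    for _ in range(obs)
]
print(len(sim_uni_size))
```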
Compute the mean of each sample.

    proc means data=SimUniSize noprint;
       by obs rep;                      /* one mean per (sample size, sample) */
       var x;
       output out=OutStats mean=SampleMean;
    run;
    proc print data=OutStats(obs=10);
    run;
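With two BY variables, each sample is identified by the pair (obs, rep). A minimal Python sketch that keys the groups on that pair; `out_stats` is a hypothetical stand-in for OutStats.

```python
import random
from collections import defaultdict
from statistics import fmean

rng = random.Random(54321)
sim_uni_size = [(obs, rep, rng.random())
                for obs in (10, 30, 50, 100)
                for rep in range(1, 1001)
                for _ in range(obs)]

# BY obs rep: key each sample by the pair (obs, rep), then average its values
groups = defaultdict(list)
for obs, rep, x in sim_uni_size:
    groups[(obs, rep)].append(x)
out_stats = {key: fmean(xs) for key, xs in groups.items()}  # (obs, rep) -> SampleMean

for key in list(out_stats)[:10]:  # like PROC PRINT with obs=10
    print(key, round(out_stats[key], 4))
```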
Summarize the approximate sampling distribution of the statistic for each sample size.

    proc means data=OutStats Mean Std;
       class obs;
       var SampleMean;
    run;
    /* save one row per sample size; _TYPE_=1 drops the overall (_TYPE_=0) row */
    proc means data=OutStats noprint;
       class obs;
       var SampleMean;
       output out=out(where=(_TYPE_=1)) Mean=Mean Std=Std;
    run;
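The summary can be checked against theory: the mean of n independent U(0, 1) draws has mean 0.5 and standard deviation 1/sqrt(12n), so the spread shrinks in proportion to 1/sqrt(n). A minimal Python sketch, assuming 1000 samples per size.

```python
import math
import random
import statistics

rng = random.Random(54321)
summary = {}
for n in (10, 30, 50, 100):
    means = [statistics.fmean(rng.random() for _ in range(n)) for _ in range(1000)]
    summary[n] = (statistics.fmean(means), statistics.stdev(means))

# Theory: mean 0.5 and standard deviation 1/sqrt(12 n)
for n, (m, sd) in summary.items():
    print(n, round(m, 4), round(sd, 4), round(1 / math.sqrt(12 * n), 4))
```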
Use IML to create data to graph. The sample-size variable in the data set out is obs, the CLASS variable from PROC MEANS.

    proc iml;
       use out;                           /* output data set from PROC MEANS */
       read all var {obs Mean Std};       /* create vectors */
       close out;                         /* close the data set */
       NN = obs;
       x = T( do(0.1, 0.9, 0.0025) );     /* column vector of x values */
       create Convergence var {N x pdf};  /* create an empty data set */
       do i = 1 to nrow(NN);
          N = j(nrow(x), 1, NN[i]);
          pdf = pdf("Normal", x, Mean[i], Std[i]);
          append;                         /* add these observations to the data set */
       end;
       close Convergence;                 /* close the data set */
    quit;
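A minimal Python sketch of the same data construction, using `statistics.NormalDist`. As an assumption, the theoretical mean 0.5 and standard deviation 1/sqrt(12n) stand in for the Mean and Std estimates that the IML step reads from out; `convergence` is a hypothetical stand-in for the Convergence data set.

```python
import math
from statistics import NormalDist

SIZES = (10, 30, 50, 100)
xs = [0.1 + 0.0025 * k for k in range(321)]  # 0.1 to 0.9 in steps of 0.0025

# Assumption: theoretical mean and std replace the estimates read from `out`
convergence = []  # rows of (N, x, pdf), like the Convergence data set
for n in SIZES:
    dist = NormalDist(mu=0.5, sigma=1 / math.sqrt(12 * n))
    convergence.extend((n, x, dist.pdf(x)) for x in xs)
print(len(convergence))
```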
Graph the created data.

    ods graphics / ANTIALIASMAX=1300;   /* keep anti-aliasing on for up to 1300 plotted points */
    proc sgplot data=Convergence;
       title "Sampling Distribution of Sample Mean";
       label pdf = "Density" N = "Sample Size";
       series x=x y=pdf / group=N;
    run;