CSC 323
Quarter: Winter 02/03
Daniela Stan Raicu
School of CTI, DePaul University
2/6/2022 Daniela Stan - CSC 323

Outline
Chapter 5: Sampling Distributions
• Population and sample
• Sampling distribution of a sample mean
• Central limit theorem
• Examples

Introduction
• This chapter begins a bridge from the study of probabilities to the study of statistical inference by introducing the sampling distribution.
Quality of sample data:
• The quality of all statistical analysis depends on the quality of the sample data.
• If the data sample is not representative, analyzing the data and drawing conclusions will be unproductive at best.
Random sampling: every unit in the population has an equal chance of being chosen.
[Diagram: a sample drawn from a population]

Some definitions
• Parameter: a number describing a population.
• Statistic: a number describing a sample.
1. A random sample should represent the population well, so sample statistics from a random sample should provide reasonable estimates of population parameters.

Sample statistic                Population parameter
Sample mean $\bar{x}$           $\mu$
Sample proportion $\hat{p}$     $p$
Sample variance $s^2$           $\sigma^2$
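The distinction between a parameter and a statistic can be sketched in a few lines of Python. The snippet assumes a hypothetical simulated population; the values, the threshold of 60 used for the proportion, and the sample size of 200 are illustrative choices, not data from the course. It draws a simple random sample and prints each sample statistic next to the population parameter it estimates.

```python
import random
import statistics

rng = random.Random(323)  # fixed seed so the run is repeatable

# Hypothetical population of 10,000 measurements (illustrative values only).
population = [rng.gauss(50, 10) for _ in range(10_000)]

# Population parameters: computed from every unit in the population.
mu = statistics.fmean(population)
sigma2 = statistics.pvariance(population)      # divides by N
p = sum(v > 60 for v in population) / len(population)

# Simple random sample: every unit has the same chance of being chosen.
sample = rng.sample(population, 200)

# Sample statistics: computed from the sample only.
x_bar = statistics.fmean(sample)
s2 = statistics.variance(sample)               # divides by n - 1
p_hat = sum(v > 60 for v in sample) / len(sample)

print(f"mean:       x_bar = {x_bar:8.3f}   mu      = {mu:8.3f}")
print(f"proportion: p_hat = {p_hat:8.3f}   p       = {p:8.3f}")
print(f"variance:   s^2   = {s2:8.3f}   sigma^2 = {sigma2:8.3f}")
```

The statistics land close to the parameters but do not match them exactly, which is the estimation error discussed next.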

Some definitions (cont.)
2. All sample statistics have some error in estimating population parameters.
3. If repeated samples are taken from a population and the same statistic (e.g. the mean) is calculated from each sample, the statistics will vary; that is, they will have a distribution.
4. A larger sample provides more information than a smaller sample, so a statistic from a large sample should have less error than a statistic from a small sample.
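Points 3 and 4 can be illustrated with a short simulation. The sketch below assumes a hypothetical uniform population and two arbitrary sample sizes (10 and 100). It draws many samples of each size and compares how much the resulting sample means vary.

```python
import random
import statistics

rng = random.Random(5)  # fixed seed so the run is repeatable

# Hypothetical population: 50,000 values spread evenly between 0 and 1.
population = [rng.uniform(0, 1) for _ in range(50_000)]
mu = statistics.fmean(population)

def sample_means(n, repeats=2_000):
    """Draw `repeats` random samples of size n and return their means."""
    return [statistics.fmean(rng.sample(population, n)) for _ in range(repeats)]

means_small = sample_means(10)
means_large = sample_means(100)

# The statistic varies from sample to sample: it has a distribution.
spread_small = statistics.stdev(means_small)
spread_large = statistics.stdev(means_large)

print(f"population mean:           {mu:.4f}")
print(f"spread of means, n = 10:   {spread_small:.4f}")
print(f"spread of means, n = 100:  {spread_large:.4f}")
```

The means from the larger samples cluster more tightly around the population mean, which is what "less error" means here.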

Describing the Sample Mean
Let us assume that we want to estimate the mean of the population, since this is usually the first piece of information that an analyst wants:
• Since the value of the sample mean depends on the particular sample we draw, the sample mean is a variable with a huge number of possible values.
• The sample mean is a random variable because the samples are drawn randomly.
• The best way to summarize this vast amount of information is to describe it with a probability distribution.

The Distribution of the Sample Mean
Problem:
Population: {A, B, C, D, E, F}
Population mean: $\mu = 0.1483$
Population variance: $\sigma^2 = 0.00061$

The Distribution of the Sample Mean
Assumptions: $\mu = 0.1483$, $\sigma^2 = 0.00061$
• What is the central value of the variable $\bar{x}$?
• What is its variability?
• Is there a familiar pattern in the variability?

What is the central value of the sample mean?
• For large samples, the distribution of $\bar{x}$ should be symmetrical: $\bar{x}$ should be larger than $\mu$ about 50% of the time and smaller than $\mu$ about 50% of the time.
• It can be shown theoretically that the mean of the sample means equals the population mean: $E(\bar{x}) = \mu$. This follows from the linearity of expectation and holds for any sample size.
• In our example, $E(\bar{x}) = 0.1483 = \mu$, so $\bar{x}$ is an unbiased estimator of $\mu$.
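For a population this small, unbiasedness can be checked by listing every possible sample. The sketch below assigns hypothetical values to the six units A to F (they are not the values behind the 0.1483 figure) and assumes samples of size 2 drawn without replacement. It averages the sample mean over all possible samples using exact fractions.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical values for the six units A..F (not the slide's data).
population = {"A": 2, "B": 4, "C": 6, "D": 8, "E": 10, "F": 12}
values = list(population.values())
mu = Fraction(sum(values), len(values))

n = 2  # sample size (an arbitrary choice for the illustration)

# Under random sampling, every sample of n distinct units is equally likely.
sample_means = [Fraction(sum(s), n) for s in combinations(values, n)]

# E(x_bar): the average of the sample mean over all possible samples.
expected_x_bar = sum(sample_means) / len(sample_means)

print(f"number of possible samples: {len(sample_means)}")
print(f"mu = {mu}, E(x_bar) = {expected_x_bar}")
```

Individual sample means range well above and below the population mean, but their average over all 15 samples equals it exactly.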

What is the variance of the sample mean?
• An estimator's variance reveals a great deal about the quality of the estimator.
• The variance of the sample mean: $\sigma_{\bar{x}}^2 = \sigma^2 / n$, where $\sigma^2$ is the variance of the population and $n$ is the sample size.
• Increasing the sample size $n$ decreases the variance $\sigma_{\bar{x}}^2$, which gives better accuracy of the estimator.
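The variance formula can be verified exactly when the draws are independent. The sketch below assumes sampling with replacement from six hypothetical values (again, not the slide's data) and enumerates every ordered sample of size 1 to 4. When a small population is sampled without replacement, the variance is smaller by the finite-population correction factor $(N - n)/(N - 1)$, which this sketch does not cover.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population values (not the slide's data).
values = [2, 4, 6, 8, 10, 12]

def exact_variance(xs):
    """Variance of equally likely outcomes, computed in exact arithmetic."""
    xs = list(xs)
    m = Fraction(sum(xs), len(xs))
    return sum((x - m) ** 2 for x in xs) / len(xs)

sigma2 = exact_variance(values)

var_x_bar = {}
for n in (1, 2, 3, 4):
    # All ordered samples of size n drawn with replacement (independent draws).
    means = [Fraction(sum(s), n) for s in product(values, repeat=n)]
    var_x_bar[n] = exact_variance(means)
    print(f"n = {n}: Var(x_bar) = {var_x_bar[n]}, sigma^2 / n = {sigma2 / n}")
```

Each line prints the same fraction twice, and the fraction shrinks as n grows.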

Accuracy of the Estimator
As in many problems, there is a trade-off between accuracy and dollars. What will we get for our money if we invest dollars in obtaining a larger sample size?
• $n = 100$?
• $n = 200$?
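The two candidate sample sizes can be compared directly with the formula $\sigma_{\bar{x}}^2 = \sigma^2 / n$. The sketch below uses the population variance from the running example, $\sigma^2 = 0.00061$, and assumes independent draws. The helper name `standard_error` is introduced here for illustration.

```python
import math

sigma2 = 0.00061  # population variance from the running example

def standard_error(n):
    """Standard deviation of the sample mean for a sample of size n."""
    return math.sqrt(sigma2 / n)

se = {n: standard_error(n) for n in (100, 200)}
for n, value in se.items():
    print(f"n = {n}: standard error of x_bar = {value:.5f}")

# Doubling n shrinks the standard error by a factor of 1/sqrt(2), about 29%.
ratio = se[200] / se[100]
print(f"ratio se(200) / se(100) = {ratio:.4f}")
```

If cost grows in proportion to $n$, doubling the sample from 100 to 200 doubles the cost but cuts the standard error by only about 29%. Halving the standard error would take four times as many observations.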

Is there a familiar pattern in the data?
• As the sample size becomes larger, the distribution of the sample mean becomes closer to a normal distribution, regardless of the distribution of the population from which the sample is drawn.
• The central limit theorem summarizes the distribution of the sample mean.
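A small simulation makes the pattern visible. The sketch below assumes a strongly right-skewed population (exponential with rate 1, chosen only for illustration) and measures the skewness of the sample means for several sample sizes. A normal distribution has skewness 0, so values closer to 0 indicate a more symmetric, more normal-looking shape.

```python
import random
import statistics

rng = random.Random(2022)  # fixed seed so the run is repeatable

def skewness(xs):
    """Sample skewness: near 0 for a symmetric distribution such as the normal."""
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return statistics.fmean([((x - m) / sd) ** 3 for x in xs])

def sample_mean(n):
    """Mean of one sample of size n from a right-skewed exponential population."""
    return statistics.fmean([rng.expovariate(1.0) for _ in range(n)])

skew = {}
for n in (1, 2, 10, 30):
    means = [sample_mean(n) for _ in range(10_000)]
    skew[n] = skewness(means)
    print(f"n = {n:2d}: skewness of the sample means = {skew[n]:.3f}")
```

The skewness falls steadily as $n$ grows, even though the population itself is far from normal.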

The Central Limit Theorem
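One standard way to write the theorem, assuming a simple random sample of $n$ independent observations from a population with mean $\mu$ and finite variance $\sigma^2$ (the symbol $Z$ for the standardized sample mean is introduced here only for illustration):

```latex
\[
\bar{x} \;\overset{\text{approx.}}{\sim}\; N\!\left(\mu,\ \frac{\sigma^2}{n}\right)
\quad \text{for large } n,
\qquad \text{equivalently} \qquad
Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \;\overset{\text{approx.}}{\sim}\; N(0, 1).
\]
```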

Importance of the central limit theorem
• The most important feature is that it can be applied to any population with a finite variance, as long as the sample size $n$ is large enough.
• How large is large? A common rule of thumb is $n \ge 30$.

Importance of the central limit theorem
Examples:
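As one illustrative example of putting the theorem to work, the sketch below takes the population mean and variance from the running example and assumes a hypothetical sample size of 36 and a hypothetical threshold of 0.155. Neither number comes from the course material. It approximates the probability that the sample mean exceeds the threshold.

```python
import math
from statistics import NormalDist

mu = 0.1483        # population mean from the running example
sigma2 = 0.00061   # population variance from the running example

n = 36             # hypothetical sample size (meets the n >= 30 rule of thumb)
threshold = 0.155  # hypothetical value of interest

# By the central limit theorem, x_bar is approximately N(mu, sigma^2 / n).
x_bar_dist = NormalDist(mu=mu, sigma=math.sqrt(sigma2 / n))

# Approximate probability that the sample mean exceeds the threshold.
prob = 1 - x_bar_dist.cdf(threshold)
z = (threshold - mu) / x_bar_dist.stdev

print(f"standard error = {x_bar_dist.stdev:.5f}")
print(f"z = {z:.2f}, P(x_bar > {threshold}) is approximately {prob:.3f}")
```

No assumption about the shape of the population is needed here, only that $n$ is large enough for the normal approximation to be reasonable.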

Is $\bar{x}$ normally distributed?
Is the population normal?
• Yes: Is ?
  • Yes: $\bar{x}$ is normal
  • No: $\bar{x}$ has a Student's t distribution
• No: Is $n \ge 30$?
  • Yes: $\bar{x}$ is considered to be normal
  • No: $\bar{x}$ may or may not be considered normal (we need more info)