Introduction to Validity

What is Validity?
§ the best available approximation to the truth or falsity of a given inference, proposition, or conclusion
§ a set of standards by which research can be judged

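The Causal Context
[Figure: Theory (what you think): the cause construct and the effect construct, linked by the cause-effect construct. Observation (what you do, what you see, what you test): the program and the observations, linked by the program-outcome relationship. The constructs are operationalized in this study as the program and the observations.]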

Conclusion Validity
Is there a relationship between...
§ what you did and what you saw?
§ your program and your observations?
[Figure: in this study, the program-outcome relationship between the program (what you do) and the observations (what you see)]

Internal Validity
Is the relationship causal between...
• what you did and what you saw?
• your program and your observations?
[Figure: in this study, the program-outcome relationship between the program (what you do) and the observations (what you see), with alternative causes that could also produce the observations]

Construct Validity
Can we generalize to the constructs?
[Figure: Theory (what you think): cause construct, cause-effect construct, effect construct. Observation: program (what you do), program-outcome relationship, observations (what you see)]

External Validity
Can we generalize to other persons, places, times?
[Figure: Theory (what you think): cause construct, cause-effect construct, effect construct. Observation: the program-outcome relationship between program (what you do) and observations (what you see), repeated across several studies]

The Validity Questions are cumulative...
In this study:
§ Conclusion Validity: Is there a relationship between the cause and effect?
§ Internal Validity: Is the relationship causal?
In theory:
§ Construct Validity: Can we generalize to the constructs?
§ External Validity: Can we generalize to other persons, places, times?

Sampling

The External Validity Question
Can we generalize to other persons, places, times?
[Figure: Theory (what you think): cause construct, cause-effect construct, effect construct. Observation: the program-outcome relationship between program (what you do) and observations (what you see), repeated across several studies]

How Do We Generalize? Model I: Sampling
§ specify the population: the persons, places, and times of interest
§ draw a sample from the population
§ generalize back from the sample to the population

How Do We Generalize? Model II: Proximal Similarity
[Figure: Our Study at the center, surrounded by Gradients of Similarity: settings, times, places, and people that are progressively less similar to those in the study]

Threats to External Validity
§ Interaction of Selection and Treatment: maybe it is just these people
§ Interaction of Setting and Treatment: maybe it is just these places
§ Interaction of History and Treatment: maybe it is just these times

How Can We Improve External Validity?
§ random sampling: draw the sample from the population at random
§ replicate, replicate: repeat the study with other people, places, and times
§ use theory: consider how similar other settings, times, places, and people are to those in our study

Two Major Types of Sampling Methods
Probability Sampling
§ uses some form of random selection
§ requires that each unit have a known (often equal) probability of being selected
Non-Probability Sampling
§ selection is systematic or haphazard, but not random

Basic Terms of Sampling
§ Theoretical Population: Who do you want to generalize to?
§ The Study Population: What population can you get access to?
§ The Sampling Frame: How can you get access to them?
§ The Sample: Who is in your study?

What are the Theoretical Population, Study Population and Sampling Frame? Seem like a good sample?
§ To study medical factors related to falls by elderly individuals, you sample all 178 residents who fell in 1990 while in a particular geriatric facility and 339 patients randomly selected from the 850 residents in the same facility who did not fall during that year.

What are the Theoretical Population, Study Population and Sampling Frame? Seem like a good sample?
§ To study intra-city travel patterns, the city council has commissioned a computerized dataset of 110,000 trips made in one year
§ You randomly sample 10% of the trips for intensive analysis

Probability in Sampling

Key Concepts
§ Statistical terms in sampling
§ Sampling error
§ The sampling distribution

Statistical Terms in Sampling
§ Variable: self-esteem, measured on a 1-5 scale
§ Statistic: the value computed from the sample (Average = 3.72)
§ Parameter: the corresponding value in the population (Average = 3.75)

What are the variable, statistic and parameter?
§ To assess the number of falls by elderly residents of retirement homes, you sample 10 retirement homes from different parts of the country and find that about 3.1 falls occur per facility.

Sampling Error
The population has a mean of 3.75 and a standard deviation of .25. This means that...
§ about 68% of cases fall between 3.5 and 4.0 (one standard deviation either side of the mean)
§ about 95% of cases fall between 3.25 and 4.25 (two standard deviations)
§ about 99% of cases fall between 3.0 and 4.5 (three standard deviations)
[Figure: frequency distribution of self-esteem scores in the population]
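The percentages on this slide can be checked with a short sketch. It assumes the self-esteem scores are normally distributed, which the slide implies but does not state; the helper name `share_between` is made up for illustration.

```python
from statistics import NormalDist

# Population values from the slide: mean 3.75, standard deviation 0.25.
# Assumption: the scores follow a normal curve.
population = NormalDist(mu=3.75, sigma=0.25)


def share_between(low, high):
    """Share of the population that falls between low and high."""
    return population.cdf(high) - population.cdf(low)


for low, high in [(3.5, 4.0), (3.25, 4.25), (3.0, 4.5)]:
    print(f"{low} to {high}: {share_between(low, high):.1%}")
```

Under that assumption the three ranges cover roughly 68%, 95%, and a little over 99% of cases.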

Sampling Error
The sample of 1000 has a mean of 3.74 and a standard deviation of .0074.
§ The standard deviation of a sample is called the sampling error.
§ The sampling error shows that the odds are .95 that the population mean is 3.74 ± 2(.0074).
§ The .95 is the confidence level; the range 3.74 ± 2(.0074) is the confidence interval.
[Figure: frequency distribution of self-esteem scores in the sample]

What is the confidence interval?
§ To assess the number of falls by elderly residents of retirement homes, you sample 10 retirement homes from different parts of the country and find that about 3.1 falls occur per facility
§ The sampling error is 1.4
§ What is the confidence interval at 95% certainty?
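One way to work this exercise is sketched below. It assumes the same rule of thumb the earlier slides use (estimate ± 2 sampling errors for 95%); the function name `confidence_interval` is hypothetical.

```python
def confidence_interval(estimate, sampling_error, multiplier=2.0):
    """Return (low, high) as estimate +/- multiplier * sampling_error.

    multiplier=2.0 is the rule of thumb for 95%; 1.96 is the more exact value.
    """
    margin = multiplier * sampling_error
    return estimate - margin, estimate + margin


low, high = confidence_interval(3.1, 1.4)
print(f"{low:.1f} to {high:.1f} falls per facility")
```

With a multiplier of 2 the interval is 3.1 ± 2.8, which is wide because only 10 facilities were sampled.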

The Sampling Distribution
The sampling distribution is the distribution of a statistic across an infinite number of samples.
[Figure: histograms of three samples, each with its own average, and the distribution formed by those averages]

The Sampling Distribution
§ The standard deviation of the sampling distribution is called the standard error
§ That’s the third term for standard deviation!
§ When we are sampling, this is what we are trying to estimate

Awesome In-Class Activity that will make all of this CLEAR

What you should have gotten out of that
§ The larger the sample, the smaller the sampling error
§ The greater the variability, the greater the sampling error
§ The greater the variability, the larger the sample needs to be

Can you randomly sample from a skewed distribution?
§ Central Limit Theorem
§ Scrabble Tiles of Probability!
§ As a sample gets sufficiently large (e.g. more than 30), the distribution of the sample means will tend toward a normal curve
§ Answer: Yes!
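A small simulation can illustrate the point. This is only a sketch: the exponential population, the number of repeated samples, and the name `sample_means` are choices made for illustration, not part of the class activity.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable


def sample_means(sample_size, n_samples=2000):
    """Means of repeated samples from a right-skewed (exponential) population with mean 1."""
    return [
        statistics.fmean(random.expovariate(1.0) for _ in range(sample_size))
        for _ in range(n_samples)
    ]


for size in (5, 30, 100):
    means = sample_means(size)
    # The average of the sample means stays near the population mean (1.0),
    # while their spread (the standard error) shrinks as the sample grows.
    print(size, round(statistics.fmean(means), 3), round(statistics.stdev(means), 3))
```

Plotting a histogram of `sample_means(30)` should show a roughly bell-shaped curve even though the population itself is strongly skewed.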

Types of Sampling

Probability Sampling

Types of Probability Sampling Designs
§ Simple Random Sampling
§ Stratified Sampling
§ Systematic Sampling
§ Cluster (Area) Sampling
§ Multistage Sampling

Some Definitions § N = the number of cases in the sampling frame § n = the number of cases in the sample § f = n/N = the sampling fraction

Simple Random Sampling
• Objective - select n units out of N such that every possible set of n units has an equal chance of being selected
• Procedure - use a table of random numbers, a computer random number generator (RAND in Excel), or a mechanical device
• can sample with or without replacement
• f = n/N is the sampling fraction

Simple Random Sampling
§ Example: small service agency, client assessment of quality of service
§ get list of clients over past year
§ draw a simple random sample of n/N
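The client-list example could be sketched as follows. The client names, N = 200, and n = 20 are made up for illustration; `random.Random.sample` draws without replacement.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n units from the sampling frame without replacement."""
    rng = random.Random(seed)
    return rng.sample(frame, n)


# Hypothetical sampling frame: the list of clients over the past year.
clients = [f"client_{i:03d}" for i in range(1, 201)]  # N = 200
sample = simple_random_sample(clients, n=20, seed=42)

f = len(sample) / len(clients)  # sampling fraction f = n/N
print(sample[:5], f)
```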

Simple Random Sampling
[Figure: a random subsample drawn from the list of clients]

Handout

Stratified Random Sampling
• sometimes called "proportional" or "quota" random sampling
• Objective - population of N units is divided into non-overlapping strata N1, N2, N3, ... Ni such that N1 + N2 + ... + Ni = N; then do a simple random sample of n/N in each stratum

Stratified Sampling
Purposes:
• to ensure representation of each stratum - oversample smaller population groups
• administrative convenience - field offices
• sampling problems may differ in each stratum
• increase precision (lower variance) if strata are homogeneous within

Stratified Random Sampling
[Figure: the list of clients is divided into strata (African-American, Hispanic-American, Others), and a random subsample of n/N is drawn from each stratum]

Proportionate vs. Disproportionate Stratified Random Sampling
§ proportionate: if sampling fraction is equal for each stratum
§ disproportionate: unequal sampling fraction in each stratum
§ needed to enable better representation of smaller (minority) groups
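The difference between the two designs can be sketched in a few lines. The stratum sizes and sampling fractions below are invented for illustration, and `stratified_sample` is a hypothetical helper, not a library function.

```python
import random


def stratified_sample(strata, fractions, seed=None):
    """Draw a simple random sample within each stratum.

    strata: dict of stratum name -> list of units
    fractions: dict of stratum name -> sampling fraction for that stratum
    """
    rng = random.Random(seed)
    return {
        name: rng.sample(units, round(len(units) * fractions[name]))
        for name, units in strata.items()
    }


# Hypothetical client list, already divided into non-overlapping strata.
strata = {
    "African-American": [f"AA{i}" for i in range(100)],
    "Hispanic-American": [f"HA{i}" for i in range(50)],
    "Others": [f"O{i}" for i in range(850)],
}

# Proportionate: the same sampling fraction in every stratum.
proportionate = stratified_sample(strata, {name: 0.10 for name in strata}, seed=1)

# Disproportionate: oversample the smaller groups.
disproportionate = stratified_sample(
    strata,
    {"African-American": 0.50, "Hispanic-American": 0.50, "Others": 0.04},
    seed=1,
)

print({name: len(units) for name, units in proportionate.items()})
print({name: len(units) for name, units in disproportionate.items()})
```

With a disproportionate design, population estimates need each stratum weighted by its share of the population, as the stratified mean later in the deck does.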

Handouts

Systematic Random Sampling
Procedure:
§ number units in population from 1 to N
§ decide on the n that you want or need
§ N/n = k, the interval size
§ randomly select a number from 1 to k
§ then take every kth unit

Systematic Random Sampling § Assumes that the population is randomly ordered § Advantages - easy; may be more precise than simple random sample § Example – Shoot the card catalog with a shotgun – would it be as representative as a systematic random sample?

Systematic Random Sampling
§ N = 100
§ want n = 20
§ N/n = 5
§ select a random number from 1-5: chose 4
§ start with #4 and take every 5th unit
[Figure: the units numbered 1 to 100, with every 5th unit selected starting from #4]
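The walkthrough above can be sketched as a small function. The name `systematic_sample` is hypothetical, and the sketch assumes N divides evenly by n, as it does here.

```python
import random


def systematic_sample(N, n, start=None, seed=None):
    """Return the unit numbers (1..N) chosen by systematic random sampling."""
    k = N // n  # interval size
    if start is None:
        # Randomly select the starting unit from 1 to k.
        start = random.Random(seed).randint(1, k)
    return list(range(start, N + 1, k))[:n]


# Reproduce the slide: N = 100, n = 20, random start of 4.
picked = systematic_sample(N=100, n=20, start=4)
print(picked)  # 4, 9, 14, ..., 99
```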

Handout

Cluster (area) Random Sampling
Procedure:
§ divide population into clusters
§ randomly sample clusters
§ measure all units within sampled clusters

Cluster (area) Random Sampling § Advantages - administratively useful, especially when you have a wide geographic area to cover § Examples - randomly sample from city blocks and measure all homes in selected blocks
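The city-block example might look like this in code. The 40 blocks of 5 homes each are invented for illustration, and `cluster_sample` is a hypothetical helper.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters and keep every unit inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}


# Hypothetical city: 40 blocks with 5 homes on each block.
blocks = {
    f"block_{b}": [f"home_{b}_{h}" for h in range(1, 6)]
    for b in range(1, 41)
}

sampled = cluster_sample(blocks, n_clusters=4, seed=7)
homes = [home for units in sampled.values() for home in units]
print(sorted(sampled), len(homes))
```

Only the blocks are chosen at random; every home on a chosen block is measured, which is what keeps the fieldwork geographically compact.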

Handout

Multi-Stage Sampling § Cluster (area) random sampling can be multi-stage § Any combinations of single-stage methods

Multi-Stage Sampling
Example - choosing students from schools
§ Select all schools, then sample within schools
§ Sample schools, then measure all students
§ Sample schools, then sample students

How to choose the best sampling method § The best sampling method is the sampling method that most effectively meets the particular goals of the study in question. The effectiveness of a sampling method depends on many factors. Because these factors interact in complex ways, the "best" sampling method is seldom obvious.

How to choose the best sampling method
§ However, good researchers use the following strategy to identify the best sampling method.
1. List the research goals (usually some combination of accuracy, precision, and/or cost).
2. Identify potential sampling methods that might effectively achieve those goals.
3. Test the ability of each method to achieve each goal.
4. Choose the method that does the best job of achieving the goals.

How to choose the best sampling method
§ Problem Statement: At the end of every school year, the state administers a reading test to a sample of third graders. The school system has 20,000 third graders, half boys and half girls. There are 1000 third-grade classes, each with 20 students. The maximum budget for this research is $3600. The only expense is the cost to proctor each test session. This amounts to $100 per session. The purpose of the study is to estimate the reading proficiency of third graders, based on sample data. School administrators want to maximize the precision of this estimate without exceeding the $3600 budget. What sampling method should they use?

Goals 1. List goals. This study has two main goals: (1) maximize precision and (2) stay within budget.

Sampling Methods § Identify potential sampling methods. We will consider four basic sampling methods - simple random sampling, proportionate stratified sampling, disproportionate stratified sampling, and cluster sampling

Test § Test methods. A key part of the analysis is to test the ability of each potential sampling method to satisfy the research goals. Specifically, we will want to know the level of precision and the cost associated with each potential method. For our test, we use the standard error to measure precision. The smaller the standard error, the greater the precision.

SRS § The test was administered to 36 students selected via simple random sampling. The test score from each sampled student is shown below: § 50, 55, 60, 62, 65, 67, 70, 70, 72, 73, 75, 78, 78, 80, 80, 82, 85, 85, 88, 90, 90 § Using sample data, estimate the mean reading achievement level in the population. Find the margin of error and the confidence interval. Assume a 95% confidence level. Do this one in class.

SRS Results
§ Mean = ( 50 + 55 + 60 + ... + 90 ) / 36 = 75
§ Standard Deviation = 9.95
§ Standard Error = 9.95/sqrt(36) = 1.66
§ Margin of Error = 1.66 x 1.96 = 3.25
§ Conclusion: 95% confidence that true population mean is 75 ± 3.25
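The arithmetic on this slide can be reproduced with a short sketch. It starts from the slide's summary values (mean 75, standard deviation 9.95, n = 36) rather than the raw scores; the function name is hypothetical.

```python
import math


def srs_margin_of_error(std_dev, n, z=1.96):
    """Standard error and margin of error for a simple random sample mean."""
    standard_error = std_dev / math.sqrt(n)
    return standard_error, z * standard_error


mean = 75
se, margin = srs_margin_of_error(std_dev=9.95, n=36)
print(f"SE = {se:.2f}, margin = {margin:.2f}")
print(f"95% CI: {mean - margin:.2f} to {mean + margin:.2f}")
```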

Sampling Methods Compared
Sampling method | Cost | Standard error | Sample size
Simple random sampling | $3,600 | 1.66 | 36

Proportionate Stratified § A proportionate stratified sample was used to select 36 students for testing. Because the population is half boy and half girl, one stratum consisted of 18 boys; the other, 18 girls. Test scores from each sampled student are shown below: § Boys 50, 55, 60, 62, 65, 67, 70, 73, 75, 78, 80, 85, 90 § Girls 70, 72, 75, 78, 80, 82, 85, 88, 90, 90 § Using sample data, estimate the mean reading achievement level in the population. Find the margin of error and the confidence interval. Assume a 95% confidence level.

Stratified Mean
Computing the mean of the stratified sample is pretty easy:
1. Compute the mean of each stratum
2. Multiply each stratum mean by its proportion in the population
3. Add them together
So, for the boys, 70 x (10,000/20,000) = 35
And for the girls, 80 x (10,000/20,000) = 40
So, the mean of the sample is 35 + 40 = 75

Stratified Standard Error
§ Computing the Standard Error IS NOT CONCEPTUALLY EASY!
§ Each sampling method has a different way to figure out the Standard Error
§ SE = (1 / N) * sqrt [ Σ ( Nh^2 * sh^2 / nh ) ]
§ Walkthrough in Excel
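As an alternative to the Excel walkthrough, the formula can be sketched in Python. The standard deviations used below (10.27 for boys, 6.66 for girls) are borrowed from the prior-year table later in the deck as an assumption; the slide's own figure comes from the sample data, so the result only approximates it.

```python
import math


def stratified_standard_error(strata):
    """SE = (1 / N) * sqrt( sum( Nh^2 * sh^2 / nh ) ).

    strata: list of (Nh, sh, nh) tuples, one per stratum, where Nh is the
    stratum population size, sh its standard deviation, nh its sample size.
    """
    N = sum(N_h for N_h, _, _ in strata)
    total = sum(N_h**2 * s_h**2 / n_h for N_h, s_h, n_h in strata)
    return math.sqrt(total) / N


# Proportionate design: 18 boys and 18 girls (assumed standard deviations).
proportionate_se = stratified_standard_error([(10_000, 10.27, 18), (10_000, 6.66, 18)])
# Disproportionate design: 22 boys and 14 girls.
disproportionate_se = stratified_standard_error([(10_000, 10.27, 22), (10_000, 6.66, 14)])
print(round(proportionate_se, 2), round(disproportionate_se, 2))
```

Both values land close to the standard errors reported in the comparison tables, and the disproportionate allocation comes out slightly smaller.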

Sampling Methods Compared
Sampling method | Cost | Standard error | Sample size
Simple random sampling | $3,600 | 1.66 | 36
Proportionate stratified sampling | $3,600 | 1.45 | 36

Nonproportional: Optimum Allocation § If you have previous scores, you can find a sample allocation plan that provides more precision. The solution to this problem is a special case of optimal allocation, called Neyman allocation. § YOU DON’T NEED TO KNOW THE NEYMAN ALLOCATION, but based on the Neyman allocation, the best sample size for stratum h would be: § nh = n * ( Nh * σh ) / [ Σ ( Ni * σi ) ] § where nh is the sample size for stratum h, n is total sample size, Nh is the population size for stratum h, and σh is the standard deviation of stratum h.

Nonproportional: Optimum Allocation
Let’s say that the results from last year's test are shown in the table below:
Stratum | Mean score | Standard deviation
Boys | 70 | 10.27
Girls | 80 | 6.66
To maximize precision, how many sampled students should be boys and how many should be girls? What is the mean reading achievement level in the population? Compute the confidence interval and find the margin of error at 95%.

Nonproportional Sample Size
§ The first step is to decide how to allocate the sample in order to maximize precision. Based on Neyman allocation, the number of boys in the sample is:
§ nboys = 36 * ( 10,000 * 10.27 ) / [ ( 10,000 * 10.27 ) + ( 10,000 * 6.67 ) ] = 21.83, or 22 boys
§ ngirls = Total - boys = 36 – 22 = 14
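The allocation step can be sketched as below, using the same inputs as the calculation on this slide. The function name `neyman_allocation` is hypothetical.

```python
def neyman_allocation(n, strata):
    """nh = n * (Nh * sigma_h) / sum(Ni * sigma_i), returned unrounded.

    strata: dict of stratum name -> (Nh, sigma_h)
    """
    weights = {name: N_h * sigma_h for name, (N_h, sigma_h) in strata.items()}
    total = sum(weights.values())
    return {name: n * weight / total for name, weight in weights.items()}


alloc = neyman_allocation(36, {"boys": (10_000, 10.27), "girls": (10_000, 6.67)})
n_boys = round(alloc["boys"])
n_girls = 36 - n_boys  # the remainder goes to the other stratum
print(alloc, n_boys, n_girls)
```

Because boys' scores vary more, the allocation puts more of the fixed sample of 36 into that stratum.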

Nonproportional Results § At this point you finish up just like in the proportional stratified (same formula for Standard Error) § This time you would get a Standard Error of 1. 41

Sampling Methods Compared
Sampling method | Cost | Standard error | Sample size
Simple random sampling | $3,600 | 1.66 | 36
Proportionate stratified sampling | $3,600 | 1.45 | 36
Disproportionate stratified sampling | $3,600 | 1.41 | 36

Cluster
§ The test is administered to each student in 36 randomly sampled classes (the $3600 budget covers 36 sessions at $100 each). Thus, this is one-stage cluster sampling, with classes serving as clusters. The average test score from each sampled cluster xi is shown below:
§ 55, 60, 65, 67, 70, 70, 72, 72, 73, 75, 75, 75, 77, 78, 78, 80, 80, 80, 83, 85, 85
§ Using the sample data, estimate the mean reading achievement level in the population. Find the margin of error and the confidence interval. Assume a 95% confidence level.

Cluster Results
§ Mean = ( 55 + 60 + 65 + ... + 85 ) / 36 = 75
§ The standard error formula for cluster sampling is a truly scary beast to look upon
§ Trust me when I say it comes out to 1.1
§ So, the margin of error at 95% confidence is 1.1 x 1.96 = 2.16
§ Answer is 75 ± 2.16

Which one do you pick?
Sampling method | Cost | Standard error | Sample size
Simple random sampling | $3,600 | 1.66 | 36
Proportionate stratified sampling | $3,600 | 1.45 | 36
Disproportionate stratified sampling | $3,600 | 1.41 | 36
One-stage cluster sampling | $3,600 | 1.1 | 720

Nonprobability Sampling Designs

Major Issues • Likely to misrepresent the population • May be difficult or impossible to detect this misrepresentation

Types of Nonprobability Samples
§ Accidental, haphazard, convenience
§ Modal Instance
§ Purposive
§ Expert
§ Quota
§ Snowball
§ Heterogeneity sampling

Accidental, Haphazard or Convenience Sampling
§ “man on the street”
§ college psychology majors
§ available or accessible clients
§ volunteer samples
§ Problem: we have no evidence for representativeness
§ What is Jay Leno’s purpose?

Modal Instance Sampling • Sample for the typical case • Will it play in Peoria? • Typical voter? • Problem: may not represent the modal group proportionately

Purposive Sampling § Might sample several pre-defined groups (e. g. , the shopping mall survey which attempts to identify relevant market segments) § Deliberately sampling an extreme group § Problem: Proportionality § Problem: Need theory to correctly sample an extreme group

Handout

Expert Sampling § have a panel of experts make a judgment about the representativeness of your sample § Advantage: at least you can say that expert judgment supports the sampling § Problem: the “experts” may be wrong § Is China going to stop certain relations with the US over the Dalai Lama? Better ask some experts.

Quota Sampling § select people non-randomly according to some quotas § Proportional Quota Sampling § Nonproportional Quota Sampling

Proportional Quota Sampling
• Objective: represent major characteristics of the population by sampling a proportional amount of each. For example, if you know the population has 40% women and 60% men, you want your sample to meet that quota
• However, it does not randomly sample within quotas (unlike stratified random sampling)

Nonproportional Quota Sampling • making sure you have enough units from each target group of interest (even if not proportional) • as with stratified random sampling you might do this to assure that you have good representation of smaller population groups • Establish minimums

Snowball Sampling § one person recommends another, who recommends another, etc. § good way to identify hard-to-reach populations § for example, homeless persons § Also good for identifying stakeholders

Heterogeneity Sampling • make sure you include all sectors - at least several of everything - don't worry about proportions (like in quota sampling) • use when one or more people are a good proxy for the group • for instance, when brainstorming issues across stakeholder groups or running a focus group

Calculating a Sample Size

Sample Accuracy
• Sample accuracy: refers to how close a random sample’s statistic (e.g. mean, variance, proportion) is to the population’s value it represents (mean, variance, proportion)
• Important points:
• Sample size is NOT related to representativeness … you could sample 20,000 persons walking by a street corner and the results would still not represent the city; however, an n of 100 could be “right on.”

Sample Accuracy • Important points: • Sample size, however, IS related to accuracy. How close the sample statistic is to the actual population parameter (e. g. sample mean vs. population mean) is a function of sample size.

Sample Size AXIOMS To properly understand how to determine sample size, it helps to understand the following AXIOMS…

Sample Size Axioms
• The only perfectly accurate sample is a census.
• A probability sample will always have some inaccuracy (sample error).
• The larger a probability sample is, the more accurate it is (less sample error).
• Probability sample accuracy (error) can be calculated with a simple formula, and expressed as a ± % value.

Sample Size Axioms…cont.
• You can take any finding in the survey, replicate the survey with the same probability sample plan & size, and you will be “very likely” to find the same result within the ± range of the original findings.
• In almost all cases, the accuracy (sample error) of a probability sample is independent of the size of the population.

Sample Size Axioms…cont. • A probability sample can be a very tiny percentage of the population size and still be very accurate (have little sample error). • The size of the probability sample depends on the client’s desired accuracy (acceptable sample error) balanced against the cost of data collection for that sample size.

There is only one method of determining sample size that allows the researcher to PREDETERMINE the accuracy of the sample results… The Confidence Interval Method of Determining Sample Size

The Confidence Interval Method of Determining Sample Size Notion of Confidence Interval Confidence interval: range whose endpoints define a certain percentage of the responses to a question • Central limit theorem: a theory that holds that values taken from repeated samples of a survey within a population would look like a normal curve. The mean of all sample means is the mean of the population.

We also know that, given the amount of variability in the population, the sample size affects the size of the confidence interval; as n goes down the interval widens (more “sloppy”)

The Confidence Interval Method of Determining Sample Size • The relationship between sample size and sample error:

2 Formulas § One for when you’re looking for a proportional answer (e. g. for/against) § One for when you’re looking for a mean (e. g. the average of a county’s voters)

The Confidence Interval Method of Determining Sample Size - Proportions Variability • Variability: refers to how similar or dissimilar responses are to a given question • P (%): share that “have” or “are” or “will do” etc. • Q (%): 100%-P%, share of “have nots” or “are nots” or “won’t dos” etc. N. B. : The more variability in the population being studied, the larger the sample size needed to achieve stated accuracy level.

With Nominal data (i. e. Yes, No), we can conceptualize answer variability with bar charts…the highest variability is 50/50

So, what have we learned thus far? There is a relationship among: § the level of confidence we desire that our results be repeated within some known range if we were to conduct the study again, and… § the variability (in responses) in the population and… § the amount of acceptable sample error (desired accuracy) we wish to have and… § the size of the sample.

Sample Size Formula
• The formula requires that we (a) specify the amount of confidence we wish to have, (b) estimate the variance in the population, and (c) specify the level of desired accuracy we want.
• When we specify the above, the formula tells us what sample size we need to use: n

Sample Size Formula - Proportion • The sample size formula for estimating a proportion (also called a percentage or share):

Practical Considerations in Sample Size Determination • How to estimate variability (p and q shares) in the population • Expect the worst case (p=50%; q=50%) • Estimate variability: results of previous studies or conduct a pilot study

Practical Considerations in Sample Size Determination
• How to determine the amount of desired sample error
• Researchers should work with managers to make this decision. How much error is the manager willing to tolerate (less error = more accuracy)?
• Convention is ± 5%
• The more important the decision, the smaller the acceptable level of sample error should be

Practical Considerations in Sample Size Determination
• How to decide on the level of confidence desired
• Researchers should work with clients to make this decision. The higher the desired confidence level, the larger the sample size needed
• Convention is 95% confidence level (z = 1.96, which is ± 1.96 s.d.’s)
• The more important the decision, the more likely the manager will want more confidence. For example, a 99% confidence level has a z = 2.58.

Example: Estimating a Percentage (proportion or share) in the Population
What is the Required Sample Size?
§ Five years ago a survey showed that 42% of clients were aware of the agency’s services (clients were either “aware” or “not aware”)
§ After an intense public information campaign, management will conduct another survey. They want to be 95% confident (95 chances in 100) that the survey estimate will be within ± 5% of the true share of “aware” consumers in the population.
§ What is n?

Estimating a Percentage: What is n?
Z = 1.96 (95% confidence)
p = 42% (p, q and e must be in the same units)
q = 100% - p% = 58%
e = ± 5%
What is n?
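The formula itself did not survive on the slide. The sketch below assumes the standard confidence-interval formula for a proportion, n = z² · p · q / e², which reproduces the answer of 374 given on the next slide; the function name is hypothetical.

```python
def sample_size_for_proportion(p, e, z=1.96):
    """n = z^2 * p * q / e^2, with p, q, and e all in percent units.

    Assumption: this is the standard formula; the slide's own copy is missing.
    """
    q = 100 - p
    return z**2 * p * q / e**2


n = sample_size_for_proportion(p=42, e=5)
print(round(n))  # rounded to the nearest whole respondent

# Worst case when nothing is known about the variability: p = q = 50%.
worst_case = sample_size_for_proportion(p=50, e=5)
print(round(worst_case))
```

Many practitioners round up rather than to the nearest whole number so the margin is never exceeded; the slide's figure matches simple rounding.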

n = 374. What does this mean? It means that if we use a sample size of 374, after the survey we can say the following of the results (assume results show that 55% are aware): “Our most likely estimate of the percentage of consumers that are ‘aware’ of our brand name is 55%. In addition, we are 95% confident that the true share of ‘aware’ customers in the population falls between 50% and 60%.” Note that e is expressed in percentage points, the same units as p and q, so the interval is 55% ± 5 points, not ±5% of 55% (which would be only ±2.75 points).

Task § Figure out how big the sample should be for 95% confidence in the Simple Random Sample example handout § 400

Estimating a Mean • This requires a different formula: \( n = \dfrac{s^2 z^2}{e^2} \) • z is determined the same way (1.96 or 2.58) • e is expressed in the units we are estimating, e.g. if we are measuring attitudes on a 1-7 scale, we may want our error to be no more than ±0.5 scale units. If we are estimating dollars paid for a product, we may want our error to be no more than ±$3.00. • s is a little more difficult to estimate, but must be in the same units as e.

Estimating “s” in the Formula to Determine the Sample Size Required to Estimate a Mean Since we are estimating a mean, we can assume that our data are either interval or ratio. When we have interval or ratio data, the standard deviation of the sample, s, may be used as a measure of variability. How to estimate s? § Use the standard deviation of the sample from a previous study on the target population § Conduct a pilot study of a few members of the target population and calculate s
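
As a rough illustration of the pilot-study route, s is simply the sample standard deviation of the pilot responses. The ten scores below are made up for the sketch and do not come from the slides.

```python
from statistics import mean, stdev

# Hypothetical pilot responses on a 1-10 satisfaction scale (invented data).
pilot_scores = [7, 8, 6, 9, 5, 7, 8, 10, 6, 7]

pilot_mean = mean(pilot_scores)
s = stdev(pilot_scores)   # sample standard deviation (n - 1 in the denominator)

print(f"pilot mean = {pilot_mean:.1f}, s = {s:.2f}")
```

The resulting s is in scale points, which is what the formula needs: s and e must share units.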

Example: Estimating the Mean of a Population What is the required sample size, n? Management wants to know clients’ level of satisfaction with their service. They propose conducting a survey and asking for satisfaction on a scale from 1 to 10 (since there are 10 possible answers, the range = 10). Management wants to be 99% confident in the results (99 chances in 100 that the true value is captured) and they do not want the allowed error to be more than ±0.5 scale points. What is n?

What is n? s = 1.7 (from a pilot study), z = 2.58 (99% confidence), and e = 0.5 scale points. What is n? It is 77. Assume the survey average score was 7.3; what does this tell us? A 10 is very satisfied and a 1 is not satisfied at all. Answer: “Our most likely estimate of the level of consumer satisfaction is 7.3 on a 10-point scale. In addition, we are 99% confident that the true level of satisfaction in our consumer population falls between 6.8 and 7.8 on the scale.”
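
The result of 77 can be reproduced with a few lines, assuming the usual formula for a mean, n = s²·z² / e², with s and e both in scale points. Variable names are illustrative.

```python
import math

s = 1.7    # standard deviation from the pilot study, in scale points
z = 2.58   # 99% confidence
e = 0.5    # acceptable error, in scale points

n_exact = (s ** 2) * (z ** 2) / (e ** 2)
n_required = math.ceil(n_exact)   # round up to a whole respondent

# Interval reported after the survey, given an observed mean of 7.3.
survey_mean = 7.3
interval = (survey_mean - e, survey_mean + e)

print(f"n = {n_exact:.2f}, so use {n_required}")
print(f"99% interval: {interval[0]:.1f} to {interval[1]:.1f}")
```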

§ Go back and look at systematic example handout § What types of questions are being asked? § Might you need different formulas?

Other Methods of Sample Size Determination • Arbitrary “percentage rule of thumb” sample size: • Arbitrary sample size approaches rely on erroneous rules of thumb (e.g. “n must be at least 5% of the population”). • Arbitrary sample sizes are simple and easy to apply, but they are neither efficient nor economical (e.g. using the “5 percent rule,” if the universe is 12 million, n = 600,000, a very large and costly result).
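
To see how wasteful the 5 percent rule can be, compare it with the confidence-interval approach for the same universe. The sketch assumes the proportion formula n = z²·p·q / e² at the worst case p = 50%, 95% confidence and ±5% error; the variable names are illustrative.

```python
population = 12_000_000

# "5 percent rule": sample size scales with the population.
n_rule_of_thumb = population * 5 // 100

# Formula-based size (worst case p = q = 50%, e = 5, z = 1.96); it does not
# depend on the population size at all.
z, p, q, e = 1.96, 50.0, 50.0, 5.0
n_formula = (z ** 2) * p * q / (e ** 2)

print(f"5% rule: n = {n_rule_of_thumb:,}")
print(f"formula: n = {n_formula:.0f}")
print(f"the rule of thumb asks for about {n_rule_of_thumb / n_formula:,.0f} times more respondents")
```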

Other Methods of Sample Size Determination…cont. • Conventional sample size specification • The conventional approach follows some “convention” or number believed somehow to be the right sample size (e.g. 1,000 to 1,200, used for national opinion polls with ±3% error) • Using a conventional sample size can result in a sample that is too large or too small. • Conventional sample sizes ignore the special circumstances of the survey at hand.
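
The ±3% figure attached to the conventional poll sizes can be checked by solving the proportion formula for e instead of n, i.e. e = z·sqrt(p·q / n). This sketch assumes 95% confidence and the worst case p = 50%; the function name `margin_of_error` is hypothetical.

```python
import math

def margin_of_error(n, p=50.0, z=1.96):
    """Sample error in percentage points for a proportion p (percent) and sample size n."""
    q = 100.0 - p
    return z * math.sqrt(p * q / n)

for n in (1000, 1200):
    print(f"n = {n}: about ±{margin_of_error(n):.1f} percentage points")
```

Whether that level of error suits a given survey is exactly the question the conventional approach skips.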

Other Methods of Sample Size Determination…cont. • Statistical analysis requirements of sample size specification • Sometimes the researcher’s desire to use a particular statistical technique influences sample size. As the number of cross comparisons goes up, the number of cells goes up, and n must go up to keep each cell adequately filled. • Cost basis of sample size specification • Under the “all you can afford” method, the sample size is based on budget factors rather than on the value of the information to be gained from the survey.

Special Sample Size Determination Situations Sample Size Using Nonprobability Sampling • When using nonprobability sampling, sample size is unrelated to accuracy, so cost-benefit considerations must be used