Lecture 9: SAMPLE SIZE DETERMINATION


Prof. K. K. Achary, 3/5/2021 2:24 PM

• Most research studies require the support of data or measurements from the subjects considered in the study.
• The subjects may be patients, individuals who participate in a survey, animals in a laboratory experiment, etc.
• In all such situations, while planning the study, the researcher should ask: How large a sample do I need?

• The answer to this question depends on the scope, objectives and study design on one side and, on the other side, the time and money at the disposal of the researcher.
• Determining the sample size is not a crucial issue in every research study. For example, in a study of the curative effect of a drug on a fatal disease like AIDS, getting a single positive result could be important.

• On the other hand, if a new vaccine for malaria is to be tested, the number of subjects must be large enough to confirm the vaccine's effect over the effects of existing preventive measures.
• The type of outcome of the study and the type of statistical investigation to be carried out are also important.

• The outcome variable may be categorical, with two or more categories, or it may be continuous.
• For categorical variables, the analysis may include cross-tabulation, contingency tables, proportions, percentages, etc.
• For continuous variables, the analysis will be in terms of mean, variance, correlation, tests of hypotheses, comparison between two or more groups, etc.

• If the study has to be completed in a short period and with minimum expenditure, we may have to settle for a small sample, but the validity of the conclusions may then be at stake. On the other hand, achieving high validity of the results may require large cohorts or samples.
• Choosing a very large sample without judging its impact on the outcome may needlessly increase cost and time.

• Hence it is very important to decide the minimum sample size required for a research study before the study begins.
• A researcher has to complete the following steps before approaching a statistician to determine the sample size:
✓ Describe the aim, scope and objectives of the study.
✓ Formulate the research question precisely.
✓ Formulate the study design.

✓ Possess complete knowledge about the variables, categories, etc. involved in the study, including the outcome (output) variable.
✓ Possess knowledge about similar or related studies conducted by other researchers.
✓ Possess knowledge about the important parameters of interest, the desired level of precision (margin of error), the effect size, the type of sampling design planned, the levels of the different factors, the randomization involved in the study, etc.
✓ Consider the feasibility of a pilot study, if required.

Sample size determination in case of prevalence estimation

• Estimation of prevalence is very common in epidemiological studies. Researchers are interested in estimating the prevalence of tuberculosis, cancer, etc.
• Estimating the prevalence of smoking, drug addiction, etc. is also of interest to researchers.
• In such cases sample size determination is very important.

• Suppose the population proportion (prevalence) of the event of interest is p. Using the confidence interval for the population proportion, we can derive the following formula for the sample size:

$$n \ge \frac{z_{\alpha/2}^{2}\, p\,(1-p)}{e^{2}}$$

• Here $z_{\alpha/2}$ is the standard normal value (z-score) corresponding to the level of significance α (two-sided), p is the prevalence and e is the margin of error.
• The margin of error is the largest absolute difference between the true prevalence and the estimated prevalence that the researcher is willing to allow.
• When the true prevalence is not known, it is estimated from past data or using a pilot study.

• If we have no definite knowledge about p, we take p = 0.5, which gives the largest (most conservative) sample size.
• The margin of error e has to be fixed by the researcher.
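The sample size formula for a proportion can be recovered in one step from the confidence interval, and the same step shows why p = 0.5 is the safe default. The sketch below assumes the usual normal approximation to the sampling distribution of the sample proportion, with $z_{\alpha/2}$ the two-sided standard normal value, p the prevalence and e the margin of error:

```latex
\begin{align*}
z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} \le e
  \;&\Longleftrightarrow\;
  n \ge \frac{z_{\alpha/2}^{2}\,p(1-p)}{e^{2}}, \\
p(1-p) \le \tfrac14 \ \text{(equality at } p = 0.5\text{)}
  \;&\Longrightarrow\;
  n \ge \frac{z_{\alpha/2}^{2}}{4e^{2}} \ \text{is sufficient for every } p .
\end{align*}
```

For instance, with z = 1.96 and e = 0.05 the conservative bound is 1.96² / (4 × 0.05²) = 384.16, so n = 385.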

• Example: A local health department wants to estimate the prevalence of tuberculosis among children under five years of age. How many children should be included in the sample so that the prevalence is estimated within 5% of the true value with 95% confidence, if it is known that the true rate is unlikely to exceed 20%?
• Here p = 0.2, α = 0.05 and $z_{\alpha/2}$ = 1.96.
• Since the estimate has to be within 5% (0.05 in absolute terms) of the true proportion, e = 0.05. With these values we get n ≥ 245.86, so we take n = 246.
• If there is a possibility of dropouts, we can include 10% more subjects in the sample, which fixes the sample size at 271.
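The arithmetic of this example can be checked with a few lines of Python. This is a minimal sketch, assuming the normal-approximation formula n ≥ z² p(1 − p) / e²; the function names `prevalence_sample_size` and `inflate_for_dropout` are illustrative, not part of the lecture.

```python
import math
from statistics import NormalDist


def prevalence_sample_size(p, e, alpha=0.05):
    """Minimum n to estimate a proportion p within +/- e (two-sided level alpha)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return math.ceil(z ** 2 * p * (1 - p) / e ** 2)


def inflate_for_dropout(n, dropout_rate=0.10):
    """Add a fixed percentage of extra subjects, as done in the example."""
    return math.ceil(n * (1 + dropout_rate))


# Tuberculosis example: p = 0.2, e = 0.05, 95% confidence
n = prevalence_sample_size(p=0.2, e=0.05, alpha=0.05)
n_with_dropouts = inflate_for_dropout(n)
print(n, n_with_dropouts)
```

Rounding up with `math.ceil` matters here: rounding 245.86 down would give a sample that does not meet the stated margin of error.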

• If we choose a different α or e, we get a different sample size. If we increase α (that is, lower the confidence level), n becomes smaller. A larger margin of error also gives a smaller n, but a less precise estimate.
• A thorough understanding of the confidence level, the margin of error, and the cost and time constraints on the execution of the study is very important.
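To see how strongly α and e drive the sample size, the short sketch below tabulates n for a few combinations. It assumes the normal-approximation formula for a proportion and a prevalence of p = 0.2; the grid of α and e values is chosen only for illustration.

```python
import math
from statistics import NormalDist


def prevalence_sample_size(p, e, alpha):
    """Minimum n to estimate a proportion p within +/- e (two-sided level alpha)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil(z ** 2 * p * (1 - p) / e ** 2)


p = 0.2  # assumed prevalence
sizes = {}
for alpha in (0.01, 0.05, 0.10):
    for e in (0.02, 0.05, 0.10):
        sizes[(alpha, e)] = prevalence_sample_size(p, e, alpha)
        print(f"alpha={alpha:.2f}  e={e:.2f}  n={sizes[(alpha, e)]}")
```

Because e enters the formula squared, halving the margin of error roughly quadruples the required sample, which is usually the dominant cost consideration.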

Sample size using power analysis

• When we use tests of significance such as the z-test or t-test to test our hypothesis, we use power analysis to determine the minimum sample size required for the study.
• In the sample size formula for such tests, we fix α = the level of significance, (1 − β) = the power of the test, and δ = the effect size. The standard deviation σ is either known or estimated from past data or studies. Usually we fix α = 0.05 and β = 0.05 (or 0.10, 0.20) and find the corresponding z-scores.
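The formula itself is not reproduced in the text, so the sketch below uses one commonly quoted form, stated here as an assumption: for a two-sided one-sample z-test, n ≥ ((z_{α/2} + z_β) σ / δ)², where δ is the difference to be detected. The function name `z_test_sample_size` and the numbers in the example are hypothetical.

```python
import math
from statistics import NormalDist


def z_test_sample_size(sigma, delta, alpha=0.05, power=0.80):
    """Assumed formula: n >= ((z_{alpha/2} + z_beta) * sigma / delta) ** 2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - beta
    return math.ceil(((z_alpha + z_beta) * sigma / delta) ** 2)


# Hypothetical inputs: sigma = 10, smallest difference worth detecting = 5
n_power_80 = z_test_sample_size(sigma=10, delta=5, power=0.80)
n_power_90 = z_test_sample_size(sigma=10, delta=5, power=0.90)
print(n_power_80, n_power_90)
```

Under the same assumptions, comparing the means of two independent groups with equal variance needs twice this number in each group. Demanding higher power (smaller β) always increases n.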

Sample size formula for paired t-test
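The formula from this slide is not reproduced in the text. A commonly used normal-approximation form is given below as an assumption, not as the lecture's own formula. Here $\sigma_d$ is the standard deviation of the within-pair differences, $\delta$ is the mean difference to be detected, and $\rho$ is the correlation between the paired measurements:

```latex
\begin{align*}
n &\ge \frac{\left(z_{\alpha/2} + z_{\beta}\right)^{2}\sigma_d^{2}}{\delta^{2}}, \\
\sigma_d^{2} &= \sigma_1^{2} + \sigma_2^{2} - 2\rho\,\sigma_1\sigma_2 .
\end{align*}
```

Here n is the number of pairs. The second line shows why pairing helps: the stronger the positive correlation between the two measurements, the smaller $\sigma_d$ and hence the smaller the required n.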

Sample size for correlation

If we use correlation analysis in our study, the sample size is determined by a formula that depends on r, the population correlation (or its estimate).
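The formula from this slide is not reproduced in the text. One standard choice, assumed here, is based on Fisher's z-transformation for testing whether a correlation differs from zero: n ≥ ((z_{α/2} + z_β) / C)² + 3 with C = ½ ln((1 + r) / (1 − r)). The function name `correlation_sample_size` is illustrative.

```python
import math
from statistics import NormalDist


def correlation_sample_size(r, alpha=0.05, power=0.80):
    """Assumed Fisher-z formula: n >= ((z_{alpha/2} + z_beta) / C) ** 2 + 3."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    c = math.atanh(r)  # equals 0.5 * ln((1 + r) / (1 - r))
    return math.ceil(((z_alpha + z_beta) / c) ** 2 + 3)


n_r03 = correlation_sample_size(r=0.3)
n_r05 = correlation_sample_size(r=0.5)
print(n_r03, n_r05)
```

Weak correlations need far larger samples than strong ones, so the anticipated value of r should come from earlier studies or a pilot study rather than a guess.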

Problems in sample size determination

• The formula gives only an approximate value.
• We need to know several pieces of information about the study in advance.
• If we carry out several types of analysis, we have to determine the sample size for each analysis separately and take the maximum. But this is often not feasible.
• For complicated types of analysis, such as ANOVA and regression or logistic regression, the formulae and methods are complicated.

What is an error bar?

• When we select a random sample from a population, the sample mean is an estimator of the population mean.
• Each different sample gives its own sample mean and sample standard deviation (s.d.).
• To express the spread of the individual values in a sample, we report the sample mean and sample standard deviation in the form: mean ± s.d.

• When we have to compare these values between two or more groups, we draw error bars on a bar diagram.
• The height of each bar represents the sample mean. A vertical line centred on the top of each bar extends one s.d. above and one s.d. below the mean, so its total length is twice the s.d.


Standard error

• Suppose we select a sample of size n from a population and compute its mean to estimate the population mean. The difference between the true value of the population mean and the sample mean is called the error in estimation.
• If we select all possible samples of the same size and compute the variance of their sample means, we get the variance of the sample mean.

• This is equal to: $\operatorname{Var}(\bar{x}) = \dfrac{\sigma^{2}}{n}$, where $\sigma^{2}$ is the population variance.
• However, since we select only one sample from the population, we estimate this variance from the sample variance $s^{2}$. The square root of the estimated variance is called the standard error of the mean (SEM).
• $\text{SEM} = \dfrac{s}{\sqrt{n}}$
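As a quick illustration, the SEM can be computed from a single sample with Python's standard library. This is a minimal sketch using SEM = s / √n, where s is the sample standard deviation (divisor n − 1); the data values are made up for the example.

```python
import math
import statistics

# Hypothetical measurements from one sample
sample = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(sample)
mean = statistics.mean(sample)
sd = statistics.stdev(sample)   # sample s.d., divisor n - 1
sem = sd / math.sqrt(n)         # standard error of the mean

print(f"mean +/- s.d. = {mean} +/- {sd:.3f}")
print(f"mean +/- SEM  = {mean} +/- {sem:.3f}")
```

The two summaries answer different questions: the first describes how scattered the individual values are, the second how precisely the sample mean estimates the population mean.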

• Use mean ± s.d. when you want to show the scatter (or variation) in individual values.
• Use mean ± SEM when you want to show the error in estimation of the population mean.
• It is easy to confuse the standard deviation (SD) with the standard error of the mean (SEM). Here are the key differences:

• The SD quantifies scatter: how much the values vary from the mean.
• The SEM quantifies how precisely you know the true mean of the population. It takes into account both the value of the SD and the sample size.
• Both SD and SEM are in the same units: the units of the data.
• Since SEM = SD/√n, the SEM is always smaller than the SD whenever n > 1.

• The SEM gets smaller as your sample size gets larger.
• This makes sense, because the mean of a large sample is likely to be closer to the true population mean than the mean of a small sample. With a large sample, you will know the value of the mean with a lot of precision even if the data are very scattered.

• The SD does not change predictably as you acquire more data. The SD you compute from a sample is the best possible estimate of the SD of the population.
• As you collect more data, you will assess the SD of the population with more precision. But you cannot predict whether the SD from a larger sample will be bigger or smaller than the SD from a small sample.
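A small simulation makes the contrast between SD and SEM concrete. The sketch below assumes a normally distributed population with mean 50 and SD 10 (values chosen only for illustration) and draws samples of increasing size with a fixed random seed.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible

POP_MEAN, POP_SD = 50, 10  # assumed population parameters
results = []
for n in (10, 100, 1000, 10000):
    sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(n)]
    sd = statistics.stdev(sample)
    sem = sd / math.sqrt(n)
    results.append((n, sd, sem))
    print(f"n={n:>5}  SD={sd:6.2f}  SEM={sem:6.3f}")
```

The SD column fluctuates around the population value of 10 without any trend, while the SEM column shrinks by roughly a factor of √10 each time the sample size is multiplied by 10.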