SAMPLE SIZE DETERMINATION
• Most research studies rely on data or measurements from the subjects considered in the study.
• The subjects may be patients, individuals who participate in a survey, animals in a laboratory experiment, etc.
• In all such situations, while planning the study, the researcher should ask: how large a sample do I need?
• The answer depends on the scope, objectives and study design on one side, and on the time and money at the researcher's disposal on the other.
• Determining the sample size is not a crucial issue in every research study. For example, in a study of the curative effect of a drug on a fatal disease like AIDS, even a single positive result could be important.
• On the other hand, if a new vaccine for malaria is to be tested, the number of subjects must be large enough to confirm the vaccine's effect over that of existing preventive measures.
• The type of outcome and the type of statistical analysis to be carried out are also important.
• The outcome variable may be categorical, with two or more categories, or it may be continuous.
• For categorical variables, the analysis may include cross-tabulation, contingency tables, proportions, percentages, etc.
• For continuous variables, the analysis will be in terms of means, variances, correlations, tests of hypotheses, comparisons between two or more groups, etc.
• If the study has to be completed in a short period and at minimum expense, a small sample may be all we can afford, but the validity of the conclusions may be at stake. To achieve high validity of the results, on the other hand, we may require large cohorts/samples.
• Taking a very large sample without judging its impact on the outcome increases cost and time.
• Hence it is very important to decide the minimum sample size required for a research study before the study begins.
• A researcher has to complete the following steps before approaching a statistician about the sample size:
  ✓ Describe the aim, scope and objectives of the study.
  ✓ Formulate the research question precisely.
  ✓ Formulate the study design.
  ✓ Possess complete knowledge of the variables, categories, etc. involved in the study, and of the outcome/output variable.
  ✓ Possess knowledge of similar or related studies conducted by other researchers.
  ✓ Possess knowledge of the important parameters of interest, the desired level of precision (margin of error), the effect size, the type of sampling design planned, the levels of the different factors, the randomization involved in the study, etc.
  ✓ Consider the feasibility of a pilot study, if required.
1. Sample size determination in case of prevalence estimation
• Estimation of prevalence is very common in epidemiological studies. Researchers are interested in estimating the prevalence of tuberculosis, cancer, etc.
• Estimating the prevalence of smoking, drug addiction, etc. is also of interest to researchers.
• In such cases sample size determination is very important.
• Suppose the population proportion of the event of interest is 'p'. The 100(1 − α)% confidence interval for 'p' can be obtained from a sample of size 'n' as follows:
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}}$$
• Here $\hat{p}$ is an estimate of p. The margin of error in estimating p is $|p - \hat{p}|$. If we fix this as Δ, then the minimum sample size required to achieve this can be obtained as follows:
$$n \ge \frac{z_{\alpha/2}^{2}\, p\,(1-p)}{\Delta^{2}}$$
• This formula gives the minimum sample size required when α is the level of significance and p is the population proportion (if it is known). If 'p' is not known, we can use past estimates or estimate it through a pilot study. We can also use reliable sources of health statistics.
• If we have no definite knowledge about 'p', then we take p = 0.5.
• The margin of error Δ has to be fixed by the researcher.
• $z_{\alpha/2}$ is the normal abscissa corresponding to α.
• Ex: A local health department wants to estimate the prevalence of tuberculosis among children under five years of age. How many children should be included in the sample so that the prevalence may be estimated to within 5 percentage points of the true value with 95% confidence, if it is known that the true rate is unlikely to exceed 20%?
• Here p = 0.2, α = 0.05 and $z_{\alpha/2}$ = 1.96.
• Since the margin of error has to be within 5 percentage points of the true proportion, Δ = 0.05. With these values we get n ≥ 245.86. We take n = 246.
• If there is a possibility of dropouts, we can include 10% more in the sample, which fixes the sample size at 271.
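The arithmetic of this example can be checked with a short Python sketch. It assumes the prevalence formula n ≥ z²·p(1 − p)/Δ² with the two-sided normal quantile; the function names `prevalence_sample_size` and `inflate_for_dropout` are illustrative and do not come from the slides.

```python
import math
from statistics import NormalDist

def prevalence_sample_size(p, delta, alpha=0.05):
    """Unrounded minimum n so that the 100(1-alpha)% CI half-width is at most delta."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided normal quantile, about 1.96 for alpha = 0.05
    return z ** 2 * p * (1 - p) / delta ** 2

def inflate_for_dropout(n, dropout_rate):
    """Add a fraction dropout_rate to n and round up (the '10% more' step)."""
    return math.ceil(n * (1 + dropout_rate))

raw_n = prevalence_sample_size(p=0.2, delta=0.05)  # about 245.85 (245.86 with z rounded to 1.96)
n = math.ceil(raw_n)                               # 246
n_with_dropout = inflate_for_dropout(n, 0.10)      # 271

print(round(raw_n, 2), n, n_with_dropout)
```

Rounding up rather than to the nearest integer keeps the achieved margin of error at or below Δ.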
• If we choose different α and Δ, we get a different sample size. If we increase α, then n will be smaller. For a larger Δ we also get a smaller n, but a larger Δ gives a less precise estimate.
• A thorough understanding of the confidence level, the margin of error, and the cost and time constraints on the execution of the study is very important.
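The effect of α and Δ on n can be seen by tabulating the prevalence formula over a small grid. This is only a sketch: p = 0.2 is carried over from the tuberculosis example, while the grid values and the helper name `min_n` are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist

def min_n(p, delta, alpha):
    """Minimum sample size for estimating a proportion p to within delta."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided normal quantile
    return math.ceil(z ** 2 * p * (1 - p) / delta ** 2)

p = 0.2  # anticipated prevalence, as in the tuberculosis example
grid = {
    (alpha, delta): min_n(p, delta, alpha)
    for alpha in (0.01, 0.05, 0.10)   # illustrative significance levels
    for delta in (0.03, 0.05, 0.10)   # illustrative margins of error
}

for (alpha, delta), n in sorted(grid.items()):
    print(f"alpha={alpha:.2f}  delta={delta:.2f}  n={n}")
```

Reading down the output, n falls as α rises and falls sharply as Δ widens, because Δ enters the formula squared.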
Sample size using power analysis
• When we test our hypothesis with a test of significance (z-test, t-test, etc.), we use power analysis to determine the minimum sample size required for the study.
• In the sample size formula for such a test, we fix α = the level of significance, (1 − β) = the power of the test, and ∂ = the effect size. The standard deviation σ is either known or estimated from past data/studies. Usually we fix α = 0.05 and β = 0.05 (or 0.10) and find the corresponding z-scores.
Sample size formula for paired t-test
• For size 5%, power 90%, population s.d. = 12.5 and effect size 5, the minimum sample size required is 132 per group (considering a two-sided two-sample t-test).
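The figure of 132 per group agrees with the usual normal-approximation formula for comparing two means, n = 2σ²(z_{α/2} + z_β)²/∂² per group. The sketch below assumes that formula; the slides do not show which formula or software produced the number, and the function name `two_sample_n_per_group` is hypothetical.

```python
import math
from statistics import NormalDist

def two_sample_n_per_group(sigma, effect, alpha=0.05, power=0.90):
    """Unrounded per-group n for a two-sided two-sample comparison of means
    (normal approximation; assumes equal group sizes and a common sigma)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # quantile for the significance level
    z_beta = NormalDist().inv_cdf(power)           # quantile for the power 1 - beta
    return 2 * sigma ** 2 * (z_alpha + z_beta) ** 2 / effect ** 2

raw = two_sample_n_per_group(sigma=12.5, effect=5)  # about 131.34
n_per_group = math.ceil(raw)                        # 132

print(round(raw, 2), n_per_group)
```

For a paired design, the analogous approximation drops the factor 2 and uses the standard deviation of the within-pair differences in place of σ.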
Sample size for correlation
• If the study uses correlation analysis, the sample size is obtained from a formula in 'r', the population correlation (or its estimate).
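One widely used formula of this kind is based on Fisher's z-transformation: n = ((z_{α/2} + z_β)/C)² + 3, where C = ½·ln((1 + r)/(1 − r)). The sketch below assumes that form, which may differ from the one on the slide. The function name `correlation_sample_size` and the example values r = 0.3 and power 80% are illustrative assumptions.

```python
import math
from statistics import NormalDist

def correlation_sample_size(r, alpha=0.05, power=0.80):
    """Unrounded n needed to detect a correlation r against zero (two-sided),
    using the Fisher z-transformation approximation."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    c = 0.5 * math.log((1 + r) / (1 - r))  # Fisher's z of r, equal to atanh(r)
    return ((z_alpha + z_beta) / c) ** 2 + 3

raw = correlation_sample_size(r=0.3)  # about 84.93
n = math.ceil(raw)                    # 85

print(round(raw, 2), n)
```

Smaller values of r shrink C and so drive n up quickly, which is why weak correlations need large samples.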
Single group experiment
• If the aim is to determine whether an event has occurred (e.g., whether a pathogen is present in a colony of animals), then the number of animals that need to be tested or produced is given by:
$$n = \frac{\log \beta}{\log p}$$
• where β is the accepted probability of missing the event (usually 0.10 or 0.05), so that 1 − β is the chosen power, and p is the proportion of the animals in the colony that are not infected.
• For example, if 30% of the animals are infected (p = 0.7) and the investigator wishes to have a 95% chance of detecting that infection (β = 0.05), then the number of animals that need to be sampled is
$$n = \frac{\log(0.05)}{\log(0.7)} \approx 8.4$$
• Hence 9 animals should be included in the experiment.
• If only 10% of the animals are infected (p = 0.9), then n is about 29.
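The detection formula n = log β / log p follows from requiring that the chance of drawing n uninfected animals in a row, pⁿ, be no more than β. A minimal Python check of the two cases, with `detection_sample_size` as an illustrative name:

```python
import math

def detection_sample_size(prop_uninfected, beta=0.05):
    """Unrounded n such that prop_uninfected ** n equals beta, i.e. the chance
    of missing the infection entirely is beta."""
    return math.log(beta) / math.log(prop_uninfected)

n_30 = math.ceil(detection_sample_size(0.7))  # 30% infected, so 70% uninfected: 9
n_10 = math.ceil(detection_sample_size(0.9))  # 10% infected, so 90% uninfected: 29

print(n_30, n_10)
```

The rarer the infection, the closer p is to 1 and the more animals must be sampled to have the same chance of detecting it.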
Problems in sample size determination
• The formula gives only an approximate value.
• We need to know several pieces of information about the study.
• If we carry out several types of analysis, then we have to determine the sample size for each analysis separately and take the maximum. This is often not feasible.
• For complicated types of analysis, like ANOVA, regression/logistic regression, etc., the formulae/methods are complicated.
Some websites
• <http://www.biomath.info>: a simple website of the biomathematics division of the Department of Pediatrics at the College of Physicians & Surgeons at Columbia University, which implements the equations and conditions discussed in this article.
• <http://davidmlane.com/hyperstat/power.html>: a clear and concise review of the basic principles of statistics, which includes a discussion of sample size calculations with links to sites where actual calculations can be performed.