
The following lecture has been approved for University Undergraduate Students. This lecture may contain information, ideas, concepts and discursive anecdotes that may be thought-provoking and challenging. It is not intended for the content or delivery to cause offence. Any issues raised in the lecture may require the viewer to engage in further thought, insight, reflection or critical evaluation.

Calculating Sample Sizes for Research
Dr. Craig Jackson
Senior Lecturer in Health Psychology
Faculty of Health
www.hcc.uce.ac.uk/craig_jackson

Keep it simple
“Some people hate the very name of statistics but... their power of dealing with complicated phenomena is extraordinary. They are the only tools by which an opening can be cut through the formidable thicket of difficulties that bars the path of those who pursue the science of man.”
Sir Francis Galton, 1889


How Many Make a Sample?
“8 out of 10 owners who expressed a preference said their cats preferred it.”
How confident can we be about such statistics?
8 out of 10? 80 out of 100? 800 out of 1,000? 80,000 out of 100,000?
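One way to see why 8 out of 10 and 80,000 out of 100,000 deserve very different levels of confidence is to put a 95% confidence interval around each proportion. The sketch below uses the Wilson score interval, a standard interval for a proportion chosen here (not by the lecture) because it behaves sensibly at small n; the function name `wilson_interval` is hypothetical.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion (z = 1.96)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return centre - half_width, centre + half_width

# The four versions of "80% of owners" from the slide
samples = [(8, 10), (80, 100), (800, 1000), (80_000, 100_000)]
intervals = {n: wilson_interval(k, n) for k, n in samples}

for n, (low, high) in intervals.items():
    print(f"n = {n:>7,}: 95% CI {low:.1%} to {high:.1%}")
```

All four samples give the same 80% point estimate, but the interval shrinks from roughly 49% to 94% at n = 10 to about a quarter of a percentage point either side of 80% at n = 100,000.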

Multiple Measurement of a Small Sample
Four measurements: 25, 22, 24 and 21 cell clusters
Total = 92 cell clusters; Mean = 23 cell clusters; SD = 1.8 cell clusters
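The summary figures on this slide can be checked directly. The slide's SD of 1.8 matches the sample standard deviation (dividing by n − 1); the population formula would give about 1.6.

```python
import statistics

# The four cell-cluster counts from the slide
counts = [25, 22, 24, 21]

total = sum(counts)
mean = statistics.mean(counts)
sd = statistics.stdev(counts)  # sample SD (n - 1 denominator)

print(f"Total = {total}, Mean = {mean}, SD = {sd:.1f}")
```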

It all depends on the size of your needle

Small samples spoil research
Three samples of N = 10 participants each, with age (years) and IQ recorded:
• Sample A: all ten participants aged 20 with an IQ of 100. Age: total 200, mean 20, SD 0. IQ: mean 100, SD 0.
• Sample B: ages 18, 20, 22, 24, 26, 21, 19, 25, 20 and 21. Age: total 216, mean 21.6, SD 4.2. IQ: mean 110.2, SD 19.2.
• Sample C: as Sample B, except that one participant is aged 45 (rather than 21) with an IQ of 156 (rather than 101). Age: total 240, mean 24, SD 8.5. IQ: total 1157, mean 115.7, SD 30.2.

Background on Surveys
• Large-scale
• Quantitative
• Can be descriptive (“2% of women think they are beautiful”)
• Can be inferential (“Significantly more single women think they’re beautiful than married women do”)
• Done with a sample of patients, respondents, consumers, or professionals
• Differences between any groups assessed with hypothesis testing

Importance of Sample Size
• “Forgotten” in many studies
• Little consideration given
• An appropriate sample size is needed to confirm or refute hypotheses
• Samples that are too small cannot detect anything but the grossest differences
• Small studies may report a non-significant result when a real difference exists – Type 2 error
• Too large a sample – unnecessary waste of (clinical) resources
• Ethical considerations – waste of patient time, inconvenience

Qualitative studies need to sample wisely too…
Asian GPs’ attitudes to ANP
Objective: To determine attitudes to ANP among Asian doctors in East Birmingham PCT
Method: Send an invitation to 55 Asian GPs (approx. 47% of GPs in East Birmingham PCT); interview (30 mins) the first 20 GPs who respond
The sample would be 36% of Asian GPs – and only 17% of GPs in the PCT
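The sampling fractions on this slide follow from simple arithmetic. The total number of GPs in the PCT (about 117) is not stated on the slide; it is inferred here from the 47% figure.

```python
# Figures from the slide
asian_gps_invited = 55
asian_share_of_all_gps = 0.47   # Asian GPs as a fraction of all GPs in the PCT
interviewed = 20

# Implied total number of GPs in the PCT (an inference, not a slide figure)
total_gps = asian_gps_invited / asian_share_of_all_gps

fraction_of_asian_gps = interviewed / asian_gps_invited
fraction_of_all_gps = interviewed / total_gps

print(f"Implied GPs in PCT: {total_gps:.0f}")
print(f"Interviewed: {fraction_of_asian_gps:.0%} of Asian GPs, "
      f"{fraction_of_all_gps:.0%} of all GPs")
```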

Have Some Consideration – “The Good”
#1 Pulmonary Valve Replacement on Biventricular Function following Tetralogy of Fallot
Q. How many participants will be recruited? How many of these participants will be in a control group?
A. “Power analyses have been undertaken based on previous data provided by Hazekamp et al. (2001). A sample size of 18 in each group will have 95% power to detect a difference in right-ventricular end-diastolic volume of 78 ml (the difference between preoperative mean of 292 ml and the postoperative mean of 214 ml) assuming the common standard deviation is 62 ml and using a two-group t-test with a 5% two-sided significance level.”
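The protocol's figure can be roughly reproduced with the common normal-approximation formula for comparing two means, n per group = 2(z(1 − α/2) + z(power))² / d², where d is the difference divided by the SD. This is a sketch under that assumption, not the calculation the applicants actually ran, and `n_per_group` is a hypothetical helper. The normal approximation gives 17 per group; a calculation based on the t distribution, as the protocol describes, typically adds roughly one participant per group, which is consistent with the quoted 18.

```python
import math
from statistics import NormalDist

def n_per_group(mid, sd, power=0.80, alpha=0.05):
    """Normal-approximation sample size per group for a two-sided,
    two-group comparison of means (hypothetical helper)."""
    z = NormalDist().inv_cdf          # standard normal quantile function
    d = mid / sd                      # standardized difference
    n = 2 * ((z(1 - alpha / 2) + z(power)) / d) ** 2
    return math.ceil(n)               # round up to whole participants

# Values quoted in the protocol: 292 ml vs 214 ml, common SD 62 ml, 95% power
n_valve = n_per_group(mid=292 - 214, sd=62, power=0.95)
print(n_valve)  # 17 per group under the normal approximation
```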

Have Some Consideration – “The Bad”
#2 Survey of Knowledge and Attitudes regarding ADHD in Adults among Specialist Adult Psychiatrists
It is a cross-sectional questionnaire survey to assess the current knowledge and attitudes regarding ADHD in adults amongst ALL General and Specialist Adult Consultants, Specialist Registrars and Staff-grade / Associate Specialist Doctors in Birmingham and Solihull.
Q. How many participants will be recruited? How many of these participants will be in a control group?
A. “100.”

Have Some Consideration – “The Ugly”
#3 The Sepsis Study
This is a cross-sectional study which will be conducted using a postal questionnaire with a follow-up reminder letter to non-responders. The sample will be taken from patients who have been admitted to the ITU department for severe sepsis or septic shock between Feb 1st 2004 and Aug 1st 2004. Patients will be over the age of 18 and will have spent at least one day on ITU. The questionnaire will be a standard health-related quality of life questionnaire. Patients will be contacted by letter a maximum of two times. The patients’ personal details will be stored on a database kept in the hospital to maintain patient confidentiality. Names will not be published in the written report. The database should highlight any patients who are deceased, and obviously questionnaires will not be sent to their addresses.

Have Some Consideration – “The Ugly”
#3 The Sepsis Study
Q. How many participants will be recruited? How many of these participants will be in a control group?
A. “Between 30 and 60.”

Hypothesis testing
All about 2 types of errors
H1: Men perform better than women
H0: Men perform no better than women
Imagine: the actual data really show no difference between the sexes (H0 is true)
• Decide to accept H0 when H0 is true: correct decision
• Decide to accept H0 when H0 is false: Type 2 error (false negative), probability β
• Decide to reject H0 when H0 is true: Type 1 error (false positive), probability α
• Decide to reject H0 when H0 is false: correct decision

Errors in hypothesis testing
Type 1 errors: “false positive”
• Occurs if the null hypothesis is rejected when it should be accepted
• e.g. a “significant result” obtained when the null hypothesis is in fact true
• Probability of making a Type 1 error denoted as “α”
Type 2 errors: “false negative”
• Occurs if the null hypothesis is accepted when it should be rejected
• e.g. a non-significant result obtained when the null hypothesis is in fact not true
• Probability of making a Type 2 error denoted as “β”
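A small simulation makes α and β concrete: generate many hypothetical two-group studies, test each one, and count the wrong decisions. The sketch assumes normally distributed scores (mean 100, SD 10), 30 participants per group, a known SD and a two-sided z-test; these numbers are illustrative and do not come from the lecture.

```python
import random
from statistics import NormalDist

def rejection_rate(true_diff, n=30, sd=10.0, alpha=0.05, trials=5000, seed=1):
    """Fraction of simulated two-group studies that reject H0 with a two-sided
    z-test (SD assumed known). All default values are illustrative."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sd * (2 / n) ** 0.5  # standard error of the difference in means
    rejections = 0
    for _ in range(trials):
        mean_a = sum(rng.gauss(100, sd) for _ in range(n)) / n
        mean_b = sum(rng.gauss(100 + true_diff, sd) for _ in range(n)) / n
        if abs(mean_b - mean_a) / se > z_crit:
            rejections += 1
    return rejections / trials

# H0 true (no real difference): every rejection is a Type 1 error
type1_rate = rejection_rate(true_diff=0)
# H0 false (real 5-point difference): every non-rejection is a Type 2 error
power_hat = rejection_rate(true_diff=5)
type2_rate = 1 - power_hat

print(f"Type 1 error rate ~ {type1_rate:.3f} (alpha = 0.05)")
print(f"Type 2 error rate ~ {type2_rate:.3f}")
```

With no real difference, the rejection rate settles near α = 0.05. With a real 5-point difference and only 30 per group, roughly half of the simulated studies miss it, so β is about 0.5.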

Factors affecting Sample Size
Sample size depends upon 4 inter-related factors; N, or any one of the factors, can be calculated once the others are known
[Figure: sample size N = ? at the centre, linked to 1. Power, 2. Clinically important difference, 3. Natural variability, 4. Significance level, with the question “Primary outcome measure?”]

1. Power
• Probability that a study of a given size would detect a real, statistically significant difference; power = 1 − β
• Usually set between 80% and 90% (0.80, 0.85, 0.90)
• Higher power = higher chance of detecting a genuine significant difference and lower chance of making a Type 2 error
• With high power, we can be reasonably sure any non-significant result is genuine, i.e. it is reasonable to accept the null hypothesis
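Power can be approximated for a given sample size with the normal distribution. The sketch below assumes a two-group comparison of means, a two-sided test at α = 0.05 and an illustrative standardized difference of 0.5 (half an SD); `power_two_group` is a hypothetical helper. Under these assumptions power climbs past 0.80 at about 64 participants per group.

```python
from statistics import NormalDist

def power_two_group(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group z-test for a
    standardized difference d (hypothetical helper)."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = d * (n_per_group / 2) ** 0.5   # expected value of the z statistic
    # Probability the test statistic lands in either rejection region
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)

powers = {n: power_two_group(0.5, n) for n in (20, 64, 100)}
for n, p in powers.items():
    print(f"{n} per group: power = {p:.2f}, beta = {1 - p:.2f}")
```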

2. Minimal Important Difference to be detected
• If the difference between treatments is large, small samples can produce significant results
• If the difference between treatments is small, larger samples are needed
• Important to know if any differences are expected to be small
• Determine the minimum difference between treatments considered clinically relevant
• Given a large enough sample, any difference can be made statistically significant
Experience & judgement are needed in deciding the minimal treatment effect that is of any value – to justify effort, time and finance

2. Minimum Important Difference to be detected (MID)
Bronchodilator & Chronic Bronchitis Example
• A new bronchodilator causes a real increase in tidal volume in patients (10 ml on average)
• The standard deviation (natural variation) in tidal volume in this clinical population is more than 10 ml
• Given a huge sample, a statistically significant increase in tidal volume among users could be demonstrated, but the increase is smaller than the natural variation between patients
• Expensive & pointless
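The point that a large enough sample makes any difference “significant” can be shown by holding the effect fixed and letting n grow. This sketch keeps the 10 ml increase from the example, assumes a hypothetical SD of 50 ml (the slide only says it exceeds 10 ml) and uses a two-sided z-test with known SD; `two_sided_p` is a hypothetical helper.

```python
import math

def two_sided_p(diff, sd, n_per_group):
    """Two-sided p-value for a difference in means, z-test with known SD."""
    se = sd * (2 / n_per_group) ** 0.5
    z = abs(diff) / se
    # erfc(z / sqrt(2)) equals 2 * (1 - Phi(z)) but stays accurate for large z
    return math.erfc(z / math.sqrt(2))

# Fixed 10 ml effect; the SD of 50 ml is a hypothetical value
p_values = {n: two_sided_p(diff=10, sd=50, n_per_group=n)
            for n in (50, 200, 1000, 5000)}
for n, p in p_values.items():
    print(f"{n:>5} per group: p = {p:.2g}")
```

The effect never changes, yet p falls from about 0.32 at 50 per group to well below 0.001 at 1,000 per group; statistical significance alone says nothing about clinical importance.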

3. Standard Deviation & Variability
• The larger the SD of the 2 groups relative to the MID, the larger the sample needed
• The smaller the SD, the smaller the sample required
• The ratio of MID to SD is the “standardized difference”, used in calculating sample sizes
Estimated SD
An estimate of SD may not be available; options are:
1. Run a pilot study
2. Begin the trial and estimate SD from the initial patients
3. Use the SD found in previous trials
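Because the required sample size depends on the standardized difference squared, doubling the SD for the same MID roughly quadruples the sample. The sketch uses the normal-approximation formula for two groups with 80% power and α = 0.05; the MID of 5 and SDs of 10 and 20 are illustrative, and `n_per_group` is a hypothetical helper.

```python
import math
from statistics import NormalDist

def n_per_group(mid, sd, power=0.80, alpha=0.05):
    """Normal-approximation sample size per group (hypothetical helper)."""
    z = NormalDist().inv_cdf
    standardized_difference = mid / sd
    n = 2 * ((z(1 - alpha / 2) + z(power)) / standardized_difference) ** 2
    return math.ceil(n)

# Same MID of 5 units; only the SD changes (illustrative values)
n_sd10 = n_per_group(mid=5, sd=10)   # standardized difference 0.5
n_sd20 = n_per_group(mid=5, sd=20)   # standardized difference 0.25
print(n_sd10, n_sd20)
```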

4. Significance Level
• The significance level (α) has an important bearing on the sample size required
• There is a trade-off between the significance level (α) and the chance of making a Type 2 error (β)
• A smaller significance level (e.g. P = 0.01 rather than P = 0.05) requires a larger sample size to avoid a Type 2 error
• As the nominated significance level gets smaller, so does the chance of a Type 1 error, but for a fixed sample size the chance of a Type 2 error increases
• A significance level of P = 0.05 implies that, when the null hypothesis is true, a Type 1 error will occur in 1 of every 20 trials: 5 out of 100 studies will make Type 1 errors purely by chance. Acceptable
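The trade-off between α and β can be seen by tightening the significance level while keeping the study unchanged. This sketch assumes a two-group comparison, a standardized difference of 0.5 and 63 participants per group (illustrative values) and uses the normal approximation; `beta_two_group` is a hypothetical helper.

```python
from statistics import NormalDist

def beta_two_group(d, n_per_group, alpha):
    """Approximate Type 2 error rate for a two-sided, two-group z-test."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = d * (n_per_group / 2) ** 0.5
    power = nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)
    return 1 - power

# Same study (standardized difference 0.5, 63 per group); only alpha changes
beta_05 = beta_two_group(0.5, 63, alpha=0.05)
beta_01 = beta_two_group(0.5, 63, alpha=0.01)
print(f"alpha = 0.05: beta = {beta_05:.2f}")
print(f"alpha = 0.01: beta = {beta_01:.2f}")
```

Moving from α = 0.05 to α = 0.01 roughly doubles β, from about 0.20 to about 0.41; restoring the original power means recruiting more participants.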

Calculating Sample Size
• Sample size calculations are available for all study designs, trials, and data types, e.g. categorical data, continuous data, means, proportions, multiple groups, paired samples, unpaired samples, equal / unequal sized groups
• Calculations are complex but easily done with a PC and the web
• A statistician is helpful (if s/he can communicate clearly!)
Two approaches for us non-statisticians:
1. Altman’s Nomogram
2. Internet

Altman’s Nomogram
Standardized difference = Min. important difference / Standard deviation
[Figure: Altman’s Nomogram. Left axis: standardized difference (0.0 to 1.2); centre axis: sample size N (8 to 10,000); right axis: power (0.05 to 0.995).]

Example Calculation – Effects of Pesticide Study
IQ survey concerning workers exposed to pesticides
What we already know…
• Mean IQ score is 100 points
• Normal IQ range is 90–110, i.e. SD is ±10 points
What we need to do…
a) Decide on the CID (clinically important difference). A difference of 11 IQ points seems clinically important to me
b) Calculate the standardized difference = Min. important difference / Standard deviation = 11 / 10 = 1.1
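As a cross-check on the nomogram, the same standardized difference of 1.1 can be fed into the normal-approximation formula. This assumes two equal groups (for example exposed vs unexposed workers, which the slide does not specify) and a two-sided α of 0.05; `total_sample_size` is a hypothetical helper, and nomogram readings may differ slightly from these figures.

```python
import math
from statistics import NormalDist

def total_sample_size(mid, sd, power, alpha=0.05):
    """Total N for two equal groups, normal approximation (hypothetical helper)."""
    z = NormalDist().inv_cdf
    d = mid / sd                                   # standardized difference
    per_group = math.ceil(2 * ((z(1 - alpha / 2) + z(power)) / d) ** 2)
    return 2 * per_group

# Pesticide study: CID = 11 IQ points, SD = 10 points
totals = {p: total_sample_size(mid=11, sd=10, power=p)
          for p in (0.80, 0.85, 0.90, 0.95)}
for p, n in totals.items():
    print(f"power {p:.2f}: total N = {n}")
```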

Altman’s Nomogram – Effects of Pesticide Study
[Figure: Altman’s Nomogram with standardized difference = 11 / 10 = 1.1 marked on the left axis and power values 0.80, 0.85, 0.90 and 0.95 highlighted on the right axis; N is read off the centre axis.]

2. Electronic Calculation of Sample Size
• Not covered in most stats packages, e.g. SPSS, Statistica
• Many sites available
• Real-time calculation
• Hyperstat by David M Lane: www.davidmlane.com
• Other additional software, e.g. Xlstat.com

Summary of Sample Size & Power
Correct sample size helps avoid Type I & Type II errors
A correct study has a balance of four factors:
• Power (no less than 0.80): bigger = better study
• Min. clinical difference (effective difference): bigger = better study
• Standard deviation (variability): smaller = better study
• Significance level (0.05): smaller = better