Connecting Epidemiology and Biostatistical concepts
Muhammad Mizanur Rashid Shuvra, MBBS, MPH, MSc (ECD)
shuvram@hotmail.com
NB: To print these slides, select ‘Pure Black and White’ from the printing options to save black ink
Session objective
• To have a general understanding of how concepts of Epidemiology (study design) and Biostatistics are used in actual research
• So, which comes first in research reality?
• Epidemiology? Or Research? Egg? Chicken?
• Please feel free to interrupt to ask questions or seek clarification
• I would appreciate it if your mobile is in silent mode
Steps of research - 1
Researchable Problem
• Not all problems are researchable; a literature review is required
Research Question
• Important and often SMART
Methodology selection
• Scientifically sound and doable
Steps of research - 2
Instrument preparation
• Appropriate instruments need to be pre-planned
Data collection and/or management
• Secondary data only needs access
Data analysis and interpretation
• Scientifically sound and doable
Reporting
Example of a problem: What questions come to mind?
In 1970 UNICEF started installing shallow water tube wells across Bangladesh to provide safe drinking water and battle cholera. However, in 1983 the neighboring West Bengal found a case of skin lesion due to arsenicosis. Curiosity crept in slowly and by 1987 many more were identified. In 2000, the WHO Bulletin reported the following:
"Bangladesh is grappling with the largest mass poisoning of a population in history because groundwater used for drinking has been contaminated with naturally occurring inorganic arsenic. It is estimated that of the 125 million inhabitants of Bangladesh between 35 million and 77 million are at risk of drinking contaminated water. The scale of this environmental disaster is greater than any seen before; it is beyond the accidents at Bhopal, India, in 1984, and Chernobyl, Ukraine, in 1986."
[Ref: Bulletin of the World Health Organization, 2000, 78 (9)]
Example of a problem: What questions come to mind?
• Does the prevalence of arsenicosis vary across the country?
• Which socio-demographic groups are most affected by arsenicosis?
• How has the lifestyle of people changed after noticing arsenicosis in the community?
• Are exclusively breastfed children exposed to arsenic?
Research question (from my MPH thesis, 2005)
• What factors determine arsenic concentration in the breast milk of mothers who drink arsenic-contaminated water?
Researchable Problem → Research Question
Selecting methodology: Feasibility and Resource
Research Question → Methodology selection
Key Elements
• Method: Quantitative vs Qualitative
• Population vs sample
• Sample size calculation
• Site
• Data collection tool
• Data collection procedure
• Study duration
Method: Quantitative vs qualitative
Points to consider | Quantitative | Qualitative
Answer to the research question | Can be presented by numbers | Best presented through words/numbers cannot answer
Data | Numbers | Words/sentences
Data collection tool | Structured questionnaire/forced choice | Very open ended/respondent is free to keep talking
Data analysis | Statistical | Thematic
Interpretation | Objective | Subjective
Selecting the method based on the research question
• Does the prevalence of arsenicosis vary across the country? Quantitative?
• Which socio-demographic groups are most affected by arsenicosis? Quantitative?
• How has the lifestyle of people changed after noticing arsenicosis in the community? Qualitative?
• Are exclusively breastfed children exposed to arsenic? Quantitative?
Quantitative method: selecting epidemiological design
• Descriptive or Observational: Cross sectional/survey
• Analytical: Case-control, Nested, Cohort
• Experimental: Randomized controlled trial (RCT), or without randomization (not always ethically possible)
Examples of different study designs: Cross sectional
Examples of different study designs: Case Control
Examples of different study designs: Cohort study
MPH Thesis: Epidemiological Design
Research Question: What factors determine arsenic concentration in the breast milk of mothers who drink arsenic-contaminated water?
Limiting factors:
• I had 2 months, from proposal writing to thesis submission
• I was allocated about CAD 100
What design was used? What design could have been used if I had no limitations? What design could never have been used?
Starting to consider Biostatistics: Instrument preparation
• You will require a data collection tool regardless of the data collection procedure; e.g. questionnaire, checklist etc.
• Steps to consider:
Research question: Are exclusively breastfed children exposed to arsenic?
Variables: age of child, duration of breastfeeding, food items given to child, As concentration in breast milk
Question/items:
1. How old is your youngest child? ……… months
2. How long have you been breastfeeding this child? ……… months
3. Do you give any other forms of liquid or solid to your child other than breast milk?
   a) Yes ……… (1)
   b) No ……… (0)
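The step from questionnaire items to variables can be sketched in code. The following is a minimal illustration only: the variable names (`child_age_months`, `bf_duration_months`, `other_food`) and the codebook layout are assumptions made for this example, not part of the original instrument. The 1/0 coding for Yes/No follows item 3 above.

```python
# Hypothetical codebook: variable names and layout are illustrative only.
CODEBOOK = {
    "child_age_months": {"type": "continuous"},
    "bf_duration_months": {"type": "continuous"},
    "other_food": {"type": "binary", "codes": {"Yes": 1, "No": 0}},
}


def code_response(raw):
    """Turn one respondent's raw answers into analysable values."""
    coded = {}
    for name, spec in CODEBOOK.items():
        answer = raw[name]
        if spec["type"] == "binary":
            coded[name] = spec["codes"][answer]  # e.g. "Yes" -> 1
        else:
            coded[name] = float(answer)  # months as a number
    return coded


def exclusively_breastfed(coded):
    """A child given nothing other than breast milk has other_food == 0."""
    return coded["other_food"] == 0


# Invented respondent, used only to show the coding step.
respondent = {"child_age_months": "5", "bf_duration_months": "5", "other_food": "No"}
coded = code_response(respondent)
```

Writing the codes down before data collection means every item already maps to exactly one analysable variable.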
Instrument preparation: Starting to consider Biostatistics
• Question(s)/checklist = variables
• Variables are what you analyze using statistics
• Descriptive statistics, i.e. central tendency and dispersion, are common to all research
• Whether to use descriptive or inferential statistics depends on the research question
• The nature of the dependent variable dictates the statistical test
• Sample size gives ‘power’ to make an inference (use software)
• Make dummy tables, charts etc. while planning your instrument
• Make an SPSS template according to the questionnaire before collecting data
• Enter dummy data and produce the intended outputs before data collection
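The slide recommends SPSS for the template. As a rough equivalent, the "dummy data, intended output" idea can be sketched with Python's standard library. The values below are invented purely to test the output layout, and the function name `describe` is an assumption for this example.

```python
import statistics

# Dummy values invented to test the output layout, not real measurements.
dummy_ages_months = [2, 3, 4, 5, 6]


def describe(values):
    """Return central tendency and dispersion measures for one variable."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),
        "max": max(values),
    }


summary = describe(dummy_ages_months)
for label, value in summary.items():
    print(f"{label:>6}: {value}")
```

If the dummy table cannot be produced from the planned variables, the instrument is missing an item or a code.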
Data: Collection
• Data collection refers to primary data collection
Face-to-face survey: using paper or electronic devices; self or interviewer administered
Mail/Web/phone survey: interviewer and respondents never meet. Popular web-based survey tools:
i) Qualtrics (https://www.qualtrics.com/homepage/)
ii) Survey Monkey (https://www.surveymonkey.com/)
• Research with secondary data is common in our profession
• Extraction of required variables is crucial
• Data cleaning is inevitable
BREAK: 5 minutes (if you are not interested in Math) When is 2 x 2=5?
Data: Analysis (sample guideline)
Research question tries to.. | Dependent variable (DV) | Independent variable (IV) | Required assumption about data | Statistical test possible
Find out correlation/association between variables. DV, IV identification not required | Continuous | Continuous | Normally distributed. 2 variables only. DV, IV identification not required | Pearson’s product moment correlation/simple linear regression
(same) | Continuous | Continuous | Not normally distributed/fewer observations | Spearman’s Rank Order Correlation
(same) | Categorical | Categorical | Count data (not %). More than 2 categories allowed for each variable | Chi-square or Fisher’s exact
Answer differences of continuous outcome between 2 categories of independent var. | Continuous | Binary | Normally distributed. Sample size 30 or more | Dependent or independent t-test
(same) | Continuous | Binary | Not normally distributed | Wilcoxon signed rank or Wilcoxon rank-sum
(same) | Continuous | More than 2 categories | Sample size for each group is equal | ANOVA
Find associated factors; e.g. exposures | Continuous | Any kind | Normally distributed, uncorrelated residuals | Linear regression
(same) | Binary | Any kind | No specific assumption | Logistic regression
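The guideline above can be read as a decision rule. The sketch below encodes it as a lookup function. The function name `choose_test` and the type labels ("continuous", "binary", "categorical", "multi", "any") are assumptions made for this example, and the rule is only as complete as the sample guideline itself.

```python
def choose_test(dv, iv=None, normal=True):
    """Suggest a test from the sample guideline.

    dv, iv: "continuous", "binary", "categorical", "multi" (more than
    2 categories) or "any". iv=None means DV and IV are not distinguished
    (the correlation/association rows).
    """
    if iv is None:  # correlation/association between variables
        if dv == "categorical":
            return "Chi-square or Fisher's exact"
        return "Pearson correlation" if normal else "Spearman rank correlation"
    if dv == "continuous" and iv == "binary":
        return "t-test" if normal else "Wilcoxon signed rank or rank-sum"
    if dv == "continuous" and iv == "multi":
        return "ANOVA"
    if dv == "continuous":
        return "Linear regression"
    if dv == "binary":
        return "Logistic regression"
    raise ValueError("combination not covered by the guideline")


# Thesis-style question: continuous outcome, several kinds of predictors.
suggested = choose_test("continuous", "any")
```

The first argument drives most branches, which mirrors the earlier point that the nature of the dependent variable dictates the test.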
How are Epidemiology and Biostatistics connected? (1/3)
Sample size calculation
• Depends on the epidemiological design, or
• On the statistical tests or parameters to be estimated
• You need to understand how the formula works in order to explain your sample size
• Use available software whenever possible
Some popular software used in Epidemiology:
WinPepi (use for observational studies): http://www.brixtonhealth.com/pepi4windows.html
G*Power (uses the statistical test and parameters; use for easy power calculation): http://www.gpower.hhu.de/en.html
Optimal Design software (use for randomized controlled trials): http://hlmsoft.net/od/
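To see how such a formula works, here is a minimal sketch of two textbook normal-approximation formulas: sample size per group for comparing two means, and sample size for estimating a prevalence. These are standard formulas chosen for illustration. They are not necessarily what the software above implements, and the function names and example inputs are assumptions.

```python
import math
from statistics import NormalDist


def n_per_group_two_means(sd, difference, alpha=0.05, power=0.80):
    """n per group = 2 * (z_(1-alpha/2) + z_power)^2 * sd^2 / difference^2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    n = 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / difference ** 2
    return math.ceil(n)  # round up to whole participants


def n_for_prevalence(expected_p, margin, alpha=0.05):
    """n = z_(1-alpha/2)^2 * p * (1 - p) / margin^2, for a simple random sample."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    n = z_alpha ** 2 * expected_p * (1 - expected_p) / margin ** 2
    return math.ceil(n)


# Illustrative inputs only: detect a difference of half a standard deviation,
# and estimate a prevalence near 50% to within 5 percentage points.
n_means = n_per_group_two_means(sd=1.0, difference=0.5)
n_prev = n_for_prevalence(expected_p=0.5, margin=0.05)
```

Halving the detectable difference quadruples the required sample, which is the kind of relationship worth being able to explain when the software reports a number.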
How are Epidemiology and Biostatistics connected? (2/3)
Epi: Comparison, or the counterfactual, is fundamental to epidemiological designs; e.g. case-control, cohort, RCT and also cross sectional
Example: A cross sectional study shows breast milk arsenic concentration among those taking water contaminated with As above 50 ppb compared to those taking water below 50 ppb
Bio: Depending on the nature of the variable, the t test, logistic regression or ANOVA helps to analyze the comparison
Example: A t test shows that mothers drinking water contaminated with arsenic at 50 ppb and above have a higher mean breast milk As concentration than mothers who drink water contaminated with less than 50 ppb
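The two-group comparison can be sketched as follows. This example uses Welch's version of the independent t test (which does not assume equal variances) and returns only the t statistic and degrees of freedom; a p-value would need a t distribution table or statistical software. The breast milk values are invented for illustration and are not thesis data.

```python
import math
import statistics


def welch_t(group_a, group_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    # Squared standard error of each group mean: sample variance / n.
    se2_a = statistics.variance(group_a) / len(group_a)
    se2_b = statistics.variance(group_b) / len(group_b)
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(group_a) - 1) + se2_b ** 2 / (len(group_b) - 1)
    )
    return t, df


# Hypothetical breast milk As concentrations, invented for this sketch.
water_50ppb_and_above = [2.1, 3.4, 2.8, 4.0, 3.1]
water_below_50ppb = [1.2, 1.9, 1.5, 2.2, 1.4]
t_stat, dof = welch_t(water_50ppb_and_above, water_below_50ppb)
```

A positive t statistic here means the first group (50 ppb and above) has the higher mean, matching the direction of the comparison in the example.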
Example of logistic regression output with comparison from a case control study
Epidemiology: The design is case control. Cases: Venous Thromboembolism (VTE). Controls: from the community
Biostatistics: Logistic regression with a binary outcome; i.e. VTE vs community controls. The rest are independent variables
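Logistic regression output is usually read as odds ratios. As a hedged sketch, the function below converts a coefficient and its standard error into an odds ratio with a Wald confidence interval. The function name and the example numbers are assumptions and do not come from the VTE study output.

```python
import math
from statistics import NormalDist


def odds_ratio_with_ci(coefficient, std_error, level=0.95):
    """Convert a logistic regression coefficient to (OR, lower, upper)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    return (
        math.exp(coefficient),
        math.exp(coefficient - z * std_error),
        math.exp(coefficient + z * std_error),
    )


# Hypothetical coefficient and standard error, for illustration only.
or_value, ci_lower, ci_upper = odds_ratio_with_ci(math.log(2), 0.2)
```

An interval that excludes 1 corresponds to a coefficient that differs significantly from 0 at the chosen level.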
Example of linear regression output with comparison from a cross sectional study
Epidemiology: The design is cross sectional
Biostatistics: Multiple linear regression with a continuous outcome; i.e. As concentration in breast milk. Only one independent variable remained significant
How are Epidemiology and Biostatistics connected? (3/3)
• Contingency tables are very common in all epidemiological designs: use Chi-square (or unadjusted odds ratio, rate ratio, etc. depending on the study design) to show association between exposure and outcome
          | Cases | Control
Exposed   |       |
Unexposed |       |
Note that data for contingency tables are usually count data/nominal/binary variables
• Confounding is controlled at
1) Design stage: experimental design (random assignment), matched case-control studies
2) Statistical analysis: use stratified analysis, partial correlation, multiple linear regression or logistic regression etc.
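The contingency table analysis can be sketched with a few lines of arithmetic. The cell labels are an assumption for this example: a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls. The chi-square is Pearson's statistic without continuity correction, and the Mantel-Haenszel odds ratio is shown as one common form of stratified analysis. All counts are invented.

```python
def odds_ratio(a, b, c, d):
    """Unadjusted odds ratio for a 2x2 table: (a * d) / (b * c)."""
    return (a * d) / (b * c)


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 degree of freedom, no continuity correction)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def mantel_haenszel_or(strata):
    """Odds ratio pooled across strata of a confounder.

    strata: iterable of (a, b, c, d) tuples, one 2x2 table per stratum.
    """
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Hypothetical counts, invented for this sketch.
crude_or = odds_ratio(20, 10, 10, 20)
chi2 = chi_square_2x2(20, 10, 10, 20)
adjusted_or = mantel_haenszel_or([(20, 10, 10, 20), (20, 10, 10, 20)])
```

When the pooled odds ratio differs noticeably from the crude one, the stratifying variable is a candidate confounder.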
What can you expect in your job in research: ITMD
Research assistant, coordinator, manager
EXPECTED TO KNOW ABOUT EVERY COMPONENT OF RESEARCH AND HOW THEY CONNECT, from inception through execution to finalization
Basically, everything the ITMD bridging program gives you is required, bit by bit.
• Starting from communication and leadership skills
Collaborators, research partners, team members, donor, Ethical Review Board, journal editor, management
• Research Methods, Epidemiology, Biostatistics
Proposal development, ethics application, REB interview, management
• Data management and analysis
Data collection, use of Excel, SPSS (or other software)
• Report writing
Usually a collaborative approach among team members, but you may have to lead
Thank you so much! Questions?