Evaluation Design
Learning Objectives
By the end of the session, participants will be able to:
1. Name the criteria for inferring causality
2. Understand internal & external validity
3. Know the different types of designs for an evaluation
4. Identify the strengths and limitations of the different types of study designs
5. Develop an evaluation framework
6. Select a study design that fits the purpose of a given evaluation
What are some different types of evaluation?
• Formative evaluation
• Process evaluation
• Impact evaluation
Introduction
• To show impact, researchers often want to make inferences about cause and effect
• We may, for instance, want to:
– Identify the factors that explain why the use of ITNs is more prevalent among one group compared to another
– Know why a particular intervention works
Logic of Causal Inference
• Under what conditions may we infer that a change in the dependent variable was really caused by the independent variable, and not by something else?
• What are some of the most plausible rival explanations, and how do we rule them out?
Overview
• In this section, we will examine appropriate and inappropriate criteria for inferring causality
• We will identify various evaluation designs and the ways they attempt to show causality:
– Experimental
– Quasi-experimental
– Non-experimental
Criteria for Inferring Causality: Temporal Relationship – The cause must precede the effect
Criteria for Inferring Causality: Plausibility – An association is plausible, and thus more likely to be causal, if consistent with other knowledge
Criteria for Inferring Causality: Strength of the Association – A strong association between possible cause and effect, as measured by the size of the relative risk, is more likely to be causal than a weak association
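As an illustration (not from the slides), the relative risk mentioned above can be computed from a simple 2×2 table of exposure and outcome. All counts below are hypothetical:

```python
# Hypothetical illustration: relative risk (risk ratio) from a 2x2 table.
# The counts are invented for the example, not taken from any study.

def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Incidence among the exposed divided by incidence among the unexposed."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed

# A strong association (RR well above 1) supports, but does not prove, causality.
rr = relative_risk(exposed_cases=90, exposed_total=1000,
                   unexposed_cases=30, unexposed_total=1000)
print(round(rr, 1))  # 3.0
```

Here an RR of about 3 would be considered a fairly strong association; an RR close to 1 would be weak and more easily explained by bias or confounding.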
Criteria for Inferring Causality: Consistency
• Several studies giving the same result
– Clearest when a variety of study designs are used in different settings, since the likelihood that all studies are making the same mistake is minimized
• Lack of consistency does not exclude a causal association
– Different exposure levels and other conditions may reduce the impact of the causal factor in certain studies
Criteria for Inferring Causality: Dose-response Relationship • Occurs when changes in the level of a possible cause are associated with changes in the prevalence or incidence of the effect – The prevalence of hearing loss increases with noise level and exposure time
Criteria for Inferring Causality: Reversibility
• When the removal of a possible cause results in a reduced disease risk, the likelihood of the association being causal is strengthened
– Cessation of cigarette smoking is associated with a reduction in the risk of lung cancer relative to that in people who continue to smoke
• However, when the cause leads to rapid irreversible changes whether or not there is continued exposure (as with HIV infection), reversibility cannot be a condition for causality
Internal & External Validity
Internal & External Validity • When considering cause and effect, two forms of validity become relevant: – Internal validity – External Validity
Internal Validity
• Internal validity: the confidence that the results of a study accurately depict whether one variable is or is not a cause of another
• A study is internally valid if it meets three criteria:
– Cause precedes effect
– Empirical correlation between cause and effect
– No confounding variable
Threats to Internal Validity
• History
• Maturation or the passage of time
• Testing & instrumentation
• Selection bias
• Loss to follow-up
• Diffusion or imitation of treatments
External Validity • The extent to which the causal relationship depicted in a study can be generalized beyond the study conditions
Evaluating the Effect of an Intervention
Specific Evaluation Questions
• Service provision: Are the services available? Are they accessible? Is quality adequate?
• Service utilization: Are the services being used?
• Service coverage: Is the target population being reached?
• Service effectiveness: Is there improvement in disease outcome or health-related behavior?
• Impact: Were the improvements due to the program?
Measuring the Effect of an Intervention
[Diagram: outcome plotted over time, with one curve showing the trend with the intervention and one without; the gap between the curves after the start of the intervention (until the intervention ends) represents the intervention's impact]
Deciding on Evaluation Design
1. What is your research question?
2. What is your target population?
3. What do you know about this population?
4. How do you intend to use the results?
5. What do you want to measure (indicators)?
6. What type of inference do you want to draw?
7. When do you need the results?
8. Do you have a sampling frame? What shape is it in?
9. How much are you willing to pay?
10. Where in the program life cycle are you now?
Types of Evaluation Design
• Experimental: strongest for demonstrating causality, most expensive
• Quasi-experimental: weaker for demonstrating causality, less expensive
• Non-experimental: weakest for demonstrating causality, least expensive
Experimental Design
The Basic Experimental Principle • The intervention is the only difference between two groups • This is achieved by random assignment
Pretest-posttest Experimental Design • Experimental designs attempt to provide maximum control for threats to internal validity • They do so by giving the researchers greater ability to manipulate and isolate the independent variable – (not always possible in practice)
Pretest-posttest Experimental Design
Essential components:
1. Identify the study population and determine the appropriate sample size for experimental and control groups
2. Randomly assign individuals to experimental and control groups
3. Pre-test everyone with a standardized instrument
4. Introduce the independent variable (intervention) to the experimental group while withholding it from the control group
5. Post-test both groups with the same instrument and under the same conditions as the pre-test
6. Compare the amount of change in the dependent variable for both experimental and control groups
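The random-assignment step can be sketched in a few lines; the participant IDs, group size, and the `randomize` helper below are hypothetical, invented for illustration:

```python
# Minimal sketch of random assignment for a pretest-posttest experiment.
# Participant IDs and the fixed seed are invented for this example.
import random

def randomize(participants, seed=None):
    """Randomly split participants into equal-sized intervention and control groups."""
    rng = random.Random(seed)      # seeded RNG so the split is reproducible
    shuffled = participants[:]     # copy so the original list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

participants = [f"P{i:03d}" for i in range(1, 101)]  # 100 hypothetical participants
intervention, control = randomize(participants, seed=42)
print(len(intervention), len(control))  # 50 50
```

Because assignment is random, any pre-existing differences between the two groups are due to chance alone, which is what licenses the causal comparison at post-test.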
Pretest-posttest Control Group Design
Randomization, then:
– Intervention group: Pre-test → Program → Post-test
– Control group: Pre-test → Post-test
(Time runs left to right)
Factors that May Distort Conclusions
• Dropout
• Instrumentation effects
• Testing effects
• Contamination
If you think that taking the pre-test might influence the treatment effects, or that it might bias the post-test responses, you might want to opt for the posttest-only control group design.
Posttest-only Experimental Design
Posttest-only Experimental Design
• Random assignment ensures initial group equivalence
• Differences between the experimental and control groups at post-test are assumed to reflect the causal impact of the independent variable
Posttest-only Experimental Design
Randomization, then:
– Intervention group: Program → Post-test
– Control group: Post-test
(Time runs left to right)
Posttest-only: What to Consider?
Advantages:
• Cheaper
• Useful when a pre-test can interfere with program effects
• Randomization ensures equivalent experimental and control groups
Disadvantages:
• Cannot assess whether the program is reaching the people for whom it was intended
• Cannot check the comparability of groups
• Cannot know how much change actually occurred
However, a pre-test post-test design is always preferred when feasible.
Group Discussion In which situations might experimental design not be possible?
Possible Responses
• Randomization needed before program starts
• Ethics
o Known efficacy of intervention
o Solution: Use alternative program rather than no program
• Political factors
• Scale-up
o Solution: Start out on small scale and use delayed program strategy
Quasi-Experimental Design
Quasi-experimental designs • Can be used when random assignment is not possible • Less internal validity than “true” experiments • Still provide a moderate amount of support for causal inferences
Principles of the Quasi-experimental Design
(Pre- and post-test with comparison group, but not randomized)
– Intervention group: Pre-test → Program → Post-test
– Comparison group: Pre-test → Post-test
(Time runs left to right)
Keep in mind selection effects: these occur when the people selected for the comparison group differ from the experimental group.
Quasi-experimental Statistical Methods
o Difference-in-difference analysis: compares changes before and after the program for individuals in the program and comparison groups
o Regression analysis: attempts to address the problem of confounding by controlling for differences at baseline
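The difference-in-difference calculation can be sketched in a few lines; the `did` helper and all figures below are hypothetical:

```python
# Minimal difference-in-difference sketch; all numbers are hypothetical.
# DiD = (post - pre) in the program group minus (post - pre) in the comparison group,
# which nets out the change the program group would have seen anyway.

def did(program_pre, program_post, control_pre, control_post):
    return (program_post - program_pre) - (control_post - control_pre)

# E.g., ITN use (%) before and after a hypothetical distribution program:
effect = did(program_pre=40.0, program_post=70.0,
             control_pre=42.0, control_post=52.0)
print(effect)  # 20.0
```

In this made-up example, ITN use rose 30 points in the program area but 10 points in the comparison area, so only 20 points of the change are attributed to the program.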
Summary of Quasi-Experimental Design
Advantages:
• Provides greater assurance than non-experimental designs that outcomes are actually the result of the program
• Allows you to assess more accurately how much of an effect the program has
Disadvantages:
• Can demand more time and resources
• Requires access to at least two similar groups
Non-Experimental Design
Pre-test Post-test Non-experimental Design
(No comparison group)
– Intervention group: Pre-test → Program → Post-test
(Time runs left to right)
Merits of Pre-test Post-test Non-experimental Design
Advantages:
• Relatively simple to implement
• Controls for participants' prior knowledge, attitudes, skills, and intentions
Disadvantages:
• Cannot account for non-program influences on outcomes
• Causal attribution not possible
• Cannot detect small but important changes
• If self-reporting is used rather than objective measures, post-test scores may be lower than pre-test scores
Time Series Design
(No comparison group)
– Intervention group: Pre-test 1 → Pre-test 2 → Program → Post-test 1 → Post-test 2
(Time runs left to right)
Merits of Time Series Design
Advantages:
• Enables detection of whether program effects are long-term or short-term
• A series of tests before the intervention can eliminate the need for a control group
• A series of tests before the program can be used to project the results that would be expected without it
• Can be used if you have only one site in which to conduct your evaluation
Disadvantages:
• Problem of confounding
• Changes in instruments during the series of measurements
• Loss or change of cases
• Changes in group composition
Strengthening Non-experimental Designs
• Since there is no control group, confounding can be a problem
• By constructing a plausibility argument and controlling for contextual and confounding factors, non-experimental designs can be strengthened
How to Construct a Plausibility Argument
• Describe trends in:
– Intervention coverage
– Intermediary outcomes
– Impact outcomes
– Contextual factors
• Link these trends:
– Temporal, spatial, age-pattern, and "dose-response" associations
Plausibility Argument for Impact of Malaria Control
[Diagram: increase in effective intervention coverage → decreased morbidity → decreased malaria-associated mortality → all-cause under-five mortality (5q0)]
Indicators:
• Coverage: ITN ownership, ITN use, IPTp, treatment
• Morbidity: parasite prevalence, anaemia (<8 g/dL), fever
Contextual factors:
• Climatic factors: rainfall, temperature
• Health interventions and health care utilization: ANC, EPI, Vitamin A, PMTCT
• Socioeconomic factors: education, fertility risk, housing conditions, nutrition
Merits of Plausibility Argument
Advantages:
• Allows evaluation of national-scale, complex interventions
• Can accommodate the use of data from different sources
Disadvantages:
• Needs several data points for interventions and outcomes
• Challenges in comparing data from different sources
• Inconsistent data collection methods over time
Summary of Different Study Designs
• True experimental: partial coverage / new programs; control group; strongest design; most expensive; most robust
• Quasi-experimental: partial coverage / new programs; comparison group; weaker than experimental design; less expensive; less robust
• Non-experimental: full coverage programs; no comparison group; weakest design; least expensive; least robust
Summary of Different Study Designs • Different designs vary in their capacity to produce information that allows for the linking of program outcomes to program activities • The more confident you want to be about making these connections, the more rigorous the design and costly the evaluation
Impact Evaluation Case Study 1: Reducing Malaria Transmission through IRS
Impact Evaluation Case Study 2: Antimalarial drug policy change
Case Study 3: National-Level Scale-up of malaria interventions
References
Craig P, Dieppe P, et al. (2008). Developing and evaluating complex interventions: new guidance. Medical Research Council. www.mrc.ac.uk/complexinterventionsguidance
de Savigny D, Adam T, eds. (2009). Systems Thinking for Health Systems Strengthening. Geneva: Alliance for Health Policy and Systems Research and WHO.
Galiani S, Gertler PJ, Schargrodsky E (2005). Water for life: the impact of the privatization of water services on child mortality. Journal of Political Economy, 113: 83-120.
Gertler PJ, Martinez S, Premand P, Rawlings LB, Vermeersch CMJ (2010). Impact Evaluation in Practice. Washington, DC: The World Bank.
Habicht JP, Victora CG, et al. (1999). Evaluation designs for adequacy, plausibility and probability of public health programme performance and impact. International Journal of Epidemiology, 28: 10-18.
Rossi P, et al. (1999). Evaluation: A Systematic Approach. Thousand Oaks: Sage Publications.
Rowe AK, Onikpo F, Lama M, Osterholt DM, Deming MS (2011). Impact of a malaria-control project in Benin that included the Integrated Management of Childhood Illness strategy. American Journal of Public Health, published online ahead of print May 12, 2011. doi:10.2105/AJPH.2010.300068
Group Work
• Please get into your project groups and select an evaluation design for your program (experimental, quasi-experimental, or non-experimental)
• Explain why you chose this design
• Discuss any strengths and weaknesses of this design as they relate to your program
General Questions to Be Answered by Each Group • What study design will you use? • What are the strengths and limitations of your evaluation design? • How would you know if your program is effective? • How will you address contextual or confounding factors?
MEASURE Evaluation is a MEASURE program project funded by the U.S. Agency for International Development (USAID) through Cooperative Agreement GHA-A-00-08-00003-00 and is implemented by the Carolina Population Center at the University of North Carolina at Chapel Hill, in partnership with Futures Group International, John Snow, Inc., ICF Macro, Management Sciences for Health, and Tulane University. Visit us online at http://www.cpc.unc.edu/measure