Big Data vs Traditional Research Approaches Module 2

Disclosure
The following information has been developed with assistance and input from Kenneth J. Wilkins, Ph.D., Office of the Director, National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK), National Institutes of Health (NIH).
This information does not constitute an endorsement by NIDDK or NIH.

Module Objectives
I. Identify types of big data being collected
II. Contrast big data research with traditional research
III. Contrast hypothesis-driven vs. hypothesis-generating research
IV. Explain the differences between causation and correlation/association

Traditional vs. Big Data Research
There are two main ways that researchers use big data in health research:
1) Ask specific questions based on:
• Experience: doctors may notice that certain types of patients react poorly to a particular drug, or that those who are taking certain dietary supplements seem to be coming in with specific problems
• Other data studies: secondary findings or unfulfilled outcomes from a first study that can inform future studies
2) Look for patterns within big data
Both types of research are necessary to advance medicine and patient care.

Traditional vs. Big Data Research (annotations)
• Recall from Module 1 that “big data” is a term that may be defined differently by different stakeholders and researchers
• The phrase “Traditional vs. Big Data Research” is used mostly to help you appreciate recent innovations using big data methodologies
• For example, consider patients with a rare disease: the size of the data may be small, but the complexity may be large
• “Big data research” is able to get measurements across multiple domains that are very intensive and longitudinal

Traditional Research
Typical steps in conducting traditional research:
1) Develop a hypothesis
2) Test the hypothesis
a. Design an experiment, e.g., a randomized controlled trial with an experimental/treatment group and a control group, OR
b. Compile/collect observational data: collect variables that bear on that hypothesis
• Traditional research is also known as being primarily hypothesis-driven research
• Hypothesis-driven research can answer causation as well as association questions, and start to answer why things work in a certain way
• Hypothesis-driven research may not be the easiest type of research to carry out in big data studies

Hypothesis-Driven Research: A Case Example
NIH Lumbar Imaging with Reporting of Epidemiology (LIRE) Project
• Researchers hypothesized that inserting information into lumbar spine imaging reports (e.g., MRIs, CT scans) about the prevalence of potentially misleading findings that also appear in patients without back pain would lead to fewer primary care providers ordering unnecessary spine-related interventions for patients.
• Researchers set up an RCT in which some clinics received standard lumbar spine imaging reports and some received imaging reports with prevalence data.
• Researchers found that physicians who received the imaging reports with prevalence data (the novel process of care) ordered fewer unneeded interventions for their patients than physicians who received standard reports.
• Check out the project webpage, https://www.nihcollaboratory.org/demonstrationprojects/Pages/LIRE.aspx, for more details, available through news releases and interviews/videos as well as scientific publications.

Big Data Research
• Big data research does not follow a set method, nor does it necessarily follow a pre-specified data collection/compilation plan
• The typical approach is to look for patterns in the data at hand and to develop a hypothesis based on these patterns
• This is known as hypothesis-generating research
  • Often done as a consequence of traditional research
  • Key to planning future hypotheses, and thus data, to pursue
• Big data research can also expand upon traditional research approaches by leveraging already existing big data to test hypotheses derived from observation, patterns in the data, or other results from prior studies
  • This tends to be at least as much hypothesis-generating as it is hypothesis-driven research

Big Data Research Use
The use of big data research can be beneficial in multiple ways. Hypothesis-generating research can:
• Find patterns (e.g., interactions between different conditions and treatments) that individual experiments might not find
• Draw on larger data sets and more types of data
• Overcome some types of sampling bias
It can also:
• Identify differences in outcomes for particular demographics, e.g., men vs. women or different ethnic groups
• Identify trends in disease outcomes and treatments
• Identify outcomes several years after a treatment

RCTs vs. Observational Studies
Randomized Controlled Trials (RCTs)
• Best chance to determine “cause and effect”
• A “causal study” means that the treatment differs, but everything else is the same
• Patients are randomly assigned to control or treatment groups, so everything but the treatment is the same on “average”
• Patients/assessors are “blind” to which treatment the patient received, so they are less likely to subjectively affect adherence/outcomes
Observational Studies
• Harder to draw conclusions about cause and effect
• Look for trends after an event or treatment
• Typically done when it’s not yet feasible or ethical to do RCTs; need strong yet unverifiable assumptions
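
Why does random assignment make the groups comparable on “average”? The following is a minimal Python sketch, assuming made-up patient ages rather than data from any real trial. The function name `simulate_assignment`, the age distribution, and the number of patients are all illustrative assumptions.

```python
import random
import statistics


def simulate_assignment(n_patients=10_000, seed=0):
    """Randomly split simulated patients into two groups and compare mean age.

    All values are invented for illustration; nothing here comes from a study.
    """
    rng = random.Random(seed)
    # Hypothetical patient ages (assumed roughly bell-shaped around 50 years).
    ages = [rng.gauss(50, 12) for _ in range(n_patients)]

    treatment, control = [], []
    for age in ages:
        # A coin flip decides the group, so age cannot influence assignment.
        if rng.random() < 0.5:
            treatment.append(age)
        else:
            control.append(age)
    return statistics.mean(treatment), statistics.mean(control)


mean_treated, mean_control = simulate_assignment()
print(f"Mean age, treatment group: {mean_treated:.1f}")
print(f"Mean age, control group:   {mean_control:.1f}")
```

Because a coin flip, not the patient or the doctor, decides who gets the treatment, the two groups end up with very similar average ages. The same reasoning applies to characteristics nobody measured, which is what observational studies cannot guarantee.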

RCTs vs. Observational Studies: An Example
Consider this example of how big data can be used for identifying correlations or associations that can then be studied:
• Study: Determine the effect of low amounts of alcohol during early pregnancy
• It would be unethical to give half of the women in the study wine instead of grape juice and not tell them
• There is a reasonable chance of detrimental effects, a risk not worth taking for the knowledge gained
• This would also be a causal study: cause and effect determined by the “treatment” (wine) vs. “control” (grape juice)

Example cont.
• However, you can ask women who have complications with pregnancy if they drank during pregnancy
• This would be an association study: complications during pregnancy could be associated with women drinking alcohol during pregnancy
• What might be one problem with getting women to report exactly how much they drank during their pregnancy?
  • Recall bias: people may remember incorrectly or may not want to say that they drank during pregnancy.
  • People who drink while pregnant may also be more likely to smoke or not go to the doctor, let alone be willing to enroll in the study and respond to all of the questions.
  • Controlling for these factors requires a very large, and potentially difficult to obtain, data set.
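
The point about “controlling for these factors” can be illustrated with a small simulation. This is a toy model only: the probabilities are invented, the generic labels “exposure”, “confounder”, and “outcome” are placeholders, and the model says nothing about the real effects of any behavior during pregnancy. In this made-up world the outcome depends only on the confounder, yet the exposure still looks associated with it until the two confounder groups are examined separately.

```python
import random


def simulate_tallies(n_people=200_000, seed=1):
    """Count outcomes in a toy population (all probabilities are invented)."""
    rng = random.Random(seed)
    # tallies[(confounder, exposed)] = [number of people, number with outcome]
    tallies = {(c, e): [0, 0] for c in (False, True) for e in (False, True)}
    for _ in range(n_people):
        confounder = rng.random() < 0.30
        # The exposure is more common when the confounder is present.
        exposed = rng.random() < (0.60 if confounder else 0.20)
        # The outcome depends ONLY on the confounder in this toy model.
        outcome = rng.random() < (0.30 if confounder else 0.10)
        tallies[(confounder, exposed)][0] += 1
        tallies[(confounder, exposed)][1] += int(outcome)
    return tallies


def rate(tallies, exposed, confounder=None):
    """Outcome rate for an exposure group, optionally within one confounder group."""
    keys = [k for k in tallies
            if k[1] == exposed and (confounder is None or k[0] == confounder)]
    people = sum(tallies[k][0] for k in keys)
    outcomes = sum(tallies[k][1] for k in keys)
    return outcomes / people


tallies = simulate_tallies()
crude_gap = rate(tallies, True) - rate(tallies, False)
gap_with = rate(tallies, True, True) - rate(tallies, False, True)
gap_without = rate(tallies, True, False) - rate(tallies, False, False)

print(f"Gap ignoring the confounder:        {crude_gap:+.3f}")
print(f"Gap among people with confounder:   {gap_with:+.3f}")
print(f"Gap among people without it:        {gap_without:+.3f}")
```

Splitting the data into groups shrinks the number of people in each comparison, which is one reason why controlling for several factors at once calls for a very large data set.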

Causation vs. Correlation/Association
• Causation is determined when one variable (a treatment, condition, etc.) is shown to directly influence a second variable
• Randomized controlled trials (RCTs) and hypothesis-driven research are well-suited to determine causal relationships
Example:
• You could determine whether depriving people of sleep causes decreased memory and slower decision-making compared with allowing people their typical amount of sleep each night.
• To test this, you would assign some people to the treatment (sleep deprivation over multiple days) and some to the control (no sleep deprivation).
• Measure indicators of change in memory capacity and speed of decision-making relative to ‘baseline’ (same variables measured following a series of nights without sleep deprivation).

Causation vs. Correlation/Association
• Correlation is determined when two variables are consistently related in a predictable way
• If two variables are correlated with each other, there may be some relationship between the two
• Association is the broader term: it captures how two variables’ values change together, so that patterns in one variable’s values may predict (or be predicted by) patterns in the other’s
• Association does not necessitate a “linear” relationship
Examples of correlation (or more generally, association):
• There may be more cases of skin cancer among people with fair skin than among people without fair skin; however, we cannot say that having fair skin causes skin cancer.
• There may be lower rates of breast cancer among patients who receive estrogen replacement therapy; however, we cannot conclude that estrogen replacement therapy prevents breast cancer.
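
To make the point about “linear” relationships concrete, here is a short Python sketch with invented numbers. It computes the common Pearson correlation coefficient (a measure of straight-line relationship) for a straight-line pattern and for a U-shaped pattern. The helper name `pearson_r` and the data values are assumptions chosen for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient, a measure of straight-line association."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


xs = [-3, -2, -1, 0, 1, 2, 3]
straight = [2 * x + 1 for x in xs]   # rises steadily with x
curved = [x ** 2 for x in xs]        # U-shaped: fully determined by x, but not a line

r_straight = pearson_r(xs, straight)
r_curved = pearson_r(xs, curved)

print(f"Straight-line pattern: r = {r_straight:.2f}")
print(f"U-shaped pattern:      r = {r_curved:.2f}")
```

The U-shaped values are completely predictable from `xs`, so the two are strongly associated, yet the correlation coefficient is zero because the pattern is not a straight line. This is why association is the broader term.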

An Example: Causation vs. Correlation
Sources: National Vital Statistics Reports and U.S. Department of Agriculture, http://www.tylervigen.com/view_correlation?id=1703
• This graph shows that just because two things are associated, or even correlated, does not mean that one causes the other.
• That is, the reduction in divorces in Maine is not caused by, or related to, declining margarine consumption.
• Note: This graph was created by taking data sets and running an algorithm specifically to fit the curves together and create a correlation.
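
The same effect is easy to reproduce. The sketch below uses two invented series, not the data behind the graph, that have nothing to do with each other except that both decline over ten time points. The names `series_a` and `series_b` and every number in them are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


years = range(10)
# Two unrelated, invented quantities that both happen to drift downward.
series_a = [10 - 0.5 * t + 0.2 * (-1) ** t for t in years]
series_b = [200 - 7 * t + 3 * math.sin(t) for t in years]

r = pearson_r(series_a, series_b)
print(f"Correlation between the two unrelated series: r = {r:.2f}")
```

The correlation is very high only because both series share a downward trend over time. When many data sets are searched for matching curves, pairs like this will turn up by chance, so a strong correlation found this way is a starting point for questions, not evidence of cause and effect.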

Self-Led Activity: Review & Reflect
Review the key concepts covered during this module and reflect on your role as a patient advocate.
• Has your work as a patient advocate benefitted from your improved sense of the strengths and limitations of how researchers use big data?
  • If not, how could it benefit?
  • What skills or further information would you need?
• What role(s) could you take to help your constituents add value to big data research efforts?
  • Could your community add data by advocating for access?
  • Could your community help researchers prioritize topics?

Upon Completing this Module…
You should be able to:
1) Define traditional research vs. big data research
2) Describe how traditional and big data research projects are assembled and conducted
3) Give examples of traditional and big data research
4) Determine whether data associations are “causal” or “correlative”
5) Reflect on how concepts in this module impact your role as a patient advocate