
The Reproducibility Crisis: A Call to Action for Bio-Link. Jeanette Mowery, Ph.D., and Lisa Seidman, Ph.D.

AGENDA • Introduction: reproducibility in science, a contemporary hot topic • The causes of irreproducibility that everyone talks about • The causes of irreproducibility that we talk about – but they don’t • Bio-Linkers’ contributions to the discussion • A call to action – what comes next?

WHAT ARE THESE HEADLINES ABOUT? • Pharmaceutical companies scour the scientific literature for leads • When they found a promising study, Amgen scientists would try to replicate it, but seldom could • In 2012, C. Glenn Begley decided to study this formally • Selected 53 papers that could have led to groundbreaking drugs and tried to replicate them in house • Scientific findings confirmed in only 6 cases • So they asked the original scientists to help, occasionally in their own labs, but using a blinded methodology

RESULT • Even the original authors could not replicate most of the work • Bayer scientists had published the results of a similar study in 2011, in which they replicated 25% of studies

WHY IS THIS A BIG DEAL? • Consequences for patient treatments/prevention • Patient advocacy groups get discouraged and angry • Huge financial implications • Affects the public’s view of science • Political ramifications • Affects careers • Strikes at the heart of what science is all about

LED TO THE REPRODUCIBILITY PROJECT • 2013 Collaboration • Center for Open Science and Science Exchange • Published protocols before beginning in the lab • Total of 29 studies will be done • First results from 5 studies announced January 19, 2017 • Replicated two out of five studies • One study not replicated • Two studies were inconclusive • Some caveats • Registered Report/Replication study format

CAUSES OF IRREPRODUCIBILITY THAT EVERYONE IS TALKING ABOUT • Problems with using animal models • Mice are not small people with four legs • Problems with cell lines • Problems with antibodies • Poor use of statistics • Poor experimental design • FLAWED CULTURE IN ACADEMIC SCIENCE

RODENT MODELS: CONSISTENCY ALMOST IMPOSSIBLE TO ACHIEVE • Mice are affected by dozens of things that are difficult to control • For example, height of cage in room affects mice • Presence of male handlers • Even a man’s sweaty t-shirt in room affects mice • Bedding • Food • Etc.

RODENT MODELS • From Harris’s book: “Imagine that I was testing a new drug to help control nausea in pregnancy and I suggested to the FDA that I tested it purely in 35-year-old white women all in one small town in Wisconsin with identical husbands, identical homes, identical diets which I formulate, identical thermostats that I’ve set, and identical IQs. And incidentally they all have the same grandfather.” That would be recognized as a terrible experiment, “but that’s exactly how we do mouse work. And fundamentally that’s why I think we have this enormous failure rate.” (Joseph Garner)

BLINDING and STUDY DESIGN • One study showed that only 17% of studies used blinded experimental design and random assignment of mice to groups
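A minimal sketch of what random assignment combined with blinding can look like in practice. The function name, the animal IDs, and the cage-code format are hypothetical and chosen only for illustration; a real study would follow its own approved protocol.

```python
import random


def assign_blinded(animal_ids, groups=("treatment", "control"), seed=None):
    """Randomly assign animals to groups and give each an opaque code.

    Returns two dicts:
      allocation: animal ID -> group (the key, held by someone who does
                  not score the outcomes)
      blinded:    animal ID -> code (all that the scorer sees)
    """
    rng = random.Random(seed)
    ids = list(animal_ids)
    rng.shuffle(ids)
    # Deal the shuffled animals round-robin so group sizes differ by at most one.
    allocation = {animal: groups[i % len(groups)] for i, animal in enumerate(ids)}
    # Shuffle the codes separately so a code reveals nothing about the group.
    codes = [f"C{n:03d}" for n in range(1, len(ids) + 1)]
    rng.shuffle(codes)
    blinded = dict(zip(ids, codes))
    return allocation, blinded


# Hypothetical animal IDs, for illustration only.
animals = [f"mouse-{n}" for n in range(1, 11)]
allocation, blinded = assign_blinded(animals, seed=2017)
for animal in animals:
    print(blinded[animal], "->", animal)  # the scorer's sheet omits the group
```

Recording the seed alongside the allocation key makes the assignment itself traceable later.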

CELL LINES • Most famous example of mixed-up cell lines is the HeLa story • But there are many other examples and thousands of studies that used cell lines that were not what the researchers thought they were • Almost impossible to clean up the literature • Fortunately, it is now possible to have cell lines authenticated, but many scientists are not doing this

ANTIBODIES • Often bind non-specifically; researchers may be unaware that antibody is not binding to what they think it is • To some extent, good experimental design can help with this problem

ISSUES with STATISTICS • “Batch effect,” where experimental samples and controls differ subtly in conditions; e.g., they are run on different days when instrument performance differs • Common idea that you need to repeat an experiment only three times before reporting it – not based on statistical analysis • P-hacking • Reanalyzing data until you get a p-value of 0.05 • Use of a p-value of 0.05 • If you repeat an experiment that reached this level of significance, there is a 50/50 chance that it won’t be significant the next time
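The 50/50 figure and the effect of repeated reanalysis can both be illustrated with a short calculation. This is a sketch under stated assumptions, not a general result: it assumes a two-sided z-test, a repeat with the same sample size, a true effect exactly equal to the one originally observed, and (for the p-hacking part) fully independent analyses of data with no real effect. The function names are illustrative.

```python
from statistics import NormalDist


def replication_power(z_true, alpha=0.05):
    """Chance that a repeat reaches two-sided p < alpha.

    z_true is the true standardized effect (assumed equal to the effect
    observed the first time).
    """
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    # Probability the repeat's z statistic lands in either rejection tail.
    return (1 - nd.cdf(z_crit - z_true)) + nd.cdf(-z_crit - z_true)


def chance_of_false_positive(n_analyses, alpha=0.05):
    """Chance that at least one of n independent analyses of null data
    gives p < alpha."""
    return 1 - (1 - alpha) ** n_analyses


# An original result that landed exactly at p = 0.05 (z of about 1.96).
z_at_p05 = NormalDist().inv_cdf(0.975)
print(round(replication_power(z_at_p05), 2))    # about 0.5
print(round(chance_of_false_positive(10), 2))   # about 0.4
```

Under these assumptions, a result that only just reached p = 0.05 is about as likely to miss significance on repetition as to reach it, and ten independent looks at null data give roughly a 40% chance of at least one "significant" finding.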

HARKing: Hypothesizing After Results are Known • Confusion between exploratory and confirmatory • Barn analogy • FDA now requires that scientists running clinical trials register their hypotheses before beginning the trials • Robert Kaplan and Veronica Irvin reviewed major studies of drugs and dietary supplements supported by the National Heart, Lung, and Blood Institute between 1970 and 2012. Before the law, 57% showed efficacy; afterwards, only 8% did

FLAWED CULTURE • Career reward system • Competition • Pressure to publish, particularly in high-impact journals • Incentive to achieve new and exciting results • No incentive to publish negative results • Journals • Few opportunities to publish negative result or retraction • Retractions for honest mistakes often viewed as evidence of fraud • True fraud is often not corrected

RECOMMENDATIONS FROM BEGLEY AND ELLIS • More opportunities to present negative data • Preclinical studies must be required to report all findings • Funding agencies, reviewers and journal editors must agree that negative data is just as informative as positive • Journal editors must play an active role • Greater dialogue between physicians, scientists, patient advocates and patients • Get universities on board • More credit for teaching and mentoring • Reward quality vs quantity • Rely on more than publications in top-tier journals • Specific recommendations for cancer research tools • Glenn Begley suggests adopting a form of GLP (Good Laboratory Practice), but calls it GIP

SUGGESTIONS FOR IMPROVEMENT • Fall 2015 conference at Stanford: • Get individual scientists to change their ways • Get journals to change incentives and practices • publish negative results • publish retractions easily • include statistical review for papers using statistics • Use online methods to evaluate work, such as open pre-publication

BUT WE NEED TO REMEMBER • That academic scientists work at the edge of knowledge • We expect that many of their ideas will be wrong • That is part of what science is • Biological systems are inherently variable • Experimentation is always indirect • Nonetheless, we can still expect avoidable error to be avoided

JOURNALS’ RESPONSE • Journals have already responded by demanding much more detail in research reports • “Transparency” • Related to documentation and traceability, two ideas very familiar to those working in quality systems • Note that one can have excellent documentation without transparency • Not unusual to lose documentation when a grad student or post-doc leaves the lab

ANOTHER PIECE OF THE SOLUTION • Journals’ solution is to print details • Scientific Community and Educators’ solution is… education

RESPONSE FROM NIH AND SCIENTIFIC SOCIETIES • NIH Training Modules to Enhance Data Reproducibility • Experimental Design • Laboratory Practices • Analysis and Reporting • Culture of Science • Society for Neuroscience • Training Modules to Enhance Data Reproducibility

NIH GRANT REQUIREMENTS • The NIH plans to require formal instruction in rigorous experimental design and transparency to enhance reproducibility for institutional training, institutional career development, and individual fellowship applications no sooner than 2017. See NOT-OD-16-034. • When implemented, applications will be expected to provide the following: • Institutional training grant applications will be required to include within the training program plan a summary of the instruction planned for all predoctoral and postdoctoral trainees to ensure the knowledge and skills required to design and conduct rigorous, well-controlled experiments that consider all relevant biological variables, use authenticated biological and chemical resources, and apply appropriate statistical tests for data analyses. In addition, a separate attachment will be required to describe in more detail the instructional content and curricular content.

MOVEMENT TOWARD STANDARDS: SEQUENCING

QUICK STORY • Experience as a post-doc • Did not understand one important cause of variability

REPRODUCIBILITY AND VARIABILITY • Talking about reproducibility without looking at the underlying variability

QUALITY SYSTEMS • Reducing variability is fundamental in all production • Biomanufacturing • Medical devices • Formal quality systems evolved to reduce variability

• We would argue that just as you must truly understand and control variability throughout manufacturing, so you must control it in the lab • Otherwise you cannot achieve reproducibility

“DOING GOOD SCIENCE” • The “Quality System” of Academia • Not written down in one place • Not standardized • Enforcement is through peers looking at journal articles and grant proposals – in other words, limited • May not be sufficiently sensitive to all causes of variability

VARIABILITY • “Small changes in experimental design, such as buffer conditions, pH, slight differences in cell line, reagents used in studies, cell culture changes and even differences in tubing and labware suppliers could change the outcome of experiments.” Quote from Biopharm article, Jan 19, 2017: “Reproducibility Project only Partially Able to Validate Findings of Prominent Cancer Studies”

DATA FROM A GROUP OF POST-BACCALAUREATE STUDENTS • 1 M Tris Buffer, pH 8.0

• Often in order to know if there is underlying variability, you have to look for it
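One simple way to "look for" variability is to collect repeated readings of the same solution and summarize their spread. The sketch below uses only Python's standard library; the readings are hypothetical placeholders and are not the students' data from the previous slide.

```python
from statistics import mean, stdev


def summarize(readings):
    """Return count, mean, sample standard deviation, and range."""
    return {
        "n": len(readings),
        "mean": mean(readings),
        "sd": stdev(readings),
        "range": max(readings) - min(readings),
    }


# Hypothetical placeholder pH readings of one nominally identical buffer.
example_ph = [7.92, 8.05, 8.11, 7.88, 8.00]
for name, value in summarize(example_ph).items():
    print(f"{name}: {value:.3f}" if isinstance(value, float) else f"{name}: {value}")
```

Tracking the range and standard deviation over time, rather than a single reading, is what makes a drift or an outlying operator visible.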

Lisa and Jeanette go to the Lab • Day and a half playing with pH • We could achieve ± 0.14 pH units on the same solution on two different days

However… • 3-point vs. 2-point calibration • Confirmed that pH measurements are not linear • Difficult to detect a faulty electrode • Had to check efficiency by looking at mV • Calibration should be repeated approximately every 2 hours • We made the solution slowly, allowing time to equilibrate, and were careful about temperature
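Checking electrode efficiency "by looking at mV" usually means comparing the measured mV change per pH unit between two calibration buffers with the ideal Nernstian response, which is about 59.16 mV per pH unit at 25 °C. The sketch below shows that arithmetic; the function names and the example readings are hypothetical, and the acceptable efficiency window depends on the meter and electrode manufacturer.

```python
import math

R = 8.314462618   # gas constant, J/(mol*K)
F = 96485.33212   # Faraday constant, C/mol


def ideal_slope_mv(temp_c=25.0):
    """Ideal (Nernstian) electrode response in mV per pH unit.

    The value is negative because mV falls as pH rises.
    """
    return -1000.0 * R * (temp_c + 273.15) * math.log(10) / F


def slope_efficiency(ph_a, mv_a, ph_b, mv_b, temp_c=25.0):
    """Measured slope between two buffers as a percentage of the ideal slope."""
    measured = (mv_b - mv_a) / (ph_b - ph_a)
    return 100.0 * measured / ideal_slope_mv(temp_c)


# Hypothetical readings: pH 7 buffer at 2.0 mV, pH 4 buffer at 170.0 mV.
print(f"ideal slope at 25 C: {ideal_slope_mv():.2f} mV/pH")
print(f"efficiency: {slope_efficiency(7.0, 2.0, 4.0, 170.0):.1f} %")
```

Because the ideal slope depends on temperature, the same mV readings give a different efficiency at a different temperature, which is one reason temperature needs attention during calibration.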

CONCLUSION • It is difficult to reduce variability in pH measurements if you do not pay attention to numerous factors • If pH measurements are inaccurate, how can you expect solutions to be consistent?

HOW OFTEN ARE UNDERGRADUATES TAUGHT EXPLICITLY HOW TO PREPARE SOLUTIONS? • Academicians may make assumptions about what lab workers know • The assumption that they will learn somewhere else is almost certainly false • Student learning often takes place on a “need to know” basis • What they learn may or may not be correct • Not systematic • For example, there are concepts of metrology that underlie all measurements; students won’t learn these concepts if they are not taught
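As an illustration of the kind of calculation that is often assumed rather than taught, here is a minimal sketch of the mass needed for a molar solution such as the 1 M Tris buffer mentioned earlier. It assumes Tris base (molecular weight 121.14 g/mol) rather than Tris-HCl; the function name and the 0.5 L volume are illustrative.

```python
TRIS_BASE_MW = 121.14  # g/mol, Tris base (assumption: not Tris-HCl)


def grams_needed(molarity, volume_l, mol_weight):
    """Mass of solute (g) = molarity (mol/L) x volume (L) x molecular weight (g/mol)."""
    return molarity * volume_l * mol_weight


# Illustrative volume: 0.5 L of 1 M Tris.
mass_g = grams_needed(1.0, 0.5, TRIS_BASE_MW)
print(f"{mass_g:.2f} g Tris base for 0.5 L of 1 M solution")
```

The arithmetic is the easy part. The steps around it (dissolving in less than the final volume, adjusting pH, then bringing to volume) are where unstated assumptions tend to cause variability.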

BUT DOES THIS SORT OF DETAIL MATTER? • Scientists Interested in Oncogene Effects on Expressed Proteins • Compared proteins in cultured cells transformed with oncogenes to those not transformed • First looked at lysis buffer and its effects on proteins observed • Found that buffer composition had a significant effect on which proteins were detected and therefore conclusions about differences between transformed and nontransformed cells

BUFFER VARIATION • Ionic strength • Type and concentration of detergent • pH • Presence of phosphate • Thus, reproducibility of this type of cancer study could be adversely affected by reagent variability (Woods et al., BioTechniques 20:794-796, May 1996)

BRINGING BIO-LINKERS TO THE TABLE • We teach students to work in regulated environments (with formal quality systems) • We teach the core skills systematically • We have the core skill standards and others that we can use as base documents • We have some credentialing tests that can be an avenue towards improving technical performance • We have instructional materials that address some of the core issues • We think about things that they don’t (usually) think about

• Results of our little experiment

Course-in-a-Box

CALL TO ACTION • What should we, as a group, do next? • Ideas and thoughts?

WHAT NEXT?

EXAMPLE COURSE