> # Reproducible Research :: What is it?
> For 1 to (all.Ecologists) { & Why Ecologists should embrace it. }
> Time = 45 minutes
> If Time then { “Demonstration” } else “Case Study”
> # Alan Couch, Institute for Applied Ecology, University of Canberra
> # With thanks to Roger Peng & Data Science Team, Johns Hopkins & Coursera
Outcomes – Discovery & Replication
• Discovery ✓, Replication ✓: Optimal
• Discovery wrong, Replication ✓: Self-correcting
• Discovery ✓, Replication wrong: False non-replication
• Discovery ✓, Replication not done: Unconfirmed discovery
• Discovery wrong, Replication wrong: Perpetuated fallacy
• Discovery wrong, Replication not done: Unchallenged fallacy
Modified from: John P. A. Ioannidis (2005), Why Most Published Research Findings Are False. PLoS Med, v. 2(8), 2005 Aug.
Replication
The gold standard of scientific evidence is independent replication of findings, including independent:
✓ Investigators
✓ Data
✓ Analytical methods
✓ Laboratories
✓ Instruments
Falsifiability is key.
Reproducible Research – What Is It
• Replication, or at least the ability to attempt to falsify, is at the heart of science.
• But some things can’t be fully re-done because of:
– Time
– Money
– Uniqueness
• But data manipulation and analysis should be 100% reproducible,
• i.e. everything other than collection, measurement or instrumentation.
On Amir, Dan Ariely and Nina Mazar (2008), The Dishonesty of Honest People: A Theory of Self-Concept Maintenance. Journal of Marketing Research, Vol. 45: 633-634.
Reproducible Research – What Is It
• Analytic data are available; and
• Analytic code is available; and
• Documentation of the code and data is available,
– so that the data munging, analysis and visualisation can be reproduced.
• Reproducible research is a subset of replicable research.
• A standard means of distribution is helpful:
– Figshare, Data Dryad, RPubs, GitHub
Literate Programming
• Code and narrative are kept together in one program.
• The program is run and the output is produced as required,
– rather than having to flow updates through downstream document versions.
– This has the benefit of alleviating some version control issues.
• How to create Literate Statistical Programs (LSP):
– There are a number of ways; but
– knitr is the one covered here.
knitr
• knitr is a package that allows LSP in R.
• Developed by Yihui Xie (while a graduate student in statistics at Iowa State University).
• See http://yihui.name/knitr/
• knitr uses R as the programming language (others are allowed).
• R Markdown is the documentation language (others, such as LaTeX, Markdown and HTML, are allowed).
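As a rough illustration of the workflow described above, the sketch below writes a tiny R Markdown file and knits it, so that narrative, code and results end up in one output document. The file names, the chunk label and the use of R's built-in `cars` data set are assumptions made for illustration only; they do not come from the talk. It assumes the knitr package is installed.

```r
# Minimal sketch of a literate statistical program with knitr.
# Assumptions (illustrative only): file names, chunk label, built-in `cars` data.
library(knitr)

# Build the chunk delimiter (three backticks) programmatically.
fence <- strrep("`", 3)

# Narrative and code live together in one R Markdown source.
rmd_lines <- c(
  "# Stopping distance example",
  "",
  "The built-in `cars` data set has `r nrow(cars)` rows.",
  "",
  paste0(fence, "{r model}"),
  "fit <- lm(dist ~ speed, data = cars)",
  "round(coef(fit), 2)",
  fence
)

rmd_file <- file.path(tempdir(), "example.Rmd")
md_file  <- file.path(tempdir(), "example.md")
writeLines(rmd_lines, rmd_file)

# knit() runs each chunk and inline expression, then writes the narrative
# plus the computed results to a Markdown file.
knit(rmd_file, output = md_file, quiet = TRUE)

# Show the knitted document.
cat(readLines(md_file), sep = "\n")
```

If the data or the model change, re-knitting the same source regenerates every number in the output, which is the version-control benefit that literate programming is meant to give.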
Why Ecologists Should Embrace LSP
• Large complex data sets
• Difficult to replicate
• Complex data manipulation
• Complex analytical and statistical methods
• Complex visualisations
• Heavy reliance on statistical findings
• Significant human, economic and policy consequences
Image: Thanks to http://cafethorium.whoi.edu/website/cruises/vertigok2_updates_050824.html (2005 VERTIGO Northwest Pacific "K2" Cruise: Searching for Higher Particle Fluxes)
Ecology Paper Example – April 2015
As a discipline, do you think ecology is good or poor?
So the news is relatively Good in Ecology
Pros and Cons
• Fosters order and discipline in data management
• Helps the author see potential issues as well
• Can automate backups and version management
• Need to commit early; some learning curve
• Not good for very large, complex documents
• Less convenient for mixing visuals from a variety of software packages
• Needs code-able software (not GUI-only)
So:
• A large number of conscious, unconscious and subconscious biases operate in the research process.
• Replication and reproducibility practices vary across disciplines.
• Adoption of good practices can be key to the success rates, efficiency, credibility, legacy and reputation of each discipline or lab.