Overview of R & Bioconductor
Aedín Culhane
aedin@jimmy.harvard.edu
http://www.hsph.harvard.edu/research/aedin-culhane/
R
• Why is it called R?
  – The name is partly based on the (first) names of the first two R authors and partly a play on the name of the Bell Labs language 'S'
  – Initially written by Robert Gentleman and Ross Ihaka, Dept of Statistics, University of Auckland, New Zealand (1996)
  – Open source development: flexible, extensible
  – Large number of statistical and numerical methods
  – High quality visualization and graphical tools
  – Extended by a very large collection of rapidly developing packages
Short R History
• 1991: Ross Ihaka and Robert Gentleman begin work on a project that will become R
• 1993: The first announcement of R
• 1995: R available by ftp
• 1996: A mailing list is started and maintained by Martin Maechler at ETH
• 1997: The R core group is formed
• 2000: R 1.0.0 is released
Short R History Continued
• 2001: Bioconductor, for the analysis and comprehension of genomic data using R
• 2008: The Omegahat project, to enable connectivity between R and other languages
• 2010: A former co-founder and employees of SPSS found Revolution Analytics, a company that offers a commercial package around R
• 2011: The RStudio project provides a free, open source integrated development environment (IDE) for R
Jan 2009: Data Analysts Captivated by R's Power
"R is really important to the point that it's hard to overvalue it," said Daryl Pregibon, a research scientist at Google, which uses the software widely. "It allows statisticians to do very intricate and complicated analyses without knowing the blood and guts of computing systems."
Nov 10 2010: Names You Need to Know in 2011: R Data Analysis Software
"R is rapidly augmenting or replacing other statistical analysis packages at universities"
R
• R project (2.13 to be released April 2011)
• Biannual release (normally April, October)
• Download core and contributed packages from CRAN
• Link: R Task Views
Bioconductor
• Biannual release (normally April, October) to coincide with the R release
• Current: Bioconductor 2.8 (release coincides with R 2.13)
• To install, use the script on the Bioconductor website:
    source("http://www.bioconductor.org/biocLite.R")
    biocLite()
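The biocLite script above is the installer that was current for Bioconductor 2.8. Later Bioconductor releases replaced it with the BiocManager package from CRAN. The following is a minimal sketch of that newer route, assuming a recent version of R. The helper name install_if_missing is hypothetical and only wraps BiocManager::install().

```r
# Hypothetical helper (not part of R or Bioconductor): install only the
# packages that are not already available, using BiocManager.
install_if_missing <- function(pkgs) {
  # requireNamespace() returns FALSE when a package cannot be loaded
  have <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  todo <- pkgs[!have]
  if (length(todo) > 0) {
    if (!requireNamespace("BiocManager", quietly = TRUE)) {
      install.packages("BiocManager")   # BiocManager itself is on CRAN
    }
    BiocManager::install(todo)          # needs network access
  }
  invisible(todo)
}

# Packages shipped with R are already present, so nothing is installed here
already <- install_if_missing(c("stats", "utils"))
```

For a Bioconductor package the call would look like install_if_missing("Biobase"), which downloads the package only if it is not yet installed.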
Packages Overview
• Bioconductor web site
• Bioconductor BiocViews task view
  – Software
  – Annotation Data
  – Experimental Data
R Interface
• Default R interface
• RStudio
  – www.rstudio.org
  – Cross platform: Windows/Mac/Linux
• Others
  – MTinn.R, Notepad++, RCMDR, etc.
RStudio
• 4 windows: Editor, Console, History, Files/plots
• Code completion
• Easy access to help (F1)
• One step Sweave pdf generation
• Searchable history
• Keyboard shortcuts
  – http://www.rstudio.org/docs/using/keyboard_shortcuts
R basics: Getting help
• To get help
  – ?mean
  – help(mean)
• help.search("mean")
• apropos("mean")
• example(mean)
• http://www.bioconductor.org/help/
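The help commands listed above can also be used from a script. This is a minimal sketch using only base R. The variable names are illustrative, and mode = "function" is one optional way to narrow the apropos() search.

```r
# Search the attached packages for object names containing "mean"
hits <- apropos("mean")
print(head(hits))

# Restrict the search to functions only
mean_functions <- apropos("mean", mode = "function")

# Inspect the arguments of the default mean method without opening help
mean_args <- names(formals(mean.default))
print(mean_args)

# Run the examples from the help page of mean()
example(mean, ask = FALSE)
```

?mean and help(mean) open the same help page, while help.search("mean") searches the titles and keywords of all installed help pages.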
Bioconductor resources
• Lots of help available for each software package
  – Each package MUST contain a vignette (how-to)
• Also use documentation and workshop/course material online
  – Slides from talks, pdfs of tutorials, R code
• Feature of Bioconductor: metadata
Vignettes
• A tutorial that frequently provides a worked example of package use
• A Bioconductor documentation requirement
• A vignette = executable document consisting of
  – a collection of documentation text
  – and code chunks
• Vignettes form dynamic, integrated, and reproducible statistical documents that can be automatically updated if either data or analyses are changed
• Vignettes can be generated using the Sweave function from the R utils package
• The original LaTeX vignette file (.Rnw file)
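As an illustration of text plus code chunks, here is a minimal sketch that writes a tiny .Rnw file and processes it with Sweave() and Stangle() from the utils package. The file name demo.Rnw and its contents are made up for this example. No LaTeX installation is needed for this step, because Sweave() only produces the .tex file.

```r
# Write a tiny Sweave source file (.Rnw): LaTeX text plus one R code chunk
rnw <- file.path(tempdir(), "demo.Rnw")
writeLines(c(
  "\\documentclass{article}",
  "\\begin{document}",
  "The mean of three numbers:",
  "<<chunk1>>=",
  "x <- c(1, 2, 3)",
  "mean(x)",
  "@",
  "\\end{document}"
), rnw)

# Sweave() writes its output to the working directory, so work in tempdir()
old_wd <- setwd(tempdir())
utils::Sweave(rnw, quiet = TRUE)    # runs the chunk and writes demo.tex
utils::Stangle(rnw, quiet = TRUE)   # extracts only the R code to demo.R
setwd(old_wd)

tex_lines <- readLines(file.path(tempdir(), "demo.tex"))
r_lines   <- readLines(file.path(tempdir(), "demo.R"))
```

The resulting demo.tex contains both the code and its printed result, so rerunning Sweave() after changing the data or the chunk regenerates the document.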
Vignette
• Written in Sweave (Leisch, 2002)
  – Produces dynamic reports in which R code is embedded and executable
  – LaTeX
  – All R code in a vignette is checked (and executed) by R CMD check
  – http://www.bioconductor.org/docs/vignettes.html
    library("Biobase")
    library("GOstats")   # Load package of interest
    openVignette()
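Vignettes can also be listed with base R alone, without Biobase. This is a minimal sketch. The grid package is used only because it ships with R, and the table will be empty if an installation omits its vignettes.

```r
# List the vignettes of one installed package without opening anything
info <- vignette(package = "grid")
vignette_table <- info$results      # character matrix, one row per vignette
print(vignette_table[, c("Item", "Title"), drop = FALSE])

# In an interactive session a single vignette can then be opened by name:
# vignette("grid", package = "grid")
# browseVignettes("grid")   # HTML overview in the web browser
```

The same calls work for any installed Bioconductor package by substituting its name for "grid".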
Annotation
• Provides software for associating data with biological metadata from web databases (e.g. annotate package)
  – GenBank, LocusLink and PubMed
• Software tools for processing genomic annotation data from databases (e.g. GenBank, Gene Ontology, LocusLink, UniGene, AnnBuilder package)
• Data packages are distributed to provide mappings between different probe identifiers (e.g. Affy IDs)
• biomaRt software to use BioMart to search Ensembl genomes and other marts
What Packages do I need?
Specific to your data and analysis pipeline, but for examples see:
• Bioconductor Workshops
• Bioconductor Workflows