
Biopackages.net
Operating System Packages for Bioinformatics
Allen Day
2005.17

What is a package?
- Software, config files, documentation, and/or data encapsulated in a single file
- Metadata describing:
  - Version, license, package “category”
  - Dependencies
  - What the package provides
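As a rough illustration, the metadata listed above could be modelled as a small record. This is a minimal sketch, assuming a hypothetical `Package` class and a hypothetical example package; the field names are illustrative and do not reproduce the RPM header format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    """Hypothetical model of the metadata a package file carries (not RPM's real header)."""
    name: str
    version: str
    license: str
    category: str
    requires: tuple[str, ...] = ()  # names/capabilities that must already be installed
    provides: tuple[str, ...] = ()  # capabilities this package makes available to others

    def satisfies(self, requirement: str) -> bool:
        """True if this package fulfils a requirement, by its own name or a provided capability."""
        return requirement == self.name or requirement in self.provides


# Hypothetical example package, for illustration only.
tool = Package(
    name="example-aligner",
    version="1.0",
    license="GPL",
    category="Applications",
    requires=("perl",),
    provides=("aligner-daemon",),
)
```

Keeping "requires" and "provides" as separate fields is what lets a package manager match a dependency against any installed package that provides it, not only against a package with that exact name.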

- GMOD target audience
  - Small MODs (model organism databases)

Package Dependency Graph
[Figure: package dependency graph with nodes chado-Hsa, postgresql-AffxSeq, genome-Hsa-annotation-gene, genome-Hsa-annotation-affymetrix, chado, perl-bioperl-go-perl, postgresql-server, genome-Hsa-nib, obo-core, ucsc-blat. Legend: dependencies; what the package provides.]
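Installing a package means walking a graph like this recursively, so every dependency is installed before the package that needs it. The sketch below shows that recursion as a depth-first topological sort. The function `install_order` and the package names in the example graph are hypothetical, and the edges are invented for illustration rather than read from the figure.

```python
def install_order(graph: dict[str, list[str]], target: str) -> list[str]:
    """Return `target` plus all transitive dependencies, dependencies first.

    `graph` maps each package name to the names it directly requires.
    Raises ValueError if the dependencies contain a cycle.
    """
    order: list[str] = []
    done: set[str] = set()
    in_progress: set[str] = set()

    def visit(name: str) -> None:
        if name in done:
            return  # already scheduled; shared dependencies are installed once
        if name in in_progress:
            raise ValueError(f"dependency cycle at {name!r}")
        in_progress.add(name)
        for dep in graph.get(name, []):
            visit(dep)  # recurse so dependencies are appended first
        in_progress.remove(name)
        done.add(name)
        order.append(name)

    visit(target)
    return order


# Hypothetical graph: two packages share one dependency.
GRAPH = {
    "app": ["lib-a", "data-b"],
    "lib-a": ["db-server"],
    "data-b": ["db-server"],
    "db-server": [],
}
```

Here `install_order(GRAPH, "app")` returns `["db-server", "lib-a", "data-b", "app"]`. The shared `db-server` appears only once.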

Dependencies
- Build dependency
- Installation dependency
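The two kinds of dependency answer different questions: what a build host needs in order to produce the package, and what an end-user machine needs in order to run it. RPM, for example, separates these with its `BuildRequires` and `Requires` spec tags. The sketch below models that split under stated assumptions: the `Spec` record, both helper functions, and all package names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Spec:
    """Hypothetical build recipe, loosely mirroring RPM's BuildRequires vs Requires split."""
    build_requires: list[str] = field(default_factory=list)  # needed only to compile/package
    requires: list[str] = field(default_factory=list)        # needed wherever it is installed


def runtime_closure(specs: dict[str, Spec], name: str) -> set[str]:
    """Everything needed to run `name`: itself plus its installation dependencies, transitively."""
    seen: set[str] = set()
    stack = [name]
    while stack:
        pkg = stack.pop()
        if pkg not in seen:
            seen.add(pkg)
            stack.extend(specs[pkg].requires)
    return seen


def build_host_needs(specs: dict[str, Spec], name: str) -> set[str]:
    """Everything the build host must have: each build dependency plus its runtime closure."""
    needed: set[str] = set()
    for dep in specs[name].build_requires:
        needed |= runtime_closure(specs, dep)
    return needed


# Hypothetical packages for illustration only.
SPECS = {
    "tool": Spec(build_requires=["compiler", "lib-devel"], requires=["lib"]),
    "lib-devel": Spec(requires=["lib"]),
    "lib": Spec(),
    "compiler": Spec(),
}
```

In this example an end user installing `tool` pulls in only `lib`. The compiler and headers stay on the build host.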

What is a Package Manager?
- Tools to manage installation, upgrade, and uninstallation of packages
  - Verify package integrity (checksums)
  - Maintain system integrity
    - Transactional
    - Allow rollbacks
  - Dependency checking
  - Dependency graph recursion
  - Allow software customization (patches)
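Checksum verification and transactional installs work together: if any package in a set fails its integrity check, nothing from that set should be left on the system. The sketch below illustrates the idea with an in-memory "system" and a rollback to a snapshot. The function names are hypothetical, and SHA-256 is chosen here for illustration only; it is not a claim about which digest any particular package manager uses.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of a package payload."""
    return hashlib.sha256(data).hexdigest()


def install_transaction(
    system: dict[str, bytes],
    packages: dict[str, bytes],
    expected: dict[str, str],
) -> bool:
    """Install all `packages` into `system`, or none of them.

    `expected` maps each package name to its published digest.
    Returns True on commit; on any checksum mismatch, restores the
    original state of `system` and returns False.
    """
    snapshot = dict(system)  # state to roll back to
    try:
        for name, payload in packages.items():
            if sha256_hex(payload) != expected.get(name):
                raise ValueError(f"checksum mismatch for {name}")
            system[name] = payload
    except ValueError:
        system.clear()
        system.update(snapshot)  # roll back any partial install
        return False
    return True
```

A real package manager would also verify all payloads before touching the filesystem. The snapshot-and-restore step here simply shows that a failure partway through leaves the system as it was.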

Why bioinformatics packages?
- Consistency of installation process
  - Bioinformatics package installs vary wildly, and commonly lack documentation
- Automatic dependency installation
  - Perl modules are especially bad: Bioperl has 60+ modules in its dependency tree
- Integrity/auditing of system state
  - Know that an installed package works, which version is installed, and how to replicate the system setup
- Tighter integration with the operating system
  - Daemons, config and log file locations, etc.

What’s available?
- RPM packages only right now
  - Primary focus on Fedora Core 2
- Some RPMs also available for
  - Fedora Core 3
  - Red Hat 9
  - Cygwin

What’s available?
- Three primary foci
  - Applications
  - Libraries
  - Data sets

Applications
- Gbrowse
- Textpresso
- BLAT daemon
- NCBI Toolkit (BLAST, etc.)
- HMMer

What’s available?
- Libraries
  - Bioperl
  - R & Bioconductor
  - Squid
  - EMBOSS

What’s available?
- Data sets
  - Genome & protein sequence
  - Sequence features
  - Ontologies
- All installed using a common directory structure

What’s available?
- UCSC tools (utilities, BLAT system service, CGI scripts)
- Bioperl
- R / Bioconductor
- GMOD apps (Gbrowse, Textpresso, …)
- Data packages
  - Genome sequence (fa, nib, blastdb)
  - Genome features (Affy probeset alignments, mRNA, etc.)

GMOD Components Available
[Figure: GMOD component packages das2-Hsa, gmod-web-Hsa, apollo-Hsa, cmap-Hsa, chado, gbrowse, textpresso, genome-Hsa-nib, turnkey, ucsc-BLAT]
- ‘Hsa’ can be substituted for your organism
- Currently built for ‘Cel’, ‘Hsa’, ‘Sce’
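The organism-specific packages follow a naming pattern in which the species code (‘Hsa’) is embedded in the package name. The sketch below shows that substitution. It assumes the pattern holds for every listed component and organism, which the slide implies but does not state outright. `TEMPLATES`, `BUILT_ORGANISMS`, and `organism_packages` are hypothetical names, so check the site for the package names that actually exist.

```python
# Name patterns taken from the "-Hsa" packages on the slides, with the organism
# code replaced by a placeholder. ASSUMPTION: each pattern exists for every organism.
TEMPLATES = ("chado-{org}", "gmod-web-{org}", "apollo-{org}", "cmap-{org}", "genome-{org}-nib")

# Organism codes the slide lists as currently built.
BUILT_ORGANISMS = ("Cel", "Hsa", "Sce")


def organism_packages(org: str) -> list[str]:
    """Expand the name templates for one organism code."""
    if org not in BUILT_ORGANISMS:
        raise ValueError(f"no packages built for organism code {org!r}")
    return [template.format(org=org) for template in TEMPLATES]
```

For example, `organism_packages("Sce")` yields `chado-Sce`, `gmod-web-Sce`, `apollo-Sce`, `cmap-Sce`, and `genome-Sce-nib`.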

More details…
[Figure: detailed dependency graph; visible nodes include chado-Hsa, genome-Hsa-annotation-gene, genome-Hsa-annotation-affymetrix, postgresql-AffxSeq, chado, perl-go-perl-bioperl, postgresql-server, genome-Hsa-nib, ucsc-blat, with further nodes elided (…)]

Gene Expression Components
[Figure: DAS/2 for genotyping; GeneChip Quant/Norm pipeline; packages chado-GEC, chado-Hsa, R, Bioconductor]

Resources
- http://www.biopackages.net
  - ~1000 RPMs for Fedora Core 2, 3
  - Available via yum
- See site for a configuration example.

TODO
- Support more architectures
  - Build for Cygwin & OS X. RPM has been ported to both
- Automate package build process
  - Build farm of multiple architectures, controllable via scheduler (Grid Engine)
- Automate (if possible) inclusion of new software / data releases

TODO
- Build community interest and involvement
  - Keep adding more packages!
  - Keep existing packages current!

Acknowledgements
- Patrick Alger
- Jared Fox
- Brian O’Connor
- Todd Harris
- Lincoln Stein
- Stanley Nelson