The GMOD Project Lincoln Stein Cold Spring Harbor Laboratory
Test Subject: Michael Caudy o Drosophila neurobiologist o Proneural differentiation o Notch pathway o HLH transcriptional activators/repressors o achaete/scute complex o No computer science training o Took my “bioinformatics for biologists” course
“Simple” Problem o Discover the transcriptional factor binding site code controlling proneural differentiation.
Regular Expression Search o Using achaete promoter as exemplar, search for combinations of known binding sites in particular architectures
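A minimal sketch of this kind of search, assuming two illustrative motifs: the E-box consensus CANNTG commonly bound by HLH proteins, and a second pattern (N_BOX) chosen here only as an example. The motif pair, the maximum spacing, and the function name find_architectures are assumptions for illustration, not the patterns used in the actual analysis.

```python
import re

# Illustrative motifs only. E_BOX is the familiar CANNTG consensus;
# N_BOX and the default spacing are assumptions made for this sketch.
E_BOX = "CA[ACGT]{2}TG"
N_BOX = "CAC[ACGT]AG"


def find_architectures(seq, first=E_BOX, second=N_BOX, max_gap=20):
    """Find `first` followed by `second` with at most `max_gap` bases between.

    Returns a list of (start, end, matched_text) tuples. A lookahead is used
    so that overlapping candidates are all reported; the lazy quantifier
    gives the shortest architecture for each start position.
    """
    pattern = re.compile(
        f"(?=({first}[ACGT]{{0,{max_gap}}}?{second}))"
    )
    return [
        (m.start(1), m.end(1), m.group(1))
        for m in pattern.finditer(seq.upper())
    ]


if __name__ == "__main__":
    # Toy promoter fragment, not real achaete sequence.
    for hit in find_architectures("ttcagctgaaacacgagtt"):
        print(hit)
```

Changing `first`, `second`, and `max_gap` is how different site combinations and spacings ("architectures") would be tried against a promoter sequence.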
Mike’s Got Lots of Data o 90-11,000 TF binding site clusters o 100s-1000s of genes o millions of interactions o Which genes are involved in neural differentiation? o Which have interactions with the pathway? o Which have suggestive mutant phenotypes?
Mike Needs a Database o Database management system for proneural differentiation genes. o Visualization/exploration tools for relationship of genes to putative TF clusters. o Literature citations o Link out to FlyBase, GenBank & other DBs. o Add notes and other annotations.
Try to do it with FileMaker o “Cluster-centric” vs “gene-centric”? o Data import from FlyBase? o Storing images? o Maintaining relationships between genes & clusters? o Updates?
Mike Needs a MOD o Model Organism Database o Repository for reagents o Stocks, vectors, clones o Genetic & physical maps o Large-scale data sets o Genome o EST sets, microarray results, 2-cell hybrid interactions o Literature o Ontologies & Nomenclature o Meetings, announcements
Example MOD: WormBase
Looking for Sex
An Author Entry
Bibliography
Citation
Gene
Genome
Proteome
Comparative Genomics
Functional Genomics
Anatomy
How WormBase Works [diagram: You; Web server; Perl scripts; Database access library; Genomic Data; Images, Movies; AceDB; MySQL]
Can Mike reuse WormBase to manage his data? No!
Sorry Mike o WormBase website difficult to install o Data model nematode-centric o Data entry tools very process-specific o Customization difficult o Software documentation uneven o Standard operating procedure documentation uneven
MOD Redux o SGD, MGD, FlyBase, TAIR, RGD… o The same basic idea as WormBase o Implementation entirely different o Wheel reinvented many times o Little software sharing o This madness must stop!
The GMOD Project o Portable, open source software to support model organism databases o Multiple MODs involved o Worm, fly, yeast, mouse, arabidopsis, rat, monocot, [fugu], [E. coli] o Funded by NIH as of June 2002 o Programmers, coordinator, quarterly meetings http://www.gmod.org
GMOD Home Page
The GMOD Pyramid o Modular Applications o Modular Schema o Open Source DBMS & Middleware
A MOD Construction Set [diagram] o Application Layer: map browser, map editor, genome browser, genome editor, annotation pipeline, citation browser, citation editor o Middleware Layer: Bioperl, BioJava, BioPython (genomes, maps, citations) o Database Layer: genomes, genetic maps, literature
Chado – Modular Schema o Common schema for use by FlyBase and WormBase o Ontology Driven o Small number of generic tables, e.g. “feature” o Controlled vocabulary names object types and relationships among them: o “achaete protein is an HLH activator” o “m8 protein inhibits achaete transcription” o Evidence-Savvy
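To make the "few generic tables plus controlled vocabulary" idea concrete, here is a heavily simplified sketch using SQLite. The three table names echo Chado's, but the columns, the helper functions, and the way the two example statements are encoded are assumptions for illustration; the real schema has many more tables, columns, and constraints.

```python
import sqlite3

# Simplified, assumed layout: every object type and every relationship type
# is just a row in cvterm, so new kinds of data need no new tables.
SCHEMA = """
CREATE TABLE cvterm (
    cvterm_id INTEGER PRIMARY KEY,
    name      TEXT UNIQUE NOT NULL);
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    name       TEXT UNIQUE NOT NULL,
    type_id    INTEGER NOT NULL REFERENCES cvterm(cvterm_id));
CREATE TABLE feature_relationship (
    subject_id INTEGER NOT NULL REFERENCES feature(feature_id),
    type_id    INTEGER NOT NULL REFERENCES cvterm(cvterm_id),
    object_id  INTEGER NOT NULL REFERENCES feature(feature_id));
"""


def term_id(conn, name):
    """Return the id of a vocabulary term, creating it on first use."""
    conn.execute("INSERT OR IGNORE INTO cvterm (name) VALUES (?)", (name,))
    row = conn.execute(
        "SELECT cvterm_id FROM cvterm WHERE name = ?", (name,)
    ).fetchone()
    return row[0]


def add_feature(conn, name, type_name):
    """Insert a feature whose type is a controlled-vocabulary term."""
    cur = conn.execute(
        "INSERT INTO feature (name, type_id) VALUES (?, ?)",
        (name, term_id(conn, type_name)),
    )
    return cur.lastrowid


def relate(conn, subject_id, relation, object_id):
    """Record 'subject <relation> object', with the relation as a term."""
    conn.execute(
        "INSERT INTO feature_relationship VALUES (?, ?, ?)",
        (subject_id, term_id(conn, relation), object_id),
    )


def relationships(conn):
    """Return (subject, relation, object) names for every relationship."""
    return conn.execute(
        """SELECT s.name, t.name, o.name
           FROM feature_relationship r
           JOIN feature s ON s.feature_id = r.subject_id
           JOIN cvterm  t ON t.cvterm_id  = r.type_id
           JOIN feature o ON o.feature_id = r.object_id
           ORDER BY s.name"""
    ).fetchall()


def build_demo():
    """Encode the two example statements in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    achaete = add_feature(conn, "achaete", "gene")
    add_feature(conn, "achaete protein", "HLH activator")
    m8 = add_feature(conn, "m8 protein", "protein")
    relate(conn, m8, "inhibits_transcription_of", achaete)
    return conn
```

The point of the design is that "HLH activator" and "inhibits_transcription_of" are data, not schema: a curator can add a new object type or relationship by adding a vocabulary term.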
GMOD Applications o Apollo genome annotation editor o GBrowse generic genome browser o PubSearch literature curation editor o CMap comparative map browser o IMD insertional mutagenesis database management system
Apollo – BDGP & Sanger Center
Apollo Data adapters o Parser -> data models -> display o Existing data adapters o GAME XML o GFF o Ensembl CGI server o DAS o Write your own data adapter! o Extend AbstractDataAdapter class o Display options defined in config file
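The parser -> data models -> display flow can be sketched as a small adapter pattern. Apollo itself is written in Java; the Python below is only an analogy, and the names DataAdapter, CsvAdapter, Annotation, and render are hypothetical and are not Apollo's classes or API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


@dataclass
class Annotation:
    """Data model shared by every adapter and by the display code."""
    seq_id: str
    start: int
    end: int
    name: str


class DataAdapter(ABC):
    """Hypothetical base class: one subclass per input format."""

    @abstractmethod
    def parse(self, text: str) -> List[Annotation]:
        """Turn format-specific text into format-neutral data models."""


class CsvAdapter(DataAdapter):
    """Toy adapter for 'seq_id,start,end,name' lines (an invented format)."""

    def parse(self, text: str) -> List[Annotation]:
        annotations = []
        for line in text.splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            seq_id, start, end, name = [f.strip() for f in line.split(",")]
            annotations.append(Annotation(seq_id, int(start), int(end), name))
        return annotations


def render(adapter: DataAdapter, text: str) -> List[str]:
    """Display step: depends only on the data model, never on the format."""
    return [f"{a.seq_id}:{a.start}-{a.end} {a.name}" for a in adapter.parse(text)]
```

Supporting a new source then means writing one more subclass; the display code is untouched because it only ever sees `Annotation` objects.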
Who is Using Apollo? o BDGP o Reannotated Drosophila genome o Bristol-Myers Squibb o Launching Apollo from web browser via mime types o GNF o JDBC adapter layer over BioSQL o Biogen o View human genome alignment between public and Biogen internal database o Connected BLAT pipeline to Apollo o HGMP-RC Fugu Genomics group o Displaying annotations on fugu scaffolds
PubSearch – TAIR & RatDB
PubSearch – Gene Association
IMD – Insertional Mutagenesis Db
CMap – Gramene
CMap – Detailed View
GBrowse – WormBase
GBrowse – Zoomed in
GBrowse – Zoomed Way In
GBrowse – Keyword Search
GBrowse – Third Party Annotations
Sequence dumps & other reports
Extensively Customizable o End-user o Turn tracks on and off, change order, change packing & labeling attributes (stored in cookie) o Data provider o Change fonts, colors, text. o Change overview – genetic map, contigs, coverage, karyotype. o Define new tracks using simple config file. o Tinker with track appearance to heart’s content.
Adding a New Track
(a) Create a GFF file named “deletions.gff”
    Chr1 targeted deletion 1293224 1294901 . . . Deletion d101k2
    Chr1 targeted deletion 8239811 8241116 . . . Deletion d680k2
    Chr2 targeted deletion 5866382 5866500 . . . Deletion d007k2
(b) Run the load_gff.pl script
    > load_gff.pl -d example_database deletions.gff
    Loading features…
    Done. 3 features loaded.
(c) Add a new track “stanza” to the gbrowse configuration file
    [Knockout]
    feature = deletion
    glyph = span
    fgcolor = red
    key = Knockouts
    link = http://example.org/cgi-bin/knockout_details?$name
    citation = These are deletion knockouts produced by the example knockout consortium (http://example.org/knockouts.html)
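As a rough illustration of what the three steps connect, the sketch below reads the same nine-column rows and fills the stanza's link template for every feature of type "deletion". It is not GBrowse code: parse_gff and track_links are hypothetical helpers, splitting on whitespace is an assumption based on how the rows appear on the slide (GFF files are normally tab-delimited), and the treatment of $name is an assumed reading of the stanza.

```python
from string import Template

# The three example rows from the slide.
GFF_TEXT = """\
Chr1 targeted deletion 1293224 1294901 . . . Deletion d101k2
Chr1 targeted deletion 8239811 8241116 . . . Deletion d680k2
Chr2 targeted deletion 5866382 5866500 . . . Deletion d007k2
"""

COLUMNS = ("seq_id", "source", "type", "start", "end",
           "score", "strand", "phase", "group")


def parse_gff(text):
    """Parse nine-column GFF rows into dictionaries."""
    features = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split(None, 8)  # ninth column may contain spaces
        if len(fields) != 9:
            raise ValueError(f"expected 9 columns, got {len(fields)}: {line!r}")
        feat = dict(zip(COLUMNS, fields))
        feat["start"], feat["end"] = int(feat["start"]), int(feat["end"])
        # The group column here is "<class> <name>", e.g. "Deletion d101k2".
        feat["class"], feat["name"] = feat["group"].split(None, 1)
        features.append(feat)
    return features


def track_links(features, feature_type, link_template):
    """Fill $name in the link template for each feature of the given type."""
    template = Template(link_template)
    return [
        template.substitute(name=f["name"])
        for f in features
        if f["type"] == feature_type
    ]


if __name__ == "__main__":
    link = "http://example.org/cgi-bin/knockout_details?$name"
    for url in track_links(parse_gff(GFF_TEXT), "deletion", link):
        print(url)
```

The "feature = deletion" line in the stanza corresponds to the type filter here, and the "link" line corresponds to the template.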
Extensively Extensible [diagram: Apache Web Server; gbrowse CGI script; Plugins; Glyphs; Bio::Graphics library; BioPerl library; Bio::DB::GFF adaptor; Oracle adaptor; Flat File adaptor; Chado adaptor; MySQL/Postgres; Oracle; Flat Files]
GBrowse on GenBank! GenBank? [diagram: Apache Web Server; gbrowse CGI script; Plugins; Glyphs; Bio::Graphics library; BioPerl library; Bio::DB::GFF; GenBank adaptor; Proxy Adaptor; MySQL; GenBank]
B. burgdorferi via GenBank proxy
Who is Using GBrowse? o GMOD Members o WormBase, FlyBase, RatDB o HGMP-RC Fugu genomics group o KEGG (multiple microorganisms) o Ingenium AG (mouse) o Bristol-Myers Squibb (Drosophila) o Texas A&M University (Salmonella) o McGill University (human chr 7) o Institute for Systems Biology (human)
Genome Knowledgebase (GK)
“Constellation View” (in dev) [diagram nodes: DNA Replication; Ethanol Catabolism; Amino Acid Biosynthesis; Glucose Metabolism; RNA Splicing; Oxidative Decarboxylation; TCA Cycle]
Can Mike use GMOD to manage his data? Almost
Mike’s very own flybase
Uploaded Annotations
Details
Essential Pieces in Progress o Generic MOD web site o Strain & phenotype curation tools o Pathway tools and browsers o Tree (e.g. phylogenetic) tools & browsers o Biopipe – genome annotation pipeline
Find out more about GMOD o Go to www.gmod.org o Examine software matrix o Find a project you’re interested in o Contact project leader o Or contact Scott Cain: cain@cshl.org o Or mail gmod-dev@lists.sourceforge.net
Credits o CSHL: Adrian Arva, Shuly Avraham, Scott Cain, Ken Clark, Allen Day, Xiaokang Pan o BDGP: Nomi Harris, Suzanna Lewis, Chris Mungall, John Richter, ShengQiang Shu, Colin Weil o EBI: Michele Clamp, Stephen Searle o Harvard: David Emmert, Stan Letovsky o Cornell Medical School: Michael Caudy o Carnegie Institute: Sue Rhee, Danny Yoo http://www.gmod.org