Integrated Bioinformatics Data and Analysis Tools for Herpesviridae Viruses in the Virus Pathogen Resource (ViPR)

Yun Zhang¹, Brett Pickett¹, Eva Sadat², R. Burke Squires², Jyothi Noronha², Sanjeev Kumar³, Sam Zaremba³, Zhiping Gu³, Liwei Zhou³, Chris Larsen⁴, Wei Jen³, Edward B. Klem³, Richard H. Scheuermann¹

¹ J. Craig Venter Institute, San Diego, CA; ² Department of Pathology, Univ. of Texas Southwestern Medical Center, Dallas, TX; ³ Northrop Grumman Health Solutions, Rockville, MD; ⁴ Vecna Technologies, Greenbelt, MD.

Introduction

The Virus Pathogen Database and Analysis Resource (ViPR, www.viprbrc.org), sponsored by the National Institute of Allergy and Infectious Diseases, serves as a single publicly accessible repository of integrated datasets and analysis tools for 14 different virus families, including Herpesviridae. It supports wet-bench virology researchers focusing on the development of diagnostics, prophylactics, vaccines, and treatments for these pathogens [1].

ViPR Supports 14 Virus Families

• Arenaviridae
• Bunyaviridae
• Caliciviridae
• Coronaviridae
• Filoviridae
• Flaviviridae
• Hepeviridae
• Herpesviridae
• Paramyxoviridae
• Picornaviridae
• Poxviridae
• Reoviridae
• Rhabdoviridae
• Togaviridae

ViPR Integrates Data from Many Sources

• GenBank sequence records, gene annotations, and strain metadata
• Protein Data Bank (PDB) 3D protein structures
• Immune epitopes from the Immune Epitope Database (IEDB)
• Clinical data
• Host Factor Data generated from the NIAID Systems Biology projects and the ViPR-funded Driving Biological Projects
• UniProtKB protein annotations
• Gene Ontology (GO) classifications
• Additional data derived from computational algorithms

ViPR Provides Analysis and Visualization Tools

• Multiple Sequence Alignment
• Phylogenetic Tree Construction
• Sequence Polymorphism Analysis
• Metadata-driven Comparative Genomics Statistical Analysis
• Genome Annotator
• GBrowse Genome Viewer
• Sequence Format Conversion
• BLAST Sequence Similarity Search
• 3D Protein Structure Visualization and Movie Generation
• Sequence Feature Variant Type (SFVT) Analysis

ViPR enables you to store and share data and results through the ViPR Workbench.

Figure 1: A screenshot of the ViPR homepage. The ViPR homepage is the portal used to access the various types of data and advanced functionality for any supported virus family.

GBrowse for Genome Viewing

• Provides both bird's-eye and detailed views of genomes and genome annotations.
• Available for Reference Sequences of Pox- and Herpes viruses.

Figure 2: A screenshot of the GBrowse window. The "Overview" panel displays the entire genome; the "Region" panel displays a portion of the genome surrounding a specified region; the "Details" panel displays several tracks of genomic features.

Viral Protein Ortholog Groups

ViPR groups viral proteins into clusters based on predicted orthology within a virus taxon to facilitate gene/protein search, gene function inference, and virus evolution research.

Figure 3: A screenshot of the Ortholog Group search result page. Each ortholog group name is linked to all viral proteins in the same ortholog cluster for the selected taxon.

Multiple Sequence Alignment, Phylogenetic Tree and Metadata-driven Comparative Analysis Tool

Figure 4: Comparative genomic analytical tools in ViPR. Multiple sequence alignments of Human Herpesvirus 1 (HHV-1) (A) whole genome sequences [2] and (B) VP16 nucleotide sequences [3]. (C) A phylogenetic tree visualized with the Archaeopteryx [4] tool shows the relationship between HHV-1 VP16 proteins; red represents a human host, while pink indicates an unknown host. (D) The Metadata-driven Comparative Analysis Tool for Sequences uses statistics to identify individual positions that correlate with a specified metadata attribute.

Host-virus Interaction Data

ViPR is currently funding Driving Biological Projects to produce whole genome sequences for Human Herpesvirus 1 oral or neurotropic isolates. Lists of host genes that are differentially expressed during infection of human neuronal cells will also be deposited.

Figure 5: Host Factor Data in ViPR. A host factor experiment result summary showing differentially expressed genes in human cells infected with SARS.

3D Protein Structure Viewer

Figure 6: 3D Protein Structure Viewer [5] in ViPR. A display of a 3D protein structure for the Thymidine Kinase protein from Herpes Simplex Type 1 virus. Ligands, epitopes and active sites are highlighted (PDB ID: 1E2I).

Summary

ViPR combines the strength of a relational database with a suite of integrated bioinformatics tools to support everything from basic sequence and structural analyses to genotype-phenotype studies and host-virus interaction studies. The uniqueness of ViPR lies in:

• integrating data from various sources
• capturing unique data on the host response to virus infection
• combining the available tools to quickly perform complex analytical workflows
• facilitating rapid hypothesis generation using bioinformatics methods for subsequent experimental testing
• allowing data sharing and storage with collaborators

Acknowledgements

We would like to thank the primary data providers for the data that was used throughout this study. We also recognize the scientific and technical personnel responsible for supporting and developing ViPR, which has been wholly supported with federal funds from the NIH/NIAID (N01AI2008038 to R.H.S.).

References

1 Pickett, B. E., et al. (2012) ViPR: an open bioinformatics database and analysis resource for virology research. Nucl. Acids Res. 40(D1): D593-D598.
2 Darling, A. C. E., et al. (2004) Mauve: Multiple Alignment of Conserved Genomic Sequence With Rearrangements. Genome Res. 14: 1394-1403.
3 Edgar, R. C. (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32(5): 1792-1797.
4 Zmasek, C. M. and Eddy, S. R. (2001) ATV: display and manipulation of annotated phylogenetic trees. Bioinformatics 17: 383-384.
5 Hanson, R. (2010) Jmol - a paradigm shift in crystallographic visualization. Journal of Applied Crystallography 43: 1250-1260.