Viral Genomics
Allie Evans, Colin Lappala, Chelsea Layes, Sheena Scroggins

• The Sorcerer II Global Ocean Sampling Expedition: Northwest Atlantic through Eastern Tropical Pacific. Rusch DB, Halpern AL, Sutton G, Heidelberg KB, Williamson S, et al. PLoS Biology Vol. 5, No. 3, e77. doi:10.1371/journal.pbio.0050077
• The Sorcerer II Global Ocean Sampling Expedition: Expanding the Universe of Protein Families. Yooseph S, Sutton G, Rusch DB, Halpern AL, Williamson SJ, et al. PLoS Biology Vol. 5, No. 3, e16. doi:10.1371/journal.pbio.0050016
• The Sorcerer II Global Ocean Sampling Expedition: Metagenomic Characterization of Viruses within Aquatic Microbial Samples. Shannon J. Williamson, Douglas B. Rusch, Shibu Yooseph, Aaron L. Halpern, Karla B. Heidelberg, John I. Glass, Cynthia Andrews-Pfannkoch, Douglas Fadrosh, Christopher S. Miller, Granger Sutton, Marvin Frazier, J. Craig Venter

Baltimore Classification of Viruses
• dsDNA
• ssDNA
• dsRNA
• +ssRNA
• -ssRNA
• ssRNA-RT
• dsDNA-RT
http://upload.wikimedia.org/wikipedia/en/thumb/0/07/Baltimore_Classification.png/720px-Baltimore_Classification.png

Bacteriophages
• Viruses that infect bacteria
• Numerically dominant type of virus in the oceans
http://www.scienceclarified.com/images/uesc_02_img0070.jpg

Cyanophages
• Prochlorococcus
• Viruses have acquired and retained photosynthesis genes
http://web.mit.edu/mbsulli/www/NATL2A-40-group-cropped.jpg

Phage Cycles


Lateral gene transfer
http://upload.wikimedia.org/wikipedia/commons/thumb/4/42/Transduction_(genetics).svg/800px-Transduction_(genetics).svg.png

Metagenomics
• The contribution of viral genomes to microbial environmental processes is studied through metagenomic techniques
• Metagenomics enables us to study microorganisms by examining DNA that is extracted directly from communities of environmental microorganisms

http://camera.calit2.net/metagenomics/what-is-metagenomics.php

Metagenomic Challenges
• Inefficiencies in sampling
• DNA extraction methods
• Construction of libraries
• Inadequacies in data analysis and visualization tools
• Low-abundance species overlooked
• Lack of reference genomes
• Sequencing complex environments is cost-prohibitive
• Standardizing metadata

Methods
First:
• Cruise the world
• Collect 90-200 L of seawater from each of 37 different stations
• Record pH, salinity, temperature, etc. of the water

Methods
• Pass water through 2.0, 0.8, and 0.1 µm filters, then tangential flow filtration (TFF) to 50 kDa for the viral concentrate
• Store at -20°C until shipment from the next port

Sequencing Preparation
• Extract DNA
• Nebulize DNA
  – Average of 1.0-2.2 kb fragments
• Gel electrophoresis extraction
  – Purify and determine lengths
• Subclone into E. coli
• Colonies selected for inserts
• Shotgun sequence inserts

Sequencing
• End sequence each insert
  – Average of 822 bp sequenced per end
www.pasteur.fr/recherche/genopole/PF8/equipement_en.html

Metagenomic Assembly
• Same procedure as in humans, Drosophila, dogs, etc.
  – Unitigs using 98% or 94% homology for overlap
  – Scaffolding
  – Consensus sequence
Venter et al. (2001)

Metagenomic Assembly
New uses for shotgun sequencing and assembly
• Multiple organisms at once
• Likely novel organisms
Problems?
• Mate-pair data relied on more heavily, since overlap coverage is low or unknown
• Need verification of assembly somehow

Metagenomic Assembly
• Created multiple distinct assemblies
  – 98% homology unitigs
  – 94% homology unitigs
  – Non-preassembled end-pairs at various stringencies for multiple sequence alignments
• Multiple assemblies allowed cross-referencing and quality assurance
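The 98% and 94% figures describe how closely two reads must agree in their overlapping region before they are merged into a unitig. The sketch below illustrates that idea only; it assumes an ungapped suffix-prefix overlap of known length, and the function names are hypothetical. It is not the assembler used in the study, which aligns reads with gaps and uses far more information.

```python
def overlap_identity(left: str, right: str, overlap_len: int) -> float:
    """Fraction of matching bases when the last `overlap_len` bases of
    `left` are laid over the first `overlap_len` bases of `right`.

    Assumption: no gaps are allowed inside the overlap.
    """
    if overlap_len <= 0 or overlap_len > min(len(left), len(right)):
        raise ValueError("overlap_len must fit inside both reads")
    suffix = left[-overlap_len:]
    prefix = right[:overlap_len]
    matches = sum(a == b for a, b in zip(suffix, prefix))
    return matches / overlap_len


def passes_threshold(left: str, right: str, overlap_len: int,
                     min_identity: float) -> bool:
    """True if the overlap is at least `min_identity` identical (e.g. 0.98)."""
    return overlap_identity(left, right, overlap_len) >= min_identity
```

An overlap that passes at 0.94 but fails at 0.98 is exactly the kind of case where the two assemblies would differ, which is why comparing them is useful for cross-referencing.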

Taxonomic Assignment
Protein-ORF based strategy
• 5.6 million sequences from GOS
• All ORFs in the same sequence scaffold compared to the NCBI protein database using BLAST
• Votes tallied from each ORF into pools for the scaffold
  – Archaea, Bacteria, Eukaryota, Viral
• 5.0 million sequences assigned using this method
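The vote-tallying step can be sketched as follows. This is a minimal illustration, assuming each ORF's best BLAST hit has already been reduced to one of the four labels; the function name, the handling of ORFs without a hit, and the tie rule are assumptions made for the example, not details given on the slide.

```python
from collections import Counter

KINGDOMS = ("Archaea", "Bacteria", "Eukaryota", "Viral")


def assign_scaffold(orf_votes):
    """Return the label with the most ORF votes on a scaffold.

    `orf_votes` holds one label per ORF; anything outside KINGDOMS
    (for example None for an ORF with no BLAST hit) is ignored.
    Assumption: a scaffold with no votes or a tied top vote stays
    unassigned (None).
    """
    tally = Counter(v for v in orf_votes if v in KINGDOMS)
    if not tally:
        return None
    ranked = tally.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie between the two leading labels
    return ranked[0][0]
```

Pooling votes per scaffold means a single ORF with a misleading best hit is outvoted by its neighbors on the same scaffold.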

Quantitative PCR
How many copies of the genes encoding the studied proteins exist:
• from station to station?
• versus one another?
http://www.invitrogen.com/content.cfm?pageid=10037

Quantitative PCR
• Level of fluorescence checked after each PCR cycle
• Initial amount can be inferred using a standard curve
• Multiple dilutions allow comparison
• Outcome reported only if:
  – Ten-fold above the no-template negative control, AND
  – The 10^-2 dilution results in 3-30 times more than the 10^-3 dilution
http://www.invitrogen.com/content.cfm?pageid=10037
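A standard curve relates the threshold cycle (Ct) of known standards to the logarithm of their starting copy number, and an unknown sample is read off that line. The sketch below shows the calculation under stated assumptions: a straight-line fit of Ct against log10(copies), the reporting criteria applied to estimated copy numbers, and hypothetical function and parameter names. The slide does not specify these details.

```python
import math


def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate the starting copy number."""
    return 10 ** ((ct - intercept) / slope)


def reportable(sample, no_template, dilution_1e2, dilution_1e3):
    """Apply the two reporting criteria to estimated copy numbers.

    Assumption: all four arguments are copy-number estimates.
    """
    above_control = sample >= 10 * no_template
    ratio = dilution_1e2 / dilution_1e3
    return above_control and 3 <= ratio <= 30
```

The dilution check guards against inhibition or pipetting problems: a ten-fold dilution step should change the estimate by roughly ten-fold, and the 3-30 window allows for measurement noise.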

Clustering and Phylogeny
• Proteins clustered and compared to NCBI
  – Sequence alignments, not just domains
  – Gene families bolstered with new genes
• Phylogenetic trees generated
  – Multiple sequence alignments with CLUSTALW
  – Used only long, fairly homologous samples
• PHYLIP used to build trees
  – Based on a distance matrix
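Distance-based tree building starts from a table of pairwise distances between the aligned sequences. The sketch below computes the simplest such measure, the p-distance (fraction of aligned positions that differ). It is an illustration only: the function names are hypothetical, gap handling is an assumption, and the slide does not say which distance model was used.

```python
def p_distance(a: str, b: str) -> float:
    """Fraction of differing positions between two aligned sequences.

    Assumption: columns where either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    differing = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


def distance_matrix(aligned):
    """Pairwise p-distances for a dict of name -> aligned sequence."""
    names = list(aligned)
    return {(m, n): p_distance(aligned[m], aligned[n])
            for m in names for n in names}
```

Restricting the input to long, fairly similar sequences, as the slide notes, keeps the number of compared columns high and the distances meaningful.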

Results
• 37 marine surface water samples collected
• 7.7 million sequencing reads were produced
• Identified 154,662 viral peptide sequences

Identification of Viral Sequences
• Data from the microbial fraction of water samples were examined
• Looked for viral sequences by comparison to the NCBI non-redundant protein database
• 154,662 viral peptide sequences were identified
• Approximately 3% of predicted proteins were identified as viral sequences
• Number of viral sequences thought to be largely underestimated

Classification through Protein Clustering
• Of 154,662 viral peptide sequences, 117,123 or 76% fell within 380 protein clusters containing at least 20 proteins
• Remaining sequences fell within clusters containing fewer than 20 proteins
• Average cluster contained 258 peptide sequences

Neighbor Functional Linkage Analysis
• Used to verify that host-derived genes were located on viral genomes rather than in proviral regions of bacterial genomes
• Proportion of viral same-scaffold ORFs ranged from 32% to 92% for the metabolic gene families studied
• Occurrence of viral neighbors on the same scaffolds as host-derived viral genes supports the hypothesis that the sources of the sequences are viruses rather than bacteria

Quantitative PCR
• qPCR used on DNA collected from 5 sampling locations
• Yields were initially too low, so samples were pooled
• Viral gene families psbD, petE, speD, talC, pstS, and phoH were included
• Results indicate that host-derived viral genes are viral in nature
• Viral genes encoding environmentally significant host-specific functions are prevalent in aquatic samples

Phylogenetic Analyses
Figure 2. Phylogenetic trees of all GOS and publicly available psbA (A) and psbD (B) sequences. BS indicates bootstrap values. GOS and public viral sequences are colored aqua and pink, respectively. GOS and public prokaryotic sequences are navy blue and lime green, respectively. doi:10.1371/journal.pone.0001456.g002

Figure 3. Phylogenetic trees of all GOS and publicly available pstS (A) and talC (B) sequences. BS indicates bootstrap values. GOS and public viral sequences are colored aqua and pink, respectively. GOS and public prokaryotic sequences are navy blue and lime green, respectively. GOS eukaryotic sequences are colored yellow. doi:10.1371/journal.pone.0001456.g003

• All viral gene families were positively correlated with water temperature
• Some viral gene families were correlated with salinity, water depth, and calculated trophic status indices
• Different environmental pressures may influence acquisition of these genes by viruses
• Table S7 shows the correlations between viral gene families and environmental parameters
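A correlation between gene family abundance and an environmental parameter can be computed as sketched below. The slide does not name the statistic used, so Pearson's r is an assumption here, and the function name is hypothetical. No values from the study are reproduced; any input would be the per-station abundances and measurements.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series.

    For example, xs could be water temperature per station and ys the
    abundance of one viral gene family at the same stations.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)
```

A value near +1 corresponds to the "positively correlated" pattern described for water temperature; values near 0 indicate no linear relationship.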

Discussion
• Most studies have focused on the filtered viral fraction of the data
• This is the first study to focus on the viral components in the microbial fraction of the data
• Strong evidence for abundance and distribution of environmentally important host-derived viral gene families
• Distribution patterns of host-derived viral families over environmental gradients
• Evidence of interactions between bacteriophage and host organisms

Detection of Viruses in Microbial Data
• Large viruses (0.1 µm–0.22 µm) get caught in the filters because of their size and geometric shape
• Small free-living phages flow through the filter, but viruses physically interacting with microbes are caught along with them
• When filtering large volumes, biomass accumulates on the filter and viruses get caught
• Most viruses found within the aquatic microbial communities studied seemed to be in the lytic infection cycle and were therefore actively replicating their DNA

Viruses with Metabolic Genes
• Through lateral gene transfer, metabolic genes can be acquired from the host
• Acquisition, retention, and expression of metabolic genes may increase fitness
• Keeping key metabolic processes and pathways running during infection allows maximum replication
• Previous studies of host-derived metabolic viral genes focused on the photosynthesis genes psbA and psbD of a cyanophage
• Previous studies did not focus on the abundance or distribution of these genes in the oceans

Host-Derived Metabolic Gene Families
• In the aquatic viral communities sampled, host-derived genes were found widely distributed in significant proportions
• Quantitative PCR of these genes confirmed high abundance
• Not known if these genes were expressed at the time of sampling
• Unlikely to see these genes in high abundance if they:
  – Were not expressed
  – Did not have a fitness advantage

“Suggests that viruses may play a more substantial role in environmentally relevant metabolic processes than previously recognized such as the conversion of light to energy, photoadaptation, phosphate acquisition, and carbon metabolism”

Potential Evolutionary Viral-Host Relationships
• The study of the cyanophage found that the host-derived genes undergo higher mutation rates than their cyanobacterial nucleotide counterparts
• After phage acquisition, the genes could diversify
• Mutated viral genes could form gene reservoirs for the host
• Through horizontal gene transfer, viruses could promote diversity and distribution

Prochlorococcus – P-SSM4-like Phage
• Prochlorococcus is one of the most widespread picophytoplankton in the ocean
• P-SSM4-like phage may influence the abundance, diversity, and distribution of Prochlorococcus
• Statistically significant relationship between Prochlorococcus and the P-SSM4-like phage

Metagenomic Viral-Microbial Interactions
• This study of viral-microbial association between communities was coincidental
• Horizontal transfer of metabolic genes
• More studies necessary on the viral-microbial diversity and genetic complement
  – Community relationships
  – Evolutionary relationships

Any Questions?
