
Comparative Genomics
• Virulence in E. coli
• Diversity of Genomes
• How Many Genomes are There?
• Different Genome Perspectives

Virulence in E. coli
• 1997: Fred Blattner's lab at the University of Wisconsin sequenced the E. coli K-12 strain.
• 2001: the pathogenic strain O157:H7 was sequenced.
• This strain causes hemorrhagic colitis, which affects 75,000 people each year.
• Its genome is 5.5 Mb instead of 4.6 Mb.
• It has 1.3 Mb of "O-islands" not found in K-12; K-12 has 0.5 Mb of "K-islands" not found in O157:H7 (1,387 and 528 genes, respectively).

Island Genes
• Many of the genes unique to O157:H7 are predicted to be virulence genes, including toxins, metabolic pathways, transporters, and adhesion molecules. K-12, however, also has genes in these categories, yet the strain is not virulent.
• A striking feature of both the O-islands and the K-islands is their base composition, which differs from that of the backbone.
• Many of the island genes have orthologs in other species and viruses and may have resulted from horizontal transfer.
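As a minimal sketch of how a difference in base composition might be measured, the snippet below counts bases and computes GC content for two sequences. The function name `base_composition` and both toy sequences are illustrative assumptions; they are not data from either E. coli genome.

```python
from collections import Counter


def base_composition(seq):
    """Return (counts, gc_fraction) for a DNA sequence, ignoring non-ACGT symbols."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    gc = (counts["G"] + counts["C"]) / total if total else 0.0
    return {base: counts[base] for base in "ACGT"}, gc


# Toy sequences (hypothetical): a "backbone" window and an AT-rich "island" window.
backbone = "ATGCGCGATCGGCTAGCGCC"
island = "ATTTAAATATTGCAATTTAA"

for name, seq in [("backbone", backbone), ("island", island)]:
    counts, gc = base_composition(seq)
    print(name, counts, f"GC = {gc:.2f}")
```

In practice the counts would come from sliding windows along a real genome, and the per-base counts are the kind of input a chi-square comparison needs.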

Chi-square Analysis
How can we tell whether base compositions, such as those associated with O- and K-islands, really are different from the norm?

Base     Seq 1    Seq 2    Total
A        1,000      600    1,600
C        1,000      800    1,800
G        1,000      700    1,700
T        1,000      900    1,900
Total    4,000    3,000    7,000

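Hypothesis: the base composition is equal

Each expected value is (row total x column total) / grand total.

Cell        Observed (O)   Expected (E)   (O - E)^2   (O - E)^2/E
A, Seq 1       1,000           914.3       7,344.5        8.03
C, Seq 1       1,000         1,028.6         818.0        0.80
G, Seq 1       1,000           971.4         818.0        0.84
T, Seq 1       1,000         1,085.7       7,344.5        6.76
A, Seq 2         600           685.7       7,344.5       10.71
C, Seq 2         800           771.4         818.0        1.06
G, Seq 2         700           728.6         818.0        1.12
T, Seq 2         900           814.3       7,344.5        9.02

χ² = sum of (O - E)^2/E ≈ 38.3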
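The calculation can be sketched in a few lines of Python. This is a minimal illustration that uses the base counts from the Seq 1 / Seq 2 table; the function name `chi_square` is an assumption, and the 5% critical value for 3 degrees of freedom (7.815) is taken from standard chi-square tables. Because the code does not round intermediate values, its result can differ in the last digit from a hand calculation.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if composition were independent of the sequence.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof


# Rows are bases A, C, G, T; columns are Seq 1 and Seq 2.
observed = [
    [1000, 600],
    [1000, 800],
    [1000, 700],
    [1000, 900],
]

stat, dof = chi_square(observed)
CRITICAL_5PCT_DOF3 = 7.815  # standard table value for 3 degrees of freedom
print(f"chi-square = {stat:.2f} with {dof} degrees of freedom")
print("compositions differ at the 5% level:", stat > CRITICAL_5PCT_DOF3)
```

A statistic above the critical value means the hypothesis of equal base composition is rejected at that significance level.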

Differences Between Two Strains
• Virulence may be due to genes on the "O-islands" or to differences between shared genes.
• Although the strains share 75% of their DNA, only 25% of their genes are identical.
• The rest have at least 1 base difference.
• While this amount of difference is small, it can mean the difference between healthy individuals and those with sickle-cell anemia or cystic fibrosis.

460 Genomes, and counting…
• The more genomes we sequence, the more evident their wide diversity becomes.
• These genomes range in size from 0.5 to 10 Mb and in GC content from 25% to 75%. Genome size and GC content seem to correlate, since GTP and CTP take more energy to make.
• One trend is that stable niches tend to accommodate small genomes, while volatile environments do not.
• One thing that remains fairly constant is coding capacity: prokaryotes all have about 1 gene/kb.
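The 1 gene/kb rule of thumb gives a quick way to estimate gene counts from genome size. The sketch below is only illustrative: the function name `estimate_gene_count` is an assumption, and the sizes are the figures quoted in these slides (the two E. coli strains and the ends of the 0.5 to 10 Mb range).

```python
GENES_PER_KB = 1.0  # rule of thumb quoted in the slide


def estimate_gene_count(genome_size_mb, genes_per_kb=GENES_PER_KB):
    """Rough gene count from genome size, assuming a constant coding density."""
    return round(genome_size_mb * 1000 * genes_per_kb)


for label, size_mb in [
    ("smallest genome", 0.5),
    ("E. coli K-12", 4.6),
    ("E. coli O157:H7", 5.5),
    ("largest genome", 10.0),
]:
    print(f"{label}: {size_mb} Mb -> about {estimate_gene_count(size_mb)} genes")
```

These are order-of-magnitude estimates; real annotated gene counts depend on the genome.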

Circular Prokaryotic Chromosomes
• Another thing we have learned is that not all prokaryotic chromosomes are circular. Three distantly related groups of bacteria have linear chromosomes that seem to have evolved independently.
• With regard to chromosome number, some confusion exists over whether particular pieces of DNA are chromosomes or plasmids. Two criteria are used to define a chromosome:
  1) Does it contain essential genes?
  2) Does it contain ribosomal genes?

Genomes are Constantly Changing
• The size of a genome may change rapidly due to horizontal transfer or fusing of genomes.
• The cost of replicating additional DNA must be balanced against the benefit of having genes that may lend a selective advantage.
• If the cell evolves to fill a new niche, losing unused genes may be advantageous.
• Most bacteria in similar niches have similar-sized genomes. Gut bacteria, for instance, have genomes in the 4-5 Mb range.

How Many Genomes are There?


Experimental Procedures
• 1,500 liters of surface water were collected 7 times from 4 different sites around the sea.
• This was passed through filters that trapped particles between 0.1 and 3 μm.
• Collected cells were lysed and their DNA cut into <1 kb pieces, which were then cloned.
• Genomic DNA was extracted from the filters and subjected to shotgun sequencing.

Results:
• About 1 million separate sequences were obtained, totaling 1.6 billion base pairs of DNA.
• At least 1,412 different rRNA genes are represented in this sample, including 148 that are new to the database.
• Using 6 other genes for comparison, a range of 341-569 phylotypes (i.e., species) was sampled (including 12 complete genomes).
• As the cost of sequencing DNA continues to drop, this approach may become the "next wave" of research into biodiversity.

Sampling Problems
• One problem with this method is that it favors more abundant species. The coverage for a particular gene in an abundant species is better, and a greater number of genes per species are sampled.
• 53% of all DNA from sample #1 was from two genera: Shewanella and Burkholderia. This is a mystery, since the former prefers nutrient-rich water and the latter is usually terrestrial.
• Calculations to correct for lost species estimate that 1,800 different species may have been present.
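The slides do not say which correction was used to arrive at the estimate of 1,800 species. As one widely used option, the sketch below implements the Chao1 richness estimator, which infers the number of unseen species from how many were observed exactly once or twice. The function name `chao1` and the abundance counts are hypothetical and are not taken from the study.

```python
def chao1(abundances):
    """Chao1 richness estimate from per-species observation counts."""
    observed = sum(1 for n in abundances if n > 0)
    singletons = sum(1 for n in abundances if n == 1)
    doubletons = sum(1 for n in abundances if n == 2)
    if doubletons == 0:
        # Bias-corrected form, used when no species was seen exactly twice.
        return observed + singletons * (singletons - 1) / 2
    return observed + singletons ** 2 / (2 * doubletons)


# Hypothetical sample: the number of times each species was observed.
counts = [50, 30, 5, 2, 2, 1, 1, 1, 1]
print(f"observed species: {len(counts)}, Chao1 estimate: {chao1(counts):.1f}")
```

Many singletons relative to doubletons signal that rare species were probably missed, so the estimate rises well above the observed count.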

New Genes Discovered
• A total of 1.2 million genes were characterized in this study, including 70,000 novel ones.
• Bacteriorhodopsin was one popular gene family: previous sampling using PCR had uncovered 67 homologs, but this study found 782 new ones. 13 families of bacteriorhodopsin were characterized, from a wider range of bacteria than previously thought.
• One must keep in mind that these data were collected using 1.5 x 10^3 liters of water, while the ocean's estimated volume is 1.37 x 10^21 liters.

Families of Bacteriorhodopsin

Different Genome Perspectives
• What you see using comparative genomics depends on what perspective you take.
• Zooming out, from small to large, we get:
  1) amino acids
  2) gene families
  3) segments of chromosomes
  4) whole chromosomes

Out with the Old, In with the New
• One group decided to look at proteomes at the amino acid level. Instead of worrying about the proteins encoded, the researchers identified amino acids that were identical in 2 distantly related species but different in 2 closely related species. This focuses on evolutionary drift.
• One pattern was seen: amino acids predicted to be among the first incorporated into the genetic code are decreasing in frequency, while those predicted to be newer are increasing. This is true across all 3 domains of life.
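One simplified reading of this comparison uses three aligned protein sequences: two close relatives and a more distant one that serves as a reference. Where the close relatives differ and one of them matches the distant sequence, the matching residue is treated as ancestral. The sketch below makes that assumption explicit; the function name `gains_and_losses` and the toy sequences are hypothetical, and the researchers' actual procedure may have differed.

```python
from collections import Counter


def gains_and_losses(sister_a, sister_b, outgroup):
    """Count inferred amino acid gains and losses from three aligned sequences.

    At a column where the sisters differ and one of them matches the outgroup,
    the outgroup residue is taken as ancestral: it counts as lost, and the
    other sister's residue counts as gained.
    """
    gained, lost = Counter(), Counter()
    for a, b, out in zip(sister_a, sister_b, outgroup):
        if "-" in (a, b, out) or a == b:
            continue  # skip gaps and columns where the sisters agree
        if a == out:
            lost[a] += 1
            gained[b] += 1
        elif b == out:
            lost[b] += 1
            gained[a] += 1
        # If neither sister matches the outgroup, the direction is ambiguous.
    return gained, lost


# Toy alignment (hypothetical), one letter per amino acid.
gained, lost = gains_and_losses("MGAPLSG", "MGCPLSF", "MGAPMSG")
for residue in sorted(set(gained) | set(lost)):
    print(residue, "net change:", gained[residue] - lost[residue])
```

Summed over many proteins and species triplets, the net change per amino acid shows which residues are increasing or decreasing in frequency.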

Figure 3.4

Gene Family Level
• A German group led by Svante Pääbo studied the evolution of olfactory receptor (OR) genes in 19 primates plus mouse. They plotted the proportion of OR pseudogenes in each species studied.
• New World monkeys clustered around 18% pseudogenes, while Old World monkeys had around 30%. Humans had >50% pseudogenes. The one exception is the howler monkey, a New World monkey that seems out of place.
• Interestingly, all New World monkeys see in 2 colors, with the exception of the howler monkey, which sees in 3 colors like Old World monkeys.

Whole Chromosome Level
Evan Eichler at Case Western Reserve examined human chromosome 7, looking for recombination hot spots. There were a total of 27: 12 on the short arm (p) and 15 on the long arm (q).
A team of researchers mapped the recombination events that have produced syntenic regions in human, mouse, rat, and dog.
Canine tricuspid valve malformation (CTVM) is a genetic disease in dogs that leads to thickened heart valves; it has been mapped to canine chromosome 9. This region is syntenic with chromosome 17 in humans.

Dot Plots of Recombination


Comparing 4 Chromosomes
• When all 4 chromosomes (dog, human, mouse, and rat) are compared simultaneously, colored lines are used to highlight the recombinational hotspots, with shaded regions showing the 2 large human recombined areas.
• Crossing lines show inversions, while bent lines that do not cross show translocations.
• The site of recombination, as well as gene loss, is often conserved across species. Highly repetitive DNA is often involved in recombination.
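The idea that crossing lines mark inversions can be sketched with marker positions alone: two connecting lines cross exactly when the two markers appear in opposite order on the two chromosomes. The function name `crossing_pairs` and the marker coordinates below are hypothetical, and the sketch covers only the crossing-line case, not translocations.

```python
from itertools import combinations


def crossing_pairs(links):
    """Return pairs of links whose connecting lines cross.

    Each link is (position_on_chromosome_1, position_on_chromosome_2) for one
    shared marker. Two lines cross when the markers appear in opposite order
    on the two chromosomes.
    """
    crossings = []
    for (a1, b1), (a2, b2) in combinations(links, 2):
        if (a1 - a2) * (b1 - b2) < 0:
            crossings.append(((a1, b1), (a2, b2)))
    return crossings


# Hypothetical markers: the block at positions 3-5 is reversed on chromosome 2.
links = [(1, 1), (2, 2), (3, 5), (4, 4), (5, 3), (6, 6)]
for first, second in crossing_pairs(links):
    print("lines cross:", first, second)
```

Here every crossing involves only the three markers in the reversed block, which is the signature of a single inversion.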

Most Recent Common Ancestor Chromosomes can be Constructed using recombination data.
