Class 5: Multiple Sequence Alignment

Multiple sequence alignment
    VTISCTGSSSNIGAG-NHVKWYQQLPG
    VTISCTGTSSNIGS--ITVNWYQQLPG
    LRLSCSSSGFIFSS--YAMYWVRQAPG
    LSLTCTVSGTSFDD--YYSTWVRQPPG
    PEVTCVVVDVSHEDPQVKFNWYVDG--
    ATLVCLISDFYPGA--VTVAWKADS--
    AALGCLVKDYFPEP--VTVSWNSG---
    VSLTCLVKGFYPSD--IAVEWESNG--
Homologous residues are aligned together in columns.
  · Homologous: in the structural and evolutionary sense.
Ideally, a column of aligned residues occupies similar 3D structural positions.

Multiple alignment – why?
• Identify sequences that belong to a family
  · Family: a collection of homologous sequences with similar sequence, 3D structure, function or evolutionary history
• Find features that are conserved across the whole family
  · Highly conserved regions, core structural elements
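
One crude way to look for conserved features is to score each column by the fraction of sequences that share its most common residue. The sketch below applies that measure to the first four sequences of the example alignment. The `conservation` and `fully_conserved` names are hypothetical, and this simple identity measure is an illustrative choice, not a scoring method taken from these slides.

```python
from collections import Counter

GAP = "-"

def columns(msa):
    """Return the columns of an alignment given as equal-length row strings."""
    if len({len(row) for row in msa}) != 1:
        raise ValueError("all rows of an alignment must have the same length")
    return ["".join(col) for col in zip(*msa)]

def conservation(column):
    """Fraction of sequences carrying the column's most common residue (gaps never count)."""
    residues = Counter(c for c in column if c != GAP)
    if not residues:
        return 0.0
    return residues.most_common(1)[0][1] / len(column)

def fully_conserved(msa):
    """Indices (0-based) of columns in which every sequence has the same residue."""
    return [i for i, col in enumerate(columns(msa)) if conservation(col) == 1.0]

msa = [
    "VTISCTGSSSNIGAG-NHVKWYQQLPG",
    "VTISCTGTSSNIGS--ITVNWYQQLPG",
    "LRLSCSSSGFIFSS--YAMYWVRQAPG",
    "LSLTCTVSGTSFDD--YYSTWVRQPPG",
]
print(fully_conserved(msa))  # [4, 20, 23, 25, 26]
```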

The relation between the divergence of sequence and structure [Durbin p. 137, redrawn from data in Chothia and Lesk (1986)]

Scoring a multiple alignment (1)
Important features of a multiple alignment:
• Some positions are more conserved than others, so scoring should be position-specific
• Sequences are not independent (they are related by a phylogenetic tree)
Ideally, we would specify a complete model of molecular sequence evolution.

Scoring a multiple alignment (2)
Unfortunately, not enough data …
Assumption (1): Columns of the alignment are statistically independent.
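
Under Assumption (1) the score of an alignment decomposes into a sum of column scores. A common way to write this (as in Durbin et al.), assuming gap costs are collected into a separate term G and writing m_i for column i of alignment m:

```latex
S(m) = G + \sum_{i} S(m_i)
```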

Minimum entropy
Assumption (2): Symbols within a column are independent.
Entropy measure:
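
The entropy formula on this slide did not survive extraction. Below is a minimal sketch of the standard minimum-entropy column score as given by Durbin et al., S(m_i) = -Σ_a c_ia log p_ia with p_ia = c_ia / Σ_a' c_ia', where c_ia counts symbol a in column i. Whether gaps count as a symbol is an assumption here, exposed as the hypothetical `ignore_gaps` parameter.

```python
import math
from collections import Counter

GAP = "-"

def entropy_score(column, ignore_gaps=True):
    """Minimum-entropy column score: S(m_i) = -sum_a c_ia * log(p_ia).

    c_ia is the count of symbol a in the column and p_ia = c_ia / sum_a' c_ia'.
    Lower is better: a fully conserved column scores 0.
    """
    counts = Counter(c for c in column if not (ignore_gaps and c == GAP))
    total = sum(counts.values())
    return -sum(c * math.log(c / total) for c in counts.values())
```

A fully conserved column scores 0 and more variable columns score higher, so the preferred alignment is the one that minimises the sum of column scores.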

Sum of pairs (SP)
Columns are scored by a “sum of pairs” function, using a substitution scoring matrix.
Note:
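
A minimal sketch of SP column scoring: the score of a column is the sum of s(a, b) over all N(N-1)/2 pairs of rows. A real analysis would use a substitution matrix such as BLOSUM62; `pair_score` here is a hypothetical stand-in (match +1, mismatch -1, residue against gap -2, gap against gap 0), and any callable with the same signature can be passed as `s`.

```python
from itertools import combinations

GAP = "-"

def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Hypothetical stand-in for a substitution matrix with a linear gap cost."""
    if a == GAP and b == GAP:
        return 0  # a gap aligned to a gap is conventionally scored 0
    if a == GAP or b == GAP:
        return gap
    return match if a == b else mismatch

def sp_column(column, s=pair_score):
    """Sum-of-pairs score of one column: sum of s(a_k, a_l) over all pairs k < l."""
    return sum(s(a, b) for a, b in combinations(column, 2))

def sp_alignment(msa, s=pair_score):
    """Score an alignment as the sum of independent SP column scores."""
    return sum(sp_column(col, s) for col in zip(*msa))
```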

Multidimensional DP
Complexity, for N sequences of length L:
  Space: O(L^N)
  Time: O(2^N L^N)
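
A minimal sketch of exact multidimensional DP for SP scoring with a linear gap cost. It makes the cost visible: the lattice has one cell per combination of prefix lengths ((L+1)^N cells for N sequences of length L), and each cell is reached by 2^N - 1 moves, one per non-empty subset of sequences that advance by a residue while the rest take a gap. The scoring values are hypothetical, and the dictionary-based lattice favours clarity over speed, so it is only practical for a few short sequences.

```python
from itertools import combinations, product

GAP = "-"

def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Hypothetical scoring: +1 match, -1 mismatch, -2 residue/gap, 0 gap/gap."""
    if a == GAP and b == GAP:
        return 0
    if a == GAP or b == GAP:
        return gap
    return match if a == b else mismatch

def sp_column(column):
    """Sum-of-pairs score of one alignment column."""
    return sum(pair_score(a, b) for a, b in combinations(column, 2))

def multidim_align(seqs):
    """Exact SP-optimal alignment of all sequences by N-dimensional DP (tiny inputs only)."""
    n = len(seqs)
    origin = (0,) * n
    end = tuple(len(s) for s in seqs)
    # A move is a non-zero 0/1 vector: 1 = that sequence consumes a residue, 0 = it gets a gap.
    moves = [m for m in product((0, 1), repeat=n) if any(m)]
    score, back = {origin: 0}, {}
    # product over the ranges visits cells in lexicographic order, so predecessors come first.
    for cell in product(*(range(k + 1) for k in end)):
        if cell == origin:
            continue
        for m in moves:
            prev = tuple(c - d for c, d in zip(cell, m))
            if min(prev) < 0:
                continue
            column = [seqs[k][prev[k]] if m[k] else GAP for k in range(n)]
            cand = score[prev] + sp_column(column)
            if cell not in score or cand > score[cell]:
                score[cell], back[cell] = cand, m
    # Trace back from the far corner to rebuild the aligned rows.
    rows, cell = [[] for _ in range(n)], end
    while cell != origin:
        m = back[cell]
        prev = tuple(c - d for c, d in zip(cell, m))
        for k in range(n):
            rows[k].append(seqs[k][prev[k]] if m[k] else GAP)
        cell = prev
    return score[end], ["".join(reversed(r)) for r in rows]
```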

Pairwise projections of MA
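
A pairwise projection keeps two rows of a multiple alignment and drops the columns where both are gaps. A minimal sketch with hypothetical function names and the same kind of toy scoring as a stand-in for a substitution matrix. The final assertion checks the property that makes projections useful: with a linear gap cost and s(-,-) = 0, the SP score of the multiple alignment equals the sum of the scores of all its pairwise projections.

```python
from itertools import combinations

GAP = "-"

def project(msa, k, l):
    """Pairwise projection onto rows k and l: keep both rows, drop all-gap columns."""
    pairs = [(a, b) for a, b in zip(msa[k], msa[l]) if not (a == GAP and b == GAP)]
    return "".join(a for a, _ in pairs), "".join(b for _, b in pairs)

def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Hypothetical scoring: +1 match, -1 mismatch, -2 residue/gap, 0 gap/gap."""
    if a == GAP and b == GAP:
        return 0
    if a == GAP or b == GAP:
        return gap
    return match if a == b else mismatch

def pairwise_score(x, y):
    """Score of an already-aligned pair of rows (linear gap cost)."""
    return sum(pair_score(a, b) for a, b in zip(x, y))

def sp_score(msa):
    """SP score of the whole multiple alignment, column by column."""
    return sum(pair_score(a, b) for col in zip(*msa) for a, b in combinations(col, 2))

msa = ["AC-G", "A--G", "ACTG"]
assert sp_score(msa) == sum(pairwise_score(*project(msa, k, l))
                            for k, l in combinations(range(len(msa)), 2))
```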

MSA (i) [Carrillo and Lipman, 1988]
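
The slide content did not survive extraction. The bound usually associated with Carrillo and Lipman can be stated as follows, assuming SP scoring with a linear gap cost so that the score S(a) of a multiple alignment a is the sum of the scores of its pairwise projections a^{kl}. Let σ be the score of any heuristic multiple alignment (a lower bound on the optimum) and Ŝ(k,l) the optimal pairwise score of sequences k and l, so S(a^{kl}) ≤ Ŝ(k,l) for every pair. For the optimal alignment a:

```latex
S(a^{kl}) = S(a) - \sum_{(k',l') \neq (k,l)} S(a^{k'l'})
          \;\ge\; \sigma - \sum_{(k',l') \neq (k,l)} \hat{S}(k',l') \;=:\; \beta_{kl}
```

So the optimal path, projected onto any pair (k, l), scores at least β_kl, and lattice cells whose pairwise projections lie on no such path can be skipped.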

MSA (ii)

MSA (iii) Algorithm sketch
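
A minimal sketch of the Carrillo and Lipman pruning test, assuming SP scoring with a linear gap cost and a hypothetical ±1 substitution score. For a pair (k, l), the bound is β_kl = σ minus the optimal pairwise scores of all other pairs, where σ is the score of any heuristic multiple alignment. Only cells (i, j) of the pairwise DP matrix whose best path through (i, j), forward score plus backward score, reaches β_kl are kept. The function names and the `pair_opt` dictionary are illustrative; how the MSA program combines these regions to restrict the N-dimensional search is not shown.

```python
def simple_score(a, b):
    """Hypothetical substitution score: +1 for a match, -1 for a mismatch."""
    return 1 if a == b else -1

def nw_table(x, y, s=simple_score, gap=-2):
    """Prefix scores F[i][j]: best global score of x[:i] against y[:j], linear gap cost."""
    F = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i in range(1, len(x) + 1):
        F[i][0] = i * gap
    for j in range(1, len(y) + 1):
        F[0][j] = j * gap
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)
    return F

def carrillo_lipman_bound(sigma, pair_opt, k, l):
    """beta_kl = sigma minus the optimal pairwise scores of every pair other than (k, l)."""
    return sigma - sum(v for pair, v in pair_opt.items() if pair != (k, l))

def allowed_cells(x, y, bound, s=simple_score, gap=-2):
    """Cells (i, j) that lie on at least one alignment of x and y scoring >= bound."""
    fwd = nw_table(x, y, s, gap)
    # Best score of the suffixes x[i:], y[j:], read from the table of the reversed strings.
    bwd = nw_table(x[::-1], y[::-1], s, gap)
    n, m = len(x), len(y)
    return {(i, j) for i in range(n + 1) for j in range(m + 1)
            if fwd[i][j] + bwd[n - i][m - j] >= bound}
```

In use, `pair_opt[(k, l)]` would be `nw_table(seq_k, seq_l)[-1][-1]`, and a tighter σ (a better heuristic alignment) shrinks every allowed region.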

Progressive alignment methods (i)
Basic idea: construct a succession of pairwise (PW) alignments.
Variations:
• PW alignment order
• One growing alignment or subfamilies
• Alignment and scoring procedure

Progressive alignment methods (ii)
Most important heuristic: align the most similar pairs first.
Many algorithms build a “guide tree”:
• Leaves: sequences
• Interior nodes: alignments
• Root: complete multiple alignment
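
A minimal sketch of building a guide tree by hierarchical clustering of a distance matrix. UPGMA (average linkage) is used here only because it is the simplest such method; it is not necessarily what any particular program uses, and the `upgma` function is hypothetical. The nested tuples record the merge order, that is, which sequences or groups would be aligned first.

```python
def upgma(names, dist):
    """Guide tree by UPGMA (average-linkage) clustering.

    names: leaf labels; dist: symmetric matrix (list of lists) of pairwise distances.
    Returns nested tuples whose nesting records the order of the merges.
    """
    trees = dict(enumerate(names))  # cluster id -> subtree
    sizes = {i: 1 for i in trees}   # cluster id -> number of leaves
    d = {(i, j): dist[i][j] for i in trees for j in trees if i < j}
    next_id = len(names)
    while len(trees) > 1:
        i, j = min(d, key=d.get)    # closest pair = most similar groups
        new, next_id = next_id, next_id + 1
        # Average-linkage distance from the merged cluster to every other cluster.
        for k in trees:
            if k not in (i, j):
                dik = d[(min(i, k), max(i, k))]
                djk = d[(min(j, k), max(j, k))]
                d[(k, new)] = (sizes[i] * dik + sizes[j] * djk) / (sizes[i] + sizes[j])
        trees[new] = (trees.pop(i), trees.pop(j))
        sizes[new] = sizes.pop(i) + sizes.pop(j)
        d = {p: v for p, v in d.items() if i not in p and j not in p}
    return next(iter(trees.values()))
```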

Feng-Doolittle (1987)
• Calculate all pairwise distances using alignment scores:
• Construct a guide tree using hierarchical clustering
• Highest-scoring pairwise alignment determines sequence-to-group alignment
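
The distance formula on this slide did not survive extraction. As described in Durbin et al., Feng-Doolittle converts alignment scores into distances as D = -log S_eff with S_eff = (S_obs - S_rand) / (S_max - S_rand): S_obs is the observed pairwise alignment score, S_rand the expected score for two random sequences of the same length and composition, and S_max the maximum score (the average of the two self-alignment scores). A minimal sketch; estimating S_rand is left to the caller, and returning infinity when S_eff ≤ 0 is an assumption of this sketch.

```python
import math

def feng_doolittle_distance(s_obs, s_rand, s_max):
    """D = -log(S_eff), with S_eff = (S_obs - S_rand) / (S_max - S_rand)."""
    s_eff = (s_obs - s_rand) / (s_max - s_rand)
    if s_eff <= 0:
        return math.inf  # assumption: no better than random means maximally distant
    return -math.log(s_eff)
```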

Profile alignment
• Use profiles for group-to-sequence and group-to-group alignments
• CLUSTALW (Thompson et al., 1994):
  · Similar to Feng-Doolittle, but uses profile alignment methods
  · Numerous heuristics
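
The key ingredient of profile alignment is the score for placing one alignment column opposite another. A minimal sketch under SP scoring, assuming a linear gap cost and s(-,-) = 0 so that only pairs crossing the two groups contribute. Using this score in place of s(x, y) in ordinary pairwise DP aligns group to group, and a single sequence is simply a group of one. The `profile` and `cross_score` names and the scoring values are hypothetical; CLUSTALW's own profile scoring (with sequence weighting and other heuristics) differs.

```python
from collections import Counter
from itertools import product

GAP = "-"

def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Hypothetical scoring: +1 match, -1 mismatch, -2 residue/gap, 0 gap/gap."""
    if a == GAP and b == GAP:
        return 0
    if a == GAP or b == GAP:
        return gap
    return match if a == b else mismatch

def profile(group):
    """Column-wise symbol counts of an alignment given as equal-length rows."""
    return [Counter(col) for col in zip(*group)]

def cross_score(counts_a, counts_b):
    """SP score of placing one profile column opposite another.

    Sums s(a, b) over every pair of sequences that crosses the two groups, using
    counts so each column pair needs at most (alphabet size)^2 score lookups.
    """
    return sum(na * nb * pair_score(a, b)
               for a, na in counts_a.items() for b, nb in counts_b.items())

group_a = ["ACG", "ACG", "CCG"]
group_b = ["AG", "GG"]
pa, pb = profile(group_a), profile(group_b)
# Brute-force check: same value as summing s(a, b) over all cross pairs in the two columns.
col_a, col_b = [row[0] for row in group_a], [row[0] for row in group_b]
assert cross_score(pa[0], pb[0]) == sum(pair_score(a, b) for a, b in product(col_a, col_b))
```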

Iterative Refinement
• Addresses the “frozen” sub-alignment problem
• Iteratively realign sequences or groups to a profile of the rest
• Barton and Sternberg (1987):
  · Align the two most similar sequences
  · Align the current profile to the most similar remaining sequence
  · Remove each sequence and realign it to the profile of the rest
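
A minimal sketch of the leave-one-out step (remove each sequence and realign it to the profile of the rest), assuming SP scoring with a linear gap cost and s(-,-) = 0. Under these assumptions a single realignment never lowers the SP score, because the old placement of the sequence is one of the alignments the DP considers. Names and scoring values are hypothetical, and building the initial alignment is not shown.

```python
from collections import Counter

GAP = "-"

def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Hypothetical scoring: +1 match, -1 mismatch, -2 residue/gap, 0 gap/gap."""
    if a == GAP and b == GAP:
        return 0
    if a == GAP or b == GAP:
        return gap
    return match if a == b else mismatch

def col_vs(column, symbol):
    """SP contribution of placing `symbol` (from the realigned sequence) under `column`."""
    return sum(n * pair_score(c, symbol) for c, n in Counter(column).items())

def align_to_profile(rest, seq):
    """Globally align ungapped `seq` to the fixed alignment `rest`; seq becomes the last row."""
    cols = list(zip(*rest))
    gap_col = (GAP,) * len(rest)
    n, m = len(cols), len(seq)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    back = {}
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            moves = []  # (score, predecessor cell, emitted column); ties favour the first
            if i and j:
                moves.append((F[i - 1][j - 1] + col_vs(cols[i - 1], seq[j - 1]),
                              (i - 1, j - 1), cols[i - 1] + (seq[j - 1],)))
            if i:
                moves.append((F[i - 1][j] + col_vs(cols[i - 1], GAP),
                              (i - 1, j), cols[i - 1] + (GAP,)))
            if j:
                moves.append((F[i][j - 1] + col_vs(gap_col, seq[j - 1]),
                              (i, j - 1), gap_col + (seq[j - 1],)))
            F[i][j], prev, col = max(moves, key=lambda mv: mv[0])
            back[(i, j)] = (prev, col)
    cell, out = (n, m), []
    while cell != (0, 0):
        cell, col = back[cell]
        out.append(col)
    return ["".join(row) for row in zip(*reversed(out))]

def remove_row(msa, k):
    """Drop row k, then drop columns that are gaps in every remaining row."""
    rest = msa[:k] + msa[k + 1:]
    keep = [c for c in range(len(msa[0])) if any(row[c] != GAP for row in rest)]
    return ["".join(row[c] for c in keep) for row in rest]

def refine_once(msa):
    """One leave-one-out pass: realign each sequence in turn to the profile of the rest."""
    for k in range(len(msa)):
        merged = align_to_profile(remove_row(msa, k), msa[k].replace(GAP, ""))
        # merged holds the other rows in order plus the realigned row last; put it back at k.
        msa = merged[:k] + [merged[-1]] + merged[k:-1]
    return msa
```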