Class 5: Multiple Sequence Alignment
- Slides: 20
Multiple sequence alignment
VTISCTGSSSNIGAG-NHVKWYQQLPG
VTISCTGTSSNIGS--ITVNWYQQLPG
LRLSCSSSGFIFSS--YAMYWVRQAPG
LSLTCTVSGTSFDD--YYSTWVRQPPG
PEVTCVVVDVSHEDPQVKFNWYVDG-
ATLVCLISDFYPGA--VTVAWKADS-
AALGCLVKDYFPEP--VTVSWNSG--
VSLTCLVKGFYPSD--IAVEWESNG--
Homologous residues are aligned together in columns
  · Homologous in the structural and evolutionary sense
Ideally, a column of aligned residues occupies similar 3D structural positions
Multiple alignment – why?
- Identify sequences that belong to a family
  · Family – a collection of homologous sequences with similar sequence, 3D structure, function, or evolutionary history
- Find features that are conserved across the whole family
  · Highly conserved regions, core structural elements
The relation between the divergence of sequence and structure [Durbin p. 137, redrawn from data in Chothia and Lesk (1986)]
Scoring a multiple alignment (1)
Important features of a multiple alignment:
- Some positions are more conserved than others, which calls for position-specific scoring
- Sequences are not independent (they are related by a phylogenetic tree)
Ideally, specify a complete model of molecular sequence evolution
Scoring a multiple alignment (2)
Unfortunately, not enough data …
Assumption (1): Columns of the alignment are statistically independent.
Minimum entropy
Assumption (2): Symbols within columns are independent.
Each column is scored with an entropy measure.
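A minimal sketch of a minimum-entropy column score, assuming the commonly used form S(m_i) = -Σ_a c_ia log2 p_ia, where c_ia is the count of symbol a in column i and p_ia is estimated as the raw frequency c_ia / Σ_a' c_ia'. The function names are illustrative, and counting the gap character as an ordinary symbol is an assumption made here for simplicity. Lower scores mean more conserved columns; under the column-independence assumption the alignment score is the sum over columns.

```python
import math
from collections import Counter


def column_entropy_score(column):
    """Entropy-style score of one alignment column (0 for a fully conserved column)."""
    counts = Counter(column)          # c_ia: count of each symbol in the column
    total = sum(counts.values())
    # -sum_a c_ia * log2(p_ia), with p_ia = c_ia / total (raw frequency estimate)
    return -sum(c * math.log2(c / total) for c in counts.values())


def alignment_entropy_score(rows):
    """Sum of column scores; assumes all rows have equal length."""
    return sum(column_entropy_score(col) for col in zip(*rows))
```

For example, a column such as `AAAA` scores 0, while `AACC` scores 4 bits, so a minimum-entropy criterion prefers alignments whose columns are dominated by a single residue.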
Sum of pairs (SP)
Columns are scored by a “sum of pairs” function, using a substitution scoring matrix
Note:
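A short sketch of sum-of-pairs column scoring: every pair of symbols in a column is scored and the results are added up. The `toy_score` function stands in for a real substitution matrix (such as BLOSUM or PAM), and the linear `gap_penalty` with a gap-gap score of 0 is an assumption, not something the slides specify.

```python
from itertools import combinations


def toy_score(a, b):
    """Stand-in for a substitution matrix lookup: +1 match, -1 mismatch."""
    return 1 if a == b else -1


def sp_column_score(column, score=toy_score, gap_penalty=-2):
    """Sum the pairwise scores over all unordered pairs of symbols in a column."""
    total = 0
    for a, b in combinations(column, 2):
        if a == "-" and b == "-":
            continue                  # gap against gap contributes nothing
        if a == "-" or b == "-":
            total += gap_penalty      # residue against gap
        else:
            total += score(a, b)      # residue against residue
    return total


def sp_alignment_score(rows, score=toy_score, gap_penalty=-2):
    """Alignment score as the sum of column scores (rows must have equal length)."""
    return sum(sp_column_score(col, score, gap_penalty) for col in zip(*rows))
```

With N sequences each column contributes N(N-1)/2 pair terms, so the cost of scoring a column grows quadratically with the number of sequences.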
Multidimensional DP
Complexity, for N sequences of length L:
- Space: O(L^N)
- Time: O(2^N L^N)
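To make the cost of multidimensional dynamic programming concrete, here is a sketch for exactly three sequences under sum-of-pairs scoring. The table has one cell per index triple (about L^3 cells for sequences of length L), and each cell looks at 2^3 - 1 = 7 predecessor moves; for N sequences these become L^N cells and 2^N - 1 moves. The scoring values (+1 match, -1 mismatch, -1 per residue-gap pair) and all function names are assumptions chosen for illustration.

```python
from itertools import product

GAP = "-"


def pair_score(x, y, match=1, mismatch=-1, gap=-1):
    """Toy pairwise score; gap against gap scores 0."""
    if x == GAP and y == GAP:
        return 0
    if x == GAP or y == GAP:
        return gap
    return match if x == y else mismatch


def column_score(x, y, z):
    """Sum-of-pairs score of a three-symbol column."""
    return pair_score(x, y) + pair_score(x, z) + pair_score(y, z)


def best_sp_score_3(a, b, c):
    """Optimal sum-of-pairs score of a global alignment of three sequences."""
    # Each move says which sequences consume a residue; (0, 0, 0) is excluded.
    moves = [m for m in product((0, 1), repeat=3) if any(m)]
    best = {(0, 0, 0): 0}
    cells = product(range(len(a) + 1), range(len(b) + 1), range(len(c) + 1))
    for i, j, k in cells:             # lexicographic order visits predecessors first
        if (i, j, k) == (0, 0, 0):
            continue
        candidates = []
        for di, dj, dk in moves:
            pi, pj, pk = i - di, j - dj, k - dk
            if min(pi, pj, pk) < 0:
                continue              # move would step outside the table
            x = a[pi] if di else GAP
            y = b[pj] if dj else GAP
            z = c[pk] if dk else GAP
            candidates.append(best[(pi, pj, pk)] + column_score(x, y, z))
        best[(i, j, k)] = max(candidates)
    return best[(len(a), len(b), len(c))]
```

This version returns only the optimal score; recovering the alignment itself would need a traceback over the same table.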
Pairwise projections of MA
MSA (i) [Carrillo and Lipman, 1988]
MSA (ii)
MSA (iii) Algorithm sketch
Progressive alignment methods (i)
Basic idea: construct a succession of pairwise (PW) alignments
Variations:
- PW alignment order
- One growing alignment or subfamilies
- Alignment and scoring procedure
Progressive alignment methods (ii)
Most important heuristic: align the most similar pairs first.
Many algorithms build a “guide tree”:
- Leaves – sequences
- Interior nodes – alignments
- Root – complete multiple alignment
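The guide-tree idea can be sketched as a simple agglomerative clustering over pairwise distances: the closest pair is joined first, and the join order is the order in which a progressive method would align. This sketch uses average linkage as a stand-in; it is not the clustering procedure of any particular published program. The `dist` dictionary keyed by `frozenset` pairs and the nested-tuple tree representation are assumptions made for illustration.

```python
from itertools import combinations


def build_guide_tree(names, dist):
    """Return a nested-tuple guide tree built by average-linkage clustering.

    dist maps frozenset({a, b}) to the distance between sequences a and b.
    """
    # Each cluster is (subtree, list of member sequence names).
    clusters = [(name, [name]) for name in names]

    def average_distance(c1, c2):
        pairs = [(x, y) for x in c1[1] for y in c2[1]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        # Pick the two closest clusters: "align the most similar pairs first".
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        left, right = clusters[i], clusters[j]
        merged = ((left[0], right[0]), left[1] + right[1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]
```

In the returned tree, leaves are sequence names, each inner tuple corresponds to an alignment of its two children, and the outermost tuple corresponds to the complete multiple alignment.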
Feng-Doolittle (1987)
- Calculate all pairwise distances using alignment scores
- Construct a guide tree using hierarchical clustering
- Highest scoring pairwise alignment determines sequence to group alignment
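A sketch of turning pairwise alignment scores into distances, assuming the form usually quoted in textbook accounts of Feng-Doolittle: D = -log S_eff, with S_eff = (S_obs - S_rand) / (S_max - S_rand). Here S_obs is the observed pairwise alignment score, S_max is the average of the two self-alignment scores, and S_rand is the expected score for random sequences of the same length and composition. The function name and the error handling are illustrative choices, and how the three scores are obtained is left to the caller.

```python
import math


def feng_doolittle_distance(s_obs, s_rand, s_max):
    """Distance from normalized alignment score: D = -log(S_eff)."""
    if s_max <= s_rand:
        raise ValueError("s_max must exceed s_rand")
    s_eff = (s_obs - s_rand) / (s_max - s_rand)
    if s_eff <= 0:
        # Observed score no better than random: the log is undefined.
        raise ValueError("s_obs must exceed s_rand")
    return -math.log(s_eff)
```

Identical sequences give S_obs = S_max and therefore a distance of 0, and the distance grows without bound as the observed score approaches the random expectation.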
Profile alignment
- Use profiles for group-to-sequence and group-to-group alignments
- CLUSTALW (Thompson et al., 1994):
  · Similar to Feng-Doolittle, but uses profile alignment methods
  · Numerous heuristics
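As a rough illustration of what a profile is, the sketch below summarizes an existing alignment as per-column symbol frequencies and scores a single residue against one profile column by the frequency-weighted average of pair scores. This is a simplified stand-in and not CLUSTALW's actual scheme, which adds sequence weighting and position-specific gap penalties. `toy_score`, `gap_penalty`, and the function names are assumptions.

```python
from collections import Counter


def toy_score(a, b):
    """Stand-in for a substitution matrix lookup: +1 match, -1 mismatch."""
    return 1.0 if a == b else -1.0


def build_profile(rows):
    """Return one dict per column mapping symbol -> relative frequency."""
    profile = []
    for column in zip(*rows):         # rows must have equal length
        counts = Counter(column)
        profile.append({sym: n / len(column) for sym, n in counts.items()})
    return profile


def score_residue_against_column(residue, column_freqs, score=toy_score,
                                 gap_penalty=-1.0):
    """Frequency-weighted average score of a residue against a profile column."""
    total = 0.0
    for sym, freq in column_freqs.items():
        total += freq * (gap_penalty if sym == "-" else score(residue, sym))
    return total
```

A group-to-sequence alignment can then run ordinary pairwise dynamic programming with this function in place of the residue-residue score, treating each profile column as one position.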
Iterative Refinement
- Addresses the “frozen” sub-alignment problem
- Iteratively realign sequences or groups to a profile of the rest
- Barton and Sternberg (1987)
  · Align the two most similar sequences
  · Align the current profile to the most similar remaining sequence
  · Remove each sequence and realign it to the profile