COMPUTATIONAL METHODS OF SEQUENCE ALIGNMENT

OUTLINE
• Sequence alignment
• Types of sequence alignment
• Methods of sequence alignment

Definition of sequence alignment
Sequence alignment is a way of arranging sequences of DNA, RNA or protein to identify regions of similarity. The similarity may be of functional, structural or evolutionary significance. An alignment is made between a known sequence and an unknown sequence, or between two unknown sequences. The known sequence is called the reference sequence; the unknown sequence is called the query sequence.

Interpretation of sequence alignment
• Sequence alignment is useful for discovering structural, functional and evolutionary information.
• Sequences that are highly alike may have similar secondary and 3D structure, similar function and, most likely, a common ancestral sequence. It is extremely unlikely that such sequences became similar by chance.
• Large-scale genome studies have revealed horizontal transfer of genes and other sequences between species, which may cause similarity between some sequences in very distant species.

Types of Sequence Alignment
Sequence alignment is of two types, namely:
• Global alignment
• Local alignment
Global alignment: matches the residues of two sequences across their entire length. It is best suited to sequences that are similar and of roughly equal length.
Local alignment: matches only the regions of the two sequences that are most similar to each other.

Types of Sequence Alignment
Global alignment:
L G P S S K Q T G K G S - S R I W D N
L N - I T K S A G K G A I M R L G D A
Local alignment:
- - - - T G K G - - - -
- - - - A G K G - - - -
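The global alignment example above can be summarised column by column. The sketch below is only an illustration: the helper name `column_counts` is made up here, and it simply counts identical, mismatched and gapped columns without applying any scoring scheme.

```python
# Aligned rows copied from the global alignment example; '-' marks a gap.
row1 = "LGPSSKQTGKGS-SRIWDN"
row2 = "LN-ITKSAGKGAIMRLGDA"


def column_counts(a, b):
    """Count identical, mismatched and gapped columns of two aligned rows.

    Hypothetical helper for illustration; both rows must already be
    aligned, i.e. have the same length.
    """
    if len(a) != len(b):
        raise ValueError("aligned rows must have the same length")
    matches = mismatches = gaps = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            gaps += 1          # a residue is paired with a gap
        elif x == y:
            matches += 1       # identical residues
        else:
            mismatches += 1    # two different residues
    return matches, mismatches, gaps


matches, mismatches, gaps = column_counts(row1, row2)
print("identical:", matches, "mismatched:", mismatches, "gapped:", gaps)
```

A real scoring scheme would weight each column, for example with a substitution matrix and gap penalties, rather than just counting.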

Types of Sequence Alignment
Global alignment
Input: treat the two sequences as potentially equivalent
Goal: identify conserved regions and differences
Applications:
- Comparing two genes with the same function (in human vs. mouse).
- Comparing two proteins with similar function.

Types of Sequence Alignment
Local alignment
Input: the two sequences may or may not be related
Goal: see whether a substring in one sequence aligns well with a substring in the other
Note: for local matching, overhangs at the ends are not treated as gaps
Applications:
- Searching for local similarities in large sequences (e.g., newly sequenced genomes).
- Looking for conserved domains or motifs in two proteins.

Methods of sequence alignment
• Dot matrix method
• Dynamic programming method
• Word or k-tuple methods

Dot matrix analysis
• A dot matrix is a grid in which the matching nucleotides of two DNA sequences are represented as dots.
• It is also called a dot plot.
• It is a pairwise sequence comparison performed on the computer.
• Cells of the grid are blank (colourless) on the screen until a match is marked.
• In a dot matrix, the nucleotides of one sequence are written from left to right along the top row, and those of the other sequence from top to bottom down the first column. Wherever the two nucleotides are the same, the cell at the intersection of that row and column is marked with a dark dot. Each dot in the plot represents a matching nucleotide or amino acid.
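The construction described above can be sketched in a few lines. This is a minimal text-only illustration, not a plotting tool: the function names and the two short test sequences are assumptions made for the example, `*` stands for a dark dot and `.` for a blank cell.

```python
def dot_matrix(top, side):
    """Build a dot matrix as a list of strings.

    One row per residue of `side` (written top to bottom), one column per
    residue of `top` (written left to right). '*' marks identical residues,
    '.' marks a blank cell. Hypothetical helper for illustration.
    """
    return ["".join("*" if s == t else "." for t in top) for s in side]


def print_dot_matrix(top, side):
    """Print the matrix with the two sequences as row and column labels."""
    print("  " + " ".join(top))
    for residue, row in zip(side, dot_matrix(top, side)):
        print(residue + " " + " ".join(row))


# Two made-up sequences that differ at a single position.
print_dot_matrix("GATTACA", "GATGACA")
```

Two identical sequences give an unbroken main diagonal; an insertion or deletion shows up as a diagonal that shifts to a neighbouring one.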

Dot matrix analysis
• The dot matrix method is a qualitative and simple way to analyze sequences. However, it takes a long time to analyze large sequences.
• The dot matrix method is useful for studying:
- Sequence similarity between two nucleotide sequences or two amino acid sequences.
- Insertion of short stretches into a DNA or amino acid sequence.
- Deletion of short stretches from a DNA or amino acid sequence.
- Repeats or inverted repeats in a DNA or amino acid sequence.

Dot matrix analysis: Two identical sequences • Nucleic Acids Dot Plots

Dot matrix analysis: two very different sequences • Nucleic Acids Dot Plots of genes

Dot matrix analysis: two similar sequences • Nucleic Acids Dot Plots of genes

Dynamic programming method
• Each alignment has a score. Several different alignments can have identical scores, so the method can produce more than one optimal alignment. Adjusting the scoring parameters (e.g. match, mismatch and gap penalties) can help discriminate between alignments with similar scores.
• Global alignment is based on the Needleman-Wunsch algorithm and local alignment on the Smith-Waterman algorithm. Smith-Waterman underpins tools that align sequencing data to reference genomes, e.g. BWA. Both algorithms are derived from the basic dynamic programming algorithm.
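To make the difference between the two algorithms concrete, here is a minimal sketch that computes only the best score (no traceback) for a global and a local alignment. The scoring values (match +1, mismatch -1, linear gap penalty -2) and the function names are assumptions chosen for illustration; real tools typically use substitution matrices and affine gap penalties.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best global alignment score, Needleman-Wunsch style.

    Both sequences must be aligned end to end, so the first row and
    column are filled with accumulated gap penalties.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]  # bottom-right cell


def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best local alignment score, Smith-Waterman style.

    Cell values never drop below 0, and the answer is the highest cell
    anywhere in the table, so unaligned ends cost nothing.
    """
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


# Made-up sequences that share only the short region "ACG".
print(needleman_wunsch_score("TTTACGTTT", "GGGACGGGG"))
print(smith_waterman_score("TTTACGTTT", "GGGACGGGG"))
```

The only differences between the two functions are the initialisation, the floor at 0 and where the final score is read from, which is why both are described as variants of the same dynamic programming scheme.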

Word method or k-tuple method
• It is not guaranteed to find an optimal alignment, but it is much faster than dynamic programming.
• This method is useful in large-scale database searches to find whether there is a significant match with the query sequence.
• The word method is used in database search tools such as FASTA and the BLAST family.
• These tools identify a series of short, non-overlapping subsequences (words) of the query sequence.
• The words are then matched against candidate database sequences to get a result.
• This is a heuristic (approximate) method.
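The word-matching idea can be sketched as follows. This is not the FASTA or BLAST implementation: the function names are hypothetical, the sketch indexes a word at every position of the query (a simplification), and it only reports which diagonal collects the most word hits, which is where a heuristic tool would then look for an alignment.

```python
from collections import defaultdict


def word_index(seq, k):
    """Map each word of length k to the positions where it starts in seq."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def word_hits(query, subject, k):
    """Return (query_pos, subject_pos) for every shared word of length k."""
    index = word_index(query, k)
    hits = []
    for j in range(len(subject) - k + 1):
        for i in index.get(subject[j:j + k], []):
            hits.append((i, j))
    return hits


def best_diagonal(hits):
    """Return (diagonal, hit_count) for the diagonal with the most hits.

    The diagonal is subject_pos - query_pos; hits on one diagonal line up
    without gaps. Returns None when there are no hits.
    """
    counts = defaultdict(int)
    for i, j in hits:
        counts[j - i] += 1
    if not counts:
        return None
    return max(counts.items(), key=lambda item: item[1])


# Made-up query and database sequence for illustration.
hits = word_hits("GATTACA", "TTGATTACATT", k=3)
print(hits)
print(best_diagonal(hits))
```

Because only exact word matches are looked up, a related sequence with no exact shared word of length k is missed, which is the price paid for speed.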

Word method or k-tuple method
• In the FASTA method, the user defines a value k to use as the word length when searching the database. The method is slower but more sensitive at lower values of k, which are also preferred for searches involving a very short query sequence.
• BLAST provides a number of algorithms optimized for particular types of queries, e.g. for distantly related sequence matches.
• BLAST is a good alternative to FASTA. Like FASTA, BLAST uses a word search of length k, but it evaluates only the most significant word matches rather than every word match.
• Later we will study BLAST in greater depth and try out BLAST alignments!
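The trade-off in the choice of k can be seen by counting exact word matches for different word lengths. The sketch below is an illustration under assumptions: `count_word_hits` is a made-up helper, the two sequences are invented, and counting exact matches is only a rough stand-in for the work a real search has to do.

```python
def count_word_hits(query, subject, k):
    """Count position pairs where query and subject share a word of length k.

    Hypothetical helper for illustration; every start position in both
    sequences is considered.
    """
    positions = {}
    for i in range(len(query) - k + 1):
        positions.setdefault(query[i:i + k], []).append(i)
    return sum(
        len(positions.get(subject[j:j + k], ()))
        for j in range(len(subject) - k + 1)
    )


# Made-up sequences: the subject is a mutated, rearranged relative of the query.
query = "GATTACAGATTACA"
subject = "CCGATTTACAGGATCACA"

for k in (2, 3, 4, 6):
    print("k =", k, "word hits =", count_word_hits(query, subject, k))
```

Smaller k can only keep or increase the number of hits, so more candidate regions must be examined (slower), but weak similarities are less likely to be missed (more sensitive).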

Pairwise vs. multiple alignment
• Here we have focused on pairwise alignments; however, there are cases when we wish to compare more than two sequences, i.e. multiple sequence alignments.
• We can cluster groups of sequences according to similarity, and we typically use different tools for this.
• More to come…