Sequence Alignment

Two general methods for sequence alignment:
1. Global alignment: considers similarity across the full extent of the sequences, e.g. MegAlign.
2. Local alignment: focuses on regions of similarity in parts of the sequences only, e.g. the BLAST programs.

Questions:
• How similar are two sequences?
• What is the best alignment between the two sequences?
• How should alignments be scored?
• And, if gaps are allowed, how should they be scored?

Three things are required:
1. a means of scoring matches and mismatches,
2. a means of scoring gaps, and
3. a method of using the two to evaluate numerous possible alignments.

Sequence 1  ALCPQCDIE
            ALC +CD+E
Sequence 2  ALCAKCDVE

Grouping of amino acids based on physico-chemical properties important in protein structures.

Commonly used substitution matrices are:
• Point Accepted Mutation (PAM) matrices, e.g. PAM250
• BLOcks SUBstitution Matrix (BLOSUM) matrices, e.g. BLOSUM62
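To make the role of a substitution matrix concrete, the following minimal Python sketch scores the ungapped example alignment (ALCPQCDIE against ALCAKCDVE) and rebuilds its middle line. The dictionary is a hand-copied subset of BLOSUM62 covering only the residue pairs in that example, so treat the values as an assumption to check against a published matrix. The function names are illustrative only.

```python
# Hand-copied subset of BLOSUM62 (assumption: verify against a published
# matrix before relying on these values). Keys are alphabetically sorted pairs.
BLOSUM62_SUBSET = {
    ("A", "A"): 4, ("L", "L"): 4, ("C", "C"): 9, ("D", "D"): 6,
    ("E", "E"): 5, ("A", "P"): -1, ("K", "Q"): 1, ("I", "V"): 3,
}


def pair_score(a, b):
    """Look up the score for one residue pair; the matrix is symmetric."""
    return BLOSUM62_SUBSET[tuple(sorted((a, b)))]


def score_ungapped(seq1, seq2):
    """Sum the substitution scores over an ungapped alignment."""
    if len(seq1) != len(seq2):
        raise ValueError("ungapped alignment needs equal-length sequences")
    return sum(pair_score(a, b) for a, b in zip(seq1, seq2))


def match_line(seq1, seq2):
    """Identical residues are echoed, positive-scoring pairs become '+'."""
    symbols = []
    for a, b in zip(seq1, seq2):
        if a == b:
            symbols.append(a)
        elif pair_score(a, b) > 0:
            symbols.append("+")
        else:
            symbols.append(" ")
    return "".join(symbols)


print(match_line("ALCPQCDIE", "ALCAKCDVE"))      # ALC +CD+E
print(score_ungapped("ALCPQCDIE", "ALCAKCDVE"))  # 40
```

With these values the conservative pairs Q/K and I/V score positive and appear as '+', while P/A scores negative and is left blank.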

Gap penalties
Mutational events include not only substitutions but also insertions and deletions.
• Affine gap penalties impose an 'opening' penalty for starting a gap and a smaller 'extension' penalty for each additional position in an already opened gap, so lengthening an existing gap costs less than opening a new one.

Sequence 1  ALCPQCDIE
            ALC  CD+E
Sequence 2  ALCA--DVE
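The affine scheme can be illustrated with a short Python sketch. The penalty values (10 to open, 1 to extend) are hypothetical, and the convention assumed here is that the opening penalty covers the first gapped position. Other tools charge the extension penalty for every position, including the first.

```python
def affine_gap_penalty(length, gap_open=10, gap_extend=1):
    """Penalty for one gap of `length` positions.

    Assumed convention: the opening penalty pays for the first position,
    and each further position costs only the extension penalty.
    """
    if length <= 0:
        return 0
    return gap_open + gap_extend * (length - 1)


def linear_gap_penalty(length, per_position=10):
    """For comparison: every gapped position costs the same."""
    return per_position * max(length, 0)


for n in (1, 2, 5):
    print(n, affine_gap_penalty(n), linear_gap_penalty(n))
# 1 10 10
# 2 11 20
# 5 14 50
```

Under this scheme a single two-residue gap, like the one in the example above, costs 11, whereas two separate one-residue gaps would cost 20.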

Sequence Search

Sensitivity versus Speed
• FASTA looks for exactly matching 'words'.
• BLAST uses a scoring matrix.

BLAST (Basic Local Alignment Search Tool)
• The BLAST programs have been designed for speed, with a minimal sacrifice of sensitivity.
• They include a set of similarity search programs designed to explore all of the available sequence databases regardless of whether the query is protein or DNA.
• The scores assigned in a BLAST search have a well-defined statistical interpretation, making real matches easier to distinguish from random background hits.
• Local alignment may produce more biologically meaningful and sensitive results.

Dynamic programming
• First described in the 1950s.
• First applied in this context by Needleman and Wunsch in 1970.

The method breaks the original problem into smaller and smaller subproblems until the subproblems have a trivial solution, and then uses those solutions to construct solutions for larger and larger portions of the original problem.
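As a rough illustration of the subproblem idea, the Python sketch below fills a score table in which each cell holds the best score for aligning two prefixes. It is a simplified sketch, not a reference implementation. It assumes a flat match/mismatch score and a linear gap penalty (all three values are hypothetical) rather than a substitution matrix with affine gaps, and it returns only the score, not the alignment. With local=False it follows the global (Needleman-Wunsch style) recurrence. With local=True cell values are floored at zero and the best cell anywhere is reported, which is the usual local (Smith-Waterman style) variant.

```python
def alignment_score(seq1, seq2, match=1, mismatch=-1, gap=-2, local=False):
    """Best alignment score from a dynamic programming table.

    table[i][j] is the best score for aligning seq1[:i] with seq2[:j].
    """
    rows, cols = len(seq1) + 1, len(seq2) + 1
    table = [[0] * cols for _ in range(rows)]

    if not local:
        # Trivial subproblems: a prefix aligned against nothing is all gaps.
        for i in range(1, rows):
            table[i][0] = i * gap
        for j in range(1, cols):
            table[0][j] = j * gap

    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            same = seq1[i - 1] == seq2[j - 1]
            diagonal = table[i - 1][j - 1] + (match if same else mismatch)
            gap_in_seq2 = table[i - 1][j] + gap
            gap_in_seq1 = table[i][j - 1] + gap
            cell = max(diagonal, gap_in_seq2, gap_in_seq1)
            if local:
                cell = max(cell, 0)  # a local alignment may restart anywhere
            table[i][j] = cell
            best = max(best, cell)

    return best if local else table[-1][-1]


print(alignment_score("ALCPQCDIE", "ALCAKCDVE"))         # 3
print(alignment_score("ALCPQCDIE", "CDIE"))              # -6
print(alignment_score("ALCPQCDIE", "CDIE", local=True))  # 4
```

The last two calls show the difference between the two methods: the global score is dragged down by the five unavoidable gap positions, while the local score reports only the shared region CDIE.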

All BLAST programs take the following steps:
1. The query is divided into short, overlapping words of a fixed word size (e.g. 3 for an amino acid sequence, 11 for a nucleotide sequence).
2. Words with simple compositions are filtered out.
3. The remaining words are searched for in the databases.
4. After the best matching sequence is found for each word, the match is extended in both directions until the high-scoring segment pairs (HSPs) are found.
5. HSPs are reported to the client.

Example (word size 3):
Query:              MNPLSSSGQPHTLM
Words:              MNP NPL PLS LSS SSS SSG SGQ GQP QPH PHT HTL TLM
Database sequence:  MNGPLSSSGQTSTSPH
Word hit:           LSS
Extended match:     PLSSSGQ
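The word-splitting and extension steps can be sketched in a few lines of Python using the sequences from the example. This is a deliberately simplified illustration, not how BLAST is implemented. It assumes exact word matches only, skips the low-complexity filter, and extends a hit only while residues are identical. Actual BLAST scores the extension with a substitution matrix and stops when the score drops too far. All function names are hypothetical.

```python
def split_into_words(query, word_size=3):
    """Step 1: every overlapping word of length `word_size` in the query."""
    return [query[i:i + word_size] for i in range(len(query) - word_size + 1)]


def find_exact_hits(words, subject):
    """Step 3 (simplified): (query_offset, subject_offset) of exact matches."""
    hits = []
    for q_off, word in enumerate(words):
        s_off = subject.find(word)
        while s_off != -1:
            hits.append((q_off, s_off))
            s_off = subject.find(word, s_off + 1)
    return hits


def extend_hit(query, subject, q_off, s_off, word_size=3):
    """Step 4 (simplified): grow the hit in both directions while the
    residues are identical, then return the matched query segment."""
    left = 0
    while (q_off - left - 1 >= 0 and s_off - left - 1 >= 0
           and query[q_off - left - 1] == subject[s_off - left - 1]):
        left += 1
    right = word_size
    while (q_off + right < len(query) and s_off + right < len(subject)
           and query[q_off + right] == subject[s_off + right]):
        right += 1
    return query[q_off - left:q_off + right]


query = "MNPLSSSGQPHTLM"
subject = "MNGPLSSSGQTSTSPH"

words = split_into_words(query)
hits = find_exact_hits(words, subject)
print(words)  # the 12 words from MNP to TLM
print(hits)   # [(2, 3), (3, 4), (4, 5), (5, 6), (6, 7)]
print(extend_hit(query, subject, 3, 4))  # PLSSSGQ (from the word hit LSS)
```

All five hits lie on the same diagonal, so each of them extends to the same segment, PLSSSGQ.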

BLAST Programs
• blastn: Compares a nucleotide query sequence against a nucleotide sequence database.
• blastp: Compares an amino acid query sequence against a protein sequence database.
• blastx: Compares a nucleotide query sequence translated in all reading frames against a protein sequence database.

• tblastn: Compares a protein query sequence against a nucleotide sequence database dynamically translated in all reading frames.
• tblastx: Compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database.
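The five programs differ only in the type of query, the type of database, and whether translation is involved, so the choice can be summarised as a small lookup. The Python helper below is a hypothetical illustration of that decision, not part of any BLAST toolkit. The argument names and the compare_as_protein flag are assumptions introduced for the example.

```python
def choose_blast_program(query_type, database_type, compare_as_protein=False):
    """Pick a program name from the query and database types.

    query_type and database_type are "nucleotide" or "protein".
    compare_as_protein only matters when both are nucleotide: it asks for
    six-frame translation of both sides (tblastx) instead of blastn.
    """
    valid = {"nucleotide", "protein"}
    if query_type not in valid or database_type not in valid:
        raise ValueError("types must be 'nucleotide' or 'protein'")

    if query_type == "protein" and database_type == "protein":
        return "blastp"
    if query_type == "nucleotide" and database_type == "protein":
        return "blastx"   # query is translated in all reading frames
    if query_type == "protein" and database_type == "nucleotide":
        return "tblastn"  # database is translated in all reading frames
    return "tblastx" if compare_as_protein else "blastn"


print(choose_blast_program("nucleotide", "protein"))  # blastx
print(choose_blast_program("protein", "nucleotide"))  # tblastn
```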

If your sequence is NUCLEOTIDE

Length: 20 bp or longer
• Database: Nucleotide. Purpose: Identify the query sequence. BLAST program: MEGABLAST (accepts batch queries) or Standard BLAST (blastn).
• Database: Nucleotide. Purpose: Find sequences similar to the query sequence. BLAST program: Standard BLAST (blastn).
• Database: Nucleotide. Purpose: Find similar proteins to the translated query in a translated database. BLAST program: Translated BLAST (tblastx).
• Database: Protein. Purpose: Find similar proteins to the translated query in a protein database. BLAST program: Translated BLAST (blastx).

Length: 7-20 bp
• Database: Nucleotide. Purpose: Find primer binding sites or map short contiguous motifs. BLAST program: Search for short, nearly exact matches.

If your sequence is PROTEIN

Length: 15 residues or longer
• Database: Protein. Purpose: Identify the query sequence or find protein sequences similar to the query. BLAST program: Standard Protein BLAST (blastp).
• Database: Protein. Purpose: Find members of a protein family or build a custom position-specific score matrix. BLAST program: PSI-BLAST.
• Database: Protein. Purpose: Find proteins similar to the query around a given pattern. BLAST program: PHI-BLAST.
• Database: Conserved Domains. Purpose: Find conserved domains in the query. BLAST program: CD-search (RPS-BLAST).
• Database: Conserved Domains. Purpose: Find conserved domains in the query and identify other proteins with similar domain architectures. BLAST program: Domain Architecture Retrieval Tool (DART).
• Database: Nucleotide. Purpose: Find similar proteins in a translated nucleotide database. BLAST program: Translated BLAST (tblastn).

Length: 5-15 residues
• Database: Protein. Purpose: Search for peptide motifs. BLAST program: Search for short, nearly exact matches.

BLAST search examples