Introduction to Bioinformatics Dot Plots
Dot Plots
• One of the simplest and oldest methods for sequence alignment
• Visualization of regions of similarity
  – Assign one sequence to the horizontal axis
  – Assign the other to the vertical axis
  – Place a dot in each cell where the two symbols match
  – A diagonal line marks a run of adjacent matches, i.e. a region of identity
Simple Example
• Construct a simple dot plot for GCTGAA and GCTAA
• One sequence goes horizontally, the other vertically
• Mark boxes with matched horizontal and vertical symbols
• Look for diagonal(s)
[Figure: dot plot grid with * in the matched boxes]
Alignment:
GCTGAA
GCT-AA
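The construction above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `dot_plot` and `render` are made up for this sketch, and the two sequences are the ones shown in the alignment (GCTGAA and GCT-AA with the gap removed).

```python
def dot_plot(horizontal, vertical):
    """Return grid[i][j] == True when vertical[i] matches horizontal[j]."""
    return [[v == h for h in horizontal] for v in vertical]


def render(horizontal, vertical, grid):
    """Draw the grid as text: '*' for a match, '.' for an empty box."""
    lines = ["  " + " ".join(horizontal)]
    for symbol, row in zip(vertical, grid):
        cells = " ".join("*" if hit else "." for hit in row)
        lines.append(symbol + " " + cells)
    return "\n".join(lines)


# Sequences taken from the alignment on the slide (assumed axis assignment:
# first sequence horizontal, second sequence vertical).
grid = dot_plot("GCTGAA", "GCTAA")
print(render("GCTGAA", "GCTAA", grid))
```

In the printed grid, the stars for G, C, T run down the main diagonal, and the stars for the final AA sit one column to the right of it. That one-column shift is the gap in GCT-AA.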
Another Example
• Construct a simple dot plot for GCTAGTCAGATCTGACGCTA and GATGGTCACATCTGCCGC
• A long stretch of nearly identical residues is revealed, starting at the fifth nucleotide of each sequence (GTCA-ATCTG-CGC, where each dash marks a mismatched position).
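To check the stretch by hand, one can walk a single diagonal of the plot and write down the matches. The sketch below does this for the main diagonal; `diagonal_matches` and its `offset` parameter are hypothetical names introduced only for this illustration.

```python
def diagonal_matches(a, b, offset=0):
    """Walk one diagonal, pairing a[i + offset] with b[i].

    Matching symbols are kept; mismatches are written as '-'.
    """
    out = []
    for i in range(len(b)):
        j = i + offset
        if 0 <= j < len(a):
            out.append(a[j] if a[j] == b[i] else "-")
    return "".join(out)


seq1 = "GCTAGTCAGATCTGACGCTA"
seq2 = "GATGGTCACATCTGCCGC"

main_diagonal = diagonal_matches(seq1, seq2)
print(main_diagonal)      # G-T-GTCA-ATCTG-CGC
print(main_diagonal[4:])  # GTCA-ATCTG-CGC, starting at the fifth nucleotide
```

The slice from the fifth position onward reproduces the stretch quoted on the slide, with two mismatches inside an otherwise identical run.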
Sliding Window and Cutoff
• Problem
  – The plot becomes noisy when comparing large, similar sequences
• Solution
  – Sliding window (size = w)
  – Cutoff (value = v)
  – Consider w nucleotides at a time
  – When a window contains at least v matches, place a dot at the position where the window starts
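The window-and-cutoff rule can be sketched as follows. This is one straightforward reading of the rule, assuming the window runs along the diagonal (position k in one window is compared with position k in the other); `windowed_dot_plot` is a name made up for this sketch.

```python
def windowed_dot_plot(horizontal, vertical, w, v):
    """Place a dot at (i, j) when the two length-w windows starting at
    vertical[i] and horizontal[j] agree in at least v positions."""
    rows = len(vertical) - w + 1
    cols = len(horizontal) - w + 1
    grid = []
    for i in range(rows):
        row = []
        for j in range(cols):
            matches = sum(
                vertical[i + k] == horizontal[j + k] for k in range(w)
            )
            row.append(matches >= v)
        grid.append(row)
    return grid


# The two sequences from the earlier example, filtered with w = 4 and v = 3.
seq1 = "GCTAGTCAGATCTGACGCTA"
seq2 = "GATGGTCACATCTGCCGC"
filtered = windowed_dot_plot(seq1, seq2, w=4, v=3)

# Dots along the main diagonal: the first two windows fall below the cutoff,
# the remaining thirteen pass it.
print("".join("*" if filtered[i][i] else "." for i in range(len(filtered))))
```

With w = 1 and v = 1 the function reduces to the plain dot plot. Raising w and v trades sensitivity for less noise, since isolated single-symbol matches no longer produce dots.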
Example
• Same example with w = 4 and v = 3
• Compare to the previous plot. You make the call!
Worksheet
• w = 4 and v = 3
What else can it do (and how)?
• Gaps
• Inverse subsequence
• Repeats
• Palindrome
• Genome rearrangement
• Exon identification
• RNA structure prediction
• Nice tool for conceptualizing sequence-related algorithms
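Two of these uses are easy to try out in code. The sketch below is only an illustration under stated assumptions: "repeats" are found by comparing a sequence with itself and ignoring the trivial main diagonal, and "inverse subsequence" is taken to mean a reversed copy (not a reverse complement). The helper `exact_window_hits` and all example sequences are made up for this sketch.

```python
def exact_window_hits(horizontal, vertical, w):
    """List (i, j) where vertical[i:i+w] equals horizontal[j:j+w] exactly."""
    return [
        (i, j)
        for i in range(len(vertical) - w + 1)
        for j in range(len(horizontal) - w + 1)
        if vertical[i:i + w] == horizontal[j:j + w]
    ]


# Repeats: plot a sequence against itself; dots off the main diagonal
# mean the same window occurs at two different positions.
seq = "ACGTTTACGT"  # made-up sequence in which ACGT occurs twice
repeats = [(i, j) for i, j in exact_window_hits(seq, seq, 4) if i < j]
print(repeats)  # [(0, 6)]

# Inverse subsequence: plot one sequence against the reverse of the other;
# a reversed copy then shows up as an ordinary diagonal.
a = "CCGATA"
b = "TTATAGCC"  # made-up sequence containing the reverse of a
inverse = exact_window_hits(a, b[::-1], 4)
print(inverse)  # [(0, 0), (1, 1), (2, 2)]
```

In an unreversed plot the same inverse match would appear as an anti-diagonal, running from lower left to upper right.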