# Sequence Homology (M. M. Dalkilic, Ph.D.)

• Slides: 15

Sequence Homology. M. M. Dalkilic, Ph.D. Monday, September 08, 2008, Class IV. Indiana University, Bloomington, IN

Sequence Similarity (Computation), M. M. Dalkilic, Ph.D., So. I, Indiana University, Bloomington, IN, 2008 ©

Outline
• New programming and written homework: Friday
• New reading posted on the website
• Readings: [R] Chapter 5
• Most important aspect of bioinformatics: homology search through sequence similarity (cont’d)
• Sequence alignment

Introduction to Entropy
• Shannon’s theory for quantifying the information in communication
• Can be derived axiomatically
• Simple model

Introduction to Entropy
• An increase in surprise means an increase in information
• A decrease in surprise means a decrease in information
• For each message set M, we associate a probability function over its messages
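The slide’s formulas did not survive extraction. As a minimal sketch, assuming Shannon’s standard definitions, the surprise (self-information) of a message m is $I(m) = -\log_2 p(m)$ bits, and the entropy of a message set M is the expected surprise, $H(M) = -\sum_{m \in M} p(m)\log_2 p(m)$. The Python below computes both; `sequence_entropy` is a hypothetical helper that uses the symbol frequencies of a DNA string as p.

```python
import math
from collections import Counter
from typing import Iterable


def self_information(p: float) -> float:
    """Surprise of a message with probability p, in bits: -log2(p) = log2(1/p)."""
    if not 0 < p <= 1:
        raise ValueError("probability must be in (0, 1]")
    return math.log2(1 / p)


def entropy(probs: Iterable[float]) -> float:
    """Shannon entropy in bits: the expected surprise, sum of p * log2(1/p).

    Terms with p == 0 are skipped (their limiting contribution is 0).
    """
    return sum(p * math.log2(1 / p) for p in probs if p > 0)


def sequence_entropy(seq: str) -> float:
    """Hypothetical helper: entropy of the empirical symbol frequencies of seq."""
    counts = Counter(seq)
    n = len(seq)
    return entropy(c / n for c in counts.values())


if __name__ == "__main__":
    print(self_information(0.5))     # 1.0 (common message, little surprise)
    print(self_information(0.25))    # 2.0 (rarer message, more surprise)
    print(sequence_entropy("ACGT"))  # 2.0 (uniform over four bases)
    print(sequence_entropy("AAAA"))  # 0.0 (no uncertainty)
```

A rarer message (p = 0.25) carries more information than a common one (p = 0.5), which is exactly the relationship stated in the first two bullets.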

Introduction to Entropy
• These properties can be proved formally later; the proofs are not complicated.
• We’ll look at multivariate entropy, conditional entropy, and mutual information later as we examine the internals of BLAST

NW (Needleman-Wunsch) and SW (Smith-Waterman) Alignment Algorithms
• Initialization phase (the initial values of the recurrences)
• Fill-in (bottom-up recursion)
• Traceback
• This reduces the complexity to O(nm) for sequences of lengths n and m
• Cost? The result is optimal only with respect to the chosen scoring scheme (substitution scores and gap penalties); it is not guaranteed to be the biologically correct alignment
• This is why it is mandatory to manually inspect alignments

NW (Needleman-Wunsch)
• Initialize the top row and left column with the negative distance from the start of the sequences, i.e., the penalty for aligning that prefix entirely against gaps
• Fill-in
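A minimal Python sketch of the initialization and fill-in phases for NW, assuming a linear gap penalty and hypothetical scores (match +1, mismatch -1, gap -1). With a gap cost of 1 per position, each top-row or left-column cell holds exactly its negative distance from the start.

```python
from typing import List


def nw_matrix(a: str, b: str, match: int = 1, mismatch: int = -1,
              gap: int = -1) -> List[List[int]]:
    """Fill the Needleman-Wunsch score matrix (linear gap penalty)."""
    n, m = len(a), len(b)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    # Initialization: a prefix aligned entirely against gaps costs
    # (its length) * gap, i.e. the negative distance from the start.
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    # Fill-in: bottom-up; each cell depends only on its upper-left,
    # upper, and left neighbours, which are already computed.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,  # pair a[i-1] with b[j-1]
                          F[i - 1][j] + gap,    # a[i-1] against a gap
                          F[i][j - 1] + gap)    # b[j-1] against a gap
    return F


if __name__ == "__main__":
    for row in nw_matrix("GAT", "GT"):
        print(row)
```

The bottom-right cell of the filled matrix is the global alignment score; for "GAT" versus "GT" it is 1.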

NW and SW Alignment
• Traceback: start at the bottom-right cell and follow the arrows back to the top-left cell
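The traceback can be sketched in Python as follows, again assuming hypothetical scores (match +1, mismatch -1, linear gap -1). The function recomputes the matrix so it is self-contained, then starts at the bottom-right cell and repeatedly steps to a neighbour (diagonal, up, or left) that could have produced the current value, until it reaches the top-left cell. Ties are broken in the order diagonal, up, left; that order is an arbitrary choice, and other orders can give different alignments with the same score.

```python
from typing import List, Tuple


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -1) -> Tuple[int, str, str]:
    """Global alignment with a linear gap penalty; returns (score, aligned_a, aligned_b)."""

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # Initialization and fill-in.
    F: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)

    # Traceback: from the bottom-right cell back to the top-left cell.
    top: List[str] = []
    bottom: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            top.append(a[i - 1])      # diagonal: pair the two characters
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            top.append(a[i - 1])      # up: a[i-1] against a gap
            bottom.append("-")
            i -= 1
        else:
            top.append("-")           # left: b[j-1] against a gap
            bottom.append(b[j - 1])
            j -= 1
    # The alignment was collected backwards, so reverse it.
    return F[n][m], "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    print(needleman_wunsch("GAT", "GT"))
```

For example, `needleman_wunsch("GAT", "GT")` returns `(1, 'GAT', 'G-T')`: two matches and one gap.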

Recurrence
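The recurrence itself did not survive extraction. A standard form of the NW recurrence, assuming a linear gap penalty d > 0 and a substitution score s(x_i, y_j), offered here as a reconstruction rather than a copy of the slide, is:

$$
\begin{aligned}
F(0,0) &= 0, \qquad F(i,0) = -i\,d, \qquad F(0,j) = -j\,d,\\
F(i,j) &= \max\begin{cases}
F(i-1,\,j-1) + s(x_i, y_j) & \text{(align } x_i \text{ with } y_j\text{)}\\
F(i-1,\,j) - d & \text{(} x_i \text{ against a gap)}\\
F(i,\,j-1) - d & \text{(} y_j \text{ against a gap)}
\end{cases}
\end{aligned}
$$

SW uses the same three cases plus a fourth option of 0 inside the max, with the top row and left column set to 0.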

SW (Smith-Waterman) performs local alignment
• Initialize the top row and left column to zeros
• Cell values can only be non-negative (any negative score is replaced by 0)
• Traceback starts at the maximum value and ends at zero
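A minimal Python sketch of SW, assuming hypothetical scores (match +2, mismatch -1, linear gap -1). It differs from NW in exactly the three ways listed: zero initialization, a floor of 0 in every cell, and a traceback that starts at the maximum cell and stops at the first zero.

```python
from typing import List, Tuple


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -1) -> Tuple[int, str, str]:
    """Local alignment with a linear gap penalty; returns (score, aligned_a, aligned_b)."""

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # Initialization: the top row and left column are zeros.
    H: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_i, best_j = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # The extra 0 keeps every cell non-negative, so a local
            # alignment can start fresh instead of carrying a bad prefix.
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_i, best_j = H[i][j], i, j

    # Traceback: start at the maximum cell and stop at the first zero.
    # Row 0 and column 0 are all zeros, so i, j >= 1 inside the loop.
    top: List[str] = []
    bottom: List[str] = []
    i, j = best_i, best_j
    while H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    print(smith_waterman("GGACGTAA", "ACGT"))
```

For example, `smith_waterman("GGACGTAA", "ACGT")` returns `(8, 'ACGT', 'ACGT')`: only the matching core is reported, and the unrelated flanks of the first sequence are ignored.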

Affine gap scores
• Opening a gap has a high initial cost
• Each additional position that continues the gap costs a constant, lower amount
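One common way to score affine gaps is Gotoh’s three-matrix dynamic program, in which a gap of length k scores gap_open + (k - 1) * gap_extend. The sketch below computes only the global alignment score; the parameter values (match +1, mismatch -1, open -4, extend -1) are illustrative assumptions, not values from the slides.

```python
from typing import List

NEG_INF = float("-inf")


def gotoh_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                gap_open: int = -4, gap_extend: int = -1) -> float:
    """Global alignment score with affine gaps (Gotoh-style, three matrices).

    A gap of length k scores gap_open + (k - 1) * gap_extend.
    """
    n, m = len(a), len(b)

    def table() -> List[List[float]]:
        return [[NEG_INF] * (m + 1) for _ in range(n + 1)]

    M = table()  # alignment ends with a[i-1] paired with b[j-1]
    X = table()  # alignment ends with a[i-1] against a gap
    Y = table()  # alignment ends with b[j-1] against a gap
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gap_open + (i - 1) * gap_extend  # leading gap of length i
    for j in range(1, m + 1):
        Y[0][j] = gap_open + (j - 1) * gap_extend  # leading gap of length j

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Opening a new gap costs gap_open; continuing one costs gap_extend.
            X[i][j] = max(M[i - 1][j] + gap_open,
                          X[i - 1][j] + gap_extend,
                          Y[i - 1][j] + gap_open)
            Y[i][j] = max(M[i][j - 1] + gap_open,
                          Y[i][j - 1] + gap_extend,
                          X[i][j - 1] + gap_open)
    return max(M[n][m], X[n][m], Y[n][m])


if __name__ == "__main__":
    print(gotoh_score("AGGGT", "AT"))
```

For "AGGGT" versus "AT" the best score is -4: two matches (+2) plus one gap of length 3 (-4 - 1 - 1). Under a linear penalty of -1 per gap position the same alignment would score -1, so the affine scheme charges mainly for opening the gap rather than for its length.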