GA for Sequence Alignment Pairwise alignment Multiple string
- Slides: 12
GA for Sequence Alignment
- Pairwise alignment
- Multiple string alignment
Pairwise Sequence Alignment
- Input sequences:
  - VNRLQQNIVSLEVDHKVANYKP
  - VNRLQQSIVSLRDAFNDGELD HRVLNYKP
- Solved by dynamic programming using the Dayhoff matrix
- Each pairwise alignment takes O(n1 n2) time, where n1 and n2 are the sequence lengths
- Resulting alignment:
  - VNRLQQNIVSL_____EVDHKVANYKP
  - VNRLQQSIVSLRDAFND GELD HRVLNYKP
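To make the quadratic cost of the dynamic program concrete, here is a minimal Python sketch of global alignment scoring (Needleman-Wunsch style). The function name `global_alignment_score` and the flat match/mismatch/gap scores are assumptions chosen for illustration; a real run would look up residue pairs in the Dayhoff (PAM) matrix instead.

```python
def global_alignment_score(a, b, match=1, mismatch=-2, gap=-1):
    """Best global alignment score of a and b in O(len(a) * len(b)) time.

    Assumption: flat match/mismatch/gap scores stand in for a Dayhoff matrix.
    """
    n1, n2 = len(a), len(b)
    # prev[j] = best score for aligning the processed prefix of a with b[:j]
    prev = [j * gap for j in range(n2 + 1)]
    for i in range(1, n1 + 1):
        cur = [i * gap] + [0] * n2
        for j in range(1, n2 + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(
                prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                prev[j] + gap,      # a[i-1] against a gap
                cur[j - 1] + gap,   # b[j-1] against a gap
            )
        prev = cur
    return prev[n2]


print(global_alignment_score("ACGT", "AGT"))  # 3 matches and 1 gap -> 2
```

The two nested loops fill one table cell per pair of positions, which is where the product of the two sequence lengths comes from. Only two rows are kept in memory because this sketch returns the score, not the alignment itself.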
How to Implement a GA?
- Representation
- Fitness
- Operator design
- Selection strategy
Pairwise Alignment: Representation
- What do you think?
- For example (my intuitive approach):
  - Guess an alignment length n
  - Chromosome
Pairwise Alignment: Representation
- So the chromosome becomes:
  - (1, 2, 4, 5, 6, 8, …)
  - (2, 4, 5, 7, 8, 10, …)
- You can also use the gap positions instead
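One way to read this chromosome, assuming each number is the 1-based alignment column of the corresponding residue (the slide does not spell this out), is sketched below in Python. The helper names `decode_positions` and `gap_positions` are hypothetical.

```python
def decode_positions(seq, positions, n, gap_char="_"):
    """Place each residue of seq in its 1-based column of a length-n row."""
    if len(positions) != len(seq):
        raise ValueError("need exactly one column per residue")
    if any(p < 1 or p > n for p in positions):
        raise ValueError("columns must lie in 1..n")
    if any(x >= y for x, y in zip(positions, positions[1:])):
        raise ValueError("columns must be strictly increasing")
    row = [gap_char] * n
    for residue, col in zip(seq, positions):
        row[col - 1] = residue
    return "".join(row)


def gap_positions(positions, n):
    """Equivalent gap-position encoding: the columns not used by residues."""
    used = set(positions)
    return tuple(col for col in range(1, n + 1) if col not in used)


print(decode_positions("ABCD", (1, 2, 4, 5), 6))  # AB_CD_
print(gap_positions((1, 2, 4, 5), 6))             # (3, 6)
```

Under this reading the two encodings carry the same information; the gap-position form is shorter whenever a row has fewer gaps than residues.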
Pairwise Alignment: Fitness Function
- Simplest:
  - Match: 1
  - Mismatch: -2
  - Gap: -1
- Using a scoring matrix:
  - Protein: PAM, …
  - DNA: substitution matrix
- Sum up the total score.
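The simplest scheme above can be sketched as a column-by-column sum over two aligned rows. This is a minimal Python sketch; the name `pair_score`, the `_` gap character, and the choice to score a column of two gaps as 0 are assumptions, not something the slide specifies.

```python
def pair_score(row_a, row_b, match=1, mismatch=-2, gap=-1, gap_char="_"):
    """Fitness of a pairwise alignment given as two equal-length rows."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for x, y in zip(row_a, row_b):
        if x == gap_char and y == gap_char:
            continue              # assumption: gap-gap columns score 0
        if x == gap_char or y == gap_char:
            total += gap          # residue against a gap
        elif x == y:
            total += match
        else:
            total += mismatch
    return total


# 3 matches (+3), 1 gap (-1), 1 mismatch (-2)
print(pair_score("AB_CD", "ABXCE"))  # 0
```

Swapping the flat `match`/`mismatch` values for a lookup in PAM or a DNA substitution matrix would give the matrix-based variant.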
Pairwise Alignment: Genetic Operators
- All our previous operators
  - Imagine one!
- Selection
  - Try it!
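The slide leaves the choice of operators to the reader. As one possible example only, the Python sketch below shows a mutation for the column-position encoding that slides a single residue one column left or right when the neighbouring column is free. The operator and its name `shift_mutation` are illustrative assumptions and are not taken from the slides.

```python
import random


def shift_mutation(positions, n, rng=random):
    """Move one randomly chosen residue by one column, keeping order valid.

    positions: non-empty, strictly increasing 1-based columns within 1..n.
    Returns a new tuple; the input is returned unchanged (as a tuple) if
    the chosen move would collide with a neighbour or leave the row.
    """
    pos = list(positions)
    i = rng.randrange(len(pos))
    new_col = pos[i] + rng.choice((-1, 1))
    lowest = pos[i - 1] + 1 if i > 0 else 1
    highest = pos[i + 1] - 1 if i < len(pos) - 1 else n
    if lowest <= new_col <= highest:
        pos[i] = new_col
    return tuple(pos)


rng = random.Random(0)  # seeded only to make the demo repeatable
print(shift_mutation((1, 2, 4, 5, 6, 8), 10, rng))
```

Because the move is rejected whenever it would break the ordering, every mutated chromosome still decodes to a valid alignment row and no repair step is needed.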
Conclusion About Pairwise Alignment
- DP can solve it in O(NM)
- A GA does not offer much advantage here
Input sequences S:
- s1: RPCVCPVLRQAAQ
- s2: RPCACCPVLRQVVQ
- s3: KPCLCPRQLRQV
- s4: KPCCPRQAAQ

Alignment A:
- a1: RPCVC_ P__VLRQAAQ
- a2: RPCACCP__VLRQVVQ
- a3: KPCLC_ P RQLRQV_ _
- a4: KPC_C_ P____ RQAAQ
Multiple String Alignment: Representation
- What do you think?
- For example (my intuitive approach):
  - Guess an alignment length n
  - Chromosome
Multiple String Alignment: Representation
- So the chromosome becomes:
  - (1, 2, 4, 5, 6, 8, …)
  - (2, 4, 5, 7, 8, 10, …)
  - …
- You can also use the gap positions instead
  - Needs less space
  - Some good operators…
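The gap-position variant can be sketched in Python as follows, assuming the chromosome holds one tuple of 1-based gap columns per input sequence. The names `decode_gaps` and `decode_alignment` are hypothetical, and the demo row uses the second sequence of the example alignment shown earlier.

```python
def decode_gaps(seq, gap_cols, gap_char="_"):
    """Rebuild one aligned row from a sequence and its 1-based gap columns."""
    n = len(seq) + len(gap_cols)
    gaps = set(gap_cols)
    if len(gaps) != len(gap_cols) or any(g < 1 or g > n for g in gaps):
        raise ValueError("gap columns must be distinct and within 1..n")
    residues = iter(seq)
    return "".join(
        gap_char if col in gaps else next(residues) for col in range(1, n + 1)
    )


def decode_alignment(seqs, chromosome):
    """Decode every row; all rows must end up with the same length."""
    rows = [decode_gaps(s, g) for s, g in zip(seqs, chromosome)]
    if len({len(r) for r in rows}) > 1:
        raise ValueError("rows of an alignment must have equal length")
    return rows


print(decode_gaps("RPCACCPVLRQVVQ", (8, 9)))  # RPCACCP__VLRQVVQ
```

Here a 14-residue sequence needs only two genes, against 14 in the column-position encoding, which is the space saving the slide refers to.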
Multiple String Alignment: Fitness Function
- The hardest part
- Nobody knows the real scoring system, not even biologists!
- Approximation:
  - Using SOP (sum of pairs)
    - The most widely used
    - Using PAM, …
  - Motif-based…
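A sum-of-pairs score adds up, for every column, the pairwise score of each pair of rows. The Python sketch below is a minimal illustration; the name `sum_of_pairs`, the flat match/mismatch/gap values used in place of a PAM lookup, and scoring gap-gap pairs as 0 are all assumptions.

```python
from itertools import combinations


def sum_of_pairs(rows, match=1, mismatch=-2, gap=-1, gap_char="_"):
    """SOP fitness of a multiple alignment given as equal-length rows."""
    if len({len(r) for r in rows}) > 1:
        raise ValueError("all rows must have the same length")
    total = 0
    for column in zip(*rows):
        for x, y in combinations(column, 2):
            if x == gap_char and y == gap_char:
                continue          # assumption: gap-gap pairs score 0
            if x == gap_char or y == gap_char:
                total += gap
            elif x == y:
                total += match    # a PAM lookup would replace these two cases
            else:
                total += mismatch
    return total


# column scores: 3, -1, -1
print(sum_of_pairs(["AB_", "ABC", "A_C"]))  # 1
```

With k rows of length n this evaluates k(k-1)/2 pairs per column, so a single fitness evaluation costs O(k^2 n).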