# Efficient Algorithms for Locating the Length Constrained Heaviest

- Slides: 21

Efficient Algorithms for Locating the Length-Constrained Heaviest Segments, with Applications to Biomolecular Sequence Analysis

- Yaw-Ling Lin\*, Dept. of Computer Science and Information Management, Providence University, Taiwan
- Tao Jiang, Dept. of Computer Science and Engineering, UC Riverside, USA
- Kun-Mao Chao, Dept. of Computer Science and Information Engineering, National Taiwan University, Taiwan

\* Yaw-Ling Lin, Providence, Taiwan

Outline

- Introduction
- Applications to Biomolecular Sequence Analysis
- Maximum Sum Consecutive Subsequence
- Maximum Average Consecutive Subsequence
- Implementation and Preliminary Experiments
- Concluding Remarks

Introduction

Two fundamental algorithms for finding interesting regions in sequences:

- Given a sequence of n real numbers and an upper bound U, find a consecutive subsequence of length at most U with the maximum sum: an O(n)-time algorithm.
- Given a sequence of n real numbers and a lower bound L, find a consecutive subsequence of length at least L with the maximum average: an O(n log L)-time algorithm.

Applications to Biomolecular Sequence Analysis (I)

- Locating GC-rich regions
  - Finding GC-rich regions is an important problem in gene recognition and comparative genomics.
  - CpG islands (200 ~ 1400 bp).
  - [Huang '94]: O(nL)-time algorithm.
- Post-processing sequence alignments
  - Comparative analysis of human and mouse DNA is useful for gene prediction in the human genome.
  - Mosaic effect: bad inner sequence.
  - Normalized local alignment.
  - Post-processing locally aligned subsequences.

Applications to Biomolecular Sequence Analysis (II)

- Annotating multiple sequence alignments
  - [Stojanovic '99]: conserved regions in biomolecular sequences.
  - Numerical scores are assigned to the columns of a multiple alignment; each column score is adjusted by subtracting an anchor value.
- Ungapped local alignments with length constraints
  - Computing, for each diagonal of the matrix, the length-constrained segment with the largest sum (or average) of scores.
  - Applications in motif identification.

Maximum Sum Consecutive Subsequence

- <-4, 1, -2, 3> is left-negative; <5, -3, 4, -1, 2, -6> is not.
- <5> <-3, 4> <-1, 2> <-6> is the minimal left-negative partition of <5, -3, 4, -1, 2, -6>.

Minimal left-negative partition

MLN-partition: linear time
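The body of this slide did not survive extraction. The following is a minimal sketch of how a minimal left-negative partition could be computed in linear time, assuming that "left-negative" means every proper prefix of a segment has a non-positive sum (an assumption that agrees with the examples on the earlier slide). The function name `mln_partition` and the arrays `end` and `total` are illustrative, not taken from the slides.

```python
def mln_partition(a):
    """Split `a` into left-negative segments, scanning right to left.

    Assumption: a segment is left-negative when every proper prefix
    of it has a sum <= 0. Names here are illustrative only.
    """
    n = len(a)
    end = [0] * n    # end[i]: last index of the first segment of a[i:]
    total = [0] * n  # total[i]: sum of that first segment

    for i in range(n - 1, -1, -1):
        end[i] = i
        total[i] = a[i]
        # While the segment starting at i has a non-positive sum, absorb
        # the next (already computed) segment. Each absorbed segment is
        # never examined again as a block boundary for this scan, which
        # gives an amortized linear running time.
        while end[i] < n - 1 and total[i] <= 0:
            nxt = end[i] + 1
            total[i] += total[nxt]
            end[i] = end[nxt]

    # Read the partition off the pointers, left to right.
    segments = []
    i = 0
    while i < n:
        segments.append(a[i:end[i] + 1])
        i = end[i] + 1
    return segments
```

On the slide's example, `mln_partition([5, -3, 4, -1, 2, -6])` yields the four segments `[5]`, `[-3, 4]`, `[-1, 2]`, `[-6]`, while `[-4, 1, -2, 3]` stays in one piece.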

Max-Sum with LC
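The algorithm shown on this slide was lost in extraction. As a point of comparison only, here is a sketch of a different, standard O(n) technique for the same problem: a sliding-window minimum over prefix sums, kept in a monotonic deque. It is not the MLN-partition-based method of the talk, and the name `max_sum_at_most` and its return convention are assumptions made for illustration.

```python
from collections import deque


def max_sum_at_most(a, U):
    """Return (best_sum, start, end) for the non-empty slice a[start:end]
    of length at most U with the maximum sum.

    Swapped-in technique: sliding-window minimum of prefix sums,
    not the MLN-partition algorithm from the slides.
    """
    n = len(a)
    if n == 0 or U < 1:
        raise ValueError("need a non-empty sequence and U >= 1")

    # prefix[k] is the sum of a[0:k], so sum(a[i:j]) = prefix[j] - prefix[i].
    prefix = [0] * (n + 1)
    for k, x in enumerate(a):
        prefix[k + 1] = prefix[k] + x

    best = None
    window = deque()  # candidate start indices, prefix values increasing
    for j in range(1, n + 1):
        i_new = j - 1
        # Drop starts that can never beat the new one.
        while window and prefix[window[-1]] >= prefix[i_new]:
            window.pop()
        window.append(i_new)
        # Drop starts that would make the segment longer than U.
        while window[0] < j - U:
            window.popleft()
        cand = prefix[j] - prefix[window[0]]
        if best is None or cand > best[0]:
            best = (cand, window[0], j)
    return best
```

Each index enters and leaves the deque at most once, which is where the linear bound comes from. For example, with `a = [5, -3, 4, -1, 2, -6]` and `U = 3` the best segment is `a[0:3]` with sum 6.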

Analysis of MSLC

Max Average Subsequence

- <4, 2, 3, 8> is right-skew; <5, 3, 4, 1, 2, 6> is not.
- <5> <3, 4> <1, 2, 6> is the decreasing right-skew partition of <5, 3, 4, 1, 2, 6>.

Decreasing right-skew partition

DRS-partition: linear time
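The body of this slide is missing from the transcript. The sketch below shows one way a decreasing right-skew partition could be built in linear time, assuming that "right-skew" means the average of every prefix of a segment is at most the average of the remaining suffix, and that segment averages must strictly decrease from left to right (both assumptions agree with the examples on the earlier slide). The name `drs_partition` and its helper arrays are illustrative.

```python
def drs_partition(a):
    """Split `a` into right-skew segments with strictly decreasing averages.

    Assumption: a segment is right-skew when the average of each prefix
    is <= the average of the remaining suffix. Names are illustrative.
    """
    n = len(a)
    end = [0] * n     # end[i]: last index of the first segment of a[i:]
    total = [0] * n   # total[i]: sum of that segment
    length = [0] * n  # length[i]: number of elements in that segment

    for i in range(n - 1, -1, -1):
        end[i], total[i], length[i] = i, a[i], 1
        while end[i] < n - 1:
            nxt = end[i] + 1
            # Stop once avg(current) > avg(next). The averages are compared
            # by cross-multiplication to avoid division (lengths are > 0).
            if total[i] * length[nxt] > total[nxt] * length[i]:
                break
            total[i] += total[nxt]
            length[i] += length[nxt]
            end[i] = end[nxt]

    # Read the partition off the pointers, left to right.
    segments = []
    i = 0
    while i < n:
        segments.append(a[i:end[i] + 1])
        i = end[i] + 1
    return segments
```

On the slide's example, `drs_partition([5, 3, 4, 1, 2, 6])` gives `[5]`, `[3, 4]`, `[1, 2, 6]` with averages 5, 3.5, and 3, while `[4, 2, 3, 8]` remains a single right-skew segment.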

Max-Avg-Seq with LC
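The O(n log L) algorithm on this slide was not preserved. As a simple baseline for checking results, the sketch below runs in O(nL) time, the bound quoted earlier for [Huang '94]. It relies on the observation that some optimal segment has length below 2L: a segment of length 2L or more can be cut into two pieces of length at least L, and one of them has an average no smaller than the whole. This is not the talk's algorithm, and `max_avg_at_least` is a hypothetical name.

```python
def max_avg_at_least(a, L):
    """Return (best_avg, start, end) for the slice a[start:end] of length
    at least L with the maximum average.

    Baseline sketch in O(n * L) time: only lengths L .. 2L-1 are tried.
    """
    n = len(a)
    if not 1 <= L <= n:
        raise ValueError("need 1 <= L <= len(a)")

    # prefix[k] is the sum of a[0:k].
    prefix = [0]
    for x in a:
        prefix.append(prefix[-1] + x)

    best = None  # (segment_sum, segment_length, start, end)
    for start in range(n - L + 1):
        last_end = min(n, start + 2 * L - 1)
        for end in range(start + L, last_end + 1):
            s, m = prefix[end] - prefix[start], end - start
            # Compare averages s/m and best_s/best_m without dividing.
            if best is None or s * best[1] > best[0] * m:
                best = (s, m, start, end)
    return best[0] / best[1], best[2], best[3]
```

With `a = [5, 3, 4, 1, 2, 6]` and `L = 3`, the best segment is `a[0:3]` with average 4.0. Ties are resolved in favor of the first segment found.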

Locate good-partner

Analysis of Max. Avg. Seq

Implementation and Preliminary Experiments

Conclusion

- Finding a max-sum subsequence of length at most U can be done in O(n) time.
- Finding a max-avg subsequence of length at least L can be done in O(n log L) time.

Recent Progress

- Lu (CMCT'2002): finding the max-avg subsequence of length at least L on binary (0, 1) sequences in O(n) time.
- Goldwasser, Kao, Lu (2002, manuscripts): finding the max-avg subsequence of length at least L and at most U on real sequences in O(n) time.
- Tools: finding CpG islands using MAVG (joint work with Huang, X., Jiang, T., and Chao, K.-M.):
  - http://deepc2.zool.iastate.edu/aat/mavg/cgdoc.html
  - http://deepc2.zool.iastate.edu/aat/mavg/cg.html

Future Research

- Best k (nonintersecting) subsequences?
- Normalized local alignment?
- Measurement of goodness?
