Mira Abraham-Cohen and Haim J. Wolfson
Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
Why RNA?
RNA (ribonucleic acid) is:
• not solely a carrier of genetic information (non-coding RNAs)
• a key player in essential cellular processes (e.g. protein synthesis and transport, gene silencing)
• involved in pathological processes (e.g. cancerous tumors, AIDS)
• a potential drug or drug target (e.g. RNAi, bacterial ribosomes as antibiotic targets)
[Figure: The Central Dogma of Molecular Biology (DNA, RNA, Protein)]
RNA Structure
[Figure panels: 1D, 2D, 3D]
Why RNA secondary structure?
• “RNA structure” usually refers to the 2D structure
• Easier to obtain (2D structures are more common than 3D structures)
• Secondary structure elements
  – Helix
  – Loop
Secondary structure elements
[Figure labels: helix, bulge, internal loop, multi-branch loop, hairpin]
GUCUGUCCCCACACGACAGAUAAUCGGGUGCAACUCCCGCCCCUUUUCCGAGGGUCAUCGGAACCA
.((((((....))))))..((((....)))).[[[..((((((]]]...))))))...
Pseudoknot structural motif
• Important for the function of many RNAs
• Two helices cross: helix 1 pairs (i1, j1), helix 2 pairs (i2, j2), with i1 < i2 < j1 < j2
• RNA 2D structure alignment
  – Disregarding pseudoknots: O(n^4) [Zhang and Shasha 1989]
  – Including pseudoknots: NP-hard [Zhang et al. 1999]
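The crossing condition i1 < i2 < j1 < j2 can be checked directly on a bracket string. The following is a minimal sketch, assuming the extended dot-bracket convention in which "(" ")" and "[" "]" are two independent bracket families. The function names are illustrative, and the example string is the bracket line from the earlier slide with extraction spaces removed.

```python
def parse_dot_bracket(structure):
    """Return sorted 0-based (i, j) base pairs from a dot-bracket string.

    Assumes two bracket families, "()" and "[]", each matched on its own stack.
    """
    closers = {")": "(", "]": "["}
    stacks = {"(": [], "[": []}
    pairs = []
    for pos, ch in enumerate(structure):
        if ch in stacks:
            stacks[ch].append(pos)
        elif ch in closers:
            stack = stacks[closers[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {pos}")
            pairs.append((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    if any(stacks.values()):
        raise ValueError("unmatched opening bracket")
    return sorted(pairs)


def crossing_pairs(pairs):
    """List every two base pairs (i1, j1), (i2, j2) with i1 < i2 < j1 < j2."""
    return [(p, q) for p in pairs for q in pairs if p[0] < q[0] < p[1] < q[1]]


def has_pseudoknot(structure):
    """True if at least two base pairs cross."""
    return bool(crossing_pairs(parse_dot_bracket(structure)))


# Bracket string from the example slide (spaces from extraction removed).
SLIDE_STRUCTURE = ".((((((....))))))..((((....)))).[[[..((((((]]]...))))))..."

if __name__ == "__main__":
    print(has_pseudoknot("((..))"))         # nested only
    print(has_pseudoknot(SLIDE_STRUCTURE))  # "[[[" crosses "(((((("
```

The quadratic scan in `crossing_pairs` is chosen for readability, not speed; it is fine for structures of a few thousand pairs.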
Why do pseudoknots make a difference? Are they common?
[Figure annotations: over 30% of the functional groups; less than 70% 2D similarity]
Previous work – RNA 2D alignment
• Methods disregarding pseudoknots
  – RNAforester [Hofacker et al. 2004]
  – Migals [Allali and Sagot 2005]
  – MARNA [Siebert and Backofen 2005]
• Methods that deal with limited cases
  – rna_align (DP) [Jiang et al. 2001]
  – pkalign (DP) [Mohl et al. 2009]
Previous work – RNA 2D alignment
• A method that deals with the general problem
  – LARA (ILP) [Bauer et al. 2007]
• All current methods dealing with pseudoknots
  – High time and memory complexity
  – Impractical for big structures
  – Size limits on pc-wolfson1 (2 GB RAM): rna_align < 150 nts, pkalign < 800 nts, LARA < 1600 nts
HARP Motivation
• Preserved 3D structure ? Preserved function
• Preserved relative 3D distances / Preserved function
• Preserved relative 2D distances / Preserved function
HARP
• Aligns RNA 2D structures with no limitation on the pseudoknot type
• Exploits inherent RNA distance constraints
  – Distances between 2D elements are usually conserved
  – Pseudoknots often create spatial distance constraints
• Goal: finding the largest set of conserved helices
• Heuristic method based on an analog of Geometric Hashing
Geometric hashing
• Point of “view”: each pair of points defines a “view”
• Voting table
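For readers new to the idea, here is a minimal sketch of classic geometric hashing on 2D point sets under similarity transforms (translation, rotation, uniform scale). It is not HARP's graph analog, only the textbook technique the slide refers to. The function names and the quantization step of 0.1 are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import permutations


def basis_coords(p, origin, tip):
    """Coordinates of p in the frame that maps origin to (0, 0) and tip to (1, 0)."""
    bx, by = tip[0] - origin[0], tip[1] - origin[1]
    px, py = p[0] - origin[0], p[1] - origin[1]
    norm2 = bx * bx + by * by
    return ((px * bx + py * by) / norm2, (py * bx - px * by) / norm2)


def quantize(coords, step):
    """Map continuous frame coordinates to an integer grid cell (the hash key)."""
    return (round(coords[0] / step), round(coords[1] / step))


def build_table(model, step=0.1):
    """For every ordered pair of model points (a 'view'), store where the others fall."""
    table = defaultdict(list)
    for a, b in permutations(range(len(model)), 2):
        for k, p in enumerate(model):
            if k not in (a, b):
                key = quantize(basis_coords(p, model[a], model[b]), step)
                table[key].append((a, b))
    return table


def vote(table, scene, a, b, step=0.1):
    """Use scene points a, b as the view; each remaining point votes for model views."""
    votes = defaultdict(int)
    for k, p in enumerate(scene):
        if k not in (a, b):
            key = quantize(basis_coords(p, scene[a], scene[b]), step)
            for model_basis in table.get(key, ()):
                votes[model_basis] += 1
    return dict(votes)


if __name__ == "__main__":
    model = [(0, 0), (1, 0), (0, 1), (2, 1)]
    # The same shape rotated by 90 degrees, scaled by 2 and translated.
    scene = [(5, 3), (5, 5), (3, 3), (3, 7)]
    print(vote(build_table(model), scene, 0, 1))
```

The model view with the most votes is the candidate match; a full implementation would then verify it, which this sketch omits.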
HARP - Overview
Input: RNA structures R1 and R2
1. Generate reduced “helix” graph representations G1 and G2
2. Build a look-up table of geodesic distances in all bases
3. Query the look-up table
4. Refine alignments and extend the match
Reduced graph representation
• Vertices: stable helices
  – Helix beginning, termination and length
• Edges connect adjacent helices
  – Direction: polymerization direction
  – Weight: minimal number of nucleotides needed for connection
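The vertex side of this representation can be sketched as follows: consecutive stacked base pairs (i, j), (i+1, j-1), ... are merged into one helix record holding its beginning, termination and length. This is an illustration only. The slide does not define "stable", so the `min_len` threshold is an assumption, as are the `Helix` type and its field names; edge construction is left out because the slide does not give enough detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Helix:
    begin: int   # 5' position of the outermost base pair
    end: int     # 3' position of the outermost base pair
    length: int  # number of stacked base pairs


def extract_helices(pairs, min_len=2):
    """Group stacked base pairs into helices; keep those with at least min_len pairs.

    pairs: iterable of 0-based (i, j) base pairs with i < j.
    min_len: hypothetical stability threshold (not taken from the slides).
    """
    pair_set = set(pairs)
    helices = []
    for i, j in sorted(pair_set):
        if (i - 1, j + 1) in pair_set:
            continue  # (i, j) is stacked inside another pair, so not a helix start
        length = 1
        while (i + length, j - length) in pair_set:
            length += 1
        if length >= min_len:
            helices.append(Helix(i, j, length))
    return helices


if __name__ == "__main__":
    # Base pairs of "((((....))))..(.....)": one 4-pair helix and one isolated pair.
    pairs = [(0, 11), (1, 10), (2, 9), (3, 8), (14, 20)]
    print(extract_helices(pairs))
```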
Graph representation
[Figure: vertices i, j, k joined by forward and backward edges with numeric labels]
Building a look-up table
• Shortest path between any two vertices
• Any two vertices (i, j) define a “view”
[Figure labels: forward, backward]
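The all-pairs shortest-path step can be sketched with the Floyd-Warshall algorithm. The slides do not say which shortest-path algorithm HARP uses, so this choice is an assumption, and the toy edge weights below are made up for the example.

```python
import math


def all_pairs_shortest_paths(n, edges):
    """Floyd-Warshall on a directed graph with non-negative weights.

    n: number of vertices, labelled 0..n-1.
    edges: iterable of (u, v, weight) directed edges.
    Returns an n x n matrix; math.inf marks unreachable pairs.
    """
    dist = [[0 if u == v else math.inf for v in range(n)] for u in range(n)]
    for u, v, w in edges:
        dist[u][v] = min(dist[u][v], w)
    for k in range(n):
        for u in range(n):
            for v in range(n):
                through_k = dist[u][k] + dist[k][v]
                if through_k < dist[u][v]:
                    dist[u][v] = through_k
    return dist


if __name__ == "__main__":
    # Hypothetical 3-helix graph: forward edges 0->1, 1->2, 0->2 and one backward edge 2->0.
    toy_edges = [(0, 1, 11), (1, 2, 7), (0, 2, 20), (2, 0, 4)]
    for row in all_pairs_shortest_paths(3, toy_edges):
        print(row)
```

Floyd-Warshall runs in O(V^3) time, which stays small here because the vertices are helices, not nucleotides.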
Similar views
• Inserting G1 triangles
• Querying with G2 triangles
Querying the vote table
• Querying the table with the indexing edges of G2
• Filtering by
  – Triangle type F/B
  – ε-vicinity
[Figure labels: basis edge, indexing edges]
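One way to support an ε-vicinity query is a bucketed table in which each stored triangle is keyed by its type and a grid cell of width ε, so a query only inspects the neighbouring cells. The sketch below is hypothetical: the slides do not specify the table layout, so the `VicinityTable` class, the choice of two distances (d1, d2) per triangle, and the "F"/"B" type labels are all assumptions.

```python
from collections import defaultdict


class VicinityTable:
    """Store (type, d1, d2) triangles and retrieve those within eps of a query."""

    def __init__(self, eps):
        if eps <= 0:
            raise ValueError("eps must be positive")
        self.eps = eps
        self.buckets = defaultdict(list)

    def _cell(self, d1, d2):
        # Grid cell of width eps containing the point (d1, d2).
        return (int(d1 // self.eps), int(d2 // self.eps))

    def insert(self, kind, d1, d2, label):
        """kind: triangle type, e.g. 'F' or 'B'; label: any identifier to return."""
        self.buckets[(kind, *self._cell(d1, d2))].append((d1, d2, label))

    def query(self, kind, d1, d2):
        """Labels of same-type entries with |d1 - e1| <= eps and |d2 - e2| <= eps."""
        c1, c2 = self._cell(d1, d2)
        hits = []
        for a in (c1 - 1, c1, c1 + 1):
            for b in (c2 - 1, c2, c2 + 1):
                for e1, e2, label in self.buckets.get((kind, a, b), ()):
                    if abs(e1 - d1) <= self.eps and abs(e2 - d2) <= self.eps:
                        hits.append(label)
        return hits


if __name__ == "__main__":
    table = VicinityTable(eps=2)
    table.insert("F", 11, 7, "g1-triangle-a")
    table.insert("B", 11, 7, "g1-triangle-b")
    table.insert("F", 16, 4, "g1-triangle-c")
    print(table.query("F", 12, 8))
```

Checking the 3 x 3 block of cells is sufficient because two values within ε of each other can differ by at most one cell index when the cell width is ε.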
Alignment refinement
• Hungarian algorithm on the vertices of G1 and G2
• Weight w based on
  – Distance between the vertices
  – Correlation between helices’ lengths
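The assignment step can be illustrated with a compact pure-Python Hungarian algorithm (the standard O(n^2 m) potentials formulation). The cost used in the example, the absolute difference of helix lengths, is a placeholder assumption; the slide's weight w also involves vertex distances, and its exact form is not given.

```python
import math


def hungarian(cost):
    """Minimum-cost assignment for an n x m cost matrix with n <= m.

    Returns (total_cost, assignment) where assignment[i] is the column of row i.
    """
    n, m = len(cost), len(cost[0])
    u = [0] * (n + 1)    # row potentials (1-based)
    v = [0] * (m + 1)    # column potentials (1-based)
    p = [0] * (m + 1)    # p[j] = row currently matched to column j, 0 if free
    way = [0] * (m + 1)  # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [math.inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], math.inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # flip matches along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [None] * n
    for j in range(1, m + 1):
        if p[j]:
            assignment[p[j] - 1] = j - 1
    return sum(cost[i][assignment[i]] for i in range(n)), assignment


if __name__ == "__main__":
    lengths_g1 = [4, 6, 3]     # hypothetical helix lengths in G1
    lengths_g2 = [3, 4, 7, 6]  # hypothetical helix lengths in G2
    cost = [[abs(a - b) for b in lengths_g2] for a in lengths_g1]
    print(hungarian(cost))
```

Where SciPy is available, `scipy.optimize.linear_sum_assignment` solves the same problem and would normally be preferred over a hand-written version.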
Alignment extension and scoring
• Greedy approach
  – Starting with the largest (pair of bases) match
  – Extending by adding the pair that contributes most to the extension
• Score
Complexity
• Generating reduced graph representations
• Building a look-up table
• Querying the look-up table
• Generating alignments:
  1. Alignment refinement
  2. Alignment extension
In practice:
• Average-size structures: less than a second
• Big structures (~2800 nucleotides): less than a minute and 10 MB
Results
• HARP’s statistics
  – Average score and p-value
• Comparison with LARA
• Alignment examples
HARP’s statistics

Functional group (based on DARTS)   Group size   Average size (nts)   Average score   p-value
tRNA                                4            78                   100%            0
Ribosomal 23S subunit               4            2852                 71.9%           0
Ribosomal 5S subunit                4            120                  77.2%           0.18
Ribosomal 16S subunit               2            1530                 86.7%           0
Self splicing group I introns       2            224                  78.0%           0.02
Thi-box riboswitch                  2            80                   95.0%           0.07
Guanine riboswitch                  2            69                   100%            0
SRP S domain                        2            114                  73.2%           0.17
RNase P catalytic domain            2            298                  68.9%           0.02
Total                               24           596                  83.4%           0.05
Similar 2D yet different function
[Figure panels: 5S ribosomal RNA, SRP]
Comparison with LARA
23 rRNA
Comparison with LARA
[Plot: HARP vs. LARA; axes: sensitivity and 1 - specificity]
• Sensitivity = TP / P = TP / (TP + FN)
• 1 - Specificity = FPR = FP / N = FP / (FP + TN)
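The two rates on this slide can be computed from scored predictions by sweeping a decision threshold. This is a minimal sketch with made-up scores and labels; it is not the evaluation code behind the HARP vs. LARA comparison, and the function name is illustrative.

```python
def roc_points(scores, labels):
    """Return (FPR, sensitivity) points, one per distinct score used as threshold.

    scores: similarity scores, higher means 'predicted positive'.
    labels: True for actual positives, False for actual negatives.
    """
    positives = sum(1 for lab in labels if lab)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")
    points = []
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, lab in zip(scores, labels) if s >= threshold and lab)
        fp = sum(1 for s, lab in zip(scores, labels) if s >= threshold and not lab)
        sensitivity = tp / positives          # TP / (TP + FN)
        false_positive_rate = fp / negatives  # FP / (FP + TN)
        points.append((false_positive_rate, sensitivity))
    return points


if __name__ == "__main__":
    # Hypothetical alignment scores and ground-truth labels.
    example_scores = [0.9, 0.7, 0.6, 0.2]
    example_labels = [True, False, True, False]
    for fpr, tpr in roc_points(example_scores, example_labels):
        print(f"FPR={fpr:.2f}  sensitivity={tpr:.2f}")
```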
Self splicing group I introns: 68.9% similarity
(left) PDB id 1zzn chain B, 10 stable helices. (right) PDB id 1y0q chain A, 13 stable helices.
Catalytic domains of ribonuclease P
(left) PDB id 2a2e chain A, 19 stable helices. (right) PDB id 2a64 chain A, 16 stable helices.
Conclusions
✓ HARP is a tool for the alignment of RNA secondary structures, which may include pseudoknots
✓ Accurate tool capable of distinguishing between homologous and non-homologous structures
✓ Highly efficient
  ✓ Takes less than a second for average-size structures
  ✓ Less than a minute and 10 MB for very big structures
✓ Web server: http://bioinfo3d.cs.tau.ac.il/HARP
Thank you for your attention!