Can New Oracle 10g Search Features Help Bridge the Biological Discovery Gap?
Jake Y. Chen, Ph.D., Head of Computational Proteomics & Principal Bioinformatics Scientist
Marcel Davidson, Head of Data Management
Myriad Proteomics

Messages
• New Informatics Challenges in Protein Interactomics R&D
  - Scale, integration, discovery issues
  - A data-driven discovery-oriented framework
• "Enabling" Features in 10g?
  - Biological data integration?
  - Biological data analysis integration?

Outline
• Data-driven Discovery-oriented Computational Framework
• 10g Regular Expression Case Studies
• 10g BLAST Case Studies

Why Myriad Maps Protein-Protein Interactions
[Figure: Conventional Drug Discovery compared with Post-Genomic Drug Discovery. Cell labels: GPCR, enzyme, hormone receptor, nucleus. Target labels: non-specific targets; novel, more specific targets; novel, druggable targets; enhanced pre-validation target pool. Process labels: target validation; lead discovery, optimization.]

Principle of the Yeast Two-Hybrid (Y2H) System
Scenario A: Human Proteins X and Y do Interact
• Bait: DNA Binding Domain + Human Protein X
• Prey: Activation Domain + Human Protein Y
• Readout: Yeast colonies grow (Reporter Gene active)
Scenario B: Human Proteins X and Z do not Interact
• Bait: DNA Binding Domain + Human Protein X
• Prey: Activation Domain + Human Protein Z
• Readout: No growth of yeast colonies (No Reporter Gene Activity)

Data Collected from Y2H System
Columns: Search Experiment | Bait Fragment | Prey Fragment
Rows 0001 and 0002 (fragment text in extraction order; cell boundaries lost): TACACACCTCGGCGTCG CAGCTCTCGATCATCTCC GGAGCTAACAAGG CCGGACTGTCCCGTAGA AGCCGCTCTGC TGTGAGCCGGGGACCAT GCAGCCCGAAACCTCCA GTCACTGCGCCCGGCAG GAGTCAGGAGCCAGGGA CTGTGCAGCCTG GCCGAGAAAGATCACAC ACAAGGCTGTCACTTCAT ACTTGGAGAGTTGCACA GCGGCGGGGCAGAGGA GCTCCTCACTTC TATCAAATTGAAGAAGTA TGACGGTCGAACCAAAC CCATTCCAAA
Perform BLAST Against Human REFSEQ DB
Search Experiment | Bait Sequence | Prey Sequence
0001 | NM_016333.2 | NM_021962.1
0002 | NM_016486.2 | NM_003134.1

Protein Interaction Network (Snapshot of ~8,000 interactions)

Knowledge Discovery (KD) Challenges
Experimental Measurement
Protein Interaction Data
• >1.5 million sequence fragments
• ~250,000 search experiments performed
• several TB data storage
• ~80,000 unique interactions
• ~100 biological data sources
Distilled Information
• ~1000 relevant interactions for each pathway of interest
Marketable Knowledge
• Data-driven
• Discovery-oriented
• 1-10 novel drug targets per disease
$$$, drugs, …

KD in Interaction-based Proteomics
[Figure: knowledge-discovery workflow around an E-RDBMS, with Genomics/Functional Genomics Data as an additional input.]
Stage labels:
• Collect raw sequences and lab condition measurements
• Reduce Data Noise
• Represent Interactions and Pathways
• Organize Data in Regulatory Pathways
• Select and Validate Drug Targets
Technique labels:
• Data Cleansing
• Statistical Data Analysis
• Domain-specific Data Modeling
• Data Integration
• LIMS Programming
• Knowledge Curation
• DB Querying
• Visualization

Bioinformatics DB Framework
• Y2H Data Processing and Analysis DB: Lab_Seq, Seq_Match, Y2H_Mart
• Annotation DB: RefSeq, LocusLink, GO, OMIM, CGAP, Protein Kinase DB, GPCR DB, Ensembl, Curation, …
• Y2H Interaction Data Mart: Y2H_Mart

A Schema Fragment to Manage Sequence Similarity Results
Jake Yue Chen and John Carlis (2003) Genomic Data Modeling. Information Systems Journal, 28(4), p. 287-310.

Interaction Matrix using Randomly Ordered Locus IDs
• 12,958 unique interactions
• 1,955 bait loci
• 2,766 prey loci
Jake Yue Chen, et al. (2003) Proceedings of the IEEE Computer Science Society Bioinformatics Conference 2003. Stanford University, Stanford, CA.

Outline
• Data-driven Discovery-oriented Computational Framework
• 10g Regular Expression Case Studies
• 10g BLAST Case Studies

Oracle 10g Regular Expressions: Powerful String Processing
• Regular expressions (RE): new tools in Oracle 10g
• Search and manipulate data strings of arbitrary complexity
• Prior database solutions
  - SQL LIKE operator
  - Java stored procedures, C external libraries
• Prior non-database solutions: AWK, SED, GREP, PERL, etc.
• Done now inside the database
• Facilitates rapid data-centric analysis

Case 1: Retrieving Protein Data from SGD (Saccharomyces Genome Database)
[Screenshot of an SGD sequence page. Callouts: ORF Identifier; Associated Amino Acid Sequence.]

HTTP Raw Data
[Raw HTML returned by db.yeastgenome.org for YDR099W: page header, Quick Search form (with its "Submit" button), navigation links, and BLAST/FASTA search links, followed by the fragment below and the page footer.]
    <h1>Sequence for a region of YDR099W/BMH2</h1>
    <strong>Protein translation of the coding sequence.</strong>
    <pre>>YDR099W Chr 4
    MSQTREDSVYLAKLAEQAERYEEMVENMKAVASSGQELSVEERNLLSVAYKNVIGARRAS
    WRIVSSIEQKEESKEKSEHQVELIRSYRSKIETELTKISDDILSVLDSHLIPSATTGESK
    VFYYKMKGDYHRYLAEFSSGDAREKATNSSLEAYKTASEIATTELPPTHPIRLGLALNFS
    VFYYEIQNSPDKACHLAKQAFDDAIAELDTLSEESYKDSTLIMQLLRDNLTLWTSDISES
    GQEDQQQQQQQQQAPAEQTQGEPTK*
    </pre>
Need to parse out embedded AA Sequence

Function to Return AA Sequence Given ORF
    create or replace function orf2seq (p_orf in varchar2)   -- parameterized ORF Id
    return varchar2
    is
      v_stream clob;
      strt     number;
    begin
      -- Retrieve the HTTP stream (web site URL built from the ORF Id):
      v_stream := httpuritype.getclob(httpuritype.createuri(
                    'http://db.yeastgenome.org/cgi-bin/SGD/getSeq?seq='||p_orf||
                    '&flankl=0&flankr=0&map=p3map'));
      -- Trim off the head of the stream:
      strt := dbms_lob.instr(v_stream, 'Submit', 1, 1);
      -- Strip out control characters, new lines, etc. (RegExp to remove control chars from HTTP stream):
      v_stream := regexp_replace(dbms_lob.substr(v_stream, 4000, strt), '[[:cntrl:]]', '');
      -- Return the AA sequence (RegExp to extract AA sequence):
      return(regexp_substr(dbms_lob.substr(v_stream, 4000, strt), '[[:upper:]]{10,}'));
    end;
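The parsing idea behind orf2seq can be tried outside the database. The following is a minimal Python sketch, not the authors' implementation: it skips the HTTP fetch and works on a small hand-made page, and the function name `extract_aa_sequence` and the `sample_page` string are hypothetical. It assumes `[A-Z]` and the ASCII control range are acceptable stand-ins for Oracle's `[[:upper:]]` and `[[:cntrl:]]`.

```python
import re

def extract_aa_sequence(html: str, anchor: str = "Submit", min_len: int = 10):
    """Return the first run of >= min_len uppercase letters after `anchor`, or None."""
    # Trim off the head of the page, as the slide does with dbms_lob.instr.
    start = html.find(anchor)
    body = html[start:] if start >= 0 else html
    # Strip control characters (newlines, tabs, ...) so wrapped sequence lines join up.
    body = re.sub(r"[\x00-\x1f\x7f]", "", body)
    # Take the first long run of uppercase ASCII letters as the AA sequence.
    match = re.search(r"[A-Z]{%d,}" % min_len, body)
    return match.group(0) if match else None

# Hypothetical, shortened stand-in for the SGD page (first 60 residues only).
sample_page = (
    '<input type="submit" name="Submit" value="Submit" />\n'
    "<h1>Sequence for a region of YDR099W/BMH2</h1>\n"
    "<pre>>YDR099W Chr 4\n"
    "MSQTREDSVYLAKLAEQAERYEEMVENMKAVASSGQELSV\n"
    "EERNLLSVAYKNVIGARRAS*\n"
    "</pre>"
)
print(extract_aa_sequence(sample_page))
```

Short uppercase runs such as "YDR" or "BMH" fall below the 10-letter threshold, which is why the pattern lands on the sequence itself rather than on the ORF name.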

Amino Acid Sequence for ORF 'YDR099W'
    SQL> select orf2seq('YDR099W') from dual;
    ORF2SEQ('YDR099W')
    ----------------------------------------
    MSQTREDSVYLAKLAEQAERYEEMVENMKAVASSGQELSVEERNLLSVAYKNVIGARRASWRIVSSIEQKEESKEKSEHQ
    VELIRSYRSKIETELTKISDDILSVLDSHLIPSATTGESKVFYYKMKGDYHRYLAEFSSGDAREKATNSSLEAYKTASEI
    ATTELPPTHPIRLGLALNFSVFYYEIQNSPDKACHLAKQAFDDAIAELDTLSEESYKDSTLIMQLLRDNLTLWTSDISES
    GQEDQQQQQQQQQAPAEQTQGEPTK
    Elapsed: 00:01.24
Elapsed time <2 sec. (network latency)
    SQL> insert into pseq (orf_id, sequence)
      2  values ('YDR099W', orf2seq('YDR099W'));

Case 2: Motif Searching in Proteins
PROSITE database of protein sequence motifs
    ID   TYR_PHOSPHO_SITE; PATTERN.
    AC   PS00007;
    DT   APR-1990 (CREATED); APR-1990 (DATA UPDATE); APR-1990 (INFO UPDATE).
    DE   Tyrosine kinase phosphorylation site.
    PA   [RK]-x(2,3)-[DE]-x(2,3)-Y.
    CC   /TAXO-RANGE=??E?V;
    CC   /SITE=5,phosphorylation;
    CC   /SKIP-FLAG=TRUE;
    DO   PDOC00007;
Source: http://www.expasy.org/prosite/ps_frequent_patterns.txt
TKP motif pattern, element by element: 1 Arginine or Lysine; 2-3 Any; 1 Aspartate or Glutamate; 2-3 Any; 1 Tyrosine
• TKP Pattern: [RK]-x(2,3)-[DE]-x(2,3)-Y.
  - R=Arginine, K=Lysine, D=Aspartate, E=Glutamate, Y=Tyrosine, x=any AA
• Oracle 10g Regular Expression Equivalent
  - [RK].{2,3}[DE].{2,3}[Y]
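The translation from the PROSITE pattern to a regular expression is mechanical, so it can be sketched as a small converter. This is an illustrative Python sketch under stated assumptions: the helper name `prosite_to_regex` is hypothetical, and it handles only the simple elements used here (single residues, `[..]` sets, `{..}` exclusions, `x`, and `(n)` or `(n,m)` repeats), not the full PROSITE syntax.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern into an equivalent regular expression."""
    element_re = r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?"
    regex_parts = []
    # Drop whitespace and the trailing period, then split on the '-' separators.
    for element in pattern.replace(" ", "").rstrip(".").split("-"):
        match = re.fullmatch(element_re, element)
        if match is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        residue, low, high = match.groups()
        if residue == "x":
            residue = "."                          # x = any amino acid
        elif residue.startswith("{"):
            residue = "[^" + residue[1:-1] + "]"   # {AB} = anything except A or B
        if low is not None:
            residue += "{" + low + ("," + high if high else "") + "}"
        regex_parts.append(residue)
    return "".join(regex_parts)

tkp = prosite_to_regex("[RK]-x(2,3)-[DE]-x(2,3)-Y.")
print(tkp)
```

The converter emits a bare `Y` where the slide writes `[Y]`; the two forms match the same strings.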

SQL Example: Retrieving all Interacting Proteins with TKP
    select distinct
      substr(a.refseq_id, 1, 9) refseq_id,
      length(a.seq_string_varchar) seq_length,
      -- regexp_instr returns the first 4 instances of TKP in each sequence:
      regexp_instr(a.seq_string_varchar, '[RK].{2,3}[DE].{2,3}[Y]', 1, 1) motif1_offs,
      regexp_instr(a.seq_string_varchar, '[RK].{2,3}[DE].{2,3}[Y]', 1, 2) motif2_offs,
      regexp_instr(a.seq_string_varchar, '[RK].{2,3}[DE].{2,3}[Y]', 1, 3) motif3_offs,
      regexp_instr(a.seq_string_varchar, '[RK].{2,3}[DE].{2,3}[Y]', 1, 4) motif4_offs
    from target_db a, y2h_interaction_p b
    where a.refseq_id like 'NP%'
      -- regexp_like returns all rows with a TKP site:
      and regexp_like(a.seq_string_varchar, '[RK].{2,3}[DE].{2,3}[Y]')
      and (substr(a.refseq_id, 1, 9) = b.bait_refseq
           or substr(a.refseq_id, 1, 9) = b.prey_refseq);

SQL Example Output
    REFSEQ_ID  SEQ_LENGTH  MOTIF1_OFFS  MOTIF2_OFFS  MOTIF3_OFFS  MOTIF4_OFFS
    ---------  ----------  -----------  -----------  -----------  -----------
    NP_003961        1465           14          202          347          537
    NP_003968         330          241            0            0            0
    NP_003983         490            8           50           62           93
    NP_004001        3562         3085            0            0            0
    ...
Sequence shown (490 residues, matching the NP_003983 row), with Motif #1 at offset 8, Motif #2 at offset 50, Motif #3 at offset 62, and Motif #4 at offset 93:
    MHHCKRYRSPEPDPYLSYRWKRRRSYSREHEGRLRYPSRREPPPRRSRSRSHDRLPYQRRYRERRDSDTYRCEERSPSFG
    EDYYGPSRSRHRRRSRERGPYRTRKHAHHCHKRRTRSCSSASSRSQQSSKRTGRSVEDDKEGHLVCRIGDWLQERYEIVG
    NLGEGTFGKVVECLDHARGKSQVALKIIRNVGKYREAARLEINVLKKIKEKDKENKFLCVLMSDWFNFHGHMCIAFELLG
    KNTFEFLKENNFQPYPLPHVRHMAYQLCHALRFLHENQLTHTDLKPENILFVNSEFETLYNEHKSCEEKSVKNTSIRVAD
    FGSATFDHEHHTTIVATRHYRPPEVILELGWAQPCDVWSIGCILFEYYRGFTLFQTHENREHLVMMEKILGPIPSHMIHR
    TRKQKYFYKGGLVWDENSSDGRYVKENCKPLKSYMLQDSLEHVQLFDLMRRMLEFDPAQRITLAEALLHPFFAGLTPEER
    SFHTSRNPSR
Pattern: [RK].{2,3}[DE].{2,3}[Y]
Result: 702 (56%) interacting proteins with TKP site
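The offsets reported for NP_003983 can be cross-checked with a few lines of Python. This is a sketch, assuming that Python's `re.finditer` (leftmost, non-overlapping matches) behaves like successive `regexp_instr` occurrences for this pattern; the helper name `motif_offsets` is hypothetical, and only the first 160 residues shown on the slide are used.

```python
import re

TKP = re.compile(r"[RK].{2,3}[DE].{2,3}[Y]")

def motif_offsets(sequence: str, max_hits: int = 4) -> list:
    """1-based start offsets of the first max_hits non-overlapping matches, padded with 0."""
    offsets = [m.start() + 1 for m in TKP.finditer(sequence)][:max_hits]
    # Like regexp_instr, report 0 for occurrences that do not exist.
    return offsets + [0] * (max_hits - len(offsets))

# First two 80-residue lines of the sequence shown on the slide.
prefix = (
    "MHHCKRYRSPEPDPYLSYRWKRRRSYSREHEGRLRYPSRREPPPRRSRSRSHDRLPYQRRYRERRDSDTYRCEERSPSFG"
    "EDYYGPSRSRHRRRSRERGPYRTRKHAHHCHKRRTRSCSSASSRSQQSSKRTGRSVEDDKEGHLVCRIGDWLQERYEIVG"
)
print(motif_offsets(prefix))
```

On this prefix the sketch yields offsets 8, 50, 62, and 93, the same values as the NP_003983 row.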

Is 56% TKP in interacting proteins significant?
    Protein set                          | Total NP Entries | Curated Proteins w/ TKP | Percent with TKP
    All Curated Proteins                 | 16,908           | 6,991                   | 41%
    Myriad Proteomics Interaction Subset | 1,248            | 702                     | 56%
Random sample test of all NP entries
• N = 33 random samples
• Sample size 7.4% (~1251)
• Sample mean = 515
• SD = 17.2
• Significance level < 1E-30
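The figures on this slide can be related to each other with simple arithmetic. The sketch below recomputes the two percentages, the hit count expected in a subset of 1,248 entries at the background rate, and how many sample standard deviations the observed 702 lies above the random-sample mean. The slide does not say which statistical test produced the reported significance level, so this standardized distance is an illustration only and is not a reconstruction of that number.

```python
# Counts taken from the slide.
total_np, total_tkp = 16908, 6991      # all curated NP entries, and those with TKP
subset_np, subset_tkp = 1248, 702      # interaction subset, and those with TKP
sample_mean, sample_sd = 515, 17.2     # 33 random samples of ~1251 entries each

background_rate = total_tkp / total_np          # fraction with TKP overall
subset_rate = subset_tkp / subset_np            # fraction with TKP in the subset
expected_hits = background_rate * subset_np     # hits expected at the background rate

# Distance of the observed count from the random-sample mean, in sample SDs.
z_score = (subset_tkp - sample_mean) / sample_sd

print(f"background rate:  {background_rate:.3f}")
print(f"subset rate:      {subset_rate:.3f}")
print(f"expected hits:    {expected_hits:.1f}")
print(f"standardized gap: {z_score:.2f} SD")
```

The expected count at the background rate (about 516) agrees with the reported sample mean of 515, which suggests the random samples behave like draws from the full NP set.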

Outline
• Data-driven Discovery-oriented Computational Framework
• 10g Regular Expression Case Studies
• 10g BLAST Case Studies

Similarity Search (Sequence Comparison): A Routine Biology Task
[Figure: A Query Sequence is compared against n Target Sequences, producing k Pair-wise Comparison Results.]
Similarity Search has not been integrated into the DB system.

Using BLAST can be a laborious process & a data-management hell
• Custom setup of BLAST target database
• Iterate through query sequences: "Batch BLAST"
• Export/parse/filter/import data <-> DBMS
• Integration of results with external data

Case 1: Oracle 10g BLASTN as a sequence identification tool
    -- A sequence fragment with a sequence_id = 100
    -- Sequence is stored in the query_db table:
    -- TACACACCTCGGCGTCGCAGCTCTCGATCATCTCCGGAGCTAACAAGGCCGGACTGTCCCGTAGAAGCCGCTCTGC
    SELECT t.t_seq_id, t.expect
    FROM TABLE (
           BLASTN_MATCH (
             (select sequence FROM query_db where sequence_id = 100),
             CURSOR (select refseq_id, sequence_string
                     FROM target_db
                     where refseq_id like 'NM_%')
           )
         ) t
    WHERE t.expect < 1E-20;
    T_SEQ_ID      EXPECT
    -----------   ------
    NM_016333.2        0

Case 2: Discovering "Interlogs"
[Figure: Yeast Protein Interactome (proteins A, B, C) connected by Homology Mapping to the Human Protein Interactome (proteins X, Y, Z).]
Interlogs: (A|X, B|Y) and (A|X, B|Z)

A Computationally Intensive Task
• Data to use
  - Yeast Protein-Protein Interaction Data
  - Yeast Protein Sequences
  - Human Protein-Protein Interaction Data
  - Human Protein Sequences & Annotations
• Analysis to prepare
  - Homology search: yeast vs. human proteins
• Things to consider
  - Collect/parse public data from web
  - Import/export data for BLAST
  - Connect analysis result to internal data
Slide callouts: "Missing Data" (on the data list); "Laborious" (on the analysis)
Traditional way? Or inside DBMS?

Pipelining Missing Data Directly into BLASTP Searches
    insert into yeast_human_homolog
    select 'YDR099W' Yeast_ORF_name, t.t_seq_id Human_refseq, t.expect E_Value
    from TABLE (
           BLASTP_MATCH (                                -- BLAST in DBMS
             (SELECT orf2seq('YDR099W') FROM dual),      -- Online Data Integration
             CURSOR (SELECT refseq_id, sequence_string   -- BLAST Target DB Customization
                     FROM target_db
                     WHERE refseq_id LIKE 'NP_%')
           )
         ) t
    WHERE t.expect < 0.0001;
    -- Note: Iterate through Yeast ORF Names to perform batch BLAST.

Mission Impossible: Accomplished
    SELECT a.orf_1, a.orf_2, b.human_refseq, b.e_value, c.human_refseq, c.e_value
    FROM yeast_interaction a, yeast_human_homolog b,
         yeast_human_homolog c, y2h_interaction_p d
    WHERE a.orf_1 = b.yeast_ORF_name
      and a.orf_2 = c.yeast_ORF_name
      and ( (b.human_refseq = d.bait_refseq and c.human_refseq = d.prey_refseq)
         or (b.human_refseq = d.prey_refseq and c.human_refseq = d.bait_refseq) )
    ;
    ORF_1    ORF_2    HUMAN_REFSEQ  E_VALUE     HUMAN_REFSEQ  E_VALUE
    -------  -------  ------------  ----------  ------------  ----------
    YCR002C  YHR107C  NP_xxxxx1     5.9279E-44  NP_yyyyy1     3.7130E-46
    YCR002C  YJR076C  NP_xxxxx2     5.9279E-44  NP_yyyyy2     1.7807E-48
    YJR076C  YHR107C  NP_xxxxx3     1.9734E-39  NP_yyyyy3     3.7130E-46
Three further rows, whose ORF_1/ORF_2 cells are not recoverable from the extraction (ORFs listed: YCR002C, YJR076C, YHR107C):
    NP_xxxxx4     2.3257E-48  NP_yyyyy4     7.4988E-39
    NP_xxxxx5     2.3257E-48  NP_yyyyy5     1.9734E-39
    NP_xxxxx6     1.7807E-48  NP_yyyyy6     7.4988E-39
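The logic of the four-way join can be illustrated on toy data. The following Python sketch is not the authors' implementation: the function name `find_interlogs` and the toy tables are hypothetical, with labels borrowed from the Case 2 figure (A, B, C for yeast; X, Y, Z for human) and the homology assignments assumed from the interlogs listed there. It keeps a yeast interaction when homologs of both partners also interact in human, in either bait/prey order.

```python
def find_interlogs(yeast_interactions, homologs, human_interactions):
    """Return (orf_1, orf_2, human_1, human_2) tuples where homologs of an
    interacting yeast pair also interact in human, in either bait/prey order."""
    # Treat human interactions as unordered pairs, like the bait/prey OR in the SQL.
    human_pairs = {frozenset(pair) for pair in human_interactions}
    results = []
    for orf_1, orf_2 in yeast_interactions:
        for human_1 in homologs.get(orf_1, []):
            for human_2 in homologs.get(orf_2, []):
                if frozenset((human_1, human_2)) in human_pairs:
                    results.append((orf_1, orf_2, human_1, human_2))
    return results

# Toy data (assumed): A~X, B~Y, B~Z; A-B and A-C interact in yeast;
# X-Y and Z-X interact in human (the second stored in the opposite order).
yeast = [("A", "B"), ("A", "C")]
homologs = {"A": ["X"], "B": ["Y", "Z"]}
human = [("X", "Y"), ("Z", "X")]
print(find_interlogs(yeast, homologs, human))
```

Storing human interactions as unordered pairs replaces the two symmetric bait/prey conditions in the SQL WHERE clause with a single set lookup.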

Conclusion
• Data-driven discovery-oriented bioinformatics framework demands rich bio-specific DBMS support
• 10g Regular Expression and BLAST in DBMS features benefit our scientific discovery tasks in interactome studies
• Additional enhancements

References
• Jake Yue Chen, et al. (2003) Initial Large-scale Exploration of Protein-protein Interactions in the Human Brain. Proceedings of the IEEE Computer Science Society Bioinformatics Conference 2003. Stanford University, Stanford, CA.
• Sudhir Sahasrabudhe and Jake Yue Chen (2003) Extracting Biological Information from System-scale Protein Interactome Data. Tutorial at the 11th International Conference on Intelligent Systems in Molecular Biology. Brisbane, Australia.
• Jake Yue Chen and John Carlis (2003) Similar_Join: Extending DBMS with a Bio-specific Operator. Proceedings of the 2003 ACM Symposium on Applied Computing. Melbourne, Florida.
• Jake Yue Chen and John Carlis (2003) Genomic Data Modeling. Information Systems, 28(4), p. 287-310.