Quete: Ontology-Based Query System for Distributed Sources
Haridimos Kondylakis, Anastasia Analyti, Dimitris Plexousakis
{kondylak, analyti, dp}@ics.forth.gr
Presentation Outline
1. Motivation
2. Current Integration Approaches
3. Quete Overview
4. Querying in Quete
5. Evaluation
6. Conclusions
7. Future Work
Computer Science Department, University of Crete & FORTH-ICS
1. Motivation
[Figure: diagram with components labeled Visualization Tools; Statistical, Clustering, Classification Tools; Regulatory Element Tools; Normalization Tools; Query Engine; mediator; D.B.; Genomic IS; Clinical IS; and the labels findings, metadata, sample name, normalized data.]
2. Current Approaches (1/2)
- Warehouse Integration
  - Data is downloaded, filtered, integrated and stored in a warehouse. Answers are taken from the warehouse.
  - Example: GUS
- Navigational Integration
  - Explicit links between data
  - Examples: SRS, Entrez
- Mediator-Wrapper Approaches
  - A global schema is defined over all data sources
  - Examples: K2/BioKleisli, TAMBIS, BACIIS, DiscoveryLink
2. Current Approaches (2/2)
- Mediator-Wrapper approach
  - GAV (Global-As-View) approach: the global schema is defined in terms of the source terminologies
  - LAV (Local-As-View) approach: the sources are defined in terms of the global schema
[Figure: a Query and its Results are exchanged with a Mediator, which reaches Source 1 and Source 2 through Wrappers.]
3. Integration Architecture
[Figure: a Java Application sends a Query, expressed over the Ontology, to QUETE and receives a Result; QUETE uses a Java DB Engine and reaches Source 1, Source 2 and Source 3 through JDBC-ODBC connections.]
3.1 The Reference Ontology
- The ontology is organized as a graph of concepts (+ relationship concepts) related through
  - IS-A
  - HAS-A
[Figure: ontology graph with IS-A and HAS-A edges. Concepts and attributes shown:
  RiskFactors: YearsOfSmoking, Age : Int
  BreastCancerPatient: Name, City, SSN : String
  Reporter: ReporterName : String, HGNCGeneSymbol : String
  GOAnnotation: GOId, GOName : String
  GOBiologicalProcess, GOMolecularFunction, GOCellularComponent
  GeneExpression: RatioValue : Decimal
  Hybridization: HybridizationDate : Date
  TumorSample: TumorIdentifier : String, SurgeryDate : Date]
3.2 Semantic Names
- A semantic name (SN) captures the system-independent semantics of a schema element by combining one or more ontology terms
- Semantic_name = [CN1; …; CNm] AN
  - CNi are concept names and AN is an attribute name
  - The semicolon between CNi and CNi+1 means that concept CNi is a generalization of concept CNi+1

Type   Semantic Name                                      System Name
Table  [BreastCancerPatient]                              BreastCancerPatient
Field  [BreastCancerPatient] Name
Field  [BreastCancerPatient] City
Table  [BreastCancerPatient; TumorSample]                 SurgicalExcision
Field  [BreastCancerPatient; TumorSample] TumorId         TumorSampleId
Field  [BreastCancerPatient; TumorSample] SurgeryDate
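The semantic-name syntax above can be illustrated with a small parser. This is only a sketch under assumptions: the `SemanticName` class and `parse_semantic_name` function are hypothetical names introduced here, not part of Quete, and the concept names are written without the spacing introduced by slide extraction.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SemanticName:
    """Hypothetical in-memory form of [CN1; ...; CNm] AN."""
    concepts: Tuple[str, ...]   # CN1..CNm, most general first
    attribute: Optional[str]    # AN; None when the name annotates a table


# One bracketed concept path, optionally followed by an attribute name.
_PATTERN = re.compile(r"^\s*\[([^\[\]]*)\]\s*([A-Za-z_]\w*)?\s*$")


def parse_semantic_name(text: str) -> SemanticName:
    """Parse a string such as '[A; B] Attr' into a SemanticName."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a semantic name: {text!r}")
    concepts = tuple(
        part.strip() for part in match.group(1).split(";") if part.strip()
    )
    if not concepts:
        raise ValueError(f"semantic name has no concept: {text!r}")
    return SemanticName(concepts, match.group(2))
```

For example, `parse_semantic_name("[BreastCancerPatient; TumorSample] SurgeryDate")` yields a two-concept path with attribute `SurgeryDate`, while `parse_semantic_name("[BreastCancerPatient]")` yields a table-level name whose attribute is `None`.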
3.3 Definitions
- A semantic name [CN1; …; CNm] AN is subsumed by a semantic name [CN1'; …; CNm''] AN' if
  - m' <= m
  - CN_{m-m'+i} coincides with or is a specialization of CNi', for i = 1, …, m'
  - AN = AN'
- Two semantic names are semantically overlapping if
  - Their last i concept names are the same or related through the IS-A relationship
  - They have the same attribute name AN
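The subsumption definition above can be checked mechanically. The following is a minimal sketch, assuming semantic names are held as `(concepts, attribute)` tuples and the IS-A hierarchy as a child-to-parent dictionary. The `IS_A` entries are an assumption made for illustration (the slides do not list the edges explicitly), and the function names are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Assumed IS-A edges (child -> parent); illustrative only.
IS_A: Dict[str, str] = {
    "GOBiologicalProcess": "GOAnnotation",
    "GOMolecularFunction": "GOAnnotation",
    "GOCellularComponent": "GOAnnotation",
}

# (concept path CN1..CNm, attribute name AN)
SemanticName = Tuple[Tuple[str, ...], Optional[str]]


def is_same_or_specialization(concept: str, other: str,
                              is_a: Dict[str, str] = IS_A) -> bool:
    """True if `concept` equals `other` or reaches it by following IS-A edges."""
    seen = set()
    current: Optional[str] = concept
    while current is not None and current not in seen:
        if current == other:
            return True
        seen.add(current)            # guards against cycles in the hierarchy
        current = is_a.get(current)
    return False


def is_subsumed_by(specific: SemanticName, general: SemanticName,
                   is_a: Dict[str, str] = IS_A) -> bool:
    """Apply the three conditions: m' <= m, tail concepts match, AN = AN'."""
    concepts, attribute = specific
    general_concepts, general_attribute = general
    m, m_prime = len(concepts), len(general_concepts)
    if m_prime > m or attribute != general_attribute:
        return False
    tail = concepts[m - m_prime:]    # CN_{m-m'+1} .. CN_m
    return all(
        is_same_or_specialization(c, g, is_a)
        for c, g in zip(tail, general_concepts)
    )
```

Under the assumed hierarchy, `[Reporter; GOMolecularFunction] GOName` is subsumed by `[GOAnnotation] GOName`, but not the other way round.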
3.4 Integration Steps
- Capture Process
  - Captures the data to be integrated
  - Performed independently in each source
    - Use the Extractor tool to export database schemata
    - Choose fields/tables of interest
    - Use the ontology to annotate schemata
  - Database schemata are extracted and stored in X-Spec files that are sent to the central site
- Integration Process
  - Central integration of the various data sources
  - A global view, called the Context View, is produced in memory
4.1 Query Formulation
- Attribute-only version of SQL

    SELECT [BreastCancerPatient]Name, [Reporter]HGNCGeneSymbol, [GeneExpression]RatioValue
    WHERE [RiskFactors]YearsOfSmoking>30
      AND [Hybridization]HybridizationDate=[TumorSample]SurgeryDate
      AND [Reporter; GOMolecularFunction]GOName="celladhesion"
    ORDER BY [BreastCancerPatient]Name

- SELECT clause contains the concepts to be projected
- WHERE clause specifies the selection criteria
- FROM clause is absent, since the integration system automatically identifies the tables to be used
- No need for explicit join declarations
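Because the query has no FROM clause, a natural first step is to collect the semantic names it mentions. The sketch below does this with a regular expression. It is an illustration, not Quete's parser: `referenced_semantic_names` is a hypothetical helper, and it assumes string literals in the query contain no square brackets.

```python
import re
from typing import List

# A bracketed concept path immediately followed by an attribute name.
SEMANTIC_NAME = re.compile(r"\[([^\[\]]+)\]\s*([A-Za-z_]\w*)")


def referenced_semantic_names(query: str) -> List[str]:
    """Return the distinct semantic names used in a query, in first-seen order."""
    seen: List[str] = []
    for concepts, attribute in SEMANTIC_NAME.findall(query):
        # Normalize spacing inside the concept path.
        path = "; ".join(part.strip() for part in concepts.split(";"))
        name = f"[{path}]{attribute}"
        if name not in seen:
            seen.append(name)
    return seen
```

Applied to the query on this slide, the helper would list each semantic name once, even though `[BreastCancerPatient]Name` appears in both the SELECT and ORDER BY clauses.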
4.2 Query Answering
- The semantic query is decomposed into SQL subqueries
  - When possible, all operations are pushed into the subqueries
  - The subqueries are issued in parallel to the distinct data sources
- When all results have been returned to the central site, the remaining operations are performed (joins, ordering, etc.)
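The decompose, push down, run in parallel, then finish centrally pattern can be sketched as follows. Everything concrete here is an assumption for illustration: two in-memory SQLite databases stand in for remote sources, and the table names, columns and data are invented. Quete itself reaches its sources through JDBC-ODBC.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

# Hypothetical stand-ins for two remote sources: (setup script, local subquery).
SOURCES: Dict[str, Tuple[str, str]] = {
    "clinical": (
        "CREATE TABLE Patient(ssn TEXT, name TEXT, years_smoking INTEGER);"
        "INSERT INTO Patient VALUES ('1','Ann',35),('2','Bob',10),('3','Cleo',40);",
        # The selection is pushed down into the source's subquery.
        "SELECT ssn, name FROM Patient WHERE years_smoking > 30",
    ),
    "genomic": (
        "CREATE TABLE Expression(ssn TEXT, ratio REAL);"
        "INSERT INTO Expression VALUES ('1',1.5),('3',0.7),('2',2.0);",
        "SELECT ssn, ratio FROM Expression",
    ),
}


def run_subquery(source: str) -> List[tuple]:
    """Open the (simulated) source, run its subquery, return the rows."""
    setup, subquery = SOURCES[source]
    connection = sqlite3.connect(":memory:")  # created and used in one thread
    try:
        connection.executescript(setup)
        return connection.execute(subquery).fetchall()
    finally:
        connection.close()


def answer() -> List[Tuple[str, float]]:
    """Issue all subqueries in parallel, then join and order centrally."""
    with ThreadPoolExecutor(max_workers=len(SOURCES)) as pool:
        futures = {name: pool.submit(run_subquery, name) for name in SOURCES}
        results = {name: future.result() for name, future in futures.items()}
    ratios = dict(results["genomic"])
    # Remaining operations at the central site: cross-source join, ordering.
    joined = [(name, ratios[ssn])
              for ssn, name in results["clinical"] if ssn in ratios]
    return sorted(joined)
```

The smoking filter runs inside the clinical source, so only matching patients travel to the central site; the cross-source join cannot be pushed down and is done after both results arrive.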
4.3 Requirements in forming local subqueries
1. Identify the table attributes of interest to the user, i.e. those with semantic name [CNpath]AN (attributes with the same or more specific information, plus local join keys)
2. Since the FROM clause is missing, determine the tables that link the attributes of interest to the user, together with their join conditions
3. The join attributes, called DB link attributes, are needed to link the attributes of interest to the user among sources
4.4 Forming the local subqueries
- Extension of Unity's algorithm that increases the system's recall with no sacrifice in precision
- Our algorithm takes into account
  - The user query
  - The ontology
  - The data source-to-ontology mappings
- …and formulates a single subquery (SQ) for each data source
4.5 Algorithm: Result Composition
Input: (i) the user semantic query, (ii) the local SQs
Output: composition plan
1. Find all minimal subsets of SQs such that
   - There is a join tree connecting all subqueries
   - All the semantic query's fields exist
   - In each SQ there is a projection attribute which does not overlap with the projection attribute of another SQ
2. Join the queries in each minimal subset
3. Project the common requested attributes
4. Union the results
5. Apply Group and Order operations
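Step 1 of the algorithm above can be sketched as a brute-force search over subsets of subqueries. This is an illustration under simplifying assumptions, not the authors' implementation: two SQs are treated as joinable when they share a link attribute, "overlap" between projection attributes is reduced to plain equality, and all names and data shapes are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical shape: SQ name -> (projected fields, link attributes).
SubQueries = Dict[str, Tuple[Set[str], Set[str]]]


def is_connected(names: Sequence[str], sqs: SubQueries) -> bool:
    """True if the SQs form one join-connected group via shared link attributes."""
    reached, frontier = {names[0]}, [names[0]]
    while frontier:
        current = frontier.pop()
        for other in names:
            if other not in reached and sqs[current][1] & sqs[other][1]:
                reached.add(other)
                frontier.append(other)
    return len(reached) == len(names)


def qualifies(names: Sequence[str], sqs: SubQueries,
              query_fields: Set[str]) -> bool:
    """Check the three conditions of step 1 for one candidate subset."""
    covered = set().union(*(sqs[n][0] for n in names))
    if not query_fields <= covered or not is_connected(names, sqs):
        return False
    for name in names:
        others = set().union(*(sqs[o][0] for o in names if o != name))
        if not sqs[name][0] - others:   # contributes no attribute of its own
            return False
    return True


def minimal_subsets(sqs: SubQueries,
                    query_fields: Set[str]) -> List[Tuple[str, ...]]:
    """Enumerate qualifying subsets by size, keeping only the minimal ones."""
    found: List[Tuple[str, ...]] = []
    names = sorted(sqs)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            if any(set(smaller) < set(subset) for smaller in found):
                continue                # a smaller qualifying subset exists
            if qualifies(subset, sqs, query_fields):
                found.append(subset)
    return found
```

The search is exponential in the number of SQs, which is acceptable for a handful of sources but is only meant to make the three conditions concrete. Each returned subset would then be joined (step 2), and the per-subset results unioned (step 4).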
4.6 Results composition
- Is done with the help of a central DBMS
  - For every subquery, design a temporary table in the central DB and store the returned results
  - Build the global SQL query to be issued to the central DB according to the result composition plan
  - Execute the global SQL query
- Pros
  - The first step is executed in parallel
  - Uses DBMS technology to handle join, union, order and group operators efficiently
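The three composition steps above can be sketched with an in-memory SQLite database playing the role of the central DBMS. The choice of SQLite, the `compose` function, and the table and column names are assumptions for illustration only; the slides do not say which DBMS Quete uses beyond a "Java DB Engine".

```python
import sqlite3
from typing import Dict, List, Sequence, Tuple

# Hypothetical shape: temp table name -> (column names, rows from the source).
SubqueryResults = Dict[str, Tuple[Sequence[str], List[tuple]]]


def compose(results: SubqueryResults, global_sql: str) -> List[tuple]:
    """Load each subquery result into a temp table, then run the global query."""
    central = sqlite3.connect(":memory:")
    try:
        for table, (columns, rows) in results.items():
            # Identifiers come from the composition plan and are trusted here;
            # only the row values are passed as bound parameters.
            column_list = ", ".join(columns)
            placeholders = ", ".join("?" for _ in columns)
            central.execute(f"CREATE TEMP TABLE {table} ({column_list})")
            central.executemany(
                f"INSERT INTO {table} VALUES ({placeholders})", rows
            )
        return central.execute(global_sql).fetchall()
    finally:
        central.close()


# Example plan: join each genomic result with the clinical one, union, order.
RESULTS: SubqueryResults = {
    "sq_clinical": (("ssn", "name"), [("1", "Ann"), ("3", "Cleo")]),
    "sq_genomic_a": (("ssn", "ratio"), [("1", 1.5)]),
    "sq_genomic_b": (("ssn", "ratio"), [("3", 0.7)]),
}
GLOBAL_SQL = """
SELECT c.name, g.ratio FROM sq_clinical c JOIN sq_genomic_a g ON c.ssn = g.ssn
UNION
SELECT c.name, g.ratio FROM sq_clinical c JOIN sq_genomic_b g ON c.ssn = g.ssn
ORDER BY 1
"""
```

Calling `compose(RESULTS, GLOBAL_SQL)` leaves the join, union and ordering to the database engine, which is the advantage the slide lists under "Pros".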
4.7 Novel features
- Horizontal, vertical and hybrid fragmentation can be declared and used
  - During the formation of local subqueries
  - During the formation of the result composition plan
    - It rebuilds the fragmented tables before going further down the composition plan
- Advantages
  - Eliminates unnecessary local subqueries
  - Avoids joins that are certain to return empty results
  - Increases the system's recall
  - Improves performance
Preliminary Evaluation
Conclusions
- Information integration is a difficult task
  - Heterogeneity of sources
  - Independent evolution
  - Communication costs
  - Complicated structures
- Our system has good performance
- A LAV system
  - The global schema does not change as sources evolve or new sources are added
  - But without LAV's complexity in processing
- Trade-off between complexity and efficiency
Future Work
- More query algorithms in memory
- Database cycles
- Non-relational data sources
- Exploit systems for automatic schema matching
- Web Service / Grid approach
- Caching
- Updates in sources
Thanks !!!