Web 2.0 + Web 3.0 = Web 5.0? The HSFBCY + CIHR + Microsoft Research SADI and CardioSHARE Projects. Mark Wilkinson, Heart + Lung Research Institute, iCAPTURE Centre, St. Paul’s Hospital, UBC
“Non-logical” reasoning and SPARQL queries over distributed data that doesn’t exist
How do we make data and tools easily available to biologists?
Ontologies!
Problem…
Ontology Spectrum [Figure: the ontology spectrum, in order: Catalog/ID; Terms/glossary; Thesauri “narrower term” relation; Informal is-a; Formal is-a; Formal instance; Frames (Properties); Value Restrs.; Selected Logical Constraints (disjointness, inverse, …); General Logical constraints] WHY? Informal is-a: “Because I say so!” Formal is-a: “Because it fulfils XXX”. Originally from AAAI 1999 Ontologies Panel by Gruninger, Lehmann, McGuinness, Uschold, Welty; updated by McGuinness. Description in: www.ksl.stanford.edu/people/dlm/papers/ontologies-come-of-age-abstract.html
My Definition of Ontology (for this talk) Ontologies explicitly define things that exist in “the world” based on what properties each kind of thing must have
Ontology Spectrum [Figure: the ontology spectrum, Catalog/ID through General Logical constraints]
My goal with this talk: the “sweet spot”
COST [Figure: the ontology spectrum, Catalog/ID through General Logical constraints]
COMPREHENSIBILITY [Figure: the ontology spectrum, Catalog/ID through General Logical constraints]
Likelihood of being “right” [Figure: the ontology spectrum, Catalog/ID through General Logical constraints]
Here’s my argument…
Semantic Web? An information system where machines can receive information from one source, re-interpret it, and correctly use it for a purpose that the source had not anticipated.
Semantic Web? If we cannot achieve those two things, then IMO we don’t have a “semantic web”; we only have a distributed (??), linked database… and that isn’t particularly exciting or interesting…
Where is the semantic web? [Figure: the ontology spectrum, Catalog/ID through General Logical constraints] REASON: “Because I say so” is not open to re-interpretation
Founding partner
SADI Premise #1: Web Services in Bioinformatics expose the implicit biological relationship between an input and its associated output
SADI Premise #2: A web services registry that provides WS discovery based on these properties enables the behaviours expected of the Semantic Web
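The two premises can be sketched in a few lines of Python. This is an illustration only, not SADI's actual API: the `Service` and `Registry` classes, the `hasSequence` predicate, and the `fake_sequence_lookup` service are all hypothetical names invented here. The point it shows is that each service is indexed by the property (relationship) it produces, so discovery becomes a lookup by property.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass
class Service:
    """A hypothetical web service described by the relationship it exposes."""
    name: str
    predicate: str                      # relationship between input and output
    invoke: Callable[[str], List[str]]  # input node -> output nodes


class Registry:
    """Hypothetical registry: services are discovered by predicate."""

    def __init__(self) -> None:
        self._by_predicate: Dict[str, List[Service]] = {}

    def register(self, service: Service) -> None:
        self._by_predicate.setdefault(service.predicate, []).append(service)

    def find(self, predicate: str) -> List[Service]:
        return list(self._by_predicate.get(predicate, []))


def attach(service: Service, subject: str) -> List[Triple]:
    """Call the service and express its output as triples on the input node."""
    return [(subject, service.predicate, out) for out in service.invoke(subject)]


def fake_sequence_lookup(gene: str) -> List[str]:
    # Stand-in for a remote bioinformatics service (made-up data).
    return {"geneQ": ["ATGC"]}.get(gene, [])


registry = Registry()
registry.register(Service("getSequence", "hasSequence", fake_sequence_lookup))

triples = [t for svc in registry.find("hasSequence") for t in attach(svc, "geneQ")]
print(triples)
```

A client never names the service; it asks only for the `hasSequence` property, which is what lets new providers be picked up without changing the client.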
Dynamic Distributed Discovery Interpretation Re-interpretation
Example SADI-enabled App. Imagine: there exists a “virtual graph” connecting every conceivable input to every conceivable Web Service and their respective outputs... How do we query that graph?
DEMO “SHARE” A SADI-enabled query resolver for life sciences
Recap what we just saw: A SPARQL database query was entered into the SHARE environment. The query was passed to SADI and was interpreted based on the properties being asked about. SADI searched for, found, and accessed the databases and/or analytical tools required to generate those properties.
Recap what we just saw: We asked, and answered, a complex “database query” WITHOUT A DATABASE
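The recap above can be sketched as a toy resolver, assuming a much simpler setting than SHARE itself: queries are lists of triple patterns, and every triple is generated on demand by a service keyed on the pattern's predicate. The `SERVICES` table, the `resolve` function, and all data values are hypothetical. The sketch only handles patterns whose subject is already bound when it is reached.

```python
from typing import Callable, Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[str, str, str]  # terms starting with "?" are variables

# Hypothetical services, keyed by the property they generate (made-up data).
SERVICES: Dict[str, Callable[[str], List[str]]] = {
    "hasSequence": lambda gene: {"geneR": ["seqR"]}.get(gene, []),
    "similarTo": lambda seq: {"seqR": ["seqQ"]}.get(seq, []),
}


def resolve(patterns: List[Pattern],
            start: Dict[str, str]) -> Tuple[List[Dict[str, str]], Set[Triple]]:
    """Answer the patterns in order, generating triples by calling services."""
    solutions = [dict(start)]
    store: Set[Triple] = set()  # triples that came into existence during the query
    for subj, pred, obj in patterns:
        service = SERVICES.get(pred, lambda _node: [])
        next_solutions = []
        for binding in solutions:
            s = binding.get(subj, subj)
            if s.startswith("?"):
                continue  # unbound subject: outside the scope of this sketch
            for value in service(s):
                store.add((s, pred, value))
                if obj.startswith("?"):
                    # Bind the variable, or check it against an earlier binding.
                    if binding.get(obj, value) == value:
                        next_solutions.append({**binding, obj: value})
                elif obj == value:
                    next_solutions.append(binding)
        solutions = next_solutions
    return solutions, store


patterns = [("?gene", "hasSequence", "?seq"), ("?seq", "similarTo", "?hit")]
solutions, store = resolve(patterns, {"?gene": "geneR"})
print(solutions)
```

Note that `store` starts empty: the data the query runs over does not exist until the query asks for it.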
CardioSHARE: A domain-specific implementation of SADI. Utilizes OWL ontologies describing cardiovascular concepts. Ontologies are designed to lie in the “sweet spot” of the Semantic range.
CardioSHARE Premise #1: Ontology = Query = Workflow
Concept: “Homologous Mutant Image” QUERY: SELECT images of mutations from genes in organism XXX that share homology to this gene in organism YYY WORKFLOW
Phrased in terms of properties:
SELECT image P where {
  Gene Q hasImage image P
  Gene Q hasSequence Sequence Q
  Gene R hasSequence Sequence R
  Sequence Q similarTo Sequence R
  Gene R = “my gene of interest”
}
…but these are simply axioms…
HomologousMutantImage is equivalentTo {
  Gene Q hasImage image P
  Gene Q hasSequence Sequence Q
  Gene R hasSequence Sequence R
  Sequence Q similarTo Sequence R
  Gene R = “my gene of interest”
}
Class: homologous mutant images
QUERY: Retrieve homologous mutant images for gene XXX
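The idea that a class is nothing more than a set of required properties can be sketched as follows. This is a deliberate simplification and an assumption of this example: real OWL class expressions are far richer, and the class names and triples here are invented. An individual is classified into every class whose required properties it carries.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical single-class "ontologies": name -> properties a member must have.
CLASS_DEFINITIONS: Dict[str, FrozenSet[str]] = {
    "SequencedGene": frozenset({"hasSequence"}),
    "ImagedMutantGene": frozenset({"hasSequence", "hasImage"}),
}


def properties_of(individual: str, triples: Iterable[Triple]) -> Set[str]:
    """Collect the predicates for which the individual is the subject."""
    return {p for s, p, _o in triples if s == individual}


def classify(individual: str, triples: Iterable[Triple]) -> List[str]:
    """Return every class whose required properties the individual has."""
    have = properties_of(individual, list(triples))
    return sorted(name for name, required in CLASS_DEFINITIONS.items()
                  if required <= have)


# Made-up data for illustration.
triples = [
    ("geneQ", "hasSequence", "seqQ"),
    ("geneQ", "hasImage", "img1"),
    ("geneR", "hasSequence", "seqR"),
]
print(classify("geneQ", triples))
print(classify("geneR", triples))
```

Nothing is pre-classified: membership falls out of the properties, so asking for members of a class and asking for things with those properties are the same request.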
CardioSHARE: We are not building massive ontologies! Publish small, independent, single OWL Classes: cheap, scalable, flexible. Don’t try to describe all of biology!
DEMO CardioSHARE
Recap: SADI interprets queries (SPARQL + OWL Class Definitions). It determines which properties are available, and which need to be discovered/generated. Services are discovered via on-the-fly “classification” of local data with small OWL Classes representing service interfaces.
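The "which properties are available, which need generating" step can be sketched as a small planner. It assumes, for illustration only, that a service interface is just a set of properties its input must already have plus one property it adds; `ServiceInterface`, `plan`, and the service names are hypothetical and are not taken from SADI. The planner forward-chains naively, so it may schedule a service that a smarter planner would skip.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence, Set


@dataclass(frozen=True)
class ServiceInterface:
    """Hypothetical interface: required input properties and one added property."""
    name: str
    input_requires: FrozenSet[str]
    output_adds: str


def plan(node_props: Set[str], wanted: Set[str],
         services: Sequence[ServiceInterface]) -> List[str]:
    """Order service calls until every wanted property is available."""
    have = set(node_props)
    steps: List[str] = []
    progress = True
    while wanted - have and progress:
        progress = False
        for svc in services:
            if not (wanted - have):
                break  # everything asked for is now available
            # "Classify" the node against the service's input description.
            if svc.output_adds not in have and svc.input_requires <= have:
                steps.append(svc.name)
                have.add(svc.output_adds)
                progress = True
    if wanted - have:
        raise LookupError(f"no service chain yields {sorted(wanted - have)}")
    return steps


services = [
    ServiceInterface("blast", frozenset({"hasSequence"}), "similarTo"),
    ServiceInterface("getSequence", frozenset({"geneId"}), "hasSequence"),
]
steps = plan({"geneId"}, {"similarTo"}, services)
print(steps)
```

The resulting list of steps is, in effect, the workflow that the query implied.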
Recap: CardioSHARE encapsulates workflows as OWL Classes. Ontology = Query = Workflow. Ontologies consist of one class. Low cost, high accuracy.
CardioSHARE: OWL Classes are shared on the Web such that third parties, potentially with different expertise, can utilize the expertise of the person who designed the Class. Easily share your expertise with others. Easily utilize the expertise of others. ...all based on the premise that we define the world by its properties, rather than its classes.
CardioSHARE repercussion... if Ontology = Query = Workflow, and Query = Hypothesis, then Ontology = Hypothesis
Current Research: How far can we push the Ontology = Hypothesis approach?? Attempting to duplicate some clinical outcomes research using ONLY ontologies.
What we achieve: Re-interpretation: The SADI data-store simply collects properties, and matches them up with OWL Classes in a SPARQL query and/or from an individual service provider’s WS interface.
What we achieve Novel re-use: Because we don’t pre-classify, there is no way for the provider to dictate how their data should be used. They simply add their properties into the “cloud” and those properties are used in whatever way is appropriate for me.
What we achieve: Data remains distributed – no warehouse! Data is not “exposed” as a SPARQL endpoint, giving greater provider control over computational resources. Yet data appears to be a SPARQL endpoint… no modification of SPARQL or reasoner required. No longer dependent on “pure” DL logic.
Fin