Using RDF/OWL Technologies for Discovery and Use Metadata

M. Benno Blumenthal, Michael Bell, John del Corral, and Emily Grover-Kopec
International Research Institute for Climate and Society, Columbia University
http://iridl.ldeo.columbia.edu/

Definitions
• Resource Description Framework (RDF)
• Web Ontology Language (OWL)

Why RDF?
Web-based system for interoperating semantics
A key part of the Semantic Web
RDF/OWL is an interesting technology, but it is even more interesting when it is clear that it can help solve our problems

The Data Problem
[diagram labels: Datasets; Users]

The Tool Interface
[diagram labels: Tools; Datasets; Users]

Standard Metadata Schema/Data Services
[diagram labels: Tools; Datasets; Users]

Many Data Communities
[diagram labels, repeated once per community: Standard Metadata Schema; Tools; Datasets; Users]

Super Schema
[diagram labels: Standard metadata schema; Standard Metadata Schema; Tools; Datasets; Users]

Super Schema: direct
[diagram labels: Standard metadata schema/data service; Standard Metadata Schema; Tools; Datasets; Users]

Flaws
• A lot of work
• Super Schema/Service is the Lowest-Common-Denominator
• Science keeps evolving, so that standards either fall behind or constantly change

RDF Standard Data Model Exchange
[diagram labels: Standard metadata schema; RDF; Standard Metadata Schema; Tools; Datasets; Users]

RDF Data Model Exchange
[diagram labels: Standard metadata schema; RDF; Standard Metadata Schema; Tools; Datasets; Users]

RDF Architecture
[diagram labels: queries; Virtual (derived) RDF; RDF]

Why is this better?
• Maps the original dataset metadata into a standard format that can be transported and manipulated
• Still the same impedance mismatch when mapped to the least-common-denominator standard metadata, but
• When a better standard comes along, the original complete-but-nonstandard metadata is already there to be remapped, and “late semantic binding” means everyone can use the new semantic mapping
• Can use enhanced mappings between models that are close
• EASIER – these are tools to enhance the mapping process

Sample Tool: Faceted Search
http://iridl.ldeo.columbia.edu/ontologies/query2.pl?...

Distinctive Features of the search
• Search terms are interrelated
• Terms that describe the set of returns are displayed (spanning and not)
• Returned items also have structure (subitems and superseded items are not shown)

Architectural Features of the search
http://iridl.ldeo.columbia.edu/ontologies/query2.pl
• Multiple search structures possible
• Multiple languages possible
• Search structure is kept in the database, not in the code

Cast of RDF Characters
Semantic Layers: RDFS, OWL, SWRL, SKOS
Query Language: SPARQL, SeRQL
Tools and Frameworks: Protégé, Sesame, Redland, Jena, Reasoners

RDF: framework for writing connections
Triplets of
• Subject
• Property (or Predicate)
• Object
URIs identify things, i.e. most of the above
Namespaces are used as a convenient shorthand for the URIs
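
The triple and namespace ideas on this slide can be sketched in a few lines of Python. This is only an illustration: the rdf, rdfs and dc namespace URIs are the standard ones, while the "ex" namespace, the subject URI for WOA, and the helper name expand are assumptions made for the example.

```python
# Prefix table: the rdf, rdfs and dc namespace URIs are the standard ones;
# the "ex" namespace and the WOA subject URI are hypothetical.
NAMESPACES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "ex": "http://example.org/data/",
}


def expand(qname):
    """Expand a prefixed name such as 'dc:title' into a full URI."""
    prefix, _, local = qname.partition(":")
    return NAMESPACES[prefix] + local


# One triple: (subject, property, object). Subject and property are URIs;
# the object here is a plain literal.
triple = (expand("ex:WOA"), expand("dc:title"), "NOAA NODC WOA01")
print(triple)
```

The prefixed form is purely a writing convenience: two documents that use different prefixes for the same namespace URI still produce identical triples once expanded.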

Datatype Properties
{WOA} dc:title “NOAA NODC WOA01”
{WOA} dc:description “NOAA NODC WOA01: World Ocean Atlas 2001, an atlas of objectively analyzed fields of major ocean parameters at monthly, seasonal, and annual time scales. Resolution: 1x1; Longitude: global; Latitude: global; Depth: [0 m, 5500 m]; Time: [Jan, Dec]; monthly”

Object Properties
{WOA} iridl:isContainerOf {Grid-1x1},
{Grid-1x1} iridl:isContainerOf {Monthly}

WOA01 diagram

Standard Properties
{WOA} dcterm:hasPart {Grid-1x1},
{Grid-1x1} dcterm:hasPart {MONTHLY}
Alternatively
{WOA} iridl:isContainerOf {Grid-1x1},
{iridl:isContainerOf} rdfs:subPropertyOf {dcterm:hasPart}

netcdf/CF in RDF
{SST} rdf:type {cfatt:non_coordinate_variable},
{SST} cfatt:standard_name {cf:sea_surface_temperature},
{SST} netcdf:hasDimension {longitude}
Object properties provide a framework for explicitly writing down relationships between data objects/components, e.g. the vague meaning of nesting is made explicit
Properties also can be related, since they are objects too

Noncontextual Modeling
• “noncontextual modeling make RDF the perfect glue between systems and fixed data models” – The Semantic Web

RDF Level
• Transport/Exchange (RDF/XML)
• Storage
• RDF APIs (Redland, Jena, Sesame)
• Query (SPARQL, SeRQL, …)
• Basic Semantics

RDF Semantics (RDF Primer)
Truly useful property: rdf:type (“a”)
Underlying class: rdf:Property
Organizational classes: rdf:Bag, rdf:Alt, rdf:Seq, rdf:List
Structured values: rdf:value
Reification: rdf:Statement with rdf:subject, rdf:predicate, rdf:object
Bag properties: rdf:_1, rdf:_2, …
List properties: rdf:first, rdf:rest; terminated by rdf:nil
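
To make the list properties concrete, here is a minimal sketch of how an ordered list can be written as rdf:first / rdf:rest triples ending in rdf:nil, and read back again. The blank-node labels (_:b0, _:b1, …) and the function names are assumptions for the example, not part of any RDF library.

```python
def encode_list(items):
    """Encode a Python list as rdf:first / rdf:rest triples.

    Returns (head, triples). Blank-node labels like '_:b0' are made up here.
    """
    triples = []
    head = "rdf:nil"                      # an empty list is just rdf:nil
    for index in reversed(range(len(items))):
        node = f"_:b{index}"
        triples.append((node, "rdf:first", items[index]))
        triples.append((node, "rdf:rest", head))
        head = node
    return head, triples


def decode_list(head, triples):
    """Follow rdf:rest links from head until rdf:nil, collecting rdf:first."""
    first = {s: o for s, p, o in triples if p == "rdf:first"}
    rest = {s: o for s, p, o in triples if p == "rdf:rest"}
    items = []
    while head != "rdf:nil":
        items.append(first[head])
        head = rest[head]
    return items


head, list_triples = encode_list(["Jan", "Feb", "Mar"])
print(decode_list(head, list_triples))
```

Unlike rdf:Bag or rdf:Seq with their rdf:_1, rdf:_2, … members, a list written this way is closed: the rdf:nil terminator states that there are no further elements.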

RDF-Schema (RDFS)
Transitive properties: rdfs:subClassOf (“is a”), rdfs:subPropertyOf
rdfs:Class, rdfs:Resource
rdfs:member
rdfs:domain, rdfs:range
rdfs:Datatype, rdfs:Literal, rdfs:Container
Referring to other RDF documents: rdfs:seeAlso, rdfs:isDefinedBy
Basic documentation: rdfs:label, rdfs:comment

Gazetteer Classes

Gazetteer Individuals

Search Interface Term
• http://iri.columbia.edu/~benno/sampleterm.pdf

Semantics lead to Virtual Triples
Transitive: {a} rdfs:subClassOf {b} rdfs:subClassOf {c} implies {a} rdfs:subClassOf {c}
i.e. semantics of rdfs:subClassOf imply additional triples not explicitly stated
Likewise: {a} rdfs:subPropertyOf {b} rdfs:subPropertyOf {c} implies {a} rdfs:subPropertyOf {c}
More interestingly, {a} myprop {b}, {myprop} rdfs:subPropertyOf {prop2} implies {a} prop2 {b}
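
The three implications on this slide can be sketched as a small fixed-point loop over a set of triples. This is an illustrative sketch only, not how Sesame or any other reasoner is implemented; the function name virtual_triples is an assumption, and the sample data reuses the WOA statements from the earlier slides.

```python
SUBCLASS = "rdfs:subClassOf"
SUBPROP = "rdfs:subPropertyOf"


def virtual_triples(stated):
    """Return the triples implied by, but not present in, the stated set."""
    known = set(stated)
    while True:
        derived = set()
        for s, p, o in known:
            # Transitivity of rdfs:subClassOf and rdfs:subPropertyOf.
            if p in (SUBCLASS, SUBPROP):
                for s2, p2, o2 in known:
                    if p2 == p and s2 == o:
                        derived.add((s, p, o2))
            # {a} myprop {b}, {myprop} subPropertyOf {prop2} => {a} prop2 {b}
            if p == SUBPROP:
                for s2, p2, o2 in known:
                    if p2 == s:
                        derived.add((s2, o, o2))
        if derived <= known:              # nothing new: fixed point reached
            return known - set(stated)
        known |= derived


stated = [
    ("a", SUBCLASS, "b"),
    ("b", SUBCLASS, "c"),
    ("WOA", "iridl:isContainerOf", "Grid-1x1"),
    ("iridl:isContainerOf", SUBPROP, "dcterm:hasPart"),
]
print(virtual_triples(stated))
```

Here the dcterm:hasPart statement about WOA is never written down; it follows from the iridl:isContainerOf statement plus the sub-property declaration.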

Subcategories are not subClasses
So carelessly translating existing conceptual organizations can get one into trouble

Domain and Range are inherited
Since the domain and range of a property are classes, subclasses “inherit” properties (in this sense)
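
One way to read “inherit” here: if a property's domain is a class, every subclass of that class is also covered by the property. The sketch below checks this for a hypothetical gazetteer-like hierarchy; the class names, the property ex:hasName, and the single-parent simplification are all assumptions for illustration.

```python
# Hypothetical hierarchy (child -> parent); one parent per class for brevity.
SUB_CLASS_OF = {
    "ex:Country": "ex:Place",
    "ex:Place": "ex:Feature",
}

# Hypothetical property whose rdfs:domain is ex:Feature.
DOMAIN = {"ex:hasName": "ex:Feature"}


def ancestors(cls):
    """Return cls followed by its superclasses, nearest first."""
    chain = [cls]
    while chain[-1] in SUB_CLASS_OF:
        chain.append(SUB_CLASS_OF[chain[-1]])
    return chain


def applies_to(prop, cls):
    """True if cls is the domain of prop or one of the domain's subclasses."""
    return DOMAIN[prop] in ancestors(cls)


print(applies_to("ex:hasName", "ex:Country"))
print(applies_to("ex:hasName", "ex:Dataset"))
```

Note that in RDFS proper, rdfs:domain is an inference rule rather than a constraint: using the property on a resource implies that the resource belongs to the domain class.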

UML/RDFS
• Unified Modeling Language
• Base concepts are the same (RDFS lacks methods), so one can export the underlying structure of the code as the underlying structure for the metadata
• See Representing UML in RDF

Ontologies
Use Conventions to connect concepts to established sets of concepts
Generate additional “virtual” triples from the original set and semantics
RDFS – some property/class semantics
OWL – additional property/class semantics: more sophisticated (ontological) relationships

OWL
Language for expressing ontologies, i.e. the semantics are very important.
However, even without a reasoner to generate the implied RDF statements, OWL classes and properties represent a sophistication of the RDF Schema
However, there is a serious split in world view from what we have been talking about: concepts as classes vs concepts as individuals

OWL
[diagram labels: rdf:Property; rdfs:seeAlso; owl:DatatypeProperty; owl:ObjectProperty; owl:AnnotationProperty; owl:FunctionalProperty; owl:InverseFunctionalProperty; owl:TransitiveProperty; owl:SymmetricProperty; owl:imports; owl:ontology]

Protégé
Tool for editing/displaying Ontologies
Different “tabs” display different perspectives
http://protege.stanford.edu/

Cast of RDF Characters II
Semantic Layers: RDFS, OWL, SWRL, SKOS
Query Language: SPARQL, SeRQL
Tools and Frameworks: Protégé, Sesame, Redland, Jena, Reasoners

Query Language: SPARQL
• (quick reference at http://www.dajobe.org/2005/04-sparql/)
• Supported by Redland, Jena, Sesame-2.0 (alpha)
• Jena implementation supports URL source of triples, i.e. do not even need a triple store
• The standard

Query Language: SeRQL
• Older than SPARQL
• Implemented on top of Sesame
• Currently more powerful than SPARQL, i.e. has nested queries

SeRQL Details
Copied from on-line tutorial
• Syntax
• Select
• Construct
• Where
• From

SeRQL: basic syntax
{person} foo:worksFor {Company} rdf:type {foo:ITCompany}

SeRQL: multiple statements
{subj1} pred1 {obj1}; pred2 {obj2}
Or
{subj1} pred1 {obj1}, {subj1} pred2 {obj2}

SeRQL: short cuts
{subj1} pred1 {obj1, obj2, obj3}
(also implies obj1, obj2, obj3 are distinct)

SeRQL: Select
Output as table (XML)
SELECT dataset, dlabel
FROM {dataset} rdf:type {iridl:dataset},
     [{dataset} rdfs:label {dlabel}]
USING NAMESPACE iridl = <http://iridl.ldeo.columbia.edu/ontologies/iridl.owl>
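
What this SELECT asks for can be mimicked in plain Python over a list of triples: every resource typed iridl:dataset produces a row, and the bracketed (optional) label pattern leaves the second column empty when no label exists. This is a sketch of the query's meaning, not of how Sesame evaluates it; the sample triples and the function name are assumptions.

```python
def select_datasets(triples):
    """Return (dataset, dlabel) rows; dlabel is None when there is no label."""
    labels = {}
    for s, p, o in triples:
        if p == "rdfs:label":
            labels.setdefault(s, []).append(o)

    rows = []
    for s, p, o in triples:
        if p == "rdf:type" and o == "iridl:dataset":
            # Optional pattern: one row per label, or one row with None.
            for label in labels.get(s, [None]):
                rows.append((s, label))
    return rows


# Hypothetical sample data.
TRIPLES = [
    ("WOA", "rdf:type", "iridl:dataset"),
    ("WOA", "rdfs:label", "NOAA NODC WOA01"),
    ("SST", "rdf:type", "iridl:dataset"),
    ("Grid-1x1", "rdf:type", "ex:grid"),
]
rows = select_datasets(TRIPLES)
print(rows)
```

Without the square brackets in the query, the unlabelled dataset would drop out of the result table entirely.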

SeRQL: Construct
Output as RDF (RDF/XML)
CONSTRUCT {dataset} rdf:type {foo:LabelledDatasets}
FROM {dataset} rdf:type {iridl:dataset};
     rdfs:label {dlabel}
USING NAMESPACE iridl = <http://iridl.ldeo.columbia.edu/ontologies/iridl.owl>
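
The effect of this CONSTRUCT can be sketched in Python as well: instead of a table, the result is a new set of triples, one for each dataset that has a label. As before, this only illustrates the meaning of the query; the sample data and function name are assumptions.

```python
def construct_labelled(triples):
    """Build new rdf:type triples for datasets that carry an rdfs:label."""
    datasets = {s for s, p, o in triples
                if p == "rdf:type" and o == "iridl:dataset"}
    labelled = {s for s, p, o in triples if p == "rdfs:label"}
    # The label pattern is required here, so unlabelled datasets are skipped.
    return {(d, "rdf:type", "foo:LabelledDatasets")
            for d in datasets & labelled}


# Hypothetical sample data.
TRIPLES = [
    ("WOA", "rdf:type", "iridl:dataset"),
    ("WOA", "rdfs:label", "NOAA NODC WOA01"),
    ("SST", "rdf:type", "iridl:dataset"),
]
new_triples = construct_labelled(TRIPLES)
print(new_triples)
```

Because the output is itself RDF, it can be added back to the store, which is one way to create virtual triples with a query.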

Faceted Search Explicated

Search Interface
• Items (datasets/maps)
• Terms
• Facets
• Taxa

Search Interface Semantic API
{item} dc:title, dc:description, rss:link, iridl:icon
{item} dcterm:isPartOf {item2}
{item} dcterm:isReplacedBy {item2}
{item} trm:isDescribedBy {term}
{term} a {facet} of {taxa} of {trm:Term},
{facet} a {trm:Facet},
{taxa} a {trm:Taxa},
{term} trm:directlyImplies {term2}
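
A rough sketch of how a faceted search could use two of these properties: trm:isDescribedBy links items to terms, and trm:directlyImplies lets a narrow term stand in for the broader terms it implies. The item and term names, the dictionaries standing in for the triple store, and the function names are all hypothetical; this is not the query2.pl implementation.

```python
# Hypothetical data standing in for trm:isDescribedBy and trm:directlyImplies.
DESCRIBED_BY = {
    "woa_sst": ["sea_surface_temperature", "monthly"],
    "gauge_precip": ["precipitation", "monthly"],
}
DIRECTLY_IMPLIES = {
    "sea_surface_temperature": ["temperature", "ocean"],
    "precipitation": ["atmosphere"],
}


def implied_terms(term):
    """Return term plus everything reachable through directlyImplies."""
    seen = {term}
    stack = [term]
    while stack:
        current = stack.pop()
        for broader in DIRECTLY_IMPLIES.get(current, ()):
            if broader not in seen:
                seen.add(broader)
                stack.append(broader)
    return seen


def search(selected):
    """Return {item: all_terms} for items described by every selected term."""
    results = {}
    for item, terms in DESCRIBED_BY.items():
        all_terms = set()
        for term in terms:
            all_terms |= implied_terms(term)
        if set(selected) <= all_terms:
            results[item] = all_terms
    return results


print(sorted(search(["temperature"])))
print(sorted(search(["monthly"])))
```

Since the term relationships live in the data rather than in the search code, a different search structure only needs different triples.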

Faceted Search w/Queries
http://iridl.ldeo.columbia.edu/ontologies/query2.pl?...

RDF Architecture
[diagram labels: queries; Virtual (derived) RDF; RDF]

IRI RDF Architecture
[diagram labels: MMI; Data Servers; Ontologies; JPL; bibliography; Start Point; Standards Organizations; RDF Crawler; RDFS Semantics; Owl Semantics; SWRL Rules; SeRQL CONSTRUCT; Sesame; Search Queries; Search Interface; Location Canonicalizer; Time Canonicalizer]

Creating Virtual Triples from
Semantic Layers: RDFS, OWL, SWRL, SKOS
Query Language: SPARQL, SeRQL
Tools and Frameworks: Protégé, Sesame, Redland, Jena, Reasoners

SWRL: A Semantic Web Rule Language Combining OWL and RuleML
A language for writing rules in RDF/OWL, i.e. RDF statements that are rules for creating new RDF statements
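
The idea of a rule as data that creates new statements can be sketched with a tiny pattern matcher. This is not SWRL syntax and not a SWRL engine; it is a generic illustration in which a rule has a body (patterns to match) and a head (the triple to create), with variables written as ?name. All names are assumptions.

```python
def match(pattern, triple, bindings):
    """Extend bindings so that pattern equals triple, or return None."""
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):                      # variable
            if result.setdefault(term, value) != value:
                return None
        elif term != value:                           # constant must match
            return None
    return result


def apply_rule(body, head, triples):
    """Return the triples produced by applying one rule once."""
    solutions = [{}]
    for pattern in body:
        extended = []
        for bindings in solutions:
            for triple in triples:
                new_bindings = match(pattern, triple, bindings)
                if new_bindings is not None:
                    extended.append(new_bindings)
        solutions = extended
    return {tuple(b.get(term, term) for term in head) for b in solutions}


# Rule: {?a} ?p {?b}, {?p} rdfs:subPropertyOf {?q}  =>  {?a} ?q {?b}
RULE_BODY = [("?a", "?p", "?b"), ("?p", "rdfs:subPropertyOf", "?q")]
RULE_HEAD = ("?a", "?q", "?b")

FACTS = [
    ("WOA", "iridl:isContainerOf", "Grid-1x1"),
    ("iridl:isContainerOf", "rdfs:subPropertyOf", "dcterm:hasPart"),
]
print(apply_rule(RULE_BODY, RULE_HEAD, FACTS))
```

Because the rule is ordinary data, it could itself be stored and exchanged alongside the triples it operates on, which is the point the slide makes about SWRL.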

Simple Knowledge Organization System (SKOS) Schema for relating concepts

Simple Knowledge Organization System (SKOS)
• So, for a resource of type skos:Concept, any properties of that resource (such as creator, date of modification, source etc.) should be interpreted as properties of a concept, and not as properties of some 'real world thing' that resource may be a conceptualisation of.
• This layer of indirection allows thesaurus-like data to be expressed as an RDF graph. The conceptual content of any thesaurus can of course be remodelled as an RDFS/OWL ontology. However, this remodelling work can be a major undertaking, particularly for large and/or informal thesauri. A SKOS Core representation of a thesaurus maps fairly directly onto the original data structures, and can therefore be created without expensive remodelling and analysis

RDF Frameworks
Protégé API
Redland: Bindings in many languages, supports several triple stores, some with context
Jena: Java API, some cmd line utilities, supports inference layers
Sesame: HTTP server, Java API, supports inference, version 2 alpha has context

Sesame
SAIL - Storage and Inference Layer
i.e. you can write down rules that imply virtual triples so that triples are generated as they are put into the store
RDFS; No inference; RDFS inference; OWLIM: Some OWL inference; Custom
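
The idea of generating virtual triples as statements are put into the store can be sketched with a small in-memory class. This is not Sesame's SAIL API; the class name InferencingStore, its methods, and the ex: names are assumptions, and only two RDFS rules (subclass transitivity and type propagation) are covered.

```python
class InferencingStore:
    """Toy triple store that adds implied triples at insertion time."""

    def __init__(self):
        self.triples = set()

    def add(self, s, p, o):
        pending = [(s, p, o)]
        while pending:
            triple = pending.pop()
            if triple in self.triples:
                continue
            self.triples.add(triple)
            # Anything the new triple implies is queued and stored as well.
            pending.extend(self._consequences(triple))

    def _consequences(self, triple):
        s, p, o = triple
        implied = []
        for s2, p2, o2 in self.triples:
            if p == "rdfs:subClassOf":
                if p2 == p and s2 == o:          # s < o < o2
                    implied.append((s, p, o2))
                if p2 == p and o2 == s:          # s2 < s < o
                    implied.append((s2, p, o))
                if p2 == "rdf:type" and o2 == s:  # instances of s are in o
                    implied.append((s2, "rdf:type", o))
            elif p == "rdf:type":
                if p2 == "rdfs:subClassOf" and s2 == o:
                    implied.append((s, "rdf:type", o2))
        return implied


store = InferencingStore()
store.add("ex:WOA", "rdf:type", "ex:Dataset")
store.add("ex:Dataset", "rdfs:subClassOf", "ex:Item")
print(("ex:WOA", "rdf:type", "ex:Item") in store.triples)
```

Doing the inference at insertion time means queries need no reasoning of their own: the virtual triples are already in the store when a query arrives.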

Jena
Java framework
In-memory and persistent stores
Inference API

Topics/Issues
• OpenDAP and RDF: can we transport data semantics without fixing the entire schema?
• netcdf/HDF and RDF: do we need noncontextual modeling in our metadata transport/storage?
• Concepts as classes vs concepts as individuals
• Sub-classes vs sub-categories
• OWL in detail
• Protégé demo

RDF Cast of Characters
Semantic Layers: RDFS, OWL, SWRL, SKOS
Query Language: SPARQL, SeRQL
Tools and Frameworks: Protégé, Sesame, Redland, Jena, Reasoners