The Semantic Web
What is the Semantic Web?
“The Semantic Web is an extension of the current web in which information is given a well-defined meaning, better enabling computers and people to work in cooperation.” – Tim Berners-Lee et al. [The Semantic Web, Scientific American, 2001]
“A set of standards and best practices for sharing data and the semantics of that data over the Web for use by applications” – Bob DuCharme [Learning SPARQL, 2013]
Standards:
1. RDF data model
2. SPARQL query language
3. RDFS and OWL standards for storing vocabularies and ontologies.
Best practices include the use of URIs (IRIs) to refer to entities on the web and the use of standards.
Semantic Web technologies vs current web technologies
Current web technologies (intended for humans):
• URL (Uniform Resource Locator), which identifies web pages
• HTML, CSS, etc.: technologies for the presentation of data
• Databases, e.g., MySQL, Oracle, etc.
Limitations of the WWW
To get a specific question answered, a person must:
1. Browse extensively, i.e. collect all the relevant information. This is tedious and time consuming, because there is too much information with too little structure on the web.
2. Interpret the collected information, accounting for the fact that web content is heterogeneous in terms of data representation, structure, and character encoding.
3. Integrate that information to derive the answer. The result is only as good as the integration process itself, AND only people can carry out this task as of now, BECAUSE only people can derive new information from multiple, possibly heterogeneous pieces of information.
Note: Integration requires not only an understanding of the meaning of the information to be integrated, but also the ability to perform reasoning, i.e. to determine what follows from what we know.
What we cannot do on the web: Examples
1. Find the highest-ranked ABET-accredited computer science program in CT offered by a public university.
2. Which degree pays more: Computer Science or Exercise Science?
3. Find the cheapest two-bedroom apartment in New Britain near CCSU.
Finding answers to these questions requires intelligent integration of content from multiple websites, as well as background knowledge. Note that all the required knowledge is on the web, but it is not understandable to machines because it is intended to be processed and used by people.
How about the so-called “intelligent personal assistants” like Siri, Cortana and Google Assistant? They are not as “intelligent” as they seem to be: they are intended to help a general user navigate and collect information, but the interpretation and integration of this information is up to the user.
The challenge
• There is a huge amount of data on the web, BUT most of this data is not linked (consider the WWW without document links).
• We need a standard way to represent data which also allows linkable data to be automatically linked.
• We need ways to access data to be used by applications.
• We need vocabularies to capture the meaning of data.
• We need languages to process this data.
• We need methods for deriving further information from existing data, such as logical deduction, which is the basis for automated reasoning.
• And much more...
Towards addressing the challenge: A (smarter) navigation agent for the SW (adapted from Liyang Yu, A Developer’s Guide to the Semantic Web)
The task: You want to collect information about CCSU that is currently available on the SW.
The traditional approach implements the “follow your nose” strategy: you go from one place to another, wherever “relevant” hyperlinks take you. Current web crawlers follow the same strategy.
Linked data allows us to implement a smarter version of this approach, because it connects not just documents but datasets as well, thus allowing us to make data discoveries along the way.
CCSU is “known” on the SW under different names, one of which is http://dbpedia.org/resource/Central_Connecticut_State_University
This link returns an RDF document containing information about CCSU (obtained via a process called de-referencing).
De-referencing a URI link returns an RDF document
The role of sameAs links
Notice the owl:sameAs field: it contains multiple links that refer to the exact same entity (or resource), CCSU. Clicking on wikidata:Central Connecticut State University takes us to another Linked Data set and returns another RDF document containing more information about CCSU.
The implementation of the “follow your nose” approach: general framework
Step 1: De-reference the chosen URI to get an RDF document that describes it.
Step 2: Data discovery. In the returned RDF document, identify all statements that satisfy the following pattern:
<http://dbpedia.org/resource/Central_Connecticut_State_University> <some_Property> <some_Value>
These are the facts discovered in the document about CCSU:
Facts about <http://dbpedia.org/resource/Central_Connecticut_State_University>:
- <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> : <http://dbpedia.org/class/yago/Institution108053576>
- <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> : <http://umbel.org/umbel/rc/EducationalOrganization>
- <http://dbpedia.org/ontology/mascot> : <Kizer the Blue Devil>
- <http://dbpedia.org/ontology/campus> : <http://dbpedia.org/resource/Suburban_area>
- <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> : <http://dbpedia.org/class/yago/Abstraction100002137>
- <http://www.w3.org/2000/01/rdf-schema#label> : <?????????@ru>
- <http://dbpedia.org/ontology/state> : <http://dbpedia.org/resource/Connecticut>
- <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> : <http://dbpedia.org/class/yago/YagoLegalActor>
- <http://dbpedia.org/ontology/wikiPageRevisionID> : <744013227^^http://www.w3.org/2001/XMLSchema#integer>
- <http://dbpedia.org/ontology/athletics> : <http://dbpedia.org/resource/NCAA_Division_I>
The implementation of the “follow your nose” approach: general framework (contd.)
Step 3: Get links to follow to find more data about CCSU. For that, find all patterns in the current RDF document of the forms
<http://dbpedia.org/resource/Central_Connecticut_State_University> owl:sameAs <some_Resource>
or
<some_Resource> owl:sameAs <http://dbpedia.org/resource/Central_Connecticut_State_University>
This step returns a list of links where information about CCSU is found:
Links to follow
http://rdf.freebase.com/ns/m.02hdt4
http://wikidata.dbpedia.org/resource/Q1053764
http://yago-knowledge.org/resource/Central_Connecticut_State_University
http://sw.cyc.com/concept/Mx4rvqQ56ZwpEbGdrcN5Y29ycA
http://de.dbpedia.org/resource/Central_Connecticut_State_University
http://es.dbpedia.org/resource/Universidad_Estatal_de_Connecticut_Central
http://ko.dbpedia.org/resource/???_???
http://www.wikidata.org/entity/Q1053764
These links are stored on a stack and are explored (recursively) in depth-first fashion, starting at Step 1. That is, de-referencing the top link returns an RDF document where facts about CCSU are found as described in Step 2, and then Step 3 identifies new links to be added to the stack. Stop when there are no more links to follow.
In our case, we have http://www.wikidata.org/entity/Q1053764 on the top of the stack:
Facts about <http://www.wikidata.org/entity/Q1053764>:
- <http://www.w3.org/2004/02/skos/core#prefLabel> : <?????????@ru>
- <http://www.w3.org/2004/02/skos/core#prefLabel> : <???????@ur>
- <http://www.wikidata.org/prop/direct/P281> : <06050>
- <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> : <http://wikiba.se/ontology#Item>
- <http://www.w3.org/2000/01/rdf-schema#label> : <???????@ur>
- <http://www.wikidata.org/prop/P31> : <http://www.wikidata.org/entity/statement/q1053764-9CC87E93-14C3-404A-BF0ADC0BF207EA46>
- <http://www.wikidata.org/prop/direct/P3876> : <http://www.wikidata.org/entity/Q8353462>
- <http://www.w3.org/2004/02/skos/core#prefLabel> : <Universitat Central de Connecticut@ca>
- <http://www.w3.org/2004/02/skos/core#prefLabel> : <???@ko>
- <http://www.wikidata.org/prop/direct/P3417> : <Central-Connecticut-State-University>
Links to follow (no new links discovered in this case)
http://rdf.freebase.com/ns/m.02hdt4
http://wikidata.dbpedia.org/resource/Q1053764
http://yago-knowledge.org/resource/Central_Connecticut_State_University
http://sw.cyc.com/concept/Mx4rvqQ56ZwpEbGdrcN5Y29ycA
http://de.dbpedia.org/resource/Central_Connecticut_State_University
http://es.dbpedia.org/resource/Universidad_Estatal_de_Connecticut_Central
http://ko.dbpedia.org/resource/???_???
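The three steps and the stack-based exploration can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `FAKE_WEB` and `dereference` are hypothetical stand-ins for real HTTP de-referencing, the two documents are tiny excerpts of the facts listed on the slides, and the back-link from the Wikidata document is invented to exercise the second sameAs pattern.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
DBPEDIA_CCSU = "http://dbpedia.org/resource/Central_Connecticut_State_University"
WIKIDATA_CCSU = "http://www.wikidata.org/entity/Q1053764"

# Hypothetical stand-in for the web: URI -> triples of the RDF document
# that de-referencing the URI would return (no network access happens here).
FAKE_WEB = {
    DBPEDIA_CCSU: [
        (DBPEDIA_CCSU, "http://dbpedia.org/ontology/mascot", "Kizer the Blue Devil"),
        (DBPEDIA_CCSU, "http://dbpedia.org/ontology/state",
         "http://dbpedia.org/resource/Connecticut"),
        (DBPEDIA_CCSU, OWL_SAME_AS, WIKIDATA_CCSU),
    ],
    WIKIDATA_CCSU: [
        (WIKIDATA_CCSU, "http://www.wikidata.org/prop/direct/P281", "06050"),
        # Invented back-link, matching "<some_Resource> owl:sameAs <URI>".
        (DBPEDIA_CCSU, OWL_SAME_AS, WIKIDATA_CCSU),
    ],
}


def dereference(uri):
    """Step 1: return the triples describing `uri` (empty if nothing is known)."""
    return FAKE_WEB.get(uri, [])


def follow_your_nose(start_uri):
    """Collect facts about one entity by following owl:sameAs links depth-first."""
    stack = [start_uri]
    visited = set()
    facts = {}
    while stack:
        uri = stack.pop()
        if uri in visited:
            continue
        visited.add(uri)
        document = dereference(uri)
        # Step 2: statements of the form <uri> <property> <value>
        # (sameAs statements are handled separately in Step 3).
        facts[uri] = [(p, o) for s, p, o in document
                      if s == uri and p != OWL_SAME_AS]
        # Step 3: sameAs links in either direction give new URIs to explore.
        for s, p, o in document:
            if p != OWL_SAME_AS:
                continue
            other = o if s == uri else s if o == uri else None
            if other is not None and other not in visited:
                stack.append(other)
    return facts


facts = follow_your_nose(DBPEDIA_CCSU)
for uri, pairs in facts.items():
    print("Facts about", uri)
    for prop, value in pairs:
        print("  -", prop, ":", value)
```

The `visited` set is what makes the traversal stop: sameAs links are often reciprocal, so without it the agent would bounce between the same two datasets.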
What does it take to implement a SW agent utilizing Linked Data?
1. Since the agent browses different web sites to collect information, each web site must describe that information in a uniform way, i.e. we need a standard for representing data/knowledge on the web.
2. The information at different web sites cannot be arbitrary: it should be consistent with agreed common terms and relations. For example, to describe a person, we use some common terms such as name, birthday, homepage, etc., defined by the FOAF (Friend Of A Friend) vocabulary.
3. The agent must “understand” each chunk of information it collects.
4. The agent must be able to conduct reasoning based on its understanding of common terms and relations.
5. The agent must be able to answer queries about the information it has collected.
6. … and more, of course.
Linked Open Data Cloud (2007)
Over 500 million data chunks (RDF triples) with 120,000 links between them
Linked Data [http://lod-cloud.net]
For the Linked Open Data cloud diagram in 2009 see http://lod-cloud.net/versions/2009-07-14/lod-cloud.pdf
For the Linked Open Data cloud diagram in 2011 see http://lod-cloud.net/versions/2011-09-19/lod-cloud.pdf
For the Linked Open Data cloud diagram in 2014 see http://lod-cloud.net/versions/2014-08-30/lod-cloud.pdf
For the Linked Open Data cloud diagram in June 2018 see https://lod-cloud.net/
Note: All data in the LOD cloud is linked together and is publicly available. In June 2018 the LOD cloud contained 1,224 datasets with 16,113 links between them and hundreds of billions of facts.
Linked Data principles [https://www.w3.org/DesignIssues/LinkedData.html]
1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF for representing data, SPARQL for querying data).
4. Include links to other URIs so that someone can discover more things.
Linked Data: Real World Applications
• BBC Music Website [https://www.bbc.co.uk/music]
Want to learn more? Read https://www.cmswire.com/cms/information-management/bbcs-adoption-of-semantic-web-technologies-an-interview-017981.php
For more behind the scenes, check out https://www.bbc.co.uk/ontologies
• Data.gov [https://www.data.gov/]
Contains US government data intended to provide the public with easy access to a huge number of datasets. (to be discussed – textbook/Chapter 9)
• Wikidata [http://wikidata.org/wiki/Wikidata:Main_Page]
“Wikidata is a free and open knowledge base that can be read and edited by both humans and machines. Wikidata acts as central storage for the structured data of its Wikimedia sister projects including Wikipedia, Wikivoyage, Wikisource, and others.” (from the Wikidata main page)
What is needed to implement Linked Data?
• A formal, machine-understandable language to represent data and relations between data (i.e. define syntax).
• Formal rules to allow machines to extract (classify, query) information from data (i.e. define semantics).
• Built-in descriptions (vocabularies, ontologies) to describe the domains in which data is to be interpreted. Recall the importance of context and pragmatics in data interpretation.
Linked data is what the Semantic Web is about. That is, the Semantic Web is a collection of standard technologies to implement linked data.
The Semantic Web is a REALITY
Currently, the Semantic Web encompasses almost 10,000 databases, more than 85 billion facts, and more than 800 million links. These are publicly available data, identifiable via URIs and accessible via HTTP.
Example: DBpedia, the Wikipedia for the Semantic Web, which can be used by both humans and computers. For humans, information is returned as an HTML document; for computers, information is returned in machine-understandable RDF format.
The link http://dbpedia.org/resource/Central_Connecticut_State_University leads to:
http://dbpedia.org/page/Central_Connecticut_State_University (returns the web page)
http://dbpedia.org/data/Central_Connecticut_State_University (returns the machine-understandable representation)
Human-readable HTML document
A URI for a particular resource may already exist on the web. Example: http://dbpedia.org/resource/CCSU (check it out). Notice that an HTTP request for the resource, CCSU, submitted from a browser is automatically redirected to the human-readable HTML document, which is returned to the user. However, if an application refers to that same resource, DBpedia returns a machine-readable representation instead.
Machine-understandable RDF document
When an application, rather than a person using a browser, requests the same URI, http://dbpedia.org/resource/CCSU, DBpedia returns a machine-readable RDF representation of the resource instead of the HTML page.
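The choice between the two representations can be pictured as a decision on the client's Accept header. The sketch below only simulates that decision; it sends no HTTP request. The function name `representation_url` is hypothetical, the `/page/` and `/data/` paths are the ones shown on the earlier slide, and treating `text/html` as the sole marker of a human client is a simplifying assumption.

```python
def representation_url(resource_uri, accept_header):
    """Map a .../resource/NAME URI to the document a client would be sent to.

    Clients that accept text/html are treated as browsers and get the
    /page/ document; every other client gets the /data/ document.
    """
    base, marker, name = resource_uri.partition("/resource/")
    if not marker:
        raise ValueError("expected a URI containing /resource/")
    kind = "page" if "text/html" in accept_header else "data"
    return f"{base}/{kind}/{name}"


CCSU = "http://dbpedia.org/resource/Central_Connecticut_State_University"
browser_view = representation_url(CCSU, "text/html,application/xhtml+xml")
agent_view = representation_url(CCSU, "application/rdf+xml")
print(browser_view)
print(agent_view)
```

The point of the design is that one URI names the resource itself, while the documents about it live at separate URLs.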
Semantic Web Layer Cake
Source: http://www.semanticfocus.com/blog/entry/title/introduction-to-the-semantic-web-vision-and-technologies-part-2-foundations/
XML (eXtensible Markup Language)
XML is a flexible text format that is used to structure, store, and transport data over the Web. In contrast to HTML, which is about displaying data, XML is about describing data, BUT there is no one standard way to describe the same data.
Example: Consider the concept COURSE, and its instance CS 462.
HTML description:
  <H1> CS 462: AI </H1>
  <UL>
    <LI> CRN: 4185
    <LI> Level: undergrad/grad
    <LI> Professor: NZ, office hours …
    <LI> Website: www.cs.ccsu.edu/~neli
  </UL>
XML description:
  <course>
    <title> CS 462: AI </title>
    <CRN> 4185 </CRN>
    <level> undergrad/grad </level>
    <Professor>
      <name> NZ </name>
      <officeHours> … </officeHours>
      <Website> … </Website>
    </Professor>
  </course>
XML documents are labeled trees
Course
  Title
  CRN
  Level
  Professor
    Name
    Web site
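To see the tree structure concretely, the course description can be parsed with Python's standard `xml.etree.ElementTree` module and printed as an indented outline. This is a minimal sketch: the element names follow the course example, with `officeHours` written without a space because XML tag names cannot contain one, and `outline` is a helper name made up for this illustration.

```python
import xml.etree.ElementTree as ET

COURSE_XML = """
<course>
  <title>CS 462: AI</title>
  <CRN>4185</CRN>
  <level>undergrad/grad</level>
  <Professor>
    <name>NZ</name>
    <officeHours>...</officeHours>
    <Website>...</Website>
  </Professor>
</course>
"""


def outline(element, depth=0):
    """Return the element labels in document order, indented by tree depth."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(COURSE_XML)
tree_lines = outline(root)
print("\n".join(tree_lines))

# Paths through the tree address nested nodes.
professor_name = root.find("Professor/name").text
```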
XML (contd.)
XML documents are easily readable and understandable by humans, because their tags are familiar terms, but
• XML lacks semantics, and
• XML makes no commitment to an ontological vocabulary, nor to ontological modelling, i.e. it cannot serve as a knowledge representation language.
Because XML is a universal meta markup language, the same term can be given different meanings by different sources (for example, title can mean “book title” or “person title”). To resolve such inconsistencies, so-called namespaces are used. For example:
xmlns:dc=“http://purl.org/dc/elements/1.1/” defines the namespace dc (Dublin Core), and <dc:title>Artificial Intelligence</dc:title> suggests that the term title refers to a book.
xmlns:v=“http://www.w3.org/2006/vcard/” describes people, and <v:title>Doctor</v:title> suggests that the term title refers to a person.
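A short sketch with `xml.etree.ElementTree` shows how namespaces keep the two meanings of title apart: after parsing, each tag carries its namespace URI. The wrapping `<record>` element is a hypothetical container added only so that both titles fit in one document; the two namespace URIs are copied from the slide as given.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
VCARD = "http://www.w3.org/2006/vcard/"

DOC = f"""
<record xmlns:dc="{DC}" xmlns:v="{VCARD}">
  <dc:title>Artificial Intelligence</dc:title>
  <v:title>Doctor</v:title>
</record>
"""

root = ET.fromstring(DOC)

# ElementTree expands each prefix, so the tags become {namespace}localname.
tags = [child.tag for child in root]
print(tags)

# The same local name is looked up unambiguously through its namespace.
prefixes = {"dc": DC, "v": VCARD}
book_title = root.find("dc:title", prefixes).text
person_title = root.find("v:title", prefixes).text
```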
RDF (Resource Description Framework)
• RDF is the foundation for representing and processing knowledge on the web. It is a graph-based data model, where knowledge is represented as a list of statements called triples.
• Each triple has the form “subject, predicate, object”. Example: “Jones TEACHES Math101”
• Each element of a triple (a resource) is identified by a URI. Example, in N-Triples format:
<http://myUniv.edu/people/Jones> <http://myUniv.edu/terms/teaches> <http://myUniv.edu/courses/Math101> .
RDF can be written down in various ways (called serializations), one of which has an XML-based syntax to support syntactic interoperability. Example:
<rdf:RDF xmlns:rdf=“http://www.w3.org/1999/02/22-rdf-syntax-ns#”
         xmlns:myUniv=“http://myUniv.edu/terms/”>
  <rdf:Description rdf:about=“http://myUniv.edu/people/Jones”>
    <myUniv:teaches>
      <rdf:Description rdf:about=“http://myUniv.edu/courses/Math101”/>
    </myUniv:teaches>
  </rdf:Description>
</rdf:RDF>
RDF statements are directed labeled graphs
<http://www.math.ccsu/jones> --<http://www.cs.ccsu.edu/~neli/univ.owl#teaches>--> <http://www.ccsu.edu/catalog/Math101>
• RDF is provided with a model-theoretic semantics that defines the notion of entailment between two RDF statements.
• RDF graphs are finite sets of RDF triples.
• These types of graphs are very similar to semantic nets.
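Because an RDF graph is a finite set of triples, a plain Python set of 3-tuples is enough to model it. The sketch below, which assumes every term is a URI (literals and blank nodes are left out), serializes the graph as N-Triples lines and builds the outgoing labeled edges of each node. The helper names `to_ntriples` and `outgoing_edges` are made up for this illustration.

```python
from collections import defaultdict

JONES = "http://www.math.ccsu/jones"
TEACHES = "http://www.cs.ccsu.edu/~neli/univ.owl#teaches"
MATH101 = "http://www.ccsu.edu/catalog/Math101"

# An RDF graph: a finite set of (subject, predicate, object) triples.
graph = {(JONES, TEACHES, MATH101)}


def to_ntriples(triples):
    """Write URI-only triples as N-Triples, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in sorted(triples))


def outgoing_edges(triples):
    """View the triples as a directed graph: node -> [(edge label, target)]."""
    edges = defaultdict(list)
    for s, p, o in triples:
        edges[s].append((p, o))
    return dict(edges)


print(to_ntriples(graph))
print(outgoing_edges(graph))
```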
Another example
Consider the following set of triples:
{ <?p1 foaf:name “Jones”>,
  <?p1 foaf:knows ?p2>,
  <?p1 myUniv:teaches ?c1>,
  <?p2 myUniv:studies ?c1>,
  <?p2 foaf:name “Bob”>,
  <?p2 foaf:mbox “Bob@mygmail.com”>,
  <?c1 rdf:type myUniv:course>,
  <?c1 foaf:name “Math101”> }
where
foaf: <http://xmlns.com/foaf/0.1/>
rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
[Figure: the same triples drawn as a directed labeled graph, with blank nodes _:p1, _:p2 and _:c1, literal nodes “Jones”, “Bob”, “Bob@mygmail.com” and “Math101”, the class node myUniv:course, and edges labeled foaf:name, foaf:knows, foaf:mbox, myUniv:teaches, myUniv:studies and rdf:type.]
RDF Schema (RDF Vocabulary Description Language)
• RDF is a universal language that allows users to describe their own domains, but it does not make assumptions about any particular domain.
• RDF Schema defines the vocabulary, specifies object properties and their values, and describes the relations between objects.
• RDF Schema organizes this vocabulary in a typed class hierarchy.
Example (for short, in N3 format, which is a superset of N-Triples; it allows us to define URI prefixes at the beginning of the document and to identify entity URIs with respect to those prefixes):
@prefix univ: <http://www.cs.ccsu.edu/~neli/univ.owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
univ:professor rdfs:subClassOf univ:staff .
univ:staff rdf:type rdfs:Class .
univ:professor rdf:type rdfs:Class .
univ:Jones rdf:type univ:professor .
RDF/RDFS Example
[Figure: the RDF statement “Jones teaches Math101” shown together with the RDFS classes Professor, TeachingAss and Staff.]
RDF and RDFS Axiomatic Semantics
• All language primitives are represented by constants, such as Resource, Class, Property, subClassOf, Literal, etc.
• A few predefined predicates are used to represent relations between constants, such as:
– An RDF triple is represented as PropVal(P, R, V), where P is a property, R is a resource, and V is a value;
– Predicate Type(R, T) states that resource R has the type T, and it is equivalent to PropVal(type, R, T). All classes are instances of Class and have the type Class, i.e. Type(Class, Class), Type(Property, Class), Type(Resource, Class), etc.
• Resource is the most general class: every class and every property is a resource.
• Predicates in RDF statements are properties.
RDF and RDFS Semantics (contd.)
In RDFS, we also have subclasses, subproperties, and constraints.
• subClassOf is a property, i.e. Type(subClassOf, Property).
• If class C is a subclass of class C’, then all instances of C are also instances of C’, i.e.
PropVal(subClassOf, ?c, ?c’) ↔ (Type(?c, Class) & Type(?c’, Class) & ∀?x (Type(?x, ?c) → Type(?x, ?c’)))
• Property P is a subproperty of property P’ if P’(x, y) holds whenever P(x, y) holds, i.e.
Type(subPropertyOf, Property)
PropVal(subPropertyOf, ?p, ?p’) ↔ (Type(?p, Property) & Type(?p’, Property) & ∀?r ∀?v (PropVal(?p, ?r, ?v) → PropVal(?p’, ?r, ?v)))
• Every constraint resource is a resource, i.e.
PropVal(subClassOf, ConstraintResource, Resource)
• Constraint properties are all properties that are also constraint resources, i.e.
Type(?cp, ConstraintProperty) ↔ (Type(?cp, ConstraintResource) & Type(?cp, Property))
RDF and RDFS Semantics (contd.)
• domain and range are constraint properties, i.e.
Type(domain, ConstraintProperty)
Type(range, ConstraintProperty)
• The domain of a property P is the set of all objects to which P applies, i.e.
PropVal(domain, ?p, ?d) → ∀?x ∀?y (PropVal(?p, ?x, ?y) → Type(?x, ?d))
• The range of a property P is the set of all values that P can take, i.e.
PropVal(range, ?p, ?r) → ∀?x ∀?y (PropVal(?p, ?x, ?y) → Type(?y, ?r))
Given all these axioms, we can derive the following formulas:
PropVal(domain, range, Property)
PropVal(range, range, Class)
PropVal(domain, domain, Property)
PropVal(range, domain, Class)
Example. Given
PropVal(subClassOf, Professor, Staff),
PropVal(domain, teaches, Professor),
PropVal(teaches, Jones, Math1)
we can derive Type(Jones, Staff): the domain axiom gives Type(Jones, Professor), and the subClassOf axiom then gives Type(Jones, Staff).
A Direct Inference System for RDF and RDFS
• Based on rules of the form
If E contains certain triples Then add to E certain triples,
where E is a set of RDF triples.
Example rules (from the W3C RDF recommendations):
– If E contains the triple (?x, ?p, ?y)
  Then E also contains the triple (?p, rdf:type, rdf:Property)
– If E contains the triples (?u, rdfs:subClassOf, ?v) and (?v, rdfs:subClassOf, ?w)
  Then E also contains the triple (?u, rdfs:subClassOf, ?w)
– If E contains the triples (?x, rdf:type, ?u) and (?u, rdfs:subClassOf, ?v)
  Then E also contains the triple (?x, rdf:type, ?v)
– If E contains the triples (?x, ?p, ?y) and (?p, rdfs:range, ?u)
  Then E also contains the triple (?y, rdf:type, ?u)
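The four example rules can be applied repeatedly until no new triple appears, which is the usual forward-chaining reading of “If E contains … Then add to E …”. The sketch below is a naive fixpoint loop over a set of tuples, not an optimized reasoner and not a complete set of RDFS rules. Terms are kept as prefixed strings for brevity, and the function name `rdfs_closure` and the sample triples are assumptions made for illustration.

```python
RDF_TYPE = "rdf:type"
RDF_PROPERTY = "rdf:Property"
SUBCLASS_OF = "rdfs:subClassOf"
RANGE = "rdfs:range"


def rdfs_closure(triples):
    """Apply the four example rules until the triple set stops growing."""
    closure = set(triples)
    while True:
        derived = set()
        for s, p, o in closure:
            # Rule 1: every predicate in use is an rdf:Property.
            derived.add((p, RDF_TYPE, RDF_PROPERTY))
            if p == SUBCLASS_OF:
                # Rule 2: subClassOf is transitive.
                derived.update((s, SUBCLASS_OF, w) for v, q, w in closure
                               if q == SUBCLASS_OF and v == o)
                # Rule 3: instances of s are also instances of o.
                derived.update((x, RDF_TYPE, o) for x, q, u in closure
                               if q == RDF_TYPE and u == s)
            if p == RANGE:
                # Rule 4: values of property s are instances of class o.
                derived.update((y, RDF_TYPE, o) for x, q, y in closure
                               if q == s)
        if derived <= closure:
            return closure
        closure |= derived


asserted = {
    ("myUniv:Student1", RDF_TYPE, "myUniv:TeachingAssistant"),
    ("myUniv:TeachingAssistant", SUBCLASS_OF, "myUniv:Staff"),
    ("myUniv:Jones", "myUniv:teaches", "myUniv:Math101"),
    ("myUniv:teaches", RANGE, "myUniv:Course"),
}
inferred = rdfs_closure(asserted)


def instances_of(triples, cls):
    """Answer the query (?x, rdf:type, cls) by plain pattern matching."""
    return {s for s, p, o in triples if p == RDF_TYPE and o == cls}


print(instances_of(asserted, "myUniv:Staff"))  # matching on asserted triples only
print(instances_of(inferred, "myUniv:Staff"))  # matching after applying the rules
```

Matching against `asserted` finds no staff members, while matching against `inferred` finds `myUniv:Student1`, because the third rule has added the missing `rdf:type` triple.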
How is inference in RDF different from inference in RDFS?
Consider the following triples:
• myUniv:Student1 rdf:type myUniv:TeachingAssistant .
• myUniv:TeachingAssistant rdfs:subClassOf myUniv:Staff .
RDF inference will not return an answer to the query to retrieve all staff members, i.e. (?x, rdf:type, myUniv:Staff), because there is no triple matching this pattern.
RDFS will return the instances of the TeachingAssistant class using the rule (called the “type propagation” rule)
If E contains the triples (?x, rdf:type, ?u) and (?u, rdfs:subClassOf, ?v)
Then E also contains the triple (?x, rdf:type, ?v)
Types of inferences in SW applications
1. Class membership: if x is an instance of class C, and C is a subclass of D, we want to infer that x is an instance of D.
2. Equivalence of classes: if class A is equivalent to class B, and class B is equivalent to class C, then A is equivalent to C.
3. Classification: if a property-value pair is declared to be a sufficient condition for membership in class A, then if individual x satisfies this condition, x must be an instance of A.
4. Consistency: if x is declared to be an instance of class A, where A ⊑ B ⊓ C, A ⊑ D, and B ⊓ D = ∅, then the ontology is inconsistent, because class A must be empty but instead x ∈ A.
Multiple inheritance and RDFS
Consider the rule
If E contains the triples (?x, rdf:type, ?u) and (?u, rdfs:subClassOf, ?v)
Then E also contains the triple (?x, rdf:type, ?v)
Assume the following triples:
a. “Bob” rdf:type myUniv:TeachingAssistant .
b. “Bob” rdf:type myUniv:Student .
c. myUniv:TeachingAssistant rdfs:subClassOf myUniv:Staff .
From a. and c. and the above rule we can derive
“Bob” rdf:type myUniv:Staff .
The latter may be inconsistent with b. if Staff and Student are supposed to be disjoint classes. But disjointness of classes cannot be expressed in RDFS.
In RDFS, if ?A is subClassOf ?B, and ?A is subClassOf ?C, then any individual ?x that is a member of ?A will also be a member of ?B and ?C. That is, the range definitions in RDFS are not used to restrict the range of a property, but to infer the membership of the range.
What can we deduce in RDFS?
In summary, the inference capabilities of RDFS are limited to the following:
1. Given the domain and the range of a property, we can deduce:
• Class membership from the domain of a property.
• Class membership from the range of a property.
Example: given that “Course isTaughtBy Professor” and “Math101 isTaughtBy Jones”, we can derive that Math101 ∈ Course and Jones ∈ Professor.
2. Given a class hierarchy, we can deduce superclass membership.
Example: given that Professor ⊑ Staff and Jones ∈ Professor, we can derive that Jones ∈ Staff.
3. Given a property hierarchy, we can deduce new facts from subproperty relationships.
Example: from teachAt ⊑ employedBy and “Jones teachAt CCSU” we can derive that “Jones employedBy CCSU”.
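The domain, range and subproperty deductions from the examples above can be reproduced with a small fixpoint loop. This is a sketch under assumptions: “Course isTaughtBy Professor” is read as a domain declaration (Course) plus a range declaration (Professor), terms are unprefixed strings, and the function name `deduce` is made up for this illustration.

```python
RDF_TYPE = "rdf:type"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"
SUBPROPERTY_OF = "rdfs:subPropertyOf"


def deduce(triples):
    """Add domain, range and subproperty consequences until nothing changes."""
    closure = set(triples)
    while True:
        derived = set()
        for s, p, o in closure:
            # Look up schema statements about the predicate p of this triple.
            for prop, schema_term, target in closure:
                if prop != p:
                    continue
                if schema_term == DOMAIN:
                    derived.add((s, RDF_TYPE, target))   # subject is in the domain
                elif schema_term == RANGE:
                    derived.add((o, RDF_TYPE, target))   # object is in the range
                elif schema_term == SUBPROPERTY_OF:
                    derived.add((s, target, o))          # superproperty also holds
        if derived <= closure:
            return closure
        closure |= derived


schema_and_facts = {
    ("isTaughtBy", DOMAIN, "Course"),
    ("isTaughtBy", RANGE, "Professor"),
    ("Math101", "isTaughtBy", "Jones"),
    ("teachAt", SUBPROPERTY_OF, "employedBy"),
    ("Jones", "teachAt", "CCSU"),
}
deduced = deduce(schema_and_facts) - schema_and_facts
for triple in sorted(deduced):
    print(triple)
```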
What can we not deduce in RDFS?
1. We can’t say that two classes are disjoint, i.e. we can define Student and Staff as subclasses of the Person class, but we cannot say that they are disjoint.
2. Property range is defined globally for all classes; we can’t declare range restrictions that apply to some classes only, i.e. exceptions are not allowed.
3. We can’t build Boolean combinations of classes. For example, we may want to declare a new class, person, which is the disjoint union of the classes male and female.
4. Cardinality restrictions are not allowed. For example, we can’t say that a person has exactly two parents, or that a class has exactly one instructor.
5. We can’t declare a property to be the inverse of another property, transitive, functional, etc.
OWL: The Web Ontology Language
§ The original OWL language, OWL 1, was intended to provide richer expressiveness than RDFS, which is why it was based on the SHOIN(D) logic. More expressive power, however, may lead to undesirable computational properties, which is why OWL 1 was designed in 3 different flavors to address different knowledge representation needs:
1. OWL Full: fully compatible with RDFS, which is further extended with cardinality constraints and other means for maximum expressivity, BUT the language is undecidable.
2. OWL DL: a subset of OWL Full that allows for efficient reasoning, but is not fully compatible with RDFS.
3. OWL Lite: a subset of OWL DL that does not allow for enumerated classes, disjointness and arbitrary cardinality.
§ The latest version of OWL, OWL 2, is based on the SROIQ(D) logic. It also comes in different flavors: OWL 2 EL, OWL 2 RL and OWL 2 QL, which are all subsets of OWL 2 DL, which in turn is a subset of OWL 2 Full.
§ OWL is based on the Open World Assumption, which states that the absence of information is not a reason to assume that this information is false.
§ OWL does not rely on the Unique Name Assumption (unlike databases, which do), i.e. if two names are not explicitly stated to be different, they may refer to the same individual.
OWL RDF/RDFS relation
[Figure: hierarchy relating rdfs:Resource, rdfs:Class, rdf:Property, owl:Class, owl:ObjectProperty and owl:DatatypeProperty.]
OWL uses RDF syntax. owl:Class is a specialization of rdfs:Class, and owl:DatatypeProperty and owl:ObjectProperty are specializations of rdf:Property.
OWL 1 Syntax
OWL 1 is based on the SHOIN(D) logic, which provides for the following expressiveness:
§ The TBox defines subsumption relationships between classes (ex. C ⊑ D)
§ The ABox contains facts about class membership (ex. C(a), C(b)), property relations (ex. R(a, b)), and equality and difference relations between individuals (ex. a = b, a ≠ b)
§ The RBox defines subsumption/inclusion relationships between properties (ex. R ⊑ S), inverse properties (ex. R⁻), and transitive properties (ex. R⁺ ⊑ R)
§ Class constructors: conjunction (C ⊓ D), disjunction (C ⊔ D), negation (¬C)
§ Property restrictions: universal (∀R.C) and existential (∃R.C)
§ Number restrictions: ≤ n R (maxCardinality) and ≥ n R (minCardinality)
Examples:
≤ 1 hasChild. In FOL: ∃^{≤n} y hasChild(x, y), with n = 1
≥ 1 hasChild. In FOL: ∃^{≥n} y hasChild(x, y), with n = 1
§ Closed classes (nominals): {a}
§ Datatypes
OWL 2 Syntax
OWL 2 is based on the SROIQ(D) logic, which provides for the following additional expressiveness compared to OWL 1:
§ The TBox also allows for:
  § equivalence relationships between classes (ex. C ≡ D)
  § disjoint union (ex. Every person is either male or female BUT NOT both.)
  § disjoint classes (ex. No one can be a member of any pair of classes from a given set, say Faculty, Student, Staff)
  § Self restriction: a special class expression Self, written ∃R.Self. Example: “Optimists are people who believe in themselves”
  § Qualified cardinality restrictions ≤ n R.C and ≥ n R.C. Examples: “Man with exactly three sons”, “Each student has at most one ID number”.
§ The ABox also allows for negated data and object properties (ex. ¬R(a, b))
§ The RBox may contain, in addition to simple properties, inverse properties (ex. R⁻) and universal properties (U), and also allows for general inclusion, property chains (ex. R1 ∘ R2 ⊑ S), symmetry, reflexivity, irreflexivity and disjointness of properties.
The nuts and bolts of OWL 2: Classes and Individuals
• To define a class (in Turtle):
  :Book a owl:Class .
  :Person a owl:Class .
• There are two special classes in OWL:
  – owl:Thing, a class that contains all individuals; ⊤ ≡ C ⊔ ¬C
  – owl:Nothing, a class with no members (the empty class); ⊥ ≡ C ⊓ ¬C
• To define an individual as a member of a class:
  :SemanticWebTechnologies a :Book .
  :Jones a :Lecturer .
• We can define an individual as an owl:NamedIndividual without specifying its class membership:
  :SemanticWebTechnologies a owl:NamedIndividual .
  :Jones a owl:NamedIndividual .
• To state that individuals are different:
  [] a owl:AllDifferent ;
     owl:distinctMembers ( :Mary :Ann :MaryAnn ) .
More on classes
• Enumerated (closed) classes: they are defined via direct enumeration of their members.
Example: ProfessorCS151 ≡ {Kjell, Markov, Zlatareva}
  :ProfessorCS151 a owl:Class ;
     owl:oneOf ( :Kjell :Markov :Zlatareva ) .
• Equivalent classes.
Example: :Professor owl:equivalentClass :Lecturer .
• Disjoint classes.
Example: :Student owl:disjointWith :Professor .
OR
  [] a owl:AllDisjointClasses ;
     owl:members ( :Faculty :Student :Staff ) .
Note: OWL assumes that classes may overlap. We cannot assume that an individual is NOT a member of a class only because we have not declared it to be a member of the class (OWA!). To assert that an individual cannot be a Faculty, a Student, and a Staff at the same time, we must declare these classes disjoint.
Building complex classes
• Union (disjunction) of classes: instances of the union of two classes are instances of one or both of the classes. Example: Person ≡ Man ⊔ Woman
    :Person a owl:Class ;
        owl:equivalentClass [ owl:unionOf ( :Man :Woman ) ] .
• Intersection (conjunction) of classes: instances of the intersection of two classes are instances of both classes at the same time. Example: Man ≡ Person ⊓ Male
    :Man a owl:Class ;
        owl:equivalentClass [ owl:intersectionOf ( :Person :Male ) ] .
• Complement (logical negation): a class and its complement have no common instances. Note that stating that a class is a subclass of the complement of another class is semantically equivalent to declaring the two classes disjoint. Example: Professor ⊑ ¬Student (equivalently, Professor ⊓ Student ⊑ ⊥)
    :Professor a owl:Class ;
        rdfs:subClassOf [ owl:complementOf :Student ] .
  or, equivalently,
    :Professor owl:disjointWith :Student .
OWL Properties
• OWL “inherits” from RDFS: rdfs:subPropertyOf, rdfs:domain, rdfs:range.
• rdf:Property “splits” in OWL into owl:ObjectProperty and owl:DatatypeProperty.
• Object properties are relations between individuals. Example:
    :Mary :hasMother :Anna .
• Datatype properties link an individual to a data literal. Example:
    :Mary :hasAge 25 .
• OWL has a third type of property, called an annotation property. It is used to add metadata to classes, individuals, and object and datatype properties. Example:
    :CS1 :partOf :AccreditedProgram .
  owl:incompatibleWith is a predefined annotation property which might be especially useful if the domain contains inconsistent entities at the same time.
Note: In OWL, properties may have sub-properties, thus creating hierarchies of properties. For example, hasMother is a sub-property of hasParent. Sub-properties specialize super-properties (same as the class/sub-class relation).
Property axioms
• owl:equivalentProperty (parentOf ≡ hasChild). Example:
    :parentOf owl:equivalentProperty :hasChild .
• owl:inverseOf (parentOf ≡ childOf⁻). Example:
    :parentOf owl:inverseOf :childOf .
• owl:SymmetricProperty (sibling⁻ ⊑ sibling and sibling ⊑ sibling⁻). Example:
    :sibling rdf:type owl:SymmetricProperty .
    :Mary :sibling :Sammy .
    :Sammy :sibling :Mary .        # inferred
• owl:TransitiveProperty (ancestor⁺ ⊑ ancestor). Example:
    :Bob :ancestorOf :Tom .
    :Tom :ancestorOf :Carl .
    :ancestorOf rdf:type owl:TransitiveProperty .
    :Bob :ancestorOf :Carl .       # inferred
• owl:ReflexiveProperty (ex. knows) and owl:IrreflexiveProperty (ex. isMotherOf).
Transitivity property example
    :Alexia a :Person ;
        :hasParent :WillemAlexander .
    :Beatrix a :Person ;
        :hasParent :Wilhelmina .
    :Person a owl:Class ;
        rdfs:subClassOf owl:Thing .
    :Wilhelmina a :Person .
    :WillemAlexander a :Person ;
        :hasParent :Beatrix .
    :hasAncestor a owl:ObjectProperty , owl:TransitiveProperty ;
        rdfs:domain :Person ;
        rdfs:range :Person .
    :hasParent a owl:ObjectProperty ;
        rdfs:domain :Person ;
        rdfs:range :Person ;
        rdfs:subPropertyOf :hasAncestor .
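To see what a reasoner would derive from the transitivity example, the sketch below computes the :hasAncestor pairs by hand. It is only an illustration, not how an OWL reasoner is implemented: the function name transitive_closure and the plain-string individual names are assumptions made for this example.

```python
# Illustrative sketch only: individuals are plain strings, and
# transitive_closure is a hypothetical helper, not part of any OWL library.
has_parent = {
    ("Alexia", "WillemAlexander"),
    ("WillemAlexander", "Beatrix"),
    ("Beatrix", "Wilhelmina"),
}


def transitive_closure(pairs):
    """Return the smallest transitive relation that contains `pairs`."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# rdfs:subPropertyOf: every hasParent pair is also a hasAncestor pair.
# owl:TransitiveProperty: hasAncestor is then closed under chaining.
has_ancestor = transitive_closure(has_parent)

for person, ancestor in sorted(has_ancestor):
    print(f":{person} :hasAncestor :{ancestor} .")
```

Among the printed triples is :Alexia :hasAncestor :Wilhelmina, which is nowhere asserted in the data and follows only from the sub-property and transitivity axioms.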
Property axioms (contd.)
• owl:FunctionalProperty and owl:InverseFunctionalProperty. Example:
    :Mary :hasMother :Peggy .
    :Mary :hasMother :Margaret .
    :Margaret :isMotherOf :Mary .
    :Peggy :isMotherOf :Mary .
    :hasMother rdf:type owl:FunctionalProperty .
    :isMotherOf rdf:type owl:InverseFunctionalProperty .
    :Peggy owl:sameAs :Margaret .      # inferred
• owl:propertyChainAxiom. Example:
    :uncleOf owl:propertyChainAxiom ( :brotherOf :fatherOf ) .
    :brotherOf rdf:type owl:SymmetricProperty .
    :Bob :brotherOf :John .
    :Bob :fatherOf :Carl .
    :John :brotherOf :Bob ;            # inferred
          :uncleOf :Carl .             # inferred
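The two inferences on this slide can be traced with a few lines of Python. This is a minimal sketch under simplifying assumptions: relations are sets of (subject, object) pairs, and the helper names same_as_from_functional, symmetric_closure and compose are invented for this illustration.

```python
# Illustrative sketch only; all helper names are hypothetical.
has_mother = {("Mary", "Peggy"), ("Mary", "Margaret")}
brother_of = {("Bob", "John")}
father_of = {("Bob", "Carl")}


def same_as_from_functional(pairs):
    """A functional property has at most one value per subject,
    so two values of the same subject must denote the same individual."""
    values = {}
    for subject, obj in pairs:
        values.setdefault(subject, set()).add(obj)
    same = set()
    for objs in values.values():
        ordered = sorted(objs)
        for i, first in enumerate(ordered):
            for second in ordered[i + 1:]:
                same.add((first, second))
    return same


def symmetric_closure(pairs):
    """Add the reversed pair for every pair (owl:SymmetricProperty)."""
    return set(pairs) | {(b, a) for (a, b) in pairs}


def compose(first, second):
    """Chain two relations: x first y and y second z gives x -> z."""
    return {(a, d) for (a, b) in first for (c, d) in second if b == c}


same_as = same_as_from_functional(has_mother)
uncle_of = compose(symmetric_closure(brother_of), father_of)

print(same_as)   # pairs inferred to be owl:sameAs
print(uncle_of)  # pairs inferred via the property chain
```

Note that the chain only fires after symmetry has added the pair (John, Bob): the asserted pair (Bob, John) ends in John, who has no :fatherOf triple.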
A note on domain and range of properties
When we define the domain and range of a property in OWL, we simply say members of which classes play those roles. We cannot use them as constraints to be checked; rather, they are used as axioms in reasoning. Recall the following two RDFS rules/axioms:
    IF   P rdfs:domain D .        IF   P rdfs:range R .
    AND  x P y .                  AND  x P y .
    THEN x rdf:type D .           THEN y rdf:type R .
If we have declared the domain of :teaches to be :Professor, and we apply the above “domain” rule to
    :Bob :teaches :CS1 .
    :Bob rdf:type :TeachingAssistant .
we get
    :Bob rdf:type :Professor .         # semantically incorrect
Bob is inferred to be a Professor although he was declared a TeachingAssistant. A reasoner will report this as an inconsistency only if Professor and TeachingAssistant are declared disjoint.
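The domain and range rules are simple enough to apply by hand. The following sketch does so for the :teaches example; the function apply_domain_range and the dictionary-based representation of the axioms are assumptions made for illustration, not an actual reasoner API.

```python
# Illustrative sketch only: apply_domain_range is a hypothetical helper.
def apply_domain_range(triples, domains, ranges):
    """Apply the two RDFS rules to every triple and return the inferred types."""
    inferred = set()
    for subject, predicate, obj in triples:
        if predicate in domains:   # P rdfs:domain D, x P y  =>  x rdf:type D
            inferred.add((subject, "rdf:type", domains[predicate]))
        if predicate in ranges:    # P rdfs:range R, x P y   =>  y rdf:type R
            inferred.add((obj, "rdf:type", ranges[predicate]))
    return inferred


triples = {
    (":Bob", ":teaches", ":CS1"),
    (":Bob", "rdf:type", ":TeachingAssistant"),
}
domains = {":teaches": ":Professor"}
ranges = {}  # no range declared for :teaches in this example

inferred = apply_domain_range(triples, domains, ranges)
print(inferred)
```

The rule never rejects the triple :Bob :teaches :CS1; it only adds a new type for :Bob, which is exactly why domain and range behave as axioms rather than as constraints.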
Classes as Restrictions on properties
In OWL, we can define classes using local restrictions on a specific property. Note the class hierarchy in OWL: owl:Restriction is a subclass of owl:Class, which is a subclass of rdfs:Class. We can define a class of individuals satisfying a specified restriction using owl:Restriction. The difference between owl:Class and owl:Restriction is that the former defines a named class, while owl:Restriction defines an anonymous (unnamed) class.
Example. Define a class RedWine as a subclass of all entities with red color:
    :RedWine a owl:Class ;
        rdfs:subClassOf [ a owl:Restriction ;
                          owl:onProperty :color ;
                          owl:hasValue "red"^^xsd:string ] .
Types of OWL restrictions on properties
Assume we want to specify the following classes of individuals:
o a class of individuals whose members are all CS students,
o a class of individuals whose members may include CS students,
o a class of individuals that contains at least 2 CS students,
o a class of individuals that contains at most 5 CS students,
o a class of individuals that contains exactly 2 CS students, etc.
Such classes can be defined by using OWL restrictions that fall in the following categories:
1. Quantifier restrictions.
  – owl:allValuesFrom [rdfs:Class OR owl:Class in OWL Lite / DL]: the class of individuals that have the restricted property relation only to individuals from the specified class.
  – owl:someValuesFrom [rdfs:Class OR owl:Class in OWL Lite / DL]: the class of individuals that have at least one relation along the specified property to an individual from the specified class.
Restrictions on Properties – owl:allValuesFrom
Consider the following set of triples:
    :Peggy :hasChild :Mary .
    :Mary rdf:type :Female .
    :Bob :hasChild :Anna , :Sammy .
    :Sammy rdf:type :Female .
    :Anna rdf:type :Female .
Assume we want to define a class of individuals who only have daughters:
    :ParentOfDaughters owl:equivalentClass [ a owl:Restriction ;
                                              owl:allValuesFrom :Female ;
                                              owl:onProperty :hasChild ] .
    :Peggy rdf:type :ParentOfDaughters .
    :Bob rdf:type :ParentOfDaughters .
Restrictions on Properties – owl:someValuesFrom
Recall that properties describe binary relations. Consider the following triples:
    :Peggy :hasChild :Mary .
    :Mary rdf:type :Person .
    :Bob :hasChild :Anna , :Sammy .
    :Sammy rdf:type :Person .
    :Anna rdf:type :Person .
Assume we want to define a class of individuals with at least one child:
    :Parent owl:equivalentClass [ a owl:Restriction ;
                                   owl:someValuesFrom :Person ;
                                   owl:onProperty :hasChild ] .
    :Peggy rdf:type :Parent .
    :Bob rdf:type :Parent .
More examples on quantifier restrictions
Computer science club consists of only computer science majors:
    :CSClub a owl:Class ;
        rdfs:subClassOf [ a owl:Restriction ;            # superclass
                          owl:onProperty :clubMember ;
                          owl:allValuesFrom :CSMajor ] .
Computer science club consists of computer science majors (but may include other individuals):
    :CSClub a owl:Class ;
        rdfs:subClassOf [ a owl:Restriction ;            # superclass
                          owl:onProperty :clubMember ;
                          owl:someValuesFrom :CSMajor ] .
Notice that restrictions define anonymous superclasses of the class being described. CSClub is a subClassOf the specified anonymous restriction class.
SPARQL: a query language for the Semantic Web
• SPARQL is a protocol: RDF over HTTP.
• SPARQL is a query language; it returns results in different formats: XML, JSON, Turtle, etc.
• A SPARQL engine works on RDF datasets.
[Figure: the user works through a user interface (the Jena Fuseki Server GUI) and sends SPARQL queries over HTTP (the SPARQL protocol layer) to a SPARQL endpoint on the server, which gives access to the RDF knowledge base, also called a triplestore.]
What is a SPARQL endpoint?
• A SPARQL endpoint is an interface that users (people or applications) can access to query an RDF dataset by means of the SPARQL query language.
• A SPARQL endpoint can be configured to return results in different formats depending on the user: for people, the result is in the form of an HTML table; for applications, the result is serialized in a machine-processable format such as RDF/XML or Turtle.
• An endpoint interface provides text fields where the user can type the URL of the dataset to be queried and the query itself. On hitting the ‘Submit’ button, a dynamically generated webpage listing the values of the query variables in a table is returned.
• There are libraries allowing the user to incorporate SPARQL queries into application programs. Example: Jena, a Java library available at http://jena.apache.org/.
How does SPARQL work?
A SPARQL query searches an RDF graph for specific information. The search is based on the graph pattern matching algorithm. The idea behind it: the WHERE clause states where to search, i.e. which pattern to look for in the RDF graph, and the SELECT clause states which of the matched variables to return.
Example:
    SELECT ?x ?y ?z
    WHERE { ?x <URI> ?y .
            ?y <URI> ?z . }
SPARQL pattern matching algorithm
• A graph pattern is an RDF triple with one or more variables, which are identified by ?, such as ?x <URI> ?y.
• Several graph patterns can be combined to form a complex conjunctive query. Example: { ?x <URI> ?y . ?y <URI> ?z . }
• Graph pattern matching is a recursive algorithm that works as follows. Repeat the following 3 steps until all graph patterns are matched against all RDF triples in the identified dataset:
  Step 1: Specify the graph pattern to be matched.
  Step 2: Match the specified pattern to each RDF triple in the dataset.
  Step 3: Bind pattern variables to the resources in the dataset triples that resulted in a match. Return all bindings.
Note: Something will be returned ONLY if ALL variables in the pattern / conjunctive pattern are matched. If even one variable is not matched, nothing will be returned.
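The three steps above can be sketched in plain Python. This is a toy illustration of conjunctive pattern matching, not the algorithm of any particular SPARQL engine; the function names match_triple and evaluate, the string encoding of terms, and the sample triples (including the predicate univ:advises) are assumptions made for this example.

```python
# Illustrative sketch only: terms are strings, variables start with "?".
def match_triple(pattern, triple, bindings):
    """Return `bindings` extended so that `pattern` matches `triple`, or None."""
    extended = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):                      # a variable
            if term in extended and extended[term] != value:
                return None                           # conflicts with an earlier binding
            extended[term] = value
        elif term != value:                           # a constant must match exactly
            return None
    return extended


def evaluate(patterns, triples):
    """Match all patterns conjunctively and return every consistent binding."""
    solutions = [{}]
    for pattern in patterns:                          # Step 1: take the next pattern
        next_solutions = []
        for bindings in solutions:
            for triple in triples:                    # Step 2: try every triple
                extended = match_triple(pattern, triple, bindings)
                if extended is not None:              # Step 3: keep the new bindings
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


triples = [
    ("univ:Lecturer2", "univ:teaches", "CS202"),
    ("univ:Lecturer3", "univ:teaches", "CS303"),
    ("univ:Lecturer3", "vcard:family-name", "Homes"),
]

single = evaluate([("?x", "univ:teaches", "?y")], triples)
joined = evaluate(
    [("?x", "univ:teaches", "?y"), ("?x", "vcard:family-name", "?z")], triples
)
nothing = evaluate(
    [("?x", "univ:teaches", "?y"), ("?x", "univ:advises", "?z")], triples
)

print(single)
print(joined)
print(nothing)
```

The last query illustrates the note above: the first pattern matches, but because the second pattern matches no triple, the whole conjunctive query returns no bindings.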
Example (arqExample1.ttl)
    @prefix vcard: <http://www.w3.org/2006/vcard/ns#> .
    @prefix univ: <http://www.cs.ccsu.edu/~neli/university.owl#> .
    @prefix owl: <http://www.w3.org/2002/07/owl#> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix dc: <http://purl.org/dc/elements/1.1/> .

    univ:Lecturer1 vcard:title "Ph.D" ;
        univ:teaches "CS101" ;
        univ:firstName "Mark" ;
        univ:lastName "Kain" .
    univ:Lecturer2 vcard:title "Doctor" ;
        univ:teaches "CS202" ;
        univ:teaches "CS101" ;
        vcard:family-name "Novac" .
    univ:Lecturer3 rdf:type univ:Lecturer ;
        vcard:given-name "John" ;
        vcard:family-name "Homes" ;
        vcard:title "Doctor" ;
        univ:teaches "CS303" .
    univ:book1 rdf:type univ:Textbook ;
        dc:title "Intro to CS" .
    univ:CS101 univ:uses univ:book1 .
And the query that goes with it
    prefix vcard: <http://www.w3.org/2006/vcard/ns#>
    prefix univ: <http://www.cs.ccsu.edu/~neli/university.owl#>
    prefix owl: <http://www.w3.org/2002/07/owl#>
    prefix xsd: <http://www.w3.org/2001/XMLSchema#>
    prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    prefix dc: <http://purl.org/dc/elements/1.1/>

    select ?x ?y
    where { ?x univ:teaches ?y . }
After creating the new dataset, arqExample1.ttl, you can upload it to Fuseki and then query it.
Placing a query
Running the query returns the result in the selected Table format
Raw response returned in the preselected JSON format (other output formats are also available)
What can we do with SPARQL?
1. Extract data as RDF subgraphs, URIs, blank nodes, etc.
2. Explore data via queries for unknown relations.
3. Transform RDF data from one vocabulary into another.
4. Construct new RDF graphs based on RDF query graphs.
5. Update RDF graphs.
6. Do logical entailment for RDF, RDFS, and OWL.
7. Do federated queries over different SPARQL endpoints.
Building a new dataset with SPARQL: CONSTRUCT query
Consider the following query run on arqExample1.ttl:
    prefix vcard: <http://www.w3.org/2006/vcard/ns#>
    prefix univ: <http://www.cs.ccsu.edu/~neli/university.owl#>
    prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

    CONSTRUCT { ?x rdf:type univ:Professor . }
    WHERE { ?x univ:teaches ?y . }
arqExample1 revisited (contd.)
The generated RDF file can be saved and later merged with the original .ttl file if needed. But this query has no effect on the arqExample1.ttl dataset itself. Consider now the following query, which Fuseki calls an update request and executes as such:
    prefix vcard: <http://www.w3.org/2006/vcard/ns#>
    prefix univ: <http://www.cs.ccsu.edu/~neli/university.owl#>
    prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

    INSERT { ?x rdf:type univ:Professor . }
    WHERE { ?x univ:teaches ?y . }
As a result of this update request, the dataset loaded from arqExample1.ttl is now changed.
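The difference between the CONSTRUCT query and the INSERT update request can be mimicked on an in-memory set of triples. The sketch below is only an analogy under stated assumptions: the graph is a Python set, and construct_professors and insert_professors are hypothetical functions standing in for the two SPARQL operations.

```python
# Illustrative sketch only: a graph is a set of (subject, predicate, object) tuples.
def construct_professors(graph):
    """CONSTRUCT-like: build and return new triples; `graph` is left untouched."""
    return {
        (s, "rdf:type", "univ:Professor")
        for (s, p, o) in graph
        if p == "univ:teaches"
    }


def insert_professors(graph):
    """INSERT-like: add the same triples to `graph` in place."""
    graph.update(construct_professors(graph))


graph = {
    ("univ:Lecturer1", "univ:teaches", "CS101"),
    ("univ:Lecturer2", "univ:teaches", "CS202"),
    ("univ:Lecturer2", "univ:teaches", "CS101"),
}

size_before = len(graph)
constructed = construct_professors(graph)   # a separate result graph
size_after_construct = len(graph)           # unchanged by CONSTRUCT

insert_professors(graph)                    # the stored graph itself grows
size_after_insert = len(graph)

print(constructed)
print(size_before, size_after_construct, size_after_insert)
```

The constructed set contains one triple per lecturer rather than one per matched row, because an RDF graph is a set and duplicate triples collapse.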
arqExample1.ttl updated
Deleting triples
The SPARQL update operations DELETE and DELETE DATA work the same way as INSERT and INSERT DATA, respectively. DELETE DATA specifies the exact triples to be deleted, while DELETE can also use triple patterns for more flexibility.
Example: delete the title of book1
    prefix vcard: <http://www.w3.org/2006/vcard/ns#>
    prefix univ: <http://www.cs.ccsu.edu/~neli/university.owl#>
    prefix owl: <http://www.w3.org/2002/07/owl#>
    prefix xsd: <http://www.w3.org/2001/XMLSchema#>
    prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    prefix dc: <http://purl.org/dc/elements/1.1/>

    DELETE { univ:book1 dc:title ?x . }
    WHERE { univ:book1 dc:title ?x . }
Note: The CLEAR DEFAULT update request removes all the triples of the default graph of the dataset.
Updating the dataset: combination of INSERT and DELETE updates
Consider arqExample1.ttl and assume you want to update the teaching schedule of Lecturers 1 and 2.
    prefix vcard: <http://www.w3.org/2006/vcard/ns#>
    prefix univ: <http://www.cs.ccsu.edu/~neli/university.owl#>
    prefix owl: <http://www.w3.org/2002/07/owl#>
    prefix xsd: <http://www.w3.org/2001/XMLSchema#>
    prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    prefix dc: <http://purl.org/dc/elements/1.1/>

    DELETE { univ:Lecturer1 univ:teaches ?x .
             univ:Lecturer2 univ:teaches ?y . }
    WHERE { univ:Lecturer1 univ:teaches ?x .
            univ:Lecturer2 univ:teaches ?y . } ;    # Notice the semicolon here
    INSERT DATA { univ:Lecturer1 univ:teaches "CS501" .
                  univ:Lecturer2 univ:teaches "MATH501" . }
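The effect of the combined request can be followed step by step on an in-memory copy of the relevant triples. This is a simplified sketch, not Fuseki's behaviour in every detail: delete_where and insert_data are hypothetical helpers, and each lecturer's triples are deleted separately, whereas the single WHERE clause in the request above matches both lecturers jointly.

```python
# Illustrative sketch only: a graph is a set of (subject, predicate, object) tuples.
def delete_where(graph, subject, predicate):
    """DELETE/WHERE-like: remove every triple with this subject and predicate."""
    matched = {t for t in graph if t[0] == subject and t[1] == predicate}
    graph.difference_update(matched)
    return matched


def insert_data(graph, triples):
    """INSERT DATA-like: add fully specified triples (no variables)."""
    graph.update(triples)


graph = {
    ("univ:Lecturer1", "univ:teaches", "CS101"),
    ("univ:Lecturer2", "univ:teaches", "CS202"),
    ("univ:Lecturer2", "univ:teaches", "CS101"),
    ("univ:Lecturer3", "univ:teaches", "CS303"),
}

# First operation: drop the old schedule of Lecturers 1 and 2.
delete_where(graph, "univ:Lecturer1", "univ:teaches")
delete_where(graph, "univ:Lecturer2", "univ:teaches")

# Second operation (after the semicolon): add the new schedule.
insert_data(graph, {
    ("univ:Lecturer1", "univ:teaches", "CS501"),
    ("univ:Lecturer2", "univ:teaches", "MATH501"),
})

for triple in sorted(graph):
    print(triple)
```

Lecturer 3 is untouched because no pattern in the request mentions univ:Lecturer3.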
The updated file