Introduction to the Semantic Web
Vincenzo Maltese, Fausto Giunchiglia
University of Trento, LDKR course
These slides are inspired by the book A Semantic Web Primer by G. Antoniou and F. van Harmelen (and related slides).
Roadmap
- The current Web
- From the current Web to the Semantic Web
- Semantic Web technologies
- A layered approach
- Semantic Web applications
The Current Web (Chapter 1)
The current Web
- An enormous collection of data and documents
  - Any kind of material, mixed together
  - Keeps growing
  - Open to all
- Mostly meant to be used directly by people
  - Unstructured content (text, images, videos)
  - Semi-structured content (tables)
- Typical usage: keyword search, navigation
  - Search by hand
  - Consumption by reading
  - Navigation by clicking
- No meaning attached to terms
A Web for direct usage by people: navigation
A Web for direct usage by people: keyword search
Problems of Keyword-Based Search Engines
- High recall, low precision
- Results are highly sensitive to vocabulary
- Results are single Web pages
- Human involvement is necessary to interpret and combine results
- Results of Web searches are not readily accessible by other software tools
[Figure labels: CITY, PERSON, FACILITY, PERSON]
A Web for direct usage by people: encyclopedias
A Web for direct usage by people: encyclopedias
- Wikipedia infoboxes
- Wikipedia categories
- The meaning of Web content is not machine-accessible: lack of semantics
From the current Web to the Semantic Web
The Semantic Web vision
Represent Web content in a form that can be processed by machines, so that intelligent services can be automatically developed and combined.
- An extension of the WWW, in which information is given well-defined meaning, better enabling computers and people to work in cooperation [T. Berners-Lee et al., 2001]
- A new form of Web content that is computer comprehensible will open up a revolution of new possibilities [T. Berners-Lee et al., 2001]
- An alternative approach to represent Web content in a machine-processable way, and to use intelligent techniques to take advantage of these representations [G. Antoniou and F. van Harmelen, 2004]
- An extra abstraction layer, a so-called semantic layer, to be built on top of the Web [F. Giunchiglia et al., 2010]
Example: arrange a trip to Crete
Consider that you are planning a vacation to the major excavation region of Heraklion on the island of Crete:
- You use a search engine
- You find a list of hotels by location
- In the list you discover that a hotel of your favorite hotel chain is there
- Unfortunately, you do not see it on the main website of the hotel chain (failure)
Consider that you are planning a conference trip to the island of Crete:
- You use a search engine
- You find many branches of your favorite hotel chain in the surroundings of the conference venue
- You want to know which one is nearest (minimum walking distance)
- You use Google Maps to find out, but you need to copy-paste the addresses from the websites of the hotel and of the conference venue (manual effort)
[D. Allemang & J. Hendler, 2008]
Can we do any better?
Example: find answers
How many and what are the municipalities in Trentino?
- Information is hard-coded in HTML pages
- Information cannot be directly processed by machines
- Information is hidden in authorities' databases
- Different sources may provide different information, not easy to keep aligned
Can we do any better?
Contributing fields
- Knowledge Representation and Reasoning (KRR)
  - Representing knowledge
  - Reasoning about known facts
- Knowledge Organization (KO)
  - Classify documents
  - Support information retrieval
- Knowledge Management (KM)
  - Acquiring, accessing, and maintaining knowledge within an organization
  - Key activity of large businesses: internal knowledge as an intellectual asset
  - It is particularly important for international, geographically dispersed organizations
  - Most information is currently available in a poorly structured form (e.g. text, audio, video)
Why? The importance of KM
Gartner predicts that, by 2017, 33 percent of Fortune 100 organizations will experience an information crisis, due to their inability to effectively value, govern and trust their enterprise information.
Evolving KM

| Activity | Current KM | Semantic Web enabled KM |
|---|---|---|
| Organizing | Syntactic indexing techniques | Semantic indexing techniques (based on meaning) |
| Searching | Keyword-based search engines | Semantic query answering |
| Extracting | Human involvement strictly necessary for browsing, retrieving, interpreting, combining | Data represented semantically can be extracted (semi-)automatically and exploited automatically |
| Maintaining | Inconsistencies in terminology, outdated information | Automated tools for maintenance |
| Viewing | Impossible to define views on Web knowledge | Defining what and who can view certain parts of information (even parts of documents) will be possible |
Benefits of the new paradigm
- Data
  - Better understanding of the content and reduced ambiguity/inconsistency
  - Enabling connections among data
  - Semantics as a standard and interoperability
  - Reduced cost of data reuse
- Applications
  - Enabling smart agent-based applications
  - Automatic information interpretation
  - Automatic recommendation and negotiation systems
  - Automatic translations
  - Provenance and reputation computation and updates
  - Enforcing privacy policies
  - Better coordination across different applications
  - Reduced development costs
  - Reduced human effort
- Example: Ride Sharing (SmartSociety)
- Example: the ESSENCE training network
Semantic Web technologies
Semantic Web key technologies
Data and documents are given explicit semantics.
- Explicit metadata
  - Properties are codified as explicit metadata (e.g. XML, JSON)
  - Standard vocabularies (e.g. Dublin Core, FOAF)
  - Semantic Web languages (e.g. RDF, OWL)
- Ontologies
  - Formal language and vocabulary
  - A set of terms and semantic relations between them
- Logic and inference
  - Logic as a tool for expressing knowledge and semantics
- Agents
  - Artificial agents that reason and act automatically
HTML
- Web content is mainly formatted for human readers rather than artificial agents
- HTML is the predominant language in which Web pages are written
- The vocabulary describes the presentation layer

    <h1>Agilitas Physiotherapy Centre</h1>
    Welcome to the home page of the Agilitas Physiotherapy Centre.
    Do you feel pain? Have you had an injury? Let our staff
    Lisa Davenport, Kelly Townsend (our lovely secretary) and
    Steve Matthews take care of your body and soul.
    <h2>Consultation hours</h2>
    Mon 11 am - 7 pm
    Tue 11 am - 7 pm
    Wed 3 pm - 7 pm
    Thu 11 am - 7 pm
    Fri 11 am - 3 pm<p>
    But note that we do not offer consultation during the weeks
    of the <a href="...">State Of Origin</a> games.

PROBLEM: Artificial agents are not able to reason and act on HTML
Explicit metadata
- Metadata is data about data
- Metadata capture (part of) the meaning of data
- The Semantic Web does not rely on text-based manipulation, but rather on machine-processable metadata
- Here the vocabulary describes metadata
- This representation is far easier to process by machines

    <company>
      <treatmentOffered>Physiotherapy</treatmentOffered>
      <companyName>Agilitas Physiotherapy Centre</companyName>
      <staff>
        <therapist>Lisa Davenport</therapist>
        <therapist>Steve Matthews</therapist>
        <secretary>Kelly Townsend</secretary>
      </staff>
    </company>

PROBLEM: Still the meaning is not explicit
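To make the "easier to process by machines" point concrete, here is a minimal sketch that reads the company metadata from the Explicit metadata slide with Python's standard `xml.etree.ElementTree` module. The helper name `staff_by_role` is an assumption made for illustration and does not come from the slides.

```python
import xml.etree.ElementTree as ET

# The company description from the Explicit metadata slide.
COMPANY_XML = """
<company>
  <treatmentOffered>Physiotherapy</treatmentOffered>
  <companyName>Agilitas Physiotherapy Centre</companyName>
  <staff>
    <therapist>Lisa Davenport</therapist>
    <therapist>Steve Matthews</therapist>
    <secretary>Kelly Townsend</secretary>
  </staff>
</company>
"""


def staff_by_role(xml_text, role):
    """Return the names of all staff members tagged with the given role.

    Hypothetical helper, written only for this example.
    """
    root = ET.fromstring(xml_text)
    return [element.text for element in root.findall(f"./staff/{role}")]


therapists = staff_by_role(COMPANY_XML, "therapist")
secretaries = staff_by_role(COMPANY_XML, "secretary")
print(therapists)
print(secretaries)
```

The program can now separate therapists from the secretary without any text analysis. It still treats `therapist` as an opaque tag name, though, which is exactly the problem the slide points out: the meaning of the tag is not explicit.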
Ontologies
- An ontology is an explicit specification of a shared conceptualization [Gruber, 1993]
- Terms denote important concepts (classes of objects) of the domain
- Relations are defined between these terms: ontologies are often thought of as directed graphs
- By providing a common formal terminology and understanding of a given domain of interest, an ontology allows for automation (logical inference), supports reuse and favors interoperability across applications and people
Kinds of ontologies
Ontologies differ according to the purpose and the semantics [Uschold and Gruninger, 2004]
- Informal representations: user classifications, Web directories, business catalogs
- Progressively formal: Knowledge Organization Systems, enumerative classifications (e.g. DDC), faceted classifications
- Formal ontologies: expressed in a formal logic language (represented using formal specifications, e.g. OWL)
Additional elements in ontologies
- Relations
  - e.g. X teaches Y
  - e.g. X friend of Y
- Attributes
  - e.g. X height is 1.85 m
  - e.g. X age is 45
- Value restrictions
  - e.g. only faculty members can teach courses
  - e.g. the range of the attribute height goes from 0 to 3 m
- Disjointness statements
  - e.g. faculty members and administrative staff are disjoint
- Logical relationships between objects
  - e.g. every department must include at least 10 faculty members
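The constraints listed on this slide can be read as checks over a set of individuals. The following is a minimal sketch in plain Python, not an ontology language; the individuals (`anna`, `marco`, `luca`), the department `cs` and all function names are hypothetical and serve only as illustration.

```python
# Hypothetical individuals and classes, used only for illustration.
faculty = {"anna", "marco"}
admin_staff = {"luca"}
department_members = {"cs": {"anna", "marco", "luca"}}


def may_teach(person):
    """Value restriction: only faculty members can teach courses."""
    return person in faculty


def valid_height(height_m):
    """Value restriction: the attribute height ranges from 0 to 3 m."""
    return 0 <= height_m <= 3


def disjoint(class_a, class_b):
    """Disjointness statement: no individual belongs to both classes."""
    return class_a.isdisjoint(class_b)


def department_is_valid(department, minimum=10):
    """Logical relationship: a department needs at least `minimum` faculty members."""
    return len(department_members[department] & faculty) >= minimum


print(may_teach("anna"), may_teach("luca"))
print(valid_height(1.85), valid_height(3.5))
print(disjoint(faculty, admin_staff))
print(department_is_valid("cs"))
```

In an ontology language such statements are declared once and enforced by a reasoner, rather than hand-coded as functions as in this sketch.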
The Role of Ontologies on the Web
By providing a common terminology and understanding of a given domain of interest, ontologies:
- overcome differences in terminology (vocabulary control) and support learning
- support knowledge organization (indexing, search and navigation of information)
- support reuse and favor semantic interoperability across applications and people
If the terminology is formal (TBox), they allow for automation (logical inference).
Ontologies are useful for improving the accuracy of Web search:
- search engines can look for pages that refer to a precise concept in an ontology (indexing, concept search)
- Web search can exploit the generalization/specialization relations between concepts (query expansion)
- if a query fails to find any relevant documents, the search engine may suggest a more general query to the user
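Concept search, query expansion and the fallback to a more general query can be sketched in a few lines. The concept hierarchy, the page index and the function names below are all hypothetical, chosen only to illustrate the idea; a real search engine would work on a full ontology and a much larger index.

```python
# Hypothetical hierarchy: each concept maps to its direct specializations.
NARROWER = {
    "accommodation": ["hotel", "hostel", "motel"],
    "hotel": ["resort"],
}

# Hypothetical index from concepts to the pages annotated with them.
INDEX = {
    "hostel": ["page-3"],
    "resort": ["page-7"],
}


def expand(concept):
    """Return the concept together with all of its specializations."""
    expanded = [concept]
    for child in NARROWER.get(concept, []):
        expanded.extend(expand(child))
    return expanded


def broader(concept):
    """Return the direct generalization of a concept, or None if there is none."""
    for parent, children in NARROWER.items():
        if concept in children:
            return parent
    return None


def search(concept):
    """Concept search with query expansion.

    Returns the matching pages and, when nothing matches,
    a more general concept to suggest to the user.
    """
    pages = [page for c in expand(concept) for page in INDEX.get(c, [])]
    suggestion = broader(concept) if not pages else None
    return pages, suggestion


print(search("hotel"))
print(search("motel"))
```

A query for `hotel` also retrieves pages indexed under the more specific `resort`, while a query for `motel` finds nothing and therefore comes back with the suggestion to try `accommodation`.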
Web Ontology Languages
- RDF and RDF Schema
  - RDF is a data model for objects and relations between them
  - RDF Schema is a vocabulary description language
  - RDF Schema describes properties and classes of RDF resources
  - RDF Schema provides semantics for generalization hierarchies of properties and classes
- OWL is a richer ontology language
  - It supports relations between classes, including disjointness
  - It supports cardinality (e.g. exactly one)
  - It supports richer typing of properties
  - It supports characteristics (meta-properties) of properties (e.g. symmetry)
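As a rough illustration of what these languages add, the sketch below stores facts as (subject, predicate, object) triples and derives new triples from a class hierarchy and from a symmetric property. It is a toy written in plain Python under stated assumptions, not an RDF or OWL implementation: the individuals, the classes and the `friendOf` property are hypothetical, and only two inference rules are covered.

```python
# Hypothetical facts as (subject, predicate, object) triples.
TRIPLES = {
    ("Professor", "rdfs:subClassOf", "FacultyMember"),
    ("FacultyMember", "rdfs:subClassOf", "Person"),
    ("anna", "rdf:type", "Professor"),
    ("anna", "friendOf", "marco"),
}

# An OWL-style characteristic of a property (assumed for this example).
SYMMETRIC_PROPERTIES = {"friendOf"}


def infer(triples):
    """Apply two simple rules until no new triple can be derived."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            if p == "rdf:type":
                # Rule 1: an instance of a class is an instance of its superclasses.
                for s2, p2, o2 in inferred:
                    if p2 == "rdfs:subClassOf" and s2 == o:
                        new.add((s, "rdf:type", o2))
            if p in SYMMETRIC_PROPERTIES:
                # Rule 2: a symmetric property also holds in the other direction.
                new.add((o, p, s))
        if new <= inferred:
            return inferred
        inferred |= new


closure = infer(TRIPLES)
print(("anna", "rdf:type", "Person") in closure)
print(("marco", "friendOf", "anna") in closure)
```

The first rule corresponds to the generalization hierarchies of RDF Schema, the second to the property characteristics that OWL adds on top.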
Logical inference
- When a formal language (logic) is used, automated reasoners can deduce (infer) conclusions from the given knowledge
- Logic can also be used by intelligent agents for making decisions and selecting courses of action
- Logic is more general than ontologies
- Logic can be used to uncover ontological knowledge that is implicitly given
- It can also help uncover unexpected relationships and inconsistencies
Software Agents
Software agents work autonomously and proactively.
A personal agent on the Semantic Web will be able to:
- receive some tasks and preferences from the person
- seek information from Web sources
- communicate with other agents
- compare information about user requirements and preferences
- make certain choices
- give answers to the user
A layered approach
The Semantic Web Stack(s)
- The development of the Semantic Web proceeds in steps
- Each step builds a layer on top of another
- Principles of "downward compatibility" and "upward partial understanding"
- Two alternative stacks are currently in place
The layers (from top to bottom)
- Trust: digital signatures, recommendations, rating agencies, ...
- Proof: proof generation, exchange, validation
- Logic: enhances ontology languages further (application-specific)
- OWL: more expressive than RDF Schema; OWL is the current Semantic Web standard
- RDF + RDF Schema: RDF is the basic data model for facts, RDF Schema is a simple ontology language
- XML: the syntactic basis
- URI: URIs are universal reference identifiers
Semantic Web Applications
Semantic Data and Web of Data
The Semantic Web is a web of interconnected datasets where:
- one data element can point to another (through URIs), rather than one webpage pointing to another, forming a Web of data (rather than a Web of pages)
- the Web infrastructure provides a data model supporting a scenario in which a single entity can be referred to over the Web
- the coherence of the data model is part of the Web infrastructure
Linked Data
The Linked Data approach forms the basis of data publishing guidelines that indicate how data from government, public and private sectors can be made more valuable for consumers.
Principles:
- the use of HTTP URIs as the identifiers of things (concepts, entities and attributes)
- the provision of meaningful content published in RDF for each such URI reference
- the production of navigable content via links
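The three principles can be mimicked with an in-memory stand-in for the Web. In the sketch below every URI under `example.org` is a placeholder rather than a real dataset, `dereference` replaces what would be an HTTP request, and descriptions are simplified to (property, value) pairs instead of real RDF.

```python
# Hypothetical Web of data: each HTTP URI maps to the description published for it.
WEB_OF_DATA = {
    "http://example.org/id/trento": [
        ("http://example.org/prop/name", "Trento"),
        ("http://example.org/prop/locatedIn", "http://example.org/id/trentino"),
    ],
    "http://example.org/id/trentino": [
        ("http://example.org/prop/name", "Trentino"),
    ],
}


def dereference(uri):
    """Stand-in for an HTTP lookup: return the description published at a URI."""
    return WEB_OF_DATA.get(uri, [])


def follow_links(start_uri):
    """Collect every URI reachable from start_uri by following links."""
    visited = []
    pending = [start_uri]
    while pending:
        uri = pending.pop()
        if uri in visited:
            continue
        visited.append(uri)
        for _prop, value in dereference(uri):
            # A value that is itself a published URI is a link to more data.
            if value in WEB_OF_DATA:
                pending.append(value)
    return visited


print(follow_links("http://example.org/id/trento"))
```

Starting from one identifier, a client discovers related data simply by looking up the URIs it finds, which is what makes the content navigable.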
Linked Data
The 5-star rating system
- 1 star: publishing on the Web with an open license, regardless of format
- 2 stars: structured format
- 3 stars: non-proprietary format (e.g. CSV)
- 4 stars: W3C open format (e.g. RDF)
- 5 stars: links to other RDF open datasets
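The rating can be expressed as a small function. The sketch assumes that the levels are cumulative, so that each star requires all the previous ones; the function and parameter names are hypothetical.

```python
def star_rating(open_license, structured, non_proprietary, w3c_format, linked):
    """Return the number of stars earned by a dataset.

    Assumes cumulative levels: counting stops at the first unmet criterion.
    """
    criteria = [open_license, structured, non_proprietary, w3c_format, linked]
    stars = 0
    for satisfied in criteria:
        if not satisfied:
            break
        stars += 1
    return stars


# An openly licensed CSV file: structured and non-proprietary, but not RDF.
csv_stars = star_rating(True, True, True, False, False)
print(csv_stars)
```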
Open Government Data
Various governmental departments, as part of their daily activities, produce, manage and store large volumes of authentic and interesting data.
Why open data?
- great economic value
- strong potential for supporting innovation
- transparency and participation
- improving organizational and communication efficiency
- support for data-centric applications
Not all of this data can be made publicly available, because of constraints such as:
- privacy issues
- intellectual property rights
- national security concerns
Summary
Features of the Semantic Web
- The Web is characterized by the AAA slogan: Anyone can say Anything about Any topic
- The Semantic Web is a radically new way of thinking about a better representation of information with embedded meaning
- The Semantic Web is still characterized by the AAA slogan: anyone can contribute a piece of data about some entity, which can be linked via URIs to other sources
- This requirement is at the basis of Web languages and follows an Open World Assumption
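The practical difference made by the Open World Assumption shows up when a query asks about a fact that nobody has stated. The sketch below contrasts the two readings; the facts and function names are hypothetical, and `None` is used here to stand for "unknown".

```python
# Hypothetical set of stated facts.
KNOWN_FACTS = {("trento", "locatedIn", "italy")}


def ask_closed_world(fact):
    """Closed world: anything that is not stated is assumed to be false."""
    return fact in KNOWN_FACTS


def ask_open_world(fact):
    """Open world: anything that is not stated is unknown (None),
    because anyone may still state it somewhere else on the Web."""
    return True if fact in KNOWN_FACTS else None


missing = ("trento", "twinnedWith", "kempten")
print(ask_closed_world(missing))
print(ask_open_world(missing))
```

A database-style application would answer "no" to the missing fact, whereas a Semantic Web application should only conclude that it does not know.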
References
- T. Berners-Lee, J. Hendler, & O. Lassila (2001, May). The Semantic Web. Scientific American 284, 34–43.
- Gruber (1993). A translation approach to portable ontology specifications. Knowledge Acquisition, 5(2), 199–220.
- G. Antoniou & F. van Harmelen (2004). A Semantic Web Primer (Cooperative Information Systems). MIT Press, Cambridge MA, USA.
- Uschold & Gruninger (2004). Ontologies and semantics for seamless connectivity. SIGMOD Rec., 33(4), 58–64.
- F. Giunchiglia, F. Farazi, L. Tanca, and R. D. Virgilio (2009). The semantic web languages. In Semantic Web Information management, a model based perspective, Springer.
- D. Allemang and J. Hendler (2008). Semantic web for the working ontologist: modeling in RDF, RDFS and OWL. Morgan Kaufmann Elsevier, Amsterdam, NL.
- T. Berners-Lee (2006). Linked Data. Design Issues for the World Wide Web, W3C, http://www.w3.org/DesignIssues/LinkedData.html.
- T. Heath, C. Bizer (2011). Linked Data. Evolving the Web into a global data space, Morgan and Claypool.