W3C Invited Talk: High-Level Change Detection

  • Slides: 36
W3C Invited Talk
High-Level Change Detection in the Semantic Web
Giorgos Flouris (fgeo@ics.forth.gr)
Institute of Computer Science, Foundation for Research and Technology – Hellas, Heraklion, Greece
Joint work with: Vicky Papavassiliou, Irini Fundulaki, Dimitris Kotzinos, Vassilis Christophides
16/09/2009

World Wide Web
  • WWW (and HTML) focus on human readability
      - Page presentation (fonts, colors, images, …)
      - Human understanding
      - Presentation rather than semantical content
      - Content is not formally described (for a machine to understand)
  • WWW contains documents, not data

Problems with Current Web
  • Search and access become difficult
      - Software is ignorant of the semantical content of a web page
      - Keyword search
      - High recall, low precision
  • Terminological issues
      - Synonyms (heart disease = cardiac disease)
      - Hyponyms/hypernyms (parliament members are politicians)
  • Queries on the semantical content cannot be made
      - Fetch articles that support B. Obama’s foreign policy
      - Fetch the home pages of all members of the Greek Parliament

Semantic Web
  • The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation (Berners-Lee et al., 2001)
  • The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries (http://www.w3.org/2001/sw/)
  • [Semantic Web] is a collaborative effort led by W3C with participation from a large number of researchers and industrial partners (http://www.w3.org/2001/sw/)

Semantic Web in Practice
  • Web of data, rather than documents
      - HTML for presentation
      - Semantical languages for semantical content
      - Readable and understandable by humans and machines
  • Semantic Web languages, protocols, etc.
      - Web page annotation (metadata descriptions etc.)
      - Publication of data on the Internet
      - Efficient communication and manipulation of data over the Internet
  • Different applications
      - Efficient searching
      - Sharing of data (e-science, e-government, remote learning, …)

Ontologies
  • Backbone of the Semantic Web
  • Ontologies allow the description of data
      - Annotation and metadata regarding web pages
      - Terminological relations (synonyms, hyponyms, …)
      - Communication and description of data, ideas, beliefs
  • An ontology is an explicit specification of a shared conceptualization of a domain (Gruber, 1993)
      - Precise, logical account of the intended meaning of terms, data structures etc.
      - Common (shared) interpretation of terms
      - Formal vocabulary for information exchange (for humans and machines)

Ontologies in Practice
  • Basic structures:
      - Classes (or concepts): collections of objects (e.g., Actor, Politician)
      - Properties (or roles): binary relationships between objects (e.g., started_on, member_of)
      - Instances (or individuals): objects (e.g., Giorgos, B. Obama)
  • Relations between them
      - Subsumption (Parliament_Member subclass of Politician), instantiation (B. Obama instance of Politician), …
      - The allowed relations and their semantics depend on the language
  • Different representation languages for ontologies
      - RDF, RDFS, DAML+OIL, OWL-DL, OWL-Lite, OWL 2, DLs, …
      - Usually triple-based

Visualization, Triples, Serialization
  • Visualization
      [Figure: graph of the example ontology. Classes: Period, Event, Onset, Birth, Actor, Existing, Stuff; properties: participants, started_on; individuals: Giorgos, G_Birth; edge types: instantiation, subsumption]
  • Triple Representation
      - Define classes
          [Period type Class]
      - Define properties
          [participants type Property]
          [participants domain Onset]
          [participants range Actor]
      - Instantiate/define individuals
          [G_Birth type Birth]
          [Giorgos type Actor]
          [G_Birth participants Giorgos]
      - Define hierarchies
          [Event subClass Period]
  • Serialization (RDF/XML)
      <rdfs:Class rdf:ID="Period">
      </rdfs:Class>
      <rdf:Property rdf:ID="participants">
        <rdfs:domain rdf:resource="Onset"/>
        <rdfs:range rdf:resource="Actor"/>
      </rdf:Property>
      <Birth rdf:about="G_Birth">
        <participants>
          <Actor rdf:about="Giorgos"/>
        </participants>
      </Birth>
      <rdfs:Class rdf:ID="Event">
        <rdfs:subClassOf rdf:resource="Period"/>
      </rdfs:Class>

Ontology Dynamics
  • Ontologies change constantly
      - World changes (dynamic models)
      - View on the world changes (new knowledge, measurements, etc.)
      - Perspective and usage changes
  • Example: GO ontology changes daily
      - Gene Ontology: information about gene products (biology)
  • Must find a way to cope with changes
      - Ontology evolution (modify an ontology in response to a change)
      - Ontology versioning (keep track of versions and their relations)
      - …
  • We deal with a peripheral problem (change detection)

What is Change?
[Figure: Real World; Ontology; change operations Delete_Class(…), Pull_Up_Class(…), Rename_Class(…), …; Ontology Evolution Algorithm]

What is Change Detection?
[Figure: Real World; Ontology; change operations Delete_Class(…), Pull_Up_Class(…), Rename_Class(…), …; Change Detection Algorithm]

Keeping Track of Changes
  • Purpose of this work: change detection
      - A posteriori detect the differences (delta or diff) between versions in a concise, intuitive and correct way
  • It is important to store the changes between versions
      - Visualization of differences
      - Efficient storage and/or communication
      - Evolution history
  • Record changes as they happen (manual or automatic)
      - Error-prone, difficult (often impossible)
[Figure: chain of versions V1 to V5 connected by changes C1 to C4]

Sample Evolution
[Figure: graph visualizations of Version 1 (V1) and Version 2 (V2) of the example ontology, linked by an arrow labeled "Evolution"; edge types: instantiation, subsumption]

Analyzing the Evolution (Using Triples)
  • Triples in V1 (partial list)
      [Event type Class]
      [Period type Class]
      [Event subclass Period]
      [participants type Property]
      [participants domain Onset]
      [participants range Actor]
      [Giorgos type Actor]
      [Existing type Class]
      [Stuff subclass Existing]
      [started_on domain Existing]
      [Onset subclass Event]
      [Birth subclass Onset]
      …
  • Triples in V2 (partial list)
      [Event type Class]
      [participants type Property]
      [participants domain Event]
      [participants range Actor]
      [Giorgos type Actor]
      [Persistent type Class]
      [Stuff subclass Persistent]
      [started_on domain Persistent]
      [Onset subclass Event]
      [Birth subclass Event]
      …

Low-Level Delta
  • Triples in V2 but not in V1 (added triples)
      [participants domain Event]
      [Persistent type Class]
      [Stuff subclass Persistent]
      [started_on domain Persistent]
      [Birth subclass Event]
  • Triples in V1 but not in V2 (deleted triples)
      [Period type Class]
      [Event subclass Period]
      [participants domain Onset]
      [Existing type Class]
      [Stuff subclass Existing]
      [started_on domain Existing]
      [Birth subclass Onset]
  • Low-Level Delta
      Add([participants domain Event])
      Add([Persistent type Class])
      …
      Del([Period type Class])
      …
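The low-level delta described on this slide amounts to two set differences. The following is a minimal Python sketch, not the talk's implementation: triples are modeled as plain (subject, predicate, object) tuples, the function name `low_level_delta` is hypothetical, only the partial triple lists shown on the slides are used, and the domain triple of V2 is written subject-first as (participants, domain, Event), matching the Add([participants domain Event]) form used later in the talk.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Partial triple lists taken from the slides; entries elided with "…" are left out.
V1: Set[Triple] = {
    ("Event", "type", "Class"),
    ("Period", "type", "Class"),
    ("Event", "subclass", "Period"),
    ("participants", "type", "Property"),
    ("participants", "domain", "Onset"),
    ("participants", "range", "Actor"),
    ("Giorgos", "type", "Actor"),
    ("Existing", "type", "Class"),
    ("Stuff", "subclass", "Existing"),
    ("started_on", "domain", "Existing"),
    ("Onset", "subclass", "Event"),
    ("Birth", "subclass", "Onset"),
}
V2: Set[Triple] = {
    ("Event", "type", "Class"),
    ("participants", "type", "Property"),
    ("participants", "domain", "Event"),
    ("participants", "range", "Actor"),
    ("Giorgos", "type", "Actor"),
    ("Persistent", "type", "Class"),
    ("Stuff", "subclass", "Persistent"),
    ("started_on", "domain", "Persistent"),
    ("Onset", "subclass", "Event"),
    ("Birth", "subclass", "Event"),
}


def low_level_delta(v1: Set[Triple], v2: Set[Triple]) -> Tuple[Set[Triple], Set[Triple]]:
    """Return (added, deleted): triples only in v2, and triples only in v1."""
    return v2 - v1, v1 - v2


added, deleted = low_level_delta(V1, V2)
for triple in sorted(added):
    print(f"Add([{' '.join(triple)}])")
for triple in sorted(deleted):
    print(f"Del([{' '.join(triple)}])")
```

On these partial lists the sketch yields the five added and seven deleted triples listed on the slide.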

Analyzing the Evolution (Visually)
[Figure: graph visualizations of Version 1 (V1) and Version 2 (V2) of the example ontology, linked by an arrow labeled "Evolution"; edge types: instantiation, subsumption]
  • High-Level Delta
      Generalize_Domain(participants, Onset, Event)
      Pull_Up_Class(Birth, Onset, Event)
      Delete_Class(Period, Ø, {Event}, Ø, Ø)
      Rename_Class(Existing, Persistent)

Comparing the Deltas
[Figure: graph visualizations of Version 1 (V1) and Version 2 (V2) of the example ontology]
Low-level delta                        High-level delta
Del([participants domain Onset])       Generalize_Domain(participants, Onset, Event)
Add([participants domain Event])

Del([Period type Class])               Delete_Class(Period, Ø, {Event}, Ø, Ø)
Del([Event subclass Period])

Del([Birth subclass Onset])            Pull_Up_Class(Birth, Onset, Event)
Add([Birth subclass Event])

Associations (Partitioning)
Low-Level Changes                      Associated High-Level Changes
Del([participants domain Onset])       Generalize_Domain(participants, Onset, Event)
Add([participants domain Event])

Del([Birth subclass Onset])            Pull_Up_Class(Birth, Onset, Event)
Add([Birth subclass Event])

Del([Period type Class])               Delete_Class(Period, Ø, {Event}, Ø, Ø)
Del([Event subclass Period])

Del([Existing type Class])             Rename_Class(Existing, Persistent)
Del([Stuff subclass Existing])
Del([started_on domain Existing])
Add([Persistent type Class])
Add([Stuff subclass Persistent])
Add([started_on domain Persistent])

Low-Level Versus High-Level Deltas
  • Purpose:
      - A posteriori detect the differences (delta or diff) between versions in a concise, intuitive and correct way
  • Low-level deltas
      - Easier to get
  • High-level deltas
      - More concise (e.g., Rename_Class)
      - More intuitive (e.g., Pull_Up_Class)
      - Carry additional information (e.g., Generalize_Domain)
  • Objective: detection of high-level deltas

Language of Changes and Algorithm
  • Deltas based on some language of changes
      - A set of formal definitions that describe the changes that can be understood and detected
      - Can be high-level or low-level
      - Must be coupled with a corresponding detection algorithm
  • Low-level languages easy to define (Add(t), Del(t))
  • High-level languages more complicated
      - Several proposals; no standard
  • Challenges for high-level languages
      - Must be deterministic (exactly one high-level delta)
      - Must be fine-grained enough to capture subtle changes
      - Must be coarse-grained enough to be concise

Proposed Language L
  • The formal definition of a change consists of:
      - Changes required in the low-level delta (added/deleted triples)
      - Conditions that should hold in V1 and/or V2
  • Generalize_Domain(P, X, Y)
      - Del([P domain X])
      - Add([P domain Y])
      - P existing property in both V1, V2
      - X, Y existing classes in both V1, V2
      - X subclass of Y in both V1, V2
  • Generalize_Domain(participants, Onset, Event): detectable
  • Similarly for the other changes in L (about 120 in total)
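The definition of Generalize_Domain(P, X, Y) above can be read as a check over the low-level delta plus conditions on both versions. The sketch below illustrates that reading in Python and is not the talk's detection algorithm. All function names are hypothetical, and two simplifications are assumed: a resource counts as a class if it is typed as Class or appears in a subclass triple (the slides only show partial triple lists), and "subclass" is checked transitively over the explicit subclass triples.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Partial triple lists from the slides, written subject-first.
V1: Set[Triple] = {
    ("Event", "type", "Class"), ("Period", "type", "Class"),
    ("Event", "subclass", "Period"), ("participants", "type", "Property"),
    ("participants", "domain", "Onset"), ("participants", "range", "Actor"),
    ("Giorgos", "type", "Actor"), ("Existing", "type", "Class"),
    ("Stuff", "subclass", "Existing"), ("started_on", "domain", "Existing"),
    ("Onset", "subclass", "Event"), ("Birth", "subclass", "Onset"),
}
V2: Set[Triple] = {
    ("Event", "type", "Class"), ("participants", "type", "Property"),
    ("participants", "domain", "Event"), ("participants", "range", "Actor"),
    ("Giorgos", "type", "Actor"), ("Persistent", "type", "Class"),
    ("Stuff", "subclass", "Persistent"), ("started_on", "domain", "Persistent"),
    ("Onset", "subclass", "Event"), ("Birth", "subclass", "Event"),
}


def is_property(kb: Set[Triple], name: str) -> bool:
    return (name, "type", "Property") in kb


def is_class(kb: Set[Triple], name: str) -> bool:
    # Assumption: typed as Class, or mentioned in any subclass triple.
    return (name, "type", "Class") in kb or any(
        pred == "subclass" and name in (subj, obj) for subj, pred, obj in kb
    )


def is_subclass(kb: Set[Triple], sub: str, sup: str) -> bool:
    # Transitive reachability over explicit subclass triples (strict, not reflexive).
    seen, stack = set(), [sub]
    while stack:
        current = stack.pop()
        for subj, pred, obj in kb:
            if subj == current and pred == "subclass" and obj not in seen:
                if obj == sup:
                    return True
                seen.add(obj)
                stack.append(obj)
    return False


def detect_generalize_domain(v1: Set[Triple], v2: Set[Triple]) -> List[Tuple[str, str, str]]:
    """Return every (P, X, Y) that satisfies the Generalize_Domain definition."""
    added, deleted = v2 - v1, v1 - v2
    found = []
    for p, pred, x in sorted(deleted):          # Del([P domain X])
        if pred != "domain":
            continue
        for p2, pred2, y in sorted(added):      # Add([P domain Y])
            if pred2 != "domain" or p2 != p:
                continue
            if (
                all(is_property(kb, p) for kb in (v1, v2))
                and all(is_class(kb, c) for kb in (v1, v2) for c in (x, y))
                and all(is_subclass(kb, x, y) for kb in (v1, v2))
            ):
                found.append((p, x, y))
    return found


print(detect_generalize_domain(V1, V2))
```

On the partial lists, the pair Del([started_on domain Existing]) / Add([started_on domain Persistent]) is not reported as a Generalize_Domain, because the conditions on classes and subclassing do not hold; in the talk those triples are associated with Rename_Class instead.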

Results on L: Granularity
  • Granularity problem: solved by defining levels of changes
      - Basic Changes: fine-grained, roughly correspond to low-level
      - Composite Changes: coarse-grained, group several basic changes together
      - Heuristic Changes: based on heuristics, necessary for Rename, Merge, Split etc.
  • Problems with determinism
      - One evolution could correspond to different sets of basic/composite changes
  • Priorities in detection
      - Heuristic > Composite > Basic

Results on L: Types of Changes
  • Low-Level
      - Add
      - Del
  • High-Level
      - Basic: Delete_Subclass, Delete_Domain
      - Composite: Pull_Up_Class, Change_Domain
      - Heuristic: Rename_Class, Split_Class

Results on L: Determinism
  • Each low-level change is associated with exactly one detectable high-level change
      - Full partitioning of low-level changes into high-level ones
  • Each pair of versions (V1, V2) is associated with:
      - Exactly one low-level delta
      - Exactly one high-level delta
  • Determinism is necessary
      - More than one would lead to ambiguities
      - Fewer than one would make some inputs (V1, V2) irresolvable
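The "full partitioning" property can be checked mechanically for the running example: every low-level change must appear under exactly one high-level change. The sketch below is a hedged illustration in Python. The association data is copied from the Associations (Partitioning) slide, while the string encoding of changes and the helper name `is_full_partition` are assumptions made for this example.

```python
from collections import Counter
from typing import Dict, Set

# Low-level delta of the running example (12 changes).
LOW_LEVEL_DELTA: Set[str] = {
    "Del([participants domain Onset])", "Add([participants domain Event])",
    "Del([Birth subclass Onset])", "Add([Birth subclass Event])",
    "Del([Period type Class])", "Del([Event subclass Period])",
    "Del([Existing type Class])", "Del([Stuff subclass Existing])",
    "Del([started_on domain Existing])", "Add([Persistent type Class])",
    "Add([Stuff subclass Persistent])", "Add([started_on domain Persistent])",
}

# High-level change -> low-level changes associated with it.
ASSOCIATIONS: Dict[str, Set[str]] = {
    "Generalize_Domain(participants, Onset, Event)": {
        "Del([participants domain Onset])", "Add([participants domain Event])",
    },
    "Pull_Up_Class(Birth, Onset, Event)": {
        "Del([Birth subclass Onset])", "Add([Birth subclass Event])",
    },
    "Delete_Class(Period, Ø, {Event}, Ø, Ø)": {
        "Del([Period type Class])", "Del([Event subclass Period])",
    },
    "Rename_Class(Existing, Persistent)": {
        "Del([Existing type Class])", "Del([Stuff subclass Existing])",
        "Del([started_on domain Existing])", "Add([Persistent type Class])",
        "Add([Stuff subclass Persistent])", "Add([started_on domain Persistent])",
    },
}


def is_full_partition(low_level: Set[str], associations: Dict[str, Set[str]]) -> bool:
    """True if each low-level change belongs to exactly one high-level change."""
    counts = Counter(change for group in associations.values() for change in group)
    covers_everything = set(counts) == low_level
    no_overlap = all(n == 1 for n in counts.values())
    return covers_everything and no_overlap


print(is_full_partition(LOW_LEVEL_DELTA, ASSOCIATIONS))
```

If a basic change such as Delete_Domain(participants, Onset) were reported alongside Generalize_Domain, the same Del triple would be claimed twice and the check would fail, which is the ambiguity the detection priorities are meant to rule out.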

Results on L: Application
[Figure: graph visualizations of Version 1 (V1) and Version 2 (V2) of the example ontology, with arrows labeled Detect C, Apply C, and Apply C⁻¹ between them]

Results on L: Deltas Keep Version History
  • Can reproduce all versions as long as you keep (any) one version and the deltas
  • Deltas are more concise than the versions themselves
      - Storage and communication efficiency
[Figure: chain of versions V1 to V5 connected by changes C1 to C4]
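The claim that any single stored version plus the deltas suffices to reproduce the whole history follows from deltas being reversible. The sketch below illustrates the idea in Python at the level of low-level (triple) deltas only; it does not model the high-level changes of L. The names `Delta`, `diff`, `apply_delta`, `invert` and `rebuild_all`, as well as the three toy versions, are hypothetical and introduced for this example.

```python
from typing import FrozenSet, List, NamedTuple, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class Delta(NamedTuple):
    added: FrozenSet[Triple]
    deleted: FrozenSet[Triple]


def diff(old: Set[Triple], new: Set[Triple]) -> Delta:
    """Delta that turns `old` into `new`."""
    return Delta(frozenset(new - old), frozenset(old - new))


def apply_delta(version: Set[Triple], delta: Delta) -> Set[Triple]:
    return (set(version) - delta.deleted) | delta.added


def invert(delta: Delta) -> Delta:
    """Inverse delta: swap additions and deletions."""
    return Delta(delta.deleted, delta.added)


def rebuild_all(known_index: int, known_version: Set[Triple],
                deltas: List[Delta]) -> List[Set[Triple]]:
    """Rebuild every version, given one version and the chain of deltas.

    deltas[i] is assumed to transform version i into version i + 1.
    """
    versions = {known_index: set(known_version)}
    for i in range(known_index, len(deltas)):        # walk forward
        versions[i + 1] = apply_delta(versions[i], deltas[i])
    for i in range(known_index - 1, -1, -1):         # walk backward
        versions[i] = apply_delta(versions[i + 1], invert(deltas[i]))
    return [versions[i] for i in range(len(deltas) + 1)]


# Toy history (hypothetical, loosely based on the running example).
v_a = {("Birth", "subclass", "Onset"), ("Onset", "subclass", "Event")}
v_b = {("Birth", "subclass", "Event"), ("Onset", "subclass", "Event")}
v_c = {("Birth", "subclass", "Event")}
deltas = [diff(v_a, v_b), diff(v_b, v_c)]

# Keep only the middle version and the deltas, then rebuild the rest.
history = rebuild_all(1, v_b, deltas)
print(history == [v_a, v_b, v_c])
```

The design choice here is that the kept version can sit anywhere in the chain: later versions are obtained by applying deltas, earlier ones by applying their inverses.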

Detection Algorithm for L (1/2)
  • Run Matcher (External): List of Mappings
      <V1:Existing> is matched with <V2:Persistent>
  • Compute Heuristic Changes: Heuristic Changes
      Rename_Class(Existing, Persistent)
  • Calculate Low-Level Delta: Triples in Delta (step 1: low-level)
      Del([participants domain Onset])
      Del([Birth subclass Onset])
      Del([Event subclass Period])
      Del([Existing type Class])
      Del([Stuff subclass Existing])
      Del([started_on domain Existing])
      Del([Period type Class])
      Add([Birth subclass Event])
      Add([participants domain Event])
      Add([Persistent type Class])
      Add([Stuff subclass Persistent])
      Add([started_on domain Persistent])
  • Triples in V1 (Partial List)
      [Period type Class]
      [Event subclass Period]
      [participants type Property]
      [participants domain Onset]
      [participants range Actor]
      [Existing type Class]
      [Stuff subclass Existing]
      [started_on domain Existing]
      [Onset subclass Event]
      …
  • Triples in V2 (Partial List)
      [Event type Class]
      [participants type Property]
      [participants domain Event]
      [participants range Actor]
      [Giorgos type Actor]
      [Persistent type Class]
      [Stuff subclass Persistent]
      [started_on domain Persistent]
      [Onset subclass Event]
      [Birth subclass Event]
      …

Detection Algorithm for L (2/2)
  • Find Associated Change
      Del([participants domain Onset]) → Generalize_Domain(participants, Onset, Event): DETECTABLE
  • Triples in Delta (step 2: heuristic; step 3: basic and composite)
      Del([participants domain Onset])
      Del([Birth subclass Onset])
      Del([Event subclass Period])
      Del([Period type Class])
      Add([Birth subclass Event])
      Add([participants domain Event])
  • Triples in Delta (step 4: result)
      Delete_Class(Period, Ø, {Event}, Ø, Ø)
      Pull_Up_Class(Birth, Onset, Event)
      Rename_Class(Existing, Persistent)
      Generalize_Domain(participants, Onset, Event)
  • Triples in V1 (Partial List)
      [Period type Class]
      [Event subclass Period]
      [participants type Property]
      [participants domain Onset]
      [participants range Actor]
      [Existing type Class]
      [Stuff subclass Existing]
      [started_on domain Existing]
      [Onset subclass Event]
      …
  • Triples in V2 (Partial List)
      [Event type Class]
      [participants type Property]
      [participants domain Event]
      [participants range Actor]
      [Giorgos type Actor]
      [Persistent type Class]
      [Stuff subclass Persistent]
      [started_on domain Persistent]
      [Onset subclass Event]
      [Birth subclass Event]
      …

Find Associated Change
Input: Del([participants domain Onset])

Required in Low-Level Delta            Potentially Associated High-Level Change
Add([participants domain X])           Generalize_Domain(participants, Onset, X)
Add([participants domain X])           Specialize_Domain(participants, Onset, X)
---                                    Delete_Domain(participants, Onset)
Del([participants type Property])      Delete_Property(participants, Onset, X)
Del([participants range X])
…                                      …

Operations                                       Outcome
Pull_Up_Class(*, *, *)                           [not in the table]
Delete_Property(participants, *, *)              [necessary triples not found]
Specialize_Domain(participants, Onset, Event)    [conditions not true]
Generalize_Domain(participants, Onset, Birth)    [wrong parameter (triples not found)]
Generalize_Domain(participants, Onset, Event)    [DETECTABLE (ASSOCIATED)]
Delete_Domain(participants, Onset)               [composite changes have priority]

Implementation
  • Algorithm implemented for experiments and evaluation
  • Uses the APIs of SWKM
      - Platform for efficient and scalable management of dynamic RDF/S ontologies and data
      - Query, update, low-level delta, high-level delta, versioning, …

Performance
  • Complexity: O(max{N1, N2})
      - Linear average-case
      - Highly dependent on the detected changes (type, number)

Evaluation: Usefulness and Intuitiveness
  • L is well-defined (changes used in practice)
      - GO: add/delete class, comments changing
      - CIDOC: add/delete/rename properties
  • Results confirmed by literature/editor notes

Evaluation: Conciseness
  • Basic ≈ Low-Level
  • Basic+Composite+Heuristic << Low-Level

Manual Change Recording (CIDOC)
Editor notes                         Detection result
Delete class: 3                      Delete class: 6
Add property: 54                     Add property: 58
Delete property: 16                  Delete property: 18
Rename property: 24                  Rename property: 30
Redirect properties (domain): 14     Generalize_Domain: 13, Specialize_Domain: 1
Redirect properties (range): 14      Generalize_Range: 14, Specialize_Range: 1, Change_Range: 1

Conclusion
  • High-level change detection
      - A posteriori detection (input: V1, V2)
      - No further information needed (e.g., logs, change recording etc.)
  • Formal semantics
      - Formal results (reversibility, determinism, …)
      - Non-heuristic based (except for heuristic changes)
      - No need for precision and recall evaluation
  • Efficient, sound and complete detection algorithm
  • Nice informal properties
      - Conciseness, intuitiveness
  • Future work: more operations, evaluation on other datasets, evaluation with real users

References
1. Vicky Papavassiliou, Giorgos Flouris, Irini Fundulaki, Dimitris Kotzinos, Vassilis Christophides. On Detecting High-Level Changes in RDF/S KBs. In Proceedings of the 8th International Semantic Web Conference (ISWC-09), to appear, 2009.
2. Vicky Papavassiliou, Giorgos Flouris, Irini Fundulaki, Dimitris Kotzinos, Vassilis Christophides. Formalizing High-Level Change Detection for RDF/S KBs. Technical Report TR-398, FORTH-ICS, 2009.