DaQuaTa International Workshop, Lyon, France, 12/12/2017

Interactive repairing of inconsistent knowledge bases
Abdallah Arioua (University of Lyon 1 / LIRIS, France), Angela Bonifati
DaQuaTa International Workshop, Lyon, France, 12/12/2017

Motivation: Interactive repairing of knowledge bases
• Knowledge bases are ubiquitous:
  • The Semantic Web, ontology-based reasoning and data access.
  • Big data integration and fusion.
  • Knowledge and concept graphs in industry.
• Errors can be introduced by mappings, typos, knowledge fusion, etc.
• Automatic repairing is costly and lossy.
• Bring humans into the loop for better quality.
• The democratization of data cleaning.

Preliminaries: Logical language and knowledge bases
• A knowledge base K is a set of facts, TGDs (rules) and CDDs (constraints).
• Formalism: the TGDs are weakly acyclic; the CDDs correspond to denial constraints with equality.
• Reasoning.
• Example: (figure not preserved)

Preliminaries: Inconsistency handling
• A knowledge base K is inconsistent iff: (formula not preserved)
• Conflicts discovery.
• K is inconsistent: (example not preserved)
• Repairing K using:
  • Deletions: remove facts that are involved in conflicts. Lossy approach.
  • Updates:
    • E.g. John has an allergy against Penicillin rather than Aspirin.
    • Penicillin is prescribed to John rather than Aspirin.
    • …
• User interaction.
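The contrast between deletions and updates can be sketched in a few lines. This is only an illustration: the two facts and the constraint prescribed(D, P) AND allergic(P, D) -> false are assumptions reconstructed from the John/Aspirin example, and the function names are hypothetical.

```python
# Facts are (predicate, arg1, arg2) tuples; the data and the constraint are illustrative.
facts = {
    ("prescribed", "Aspirin", "John"),
    ("allergic", "John", "Aspirin"),
}


def conflicts(kb):
    """Pairs of facts violating the assumed constraint
    prescribed(D, P) AND allergic(P, D) -> false."""
    return [
        (f, g)
        for f in sorted(kb)
        for g in sorted(kb)
        if f[0] == "prescribed" and g[0] == "allergic"
        and f[1] == g[2] and f[2] == g[1]
    ]


def repair_by_deletion(kb):
    """Drop every fact involved in some conflict (simple but lossy)."""
    involved = {fact for pair in conflicts(kb) for fact in pair}
    return kb - involved


def repair_by_update(kb, fact, position, value):
    """Overwrite one argument of one fact and keep everything else."""
    fixed = fact[:position] + (value,) + fact[position + 1:]
    return (kb - {fact}) | {fixed}


after_deletion = repair_by_deletion(facts)
after_update = repair_by_update(facts, ("allergic", "John", "Aspirin"), 2, "Penicillin")
```

In this toy case the deletion-based repair discards both facts, while the update keeps both and changes a single value, which is the sense in which deletions are lossy.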

Given a KB equipped with a set of TGDs and CDDs, produce an error-free KB:
(i) Accounting for the interplay of TGDs and CDDs.
(ii) Minimizing user interaction.

Outline
• Update-based repairing
• User intervention
• Questioning strategies
• Experimental study
• Conclusion

Update-based repairing: Introduction
• Repairing using updates in KBs and RDBs [1]:
  • On Functional Dependencies (FDs), Conditional FDs, Denial Constraints (DCs).
• Repairing using updates in KBs:
  • TGDs are natural in KB reasoning but they may introduce new conflicts.
  • 1. Apply TGDs, then 2. apply CDDs, then go to 1 and repeat: TGDs → CDDs → TGDs → CDDs → …
  • Computationally expensive, overwhelming, and subject to finiteness issues.
[1] Philip Bohannon et al. 2005; Kolahi and Lakshmanan 2009; Yakout et al. 2011; Xu Chu et al. 2013; Farid et al. 2016.
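A minimal sketch of why TGDs and CDDs have to be alternated: a conflict can be invisible in the stored facts and only appear once the rules are applied. The rule, the constraint and the facts below are hypothetical, and the rule is existential-free to keep the saturation trivially finite.

```python
def apply_tgds(kb):
    """Saturate with one assumed existential-free TGD:
    prescribed(D, P) AND contains(D, S) -> exposed(P, S)."""
    derived = set(kb)
    while True:
        new = {
            ("exposed", f[2], g[2])
            for f in derived
            for g in derived
            if f[0] == "prescribed" and g[0] == "contains" and f[1] == g[1]
        } - derived
        if not new:
            return derived
        derived |= new


def violates_cdd(kb):
    """Assumed CDD: exposed(P, S) AND allergic(P, S) -> false."""
    return any(f[0] == "exposed" and ("allergic", f[1], f[2]) in kb for f in kb)


kb = {
    ("prescribed", "Aspirin", "John"),
    ("contains", "Aspirin", "salicylate"),
    ("allergic", "John", "salicylate"),
}
before = violates_cdd(kb)              # conflict check on the stored facts only
after = violates_cdd(apply_tgds(kb))   # conflict check after applying the TGD
```

Here the conflict is only detected after saturation, so any update to a stored fact would need the TGDs to be re-applied before the CDDs are checked again.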

Update-based repairing: Basic concepts
• A set of fixes P is called a consistent fix if it produces a consistent KB.
• P is a repair fix if the resulting KB is consistent and minimally changed (w.r.t. set inclusion).
• Example: (not preserved)
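The two definitions can be checked by brute force on a toy KB. This sketch assumes that a fix is a triple (fact, position, new value) and reads "minimally changed" as "no proper subset of P is already a consistent fix"; both are interpretations, and the constraint and values are illustrative.

```python
from itertools import combinations


def is_consistent(kb):
    """Assumed constraint: prescribed(D, P) AND allergic(P, D) -> false."""
    return not any(
        f[0] == "prescribed" and ("allergic", f[2], f[1]) in kb for f in kb
    )


def apply_fixes(kb, fixes):
    """Apply fixes of the form (fact, position, new_value) to a set of facts."""
    updated = {fact: list(fact) for fact in kb}
    for fact, position, value in fixes:
        updated[fact][position] = value
    return {tuple(args) for args in updated.values()}


def is_consistent_fix(kb, fixes):
    return is_consistent(apply_fixes(kb, fixes))


def is_repair_fix(kb, fixes):
    """Consistent, and no proper subset of the fixes is already consistent."""
    fixes = list(fixes)
    if not is_consistent_fix(kb, fixes):
        return False
    return not any(
        is_consistent_fix(kb, subset)
        for size in range(len(fixes))
        for subset in combinations(fixes, size)
    )


kb = {("prescribed", "Aspirin", "John"), ("allergic", "John", "Aspirin")}
fix_allergy = (("allergic", "John", "Aspirin"), 2, "Penicillin")
fix_drug = (("prescribed", "Aspirin", "John"), 1, "Ibuprofen")
```

With these definitions, `[fix_allergy]` alone is a repair fix, whereas `[fix_allergy, fix_drug]` is a consistent fix but changes more than necessary. The subset enumeration is exponential and only meant for illustration.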

Update-based repairing: Π-repairability
• Repairing with an immutable set of positions Π.
  • Trusted positions or previously fixed positions.
• Some KBs cannot be fixed when some positions are immutable.
• Example: (KB not preserved) the check yields an inconsistent KB, hence not Π-repairable.
• Checking Π-repairability:
  • Change all non-immutable positions to unique labelled nulls.
  • Check consistency.
• The procedure is sound, complete and runs in linear time (data complexity).
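The two-step check can be sketched as follows, under simplifying assumptions: only one hypothetical CDD and no TGDs, positions written as (fact, argument index) pairs, and labelled nulls represented as ("null", n) tuples so that each null is equal only to itself.

```python
from itertools import count


def is_consistent(kb):
    """Assumed constraint: prescribed(D, P) AND allergic(P, D) -> false."""
    return not any(
        f[0] == "prescribed" and ("allergic", f[2], f[1]) in kb for f in kb
    )


def nullify(kb, immutable):
    """Replace every argument whose position is not immutable by a fresh null.
    A position is a (fact, argument_index) pair; nulls are ("null", n) tuples."""
    fresh = count()
    return {
        (fact[0],) + tuple(
            arg if (fact, i) in immutable else ("null", next(fresh))
            for i, arg in enumerate(fact[1:], start=1)
        )
        for fact in kb
    }


def is_pi_repairable(kb, immutable):
    """The KB is repairable iff it is consistent once mutable positions are nulled."""
    return is_consistent(nullify(kb, immutable))


prescribed = ("prescribed", "Aspirin", "John")
allergic = ("allergic", "John", "Aspirin")
kb = {prescribed, allergic}
all_positions = {(f, i) for f in kb for i in (1, 2)}
```

If every position is immutable the conflict cannot be broken, so the check fails; freeing a single position such as `(allergic, 2)` is enough for it to succeed.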

User intervention: Basic definitions
• A question Φ is a finite set of fixes.
• If all the fixes in Φ yield a Π-repairable KB then Φ is sound.
• The user chooses a fix from Φ as an answer.
• A sequence of sound questions and answers is called an inquiry over K.
• Example: Φ: which fix is true from the following set? (set not preserved)

User intervention: Generating sound questions and inquiries
• Procedure:
  1. Generate a sound question by filtering values.
  2. Ask the user and update; continue until no conflict is left.
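The procedure above can be sketched as a loop, under several assumptions that are not in the slides: a single hypothetical CDD, no TGDs, candidate values drawn from small per-position domains, a simulated user passed as the `answer` callback, and every answered position treated as immutable afterwards.

```python
from itertools import count


def conflicts(kb):
    """Pairs of fact ids violating the assumed constraint
    prescribed(D, P) AND allergic(P, D) -> false."""
    return [
        (i, j)
        for i, f in kb.items()
        for j, g in kb.items()
        if f[0] == "prescribed" and g[0] == "allergic"
        and f[1] == g[2] and f[2] == g[1]
    ]


def is_pi_repairable(kb, immutable):
    """Null out every mutable position, then check that no conflict remains."""
    fresh = count()
    nulled = {
        i: [f[0]] + [
            arg if (i, k) in immutable else ("null", next(fresh))
            for k, arg in enumerate(f[1:], start=1)
        ]
        for i, f in kb.items()
    }
    return not conflicts(nulled)


def sound_question(kb, immutable, domains):
    """Candidate fixes for the first conflict, filtered to the sound ones."""
    question = []
    for fid in conflicts(kb)[0]:
        for pos in (1, 2):
            if (fid, pos) in immutable:
                continue
            for value in domains[(kb[fid][0], pos)]:
                candidate = {k: list(f) for k, f in kb.items()}
                candidate[fid][pos] = value
                if value != kb[fid][pos] and is_pi_repairable(
                    candidate, immutable | {(fid, pos)}
                ):
                    question.append((fid, pos, value))
    return question


def inquiry(kb, domains, answer):
    """Ask sound questions until no conflict is left; return the KB and the count."""
    kb = {k: list(f) for k, f in kb.items()}
    immutable, asked = set(), 0
    while conflicts(kb):
        question = sound_question(kb, immutable, domains)
        if not question:
            raise ValueError("no sound fix available in the given domains")
        fid, pos, value = answer(question)
        kb[fid][pos] = value
        immutable.add((fid, pos))  # an answered position is trusted from now on
        asked += 1
    return kb, asked


drugs, patients = ["Aspirin", "Penicillin"], ["John", "Mary"]
domains = {
    ("prescribed", 1): drugs, ("prescribed", 2): patients,
    ("allergic", 1): patients, ("allergic", 2): drugs,
}
kb = {0: ["prescribed", "Aspirin", "John"], 1: ["allergic", "John", "Aspirin"]}
repaired, asked = inquiry(kb, domains, answer=lambda question: question[0])
```

Each answer freezes one more position, so the loop in this sketch stops after at most as many questions as there are positions.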

User intervention: Results
• Proposition 1: questions are sound and polynomial.
• Proposition 2: the procedure runs in finite time and produces a consistent knowledge base K.
• When does the procedure produce a repair of K?
  • If the user is an oracle then K is minimally repaired.
  • An oracle is a user who knows everything about K (a domain and knowledge expert).
• Proposition 3: delay time between questions is polynomial.

Questioning strategies: Intuition by examples (random strategy)
• Consider the following knowledge base: (KB and conflicts not preserved)
• Random strategy:
  1. Pick a conflict and generate all possible fixes.
• No resolution if the user chooses: (fix not preserved)
• We ask more questions.
• Join positions!

Questioning strategies: Intuition by examples (join strategy)
• Consider the following knowledge base: (KB and conflicts not preserved)
• Join strategy:
  1. Pick a conflict and generate all possible fixes over join positions.
• We ask fewer questions.
• Improvement: propagate fixes to reduce the size of the next questions.
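A small sketch of how join positions might be computed from a constraint body. The slides do not define the term, so this assumes that a join position is an argument position holding a variable that occurs more than once in the body; the constraint itself is hypothetical and variables are the capitalised names.

```python
from collections import Counter


def join_positions(body):
    """Positions (atom_index, argument_index) whose variable occurs more than once
    in the constraint body. Variables are the capitalised names, as in Datalog."""
    occurrences = Counter(
        term for _, *args in body for term in args if term[:1].isupper()
    )
    return [
        (a, i)
        for a, (_, *args) in enumerate(body)
        for i, term in enumerate(args, start=1)
        if occurrences[term] > 1
    ]


# Hypothetical CDD body: prescribed(D, P, Date), allergic(P, D) -> false
body = [("prescribed", "D", "P", "Date"), ("allergic", "P", "D")]
positions = join_positions(body)
```

Under this reading, a fix on the `Date` position can never break the join, which is why restricting questions to join positions avoids answers that resolve nothing.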

Questioning strategies: Intuition by examples (MCD strategy)
• Consider the following knowledge base: (KB and conflicts not preserved)
• MCD strategy:
  1. Rank join positions w.r.t. inclusion in conflicts and choose the top-ranked one.
• 2 conflicts are resolved by one question in the example.
• Observation: the more the conflicts overlap, the fewer questions are needed.
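The ranking step can be sketched with a simple counter. The conflict data below is made up, and each conflict is assumed to be given as the list of join positions (fact id, argument index) it involves.

```python
from collections import Counter


def rank_positions(conflicts):
    """Order positions by the number of conflicts they take part in (highest first)."""
    counts = Counter(position for conflict in conflicts for position in conflict)
    return [position for position, _ in counts.most_common()]


# Each conflict is listed as the join positions (fact_id, argument_index) it involves.
conflicts = [
    [("f1", 2), ("f2", 1)],
    [("f1", 2), ("f3", 1)],
]
top = rank_positions(conflicts)[0]
```

Position `("f1", 2)` is shared by both conflicts, so a single question about it may settle the two at once.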

Experimental study: Experimental environment
Variables:
• Effectiveness: average number of questions per strategy and average number of conflicts per question.
• Delay time: average delay between asked questions.
Environment:
• Java 8, 2.40 GHz 4-core CPU, 16 GB RAM (Windows 7).
• Multi-trial experiments with a cold start.
• For each experimental variable we test our approach on synthetic and real-world datasets.

Experimental study: Effectiveness
KBs:
• Durum Wheat KB v1: manually constructed. TGDs and CDDs have been validated by experts.
• Summary v1: 567 atoms, 269 TGDs, 27 CDDs, 185 conflicts.
• Summary v2: 567 atoms, 269 TGDs, 100 CDDs, 212 conflicts.
Results: (figure not preserved)

Experimental study: Effectiveness on synthetic KBs (no TGDs)
Results: (figure not preserved)

Experimental study: Effectiveness on synthetic KBs (convergence)
Results: (figures not preserved)

Experimental study: Delay time on synthetic KBs (only CDDs)
• Reasonable delay time: less than 1 to 2 seconds [1].
• MCD strategy is used.
• Durum Wheat v1 and v2: less than 1 second.
Results: (figure not preserved)
[1] Robert B. Miller, Response time in man-computer conversational transactions, 1968.

Experimental study: Delay time on synthetic KBs (CDDs and TGDs)
• Reasonable delay time: less than 1 to 2 seconds [1].
• MCD strategy is used.
• Durum Wheat v1 and v2: less than 1 second.
Results:
• D1: 50 TGDs, 150 CDDs
• D2: 100 TGDs, 150 CDDs
• D3: 150 (only one value survives in the transcript)
• D4: 200 TGDs, 150 CDDs
• Size: 400 atoms. Inc ratio: 100%.
[1] Robert B. Miller, Response time in man-computer conversational transactions, 1968.

Conclusion
Summary:
• Update-based repairing of inconsistent knowledge bases.
• Interactive repairing in the presence of interacting dependencies.
• Strategies for interaction minimization.
• The approach can be applied to portions of large knowledge bases.
• Delay time is reasonable.
Perspectives:
• Full Denial Constraints (but undecidability!).
• Other data cleaning constraints (CFDs, Metric FDs, etc.).

End! Thank you! Questions?