Keyword Proximity Search in Complex Data Graphs Konstantin

  • Slides: 63
Download presentation
Keyword Proximity Search in Complex Data Graphs • Konstantin Golenberg • Benny Kimelfeld •

Keyword Proximity Search in Complex Data Graphs • Konstantin Golenberg • Benny Kimelfeld • Yehoshua Sagiv The Selim and Rachel Benin School of Engineering and Computer Science

Overview Keyword Proximity Search System Overview Algorithm for Answer Generation Ranking Answers Conclusions &

Overview Keyword Proximity Search System Overview Algorithm for Answer Generation Ranking Answers Conclusions & Future Work SIGMOD’ 08 Keyword Proximity Search in Complex Data Graphs Benny. University Kimelfeld The Hebrew

Schema-Free Extraction of Data Exposure to many databases Nowadays… • Different types (relational, XML,

Schema-Free Extraction of Data Exposure to many databases Nowadays… • Different types (relational, XML, RDF…) • Different schemas • Not easy to use traditional paradigms of querying (e. g. , SQL, XQuery, SPARQL) and, moreover, they require a thorough understanding of the schema • Goal: Enable users to instantly pose (inaccurate) queries without knowing the schema The natural (and popular) option: Keyword Search −Problem: Inherently different from standard IR SIGMOD’ 08 Keyword Proximity Search in Complex Data Graphs Benny. University Kimelfeld The Hebrew

Keyword Proximity Search (KPS) Data have varying degrees of structure – Relational (w/ foreign

Keyword Proximity Search (KPS) Data have varying degrees of structure – Relational (w/ foreign keys), XML (w/ id-references) – Natural representation by a graph – Usually, data-centric rather than document-centric A query is a set of keywords − No structural constraints The Goal: Extract meaningful parts of data w. r. t. the keywords • Agrawal et al. ICDE’ 02 • Hristidis et al. , VLDB’ 02, 03, ICDE’ 03 • Bhalotia et al. VLDB’ 05 • Kacholia al. , VLDB’ 06 • Ding et al. , ICDE’ 07 • Liu et al. , SIGMOD’ 06 • Wang et al. , VLDB’ 06 • Luo et al. , SIGMOD’ 07 … SIGMOD’ 08 Keyword Proximity Search in Complex Data Graphs Benny. University Kimelfeld The Hebrew

Example: Search in RDB Cities Organizations ID Name Population ID Name Head Q. 22

Example: Search in RDB Cities Organizations ID Name Population ID Name Head Q. 22 Amsterdam 1101407 135 EU 73 73 Brussels 951580 175 ESA 81 Countries Memberships Code Name Area Capital Country Org. NL Netherlands 37330 22 B 135 B Belgium 30510 73 NL 135 search SIGMOD’ 08 Belgium , Brussels Keyword Proximity Search in Complex Data Graphs Benny. University Kimelfeld The Hebrew

Brussels is the capital city of Belgium Cities Organizations ID Name Population ID Name

Brussels is the capital city of Belgium Cities Organizations ID Name Population ID Name Head Q. 22 Amsterdam 1101407 135 EU 73 73 Brussels 951580 175 ESA 81 Countries Memberships Code Name Area Capital Country Org. NL Netherlands 37330 22 B 135 B Belgium 30510 73 NL 135 search SIGMOD’ 08 Belgium , Brussels Keyword Proximity Search in Complex Data Graphs Benny. University Kimelfeld The Hebrew

Brussels hosts EU and Belgium is a member Cities Organizations ID Name Population ID

Brussels hosts EU and Belgium is a member Cities Organizations ID Name Population ID Name Head Q. 22 Amsterdam 1101407 135 EU 73 73 Brussels 951580 175 ESA 81 Countries Memberships Code Name Area Capital Country Org. NL Netherlands 37330 22 B 135 B Belgium 30510 73 NL 135 search SIGMOD’ 08 Belgium , Brussels Keyword Proximity Search in Complex Data Graphs Benny. University Kimelfeld The Hebrew

Example: Search in XML search SIGMOD’ 08 Yannakakis , Approximation Keyword Proximity Search in

Example: Search in XML search SIGMOD’ 08 Yannakakis , Approximation Keyword Proximity Search in Complex Data Graphs Benny. University Kimelfeld The Hebrew

Yannakakis wrote a paper about Approximation search SIGMOD’ 08 Yannakakis , Approximation Keyword Proximity

Yannakakis wrote a paper about Approximation search SIGMOD’ 08 Yannakakis , Approximation Keyword Proximity Search in Complex Data Graphs Benny. University Kimelfeld The Hebrew

Yannakakis is cited by a paper about Approximation search SIGMOD’ 08 Yannakakis , Approximation

Yannakakis is cited by a paper about Approximation search SIGMOD’ 08 Yannakakis , Approximation Keyword Proximity Search in Complex Data Graphs Benny. University Kimelfeld The Hebrew

Data Graphs v Structural and keyword nodes v Edges and nodes may have weights

Data Graphs v Structural and keyword nodes v Edges and nodes may have weights – Weak relationships are penalized by large weights Each keyword has one occurrence in the data graph (technical) SIGMOD’ 08 Keyword Proximity Search in Complex Data Graphs Benny. University Kimelfeld The Hebrew

Queries are sets of keywords from the data graph Q={ Summers , Cohen ,

Queries are sets of keywords from the data graph Q={ Summers , Cohen , coffee } SIGMOD’ 08 Keyword Proximity Search in Complex Data Graphs Benny. University Kimelfeld The Hebrew

An Answer is a Reduced Subtree An answer is a subtree of the data

An Answer is a Reduced Subtree An answer is a subtree of the data graph v Contains all keywords of the query v Has no redundant edges (and nodes) This paper 3 variants: directed, undirected, strong (undirected, kw’s are leaves); SIGMOD’ 08 Keyword Proximity Search in Complex Data Graphs Benny. University Kimelfeld The Hebrew

Previous Solutions • Lack of guarantees − Highly relevant answers might be missed, and

Previous Solutions • Lack of guarantees − Highly relevant answers might be missed, and / or − Inefficient algorithms • Rather simple data sets – a (very) small number of relevant answers − They considered data that are essentially collections of entities, namely, DBLP, IMDB, Lyrics, etc. − An answer is usually within the scope of an entity → e. g. , the keywords appear in a single movie • Crucial problems ignored − In particular, the “repeated information” problem − Especially pervasive in complex data graphs SIGMOD’ 08 Keyword Proximity Search in Complex Data Graphs Benny. University Kimelfeld The Hebrew

Contributions A system for keyword proximity search • An algorithm for generating answers with

Contributions A system for keyword proximity search • An algorithm for generating answers with guarantees − Does not miss (valuable) answers − Efficient (polynomial delay) − Answers generated in a 2 -approximate order by height • A ranking technique that is aware of the repeated-information problem − Gives preference to answers with low similarity to earlier ones • Experimentation over a highly-cyclic data graph − The Mondial database − Many “meaningful” connections among keywords SIGMOD’ 08 Keyword Proximity Search in Complex Data Graphs Benny. University Kimelfeld The Hebrew

The MONDIAL Database Institute for Informatics Georg-August-Universität Göttingen http: //www. dbis. informatik. uni-goettingen. de/Mondial/

The MONDIAL Database Institute for Informatics Georg-August-Universität Göttingen http: //www. dbis. informatik. uni-goettingen. de/Mondial/

Overview Keyword Proximity Search System Overview Algorithm for Answer Generation Ranking Answers Conclusions &

Overview Keyword Proximity Search System Overview Algorithm for Answer Generation Ranking Answers Conclusions & Future Work SIGMOD’ 08 Keyword Proximity Search in Complex Data Graphs Benny. University Kimelfeld The Hebrew

Challenges • Huge no. of answers; not instantiated! − Not simple to generate all

Challenges • Huge no. of answers; not instantiated! − Not simple to generate all relevant answers, even if ranking is ignored − For practical ranking functions, enumerating the answers in ranked order is probably impossible • For example, finding the smallest answer is the intractable Steiner-tree problem • Redundancy / repeated information − Many answers are very similar (altogether provide a low amount information) − Crucial in complex (highly cyclic) data graphs We employ a two-phase architecture: SIGMOD’ 08 Keyword Proximity Search in Complex Data Graphs Benny. University Kimelfeld The Hebrew

Architecture: Generator + Ranker Simplified ranking at first [Bhalotia et al. , ICDE’ 02,

Architecture: Generator + Ranker Simplified ranking at first [Bhalotia et al. , ICDE’ 02, VLDB’ 05] Answer Generator Ranker Generates next M·k answers Ranks all answers generated up to now (simplified ranking function) (- printed ones) • search(keywords) • next k answers SIGMOD’ 08 top-k answers (relative to those that have already been printed) Keyword Proximity Search in Complex Data Graphs Benny. University Kimelfeld The Hebrew

Overview Keyword Proximity Search System Overview Algorithm for Answer Generation Ranking Answers Conclusions &

Overview Keyword Proximity Search System Overview Algorithm for Answer Generation Ranking Answers Conclusions & Future Work SIGMOD’ 08 Keyword Proximity Search in Complex Data Graphs Benny. University Kimelfeld The Hebrew

Generating the Top Answers: Not Trivial! To demonstrate the difficulty of generating the “good”

Generating the Top Answers: Not Trivial! To demonstrate the difficulty of generating the “good” (top) answers, let’s see how existing approaches operate on a simple example: SIGMOD’ 08 Keyword Proximity Search in Complex Data Graphs Benny. University Kimelfeld The Hebrew

Find the Answers in this Example! SIGMOD’ 08 Keyword Proximity Search in Complex Data

Find the Answers in this Example! SIGMOD’ 08 Keyword Proximity Search in Complex Data Graphs Benny. University Kimelfeld The Hebrew

The BANKS Approach [Bhalotia et al. , ICDE’ 02, VLDB’ 05] Answers are directed

The BANKS Approach [Bhalotia et al. , ICDE’ 02, VLDB’ 05] Answers are directed subtrees ∀ nodes v (in a “good” order) and keyword occurrences: Generate the min-height subtree emanating from v SIGMOD’ 08 Keyword Proximity Search in Complex Data Graphs Benny. University Kimelfeld The Hebrew

The BANKS Approach [Bhalotia et al. , ICDE’ 02, VLDB’ 05] Answers are directed

The BANKS Approach [Bhalotia et al. , ICDE’ 02, VLDB’ 05] Answers are directed subtrees What about this answer? Never generated! ∀ nodes v (in a “good” order) and keyword occurrences: Generate the min-height subtree emanating from v SIGMOD’ 08 Keyword Proximity Search in Complex Data Graphs Benny. University Kimelfeld The Hebrew

The NUITS Approach [Ding et al. , ICDE’ 07] Answers are undirected subtrees ∀

The NUITS Approach [Ding et al. , ICDE’ 07] Answers are undirected subtrees ∀ nodes v (in a “good” order): Generate the min-weight subtree that includes v SIGMOD’ 08 Keyword Proximity Search in Complex Data Graphs Benny. University Kimelfeld The Hebrew

The NUITS Approach [Ding et al. , ICDE’ 07] Answers are undirected subtrees This

The NUITS Approach [Ding et al. , ICDE’ 07] Answers are undirected subtrees This node is redundant It is actually the previous answer! ∀ nodes v (in a “good” order): Generate the min-weight subtree that includes v SIGMOD’ 08 Keyword Proximity Search in Complex Data Graphs Benny. University Kimelfeld The Hebrew

The NUITS Approach [Ding et al. , ICDE’ 07] Answers are undirected subtrees This

The NUITS Approach [Ding et al. , ICDE’ 07] Answers are undirected subtrees This node is redundant Again, the previous answer! ∀ nodes v (in a “good” order): Generate the min-weight subtree that includes v SIGMOD’ 08 Keyword Proximity Search in Complex Data Graphs Benny. University Kimelfeld The Hebrew

The NUITS Approach [Ding et al. , ICDE’ 07] Answers are undirected subtrees What

The NUITS Approach [Ding et al. , ICDE’ 07] Answers are undirected subtrees What about this answer? Never generated! Severe limit on # of generated answers! (≤ one per node) ∀ nodes v (in a “good” order): Generate the min-weight subtree that includes v SIGMOD’ 08 Keyword Proximity Search in Complex Data Graphs Benny. University Kimelfeld The Hebrew

The DISCOVER / DBXplorer Approach [Hristidis et al. , VLDB’ 02, 03, ICDE’ 03]

The DISCOVER / DBXplorer Approach [Hristidis et al. , VLDB’ 02, 03, ICDE’ 03] [Agrawal et al. ICDE’ 02] Easy to implement! DBMS queries– No in-memory graph algorithms All answers are generated in ranked order! ∀ possible queries Q (from the schema) in inc. size: Evaluate Q over the database SIGMOD’ 08 Keyword Proximity Search in Complex Data Graphs Benny. University Kimelfeld The Hebrew

The DISCOVER / DBXplorer Approach [Hristidis et al. , VLDB’ 02, 03, ICDE’ 03]

The DISCOVER / DBXplorer Approach [Hristidis et al. , VLDB’ 02, 03, ICDE’ 03] [Agrawal et al. ICDE’ 02] Worst case: exponential in the data But many queries do not generate any answer at all! Inefficient! Limited Ranking! by the query (rather than the answer) weight ∀ possible queries Q (from the schema) in inc. size: Evaluate Q over the database SIGMOD’ 08 Keyword Proximity Search in Complex Data Graphs Benny. University Kimelfeld The Hebrew

We Need Generators w/ Guarantees! All answers are generated − In particular, each of

We Need Generators w/ Guarantees! All answers are generated − In particular, each of the “relevant” answers is produced at some point (100% recall is achievable) Controlled order of answers − For instance, increasing weight, increasing height, approximate (what is the ratio? ) / heuristic order Efficiency − The top-k answers should be generated efficiently − Bound on time between successive answers SIGMOD’ 08 Keyword Proximity Search in Complex Data Graphs Benny. University Kimelfeld The Hebrew

Order by Increasing Weight / Height If Then ≤ Top-k Answers SIGMOD’ 08 Keyword

Order by Increasing Weight / Height If Then ≤ Top-k Answers SIGMOD’ 08 Keyword Proximity Search in Complex Data Graphs Benny. University Kimelfeld The Hebrew

Approximate and Heuristic Orders Approximate order Heuristic order There is a provable bound on

Approximate and Heuristic Orders Approximate order Heuristic order There is a provable bound on the extent to which the actual order can deviate from the optimal one Intuitively, expected to be close to the optimal order, but there is no guarantee SIGMOD’ 08 Keyword Proximity Search in Complex Data Graphs Benny. University Kimelfeld The Hebrew

C-Approximate Order (inc. Weight / Height) If Then ≤C C-Approximation of the Top-k Answers

C-Approximate Order (inc. Weight / Height) If Then ≤C C-Approximation of the Top-k Answers [Fagin et al. , PODS’ 01] SIGMOD’ 08 Keyword Proximity Search in Complex Data Graphs Benny. University Kimelfeld The Hebrew

Our Approach • PODS’ 06: Enum. by (exact / approx) inc. weight − Problem:

Our Approach • PODS’ 06: Enum. by (exact / approx) inc. weight − Problem: Repeated application of Steiner-tree alg’s − “Heavy” – hard to implement efficiently • Here: Follow the basic approach of PODS’ 06 • But, we adopt the BANKS idea of using height (≠ weight) for the enumeration order − Recall: BANKS might miss highly relevant answers • Thus, we bypass Steiner trees and obtain a much faster algorithm • Our alg. has all 3 guarantees: answers are not missed, missed approximate order, order poly. delay SIGMOD’ 08 Keyword Proximity Search in Complex Data Graphs Benny. University Kimelfeld The Hebrew

An Overview of the Algorithm Task: Enum. by (2 -approx. ) increasing height Task:

An Overview of the Algorithm Task: Enum. by (2 -approx. ) increasing height Task: Lawler / Yen method Types of Constraints: • Inclusion: “include edge e” • Exclusion: “exclude edge e” Find (a 2 -approx. of) the shortest answer under constraints The intricate part … Task: Backward-search (Dijkstra) iterators (~ BANKS) SIGMOD’ 08 Find the shortest answer (w/o constraints) Keyword Proximity Search in Complex Data Graphs Benny. University Kimelfeld The Hebrew

Finding an Answer under Constraints • Inclusion: “include edge e” • Exclusion: “exclude edge

Finding an Answer under Constraints • Inclusion: “include edge e” • Exclusion: “exclude edge e” Handling exclusion constraints is easy Simply remove the excluded edges from the graph SIGMOD’ 08 Keyword Proximity Search in Complex Data Graphs Benny. University Kimelfeld The Hebrew

Inclusion Constraints are the Problem • Inclusion: “include edge e” • Exclusion: “exclude edge

Inclusion Constraints are the Problem • Inclusion: “include edge e” • Exclusion: “exclude edge e” redundant edge The shortest subtree that contains the kw’s and satisfies the const’s But it is not an answer! • Not reduced (has redundancy) • Moreover, includes a previously printed answer • Sometimes, no answer at all! SIGMOD’ 08 Keyword Proximity Search in Complex Data Graphs Benny. University Kimelfeld The Hebrew

The Correct Answer • Inclusion: “include edge e” • Exclusion: “exclude edge e” Technique:

The Correct Answer • Inclusion: “include edge e” • Exclusion: “exclude edge e” Technique: 1. Generate a min-height subtree (as in the wrong solution) 2. Not an answer? → modify • Intricate to guarantee 2 -approx. • Details in the proceedings SIGMOD’ 08 Keyword Proximity Search in Complex Data Graphs Benny. University Kimelfeld The Hebrew

Running Times Each entry is an avg. of 4 queries SIGMOD’ 08 Keyword Proximity

Running Times Each entry is an avg. of 4 queries SIGMOD’ 08 Keyword Proximity Search in Complex Data Graphs Benny. University Kimelfeld The Hebrew

Alg. Order vs. Weight Order How many answers are generated in order to obtain

Alg. Order vs. Weight Order How many answers are generated in order to obtain the top-k (among 1000) according to weight? Each entry is an avg. of 4 queries SIGMOD’ 08 Keyword Proximity Search in Complex Data Graphs Benny. University Kimelfeld The Hebrew

Effective Approx. Ratio: Height ↑ 2 keywords % 3 keywords k (answers) Effective approx.

Effective Approx. Ratio: Height ↑ 2 keywords % 3 keywords k (answers) Effective approx. ratio SIGMOD’ 08 worst / best (among first k) Keyword Proximity Search in Complex Data Graphs Benny. University Kimelfeld The Hebrew

Effective Approx. Ratio: Height ↑ 4 keywords % 5 keywords k (answers) Effective approx.

Effective Approx. Ratio: Height ↑ 4 keywords % 5 keywords k (answers) Effective approx. ratio SIGMOD’ 08 worst / best (among first k) Keyword Proximity Search in Complex Data Graphs Benny. University Kimelfeld The Hebrew

Effective Approx. Ratio: Weight ↑ 2 keywords % 3 keywords k (answers) Effective approx.

Effective Approx. Ratio: Weight ↑ 2 keywords % 3 keywords k (answers) Effective approx. ratio SIGMOD’ 08 worst / best (among first k) Keyword Proximity Search in Complex Data Graphs Benny. University Kimelfeld The Hebrew

Effective Approx. Ratio: Weight ↑ 4 keywords % 5 keywords k (answers) Effective approx.

Effective Approx. Ratio: Weight ↑ 4 keywords % 5 keywords k (answers) Effective approx. ratio SIGMOD’ 08 worst / best (among first k) Keyword Proximity Search in Complex Data Graphs Benny. University Kimelfeld The Hebrew

Overview Keyword Proximity Search System Overview Algorithm for Answer Generation Ranking Answers Conclusions &

Overview Keyword Proximity Search System Overview Algorithm for Answer Generation Ranking Answers Conclusions & Future Work SIGMOD’ 08 Keyword Proximity Search in Complex Data Graphs Benny. University Kimelfeld The Hebrew

The Basic Ranking Function 1 abs-rel(a) = weight(a) Σ weight(node) + Σ weight(edge) node∊a

The Basic Ranking Function 1 abs-rel(a) = weight(a) Σ weight(node) + Σ weight(edge) node∊a edge∊a weight(a) = SIGMOD’ 08 Keyword Proximity Search in Complex Data Graphs Benny. University Kimelfeld The Hebrew

Determining the Weight of an Edge org. enters many countries → weak connection (large

Determining the Weight of an Edge org. enters many countries → weak connection (large weight) Many org’s enter country → weak connection (large weight) Strong connection (small weight) SIGMOD’ 08 Strongest! Keyword Proximity Search in Complex Data Graphs Benny. University Kimelfeld The Hebrew

The Basic Ranking Function (cont’d) 1 abs-rel(a) = weight(a) Σ weight(node) + Σ weight(edge)

The Basic Ranking Function (cont’d) 1 abs-rel(a) = weight(a) Σ weight(node) + Σ weight(edge) l e v node∊a edge∊a weight(a) R =e ant answ ers but … weight(node) = fixed (1) weight(edge) = log(1 + α·out(v 1→t 2) + (1 − α)·in(t 1→v 2)) edge = (v 1, v 2) # t 2 nodes with edges from v 1 tag(vi) = ti # t 1 nodes with edges to v 2 SIGMOD’ 08 Keyword Proximity Search in Complex Data Graphs Benny. University Kimelfeld The Hebrew

Answers with High Similarity SIGMOD’ 08 Keyword Proximity Search in Complex Data Graphs Benny.

Answers with High Similarity SIGMOD’ 08 Keyword Proximity Search in Complex Data Graphs Benny. University Kimelfeld The Hebrew

Combinations of Connections But each individual answer is relevant! SIGMOD’ 08 Keyword Proximity Search

Combinations of Connections But each individual answer is relevant! SIGMOD’ 08 Keyword Proximity Search in Complex Data Graphs Benny. University Kimelfeld The Hebrew

Dynamic Ranking … Output Candidate Answers Next-Answer() a ← extract-top-candidate() print(a) for all candidates

Dynamic Ranking … Output Candidate Answers Next-Answer() a ← extract-top-candidate() print(a) for all candidates c and pairs of keywords k 1, k 2 if c and a connect k 1 and k 2 similarly, then penalize(c) What does it mean SIGMOD’ 08 ? Keyword Proximity Search in Complex Data Graphs Benny. University Kimelfeld The Hebrew

Two Types of “Similarity” a c 1 The same connection Iso mo r (sa

Two Types of “Similarity” a c 1 The same connection Iso mo r (sa phic me con sch nec em tion a) k 1, k 2 = Belgium, France 2 options: • Sum over printed answers • Max over printed answers SIGMOD’ 08 Penalty: 1 c 2 Penalty: p (≤ 1) Keyword Proximity Search in Complex Data Graphs Benny. University Kimelfeld The Hebrew

The General Ranking Function 1 abs-rel(c) = weight(c) ∑ rpt-inf(c) = k , k

The General Ranking Function 1 abs-rel(c) = weight(c) ∑ rpt-inf(c) = k , k ∊∑ kw’s 1 2 score(c) = SIGMOD’ 08 printed p answers or 1 or ∑max p or 1 k 1, k 2 printed answers ∊ kw’s 1 1 + ε · rpt-inf(c) abs-rel(c) Keyword Proximity Search in Complex Data Graphs Benny. University Kimelfeld The Hebrew

The bottom configuration is better than the top one Score Loss ofvs. Diversity Smaller

The bottom configuration is better than the top one Score Loss ofvs. Diversity Smaller reduction score for similar/higher degree of diversity Score (1/weight) Connections (u. t. iso. ) Sum, p=1. 0 % of max. Max, p=0. 1 • 5 keywords • Avg. of 4 queries • Top-20 answers SIGMOD’ 08 ε Keyword Proximity Search in Complex Data Graphs Benny. University Kimelfeld The Hebrew

Overview Keyword Proximity Search System Overview Algorithm for Answer Generation Ranking Answers Conclusions &

Overview Keyword Proximity Search System Overview Algorithm for Answer Generation Ranking Answers Conclusions & Future Work SIGMOD’ 08 Keyword Proximity Search in Complex Data Graphs Benny. University Kimelfeld The Hebrew

Conclusions • KPS in complex data graphs has inherent problems that are ignored in

Conclusions • KPS in complex data graphs has inherent problems that are ignored in existing systems • 2 -component arch. : answer generator & ranker • 1 st component: Enum. algorithm w/ guarantees − Efficient, correct (no missed answers), 2 -approximate order by height − In the paper: Ext. to OR semantics (exact order) • 2 nd component: Dynamically ranks candidates by penalizing them for repeated information − Our experiments over Mondial suggest a tuning of the parameters that gives the best tradeoff between information gain and score loss SIGMOD’ 08 Keyword Proximity Search in Complex Data Graphs Benny. University Kimelfeld The Hebrew

Current & Future Research • Improve / optimize the answer generator − Successful: Parallelism

Current & Future Research • Improve / optimize the answer generator − Successful: Parallelism − Concurrent queries? • Implement different answer generators − E. g. , by (approx. ) increasing weight [KS-PODS’ 06] • Assessment by humans − Relevancy / repeated information − Methodology example: [Zhang et al. , SIGIR’ 02] • Other aspects − Answer presentation → SIGMOD’ 08 Keyword Proximity Search in Complex Data Graphs Benny. University Kimelfeld The Hebrew

Answer Presentation • On the Web, we instantly get the meaning of an answer

Answer Presentation • On the Web, we instantly get the meaning of an answer (Web page) by the <title>, URL and, possibly, a snippet of the text • In KPS, understanding the meaning of a subtree is note straightforward—need to derive the semantics from the graphical presentation SIGMOD’ 08 Keyword Proximity Search in Complex Data Graphs Benny. University Kimelfeld The Hebrew

What’s the Meaning of this Answer? Harder in XML! IMDB • No division into

What’s the Meaning of this Answer? Harder in XML! IMDB • No division into relations (everything is element / attribute) • What information is needed to describe a node? A snapshot of BANKS demo (http: //www. cse. iitb. ac. in/banks/) SIGMOD’ 08 Keyword Proximity Search in Complex Data Graphs Benny. University Kimelfeld The Hebrew

Answer Presentation • On the Web, we instantly understand the meaning of an answer

Answer Presentation • On the Web, we instantly understand the meaning of an answer (Web page) by reading the <title> element, the URL and, possibly, a snapshot of the text • In KPS, understanding the meaning of a subtree is cumbersome since we need to derive the semantics from the presentation • Graphical presentation is based on restructuring answers in terms of of entities, properties and Solution: relationships (under • Apply heuristics for determining the minimal develop. ) set of properties required for each entity SIGMOD’ 08 Keyword Proximity Search in Complex Data Graphs Benny. University Kimelfeld The Hebrew

Thank you! Questions?

Thank you! Questions?