Ontological Issues for Knowledge-Enhanced Search • Deborah McGuinness • AT&T Labs Research, 180 Park Ave., Room A215, Florham Park, NJ 07932 • dlm@research.att.com • http://www.research.att.com/info/dlm
Outline • Motivation • Basic Building Blocks (objects, constructors) • Inference • Algorithmic Approaches • Applications • Conclusion
Motivation Queries miss relevant documents because: • queries are naive • documents do not contain “perfect” content
Solutions • Augment Documents - tag all pages with (controlled vocabulary) meta tags - labor is distributed and must be trained - approach is unscalable especially if content changes • Augment Index - centralized labor cost - must re-index every time meta tag language changes • Augment Query (manually) - requires user training • Augment Query (automatically) - no user training or content provider training - centralized labor cost, no rework needed
FindUR • Address issues of recall, rank ordering, and browsing • Utilize available knowledge • Work within a standard search platform • Deploy, test, and maintain on websites
Background Knowledge Supports: • Retrieval of previously missed relevant documents • Scoring more relevant documents higher than less relevant ones • Simple user generation and refinement of queries • User expectation setting
FindUR Architecture • Content to Search: Research Site; Technical Memorandum; Calendars (Summit 2005, Research); Yellow Pages (Directory Westfield); Newspapers (Leader); Internal Sites (Rapid Prototyping); AT&T Solutions; Worldnet Customer Care • Search Technology: Content (Web Pages or Databases); Content Classification; CLASSIC Knowledge Representation System; Search Engine; Domain Knowledge; Verity (and topic sets); Collaborative Topic Set Tool • User Interface: GUI supporting browsing and selection; Results (standard format); Results (domain specific) • Verity SearchScript, Javascript, HTML, CGI, CLASSIC
FindUR improves search by: • Retrieving previously missed relevant documents • More appropriately ordering search results • Facilitating simple user generation and refinement of queries • Setting user expectations about the content domain
Selected FindUR implementations: • Electronic Yellow Pages: www.quintillion.com/westfield • Event Calendars: www.quintillion.com/calendar/[summit|westfield] • Medical Information (P-CHIP, POS) • Computer Science Research Information • Competitive Intelligence Sites • Staff Augmentation and Vendor Procurement Info • Network Service Realization • Rapid Prototyping Info and Services • Technical Memorandum Access • Online Newspapers • Hometown Sites • Intellectual Capital
Common Site Conditions • Short document length • Few related content words per document • Unfamiliar vocabularies • Variability in specificity of documents • Inconsistent or irregular meta tagging • Higher (relevance) value for general documents over specific documents
Evidence types • Synonyms • Subclasses • Products • Companies • Associated Standards • Key People
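One way to picture how the evidence types could feed relevance ranking is to treat each type as a weighted contribution to a topic's score for a document. The sketch below is illustrative only: the weights, the `Topic` class, the `score` function, and the sample topic are assumptions made for this example, and it does not reproduce Verity topic set syntax or the FindUR implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical integer weights per evidence type (illustrative values only).
EVIDENCE_WEIGHTS = {
    "synonym": 10,
    "subclass": 8,
    "product": 6,
    "company": 5,
    "standard": 5,
    "key_person": 4,
}


@dataclass
class Topic:
    """A topic plus the phrases that count as evidence for it, by type."""
    name: str
    evidence: Dict[str, List[str]] = field(default_factory=dict)


def score(topic: Topic, text: str) -> int:
    """Add the weight of each evidence type that has at least one
    phrase occurring in the text (case-insensitive substring match)."""
    lowered = text.lower()
    total = 0
    for kind, phrases in topic.evidence.items():
        if any(phrase.lower() in lowered for phrase in phrases):
            total += EVIDENCE_WEIGHTS[kind]
    return total


# Hypothetical topic definition used only to exercise the sketch.
KR = Topic(
    name="knowledge representation",
    evidence={
        "synonym": ["knowledge representation"],
        "subclass": ["description logic"],
        "product": ["CLASSIC"],
    },
)
```

A document that mentions only a product and a subclass still receives a non-zero score for the topic, which is the effect the evidence types aim for on short documents with few related content words.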
FindUR Architecture • Content to Search: Content (Web Pages, Documents, Databases); Content Classification • Search Technology: Search Engine; CLASSIC; Domain Knowledge; Verity Topic Sets; Collaborative Topic Building Tool • User Interface: GUI supporting browsing and selection; Query Input; Results (std. format); Results (domain spec.) • Verity SearchScript, Javascript, HTML, CGI • 6/8/98, Ontological Issues - FOIS '98
FindUR/Smart Search Benefits • Retrieves documents otherwise missed • Organizes documents more appropriately according to relevance (useful for large numbers of retrievals) • Supports browsing (navigation, highlighting) • Simple user query building and refinement • Full query logging and trace • Facilitates use of advanced search functions without requiring knowledge of a search language • Automatically searches the right knowledge sources according to information about the context of the query
FindUR Future Work • Topic Set Generation - Distributed collaborative topic set building environment - Use tagged content to generate candidate topic sets - Information retrieval (use clustering to analyze documents and suggest topic definitions) - Machine learning (use query logs as training data) - Reuse topic sets for different purposes using views of knowledge • Knowledge Representation Integration - Use knowledge base to check definitions and determine overlaps - Expand beyond subclass, instance, and synonym relationships and incorporate more structured information - Maintain information about how and when to use topic information - Maintain descriptions of content sources • Evaluation and Interface Evolution - Evaluate effectiveness of retrievals, relevance ranking, ease of query refinement, and ease of content input into category scheme - Java-based interface for scalability, rapid change, and understandability
AT&T Labs Research Site • FindUR has a taxonomy of background information which includes “knowledge representation” as a sub-category of “artificial intelligence.” • The category/sub-category relationships are displayed in the user interface. Users can construct queries by simply clicking categories and sub-categories, invoking background knowledge in the process.
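The click-to-query behavior described on this slide can be sketched as a walk over the category tree: selecting a category pulls in its sub-categories and turns them into a single query. In the sketch below only the "artificial intelligence" / "knowledge representation" relationship comes from the slide; the other taxonomy entries, the function names, and the quoted-OR query format are assumptions for illustration and are not the Verity query language.

```python
# Category -> direct sub-categories. Only the first relationship is taken
# from the slide; the remaining entries are hypothetical.
TAXONOMY = {
    "artificial intelligence": ["knowledge representation", "machine learning"],
    "knowledge representation": ["description logics"],
}


def expand(category, taxonomy=TAXONOMY):
    """Return the category followed by all of its descendants,
    depth-first, visiting each term at most once."""
    seen = []
    stack = [category]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.append(current)
        # Push children in reverse so they are visited in listed order.
        stack.extend(reversed(taxonomy.get(current, [])))
    return seen


def build_query(category):
    """Join the expanded terms into an illustrative OR query string."""
    return " OR ".join('"{}"'.format(term) for term in expand(category))
```

Clicking "knowledge representation" in this sketch would produce a query that also matches documents mentioning only its sub-category, which is how background knowledge can recover documents a literal query would miss.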
AT&T Labs Research Site • With background knowledge the search returns 696 relevant listings. • Documents of a more general nature such as bibliographies and departmental overviews float higher in the list. Without background knowledge, a reference manual was the first retrieval.
General Nature of Descriptions • a WINE • General categories: a LIQUID, a POTABLE • Structured components: grape: chardonnay, ... [>= 1]; sugar-content: dry, sweet, off-dry; color: red, white, rose; price: a PRICE; winery: a WINERY • Interconnections between parts: grape dictates color (modulo skin); harvest time and sugar are related
General Nature of Descriptions (annotated) • Concept: a WINE • Superconcepts (general categories): a LIQUID, a POTABLE • Roles with value and number restrictions (structured components): grape: chardonnay, ... [>= 1]; sugar-content: dry, sweet, off-dry; color: red, white, rose; price: a PRICE; winery: a WINERY • Interconnections between parts: grape dictates color (modulo skin); harvest time and sugar are related
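The parts of the WINE description (concept, superconcepts, roles, number restrictions, value restrictions) can be mirrored in a small data structure. This is a rough Python sketch, not CLASSIC syntax: the `Role` and `Concept` classes and the `violations` check are assumptions for illustration. The grape values are elided on the slide, so the sketch leaves grape unrestricted apart from its `[>= 1]` number restriction, and it omits the typed restrictions on price and winery as well as the interconnections between parts.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Role:
    """A role with an optional number and value restriction."""
    name: str
    at_least: int = 0                    # number restriction, e.g. [>= 1]
    allowed: Optional[Set[str]] = None   # value restriction; None = unrestricted here


@dataclass
class Concept:
    name: str
    superconcepts: List[str] = field(default_factory=list)
    roles: Dict[str, Role] = field(default_factory=dict)

    def violations(self, individual: Dict[str, List[str]]) -> List[str]:
        """List the restrictions that a candidate individual breaks."""
        problems = []
        for role in self.roles.values():
            fillers = individual.get(role.name, [])
            if len(fillers) < role.at_least:
                problems.append(
                    "{}: needs at least {} filler(s)".format(role.name, role.at_least)
                )
            if role.allowed is not None:
                for filler in fillers:
                    if filler not in role.allowed:
                        problems.append(
                            "{}: {!r} is not an allowed value".format(role.name, filler)
                        )
        return problems


WINE = Concept(
    name="WINE",
    superconcepts=["LIQUID", "POTABLE"],
    roles={
        "grape": Role("grape", at_least=1),
        "sugar-content": Role("sugar-content", allowed={"dry", "sweet", "off-dry"}),
        "color": Role("color", allowed={"red", "white", "rose"}),
    },
)
```

A description logic system such as CLASSIC does considerably more than this check (for example, classifying concepts by subsumption); the sketch only shows how the labeled pieces of the description fit together.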
URLs • FindUR Home Page: http://www.research.att.com/~dlm/findur • Description Logic Home Page: http://dl.kr.org/dl • Implemented Description Logic-based systems: http://www.ida.liu.se/labs/iislab/people/patla/DL/systems.html • The CLASSIC Knowledge Representation System: http://www.research.att.com/sw/tools/classic • Deborah McGuinness: http://www.research.att.com/info/dlm
Contributors • Tom Beattie, Beth Cataldo, Ihung (Kyle) Chang, Curtis Chen, Lisa Croel, Martha Desmond, Paul Fuoss, Karrie Hanson, Pam Kirkbride, Dave Kormann, Harley Manning, Russ Maulitz, Mark Plotnick, Lori Alperin Resnick, Beth Robinson, Steve Solomon