CPE/CSC 580: Knowledge Management
Dr. Franz J. Kurfess
Computer Science Department, Cal Poly
© 2001-2005 Franz J. Kurfess
Knowledge Retrieval

Course Overview
- Introduction
- Knowledge Processing
  - Knowledge Acquisition, Representation and Manipulation
- Knowledge Organization
  - Classification, Categorization
  - Ontologies, Taxonomies, Thesauri
- Knowledge Retrieval
  - Information Retrieval
  - Knowledge Navigation
- Knowledge Presentation
  - Knowledge Visualization
- Knowledge Exchange
  - Knowledge Capture, Transfer, and Distribution
- Usage of Knowledge
  - Access Patterns, User Feedback
- Knowledge Management Techniques
  - Topic Maps, Agents
- Knowledge Management Tools
- Knowledge Management in Organizations

Overview Knowledge Retrieval
- Motivation
- Objectives
- Finding Out About
  - Keywords and Queries
  - Documents
  - Indexing
- Data Retrieval
  - Access via Address, Field, Name
- Information Retrieval
  - Access via Content (Values)
  - Parsing
  - Matching Against Indices
  - Retrieval Assessment
- Knowledge Retrieval
  - Access via Structure, Meaning, Context, Usage
- Knowledge Discovery
  - Data Mining
  - Rule Extraction
- Important Concepts and Terms
- Chapter Summary

Logistics
- Term Project
- APIs
- Lab and Homework Assignments
  - Deadline HW 1: May 1
- Exams
  - Midterm: Thursday, May 3

Finding Out About [Belew 2000]

Pre-Test

Motivation

Objectives

Finding Out About [Belew 2000]
- Keywords
- Queries
- Documents
- Indexing

Keywords [Belew 2000]
- linguistic atoms used to characterize the subject or content of a document
  - words
  - pieces of words (stems)
  - phrases
- provide the basis for a match between
  - the user’s characterization of the information need
  - the contents of the document
- problems
  - ambiguity
  - choice of keywords

Queries [Belew 2000]
- formulated in a query language
  - natural language
    - interaction with human information providers
  - artificial language
    - interaction with computers, especially search engines
- vocabulary
  - controlled
    - only a limited set of keywords may be used
  - uncontrolled
    - any keywords may be used
- syntax
  - often Boolean operators (AND, OR)
  - sometimes regular expressions

Documents
- general interpretation
  - any document that can be represented digitally
    - text, image, music, video, program, etc.
- practical interpretation
  - passage of text
    - strings of characters in an alphabet
    - written natural language
  - length may vary
    - longer documents may be composed of shorter ones

Aboutness of Documents [Belew 2000]
- describes the suitability of a document as an answer to a query
- assumptions
  - all documents have equal aboutness
    - the probability of being considered relevant is the same for every document in the corpus
    - simplistic; not valid in reality
  - a paragraph is the smallest unit of text with appreciable aboutness

Structural Aspects of Documents
- documents may be composed of documents
  - paragraphs, subsections, chapters, parts
  - footnotes, references
- documents may contain meta-data
  - information about the document
  - not part of the content of the document itself
  - may be used for organization and retrieval purposes
  - can be abused by creators
    - usually to increase the perceived relevance

Document Proxies
- surrogates for the real document
  - abridged representations
    - catalog, abstract
  - pointers
    - bibliographical citation, URL
- different media
  - microfiches
  - digital representations

Indexing [Belew 2000]
- a vocabulary of keywords is assigned to all documents of a corpus
- an index maps each document $doc_i$ to the set of keywords $\{kw_j\}$ it is about
  - $\mathrm{Index}: doc_i \xrightarrow{\text{about}} \{kw_j\}$
  - $\mathrm{Index}^{-1}: \{kw_j\} \xrightarrow{\text{describes}} doc_i$
- indexing of a document / corpus
  - manual: humans select appropriate keywords
  - automatic: a computer program selects the keywords
- building the index relation between documents and sets of keywords is critical for information retrieval

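The index relation and its inverse can be sketched in a few lines of Python. This is a minimal illustration, not the course's implementation; the corpus contents and the names `index` and `invert` are assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical toy corpus: document id -> keywords the document is about.
index = {
    "doc1": {"retrieval", "index"},
    "doc2": {"retrieval", "ontology"},
    "doc3": {"ontology"},
}

def invert(index):
    """Build the inverse relation: keyword -> set of documents it describes."""
    inverse = defaultdict(set)
    for doc, keywords in index.items():
        for kw in keywords:
            inverse[kw].add(doc)
    return dict(inverse)

inverse_index = invert(index)
print(sorted(inverse_index["retrieval"]))
```

The inverse direction is the one a retrieval system typically consults at query time, since a query supplies keywords and asks for documents.
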
FOA Conversation Loop [Belew 2000]

Data Retrieval
- access to specific data items
  - access via address, field, name
  - typically used in databases
- user asks for items with specific features
  - absence or presence of features
  - values
- system returns data items
  - no irrelevant items
- deterministic retrieval method

Information Retrieval (IR)
- access to documents
  - also referred to as document retrieval
  - access via keywords
- IR aspects
  - parsing
  - matching against indices
  - retrieval assessment

Diagram Search Engine [Belew 2000]

Parsing [Belew 2000]
- extraction of lexical features from documents
  - mostly words
- may require some manipulation of the extracted features
  - e.g. stemming of words
- used as the basis for automatic compilation of indices

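A rough sketch of the parsing step, assuming English text and a deliberately crude suffix-stripping rule. The suffix list and the function names are illustrative assumptions; established stemmers such as Porter's apply far more careful rules.

```python
import re

# Illustrative suffix list (an assumption for this sketch, not a real stemming algorithm).
SUFFIXES = ("ing", "ed", "es", "s")

def tokenize(text):
    """Extract lowercase word tokens from a passage of text."""
    return re.findall(r"[a-z]+", text.lower())

def crude_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def parse(text):
    """Tokenize a text and reduce each token to its crude stem."""
    return [crude_stem(token) for token in tokenize(text)]

features = parse("Indexing documents: the parser indexed two documents.")
print(features)
```

Both "Indexing" and "indexed" reduce to the same stem here, which is the property an automatically compiled index relies on.
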
Parsing Tools
- MontyTagger (Python and Java)
  - http://web.media.mit.edu/~hugo/montytagger/
- fnTBL (C++)
  - http://nlp.cs.jhu.edu/~rflorian/fntbl/
  - fast
- Brill Tagger (C)
  - http://www.cs.jhu.edu/~brill/
  - the original; influenced several later ones
- also out there is the Natural Language Toolkit: http://nltk.sourceforge.net/
  - good starting point for basics of NLP algorithms
[recommended by Dan Miller, who is working on his Master’s thesis on automated generation of abstracts]

Matching Against Indices [Belew 2000]
- identification of documents that are relevant for a particular query
- keywords of the query are compared against the keywords that appear in the document
  - either in the data or meta-data of the document
- in addition to queries, other features of documents may be used
  - descriptive features provided by the author or cataloger
    - usually meta-data
  - derived features computed from the contents of the document

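Keyword matching against an index can be illustrated with Boolean AND and OR over an inverted index. The index contents and the names `match_all` and `match_any` are assumptions for this sketch; it ignores ranking and meta-data features.

```python
# Hypothetical inverted index: keyword -> documents whose data or meta-data contain it.
inverted = {
    "knowledge": {"doc1", "doc2", "doc4"},
    "retrieval": {"doc2", "doc3", "doc4"},
    "ontology": {"doc4", "doc5"},
}

def match_all(keywords, inverted):
    """Boolean AND: documents containing every query keyword."""
    postings = [inverted.get(kw, set()) for kw in keywords]
    return set.intersection(*postings) if postings else set()

def match_any(keywords, inverted):
    """Boolean OR: documents containing at least one query keyword."""
    return set().union(*(inverted.get(kw, set()) for kw in keywords))

and_hits = match_all(["knowledge", "retrieval"], inverted)
or_hits = match_any(["ontology", "retrieval"], inverted)
print(sorted(and_hits), sorted(or_hits))
```
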
Retrieved and Relevant Documents [Belew 2000]
- recall: $|\mathit{retrieved} \cap \mathit{relevant}| \, / \, |\mathit{relevant}|$
- precision: $|\mathit{retrieved} \cap \mathit{relevant}| \, / \, |\mathit{retrieved}|$

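The two ratios translate directly into set operations. The document sets below are made up for illustration, and returning 0.0 for an empty denominator is a convention chosen for this sketch.

```python
def recall(retrieved, relevant):
    """|retrieved ∩ relevant| / |relevant|; 0.0 by convention if nothing is relevant."""
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0

def precision(retrieved, relevant):
    """|retrieved ∩ relevant| / |retrieved|; 0.0 by convention if nothing was retrieved."""
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0

# Hypothetical outcome of one query.
retrieved = {"doc1", "doc2", "doc3", "doc4"}
relevant = {"doc2", "doc4", "doc5"}

r = recall(retrieved, relevant)
p = precision(retrieved, relevant)
print(r, p)
```

Here two of the three relevant documents were found (recall 2/3), and two of the four retrieved documents are relevant (precision 1/2).
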
Specificity vs. Exhaustivity [Belew 2000]

Vector Space [Belew 2000]
- interpretation of the index matrix
  - relates documents and keywords
- can grow extremely large
  - binary matrix of 100,000 words * 1,000 documents
  - sparsely populated: most entries will be 0
- can be used to determine similarity of documents
  - overlap in keywords
  - proximity in the (virtual) vector space
- associative memories can be used as hardware implementation
  - extremely fast, but expensive to build

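As a small sketch of similarity in the vector space, each document can be encoded as a binary vector over the vocabulary and compared with cosine similarity. Cosine is one common proximity measure chosen here as an assumption; the slide does not name a specific measure, and the vocabulary is made up.

```python
import math

# Hypothetical four-term vocabulary; a real index matrix would be far larger and sparse.
vocabulary = ["index", "keyword", "ontology", "query"]

def to_vector(keywords):
    """One row of the binary index matrix: 1 if the term occurs, else 0."""
    return [1 if term in keywords else 0 for term in vocabulary]

def cosine(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

doc1 = to_vector({"index", "keyword"})
doc2 = to_vector({"index", "keyword", "query"})
doc3 = to_vector({"ontology"})

print(cosine(doc1, doc2), cosine(doc1, doc3))
```

Documents sharing keywords score close to 1, while documents with no keywords in common score 0.
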
Vector Space Diagram [Belew 2000]

Document Retrieval [Belew 2000]

Retrieval Assessment [Belew 2000]
- subjective assessment
  - how well do the retrieved documents satisfy the request of the user
- objective assessment
  - idealized omniscient expert determines the quality of the response

Retrieval Assessment Diagram [Belew 2000]

Relevance Feedback [Belew 2000]
- subjective assessment of retrieval results
- often used to iteratively improve retrieval results
- may be collected by the retrieval system for statistical evaluation
- can be viewed as a variant of object recognition
  - the object to be recognized is the prototypical document the user is looking for
    - this document may or may not exist
  - the difference between the retrieved document(s) and the idealized prototype indicates the quality of the retrieval results

Relevance Feedback in Vector Space [Belew 2000]
- relevance feedback is used to move the query towards the cluster of positive documents
  - moving away from bad documents does not necessarily improve the results
- it can also be used as a filter for a constant stream of documents
  - as in news channels or similar situations

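Moving the query towards the positive documents can be sketched with a Rocchio-style update that uses only the positive term. The slide does not name this formula; the weights `alpha` and `beta` and the example vectors are assumptions for illustration.

```python
def move_query(query, positives, alpha=1.0, beta=0.5):
    """Shift the query vector towards the centroid of positively rated documents."""
    if not positives:
        return list(query)
    dims = len(query)
    centroid = [sum(doc[i] for doc in positives) / len(positives) for i in range(dims)]
    return [alpha * q + beta * c for q, c in zip(query, centroid)]

# Hypothetical three-term vector space.
query = [1.0, 0.0, 0.0]
positives = [[1.0, 1.0, 0.0], [0.0, 1.0, 0.0]]

new_query = move_query(query, positives)
print(new_query)
```

The second term, absent from the original query but present in both positive documents, gains weight in the updated query. No negative term is subtracted, in line with the observation that moving away from bad documents does not necessarily help.
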
Query Session Example [Belew 2000]

Consensual Relevance [Belew 2000]
- relevance feedback from multiple users
  - identifies documents that many users found useful or interesting
- used by some Web sites
  - related to collaborative filtering
- can also be used as an evaluation method for search engines
  - performance criteria must be carefully considered
    - precision and recall, plus many others

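One simple way to aggregate feedback from multiple users is to count how many users marked each document as relevant. This vote count is an assumption made for the sketch, as are the user names and documents; real systems may weight users or sessions differently.

```python
from collections import Counter

# Hypothetical feedback: user -> documents that user marked as relevant.
feedback = {
    "alice": {"doc1", "doc2"},
    "bob": {"doc2", "doc3"},
    "carol": {"doc1", "doc2"},
}

def consensual_ranking(feedback):
    """Rank documents by the number of users who found them relevant."""
    votes = Counter(doc for docs in feedback.values() for doc in docs)
    # Descending vote count, then document id so ties come out in a stable order.
    return sorted(votes.items(), key=lambda item: (-item[1], item[0]))

ranking = consensual_ranking(feedback)
print(ranking)
```
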
[Figure: IR Diagram. Elements: Query (Keywords: Term 1 to Term 4), Index, Corpus (Documents: Doc. 1 to Doc. 5)]

[Figure: KR Diagram. Elements: Query (Keywords: Term 1 to Term 4), Index, Ontology (Term A to Term M), Corpus (Documents: Doc. 1 to Doc. 5)]

Knowledge Retrieval
- Context
- Usage

Context in Knowledge Retrieval
- in addition to keywords, relationships between keywords and documents are exploited
  - explicit links
    - hypertext
  - related concepts
    - thesaurus, ontology
  - proximity
    - spatial: place, directory
    - temporal: creation date/time
  - intermediate relations
    - author/creator
    - organization
    - project

Inference beyond the Index
- determines relationships between documents
- citations are explicit references to relevant documents
  - bibliographic references
  - legal citations
  - hypertext
- examples
  - NEC CiteSeer <http://citeseer.nj.nec.com>
  - Google Scholar <http://scholar.google.com>

Additional Information Sources [Belew 2000, after Kochen 1975]

Hypertext
- inter-document links provide explicit relationships between documents
  - can be used to determine the relevance of a document for a query
  - example: Google <http://www.google.com>
- intra-document links may offer additional context information for some terms
  - footnotes, glossaries, related terms

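As a crude illustration of using inter-document links as a relevance signal, candidate documents can be ordered by the number of other documents that link to them. This plain in-link count is an assumption chosen for simplicity and is not a description of how any particular search engine ranks results; the link graph is made up.

```python
# Hypothetical link graph: document -> documents it links to.
links = {
    "doc1": ["doc3"],
    "doc2": ["doc3", "doc4"],
    "doc3": ["doc4"],
    "doc4": [],
}

def inbound_counts(links):
    """Count, for each document, how many links from other documents point at it."""
    counts = {doc: 0 for doc in links}
    for source, targets in links.items():
        for target in targets:
            if target != source:  # ignore self-links
                counts[target] = counts.get(target, 0) + 1
    return counts

def rank_by_links(candidates, links):
    """Order candidates by inbound links (descending), then by document id."""
    counts = inbound_counts(links)
    return sorted(candidates, key=lambda doc: (-counts.get(doc, 0), doc))

ranked = rank_by_links(["doc1", "doc3", "doc4"], links)
print(ranked)
```
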
Adaptive Retrieval Techniques
- fine-tuning the matching between queries and retrieved documents
- learning of relationships between terms
  - training with term pairs (thesaurus)
  - pattern detection in past queries
  - automatic grouping of documents according to common features
- clustering of similar documents
  - pre-defined categories
  - metadata
  - overlap in keywords
  - consensual relevance
  - source

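Clustering by overlap in keywords can be sketched with a greedy single-pass procedure and Jaccard overlap. Both the procedure and the threshold of 0.5 are assumptions made for this illustration; the slide does not prescribe a clustering algorithm.

```python
def jaccard(a, b):
    """Keyword overlap: size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster(docs, threshold=0.5):
    """Greedy single pass: join the first cluster whose seed overlaps enough."""
    clusters = []  # list of (seed keywords, member document ids)
    for doc_id, keywords in docs.items():
        for seed, members in clusters:
            if jaccard(seed, keywords) >= threshold:
                members.append(doc_id)
                break
        else:
            # No existing cluster was similar enough: start a new one.
            clusters.append((keywords, [doc_id]))
    return [members for _, members in clusters]

# Hypothetical documents with their assigned keywords.
docs = {
    "doc1": {"index", "keyword", "query"},
    "doc2": {"index", "keyword"},
    "doc3": {"ontology", "thesaurus"},
    "doc4": {"ontology", "thesaurus", "taxonomy"},
}

groups = cluster(docs)
print(groups)
```

The result depends on the order in which documents are visited, which is one reason more robust clustering methods are usually preferred in practice.
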
Document Classification

Query Model [Pratt, Hearst, Fagan 2000]
- query types (templates)
  - frequently used types of queries
    - e.g. problem/solution, symptoms/diagnosis, problem/further checks, ...
- category types
  - abstractions of query types
  - used to determine categories or topics for the grouping of search results
- context information
  - current working document/directory
  - previous queries

Terminology Model [Pratt, Hearst, Fagan 2000]
- individual terms are connected to related terms
  - thesaurus/ontology
    - synonyms, super-/sub-classes, related terms
- identifies labels for the category types

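A thesaurus that connects terms to related terms can be represented as a simple mapping. The sketch below uses it to widen a set of query keywords, which is one possible use assumed here for illustration rather than something the slide states; the thesaurus entries are made up.

```python
# Hypothetical thesaurus: term -> related terms (synonyms, super-/sub-classes).
thesaurus = {
    "car": {"automobile", "vehicle"},
    "retrieval": {"search"},
}

def related_terms(term, thesaurus):
    """Terms directly connected to the given term; empty if the term is unknown."""
    return set(thesaurus.get(term, set()))

def expand_query(keywords, thesaurus):
    """Add the directly related terms of every keyword to the query."""
    expanded = set(keywords)
    for kw in keywords:
        expanded |= related_terms(kw, thesaurus)
    return expanded

expanded = expand_query({"car", "repair"}, thesaurus)
print(sorted(expanded))
```
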
Matching [Pratt, Hearst, Fagan 2000]
- categorizer
  - determines the categories to be selected for the grouping of results
  - assigns retrieved documents to the categories
- organizer
  - arranges categories into a hierarchy
    - should be balanced and easy to browse by the user
  - depends on the distribution of the search results

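The division of labor between categorizer and organizer can be sketched as two small functions. This is a toy stand-in under stated assumptions, not the algorithm from the cited work: categories are matched by shared keywords, and "organizing" is reduced to ordering categories by size instead of building a hierarchy. All labels and documents are made up.

```python
from collections import defaultdict

# Hypothetical category definitions: category label -> keywords that signal it.
categories = {
    "diagnosis": {"symptom", "test"},
    "treatment": {"drug", "surgery"},
}

def categorize(results, categories):
    """Assign each retrieved document to every category whose keywords it shares."""
    groups = defaultdict(list)
    for doc_id, keywords in results.items():
        matched = [label for label, cues in categories.items() if cues & keywords]
        for label in matched or ["other"]:
            groups[label].append(doc_id)
    return dict(groups)

def organize(groups):
    """Order categories by size (largest first), then by label."""
    return sorted(groups.items(), key=lambda item: (-len(item[1]), item[0]))

# Hypothetical search results: document id -> keywords.
results = {
    "doc1": {"symptom", "fever"},
    "doc2": {"drug", "dosage"},
    "doc3": {"test", "drug"},
    "doc4": {"history"},
}

organized = organize(categorize(results, categories))
print(organized)
```
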
Results [Pratt, Hearst, Fagan 2000]
- retrieved documents are grouped into hierarchically arranged categories meaningful for the user
  - the categories are related to the query
  - the categories are related to each other
  - all categories have similar size
    - not always achievable due to the distribution of documents
- reduced search times
- higher user satisfaction

DynaCat [DynaCat, 2000]
- knowledge-based approach to the organization of search results
  - categorizes results into meaningful groups that correspond to the user’s query
  - uses knowledge of query types and of the domain terminology to generate hierarchical categories
- applied to the domain of medicine
  - MEDLINE is an on-line repository of medical abstracts
    - 9.2 million bibliographic entries from 3800 journals
  - PubMed is a web-based search tool
    - returns titles as a relevance-ranked list
    - links to “related articles”

DynaCat Results [DynaCat, 2000]

DynaCat Query Types [DynaCat, 2000]

DynaCat Search [DynaCat, 2000]

Information vs. Knowledge Retrieval
- IR
  - keywords as main components of the query
  - index as match-making facility
  - statistical basis for selection of relevant documents
  - (ordered) list of results
- KR
  - keywords plus context information for the query
  - index plus ontology for matching query and documents
  - relationships between keywords and documents influence the selection of relevant documents
  - results are grouped into meaningful categories

KR Diagram

Knowledge Discovery
- Data Mining
- Rule Extraction

Post-Test

Important Concepts and Terms
- agent
- information retrieval
- knowledge representation
- knowledge retrieval
- natural language processing

Summary Knowledge Retrieval