- Slides: 9
Information Organization: Overview

IO: What
- What is Information Organization?
  - Systematic arrangement of items
    - group similar items together
    - assign meaning to groups
    - determine relationships between groups
    - assign items to groups
[Figure: the same items arranged three ways (Grouping 1, Grouping 2, Grouping 3), using attributes such as size (big/small), color (blue/red), and shape (square/circle)]
[Footer: Search Engine]
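The idea that one set of items can be grouped in several ways can be sketched in a few lines of Python. This is only an illustration: the item list and the `group_by` helper are hypothetical, and the attribute values are borrowed from the labels in the slide's figure.

```python
from collections import defaultdict

# Hypothetical items; attribute values mirror the labels in the figure.
items = [
    {"size": "big", "color": "blue", "shape": "square"},
    {"size": "small", "color": "blue", "shape": "circle"},
    {"size": "big", "color": "red", "shape": "circle"},
    {"size": "small", "color": "red", "shape": "square"},
]

def group_by(items, attribute):
    """Assign each item to the group named by its value for `attribute`."""
    groups = defaultdict(list)
    for item in items:
        groups[item[attribute]].append(item)
    return dict(groups)

# Three different groupings of the same four items.
by_size = group_by(items, "size")
by_color = group_by(items, "color")
by_shape = group_by(items, "shape")
```

Each call gives the groups a meaning (the attribute) and assigns items to them; which grouping is useful depends on how the items will be searched for.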

IO: Why
- Why organize information?
  - Why do we put certain things in certain places?
- To find things more easily → Information Retrieval (IR)
  - Closet taxonomy
    - Seasonal groups
    - Pants vs. shirts
    - Color groups
    - Favorite vs. non-favorite
- To make sense of the world → Knowledge Discovery (KD)
  - Food
    - Good: sweet taste, smells like milk
    - Bad: too hot, hard to chew

IO: How
- How do we organize information?
  - General approach
    - anticipate how an item is searched for → e.g. by subject, date, author
    - look for common features among items
    - determine what an item is about
  - Classification
    - identification/creation of classes
    - assignment of items into classes
  - Clustering
    - group similar items together
- What to do when the information to organize is massive?
  - 1,000 journal papers
  - 10,000 books
  - 1,000,000 web pages

Machine Learning: Introduction
- What is Machine Learning?
  - A computer program learns if it improves its performance at some task through experience (T. Mitchell, 1997).
  - Any change in a system that allows it to perform better the second time on repetition of the same task or on a task drawn from the same population (H. Simon, 1983).
- How can systems improve?
  - By acquiring new knowledge
    - acquiring new facts
    - acquiring new skills
  - By adapting their behavior
    - solving problems more accurately
    - solving problems more efficiently

Machine Learning: Introduction
- Which is different?
- Which are similar?
- How is learning possible?
  - Because there are regularities in the world.

ML: Classification vs. Clustering
- Classification
  - Task is to learn to assign instances to predefined classes
  - Supervised learning
    - data has to specify what we are trying to learn (the classes)
    - requires training data → predefined classes and classified items
- Clustering
  - Task is to group items into clusters
  - Unsupervised learning
    - data doesn't specify what we are trying to learn (the clusters)
    - no predefined classification is required
  - Clustering algorithms divide a data set into natural groups (clusters)
    - items in the same cluster are similar to each other and share certain properties
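The contrast between the two tasks can be made concrete with a small sketch. The slide does not name any algorithm, so the choices here are assumptions made for illustration: a nearest-centroid classifier stands in for supervised learning and a bare-bones k-means stands in for unsupervised learning. The 2-D points, the labels "low" and "high", and all function names are hypothetical.

```python
import math

def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def nearest(point, centers):
    """Index of the center closest to `point` (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))

# Supervised: the training data carries the class labels.
def train_classifier(labeled):
    """labeled: list of (point, label). Returns {label: class centroid}."""
    by_class = {}
    for point, label in labeled:
        by_class.setdefault(label, []).append(point)
    return {label: centroid(pts) for label, pts in by_class.items()}

def classify(model, point):
    """Assign `point` to the predefined class with the nearest centroid."""
    labels = list(model)
    return labels[nearest(point, [model[l] for l in labels])]

# Unsupervised: no labels, only the points and a requested number of clusters.
def kmeans(points, k, iterations=10):
    centers = list(points[:k])  # naive initialization: the first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [centroid(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return clusters

labeled = [((1, 1), "low"), ((2, 1), "low"), ((8, 8), "high"), ((9, 8), "high")]
model = train_classifier(labeled)
predicted = classify(model, (1, 2))

points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
clusters = kmeans(points, 2)
```

The classifier can only answer with a label it saw in training, while `kmeans` returns unnamed groups whose meaning still has to be assigned afterwards.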

IO for IR
- Clustering
  - Document clustering
    - Cluster Hypothesis → documents having similar contents tend to be relevant to the same query
    - Rank clusters by query-cluster similarity → cluster documents based on vector similarity
    - Post-retrieval clustering → Scatter-Gather
  - Keyword clustering
    - Automatic thesaurus construction → query expansion
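Ranking clusters by query-cluster similarity can be sketched as follows. The slide only says "vector similarity", so this example assumes raw term-frequency vectors and cosine similarity, and it represents each cluster by the sum of its documents' vectors. The sample clusters and all function names are hypothetical.

```python
import math
from collections import Counter

def vectorize(text):
    """Term-frequency vector of a text (whitespace tokens, lowercased)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity of two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def cluster_vector(docs):
    """Represent a cluster by the summed term vector of its documents."""
    total = Counter()
    for doc in docs:
        total.update(vectorize(doc))
    return total

def rank_clusters(query, clusters):
    """Return cluster names ordered by decreasing similarity to the query."""
    q = vectorize(query)
    scored = [(cosine(q, cluster_vector(docs)), name)
              for name, docs in clusters.items()]
    return [name for _, name in sorted(scored, reverse=True)]

# Hypothetical pre-built clusters of short documents.
clusters = {
    "pets": ["cats and dogs", "dogs chase cats"],
    "finance": ["stock market prices", "market prices fall"],
}
ranking = rank_clusters("dogs", clusters)
```

Under the Cluster Hypothesis, the documents in the top-ranked cluster are the most likely to be relevant to the query, so they would be shown or searched first.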

IO for IR
- Classification
  - Document categorization
    - classify documents into manually defined categories → supports hierarchical browsing, query expansion via relevance feedback
  - Document indexing
    - assign keywords to documents → automatic indexing with controlled vocabulary, metadata generation
  - Document filtering
    - e.g. news delivery, email spam filtering
  - Query classification
    - collection selection
    - algorithm selection
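As a rough illustration of automatic indexing with a controlled vocabulary, the sketch below assigns a preferred term to a document whenever one of that term's surface forms occurs in the text. The vocabulary, its surface forms, and the `index_document` helper are all hypothetical; a real system would also handle stemming, phrases, and term weighting.

```python
# Hypothetical controlled vocabulary: preferred term -> surface forms.
VOCABULARY = {
    "information retrieval": {"search", "retrieval", "query"},
    "machine learning": {"learning", "classifier", "clustering"},
}

def index_document(text, vocabulary=VOCABULARY):
    """Return the preferred terms whose surface forms appear in the text."""
    words = set(text.lower().split())
    return sorted(term for term, forms in vocabulary.items() if words & forms)

keywords = index_document("A classifier improves search results")
```

Because only preferred terms are assigned, two documents that use different words for the same concept end up under the same index entry.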