
Information Organization: Overview



IO: What

• What is Information Organization?
  Systematic arrangement of items
  – group similar items together
  – assign meaning to groups
  – determine relationships between groups
  – assign items to groups

[Figure: three alternative groupings (Grouping 1, Grouping 2, Grouping 3) of the same items, labeled with the attributes Big/Small, Blue/Red, and Circle/Square]

Search Engine
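The idea that the same items can be grouped in more than one way can be sketched in a few lines of Python. This is only an illustration: the `ITEMS` list and the `group_by` helper are hypothetical, and the attribute values simply reuse the labels that appear on the slide's figure.

```python
from collections import defaultdict

# Hypothetical items; the attribute values reuse the labels from the figure.
ITEMS = [
    {"size": "Big", "color": "Blue", "shape": "Circle"},
    {"size": "Small", "color": "Blue", "shape": "Square"},
    {"size": "Big", "color": "Red", "shape": "Square"},
    {"size": "Small", "color": "Red", "shape": "Circle"},
]


def group_by(items, attribute):
    """Put items that share the same value of `attribute` into one group."""
    groups = defaultdict(list)
    for item in items:
        groups[item[attribute]].append(item)
    return dict(groups)


# The same four items yield three different groupings.
for attribute in ("size", "color", "shape"):
    groups = group_by(ITEMS, attribute)
    print(attribute, {label: len(members) for label, members in groups.items()})
```

Which grouping is useful depends on how the items will be looked for, which is the point the later slides develop.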


IO: Why

• Why organize information?
  Why do we put certain things in certain places?
  – Closet taxonomy: seasonal groups, pants vs. shirts, color groups, favorite vs. non-favorite
  – Food
      Good: sweet taste, smells like milk
      Bad: too hot, hard to chew
  – To make sense of the world → Knowledge Discovery (KD)
  – To find things more easily → Information Retrieval (IR)


IO: How

• How do we organize information?
  General Approach
  – anticipate how an item is searched for, e.g. by subject, date, author
  – look for common features among items
  – determine what an item is about
  Classification
  – identification/creation of classes
  – assignment of items into classes
  Clustering
  – group similar items together
• What to do when the information to organize is massive?
  – 10,000 books
  – 100,000 journal papers
  – 1,000 web pages


Machine Learning: Introduction

• What is Machine Learning?
  – A computer program learns if it improves its performance at some task through experience (T. Mitchell, 1997).
  – Any change in a system that allows it to perform better the second time on repetition of the same task or on a task drawn from the same population (H. Simon, 1983).
• How can systems improve?
  By acquiring new knowledge
  – acquiring new facts
  – acquiring new skills
  By adapting their behavior
  – solving problems more accurately
  – solving problems more efficiently


Machine Learning: Introduction

• Which is different?
• Which are similar?
• How is learning possible?
  Because there are regularities in the world.


ML: Classification vs. Clustering

• Classification
  Task is to learn to assign instances to predefined classes
  Supervised Learning
  – data has to specify what we are trying to learn (the classes)
  – requires training data: predefined classes and classified items
• Clustering
  Task is to learn a classification from the data
  – no predefined classification is required
  Unsupervised Learning
  – data doesn't specify what we are trying to learn (the clusters)
  Clustering algorithms divide a data set into natural groups (clusters)
  – items in the same cluster are similar to each other and share certain properties
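The contrast between the two tasks can be made concrete with a small sketch. The slides do not name any algorithm, so the choices here are assumptions made for illustration: a nearest-centroid rule stands in for classification and a basic k-means loop (with a naive "first k points" initialization) stands in for clustering. The data points and the labels `"small"` and `"large"` are invented.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def nearest(point, centers):
    """Return the key of the center closest to `point` (Euclidean distance)."""
    return min(centers, key=lambda key: math.dist(point, centers[key]))


# Classification (supervised): the training data carries the class labels.
def train_classifier(labeled_points):
    by_class = {}
    for point, label in labeled_points:
        by_class.setdefault(label, []).append(point)
    return {label: centroid(points) for label, points in by_class.items()}


# Clustering (unsupervised): only the points are given, no labels.
def kmeans(points, k, iterations=10):
    centers = {i: points[i] for i in range(k)}  # naive init: first k points
    clusters = {}
    for _ in range(iterations):
        clusters = {i: [] for i in range(k)}
        for point in points:
            clusters[nearest(point, centers)].append(point)
        # Keep the old center if a cluster happens to be empty.
        centers = {i: centroid(c) if c else centers[i] for i, c in clusters.items()}
    return clusters


# Hypothetical data: two visibly separate groups of 2-D points.
POINTS = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
LABELED = list(zip(POINTS, ["small", "small", "large", "large"]))

model = train_classifier(LABELED)
print(nearest((2.0, 1.0), model))  # class assigned to a new, unlabeled point
print(kmeans(POINTS, k=2))         # groups found without using any labels
```

The classifier can only answer with one of the classes it was given, while the clustering step returns anonymous groups whose meaning still has to be assigned afterwards.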


IO for IR

• Clustering
  Document Clustering
  – Cluster Hypothesis: documents having similar contents tend to be relevant to the same query
  – Rank clusters by Query-Cluster Similarity: cluster documents based on vector similarity
  – Post-retrieval clustering: Scatter-Gather
  Keyword Clustering
  – Automatic Thesaurus Construction: Query Expansion
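Ranking clusters by query-cluster similarity can be sketched as follows. The slide does not specify a weighting scheme or similarity measure, so this example assumes raw term-frequency vectors and cosine similarity; the cluster names and documents in `CLUSTERS` are invented, and the clusters are taken as already formed.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term-frequency vector (whitespace tokens, lowercased)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_vector(documents):
    """Represent a cluster by the summed term frequencies of its documents."""
    total = Counter()
    for document in documents:
        total.update(vectorize(document))
    return total


def rank_clusters(query, clusters):
    """Return cluster names ordered by decreasing similarity to the query."""
    query_vector = vectorize(query)
    scored = [
        (cosine(query_vector, cluster_vector(documents)), name)
        for name, documents in clusters.items()
    ]
    return [name for _, name in sorted(scored, reverse=True)]


# Hypothetical, already-formed clusters of short documents.
CLUSTERS = {
    "sports": ["football match score", "tennis match final"],
    "cooking": ["pasta sauce recipe", "bread baking recipe"],
}

print(rank_clusters("easy bread recipe", CLUSTERS))
```

Under the cluster hypothesis, the documents in the top-ranked cluster would then be shown first, even those that do not contain every query term.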


IO for IR

• Classification
  Document Categorization
  – classify documents into manually defined categories: supports hierarchical browsing, query expansion via relevance feedback
  Document Indexing
  – assign keywords to documents: automatic indexing with controlled vocabulary, metadata generation
  Document Filtering
  – e.g. news delivery, email spam filtering
  Query Classification
  – collection selection
  – algorithm selection
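As one illustration of document filtering, the sketch below trains a tiny spam filter. The slides do not say which classifier is used; a multinomial Naive Bayes model with add-one smoothing is assumed here purely as a common, simple choice. The class name `NaiveBayesFilter`, the labels, and the training messages are all hypothetical.

```python
import math
from collections import Counter


class NaiveBayesFilter:
    """Minimal multinomial Naive Bayes text classifier (add-one smoothing)."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of word occurrences
        self.doc_counts = Counter()  # label -> number of training documents
        self.vocabulary = set()

    def train(self, text, label):
        words = text.lower().split()
        self.word_counts.setdefault(label, Counter()).update(words)
        self.doc_counts[label] += 1
        self.vocabulary.update(words)

    def classify(self, text):
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # log P(label) + sum of log P(word | label), smoothed
            score = math.log(self.doc_counts[label] / total_docs)
            denominator = sum(counts.values()) + len(self.vocabulary)
            for word in words:
                score += math.log((counts[word] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data: predefined classes and classified items.
mail_filter = NaiveBayesFilter()
mail_filter.train("win cash prize now", "spam")
mail_filter.train("cheap prize offer", "spam")
mail_filter.train("meeting agenda for monday", "ham")
mail_filter.train("project status report", "ham")

print(mail_filter.classify("claim your cash prize"))
print(mail_filter.classify("monday project meeting"))
```

The same train-then-classify pattern applies to the other tasks on this slide, with categories, index terms, or target collections taking the place of the spam and ham labels.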