Text Mining Application Programming
Chapter 1: Introduction
Manu Konchady, 2006

Definition: Text Mining
  • All types of text processing that deal with finding, organizing, and analyzing information.
  • (Formal) The creation of new information that is not obvious in a collection of documents.
      • New information is defined as a pattern, trend, or relationship that can't be easily gleaned by reading individual documents.
      • The term "document" refers to any unit of text, such as a Web page, an e-mail, a formatted article, a set of slides, or a plain text file.
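
To make the formal definition concrete, here is a minimal Python sketch (not taken from the slides) that counts how often pairs of terms appear together across a small collection. No single document states that two terms are related, but the counts over the whole collection suggest it. The function names, the sample sentences, and the vocabulary are all illustrative assumptions.

```python
from collections import Counter
from itertools import combinations
import re


def tokenize(text):
    """Return the set of lowercase alphabetic tokens in a document."""
    return set(re.findall(r"[a-z]+", text.lower()))


def cooccurrence(documents, vocabulary):
    """Count, for each pair of vocabulary terms, the documents containing both."""
    counts = Counter()
    for doc in documents:
        present = sorted(tokenize(doc) & vocabulary)
        counts.update(combinations(present, 2))
    return counts


# Hypothetical three-document collection.
docs = [
    "Aspirin reduced fever in the first trial.",
    "The second trial linked aspirin to lower fever.",
    "Fever was unchanged by the placebo.",
]
pairs = cooccurrence(docs, {"aspirin", "fever", "placebo"})
print(pairs.most_common())
```

A real application would replace the hand-picked vocabulary with terms extracted from the collection itself and would normalize the counts, since raw counts favor frequent terms.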

Data Mining vs. Text Mining
  • Data mining deals with structured numeric data, whereas text mining deals with unstructured text.
  • Data used for data mining is extracted, transformed, and loaded into a data warehouse.
  • Text mining attempts to build a model from data that is assumed to be imprecise.

Origins of Text Mining
  • Information Retrieval
  • Natural Language Processing

Understanding Text
  • “Alice saw the rabbit with glasses”
  • Polysemy
      • “In what state would you find Lincoln?”
      • “free software”
  • Synonymy
      • More than one word can express the same meaning.
      • Exuberant: lush, luxuriant, profuse, and riotous.
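
One common way to cope with synonymy when searching is to expand a query with known synonyms. The Python sketch below is only an illustration: the `SYNONYMS` table is a hand-built, hypothetical stand-in for a real thesaurus, seeded with the "exuberant" example above, and `expand_query` is an assumed helper name.

```python
# Hypothetical hand-built synonym table; a real system would use a thesaurus.
SYNONYMS = {
    "exuberant": ["lush", "luxuriant", "profuse", "riotous"],
}


def expand_query(terms, synonyms=SYNONYMS):
    """Return the query terms plus their known synonyms, without duplicates."""
    expanded = []
    for term in terms:
        key = term.lower()
        for candidate in [key] + synonyms.get(key, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


print(expand_query(["Exuberant", "growth"]))
```

Expansion helps with synonymy but does nothing for polysemy: a word such as "free" still matches documents about both price and liberty, so distinguishing senses needs context from the surrounding text.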

An Architecture for Text Mining Applications

Text Mining Functions
  • Searching
  • Information Extraction
  • Clustering
  • Categorization
  • Summarization
  • Information Monitor
  • Question and Answer
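
As an illustration of the first function in the list, searching, the following Python sketch builds a small inverted index and answers queries that require every term to match. It is a generic textbook technique under assumed names (`build_index`, `search`) and sample documents, not the implementation used by any particular tool.

```python
from collections import defaultdict
import re


def build_index(documents):
    """Map each lowercase token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every token in the query."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical sample collection.
documents = {
    "d1": "Free software for text mining",
    "d2": "Text mining with Perl modules",
    "d3": "Free lunch",
}
index = build_index(documents)
print(sorted(search(index, "text mining")))
```

The other functions in the list typically build on the same tokenized representation of the documents.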

A Layered Model

Text Mining Installation
  • Text Mine (http://textmine.sf.net) is a collection of Perl modules and code on SourceForge to index, cluster, classify, and summarize text.

Usage
  • Command line
  • Web-based interface

Web Interface