NITISH MANOCHA


Platforms
§ AIX workstation
§ OS/390
§ Sun Solaris
§ Windows NT

Tools to Use
§ Topic Categorization Tool
    • Categorizing emails
    • Categorizing Web pages

Text Analysis Tool
§ Topic Categorization Tool

Text Analysis Tool
§ Topic Categorization Tool
    • Category 1 (AI Schedule)

Text Analysis Tool
    • Category 2 (Database Schedule)

Text Analysis Tool
§ Target Category (Data Mining Schedule)

Text Analysis Tool
§ Result: Category 2 (Databases)
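
The categorization example above (two trained categories, one target document, one winning category) can be sketched in a few lines of Python. This is only an illustration of the idea, not how the Topic Categorization Tool works internally: the cosine-similarity scoring, the function names, and the training snippets are all assumptions made for the sketch.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-case the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def categorize(document, training):
    """Return the best-matching category name and all category scores."""
    doc_vec = bag_of_words(document)
    scores = {name: cosine(doc_vec, bag_of_words(text))
              for name, text in training.items()}
    return max(scores, key=scores.get), scores


# Hypothetical training text standing in for the two schedule pages.
training = {
    "AI": "search heuristics logic planning neural networks learning agents",
    "Databases": "relational model sql queries transactions indexing warehouses tables",
}
# Hypothetical target text standing in for the Data Mining schedule.
target = "data warehouses, sql queries over large tables, association rules"

best, scores = categorize(target, training)
print(best, scores)
```

With these made-up snippets the target shares vocabulary only with the Databases text, which mirrors the result shown on the slide.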

Tools to Use
§ Clustering Tool (Finding Similar Information)
    • Dividing documents into groups
    • Identifying hidden similarities in documents
    • Identifying duplicate documents in a collection
    • Finding documents that are out of place

Text Analysis Tool
§ Hierarchical Clustering - imzhclst
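
As a rough picture of what hierarchical clustering does, the sketch below repeatedly merges the two most similar groups of documents. It is not the imzhclst algorithm: the Jaccard word-overlap measure, the single-link rule, the threshold, and the sample documents are all assumptions chosen to keep the example small.

```python
import re


def words(text):
    """Set of lower-cased alphabetic tokens in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Shared words divided by all words; 1.0 means identical word sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def agglomerate(docs, threshold):
    """Merge the two most similar clusters until no pair reaches the threshold."""
    sets = [words(d) for d in docs]
    clusters = [[i] for i in range(len(docs))]
    merges = []  # merge history, i.e. the hierarchy from the bottom up

    def similarity(c1, c2):
        # Single-link: similarity of the closest pair of members.
        return max(jaccard(sets[i], sets[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        pairs = [(similarity(clusters[x], clusters[y]), x, y)
                 for x in range(len(clusters))
                 for y in range(x + 1, len(clusters))]
        best, x, y = max(pairs)
        if best < threshold:
            break
        merges.append((clusters[x], clusters[y], best))
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return clusters, merges


# Hypothetical mini collection.
docs = [
    "database query optimization",
    "query optimization for database systems",
    "neural network training",
    "training a neural network",
]
clusters, merges = agglomerate(docs, threshold=0.3)
print(clusters)
```

A document that never reaches the threshold stays in a cluster of its own, which is one simple way to spot documents that are out of place; a similarity of 1.0 flags likely duplicates.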

Text Analysis Tool
§ Binary Clustering - imzcrlst

Text Analysis Tool
§ Results

Text Analysis Tool
§ Results

Tools to Use
§ Feature Extraction Tool
    • Name extraction
    • Abbreviation extraction
    • Relation extraction

Text Analysis Tool
§ Using Feature Extraction tool to extract names
    • imzxrun -b 2 -f C -x n -o faculty.out faculty.htm
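
To give a feel for name extraction from an HTML page, here is a deliberately naive sketch. It does not reproduce imzxrun or its output format: it simply strips tags and treats runs of two or more capitalized words as candidate names. The pattern, the function name, and the sample page are assumptions, and a capitalized word that starts a sentence right before a name would be swallowed into the match.

```python
import re
from collections import Counter

# Two or more capitalized words separated by single spaces.
NAME = re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b")


def extract_names(html):
    """Count candidate names found in the visible text of an HTML page."""
    # Replace tags with newlines so names in adjacent elements stay separate.
    text = re.sub(r"<[^>]+>", "\n", html)
    return Counter(NAME.findall(text))


# Hypothetical stand-in for a faculty page.
page = (
    "<ul><li>Ada Lovelace, office 12</li>"
    "<li>Alan Turing, office 7</li></ul>"
    "<p>Office hours: Ada Lovelace.</p>"
)
names = extract_names(page)
print(names.most_common())
```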

Text Analysis Tool


Tools to Use
§ Language Identification Tool
    • Organize a collection of documents by language
    • Restrict search results to documents in a particular language

Text Analysis Tool
§ Using Language Identification tool
    • imzlgini -b 2 -v < mydoc.htm

Text Analysis Tool
§ Language Identification Tool Results
    • Supports 13 languages; new languages can be trained
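
The point that new languages can be trained is easy to illustrate with a character-trigram profile per language. The slides do not say which method the Language Identification Tool uses, so the trigram approach, the scoring, and the two short training samples below are assumptions for illustration only.

```python
from collections import Counter


def trigrams(text):
    """Count overlapping 3-character sequences, with padded word boundaries."""
    text = " " + " ".join(text.lower().split()) + " "
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def train(samples):
    """Build one trigram profile per language from sample text."""
    return {lang: trigrams(text) for lang, text in samples.items()}


def identify(text, profiles):
    """Return the language whose profile best matches the text's trigrams."""
    grams = trigrams(text)

    def score(profile):
        total = sum(profile.values())
        return sum(count * profile[g] / total for g, count in grams.items())

    return max(profiles, key=lambda lang: score(profiles[lang]))


# Hypothetical training samples; adding a language means adding an entry.
profiles = train({
    "english": "the quick brown fox jumps over the lazy dog and then "
               "the cat sleeps in the house",
    "german": "der schnelle braune fuchs springt auf den faulen hund "
              "und die katze wartet im haus",
})

print(identify("the dog is in the house", profiles))
print(identify("die katze und der hund", profiles))
```

Real profiles would be trained on far more text; the tiny samples here only work because the test sentences reuse their vocabulary.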

Text Analysis Tool
§ Using Summarizer tool
    • imzsum -l 4 project.html

Text Analysis Tool
§ Summarizer tool - Results
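
A summarizer of this kind can be pictured as sentence extraction: score each sentence and keep the best few in their original order. The sketch below is an assumption about the general technique, not a description of imzsum; the frequency-based scoring, the stopword list, and the sample text are all made up for the example.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it",
             "for", "on", "are", "was"}


def content_words(sentence):
    """Lower-cased alphabetic tokens of a sentence, minus stopwords."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower())
            if w not in STOPWORDS]


def summarize(text, max_sentences):
    """Keep the highest-scoring sentences, returned in document order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text)
                 if s.strip()]
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        ws = content_words(sentence)
        # Average corpus frequency of the sentence's content words.
        return sum(freq[w] for w in ws) / len(ws) if ws else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:max_sentences])]


# Hypothetical input text.
text = ("Text mining finds patterns in text. The weather was nice. "
        "Text mining tools cluster text and categorize text.")
summary = summarize(text, 2)
print(summary)
```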

Tools to Use
§ Web Crawler
    • Follows the link topology for a fast search
    • Produces a Web site map
    • Used to recognize authoritative pages
    • Provides a filtered collection of pages

Web Crawler
§ imyclean - to define a web space
    • Created include.re, exclude.re, types.re
§ imycrawl - to crawl a defined web space
    • imycrawl url webspace
§ imystat - to track what happens during a crawl
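
The idea of a web space defined by include and exclude patterns, followed by a crawl along the links, can be sketched as below. The actual format of include.re, exclude.re, and types.re is not shown in the slides, so the patterns, the in-memory link graph, and both function names are hypothetical; nothing here fetches real pages.

```python
import re
from collections import deque


def in_web_space(url, include, exclude):
    """A URL belongs to the web space if it matches an include pattern
    and no exclude pattern."""
    if any(re.search(p, url) for p in exclude):
        return False
    return any(re.search(p, url) for p in include)


def crawl(start, links, include, exclude):
    """Breadth-first walk over a link graph, staying inside the web space."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        url = queue.popleft()
        order.append(url)
        for nxt in links.get(url, []):
            if nxt not in seen and in_web_space(nxt, include, exclude):
                seen.add(nxt)
                queue.append(nxt)
    return order


# Hypothetical patterns and link graph (page -> outgoing links).
include = [r"^https?://www\.example\.edu/"]
exclude = [r"/cgi-bin/", r"\.(gif|jpg|zip)$"]
links = {
    "http://www.example.edu/": [
        "http://www.example.edu/courses.html",
        "http://www.example.edu/logo.gif",
        "http://www.other.org/",
    ],
    "http://www.example.edu/courses.html": [
        "http://www.example.edu/mining.html",
    ],
}
visited = crawl("http://www.example.edu/", links, include, exclude)
print(visited)
```

The visit order doubles as a simple site map, and the filters are what make the result a filtered collection of pages.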

Tools to Use
§ Text Search Engine
    • Complex text searches
    • Powerful linguistic capabilities
    • Fuzzy searches
    • Queries based on the structure of the document

Text Search Engine
§ Operates on a previously built index

Text Search Engine
§ Types of Index
    • Linguistic Index ("bought" indexed as "buy")
    • Feature Index (linguistics + names)
    • Precise Index ("bought" indexed as "bought")
    • Normalized Precise Index (case-insensitive)
    • Ngram Index
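
The difference between the index types comes down to what key a word is stored under. The sketch below shows one plausible key function per type; it is not the Text Search Engine's implementation. The tiny lemma table, the choice of trigrams, and the lower-casing inside the n-gram function are assumptions, and the Feature Index is left out because it also needs name recognition.

```python
# Hypothetical lemma table; a real linguistic index would use a dictionary.
LEMMAS = {"bought": "buy", "buys": "buy", "buying": "buy"}


def linguistic_key(token):
    """Reduce a word to its base form, so "bought" is stored as "buy"."""
    t = token.lower()
    return LEMMAS.get(t, t)


def precise_key(token):
    """Store the word exactly as written."""
    return token


def normalized_precise_key(token):
    """Store the word as written, ignoring case."""
    return token.lower()


def ngram_keys(token, n=3):
    """Store every n-character slice, which tolerates small spelling errors."""
    t = token.lower()
    return [t[i:i + n] for i in range(len(t) - n + 1)] or [t]


for word in ["Bought", "bought", "buys"]:
    print(word,
          linguistic_key(word),
          precise_key(word),
          normalized_precise_key(word),
          ngram_keys(word))
```

A query for "buy" would therefore match "Bought" in a linguistic index but not in a precise one, while n-gram keys allow partial overlap to count as a match.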

Combining Tools for Solutions
§ Searching with categories
    • by combining the Text Search Engine and the Topic Categorization Tool
§ Surviving a flood of email
    • by using the Topic Categorization Tool
§ Selectively indexing Web pages
    • by combining the Web Crawler, Topic Categorization Tool & Text Search Engine
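
Searching with categories amounts to running a text query and then keeping only the hits that carry a given category label. The following sketch assumes a simple inverted index and a precomputed document-to-category map; both, along with the sample documents and function names, are stand-ins for what the Text Search Engine and the Topic Categorization Tool would provide.

```python
from collections import defaultdict


def build_index(docs):
    """Inverted index: word -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, categories, query, category=None):
    """Documents containing all query words, optionally limited to a category."""
    hits = None
    for word in query.lower().split():
        postings = index.get(word, set())
        hits = postings if hits is None else hits & postings
    hits = hits or set()
    if category is not None:
        hits = {d for d in hits if categories.get(d) == category}
    return sorted(hits)


# Hypothetical documents and the categories a categorizer assigned to them.
docs = {
    1: "mining schedule for databases",
    2: "mining schedule for ai",
    3: "database exam",
}
categories = {1: "Databases", 2: "AI", 3: "Databases"}

index = build_index(docs)
print(search(index, categories, "mining schedule"))
print(search(index, categories, "mining schedule", category="Databases"))
```

The same filter, applied before indexing instead of after searching, gives the selective indexing of Web pages mentioned above.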

Views of the Tool
§ Command line (good for Unix)
§ Not very useful on Windows NT
§ Not a good stand-alone tool
§ Should be viewed as a library