Introduction to NLTK
ELN – Natural Language Processing
Giuseppe Attardi

Installing NLTK
  • Download and install
      • http://nltk.org/install.html
  • Download NLTK data
      >>> import nltk
      >>> nltk.download()

NLTK
  • Suite of classes for several NLP tasks
  • Parsing, POS tagging, classifiers…
  • Several text processing utilities, corpora
      • Brown, Penn Treebank corpus…
      • Your data was divided into sentences using ‘punkt’

NLTK
  • Text material
      • Raw text
      • Annotated text
  • Tools
      • Part of speech taggers
      • Semantic analysis
  • Resources
      • WordNet, Treebanks

Linguistic Tasks
  • Part of Speech Tagging
  • Parsing
  • WordNet
  • Named Entity Recognition
  • Information Retrieval
  • Sentiment Analysis
  • Document Clustering
  • Topic Segmentation
  • Authoring
  • Machine Translation
  • Summarization
  • Information Extraction
  • Spoken Dialog Systems
  • Natural Language Generation
  • Word Sense

Part of Speech Tagging
  • Task: Given a string of words, identify the parts of speech for each word.

      A    man   walks  into  a    bar  .
      Det  Noun  Verb   Prep  Det  Noun

POS Tag Usage
  • Surface level syntax.
  • Primary operation
      • Parsing
      • Word Sense Disambiguation
      • Semantic Role labeling
      • Segmentation
          • Discourse, Topic, Sentence

How to do it?
  • Learn from Data.
  • Annotated Data:
      A    man   walks  into  a    bar  .
      Det  Noun  Verb   Prep  Det  Noun
  • Unlabeled Data:
      A man walks home.
      The pitcher issued four walks.
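
One simple way to "learn from data" is a most-frequent-tag baseline: count how often each word carries each tag in the annotated data, then give every unlabeled word its most frequent tag. The sketch below is only an illustration in plain Python, not the method the slides prescribe. The function names `train` and `tag`, and the "?" placeholder for unseen words, are assumptions made for this example.

```python
from collections import Counter, defaultdict

# Annotated data: the tagged sentence from the slide (a toy training set).
annotated = [
    [("A", "Det"), ("man", "Noun"), ("walks", "Verb"),
     ("into", "Prep"), ("a", "Det"), ("bar", "Noun")],
]

def train(tagged_sentences):
    """Count how often each (lowercased) word carries each tag."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag_name in sentence:
            counts[word.lower()][tag_name] += 1
    return counts

def tag(words, counts, unknown="?"):
    """Give each word its most frequent training tag, or `unknown` if unseen."""
    tagged = []
    for word in words:
        seen = counts.get(word.lower())  # .get() does not create missing keys
        tagged.append((word, seen.most_common(1)[0][0] if seen else unknown))
    return tagged

counts = train(annotated)
result = tag("A man walks home".split(), counts)
print(result)
```

With this toy training set, "walks" in "The pitcher issued four walks" would also come out as Verb, although it is a noun there. A per-word baseline cannot use context, which is why real taggers need more than word counts.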

POS probabilities

             Det   Noun  Verb  Prep  Adj
      A      0.9   0.1   0     0     0
      man    0     0.6   0.2   0     0.2
      walks  0     0.2   0.8   0     0
      into   0     0     0     1     0
      bar    0     0.7   0.3   0     0
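
The probabilities above can be used directly to tag a sentence by picking, for each word on its own, the tag with the highest probability. The sketch below assumes the first row of the table belongs to the word "A". The names `POS_PROBS` and `most_likely_tags` are made up for this illustration, and words missing from the table are not handled.

```python
# Per-word tag probabilities, copied from the table on the slide.
# Assumption: the first row (0.9, 0.1, 0, 0, 0) belongs to the word "A".
TAGS = ["Det", "Noun", "Verb", "Prep", "Adj"]
POS_PROBS = {
    "a":     [0.9, 0.1, 0.0, 0.0, 0.0],
    "man":   [0.0, 0.6, 0.2, 0.0, 0.2],
    "walks": [0.0, 0.2, 0.8, 0.0, 0.0],
    "into":  [0.0, 0.0, 0.0, 1.0, 0.0],
    "bar":   [0.0, 0.7, 0.3, 0.0, 0.0],
}

def most_likely_tags(words):
    """Pick, for each word independently, the tag with the highest probability.

    Raises KeyError for a word that is not in the table.
    """
    tags = []
    for word in words:
        probs = POS_PROBS[word.lower()]
        best = max(range(len(TAGS)), key=lambda i: probs[i])
        tags.append(TAGS[best])
    return tags

sentence = "A man walks into a bar".split()
best_tags = most_likely_tags(sentence)
print(list(zip(sentence, best_tags)))
```

Choosing each tag independently ignores the neighbouring words. It works for this sentence, but it would always label "walks" as a verb.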

‘import nltk’
  • You will need to import the necessary modules to create objects and call member functions
      • import is similar to include: it brings in objects from pre-built packages
  • FreqDist, ConditionalFreqDist are in nltk.probability
  • PlaintextCorpusReader is in nltk.corpus
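
As a small illustration of importing these classes and calling their member functions, the sketch below builds a FreqDist and a ConditionalFreqDist from a toy sentence. The data is invented for the example. It assumes NLTK is installed and needs no downloaded corpora.

```python
from nltk.probability import FreqDist, ConditionalFreqDist

# Toy data: a lowercased version of the example sentence from the slides.
words = "a man walks into a bar".split()
tagged = [("a", "Det"), ("man", "Noun"), ("walks", "Verb"),
          ("into", "Prep"), ("a", "Det"), ("bar", "Noun")]

# FreqDist counts how often each sample occurs.
fdist = FreqDist(words)
print(fdist["a"])   # count of the word "a"
print(fdist.N())    # total number of tokens counted

# ConditionalFreqDist keeps one FreqDist per condition;
# here the condition is the word and the sample is its tag.
cfd = ConditionalFreqDist(tagged)
print(cfd["walks"].max())   # most frequent tag seen for "walks"
```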

Exercise 1.
  • Run examples from Chapter 1 of NLTK book:
      • http://nltk.googlecode.com/svn/trunk/doc/book/ch01.html

Exercise 2.
  • Run examples from Chapter 3 of NLTK book:
      • http://nltk.googlecode.com/svn/trunk/doc/book/ch03.html