
CSCE 771 Natural Language Processing
Lecture 24: Distributional based Similarity II
April 10, 2013
Topics
• Distributional based word similarity
Readings
• NLTK book Chapter 2 (WordNet)
• Text Chapter 20

Overview
Last Time (Programming)
• Examples of thesaurus based word similarity
  • path-similarity: memory fault; Lin
• $\mathrm{sim}_{\mathrm{path}}(c_1, c_2) = -\log \mathrm{pathlen}(c_1, c_2)$
• extended Lesk: glosses of words need to include hypernyms
Today
• Distributional methods
Readings
• Text 19, 20
• NLTK Book: Chapter 10
Next Time: Distributional based Similarity II
– 2 – CSCE 771 Spring 2013

Figure 20.8 Summary of thesaurus similarity measures
• Elderly moment IS-A memory fault IS-A mistake
• sim-path correct in table
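The IS-A chain on this slide can be used to try out sim-path = -log pathlen by hand. The following is a minimal sketch, not NLTK's API: the `HYPERNYM` dictionary, the extra concept "typo", and the function names are hypothetical, and it assumes pathlen counts one plus the number of edges on the shortest path so that a concept compared with itself has pathlen 1.

```python
import math
from collections import deque

# Hypothetical toy IS-A hierarchy (child -> parent); not taken from WordNet.
HYPERNYM = {
    "elderly moment": "memory fault",
    "memory fault": "mistake",
    "typo": "mistake",  # extra concept added only for illustration
}

def neighbors(node):
    """Concepts one IS-A edge away from node, in either direction."""
    out = set()
    if node in HYPERNYM:
        out.add(HYPERNYM[node])
    out.update(child for child, parent in HYPERNYM.items() if parent == node)
    return out

def pathlen(c1, c2):
    """Assumption: pathlen = 1 + number of edges on the shortest path."""
    queue, seen = deque([(c1, 0)]), {c1}
    while queue:
        node, edges = queue.popleft()
        if node == c2:
            return edges + 1
        for nxt in neighbors(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, edges + 1))
    raise ValueError("no path between concepts")

def sim_path(c1, c2):
    """sim-path(c1, c2) = -log pathlen(c1, c2); larger means more similar."""
    return -math.log(pathlen(c1, c2))
```

Under these assumptions, "elderly moment" is closer to "memory fault" than to "typo", because the second pair is separated by more IS-A edges.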

Example: computing PPMI
• Need counts, so let's make up some
• We need to edit this table to have counts
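Since the slide asks for made-up counts, here is one way the PPMI computation could look in Python. Every number in `counts` is invented for illustration, and the word and context names are hypothetical; the sketch only assumes that PPMI is PMI with negative values clipped to zero.

```python
import math

# Made-up co-occurrence counts: counts[word][context]. All numbers are hypothetical.
counts = {
    "apricot":     {"pinch": 1, "sugar": 1, "computer": 0, "data": 0},
    "pineapple":   {"pinch": 1, "sugar": 1, "computer": 0, "data": 0},
    "digital":     {"pinch": 0, "sugar": 0, "computer": 2, "data": 1},
    "information": {"pinch": 0, "sugar": 0, "computer": 1, "data": 6},
}

# Total number of (word, context) co-occurrences in the table.
total = sum(sum(row.values()) for row in counts.values())

def p_word(w):
    """P(w): share of all co-occurrences that involve word w."""
    return sum(counts[w].values()) / total

def p_context(f):
    """P(f): share of all co-occurrences that involve context f."""
    return sum(row[f] for row in counts.values()) / total

def ppmi(w, f):
    """Positive PMI: max(0, log2 P(w, f) / (P(w) P(f))); 0 for unseen pairs."""
    joint = counts[w][f] / total
    if joint == 0:
        return 0.0
    return max(0.0, math.log2(joint / (p_word(w) * p_context(f))))
```

With these invented counts, `ppmi("information", "data")` is positive, while `ppmi("digital", "data")` is clipped to zero because the pair co-occurs less often than independence would predict.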

Associations
• PMI-assoc: $\mathrm{assoc}_{\mathrm{PMI}}(w, f) = \log_2 \frac{P(w, f)}{P(w)\,P(f)}$
• Lin-assoc, with f composed of r (relation) and w': $\mathrm{assoc}_{\mathrm{Lin}}(w, f) = \log_2 \frac{P(w, f)}{P(w)\,P(r \mid w)\,P(w' \mid w)}$
• t-test-assoc (20.41)
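The three association measures can be compared on a handful of dependency triples. This is a sketch under stated assumptions: the triple counts are invented, the helper `count` is hypothetical, the Lin variant includes P(w) in the denominator so that the ratio compares the joint probability with an independence estimate, and the t-test form (P(w, f) - P(w) P(f)) / sqrt(P(w) P(f)) is assumed for Eq. 20.41.

```python
import math
from collections import Counter

# Hypothetical (word, relation, word') triple counts; not from a real corpus.
triples = Counter({
    ("drink", "obj", "tea"): 4,
    ("drink", "obj", "water"): 3,
    ("drink", "subj", "I"): 2,
    ("eat", "obj", "tangerine"): 3,
    ("eat", "subj", "I"): 3,
    ("eat", "obj", "water"): 1,
})
N = sum(triples.values())

def count(w=None, r=None, w2=None):
    """Sum counts over triples matching the given fields (None = wildcard)."""
    return sum(c for (a, b, d), c in triples.items()
               if w in (None, a) and r in (None, b) and w2 in (None, d))

def assoc_pmi(w, r, w2):
    """log2 P(w, f) / (P(w) P(f)), where the feature f is the pair (r, w2)."""
    p_wf = count(w, r, w2) / N
    p_w, p_f = count(w=w) / N, count(r=r, w2=w2) / N
    return math.log2(p_wf / (p_w * p_f))

def assoc_lin(w, r, w2):
    """log2 P(w, f) / (P(w) P(r|w) P(w2|w)); assumed form, see lead-in."""
    p_wf = count(w, r, w2) / N
    p_w = count(w=w) / N
    p_r_given_w = count(w=w, r=r) / count(w=w)
    p_w2_given_w = count(w=w, w2=w2) / count(w=w)
    return math.log2(p_wf / (p_w * p_r_given_w * p_w2_given_w))

def assoc_ttest(w, r, w2):
    """(P(w, f) - P(w) P(f)) / sqrt(P(w) P(f)); assumed form, see lead-in."""
    p_wf = count(w, r, w2) / N
    p_w, p_f = count(w=w) / N, count(r=r, w2=w2) / N
    return (p_wf - p_w * p_f) / math.sqrt(p_w * p_f)
```

The two log-based functions are only defined for triples that were actually observed; an unseen triple would need smoothing or a PPMI-style floor, which this sketch leaves out.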

Figure 20.10 Co-occurrence vectors
§ Dependency based parser: special case of shallow parsing
§ Identify from “I discovered dried tangerines.” (20.32)
  § discover(subject I)
  § tangerine(obj-of discover)
  § I(subject-of discover)
  § tangerine(adj-mod dried)
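The relations listed on this slide can be turned into sparse co-occurrence vectors, with each (relation, other word) pair acting as one feature. A minimal sketch, assuming the relations are typed in by hand rather than produced by a parser; `build_vectors` is a hypothetical helper name.

```python
from collections import Counter, defaultdict

# Relations read off "I discovered dried tangerines." as (word, relation, other word).
# Written by hand here; a real system would get them from a dependency parser.
relations = [
    ("discover", "subject", "I"),
    ("tangerine", "obj-of", "discover"),
    ("I", "subject-of", "discover"),
    ("tangerine", "adj-mod", "dried"),
]

def build_vectors(rels):
    """Map each word to a sparse vector of (relation, other word) feature counts."""
    vectors = defaultdict(Counter)
    for word, rel, other in rels:
        vectors[word][(rel, other)] += 1
    return vectors

vectors = build_vectors(relations)
```

Over a large corpus the same loop would accumulate counts greater than one, and those counts are what the association measures are then computed from.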

Figure 20.11 Objects of the verb drink (Hindle 1990)

Vectors review
• dot-product
• length
• sim-cosine
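The three items in this review fit in a few lines of Python. A minimal sketch, assuming dense vectors stored as equal-length lists of numbers; the function names are illustrative.

```python
import math

def dot(v, w):
    """Dot product: sum of element-wise products."""
    return sum(a * b for a, b in zip(v, w))

def length(v):
    """Vector length (Euclidean norm): square root of the dot product with itself."""
    return math.sqrt(dot(v, v))

def sim_cosine(v, w):
    """Cosine similarity: dot product normalized by both vector lengths."""
    return dot(v, w) / (length(v) * length(w))
```

Cosine is 1 for vectors pointing in the same direction and 0 for orthogonal ones, so vector length (for example, raw word frequency) does not affect the score. The sketch does not guard against all-zero vectors, which would cause a division by zero.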

Figure 20.12 Similarity of vectors

Figure 20.13 Vector similarity summary
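Besides cosine, summaries of vector similarity usually include weighted Jaccard and Dice measures. The sketch below assumes the common min/max formulations for non-negative weight vectors; whether these match the figure term for term is an assumption, and the function names are illustrative.

```python
def sim_jaccard(v, w):
    """Weighted Jaccard: sum of element-wise minima over sum of element-wise maxima."""
    overlap = sum(min(a, b) for a, b in zip(v, w))
    union = sum(max(a, b) for a, b in zip(v, w))
    return overlap / union

def sim_dice(v, w):
    """Weighted Dice: twice the sum of minima over the sum of all weights."""
    overlap = sum(min(a, b) for a, b in zip(v, w))
    return 2 * overlap / sum(a + b for a, b in zip(v, w))
```

Both measures reach 1 only when the two vectors are identical, and both assume non-negative weights (such as PPMI values) with at least one non-zero entry.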

Figure 20.14 Hand-built patterns for hypernyms (Hearst 1992)
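One hand-built pattern of this kind, "X such as Y, Z and W", can be approximated with a regular expression. This is a deliberately crude sketch: it assumes every noun phrase is a single word, it handles only the "such as" pattern, and `PATTERN` and `hearst_such_as` are hypothetical names. A real implementation would match over noun-phrase chunks rather than raw words.

```python
import re

# "HYPERNYM such as HYPONYM, HYPONYM ... (and|or) HYPONYM"
# Simplification (assumption): each noun phrase is a single word.
PATTERN = re.compile(
    r"(\w+),? such as (\w+(?:, (?!and |or )\w+)*(?:,? (?:and|or) \w+)?)"
)

def hearst_such_as(sentence):
    """Return (hyponym, hypernym) pairs found by the 'such as' pattern."""
    pairs = []
    for match in PATTERN.finditer(sentence):
        hypernym = match.group(1)
        # Split the matched list on commas and on "and"/"or" conjunctions.
        hyponyms = re.split(r",? (?:and|or) |, ", match.group(2))
        pairs.extend((h, hypernym) for h in hyponyms if h)
    return pairs

pairs = hearst_such_as("He likes fruits such as apples, pears and tangerines.")
```

Because the pattern works on raw words, it over-generates when the list is followed by a comma and ordinary text; that is one reason such patterns are normally applied after chunking or parsing.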

Figure 20.15

Figure 20.16

How to do it in NLTK
• http://www.cs.ucf.edu/courses/cap5636/fall2011/nltk.pdf
• NLTK 3.0a1 released: February 2013. This version adds support for NLTK’s graphical user interfaces. http://nltk.org/nltk3-alpha/
• Question: Which similarity function in nltk.corpus.wordnet is appropriate for finding the similarity of two words? I want to use a function for word clustering and the Yarowsky algorithm to find similar collocations in a large text.
• http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Linguistics
• http://en.wikipedia.org/wiki/Portal:Linguistics
• http://en.wikipedia.org/wiki/Yarowsky_algorithm
• http://nltk.googlecode.com/svn/trunk/doc/howto/wordnet.html