Automatic Verb Sense Grouping --- Term Project Proposal for CIS 630
Jinying Chen
10/28/2002

Motivation
• “Making fine-grained and coarse-grained sense distinctions, both manually and automatically” (Martha, Hoa, Christiane, 2002)
  – It is difficult to find consistent criteria for making fine-grained sense distinctions, either manually or automatically
  – Well-defined sense groups can alleviate this problem
  – Potential application in Machine Translation

Model
• Unsupervised learning
• EM algorithm (similar to Dan Gildea 2002, Walde 2000, Rooth 1999, Ted Pedersen 1997)

EM clustering algorithm
• Soft clustering: P(v|c)
• Each verb v_i is associated with a set of features {f_i1, f_i2, …, f_in}; there are m clusters {c_1, c_2, …, c_m}
• Estimate P(v|c) by maximizing the log-likelihood
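The soft-clustering idea above can be sketched in a few lines. The slide does not spell out the probability model, so this is only an illustration under stated assumptions: each row of `counts` is one verb instance described by n feature counts, the clusters form a mixture of multinomials, and the names `em_multinomial_mixture`, `smooth`, and the evenly spaced seeding are hypothetical choices, not the proposal's method.

```python
import math

def em_multinomial_mixture(counts, m, iters=50, smooth=0.1):
    """EM for a mixture of m multinomials over feature-count vectors (a sketch)."""
    n_items, n_feats = len(counts), len(counts[0])
    # Assumption: seed cluster c from an evenly spaced instance (arbitrary choice).
    seeds = [counts[(c * n_items) // m] for c in range(m)]
    theta = [[(x + smooth) / (sum(row) + smooth * n_feats) for x in row]
             for row in seeds]                      # P(feature | cluster)
    pi = [1.0 / m] * m                              # P(cluster)
    history = []                                    # log-likelihood per iteration
    for _ in range(iters):
        # E-step: posterior P(cluster | instance), computed in log space.
        resp, loglik = [], 0.0
        for row in counts:
            logp = [math.log(pi[c]) +
                    sum(x * math.log(theta[c][j]) for j, x in enumerate(row))
                    for c in range(m)]
            top = max(logp)
            norm = top + math.log(sum(math.exp(lp - top) for lp in logp))
            resp.append([math.exp(lp - norm) for lp in logp])
            loglik += norm
        history.append(loglik)
        # M-step: re-estimate parameters from expected counts (with smoothing).
        for c in range(m):
            weight = sum(r[c] for r in resp)
            pi[c] = (weight + smooth) / (n_items + smooth * m)
            totals = [sum(r[c] * row[j] for r, row in zip(resp, counts))
                      for j in range(n_feats)]
            denom = sum(totals) + smooth * n_feats
            theta[c] = [(t + smooth) / denom for t in totals]
    return resp, pi, theta, history

# Toy data (invented for illustration): six instances, four features.
counts = [[5, 4, 0, 0], [4, 5, 0, 0], [6, 3, 1, 0],
          [0, 0, 5, 4], [0, 1, 4, 6], [0, 0, 6, 3]]
resp, pi, theta, history = em_multinomial_mixture(counts, m=2, iters=20)
print([[round(p, 3) for p in r] for r in resp])
```

Each row of `resp` is a soft assignment that sums to 1. The smoothing term keeps every probability positive so the logarithms stay finite; because of it, the procedure maximizes a smoothed objective rather than the raw log-likelihood.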

Two problems
• How many clusters for a particular verb?
  – Human knowledge of the rough number of verb sense groups is instructive in unsupervised learning
  – Olga’s proposal
• How many features for a particular verb?
  – May not be a problem: hopefully the EM algorithm can do feature selection to some degree
  – However, a well-restricted feature set can reduce the model complexity (O(nm)) and alleviate the effect of noisy data
  – Borrow ideas from “Automatic Verb Classification based on Statistical Distribution of Argument Structure” (Paola Merlo and Suzanne Stevenson, 2001)
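To make the O(nm) point concrete, the sketch below prunes rare features before clustering and reports how many feature-by-cluster parameters remain. A raw frequency cutoff is only a simple stand-in chosen for illustration; it is not the linguistically motivated feature selection of Merlo and Stevenson. The function name `prune_features`, the threshold `min_total`, and the feature names are hypothetical.

```python
def prune_features(counts, names, min_total=3):
    """Keep only feature columns whose total count reaches min_total."""
    totals = [sum(col) for col in zip(*counts)]
    keep = [j for j, t in enumerate(totals) if t >= min_total]
    pruned = [[row[j] for j in keep] for row in counts]
    return pruned, [names[j] for j in keep]

# Toy data (invented): three verb instances, four candidate features.
counts = [[5, 4, 0, 1], [4, 5, 1, 0], [0, 0, 6, 0]]
names = ["subj_animate", "obj_np", "pp_with", "adv_particle"]  # hypothetical
pruned, kept = prune_features(counts, names)

m = 3  # assumed number of clusters
print("parameters before:", len(names) * m, "after:", len(kept) * m)
```

With n features and m clusters the model carries on the order of n times m feature parameters, so every dropped feature removes m of them.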

Plan
• Phase I --- Corpus analysis
  – Automatic and manual
  – Determine the range of the feature set for each verb
• Phase II --- Automatic verb sense grouping
  – Implement the EM clustering algorithm
  – Evaluate the performance
• Phase III --- Compare with other clustering methods
  – Ward’s minimum-variance method (Ward, 1963)
  – McQuitty’s similarity analysis (McQuitty, 1966)
  – Spectral clustering (Brew & Walde, 2002)
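As a point of reference for Phase III, here is a minimal sketch of Ward's minimum-variance method as a hard, agglomerative baseline. It repeatedly merges the two clusters whose union increases the within-cluster sum of squares the least. The function name `ward_cluster` and the toy data are assumptions made for illustration; the naive pair search is meant for small inputs only.

```python
def ward_cluster(points, k):
    """Agglomerative clustering with Ward's criterion until k clusters remain."""
    clusters = [[i] for i in range(len(points))]
    dim = len(points[0])

    def centroid(members):
        return [sum(points[i][d] for i in members) / len(members)
                for d in range(dim)]

    def merge_cost(a, b):
        # Increase in within-cluster sum of squares caused by merging a and b.
        ca, cb = centroid(a), centroid(b)
        sq = sum((x - y) ** 2 for x, y in zip(ca, cb))
        return len(a) * len(b) / (len(a) + len(b)) * sq

    while len(clusters) > k:
        _, i, j = min((merge_cost(clusters[i], clusters[j]), i, j)
                      for i in range(len(clusters))
                      for j in range(i + 1, len(clusters)))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Toy data (invented): six instances, four features.
points = [[5, 4, 0, 0], [4, 5, 0, 0], [6, 3, 1, 0],
          [0, 0, 5, 4], [0, 1, 4, 6], [0, 0, 6, 3]]
print(ward_cluster(points, k=2))
```

Unlike EM, this assigns each instance to exactly one cluster, so a comparison would need the soft assignments reduced to their most probable cluster first.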