Collocations and Terminology Vasileios Hatzivassiloglou University of Texas at Dallas
Collocations • Frank Smadja, “Retrieving Collocations from Text”, Computational Linguistics, 1993 • Recurrent combinations of words that cooccur more often than chance, often with non-compositional meaning • Technical and non-technical
Examples of collocations • The Dow Jones average of industrials • The Dow average • The Dow industrials • *The Jones industrials • The Dow Jones industrial • *The industrial Dow • *The Dow industrial
Collocation properties • Arbitrary (dialect dependent) – ride a bike, set the table • Domain dependent – dry suit, wet suit • Recurrent • Cohesive – Part of a collocation primes for the rest
Applications • Lexicography • Grammatical restrictions (compare with/to but associate with) • Generation • Translation
Types of collocations • Predicative relations – make a decision, hostile takeover – flexible (syntactic variability, intervening words) • Rigid word groups – over the counter market • Phrases with open slots – fluency in a domain
Issues in finding collocations • Possibly more than two words – Need measure that extends beyond the binary case • Possibly intervening words • Possibly morphological and syntactic variation • Semantic constraints (cf. doctors-dentists and doctors-hospitals)
Xtract stage one • For a given word, find all collocates at positions -5 to +5 • Three criteria: – strength (normalized frequency); 95% rejection vs. expected 68% under normal distribution – position histogram must not be flat – select peak from histogram
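The three criteria on this slide can be illustrated with a minimal sketch. It assumes a pre-tokenized corpus and treats strength as a z-score over collocate frequencies, spread as the variance of the ten-position histogram, and a peak as a position well above the histogram mean. The function names and the threshold parameters `k0`, `u0`, and `k1` are illustrative choices, not taken from the slides.

```python
from collections import defaultdict
from statistics import mean, pstdev

def collocate_histograms(tokens, target, window=5):
    """Count each collocate of `target` at relative positions -window..+window."""
    offsets = [d for d in range(-window, window + 1) if d != 0]
    hist = defaultdict(lambda: [0] * len(offsets))
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        for slot, d in enumerate(offsets):
            j = i + d
            if 0 <= j < len(tokens):
                hist[tokens[j]][slot] += 1
    return dict(hist), offsets

def stage_one(tokens, target, k0=1.0, u0=10.0, k1=1.0, window=5):
    """Return (collocate, strength, spread, peak_offsets) for surviving collocates."""
    hist, offsets = collocate_histograms(tokens, target, window)
    if not hist:
        return []
    freqs = {w: sum(h) for w, h in hist.items()}
    f_bar = mean(freqs.values())
    sigma = pstdev(freqs.values())
    results = []
    for w, h in hist.items():
        # Criterion 1: strength, the frequency normalized as a z-score.
        strength = (freqs[w] - f_bar) / sigma if sigma else 0.0
        # Criterion 2: spread, the variance of the position histogram
        # (a flat histogram has a spread near zero).
        p_bar = mean(h)
        spread = sum((p - p_bar) ** 2 for p in h) / len(h)
        if strength < k0 or spread < u0:
            continue
        # Criterion 3: keep only positions that peak above the mean.
        cutoff = p_bar + k1 * spread ** 0.5
        peaks = [d for d, p in zip(offsets, h) if p >= cutoff]
        results.append((w, strength, spread, peaks))
    return results
```

Calling `stage_one(tokens, "dow")` on a corpus where "jones" almost always follows "dow" should keep "jones" with a single peak at offset +1, while rare or evenly scattered neighbours are dropped.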
Xtract stage two • Start from word pairs • Look at each position in between, to the left, and to the right • Keep words that appear very often • If that fails, keep parts of speech that satisfy this criterion
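As a rough sketch of this stage, assume the concordance lines for a word pair have already been aligned so that each line is a same-length sequence of (word, part-of-speech) tuples. For every position, the sketch keeps the dominant word if it is frequent enough, falls back to the dominant part of speech, and otherwise leaves an open slot. The input layout, the `threshold` value, and the `"*"` slot marker are assumptions made for illustration.

```python
from collections import Counter

def stage_two(contexts, threshold=0.75):
    """Build a template from aligned contexts of (word, pos) tuples."""
    n = len(contexts)
    template = []
    for column in zip(*contexts):
        word, w_count = Counter(w for w, _ in column).most_common(1)[0]
        tag, t_count = Counter(t for _, t in column).most_common(1)[0]
        if w_count / n >= threshold:
            template.append(word)   # the same word appears very often here
        elif t_count / n >= threshold:
            template.append(tag)    # fall back to the part of speech
        else:
            template.append("*")    # nothing consistent: open slot
    return template
```

With contexts such as "the Dow Jones average rose", "the Dow Jones average fell", and so on, the result would be a template like `["the", "dow", "jones", "average", "VBD"]`.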
Xtract stage three • Applied to pairs of words • Requires (partial) parsing • Examines the syntactic relationship between words and keeps those pairs with consistent relationships (e.g., verb-object)
Evaluation • Ask lexicographer to evaluate output • 40% precision after stages one and two • 80% precision after stage three • 94% conditional recall
Terminology • Béatrice Daille, “Study and Implementation of Combined Techniques for Automatic Extraction of Terminology”, ACL Balancing Act workshop, 1994 • Terms refer to concepts • Terms key for populating a domain ontology • Terms are typically nominal compounds of certain structure, e.g., NN, N of N
Defining terms • Unique reference • Unique translation • Term extension by – modification (e.g., addition of an adjective) – substitution – extension of structure – coordination
Algorithm • Apply syntactic constraints to match pairs of words in a candidate term • Filter by application of an association measure • Measures examined: pointwise mutual information, Φ² (chi-square), log-likelihood ratio
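The three measures can all be computed from a 2×2 contingency table for a word pair (w1, w2). The sketch below assumes the usual cell layout: `a` counts pairs with both words, `b` pairs with w1 but not w2, `c` pairs with w2 but not w1, and `d` pairs with neither. The log-likelihood ratio is written in the common G² form (2 Σ O ln(O/E)); the cited paper may state it with a different constant factor, which would not change the ranking.

```python
import math

def association_scores(a, b, c, d):
    """Score a word pair from its 2x2 contingency counts."""
    n = a + b + c + d
    r1, r2 = a + b, c + d   # row totals: w1 present / absent
    c1, c2 = a + c, b + d   # column totals: w2 present / absent

    # Pointwise mutual information: log2 of observed over expected joint count.
    pmi = math.log2(a * n / (r1 * c1)) if a else float("-inf")

    # Phi-squared: the chi-square statistic divided by n.
    denom = r1 * r2 * c1 * c2
    phi2 = (a * d - b * c) ** 2 / denom if denom else 0.0

    # Log-likelihood ratio: 2 * sum over cells of O * ln(O / E).
    g = 0.0
    for obs, row, col in ((a, r1, c1), (b, r1, c2), (c, r2, c1), (d, r2, c2)):
        if obs:
            g += obs * math.log(obs * n / (row * col))
    return {"pmi": pmi, "phi2": phi2, "llr": 2 * g}
```

Candidates that pass the syntactic filter would then be ranked by one of these scores and cut off at a chosen threshold.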
Observations • Compare with reference list • Frequency a strong predictor • Log-likelihood ratio works best • Additional criteria: – diversity of the distribution of each word – distance between the two words (determines flexibility but not term status)
Justeson and Katz • Justeson and Katz, “Technical Terminology: Some Linguistic Properties and an Algorithm for Identification in Text”, Natural Language Engineering, 1995.
Analysis • Examined association measures • Well-known problems: – eliminating general-language constructs (e.g., collocations) – what to do with single word terms?
Observations • Frequency works well • But a stronger predictor is P(k>1) compared to P(k≥ 1) in the same document • Use syntactic patterns to propose terms, then check if they reappear in the same document • Require this across multiple documents
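The propose-then-check procedure on this slide can be sketched as follows. The sketch assumes each document is a list of (word, tag) tuples with a coarse tag set in which `"A"` marks adjectives and `"N"` marks nouns, and it uses a simplified pattern (a maximal run of adjectives and nouns ending in a noun, at least two words long) rather than the full pattern from the paper. The function names and the `min_docs` parameter are illustrative.

```python
import re
from collections import Counter

def candidate_terms(tagged):
    """Propose candidate terms from one document via a syntactic pattern."""
    # One character per token, so regex offsets are also token indices.
    codes = "".join(t if t in ("A", "N") else "x" for _, t in tagged)
    terms = []
    for m in re.finditer(r"[AN]+N", codes):
        words = tagged[m.start():m.end()]
        terms.append(" ".join(w.lower() for w, _ in words))
    return terms

def repeated_terms(documents, min_docs=1):
    """Keep candidates that occur more than once (k > 1) within a document,
    in at least `min_docs` documents."""
    doc_hits = Counter()
    for tagged in documents:
        for term, k in Counter(candidate_terms(tagged)).items():
            if k > 1:
                doc_hits[term] += 1
    return {term for term, hits in doc_hits.items() if hits >= min_docs}
```

Raising `min_docs` corresponds to the last bullet: a candidate must be repeated within each of several documents before it is accepted.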
Term Expansion • Jacquemin, Klavans, and Tzoukermann, “Expansion of Multi-Word Terms for Indexing and Retrieval Using Morphology and Syntax”, ACL 1997. • Need to expand a given list of terms, especially for scientific domains
Term variation • Syntactic (same words, different structure) • Morphosyntactic (derivational forms of words) • Semantic (synonyms are used) • In IR, normalization through stemming and removal of stop words
Approach • Process corpus matching new candidate terms to old ones via unification • Matching based on – inflectional morphology (transducer) – derivational morphology (rule-based) – syntactic transformations – additions of words
Results • Manual inspection of several thousand proposed terms • Precision of 89% • Effectiveness in indexing increases by a factor of three when using the variants (P/R from 99.7/72 to 97/93)