Pointwise Mutual Information and Its Developments
Jiayi Zhang & Ziming Pei

Contents
  • Pointwise Mutual Information
  • Positive Pointwise Mutual Information
  • Association Ratio

Pointwise Mutual Information (PMI)
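
For reference, the standard definition of PMI for two events x and y (for example, two words) compares their joint probability with what independence would predict. The base-2 logarithm, giving a score in bits, is an assumption here; other bases only rescale the value.

```latex
\mathrm{PMI}(x, y) = \log_2 \frac{P(x, y)}{P(x)\,P(y)}
```

The score is positive when x and y co-occur more often than chance, zero when they are independent, and negative when they co-occur less often than chance.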

Positive Pointwise Mutual Information (PPMI)

PPMI Formulas
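
A minimal sketch of how PMI and PPMI might be computed from raw counts, assuming the usual convention that PPMI replaces negative PMI values with zero. The function names and the count-based probability estimates are illustrative assumptions, not taken from the slides.

```python
import math


def pmi(count_xy, count_x, count_y, total):
    """PMI in bits, estimated from raw counts (hypothetical helper).

    Returns -inf when the pair never co-occurs, since log2(0) is undefined.
    """
    if count_xy == 0:
        return float("-inf")
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log2(p_xy / (p_x * p_y))


def ppmi(count_xy, count_x, count_y, total):
    """Positive PMI: negative (and -inf) scores are clipped to zero."""
    return max(0.0, pmi(count_xy, count_x, count_y, total))
```

For example, `ppmi(1, 100, 100, 1000)` is `0.0`, because the pair co-occurs less often than independence would predict and its PMI is negative.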

PMI Deficiency

Word Association and Association Ratio
  • Association ratio: a new objective measure based on PMI
      • Estimates word association norms directly from computer-readable corpora
      • More objective and less costly
      • Can be scaled up to provide robust estimates of word association norms for a large portion of the language
  • Word association:
      • Associated word pairs, e.g. “nurse” and “doctor”, “save” and “from”
      • An important factor in psycholinguistic research
      • Basis for a statistical description of interesting linguistic phenomena
          • Semantic relations of the doctor/nurse type (content word/content word)
          • Lexico-syntactic co-occurrence constraints between verbs and prepositions (content word/function word)

Practical Applications:
  • Constraining the language model for both speech recognition and optical character recognition (OCR)
  • Providing disambiguation cues for parsing highly ambiguous syntactic structures
  • Retrieving texts from large databases
  • Enhancing the productivity of computational linguists in compiling lexicons of lexico-syntactic facts
  • Enhancing the productivity of lexicographers in identifying normal and conventional usage

Information Theoretic Measure

Differences between association ratio and PMI
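
A rough sketch of the association ratio, assuming the slide follows Church and Hanks's formulation: the joint count f(x, y) only counts occurrences where y follows x within a window of w words, so the measure is order-sensitive, unlike symmetric PMI. The function name, the default window of 5, and the simple count/N probability estimates are illustrative assumptions, and the sketch applies no further normalization for window size.

```python
import math
from collections import Counter


def association_ratios(tokens, window=5):
    """Map each ordered pair (x, y) to its association ratio in bits.

    A pair is counted only when y appears after x within `window` words,
    so (x, y) and (y, x) can receive different scores.
    """
    n = len(tokens)
    unigram = Counter(tokens)
    ordered_pairs = Counter()
    for i, x in enumerate(tokens):
        # y ranges over the next (window - 1) tokens following x.
        for y in tokens[i + 1 : i + window]:
            ordered_pairs[(x, y)] += 1

    ratios = {}
    for (x, y), f_xy in ordered_pairs.items():
        p_xy = f_xy / n
        p_x = unigram[x] / n
        p_y = unigram[y] / n
        ratios[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return ratios
```

On the toy sequence `"save the forest from fire".split()`, the result has an entry for `("save", "from")` but none for `("from", "save")`, which illustrates the asymmetry; with `window=2` only adjacent pairs are counted.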

Lexico-Syntactic Regularities
  • Reveal word associations between
      • Nouns and nouns, e.g. “nurse” and “doctor”
      • Verbs and typical arguments/adjuncts, e.g. “set … off”, “set … to”
  • Preprocess corpus and manipulate inventory of tokens
      • For measuring syntactic constraints
          • Include some part-of-speech information
          • Exclude much of the internal structure of noun phrases
      • For other purposes
          • Tag items and/or phrases with semantic labels

Application in Lexicography
  • Obtain better phrase coverage at less cost, by identifying relatively rare phrases
      • E.g. “save … from”: 21 times more likely than chance
      • Not included in many dictionaries
  • Identify semantic classes by providing a powerful set of suggestions for what needs to be accounted for in choosing a set of semantic tags
      • E.g. large value for “save … forests” in the association ratio table (output)
      • Tagged concordance lines: the trend to save give them money to save the forests[ENV] save the dogs[ANIMAL] from being destroyed[DESTRUCT]
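
As a quick worked check, assuming “21 times more likely than chance” refers to the ratio P(x, y) / (P(x) P(y)), the corresponding association ratio in bits is the base-2 logarithm of that factor. The variable names are illustrative.

```python
import math

# Assumed interpretation: observed co-occurrence is 21x the chance rate.
times_more_likely = 21
association_ratio_bits = math.log2(times_more_likely)
print(round(association_ratio_bits, 2))  # 4.39
```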

Thank you!
