Pointwise Mutual Information and Its Developments
Jiayi Zhang & Ziming Pei
Contents
- Pointwise Mutual Information
- Positive Pointwise Mutual Information
- Association Ratio
Pointwise Mutual Information (PMI)
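As a minimal sketch of the standard definition, PMI compares the observed joint probability of a pair with the product of its marginal probabilities and takes the base-2 logarithm of that ratio. The function name `pmi_table` and the toy pairs are illustrative assumptions, not taken from the slides, and probabilities are plain relative frequencies.

```python
import math
from collections import Counter


def pmi_table(pairs):
    """Return {(x, y): PMI in bits} for a list of observed (x, y) pairs.

    Probabilities are estimated as relative frequencies over the pair list
    (an assumption made for this sketch).
    """
    pair_counts = Counter(pairs)
    x_counts = Counter(x for x, _ in pairs)
    y_counts = Counter(y for _, y in pairs)
    n = len(pairs)
    table = {}
    for (x, y), c in pair_counts.items():
        p_xy = c / n
        p_x = x_counts[x] / n
        p_y = y_counts[y] / n
        # PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) )
        table[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return table


# Hypothetical toy data: each tuple is one observed co-occurrence.
toy_pairs = [
    ("nurse", "doctor"),
    ("nurse", "doctor"),
    ("save", "from"),
    ("nurse", "from"),
]
scores = pmi_table(toy_pairs)
```

In this toy data, pairs that co-occur more often than their marginals predict receive positive scores, and pairs that co-occur less often receive negative ones.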
Positive Pointwise Mutual Information (PPMI)
PPMI Formulas
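A minimal sketch of the usual PPMI definition, which replaces every negative PMI value with zero: PPMI(x, y) = max(PMI(x, y), 0). The function name `ppmi_matrix`, the count-matrix layout, and the toy counts are assumptions for illustration. Cells with a zero count, where PMI would be negative infinity, are also mapped to zero.

```python
import math


def ppmi_matrix(counts):
    """Compute PPMI from a co-occurrence count matrix.

    counts[i][j] is the number of times word i occurs with context j.
    """
    total = sum(sum(row) for row in counts)
    row_sums = [sum(row) for row in counts]
    col_sums = [sum(col) for col in zip(*counts)]
    result = []
    for i, row in enumerate(counts):
        out_row = []
        for j, c in enumerate(row):
            if c == 0:
                # PMI is -infinity here; PPMI clips it to 0.
                out_row.append(0.0)
            else:
                # P(x, y) / (P(x) P(y)) simplifies to c * total / (row * col).
                pmi = math.log2(c * total / (row_sums[i] * col_sums[j]))
                out_row.append(max(pmi, 0.0))
        result.append(out_row)
    return result


# Hypothetical 2x2 count matrix.
toy_counts = [[2, 1], [0, 1]]
ppmi = ppmi_matrix(toy_counts)
```

Clipping at zero discards the "less often than chance" information, which is usually estimated from very few observations and is therefore unreliable.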
PMI Deficiency
Word Association and Association Ratio
Association ratio: a new objective measure based on PMI
- Estimates word association norms directly from computer-readable corpora
- More objective and less costly than eliciting norms from human subjects
- Can be scaled up to provide robust estimates of word association norms for a large portion of the language
Word association:
- Associated word pairs, e.g. “nurse” and “doctor”, “save” and “from”
- An important factor in psycholinguistic research
- Basis for a statistical description of interesting linguistic phenomena
  - Semantic relations of the doctor/nurse type (content word/content word)
  - Lexico-syntactic co-occurrence constraints between verbs and prepositions (content word/function word)
Practical Applications:
- Constraining the language model both for speech recognition and optical character recognition (OCR)
- Providing disambiguation cues for parsing highly ambiguous syntactic structures
- Retrieving texts from large databases
- Enhancing the productivity of computational linguists in compiling lexicons of lexicosyntactic facts
- Enhancing the productivity of lexicographers in identifying normal and conventional usage
Information Theoretic Measure
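For reference, the information-theoretic quantity behind the association ratio can be written as below, together with the three regimes usually used to read it. The symbols $x$, $y$, and $I(x, y)$ are the conventional ones and are assumed here, since the slide body did not survive extraction.

```latex
I(x, y) = \log_2 \frac{P(x, y)}{P(x)\,P(y)}
\qquad
\begin{cases}
I(x, y) \gg 0 & \text{genuine association: } P(x, y) \gg P(x)\,P(y) \\
I(x, y) \approx 0 & \text{no interesting relationship: } P(x, y) \approx P(x)\,P(y) \\
I(x, y) \ll 0 & \text{complementary distribution: } P(x, y) \ll P(x)\,P(y)
\end{cases}
```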
Differences between association ratio and PMI
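One difference that is commonly noted is that the association ratio counts f(x, y) only when x is followed by y within a window of w words, so it encodes word order and is not symmetric, whereas PMI is symmetric in x and y. The sketch below assumes that joint counts are normalized by the corpus size N and that the window covers x plus the next w - 1 tokens. The function name, the default window of 5, and the toy sentence are illustrative assumptions.

```python
import math
from collections import Counter


def association_ratio(tokens, x, y, window=5):
    """Association ratio in bits, or None if x is never followed by y.

    f(x, y) counts occurrences of y among the (window - 1) tokens that
    follow each occurrence of x, so the order of x and y matters.
    """
    n = len(tokens)
    unigram = Counter(tokens)
    f_xy = 0
    for i, tok in enumerate(tokens):
        if tok == x:
            f_xy += tokens[i + 1 : i + window].count(y)
    if f_xy == 0:
        return None
    p_xy = f_xy / n
    p_x = unigram[x] / n
    p_y = unigram[y] / n
    return math.log2(p_xy / (p_x * p_y))


# Hypothetical toy corpus.
tokens = "they save the forests from fire and save the dogs from harm".split()
forward = association_ratio(tokens, "save", "from")
backward = association_ratio(tokens, "from", "save")
```

On this toy corpus `forward` is larger than `backward`, because "save" is followed by "from" within the window twice while "from" is followed by "save" only once.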
Lexico-Syntactic Regularities
- Reveal word associations between
  - Nouns and nouns, e.g. “nurse” and “doctor”
  - Verbs and typical arguments/adjuncts, e.g. “set … off”, “set … to”
- Preprocess the corpus and manipulate the inventory of tokens
  - For measuring syntactic constraints
    - Include some part-of-speech information
    - Exclude much of the internal structure of noun phrases
  - For other purposes
    - Tag items and/or phrases with semantic labels
Application in Lexicography
- Obtain better phrase coverage at less cost by identifying relatively rare phrases
  - E.g. “save … from”: 21 times more likely than chance
  - Not included in many dictionaries
- Identify semantic classes by providing a powerful set of suggestions for what needs to be accounted for in choosing a set of semantic tags
  - E.g. a large value for “save … forests” in the association ratio table (output)
  - the trend to save give them money to save the forests[ENV] save the dogs[ANIMAL] from being destroyed[DESTRUCT]
Thank you!