From Lexical Semantics to Knowledge Systems: How to infer cognitive systems from linguistic data

Chu-Ren Huang
Academia Sinica
http://cwn.ling.sinica.edu.tw/huang.htm

Outline

• A generative lexicalist approach to grammar
• From distributional data to the basic contrasts in a semantic field (or conceptual motivation for corpus distribution)
• Lexical distribution as cognitive model
• Radical as ontology
• Language as a knowledge system

2007.03.09 ISLCC Chu-Ren Huang

Introduction: A generative lexicalist approach to grammar

• Back to Aristotle (through Pustejovsky)
• How do we know, and what do we know: through what we experience
• Qualia Structure: what we experience
  • Formal
  • Constitutive
  • Agentive
  • Telic
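As a rough illustration that is not part of the slides, the four qualia roles can be held in a small record. The class name `QualiaStructure`, the helper `describe`, and the sample entry for "novel" are assumptions made for this sketch; the role values follow the commonly cited example from Pustejovsky's Generative Lexicon.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualiaStructure:
    """Hypothetical container for the four qualia roles of a lexical item."""
    formal: str        # what kind of thing it is
    constitutive: str  # what it consists of
    agentive: str      # how it comes into being
    telic: str         # what it is for


# Illustrative entry only; it does not come from the slides.
novel = QualiaStructure(
    formal="book",
    constitutive="narrative",
    agentive="write",
    telic="read",
)


def describe(name: str, q: QualiaStructure) -> str:
    """Spell out the four roles of one lexical item in a single line."""
    return (
        f"{name}: formal={q.formal}, constitutive={q.constitutive}, "
        f"agentive={q.agentive}, telic={q.telic}"
    )


print(describe("novel", novel))
```

Keeping the roles as named fields rather than a free-form dictionary makes it obvious when an entry covers only some of the four aspects.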

Linguistics: What do we know about language?

• Qualia Structure of the Theory of Language
  • Formal: from Sign to Structure, Structuralism
  • Constitutive: from IA to IP, rule- and transformation-based theories
  • Agentive: UG approaches
  • Telic: function- and use-based theories
• We need a linguistic theory that accounts for the complete knowledge structure, not just its individual aspects

Towards Language as a Knowledge System

• Atoms of knowledge: lexicalized concepts
• ‘Frames’ of knowledge: lexical semantic relations
• Instantiation of knowledge: corpus
• A lexicon-driven, corpus-based approach to infer the knowledge structure underlying linguistic structure

Three Studies

• The semantic field of emotion (elaborated from Chang et al. 2000)
• Lexicalized Model of Cognition (Huang and Hong 2005)
• Conventionalized Ontology in Writing (Chou and Huang 2005)

Semantic Field of Verbs of Emotion

• Issues: Methodological
  • Interpretation of distributional data
  • Measuring and interpreting lexical choices
• Issues: Linguistic
  • Archetype via contrast
  • Why change-of-state:
    • Saliency and relevance to human cognition

Distributional Contrast of Verbs of Emotion

高興 gao1 xing4 (Type A) vs. 快樂 kuai4 le4 (Type B)

• Category: intrans. vs. trans. state verb
• Function: more predicative vs. more nominalized
• Collocation: CAUSE complement vs. no CAUSE
• Collocation: perfect aspect vs. no -le
• Collocation (modified nouns): eventive

A Natural Dichotomy of Verbs of Emotion

Subtype      Type A                     Type B
Happiness    gao1 xing4 高興 (669)      kuai4 le4 快樂 (942)
             kai1 xin1 開心 (152)       yu2 kuai4 愉快 (271)
             tong4 kuai4 痛快 (40)      xi3 yue4 喜悅 (156)
                                        huan1 le4 歡樂 (141)
                                        huan1 xi3 歡喜 (107)
                                        kuai4 huo2 快活 (48)
Depression   nan2 guo4 難過 (232)       tong4 ku3 痛苦 (443)
             tong4 xin1 痛心 (48)       chen2 zhong4 沈重 (83)
                                        ju3 sang4 沮喪 (62)

A Natural Dichotomy of Verbs of Emotion

Subtype      Type A                     Type B
Sadness      shang1 xin1 傷心 (134)     bei1 shang1 悲傷 (52)
Regret       hou4 hui3 後悔 (102)       yi2 han4 遺憾 (198)
Anger        sheng1 qi4 生氣 (307)      fen4 nu4 憤怒 (112)
                                        qi4 fen4 氣憤 (49)
Fear         hai4 pa4 害怕 (261)        kong3 ju4 恐懼 (149)
                                        wei4 ju4 畏懼 (40)
Worry        dan1 xin1 擔心 (609)       fan2 nao3 煩惱 (199)
             dan1 you1 擔憂 (64)        ku3 nao3 苦惱 (45)
             you1 xin1 憂心 (46)

Some Observations

• Each of the seven kinds of emotion verbs shows the same dichotomy:
  • change-of-state vs. homogeneous state
• Each side of the dichotomy is dominated by one verb
  • in terms of frequency and prototypicality of meaning

Semantic Field and Contrast Set

• A semantic field consists of a unique covering term and a number of contrast sets (paraphrase of Grandy 1992).
  • The unique covering term may or may not occur in a contrast set.
  • All other members of the semantic field must be determined by entering into a contrast set relation with a known member of the semantic field.

Observation: Chinese Defines a Property by Contrast

• qing1 zhong4: light + heavy = weight
• da4 xiao3: big + small = size
• gao1 ai3: tall + short = height
• shi4 fei1 / dui4 cuo4: right + wrong = affair
• xiong1 di4: elder + younger = brothers
• zang1 pi3: praise + attack = criticize
• hu1 xi1: exhale + inhale = breathe

Our Proposal

• T (the covering term) is either a single term or a privileged contrast set, called a contrast pair.
• When T is a contrast pair, the semantic field can be defined by the shared semantic properties of the pair.
• The fundamental contrast relation defining a contrast pair may be shared by a superset of semantic fields.

Our Proposal

• T must enter contrast set relations with other members of the semantic field, although the contrast relation may be weakened to a marked/unmarked contrast.
• The set of fundamental contrast relations is shared by all semantic fields. [cf. Semantic relations]

Patterns of Distribution as Representational Clues

• Numbers don’t lie
• The pattern itself is proof that generalizations based on a single lexical item are replicable.
• The uniformity and universality of the pattern across a broad but contiguous semantic field strongly favor a conceptual motivation.

Functional Distribution of Type A Verbs of Emotion

Type A         Pred.     Nom.     N.M.
gao1 xing4     85.05%    0.30%    1.35%
nan2 guo4      86.64%    2.16%    2.59%
shang1 xin1    76.12%    2.99%    11.19%
hou4 hui3      94.12%    0.00%    2.94%
sheng1 qi4     87.82%    0.00%    4.06%
hai4 pa4       93.10%    3.07%    2.68%
dan1 xin1      96.72%    1.97%    1.31%
Average        88.51%    1.50%    3.73%

Functional Distribution of Type B Verbs of Emotion

Type B         Pred.     Nom.      N.M.
kuai4 le4      37.79%    26.43%    24.84%
tong4 ku3      25.73%    45.60%    20.54%
bei1 shang1    40.38%    28.85%    19.23%
yi2 han4       34.85%    33.84%    3.54%
fen4 nu4       28.57%    37.50%    17.86%
kong3 ju4      23.49%    68.46%    7.38%
fan2 nao3      24.12%    69.85%    6.03%
Average        30.70%    44.36%    14.21%

Preference of A verbs over B verbs in Predicative Uses

Verbs                Pred.-Freq.   A/B Ratio
gaoxing/kuaile       569/356       1.59
nanguo/tongku        201/114       1.76
shangxin/beishang    102/21        4.86
houhui/yihan         96/69         1.39
shengqi/fennu        238/32        7.44
haipa/kongju         243/35        6.94
danxin/fannao        589/48        12.27
Average ratio                      5.62

Preference of B verbs over A verbs in Nominal Uses

Verbs                Nom.-Freq.   B/A Ratio
gaoxing/kuaile       11/483       43.91
nanguo/tongku        11/293       26.64
shangxin/beishang    19/25        1.32
houhui/yihan         3/74         24.67
shengqi/fennu        11/62        5.64
haipa/kongju         15/113       7.53

Summary of the Likelihood Ratio Data

• A clear lexical preference between near-synonyms is established.
• Predicative preference and deverbal preference tend to compensate for each other to establish contrast.
• Overall, the deverbal preference seems to be the defining feature of the dichotomy. [Note that these are all verbs.]
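The ratios behind this summary are plain frequency ratios, so they can be recomputed from the counts in the predicative and nominal tables. The sketch below is only an illustration: the names `PRED`, `NOM`, and `preference_ratios` are made up for the example, each count pair is read as (Type A, Type B), and the pair with no nominal counts on the slide is left without a nominal ratio.

```python
# Predicative-use counts per near-synonym pair, as (Type A, Type B).
PRED = {
    "gaoxing/kuaile": (569, 356),
    "nanguo/tongku": (201, 114),
    "shangxin/beishang": (102, 21),
    "houhui/yihan": (96, 69),
    "shengqi/fennu": (238, 32),
    "haipa/kongju": (243, 35),
    "danxin/fannao": (589, 48),
}

# Nominal-use counts per pair, as (Type A, Type B).
# danxin/fannao is absent because the slide gives no nominal counts for it.
NOM = {
    "gaoxing/kuaile": (11, 483),
    "nanguo/tongku": (11, 293),
    "shangxin/beishang": (19, 25),
    "houhui/yihan": (3, 74),
    "shengqi/fennu": (11, 62),
    "haipa/kongju": (15, 113),
}


def preference_ratios(pred, nom):
    """Return {pair: (A/B predicative ratio, B/A nominal ratio or None)}."""
    rows = {}
    for pair, (a_pred, b_pred) in pred.items():
        a_over_b = a_pred / b_pred
        b_over_a = None
        if pair in nom:
            a_nom, b_nom = nom[pair]
            b_over_a = b_nom / a_nom
        rows[pair] = (a_over_b, b_over_a)
    return rows


ratios = preference_ratios(PRED, NOM)
for pair, (p, n) in ratios.items():
    n_text = "n/a" if n is None else f"{n:.2f}"
    print(f"{pair:20s} pred A/B = {p:.2f}   nom B/A = {n_text}")
```

Printing both ratios side by side makes the compensation pattern visible: pairs with a modest predicative preference for the Type A verb tend to show a strong nominal preference for the Type B verb.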

Deverbal Use Frequency of Type A Verbs

tong4 kuai4 痛快     0.00%
gao1 xing4 高興      1.65%
hou4 hui3 後悔       2.94%
dan1 xin1 擔心       3.28%
sheng1 qi4 生氣      3.58%
tong4 xin1 痛心      4.17%
nan2 guo4 難過       4.75%
hai4 pa4 害怕        5.75%
you1 xin1 憂心       6.52%
kai1 xin1 開心       7.89%
dan1 you1 擔憂       9.38%
shang1 xin1 傷心     14.18%

Deverbal Use Frequency of Type B Verbs

qi4 fen4 氣憤        24.49%
wei4 ju4 畏懼        25.00%
yu2 kuai4 愉快       29.89%
huan1 xi3 歡喜       30.84%
kuai4 huo2 快活      33.33%
ju3 sang4 沮喪       33.87%
yi2 han4 遺憾        37.38%
xi3 yue4 喜悅
chen2 zhong4 沈重    48.19%
kuai4 le4 快樂       51.27%
fen4 nu4 憤怒        55.36%
tong4 ku3 痛苦       66.14%
kong3 ju4 恐懼       75.84%
fan2 nao3 煩惱       75.88%

Deverbal Use Frequency as a Benchmark for Type A/B Verbs

• More than 10 percentage points separate the lowest Type B verb (qi4 fen4 氣憤, 24.49%) from the highest Type A verb (shang1 xin1 傷心, 14.18%).
  • The smallest gap between a competing pair is almost 34 percentage points (shang1 xin1 傷心 14.18% vs. bei1 shang1 悲傷 48.08%).
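The benchmark can be checked directly against the deverbal frequencies listed on the slides. In the sketch below the dictionary names, the `classify` helper, and the choice of a cut-off halfway between the two groups are assumptions made for illustration; the slides state only that the groups are separated by more than 10 points. 喜悅 is left out because no value is given for it.

```python
# Deverbal use frequencies (%) of Type A verbs, as listed on the slides.
TYPE_A = {
    "tongkuai": 0.00, "gaoxing": 1.65, "houhui": 2.94, "danxin": 3.28,
    "shengqi": 3.58, "tongxin": 4.17, "nanguo": 4.75, "haipa": 5.75,
    "youxin": 6.52, "kaixin": 7.89, "danyou": 9.38, "shangxin": 14.18,
}

# Deverbal use frequencies (%) of Type B verbs, as listed on the slides.
TYPE_B = {
    "qifen": 24.49, "weiju": 25.00, "yukuai": 29.89, "huanxi": 30.84,
    "kuaihuo": 33.33, "jusang": 33.87, "yihan": 37.38, "beishang": 48.08,
    "chenzhong": 48.19, "kuaile": 51.27, "fennu": 55.36, "tongku": 66.14,
    "kongju": 75.84, "fannao": 75.88,
}

highest_a = max(TYPE_A.values())
lowest_b = min(TYPE_B.values())
gap = lowest_b - highest_a  # distance between the two groups

# Assumption: place the cut-off halfway between the two groups.
THRESHOLD = (highest_a + lowest_b) / 2


def classify(deverbal_pct: float) -> str:
    """Guess the verb type from its deverbal use frequency (in percent)."""
    return "A" if deverbal_pct < THRESHOLD else "B"


# Gap within the closest competing pair named on the slide.
pair_gap = TYPE_B["beishang"] - TYPE_A["shangxin"]

print(f"gap between groups: {gap:.2f} points")
print(f"shangxin vs. beishang: {pair_gap:.2f} points")
```

Any cut-off inside the empty band between the two groups would classify the listed verbs the same way; the midpoint is simply the least arbitrary choice.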

The Noisy-Channel Model of the Theory of Communication

• Our Proposal
  • Language is an information-based communication system.
  • An optimized communication system is one in which all redundant signs (for one piece of information) also minimally differentiate another piece of information.

Re-Interpretation of the Data

• Members of the same semantic field in general, and a near-synonym pair in particular, are competing signs to express information pertaining to the field.
• A sign is chosen to represent a piece of information because it expresses that piece of information most effectively.

Re-Interpretation of the Data

• This preference for expressing certain information can be lexicalized to establish logical implicature.
• Once that lexical preference is established, linguists can use the preferential ratio to infer the lexical information being carried.

Lexical distribution as cognitive model: Senses

• A further step based on properties defined by contrast, with a focus on how senses are represented
• Study the sense of hearing and the basic property term sheng-yin ‘sound/voice’
• We (Huang and Hong 2005) look at the distribution of these two lexical elements in all derived words

聲 Sheng vs. 音 Yin

• 聲樂 vs. 音樂: vocal music vs. music
• *噪聲 vs. 噪音: noise
• 大聲 vs. *大音: loudly
• 發聲 vs. 發音: make a sound vs. articulate
• 高聲 vs. 高音: loudly vs. high pitch

NN Compound: N + *

聲 sheng (+ source)    音 yin (+ quality)
歌                     嗓
掌                     鄉
人                     喉
腳步                   裝飾
風                     尾
鐘                     哨
水                     …
…

The Semantic Contrast

• 聲
  • Production of sounds
  • Often refers to the manner in which a sound was made or to its source
• 音
  • Perception of a sound
  • Often refers to the sound quality or how a sound is perceived by an intelligent agent

A Lexicalized Schema for Hearing in Chinese (from Huang and Hong 2005)

Process of Hearing

聲 sheng                          音 yin
instigator                        experiencer
starting point, source            endpoint, result (goal)
active completion (production)    passive reception

A Lexicalized Schema for Sense in Chinese

[Diagram: Process of Sensation. Labels: word 1; word 0; perceptual reception (sensation); experiencer; goal/perception: experience of sense]

Lexical Semantic Analysis (7): Diagram of the relations among vision (視覺), touch (觸覺), and hearing (聽覺)

Contrast of cognitive features

Word / Feature    Instigator of sensation            Experiencer of sensation
                  (instigator of action), marked     (experiencer of sensation), shared and unmarked
Hearing (聽覺)    聲 (production)                    音 (perception)
Vision (視覺)     看 (inchoative)                    見 (bounded result)
Touch (觸覺)      觸 (activity)                      摸 (incremental theme)

Radical as ontology

• The Chinese writing system has been conventionalized and shared for over three thousand years
• It has also been adopted by typologically very different languages
• If the radical system is a system of conceptualization, then it is the most robust and most widely used ontology

Example: the horse radical (from Chou 2005)

• 馬 is a semantic symbol of horse
• Examples:
  • 驩: 馬名 a kind of horse
  • 驫: 眾馬 horses
  • 騎: 騎馬 riding a horse
  • 驍: 良馬 a good horse
  • 驚: 馬驚 a scared horse

Research Tool and Issue

• Formal Description
  • IEEE SUMO (Suggested Upper Merged Ontology)
    http://www.ontologyportal.org
    http://BOW.sinica.edu.tw
• Issue: Why are Chinese radicals usually considered an imperfect and misleading taxonomy?

Knowledge System of the Radical 艸/艹 (Grass, for Plants)

• Plants (IS-A): 蕉蘭芒蒙菌蔓 苦菊茱范荷茅 蕈蔚菲草
• Parts (constitutive): 萌莖芽茄 苗蓮葉
• Description (descriptive/formal): 茲蒼芳落 茸茂荒薄 芬蒸莊
• Usage (telic): 蕃藥蔬菜薪 苑藩藉茭
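One way to make this grouping concrete is a small lookup from character to knowledge role. The sketch below rests on an assumption about how the slide's labels pair up: Plants with IS-A, Parts with constitutive, Description with descriptive/formal, and Usage with telic. The names `GRASS_RADICAL` and `role_of` are hypothetical.

```python
# Characters with the 艸/艹 radical, grouped by the role the radical plays.
# Assumption: the label-to-role pairing is inferred from the slide layout.
GRASS_RADICAL = {
    "plants (IS-A)": "蕉蘭芒蒙菌蔓苦菊茱范荷茅蕈蔚菲草",
    "parts (constitutive)": "萌莖芽茄苗蓮葉",
    "description (formal)": "茲蒼芳落茸茂荒薄芬蒸莊",
    "usage (telic)": "蕃藥蔬菜薪苑藩藉茭",
}


def role_of(char: str):
    """Return the role under which a character is listed, or None."""
    for role, chars in GRASS_RADICAL.items():
        if char in chars:
            return role
    return None


for sample in ("葉", "藥", "草", "馬"):
    print(sample, "->", role_of(sample))
```

Read this way, the radical does not encode a single IS-A taxonomy; the same radical links characters to the plant concept through several different relations.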

Conclusion I: Corpus as Evidence

• Core issue of a scientific explanation of language and cognition
• Language as a living organism allows variations and adaptations (the evolutionary view)
• The coherence of language is the shared tendency of all users
• Distributional data in corpora lead to the discovery of these shared tendencies
  • This should be more valuable than incidental examples

Conclusion II: Language as a Knowledge System

• The generative lexicalist approach to grammar: language as a knowledge system
• All aspects of language are projected from a unified knowledge system
• Lexical semantics based on distributional data offers the best window to the underlying knowledge system of language