REFERENTIAL CHOICE AS A PROBABILISTIC MULTI-FACTORIAL PROCESS Andrej A. Kibrik, Grigorij B. Dobrov, Natalia V. Loukachevitch, Dmitrij A. Zalmanov aakibrik@gmail.com
Referential choice in discourse § When a speaker needs to mention (or refer to) a specific, definite referent, s/he chooses between several options, including: § Full noun phrase (NP) • Proper name (e.g. Pushkin) • Common noun (with modifiers) = definite description (e.g. the poet) § Reduced NP, particularly a third person pronoun (e.g. he) 2
Example § Tandy said consumer electronics sales at its Radio Shack stores have been slow, partly because a lack of hot, new products. Radio Shack continues to be lackluster, said Dennis Telzrow, analyst with Eppler, Guerin Turner in Dallas. He said Tandy has done <. . . > § Labels in the original figure: antecedent, anaphors, coreference, pronoun, full NP ØHow is this choice made? 3
Why is this important? § Reference is among the most basic cognitive operations performed by language users § It is the linguistic representation of what is known as attention and working memory in psychology § Reference constitutes a lion’s share of all information in natural communication § Consider text manipulation according to the method of Biber et al. 1999: 230-232 4
Referential expressions marked in green § Tandy said consumer electronics sales at its Radio Shack stores have been slow, partly because a lack of hot, new products. Radio Shack continues to be lackluster, said Dennis Telzrow, analyst with Eppler, Guerin Turner in Dallas. He said Tandy has done <. . . > 5
Referential expressions removed § Tandy said consumer electronics sales at its Radio Shack stores have been slow, partly because a lack of hot, new products. Radio Shack continues to be lackluster, said Dennis Telzrow, analyst with Eppler, Guerin Turner in Dallas. He said Tandy has done <. . . > 6
Referential expressions kept § Tandy said consumer electronics sales at its Radio Shack stores have been slow, partly because a lack of hot, new products. Radio Shack continues to be lackluster, said Dennis Telzrow, analyst with Eppler, Guerin Turner in Dallas. He said Tandy has done <. . . > 7
Plan of talk § I. Referential choice as a multi-factorial process § II. The RefRhet corpus and the machine learning-based approach § III. The probabilistic character of referential choice 8
Multi-factorial character of referential choice § Many factors of referential choice § Distance to antecedent • Along the linear discourse structure • Along the hierarchical discourse structure § Antecedent role § Referent animacy § Protagonisthood § ... § None of these factors alone can explain referential choice 9
Factor integration § At every point in discourse, factors are somehow summed and give rise to an integral characterization – the referent’s activation score § Activation score is the referent’s status with respect to the speaker’s working memory § Activation score predetermines referential choice § Low => full NP § Medium => full or reduced NP § High => reduced NP 10
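The summation idea can be sketched in a few lines of Python. This is only an illustration: the factor names, the weights and the two cut-off points are invented here and are not the values used in Kibrik 1999.

```python
# Hypothetical weights: positive values raise activation, negative values lower it.
ILLUSTRATIVE_WEIGHTS = {
    "antecedent_is_subject": 0.4,
    "referent_is_animate": 0.2,
    "referent_is_protagonist": 0.3,
    "long_linear_distance": -0.5,
    "long_hierarchical_distance": -0.3,
}

# Hypothetical cut-off points between low, medium and high activation.
LOW_MAX = 0.3
HIGH_MIN = 0.7


def activation_score(active_factors):
    """Sum the weights of the factors that hold at this point in discourse."""
    return sum(ILLUSTRATIVE_WEIGHTS[name] for name in active_factors)


def referential_options(score):
    """Map an activation score to the referential options it allows."""
    if score < LOW_MAX:
        return ["full NP"]
    if score < HIGH_MIN:
        return ["full NP", "reduced NP"]
    return ["reduced NP"]
```

For example, `referential_options(activation_score(["antecedent_is_subject", "referent_is_animate"]))` falls into the medium band, where both a full and a reduced NP are possible.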
Multi-factorial model of referential choice § Activation factors (Kibrik 1999): various properties of the referent or discourse context => Referent’s activation score => Referential choice 11
Modeling multi-factorial processes: machine learning-based methods § Neural networks approach (Gruening and Kibrik 2005) § Machine learning algorithm • Automatic selection of factors’ weights • Automatic reduction of the number of factors («pruning») § However: • Small data set • Single method of machine learning • Low interpretability of results ØHence a new study § Large corpus § Implementation of several machine learning methods § Statistical model of referential choice 12
The RefRhet corpus § English § Business prose § Initial material – the RST Discourse Treebank § Annotated for hierarchical discourse structure § 385 articles from the Wall Street Journal § The added component – referential annotation ØThe RefRhet corpus § Over 30 000 referential expressions 13
Example of a hierarchical graph 14
Scheme of referential annotation § The MMAX2 program § Krasavina and Chiarcos 2007 § All markables are annotated, including: § Referential expressions § Their antecedents § Coreference relations are annotated § Features of referents and context that can potentially be factors of referential choice are annotated 15
16
Work on referential annotation § O. Krasavina § A. Antonova § D. Zalmanov § A. Linnik § M. Khudyakova § Students of the Department of Theoretical and Applied Linguistics, MSU 17
Current state of the RefRhet referential annotation § 2/3 completed § Further results are based on the following data: § 247 texts § 110 thousand words § 26 024 markables • 7097 proper names • 8560 definite descriptions • 1797 third person pronouns § 3756 reliable pairs «anaphor – antecedent» • Proper names — 1623 (43%) • Definite descriptions — 971 (26%) • Pronouns — 1162 (31%) 18
Factors of referential choice § Properties of the referent: § Animacy § Protagonisthood § Properties of the antecedent: § Type of syntactic phrase (phrase_type) § Grammatical role (gramm_role) § Form of referential expression (np_form, def_np_form) § Whether it belongs to direct speech or not (dir_speech) 19
Factors of referential choice § Properties of the anaphor: § First vs. nonfirst mention in discourse (referentiality) § Type of syntactic phrase (phrase_type) § Grammatical role (gramm_role) § Whether it belongs to direct speech or not (dir_speech) § Distance between the anaphor and the antecedent: § Distance in words § Distance in markables § Linear distance in clauses § Hierarchical distance in elementary discourse units 20
Goals for the machine learning-based study § Dependent variable: § Form of referential expression (np_form) § Binary prediction: § Full NP vs. pronoun § Three-way prediction: § Definite description vs. proper name vs. pronoun § Accuracy maximization: § Ratio of correct predictions to the overall number of instances 21
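The accuracy measure and the relation between the two prediction tasks can be written out as a short Python sketch. The label strings and the toy data are invented for illustration; they are not the corpus's actual np_form values.

```python
def to_binary(label):
    """Collapse the three-way label set into full NP vs. pronoun."""
    return "pronoun" if label == "pronoun" else "full_np"


def accuracy(predicted, actual):
    """Ratio of correct predictions to the overall number of instances."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty sample")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Toy data, invented for illustration only.
gold = ["proper_name", "pronoun", "definite_description", "pronoun"]
pred = ["definite_description", "pronoun", "definite_description", "proper_name"]

three_way = accuracy(pred, gold)
binary = accuracy([to_binary(p) for p in pred], [to_binary(g) for g in gold])
```

On this toy sample the binary score is higher than the three-way score, because confusing a proper name with a definite description counts as an error only in the three-way task.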
Machine learning methods (Weka, a data mining system) § Easily interpretable methods: § Logical algorithms • Decision trees (C4.5) • Decision rules (JRip) § Higher quality: ØLogistic regression § Quality control – the cross-validation method 22
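Cross-validation as a quality check can be sketched in plain Python as follows. This is not Weka's implementation: the number of folds is an assumption (the slides do not state it), and `train_majority` is a placeholder learner standing in for C4.5, JRip or logistic regression.

```python
import random


def k_fold_indices(n_instances, k, seed=0):
    """Shuffle instance indices and split them into k near-equal folds."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(instances, labels, train, k=10, seed=0):
    """Average held-out accuracy over k folds.

    `train(X, y)` must return a function that maps one instance to a label.
    """
    if not 2 <= k <= len(instances):
        raise ValueError("k must be between 2 and the number of instances")
    scores = []
    for held_out in k_fold_indices(len(instances), k, seed):
        held_out_set = set(held_out)
        train_idx = [i for i in range(len(instances)) if i not in held_out_set]
        model = train([instances[i] for i in train_idx],
                      [labels[i] for i in train_idx])
        correct = sum(model(instances[i]) == labels[i] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / k


def train_majority(X, y):
    """Placeholder learner: always predicts the most frequent training label."""
    majority = max(set(y), key=y.count)
    return lambda instance: majority
```

Each instance is held out exactly once, so the reported score never comes from data the model was trained on.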
Examples of decision rules generated by the JRip algorithm § (Antecedent’s grammatical role = subject) & (Hierarchical distance ≤ 1.5) & (Distance in words ≤ 7) => pronoun § (Animate) & (Distance in markables ≥ 2) & (Distance in words ≤ 11) => pronoun 23
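The two rules above read naturally as an ordered rule list. A minimal Python rendering, assuming hypothetical dictionary keys for the features and assuming that instances matched by neither rule default to a full NP (the slide shows only rules that predict a pronoun):

```python
def jrip_style_rules(instance):
    """Apply the two example rules in order; fall back to a default class."""
    # Rule 1: subject antecedent, hierarchically and linearly close.
    if (
        instance["antecedent_gramm_role"] == "subject"
        and instance["hierarchical_distance"] <= 1.5
        and instance["distance_in_words"] <= 7
    ):
        return "pronoun"
    # Rule 2: animate referent, at least two markables away, still linearly close.
    if (
        instance["animate"]
        and instance["distance_in_markables"] >= 2
        and instance["distance_in_words"] <= 11
    ):
        return "pronoun"
    # Assumed default class; not stated on the slide.
    return "full_np"
```

Rules of this kind are easy to inspect, which is why the logical algorithms count as the interpretable methods in this study.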
Main results § Accuracy § Binary prediction: § logistic regression – 86.1% § logical algorithms – 85% § Three-way prediction: § logistic regression – 74% § logical algorithms – 72% 24
Comparison of single- and multi-factor accuracy
Feature | Three-way prediction | Binary prediction
The largest class | 43% | 69%
Distance in words | 55% | 76%
Hierarchical distance | 53.5% | 74.8%
Anaphor’s grammatical role | 45.2% | 70%
Anaphor in direct speech | 43.8% | 70%
Animate | 47.3% | 71.5%
Combination of factors | 74% | 86.1%
25
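The "largest class" baseline can be checked against the counts of reliable anaphor-antecedent pairs reported for the current state of the annotation (1623 proper names, 971 definite descriptions, 1162 pronouns). A short Python check, with variable names chosen here for illustration:

```python
# Reliable anaphor-antecedent pairs per class, as reported for the corpus.
pair_counts = {
    "proper_name": 1623,
    "definite_description": 971,
    "pronoun": 1162,
}
total = sum(pair_counts.values())  # 3756

# Three-way task: always guess the most frequent of the three classes.
three_way_baseline = max(pair_counts.values()) / total

# Binary task: proper names and definite descriptions merge into full NPs.
full_np = pair_counts["proper_name"] + pair_counts["definite_description"]
binary_baseline = max(full_np, pair_counts["pronoun"]) / total

# prints: three-way: 43.2%, binary: 69.1%
print(f"three-way: {three_way_baseline:.1%}, binary: {binary_baseline:.1%}")
```

Any single factor or combination of factors has to beat these figures to count as informative.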
Referential choice is a probabilistic process § According to Kibrik 1999 § Potential referential expressions: • Full NP only (19%) • Full NP, ?pronoun (21%) • Pronoun or full NP (28%) • Pronoun, ?full NP (23%) • Pronoun only (9%) § Actual referential expressions: • Full NP (49%) • Pronoun (51%) 26
Probabilistic character of referential choice in the RefRhet study § Prediction of referential choice cannot be fully deterministic § There is a class of instances in which referential choice is random § It is important to tune the model so that it can process such instances in a special manner § We are working on this problem § Logistic regression generates estimates of probability for each referential option § This estimate of probability can be interpreted as the activation score from the cognitive model 27
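How a logistic regression turns feature values into a probability, and how an intermediate probability could be flagged as a free choice, can be sketched in Python. The coefficients, the intercept and the boundaries of the uncertain band are all hypothetical; a fitted model would learn the coefficients from the corpus, and the special treatment of uncertain instances is described in the talk only as work in progress.

```python
import math

# Hypothetical coefficients; a fitted model would learn these from the corpus.
INTERCEPT = 1.0
COEFFICIENTS = {
    "distance_in_words": -0.15,
    "hierarchical_distance": -0.4,
    "antecedent_is_subject": 0.8,
    "animate": 0.6,
}

# Hypothetical band in which neither option is treated as clearly preferred.
UNCERTAIN_BAND = (0.4, 0.6)


def pronoun_probability(features):
    """Logistic regression: sigmoid of a weighted sum of feature values."""
    z = INTERCEPT + sum(COEFFICIENTS[name] * value
                        for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def referential_choice(features):
    """Return the predicted option, or flag the instance as a free choice."""
    p = pronoun_probability(features)
    low, high = UNCERTAIN_BAND
    if p < low:
        return "full_np", p
    if p > high:
        return "pronoun", p
    return "either", p
```

Under this reading the probability plays the role of the activation score: values near 1 favor a pronoun, values near 0 favor a full NP, and values in between correspond to the instances where both options are acceptable.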
Probabilistic multi-factorial model of referential choice § Activation factors (various properties of the referent or discourse context) => Activation score = probability of using a certain referential expression => Referential choice 28
Conclusions about the RefRhet study § Quantity: Large corpus of referential expressions § Quality: A high level of accurate prediction is already attained § And this is not the limit § Theoretical significance: the following fundamental properties of referential choice are addressed: § Multi-factorial character of referential choice § Probabilistic character of referential choice § This approach can be applied to a wide range of linguistic and other behavioral choices 29