Learning Extraction Patterns for Subjective Expressions
Ellen Riloff, University of Utah
Janyce Wiebe, University of Pittsburgh
EMNLP 03, 7/2003
Subjectivity • Subjective language includes opinions, rants, allegations, accusations, suspicions, and speculation • Distinguishing factual from subjective information could benefit many applications: – information extraction – question answering – summarization
Goals • Sentence-level subjectivity classification – Wiebe et al. (2001) found that 44% of sentences in news articles are subjective • Learning subjectivity clues from unannotated text corpora • Learning linguistically rich patterns
Previous Work: Subjectivity Analysis • Document-level subjectivity classification (e.g., Turney 2002; Pang et al. 2002; Spertus 1997) and classification above the document level (Tong 2001) • Genre classification (e.g., Karlgren and Cutting 1994; Kessler et al. 1997; Wiebe et al. 2001) • Supervised sentence-level classification (Wiebe et al. 1999) • Learning adjectives, adjectival phrases, verbs, nouns, and N-grams (e.g., Turney 2002; Hatzivassiloglou & McKeown 1997; Wiebe et al. 2001)
Recent Related Work • Yu and Hatzivassiloglou (EMNLP 03): unsupervised sentence-level classification; a complementary approach and feature set • Dave et al. (WWW 03): reviews classified as positive or negative • Agrawal et al. (WWW 03): newsgroup authors partitioned into camps based on quotation links • Gordon et al. (ACL 03): manually developed grammars for some types of subjective language
Extraction Patterns • Extraction patterns are lexico-syntactic patterns that identify relevant information • Typically they represent role relationships surrounding noun and verb phrases, e.g.: – hijacking of <x>: hijacked vehicle – <x> was hijacked: hijacked vehicle – <x> hijacked: hijacker
Our Method • Subjective expressions are represented as extraction patterns, e.g.: – get to know <dobj> – <subj> appear to be – <subj> was satisfied – <subj> complained • Supervised extraction pattern learning • Training data generated automatically • Entire process bootstrapped
[Figure: System overview. Components: Unannotated Text Collection (unlabeled sentences); Subjective Classifier using Known subjective vocabulary (subjective sentences); Objective Classifier (objective sentences); Extraction Pattern Learner (subjective patterns); Pattern-Based Subjective Classifier (subjective sentences).]
Unannotated Text Collection • English-language versions of FBIS news articles from a variety of countries • Size: 302,160 sentences
Known subjective vocabulary • From previous work – Manually identified (e.g., entries from Levin 1993) – Automatically identified (e.g., nouns from Riloff et al. 2003) – Any data used is separate from the data in this paper • Strongly subjective: most instances are subjective • Weakly subjective: objective instances are also common
High-Precision Subjective Classifier • Labels an unlabeled sentence subjective if it contains more than one (i.e., two or more) strongly subjective clue from the known subjective vocabulary • Test set: 2197 sentences, 59% subjective • Precision: 91.3%, Recall: 31.9%
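The subjective rule can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the clue set `STRONG_CLUES`, the helpers `tokenize` and `count_clues`, and the whitespace tokenizer are hypothetical, and the real clue lists also contain multiword expressions that this sketch ignores. Sentences that fail the rule are left unlabeled rather than called objective.

```python
# Hypothetical toy lexicon standing in for the real strongly subjective clue list.
STRONG_CLUES = {"odious", "absurdities", "fabrications", "surprised"}

PUNCT = '.,;:!?"\'“”’'


def tokenize(sentence: str) -> list:
    """Lowercase, split on whitespace, and strip surrounding punctuation (assumed tokenizer)."""
    return [tok.strip(PUNCT).lower() for tok in sentence.split()]


def count_clues(tokens: list, clues: set) -> int:
    """Count clue instances; a repeated clue counts each time it occurs."""
    return sum(1 for tok in tokens if tok in clues)


def hp_subjective(sentence: str, strong_clues: set = STRONG_CLUES) -> bool:
    """Subjective only if the sentence has more than one strongly subjective clue."""
    return count_clues(tokenize(sentence), strong_clues) > 1


if __name__ == "__main__":
    print(hp_subjective("The report is full of absurdities and fabrications."))  # True
    print(hp_subjective("It objected to the odious section."))  # False: only one clue
```

Requiring two clues instead of one trades recall for precision, which is the point of a classifier whose output is used as training data.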
High-Precision Objective Classifier • Labels an unlabeled sentence objective if it contains no strongly subjective clues and at most one weakly subjective clue in the previous, current, and next sentences • Precision: 82.6%, Recall: 16.4%
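A matching sketch of the objective rule, under the same caveats: `STRONG_CLUES`, `WEAK_CLUES`, and the helper functions are hypothetical. The clue counts are summed over the three-sentence window (our reading of the slide), and the window is clipped at document boundaries.

```python
# Hypothetical toy lexicons; the real lists come from prior work.
STRONG_CLUES = {"odious", "absurdities", "fabrications"}
WEAK_CLUES = {"fears", "criticism", "objected"}

PUNCT = '.,;:!?"\'“”’'


def tokenize(sentence: str) -> list:
    """Lowercase, split on whitespace, and strip surrounding punctuation (assumed tokenizer)."""
    return [tok.strip(PUNCT).lower() for tok in sentence.split()]


def hp_objective(sentences: list, i: int,
                 strong: set = STRONG_CLUES, weak: set = WEAK_CLUES) -> bool:
    """Objective if the previous, current, and next sentences together contain
    no strongly subjective clues and at most one weakly subjective clue."""
    window = sentences[max(i - 1, 0): i + 2]  # clipped at document edges
    tokens = [tok for s in window for tok in tokenize(s)]
    n_strong = sum(tok in strong for tok in tokens)
    n_weak = sum(tok in weak for tok in tokens)
    return n_strong == 0 and n_weak <= 1


if __name__ == "__main__":
    doc = ["The US fears a spill-over.", "Officials discussed the budget.",
           "The report drew criticism."]
    print(hp_objective(doc, 1))  # False: two weak clues in the window
```

Looking at the neighbouring sentences as well makes the rule stricter: a neutral sentence surrounded by opinionated context is not used as an objective training example.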
Extraction Pattern Learner: AutoSlog-TS (Riloff 1996) • “Relevant texts”: 17,000 subjective sentences from the Subjective Classifier • “Irrelevant texts”: 17,000 objective sentences from the Objective Classifier • Output: subjective patterns
Step 1: Apply Syntactic Templates (syntactic template: example pattern)
• <subj> active-verb dobj: <subj> dealt blow
• <subj> verb infinitive: <subj> appear to be
• <subj> aux noun: <subj> has position
• active-verb <dobj>: endorsed <dobj>
• verb infinitive <dobj>: get to know <dobj>
• noun prep <np>: opinion on <np>
• infinitive prep <np>: to resort to <np>
Step 1: Apply Syntactic Templates • The template <subj> active-verb dobj yields the pattern <subj> dealt blow, which matches any sentence containing a verb phrase with head = dealt and a direct object with head = blow, e.g., “The experience certainly dealt a stiff blow to his pride.”
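To make template instantiation concrete, here is a toy sketch that assumes a hypothetical shallow parser reducing each clause to its head words (the `Clause` class below is invented for illustration). Only two templates from the list are covered (`<subj> active-verb dobj` and `active-verb <dobj>`); the real system's parser and full template set are not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Clause:
    """Hypothetical shallow-parse output: head words of one clause."""
    subj: Optional[str]
    verb: str
    voice: str                  # "active" or "passive"
    dobj: Optional[str] = None


def instantiate(clause: Clause) -> List[str]:
    """Instantiate two of the slide's templates on a single clause."""
    patterns: List[str] = []
    if clause.voice == "active" and clause.dobj:
        if clause.subj:
            # <subj> active-verb dobj  ->  e.g. "<subj> dealt blow"
            patterns.append(f"<subj> {clause.verb} {clause.dobj}")
        # active-verb <dobj>  ->  e.g. "dealt <dobj>"
        patterns.append(f"{clause.verb} <dobj>")
    return patterns


def extract(pattern: str, clause: Clause) -> Optional[str]:
    """Return the head noun filling the pattern's slot, or None if no match."""
    if pattern not in instantiate(clause):
        return None
    return clause.subj if pattern.startswith("<subj>") else clause.dobj


if __name__ == "__main__":
    # "The experience certainly dealt a stiff blow to his pride."
    c = Clause(subj="experience", verb="dealt", voice="active", dobj="blow")
    print(instantiate(c))                   # ['<subj> dealt blow', 'dealt <dobj>']
    print(extract("<subj> dealt blow", c))  # experience
```

Because matching compares heads only, modifiers such as "certainly" and "stiff" do not affect whether the pattern fires.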
Step 2: Select Patterns • Apply all learned patterns to the training data • Rank patterns by Prec(pattern) = P(subjective | pattern) = (# occurrences in subjective sentences) / (total # occurrences) • Choose patterns with Frequency > F and Prec > P on the training data, for some thresholds F and P
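The ranking and selection step can be sketched as follows, assuming pattern matches have already been collected as hypothetical `(pattern, occurred_in_subjective_sentence)` pairs. One deliberate deviation: the sketch keeps patterns whose precision is greater than or equal to the threshold, so that a setting of P = 1.0 can be expressed; frequency still uses a strict `> min_freq`. The toy occurrences are invented.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def select_patterns(occurrences: Iterable[Tuple[str, bool]],
                    min_freq: int, min_prec: float) -> List[Tuple[str, int, float]]:
    """Rank patterns by Prec = P(subjective | pattern) and keep those with
    frequency > min_freq and precision >= min_prec.

    occurrences: one (pattern, occurred_in_subjective_sentence) pair per match.
    """
    total: Counter = Counter()
    subjective: Counter = Counter()
    for pattern, in_subj in occurrences:
        total[pattern] += 1
        if in_subj:
            subjective[pattern] += 1

    selected = []
    for pattern, freq in total.items():
        prec = subjective[pattern] / freq
        if freq > min_freq and prec >= min_prec:
            selected.append((pattern, freq, prec))
    # Highest precision first; ties broken by higher frequency.
    selected.sort(key=lambda row: (row[2], row[1]), reverse=True)
    return selected


if __name__ == "__main__":
    # Hypothetical occurrences, not data from the paper.
    occ = ([("<subj> complained", True)] * 5
           + [("<subj> was asked", True)] * 3 + [("<subj> was asked", False)]
           + [("endorsed <dobj>", True)] * 2)
    for row in select_patterns(occ, min_freq=1, min_prec=0.7):
        print(row)
```

Raising `min_freq` filters out patterns whose perfect precision rests on only a handful of occurrences.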
Examples from Training Data (pattern: %SUBJ)
<subj> was asked, is talk: 100%, 63%, 100%
talk of <np>: 90%
<subj> will talk: 71%
was expected from <np>: 100%
Test Data • Manually annotated as part of a project investigating multiple-perspective QA (ARDA AQUAINT NRRC) • 0.77 average pairwise kappa • 0.89 average pairwise kappa with borderline sentences removed (11% of the corpus) • Wilson & Wiebe (SIGDIAL 2003) describe the annotation scheme and agreement study
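An average pairwise kappa can be computed as sketched below, assuming Cohen's kappa, κ = (p_o − p_e) / (1 − p_e), for each pair of annotators, averaged over all pairs. This is a generic sketch with toy labels, not the study's actual script; excluding borderline sentences would simply mean filtering the label lists before calling it.

```python
from collections import Counter
from itertools import combinations
from typing import Sequence


def cohen_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators labeling the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(a)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    freq_a, freq_b = Counter(a), Counter(b)
    # Chance agreement: product of each annotator's marginal label rates.
    p_expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both annotators used the same single label throughout
    return (p_observed - p_expected) / (1.0 - p_expected)


def average_pairwise_kappa(annotations: Sequence[Sequence[str]]) -> float:
    """Mean Cohen's kappa over all annotator pairs."""
    pairs = list(combinations(annotations, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical labels: S = subjective, O = objective.
    A = ["S", "S", "O", "O"]
    B = ["S", "S", "O", "O"]
    C = ["S", "O", "S", "O"]
    print(average_pairwise_kappa([A, B, C]))
```

Kappa discounts the agreement two annotators would reach by chance, which matters here because the subjective and objective classes are close to balanced.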
Example (writer, FM) The Foreign Ministry said Thursday that it was “surprised, to put it mildly” (writer, FM, SD) by the U.S. State Department’s criticism of Russia’s human rights (writer, FM) record and objected in particular to the “odious” section on Chechnya.
Evaluation of Learned Patterns • Test data: 3947 sentences, 54% subjective
• Training thresholds F > 9, P = 100%: test precision 85%, recall 41%
• Training thresholds F > 1, P > 59%: test precision 71%, recall 92%
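A sketch of how such an evaluation could be scored, assuming as a simplification that a sentence is predicted subjective when it contains at least one selected pattern. The per-sentence pattern sets and gold labels are hypothetical inputs, and `classify` and `precision_recall` are illustrative names.

```python
from typing import List, Sequence, Set, Tuple


def classify(sentence_patterns: Sequence[Set[str]], selected: Set[str]) -> List[bool]:
    """Predict subjective when a sentence contains at least one selected pattern."""
    return [bool(pats & selected) for pats in sentence_patterns]


def precision_recall(predicted: Sequence[bool], gold: Sequence[bool]) -> Tuple[float, float]:
    """Precision and recall for the subjective class."""
    tp = sum(1 for p, g in zip(predicted, gold) if p and g)
    n_pred, n_gold = sum(predicted), sum(gold)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical pattern matches per sentence and gold labels.
    sents = [{"<subj> complained"}, set(), {"endorsed <dobj>"},
             {"<subj> complained", "talk of <np>"}]
    gold = [True, True, False, False]
    pred = classify(sents, {"<subj> complained"})
    print(precision_recall(pred, gold))  # (0.5, 0.5)
```

Loosening the thresholds adds more patterns to `selected`, which raises recall at the cost of precision, the trade-off visible in the two rows above.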
Bootstrapping the Extraction Pattern Learner
[Figure: Subjective Classifier (17,000 / 7,500 subjective sentences); Objective Classifier (17,000 objective sentences); Extraction Pattern Learner (new subjective patterns); Pattern-Based Subjective Classifier (9,500 new subjective sentences).]
• Pattern-Based Subjective Classifier: labels a sentence subjective if it contains > 0 instances of patterns with F > 4 and P = 1 on the training data, yielding 9,500 new subjective sentences
• New subjective patterns: 4248 patterns with P > .59 on the training data; 308 patterns with P = 1.0 on the training data
• Evaluating new + old patterns on the test set: recall up 2–4%, precision down 0.5–2%
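One bootstrapping round can be sketched as below, assuming per-pattern training statistics `(frequency, precision)` and per-sentence pattern sets are already available; both structures and the function names are hypothetical. Reliable patterns are those with F > 4 and P = 1 on the training data, and any unlabeled sentence containing at least one of them joins the subjective pool, which would then go back to the Extraction Pattern Learner.

```python
from typing import Dict, Hashable, Set, Tuple


def reliable_patterns(stats: Dict[str, Tuple[int, float]],
                      min_freq: int = 4, min_prec: float = 1.0) -> Set[str]:
    """Patterns with training frequency > min_freq and precision >= min_prec."""
    return {p for p, (freq, prec) in stats.items() if freq > min_freq and prec >= min_prec}


def bootstrap_round(stats: Dict[str, Tuple[int, float]],
                    unlabeled: Dict[Hashable, Set[str]],
                    subjective_pool: Set[Hashable]) -> Set[Hashable]:
    """Label unlabeled sentences containing > 0 reliable-pattern instances as
    subjective and return the enlarged pool (inputs are not modified)."""
    reliable = reliable_patterns(stats)
    new = {sid for sid, pats in unlabeled.items() if pats & reliable}
    return subjective_pool | new


if __name__ == "__main__":
    # Hypothetical statistics and sentence IDs, not figures from the paper.
    stats = {"<subj> complained": (12, 1.0), "<subj> was asked": (3, 1.0),
             "<subj> said": (40, 0.6)}
    unlabeled = {"s1": {"<subj> complained"}, "s2": {"<subj> was asked"},
                 "s3": {"<subj> said"}, "s4": set()}
    print(sorted(bootstrap_round(stats, unlabeled, {"s0"})))  # ['s0', 's1']
```

Only perfect-precision patterns are trusted here, so that errors introduced in one round are less likely to compound in the next.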
Augmenting the Subjective Classifier • Learned subjective patterns with F > 9 and P = 1.0 on the training data serve as new clues alongside the known subjective vocabulary • New subjective sentences: 1 old clue + 1 new clue, or > 1 new clue • Old + new subjective sentences are passed to the Extraction Pattern Learner
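The augmented decision rule can be written as a predicate over two counts, assuming `n_old` counts strongly subjective clues from the known vocabulary and `n_new` counts instances of learned patterns with F > 9 and P = 1.0, both computed upstream (the function name is hypothetical). Together with the original rule (> 1 old clue), the conditions turn out to be equivalent to "at least two clues of either kind", which the exhaustive check below confirms for small counts.

```python
def augmented_subjective(n_old: int, n_new: int) -> bool:
    """Augmented high-precision subjective rule (sketch).

    n_old: strongly subjective clues from the known vocabulary in the sentence.
    n_new: instances of learned patterns (F > 9, P = 1.0 on training data).
    """
    original_rule = n_old > 1                      # > 1 old clue
    one_old_one_new = n_old >= 1 and n_new >= 1    # 1 old clue + 1 new
    several_new = n_new > 1                        # > 1 new
    return original_rule or one_old_one_new or several_new


if __name__ == "__main__":
    # The three conditions together amount to "at least two clues of either kind".
    assert all(augmented_subjective(o, n) == (o + n >= 2)
               for o in range(5) for n in range(5))
    print(augmented_subjective(1, 1), augmented_subjective(1, 0))  # True False
```

Seen this way, the augmentation keeps the original two-clue threshold and simply enlarges the clue inventory with learned patterns.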
Evaluation on Test Data • Original subjective classifier: 32.9% recall, 91.3% precision • Augmented subjective classifier: 40.1% recall, 90.2% precision
Future Work
• Improve the original high-precision classifier • Identify new objective sentences during bootstrapping [Figure: Known subjective vocabulary; Objective Classifier (objective sentences); Extraction Pattern Learner; Pattern-Based Objective Classifier (objective sentences).]
[Figure: Iterative bootstrapping. Unannotated Text Collection (unlabeled sentences); Subjective Classifier with Known subjective vocabulary (subjective sentences) and Objective Classifier (objective sentences), each shown at Iteration 0 and Iteration 1+.]
• Build up the subjective lexicon as the process is applied to new corpora • Human review of high-precision patterns (examples: Tough act to follow: linguistic subjectivity; Rush Limbaugh: opinionated source; police: “lightning rod topic”) • Richer representation with deeper knowledge (theta roles, polarity, tone, ambiguity, …)
Conclusions • High-precision subjectivity classification can be used to generate large amounts of labeled training data • Extraction pattern learning techniques can learn linguistically rich subjective patterns • Bootstrapping process results in higher recall with little loss in precision
Annotation Scheme • The annotation scheme was developed as part of a U.S. government-sponsored project (ARDA AQUAINT NRRC) to investigate multiple-perspective question answering. • Annotators labeled private state expressions. • Each private state can have low, medium, or high strength. • Our gold standard considers a sentence to be subjective if it contains at least one private state expression of medium or higher strength.
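The gold-standard rule is simple enough to state as code. A minimal sketch, assuming each sentence's annotations have been reduced to a list of private state strengths; the `Strength` enum and `gold_label` function are hypothetical and do not reflect the annotation tool's actual format.

```python
from enum import IntEnum
from typing import Iterable


class Strength(IntEnum):
    """Hypothetical ordered encoding of private state strength."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def gold_label(strengths: Iterable[Strength]) -> bool:
    """Subjective iff at least one private state is of medium or higher strength."""
    return any(s >= Strength.MEDIUM for s in strengths)


if __name__ == "__main__":
    print(gold_label([Strength.LOW, Strength.HIGH]))  # True
    print(gold_label([Strength.LOW]))                 # False
```

Using an ordered enum keeps the "medium or higher" threshold a single comparison.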
Two Ways of Expressing Private States • Explicit mentions of private states and speech events – The United States fears a spill-over from the anti-terrorist campaign • Expressive subjective elements – The part of the US human rights report about China is full of absurdities and fabrications.
Nested Sources (writer, Xirao-Nima, US) (writer, Xirao-Nima) “The US fears a spill-over,” said Xirao-Nima, a professor of foreign affairs at the Central University for Nationalities. (writer) (writer, Xirao-Nima) “The report is full of absurdities,” he continued. (writer)
OnlyFactive=yes / OnlyFactive=no (writer, Xirao-Nima, US) (writer, Xirao-Nima) “The US fears a spill-over,” said Xirao-Nima, a professor of foreign affairs at the Central University for Nationalities. (writer) OnlyFactive=yes