Working with Frames: Annotating a German Corpus with Frames – and using them for NLP
Anette Frank
Computational Linguistics Department, Saarland University, Saarbrücken
Language Technology Lab, DFKI GmbH, Saarbrücken
SLTC 2006, Swedish Language Technology Conference, Göteborg, 27-28 Oct 2006

Working with Frames
SALSA: Saarbrücken Lexical Semantics Acquisition Project
– Funded by German Research Foundation, DFG (2004 – 2008)
– Project team, objectives and background information
Annotating a German Corpus with FrameNet Frames …
– Cross-language application: using English FrameNet for German
– Corpus-based approach
  • Special phenomena: non-compositionality and vagueness
  • Coverage problems
– Consistency control
– From Corpus to Lexicon
… and using them in NLP applications
– Automatic Frame and Role Assignment (Erk and Pado, 2006)
– Frame Semantics for Textual Entailment (Burchardt and Frank, 2006)
Conclusions and Outlook

The SALSA Project
Team
– Aljoscha Burchardt, Katrin Erk, Anette Frank, Andrea Kowalski, Manfred Pinkal and Sebastian Pado
Motivation
– Alleviating the bottleneck in lexical-semantic resource creation for languages other than English
Objectives
I. Creation of a large semantically annotated corpus of German
  – Annotating frame-semantic classes and roles from Berkeley FrameNet on top of a syntactically analysed newspaper corpus (TIGER corpus)
II. Creation of a semantic lexicon on the basis of corpus annotations
  – Word Sense: frame-semantic classifications of predicates
  – Argument Structure: semantic roles and syntactic realisation patterns
III. Developing methods for automation and application of frame-semantic information in NLP applications

Frame Semantics (Fillmore 1976, Fillmore et al. 2003)
– A frame represents a conceptual structure, or a prototypical situation, with a (frame-specific) set of roles that identify the participants or props involved in the situation
– Frames are organised in a hierarchy, with various frame-to-frame relations
  • Inheritance, subframe (defining scenarios)
– FrameNet database: 600 frames, 8,700 lexical units, 133,846 annotated sentences
Example frame: Commerce_goods-transfer (roles: Seller, Buyer, Goods, Money)
– BMW bought Rover from British Aerospace.
– Rover was bought by BMW, which financed [...] the new Range Rover.
– BMW, which acquired Rover in 1994, is now dismantling the company.
– BMW’s purchase of Rover for $1.2 billion was a good move.

Frame Definitions and Annotations

FrameNet Hierarchy and Frame Relations

Role Inheritance and Perspectivization

Why Frame Semantics?
Cross-linguistic aspects
– FrameNet’s conceptual classes bear high potential for cross-lingual applicability
– Frames are linguistically motivated
  • Syntactic realisation of core semantic roles
  • Ontological constraints and “perspectivization”
  These properties may differ across languages
Research issues
– Cross-linguistic applicability of FrameNet’s semantic inventory
– Accounting for cross-linguistic divergences in a Multilingual FrameNet
Cross-lingual FrameNet Group: Building FrameNets for English, German, Spanish, Japanese, French, …

Why Frame Semantics?
Using Frame Semantics in NLP applications
– Focusing on lexical semantic classes and role-based argument structure
– Disregarding aspects of „deep“ semantics: negation, modality, quantification, ...
– Normalisation: syntactic alternations
  [Agent Fred] hit(Cause_Impact) [Impactee the ball]. --- [Impactee The ball] was hit(Cause_Impact).
  [Donor John] gave(Giving) [Recipient Mary] [Theme a book].
  [Donor John] gave(Giving) [Theme a book] [Recipient to Mary].
– Normalisation: lexical alternations (within and across part-of-speech)
  [Speaker Marylin] spoke(Statement) about [Topic her past].
  [Speaker Marylin]’s statement(Statement) about [Topic her past].
  [Speaker Marylin] talked(Statement) about [Topic her past].
Provides semantic classes (senses) within a semantic network, combined with argument structure information, at a high level of abstraction
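
The normalisation idea on this slide can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the `FrameInstance` class and the `analyse` helper are hypothetical names, and the analyses are written by hand rather than produced by any SALSA tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameInstance:
    """A frame plus its role fillers, abstracting away from surface syntax."""
    frame: str
    roles: frozenset  # set of (role name, filler) pairs


def analyse(frame, **roles):
    # Hypothetical helper: builds the frame-level view of one sentence by hand.
    return FrameInstance(frame, frozenset(roles.items()))


# Syntactic alternation: two surface orders, one frame-level analysis.
double_object = analyse("Giving", Donor="John", Recipient="Mary", Theme="a book")
prepositional = analyse("Giving", Donor="John", Theme="a book", Recipient="Mary")

# Lexical alternation across parts of speech ("spoke" vs. "statement").
verb = analyse("Statement", Speaker="Marylin", Topic="her past")
noun = analyse("Statement", Speaker="Marylin", Topic="her past")

print(double_object == prepositional)  # True
print(verb == noun)                    # True
```

Because role fillers are stored as an unordered set, word order and part of speech no longer distinguish the analyses, which is the sense of normalisation meant here.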

Annotating a German Corpus with Frames
Manual semantic annotation of a syntactically analysed corpus
– TIGER Treebank (Universities of Saarbrücken, Stuttgart, Potsdam)
– 1.5 million words / 80K sentences of newspaper text (Frankfurter Rundschau)
– Combined constituent and dependency structure (edges labelled with grammatical functions), with crossing edges for flexible word order
– Relatively flat trees
Example (English gloss): SPD asks coalition to talk about reform

Annotation Scheme
Annotating frames on top of syntactic structure
• Frame REQUEST is evoked by discontinuous target word „fordert auf“ (ask, request)
• Frame elements (roles) are connected to constituents
• Flat semantic trees (depth 1)
• Independent frames
Example (English gloss): SPD asks coalition to talk about reform
Encoded in TIGER/SALSA XML, an extension of TIGER XML: modular description of syntax and (frame) semantics
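
A minimal Python sketch of the annotation scheme described above, not the TIGER/SALSA XML encoding itself. The node ids are hypothetical, and the role names (SPEAKER, ADDRESSEE, MESSAGE) are assumed from the English FrameNet Request frame, since the slide text does not list them.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    name: str
    target: list  # terminal node ids; more than one if the target is discontinuous
    elements: dict = field(default_factory=dict)  # role name -> syntactic node ids


def covered_nodes(frame):
    """All syntactic nodes a frame points to (flat tree: depth 1, no nesting)."""
    nodes = set(frame.target)
    for node_ids in frame.elements.values():
        nodes.update(node_ids)
    return sorted(nodes)


# Hypothetical node ids for the example sentence; real TIGER ids differ.
request = Frame(
    name="REQUEST",
    target=["t_fordert", "t_auf"],  # discontinuous target "fordert ... auf"
    elements={
        "SPEAKER": ["np_spd"],
        "ADDRESSEE": ["np_koalition"],
        "MESSAGE": ["pp_zu_gespraech"],
    },
)

print(covered_nodes(request))
```

Keeping the frame as a separate object that only references syntactic node ids mirrors the modular split between syntax and frame semantics mentioned on the slide.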

Using English FrameNet Frames for German
SALSA frames
– Stay as close as possible to the Berkeley FrameNet database
– Cross-lingual divergences: adaptation of FN frames
Cross-lingual divergences
– Missing FEs
– Differences in lexical realisation patterns

Missing FEs
Taking: An Agent removes a Theme from a Source such that it is in the Agent’s possession. (Source: either location or former possessor)
(2) Er nahm [dem Mann]? das Bier aus der Hand.
    He took the man the beer from the hand
    “He took the beer from the man’s hand”
Ø Adding missing FEs to frames (here: Possessor)

Differences in lexical realisation patterns
(Rare) cases in which German verbs run counter to frame distinctions made on English data
– German “fahren” encompasses English “to drive” and “to ride”
s20937: In 14 Armeefahrzeugen fuhren sie von dem abgezäunten Gelände, das der Besatzungsmacht 28 Jahre lang als Hauptquartier gedient hatte
“In 14 army vehicles they drove/departed from the fenced-off area which had served the occupying forces as headquarters for 28 years.”
s27678: Und die Inhaber von Jahresnetzkarten fahren künftig sogar billiger.
“Holders of annual season tickets will ride even cheaper in the future.”
Ø FN has introduced the frame Use_vehicle, which subsumes Operate_vehicle and Ride_vehicle

Coverage Problems
SALSA: corpus-based approach
– For each predicate: annotation of all instances in TIGER corpus
FrameNet: lexicographic approach
– Defining frames as a sense inventory for describing word meaning
– Proceeding by frames, not by predicate
Handling gaps in FrameNet
– FrameNet does not (yet) cover the complete “conceptual space”
  Predicates encountered in the corpus may have missing senses
  • behandeln 1 (treat an illness) => Frame CURE
  • behandeln 2 (treat with kindness) => No frame available
– FrameNet does not consider multiword expressions or figurative senses
Ø Construction of German proto-frames („Unknowns“)

„Unknown“ Frames
Construction of proto-frames (“Unknowns”)
– Inspection of first 20 corpus instances
– Identify and group readings not covered by existing FrameNet frames
Proto-framing
– Textual definition for frames and roles
  • Often contrastive to existing frames
  • Applying FrameNet „framing principles“
– Differences wrt. FrameNet frames
  • Lemma-specific: no evidence from groups of predicates
  • Covers only senses found in TIGER
– Proto-frames of similar predicates are often related („is identical to“)

Example: „rechnen“ (I)
Categorisation (FrameNet) („rank/range among, count as“)
– A Cognizer construes an Item as belonging to a certain Category.
– Hat [Cognizer man] [Item sie] [Category zur alten Elite] gerechnet?
  „Did one rank her among the old elite?“
Unknown 1 (SALSA) („range among, count as“)
– An Item is an example or a member of a particular Category. In contrast to Categorisation, there is no Cognizer involved. In contrast to Membership, the Category does not have to be a social organisation.
– [Item Die Philippinen und Chile] rechnen [Cat zu den armen Ländern der Region].
  „The Philippines and Chile rank among the poor countries of the region“

Example: „rechnen“ (II)
Expectation (FrameNet) („expect“)
– Words in this frame have to do with a Cognizer believing that some Phenomenon will take place in the future.
– [Cog Das Geldinstitut] rechnet [Phen mit einem Angebotsüberhang].
  „The financial institution expects an excess of supply“
Unknown 2 (SALSA) („count on“)
– An Event or State will happen in the foreseeable future. In contrast to Expectation, the actual factivity of the Event is stressed.
– Is Identical To: Unknown 1 of frames: bevorstehen.v
– [Event Womit] hätte [Exp man] rechnen müssen?
  „What would one have had to count on?“
Unknown 3 (SALSA) („pay off“)
– A state of affairs or entity (Theme) creates or increases profit for a beneficiary.
– [Thm Das Steigen der Grundstückspreise] rechnet sich auf jeden Fall.
  „The increase of land prices pays off in any case“

Some Figures
Sample: annotation of 476 German predicates
– 18,500 annotated instances
– 252 annotated with FN frames
– 373 annotated with proto-frames
Avg. 2.8 frames/pred (2 FN frames + 0.8 proto-frames)
Avg. 43 sentences per FN frame
Avg. 17 sentences per proto-frame
Most polysemous predicate: “kommen”
– 39 frames (FN and proto), includes MWEs
Predicate with most missing senses: “bringen”
– 15 proto-frames

Non-compositional Phenomena
Support verb constructions
– Der Professor hält eine Vorlesung
  The professor is holding a lecture
  “The professor is giving a lecture”
Metaphors
– Der Chef kocht
  The boss is cooking
  “The boss is in a rage”
Idioms
– Die unannotierten Sätze gehen zur Neige
  The unannotated sentences are going towards decline
  “The unannotated sentences are running out”

Some Figures
                     Sample of 246 lemmas     Sub-corpus nehmen
                     Number    %              Number    %
Standard readings    4638      85.7%          42        17.4%
Metaphor             369       6.8%           38        15.8%
Support              326       6.0%           132       54.8%
Idiom                79        1.5%           29        12.0%
Non-literal use      774       14.3%          199       82.6%
Total                5412      100.0%         241       100.0%

Support Verb Constructions
Annotation
– The semantic head is a noun or adjective supported by the governing verb („take a bath“, „perform an operation“)
– The verb is tagged with a pseudo-frame „Support“ with frame element „Supported“
Example (word-by-word gloss): The current prime (minister) can take in claim to have...
„The current prime minister can claim to have...“

Idioms
Classification criteria
– Non-compositional: meaning composition not transparent
– The meaning is introduced by the whole construction (modulo variability)
Annotation
– Tagging multi-word expression as complex FEE (frame-evoking element)
Example: Nachteile in Kauf nehmen
  disadvantages in purchase take
  „accept disadvantages“

Metaphors
Classification criteria
– Non-literal (“figurative”) meaning
– Semi-compositional: recoverable (literal meaning + mapping from literal to non-literal meaning)
  • Subjective
– Example: „Many think that Perot would walk into a wall on Capitol.“
Annotation
– Annotation of source (literal) and target (metaphorical) meaning with “flags”: source / target
– Determining the target frame is often difficult
  • „For some, this goes too far“
– In these cases, only the source frame is annotated, with a metaphor flag „source“ for recovery

Annotation of Metaphors (transparent)
– Source frame evoked by the verb
– Target frame projected from MWE (verb + syntactic argument)
Example (word-by-word gloss): The sound of their Bigband is a jewel which one can safely take under a strong magnifying glass
„The sound of their Bigband is a jewel which stands up to any scrutiny.“

Metaphor: Transfer Scheme
Ein Juwel das man unter die starke Lupe nehmen kann
A jewel which one can take under a strong magnifying glass
Source frame: PLACING
  FEE: nehmen
  Roles: AGENT [1] man; THEME [2] ein Juwel; GOAL [3] ([4] starke) Lupe
Target frame: SCRUTINY
  FEE: nehmen • [3]/[4]
  Roles: COGNIZER [1] man; PHENOMENON [2] ein Juwel; DEGREE [4] starke
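
The transfer scheme can be read as two frames that share constituent indices. The following Python sketch makes that reading explicit; the dictionary layout and the `role_transfer` function are illustrative assumptions, and the target FEE is simplified to the verb plus constituent [3].

```python
# Constituent indices as used on the slide.
constituents = {1: "man", 2: "ein Juwel", 3: "Lupe", 4: "starke"}

source = {
    "frame": "PLACING",
    "fee": ["nehmen"],
    "roles": {"AGENT": 1, "THEME": 2, "GOAL": 3},
}
target = {
    "frame": "SCRUTINY",
    "fee": ["nehmen", 3],  # simplification: verb plus its argument [3]
    "roles": {"COGNIZER": 1, "PHENOMENON": 2, "DEGREE": 4},
}


def role_transfer(src, tgt):
    """Pair source and target roles that are filled by the same constituent."""
    by_index = {idx: role for role, idx in tgt["roles"].items()}
    return {
        role: by_index[idx]
        for role, idx in src["roles"].items()
        if idx in by_index
    }


print(role_transfer(source, target))
# {'AGENT': 'COGNIZER', 'THEME': 'PHENOMENON'}
```

GOAL has no counterpart in the result because constituent [3] becomes part of the target frame's evoking expression instead of filling a role.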

Vagueness and Ambiguity
Often, it is not possible to make a safe choice among a set of possible semantic interpretations
– In frame assignment
– In the assignment of semantic roles
Different sources of ambiguity and vagueness
– Available context does not allow resolution of an ambiguity
– More than one interpretation may apply at the same time
– The distinction between two readings may be systematically unclear
Cases are hard to distinguish (Kilgarriff and Rosenzweig, 2000)

Vagueness and Ambiguity
Examples
– Gleichwohl versuchen offenbar Assekuranzen, das Gesetz zu umgehen, indem sie von Nichtdeutschen [mehr Geld] verlangen
  “… by claiming more money from non-Germans”
  • REQUEST and/or COMMERCE frame?
– Die nachhaltigste Korrektur der Programmatik fordert [ein Antrag]
  “A motion requests the most sustainable correction of the political objectives”
  • Motion may be SPEAKER or MEDIUM
Underspecification
– Annotators may assign a set of frames or frame elements marked as “underspecified” (blue)
– Special markup in TIGER-SALSA-XML
Example (English gloss): Foreign investors in India again welcome

The „four eye“ Principle
Subcorpus preparation
– Extracting all TIGER sentences for a given lemma
– Identifying suitable FrameNet frames
– Construction of „Unknowns“
Annotator 1 / Annotator 2
Merging for double adjudication
Adjudicator 1 / Adjudicator 2: conflict resolution
– Detection and resolution of „major“ annotation errors and conflicts
Merging for meta adjudication
– Detection and discussion of „difficult“ annotation problems
„DONE“
Entire process supported by SALTO annotation tool

Inter-annotator and -adjudicator Agreement
                     Agreement (Frames)    Agreement (FEs)
Inter-Annotator      84.9%                 85.7%
Inter-Adjudicator    97.0%                 96.2%
Adjudication can resolve annotation differences fairly reliably
– Reduction of disagreements from 15% to 3-4%
– Present strategy: “Four-eye adjudication”
Remaining disagreements are mostly “real problems”
– Constructional problems
  • Complex markables (ellipses), ambiguities (e.g. ambiguous pronouns)
– Conceptual differences
  • Difficult role distinctions, level of abstraction, uncertain inferences about roles
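
The slides do not say how agreement was measured. Assuming plain percentage agreement (an assumption, not a statement about the SALSA evaluation), the computation could look like the following Python sketch; the function name and the toy labels are hypothetical.

```python
def agreement(labels_a, labels_b):
    """Share of items on which two annotators chose the same label, in percent."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both annotators must label the same items")
    if not labels_a:
        raise ValueError("no items to compare")
    same = sum(a == b for a, b in zip(labels_a, labels_b))
    return 100.0 * same / len(labels_a)


# Toy frame choices for four instances; not real annotation data.
annotator_1 = ["REQUEST", "COMMERCE", "REQUEST", "PLACING"]
annotator_2 = ["REQUEST", "REQUEST", "REQUEST", "PLACING"]

print(agreement(annotator_1, annotator_2))  # 75.0
```

Percentage agreement does not correct for chance; a chance-corrected measure would need the label distribution as well.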

A Description Logics based Lexicon Model
Work by Dennis Spohr, IMS Stuttgart and SALSA (Spohr et al., 2006)
Purposes
– Querying XML annotations involving intersecting hierarchies
– Consistency checking
– Lexicon building: abstraction of lexicon data from annotation instances
DL-based modelling of FrameNet data
– OWL DL
  • Monotonicity, decidability (Baader et al. 2003)
  • Reasoning and consistency checking services
– Formalisation of definitional part of FrameNet and corpus annotations
– Focus on:
  • Flexible ways for abstraction and normalisation of data
  • Consistency checking
  • Storage and querying architecture (SESAME and SeRQL)

A Description Logics based Lexicon Model
T-Box: Linguistic Model / Annotation Model
§ FrameNet
  – Frames, Frame Relations
  – Roles
§ Sense Assignment
  – Lemma – Frame
§ Role Assignment
  – Syntactic units – Roles
§ Annotation Types
  – Frames: single, elliptic, metaphoric, USP
  – Roles: single, USP
  – Target: single, multi-word
§ Sentences
§ Syntactic units
• Normalisation
• Querying
• Consistency checking
A-Box: Corpus (annotation instances)
• Sentences
• Syntactic units
• Frame and role annotations

T-Box vs. A-Box
T-Box: General classes for frames (and relations)
A-Box: Specific frames (CURE) and corpus annotations
– Query properties of individual frames (which roles, etc.)
T-Box: General and specific frame classes
A-Box: Corpus annotations
– Consistency checking

Querying
Retrieving information from the Corpus/Lexicon
– Queries specify paths through the model graph
– Allow querying of intersecting hierarchies
Example: Extract all lemmas that evoke the PLACING frame
– SeRQL query
– Retrieved information (with grouping for frequency information)
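
The SeRQL query itself is not preserved in this transcript. As a stand-in, the same retrieval can be sketched in plain Python over a flat list of (lemma, frame) pairs; the data, the lemma choices and the `lemmas_evoking` function are hypothetical and only illustrate the grouping by frequency.

```python
from collections import Counter

# Hypothetical (lemma, frame) annotation instances; not real SALSA data.
annotations = [
    ("nehmen", "PLACING"),
    ("stellen", "PLACING"),
    ("nehmen", "PLACING"),
    ("nehmen", "TAKING"),
    ("legen", "PLACING"),
]


def lemmas_evoking(frame, instances):
    """Lemmas that evoke `frame`, with instance counts, most frequent first."""
    counts = Counter(lemma for lemma, f in instances if f == frame)
    return counts.most_common()


print(lemmas_evoking("PLACING", annotations))
```

A graph query language follows paths from lemma to frame through the model; the list comprehension above plays that role for this flattened toy data.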

Normalisation of linguistic information at different levels
– TIGER syntactic categories and edge labels
– Normalised syntactic categories and grammatical functions
  • NounP, PrepP, Sent, …
  • Subj, Obj, Pobj, …
Example: syntactic realisation of semantic roles
– Specific categories: 2,176 realisation patterns
– Normalised categories: 1,026 realisation patterns
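
Why normalisation reduces the number of realisation patterns can be shown with a toy Python sketch. The two mapping tables are assumptions made for illustration (the actual SALSA mapping is not given in the slides), and the patterns are invented.

```python
# Assumed mapping from treebank-specific labels to normalised ones; illustrative only.
CATEGORY = {"NP": "NounP", "CNP": "NounP", "PN": "NounP", "PP": "PrepP", "S": "Sent"}
FUNCTION = {"SB": "Subj", "OA": "Obj", "OP": "Pobj"}


def normalise(pattern):
    """Map a realisation pattern (role, category, edge label) to normalised labels."""
    role, category, function = pattern
    return (role, CATEGORY.get(category, category), FUNCTION.get(function, function))


# Invented specific patterns: three of them differ only in the kind of noun phrase.
specific = {
    ("Agent", "NP", "SB"),
    ("Agent", "PN", "SB"),
    ("Agent", "CNP", "SB"),
    ("Theme", "NP", "OA"),
    ("Goal", "PP", "OP"),
}
normalised = {normalise(p) for p in specific}

print(len(specific), "->", len(normalised))  # 5 -> 3
```

Patterns that differ only in a fine-grained category collapse into one normalised pattern, which is the kind of reduction reported on the slide.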

First Data Release
SALSA Corpus
– Scheduled for 2006
– > 500 German verbal predicates (of all frequency bands)
– Total size of about 20,000 annotated instances
and Lexicon with querying interfaces

Frame Semantics for Textual Entailment
(Recognizing) Textual Entailment (RTE): testing a system’s capacity to recognize „Textual Entailment“
Text: Sunday’s earthquake was felt in the southern Indian city of Madras on the mainland, as well as other parts of south India.
Hypothesis: The city of Madras is located in Southern India.
TASK: Entailed? – Yes
„Realistic“, open-domain data set drawn from system outputs in NLP applications: IR, IE, QA, SUM
Controlled set-up: balanced training and test sets, 800/800 text-hypothesis pairs

Textual Entailment
„We say that T entails H if the meaning of H can be inferred from the meaning of T, as would typically be interpreted by people. This somewhat informal definition is based on (and assumes) common human understanding of language as well as common background knowledge.“
(Dagan, Glickman, Magnini, RTE 2005 Workshop Proceedings)

The data
Fine-grained linguistic analysis
T: Oscar-winning actor Nicolas Cage’s new son and Superman have sth. in common...
H: Nicolas Cage’s new son was awarded an Oscar. – No (IE)
Lexical semantics and paraphrases (nominalisation, synonymy)
T: [o]n December 10th 1936 King Edward VIII gave up his right to the British throne.
H: King Edward VIII abdicated on the 10th of December, 1936. – Yes (QA)
Modality
T: U.S. Secretary of State Condoleezza Rice said Thursday that North Korea should return to nuclear disarmament talks and...
H: North Korea says it will rejoin nuclear talks. – No (SUM)
Inference and World Knowledge

Approximating Textual Entailment
Fine-grained LFG-based syntactic analysis
– English LFG grammar (Riezler et al. 2002): broad coverage with high-quality probabilistic disambiguation
Frame Semantics
– Coarse-grained lexical-semantic classification of predicates with role-based argument structure encoding
– Extended semantic representations: WordNet senses, SUMO concepts
Computing structural and semantic overlap
– Hypothesis: high/low ratio of H/T overlap => entailment: yes/no
– A learning problem: measures of overlap, weighted entailment decision
H/T matching for TE: match graph size relative to hypothesis graph size

The SALSA RTE System
Linguistic analysis components and integration
– XLE parsing: LFG f-structure
– Fred/Detour + Rosy: frames & roles
– WordNet-based WSD: WordNet & SUMO
– f-structure w/ (extended) frame-semantic projection
Recognizing Textual Entailment: graph matching & statistical approximation
– Text and hypothesis: f-structure w/ frames & concepts
– Text-hypothesis match graph
  • Matching nodes and edges
  • Different match types (similarity types)
  • Extensions for deeper modelling (modality, lexical entailment)
– Feature extraction
– Model training & classification
Using XLE term rewriting system (Crouch 2005)

Frame and Role Assignment
Shalmaneser (Erk & Pado, 2006)
– Shallow semantic parser for FrameNet frame and role assignment
– Fred: statistical frame assignment
  • WSD system for predicates, in terms of frames
– Rosy: semantic role assignment
  • Argument recognition and argument labelling
  • Using state-of-the-art features from robust syntactic parsing
Detour (to FrameNet via WordNet) (Burchardt et al., 2005)
– Aim: overcome lexical gaps in FrameNet
– A rule-based frame assignment system that takes a “detour to FrameNet via WordNet”
  • Determine similarity of “unknown LUs” to existing frames (their LUs) based on WordNet-similarity measures

Frame and Role Assignment
[Figure panels: Fred & Rosy (Shalmaneser); Fred, Detour & Rosy]

Extended semantics projection
Porting frame and role assignments to LFG f-structure
– Defining a frame semantics projection using head lemmata as interface layer (accounts for parser discrepancies)
– Using XLE rewrite system (Crouch 2005)
Head-indexed frame & role assignments

Extended semantics projection
Rule-based extensions of LFG-frame structures
– Frames corresponding to LFG NE classes (location, date, companies, …)
– Extra-thematic roles, based on LFG adjunct classes (time, reason, location, etc.)
  • +adjunct(Z, Y), ntype_sem(Y, time) ==> s::(Z, SemZ), s::(Y, SemY), time(SemZ, SemY).
Extended semantics projection: WordNet and SUMO classes
– WSD: Banerjee & Pedersen, 2003
– WordNet – SUMO/MILO mapping: Niles and Pease (2001)

A walk-through example from RTE 2006
Pair 716
Text: In 1983, Aki Kaurismäki directed his own first full-time feature.
Hypothesis: Aki Kaurismäki directed a film.

LFG F-Structures

Automatic Frame Annotation for Text
[Figure labels: Collins parse; Fred & Rosy: frames & roles (statistical); Detour System: frames (via WordNet)]

Automatic Frame Annotation for Hypothesis
716_h: Aki Kaurismäki directed a film.

LFG and Frames for Hypothesis
Rule-based (LFG-NER)
Aki Kaurismäki directed a film.

Hypothesis-Text-Match Graphs
Computing structural and semantic overlap
– Computing a “match graph” from text and hypothesis graphs
– Different aspects of similarity:
  • Syntactic: f-structure (PRED, grammatical functions, functional attributes)
  • Semantic: extended frame structures (frames, roles, WordNet, SUMO)
– Different degrees of similarity:
  • Strict similarity: identical syntactic and semantic nodes and edges
  • Weak similarity: WN-/FN-relatedness for non-identical PREDs and frames
Match graph consists of partial syntactic & semantic graphs
Approximating textual entailment
– High/low overlap ratio of hypothesis and match graph => entailment: yes/no
H/T matching for TE: match graph size relative to hypothesis graph size
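
The overlap idea can be sketched in Python as a ratio of graph sizes. This is a simplified illustration, not the SALSA RTE system: graphs are encoded as plain sets of node and edge labels, the function names are hypothetical, and the fixed threshold is an arbitrary stand-in for the learned, weighted decision mentioned in the talk.

```python
def overlap_ratio(match_graph, hypothesis_graph):
    """Size of the match graph relative to the hypothesis graph (0.0 to 1.0)."""
    if not hypothesis_graph:
        raise ValueError("hypothesis graph is empty")
    return len(match_graph & hypothesis_graph) / len(hypothesis_graph)


def entailed(match_graph, hypothesis_graph, threshold=0.75):
    # The threshold is an arbitrary illustrative value, not one from the talk.
    return overlap_ratio(match_graph, hypothesis_graph) >= threshold


# Toy encoding: nodes are strings, edges are (head, function, dependent) triples.
hypothesis = {
    "direct",
    "Aki Kaurismäki",
    "film",
    ("direct", "SUBJ", "Aki Kaurismäki"),
    ("direct", "OBJ", "film"),
}
match = {
    "direct",
    "Aki Kaurismäki",
    "film",
    ("direct", "SUBJ", "Aki Kaurismäki"),
}

print(overlap_ratio(match, hypothesis))  # 0.8
print(entailed(match, hypothesis))       # True
```

In this toy case four of the five hypothesis elements are matched, so the ratio is high and the pair is labelled as entailed.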

t: In 1983, Aki Kaurismäki directed his own first fulltime feature.
h: Aki Kaurismäki directed a film.
Legend: grammatically related; WordNet related

Extensions: Modality
Detecting indicators of inconsistent modality types
– T: A pet must have rabies protection confirmed by a blood test.
  H: A case of rabies was confirmed.
Marking modal contexts in text and hypothesis
– 5 modality types: conditional, future, diamond, box, negation
Handling inconsistent modality types in matching process
– Introducing negatively marked match nodes
– Blocking embedded structures for similarity-based matches
– Thus, reducing the size of the match graph
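A rough sketch of how negatively marked match nodes might work, under simplifying assumptions: each node is a hypothetical `(pred, modality)` pair, `None` stands for an unmarked context, and everything embedded under "must" in T is assumed to carry the `box` label. The function name and data layout are illustrative only.

```python
MODALITY_TYPES = {"conditional", "future", "diamond", "box", "negation"}


def match_nodes(text_nodes, hyp_nodes):
    """Split predicate matches into positive and negatively marked ones.

    A pair with the same predicate but different modality marking is not
    counted as a match; it is recorded as a modal-context mismatch instead.
    """
    positive, negative = [], []
    for h_pred, h_mod in hyp_nodes:
        for t_pred, t_mod in text_nodes:
            if h_pred != t_pred:
                continue
            if h_mod == t_mod:
                positive.append(h_pred)
            else:
                negative.append(h_pred)
    return positive, negative


# Hypothetical analysis of the pet/rabies pair.
text_nodes = [("have", "box"), ("confirm", "box"), ("rabies", "box")]
hyp_nodes = [("confirm", None), ("rabies", None), ("case", None)]
positive, negative = match_nodes(text_nodes, hyp_nodes)
```

Because the mismatching pairs no longer count as matches, the match graph shrinks and the overlap ratio for this pair drops.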

Extensions: Lexical Entailments
Bridging partial non-matching text and hypothesis pairs
– T: Olson, 62, previously worked as a partner at Ernst & Young LLP, as a Minnesota bank president and as a congressional aide, before joining the Fed board in 2001, to serve a term ending in 2010.
  H: Olsen is a member of the Fed board.
Lexically induced inferences, defined as rewrite rules on h/t/m graphs
  t: (X1) joins X2
  h: (Y1) member-of Y2
  m: (Z2, Y2, X2)
  => match_type(heuristic_entailment_match).
Similar: non-lexical heuristic inferences
– Appositions: prime minister X => X is prime minister
– Possessive constructions: X’s Y => the Y of X
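The rewrite rule above can be read as: if the text contains "X1 joins X2", the hypothesis contains "Y1 member-of Y2", and X2 and Y2 are already matched, then add a match of type heuristic_entailment_match. A minimal sketch of that reading follows; the triple representation, the `RULES` table, and the function name are assumptions, not the system's actual rule format.

```python
# Hypothetical rule table: (text predicate, hypothesis predicate).
RULES = {("join", "member-of")}


def heuristic_entailment_matches(text_edges, hyp_edges, node_matches):
    """Apply lexical entailment rules to text and hypothesis edges.

    text_edges, hyp_edges: sets of (pred, arg1, arg2) triples.
    node_matches: set of (text_node, hyp_node) pairs already in the match graph.
    """
    found = []
    for t_pred, _x1, x2 in sorted(text_edges):
        for h_pred, _y1, y2 in sorted(hyp_edges):
            if (t_pred, h_pred) in RULES and (x2, y2) in node_matches:
                found.append((t_pred, h_pred, "heuristic_entailment_match"))
    return found


text_edges = {("join", "Olson", "Fed board")}
hyp_edges = {("member-of", "Olsen", "Fed board")}
node_matches = {("Fed board", "Fed board")}
matches = heuristic_entailment_matches(text_edges, hyp_edges, node_matches)
```

The first arguments (X1, Y1) are left unconstrained here, mirroring the parentheses around them in the rule.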

Similarity/Entailment measures
Features over text graph, hypothesis graph and match graph; proportional: h/t and m/h ratio
– Lexical: lex_id, ratio_lexid
– Syntactic: node_m (pred, coref, pro), edge_syn_m (all, gf, subc), ratio_nodes, ratio_edges
– Semantic, strict: (lfg_)frames_t, (lfg_)roles_t, (lfg_)frames_h, (lfg_)roles_h, (lfg_)frames_m, (lfg_)roles_m, ratio_(lfg_)frames, ratio_(lfg_)roles
– Semantic, weak: node_frameFN/derived_m, mode_framerel/detour/wnrel_m, node_heuristic_entailment_m, node_modal_ctxt_mismatch_m
– Connectedness: clusters_no, clusters_avg_size, clusters_avgsize_rel_h, clusters_abssize_rel_h
– Other: fragmentary, rte_task

Machine learning
WEKA: Selected learners and models
– Model 1: Simple Conjunctive Rule classifier
  preds_m_relto_h ≤ 0.485294 & frames_m_relto_h ≤ 0.954546 => rte_entails = 0
  • Medium/high threshold on pred/frame matches as criterion for rejection
  • High degree of frame similarity w/ medium predicate similarity models entailment
– Model 2: Meta-classifier LogitBoost
  1. No. of predicate matches relative to hypothesis
  2. No. of frame (Fred, Detour) matches relative to hypothesis
  3. No. of roles (Rosy) matches relative to hypothesis
  4. Match graph size rel. to hypothesis, incl. syn, sem, ontological info
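Model 1 is small enough to write out as a function. The sketch below assumes that both comparisons in the rule are "less than or equal", which fits its description as a rejection criterion; the thresholds are the ones on the slide, while the function name and the default of accepting everything else are assumptions.

```python
PRED_THRESHOLD = 0.485294   # preds_m_relto_h, from the slide
FRAME_THRESHOLD = 0.954546  # frames_m_relto_h, from the slide


def model1_entails(preds_m_relto_h: float, frames_m_relto_h: float) -> bool:
    """Single conjunctive rule: reject only if both ratios are at or below threshold.

    Assumption: the comparison direction is <= for both features.
    """
    reject = (preds_m_relto_h <= PRED_THRESHOLD
              and frames_m_relto_h <= FRAME_THRESHOLD)
    return not reject


low_both = model1_entails(0.40, 0.50)     # few predicate and frame matches
enough_preds = model1_entails(0.60, 0.50)  # predicate overlap above threshold
all_frames = model1_entails(0.40, 1.00)    # every hypothesis frame matched
```

Under this reading, a pair counts as entailed as soon as either the predicate ratio or the frame ratio clears its threshold.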

Results in RTE-II
SALSA RTE system results
RTE-II: all tasks, IE, IR, QA, SUM
Model 1: 59.0 49.5 54.5 72.5
Model 2: 57.8 48.5 57.0 67.0
– Both models score SUM > IR > QA > IE
– Refined model better on QA, simple model better on SUM
Overall RTE-II results
– Average accuracy: 60% (Median: 59%)
Accuracy range (in %)    No. of groups
53-56                    7
58-61                    11
62-64                    3
74-75                    2

True negatives
Modal context marking seems to be effective
– 27% of all true negatives involved modality mismatches, while only 11.9% of all sentences involve marked modal contexts
– T: The goal of preserving indigenous culture can hardly be achieved by a handful of researchers and curators at museums of ethnology and folk culture.
  H: Indigenous folk art is preserved. (233)
– T: Even today, within the deepest recesses of our mind, lies a primordial fear that will not allow us to enter the sea without thinking about the possibility of being attacked by a shark.
  H: A shark attacked a human being. (322)
Future plans
– Extend to lexically induced modality/facticity indicators
– Testing for non-monotonicity contexts

False positives
Typical cases
– Semantic dissimilarity: non-matching predicates within larger match graphs, which are in fact semantically dissimilar
– Structural distance: nodes that are neighbours in the match graph correspond to nodes that are far apart in the text graph

False positives
Unconnected nodes matched with distant nodes in text graph
– T: Some 420 people have been hanged in Singapore since 1991, mostly for drug trafficking, an Amnesty International 2004 report said. That gives the country of 4.4 million people the highest execution rate in the world relative to population.
  H: 4.4 million people were executed in Singapore. (198)
– False positive

False positives
Graph matching process
– Allows criss-cross matching of nodes in the match graph
– Builds growing clusters by finding matching edges
Introduce weighted edges that reflect the relative distance of pairs of match nodes in text and hypothesis (path distance)
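One possible way to realise such distance-sensitive weights is sketched below. It is only an illustration of the proposal, not something the system is stated to do: the weight formula (hypothesis path distance divided by text path distance, capped at 1), the adjacency-list format, and the toy graphs for pair 198 are all assumptions.

```python
from collections import deque


def path_distance(adj, start, goal):
    """Length of the shortest path between two nodes (BFS); None if unconnected."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def edge_weight(text_adj, hyp_adj, pair_a, pair_b):
    """Weight for two match nodes, each given as (text_node, hyp_node).

    Assumed formula: d_hyp / d_text, capped at 1.0, so that nodes which are
    adjacent in the hypothesis but far apart in the text get a low weight.
    """
    d_text = path_distance(text_adj, pair_a[0], pair_b[0])
    d_hyp = path_distance(hyp_adj, pair_a[1], pair_b[1])
    if d_text is None or d_hyp is None or d_text == 0:
        return 0.0
    return min(1.0, d_hyp / d_text)


# Toy undirected graphs for pair 198; the Singapore-country link stands in
# for a coreference edge and is an assumption.
text_adj = {
    "hang": ["Singapore", "report"],
    "Singapore": ["hang", "country"],
    "report": ["hang"],
    "country": ["Singapore", "4.4 million people"],
    "4.4 million people": ["country"],
}
hyp_adj = {
    "execute": ["4.4 million people", "Singapore"],
    "4.4 million people": ["execute"],
    "Singapore": ["execute"],
}
weight = edge_weight(
    text_adj, hyp_adj,
    ("hang", "execute"),
    ("4.4 million people", "4.4 million people"),
)
```

Here the two matched nodes are direct neighbours in the hypothesis but three steps apart in the text, so the edge between them receives a weight of one third instead of counting as a full match.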

Experiences we gained ...
Annotation
– Semantic annotation is difficult, time-consuming and expensive
– Frame Semantics works well cross-linguistically
– Complementarity of lexicographic vs. corpus-driven annotation
Automation and Application
– Training automatic frame assignment systems
  • Shalmaneser (Erk and Pado, 2006)
– Experiments in cross-language projection
  • Pado and Lapata (2005, 2006)
– Using frame semantics in NLP tasks:
  • Textual Entailment (Burchardt and Frank 2006)
  • Multi-lingual ontology-based question-answering (Frank et al. 2006)