Cross-lingual Projection of Semantics
Sebastian Pado, IGK Colloquium, Dec 16th 2004
Overview
1. Background: Role Semantics
2. Semantic Projection
3. Current and Future Work
Framework: Role semantics
• Predicate-argument structure, theta roles: who did what to whom
• Example: Peter [Agent] gives Mary [Recipient] a book [Theme]
• NB: No treatment of discourse relations, modality, negation, etc.
Flavours of role semantics
• Top-down approach: common, intuitively defined role set for all verbs
  • give: is Mary Recipient or Goal or Patient?
  • resemble: Subj vs. Obj
• Bottom-up approach: Frame Semantics
  • Frames: conceptual representation of a situation (Statement, Giving, Transaction)
  • Each frame is introduced by a target (say, give, buy)
  • Roles are frame-specific
Frame Semantics
• An example frame: Giving
  • Targets: give, hand out, receive
  • Roles: Donor, Recipient, Theme
• The Berkeley FrameNet Project
  • English frame lexicon
  • ~200 frames, ~2,500 words (V/N/Adj)
  • Typically 3-6 roles per frame
  • Corpus of ~60,000 annotated instances
Frame Semantics: An Example
What does role semantics buy us?
• Surface-independent representation
  • Solves the paraphrase problem
  • Peter gives the book to Mary / Mary receives the book from Peter
• Flexible basis for QA, inference, etc.
  • Aljoscha Burchardt’s PhD
• Common cross-lingual semantic representation
Semantic Role Assignment
• Task: automatic tagging of roles on free text
  • Important for NLP applications
  • Linking (syntax-semantics interface)
• Statistical modelling (as classification)
  • Frame = semantically coherent targets
  • Targets show linking idiosyncrasies
    • Give: Subj - Donor, DObj - Theme, To-PP/IObj - Recipient
    • Get: Subj - Recipient, DObj - Theme, From-PP - Donor
  • Needs lots of training data
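The linking idiosyncrasies of give and get can be written down as a small lookup table. This is only an illustrative sketch: the talk describes role assignment as statistical classification, whereas the hand-written table `LINKING` and the helper `assign_roles` below are assumptions made for the example, as are the exact grammatical-function labels.

```python
# Hypothetical linking tables for two targets of the Giving frame.
# The labels (Subj, DObj, ...) and the dict layout are illustrative assumptions,
# not the statistical model described in the talk.
LINKING = {
    "give": {"Subj": "Donor", "DObj": "Theme", "To-PP": "Recipient", "IObj": "Recipient"},
    "get": {"Subj": "Recipient", "DObj": "Theme", "From-PP": "Donor"},
}


def assign_roles(target, arguments):
    """Map (grammatical function, head phrase) pairs to frame roles for one target."""
    table = LINKING[target]
    return {table[gf]: phrase for gf, phrase in arguments if gf in table}


# "Peter gives the book to Mary"
give_roles = assign_roles("give", [("Subj", "Peter"), ("DObj", "the book"), ("To-PP", "Mary")])
# "Mary gets the book from Peter"
get_roles = assign_roles("get", [("Subj", "Mary"), ("DObj", "the book"), ("From-PP", "Peter")])
```

Both sentences end up with the same role assignment even though the grammatical functions differ, which is the surface independence that role semantics is meant to provide.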
Moving to another language…
• SALSA: manual creation and use of a German corpus with semantic annotation
  • Basis: TIGER newspaper corpus, 1.5 million words
• English frames (mostly) work for German
  • Frame concept language-independent
• But: annotation slow and error-prone
  • Total effort: > 10 person years
• Can we use the English data for German?
Overview
1. Background: Role Semantics
2. Semantic Projection
3. Current and Future Work
Central idea: Semantic Projection
1. Find a large, parallel bilingual corpus
   • E/G part of EUROPARL (25 million words)
2. Assign semantic roles on English side
   • Train automatic tagger on English data
3. Project semantics over to German
   • Step 1: Find semantic equivalences via word alignment
   • Step 2: Project frame
   • Step 3: Project roles
Result: Large German annotated corpus
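The three projection steps can be sketched in a few lines of Python, using the sentence pair "Peter comes home" / "Peter kommt nach Hause" from the talk. Everything here is an assumption made for illustration: the `Annotation` class, the `project` function, the hand-written alignment, and the role names Theme and Goal (the talk names only the frame, Arriving).

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    frame: str    # frame name, e.g. "Arriving"
    target: int   # token index of the frame-evoking word
    roles: dict   # role name -> list of token indices


def project(annotation, alignment):
    """Project an English annotation onto German.

    alignment maps an English token index to a list of German token indices.
    Returns None if the target has no aligned German word.
    """
    # Step 1: find the semantic equivalent of the target via word alignment.
    target_de = alignment.get(annotation.target, [])
    if not target_de:
        return None
    # Step 2: project the frame onto the aligned German target (first aligned word).
    # Step 3: project each role onto the German words aligned to its span.
    roles_de = {}
    for role, indices in annotation.roles.items():
        span = sorted({j for i in indices for j in alignment.get(i, [])})
        if span:
            roles_de[role] = span
    return Annotation(annotation.frame, target_de[0], roles_de)


english = ["Peter", "comes", "home"]
german = ["Peter", "kommt", "nach", "Hause"]
alignment = {0: [0], 1: [1], 2: [2, 3]}  # hand-written, assumed to be correct

# Role names are assumed for the example.
en = Annotation("Arriving", target=1, roles={"Theme": [0], "Goal": [2]})
de = project(en, alignment)
```

With this alignment, `de` carries the Arriving frame on "kommt", with "Peter" and "nach Hause" as the projected role spans. The sketch works only as well as the alignment it is given.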
Projection: Example
  [Arriving] Peter comes home
  [Arriving] Peter kommt nach Hause
Three assumptions to make this work
Assumption 1
Semantic representation is parallel
  [Arriving] Peter comes home
  [Arriving] Peter kommt nach Hause
Semantic (im-)parallelism
• Frame definition based on realisable roles
  • German and English typologically similar
  • Mostly, same frames evoked
• Aspect is problematic
  • Proper differences
      We finish by 12 o’clock (Activity_finish)
      Wir sind um 12 Uhr fertig (Activity_done_state)
  • Same aspect, lexicalised differently
      I finish by saying
      Abschliessend sage ich
Assumption 2
There is always parallel lexical material that is semantically equivalent
  [Arriving] Peter comes home
  [Arriving] Peter kommt nach Hause
(Im)parallelism of lexical material
• We only need semantic parallelism, only for targets and roles
  • Don’t care about discourse, modality, etc.
  • Don’t care about exact wording
• Insights from translation science
  • Translation = recreation of text based on content and target language norms
    • Frame structures ~ propositional content
    • Specific register
    • Specific domain (no cultural differences)
Assumption 3
Word alignment provides semantic equivalence
  [Arriving] Peter comes home
  [Arriving] Peter kommt nach Hause
Word Alignment as Semantic Equivalence
• Current word alignment models use co-occurrence to determine alignment
• But co-occurrence != semantic equivalence
    decide: entscheiden, Entscheidung treffen
    insist: bestehen darauf
• Problems: phrasal verbs, idioms, support verbs (Funktionsverbgefuege), noise proper
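A toy illustration of why co-occurrence is not semantic equivalence: the sketch below scores word pairs with the Dice coefficient over sentence pairs. Dice is used here only as a simple stand-in for the statistical alignment models the talk refers to, and the three-sentence corpus and the function name `dice_scores` are invented for the example.

```python
from collections import Counter


def dice_scores(pairs):
    """Dice coefficient for every English/German word pair that co-occurs
    in at least one sentence pair: 2 * c(e, g) / (c(e) + c(g))."""
    en_count, de_count, joint = Counter(), Counter(), Counter()
    for en_sent, de_sent in pairs:
        en_types, de_types = set(en_sent), set(de_sent)
        en_count.update(en_types)
        de_count.update(de_types)
        joint.update((e, g) for e in en_types for g in de_types)
    return {(e, g): 2 * c / (en_count[e] + de_count[g]) for (e, g), c in joint.items()}


# Invented toy corpus: "treffen" occurs once inside the support verb
# construction "eine Entscheidung treffen" and once in its literal sense.
corpus = [
    (["we", "decide"], ["wir", "entscheiden"]),
    (["they", "decide"], ["sie", "treffen", "eine", "Entscheidung"]),
    (["we", "meet"], ["wir", "treffen", "uns"]),
]
scores = dice_scores(corpus)
```

In this corpus decide and treffen receive a score of 0.5 purely because they co-occur in the support verb construction, even though treffen on its own does not mean decide.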
Overview
1. Background: Role Semantics
2. Semantic Projection
3. Current and Future Work
Current Work (1)
• Empirical assessment of assumptions
  • Manual annotation of parallel corpus sample
  • Independent annotation of German / English
  1. Evaluation of semantic parallelism
  2. Evaluation of lexical parallelism
  3. Evaluation of automatic word alignment
Current Work (2)
• Token-wise word alignment too noisy
  • decide - treffen: Deciding?
• Instead: find reliable type equivalences
  • Statistics over complete corpus, filtering
  • Removal of German collocations
• Result: German frame lexicon
  • Target x can evoke frames a, b, c
• Project frame only if licensed by German lexicon
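One way to picture the move from token-wise alignments to type equivalences is to count, over the whole corpus, how often each German lemma is aligned to a target of each frame, and to keep only the frequent pairings. The sketch below assumes this reading. The thresholds `min_count` and `min_ratio`, the function names, the counts, and the frame label "Meeting" are hypothetical; the talk does not specify the filter.

```python
from collections import Counter, defaultdict


def build_frame_lexicon(aligned_tokens, min_count=2, min_ratio=0.3):
    """Build a German frame lexicon from token-level evidence.

    aligned_tokens: list of (german_lemma, frame) pairs, one per German token
    aligned to an English frame-evoking target.
    A pairing is kept only if it is frequent in absolute terms (min_count)
    and relative to all alignments of that lemma (min_ratio).
    """
    pair_counts = Counter(aligned_tokens)
    lemma_totals = Counter(lemma for lemma, _ in aligned_tokens)
    lexicon = defaultdict(set)
    for (lemma, frame), n in pair_counts.items():
        if n >= min_count and n / lemma_totals[lemma] >= min_ratio:
            lexicon[lemma].add(frame)
    return dict(lexicon)


def licensed(lexicon, lemma, frame):
    """Project a frame onto a German target only if the lexicon allows it."""
    return frame in lexicon.get(lemma, set())


# Invented counts; "Meeting" is a placeholder frame name.
evidence = (
    [("entscheiden", "Deciding")] * 8
    + [("treffen", "Deciding")] * 2
    + [("treffen", "Meeting")] * 8
)
lexicon = build_frame_lexicon(evidence)
```

With these counts, entscheiden is licensed for Deciding, while the noisy treffen/Deciding pairing (2 of 10 alignments) is filtered out.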
Current Work (3)
• Projection of roles: find equivalences between constituents
  1. Define pairwise similarities
  2. Efficiently identify best match
     1. Graph matching
     2. Probabilistic model
• Choice points:
  • Definition of similarities
  • Bijective correspondence, yes or no?
  • Implementation
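A minimal sketch of the two steps, under assumptions the talk leaves open: similarity is taken to be the Jaccard overlap between a German constituent and the German words aligned to an English constituent, and the best one-to-one match is found by exhaustive search, which stands in for a proper graph matching algorithm and is only feasible for a handful of constituents. The function names and the alignment are made up for the example.

```python
from itertools import permutations


def similarity(en_span, de_span, alignment):
    """Jaccard overlap between de_span and the German words aligned to en_span."""
    projected = {j for i in en_span for j in alignment.get(i, [])}
    german = set(de_span)
    union = projected | german
    return len(projected & german) / len(union) if union else 0.0


def best_bijection(en_consts, de_consts, alignment):
    """Exhaustively search for the best one-to-one constituent match.

    Assumes len(en_consts) <= len(de_consts); each English constituent is
    matched to a distinct German constituent.
    """
    best, best_score = (), -1.0
    for perm in permutations(range(len(de_consts)), len(en_consts)):
        score = sum(
            similarity(en_consts[i], de_consts[j], alignment)
            for i, j in enumerate(perm)
        )
        if score > best_score:
            best, best_score = perm, score
    return dict(enumerate(best)), best_score


# "Peter comes home" / "Peter kommt nach Hause", with a deliberately
# incomplete alignment: "home" is linked to "Hause" but not to "nach".
alignment = {0: [0], 1: [1], 2: [3]}
en_constituents = [[0], [2]]       # Peter, home
de_constituents = [[0], [2, 3]]    # Peter, nach Hause
mapping, score = best_bijection(en_constituents, de_constituents, alignment)
```

Even with the incomplete alignment, "home" is still matched to "nach Hause" with similarity 0.5, because constituents are compared as wholes rather than word by word. Whether the match should be forced to be bijective is one of the choice points listed above; this sketch simply assumes it.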
Future Work
• Thorough evaluation
• Filtering
  • Projection will be noisy
• Training a German semantic tagger
  • Evaluation with respect to coverage, accuracy
  • Combination with manually annotated data (SALSA)
• Using another language
  • English/French part of EUROPARL
Conclusion
• Automatic creation of semantically annotated data for a new language
  • Projection of annotation from known language using a word-aligned parallel corpus
• Theory in place
  • Potential problems:
    • Semantics may diverge
    • Lexical material may diverge
    • Word alignment noisy
• Empirical evaluation underway