- Slides: 39
Morphological Segmentation Inside-Out Ryan Cotterell, Arun Kumar, Hinrich Schütze
Old Idea: Surface Morphological Segmentation
unachievability → [Segment] → un achiev abil ity (labels: PREFIX, STEM, SUFFIX)
Semi-New Idea: Canonical Morphological Segmentation
unachievability → [Restore] → unachieveableity → [Segment] → un achieve able ity (labels: PREFIX, STEM, SUFFIX)
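The contrast between the two ideas can be checked mechanically. The following is a minimal sketch that uses only the example from these slides; the variable names are illustrative and not from the talk.

```python
# Toy illustration; variable names are hypothetical, data is the slide's example.
word = "unachievability"

# Surface segmentation: the pieces are substrings of the observed word.
surface = ["un", "achiev", "abil", "ity"]

# Canonical segmentation: restore the underlying form first, then segment it.
underlying = "unachieveableity"
canonical = ["un", "achieve", "able", "ity"]

# Surface segments concatenate back to the observed word ...
surface_ok = "".join(surface) == word
# ... while canonical segments concatenate to the restored underlying form.
canonical_ok = "".join(canonical) == underlying

print(surface_ok, canonical_ok)
```

In other words, surface segmentation only places boundaries, whereas canonical segmentation also has to undo orthographic changes.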
Why is canonicalization useful?
unachievability achievement underachiever achieves
un achiev abil ity achieve ment under achieve s
Are they the same morpheme? un achiev abil ity achieve ment under achieve s
unachievability achievement underachiever achieves
unachieveableity achievement underachieveer achieves
un achieve able ity achieve ment under achieve s
Canonical segmentations are standardized across words: un achieve able ity achieve ment under achieve s
Words: unachievability, thinkable, accessible, untouchable
Underlying forms: unachieveableity, thinkable, accessable, untouchable
Canonical segmentations: un achieve able ity | think able | access able | un touch able
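One way to see why standardized morphemes are useful is to index words by the morphemes they contain. This is a small sketch built on the four canonical segmentations from the slide; the helper `morpheme_index` is a hypothetical name introduced here for illustration.

```python
from collections import defaultdict

# Canonical segmentations copied from the slide.
canonical = {
    "unachievability": ["un", "achieve", "able", "ity"],
    "thinkable": ["think", "able"],
    "accessible": ["access", "able"],
    "untouchable": ["un", "touch", "able"],
}

def morpheme_index(segmentations):
    """Map each morpheme to the sorted list of words that contain it."""
    index = defaultdict(set)
    for word, morphemes in segmentations.items():
        for m in morphemes:
            index[m].add(word)
    return {m: sorted(words) for m, words in index.items()}

index = morpheme_index(canonical)
print(index["able"])  # all four words share the canonical morpheme "able"
```

With surface segments such as "abil" the same lookup would split these words into separate groups, which is the problem canonicalization is meant to avoid.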
New Idea: Morphology as Parsing
unachievability achievement underachiever achieves
unachieveableity achievement underachieveer achieves
un achieve able ity achieve ment under achieve s
under achieve er
Tree: [[under achieve] er], with labels under = PREFIX, achieve = STEM, er = SUFFIX
Why are trees useful?
Reason 1: Words are ambiguous! A tree captures the ambiguity; a flat segmentation doesn't.
Tree 1: [un [lock able]] = “incapable of being locked”
Tree 2: [[un lock] able] = “capable of being unlocked”
Flat segmentation: un lock able (labels: PREFIX, STEM, SUFFIX) = “???”
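The ambiguity can be made concrete by enumerating the binary trees over a flat segmentation. This is only a sketch: it enumerates every bracketing with no grammar constraints, which is an assumption made here and not necessarily how the talk's model restricts trees. The function names are hypothetical.

```python
def bracketings(morphemes):
    """Yield every binary tree (as nested tuples) over a morpheme sequence."""
    if len(morphemes) == 1:
        yield morphemes[0]
        return
    for split in range(1, len(morphemes)):
        for left in bracketings(morphemes[:split]):
            for right in bracketings(morphemes[split:]):
                yield (left, right)

def leaves(tree):
    """Flatten a tree back into its left-to-right morpheme sequence."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)

trees = list(bracketings(["un", "lock", "able"]))
for t in trees:
    # Different trees, identical flat segmentation.
    print(t, "->", leaves(t))
```

Both trees flatten to the same three morphemes, so a flat segmentation cannot distinguish the two readings.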
Reason 2: Model Order of Affixation. Path of derivation: achieve → underachieve → underachiever. Encoded as tree: [[under achieve] er], with labels PREFIX, STEM, SUFFIX.
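Reading the path of derivation off a tree can be sketched as follows. The nested-tuple encoding and the function `derivation_path` are assumptions made for illustration. Because the leaves are canonical morphemes, concatenating them gives underlying forms, so the last step comes out as underachieveer, the underlying form of underachiever.

```python
def derivation_path(tree, stem):
    """Return the forms built on the way from the stem up to the root.

    `tree` is a nested tuple of morpheme strings; `stem` must be one of its
    leaves (otherwise the search below raises StopIteration).
    """
    def yield_of(node):
        # Concatenate the leaves under this node.
        if isinstance(node, str):
            return node
        return "".join(yield_of(child) for child in node)

    def contains(node, target):
        if isinstance(node, str):
            return node == target
        return any(contains(child, target) for child in node)

    path = []
    node = tree
    # Walk down from the root towards the stem, recording each constituent.
    while True:
        path.append(yield_of(node))
        if isinstance(node, str):
            break
        node = next(child for child in node if contains(child, stem))
    # Reverse so the path starts at the stem and ends at the full word.
    return path[::-1]

path = derivation_path((("under", "achieve"), "er"), "achieve")
print(" -> ".join(path))
```

A different bracketing of the same morphemes, such as ("under", ("achieve", "er")), would give a different intermediate form, which is exactly the information a flat segmentation loses.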
New Resource
Morphological Tree Bank • English • Size
A Joint Model
Canonical Segmentation / Parse Tree: un achieve able ity
Underlying Form: unachieveableity
Word (Surface Form): unachievability
How good is the tree-underlying form pair? (s=un achieve able ity, u=unachieveableity) How good is the underlying form-word pair? (u=unachieveableity, w=unachievability)
Inference and Learning • Inference is intractable! • Approximate inference with importance sampling • Decoding also with importance sampling • Learning with AdaGrad (Duchi et al. 2011)
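The slide does not say which proposal distribution or scoring function is used, so the following is only a generic sketch of self-normalized importance sampling over a small discrete space. The two candidate trees, the uniform proposal, and the log-scores are arbitrary toy assumptions, not values from the talk.

```python
import math
import random

def importance_sample(log_score, log_proposal, draw, n, rng):
    """Self-normalized importance sampling over a discrete space.

    Returns a dict mapping each sampled item to its normalized weight mass,
    which approximates the item's probability under the unnormalized model.
    """
    samples = [draw(rng) for _ in range(n)]
    # Unnormalized log-weights: log p~(x) - log q(x).
    log_w = [log_score(x) - log_proposal(x) for x in samples]
    shift = max(log_w)  # subtract the max for numerical stability
    weights = [math.exp(lw - shift) for lw in log_w]
    total = sum(weights)
    mass = {}
    for x, w in zip(samples, weights):
        mass[x] = mass.get(x, 0.0) + w / total
    return mass

# Toy setup (all values below are assumptions for illustration only).
candidates = [("un", ("lock", "able")), (("un", "lock"), "able")]
toy_log_scores = {candidates[0]: 0.0, candidates[1]: 1.0}

rng = random.Random(0)
mass = importance_sample(
    log_score=toy_log_scores.__getitem__,
    log_proposal=lambda x: -math.log(len(candidates)),  # uniform proposal
    draw=lambda r: r.choice(candidates),
    n=2000,
    rng=rng,
)

# Approximate decoding: pick the sampled item with the largest weight mass.
best = max(mass, key=mass.get)
print(best)
```

The same weighted samples serve both purposes named on the slide: averaging over them approximates expectations, and taking the heaviest one approximates decoding.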
Experimental Results • Key Point: Do trees help segmentation accuracy? – Baseline: flat segmentation model • New Task:
Results
Fin. Thank You!