Speed and accuracy in shallow and deep stochastic parsing
Ron, Stefan, Tracy, John, Alex, Dick
HLT/NAACL 2004

Popular Myth
• Shallow statistical parsers are fast, robust… and useful
• Deep grammar-based parsers are slow and brittle
• Is this true?

Empirical test
• Compare current systems
  – Stochastic XLE with English LFG grammar
  – Collins (1999) tree-bank parser: de facto standard
• Inspired by Culy’s experiments: Fall 2003 Pargram
  – How long to parse same corpus? FX PAL tech report
  – Measured coverage as indicator of quality
• Present comparison: Speed and accuracy
  – How long to parse same corpus? PARC 700 Gold Standard
  – Measure accuracy on dependency triples, not phrase-trees
• Dependencies needed for meaning-sensitive applications (= usefulness)
  (translation, question answering… but maybe not IR)

Collins (1999) Parser
• Tree-bank grammar, stochastic selection learned from WSJ
• Model 3: Categories explicitly mark
  – Heads
  – Arguments (vs. adjuncts)
  – Gap-threads and traces
• Requires separate part-of-speech tagger (Ratnaparkhi, 1996)
  – Words are not stemmed
• Produces single most probable parse
• Speed/accuracy controlled by beam-size parameter

A Collins tree
He reiterated his opposition to such funding, but expressed hope of a compromise.

Cleaned up a bit
• Nonterminals simplified
• Terminals stemmed (à la English FST, filtered by POS tags)

Gap-threading

Trees to triples: Hard problem
• Goal is fair comparison… without building an LFG grammar/lexicon
• General mapping principles:
  – NP-a under S is SUBJ, NP-a under VP is OBJ, …
  – Use tags to get tense, number, … features
• Transformations for equivalent analyses
  – Embedded auxes → perf, prog features
  – Flat coordination → conjunct phrases
• Stem mismatches
  – Named entities, participles vs. base forms (following/JJ)
• Subjects of infinitives? promise vs. persuade
• Distribution over coordinations?
An imperfect science
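The general mapping principles can be illustrated with a small sketch. This is not the conversion used in the experiments: the `Node` class, the `TAG_FEATURES` table, and the assumption that every constituent already carries its stemmed head word are all hypothetical, and only the two structural rules named on the slide (NP-a under S, NP-a under VP) are covered.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (relation or feature, head, dependent or value)


@dataclass
class Node:
    label: str   # category such as "S", "VP", "NP-A"; a POS tag on leaves
    head: str    # stemmed head word (assumed to be precomputed)
    children: List["Node"] = field(default_factory=list)


# Hypothetical tag-to-feature table; the slides only say that tags
# supply tense, number, and similar features.
TAG_FEATURES = {
    "VBD": ("tense", "past"),
    "VBZ": ("tense", "pres"),
    "NN": ("num", "sg"),
    "NNS": ("num", "pl"),
}


def tree_to_triples(node: Node) -> List[Triple]:
    """Walk the tree and emit triples for the two structural rules plus tag features."""
    triples: List[Triple] = []
    # Leaves carry a POS tag as their label; read features off the tag.
    if not node.children and node.label in TAG_FEATURES:
        feature, value = TAG_FEATURES[node.label]
        triples.append((feature, node.head, value))
    for child in node.children:
        if child.label == "NP-A":
            if node.label == "S":
                triples.append(("subj", node.head, child.head))
            elif node.label == "VP":
                triples.append(("obj", node.head, child.head))
        triples.extend(tree_to_triples(child))
    return triples


# A simplified fragment of the example sentence: "He reiterated his opposition".
example = Node("S", "reiterate", [
    Node("NP-A", "he", [Node("PRP", "he")]),
    Node("VP", "reiterate", [
        Node("VBD", "reiterate"),
        Node("NP-A", "opposition", [
            Node("PRP$", "his"),
            Node("NN", "opposition"),
        ]),
    ]),
])

triples = tree_to_triples(example)
for relation, head, value in triples:
    print(f"{relation}({head}, {value})")
```

The harder cases listed on the slide (embedded auxiliaries, flat coordination, subjects of infinitives) would each need their own transformation and are deliberately left out here.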

F-structures to (reduced) triples
• Eliminate features that are peculiar to LFG parse
  – CHECK features, NTYPE, VTYPE, etc.
• Keep semantically relevant features
  – Grammatical functions, tense, number, stmt-type etc.
• Again, attempt at fair comparison
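A minimal sketch of the reduction step, assuming triples are stored as (feature, head, value) tuples. The feature spellings `ntype`, `vtype`, and the `check` prefix, as well as the values attached to them in the example, are illustrative guesses and not the names XLE actually emits.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (feature, head, value)

# Assumed spellings for the LFG-specific features named on the slide.
DROPPED_FEATURES = {"ntype", "vtype"}
DROPPED_PREFIXES = ("check",)


def reduce_triples(triples: Iterable[Triple]) -> List[Triple]:
    """Keep only triples whose feature is not peculiar to the LFG analysis."""
    kept = []
    for feature, head, value in triples:
        name = feature.lower()
        if name in DROPPED_FEATURES or name.startswith(DROPPED_PREFIXES):
            continue
        kept.append((feature, head, value))
    return kept


# Made-up input: the vtype, ntype, and check values are placeholders.
full = [
    ("subj", "pay~0", "Meridian~5"),
    ("tense", "pay~0", "fut"),
    ("vtype", "pay~0", "main"),
    ("ntype", "premium~3", "count"),
    ("check_lex_source", "pay~0", "morphology"),
    ("num", "premium~3", "sg"),
]

reduced = reduce_triples(full)
for feature, head, value in reduced:
    print(f"{feature}({head}, {value})")
```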

Reduced triples example
Meridian will pay a premium of $30.5 million to assume $2 billion in deposits.

tense(pay~0, fut), adjunct(pay~0, assume~7), obj(pay~0, premium~3), stmt_type(pay~0, declarative), subj(pay~0, Meridian~5),
det_type(premium~3, indef), adjunct(premium~3, of~23), num(premium~3, sg),
adjunct(million~4, 30.5~28), number_type(million~4, cardinal),
num(Meridian~5, sg),
obj(assume~7, $~9), subj(assume~7, pro~8),
number($~9, billion~17), adjunct($~9, in~11), num($~9, pl),
adjunct_type(in~11, nominal), obj(in~11, deposit~12),
num(deposit~12, pl),
adjunct(billion~17, 2~19), number_type(billion~17, cardinal),
number_type(2~19, cardinal),
obj(of~23, $~24),
number($~24, million~4), num($~24, pl),
number_type(30.5~28, cardinal)

Experiments
• Tune on 140 held-out sentences
  – Collins tuning
    » Beam size 1,000 is much faster, almost as accurate as recommended 10,000
  – XLE tuning
    » Accuracy: estimation parameters
    » Speed: skimming, maxmedial, stochastic time-out
    » Identify Core grammar: certain OTMARKS made NOGOOD
    » Faster, with more fragments, but acceptable accuracy
• Test on remaining 560
  – F-score on (reduced) triples
  – Time
    » XLE: includes morphology, parsing, disambiguation
    » Collins: includes tagging, parsing; doesn’t include stemming, triples conversion
  – Core and Complete grammar
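Scoring on triples can be sketched as set overlap between a parser's triples and the gold-standard triples. The slides do not spell out the matching procedure, so exact-match on whole triples and the `score` function below are assumptions; the sample triples are a small hand-picked subset with one deliberately wrong attachment.

```python
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]


def score(predicted: Iterable[Triple], gold: Iterable[Triple]) -> Tuple[float, float, float]:
    """Return (precision, recall, F-score) for exact-match triples."""
    predicted_set, gold_set = set(predicted), set(gold)
    matched = len(predicted_set & gold_set)
    precision = matched / len(predicted_set) if predicted_set else 0.0
    recall = matched / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


gold = [
    ("tense", "pay~0", "fut"),
    ("subj", "pay~0", "Meridian~5"),
    ("obj", "pay~0", "premium~3"),
    ("num", "premium~3", "sg"),
]
predicted = [
    ("tense", "pay~0", "fut"),
    ("subj", "pay~0", "Meridian~5"),
    ("obj", "pay~0", "million~4"),  # wrong attachment, counts against precision
]

precision, recall, f_score = score(predicted, gold)
print(f"P={precision:.3f} R={recall:.3f} F={f_score:.3f}")
```

Whether counts are pooled over the whole test set or averaged per sentence changes the final numbers; this sketch scores a single sentence only.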

Results

               Time    Prec  Recall  F-score
LFG Core       298.88  79.1  76.2    77.6
LFG Complete   985.3   79.4  79.8    79.6
Collins 1,000  199.6   78.3  71.2    74.6
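As a quick consistency check, the reported F-scores can be recomputed from precision and recall, assuming F is the usual balanced harmonic mean 2PR/(P+R), and the times can be expressed relative to the Collins run. The numbers are copied from the results above; the slide does not state the time unit, so only ratios are computed.

```python
# (time, precision, recall, reported F-score) as given in the results.
RESULTS = {
    "LFG Core": (298.88, 79.1, 76.2, 77.6),
    "LFG Complete": (985.3, 79.4, 79.8, 79.6),
    "Collins 1,000": (199.6, 78.3, 71.2, 74.6),
}


def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (harmonic mean of precision and recall)."""
    return 2 * precision * recall / (precision + recall)


collins_time = RESULTS["Collins 1,000"][0]
recomputed = {}
time_ratio = {}
for system, (time, precision, recall, reported) in RESULTS.items():
    recomputed[system] = f_score(precision, recall)
    time_ratio[system] = time / collins_time
    print(
        f"{system}: F recomputed {recomputed[system]:.1f} "
        f"(reported {reported}), time relative to Collins {time_ratio[system]:.2f}"
    )
```

On these figures the Core grammar takes roughly one and a half times as long as Collins with beam size 1,000, and the Complete grammar roughly five times as long, while both score higher on F.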