Speed and accuracy in shallow and deep stochastic parsing
Ron, Stefan, Tracy, John, Alex, Dick
HLT/NAACL 2004

Popular Myth
• Shallow statistical parsers are fast, robust… and useful
• Deep grammar-based parsers are slow and brittle
• Is this true?

Empirical test
• Compare current systems
  – Stochastic XLE with English LFG grammar
  – Collins (1999) treebank parser: de facto standard
• Inspired by Culy's experiments (Fall 2003 ParGram; FX PAL tech report)
  – How long to parse the same corpus?
  – Measured coverage as an indicator of quality
• Present comparison: speed and accuracy
  – How long to parse the same corpus?
  – Measure accuracy on dependency triples (PARC 700 Gold Standard), not phrase trees
• Dependencies are needed for meaning-sensitive applications (= usefulness): translation, question answering… but maybe not IR

Collins (1999) Parser
• Treebank grammar, stochastic selection learned from WSJ
• Model 3: categories explicitly mark
  – Heads
  – Arguments (vs. adjuncts)
  – Gap-threads and traces
• Requires a separate part-of-speech tagger (Ratnaparkhi, 1996)
  – Words are not stemmed
• Produces the single most probable parse
• Speed/accuracy trade-off controlled by a beam-size parameter

A Collins tree
[Figure: Collins parse tree for "He reiterated his opposition to such funding, but expressed hope of a compromise."]

Cleaned up a bit
• Nonterminals simplified
• Terminals stemmed (à la English FST, filtered by POS tags)
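
Stemming terminals with POS filtering can be sketched as follows. This is a minimal sketch that assumes a morphological analyzer returning several candidate (stem, category) analyses per word, with the Collins tagger's POS tag choosing among them. The `ANALYSES` table and the `TAG_MAP` from Penn tags to coarse categories are hypothetical stand-ins for the English FST. They do not reflect its actual output format.

```python
# Hypothetical stand-in for the English FST: word -> candidate (stem, category) analyses.
ANALYSES = {
    "expressed": [("express", "V"), ("expressed", "ADJ")],
    "following": [("follow", "V"), ("following", "ADJ"), ("following", "N")],
    "hopes": [("hope", "N"), ("hope", "V")],
}

# Hypothetical mapping from Penn Treebank tags to the coarse categories above.
TAG_MAP = {"VBD": "V", "VBN": "V", "VBG": "V", "VBZ": "V",
           "JJ": "ADJ", "NN": "N", "NNS": "N"}


def stem(word: str, pos_tag: str) -> str:
    """Return the first analysis whose category agrees with the POS tag.

    Falls back to the surface form when the analyzer has no entry for the
    word or no analysis is compatible with the tag.
    """
    wanted = TAG_MAP.get(pos_tag)
    for candidate_stem, category in ANALYSES.get(word.lower(), []):
        if category == wanted:
            return candidate_stem
    return word


print(stem("expressed", "VBD"))  # express
print(stem("following", "JJ"))   # following
print(stem("hopes", "NNS"))      # hope
```

Because the tag filters the analyses, "following/JJ" keeps its surface form rather than the verbal stem "follow". This is the kind of stem mismatch the tree-to-triples mapping has to cope with.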

Gap-threading

Trees to triples: a hard problem
• Goal is a fair comparison… without building an LFG grammar/lexicon
• General mapping principles:
  – NP-a under S is SUBJ, NP-a under VP is OBJ, …
  – Use tags to get tense, number, … features
• Transformations for equivalent analyses
  – Embedded auxiliaries → perf, prog features
  – Flat coordination → conjunct phrases
• Stem mismatches
  – Named entities, participles vs. base forms (following/JJ)
• Subjects of infinitives? (promise vs. persuade)
• Distribution over coordinations?
• An imperfect science
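
The general mapping principles can be sketched in a few lines. This is a minimal sketch under stated assumptions. Trees are nested tuples with Collins-style labels, and the -A suffix marks arguments (written NP-a on the slide). The head finder is a crude stand-in and is not Collins' head table. The `TENSE` and `NUM` tag-to-feature tables and the feature values they produce are hypothetical and illustrative only.

```python
from typing import List, Tuple

# Hypothetical tag-to-feature tables ("use tags to get tense, number, ... features").
TENSE = {"VBD": "past", "VBZ": "pres", "VBP": "pres"}
NUM = {"NN": "sg", "NNS": "pl"}


def is_preterminal(node: tuple) -> bool:
    # A preterminal is (tag, word); other nodes are (label, child, child, ...).
    return len(node) == 2 and isinstance(node[1], str)


def head_word(node: tuple) -> str:
    """Crude head finder (an assumption, not Collins' head table):
    first verb of a VP, last noun or pronoun of an NP, else head of last child."""
    if is_preterminal(node):
        return node[1]
    label, children = node[0], node[1:]
    words = [c for c in children if is_preterminal(c)]
    if label.startswith("VP"):
        for tag, word in words:
            if tag.startswith("VB"):
                return word
    if label.startswith("NP"):
        for tag, word in reversed(words):
            if tag.startswith(("NN", "PRP")):
                return word
    return head_word(children[-1])


def tree_to_triples(node: tuple) -> List[Tuple[str, str, str]]:
    """NP-A under S -> subj, NP-A under VP -> obj, plus tag-derived features."""
    if is_preterminal(node):
        return []
    label, children = node[0], node[1:]
    head = head_word(node)
    out = []
    for child in children:
        if is_preterminal(child):
            tag, word = child
            if label.startswith("VP") and tag in TENSE and word == head:
                out.append(("tense", head, TENSE[tag]))
            if label.startswith("NP") and tag in NUM and word == head:
                out.append(("num", head, NUM[tag]))
            continue
        if child[0] == "NP-A" and label == "S":
            out.append(("subj", head, head_word(child)))
        elif child[0] == "NP-A" and label.startswith("VP"):
            out.append(("obj", head, head_word(child)))
        out.extend(tree_to_triples(child))
    return out


# First clause of the example sentence, as a simplified Collins-style tree.
tree = ("S",
        ("NP-A", ("PRP", "He")),
        ("VP", ("VBD", "reiterated"),
               ("NP-A", ("PRP$", "his"), ("NN", "opposition"))))
print(tree_to_triples(tree))
```

Only the easy cases are covered here. Auxiliary chains, coordination, and control (promise vs. persuade) would each need further transformations, which is why the slide calls this an imperfect science.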

F-structures to (reduced) triples
• Eliminate features that are peculiar to the LFG parse
  – CHECK features, NTYPE, VTYPE, etc.
• Keep semantically relevant features
  – Grammatical functions, tense, number, stmt-type, etc.
• Again, an attempt at a fair comparison
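
The reduction from f-structures to triples can be sketched as a recursive walk that drops LFG-specific features and keeps grammatical functions and semantic features. This sketch assumes a plain nested-dictionary encoding of the f-structure, with set values such as ADJUNCT given as lists. It does not use XLE's own data structures. The `DROP` set lists only the slide's examples, and the NTYPE/VTYPE/CHECK values in the toy input are placeholders.

```python
from typing import Any, Dict, List, Tuple

# Features peculiar to the LFG parse (the slide's examples); not compared.
DROP = {"CHECK", "NTYPE", "VTYPE"}


def reduce_fs(fs: Dict[str, Any]) -> List[Tuple[str, str, str]]:
    """Flatten an f-structure into (relation, head, value) triples.

    Assumes every embedded f-structure that survives the filter has a PRED;
    set-valued attributes (e.g. ADJUNCT) are given as Python lists.
    """
    head = fs["PRED"]
    out: List[Tuple[str, str, str]] = []
    for attr, value in fs.items():
        if attr == "PRED" or attr in DROP:
            continue
        rel = attr.lower().replace("-", "_")  # STMT-TYPE -> stmt_type
        for member in value if isinstance(value, list) else [value]:
            if isinstance(member, dict):      # grammatical function
                out.append((rel, head, member["PRED"]))
                out.extend(reduce_fs(member))
            else:                             # atomic feature
                out.append((rel, head, member))
    return out


# Toy f-structure for "Meridian will pay a premium"; NTYPE/VTYPE/CHECK values are placeholders.
fs = {
    "PRED": "pay~0", "TENSE": "fut", "STMT-TYPE": "declarative", "VTYPE": "main",
    "SUBJ": {"PRED": "Meridian~5", "NUM": "sg", "NTYPE": "proper"},
    "OBJ": {"PRED": "premium~3", "NUM": "sg", "DET-TYPE": "indef", "CHECK": {"X": "+"}},
}
for triple in reduce_fs(fs):
    print(triple)
```

The relation names this produces (tense, stmt_type, subj, obj, num, det_type) match those in the reduced-triples example. VTYPE, NTYPE and CHECK never reach the output.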

Reduced triples example
Meridian will pay a premium of $30.5 million to assume $2 billion in deposits.
  tense(pay~0, fut)
  adjunct(pay~0, assume~7)
  obj(pay~0, premium~3)
  stmt_type(pay~0, declarative)
  subj(pay~0, Meridian~5)
  det_type(premium~3, indef)
  adjunct(premium~3, of~23)
  num(premium~3, sg)
  adjunct(million~4, 30.5~28)
  number_type(million~4, cardinal)
  num(Meridian~5, sg)
  obj(assume~7, $~9)
  subj(assume~7, pro~8)
  number($~9, billion~17)
  adjunct($~9, in~11)
  num($~9, pl)
  adjunct_type(in~11, nominal)
  obj(in~11, deposit~12)
  num(deposit~12, pl)
  adjunct(billion~17, 2~19)
  number_type(billion~17, cardinal)
  number_type(2~19, cardinal)
  obj(of~23, $~24)
  number($~24, million~4)
  num($~24, pl)
  number_type(30.5~28, cardinal)
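
Each reduced triple has the form relation(head, dependent). The number after "~" distinguishes individual nodes: for example, the two $ nodes are $~9 and $~24. Atomic values such as sg or fut carry no index. A minimal parser for this textual notation might look like the sketch below. It is written against the example as printed here and is not against any official file format.

```python
import re
from typing import List, Tuple

# relation(head, dependent); nodes look like "premium~3", atomic values like "sg".
TRIPLE = re.compile(r"(\w+)\(\s*([^,()]+?)\s*,\s*([^,()]+?)\s*\)")


def parse_triples(text: str) -> List[Tuple[str, str, str]]:
    """Return every relation(head, dependent) in the text as a 3-tuple of strings."""
    return TRIPLE.findall(text)


def split_node(node: str) -> Tuple[str, str]:
    """'premium~3' -> ('premium', '3'); an atomic value such as 'sg' -> ('sg', '')."""
    word, sep, index = node.rpartition("~")
    return (word, index) if sep else (node, "")


text = "tense(pay~0, fut), obj(pay~0, premium~3), num(premium~3, sg), adjunct(million~4, 30.5~28)"
triples = parse_triples(text)
print(triples)
print(split_node("30.5~28"), split_node("sg"))
```

Once parsed into tuples, the triples from the gold standard and from each parser can be compared as sets.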

Experiments
• Tune on 140 held-out sentences
  – Collins tuning
    » Beam size 1,000 is much faster and almost as accurate as the recommended 10,000
  – XLE tuning
    » Accuracy: estimation parameters
    » Speed: skimming, maxmedial, stochastic time-out
    » Identify Core grammar: certain OTMARKS made NOGOOD
    » Faster, with more fragments, but acceptable accuracy
• Test on the remaining 560
  – F-score on (reduced) triples
  – Time
    » XLE: includes morphology, parsing, disambiguation
    » Collins: includes tagging and parsing; doesn't include stemming or triples conversion
  – Core and Complete grammar
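
Scoring on triples means counting how many of a parser's triples also appear in the gold standard. Precision is matches over the parser's triples, recall is matches over the gold triples, and F is their harmonic mean. The sketch below makes two assumptions that are not taken from the slides. First, triples are matched after stripping the "~N" node indices, since the two systems number nodes independently. Second, duplicates are handled as a multiset intersection. For a whole test set, one would presumably pool the matched, gold and test counts over all sentences before dividing, but that aggregation is also an assumption here.

```python
from collections import Counter
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]


def strip_index(node: str) -> str:
    # Drop the "~N" node index (assumption: indices are system-specific).
    word, sep, _ = node.rpartition("~")
    return word if sep else node


def precision_recall_f(gold: Iterable[Triple], system: Iterable[Triple]) -> Tuple[float, float, float]:
    g = Counter((rel, strip_index(h), strip_index(d)) for rel, h, d in gold)
    s = Counter((rel, strip_index(h), strip_index(d)) for rel, h, d in system)
    matched = sum((g & s).values())  # multiset intersection
    precision = matched / sum(s.values()) if s else 0.0
    recall = matched / sum(g.values()) if g else 0.0
    f = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f


gold = [("subj", "pay~0", "Meridian~5"), ("obj", "pay~0", "premium~3"),
        ("tense", "pay~0", "fut"), ("num", "premium~3", "sg")]
system = [("subj", "pay~1", "Meridian~2"), ("obj", "pay~1", "premium~4"),
          ("tense", "pay~1", "past")]
print(precision_recall_f(gold, system))  # precision 2/3, recall 1/2, F 4/7
```

In this toy case, the wrong tense value and the missing num triple cost precision and recall respectively, even though the subject and object are found.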

Results

System                  Time     Prec   Recall   F-score
LFG Core                298.88   79.1   76.2     77.6
LFG Complete            985.3    79.4   79.8     79.6
Collins (beam 1,000)    199.6    78.3   71.2     74.6
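
As a consistency check on the results, F-score is the harmonic mean of precision and recall, F = 2PR/(P + R). The small sketch below recomputes F from the Prec and Recall values. It also expresses each system's Time as a multiple of the Collins time, which gives a unit-free ratio. The values are transcribed from the results table, and the function names are illustrative.

```python
# Values transcribed from the results table: (time, precision, recall, reported F).
RESULTS = {
    "LFG Core": (298.88, 79.1, 76.2, 77.6),
    "LFG Complete": (985.3, 79.4, 79.8, 79.6),
    "Collins (beam 1,000)": (199.6, 78.3, 71.2, 74.6),
}


def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def summarize(baseline: str = "Collins (beam 1,000)"):
    """Recomputed F (1 decimal), reported F, and time relative to the baseline."""
    base_time = RESULTS[baseline][0]
    rows = []
    for name, (elapsed, p, r, reported_f) in RESULTS.items():
        rows.append((name, round(f_score(p, r), 1), reported_f, round(elapsed / base_time, 2)))
    return rows


for name, f, reported_f, ratio in summarize():
    print(f"{name}: F = {f} (reported {reported_f}), time relative to Collins = {ratio}x")
```

The recomputed F-scores round to the reported values. By this ratio, the Core grammar takes roughly 1.5 times the Collins time and the Complete grammar roughly 4.9 times, while both LFG configurations score higher on F.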