CPSC 503 Computational Linguistics Lecture 8 Giuseppe Carenini

  • Slides: 52
CPSC 503 Computational Linguistics Lecture 8 Giuseppe Carenini 9/14/2021 CPSC 503 Winter 2009 1

Today 1/10
• Finish POS tagging
• Start Syntax / Parsing (Chp 12!)

Evaluating Taggers
• Accuracy: percent correct (most current taggers reach 96-97%). Test on unseen data!
• Human ceiling: agreement rate of humans on classification (96-97%)
• Unigram baseline: assign each token to the class it occurred in most frequently in the training set (race -> NN). (91%)
• What is causing the errors? Build a confusion matrix…
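The unigram baseline and the accuracy measure can be sketched in a few lines of Python. This is a minimal illustration, not the course's implementation: the function names, the toy training data, and the choice of NN as the default tag for unseen words are all assumptions made for the example.

```python
from collections import Counter, defaultdict

def train_unigram_tagger(tagged_tokens):
    """Map each word to the tag it received most often in the training set."""
    counts = defaultdict(Counter)
    for word, tag in tagged_tokens:
        counts[word][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def tag(words, model, default="NN"):
    """Tag each word; unseen words get the default tag (an assumption)."""
    return [model.get(word, default) for word in words]

def accuracy(predicted, gold):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)

# Hypothetical toy training data: "race" is NN twice and VB once.
train = [("the", "DT"), ("race", "NN"), ("a", "DT"), ("race", "NN"),
         ("to", "TO"), ("race", "VB")]
model = train_unigram_tagger(train)

# Held-out sentence "to race the race" with gold tags TO VB DT NN.
test_words = ["to", "race", "the", "race"]
test_gold = ["TO", "VB", "DT", "NN"]
predicted = tag(test_words, model)
print(predicted, accuracy(predicted, test_gold))
```

The baseline tags both occurrences of "race" as NN, so it misses the verb reading; that is the kind of error a confusion matrix makes visible.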

Confusion matrix
• Precision?
• Recall?
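One way to answer the two questions is to compute per-tag precision and recall directly from the confusion counts. The sketch below assumes the matrix is stored as counts of (gold tag, predicted tag) pairs; the gold and predicted sequences are made-up data, and the function names are illustrative.

```python
from collections import Counter

def confusion_matrix(gold, predicted):
    """Count how often each (gold_tag, predicted_tag) pair occurs."""
    return Counter(zip(gold, predicted))

def precision(matrix, tag):
    """Of the tokens predicted as `tag`, the fraction whose gold tag is `tag`."""
    predicted_as_tag = sum(n for (g, p), n in matrix.items() if p == tag)
    return matrix[(tag, tag)] / predicted_as_tag if predicted_as_tag else 0.0

def recall(matrix, tag):
    """Of the tokens whose gold tag is `tag`, the fraction predicted as `tag`."""
    truly_tag = sum(n for (g, p), n in matrix.items() if g == tag)
    return matrix[(tag, tag)] / truly_tag if truly_tag else 0.0

# Hypothetical gold and predicted tag sequences.
gold      = ["NN", "NN", "NNP", "JJ", "VBD", "VBN", "NN", "VBD"]
predicted = ["NN", "JJ", "NN",  "JJ", "VBD", "VBD", "NN", "VBD"]
cm = confusion_matrix(gold, predicted)

for t in ["NN", "VBD", "VBN"]:
    print(t, precision(cm, t), recall(cm, t))
```

The off-diagonal cell `cm[("VBN", "VBD")]` counts past participles mistaken for past-tense verbs.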

Error Analysis (textbook)
• Look at a confusion matrix
• See what errors are causing problems
  – Noun (NN) vs Proper Noun (NNP) vs Adj (JJ)
  – Past tense (VBD) vs Past Participle (VBN)

Knowledge-Formalisms Map (next three lectures)
Knowledge: Morphology, Syntax, Semantics, Pragmatics, Discourse and Dialogue
Formalisms:
• State Machines (and prob. versions) (Finite State Automata, Finite State Transducers, Markov Models)
• Rule systems (and prob. versions) (e.g., (Prob.) Context-Free Grammars)
• Logical formalisms (First-Order Logics)
• AI planners

Today 1/10
• Finish POS tagging
• English Syntax
• Context-Free Grammar for English
  – Rules
  – Trees
  – Recursion
  – Problems
• Start Parsing

Syntax
Def. The study of how sentences are formed by grouping and ordering words
Example:
  Ming and Sue prefer morning flights
  * Ming Sue flights morning and prefer
Groups behave as a single unit with respect to Substitution, Movement, Coordination

Syntax: Useful tasks
• Why should you care?
  – Grammar checkers
  – Basis for semantic interpretation
    • Question answering
    • Information extraction
    • Summarization
  – Machine translation
  – …

Key Constituents – with heads (English)
General pattern: (Specifier) X (Complement)
• Noun phrases: (Det) N (PP)
• Verb phrases: (Qual) V (NP)
• Prepositional phrases: (Deg) P (NP)
• Adjective phrases: (Deg) A (PP)
• Sentences: (NP) (I) (VP)
Some simple specifiers
Category      Typical function       Examples
Determiner    specifier of N         the, a, this, no, …
Qualifier     specifier of V         never, often, …
Degree word   specifier of A or P    very, almost, …
Complements?

Key Constituents: Examples
• Noun phrases: (Det) N (PP), e.g., the cat on the table
• Verb phrases: (Qual) V (NP), e.g., never eat a cat
• Prepositional phrases: (Deg) P (NP), e.g., almost in the net
• Adjective phrases: (Deg) A (PP), e.g., very happy about it
• Sentences: (NP) (I) (VP), e.g., a mouse -- ate it

Context Free Grammar (Example)
• S -> NP VP (S is the start symbol)
• NP -> Det NOMINAL
• NOMINAL -> Noun
• VP -> Verb
• Det -> a
• Noun -> flight
• Verb -> left
Non-terminals: S, NP, NOMINAL, VP, Det, Noun, Verb
Terminals: a, flight, left
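The example grammar is small enough to write down as a Python dictionary and enumerate exhaustively. The sketch below is one possible encoding (left-hand side mapped to a list of right-hand sides); the `generate` helper is hypothetical and only terminates for non-recursive grammars like this one.

```python
from itertools import product

# Each non-terminal maps to its alternative right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "NOMINAL"]],
    "NOMINAL": [["Noun"]],
    "VP": [["Verb"]],
    "Det": [["a"]],
    "Noun": [["flight"]],
    "Verb": [["left"]],
}

def generate(symbol, grammar):
    """Yield every word list derivable from `symbol`.

    Only safe for non-recursive grammars: a recursive rule would make
    this generator run forever.
    """
    if symbol not in grammar:          # terminal: it derives only itself
        yield [symbol]
        return
    for rhs in grammar[symbol]:
        # Expand every symbol of the right-hand side, then combine in order.
        expansions = [list(generate(s, grammar)) for s in rhs]
        for combo in product(*expansions):
            yield [word for part in combo for word in part]

sentences = [" ".join(words) for words in generate("S", GRAMMAR)]
print(sentences)
```

With this grammar the language contains a single sentence, "a flight left".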

CFG more complex Example
[Figure: grammar with example phrases, and lexicon]

CFGs
• Define a Formal Language (un/grammatical sentences)
• Generative Formalism
  – Generate strings in the language
  – Reject strings not in the language
  – Impose structures (trees) on strings in the language

CFG: Formal Definitions
• 4-tuple (non-terminals, terminals, productions, start): $(N, \Sigma, P, S)$
• $P$ is a set of rules $A \to \alpha$; $A \in N$, $\alpha \in (\Sigma \cup N)^*$
• A derivation is the process of rewriting $\alpha_1$ into $\alpha_m$ (both strings in $(\Sigma \cup N)^*$) by applying a sequence of rules: $\alpha_1 \Rightarrow^* \alpha_m$
• $L_G = \{ w \mid w \in \Sigma^* \text{ and } S \Rightarrow^* w \}$
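A derivation can be made concrete by listing the successive sentential forms. The sketch below uses the toy "a flight left" grammar and always rewrites the leftmost non-terminal with its first rule; both the rule encoding and the `leftmost_derivation` helper are illustrative assumptions, and the loop only terminates because this grammar is non-recursive.

```python
# Productions P: each non-terminal maps to a list of right-hand sides.
RULES = {
    "S": [("NP", "VP")],
    "NP": [("Det", "NOMINAL")],
    "NOMINAL": [("Noun",)],
    "VP": [("Verb",)],
    "Det": [("a",)],
    "Noun": [("flight",)],
    "Verb": [("left",)],
}

def leftmost_derivation(start, rules):
    """Return the sentential forms from `start` to a terminal string,
    always rewriting the leftmost non-terminal with its first rule."""
    form = (start,)
    steps = [form]
    while any(sym in rules for sym in form):
        # Position of the leftmost non-terminal in the current form.
        i = next(k for k, sym in enumerate(form) if sym in rules)
        form = form[:i] + rules[form[i]][0] + form[i + 1:]
        steps.append(form)
    return steps

steps = leftmost_derivation("S", RULES)
for form in steps:
    print(" ".join(form))
```

The first line printed is the start symbol and the last is a string of terminals, so the sequence is a witness that the start symbol derives that string.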

Derivations as Trees
[Figure: parse tree corresponding to a derivation]
Context Free?

CFG Parsing
[Figure: the sentence "I prefer a morning flight" is input to a Parser, which outputs a parse tree]
• It is completely analogous to running a finite-state transducer with a tape
  – It’s just more powerful
• Chpt. 13

Other Options
• Regular languages (FSA): $A \to xB$ or $A \to x$
  – Too weak (e.g., cannot deal with recursion in a general way – no center-embedding)
• CFGs: $A \to \gamma$ (also produce more understandable and “useful” structure)
• Context-sensitive: $\alpha A \beta \to \alpha \gamma \beta$; $\gamma \neq \epsilon$
  – Can be computationally intractable
• Turing equiv.: $\alpha \to \beta$; $\alpha \neq \epsilon$
  – Too powerful / Computationally intractable

Common Sentence-Types
• Declaratives: A plane left
  S -> NP VP
• Imperatives: Leave!
  S -> VP
• Yes-No Questions: Did the plane leave?
  S -> Aux NP VP
• WH Questions:
  Which flights serve breakfast?
  S -> WH NP VP
  When did the plane leave?
  S -> WH Aux NP VP

NP: more details
NP -> Specifiers N Complements
• NP -> (Predet)(Det)(Card)(Ord)(Quant)(AP) Nom
  e.g., all the other cheap cars
• Nom -> Nom PP (PP)
  e.g., reservation on BA 456 from NY to YVR
• Nom -> Nom GerundVP
  e.g., flight arriving on Monday
• Nom -> Nom RelClause
  RelClause -> (who | that) VP
  e.g., flight that arrives in the evening

Conjunctive Constructions
• S -> S and S
  – John went to NY and Mary followed him
• NP -> NP and NP
  – John went to NY and Boston
• VP -> VP and VP
  – John went to NY and visited MOMA
• …
• In fact the right rule for English is X -> X and X

Problems with CFGs
• Agreement
• Subcategorization

Agreement
• In English,
  – Determiners and nouns have to agree in number
  – Subjects and verbs have to agree in person and number
• Many languages have agreement systems that are far more complex than this (e.g., gender).

Agreement
• This dog
• Those dogs
• *This dogs
• *Those dog
• This dog eats
• You have it
• Those dogs eat
• *This dog eat
• *You has it
• *Those dogs eats

Possible CFG Solution
OLD Grammar
• S -> NP VP
• NP -> Det Nom
• VP -> V NP
• …
NEW Grammar (Sg = singular, Pl = plural)
• SgS -> SgNP SgVP
• PlS -> PlNP PlVP
• SgNP -> SgDet SgNom
• PlNP -> PlDet PlNom
• PlVP -> PlV NP
• SgVP3p -> SgV3p NP
• …

CFG Solution for Agreement
• It works and stays within the power of CFGs
• But it doesn’t scale all that well (explosion in the number of rules)
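The rule explosion can be made concrete by multiplying out a small grammar over its agreement features. This is a deliberately simplified sketch: it attaches the feature prefix to every symbol of every rule, whereas a hand-written grammar would mark only the symbols that must agree. The helper name and the person labels (1p, 2p, 3p) are assumptions for illustration.

```python
from itertools import product

def expand_with_features(rules, features):
    """Copy each rule once per combination of feature values, prefixing
    every symbol with that combination (a simplification: real grammars
    mark only the symbols that have to agree)."""
    expanded = []
    for combo in product(*features.values()):
        prefix = "".join(combo)
        for lhs, rhs in rules:
            expanded.append((prefix + lhs, tuple(prefix + sym for sym in rhs)))
    return expanded

BASE_RULES = [("S", ("NP", "VP")), ("NP", ("Det", "Nom")), ("VP", ("V", "NP"))]

number_only = expand_with_features(BASE_RULES, {"number": ["Sg", "Pl"]})
number_and_person = expand_with_features(
    BASE_RULES, {"number": ["Sg", "Pl"], "person": ["1p", "2p", "3p"]})

print(len(BASE_RULES), len(number_only), len(number_and_person))
```

The rule count grows with the product of the feature value sets, so each additional agreement feature (gender, case, and so on) multiplies the size of the grammar again.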

Subcategorization
• Def. It expresses constraints that a predicate (verb here) places on the number and type of its arguments (see first table)
• *John sneezed the book
• *I prefer United has a flight
• *Give with a flight

Subcategorization
• Sneeze: John sneezed
• Find: Please find [a flight to NY]NP
• Give: Give [me]NP [a cheaper fare]NP
• Help: Can you help [me]NP [with a flight]PP
• Prefer: I prefer [to leave earlier]TO-VP
• Told: I was told [United has a flight]S
• …

So?
• So the various rules for VPs overgenerate.
  – They allow strings containing verbs and arguments that don’t go together
  – For example:
    • VP -> V NP, therefore: *sneezed the book
    • VP -> V S, therefore: *go she will go there

Possible CFG Solution
OLD Grammar
• VP -> V NP PP
• …
NEW Grammar
• VP -> IntransV
• VP -> TransV NP
• VP -> TransPPto NP PPto
• …
• TransPPto -> hand, give, …
This solution has the same problem as the one for agreement
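Subcategorization constraints can also be pictured as a lookup from each verb to the complement frames it licenses. The sketch below uses only the frames illustrated in the examples for sneeze, find, give, help, prefer and tell; real verbs usually allow more frames, and the `SUBCAT` table and `licensed` function are hypothetical names introduced for this example.

```python
# Verb -> list of licensed complement frames (each frame is a tuple of
# phrase types). Only the frames from the examples are listed here.
SUBCAT = {
    "sneeze": [()],              # John sneezed
    "find":   [("NP",)],         # find [a flight to NY]
    "give":   [("NP", "NP")],    # give [me] [a cheaper fare]
    "help":   [("NP", "PP")],    # help [me] [with a flight]
    "prefer": [("TO-VP",)],      # prefer [to leave earlier]
    "tell":   [("S",)],          # was told [United has a flight]
}

def licensed(verb, complements):
    """True if the verb's subcategorization frames include this sequence."""
    return tuple(complements) in SUBCAT.get(verb, [])

print(licensed("sneeze", []))        # John sneezed
print(licensed("sneeze", ["NP"]))    # *John sneezed the book
print(licensed("prefer", ["S"]))     # *I prefer United has a flight
print(licensed("give", ["PP"]))      # *Give with a flight
```

A plain rule such as VP -> V NP ignores this table, which is why it accepts the starred examples.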

CFG for NLP: summary
• CFGs cover most syntactic structure in English.
• But there are problems (overgeneration)
  – That can be dealt with adequately, although not elegantly, by staying within the CFG framework.
• There are simpler, more elegant solutions that take us out of the CFG framework: LFG, XTAGS… (Chpt 15 “Features and Unification”)

Dependency Grammars
• Syntactic structure: binary relations between words
• Links: grammatical function or very general semantic relation
• Abstract away from word-order variations (simpler grammars)
• Useful features in many NLP applications (for classification, summarization and NLG)
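A dependency analysis can be stored as a list of (head, relation, dependent) triples. The analysis of "I prefer a morning flight" below is an illustrative guess, and the relation labels (subj, obj, det, mod) are assumptions rather than labels taken from the lecture or from any particular treebank.

```python
# Hypothetical dependency analysis: (head, relation, dependent).
WORDS = ["I", "prefer", "a", "morning", "flight"]
DEPENDENCIES = [
    ("prefer", "subj", "I"),
    ("prefer", "obj", "flight"),
    ("flight", "det", "a"),
    ("flight", "mod", "morning"),
]

def heads_of(word, deps):
    """All heads that `word` depends on (exactly one in a well-formed tree)."""
    return [head for head, _, dependent in deps if dependent == word]

def dependents_of(word, deps):
    """All (relation, dependent) pairs governed by `word`."""
    return [(rel, dependent) for head, rel, dependent in deps if head == word]

def root(words, deps):
    """The single word that has no head."""
    roots = [w for w in words if not heads_of(w, deps)]
    if len(roots) != 1:
        raise ValueError("expected exactly one root, found %r" % roots)
    return roots[0]

print(root(WORDS, DEPENDENCIES))
print(dependents_of("flight", DEPENDENCIES))
```

Because the triples say nothing about position, the same structure would describe a different word order, and each triple can serve directly as a feature for a classifier.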

Today 2/10
• English Syntax
• Context-Free Grammar for English
  – Rules
  – Trees
  – Recursion
  – Problems
• Start Parsing (if time left)

Parsing with CFGs
[Figure: a sequence of words ("I prefer a morning flight") and a CFG are input to the Parser, which outputs valid parse trees]
Assign valid trees: a valid tree covers all and only the elements of the input and has an S at the top

Parsing as Search
CFG:
• S -> NP VP
• S -> Aux NP VP
• NP -> Det Noun
• VP -> Verb
• Det -> a
• Noun -> flight
• Verb -> left, arrive
• Aux -> do, does
The CFG defines the search space of possible parse trees
Parsing: find all trees that cover all and only the words in the input

Constraints on Search
[Figure: a sequence of words ("I prefer a morning flight") and a CFG (search space) are input to the Parser, which outputs valid parse trees]
Search Strategies:
• Top-down or goal-directed
• Bottom-up or data-directed

Top-Down Parsing
• Since we’re trying to find trees rooted with an S (Sentences), start with the rules that give us an S.
• Then work your way down from there to the words.
[Figure: top-down search space]

Next step: Top Down Space
[Figure: next level of the top-down search space]
• When POS categories are reached, reject trees whose leaves fail to match all words in the input
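The top-down idea can be sketched as a small recursive recognizer: start from S, expand the leftmost symbol with each rule in turn, and discard a branch as soon as a terminal fails to match the next input word. The grammar is the small one from the "Parsing as Search" slide; the function names are assumptions, and the sketch would loop forever on a left-recursive rule such as Nom -> Nom PP.

```python
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"]],
    "NP": [["Det", "Noun"]],
    "VP": [["Verb"]],
    "Det": [["a"]],
    "Noun": [["flight"]],
    "Verb": [["left"], ["arrive"]],
    "Aux": [["do"], ["does"]],
}

def expand(symbols, words, pos):
    """Top-down, depth-first, left-to-right search.

    Yield every input position reachable after deriving the symbol
    sequence `symbols` starting at position `pos`.
    """
    if not symbols:
        yield pos
        return
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:
        # Non-terminal: try its rules in textual order.
        for rhs in GRAMMAR[first]:
            yield from expand(rhs + rest, words, pos)
    elif pos < len(words) and words[pos] == first:
        # Terminal: it must match the next input word.
        yield from expand(rest, words, pos + 1)

def recognize(sentence):
    """True if some derivation from S covers all and only the input words."""
    words = sentence.lower().split()
    return any(end == len(words) for end in expand(["S"], words, 0))

print(recognize("a flight left"))
print(recognize("does a flight arrive"))
print(recognize("flight a left"))
```

For "does a flight arrive" the recognizer first tries S -> NP VP and only abandons it when the terminal "a" fails to match "does", which is the wasted effort that bottom-up filtering is meant to avoid.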

Bottom-Up Parsing
• Of course, we also want trees that cover the input words. So start with trees that link up with the words in the right way.
• Then work your way up from there.
[Figure: bottom-up search space]

Two more steps: Bottom-Up Space
[Figure: two further levels of the bottom-up search space]
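The bottom-up search space can be explored with a naive breadth-first recognizer: start from the input words and repeatedly replace any substring that matches a rule's right-hand side with its left-hand side, until only S remains. This is an illustrative sketch over the small grammar from the "Parsing as Search" slide, not an efficient parser; the names are assumptions, and termination relies on the grammar having no empty rules and no cycles of unit rules.

```python
from collections import deque

# (left-hand side, right-hand side) pairs.
RULES = [
    ("S", ("NP", "VP")),
    ("S", ("Aux", "NP", "VP")),
    ("NP", ("Det", "Noun")),
    ("VP", ("Verb",)),
    ("Det", ("a",)),
    ("Noun", ("flight",)),
    ("Verb", ("left",)),
    ("Verb", ("arrive",)),
    ("Aux", ("do",)),
    ("Aux", ("does",)),
]

def reductions(form):
    """Yield every form obtained by replacing one RHS occurrence by its LHS."""
    for lhs, rhs in RULES:
        n = len(rhs)
        for i in range(len(form) - n + 1):
            if form[i:i + n] == rhs:
                yield form[:i] + (lhs,) + form[i + n:]

def recognize_bottom_up(sentence):
    """Breadth-first search from the words up towards a single S."""
    start = tuple(sentence.lower().split())
    seen, queue = {start}, deque([start])
    while queue:
        form = queue.popleft()
        if form == ("S",):
            return True
        for nxt in reductions(form):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

print(recognize_bottom_up("a flight left"))
print(recognize_bottom_up("does a flight arrive"))
print(recognize_bottom_up("flight a left"))
```

On "does a flight arrive" the search also builds the form Aux S (by reducing NP VP to S too early), a locally valid structure that can never become a complete parse.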

Top-Down vs. Bottom-Up
• Top-down
  – Only searches for trees that can be answers
  – But suggests trees that are not consistent with the words
• Bottom-up
  – Only forms trees consistent with the words
  – But suggests trees that make no sense globally

So Combine Them
• Top-down: control strategy to generate trees
• Bottom-up: to filter out inappropriate parses
Top-down control strategy:
• Depth vs. breadth first
• Which node to try to expand next (left-most)
• Which grammar rule to use to expand a node (textual order)

Top-Down, Depth-First, Left-to-Right Search
Sample sentence: “Does this flight include a meal?”

Example: “Does this flight include a meal?”
[Figures: successive steps of the parse of the sample sentence; the trees are not recoverable]

Adding Bottom-up Filtering
The following sequence was a waste of time because an NP cannot generate a parse tree starting with an Aux
[Figure: the wasted sequence of top-down expansions]

Bottom-Up Filtering
Category   Left Corners
S          Det, Proper-Noun, Aux, Verb
NP         Det, Proper-Noun
Nominal    Noun
VP         Verb
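A left-corner table like this one can be computed from a grammar by following the first symbol of each rule until a POS category is reached. The grammar below is an assumption: it is a small textbook-style grammar chosen because it reproduces the table, not one given on the slide, and the function names are illustrative.

```python
# Assumed grammar (not from the slide). Symbols without rules are POS categories.
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nominal"], ["Proper-Noun"]],
    "Nominal": [["Noun"], ["Noun", "Nominal"]],
    "VP": [["Verb"], ["Verb", "NP"]],
}

def left_corners(category, grammar, _seen=None):
    """Return the POS categories that can start a phrase of `category`."""
    seen = set() if _seen is None else _seen
    if category not in grammar:        # a POS category starts only itself
        return {category}
    if category in seen:               # guard against left-recursive rules
        return set()
    seen.add(category)
    result = set()
    for rhs in grammar[category]:
        result |= left_corners(rhs[0], grammar, seen)
    return result

def can_start(category, first_pos, grammar):
    """Bottom-up filter: only expand `category` if the POS of the next
    input word is one of its left corners."""
    return first_pos in left_corners(category, grammar)

for cat in GRAMMAR:
    print(cat, sorted(left_corners(cat, GRAMMAR)))
print(can_start("NP", "Aux", GRAMMAR))
```

With the table in hand, a top-down parser facing an input that begins with an Aux can skip the S -> NP VP expansion immediately, since Aux is not a left corner of NP.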

Problems with TD-BU-filtering
• Ambiguity
• Repeated Parsing
• SOLUTION: Earley Algorithm (once again dynamic programming!)

For Next Time
• Read Chapter 13 (Parsing)
• Optional: Read Chapter 16 (Features and Unification) – skip algorithms and implementation

Grammars and Constituency
• Of course, there’s nothing easy or obvious about how we come up with the right set of constituents and the rules that govern how they combine...
• That’s why there are so many different theories of grammar and competing analyses of the same data.
• The approach to grammar, and the analyses, adopted here are very generic (and don’t correspond to any modern linguistic theory of grammar).

Syntactic Notions so far...
• N-grams: the prob. distr. for the next word can be effectively approximated knowing the previous n words
• POS categories are based on:
  – distributional properties (what other words can occur nearby)
  – morphological properties (affixes they take)