Context-Free Grammar (CFG): Specification for Structures & Constituency

- Slides: 16

Context-Free Grammar (CFG): Specification for Structures & Constituency
- Parse tree: graphical representation of structure
  - root node (S): a sentential-level structure
  - internal nodes: constituents of the sentence
  - arcs: relationships between parent nodes and their children (constituents)
  - terminal nodes: surface forms of the input symbols (e.g., words)
  - alternative representation: bracketed notation
    - e.g., [I saw [the [girl [in [the park]]]]]
- For example: [Figure: tree fragment in which an NP expands to NP PP, and the PP contains a further NP, over the words "girl in the park"]
Jing-Shin Chang
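
The correspondence between a parse tree and its bracketed notation can be sketched in a few lines of Python. This is only an illustration: the nested-tuple encoding, the function names `to_brackets` and `leaves`, and the labeled form of the brackets are assumptions made here, not part of the slides.

```python
def to_brackets(tree):
    """Render a parse tree as a labeled bracketed string.

    A leaf is a plain string (a word); an internal node is a tuple
    (label, child, child, ...). This encoding is an assumption.
    """
    if isinstance(tree, str):
        return tree
    label, *children = tree
    return "[" + label + " " + " ".join(to_brackets(c) for c in children) + "]"


def leaves(tree):
    """Return the terminal nodes (surface words) from left to right."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:
        words.extend(leaves(child))
    return words


# Hypothetical encoding of the NP "the girl in the park".
np_tree = ("NP",
           ("NP", ("det", "the"), ("n", "girl")),
           ("PP", ("p", "in"),
                  ("NP", ("det", "the"), ("n", "park"))))

print(to_brackets(np_tree))
print(" ".join(leaves(np_tree)))
```

Reading the leaves from left to right recovers the input words, which is the sense in which terminal nodes are the surface forms of the input.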

Parse Tree: "I saw the girl in the park"
[Figure: parse tree rooted at S with NP, VP, and PP constituents; the pre-terminals over "I saw the girl in the park" are pron v det n p det n]

CFG: Components
- CFG: formal specification of parse trees
  - G = (Σ, N, P, S)
  - Σ: terminal symbols
  - N: non-terminal symbols
  - P: production rules
  - S: start symbol
- Σ: terminal symbols
  - the input symbols of the language
    - programming languages: tokens (reserved words, variables, operators, …)
    - natural languages: words or parts of speech
  - pre-terminal: parts of speech (when words are regarded as terminals)
- N: non-terminal symbols
  - groups of terminals and/or other non-terminals
- S: start symbol: the largest constituent of a parse tree
- P: production (rewriting) rules
  - form: α → β (α: a non-terminal; β: a string of terminals and non-terminals)
  - meaning: α rewrites to ("consists of", "derives") β, or β is reduced to α
  - start with "S-productions" (S → β)
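
As a rough sketch of how the four components relate, the snippet below stores only P and S and recovers Σ, N, and the pre-terminals from them. The dictionary encoding, the function name `components`, and the toy rule table `TOY` are assumptions for illustration.

```python
def components(productions, start):
    """Recover (terminals, non-terminals, pre-terminals) from a rule table.

    productions maps each LHS non-terminal to a list of RHS tuples.
    """
    nonterminals = set(productions)
    if start not in nonterminals:
        raise ValueError("start symbol must have at least one production")
    rhs_symbols = {sym
                   for alternatives in productions.values()
                   for rhs in alternatives
                   for sym in rhs}
    # Any symbol that is never rewritten is a terminal.
    terminals = rhs_symbols - nonterminals
    # A pre-terminal rewrites only to single terminals (a part of speech).
    preterminals = {
        lhs for lhs, alternatives in productions.items()
        if all(len(rhs) == 1 and rhs[0] in terminals for rhs in alternatives)
    }
    return terminals, nonterminals, preterminals


# Hypothetical toy grammar, with words regarded as terminals.
TOY = {
    "S": [("NP", "VP")],
    "NP": [("Pron",)],
    "VP": [("Verb",)],
    "Pron": [("I",), ("you",)],
    "Verb": [("walk",), ("saw",)],
}

sigma, n, pre = components(TOY, "S")
print(sorted(sigma), sorted(n), sorted(pre))
```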

CFG: Example Grammar
- Grammar rules
  - S → NP VP
  - NP → Pron | Proper-Noun | Det Norm
  - Norm → Noun Norm | Noun
  - VP → Verb | Verb NP PP | Verb PP
  - PP → Prep NP
  - Abbreviations: S: sentence; NP: noun phrase; VP: verb phrase; Pron: pronoun; Det: determiner; Norm: nominal; PP: prepositional phrase; Prep: preposition
- Lexicon (in CFG form)
  - Noun → girl | park | desk
  - Verb → like | want | is | saw | walk
  - Prep → by | in | with | for
  - Det → the | a | this | these
  - Pron → I | you | he | she | him
  - Proper-Noun → IBM | Microsoft | Berkeley
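
To see which word strings this grammar accepts, the rules and lexicon can be typed into a small recognizer. The sketch below is a memoized top-down recognizer written for illustration; the names `GRAMMAR` and `recognizes` are assumptions, and the approach relies on the grammar having no ε-rules and no left recursion.

```python
from functools import lru_cache

# Productions copied from the slide; each RHS is a tuple of symbols.
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("Pron",), ("Proper-Noun",), ("Det", "Norm")],
    "Norm": [("Noun", "Norm"), ("Noun",)],
    "VP": [("Verb",), ("Verb", "NP", "PP"), ("Verb", "PP")],
    "PP": [("Prep", "NP")],
    "Noun": [("girl",), ("park",), ("desk",)],
    "Verb": [("like",), ("want",), ("is",), ("saw",), ("walk",)],
    "Prep": [("by",), ("in",), ("with",), ("for",)],
    "Det": [("the",), ("a",), ("this",), ("these",)],
    "Pron": [("I",), ("you",), ("he",), ("she",), ("him",)],
    "Proper-Noun": [("IBM",), ("Microsoft",), ("Berkeley",)],
}


def recognizes(sentence, start="S"):
    """Return True if `start` derives the whitespace-tokenized sentence."""
    tokens = tuple(sentence.split())

    @lru_cache(maxsize=None)
    def derives(symbol, i, j):
        # Does `symbol` derive tokens[i:j]?
        if symbol not in GRAMMAR:  # terminal: must match exactly one word
            return j == i + 1 and tokens[i] == symbol
        return any(derives_seq(rhs, i, j) for rhs in GRAMMAR[symbol])

    @lru_cache(maxsize=None)
    def derives_seq(rhs, i, j):
        # Does the symbol sequence `rhs` derive tokens[i:j]?
        if not rhs:
            return i == j
        return any(derives(rhs[0], i, k) and derives_seq(rhs[1:], k, j)
                   for k in range(i + 1, j + 1))

    return derives(start, 0, len(tokens))


print(recognizes("I saw the girl in the park"))
print(recognizes("I walk in the park"))
print(recognizes("I saw the girl"))
```

With the rules exactly as listed, "I saw the girl in the park" is accepted through VP → Verb NP PP, while "I saw the girl" is rejected because no listed VP rule has the shape Verb NP.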

CFG: Accepted Languages
- CFG operations
  - derivation: applying a production rule to rewrite its LHS non-terminal into its constituents
  - rightmost derivation: a sequence of derivation steps in which the rightmost non-terminal is always rewritten first
  - leftmost derivation: the leftmost non-terminal is always rewritten first
- Context-free language
  - the language accepted by a CFG
    - L(G) = {w | S ⇒* w}, i.e., the strings of terminals that can be derived from the start symbol
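
A derivation can be traced mechanically by rewriting one non-terminal per step. The following sketch, under the assumption that rules are given as (LHS, RHS) pairs in the order they are applied, prints every sentential form of a leftmost derivation of "I walk in the park" using the example grammar's rules. The names `derive`, `NONTERMINALS`, and `LEFTMOST_RULES` are hypothetical.

```python
NONTERMINALS = {"S", "NP", "VP", "PP", "Norm",
                "Det", "Noun", "Verb", "Prep", "Pron"}


def derive(start, rules, leftmost=True):
    """Apply each (lhs, rhs) rule to the leftmost (or rightmost) non-terminal.

    Returns the list of sentential forms, starting with (start,).
    """
    form = [start]
    steps = [tuple(form)]
    for lhs, rhs in rules:
        positions = [i for i, sym in enumerate(form) if sym in NONTERMINALS]
        if not positions:
            raise ValueError("no non-terminal left to rewrite")
        idx = positions[0] if leftmost else positions[-1]
        if form[idx] != lhs:
            raise ValueError(f"expected a rule for {form[idx]}, got {lhs}")
        form[idx:idx + 1] = rhs  # rewrite the LHS into its constituents
        steps.append(tuple(form))
    return steps


LEFTMOST_RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("Pron",)),
    ("Pron", ("I",)),
    ("VP", ("Verb", "PP")),
    ("Verb", ("walk",)),
    ("PP", ("Prep", "NP")),
    ("Prep", ("in",)),
    ("NP", ("Det", "Norm")),
    ("Det", ("the",)),
    ("Norm", ("Noun",)),
    ("Noun", ("park",)),
]

for step in derive("S", LEFTMOST_RULES):
    print(" ".join(step))
```

Passing `leftmost=False` applies the same idea to the rightmost non-terminal, in which case the rules must be supplied in rightmost order.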

CFG: Expressive Power
- CFG vs. regular expression (RE)
  - every RE can be recognized by an FSA
  - every FSA can be represented by a CFG with production rules of the form A → a B | ε
  - therefore, L(RE) ⊂ L(CFG): every regular language is context-free, and the inclusion is proper (e.g., aⁿbⁿ is context-free but not regular)
- Writing a CFG for an FSA (RE)
  - define a non-terminal Ni for the state with state number i
  - start symbol S = N0 (assuming that state 0 is the initial state)
  - for each transition δ(i, a) = j (from state i to state j on input symbol a), add a new production Ni → a Nj to P
  - for each final state i, add a new production Ni → ε to P

CFG: Expressive Power (cont.)
- Writing a CFG for an FSA (RE), for example:
  - RE: (a|b)* a b b
  - [Figure: FSA with states 0 to 3 and transitions labeled a and b]
  - S → a S | b S | a N1
  - N1 → b N2
  - N2 → b N3
  - N3 → ε
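
The FSA-to-CFG construction is mechanical enough to sketch in code. The example below is an illustration under assumptions: the transition list is the nondeterministic FSA read back from the grammar shown for (a|b)* a b b (the original figure is not recoverable), the start symbol is written N0 rather than S, and the names `fsa_to_cfg` and `accepts` are hypothetical.

```python
def fsa_to_cfg(transitions, finals):
    """Build right-linear productions from an FSA.

    transitions: iterable of (i, a, j) meaning delta(i, a) = j.
    finals: iterable of final state numbers.
    Returns a dict mapping "Ni" to a list of RHS tuples; () stands for epsilon.
    """
    productions = {}
    for i, a, j in transitions:
        productions.setdefault(f"N{i}", []).append((a, f"N{j}"))  # Ni -> a Nj
    for i in finals:
        productions.setdefault(f"N{i}", []).append(())            # Ni -> epsilon
    return productions


def accepts(productions, word, start="N0"):
    """Check membership by tracking every non-terminal that can be 'live'."""
    live = {start}
    for ch in word:
        live = {rhs[1]
                for nt in live
                for rhs in productions.get(nt, [])
                if rhs and rhs[0] == ch}
    return any(() in productions.get(nt, []) for nt in live)


# Assumed FSA for (a|b)* a b b: state 0 is initial, state 3 is final.
TRANSITIONS = [(0, "a", 0), (0, "b", 0), (0, "a", 1), (1, "b", 2), (2, "b", 3)]
cfg = fsa_to_cfg(TRANSITIONS, finals=[3])

for lhs, alternatives in cfg.items():
    print(lhs, "->", " | ".join(" ".join(rhs) or "ε" for rhs in alternatives))
```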

CFG: Expressive Power (cont.)
- Chomsky hierarchy
  - RE: regular sets (FSA)
  - CFG: context-free languages (pushdown automata)
  - CSG: context-sensitive languages (linear bounded automata)
  - unrestricted: recursively enumerable sets (Turing machines)

CFG: Equivalence
- Chomsky Normal Form (CNF) (Chomsky, 1963)
  - ε-free, and
  - every production rule is in one of the following forms:
    - A → A1 A2
    - A → a
    - (A1, A2: non-terminals; a: terminal)
  - two non-terminals or one terminal on the RHS
  - generates binary trees
  - good simplification for some algorithms (e.g., grammar training with the inside-outside algorithm (Baker 1979))
- Every CFG can be converted into a weakly equivalent CNF grammar
  - equivalence: L(G1) = L(G2)
    - strong equivalence: the grammars assign the same phrase structure to each sentence (except for renaming of non-terminals)
    - weak equivalence: the grammars generate the same language but do not necessarily assign the same phrase structure to each sentence
  - e.g., A → B C D ≡ {A → B X, X → C D}
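
The rewriting of A → B C D into A → B X, X → C D generalizes to any right-hand side longer than two symbols. The sketch below covers only that binarization step, not a full CNF conversion: ε-rules, unit rules, and terminals inside longer right-hand sides are left untouched. The fresh names X1, X2, … are assumed not to clash with existing symbols, and `binarize` is a hypothetical name.

```python
def binarize(rules):
    """Split every rule whose RHS has more than two symbols.

    rules: list of (lhs, rhs_tuple). Returns a new list of rules in which
    every RHS has at most two symbols; the generated language is unchanged.
    """
    out = []
    counter = 0
    for lhs, rhs in rules:
        rhs = list(rhs)
        while len(rhs) > 2:
            counter += 1
            fresh = f"X{counter}"            # assumed-unused non-terminal
            out.append((lhs, (rhs[0], fresh)))
            lhs, rhs = fresh, rhs[1:]        # continue with the remainder
        out.append((lhs, tuple(rhs)))
    return out


print(binarize([("A", ("B", "C", "D"))]))
print(binarize([("VP", ("Verb", "NP", "PP")), ("PP", ("Prep", "NP"))]))
```

The new trees contain the extra X nodes, so the result is weakly rather than strongly equivalent to the original grammar.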

CFG vs. Finite-State Machine
- Inappropriateness of FSA
  - constituents
  - recursion
- RTN (Recursive Transition Network)
  - FSA augmented with recursion
  - arc: terminal or non-terminal
  - if an arc is a non-terminal: call a sub-transition network and return upon traversal

CFG for English
- Sentence-level constructions
  - Declarative: NP (subject) VP
  - Imperative: VP
  - Yes-no questions: Aux NP VP
  - Wh-questions: Wh-NP VP

CFG for English
- Noun phrase
  - head noun
  - modifiers: pre-nominal (pre-head) and post-nominal (post-head)
- Pre-nominal modifiers
  - pre-determiner: "all"
  - determiner: "the"
  - post-determiner: (ordinal) (cardinal) (quantifier) (ADJP)
    - ordinal: "first", "second", "next"
    - cardinal: "two", "three"
    - quantifier: "many", "several"
- Post-nominal modifiers
  - PP: prepositional phrase
  - non-finite clauses: VP(+ing), VP(+ed), VP(to-V) forms
  - relative clauses: restrictive, non-restrictive
    - the man whose son lives in NY (restrictive)
    - the man, whose son lives in NY (non-restrictive)

CFG for English
- Coordination (apposition, coordinating conjunctions, …)
  - conjunction (conj): and, or, but
  - X → X conj X
  - a big source of ambiguity: X can be almost anything
- Comparison with mathematical operators
  - (left/right) associativity: ((a + b) + c), (a ** (b ** c))
  - (high/low) precedence: a + b * c can be read as (a + b) * c or a + (b * c)
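
The ambiguity introduced by X → X conj X can be made concrete by counting the binary bracketings of a chain of conjuncts. The sketch below is an illustration under the assumption that every conjunct is atomic and the same conjunction is used throughout; the names `count_parses` and `bracketings` are hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_parses(n):
    """Number of parse trees X -> X conj X assigns to n conjuncts."""
    if n == 1:
        return 1
    # Split into a left group of k conjuncts and a right group of n - k.
    return sum(count_parses(k) * count_parses(n - k) for k in range(1, n))


def bracketings(items, conj="and"):
    """Enumerate every bracketing of the conjuncts in `items`."""
    if len(items) == 1:
        return [items[0]]
    results = []
    for k in range(1, len(items)):
        for left in bracketings(items[:k], conj):
            for right in bracketings(items[k:], conj):
                results.append(f"({left} {conj} {right})")
    return results


print([count_parses(n) for n in range(1, 7)])
for reading in bracketings(["a", "b", "c"]):
    print(reading)
```

The counts follow the Catalan numbers, so the number of readings grows quickly with the number of conjuncts. Programming languages avoid this growth by fixing associativity and precedence; the rule X → X conj X fixes neither.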

CFG for English
- Agreement
  - subject-verb (or auxiliary verb): person and number
    - I like her
    - He likes her
  - gender agreement (German or French): ADJ-Noun, Det-Noun
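
One common way to enforce subject-verb agreement in a plain CFG is to copy the affected rules once per agreement value. The slide does not prescribe a method, so the sketch below is purely an assumption for illustration: the two agreement classes, the toy lexicon (lower-cased except "I"), the symbol names such as `NP_3sg`, and the functions `build_grammar` and `recognizes` are all hypothetical.

```python
from functools import lru_cache

AGR_VALUES = ("3sg", "non3sg")  # hypothetical agreement classes


def build_grammar():
    """Create one copy of the S and VP rules per agreement value."""
    grammar = {"S": [], "Obj": [("her",), ("him",)]}
    for agr in AGR_VALUES:
        grammar["S"].append((f"NP_{agr}", f"VP_{agr}"))
        grammar[f"VP_{agr}"] = [(f"Verb_{agr}", "Obj")]
    grammar["NP_3sg"] = [("he",), ("she",)]
    grammar["NP_non3sg"] = [("I",), ("you",)]
    grammar["Verb_3sg"] = [("likes",)]
    grammar["Verb_non3sg"] = [("like",)]
    return grammar


def recognizes(grammar, sentence, start="S"):
    """Memoized top-down recognizer (no epsilon rules, no left recursion)."""
    tokens = tuple(sentence.split())

    @lru_cache(maxsize=None)
    def derives(symbol, i, j):
        if symbol not in grammar:  # terminal
            return j == i + 1 and tokens[i] == symbol
        return any(derives_seq(rhs, i, j) for rhs in grammar[symbol])

    @lru_cache(maxsize=None)
    def derives_seq(rhs, i, j):
        if not rhs:
            return i == j
        return any(derives(rhs[0], i, k) and derives_seq(rhs[1:], k, j)
                   for k in range(i + 1, j + 1))

    return derives(start, 0, len(tokens))


G = build_grammar()
for s in ["I like her", "he likes her", "he like her", "I likes her"]:
    print(s, "->", recognizes(G, s))
```

The cost of this approach is rule duplication: every rule that mentions an agreeing category is multiplied by the number of agreement values.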

CFG for English
- Verb phrases and subcategorization
  - not every verb is compatible with every kind of verb phrase
    - +NP, +NP-NP, +to-V, +Ving, …
    - e.g., transitive (Vt), intransitive (Vi)
    - subcat. frame for a verb: the possible set of complements
- CFG for SUBCAT problems
  - Solution 1:
    - VP → v1
    - VP → v2 NP
    - VP → v3 S
    - v1 → disappear | …
    - v2 → find | leave | repeat
    - v3 → think | believe | say
  - Solution 2:
    - VP → verb NP
    - VP → verb S
    - verb → disappear | … | find | … | think
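
The difference between the two solutions can be checked with a small lookup. The sketch below is an illustration only: it includes just the verbs and rules that appear on the slide (the elided "…" entries are left out), and the table and function names are assumptions.

```python
# Solution 1: one verb class per subcategorization frame.
FRAME_OF_CLASS = {"v1": (), "v2": ("NP",), "v3": ("S",)}
VERBS_OF_CLASS = {
    "v1": ["disappear"],
    "v2": ["find", "leave", "repeat"],
    "v3": ["think", "believe", "say"],
}

# Solution 2: a single "verb" category combined with every listed frame.
SOLUTION2_FRAMES = [("NP",), ("S",)]
SOLUTION2_VERBS = ["disappear", "find", "think"]


def licensed_solution1(verb, complements):
    """True if some class containing `verb` allows exactly these complements."""
    return any(verb in VERBS_OF_CLASS[cls] and FRAME_OF_CLASS[cls] == tuple(complements)
               for cls in FRAME_OF_CLASS)


def licensed_solution2(verb, complements):
    """True if the verb is known and the frame is any listed VP rule."""
    return verb in SOLUTION2_VERBS and tuple(complements) in SOLUTION2_FRAMES


print(licensed_solution1("find", ["NP"]))
print(licensed_solution1("disappear", ["NP"]))
print(licensed_solution2("disappear", ["NP"]))
```

Solution 1 rejects "disappear" with an NP complement, while Solution 2 accepts it: collapsing the classes into one verb category removes the subcategorization constraint and lets the grammar overgenerate.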

CFG for English
- Auxiliaries
  - modal: "can", "may", "must", "will" + V(stem)
  - perfect: "have" + V(pp)
  - progressive: "be" + V(ing)
  - passive: "be" + V(pp)
- Multiple auxiliaries
  - modal < perfect < progressive < passive
    - modal perfect: "could have been …"
    - modal passive: "will be married …"
    - perfect progressive: "have been feasting …"
    - modal perfect passive: "might have been prevented …"
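
The ordering constraint on multiple auxiliaries amounts to requiring a strictly increasing rank. The sketch below checks that condition on a sequence of auxiliary types; it works on type labels rather than words, and the names `AUX_ORDER` and `valid_aux_sequence` are assumptions for illustration.

```python
AUX_ORDER = ["modal", "perfect", "progressive", "passive"]


def valid_aux_sequence(kinds):
    """True if each auxiliary type comes strictly later in AUX_ORDER
    than the one before it (so no type repeats or goes backwards)."""
    ranks = [AUX_ORDER.index(kind) for kind in kinds]
    return all(a < b for a, b in zip(ranks, ranks[1:]))


examples = [
    ["modal", "perfect"],
    ["modal", "passive"],
    ["perfect", "progressive"],
    ["modal", "perfect", "passive"],
    ["perfect", "modal"],
    ["modal", "modal"],
]
for kinds in examples:
    print(" ".join(kinds), "->", valid_aux_sequence(kinds))
```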