Parsing with Context-Free Grammars for ASR Julia Hirschberg
Parsing with Context-Free Grammars for ASR Julia Hirschberg CS 4706 Slides with contributions from Owen Rambow, Kathy McKeown, Dan Jurafsky and James Martin
What is Syntax? • Structure of language • How words are arranged together and related to one another • Goal of syntactic analysis: relate surface form (what someone says or writes) to underlying structure, to support semantic analysis (what the utterance or text means) • Syntactic representation: typically a tree structure
Structure in Strings • A set of words, or lexicon: the a small nice big very boy girl sees likes • Some 'good' (grammatical) sentences: – the boy likes a girl – the small girl likes the big girl – a very small nice boy sees a very nice boy • Some 'bad' (ungrammatical) sentences: – *the boy the girl – *small boy likes nice girl • Can we find a way of distinguishing between the two kinds of sequences? • Can we identify similarities among grammatical subsequences?
One Version of Constituent Structure • Lexicon: the a small nice big very boy girl sees likes • Grammatical sentences: – (the) boy (likes a girl) – (the small) girl (likes the big girl) – (a very small nice) boy (sees a very nice boy) • Ungrammatical sentences: – *(the) boy (the girl) – *(small) boy (likes the nice girl)
Another Constituency Hypothesis • Lexicon: the a small nice big very boy girl sees likes • Grammatical sentences: – (the boy) likes (a girl) – (the small girl) likes (the big girl) – (a very small nice boy) sees (a very nice boy) • Ungrammatical sentences: – *(the boy) (the girl) – *(small boy) likes (the nice girl) • Better: fewer types of constituents (blue and red are of same type)
Even More Structures • Lexicon: the a small nice big very boy girl sees likes • Grammatical sentences: – ((the) boy) likes ((a) girl) – ((the) (small) girl) likes ((the) (big) girl) – ((a) ((very) small) (nice) boy) sees ((a) ((very) nice) girl) • Ungrammatical sentences: – *((the) boy) ((the) girl) – *((small) boy) likes ((the) (nice) girl)
From Substrings to Trees • (((the) boy) likes ((a) girl)) • [Figure: the same bracketing drawn as a tree over the words the, boy, likes, a, girl]
How do we Label the Nodes? • ( ((the) boy) likes ((a) girl) ) • Choose constituents so each one has one non-bracketed word: the head • Group words by distribution of constituents they head (POS) – Noun (N), verb (V), adjective (Adj), adverb (Adv), determiner (Det) • Category of constituent: XP, where X is POS – NP, S, AdjP, AdvP, DetP
Types of Nodes • (((the/Det) boy/N) likes/V ((a/Det) girl/N)) • Nonterminal symbols = constituents: S, NP, DetP • Terminal symbols = words: the, boy, likes, a, girl • [Figure: phrase-structure tree with S at the root, an NP (DetP the, boy), likes, and an NP (DetP a, girl)]
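The labeling convention on the last two slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`parse_brackets`, `label`, `show`) and the `SPECIAL` table are hypothetical, and the sketch assumes that every constituent has exactly one unbracketed word (its head) and that a verb-headed constituent is called S, as the slides do.

```python
# Hypothetical helper names; a sketch of "category = POS of the head + P".
SPECIAL = {"V": "S"}  # assumption: the slides call the verb-headed constituent S

def parse_brackets(text):
    """Turn '((the/Det) boy/N)' into nested lists of (word, POS) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        children = []
        while pos < len(tokens) and tokens[pos] != ")":
            if tokens[pos] == "(":
                pos += 1                      # consume "("
                children.append(read())
                pos += 1                      # consume the matching ")"
            else:
                word, tag = tokens[pos].split("/")
                children.append((word, tag))  # a terminal: (word, POS)
                pos += 1
        return children

    top = read()
    return top[0] if len(top) == 1 else top

def label(node):
    """Label each constituent (a list) after the POS of its single head word."""
    if isinstance(node, tuple):               # a word, nothing to label
        return node
    heads = [child for child in node if isinstance(child, tuple)]
    if len(heads) != 1:
        raise ValueError("each constituent needs exactly one unbracketed head")
    tag = heads[0][1]
    return (SPECIAL.get(tag, tag + "P"), [label(child) for child in node])

def show(node):
    """Render a labeled tree as a bracketed string."""
    if isinstance(node[1], str):              # (word, POS): print the word only
        return node[0]
    return "[" + node[0] + " " + " ".join(show(c) for c in node[1]) + "]"

tree = label(parse_brackets("(((the/Det) boy/N) likes/V ((a/Det) girl/N))"))
print(show(tree))  # [S [NP [DetP the] boy] likes [NP [DetP a] girl]]
```

The nonterminals in the output (S, NP, DetP) are the constituents, and the leaves are the words.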
Determining Part-of-Speech • A blue seat / a child seat: noun or adjective? – Syntax: • a blue seat / a child seat • a very blue seat / *a very child seat • this seat is blue / *this seat is child – Morphology: • bluer / *childer – blue and child are not the same POS – blue is Adj, child is Noun
Determining Part-of-Speech – Preposition or particle? • A: he threw out the garbage / he threw the garbage out • B: he threw the garbage out the door / *he threw the garbage the door out – The two instances of out are not the same POS • In A out is a particle, in B it is a preposition
Constituency • Some Noun phrases (NPs) • A red dog on a blue tree • A blue dog on a red tree • Some big dogs and some little dogs • A dog • I • Big dogs, little dogs, red dogs, blue dogs, yellow dogs, green dogs, black dogs, and white dogs • How do we know these form a constituent?
NP Constituency • NPs can all appear before a verb: – Some big dogs and some little dogs are going around in cars… – Big dogs, little dogs, red dogs, blue dogs, yellow dogs, green dogs, black dogs, and white dogs are all at a dog party! – I do not • But individual words can’t always appear before verbs: – *little are going… – *blue are… – *and are • Must be able to state generalizations like: – Noun phrases occur before verbs
PP Constituency • Preposing and postposing: – Under a tree is a yellow dog. – A yellow dog is under a tree. • But not: – *Under, is a yellow dog a tree. – *Under a is a yellow dog tree. • Prepositional phrases notable for ambiguity in attachment – I saw a man on a hill with a telescope.
Context-Free Grammars • Defined in formal language theory – Terminals: e.g., cat – Non-terminal symbols: e.g., NP, VP – Start symbol: e.g., S – Rewriting rules: e.g., S → NP VP • Start with start symbol, rewrite using rules, done when only terminals left
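The rewriting procedure can be sketched as follows. The rule table is the small English fragment used in this lecture; the dictionary layout and the name `generate` are assumptions made for illustration, and the loop is only guaranteed to stop because this particular fragment has no recursive rules.

```python
import random

# Nonterminals are the dictionary keys; anything else is a terminal.
RULES = {
    "S": [["NP", "VP"]],
    "VP": [["V", "PP"]],
    "NP": [["DetP", "N"]],
    "PP": [["Prep", "NP"]],
    "N": [["cat"], ["mat"]],
    "V": [["is"]],
    "Prep": [["on"]],
    "DetP": [["the"]],
}

def generate(start="S", rng=random):
    """Rewrite the leftmost nonterminal until only terminals are left."""
    form = [start]
    while any(symbol in RULES for symbol in form):
        i = next(k for k, symbol in enumerate(form) if symbol in RULES)
        form[i:i + 1] = rng.choice(RULES[form[i]])  # apply one rewriting rule
    return " ".join(form)

print(generate())  # e.g. "the cat is on the mat" or "the mat is on the cat"
```

Every string this produces is in the language of the grammar. Deciding whether a given string is in the language is the parsing problem.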
A Fragment of English • S → NP VP • VP → V PP • NP → DetP N • N → cat | mat • V → is • PP → Prep NP • Prep → on • DetP → the • Input: the cat is on the mat
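One way to check whether the input belongs to this fragment is a CKY-style chart recognizer, which works here because every phrasal rule has exactly two symbols on its right-hand side and every lexical rule rewrites to a single word. The sketch below is not taken from the slides: the names `BINARY`, `LEXICON` and `recognize` are illustrative, and the table assumes that each pair of categories has at most one parent, which holds for this fragment.

```python
# Phrasal rules, indexed by their right-hand side.
BINARY = {
    ("NP", "VP"): "S",
    ("V", "PP"): "VP",
    ("DetP", "N"): "NP",
    ("Prep", "NP"): "PP",
}
# Lexical rules, indexed by word.
LEXICON = {
    "cat": {"N"},
    "mat": {"N"},
    "is": {"V"},
    "on": {"Prep"},
    "the": {"DetP"},
}

def recognize(sentence, start="S"):
    """Return True if the sentence can be derived from the start symbol."""
    words = sentence.split()
    n = len(words)
    if n == 0:
        return False
    # chart[i][j] holds the categories that span words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        chart[i][i + 1] = set(LEXICON.get(word, ()))
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):         # split point between the two children
                for left in chart[i][k]:
                    for right in chart[k][j]:
                        parent = BINARY.get((left, right))
                        if parent is not None:
                            chart[i][j].add(parent)
    return start in chart[0][n]

print(recognize("the cat is on the mat"))  # True
print(recognize("the cat the mat"))        # False
```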
Derivations in a CFG • Grammar: the rules from A Fragment of English • Sentential form on each successive slide: S; NP VP; DetP N VP; the cat VP; the cat V PP; the cat is Prep NP; the cat is on DetP N; the cat is on the mat • Resulting tree: [S [NP [DetP the] [N cat]] [VP [V is] [PP [Prep on] [NP [DetP the] [N mat]]]]]
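The link between the finished tree and the derivation can be made explicit: reading the tree top-down and always expanding the leftmost unexpanded node yields a leftmost derivation. The sketch below assumes a tree encoded as nested tuples `(label, child, ...)` with plain strings for words; that encoding and the name `leftmost_derivation` are illustrative choices, not something the slides prescribe. It applies one rule per step, so it lists a few more sentential forms than the slides, which sometimes apply two lexical rules at once.

```python
# Parse tree for "the cat is on the mat" as nested tuples (assumed encoding).
TREE = ("S",
        ("NP", ("DetP", "the"), ("N", "cat")),
        ("VP", ("V", "is"),
               ("PP", ("Prep", "on"),
                      ("NP", ("DetP", "the"), ("N", "mat")))))

def leftmost_derivation(tree):
    """List the sentential forms obtained by expanding the leftmost node first."""
    form = [tree]                             # subtrees still to expand, or words
    steps = [tree[0]]
    while True:
        idx = next((i for i, item in enumerate(form)
                    if isinstance(item, tuple)), None)
        if idx is None:                       # only words are left
            return steps
        node = form[idx]
        form[idx:idx + 1] = list(node[1:])    # replace the node by its children
        steps.append(" ".join(item[0] if isinstance(item, tuple) else item
                              for item in form))

for step in leftmost_derivation(TREE):
    print(step)
```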
A More Complicated Fragment of English • S → NP VP • S → VP • VP → V PP • VP → V • NP → DetP NP • NP → N • PP → Prep NP • Mary likes the cat bowl. The cat ate the tasty food. Hello. Nice talking to you. • N → cat | mat | food | bowl | Mary • V → is | likes | sits • Prep → on | in | under • DetP → the | a
Pocket Sphinx Grammar Format • Variables go in angle brackets, e.g., <city> • Terminals must appear in your pronunciation dictionary (case sensitive) • X Y is concatenation, e.g., I WANT • (X | Y) means X or Y, e.g., (WANT|NEED) • Square brackets mean optional, e.g., [ON] FRIDAY • * means that the expansion may be spoken zero or more times, e.g., <digit>* • + means one or more times, e.g., <digit>+
Example • <city> = BOSTON | NEWYORK | WASHINGTON | BALTIMORE; • <time> = MORNING | EVENING; • <day> = FRIDAY | MONDAY; • public <query> = (((WHAT TRAINS LEAVE) | (WHAT TIME CAN I TRAVEL) | (IS THERE A TRAIN)) (FROM|TO) <city> [(FROM|TO) <city>] ON <day> [<time>]); • Hello. No. I want to go on Tuesday. When does the train leave?
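To see which word sequences a grammar like this accepts, the operators from the previous slide can be translated into a regular expression. The sketch below is a toy interpreter written for illustration; it does not use or imitate the Pocket Sphinx API. The names `load_rules`, `to_regex` and `accepts` are hypothetical, and the approach assumes a non-recursive grammar (a rule that refers to itself would never finish expanding).

```python
import re

GRAMMAR = """
<city> = BOSTON | NEWYORK | WASHINGTON | BALTIMORE;
<time> = MORNING | EVENING;
<day> = FRIDAY | MONDAY;
public <query> = (((WHAT TRAINS LEAVE) | (WHAT TIME CAN I TRAVEL) | (IS THERE A TRAIN))
                  (FROM|TO) <city> [(FROM|TO) <city>] ON <day> [<time>]);
"""

# A token is a <variable>, one operator character, or a terminal word.
TOKEN = re.compile(r"<\w+>|[()\[\]|*+]|[^\s()\[\]|*+<>]+")
OPERATORS = {"(": "(?:", ")": ")", "[": "(?:", "]": ")?",
             "|": "|", "*": "*", "+": "+"}

def load_rules(text):
    """Map each rule name, such as '<city>', to its expansion text."""
    rules = {}
    for statement in text.split(";"):
        if "=" in statement:
            head, body = statement.split("=", 1)
            rules[head.split()[-1]] = body.strip()  # drops a leading 'public'
    return rules

def to_regex(name, rules):
    """Expand one rule into a regex over words that are each followed by a space."""
    parts = []
    for token in TOKEN.findall(rules[name]):
        if token in OPERATORS:
            parts.append(OPERATORS[token])
        elif token.startswith("<"):
            parts.append(to_regex(token, rules))    # inline the referenced rule
        else:
            parts.append("(?:" + re.escape(token) + " )")
    return "(?:" + "".join(parts) + ")"

def accepts(utterance, rule="<query>"):
    """Return True if the whole utterance matches the rule (case sensitive)."""
    normalized = "".join(word + " " for word in utterance.split())
    return re.fullmatch(to_regex(rule, load_rules(GRAMMAR)), normalized) is not None

print(accepts("WHAT TRAINS LEAVE FROM BOSTON TO NEWYORK ON FRIDAY MORNING"))  # True
print(accepts("I WANT TO GO ON TUESDAY"))                                     # False
```

Utterances such as "Hello" or "When does the train leave?" fall outside the grammar, so a recognizer restricted to it cannot return them.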
Next Class • Language modeling for large vocabulary applications: N-grams