Finite-State and the Noisy Channel (600.465 Intro to NLP, J. Eisner)


Noisy Channel Model: real language X passes through a noisy channel to yield yucky language Y; we want to recover X from Y.

Noisy Channel Model (spelling): X = correct spelling; the channel introduces typos; Y = misspelling. We want to recover X from Y.

Noisy Channel Model (segmentation): X = (lexicon space)*; the channel deletes the spaces; Y = text without spaces. We want to recover X from Y.

Noisy Channel Model (speech): X = (lexicon space)*, weighted by a language model; the channel applies a pronunciation model and an acoustic model; Y = speech. We want to recover X from Y.

Noisy Channel Model (parsing): X = a tree, generated by a probabilistic CFG; the channel deletes everything but the terminals; Y = text. We want to recover X from Y.

Noisy Channel Model (probabilities): the source is weighted by p(X) and the channel by p(Y | X); their product is p(X, Y). We want to recover x from y: choose the x that maximizes p(x | y), or equivalently p(x, y).
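The "equivalently" step is just Bayes' rule with the constant denominator dropped, since p(y) does not depend on x:

```latex
\hat{x} \;=\; \arg\max_x \, p(x \mid y)
       \;=\; \arg\max_x \, \frac{p(x, y)}{p(y)}
       \;=\; \arg\max_x \, p(x, y)
```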

Speech Recognition by FST Composition (Pereira & Riley 1996): a trigram language model, giving p(word seq), .o. a phone-context model (CAT: k æ t), giving p(phone seq | word seq), .o. an acoustic model, giving p(acoustics | phone seq).

Noisy Channel Model (worked example): [FST diagram] the source machine p(X) has arcs a:a/0.7 and b:b/0.3; composed (.o.) with the channel machine p(Y | X), with arcs a:C/0.1, a:D/0.9, b:C/0.8, b:D/0.2, it yields p(X, Y) with paths a:C/0.07, a:D/0.63, b:C/0.24, b:D/0.06. Note p(x, y) sums to 1. Suppose y = "C"; what is the best "x"?

Noisy Channel Model (decoding): [FST diagram] restrict the composed machine to just the paths compatible with the output "C", i.e., compose in (Y = y), leaving a:C/0.07 and b:C/0.24 = p(X, y). The best path answers the question: for y = "C", the best "x" is "b".
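A minimal sketch of this computation in Python (not from the slides; the two arc tables below are just the toy machines above, not general FSTs): score every x against the observed y and keep the argmax of p(x, y).

```python
# Noisy channel decoding on the toy example above.
# Source model p(X) and channel model p(Y | X), as arc probabilities.
p_x = {"a": 0.7, "b": 0.3}
p_y_given_x = {("a", "C"): 0.1, ("a", "D"): 0.9,
               ("b", "C"): 0.8, ("b", "D"): 0.2}

def best_x(y):
    # Restrict to paths compatible with the observed y, then take
    # the x maximizing p(x, y) = p(x) * p(y | x).
    scores = {x: p_x[x] * p_y_given_x[(x, y)] for x in p_x}
    return max(scores.items(), key=lambda kv: kv[1])

print(best_x("C"))  # ('b', 0.24), up to float rounding; "a" scores only 0.07
```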

Edit Distance Transducer: for an alphabet of size k, O(k) deletion arcs (a:ε, b:ε, …), O(k) insertion arcs (ε:a, ε:b, …), O(k²) substitution arcs (a:b, b:a, …), and O(k) no-change arcs (a:a, b:b, …).
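To make the arc counts concrete, here is a small sketch (not from the slides) that enumerates the arcs for a tiny alphabet; the list-of-pairs encoding is just for illustration:

```python
# Enumerate the arcs of the edit-distance transducer for an alphabet
# of size k.  EPS stands for the empty string (epsilon).
EPS = ""
alphabet = ["a", "b", "c"]   # k = 3, kept small for illustration
k = len(alphabet)

deletions     = [(s, EPS) for s in alphabet]                            # O(k)
insertions    = [(EPS, s) for s in alphabet]                            # O(k)
substitutions = [(s, t) for s in alphabet for t in alphabet if s != t]  # O(k^2)
no_change     = [(s, s) for s in alphabet]                              # O(k)

arcs = deletions + insertions + substitutions + no_change
print(len(deletions), len(insertions), len(substitutions), len(no_change))
# -> 3 3 6 3, i.e. k + k + k*(k-1) + k arcs in total
```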

Stochastic Edit Distance Transducer: the same O(k) deletion arcs, O(k) insertion arcs, O(k²) substitution arcs, and O(k) identity arcs, but now weighted. Likely edits = high-probability arcs.

Stochastic Edit Distance Transducer: find the best path (by Dijkstra's algorithm). [FST diagram: the string "clara", composed (.o.) with the edit-distance transducer, composed (.o.) with the string "caca"; the result is a lattice of c:, l:, a:, r:, and ε arcs whose best path is the lowest-cost way to edit "clara" into "caca".]
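Because the composed lattice for a fixed input/output pair is acyclic, the best-path search collapses to the familiar edit-distance dynamic program. A sketch (not from the slides), with unit costs standing in for the transducer's negative log-probabilities:

```python
# Best edit path from "clara" to "caca": dynamic programming over the
# lattice that  x .o. EditDistance .o. y  defines.  The costs are
# illustrative stand-ins for -log(arc probability).
DEL, INS, SUB = 1.0, 1.0, 1.0   # assumed costs; identity arcs cost 0

def best_path(x, y):
    n, m = len(x), len(y)
    # dist[i][j] = cheapest way to transduce x[:i] into y[:j]
    dist = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i * DEL
    for j in range(1, m + 1):
        dist[0][j] = j * INS
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = 0.0 if x[i - 1] == y[j - 1] else SUB
            dist[i][j] = min(dist[i - 1][j] + DEL,        # delete x[i-1]
                             dist[i][j - 1] + INS,        # insert y[j-1]
                             dist[i - 1][j - 1] + match)  # sub or no-change
    return dist[n][m]

print(best_path("clara", "caca"))  # 2.0: delete "l", substitute "r" -> "c"
```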

Speech Recognition by FST Composition (Pereira & Riley 1996): trigram language model .o. pronunciation model .o. acoustic model .o. observed acoustics, multiplying p(word seq) * p(phone seq | word seq) * p(acoustics | phone seq).
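A toy stand-in for the cascade (not from the slides): real systems compose weighted FSTs, but here each model is a lookup table over a two-hypothesis space, pronunciation is treated as deterministic, and every probability is invented for illustration.

```python
# Score word-sequence hypotheses the way the cascade does:
# p(words) * p(phones | words) * p(acoustics | phones).
# All probabilities below are made up for illustration.
lm       = {("the", "cat"): 0.01, ("the", "cad"): 0.001}
pron     = {("the", "cat"): "dh ah k ae t", ("the", "cad"): "dh ah k ae d"}
acoustic = {"dh ah k ae t": 0.5, "dh ah k ae d": 0.3}  # p(observed acoustics | phones)

def decode(hypotheses):
    def score(words):
        phones = pron[words]          # deterministic pronunciation here
        return lm[words] * acoustic[phones]
    return max(hypotheses, key=score)

print(decode([("the", "cat"), ("the", "cad")]))  # ('the', 'cat')
```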

Word Segmentation: theprophetsaidtothecity
§ What does this say?
§ And what other words are substrings?
§ Could segment with parsing (how?), but slow.
§ Given L = a "lexicon" FSA that matches all English words: how to apply it to this problem? (A sketch follows below.)
§ What if the lexicon is weighted?
§ From unigrams to bigrams?
§ Smooth L to include unseen words?
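One standard answer to "how to apply a weighted lexicon": a Viterbi-style dynamic program that, at each position, extends the best segmentation ending there by any lexicon word. A minimal sketch (not from the slides); the unigram-probability lexicon below is a hypothetical toy:

```python
import math

# Hypothetical weighted lexicon: word -> unigram probability.
lexicon = {"the": 0.05, "prophet": 0.001, "said": 0.01,
           "to": 0.04, "city": 0.002, "prop": 0.0001, "he": 0.01}

def segment(text):
    # best[i] = (neg-log-prob, segmentation) of text[:i]
    best = [(0.0, [])] + [(math.inf, None)] * len(text)
    for i in range(1, len(text) + 1):
        for j in range(max(0, i - 10), i):   # assume max word length 10
            word = text[j:i]
            if word in lexicon and best[j][1] is not None:
                cost = best[j][0] - math.log(lexicon[word])
                if cost < best[i][0]:
                    best[i] = (cost, best[j][1] + [word])
    return best[-1][1]

print(segment("theprophetsaidtothecity"))
# -> ['the', 'prophet', 'said', 'to', 'the', 'city']
```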

Spelling Correction
§ Spelling correction also needs a lexicon L.
§ But there is distortion …
§ Let T be a transducer that models common typos and other spelling errors:
§ ance ↔ ence (deliverance, …)
§ e ↔ ε / Cons _ Cons (athlete, …)
§ rr ↔ r (embarrass, occurrence, …)
§ ge ↔ dge (privilege, …)
§ etc.
§ Now what can you do with L .o. T? (A sketch follows below.)
§ Should T and L have probabilities?
§ Want T to include "all possible" errors …
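A toy stand-in for L .o. T (not from the slides): apply a few of the rewrite rules above, as regex substitutions run from misspelling back toward the intended word, and keep any result the lexicon accepts. The rule list and lexicon here are tiny assumptions, not a real T or L.

```python
import re

# Tiny stand-ins for T (typo rules, both directions) and the lexicon L.
rules = [(r"ance", "ence"), (r"ence", "ance"),
         (r"r(?!r)", "rr"), (r"rr", "r"),
         (r"ge", "dge"), (r"dge", "ge")]
lexicon = {"occurrence", "deliverance", "privilege", "embarrass"}

def corrections(word):
    # Apply each rule at each matching position; keep lexicon words.
    cands = set()
    for pat, rep in rules:
        for m in re.finditer(pat, word):
            cand = word[:m.start()] + rep + word[m.end():]
            if cand in lexicon:
                cands.add(cand)
    return cands

print(corrections("occurrance"))  # {'occurrence'}
print(corrections("priviledge"))  # {'privilege'}
print(corrections("embarass"))    # {'embarrass'}
```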

Morpheme Segmentation
§ Let L be a machine that matches all Turkish words.
§ Same problem as word segmentation, just at a lower level: morpheme segmentation.
§ Turkish word: uygarlaştıramadıklarımızdanmışsınızcasına = uygar+laş+tır+ma+dık+ları+mız+dan+mış+sınız+ca+sı+na, "(behaving) as if you are among those whom we could not cause to become civilized".
§ Some constraints on morpheme sequence: bigram probs.
§ Generative model: concatenate, then fix up the joints.
§ stop + -ing = stopping, fly + -s = flies, vowel harmony
§ Use a cascade of transducers to handle all the fixups (a sketch follows below).
§ But this is just morphology!
§ Can use probabilities here too (but people often don't).
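A sketch of "concatenate then fix up joints" for the English examples above (not from the slides): each fixup is one ordered rewrite, playing the role of one transducer in the cascade. The two rules are deliberately simplified assumptions, not full English morphophonology.

```python
import re

def join(stem, suffix):
    w = stem + "+" + suffix   # concatenate, keeping a marker at the joint
    # Cascade of joint fixups, applied in order; each rewrite stands in
    # for one transducer in the cascade.  Simplified rules:
    w = re.sub(r"y\+s$", "ies", w)                              # fly + s -> flies
    w = re.sub(r"([aeiou])([bdgmnpt])\+ing$", r"\1\2\2ing", w)  # stop + ing -> stopping
    w = w.replace("+", "")    # default: plain concatenation
    return w

print(join("stop", "ing"))  # stopping
print(join("fly", "s"))     # flies
print(join("walk", "ing"))  # walking (no fixup fires)
```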

More Engineering Applications
§ Markup
  § Dates, names, places, noun phrases; spelling/grammar errors?
  § Hyphenation
  § Informative templates for information extraction (FASTUS)
  § Word segmentation (use probabilities!)
  § Part-of-speech tagging (use probabilities – maybe!)
§ Translation
  § Spelling correction / edit distance
  § Phonology, morphology: series of little fixups? constraints?
  § Speech
  § Transliteration / back-transliteration
  § Machine translation?
§ Learning …