More on Text Management

Context Free Grammars
  • Context Free Grammars (CFGs) are a more natural model for natural language than Finite State Machines
  • Syntax rules are very easy to formulate using CFGs
  • Provably more expressive than Finite State Machines
    – E.g. a CFG can check for balanced parentheses
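The balanced-parentheses claim can be made concrete with a small recognizer. This is a minimal sketch, assuming the toy grammar S → ( S ) S | empty, which is not given on the slide; the function names `parse_s` and `is_balanced` are hypothetical.

```python
def parse_s(text: str, pos: int = 0) -> int:
    """Consume one S starting at pos and return the position just after it.

    Assumed toy grammar (not from the slides): S -> '(' S ')' S | empty
    """
    if pos < len(text) and text[pos] == "(":
        pos = parse_s(text, pos + 1)              # inner S
        if pos >= len(text) or text[pos] != ")":
            raise ValueError("missing ')'")
        return parse_s(text, pos + 1)             # trailing S
    return pos                                    # empty production


def is_balanced(text: str) -> bool:
    """Accept the string only if a single S covers all of it."""
    try:
        return parse_s(text) == len(text)
    except ValueError:
        return False
```

The recursion depth grows with the nesting depth, which is exactly the unbounded memory a Finite State Machine lacks.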

Context Free Grammars
  • Non-terminals
  • Terminals
  • Production rules
    – V → w, where V is a non-terminal and w is a sequence of terminals and non-terminals

Context Free Grammars
  • Can be used as acceptors
  • Can be used as a generative model
  • Similarly to the case of Finite State Machines
  • How long can a string generated by a CFG be?
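Using a CFG as a generative model amounts to repeatedly replacing non-terminals with the right-hand side of one of their rules. The sketch below assumes a toy grammar and a hypothetical `generate` helper, neither of which comes from the slides. Because the first rule is recursive, the only thing bounding the output length here is the artificial `max_depth` cap; without it, a recursive grammar can produce arbitrarily long strings.

```python
import random

# Assumed toy grammar (not from the slides): S -> '(' S ')' S | empty
GRAMMAR = {"S": [["(", "S", ")", "S"], []]}


def generate(symbol, rng, depth=0, max_depth=8):
    """Expand `symbol` into a list of terminals by picking productions at random."""
    if symbol not in GRAMMAR:                     # terminal: emit as-is
        return [symbol]
    rules = GRAMMAR[symbol]
    # Past max_depth, force the non-recursive rule so the sketch always terminates.
    rule = rules[-1] if depth >= max_depth else rng.choice(rules)
    out = []
    for sym in rule:
        out.extend(generate(sym, rng, depth + 1, max_depth))
    return out
```

For example, `"".join(generate("S", random.Random(0)))` yields one randomly generated balanced string; raising `max_depth` allows longer ones.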

Stochastic Context Free Grammar
  • Non-terminals
  • Terminals
  • Production rules, each associated with a probability
    – V → w, where V is a non-terminal and w is a sequence of terminals and non-terminals
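One way to hold such a grammar in code is a mapping from each non-terminal to its rules and their probabilities. The sketch below assumes the usual convention that the probabilities of all rules sharing a left-hand side sum to 1, and that a derivation's probability is the product of the probabilities of the rules it uses. The grammar, its numbers, and the helper names are illustrative assumptions.

```python
from math import isclose, prod

# Assumed toy SCFG (not from the slides):
# each non-terminal maps to (right-hand side, probability) pairs.
SCFG = {
    "S": [(("(", "S", ")", "S"), 0.3), ((), 0.7)],
}


def is_normalized(scfg):
    """Check that the rule probabilities of every non-terminal sum to 1."""
    return all(isclose(sum(p for _, p in rules), 1.0) for rules in scfg.values())


def derivation_probability(scfg, steps):
    """steps: (non-terminal, rule index) pairs; the result is the product of rule probabilities."""
    return prod(scfg[nt][i][1] for nt, i in steps)
```

The string `()` has a single derivation in this grammar: the recursive rule once, then the empty rule for the inner and the trailing S, giving 0.3 · 0.7 · 0.7.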

Chomsky Normal Form
  • Every rule is of the form
    – V → V1 V2, where V, V1, V2 are non-terminals
    – V → t, where V is a non-terminal and t is a terminal
  • Every (S)CFG can be written in this form (apart from generating the empty string, which needs a special rule)
  • Makes designing many algorithms easier
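As an illustration of why such a rewrite is always possible, a long rule can be split with helper non-terminals. The rule, the helper symbols $X$ and $T_c$, and the probability $\rho$ below are assumptions chosen for the example. The original probability stays on the first new rule and the helpers get probability 1, so every derivation keeps its probability.

```latex
\begin{align*}
\text{original:}\quad & A \to B\, c\, D && (\text{probability } \rho) \\
\text{CNF:}\quad & A \to B\, X && (\rho) \\
                 & X \to T_c\, D && (1) \\
                 & T_c \to c && (1)
\end{align*}
```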

Questions
  • What is the probability of a string?
    – Defined as the sum of the probabilities of all possible derivations of the string
  • Given a string, what is its most likely derivation?
    – Also called the Viterbi derivation or parse
    – Easy adaptation of the Viterbi algorithm for HMMs
  • Given a training corpus and a CFG (without probabilities), learn the probabilities of the derivation rules
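The second question can be sketched as a chart parser that keeps, for every non-terminal and span, only the best-scoring way of building it. This is a minimal sketch under assumptions: the grammar is in Chomsky Normal Form, and the toy rules, probabilities, and the name `viterbi_parse` are illustrative rather than taken from the slides.

```python
# Assumed toy SCFG in Chomsky Normal Form (not from the slides).
BINARY = {("S", "S", "S"): 0.4}   # P(S -> S S)
UNARY = {("S", "a"): 0.6}         # P(S -> a)


def viterbi_parse(words, binary=BINARY, unary=UNARY, start="S"):
    """Return (probability, tree) of the most likely derivation, or (0.0, None)."""
    m = len(words)
    best = {}  # (non-terminal, p, q) -> (probability, backpointer); 0-based inclusive span

    def score(key):
        return best[key][0] if key in best else 0.0

    for p, word in enumerate(words):              # spans of length 1
        for (nt, term), prob in unary.items():
            if term == word and prob > score((nt, p, p)):
                best[nt, p, p] = (prob, word)

    for span in range(2, m + 1):                  # longer spans, bottom-up
        for p in range(m - span + 1):
            q = p + span - 1
            for (nt, left, right), prob in binary.items():
                for d in range(p, q):             # split point
                    cand = prob * score((left, p, d)) * score((right, d + 1, q))
                    if cand > score((nt, p, q)):  # keep the max instead of summing
                        best[nt, p, q] = (cand, (d, left, right))

    def build(nt, p, q):
        """Follow backpointers to rebuild the tree as nested tuples."""
        back = best[nt, p, q][1]
        if p == q:
            return (nt, back)
        d, left, right = back
        return (nt, build(left, p, d), build(right, d + 1, q))

    if (start, 0, m - 1) not in best:
        return 0.0, None
    return best[start, 0, m - 1][0], build(start, 0, m - 1)
```

Replacing the max with a sum over split points and rules would answer the first question instead, the total probability of the string.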

Inside-outside probabilities
  • Inside probability: the probability of generating w_p … w_q from non-terminal N^j
  • Outside probability: the total probability of beginning with the start symbol N^1 and generating N^j together with everything outside w_p … w_q

CYK algorithm
[Figure: N^j expands to N^r N^s; N^r spans w_p … w_d and N^s spans w_(d+1) … w_q]
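The split this slide depicts, a non-terminal covering w_p … w_q built from two children meeting at position d, gives a bottom-up recursion for inside probabilities: sum over all binary rules and all split points. The following is a minimal sketch, assuming a toy grammar in Chomsky Normal Form and 0-based inclusive spans; the grammar and the name `inside` are illustrative, not from the slides.

```python
from collections import defaultdict

# Assumed toy SCFG in Chomsky Normal Form (not from the slides).
BINARY = {("S", "S", "S"): 0.4}   # P(S -> S S)
UNARY = {("S", "a"): 0.6}         # P(S -> a)


def inside(words, binary=BINARY, unary=UNARY):
    """Return beta[(nt, p, q)]: probability that nt generates words[p..q] (inclusive)."""
    m = len(words)
    beta = defaultdict(float)
    for p, word in enumerate(words):              # spans of length 1
        for (nt, term), prob in unary.items():
            if term == word:
                beta[nt, p, p] += prob
    for span in range(2, m + 1):                  # longer spans, bottom-up
        for p in range(m - span + 1):
            q = p + span - 1
            for (nt, left, right), prob in binary.items():
                for d in range(p, q):             # split point between the two children
                    beta[nt, p, q] += prob * beta[left, p, d] * beta[right, d + 1, q]
    return beta
```

With this toy grammar, `inside(list("aaa"))["S", 0, 2]` adds up the two possible bracketings of the three-word string.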

CYK algorithm
[Figure: tree rooted at N^1 over w_1 … w_m; N^f expands to N^j N^g, with N^j spanning w_p … w_q and N^g spanning w_(q+1) … w_e]

CYK algorithm
[Figure: tree rooted at N^1 over w_1 … w_m; N^f expands to N^g N^j, with N^g spanning w_e … w_(p-1) and N^j spanning w_p … w_q]

Outside probability
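Outside probabilities can be computed top-down once the inside probabilities are known: a node is either the left child of its parent, with a sibling to its right, or the right child, with a sibling to its left. The sketch below pushes each parent's outside probability down to both children, which is one way to organize that recursion. It assumes a toy grammar in Chomsky Normal Form and 0-based inclusive spans; the grammar and function names are illustrative, not from the slides.

```python
from collections import defaultdict

# Assumed toy SCFG in Chomsky Normal Form (not from the slides).
BINARY = {("S", "S", "S"): 0.4}   # P(S -> S S)
UNARY = {("S", "a"): 0.6}         # P(S -> a)


def inside(words, binary=BINARY, unary=UNARY):
    """beta[(nt, p, q)]: probability that nt generates words[p..q] (inclusive)."""
    m = len(words)
    beta = defaultdict(float)
    for p, word in enumerate(words):
        for (nt, term), prob in unary.items():
            if term == word:
                beta[nt, p, p] += prob
    for span in range(2, m + 1):
        for p in range(m - span + 1):
            q = p + span - 1
            for (nt, left, right), prob in binary.items():
                for d in range(p, q):
                    beta[nt, p, q] += prob * beta[left, p, d] * beta[right, d + 1, q]
    return beta


def outside(words, beta, binary=BINARY, start="S"):
    """alpha[(nt, p, q)]: probability of everything outside words[p..q] with nt over that span."""
    m = len(words)
    alpha = defaultdict(float)
    alpha[start, 0, m - 1] = 1.0                  # base case: start symbol over the whole string
    for span in range(m, 1, -1):                  # parents from the longest span down to length 2
        for p in range(m - span + 1):
            q = p + span - 1
            for (parent, left, right), prob in binary.items():
                a = alpha[parent, p, q]
                if a == 0.0:
                    continue
                for d in range(p, q):
                    # left child: the sibling on the right contributes its inside probability
                    alpha[left, p, d] += a * prob * beta[right, d + 1, q]
                    # right child: the sibling on the left contributes its inside probability
                    alpha[right, d + 1, q] += a * prob * beta[left, p, d]
    return alpha
```

Processing parents from longest to shortest span ensures a span's outside probability is complete before it is passed on to its children.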

Probability of a sentence
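The formula for this slide did not survive extraction. A standard way to write it, assuming the notation $\beta_j(p,q)$ for the inside probability and $\alpha_j(p,q)$ for the outside probability of $N^j$ over $w_p \dots w_q$ (symbols introduced here, not recovered from the slide), is the inside probability of the start symbol over the whole sentence, or equivalently a sum over the non-terminals that could emit any single word $w_k$:

```latex
\[
P(w_1 \dots w_m \mid G) \;=\; \beta_1(1, m)
\;=\; \sum_{j} \alpha_j(k, k)\, P(N^j \to w_k)
\qquad \text{for any } 1 \le k \le m .
\]
```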

The probability that a binary rule is used (1)
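The equation for this slide was lost in extraction. In the standard inside-outside formulation it reads as follows, assuming $\alpha$ and $\beta$ denote outside and inside probabilities and $\pi = P(w_1 \dots w_m \mid G)$ (notation introduced here as an assumption). Summing over all spans $(p, q)$ gives the expected number of times the rule is used in the sentence.

```latex
\[
P\bigl(N^j \to N^r N^s \text{ used over } w_p \dots w_q \,\big|\, w_1 \dots w_m, G\bigr)
= \frac{1}{\pi} \sum_{d=p}^{q-1}
  \alpha_j(p,q)\, P(N^j \to N^r N^s)\, \beta_r(p,d)\, \beta_s(d+1,q)
\]
```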

The probability that N^j is used (2)
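The equation for this slide was lost in extraction. A standard form, assuming $\alpha$ and $\beta$ denote outside and inside probabilities and $\pi = P(w_1 \dots w_m \mid G)$ (assumed notation), multiplies what lies outside the span by what lies inside it. Summing over all spans gives the expected number of times $N^j$ is used.

```latex
\[
P\bigl(N^j \text{ used over } w_p \dots w_q \,\big|\, w_1 \dots w_m, G\bigr)
= \frac{\alpha_j(p,q)\, \beta_j(p,q)}{\pi}
\]
```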

The probability that a unary rule is used (3)
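The equation for this slide was lost in extraction. A standard form for a rule $N^j \to w^k$ emitting the vocabulary word $w^k$ at position $h$ is shown below, assuming $\alpha$ denotes the outside probability, $\pi = P(w_1 \dots w_m \mid G)$, and $[\,\cdot\,]$ is 1 when the condition holds and 0 otherwise (all assumed notation). Summing over positions $h$ gives the expected number of uses of the rule.

```latex
\[
P\bigl(N^j \to w^k \text{ used at position } h \,\big|\, w_1 \dots w_m, G\bigr)
= \frac{\alpha_j(h,h)\, P(N^j \to w^k)\, [\, w_h = w^k \,]}{\pi}
\]
```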

Multiple training sentences (1) (2)
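The two equations on this slide were lost in extraction. In the standard treatment, re-estimation over a corpus sums the per-sentence quantities before dividing. Below, $f_i$, $g_i$ and $h_i$ are assumed names for the probabilities, computed on training sentence $i$, that a binary rule is used over a span, that a unary rule is used at a position, and that $N^j$ is used over a span.

```latex
\begin{align*}
\hat{P}(N^j \to N^r N^s) &=
  \frac{\sum_i \sum_{p<q} f_i(p, q, j, r, s)}{\sum_i \sum_{p \le q} h_i(p, q, j)} \\[4pt]
\hat{P}(N^j \to w^k) &=
  \frac{\sum_i \sum_{h} g_i(h, j, k)}{\sum_i \sum_{p \le q} h_i(p, q, j)}
\end{align*}
```

Each update is an expected rule count divided by the expected count of the rule's left-hand side, so the re-estimated probabilities for a given $N^j$ again sum to 1.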