
POS Tagging Using Hidden Markov Model
By Ravi Yadav, Amit Kumar, Biplab Ch Das

Generating the Lexical Probabilities
We know that P(W|T) = P(W,T)/P(T), and from counts we estimate P(W|T) = N(W,T)/N(T). We used a hash map to collect the counts of tagged words in the training corpus: first find N(W,T), then divide it by N(T).
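A minimal sketch of this counting step, assuming the training corpus is available as a list of (word, tag) pairs; the function and variable names are illustrative, not the original code:

```python
from collections import defaultdict

def lexical_probabilities(tagged_corpus):
    """Estimate P(W|T) = N(W,T) / N(T) from a list of (word, tag) pairs."""
    word_tag_counts = defaultdict(int)   # N(W, T)
    tag_counts = defaultdict(int)        # N(T)
    for word, tag in tagged_corpus:
        word_tag_counts[(word, tag)] += 1
        tag_counts[tag] += 1
    return {(w, t): c / tag_counts[t] for (w, t), c in word_tag_counts.items()}
```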

Generating the Transition Probabilities
We know that P(Tn|Tn-1) = P(Tn-1,Tn)/P(Tn-1), and from counts we estimate P(Tn|Tn-1) = N(Tn-1,Tn)/N(Tn-1). We used a hash map again: get the N(Tn-1,Tn) values and divide by N(Tn-1).
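A corresponding sketch for the transition counts, assuming the corpus is available as one tag sequence per sentence (again, the names are illustrative):

```python
from collections import defaultdict

def transition_probabilities(tag_sequences):
    """Estimate P(Tn | Tn-1) = N(Tn-1, Tn) / N(Tn-1) from per-sentence tag lists."""
    bigram_counts = defaultdict(int)   # N(Tn-1, Tn)
    prev_counts = defaultdict(int)     # N(Tn-1)
    for tags in tag_sequences:
        for prev_tag, tag in zip(tags, tags[1:]):
            bigram_counts[(prev_tag, tag)] += 1
            prev_counts[prev_tag] += 1
    return {(p, t): c / prev_counts[p] for (p, t), c in bigram_counts.items()}
```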

Separation of Folds
The folds were separated in the ratio 4:1, with 4 parts for training and 1 part for testing.

Smoothing of the Data
For non-existent (unseen) data we smoothed the counts by adding a small value (0.0000001), since a single zero factor can make the whole product zero.
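A sketch of this add-a-small-constant idea; the constant comes from the slide, the helper name is ours:

```python
EPSILON = 0.0000001  # small value added so unseen events do not zero out the product

def smoothed_probability(count, total):
    # Assumption: smoothing is applied at lookup time to any zero count.
    return (count + EPSILON) / total if total > 0 else EPSILON
```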

The Viterbi algorithm was implemented to get the best possible state sequence using the transition matrix and the emission matrix. The best state sequence was taken as the best tag sequence for the sentence.
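A compact sketch of Viterbi decoding in log space, assuming the emission and transition tables built above and the same smoothing fallback for unseen pairs; it is a sketch of the standard algorithm, not the original implementation:

```python
import math

def viterbi(words, tags, trans_p, emit_p, eps=0.0000001):
    """Return the most probable tag sequence for `words`."""
    def lp(table, key):
        # Smoothed log probability lookup.
        return math.log(table.get(key, eps))

    # Initialisation: score of each tag for the first word.
    scores = [{t: lp(emit_p, (words[0], t)) for t in tags}]
    back = [{}]
    for i, word in enumerate(words[1:], start=1):
        scores.append({})
        back.append({})
        for t in tags:
            best_prev, best_score = max(
                ((p, scores[i - 1][p] + lp(trans_p, (p, t))) for p in tags),
                key=lambda x: x[1])
            scores[i][t] = best_score + lp(emit_p, (word, t))
            back[i][t] = best_prev
    # Follow the back-pointers from the best final tag.
    last = max(scores[-1], key=scores[-1].get)
    seq = [last]
    for i in range(len(words) - 1, 0, -1):
        seq.append(back[i][seq[-1]])
    return list(reversed(seq))
```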

The Previous Algorithm
Until now the POS tagger has implemented the Viterbi algorithm; at the same time we kept the greedy local lookup presented before: T(n)* = argmax over Tj of P(Tj|Ti) * P(Wj|Tj), where i is the previous state and j varies over the tags.
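The greedy local lookup can be sketched as follows, using the same probability tables; the start tag and smoothing constant are assumptions:

```python
def greedy_tag(words, tags, trans_p, emit_p, start_tag="<s>", eps=0.0000001):
    """Greedy left-to-right tagging: T* = argmax_Tj P(Tj|Ti) * P(Wj|Tj)."""
    prev, output = start_tag, []
    for word in words:
        best = max(tags, key=lambda t: trans_p.get((prev, t), eps)
                                       * emit_p.get((word, t), eps))
        output.append(best)
        prev = best
    return output
```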

Belief Network of the Assumption
We take the joint probability P(Si, Sj, Wi, Wj) = P(Sj|Si) * P(Wj|Sj) * P(Wi|Si) and set P(Si) = 1 and P(Wi|Si) = 1 (since the previous state is held constant, we do not need to consider P(Si) for the local maximisation).

Confusion Matrix
The confusion matrix has been created. The actual tags are shown in the right-hand column and the tags produced by the Viterbi algorithm are enumerated along the bottom line. file:///C:/Users/Biplab/Desktop/confusion_matrix.html

Confusion Reason
Most of the confusion was between NP0 and NN1. It was caused by the unknown-word prediction handling done in the code.

The F-score
The tag-wise precision, recall and F-score were calculated, with the beta value set to 1. file:///C:/Users/Biplab/Desktop/proper.html
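A sketch of the tag-wise precision/recall/F-score computation with beta = 1, assuming flat lists of gold and predicted tags aligned token by token:

```python
from collections import Counter

def tagwise_f1(gold_tags, predicted_tags):
    """Per-tag precision, recall and F1 (beta = 1)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(gold_tags, predicted_tags):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1
            fn[gold] += 1
    scores = {}
    for tag in set(gold_tags) | set(predicted_tags):
        p = tp[tag] / (tp[tag] + fp[tag]) if tp[tag] + fp[tag] else 0.0
        r = tp[tag] / (tp[tag] + fn[tag]) if tp[tag] + fn[tag] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        scores[tag] = (p, r, f)
    return scores
```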

Trigram Implementation
The trigram model was implemented for the POS tagger. In this case the current tag depends not only on the previous state but also on the state before that.
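A sketch of the trigram transition counts, estimating P(Tn | Tn-2, Tn-1) analogously to the bigram case above (names are illustrative):

```python
from collections import defaultdict

def trigram_transition_probabilities(tag_sequences):
    """Estimate P(Tn | Tn-2, Tn-1) = N(Tn-2, Tn-1, Tn) / N(Tn-2, Tn-1)."""
    trigram_counts = defaultdict(int)
    history_counts = defaultdict(int)
    for tags in tag_sequences:
        for t2, t1, t in zip(tags, tags[1:], tags[2:]):
            trigram_counts[(t2, t1, t)] += 1
            history_counts[(t2, t1)] += 1
    return {k: c / history_counts[k[:2]] for k, c in trigram_counts.items()}
```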

Trigram: the extension of the bigram model.

The graphical model for the trigram model.

A* Algorithm
Heuristic used: h(Ti) = log(P(Wi|Ti)) + Σ (j = i+1 to n) log(max P(Wj|Tj)) + Σ (j = i+1 to n) log(max P(Tj|Tj-1)),
where P(Wi|Ti) is the lexical probability at the node, max P(Tj|Tj-1) is the maximum over all transition probabilities, and max P(Wj|Tj) is the maximum lexical probability for word Wj.
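A sketch of just this heuristic function (not the full A* search), assuming the probability tables built earlier: the exact score of the current node plus an optimistic bound of the best lexical and best transition log-probability for every remaining word.

```python
import math

def a_star_heuristic(i, tag, words, emit_p, trans_p, eps=0.0000001):
    """h(Ti): current lexical log-prob plus optimistic bounds for the rest."""
    best_trans = math.log(max(trans_p.values()))        # max over all P(T|T')
    score = math.log(emit_p.get((words[i], tag), eps))  # log P(Wi|Ti)
    for word in words[i + 1:]:
        best_emit = max((p for (w, t), p in emit_p.items() if w == word),
                        default=eps)                     # max over T of P(Wj|T)
        score += math.log(best_emit) + best_trans
    return score
```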

Accuracy of the Tagger

Fold No | Bigram | Skip gram | .23*bigram + .11*skipgram + .66*trigram | A* algorithm (0.75*bigram + .25*skipgram)
Fold 1  | 92.69  | 92.71     | 92.82 | 93.20
Fold 2  | 93.76  | 94.08     | 94.28 | 94.75
Fold 3  | 93.61  | 93.69     | 93.73 | 94.66
Fold 4  | 94.00  | 94.06     | 94.17 | 94.72
Fold 5  | 93.22  | 93.38     | 93.64 | 93.86
Overall | 93.45  | 93.58     | 93.73 | 94.23

Discriminative Model
Tags directly from the data. Expression: argmax over j of P(T(j) | T(j-1), current word). Accuracy achieved: 72.98%. Possible problem: unknown-word handling was not done.

Word Prediction

Model for word prediction: Nw* = argmax over Nw of Σ (over Tw) I(Tw) * sim(Syn(Nw), Syn(Tw)),
where Nw is the candidate next word being maximised over given the context words (wi, wj), sim is a similarity function, and Syn(W) refers to the synset of the word W.
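A sketch of how such a scorer could look with NLTK's WordNet interface (requires nltk.download('wordnet')); path similarity is only an illustrative choice for sim, since the slide does not name the similarity function, and the helper names are ours:

```python
from nltk.corpus import wordnet as wn

def score_candidate(candidate, trigram_context):
    """Sum synset similarities between a candidate next word and the
    trigram of preceding content words."""
    score = 0.0
    for context_word in trigram_context:
        for s1 in wn.synsets(candidate):
            for s2 in wn.synsets(context_word):
                sim = s1.path_similarity(s2)
                if sim:
                    score += sim
    return score

def predict_word(candidates, trigram_context, top_k=5):
    # Rank candidates and return the top-k suggestions.
    return sorted(candidates, key=lambda w: score_candidate(w, trigram_context),
                  reverse=True)[:top_k]
```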

Accuracy of the Word Prediction Model
Measured using a trigram of content words and checked against the top 5 suggestions. It came out to 43.30% without knowing the POS tags, and with POS tags it increased to 46.00%. Most of the confusion occurred when the next word involved names.

Parser Projection
By Biplab Ch Das, Ravi Kumar

Source of slides: Franco M. Luque, Ten (Or More) Minutes on Unsupervised Parsing.

Supervised parsing is the problem of building a parser using a treebank. Treebanks are corpora of parsed sentences.
▶ A part of the treebank is used to train the parser.
▶ Another part is used to evaluate the parser.
▶ It can be seen as learning a function f : X → Y (the parser) by knowing some points (x1, f(x1)), ..., (xn, f(xn)) (the treebank).

Unsupervised Parsing
What can we do by knowing only a set of points {x1, ..., xn} of the domain of f? The parser is trained with sentences alone.
▶ The evaluation is still done against a treebank.
▶ The set of syntactic categories is unknown.

The DMV+CCM Model
Developed by Dan Klein and Chris Manning in 2004. It produces dependency trees that can be converted to binary bracketings. It learns and parses from POS tags instead of words, so it must be combined with a POS tagger to obtain a real parser.

Combining the Two Models and Training
CCM stands for Constituent-Context Model; DMV stands for Dependency Model with Valence. Versions of the inside-outside algorithm for PCFGs [Lari and Young, 1990] can be used to run expectation maximisation on both of these probabilistic models.

But how can we use the parsed tree of another language? The alignment problem!

Let's deviate a bit from the original problem. Suppose we have a parallel treebank and we need to find the alignment between sentences given in two different languages. Solution: we can get the alignment using dependencies and constituents for tree-based alignment (ref: Dependencies vs. Constituents for Tree-Based Alignment, Daniel Gildea).

But that was not our problem. Let's look at the word alignment problem. Word alignment was discussed in class; there are other approaches to it, such as the hybrid approach given in [A Hybrid Approach to Align Sentences and Words in English-Hindi Parallel Corpora, Niraj Aswani and Robert Gaizauskas]. Let's see an abstract view of the algorithm.

The first part: dictionary lookup.

Then: a nearest-neighbor approach.

Now let's focus on the problem. Google Translate was used for translation. English sentence: "Alignment problem is difficult." Hindi translation: [Hindi text lost in encoding]. Also possible: [alternative Hindi translation, lost in encoding].

Running the Stanford Parser on it. POS tags: Alignment/NNP is/VBZ difficult/JJ problem/NN. The bracketed parse tree: (S (NP (NNP Alignment)) (VP (VBZ is) (NP (JJ difficult) (NN problem))))



Pattern Matching/Unification
The bracketed parse tree, with each English node unified with its Hindi counterpart:
(S (NP (NNP Alignment) == (NNP [Hindi word]))
   (VP (VBZ is) == (VBZ [Hindi word])   ----(1)
       (NP ...                          ----(2)

Let's try to make the parse and the sentence sequence consistent. Exchange (1) and (2):
(S (NP (NNP Alignment) == (NNP [Hindi word]))
   (VP (NP                              ----(2)
          (JJ difficult) == (JJ [Hindi word])
          (NN problem) == (NN ...

Rule proposed: we can exchange nodes at the same level to make the unified tree consistent with the word sequence of the translated sentence.
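A sketch of this exchange rule, representing the tree as nested lists and reordering siblings so the leaf order follows the translated sentence; this is a hypothetical helper, not the original implementation:

```python
def reorder_to_match(tree, target_order):
    """Recursively reorder sibling subtrees so that the tree's leaves follow
    `target_order` (a word -> position mapping for the translated sentence)."""
    if isinstance(tree, str):  # a leaf word
        return tree
    label = tree[0]
    children = [reorder_to_match(child, target_order) for child in tree[1:]]

    def first_leaf_pos(node):
        # Descend to the first leaf under this node and look up its position.
        while not isinstance(node, str):
            node = node[1]
        return target_order.get(node, float("inf"))

    # Exchange nodes at the same level so leaf order matches the translation.
    children.sort(key=first_leaf_pos)
    return [label] + children

# Illustrative usage (words and positions are made up):
# reorder_to_match(["S", ["NP", ["NNP", "Alignment"]], ["VP", ...]],
#                  {"Alignment": 0, "is": 3, "difficult": 2, "problem": 1})
```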


Let's make it consistent with the first translation. We use the proposed rule and exchange (3) and (4):
(S (NP (NNP [Hindi word]))
   (VP (NN [Hindi word])   ------(4)

Both trees can be parsed by a CFG for Hindi. Here the CFG is:
S -> NP VP
VP -> NP VBZ
NP -> JJ NN | NN JJ
Problem: the rule is not robust.
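The small CFG above can be checked mechanically; a sketch using NLTK, treating the POS tags themselves as terminals (the tag sequence below is only illustrative, not a claim about the actual Hindi output):

```python
import nltk

hindi_cfg = nltk.CFG.fromstring("""
S  -> NP VP
VP -> NP 'VBZ'
NP -> 'JJ' 'NN' | 'NN' 'JJ'
""")

parser = nltk.ChartParser(hindi_cfg)
# An illustrative tag sequence for a reordered translation.
for tree in parser.parse(['JJ', 'NN', 'NN', 'JJ', 'VBZ']):
    print(tree)
```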

Take an example: "I shot the man with ice cream." [Hindi translation lost in encoding]. The translation itself is not correct! It is difficult to find the alignment; two of the Hindi words do not align to anything.

What can we do? Apply unsupervised parsing on a Hindi corpus, and when scoring the parse trees, give higher scores to those consistent with the parse tree generated by the proposed approach.

Thank You