Structured Learning: Sequence Labeling Problem (2), Hung-yi Lee

Named Entity Recognition
• Identifying names of people from a sentence
• Important component for a machine to understand human language
Example: 這位是李宏毅先生 ("This is Mr. Hung-yi Lee"), where 李宏毅 is the name of a person
Can be difficult … e.g. 楊公再興之神, 馮氏埋香之塚
Ref: https://www.ptt.cc/bbs/JinYong/M.1258625573.A.DC4.html
Ref: https://www.ptt.cc/bbs/JinYong/M.1195128035.A.31A.html

Named Entity Recognition - Notation
• 先: a character (字), which is also a single-character word (單字詞)
• 先生: a word (詞)
• Labels: O: “Name”, X: “Not Name”
Input x: 這位是李宏毅先生, as characters x1 … x8 = 這 位 是 李 宏 毅 先 生
Output y: segment the characters, then label each segment:
  y1 = 這位 (X), y2 = 是 (X), y3 = 李宏毅 (O), y4 = 先生 (X)
cf. last time, one label per word:
  x: John (x1) saw (x2) the (x3) saw (x4)
  y: PN (y1) V (y2) D (y3) N (y4)

How to address this task …
• Not an issue, right?
Structured Learning:
• Problem 1: Evaluation
• Problem 2: Inference
• Problem 3: Training
There are some efficient ways to find w.

Evaluation
Define
• Input x: 這位是李宏毅先生 (x1 … x8)
• Output y: y1 = X, y2 = X, y3 = O, y4 = X

Evaluation
Define
• Input x: 這位是李宏毅先生 (x1 … x8)
• Output y: O (y1), X (y2), X (y3), O (y3), X (y4)

Evaluation
Define
• Event 1
• Event 2
• Event 3

Inference
Input x: 這位是李宏毅先生 (x1 … x8)
Output y: anything from three segments (y1 y2 y3) up to eight single-character segments (y1 … y8)
• Enumerate all possible segmentations
• Extra constraint: no segment may be longer than 3 characters
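The enumeration step can be sketched in a few lines. This is a minimal illustration, not the lecture's code: the function name `segmentations` and the parameter `max_len` are hypothetical, and `max_len=3` encodes the extra constraint from the slide.

```python
def segmentations(text, max_len=3):
    """Yield every way to split `text` into segments of length 1..max_len."""
    if not text:
        yield []
        return
    for k in range(1, min(max_len, len(text)) + 1):
        head = text[:k]
        # Recurse on the remainder and prepend the chosen first segment.
        for rest in segmentations(text[k:], max_len):
            yield [head] + rest


sentence = "這位是李宏毅先生"  # the 8-character example from the slide
all_segs = list(segmentations(sentence))
print(len(all_segs))  # 81 segmentations when segments are at most 3 characters
```

Even with the length constraint, the count grows quickly with sentence length, which is what motivates the more efficient inference later in the deck.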

Inference
Input x: 這位是李宏毅先生 (x1 … x8)
Output y: every segment y1, y2, y3, …… is labeled O or X
• For each segmentation, each segment can be “O” or “X”
• Compute F(x, y) for each possible y and keep the one with the largest score
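To get a feel for how many candidates y this produces, the count can be computed with a short recurrence instead of listing them. A sketch under the slide's assumptions (two labels, segments of at most 3 characters); the function name `count_candidates` is hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_candidates(n, num_labels=2, max_len=3):
    """Number of labeled segmentations of a string with n characters."""
    if n == 0:
        return 1  # the empty string has exactly one (empty) segmentation
    # Choose the length k of the first segment and one of num_labels labels for it.
    return num_labels * sum(
        count_candidates(n - k, num_labels, max_len)
        for k in range(1, min(max_len, n) + 1)
    )


print(count_candidates(5))  # 152 candidates for 李宏毅先生
print(count_candidates(8))  # 3784 candidates for 這位是李宏毅先生
```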

Brute Force
李宏毅先生 (x1 … x5)
• Every choice of the next segment and its label (O or X) is a branch, so the candidates form a tree
• Compute F(x, y) for each path on the tree

Brute Force
李宏毅先生 (x1 … x5)
• Each branch adds a score to its path, e.g. +3, +1 or +0
• Traversing the whole tree is computationally intensive
From training data:
1. The segment labeled “O” has length 3 and begins with a usual family name.
2. The segment labeled “X” is a word in a lexicon (辭典).
3. The segment labeled “O” has a title after it.
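A brute-force search over the tree might look like the sketch below. Several things here are assumptions rather than content recovered from the slides: the weights (3, 1, 3) for the three rules are inferred from the +3/+1 branch scores and the totals 6 and 7 shown later, the sets `FAMILY_NAMES` and `TITLES` are hypothetical stand-ins, and rule 3 is read as "a title segment directly follows a segment labeled O". Only `LEXICON` is taken from the slides.

```python
from itertools import product

FAMILY_NAMES = {"李"}                           # assumed list of usual family names
LEXICON = {"李", "宏", "毅", "先", "生", "先生"}  # lexicon shown on the slides
TITLES = {"先生"}                               # assumed list of titles
WEIGHTS = (3, 1, 3)                             # assumed weights for rules 1, 2, 3


def score(path):
    """F(x, y) for a path given as a list of (segment, label) pairs."""
    total, prev = 0, None
    for seg, label in path:
        if label == "O" and len(seg) == 3 and seg[0] in FAMILY_NAMES:
            total += WEIGHTS[0]  # rule 1
        if label == "X" and seg in LEXICON:
            total += WEIGHTS[1]  # rule 2
        if prev == "O" and seg in TITLES:
            total += WEIGHTS[2]  # rule 3: the previous O segment has a title after it
        prev = label
    return total


def segmentations(text, max_len=3):
    if not text:
        yield []
        return
    for k in range(1, min(max_len, len(text)) + 1):
        for rest in segmentations(text[k:], max_len):
            yield [text[:k]] + rest


def candidates(text, labels=("O", "X")):
    """Every path on the tree: each segmentation with each labeling."""
    for segs in segmentations(text):
        for labs in product(labels, repeat=len(segs)):
            yield list(zip(segs, labs))


best = max(candidates("李宏毅先生"), key=score)
print(score(best), best)
```

Under these assumptions the search visits all 152 paths for the five-character input, which is exactly the cost the following slides try to avoid.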

Observation
Suppose y1 … ym already cover x1 … xn, and the next segment ym+1 covers xn+1 … xn+k.
How much score is added by ym+1 depends only on:
• What xn+1 to xn+k is
• What the label of ym+1 is (O? X?)
• What the label of ym is (O? X?)
It is independent of y1 to ym-1.
An efficient algorithm can be designed based on this characteristic.
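The observation can be written down as a decomposition of the score. The exact form of F(x, y) on the slides did not survive, so the notation below is an assumption: segment m is written as (b_m, e_m, \ell_m) for its start, end and label, s(\cdot) is the score added by one segment, and \ell_0 is a dummy start label.

```latex
% Score of a full labeled segmentation as a sum of local terms
F(x, y) = \sum_{m=1}^{M} s\bigl(\ell_{m-1},\; x_{b_m : e_m},\; \ell_m\bigr)

% Best score of any labeled segmentation of x_1 \dots x_n whose last segment has label \ell
V(n, \ell) = \max_{1 \le k \le \min(3, n)} \; \max_{\ell'}
    \Bigl[ V(n-k, \ell') + s\bigl(\ell',\; x_{n-k+1 : n},\; \ell\bigr) \Bigr],
\qquad V(0, \ell_0) = 0
```

Because each term looks only at the current segment and the previous label, the maximization over y_1 to y_{m-1} can be done once per (position, label) pair and reused.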

Basic Idea
李宏毅先生 (x1 … x5)
Lexicon: 李, 宏, 毅, 先, 生, 先生
• Path 1: 李宏毅 (O): +3
• Path 2: 李 (X) +1, 宏 (X) +1, 毅 (O) +0: 2 in total
• Both paths cover 李宏毅 and end with the label O. Because the label of the last segment is the same, adding the same next segment adds the same score to both (+3 and +3 on the slide).
• The path with the smaller score can therefore never become the largest. Don't go any further with it.
Based on this idea, a more efficient algorithm can be designed: the (Modified) Viterbi Algorithm.

Basic Idea: Notice!
李宏毅先生 (x1 … x5)
Lexicon: 李, 宏, 毅, 先, 生, 先生
• Path A: 李 (X) +1, 宏 (X) +1, 毅 (X) +1: 3 in total
• Path B: 李 (X) +1, 宏 (X) +1, 毅 (O) +0: 2 in total, so it is behind (輸了) at this point
• The two paths end with different labels, so the next segment can add different scores (+0 for Path A, +3 for Path B), and Path B wins after all (逆轉勝, a comeback win)
• Only paths that end with the same label can be compared and pruned
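Both points (pruning among paths with the same last label, and the comeback when labels differ) can be checked with a tiny scoring sketch. As an assumption not stated on the slides, the rules are given weights 3, 1 and 3, rule 3 is read as "a title segment directly follows an O segment", and `FAMILY_NAMES` and `TITLES` are hypothetical sets.

```python
FAMILY_NAMES = {"李"}                           # assumed
LEXICON = {"李", "宏", "毅", "先", "生", "先生"}  # from the slides
TITLES = {"先生"}                               # assumed


def local_score(prev_label, seg, label):
    """Score added by one segment; it only sees the previous label, not the full history."""
    s = 0
    if label == "O" and len(seg) == 3 and seg[0] in FAMILY_NAMES:
        s += 3  # rule 1 (assumed weight)
    if label == "X" and seg in LEXICON:
        s += 1  # rule 2 (assumed weight)
    if prev_label == "O" and seg in TITLES:
        s += 3  # rule 3 (assumed weight)
    return s


def path_score(path):
    total, prev = 0, None
    for seg, label in path:
        total += local_score(prev, seg, label)
        prev = label
    return total


one_name = [("李宏毅", "O")]                      # ends with O
x_x_o = [("李", "X"), ("宏", "X"), ("毅", "O")]   # ends with O
x_x_x = [("李", "X"), ("宏", "X"), ("毅", "X")]   # ends with X
nxt = ("先生", "X")

# Same last label: the next segment adds the same amount, so the ranking cannot flip.
print(path_score(one_name), path_score(one_name + [nxt]))
print(path_score(x_x_o), path_score(x_x_o + [nxt]))
# Different last label: x_x_x is ahead of x_x_o now but falls behind after 先生.
print(path_score(x_x_x), path_score(x_x_x + [nxt]))
```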

x: 李宏毅先生 (x1 … x5)
Lexicon: 李, 宏, 毅, 先, 生, 先生
For each prefix, keep only the best-scoring path for each label of the last segment (Max):
• 李: O 0, X 1
• 李宏, last segment O: 李宏 (O) 0; O, O 0; X, O 1; Max 1
• 李宏, last segment X: 李宏 (X) 0; O, X 1; X, X 2; Max 2
• 李宏毅: last segment O: Max 3 (李宏毅 as one segment labeled O); last segment X: Max 3 (X, X, X)
• 李宏毅先: last segment O: Max 3; last segment X: Max 4
• 李宏毅先生 (ignoring the “李” case): last segment O: Max 6 (O, O); last segment X: Max 7 (O, X)
The best path is 李宏毅 (O), 先生 (X), with score 7.
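The walkthrough can be reproduced with a short dynamic program. This is a sketch of a segment-level Viterbi search, not the lecture's own code. The weights (3, 1, 3), the sets `FAMILY_NAMES` and `TITLES`, and the reading of rule 3 as "a title segment directly follows an O segment" are assumptions chosen because they are consistent with the Max values on the slides.

```python
FAMILY_NAMES = {"李"}                           # assumed
LEXICON = {"李", "宏", "毅", "先", "生", "先生"}  # from the slides
TITLES = {"先生"}                               # assumed


def local_score(prev_label, seg, label):
    s = 0
    if label == "O" and len(seg) == 3 and seg[0] in FAMILY_NAMES:
        s += 3  # rule 1 (assumed weight)
    if label == "X" and seg in LEXICON:
        s += 1  # rule 2 (assumed weight)
    if prev_label == "O" and seg in TITLES:
        s += 3  # rule 3 (assumed weight)
    return s


def viterbi(text, labels=("O", "X"), max_len=3):
    n = len(text)
    # best[end][label] = (score, start, prev_label) of the best path covering
    # text[:end] whose last segment is text[start:end] with this label.
    best = [dict() for _ in range(n + 1)]
    best[0][None] = (0, None, None)  # empty prefix, dummy start label
    for end in range(1, n + 1):
        for label in labels:
            cands = []
            for k in range(1, min(max_len, end) + 1):
                start = end - k
                seg = text[start:end]
                for prev_label, (prev_score, _, _) in best[start].items():
                    gain = local_score(prev_label, seg, label)
                    cands.append((prev_score + gain, start, prev_label))
            best[end][label] = max(cands, key=lambda c: c[0])
    # Backtrack from the better of the two final labels.
    label = max(labels, key=lambda l: best[n][l][0])
    score, path, end = best[n][label][0], [], n
    while end > 0:
        _, start, prev_label = best[end][label]
        path.append((text[start:end], label))
        end, label = start, prev_label
    table = [{l: best[i][l][0] for l in labels} for i in range(1, n + 1)]
    return score, path[::-1], table


score, path, table = viterbi("李宏毅先生")
print(score, path)
for prefix_len, row in enumerate(table, start=1):
    print("李宏毅先生"[:prefix_len], row)
```

The table keeps one entry per (prefix, last label) pair, so the work grows linearly with the sentence length instead of with the number of paths in the tree.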

Training
• Structured Perceptron
  You know how to do it now
• Structured SVM
  • Solving a QP
  • Find the most violated constraint
  (Can you develop the algorithm by yourself?)
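For the structured perceptron, a minimal sketch on a single training example is shown below. It assumes that F(x, y) = w · φ(x, y), where φ counts how often each of the three rules fires; this feature definition, the sets `FAMILY_NAMES` and `TITLES`, and the use of brute-force inference inside `predict` are illustrative assumptions, not details taken from the slides.

```python
from itertools import product

FAMILY_NAMES = {"李"}                           # assumed
LEXICON = {"李", "宏", "毅", "先", "生", "先生"}  # from the slides
TITLES = {"先生"}                               # assumed


def phi(path):
    """Feature vector: how many times each of the three rules fires on the path."""
    counts, prev = [0, 0, 0], None
    for seg, label in path:
        if label == "O" and len(seg) == 3 and seg[0] in FAMILY_NAMES:
            counts[0] += 1
        if label == "X" and seg in LEXICON:
            counts[1] += 1
        if prev == "O" and seg in TITLES:
            counts[2] += 1
        prev = label
    return counts


def segmentations(text, max_len=3):
    if not text:
        yield []
        return
    for k in range(1, min(max_len, len(text)) + 1):
        for rest in segmentations(text[k:], max_len):
            yield [text[:k]] + rest


def candidates(text, labels=("O", "X")):
    for segs in segmentations(text):
        for labs in product(labels, repeat=len(segs)):
            yield list(zip(segs, labs))


def predict(w, text):
    """Inference: argmax over y of w . phi(x, y), by brute force for brevity."""
    return max(candidates(text),
               key=lambda p: sum(wi * fi for wi, fi in zip(w, phi(p))))


def train(data, epochs=20):
    w = [0, 0, 0]
    for _ in range(epochs):
        updated = False
        for text, gold in data:
            guess = predict(w, text)
            if guess != gold:
                # Perceptron update: move toward the gold features, away from the guess.
                w = [wi + g - h for wi, g, h in zip(w, phi(gold), phi(guess))]
                updated = True
        if not updated:
            break  # every training example is already predicted correctly
    return w


data = [("李宏毅先生", [("李宏毅", "O"), ("先生", "X")])]
w = train(data)
print(w, predict(w, "李宏毅先生"))
```

In practice `predict` would call the Viterbi-style search rather than enumerate every candidate; brute force is used here only to keep the sketch short.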

Conclusion
Last time: one label per element
  x: x1 x2 x3 x4
  y: y1 = X, y2 = O, y3 = X, y4 = O
This time: one label per segment, written as (start, end, label)
  x: x1 x2 x3 x4
  y: y1 = (1, 3, O), y2 = (4, 4, X)
  Also called segmental CRF, semi-Markov model
Example:
  y (per element): TSI TSI I I N N N
  y (per segment): y1 = (1, 4, TSI), y2 = (5, 6, I), ……
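The two output representations are easy to convert between. The helper names `to_spans` and `to_element_labels` below are hypothetical; spans are 1-indexed and inclusive, matching the (start, end, label) notation on the slide.

```python
def to_spans(path):
    """Turn [(segment, label), ...] into 1-indexed inclusive (start, end, label) triples."""
    spans, start = [], 1
    for seg, label in path:
        end = start + len(seg) - 1
        spans.append((start, end, label))
        start = end + 1
    return spans


def to_element_labels(spans):
    """Expand (start, end, label) triples into one label per element, as in last time's setup."""
    return [label for start, end, label in spans for _ in range(start, end + 1)]


spans = to_spans([("李宏毅", "O"), ("先", "X")])
print(spans)                     # [(1, 3, 'O'), (4, 4, 'X')]
print(to_element_labels(spans))  # ['O', 'O', 'O', 'X']
```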

Appendix

This is the original paper

Inference – Brute Force Find the path with the largest