Structured Learning: Sequence Labeling Problem (2), Hung-yi Lee
Named Entity Recognition
• Identifying names of people in a sentence
• An important component for a machine to understand human language
• Example: 這 位 是 李 宏 毅 先 生 ("This is Mr. Hung-yi Lee"), where 李宏毅 is the name of a person
• Can be difficult, e.g. 楊公再興之神, 馮氏埋香之塚
Ref: https://www.ptt.cc/bbs/Jin.Yong/M.1258625573.A.DC4.html
Ref: https://www.ptt.cc/bbs/Jin.Yong/M.1195128035.A.31A.html
Named Entity Recognition - Notation
• 先: a character (字), which is also a single-character word (單字詞); 先生: a word (詞)
• Labels: O = "Name", X = "Not Name"
• Input x: 這 位 是 李 宏 毅 先 生 (x1 x2 x3 x4 x5 x6 x7 x8)
• Output y: segment x, then label each segment: y1 = X, y2 = X, y3 = O, y4 = X
• cf. last time: x = John saw the saw (x1 x2 x3 x4), y = PN V D N (y1 y2 y3 y4), one label per word and no segmentation
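One way to hold such an output in code is as a list of (start, end, label) triples, the same form the concluding slide uses for y1 = (1, 3, O). The sketch below assumes the segmentation 這位 / 是 / 李宏毅 / 先生 for the example sentence; the helper names `segment_text` and `is_valid` are hypothetical and not part of the lecture.

```python
# Hypothetical representation: each segment is (start, end, label),
# with 1-indexed, inclusive character positions.
chars = list("這位是李宏毅先生")

# Assumed segmentation of the example: 這位 / 是 / 李宏毅 / 先生
segments = [(1, 2, "X"), (3, 3, "X"), (4, 6, "O"), (7, 8, "X")]


def segment_text(chars, segments):
    """Return the (substring, label) pair covered by each segment."""
    return [("".join(chars[s - 1:e]), label) for s, e, label in segments]


def is_valid(chars, segments):
    """Check that the segments tile x1..xn with no gaps or overlaps."""
    expected_start = 1
    for s, e, _ in segments:
        if s != expected_start or e < s:
            return False
        expected_start = e + 1
    return expected_start == len(chars) + 1


print(segment_text(chars, segments))
print(is_valid(chars, segments))
```

Unlike the one-label-per-word output of "John saw the saw", the number of outputs here depends on where the segment boundaries fall.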
How to address this task?
• Not an issue, right? Treat it as Structured Learning:
  Problem 1: Evaluation
  Problem 2: Inference
  Problem 3: Training
• There are some efficient ways to find w.
Evaluation
• Define F(x, y) for an input x and an output y
• Input x: 這 位 是 李 宏 毅 先 生 (x1 x2 x3 x4 x5 x6 x7 x8)
• Output y: y1 = X, y2 = X, y3 = O, y4 = X
Evaluation
• The same input x with a different output y, made of five segments: O, X, X, O, X
Evaluation
• Define F(x, y) from three events: Event 1, Event 2, Event 3
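A sketch of what such a definition can look like, assuming the usual linear form from structured learning. The symbols \phi and N_i are notation introduced here, and reading the three events as the three rules listed under "From training data" on the Brute Force slide is an assumption, not something this slide states.

```latex
F(x, y) = w \cdot \phi(x, y) = \sum_{i=1}^{3} w_i \, N_i(x, y),
\qquad N_i(x, y) = \text{number of times event } i \text{ occurs in } (x, y)
```

With weights w = (3, 1, 3), which is what the scores in the worked example appear to use, the output 李宏毅 (O), 先生 (X) triggers each event once, so F(x, y) = 3 + 1 + 3 = 7.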
Inference
• Input x: 這 位 是 李 宏 毅 先 生 (x1 x2 x3 x4 x5 x6 x7 x8)
• Output y: the number of segments is not fixed; it ranges from three segments (y1 y2 y3) up to one segment per character (y1 y2 ... y8)
• Enumerate all possible segmentations
• Extra constraint: no segment can be longer than 3 characters
Inference
• Input x: 這 位 是 李 宏 毅 先 生 (x1 x2 x3 x4 x5 x6 x7 x8)
• For each segmentation, each segment can be "O" or "X"
• Compute F(x, y) for each possible y and keep the y with the largest F(x, y)
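To get a feel for the size of this search space, the candidates can be counted without listing them. This is a small sketch; the function names are hypothetical, and the only inputs taken from the slides are the maximum segment length of 3 and the two labels.

```python
from functools import lru_cache

MAX_LEN = 3          # from the slide: no segment longer than 3 characters
LABELS = ("O", "X")  # from the slide: each segment is "O" or "X"


@lru_cache(maxsize=None)
def count_segmentations(n):
    """Number of ways to cut n characters into segments of length 1..MAX_LEN."""
    if n == 0:
        return 1
    return sum(count_segmentations(n - k) for k in range(1, min(MAX_LEN, n) + 1))


@lru_cache(maxsize=None)
def count_labeled(n):
    """Number of segmentations of n characters once every segment also gets a label."""
    if n == 0:
        return 1
    return len(LABELS) * sum(count_labeled(n - k) for k in range(1, min(MAX_LEN, n) + 1))


print(count_segmentations(8), count_labeled(8))
```

For the eight-character example this gives 81 segmentations and 3784 labeled candidates, and the count grows exponentially with sentence length.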
Brute Force
• Example input: 李 宏 毅 先 生 (x1 x2 x3 x4 x5)
• Branching on every segment boundary and every label (O or X) gives a tree structure
• Compute F(x, y) for each path on the tree
Brute Force
• Example input: 李 宏 毅 先 生 (x1 x2 x3 x4 x5)
• Each step along a path adds a score, e.g. +3 +3 +1 along one path and +0 +0 along another
• Traversing the whole tree is computationally intensive.
From training data:
1. The segment labeled "O" has length 3 and begins with a common family name.
2. The segment labeled "X" is a word in a lexicon (辭典).
3. The segment labeled "O" is followed by a title.
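The tree traversal can be sketched as plain enumeration. Several things here are assumptions for illustration: the weights (3, 1, 3) are read off the scores in the slides, `FAMILY_NAMES` and `TITLES` are single-entry stand-ins for real lists, and the function names are hypothetical. The lexicon is the one shown in the slides.

```python
from itertools import product

FAMILY_NAMES = {"李"}                             # assumption: stand-in list
LEXICON = {"李", "宏", "毅", "先", "生", "先生"}    # lexicon shown in the slides
TITLES = {"先生"}                                 # assumption: stand-in list
WEIGHTS = (3, 1, 3)                               # assumption: read off the slide scores
MAX_LEN = 3


def segmentations(text):
    """Yield every way to cut text into segments of length 1..MAX_LEN."""
    if not text:
        yield []
        return
    for k in range(1, min(MAX_LEN, len(text)) + 1):
        for rest in segmentations(text[k:]):
            yield [text[:k]] + rest


def score(y):
    """Sum the rule scores along one path; y is a list of (segment, label)."""
    total, prev_label = 0, None
    for seg, label in y:
        if label == "O" and len(seg) == 3 and seg[0] in FAMILY_NAMES:
            total += WEIGHTS[0]   # rule 1
        if label == "X" and seg in LEXICON:
            total += WEIGHTS[1]   # rule 2
        if prev_label == "O" and seg in TITLES:
            total += WEIGHTS[2]   # rule 3: previous "O" segment has a title after it
        prev_label = label
    return total


def brute_force(text):
    """Score every labeled segmentation; return (best score, best y, number tried)."""
    best_score, best_y, tried = None, None, 0
    for segs in segmentations(text):
        for labels in product("OX", repeat=len(segs)):
            y = list(zip(segs, labels))
            tried += 1
            s = score(y)
            if best_score is None or s > best_score:
                best_score, best_y = s, y
    return best_score, best_y, tried


print(brute_force("李宏毅先生"))
```

Under these assumptions the search scores 152 candidates for five characters and returns 李宏毅 (O), 先生 (X) with score 7.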
Observation
• Suppose segments y_1 ... y_{m-1}, y_m cover x_1 ... x_n, and the next segment y_{m+1} covers x_{n+1} ... x_{n+k}. Should y_{m+1} be O or X?
• How many scores would be added by y_{m+1} depends on:
  Ø what x_{n+1} to x_{n+k} is
  Ø what the label of y_{m+1} is
  Ø what the label of y_m is
• It is independent of y_1 to y_{m-1}
• An efficient algorithm can be designed based on this characteristic
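This observation can be written as a recurrence. The notation below is introduced here for illustration and does not appear in the slides: V(n, \ell) is the best score of any labeled segmentation of x_1 ... x_n whose last segment has label \ell, s is the score added by the new segment, and K is the maximum segment length (3 in this lecture).

```latex
V(n, \ell) = \max_{1 \le k \le \min(K, n)} \; \max_{\ell' \in \{O, X\}}
\Big[ V(n-k, \ell') + s\big(x_{n-k+1} \dots x_n,\ \ell',\ \ell\big) \Big],
\qquad V(0, \cdot) = 0
```

When n - k = 0 there is no previous segment, so the inner maximum over \ell' is dropped. The final answer is \max_{\ell} V(N, \ell) for a sentence of N characters.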
Basic Idea
• Example input: 李 宏 毅 先 生 (x1 x2 x3 x4 x5)
• Lexicon: 李, 宏, 毅, 先, 生, 先生
• Two partial paths cover the same characters and end with the same label: one scores +3, the other +1 +1 +0
• Appending the same next segment adds +3 to both
• Because the label of the last segment is the same, adding the same segment will add the same score.
Basic Idea
• The path with the smaller score can never become the largest. Don't go any further along it.
• Based on this idea, a more efficient algorithm can be designed: the (Modified) Viterbi Algorithm.
Basic Idea: Notice!
• Paths that end with different labels cannot be pruned against each other.
• X X X scores +1 +1 +1, so a path that scores +1 +1 +0 and ends in O is behind it ("losing").
• The next segment adds +0 after X but +3 after O, so the path ending in O comes from behind to win.
x: 李 宏 毅 先 生 (x1 x2 x3 x4 x5)
Prefix 李:
  李 = O: 0
  李 = X: 1
Prefix 李宏, last label O:
  李宏 = O: 0
  李 = O, 宏 = O: 0
  李 = X, 宏 = O: 1 (Max)
Prefix 李宏, last label X:
  李宏 = X: 0
  李 = O, 宏 = X: 1
  李 = X, 宏 = X: 2 (Max)
Prefix 李宏毅, last label O:
  李宏毅 = O: 3 (Max)
  宏毅 = O, after 李 = O: 0; after 李 = X: 1
  毅 = O, after the best 李宏 ending in O: 1; ending in X: 2
Prefix 李宏毅, last label X:
  李宏毅 = X: 0
  宏毅 = X, after 李 = O: 0; after 李 = X: 1
  毅 = X, after the best 李宏 ending in O: 2; ending in X: 3 (Max), i.e. X X X
Prefix 李宏毅先:
  last label O: Max 3 (e.g. X X X O)
  last label X: Max 4
Prefix 李宏毅先生 (ignore the case where only 李 comes before the last segment, since that segment would be longer than 3):
  last label O:
    毅先生 = O, after 李宏: 2
    先生 = O, after 李宏毅 ending in O: 6 (Max); ending in X: 3
    生 = O, after 李宏毅先 ending in O: 3; ending in X: 4
  last label X:
    毅先生 = X, after 李宏: 2
    先生 = X, after 李宏毅 ending in O: 7 (Max); ending in X: 4
    生 = X, after 李宏毅先 ending in O: 4; ending in X: 5
Result: O O scores 6 and O X scores 7, so the best output is 李宏毅 (O), 先生 (X).
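The step-by-step procedure can be sketched as a short dynamic program that keeps, for every prefix, only the best path per last label. This is an illustrative sketch, not the lecture's own code: the weights +3, +1, +3 are read off the scores in the slides, `FAMILY_NAMES` and `TITLES` are single-entry stand-ins, and all function names are hypothetical.

```python
FAMILY_NAMES = {"李"}                             # assumption: stand-in list
LEXICON = {"李", "宏", "毅", "先", "生", "先生"}    # lexicon shown in the slides
TITLES = {"先生"}                                 # assumption: stand-in list
MAX_LEN = 3
LABELS = ("O", "X")


def segment_score(seg, label, prev_label):
    """Score added by one new segment; depends only on seg, its label, and the previous label."""
    s = 0
    if label == "O" and len(seg) == 3 and seg[0] in FAMILY_NAMES:
        s += 3   # rule 1
    if label == "X" and seg in LEXICON:
        s += 1   # rule 2
    if prev_label == "O" and seg in TITLES:
        s += 3   # rule 3: the previous "O" segment has a title after it
    return s


def viterbi(x):
    """Return (best score, best labeled segmentation, table) for the string x."""
    n = len(x)
    # best[i][label] = (score, (j, prev_label)): best path over x[:i] whose last
    # segment is x[j:i] with this label. best[0] holds the empty start state.
    best = [dict() for _ in range(n + 1)]
    best[0][None] = (0, None)
    for i in range(1, n + 1):
        for label in LABELS:
            candidates = []
            for k in range(1, min(MAX_LEN, i) + 1):
                seg = x[i - k:i]
                for prev_label, (prev_score, _) in best[i - k].items():
                    total = prev_score + segment_score(seg, label, prev_label)
                    candidates.append((total, (i - k, prev_label)))
            best[i][label] = max(candidates, key=lambda c: c[0])
    # Pick the better final label, then follow the back-pointers.
    label = max(LABELS, key=lambda l: best[n][l][0])
    total, i, path = best[n][label][0], n, []
    while i > 0:
        _, (j, prev_label) = best[i][label]
        path.append((x[j:i], label))
        i, label = j, prev_label
    path.reverse()
    return total, path, best


total, path, best = viterbi("李宏毅先生")
print(total, path)
```

Under these assumptions the table entries match the Max values worked out step by step (1 and 2 for 李宏, 3 and 3 for 李宏毅, 3 and 4 for 李宏毅先, 6 and 7 for the full string), and the work grows linearly with the sentence length instead of exponentially.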
Training
• Structured Perceptron: you know how to do it now
• Structured SVM: solving a QP, which requires finding the most violated constraint (Can you develop the algorithm by yourself?)
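For the structured perceptron, a minimal sketch of the update w ← w + φ(x, gold) − φ(x, predicted) is shown below. It assumes the linear score w · φ(x, y) with φ counting three events (the three rules from the inference example), uses brute-force search in place of the Viterbi step to stay short, and trains on a single made-up example. `FAMILY_NAMES`, `TITLES`, and all function names are hypothetical.

```python
FAMILY_NAMES = {"李"}                             # assumption: stand-in list
LEXICON = {"李", "宏", "毅", "先", "生", "先生"}    # lexicon shown in the slides
TITLES = {"先生"}                                 # assumption: stand-in list
MAX_LEN = 3


def phi(y):
    """Feature vector: how often each of the three events occurs in y."""
    counts, prev_label = [0, 0, 0], None
    for seg, label in y:
        if label == "O" and len(seg) == 3 and seg[0] in FAMILY_NAMES:
            counts[0] += 1
        if label == "X" and seg in LEXICON:
            counts[1] += 1
        if prev_label == "O" and seg in TITLES:
            counts[2] += 1
        prev_label = label
    return counts


def candidates(x):
    """Yield every labeled segmentation of x (brute force, fine for short x)."""
    if not x:
        yield []
        return
    for k in range(1, min(MAX_LEN, len(x)) + 1):
        for label in "OX":
            for rest in candidates(x[k:]):
                yield [(x[:k], label)] + rest


def predict(x, w):
    """Inference: the candidate y with the largest w . phi(y)."""
    return max(candidates(x), key=lambda y: sum(wi * ci for wi, ci in zip(w, phi(y))))


def train(data, max_epochs=1000):
    """Structured perceptron: update w whenever the prediction differs from the gold output."""
    w = [0, 0, 0]
    for _ in range(max_epochs):
        mistakes = 0
        for x, gold in data:
            pred = predict(x, w)
            if pred != gold:
                mistakes += 1
                w = [wi + g - p for wi, g, p in zip(w, phi(gold), phi(pred))]
        if mistakes == 0:
            break
    return w


data = [("李宏毅先生", [("李宏毅", "O"), ("先生", "X")])]   # made-up training example
w = train(data)
print(w, predict("李宏毅先生", w))
```

In practice `predict` would be replaced by the Viterbi-style search, since the perceptron calls inference once per training example per epoch.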
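Conclusion
• Last time: one label per element. x: x1 x2 x3 x4; y: y1 = X, y2 = O, y3 = X, y4 = O
• This time: one labeled segment per output. x: x1 x2 x3 x4; y: y1 = (1, 3, O), y2 = (4, 4, X)
• The same contrast on a second example: element-wise labels TSI TSI I I N N N; segment-wise labels y1 = (1, 4, TSI), y2 = (5, 6, I), ……
• Also called segmental CRF, semi-Markov model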
Appendix
This is the original paper
Inference – Brute Force
Inference – Brute Force Find the path with the largest
- Slides: 29