ACL 2017 Visualizing and Understanding Neural Machine Translation

ACL 2017 Visualizing and Understanding Neural Machine Translation Yanzhuo Ding, Yang Liu, Huanbo Luan, Maosong Sun 1

Machine Translation • MT: using computers to translate natural languages • 布什 与 沙龙 举行 了 会谈 (Bush held a talk with Sharon) 2

Neural Machine Translation Black Box 3

Previous Work • Attention: relevance between input and output (Bahdanau et al., 2015) 4
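
Attention weights are the relevance measure this slide refers to. A minimal sketch of additive attention in the style of Bahdanau et al. (2015), assuming a toy 2-dimensional model: each source annotation h_j is scored against the previous decoder state s as v · tanh(W s + U h_j), and a softmax turns the scores into weights. The function names and all parameter values are hypothetical.

```python
import math

def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(x - m) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(M, x):
    # Multiply matrix M (a list of rows) by vector x.
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def additive_attention(s, hs, W, U, v):
    """Attention weights over source annotations hs: softmax_j(v . tanh(W s + U h_j))."""
    Ws = matvec(W, s)
    scores = []
    for h in hs:
        hidden = [math.tanh(a + b) for a, b in zip(Ws, matvec(U, h))]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    return softmax(scores)

# Toy decoder state, three source annotations, and made-up parameters.
s = [0.5, -0.2]
hs = [[1.0, 0.0], [0.0, 1.0], [0.3, 0.3]]
W = [[0.4, 0.1], [-0.3, 0.2]]
U = [[0.7, -0.1], [0.2, 0.5]]
v = [1.0, -1.0]
alpha = additive_attention(s, hs, W, U, v)
```

The weights sum to 1 over source positions; with these numbers the first source position receives the largest weight.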

Previous Work • First-Derivative Saliency: using gradients to measure relevance (Li et al., 2016) 5
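
A minimal sketch of first-derivative saliency, assuming a toy scalar score S(e) = v · tanh(W e) over the concatenated embeddings e of two words in place of a real NMT output. The saliency of each embedding dimension is |∂S/∂e_k|; summing these per word is an extra assumption for display. Names and numbers are hypothetical.

```python
import math

def score(W, v, e):
    # Toy scalar output S(e) = v . tanh(W e), standing in for a model score.
    z = [sum(w * x for w, x in zip(row, e)) for row in W]
    return sum(vi * math.tanh(zi) for vi, zi in zip(v, z))

def gradient(W, v, e):
    # Analytic dS/de_k = sum_i v_i * (1 - tanh(z_i)^2) * W[i][k].
    z = [sum(w * x for w, x in zip(row, e)) for row in W]
    coeff = [vi * (1.0 - math.tanh(zi) ** 2) for vi, zi in zip(v, z)]
    return [sum(c * W[i][k] for i, c in enumerate(coeff)) for k in range(len(e))]

def word_saliency(grad, dim):
    # Sum |dS/de_k| over each word's embedding dimensions (an assumption).
    return [sum(abs(g) for g in grad[i:i + dim]) for i in range(0, len(grad), dim)]

# Two "words" with 2-dimensional embeddings, concatenated into e.
e = [0.5, -1.0, 0.2, 0.3]
W = [[0.3, -0.2, 0.8, 0.1], [0.5, 0.4, -0.6, 0.2]]
v = [1.0, 0.5]
grad = gradient(W, v, e)
saliency = word_saliency(grad, dim=2)
```

A central finite difference on `score` should agree with `gradient`, which is a quick way to check a hand-derived gradient.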

Previous Work • Layer-wise relevance propagation: decomposing an output into a sum of relevance scores (Bach et al., 2015) 6
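
The LRP idea can be sketched for a bias-free, purely linear two-layer network using the basic redistribution rule: input i receives the share x_i W_ij / y_j of each output's relevance. This is a minimal sketch under those assumptions; Bach et al. (2015) also describe stabilized variants that are omitted here, and the function names are hypothetical.

```python
def forward(W, x):
    # y_j = sum_i x_i * W[i][j]; bias-free and linear to keep the sketch short.
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]

def lrp_linear(W, x, relevance_out):
    """Redistribute output relevance to the inputs of one linear layer.

    Input i receives x_i * W[i][j] / y_j of output j's relevance
    (basic rule; assumes no y_j is zero).
    """
    y = forward(W, x)
    relevance_in = [0.0] * len(x)
    for j, r_j in enumerate(relevance_out):
        for i in range(len(x)):
            relevance_in[i] += x[i] * W[i][j] / y[j] * r_j
    return relevance_in

x = [1.0, 2.0, -1.0]
W1 = [[0.5, 0.2], [0.1, 0.4], [0.3, -0.2]]
W2 = [[1.0], [0.5]]
h = forward(W1, x)            # hidden layer
out = forward(W2, h)          # scalar output as a 1-element list
R_h = lrp_linear(W2, h, out)  # start from the output value itself
R_x = lrp_linear(W1, x, R_h)
```

R_x is approximately [0.6, 0.6, -0.2] and sums to the output value 1.0: the decomposition is conservative, which is what lets the scores be read as a split of the output.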

Our Work • Visualizing and interpreting NMT using the LRP method • Helping to analyze translation errors 7

An Example 8

An Example 9

Neuron-level relevance • The relevance between two neurons 10
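
A minimal sketch of one simple reading of neuron-level relevance, assuming a linear neuron v = Σ_i w_i u_i with no bias or nonlinearity: the relevance of input neuron u_i to v is taken to be its contribution w_i u_i, so the scores sum exactly to v. The function name and numbers are hypothetical.

```python
def neuron_relevance(inputs, weights):
    """Relevance of each input neuron u_i to a linear neuron v = sum_i w_i * u_i.

    Relevance is taken to be u_i's additive contribution w_i * u_i,
    so the scores sum exactly to v (biases and nonlinearities ignored).
    """
    return [w * u for w, u in zip(weights, inputs)]

inputs = [0.8, -0.5, 1.5]    # made-up values of input neurons u_1..u_3
weights = [0.5, 1.0, -0.2]   # made-up weights on the connections into v
scores = neuron_relevance(inputs, weights)
v = sum(w * u for w, u in zip(weights, inputs))
```

The scores are roughly [0.4, -0.5, -0.3] and sum to v = -0.4; a negative score means that input pulls v down.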

Vector-level relevance • The relevance between two vectors. 11

Relevance vectors • The relevance vector of a hidden state: the sequence of its vector-level relevance scores with respect to each of its contextual words 12
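
A minimal sketch of vector-level relevance and relevance vectors, assuming neuron-level scores are already available: the vector-level relevance between a hidden state and a contextual word is taken to be the sum of its neurons' relevance to that word, and the relevance vector lists these sums in word order. The neuron-level numbers are made up, the words are illustrative, and normalizing by total absolute relevance for display is an extra assumption.

```python
def vector_level_relevance(neuron_scores):
    # Sum the relevance of every neuron in the hidden state to one contextual word.
    return sum(neuron_scores)

def relevance_vector(neuron_scores_by_word):
    # One vector-level score per contextual word, in sentence order.
    return [vector_level_relevance(s) for s in neuron_scores_by_word]

def normalize(scores):
    # Divide by the total absolute relevance so the scores can drive a heatmap.
    total = sum(abs(s) for s in scores)
    return [s / total for s in scores] if total else [0.0] * len(scores)

# Made-up neuron-level relevance of a 3-neuron hidden state to each source word.
neuron_scores = {
    "布什": [0.30, 0.10, 0.05],
    "与": [0.02, 0.01, 0.00],
    "沙龙": [0.20, 0.25, 0.07],
}
words = list(neuron_scores)  # dicts keep insertion order (Python 3.7+)
R = relevance_vector([neuron_scores[w] for w in words])
heat = normalize(R)
```

R is roughly [0.45, 0.03, 0.52], so in this made-up case the hidden state draws mostly on 布什 and 沙龙.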

Weight ratio • Matrix multiplication • Element-wise multiplication • Maximization 13
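
A minimal sketch of weight ratios for the three operation types named on the slide, assuming a weight ratio is the share of an output neuron v attributable to an input u. For matrix multiplication this is u_i's term W_ij u_i divided by v_j; for maximization the winning input takes all the weight. The element-wise multiplication rule used here (each factor's share of u1 + u2) is an assumption that should be checked against the paper. Zero denominators are not handled, and the function names are hypothetical.

```python
def ratio_matmul(W, u, j):
    """Weight ratios of inputs u_i to output v_j = sum_i W[i][j] * u_i."""
    z = [W[i][j] * u[i] for i in range(len(u))]
    v_j = sum(z)
    return [z_i / v_j for z_i in z]

def ratio_elementwise(u1, u2):
    """Assumed ratios for one element of v = u1 * u2: each factor's share of u1 + u2."""
    s = u1 + u2
    return u1 / s, u2 / s

def ratio_max(values):
    """For v = max(values), the (first) maximizing input gets all the weight."""
    k = max(range(len(values)), key=lambda i: values[i])
    return [1.0 if i == k else 0.0 for i in range(len(values))]

W = [[0.5, 1.0], [2.0, -1.0]]
u = [1.0, 0.5]
col0 = ratio_matmul(W, u, 0)
col1 = ratio_matmul(W, u, 1)
```

For column 0 the ratios are about [0.33, 0.67]; for column 1 (terms 1.0 and -0.5) they are [2.0, -1.0], so ratios can exceed 1 or turn negative when inputs have mixed signs, although they still sum to 1.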

LRP Algorithm in NMT Algorithm: Layer-wise relevance propagation for NMT 14
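
Only the algorithm's title survives in this transcript. The sketch below is a hypothetical reconstruction of the general idea rather than the paper's algorithm, under strong simplifying assumptions: the network is a stack of bias-free linear layers (a real NMT encoder/decoder also has gates, nonlinearities and attention, each needing its own weight-ratio rule), neurons are visited layer by layer, which is a topological order, and each neuron stores the fraction of its value attributable to each contextual word. Input neurons start one-hot on the word they belong to, each later neuron mixes its inputs' fractions using weight ratios, and multiplying by the neuron's value gives relevance scores that sum to that value.

```python
def propagate(word_of_input, input_values, layers):
    """Forward relevance propagation through bias-free linear layers (hypothetical sketch).

    word_of_input[i]: index of the contextual word that input neuron i belongs to.
    layers: weight matrices; layer W maps u to v_j = sum_i W[i][j] * u[i].
    Returns the last layer's values and, per neuron, its relevance to each word.
    """
    n_words = max(word_of_input) + 1
    values = list(input_values)
    # frac[i][x]: fraction of neuron i's value attributed to word x (one-hot at the input).
    frac = [[1.0 if word_of_input[i] == x else 0.0 for x in range(n_words)]
            for i in range(len(values))]
    for W in layers:
        new_values, new_frac = [], []
        for j in range(len(W[0])):
            z = [W[i][j] * values[i] for i in range(len(values))]
            v_j = sum(z)
            ratios = [z_i / v_j for z_i in z]  # weight ratios from each u_i to v_j
            new_frac.append([sum(ratios[i] * frac[i][x] for i in range(len(values)))
                             for x in range(n_words)])
            new_values.append(v_j)
        values, frac = new_values, new_frac
    relevance = [[values[j] * f for f in frac[j]] for j in range(len(values))]
    return values, relevance

# Two contextual words, each with a 2-dimensional embedding (made-up numbers).
word_of_input = [0, 0, 1, 1]
inputs = [0.5, 1.0, -0.3, 0.8]
W1 = [[0.2, -0.5, 0.1], [0.4, 0.3, -0.2], [0.6, 0.1, 0.5], [-0.1, 0.2, 0.3]]
W2 = [[1.0, -0.5], [0.5, 1.0], [2.0, 0.3]]
values, relevance = propagate(word_of_input, inputs, [W1, W2])
```

For every output neuron, the relevance scores across the two words sum to the neuron's value (about 0.21 and 0.042 here), mirroring the conservation property of LRP.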

Visualization of NMT model: Source Side [figure: relevance among source positions 0 to 6 for 近 (jin) 两 (liang) 年 (nian) 来 (lai) , 美国 (meiguo), roughly "over the past two years, the United States ..."] 15

Visualization of NMT model: Target Side [figure: relevance for the target words "my visit is to pray ..." over the source words 我 (wo) 参拜 (canbai) 是 (shi) 为了 (weile) 祈求 (qiqiu)] 16

Translation error analysis: Word Omission [figure: source 参 (can) 众 (zhong) 两 (liang) 院 (yuan) 信任 (xinren) 投票 (toupiao) </s>, "vote of confidence in both houses (Senate and House)"; translation "... vote of confidence in the senate </s>"] 17

Translation error analysis Word Repetition 18

Translation error analysis Unrelated Words 19

Translation error analysis Negation Reversion 20

Conclusion • We propose to use layer-wise relevance propagation to visualize and interpret NMT • Our approach can calculate the relevance between arbitrary hidden states and contextual words • It helps us to analyze translation errors and debug the model 21

Thanks 22
