ACL 2017 Visualizing and Understanding Neural Machine Translation Yanzhuo Ding, Yang Liu, Huanbo Luan, Maosong Sun

Machine Translation • MT: using computers to translate natural languages • Example: 布什与沙龙举行了会谈 → Bush held a talk with Sharon

Neural Machine Translation Black Box

Previous Work • Attention: relevance between input and output (Bahdanau et al., 2015)

Previous Work • First-Derivative Saliency: using the gradient to measure relevance (Li et al., 2016)

Previous Work • Layer-wise relevance propagation: decomposing outputs into a sum of relevance scores (Bach et al., 2015)

Our Work • Visualizing and interpreting NMT using the LRP method • Helping to analyze translation errors

An Example

Neuron-level relevance • The relevance between two neurons.

Vector-level relevance • The relevance between two vectors.

Relevance vectors • For a hidden state, the sequence of vector-level relevance scores of its contextual words
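The three notions above can be related in a short sketch. It assumes, as an illustration rather than something the slide text states, that a vector-level score is obtained by summing the neuron-level scores between the two vectors. The function names and all numbers are hypothetical.

```python
def vector_relevance(neuron_rel):
    """Sum a table of neuron-level relevance scores into one vector-level score.

    neuron_rel[i][j] is the (hypothetical) relevance between neuron i of the
    hidden state and neuron j of one contextual word vector.
    """
    return sum(sum(row) for row in neuron_rel)


def relevance_vector(neuron_rel_by_word):
    """One vector-level score per contextual word, in sentence order."""
    return [vector_relevance(table) for table in neuron_rel_by_word]


# Made-up neuron-level tables for three contextual words.
tables = [
    [[0.5, 0.25], [0.0, 0.25]],
    [[0.125, 0.125], [0.25, 0.0]],
    [[0.0, 0.0], [0.25, 0.0]],
]
rel_vec = relevance_vector(tables)
```

Each entry of `rel_vec` is what a heatmap cell would show for one (hidden state, contextual word) pair.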

Weight ratio • Matrix multiplication • Element-wise multiplication • Maximization
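The slide names the three operations but not the formulas. The sketch below uses one common LRP-style choice for each and should be read as an assumption: for a matrix product, each input's share is its contribution divided by the output value; for an element-wise product, the share is split in proportion to the operands; for a maximum, the winning operand takes everything. The function names are hypothetical, and the first two assume non-zero denominators.

```python
def matmul_ratios(W, u):
    """For v = W u, ratios[j][i] = W[j][i] * u[i] / v[j] (assumes v[j] != 0)."""
    ratios = []
    for row in W:
        v_j = sum(w * x for w, x in zip(row, u))
        ratios.append([w * x / v_j for w, x in zip(row, u)])
    return ratios


def elementwise_ratios(u1, u2):
    """For v = u1 * u2 element-wise, split by u1/(u1+u2) and u2/(u1+u2)."""
    return [(a / (a + b), b / (a + b)) for a, b in zip(u1, u2)]


def max_ratios(u1, u2):
    """For v = max(u1, u2), the larger operand gets ratio 1 and the other 0."""
    return [(1.0, 0.0) if a >= b else (0.0, 1.0) for a, b in zip(u1, u2)]


W = [[1.0, 2.0], [3.0, -1.0]]
u = [2.0, 1.0]
mm = matmul_ratios(W, u)                       # one row of ratios per output neuron
ew = elementwise_ratios([1.0, 3.0], [3.0, 1.0])
mx = max_ratios([1.0, 5.0], [2.0, 4.0])
```

In every case the ratios feeding one output neuron sum to 1, which is what lets relevance be redistributed without being created or lost.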

LRP Algorithm in NMT Algorithm: Layer-wise relevance propagation for NMT
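The algorithm itself is not reproduced in the slide text. As a rough illustration of the general idea only, the sketch below runs layer-wise relevance propagation on a hypothetical two-layer linear network with a scalar output: the output's relevance is its own value, and each layer passes relevance backward in proportion to each input's contribution. It is not the authors' NMT algorithm, and it assumes all intermediate values are non-zero.

```python
def forward(W1, w2, x):
    """Tiny linear network: h = W1 x, then scalar y = w2 . h."""
    h = [sum(w * xi for w, xi in zip(row, x)) for row in W1]
    y = sum(w * hj for w, hj in zip(w2, h))
    return h, y


def lrp(W1, w2, x):
    """Return the relevance of each input to the output y, plus y itself."""
    h, y = forward(W1, w2, x)
    # Output -> hidden: hidden unit j receives its share (w2[j] * h[j] / y) of y.
    r_h = [(w * hj / y) * y for w, hj in zip(w2, h)]
    # Hidden -> input: input i receives share (W1[j][i] * x[i] / h[j]) of r_h[j].
    r_x = [0.0] * len(x)
    for j, row in enumerate(W1):
        for i, w in enumerate(row):
            r_x[i] += (w * x[i] / h[j]) * r_h[j]
    return r_x, y


W1 = [[1.0, 2.0], [3.0, -1.0]]
w2 = [0.5, 1.0]
x = [2.0, 1.0]
r_x, y = lrp(W1, w2, x)
```

The input relevance scores sum to the output value, which is the decomposition property the earlier "sum of relevance scores" bullet refers to.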

Visualization of NMT model Source Side [Figure: relevance heatmap over the source words 近 (jin), 两 (liang), 年 (nian), 来 (lai), ， (,), 美国 (meiguo)]

Visualization of NMT model Target Side [Figure: relevance heatmap; source words 我 (wo), 参拜 (canbai), 是 (shi), 为了 (weile), 祈求 (qiqiu); target words my, visit, is, to, pray]

Translation error analysis Word Omission [Figure: relevance heatmap; source words 参 (can), 众 (zhong), 两 (liang), 院 (yuan), 信任投票 (xinren toupiao), </s>; target words vote, of, confidence, in, the, senate, </s>]

Translation error analysis Word Repetition

Translation error analysis Unrelated Words

Translation error analysis Negation Reversion

Conclusion • We propose using layer-wise relevance propagation to visualize and interpret NMT • Our approach can calculate the relevance between arbitrary hidden states and contextual words • It helps us analyze translation errors and debug the model

Thanks