Attention is not Explanation (NAACL '19)
Sarthak Jain, Byron C. Wallace, Northeastern University

Background
• Attention Mechanism
Background: Attention
• Given a sequence of hidden states $h = (h_1, \dots, h_T)$, $h_t \in \mathbb{R}^m$, and a query $Q \in \mathbb{R}^m$
• Calculate attention scores $e_t$ with either:
  • Additive function: $e_t = v^\top \tanh(W_1 h_t + W_2 Q)$
  • Scaled dot-product function: $e_t = h_t^\top Q / \sqrt{m}$
• Attention distribution: $\alpha = \mathrm{softmax}(e)$
• Get attention vector: $h_\alpha = \sum_t \alpha_t h_t$ (a code sketch follows below)
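To make the two scoring functions concrete, here is a minimal PyTorch sketch; the shapes and parameter names (h as a (T, m) matrix of hidden states, W1, W2, v as learned parameters) are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn.functional as F

def additive_score(h, q, W1, W2, v):
    """Additive (Bahdanau-style) scoring: e_t = v^T tanh(W1 h_t + W2 q)."""
    # h: (T, m) hidden states, q: (m,) query, W1/W2: (d, m), v: (d,)
    return torch.tanh(h @ W1.T + q @ W2.T) @ v      # (T,) scores

def scaled_dot_score(h, q):
    """Scaled dot-product scoring: e_t = <h_t, q> / sqrt(m)."""
    m = h.size(-1)
    return (h @ q) / m ** 0.5                        # (T,) scores

def attend(h, e):
    """Softmax over scores, then weighted sum of hidden states."""
    alpha = F.softmax(e, dim=-1)                     # attention distribution
    h_alpha = alpha @ h                              # attention vector, (m,)
    return alpha, h_alpha
```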
Question
• Does the attention mechanism really capture semantically meaningful attention?
Does attention provide transparency?
• Do attention weights correlate with measures of feature importance?
• Would alternative attention weights necessarily yield different predictions?
Experiment Model
• [Model diagram: one-hot tokens → embedding → encoder (Bi-RNN) → attention over hidden states h, with query Q → dense layer → output y]
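As a concrete reading of the diagram, a minimal PyTorch sketch of such a classifier follows; the dimensions, the BiLSTM choice, and the simplified (query-free) scoring head are assumptions for illustration, not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionClassifier(nn.Module):
    """One-hot -> embedding -> BiLSTM encoder -> attention -> dense -> y."""
    def __init__(self, vocab_size, emb_dim=128, hid_dim=128, n_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hid_dim,
                               bidirectional=True, batch_first=True)
        self.w = nn.Linear(2 * hid_dim, 1, bias=False)   # scoring head
        self.dense = nn.Linear(2 * hid_dim, n_classes)

    def forward(self, tokens):                            # tokens: (B, T) ids
        h, _ = self.encoder(self.embedding(tokens))       # (B, T, 2*hid)
        alpha = F.softmax(self.w(h).squeeze(-1), dim=-1)  # (B, T) attention
        h_alpha = torch.einsum("bt,bth->bh", alpha, h)    # attended vector
        return self.dense(h_alpha), alpha
```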
Datasets
• Binary classification: SST, IMDB, ADR Tweets, 20 Newsgroups, AG News, Diabetes (MIMIC), Anemia (MIMIC)
• Question answering: CNN, bAbI
• Natural language inference: SNLI
Correlation with Feature Importance
• Gradient-based measure: importance of token t as the magnitude of the gradient of the output with respect to its input
• Leave one feature out: importance of token t as the change in output when it is removed
• Compare each measure against the attention weights via Kendall τ correlation (a sketch follows below)
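A hedged sketch of both importance measures for the AttentionClassifier above; the helper names and the use of scipy.stats.kendalltau for the rank correlation are illustrative assumptions.

```python
import torch
from scipy.stats import kendalltau

def gradient_importance(model, tokens):
    """Per-token |d y_hat / d embedding|, summed over embedding dims."""
    emb = model.embedding(tokens).detach().requires_grad_(True)
    h, _ = model.encoder(emb)
    alpha = torch.softmax(model.w(h).squeeze(-1), dim=-1)
    y = model.dense(torch.einsum("bt,bth->bh", alpha, h))
    y.max(dim=-1).values.sum().backward()        # grad of predicted-class logit
    return emb.grad.abs().sum(-1)                # (B, T)

@torch.no_grad()
def loo_importance(model, tokens):
    """Leave-one-out: output shift (TVD) when each token is removed."""
    base = model(tokens)[0].softmax(-1)
    scores = []
    for t in range(tokens.size(1)):
        keep = [i for i in range(tokens.size(1)) if i != t]
        y = model(tokens[:, keep])[0].softmax(-1)
        scores.append(0.5 * (base - y).abs().sum(-1))
    return torch.stack(scores, dim=1)            # (B, T)

# Rank correlation with attention for one example:
# tau, _ = kendalltau(alpha[0].detach().numpy(), grads[0].numpy())
```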
Result for Correlation: Gradients
• [Figure: histograms of Kendall τ between gradient importance and attention; orange = positive class, purple = negative class; O, P, G (orange, purple, green) = neutral, contradiction, entailment for SNLI]

Result for Correlation: Leave One Out
• [Figure: histograms of Kendall τ between leave-one-out importance and attention; same legend as above]
Statistical Significance
• The observed correlations, while often statistically significant, are weak in magnitude
Random Attention Weights
• Randomly permute the learned attention weights and measure the resulting change (total variation distance) in model output (see the sketch below)
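A minimal sketch of the permutation experiment, assuming the AttentionClassifier above: shuffle the learned weights over token positions and record the median total variation distance (TVD) of the output across permutations.

```python
import torch

@torch.no_grad()
def permutation_output_shift(model, tokens, n_perm=100):
    """Median TVD between original output and outputs under permuted attention."""
    h, _ = model.encoder(model.embedding(tokens))           # (B, T, H)
    alpha = torch.softmax(model.w(h).squeeze(-1), dim=-1)   # learned weights
    base = model.dense(torch.einsum("bt,bth->bh", alpha, h)).softmax(-1)
    shifts = []
    for _ in range(n_perm):
        perm = torch.randperm(alpha.size(1))
        a = alpha[:, perm]                                  # permuted attention
        y = model.dense(torch.einsum("bt,bth->bh", a, h)).softmax(-1)
        shifts.append(0.5 * (base - y).abs().sum(-1))       # TVD per example
    return torch.stack(shifts).median(0).values             # median over perms
```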
Result for Random Permutation
• [Figure: median change in output under randomly permuted attention; orange = positive, purple = negative; O, P, G (orange, purple, green) = neutral, contradiction, entailment for SNLI]
Adversarial Attention
• Goal: find attention distributions that differ maximally (by Jensen-Shannon divergence) from the learned weights while keeping the model output within a small total variation distance ε of the original
• Optimize a relaxed (penalized) version of this constrained objective with Adam SGD (a sketch follows below)
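A hedged sketch of one way to optimize such a relaxed objective with Adam: parametrize the adversarial distribution by logits, maximize its JSD from the learned weights, and penalize output TVD beyond ε. The penalty weight, learning rate, and parametrization here are assumptions, not the paper's exact setup.

```python
import torch

def jsd(p, q, eps=1e-12):
    """Jensen-Shannon divergence between distributions on the last dim."""
    m = 0.5 * (p + q)
    kl = lambda a, b: (a * ((a + eps) / (b + eps)).log()).sum(-1)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def adversarial_attention(model, tokens, epsilon=0.01, lam=500.0, steps=500):
    for p in model.parameters():          # freeze the model; optimize only alpha
        p.requires_grad_(False)
    h, _ = model.encoder(model.embedding(tokens))
    alpha_hat = torch.softmax(model.w(h).squeeze(-1), dim=-1)
    base = model.dense(torch.einsum("bt,bth->bh", alpha_hat, h)).softmax(-1)
    logits = alpha_hat.log().clone().requires_grad_(True)   # relaxed parametrization
    opt = torch.optim.Adam([logits], lr=0.01)
    for _ in range(steps):
        alpha = torch.softmax(logits, dim=-1)
        y = model.dense(torch.einsum("bt,bth->bh", alpha, h)).softmax(-1)
        tvd = 0.5 * (y - base).abs().sum(-1)
        # maximize JSD from the learned weights; penalize TVD beyond epsilon
        loss = (-jsd(alpha, alpha_hat)
                + lam * torch.clamp(tvd - epsilon, min=0)).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.softmax(logits, dim=-1).detach()
```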
Result for Adversarial Attention
Conclusion
• Correlation between feature importance measures and learned attention weights is weak
• Counterfactual attention distributions often have no effect on model output
• Limitations:
  • Only a handful of attention variants considered
  • Only tasks with unstructured output spaces evaluated (no seq2seq)