Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation
EMNLP ’14 paper by Kyunghyun Cho et al.
Recurrent Neural Networks (1/3)
Recurrent Neural Networks (2/3)
- A variable-length sequence x = (x_1, …, x_T)
- A hidden state h, updated at each timestep as h_t = f(h_{t-1}, x_t)
- (Optional) an output y (e.g. the next symbol in the sequence)
- A non-linear activation function f, e.g. the logistic sigmoid or a long short-term memory (LSTM) unit
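A minimal sketch of the recurrent update h_t = f(h_{t-1}, x_t), not from the slides: it uses tanh for f instead of the logistic sigmoid or LSTM unit mentioned above, and the dimensions and parameter names (W_xh, W_hh, b_h) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not from the paper)
input_dim, hidden_dim = 8, 16

# Parameters of a plain tanh RNN cell
W_xh = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)

def rnn_step(h_prev, x_t):
    """One recurrent update h_t = f(h_{t-1}, x_t), here with f = tanh."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

# Read a variable-length sequence x = (x_1, ..., x_T)
x = rng.normal(size=(5, input_dim))
h = np.zeros(hidden_dim)
for x_t in x:
    h = rnn_step(h, x_t)
print(h.shape)  # (16,)
```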
Recurrent Neural Networks (3/3)
- The output at each timestep t is the conditional probability p(x_t | x_{t-1}, …, x_1)
- e.g. output from a softmax layer:
  p(x_{t,j} = 1 | x_{t-1}, …, x_1) = exp(w_j h_t) / Σ_{j'=1}^{K} exp(w_{j'} h_t)
- Hence, the probability of the whole sequence x can be computed as
  p(x) = Π_{t=1}^{T} p(x_t | x_{t-1}, …, x_1)
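A toy illustration (my own, not code from the paper) of how the per-step softmax distributions combine into the sequence probability; the simple tanh recurrence and the parameter names W_emb, W_hh, W_out are assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
vocab_size, hidden_dim = 10, 16   # illustrative sizes

W_emb = rng.normal(scale=0.1, size=(vocab_size, hidden_dim))  # symbol embeddings
W_hh  = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
W_out = rng.normal(scale=0.1, size=(vocab_size, hidden_dim))  # softmax weights w_j

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def sequence_log_prob(tokens):
    """log p(x) = sum_t log p(x_t | x_{t-1}, ..., x_1) under a toy RNN."""
    h = np.zeros(hidden_dim)
    x_prev = np.zeros(hidden_dim)        # stands in for a start symbol
    log_p = 0.0
    for tok in tokens:
        h = np.tanh(x_prev + W_hh @ h)   # hidden update from the previous symbol
        p = softmax(W_out @ h)           # p(x_t = j | x_{t-1}, ..., x_1) for every j
        log_p += np.log(p[tok])
        x_prev = W_emb[tok]              # the observed symbol feeds the next step
    return log_p

print(sequence_log_prob([3, 1, 4, 1, 5]))
```

In practice one accumulates log-probabilities rather than multiplying raw probabilities, to avoid numerical underflow on long sequences.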
RNN Encoder-Decoder (1/3)
RNN Encoder-Decoder (2/3)
- Encoder
  - Input: a variable-length sequence x
  - Output: a fixed-length vector representation c
- Decoder
  - Input: the fixed-length vector representation c
  - Output: a variable-length sequence y
- Note that the decoder's hidden state h_t depends on h_{t-1}, y_{t-1}, and c.
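A compact sketch of this split, assuming simple tanh cells rather than the paper's gated unit; the parameter names (W_ex, W_eh, W_dh, W_dy, W_dc) and the way the decoder is initialized from c are illustrative choices, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(2)
dim = 16  # shared input/hidden size, purely for illustration

# Encoder and decoder parameters (names are my own)
W_ex, W_eh = rng.normal(scale=0.1, size=(2, dim, dim))
W_dh, W_dy, W_dc = rng.normal(scale=0.1, size=(3, dim, dim))

def encode(xs):
    """Read a variable-length sequence x and return a fixed-length summary c."""
    h = np.zeros(dim)
    for x_t in xs:
        h = np.tanh(W_ex @ x_t + W_eh @ h)
    return h  # c: the encoder's final hidden state

def decoder_step(h_prev, y_prev, c):
    """Decoder hidden state h_t depends on h_{t-1}, y_{t-1}, and c."""
    return np.tanh(W_dh @ h_prev + W_dy @ y_prev + W_dc @ c)

xs = rng.normal(size=(7, dim))      # a length-7 input sequence
c = encode(xs)
h = np.tanh(W_dh @ c)               # one possible way to initialize the decoder from c
h = decoder_step(h, np.zeros(dim), c)
print(c.shape, h.shape)             # (16,) (16,)
```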
RNN Encoder-Decoder (3/3)
- Encoder and decoder are trained jointly to maximize the conditional log-likelihood
  max_θ (1/N) Σ_{n=1}^{N} log p_θ(y_n | x_n)
- Usage:
  - Generate an output sequence given an input sequence
  - Score a given pair of input and output sequences
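A self-contained sketch of the two usages with toy parameters: score_pair returns log p(y | x) for scoring a pair, and generate performs greedy decoding (the paper does not prescribe greedy search; it is just the simplest choice for illustration). All parameter names here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
dim, vocab = 16, 10

# Toy decoder parameters (illustrative only)
W_h, W_y, W_c = rng.normal(scale=0.1, size=(3, dim, dim))
W_out = rng.normal(scale=0.1, size=(vocab, dim))
E = rng.normal(scale=0.1, size=(vocab, dim))   # output-symbol embeddings

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def score_pair(c, y_tokens):
    """Score a pair: log p(y | x) = sum_t log p(y_t | y_{<t}, c)."""
    h, y_prev, logp = np.tanh(W_c @ c), np.zeros(dim), 0.0
    for tok in y_tokens:
        h = np.tanh(W_h @ h + W_y @ y_prev + W_c @ c)
        logp += np.log(softmax(W_out @ h)[tok])
        y_prev = E[tok]
    return logp

def generate(c, length=5):
    """Generate an output sequence: greedily emit the most probable symbol."""
    h, y_prev, out = np.tanh(W_c @ c), np.zeros(dim), []
    for _ in range(length):
        h = np.tanh(W_h @ h + W_y @ y_prev + W_c @ c)
        tok = int(np.argmax(softmax(W_out @ h)))
        out.append(tok)
        y_prev = E[tok]
    return out

c = rng.normal(size=dim)   # pretend this came from the encoder
print(generate(c), score_pair(c, [1, 2, 3]))
```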
The Hidden Unit (1/2)
- A gated hidden unit (gated recurrent unit, GRU) with 2 gates:
  - Update gate z: decides how much the hidden state is updated with the new candidate state
  - Reset gate r: decides whether the previous hidden state is ignored
The Hidden Unit (2/2)
- Reset gate: r_j = σ([W_r x]_j + [U_r h_{t-1}]_j)
- Update gate: z_j = σ([W_z x]_j + [U_z h_{t-1}]_j)
- New (candidate) state: h̃_j^t = tanh([W x]_j + [U (r ⊙ h_{t-1})]_j)
- Final state: h_j^t = z_j h_j^{t-1} + (1 − z_j) h̃_j^t
  (σ is the logistic sigmoid, ⊙ the element-wise product)
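A direct numpy transcription of these four equations (bias terms are omitted, matching the notation above; the dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
input_dim, hidden_dim = 8, 16   # illustrative sizes

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Parameters for the reset gate, update gate, and candidate state
W_r, W_z, W = rng.normal(scale=0.1, size=(3, hidden_dim, input_dim))
U_r, U_z, U = rng.normal(scale=0.1, size=(3, hidden_dim, hidden_dim))

def gru_step(h_prev, x):
    """One update of the gated hidden unit, following the equations above."""
    r = sigmoid(W_r @ x + U_r @ h_prev)          # reset gate
    z = sigmoid(W_z @ x + U_z @ h_prev)          # update gate
    h_tilde = np.tanh(W @ x + U @ (r * h_prev))  # candidate state
    return z * h_prev + (1.0 - z) * h_tilde      # interpolate old and new

h = np.zeros(hidden_dim)
for x_t in rng.normal(size=(5, input_dim)):
    h = gru_step(h, x_t)
print(h.shape)  # (16,)
```

Note that, following the paper's formulation, z weights the previous state; many later GRU implementations swap the roles of z and 1 − z.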
Statistical Machine Translation
- The RNN encoder-decoder is used to score phrase pairs.
- Its score is an additional feature in the log-linear model of the phrase-based SMT framework.
- It is trained on each phrase pair, ignoring the pairs' frequencies in the original corpora.
- The new score is added to the existing phrase table.
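A schematic of how such an extra feature could be attached to phrase-table entries; the tuple-based table layout and the score_pair callable are hypothetical stand-ins for illustration, not the paper's (Moses-based) implementation.

```python
from typing import Callable, List, Tuple

def add_rnn_feature(
    phrase_table: List[Tuple[str, str, List[float]]],
    score_pair: Callable[[str, str], float],
) -> List[Tuple[str, str, List[float]]]:
    """Append the encoder-decoder score log p(target | source) as one more
    feature of every phrase pair, leaving the existing features untouched."""
    return [
        (src, tgt, feats + [score_pair(src, tgt)])
        for src, tgt, feats in phrase_table
    ]

# Tiny example with made-up phrase pairs and a dummy scorer
table = [("la maison", "the house", [0.5, 0.2]),
         ("la maison bleue", "the blue house", [0.4, 0.1])]
print(add_rnn_feature(table, lambda s, t: -1.0))
```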