Recurrent Neural Networks (deeplearning.ai): Gated Recurrent Unit (GRU)


Motivation • Not all problems can be converted into one with fixed-length inputs and outputs • Problems such as speech recognition or time-series prediction require a system to store and use context information • Simple case: output YES if the number of 1s is even, else NO: 1000010101 – YES, 100011 – NO, … • It is hard or impossible to choose a fixed context window • There can always be a new sample longer than anything seen before

Recurrent Neural Networks (RNNs) • Recurrent neural networks take the previous output or hidden state as an additional input. The composite input at time t therefore carries historical information about what happened at times T < t • RNNs are useful because their intermediate values (the state) can store information about past inputs for a duration that is not fixed a priori
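A minimal sketch of that idea in numpy (the tanh cell, weight names, and sizes below are illustrative assumptions, not taken from the slides): the same cell is applied at every time step, and the hidden state carries context for an input of arbitrary length, such as the bit strings in the motivation example.

```python
import numpy as np

def rnn_step(x_t, h_prev, Wxh, Whh, bh):
    # One recurrence step: the new state mixes the current input with the
    # previous state, so h_t summarizes the whole prefix seen so far.
    return np.tanh(Wxh @ x_t + Whh @ h_prev + bh)

# Illustrative sizes: 10-dim inputs, 16-dim hidden state.
rng = np.random.default_rng(0)
Wxh, Whh, bh = rng.normal(size=(16, 10)), rng.normal(size=(16, 16)), np.zeros(16)

h = np.zeros(16)                      # initial state h_0
for x_t in rng.normal(size=(7, 10)):  # a sequence of arbitrary length (7 here)
    h = rnn_step(x_t, h, Wxh, Whh, bh)
# h now depends on all 7 inputs; the same code handles any sequence length.
```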

Sample Feed-forward Network (diagram): a single input x1 passes through hidden layer h1 to output y1, shown at t = 1.

Sample RNN (diagram): the network unrolled over time; at t = 1, 2, 3 the inputs x1, x2, x3 produce hidden states h1, h2, h3 and outputs y1, y2, y3, with each hidden state fed into the next time step (a second version of the diagram starts from an initial state h0).

Sentiment Classification • Classify a restaurant review from Yelp, a movie review from IMDB, etc. as positive or negative • Inputs: multiple words, one or more sentences • Outputs: positive / negative classification • Examples: "The food was really good", "The chicken crossed the road because it was uncooked"

Sentiment Classification (diagram sequence): the words of the review ("The", "food", …, "good") are fed into an RNN one at a time, producing hidden states h1, h2, …, hn.
• One option: ignore the intermediate states h1 … hn−1 and feed only the final state hn into a linear classifier.
• Another option: combine all hidden states, h = Sum(h1 … hn), and feed the sum into the linear classifier (see http://deeplearning.net/tutorial/lstm.html).
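A hedged numpy sketch of the two readout strategies in the diagrams (the RNN cell, classifier weights, embedding sizes, and random inputs are illustrative placeholders): run the RNN over the word vectors, then classify either the final state hn or the sum of all states.

```python
import numpy as np

rng = np.random.default_rng(1)
Wxh, Whh = rng.normal(size=(16, 50)), rng.normal(size=(16, 16))
Wc, bc = rng.normal(size=(2, 16)), np.zeros(2)   # linear classifier: positive / negative

def run_rnn(word_vectors):
    h, states = np.zeros(16), []
    for x_t in word_vectors:                     # "The", "food", ..., "good"
        h = np.tanh(Wxh @ x_t + Whh @ h)
        states.append(h)
    return states

states = run_rnn(rng.normal(size=(5, 50)))       # 5 words, 50-dim embeddings

# Option 1: ignore h_1 ... h_{n-1}, classify only the last state h_n.
logits_last = Wc @ states[-1] + bc

# Option 2: classify the sum of all hidden states, h = Sum(h_1 ... h_n).
logits_sum = Wc @ np.sum(states, axis=0) + bc
```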

RNN unit (diagram): a single recurrent unit computes the output activation value from the previous activation and the current input.
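The slide only shows the diagram; in the usual formulation the unit computes a⟨t⟩ = g(Wa [a⟨t−1⟩, x⟨t⟩] + ba) and, when a per-step prediction is needed, ŷ⟨t⟩ = softmax(Wy a⟨t⟩ + by). A minimal sketch with assumed activations (tanh for the state) and made-up sizes:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def rnn_unit(a_prev, x_t, Wa, ba, Wy, by):
    # a<t> = tanh(Wa [a<t-1>, x<t>] + ba): the output activation value
    a_t = np.tanh(Wa @ np.concatenate([a_prev, x_t]) + ba)
    # y<t> = softmax(Wy a<t> + by): per-step prediction, if the task needs one
    y_t = softmax(Wy @ a_t + by)
    return a_t, y_t

# Illustrative usage: 5-dim activation, 3-dim input, 4-way output.
a_prev, x_t = np.zeros(5), np.ones(3)
Wa, ba = np.zeros((5, 8)), np.zeros(5)   # Wa acts on the concatenation [a<t-1>, x<t>]
Wy, by = np.zeros((4, 5)), np.zeros(4)
a_t, y_t = rnn_unit(a_prev, x_t, Wa, ba, Wy, by)   # a_t: (5,), y_t: (4,) summing to 1
```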

(c) GRU (figure panel showing the GRU unit)

Figure annotations: the gate and candidate vectors have the same dimension (for example, 100), and the products that combine them are element-wise.

GRU (simplified) • Example sentence with a long-range dependency: "The cat, which already ate …, was full." [Cho et al., 2014. On the properties of neural machine translation: Encoder-decoder approaches] [Chung et al., 2014. Empirical evaluation of gated recurrent neural networks on sequence modeling]
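A minimal numpy sketch of the simplified GRU update described in these references (weight names and sizes are illustrative): a candidate memory c̃⟨t⟩ is computed, and an update gate Γu decides how much of the old memory cell c⟨t−1⟩ to overwrite, which is what lets the cell remember, for example, that "cat" was singular across the long clause.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step_simplified(c_prev, x_t, Wc, bc, Wu, bu):
    concat = np.concatenate([c_prev, x_t])
    c_tilde = np.tanh(Wc @ concat + bc)      # candidate memory  c~<t>
    gamma_u = sigmoid(Wu @ concat + bu)      # update gate       Gamma_u, in (0, 1)
    # Keep the old memory where Gamma_u ~ 0, overwrite it where Gamma_u ~ 1.
    c_t = gamma_u * c_tilde + (1.0 - gamma_u) * c_prev
    return c_t                               # in the simplified GRU, a<t> = c<t>
```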

Relevance gate • The full GRU adds a relevance gate Γr, which controls how much of c⟨t−1⟩ is used when computing the candidate c̃⟨t⟩.
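A sketch of the full GRU step under the same illustrative assumptions as above (weight names and sizes are placeholders):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step_full(c_prev, x_t, Wc, bc, Wu, bu, Wr, br):
    concat = np.concatenate([c_prev, x_t])
    gamma_r = sigmoid(Wr @ concat + br)                    # relevance gate Gamma_r
    gamma_u = sigmoid(Wu @ concat + bu)                    # update gate    Gamma_u
    # The candidate only sees the parts of c<t-1> that Gamma_r marks as relevant.
    c_tilde = np.tanh(Wc @ np.concatenate([gamma_r * c_prev, x_t]) + bc)
    return gamma_u * c_tilde + (1.0 - gamma_u) * c_prev    # c<t> (= a<t>)
```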

Recurrent Neural Networks (deeplearning.ai): LSTM (Long Short-Term Memory) unit

Introduction • An RNN (recurrent neural network) is a form of neural network that feeds its outputs (or hidden states) back in as inputs during operation • An LSTM (long short-term memory) network is a form of RNN; it fixes the vanishing-gradient problem of the original RNN • Application: sequence-to-sequence models using LSTMs for machine translation • Materials are mainly based on links found in https://www.tensorflow.org/tutorials

LSTM (Long short-term memory) • Standard RNN: the input is concatenated with the previous output (state) and fed back in at the next step • LSTM: the repeating structure is more complicated

GRU and LSTM • The LSTM is a more powerful and more general version of the GRU [Hochreiter & Schmidhuber, 1997. Long short-term memory]

LSTM in pictures (diagram): the forget gate, update (input) gate, and output gate, together with tanh layers and element-wise multiplications, control how the cell state is updated and what is output.

Core idea of LSTM • C = cell state; Ct−1 = state at time t−1, Ct = state at time t • Using gates, the LSTM can add or remove information from the state, which avoids the long-term dependency problem (Bengio et al., 1994) • σ = a sigmoid function. A gate controlled by σ: "The sigmoid layer outputs numbers between zero and one, describing how much of each component should be let through. A value of zero means 'let nothing through,' while a value of one means 'let everything through!' An LSTM has three of these gates, to protect and control the cell state." http://colah.github.io/posts/2015-08-Understanding-LSTMs/
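A tiny sketch of that gating idea (the numbers are made up): the sigmoid output is multiplied element-wise with the state, so a value near 0 blocks a component and a value near 1 passes it through unchanged.

```python
import numpy as np

state = np.array([0.7, -1.2, 3.0])
gate = 1.0 / (1.0 + np.exp(-np.array([-10.0, 0.0, 10.0])))  # sigmoid -> ~[0.0, 0.5, 1.0]
print(gate * state)  # ~[0.0, -0.6, 3.0]: "let nothing / half / everything through"
```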

First step: forget gate layer • Decide what to throw away from the cell state (what is kept and what is forgotten) • "It looks at ht−1 and xt, and outputs a number between 0 and 1 for each number in the cell state Ct−1. A 1 represents 'completely keep this' while a 0 represents 'completely get rid of this.'" • For the language model example: "the cell state might include the gender of the present subject, so that the correct pronouns can be used. When we see a new subject, we want to forget the gender of the old subject." http://colah.github.io/posts/2015-08-Understanding-LSTMs/
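In formula form this is ft = σ(Wf · [ht−1, xt] + bf); a minimal sketch (weight names are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forget_gate(h_prev, x_t, Wf, bf):
    # One value in (0, 1) per component of the cell state C_{t-1}:
    # ~1 means "completely keep this", ~0 means "completely get rid of this".
    return sigmoid(Wf @ np.concatenate([h_prev, x_t]) + bf)
```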

Second step (a): input gate layer • Decide what new information to store in the cell state (it will become part of the new state Ct) • "Next, a tanh layer creates a vector of new candidate values, C̃t, that could be added to the state. In the next step, we'll combine these two to create an update to the state." • For the language model example: "we'd want to add the gender of the new subject to the cell state, to replace the old one we're forgetting." http://colah.github.io/posts/2015-08-Understanding-LSTMs/
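The corresponding formulas are it = σ(Wi · [ht−1, xt] + bi) and C̃t = tanh(WC · [ht−1, xt] + bC); a sketch in the same illustrative style:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def input_gate_and_candidate(h_prev, x_t, Wi, bi, Wc, bc):
    concat = np.concatenate([h_prev, x_t])
    i_t = sigmoid(Wi @ concat + bi)      # how much of each candidate value to store
    c_tilde = np.tanh(Wc @ concat + bc)  # candidate values that could be added to the state
    return i_t, c_tilde
```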

Second step (b): update the old cell state Ct−1 into Ct • "We multiply the old state by ft, forgetting the things we decided to forget earlier. Then we add it ∗ C̃t. This is the new candidate values, scaled by how much we decided to update each state value." • For the language model example: "this is where we'd actually drop the information about the old subject's gender and add the new information, as we decided in the previous steps." http://colah.github.io/posts/2015-08-Understanding-LSTMs/
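That is, Ct = ft ∗ Ct−1 + it ∗ C̃t. A tiny numeric sketch (all numbers made up) showing the old gender being dropped and the new one written in:

```python
import numpy as np

C_prev  = np.array([ 0.9, -0.4])   # imagine component 0 encodes "subject is masculine"
f_t     = np.array([ 0.0,  1.0])   # forget gate: drop component 0, keep component 1
i_t     = np.array([ 1.0,  0.0])   # input gate: write the new value into component 0 only
C_tilde = np.array([-0.8,  0.3])   # candidate: say the new subject is feminine

C_t = f_t * C_prev + i_t * C_tilde
print(C_t)   # [-0.8, -0.4]: old gender replaced, other information untouched
```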

Third step: output layer • Decide what to output (ht) • "Finally, we need to decide what we're going to output. This output will be based on our cell state, but will be a filtered version. First, we run a sigmoid layer which decides what parts of the cell state we're going to output. Then, we put the cell state through tanh (to push the values to be between −1 and 1) and multiply it by the output of the sigmoid gate, so that we only output the parts we decided to." • For the language model example: "since it just saw a subject, it might want to output information relevant to a verb, in case that's what is coming next. For example, it might output whether the subject is singular or plural, so that we know what form a verb should be conjugated into if that's what follows next." http://colah.github.io/posts/2015-08-Understanding-LSTMs/
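In formulas: ot = σ(Wo · [ht−1, xt] + bo) and ht = ot ∗ tanh(Ct); a sketch with illustrative weight names:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_output(h_prev, x_t, C_t, Wo, bo):
    o_t = sigmoid(Wo @ np.concatenate([h_prev, x_t]) + bo)  # which parts of the state to expose
    h_t = o_t * np.tanh(C_t)                                 # filtered view of the cell state
    return h_t
```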

Dimensions (http://kvitajakub.github.io/2016/04/14/rnn-diagrams/): xt is n×1 and ht is m×1, so the concatenation of xt and ht−1 is (n+m)×1; the cell state Ct, the forget gate ft, the input gate it, the candidate update, the output gate ot, and the output ht are all m×1.
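A sketch that just checks these shapes end to end, with n = 3 and m = 4 picked arbitrarily (all four weight matrices are m×(n+m), matching the concatenated input; everything else is a placeholder):

```python
import numpy as np

n, m = 3, 4                                   # x_t is n x 1, h_t and C_t are m x 1
rng = np.random.default_rng(2)
Wf, Wi, Wc, Wo = (rng.normal(size=(m, n + m)) for _ in range(4))
bf = bi = bc = bo = np.zeros(m)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

x_t, h_prev, C_prev = rng.normal(size=n), np.zeros(m), np.zeros(m)
z = np.concatenate([x_t, h_prev])             # the (n + m) x 1 concatenated input
f_t, i_t, o_t = sigmoid(Wf @ z + bf), sigmoid(Wi @ z + bi), sigmoid(Wo @ z + bo)
C_t = f_t * C_prev + i_t * np.tanh(Wc @ z + bc)
h_t = o_t * np.tanh(C_t)
assert C_t.shape == h_t.shape == (m,)         # every gate/state vector is m x 1
```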

Recurrent Neural Networks (deeplearning.ai): Bidirectional RNN

Getting information from the future • He said, "Teddy bears are on sale!" • He said, "Teddy Roosevelt was a great President!" • Reading left to right, the words up to "Teddy" are identical in both sentences, so the network needs information from later words to tell whether "Teddy" is part of a person's name.

Bidirectional RNN (BRNN)
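A minimal sketch of the bidirectional idea (the cell, sizes, and the concatenation choice are illustrative assumptions): one RNN reads the sequence left to right, another reads it right to left, and the representation at each position can use both, i.e. context from the past and the future, as in the "Teddy" example.

```python
import numpy as np

def rnn_scan(xs, Wxh, Whh):
    h, out = np.zeros(Whh.shape[0]), []
    for x_t in xs:
        h = np.tanh(Wxh @ x_t + Whh @ h)
        out.append(h)
    return out

rng = np.random.default_rng(3)
Wxh_f, Whh_f = rng.normal(size=(8, 10)), rng.normal(size=(8, 8))  # forward RNN
Wxh_b, Whh_b = rng.normal(size=(8, 10)), rng.normal(size=(8, 8))  # backward RNN

xs = list(rng.normal(size=(6, 10)))                   # a short sentence of 6 word vectors
h_forward = rnn_scan(xs, Wxh_f, Whh_f)                # states built from words 1..t
h_backward = rnn_scan(xs[::-1], Wxh_b, Whh_b)[::-1]   # states built from words t..n

# The representation at position t now sees the whole sentence:
h_t = [np.concatenate([f, b]) for f, b in zip(h_forward, h_backward)]
```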