Recurrent Neural Networks deeplearning.ai Gated Recurrent Unit (GRU)
Motivation
• Not all problems can be converted into one with fixed-length inputs and outputs
• Problems such as speech recognition or time-series prediction require a system to store and use context information
• Simple case: output YES if the number of 1s is even, else NO (1000010101 – YES, 100011 – NO, …)
• Hard/impossible to choose a fixed context window: there can always be a new sample longer than anything seen
Andrew Ng
Recurrent Neural Networks (RNNs)
• Recurrent Neural Networks take the previous output or hidden state as an input; the composite input at time t carries historical information about what happened at times T < t
• RNNs are useful because their intermediate values (state) can store information about past inputs for a duration that is not fixed a priori
Andrew Ng
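As a minimal sketch of this recurrence (hypothetical NumPy code, not from the slides): the fixed-size state h is updated from the previous state and the current input, so a sequence of any length can be processed with the same set of weights.

```python
import numpy as np

def rnn_step(x_t, h_prev, Wx, Wh, b):
    """One step of a vanilla RNN: the new state mixes the current input
    with the previous state, so h_t summarizes the inputs seen so far."""
    return np.tanh(Wx @ x_t + Wh @ h_prev + b)

# Process an arbitrarily long sequence with the same, fixed-size state.
n, m = 8, 16                                   # input and state dimensions (arbitrary)
Wx, Wh, b = np.random.randn(m, n), np.random.randn(m, m), np.zeros(m)
h = np.zeros(m)
for x_t in np.random.randn(100, n):            # 100 time steps; could be any length
    h = rnn_step(x_t, h, Wx, Wh, b)
```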
Sample Feed-forward Network [diagram: input x1 → hidden h1 → output y1, single time step t=1] Andrew Ng
Sample RNN [diagram: inputs x1, x2, x3 feed hidden states h1, h2, h3 across time steps t=1, 2, 3; each hidden state feeds into the next and produces outputs y1, y2, y3; h0 is the initial state] Andrew Ng
Sentiment Classification
• Classify a restaurant review from Yelp, a movie review from IMDB, etc. as positive or negative
• Inputs: multiple words, one or more sentences
• Outputs: positive / negative classification
• “The food was really good”
• “The chicken crossed the road because it was uncooked”
Andrew Ng
Sentiment Classification [diagram, built up over several slides: each word (“The”, “food”, …, “good”) is fed to an RNN cell, producing hidden states h1, h2, …, hn]
Sentiment Classification [option 1: feed only the final hidden state hn to a linear classifier and ignore h1 … hn-1]
Sentiment Classification [option 2: combine all hidden states, h = Sum(h1 … hn), and feed h to a linear classifier] (see http://deeplearning.net/tutorial/lstm.html)
Andrew Ng
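A minimal sketch of the second option, assuming toy word vectors and reusing the rnn_step cell from the earlier sketch; classify_review and the classifier weights w_out, b_out are hypothetical names, not from the slides.

```python
import numpy as np

def classify_review(word_vectors, Wx, Wh, b, w_out, b_out):
    """Run the RNN over the word vectors, sum the hidden states
    (h = Sum(h_1 ... h_n) as on the slide), then apply a linear classifier."""
    h = np.zeros(Wh.shape[0])
    h_sum = np.zeros_like(h)
    for x_t in word_vectors:
        h = rnn_step(x_t, h, Wx, Wh, b)    # rnn_step from the earlier sketch
        h_sum += h
    score = float(w_out @ h_sum + b_out)   # scalar score from the linear classifier
    return "Positive" if score > 0 else "Negative"
```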
RNN unit [diagram: the RNN cell computes the output activation value a<t> from the previous activation a<t-1> and the input x<t>] Andrew Ng
(c) GRU [diagram: the GRU cell; the memory cell c<t> and the candidate c̃<t> have the same dimension, for example 100, and the gates are combined with them by element-wise products]
GRU (simplified) The cat, which already ate …, was full. [Cho et al., 2014. On the properties of neural machine translation: Encoder-decoder approaches] [Chung et al., 2014. Empirical evaluation of gated recurrent neural networks on sequence modeling] Andrew Ng
Full GRU [adds a relevance gate Γr that controls how much of c<t-1> is used when computing the candidate c̃<t>]
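A minimal NumPy sketch of the full GRU update (update gate Γu, relevance gate Γr), following the standard equations from the papers cited above; the weight names and the sigmoid helper are assumptions of this sketch, not from the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, c_prev, Wu, Wr, Wc, bu, br, bc):
    """One GRU step. c_prev and the candidate c_tilde have the same
    dimension; all gates are applied element-wise."""
    concat = np.concatenate([c_prev, x_t])
    gamma_u = sigmoid(Wu @ concat + bu)               # update gate
    gamma_r = sigmoid(Wr @ concat + br)               # relevance (reset) gate
    concat_r = np.concatenate([gamma_r * c_prev, x_t])
    c_tilde = np.tanh(Wc @ concat_r + bc)             # candidate memory cell
    c_t = gamma_u * c_tilde + (1.0 - gamma_u) * c_prev
    return c_t                                        # a<t> = c<t> in the GRU
```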
Recurrent Neural Networks deeplearning.ai LSTM (long short-term memory) unit
Introduction
• An RNN (recurrent neural network) is a form of neural network that feeds its outputs (hidden states) back as inputs during operation
• An LSTM (long short-term memory) network is a form of RNN; it addresses the vanishing gradient problem of the original RNN
• Application: sequence-to-sequence models using LSTMs for machine translation
• Materials are mainly based on links found in https://www.tensorflow.org/tutorials
RNN, LSTM v.7.c Andrew Ng
LSTM (long short-term memory)
• Standard RNN: the input is concatenated with the previous output (hidden state) and fed back in at the next step
• LSTM: the repeating structure is more complicated, with a cell state and gates
RNN, LSTM v.7.c Andrew Ng
GRU and LSTM [side-by-side comparison: the LSTM is a more powerful and general version of the GRU, with separate update, forget, and output gates] [Hochreiter & Schmidhuber, 1997. Long short-term memory] Andrew Ng
LSTM in pictures [diagram: the LSTM cell, with forget, update, and output gates (sigmoid activations), a tanh candidate, and element-wise multiplications combining them with the cell state] Andrew Ng
Core idea of LSTM
• C = cell state; Ct-1 = state at time t-1; Ct = state at time t
• Using gates, the LSTM can add or remove information from the state, avoiding the long-term dependency problem (Bengio et al., 1994)
• σ = a sigmoid function. A gate controlled by σ: “The sigmoid layer outputs numbers between zero and one, describing how much of each component should be let through. A value of zero means ‘let nothing through,’ while a value of one means ‘let everything through!’” An LSTM has three of these gates, to protect and control the cell state.
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
RNN, LSTM v.7.c Andrew Ng
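A tiny illustration of that gating idea with made-up numbers (not from the slides): a sigmoid output in [0, 1] scales each component of the state element-wise.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

c = np.array([0.9, -1.2, 0.4])                 # cell state components
gate = sigmoid(np.array([5.0, -5.0, 0.0]))     # ≈ [1.0, 0.0, 0.5]
print(gate * c)                                # first kept, second dropped, third halved
```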
First step: forget gate layer
• Decide what to throw away from (and what to keep in) the cell state
• “It looks at ht−1 and xt, and outputs a number between 0 and 1 for each number in the cell state Ct−1. A 1 represents ‘completely keep this,’ while a 0 represents ‘completely get rid of this.’”
• For the language model example: “the cell state might include the gender of the present subject, so that the correct pronouns can be used. When we see a new subject, we want to forget the gender of the old subject.”
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
RNN, LSTM v.7.c Andrew Ng
Second step (a): input gate layer
• Decide what new information to store in the cell state; the new information is added to become the state Ct
• “Next, a tanh layer creates a vector of new candidate values, C̃t, that could be added to the state. In the next step, we’ll combine these two to create an update to the state.”
• For the language model example: “we’d want to add the gender of the new subject to the cell state, to replace the old one we’re forgetting.”
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
RNN, LSTM v.7.c Andrew Ng
Second step (b): update the old cell state
• Ct−1 → Ct
• “We multiply the old state by ft, forgetting the things we decided to forget earlier. Then we add it ∗ C̃t. This is the new candidate values, scaled by how much we decided to update each state value.” (Here “it” is the input gate output from step (a), so Ct = ft ∗ Ct−1 + it ∗ C̃t.)
• For the language model example: “this is where we’d actually drop the information about the old subject’s gender and add the new information, as we decided in the previous steps.”
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
RNN, LSTM v.7.c Andrew Ng
Third step: output layer
• Decide what to output (ht)
• “Finally, we need to decide what we’re going to output. This output will be based on our cell state, but will be a filtered version. First, we run a sigmoid layer which decides what parts of the cell state we’re going to output. Then, we put the cell state through tanh (to push the values to be between −1 and 1) and multiply it by the output of the sigmoid gate, so that we only output the parts we decided to.”
• For the language model example: “since it just saw a subject, it might want to output information relevant to a verb, in case that’s what is coming next. For example, it might output whether the subject is singular or plural, so that we know what form a verb should be conjugated into if that’s what follows next.”
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
RNN, LSTM v.7.c Andrew Ng
LSTM dimensions [diagram: xt is of size n×1 and ht is of size m×1; the cell state Ct, the gates ft, it, ot, and the candidate update are all m×1; the concatenation of xt (n×1) with ht−1 (m×1) is (n+m)×1] http://kvitajakub.github.io/2016/04/14/rnndiagrams/ RNN, LSTM v.7.c Andrew Ng
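Putting the three steps together, here is a minimal NumPy sketch of one LSTM step with the dimensions above (xt is n×1, ht and Ct are m×1); the weight and function names are assumptions of this sketch, not from the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, Wf, Wi, Wc, Wo, bf, bi, bc, bo):
    """One LSTM step: forget gate, input gate, cell update, output gate.
    Each W has shape (m, n + m); each b has shape (m,)."""
    z = np.concatenate([h_prev, x_t])         # [h_{t-1}, x_t], size (n + m,)
    f_t = sigmoid(Wf @ z + bf)                # forget gate: what to keep from C_{t-1}
    i_t = sigmoid(Wi @ z + bi)                # input gate: what new info to write
    c_tilde = np.tanh(Wc @ z + bc)            # candidate values
    c_t = f_t * c_prev + i_t * c_tilde        # second step (b): update the cell state
    o_t = sigmoid(Wo @ z + bo)                # output gate
    h_t = o_t * np.tanh(c_t)                  # third step: filtered output
    return h_t, c_t
```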
Recurrent Neural Networks deeplearning.ai Bidirectional RNN
Getting information from the future
He said, “Teddy bears are on sale!”
He said, “Teddy Roosevelt was a great President!”
From only the first three words “He said, Teddy …” it is not possible to tell whether “Teddy” is part of a person’s name; information from later in the sentence (the future) is needed.
Andrew Ng
Bidirectional RNN (BRNN) [diagram: a forward RNN processes the sequence left to right and a backward RNN processes it right to left; the prediction ŷ<t> is computed from both the forward activation a→<t> and the backward activation a←<t>] Andrew Ng
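A minimal NumPy sketch of this structure; the vanilla rnn_step cell, the parameter layout, and the name brnn_forward are assumptions of this sketch (in practice the blocks can also be GRU or LSTM units).

```python
import numpy as np

def rnn_step(x, a_prev, Wx, Wa, b):
    """A vanilla RNN cell; could be replaced by a GRU or LSTM cell."""
    return np.tanh(Wx @ x + Wa @ a_prev + b)

def brnn_forward(xs, fwd, bwd, Wy, by):
    """xs: list of input vectors x<1>..x<T>.
    fwd, bwd: (Wx, Wa, b) parameter tuples for the forward and backward RNNs.
    Returns one prediction per time step, each using both directions."""
    T = len(xs)
    m = fwd[1].shape[0]                       # hidden size (Wa is m x m)
    a_fwd, a_bwd = [None] * T, [None] * T
    a = np.zeros(m)
    for t in range(T):                        # left to right
        a = rnn_step(xs[t], a, *fwd)
        a_fwd[t] = a
    a = np.zeros(m)
    for t in reversed(range(T)):              # right to left
        a = rnn_step(xs[t], a, *bwd)
        a_bwd[t] = a
    # y<t> depends on a_fwd<t> and a_bwd<t>, i.e. on the whole sequence
    return [Wy @ np.concatenate([a_fwd[t], a_bwd[t]]) + by for t in range(T)]
```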