- Slides: 54
CS 4803 / 7643: Deep Learning. Topics: – Recurrent Neural Networks (RNNs) – BackProp Through Time (BPTT). Dhruv Batra, Georgia Tech
Administrivia • HW 3 released – Due: 11/06, 11:55 pm – Last HW; focus on projects after this – https://www.cc.gatech.edu/classes/AY2019/cs7643_fall/assets/hw3.pdf (C) Dhruv Batra
Plan for Today • Model – Recurrent Neural Networks (RNNs) • Learning – BackProp Through Time (BPTT)
New Topic: RNNs. Image Credit: Andrej Karpathy
New Words • Recurrent Neural Networks (RNNs) • Recursive Neural Networks – General family; think graphs instead of chains • Types: – “Vanilla” RNNs (Elman networks) – Long Short-Term Memory (LSTMs) – Gated Recurrent Units (GRUs) – … • Algorithms – BackProp Through Time (BPTT) – BackProp Through Structure (BPTS)
What’s wrong with MLPs? • Problem 1: Can’t model sequences – Fixed-size inputs & outputs – No temporal structure • Problem 2: Pure feed-forward processing – No “memory”, no feedback. Image Credit: Alex Graves, book
Why model sequences? Figure Credit: Carlos Guestrin
Why model sequences? Image Credit: Alex Graves
Sequences are everywhere… Image Credit: Alex Graves and Kevin Gimpel
Even where you might not expect a sequence… Image Credit: Vinyals et al.
Even where you might not expect a sequence… Classify images by taking a series of “glimpses”. Ba, Mnih, and Kavukcuoglu, “Multiple Object Recognition with Visual Attention”, ICLR 2015. Gregor et al., “DRAW: A Recurrent Neural Network For Image Generation”, ICML 2015. Figure copyright Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende, and Daan Wierstra, 2015. Reproduced with permission. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
Even where you might not expect a sequence… • Output ordering = sequence. Image Credit: Ba et al.; Gregor et al.
Image Credit: [Pinheiro and Collobert, ICML 2014]
Sequences in Input or Output? • It’s a spectrum…
– No sequence in, no sequence out: “standard” classification / regression problems
– One to many: Im2Caption (image captioning)
– Many to one: sentence classification, multiple-choice question answering
– Many to many: machine translation, video classification, video captioning, open-ended question answering
Image Credit: Andrej Karpathy
2 Key Ideas • Parameter Sharing – in computation graphs = adding gradients
Computational Graph. Slide Credit: Marc'Aurelio Ranzato
Gradients add at branches
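The “gradients add at branches” rule can be checked with a toy example (a minimal plain-Python sketch, not from the slides): a parameter w reused in two branches receives the sum of the two branch gradients.

```python
# Toy check that gradients add where a value is reused (branches):
# y1 = w * a, y2 = w * b, L = y1 + y2  =>  dL/dw = a + b.
w, a, b = 3.0, 2.0, 5.0

# Forward pass: w feeds two branches.
y1 = w * a
y2 = w * b
L = y1 + y2

# Backward pass: each use of w contributes its own gradient; they add.
dL_dy1 = 1.0
dL_dy2 = 1.0
dL_dw = dL_dy1 * a + dL_dy2 * b  # = 2.0 + 5.0 = 7.0
print(dL_dw)  # 7.0
```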
2 Key Ideas • Parameter Sharing – in computation graphs = adding gradients • “Unrolling” – in computation graphs with parameter sharing
How do we model sequences? • No input. Image Credit: Bengio, Goodfellow, Courville
How do we model sequences? • With inputs. Image Credit: Bengio, Goodfellow, Courville
2 Key Ideas • Parameter Sharing – in computation graphs = adding gradients • “Unrolling” – in computation graphs with parameter sharing • Parameter sharing + unrolling – Allows modeling arbitrary sequence lengths! – Keeps the number of parameters in check
Recurrent Neural Network (diagram: input x feeding into an RNN block)
Recurrent Neural Network (diagram: x → RNN → y). We usually want to predict an output vector y at some time steps.
Recurrent Neural Network. We can process a sequence of vectors x by applying a recurrence formula at every time step: h_t = f_W(h_{t-1}, x_t), where h_t is the new state, h_{t-1} the old state, x_t the input vector at time step t, and f_W some function with parameters W.
Notice: the same function f_W and the same set of parameters W are used at every time step.
(Vanilla) Recurrent Neural Network. The state consists of a single “hidden” vector h: h_t = tanh(W_hh h_{t-1} + W_xh x_t), y_t = W_hy h_t. Sometimes called a “Vanilla RNN” or an “Elman RNN” after Prof. Jeffrey Elman.
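As a concrete illustration of the recurrence, here is a minimal NumPy sketch of a vanilla RNN step (the weight names W_xh, W_hh, W_hy follow common convention, and the dimensions and random initialization are illustrative assumptions, not the course’s reference code):

```python
import numpy as np

# Minimal sketch of one vanilla (Elman) RNN step:
#   h_t = tanh(W_hh @ h_{t-1} + W_xh @ x_t + b_h)
#   y_t = W_hy @ h_t
rng = np.random.default_rng(0)
D, H = 4, 3  # input dim, hidden dim (illustrative)

W_xh = rng.standard_normal((H, D)) * 0.1
W_hh = rng.standard_normal((H, H)) * 0.1
W_hy = rng.standard_normal((D, H)) * 0.1
b_h = np.zeros(H)

def rnn_step(h_prev, x):
    """One recurrence: the same f_W, same parameters, at every time step."""
    h = np.tanh(W_hh @ h_prev + W_xh @ x + b_h)
    y = W_hy @ h
    return h, y

# Process a length-5 sequence, carrying the hidden state forward.
h = np.zeros(H)
for t in range(5):
    x_t = rng.standard_normal(D)
    h, y_t = rnn_step(h, x_t)
print(h.shape, y_t.shape)  # (3,) (4,)
```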
RNN: Computational Graph. Re-use the same weight matrix W at every time step: h_0 → f_W → h_1 → f_W → h_2 → f_W → h_3 → … → h_T, with inputs x_1, x_2, x_3, … fed into each f_W.
RNN: Computational Graph: Many to Many. Every hidden state h_t produces an output y_t with its own loss L_t; the total loss is the sum L = L_1 + L_2 + … + L_T.
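To make BackProp Through Time concrete, here is a hedged sketch on a scalar linear RNN (a toy model chosen so the gradient can be checked numerically; not the slides’ code). Because the weight w is shared across the unrolled graph, its gradient accumulates across time steps:

```python
# BPTT on a scalar linear RNN: h_t = w * h_{t-1} + x_t,
# per-step loss L_t = 0.5 * (h_t - y_t)^2, total L = sum_t L_t.
w = 0.5
xs = [1.0, 0.5, -0.3]  # inputs (illustrative values)
ys = [0.8, 0.9, 0.1]   # per-step targets

# Forward pass, caching hidden states (h_0 = 0).
hs = [0.0]
for x in xs:
    hs.append(w * hs[-1] + x)
L = sum(0.5 * (h - y) ** 2 for h, y in zip(hs[1:], ys))

# Backward pass through time: walk t = T..1, carrying dL/dh_t.
dw = 0.0
dh = 0.0
for t in reversed(range(len(xs))):
    dh += hs[t + 1] - ys[t]   # local loss gradient flows into h_t
    dw += dh * hs[t]          # shared w: gradients ADD across time steps
    dh = dh * w               # propagate to h_{t-1}

# Sanity check against a central-difference numerical gradient.
eps = 1e-6
def total_loss(wv):
    h, tot = 0.0, 0.0
    for x, y in zip(xs, ys):
        h = wv * h + x
        tot += 0.5 * (h - y) ** 2
    return tot
num = (total_loss(w + eps) - total_loss(w - eps)) / (2 * eps)
print(dw, num)
```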
RNN: Computational Graph: Many to One. Only the final hidden state h_T is used to produce the single output y.
RNN: Computational Graph: One to Many. A single input x initializes the recurrence, and the unrolled graph emits outputs y_1, y_2, …, y_T.
Sequence to Sequence: Many-to-one + One-to-many. Many to one: encode the input sequence into a single vector (encoder, weights W_1). One to many: produce the output sequence from that single vector (decoder, weights W_2).
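A minimal NumPy sketch of the encoder-decoder idea, assuming illustrative weight names (W1_*, W2_*) and random, untrained parameters:

```python
import numpy as np

# Sequence-to-sequence as many-to-one + one-to-many:
# an encoder RNN (weights W1_*) folds the input sequence into one
# vector, which seeds a decoder RNN (separate weights W2_*).
rng = np.random.default_rng(1)
D, H = 4, 3  # input/output dim, hidden dim (illustrative)

W1_xh = rng.standard_normal((H, D)) * 0.1
W1_hh = rng.standard_normal((H, H)) * 0.1
W2_hh = rng.standard_normal((H, H)) * 0.1
W2_hy = rng.standard_normal((D, H)) * 0.1

def encode(xs):
    """Many to one: fold the whole input sequence into one vector."""
    h = np.zeros(H)
    for x in xs:
        h = np.tanh(W1_hh @ h + W1_xh @ x)
    return h

def decode(h, steps):
    """One to many: unroll the decoder from the encoder's summary."""
    ys = []
    for _ in range(steps):
        h = np.tanh(W2_hh @ h)
        ys.append(W2_hy @ h)
    return ys

xs = [rng.standard_normal(D) for _ in range(6)]
code = encode(xs)            # single vector summarizing the input
ys = decode(code, steps=4)   # output sequence of a chosen length
print(code.shape, len(ys), ys[0].shape)
```

Note the two weight sets: nothing forces the input and output sequences to have the same length, which is what makes this fit machine translation.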
Example: Character-level Language Model. Vocabulary: [h, e, l, o]. Example training sequence: “hello”.
Distributed Representations: Toy Example • Local vs. Distributed. Slide Credit: Moontae Lee
Distributed Representations: Toy Example • Can we interpret each dimension?
Power of distributed representations! (Local vs. Distributed)
Training Time: MLE / “Teacher Forcing”. Example: Character-level Language Model. Vocabulary: [h, e, l, o]. Example training sequence: “hello”.
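A hedged sketch of the teacher-forcing loss on the “hello” example (random, untrained weights; names like W_xh are illustrative): the ground-truth previous character, not the model’s own prediction, is fed as input at each step, and a softmax cross-entropy (MLE) loss is summed over positions.

```python
import numpy as np

# Teacher forcing on "hello": inputs are the ground-truth characters
# "hell", targets are the next characters "ello".
rng = np.random.default_rng(2)
vocab = ['h', 'e', 'l', 'o']
seq = "hello"
V, H = len(vocab), 8
ix = {c: i for i, c in enumerate(vocab)}

W_xh = rng.standard_normal((H, V)) * 0.1
W_hh = rng.standard_normal((H, H)) * 0.1
W_hy = rng.standard_normal((V, H)) * 0.1

def one_hot(i):
    v = np.zeros(V)
    v[i] = 1.0
    return v

h = np.zeros(H)
loss = 0.0
inputs, targets = seq[:-1], seq[1:]
for c_in, c_tgt in zip(inputs, targets):
    h = np.tanh(W_hh @ h + W_xh @ one_hot(ix[c_in]))  # teacher forcing
    logits = W_hy @ h
    p = np.exp(logits - logits.max()); p /= p.sum()   # softmax
    loss += -np.log(p[ix[c_tgt]])                     # cross-entropy / MLE
print(loss / len(inputs))  # average per-character negative log-likelihood
```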
Test Time: Sample / Argmax / Beam Search. Example: Character-level Language Model, Sampling. Vocabulary: [h, e, l, o]. At test time, sample characters one at a time and feed each sample back into the model. Softmax output at each step (in vocabulary order h, e, l, o), followed by the sampled character:
– Step 1: .03, .13, .00, .84 → “e”
– Step 2: .25, .20, .05, .50 → “l”
– Step 3: .11, .17, .68, .03 → “l”
– Step 4: .11, .02, .08, .79 → “o”
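Test-time sampling can be sketched as follows (random, untrained weights, so the output is gibberish over {h, e, l, o}; the point is the feedback loop). Replacing the rng.choice line with an argmax would give greedy decoding instead of sampling.

```python
import numpy as np

# Feed a seed character in, sample from the softmax, feed the sample
# back in as the next input. Weight names are illustrative.
rng = np.random.default_rng(3)
vocab = ['h', 'e', 'l', 'o']
V, H = len(vocab), 8
ix = {c: i for i, c in enumerate(vocab)}

W_xh = rng.standard_normal((H, V)) * 0.1
W_hh = rng.standard_normal((H, H)) * 0.1
W_hy = rng.standard_normal((V, H)) * 0.1

def sample(seed, n):
    h = np.zeros(H)
    x = np.zeros(V); x[ix[seed]] = 1.0
    out = seed
    for _ in range(n):
        h = np.tanh(W_hh @ h + W_xh @ x)
        logits = W_hy @ h
        p = np.exp(logits - logits.max()); p /= p.sum()
        i = rng.choice(V, p=p)           # sample one character index
        out += vocab[i]
        x = np.zeros(V); x[i] = 1.0      # feed the sample back in
    return out

print(sample('h', 10))  # an 11-character string over {h, e, l, o}
```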