ECE 6504: Deep Learning for Perception

Topics:
– Recurrent Neural Networks (RNNs)
– BackProp Through Time (BPTT)
– Vanishing / Exploding Gradients
– [Abhishek] Lua / Torch Tutorial

Dhruv Batra, Virginia Tech

Administrativia
• HW3
– Out today
– Due in 2 weeks
– Please, please start early
– https://computing.ece.vt.edu/~f15ece6504/homework3/

Plan for Today
• Model
– Recurrent Neural Networks (RNNs)
• Learning
– BackProp Through Time (BPTT)
– Vanishing / Exploding Gradients
• [Abhishek] Lua / Torch Tutorial

New Topic: RNNs
Image Credit: Andrej Karpathy

Synonyms
• Recurrent Neural Networks (RNNs)
• Recursive Neural Networks
– General family; think graphs instead of chains
• Types:
– Long Short-Term Memory (LSTMs)
– Gated Recurrent Units (GRUs)
– Hopfield networks
– Elman networks
– …
• Algorithms
– BackProp Through Time (BPTT)
– BackProp Through Structure (BPTS)

What’s wrong with MLPs?
• Problem 1: Can’t model sequences
– Fixed-sized inputs & outputs
– No temporal structure
• Problem 2: Pure feed-forward processing
– No “memory”, no feedback
Image Credit: Alex Graves, book

Sequences are everywhere…
Image Credit: Alex Graves and Kevin Gimpel

Even where you might not expect a sequence…
Image Credit: Vinyals et al.

Even where you might not expect a sequence…
• Input ordering = sequence
Image Credit: Ba et al.; Gregor et al.

[Figure]
Image Credit: [Pinheiro and Collobert, ICML 14]

Why model sequences?
Figure Credit: Carlos Guestrin

Why model sequences?
Image Credit: Alex Graves

Name that model
• Hidden states Y1, …, Y5, each in {a, …, z}; observations X1, …, X5
• Answer: Hidden Markov Model (HMM)
Figure Credit: Carlos Guestrin
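For reference (not on the slide), the joint distribution that this chain-structured model encodes factorizes as:

```latex
p(x_{1:T}, y_{1:T}) \;=\; p(y_1)\, p(x_1 \mid y_1) \prod_{t=2}^{T} p(y_t \mid y_{t-1})\, p(x_t \mid y_t)
```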

How do we model sequences?
• No input
Image Credit: Bengio, Goodfellow, Courville

How do we model sequences?
• With inputs
Image Credit: Bengio, Goodfellow, Courville

How do we model sequences?
• With inputs and outputs
Image Credit: Bengio, Goodfellow, Courville

How do we model sequences?
• With Neural Nets
Image Credit: Alex Graves
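The figure itself did not survive extraction; the vanilla (Elman-style) recurrence such diagrams depict, in notation assumed here, is:

```latex
h_t = \tanh\!\left(W_{xh}\, x_t + W_{hh}\, h_{t-1} + b_h\right), \qquad y_t = W_{hy}\, h_t + b_y
```

The same weights (W_xh, W_hh, W_hy) are applied at every timestep t.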

How do we model sequences?
• It’s a spectrum…
– Input: no sequence, Output: no sequence. Example: “standard” classification / regression problems
– Input: no sequence, Output: sequence. Example: Im2Caption
– Input: sequence, Output: no sequence. Example: sentence classification, multiple-choice question answering
– Input: sequence, Output: sequence. Example: machine translation, video captioning, open-ended question answering, video question answering
Image Credit: Andrej Karpathy

Things can get arbitrarily complex
Image Credit: Herbert Jaeger

Key Ideas
• Parameter sharing + unrolling
– Keeps the number of parameters in check
– Allows arbitrary sequence lengths! (see the sketch below)
• “Depth”
– Measured in the usual sense of layers
– Not unrolled timesteps
• Learning
– Is tricky even for “shallow” models due to unrolling
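A minimal NumPy sketch (not from the slides) of parameter sharing + unrolling: the same three weight matrices are reused at every timestep, so sequences of any length are handled with a fixed parameter count.

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, W_hy, h0):
    """Unroll a vanilla RNN over a sequence xs of arbitrary length.

    The same parameters (W_xh, W_hh, W_hy) are shared across all timesteps.
    """
    h, ys = h0, []
    for x in xs:                              # one step per sequence element
        h = np.tanh(W_xh @ x + W_hh @ h)      # shared weights, new hidden state
        ys.append(W_hy @ h)                   # per-timestep output
    return ys

# Works for any sequence length with the same parameters:
rng = np.random.default_rng(0)
W_xh, W_hh, W_hy = rng.normal(size=(4, 3)), rng.normal(size=(4, 4)), rng.normal(size=(2, 4))
for T in (5, 50):
    xs = [rng.normal(size=3) for _ in range(T)]
    ys = rnn_forward(xs, W_xh, W_hh, W_hy, h0=np.zeros(4))
```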

Plan for Today
• Model
– Recurrent Neural Networks (RNNs)
• Learning
– BackProp Through Time (BPTT)
– Vanishing / Exploding Gradients
• [Abhishek] Lua / Torch Tutorial

BPTT
Image Credit: Richard Socher

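The slide’s derivation is an image that did not survive extraction. For the recurrence above, the standard BPTT gradient with respect to the shared recurrent weights (in the form used by Pascanu et al.) is:

```latex
\frac{\partial L}{\partial W_{hh}}
  = \sum_{t}\sum_{k \le t}
    \frac{\partial L_t}{\partial h_t}
    \left( \prod_{j=k+1}^{t} \frac{\partial h_j}{\partial h_{j-1}} \right)
    \frac{\partial^{+} h_k}{\partial W_{hh}}
```

where ∂⁺h_k/∂W_hh is the “immediate” partial derivative, treating h_{k−1} as a constant.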
Illustration [Pascanu et al.]
• Intuition
• Error surface of a single-hidden-unit RNN; high-curvature walls
• Solid lines: standard gradient descent trajectories
• Dashed lines: gradient rescaled to fix the problem
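The cliffs come from the product of Jacobians in the BPTT sum above: following Pascanu et al.’s analysis, its norm is bounded by a power of the per-step factor,

```latex
\left\lVert \frac{\partial h_t}{\partial h_k} \right\rVert
  \le \prod_{j=k+1}^{t}
      \left\lVert W_{hh}^{\top}\,\mathrm{diag}\!\left(\tanh'(\cdot)\right) \right\rVert
  \le \left(\gamma_W\,\gamma_\phi\right)^{t-k}
```

so gradients shrink exponentially when γ_W γ_φ < 1 (vanishing) and can grow exponentially when it exceeds 1 (exploding).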

Fix #1
• Pseudocode (reconstructed below)
Image Credit: Richard Socher

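The pseudocode itself is an image that did not survive extraction. Given the previous slide’s “gradient rescaled” fix, this is presumably Pascanu et al.’s gradient-norm clipping; a minimal sketch, not the slide’s exact pseudocode:

```python
import numpy as np

def clip_gradient(grad, threshold=5.0):
    """Rescale the gradient when its norm exceeds a threshold.

    Keeps the descent direction but caps the step size, so updates taken
    near the high-curvature 'walls' of the error surface stay bounded.
    """
    norm = np.linalg.norm(grad)
    if norm > threshold:
        grad = grad * (threshold / norm)
    return grad
```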
Fix #2
• Smart initialization and ReLUs
– Socher et al., 2013
– “A Simple Way to Initialize Recurrent Networks of Rectified Linear Units”, Le et al., 2015
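A sketch of what Le et al.’s recipe amounts to for the recurrence assumed above: initialize the recurrent weights to the identity matrix and the biases to zero, and use ReLU instead of tanh, so the untrained network starts out copying its hidden state forward unchanged.

```python
import numpy as np

def irnn_init(hidden_size, input_size, scale=0.001):
    """IRNN-style initialization (after Le et al., 2015)."""
    W_hh = np.eye(hidden_size)        # recurrent weights = identity
    W_xh = np.random.normal(0.0, scale, size=(hidden_size, input_size))
    b_h = np.zeros(hidden_size)       # biases start at zero
    return W_xh, W_hh, b_h

def irnn_step(x, h, W_xh, W_hh, b_h):
    """One timestep: ReLU replaces tanh in the vanilla recurrence."""
    return np.maximum(0.0, W_xh @ x + W_hh @ h + b_h)
```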