Recurrent Neural Network
What problems are CNNs good at? • Image classification • Input: One image • Output: The class label • The assumption is that each image is independent of the other images in the training set. • This assumption fails in many other problems. • Moreover, CNNs are not designed for length-varying inputs and outputs. Figure taken from https://www.twitch.tv/hsdogdog
What is Sequence Learning? • Sequence learning is the study of machine learning algorithms designed for sequential data. • Sequence learning is used for: • Time-series prediction: tasks where the history of a time series is used to predict the next point. • Applications: stock market prediction, weather forecasting, animal migration tracking, storm trajectory prediction, and so on. • Sequence labelling: tasks where a sequence of labels is applied to a sequence of data. • Applications: language modeling, speech recognition, gesture recognition, handwriting recognition, etc.
Ways to deal with sequence labelling
Finally, the RNN • Recurrent Neural Networks take the previous output (or hidden state) as an input. • A distributed hidden state allows them to store a lot of information about the past efficiently. • No need to infer the hidden state. • Weights are shared across time steps. Figure taken from the Deep Learning book by Ian Goodfellow
The Vanilla RNN cell • It's a unit of the overall network that encapsulates certain parts of it in one homogeneous block. • Here W is shared in the feedforward computation at every time step. Figure: a simple vanilla RNN cell with input x_t, shared weights W, and hidden states h_{t-1} and h_t (Left: canonical form, Right: unrolled form). Figure taken from https://medium.com/@apiltamang/unmasking-a-vanilla-rnn-what-lies-beneath-912120f7e56c
Recurrent Neural Networks (RNNs) • Note that the weights W are shared over time t. • Essentially, copies of the RNN cell are made over time (unrolling), with different inputs at different time steps.
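To make the shared-weight update concrete, here is a minimal numpy sketch of a single vanilla RNN step; the parameter names Wxh, Whh, and bh are illustrative assumptions, not notation from the slides:

```python
import numpy as np

def rnn_cell_step(x_t, h_prev, Wxh, Whh, bh):
    """One step of a vanilla RNN cell.

    x_t    : input at time t,            shape (input_dim, 1)
    h_prev : hidden state from time t-1, shape (hidden_dim, 1)
    The same Wxh, Whh, bh are reused at every time step (weight sharing).
    """
    # h_t = tanh(Wxh x_t + Whh h_{t-1} + bh)
    return np.tanh(Wxh @ x_t + Whh @ h_prev + bh)
```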
An example: Sentiment Classification • Classify a restaurant review from Yelp, a movie review from IMDB, … as a positive review or a negative review. • Inputs: multiple words, one or more sentences. • Outputs: positive/negative classification. • Example sentences: • “The food was really good” • “The chicken crossed the road because it was uncooked”
Sentiment Classification Reference: http://deeplearning.net/tutorial/lstm.html
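A hypothetical many-to-one sketch of such a classifier, assuming pre-computed word embeddings and illustrative parameter names (Wxh, Whh, Who, bh, bo); only the final hidden state is used for the prediction:

```python
import numpy as np

def classify_review(word_vectors, Wxh, Whh, Who, bh, bo):
    """Read one word embedding per step, keep only the final hidden
    state, and map it to P(positive). All weights are shared over time."""
    h = np.zeros((Whh.shape[0], 1))
    for x in word_vectors:                       # x: shape (embed_dim, 1)
        h = np.tanh(Wxh @ x + Whh @ h + bh)      # same update at every step
    logit = Who @ h + bo                         # single scalar output
    return 1.0 / (1.0 + np.exp(-logit))          # sigmoid -> probability of "positive"
```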
Image Captioning • Given an image, produce a sentence describing its contents • Inputs: Image feature (most likely from a CNN) • Outputs: Multiple words (let’s consider a sentence)
Image captioning Show and Tell: A Neural Image Caption Generator, Vinyals et al. (CVPR 2015)
RNN outputs: Language Modeling http://karpathy.github.io/2015/05/21/rnn-effectiveness/
Input/Output scenarios • One-to-one: feed-forward network • One-to-many: image captioning • Many-to-one: sentiment classification • Many-to-many: translation, video captioning
The Vanilla RNN • Unfold the network through time by making copies at each timestep.
Revising Backpropagation
Revising Backpropagation: with multiple layers
Chain rule of gradient computation
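As a minimal illustration (the layer notation h^{(l)} here is an assumption, not taken from the slides), the chain rule composes local Jacobians all the way from the loss back to an early-layer weight:

```latex
\frac{\partial L}{\partial W^{(1)}}
  = \frac{\partial L}{\partial h^{(L)}}
    \cdot \frac{\partial h^{(L)}}{\partial h^{(L-1)}}
    \cdots
    \frac{\partial h^{(2)}}{\partial h^{(1)}}
    \cdot \frac{\partial h^{(1)}}{\partial W^{(1)}}
```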
Backpropagation through time (BPTT) • One of the methods used to train RNNs. • The unfolded network (used during the forward pass) is treated as one big feed-forward network. • This unfolded network accepts the whole time series as input. • The weight updates are computed for each copy in the unfolded network, then summed (or averaged), and then applied to the RNN weights.
The unfolded vanilla RNN • Treat the unfolded network as one big feed-forward network. • This big network takes in the entire sequence as input. • Compute gradients through the usual backpropagation. • Update the shared weights.
Forward pass of the Unfolded Vanilla RNN
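A minimal numpy sketch of this forward pass, assuming column-vector inputs and the illustrative parameter names Wxh, Whh, Why, bh, by (an output is produced at every step; the cached hidden states are what BPTT needs later):

```python
import numpy as np

def rnn_forward(xs, h0, Wxh, Whh, Why, bh, by):
    """Unrolled forward pass of a vanilla RNN over a whole sequence.

    xs : list of inputs x_t, each of shape (input_dim, 1)
    h0 : initial hidden state, shape (hidden_dim, 1)
    The same weights are reused at every time step.
    """
    hs, ys = {-1: h0}, {}
    for t, x in enumerate(xs):
        # h_t = tanh(Wxh x_t + Whh h_{t-1} + bh)
        hs[t] = np.tanh(Wxh @ x + Whh @ hs[t - 1] + bh)
        # y_t = Why h_t + by
        ys[t] = Why @ hs[t] + by
    return hs, ys   # cached states are reused by backpropagation through time
```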
Backward propagation of the Unfolded Vanilla RNN
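A matching sketch of backpropagation through time for the forward pass above; dys is assumed to hold the gradient of the loss with respect to each output y_t, and the gradients of the shared weights are summed over all time steps, as described earlier:

```python
def rnn_backward(xs, hs, dys, Wxh, Whh, Why):
    """BPTT for the rnn_forward sketch above."""
    dWxh, dWhh, dWhy = np.zeros_like(Wxh), np.zeros_like(Whh), np.zeros_like(Why)
    dbh, dby = np.zeros_like(hs[0]), np.zeros_like(dys[0])
    dh_next = np.zeros_like(hs[0])          # gradient flowing in from step t+1
    for t in reversed(range(len(xs))):
        dy = dys[t]
        dWhy += dy @ hs[t].T
        dby += dy
        dh = Why.T @ dy + dh_next           # from the output and from the future
        dz = (1 - hs[t] ** 2) * dh          # backprop through tanh
        dWxh += dz @ xs[t].T                # shared-weight gradients are summed
        dWhh += dz @ hs[t - 1].T
        dbh += dz
        dh_next = Whh.T @ dz                # pass gradient on to step t-1
    return dWxh, dWhh, dWhy, dbh, dby
```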
Issues with BPTT learning of the RNN params • Vanishing gradient and exploding gradient issues. • Since the gradient is a product of k real-valued factors (one per time step), it can shrink to zero or explode to infinity. • This issue can be mitigated by using a special type of memory cell, called the Long Short-Term Memory (LSTM) cell. • More on this coming up in the “Deep Learning” course (Fall 2018). • You are all invited! • Reference: Long Short-Term Memory, Hochreiter & Schmidhuber (1997)
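For the vanilla RNN used in the sketches above (with the tanh update), that product takes the following form; repeated multiplication by W_hh is what drives the gradient toward zero or infinity:

```latex
\frac{\partial L}{\partial h_k}
  = \frac{\partial L}{\partial h_T}
    \prod_{t=k}^{T-1} \frac{\partial h_{t+1}}{\partial h_t},
\qquad
\frac{\partial h_{t+1}}{\partial h_t}
  = \operatorname{diag}\!\left(1 - h_{t+1}^{2}\right) W_{hh}
```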
Acknowledgements • Slides courtesy of Arun Mallya [http://arunmallya.github.io/]