CS 194/294-129: Designing, Visualizing and Understanding Deep Neural Networks. John Canny, Spring 2018. Lecture 9: Recurrent Networks, LSTMs and Applications.
Last time: Localization and Detection. Classification + Localization (single object, e.g. CAT), Object Detection and Instance Segmentation (multiple objects, e.g. CAT, DOG, DUCK). Based on CS231n by Fei-Fei Li & Andrej Karpathy & Justin Johnson.
Updates. Please get your project proposal in ASAP: • Submit now even if your team is not complete. • We will try to merge small teams with related topics. Assignment 2 is out, due 3/5 at 11 pm. You're encouraged but not required to use an EC2 virtual machine. This week's (starting tomorrow) discussion sections will focus on TensorFlow. Midterm 1 is coming up on 2/26. Next week's sections will be midterm preparation.
Neural Network structure. Standard Neural Networks are DAGs (Directed Acyclic Graphs). That means they have a topological ordering. • The topological ordering is used for activation propagation, and for gradient back-propagation. Example block: Conv 3x3 → ReLU → Conv 3x3 → +. • These networks process one input minibatch at a time.
Recurrent Neural Networks (RNNs) One-step delay
Unrolling. RNNs can be unrolled across multiple time steps: the one-step delay links each copy of the network to the next. This produces a DAG which supports backpropagation, but its size depends on the input sequence length.
Unrolling RNNs Usually drawn as:
RNN structure. Often layers are stacked vertically (deep RNNs). The horizontal axis is time; the vertical axis is abstraction (higher-level features). The same parameters are used at each level across all time steps.
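As a minimal sketch (all sizes and names below are made up for illustration), a stacked RNN reuses each level's parameters at every time step, with level l feeding level l+1:

import numpy as np

H, D, L = 64, 32, 3                              # hidden size, input size, depth
Wxh = [np.random.randn(H, D if l == 0 else H) * 0.01 for l in range(L)]
Whh = [np.random.randn(H, H) * 0.01 for l in range(L)]
h = [np.zeros(H) for _ in range(L)]              # one hidden vector per level

def step(x):
    """Advance the whole stack by one time step; the same Wxh[l], Whh[l]
    are reused at every time step (parameter sharing across time)."""
    inp = x
    for l in range(L):
        h[l] = np.tanh(Wxh[l] @ inp + Whh[l] @ h[l])
        inp = h[l]                               # higher level = higher abstraction
    return h[-1]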
RNN structure. Backprop still works: activations propagate forward through the unrolled graph (forward computation), left to right in time and bottom to top in abstraction.
RNN structure. Backprop still works: gradients propagate backward through the same graph (backward computation), from later time steps to earlier ones and from higher layers to lower ones.
RNN unrolling. Question: Can you run forward/backward inference on an unrolled RNN, i.e. keeping only one copy of its state? Forward: Yes. Backward: No, because backprop needs the activations from every unrolled time step, so the state at each step must be stored.
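As a concrete illustration (a minimal numpy sketch, not from the slides; sizes and names are made up), forward inference can overwrite a single state buffer, while training must keep the hidden state at every step for the backward pass:

import numpy as np

H, D = 64, 32                          # hidden size, input size (hypothetical)
Wxh = np.random.randn(H, D) * 0.01
Whh = np.random.randn(H, H) * 0.01

def forward_inference(xs):
    """One state buffer is enough: h is simply overwritten at each step."""
    h = np.zeros(H)
    for x in xs:
        h = np.tanh(Wxh @ x + Whh @ h)           # old h no longer needed
    return h

def forward_for_training(xs):
    """Backprop needs h_t for every t (for the tanh derivative and the
    product with Whh), so the unrolled states must all be stored."""
    hs = [np.zeros(H)]
    for x in xs:
        hs.append(np.tanh(Wxh @ x + Whh @ hs[-1]))
    return hs                                    # memory grows with sequence length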
Recurrent Networks offer a lot of flexibility: Vanilla Neural Networks. Based on CS231n by Fei-Fei Li & Andrej Karpathy & Justin Johnson.
Recurrent Networks offer a lot of flexibility: e.g. Image Captioning (image -> sequence of words). Based on CS231n by Fei-Fei Li & Andrej Karpathy & Justin Johnson.
Recurrent Networks offer a lot of flexibility: e.g. Sentiment Classification (sequence of words -> sentiment). Based on CS231n by Fei-Fei Li & Andrej Karpathy & Justin Johnson.
Recurrent Networks offer a lot of flexibility: e.g. Machine Translation (sequence of words -> sequence of words). Based on CS231n by Fei-Fei Li & Andrej Karpathy & Justin Johnson.
Recurrent Networks offer a lot of flexibility: e.g. Video classification at the frame level. Based on CS231n by Fei-Fei Li & Andrej Karpathy & Justin Johnson.
Recurrent Neural Network: a block RNN with input x and internal state h fed back with a one-step delay. Based on CS231n by Fei-Fei Li & Andrej Karpathy & Justin Johnson.
Recurrent Neural Network. We usually want to predict an output vector y at some time steps. Based on CS231n by Fei-Fei Li & Andrej Karpathy & Justin Johnson.
Recurrent Neural Network. We can process a sequence of vectors x by applying a recurrence formula at every time step: h_t = f_W(h_{t-1}, x_t), where h_t is the new state, h_{t-1} the old state, x_t the input vector at this time step, and f_W some function with parameters W. Based on CS231n by Fei-Fei Li & Andrej Karpathy & Justin Johnson.
Recurrent Neural Network. Notice: the same function and the same set of parameters are used at every time step. Based on CS231n by Fei-Fei Li & Andrej Karpathy & Justin Johnson.
(Vanilla) Recurrent Neural Network. The state consists of a single "hidden" vector h: h_t = tanh(W_hh h_{t-1} + W_xh x_t), y_t = W_hy h_t. Based on CS231n by Fei-Fei Li & Andrej Karpathy & Justin Johnson.
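A minimal numpy sketch of this vanilla recurrence (the class wrapper and sizes are assumptions for illustration):

import numpy as np

class VanillaRNN:
    def __init__(self, input_size, hidden_size, output_size):
        self.Wxh = np.random.randn(hidden_size, input_size) * 0.01
        self.Whh = np.random.randn(hidden_size, hidden_size) * 0.01
        self.Why = np.random.randn(output_size, hidden_size) * 0.01
        self.h = np.zeros(hidden_size)

    def step(self, x):
        # same function, same parameters at every time step
        self.h = np.tanh(self.Whh @ self.h + self.Wxh @ x)
        return self.Why @ self.h                 # output y_t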
Character-level language model example. Vocabulary: [h, e, l, o]. Example training sequence: "hello". Based on CS231n by Fei-Fei Li & Andrej Karpathy & Justin Johnson.
min-char-rnn.py gist: 112 lines of Python (https://gist.github.com/karpathy/d4dee566867f8291f086). Based on CS231n by Fei-Fei Li & Andrej Karpathy & Justin Johnson.
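For orientation, here is a heavily abbreviated toy sketch in the spirit of min-char-rnn.py, showing only the forward/sampling path for the [h, e, l, o] vocabulary; the training loop (loss and backprop through time) is omitted, and all sizes are made up:

import numpy as np

vocab = ['h', 'e', 'l', 'o']
char_to_ix = {c: i for i, c in enumerate(vocab)}
V, H = len(vocab), 16
Wxh = np.random.randn(H, V) * 0.01
Whh = np.random.randn(H, H) * 0.01
Why = np.random.randn(V, H) * 0.01

def sample(seed_char, n, h=None):
    """Feed a seed character, then repeatedly sample the next character
    from the softmax over the output scores, feeding each sample back in."""
    h = np.zeros(H) if h is None else h
    x = np.zeros(V); x[char_to_ix[seed_char]] = 1.0   # one-hot input
    out = []
    for _ in range(n):
        h = np.tanh(Wxh @ x + Whh @ h)
        p = np.exp(Why @ h); p /= p.sum()             # softmax over the vocabulary
        ix = np.random.choice(V, p=p)
        out.append(vocab[ix])
        x = np.zeros(V); x[ix] = 1.0                  # feed the sample back in
    return ''.join(out)

print(sample('h', 4))   # untrained weights, so the output is random characters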
Applications - Poetry. http://karpathy.github.io/2015/05/21/rnn-effectiveness/ Based on CS231n by Fei-Fei Li & Andrej Karpathy & Justin Johnson.
(Vanilla) Recurrent Neural Network. http://karpathy.github.io/2015/05/21/rnn-effectiveness/ Based on CS231n by Fei-Fei Li & Andrej Karpathy & Justin Johnson.
At first: train more Based on cs 231 n by Fei-Fei Li & Andrej Karpathy & Justin Johnson
And later: Based on cs 231 n by Fei-Fei Li & Andrej Karpathy & Justin Johnson
Image Captioning • "Explain Images with Multimodal Recurrent Neural Networks," Mao et al. • "Deep Visual-Semantic Alignments for Generating Image Descriptions," Karpathy and Fei-Fei • "Show and Tell: A Neural Image Caption Generator," Vinyals et al. • "Long-term Recurrent Convolutional Networks for Visual Recognition and Description," Donahue et al. • "Learning a Recurrent Visual Representation for Image Caption Generation," Chen and Zitnick. Based on CS231n by Fei-Fei Li & Andrej Karpathy & Justin Johnson.
Image captioning pipeline: a Convolutional Neural Network encodes the image, and a Recurrent Neural Network generates the caption. Based on CS231n by Fei-Fei Li & Andrej Karpathy & Justin Johnson.
Test image walk-through: run the test image through the CNN, then start the RNN with input x0 = <START>.
Test image. Compute y0 from h0. Before: h = tanh(Wxh * x + Whh * h). Now: h = tanh(Wxh * x + Whh * h + Wih * v), where v is the image feature from the CNN and the first input is x0 = <START>.
Test image. At each step, sample a word from y_t and feed it back in as the next input: sample "straw" (becomes x1), compute y1 and sample "hat" (becomes x2), compute y2 and sample the <END> token => finish.
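A sketch of the captioning recurrence described above, with the extra Wih * v term from the slide; the CNN, the word-embedding lookup (embed), and all sizes are stand-ins for illustration, and greedy argmax decoding is used here in place of sampling:

import numpy as np

H, D, K, VOCAB = 512, 256, 4096, 10000   # hypothetical sizes
Wxh = np.random.randn(H, D) * 0.01
Whh = np.random.randn(H, H) * 0.01
Wih = np.random.randn(H, K) * 0.01
Why = np.random.randn(VOCAB, H) * 0.01

def caption_step(x, h, v):
    # before: h = tanh(Wxh @ x + Whh @ h)
    # now:    the image feature v enters through the extra term Wih @ v
    h = np.tanh(Wxh @ x + Whh @ h + Wih @ v)
    scores = Why @ h                     # word scores (before softmax)
    return h, scores

def generate(v, embed, start_vec, end_token, max_len=20):
    """Feed <START>, pick a word, feed it back in; stop at <END>."""
    h, x, words = np.zeros(H), start_vec, []
    for _ in range(max_len):
        h, scores = caption_step(x, h, v)
        w = int(np.argmax(scores))       # greedy choice (sampling also works)
        if w == end_token:
            break
        words.append(w)
        x = embed(w)                     # hypothetical word-embedding lookup
    return words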
RNN sequence generation. Greedy (most likely) symbol generation is not very effective. Typically, the top-k sequences generated so far are remembered and the top-k of their one-symbol continuations are kept for the next step (beam search). However, k does not have to be very large (k = 7 was used in Karpathy and Fei-Fei 2015).
Beam search with k = 2. Step 1: from <START>, keep the top-2 most likely first words: "straw" (0.2) and "man" (0.1).
Beam search with k = 2. Step 2: expand each kept beam with its top-2 most likely next words ("hat", "roof", "with", "sitting", …). We need the top-k continuations of each beam because they might be part of the top-k most likely subsequences. Top sequences so far: "straw hat" (p = 0.06) and "man with" (p = 0.03).
Beam search with k = 2. Step 3: the best 2 words at this position extend the beams to "straw hat <end>" (0.06 × 0.1 = 0.006) and "man with straw" (0.03 × 0.3 = 0.009).
Beam search with k = 2. Step 4: the surviving beam continues: "man with straw" (0.009) picks "hat" (0.4) and then "<end>" (0.1), ending with "man with straw hat <end>" (0.00036).
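A compact beam-search sketch matching the k = 2 walk-through; step_fn is a hypothetical decoder interface (state, last word) -> (new state, log-probabilities over the vocabulary), not an API from any particular library:

import numpy as np

def beam_search(step_fn, init_state, start, end, k=2, max_len=20):
    # each beam: (cumulative log probability, word sequence, decoder state)
    beams = [(0.0, [start], init_state)]
    finished = []
    for _ in range(max_len):
        candidates = []
        for logp, seq, state in beams:
            new_state, log_probs = step_fn(state, seq[-1])
            # keep only the top-k continuations of each beam
            for w in np.argsort(log_probs)[::-1][:k]:
                candidates.append((logp + log_probs[w], seq + [int(w)], new_state))
        # keep the top-k candidate sequences overall
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for cand in candidates[:k]:
            (finished if cand[1][-1] == end else beams).append(cand)
        if not beams:
            break
    return max(finished + beams, key=lambda c: c[0])[1]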
Image Sentence Datasets. Microsoft COCO [Tsung-Yi Lin et al. 2014], mscoco.org. Currently ~120K images, ~5 sentences each.
RNN attention networks. The RNN attends spatially to different parts of the image while generating each word of the sentence: Show, Attend and Tell, Xu et al., 2015.
Sequential Processing of fixed inputs Multiple Object Recognition with Visual Attention, Ba et al. 2015 Based on cs 231 n by Fei-Fei Li & Andrej Karpathy & Justin Johnson
Sequential Processing of fixed inputs DRAW: A Recurrent Neural Network For Image Generation, Gregor et al. Based on cs 231 n by Fei-Fei Li & Andrej Karpathy & Justin Johnson
Vanilla RNN: Exploding/Vanishing Gradients. Backpropagating through the recurrence h_t = tanh(W_hh h_{t-1} + W_xh x_t) multiplies the gradient by W_hh^T (and by tanh derivatives, which are at most 1) once per time step, so over long sequences the gradient can grow or shrink exponentially.
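A small numerical illustration (not from the slides) of the effect: repeatedly multiplying a gradient vector by Whh^T makes its norm scale roughly like the largest singular value of Whh raised to the number of time steps:

import numpy as np

H, T = 64, 50
np.random.seed(0)
grad = np.random.randn(H)

for scale in [0.5, 1.0, 2.0]:                    # singular values below/at/above 1
    Whh = np.linalg.qr(np.random.randn(H, H))[0] * scale   # orthogonal matrix, rescaled
    g = grad.copy()
    for _ in range(T):
        g = Whh.T @ g                            # ignoring tanh' <= 1, which only
    print(scale, np.linalg.norm(g))              # makes the vanishing case worse
# scale 0.5: the norm shrinks toward 0 (vanishing); scale 2.0: it blows up (exploding)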
Better RNN Memory: LSTMs. Recall: "PlainNets" vs. ResNets. ResNets are very deep networks. They use residual connections as "hints" to approximate the identity function. Based on CS231n by Fei-Fei Li & Andrej Karpathy & Justin Johnson.
Better RNN Memory: LSTMs
Better RNN Memory: LSTMs. They also have non-linear, linearly transformed hidden states like a standard RNN, plus a memory path that carries the cell state forward with no transformation (in contrast to the general linear transformation W applied on the hidden path).
LSTM: Long Short-Term Memory. Vanilla RNN: the new hidden state is a single tanh of a linear function of the input from below (x_t) and the input from the left (h_{t-1}): h_t = tanh(W_xh x_t + W_hh h_{t-1}). LSTM (much more widely used): four linear functions of the same two inputs produce the gates i, f, o (sigmoids) and the candidate g (tanh); then c_t = f ⊙ c_{t-1} + i ⊙ g and h_t = o ⊙ tanh(c_t). Based on CS231n by Fei-Fei Li & Andrej Karpathy & Justin Johnson.
LSTM: Anything you can do… An LSTM can emulate a simple RNN by remembering with i and o: setting f = 0, i = 1, o = 1 gives c = g and h = tanh(c), which matches the vanilla RNN update up to an extra tanh, so the emulation is only "almost" exact.
LSTM diagram: "gain" nodes take values in [0, 1] (the gates i, f, o); "data" nodes take values in [-1, 1] (tanh outputs). Inputs arrive from below (x) and from the left (h); the candidate g passes through a tanh into the cell, and the cell passes through another tanh on its way out to h. "Peepholes" (connections from the cell state to the gates) exist in some variations. Based on CS231n by Fei-Fei Li & Andrej Karpathy & Justin Johnson.
LSTM Arrays • W
Open source textbook on algebraic geometry (LaTeX source). http://karpathy.github.io/2015/05/21/rnn-effectiveness/ Based on CS231n by Fei-Fei Li & Andrej Karpathy & Justin Johnson.
Generated math. http://karpathy.github.io/2015/05/21/rnn-effectiveness/
Train on Linux code. http://karpathy.github.io/2015/05/21/rnn-effectiveness/
Generated C code. Based on CS231n by Fei-Fei Li & Andrej Karpathy & Justin Johnson.
Long Short Term Memory (LSTM) [Hochreiter & Schmidhuber, 1997]. In one timestep, the cell state is updated additively, c_t = f ⊙ c_{t-1} + i ⊙ g, and the hidden state sent to the higher layer (or to the prediction) is h_t = o ⊙ tanh(c_t). Based on CS231n by Fei-Fei Li & Andrej Karpathy & Justin Johnson.
LSTM Summary: 1. Decide what to forget (forget gate f). 2. Decide what new things to remember (input gate i and candidate g). 3. Decide what to output (output gate o). Based on CS231n by Fei-Fei Li & Andrej Karpathy & Justin Johnson.
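A numpy sketch of one LSTM timestep as summarized above; the single stacked weight matrix W and the gate ordering are layout assumptions, and peephole connections are omitted:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """x: input from below, h_prev: input from the left, c_prev: cell state.
    W has shape (4H, D+H); b has shape (4H,)."""
    H = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b      # one big matrix multiply
    i = sigmoid(z[0*H:1*H])                      # input gate: what to write
    f = sigmoid(z[1*H:2*H])                      # forget gate: what to keep
    o = sigmoid(z[2*H:3*H])                      # output gate: what to reveal
    g = np.tanh(z[3*H:4*H])                      # candidate cell content
    c = f * c_prev + i * g                       # 1. forget, 2. remember
    h = o * np.tanh(c)                           # 3. output
    return h, c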
Searching for interpretable cells [Visualizing and Understanding Recurrent Networks, Andrej Karpathy*, Justin Johnson*, Li Fei-Fei] Based on cs 231 n by Fei-Fei Li & Andrej Karpathy & Justin Johnson
Searching for interpretable cells quote detection cell (LSTM, tanh(c), red = -1, blue = +1) Based on cs 231 n by Fei-Fei Li & Andrej Karpathy & Justin Johnson
Searching for interpretable cells line length tracking cell Based on cs 231 n by Fei-Fei Li & Andrej Karpathy & Justin Johnson
Searching for interpretable cells if statement cell Based on cs 231 n by Fei-Fei Li & Andrej Karpathy & Justin Johnson
Searching for interpretable cells quote/comment cell Based on cs 231 n by Fei-Fei Li & Andrej Karpathy & Justin Johnson
Searching for interpretable cells code depth cell Based on cs 231 n by Fei-Fei Li & Andrej Karpathy & Justin Johnson
LSTM stability. How fast can the cell value c grow with time? Linearly. How fast can the backpropagated gradient of c grow with time? Linearly.
Better RNN Memory: LSTMs. The cell path applies no transformation, so gradient growth along it is linear; the hidden path applies a general linear transformation W at every step, so gradients along it can grow or shrink exponentially.
LSTM variants and friends. [An Empirical Exploration of Recurrent Network Architectures, Jozefowicz et al., 2015] [LSTM: A Search Space Odyssey, Greff et al., 2015] GRU [Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation, Cho et al., 2014] Based on CS231n by Fei-Fei Li & Andrej Karpathy & Justin Johnson.
LSTM variants and friends. GRU: two gates, reset r and update z, both computed from (h_{t-1}, x_t) with sigmoids; a candidate state h̃_t = tanh(W_xh x_t + W_hh (r ⊙ h_{t-1})); and a new state that blends old and candidate, h_t = z ⊙ h_{t-1} + (1 - z) ⊙ h̃_t. There is no separate cell state.
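For comparison, a numpy sketch of one GRU step; the weight names are assumptions, and note that papers differ on whether z or (1 - z) multiplies the old state:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x, h_prev, Wr, Wz, Wh):
    """Two gates (reset r, update z) and no separate cell state.
    Wr, Wz, Wh all have shape (H, D+H)."""
    xh = np.concatenate([x, h_prev])
    r = sigmoid(Wr @ xh)                                     # reset gate
    z = sigmoid(Wz @ xh)                                     # update gate
    h_tilde = np.tanh(Wh @ np.concatenate([x, r * h_prev]))  # candidate state
    h = z * h_prev + (1.0 - z) * h_tilde                     # blend old and new
    return h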
Summary
- RNNs are a widely used model for sequential data, including text.
- RNNs are trainable with backprop when unrolled over time.
- RNNs learn complex and varied patterns in sequential data.
- Vanilla RNNs are simple but don't work very well: the backward flow of gradients in an RNN can explode or vanish.
- Common to use LSTM or GRU: the memory path lets them stably learn long-distance interactions.
- Better/simpler architectures are a hot topic of current research.