Network Structure Hung-yi Lee 李宏毅
Three Steps for Deep Learning
Step 1: Neural Network • Step 2: Cost Function • Step 3: Optimization
Step 1. A neural network is a function composed of simple functions (neurons).
• Usually we design the network structure and let the machine find the parameters from data.
Step 2. A cost function evaluates how good a set of parameters is.
• We design the cost function based on the task.
Step 3. Find the best function, i.e., the best set of parameters (e.g., by gradient descent).
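The three steps can be illustrated with a minimal sketch. It assumes a single linear neuron as the "network", mean squared error as the cost, and hand-derived gradients for gradient descent; the toy data, learning rate, and step count are hypothetical choices made only for illustration.

```python
import numpy as np

# Step 1: we fix the network structure: one linear neuron f(x) = w*x + b.
def f(params, x):
    w, b = params
    return w * x + b

# Step 2: a cost function scoring a set of parameters
# (mean squared error, an assumed choice for a regression task).
def cost(params, x, y):
    return np.mean((f(params, x) - y) ** 2)

# Step 3: gradient descent lets the data determine the parameters.
def gradient_descent(x, y, lr=0.1, steps=500):
    w, b = 0.0, 0.0
    for _ in range(steps):
        err = f((w, b), x) - y          # residual for each example
        grad_w = 2 * np.mean(err * x)   # d cost / d w
        grad_b = 2 * np.mean(err)       # d cost / d b
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical toy data generated by y = 3x + 1.
x = np.linspace(-1.0, 1.0, 50)
y = 3 * x + 1
params = gradient_descent(x, y)
```

Only the structure in Step 1 and the cost in Step 2 are designed by hand; the values of `w` and `b` come entirely from the data.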
Outline
• Basic structure (3/03)
  • Fully Connected Layer
  • Recurrent Structure
  • Convolutional/Pooling Layer
• Special Structure (3/17)
  • Spatial Transformation Layer
  • Highway Network / Grid LSTM
  • Recursive Structure
  • External Memory
  • Batch Normalization
• Sequence-to-sequence / Attention (3/24)
Prerequisite
• Brief Introduction of Deep Learning
  • https://youtu.be/Dr-WRlEFefw?list=PLJV_el3uVTsPy9oCRY30oBPNLCo89yu49
• Convolutional Neural Network
  • https://youtu.be/FrKWiRv254g?list=PLJV_el3uVTsPy9oCRY30oBPNLCo89yu49
• Recurrent Neural Network (Part I)
  • https://youtu.be/xCGidAeyS4M?list=PLJV_el3uVTsPy9oCRY30oBPNLCo89yu49
• Recurrent Neural Network (Part II)
  • https://www.youtube.com/watch?v=rTqmWlnwz_0&list=PLJV_el3uVTsPy9oCRY30oBPNLCo89yu49&index=25
Basic Structure: Fully Connected Layer
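Fully Connected Layer
Output of a neuron: $a_i^l$, the output of neuron $i$ at layer $l$.
Output of one layer: $a^l$, a vector collecting the outputs of all neurons in layer $l$.
Layer $l-1$ has $N_{l-1}$ nodes; layer $l$ has $N_l$ nodes.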
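Fully Connected Layer: Layer $l-1$ to Layer $l$
$w_{ij}^l$: weight from neuron $j$ (Layer $l-1$) to neuron $i$ (Layer $l$)
$W^l$: the $N_l \times N_{l-1}$ weight matrix whose entry in row $i$, column $j$ is $w_{ij}^l$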
Fully Connected Layer
$b_i^l$: bias for neuron $i$ at layer $l$
$b^l$: bias for all neurons in layer $l$ (a vector)
Fully Connected Layer
$z_i^l$: input of the activation function for neuron $i$ at layer $l$
$z^l$: input of the activation function for all the neurons in layer $l$ (a vector)
Relations between Layer Outputs
$z_i^l = w_{i1}^l a_1^{l-1} + w_{i2}^l a_2^{l-1} + \cdots + b_i^l = \sum_{j=1}^{N_{l-1}} w_{ij}^l a_j^{l-1} + b_i^l$
$z^l = W^l a^{l-1} + b^l$
$a_i^l = \sigma(z_i^l)$, so $a^l = \sigma(z^l)$
$a^l = \sigma(W^l a^{l-1} + b^l)$
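The relation $a^l = \sigma(W^l a^{l-1} + b^l)$ maps directly onto one matrix-vector product. A minimal NumPy sketch, assuming σ is the sigmoid function and using hypothetical layer sizes and random weights:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fully_connected(a_prev, W, b, activation=sigmoid):
    """a^l = activation(W^l a^{l-1} + b^l) for one fully connected layer.

    a_prev: shape (N_{l-1},), outputs of layer l-1
    W:      shape (N_l, N_{l-1}); W[i, j] is w_ij, from neuron j to neuron i
    b:      shape (N_l,), one bias per neuron in layer l
    """
    z = W @ a_prev + b      # z^l: all z_i^l computed at once
    return activation(z)    # a^l = sigma(z^l), applied element-wise

# Hypothetical sizes: N_{l-1} = 3, N_l = 2.
rng = np.random.default_rng(0)
a_prev = rng.normal(size=3)
W = rng.normal(size=(2, 3))
b = rng.normal(size=2)
a = fully_connected(a_prev, W, b)

# The per-neuron form z_i = sum_j w_ij a_j + b_i gives the same value for neuron 0.
z0 = sum(W[0, j] * a_prev[j] for j in range(3)) + b[0]
```

Stacking layers is just repeated calls, each feeding its output `a` in as the next `a_prev`.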
Basic Structure: Recurrent Structure Simplify the network by using the same function again and again
Reference
• K. Greff, R. K. Srivastava, J. Koutník, B. R. Steunebrink, J. Schmidhuber, “LSTM: A Search Space Odyssey,” in IEEE Transactions on Neural Networks and Learning Systems, 2016
• R. Józefowicz, W. Zaremba, I. Sutskever, “An Empirical Exploration of Recurrent Network Architectures,” in ICML, 2015
• https://www.cs.toronto.edu/~graves/preprint.pdf
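Recurrent Neural Network
Given function f: $h', y = f(h, x)$, where h and h’ are vectors with the same dimension.
$h^0, x^1 \rightarrow f \rightarrow h^1, y^1$; $h^1, x^2 \rightarrow f \rightarrow h^2, y^2$; $h^2, x^3 \rightarrow f \rightarrow h^3, y^3$; ……
No matter how long the input/output sequence is, we only need one function f.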
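The point that one function f suffices for any sequence length can be sketched as a loop that keeps feeding the state back in. The toy `running_sum` function below is a hypothetical stand-in for a learned f, chosen so the outputs are easy to check.

```python
def unroll(f, h0, xs):
    """Apply the same f: (h, x) -> (h', y) along a sequence of any length.

    h and h' must have the same dimension (here: the same type), so the
    output state can be fed back in as the next input state."""
    h, ys = h0, []
    for x in xs:
        h, y = f(h, x)
        ys.append(y)
    return h, ys

# Hypothetical toy f: the state is a running sum, the output is the sum so far.
def running_sum(h, x):
    h_new = h + x
    return h_new, h_new

h3, ys = unroll(running_sum, 0, [1, 2, 3])
```

The same `running_sum` handles a sequence of length 3 or length 1000 without any change.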
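Deep RNN
$h', y = f_1(h, x)$ and $b', c = f_2(b, y)$
The output sequence $y^1, y^2, y^3, \dots$ of $f_1$ (started from $h^0$) is the input sequence of $f_2$ (started from $b^0$), which produces $c^1, c^2, c^3, \dots$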
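Stacking can be sketched by unrolling one layer over the whole sequence and handing its outputs to the next layer. The two toy layers are hypothetical: `f1` keeps a running sum and `f2` outputs the change since its previous input, so in this particular example the second layer happens to undo the first.

```python
def unroll(f, h0, xs):
    """Apply the same f: (h, x) -> (h', y) at every step; return final state and outputs."""
    h, ys = h0, []
    for x in xs:
        h, y = f(h, x)
        ys.append(y)
    return h, ys

def deep_rnn(layers, xs):
    """layers: list of (f, initial_state). The output sequence of one layer
    is the input sequence of the next layer."""
    seq = list(xs)
    for f, h0 in layers:
        _, seq = unroll(f, h0, seq)
    return seq

# Hypothetical toy layers.
def f1(h, x):
    # state = running sum; output y = running sum
    h = h + x
    return h, h

def f2(b, y):
    # state = previous input; output c = change since the previous input
    return y, y - b

ys = deep_rnn([(f1, 0)], [3, -5, 4])            # output of f1 alone
cs = deep_rnn([(f1, 0), (f2, 0)], [3, -5, 4])   # f1 followed by f2
```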
Bidirectional RNN
$h', a = f_1(h, x)$ runs forward over $x^1, x^2, x^3$; $b', c = f_2(b, x)$ runs backward over $x^3, x^2, x^1$; then $y = f_3(a, c)$ combines the two outputs at each time step, e.g., $y^1 = f_3(a^1, c^1)$.
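A minimal sketch of the bidirectional arrangement, assuming toy functions: both directions use a hypothetical running sum and `f3` simply adds the two outputs. With these choices every $y^t$ equals the total of the whole sequence plus $x^t$, which shows that each output depends on the entire input.

```python
def unroll(f, h0, xs):
    """Apply the same f: (h, x) -> (h', y) at every step; return final state and outputs."""
    h, ys = h0, []
    for x in xs:
        h, y = f(h, x)
        ys.append(y)
    return h, ys

def bidirectional(f1, f2, f3, h0, b0, xs):
    xs = list(xs)
    _, a = unroll(f1, h0, xs)          # forward direction: x^1, x^2, ..., x^T
    _, c = unroll(f2, b0, xs[::-1])    # reverse direction: x^T, ..., x^1
    c = c[::-1]                        # re-align so c[t] belongs to the same step as a[t]
    return [f3(a_t, c_t) for a_t, c_t in zip(a, c)]

# Hypothetical toy f1 = f2: running sum of the inputs seen so far.
def running_sum(h, x):
    h = h + x
    return h, h

ys = bidirectional(running_sum, running_sum, lambda a, c: a + c, 0, 0, [1, 2, 3])
```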
Pyramidal RNN
• Reducing the number of time steps
W. Chan, N. Jaitly, Q. Le and O. Vinyals, “Listen, attend and spell: A neural network for large vocabulary conversational speech recognition,” ICASSP, 2016
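One way to reduce the number of time steps, as in the cited Listen, Attend and Spell paper, is to concatenate the outputs of consecutive steps before feeding them to the next layer. The sketch below shows only that reduction step (not the RNN layers themselves); the factor of 2 and dropping leftover frames are assumptions.

```python
import numpy as np

def pyramid_reduce(h, factor=2):
    """Concatenate each run of `factor` consecutive frames into one frame.

    h: shape (T, d) -> shape (T // factor, factor * d). Leftover frames that
    do not fill a whole group are dropped (an assumed convention)."""
    T, d = h.shape
    T_out = T // factor
    return h[: T_out * factor].reshape(T_out, factor * d)

# Hypothetical lower-layer outputs: 6 time steps, 2 dimensions each.
h = np.arange(12).reshape(6, 2)
h2 = pyramid_reduce(h)     # 3 time steps, 4 dimensions each
```

Applying the reduction between each of three stacked layers shortens the sequence by a factor of 8.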
Naïve RNN
Given function f: $h', y = f(h, x)$
$h' = \sigma(W^h h + W^i x)$
$y = \mathrm{softmax}(W^o h')$
Ignore bias here
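A direct NumPy transcription of the naïve RNN step, assuming σ is the sigmoid and biases are ignored as on the slide; the layer sizes and random weights are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(v):
    e = np.exp(v - np.max(v))   # shift by the max for numerical stability
    return e / e.sum()

def naive_rnn(h, x, Wh, Wi, Wo):
    """h', y = f(h, x) for a naive RNN, biases ignored."""
    h_new = sigmoid(Wh @ h + Wi @ x)   # h' = sigma(W^h h + W^i x)
    y = softmax(Wo @ h_new)            # y  = softmax(W^o h')
    return h_new, y

# Hypothetical sizes: state 4, input 3, output 5.
rng = np.random.default_rng(0)
Wh, Wi, Wo = rng.normal(size=(4, 4)), rng.normal(size=(4, 3)), rng.normal(size=(5, 4))
h = np.zeros(4)
for x in rng.normal(size=(6, 3)):      # a length-6 input sequence
    h, y = naive_rnn(h, x, Wh, Wi, Wo)
```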
Naive RNN vs. LSTM
Naive RNN: $h^{t-1}, x^t \rightarrow h^t, y^t$. LSTM: $c^{t-1}, h^{t-1}, x^t \rightarrow c^t, h^t, y^t$.
• c changes slowly: $c^t$ is $c^{t-1}$ plus something.
• h changes faster: $h^t$ and $h^{t-1}$ can be very different.
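LSTM: the four inputs are computed from $x^t$ and $h^{t-1}$
$z = \tanh(W [x^t; h^{t-1}])$
$z^i = \sigma(W^i [x^t; h^{t-1}])$ (input gate)
$z^f = \sigma(W^f [x^t; h^{t-1}])$ (forget gate)
$z^o = \sigma(W^o [x^t; h^{t-1}])$ (output gate)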
“Peephole”: $c^{t-1}$ is also fed in, so $z = \tanh(W [x^t; h^{t-1}; c^{t-1}])$, where the part of $W$ that multiplies $c^{t-1}$ is diagonal. $z^i$, $z^f$, $z^o$ are obtained in the same way.
$c^t = z^f \odot c^{t-1} + z^i \odot z$
$h^t = z^o \odot \tanh(c^t)$
$y^t = \sigma(W' h^t)$
LSTM over time (figure): the same cell is applied at steps t and t+1; $c^t$ and $h^t$ from step t, together with $x^{t+1}$, produce $c^{t+1}$, $h^{t+1}$, and $y^{t+1}$.
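The LSTM equations above can be sketched as one step function reused over time. This assumes no peepholes and no biases, and that every weight matrix acts on the concatenation $[x^t; h^{t-1}]$; the output $y^t$ is omitted, and the sizes and random weights are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(c_prev, h_prev, x, W, Wi, Wf, Wo):
    """One LSTM step (no peepholes, no biases). Every matrix acts on [x^t; h^{t-1}]."""
    v = np.concatenate([x, h_prev])
    z = np.tanh(W @ v)        # block input
    zi = sigmoid(Wi @ v)      # input gate
    zf = sigmoid(Wf @ v)      # forget gate
    zo = sigmoid(Wo @ v)      # output gate
    c = zf * c_prev + zi * z  # c^t = z^f * c^{t-1} + z^i * z  (element-wise)
    h = zo * np.tanh(c)       # h^t = z^o * tanh(c^t)
    return c, h

# Hypothetical sizes: cell/state dimension 4, input dimension 3 (so [x; h] has 7 entries).
rng = np.random.default_rng(0)
W, Wi, Wf, Wo = (rng.normal(size=(4, 7)) for _ in range(4))
c, h = np.zeros(4), np.zeros(4)
for x in rng.normal(size=(5, 3)):   # the same cell is reused at every step
    c, h = lstm_step(c, h, x, W, Wi, Wf, Wo)
```

Because $c^t$ is updated additively, it tends to change slowly, while $h^t$ is recomputed through the output gate at every step.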
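GRU
Reset gate: $r = \sigma(W^r [x^t; h^{t-1}])$; update gate: $z = \sigma(W^z [x^t; h^{t-1}])$
$h' = \tanh(W [x^t; r \odot h^{t-1}])$
$h^t = z \odot h^{t-1} + (1 - z) \odot h'$, and $y^t$ is computed from $h^t$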
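A matching sketch of one GRU step, again assuming no biases. Which of $h^{t-1}$ and $h'$ is weighted by $z$ versus $1 - z$ varies between references; this follows the equations above. Sizes and weights are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(h_prev, x, W, Wr, Wz):
    """One GRU step (no biases). Gates act on [x^t; h^{t-1}]."""
    v = np.concatenate([x, h_prev])
    r = sigmoid(Wr @ v)                                    # reset gate
    z = sigmoid(Wz @ v)                                    # update gate
    h_cand = np.tanh(W @ np.concatenate([x, r * h_prev]))  # h' uses the reset state
    return z * h_prev + (1 - z) * h_cand                   # h^t: mix of old state and h'

# Hypothetical sizes: state dimension 4, input dimension 3.
rng = np.random.default_rng(0)
W, Wr, Wz = (rng.normal(size=(4, 7)) for _ in range(3))
h = np.zeros(4)
for x in rng.normal(size=(5, 3)):
    h = gru_step(h, x, W, Wr, Wz)
```

Compared with the LSTM sketch there is no separate cell $c$: the single state $h$ plays both roles, and one update gate replaces the input and forget gates.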
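Example Task
• (Simplified) Speech Recognition: Frame classification on TIMIT
(Figure: each acoustic frame $x^1, x^2, x^3, x^4, \dots$ of an utterance is classified into a phoneme label $y^1, y^2, y^3, y^4, \dots$; e.g., Utterance 1 is labeled TSI TSI I … I N …, and Utterance 2 is labeled N N S … S @ @ ….)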
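Target Delay
• Only for unidirectional RNN
• Delay 3 steps: x x x TSI TSI TSI I I N N N N (x: no target; the remaining entries are the true labels, shifted 3 steps later)
• This lets a unidirectional RNN see 3 future frames before it outputs each label; a bidirectional RNN already has that context.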
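Building the delayed target sequence is a simple shift. A minimal sketch, assuming the placeholder `"x"` marks steps the cost should ignore and that the input sequence is padded with `delay` extra frames at the end (not shown) so that inputs and targets have equal length:

```python
def delay_targets(labels, delay, no_target="x"):
    """Return the target sequence for training with target delay.

    The first `delay` steps have no target (marked `no_target`, to be ignored by
    the cost), and the true label of frame t becomes the target at step t + delay."""
    return [no_target] * delay + list(labels)

true_labels = ["TSI", "TSI", "TSI", "I", "I", "N", "N", "N", "N"]
targets = delay_targets(true_labels, 3)
```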
LSTM > RNN > feedforward • Bidirectional > unidirectional
(Figure panels: Forward direction, Reverse direction)
Training LSTM is faster than training RNN
LSTM: A Search Space Odyssey
• Standard LSTM works well
• Simplify LSTM: coupling the input and forget gates, or removing peepholes, does not significantly hurt performance
• Forget gate is critical for performance
• Output gate activation function is critical
An Empirical Exploration of Recurrent Network Architectures
• LSTM-f/i/o: removing the forget/input/output gate
• LSTM-b: large forget-gate bias
• Importance: forget > input > output
• A large forget-gate bias is helpful
Basic Structure: Convolutional / Pooling Layer Simplify the neural network (based on prior knowledge about the task)
Convolutional Layer
Receptive Field
Sparse Connectivity: each neuron only connects to part of the outputs of the previous layer.
Different neurons have different, but overlapping, receptive fields. ……
Convolutional Layer
Sparse Connectivity: each neuron only connects to part of the outputs of the previous layer.
Parameter Sharing: neurons with different receptive fields can use the same set of parameters. ……
Fewer parameters than a fully connected layer
Convolutional Layer
Considering neurons 1 and 3 as “filter 1” (kernel 1) and neurons 2 and 4 as “filter 2” (kernel 2) ……
Filter (kernel) size: size of the receptive field of a neuron. Stride = 2.
Kernel size, number of filters, and stride are all designed by the developer.
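Sparse connectivity and parameter sharing can be made concrete with a single-channel 1-D convolution. In this sketch (an illustration, not code from the lecture) each filter is one weight vector shared by all of its neurons, and each neuron sees only a window of length K. As is usual in deep learning, it computes cross-correlation (no kernel flip) with no zero padding; the input and filter values are hypothetical.

```python
import numpy as np

def conv1d(x, filters, stride=1):
    """x: shape (T,) single-channel signal; filters: shape (F, K).

    Returns shape (F, T_out) with T_out = (T - K) // stride + 1 (no padding).
    All neurons of filter f share the weights filters[f] (parameter sharing),
    and neuron t only sees the window starting at t * stride (sparse connectivity)."""
    T = x.shape[0]
    F, K = filters.shape
    T_out = (T - K) // stride + 1
    out = np.empty((F, T_out))
    for t in range(T_out):
        window = x[t * stride : t * stride + K]   # receptive field of neuron t
        out[:, t] = filters @ window               # one value per filter
    return out

# Hypothetical input and two filters of size 3, stride 2.
x = np.arange(7.0)
filters = np.array([[1.0, 0.0, -1.0],
                    [1.0, 1.0, 1.0]])
out = conv1d(x, filters, stride=2)
```

Here the layer has only 2 × 3 = 6 weights, whereas a fully connected layer producing the same 6 outputs from 7 inputs would need 42.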
Example – 1-D Signal + Single Channel
• e.g., audio signal, stock value …
• Tasks: classification, predicting the future …
Example – 1-D Signal + Multiple Channels
A document: each word is a vector (e.g., “I like this movie very much ……”)
Does this kind of receptive field make sense?
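For the document case, a common design (assumed here, not stated on the slide) is to let each filter span K consecutive words and all C dimensions of each word vector, so the window slides only along the word axis. The word vectors below are random placeholders for the six words of the example sentence.

```python
import numpy as np

def conv1d_multichannel(X, filters, stride=1):
    """X: shape (T, C), T words, each a C-dim vector (C channels).
    filters: shape (F, K, C), each filter spans K consecutive words and all C dims.
    Returns shape (F, T_out) with T_out = (T - K) // stride + 1."""
    T, C = X.shape
    F, K, C2 = filters.shape
    if C != C2:
        raise ValueError("filter depth must equal the number of channels")
    T_out = (T - K) // stride + 1
    out = np.empty((F, T_out))
    for t in range(T_out):
        window = X[t * stride : t * stride + K]     # (K, C) block of word vectors
        # sum over the word and channel axes -> one value per filter
        out[:, t] = np.tensordot(filters, window, axes=([1, 2], [0, 1]))
    return out

# Hypothetical 4-dim vectors for "I like this movie very much" (6 words).
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
filters = rng.normal(size=(2, 2, 4))   # 2 filters, each covering 2 words
out = conv1d_multichannel(X, filters)
```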
Example – 2-D Signal + Single Channel
(Figure: a 6 x 6 black & white image with pixel values 0/1. Size of receptive field is 3 x 3, stride is 1, giving neurons 1 to 16. Only 1 filter is shown.)
Example – 2-D Signal + Multiple Channels
(Figure: a 6 x 6 colorful image with 3 channels. Size of receptive field is 3 x 3 x 3, stride is 1, giving neurons 1 to 16. Only 1 filter is shown.)
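Both image examples can be sketched with one 2-D convolution that handles C channels (C = 1 for black & white, C = 3 for color). As above, this assumes no zero padding, a square kernel, and cross-correlation; the test images are hypothetical.

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """image: (H, W, C); kernel: (k, k, C), a single filter. No zero padding.
    Output: (H_out, W_out) with H_out = (H - k) // stride + 1, same for W."""
    H, W, C = image.shape
    k = kernel.shape[0]
    H_out = (H - k) // stride + 1
    W_out = (W - k) // stride + 1
    out = np.empty((H_out, W_out))
    for i in range(H_out):
        for j in range(W_out):
            # k x k x C receptive field of neuron (i, j)
            patch = image[i * stride : i * stride + k, j * stride : j * stride + k, :]
            out[i, j] = np.sum(patch * kernel)
    return out

# 6 x 6 x 3 image with a 3 x 3 x 3 filter, stride 1 -> 4 x 4 = 16 neurons.
color = conv2d(np.ones((6, 6, 3)), np.ones((3, 3, 3)))

# Single channel: a filter that only picks the centre pixel of each 3 x 3 patch.
gray = np.arange(36, dtype=float).reshape(6, 6, 1)
centre = np.zeros((3, 3, 1))
centre[1, 1, 0] = 1.0
picked = conv2d(gray, centre)
```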
Without Zero Padding
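Without zero padding, each axis of the output shrinks. A minimal sketch of the standard output-size arithmetic; the `pad` argument is an assumed parameter (not on the slide) showing how adding zeros on each side would change the count.

```python
def conv_output_size(n, k, stride=1, pad=0):
    """Number of receptive-field positions along one axis of length n
    for kernel size k, given the stride and `pad` zeros added on each side."""
    return (n + 2 * pad - k) // stride + 1

# 6 x 6 input, 3 x 3 receptive field, stride 1, no zero padding -> 4 x 4 = 16 neurons.
side = conv_output_size(6, 3)
```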
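Pooling Layer
Each neuron $i$ in layer $l$ pools a group $G_i$ of $k$ outputs from layer $l-1$ (layer $l-1$ has $N_{l-1}$ nodes; layer $l$ has $N_l$ nodes).
Average Pooling: $a_i^l = \frac{1}{k} \sum_{j \in G_i} a_j^{l-1}$
Max Pooling: $a_i^l = \max_{j \in G_i} a_j^{l-1}$
L2 Pooling: $a_i^l = \sqrt{\sum_{j \in G_i} (a_j^{l-1})^2}$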
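The three pooling operations reduce a group of outputs to one value. A minimal sketch, taking L2 pooling to be the L2 norm of the group (an assumption; some references also divide by the group size inside the square root):

```python
import numpy as np

def pool(group, kind="max"):
    """Pool one group of outputs from layer l-1 into a single output of layer l."""
    group = np.asarray(group, dtype=float)
    if kind == "average":
        return group.mean()                  # (1/k) * sum of the group
    if kind == "max":
        return group.max()                   # largest value in the group
    if kind == "l2":
        return np.sqrt(np.sum(group ** 2))   # L2 norm of the group
    raise ValueError(f"unknown pooling kind: {kind}")
```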
Pooling Layer
Which outputs of the convolutional layer should be grouped together? ……
Subsampling: group the neurons corresponding to the same filter with nearby receptive fields.
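For subsampling, the groups are neighbouring positions in the output map of one filter. A sketch of non-overlapping max pooling over size × size blocks; the 2 × 2 block size and the toy feature map are assumptions.

```python
import numpy as np

def max_pool2d(fmap, size=2):
    """fmap: (H, W), the outputs of ONE filter. Each output is the max over a
    size x size block of neurons with nearby receptive fields. Rows/columns that
    do not fill a whole block are dropped."""
    H, W = fmap.shape
    H2, W2 = H // size, W // size
    # blocks[i, a, j, b] == fmap[i*size + a, j*size + b]
    blocks = fmap[: H2 * size, : W2 * size].reshape(H2, size, W2, size)
    return blocks.max(axis=(1, 3))

fmap = np.arange(16).reshape(4, 4)   # hypothetical 4 x 4 output of one filter
pooled = max_pool2d(fmap)            # 2 x 2 result
```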
Pooling Layer
Which outputs of the convolutional layer should be grouped together? ……
Maxout Network: group the neurons with the same receptive field.
How do you know which neurons detect the same pattern?
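In a maxout network the pooled neurons share the same input (the same receptive field), and the group outputs their maximum. A minimal sketch for a fully connected maxout layer, assuming consecutive neurons form each group; the identity weights are chosen only so the result is easy to check.

```python
import numpy as np

def maxout(x, W, b, group_size):
    """z = W x + b, then the max over each group of `group_size` consecutive
    neurons. All neurons here see the same input x (same receptive field)."""
    z = W @ x + b
    if z.shape[0] % group_size != 0:
        raise ValueError("number of neurons must be a multiple of group_size")
    return z.reshape(-1, group_size).max(axis=1)

# Hypothetical example: identity weights, zero bias, groups of 2 neurons.
out = maxout(np.array([1.0, 5.0, 3.0, 2.0]), np.eye(4), np.zeros(4), group_size=2)
```

Because the weights of every neuron in a group are learned, training decides which neurons end up detecting related patterns, rather than the designer fixing it in advance.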
Combination of Different Basic Layers
Tara N. Sainath, Ron J. Weiss, Andrew Senior, Kevin W. Wilson, Oriol Vinyals, “Learning the Speech Front-end with Raw Waveform CLDNNs,” in INTERSPEECH 2015
(Figure label: 3 layers)
Next Time
• 3/10: TAs will teach TensorFlow
  • TensorFlow for regression
  • TensorFlow for word vector
    • word vector: https://www.youtube.com/watch?v=X7PH3NuYW0Q
  • TensorFlow for CNN
• If you want to learn Theano
  • http://speech.ee.ntu.edu.tw/~tlkagk/courses/MLDS_2015_2/Lecture/Theano%20DNN.ecm.mp4/index.html
  • http://speech.ee.ntu.edu.tw/~tlkagk/courses/MLDS_2015_2/Lecture/Theano%20RNN.ecm.mp4/index.html