Show and Tell: A Neural Image Caption Generator
Show and Tell: A Neural Image Caption Generator
Oriol Vinyals, Google; Alexander Toshev, Google; Samy Bengio, Google; Dumitru Erhan, Google
Example Image Credit: Oriol Vinyals
System flowchart Image Credit: Reference paper
Introduction
• Generative model
  • CNN (vision) + LSTM (language)
• End-to-end system trained to maximize the likelihood of the target description given the image
• State-of-the-art results (previous best → this model):
  • PASCAL dataset: BLEU-1 25 → 59
  • Flickr30k dataset: BLEU-1 55 → 66
  • MS COCO: BLEU-4 of 27.7
Introduction
• Motivation from machine translation: an encoder RNN reads the source sentence and a decoder RNN generates the target sentence
• Replace the encoder RNN with a deep CNN
  • Produces a rich representation of the input image
• Fully trainable network
Model Image Credit: Reference paper
Model
• Objective: maximize the probability of the correct description given the image
• Chain rule: the joint probability of sentence S factors over its words
• Optimize the sum of log probabilities over the whole training set
• Loss function: negative log likelihood of the correct word at each step
Image Credit: Reference paper
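Here is the same model slide written as formulas, using the notation of the reference paper. I is the image and θ are the model parameters. S = (S_0, …, S_N) is the sentence, where S_0 and S_N are the special start and stop words. p_t(S_t) is shorthand for p(S_t | I, S_0, …, S_{t-1}). The three lines give the training objective, its chain-rule factorization, and the per-sentence loss:

```latex
\theta^{*} = \arg\max_{\theta} \sum_{(I,S)} \log p(S \mid I; \theta)

\log p(S \mid I) = \sum_{t=0}^{N} \log p(S_t \mid I, S_0, \ldots, S_{t-1})

L(I, S) = -\sum_{t=1}^{N} \log p_t(S_t)
```

Minimizing L summed over all training pairs is the same as maximizing the objective in the first line.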
Sentence generation
• Two approaches:
  • Sampling: sample a word from the distribution p_t and feed its embedding as input at the next time step. Repeat until the EOS token is sampled or a maximum length is reached
  • Beam search: keep the k best partial sentences at each time step
• The paper used a beam size of 20
• A beam size of 1 (greedy search) degrades results by 2 BLEU points
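The Python sketch below illustrates both decoding strategies. It is a minimal sketch, not the paper's implementation, and it rests on some assumptions:
• `next_word_probs` is a hypothetical stand-in for the LSTM's next-word distribution conditioned on the image.
• `toy_model` is made-up data, chosen so that greedy decoding (beam size 1) misses the most probable sentence.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

EOS = "</s>"
Sentence = Tuple[str, ...]
# Hypothetical interface: maps a partial sentence to p(next word | image, prefix).
NextWordFn = Callable[[Sentence], Dict[str, float]]


def sample(next_word_probs: NextWordFn, rng: random.Random, max_len: int = 20) -> Sentence:
    """Sampling: draw each word from p_t and feed it back until EOS or max_len."""
    sentence: Sentence = ()
    for _ in range(max_len):
        dist = next_word_probs(sentence)
        words = list(dist)
        word = rng.choices(words, weights=[dist[w] for w in words], k=1)[0]
        sentence += (word,)
        if word == EOS:
            break
    return sentence


def beam_search(next_word_probs: NextWordFn, beam_size: int,
                max_len: int = 20) -> Tuple[Sentence, float]:
    """Beam search: keep the beam_size best partial sentences at each time step.

    Returns the best sentence found and its total log probability.
    """
    beams: List[Tuple[Sentence, float]] = [((), 0.0)]
    finished: List[Tuple[Sentence, float]] = []
    for _ in range(max_len):
        # Extend every live partial sentence by every possible next word.
        candidates = [
            (prefix + (word,), score + math.log(p))
            for prefix, score in beams
            for word, p in next_word_probs(prefix).items()
            if p > 0.0
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for sentence, score in candidates[:beam_size]:
            # Sentences ending in EOS are complete; the rest stay in the beam.
            (finished if sentence[-1] == EOS else beams).append((sentence, score))
        if not beams:
            break
    # Partial sentences that reached max_len without EOS are also eligible.
    return max(finished + beams, key=lambda c: c[1])


def toy_model(prefix: Sentence) -> Dict[str, float]:
    """Made-up next-word distributions, independent of any real image."""
    table = {
        (): {"a": 0.6, "the": 0.4},
        ("a",): {"dog": 0.55, "cat": 0.45},
        ("the",): {"dog": 0.9, "cat": 0.1},
    }
    return table.get(prefix, {EOS: 1.0})


if __name__ == "__main__":
    print(beam_search(toy_model, beam_size=1)[0])  # greedy: ('a', 'dog', '</s>')
    print(beam_search(toy_model, beam_size=2)[0])  # ('the', 'dog', '</s>')
    print(sample(toy_model, random.Random(0)))
```

On this toy distribution the two beam sizes give different answers:
• Beam size 1 commits to "a" and returns "a dog" (probability 0.33).
• Beam size 2 keeps "the" alive and finds "the dog" (probability 0.36).

Scores are summed log probabilities, and the sketch applies no length normalization.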
Training details
• Captioning datasets are relatively small, so overfitting is a concern
• Solutions:
  • Transfer learning: initialize the CNN weights from a model trained for classification on ImageNet
  • Dropout
  • Ensembling
Slide Credit: Oriol Vinyals
Evaluation
• BLEU score for generation
• Recall@k and mean rank for ranking
• Mechanical Turk experiment: human subjects rate the usefulness of the descriptions
• Perplexity was also used for hyperparameter tuning but is not reported
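The metrics on this slide can be sketched in a few lines of Python. This is a simplified illustration, not the evaluation code behind the reported numbers, and it makes some assumptions:
• `bleu1` is a sentence-level BLEU-1: clipped unigram precision times a brevity penalty, with no smoothing. Corpus-level BLEU as computed by standard scripts aggregates counts differently.
• The ranking helpers assume a hypothetical square similarity matrix in which item i is the correct match for query i.

```python
import math
import statistics
from collections import Counter
from typing import List, Sequence


def bleu1(candidate: Sequence[str], references: List[Sequence[str]]) -> float:
    """Sentence-level BLEU-1: clipped unigram precision times brevity penalty."""
    if not candidate:
        return 0.0
    # Clip each word's count by its maximum count in any single reference.
    max_ref: Counter = Counter()
    for ref in references:
        for word, n in Counter(ref).items():
            max_ref[word] = max(max_ref[word], n)
    clipped = sum(min(n, max_ref[w]) for w, n in Counter(candidate).items())
    precision = clipped / len(candidate)
    # Brevity penalty against the reference length closest to the candidate's.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * precision


def ranks_from_scores(scores: List[List[float]]) -> List[int]:
    """scores[i][j] = similarity of query i to item j; item i is the correct match.

    Returns the 1-based rank of the correct item for each query.
    """
    return [1 + sum(1 for s in row if s > row[i]) for i, row in enumerate(scores)]


def recall_at_k(ranks: Sequence[int], k: int) -> float:
    """Fraction of queries whose correct item appears in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


if __name__ == "__main__":
    ref = "the cat is on the mat".split()
    print(bleu1("the the the the the the the".split(), [ref]))  # 2/7, about 0.286
    ranks = ranks_from_scores([[0.9, 0.1, 0.2], [0.8, 0.3, 0.1], [0.2, 0.5, 0.4]])
    print(ranks, recall_at_k(ranks, 1), statistics.mean(ranks), statistics.median(ranks))
```

The clipping step stops a degenerate caption that repeats a common word from scoring well. The same per-query ranks yield recall@k, mean rank, and median rank.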
Results – Sentence generation Image Credit: Reference paper
Results – Reranking Image Credit: Reference paper
Results – Human Evaluation Image Credit: Reference paper
Image Credit: Reference paper
Other results
• Transfer learning: transferring from Flickr30k to Flickr8k improves BLEU by 4 points
• Generation diversity: the best candidate sentence is present in the training set 80% of the time
Image Credit: Reference paper
Good example Image Credit: Oriol Vinyals
Bad example Image Credit: Oriol Vinyals
Mind’s Eye: A Recurrent Visual Representation for Image Caption Generation
Xinlei Chen, CMU; C. Lawrence Zitnick, MSR Redmond
Introduction
• Motivation: a good image description is often said to “paint a picture in your mind’s eye.”
• Objective: learn a bi-directional representation that can generate
  • novel descriptions from images, and
  • visual representations from descriptions.
Image Credit: Larry Zitnick
Image Credit: Larry Zitnick
Image Credit: Larry Zitnick
Image Credit: Larry Zitnick
Image Credit: Larry Zitnick
Model Image Credit: Larry Zitnick
Objective
Model
• The main contribution is the addition of a recurrent visual hidden layer u
• The recurrent layer u tries to reconstruct the visual features v
• The visual hidden layer u is also used to predict the next word
Image Credit: Reference paper
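The slide makes three points about u: it is recurrent, it reconstructs v, and it feeds next-word prediction. A toy forward step in plain Python can illustrate them. This is a hedged sketch, not the paper's model. The slide does not give the exact connections, nonlinearities, or training objective, so the wiring in the hypothetical `MindsEyeSketch` class is an assumption made for illustration. The weights are random rather than trained.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, x: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def vadd(*vs: Vector) -> Vector:
    return [sum(vals) for vals in zip(*vs)]


def softmax(z: Vector) -> Vector:
    top = max(z)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in z]
    total = sum(exps)
    return [e / total for e in exps]


class MindsEyeSketch:
    """Toy RNN with an extra recurrent visual hidden layer u.

    Hypothetical simplified wiring (not the paper's exact model):
      s_t   = tanh(W_ws w_t + W_ss s_{t-1})   language hidden state
      u_t   = tanh(W_su s_t + W_uu u_{t-1})   recurrent visual hidden layer
      v_hat = W_uv u_t                        reconstruction of visual features v
      p     = softmax(W_sw s_t + W_uw u_t)    next-word distribution (uses u)
    """

    def __init__(self, vocab: int, hidden: int, visual_hidden: int, feat_dim: int, seed: int = 0):
        rng = random.Random(seed)

        def rand(rows: int, cols: int) -> Matrix:
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.vocab = vocab
        self.W_ws, self.W_ss = rand(hidden, vocab), rand(hidden, hidden)
        self.W_su, self.W_uu = rand(visual_hidden, hidden), rand(visual_hidden, visual_hidden)
        self.W_uv = rand(feat_dim, visual_hidden)
        self.W_sw, self.W_uw = rand(vocab, hidden), rand(vocab, visual_hidden)

    def step(self, word_id: int, s_prev: Vector, u_prev: Vector) -> Tuple[Vector, Vector, Vector, Vector]:
        w = [1.0 if i == word_id else 0.0 for i in range(self.vocab)]  # one-hot word
        s = [math.tanh(x) for x in vadd(matvec(self.W_ws, w), matvec(self.W_ss, s_prev))]
        u = [math.tanh(x) for x in vadd(matvec(self.W_su, s), matvec(self.W_uu, u_prev))]
        v_hat = matvec(self.W_uv, u)
        p_next = softmax(vadd(matvec(self.W_sw, s), matvec(self.W_uw, u)))
        return s, u, v_hat, p_next


def reconstruction_error(v_hat: Vector, v: Vector) -> float:
    """Squared error between reconstructed and actual visual features."""
    return sum((a - b) ** 2 for a, b in zip(v_hat, v))


if __name__ == "__main__":
    model = MindsEyeSketch(vocab=5, hidden=4, visual_hidden=3, feat_dim=6)
    v = [0.1, 0.0, 0.3, 0.2, 0.0, 0.5]  # made-up visual feature vector
    s, u = [0.0] * 4, [0.0] * 3
    for word_id in [0, 3, 1]:  # made-up word ids
        s, u, v_hat, p_next = model.step(word_id, s, u)
        print(round(reconstruction_error(v_hat, v), 4), [round(p, 3) for p in p_next])
```

Because u_t depends on u_{t-1}, the reconstruction v_hat can change as more words are read.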
Hidden visual units u Image Credit: Larry Zitnick
Recurrent Neural Networks Image Credit: Larry Zitnick
This model Image Credit: Reference paper
Model Image Credit: Reference paper
Results – Sentence generation Image Credit: Reference paper
Results – Sentence generation Image Credit: Reference paper
Results – Retrieval tasks Image Credit: Reference paper
Results – Retrieval tasks Image Credit: Reference paper
Qualitative results Image Credit: Reference paper
Questions?