Analysis of Large Scale Visual Recognition FeiFei Li
- Slides: 60
Analysis of Large Scale Visual Recognition Fei-Fei Li and Olga Russakovsky, Jia Deng, Zhiheng Huang, Alex Berg, Li Fei-Fei Detecting avocados to zucchinis: what have we done, and where are we going? ICCV 2013 http: //image-net. org/challenges/LSVRC/2012/analysis
Backpack
Flute Strawberry Traffic light Backpack Matchstick Bathing cap Sea lion Racket
Large-scale recognition
Large-scale recognition Need benchmark datasets
PASCAL VOC 2005 -2012 20 object classes Classification: person, motorcycle Detection 22, 591 images Segmentation Person Motorcycle Action: riding bicycle Everingham, Van Gool, Williams, Winn and Zisserman. The PASCAL Visual Object Classes (VOC) Challenge. IJCV 2010.
Large Scale Visual Recognition Challenge (ILSVRC) 2010 -2012 20 object classes 1000 object classes 22, 591 images 1, 431, 167 images Dalmatian http: //image-net. org/challenges/LSVRC/{2010, 2011, 2012}
Variety of object classes in ILSVRC
Variety of object classes in ILSVRC
ILSVRC Task 1: Classification Steel drum
ILSVRC Task 1: Classification Steel drum Output: Scale T-shirt Steel drum Drumstick Mud turtle ✔ Output: Scale T-shirt Giant panda Drumstick Mud turtle ✗
ILSVRC Task 1: Classification Steel drum Accuracy Output: Scale T-shirt Steel drum Drumstick Mud turtle 1 = 100, 000 Σ 100, 000 images ✔ Output: Scale T-shirt Giant panda Drumstick Mud turtle ✗ 1[correct on image i]
ILSVRC Task 1: Classification # Submissions 2010 2011 0. 72 0. 74 2012 0. 85 Accuracy (5 predictions/image)
ILSVRC Task 2: Classification + Localization Steel drum
ILSVRC Task 2: Classification + Localization Steel drum Output Persian cat ✔ Picket fence Steel drum Loud speaker Folding chair
ILSVRC Task 2: Classification + Localization Steel drum Output Persian cat ✔ Output (bad localization) Persian cat ✗ Picket fence Steel drum Folding chair Output (bad classification) Persian cat Loud speaker Folding chair Picket fence Steel drum Loud speaker ✗ Picket fence King penguin Loud speaker Folding chair
ILSVRC Task 2: Classification + Localization Steel drum Output Persian cat ✔ Accuracy 1 = 100, 000 Σ 100, 000 images Picket fence Steel drum Loud speaker Folding chair 1[correct on image i]
Accuracy (5 predictions) ILSVRC Task 2: Classification + Localization ISI G G V O _ V r D e p OR u F S X n o i is
What happens under the hood?
What happens under the hood on classification+localization?
What happens under the hood on classification+localization? Olga Russakovsky, Jia Deng, Zhiheng Huang, Alex Berg, Li Fei-Fei Detecting avocados to zucchinis: what have we done, and where are we going? ICCV 2013 http: //image-net. org/challenges/LSVRC/2012/analysis
Preliminaries: • ILSVRC-500 (2012) dataset • Leading algorithms What happens under the hood on classification+localization? Olga Russakovsky, Jia Deng, Zhiheng Huang, Alex Berg, Li Fei-Fei Detecting avocados to zucchinis: what have we done, and where are we going? ICCV 2013 http: //image-net. org/challenges/LSVRC/2012/analysis
Preliminaries: • ILSVRC-500 (2012) dataset • Leading algorithms What happens under the hood on classification+localization? • A closer look at small objects • A closer look at textured objects Olga Russakovsky, Jia Deng, Zhiheng Huang, Alex Berg, Li Fei-Fei Detecting avocados to zucchinis: what have we done, and where are we going? ICCV 2013 http: //image-net. org/challenges/LSVRC/2012/analysis
Preliminaries: • ILSVRC-500 (2012) dataset • Leading algorithms What happens under the hood on classification+localization? • A closer look at small objects • A closer look at textured objects Olga Russakovsky, Jia Deng, Zhiheng Huang, Alex Berg, Li Fei-Fei Detecting avocados to zucchinis: what have we done, and where are we going? ICCV 2013 http: //image-net. org/challenges/LSVRC/2012/analysis
ILSVRC (2012) 1000 object classes Easy to localize Hard to localize
ILSVRC-500 (2012) 500 classes with smallest objects Easy to localize Hard to localize
ILSVRC-500 (2012) 500 classes with smallest objects Hard to localize Easy to localize Object scale (fraction of image area occupied by target object) ILSVRC-500 (2012) 500 object categories 25. 3% PASCAL VOC (2012) 20 object categories 25. 2%
Chance Performance of Localization Steel drum B 1 B 2 B 3 B 4 B 5 B 6 B 7 B 8 B 9 N = 9 here
Chance Performance of Localization Steel drum B 1 B 2 B 3 B 4 B 5 B 6 B 7 B 8 B 9 N = 9 here
Chance Performance of Localization Steel drum B 1 B 2 B 3 B 4 B 5 B 6 B 7 B 8 B 9 N = 9 here ILSVRC-500 (2012) 500 object categories 8. 4% PASCAL VOC (2012) 20 object categories 8. 8%
Level of clutter Steel drum - Generate candidate object regions using method of Selective Search for Object Detection van. De. Sande et al. ICCV 2011 - Filter out regions inside object - Count regions
Level of clutter Steel drum - Generate candidate object regions using method of Selective Search for Object Detection van. De. Sande et al. ICCV 2011 - Filter out regions inside object - Count regions ILSVRC-500 (2012) 500 object categories 128 ± 35 PASCAL VOC (2012) 20 object categories 130 ± 29
Preliminaries: • ILSVRC-500 (2012) dataset – similar to PASCAL • Leading algorithms What happens under the hood on classification+localization? • A closer look at small objects • A closer look at textured objects Olga Russakovsky, Jia Deng, Zhiheng Huang, Alex Berg, Li Fei-Fei Detecting avocados to zucchinis: what have we done, and where are we going? ICCV 2013 http: //image-net. org/challenges/LSVRC/2012/analysis
Super. Vision (SV) Alex Krizhevsky, Ilya Sutskever, Geoffrey Hinton (Krizhevsky NIPS 12) Image classification: Deep convolutional neural networks • 7 hidden “weight” layers, 650 K neurons, 60 M parameters, 630 M connections • Rectified Linear Units, max pooling, dropout trick • Randomly extracted 224 x 224 patches for more data • Trained with SGD on two GPUs for a week, fully supervised Localization: Regression on (x, y, w, h) http: //image-net. org/challenges/LSVRC/2012/supervision. pdf
Super. Vision (SV) Alex Krizhevsky, Ilya Sutskever, Geoffrey Hinton (Krizhevsky NIPS 12) Image classification: Deep convolutional neural networks • 7 hidden “weight” layers, 650 K neurons, 60 M parameters, 630 M connections • Rectified Linear Units, max pooling, dropout trick • Randomly extracted 224 x 224 patches for more data • Trained with SGD on two GPUs for a week, fully supervised Localization: Regression on (x, y, w, h) http: //image-net. org/challenges/LSVRC/2012/supervision. pdf
OXFORD_VGG (VGG) Karen Simonyan, Yusuf Aytar, Andrea Vedaldi, Andrew Zisserman Image classification: Fisher vector + linear SVM (Sanchez CVPR 11) • Root-SIFT (Arandjelovic CVPR 12), color statistics, augmentation with patch location (x, y) (Sanchez PRL 12) • Fisher vectors: 1024 Gaussians, 135 K dimensions • No SPM, product quantization to compress • Semi-supervised learning to find additional bounding boxes • 1000 one-vs-rest SVM trained with Pegasos SGD • 135 M parameters! Localization: Deformable part-based models (Felzenszwalb PAMI 10), without parts (root-only) http: //image-net. org/challenges/LSVRC/2012/oxford_vgg. pdf
Preliminaries: • ILSVRC-500 (2012) dataset – similar to PASCAL • Leading algorithms: SV and VGG What happens under the hood on classification+localization? • A closer look at small objects • A closer look at textured objects Olga Russakovsky, Jia Deng, Zhiheng Huang, Alex Berg, Li Fei-Fei Detecting avocados to zucchinis: what have we done, and where are we going? ICCV 2013 http: //image-net. org/challenges/LSVRC/2012/analysis
Results on ILSVRC-500 Cls+loc accuracy 54. 3% SV 45. 8% VGG
Difference in accuracy: SV versus VGG Classification-only Persian cat ✔ Picket fence Steel drum Loud speaker Folding chair
Difference in accuracy: SV versus VGG Cls. Accuracy: SV - VGG Classification-only Object scale
Difference in accuracy: SV versus VGG Cls. Accuracy: SV - VGG Classification-only SV better (452 classes) VGG better (34 classes) Object scale
Difference in accuracy: SV versus VGG SV beats VGG Cls. Accuracy: SV - VGG Classification-only SV better *** (452 classes) *** VGG beats SV VGG better (34 classes) Object scale * *** ***
Difference in accuracy: SV versus VGG Classification+Localiation SV better (452 classes) VGG better (34 classes) Object scale Cls+Loc Accuracy: SV - VGG Cls. Accuracy: SV - VGG Classification-only SV better (338 classes) VGG better (150 classes) Object scale
Cumulative accuracy across scales Classification+Localization SV VGG Object scale SV Cumulative cls+loc accuracy Cumulative cls. accuracy Classification-only VGG Object scale
Cumulative accuracy across scales Classification+Localization SV VGG Object scale SV Cumulative cls+loc accuracy Cumulative cls. accuracy Classification-only VGG 205 smallest object classes 0. 24 Object scale
Preliminaries: • ILSVRC-500 (2012) dataset – similar to PASCAL • Leading algorithms: SV and VGG What happens under the hood on classification+localization? • SV always great at classification, but VGG does better than SV at localizing small objects • A closer look at textured objects Olga Russakovsky, Jia Deng, Zhiheng Huang, Alex Berg, Li Fei-Fei Detecting avocados to zucchinis: what have we done, and where are we going? ICCV 2013 http: //image-net. org/challenges/LSVRC/2012/analysis
Preliminaries: • ILSVRC-500 (2012) dataset – similar to PASCAL • Leading algorithms: SV and VGG What happens under the hood on classification+localization? • SV always great at classification, but VGG does better than SV at localizing small objects WHY? • A closer look at textured objects Olga Russakovsky, Jia Deng, Zhiheng Huang, Alex Berg, Li Fei-Fei Detecting avocados to zucchinis: what have we done, and where are we going? ICCV 2013 http: //image-net. org/challenges/LSVRC/2012/analysis
Preliminaries: • ILSVRC-500 (2012) dataset – similar to PASCAL • Leading algorithms: SV and VGG What happens under the hood on classification+localization? • SV always great at classification, but VGG does better than SV at localizing small objects • A closer look at textured objects Olga Russakovsky, Jia Deng, Zhiheng Huang, Alex Berg, Li Fei-Fei Detecting avocados to zucchinis: what have we done, and where are we going? ICCV 2013 http: //image-net. org/challenges/LSVRC/2012/analysis
Textured objects (ILSVRC-500) Low Amount of texture High
Textured objects (ILSVRC-500) Amount of texture Low # classes High No texture Low texture Medium texture High texture 116 189 143 52
Textured objects (ILSVRC-500) Amount of texture Low High No texture Low texture Medium texture High texture # classes 116 189 143 52 Object scale 20. 8% 23. 7% 23. 5% 25. 0%
Textured objects (416 classes) Amount of texture Low High No texture Low texture Medium texture High texture # classes 116 189 143 115 52 35 Object scale 20. 8% 23. 7% 20. 8% 23. 5% 20. 8% 25. 0% 20. 8%
Localizing textured objects (416 classes, same average object scale at each level of texture) VGG Localization accuracy SV Level of texture
Localizing textured objects (416 classes, same average object scale at each level of texture) Localization accuracy SV VGG On correctly classified images Level of texture
Localizing textured objects (416 classes, same average object scale at each level of texture) Localization accuracy SV VGG On correctly classified images Level of texture
Preliminaries: • ILSVRC-500 (2012) dataset – similar to PASCAL • Leading algorithms: SV and VGG What happens under the hood on classification+localization? • SV always great at classification, but VGG does better than SV at localizing small objects • Textured objects easier to localize, especially for SV Olga Russakovsky, Jia Deng, Zhiheng Huang, Alex Berg, Li Fei-Fei Detecting avocados to zucchinis: what have we done, and where are we going? ICCV 2013 http: //image-net. org/challenges/LSVRC/2012/analysis
ILSVRC 2013 NEW with large-scale object detection Fully annotated 200 object classes across 60, 000 images Person Car Motorcycle Helmet Allows evaluation of generic object detection in cluttered scenes at scale http: //image-net. org/challenges/LSVRC/2013/
ILSVRC 2013 NEW with large-scale object detection Statistics PASCAL VOC 2012 Object classes 20 Training Validation Testing ILSVRC 2013 10 x 200 Images 5. 7 K 395 K Objects 13. 6 K Images 5. 8 K Objects 13. 8 K Images 11. 0 K 40. 1 K Objects --- 25 x 345 K 20. 1 K 4 x More than 50, 000 person instances annotated http: //image-net. org/challenges/LSVRC/2013/ 55. 5 K
ILSVRC 2013 NEW with large-scale object detection • 159 downloads so far: http: //image-net. org/challenges/LSVRC/2013/ • Submission deadline Nov. 15 th • ICCV workshop on December 7 th, 2013 • Fine-Grained Challenge 2013: https: //sites. google. com/site/fgcomp 2013/
Thank you! Dr. Jia Deng Stanford U. Zhiheng Huang Stanford U. Jonathan Krause Stanford U. Sanjeev Satheesh Stanford U. Hao Su Stanford U. Prof. Alex Berg UNC Chapel Hill
- Feifei jiang
- Feifei li stanford
- Scale definition geography
- What is the definition of map scale
- Map scale ratio
- Large scale vs small scale map
- Geography skills handbook
- Berk atikoglu
- A comparison of approaches to large-scale data analysis
- Convolutional neural networks for visual recognition
- Visual recognition with human in the loop
- Object recognition from local scale-invariant features
- The anatomy of a large scale hypertextual web search engine
- Oogoogle translate
- Circulation of air masses
- Small scale fermentation
- Large scale chart definition
- Thick and thin film ic
- Large scale global investment
- Automatic wrappers for large scale web extraction
- Large scale fading in wireless communication
- Abhishek verma google
- Large scale interventions
- Large scale entry example
- Distbelief
- Large scale map definition
- Pregel: a system for large-scale graph processing
- Large scale systems
- The anatomy of a large scale hypertextual web search engine
- The anatomy of a large-scale hypertextual web search engine
- Large scale manufacturing of semisolids
- Market entry modes for international businesses chapter 7
- Small scale map
- Purification of water on large scale storage
- Oag: toward linking large-scale heterogeneous entity graphs
- Large scale classification
- Large-scale philanthropy
- Aser
- Apa yang dimaksud dengan pemrograman konvensional?
- Sadq h10
- Parts of a scale
- Five-tone scale
- Scale up vs scale out storage
- Scale up and scale out in hadoop
- How to understand scale drawings
- Plain scale and diagonal scale
- What is a proportional two dimensional drawing of an object
- Poss scale vs rass scale
- Scale out architecture in big data
- Inner scale
- Inner and outer scale of protractor
- What is representative fraction
- Large signal analysis of bjt
- The large cool store analysis
- Large signal analysis of differential amplifier
- Visual merchandising swot analysis
- Optic picture analysis
- Ies visual analysis
- Optic strategy examples
- Visual analysis
- Multiple scale analysis