From Large Scale Image Categorization to Entry-Level Categories
- Slides: 37
From Large Scale Image Categorization to Entry-Level Categories Vicente Ordonez, Jia Deng, Yejin Choi, Alexander C. Berg, Tamara L. Berg
What would you call this? Grampus griseus Dolphin
What would you call this? Object Organism Animal Chordate Vertebrate Bird Aquatic bird Swan Whistling swan Cygnus columbianus
What would you call this? Object Artifact Instrument Transport Vehicle Craft Vessel Watercraft Ship Cargo vessel Freighter
“The apparent ease with which people identify common objects belies the subtlety and complexity of the operations and structures involved in” Jolicoeur, Gluck & Kosslyn, 1984
Principles of Categorization – Eleanor Rosch 1976 - Cognitive Economy - Perceived World Structure
Basic-Level Category: The most abstract category at which members share a maximum number of common attributes. The most abstract category for which an image could be reasonably representative of a class as a whole. (Rosch et al., 1976) Superordinates: animal, vertebrate. Basic-Level: bird. Subordinates: Black-capped chickadee.
Four experiments to find Basic-Level Categories • Attribute listing • Action (motor movement) listing • Shape overlap • Average outline naming
Implications • Imagery: The most abstract category at which an average image can still be recognized. • Development: Children first learn to recognize basic-level categories. • Perception: We first identify basic-level categories and then identify either subordinates or superordinates. • Language: The word of choice when naming instances, even when they belong to subordinate categories.
Entry-Level Category: The category that people are likely to name when presented with a depiction of an object. (Rosch et al., 1976; Jolicoeur, Gluck & Kosslyn, 1984) Superordinates: animal, vertebrate. Basic Level: bird. Entry Level: bird. Subordinates: Black-capped chickadee.
Entry-Level Category: The category that people are likely to name when presented with a depiction of an object. (Rosch et al., 1976; Jolicoeur, Gluck & Kosslyn, 1984) Superordinates: animal, bird. Basic Level: bird. Entry Level: penguin. Subordinates: Chinstrap penguin.
Pictures and Names: Making the Connection Jolicoeur, Gluck, Kosslyn 1984
Experiment 1 • Do we identify basic-level categories first and then move to superordinate categories? Or is there a separate, slower process that goes directly to superordinate categories? Example: apple. Name the basic-level category / Name a superordinate (e.g., fruit)
Experiment 2 • How about subordinate categories? Do we indeed need additional perceptual information compared to recognizing superordinates? Picture followed by word
Experiment 3 • Is the entry point always the basic level category? What about for atypical objects? Give a name for the picture
Naming Image Content: Input Image → Vision → Thousands of Noisy Category Predictions → Naming (Pick the Best) → What Should I Call It? Predictions: (0.80) Grampus griseus; (0.83) American black bear; (0.16) Grizzly bear; (0.25) King penguin; (0.11) Cormorant; (0.56) Homing pigeon; (0.26) Ball-peen hammer; (0.06) Spigot; (0.07) Diskette, floppy; (0.06) Steel arch bridge; (0.16) Farmhouse; (0.03) Soapweed; (0.12) Brazilian rosewood; (0.13) Bristlecone pine; (0.04) Cliffdiving; (0.19) Crabapple. Grampus griseus → Dolphin.
Is this hard? (figure: WordNet hierarchy) Living thing; Plant, Flora; Bird; Seabird; Penguin; King penguin; Cormorant; Angiosperm; Bulbous Plant; Flower; Narcissus; Daffodil; Orchid; Frog Orchid; Daisy
How will we do it? Linguistic resources: WordNet. Labeled images (computer vision): ImageNet. Lots of text: Google Web 1T. Lots of images with text: SBU Captioned Dataset. Example captions: “The Egyptian cat statue by the floor clock and perpetual motion”; “Interior design of modern white and brown living room furniture hanging.”; “Man sits in a rusted car buried in the sand on Waitarere beach”; “Little girl and her dog in northern Thailand. They both seemed.”; “Our dog Zoe in her bed”; “Emma in her hat looking super cute”
Scaling Naming Tasks! 48 categories > 7000 categories
Goal 1: Category Translation. Detailed Category (Grampus griseus) → What Should I Call It? → Entry-Level Category (dolphin). Goal 2: Content Naming. Input Image → What Should I Call It? → Entry-Level Category (dolphin).
Category Translation by Humans: Friesian, Holstein-Friesian → cow, cattle, pasture, fence
1.1 Category Translation: Text-based (figure: WordNet hierarchy annotated with n-gram frequencies; labels: n-gram Frequency, Naturalness, Semantic Distance). Nodes: Animal; Mammal; Bird; Cetacean; Seabird; Whale; Dolphin; Penguin; Cormorant; Sperm whale; King penguin; Grampus griseus. n-gram frequencies shown: 656 M, 366 M, 128 M, 88 M, 55 M, 30 M, 22 M, 15 M, 6.4 M, 1.2 M, 0.9 M, 0.08 M.
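
The trade-off on this slide, between how natural a word is (n-gram frequency) and how far it sits from the detailed category in the hierarchy (semantic distance), can be sketched as a scoring rule over the ancestors of a node. This is an illustration under stated assumptions, not necessarily the authors' exact formulation: the score form (log count minus a penalty per hierarchy edge), the parameter name `lam`, the ancestor chain, and the round-number counts are all hypothetical.

```python
import math

# Hypothetical ancestor chain, from the detailed category up to the root.
ANCESTORS = ["Grampus griseus", "dolphin", "whale", "cetacean", "mammal", "animal"]

# Illustrative n-gram counts (round placeholders, NOT the slide's values).
NGRAM_COUNT = {
    "Grampus griseus": 1e5,
    "dolphin": 3e7,
    "whale": 5e7,
    "cetacean": 1e6,
    "mammal": 1e8,
    "animal": 6e8,
}

def translate(chain, counts, lam):
    """Return the word in `chain` maximizing log(count) - lam * distance,
    where distance is the number of edges above chain[0]."""
    best_word, best_score = None, -math.inf
    for distance, word in enumerate(chain):
        score = math.log(counts[word]) - lam * distance
        if score > best_score:
            best_word, best_score = word, score
    return best_word

print(translate(ANCESTORS, NGRAM_COUNT, lam=1.0))
```

With these placeholder counts, `lam=0.0` ignores distance and picks the most frequent word, while a large `lam` keeps the detailed category itself; intermediate values favor a word that is both common and close.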
1.2 Category Translation: Image-based. Friesian, Holstein-Friesian → Vision System → (1.9071) cow; (1.1851) orange_tree; (0.6136) stall; (0.5630) mushroom; (0.3825) pasture; (0.3156) sheep; (0.3321) black_bear; (0.3015) puppy; (0.2409) pedestrian_bridge; (0.2353) nest
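
One way to read this slide: the vision system scores candidate nouns on images of the detailed category, and the translation is the noun with the highest aggregate score. A minimal sketch, assuming mean aggregation across images; the function name and the per-image scores are hypothetical and only loosely echo the ranking shown on the slide.

```python
from collections import defaultdict

def image_based_translation(per_image_scores):
    """Average each noun's score across images and rank nouns, best first.
    A noun missing from an image contributes 0 for that image."""
    totals = defaultdict(float)
    for scores in per_image_scores:
        for noun, score in scores.items():
            totals[noun] += score
    n_images = len(per_image_scores)
    means = {noun: total / n_images for noun, total in totals.items()}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)

# Hypothetical noun scores for three images of "Friesian, Holstein-Friesian".
friesian_images = [
    {"cow": 2.1, "pasture": 0.5, "sheep": 0.2},
    {"cow": 1.8, "stall": 0.9, "sheep": 0.4},
    {"cow": 1.7, "pasture": 0.3, "stall": 0.6},
]

ranking = image_based_translation(friesian_images)
print(ranking[0][0])
```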
Category Translation: Examples HUMANS TEXT BASED IMAGE BASED cactus wren bird buzzard, Buteo buteo hawk bird whinchat, Saxicola rubetra bird chat bird Weimaraner dog dog numbat, banded anteater, anteater cat rhea, Rhea americana ostrich bird grass Europ. black grouse, heathfowl bird duck yellowbelly marmot, rockchuck Squirrel marmot rock
Goal 1: Category Translation. Detailed Category (Grampus griseus) → What Should I Call It? → Entry-Level Category (dolphin). Goal 2: Content Naming. Input Image → What Should I Call It? → Entry-Level Category (dolphin).
Large Scale Categorization: Flat Classifiers. Pipeline: Selective Search Windows (van de Sande et al., ICCV 2011); Local descriptors; Coding (LLC) (Wang et al., CVPR 2010); Spatial pooling. Predictions: (0.80) Grampus griseus; (0.41) American black bear; (0.16) Grizzly bear; (0.25) King penguin; (0.11) Cormorant; (0.56) Homing pigeon; (0.26) Ball-peen hammer; (0.06) Spigot; (0.07) Diskette, floppy; (0.06) Steel arch bridge; (0.16) Farmhouse; (0.03) Soapweed; (0.12) Brazilian rosewood; (0.13) Bristlecone pine; (0.04) Cliffdiving; (0.19) Crabapple
2.1 Propagated Visual Estimates (figure: hierarchy with an n-gram frequency and a visual estimate at each node; labels: Deng et al. CVPR 2012, Our work, Naturalness, Specificity, Accuracy). Visual estimates: Animal (1.0); Mammal (0.8); Cetacean (0.8); Whale (0.8); Dolphin (0.6); Grampus griseus (0.6); Sperm whale (0.2); Bird (0.2); Seabird (0.2); Penguin (0.15); King penguin (0.15); Cormorant (0.05). n-gram frequencies shown: 656 M, 366 M, 128 M, 88 M, 55 M, 30 M, 22 M, 15 M, 6.4 M, 1.2 M, 0.9 M, 0.08 M.
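
The node scores on this slide are consistent with summing leaf scores upward: 0.6 (Grampus griseus) plus 0.2 (Sperm whale) gives 0.8 for Whale, and 0.15 (King penguin) plus 0.05 (Cormorant) gives 0.2 for Seabird. A minimal sketch of that propagation, assuming each internal node's estimate is the sum of the leaf scores beneath it; the tree edges are inferred from the slide's numbers and the function name is hypothetical.

```python
# Child -> parent edges, inferred from the scores on the slide (an assumption).
PARENT = {
    "Grampus griseus": "Dolphin",
    "Dolphin": "Whale",
    "Sperm whale": "Whale",
    "Whale": "Cetacean",
    "Cetacean": "Mammal",
    "Mammal": "Animal",
    "King penguin": "Penguin",
    "Penguin": "Seabird",
    "Cormorant": "Seabird",
    "Seabird": "Bird",
    "Bird": "Animal",
}

# Leaf-level visual estimates as they appear on the slide.
LEAF_SCORES = {
    "Grampus griseus": 0.6,
    "Sperm whale": 0.2,
    "King penguin": 0.15,
    "Cormorant": 0.05,
}

def propagate(leaf_scores, parent):
    """Add each leaf's score to the leaf itself and to every ancestor."""
    estimates = {}
    for leaf, score in leaf_scores.items():
        node = leaf
        while node is not None:
            estimates[node] = estimates.get(node, 0.0) + score
            node = parent.get(node)  # None once the root is passed
    return estimates

estimates = propagate(LEAF_SCORES, PARENT)
for name in ("Dolphin", "Whale", "Seabird", "Animal"):
    print(name, round(estimates[name], 2))
```

Higher nodes accumulate more score, so they are more likely to be correct but less specific; choosing which node to name is where frequency (naturalness) enters.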
2.2 Supervised Learning: training from weak annotations. Input (leaf-category predictions): (0.80) Grampus griseus; (0.41) American black bear; (0.16) Grizzly bear; (0.25) King penguin; (0.11) Cormorant; (0.56) Homing pigeon; (0.26) Ball-peen hammer; (0.06) Spigot; (0.07) Diskette, floppy; (0.06) Steel arch bridge; (0.16) Farmhouse; (0.03) Soapweed; (0.12) Brazilian rosewood; (0.13) Bristlecone pine; (0.04) Cliffdiving; (0.19) Crabapple. Output (entry-level nouns): Bear; Dog; Penguin; Tree; Palm tree; Building; House; Bird. Training data: SBU Captioned Photo Dataset, 1 million captioned images!
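
Training from weak annotations can be sketched as learning, for each entry-level noun, a classifier over the vector of leaf-category scores, with the label given by whether the noun appears in the image's caption. The sketch below uses plain logistic regression trained by gradient descent on synthetic data. The model choice, hyperparameters, feature names, and data are assumptions for illustration, not the authors' setup.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, x):
    """Probability that the noun applies, given leaf-category scores x."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train_noun_classifier(features, labels, lr=0.5, epochs=500, l2=0.01):
    """Batch gradient descent on L2-regularized logistic loss."""
    n, dim = len(features), len(features[0])
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            err = predict(weights, bias, x) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * (g / n + l2 * w) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

# Synthetic data: label is 1 when the (hypothetical) caption mentions "tree".
LEAVES = ["bristlecone pine", "Brazilian rosewood", "spigot"]
rng = random.Random(0)
features, labels = [], []
for i in range(200):
    has_tree = i % 2
    tree_like = (0.5, 1.0) if has_tree else (0.0, 0.3)
    features.append([
        rng.uniform(*tree_like),  # bristlecone pine score
        rng.uniform(*tree_like),  # Brazilian rosewood score
        rng.uniform(0.0, 1.0),    # spigot score, unrelated to the label
    ])
    labels.append(has_tree)

weights, bias = train_noun_classifier(features, labels)
for leaf, w in zip(LEAVES, weights):
    print(f"{leaf}: {w:+.2f}")
```

Inspecting the learned weights per leaf category is the same kind of readout as the “tree” and “water” weight slides that follow.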
Extracting Meaning from Data: weights learned to recognize images with “tree” in caption. Legend: Mammals, Birds, Instruments, Structures, Plants, Other. Categories shown: snag; shade tree; bracket fungus, shelf fungus; bristlecone pine, Rocky Mountain bristlecone pine, Pinus aristata; Brazilian rosewood, caviuna wood, jacaranda, Dalbergia nigra; redheaded woodpecker, redhead, Melanerpes erythrocephalus; redbud, Cercis canadensis; mangrove, Rhizophora mangle; chiton, coat-of-mail shell, sea cradle, polyplacophore; crab apple, crabapple; papaya, papaia, pawpaw, papaya tree, melon tree, Carica papaya; frogmouth
Extracting Meaning from Data: weights learned to recognize images with “water” in caption. Legend: Mammals, Birds, Instruments, Structures, Plants, Other. Categories shown: water dog; surfing, surfboarding, surfriding; manatee, Trichechus manatus; punt; dip, plunge; cliff diving; fly-fishing; sockeye, sockeye salmon, red salmon, blueback salmon, Oncorhynchus nerka; sea otter, Enhydra lutris; American coot, marsh hen, mud hen, water hen, Fulica americana; booby; canal boat, narrowboat
Results: Content Naming. Columns: Human Labels | Flat Classifier | Deng et al. CVPR’12 | Propagated Visual Estimates | Supervised Learning | Joint. Labels: farm, fence field horse, mule kite, dirt people tree, zoo gelding yearling shire yearling draft horse equine perissodactyl ungulate male horse tree equine male gelding horse pasture field cow fence
Results: Content Naming. Columns: Human Labels | Flat Classifier | Deng et al. CVPR’12 | Propagated Visual Estimates | Supervised Learning | Joint. Labels: fence, junk sign stop sign street sign trash can tree feeder Hyla cleaner box large woody tree structure plant vascular tree structure building plant area logo street neighborhood building office building
Evaluation: Content Naming (figure: two bar charts of Precision and Recall, one for Test Set A – Random Images and one for Test Set B – High Confidence Prediction Scores, each comparing Flat Classifier, Deng et al. CVPR'12, Propagated Visual Estimates, Supervised Learning, and Combined)
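
Precision and recall for content naming can be computed per image by comparing predicted nouns against the nouns people supplied. A minimal sketch, assuming exact string matching on sets of nouns; the function name is hypothetical, and the example labels loosely follow the horse image from the results slide, where the split of words between the human and joint columns is itself an assumption.

```python
def precision_recall(predicted, human):
    """Set-based precision and recall of predicted nouns against human nouns."""
    predicted, human = set(predicted), set(human)
    hits = len(predicted & human)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(human) if human else 0.0
    return precision, recall

# Assumed grouping of the flattened results-slide labels (illustrative only).
human_labels = ["farm", "fence", "field", "horse", "mule",
                "kite", "dirt", "people", "tree", "zoo"]
joint_prediction = ["horse", "pasture", "field", "cow", "fence"]

precision, recall = precision_recall(joint_prediction, human_labels)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Three of the five predicted nouns (horse, field, fence) appear among the ten human labels in this illustrative grouping.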
Conclusions/Future Work • We explored different models for content naming in images. • Results can serve the larger goal of generating human-like image descriptions. • Go beyond nouns and infer other types of abstraction, such as action and attribute words.
Questions?