From Large Scale Image Categorization to Entry-Level Categories

From Large Scale Image Categorization to Entry-Level Categories Vicente Ordonez, Jia Deng, Yejin Choi, Alexander C. Berg, Tamara L. Berg

What would you call this? Grampus griseus → Dolphin

What would you call this? Object > Organism > Animal > Chordate > Vertebrate > Bird > Aquatic bird > Swan > Whistling swan (Cygnus columbianus)

What would you call this? Object > Artifact > Instrument > Transport > Vehicle > Craft > Vessel > Watercraft > Ship > Cargo vessel > Freighter

“The apparent ease with which people identify common objects belies the subtlety and complexity of the operations and structures involved in” Jolicoeur, Gluck & Kosslyn, 1984

Principles of Categorization (Eleanor Rosch, 1976)
• Cognitive Economy
• Perceived World Structure

Basic-Level Category
The most abstract category at which members share a maximum number of common attributes.
The most abstract category for which an image could reasonably represent the class as a whole.
Rosch et al., 1976
Superordinates: animal, vertebrate
Basic level: bird
Subordinates: black-capped chickadee

Four experiments to find basic-level categories
• Attribute listing
• Action (motor movement) listing
• Shape overlap
• Average outline naming

Implications
• Imagery: the basic level is the most abstract category at which an average image can still be recognized.
• Development: children recognize basic-level categories first.
• Perception: we first identify basic-level categories and only then identify subordinates or superordinates.
• Language: the basic-level name is the word of choice when naming instances, even instances that belong to subordinate categories.

Entry-Level Category
The category that people are likely to name when presented with a depiction of an object.
Rosch et al., 1976; Jolicoeur, Gluck & Kosslyn, 1984
Superordinates: animal, vertebrate
Basic level: bird
Entry level: bird
Subordinates: black-capped chickadee

Entry-Level Category
The category that people are likely to name when presented with a depiction of an object.
Rosch et al., 1976; Jolicoeur, Gluck & Kosslyn, 1984
Superordinates: animal, bird
Basic level: bird
Entry level: penguin
Subordinates: chinstrap penguin

Pictures and Names: Making the Connection Jolicoeur, Gluck, Kosslyn 1984

Experiment 1
• Do we identify the basic-level category first and then move up to the superordinate category? Or is there a separate, slower process that goes directly to the superordinate?
• Example: apple. Task: name the basic-level category, or name a superordinate (e.g., fruit).

Experiment 2
• What about subordinate categories? Does identifying them require additional perceptual information compared to recognizing superordinates?
• Procedure: picture followed by word

Experiment 3
• Is the entry point always the basic-level category? What about atypical objects?
• Task: give a name for the picture

Naming Image Content
Input image → vision system → thousands of noisy category predictions:
(0.80) Grampus griseus; (0.83) American black bear; (0.16) Grizzly bear; (0.25) King penguin; (0.11) Cormorant; (0.56) Homing pigeon; (0.26) Ball-peen hammer; (0.06) Spigot; (0.07) Diskette, floppy; (0.06) Steel arch bridge; (0.16) Farmhouse; (0.03) Soapweed; (0.12) Brazilian rosewood; (0.13) Bristlecone pine; (0.04) Cliffdiving; (0.19) Crabapple
Naming (pick the best): Grampus griseus → Dolphin
What should I call it?

Is this hard?
[Figure: WordNet hierarchy excerpt with nodes Living thing; Plant, flora; Angiosperm; Bulbous plant; Flower; Narcissus; Daffodil; Orchid; Frog orchid; Daisy; Bird; Seabird; Penguin; King penguin; Cormorant]

How will we do it?
• Linguistic resources: WordNet; Google Web 1T (lots of text)
• Computer vision: ImageNet (labeled images)
• SBU Captioned Dataset (lots of images with text). Example captions: “The Egyptian cat statue by the floor clock and perpetual motion”; “Interior design of modern white and brown living room furniture hanging.”; “Man sits in a rusted car buried in the sand on Waitarere beach”; “Little girl and her dog in northern Thailand. They both seemed.”; “Our dog Zoe in her bed”; “Emma in her hat looking super cute”

Scaling Naming Tasks! From 48 categories to more than 7,000 categories

1. Goal: Category Translation
Detailed category: Grampus griseus → What should I call it? (entry-level category) → dolphin
2. Goal: Content Naming
Input image → What should I call it? (entry-level category) → dolphin

Category Translation by Humans
Friesian, Holstein-Friesian → cow, cattle, pasture, fence

1.1 Category Translation: Text-based
[Figure: WordNet hierarchy excerpt (Animal; Mammal; Cetacean; Whale; Sperm whale; Dolphin; Grampus griseus; Bird; Seabird; Penguin; King penguin; Cormorant), each node annotated with its n-gram frequency (from 0.08 M to 656 M). The translation trades off naturalness (n-gram frequency) against semantic distance.]
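
A minimal sketch of the text-based trade-off, assuming naturalness is the log of an n-gram count and semantic distance is the number of hops moved up the hierarchy, weighted by a hypothetical parameter `lam`. The slides do not give the exact scoring function; the chain and counts below are toy values chosen for illustration, not the Google Web 1T counts in the figure.

```python
import math

# Hypothetical chain from a leaf synset up to the root (child -> parent).
PARENT = {
    "grampus griseus": "dolphin",
    "dolphin": "whale",
    "whale": "cetacean",
    "cetacean": "mammal",
    "mammal": "animal",
}

# Toy n-gram counts standing in for Google Web 1T frequencies (not real data).
COUNTS = {
    "grampus griseus": 1_000,
    "dolphin": 5_000_000,
    "whale": 6_000_000,
    "cetacean": 200_000,
    "mammal": 3_000_000,
    "animal": 80_000_000,
}


def ancestors(node):
    """Yield (node, hops) pairs from the leaf itself up to the root."""
    hops = 0
    while node is not None:
        yield node, hops
        node = PARENT.get(node)
        hops += 1


def translate(leaf, lam=2.0):
    """Pick the ancestor maximizing log(count) - lam * hops (assumed scoring)."""
    best, _ = max(
        ancestors(leaf),
        key=lambda pair: math.log(COUNTS[pair[0]]) - lam * pair[1],
    )
    return best


print(translate("grampus griseus"))  # dolphin
```

With `lam=0` the most frequent word (here "animal") always wins, and with a very large `lam` the leaf itself is kept, so `lam` is the knob between naturalness and semantic distance.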

1.2 Category Translation: Image-based
Friesian, Holstein-Friesian → vision system → (1.9071) cow; (1.1851) orange_tree; (0.6136) stall; (0.5630) mushroom; (0.3825) pasture; (0.3156) sheep; (0.3321) black_bear; (0.3015) puppy; (0.2409) pedestrian_bridge; (0.2353) nest
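
A hedged sketch of image-based translation, assuming the vision system outputs one score per entry-level word for each image of the detailed category and that these scores are averaged across the category's images before ranking. The function name and the toy scores are hypothetical.

```python
import numpy as np


def image_based_translation(scores, vocabulary, top_k=3):
    """Rank entry-level words for one detailed category (hypothetical helper).

    scores: array of shape (num_images, num_words); row i holds the vision
        system's scores for image i of the category (assumed input format).
    vocabulary: list of num_words entry-level words aligned with the columns.
    Returns the top_k (word, mean score) pairs, highest first.
    """
    mean_scores = np.asarray(scores, dtype=float).mean(axis=0)
    order = np.argsort(-mean_scores, kind="stable")[:top_k]
    return [(vocabulary[i], float(mean_scores[i])) for i in order]


# Toy scores for two images of one category (not the slide's numbers).
toy_scores = [[2.0, 0.5, 0.1],
              [1.6, 0.3, 0.3]]
print(image_based_translation(toy_scores, ["cow", "sheep", "nest"]))
```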

Category Translation: Examples (Humans | Text-based | Image-based)
cactus wren: bird
buzzard, Buteo buteo: hawk, bird
whinchat, Saxicola rubetra: bird, chat, bird
Weimaraner: dog, dog
numbat, banded anteater: anteater, cat
rhea, Rhea americana: ostrich, bird, grass
Europ. black grouse, heathfowl: bird, duck
yellowbelly marmot, rockchuck: squirrel, marmot, rock

Large Scale Categorization: Flat Classifiers
Pipeline: Selective Search windows (van de Sande et al., ICCV 2011) → local descriptors → coding (LLC; Wang et al., CVPR 2010) → spatial pooling
Flat classifier outputs: (0.80) Grampus griseus; (0.41) American black bear; (0.16) Grizzly bear; (0.25) King penguin; (0.11) Cormorant; (0.56) Homing pigeon; (0.26) Ball-peen hammer; (0.06) Spigot; (0.07) Diskette, floppy; (0.06) Steel arch bridge; (0.16) Farmhouse; (0.03) Soapweed; (0.12) Brazilian rosewood; (0.13) Bristlecone pine; (0.04) Cliffdiving; (0.19) Crabapple
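
The coding and pooling stages of this pipeline can be sketched with NumPy. The sketch follows the style of the approximated LLC solution of Wang et al. (k nearest codewords, a small regularized least-squares system, sum-to-one codes) followed by max pooling over a spatial pyramid. Descriptor extraction, the Selective Search windows, and the classifiers are omitted, and the values of `k`, `beta`, and the pyramid `levels` are assumptions rather than the settings used here.

```python
import numpy as np


def llc_encode(X, B, k=5, beta=1e-4):
    """Approximated LLC codes for descriptors X (n, d) over a codebook B (m, d)."""
    n, m = X.shape[0], B.shape[0]
    k = min(k, m)
    codes = np.zeros((n, m))
    # Squared Euclidean distance from every descriptor to every codeword.
    d2 = ((X[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    knn = np.argsort(d2, axis=1)[:, :k]
    for i in range(n):
        idx = knn[i]
        z = B[idx] - X[i]                     # k nearest codewords, shifted to the descriptor
        C = z @ z.T                           # local covariance, shape (k, k)
        C += beta * np.trace(C) * np.eye(k)   # regularize so the system is well conditioned
        w = np.linalg.solve(C, np.ones(k))
        codes[i, idx] = w / w.sum()           # enforce the sum-to-one constraint
    return codes


def spatial_max_pool(codes, xy, levels=(1, 2, 4)):
    """Max-pool codes over a spatial pyramid; xy holds descriptor positions in [0, 1)."""
    feats = []
    for g in levels:
        cell = np.minimum((xy * g).astype(int), g - 1)   # grid cell (col, row) per descriptor
        cell_id = cell[:, 1] * g + cell[:, 0]
        for c in range(g * g):
            mask = cell_id == c
            pooled = codes[mask].max(axis=0) if mask.any() else np.zeros(codes.shape[1])
            feats.append(pooled)
    v = np.concatenate(feats)
    return v / (np.linalg.norm(v) + 1e-12)               # L2-normalize the image feature


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))    # 50 toy descriptors of dimension 8
B = rng.normal(size=(16, 8))    # toy codebook with 16 codewords
xy = rng.random((50, 2))        # toy descriptor positions
feature = spatial_max_pool(llc_encode(X, B), xy)
print(feature.shape)  # (336,)
```

The pooled vector has one block of codebook size per pyramid cell (1 + 4 + 16 cells here), which is the kind of fixed-length feature a flat linear classifier per category can consume.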

2.1 Propagated Visual Estimates
[Figure: the WordNet subtree with classifier probabilities propagated from the leaves to their ancestors. Grampus griseus (0.6) propagates to Dolphin (0.6); together with Sperm whale (0.2) this gives Whale (0.8), Cetacean (0.8), and Mammal (0.8). King penguin (0.15) propagates to Penguin (0.15); with Cormorant (0.05) this gives Seabird (0.2). Nodes also carry n-gram frequencies. Criteria: accuracy and specificity (Deng et al., CVPR 2012) and naturalness (our work).]
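
Propagating the visual estimates can be sketched as adding each leaf's probability to every ancestor on its path to the root. The hierarchy below is simplified from the figure and the leaf probabilities are the ones shown there. `most_specific` is a deliberately simple accuracy/specificity rule (a hypothetical confidence threshold `tau`), not Deng et al.'s algorithm, and it ignores naturalness; that is why, at `tau=0.5`, it returns Grampus griseus rather than dolphin.

```python
from collections import defaultdict

# Simplified hierarchy from the figure (child -> parent).
PARENT = {
    "grampus griseus": "dolphin",
    "dolphin": "whale",
    "sperm whale": "whale",
    "whale": "cetacean",
    "cetacean": "mammal",
    "mammal": "animal",
    "king penguin": "penguin",
    "penguin": "seabird",
    "cormorant": "seabird",
    "seabird": "bird",
    "bird": "animal",
}

# Leaf probabilities as shown in the figure.
LEAF_PROBS = {"grampus griseus": 0.6, "sperm whale": 0.2,
              "king penguin": 0.15, "cormorant": 0.05}


def propagate(leaf_probs, parent):
    """Add each leaf's probability to every node on its path to the root."""
    totals = defaultdict(float)
    for leaf, p in leaf_probs.items():
        node = leaf
        while node is not None:
            totals[node] += p
            node = parent.get(node)
    return dict(totals)


def depth(node, parent):
    """Number of hops from node to the root."""
    d = 0
    while node in parent:
        node = parent[node]
        d += 1
    return d


def most_specific(totals, parent, tau=0.5):
    """Deepest node whose propagated probability reaches tau (hypothetical rule)."""
    confident = [n for n, p in totals.items() if p >= tau]
    return max(confident, key=lambda n: depth(n, parent))


totals = propagate(LEAF_PROBS, PARENT)
print(most_specific(totals, PARENT, tau=0.5))  # grampus griseus
print(most_specific(totals, PARENT, tau=0.7))  # whale
```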

2.2 Supervised Learning
Input: flat classifier predictions ((0.80) Grampus griseus; (0.41) American black bear; (0.16) Grizzly bear; (0.25) King penguin; (0.11) Cormorant; (0.56) Homing pigeon; (0.26) Ball-peen hammer; (0.06) Spigot; (0.07) Diskette, floppy; (0.06) Steel arch bridge; (0.16) Farmhouse; (0.03) Soapweed; (0.12) Brazilian rosewood; (0.13) Bristlecone pine; (0.04) Cliffdiving; (0.19) Crabapple)
Output: entry-level words (Bear, Dog, Penguin, Tree, Palm tree, Building, House, Bird)
Training from weak annotations: SBU Captioned Photo Dataset, 1 million captioned images!
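
The slides say each entry-level word is learned from weak caption annotations but do not name the learner. A minimal sketch, assuming a word's label is whether it appears in the caption and that an L2-regularized logistic regression is trained over the vector of flat-classifier scores; the helper names, learning rate, and regularization strength are hypothetical.

```python
import re

import numpy as np


def caption_labels(captions, word):
    """Weak labels: 1.0 if `word` occurs as a whole token in the caption, else 0.0.

    Case-insensitive exact token match with no stemming, so "Trees" does not count as "tree".
    """
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return np.array([1.0 if pattern.search(c) else 0.0 for c in captions])


def train_word_classifier(features, labels, lr=0.1, l2=1e-3, epochs=500):
    """L2-regularized logistic regression fitted by batch gradient descent.

    features: (num_images, num_categories) flat-classifier scores per image.
    labels: (num_images,) weak 0/1 labels derived from the captions.
    Returns one weight per leaf category and a bias.
    """
    n, d = features.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(features @ w + b)))  # predicted probabilities
        err = p - labels                                 # log-loss gradient w.r.t. the logits
        w -= lr * (features.T @ err / n + l2 * w)
        b -= lr * err.mean()
    return w, b
```

One classifier per word keeps training embarrassingly parallel across the vocabulary, and its weight vector is directly interpretable in terms of the leaf categories.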

Extracting Meaning from Data
Weights learned to recognize images with “tree” in the caption (legend: Mammals, Birds, Instruments, Structures, Plants, Other): snag; shade tree; bracket fungus, shelf fungus; bristlecone pine, Rocky Mountain bristlecone pine, Pinus aristata; Brazilian rosewood, caviuna wood, jacaranda, Dalbergia nigra; redheaded woodpecker, redhead, Melanerpes erythrocephalus; redbud, Cercis canadensis; mangrove, Rhizophora mangle; chiton, coat-of-mail shell, sea cradle, polyplacophore; crab apple, crabapple; papaya, papaia, pawpaw, papaya tree, melon tree, Carica papaya; frogmouth

Extracting Meaning from Data
Weights learned to recognize images with “water” in the caption (legend: Mammals, Birds, Instruments, Structures, Plants, Other): water dog; surfing, surfboarding, surfriding; manatee, Trichechus manatus; punt; dip, plunge; cliff diving; fly-fishing; sockeye, sockeye salmon, red salmon, blueback salmon, Oncorhynchus nerka; sea otter, Enhydra lutris; American coot, marsh hen, mud hen, water hen, Fulica americana; booby; canal boat, narrowboat
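
Listing categories like these amounts to sorting one word classifier's weight vector. A minimal sketch, assuming a linear classifier whose weights are aligned with the leaf category names; the function name and the toy weights are hypothetical.

```python
import numpy as np


def top_weighted_categories(weights, category_names, k=5):
    """Return the k category names with the largest weights, highest first."""
    order = np.argsort(-np.asarray(weights, dtype=float), kind="stable")[:k]
    return [category_names[i] for i in order]


# Hypothetical weights of a "tree" classifier over four leaf categories.
names = ["snag", "sea otter", "bristlecone pine", "spigot"]
print(top_weighted_categories([1.2, -0.4, 0.9, 0.1], names, k=2))
# ['snag', 'bristlecone pine']
```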

Results: Content Naming
Methods: Human Labels | Flat Classifier | Deng et al. CVPR'12 | Propagated Visual Estimates | Supervised Learning | Joint
farm, fence field horse, mule kite, dirt people tree, zoo gelding yearling shire yearling draft horse equine perissodactyl ungulate male horse tree equine male gelding horse pasture field cow fence

Results: Content Naming
Methods: Human Labels | Flat Classifier | Deng et al. CVPR'12 | Propagated Visual Estimates | Supervised Learning | Joint
fence, junk sign stop sign street sign trash can tree feeder Hyla cleaner box large woody tree structure plant vascular tree structure building plant area logo street neighborhood building office building

Evaluation: Content Naming
[Figure: bar charts of precision and recall (%) for Flat Classifier, Deng et al. CVPR'12, Propagated Visual Estimates, Supervised Learning, and Combined, on Test Set A (random images) and Test Set B (images with high-confidence prediction scores).]
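
Precision and recall here compare predicted words with human-provided labels. A minimal sketch, assuming each image has a set of predicted words and a set of human labels, that a match is an exact string match, and that counts are pooled over the test set (micro-averaging); the slides do not specify the matching or averaging scheme.

```python
def precision_recall(predictions, references):
    """Micro-averaged precision and recall of predicted words against human labels.

    predictions, references: equal-length lists with one set of words per image.
    """
    hits = sum(len(p & r) for p, r in zip(predictions, references))
    n_pred = sum(len(p) for p in predictions)
    n_ref = sum(len(r) for r in references)
    precision = hits / n_pred if n_pred else 0.0
    recall = hits / n_ref if n_ref else 0.0
    return precision, recall


# Toy example with two images (hypothetical predictions and labels).
preds = [{"horse", "tree"}, {"sign"}]
human = [{"horse", "fence", "field"}, {"stop sign", "sign", "tree"}]
print(precision_recall(preds, human))  # (0.6666666666666666, 0.3333333333333333)
```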

Conclusions/Future Work
• We explored different models for content naming in images.
• The results can help with the larger goal of generating human-like image descriptions.
• Future work: go beyond nouns and infer other types of abstractions for action and attribute words.

Questions?