CS 6501/4501: Vision and Language Entry-level Categories and Naming

Last Class
• Referring Expressions vs Image Captions
• Generating Referring Expressions
• Referring Expression Comprehension
• Guest Lecture by Lisa Anne Hendricks – DeepMind
• Captioning and Bias
• Visual Grounding of Language

What would you call this? Grampus griseus Dolphin

What would you call this?
Object → Organism → Animal → Chordate → Vertebrate → Bird → Aquatic bird → Swan → Whistling swan → Cygnus columbianus

What would you call this?
Object → Artifact → Instrument → Transport → Vehicle → Craft → Vessel → Watercraft → Ship → Cargo vessel → Freighter

“The apparent ease with which people identify common objects belies the subtlety and complexity of the operations and structures involved in” Jolicoeur, Gluck & Kosslyn, 1984

Principles of Categorization – Eleanor Rosch 1976
- Cognitive Economy
- Perceived World Structure

Basic-Level Category
The most abstract category at which members share a maximum number of common attributes.
The most abstract category for which an image could be reasonably representative of a class as a whole.
(Rosch et al., 1976)
Superordinates: animal, vertebrate
Basic-Level: bird
Subordinates: Black-capped chickadee

Four experiments to find Basic-Level Categories
• Attribute listing
• Action (motor movement) listing
• Shape overlap
• Average outline naming

Implications
• Imagery: The most abstract category at which an average image can still be recognized.
• Development: Children can first recognize basic-level categories.
• Perception: We first identify basic-level categories and then identify either subordinates or superordinates.
• Language: The word of choice when naming instances, even when they belong to subordinate categories.

Entry-Level Category
The category that people are likely to name when presented with a depiction of an object.
(Rosch et al., 1976; Jolicoeur, Gluck & Kosslyn, 1984)
Superordinates: animal, vertebrate
Basic Level: bird
Entry Level: bird
Subordinates: Black-capped chickadee

Entry-Level Category
The category that people are likely to name when presented with a depiction of an object.
(Rosch et al., 1976; Jolicoeur, Gluck & Kosslyn, 1984)
Superordinates: animal, bird
Basic Level: bird
Entry Level: penguin
Subordinates: Chinstrap penguin

Pictures and Names: Making the Connection Jolicoeur, Gluck, Kosslyn 1984

Experiment 1
• Do we identify basic-level categories first and then go on to superordinate categories? Or is there a separate, slower process that goes directly to superordinate categories?
Stimulus: apple
Task: Name the basic-level category / Name a superordinate (e.g. fruit)

Experiment 2
• How about subordinate categories? Do we indeed need additional perceptual information compared to recognizing superordinates?
Task: Picture followed by word

Experiment 3
• Is the entry point always the basic-level category? What about for atypical objects?
Task: Give a name for the picture

Naming Image Content
[Figure: Input Image → Vision → Thousands of Noisy Category Predictions → Naming (Pick the Best) → What Should I Call It? Example: Grampus griseus → Dolphin. The predictions shown, each with a confidence between 0.03 and 0.83: Grampus griseus, American black bear, Grizzly bear, King penguin, Cormorant, Homing pigeon, Ball-peen hammer, Spigot, Diskette (floppy), Steel arch bridge, Farmhouse, Soapweed, Brazilian rosewood, Bristlecone pine, Cliff diving, Crabapple.]

Is this hard?
[Figure: WordNet hierarchy under Living thing. Bird side: Bird, Seabird, Penguin, King penguin, Cormorant. Plant side: Plant (Flora), Angiosperm, Bulbous plant, Flower, Narcissus, Daffodil, Orchid, Frog orchid, Daisy.]

How will we do it?
• Linguistic resources: WordNet
• Lots of text: Google Web 1T
• Computer Vision, labeled images: ImageNet
• Lots of images with text: SBU Captioned Dataset
Example captions from the SBU Captioned Dataset:
• The Egyptian cat statue by the floor clock and perpetual motion
• Interior design of modern white and brown living room furniture hanging.
• Man sits in a rusted car buried in the sand on Waitarere beach
• Little girl and her dog in northern Thailand. They both seemed.
• Our dog Zoe in her bed
• Emma in her hat looking super cute

Scaling Naming Tasks! 48 categories > 7000 categories

1. Goal: Category Translation
Detailed Category (Grampus griseus) → What Should I Call It? → Entry-Level Category (dolphin)
2. Goal: Content Naming
Input Image → What Should I Call It? → Entry-Level Category (dolphin)

Category Translation by Humans
Friesian, Holstein-Friesian → cow, cattle, pasture, fence

1.1 Category Translation: Text-based
[Figure: WordNet hierarchy with nodes Animal, Mammal, Bird, Cetacean, Seabird, Whale, Penguin, Cormorant, King penguin, Dolphin, Sperm whale, Grampus griseus, each annotated with its n-gram frequency. Counts shown: 656 M, 366 M, 128 M, 88 M, 55 M, 30 M, 22 M, 15 M, 6.4 M, 1.2 M, 0.9 M, 0.08 M. Axes: n-gram frequency (naturalness) and semantic distance.]
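
One way to read the text-based translation on this slide is as a search up the hypernym chain for a word that is frequent in text (natural) without being too far from the detailed category. The sketch below is a minimal illustration under stated assumptions: the chain, the counts, and the penalty weight `lam` are hypothetical placeholders, not the slide's values and not necessarily the authors' scoring function.

```python
import math

# Hypothetical hypernym chain from a detailed category up to the root.
# The counts are illustrative placeholders, not the figures from the slide.
CHAIN = ["grampus griseus", "dolphin", "whale", "cetacean", "mammal", "animal"]
NGRAM_COUNT = {
    "grampus griseus": 1e5,
    "dolphin": 2e7,
    "whale": 5e7,
    "cetacean": 1e6,
    "mammal": 1e8,
    "animal": 6e8,
}


def translate(chain, counts, lam=1.5):
    """Pick the ancestor maximizing log-frequency minus a distance penalty.

    chain[0] is the detailed category; chain[i] sits i edges above it.
    lam is an assumed knob: larger values keep the answer closer to chain[0].
    """
    def score(i):
        return math.log(counts[chain[i]]) - lam * i

    best = max(range(len(chain)), key=score)
    return chain[best]


print(translate(CHAIN, NGRAM_COUNT))
```

With `lam=0` the most frequent word wins regardless of distance, which in this toy data is the root; raising `lam` pulls the answer back toward the detailed category.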

1.2 Category Translation: Image-based
Friesian, Holstein-Friesian → Vision System → scored nouns:
cow (1.9071)
orange_tree (1.1851)
stall (0.6136)
mushroom (0.5630)
pasture (0.3825)
sheep (0.3156)
black_bear (0.3321)
puppy (0.3015)
pedestrian_bridge (0.2409)
nest (0.2353)
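
The image-based translation can be sketched as running noun classifiers on images of the detailed category and ranking nouns by their aggregate response. This is a minimal sketch, assuming mean aggregation and treating a noun that is missing for an image as a score of 0; the function name and the scores are hypothetical, and the slide does not say how the vision system combines its outputs.

```python
from collections import defaultdict


def image_based_translation(per_image_scores, top_k=3):
    """Rank candidate nouns by their mean score over a category's images.

    per_image_scores: list of dicts, one per image, mapping noun -> score.
    A noun absent from an image's dict contributes 0 for that image.
    """
    totals = defaultdict(float)
    for scores in per_image_scores:
        for noun, value in scores.items():
            totals[noun] += value
    n = len(per_image_scores)
    means = {noun: total / n for noun, total in totals.items()}
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Made-up classifier outputs for three images of one detailed category.
friesian_images = [
    {"cow": 2.1, "pasture": 0.4, "sheep": 0.3},
    {"cow": 1.8, "stall": 0.6, "sheep": 0.2},
    {"cow": 1.7, "pasture": 0.5},
]
ranking = image_based_translation(friesian_images)
print(ranking[0][0])
```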

Category Translation: Examples
HUMANS | TEXT BASED | IMAGE BASED
cactus wren bird
buzzard, Buteo buteo hawk bird
whinchat, Saxicola rubetra bird chat bird
Weimaraner dog dog
numbat, banded anteater, anteater cat
rhea, Rhea americana ostrich bird grass
Europ. black grouse, heathfowl bird duck
yellowbelly marmot, rockchuck Squirrel marmot rock

1. Goal: Category Translation
Detailed Category (Grampus griseus) → What Should I Call It? → Entry-Level Category (dolphin)
2. Goal: Content Naming
Input Image → What Should I Call It? → Entry-Level Category (dolphin)

Large Scale Categorization
[Figure: flat classifiers score an input image against leaf categories, each with a confidence between 0.03 and 0.80: Grampus griseus, American black bear, Grizzly bear, King penguin, Cormorant, Homing pigeon, Ball-peen hammer, Spigot, Diskette (floppy), Steel arch bridge, Farmhouse, Soapweed, Brazilian rosewood, Bristlecone pine, Cliff diving, Crabapple.]
Components:
• Selective Search Windows (van de Sande et al., ICCV 2011)
• Local descriptors
• Coding (LLC) (Wang et al., CVPR 2010)
• Spatial pooling
• Flat Classifiers

2.1 Propagated Visual Estimates
[Figure: WordNet hierarchy with nodes Animal, Mammal, Bird, Cetacean, Seabird, Whale, Penguin, Cormorant, King penguin, Dolphin, Sperm whale, Grampus griseus. Each node carries its n-gram count (0.08 M to 656 M) and a propagated visual estimate in parentheses, from (0.05) up to (1.0) at Animal. Labels: Accuracy, Specificity, Naturalness; Deng et al. CVPR 2012; Our work.]
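
The propagation idea can be sketched as summing leaf classifier estimates up the hierarchy and then choosing the node with the best expected reward, where the reward favors natural (frequent) words and penalizes overly general ones. Everything concrete below is an assumption made for illustration: the mini-hierarchy, the probabilities, the counts, the reward `log(count) - lam * height`, and the value of `lam`. It is written in the spirit of accuracy trade-off methods such as Deng et al. (CVPR 2012), not as a reproduction of either method.

```python
import math

# Hypothetical mini-hierarchy: node -> parent (None marks the root).
PARENT = {
    "animal": None,
    "mammal": "animal",
    "bird": "animal",
    "dolphin": "mammal",
    "grampus griseus": "dolphin",
    "sperm whale": "mammal",
    "king penguin": "bird",
}
# Made-up leaf classifier outputs and text counts, for illustration only.
LEAF_PROB = {"grampus griseus": 0.6, "sperm whale": 0.2, "king penguin": 0.2}
COUNT = {
    "animal": 6e8, "mammal": 1e8, "bird": 3e8, "dolphin": 5e7,
    "grampus griseus": 1e4, "sperm whale": 1e6, "king penguin": 1e6,
}


def propagate(leaf_prob, parent):
    """Return (estimate, height): summed leaf mass and edges to deepest leaf."""
    estimate = {node: 0.0 for node in parent}
    height = {node: 0 for node in parent}
    for leaf, p in leaf_prob.items():
        node, steps = leaf, 0
        while node is not None:
            estimate[node] += p
            height[node] = max(height[node], steps)
            node, steps = parent[node], steps + 1
    return estimate, height


def name_content(leaf_prob, parent, count, lam=5.0):
    """Pick the node maximizing estimate * (log count - lam * height)."""
    estimate, height = propagate(leaf_prob, parent)

    def expected_reward(node):
        return estimate[node] * (math.log(count[node]) - lam * height[node])

    return max(parent, key=expected_reward)


print(name_content(LEAF_PROB, PARENT, COUNT))
```

The root always has an estimate of 1.0, so without the height penalty the most general word would tend to win; the penalty is what lets a mid-level word such as "dolphin" come out on top in this toy setup.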

2.2 Supervised Learning
[Figure: leaf-category predictions with confidences (Grampus griseus, American black bear, Grizzly bear, King penguin, Cormorant, Homing pigeon, Ball-peen hammer, Spigot, Diskette (floppy), Steel arch bridge, Farmhouse, Soapweed, Brazilian rosewood, Bristlecone pine, Cliff diving, Crabapple) feed classifiers for entry-level words: Bear, Dog, Penguin, Tree, Palm tree, Building, House, Bird.]
Training from weak annotations: SBU Captioned Photo Dataset, 1 million captioned images!
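
Training from weak annotations can be sketched as learning, for each entry-level word, a classifier over the leaf-category scores, with the label taken from whether the word occurs in the image's caption. The sketch below uses plain logistic regression on synthetic data; the model choice, the leaf names used as features, and the rule generating the labels are all assumptions made for illustration, not details taken from the slide.

```python
import math
import random


def train_word_classifier(features, labels, steps=300, lr=0.5):
    """Logistic regression trained by batch gradient descent.

    features: one vector of leaf-category scores per image.
    labels: 1 if the target word appears in the image's caption, else 0.
    """
    dim = len(features[0])
    n = len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            err = 1.0 / (1.0 + math.exp(-z)) - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Synthetic stand-in for (leaf-category scores, is "tree" in the caption?).
LEAVES = ["bristlecone pine", "crab apple", "king penguin", "spigot"]
rng = random.Random(0)
X, y = [], []
for _ in range(200):
    x = [rng.random() for _ in LEAVES]
    X.append(x)
    # Assumed rule: captions mention "tree" when the tree-like leaves fire.
    y.append(1 if x[0] + x[1] > 1.0 else 0)

weights, bias = train_word_classifier(X, y)
top = sorted(zip(LEAVES, weights), key=lambda kv: kv[1], reverse=True)
print([name for name, _ in top[:2]])
```

Sorting the learned weights, as in the last lines, gives the kind of per-word ranking of leaf categories that one can inspect to see what a word's classifier has picked up.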

Extracting Meaning from Data
Weights learned to recognize images with “tree” in caption:
• snag
• shade tree
• bracket fungus, shelf fungus
• bristlecone pine, Rocky Mountain bristlecone pine, Pinus aristata
• Brazilian rosewood, caviuna wood, jacaranda, Dalbergia nigra
• redheaded woodpecker, redhead, Melanerpes erythrocephalus
• redbud, Cercis canadensis
• mangrove, Rhizophora mangle
• chiton, coat-of-mail shell, sea cradle, polyplacophore
• crab apple, crabapple
• papaya, papaia, pawpaw, papaya tree, melon tree, Carica papaya
• frogmouth
Legend: Mammals, Birds, Instruments, Structures, Plants, Other

Extracting Meaning from Data
Weights learned to recognize images with “water” in caption:
• water dog
• surfing, surfboarding, surfriding
• manatee, Trichechus manatus
• punt
• dip, plunge
• cliff diving
• fly-fishing
• sockeye, sockeye salmon, red salmon, blueback salmon, Oncorhynchus nerka
• sea otter, Enhydra lutris
• American coot, marsh hen, mud hen, water hen, Fulica americana
• booby
• canal boat, narrowboat
Legend: Mammals, Birds, Instruments, Structures, Plants, Other

Results: Content Naming
Columns: Human Labels | Flat Classifier | Deng et al. CVPR’12 | Propagated Visual Estimates | Supervised Learning | Joint
farm, fence field horse, mule kite, dirt people tree, zoo gelding yearling shire yearling draft horse equine perissodactyl ungulate male horse tree equine male gelding horse pasture field cow fence

Results: Content Naming
Columns: Human Labels | Flat Classifier | Deng et al. CVPR’12 | Propagated Visual Estimates | Supervised Learning | Joint
fence, junk sign stop sign street sign trash can tree feeder Hyla cleaner box large woody tree structure plant vascular tree structure building plant area logo street neighborhood building office building

Conclusions/Future Work
• We explored different models for content naming in images.
• Results can be used toward the larger goal of generating human-like image descriptions.
• Go beyond nouns and infer other types of abstraction, such as action and attribute words.

Questions?