CS 6501/4501: Vision and Language Entry-level Categories and Naming
Last Class • Referring Expressions vs Image Captions • Generating Referring Expressions • Referring Expression Comprehension • Guest Lecture by Lisa Anne Hendricks – DeepMind • Captioning and Bias • Visual Grounding of Language
What would you call this? Grampus griseus Dolphin
What would you call this? Object Organism Animal Chordate Vertebrate Bird Aquatic bird Swan Whistling swan Cygnus columbianus
What would you call this? Object Artifact Instrument Transport Vehicle Craft Vessel Watercraft Ship Cargo vessel Freighter
“The apparent ease with which people identify common objects belies the subtlety and complexity of the operations and structures involved in” Jolicoeur, Gluck & Kosslyn, 1984
Principles of Categorization – Eleanor Rosch 1976 - Cognitive Economy - Perceived World Structure
Basic-Level Category The most abstract category at which members share a maximum number of common attributes. The most abstract category for which an image could be reasonably representative of a class as a whole. Rosch et al, 1976 Superordinates: animal, vertebrate Basic-Level: bird Subordinates: Black-capped chickadee
Four experiments to find Basic-Level Categories • Attribute listing • Action (Motor movement) listing • Shape overlap • Average outline naming
Implications • Imagery: The most abstract category at which an average image can still be recognized. • Development: Children can first recognize basic-level categories. • Perception: We first identify basic-level categories and then identify either subordinates or superordinates. • Language: The word of choice when naming instances even when belonging to subordinate categories.
Entry-Level Category The category that people are likely to name when presented with a depiction of an object. Rosch et al, 1976 Jolicoeur, Gluck & Kosslyn, 1984 Superordinates: animal, vertebrate Basic Level: bird Entry Level: bird Subordinates: Black-capped chickadee
Entry-Level Category The category that people are likely to name when presented with a depiction of an object. Rosch et al, 1976 Jolicoeur, Gluck & Kosslyn, 1984 Superordinates: animal, bird Basic Level: bird Entry Level: penguin Subordinates: Chinstrap penguin
Pictures and Names: Making the Connection Jolicoeur, Gluck, Kosslyn 1984
Experiment 1 • Do we identify basic-level categories first and then move on to superordinate categories? Or is there a separate, slower process that goes directly to superordinate categories? • Example: apple • Name the basic-level category / Name a superordinate (e.g., fruit)
Experiment 2 • How about subordinate categories? Do we indeed need additional perceptual information compared to recognizing superordinates? Picture followed by word
Experiment 3 • Is the entry point always the basic level category? What about for atypical objects? Give a name for the picture
Naming Image Content • Vision: Input Image to Thousands of Noisy Category Predictions • Naming: Pick the Best (What Should I Call It?), e.g. Grampus griseus to Dolphin • Figure: predicted categories Grampus griseus, American black bear, Grizzly bear, King penguin, Cormorant, Homing pigeon, Ball-peen hammer, Spigot, Diskette (floppy), Steel arch bridge, Farmhouse, Soapweed, Brazilian rosewood, Bristlecone pine, Cliffdiving, Crabapple; confidence scores in slide order: 0.16, 0.25, 0.11, 0.56, 0.26, 0.06, 0.07, 0.06, 0.16, 0.03, 0.12, 0.13, 0.04, 0.19, 0.80, 0.83
Is this hard? • WordNet hierarchy: Living thing, Plant (Flora), Bird, Seabird, Penguin, King penguin, Angiosperm, Bulbous Plant, Flower, Narcissus, Cormorant, Orchid, Frog Orchid, Daffodil, Daisy
How will we do it? • Linguistic resources: WordNet • Lots of text: Google Web 1T • Labeled images (Computer Vision): ImageNet • Lots of images with text: SBU Captioned Dataset • Example captions: “The Egyptian cat statue by the floor clock and perpetual motion”; “Interior design of modern white and brown living room furniture hanging.”; “Man sits in a rusted car buried in the sand on Waitarere beach”; “Little girl and her dog in northern Thailand. They both seemed.”; “Our dog Zoe in her bed”; “Emma in her hat looking super cute”
Scaling Naming Tasks! 48 categories > 7000 categories
1. Goal: Category Translation Detailed Category Grampus griseus What should I Call It? (Entry-Level Category) dolphin 2. Goal: Content Naming Input Image What should I Call It? (Entry-Level Category) dolphin
Category Translation by Humans Friesian, Holstein-Friesian cow cattle pasture fence
1.1 Category Translation: Text-based • Figure: WordNet hierarchy over Animal, Mammal, Bird, Cetacean, Seabird, Whale, Dolphin, Penguin, Cormorant, Grampus griseus, Sperm whale, King penguin, with each node annotated with its n-gram frequency (values in slide order: 656M, 15M, 128M, 0.9M, 88M, 1.2M, 55M, 30M, 22M, 6.4M, 0.08M, 366M) • Axes: Naturalness (n-gram frequency) and Semantic Distance
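The text-based translation idea can be illustrated with a minimal sketch: walk up the hierarchy from the detailed category and pick the ancestor that best trades off naturalness (log n-gram frequency) against semantic distance (number of steps up). The hierarchy below is a simplified toy, the counts are made-up placeholders rather than the values in the figure, and the scoring rule `log(count) - lam * distance` with the parameter `lam` is an assumption for illustration, not necessarily the exact objective used in the lecture.

```python
import math

# Hypothetical toy hierarchy (child -> parent); NOT the real WordNet graph.
PARENT = {
    "grampus griseus": "dolphin",
    "dolphin": "whale",
    "whale": "cetacean",
    "cetacean": "mammal",
    "mammal": "animal",
}

# Made-up n-gram counts, for illustration only.
COUNTS = {
    "grampus griseus": 1e4,
    "dolphin": 1e7,
    "whale": 2e7,
    "cetacean": 1e5,
    "mammal": 5e6,
    "animal": 1e8,
}


def ancestors(node):
    """Yield (category, distance) pairs, starting with the node itself at distance 0."""
    distance = 0
    while node is not None:
        yield node, distance
        node = PARENT.get(node)
        distance += 1


def translate(node, lam=1.0):
    """Pick the ancestor maximizing log-frequency minus lam times the distance."""
    best, _ = max(
        ancestors(node),
        key=lambda pair: math.log(COUNTS[pair[0]]) - lam * pair[1],
    )
    return best


print(translate("grampus griseus"))
```

With `lam=0` the most frequent word wins regardless of how generic it is; with a very large `lam` the detailed category is kept unchanged. Intermediate values favor a frequent word that is still close to the input category.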
1.2 Category Translation: Image-based • Friesian, Holstein-Friesian • Vision System • (1.9071) cow (1.1851) orange_tree (0.6136) stall (0.5630) mushroom (0.3825) pasture (0.3156) sheep (0.3321) black_bear (0.3015) puppy (0.2409) pedestrian_bridge (0.2353) nest
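One way to read the image-based variant is as a ranking problem: run a vision system that scores candidate nouns on several images of the detailed category, aggregate the scores per noun, and take the top-ranked noun as the translation. The sketch below assumes mean aggregation and uses invented per-image scores; both the aggregation rule and the numbers are assumptions for illustration, not the lecture's actual system or results.

```python
from statistics import mean

# Hypothetical noun scores from a vision system on three images of one
# detailed category (e.g. Holstein-Friesian). Values are made up.
PER_IMAGE_SCORES = [
    {"cow": 2.1, "pasture": 0.4, "sheep": 0.5},
    {"cow": 1.6, "pasture": 0.7, "sheep": 0.2},
    {"cow": 1.9, "pasture": 0.3, "sheep": 0.4},
]


def rank_nouns(per_image_scores):
    """Average each noun's score over all images and sort in descending order."""
    nouns = per_image_scores[0].keys()
    averaged = {
        noun: mean(image[noun] for image in per_image_scores) for noun in nouns
    }
    return sorted(averaged.items(), key=lambda item: item[1], reverse=True)


ranking = rank_nouns(PER_IMAGE_SCORES)
print(ranking[0][0])  # top-ranked noun is the proposed translation
```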
Category Translation: Examples HUMANS TEXT BASED IMAGE BASED cactus wren bird buzzard, Buteo buteo hawk bird whinchat, Saxicola rubetra bird chat bird Weimaraner dog dog numbat, banded anteater, anteater cat rhea, Rhea americana ostrich bird grass Europ. black grouse, heathfowl bird duck yellowbelly marmot, rockchuck Squirrel marmot rock
1. Goal: Category Translation Detailed Category Grampus griseus What should I Call It? (Entry-Level Category) dolphin 2. Goal: Content Naming Input Image What should I Call It? (Entry-Level Category) dolphin
Large Scale Categorization • Flat Classifiers • Selective Search Windows, van de Sande et al. ICCV 2011 • Local descriptors • Coding (LLC), Wang et al. CVPR 2010 • Spatial pooling • Figure: predicted categories Grampus griseus, American black bear, Grizzly bear, King penguin, Cormorant, Homing pigeon, Ball-peen hammer, Spigot, Diskette (floppy), Steel arch bridge, Farmhouse, Soapweed, Brazilian rosewood, Bristlecone pine, Cliffdiving, Crabapple; confidence scores in slide order: 0.16, 0.25, 0.11, 0.56, 0.26, 0.06, 0.07, 0.06, 0.16, 0.03, 0.12, 0.13, 0.04, 0.19, 0.80, 0.41
2.1 Propagated Visual Estimates • Our work; Deng et al. CVPR 2012 • Figure: WordNet hierarchy over Animal, Mammal, Bird, Cetacean, Seabird, Whale, Dolphin, Penguin, Cormorant, Grampus griseus, Sperm whale, King penguin, annotated with n-gram frequencies (in slide order: 656M, 15M, 128M, 0.9M, 55M, 1.2M, 30M, 0.08M, 6.4M, 22M, 366M, 88M) and propagated visual scores (in slide order: 0.2, 0.8, 0.2, 0.8, 0.8, 0.15, 0.15, 0.05, 0.6, 0.6, 0.2, 1.0) • Axes: Naturalness, Specificity, Accuracy
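The propagation step can be sketched as summing leaf-level classifier scores up the hierarchy, so that every internal node accumulates the evidence of its descendants. The toy hierarchy below is simplified, and the assignment of scores to leaves (0.6, 0.2, 0.15, 0.05) is an assumption chosen so that the sums reproduce values visible in the figure (0.8, 0.2, 1.0); the slide text alone does not confirm which score belongs to which node.

```python
from collections import defaultdict

# Simplified toy hierarchy (child -> parent); an assumption, not real WordNet.
PARENT = {
    "grampus griseus": "dolphin",
    "dolphin": "whale",
    "sperm whale": "whale",
    "whale": "cetacean",
    "cetacean": "mammal",
    "mammal": "animal",
    "king penguin": "penguin",
    "penguin": "seabird",
    "cormorant": "seabird",
    "seabird": "bird",
    "bird": "animal",
}

# Assumed leaf scores from a flat classifier.
LEAF_SCORES = {
    "grampus griseus": 0.6,
    "sperm whale": 0.2,
    "king penguin": 0.15,
    "cormorant": 0.05,
}


def propagate(leaf_scores, parent):
    """Add each leaf score to the leaf itself and to every one of its ancestors."""
    totals = defaultdict(float)
    for leaf, score in leaf_scores.items():
        node = leaf
        while node is not None:
            totals[node] += score
            node = parent.get(node)
    return dict(totals)


totals = propagate(LEAF_SCORES, PARENT)
for name in ("dolphin", "whale", "seabird", "animal"):
    print(name, round(totals[name], 2))
```

General nodes such as "animal" always collect the highest score, which is why a selection rule also has to reward specificity and naturalness rather than accuracy alone.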
2.2 Supervised Learning • Training from weak annotations • SBU Captioned Photo Dataset: 1 million captioned images! • Figure: leaf category predictions (Grampus griseus, American black bear, Grizzly bear, King penguin, Cormorant, Homing pigeon, Ball-peen hammer, Spigot, Diskette (floppy), Steel arch bridge, Farmhouse, Soapweed, Brazilian rosewood, Bristlecone pine, Cliffdiving, Crabapple; confidence scores in slide order: 0.80, 0.16, 0.25, 0.11, 0.56, 0.26, 0.06, 0.07, 0.06, 0.16, 0.03, 0.12, 0.13, 0.04, 0.19, 0.41) mapped to entry-level nouns Bear, Dog, Penguin, Tree, Palm tree, Building, House, Bird
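Training from weak annotations can be illustrated with a small sketch: treat the leaf-category scores of an image as features, use "does the caption contain the word tree?" as a noisy label, and fit a linear classifier. The model here is a plain logistic regression trained by gradient descent in pure Python; the three categories, the feature values, the labels, and the hyperparameters are all made up for illustration and are not taken from the SBU dataset or the lecture's actual model.

```python
import math

# Hypothetical leaf categories used as feature dimensions.
CATEGORIES = ["bristlecone pine", "crabapple", "spigot"]

# Made-up leaf-category scores for four images, and whether each image's
# caption contains the word "tree" (1) or not (0).
X = [
    [0.9, 0.1, 0.0],
    [0.1, 0.8, 0.1],
    [0.0, 0.1, 0.9],
    [0.1, 0.0, 0.8],
]
y = [1, 1, 0, 0]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(features, labels, lr=0.5, epochs=500):
    """Fit logistic regression weights and bias with batch gradient descent."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for row, label in zip(features, labels):
            z = sum(w * x for w, x in zip(weights, row)) + bias
            error = sigmoid(z) - label
            for j, x in enumerate(row):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


weights, bias = train(X, y)
for name, weight in sorted(zip(CATEGORIES, weights), key=lambda p: -p[1]):
    print(f"{name}: {weight:+.2f}")
```

Inspecting the learned weights per category is the same kind of readout as listing which detailed categories carry the most weight for a caption word.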
Extracting Meaning from Data Weights learned to recognize images with “tree” in caption snag shade tree bracket fungus, shelf fungus bristlecone pine, Rocky Mountain bristlecone pine, Pinus aristata Brazilian rosewood, caviuna wood, jacaranda, Dalbergia nigra redheaded woodpecker, redhead, Melanerpes erythrocephalus redbud, Cercis canadensis mangrove, Rhizophora mangle chiton, coat-of-mail shell, sea cradle, polyplacophore crab apple, crabapple papaya, papaia, pawpaw, papaya tree, melon tree, Carica papaya frogmouth Mammals Birds Instruments Structures Plants Other
Extracting Meaning from Data Weights learned to recognize images with “water” in caption water dog surfing, surfboarding, surfriding manatee, Trichechus manatus punt dip, plunge cliff diving fly-fishing sockeye, sockeye salmon, red salmon, blueback salmon, Oncorhynchus nerka sea otter, Enhydra lutris American coot, marsh hen, mud hen, water hen, Fulica americana booby canal boat, narrowboat Mammals Birds Instruments Structures Plants Other
Results: Content Naming • Columns: Human Labels, Flat Classifier, Deng et al. CVPR’12, Propagated Visual Estimates, Supervised Learning, Joint • farm, fence field horse, mule kite, dirt people tree, zoo gelding yearling shire yearling draft horse equine perissodactyl ungulate male horse tree equine male gelding horse pasture field cow fence
Results: Content Naming • Columns: Human Labels, Flat Classifier, Deng et al. CVPR’12, Propagated Visual Estimates, Supervised Learning, Joint • fence, junk sign stop sign street sign trash can tree feeder Hyla cleaner box large woody tree structure plant vascular tree structure building plant area logo street neighborhood building office building
Conclusions/Future Work • We explored different models for content naming in images. • Results can be used to improve the larger goal of generating human-like image descriptions. • Go beyond nouns and infer other types of abstractions for action and attribute words.
Questions?