Image and video descriptors Advanced Topics in Computer Vision Spring 2010 Weizmann Institute of Science Oded Shahar and Gil Levi
Outline • Overview • Image Descriptors – Histograms of Oriented Gradients Descriptors – Shape Descriptors – Color Descriptors • Video Descriptors
Overview - Motivation • The problem we are trying to solve is image similarity. • Given two images (or image regions), are they similar or not?
Overview - Motivation • Solution: Image Descriptors. • An image descriptor “describes” a region in an image. • To compare two such regions we compare their descriptors.
Overview - Descriptor To compare two images, we will compare their descriptors Similar? Descriptor Function Similar?
Overview - Similarity • But what counts as similar? • It depends on the application!
Overview • Image (or region) similarity is used in many CV applications, for example: – Object recognition – Scene classification – Image registration – Image retrieval – Robot localization – Template matching – Building panoramas – And many more…
Overview • Example – 3D reconstruction from stereo images. [Figure: grids of raw pixel intensity values] • Comparing the raw pixels as they are will not work!
Overview • Descriptors provide a means for comparing images or image regions. • Descriptors allow certain differences between the regions – scale, rotation, illumination changes, noise, shape, etc.
Overview - Motivation Similar ? Descriptor Function • Again, can’t take the pixels alone… Similar ?
Overview Commonly used as follows: 1. Extract features from the image as small regions 2. Describe each region using a feature descriptor 3. Use the descriptors in an application (comparison, training a classifier, etc.)
Overview • Main problems – Feature Detection: where to compute the descriptors? (covered briefly) – Feature Description (Descriptors): how to compute descriptors? (today's topic) – Feature Comparison: how to compare two descriptors? (covered briefly)
Overview - Features Detection Methods Where to compute the descriptors? • Grid • Key-Points • Global
Overview - Features Detection Key-Points as Detector Output • Can be – Points – Regions (of different orientation, scale and affine transformation): squares, ellipses, circles, etc.
Overview – Descriptor Comparison Given two region descriptions, how do we compare them? • Usually a descriptor comes with its own distance function • Many descriptors use the L2 distance
Overview – Descriptor Invariance • Different descriptors measure different kinds of similarity • Descriptors can be invariant to visual effects – Illumination – Noise – Colors – Texture • Different applications require different invariances and therefore different descriptors
Outline • Overview • Image Descriptors – Histograms of Oriented Gradients Descriptors – Shape Descriptors – Color Descriptors • Video Descriptors
Descriptor To compare two images, we will compare their descriptors Similar? Descriptor Function Similar?
Descriptors Types of descriptors • Intensity based • Histogram • Gradient based • Color Based • Frequency • Shape • Combination of the above
Descriptors Why not use patches? • Very large representation. • Not invariant to small deformations in the descriptor location. • Not invariant to changes in illumination.
Descriptors Intensity Histogram (over gray levels 0 to 255) • Not invariant to light intensity change • Does not capture geometric information
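Both limitations of the intensity histogram can be seen on a toy example. The sketch below is only an illustration: the 4 x 4 pixel values, the 8-bin layout and the function name `intensity_histogram` are made up for this purpose and do not come from the slides.

```python
import random

def intensity_histogram(pixels, bins=8):
    """Count 8-bit pixel values into equal-width bins covering 0..255."""
    hist = [0] * bins
    width = 256 // bins
    for v in pixels:
        hist[min(v // width, bins - 1)] += 1
    return hist

# A toy 4x4 "image", flattened row by row (values are invented).
image = [10, 10, 200, 200,
         10, 10, 200, 200,
         90, 90, 140, 140,
         90, 90, 140, 140]

# Shuffling the pixels destroys the spatial layout but not the histogram.
shuffled = image[:]
random.Random(0).shuffle(shuffled)

# Brightening every pixel by 40 moves the counts into different bins.
brighter = [min(v + 40, 255) for v in image]

h_image = intensity_histogram(image)
h_shuffled = intensity_histogram(shuffled)
h_brighter = intensity_histogram(brighter)
```

Here `h_image` and `h_shuffled` are identical even though the two pixel layouts differ (no geometric information), while `h_brighter` differs from `h_image` although the scene is the same (no invariance to light intensity).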
Descriptors Histogram of image gradients • Normalize for light intensity invariance • Does not capture geometric information
Descriptors Solution: • Divide the area • For each section compute its own histogram SIFT - David Lowe 1999
Descriptors - SIFT How to compute the SIFT descriptor Input: an image and a location at which to compute the descriptor Step 1: Warp the image to the correct orientation and scale, and then extract the feature as a 16 x 16 pixel patch
Descriptors - SIFT Step 2: Compute the gradient for each pixel (direction and magnitude) Step 3: Divide the 16 x 16 pixels into 16 squares of 4 x 4 pixels
Descriptors - SIFT Step 4: For each square, compute a histogram of gradient directions over 8 directions. The result: a 128-dimensional feature vector (16 squares x 8 directions).
Descriptors - SIFT • Warp the feature into a 16 x 16 square. • Divide it into 16 squares of 4 x 4 pixels. • For each square, compute a histogram of the gradient directions. => Feature vector (128)
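Steps 2 to 4 can be sketched in a few lines of plain Python. This is a simplified illustration, not Lowe's implementation: it assumes the 16 x 16 patch has already been warped to the right orientation and scale, uses clamped central differences for the gradient, and leaves out the Gaussian window and the interpolation between neighbouring bins. The name `sift_like_descriptor` is hypothetical.

```python
import math

def sift_like_descriptor(patch):
    """patch: 16x16 list of lists of gray values. Returns 128 floats."""
    size, cell, nbins = 16, 4, 8
    cells_per_row = size // cell
    desc = [0.0] * (cells_per_row * cells_per_row * nbins)
    for y in range(size):
        for x in range(size):
            # Step 2: gradient by central differences, clamped at the border.
            gx = patch[y][min(x + 1, size - 1)] - patch[y][max(x - 1, 0)]
            gy = patch[min(y + 1, size - 1)][x] - patch[max(y - 1, 0)][x]
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue
            angle = math.atan2(gy, gx) % (2 * math.pi)
            obin = int(angle / (2 * math.pi) * nbins) % nbins
            # Step 3: which of the 16 squares of 4x4 pixels this pixel is in.
            cell_index = (y // cell) * cells_per_row + (x // cell)
            # Step 4: magnitude-weighted vote into that square's histogram.
            desc[cell_index * nbins + obin] += mag
    norm = math.sqrt(sum(v * v for v in desc))
    return [v / norm for v in desc] if norm > 0 else desc

# Usage: a horizontal intensity ramp has all its gradients pointing along +x.
ramp = [[10 * x for x in range(16)] for _ in range(16)]
d = sift_like_descriptor(ramp)
```

For the ramp, only the first orientation bin of each of the 16 squares is non-zero, and the vector has 16 x 8 = 128 entries with unit length.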
Descriptors - SIFT • Histogram entries are weighted by gradient magnitude and by a Gaussian window (σ is half the window size) • Normalize the feature to a unit vector • Use the L2 distance to compare features • Other distance functions can be used: – χ² (chi-square) – Earth mover’s distance
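The normalization and two of the distance functions named above are short enough to write out. This is a minimal sketch with hypothetical function names. The chi-square form used here, 0.5 * sum((a - b)^2 / (a + b)), is one common convention and assumes non-negative histogram entries.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit Euclidean length (zero vectors are returned as-is)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def l2_distance(a, b):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def chi_square_distance(a, b):
    """0.5 * sum((a_i - b_i)^2 / (a_i + b_i)), skipping bins empty in both."""
    total = 0.0
    for x, y in zip(a, b):
        if x + y > 0:
            total += (x - y) ** 2 / (x + y)
    return 0.5 * total
```

Chi-square divides each squared difference by the bin mass, so a difference in a sparsely populated bin counts for more than the same difference in a heavily populated one. L2 treats all bins alike.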
Descriptors - SIFT Invariance to illumination • Gradients are invariant to a light intensity shift (i.e. adding a scalar to all the pixels) • Normalization to unit length adds invariance to a light intensity change (i.e. multiplying all the pixels by a scalar) Invariance to shift and rotation • A single histogram does not contain any geometric information • Using 16 histograms preserves some geometric information.
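The two illumination claims follow from a short calculation. The symbols here are introduced only for this sketch: a > 0 is the intensity change, b the intensity shift, m and θ the gradient magnitude and direction, and h the unnormalized magnitude-weighted histogram vector.

```latex
\begin{align*}
I'(x,y) &= a\,I(x,y) + b, \qquad a > 0 \\
\nabla I' &= a\,\nabla I
  && \text{(the shift $b$ vanishes under differentiation)} \\
\theta' &= \operatorname{atan2}(a I_y,\; a I_x) = \theta,
  \qquad m' = \lVert \nabla I' \rVert = a\,m \\
h' &= a\,h
  && \text{(every magnitude-weighted bin scales by $a$)} \\
\frac{h'}{\lVert h' \rVert} &= \frac{a\,h}{a\,\lVert h \rVert} = \frac{h}{\lVert h \rVert}
\end{align*}
```

So the shift is removed by taking gradients and the scale factor is removed by the unit-length normalization.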
Descriptors - GLOH • Similar to SIFT • Divides the feature into log-polar bins instead of squares. – 17 log-polar location bins – 16 orientation bins – This gives 17 x 16 = 272 dimensions. • Apply PCA to the 272 dimensions and keep 128 components. K. Mikolajczyk and C. Schmid. A performance evaluation of local descriptors. TPAMI 2005
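The count of 17 location bins comes from one undivided central bin plus two rings of 8 angular sectors each. The sketch below maps a pixel offset to such a bin. The radii 6, 11 and 15 are the values reported by Mikolajczyk and Schmid, but since the slides do not state them, treat them here as assumptions. The function name and the bin numbering are hypothetical.

```python
import math

RADII = (6.0, 11.0, 15.0)  # assumed radial boundaries, in pixels
ANGULAR_BINS = 8

def gloh_location_bin(dx, dy):
    """Map an offset from the patch centre to a bin 0..16, or None if outside."""
    r = math.hypot(dx, dy)
    if r >= RADII[-1]:
        return None
    if r < RADII[0]:
        return 0  # central bin, not split by angle
    ring = 1 if r < RADII[1] else 2
    angle = math.atan2(dy, dx) % (2 * math.pi)
    sector = int(angle / (2 * math.pi) * ANGULAR_BINS) % ANGULAR_BINS
    return 1 + (ring - 1) * ANGULAR_BINS + sector

# Usage: collect every bin index reachable from integer offsets.
reachable = {gloh_location_bin(dx, dy)
             for dx in range(-15, 16) for dy in range(-15, 16)}
reachable.discard(None)
```

`reachable` contains the 17 indices 0 to 16. Pairing each location bin with a 16-bin orientation histogram gives the 272 dimensions before PCA.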
SURF • Uses integral images to detect and describe SIFT-like features • SURF describes an image about 3 times faster than SIFT • SURF is not as good as SIFT at invariance to illumination change and viewpoint change
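The speed of SURF rests on the integral image: once it is built, the sum over any axis-aligned box costs four lookups regardless of the box size. The sketch below shows only that building block, not SURF itself. The function names are hypothetical.

```python
def integral_image(img):
    """Return S where S[y][x] is the sum of img over rows < y and columns < x."""
    h, w = len(img), len(img[0])
    S = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            S[y + 1][x + 1] = S[y][x + 1] + row_sum
    return S

def box_sum(S, top, left, bottom, right):
    """Sum of img over rows top..bottom-1 and columns left..right-1."""
    return S[bottom][right] - S[top][right] - S[bottom][left] + S[top][left]

# Usage on a 3x3 example.
img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
S = integral_image(img)
whole = box_sum(S, 0, 0, 3, 3)         # all nine pixels
lower_right = box_sum(S, 1, 1, 3, 3)   # 5 + 6 + 8 + 9
```

The extra row and column of zeros in `S` avoid special cases for boxes that touch the top or left border.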
Descriptors Histograms of Oriented Gradients Descriptors • SIFT: David Lowe, 1999 • GLOH: Mikolajczyk K., Schmid C., 2005 • SURF: Bay H., Ess A., Tuytelaars T., Van Gool L., 2008
Outline • Overview • Image Descriptors – Histograms of Oriented Gradients Descriptors – Shape Descriptors – Color Descriptors • Video Descriptors
Descriptors
Descriptors - Shape Context • Assume we have a good edge detector • Take a patch of edges? Not invariant to small deformations in the shape
Descriptors - Shape Context • Quantize the edge map around a point using log-polar binning • In each bin, count the number of edge points
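A minimal sketch of the log-polar counting, under stated assumptions: 5 radial by 12 angular bins, a radius range of 0.125 to 2 in normalized units, and normalization by the mean distance from the reference point. These are illustrative choices and a simplification, not values taken from the slides. The function name is hypothetical.

```python
import math

def shape_context(points, ref, r_bins=5, theta_bins=12, r_min=0.125, r_max=2.0):
    """Log-polar histogram of the edge points in `points` as seen from `ref`."""
    others = [p for p in points if p != ref]
    dists = [math.hypot(px - ref[0], py - ref[1]) for px, py in others]
    mean_d = sum(dists) / len(dists)  # dividing by this makes the bins scale-free
    log_lo, log_span = math.log(r_min), math.log(r_max / r_min)
    hist = [[0] * theta_bins for _ in range(r_bins)]
    for (px, py), d in zip(others, dists):
        r = d / mean_d
        if r >= r_max:
            continue  # too far away to fall in any bin
        r = max(r, r_min)  # very close points go to the innermost ring
        ri = min(int((math.log(r) - log_lo) / log_span * r_bins), r_bins - 1)
        theta = math.atan2(py - ref[1], px - ref[0]) % (2 * math.pi)
        ti = int(theta / (2 * math.pi) * theta_bins) % theta_bins
        hist[ri][ti] += 1
    return hist

# Usage: the same shape at two sizes gives the same histogram.
edges = [(0, 0), (4, 0), (0, 4), (-4, 0), (0, -4), (2, 2)]
h_small = shape_context(edges, ref=(0, 0))
h_large = shape_context([(3 * x, 3 * y) for x, y in edges], ref=(0, 0))
```

Because the rings grow logarithmically, the bins are fine near the reference point and coarse far away, which is what makes the histogram tolerant to small deformations of distant parts of the shape.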
Complex Notion of Similarity
The Local Self-Similarity Descriptor Input image Correlation surface Image descriptor
The Local Self-Similarity Descriptor Pipeline: input image, correlation surface, image descriptor (MAX over each bin) Properties & Benefits: 1. A unified treatment of repetitive patterns, color, texture, edges 2. Captures the shape of a local region 3. Invariant to appearance 4. Accounts for small local affine & non-rigid deformations
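The pipeline can be sketched on a grayscale image: compare the central patch with every patch in the surrounding region by sum of squared differences (SSD), turn the SSD into a correlation value, and keep the maximum per log-polar bin. The patch size, region radius, bin counts and the constant `var` below are illustrative placeholders, not the settings used by Shechtman and Irani. The function names are hypothetical.

```python
import math

def ssd(img, y0, x0, y1, x1, pr):
    """Sum of squared differences between two (2*pr+1) x (2*pr+1) patches."""
    total = 0.0
    for dy in range(-pr, pr + 1):
        for dx in range(-pr, pr + 1):
            d = img[y0 + dy][x0 + dx] - img[y1 + dy][x1 + dx]
            total += d * d
    return total

def self_similarity(img, cy, cx, pr=1, region=4, n_theta=8, n_rad=2, var=100.0):
    """Max-pooled log-polar descriptor of the correlation surface at (cy, cx).

    The caller must keep (cy, cx) at least region + pr pixels from the border.
    """
    desc = [0.0] * (n_theta * n_rad)
    for dy in range(-region, region + 1):
        for dx in range(-region, region + 1):
            r = math.hypot(dx, dy)
            if r == 0 or r > region:
                continue
            # Correlation surface: 1 for identical patches, near 0 for very different ones.
            corr = math.exp(-ssd(img, cy, cx, cy + dy, cx + dx, pr) / var)
            theta = math.atan2(dy, dx) % (2 * math.pi)
            ti = int(theta / (2 * math.pi) * n_theta) % n_theta
            ri = min(int(math.log(r) / math.log(region + 1) * n_rad), n_rad - 1)
            b = ri * n_theta + ti
            desc[b] = max(desc[b], corr)  # MAX over each bin
    return desc

# Usage: a pattern and its photometric negative share one descriptor.
pattern = [[(7 * x + 3 * y) % 50 for x in range(11)] for y in range(11)]
negative = [[255 - v for v in row] for row in pattern]
d_pattern = self_similarity(pattern, 5, 5)
d_negative = self_similarity(negative, 5, 5)
```

The two descriptors are equal because only differences between patches inside the same image enter the computation. That is the sense in which the descriptor is invariant to appearance while still capturing the local layout.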
Descriptors Shape Descriptors Allow measuring shape similarity • Shape Context: Belongie S., Malik J., Puzicha J. Shape Matching and Object Recognition Using Shape Contexts. PAMI, 2002. • Local Self-Similarity: Shechtman E., Irani M. Matching Local Self-Similarities across Images and Videos. CVPR, 2007. • Geometric Blur: Berg A. C., Malik J. Geometric Blur for Template Matching. CVPR, 2001. Outperform the commonly used SIFT in object classification tasks: Horster E., Greif T., Lienhart R., Slaney M. Comparing local feature descriptors in pLSA-based image models.
Outline • Overview • Image Descriptors – Histograms of Oriented Gradients Descriptors – Shape Descriptors – Color Descriptors • Video Descriptors
Color Descriptors
Color Descriptors Color spaces • RGB • HSV • Opponent
Color Descriptors Opponent color space • Intensity information is represented by channel O3 • Color information is represented by channels O1 and O2 • O1 and O2 are invariant to offset
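The slide's formula did not survive extraction, so the sketch below uses the definition of the opponent space that is common in the color-descriptor literature; treat it as an assumption about what the slide showed. The function name is hypothetical.

```python
import math

def rgb_to_opponent(r, g, b):
    """Convert one RGB pixel to opponent channels (O1, O2, O3)."""
    o1 = (r - g) / math.sqrt(2)          # red versus green
    o2 = (r + g - 2 * b) / math.sqrt(6)  # yellow versus blue
    o3 = (r + g + b) / math.sqrt(3)      # intensity
    return o1, o2, o3

# Usage: add the same offset of 50 to every channel.
before = rgb_to_opponent(10, 20, 30)
after = rgb_to_opponent(60, 70, 80)
```

`before` and `after` agree in O1 and O2 because an equal offset cancels in R - G and in R + G - 2B. Only O3 changes.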
Color Descriptors • RGB color histogram • Opponent O1, O2 • Color moments – Use all generalized color moments up to the second degree and the first order – Gives information on the distribution of the colors
Color Descriptors • RGB-SIFT: descriptors are computed for every RGB channel independently – Normalize each channel separately – Invariant to light color change • rg-SIFT: SIFT descriptors over the r and g channels of the normalized-RGB space (2 x 128 dimensions per descriptor) • OpponentSIFT: describes all the channels in the opponent color space • C-SIFT: uses O1/O3 and O2/O3 of the opponent color space (2 x 128 dimensions per descriptor) – Scale-invariant with respect to light intensity – Due to the definition of the color space, the offset does not cancel out when taking the derivative G. J. Burghouts and J. M. Geusebroek. Performance evaluation of local color invariants, 2009
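The scale invariance of the channels behind rg-SIFT and C-SIFT can be checked on a single pixel. The sketch covers only the channel computation, not the SIFT step that runs on top of it. It assumes the usual normalized-RGB definition r = R/(R+G+B) and the common opponent-space definition, and the function names are hypothetical.

```python
import math

def normalized_rg(r, g, b):
    """r and g chromaticities of normalized RGB (the third is 1 - r - g)."""
    s = r + g + b
    return (r / s, g / s) if s > 0 else (0.0, 0.0)

def c_invariant(r, g, b):
    """O1/O3 and O2/O3 of the opponent color space."""
    o1 = (r - g) / math.sqrt(2)
    o2 = (r + g - 2 * b) / math.sqrt(6)
    o3 = (r + g + b) / math.sqrt(3)
    return (o1 / o3, o2 / o3) if o3 > 0 else (0.0, 0.0)

# Usage: the same surface under light that is 2.5 times more intense,
# and under light with an added offset of 50 per channel.
pixel = (40, 80, 120)
scaled = (100, 200, 300)
shifted = (90, 130, 170)
```

Both functions return the same values for `pixel` and `scaled`, since the common factor cancels in the ratios. For `shifted` the values change, which matches the remark that the offset does not cancel out.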
Color Descriptors Studies the invariance properties and the distinctiveness of color descriptors under: • Light intensity change • Light intensity shift • Light intensity shift and change • Light color change • Light color change and shift
Color Descriptors Increased invariance can reduce discriminative power
Color Descriptors Descriptor performance on image benchmark
Color Descriptors
Descriptors How to choose your descriptor? What kind of similarity do you need for your application?
Descriptors
Descriptors
Name | Description | Captures
SIFT | Gradient histograms | Texture, gradients
GLOH | Variant of SIFT, log-polar descriptor | Texture, gradients
SURF | Faster variant of SIFT with lower performance | Texture, gradients
Shape Context | Histogram of edges, good for shape description | Shape, edges
Self-Similarity | Higher-level shape description, invariant to appearance | Shape
RGB-SIFT | Descriptors are computed for every RGB channel independently | Texture, gradients
C-SIFT | Based on the opponent color space, shown to be better than SIFT for object and scene recognition | Texture, gradients, color
Outline • Overview • Image Descriptors – Histograms of Oriented Gradients Descriptors – Shape Descriptors – Color Descriptors • Video Descriptors
Video Descriptors Application: Action recognition • Video is more than just a sequence of images • We want to capture temporal information
Video Descriptors • Space-Time SIFT – 64-direction histogram P. Scovanner, S. Ali, M. Shah. A 3-dimensional SIFT descriptor and its application to action recognition, 2007
Video Descriptors Actions as Space-Time Shapes
3D Shape Context Represent an action in a video sequence by a 3D point cloud extracted by sampling 2D silhouettes over time M. Grundmann, F. Meier, and I. Essa (2008) “3D Shape Context and Distance Transform for Action Recognition”
The Local Self-Similarity Descriptor in Video Pipeline: input video (axes x, y, time), space-time patch compared within a space-time region, correlation volume, video descriptor Application: action detection
Video Descriptors • On Space-Time Interest Points; Ivan Laptev – Local image features provide compact and abstract representations of images, e.g. corners – Extend the concept of a spatial corner detector to a spatio-temporal corner detector
Space-Time Interest Points • Consider a synthetic sequence of a ball moving towards a wall and colliding with it • An interest point is detected at the collision point
Space-Time Interest Points • Consider a synthetic sequence of 2 balls moving towards each other • Different interest points are detected at different spatial and temporal scales (the figure shows a coarser scale)
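The slides give no formula for the spatio-temporal corner detector. The block below sketches the usual formulation of a space-time extension of the Harris detector and is stated here as an assumption, not as a transcription of the slides. The symbols are introduced for this sketch: f is the video, g a spatio-temporal Gaussian, σ and τ the spatial and temporal scales (subscript l for smoothing, i for integration), and k a small constant.

```latex
\begin{align*}
L(\cdot;\sigma_l^2,\tau_l^2) &= g(\cdot;\sigma_l^2,\tau_l^2) * f \\
\mu &= g(\cdot;\sigma_i^2,\tau_i^2) *
\begin{pmatrix}
L_x^2   & L_x L_y & L_x L_t \\
L_x L_y & L_y^2   & L_y L_t \\
L_x L_t & L_y L_t & L_t^2
\end{pmatrix} \\
H &= \det(\mu) - k\,\operatorname{trace}^3(\mu)
   = \lambda_1\lambda_2\lambda_3 - k\,(\lambda_1+\lambda_2+\lambda_3)^3
\end{align*}
```

Interest points are taken at positive local maxima of H, where all three eigenvalues of μ are large. This requires strong variation in time as well as in space, which is why a ball changing direction at a wall is detected while a ball in uniform motion is not.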
Conclusion • The problem we are trying to solve is similarity between images and videos. • Descriptors provide a solution
Conclusion • Tradeoff between keeping the geometric structure and obtaining invariance properties (perturbations & rotations). • Tradeoff between preserving information and obtaining invariance.
Thank You