Detecting Categories in News Video Using Image Features

Slav Petrov, Arlo Faria, Pascal Michaillat, Alex Berg, Andreas Stolcke, Dan Klein, Jitendra Malik

System Overview
[Diagram: a Video source feeds a three-stage pipeline. Labels by stage:]
§ Inputs: Images, ASR, Audio
§ Feature extraction: GB, TFIDF, MFCC
§ Primary systems: SVM, SVM, GMM
§ Higher-level systems: Sequential context, Category correlation, Source combination, 1-best selection

Image Features in TRECVID ’05
§ IBM:
  § Color Histogram
  § Color Correlogram
  § Color Moments
  § Co-occurrence Texture
  § Wavelet Texture Grid
  § Edge Histogram Layout
§ CMU (local):
  § Color Histograms (in different color spaces)
  § Texture Histograms
  § Edge Histograms
§ Columbia (part-based model):
  § Color
  § Texture
  § Size
  § Spatial Relation
§ Tsinghua (local and global):
  § Color Auto-Correlograms
  § Color Coherence Vectors
  § Color Histograms
  § Color Moments
  § Edge Histograms
  § Wavelet Texture

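Image Features in TRECVID ’05
[Table: groups (Berkeley, Tsinghua, Columbia, CMU, IBM) against feature types (Color Moments, Color Correlograms, Color Histograms, Texture, Wavelets, Edge Histograms, Shape). The cell entries did not survive in this transcript.]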

Exemplars for Recognition
§ Use exemplars for recognition
§ Compare query image and each exemplar using shape cues
[Figure labels: Database of Exemplars, Query Image]

Finding similar patches
[Figure labels: Exemplar, Query]

Geometric Blur (Local Appearance Descriptor) [Berg & Malik, CVPR ’01]
§ Compute sparse channels from image
§ Extract a patch in each channel
§ Apply spatially varying blur and sub-sample
§ Descriptor is robust to small affine distortions
[Figure labels: Idealized signal, Geometric Blur Descriptor]

GB in Practice
§ In practice, compute discrete blur levels for the whole image and sample as needed for each feature location.
[Figure labels: Horizontal Channel, Vertical Channel, Increasing Blur]
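The precompute-then-sample idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the Gaussian kernel, the blur levels in `sigmas`, the linear rule `alpha * r + beta` for choosing a blur level, and the ring-shaped sample layout are all hypothetical choices made for the example.

```python
import numpy as np

def gaussian_kernel(sigma):
    """1-D Gaussian kernel, normalized to sum to 1."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2.0 * sigma**2))
    return k / k.sum()

def blur(channel, sigma):
    """Separable Gaussian blur with edge padding; output has the input's shape."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    padded = np.pad(channel, r, mode="edge")
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)

def blur_stack(channel, sigmas):
    """Blur the whole channel once per discrete blur level."""
    return np.stack([blur(channel, s) for s in sigmas])

def geometric_blur_descriptor(stack, sigmas, center, radii,
                              n_angles=8, alpha=0.5, beta=1.0):
    """Sample the precomputed stack around `center`.

    Points farther from the center are read from a more blurred level
    (assumed rule: target sigma = alpha * r + beta, snapped to the
    nearest available level).
    """
    cy, cx = center
    h, w = stack.shape[1:]
    sig = np.asarray(sigmas, dtype=float)
    values = [stack[0, cy, cx]]  # center sample from the first level in `sigmas`
    for r in radii:
        level = int(np.argmin(np.abs(sig - (alpha * r + beta))))
        for a in np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False):
            y = int(np.clip(round(cy + r * np.sin(a)), 0, h - 1))
            x = int(np.clip(round(cx + r * np.cos(a)), 0, w - 1))
            values.append(stack[level, y, x])
    return np.array(values)
```

The sketch handles a single channel. With several channels (the slide shows a horizontal and a vertical one), one would presumably build one stack per channel and concatenate the sampled values.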

Comparing Images [Berg, Berg & Malik, CVPR ’05]
§ Sample 200 GB features from edge points
§ Dissimilarity from A to B is [equation not preserved in this transcript], where the Fx are the GB features.
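Since the equation itself did not survive in the transcript, the following is only a plausible sketch of a directed dissimilarity between two feature sets, not the slide's formula. It assumes that each feature of A is matched to its nearest feature in B and that the squared Euclidean distances are averaged; the function name and the use of squared distance are assumptions.

```python
import numpy as np

def directed_dissimilarity(feats_a, feats_b):
    """Mean, over the features of A, of the squared Euclidean distance
    to the closest feature of B (assumed form, not the slide's equation)."""
    a = np.asarray(feats_a, dtype=float)   # shape (m, d)
    b = np.asarray(feats_b, dtype=float)   # shape (n, d)
    diff = a[:, None, :] - b[None, :, :]   # shape (m, n, d)
    sq_dist = (diff ** 2).sum(axis=2)      # shape (m, n)
    return float(sq_dist.min(axis=1).mean())
```

A measure of this kind is not symmetric: going from A to B generally gives a different value than going from B to A, which matches the slide's directed wording "from A to B".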

Caltech 101 Dataset
§ Object Recognition Benchmark
§ 101 Categories:
  § Stereotypical pose
  § Little clutter
  § Objects centered
  § One object per image

Caltech 101 Results [Zhang, Berg, Maire & Malik, CVPR ’06]
[Figure: results chart; the surviving annotation reads "uses GB features".]

Primal features for SVM
§ Compare to 50 prototypes from each class
§ Use distances as feature vector for an SVM
[Figure: a Query compared against the Prototypes yields a Feature Vector of distances, shown as 0.7, 0.9, 0.1, 0.8, 0.7, …]
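The construction of the distance-based feature vector can be sketched as below. This is an illustration under assumptions: `dissimilarity` is a hypothetical stand-in (mean squared distance to the nearest feature), the query-to-prototype direction is assumed, and the prototypes are simply concatenated class by class.

```python
import numpy as np

def dissimilarity(feats_a, feats_b):
    """Hypothetical stand-in: mean squared distance from each feature
    of A to its nearest feature in B."""
    a = np.asarray(feats_a, dtype=float)
    b = np.asarray(feats_b, dtype=float)
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return float(sq.min(axis=1).mean())

def prototype_feature_vector(query_feats, prototypes_by_class):
    """One distance per prototype, concatenated over all classes.

    `prototypes_by_class` is a list (one entry per class) of lists of
    prototype feature sets. With 50 prototypes per class and K classes
    the result has 50 * K entries.
    """
    return np.array([
        dissimilarity(query_feats, proto)
        for class_protos in prototypes_by_class
        for proto in class_protos
    ])
```

The resulting vector has a fixed length regardless of how many local features each image contains, which is what allows it to be passed to a standard SVM.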

SVM features interpretation
§ Slices of the Kernel Matrix
§ Fixed points in a higher-dimensional vector space
[Figure labels: q, ti, tj, tk]

SVM Specifics
§ SVMlight package
§ Same parameters for all categories:
  § Linear kernel
  § Default regularization parameter
  § Asymmetric cost doubling the weight of positive examples
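To make the asymmetric cost concrete, here is a small sketch of a linear soft-margin objective in which hinge losses on positive examples count twice as much as those on negatives. It only evaluates the objective for a given weight vector and is not a trainer. The function name and the parameters `C` and `pos_weight` are illustrative. SVMlight exposes this kind of weighting through its cost-factor option `-j`; the slide does not say how the authors set it.

```python
import numpy as np

def weighted_hinge_objective(w, b, X, y, C=1.0, pos_weight=2.0):
    """Linear SVM objective with an asymmetric misclassification cost.

    X: (n, d) feature matrix, y: labels in {+1, -1}.
    Hinge losses of positive examples are scaled by `pos_weight`.
    """
    w = np.asarray(w, dtype=float)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    margins = y * (X @ w + b)
    hinge = np.maximum(0.0, 1.0 - margins)
    cost = np.where(y > 0, pos_weight, 1.0)   # positives weigh more
    return 0.5 * float(w @ w) + C * float((cost * hinge).sum())
```

Weighting positives more heavily is a common response to class imbalance, where most shots do not contain the target category.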

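Results
[Charts: Results ’05 and Results ’06. Values that survive in the transcript: Berkeley-Shape mAP = 0.38; Best ’05 (IBM) mAP = 0.34; mAP = 0.11 (series not identifiable). Category axis labels: Computer/TV-Screen, Car, Meeting, Sports. Legend: Best, Berkeley-Shape, Median.]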
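For readers unfamiliar with the metric, mean average precision (mAP) can be sketched as below. This is a simplified, non-interpolated version that assumes every relevant shot appears somewhere in the ranked list; the official TRECVID evaluation differs in its details (for example, result lists are truncated at a fixed depth), so the function names and conventions here are illustrative only.

```python
def average_precision(ranked_relevance):
    """Non-interpolated average precision for one ranked list.

    `ranked_relevance` holds one truthy/falsy flag per returned item,
    best-ranked first. Assumes all relevant items are in the list.
    """
    hits = 0
    total = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            total += hits / rank   # precision at this rank
    return total / hits if hits else 0.0

def mean_average_precision(runs):
    """Mean of the per-category average precisions."""
    return sum(average_precision(r) for r in runs) / len(runs)
```

Here each category contributes one ranked list, and mAP averages the per-category scores.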

Limitations
§ Several objects per image
§ Features do not capture:
  § Different Scales
  § Color

Conclusions
§ Shape is an important cue for object recognition.
§ A system that uses shape features only can have competitive performance.
§ Shape features are orthogonal to features used in the past.

Thank You!
petrov@eecs.berkeley.edu