Object Recognition

So what does object recognition involve?

Verification: is that a bus?

Detection: are there cars?

Identification: is that a picture of Mao?

Object categorization: sky, building, flag, banner, bus, face, street lamp, cars, wall, bus

Challenges 1: viewpoint variation (Michelangelo, 1475-1564)

Challenges 2: illumination (slide credit: S. Ullman)

Challenges 3: occlusion (Magritte, 1957)

Challenges 4: scale

Challenges 5: deformation (Xu Beihong, 1943)

Challenges 7: intra-class variation

Two main approaches: global (sub-window) and part-based

Global Approaches: aligned images are treated as vectors in a high-dimensional space (axes x1, x2, x3)

Global Approaches: vectors in high-dimensional space (x1, x2, x3) → training (involves some dimensionality reduction) → detector

Detection – Scale / position range to search over

Detection – Combine detection over space and scale.

PROJECT 1 Build a detection system that takes an image as input, runs a detector over positions (x, y) and scales, and removes spurious detections. The system should be able to run different detectors. For initial testing use a linear SVM (existing package). Challenges: • An algorithm for integrating the raw detections. • Speed.
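The scan-and-prune loop described in Project 1 can be sketched as follows. This is a minimal illustration, not the required design: the function names, the square-window representation `(x, y, size)`, the default window sizes, and the use of greedy non-maximum suppression to integrate raw detections are all assumptions made for the example. `score_fn` stands in for any detector (for instance a linear SVM score); a real system would resize each window to the detector's input size or scan an image pyramid instead.

```python
import numpy as np

def sliding_windows(image_shape, window=24, step=8, scales=(1.0, 1.5, 2.0)):
    """Yield square windows (x, y, size) over positions and scales.

    Assumes a 2-D (grayscale) image shape.
    """
    height, width = image_shape
    for scale in scales:
        size = int(round(window * scale))
        stride = max(1, int(round(step * scale)))
        for y in range(0, height - size + 1, stride):
            for x in range(0, width - size + 1, stride):
                yield x, y, size

def iou(a, b):
    """Intersection over union of two square boxes (x, y, size)."""
    ax, ay, asz = a
    bx, by, bsz = b
    iw = max(0, min(ax + asz, bx + bsz) - max(ax, bx))
    ih = max(0, min(ay + asz, by + bsz) - max(ay, by))
    inter = iw * ih
    return inter / float(asz * asz + bsz * bsz - inter)

def non_max_suppression(boxes, scores, iou_threshold=0.3):
    """Greedy NMS: visit boxes by decreasing score, drop those overlapping a kept box."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept

def detect(image, score_fn, threshold=0.0, **window_args):
    """Run score_fn on every window, keep scores above threshold, then prune overlaps."""
    boxes, scores = [], []
    for x, y, size in sliding_windows(image.shape, **window_args):
        score = score_fn(image[y:y + size, x:x + size])
        if score > threshold:
            boxes.append((x, y, size))
            scores.append(score)
    kept = non_max_suppression(boxes, scores)
    return [(boxes[i], scores[i]) for i in kept]
```

Because `score_fn` is passed in as an argument, different detectors can be swapped without touching the scanning code. Most of the running time goes into the per-window detector calls, which is where the speed challenge lies.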

• Turk and Pentland, 1991 • Belhumeur et al. 1997 • Schneiderman et al. 2004 • Viola and Jones, 2000 • Keren et al. 2001 • Osadchy et al. 2004 • Amit and Geman, 1999 • LeCun et al. 1998 • Belongie and Malik, 2002 • Schneiderman et al. 2004 • Agarwal and Roth, 2002 • Poggio et al. 1993

Antiface method for detection • No training on negative examples is required. • A set of rejectors is applied in a cascaded manner. • Robust to large pose variation. • Simple and very fast.

Intuition: How are natural images distributed in a high-dimensional space? Boltzmann distribution over an image smoothness measure. [Figure: directions of lower probability are marked.]

Intuition: PCA gives many false positives; Antiface gives far fewer false positives. [Figure: directions of lower probability are marked.]

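Main Idea Claim: for random natural images $I$ viewed as unit vectors, $|\langle d, I \rangle|$ is large on average when $d$ is smooth. An Anti-Face detector is defined as a vector $d$ satisfying: – $|\langle d, x \rangle|$ is small for all $x$ in the positive class – $d$ is smooth, so $|\langle d, I \rangle|$ is large on average for a random natural image $I$.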

Discrimination: if $x$ is an image and $C$ is the target class, then $|\langle d, x \rangle|$ is SMALL for $x \in C$ and LARGE for $x \notin C$.

Cascade of Independent Detectors (figure panels: 7 inner products; 4 inner products)
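A cascade of rejectors of this kind can be sketched as below. It is only an illustration under stated assumptions: the detectors and thresholds are taken as given (how they are computed is the subject of the Anti-Faces paper and is not shown), the function name `cascade_accepts` is hypothetical, and the input is assumed to be a non-zero image vector. The point of the sketch is the early exit: most candidates are rejected after one or two inner products.

```python
import numpy as np

def cascade_accepts(x, detectors, thresholds):
    """Apply rejectors in sequence; return (accepted, inner_products_used).

    x          : image flattened to a non-zero 1-D vector
    detectors  : sequence of detector vectors d_i (assumed precomputed)
    thresholds : sequence of acceptance thresholds t_i (assumed given)
    """
    x = x / np.linalg.norm(x)  # view the image as a unit vector
    for count, (d, t) in enumerate(zip(detectors, thresholds), start=1):
        if abs(np.dot(d, x)) > t:  # large response: reject immediately
            return False, count
    return True, len(detectors)
```

For example, with two toy detectors `np.eye(3)[:2]` and thresholds of 0.1, the vector `[0, 0, 2]` passes both stages, while `[3, 0, 0]` is rejected after a single inner product.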

Example: samples from the training set and 4 Anti-Face detectors

4 Anti-face Detectors

Eigenface method with the subspace of dimension 100

PROJECT 2 • Implement the Antiface method for detection*. • Implement several extensions of Antifaces: – Change the acceptance rule so that, instead of having to pass all the detectors, an image is accepted if it passes at least 80% of them. – Apply Naïve Bayes in a 10-D Antiface space. – Project each image onto a 20-D Antiface space and train an SVM in this space. See the project page for details. * D. Keren, M. Osadchy and C. Gotsman, "Anti-Faces: A novel, fast method for image detection", IEEE Transactions on Pattern Analysis and Machine Intelligence, No. 7, July 2001, pp. 747-761.
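The first extension, the relaxed acceptance rule, might look like the sketch below. The function name, the matrix layout of the detectors (one detector per row), and the per-detector thresholds are assumptions for illustration. Unlike a strict cascade, this rule evaluates every detector before deciding, so it gives up the early-exit speed advantage in exchange for tolerance to a few failed detectors.

```python
import numpy as np

def fraction_rule_accepts(x, detectors, thresholds, min_fraction=0.8):
    """Accept x if at least min_fraction of the detectors give a small response.

    x          : image flattened to a non-zero 1-D vector of length n
    detectors  : array of shape (k, n), one detector per row (assumed precomputed)
    thresholds : array of shape (k,), assumed given
    """
    x = x / np.linalg.norm(x)            # view the image as a unit vector
    responses = np.abs(detectors @ x)    # all k inner products at once
    passed = responses <= thresholds     # True where the detector accepts
    return bool(passed.mean() >= min_fraction)
```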

Part-Based Approaches: an object as a bag of ‘words’ or as a constellation of parts

Bag of ‘words’: analogy to documents

Document 1: Of all the sensory impressions proceeding to the brain, the visual experiences are the dominant ones. Our perception of the world around us is based essentially on the messages that reach the brain from our eyes. For a long time it was thought that the retinal image was transmitted point by point to visual centers in the brain; the cerebral cortex was a movie screen, so to speak, upon which the image in the eye was projected. Through the discoveries of Hubel and Wiesel we now know that behind the origin of the visual perception in the brain there is a considerably more complicated course of events. By following the visual impulses along their path to the various cell layers of the optical cortex, Hubel and Wiesel have been able to demonstrate that the message about the image falling on the retina undergoes a stepwise analysis in a system of nerve cells stored in columns. In this system each cell has its specific function and is responsible for a specific detail in the pattern of the retinal image.
Words for document 1: sensory, brain, visual, perception, retinal, cerebral cortex, eye, cell, optical nerve, image, Hubel, Wiesel

Document 2: China is forecasting a trade surplus of $90bn (£51bn) to $100bn this year, a threefold increase on 2004's $32bn. The Commerce Ministry said the surplus would be created by a predicted 30% jump in exports to $750bn, compared with a 18% rise in imports to $660bn. The figures are likely to further annoy the US, which has long argued that China's exports are unfairly helped by a deliberately undervalued yuan. Beijing agrees the surplus is too high, but says the yuan is only one factor. Bank of China governor Zhou Xiaochuan said the country also needed to do more to boost domestic demand so more goods stayed within the country. China increased the value of the yuan against the dollar by 2.1% in July and permitted it to trade within a narrow band, but the US wants the yuan to be allowed to trade freely. However, Beijing has made it clear that it will take its time and tread carefully before allowing the yuan to rise further in value.
Words for document 2: China, trade, surplus, commerce, exports, imports, US, yuan, bank, domestic, foreign, increase, trade, value

Interest Point Detectors • Basic requirements: – Sparse – Informative – Repeatable • Invariance – Rotation – Scale (Similarity) – Affine

Popular Detectors
Scale invariant: Harris-Laplace, Difference of Gaussians, Laplace of Gaussians, Scale Saliency (Kadir-Brady)
Affine invariant: Harris-Laplace Affine, Difference of Gaussians, Laplace of Gaussians, Affine Saliency (Kadir-Brady)
There are many others… See: 1) “Scale and affine invariant interest point detectors”, K. Mikolajczyk, C. Schmid, IJCV, Volume 60, Number 1, 2004. 2) “A comparison of affine region detectors”, K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky, T. Kadir and L. Van Gool, http://www.robots.ox.ac.uk/~vgg/research/affine/det_eval_files/vibes_ijcv2004.pdf

Representation of appearance: Local Descriptors • Invariance – Rotation – Scale – Affine • Insensitive to small deformations • Illumination invariance – Normalize out

SIFT – Scale Invariant Feature Transform • Descriptor overview: – Determine the scale (by maximizing DoG in scale and in space) and the local orientation (the dominant gradient direction). Use this scale and orientation to make all further computations invariant to scale and rotation. – Compute gradient orientation histograms of several small windows (128 values for each point). – Normalize the descriptor to make it invariant to intensity change. David G. Lowe, "Distinctive image features from scale-invariant keypoints", International Journal of Computer Vision, 60, 2 (2004), pp. 91-110.
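The last two steps of the overview (orientation histograms over small windows, then normalization) can be illustrated with a simplified sketch. It is not Lowe's full descriptor: it assumes a square patch that has already been scale- and rotation-normalized, uses a 4 x 4 grid of cells with 8 orientation bins (4 · 4 · 8 = 128 values), and omits the Gaussian weighting, interpolation between bins, and clipping used in the paper. The function name is hypothetical.

```python
import numpy as np

def sift_like_descriptor(patch, grid=4, bins=8):
    """Simplified SIFT-style descriptor of a square patch (e.g. 16 x 16).

    Assumes the patch side is divisible by `grid`.
    Returns a unit-length vector of grid * grid * bins values.
    """
    patch = patch.astype(float)
    gy, gx = np.gradient(patch)                # derivatives along rows, columns
    magnitude = np.hypot(gx, gy)
    angle = np.arctan2(gy, gx) % (2 * np.pi)   # orientation in [0, 2*pi)
    bin_index = np.minimum((angle / (2 * np.pi) * bins).astype(int), bins - 1)

    cell = patch.shape[0] // grid
    descriptor = np.zeros((grid, grid, bins))
    for r in range(grid):
        for c in range(grid):
            rows = slice(r * cell, (r + 1) * cell)
            cols = slice(c * cell, (c + 1) * cell)
            # Orientation histogram of this cell, weighted by gradient magnitude.
            descriptor[r, c] = np.bincount(
                bin_index[rows, cols].ravel(),
                weights=magnitude[rows, cols].ravel(),
                minlength=bins,
            )

    descriptor = descriptor.ravel()
    norm = np.linalg.norm(descriptor)
    # Unit normalization removes the effect of a multiplicative intensity change.
    return descriptor / norm if norm > 0 else descriptor
```

Because the descriptor is built from gradients, adding a constant to the patch leaves it unchanged, and the final normalization cancels a multiplicative change in contrast.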

Feature Detection and Representation: detect patches [Mikolajczyk and Schmid ’02] [Matas et al. ’02] [Sivic et al. ’03] → normalize patch → compute SIFT descriptor [Lowe ’99]. Slide credit: Josef Sivic

Feature Detection and Representation …

Codewords dictionary formation …

Codewords dictionary formation … Vector quantization Slide credit: Josef Sivic

Codewords dictionary formation Fei-Fei et al. 2005

Image patch examples of codewords Sivic et al. 2005
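Dictionary formation by vector quantization can be sketched with plain k-means, followed by the histogram that represents one image. This is a minimal illustration under assumptions: the function names, the fixed iteration count, and the random initialization from data points are choices made for the example, and the descriptors are taken to be rows of a NumPy array (for instance 128-D SIFT vectors).

```python
import numpy as np

def quantize(descriptors, centers):
    """Index of the nearest codeword (squared Euclidean distance) for each descriptor."""
    diffs = descriptors[:, None, :] - centers[None, :, :]
    return (diffs ** 2).sum(axis=2).argmin(axis=1)

def build_codebook(descriptors, k, iterations=20, seed=0):
    """Plain k-means: returns k cluster centers, which serve as the codewords."""
    descriptors = np.asarray(descriptors, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize the centers with k distinct descriptors chosen at random.
    centers = descriptors[rng.choice(len(descriptors), size=k, replace=False)]
    for _ in range(iterations):
        labels = quantize(descriptors, centers)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members) > 0:        # leave an empty cluster's center unchanged
                centers[j] = members.mean(axis=0)
    return centers

def bag_of_words(descriptors, centers):
    """Normalized histogram of codeword occurrences for one image."""
    labels = quantize(np.asarray(descriptors, dtype=float), centers)
    counts = np.bincount(labels, minlength=len(centers))
    return counts / counts.sum()
```

The codebook is built once from descriptors pooled over the training images; each image is then reduced to a fixed-length histogram regardless of how many patches it contains.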

SVM classification, learning: positive and negative examples → representation (vector X) → SVM classifier

SVM classification, recognition: image → representation (vector X) → SVM(X) → contains object / doesn’t contain object
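The learning and recognition stages can be sketched with a small linear SVM. In practice an existing SVM package would be used; to keep the example self-contained, the sketch below substitutes batch subgradient descent on the regularized hinge loss, which is a stand-in and not a particular package's solver. The function names and hyperparameter values are assumptions, and the labels are taken to be +1 (contains object) and -1 (doesn't contain object).

```python
import numpy as np

def train_linear_svm(X, y, reg=0.01, learning_rate=0.1, epochs=200):
    """Minimize reg/2 * ||w||^2 + mean hinge loss by batch subgradient descent.

    X : (n, d) array of representation vectors (e.g. bag-of-words histograms)
    y : (n,) array of labels in {-1, +1}
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1                     # examples inside the margin
        grad_w = reg * w - (y[active, None] * X[active]).sum(axis=0) / len(y)
        grad_b = -y[active].sum() / len(y)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

def svm_predict(X, w, b):
    """Return +1 where the decision value w.x + b is non-negative, else -1."""
    return np.where(np.asarray(X, dtype=float) @ w + b >= 0, 1, -1)
```

Training uses histograms of positive and negative images; at recognition time a new image is converted to the same representation and the sign of `w.x + b` gives the decision.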

PROJECT 3 • Implement a bag of ‘words’ approach. The method is described in “Visual Categorization with Bags of Keypoints”, G. Csurka, C. R. Dance, L. Fan, J. Willamowski, C. Bray. • Test it on 4 categories (from the 101 database): airplanes, faces, cars side, motorbikes, against background.

PROJECT 4 • Implement the part-based method described in “Class Recognition Using Discriminative Local Features”, by G. Dorkó, C. Schmid. • Test it on the Oxford object data set. • Compare the performance of the algorithm using different point detectors. The code for the point detectors is provided. • Compare the performance of the algorithm with the original SIFT and with SIFT without rotation invariance. The initial code for SIFT is provided, but should be edited to remove rotation invariance.