DICTA 08 Multi-Local Feature Manifolds for Object Detection
Oscar Danielsson (osda02@csc.kth.se)
Stefan Carlsson (stefanc@csc.kth.se)
Josephine Sullivan (sullivan@csc.kth.se)
The Problem
• Object categories are often modeled by collections (bag-of-features) or constellations (pictorial structures) of local features
• Many simple, shape-based objects don’t have any discriminative local appearance features
The Multi-Local Feature
§ A specific spatial constellation of oriented edgels (or other local content)
§ Captures global shape properties
§ “Weak” detector of shape-based object categories
§ Described by a coordinate vector: (x1, …, x12)
Modeling Intra-Class Variation
1. Generate coordinate vectors by clicking corresponding edgels in a small number of training images
2. Align the coordinate vectors with respect to a similarity transform
3. Extend the aligned coordinate vectors into their convex hull
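Step 2 above can be sketched in a few lines. This is a minimal sketch, not the authors' implementation. It assumes that each coordinate vector holds 2D edgel positions only (orientations are ignored), that every vector is aligned to one chosen reference vector, and that reflections are not allowed. The function name `align_similarity` is hypothetical. The fit uses the closed-form least-squares solution for a 2D similarity transform, with points written as complex numbers.

```python
def align_similarity(points, reference):
    """Align 2D `points` to `reference` with the least-squares
    similarity transform (translation, rotation, uniform scale).

    Both arguments are equal-length lists of (x, y) tuples whose
    entries correspond to each other (same edgel index).
    """
    z = [complex(x, y) for x, y in points]
    w = [complex(x, y) for x, y in reference]
    n = len(z)

    # Remove translation by centering both point sets.
    z_mean = sum(z) / n
    w_mean = sum(w) / n
    z0 = [p - z_mean for p in z]
    w0 = [q - w_mean for q in w]

    # The complex factor a (rotation and scale) minimizing
    # sum |a * z0 - w0|^2 has a closed form.
    numerator = sum(q * p.conjugate() for p, q in zip(z0, w0))
    denominator = sum(abs(p) ** 2 for p in z0)
    a = numerator / denominator

    aligned = [a * p + w_mean for p in z0]
    return [(v.real, v.imag) for v in aligned]


if __name__ == "__main__":
    reference = [(0, 0), (1, 0), (1, 1), (0, 1)]
    # The same square, rotated by 90 degrees, scaled by 2, and shifted.
    clicked = [(3, -1), (3, 1), (1, 1), (1, -1)]
    print(align_similarity(clicked, reference))
```

Aligning every training vector to a single reference is the simplest choice. Iterating against the mean shape (generalized Procrustes analysis) is a common alternative, and the slides do not say which variant was used.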
Detection
For each occurrence x1 of c1
    For each consistent occurrence x2 of c2
        Sample from p(x4, x3 | x2, x1) to hypothesize image locations of c3 and c4
        Sample image edgels
        Compute normalized distance to convex hull of training features
        If distance is below threshold, an instance was detected
    End For
End For
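The distance test at the core of the detection loop can be illustrated as follows. This is a sketch under stated assumptions, not the method from the talk. The slides do not say how the distance to the convex hull is computed or normalized, so the code uses a plain Euclidean distance and a Frank-Wolfe iteration chosen here for simplicity. It also assumes the hypothesized coordinate vector is already expressed in the same aligned frame as the training vectors. The names `distance_to_hull` and `detect` are hypothetical.

```python
import math


def distance_to_hull(examples, query, iters=500, tol=1e-12):
    """Approximate Euclidean distance from `query` to the convex hull
    of `examples` (lists of equal-length coordinate vectors), using
    Frank-Wolfe with exact line search."""
    p = list(examples[0])  # current point, always inside the hull
    for _ in range(iters):
        # Residual; proportional to the gradient of 0.5 * |p - query|^2.
        r = [pi - qi for pi, qi in zip(p, query)]
        # Training vector that minimizes the linearized objective.
        s = min(examples, key=lambda ex: sum(ri * ei for ri, ei in zip(r, ex)))
        d = [si - pi for si, pi in zip(s, p)]
        gap = -sum(ri * di for ri, di in zip(r, d))  # Frank-Wolfe gap, >= 0
        dd = sum(di * di for di in d)
        if gap <= tol or dd == 0.0:
            break
        # Exact line search, clipped so that p stays inside the hull.
        step = min(1.0, gap / dd)
        p = [pi + step * di for pi, di in zip(p, d)]
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, query)))


def detect(examples, hypotheses, threshold):
    """Return the hypothesized coordinate vectors whose distance to the
    hull of the training vectors is below `threshold`."""
    return [h for h in hypotheses if distance_to_hull(examples, h) < threshold]


if __name__ == "__main__":
    # Toy 2D "training vectors"; real coordinate vectors are longer.
    training = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    print(distance_to_hull(training, (2.0, 0.0)))
    print(detect(training, [(1.2, 0.0), (2.0, 0.0)], threshold=0.5))
```

Frank-Wolfe needs only dot products with the training vectors, which keeps the sketch dependency-free. A quadratic-programming solver would be the more usual choice when the distance must be computed to high accuracy.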
Experiments
Detection performance was evaluated on a standard database (ETHZ Shape Classes) to investigate:
§ Is a multi-local feature a good weak detector?
§ How many local features should be used?
Mugs - Training
• 25 training images were downloaded from Google images
• 14 edgels constituting a multi-local feature were marked in each training image
[Figure: two example training images with the marked edgels labeled 1 to 14]
Mugs - Results
Performance decreases when adding more than 9 local features
[Plot annotation: 60.6 %, 0.4]
Bottles - Training
• 25 training images were downloaded from Google images
• 12 edgels constituting a multi-local feature were marked in each training image
[Figure: two example training images with the marked edgels labeled 1 to 12]
Bottles - Results
[Plot annotation: 72.7 %, 0.4]
Apple logos - Training
• 20 training images were downloaded from Google images
• 12 edgels constituting a multi-local feature were marked in each training image
Apple logos - Results
Performance decreases when adding more than 11 local features
[Plot annotation: 77.3 %, 0.4]
Conclusions
§ A multi-local feature is a good weak detector of shape-based object categories
§ The best performance is achieved by multi-local features with a moderate number of local features
§ Convex combinations of valid exemplars are in general also valid exemplars (we can extend a few training examples into their convex hull)
Future Work
§ Automatic learning of multi-local features
§ Building combinations of multi-local feature detectors into an object detection system
Related Work
§ Pictorial Structures
    § E.g. Felzenszwalb, Huttenlocher. Pictorial Structures for Object Recognition, IJCV No. 1, January 2005.
§ Constellation Models
    § E.g. Fergus, Perona, Zisserman. Object class recognition by unsupervised scale-invariant learning, CVPR 03.
Differences
§ Different detection methods
§ Use rich local features
Thanks!
Representation
The multi-local feature manifold consists of all convex combinations of the training examples
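A convex combination is a weighted sum of the training vectors with non-negative weights that sum to one. The snippet below is an illustrative sketch of that definition: it forms one such combination and draws a random member of the manifold. The function names are hypothetical, and drawing the weights uniformly from the simplex (normalized exponential variates) is an assumption made here for illustration, not something the slides specify.

```python
import random


def convex_combination(examples, weights):
    """Weighted sum of coordinate vectors; `weights` must be
    non-negative and sum to one."""
    dim = len(examples[0])
    return [sum(w * ex[j] for w, ex in zip(weights, examples)) for j in range(dim)]


def sample_from_hull(examples, rng=random):
    """Draw a random point of the convex hull of `examples`.

    Normalized exponential variates give weights that are uniformly
    distributed on the simplex.
    """
    raw = [rng.expovariate(1.0) for _ in examples]
    total = sum(raw)
    weights = [r / total for r in raw]
    return convex_combination(examples, weights), weights


if __name__ == "__main__":
    # Two toy "training vectors"; the midpoint is a valid new exemplar.
    training = [[0.0, 0.0], [2.0, 4.0]]
    print(convex_combination(training, [0.5, 0.5]))
    print(sample_from_hull(training, random.Random(0)))
```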