Multiple Instance Learning with Query Bags
Boris Babenko, Piotr Dollar, Serge Belongie
[In prep. for ICML '09 – feedback appreciated!]
Outline
• Multiple Instance Learning (MIL) Review
• Typical MIL Applications
• Query Bag Model for MIL
• Filtering Strategies
• Conclusion
Multiple Instance Learning (MIL)
• Ambiguity in training data
• Instead of instance/label pairs, get bag-of-instances/label pairs
• A bag is positive if one or more of its members is positive
Multiple Instance Learning (MIL)
• Supervised learning training input: {(x_1, y_1), …, (x_n, y_n)}
• MIL training input: {(X_1, y_1), …, (X_n, y_n)}, where each bag X_i = {x_i1, x_i2, …}
• Goal: learn an instance classifier
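The bag-labeling rule (a bag is positive iff at least one of its instances is positive) can be sketched in a few lines of Python; `bag_label` is a name chosen here for illustration, not from the slides:

```python
def bag_label(instance_labels):
    """MIL rule: the bag is positive (1) iff any instance label is positive."""
    return int(max(instance_labels))

positive_bag = [0, 0, 1, 0]   # one positive member -> bag label 1
negative_bag = [0, 0, 0]      # all negative members -> bag label 0
```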
MIL Assumptions
• Bags are predefined/fixed & finite (size m)
• Bag label determined by: y_i = max_j y_ij
• Typical assumption: instances all drawn i.i.d.
• Refer to this as a classical bag
MIL Theory
• Best known PAC bound due to Blum et al., in terms of:
  – d = dimensionality
  – m = bag size
  – ε = the desired error
• Problem harder for larger bags
• Result relies on the i.i.d. assumption
Outline
• Multiple Instance Learning (MIL) Review
• Typical MIL Applications
• Query Bag Model for MIL
• Filtering Strategies
• Conclusion
MIL Applications
• Most MIL applications: the bag is generated by breaking an object into many overlapping pieces
• Let's see some examples…
Vision • Image known to contain object, but precise location unknown [Andrews et al. 02, Viola et al. 05]
Audio
• An audio waveform can be broken up temporally or in the frequency domain [Saul et al. 01]
Biological Sequences
• Known to contain a short subsequence of interest
• ACTGTGTGACATGTAGC → {ACTG, CTGT, TGTG, …} [Ray et al. 05]
Text • Text document broken down into smaller pieces [Andrews et al. 02]
Observations
• Sliding windows: bags are large/infinite
• In practice, the bag is sub-sampled – could violate the MIL assumption (the positive instance may be missed)
• Instances of a bag are not independent – they often lie on a low-dimensional manifold (e.g. image patches)!
Outline
• Multiple Instance Learning (MIL) Review
• Typical MIL Applications
• Query Bag Model for MIL
• Filtering Strategies
• Conclusion
Query Bags for MIL
• Bag not fixed – can query an oracle to get an arbitrary number of instances
• Each query bag is represented by an object b
• To retrieve instances, use a query function q(b, u) with location parameter u
Query Bags for MIL
• Instances often lie on a low-dimensional manifold
• Can query for nearby instances
Query Bags for MIL
• Can express a bag as X(b) = { q(b, u) : u }
• Define the bag label as y(b) = max_u y(q(b, u))
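A query bag can be sketched as an object plus a query function. This is a toy illustration: the list-backed "oracle" and helper names are assumptions, with `q`, `b`, `u` loosely following the slides' notation:

```python
import random

def q(b, u):
    """Query function: return the instance of bag-object b at location u.

    Toy oracle: b is just a list of instances, indexed cyclically by u.
    """
    return b[u % len(b)]

def sample_bag(b, m, rng):
    """Subsample m instances from the (conceptually unbounded) query bag."""
    return [q(b, rng.randrange(10**6)) for _ in range(m)]

rng = random.Random(0)
b = ["patch_a", "patch_b", "patch_c"]
instances = sample_bag(b, 5, rng)   # 5 queried instances from bag-object b
```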
Distribution of Locations
• Assume for each bag there is some distribution over locations, p(u) (known or unknown)
• Could provide some prior information
• Question: how informative is p(u)?
Query Bag Size
• To determine the bag label with high confidence, need to query enough instances
• Bigger bag = better: less chance of missing the correct positive instance
• Note the difference between query bags and classical bags
Example: Line Bags • Instances of a bag lie on a line.
Example: Hypercube Bags • Instances of a bag lie in a hypercube
Example: Image Translation Bags
• Let b be a large image; q(b, u) is the patch centered at location u
• Could easily extend this to rotations, scale changes, etc.
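The image-translation query function can be sketched with plain Python lists standing in for image arrays; `q_patch` and the sizes below are illustrative, not from the slides:

```python
def q_patch(image, center, half):
    """Return the (2*half+1) x (2*half+1) patch of `image` centered at `center`.

    `image` is a list of rows; `center` = (row, col). Assumes the patch
    lies fully inside the image (no border handling in this sketch).
    """
    r, c = center
    return [row[c - half : c + half + 1] for row in image[r - half : r + half + 1]]

# Toy 10x10 "image" whose pixel (r, c) stores the value r*10 + c.
image = [[r * 10 + c for c in range(10)] for r in range(10)]
patch = q_patch(image, (5, 5), 1)   # 3x3 patch around pixel (5, 5)
```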
Experiments
• Goal: compare behavior of synthetic classical bags and query bags to a real dataset (MNIST)
• Use MILBoost (Viola et al. '05)
• Expect qualitatively similar results for other MIL algorithms
• For query bags, subsample instances
Results
Experiment: Variance
• How does the location distribution affect error?
• Repeat the Line Bag experiment, increasing the variance of p(u) – spreads points out along the line
Observations
• PAC results not applicable to query bags – performance increases as bag size increases
• MNIST results closely resemble the synthetic query bag examples
• Need a computational strategy for dealing with large bags
• Take advantage of relationships between instances
Outline
• Multiple Instance Learning (MIL) Review
• Typical MIL Applications
• Query Bag Model for MIL
• Filtering Strategies
• Conclusion
MILBoost Review
• Train a strong classifier H(x) = Σ_t α_t h_t(x) (just like AdaBoost)
• Optimize the log likelihood of bags, L = Σ_i [ y_i log p_i + (1 - y_i) log(1 - p_i) ], where p_i = 1 - Π_j (1 - p_ij) (Noisy-OR) and p_ij = σ(H(x_ij))
• Use gradient boosting (Friedman '01) – in each iteration add the weak classifier h closest to ∂L/∂H
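The Noisy-OR bag probability and bag log likelihood used by MILBoost can be sketched as follows (a minimal illustration; function names and example scores are chosen here, not taken from the slides):

```python
import math

def sigmoid(s):
    """Instance probability p_ij = sigmoid(H(x_ij))."""
    return 1.0 / (1.0 + math.exp(-s))

def bag_prob(scores):
    """Noisy-OR: p_i = 1 - prod_j (1 - p_ij), given instance scores H(x_ij)."""
    prod = 1.0
    for s in scores:
        prod *= 1.0 - sigmoid(s)
    return 1.0 - prod

def log_likelihood(bags, labels):
    """Sum over bags of y_i * log p_i + (1 - y_i) * log(1 - p_i)."""
    ll = 0.0
    for scores, y in zip(bags, labels):
        p = bag_prob(scores)
        ll += y * math.log(p) + (1 - y) * math.log(1 - p)
    return ll

# One confidently-positive bag and one confidently-negative bag.
ll = log_likelihood([[5.0, -5.0], [-5.0, -5.0]], [1, 0])
```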
MILBoost w/ Query Bags
• Bag probability is defined over all (possibly infinitely many) instances
• In practice, subsample the bag and compute the Noisy-OR over the sampled instances
• Could subsample once in the beginning, or do something more clever…
Filtering Strategies
• Recently, Bradley & Schapire proposed FilterBoost, which learns from a continuous source of data
• Alternates between training a weak classifier and querying an oracle for more data
• Apply this idea to MILBoost
Filtering Strategies
• Want the highest-probability instances
• Parameters:
  – T = number of boosting iterations
  – R = number of instances to evaluate
  – F = frequency of filtering
Filtering Strategies
• Random Sampling (RAND) – query R instances, keep the best m
• Memory (MEM) – query R new instances, combine with old ones, keep the best m
Filtering Strategies
• Search (SRCH)
  – Assume instances lie on a low-dimensional manifold
  – For a kept instance at location u, search nearby locations u' for instances of higher probability
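The RAND and MEM strategies can be sketched together. This is a toy illustration with scalar instances: the oracle, the score function, and all names are assumptions for the sketch, not the slides' implementation:

```python
import random

def filter_bag(kept, query, score, R, m, rng, use_mem=True):
    """Query R fresh instances; keep the m highest-scoring candidates.

    RAND: rank only the R fresh queries.
    MEM:  pool the fresh queries with the instances kept so far, then rank.
    """
    candidates = [query(rng) for _ in range(R)]
    if use_mem:
        candidates += kept                       # MEM: reuse old instances
    return sorted(candidates, key=score, reverse=True)[:m]

rng = random.Random(0)
query = lambda rng: rng.uniform(-1.0, 1.0)       # toy oracle: scalar instances
score = lambda x: x                              # toy classifier score

kept = filter_bag([], query, score, R=16, m=4, rng=rng)        # iteration 1
kept = filter_bag(kept, query, score, R=16, m=4, rng=rng)      # iteration 2 (MEM)
```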
MNIST Filtering Experiments
• Turn SRCH and MEM on and off
• Sweep through:
  – R = sampling amount (16)
  – m = bag size (4)
  – F = sampling frequency (1)
  – T = boosting iterations (64)
MNIST Filtering Exp: m • Filtering converges w/ smaller memory usage
MNIST Filtering Exp: R & F
• MEM is very effective
• SRCH helps when MEM is OFF; not as big a difference when MEM is ON
MNIST Filtering Exp: T
• Without MEM, filtering does not converge
• Positive region becomes sparse
Why MEM Works
• Let L_m be the log likelihood computed with bags of size m
• Can show (for a fixed classifier H) that the likelihood does not decrease as instances are added
• Using MEM, we add R new instances to each bag per iteration, so the effective bag grows over time
• In reality H is not fixed; hard to show convergence
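The intuition behind MEM can be illustrated numerically: for a fixed classifier, the Noisy-OR probability 1 − Π_j(1 − p_ij) never decreases when instances are added to a bag, so pooling old and new instances cannot lower a positive bag's probability (toy probabilities below are chosen for illustration):

```python
def bag_prob(ps):
    """Noisy-OR over instance probabilities p_ij."""
    prod = 1.0
    for p in ps:
        prod *= 1.0 - p
    return 1.0 - prod

small = bag_prob([0.2, 0.3])         # bag with two instances
large = bag_prob([0.2, 0.3, 0.1])    # same instances plus one more
```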
Outline
• Multiple Instance Learning (MIL) Review
• Typical MIL Applications
• Query Bag Model for MIL
• Filtering Strategies
• Conclusion
Summary
• Current assumptions for MIL are not appropriate for typical MIL applications
• We proposed the query bag model, which fits real data better
• For query bags, sampling more instances is better
• We proposed simple strategies for dealing with large/infinite query bags
Future Work
• Develop more theory for the query bag model
• Experiments in other domains (audio, bioinformatics)
• MCL – learning pedestrian parts automatically
Questions?
Filtering Query Bags
MILBoost with Filtering