Multi-task Feature Selection • Guillaume Obozinski, Ben Taskar, Michael Jordan • UC Berkeley
An OCR example: the letter “a” written by 40 writers • Quite different “a”s • Few examples per task • What’s in common? Similar pixel masks? Similar “strokes”?
Multi-task Feature Selection • Tasks T1, T2, T3 share a common feature space • Each task's features are selected from a common bag of features
L1 regularization • [Figure: the L1 ball] • Motivation?
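For reference, the standard single-task L1-regularized problem can be written as below. The generic per-example loss J, the N training pairs (x_i, y_i) and the p features are notation assumed here, not taken from the slides; the constrained form on the right is the "L1 ball" view (for convex J the two forms trace the same solutions as λ and t vary).

```latex
\min_{w \in \mathbb{R}^p} \; \sum_{i=1}^{N} J\big(w^\top x_i,\, y_i\big) + \lambda \|w\|_1,
\qquad \|w\|_1 = \sum_{j=1}^{p} |w_j|
\quad\Longleftrightarrow\quad
\min_{\|w\|_1 \le t} \; \sum_{i=1}^{N} J\big(w^\top x_i,\, y_i\big)
```

The corners of the L1 ball lie on the coordinate axes, which is why solutions of this problem tend to have many coefficients exactly equal to zero.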
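A Block L1/L2 Norm • The per-task objectives can be collapsed into a single objective over the coefficient matrix W, whose rows are indexed by features and whose columns are indexed by tasks:
$$\min_{W} \sum_{k=1}^{K} \sum_{i=1}^{N_k} J\big(w_k^\top x_i^k,\, y_i^k\big) + \lambda \sum_{j=1}^{p} \|w^j\|_2,$$
where $w_k$ is the k-th column of W (task k) and $w^j$ is the j-th row (feature j).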
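A minimal sketch of the block L1/L2 norm, assuming W is stored as a NumPy array with one row per feature and one column per task. It contrasts the block norm with the plain L1 norm on two matrices that have the same entries arranged differently: the block norm is smaller when the tasks reuse the same feature.

```python
import numpy as np

def block_l1_l2(W: np.ndarray) -> float:
    """Sum over features (rows) of the L2 norm across tasks (columns)."""
    return float(np.sum(np.linalg.norm(W, axis=1)))

def l1(W: np.ndarray) -> float:
    """Plain entrywise L1 norm, for comparison."""
    return float(np.sum(np.abs(W)))

# Two tasks, two features. Same entries, different sharing pattern.
W_shared = np.array([[3.0, 4.0],     # both tasks use feature 1
                     [0.0, 0.0]])
W_scattered = np.array([[3.0, 0.0],  # task 1 uses feature 1
                        [0.0, 4.0]]) # task 2 uses feature 2

print(l1(W_shared), l1(W_scattered))                    # 7.0 7.0
print(block_l1_l2(W_shared), block_l1_l2(W_scattered))  # 5.0 7.0
```

The plain L1 penalty cannot tell the two patterns apart, while the block penalty charges less when a feature is shared, which is what drives joint feature selection across tasks.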
Geometric interpretation • [Figure: coefficient vectors w1, w2, …, wl] • Assume w1 is constrained to have L1 norm = 1. What is the induced constraint on w2? • The section of the norm ball for w2 changes according to w1.
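A small numerical sketch of how the section for w2 depends on w1, under the assumption (ours, not stated on the slide) that w1 and w2 are the coefficient vectors of two tasks over the same features and that the constraint is a block L1/L2 ball of a given radius. The helper names are hypothetical.

```python
import math

def block_norm_two_tasks(w1, w2):
    # Block L1/L2 norm of the matrix with columns w1 and w2:
    # sum over features j of the L2 norm of the row (w1[j], w2[j]).
    return sum(math.hypot(a, b) for a, b in zip(w1, w2))

def in_section(w1, w2, radius):
    # Is w2 admissible, given a fixed w1, inside the block ball of this radius?
    return block_norm_two_tasks(w1, w2) <= radius

w1 = [3.0, 0.0]  # task 1 uses only feature 1
print(block_norm_two_tasks(w1, [4.0, 0.0]))  # 5.0: w2 reuses feature 1
print(block_norm_two_tasks(w1, [0.0, 4.0]))  # 7.0: w2 uses a new feature
print(in_section(w1, [4.0, 0.0], 5.0), in_section(w1, [0.0, 4.0], 5.0))  # True False
```

With w1 fixed, moving w2 along a feature that w1 already uses is cheaper than moving it along an unused one, so the section for w2 is stretched along the features selected by w1.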
Blockwise Boosted L1/L2 • Forward (prediction) step: an ε-step along: … for: … • Backward (correction) step: an ε-step along: … for: …
Application to Multi-writer OCR • Features: either 1000 common “stroke” features extracted from the images, or the raw pixel image (8 × 16 binary pixels) • Learn classifiers to discriminate between difficult pairs of letters: a/d, g/y, g/s, n/m, a/o, etc. • Training sizes: ~5–15 examples of each letter per writer
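A hypothetical data-preparation sketch for the raw-pixel setting, assuming (consistently with the 40-writer example, but not spelled out on this slide) that each writer defines one task for a given letter pair. The input layout, a dict from writer id to a list of (letter, 8 × 16 binary image) pairs, and the function name are illustrative only.

```python
import numpy as np

def make_pair_tasks(data, pair):
    """Build one binary task per writer for a letter pair such as ("a", "d").

    data: dict mapping writer id -> list of (letter, image), image an 8x16
    binary array (hypothetical layout). Returns dict writer -> (X, y), with X
    of shape (n, 128) and y in {+1, -1} (+1 for pair[0]).
    """
    tasks = {}
    for writer, samples in data.items():
        rows, labels = [], []
        for letter, image in samples:
            if letter in pair:
                # Flatten the 8x16 pixel grid into a 128-dimensional vector.
                rows.append(np.asarray(image, dtype=float).reshape(-1))
                labels.append(1.0 if letter == pair[0] else -1.0)
        if rows:  # skip writers with no examples of either letter
            tasks[writer] = (np.vstack(rows), np.array(labels))
    return tasks
```

The resulting per-writer (X, y) pairs share the same 128 pixel features, so they can be fitted jointly with a block L1/L2 penalty on the 128 × (number of writers) coefficient matrix.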
Pixel features • [Figure: masks for 40 writers, Indep vs L1/L2] • Indep: 10.3% • L1/L2: 4.4%
Performance comparison
Stroke features • [Figure: stroke masks for 40 writers, Indep vs L1/L2] • Indep: 3.0% • L1/L2: 3.3%