Boosting Rong Jin

Inefficiency with Bagging
Inefficient bootstrap sampling:
• Every example has an equal chance of being sampled
• No distinction between “easy” examples and “difficult” examples
Inefficient model combination:
• A constant weight for each classifier
• No distinction between accurate classifiers and inaccurate classifiers
[Figure: bootstrap sampling draws D1, D2, …, Dk from D; classifiers h1, h2, …, hk are trained on them]

Improve the Efficiency of Bagging
Better sampling strategy
• Focus on the examples that are difficult to classify
Better combination strategy
• Accurate models should be assigned larger weights

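Intuition
• Classifier 1 is trained on the training examples (X1, Y1), (X2, Y2), (X3, Y3), (X4, Y4); its mistakes are (X1, Y1) and (X3, Y3)
• Classifier 2 is trained on those mistakes; its remaining mistake is (X1, Y1)
• Classifier 3 is trained on that remaining mistake
• Classifier 1 + Classifier 2 + Classifier 3: no training mistakes!! But it may overfit!!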

AdaBoost Algorithm
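The body of this slide did not survive extraction, so the following is only a minimal sketch of the commonly published AdaBoost procedure, not necessarily the exact variant shown in the lecture. It assumes labels in {1, -1}, reweights examples instead of resampling them, uses 1-D decision stumps as the weak learner, and sets α_t = ½ ln((1 − ε_t)/ε_t). The names `train_stump`, `stump_predict`, `adaboost`, and `predict`, as well as the toy data, are hypothetical.

```python
import math

def stump_predict(x, threshold, polarity):
    # A decision stump: predict `polarity` when x >= threshold, else -polarity.
    return polarity if x >= threshold else -polarity

def train_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best stump."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(x, threshold, polarity) != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best

def adaboost(xs, ys, rounds):
    """Return a list of (alpha, threshold, polarity) triples."""
    n = len(xs)
    weights = [1.0 / n] * n                      # D_0: uniform distribution
    ensemble = []
    for _ in range(rounds):
        err, threshold, polarity = train_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)    # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # assumed choice of alpha_t
        ensemble.append((alpha, threshold, polarity))
        # Raise the weight of mistakes, lower the weight of correct examples.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]   # renormalize to get D_{t+1}
    return ensemble

def predict(ensemble, x):
    # Weighted vote of the weak classifiers.
    score = sum(alpha * stump_predict(x, t, p) for alpha, t, p in ensemble)
    return 1 if score >= 0 else -1

# Toy data that no single stump can separate (+ + - - + +).
xs = [0, 1, 2, 3, 4, 5]
ys = [1, 1, -1, -1, 1, 1]
ensemble = adaboost(xs, ys, rounds=3)
print([predict(ensemble, x) for x in xs])
```

On this toy data a single stump cannot fit all six labels, while the weighted vote of three stumps can, which mirrors the intuition of later classifiers concentrating on earlier mistakes.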

AdaBoost Example: α_t = ln 2
• D_0: (x1, y1), …, (x5, y5), each with weight 1/5 → sample (x1, y1), (x3, y3), (x5, y5) → train h_1 → update weights
• D_1: (x1, y1), …, (x5, y5), with weights of 2/7 or 1/7 → sample (x1, y1), (x3, y3) → train h_2 → update weights
• D_2: (x1, y1), …, (x5, y5), with weights 2/9, 1/9, 4/9, 1/9 → sample …
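The fractions in this example can be reproduced with exact arithmetic under some assumptions that are inferred from the surviving numbers rather than stated on the slide: a misclassified example has its weight multiplied by e^{α_t} = 2, correctly classified examples keep their weight, and the result is renormalized. The sketch further assumes that h_1 misclassifies x1 and x3 and that h_2 misclassifies x3; the helper `update_weights` is hypothetical.

```python
from fractions import Fraction

def update_weights(weights, mistakes, factor=2):
    """Scale the weights of misclassified examples by `factor`, then renormalize.

    `mistakes` holds the 0-based indices of the misclassified examples;
    factor = exp(alpha_t) = 2 when alpha_t = ln 2.
    """
    scaled = [w * factor if i in mistakes else w for i, w in enumerate(weights)]
    total = sum(scaled)
    return [w / total for w in scaled]

d0 = [Fraction(1, 5)] * 5                  # D_0: uniform over five examples
d1 = update_weights(d0, mistakes={0, 2})   # assumed: h_1 is wrong on x1 and x3
d2 = update_weights(d1, mistakes={2})      # assumed: h_2 is wrong on x3

print([str(w) for w in d1])  # ['2/7', '1/7', '2/7', '1/7', '1/7']
print([str(w) for w in d2])  # ['2/9', '1/9', '4/9', '1/9', '1/9']
```

Under these assumptions the denominators 7 and 9 and the weight 4/9 on the twice-misclassified example match the fractions that survive in the example.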

How To Choose α_t in AdaBoost?
How to construct the best distribution D_{t+1}(i):
1. D_{t+1}(i) should be significantly different from D_t(i)
2. D_{t+1}(i) should create a situation in which classifier h_t performs poorly

How To Choose α_t in AdaBoost?
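The formulas for this slide were lost in extraction. One common way to make the requirement that h_t performs poorly under the new distribution concrete is sketched below. It assumes the standard exponential update with normalizer Z_t and writes ε_t for the weighted error of h_t under D_t; both symbols are introduced here for illustration. Asking that h_t be no better than random guessing under D_{t+1} (weighted error 1/2) then fixes α_t.

```latex
D_{t+1}(i) = \frac{D_t(i)\,\exp\bigl(-\alpha_t\, y_i\, h_t(x_i)\bigr)}{Z_t},
\qquad
\epsilon_t = \sum_{i:\, h_t(x_i) \neq y_i} D_t(i)

\sum_{i:\, h_t(x_i) \neq y_i} D_{t+1}(i)
  = \frac{\epsilon_t\, e^{\alpha_t}}
         {\epsilon_t\, e^{\alpha_t} + (1 - \epsilon_t)\, e^{-\alpha_t}}
  = \frac{1}{2}
\;\Longrightarrow\;
e^{2\alpha_t} = \frac{1 - \epsilon_t}{\epsilon_t}
\;\Longrightarrow\;
\alpha_t = \frac{1}{2} \ln \frac{1 - \epsilon_t}{\epsilon_t}
```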

Optimization View for Choosing α_t
• h_t(x): x → {1, -1}; a base (weak) classifier
• H_T(x): a linear combination of base classifiers
• Goal: minimize training error
• Approximate the error with an exponential function
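The equations for this slide did not survive, so the following is a sketch of the usual formulation rather than a transcription. It assumes N training examples (x_i, y_i) with y_i in {1, -1} and writes I[·] for the indicator function; N and I are notation introduced here. The exponential is an upper bound on the 0/1 error because exp(−z) ≥ 1 whenever z ≤ 0 and exp(−z) > 0 otherwise.

```latex
H_T(x) = \sum_{t=1}^{T} \alpha_t\, h_t(x)

\mathrm{err}(H_T)
  = \frac{1}{N} \sum_{i=1}^{N} I\bigl[\, y_i H_T(x_i) \le 0 \,\bigr]
  \;\le\;
  \frac{1}{N} \sum_{i=1}^{N} \exp\bigl(-y_i H_T(x_i)\bigr)
```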

AdaBoost: Greedy Optimization
Fix H_{T-1}(x), and solve for h_T(x) and α_T
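The greedy step can be worked out as follows, assuming the objective is the exponential loss summed over N training examples with labels in {1, -1}. The weights w_i and the weighted error ε_T are notation introduced for this sketch.

```latex
L(\alpha_T, h_T)
  = \sum_{i=1}^{N} \exp\Bigl(-y_i \bigl[ H_{T-1}(x_i) + \alpha_T h_T(x_i) \bigr]\Bigr)
  = \sum_{i=1}^{N} w_i \exp\bigl(-\alpha_T\, y_i\, h_T(x_i)\bigr),
\qquad
w_i = \exp\bigl(-y_i H_{T-1}(x_i)\bigr)

L(\alpha_T, h_T)
  = e^{-\alpha_T} \sum_{i:\, h_T(x_i) = y_i} w_i
  + e^{\alpha_T} \sum_{i:\, h_T(x_i) \neq y_i} w_i

\frac{\partial L}{\partial \alpha_T} = 0
\;\Longrightarrow\;
\alpha_T = \frac{1}{2} \ln \frac{1 - \epsilon_T}{\epsilon_T},
\qquad
\epsilon_T = \frac{\sum_{i:\, h_T(x_i) \neq y_i} w_i}{\sum_{i=1}^{N} w_i}
```

For any fixed α_T > 0 the loss grows with the total weight of the misclassified examples, so the best h_T is the weak classifier with the smallest weighted error ε_T.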

Empirical Study of AdaBoost
AdaBoosting decision trees
• Generate 50 decision trees by AdaBoost
• Linearly combine the decision trees using the weights of AdaBoost
In general:
• AdaBoost = Bagging > C4.5
• AdaBoost usually needs fewer classifiers than Bagging

Bias-Variance Tradeoff for AdaBoost
• AdaBoost can reduce both variance and bias simultaneously
[Figure: bias and variance for a single decision tree, bagged decision trees, and AdaBoosted decision trees]