Cost-Sensitive Learning for Large-Scale Hierarchical Classification of Commercial Products

- Slides: 22
Cost-Sensitive Learning for Large-Scale Hierarchical Classification of Commercial Products
Jianfu Chen, David S. Warren
Stony Brook University
Classification is a fundamental problem in information management.
• Email content → Spam / Ham
• Product description → UNSPSC taxonomy, for example:
  Segment: Office Equipment and Accessories and Supplies (44); Vehicles and their Accessories and Components (25); Food Beverage and Tobacco Products (50)
  Family: Marine transport (11); Motor vehicles (10); Aerospace systems (20)
  Class: Safety and rescue vehicles (17); Passenger motor vehicles (15); Product and material transport vehicles (16)
  Commodity: Buses (02); Automobiles or cars (03); Limousines (06)
How should we design a classifier for a given real world task?
Method 1. No Design
• Diagram: Training Set → f(x) → Test Set
• Try off-the-shelf classifiers: SVM, logistic regression, decision tree, neural network, ...
• Implicit assumption: we are trying to minimize error rate or, equivalently, maximize accuracy
Method 2. Optimize what we really care about
• What’s the use of the classifier?
• How do we evaluate the performance of a classifier according to our interests?
• Quantify what we really care about
• Optimize what we care about
Hierarchical classification of commercial products
Textual product description → UNSPSC taxonomy:
  Segment: Office Equipment and Accessories and Supplies (44); Vehicles and their Accessories and Components (25); Food Beverage and Tobacco Products (50)
  Family: Marine transport (11); Motor vehicles (10); Aerospace systems (20)
  Class: Safety and rescue vehicles (17); Passenger motor vehicles (15); Product and material transport vehicles (16)
  Commodity: Buses (02); Automobiles or cars (03); Limousines (06)
Product taxonomy helps customers find desired products quickly.
• Facilitates exploring similar products
• Helps product recommendation
• Facilitates corporate spend analysis
Example: Looking for gift ideas for a kid? Toys&Games → dolls, puzzles, building toys, ...
We assume misclassification of products leads to revenue loss.
Example: the textual product description of a mouse.
• Classified correctly (Product → ... → Desktop computer and accessories → mouse): realize an expected annual revenue
• Misclassified (for example, Product → ... → pet): lose part of the potential revenue
What do we really care about? A vendor’s business goal is to maximize revenue, or equivalently, minimize revenue loss
Observation 1: the misclassification cost of a product depends on its potential revenue.
Observation 2: the misclassification cost of a product depends on how far apart the true class and the predicted class are in the taxonomy.
Example: a textual product description of a mouse belongs under Product → ... → Desktop computer and accessories → mouse. Predicting the sibling class keyboard is a smaller error than predicting the distant class pet.
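One way to make "how far apart" concrete is sketched below. It assumes that the distance d(y, y') is the number of levels to climb from the leaves to the lowest common ancestor of the two classes, and that classes are identified by 8-digit UNSPSC codes (two digits each for segment, family, class, commodity). The function names `split_unspsc` and `tree_distance` are illustrative, not taken from the slides.

```python
from __future__ import annotations


def split_unspsc(code: str) -> tuple[str, ...]:
    """Split an 8-digit UNSPSC code into (segment, family, class, commodity)."""
    if len(code) != 8 or not code.isdigit():
        raise ValueError(f"expected an 8-digit code, got {code!r}")
    return tuple(code[i:i + 2] for i in range(0, 8, 2))


def tree_distance(true_code: str, pred_code: str) -> int:
    """Assumed distance: levels from the leaf up to the lowest common ancestor.

    Returns 0 for identical leaves and 4 when even the segments differ.
    """
    true_path = split_unspsc(true_code)
    pred_path = split_unspsc(pred_code)
    shared = 0
    for a, b in zip(true_path, pred_path):
        if a != b:
            break
        shared += 1
    return len(true_path) - shared


if __name__ == "__main__":
    buses = "25101502"        # Vehicles / Motor vehicles / Passenger motor vehicles / Buses
    limousines = "25101506"   # same class, different commodity
    print(tree_distance(buses, limousines))
```

Under this assumption, Buses vs. Limousines are at distance 1, while two products in different segments are at distance 4.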
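The proposed performance evaluation metric: average revenue loss
• The revenue loss of product x depends on d(y, y’), the distance between the true class y and the predicted class y’ in the taxonomy:
  d(y, y’):    0     1     2     3     4
  loss ratio:  0     0.2   0.4   0.6   0.8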
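A minimal sketch of this metric follows. It assumes that the revenue loss of one product is its potential revenue multiplied by the loss ratio for d(y, y'), and that the metric is the plain mean over products. The ratios are the five values legible on the slide; the names `LOSS_RATIO`, `revenue_loss`, and `average_revenue_loss` are hypothetical.

```python
# Loss ratio per tree distance d(y, y'), as read off the slide.
LOSS_RATIO = {0: 0.0, 1: 0.2, 2: 0.4, 3: 0.6, 4: 0.8}


def revenue_loss(revenue: float, distance: int) -> float:
    """Assumed per-product loss: potential revenue times the loss ratio."""
    return revenue * LOSS_RATIO[distance]


def average_revenue_loss(revenues: list, distances: list) -> float:
    """Mean revenue loss over a set of classified products."""
    if len(revenues) != len(distances):
        raise ValueError("revenues and distances must have the same length")
    if not revenues:
        raise ValueError("need at least one product")
    total = sum(revenue_loss(v, d) for v, d in zip(revenues, distances))
    return total / len(revenues)


if __name__ == "__main__":
    # Three products: one correct, one two levels off, one in the wrong segment.
    print(average_revenue_loss([1000.0, 500.0, 200.0], [0, 2, 4]))
```

A correctly classified product contributes nothing, so the metric rewards getting high-revenue products right and keeping unavoidable errors close to the true class.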
Learning – minimizing average revenue loss • Minimize convex upper bound
Multi-class SVM with margin re-scaling
Multi-class SVM with margin re-scaling: plug in any loss function
• 0-1: error rate (standard multi-class SVM)
• VALUE: product revenue
• TREE: hierarchical distance
• REVLOSS: revenue loss
Dataset
• UNSPSC (United Nations Standard Product and Service Code) dataset
  data source: multiple online marketplaces oriented toward DoD and Federal government customers (GSA Advantage, DoD EMALL)
  taxonomy structure: 4-level balanced tree (UNSPSC taxonomy)
  #examples: 1.4M
  #leaf classes: 1073
• Product revenues are simulated: revenue = price * sales
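The idea of plugging a loss into the margin can be sketched with the standard margin-rescaled multi-class hinge loss, max over y' of [loss(y, y') + w_y'·x − w_y·x]. This is the textbook formulation, not a transcription of the slide's equation, which did not survive extraction. The names `margin_rescaled_hinge` and `zero_one` are illustrative, and `loss_fn` is assumed to return 0 when y' = y.

```python
def zero_one(y: int, y_pred: int) -> float:
    """0-1 loss: recovers the standard multi-class SVM."""
    return 0.0 if y == y_pred else 1.0


def margin_rescaled_hinge(weights, x, y, loss_fn) -> float:
    """Margin-rescaled hinge loss for one example.

    weights: one weight vector per class; x: feature vector; y: true class index.
    loss_fn(y, k): cost of predicting class k when the truth is y (0 if k == y).
    """
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]
    # The required margin over each wrong class grows with the cost of that error.
    return max(loss_fn(y, k) + scores[k] - scores[y] for k in range(len(weights)))


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    x = [2.0, 0.0]
    print(margin_rescaled_hinge(W, x, 0, zero_one))
    # A costlier loss (here 5x the 0-1 loss) demands a larger margin.
    print(margin_rescaled_hinge(W, x, 0, lambda y, k: 5.0 * zero_one(y, k)))
```

Because a loss such as VALUE or REVLOSS depends on the product's revenue, `loss_fn` would in practice be built per example, for instance as a closure over that product's revenue.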
Experimental results: average revenue loss (in K$) of different algorithms
• 0-1: 4.745
• TREE: 4.964
• VALUE: 47.708 (IDENTITY), 5.092 (UNIT)
• REVLOSS: 48.082 (IDENTITY), 5.082 (UNIT)
What’s wrong? • Revenue loss ranges from a few K to several M
Loss normalization
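The result charts label three normalization schemes, IDENTITY, UNIT, and RANGE, but their formulas are not in the extracted text. The sketch below shows one plausible reading, stated as an assumption rather than the authors' definitions: IDENTITY leaves losses unchanged, UNIT divides by the largest loss, and RANGE maps losses linearly onto an interval. The default bounds `low=1.0` and `high=10.0` are hypothetical.

```python
def normalize_identity(losses):
    """IDENTITY (assumed): use raw losses as they are."""
    return list(losses)


def normalize_unit(losses):
    """UNIT (assumed): scale so that the largest loss becomes 1."""
    peak = max(losses)
    if peak <= 0:
        raise ValueError("need at least one positive loss")
    return [loss / peak for loss in losses]


def normalize_range(losses, low=1.0, high=10.0):
    """RANGE (assumed): map losses linearly onto [low, high]."""
    smallest, largest = min(losses), max(losses)
    if largest == smallest:
        return [low for _ in losses]
    scale = (high - low) / (largest - smallest)
    return [low + (loss - smallest) * scale for loss in losses]


if __name__ == "__main__":
    raw = [2000.0, 500.0, 4000.0]  # made-up revenue losses in dollars
    print(normalize_unit(raw))
    print(normalize_range(raw))
```

The motivation carries over from the previous slide: raw revenue losses spanning a few K to several M make the margin terms wildly different in scale, and rescaling brings them into a range the optimizer handles better.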
Final results: average revenue loss (in K$) of different algorithms
• 0-1: 4.745
• TREE: 4.964
• VALUE: 47.708 (IDENTITY), 5.092 (UNIT), 4.387 (RANGE)
• REVLOSS: 48.082 (IDENTITY), 5.082 (UNIT), 4.371 (RANGE)
7.88% reduction in average revenue loss!
Conclusion
• What do we really care about for this task? Minimize error rate? Minimize revenue loss?
• How do we approximate the performance evaluation metric to make it tractable?
• Pipeline: performance evaluation metric → model + tractable loss function → optimization (regularized empirical risk minimization) → find the best parameters
• A general method: multi-class SVM with margin re-scaling and loss normalization
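The 7.88% figure is consistent with comparing the best result on the chart (REVLOSS with RANGE, 4.371) against the 0-1 baseline (4.745). The slide does not state which pair is being compared, so this pairing is an inference. A quick check, with the hypothetical helper `percent_reduction`:

```python
def percent_reduction(baseline: float, improved: float) -> float:
    """Relative reduction of `improved` with respect to `baseline`, in percent."""
    return (baseline - improved) / baseline * 100.0


if __name__ == "__main__":
    # Average revenue loss in K$: 0-1 baseline vs. REVLOSS with RANGE normalization.
    print(round(percent_reduction(4.745, 4.371), 2))
```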
Thank you! Questions?