Cost-Sensitive Learning for Large-Scale Hierarchical Classification of Commercial Products
Jianfu Chen, David S. Warren
Stony Brook University

Classification is a fundamental problem in information management.
• Email content → Spam / Ham
• Product description → UNSPSC category

UNSPSC taxonomy levels (example categories, with their two-digit level codes):
• Segment: Office Equipment and Accessories and Supplies (44); Vehicles and their Accessories and Components (25); Food Beverage and Tobacco Products (50)
• Family: Marine transport (11); Motor vehicles (10); Aerospace systems (20)
• Class: Safety and rescue vehicles (17); Passenger motor vehicles (15); Product and material transport vehicles (16)
• Commodity: Buses (02); Automobiles or cars (03); Limousines (06)

How should we design a classifier for a given real-world task?

Method 1. No design
Training set → f(x) → Test set
Try off-the-shelf classifiers:
• SVM
• Logistic regression
• Decision tree
• Neural network
• ...
Implicit assumption: we are trying to minimize error rate, or equivalently, maximize accuracy.

Method 2. Optimize what we really care about
• What’s the use of the classifier?
• How do we evaluate the performance of a classifier according to our interests?
• Quantify what we really care about
• Optimize what we care about

Hierarchical classification of commercial products
Textual product description → UNSPSC category

UNSPSC taxonomy levels (example categories, with their two-digit level codes):
• Segment: Office Equipment and Accessories and Supplies (44); Vehicles and their Accessories and Components (25); Food Beverage and Tobacco Products (50)
• Family: Marine transport (11); Motor vehicles (10); Aerospace systems (20)
• Class: Safety and rescue vehicles (17); Passenger motor vehicles (15); Product and material transport vehicles (16)
• Commodity: Buses (02); Automobiles or cars (03); Limousines (06)

Product taxonomy helps customers find desired products quickly.
• Facilitates exploring similar products
• Helps product recommendation
• Facilitates corporate spend analysis
Example: looking for gift ideas for a kid? Toys & Games → dolls, puzzles, building toys, ...

We assume misclassification of products leads to revenue loss.
Example: textual product description of a mouse, placed in the product taxonomy
• Classified as mouse, under Desktop computer and accessories (sibling class: keyboard): realize an expected annual revenue
• Classified under pet: lose part of the potential revenue

What do we really care about? A vendor’s business goal is to maximize revenue, or equivalently, minimize revenue loss

Observation 1: the misclassification cost of a product depends on its potential revenue.

Observation 2: the misclassification cost of a product depends on how far apart the true class and the predicted class are in the taxonomy.
Example: for the textual product description of a mouse, predicting keyboard (a sibling under Desktop computer and accessories) is closer to the true class than predicting pet.
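One way to make "how far apart" concrete is sketched below. It assumes that classes are leaves identified by 8-digit UNSPSC codes made of four two-digit level codes (segment, family, class, commodity), and that the distance is the number of levels one climbs from a leaf to reach the lowest common ancestor. The function names and this definition of distance are illustrative assumptions, not taken from the slides.

```python
from typing import List


def unspsc_levels(code: str) -> List[str]:
    """Split an 8-digit UNSPSC code into its four 2-digit level codes."""
    if len(code) != 8 or not code.isdigit():
        raise ValueError(f"expected an 8-digit code, got {code!r}")
    return [code[i:i + 2] for i in range(0, 8, 2)]


def tree_distance(true_code: str, predicted_code: str) -> int:
    """Levels to climb from a leaf to the lowest common ancestor (0 to 4).

    Assumed definition: 0 means the same leaf, 4 means the two leaves
    only meet at the root (different segments).
    """
    true_levels = unspsc_levels(true_code)
    pred_levels = unspsc_levels(predicted_code)
    shared = 0
    for t, p in zip(true_levels, pred_levels):
        if t != p:
            break
        shared += 1
    return len(true_levels) - shared


# Codes assembled from the level codes shown in the taxonomy example,
# assuming they nest as listed (25 > 10 > 15 > 03 for automobiles or cars).
automobiles = "25101503"
limousines = "25101506"
office_leaf = "44121701"  # hypothetical leaf under segment 44

print(tree_distance(automobiles, limousines))   # same class, different commodity
print(tree_distance(automobiles, office_leaf))  # different segment
```

Under this definition, confusing automobiles with limousines costs one level, while confusing them with an office product costs all four.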

The proposed performance evaluation metric: average revenue loss
[Formula lost in extraction; surviving labels: "of product x", d(y, y’)]

d(y, y’)   0    1    2    3    4
value      0    0.2  0.4  0.6  0.8
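The metric can be sketched as follows, under two assumptions that the surviving text does not confirm: the values 0, 0.2, 0.4, 0.6, 0.8 are read as the fraction of a product's revenue lost at distance d(y, y’) = 0..4, and the metric is the mean over products of revenue times that fraction. The names `LOSS_FRACTION` and `average_revenue_loss` are illustrative.

```python
from typing import Sequence

# Assumed reading of the table: fraction of revenue lost at distance d.
LOSS_FRACTION = {0: 0.0, 1: 0.2, 2: 0.4, 3: 0.6, 4: 0.8}


def average_revenue_loss(revenues: Sequence[float],
                         distances: Sequence[int]) -> float:
    """Mean over products of revenue * loss fraction at the given distance.

    revenues[i]  -- potential revenue of product i
    distances[i] -- d(y_i, y'_i) between true and predicted class (0..4)
    """
    if len(revenues) != len(distances):
        raise ValueError("revenues and distances must have the same length")
    if not revenues:
        raise ValueError("need at least one product")
    total = sum(v * LOSS_FRACTION[d] for v, d in zip(revenues, distances))
    return total / len(revenues)


# Three products: one classified correctly, one off by one level,
# one placed in a different segment.
print(average_revenue_loss([10.0, 20.0, 5.0], [0, 1, 4]))
```

A correct prediction contributes nothing, so the metric reduces to a revenue-weighted, distance-weighted error rate.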

Learning – minimizing average revenue loss • Minimize convex upper bound

Multi-class SVM with margin re-scaling

Multi-class SVM with margin re-scaling: plug in any loss function
• 0-1: error rate (standard multi-class SVM)
• VALUE: product revenue
• TREE: hierarchical distance
• REVLOSS: revenue loss
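The slide's formula did not survive, so the following is a minimal sketch of the standard margin re-scaled hinge loss for a linear multi-class model, with the loss function passed in as a parameter. The four loss definitions are an assumed reading of the labels 0-1, VALUE, TREE and REVLOSS; all function names are illustrative, and the authors' exact formulation may differ.

```python
from typing import Callable, Mapping, Sequence

Loss = Callable[[int, int], float]  # delta(true_class, other_class)


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def margin_rescaled_hinge(weights: Sequence[Sequence[float]],
                          x: Sequence[float], y: int, delta: Loss) -> float:
    """max over classes k of delta(y, k) + score(k) - score(y).

    With delta(y, y) == 0 this is non-negative and upper-bounds the
    loss delta(y, predicted) of the highest-scoring class.
    """
    scores = [dot(w, x) for w in weights]
    return max(delta(y, k) + scores[k] - scores[y]
               for k in range(len(weights)))


def zero_one_loss(y: int, k: int) -> float:
    """0-1: every mistake costs the same."""
    return 0.0 if y == k else 1.0


def make_value_loss(revenue: float) -> Loss:
    """VALUE (assumed): a mistake costs the product's revenue."""
    return lambda y, k: revenue * zero_one_loss(y, k)


def make_tree_loss(distance: Callable[[int, int], int]) -> Loss:
    """TREE (assumed): a mistake costs the hierarchical distance."""
    return lambda y, k: float(distance(y, k))


def make_revloss(revenue: float, distance: Callable[[int, int], int],
                 fraction: Mapping[int, float]) -> Loss:
    """REVLOSS (assumed): revenue times the fraction lost at that distance."""
    return lambda y, k: revenue * fraction[distance(y, k)]


# Toy example: three classes, two features, true class 0.
weights = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
x = [1.0, 2.0]
print(margin_rescaled_hinge(weights, x, 0, zero_one_loss))
print(margin_rescaled_hinge(weights, x, 0, make_value_loss(3.0)))
```

Only the `delta` argument changes between variants, which is what makes it possible to plug in any loss function while the learner stays the same.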

Dataset
• UNSPSC (United Nations Standard Products and Services Code) dataset
  data source: multiple online marketplaces oriented toward DoD and Federal government customers (GSA Advantage, DoD EMALL)
  taxonomy structure: 4-level balanced tree (UNSPSC taxonomy)
  #examples: 1.4M
  #leaf classes: 1073
• Product revenues are simulated: revenue = price * sales

Experimental results
Average revenue loss (in K$) of different algorithms [bar chart; series: IDENTITY, UNIT]
• 0-1: 4.745
• TREE: 4.964
• VALUE: 47.708 (IDENTITY), 5.092 (UNIT)
• REVLOSS: 48.082 (IDENTITY), 5.082 (UNIT)

What’s wrong?
• Revenue loss ranges from a few K to several M

Loss normalization

Final results
Average revenue loss (in K$) of different algorithms [bar chart; series: IDENTITY, UNIT, RANGE]
• 0-1: 4.745
• TREE: 4.964
• VALUE: 47.708 (IDENTITY), 5.092 (UNIT), 4.387 (RANGE)
• REVLOSS: 48.082 (IDENTITY), 5.082 (UNIT), 4.371 (RANGE)
7.88% reduction in average revenue loss!
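The 7.88% figure can be checked with a short calculation, assuming it compares REVLOSS with RANGE normalization (4.371 K$) against the 0-1 baseline (4.745 K$). That pairing is an inference from the numbers, not something the slide states explicitly.

```python
def relative_reduction_percent(baseline: float, improved: float) -> float:
    """Percentage by which `improved` is lower than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - improved) / baseline


baseline_zero_one = 4.745  # average revenue loss in K$, 0-1 loss
revloss_range = 4.371      # average revenue loss in K$, REVLOSS with RANGE

print(round(relative_reduction_percent(baseline_zero_one, revloss_range), 2))  # 7.88
```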

Conclusion
• What do we really care about for this task? Minimize error rate? Minimize revenue loss?
• How do we approximate the performance evaluation metric to make it tractable?
• Find the best parameters

Performance evaluation metric → Model + tractable loss function → Optimization

Regularized empirical risk minimization
A general method: multi-class SVM with margin re-scaling and loss normalization

Thank you! Questions?