Importance of Semantic Representation: Dataless Classification
Ming-Wei Chang, Lev Ratinov, Dan Roth, Vivek Srikumar
University of Illinois, Urbana-Champaign

Text Categorization
• Classify the following sentence:
    Syd Millar was the chairman of the International Rugby Board in 2003.
    Pick a label: Class 1 vs. Class 2
• Traditionally, we need annotated data to train a classifier
Slide 1

Text Categorization
• Humans don’t seem to need labeled data
    Syd Millar was the chairman of the International Rugby Board in 2003.
    Pick a label: Sports vs. Finance
• Label names carry a lot of information!
Slide 2

Text Categorization
Do we really always need labeled data?
Slide 3

Contributions
• We can often go quite far without annotated data
  - … if we “know” the meaning of text
• This works for text categorization
  - … and is consistent across different domains
Slide 4

Outline
• Semantic Representation
• On-the-fly Classification
• Datasets
• Exploiting unlabeled data
• Robustness to different domains
Slide 5

Outline
• Semantic Representation
• On-the-fly Classification
• Datasets
• Exploiting unlabeled data
• Robustness to different domains
Slide 6

Semantic Representation
• One common representation is the Bag of Words representation
• All text is a vector in the space of words.
Slide 7
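A minimal sketch of the Bag of Words representation, assuming a simple lowercase tokenizer and raw counts as weights (the slides do not specify either). The helper names `bag_of_words` and `cosine` are illustrative. It also shows one weakness of this space: the example sentence shares no word with the label name "Sports", so their similarity is zero.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Map text to a sparse vector: lowercase word -> count."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

doc = bag_of_words(
    "Syd Millar was the chairman of the International Rugby Board in 2003."
)
print(doc["the"])                               # "the" occurs twice
print(cosine(doc, bag_of_words("Sports")))      # no shared words with the label
```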

Semantic Representation
• Explicit Semantic Analysis
  - [Gabrilovich & Markovitch, 2006, 2007]
• Text is a vector in the space of concepts
• Concepts are defined by Wikipedia articles
Slide 8

Explicit Semantic Analysis: Example
Input text         ESA representation (Wikipedia article titles)
Apple IPod         IPod mini, IPod photo, IPod nano, Apple Computer, IPod shuffle, ITunes
Monetary Policy    International Monetary Fund, Monetary policy, Economic and Monetary Union, Hong Kong Monetary Authority, Monetarism, Central bank
Slide 9
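The idea of mapping text into concept space can be sketched as follows. This is a toy illustration, not the ESA implementation of Gabrilovich & Markovitch: the four short "articles" in `ARTICLES` are hypothetical stand-ins for Wikipedia, and the TF-IDF weighting is a simplified assumption. Each word gets a weight for every concept (article) it appears in, and a text's vector is the sum of its words' concept vectors.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stand-ins for Wikipedia articles (title -> text).
ARTICLES = {
    "Rugby union": "rugby is a team sport governed by an international board",
    "Sport": "sport includes games such as rugby baseball and hockey",
    "Central bank": "a central bank sets monetary policy and interest rates",
    "Finance": "finance covers money banking and investment",
}

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(articles):
    """Inverted index: word -> {concept title: TF-IDF weight}."""
    term_counts = {title: Counter(tokenize(body)) for title, body in articles.items()}
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())
    index = defaultdict(dict)
    for title, counts in term_counts.items():
        for word, tf in counts.items():
            idf = math.log(len(articles) / doc_freq[word])
            if idf > 0:  # skip words that occur in every article
                index[word][title] = tf * idf
    return index

def esa_vector(text, index):
    """Concept-space vector: sum of the concept vectors of the words in the text."""
    vec = Counter()
    for word in tokenize(text):
        for concept, weight in index.get(word, {}).items():
            vec[concept] += weight
    return vec

index = build_index(ARTICLES)
vector = esa_vector(
    "Syd Millar was the chairman of the International Rugby Board in 2003.", index
)
print(vector.most_common(2))  # concepts ranked by weight
```

With this toy index the sentence activates only the "Rugby union" and "Sport" concepts, which is the kind of signal a label such as "Sports" can match even though the word itself never appears in the sentence.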

Semantic Representation
• Two semantic representations
  - Bag of words
  - ESA
Slide 10

Outline
• Semantic Representation
• On-the-fly Classification
• Datasets
• Exploiting unlabeled data
• Robustness to different domains
Slide 11

Traditional Text Categorization
[Diagram: a labeled corpus (Sports, Finance) is mapped into a semantic space and used to train a classifier]
Slide 12

Dataless Classification
[Diagram: labeled corpus and labels (Sports, Finance)]
What can we do using just the labels?
Slide 13

But labels are text too!
Slide 14

Dataless Classification
[Diagram: a new unlabeled document and the labels (Sports, Finance) placed in the same semantic space]
Slide 15

What is Dataless Classification?
• Humans don’t need training for classification
• Annotated training data not always needed
• Look for the meaning of words
Slide 16

What is Dataless Classification?
• Humans don’t need training for classification
• Annotated training data not always needed
• Look for the meaning of words
Slide 17

On-the-fly Classification
[Diagram: a new unlabeled document and the labels (Sports, Finance) placed in the same semantic space]
Slide 18

On-the-fly Classification
• No training data needed
• We know the meaning of label names
• Pick the label that is closest in meaning to the document
• Nearest neighbors
Slide 19
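The nearest-neighbor rule above can be sketched as follows: embed the document and each label name in the same semantic space, then pick the label with the highest similarity. The `CONCEPTS` table is a hypothetical, hand-written stand-in for an ESA index, and cosine similarity is an assumed choice of distance; `classify` accepts any embedding function, so a bag-of-words embedding could be swapped in.

```python
import math
import re
from collections import Counter

# Hypothetical word -> concept weights standing in for a real ESA index.
CONCEPTS = {
    "rugby": {"Rugby union": 1.0, "Sport": 0.5},
    "sports": {"Sport": 1.0},
    "board": {"Rugby union": 0.2, "Board of directors": 0.6},
    "chairman": {"Board of directors": 1.0},
    "finance": {"Finance": 1.0, "Central bank": 0.5},
    "monetary": {"Central bank": 1.0, "Finance": 0.5},
}

def concept_vector(text):
    """Embed text in concept space by summing its words' concept weights."""
    vec = Counter()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        for concept, weight in CONCEPTS.get(word, {}).items():
            vec[concept] += weight
    return vec

def cosine(u, v):
    dot = sum(weight * v.get(key, 0) for key, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def classify(document, label_names, embed):
    """Return the label name whose embedding is closest to the document's."""
    doc_vec = embed(document)
    return max(label_names, key=lambda name: cosine(doc_vec, embed(name)))

DOCUMENT = "Syd Millar was the chairman of the International Rugby Board in 2003."
print(classify(DOCUMENT, ["Sports", "Finance"], concept_vector))
```

Because nothing is trained, the list of label names can be changed at call time without any other change to the code.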

On-the-fly Classification
[Diagram: a new unlabeled document and new labels (Hockey, Baseball) placed in the same semantic space]
Slide 20

On-the-fly Classification
• No need to even know labels beforehand
• Compare with traditional classification
  - Annotated training data for each label
Slide 21

Outline
• Semantic Representation
• On-the-fly Classification
• Datasets
• Exploiting unlabeled data
• Robustness to different domains
Slide 22

Dataset 1: Twenty Newsgroups
• Posts to newsgroups
  - Newsgroups have descriptive names
      sci.electronics = Science Electronics
      rec.motorbikes = Motorbikes
Slide 23

Dataset 2: Yahoo! Answers
• Posts to Yahoo! Answers
• Posts categorized into a two-level hierarchy
  - 20 top-level categories
  - 280 categories in total at the second level
  - Examples:
      Arts and Humanities, Theater Acting
      Sports, Rugby League
Slide 24

Experiments
• 20 Newsgroups
  - 10 binary problems (from [Raina et al., ’06])
      Religion vs. Politics.guns
      Motorcycles vs. MS Windows
• Yahoo! Answers
  - 20 binary problems
      Health, Diet fitness vs. Health, Allergies
      Consumer Electronics, DVRs vs. Pets, Rodents
Slide 25

Results: On-the-fly classification
Dataset      Supervised Baseline   Bag of Words   ESA
Newsgroup    71.7                  65.7           85.3
Yahoo!       84.3                  66.8           88.6
Supervised Baseline: Naïve Bayes classifier; uses annotated data, ignores labels
Bag of Words and ESA: nearest neighbors; uses labels, no annotated data
Slide 26

Outline
• Semantic Representation
• On-the-fly Classification
• Datasets
• Exploiting unlabeled data
• Robustness to different domains
Slide 27

Using Unlabeled Data
• Knowing the data collection helps
  - We can learn specific biases of the dataset
• Potential for semi-supervised learning
Slide 28

Bootstrapping
• Each label name is a “labeled” document
  - One “example” in word or concept space
• Train initial classifier
  - Same as the on-the-fly classifier
• Loop:
  - Classify all documents with current classifier
  - Retrain classifier with highly confident predictions
Slide 29
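The bootstrapping loop above can be sketched as follows. Several details are assumptions made for illustration, since the slides do not specify them: a nearest-centroid classifier in word space, a tiny stopword list, and "highly confident" defined as the best label beating the runner-up by a `margin` in cosine similarity. The function names and the example documents are hypothetical.

```python
import math
import re
from collections import Counter

STOPWORDS = {"a", "the", "to", "on", "out"}  # tiny illustrative list

def embed(text):
    """Bag-of-words vector without stopwords."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)

def cosine(u, v):
    dot = sum(weight * v.get(key, 0) for key, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def bootstrap(label_names, documents, rounds=3, margin=0.1):
    """Needs at least two labels. Returns one centroid vector per label."""
    # Initial classifier: each label name is the only "labeled" example.
    centroids = {name: embed(name) for name in label_names}
    for _ in range(rounds):
        retrained = {name: embed(name) for name in label_names}
        for doc in documents:
            vec = embed(doc)
            scored = sorted(
                ((cosine(vec, c), name) for name, c in centroids.items()),
                reverse=True,
            )
            (best, label), (runner_up, _) = scored[0], scored[1]
            if best - runner_up >= margin:  # keep only confident predictions
                retrained[label].update(vec)
        centroids = retrained
    return centroids

def predict(text, centroids):
    vec = embed(text)
    return max(centroids, key=lambda name: cosine(vec, centroids[name]))

DOCS = [
    "the hockey goalie stopped the puck",
    "the goalie blocked a puck on the ice",
    "a baseball pitcher threw to the catcher",
    "the pitcher struck out the batter",
]
centroids = bootstrap(["hockey", "baseball"], DOCS)
print(predict("puck on ice", centroids))
```

In this toy run the second and fourth documents never mention a label name; they are picked up in the second round through words such as "goalie" and "pitcher" learned from the confidently labeled documents in the first round.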

Co-training
• Words and concepts are two independent “views”
• Each view is a teacher for the other [Blum & Mitchell ’98]
Slide 30

Co-training
• Train initial classifiers in word space and concept space
• Loop
  - Classify documents with current classifiers
  - Retrain with highly confident predictions of both classifiers
Slide 31
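A minimal sketch of the co-training loop, under stated assumptions: each view uses a nearest-centroid classifier seeded with the label names, confidence is a cosine margin over the runner-up label, and documents on which the two views confidently disagree are skipped. `CONCEPT_INDEX` is a hypothetical stand-in for an ESA index, and none of these choices is confirmed by the slides.

```python
import math
import re
from collections import Counter

# Hypothetical word -> concept weights standing in for an ESA index.
CONCEPT_INDEX = {
    "hockey": {"Ice hockey": 1.0}, "puck": {"Ice hockey": 1.0},
    "goalie": {"Ice hockey": 0.8}, "baseball": {"Baseball": 1.0},
    "pitcher": {"Baseball": 1.0}, "batter": {"Baseball": 0.8},
}

def word_view(text):
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def concept_view(text):
    vec = Counter()
    for word, count in word_view(text).items():
        for concept, weight in CONCEPT_INDEX.get(word, {}).items():
            vec[concept] += count * weight
    return vec

def cosine(u, v):
    dot = sum(weight * v.get(key, 0) for key, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def confident_label(vec, centroids, margin):
    """Best label if it beats the runner-up by `margin`, else None."""
    scored = sorted(((cosine(vec, c), n) for n, c in centroids.items()), reverse=True)
    return scored[0][1] if scored[0][0] - scored[1][0] >= margin else None

def co_train(label_names, documents, views, rounds=3, margin=0.1):
    def seed(view):  # initial classifier: label names only
        return {name: Counter(view(name)) for name in label_names}
    models = [seed(view) for view in views]
    for _ in range(rounds):
        # Each view proposes labels for the documents it is confident about.
        proposals = {}
        for view, model in zip(views, models):
            for i, doc in enumerate(documents):
                label = confident_label(view(doc), model, margin)
                if label is not None:
                    proposals.setdefault(i, set()).add(label)
        # Retrain both views on every document with one agreed-upon label.
        models = [seed(view) for view in views]
        for i, labels in proposals.items():
            if len(labels) == 1:
                (label,) = labels
                for view, model in zip(views, models):
                    model[label].update(view(documents[i]))
    return models

DOCS = [
    "the goalie made a save", "a great save in overtime",
    "the batter hit a home run", "a home run in the ninth",
]
word_model, concept_model = co_train(
    ["hockey", "baseball"], DOCS, [word_view, concept_view]
)
print(sorted(word_model["hockey"]))
```

In this toy run the concept view labels "the goalie made a save" first, which teaches the word view the word "save"; the word view can then label "a great save in overtime", a document the concept view cannot represent at all.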

Using unlabeled data
• Three approaches
  - Bootstrapping with labels using Bag of Words
  - Bootstrapping with labels using ESA
  - Co-training
Slide 32

More Results
Co-training using just labels does as well as supervision with 100 examples
No annotated data
Slide 33

Outline
• Semantic Representation
• On-the-fly Classification
• Datasets
• Exploiting unlabeled data
• Robustness to different domains
Slide 34

Domain Adaptation
• Classifiers trained on one domain and tested on another
• Performance usually decreases across domains
Slide 35

But the label names are the same
• Label names don’t depend on the domain
• Label names are robust across domains
  - On-the-fly classifiers are domain independent
Slide 36

Example
Baseball vs. Hockey
Slide 37

Conclusion
• Sometimes, label names tell us more about a class than annotated examples
  - Standard learning practice of treating labels as unique identifiers loses information
• The right semantic representation helps
  - What is the right one?
Slide 38