Hierarchical Semi-supervised Classification with Incomplete Class Hierarchies
Bhavana Dalvi, Aditya Mishra, William W. Cohen
Semi-supervised Entity Classification
[Figure: class hierarchy. Everything → Food, Animals; Food → Fruits, Vegetables; Animals → Mammals, Reptiles. The figure marks two kinds of class constraints: Subset (e.g., Mammals within Animals) and Disjoint (e.g., Mammals vs. Reptiles).]
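The subset and disjoint constraints in the hierarchy figure can be made concrete with a small consistency check. The following is a minimal sketch, not the authors' code: the `PARENT` mapping, the function names, and the assumption that all sibling classes are mutually exclusive are illustrative choices, not taken from the slides.

```python
from itertools import combinations

# Hypothetical encoding of the example hierarchy (child -> parent).
PARENT = {
    "Food": "Everything",
    "Animals": "Everything",
    "Fruits": "Food",
    "Vegetables": "Food",
    "Mammals": "Animals",
    "Reptiles": "Animals",
}

def subset_constraints(parent):
    """Each child class is treated as a subset of its parent."""
    return sorted(parent.items())

def disjoint_constraints(parent):
    """Assumption: classes sharing a parent are mutually exclusive."""
    siblings = {}
    for child, par in parent.items():
        siblings.setdefault(par, []).append(child)
    pairs = []
    for group in siblings.values():
        pairs.extend(combinations(sorted(group), 2))
    return sorted(pairs)

def violations(labels, parent=PARENT):
    """List the constraints broken by a set of labels for one entity."""
    labels = set(labels)
    found = []
    for child, par in subset_constraints(parent):
        if child in labels and par not in labels:
            found.append(("subset", child, par))
    for a, b in disjoint_constraints(parent):
        if a in labels and b in labels:
            found.append(("disjoint", a, b))
    return found
```

For example, labeling an entity with both Mammals and Reptiles breaks a disjoint constraint, while labeling it Mammals without Animals breaks a subset constraint.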
Prior work
• Coupled Semi-Supervised Learning for Information Extraction, Carlson et al., WSDM 2010
• Automatic Gloss Finding for a Knowledge Base using Ontological Constraints, Dalvi et al., WSDM 2015
Challenge: Incomplete Class Hierarchies
[Figure: the same class hierarchy, extended with two new clusters for classes missing from it: C8 (Beverages) and C9 (Location).]
GOAL: Do semi-supervised classification and ontology extension in a single unified framework.
Optimization Problem
Maximize { Log Data Likelihood - Model Penalty }
over m (the number of clusters) and parameters {C1 … Cm},
subject to class constraints Zm.
Approach: Expectation Maximization
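To illustrate how a model penalty trades off against data likelihood when the number of clusters m is itself a variable, here is a minimal sketch. The penalty used (a weight times the number of free parameters) is an assumption chosen for simplicity; the slides do not state which penalty the authors use, and the function names and numbers are hypothetical.

```python
def penalized_score(log_likelihood, num_params, weight=1.0):
    """Log data likelihood minus a model penalty.

    Assumption: the penalty grows linearly with the number of free
    parameters; the slide only says "Model Penalty".
    """
    return log_likelihood - weight * num_params

def select_num_clusters(candidates, weight=1.0):
    """Pick the cluster count m with the highest penalized score.

    candidates maps m -> (log_likelihood, num_params) for a model
    already fitted with m clusters (e.g., by EM).
    """
    return max(
        candidates,
        key=lambda m: penalized_score(candidates[m][0], candidates[m][1], weight),
    )

# Illustrative numbers only: more clusters fit better but cost more.
candidates = {2: (-120.0, 4), 3: (-100.0, 6), 4: (-99.0, 8)}
best_m = select_num_clusters(candidates)
```

With these made-up values, going from 3 to 4 clusters gains too little likelihood to pay for the extra parameters, so 3 clusters are selected; with `weight=0.0` the penalty vanishes and the largest model wins.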
Optimized Divide-And-Conquer Strategy
• Class constraints: Mixed Integer Linear Program
• Missing classes: Soft Divide-And-Conquer method
Class constraints: Mixed Integer Linear Program
Max { likelihood of assignment - constraint violation penalty }
Objective terms: score of label assignment, subset constraint penalty, disjoint constraint penalty.
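The objective above can be illustrated on a tiny example. The sketch below is not an MILP solver: it swaps in exhaustive search over all binary label vectors, which is only feasible for a handful of classes. The penalty forms (`max(0, y_child - y_parent)` for subset, `max(0, y_a + y_b - 1)` for disjoint), the scores, and all names are assumptions made for illustration.

```python
from itertools import product

def best_assignment(scores, subset, disjoint, penalty=1.0):
    """Maximize assignment score minus constraint violation penalties.

    scores:   class -> score for assigning that class to the entity
    subset:   list of (child, parent) pairs
    disjoint: list of (class_a, class_b) pairs
    Brute force stands in for a real MILP solver here.
    """
    classes = sorted(scores)
    best_labels, best_value = set(), float("-inf")
    for bits in product((0, 1), repeat=len(classes)):
        y = dict(zip(classes, bits))
        value = sum(scores[c] * y[c] for c in classes)
        # Subset penalty: child chosen without its parent.
        value -= penalty * sum(max(0, y[c] - y[p]) for c, p in subset)
        # Disjoint penalty: both members of an exclusive pair chosen.
        value -= penalty * sum(max(0, y[a] + y[b] - 1) for a, b in disjoint)
        if value > best_value:
            best_labels = {c for c in classes if y[c]}
            best_value = value
    return best_labels, best_value

# Hypothetical scores for one entity.
scores = {"Animals": -0.3, "Mammals": 0.9, "Reptiles": 0.4}
subset = [("Mammals", "Animals"), ("Reptiles", "Animals")]
disjoint = [("Mammals", "Reptiles")]
labels, value = best_assignment(scores, subset, disjoint)
```

In this example Animals has a negative score on its own, yet it is selected together with Mammals because leaving it out would incur the subset penalty.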
Missing classes: Soft Divide-And-Conquer
[Figure: tree of numbered class nodes, annotated with the test "Near uniform?"]
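One way to read the "Near uniform?" test: if an entity's probabilities over the candidate classes are close to uniform, none of the existing classes explains it well, which suggests a missing class. The sketch below uses a max/min probability ratio as the criterion. That criterion, the threshold value, and the function names are assumptions for illustration; the slides do not specify the test.

```python
def is_near_uniform(probs, max_ratio=2.0):
    """Return True when the largest and smallest probabilities are close.

    Assumption: "near uniform" means max(probs) / min(probs) < max_ratio.
    """
    lowest, highest = min(probs), max(probs)
    if lowest <= 0:
        # A zero-probability class means the distribution is peaked elsewhere.
        return False
    return highest / lowest < max_ratio

def assign_or_create(probs, class_names, max_ratio=2.0):
    """Pick the most probable class, or signal that a new cluster is needed."""
    if is_near_uniform(probs, max_ratio):
        return "NEW_CLUSTER"
    best_index = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best_index]

children = ["Mammals", "Reptiles"]
confident = assign_or_create([0.9, 0.1], children)
ambiguous = assign_or_create([0.52, 0.48], children)
```

Here the first entity is assigned to Mammals, while the second has no clear winner among the known classes and is flagged for a new cluster.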
Results: 10% improvement in F1 scores
[Figure: macro-averaged seed class F1 (y-axis, 25 to 75) of Flat ExploreEM vs. OptDAC ExploreEM on the Text-Small, Text-Medium, Table-Small, and Table-Medium datasets, at hierarchy levels 2, 3, and 4.]
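For readers unfamiliar with the metric on the y-axis, macro-averaged F1 computes F1 separately for each class and then takes the unweighted mean, so small classes count as much as large ones. The sketch below assumes one gold label and one predicted label per entity, which is a simplification; the example labels are made up.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall; 0.0 when nothing is correct."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f1(gold, predicted, classes):
    """Unweighted mean of per-class F1 over the given classes."""
    per_class = []
    for c in classes:
        pairs = list(zip(gold, predicted))
        tp = sum(1 for g, p in pairs if g == c and p == c)
        fp = sum(1 for g, p in pairs if g != c and p == c)
        fn = sum(1 for g, p in pairs if g == c and p != c)
        per_class.append(f1_score(tp, fp, fn))
    return sum(per_class) / len(per_class)

# Illustrative labels only.
gold = ["Mammals", "Mammals", "Reptiles", "Reptiles"]
predicted = ["Mammals", "Reptiles", "Reptiles", "Reptiles"]
score = macro_f1(gold, predicted, ["Mammals", "Reptiles"])
```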
Results: Ontology Extension
Datasets are made public
Four hierarchical entity classification datasets are publicly available at http://rtw.ml.cmu.edu/wk/WebSets/hierarchical_ExploratoryLearning_WSDM2016/index.html
Thank You
bhavanad@allenai.org