HowToKB: Mining HowTo Knowledge from Online Communities
Cuong Chu (MPI Saarbruecken), Niket Tandon (Allen Institute for AI), Gerhard Weikum (MPI Saarbruecken)
Example task frame: How to paint a wall (with attributes and edges)
Related work on HowTo knowledge acquisition: input and representation
§ Input axis: generic vs. domain specific
§ Representation axis: syntactic structures (reduced expressivity) vs. semantic expressivity
§ Work placed on these axes: Yang et al. SIGIR'15, OpenIE, ConceptNet, PropBank, VerbNet, FrameNet, Knowlywood, Fillmore'76, Minsky'74, Schank'75, HowToKB
§ Limitations noted for prior work: tasks are not semantic frames; no phrases/tasks; manually populated
Message: HowToKB's knowledge representation is different.
Related work on HowTo knowledge acquisition: task and model
§ Schema-based: semantic frame parsing, semantic role labeling
§ Schema-free extraction: OpenIE (syntactic structures)
§ Model axis: supervised vs. unsupervised
§ Knowlywood: mapped to WordNet, a closed sense repository
Message: our task is different.
WikiHow: our input dataset
§ An article provides: task, sub-tasks, previous task, next task, participating objects, images and videos
Message: WikiHow data is very rich and can be exploited.
System overview: WikiHow -> frame construction -> frame organization -> HowToKB
§ Stage 1a: convert unstructured articles to structured task frames
§ Stage 1b: sequence the task frames
§ Stage 1 yields a novel knowledge representation
§ Stage 2: organize the sequenced task frames into a novel hierarchical organization with distributional senses
KB construction: extraction
§ OpenIE naturally suits task frame construction; its output maps easily to task attributes
§ Attribute type-checking (head of the phrase ∈ WordNet, WN) increases precision from 75% to 97%
§ Result: 1.2 million task frames

attribute               OpenIE mapping    type-checking
location                (not shown)       WN-noun (noun phrase)
time                    (not shown)       WN-time
participating agent     subject           WN-living
participating object    subject/object    WN-nonliving

Message: Type checking helps to postprocess OpenIE results. The 1.2M task frames are still isolated from each other.
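The type-checking step in the table above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `TOY_TYPES` is a hypothetical stand-in for WordNet lookups, `head_of` assumes the last token of a phrase is its head, and the attribute names are chosen for this example only.

```python
# Hypothetical stand-in for WordNet: head noun -> coarse type.
# A real system would consult WordNet hypernyms instead of this toy table.
TOY_TYPES = {
    "painter": "living",
    "brush": "nonliving",
    "wall": "nonliving",
    "kitchen": "nonliving",
    "morning": "time",
}

# Expected head type per attribute (assumed encoding of the table above).
EXPECTED = {
    "time": "time",
    "participating_agent": "living",
    "participating_object": "nonliving",
}


def head_of(phrase: str) -> str:
    """Naive head finder: assume the last token of a noun phrase is its head."""
    return phrase.lower().split()[-1]


def type_check(attribute: str, phrase: str) -> bool:
    """Accept a candidate value only if its head has the expected type."""
    head_type = TOY_TYPES.get(head_of(phrase))
    if head_type is None:
        return False  # head unknown to the lexicon: reject
    if attribute == "location":
        return True  # any known noun passes (WN-noun in the table)
    return head_type == EXPECTED[attribute]


def build_frame(task: str, candidates):
    """Keep only the (attribute, phrase) candidates that pass type-checking."""
    frame = {"task": task}
    for attribute, phrase in candidates:
        if type_check(attribute, phrase):
            frame.setdefault(attribute, []).append(phrase)
    return frame


frame = build_frame("paint a wall", [
    ("participating_agent", "the painter"),
    ("participating_agent", "the wall"),      # rejected: head is not living
    ("participating_object", "a paint brush"),
    ("time", "the morning"),
    ("time", "the brush"),                    # rejected: head is not a time
    ("location", "the kitchen"),
])
print(frame)
```

The filter only removes candidates; it never rewrites them, which matches the slide's framing of type checking as a postprocessing step over OpenIE output.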
Why KB organization?
§ Redundancy example: task "paint wall" (participating objects: brush, paint, ...) and task "paint ceiling" (participating objects: paint, roller, ...); sub-tasks: clean the surface, dip the roller, ...
§ Ambiguity example: task "use keyboard" occurs under categories such as iPhone, Android, Mac, Windows, Music listening, Music appreciation, Visuals
Message: KB organization is essential for:
a) handling redundancy by aggregating frames (paint wall, paint ceiling)
b) disambiguation of tasks (use keyboard: piano or computer?)
Approach to KB organization
§ For the 1.2 million frames, the number of clusters is unknown.
§ Hierarchical clustering is natural, but expensive.
§ Expected organization: "use keyboard" and "press keystrokes" end up in the same cluster.
§ We propose a two-stage clustering. Stage 1: coarse-grained clustering. Stage 2: fine-grained clustering.
Preparing for clustering: multi-dimensional similarity model

Attribute     Task frame f1       Task frame f2
Task title    paint a wall        color the room ceiling
Location      house, wall, ...    bedroom, ceiling, ...
...           ...                 ...
Category      home & garden       house decoration

Message: A pair of task frames is dissimilar with an empirical confidence of 99.9% if a combination of their categorical and lexical similarity is below a threshold.
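The pruning rule in the message above can be sketched as below. The slides do not specify the similarity measures, the weights, or the threshold, so everything here is an assumption: token-overlap Jaccard stands in for both the categorical and the lexical similarity, and `w_cat`, `w_lex`, and `threshold` are illustrative values.

```python
def tokens(text: str) -> set:
    """Lowercased whitespace tokens of a string."""
    return set(text.lower().split())


def jaccard(a: set, b: set) -> float:
    """Jaccard overlap of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def combined_similarity(f1: dict, f2: dict, w_cat: float = 0.5, w_lex: float = 0.5) -> float:
    """Weighted mix of categorical and lexical similarity (weights are assumed)."""
    cat = jaccard(tokens(f1["category"]), tokens(f2["category"]))
    lex = jaccard(tokens(f1["title"]), tokens(f2["title"]))
    return w_cat * cat + w_lex * lex


def can_prune(f1: dict, f2: dict, threshold: float = 0.1) -> bool:
    """Treat the pair as dissimilar when the combined score is below the threshold."""
    return combined_similarity(f1, f2) < threshold


# Toy frames; the categories are made up for this illustration.
paint_wall = {"title": "paint a wall", "category": "home and garden"}
paint_ceiling = {"title": "paint a ceiling", "category": "home and garden"}
use_keyboard = {"title": "use keyboard", "category": "computers"}

print(combined_similarity(paint_wall, paint_ceiling))
print(can_prune(paint_wall, use_keyboard))
```

Token overlap cannot see that "paint" and "color" are related, so a pair like the one in the table above would need a distributional measure rather than this stand-in.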
Coarse-grained clustering
§ Example input frames: use keyboard; use mac keyboard; use keyboard; press keystrokes
§ Lexical grouping (efficient hash-based grouping): 1.2 million task frames -> 375K groups
§ Distributional grouping (fewer pairs, efficient top-k similarity): 375K groups -> 200K groups, e.g. use keyboard, press keystrokes
Message: Pruning helps to efficiently reduce the search space.
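A hash-based lexical grouping can be sketched in a few lines. The slides do not say how the hash key is built, so the key used here (the sorted set of non-stopword tokens) and the `STOPWORDS` list are assumptions made for illustration.

```python
from collections import defaultdict

# Assumed stopword list; the actual normalization is not given in the slides.
STOPWORDS = {"a", "an", "the", "your", "to", "how"}


def lexical_key(title: str) -> tuple:
    """Order-insensitive key: sorted unique content words of the title."""
    words = [w for w in title.lower().split() if w not in STOPWORDS]
    return tuple(sorted(set(words)))


def group_by_key(titles):
    """Single pass over all titles; frames with the same key share a bucket."""
    groups = defaultdict(list)
    for title in titles:
        groups[lexical_key(title)].append(title)
    return dict(groups)


titles = ["use keyboard", "Use the keyboard", "use mac keyboard", "press keystrokes"]
groups = group_by_key(titles)
for key, members in groups.items():
    print(key, members)
```

Because each title is hashed once, the pass is linear in the number of frames and no pairwise comparison is needed at this stage. Pairs such as "use keyboard" and "press keystrokes" share no tokens, which is why a distributional grouping step follows.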
Fine-grained clustering
§ Pipeline so far: 1.2 million task frames -> lexical grouping (375K groups) -> distributional grouping (200K groups)
§ The groups allow fast, parallel hierarchical clustering, which yields the final clusters (e.g. use keyboard, press keystrokes)
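Because the coarse groups are independent, each one can be clustered on its own worker. The sketch below is one possible reading, not the authors' algorithm: it uses average-link agglomerative clustering with a token-overlap similarity, and `min_sim` and `workers` are assumed parameters.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def title_similarity(a: str, b: str) -> float:
    """Token-overlap Jaccard, a lexical stand-in for the real similarity model."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def average_link(c1, c2) -> float:
    """Mean pairwise similarity between the members of two clusters."""
    return sum(title_similarity(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))


def cluster_group(titles, min_sim: float = 0.3):
    """Merge the most similar pair of clusters until no pair reaches min_sim."""
    clusters = [[t] for t in titles]
    while len(clusters) > 1:
        (i, j), best = max(
            (((i, j), average_link(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1],
        )
        if best < min_sim:
            break
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters


def cluster_all(groups, workers: int = 4):
    """Cluster every coarse group independently.

    Threads keep the example simple; CPU-bound work at scale would need
    processes or a cluster to gain real parallel speed-up.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(cluster_group, groups))


coarse_groups = [
    ["use keyboard", "use mac keyboard", "press keystrokes"],
    ["paint wall", "paint ceiling"],
]
result = cluster_all(coarse_groups)
print(result)
```

With this lexical stand-in, "press keystrokes" stays separate from "use keyboard"; merging them, as on the slide, requires a distributional similarity in place of `title_similarity`.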
Recap of system architecture: WikiHow -> frame construction -> frame organization -> HowToKB
Resulting HowToKB
§ 0.5 million grouped task frames
§ Average per frame: 12 attribute values, 2 images
§ Precision > 85% (Wilson confidence intervals)
§ As ground truth, turkers fill in "very likely" attribute values for 150 frames
§ Example: In some context such as decorate the house, the most likely location when we paint a wall is ____
Message: HowToKB maintains high precision at large scale.
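For reference, the Wilson score interval for a proportion can be computed as below. The formula is the standard one; the counts in the usage example (130 correct out of 150) are hypothetical and are not results from the slides.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Hypothetical sample: 130 of 150 assessed values judged correct.
low, high = wilson_interval(130, 150)
print(f"precision in [{low:.3f}, {high:.3f}]")
```

Unlike the simple normal approximation, the interval stays inside [0, 1] and remains usable when the observed precision is close to 1.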
Use case: finding YouTube videos for a HowTo task
§ Task query: make caramel corn
§ Expansion using frames (attributes, edges): brown sugar, popcorn, syrup, teaspoon, bake soda, vanilla, ...
§ YouTube video: Gourmet Caramel Popcorn "Thanks Monique"
§ HowToKB-based expansion beats the strong baselines (Word2Vec, WordNet)
§ 50% of HowToKB's context is unique, going beyond distributional context
§ For hard ambiguous queries, HowToKB has 10% precision; baselines achieve < 1%
Message: HowToKB provides rich context, beyond relatedness.
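Frame-based query expansion can be sketched as follows. The toy knowledge base, the token-overlap scoring, and the second video title are assumptions for illustration; the slides do not describe the actual retrieval model.

```python
def expand_query(query: str, kb: dict, attributes=("participating_object",)):
    """Append tokens from the matching frame's attribute values to the query."""
    terms = query.lower().split()
    frame = kb.get(query.lower(), {})
    for attribute in attributes:
        for value in frame.get(attribute, []):
            for token in value.lower().split():
                if token not in terms:
                    terms.append(token)
    return terms


def score(video_title: str, terms) -> int:
    """Number of query terms that occur in the video title."""
    return len(set(video_title.lower().split()) & set(terms))


def rank(videos, terms):
    """Sort titles by descending score; ties keep their input order."""
    return sorted(videos, key=lambda v: score(v, terms), reverse=True)


# Toy frame built from the expansion terms listed on the slide.
toy_kb = {
    "make caramel corn": {
        "participating_object": ["brown sugar", "popcorn", "syrup", "vanilla"],
    }
}
videos = ["Corn Maze Tour", "Gourmet Caramel Popcorn"]  # first title is made up

query = "make caramel corn"
plain_terms = query.split()
expanded_terms = expand_query(query, toy_kb)
print(rank(videos, plain_terms))
print(rank(videos, expanded_terms))
```

With the plain query both titles match one term each, so the ranking cannot separate them; the expanded query adds "popcorn", which lifts the relevant video to the top.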
Conclusion
§ WikiHow provides a very rich starting point for the extraction of task frames
§ Knowledge organization is performed by our fast clustering method
§ The resulting HowToKB is the first KB on HowTo tasks and is publicly available: http://www.mpi-inf.mpg.de/yago-naga/webchild/HowToKB

Example frame
Task: paint a wall
Category: painting and other finishes, ceilings, interior walls
Participating object: wall, ceiling, stair, paint, latex paint for base coat, masking tape, cotton rags
Parent task: touch up paint, paint ceiling
Sub task: move furniture, choose color, sand the wall, fill any hole

Message: HowToKB, with its rich structure, fills an important knowledge gap.
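The example frame above could be held in a small record type like the one below. The class and field names are hypothetical and do not reflect the published HowToKB file format; the values are copied from the slide.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TaskFrame:
    """Hypothetical container for one task frame (attributes plus edges)."""
    task: str
    category: List[str] = field(default_factory=list)
    participating_object: List[str] = field(default_factory=list)
    parent_task: List[str] = field(default_factory=list)  # edge to broader tasks
    sub_task: List[str] = field(default_factory=list)     # edge to finer steps


paint_wall = TaskFrame(
    task="paint a wall",
    category=["painting and other finishes", "ceilings", "interior walls"],
    participating_object=[
        "wall", "ceiling", "stair", "paint",
        "latex paint for base coat", "masking tape", "cotton rags",
    ],
    parent_task=["touch up paint", "paint ceiling"],
    sub_task=["move furniture", "choose color", "sand the wall", "fill any hole"],
)

print(paint_wall.task, "has", len(paint_wall.sub_task), "sub-tasks")
```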