
Let us build a platform for structure extraction and matching that...

Sunita Sarawagi, IIT Bombay, http://www.cse.iitb.ac.in/~sunita

Knows when it failed

- Attaches error-detection logic to every extraction module
- Two types of errors
  - Precision errors: easier to detect
  - Recall errors: much harder
    - Reference databases
    - Alternative models
    - Human feedback
    - A research challenge
- Represents errors and exposes them to users
  - Imprecise data models for the results of extraction and deduplication: another research challenge
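One way to picture "attaching error-detection logic" to an extractor is a checker that compares its output against a reference database. The sketch below is illustrative only: the function `check_against_reference`, the `CheckedExtraction` record, and the example data are hypothetical, and it covers just the reference-database signal from the list above. Note that it can only catch recall errors for entities the reference database already knows, which is one reason recall errors are the harder case.

```python
# Hypothetical sketch: wrapping an extractor's output with reference-database checks.
from dataclasses import dataclass, field


@dataclass
class CheckedExtraction:
    """Extraction results plus the errors the checker suspects."""
    extracted: set[str]
    suspected_precision_errors: set[str] = field(default_factory=set)
    suspected_recall_errors: set[str] = field(default_factory=set)


def check_against_reference(text: str, extracted: set[str],
                            reference: set[str]) -> CheckedExtraction:
    """Flag extracted values absent from the reference set (possible
    precision errors) and reference values that occur verbatim in the
    text but were not extracted (possible recall errors)."""
    result = CheckedExtraction(extracted=set(extracted))
    # Precision check: anything extracted that the reference DB does not know.
    result.suspected_precision_errors = {v for v in extracted if v not in reference}
    # Recall check: only catches misses of entities already in the reference DB.
    lowered = text.lower()
    result.suspected_recall_errors = {
        r for r in reference if r.lower() in lowered and r not in extracted
    }
    return result


if __name__ == "__main__":
    refs = {"IIT Bombay", "IIT Delhi", "Stanford University"}
    text = "Talks were given at IIT Bombay and Stanford University."
    out = check_against_reference(text, {"IIT Bombay", "Bombay Talks"}, refs)
    print(sorted(out.suspected_precision_errors))
    print(sorted(out.suspected_recall_errors))
```

Here "Bombay Talks" is flagged as a suspected precision error and the missed "Stanford University" as a suspected recall error; in a real system these flags would be surfaced to users rather than silently dropped.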

Seamlessly integrates rules, humans and statistics

- Existing systems are partitioned on
  - Rule-based vs. statistical
  - Manual vs. learning-based
- Smooth co-existence of all combinations is a must, given the varying difficulty of tasks and sophistication of users

Treats models as first-class objects

- Tens of thousands of schema elements
  - Cannot afford a separate extraction and matching model for each
- How to share models across different
  - levels of hierarchies,
  - natural languages,
  - formatting languages,
  - versions along time?
- How quickly can we interactively adapt to new domains, starting from existing libraries of models?

Is selectively lazy

- Cannot run away from the hard tasks
- The only way to attack the long tail of missed extractions is via expensive resources
- Explicitly represent increasing levels of cost and payoff, and do cost-sensitive processing
  - Selective linguistic processing:
    - POS tagging
    - Chunking
    - Dependency parsing
    - Full parsing
  - Database lookups:
    - No lookups
    - Boolean matches
    - TF-IDF matches
    - Edit distance
    - Web searches
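The "database lookups" ladder suggests a cascade: try the cheapest matcher first and escalate only when it is not confident. The sketch below is a minimal illustration under stated assumptions, not the platform's actual design. It implements three of the lookup levels (Boolean, TF-IDF, edit distance) and leaves out web searches; the stage costs and acceptance thresholds in the hypothetical `STAGES` table are made-up numbers chosen only to show the mechanism.

```python
import math
from collections import Counter


def tokens(s: str) -> list[str]:
    return s.lower().split()


def boolean_match(query: str, candidate: str, idf: dict) -> float:
    """Cheapest level: exact token-sequence match after case folding."""
    return 1.0 if tokens(query) == tokens(candidate) else 0.0


def tfidf_cosine(query: str, candidate: str, idf: dict) -> float:
    """Cosine similarity of TF-IDF-weighted token vectors (order-insensitive)."""
    def vec(text):
        return {t: n * idf.get(t, 1.0) for t, n in Counter(tokens(text)).items()}
    vq, vc = vec(query), vec(candidate)
    dot = sum(w * vc.get(t, 0.0) for t, w in vq.items())
    norm = math.sqrt(sum(w * w for w in vq.values())) * math.sqrt(sum(w * w for w in vc.values()))
    return dot / norm if norm else 0.0


def edit_similarity(query: str, candidate: str, idf: dict) -> float:
    """1 - Levenshtein distance / length of the longer string; tolerates typos."""
    a, b = query.lower(), candidate.lower()
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return 1.0 - prev[-1] / (max(len(a), len(b)) or 1)


# Hypothetical (name, scorer, cost per candidate, acceptance threshold), cheapest first.
STAGES = [
    ("boolean", boolean_match, 1, 1.0),
    ("tfidf", tfidf_cosine, 5, 0.8),
    ("edit_distance", edit_similarity, 20, 0.75),
]


def build_idf(corpus: list[str]) -> dict:
    """Smoothed inverse document frequency over the reference records."""
    df = Counter(t for doc in corpus for t in set(tokens(doc)))
    n = len(corpus)
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}


def cascade_match(query: str, candidates: list[str], idf: dict):
    """Escalate through stages in order of cost; stop at the first stage whose
    best score clears its threshold. Returns (stage, match, total cost spent)."""
    spent = 0
    for name, scorer, cost, threshold in STAGES:
        spent += cost * len(candidates)
        best = max(candidates, key=lambda c: scorer(query, c, idf))
        if scorer(query, best, idf) >= threshold:
            return name, best, spent
    return None, None, spent


if __name__ == "__main__":
    refs = ["Indian Institute of Technology Bombay", "IIT Delhi", "Stanford University"]
    idf = build_idf(refs)
    for q in ["stanford university", "University Stanford", "Stanfrod Univeristy", "MIT"]:
        print(q, "->", cascade_match(q, refs, idf))
```

An exact query is resolved by the Boolean stage, a reordered one needs TF-IDF, and a misspelled one pays for edit distance; only queries that fail every stage would be candidates for the most expensive level, such as a web search.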
Supports multi-spectrum queries

"Knowledge [Schema] should be like a pocket watch, surfaced only when needed; not like a wrist watch, always flaunted." (A Bengali saying)

- Fully schema-aware: SQL, XML, …
- Schema-less: keyword queries
- Common-sense schema-aware
  - User understands is-a, part-of, and properties
  - Use world knowledge (ontologies, word-nets, etc.) to map both schema and content elements in the query
  - Can use limited rounds of user interaction
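To make "common-sense schema-aware" queries concrete, the sketch below maps keywords either to schema elements or to plain content, using a tiny hand-written is-a table as a stand-in for real world knowledge such as an ontology or word-net. Everything here is hypothetical: the `IS_A`, `ATTRIBUTE_SYNONYMS`, and `SCHEMA` tables and the `map_query` function are invented for illustration, not taken from the talk.

```python
# Hypothetical toy ontology standing in for ontology / word-net knowledge.
IS_A = {
    "university": "organization",
    "institute": "organization",
    "organization": "entity",
    "professor": "researcher",
    "researcher": "person",
    "person": "entity",
}
# Hypothetical synonyms mapping query words to schema attribute names.
ATTRIBUTE_SYNONYMS = {"location": "city", "city": "city", "employer": "affiliation", "name": "name"}
# Hypothetical schema: table name -> set of attributes.
SCHEMA = {
    "person": {"name", "affiliation"},
    "organization": {"name", "city"},
}


def ancestors(term):
    """Yield the term and its is-a ancestors, nearest first (cycle-safe)."""
    seen = set()
    while term is not None and term not in seen:
        seen.add(term)
        yield term
        term = IS_A.get(term)


def map_query(query: str) -> dict:
    """Map each keyword to a schema table (via is-a), an attribute, or content."""
    mapping = {}
    for word in query.lower().split():
        # Nearest is-a ancestor that is also a schema table, if any.
        table = next((a for a in ancestors(word) if a in SCHEMA), None)
        if table is not None:
            mapping[word] = ("table", table)
            continue
        attr = ATTRIBUTE_SYNONYMS.get(word)
        if attr is not None:
            # Several owning tables means the mapping is ambiguous.
            tables = sorted(t for t, attrs in SCHEMA.items() if attr in attrs)
            mapping[word] = ("attribute", attr, tables)
        else:
            mapping[word] = ("content", word)
    return mapping


if __name__ == "__main__":
    print(map_query("professor employer Mumbai"))
    print(map_query("university location"))
    print(map_query("name"))
```

The user never writes schema names: "professor" reaches the `person` table through two is-a steps, while "Mumbai" stays a content keyword. A word like "name" that maps to attributes of several tables is exactly where a limited round of user interaction could resolve the ambiguity.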