Semi-Automatic Semantic Annotation for Hidden-Web Tables
Cui Tao & David W. Embley
Data Extraction Research Group
Department of Computer Science
Brigham Young University
Supported by NSF
Semantic Annotation
§ The Hidden Web:
  § Hidden behind forms
  § Hard to query
Query: "cdk-4"
www.deg.byu.edu
Semantic Annotation
§ The Hidden Web:
  § Hidden behind forms
  § Hard to query
Example: find the protein and amino-acid information for gene "cdk-4"
Semantic Annotation
§ The Hidden Web:
  § Hidden behind forms
  § Hard to query
§ Semantic annotation
  § Machine-"understandable"
  § Publicly accessible
System Overview
§ Initial semantic annotation
  § Manually annotate a sample page
  § With respect to a selected ontology
§ Table interpretation
  § Automatic
  § Tables from hidden web pages
§ Final semantic annotation
  § Automatic
  § Annotate interpreted tables
Initial Semantic Annotation
§ SMORE: Semantic Markup, Ontology and RDF Editor [Maryland Information and Network Dynamics Lab]
Table Interpretation
§ Table interpretation
  § Locate label and value
  § Pair label-value pairs
  § Remember path
§ TISP – Table Interpretation by Sibling Pages
TISP
Interpretation Technique: Sibling Page Comparison
§ Same
§ Almost Same
§ Different, Same
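The comparison of sibling pages can be illustrated with a minimal sketch. It assumes each table has already been reduced to a list of rows of cell strings, and it assumes a simple rule: a cell whose text is identical in both pages is a label, and a cell whose text differs is a value. The function name `compare_siblings`, the `low` threshold, and the toy data are all hypothetical; they are not taken from TISP itself.

```python
from itertools import zip_longest

# Toy sibling tables (made-up data): same layout, different values.
PAGE_A = [["Gene name", "gene-A"], ["Length", "100"]]
PAGE_B = [["Gene name", "gene-B"], ["Length", "250"]]

def compare_siblings(table_a, table_b, low=0.2):
    """Compare two tables cell by cell.

    Returns (verdict, cells), where verdict is "same", "almost same",
    or "different", and cells lists (text, kind) for every position.
    The `low` threshold is an assumed value chosen for illustration.
    """
    cells = []
    same = total = 0
    for row_a, row_b in zip_longest(table_a, table_b, fillvalue=()):
        for a, b in zip_longest(row_a, row_b, fillvalue=None):
            total += 1
            if a is None or b is None:
                kind = "unmatched"   # cell exists in only one table
            elif a == b:
                kind = "label"       # identical text across siblings
                same += 1
            else:
                kind = "value"       # text varies across siblings
            cells.append((a if a is not None else b, kind))
    ratio = same / total if total else 0.0
    if ratio == 1.0:
        verdict = "same"
    elif ratio >= low:
        verdict = "almost same"
    else:
        verdict = "different"
    return verdict, cells

verdict, cells = compare_siblings(PAGE_A, PAGE_B)
```

Under this rule a value that happens to be identical on both pages would be mistaken for a label, so a practical version would likely compare more than two sibling pages.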
Interpretation Technique: Sibling Page Comparison
Structure Pattern of a Table
§ Label Path = Identification.Gene model(s).Gene Model
§ XPath = html[1]/…/table[3]/tr[1]/td[2]/table[1]/tr[6]/td[2]/table[1]/tr[2]/td[1]
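A structure pattern pairs a label path with the location of its value, so the same pattern can be replayed on any sibling page. The sketch below assumes well-formed XHTML input and uses the limited XPath subset of Python's `xml.etree.ElementTree`. The `StructurePattern` class, the `make_page` helper, the toy page layout, and the short path are hypothetical stand-ins, not the path shown on the slide.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class StructurePattern:
    label_path: str  # dotted chain of labels leading to the value
    xpath: str       # location of the value cell, relative to <html>

def make_page(value):
    """Build a toy sibling page (hypothetical layout) holding one value."""
    return ("<html><body><table><tr><td>Identification</td><td>"
            "<table><tr><td>Gene Model</td></tr>"
            f"<tr><td>{value}</td></tr></table>"
            "</td></tr></table></body></html>")

def extract(page_source, pattern):
    """Apply a stored pattern to a page; return (label_path, value) or None."""
    root = ET.fromstring(page_source)   # assumes well-formed markup
    cell = root.find(pattern.xpath)     # positional predicates are supported
    if cell is None:
        return None
    return pattern.label_path, (cell.text or "").strip()

PATTERN = StructurePattern(
    label_path="Identification.Gene Model",
    xpath="body/table[1]/tr[1]/td[2]/table[1]/tr[2]/td[1]",
)

# The pattern learned once is reused on every sibling page.
results = [extract(make_page(v), PATTERN) for v in ("model-A", "model-B")]
```

Real hidden-web pages are rarely well-formed XML, so a tolerant HTML parser would be needed in practice.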
Annotation
Protein Name Protein Name
Annotation – Split
Nucleotide Size Nucleotide Size
Annotation – Merge
Protein Information
Annotation – Union
Name
Annotation – Selection
Molecular Function
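The slides name split, merge, union, and selection without spelling them out. One possible reading, offered only as an assumption, is that they describe how table values are reshaped before being attached to ontology concepts. The helper names, separators, and toy strings below are hypothetical.

```python
import re

def split_value(text, pattern=r"\s*;\s*"):
    """Split: one table cell is broken into several annotated values."""
    return [part for part in re.split(pattern, text.strip()) if part]

def merge_values(parts, sep=" "):
    """Merge: several table cells are joined into one annotated value."""
    return sep.join(p.strip() for p in parts if p.strip())

def union_values(*columns):
    """Union: values found under several labels feed the same concept."""
    seen, out = set(), []
    for column in columns:
        for value in column:
            if value not in seen:   # keep first occurrence, preserve order
                seen.add(value)
                out.append(value)
    return out

def select_values(values, keep):
    """Selection: only values accepted by `keep` are annotated."""
    return [v for v in values if keep(v)]

# Toy inputs (made-up strings, not real gene data).
split_demo = split_value("a; b ;c")
merge_demo = merge_values(["x ", " y"])
union_demo = union_values(["n1", "n2"], ["n2", "n3"])
select_demo = select_values(["mf:a", "bp:b"], lambda v: v.startswith("mf:"))
```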
Generated RDF Annotation
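As a rough illustration of what generated RDF might look like, the sketch below serializes interpreted label-value pairs as N-Triples lines. The namespace `http://example.org/annotation#`, the subject identifier, and the rule for turning a label into a predicate name are all assumptions made for the example; the RDF the system actually generates is not shown in the slide text.

```python
from urllib.parse import quote

BASE = "http://example.org/annotation#"  # hypothetical namespace

def to_ntriples(subject_id, pairs, base=BASE):
    """Serialize (label, value) pairs as N-Triples, one line per pair."""
    lines = []
    for label, value in pairs:
        # Assumed naming rule: spaces become underscores, the rest is escaped.
        predicate = quote(label.strip().replace(" ", "_"), safe="")
        # Escape backslashes and double quotes inside the literal.
        literal = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(
            f'<{base}{quote(subject_id, safe="")}> <{base}{predicate}> "{literal}" .'
        )
    return "\n".join(lines)

# Placeholder value, not real protein data.
rdf = to_ntriples("cdk-4", [("Protein Name", "PROTEIN-A")])
```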
Querying Annotated Data
Example: find the protein and amino-acid information for gene "cdk-4"
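Once pages are annotated, the example query reduces to matching triple patterns. The sketch below stands in for a real RDF query engine with a small in-memory matcher. The predicate names "Gene Name" and "Amino Acids", the page identifiers, and all values are hypothetical placeholders.

```python
# Toy annotated data as (subject, predicate, object) triples.
TRIPLES = [
    ("page1", "Gene Name", "cdk-4"),
    ("page1", "Protein Name", "PROTEIN-A"),
    ("page1", "Amino Acids", "AA-INFO-A"),
    ("page2", "Gene Name", "other-gene"),
    ("page2", "Protein Name", "PROTEIN-B"),
]

def match(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

def describe_gene(triples, gene_name, wanted):
    """Find subjects annotated with the gene, then collect wanted properties."""
    subjects = [s for s, _, _ in match(triples, p="Gene Name", o=gene_name)]
    return {p: o
            for s in subjects
            for _, p, o in match(triples, s=s)
            if p in wanted}

answer = describe_gene(TRIPLES, "cdk-4", {"Protein Name", "Amino Acids"})
```

The two-step lookup mirrors a join: first locate the annotated page for the gene, then read the requested properties from that same subject.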
Summary
§ Semi-automatic semantic annotation for hidden-web tables
§ Facilitates large-scale annotation of the Web