An Ontology-based Automatic Semantic Annotation Approach for Patent Document Retrieval in Product Innovation Design
Feng Wang, Lanfen Lin, Zhou Yang
College of Computer Science, Zhejiang University
Energy Procedia 13 (2011) 790-800
Outline
- Background
- Semantic representation
- Automatic semantic annotation
Background: Product innovation design
- Brainstorming
- Literature review
- Build on existing inventions to develop novel technologies/products
Background: Literature review
- Refer to existing, relevant patent documents
- Professional patent retrieval tools:
  - PatentCafe: keywords and Boolean operator search
  - BioPatentMiner: keywords and query linkage
  - SooPAT: keywords and International Patent Classification (IPC) codes
  - Etc.
- Semantic representation is rarely used by these tools
Background: [Figure labels: Structure; Domain; Content (concepts); Syntax (relations)]
Semantic Representation
Semantic Representation: [Figure: Patent document]
Semantic Representation: Patent ontology
- Patent structure ontology
  - Specific document structure
  - First page, description, claims, figures (media), etc.
- Patent content ontology
  - Basic parameter ontology
  - Technical feature ontology
    - Common ontology: syntax (relations)
    - Domain ontology: domain-specific terms (concepts)
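The multi-level representation above can be pictured as a small data model. The sketch below is a minimal Python illustration, assuming that a technical feature is stored as a concept-relation-concept triple (concepts drawn from the domain ontology, relations from the common ontology). The class names `StructureNode`, `BasicParameters`, `TechnicalFeature`, and `PatentAnnotation`, their fields, and the sample data are hypothetical, not the authors' schema. It needs Python 3.9+ for the built-in generic types.

```python
from dataclasses import dataclass, field

# Hypothetical classes sketching the multi-level patent ontology.
# Names and fields are illustrative assumptions, not the authors' schema.


@dataclass
class StructureNode:
    """Patent structure ontology: one part of the document."""
    part: str       # e.g. "first page", "description", "claims", "figures"
    text: str = ""


@dataclass
class BasicParameters:
    """Basic parameter ontology (an assumed subset of bibliographic fields)."""
    title: str
    ipc_codes: list[str] = field(default_factory=list)


@dataclass(frozen=True)
class TechnicalFeature:
    """Technical feature ontology: concept-relation-concept.

    subject/obj are terms from a domain ontology; relation comes from
    the common ontology (the syntax level).
    """
    subject: str
    relation: str
    obj: str


@dataclass
class PatentAnnotation:
    structure: list[StructureNode]
    parameters: BasicParameters
    features: list[TechnicalFeature] = field(default_factory=list)

    def concepts(self) -> set[str]:
        """Collect every domain concept used in the technical features."""
        return {term for f in self.features for term in (f.subject, f.obj)}


# Usage with made-up data.
doc = PatentAnnotation(
    structure=[StructureNode("claims", "A pump comprising an impeller.")],
    parameters=BasicParameters("Centrifugal pump", ["F04D"]),
    features=[TechnicalFeature("impeller", "is a component of", "pump")],
)
print(sorted(doc.concepts()))  # ['impeller', 'pump']
```

Keeping structure, parameters, and features as separate fields mirrors the split between the structure ontology and the content ontology; an ontology-based system would typically express the same levels in an ontology language rather than in Python classes.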
Semantic Representation: [Figure: Patent document semantics]
Automatic Semantic Annotation
- Structure level
  - Pre-defined template schemes
  - Regular-expression-based parsing
  - Logical structure: basic parameters, description, claims, etc.
  - Physical structure: headings, sections, subsections, body, etc.
- Content level
- Technical feature level
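Structure-level annotation combines pre-defined templates with regular-expression parsing. A minimal sketch, assuming a plain-text patent whose logical sections start with upper-case headings on their own line: the `TEMPLATE` headings, the helper names `annotate_structure` and `basic_parameters`, and treating the text before the first heading as the first page are all illustrative assumptions. The IPC regex only follows the general shape of an IPC symbol (section letter A-H, two-digit class, subclass letter, group/subgroup).

```python
import re

# Hypothetical template scheme: logical section -> heading regex.
# Real patent layouts vary by office and year; these headings are assumptions.
TEMPLATE = {
    "abstract": r"ABSTRACT",
    "description": r"(?:DETAILED )?DESCRIPTION(?: OF THE INVENTION)?",
    "claims": r"(?:WHAT IS CLAIMED IS:|CLAIMS)",
}

# One pattern: a heading occupies a whole line; each alternative is a named group.
HEADING = re.compile(
    r"^(?:" + "|".join(f"(?P<{name}>{rx})" for name, rx in TEMPLATE.items()) + r")[ \t]*$",
    re.MULTILINE,
)

# IPC symbol such as "F04D 29/22" (section, class, subclass, group/subgroup).
IPC = re.compile(r"\b[A-H]\d{2}[A-Z] ?\d{1,4}/\d{2,6}\b")


def annotate_structure(text: str) -> dict[str, str]:
    """Split text into logical sections; text before the first heading is the first page."""
    matches = list(HEADING.finditer(text))
    if not matches:
        return {"first_page": text.strip()}
    sections = {"first_page": text[: matches[0].start()].strip()}
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections[m.lastgroup] = text[m.end():end].strip()
    return sections


def basic_parameters(first_page: str) -> dict[str, list[str]]:
    """Pull basic parameters (here only IPC codes) from the first page."""
    return {"ipc": IPC.findall(first_page)}


sample = """Int. Cl. F04D 29/22
ABSTRACT
A centrifugal pump with an improved impeller.
DETAILED DESCRIPTION
The impeller is mounted on the shaft.
CLAIMS
1. A pump comprising an impeller.
"""
sections = annotate_structure(sample)
print(sections["claims"])                        # 1. A pump comprising an impeller.
print(basic_parameters(sections["first_page"]))  # {'ipc': ['F04D 29/22']}
```

Using one named group per logical section lets `Match.lastgroup` report which template entry fired, so adding a section to the template needs no extra parsing code.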
Automatic Semantic Annotation
- Structure level
- Content level
  - Content parsing and filtering: filter out irrelevant sentences
  - POS tagging: identify nouns, verbs, etc.
  - Entity identification: pick out nouns and verbs
  - Semantic recognition
    - Concept recognition: select nouns using the domain ontology (via IPC codes)
    - Relation recognition: use the common ontology
- Technical feature level
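The content-level steps can be chained as a small pipeline. This is a toy sketch, not the authors' implementation: `POS_LEXICON` is a hypothetical lookup table standing in for a real POS tagger, `DOMAIN_ONTOLOGY` assumes the domain ontology is selected by IPC subclass, `COMMON_ONTOLOGY` and the `IRRELEVANT` sentence filter contain made-up entries, and relation recognition simply matches the words between two adjacent concepts against common-ontology phrases.

```python
import re

# Hypothetical stand-ins: a real pipeline would use a trained POS tagger and
# full ontologies. These toy tables only illustrate the data flow.
POS_LEXICON = {
    "impeller": "NN", "shaft": "NN", "pump": "NN", "noise": "NN",
    "is": "VB", "mounted": "VB", "drives": "VB", "reduce": "VB",
}
# Domain ontology selected via an IPC subclass (assumed mapping).
DOMAIN_ONTOLOGY = {"F04D": {"impeller", "shaft", "pump"}}
# Common ontology: surface phrase -> normalized relation (assumed entries).
COMMON_ONTOLOGY = {"is mounted on": "attached_to", "drives": "drives"}
# Assumed filter for boilerplate sentences that carry no technical content.
IRRELEVANT = re.compile(r"^(?:it is an object|other objects|see also)", re.IGNORECASE)
DETERMINERS = {"the", "a", "an"}


def annotate_content(text: str, ipc_subclass: str) -> list[tuple[str, str, str]]:
    domain = DOMAIN_ONTOLOGY.get(ipc_subclass, set())
    triples = []
    for sentence in re.split(r"(?<=[.;])\s+", text.strip()):
        # 1. Content parsing and filtering.
        if not sentence or IRRELEVANT.match(sentence):
            continue
        # 2. POS tagging (lexicon lookup; unknown words tagged "X").
        tokens = re.findall(r"[a-z]+", sentence.lower())
        tagged = [(tok, POS_LEXICON.get(tok, "X")) for tok in tokens]
        # 3-4. Entity identification + concept recognition:
        #      keep nouns that the domain ontology knows.
        concept_pos = [i for i, (tok, tag) in enumerate(tagged)
                       if tag == "NN" and tok in domain]
        # 5. Relation recognition: the words between two adjacent concepts
        #    must contain a verb and match a common-ontology phrase.
        for i, j in zip(concept_pos, concept_pos[1:]):
            between = [tok for tok in tokens[i + 1:j] if tok not in DETERMINERS]
            phrase = " ".join(between)
            has_verb = any(POS_LEXICON.get(tok) == "VB" for tok in between)
            if has_verb and phrase in COMMON_ONTOLOGY:
                triples.append((tokens[i], COMMON_ONTOLOGY[phrase], tokens[j]))
    return triples


text = ("It is an object of the invention to reduce noise. "
        "The impeller is mounted on the shaft. The shaft drives the impeller.")
print(annotate_content(text, "F04D"))
# [('impeller', 'attached_to', 'shaft'), ('shaft', 'drives', 'impeller')]
```

Because concepts are restricted to the ontology chosen by the IPC subclass, the same text annotated under an unrelated subclass yields no triples, which is one way the domain ontology narrows what counts as a concept.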
Automatic Semantic Annotation
- Structure level
- Content level
- Technical feature level
  - Regular expression method + active learning
  - Pattern form: prefixTerms + pattern + postfixTerms
    - e.g., A "associated with" B
  - Given prefixTerms/postfixTerms, learn new patterns
    - e.g., A "is a component of" B
  - Heuristic rules remove unreasonable patterns
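The pattern-learning loop for technical features can be sketched as follows. The slide names active learning; this sketch swaps in plain bootstrapping (no human confirms candidates), treats prefix and postfix terms as single words, and uses two assumed heuristic rules (letters only, at most `MAX_WORDS` words). All function names and sample sentences are hypothetical.

```python
import re

MAX_WORDS = 4  # assumed heuristic: longer middles are unlikely to be relations
DET = r"(?:the\s+|an?\s+)?"  # optional determiner before the postfix term


def extract_pairs(sentences, pattern):
    """Find (prefix term, postfix term) pairs around a known pattern."""
    rx = re.compile(r"\b(\w+)\s+" + re.escape(pattern) + r"\s+" + DET + r"(\w+)\b")
    return {m.groups() for s in sentences for m in rx.finditer(s.lower())}


def learn_patterns(sentences, prefix, postfix):
    """Given a term pair, take the text between the terms as a candidate pattern."""
    rx = re.compile(r"\b" + re.escape(prefix) + r"\s+(.+?)\s+" + DET
                    + re.escape(postfix) + r"\b")
    return {m.group(1) for s in sentences for m in rx.finditer(s.lower())}


def is_reasonable(pattern):
    """Heuristic filter: short, letters-only candidates survive."""
    return (re.fullmatch(r"[a-z]+(?: [a-z]+)*", pattern) is not None
            and len(pattern.split()) <= MAX_WORDS)


def bootstrap(sentences, seed_patterns, max_rounds=3):
    """Alternate: patterns -> term pairs -> new patterns, until nothing new."""
    patterns, pairs = set(seed_patterns), set()
    for _ in range(max_rounds):
        for p in patterns:
            pairs |= extract_pairs(sentences, p)
        candidates = set()
        for prefix, postfix in pairs:
            candidates |= learn_patterns(sentences, prefix, postfix)
        new = {c for c in candidates if is_reasonable(c)} - patterns
        if not new:
            break
        patterns |= new
    return patterns, pairs


sentences = [
    "A pressure sensor associated with the valve measures flow.",
    "The sensor is a component of the valve.",
    "The rotor is a component of the motor.",
    "The sensor which was described in figure 3 reads the valve.",
]
patterns, pairs = bootstrap(sentences, {"associated with"})
print(sorted(patterns))  # ['associated with', 'is a component of']
print(sorted(pairs))     # [('rotor', 'motor'), ('sensor', 'valve')]
```

Here the seed "associated with" yields the pair (sensor, valve); that pair exposes the new pattern "is a component of", which in turn finds (rotor, motor). The long middle phrase in the last sentence is discarded by the heuristic filter.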
Discussion
- Multi-level semantic representation
- Allows precise information extraction
- Leverages NLP and domain knowledge
- Document-type dependent
- Difficult to generalize