Mapping the NCI Thesaurus and the Collaborative Inter-Lingual Index
Amanda Hicks, University of Florida, aehicks@ufl.edu
HealthInsight Workshop, Oslo, Norway, 20 May 2016
Overview
• Ontologies versus wordnets
• Inter-Lingual Index
• Future work to map the National Cancer Institute Thesaurus to the Collaborative Inter-Lingual Index
In collaboration with
• Francis Bond, Nanyang Technological University, Singapore
• Selja Seppälä, University of Florida, USA
ONTOLOGIES VERSUS WORDNETS
Semantic Networks
• Words, concepts, or classes arranged in a network
• Provide a framework for machine-readable meaning and context for what would otherwise be uninterpreted syntax
• Simultaneously logical objects and mathematical objects, subject to inference and graph-theoretical analysis
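To make "subject to inference" concrete, here is a minimal sketch of a semantic network stored as a Python dictionary of is-a edges, where subsumption is inferred by following edges transitively. The class names loosely echo the NCIt patella example later in the deck, but the hierarchy above them is invented for illustration, and `IS_A`, `ancestors` and `is_a` are hypothetical names, not part of any tool used in this project.

```python
# Hypothetical toy network: each class maps to its direct is-a parents.
# Loosely modeled on the NCIt "patella" example; the upper levels are invented.
IS_A = {
    "patella": ["short bone", "bone of the lower extremity"],
    "short bone": ["bone"],
    "bone of the lower extremity": ["bone"],
    "bone": ["anatomical structure"],
    "anatomical structure": [],
}


def ancestors(node):
    """Return every class reachable from `node` via is-a edges (transitive closure)."""
    seen = set()
    stack = list(IS_A.get(node, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(IS_A.get(parent, []))
    return seen


def is_a(child, parent):
    """Infer subsumption: child is-a parent if parent is among child's ancestors."""
    return parent in ancestors(child)


print(sorted(ancestors("patella")))
# ['anatomical structure', 'bone', 'bone of the lower extremity', 'short bone']
print(is_a("patella", "anatomical structure"))  # True
```

Because the patella has two parents here, the structure is a network (a directed acyclic graph) rather than a tree, which is why the traversal tracks visited nodes.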
Wordnets versus Ontologies
Wordnets are semantic networks that represent how we use language. Meanings are stated in natural-language definitions and relatively sparse semantic relations.
• The word ‘cat’ in context
Ontologies are semantic networks that represent properties of things in the world. Meanings are encoded in logical form.
• What it is to be a cat
Comparative Strengths
Wordnets
• NLP applications that require word sense discrimination
• Cross-lingual comparison of lexical categories
• Distance and relatedness measures between concepts
Ontologies
• Provide a coherent, stable and unified frame of reference for interpreting concepts and specifying classes
• May support interoperability of data sets
• Support deductive reasoning over structured data
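As an illustration of a wordnet-style distance measure, the sketch below computes path similarity, 1 / (1 + number of edges between two concepts), the same idea behind NLTK's `path_similarity` for WordNet synsets. It runs on a hypothetical toy hypernym tree rather than real WordNet data, and assumes each concept has a single parent so that the first shared node on the way up is the lowest common hypernym.

```python
# Hypothetical toy hypernym tree (child -> single parent); not real WordNet data.
HYPERNYM = {
    "patella": "sesamoid bone",
    "sesamoid bone": "bone",
    "femur": "long bone",
    "long bone": "bone",
    "bone": "body part",
    "knee": "joint",
    "joint": "body part",
}


def hypernym_path(node):
    """Walk from `node` up to the root, e.g. patella -> sesamoid bone -> bone -> body part."""
    path = [node]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path


def path_similarity(a, b):
    """1 / (1 + edges between a and b via their lowest common hypernym)."""
    path_a, path_b = hypernym_path(a), hypernym_path(b)
    shared = [n for n in path_a if n in path_b]
    if not shared:
        return None  # no common ancestor: similarity undefined
    lowest = shared[0]  # in a tree, the first shared node walking up is the lowest one
    distance = path_a.index(lowest) + path_b.index(lowest)
    return 1 / (1 + distance)


print(path_similarity("patella", "femur"))  # 0.2 (4 edges via "bone")
print(path_similarity("patella", "knee"))   # about 0.167 (5 edges via "body part")
```

Real wordnets allow multiple hypernyms, so a production measure would search all paths; the single-parent assumption keeps the sketch short.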
INTER-LINGUAL INDEX
Current State of Mapping Wordnets
• Wordnets exist for many languages.
  – 33 open wordnets in the Global Wordnet Grid: http://globalwordnet.org/global-wordnet-grid/
• Mapping often occurs through the English WordNet.
  – English-centric
  – English does not have a word for every concept.
• Some wordnets are mapped to each other directly.
Mapping wordnets to each other directly gets messy.
(Vossen, Global WordNet Conference, 2016)
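A rough count suggests why. Assuming one bilingual mapping per unordered pair of wordnets, and taking n = 33 (the open wordnets in the Global Wordnet Grid), direct pairwise mapping needs n(n-1)/2 mappings, whereas linking every wordnet to a single shared index needs only n:

```latex
\text{direct: } \binom{n}{2} = \frac{n(n-1)}{2} = \frac{33 \cdot 32}{2} = 528
\qquad \text{versus} \qquad
\text{shared index: } n = 33
```

The pairwise count grows quadratically with the number of wordnets, while the shared-index count grows linearly.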
The Collaborative Inter-Lingual Index (CILI)
[Figure: diagram with wordnets labeled no, en, es, pt]
CILI
• Flat list of concepts, each with a persistent, Semantic Web-compliant IRI
• Synsets from wordnets are mapped directly to the ILI IRI
• English WordNet 3.0 and 3.1 and the Dutch Open Wordnet are currently mapped
• Each ILI concept has a unique English definition to support mapping (but no English words or labels)
• Not imposed on linked wordnets
• Open: anyone can contribute (https://github.com/globalwordnet/ili)
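The hub design can be sketched as a small data structure: a flat dictionary of concept identifiers carrying only an English definition, plus a list of links from each wordnet's synsets straight to those identifiers. This is a minimal illustration, not the CILI file format; the identifiers, synset ids and `<...>` lemmas are hypothetical placeholders, and `equivalents` is an invented helper. It shows that a concept lexicalized in Spanish and Portuguese but not English can still be linked, because no English word is required.

```python
from collections import defaultdict

# Flat list of interlingual concepts: id -> English definition only (no English lemmas).
# Identifiers are hypothetical placeholders, not real ILI IRIs.
ILI_CONCEPTS = {
    "ili:example-1": "a small flat triangular bone in front of the knee that protects the knee joint",
    "ili:example-2": "a concept lexicalized in Spanish and Portuguese but not in English",
}

# Each wordnet links its own synsets directly to an ILI id; there is no English pivot.
# Tuples are (wordnet language, synset id, lemmas, ILI id); ids and <...> lemmas are placeholders.
SYNSET_LINKS = [
    ("en", "en-s1", ["patella", "kneecap", "kneepan"], "ili:example-1"),
    ("es", "es-s1", ["rótula"], "ili:example-1"),
    ("pt", "pt-s1", ["patela", "rótula"], "ili:example-1"),
    ("no", "no-s1", ["kneskål"], "ili:example-1"),
    ("es", "es-s2", ["<es-lemma>"], "ili:example-2"),
    ("pt", "pt-s2", ["<pt-lemma>"], "ili:example-2"),
]


def equivalents(links, source_wordnet, lemma):
    """Find lemmas in other wordnets that share an ILI id with `lemma` in `source_wordnet`."""
    by_ili = defaultdict(list)
    for wordnet, _, lemmas, ili in links:
        by_ili[ili].append((wordnet, lemmas))

    result = defaultdict(set)
    for wordnet, _, lemmas, ili in links:
        if wordnet == source_wordnet and lemma in lemmas:
            for other_wordnet, other_lemmas in by_ili[ili]:
                if other_wordnet != source_wordnet:
                    result[other_wordnet].update(other_lemmas)
    return dict(result)


print(equivalents(SYNSET_LINKS, "es", "rótula"))
print(equivalents(SYNSET_LINKS, "es", "<es-lemma>"))  # {'pt': {'<pt-lemma>'}}, no English needed
print(ILI_CONCEPTS["ili:example-2"])
```

Adding a new wordnet means adding its own links to the shared identifiers, rather than building a mapping to every other wordnet.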
NCI THESAURUS AND CILI
National Cancer Institute Thesaurus (NCIt)
• An English medical reference terminology
• Definitions crafted by teams of medical experts and terminologists
• Covers vocabulary for clinical care, translational and basic research, public information, and administrative activities
• Widely used in biomedical and health informatics in the USA
Why map NCIt to CILI?
• Specialized terminology in CILI should be defined by subject-matter experts, not linguists.
• There is currently no prototype for mapping specialized vocabulary to CILI.
• Additional resources may yield richer formal semantics that can be integrated with CILI.
• To support integration of health knowledge extracted from linguistically heterogeneous sources:
  – multilingual sources
  – layperson and specialized vocabulary
“Patella” in WN and NCIt
WordNet
• patella, kneecap, kneepan
• A small flat triangular bone in front of the knee that protects the knee joint
• Part_holonym – knee
• Hypernym – Sesamoid bone
NCIt
• BONE, PATELLA
• A small flat triangular bone in front of the knee that articulates with the femur and protects the knee joint.
• subClassOf – Bone of the Lower Extremity
• subClassOf – Short Bone
• Semantic Type – Anatomical Structure
Highlighted differences: additional synonyms; additional formal semantic information; additional definitional knowledge, potentially useful for formalizing semantics.
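The kind of enrichment the patella comparison points to can be computed directly. The sketch below transcribes the two records from that slide into Python dictionaries (the field layout and the small stopword list are assumptions) and reports what NCIt adds beyond WordNet: new content words in the definition and relations WordNet does not state. `ncit_additions` is a hypothetical helper.

```python
import re

# Records transcribed from the "Patella" slide; the dict layout is an assumption.
WORDNET = {
    "lemmas": {"patella", "kneecap", "kneepan"},
    "definition": "A small flat triangular bone in front of the knee that protects the knee joint",
    "relations": {("part_holonym", "knee"), ("hypernym", "sesamoid bone")},
}
NCIT = {
    "lemmas": {"bone, patella"},
    "definition": ("A small flat triangular bone in front of the knee that articulates "
                   "with the femur and protects the knee joint."),
    "relations": {
        ("subClassOf", "bone of the lower extremity"),
        ("subClassOf", "short bone"),
        ("semantic_type", "anatomical structure"),
    },
}

# Minimal, hypothetical stopword list so that only content words are compared.
STOPWORDS = {"a", "the", "and", "with", "in", "of", "that"}


def content_words(text):
    """Lowercased alphabetic tokens minus stopwords."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def ncit_additions(wn, ncit):
    """What the NCIt record contributes that the WordNet record lacks."""
    return {
        "definitional_terms": content_words(ncit["definition"]) - content_words(wn["definition"]),
        "relations": ncit["relations"] - wn["relations"],
    }


additions = ncit_additions(WORDNET, NCIT)
print(sorted(additions["definitional_terms"]))  # ['articulates', 'femur']
print(len(additions["relations"]))              # 3
```

The extra definitional terms ("articulates", "femur") are exactly the sort of knowledge that could later be formalized as a relation between the patella and the femur.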
Semantic Modeling
One of the goals of CILI is for ontologies to provide formal semantics for the indexed concepts.
• Different semantic resources encode different semantic information.
• NCIt can be used to enrich the common semantic model.
Our Planned Approach to the Project
Map the NCI Thesaurus to CILI
• Convert the NCI Thesaurus to the Lexical Markup Framework (LMF)
• Partially automate mapping NCIt to CILI, using string matching between WordNet synsets and NCIt names, and similarity measures on definitions
• False negatives will remain and will need to be identified by hand
Formalize the semantics of NCIt-related CILI concepts with other ontologies
• KYOTO
• BFO
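The two-step matching idea can be sketched as follows: first generate string variants of an NCIt name (for example, turning the inverted form "BONE, PATELLA" into "patella bone", "patella" and "bone") and keep WordNet synsets whose lemmas match; then score each candidate by definition similarity. The slide does not say which similarity measure will be used, so Jaccard overlap of content words is swapped in here as a stand-in; the synset ids, the 0.5 threshold and all function names are hypothetical.

```python
import re

STOPWORDS = {"a", "an", "the", "and", "with", "in", "of", "that"}


def name_variants(ncit_name):
    """Normalise an NCIt-style name such as 'BONE, PATELLA' into candidate strings."""
    name = ncit_name.lower().strip()
    parts = [p.strip() for p in name.split(",") if p.strip()]
    variants = {name, " ".join(reversed(parts))}
    variants.update(parts)  # also try each comma-separated part on its own
    return variants


def content_words(text):
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def jaccard(text_a, text_b):
    """Jaccard overlap of content words: |A ∩ B| / |A ∪ B| (stand-in similarity measure)."""
    a, b = content_words(text_a), content_words(text_b)
    return len(a & b) / len(a | b) if a | b else 0.0


def candidate_matches(ncit_name, ncit_definition, synsets, threshold=0.5):
    """Step 1: string-match names to lemmas. Step 2: keep matches with similar definitions."""
    variants = name_variants(ncit_name)
    results = []
    for synset_id, lemmas, definition in synsets:
        if variants & {lemma.lower() for lemma in lemmas}:
            score = jaccard(ncit_definition, definition)
            if score >= threshold:
                results.append((synset_id, round(score, 2)))
    return sorted(results, key=lambda item: item[1], reverse=True)


# Illustrative WordNet-style synsets; the ids are hypothetical.
SYNSETS = [
    ("wn-patella", ["patella", "kneecap", "kneepan"],
     "a small flat triangular bone in front of the knee that protects the knee joint"),
    ("wn-bone", ["bone", "os"],
     "rigid connective tissue that makes up the skeleton of vertebrates"),
]
NCIT_DEFINITION = ("A small flat triangular bone in front of the knee that articulates "
                   "with the femur and protects the knee joint.")

print(candidate_matches("BONE, PATELLA", NCIT_DEFINITION, SYNSETS))  # [('wn-patella', 0.8)]
```

Both synsets pass the name filter, but only the patella synset survives the definition check. Any synset whose lemmas never match a name variant is never scored at all, which is one source of the false negatives that still have to be found by hand.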
Use for Knowledge Integration
[Figure: the text fragment “… smokes hubbly-bubbly on weekends …” linked to the concept Smoking status and to an ILI concept]
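A toy version of this integration step: surface forms from different registers and languages map to one NCIt-style label, which in turn links to an ILI concept, so mentions from heterogeneous sources land on the same identifier. This is a dictionary-lookup sketch under stated assumptions, not the project's pipeline; the lexicon entries, the ILI id and `annotate` are hypothetical.

```python
import re

# Hypothetical lexicon: surface forms (lay, specialist, other languages) -> NCIt-style label.
LEXICON = {
    "smokes": "Smoking status",
    "smoker": "Smoking status",
    "fumador": "Smoking status",  # Spanish
    "røyker": "Smoking status",   # Norwegian
}
# Hypothetical link from the label to an ILI concept (placeholder id).
LABEL_TO_ILI = {"Smoking status": "ili:example-smoking-status"}


def annotate(text):
    """Return (mention, label, ILI id) for every token that appears in the lexicon."""
    tokens = re.findall(r"\w+(?:-\w+)*", text.lower())  # keeps hyphenated words whole
    return [(tok, LEXICON[tok], LABEL_TO_ILI[LEXICON[tok]]) for tok in tokens if tok in LEXICON]


print(annotate("… smokes hubbly-bubbly on weekends …"))
# [('smokes', 'Smoking status', 'ili:example-smoking-status')]
print(annotate("Han røyker i helgene"))
# [('røyker', 'Smoking status', 'ili:example-smoking-status')]
```

Here "hubbly-bubbly" is left unannotated; capturing that it denotes waterpipe smoking would need its own lexicon entry and concept link, which is where expert-defined terminology such as NCIt would help.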
THANK YOU