UMLS
Mike Buck, Chengjian Che, Kevin Coonan, Vojtech Huser
February 2004
Agenda
• Overview
• Metathesaurus
• Semantic Network
• Desiderata
• Application
• Discussion
UMLS
• UMLS = Unified Medical Language System
• Created in 1986
• 3 parts
  – Metathesaurus: database of concepts that appear in one or more constituent vocabularies
  – Semantic Network: provides a categorization of Metathesaurus concepts and describes the relations between them
  – SPECIALIST Lexicon: biomedical lexicon and associated tools, used primarily for NLP
• Package: set of data files + tools (MetamorphoSys, lvg) + Java API
Sub-agenda
• Domain and intended use
• Maintenance organization
• Licensing requirements
• How it is distributed
• SNOMED-CT® inclusion
Domain and intended use
• Domain
  – Collection of terminologies
  – Includes many domains, 100+ terminologies (MeSH, SNOMED, LOINC, ICD, RxNorm, First Databank, Gene Ontology)
• Same concepts are represented differently in many different vocabularies
• Absence of a standard format for distributing terminologies
Bodenreider, O. (2004). "The Unified Medical Language System (UMLS): integrating biomedical terminology." Nucleic Acids Res 32 (Database issue): D267-70.
Current use
• Almost nobody uses the whole UMLS as a production terminology (waiting for SNOMED)
• Most users extract only certain vocabularies (MeSH, RxNorm)
Maintenance organization
• National Library of Medicine (NLM), 8600 Rockville Pike, Bethesda, MD 20894
• Governmental agency
  – Funded by Congress
  – 2004 budget for NLM: $316 M
  – Approximately $3 M for UMLS alone
  – http://www.hhs.gov/budget/testify/b20030408_g.html
• Contact for help with UMLS
  – custserv@nlm.nih.gov
Licensing requirements and costs
• License
  – Mail or fax a signed license agreement to NLM: http://www.nlm.nih.gov/research/umls/license.html
• There is no charge for the UMLS license, but many constituent vocabularies have their own license requirements if used outside the scope of the UMLS license
Licensing restrictions
• Four progressive levels of use restriction in the UMLS license (restrictions accumulate)
  – 0: no restrictions
  – 1: translation or distribution of derivative versions prohibited
  – 2: no use in a production environment
  – 3: internal use for research and analysis only
• Examples
  – MeSH (0)
  – LOINC (0)
  – First DataBank National Drug Data File (3)
  – Micromedex DRUGDEX (3)
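Because the restrictions accumulate, a source at level n carries every restriction of levels 1 through n. The Python sketch below models only that cumulative rule, using the four levels as summarized on this slide. The `SOURCE_LEVELS` table and the function names are hypothetical illustrations, not fields of the UMLS distribution; the binding terms are in the license agreement itself.

```python
# Restriction added at each level, as summarized on the slide (illustrative only).
RESTRICTION_ADDED_AT = {
    1: "no translation or redistribution of derivative versions",
    2: "no use in a production environment",
    3: "internal research and analysis use only",
}

# Hypothetical lookup table built from the slide's examples.
SOURCE_LEVELS = {
    "MeSH": 0,
    "LOINC": 0,
    "First DataBank National Drug Data File": 3,
    "Micromedex DRUGDEX": 3,
}


def restrictions_for(level: int) -> list[str]:
    """Return every restriction in force at `level` (levels accumulate)."""
    if not 0 <= level <= 3:
        raise ValueError(f"unknown restriction level: {level}")
    return [RESTRICTION_ADDED_AT[n] for n in range(1, level + 1)]


def allowed_in_production(source: str) -> bool:
    """Production use is ruled out from level 2 upward."""
    return SOURCE_LEVELS[source] < 2


if __name__ == "__main__":
    for name, level in SOURCE_LEVELS.items():
        print(f"{name} (level {level}): {restrictions_for(level) or ['none']}")
```

Keeping only the restriction *added* at each level, and deriving the rest by accumulation, mirrors the way the license levels are described.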
How it is distributed
• Access via web browser
  – UMLS Knowledge Source Server (UMLSKS): umlsks.nlm.nih.gov
• Files on your own computer
  – Download
  – 2-CD set
• Web services API available
  – to query the NLM server
Web access
UMLS Files
• 4.9 GB when unzipped
• Quarterly updates: AA, AB, AC, AD
  – Not perfectly regular (2003AD does not exist)
  – Not perfectly on schedule (2004AA does not exist yet)
• Manuals printed only once per year
• Text files, operating-system independent; tools in Java
• MetamorphoSys is a provided tool to extract portions of the UMLS into separate data files
• You get the files only; no "browser" is included
  – compare RELMA for LOINC or the CLUE Browser for SNOMED
New Distribution Format
• The current format has not changed substantially between the 2001AA and 2003AC releases
• Rich release (MR+) coming now
  – White paper: http://www.nlm.nih.gov/research/umls/white_paper.html
  – Rich Release Format (Draft, January 29, 2004): http://umlsinfo.nlm.nih.gov/rich_release_format.html
• MetamorphoSys will have several additional file output options:
  – the current relational ASCII file formats
  – the new expanded relational file formats
  – and (in 2004) XML formats
• SNOMED-CT® will be released in the first 2004 release (2004AA), in the new Rich Release Format
World dimensions of UMLS
• UMLS is not meant to be a US-only project
• The first extension was a Spanish version
• Other nations involved (part of distribution)
  – French, Italian, Russian, Finnish, Swedish, Norwegian, Hungarian, Basque, Hebrew, German
• UK (SNOMED, NHS, Read Codes)
  – The UK used Read Codes
  – Read Codes were incorporated into SNOMED-CT
SNOMED inclusion
• THIS AGREEMENT ("Agreement") is made as of June 30, 2003 by and between CAP (not-for-profit) and NLM (U.S. federal government agency) (licensee)
• $35,000 price tag
• 35K new concepts, 400K new strings
• Good until 2008
• Text of the license
  – http://www.nlm.nih.gov/research/umls/Snomed/snomed_license.html
Agenda
✓ Overview
• Metathesaurus
• Semantic Network
• Desiderata
• Application
• Discussion
Metathesaurus
• Domain
  – Collection of terminologies
  – Includes many domains, 100+ terminologies (MeSH, SNOMED, LOINC, ICD, RxNorm, First Databank)
• Statistics
  – Concepts: 975,354
  – Strings: 2,361,983
  – Relationships: 12,071,232
Ref: http://www.nlm.nih.gov/research/umls/METAB3.HTML (number of concepts from each member terminology)
Summary of Table Contents
• Metathesaurus Concept Names (2.5.1) = MRCON
• Relationships between Different Concept Names (2.5.2) = MRREL, MRCOC, MRATX
• Attributes (2.5.3) = MRSAT, MRDEF, MRSTY, MRLO, MRRANK
• Source Information and Contexts (2.5.4) = MRSO, MRCXT, MRSAB
• Indexes (2.6) = MRXW.BAQ, MRXW.DAN, MRXW.DUT, MRXW.ENG, MRXW.FIN, MRXW.FRE, MRXW.GER, MRXW.HEB, MRXW.HUN, MRXW.ITA, MRXW.NOR, MRXW.POR, MRXW.RUS, MRXW.SPA, MRXW.SWE, MRXNW.ENG, MRXNS.ENG
Metathesaurus: Concepts
• Concept: cluster of synonymous terms
  – Identified by a CUI (Concept Unique Identifier)
• Term: set of lexical variants
  – Identified by a LUI (Lexical Unique Identifier)
• String: concept name
  – Identified by a SUI (String Unique Identifier)
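The three identifiers nest: a concept (CUI) groups terms (LUIs), and each term groups the exact strings (SUIs) that are lexical variants of one another. The Python sketch below shows that nesting for one concept. `toy_norm` is a deliberately simplified, hypothetical stand-in for the SPECIALIST lvg normalization that NLM uses to group strings into terms (the real program does more, such as uninflecting words); the strings are illustrative and the LUI/SUI values it generates are made up.

```python
import re
from collections import defaultdict


def toy_norm(s: str) -> str:
    """Hypothetical, simplified stand-in for lvg normalization:
    lowercase, drop possessive 's, strip punctuation, sort the words."""
    s = s.lower().replace("'s", "")
    return " ".join(sorted(re.findall(r"[a-z0-9]+", s)))


def group_into_terms(strings: list[str]) -> dict[str, dict[str, str]]:
    """Group one concept's synonymous strings into terms: {LUI: {SUI: string}}.

    Each distinct string gets its own SUI (case and word order matter);
    strings with the same normalized form share a LUI. The identifiers
    generated here are illustrative, not real UMLS identifiers.
    """
    lui_for_norm: dict[str, str] = {}
    terms: dict[str, dict[str, str]] = defaultdict(dict)
    for i, s in enumerate(dict.fromkeys(strings), start=1):  # unique, in order
        lui = lui_for_norm.setdefault(toy_norm(s), f"L{len(lui_for_norm) + 1:07d}")
        terms[lui][f"S{i:07d}"] = s
    return dict(terms)


cui = "C0001403"  # "Addison's Disease", the example concept on later slides
concept = group_into_terms([
    "Addison's Disease",
    "ADDISON'S DISEASE",
    "Disease, Addison's",
    "Primary adrenocortical insufficiency",
])
for lui, suis in concept.items():
    print(cui, lui, suis)
```

The first three strings differ in case or word order, so each keeps its own SUI, but they normalize identically and therefore share one LUI; the fourth string is a different term of the same concept.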
CUIs, LUIs, SUIs
MRCON
• There is exactly one row in this file for each meaning of each unique string (CUI-SUI combination) in the Metathesaurus.
• Any difference in upper/lower case, word order, etc. creates a different unique string.
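Because each MRCON row is one (CUI, SUI) pair, a loader can check that invariant directly. A minimal Python sketch, assuming the pre-2004 pipe-delimited layout CUI|LAT|TS|LUI|STT|SUI|STR|LRL| with a trailing `|` on every row, and assuming that TS=P together with STT=PF marks the concept's preferred name; verify both against the documentation for your release. Apart from C0001403, the identifiers in the sample rows are made up.

```python
import csv
import io

# Assumed pre-2004 MRCON field order; check your release documentation.
MRCON_FIELDS = ["CUI", "LAT", "TS", "LUI", "STT", "SUI", "STR", "LRL"]

# Made-up rows for illustration (LUIs and SUIs are invented).
SAMPLE = (
    "C0001403|ENG|P|L0000001|PF|S0000001|Addison's Disease|0|\n"
    "C0001403|ENG|P|L0000001|VC|S0000002|ADDISON'S DISEASE|0|\n"
    "C0001403|ENG|S|L0000002|PF|S0000003|Primary adrenocortical insufficiency|0|\n"
)


def read_mrcon(stream):
    """Yield one dict per MRCON row.

    QUOTE_NONE keeps quote characters inside names literal; the trailing
    '|' on each row yields an empty last field, which is dropped.
    """
    reader = csv.reader(stream, delimiter="|", quoting=csv.QUOTE_NONE)
    for row in reader:
        if row and row[-1] == "":
            row = row[:-1]
        yield dict(zip(MRCON_FIELDS, row))


rows = list(read_mrcon(io.StringIO(SAMPLE)))

# One row per (CUI, SUI) pair, so the pairs must be unique.
pairs = [(r["CUI"], r["SUI"]) for r in rows]
assert len(pairs) == len(set(pairs))

preferred = [r["STR"] for r in rows if r["TS"] == "P" and r["STT"] == "PF"]
print(preferred)
```

For a real file, open it with `newline=""` and pass the file object to `read_mrcon` instead of the `StringIO` sample.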
Metathesaurus data for C0001403 (“Addison’s Disease”)
MRSO
• There is exactly one row in this file for each source of each string in the Metathesaurus.
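Since MRSO has one row per (string, source) pair, grouping its rows by SUI shows which vocabularies contribute each string. A minimal Python sketch, assuming the pre-2004 field order CUI|LUI|SUI|SAB|TTY|CODE|SRL| (check your release documentation); the sample rows, term types and codes are made up.

```python
from collections import defaultdict

# Assumed pre-2004 MRSO field order; check the documentation for your release.
MRSO_FIELDS = ["CUI", "LUI", "SUI", "SAB", "TTY", "CODE", "SRL"]

# Made-up rows: string S0000001 is contributed by two sources (codes invented).
SAMPLE = [
    "C0001403|L0000001|S0000001|MSH|MH|X1|0|",
    "C0001403|L0000001|S0000001|SNMI|PT|X2|0|",
    "C0001403|L0000002|S0000003|SNMI|SY|X2|0|",
]


def parse_row(line: str) -> dict[str, str]:
    # Rows end with '|', so the last element after split() is empty; drop it.
    return dict(zip(MRSO_FIELDS, line.rstrip("\n").split("|")[:-1]))


sources_per_string: dict[str, set[str]] = defaultdict(set)
for line in SAMPLE:
    row = parse_row(line)
    sources_per_string[row["SUI"]].add(row["SAB"])

for sui, sabs in sorted(sources_per_string.items()):
    print(sui, sorted(sabs))
```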
Metathesaurus data for C0001403 (“Addison’s Disease”)
MRREL
• There is one row in this table for each relationship between Metathesaurus concepts known to the Metathesaurus.
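A common first step with MRREL is indexing it by (first concept, relationship) so that, for example, the children of a concept can be looked up directly. A minimal Python sketch, assuming the pre-2004 field order CUI1|REL|CUI2|RELA|SAB|SL|MG| and assuming REL states how the second concept relates to the first (so `CA|CHD|CB` means CB is a child of CA); verify both against your release documentation. All CUIs and sources in the sample are made up.

```python
from collections import defaultdict

# Assumed pre-2004 MRREL field order; verify for your edition.
MRREL_FIELDS = ["CUI1", "REL", "CUI2", "RELA", "SAB", "SL", "MG"]

# Made-up rows. Assumption: REL gives the relationship of CUI2 to CUI1.
SAMPLE = [
    "C9000001|CHD|C9000002||SRC|SRC||",
    "C9000002|PAR|C9000001||SRC|SRC||",
    "C9000001|CHD|C9000003||SRC|SRC||",
    "C9000001|RO|C9000004|associated_with|SRC|SRC||",
]


def index_relationships(lines):
    """Map (CUI1, REL) -> set of CUI2 for fast lookups."""
    index = defaultdict(set)
    for line in lines:
        # Drop the empty element produced by the trailing '|'.
        row = dict(zip(MRREL_FIELDS, line.rstrip("\n").split("|")[:-1]))
        index[(row["CUI1"], row["REL"])].add(row["CUI2"])
    return index


rels = index_relationships(SAMPLE)
print(sorted(rels[("C9000001", "CHD")]))  # ['C9000002', 'C9000003']
```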
MRREL relationships
MRSAT
• There is exactly one row in this file for each concept, term and string attribute that does not have a sub-element structure.
• Attributes provide additional information about the meaning of a concept and explain how it is used in the source vocabulary.
MRSAT concept attributes
MRDEF
• There is exactly one row in this file for each definition in the Metathesaurus. A few definitions approach 3,000 characters in length.
Definitions
MRSTY
• There is exactly one row in this file for each semantic type assigned to each concept.
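Because MRSTY has one row per (concept, semantic type) pair, a concept with several semantic types appears on several rows. A minimal Python sketch, assuming the row layout CUI|TUI|STY| shown in the Semantic Network example later in this deck (C0001403|T047|Disease or Syndrome); the rows for C9000010 are made up.

```python
from collections import defaultdict

# Row layout CUI|TUI|STY| as in the deck's example; C9000010 rows are made up.
SAMPLE = [
    "C0001403|T047|Disease or Syndrome|",
    "C9000010|T121|Pharmacologic Substance|",
    "C9000010|T109|Organic Chemical|",
]

types_by_cui = defaultdict(set)   # CUI -> semantic type names
cuis_by_type = defaultdict(set)   # TUI -> CUIs
for line in SAMPLE:
    cui, tui, sty = line.rstrip("\n").split("|")[:3]
    types_by_cui[cui].add(sty)
    cuis_by_type[tui].add(cui)

# Concepts that were assigned more than one semantic type.
multi_typed = {cui for cui, stys in types_by_cui.items() if len(stys) > 1}
print(sorted(multi_typed))
```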
Semantic types (MRSTY)
Other Tables
• MRATX
• MRCXT
• MRLO
• MRRANK
• MRCOC
Metathesaurus Change Files
• Six files (relations) identify key differences between entries in the previous and the current edition of the Metathesaurus:
  – Deleted Concepts (File=DELETED.CUI)
  – Merged Concepts (File=MERGED.CUI)
  – Deleted Terms (File=DELETED.LUI)
  – Merged Terms (File=MERGED.LUI)
  – Deleted Strings (File=DELETED.SUI)
  – Retired CUI Mapping (File=MRCUI)
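These files let an application keep stored CUIs usable across editions: a CUI recorded last year may since have been merged (possibly more than once) or deleted. The Python sketch below follows such a chain to the live CUI. The `RETIRED` mapping is a hypothetical, simplified view of the change files and is not the MRCUI record layout; all CUIs in it are made up.

```python
from typing import Optional

# Hypothetical, simplified view of the change files (not the MRCUI layout):
# each retired CUI maps to the CUI that replaced it, or None if deleted.
RETIRED: dict[str, Optional[str]] = {
    "C9000100": "C9000200",  # merged into C9000200
    "C9000200": "C9000300",  # C9000200 was itself merged later
    "C9000400": None,        # deleted without a replacement
}


def current_cui(cui: str, retired: dict[str, Optional[str]]) -> Optional[str]:
    """Follow retirement mappings until reaching a CUI that is still live.

    Returns None if the chain ends in a deletion; raises ValueError
    if the mapping data contains a cycle.
    """
    seen: set[str] = set()
    while cui in retired:
        if cui in seen:
            raise ValueError(f"cycle in CUI mappings at {cui}")
        seen.add(cui)
        nxt = retired[cui]
        if nxt is None:
            return None
        cui = nxt
    return cui


# A CUI stored in an old record, resolved against the current edition.
print(current_cui("C9000100", RETIRED))  # C9000300
```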
Future Changes
• Effective with the 2004AA release of the UMLS Metathesaurus, the release file structure will be substantially expanded.
• Three existing relational files (MRCON, MRSO and MRATX) will be deprecated and merged into MRCONSO.
• Better version control for tracking changes to concepts and strings
• And many more changes
Large Scale Vocabulary Test
• Does the Metathesaurus cover the medical domain well?
• Participants searched 30 different terminologies in the 1996 UMLS for 32,679 normalized strings
  – 58% exact match
  – 41% related concepts
  – 1% not found
Large Scale Vocabulary Test
• Exact meaning match in constituent terminologies ranged from 45% to 71%
• Semantic types of missing matches:
  – Findings 35%
  – Disorders 20%
  – Procedures 15%
  – Concepts 15%
• Individual terminology match ranged from <1% to 63% of terms
• Only SNMI and RCD had >60% of terms
• Combined SNMI + RCD had 79% exact matches
• Conclusion: combined vocabularies get greater coverage.
Humphreys, B. L., A. T. McCray, et al. (1997). "Evaluating the coverage of controlled health data terminologies: report on the results of the NLM/AHCPR large scale vocabulary test." J Am Med Inform Assoc 4(6): 484-500.
Agenda
✓ Overview
✓ Metathesaurus
• Semantic Network
• Desiderata
• Application
• Discussion
UMLS Semantic Network: overview
• Purpose: provide a consistent categorization of all UMLS concepts
• Purpose: provide a set of useful relationships between concepts
• 135 semantic types
  – entity or event
• 54 relationships
  – isa or associated_with
Semantic type 3D map
UMLS Semantic Network: structure
• Semantic types are nodes
  – Each Metathesaurus concept (CUI) is assigned at least one semantic type (TUI), recorded in MRSTY
  – E.g. C0001403|T047|Disease or Syndrome
  – The most specific semantic type possible is assigned
• Relationships are links
  – Parent-child: linked by isa (primary link)
  – Large set of non-hierarchical links as well
Entity
Entity and Event
Non-hierarchical links
– Associated_with
  • Physically related to
  • Spatially related to
  • Temporally related to
  • Functionally related to
    – Affects
      » Manages
      » Treats
      » Etc.
    – Brings about
    – Etc.
  • Conceptually related to
• The links (relationships) are themselves organized hierarchically
Sample hierarchy of non-hierarchical relationships
Distributions
• Unit record format (one file, SU): contains all attributes for any given semantic type or relationship
• ASCII relational tables, similar to the Metathesaurus distribution
  – Flat (text) delimited files
ASCII relational format files
• SRDEF contains definitions for semantic types and relations
• SRSTR contains the structure of the semantic network
  – Pairs of semantic types and the relationship between them
  – Relation of second to first
  – Status of relationship (defined for argument and children, defined for argument only, blocked)
• SRSTRE1/SRSTRE2: allowed relationships in the semantic network
  – Pairs of semantic types with a relationship, given either as codes (SRSTRE1) or as text names (SRSTRE2)
:::: SRDEF ::::
STY|T020|Acquired Abnormality|A1.2.2.2|An abnormal structure, or one that is abnormal in size or location, found in or deriving from a previously normal structure. Acquired abnormalities are distinguished from diseases even though they may result in pathological functioning (e.g., "hernias incarcerate").|Abscess of prostate; Hemorrhoids; Hernia, Femoral; Varicose Veins|||||
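The fourth field of this record, the tree number A1.2.2.2, encodes the type's position in the isa hierarchy: dropping the last dot-separated component gives the parent's tree number, and A is the root for entities (B for events). A minimal Python sketch that parses the record and walks up its tree number, assuming the SRDEF field order RT, UI, STY/RL, STN/RTN, DEF, EX, UN, NH, ABR, RIN (renamed below for readability); the definition is shortened.

```python
# Assumed SRDEF field order (RT, UI, STY/RL, STN/RTN, DEF, EX, UN, NH, ABR, RIN),
# with STY/RL renamed NAME and STN/RTN renamed TREE for readability.
SRDEF_FIELDS = ["RT", "UI", "NAME", "TREE", "DEF", "EX", "UN", "NH", "ABR", "RIN"]

# The record from the slide, with the definition shortened.
RECORD = (
    "STY|T020|Acquired Abnormality|A1.2.2.2|An abnormal structure, ...|"
    "Abscess of prostate; Hemorrhoids; Hernia, Femoral; Varicose Veins|||||"
)


def parse_srdef(line: str) -> dict[str, str]:
    # The record ends with '|', so drop the empty element after it.
    return dict(zip(SRDEF_FIELDS, line.rstrip("\n").split("|")[:-1]))


def ancestor_tree_numbers(tree: str) -> list[str]:
    """Tree numbers above `tree`, nearest first: 'A1.2.2.2' -> A1.2.2, A1.2, A1, A."""
    parts = tree.split(".")
    ancestors = [".".join(parts[:i]) for i in range(len(parts) - 1, 0, -1)]
    if len(parts[0]) > 1:  # 'A1' hangs off the single-letter root 'A'
        ancestors.append(parts[0][0])
    return ancestors


rec = parse_srdef(RECORD)
print(rec["UI"], rec["NAME"], ancestor_tree_numbers(rec["TREE"]))
```

The four ancestors correspond to Anatomical Abnormality, Anatomical Structure, Physical Object and Entity, the same chain that the SRSTRE2 rows for Acquired Abnormality list.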
:::: SRSTR ::::
Acquired Abnormality|cooccurs_with|Injury or Poisoning|D|
Acquired Abnormality|isa|Anatomical Abnormality|D|
Acquired Abnormality|result_of|Behavior|D|
Activity|isa|Event|D|
Age Group|isa|Group|D|
Alga|isa|Plant|D|
:::: SRSTRE2 ::::
Acquired Abnormality|isa|Anatomical Abnormality|
Acquired Abnormality|isa|Anatomical Structure|
Acquired Abnormality|isa|Physical Object|
Acquired Abnormality|isa|Entity|
Acquired Abnormality|affects|Alga|
Acquired Abnormality|affects|Amphibian|
Acquired Abnormality|affects|Animal|
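SRSTR lists only direct links, while SRSTRE2 spells out the inherited ones; for isa, that expansion is a transitive closure. The Python sketch below reproduces the four isa rows above from direct isa links. Only the first input row comes from the SRSTR sample; the other three are Semantic Network parent links the slides do not show, added as an assumption to complete the chain. Inheritance of non-isa relations such as affects, which also depends on the link status, is not covered.

```python
from collections import defaultdict, deque

# SRSTR-style rows (type1|relation|type2|status|). The first row is from the
# SRSTR sample; the other three are assumed parent links not shown in the deck.
SRSTR = [
    "Acquired Abnormality|isa|Anatomical Abnormality|D|",
    "Anatomical Abnormality|isa|Anatomical Structure|D|",
    "Anatomical Structure|isa|Physical Object|D|",
    "Physical Object|isa|Entity|D|",
]


def isa_closure(rows: list[str]) -> dict[str, list[str]]:
    """Map each semantic type to all of its isa ancestors, nearest first."""
    parents: dict[str, list[str]] = defaultdict(list)
    for line in rows:
        child, rel, parent = line.split("|")[:3]
        if rel == "isa":
            parents[child].append(parent)

    closure = {}
    for start in list(parents):
        seen: set[str] = set()
        order: list[str] = []
        queue = deque(parents[start])
        while queue:  # breadth-first, so nearer ancestors are listed first
            p = queue.popleft()
            if p not in seen:
                seen.add(p)
                order.append(p)
                queue.extend(parents.get(p, []))
        closure[start] = order
    return closure


for ancestor in isa_closure(SRSTR)["Acquired Abnormality"]:
    print(f"Acquired Abnormality|isa|{ancestor}|")
```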
Descendants (first generation)
Chemicals & Drugs: 5-HT3-receptor antagonist; alizapride; alpha-fluoromethylhistidine; azasetron; benzquinamide 50 MG Intravenous Solution; buclizine; Chlorpromazine; Chlorprothixene; clebopride; Cyclizine; Dexamethasone; DEXTROSE/FRUCTOSE/PHOSPHORIC ACID SOLN, ORAL; Diazepam; diorylate; Diphenhydramine; diphenidol; dixyrazine; DOLASETRON 100 MG Oral Tablet
(69 first-generation descendants) [direct children and narrower concepts]
Co-occurring Concepts
• Anatomy: middle ear [10]
• Chemicals & Drugs: Analgesics [11]; Analgesics, Non-Narcotic [11]; Analgesics, Opioid [38]; Anesthetics, Inhalation [8]
• Concepts & Ideas: Practice Guidelines [8]; Quality of life [15]
• Disorders: Abnormalities, Drug-Induced [12]; Akathisia, Drug-Induced [15]; Basal Ganglia Diseases [9]
• Physiology: Gastric Emptying [11]
• Procedures: Adenoidectomy [7]; Ambulatory Surgical Procedures [28]; Analgesia, Epidural [7]; Analgesia, Patient-Controlled [17]; Antineoplastic Combined Chemotherapy Protocols [107]
Number of pairs (shown/all) = 99/850 (12%)
Frequency (shown/all) = 5356/6610 (81%)
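The two summary lines are simple ratios: 99 of 850 co-occurring concepts (about 11.6%, displayed as 12%) account for 5,356 of 6,610 co-occurrences (about 81%). A minimal Python sketch of that computation, assuming the shown subset is chosen by a minimum-count cutoff (the slide does not say how the server selects it); `summarize` and the counts are hypothetical.

```python
def summarize(cooccurrences: dict[str, int], min_count: int) -> dict[str, tuple]:
    """Keep concepts co-occurring at least `min_count` times and report
    (shown, all, percent) for both pair counts and total frequency."""
    shown = {c: n for c, n in cooccurrences.items() if n >= min_count}
    pairs_shown, pairs_all = len(shown), len(cooccurrences)
    freq_shown, freq_all = sum(shown.values()), sum(cooccurrences.values())
    return {
        "pairs": (pairs_shown, pairs_all, round(100 * pairs_shown / pairs_all)),
        "frequency": (freq_shown, freq_all, round(100 * freq_shown / freq_all)),
    }


# Made-up counts (co-occurring concept -> number of co-occurrences).
counts = {"Concept A": 38, "Concept B": 15, "Concept C": 7,
          "Concept D": 2, "Concept E": 1}
print(summarize(counts, min_count=7))
```

A small number of frequent pairs covering most of the total frequency, as on the slide, is why a display limited to the top pairs still conveys most of the co-occurrence information.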
Agenda
✓ Overview
✓ Metathesaurus
✓ Semantic Network
• Desiderata
• Application
• Discussion
Desiderata…
• Content
  – Not intended to be an independent terminology but rather a collection and mapping of constituent terminologies
  – Extension is dependent upon contributing terminologies
• Concept Orientation
  – Concepts (CUIs) are unique, defined, and specific
  – Inter-concept relationships are either inherited from the source vocabularies or generated specifically by the editors of the Metathesaurus
…Desiderata…
• Concept Permanence
  – CUIs are never reused, and a mapping from deprecated CUIs to current CUIs is maintained (MRCUI)
• Nonsemantic Concept Identifier
  – CUIs, LUIs, SUIs and TUIs are non-hierarchical
…Desiderata…
• Polyhierarchy
  – While CUIs, etc. are not hierarchical, some hierarchical relationships from source terminologies are preserved (e.g. MeSH)
  – The Semantic Network allows multiple isa and inverse_isa relationships
  – Membership in one hierarchy does not preclude membership in another
…Desiderata…
• Formal Definitions
  – No UMLS definitions for CUIs
  – CUI definitions are available from source vocabularies for some (4.6%) terms
  – Semantic types are defined
…Desiderata…
• Reject "Not Elsewhere Classified"
  – Depends upon source vocabularies, which often have NEC terms
• Multiple Granularities
  – The Semantic Network provides abstraction and aggregation of concepts
  – Some source terminologies include or are hierarchies (e.g. MeSH tree)
• Multiple Consistent Views
  – Defined in SRSTRE, MRCXT and MRREL
…Desiderata
• Representing Context
  – The Semantic Network provides relationships and the context
• Graceful Evolution
  – Entirely dependent upon contributions of constituent vocabularies
  – Deprecated concepts are well handled, mapped and documented
• Recognize Redundancy
  – Some undesirable redundant CUIs
  – Clearly ambiguous terms are catalogued and are …
Agenda
✓ Overview
✓ Metathesaurus
✓ Semantic Network
✓ Desiderata
• Application
• Discussion
Some applications of UMLS
• Primary user is NLM, for indexing MEDLINE
• Semantic Network used to clean up the Metathesaurus
• NCI uses it to classify precancers
• Used for NLP
• Used to generate MEDLINE queries by mapping concepts to MeSH
• Used as a template for a decision support system (DSS) ontology
• Has not been used as the internal terminology for a clinical data repository (CDR), but SNOMED-CT may change that
Vanderbilt University WizOrder Decision Support/Order Entry System
• WizOrder uses the UMLS as a dictionary to encode free-text entries into controlled vocabularies.
• The UMLS provides mapping between vocabularies, allowing WizOrder to translate patient-specific information to MeSH terms and perform automated literature retrieval.
• WizOrder uses the tables of co-occurring concepts and the Semantic Network to provide sensible lists of potential drug interactions and adverse drug reactions, and to generate fully formed MEDLINE queries for PubMed.
Geissbuhler, A. and R. A. Miller (1998). "Clinical application of the UMLS in a computerized order entry and decision-support system." Proceedings / AMIA: 320-4.
MAOUSSC: A French coding system
• MAOUSSC, a multiaxial coding system, was used to represent 1500 procedures from 15 clinical specialties, using UMLS concepts and relationships whenever possible.
• After the UMLS had been used for five years as a knowledge source for representing 1500 complex medical procedures in MAOUSSC, its value was considered significant.
Bodenreider, O., A. Burgun, et al. (1998). "Evaluation of the Unified Medical Language System as a medical knowledge source." Journal of the American Medical Informatics Association 5(1): 76-87.
National Cancer Institute (NIH)
• Terms in the UMLS related to precancers were extracted in the first attempt to create a comprehensive listing of precancers.
SAPHIRE v2
• Goal: enhance encoding of structured (CDA/XML) radiology reports
• Question: what is the optimal combination of terminologies for different report sections?
  – Procedure, history, technique, findings and impression
• Method: comparison with 50 reports indexed by hand
  – 10 CXR, 10 head CT, 10 abdominal CT, 10 chest CT, 5 head MR, 5 bone scans
  – 19 source vocabularies used in the comparison (including SNOMED, CCPSS, COSTAR, ICD10, DXP, MSH, UWDA, RCD)
Huang, Y., H. J. Lowe, et al. (2003). "A pilot study of contextual UMLS indexing to improve the precision of concept-based representation in XML-structured clinical radiology reports." Journal of the American Medical Informatics Association 10(6): 580-7.
History: Modality Optimal comb. Bone scan CXR CT abd MSH MSH+RCD CT chest CT head MRI head ∆ prec ∆ recall 269% -3% 67% -20% -75% -26% CCPSS+MSH+SNMI 21% -30% MSH+SNMI+SNM MSH+RCD+SNMI 93% 10% -5% 0
Findings: Modality Optimal comb. Bone scan CXR CT abd CT chest CT head MRI head MSH+RCD+SNMI MSH+SNMI+UMD MSH+RCD+SNMI CCPSS+MSH+SNMI MSH+RCD+SNMI ∆ prec ∆ recall 24% 87% 38% 51% 21% 14% -18% -13% 0 -13% -8% -16%
Conclusions
• The UMLS describes the relationships between constituent terminologies via mapping to independent concepts
  – Production use of some of the terminologies requires additional licenses from the vendors
  – Addition of SNOMED-CT
• The Semantic Network may support a range of ontology creation methods and tools
• The optimal terminology or collection of terminologies for any given domain remains to be determined
• Due to its size, the UMLS Metathesaurus requires special treatment and tools (e.g. UMLSKS)
Agenda
✓ Overview
✓ Metathesaurus
✓ Semantic Network
✓ Desiderata
✓ Application
• Discussion
References
• NLM. UMLS Knowledge Sources 2003AC Documentation. NIH; 11-01-2003. Available at: http://www.nlm.nih.gov/research/umls/UMLSDOC.HTML. Accessed 2-23-2004.
• Humphreys BL, McCray AT, Cheh ML. Evaluating the coverage of controlled health data terminologies: report on the results of the NLM/AHCPR large scale vocabulary test. J Am Med Inform Assoc. Nov-Dec 1997;4(6):484-500.
• Geissbuhler A, Miller RA. Clinical application of the UMLS in a computerized order entry and decision-support system. Proceedings / AMIA. 1998:320-324.
• Bodenreider O, Burgun A, Botti G, Fieschi M, Le Beux P, Kohler F. Evaluation of the Unified Medical Language System as a medical knowledge source. Journal of the American Medical Informatics Association. Jan-Feb 1998;5(1):76-87.
• Hersh W, Mailhot M, Arnott-Smith C, Lowe H. Selective automated indexing of findings and diagnoses in radiology reports. Journal of Biomedical Informatics. Aug 2001;34(4):262-273.
• Huang Y, Lowe HJ, Hersh WR. A pilot study of contextual UMLS indexing to improve the precision of concept-based representation in XML-structured clinical radiology reports. Journal of the American Medical Informatics Association. Nov-Dec 2003;10(6):580-587.
• Bodenreider O, Hole WT, Humphreys BL, Roth LA, Srinivasan S. Customizing the UMLS Metathesaurus for your Applications. November 2002. Available at: http://umlsinfo.nlm.nih.gov/powerpoint/T13-color.pdf. Accessed 2-23-2004.
• Bodenreider O. The Unified Medical Language System (UMLS): integrating biomedical terminology. Nucleic Acids Res. Jan 1 2004;32(Database issue):D267-270.
• Achour SL, Dojat M, Rieux C, Bierling P, Lepage E. A UMLS-based knowledge acquisition tool for rule-based clinical decision support system development. Journal of the American Medical Informatics Association. Jul-Aug 2001;8(4):351-360.