NLM Indexing Initiative Tools for NLP: MetaMap and the Medical Text Indexer

  • Slides: 27

Natural Language Processing: State of the Art, Future Directions
April 23, 2012
Alan R. Aronson
U.S. National Library of Medicine

Outline
• Introduction
• MetaMap
  • Overview
  • Linguistic roots
  • Recent Word Sense Disambiguation (WSD) efforts
• The NLM Medical Text Indexer (MTI)
  • Overview
  • MTI as First-line Indexer (MTIFL)
  • Recent improvements
  • Gene indexing

MetaMap/MTI Example
• MetaMap identifies biomedical concepts in text
• Medical Text Indexer (MTI) summarizes text using MetaMap and the Medical Subject Headings (MeSH) vocabulary

MetaMap Overview
• Named-entity recognition program
• Identify UMLS Metathesaurus concepts in text
• Linguistic rigor
• Flexible partial matching
• Emphasis on thoroughness rather than speed

The MetaMap Algorithm
• Parsing
  • Using SPECIALIST minimal commitment parser, SPECIALIST lexicon, MedPost part-of-speech tagger
• Variant generation
  • Using SPECIALIST lexicon, Lexical Variant Generation (LVG)
• Candidate retrieval
  • From the Metathesaurus
• Candidate evaluation
• Mapping construction

MetaMap Evaluation Function
• Weighted average of
  • centrality (is the head involved?)
  • variation (average of all variation)
  • coverage (how much of the text is matched?)
  • cohesiveness (in how many pieces?)
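A minimal sketch of how such a weighted average could be computed. The slide gives neither the weights nor the scaling, so the weights below (coverage and cohesiveness counted twice) and the 0 to 1000 scale are assumptions for illustration, and `evaluate_candidate` is a hypothetical name rather than part of MetaMap.

```python
def evaluate_candidate(centrality, variation, coverage, cohesiveness,
                       weights=(1.0, 1.0, 2.0, 2.0)):
    """Weighted average of four component scores, scaled to 0..1000.

    Each component is assumed to be normalized to [0, 1].
    The default weights are an illustrative assumption, not from the slides.
    """
    components = (centrality, variation, coverage, cohesiveness)
    for value in components:
        if not 0.0 <= value <= 1.0:
            raise ValueError("component scores must lie in [0, 1]")
    total_weight = sum(weights)
    weighted_sum = sum(w * c for w, c in zip(weights, components))
    return round(1000 * weighted_sum / total_weight)
```

Under these assumed weights, a candidate that involves the head and has no variation but covers only half of the text in a half-cohesive way scores noticeably below a perfect match, which is the kind of ranking behavior the four components are meant to produce.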

MetaMap Processing Example
Phrase: Inferior vena caval stent filter (PMID 3490760)
Each candidate line shows: MetaMap score (≤ 1000), Metathesaurus Concept Unique Identifier (CUI), Metathesaurus string, [UMLS semantic type]
Candidate Concepts:
  909 C0080306: Inferior Vena Cava Filter [medd]
  804 C0180860: Filter [mnob]
  804 C0581406: Filter [medd]
  804 C1522664: Filter [inpr]
  804 C1704449: Filter [cnce]
  804 C1704684: Filter [medd]
  804 C1875155: FILTER [medd]
  717 C0521360: Inferior vena caval [blor]
  673 C0042460: Vena caval [bpoc]
  637 C0038257: Stent [medd]
  637 C1705817: Stent [medd]
  637 C0447122: Vena [bpoc]
Concept names shown in the slide callouts:
  C0180860: Filters [mnob]
  C0581406: Optical filter [medd]
  C1522664: filter information process [inpr]
  C1704449: Filter (function) [cnce]
  C1704684: Filter Device Component [medd]
  C1875155: Filter - medical device [medd]
  C0038257: Stent, device [medd]
  C1705817: Stent Device Component [medd]

MetaMap Final Mappings
Phrase: Inferior vena caval stent filter
Final Mappings (subsets of candidate sets):
Meta Mapping (911):
  909 C0080306: Inferior Vena Cava Filter [medd]
  637 C1705817: Stent [medd]
Meta Mapping (911):
  909 C0080306: Inferior Vena Cava Filter [medd]
  637 C0038257: Stent [medd]

Word Sense Disambiguation (WSD)
• Kids with colds may also have a sore throat, cough, headache, mild fever, fatigue, muscle aches, and loss of appetite.
• Candidate MetaMap mappings for cold
  C0234192: Cold (Cold sensation)
  C0009264: Cold (Cold temperature)
  C0009443: Cold (Common cold)

Knowledge-based WSD
• Compare UMLS candidate concept profile vectors to the context of the ambiguous word
• The words in a concept profile vector come from the concept's definition, synonyms and related concepts

Common cold            Cold temperature
Weight  Word           Weight  Word
265     infect         258     temperature
126     disease        86      hypothermia
41      fever          72      effect
40      cough          48      hot

• Candidate concept with highest similarity is predicted

Knowledge-based WSD
• Kids with colds may also have a sore throat, cough, headache, mild fever, fatigue, muscle aches, and loss of appetite.

Common cold            Cold temperature
Weight  Word           Weight  Word
265     infect         258     temperature
126     disease        86      hypothermia
41      fever          72      effect
40      cough          48      hot
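The comparison described above can be sketched as follows. This is an illustration under stated assumptions, not the NLM implementation: the profiles contain only the four words per concept shown on the slide, cosine similarity is assumed as the similarity measure (the slides do not name one), the context is tokenized without stemming, and `disambiguate` is a hypothetical helper name.

```python
import math
import re

# Concept profile vectors copied from the slide (word -> weight).
# Real profiles would be much larger; these are only the words shown.
PROFILES = {
    "Common cold": {"infect": 265, "disease": 126, "fever": 41, "cough": 40},
    "Cold temperature": {"temperature": 258, "hypothermia": 86, "effect": 72, "hot": 48},
}


def context_vector(text):
    """Bag-of-words counts for the context (lowercased, no stemming)."""
    counts = {}
    for token in re.findall(r"[a-z]+", text.lower()):
        counts[token] = counts.get(token, 0) + 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(word, 0) for word, weight in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def disambiguate(text, profiles=PROFILES):
    """Return the concept whose profile is most similar to the context."""
    context = context_vector(text)
    scores = {name: cosine(profile, context) for name, profile in profiles.items()}
    return max(scores, key=scores.get), scores


sentence = ("Kids with colds may also have a sore throat, cough, headache, "
            "mild fever, fatigue, muscle aches, and loss of appetite.")
best, scores = disambiguate(sentence)
```

For the example sentence, only "fever" and "cough" overlap with a profile, and both belong to the common cold profile, so that concept receives the only non-zero similarity.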

Automatically Extracted Corpus WSD
• MEDLINE contains numerous examples of ambiguous words in context, though not disambiguated

Pipeline: candidate concept, then unambiguous synonyms, then query, then PubMed
  CUI: C0009443 (common cold): "common cold"[tiab] OR "acute nasopharyngitis"[tiab] …
  CUI: C0009264 (cold temperature): "cold temperature"[tiab] OR "low temperature"[tiab] …
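A small sketch of the query-building step, assuming the unambiguous synonyms for a concept are already available as a list. The function name `build_tiab_query` is hypothetical; `[tiab]` is the PubMed field tag that restricts a term to title and abstract, as in the queries on the slide.

```python
def build_tiab_query(synonyms):
    """Join unambiguous synonyms into an OR query restricted to title/abstract."""
    if not synonyms:
        raise ValueError("at least one synonym is required")
    return " OR ".join(f'"{term}"[tiab]' for term in synonyms)


# Synonym lists limited to the terms visible on the slide.
queries = {
    "C0009443": build_tiab_query(["common cold", "acute nasopharyngitis"]),
    "C0009264": build_tiab_query(["cold temperature", "low temperature"]),
}
```

Citations retrieved by each query could then be treated as examples of the ambiguous word used in the sense of that CUI, which is presumably how the corpus is extracted without manual labeling.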

WSD Method Results
• Corpus method has better accuracy than UMLS method

           UMLS   Corpus
NLM WSD    0.65   0.69
MSH WSD    0.81   0.84

• MSH WSD data set created using MeSH indexing
  • 203 ambiguous words
  • 81 semantic types
  • 37,888 ambiguity cases
• Indirect evaluation with summarization and MTI correlates with direct evaluation

MEDLINE Citation Example

MTI
• MetaMap Indexing – Actually found in text
• Restrict to MeSH – Maps UMLS Concepts to MeSH
• PubMed Related Citations – Not necessarily found in text

Received 2,330 Indexer Feedbacks (March 20, 2012)
• Incorporated 40% into MTI
• Hibernation should only be indexed for animals, not for "stem cell hibernation"
• Clove (spice) should not be mapped to the verb "cleave"

MTI Uses
• Assisted indexing of MEDLINE by Index Section
• Assisted indexing of Cataloging and History of Medicine Division records
• Automatic indexing of NLM Gateway meeting abstracts
• First-line indexing (MTIFL) since February 2011

MTI as First-Line Indexer (MTIFL)
[Diagram: “Normal” MTI Processing. Boxes: MTI (Processes/Recommends MeSH); Indexer (Reviews, Selects); Reviser (Reviews, Selects, Adjusts, Approves); Indexing Displays in PubMed as Usual]

MTI as First-Line Indexer (MTIFL)
[Diagram: MTIFL MTI Processing, with labels "23 MEDLINE" and "45 MEDLINE Journals". Boxes: MTI (Processes/Indexes MeSH); Indexer (Reviews, Selects); Reviser (Reviews, Selects, Adjusts, Approves); Indexing Displays in PubMed as Usual; Index Section Compares MTI and Reviser Indexing]

CheckTags Machine Learning Results
• 200k citations for training and 100k citations for testing

CheckTags: Middle Aged; Child, Preschool; Adult; Male; Aged, 80 and over; Young Adult; Female; Adolescent; Humans; Infant; Swine

F1 before ML   F1 with ML   Improvement
 1.01%         59.50%       +58.49
11.72%         54.67%       +42.95
 6.11%         45.40%       +39.29
19.49%         56.84%       +37.35
38.47%         71.14%       +32.67
 1.50%         30.89%       +29.39
 2.83%         31.63%       +28.80
46.06%         73.84%       +27.78
24.75%         42.36%       +17.61
79.98%         91.33%       +11.35
34.39%         44.69%       +10.30
71.04%         74.75%       +3.71
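For readers unfamiliar with the metric, here is a short sketch of the standard F1 definition (the harmonic mean of precision and recall) and of the improvement column as a difference in percentage points. The slides do not state how the scores were computed, so the counting scheme and the helper names below are assumptions.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Standard F1: harmonic mean of precision and recall."""
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


def improvement_points(before_pct, after_pct):
    """Difference between two percentages, in percentage points."""
    return round(after_pct - before_pct, 2)


# First row of the table: F1 rises from 1.01% to 59.50%.
first_row_gain = improvement_points(1.01, 59.50)
```

The improvement column in the table is consistent with this reading: each value equals "F1 with ML" minus "F1 before ML".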

MTI - How are we doing?
[Chart annotations: Focus on Precision versus Recall; Fruition of 2011 Changes]

The Gene Indexing Assistant (GIA)
• An automated tool to assist the indexer in identifying and creating GeneRIFs
  • Evaluate the article
  • Identify genes
  • Make links to Entrez Gene
  • Suggest GeneRIF annotation
• Anticipated Benefits:
  • Increase in speed
  • Increase in comprehensiveness

The NLM Indexing Initiative Team
• Alan R. Aronson (Project Leader)
• James G. Mork (Staff)
• François-Michel Lang (Staff)
• Willie J. Rogers (Staff)
• Antonio J. Jimeno-Yepes (Postdoctoral Fellow)
• J. Caitlin Sticco (Library Associate Fellow)
http://metamap.nlm.nih.gov