Tools and Approaches to NLP in Clinical Notes

Madison J. Myers
Data Scientist, IBM, CODAIT; UCSF, Research Computing Infrastructure

Intro
This presentation covers my personal experience approaching NLP problems in a clinical notes setting. During a short time at UCSF last summer, I was tasked with taking clinical notes as input and outputting each note’s patient ID and UMLS information. That took me down a path where I ended up evaluating the existing tools for UMLS extraction and NLP preprocessing (prior to any machine learning approach). I now work on NLP deep learning problems at IBM and can look back at the problem through a new lens.

NLP in a Clinical Context

Project and Data Background

What is UMLS?
The UMLS (Unified Medical Language System) Metathesaurus is a large, multi-purpose, and multilingual thesaurus that contains millions of biomedical and health-related concepts, their synonymous names, and their relationships.
Extracting terms via the UMLS Metathesaurus is becoming relatively standard for similar use cases. It is also regularly updated, so it provides a current source from which terms can be drawn.
Its uses include: patient care; health services billing; public health statistics; indexing and cataloging of biomedical literature; and basic, clinical, and health services research.
Downloading the dictionary locally, rather than using an API, also keeps the work on the local server and avoids the risk of breaking IRB protocol.

Data
Mock clinical note:
XX/XX/XX xxx--xx-xxxx
Example of a clinical note. This does not actually contain information, but if it were a note it could include various UMLS terminologies, patient descriptions, diagnoses and conclusions regarding a patient’s health. This may or may not be signed by xxxxxx

Clinical notes used by UCSF
● Unstructured text written by nurses, MDs and other medical professionals.
● Range from single sentences to several paragraphs and from anecdotal descriptions to medical diagnoses.
● Contain sensitive information and are therefore not available to the public.
*Clinical notes differ in formatting within UCSF, within the UC system and across different hospitals. There is no universal structure.

Existing Tool Evaluation

Tools Evaluated (did not use)
Tool: Purpose
SNOMED: UMLS extraction
KNIME: Preprocessing, ML
MetaMap: UMLS parsing
MetaMapLite: Dictionary reference
UMLS API: Dictionary reference
CliNER: Named entity recognition
PyMedTermino: UMLS parsing/dictionary
Apache cTAKES: UMLS parsing
Apache OpenNLP: Preprocessing
py-UMLS: parsing
Health Vocabulary REST API: UMLS parsing
Negation Detection

Tools Evaluated (did not use) (continued)
Tool: Purpose
NegEx: Negation detection in a clinical context
NegFinder: Negation detection
DepND: Negation detection
pyConTextNLP: Preprocessing
DEEPEN: Negation detection in a clinical context
Negation Resolution: Negation detection

Tools Used
Tool: Purpose
Python 3 NLTK library: NLP preprocessing
UMLS Metathesaurus (downloaded, installed by MetamorphoSys): Dictionary reference
QuickUMLS (downloaded and installed via GitHub): UMLS parsing; the algorithm that runs through the notes and identifies UMLS terms
Reasoning: a seamless pipeline that I can run locally, connecting the UMLS Metathesaurus to the algorithm that runs through the notes. In other words, as few steps as possible.

Approach

Pipeline Overview

Preprocessing
First steps: chunking, removing stop words, part-of-speech tagging. Tool used: NLTK library (Python).
Next: split the text at punctuation marks in order to feed QuickUMLS sentence-by-sentence data.
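
The sentence-splitting and stop-word steps can be sketched roughly as follows. This is a minimal illustration, not the script from the project: it uses only the Python standard library as a stand-in for NLTK so that it runs without any downloads, the stop-word list and the function names are assumptions, and part-of-speech tagging and chunking are left out.

```python
import re
from typing import List

# Tiny illustrative stop-word list (an assumption; NLTK ships a much longer one).
STOP_WORDS = {"a", "an", "the", "is", "was", "of", "and", "to", "in", "with", "has"}


def split_sentences(note: str) -> List[str]:
    """Split a note after sentence-ending punctuation (., !, ?, ;).

    Naive on purpose: abbreviations such as "Dr." would be split too.
    """
    parts = re.split(r"(?<=[.!?;])\s+", note.strip())
    return [part for part in parts if part]


def remove_stop_words(sentence: str) -> List[str]:
    """Lowercase the sentence, tokenize it, and drop stop words."""
    tokens = re.findall(r"[A-Za-z0-9']+", sentence.lower())
    return [token for token in tokens if token not in STOP_WORDS]


# Made-up example text, not a real clinical note.
note = "Patient has a history of diabetes. Denies chest pain! Follow up in two weeks."
sentences = split_sentences(note)
tokens_per_sentence = [remove_stop_words(s) for s in sentences]
```

Each element of `sentences` is then a unit that can be handed to the UMLS matcher one at a time.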

Negation Detection
“He does have cancer” versus “He doesn’t have cancer”: the difference is critical.
Negated UMLS terms must be removed so that the output does not contain errors. This is especially important considering that notes may one day be used to help prescribe medicine or predict illnesses and diseases.
Because I did not find a fitting tool in the time allotted, I added a few lines to my script to remove the most commonly occurring negations and their respective chunks.
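
A rule of this kind might look roughly like the sketch below. The cue list, the chunk boundaries (commas, semicolons, and "but"), and the function names are all assumptions made for illustration; the original script's rules are not shown in the slides.

```python
import re
from typing import List

# Hypothetical list of common negation cues; a real list would be tuned on the notes.
NEGATION_CUES = ("no", "not", "denies", "denied", "without", "negative for")


def split_chunks(sentence: str) -> List[str]:
    """Break a sentence into rough chunks at commas, semicolons, and 'but'."""
    pieces = re.split(r",|;|\bbut\b", sentence)
    return [piece.strip() for piece in pieces if piece.strip()]


def is_negated(chunk: str) -> bool:
    """Return True if the chunk contains a negation cue or an n't contraction."""
    text = chunk.lower().replace("\u2019", "'")  # normalize curly apostrophes
    if "n't" in text:
        return True
    return any(re.search(r"\b" + re.escape(cue) + r"\b", text) for cue in NEGATION_CUES)


def drop_negated_chunks(sentence: str) -> List[str]:
    """Keep only the chunks that carry no negation cue."""
    return [chunk for chunk in split_chunks(sentence) if not is_negated(chunk)]


kept = drop_negated_chunks("He doesn't have cancer, but he has a persistent cough")
```

Dropping the whole chunk is a blunt instrument: it avoids emitting a negated term, at the cost of also discarding any affirmed term that happens to share the chunk.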

UMLS Extraction
Done with QuickUMLS, with the UMLS Metathesaurus installed by the MetamorphoSys tool.
It took only three lines of code in the Python 3 script!
*Performance was measured in a study comparing different UMLS parsing tools.
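
The extraction step could be wired up along the lines below. This is a sketch under stated assumptions: `StubMatcher`, `extract_concepts`, the patient ID, and the concept identifiers are hypothetical placeholders so the example runs without a UMLS install. The stub mimics the shape of QuickUMLS's `match()` result (a list of candidate lists, each candidate a dict with keys such as `term`, `cui`, and `similarity`); the commented lines show how a real matcher is typically created, with a made-up data path.

```python
from typing import Dict, List, Tuple

# With QuickUMLS installed, the matcher would typically be created like this
# (the path is a hypothetical placeholder for the local QuickUMLS data folder):
#   from quickumls import QuickUMLS
#   matcher = QuickUMLS("/path/to/quickumls/data")
#   matches = matcher.match(text, best_match=True, ignore_syntax=False)


class StubMatcher:
    """Stand-in that returns results in the same shape as QuickUMLS.match()."""

    # Placeholder identifiers, NOT real UMLS CUIs.
    TERMS = {"diabetes": "PLACEHOLDER_CUI_1", "cough": "PLACEHOLDER_CUI_2"}

    def match(self, text: str, best_match: bool = True,
              ignore_syntax: bool = False) -> List[List[Dict]]:
        results = []
        lowered = text.lower()
        for term, cui in self.TERMS.items():
            start = lowered.find(term)
            if start != -1:
                end = start + len(term)
                results.append([{"start": start, "end": end, "ngram": text[start:end],
                                 "term": term, "cui": cui, "similarity": 1.0}])
        return results


def extract_concepts(matcher, patient_id: str,
                     sentences: List[str]) -> List[Tuple[str, str, str]]:
    """Run the matcher sentence by sentence and keep the top candidate per match."""
    rows = []
    for sentence in sentences:
        for candidates in matcher.match(sentence, best_match=True, ignore_syntax=False):
            best = max(candidates, key=lambda c: c["similarity"])
            rows.append((patient_id, best["cui"], best["term"]))
    return rows


rows = extract_concepts(StubMatcher(), "patient-001",
                        ["History of diabetes", "Persistent cough"])
```

Because `extract_concepts` only relies on the `match()` method, swapping the stub for a real QuickUMLS matcher should not require changing the function.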

Roadblocks
● Not all tools are Python or R friendly
● Tools use different versions of languages
● Negation detection performance
● Misspellings
● Connection to UMLS Metathesaurus
● Performance/accuracy
● Speed
● Connection to SQL server
● New tools/lack of documentation and community
● IRB approval (access to notes)

Conclusions and Future Work

Progress was made in the 5-week time frame. More can be done using machine learning, especially for negation, misspellings and UMLS term features. Additional clinical dictionaries can be added to make the pipeline more universally usable.

Implications are huge. We could potentially have a future where your past visits generate recommendations on when to see a doctor, what disease or illness you may or may not have, whether you are likely to revisit a hospital, and whether you are likely to be affected by the flu, a heart attack, diabetes, etc. Medicine would be much more proactive and less reactive.

Thanks!
Contact me: Madison J. Myers
Madison.Jordan.Myers@ibm.com
madisonjmyers@gmail.com
Check out my paper on the subject on LinkedIn: www.linkedin.com/in/madisonjmyers
IBM CODAIT: codait.org