Using text mining methods to detect a clinical infection

Milena Gianfrancesco, PhD, MPH
Postdoctoral Researcher, Division of Rheumatology, UCSF School of Medicine

Suzanne Tamang, PhD
Assistant Faculty Director, Data Science, Stanford Center for Population Health Sciences

05/18/2018

Zoster infection a.k.a. “shingles”
Reactivation of the virus that causes chickenpox: varicella zoster virus
§ 1 out of every 3 people will develop zoster in their lifetime.
§ Anyone with a history of chickenpox can get zoster, but the risk generally increases with age.
§ Patients on an immunosuppressive drug have a higher risk of developing zoster, and it can be more severe.
§ Limited knowledge available to determine which patients are at highest risk for zoster, information critical for implementing preventive strategies such as vaccination or antiviral prophylaxis.
Performance of machine learning methods using electronic medical records to detect and predict a clinical infection, 10/2/2020

Clinical reporting of zoster
Zoster infection is often treated outside of specialty clinic
§ Sometimes entered as diagnosis (i.e., ICD code); more often mentioned in clinical note
§ ICD codes may underestimate prevalence
§ Potential bias towards more severe cases

Goals of project
§ Apply and validate a text mining system to extract incident zoster infection from clinical notes
  • Do ICD codes truly associate with more severe cases of zoster?

Study population
§ UCSF EHR
§ Data from June 1, 2012 – November 5, 2016 for 800,000+ individuals
  • Structured tables
  • Unstructured data (e.g., clinical notes)

Study sample
Individuals prescribed an immunosuppressant medication, and > 2 encounters in EHR 30 days apart.
§ 31 immunosuppressant medications (IM) included
§ N = 36,042 IM orders
§ N = 16,344 unique individuals
§ N = 259 cases identified via ICD code
EHR: N ~ 800,000; zoster prevalence in general population ~ 0.5%
IM: N ~ 16,000; zoster prevalence in IM population ~ 1.6%
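
As a quick arithmetic check, the ~1.6% figure is consistent with the counts on this slide if prevalence is taken to mean ICD-coded cases divided by unique individuals in the IM cohort. That formula is an assumption; the slide does not state how the percentage was computed.

```python
# Counts are taken from the slide. The formula (cases / unique individuals)
# is an assumption, not something the slide states.
icd_cases = 259
im_individuals = 16_344


def prevalence_pct(cases: int, population: int) -> float:
    """Return prevalence as a percentage of the population."""
    return 100.0 * cases / population


im_prevalence = prevalence_pct(icd_cases, im_individuals)
print(f"ICD-coded zoster prevalence in IM cohort: {im_prevalence:.1f}%")
```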

Demographics of participants (n = 16,344)

Characteristic                 N (%) or Mean (SD)
Female                         8,506 (52%)
Age                            50.31 (19.60)
Race
  White                        8,304 (51%)
  Asian                        2,206 (13%)
  Black                        1,111 (7%)
  Other                        3,810 (23%)
  Unknown/Declined             859 (5%)
Ethnicity
  Non-Hispanic or Latino       12,589 (77%)
  Hispanic or Latino           2,867 (18%)
  Unknown/Declined             888 (5%)

[CL]inical [EVE]nt [R]ecognition

Challenges of Clinical Text Analysis
§ Clinical notes are not SOAP notes
§ EMR > text (IE, not NLP!)
§ Boundary detection is challenging (context window)
  • End to end? Maybe not…
    ‒ CLEVER: assume there are important local contexts
§ Synonyms, lexical variants (seed terms)
§ Highly ambiguous (task specific lexicon)
§ Semantic modifiers (base classes)
§ Subgrammar, sublanguage (word embeddings)
  • Acronyms
  • Colloquial terms
  • Out of UMLS vocabulary
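
The "local context" idea can be illustrated with a minimal sketch: find each seed term in a note and keep a fixed-width window of tokens around it. This is not CLEVER's implementation. The seed terms, the tokenizer regex, the window width, and the function name `context_windows` are all assumptions made for illustration, and only single-token seed terms are handled.

```python
import re

# Hypothetical seed terms; the actual zoster dictionary is not reproduced in the slides.
SEED_TERMS = {"zoster", "shingles"}


def context_windows(text, seed_terms=SEED_TERMS, width=5):
    """Yield (term, left_tokens, right_tokens) for each seed-term token in text.

    Tokens are words or single punctuation marks; matching is case-insensitive.
    """
    tokens = re.findall(r"\w+|[^\w\s]", text)
    for i, tok in enumerate(tokens):
        if tok.lower() in seed_terms:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            yield tok.lower(), left, right


for term, left, right in context_windows(
    "Pt denies history of shingles. No rash.", width=3
):
    print(term, left, right)
```

A downstream step could then look for modifier terms (negation, screening, family history) inside each window rather than across the whole note.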

CLEVER Pipeline
[Figure: CLEVER pipeline diagram]
Stages: 1. Terminology Construction; 2. Pre-processing; 3. Extraction; 4. Patient-level Reporting
Components: N-gram Ranker, Concept Recognizer*, Tokenizer, Section Detector, Class Sequencer, Rule-based Extractor, Statistical Extractor
Inputs: Unstructured EHR Data (clinical text); Structured Encounter Data (patient id, time offset, age, gender, CPT codes, ICD codes, …)
Patient selection: Qualifying Criteria, Eligible Patients (p0 … pn)
Candidate Event Matrix: candidate id, time offset, patient id, note type, note section, target term, target sequence, class sequence, ...
Outputs: Event-level Labels (0/1); Patient Labels (PID p0 … pn)
* We do not use a distinct concept extraction step in this work, but files for the purpose are produced by CLEVER

Example of zoster dictionary

Labeled Output: “Negative”
SNIPPET: . He was admitted [DATE] for evaluation and management of likely varicella zoster infection. His symptoms began as L-sided mouth pain ~6-7 days PTA which has become progressively worse. He was initially seen at the…
CLEVER ANNOTATION: NEGATIVE|SCREEN|DOT_SCREEN_#VCV#_DOT|zoster infection|PID|NID|Consults|DATETIME|7|VCV|807|1059|UK|NULL|period:DOT:1:986:73,evaluation:SCREEN:686:1013:46,period:DOT:1:1075:16,period:DOT:1:1168:109

Labeled Output: Positive
SNIPPET: Would continue supportive care and refrain from using nephrotoxic agents at this time until pt demonstrates renal recovery. Zoster in immunocompormised: would recommend decrease dose of Acyclovir to 350 mg Q8 and treat for the minimal treatment time.
CLEVER ANNOTATION: POSITIVE|VCV|PT_DOT_#VCV#_PUNCT_DOT|zoster|PID|NID|Ambulatory Progress Notes|DATETIME|3|VCV|745|3960|ct|3326|pt:PT:8:3928:32,period:DOT:1:3958:2,colon:PUNCT:4:3987:27,period:DOT:1:4084:124
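
Annotation lines like the two above are pipe-delimited, with a final comma-separated list of colon-delimited context entries. A minimal parsing sketch follows. The slides do not document the field layout, so the key names used here (`label`, `trigger_class`, `class_sequence`, `target_term`) are guesses based on position, the remaining fields are kept as an unnamed list, and the numbers inside each context entry are left uninterpreted.

```python
def parse_annotation(line):
    """Split one pipe-delimited annotation line into a dict.

    Key names are hypothetical; only the first four fields and the trailing
    context list are given names, everything in between is kept raw.
    """
    fields = [f.strip() for f in line.split("|")]
    context = []
    for item in fields[-1].split(","):
        parts = [p.strip() for p in item.split(":")]
        if len(parts) < 2:
            continue  # skip empty or malformed entries
        term, tag, *numbers = parts
        context.append((term, tag, tuple(int(n) for n in numbers)))
    return {
        "label": fields[0],
        "trigger_class": fields[1],
        "class_sequence": fields[2],
        "target_term": fields[3],
        "other_fields": fields[4:-1],
        "context": context,
    }


example = (
    "POSITIVE|VCV|PT_DOT_#VCV#_PUNCT_DOT|zoster|PID|NID|"
    "Ambulatory Progress Notes|DATETIME|3|VCV|745|3960|ct|3326|"
    "pt:PT:8:3928:32,period:DOT:1:3958:2,"
    "colon:PUNCT:4:3987:27,period:DOT:1:4084:124"
)
record = parse_annotation(example)
print(record["label"], record["target_term"], record["context"][0])
```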

Zoster case detection using CLEVER
§ Generated a ‘dictionary’ of terms associated with zoster to assist in labeling notes
§ Ran CLEVER on all notes
§ Compiled files for each patient
  • All positive mentions
  • All negative mentions
§ Will join with structured data (ICD codes, meds, labs, age, sex, race, etc.) to help identify case status
§ Need to determine heuristic (i.e., # positive mentions / all mentions) to guide in labeling as “case”
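
The ratio heuristic named in the last bullet could be sketched as follows. The slide leaves the heuristic undetermined, so the threshold of 0.5, the minimum of one positive mention, and the function name `label_cases` are placeholder assumptions rather than values from the study.

```python
from collections import Counter


def label_cases(mentions, threshold=0.5, min_positive=1):
    """Label each patient as a case from mention-level labels.

    mentions: iterable of (patient_id, label), label being "POSITIVE" or "NEGATIVE".
    A patient is a case when the share of positive mentions is at least
    `threshold` and there are at least `min_positive` positive mentions.
    Both cutoffs are illustrative placeholders.
    """
    positive, total = Counter(), Counter()
    for patient_id, label in mentions:
        total[patient_id] += 1
        if label == "POSITIVE":
            positive[patient_id] += 1
    return {
        pid: positive[pid] >= min_positive and positive[pid] / total[pid] >= threshold
        for pid in total
    }


mentions = [
    ("p0", "POSITIVE"), ("p0", "POSITIVE"), ("p0", "NEGATIVE"),
    ("p1", "NEGATIVE"),
    ("p2", "POSITIVE"), ("p2", "NEGATIVE"),
]
print(label_cases(mentions))
```

In practice the cutoffs would need to be tuned against a manually reviewed sample of charts, and the result joined with the structured data listed above.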

Conclusion
§ Further refinement of CLEVER to detect all types of infections will assist in developing a highly accurate pipeline for adverse event detection
§ Better phenotyping of outcomes will assist future studies in identifying risk factors to prevent occurrences of adverse events

Acknowledgements
Funding:
§ AHRQ: R01 HS024412 (PI: Yazdany)
§ NIAMS: F32 AR070585 (PI: Gianfrancesco)
Rheumatology Quality and Informatics Laboratory (QUIL)
§ Jinoos Yazdany
§ Gabriela Schmajuk
§ Dana Ludwig
§ Steve Shiboski
§ Laura Trupin
§ Michael Evans
§ Julia Kay
§ Zara Izadi
§ Jing Li