Using text mining methods to detect a clinical infection
Milena Gianfrancesco, PhD, MPH
Postdoctoral Researcher, Division of Rheumatology, UCSF School of Medicine
05/18/2018
Suzanne Tamang, PhD
Assistant Faculty Director, Data Science, Stanford Center for Population Health Sciences
Zoster infection, a.k.a. “shingles”
Reactivation of the virus that causes chickenpox: varicella zoster virus
§ 1 out of every 3 people will develop zoster in their lifetime.
§ Anyone with a history of chickenpox can get zoster, but the risk generally increases with age.
§ Patients on an immunosuppressive drug have a higher risk of developing zoster, and it can be more severe.
§ Limited knowledge available to determine which patients are at highest risk for zoster, information critical for implementing preventive strategies such as vaccination or antiviral prophylaxis.
Performance of machine learning methods using electronic medical records to detect and predict a clinical infection 10/2/2020
Clinical reporting of zoster
Zoster infection is often treated outside of specialty clinic
§ Sometimes entered as diagnosis (i.e. ICD code); more often mentioned in clinical note
§ ICD codes may underestimate prevalence
§ Potential bias towards more severe cases
Goals of project
§ Apply and validate a text mining system to extract incident zoster infection from clinical notes
• Do ICD codes truly associate with more severe cases of zoster?
Study population
§ UCSF EHR
§ Data from June 1, 2012 to November 5, 2016 for 800,000+ individuals
• Structured tables
• Unstructured data (e.g. clinical notes)
Study sample
Individuals prescribed an immunosuppressant medication, and > 2 encounters in EHR 30 days apart.
§ 31 immunosuppressant medications (IM) included
§ N = 36,042 IM orders
§ N = 16,344 unique individuals
§ N = 259 cases identified via ICD code
[Figure: EHR population (N ~ 800,000), zoster prevalence in general population ~ 0.5%; IM subset (N ~ 16,000), zoster prevalence in IM population ~ 1.6%]
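The IM-population prevalence quoted on this slide can be checked from the counts given. A minimal Python sketch, using only the slide's numbers (259 ICD-coded cases among 16,344 individuals); the function name is illustrative, not part of the study's code:

```python
def prevalence_percent(cases: int, population: int) -> float:
    """Return prevalence as a percentage of the population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 100.0 * cases / population


# Counts taken from the slide: ICD-coded zoster cases among IM patients.
icd_cases = 259
im_patients = 16_344

im_prevalence = prevalence_percent(icd_cases, im_patients)
print(f"ICD-based zoster prevalence in IM population: {im_prevalence:.1f}%")
```

Because this figure counts only ICD-coded cases, it is a lower bound if zoster is more often recorded in free-text notes, which is the motivation for the text mining step.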
Demographics of participants (n = 16,344)

Characteristic                 N (%) or Mean (SD)
Female                         8,506 (52%)
Age                            50.31 (19.60)
Race
  White                        8,304 (51%)
  Asian                        2,206 (13%)
  Black                        1,111 (7%)
  Other                        3,810 (23%)
  Unknown/Declined             859 (5%)
Ethnicity
  Non-Hispanic or Latino       12,589 (77%)
  [label missing]              2,867 (18%)
  Unknown/Declined             888 (5%)
[CL]inical [EVE]nt [R]ecognition
Challenges of Clinical Text Analysis
§ Clinical notes are not SOAP notes
§ EMR > text (IE, not NLP!)
§ Boundary detection is challenging (context window)
• End to end? Maybe not…
‒ CLEVER: assume there are important local contexts
§ Synonyms, lexical variants (seed terms)
§ Highly ambiguous (task specific lexicon)
§ Semantic modifiers (base classes)
§ Subgrammar, sublanguage (word embeddings)
• Acronyms
• Colloquial terms
• Out of UMLS vocabulary
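The local-context assumption can be illustrated with a small sketch: locate each seed term in a note and keep a fixed number of tokens on either side. This is not CLEVER's implementation; the seed terms, the window size, the simple tokenizer, and the function name are all hypothetical, and only single-token seed terms are handled.

```python
import re


def context_windows(text, seed_terms, window=5):
    """Return (term, left_context, right_context) for each seed-term hit."""
    # Split into lowercase word tokens and single punctuation tokens.
    tokens = re.findall(r"\w+|[^\w\s]", text.lower())
    seeds = {t.lower() for t in seed_terms}
    hits = []
    for i, tok in enumerate(tokens):
        if tok in seeds:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            hits.append((tok, left, right))
    return hits


# Hypothetical note text and seed terms, for illustration only.
note = "Admitted for evaluation of likely zoster. No rash today."
hits = context_windows(note, ["zoster", "shingles"], window=3)
```

A fixed token window is the crudest possible boundary; the slide's point is that choosing this boundary well is hard, which is why section and sentence cues matter.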
CLEVER Pipeline
[Figure: CLEVER pipeline diagram. Stages: 1. Terminology Construction; 2. Pre-processing; 3. Extraction; 4. Patient-level Reporting. Components shown: N-gram Ranker, Concept Recognizer*, Tokenizer, Section Detector, Class Sequencer, Rule-based Extractor, Statistical Extractor. Inputs: unstructured EHR data (clinical text) and structured encounter data (patient id, time offset, age, gender, CPT codes, ICD codes, …) for eligible patients meeting the qualifying criteria. Intermediate: Candidate Event Matrix (candidate id, time offset, patient id, note type, note section, target term, target sequence, class sequence, …). Outputs: event-level labels, combined into patient labels.]
* We do not use a distinct concept extraction step in this work, but files for the purpose are produced by CLEVER.
Example of zoster dictionary
Labeled Output: “Negative”
SNIPPET: . He was admitted [DATE] for evaluation and management of likely varicella zoster infection. His symptoms began as L-sided mouth pain ~6-7 days PTA which has become progressively worse. He was initially seen at the…
CLEVER ANNOTATION: NEGATIVE|SCREEN|DOT_SCREEN_#VCV#_DOT|zoster infection|PID|NID|Consults|DATETIME|7|VCV|807|1059|UK|NULL|period:DOT:1:986:73,evaluation:SCREEN:686:1013:46,period:DOT:1:1075:16,period:DOT:1:1168:109
Labeled Output: Positive
SNIPPET: Would continue supportive care and refrain from using nephrotoxic agents at this time until pt demonstrates renal recovery. Zoster in immunocompormised: would recommend decrease dose of Acyclovir to 350 mg Q8 and treat for the minimal treatment time.
CLEVER ANNOTATION: POSITIVE|VCV|PT_DOT_#VCV#_PUNCT_DOT|zoster|PID|NID|Ambulatory Progress Notes|DATETIME|3|VCV|745|3960|ct|3326|pt:PT:8:3928:32,period:DOT:1:3958:2,colon:PUNCT:4:3987:27,period:DOT:1:4084:124
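The annotation lines in the two labeled-output examples are pipe-delimited, with a final comma-separated list of context tags. A minimal Python sketch of reading one such line follows. The field names are assumptions inferred from the examples and from the Candidate Event Matrix columns, not a documented CLEVER schema; fields whose meaning is not evident from the slides are left unparsed.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    label: str            # e.g. POSITIVE / NEGATIVE (assumed meaning)
    trigger_class: str    # e.g. SCREEN, VCV (assumed meaning)
    class_sequence: str   # e.g. DOT_SCREEN_#VCV#_DOT
    target_term: str      # e.g. "zoster infection"
    note_type: str        # e.g. Consults
    context: list = field(default_factory=list)


def parse_annotation(line: str) -> Annotation:
    """Split a pipe-delimited annotation line into a few named fields."""
    parts = [p.strip() for p in line.split("|")]
    if len(parts) < 8:
        raise ValueError("expected at least 8 pipe-delimited fields")
    context = []
    # Last field: comma-separated tags shaped like word:CLASS:n:n:n.
    for item in parts[-1].split(","):
        word, cls, *numbers = [x.strip() for x in item.split(":")]
        context.append((word, cls, [int(n) for n in numbers]))
    return Annotation(parts[0], parts[1], parts[2], parts[3], parts[6], context)


line = ("NEGATIVE|SCREEN|DOT_SCREEN_#VCV#_DOT|zoster infection|PID|NID|Consults|"
        "DATETIME|7|VCV|807|1059|UK|NULL|"
        "period:DOT:1:986:73,evaluation:SCREEN:686:1013:46,"
        "period:DOT:1:1075:16,period:DOT:1:1168:109")
ann = parse_annotation(line)
```

In the negative example, the SCREEN tag on "evaluation" appears to be what drives the label: the mention is an evaluation for zoster rather than a confirmed event.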
Zoster case detection using CLEVER
§ Generated a ‘dictionary’ of terms associated with zoster to assist in labeling notes
§ Ran CLEVER on all notes
§ Compiled files for each patient
• All positive mentions
• All negative mentions
§ Will join with structured data (ICD codes, meds, labs, age, sex, race, etc.) to help identify case status
§ Need to determine heuristic (i.e. # positive mentions / all mentions) to guide in labeling as “case”
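The ratio heuristic mentioned in the last bullet could look like the sketch below. The slides state that the heuristic is still to be determined, so the 0.5 threshold, the input shape, and the function name are purely hypothetical placeholders.

```python
from collections import Counter


def label_patients(mentions, threshold=0.5):
    """Label each patient a case if positive / all mentions >= threshold.

    mentions: iterable of (patient_id, label) pairs, where label is
    "POSITIVE" or "NEGATIVE".
    """
    positive = Counter()
    total = Counter()
    for patient_id, label in mentions:
        total[patient_id] += 1
        if label == "POSITIVE":
            positive[patient_id] += 1
    return {pid: positive[pid] / total[pid] >= threshold for pid in total}


# Made-up mentions for three hypothetical patients.
mentions = [
    ("p0", "POSITIVE"), ("p0", "POSITIVE"), ("p0", "NEGATIVE"),
    ("p1", "NEGATIVE"), ("p1", "NEGATIVE"),
    ("p2", "POSITIVE"),
]
labels = label_patients(mentions, threshold=0.5)
```

A pure ratio treats one positive mention out of one the same as ten out of ten, so in practice a minimum mention count or the joined structured data (ICD codes, medications) would likely be needed alongside it.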
Conclusion
§ Further refinement of CLEVER to detect all types of infections will assist in developing a highly accurate pipeline for adverse event detection
§ Better phenotyping of outcomes will assist future studies in identifying risk factors to prevent occurrences of adverse events
Acknowledgements
Funding:
§ AHRQ: R01 HS024412 (PI: Yazdany)
§ NIAMS: F32 AR070585 (PI: Gianfrancesco)
Rheumatology Quality and Informatics Laboratory (QUIL)
§ Jinoos Yazdany
§ Gabriela Schmajuk
§ Dana Ludwig
§ Steve Shiboski
§ Laura Trupin
§ Michael Evans
§ Julia Kay
§ Zara Izadi
§ Jing Li