A Temporal Visualization of Chronic Obstructive Pulmonary Disease

A Temporal Visualization of Chronic Obstructive Pulmonary Disease Progression Using Deep Learning and Free-Text Clinical Reports
Meihan Wan, Chunlei Tang, Joseph M. Plasek, Haohan Zhang, Min Jeoung Kang, Haokai Sheng, Yun Xiong, David W. Bates, Li Zhou

Learning Objectives
After participating in this oral presentation, the learner should be better able to:
• Handle the time dimension in EHRs
• Use irregular time-lapse segments to demonstrate disease progression
DGIM Research Day 2019

Chronic Obstructive Pulmonary Disease
COPD is the third leading chronic disease in the United States and can take well over ten years to evolve from mild to very severe. The Global Initiative for Chronic Obstructive Lung Disease (GOLD) provides the GOLD guidelines, which physicians use in managing COPD.

Free-Text Clinical Document & Its Time
• Irregular Visits: The temporal granularity of a patient’s record may vary significantly over different time periods.
• Incomplete Records: Clinical data may not be available for the entire progression of COPD.
• Disease Progression Heterogeneity: There is no natural alignment between different patients, as progression rates vary.
• Discrete Observations: Although disease progression is a continuous-time process, the patient is only observed at discrete time points with varied intervals.
  • 1 day for an office visit
  • a few days for a hospitalization

Constant Time Segment

A Four-Layer Deep Learning Model
We used a flatten layer to unfold the time-step outputs into a single vector, followed by a dense layer that combines the time segments in a fully connected network.
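The flatten and dense steps can be illustrated with a minimal pure-Python sketch. It is not the model from the talk: the input shape (3 time steps by 2 features), the weights, and the function names `flatten` and `dense` are all hypothetical, chosen only to show how unfolding turns a per-time-step matrix into one vector that a fully connected layer can combine.

```python
def flatten(matrix):
    """Unfold a (time steps x features) matrix into one flat vector."""
    return [value for row in matrix for value in row]


def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum over all inputs per output unit."""
    return [
        sum(w * x for w, x in zip(unit_weights, inputs)) + bias
        for unit_weights, bias in zip(weights, biases)
    ]


# Hypothetical per-time-step outputs: 3 time steps, 2 features each.
step_outputs = [[0.2, -0.1], [0.4, 0.3], [-0.5, 0.6]]

flat = flatten(step_outputs)   # 6 values, one per (time step, feature) pair
weights = [[0.5] * len(flat)]  # a single output unit with arbitrary weights
biases = [0.0]
combined = dense(flat, weights, biases)
```

Because every entry of the flattened vector feeds the dense unit, each time step can contribute to the combined output.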

Capturing Time-Lapse Segments
We used a sigmoid activation function to output a {0, 1}-sequence, in which we treat each run of two or more consecutive zeros or ones as a time segment.
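The run-based rule above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the 0.5 threshold used to turn sigmoid outputs into zeros and ones, and the names `binarize` and `time_segments`, are hypothetical.

```python
from itertools import groupby


def binarize(probabilities, threshold=0.5):
    """Map sigmoid outputs to a {0, 1}-sequence (the threshold is an assumption)."""
    return [1 if p >= threshold else 0 for p in probabilities]


def time_segments(bits, min_run=2):
    """Return (value, start, end) for each run of min_run or more equal values."""
    segments = []
    index = 0
    for value, run in groupby(bits):
        length = sum(1 for _ in run)
        if length >= min_run:
            segments.append((value, index, index + length - 1))
        index += length
    return segments


bits = binarize([0.1, 0.3, 0.8, 0.9, 0.7, 0.2, 0.6, 0.95])
segments = time_segments(bits)
```

Here `bits` is `[0, 0, 1, 1, 1, 0, 1, 1]`, so the isolated zero at position 5 does not form a segment, while the three longer runs do.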

Dataset {P, R, C, M}
• Dataset P: the PHYSICIAN INTERPRETATION section of 78,489 pulmonary notes for 2,431 unique patients
• Dataset R: two main sections, FINDINGS and IMPRESSION, of 1,893,498 chest X-ray radiology reports for 13,414 unique patients
• Dataset C: the ABNORMAL ECG section of 1,029,363 cardiology reports for 13,918 patients
• Dataset M: Datasets P, R, and C merged using a heuristic merger that inserts each note into its appropriate chronological place in a corpus initialized to represent the most prevalent domain
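A chronological merge of this kind could look like the sketch below. It assumes that "most prevalent domain" means the domain with the most notes and that each note carries a date; the function name `merge_corpora`, the tuple layout, and the sample notes are hypothetical, and the details of the actual heuristic merger are not given on the slide.

```python
from bisect import insort
from datetime import date


def merge_corpora(corpora):
    """Merge {domain: [(date, text), ...]} into one chronologically ordered list."""
    # Assumption: the most prevalent domain is the one with the most notes.
    base_domain = max(corpora, key=lambda domain: len(corpora[domain]))
    merged = sorted(
        (when, base_domain, text) for when, text in corpora[base_domain]
    )
    for domain, notes in corpora.items():
        if domain == base_domain:
            continue
        for when, text in notes:
            # insort keeps the list sorted, so each note lands in date order.
            insort(merged, (when, domain, text))
    return merged


# Made-up notes for a single patient, for illustration only.
corpora = {
    "P": [(date(2015, 3, 1), "pulmonary note")],
    "R": [
        (date(2015, 1, 1), "chest X-ray report 1"),
        (date(2015, 6, 1), "chest X-ray report 2"),
        (date(2016, 1, 1), "chest X-ray report 3"),
    ],
    "C": [(date(2015, 4, 1), "cardiology report")],
}
merged = merge_corpora(corpora)
domain_order = [domain for _, domain, _ in merged]
```

In this toy input the radiology notes seed the corpus, and the pulmonary and cardiology notes are slotted between them by date.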

Results on Dataset M
• High Prediction Accuracy: Our proposed model achieved an average prediction accuracy of 80% on our corpus.

Results on Dataset P
Regular time segments (pre-set time window + delta window)
Segment   Days before death   COPD stage documented by the experts
1         [0, 70]             IV
2         [71, 140]           III
3         [141, 210]          II-2
4         [211, 280]          II-1
5         [281, 350]          II-1
6         [351, 420]          II-2
7         [421, 490]          II-1
8         [491, 560]          II-1
9         [560, 630]          I
LSTM irregular time segments
Segment   Days before death
1         [0, 65]
2         [55, 150]
3         [145, 270]
4         [262, 360]
5         [337, 484]
6         [450, 552]
7         [449, 630]
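To make the contrast between the two segmentations concrete, the sketch below looks up which segment a given day falls in. The 70-day window is read off the regular-segment rows; the delta window is not modeled, and the names `regular_segment` and `irregular_segments` are hypothetical. Day 560 appears in both rows 8 and 9 of the slide; this sketch assigns it to segment 8.

```python
WINDOW_DAYS = 70  # window length inferred from the regular-segment rows

# Irregular segments as listed on the slide: segment -> (start, end) in days before death.
LSTM_SEGMENTS = {
    1: (0, 65),
    2: (55, 150),
    3: (145, 270),
    4: (262, 360),
    5: (337, 484),
    6: (450, 552),
    7: (449, 630),
}


def regular_segment(days_before_death):
    """Index of the fixed 70-day window containing the day (day 0 maps to segment 1)."""
    if days_before_death < 0:
        raise ValueError("days_before_death must be non-negative")
    # Ceiling division with integers; the delta window is not modeled here.
    return max(1, -(-days_before_death // WINDOW_DAYS))


def irregular_segments(days_before_death):
    """All LSTM segments whose interval contains the day; intervals may overlap."""
    return [
        number
        for number, (start, end) in LSTM_SEGMENTS.items()
        if start <= days_before_death <= end
    ]


day = 60
regular = regular_segment(day)
irregular = irregular_segments(day)
```

A day always maps to exactly one regular window, whereas the irregular segments overlap, so a day near a boundary can belong to more than one of them.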


- Slides: 12