
A Machine Learning Framework for Predicting Frequent Emergency Department Users Using Claims Data
Summer (Xia) Hu, Margret Bjarnadottir, Sean Barnes, Bruce Golden
University of Maryland, College Park
POMS Conference, May 6, 2016, Orlando, Florida

Background: Frequent Emergency Department (ED) Usage
Ø Negative Effects – ED Overcrowding
• Patients
  – Higher risks
  – Prolonged wait times
  – Higher abandonment rates
  – Higher rates of dissatisfaction
• Providers
  – Higher rates of medical errors
  – Lower productivity and morale
  – Reduced ability to respond to mass casualty incidents
• Insurance Company
  – Much higher payments (compared with regular primary care visits)
Ø High ED usage puts stress on the ED system as well as on the payer.

Background: Frequent ED Users & ED Jumpers
Ø Types of ED Users
• Frequent ED users: members with ≥ 4 ED visits in a single year
• Non-frequent ED users: members with < 4 ED visits in a single year
• ED Jumpers: members who are non-frequent ED users in Year 1 and frequent ED users in Year 2
Ø Facts from Claims Data
• Frequent ED users constitute 21% of the member population
• Yet they account for 78% of ED visits
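The user types above can be written as a small labeling rule. This is a minimal sketch rather than the authors' code: the function names and the labels other than "jumper" are hypothetical, and the reading of a Jumper as non-frequent in Year 1 and frequent in Year 2 is an assumption based on the slide's Year 1 / Year 2 diagram.

```python
FREQUENT_THRESHOLD = 4  # from the slide: >= 4 ED visits in a single year


def is_frequent(ed_visits: int) -> bool:
    """True when a member's yearly ED visit count meets the threshold."""
    return ed_visits >= FREQUENT_THRESHOLD


def classify_member(visits_year1: int, visits_year2: int) -> str:
    """Label a member from two consecutive yearly ED visit counts."""
    first, second = is_frequent(visits_year1), is_frequent(visits_year2)
    if not first and second:
        return "jumper"               # assumed definition, see lead-in
    if first and second:
        return "stays frequent"       # hypothetical label
    if first and not second:
        return "drops to non-frequent"  # hypothetical label
    return "stays non-frequent"       # hypothetical label
```

For example, a member with 1 ED visit in Year 1 and 5 in Year 2 would be labeled a jumper under this reading.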

Objective
• Predict potential frequent ED users and ED Jumpers based on their claims records from the previous year.
• Identify characteristics of frequent users and Jumpers.

Method Overview
Ø Descriptive Analysis
Ø Prediction – Frequent ED Users and ED Jumpers
Ø User Segmentation – via Clustering

Data & Preprocessing
Ø Raw Data Summary
• Five datasets: Medical, Pharmacy, Lab, Mental Health (MH), Dental
• Four years (Jan. 2009 – Dec. 2012)
Ø Data Processing: transform claim-based records into patient-based yearly profiles
• Claim-based raw data
• Eligible enrollment (≥ 350 days of enrollment for 2 consecutive years)
• Information uniqueness (unique gender? unique birth year?)
• Feature extraction
• Claim aggregation
• Patient-based yearly profiles
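The processing steps above (enrollment filter, then claim aggregation into yearly profiles) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the data layouts, function names, and feature names are hypothetical, and the information-uniqueness check and detailed feature extraction are omitted.

```python
from collections import defaultdict

MIN_ENROLLED_DAYS = 350  # from the slide: >= 350 days for 2 consecutive years


def eligible_members(enrolled_days, year):
    """Members enrolled >= MIN_ENROLLED_DAYS in both `year` and `year + 1`.

    enrolled_days maps (member_id, year) -> days enrolled (hypothetical layout).
    """
    members = {m for (m, y) in enrolled_days if y == year}
    return {
        m for m in members
        if enrolled_days[(m, year)] >= MIN_ENROLLED_DAYS
        and enrolled_days.get((m, year + 1), 0) >= MIN_ENROLLED_DAYS
    }


def yearly_profiles(claims, members):
    """Aggregate claim rows into one visit-count profile per member-year.

    Each claim is a dict with "member_id", "year", and "type" keys
    (hypothetical layout); the result maps (member_id, year) -> counts.
    """
    profiles = defaultdict(lambda: defaultdict(int))
    for claim in claims:
        if claim["member_id"] in members:
            key = (claim["member_id"], claim["year"])
            profiles[key]["n_" + claim["type"] + "_visits"] += 1
    return {key: dict(counts) for key, counts in profiles.items()}
```

Keeping the profile keyed by (member, year) makes it straightforward to pair an observation-year profile with the following year's ED visit count.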

Final Data
• Result: 164,402 member files, 439 features per year
Ø Feature Overview (features are based on the observation year)
• Profile
  – Member masked ID
  – Sex, age, birth year
  – Profile year
  – Years of consecutive enrollment
• ED
  – Number of ED visits
  – ED intensity group
  – Number of different ED complaints (based on the CCS category)
  – Number of different ED vendors
  – Number of ED visits divided by number of ED vendors
  – Indicator of any mental health ED visits
  – Number of ED visits per general diagnosis group (19 variables)
  – Number of ED visits per CCS diagnosis group (287 variables)
  – NYU ED usage probability (9 variables)
• Medical
  – Number of different chronic diseases (based on the CCS category)
  – Number of unique chronic visits
  – Number of visits per chronic disease (100 chronic diseases)
  – Number of outpatient visits
  – Number of inpatient visits
  – Number of primary care visits
• Dental
  – Number of dental visits
  – Total number of unique dental providers (Top 20 CCS)
  – Total number of unique dental visits
• Mental Health
  – Total number of unique MH providers
  – Number of visits per MH disease
  – Number of MH visits divided by number of unique MH providers
  – Indicator of any mental health visits
• Pharmacy
  – Number of different pharmacies
  – Number of unique medications
  – Total days of medication supply
  – Total days of opium medication supply
Ø Number of Result Year's ED Visits
• Most members have very few ED visits.

Descriptive Analysis: Influence of Number of ED Visits from Observation Year
Ø Share of members by ED user type in the observation year and in the result year
Observation Year    Result Year: Non-frequent    Result Year: Frequent    Total
Non-frequent        67%                          12% (Jumper)             79%
Frequent            12%                          9%                       21%
• As expected, the number of ED visits in the observation year explains the result year's ED visits best
• Among frequent ED users, 43% remain frequent ED users in the result year
• There is a linear trend between the numbers of ED visits in two consecutive years
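The percentages on this slide can be tabulated from pairs of yearly ED visit counts. The sketch below is illustrative only (function names are hypothetical); it also shows the arithmetic behind the 43% figure: 9% of members are frequent in both years out of the 21% who are frequent in the observation year, and 9 / 21 ≈ 0.43.

```python
from collections import Counter


def transition_shares(visit_pairs, threshold=4):
    """Share of members in each (frequent in year 1, frequent in year 2) cell.

    visit_pairs is an iterable of (visits_year1, visits_year2) tuples.
    """
    counts = Counter((v1 >= threshold, v2 >= threshold) for v1, v2 in visit_pairs)
    total = sum(counts.values())
    return {cell: n / total for cell, n in counts.items()}


def persistence_rate(shares):
    """Among year-1 frequent users, the share who stay frequent in year 2."""
    stay = shares.get((True, True), 0.0)
    leave = shares.get((True, False), 0.0)
    if stay + leave == 0:
        raise ValueError("no frequent users in the first year")
    return stay / (stay + leave)


# Cells taken from the slide: 9% frequent in both years, 12% frequent only in year 1.
slide_rate = persistence_rate({(True, True): 0.09, (True, False): 0.12})
```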

Predict Frequent ED Users: Supervised Machine Learning
Ø Data Setup
• X: 439 features from the 1st year for each eligible member
• Y: frequent or non-frequent ED user in the 2nd year?
Ø Data Partition
• Training set (70%)
• Validation set (15%)
• Test set (15%)
Ø Supervised Machine Learning
• Unbalanced binary classification problem: frequent (21%) vs. non-frequent (79%)
• Six machine learning algorithms:
  – Logistic regression (with or without regularization), Naïve Bayes, Decision Tree (CART), Boosted Tree (C5.0), Random Forest
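The 70% / 15% / 15% partition can be sketched as a seeded random split. This is a minimal illustration, not the authors' procedure: the function name and seed are hypothetical, and the slides do not say whether the split was stratified by class, so none is applied here.

```python
import random


def split_members(member_ids, train=0.70, validation=0.15, seed=0):
    """Randomly partition member IDs into training, validation, and test sets.

    The test set receives whatever remains after the first two sets are cut.
    """
    ids = list(member_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    n_train = int(len(ids) * train)
    n_validation = int(len(ids) * validation)
    train_ids = ids[:n_train]
    validation_ids = ids[n_train:n_train + n_validation]
    test_ids = ids[n_train + n_validation:]
    return train_ids, validation_ids, test_ids


train_ids, validation_ids, test_ids = split_members(range(164402))
```

With 164,402 members, truncating 15% gives a validation set of 24,660 members, which matches the validation size reported in the appendix.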

Predict Frequent ED Users
Ø Performance Metric: detection accuracy rate – the percentage of correct predictions among the top 2nd-year ED users selected by each model on the unseen validation set
Ø Models
• Baseline: use the number of ED visits from the observation year as the estimate of next year's number of ED visits
• Six machine learning models
Ø Best Machine Learning Model (C5.0) vs. Baseline Model
• The best machine learning model improved detection accuracy of frequent ED users by up to 7.5% (when the top 250 members are picked)
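One way to read the detection accuracy metric is as the share of true frequent users among the top N members ranked by a model's score. The sketch below follows that reading, which is an assumption; the function name and the toy data are hypothetical. The baseline simply uses observation-year ED visits as the score.

```python
def detection_accuracy(scores, is_frequent_next_year, top_n):
    """Share of the top_n highest-scored members who are frequent next year."""
    if top_n <= 0:
        raise ValueError("top_n must be positive")
    # Rank member indices from highest to lowest score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    picked = ranked[:top_n]
    hits = sum(1 for i in picked if is_frequent_next_year[i])
    return hits / len(picked)


# Toy data (made up): baseline scores are observation-year ED visit counts.
visits_year1 = [0, 5, 2, 7, 1]
frequent_year2 = [False, True, False, False, True]
baseline_accuracy = detection_accuracy(visits_year1, frequent_year2, top_n=2)
```

A model's improvement over the baseline at a given N would then be the difference between the two detection accuracies computed on the same validation members.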

Top Influential Features from Machine Learning in Predicting Frequent ED Users

Predict ED Jumpers
Ø ED Jumpers Definition (Observation Year = 1st Year, Result Year = 2nd Year)
• Non Jumper: 84%
• Jumper: 16%
Ø Baseline Model: use 1) the number of ED claims and 2) the number of outpatient claims from the 1st year as the estimate of next year's number of ED visits
Ø Machine Learning Models: LDA, MDA, QDA, FDA, RDA (linear, mixture, quadratic, flexible, and regularized discriminant analysis)
Ø Prediction
• The best machine learning model improved detection accuracy of frequent ED users by 53%

ED Jumpers: User Segmentation via Clustering
Ø Clustering among ED Jumpers
• Mixture model clustering (based on BIC) on the training set
Ø Summary of ED Jumper Clusters
Cluster   Population   Main Character
1         1518         Healthy males who care about teeth
2         1901         Elderly sick people
3         600          People with mental health problems
4         9956         Relatively healthy young people
5         283          ED abusers / high medical users
6         20           Old alcohol users
• Jumping levels listed: Median, Median, Low, High [assignment to clusters not preserved]
• Cluster 1 summary: high percentage of males (50%); go to the dentist most frequently; fewest chronic diseases, days of medicine supply, and outpatient claims
• Summary notes for the other clusters [assignment to clusters not preserved]:
  – Oldest
  – Lowest percentage of preventable ED visits
  – Youngest
  – Highest percentage of preventable ED visits
  – Fewest dental claims
  – Highest number of chronic diseases
  – Longest days of supply for medicine
  – Largest number of MH claims
  – Small number of medicine supplies
  – No mental health issues
  – Largest number of chronic claims
  – Largest percentage of alcohol-related ED visits
  – Relatively old
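Choosing the number of mixture components by BIC can be sketched as below. This is an illustration only: it assumes the common "lower is better" form, BIC = k·ln(n) − 2·ln(L), whereas some software reports the negated quantity and maximizes it. The function names and the candidate log-likelihoods are made up for the example and are not results from the study.

```python
import math


def bic(log_likelihood, n_parameters, n_samples):
    """Bayesian information criterion in its 'lower is better' form."""
    return n_parameters * math.log(n_samples) - 2.0 * log_likelihood


def select_by_bic(candidates, n_samples):
    """Pick the number of clusters whose fitted model has the lowest BIC.

    candidates maps n_clusters -> (log_likelihood, n_parameters).
    """
    return min(candidates, key=lambda k: bic(*candidates[k], n_samples))


# Made-up fits: more clusters raise the likelihood but add parameters.
example_fits = {1: (-1000.0, 2), 2: (-900.0, 5), 3: (-898.0, 8)}
best_k = select_by_bic(example_fits, n_samples=500)
```

In this toy case the third cluster improves the log-likelihood too little to pay for its extra parameters, so two clusters are selected.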

Feature Summary of ED Jumpers: User Segmentation via Clustering
[Figure panels: Cluster 6, Cluster 1]

Future Work
Ø Two-year Analysis (Current Focus):
• Analyze the thresholds that define frequent ED users
  – How do the coefficients and performance change as a function of the thresholds?
• Explore the relationship between mental health utilization and ED utilization and other outcome variables
Ø Multi-year Analysis (Next Step):
• Create multi-year models
  – Use multiple years of claims history to predict future ED usage levels
• Analyze the influence of the allowable enrollment gap on prediction performance
  – How accurately can we predict ED performance with only partial-year enrollment?
• Explore the “stickiness” of frequent ED use
  – How consistent are frequent ED users from one year to the next?

Questions and Comments?
Summer (Xia) Hu
University of Maryland, College Park
xhu64@umd.edu

Appendix: Example of Feature Extraction
Ø Classify the reasons for each ED visit by three grouping algorithms
  i) General diagnosis category
  ii) NYU ED Visit Severity Algorithm (based on historical probability distribution)
  iii) Clinical Classifications Software (CCS)
Ø Visit types
• Preventable ED visits
  – Non-emergent
  – Primary care treatable
  – ED needed yet preventable
• Non-preventable ED visits
• Others
  – MH-related
  – Alcohol-related
  – Substance abuse-related
  – Injury
  – Unclassified
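Since the NYU grouping is described as probability-based, a member-level "preventable" feature could be built by averaging per-visit category probabilities. The sketch below is an assumption about how such a feature might be computed, not the authors' extraction code; the category keys and function name are hypothetical.

```python
# Hypothetical keys for the three preventable visit types on the slide.
PREVENTABLE = ("non_emergent", "primary_care_treatable", "ed_needed_preventable")


def preventable_share(visits):
    """Average probability mass assigned to preventable categories per visit.

    visits is a list of dicts mapping category name -> probability for one
    ED visit (hypothetical layout); a member with no visits scores 0.0.
    """
    if not visits:
        return 0.0
    total = sum(sum(v.get(c, 0.0) for c in PREVENTABLE) for v in visits)
    return total / len(visits)


# Toy member with two ED visits (made-up probabilities).
member_visits = [
    {"non_emergent": 0.5, "primary_care_treatable": 0.25, "ed_needed_not_preventable": 0.25},
    {"injury": 1.0},
]
share = preventable_share(member_visits)
```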

Appendix: Influence of Age on Frequent ED Users
Ø Conclusions
Age Group   Description
0           Most likely to become a frequent ED user
0-10        Chance of becoming a frequent ED user decreases with age
8-13        Least likely to become a frequent ED user
13-23       Chance of becoming a frequent ED user increases with age
23-65       Chance of becoming a frequent ED user decreases with age
65+         [Insufficient sample size]
• Chart annotations: (> 40%), (10%)

Appendix: Influence of Age on Super ED Users
Ø Conclusions
Age Group   Description
0-10        Chance of becoming a super ED user decreases with age
4-16        Least likely to become a super ED user
16-23       Chance of becoming a super ED user increases with age
23-65       Chance of becoming a super ED user decreases with age
65+         [Insufficient sample size]
• Chart annotation: (~0.1%)

Appendix: Machine Learning Result – Frequent ED User
Top N Users Picked by Models   Top Percentage Picked by Models   Improvement in Detection Accuracy (Best Model vs. Baseline)
250                            1%                                7.5%
500                            2%                                6.8%
1000                           4%                                4.4%
1500                           6%                                3.2%
2000                           8%                                1.7%
3000                           12%                               3.8%
4000                           16%                               4.1%
5000                           20%                               4.4%
Total validation size: 24,660  100%                              NA

Appendix: Key Characteristics
Ø Definition (Observation Year vs. Result Year)
• Non Jumper: 84%
• Jumper: 16%
• Super Jumper: 0.16%
Ø Machine learning algorithms did not perform very well in detecting jumpers
Ø Key characteristics
• Gender: female
• Age: older
• Have 48% more outpatient services (> 7.5)
• Have on average 18% more ED visits (> 1.15)
• ED visit complaints relate to:
  – Alcohol, drugs, mental health, headaches, the respiratory system, chest pain, disorders of the teeth and jaw, skin infections
  – Joint disorders, sprains and strains, abdominal pain, injuries, spondylosis, and other back problems
• In general sicker, with chronic diseases such as:
  – Thyroid disorders, acute cerebrovascular disease, spondylosis, and other back problems
• Pay more visits to MH doctors, for mood disorders and schizophrenia
• Fill larger prescriptions for opium-related medicines
• Frequently visit EDs for dental complaints but have low dental service utilization
• On average, > 9 lab tests completed
• On average, have more than 2 unique medications
Ø Summary statistics listed on the slide [variable labels not preserved; three values per row]
• Mean 0.13, 0.27, 0.78; Std 0.77, 1.16, 2.56
• Mean 20, 23, 31; Std 16, 16, 13

Appendix: ED Jumpers

Appendix: ED Jumpers (continued)

Feature Summary of ED Jumpers: User Segmentation via Clustering