APPLYING THE CLASSIFICATION TREE METHOD TO PREDICT A “KILLER” IN CLINICAL DECISION-MAKING
Qiongqiong Liu, MS; Isaac Li, Ph.D.; Yi Wang, MS; Edward Tsai, Ph.D.
National Board of Osteopathic Medical Examiners, Chicago, Illinois
MedBiquitous Annual Conference, June 5-6, 2017
Johns Hopkins School of Medicine, Baltimore, MD
COMLEX-USA LEVEL 3 EXAMINATION
- Taken by postgraduate Doctors of Osteopathic Medicine (DOs) in their first or second year of residency (OGME-1 or OGME-2)
- Adapting to competency-based medical education
  - Multiple Choice Question (MCQ)
  - Clinical Decision-Making (CDM)
  - CDM cases assess the ability to apply medical knowledge and clinical skills at specific decision points in patient safety
CLINICAL DECISION-MAKING (CDM) CASES
- Key features (KFs): the critical steps required to appropriately diagnose and treat a clinical case.
- Multiple KFs are embedded in each CDM case.
- “Hitting a killer”: a resident selects a response on a KF that might harm the patient. This behavior serves as a proxy for misdiagnosis or mistreatment.
SAMPLE CDM CASE
You are asked to see a 71-year-old female at the extended care facility because she has lost 7.3 kg (16 lb) over the past 3 months without having been on a diet. She was diagnosed with suspected Alzheimer disease 6 months ago. She was admitted to the extended care facility because of her inability to take care of herself at home. She was started on donepezil 4 months ago. The dose was increased after the first month to the current dose of 10 mg once daily at bedtime. She has had osteoarthritis and chronic knee pain in the past, for which she takes acetaminophen with good results. She had surgery for a rectal prolapse 3 years ago. She has experienced constipation for several years. She eats poorly and requires encouragement from the staff. She is confined to a wheelchair when not in bed and needs 2 people to assist her movement from bed to chair. She appears comfortable and has a good fluid intake with no signs of urinary or respiratory issues, pain, fever, or chills.
Vital signs reveal:
- Temperature: 36.4°C (97.6°F)
- Blood pressure: 118/70 mmHg
- Heart rate: 60/min
- Respiratory rate: 14/min
EXTENDED MULTIPLE CHOICE (EMC) QUESTION
What test(s), if any, will you order for this patient at this time? You may select up to 7 of the 13 options listed below; select the last option if no investigation is needed.
A. barium enema
B. basic metabolic panel
C. complete blood count
D. creatinine level
E. CT scan of the abdomen and pelvis
F. electrolyte panel
G. fecal occult blood testing
H. glycosylated hemoglobin level
I. serum albumin level
J. thyroid-stimulating hormone level
K. ultrasonography of the abdomen
L. urinalysis
M. no investigation
Answer Key
Two required correct responses:
- H: glycosylated hemoglobin level
- J: thyroid-stimulating hormone level
Zero points for selecting any of the following:
- A: barium enema (killer)
- E: CT scan of the abdomen and pelvis (killer)
- K: ultrasonography of the abdomen (killer)
- More than 7 options (overtreatment)
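The answer key above can be read as a small set of rules. The sketch below is one possible encoding, not the examination's actual scoring code: the function name `score_response` is hypothetical, and the credit rule (one point per required response when no zero-point rule applies) is an assumption, since the slide states only which selections earn zero points.

```python
# Sketch of the answer-key logic for the sample EMC question.
# The option letters and flags follow the slide; the numeric credit rule is an assumption.

REQUIRED = {"H", "J"}       # glycosylated hemoglobin level, thyroid-stimulating hormone level
KILLERS = {"A", "E", "K"}   # barium enema, CT scan, ultrasonography of the abdomen
MAX_SELECTIONS = 7          # selecting more than this counts as overtreatment


def score_response(selected):
    """Return flags and a score for one resident's set of selected option letters."""
    selected = set(selected)
    hit_killer = bool(selected & KILLERS)
    over_treatment = len(selected) > MAX_SELECTIONS
    required_hit = len(selected & REQUIRED)
    # Assumption: one point per required response unless a zero-point rule applies.
    score = 0 if (hit_killer or over_treatment) else required_hit
    return {
        "hit_killer": hit_killer,
        "over_treatment": over_treatment,
        "required_hit": required_hit,
        "score": score,
    }


print(score_response({"H", "J", "C"}))   # both required responses, no killer
print(score_response({"H", "J", "E"}))   # both required responses, but E is a killer
```

The `hit_killer` flag is the binary outcome that the rest of the analysis tries to predict.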
MOTIVATION
- Previous results: hitting a killer was associated with a lower total score on COMLEX-USA Level 3.
- Further analysis: is there any relationship between hitting a killer and performance on the COMLEX-USA Level 1 and Level 2 examinations?
- If residents’ exam performance is used to predict this “killer-hitting” behavior, which predictor is the most important in the model?
DATA AND METHOD
- Software: the SAS high-performance procedure HPSPLIT
- Goal: predict the probability of a “killer” outcome
- Number of observations: around 800 residents
- Data: COMLEX-USA series standard scores, subscores, and demographic variables
RESULTS: A CLASSIFICATION TREE WITH “KILLER” NODES
MODEL BUILDING PROCEDURE
The model is based on a partition of the predictor space into non-overlapping segments, which correspond to the terminal nodes or leaves of the tree. The partitioning is done recursively, starting with the root node, which contains all the data, and ending with the terminal nodes. At each step of the recursion, the parent node is split into child nodes through selection of a predictor variable and a split value that minimize the variability in the response across the child nodes. The splitting rules that define the leaves provide the information that is needed to score new data.
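The recursive partitioning described above can be illustrated with a minimal pure-Python sketch. It is not the HPSPLIT implementation: the Gini impurity is assumed as the measure of variability, pruning is omitted, and the function names (`gini`, `best_split`, `grow`, `score`) and the toy scores are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (1 = hit a killer)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(rows, labels):
    """Search every predictor and midpoint threshold.

    Returns (weighted_impurity, feature_index, threshold), or None if no split exists.
    """
    n = len(labels)
    best = None
    for j in range(len(rows[0])):
        values = sorted(set(r[j] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y for r, y in zip(rows, labels) if r[j] < t]
            right = [y for r, y in zip(rows, labels) if r[j] >= t]
            w = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or w < best[0]:
                best = (w, j, t)
    return best


def grow(rows, labels, max_depth=2):
    """Recursively split a node; stop when no split reduces the impurity."""
    split = best_split(rows, labels) if max_depth > 0 else None
    if split is None or split[0] >= gini(labels):
        return {"leaf": True, "p_killer": sum(labels) / len(labels)}
    _, j, t = split
    left = [i for i, r in enumerate(rows) if r[j] < t]
    right = [i for i, r in enumerate(rows) if r[j] >= t]
    return {
        "leaf": False, "feature": j, "threshold": t,
        "left": grow([rows[i] for i in left], [labels[i] for i in left], max_depth - 1),
        "right": grow([rows[i] for i in right], [labels[i] for i in right], max_depth - 1),
    }


def score(tree, row):
    """Follow the splitting rules to a leaf and return its killer probability."""
    while not tree["leaf"]:
        tree = tree["left"] if row[tree["feature"]] < tree["threshold"] else tree["right"]
    return tree["p_killer"]


# Hypothetical toy data: two exam scores per resident, label 1 = hit a killer.
rows = [(420, 480), (450, 510), (470, 530), (560, 520), (590, 600), (610, 640)]
labels = [1, 1, 1, 0, 0, 0]
tree = grow(rows, labels)
print(tree["feature"], tree["threshold"])
print(score(tree, (500, 500)), score(tree, (600, 600)))
```

In this toy data the first score separates the two groups cleanly, so the root splits on it and both children become leaves. Real data would yield a deeper tree and would normally need pruning.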
RESULTS: EASY INTERPRETATION FROM A SUBTREE
The results show that school is an important factor on one major branch, and that the psychiatry score from the Level 3 exam is another important predictor when it is lower than 534.69. Each layer of the tree supplies predictive information in the form of explicit rules and conditions, which can guide further investigation when these scores are linked with school information.
RESULTS: MODEL-DATA FIT
Model-Based Confusion Matrix
Actual    Predicted 0    Predicted 1    Error Rate
0         365            13             0.0344
1         13             372            0.0338

Model-Based Fit Statistics for Selected Tree
No. of Leaves        104
ASE                  0.0277
Misclassification    0.0341
Sensitivity          0.9662
Specificity          0.9656
Entropy              0.13
Gini                 0.05
RSS                  42.34
AUC                  0.99

The fit statistics show a small misclassification rate (0.0341) when new data are scored with the model, which indicates that the model fits the data well.
RESULTS: SPECIFICITY AND SENSITIVITY
- Sensitivity is the probability of correctly predicting that a resident hit a killer: 0.9662.
- Specificity is the probability of correctly predicting that a resident did not hit a killer: 0.9656.
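These rates follow directly from the counts in the model-based confusion matrix on the preceding slide. The short check below recomputes them, treating class 1 as "hit a killer", consistent with the sensitivity definition above; the variable names are illustrative.

```python
# Counts from the model-based confusion matrix (rows = actual, columns = predicted).
tn, fp = 365, 13   # actual 0 (no killer): predicted 0, predicted 1
fn, tp = 13, 372   # actual 1 (hit a killer): predicted 0, predicted 1

total = tn + fp + fn + tp
sensitivity = tp / (tp + fn)              # correctly predicted killers
specificity = tn / (tn + fp)              # correctly predicted non-killers
misclassification = (fp + fn) / total     # overall error rate

print(round(sensitivity, 4))        # 0.9662
print(round(specificity, 4))        # 0.9656
print(round(misclassification, 4))  # 0.0341
```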
RESULTS: IMPORTANCE FACTOR IN THE MODEL
Variable Importance (Training)
Variable             Relative    Importance    Count
Biochemistry (L1)    0.6754      5.2538        8
Pediatrics (L2)      0.6653      5.1759        8
Anatomy (L1)         0.5745      4.4694        7

In addition to the school effect, variable importance shows that the discipline scores for biochemistry (Level 1), pediatrics (Level 2), and anatomy (Level 1) are important predictors of mistreatment/misdiagnosis. This finding calls attention to the instruction of these subjects, particularly at the colleges of osteopathic medicine (COMs) identified by the model.
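One way to read the two importance columns, assuming relative importance is each variable's importance divided by the largest importance in the model (an assumption about how the table was produced, not something stated on the slide), is to back out the implied top importance from each row. The dictionary name `reported` is illustrative.

```python
# (relative importance, importance) pairs as read from the variable importance table.
reported = {
    "Biochemistry (L1)": (0.6754, 5.2538),
    "Pediatrics (L2)": (0.6653, 5.1759),
    "Anatomy (L1)": (0.5745, 4.4694),
}

# Assumption: relative importance = importance / largest importance in the model,
# so importance / relative importance should give the same value for every row.
implied_max = {name: imp / rel for name, (rel, imp) in reported.items()}

for name, value in implied_max.items():
    print(f"{name}: implied top importance = {value:.2f}")
```

All three rows imply roughly the same top importance, which is consistent with a single stronger predictor (plausibly the school variable) sitting above them with a relative importance of 1.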
SUMMARY OF THE TREE BUILDING
Advantages
1. Useful for complex problem solving.
2. No need to worry about model assumptions.
3. Can be used to predict new data and is easy to visualize.
4. The partition rules can easily be changed.
Disadvantages
1. Overfitting and underfitting, particularly for small data sets.
2. Strong correlations could lead the model to unexpected outcomes.
THANK YOU!