Consensus Strategy for Variable Selection in Clinical Prediction Rule Development

Miriam R. Elman, MPH (1); Jessina C. McGregor, PhD (2); Jodi Lapidus, PhD (1)
(1) Oregon Health & Science University-Portland State University School of Public Health
(2) Oregon State University/Oregon Health & Science University College of Pharmacy

BACKGROUND
• Clinical prediction rules aim to prognostically identify presence of diagnoses using baseline patient data
• Electronic health record (EHR) data is a rich resource
  - Massive amount of retrospective patient information
• Robust and efficient variable reduction likely aids variable selection on multidimensional EHR data
  - Prevent early removal of key predictors
• Consensus strategy to reduce candidate predictors

OBJECTIVE
Apply consensus strategy to inform prediction rule developed to direct appropriate selection of antibiotic agents to treat urinary tract infections

METHODS
• Data management with SAS v9.4
• Statistical analysis with R 3.3.3

Model Building Approach
• Data Preparation
  - STEP 1: Extract EHR Data & Identify Cohort
  - STEP 2: Split Cohort into Development & Validation Sets
  - STEP 3: Development Set for Prediction Rule
• Consensus Strategy
  - Random Forest
  - Boosted Classification
  - Group Lasso
• Multivariable Logistic Regression

RESULTS
• Twenty-two predictors selected by consensus strategy
[Figure: Venn diagram of selected predictors. Random Forest (0), Lasso (8), Boosting (0); overlap regions (3), (4), (5), (2)]
• Saturated and best subsets model results in Table
  - 3 of 4 predictors selected by all three methods appeared in final model
• No interaction terms selected for best subsets model

Table. Results of multivariable logistic regression models

Model          AUC      Sensitivity   Specificity
Saturated      0.6631   0.6039        0.6432
Best subsets   0.6382   0.4312        0.7748

CONCLUSIONS
• Prediction rule did not meet minimum acceptable 90% sensitivity and 85% specificity set a priori by clinicians
• Challenging prediction problem
  - Mostly categorical predictors
  - Key predictors may be missing in retrospective data

FUTURE DIRECTIONS
• Reviewed predictors with clinical partners and conducting prospective data collection
  - Further model development with additional data
  - Explore additional modeling strategies
• Developed framework for consensus strategy
  - Available for other applications

Contact: Miriam Elman, elmanm@ohsu.edu

Data Preparation

STEP 1: Extract EHR Data & Identify Cohort
  - Extract retrospective EHR data from electronic repositories
  - Define cohort, outcome, and predictors

STEP 2: Split Cohort into Development & Validation Datasets
  - Randomly split cohort into development (80%) and validation (20%) datasets

STEP 3: Use Development Set for Prediction Rule
  - Construct prediction rule on development set
  - Set remaining data aside for rule validation
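
The 80%/20% random split in Step 2 can be sketched as follows. This is an illustration in plain Python, not the authors' SAS or R code; the function name `split_cohort`, the seed value, and the cohort of 1000 integer patient IDs are all hypothetical.

```python
import random


def split_cohort(patient_ids, dev_fraction=0.8, seed=2017):
    """Randomly split patient IDs into development and validation lists.

    The seed is a hypothetical value; fixing it makes the split reproducible.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_dev = round(len(ids) * dev_fraction)
    return ids[:n_dev], ids[n_dev:]


# Hypothetical cohort of 1000 patients identified by integer ID.
development, validation = split_cohort(range(1000))
print(len(development), len(validation))
```

Splitting on patient ID before any modeling keeps the validation set untouched while the prediction rule is built.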

Consensus Strategy

Random Forest
• party (1.2-2) implementation used
• Algorithm repeated 3 times with different seeds; 10 most important variables used from each run

Group Lasso
• grpreg (3.0-2) used to select categorical variables as a group
• Tuning parameter identified with minimized cross-validated error then refined

Boosted Classification
• mboost (2.7-0) used
• Variables defined as ordinary least squares base learners to group categorical variables
• Continuous variables centered
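
One way to combine the three methods' selections is sketched below. It assumes the consensus set is the union of variables chosen by any method, with a vote count kept per variable. The poster does not spell out the combination rule, so treat this as an assumption. The variable names and the `consensus` helper are hypothetical, and the sketch is plain Python rather than the R packages named above.

```python
from collections import Counter


def consensus(selections):
    """Return a Counter mapping each variable to the number of methods selecting it."""
    votes = Counter()
    for chosen in selections.values():
        votes.update(set(chosen))  # each method votes at most once per variable
    return votes


# Hypothetical selections; real inputs would come from each method's output.
selections = {
    "random_forest": {"age", "prior_uti", "diabetes"},
    "group_lasso": {"age", "prior_uti", "sex", "catheter"},
    "boosting": {"age", "catheter"},
}

votes = consensus(selections)
candidates = sorted(votes)  # union: selected by at least one method
unanimous = sorted(v for v, n in votes.items() if n == len(selections))
print(candidates)
print(unanimous)
```

Keeping the vote count, rather than only the union, makes it easy to see which candidates all methods agree on before the regression step.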

Multivariable Logistic Regression
• Model selection conducted with best subsets based on minimized BIC
  - Model limited to 4 main effects by design
  - Interactions assessed after main effects selected
• AUC, sensitivity, and specificity calculated for saturated model and selected model
  - Youden's index chosen for sensitivity and specificity cutpoint
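
Youden's index is J = sensitivity + specificity - 1, and the cutpoint is the probability threshold that maximizes J. A minimal sketch of that search follows, assuming predicted probabilities and binary outcomes are already available. The data and the `youden_cutpoint` helper are hypothetical, and ties keep the lowest threshold.

```python
def youden_cutpoint(probs, labels):
    """Return (cutpoint, sensitivity, specificity) maximizing Youden's J.

    A case is classified positive when its predicted probability >= cutpoint.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best = None
    for cut in sorted(set(probs)):
        tp = sum(1 for p, y in zip(probs, labels) if p >= cut and y == 1)
        tn = sum(1 for p, y in zip(probs, labels) if p < cut and y == 0)
        sens = tp / positives
        spec = tn / negatives
        j = sens + spec - 1
        if best is None or j > best[0]:  # strict '>' keeps the lowest tied cutpoint
            best = (j, cut, sens, spec)
    return best[1], best[2], best[3]


# Hypothetical predicted probabilities and observed outcomes (1 = positive).
probs = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 0, 1, 1, 1]

cut, sens, spec = youden_cutpoint(probs, labels)
print(cut, sens, spec)
```

Because J weights sensitivity and specificity equally, a cutpoint chosen this way need not satisfy separate minimum targets for each measure.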

Predictors selected, by method overlap:
• Lasso alone: 8
• Boosting alone: 0
• Random forest and lasso: 3
• Lasso and boosting: 5
• Random forest and boosting: 2
• All three: 4
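
As a quick arithmetic check, the region counts can be summed to recover the total number of consensus predictors and each method's own total. The sketch below is plain Python. It assumes the "(0)" beside Random Forest on the poster overview is the random-forest-only region, and the short method labels are hypothetical.

```python
# Region counts keyed by the set of methods that selected those predictors.
# The rf-only count of 0 is an assumption read from the overview panel.
regions = {
    frozenset({"rf"}): 0,
    frozenset({"lasso"}): 8,
    frozenset({"boost"}): 0,
    frozenset({"rf", "lasso"}): 3,
    frozenset({"lasso", "boost"}): 5,
    frozenset({"rf", "boost"}): 2,
    frozenset({"rf", "lasso", "boost"}): 4,
}

total = sum(regions.values())
per_method = {
    method: sum(n for methods, n in regions.items() if method in methods)
    for method in ("rf", "lasso", "boost")
}
print(total)       # 22
print(per_method)  # {'rf': 9, 'lasso': 20, 'boost': 11}
```

The total of 22 matches the number of predictors the poster reports for the consensus strategy.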