SDT: A Tree Method for Detecting Patient Subgroups

SDT: A Tree Method for Detecting Patient Subgroups With Personalized Risk Factors
Xiangrui Li (1), Dongxiao Zhu (1), Ming Dong (1), Milad Nezhad (2), Alexander Janke (3), Phillip Levy (4)
(1) Department of Computer Science, (2) Department of Industrial and Systems Engineering, (3) School of Medicine, (4) Department of Emergency Medicine and Cardiovascular Research Institute
Wayne State University, Detroit, MI

Disclosure § No relations with commercial interests to disclose.

Introduction § Traditional medicine: Ø “One-size-fits-all” treatment § Precision (personalized) medicine: Ø Tailored treatment § Challenges in precision medicine: Ø Patient subgroup identification Ø Personalized risk factor prioritization

Introduction § Subgroup analysis is an effective approach to patient subgroup identification. Ø Developed specifically for comparing two different treatments. Ø Many tree-based methods have been designed in recent years (interaction tree, QUINT, etc.). § Our goal: identify subgroups in which a hypothesized feature acts as a risk factor. Ø The feature is selected according to prior knowledge.

Introduction [Figure: Response plotted against the Hypothesized feature, illustrating positive correlation, no correlation, and negative correlation] § The hypothesized feature is a risk factor for patient subgroups.

Motivation Hypothesis § No single variable sufficiently explains disparities in hypertension. § Vitamin D deficiency predisposes to the development of hypertension in the Black community. § Data were collected from a demographic subgroup (African-Americans). § Hypothesis: there are subgroups among participating patients that show associations between hypertension and vitamin D.

Method § Subgroup detection tree (SDT) Ø Data-driven Ø Easy to interpret Ø Generates subgroups without prior knowledge § SDT algorithm in three steps: 1. Grow a large initial tree. 2. Prune the initial tree to obtain a nested sequence of subtrees. 3. Select the optimal subtree.

Step 1: Tree Growing §

Step 1: Tree Growing § feature value
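
The splitting criterion itself is not preserved in these slides, so the Python sketch below only illustrates what a split search in a tree-growing step can look like. It rests on an assumption that is not taken from the source: a candidate split is scored by the summed residual error of separate least-squares fits of the response on the hypothesized feature in the two child nodes. The names `linear_fit_sse`, `best_split` and `min_leaf` are hypothetical, and the data are made up.

```python
from statistics import mean


def linear_fit_sse(xs, ys):
    """Sum of squared residuals of the least-squares line ys ~ a + b * xs."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0  # constant xs: fall back to the mean
    intercept = my - slope * mx
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def best_split(split_feature, hypothesized, response, min_leaf=2):
    """Search thresholds on one candidate splitting feature.

    ASSUMED criterion (not from the slides): minimize the summed residual
    error of separate linear fits in the two child nodes.
    Returns (threshold, total_sse), or (None, inf) if no split is valid.
    """
    best_threshold, best_sse = None, float("inf")
    # The largest value is skipped because it would leave the right child empty.
    for t in sorted(set(split_feature))[:-1]:
        left = [i for i, v in enumerate(split_feature) if v <= t]
        right = [i for i, v in enumerate(split_feature) if v > t]
        if len(left) < min_leaf or len(right) < min_leaf:
            continue
        total = sum(
            linear_fit_sse([hypothesized[i] for i in node],
                           [response[i] for i in node])
            for node in (left, right)
        )
        if total < best_sse:
            best_threshold, best_sse = t, total
    return best_threshold, best_sse


# Toy data (made up): the response falls with the hypothesized feature when
# the splitting feature is low and rises with it when the splitting feature
# is high.
split_feature = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
hypothesized = [1, 2, 3, 1, 2, 3]
response = [3, 2, 1, 1, 2, 3]
threshold, sse = best_split(split_feature, hypothesized, response, min_leaf=3)
```

A minimum leaf size matters under this assumed criterion because a two-point node is always fitted perfectly by a line, which would make very small children look artificially good.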

Step 2: Pruning §

Step 3: Subtree selection § Criterion: select the subtree with the smallest predictive error on the response. § Method: Ø Cross-validation Ø Held-out test data
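
The selection rule above can be sketched in Python as a k-fold cross-validation loop that keeps the candidate with the smallest mean squared prediction error. This is a minimal illustration, not the SDT implementation: the two candidates are stand-ins (a single-leaf tree and a two-leaf stump that predict leaf means, as CART would), and the names `k_fold_indices`, `cv_error`, `fit_one_leaf` and `fit_two_leaves`, the threshold, and the data are all hypothetical.

```python
import random
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_error(fit, xs, ys, k=10, seed=0):
    """Mean over folds of the held-out mean squared prediction error."""
    fold_errors = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        fold_errors.append(mean((ys[i] - predict(xs[i])) ** 2 for i in held_out))
    return mean(fold_errors)


def fit_one_leaf(xs, ys):
    """Stand-in for the smallest subtree: predict the training mean."""
    overall = mean(ys)
    return lambda x: overall


def fit_two_leaves(xs, ys, threshold=0.5):
    """Stand-in for a larger subtree: one split, each leaf predicts its mean."""
    overall = mean(ys)
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    left_mean = mean(left) if left else overall
    right_mean = mean(right) if right else overall
    return lambda x: left_mean if x <= threshold else right_mean


# Toy data (made up): the response jumps from 0 to 10 at x = 0.5.
xs = [i / 19 for i in range(20)]
ys = [0.0 if x <= 0.5 else 10.0 for x in xs]

candidates = {"one_leaf": fit_one_leaf, "two_leaves": fit_two_leaves}
errors = {name: cv_error(fit, xs, ys) for name, fit in candidates.items()}
selected = min(errors, key=errors.get)
```

With a separate test set, the same comparison applies with a single train/test split in place of the folds.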

Experiment § Data were collected from African-Americans at high risk for cardiovascular disease at Detroit Receiving Hospital (DRH). § 153 samples § 40 features: vitamin D, diabetes history, demographic information, laboratory results (calcium, chloride, cholesterol, eGFR, parathyroid hormone, etc.)

Experiment § Response: left ventricular mass indexed to body surface area (LVMI) § Hypothesized feature: vitamin D (serum 25-OH D) § Goal: detect whether there are subgroups that show strong correlations between LVMI and vitamin D

Experiment Procedure §

Result § The best subtree contains four leaf nodes (10-fold cross-validation error is 292).

Result § Left: SDT yields four subgroups A, B, C and D with sizes 35, 63, 10 and 20, respectively. COR is Cornell product, ALD is aldosterone, and REN is renin. § Right: classification and regression tree (CART) produces three leaf nodes R, S and T with sizes 65,

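Result [Figure: bar charts of mean LVMI for SDT subgroups A, B, C, D and CART subgroups R, S, T, with the population mean marked; y-axis from 60 to 120. Bar values as extracted, with their assignment to subgroups not recoverable: 109.64, 99.29, 94.62, 92.13, 91.08, 80.47, 74.98] § Means of LVMI for subgroups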

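Result [Figure: bar charts of mean vitamin D for SDT subgroups A, B, C, D and CART subgroups R, S, T, with the population mean marked; y-axis from 9 to 12. Bar values as extracted, with their assignment to subgroups not recoverable: 11.49, 11.44, 11.2, 11.09, 10.85, 10.85, 9.57] § Means of vitamin D for subgroups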

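Result [Figure: bar charts of the correlation between LVMI and vitamin D for SDT subgroups A, B, C, D and CART subgroups R, S, T, with the population correlation marked; y-axis from -0.4 to 0.6. Values shown include 0.55, 0.16, -0.07, -0.12, -0.30 and -0.40] § Correlations between LVMI and vitamin D for subgroups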

Result § In SDT: Ø Subgroups A and D show relatively large negative correlations (-0.30 and -0.40) together with lower vitamin D levels (9.57 and 10.85), indicating that increasing vitamin D levels may decrease LVMI. Ø In Subgroup C, LVMI is positively correlated with vitamin D (0.55) at a higher vitamin D level (11.20), perhaps suggesting a threshold effect mediated by an associated factor that is also modified by vitamin D level, such as parathyroid hormone. § In CART: Ø Only Subgroup T (the same as SDT's Subgroup D) has a correlation of -0.40.
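
The per-subgroup correlations discussed above can be computed along the following lines. This is a minimal Python sketch on made-up numbers, not the study data; `pearson` and `subgroup_correlations` are hypothetical helper names, and the correlation is assumed to be the Pearson coefficient, which the slides do not state explicitly.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def subgroup_correlations(labels, xs, ys):
    """Map each subgroup label to the correlation within that subgroup."""
    groups = {}
    for label, x, y in zip(labels, xs, ys):
        groups.setdefault(label, ([], []))
        groups[label][0].append(x)
        groups[label][1].append(y)
    return {label: pearson(gx, gy) for label, (gx, gy) in groups.items()}


# Toy data (made up): opposite trends in two subgroups cancel in the pooled sample.
labels = ["A", "A", "A", "C", "C", "C"]
feature = [1, 2, 3, 1, 2, 3]    # stands in for the hypothesized feature
response = [6, 4, 2, 2, 4, 6]   # stands in for the response

by_group = subgroup_correlations(labels, feature, response)
pooled = pearson(feature, response)
```

In this toy case the pooled correlation is zero while each subgroup shows a perfect negative or positive correlation, which is the situation that motivates looking for subgroups instead of relying on the population-level correlation.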

Conclusion § Unlike previous subgroup methods, we aim to identify subgroups together with their risk factors. § We show that some subgroups with high correlations can be detected by SDT but not by conventional approaches such as CART. § Future work: Ø Apply to larger datasets for further discovery of associations. Ø Generalize to multiple or categorical features and responses. Ø Improve splitting criteria for theoretical guarantees on correlation.

Acknowledgements § Research in this publication was supported by NSF/CCF grants 1451316 and 1637312. § Cohort collection and the related study were supported by grants from NIMHD (1R01MD005849), NHLBI (1R01HL127215) and PCORI (FC14-1409-21656).

Q&A Questions?