Machine Learning and Biomarker Subgroup Analysis for Precision Medicine
JMP Genomics, Mastering JMP Series
Kelci J. Miclaus, Ph.D.
Sr. Manager, Advanced Analytics, JMP Life Sciences R&D, SAS Institute, Inc.
Copyright © SAS Institute Inc. All rights reserved.

Outline: Machine Learning with Biological Data in JMP Genomics
• Machine Learning Potential and Principles
• Subgroup Analysis Methods
• Three Data Stories
  • Sepsis Mortality Hospital Prediction
  • Chemoprevention Clinical Trial RNA-Seq Analysis
  • Diabetes Clinical Trial with Clinical + Metabolite Profiles
• Capabilities highlight JMP exploration, JMP Pro modeling, and JMP Genomics analytic processes*
* All methods and JMP Pro capabilities are available within the JMP Genomics solution.

Potential of Machine Learning in Drug Discovery
Data-rich environment for application of modern machine learning methods
Published in: Alex Zhavoronkov; Mol. Pharmaceutics 15, 4311-4313. DOI: 10.1021/acs.molpharmaceut.8b00930. Copyright © 2018 American Chemical Society

Highlighting Successes
• Clinical Decision Support Systems
• Chemical Compound Mining
• Biomarker Discovery
  • Prognostic/Prescriptive: Determine whether patient should receive treatment
  • Predictive: Patient outcome specific to treatment (Drug label/Companion diagnostic)

Machine Learning Principles and Methodology

Traditional Statistics vs Machine Learning
Statistical modeling
• Data is a sample from a larger population
• Model validation: goodness-of-fit tests, residual examination
• Simple models over complex models
• Parameter interpretability
Algorithmic modeling (machine learning)
• Data mechanism is unknown
• Model validation: prediction performance
• Require more training time and data
• Complex models (black box)

Avoid Overfitting: Model Validation
Overfitting occurs when a model learns the detail and noise in the training data to the extent that it negatively impacts the performance of the model on new data.
Data partitions: Training data, Validation data, Test data
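A minimal sketch of the three-way partition named on this slide, using only the Python standard library. The function name `split_rows`, the 60/20/20 proportions, and the row count of 100 are illustrative assumptions and do not come from the presentation; in JMP Pro the equivalent step is creating a validation column.

```python
import random

def split_rows(n_rows, frac_train=0.6, frac_valid=0.2, seed=0):
    """Randomly partition row indices into training, validation, and test sets.

    The fractions are illustrative defaults, not values from the slides.
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_train = int(frac_train * n_rows)
    n_valid = int(frac_valid * n_rows)
    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]    # whatever remains is held out
    return train, valid, test

# Hypothetical data set of 100 rows.
train, valid, test = split_rows(100)
print(len(train), len(valid), len(test))
```

In the usual convention, the training rows fit the model, the validation rows guide tuning and model choice, and the test rows are touched only once for the final performance estimate.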

Combine Diverse Models
• Different algorithms
  • Regression, tree-based models, neural networks, …
• Different parameter settings
  • Grid search, random search, genetic algorithms
• Different features
  • Feature selection, feature extraction, feature engineering
• Different training sets
  • Bagging, boosting, 5-fold cross validation
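The "different training sets" idea can be illustrated with 5-fold cross validation. The sketch below only generates the fold indices; the function name `k_fold_indices` and the row count of 23 are hypothetical, and any model from the list above could be fit inside the loop.

```python
import random

def k_fold_indices(n_rows, k=5, seed=0):
    """Yield (train, valid) index lists for k-fold cross validation."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled rows into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        valid = folds[held_out]
        train = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
        yield train, valid

# Hypothetical data set of 23 rows.
splits = list(k_fold_indices(23, k=5))
for train, valid in splits:
    print(len(train), len(valid))
```

Each row is held out exactly once, so every model in the comparison is scored on data it did not see during fitting.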

Subgroup Analysis: Specialized Predictive Modeling

Uplift Modeling: Lessons from Marketing Campaigns
https://www.predictiveanalyticsworld.com/machinelearningtimes/uplift-modeling-making-predictive-models-actionable/8578/

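Subgroup Analysis: Applications in Precision Medicine
Identify subjects most likely to respond to treatment. Crossing "Improve on Active Drug" (0/1) with "Improve on Placebo" (0/1) gives four patient types:
• Improve on drug = 1, improve on placebo = 1: GET WELL ANYWAY
• Improve on drug = 1, improve on placebo = 0: DRUG CURES YOU
• Improve on drug = 0, improve on placebo = 1: DRUG MAKES YOU WORSE
• Improve on drug = 0, improve on placebo = 0: INCURABLE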

Subgroup Analysis: Interaction Trees (Uplift Modeling)
Linear, Logistic or Cox Model: $f(y_i) = \beta_0 + \beta_1 x_i + \beta_2\,\mathrm{Treatment}_i + \beta_3\,\mathrm{Treatment}_i \cdot x_i$
A significant interaction implies a differential treatment effect between subgroups defined by a binary classifier $x_i$.
Select splits to maximize treatment differences (Su et al., 2009).
[Figure: interaction tree. All randomized subjects are split on Biomarker 1 (absent/present), with further splits on Biomarker 2 and Biomarker 3 (absent/present).]
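The split rule on this slide can be sketched as follows. This is a simplified illustration, not the published procedure or the JMP implementation: it scores each candidate binary biomarker by the raw difference in treatment effect between its "present" and "absent" groups, whereas an interaction tree would normally use a test statistic for the interaction term. The function names, the column names (`biomarker_1`, `biomarker_2`, `treatment`, `y`), and the toy rows are all made up for the example.

```python
from itertools import product
from statistics import mean

def treatment_effect(rows):
    """Mean response on treatment minus mean response on control, or None."""
    treated = [r["y"] for r in rows if r["treatment"] == 1]
    control = [r["y"] for r in rows if r["treatment"] == 0]
    if not treated or not control:
        return None  # effect is undefined without both arms
    return mean(treated) - mean(control)

def best_interaction_split(rows, biomarkers):
    """Pick the binary biomarker whose two groups differ most in treatment effect."""
    best_name, best_gap = None, 0.0
    for name in biomarkers:
        eff_present = treatment_effect([r for r in rows if r[name] == 1])
        eff_absent = treatment_effect([r for r in rows if r[name] == 0])
        if eff_present is None or eff_absent is None:
            continue
        gap = abs(eff_present - eff_absent)
        if gap > best_gap:
            best_name, best_gap = name, gap
    return best_name, best_gap

# Toy rows (not study data): the drug only helps when biomarker_1 is present.
rows = [
    {"biomarker_1": b1, "biomarker_2": b2, "treatment": t, "y": 1.0 + 2.0 * t * b1}
    for b1, b2, t in product((0, 1), repeat=3)
]
name, gap = best_interaction_split(rows, ["biomarker_1", "biomarker_2"])
print(name, gap)
```

Applying the same search recursively inside each resulting subgroup would grow the tree shown in the figure.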

Subgroup Analysis: Virtual Twins (Foster et al., 2011)
• Fit a forest model to the response, use it to predict each subject's outcome under the treatment actually received and under the counterfactual treatment, then fit a final tree model to the estimated treatment effects.
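A minimal sketch of the two Virtual Twins stages, under loud simplifying assumptions: a per-arm least-squares line on a single covariate stands in for the forest model, and a depth-one regression stump stands in for the final tree. All function names and the toy data are hypothetical; a real analysis would use a random forest and a full tree on many covariates.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x (stand-in for the forest); returns a predictor."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x

def virtual_twin_effects(xs, treatments, ys):
    """Stage 1: predict every subject under both arms and take the difference."""
    arm_model = {}
    for arm in (0, 1):
        arm_x = [x for x, a in zip(xs, treatments) if a == arm]
        arm_y = [y for y, a in zip(ys, treatments) if a == arm]
        arm_model[arm] = fit_line(arm_x, arm_y)
    return [arm_model[1](x) - arm_model[0](x) for x in xs]

def best_stump(xs, effects):
    """Stage 2: depth-one tree on the estimated effects (cut minimizing squared error)."""
    best_cut, best_sse = None, float("inf")
    for cut in sorted(set(xs))[1:]:  # skip the minimum so both sides are non-empty
        left = [e for x, e in zip(xs, effects) if x < cut]
        right = [e for x, e in zip(xs, effects) if x >= cut]
        sse = sum((e - mean(left)) ** 2 for e in left)
        sse += sum((e - mean(right)) ** 2 for e in right)
        if sse < best_sse:
            best_cut, best_sse = cut, sse
    return best_cut

# Toy data (not study data): treatment benefit grows with the covariate x.
xs = [float(x) for x in range(8)] * 2
treatments = [0] * 8 + [1] * 8
ys = [1.0] * 8 + [1.0 + x for x in range(8)]
effects = virtual_twin_effects(xs, treatments, ys)
cut = best_stump(xs, effects)
print(cut)
```

Subjects on the high side of the chosen cut form the candidate subgroup with the larger estimated treatment effect.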

Optimal Treatment Regimes
• Subgroup identification: Precision Medicine, "the right patients for a given drug"
• Optimal treatment regimes: "the best drug for a given patient"
• Use patient characteristics to classify treatment assignment that will optimize treatment response
• Fit a response regression model and a propensity score logistic model to create a pseudo binary response and weights that can be used as input into predictive modeling routines, including cross-validated designs (Freidlan et al., 2009)
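One common way to build such a pseudo binary response and weights is a doubly robust (augmented inverse probability weighted) contrast, sketched below. The slide does not say which estimator JMP Genomics uses, so this construction is an assumption. The function name is hypothetical, the inputs `mu1`, `mu0`, and `propensity` are assumed to come from already-fitted response and propensity models, and a larger response is assumed to be better.

```python
def pseudo_response_and_weights(y, treated, mu1, mu0, propensity):
    """Turn outcomes and model predictions into classification labels and weights.

    y          : observed responses (larger assumed better)
    treated    : 1 if the subject received the drug, 0 otherwise
    mu1, mu0   : predicted response under drug / under control (response model)
    propensity : estimated probability of receiving the drug (propensity model)
    """
    labels, weights = [], []
    for y_i, a_i, m1, m0, p in zip(y, treated, mu1, mu0, propensity):
        # Augmented inverse-probability-weighted value under each treatment.
        value_if_treated = m1 + a_i * (y_i - m1) / p
        value_if_control = m0 + (1 - a_i) * (y_i - m0) / (1 - p)
        contrast = value_if_treated - value_if_control
        labels.append(1 if contrast > 0 else 0)  # pseudo response: "should treat"
        weights.append(abs(contrast))            # how much the choice matters
    return labels, weights

# Toy inputs (not study data) for two subjects in a 1:1 randomized trial.
labels, weights = pseudo_response_and_weights(
    y=[3.0, 1.0], treated=[1, 0],
    mu1=[2.0, 0.5], mu0=[1.0, 1.0], propensity=[0.5, 0.5],
)
print(labels, weights)
```

The labels and weights can then be passed to any weighted classifier, which is what lets the treatment-regime problem reuse standard predictive modeling and cross-validation routines.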

Machine Learning with JMP

JMP Highlights
Visualization
• Graph Builder, Distribution, Multivariate Platforms
• Explore patterns and discover data integrity problems
Data Preparation and Quality Screening Tools
• Explore Missing Values
• Explore Outliers
• Predictor Screening
• Text Explorer
• Recode Utilities
Feature Set Engineering
• Easy Data Transformations, Formulas, Column Utilities
• PCA/MDS Multivariate Analysis
Integration with SAS, CAS/Viya, R and Python, Open Libraries

JMP Pro Highlights
Advanced Algorithms
• Generalized Linear Models
• Boosted Trees (most popular with tabular data)
• 1- and 2-Layer Neural Nets
• Support Vector Machines
• Association Analysis
• Uplift Modeling
• Functional Data Explorer (sensor data)
• Text Analytics
• Time Series Forecast
• Automated Data Imputation
• XGBoost C++ Libraries (add-in)
Cross Validation
• Create Validation/Test Columns
Model Comparison
• Compare Model Fits, Assess Variable Importance, Profile Predictions
Formula Depot
• Generates and deploys score code to Python, SQL, JavaScript, C, SAS

JMP Genomics Highlights
Integrated solution of JMP Pro and SAS
Specialized workflows for biological data analysis and biomarker discovery
• QC
• Normalization
• Association Mapping (GWAS) and Differential Expression
Predictive Modeling Review
• Extensive library of models in cross-validated framework (Cross-Validation/Model Comparison)
• Create templates of feature set creation and filtering
• Ensemble Models
• Specialized Survival Endpoint routines
• JMP Genomics add-ins via Python/R: Bayesian Models and XGBoost
Subgroup Analysis
• Interaction Trees (Uplift Modeling)
• Virtual Twins
• Optimal Treatment Regimes
• Local Control (observational studies)

Data Stories and Demonstration
jmp.com

JMP Genomics Workflow
Data Story 1: Sepsis Survival Hospital Metabolites
Key Concepts
- Basic Expression Workflow
- Predictive Modeling Review
- Cross Correlation
Background and Goal
• Sepsis outcome: a combination of demographics, clinical signs, and biomarkers
• Develop a clinical + biomarker panel to predict sepsis survival
Study
• Suspected sepsis cases enrolled in ER hospitals
• Clinical measures and blood (plasma metabolites and proteins) collected at admittance
• 31 sepsis non-survivors, 89 sepsis survivors, 29 SIRS-criteria non-sepsis
Analyses
• Metabolite profiling comparing sepsis outcomes
• Machine learning to predict death from sepsis on data collected at hospital admittance
• Cross correlation: hypothetical association of other clinical metrics to metabolites

JMP Genomics Workflow
Data Story 2: FAP Chemoprevention Clinical Trial
Key Concepts
- Baseline Corrected RNA-Seq ANOVA
- Uplift Modeling
- Virtual Twins
• Identify differential molecular expression signatures for normal vs. polyp tissues attributable to chemoprevention therapy treatment in FAP
• Molecular expression baseline measurements taken on normal tissue prior to treatment; follow-up mRNA collection on polyp and normal tissue after 6-month treatment (Drug vs Placebo)
• GEO Data Submission of NextGen RNA sequencing of duodenal polyps in FAP

JMP Live Chemoprevention Report
https://public.jmp.com/packages/kSJt4tVHmzKTJGTdZzlRk

JMP Genomics Workflow
Data Story 3: Anonymized Diabetes Drug Trial
Key Concepts
- Uplift Modeling
- Virtual Twins
- Optimal Treatment Regimes
- Predictive Modeling Review
Background and Goal
• The glycated haemoglobin (HbA1c) test is a common metric of how well diabetes is being controlled
• Use subgroup analyses to find clinical measures or metabolite biomarkers that exhibit an interaction effect in response (HbA1c change from baseline) due to treatment
  - Find patients responding well, those that don't respond, and those who SHOULD/SHOULD NOT be treated
Study
• Patients enrolled in a diabetes drug trial
• Clinical measures and metabolites at baseline and initial trial visits

Questions: Kelci.Miclaus@jmp.com
jmp.com