The Reference Model A Decade of Healthcare Projective

- Slides: 21

The Reference Model: A Decade of Healthcare Projective Predictive Analytics with Python
Jacob Barhak, Austin, Texas
PyTexas 2017, November 18-19, 2017
Galvanize, Austin, TX, USA

Python Enablers Over a Decade
Earlier work from 2007-2012:
• numpy – math and statistics
• scipy – optimization functions
• sympy – symbolic math
• parser – creating a modeling language
• wxPython – User Interface
• matplotlib – visualization
More recent work from 2012-2017:
• MIST – MIcro Simulation Tool
• StarCluster – virtual cluster on the cloud
• Inspyred – Evolutionary Computation
• xmllib – parsing XML data
• Bokeh – Visualization
Future work:
• libsbml – Systems Biology Markup Language

Predicting Disease Progression Challenges
• Visualized using Matplotlib
• 64 trajectories from 64 model combinations for a single cohort
• 4352 trajectories from 64 model combinations & 68 = 34 x 2 cohorts
• So much information that even the legend won't fit
• The end point number represents survival after 10 years from 1000 people
• So much variability amongst existing models and different conditions
MIST_RefModel_2014_03_03_MATRIX_TraceBack.zip

The Reference Model Accumulates Knowledge
• Knowledge = Observed Data & Assumptions / Models
• 8 Populations = 40 cohorts x 2
• 50 Models x 4 Assumptions
[Figure labels: Raw Models, Biomarkers Independent, Fully Correlated, BioMarker Correction, Date Correction, Date+BioMarker Correction, Best Model, Unexplained]
Version 22 - MIST_RefModel_2014_04_29_MATRIX_TraceBack.zip

The Reference Model Historical Evolution
• 2012 – Start
• 2013 – High Performance Computing and Cloud Computing
• 2014 – Evolutionary Computation for Population Generation
• 2015 – Object Oriented Population Generation
• 2016 – Assumption Engine / Model Cooperation
• 2017 – Interface to ClinicalTrials.Gov
Year    Cohorts (#)    Populations (#)    Models (#)
2012    22             4                  64
2013    34             6                  64
2014    40             8                  400
2015    47             9                  1028
2016    47             9                  1
2017    91             22                 1
Continuous Model Space = Infinite Model Combinations

The Reference Model Now Employs Model Competition and Cooperation
• Ensemble model where models cooperate
• Calculates fitness of multiple models to multiple population sets
• Uses an Assumption Engine to handle different competing assumptions
• Uses High Performance Computing
[Figure: three disease processes, CHD (No CHD, MI, Survive MI, CHD Death), Stroke (No Stroke, Stroke, Survive Stroke, Stroke Death) and Competing Mortality (Alive, Other Death); equations A to D combine as aA+bB+cC+dD and equations E to H as eE+fF+gG+hH; a Discrete Fitness Matrix of equations against populations Pop 1, Pop 2, Pop 3 is replaced by a Continuous Fitness Space]
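
The cooperation idea on this slide (a weighted sum such as aA+bB+cC scored against several populations) can be sketched in a few lines. This is only an illustration under stated assumptions: the model names, predicted values, observed values and the mean-absolute-error fitness are all made up here, and the sketch does not reproduce the MIST or Reference Model code.

```python
# Hypothetical 10-year death fractions predicted by three models (A, B, C)
# for three populations. All numbers are invented for illustration.
PREDICTED = {
    "A": [0.10, 0.20, 0.30],
    "B": [0.14, 0.18, 0.26],
    "C": [0.08, 0.25, 0.35],
}
OBSERVED = [0.12, 0.19, 0.28]  # hypothetical observed outcome per population


def fitness_matrix(predicted, observed):
    """Discrete view: absolute error of each single model on each population."""
    return {
        name: [abs(p - o) for p, o in zip(values, observed)]
        for name, values in predicted.items()
    }


def mixture_prediction(predicted, weights):
    """Cooperation: weighted sum a*A + b*B + c*C with weights normalized to 1."""
    total = sum(weights.values())
    n_populations = len(next(iter(predicted.values())))
    return [
        sum(weights[m] * predicted[m][i] for m in predicted) / total
        for i in range(n_populations)
    ]


def mixture_fitness(predicted, observed, weights):
    """Continuous view: mean absolute error of the ensemble (lower is better)."""
    mixed = mixture_prediction(predicted, weights)
    return sum(abs(m - o) for m, o in zip(mixed, observed)) / len(observed)


if __name__ == "__main__":
    print(fitness_matrix(PREDICTED, OBSERVED))
    print(mixture_fitness(PREDICTED, OBSERVED, {"A": 0.5, "B": 0.5, "C": 0.0}))
```

With these invented numbers an equal mix of A and B fits the observed data better than either model alone, which is the point of letting models cooperate rather than only compete.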

The Reference Model Assumption Engine – A model is an assumption!
• The Assumption Engine allows us to “throw” assumptions at it
• It figures out which assumptions work well together to fit observed data
 – Points to significant models
 – Rejects incompatible models
[Figure: fitness surface; axes: Equation Coefficient (t1=a), Equation Coefficient (t2=b), Fitness (s)]
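
A minimal sketch of "throwing" assumptions at an engine: every candidate equation gets a coefficient, the coefficients are searched for the best fit to observed data, and equations whose best coefficient is zero are effectively rejected. The equation names, numbers, squared-error fitness and exhaustive grid search are assumptions chosen for illustration; the source does not state how the actual Assumption Engine searches the coefficient space.

```python
from itertools import product

# Hypothetical predictions of three competing equations for three populations.
EQUATIONS = {
    "EqA": [0.10, 0.20, 0.30],
    "EqB": [0.14, 0.18, 0.26],
    "EqC": [0.30, 0.05, 0.50],  # deliberately incompatible with the data
}
OBSERVED = [0.12, 0.19, 0.28]  # invented observed outcomes


def simplex_grid(n_models, steps):
    """Yield coefficient tuples on a regular grid that sum to 1."""
    for combo in product(range(steps + 1), repeat=n_models):
        if sum(combo) == steps:
            yield tuple(c / steps for c in combo)


def fitness(coefficients, equations=EQUATIONS, observed=OBSERVED):
    """Mean squared error of the coefficient-weighted mixture (lower is better)."""
    names = list(equations)
    total = 0.0
    for i, target in enumerate(observed):
        mixed = sum(c * equations[n][i] for c, n in zip(coefficients, names))
        total += (mixed - target) ** 2
    return total / len(observed)


def best_mixture(steps=10):
    """Exhaustively search the grid and return the best coefficient tuple."""
    return min(simplex_grid(len(EQUATIONS), steps), key=fitness)


def significant(coefficients, threshold=0.05):
    """Names of equations whose coefficient exceeds the threshold."""
    return [n for n, c in zip(EQUATIONS, coefficients) if c > threshold]


if __name__ == "__main__":
    best = best_mixture()
    print(best, significant(best))
```

A grid search is used here only because it is easy to verify by hand; a real coefficient space is continuous and would call for a proper optimizer.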

Optimization Visualized with Bokeh: Simple

How Fast Does Medical Practice Improve?
• Multiple models were used, with model stamps ranging from 1978.5 to 2007.05
• Validated against 9 populations included in the period ranging from 1977 to 2010
• The assumption engine was asked to calculate the best fit:
 – Data and model from 3 decades
 – Significance of each model used
 – The yearly improvement coefficient
 – The prevention coefficient = indicating if improvement is pre event or post event
[Figure labels: Model Time Stamp, Data Time Interval, Simulated Time, Adjustment Time]

Optimization Visualized with Bokeh: Complex with Temporal Correction

Equivalent to Moore’s Law for improvement for Diabetic Cardiovascular Disease Mortality
• Correcting models for medical practice improvement creates a better model mixture.
• Optimal Yearly Temporal Correction CVD Coefficient was = 0.86/0.87
 – Moore’s Law: Number of transistors doubles every year
 – CVD death probability halved every 5 years
• The Prevention coefficient = 0.43/0.53
 – Prevention/Treatment improved equally
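
The link between a yearly coefficient of 0.86/0.87 and "halved every 5 years" can be checked with a short calculation. The sketch assumes the coefficient multiplies the CVD death probability once per year, which is an interpretation of the slide rather than something it states; the function names are hypothetical.

```python
import math


def halving_time(yearly_coefficient):
    """Years until a quantity multiplied by the coefficient each year is halved."""
    if not 0.0 < yearly_coefficient < 1.0:
        raise ValueError("coefficient must be between 0 and 1 for a decay")
    return math.log(0.5) / math.log(yearly_coefficient)


def remaining_fraction(yearly_coefficient, years):
    """Fraction of the original probability left after the given number of years."""
    return yearly_coefficient ** years


if __name__ == "__main__":
    for coefficient in (0.86, 0.87):
        print(
            coefficient,
            round(halving_time(coefficient), 2),
            round(remaining_fraction(coefficient, 5), 3),
        )
```

Under that assumption, 0.87 raised to the fifth power is just under one half, so the halving time is close to 5 years; 0.86 gives a slightly shorter halving time.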

Life Expectancy – Assuming Future = Past
[Table: life expectancy by Gender (Male, Female), Smoke? (Non Smoking, Smoking), Age (45, 55, 65, 75), A1c (6, 8, 10), Lipid Ratio (4, 5, 6, 7, 8) and SBP (120, 140, 160, 180); the cell layout is not recoverable from this transcript]
Reproducibility Info: MIST_RefModel_2017_01_02_MODSIM2017.zip using model version 45

Life Expectancy – Temporal Correction
[Table: life expectancy by Gender (Male, Female), Smoke? (Non Smoking, Smoking), Age (45, 55, 65, 75), A1c (6, 8, 10), Lipid Ratio (4, 5, 6, 7, 8) and SBP (120, 140, 160, 180); the cell layout is not recoverable from this transcript]
Reproducibility Info: MIST_RefModel_2016_11_17_MODSIM2017.zip using model version 45

The Reference Model Data Sources
The Reference Model accumulates knowledge from public data sources:
1. Population data = observed, evidence-based data
 – Summary statistics are published in clinical trial reports
  • Demographics Summary
  • Trial outcomes
2. Risk Equations = models / assumptions
 – Published in the literature as equations
 – In some cases actual code is published
Data is Public, Whereas Individual Data is Restricted
Using public data provides a wider view

Population Generation Process
• Generation Rules:
 – Define how to generate a single individual
 – Test if individual fits the inclusion/exclusion criteria
 – Define ties and correlations between characteristics
• Objectives:
 – Define aggregate targets for the entire population
 – Reduce random generation error
 – Handle skewed distributions to fit target
• Object Oriented Population Generation:
 – Inheritance
 – Code reusability
 – Defining defaults
 – Defining hypotheses
[Diagram labels: Monte Carlo, MIST Expression Compiler, Selection, Evolutionary Computation, INSPYRED, Result: Population Converges to Objectives]
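
The generation rules and objectives above can be sketched as a small loop: draw individuals by Monte Carlo, keep those meeting the inclusion criteria, then repeatedly swap individuals and keep a swap only when the population moves closer to the aggregate targets. The distributions, thresholds and targets are invented, and the simple mutate-and-select loop stands in for evolutionary computation; it does not use Inspyred or the MIST expression compiler.

```python
import random


def generate_individual(rng):
    """Generation rule: draw one hypothetical individual (age and systolic BP)."""
    age = rng.gauss(60, 10)
    sbp = 100 + 0.5 * age + rng.gauss(0, 12)  # crude tie between age and SBP
    return {"age": age, "sbp": sbp}


def is_eligible(person):
    """Inclusion/exclusion criteria with made-up thresholds."""
    return 40 <= person["age"] <= 80 and person["sbp"] < 180


def error(population, targets):
    """Objective: summed absolute gap between population means and targets."""
    return sum(
        abs(sum(p[key] for p in population) / len(population) - target)
        for key, target in targets.items()
    )


def evolve_population(targets, size=200, pool_size=2000, iterations=3000, seed=0):
    """Return (population, initial_error, final_error) after mutate-and-select."""
    rng = random.Random(seed)
    pool = []
    while len(pool) < pool_size:  # Monte Carlo generation with rejection
        candidate = generate_individual(rng)
        if is_eligible(candidate):
            pool.append(candidate)
    population, reserve = pool[:size], pool[size:]
    initial_error = best = error(population, targets)
    for _ in range(iterations):
        i, j = rng.randrange(size), rng.randrange(len(reserve))
        population[i], reserve[j] = reserve[j], population[i]  # mutation: swap
        trial = error(population, targets)
        if trial < best:
            best = trial  # selection: keep the improvement
        else:
            population[i], reserve[j] = reserve[j], population[i]  # undo swap
    return population, initial_error, best


if __name__ == "__main__":
    _, before, after = evolve_population({"age": 62.0, "sbp": 133.0})
    print(before, after)
```

Because a swap is kept only when it lowers the error, the final error can never exceed the initial one, which mirrors the "population converges to objectives" outcome on the slide.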

ClinicalTrials.Gov
• National Institutes of Health (NIH) Project
• Accumulates Clinical Trial Data
• Many Studies are Now Required to Register
• Number of International Trials Grows Rapidly
Date           Trials     Studies with Results
3-Nov-2017     258,046    28,785
12-Feb-2017    236,687    24,251
27-Sep-2016    226,460    22,614
7-Apr-2015     187,653    Not collected

The Reference Model Populations
• Reduced import effort from days to hours
• Using ClinicalTrials.Gov roughly doubled information
• Before import: 9 populations, 47 cohorts
• New import from ClinicalTrials.Gov: 13 populations, 44 cohorts
Study Abbreviation    ClinicalTrials.Gov ID / Existed before    Cohorts modeled    Population Size
ASPEN                 Existed                                   7                  2,410
ADVANCE               Existed                                   9                  11,140
ACCORD (BP)           Existed                                   3                  4,733
UKPDS (33)            Existed                                   3                  3,867
KP                    Existed                                   6                  29,247
NDR                   Existed                                   6                  29,034
Look AHEAD            Existed                                   3                  5,145
ADDITION              Existed                                   3                  3,057
CARDS                 Existed                                   7                  2,838
ACCORD (Glycemia)     NCT00000620                               -                  10,251
BARI 2D               NCT00006305                               -                  2,368
ORIGIN                NCT00069784                               -                  12,537
TREAT                 NCT00093015                               -                  4,038
IONM                  NCT00191282                               -                  1,115
RECORD                NCT00379769                               -                  4,447
HPS2-THRIVE           NCT00461630                               -                  25,673
ALTITUDE              NCT00549757                               -                  8,561
TECOS                 NCT00790205                               -                  14,671
EXAMINE               NCT00968708                               -                  5,380
SAVOR-TIMI 53         NCT01107886                               -                  16,492
EMPA-REG OUTCOME      NCT01131676                               -                  7,020
ELIXA                 NCT01147250                               -                  6,068
TOTAL                                                           91                 210,092
Cohort counts for the 13 imported studies survive only partially in this transcript (3, 5, 3, 3, 3, 3, 3, 4, 3) and cannot be assigned to rows; they total 44.

Issues Handled During Import
• Systematic review
 – Organizing results in tabular human readable format
• Population generation
 – Code generation for individuals
 – Code inclusion/exclusion criteria
 – Code generation of objectives
 – Name matching
 – Unit conversion – context sensitive
 – Race/Ethnicity conversion using user defined dictionary
 – Time extraction from free text
• Outcome Conversion
 – Outcomes scaling to same reference
 – Cohort mapping
 – Calculation of missing full cohort outcomes
Summary of Import Module
1. Better view for human
2. Automates error prone tasks
3. Writes actual modeling code
4. Reproducible and traceable
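
Two of the import issues listed above, context-sensitive unit conversion and dictionary-based race/ethnicity conversion, lend themselves to a short sketch. Everything here is hypothetical: the function names, the dictionary entries and the category labels are invented, and the conversion divisors are commonly used approximations rather than values taken from the import module.

```python
# Divisors for mg/dL -> mmol/L depend on the substance measured (the context).
# These are commonly used approximations, listed for illustration only.
MG_DL_PER_MMOL_L = {
    "cholesterol": 38.67,
    "triglycerides": 88.57,
    "glucose": 18.0,
}

# Hypothetical user-defined dictionary mapping trial wording to model categories.
RACE_DICTIONARY = {
    "white": "White",
    "caucasian": "White",
    "black or african american": "Black",
    "asian": "Asian",
}


def to_mmol_per_l(value, unit, substance):
    """Convert a lab value to mmol/L, choosing the factor by substance."""
    unit = unit.strip().lower()
    if unit == "mmol/l":
        return value
    if unit == "mg/dl":
        return value / MG_DL_PER_MMOL_L[substance.strip().lower()]
    raise ValueError(f"unsupported unit: {unit}")


def map_race(label, dictionary=RACE_DICTIONARY):
    """Map a free-text race/ethnicity label to a model category."""
    return dictionary.get(label.strip().lower(), "Unmapped")


if __name__ == "__main__":
    print(to_mmol_per_l(200, "mg/dL", "cholesterol"))
    print(to_mmol_per_l(200, "mg/dL", "triglycerides"))
    print(map_race(" Caucasian "))
```

The same number with the same unit converts differently for cholesterol and triglycerides, which is why the conversion has to know what was measured. Labels that are missing from the dictionary are flagged as "Unmapped" instead of being guessed.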

What’s Next?
• Enhancing reproducibility by model sharing
 – Systems Biology Markup Language (SBML)
 – Using SBML Array within libsbml python package
 – Work in progress with Leandro Watanabe and Chris Myers from Utah
• Capturing a snapshot of the clinical mind
 – Interpretation of ambiguous data as the medical expert sees it
 – Merging human and machine understanding
• Mapping Clinical Trials
 – Explorers need maps

Acknowledgments
• Deanna J. M. Isaman, who is the spirit behind the great ideas. She taught me my first steps in disease modeling.
• Morton Brown & William H. Herman, for guidance, critical feedback, and growth environment.
• Continuum Analytics / Anaconda and specifically:
 – Benjamin Zeitler for creating the cloud AMI
 – Ilan Schnell for his work on Anaconda
 – Bryan Van de Ven for work and support with bokeh
 – Travis Oliphant for numpy and forging the connection
• Aaron Garrett and Drew Pruett for support and advice while implementing computational components
• All those who developed free software used and supported it, including Python, Anaconda, Spyder, numpy, SciPy, nose, winpdb, StarCluster, MatplotLib, Ubuntu, Sun Grid Engine, Bokeh.
• Nick Ide for help with initial steps of interfacing with ClinicalTrials.Gov
• The legacy IEST modeling framework was supported by the Biostatistics and Economic Modeling Core of the MDRTC (P60 DK020572) and by the Methods and Measurement Core of the MCDTR (P30 DK092926), both funded by the National Institute of Diabetes and Digestive and Kidney Diseases. The modeling framework was initially defined as GPL and was funded by Chronic Disease Modeling for Clinical Research Innovations grant (R21 DK075077) from the same institute. MIST is based on IEST.
• The Reference Model and MIST were developed independently without financial support!

Questions?
Jacob Barhak
http://sites.google.com/site/jacobbarhak/