Developing Novel Data Architectures for Comparative Effectiveness Research

Developing Novel Data Architectures for Comparative Effectiveness Research. Health Care Day, Leadership Tampa, April 6, 2011. David A. Fenstermacher, Ph.D., Chair & Associate Professor, Department of Biomedical Informatics, H. Lee Moffitt Cancer Center & Research Institute

What is Comparative Effectiveness Research? • Comparative Effectiveness Research – The generation and synthesis of evidence that compares the benefits and harms of alternative methods to prevent, diagnose, treat and monitor a clinical condition or to improve the delivery of care. – Provides an opportunity to improve the quality and outcomes of health care by providing more and better information to support decisions by the public, patients, caregivers, clinicians and policy makers. From: Initial National Priorities for Comparative Effectiveness Research, National Academies Press

CER - Not Without Controversy

Total Cancer Care and Patient Centered Outcomes Research “The purpose of comparative effectiveness research (CER) is to provide information that helps clinicians and patients choose which option best fits an individual patient's needs and preferences.” Federal Coordinating Council for CER (6/30/2009) Key Statements The Model

The Consent Process: The Total Cancer Care Protocol • Can we follow you throughout your lifetime? • Can we study your tumor using molecular technology? • Can we recontact you? Electronic Consenting System: wireless touch-screen tablet; connects via secure interface and forwards HIPAA-compliant information to database. Consists of IRB-approved: • Introductory Video • Consent Video by PI • Informed Consent • Signature Capture • Demographics Survey

Partners in the Fight Against Cancer

Total Cancer Care™ to Date
• 18 Consortium Sites (including MCC)
• 88,616 Consented Patients: MCC (61%), Sites (39%)
• 33,435 Tumors Collected: MCC (37%), Sites (63%)
• 16,226 Gene Expression Profiles (TCC consented since inception)
As of 6/01/2012
Data Generated from Specimens
• CEL Files (Gene Expression Data): 16,226 files
• Targeted Exome Sequencing: 4,016 samples
• Whole Exome Sequencing (Ovary, Lung, Colon): 535 samples
• Whole Genome Sequencing (Melanoma): 13 samples with normal pairs
• SNP/CNV (Lung, Breast, Colon): 559 samples

Stratifying Populations for CER Non-Small Cell Lung Cancer • Stratification means that the investigator has enough knowledge of the population to subdivide the population, and to allocate sampling effort accordingly. Stage 2 Stage 3 Stage 4 Treatment A Treatment B Treatment C
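As a rough illustration of "allocating sampling effort" across strata, the sketch below splits a fixed sample size across disease stages in proportion to stratum size. Proportional allocation is only one possible rule and is an assumption here, not something the slide specifies; the function name and the example counts are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def proportional_allocation(strata_labels: Iterable[str], total_samples: int) -> Dict[str, int]:
    """Split total_samples across strata in proportion to stratum size.

    Uses the largest-remainder method so the allocations sum exactly
    to total_samples. This is an illustrative rule, not the talk's method.
    """
    counts = Counter(strata_labels)
    population = sum(counts.values())
    # Ideal (fractional) share of the sample for each stratum.
    quotas = {s: total_samples * c / population for s, c in counts.items()}
    # Start from the integer part of each quota.
    allocation = {s: int(q) for s, q in quotas.items()}
    leftover = total_samples - sum(allocation.values())
    # Hand remaining units to the strata with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda s: quotas[s] - allocation[s], reverse=True)
    for s in by_remainder[:leftover]:
        allocation[s] += 1
    return allocation


if __name__ == "__main__":
    # Hypothetical cohort: 50 Stage 2, 30 Stage 3 and 20 Stage 4 patients.
    cohort = ["Stage 2"] * 50 + ["Stage 3"] * 30 + ["Stage 4"] * 20
    print(proportional_allocation(cohort, 10))
```

Other allocation rules (for example, oversampling rare strata) would fit the same interface.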

Molecular Stratification • Molecular technologies – Genomics/Transcriptomics – Proteomics – Metabolomics

Leveraging Data for Patient Centered Outcomes Research • Observational Clinical Data – Must assess a comprehensive array of health-related outcomes for diverse patient populations – Interventions may compare medications, procedures, medical and assistive devices and technologies, diagnostic testing, behavioral change, and delivery system strategies – This research necessitates the development, expansion, and use of a variety of data sources and methods to assess comparative effectiveness and actively disseminate the results

Issues Curtailing Patient Centered Outcomes Research • The information gap – Partially due to how the data are collected, whether in electronic medical records that contain a mixture of discrete and unstructured data or in paper format. A recent survey of U.S. hospitals revealed that only 12% of respondents have a comprehensive EMR, and only an additional 6% of clinician offices have an EHR (1, 2). – An additional hurdle is that only a small portion of patients are ever enrolled in studies that strive to capture information on risk factors, quality of life or other patient-centric parameters that will be essential to supporting personalized medicine. (1) DesRoches et al., 2008, New England Journal of Medicine 359(1): 50-60. (2) Jha et al., 2010, Health Affairs (Millwood) 29(10): 1951-1957.

Issues Curtailing Patient Centered Outcomes Research • Although many nomenclatures and data standards exist (SNOMED CT, ICD-9-CM, MedDRA, LOINC, and GO) and are integrated through enterprise vocabulary systems, few healthcare organizations have created enterprise data governance strategies to adopt these standards across their information technology infrastructure.

Issues Curtailing Patient Centered Outcomes Research • Data describing the lineage and transformation of clinical and research data once they are moved from primary data systems (i.e., EMR or LIMS) rarely exist in formats consumable by clinicians, researchers and patients. Also, the lack of data quality standards poses significant challenges for the interpretation and usability of the data. • No national healthcare ID; patient mobility

Issues Curtailing Patient Centered Outcomes Research • The architecture of health information systems will be critical to sharing data between healthcare providers, both to facilitate personalized medicine and patient centered outcomes research and to obtain the information needed to develop evidence-based guidelines. The two main architectures currently in use are the centralized and the federated data models.
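To make the centralized versus federated distinction concrete, here is a minimal sketch, assuming each site holds its own patient records. In the centralized function, records are copied into one pool before querying; in the federated function, each site answers the query locally and only aggregate counts leave the site. The `Site` class, the record fields and the function names are hypothetical and do not describe the Moffitt systems.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, str]
Predicate = Callable[[Record], bool]


@dataclass
class Site:
    """A hypothetical consortium site holding its own patient-level records."""
    name: str
    records: List[Record]

    def count(self, predicate: Predicate) -> int:
        # Runs locally; only the aggregate count is returned to the caller.
        return sum(1 for record in self.records if predicate(record))


def centralized_count(sites: List[Site], predicate: Predicate) -> int:
    """Centralized model: copy every record into one pool, then query it."""
    warehouse = [record for site in sites for record in site.records]
    return sum(1 for record in warehouse if predicate(record))


def federated_count(sites: List[Site], predicate: Predicate) -> Dict[str, int]:
    """Federated model: send the query to each site and collect the answers."""
    return {site.name: site.count(predicate) for site in sites}


if __name__ == "__main__":
    sites = [
        Site("Site A", [{"stage": "3"}, {"stage": "4"}]),
        Site("Site B", [{"stage": "3"}]),
    ]
    is_stage_3 = lambda record: record["stage"] == "3"
    print(centralized_count(sites, is_stage_3))
    print(federated_count(sites, is_stage_3))
```

Both approaches give the same total for a simple count; the difference lies in where the patient-level data reside and which party executes the query.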

The Federated Network Data Model

Moffitt and CER • Creating CER Infrastructure based on Total Cancer Care Model – Enhance the Total Cancer Care Informatics Infrastructure – Capitalize on biomedical informatics, biostatistics, clinical trials and information technology expertise – Assess evolving CER infrastructure using pilot projects

Research Information Exchange

Data Warehouse Enhancements CER Data Mart

Creating CER Semantics (diagram labels: Overall Goals; Research Processes (CER); Contextual Metadata; Information Science Infrastructure: Hardware & Software; Physical Metadata) • Metadata is simply data about data • Distinct classes of metadata are required within a data warehouse (DW) environment • Two main classes of metadata – Contextual: relating to the research processes – Physical: relating to the DW infrastructure (data lineage, data transformations, etc.)
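One way to picture the two metadata classes is as two small records attached to each warehouse column: one describing the research context and one describing where the data came from and how they were transformed. This is only a sketch; every class, field and example value below is a hypothetical illustration rather than the actual Moffitt schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ContextualMetadata:
    """Relates a data item to the research process that uses it."""
    research_process: str
    definition: str


@dataclass
class PhysicalMetadata:
    """Relates a data item to the DW infrastructure: lineage and transformations."""
    source_system: str
    source_field: str
    transformations: List[str] = field(default_factory=list)


@dataclass
class WarehouseColumn:
    name: str
    contextual: ContextualMetadata
    physical: PhysicalMetadata

    def lineage(self) -> str:
        """Render the path from the source field, through each transformation, to this column."""
        origin = f"{self.physical.source_system}.{self.physical.source_field}"
        return " -> ".join([origin, *self.physical.transformations, self.name])


if __name__ == "__main__":
    column = WarehouseColumn(
        name="tumor_stage",
        contextual=ContextualMetadata("CER", "Stage of disease at diagnosis"),
        physical=PhysicalMetadata("EMR", "dx_stage", ["normalize stage codes"]),
    )
    print(column.lineage())
```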

The Moffitt Data Dictionary
MCC data dictionary, built using ISO/IEC 11179 metadata standards
The ISO/IEC 11179 Model
• Conceptual Domain: Agent
• Data Element Concept: Chemopreventive Agent Name
• Value Domain: CTEP Drug Names
• Valid Values: Cyclooxygenase Inhibitor, Doxercalciferol, Eflornithine, …, Ursodiol
• MCC Data Element: Chemopreventive Agent Name
Vocabularies: SNOMED CT, ICD-9-CM, MedDRA, LOINC, GO
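The relationships in the ISO/IEC 11179 example above can be sketched as a few linked records: a data element pairs a data element concept (tied to a conceptual domain) with a value domain that lists the permissible values. This is a simplified illustration using the slide's example; the class and method names are assumptions, and only the four valid values visible on the slide are included (the slide's list is truncated).

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class ValueDomain:
    """A named set of permissible values."""
    name: str
    permissible_values: FrozenSet[str]


@dataclass(frozen=True)
class DataElementConcept:
    """What is being described, independent of how it is represented."""
    name: str
    conceptual_domain: str


@dataclass(frozen=True)
class DataElement:
    """A concept paired with a value domain."""
    name: str
    concept: DataElementConcept
    value_domain: ValueDomain

    def is_valid(self, value: str) -> bool:
        return value in self.value_domain.permissible_values


# Only the values shown on the slide; the full CTEP list is not reproduced here.
ctep_drug_names = ValueDomain(
    "CTEP Drug Names",
    frozenset({"Cyclooxygenase Inhibitor", "Doxercalciferol", "Eflornithine", "Ursodiol"}),
)
agent_name_concept = DataElementConcept("Chemopreventive Agent Name", "Agent")
agent_name_element = DataElement("Chemopreventive Agent Name", agent_name_concept, ctep_drug_names)

if __name__ == "__main__":
    print(agent_name_element.is_valid("Ursodiol"))
    print(agent_name_element.is_valid("Unknown Agent"))
```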

CER Data Dictionary

Unlocking Clinical Data • Natural Language Processing • EMR: a mixture of data – Discrete data – BLOBs and CLOBs (text documents, .pdf) – Images, scanned (medical history)

Displaying Ontological-Based NLP Results

Accessing a Wealth of Data • Effectiveness Score (ES) – A quality metric derived from the data quality project (Attribute Score) – A measurement of a data element's correlation to a defined outcome variable – ES values can be used simply to evaluate the univariate effectiveness of each element, or serve as the input data set for advanced multivariate comparative effectiveness analysis and CER modeling.

Attribute Score • Created Data Quality Metrics Framework, a scoring system that provides percent weights and scores for each element and for each data quality attribute
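The slide does not give the scoring formula, so the following is only a sketch of one plausible reading: each data quality attribute receives a score between 0 and 1 and a percent weight, and a data element's Attribute Score is the weighted average. The attribute names (completeness, validity, timeliness) and the function name are hypothetical.

```python
from typing import Dict


def attribute_score(scores: Dict[str, float], percent_weights: Dict[str, float]) -> float:
    """Weighted average of per-attribute quality scores for one data element.

    scores: quality score in [0, 1] for each data quality attribute.
    percent_weights: weight of each attribute, in percent; must sum to 100.
    """
    if set(scores) != set(percent_weights):
        raise ValueError("scores and weights must cover the same attributes")
    if abs(sum(percent_weights.values()) - 100.0) > 1e-9:
        raise ValueError("percent weights must sum to 100")
    return sum(scores[name] * percent_weights[name] for name in scores) / 100.0


if __name__ == "__main__":
    # Hypothetical attributes and values for a single data element.
    scores = {"completeness": 0.9, "validity": 0.8, "timeliness": 0.5}
    weights = {"completeness": 50.0, "validity": 30.0, "timeliness": 20.0}
    print(attribute_score(scores, weights))
```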

E-Score Algorithm
• $ES_i = P(R_{i,p} \mid U \le p_i, H_0) \cdot P(R_{i,a} \mid W \ge a_i, H_0)$, where $W$ is a random variable following the empirical distribution of the AS: $P(W \ge A) = (\#\text{ of AS} \ge A)/N$. $U(x)$ is a test statistic of the data of the i-th element ($x$) such that $0 \le U(\cdot) \le 1$ and $P(U(X) \le a) = a$ if the i-th element is not significant.
• Interpretation:
  – $P(R_{i,p} \mid U \le p_i, H_0)$ is the probability that the i-th element has the highest significance conditional on all the uniformly optimal elements.
  – $P(R_{i,a} \mid W \ge a_i, H_0)$ is the probability that the i-th element has the largest AS conditional on all the uniformly optimal elements.
• Conclusion: $ES_i = \dfrac{1 - (1 - p_i)^M}{M\,p_i} \cdot \dfrac{1 - P(W < a_i)^M}{M\,\bigl(1 - P(W < a_i)\bigr)}$.
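The closing formula can be evaluated directly once M is read as an exponent. The sketch below does that under two assumptions the slide does not state: that M is the number of data elements being scored, and that P(W >= a_i) is taken from the empirical distribution of the supplied Attribute Scores. Both factors share the form (1 - (1 - q)^M) / (M q), which is the expected value of 1/(1 + K) for K following a Binomial(M - 1, q) distribution. Function names are hypothetical.

```python
from typing import List, Sequence


def rank_factor(q: float, m: int) -> float:
    """Evaluate (1 - (1 - q)^m) / (m * q), with the limit 1.0 as q approaches 0."""
    if q <= 0.0:
        return 1.0
    return (1.0 - (1.0 - q) ** m) / (m * q)


def e_scores(p_values: Sequence[float], attribute_scores: Sequence[float]) -> List[float]:
    """Compute ES_i for each element from its p-value and Attribute Score (AS).

    Assumes M equals the number of elements (the slide leaves M undefined)
    and uses the empirical tail P(W >= a_i) = (# of AS >= a_i) / N.
    """
    if len(p_values) != len(attribute_scores):
        raise ValueError("need one p-value and one AS per element")
    m = len(p_values)
    results = []
    for p_i, a_i in zip(p_values, attribute_scores):
        # Empirical probability that an AS is at least as large as a_i,
        # which equals 1 - P(W < a_i) in the slide's notation.
        tail = sum(1 for a in attribute_scores if a >= a_i) / m
        results.append(rank_factor(p_i, m) * rank_factor(tail, m))
    return results


if __name__ == "__main__":
    # Hypothetical inputs: three elements with p-values and Attribute Scores.
    print(e_scores([0.01, 0.5, 1.0], [0.9, 0.5, 0.1]))
```

Under this reading, an element with a small p-value and a high Attribute Score receives an ES close to 1, and the score falls as either quantity worsens.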

Data Representation Information from the CER data model can be retrieved and displayed in several formats. The data model includes tables that hold information about CER Projects, along with Milestone and Participation data, which can be displayed using SQL queries against the database and BIRT-generated reports. The Cmap node links can launch “on demand” reports or present various preformatted documents such as PDFs, Excel spreadsheets, etc.
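As an illustration of the kind of SQL query that could feed such a report, the sketch below builds a tiny in-memory database with project, milestone and participation tables and summarizes each project. The table layout, column names and sample rows are hypothetical; the actual CER data model is not shown in the slides, and SQLite is used here only because it ships with Python.

```python
import sqlite3
from typing import List, Tuple


def build_demo_db() -> sqlite3.Connection:
    """Create a hypothetical, in-memory version of the project tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE project (
            project_id INTEGER PRIMARY KEY,
            title      TEXT NOT NULL
        );
        CREATE TABLE milestone (
            milestone_id INTEGER PRIMARY KEY,
            project_id   INTEGER NOT NULL REFERENCES project(project_id),
            name         TEXT NOT NULL,
            completed    INTEGER NOT NULL  -- 1 = done, 0 = open
        );
        CREATE TABLE participation (
            project_id INTEGER NOT NULL REFERENCES project(project_id),
            member     TEXT NOT NULL
        );
        INSERT INTO project VALUES (1, 'Pilot A'), (2, 'Pilot B');
        INSERT INTO milestone VALUES
            (1, 1, 'Protocol approved', 1),
            (2, 1, 'Data extracted', 0),
            (3, 2, 'Protocol approved', 0);
        INSERT INTO participation VALUES
            (1, 'Investigator 1'), (1, 'Investigator 2'), (2, 'Investigator 1');
        """
    )
    return conn


def project_summary(conn: sqlite3.Connection) -> List[Tuple[str, int, int, int]]:
    """Return (title, milestones, completed milestones, participants) per project."""
    query = """
        SELECT p.title,
               (SELECT COUNT(*) FROM milestone m
                 WHERE m.project_id = p.project_id) AS milestones,
               (SELECT COUNT(*) FROM milestone m
                 WHERE m.project_id = p.project_id AND m.completed = 1) AS completed,
               (SELECT COUNT(*) FROM participation pa
                 WHERE pa.project_id = p.project_id) AS participants
          FROM project p
         ORDER BY p.title
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in project_summary(build_demo_db()):
        print(row)
```

A reporting tool would typically run a query of this shape and render the rows as a table or chart.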

Challenges for CER • To improve patient outcomes and safety, new information management systems built on semantic interoperability are required • Creation of regional consortia that can collect patient-level data (clinical, environmental, risk factor, molecular, and outcomes) and focus on specific classes of disease, develop research methodologies, create validation networks and encourage partnerships with industry leaders is needed to realize evidence-based approaches

Challenges for Patient Centered Outcomes Research • Initiatives in comparative effectiveness research need to be developed, as validation through clinical trials is not scalable and does not necessarily reflect the standard of care where the care is being given • Data sharing and privacy policies need to become global rather than regional to support Patient Centered Outcomes Research

Our Mission and Vision To contribute to the prevention and cure of cancer & To be the leader in the discovery, translation, and delivery of personalized cancer care