Department of Quantitative Biomedicine Chair of Medical Informatics
Medical Data Formats and Standardization Challenges
Michael Krauthammer, MD, PhD
6/6/2019
Standardization involves … rules and guidelines for … the achievement of the optimum degree of order in a given context (ISO/IEC, 2004)
Data Traffic in Medicine: Need for Standardized Communication (thedatamap.org)
Medical Standards: Bringing Order into a Complex Space
• Extended health care system: UMLS Metathesaurus (3,848,696 concepts), Consumer Health Vocabulary, others
• Clinical medicine: SNOMED CT (349,548 concepts), e.g. ID: 31978002 |fracture of tibia|
• Genomic medicine: HGVS nomenclature, «terms» for billions of single nucleotide variants, e.g. ENST00000288602:c.1799T>A («BRAF V600E mutation»)
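The HGVS example on this slide is a structured «term» that a program can take apart. The following is a minimal sketch, not a full HGVS parser: it handles only simple coding-sequence substitutions of the form `reference:c.positionREF>ALT`, and the names `Substitution` and `parse_coding_substitution` are hypothetical, introduced here for illustration.

```python
import re
from typing import NamedTuple


class Substitution(NamedTuple):
    """A single-nucleotide substitution on a coding sequence (hypothetical type)."""
    reference: str   # transcript identifier, e.g. an Ensembl ENST accession
    position: int    # 1-based position in the coding sequence
    ref_base: str    # reference nucleotide
    alt_base: str    # alternate nucleotide


# Assumption: only the simple pattern "<reference>:c.<position><ref>><alt>" is covered.
_HGVS_SUBSTITUTION = re.compile(
    r"^(?P<reference>[A-Za-z0-9_.]+):c\.(?P<position>\d+)"
    r"(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def parse_coding_substitution(text: str) -> Substitution:
    """Parse a simple HGVS coding substitution, ignoring stray whitespace."""
    compact = "".join(text.split())  # tolerate spacing introduced by text extraction
    match = _HGVS_SUBSTITUTION.match(compact)
    if match is None:
        raise ValueError(f"not a simple coding substitution: {text!r}")
    return Substitution(
        reference=match["reference"],
        position=int(match["position"]),
        ref_base=match["ref"],
        alt_base=match["alt"],
    )


variant = parse_coding_substitution("ENST00000288602:c.1799T>A")
print(variant)
```

Insertions, deletions, protein-level descriptions and genomic coordinates are deliberately out of scope; a production system would rely on a dedicated HGVS library rather than a single regular expression.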
Medical Standards ≠ Standardization of Medicine
Data generation from patient healthcare interactions:
Structured «standardized» data (SNOMED CT, LOINC)
• EHR: structured data capture
• Structured reporting
• Ab initio structured data: lab, demographics
Unstructured data
• Text
• Images
• Sequences
Structured vs Unstructured Medical Data
Wei WQ, Denny JC. Extracting research-quality phenotypes from electronic health records to support precision medicine. Genome Med. 2015 Apr 30;7(1):41
Unstructured Big Data: Integration Challenge
[Figure: unstructured data lakes at Hospital A and Hospital B; automated data encoding to SNOMED CT and LOINC]
Making Sense of Unstructured Narrative Data: Natural Language Processing (NLP)
[Figure: from unstructured to structured]
Yala A, Barzilay R, et al. Using machine learning to parse breast pathology reports. Breast Cancer Res Treat. 2017 Jan;161(2):203-211
Key Task in Natural Language Processing: Named-Entity Recognition
Jensen PB, Jensen LJ, Brunak S. Mining electronic health records: towards better research applications and clinical care. Nat Rev Genet. 2012 May 2;13(6):395-40
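Named-entity recognition can be illustrated with the simplest possible approach, a dictionary lookup that maps text mentions to concept codes. This is a sketch under stated assumptions, not the method of the cited papers: the lexicon is a toy, only the SNOMED CT code 31978002 |fracture of tibia| comes from the slides, the synonym "tibial fracture" is assumed for illustration, and `find_entities` is a hypothetical helper name.

```python
import re
from typing import Dict, List, Tuple

# Toy lexicon. The code for "fracture of tibia" is taken from the slides;
# the synonym entry is an assumption added for illustration.
LEXICON: Dict[str, str] = {
    "fracture of tibia": "SNOMED CT 31978002",
    "tibial fracture": "SNOMED CT 31978002",
}


def find_entities(text: str, lexicon: Dict[str, str] = LEXICON) -> List[Tuple[int, int, str, str]]:
    """Return (start, end, mention, code) for every lexicon term found in text.

    Matching is case-insensitive and respects word boundaries. Negation
    ("no tibial fracture"), misspellings and abbreviations are NOT handled.
    """
    hits = []
    for term, code in lexicon.items():
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            hits.append((match.start(), match.end(), match.group(0), code))
    return sorted(hits)


note = "X-ray shows a Fracture of tibia on the left; prior tibial fracture in 2010."
for start, end, mention, code in find_entities(note):
    print(f"{start}-{end}: {mention!r} -> {code}")
```

Real clinical NER systems go well beyond this, using statistical or neural models and handling negation and context, which is part of why automated encoding remains imperfect.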
Automated Patient Phenotyping from EHR Data: Multiple Methods
Robinson, J. R., Wei, W.-Q., Roden, D. M., & Denny, J. C. (2018). Defining Phenotypes from Clinical Data to Drive Genomic Research. Annual Review of Biomedical Data Science, 1(1), 69–92.
PheKB: Standardized Phenotyping Instructions for EHR Data (www.phekb.org)
Mo H, Thompson WK, et al. Desiderata for computable representations of electronic health records-driven phenotype algorithms. J Am Med Inform Assoc. 2015 Nov;22(6):1220-30
Towards Standardized Big Data from the EHR
High-throughput phenotyping at Hospital A, Hospital B and Hospital C enables:
• Patient recruitment for clinical trials
• Phenome-wide association studies
• Real-world evidence from Big Data
From Evidence-based Medicine (EBM) to Medicine-based Evidence (MBE) using EHR Data
Evidence-based medicine pyramid:
• Expert opinion
• Case report
• Case series
• Case-control
• Observational cohort
• RCT
• Review of RCTs
Medicine-based evidence:
• Allow evidence from clinical practice, in particular the electronic health record
• Find the most appropriate clinical evidence for your patient; do not apply a hierarchy
Concato, J., & Horwitz, R. I. (2018). Randomized trials and evidence in medicine: A commentary on Deaton and Cartwright. Social Science & Medicine, 210, 32–36.
Building a Europe-Wide Real World Evidence Network based on EHR Data (www.ehden.eu)
Standardization Challenges
– Who performs data encoding?
  – Ab initio: ask physicians to use highly controlled language («synoptic reporting»), at the risk of job dissatisfaction
  – Let physicians use natural language, then use computers to perform data encoding (imperfect?)
– How much standardization do you enforce?
  – Comprehensive standardization
  – Minimal standards
– When do you standardize (in medical research)?
  – Early: you may stifle innovation
  – Late: you will have to deal with data «chaos»
Standardization Challenges
– What to standardize?
  – Disease-centered versus patient-centered (quality of life) concepts
– Is imperfect encoding in light of Big Data acceptable?
  – True signals will become visible despite encoding noise
– How do we standardize machine-derived disease states?
  – Data-driven early disease states based on sensor data that do not exhibit typical symptoms
– How do we enforce data standardization to allow inter-institutional use of AI classification routines?
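The claim that true signals remain visible despite encoding noise can be made concrete with a small calculation. This is an illustrative sketch only: it assumes the encoding errors are non-differential (the same sensitivity and specificity in both groups), and all numbers and function names are hypothetical, not taken from the talk.

```python
def observed_prevalence(true_prevalence: float, sensitivity: float, specificity: float) -> float:
    """Prevalence as seen through an imperfect encoder.

    True cases are detected with probability `sensitivity`; non-cases are
    wrongly encoded as cases with probability `1 - specificity`.
    """
    return sensitivity * true_prevalence + (1.0 - specificity) * (1.0 - true_prevalence)


def risk_ratio(prevalence_exposed: float, prevalence_unexposed: float) -> float:
    """Ratio of disease prevalence between an exposed and an unexposed group."""
    return prevalence_exposed / prevalence_unexposed


# Hypothetical scenario: the disease is twice as common in the exposed group.
true_exposed, true_unexposed = 0.20, 0.10
sensitivity, specificity = 0.80, 0.95  # assumed quality of automated encoding

true_rr = risk_ratio(true_exposed, true_unexposed)
observed_rr = risk_ratio(
    observed_prevalence(true_exposed, sensitivity, specificity),
    observed_prevalence(true_unexposed, sensitivity, specificity),
)

print(f"true risk ratio:     {true_rr:.2f}")
print(f"observed risk ratio: {observed_rr:.2f}")
```

Under these assumptions the observed association is weaker than the true one but still clearly above 1. The argument does not carry over automatically when encoding errors differ between groups or institutions, which is one reason the standardization question on this slide matters.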
michael.krauthammer@uzh.ch