Genomics England (GeL) Research Environment Clinical Data

Genomics England (GeL) Research Environment Clinical Data
3rd May 2019
Georgia Chan

Overview
1. Schematic presentation of datasets and data flows
2. Programmes, models and clinical data
3. Linked datasets

Data Sources and Types
• Biorepository + Sequencing: Sequencing + Samples
• Genomic Medicine Centres: Primary clinical data
• NHSD + Public Health England: Secondary clinical data
• Bioinformatics + GeL: Reporting & Interpretation
Research Environment
1. Pseudonymized, linked, raw patient data.
2. High Performance Cluster.
3. Select applications.
GeCIP, Industry

Biorepository, Sequencing
• Sample-level data
• Plated sample data
• Related QC data
• Sequencing report
• BAM files and metadata

Bioinformatics
Table: Description
• tiering data: A table of tiered variants from the Genomics England Rare Disease Interpretation Pipeline.
• tiered variants frequency: The frequency of rare disease tiered variants in other cohorts (gnomAD etc.).
• panels applied: The gene panels applied to rare disease families for interpretation.
• gmc exit questionnaire: Data indicating the extent to which a family’s presented case can be explained by the combined variants reported to the GMC by Genomics England.
• aggregate gvcf sample stats: Quality control metrics for the aggregated gVCF (~60K samples).

Genomic Medicine Centres
• Demographic data
• Disease characteristics
• Primary data per programme

Genomic Medicine Centres Clinical Data - Cancer
• Individual-level and tumour-level data
• Tumour data
• Morphology
• Topography
• Linkage of metastases to primaries
• Staging
• Care plan (including surgery and chemotherapy)
• Circulating tumour markers
• Imaging
• Pathology (general and tumour type specific)
• Risk factors

Genomic Medicine Centres Clinical Data – Rare Disease
• Family-level and individual-level data
• Family pedigree
• Registration data
• Disease-level (refs GeL eligibility criteria) and phenotype-level (HPO)
• Early childhood observation data
• Laboratory blood tests
• Imaging
• Genetic investigation

Quick-view table: Rare Disease Analysis
• The rare disease analysis quick-view table combines important information from many rare disease related tables into a single table for easy browsing and analysis.
• It contains the latest genome delivery per participant.
• For example:
  • Participant-level: participant id, family id, sex, year of birth, ethnicity
  • Disease: recruited disease
  • QC metrics: genetic vs reported checks (Mendelian inconsistencies, sex checks)
  • Bioinformatics: path to genome (BAM / VCFs), tiering data
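The "latest genome delivery per participant" rule can be illustrated with a minimal sketch. It is not the Genomics England implementation: the field names (`participant_id`, `delivery_date`, `genome_path`), the file paths and the helper `latest_delivery_per_participant` are all hypothetical, chosen only to show how several deliveries collapse to one row per participant.

```python
from datetime import date

# Hypothetical delivery rows; field names and paths are illustrative only,
# not the actual GeL column names or file locations.
deliveries = [
    {"participant_id": "P1", "delivery_date": date(2018, 3, 1), "genome_path": "/genomes/P1_v1.bam"},
    {"participant_id": "P1", "delivery_date": date(2019, 1, 15), "genome_path": "/genomes/P1_v2.bam"},
    {"participant_id": "P2", "delivery_date": date(2018, 7, 9), "genome_path": "/genomes/P2_v1.bam"},
]


def latest_delivery_per_participant(rows):
    """Return {participant_id: row}, keeping only the most recent delivery."""
    latest = {}
    for row in rows:
        pid = row["participant_id"]
        # Keep this row if the participant is new or this delivery is more recent.
        if pid not in latest or row["delivery_date"] > latest[pid]["delivery_date"]:
            latest[pid] = row
    return latest


quick_view = latest_delivery_per_participant(deliveries)
for pid, row in sorted(quick_view.items()):
    print(pid, row["delivery_date"].isoformat(), row["genome_path"])
```

Under these assumptions each participant appears once, so participant-level fields such as sex or year of birth could be attached without duplicating rows.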

Secondary Clinical Data 1/2
NHSD
• Hospital Episode Statistics
  • Accident and Emergency
  • Admitted Patient Care
  • Critical Care
  • Outpatient Care
• Diagnostic Imaging Dataset (metadata only)
• Patient Reported Outcome Measures
• Mental Health Minimum Dataset (up to March 2014)
• Mortality datasets - ONS, CEN
• Received quarterly and approximately three months old
Image from CSL-UK

Secondary Clinical Data 2/2
Public Health England
• NCRAS (National Cancer Registration and Analysis Service)
  • Systemic Anti-Cancer Therapy
  • National Radiotherapy Dataset
  • Diagnostic Imaging Dataset (metadata only)
  • AV patient (tumour count and death cause)
  • AV tumour
  • AV treatment
  • Route to Diagnosis
  • National audits/surveys (CPES, LUCADA, PROM-colorectal, Cancer survivor PROMs)
• Received annually. Please check the Data Dictionary for period coverage.
• LabKey tables:
  • sact
• We aim to publish the rest of the tables in the next release.

Summary
Present:
• Real world data
• Different levels of validation over the course of the project.
• Data come from all phases of the pipeline, so they are at different stages of query resolution.
• Secondary data are longitudinal and cover periods of time specific to each dataset (consult the GeL data dictionary), resulting in multiple rows per participant, depending on activity.
• With every release, the data overwrite the previous submission with the most current update.
Growth so far: 80% new participant data, 20% more data on existing participants.
Future additions:
• OMOP Common Data Model
• Terminology server
• Sample/Sequencing
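Because the secondary data hold multiple rows per participant, analyses that expect one row per participant usually need a grouping step first. The following is a minimal sketch only: the row fields (`participant_id`, `dataset`, `year`) and the helper `summarise_by_participant` are hypothetical and do not reflect the real table schemas, which are described in the GeL data dictionary.

```python
from collections import defaultdict

# Hypothetical longitudinal rows, one per recorded activity; field names
# and values are illustrative only, not real GeL or NHSD column names.
episodes = [
    {"participant_id": "P1", "dataset": "apc", "year": 2015},
    {"participant_id": "P1", "dataset": "op", "year": 2017},
    {"participant_id": "P2", "dataset": "ae", "year": 2016},
]


def summarise_by_participant(rows):
    """Collapse many activity rows into one summary record per participant."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["participant_id"]].append(row)

    summary = {}
    for pid, items in grouped.items():
        years = [item["year"] for item in items]
        summary[pid] = {
            "n_rows": len(items),       # number of activity rows
            "first_year": min(years),   # earliest recorded activity
            "last_year": max(years),    # latest recorded activity
        }
    return summary


summary = summarise_by_participant(episodes)
for pid in sorted(summary):
    print(pid, summary[pid])
```

A participant with no recorded activity has no rows at all in this sketch, so absence from the summary should not be read as missing linkage without checking the period coverage of the dataset.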

Useful external links
• Primary data dictionary: GeL_DD
• Secondary data: NHSD data dictionary.
• The HES dataset is a long-established resource from NHSD. A wealth of information and full dataset descriptions can be found at:
  • https://digital.nhs.uk/data-and-information/data-tools-and-services/data-services/hospital-episode-statistics
• Licensing restrictions prevent us from sharing descriptions of ICD-10, OPCS-4 and HRG codes.
• The full lookup is available from the official source, TRUD (subscription required):
  • https://isd.digital.nhs.uk/trud3/user/guest/group/0/pack/28
• Individual lookups can be served by numerous online applications and the handy iOS application Reference NHS.