Big Data and How It Is Being Used
“Big Data and How It Is Being Used to Transform the Pharmaceutical Business Model”
Guna Rajagopal, VP & Global Head, Computational Sciences
Pharmacogenomics, Scientific Excellence, Innovation, Collaboration
Discovery Sciences
Integrating the digital universe of data to deliver precision medicine Eric Schadt, MSSM
The Big Data Challenge
“Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it …” (Anon.)
Opportunities & Challenges
• Growing confidence in the ability to leverage ‘Big Data’ for healthcare despite critical challenges involving:
  - sustainability
  - harmonization
  - indemnification
• Increasing computational power and cheaper sequencing costs
• Key issues of privacy, confidentiality and sharing of data to advance R&D still unresolved
• Cost – depending on duration, # of individuals, type of data collected
• Study subject retention – easier in clinical trials cf. prospective studies
• Biobanking – quality, # of samples, heterogeneity
• Multiple academic, government and private company efforts initiated
• Sampling errors
• Rich databases under development
• Bioinformatics
  – Multi ‘omics and metadata
  – ‘Normal’ and disease populations
  – Link genotype/phenotype
  – Sample size, incomplete/noisy data
  – Signal/noise issues
• Experimental and/or study design – nature and scope of query, ease of acquiring high-quality samples and associated phenotype data
Example – Application to Pharmacogenomics
The Question: How do we identify new genetic markers and mechanisms for response/nonresponse in patients receiving anti-TNF treatment for rheumatoid arthritis (RA), using samples and phenotypic data collected in a clinical study?
This question is addressed in 3 phases; we outline Phase 1.
Collaboration: In partnership with R&D IT and Immunology colleagues, an external collaborator (Nic Schork, JCVI), and Amazon and SDSC experts.
Our Challenge – Integrating Big Data, Cloud, HPC platforms & Analytics
[Diagram labels: RA WGS 90 TB; FASTQ, BAM, VCF, GFF, report; FASTQ/BAM; Encryption Factory; Amazon S3; FASTQ -> BAM -> VCF subset; Glacier (135 TB); BAM; SDSC HPC; AWS tran.; SMART; Clinical/biomarker data; Feinstein; VCF; Janssen; Scripps]
Critical support from R&D IT and SDSC staff
Big Data Needs Big Compute!
• 257 TB Lustre scratch used at peak
• 5,000 cores (30% of Gordon) in use at once
• Partnership with Intel/IMEC/SDSC to optimize code/HPC performance
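To put the figures above in proportion, here is a minimal back-of-the-envelope sketch. It uses only numbers quoted on the slides (5,000 cores, 30%, 257 TB scratch, and the 90 TB RA WGS input from the previous slide). The function names are hypothetical, and comparing peak scratch with input size is a rough assumption, since the slides do not say the two peaks coincided.

```python
def implied_total_cores(cores_in_use: int, fraction_of_machine: float) -> float:
    """Machine size implied by 'N cores = X% of the machine'."""
    return cores_in_use / fraction_of_machine


def scratch_to_input_ratio(scratch_tb: float, input_tb: float) -> float:
    """How many times larger peak scratch usage was than the raw input."""
    return scratch_tb / input_tb


# Figures quoted on the slides.
total_cores = implied_total_cores(5_000, 0.30)
ratio = scratch_to_input_ratio(257, 90)

print(f"Implied size of Gordon: about {total_cores:,.0f} cores")
print(f"Peak scratch was about {ratio:.1f}x the raw WGS input")
```

The ratio suggests that intermediate files took up several times the space of the raw input, so scratch capacity has to be planned alongside core counts.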
Limitations - Meaningfulness of Answers
A risk with “Big Data Analytics” is that one can “discover” patterns that are meaningless => Check, Double Check & Verify!
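The risk above can be illustrated with a small simulation. This is a sketch under stated assumptions, not the analysis used in the study: response status and marker carrier status are generated at random, so no marker is truly associated with response, and each marker is tested with a pooled two-proportion z-test. The patient count, marker count, carrier frequency and function names are all hypothetical. The Bonferroni threshold (alpha divided by the number of tests) is one standard way to "double check".

```python
import random
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(hits_a, n_a, hits_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test."""
    if n_a == 0 or n_b == 0:
        return 1.0
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (hits_a / n_a - hits_b / n_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def count_null_hits(n_patients=200, n_markers=2000, alpha=0.05, seed=0):
    """Count 'significant' markers when every association is pure noise.

    Returns (hits at the naive threshold, hits at the Bonferroni threshold).
    """
    rng = random.Random(seed)
    # Random responder/nonresponder labels: no real signal exists.
    responder = [rng.random() < 0.5 for _ in range(n_patients)]
    n_resp = sum(responder)
    n_non = n_patients - n_resp

    naive_hits = 0
    bonferroni_hits = 0
    for _ in range(n_markers):
        # Random carrier status, independent of response (assumed 30% frequency).
        carrier = [rng.random() < 0.3 for _ in range(n_patients)]
        carriers_resp = sum(c and r for c, r in zip(carrier, responder))
        carriers_non = sum(carrier) - carriers_resp
        p = two_proportion_p_value(carriers_resp, n_resp, carriers_non, n_non)
        naive_hits += p < alpha
        bonferroni_hits += p < alpha / n_markers
    return naive_hits, bonferroni_hits


naive, corrected = count_null_hits()
print(f"Markers passing p < 0.05 on pure noise: {naive}")
print(f"Markers passing the Bonferroni threshold: {corrected}")
```

With thousands of tests, a fixed 0.05 threshold is expected to flag roughly 5% of markers even when nothing is there, which is why uncorrected "discoveries" need independent verification.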
REFINING THERAPEUTIC DECISIONS & PREDICTING DRUG EFFICACY
[Figure labels: NR, AE, R]
- Slides: 11