Scientific Knowledge Discovery over Big Data
BIG DATA for DISCOVERY SCIENCE
Sam Hobel, University of Southern California
Farshid Sepehrband, University of Southern California
William Matloff, University of Southern California
Lu Zhao, Ph.D., University of Southern California
John Darrell Van Horn, Ph.D., University of Southern California
Big Data to Knowledge (BD2K)
• NIH program featuring
  • 12 Centers of Excellence
  • Multiple R25s
  • Multiple U01 awards
  • 16 T32/T15 institutional training grants
  • A Training Coordinating Center (TCC)
• Seeking to catalyze the data science involved in biomedicine toward new insights in health and disease
9/9/2020
Big Data for Discovery Science (BDDS)
• A U54 Center of Excellence
• Arthur Toga, PI, USC
• Mark and Mary Stevens Neuroimaging and Informatics Institute at USC
• Information Sciences Institute (ISI)
• Computation Institute, University of Chicago
• Institute for Systems Biology (Seattle)
• University of Michigan
• Human neuroimaging, phenomics, genetics, and proteomics
• Data management, software tools, workflows
BDDS Website: http://bd2k.ini.usc.edu
BDDS Covers Several Domains
Peer-Reviewed Publications
BDDS Platform – Integrated Tools for Discovery
Sam Hobel
• LONI Pipeline Fundamentals (30 mins): I will present the basics of developing and deploying "big data" processing workflows using the LONI Pipeline environment and its connections to cloud computing resources. LONI Pipeline forms the basis for many of the GWAS/PheWAS analyses attendees will see in the following lectures.
Farshid Sepehrband, Ph.D.
• Mining Neuroimaging "big data" (30 mins): This presentation will demonstrate a statistical learning approach to exploring neuroimaging "big data" in order to derive an association of interest. We will discuss some of the limitations of conventional regression techniques in big data analysis and show that statistical learning techniques can provide increased inferential power.
William Matloff
• GWAS Analyses (30 mins): GWAS of Quantitative Traits: A Semi-Automated Approach. We demonstrate the use of state-of-the-art "big data" genomic and neuroimaging workflow tools to reduce the mining of quantitative trait loci to three simple steps. This approach is particularly conducive to "reproducible science" and to the flexible visualization of association results.
Lu Zhao, Ph.D.
• PheWAS Analyses (30 mins): In this presentation, we will introduce a big data discovery framework for neuroimaging PheWAS and provide an example of using the approach to identify which brain phenotypes, out of thousands, are influenced by a specific genetic factor. Such analyses provide unique insights into the role of specific genes in brain-specific phenotypes.
This Afternoon’s Tutorial…
• Sam Hobel
• Farshid Sepehrband
• William Matloff
• Lu Zhao
• Q&A
• Software demonstrations
Away we go!