A Systems Genetic Analysis of Chronic Fatigue Syndrome

A Systems Genetic Analysis of Chronic Fatigue Syndrome: Combinatorial Data Integration from SNPs to Differential Diagnosis of Disease
Roumyana Kirova, Ph.D., Oak Ridge National Laboratory, Oak Ridge, TN
Joint work with Elissa Chesler, Ph.D., Michael Langston, Ph.D., Andy Perkins, and Xinxia Peng, Ph.D.

Main Task: Data Integration
Many populations: integration is intellectual.
Reference population: integration is computational.
[Figure: with many populations, clinical, polymorphism and mRNA expression data come from separate populations (Populations 1-4); with one reference population, the clinical (83 x 227), mRNA (20160 x 176), SNP (42 x 223) and protein (500 x 64 x 16) data describe the same individuals.]
"We speak piously of taking measurements and making small studies that will add another brick to the temple of science. Most such bricks just lie around in the brickyard." - J. R. Platt, Science 146: 347 (1964)

Using a Reference Population for Data Integration
• Systems genetics: natural polymorphisms propagate their effects through each level of the system's networks (mRNA, post-translational, proteomic, disease).
• Graphs and networks: combining a discrete graph representation with parametric analysis makes it possible to analyze megavariate network structures.

Data Integration: Genetic Regulatory Network Models (Multilocus)
[Figure: natural allele perturbations at SNPs (genotypes such as T/C, A/T, C/G) propagate through the mRNA co-expression network, factors of protein peaks, and gene-protein relationships to CFS and depression.]

A Single Data Matrix: CAMDA Data Design
Because every data type is measured in one reference population, all attributes can be placed in a single data table.
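
Assembling that table amounts to a join on individual identifiers. The sketch below assumes, hypothetically, that each data type arrives as a mapping from individual ID to a dict of named attributes, and keeps only individuals measured in every data type (an inner join). The IDs, attribute names and values are invented for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: data type -> individual ID -> attribute name -> value
Tables = Dict[str, Dict[str, Dict[str, float]]]


def build_single_matrix(tables: Tables) -> Tuple[List[str], List[str], List[List[float]]]:
    """Inner-join per-data-type tables into one individuals-by-attributes matrix."""
    # Keep only individuals measured in every data type (the shared reference population).
    ids = sorted(set.intersection(*(set(t) for t in tables.values())))
    # Columns are (data type, attribute) pairs; assumes all individuals share the same attributes.
    keys = []
    for dtype in sorted(tables):
        attrs = sorted(tables[dtype][ids[0]]) if ids else []
        keys.extend((dtype, attr) for attr in attrs)
    rows = [[tables[d][ind][a] for d, a in keys] for ind in ids]
    columns = [f"{d}:{a}" for d, a in keys]
    return ids, columns, rows


tables = {
    "clinical": {"s1": {"fatigue": 3.0}, "s2": {"fatigue": 1.0}, "s3": {"fatigue": 2.0}},
    "snp": {"s1": {"rs_a": 2.0}, "s2": {"rs_a": 0.0}},  # s3 was not genotyped
}
ids, columns, rows = build_single_matrix(tables)
```

Here individual s3 is dropped because it lacks SNP data; prefixing column names with the data type keeps attributes from different assays from colliding.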

From data attributes to correlations
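
Going from attributes to correlations means computing one correlation per pair of traits, taken over individuals. A minimal sketch using the Pearson coefficient, assuming the individuals-by-traits layout of the single data matrix; a trait with zero variance would cause a division by zero and should be filtered out first.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length vectors (one value per individual)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(rows: List[List[float]]) -> List[List[float]]:
    """Trait-by-trait correlations from an individuals-by-traits matrix."""
    traits = list(zip(*rows))  # transpose: one tuple of values per trait
    k = len(traits)
    return [[pearson(traits[i], traits[j]) for j in range(k)] for i in range(k)]


# Hypothetical data: 3 individuals, 3 traits.
corr = correlation_matrix([[1.0, 2.0, 6.0], [2.0, 4.0, 4.0], [3.0, 6.0, 2.0]])
```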

Factor and Discriminant Analysis of Disease Symptoms
Status             Nondepressed (D-)   Depressed (D+)
Least   (CFS-)            61                  6
Middle  (CFS+)            39                 28
Worst   (CFS++)           11                 19
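
Reading the table row by row makes the link between symptom severity and depression explicit. The short computation below uses only the counts shown above, treating each row as a group of individuals, and gives the fraction of each CFS status group that falls in the depressed (D+) column.

```python
# Counts copied from the table above: status -> (nondepressed D-, depressed D+)
counts = {
    "CFS-": (61, 6),
    "CFS+": (39, 28),
    "CFS++": (11, 19),
}


def depressed_fraction(table):
    """Share of each symptom-status group that falls in the depressed (D+) column."""
    return {status: d_pos / (d_neg + d_pos) for status, (d_neg, d_pos) in table.items()}


for status, frac in depressed_fraction(counts).items():
    print(f"{status}: {frac:.2f}")
# CFS-: 0.09
# CFS+: 0.42
# CFS++: 0.63
```

The depressed share rises from about 9% in the CFS- group to about 42% in CFS+ and 63% in CFS++.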

Multilocus Models for Disease Symptoms
[Figure: multilocus SNP models for symptom scales (physical function, activity reduction, bodily pain, social, mental, fatigue, emotional, vitality), marking the best CFS predictors and the best depression predictors among SNPs in CRHR1, CRHR2, POMC, NR3C1, COMT, TPH2 and TH, with two-locus genotypes such as CT/CT, GT/AG, CT/AG and AG/AG.]
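
One simple way to express a two-locus model is to treat each genotype combination (for example CT/AG: CT at the first locus, AG at the second) as a category and measure how much of the variance in a symptom score those categories explain. The sketch below does this with group means (a one-way ANOVA style R²). The genotype strings and scores are hypothetical, and this is not necessarily the model fitted in the original analysis.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def multilocus_r2(genotypes: List[Tuple[str, str]], scores: List[float]) -> Tuple[float, Dict[str, float]]:
    """Fraction of score variance explained by two-locus genotype combinations."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for (g1, g2), s in zip(genotypes, scores):
        groups[f"{g1}/{g2}"].append(s)  # e.g. "CT/AG" combines locus 1 and locus 2
    grand = sum(scores) / len(scores)
    means = {combo: sum(v) / len(v) for combo, v in groups.items()}
    ss_total = sum((s - grand) ** 2 for s in scores)
    # Between-group sum of squares: how far each combination's mean sits from the grand mean.
    ss_between = sum(len(groups[c]) * (means[c] - grand) ** 2 for c in groups)
    return ss_between / ss_total, means


r2, means = multilocus_r2(
    [("CT", "AG"), ("CT", "AG"), ("CC", "GG"), ("CC", "GG")],  # hypothetical genotypes
    [10.0, 12.0, 2.0, 4.0],  # hypothetical symptom scores
)
```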

SNPs as Predictors for CFS
Discriminant analysis using SNPs
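
A two-class linear discriminant on SNP genotypes can be sketched as follows, assuming each SNP is coded as a minor-allele count (0, 1 or 2) and the two classes are CFS versus non-CFS. This is Fisher's linear discriminant with the within-class scatter matrix plus a small ridge term for numerical stability; the coding, the ridge value and the midpoint threshold are assumptions rather than details taken from the slides.

```python
from typing import List, Tuple

Vector = List[float]


def _mean(rows: List[Vector]) -> Vector:
    return [sum(col) / len(rows) for col in zip(*rows)]


def _solve(a: List[Vector], b: Vector) -> Vector:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fisher_lda(class0: List[Vector], class1: List[Vector], ridge: float = 1e-3) -> Tuple[Vector, float]:
    """Return weights w and threshold t; predict class 1 when w . x > t."""
    m0, m1 = _mean(class0), _mean(class1)
    d = len(m0)
    # Within-class scatter matrix, regularized with a small ridge on the diagonal.
    sw = [[ridge if i == j else 0.0 for j in range(d)] for i in range(d)]
    for rows, mu in ((class0, m0), (class1, m1)):
        for x in rows:
            diff = [xi - mi for xi, mi in zip(x, mu)]
            for i in range(d):
                for j in range(d):
                    sw[i][j] += diff[i] * diff[j]
    w = _solve(sw, [b - a for a, b in zip(m0, m1)])
    # Threshold halfway between the projected class means.
    t = (sum(wi * mi for wi, mi in zip(w, m0)) + sum(wi * mi for wi, mi in zip(w, m1))) / 2
    return w, t


def predict(w: Vector, t: float, x: Vector) -> int:
    return int(sum(wi * xi for wi, xi in zip(w, x)) > t)


# Hypothetical genotypes at two SNPs (minor-allele counts).
non_cfs = [[0, 0], [0, 1], [1, 0]]
cfs = [[2, 2], [2, 1], [1, 2]]
w, t = fisher_lda(non_cfs, cfs)
```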

Using Combinatorics for High-Throughput Data Reduction
1. Attributes of individuals (data array): genotypes, mRNA levels, protein levels, disease symptoms.
2. Trait-by-trait correlation matrix: a statistical measure of correlation under all conditions or a subset of conditions.
3. Filter: keep correlations above a threshold.
4. Global trait co-expression network: traits are vertices, correlations are edges.
5. Compute cliques, dense subgraphs, etc.
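
The filtering step turns a correlation matrix into an unweighted graph. A minimal sketch, assuming an edge is kept when the correlation strictly exceeds the threshold (many analyses threshold the absolute value instead); the trait names and the threshold are hypothetical. It also tallies a vertex degree histogram.

```python
from collections import Counter
from typing import Dict, List, Set


def threshold_graph(names: List[str], corr: List[List[float]], threshold: float) -> Dict[str, Set[str]]:
    """Traits become vertices; an edge joins two traits whose correlation exceeds the threshold."""
    adj: Dict[str, Set[str]] = {n: set() for n in names}
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if corr[i][j] > threshold:
                adj[names[i]].add(names[j])
                adj[names[j]].add(names[i])
    return adj


def degree_histogram(adj: Dict[str, Set[str]]) -> Dict[int, int]:
    """Map each vertex degree to the number of vertices with that degree."""
    return dict(sorted(Counter(len(nbrs) for nbrs in adj.values()).items()))


corr = [[1.0, 0.9, 0.2], [0.9, 1.0, 0.8], [0.2, 0.8, 1.0]]  # hypothetical
adj = threshold_graph(["a", "b", "c"], corr, 0.7)
hist = degree_histogram(adj)
```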

Scale-Free Networks
• A clique is a complete subgraph.
• The clique problem is NP-complete.
• Fixed-parameter tractable algorithms running on high-performance computing platforms are used.
[Figure: vertex degree histogram; clique size distribution histogram]
F. N. Abu-Khzam, M. A. Langston, P. Shanbhag, and C. T. Symons, Algorithmica, 2006
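
For small graphs, all maximal cliques can be listed with the classic Bron-Kerbosch algorithm with pivoting. This is a swapped-in textbook method shown only for illustration; it is not the fixed-parameter tractable, high-performance approach of Abu-Khzam et al. cited above, and it will not scale to genome-wide networks. The adjacency format (vertex to set of neighbours) is an assumption.

```python
from collections import Counter
from typing import Dict, List, Set


def maximal_cliques(adj: Dict[str, Set[str]]) -> List[Set[str]]:
    """Enumerate all maximal cliques using Bron-Kerbosch with pivoting."""
    cliques: List[Set[str]] = []

    def expand(r: Set[str], p: Set[str], x: Set[str]) -> None:
        if not p and not x:
            cliques.append(r)  # r cannot be extended, so it is a maximal clique
            return
        # Pivot on the vertex with the most neighbours in p to prune branches.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def clique_size_histogram(cliques: List[Set[str]]) -> Dict[int, int]:
    """Map clique size to the number of maximal cliques of that size."""
    return dict(sorted(Counter(len(c) for c in cliques).items()))


# Hypothetical graph: triangle a-b-c plus edge c-d.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
cliques = maximal_cliques(adj)
```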

GO Categories
Major networks common to all patients
http://bioinfo.vanderbilt.edu/webgestalt
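
A standard way to score whether a GO category is over-represented in a network is the upper tail of the hypergeometric distribution. The sketch below computes that p-value for one category, assuming a gene universe of N genes of which K are annotated to the category, and a network (for example, a clique) of n genes of which k are annotated. It applies no multiple-testing correction, and the numbers in the example are hypothetical.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) when drawing n genes without replacement from N genes, K of them annotated."""
    total = comb(N, n)
    upper = min(K, n)
    # comb() returns 0 when asked for more items than exist, so impossible terms vanish.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


p = hypergeom_sf(4, 1000, 50, 10)  # hypothetical: 4 of 10 clique genes in a 50-gene category
```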

Differential Clique Analysis
Group 1 (CFS-) vs. Group 2 (CFS+)
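
Differential clique analysis compares the cliques obtained when the network is built separately for each group. A minimal sketch, assuming cliques have already been computed for Group 1 (CFS-) and Group 2 (CFS+) and treating a Group 2 clique as perturbed when it is not contained in any Group 1 clique; that criterion and the gene names below are illustrative assumptions.

```python
from typing import Iterable, List, Set


def perturbed_cliques(group2: Iterable[Set[str]], group1: Iterable[Set[str]]) -> List[Set[str]]:
    """Cliques of group 2 that do not appear, even as a subset, among group 1's cliques."""
    reference = [frozenset(c) for c in group1]
    return [set(c) for c in group2 if not any(set(c) <= r for r in reference)]


g1 = [{"GENE_A", "GENE_B", "GENE_C"}]  # hypothetical CFS- cliques
g2 = [{"GENE_A", "GENE_B"}, {"GENE_C", "GENE_D"}]  # hypothetical CFS+ cliques
changed = perturbed_cliques(g2, g1)
```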

Differential Clique Analysis
CFS-perturbed network; depression-perturbed network
http://bioinfo.vanderbilt.edu/webgestalt

Network-Specific Multilocus Models
[Figure: common cliques C1 and C2 form a clique kernel; principal components of the cliques are tested for SNP association in Model 1 and Model 2.]
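
Summarizing a clique by a principal component and testing it for SNP association can be sketched as follows. The assumptions: the clique's traits are columns of an individuals-by-traits matrix, the first principal component comes from power iteration on the covariance matrix (starting from a vector of ones, which fails if that start is orthogonal to the leading eigenvector), and association is measured as the Pearson correlation between PC1 scores and a 0/1/2 genotype code. The sign of a principal component is arbitrary, so only the magnitude of the correlation is meaningful. The original models may differ.

```python
import math
from typing import List


def first_pc_scores(rows: List[List[float]], iters: int = 200) -> List[float]:
    """Scores of each individual on the first principal component of the columns."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    x = [[r[j] - means[j] for j in range(d)] for r in rows]  # center each trait
    cov = [[sum(x[i][a] * x[i][b] for i in range(n)) / (n - 1) for b in range(d)] for a in range(d)]
    v = [1.0] * d
    for _ in range(iters):  # power iteration converges to the leading eigenvector
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return [sum(xi[j] * v[j] for j in range(d)) for xi in x]


def pearson(x: List[float], y: List[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


clique_traits = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]  # hypothetical
genotype = [0, 0, 2, 2]  # hypothetical allele counts
r = pearson(first_pc_scores(clique_traits), genotype)
```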

Annotating a Clique
Expression SNP network: targets of NR3C1, CRHR1, TH and POMC regulation were identified by SNP-expression association.
The clique was annotated with the natural language processing tool Chilibot.net, which confirmed gene-gene relations and relations to fatigue.
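
SNP-expression association is often tested with an additive model: expression regressed on genotype coded as 0, 1 or 2 copies of one allele. The sketch below returns the least-squares slope and its t-statistic. The coding and the example values are assumptions; converting t to a p-value (with n - 2 degrees of freedom) is omitted to keep the example dependency-free, and a perfect fit (zero residuals) would divide by zero.

```python
import math
from typing import List, Tuple


def additive_association(genotype: List[int], expression: List[float]) -> Tuple[float, float]:
    """Least-squares slope of expression on allele count, and its t-statistic."""
    n = len(genotype)
    mg = sum(genotype) / n
    me = sum(expression) / n
    sgg = sum((g - mg) ** 2 for g in genotype)
    sge = sum((g - mg) * (e - me) for g, e in zip(genotype, expression))
    slope = sge / sgg
    intercept = me - slope * mg
    rss = sum((e - intercept - slope * g) ** 2 for g, e in zip(genotype, expression))
    se = math.sqrt(rss / (n - 2) / sgg)  # standard error of the slope
    return slope, slope / se


slope, t = additive_association([0, 0, 1, 1, 2, 2], [1.0, 1.2, 2.0, 2.2, 3.0, 3.2])  # hypothetical
```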

Finding Cliques from Proteomics
Edges: correlations of peak intensities taken over individuals
561 cliques, 243 peaks
1. Cliques of peaks from the same fraction
2. Cliques of peaks with the same m/z from different fractions
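
The two kinds of proteomic clique listed above can be told apart mechanically if each peak is labelled by its fraction and its m/z value. A sketch under that assumption; the fraction labels, m/z values and the relative tolerance used to decide that peaks in different fractions share an m/z are all hypothetical.

```python
from typing import List, Tuple

Peak = Tuple[str, float]  # (fraction label, m/z), a hypothetical peak representation


def clique_kind(clique: List[Peak], rel_tol: float = 0.002) -> str:
    """Label a clique of peaks as same fraction, same m/z across fractions, or mixed."""
    fractions = {f for f, _ in clique}
    if len(fractions) == 1:
        return "same fraction"
    mzs = [mz for _, mz in clique]
    lo, hi = min(mzs), max(mzs)
    # Peaks from different fractions whose m/z values agree within the tolerance.
    if hi - lo <= rel_tol * lo:
        return "same m/z, different fractions"
    return "mixed"
```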

Bi-cliques from mRNA and Peptide Peaks
[Figure: bipartite graph with mRNA vertices on one side and protein peak vertices on the other]
Edges: Pearson correlations taken over individuals
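
A bi-clique here is a set of mRNAs and a set of peaks in which every mRNA-peak pair is connected. For small bipartite graphs, all maximal bi-cliques can be found by closing the mRNA neighbourhoods under intersection, because the peak side of every maximal bi-clique is an intersection of neighbourhoods. The sketch assumes edges have already been filtered from the correlations; the closure is exponential in the worst case and is only illustrative, and the names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def maximal_bicliques(nbrs: Dict[str, Set[str]]) -> List[Tuple[FrozenSet[str], FrozenSet[str]]]:
    """All maximal (mRNA set, peak set) pairs with every cross pair connected; both sides non-empty."""
    closed: Set[FrozenSet[str]] = {frozenset(p) for p in nbrs.values() if p}
    frontier = set(closed)
    while frontier:  # keep intersecting until no new peak sets appear
        new = {a & b for a in frontier for b in closed} - closed
        new.discard(frozenset())
        closed |= new
        frontier = new
    result = []
    for peaks in closed:
        # Every mRNA correlated with all peaks in the set joins the bi-clique.
        mrnas = frozenset(m for m, p in nbrs.items() if peaks <= p)
        result.append((mrnas, peaks))
    return result


nbrs = {"m1": {"p1", "p2"}, "m2": {"p1", "p2", "p3"}, "m3": {"p3"}}  # hypothetical edges
bicliques = maximal_bicliques(nbrs)
```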

Bi-clique Covariance Structure
A single mRNA associated with a number of correlated peaks (mRNAs shown: CXorf21, KIF23, TSPAN5)
A single peak intensity associated with a number of mRNAs (peaks shown: 6083, 3255)

Bi-clique as Diagnostic
[Figure: cliques of peaks for the CFS++, CFS- and depression groups; peak labels shown: 6083, 5880, 3182, 3255, 12595, 9385, 6083, 4336, 3360, 66148, 4137, 4268, 3263, 8575, 1671, 3255, 5880]

Bi-clique as Diagnostic
mRNAs from the bi-clique analysis correlated to the protein biomarkers for CFS

Systems Genetics as a Diagnostic Tool
[Figure: CFS-perturbed genes NR3C1 and CRHR1; major network gene COMT; depression-perturbed genes NR3C1, TPH2 and TH]

CONCLUSIONS
• Reference population: four diverse data types can be integrated computationally.
• Systems genetics: polymorphisms influence disease through complex biological networks.
• Integrating the combinatorial approach with parametric analysis helps decompose large networks and apply multivariate statistical analysis to local networks.

Acknowledgements
University of Tennessee: Dr. Michael Langston, Andy Perkins, Jon Scharff, Bhavesh Borate
Seattle Biomedical Research Institute: Dr. Xinxia Peng
Oak Ridge National Laboratory: Dr. Elissa J. Chesler, Dr. Brynn H. Voy, Suzanne H. Baktash