The Measure of Synergy as a Tool in Systems Biology. D. Anastassiou, C2B2/MAGNet Center Third Annual Retreat, 4/11/2008
Synergy. Definition: “The interaction of two or more agents or forces so that their combined effect is greater than the sum of their individual effects” (American Heritage Dictionary). Natural application in systems biology (holistic as opposed to reductionist paradigm): We wish to analyze multiple interacting factors in terms of the purely cooperative nature of their contributions towards an outcome. D. Anastassiou, “Computational Analysis of the Synergy Among Multiple Interacting Genes” (Review Article), Molecular Systems Biology, Vol. 3, No. 83, February 2007.
Information-theoretic definition. Synergy of two factors Gi, Gj with respect to an outcome C: Syn(Gi, Gj; C) = I(Gi, Gj; C) - [I(Gi; C) + I(Gj; C)]. Synergy can be positive or negative (redundancy), and the definition can be extended to more than two factors.
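A minimal Python sketch of this definition for discrete data, offered as an illustration rather than as the estimator used in the cited work. It assumes plug-in (empirical frequency) estimates of entropy, and the helper names `entropy`, `mutual_information` and `synergy` are hypothetical. The XOR case shows purely cooperative information; the copied-variable case shows redundancy.

```python
from collections import Counter
from math import log2

def entropy(samples):
    """Empirical Shannon entropy, in bits, of a sequence of hashable symbols."""
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in Counter(samples).values())

def mutual_information(x, y):
    """Empirical I(X; Y) = H(X) + H(Y) - H(X, Y), in bits."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))

def synergy(gi, gj, c):
    """I(Gi, Gj; C) - [I(Gi; C) + I(Gj; C)] from paired discrete samples."""
    joint = list(zip(gi, gj))  # treat the pair (Gi, Gj) as a single variable
    return mutual_information(joint, c) - (
        mutual_information(gi, c) + mutual_information(gj, c)
    )

# Toy data (assumption): C is the XOR of two binary factors, so neither
# factor alone carries information about C, but the pair determines it.
gi = [0, 0, 1, 1]
gj = [0, 1, 0, 1]
c_xor = [a ^ b for a, b in zip(gi, gj)]
syn_xor = synergy(gi, gj, c_xor)

# Toy data (assumption): both factors are copies of C, so they are redundant.
r = [0, 1, 0, 1]
syn_redundant = synergy(r, r, r)

print(syn_xor, syn_redundant)
```

With real expression data the continuous levels would first have to be discretized or handled with a continuous estimator; that step is outside this sketch.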
Example: Synergy of two genes with respect to a phenotype. Given a large set of gene expression data in both the presence and absence of a phenotype such as cancer, we can estimate the information I(Gi; C) that any gene Gi provides about cancer C, as well as the information I(Gi, Gj; C) that any pair of genes (Gi, Gj) jointly provides about cancer C. [Figure: gene expression matrix with axes labeled GENES and CONDITIONS; conditions are grouped into HEALTH and CANCER.]
Best gene pairs for classification. Extension of “gene ranking” based on I(Gi; C) to “gene-pair ranking” based on I(Gi, Gj; C). Observation: Sometimes high-ranked gene pairs do not include any of the high-ranked single genes, suggesting that the correlation of the gene pair with cancer is due to a purely cooperative effect of the two genes. V. Varadan and D. Anastassiou, “Inference of Disease-Related Molecular Logic from Systems-Based Microarray Analysis,” PLoS Computational Biology, Vol. 2, Issue 6, June 2006, pp. 585-597. This purely cooperative effect can be quantified!
What is the “cancer interactome”? High correlation: I(Gi, Gj; C) >> 0 implies that the two genes can be jointly used for classification. High synergy: I(Gi, Gj; C) >> I(Gi; C) + I(Gj; C) ≥ 0 further implies that the two genes Gi and Gj “interact” with respect to cancer, and can be used to construct a “synergy network,” a graph whose nodes represent genes and whose edges connect gene pairs with significantly high synergy. J. Watkinson, X. Wang, T. Zheng, D. Anastassiou, “Identification of gene interactions associated with disease from gene expression data using synergy networks,” BMC Systems Biology, February 2008.
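The construction of a synergy network can be sketched in Python as follows. This is only an illustration under stated assumptions: expression levels are already binarized, mutual information is estimated from empirical frequencies, and a fixed cutoff `threshold` stands in for the statistical significance test, which the slides do not specify. The function name `synergy_network`, the gene names and the toy data are all hypothetical; the toy phenotype follows an AND rule (low geneA and high geneB).

```python
from collections import Counter
from itertools import combinations
from math import log2

def entropy(samples):
    """Empirical Shannon entropy, in bits."""
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in Counter(samples).values())

def mi(x, y):
    """Empirical mutual information I(X; Y), in bits."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))

def synergy_network(expr, phenotype, threshold):
    """Return edges (gene_a, gene_b, synergy) whose synergy exceeds threshold.

    expr maps a gene name to its list of binarized expression levels,
    one entry per sample, aligned with the phenotype list.
    """
    single = {g: mi(levels, phenotype) for g, levels in expr.items()}
    edges = []
    for a, b in combinations(sorted(expr), 2):
        joint = mi(list(zip(expr[a], expr[b])), phenotype)
        syn = joint - (single[a] + single[b])
        if syn > threshold:  # assumption: fixed cutoff instead of a significance test
            edges.append((a, b, syn))
    return sorted(edges, key=lambda e: -e[2])

# Hypothetical toy data: 8 samples, phenotype = (geneA low) AND (geneB high);
# geneD is unrelated to the phenotype.
expr = {
    "geneA": [0, 0, 1, 1, 0, 0, 1, 1],
    "geneB": [0, 1, 0, 1, 0, 1, 0, 1],
    "geneD": [0, 0, 0, 0, 1, 1, 1, 1],
}
phenotype = [int(a == 0 and b == 1) for a, b in zip(expr["geneA"], expr["geneB"])]

edges = synergy_network(expr, phenotype, threshold=0.1)
print(edges)
```

In this toy case only the geneA/geneB pair survives the cutoff, because the pairs involving geneD add nothing beyond what the single genes already provide.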
Example (prostate cancer)
Example of scatter plot for the highest-synergy gene pair from prostate cancer data: 50 green (healthy) and 52 red (cancerous) dots. Cancer = (Low RBP1) AND (High EEF1B2)
Using synergy for inference of gene regulatory interactions. The “phenotype” can be the expression level of a third gene.
Application to “Challenge 5” of the DREAM 2 conference. Given: A “blinded” compendium of 300 normalized Affymetrix microarray experiments from E. coli, involving 3,456 genes, of which 120 (also blinded) are transcription factors. Challenge: Reconstruct a genome-scale transcriptional network (identify TF-target interactions). Score based on known “ground truth” from chromatin precipitation and otherwise experimentally verified Transcription Factor (TF)-target interactions (from RegulonDB).
“Three-way” mutual information (common to three genes). Can be estimated from continuous data.
Three-way mutual information is the opposite of synergy! I(G1; G2; G3) can be negative, in which case there is no Venn diagram possible. It turns out that -I(G1; G2; G3) is equal to I(G1, G2; G3) - [I(G1; G3) + I(G2; G3)] = I(G2, G3; G1) - [I(G2; G1) + I(G3; G1)] = I(G1, G3; G2) - [I(G1; G2) + I(G3; G2)], the synergy of any two of the genes with respect to the third.
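The symmetry of this quantity can be checked numerically. The Python sketch below assumes discrete data and plug-in entropy estimates; the helper names and the randomly generated toy genes are hypothetical. It evaluates the synergy of each pair with respect to the remaining gene and confirms that the three values agree up to floating-point error.

```python
import random
from collections import Counter
from math import log2

def entropy(samples):
    """Empirical Shannon entropy, in bits."""
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in Counter(samples).values())

def mi(x, y):
    """Empirical mutual information I(X; Y), in bits."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))

def synergy(a, b, target):
    """I(A, B; target) - [I(A; target) + I(B; target)]."""
    return mi(list(zip(a, b)), target) - (mi(a, target) + mi(b, target))

# Hypothetical toy data: three-level "genes"; g3 depends jointly on g1 and g2
# (plus noise), while each of g1 and g2 alone says little about g3.
rng = random.Random(0)
g1 = [rng.randint(0, 2) for _ in range(300)]
g2 = [rng.randint(0, 2) for _ in range(300)]
g3 = [(a + b + rng.randint(0, 1)) % 3 for a, b in zip(g1, g2)]

values = [
    synergy(g1, g2, g3),  # pair (G1, G2) with respect to G3
    synergy(g2, g3, g1),  # pair (G2, G3) with respect to G1
    synergy(g1, g3, g2),  # pair (G1, G3) with respect to G2
]
three_way_mi = -values[0]  # I(G1; G2; G3) is the negative of the common value
print(values, three_way_mi)
```

The agreement is an algebraic identity: written in terms of entropies, each of the three expressions reduces to the same symmetric combination of single, pairwise and triple joint entropies.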
Synergistic “entanglement” of three genes. If I(G1; G2; G3) << 0, this suggests that there is some interaction mechanism connecting the three genes, and the positive quantity -I(G1; G2; G3) can be seen as measuring their synergistic “entanglement.” In that case, one likely scenario is that one of the three genes is, at least partly or indirectly, synergistically regulated by the other two.
Most-likely regulated gene in a synergistically entangled triplet. Gj is taken as the most likely regulated gene of the triplet (Gi, Gj, Gk) if I(Gi, Gk; Gj) ≥ max {I(Gi, Gj; Gk), I(Gk, Gj; Gi)} or, as it turns out, equivalently: I(Gi; Gk) ≤ min {I(Gi; Gj), I(Gk; Gj)}
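The equivalence of the two conditions follows from the symmetry of the three-way mutual information. A short derivation, written here in LaTeX as a reading aid and not taken from the slides, with S introduced as a shorthand for the common synergy value:

```latex
\begin{align*}
S &:= -I(G_i; G_j; G_k) \\
I(G_i, G_k; G_j) &= S + I(G_i; G_j) + I(G_k; G_j) \\
I(G_i, G_j; G_k) &= S + I(G_i; G_k) + I(G_j; G_k) \\
I(G_k, G_j; G_i) &= S + I(G_k; G_i) + I(G_j; G_i)
\end{align*}
Subtracting the second and third lines from the first, and using
$I(X; Y) = I(Y; X)$:
\begin{align*}
I(G_i, G_k; G_j) \ge I(G_i, G_j; G_k) &\iff I(G_i; G_j) \ge I(G_i; G_k) \\
I(G_i, G_k; G_j) \ge I(G_k, G_j; G_i) &\iff I(G_k; G_j) \ge I(G_i; G_k)
\end{align*}
Both inequalities hold together exactly when
$I(G_i; G_k) \le \min\{ I(G_i; G_j),\ I(G_k; G_j) \}$.
```

In words: the gene about which the other two are jointly most informative is the one whose two partners share the least pairwise mutual information with each other.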
Synergistic regulation index. Measures the degree of confidence that gene Gi cooperatively regulates gene Gj. It also identifies Gk as the best synergistic partner of Gi for the regulation of Gj. Can be used to augment the traditional MI measure: M(i, j) = I(Gi; Gj)
Final score for Gi → Gj regulation computed from 2-way and 3-way MI values. Turns out that it is equal to:
Results
TEAM        SCORE: Combined log10(P value)
GISL        40.5
Team 121    25.2
Team 73     24.1
Team 41     18.7
Team 58     10.0
GISL: Among the top 150 predictions, 106 were in the “ground truth.”
Potential for biological discovery using synergy: Large number of statistically significant entangled triplets
Conclusions and acknowledgements. Synergy-based methodologies have the potential to contribute towards empowering systems biology to achieve genuine biological discovery by identifying multiple interacting contributing factors, such as genes, SNPs and CNVs. Co-authors: Prof. Tian Zheng, Statistics, Columbia University; Prof. Xiaodong Wang, EE, Columbia University. Ph.D. students: John Watkinson, Kuo-ching Liang.