Network-based data integration reveals extensive post-transcriptional regulation of human tissue-specific metabolism
Tomer Shlomi*, Moran Cabili*, Markus J. Herrgård, Bernhard Ø. Palsson and Eytan Ruppin
* These authors contributed equally to this work
Metabolism is the totality of all the chemical reactions that operate in a living organism.
• Catabolic reactions: break down molecules and produce energy
• Anabolic reactions: use energy to build up essential cell components
Why Study Human Metabolism?
• Inborn errors of metabolism cause acute symptoms and even death at an early age
• Metabolic diseases (obesity, diabetes) are major sources of morbidity and mortality
• Metabolic enzymes and their regulators are gradually becoming viable drug targets
Modeling Cellular Metabolism: A Short Review
Metabolic flux: the production or elimination of a quantity of metabolite per mass of organ or organism over a specific time frame.
[Figure legend: metabolite; reaction catalyzed by an enzyme]
“…it is the concept of metabolic flux that is crucial in the translation of genotype and environmental factors into phenotype or a threshold for disease.” (Brendan Lee, Nature 2006)
Constraint-Based Modeling
Find a steady-state flux distribution through all biochemical reactions
• Under the constraints:
  – Mass balance: metabolite production and consumption rates are equal
  – Thermodynamic: irreversibility of reactions
  – Enzymatic capacity: bounds on enzyme rates
• Successfully predicts: constant
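Constraint-Based Modeling (CBM): Mathematical Representation of Constraints
Example reaction: $\text{Glucose} + \text{ATP} \xrightarrow{\text{Glucokinase}} \text{Glucose-6-phosphate} + \text{ADP}$
• Stoichiometric matrix $S$: network topology with the stoichiometry of the biochemical reactions (rows: metabolites, columns: reactions); the glucokinase column reads $-1$ (Glucose), $-1$ (ATP), $+1$ (G-6-P), $+1$ (ADP)
• Mass balance: $S \cdot v = 0$, a subspace of $\mathbb{R}^n$ ($n$ = number of reactions)
• Thermodynamic & capacity: $0 \le v_i \le 10$
• Optimization: maximize $v_{\text{growth}}$ over the resulting bounded convex cone
Fell et al. (1986), Varma and Palsson (1993)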
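To make the optimization step concrete, here is a minimal flux balance sketch using SciPy's `linprog`. The four-reaction network, its reaction names and the shared capacity bound of 10 are illustrative assumptions, not taken from the study; only the structure (mass balance $S \cdot v = 0$, bounds on $v_i$, maximize $v_{\text{growth}}$) follows the slide.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network (not from the study):
#   R_up:       -> A        (uptake)
#   R1:       A -> B
#   R2:       A -> C
#   R_growth: B + C ->      (growth / biomass drain)
reactions = ["R_up", "R1", "R2", "R_growth"]
S = np.array([
    # R_up  R1  R2  R_growth
    [1, -1, -1, 0],   # A
    [0, 1, 0, -1],    # B
    [0, 0, 1, -1],    # C
])

# Thermodynamic & capacity constraints: all reactions irreversible, 0 <= v_i <= 10
bounds = [(0, 10)] * len(reactions)

# linprog minimizes, so maximize v_growth by minimizing -v_growth
c = np.zeros(len(reactions))
c[reactions.index("R_growth")] = -1.0

# Mass balance enters as the equality constraint S v = 0
res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds, method="highs")
growth = -res.fun
print(f"max v_growth = {growth:.1f}")  # max v_growth = 5.0
print(dict(zip(reactions, np.round(res.x, 6))))
```

Because mass balance forces the uptake flux to split evenly between B and C, the optimal growth is half the uptake capacity (5) rather than the full bound of 10: the stoichiometry, not the bounds alone, shapes the solution.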
Human Metabolic Models
• Motivated by the fact that in vivo studies of tissue-specific metabolic functions are limited in scope
• Individual genes and pathways (KEGG, HumanCyc)
  – Detailed description of the genes, reactions, enzymes
  – No connections between pathways
• Specific cell types and organelles
  – Red blood cell (Wiback et al. 2002)
  – Mitochondria (Vo et al. 2004)
• Large-scale human metabolic networks
  – The first large-scale model of human metabolism: ~2000 genes, ~3700 reactions, 7 organelles (Duarte et al. 2007, Ma et al. 2007)
CBM in Human
Modeling human tissue function is problematic:
• Various cell types activate different pathways (as shown in expression studies)
• Hard to formulate cellular metabolic objectives
  – (such as biomass maximization for microbial species)
• Unknown inputs and outputs of each cell type
Can we use constraint-based modeling to systematically predict tissue-specific metabolic behavior?
Our Objective:
1. A general approach for studying tissue-specific metabolic models
2. Tissue-specific activity of metabolic genes/reactions
Our Method: Model Integration with Tissue-Specific Gene and Protein Expression Data
Motivated by the assertion that highly expressed genes in a certain tissue are likely to be active there
Our Method
1. Gene expression data + protein measurement data → highly and lowly expressed gene sets → (gene-to-reaction mapping) → highly and lowly expressed reaction sets
2. Human metabolic model (Duarte et al.)
3. New objective function: maximize consistency with expression data, using Mixed Integer Linear Programming (MILP)
4. Determine the activity state and confidence level for each gene/reaction
Our Method: Determine Highly and Lowly Expressed Reaction Sets
1. Gene sets: extract the set of enzymes whose expression is significantly increased or decreased (GeneNote, HPRD)
2. Reaction sets: employ a detailed gene-to-reaction mapping to identify a tissue-specific expression state for each reaction, e.g. R1 = (g1 & g2) | g3 | g4
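A minimal sketch of how a Boolean gene-to-reaction rule such as R1 = (g1 & g2) | g3 | g4 can be turned into a reaction expression state. The encoding (+1 highly, -1 lowly, 0 no call) and the min/max convention (an enzyme complex, AND, is limited by its least expressed subunit; isozymes, OR, by the most expressed one) are assumptions for illustration, not necessarily the exact rule used in the study. The function name `reaction_state` is hypothetical.

```python
from typing import Union

# A rule is either a gene name or a tuple (op, *children) with op in {"and", "or"}.
Rule = Union[str, tuple]


def reaction_state(rule: Rule, gene_state: dict[str, int]) -> int:
    """Return +1 (highly), -1 (lowly) or 0 (undetermined) for a reaction.

    Assumed convention: AND -> min over children, OR -> max over children.
    Genes missing from gene_state are treated as 0 (no expression call).
    """
    if isinstance(rule, str):
        return gene_state.get(rule, 0)
    op, *children = rule
    states = [reaction_state(child, gene_state) for child in children]
    if op == "and":
        return min(states)
    if op == "or":
        return max(states)
    raise ValueError(f"unknown operator: {op!r}")


# R1 = (g1 & g2) | g3 | g4
r1 = ("or", ("and", "g1", "g2"), "g3", "g4")
print(reaction_state(r1, {"g1": 1, "g2": 1, "g3": -1, "g4": -1}))   # 1
print(reaction_state(r1, {"g1": 1, "g2": -1, "g3": -1, "g4": -1}))  # -1
print(reaction_state(r1, {"g1": 1, "g2": -1, "g3": -1}))            # 0
```

Representing rules as nested tuples avoids writing a parser; a real pipeline would parse the model's rule strings into the same structure.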
Our Method: Represent Flux Consistency with Expression State
[Figure: toy network of metabolites M1 to M9 and enzymes E1 to E7, with an input and two outputs; reactions marked as highly expressed (H1 to H3) or lowly expressed (L1, L2)]
We look for a real flux vector $v$, now together with additional Boolean vectors $H$ and $L$ such that:
• $H_i = 1 \Rightarrow v_i \neq 0$ (for reactions whose associated enzyme is highly expressed)
• $L_i = 1 \Rightarrow v_i = 0$ (for reactions whose associated enzyme is lowly expressed)
Our Method: Define a New Objective Function
[Figure: the same toy network; 4 out of 5 reactions are consistent with the expression state]
Using Mixed Integer Linear Programming, we define a new objective function:
$$\max \sum_i (H_i + L_i)$$
In practice, this maximizes the number of highly expressed reactions that are active plus the number of lowly expressed reactions that are inactive, i.e. it maximizes consistency with the expression data.
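The objective above can be sketched with SciPy's `milp` (SciPy 1.9 or later). This is a minimal illustration under stated assumptions, not necessarily the study's exact formulation: all reactions are treated as irreversible, "active" means $v_i \ge \varepsilon$, and the implications are linearized with big-M style constraints ($v_i - \varepsilon H_i \ge 0$ and $v_i + v_{\max} L_i \le v_{\max}$). The four-reaction network, the expression calls and the values `eps = 1` and `vmax = 10` are hypothetical.

```python
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp

# Hypothetical toy network (irreversible reactions only):
#   R1: -> A    R2: A -> B    R3: B ->    R4: A ->
S = np.array([
    [1, -1, 0, -1],  # A
    [0, 1, -1, 0],   # B
])
n_met, n_rxn = S.shape
highly = [0, 1]        # R1, R2 assumed highly expressed
lowly = [2, 3]         # R3, R4 assumed lowly expressed
eps, vmax = 1.0, 10.0  # assumed activity threshold and flux capacity

n_bin = len(highly) + len(lowly)
n_var = n_rxn + n_bin  # x = [v_1..v_n, H..., L...]

A = np.zeros((n_met + n_bin, n_var))
lb = np.zeros(n_met + n_bin)
ub = np.zeros(n_met + n_bin)

# Mass balance: S v = 0
A[:n_met, :n_rxn] = S

# H_k = 1 forces v_r >= eps:  v_r - eps * H_k >= 0
for k, r in enumerate(highly):
    row = n_met + k
    A[row, r] = 1.0
    A[row, n_rxn + k] = -eps
    ub[row] = np.inf

# L_k = 1 forces v_r = 0:  v_r + vmax * L_k <= vmax
for k, r in enumerate(lowly):
    row = n_met + len(highly) + k
    col = n_rxn + len(highly) + k
    A[row, r] = 1.0
    A[row, col] = vmax
    lb[row] = -np.inf
    ub[row] = vmax

# Objective: maximize sum(H) + sum(L); milp minimizes, so negate
c = np.r_[np.zeros(n_rxn), -np.ones(n_bin)]

res = milp(
    c,
    constraints=LinearConstraint(A, lb, ub),
    integrality=np.r_[np.zeros(n_rxn), np.ones(n_bin)],
    bounds=Bounds(np.zeros(n_var), np.r_[np.full(n_rxn, vmax), np.ones(n_bin)]),
)
consistency = round(-res.fun)
active = [r + 1 for r in range(n_rxn) if res.x[r] > eps / 2]
print(consistency, active)  # 3 [1, 2, 3]
```

In this toy case the best score is 3 of 4: keeping R1 and R2 active requires draining B through R3, so the lowly expressed R3 is predicted active despite its expression call, which is the kind of discrepancy the method interprets as post-transcriptional regulation. Reversible reactions would need separate forward and backward indicators.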
Our Method: Flux Activity State
• A gene's flux activity state reflects the absence/existence of non-zero flux through the enzymatic reactions it encodes
• Comparing the flux activity states with the expression states tells us about post-transcriptional regulation
[Figure: toy network with highly and lowly expressed enzymes; genes marked as post-transcriptionally up-regulated or down-regulated]
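The comparison described above can be written as a small decision rule. This sketch assumes the natural reading of the slide: a lowly expressed gene predicted to carry flux is called post-transcriptionally up-regulated, and a highly expressed gene predicted inactive is called down-regulated. The function name and the label strings are hypothetical.

```python
from typing import Optional


def regulation_call(expression: str, flux_active: Optional[bool]) -> str:
    """Compare a gene's expression state with its predicted flux activity state.

    expression:  "high", "low" or anything else (e.g. "moderate", no call)
    flux_active: True/False, or None if the activity state was not determined
    """
    if flux_active is None or expression not in ("high", "low"):
        return "not classified"
    if expression == "low" and flux_active:
        return "up-regulated"       # flux despite low expression
    if expression == "high" and not flux_active:
        return "down-regulated"     # no flux despite high expression
    return "consistent"             # activity agrees with expression
```

Genes without a confident activity call are left unclassified rather than forced into either category.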
Flux Activity State: Consider the Space of Possible Solutions
• We predict, for each tissue, sets of active and inactive genes and reactions
• Since the MILP problem has a space of possible (alternative optimal) solutions, we solve a set of MILP problems to determine each gene's activity:
  1. Simulate a state where the gene is inactive
  2. Simulate a state where the gene product is active
• Estimate confidence levels based on the drop in consistency (with expression) between the two solutions
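The two-MILP procedure above can be sketched as follows, assuming the same hypothetical toy network and big-M constraints as a plain MILP consistency model. For simplicity the sketch forces individual reactions (rather than genes) on ($v_r \ge \varepsilon$) or off ($v_r = 0$) through their flux bounds; the call goes to whichever state allows higher consistency and the gap is taken as the confidence. `max_consistency` and `activity_call` are hypothetical names.

```python
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp

# Hypothetical toy network: R1: -> A, R2: A -> B, R3: B ->, R4: A ->
S = np.array([[1, -1, 0, -1],   # A
              [0, 1, -1, 0]])   # B
HIGHLY, LOWLY = [0, 1], [2, 3]  # assumed expression calls
EPS, VMAX = 1.0, 10.0


def max_consistency(v_lb, v_ub):
    """Best sum(H) + sum(L) with v_lb <= v <= v_ub; None if infeasible."""
    n_met, n_rxn = S.shape
    n_bin = len(HIGHLY) + len(LOWLY)
    A = np.zeros((n_met + n_bin, n_rxn + n_bin))
    lb, ub = np.zeros(n_met + n_bin), np.zeros(n_met + n_bin)
    A[:n_met, :n_rxn] = S                       # S v = 0
    for k, r in enumerate(HIGHLY):              # v_r - EPS * H_k >= 0
        A[n_met + k, [r, n_rxn + k]] = [1.0, -EPS]
        ub[n_met + k] = np.inf
    for k, r in enumerate(LOWLY):               # v_r + VMAX * L_k <= VMAX
        i = len(HIGHLY) + k
        A[n_met + i, [r, n_rxn + i]] = [1.0, VMAX]
        lb[n_met + i], ub[n_met + i] = -np.inf, VMAX
    res = milp(
        np.r_[np.zeros(n_rxn), -np.ones(n_bin)],
        constraints=LinearConstraint(A, lb, ub),
        integrality=np.r_[np.zeros(n_rxn), np.ones(n_bin)],
        bounds=Bounds(np.r_[v_lb, np.zeros(n_bin)], np.r_[v_ub, np.ones(n_bin)]),
    )
    return round(-res.fun) if res.success else None


def activity_call(r):
    """Force reaction r active, then inactive; compare the best consistencies."""
    n_rxn = S.shape[1]
    lb_free, ub_free = np.zeros(n_rxn), np.full(n_rxn, VMAX)
    lb_on = lb_free.copy()
    lb_on[r] = EPS      # reaction r forced active
    ub_off = ub_free.copy()
    ub_off[r] = 0.0     # reaction r forced inactive
    s_on = max_consistency(lb_on, ub_free)
    s_off = max_consistency(lb_free, ub_off)
    s_on = float("-inf") if s_on is None else s_on
    s_off = float("-inf") if s_off is None else s_off
    gap = s_on - s_off
    if gap > 0:
        return "active", gap
    if gap < 0:
        return "inactive", -gap
    return "undetermined", 0


for r in range(S.shape[1]):
    state, confidence = activity_call(r)
    print(f"R{r + 1}: {state} (confidence {confidence})")
```

A gap of zero means both states are equally consistent with the expression data, so the reaction's activity is left undetermined.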
Results: Tissue-Specific Gene Activity
• We applied the method described above to:
  – the metabolic network model of Duarte et al.
  – gene and protein expression measurements from GeneNote and HPRD
  – 10 tissues: brain, heart, kidney, liver, lung, pancreas, prostate, spleen, skeletal muscle and thymus
• The activity state of 781 out of 1475 model genes was determined in at least one tissue
Post-transcriptional Regulation of Metabolic Genes
• Post-transcriptional regulation plays a major role in shaping tissue-specific metabolic behavior, affecting ~20% of the metabolic genes per tissue
• On average, 42 (3.6%) genes are post-transcriptionally up-regulated and 180 (15.4%) are post-transcriptionally down-regulated in each tissue
[Figure: per-tissue numbers of post-transcriptionally down-regulated and up-regulated genes]
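As a quick arithmetic check, the two per-tissue fractions add up to the "~20%" figure; assuming both percentages share one denominator, each also implies roughly 1,170 genes per tissue:

$$3.6\% + 15.4\% = 19.0\% \approx 20\%, \qquad \frac{42}{0.036} \approx 1167, \quad \frac{180}{0.154} \approx 1169$$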
Cross-Validation Test
• We performed a five-fold cross-validation test
• 80% of the genes were used to constrain the model
• Gene activity states for a held-out set of 20% of the genes were predicted from the expression constraints of the remaining 80%
• The overlap between the genes predicted as active and the highly expressed genes in the held-out data was significantly high for all tissues
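The bookkeeping of such a test can be sketched as below: split the genes into five folds, and score each fold by how surprising the overlap between predicted-active and highly expressed held-out genes is. The model-fitting step that produces the predictions is omitted, and the hypergeometric test is an assumed choice of significance test; the slide does not state which test was used. Function names are hypothetical.

```python
import numpy as np
from scipy.stats import hypergeom


def five_fold_splits(genes, seed=0):
    """Yield (train, held_out) gene sets; each fold holds out ~20% of genes."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(genes))
    for fold in np.array_split(order, 5):
        held_out = {genes[i] for i in fold}
        yield set(genes) - held_out, held_out


def overlap_pvalue(predicted_active, highly_expressed, held_out):
    """P(overlap >= observed) under random draws from the held-out genes."""
    population = len(held_out)
    successes = len(highly_expressed & held_out)
    drawn = predicted_active & held_out
    observed = len(drawn & highly_expressed)
    # hypergeom.sf(k, M, n, N) = P(X > k), so P(X >= observed) = sf(observed - 1)
    return hypergeom.sf(observed - 1, population, successes, len(drawn))


# Usage: the train set would constrain the model; its predictions for the
# held-out genes are then scored with overlap_pvalue.
genes = [f"g{i}" for i in range(100)]
for train, held_out in five_fold_splits(genes):
    print(len(train), len(held_out))
```

Drawing the reference population from the held-out genes only keeps each fold's test independent of the expression constraints used to fit it.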
Large-Scale Validation: Large-Scale Mining of Tissue-Specificity Data
- Tissue specificity of genes, reactions, and metabolites is significantly correlated with all data sources
- Tissue specificity of post-transcriptionally up-regulated elements is significantly high
- Tissue specificity of post-transcriptionally down-regulated elements is significantly low
Tissue-Specific Metabolite Exchange with Biofluids
• 249 metabolites are known to be secreted or taken up by human tissues
• 54% of these metabolites are not associated with transporters and cannot be predicted from expression data
• Transport direction cannot be inferred from expression data
• A transporter might carry several metabolites
• Many of the known transporters are post-transcriptionally regulated
Metabolic Disease-Causing Genes
• 162 metabolic genes are associated with a Mendelian disease
• Prediction accuracy: precision of 49% and recall of 22%
• Post-transcriptional regulation has a significant effect on disease-causing genes (e.g. GBE1, which causes a glycogen storage disease, is post-transcriptionally up-regulated in liver, heart, skeletal muscle, and brain)
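For reference, the two accuracy figures follow the standard definitions: precision is the fraction of predictions that are correct, recall the fraction of known cases that are recovered. A minimal sketch over sets of predicted and known items (e.g. hypothetical gene-tissue pairs; the exact items scored in the study are not specified here):

```python
def precision_recall(predicted: set, known: set) -> tuple[float, float]:
    """precision = TP / |predicted|, recall = TP / |known| (0.0 for empty sets)."""
    tp = len(predicted & known)  # true positives: predicted and known
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(known) if known else 0.0
    return precision, recall


# Hypothetical example: 4 predictions, 2 of them among 8 known items
print(precision_recall({1, 2, 3, 4}, {1, 2, 5, 6, 7, 8, 9, 10}))  # (0.5, 0.25)
```

A precision of 49% with a recall of 22% thus means about half of the predictions are supported by known associations, while most known associations are not recovered.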
Summary: Methodological Standpoint
• First constraint-based modeling analysis of recently published human metabolic networks
• First to account for post-transcriptional regulation within the computational framework of large-scale metabolic modeling
• Integrates expression data as part of the optimization instead of imposing it as a constraint during a preprocessing step (Akesson et al. 2004)
Summary: Main Conclusions
• Post-transcriptional regulation plays a significant role in shaping tissue-specific metabolic behavior
• The tissue specificity of many metabolic disease-causing genes goes markedly beyond that manifested in their expression levels, giving rise to new predictions concerning their involvement in different tissues
• Metabolite exchange with biofluids displays a large variance across tissues, composing a unique view of the tissue-specific uptake and secretion of hundreds of metabolites
What’s Next?
• Integrate other tissue-specificity data
• Modeling of metabolic diseases
  – Using various data sources (known disease-causing genes, drug databases)
  – Predict tissue-wide metabolic symptoms
  – Predict metabolic response to drugs
• Predict disease biomarkers that can be identified by biofluid metabolomics
Thank you!
Mathematical representation of our optimization problem