Functional Mapping: A statistical model for mapping dynamic genes

Recall: Interval mapping for a univariate trait
Simple regression model for a univariate trait: Phenotype = Genotype + Error
$y_i = x_i \mu_j + e_i$
• $x_i$ is the indicator for the QTL genotype
• $\mu_j$ is the mean for genotype $j$
• $e_i \sim N(0, \sigma^2)$
! The QTL genotype is unobservable (missing data)

A simulation example (F2)
The overall trait distribution is composed of three distributions, each one coming from one of the three QTL genotypes, QQ, Qq, and qq.
[Figure: overall trait distribution with component curves for QQ, Qq and qq]

Solution: consider a finite mixture model
with QQ = m + a, Qq = m + d, qq = m - a

We use a finite mixture model for estimating genotypic effects (F2)
$y_i \sim p(y_i \mid \omega, \theta) = \omega_{2|i} f_2(y_i) + \omega_{1|i} f_1(y_i) + \omega_{0|i} f_0(y_i)$
QTL genotype (j): QQ, Qq, qq; code: 2, 1, 0
where $f_j(y_i)$ is a normal density with mean $\mu_j$ and variance $\sigma^2$ (collected in $\theta$), and $\omega = (\omega_{2|i}, \omega_{1|i}, \omega_{0|i})$ are the QTL conditional probabilities given the flanking markers.
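The mixture density on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names and the numeric values of m, a, d and the standard deviation are assumptions chosen for the example, and the mixing proportions are set to the F2 segregation ratio, as if no marker information were available.

```python
import math

def normal_pdf(y, mean, sd):
    """Density of N(mean, sd^2) evaluated at y."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def mixture_density(y, means, sd, probs):
    """Three-component mixture density for one F2 individual.

    means: genotype means keyed by code 2 (QQ), 1 (Qq), 0 (qq)
    probs: conditional QTL-genotype probabilities for this individual
    """
    return sum(probs[j] * normal_pdf(y, means[j], sd) for j in (2, 1, 0))

# Illustrative values only (assumed, not taken from the slides)
m, a, d = 10.0, 2.0, 0.5
means = {2: m + a, 1: m + d, 0: m - a}
probs = {2: 0.25, 1: 0.5, 0: 0.25}
density = mixture_density(10.5, means, 1.0, probs)
```

In interval mapping the `probs` dictionary would differ from individual to individual, because it depends on each individual's flanking marker genotypes.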

Data Structure
Subject | Markers M1, M2, …, Mm | Phenotype (y) | Conditional probability of QTL genotype: QQ(2), Qq(1), qq(0)
i = 1, …, 8 | genotype codes such as AA(2), Aa(1), aa(0) and BB(2), Bb(1), bb(0) | $y_i$ | $\omega_{2|i}$, $\omega_{1|i}$, $\omega_{0|i}$

Human Development Robbins 1928, Human Genetics, Yale University Press

Tree growth
It looks messy, but there are simple rules underlying the complexity.

The dynamics of gene expression
• Gene expression changes dynamically throughout an organism's lifetime.
• The genetic factors that govern the development of an organism include:
  - Those constantly expressed throughout the lifetime (called deterministic genes)
  - Those periodically expressed (e.g., regulation genes)
• Environmental factors such as nutrition, light and temperature also play a role.
• We are interested in identifying which gene(s) govern the dynamics of a developmental trait, using a procedure called Functional Mapping.

Stem diameter growth in poplar trees Ma et al. (2002) Genetics

Poplar tree - height & diameter

Mouse growth A: male; B: female

Developmental Pattern of Genetic Effects QQ Qq Wu and Lin (2006) Nat. Rev. Genet.

Data Structure
Genetic design: Parents AA, aa; F1: Aa; F2: AA (¼), Aa (½), aa (¼)
Sample | Marker 1, 2, …, m | Phenotype (y) at t1, t2, …, tT | Conditional probability of QTL genotype: QQ(2), Qq(1), qq(0)
1 | 2 2 … | y1(1) y1(2) … y1(T) | $\omega_{2|1}$, $\omega_{1|1}$, $\omega_{0|1}$
2 | 2 2 … | y2(1) y2(2) … y2(T) | $\omega_{2|2}$, $\omega_{1|2}$, $\omega_{0|2}$
3 | 1 1 … | y3(1) y3(2) … y3(T) | $\omega_{2|3}$, $\omega_{1|3}$, $\omega_{0|3}$
4 | 1 1 … | y4(1) y4(2) … y4(T) | $\omega_{2|4}$, $\omega_{1|4}$, $\omega_{0|4}$
5 | 1 1 … | y5(1) y5(2) … y5(T) | $\omega_{2|5}$, $\omega_{1|5}$, $\omega_{0|5}$
6 | 1 0 … | y6(1) y6(2) … y6(T) | $\omega_{2|6}$, $\omega_{1|6}$, $\omega_{0|6}$
7 | 0 1 … | y7(1) y7(2) … y7(T) | $\omega_{2|7}$, $\omega_{1|7}$, $\omega_{0|7}$
8 | 0 0 … | y8(1) y8(2) … y8(T) | $\omega_{2|8}$, $\omega_{1|8}$, $\omega_{0|8}$

Mapping methods for dynamic traits
• Traditional approach: treat the trait measured at each time point as a univariate trait and map it with traditional QTL mapping approaches such as interval or composite interval mapping.
• Limitation: a single-trait model ignores how gene expression changes over time, and is too simple because it does not consider the underlying biological principles of development.
• A better approach: incorporate the biological principle into the mapping procedure to understand the dynamics of gene expression, using a procedure called Functional Mapping (pioneered by Wu and group).

Functional Mapping (FunMap)
A general framework, pioneered by Dr. Wu and his colleagues, to map QTLs that affect the pattern and form of development over a time course
- Ma et al., Genetics 2002
- Wu et al., Genetics 2004 (highlighted in Nature Reviews Genetics)
- Wu and Lin, Nature Reviews Genetics 2006
While traditional genetic mapping combines classic genetics and statistics, functional mapping combines genetics, statistics and biological principles.

Data structure for an F2 population
Sample | Phenotype y(1) y(2) … y(T) | Marker 1 2 … m
1 | y11 y21 … yT1 | 1 1 … 0
2 | y12 y22 … yT2 | -1 1 … 1
3 | y13 y23 … yT3 | -1 0 … 1
4 | y14 y24 … yT4 | 1 -1 … 0
5 | y15 y25 … yT5 | 1 1 … -1
6 | y16 y26 … yT6 | 1 0 … -1
7 | y17 y27 … yT7 | 0 -1 … 0
8 | y18 y28 … yT8 | 0 1 … 1
n | y1n y2n … yTn | 1 0 … -1
• There are nine groups of two-marker genotypes, 22, 21, 20, 12, 11, 10, 02, 01 and 00, with sample sizes n22, n21, …, n00.
• The conditional probabilities of the QTL genotypes QQ (2), Qq (1) and qq (0) given these marker genotypes are $\omega_{2|i}$, $\omega_{1|i}$, $\omega_{0|i}$.
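The conditional probabilities of the QTL genotype for the nine two-marker groups can be derived by enumerating F1 gametes. The sketch below is one way to do that, under assumptions the slides do not state: the F1 is taken to be AQB/aqb, crossovers in the two intervals are independent (no interference), and `r1` and `r2` are hypothetical recombination fractions between marker 1 and the QTL and between the QTL and marker 2.

```python
from itertools import product

def gamete_prob(m1, q, m2, r1, r2):
    """Probability that an F1 (AQB/aqb) gamete carries alleles m1, q, m2.

    Alleles are coded 1 (capital) or 0 (lower case); no interference is assumed.
    """
    p = 0.5                                  # either parental allele at marker 1
    p *= r1 if m1 != q else 1.0 - r1         # marker 1 to QTL interval
    p *= r2 if q != m2 else 1.0 - r2         # QTL to marker 2 interval
    return p

def f2_conditional_probs(r1, r2):
    """Return {(marker 1 code, marker 2 code): {QTL code: probability}}."""
    joint = {}
    gametes = list(product((0, 1), repeat=3))          # (m1, q, m2)
    for g, h in product(gametes, repeat=2):            # the two gametes of an F2
        key = (g[0] + h[0], g[2] + h[2])               # marker codes 0, 1, 2
        qtl = g[1] + h[1]                              # QTL code 0, 1, 2
        cell = joint.setdefault(key, {0: 0.0, 1: 0.0, 2: 0.0})
        cell[qtl] += gamete_prob(*g, r1, r2) * gamete_prob(*h, r1, r2)
    return {key: {j: p / sum(cell.values()) for j, p in cell.items()}
            for key, cell in joint.items()}

table = f2_conditional_probs(0.1, 0.1)   # illustrative recombination fractions
```

Each of the nine entries of `table` is a row of conditional probabilities that sums to one; in a genome scan the table would be recomputed for every putative QTL position within the marker interval.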

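Univariate interval mapping
$L(y) = \prod_{i=1}^{n} [\omega_{2|i} f_2(y_i) + \omega_{1|i} f_1(y_i) + \omega_{0|i} f_0(y_i)]$
$f_j(y_i) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(y_i - \mu_j)^2}{2\sigma^2}\right]$, $j = 2, 1, 0$ for QQ, Qq, qq
Here $\omega_{j|i}$ is the conditional probability of QTL genotype $j$ for individual $i$ given its flanking markers.
The Lander-Botstein model estimates $(\mu_2, \mu_1, \mu_0, \sigma^2, \text{QTL position})$.
Multivariate interval mapping
$L(y)$ has the same mixture form, with vector $y_i = (y_{i1}, y_{i2}, \ldots, y_{iT})$ and
$f_j(y_i) = \frac{1}{(2\pi)^{T/2} |\Sigma|^{1/2}} \exp\left[-\frac{1}{2} (y_i - u_j) \Sigma^{-1} (y_i - u_j)^{\top}\right]$
Vectors $u_j = (\mu_{j1}, \mu_{j2}, \ldots, \mu_{jT})$
Residual variance-covariance matrix $\Sigma$ ($T \times T$)
The unknown parameters: $(u_2, u_1, u_0, \Sigma, \text{QTL position})$ [$3T + T(T-1)/2 + T$ parameters]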

Functional mapping: the framework
Observed phenotype: $y_i = [y_i(1), \ldots, y_i(T)] \sim \mathrm{MVN}(u_j, \Sigma)$
Mean vector: $u_j = [\mu_j(1), \mu_j(2), \ldots, \mu_j(T)]$, $j = 2, 1, 0$
(Co)variance matrix: $\Sigma$

Functional Mapping
• Functional mapping does not estimate $(u_2, u_1, u_0, \Sigma)$ directly; instead, it estimates the biologically meaningful parameters that model them.
• An innovative model for the genetic dissection of complex traits that incorporates mathematical aspects of biological principles into a mapping framework
• Provides a tool for cutting-edge research at the interplay between gene action and development

The Finite Mixture Model
Three statistical issues:
• Modeling the mixture proportions, i.e., genotype frequencies at a putative QTL
• Modeling the mean vector
• Modeling the (co)variance matrix

Modeling the developmental Mean Vector
• Parametric approach
  - Growth trajectories: logistic curve
  - HIV dynamics: bi-exponential function
  - Biological clock: Van der Pol equation
  - Drug response: Emax model
• Nonparametric approach
  - Legendre function (orthogonal polynomial)
  - Spline techniques

Example: Stem diameter growth in poplar trees Ma, et al. Genetics 2002

Logistic Curve of Growth: A Universal Biological Law (West et al.: Nature 2001)
Modeling the genotype-dependent mean vector,
$u_j = [u_j(1), u_j(2), \ldots, u_j(T)] = \left[\frac{a_j}{1 + b_j e^{-r_j}}, \ldots, \frac{a_j}{1 + b_j e^{-T r_j}}\right]$
Instead of estimating $u_j$, we estimate the curve parameters $p = (a_j, b_j, r_j)$.
Number of parameters to be estimated in the mean vector:
Time points | Traditional approach | Our approach
5 | 3 × 5 = 15 | 3 × 3 = 9
10 | 3 × 10 = 30 | 3 × 3 = 9
50 | 3 × 50 = 150 | 3 × 3 = 9
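As a small illustration of the slide above, the following sketch evaluates the logistic mean vector for each QTL genotype and counts the curve parameters. The parameter values are hypothetical and chosen only to produce plausible growth curves; the logistic form a / (1 + b e^{-rt}) is the one used in Ma et al. (2002).

```python
import math

def logistic_mean(times, a, b, r):
    """Mean vector u_j at the given time points for curve parameters (a, b, r)."""
    return [a / (1.0 + b * math.exp(-r * t)) for t in times]

# Hypothetical curve parameters (a_j, b_j, r_j) for genotypes QQ (2), Qq (1), qq (0)
curve_params = {2: (30.0, 9.0, 0.8), 1: (26.0, 9.0, 0.7), 0: (22.0, 9.0, 0.6)}
times = list(range(1, 11))                      # T = 10 measurement times

mean_vectors = {j: logistic_mean(times, *p) for j, p in curve_params.items()}
n_curve_parameters = sum(len(p) for p in curve_params.values())   # 3 x 3 = 9
```

Whatever the number of time points, the mean structure is described by the same nine numbers, which is the point of the table on the slide.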

Modeling the Covariance Matrix
• Stationary parametric approach
  - Autoregressive (AR) model with log transformation:
    $\Sigma = \sigma^2 \begin{pmatrix} 1 & \rho & \cdots & \rho^{T-1} \\ \rho & 1 & \cdots & \rho^{T-2} \\ \vdots & \vdots & \ddots & \vdots \\ \rho^{T-1} & \rho^{T-2} & \cdots & 1 \end{pmatrix}$
• Nonstationary parametric approach
  - Structured antedependence (SAD) model
  - Ornstein-Uhlenbeck (OU) process
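A first-order autoregressive covariance matrix is fully determined by two numbers. The sketch below builds it as a nested list, assuming equally spaced measurement times; the function name and the example values of the variance and correlation are illustrative.

```python
def ar1_covariance(n_times, sigma2, rho):
    """T x T matrix with entries sigma2 * rho**|s - t| (equally spaced times assumed)."""
    return [[sigma2 * rho ** abs(s - t) for t in range(n_times)]
            for s in range(n_times)]

cov = ar1_covariance(4, sigma2=2.0, rho=0.5)   # illustrative values
```

The variance is constant along the diagonal and the correlation decays geometrically with the time lag, which is what makes the model stationary.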

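Functional interval mapping
$L(y) = \prod_{i=1}^{n} [\omega_{2|i} f_2(y_i) + \omega_{1|i} f_1(y_i) + \omega_{0|i} f_0(y_i)]$, with vector $y_i = (y_{i1}, y_{i2}, \ldots, y_{iT})$ and $\omega_{j|i}$ the conditional probability of QTL genotype $j$ for individual $i$
$f_j(y_i) = \frac{1}{(2\pi)^{T/2} |\Sigma|^{1/2}} \exp\left[-\frac{1}{2} (y_i - u_j) \Sigma^{-1} (y_i - u_j)^{\top}\right]$, $j = 2, 1, 0$
$u_j = \left(\frac{a_j}{1 + b_j e^{-r_j}}, \ldots, \frac{a_j}{1 + b_j e^{-T r_j}}\right)$, $j = 2, 1, 0$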

Estimation

The EM algorithm
• E step: calculate the posterior probability of QTL genotype j for individual i that carries a known marker genotype
• M step: solve the log-likelihood equations
• Iterate between the E and M steps until convergence
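The E step can be sketched for a single individual as follows. This is an illustration under stated assumptions rather than the published algorithm: the mean curves are logistic, the residual covariance is AR(1) with equally spaced times (so the multivariate normal log-density factorizes into one-step conditional normals), and all names and numeric values are hypothetical. The M step, which updates the curve and covariance parameters, is not shown.

```python
import math

def logistic_mean(times, a, b, r):
    return [a / (1.0 + b * math.exp(-r * t)) for t in times]

def ar1_log_density(resid, sigma2, rho):
    """Log-density of a zero-mean Gaussian vector with AR(1) covariance."""
    logp = -0.5 * (math.log(2.0 * math.pi * sigma2) + resid[0] ** 2 / sigma2)
    innov_var = sigma2 * (1.0 - rho ** 2)          # conditional variance given the previous time
    for prev, cur in zip(resid, resid[1:]):
        e = cur - rho * prev
        logp -= 0.5 * (math.log(2.0 * math.pi * innov_var) + e * e / innov_var)
    return logp

def e_step(y, times, curve_params, sigma2, rho, prior):
    """Posterior probability of each QTL genotype for one individual."""
    log_f = {}
    for j, (a, b, r) in curve_params.items():
        mu = logistic_mean(times, a, b, r)
        log_f[j] = ar1_log_density([yi - mi for yi, mi in zip(y, mu)], sigma2, rho)
    top = max(v for j, v in log_f.items() if prior[j] > 0)   # guards against underflow
    weight = {j: (prior[j] * math.exp(log_f[j] - top) if prior[j] > 0 else 0.0)
              for j in log_f}
    total = sum(weight.values())
    return {j: w / total for j, w in weight.items()}

times = [1, 2, 3, 4, 5]
curve_params = {2: (30.0, 9.0, 0.8), 1: (26.0, 9.0, 0.7), 0: (22.0, 9.0, 0.6)}
prior = {2: 0.25, 1: 0.5, 0: 0.25}                 # conditional probabilities from the markers
y = logistic_mean(times, *curve_params[2])         # a trajectory lying exactly on the QQ curve
posterior = e_step(y, times, curve_params, sigma2=1.0, rho=0.5, prior=prior)
```

Here `prior` plays the role of the marker-based conditional probabilities, and `posterior` is what the M step would use as weights when updating the parameters.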

EM continued The likelihood function:

Statistical Derivations M-step: update the parameters (see Ma et al. 2002, Genetics for details)

Testing QTL effect: Global test
• Instead of testing the mean difference at every time point for the different genotypes, we test the difference of the curve parameters.
• The existence of a QTL is tested by
  $H_0: (a_2, b_2, r_2) = (a_1, b_1, r_1) = (a_0, b_0, r_0)$ versus $H_1$: at least one of the equalities does not hold.
• $H_0$ means the three mean curves overlap and there is no QTL effect.
• Likelihood ratio test, with permutation to assess significance:
  $LR = -2[\ln L(\tilde{\Theta}) - \ln L(\hat{\Theta})]$,
  where the notation "~" and "^" indicate parameters estimated under the null and the alternative hypothesis, respectively.
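The permutation step can be sketched independently of the mapping model. In the example below, `permutation_threshold` is a hypothetical helper that accepts any likelihood-ratio statistic as a function; to keep the block short it is paired with a toy univariate two-genotype statistic, n log(RSS0/RSS1), rather than the full functional-mapping likelihood. In functional mapping, whole trajectories would be shuffled and the statistic would be the maximum LR over the genome.

```python
import math
import random

def lr_two_groups(phenotypes, genotypes):
    """Toy LR statistic for a mean difference between genotype groups (normal model)."""
    n = len(phenotypes)
    grand = sum(phenotypes) / n
    rss0 = sum((y - grand) ** 2 for y in phenotypes)        # null: one common mean
    rss1 = 0.0
    for g in set(genotypes):                                # alternative: one mean per genotype
        ys = [y for y, gi in zip(phenotypes, genotypes) if gi == g]
        mean_g = sum(ys) / len(ys)
        rss1 += sum((y - mean_g) ** 2 for y in ys)
    return n * math.log(rss0 / rss1)

def permutation_threshold(phenotypes, genotypes, statistic, n_perm=200, alpha=0.05, seed=0):
    """Empirical (1 - alpha) quantile of the statistic under shuffled phenotypes."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    null_stats = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)                               # break the phenotype-genotype link
        null_stats.append(statistic(shuffled, genotypes))
    null_stats.sort()
    index = min(n_perm - 1, math.ceil((1.0 - alpha) * n_perm) - 1)
    return null_stats[index]

# Illustrative data: 6 individuals per genotype
genotypes = [1] * 6 + [0] * 6
phenotypes = [5.1, 5.9, 6.3, 5.5, 6.8, 6.1, 4.2, 4.9, 5.0, 4.4, 5.3, 4.7]
observed = lr_two_groups(phenotypes, genotypes)
threshold = permutation_threshold(phenotypes, genotypes, lr_two_groups)
```

A QTL would be declared when `observed` exceeds `threshold`; the number of permutations used here is small for illustration only.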

Testing QTL effect: Regional test
• Regional test: to test in which time period $[t_1, t_2]$ the detected QTL triggers an effect, we can test the difference of the area under the curve (AUC) for the different QTL genotypes, i.e.,
  $H_0: \mathrm{AUC}_2 = \mathrm{AUC}_1 = \mathrm{AUC}_0$ versus $H_1$: at least one of the equalities does not hold,
  where $\mathrm{AUC}_j = \int_{t_1}^{t_2} \frac{a_j}{1 + b_j e^{-r_j t}} \, dt$.
• Permutation tests can be applied to assess statistical significance.
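For a logistic mean curve the area under the curve has a closed form, since a / (1 + b e^{-rt}) = a e^{rt} / (e^{rt} + b) integrates to (a / r) ln(e^{rt} + b). The sketch below computes it and checks it against a trapezoid rule; the function names and parameter values are illustrative assumptions.

```python
import math

def logistic_auc(a, b, r, t1, t2):
    """Closed-form integral of a / (1 + b * exp(-r * t)) over [t1, t2]."""
    return (a / r) * (math.log(math.exp(r * t2) + b) - math.log(math.exp(r * t1) + b))

def trapezoid_auc(a, b, r, t1, t2, steps=10000):
    """Numerical check of the same integral with the trapezoid rule."""
    h = (t2 - t1) / steps
    g = lambda t: a / (1.0 + b * math.exp(-r * t))
    inner = sum(g(t1 + k * h) for k in range(1, steps))
    return h * (0.5 * g(t1) + inner + 0.5 * g(t2))

# Hypothetical curve parameters for two genotypes over the period [2, 6]
auc_QQ = logistic_auc(30.0, 9.0, 0.8, 2.0, 6.0)
auc_qq = logistic_auc(22.0, 9.0, 0.6, 2.0, 6.0)
auc_difference = auc_QQ - auc_qq
```

The regional test compares such areas between genotypes; under the null hypothesis the difference would be zero.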

Applications
• Several real examples show the utility of the functional mapping approach.
• Application I: a poplar growth data set
• Application II: a mouse growth data set
• Application III: a rice tiller number growth data set

Application I: A Genetic Study in Poplars
Genetic design: Parents AA, aa; F1: Aa; backcross (BC) Aa × AA: AA (½), Aa (½)

Stem diameter growth in poplar trees
• a: asymptotic growth
• b: initial growth
• r: relative growth rate
Ma, Casella & Wu, Genetics 2002

Differences in growth across ages
[Figure: poplar data, untransformed and log-transformed panels]

Modeling the covariance structure
• Stationary parametric approach: first-order autoregressive model (AR(1)) with parameters $q = (\rho, \sigma^2)$
• Multivariate Box-Cox transformation to stabilize the variance (Box and Cox, 1964)
• Transform-both-sides (TBS) technique to preserve the interpretability of the growth parameters (Carroll and Ruppert, 1984; Wu et al., 2004). For a log transformation (i.e., $\lambda = 0$),
  $\log y_i(t) = \log \frac{a_j}{1 + b_j e^{-r_j t}} + e_i(t)$
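The transform-both-sides idea can be sketched as applying the same Box-Cox transformation to the observations and to the model mean before taking residuals, so that the curve parameters keep their original meaning. The helper names below are hypothetical and the numbers are illustrative.

```python
import math

def box_cox(value, lam):
    """Box-Cox transformation; lam = 0 gives the natural logarithm."""
    return math.log(value) if lam == 0 else (value ** lam - 1.0) / lam

def tbs_residuals(y, mean, lam):
    """Residuals after transforming both the data and the model mean."""
    return [box_cox(yi, lam) - box_cox(mi, lam) for yi, mi in zip(y, mean)]

observed = [3.2, 6.1, 10.4]          # illustrative measurements
fitted = [3.0, 6.0, 10.0]            # illustrative values of the logistic mean
log_resid = tbs_residuals(observed, fitted, lam=0)
```

With `lam=0` each residual is log(y) - log(mean), which corresponds to the log-transformed model on the slide; these residuals are then the quantities modeled with the AR(1) covariance.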

Functional mapping incorporating logistic curves and the AR(1) model
[Figure: results by FunMap and results by interval mapping, with the QTL marked]
FunMap has higher power to detect the QTL than the traditional interval mapping method does.
Ma, Casella & Wu, Genetics 2002

Application II: Mouse Genetic Study Detecting Growth Genes Data supplied by Dr. Cheverud at Washington University

Mouse Linkage Map

Body Mass Growth for Mouse
Genetic design: Parents AA, aa; F1: Aa; F2: AA (¼), Aa (½), aa (¼)
510 individuals measured over 10 weeks

Functional mapping Genetic control of body mass growth in mice Zhao, Ma, Cheverud & Wu, Physiological Genomics 2004

Application III: functional mapping of PCD QTL
• Rice tiller development is thought to be controlled by genetic factors as well as the environment.
• The development of tiller number undergoes a process called programmed cell death (PCD).

Genetic design: Parents AA, aa; F1: Aa; doubled haploid (DH): AA (½), aa (½)

Joint model for the mean vector
• We developed a joint modeling approach in which the growth and death phases are modeled by different functions.
• The growth phase is modeled by a logistic growth curve to fit the universal growth law.
• The death phase is modeled by orthogonal Legendre functions to increase the fitting flexibility.
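The Legendre part of such a joint model can be sketched as below. Only the death-phase basis is illustrated: the polynomials are generated with Bonnet's recurrence after rescaling time to [-1, 1]. How the two phases are joined is not specified on the slide, so it is left out; the function names, coefficients and time window are hypothetical.

```python
def legendre_basis(x, order):
    """Values P_0(x), ..., P_order(x) from Bonnet's recurrence."""
    values = [1.0, x]
    for n in range(1, order):
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    return values[: order + 1]

def rescale_time(t, t_min, t_max):
    """Map t in [t_min, t_max] onto [-1, 1], where Legendre polynomials are orthogonal."""
    return -1.0 + 2.0 * (t - t_min) / (t_max - t_min)

def legendre_mean(t, coeffs, t_min, t_max):
    """Mean value at time t as a linear combination of Legendre polynomials."""
    basis = legendre_basis(rescale_time(t, t_min, t_max), len(coeffs) - 1)
    return sum(c * p for c, p in zip(coeffs, basis))

# Hypothetical death-phase coefficients over an illustrative window from week 5 to week 9
death_phase = [legendre_mean(t, [12.0, -3.0, 0.5], 5.0, 9.0) for t in (5, 6, 7, 8, 9)]
```

In a mapping model each QTL genotype would carry its own coefficient vector, in the same way that each genotype carries its own logistic parameters in the growth phase.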

Cui et al. (2006) Physiological Ge

QTL trajectory plot

Advantages of Functional Mapping
• Incorporates biological principles of growth and development into genetic mapping, thus increasing the biological relevance of QTL detection
• Provides a quantitative framework for hypothesis tests at the interplay between gene action and developmental pattern
  - When does a QTL turn on?
  - When does a QTL turn off?
  - What is the duration of genetic expression of a QTL?
  - How does a growth QTL pleiotropically affect developmental events?
• The mean-covariance structures are modeled by parsimonious parameters, increasing the precision, robustness and stability of parameter estimation

Functional Mapping: toward high-dimensional biology
• A new conceptual model for genetic mapping of complex traits
• A systems approach for studying sophisticated biological problems
• A framework for testing biological hypotheses at the interplay among genetics, development, physiology and biomedicine

Functional Mapping: Simplicity from complexity
• Estimating fewer, biologically meaningful parameters that model the mean vector
• Modeling the structure of the variance matrix with powerful statistical methods, so that few parameters need to be estimated
• The reduction in dimension increases the power and precision of parameter estimation
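The size of the reduction can be made concrete with a short count. The sketch assumes three QTL genotypes, a three-parameter logistic curve per genotype and a two-parameter AR(1) covariance for functional mapping, against an unstructured mean vector and covariance matrix for multivariate interval mapping; the QTL position is left out of both counts.

```python
def multivariate_parameter_count(n_times, n_genotypes=3):
    """Unstructured model: one mean per genotype and time, plus a full covariance matrix."""
    means = n_genotypes * n_times
    covariance = n_times * (n_times + 1) // 2      # T variances + T(T-1)/2 covariances
    return means + covariance

def functional_parameter_count(n_genotypes=3, curve_parameters=3, covariance_parameters=2):
    """Functional mapping with logistic curves and an AR(1) covariance."""
    return n_genotypes * curve_parameters + covariance_parameters

for n_times in (5, 10, 50):
    print(n_times, multivariate_parameter_count(n_times), functional_parameter_count())
```

The unstructured count grows quadratically with the number of time points, while the functional count stays fixed.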