Statistical Genomics Lecture 13 GLM Zhiwu Zhang Washington

  • Slides: 24
Download presentation
Statistical Genomics Lecture 13: GLM Zhiwu Zhang Washington State University

Statistical Genomics Lecture 13: GLM Zhiwu Zhang Washington State University

Administration Midterm exam: 3: 50 -4: 20 Homework 3: due Mar 1, Wednesday, 3:

Administration Midterm exam: 3: 50 -4: 20 Homework 3: due Mar 1, Wednesday, 3: 10 PM Homework 4: Team work(pair preferred, maximum of 3)

HW 2 1. Q 1&Q 2: Claim the difference is significant (-5). 2. Q

HW 2 1. Q 1&Q 2: Claim the difference is significant (-5). 2. Q 1&Q 2: No explanation why stochastic accuracy stays the same with missing rate (-5). 3. Q 3~Q 5: Generated results with SD of 0 and did not report (-5). 5. Q 4: Conduct BEAGLE unsuccessfully (-5). 6. Q 5: No comparison on switching SNP and individual Q 3 and Q 5 (-5). 7. Source code: Didn’t show to do replication of the imputation (-5)

Outline Spurious association Covariates LS concept Linear model

Outline Spurious association Covariates LS concept Linear model

QTNs 0 n CHR 1 -5, leave 6 -10 empty my. GD=read. table(file="http: //zzlab.

QTNs 0 n CHR 1 -5, leave 6 -10 empty my. GD=read. table(file="http: //zzlab. net/GAPIT/data/mdp_numeric. txt", head=T) my. GM=read. table(file="http: //zzlab. net/GAPIT/data/mdp_SNP_information. txt", head=T) setwd("~/Dropbox/Current/ZZLab/WSUCourse/CROPS 545/Demo") source("G 2 P. R") source("GWASby. Cor. R") X=my. GD[, -1] index 1 to 5=my. GM[, 2]<6 X 1 to 5 = X[, index 1 to 5] set. seed(99164) my. Sim=G 2 P(X= X 1 to 5, h 2=. 75, alpha=1, NQTN=10, distribution="norm") p= GWASby. Cor(X=X, y=my. Sim$y)

Visualization color. vector <- rep(c("deepskyblue", "orange", "forestgreen", "indianred 3"), 10) m=nrow(my. GM) plot(t(-log 10(p))~seq(1:

Visualization color. vector <- rep(c("deepskyblue", "orange", "forestgreen", "indianred 3"), 10) m=nrow(my. GM) plot(t(-log 10(p))~seq(1: m), col=color. vector[my. GM[, 2]]) abline(v=my. Sim$QTN. position, lty = 2, lwd=2, col = "black")

QQ plot p. obs=p[!index 1 to 5] m 2=length(p. obs) p. uni=runif(m 2, 0,

QQ plot p. obs=p[!index 1 to 5] m 2=length(p. obs) p. uni=runif(m 2, 0, 1) order. obs=order(p. obs) order. uni=order(p. uni) plot(-log 10(p. uni[order. uni]), -log 10(p. obs[order. obs])) abline(a = 0, b = 1, col = "red")

Phenotypes by genotypes order. obs=order(p. obs) X 6 to 10=X[, !index 1 to 5]

Phenotypes by genotypes order. obs=order(p. obs) X 6 to 10=X[, !index 1 to 5] Xtop=X 6 to 10[, order. obs[1]] boxplot(my. Sim$y~Xtop)

Association with phenotypes r=-0. 32 PCA=prcomp(X) plot(my. Sim$y, PCA$x[, 2]) cor(my. Sim$y, PCA$x[, 2])

Association with phenotypes r=-0. 32 PCA=prcomp(X) plot(my. Sim$y, PCA$x[, 2]) cor(my. Sim$y, PCA$x[, 2])

Least Square Error set. seed(99164) s=sample(length(my. Sim$y), 10) plot(my. Sim$y[s], PCA$x[s, 2]) cor(my. Sim$y[s],

Least Square Error set. seed(99164) s=sample(length(my. Sim$y), 10) plot(my. Sim$y[s], PCA$x[s, 2]) cor(my. Sim$y[s], PCA$x[s, 2]) y r=-0. 18 y = a + cx + e x

GLM for GWAS Phenotype on individuals Population structure Y = SNP + Q (or

GLM for GWAS Phenotype on individuals Population structure Y = SNP + Q (or PCs) + (fixed effect) General Linear Model (GLM) e

Example from the ten individuals cbind(my. Sim$y[s], 1, PCA$x[s, 2], Xtop[s]) observation mean [

Example from the ten individuals cbind(my. Sim$y[s], 1, PCA$x[s, 2], Xtop[s]) observation mean [ b 0 y [1 y = Xb +e PC 2 b 1 SNP b 2 ] =b x 1 x 2 ] =X

Linear model y=b 0 + x 1 b 1 + x 2 b 2

Linear model y=b 0 + x 1 b 1 + x 2 b 2 + … + xpbp + e y: observation, dependent variable x: Explainary/independent variables e: Residuals/errors =e 12 + e 22 + … + en 2 =e'e =(y-Xb)'(y-Xb)

Optimization =e'e =e 2=(y-Xb)2 ∂ /∂b =2 X'(y-Xb) =2 X'y-2 X'Xb=0 X'Xb=X'y b=[X’X]-1[X’Y]

Optimization =e'e =e 2=(y-Xb)2 ∂ /∂b =2 X'(y-Xb) =2 X'y-2 X'Xb=0 X'Xb=X'y b=[X’X]-1[X’Y]

Statistical test

Statistical test

Action in R y=my. Sim$y X=cbind(1, PCA$x[, 2], Xtop) LHS=t(X)%*%X C=solve(LHS) RHS=t(X)%*%y b=C%*%RHS yb=X%*%b

Action in R y=my. Sim$y X=cbind(1, PCA$x[, 2], Xtop) LHS=t(X)%*%X C=solve(LHS) RHS=t(X)%*%y b=C%*%RHS yb=X%*%b e=y-yb n=length(y) ve=sum(e^2)/(n-1) vt=C*ve t=b/sqrt(diag(vt)) p=2*(1 -pt(abs(t), n-2))

Phenotypes by genotypes LM=cbind(b, t, sqrt(diag(vt)), p) rownames(LM)=cbind("Mean", "PC 2", "Xtop") colnames(LM)=cbind("b", "t", "SD",

Phenotypes by genotypes LM=cbind(b, t, sqrt(diag(vt)), p) rownames(LM)=cbind("Mean", "PC 2", "Xtop") colnames(LM)=cbind("b", "t", "SD", "p") LM

G=my. GD[, -1] n=nrow(G) m=ncol(G) P=matrix(NA, 1, m) for (i in 1: m){ x=G[,

G=my. GD[, -1] n=nrow(G) m=ncol(G) P=matrix(NA, 1, m) for (i in 1: m){ x=G[, i] if(max(x)==min(x)){ p=1}else{ X=cbind(1, PCA$x[, 2], x) LHS=t(X)%*%X C=solve(LHS) RHS=t(X)%*%y b=C%*%RHS yb=X%*%b e=y-yb n=length(y) ve=sum(e^2)/(n-1) vt=C*ve t=b/sqrt(diag(vt)) p=2*(1 -pt(abs(t), n-2)) } #end of testing variation P[i]=p[length(p)] } #end of looping for markers Loop through genome

QQ plot p. obs=P[!index 1 to 5] m 2=length(p. obs) p. uni=runif(m 2, 0,

QQ plot p. obs=P[!index 1 to 5] m 2=length(p. obs) p. uni=runif(m 2, 0, 1) order. obs=order(p. obs) order. uni=order(p. uni) plot(-log 10(p. uni[order. uni]), -log 10(p. obs[order. obs]), ylim=c(0, 7)) abline(a = 0, b = 1, col = "red") GWASby. Cor GLM with PC 2

G=my. GD[, -1] n=nrow(G) m=ncol(G) P=matrix(NA, 1, m) for (i in 1: m){ x=G[,

G=my. GD[, -1] n=nrow(G) m=ncol(G) P=matrix(NA, 1, m) for (i in 1: m){ x=G[, i] if(max(x)==min(x)){ p=1}else{ X=cbind(1, PCA$x[, 1: 3], x) LHS=t(X)%*%X C=solve(LHS) RHS=t(X)%*%y b=C%*%RHS yb=X%*%b e=y-yb n=length(y) ve=sum(e^2)/(n-1) vt=C*ve t=b/sqrt(diag(vt)) p=2*(1 -pt(abs(t), n-2)) } #end of testing variation P[i]=p[length(p)] } #end of looping for markers Using three PCs

QQ plot p. obs=P[!index 1 to 5] m 2=length(p. obs) p. uni=runif(m 2, 0,

QQ plot p. obs=P[!index 1 to 5] m 2=length(p. obs) p. uni=runif(m 2, 0, 1) order. obs=order(p. obs) order. uni=order(p. uni) plot(-log 10(p. uni[order. uni]), -log 10(p. obs[order. obs]), ylim=c(0, 7)) abline(a = 0, b = 1, col = "red") GLM with PC 2 GLM with PC 1: 3

QQ plot p. obs=P m 2=length(p. obs) p. uni=runif(m 2, 0, 1) order. obs=order(p.

QQ plot p. obs=P m 2=length(p. obs) p. uni=runif(m 2, 0, 1) order. obs=order(p. obs) order. uni=order(p. uni) plot(-log 10(p. uni[order. uni]), -log 10(p. obs[order. obs]), ) abline(a = 0, b = 1, col = "red")

color. vector <- rep(c("deepskyblue", "orange", "forestgreen", "indianred 3"), 10) m=nrow(my. GM) plot(t(-log 10(P))~seq(1: m),

color. vector <- rep(c("deepskyblue", "orange", "forestgreen", "indianred 3"), 10) m=nrow(my. GM) plot(t(-log 10(P))~seq(1: m), col=color. vector[my. GM[, 2]]) abline(v=my. Sim$QTN. position, lty = 2, lwd=2, col = "black")

Highlight Spurious association Covariates LS concept Linear model

Highlight Spurious association Covariates LS concept Linear model