Statistical Genomics Lecture 11 Power type I error
- Slides: 26
Statistical Genomics Lecture 11: Power, type I error and FDR Zhiwu Zhang Washington State University
Administration Homework 2, due Feb 17, Wednesday, 3: 10 P Homework 3 posted, due Mar 2, Wednesday, 3: 10 PM Midterm exam: February 26, Friday, 50 minutes (3: 354: 25 PM), 25 questions.
Outline Simulation of phenotype from genotype GWAS by correlation Power FDR Cutoff Null distribution of p values Resolution QTN bins and non-QTN bins
GWAS by correlation my. GD=read. table(file="http: //zzlab. net/GAPIT/data/mdp_numeric. txt", head=T) my. GM=read. table(file="http: //zzlab. net/GAPIT/data/mdp_SNP_information. txt", head=T) setwd("~/Dropbox/Current/ZZLab/WSUCourse/CROPS 545/Demo") source("G 2 P. R") source("GWASby. Cor. R") X=my. GD[, -1] index 1 to 5=my. GM[, 2]<6 X 1 to 5 = X[, index 1 to 5] set. seed(99164) my. Sim=G 2 P(X= X 1 to 5, h 2=. 75, alpha=1, NQTN=10, distribution="norm") p= GWASby. Cor(X=X, y=my. Sim$y)
The top five associations index=order(p) top 5=index[1: 5] detected=intersect(top 5, my. Sim$QTN. position) false. Positive=setdiff(top 5, my. Sim$QTN. position) top 5 my. Sim$QTN. position detected length(detected) false. Positive Power=3/10 False Discovery Rate (FDR) =2/5
The top five associations color. vector <- rep(c("deepskyblue", "orange", "forestgreen", "indianred 3"), 10) m=nrow(my. GM) plot(t(-log 10(p))~seq(1: m), col=color. vector[my. GM[, 2]]) abline(v=my. Sim$QTN. position, lty = 2, lwd=2, col = "black") abline(v= false. Positive, lty = 2, lwd=2, col = "red") ü Bonferroni Cutoff: 1%/3093=3. 2 E-6 ü Resolution
Cutoff from null distribution of P values: CHR 6 -10 N 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Observed 9. 80 E-08 7. 76 E-07 3. 07 E-06 5. 20 E-06 7. 26 E-06 8. 64 E-06 9. 72 E-06 1. 67 E-05 1. 91 E-05 2. 13 E-05 3. 28 E-05 3. 45 E-05 3. 98 E-05 4. 46 E-05 5. 39 E-05 Expected 0. 000926784 0. 001853568 0. 002780352 0. 003707136 0. 00463392 0. 005560704 0. 006487488 0. 007414273 0. 008341057 0. 009267841 0. 010194625 0. 011121409 0. 012048193 0. 012974977 0. 013901761 1074 1075 1076 1077 1078 1079 0. 9918845 0. 9954024 0. 9960631 0. 9970031 0. 9992356 0. 9999589 0. 9953661 0. 9962929 0. 9972196 0. 9981464 0. 9990732 1 1% of observed p values are below 0. 0000328 index. null=!index 1 to 5 & !is. na(p) p. null=p[index. null] m. null=length(p. null) index. sort=order(p. null) p. null. sort=p. null[index. sort] head(p. null. sort) tail(p. null. sort) seq=seq(1: m. null) table=cbind(seq, p. null. sort, seq/m. null) head(table, 15) tail(table) P value of 3. 28 E-5 is equivalent to 1% type 1 error
What about QTNs every where? set. seed(99164) my. Sim=G 2 P(X= my. GD[, -1], h 2=. 75, alpha=1, NQTN=10, distribution="norm") p= GWASby. Cor(X=X, y=my. Sim$y) plot(t(-log 10(p))~seq(1: m), col=color. vector[my. GM[, 2]]) abline(v=my. Sim$QTN. position, lty = 2, lwd=2, col = "black")
Resolution and bin approach 10 Kb is really good, 100 Kb is OK Bins with QTNs for power Bins without QTNs for type I error
Bins (e. g. 100 Kb) big. Num=1 e 9 resolution=100000 bin=round((my. GM[, 2]*big. Num+my. GM[, 3])/resolution) result=cbind(my. GM, t(p), bin) head(result) Minimum p value within bin
Bins of QTNs QTN. bin=result[my. Sim$QTN. position, ] QTN. bin
Sorted bins of QTNs index. qtn. p=order(QTN. bin[, 4]) QTN. bin[index. qtn. p, ]
FDR and type I error Total number of bins: 3054 (size of 100 kb) N bin t(p) Power #False bins FDR Type. I Error 1 2 3 4 5 6 7 8 9 10 50120 12235 60985 12918 31482 101348 31573 42222 10502 22331 4. 44 E-16 1. 00 E-10 1. 38 E-10 7. 02 E-08 2. 05 E-05 9. 58 E-02 1. 88 E-01 2. 94 E-01 4. 98 E-01 9. 91 E-01 0. 2 0. 3 0. 4 0. 5 0. 6 0. 7 0. 8 0. 9 1 0 0 2 416 608 782 1001 1335 0 0 0. 285714286 0. 985781991 0. 988617886 0. 989873418 0. 991089109 0. 992565056 0 0 0. 000654879 0. 1362148 0. 19908317 0. 256057629 0. 327766863 0. 437131631 0. 285714286=2/(2+5) 0. 000654879=2/3054
Receiver Operating Characteristic "The curve is created by plotting the true positive rate against the false positive rate at various threshold settings. " -Wikipedia Power ROC curve FDR Liu et. al. PLo. S Genetics, 2016
GAPIT. FDR. Type. I Function library(compiler) #required for cmpfun source("http: //www. zzlab. net/GAPIT/gapit_functions. txt") my. Stat=GAPIT. FDR. Type. I( WS=c(1 e 0, 1 e 3, 1 e 4, 1 e 5), GM=my. GM, seq. QTN=my. Sim$QTN. position, GWAS=result) str(my. Stat)
Return
Area Under Curve (AUC) par(mfrow=c(1, 2), mar = c(5, 2, 5, 2)) plot(my. Stat$FDR[, 1], my. Stat$Power, type="b") plot(my. Stat$Type. I[, 1], my. Stat$Power, type="b")
Replicates nrep=100 set. seed(99164) stat. Rep=replicate(nrep, { my. Sim=G 2 P(X=my. GD[, -1], h 2=. 5, alpha=1, NQTN=10, distribution="norm") p=p= GWASby. Cor(X=my. GD[, -1], y=my. Sim$y) seq. QTN=my. Sim$QTN. position my. GWAS=cbind(my. GM, t(p), NA) my. Stat=GAPIT. FDR. Type. I(WS=c(1 e 0, 1 e 3, 1 e 4, 1 e 5), GM=my. GM, seq. QTN=my. Sim$QTN. position, GWAS=my. GWAS, max. Out=100, Max. BP= 1 e 10) })
str(stat. Rep)
Means over replicates power=stat. Rep[[2]] #FDR s. fdr=seq(3, length(stat. Rep), 7) fdr=stat. Rep[s. fdr] fdr. mean=Reduce ("+", fdr) / length(fdr) #AUC: power vs. FDR s. auc. fdr=seq(6, length(stat. Rep), 7) auc. fdr=stat. Rep[s. auc. fdr] auc. fdr. mean=Reduce ("+", auc. fdr) / length(auc. fdr)
Plots of power vs. FDR the. Color=rainbow(4) plot(fdr. mean[, 1], power , type="b", col=the. Color [1], xlim=c(0, 1)) for(i in 2: ncol(fdr. mean)){ lines(fdr. mean[, i], power , type="b", col= the. Color [i]) }
Plots of AUC barplot(auc. fdr. mean, names. arg=c("1 bp", "1 K", "100 K"), xlab="Resolution", ylab="AUC")
ROC with different heritability h 2= 25% vs. 75% 10 QTNs Normal distributed QTN effect 100 kb resolution Power against Type I error
Simulation and GWAS nrep=100 set. seed(99164) #h 2=25% stat. Rep 25=replicate(nrep, { my. Sim=G 2 P(X=my. GD[, -1], h 2=. 25, alpha=1, NQTN=10, distribution="norm") p=p= GWASby. Cor(X=my. GD[, -1], y=my. Sim$y) seq. QTN=my. Sim$QTN. position my. GWAS=cbind(my. GM, t(p), NA) my. Stat=GAPIT. FDR. Type. I(WS=c(1 e 0, 1 e 3, 1 e 4, 1 e 5), GM=my. GM, seq. QTN=my. Sim$QTN. position, GWAS=my. GWAS, max. Out=100, Max. BP=1 e 10)}) #h 2=75% stat. Rep 75=replicate(nrep, { my. Sim=G 2 P(X=my. GD[, -1], h 2=. 75, alpha=1, NQTN=10, distribution="norm") p=p= GWASby. Cor(X=my. GD[, -1], y=my. Sim$y) seq. QTN=my. Sim$QTN. position my. GWAS=cbind(my. GM, t(p), NA) my. Stat=GAPIT. FDR. Type. I(WS=c(1 e 0, 1 e 3, 1 e 4, 1 e 5), GM=my. GM, seq. QTN=my. Sim$QTN. position, GWAS=my. GWAS, max. Out=100, Max. BP=1 e 10)})
Means and plot power 25=stat. Rep 25[[2]] s. t 1=seq(4, length(stat. Rep 25), 7) t 1=stat. Rep 25[s. t 1] t 1. mean. 25=Reduce ("+", t 1) / length(t 1) power 75=stat. Rep 75[[2]] s. t 1=seq(4, length(stat. Rep 75), 7) t 1=stat. Rep 75[s. t 1] t 1. mean. 75=Reduce ("+", t 1) / length(t 1) plot(t 1. mean. 25[, 4], power 25, type="b", col="blue", xlim=c(0, 1)) lines(t 1. mean. 75[, 4], power 75, type="b", col= "red")
Highlight Simulation of phenotype from genotype GWAS by correlation Power FDR Cutoff Null distribution of p values Resolution QTN bins and non-QTN bins
- Type 1 or type 2 error statistics
- Type 1 error vs type 2 error example
- Hypothesis example in research
- Hypothesis testing meaning
- Type 2 error
- 01:640:244 lecture notes - lecture 15: plat, idah, farad
- Difference between structural and functional genomics
- Essnet qsr
- Difference between structural and functional genomics
- Integrative genomics viewer download
- A vision for the future of genomics research
- Integrative genomics viewer
- Rachel butler genomics
- Harvest genomics
- What is genome
- Genomics
- Functional genomics
- Application of genomics
- Types of genomics
- "encoded genomics" -job
- "encoded genomics"
- Power triangle diagram
- Statistical power table
- Cdmvt problems
- How to avoid parallax error
- Power swries
- Error sistematico y error aleatorio