Special Topics in Genomics: ChIP-chip and Tiling Arrays

Traditional Method for Understanding Transcription Regulation
• Gene expression microarray analysis
• Clustering genes by expression profile
• Search for conserved sequence motifs in cluster promoters
• Very challenging for mammalian genomes

ChIP-chip Technology
• Chromatin ImmunoPrecipitation (ChIP) + microarray
• Detects the genome-wide in vivo locations of TFs and other DNA-binding proteins
• Reveals the regulatory mechanism of a transcription factor or DNA-binding protein much better and faster

Chromatin ImmunoPrecipitation (ChIP)
By Richard Bourgon at UC Berkeley

TF/DNA Crosslinking in vivo
By Richard Bourgon at UC Berkeley

Sonication (~500 bp)
By Richard Bourgon at UC Berkeley

TF-specific Antibody
By Richard Bourgon at UC Berkeley

Immunoprecipitation
By Richard Bourgon at UC Berkeley

Reverse Crosslink and DNA Purification
By Richard Bourgon at UC Berkeley

Amplification
By Richard Bourgon at UC Berkeley

Genome Tiling Arrays

Platform    # Arrays (human genome)  # Probes / Array  # Total Probes  Probe Length  Probe Resolution                       Price
Affymetrix  7                        6 M               42.0 M          25 mer        35 bp                                  $2,000
Nimblegen   38                       390 K             14.8 M          50 mer        110 bp                                 $30,000
Agilent     21                       244 K             5.1 M           60 mer        300 bp in genes; 500 bp in intergenic  $11,000

By Xiaole Shirley Liu at Harvard

Genome Tiling Arrays
• Affymetrix genome tiling microarrays
  – Tile the non-repeat regions of the genome
  – Chr 21/22 tiling (earlier version): 1 million probe pairs (PM & MM) at 35 bp resolution on 3 arrays
  – Whole genome: 42 million PM probes on 7 arrays
PM CGACATTGATTCAAGACTACA
MM CGACATTGATTCTAGACTACA
[Figure: probes tiled along the chromosome]
By Xiaole Shirley Liu at Harvard
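The PM/MM pair shown on this slide differs at a single base (the 13th), which is replaced by its complement. A minimal Python sketch of that relationship follows; the function name `mismatch_probe` and the default index are illustrative assumptions read off the slide's example, not a vendor specification.

```python
# Watson-Crick complements used to build the mismatch base.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def mismatch_probe(pm: str, index: int = 12) -> str:
    """Return an MM probe: the PM sequence with one base complemented.

    index=12 (the 13th base) is an assumption taken from the example
    pair on the slide.
    """
    if not 0 <= index < len(pm):
        raise ValueError("index lies outside the probe")
    return pm[:index] + COMPLEMENT[pm[index]] + pm[index + 1:]


if __name__ == "__main__":
    pm = "CGACATTGATTCAAGACTACA"  # PM sequence as printed on the slide
    print("PM", pm)
    print("MM", mismatch_probe(pm))
```

Applied to the slide's PM sequence, the sketch reproduces the MM sequence printed beside it.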

Chromatin ImmunoPrecipitation (ChIP)
By Richard Bourgon at UC Berkeley

ChIP-chip Array Hybridization
• Map high-intensity probes back to the genome
• Locate the TF binding location
[Figure labels: ChIP-DNA, noise, probes, chromosome]
By Xiaole Shirley Liu at Harvard

Identify ChIP-enriched Region
• Controls: sonicated genomic Input DNA
• Often 3 ChIP, 3 Ctrl replicates are needed
[Figure labels: ChIP, Ctrl]
By Xiaole Shirley Liu at Harvard

Mann-Whitney U-test for ChIP-region Detection
• Affy TAS, Cawley et al. (Cell 2004):
  – Each probe: rank probes (either PM-MM or PM) within [-500 bp, +500 bp] window
  – Check whether sum of ChIP ranks is much smaller
By Xiaole Shirley Liu at Harvard
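The windowed rank-sum idea can be sketched in plain Python. This is an illustration under stated assumptions, not the TAS implementation: ranks are assigned in descending order (rank 1 = brightest probe) so that enrichment shows up as a small ChIP rank sum, significance uses a normal approximation without tie correction, and the names `window_values`, `average_ranks`, and `rank_sum_z` are hypothetical.

```python
import math


def window_values(positions, values, center, half_width=500):
    """Collect probe values whose position lies within +/- half_width bp."""
    return [v for p, v in zip(positions, values) if abs(p - center) <= half_width]


def average_ranks(values):
    """Rank values in descending order (largest = rank 1), averaging ties."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_z(chip, ctrl):
    """z-score of the ChIP rank sum under the null (normal approximation)."""
    n1, n2 = len(chip), len(ctrl)
    ranks = average_ranks(list(chip) + list(ctrl))
    w = sum(ranks[:n1])  # rank sum of the ChIP probes
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (w - mean) / sd


if __name__ == "__main__":
    chip = [9.1, 8.7, 9.4, 8.9]  # made-up window intensities
    ctrl = [7.2, 7.9, 7.5, 8.0]
    print(rank_sum_z(chip, ctrl))  # strongly negative suggests ChIP enrichment
```

A strongly negative z-score means the ChIP probes occupy the top ranks in the window, which is the "sum of ChIP ranks is much smaller" condition on the slide.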

TileMap (Ji and Wong, Bioinformatics 2005)
STEP 1: Compute a test statistic for each probe to summarize probe level information
STEP 2: Combine probe level test statistics of neighboring probes to help infer binding regions

Probe level test statistic: empirical Bayes approach
[Diagram: per-probe sample variances (with df) for probes 1, 2, 3, …, I → mean and sum of squares → shrinkage factor → variance shrinkage estimator → variance estimates → a modified t-statistic → probe level test statistics]
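The slide's equations did not survive extraction, so the following Python sketch only illustrates the general idea of variance shrinkage: each probe's sample variance is pulled toward the average variance across probes, and the shrunken value replaces the raw variance in a two-sample t-statistic. The shrinkage factor is passed in by hand here as an assumption; TileMap estimates it from the data, and that estimator is not reproduced. The names `shrunken_variances` and `modified_t` are hypothetical.

```python
import math
from statistics import mean


def shrunken_variances(sample_vars, shrink):
    """Pull each probe's variance toward the mean variance across probes.

    shrink = 0 keeps the raw variances; shrink = 1 replaces every variance
    with the across-probe mean. (Hand-set here; an assumption.)
    """
    pooled = mean(sample_vars)
    return [(1 - shrink) * s2 + shrink * pooled for s2 in sample_vars]


def modified_t(chip, ctrl, var_hat):
    """Two-sample t-like statistic using a supplied (shrunken) variance."""
    n1, n2 = len(chip), len(ctrl)
    return (mean(chip) - mean(ctrl)) / math.sqrt(var_hat * (1 / n1 + 1 / n2))


if __name__ == "__main__":
    raw = [0.01, 0.5, 2.0]  # made-up per-probe sample variances
    shrunk = shrunken_variances(raw, shrink=0.5)
    print(shrunk)
    print(modified_t([9.0, 9.2], [8.0, 8.1], shrunk[0]))
```

With only a few replicates, a probe can show a tiny sample variance by chance and produce an inflated t-statistic; shrinking toward the common value damps that effect.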

Combining neighboring probes
TileMap (MA)
1. Compute the probe level test statistic t for each probe;
2. Compute a moving average statistic to measure enrichment;
3. Estimate FDR.
TileMap (HMM)
1. Compute the probe level test statistic t for each probe;
2. Estimate the distribution of t under H0 and H1;
3. Model t by a Hidden Markov Model, and decode the HMM.
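Step 2 of the MA variant can be sketched as a centered moving average over the probe level statistics. The window here is defined by probe index rather than genomic distance, and `half_window` is a hypothetical parameter; both are simplifying assumptions.

```python
def moving_average(t_stats, half_window=2):
    """Centered moving average; windows are truncated at the array ends."""
    n = len(t_stats)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        smoothed.append(sum(t_stats[lo:hi]) / (hi - lo))
    return smoothed


if __name__ == "__main__":
    t = [0.1, -0.3, 2.5, 3.1, 2.8, 0.2, -0.1]  # made-up probe statistics
    print(moving_average(t, half_window=1))
```

Averaging neighbors rewards runs of consistently high statistics, which is what a real binding region should produce, while isolated noisy probes are damped.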

Shrinking variance increases statistical power
[Figure legend: moving average; t-statistic, variance shrinking; t-statistic, canonical; Mean(X1) - Mean(X2)]

Peak 2 (180 bp) transgenics
[Figure: neural tube expression in transgenics]

Comparisons between TileMap and previous methods
cMyc ChIP-chip Data: 6 IP + 6 CT1 + 6 CT2
Gold Standard: Using GTRANS and Keles' method to analyze all 18 arrays
Test data: 4 arrays, 2 IP vs 2 CT1 (s2r2)
TileMap-HMM (Ji & Wong, 2005)
GTRANS or TAS (Kampa et al., 2004)
1. Set a window;
2. Perform a Wilcoxon signed rank test for each window.
Keles et al. (2004)
1. Compute a t-statistic t for each probe (no shrinking, two sample only);
2. Rank probes by a moving average.

Shrinking variance saves money
[Figure panels:]
– Using non-shrinking method (Keles' method) to analyze all probes
– Using shrinking method to analyze half of the probes, i.e., reduce information by half

MAT (Johnson W. E. et al. PNAS, 2006)
• Model-based Analysis of Tiling arrays for ChIP-chip
• Goal:
  – Find ChIP-regions without replicates
  – Find ChIP-regions without controls
  – Find ChIP-regions without MM probes
  – Can analyze data array by array
By Xiaole Shirley Liu at Harvard

MAT
• Estimate probe behavior by checking other probes with similar sequence on the same array
• Probe sequence plays a big role in signal value
• Most of the probes in ChIP-chip measure non-specific hybridization
By Xiaole Shirley Liu at Harvard

Probe Behavior Model
[Equation terms as labeled on the slide:]
– Baseline on number of Ts
– A, C, G at each position of the 25 mer
– A, C, G, T count squared
– 25 mer copy number along the genome
By Xiaole Shirley Liu at Harvard
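The four groups of terms named on this slide can be turned into a regression design row for each probe. The sketch below only builds the predictors; fitting is omitted. Taking the logarithm of the copy number and the exact feature ordering are assumptions, and `probe_features` is a hypothetical name.

```python
import math


def probe_features(seq: str, copy_number: int):
    """Build one design row for a 25-mer from the terms named on the slide.

    Layout (81 values): T count, 25 x 3 position indicators for A/C/G,
    squared counts of A/C/G/T, log copy number (the log is an assumption).
    """
    seq = seq.upper()
    if len(seq) != 25 or set(seq) - set("ACGT"):
        raise ValueError("expected a 25-mer over A, C, G, T")
    if copy_number < 1:
        raise ValueError("copy number must be at least 1")
    features = [float(seq.count("T"))]           # baseline on number of Ts
    for base in seq:                             # A, C, G at each position
        features.extend(1.0 if base == k else 0.0 for k in "ACG")
    features.extend(float(seq.count(k) ** 2) for k in "ACGT")  # counts squared
    features.append(math.log(copy_number))       # genomic copy number term
    return features


if __name__ == "__main__":
    row = probe_features("TCCCAGCACTTTGGGAGGCTGAGGC", copy_number=1)
    print(len(row), row[0])
```

T serves as the reference base, so a position occupied by T has all three of its indicators set to zero and is accounted for through the T count instead.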

Probe Standardization
• Fit the probe model array by array
• Divide array probes into bins (3k probes/bin)
• Background-subtraction and standardization (normalization) on a single array
[Equation labels: observed probe intensity; model predicted probe intensity; observed probe variance within each bin]
By Xiaole Shirley Liu at Harvard
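Reading the three equation labels together, each probe appears to be standardized as (observed intensity minus model-predicted intensity) divided by the spread observed within its bin. The sketch below follows that reading; binning probes by predicted intensity and using the standard deviation of observed values within each bin are assumptions, and `standardize` is a hypothetical name.

```python
from statistics import pstdev


def standardize(observed, predicted, bin_size=3000):
    """Return t-values: (observed - predicted) / within-bin spread.

    Probes are sorted by predicted intensity and cut into consecutive
    bins of bin_size probes (the binning variable is an assumption).
    """
    order = sorted(range(len(observed)), key=lambda i: predicted[i])
    t_values = [0.0] * len(observed)
    for start in range(0, len(order), bin_size):
        idx = order[start:start + bin_size]
        spread = pstdev([observed[i] for i in idx]) if len(idx) > 1 else 0.0
        for i in idx:
            # Leave the value at 0.0 when the bin has no spread.
            if spread > 0:
                t_values[i] = (observed[i] - predicted[i]) / spread
    return t_values


if __name__ == "__main__":
    obs = [1.0, 3.0, 10.0, 14.0]    # made-up log(PM) values
    pred = [2.0, 2.0, 12.0, 12.0]   # made-up model predictions
    print(standardize(obs, pred, bin_size=2))
```

Because both the baseline and the scale come from probes on the same array, the resulting t-values are comparable across arrays without a separate normalization step.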

Eliminate Normalization
• Probe log(PM) values before and after standardization
• If normalize before model fitting
  – Predicted same ChIP-regions, although less confident
By Xiaole Shirley Liu at Harvard

ChIP-region Detection
• Window-based MATscore
  – ChIP without Ctrl
  – TM: trimmed mean
  – Multiple ChIP with multiple Ctrl
  – More probes, higher t values in ChIP, less variance (fluctuation): more confident
By Xiaole Shirley Liu at Harvard
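The MATscore formulas are not preserved in the slide text, so the following is only a sketch of a window score with the stated properties: it uses a trimmed mean (TM) of the t-values, grows with the number of probes, and subtracts the control when one is available. The square-root scaling, the 10% trimming fraction, and the names `trimmed_mean` and `window_score` are assumptions.

```python
import math


def trimmed_mean(values, trim=0.1):
    """Mean after dropping the lowest and highest `trim` fraction of values."""
    if not values or not 0 <= trim < 0.5:
        raise ValueError("need values and 0 <= trim < 0.5")
    ordered = sorted(values)
    k = int(len(ordered) * trim)
    core = ordered[k:len(ordered) - k]
    return sum(core) / len(core)


def window_score(chip_t, ctrl_t=None, trim=0.1):
    """Window score: sqrt(#probes) * (TM of ChIP t-values minus TM of Ctrl)."""
    score = trimmed_mean(chip_t, trim)
    if ctrl_t:
        score -= trimmed_mean(ctrl_t, trim)
    return math.sqrt(len(chip_t)) * score


if __name__ == "__main__":
    chip = [2.1, 1.8, 2.5, 2.2, 9.0, 1.9, 2.0, 2.3, 2.4, 2.6]  # made-up t-values
    ctrl = [0.1, -0.2, 0.3, 0.0, 0.2, -0.1, 0.1, 0.0, -0.3, 0.2]
    print(window_score(chip))
    print(window_score(chip, ctrl))
```

Trimming keeps a single outlier probe (the 9.0 above) from dominating the window, and the square-root factor makes a window with more concordant probes score higher than a short one with the same mean.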

Raw probe values at two spike-in regions with concentration 2X
[Figure panels:]
– ChIP_1 Log(PM); Input_1 Log(PM)
– Sequence-based probe behavior standardization: ChIP_1 t-value; Input_1 t-value
– Window-based neighboring probe combination for ChIP-region detection: ChIP_1 MATscore; ChIP_1/Input_1 MATscore; 3 Reps ChIP/Input MATscore
By Xiaole Shirley Liu at Harvard

Statistical Significance of Hits
• P-value and FDR cutoff:
  – P-value from MATscore distribution
  – Estimate negative peaks under the same P-value cutoff
  – Regional FDR = #negative_peaks / #positive_peaks
By Xiaole Shirley Liu at Harvard
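The regional FDR ratio is simple enough to compute directly. In this sketch a symmetric score cutoff stands in for "the same P-value cutoff", which assumes the null score distribution is symmetric about zero; `regional_fdr` is a hypothetical name and the input is one score per candidate region.

```python
def regional_fdr(region_scores, cutoff):
    """Regional FDR = #negative_peaks / #positive_peaks at a given cutoff.

    Positive peaks score >= cutoff; negative peaks score <= -cutoff.
    Returns None when no positive peak passes the cutoff.
    """
    if cutoff <= 0:
        raise ValueError("cutoff must be positive")
    positive = sum(1 for s in region_scores if s >= cutoff)
    negative = sum(1 for s in region_scores if s <= -cutoff)
    return negative / positive if positive else None


if __name__ == "__main__":
    scores = [5.0, 6.0, 7.0, 8.0, -5.5, 1.0, -1.0]  # made-up region scores
    print(regional_fdr(scores, cutoff=5.0))
```

Negative peaks should arise only from noise, so their count estimates how many of the positive peaks at the same stringency are false.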

MAT summary
• Open source Python: http://chip.dfci.harvard.edu/~wli/MAT/
• Runs faster than array scanner
• Can work with single ChIP, multiple ChIP, and multiple ChIP with controls, with increasing accuracy
  – Use single ChIP on promoter arrays to test antibody and protocol before going whole genome
• Can identify individual failed samples
By Xiaole Shirley Liu at Harvard

Benchmark for ChIP-chip Target Detection (Johnson D. S. et al. Genome Research, 2008)
• ENCODE Spike-in experiment: both amplified and un-amplified
  – ChIP: 96 ENCODE clones at 2, 4, 8, ..., 256X enrichment + total chromatin DNA
  – Input: total genomic DNA
• Blind test: samples hybridized to different tiling arrays, predictions made before the key was released

Comparison of platforms

Comparison of algorithms
Combined Johnson D. S. et al. Genome Research 2008 with Ji H. et al. Nature Biotechnology 2008

MBR: Microarray Blob Remover
By Xiaole Shirley Liu at Harvard

xMAN: eXtreme MApping of oligoNucleotides
• http://chip.dfci.harvard.edu/~wli/xMAN
• xMAN maps ~42 M Affymetrix tiling probes to the newest human genome assembly in less than 6 CPU hours
  – BLAST needs 20 CPU years; BLAT needs 55 CPU days
  – Probe TCCCAGCACTTTGGGAGGCTGAGGC maps 50,660 times in the genome
• Can map long oligos and paired-tag high-throughput sequencing fragments
• Stores the copy number information of every probe
• xMAN filters tiling array probes to ensure one unique probe measurement per 1 kb, which improves peak detection
By Xiaole Shirley Liu at Harvard
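To make "copy number of every probe" concrete, here is a toy exact-match counter in Python. It is not xMAN's algorithm, which the slide does not describe; it simply indexes every k-mer of a small sequence and counts hits on both strands. The names `kmer_counts` and `probe_copy_number` are hypothetical.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmer_counts(genome: str, k: int) -> Counter:
    """Count every k-mer on the forward strand of the genome string."""
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))


def probe_copy_number(probe: str, counts: Counter) -> int:
    """Exact-match occurrences of the probe on either strand."""
    rc = reverse_complement(probe)
    hits = counts[probe]
    if rc != probe:  # avoid double-counting palindromic probes
        hits += counts[rc]
    return hits


if __name__ == "__main__":
    genome = "ACGTTACGTT"  # toy sequence, not real genome data
    counts = kmer_counts(genome, k=4)
    for probe in ("ACGT", "CGTT", "GTAA"):
        print(probe, probe_copy_number(probe, counts))
```

A probe with a high copy number, like the 50,660-hit example on the slide, reports signal from many loci at once, which is why storing this count per probe matters for downstream filtering.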

CEAS: Cis-regulatory Element Annotation System
• Data Analysis Button for Biologists: http://ceas.cbi.pku.edu.cn
By Xiaole Shirley Liu at Harvard

CisGenome (Ji H. et al. Nature Biotechnology, 2008)
• Graphic User Interface
• CisGenome Browser
• Core Data Analysis Programs

Other applications of tiling arrays
• Transcriptome mapping
• MeDIP-chip
• DNase-chip
• Nucleosome localization
• Array CGH and copy number variation