Microarray analysis
Quantitation of Gene Expression Data to Networks
Reading: Ch 16
BIO 520 Bioinformatics
Jim Lund

Microarray data
• Image quantitation
• Normalization
• Find genes with significant expression differences
• Annotation
• Clustering, pattern analysis, network analysis

Sources of Non-Biological Variation
• Dye bias: differences in heat and light sensitivity, efficiency of dye incorporation
• Differences in the amount of labeled cDNA hybridized to each channel in a microarray experiment ("channel" refers to a combination of a dye and a slide)
• Variation across replicate slides
• Variation across hybridization conditions
• Variation in scanning conditions
• Variation among technicians doing the lab work

Factors that affect the signal level
• Amount of mRNA
• Labeling efficiencies
• Quality of the RNA
• Laser/dye combination
• Detection efficiency of photomultiplier or CCD

[Figures: HeLa and HepG2 samples]

M vs. A Plot
M = Log(Red) - Log(Green)
A = (Log(Green) + Log(Red)) / 2
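The M and A definitions above can be sketched in a few lines of Python. This is a minimal illustration, assuming base-2 logarithms (the slide only says "Log") and strictly positive, background-corrected intensities. The function name `ma_values` and the example spots are hypothetical.

```python
import math

def ma_values(red, green):
    """Return (M, A) for one spot from its red and green intensities.

    Assumes base-2 logs and intensities > 0 (both are assumptions,
    the slide does not state the log base).
    """
    log_red = math.log2(red)
    log_green = math.log2(green)
    m = log_red - log_green          # log ratio
    a = (log_green + log_red) / 2    # mean log intensity
    return m, a

# Hypothetical (red, green) intensities for three spots.
spots = [(1024.0, 256.0), (500.0, 500.0), (64.0, 256.0)]
ma = [ma_values(red, green) for red, green in spots]
```

A spot with equal signal in both channels gets M = 0 regardless of its brightness, which is why unnormalized dye bias shows up as a cloud that drifts away from the M = 0 line as A changes.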

M vs. A plots of chip pairs: before normalization

M vs. A plots of chip pairs: after quantile normalization
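The slides do not spell out the procedure, so the following is only a sketch of quantile normalization as it is commonly described: each chip's values are replaced by the across-chip mean at the same rank, so that all chips end up with an identical intensity distribution. The function name and the toy data are hypothetical, and ties are broken by position, which is a simplification.

```python
def quantile_normalize(chips):
    """chips: list of equal-length lists, one list of intensities per chip.

    Returns new lists in which every chip has the same set of values
    (the mean of the sorted chips at each rank), placed in each chip's
    original rank order. Ties are broken by position (simplification).
    """
    n = len(chips[0])
    sorted_chips = [sorted(chip) for chip in chips]
    # Reference distribution: mean across chips at each rank.
    reference = [sum(chip[i] for chip in sorted_chips) / len(chips)
                 for i in range(n)]
    normalized = []
    for chip in chips:
        order = sorted(range(n), key=lambda i: chip[i])  # indices by rank
        out = [0.0] * n
        for rank, index in enumerate(order):
            out[index] = reference[rank]
        normalized.append(out)
    return normalized

# Two hypothetical chips measuring the same three genes.
chips = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalized = quantile_normalize(chips)
```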

Types of normalization
• To total signal (linear normalization)
• LOESS (LOcally WEighted polynomial regreSSion)
• To "housekeeping genes"
• To genomic DNA spots (Research Genetics) or mixed cDNAs
• To internal spikes
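The simplest entry in the list, normalization to total signal, might look like the sketch below. It assumes a two-channel array and rescales the green channel by a single factor so that both channels have the same total intensity; which channel is rescaled, and the function name, are assumptions made for illustration.

```python
def normalize_to_total(red, green):
    """Rescale green by one linear factor so sum(green) equals sum(red).

    Assumption: the whole channel is off by a constant factor, which is
    exactly the case linear normalization can correct.
    """
    scale = sum(red) / sum(green)
    return [value * scale for value in green]

# Hypothetical intensities: green is uniformly half as bright as red.
red = [100.0, 200.0, 700.0]
green = [50.0, 100.0, 350.0]
green_scaled = normalize_to_total(red, green)
```

A single scale factor cannot remove intensity-dependent bias, which is the case LOESS is meant to handle.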

Microarray analysis
• Data exploration: expression of gene X?
• Statistical analysis: which genes show large, reproducible changes?
• Clustering: grouping genes by expression pattern.
• Knowledge-based analysis: are amine synthesis genes involved in this experiment?

Fold change: the crudest method of finding differentially expressed genes
[Figure: HeLa vs. HepG2, genes with >2-fold expression change]
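A fold-change filter of the kind shown on this slide can be sketched as follows. The gene names and intensities are made up, and the use of log2 ratios is an assumption; the only thing taken from the slide is the ">2-fold" cutoff.

```python
import math

def fold_change_hits(sample_a, sample_b, threshold=2.0):
    """Return {gene: log2(b/a)} for genes changed more than `threshold`-fold.

    sample_a, sample_b: dicts mapping gene name to normalized intensity.
    Both up- and down-regulated genes are reported.
    """
    cutoff = math.log2(threshold)
    hits = {}
    for gene, value_a in sample_a.items():
        log_ratio = math.log2(sample_b[gene] / value_a)
        if abs(log_ratio) > cutoff:
            hits[gene] = log_ratio
    return hits

# Hypothetical normalized intensities for three genes in two samples.
hela = {"geneA": 100.0, "geneB": 200.0, "geneC": 800.0}
hepg2 = {"geneA": 450.0, "geneB": 210.0, "geneC": 100.0}
hits = fold_change_hits(hela, hepg2)
```

The filter ignores replicate variability entirely, which is why the slide calls it the crudest method.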

What do we mean by differentially expressed?
• Statistically, our gene is different from the other genes.
[Figure: distribution of measurements for the gene of interest (probability of a given value of the ratio) compared with the distribution of average ratios for all genes (number of genes); x-axis: log ratio]

Finding differentially expressed genes
What affects our certainty that a gene is up- or down-regulated?
• Number of sample points
• Difference in means
• Standard deviations of the samples
[Figure: probe signal for Sample A and Sample B]
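The three factors listed above are exactly the ingredients of a two-sample t statistic. The sketch below uses Welch's unequal-variance form, which is a choice made here for illustration rather than something the slides specify; the replicate values are invented.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Welch's t statistic: difference in means over its standard error.

    The standard error shrinks with more sample points and grows with
    larger standard deviations.
    """
    mean_difference = statistics.mean(sample_a) - statistics.mean(sample_b)
    standard_error = math.sqrt(
        statistics.variance(sample_a) / len(sample_a)
        + statistics.variance(sample_b) / len(sample_b)
    )
    return mean_difference / standard_error

# Hypothetical log-signal replicates for one probe.
sample_a = [1.0, 2.0, 3.0]
sample_b = [4.0, 5.0, 6.0]
t_three_replicates = welch_t(sample_a, sample_b)

# Same means and similar spread, but twice as many replicates.
t_six_replicates = welch_t(sample_a * 2, sample_b * 2)
```

With identical means and spread, doubling the number of replicates gives a larger |t|, i.e. more certainty that the difference is real.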

Practical views on statistics
• With appropriate biological replicates, it is possible to select statistically meaningful genes/patterns.
• Sensitivity and selectivity are inversely related, e.g. increased selection of true positives WILL result in more false positives and fewer false negatives.
• False negatives are lost opportunities; false positives cost money and waste time.
• Even when treated with conservative statistics, a typical set of experiments yields more genes/pathways/patterns than one can sensibly follow up, so use conservative statistics to protect against false positives when designing follow-on experiments.

Statistical Tests
• Student's t-test
  – Correct for multiple testing! (Holm-Bonferroni)
• False discovery rate.
• Significance Analysis of Microarrays (SAM)
  – http://www-stat.stanford.edu/~tibs/SAM/
• ANOVA
• Principal components analysis
• Special methods for periodic patterns in data.
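Two of the corrections named above can be sketched briefly. The Holm-Bonferroni step-down procedure controls the family-wise error rate; for the false discovery rate, the sketch uses the Benjamini-Hochberg step-up procedure, which is an assumption since the slide does not say which FDR method is meant. The p-values are invented.

```python
def holm_bonferroni(pvalues, alpha=0.05):
    """Return a list of booleans, True where the null hypothesis is rejected."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for rank, index in enumerate(order):
        # Compare the (rank+1)-th smallest p-value with alpha / (m - rank).
        if pvalues[index] <= alpha / (m - rank):
            reject[index] = True
        else:
            break  # step-down: stop at the first failure
    return reject

def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans, True where the test is called significant."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, index in enumerate(order, start=1):
        # Track the largest rank k with p_(k) <= (k / m) * alpha.
        if pvalues[index] <= rank / m * alpha:
            cutoff = rank
    reject = [False] * m
    for index in order[:cutoff]:
        reject[index] = True
    return reject

# Hypothetical per-gene p-values from a t-test.
pvalues = [0.01, 0.045, 0.025, 0.005, 0.5]
holm_calls = holm_bonferroni(pvalues)
fdr_calls = benjamini_hochberg(pvalues)
```

On this toy input the FDR procedure accepts one more gene than Holm-Bonferroni, reflecting the trade-off between false positives and false negatives discussed earlier.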

Volcano plot: log(expr) vs p-value
[Figure: volcano plot; axes: Log(fold change) and p-value]

Scatter plot showing genes with significant p-values

Pattern finding
• In many cases, the patterns of differential expression are the target (as opposed to specific genes)
  – Clustering or other approaches for pattern identification: find genes which behave similarly across all experiments, or experiments which behave similarly across all genes
  – Classification: identify genes which best distinguish 2 or more classes.
• The statistical reliability of the pattern or classifier is still an issue and similar considerations apply, e.g. cluster analysis of random noise will produce clusters, and they will be meaningless.

What is clustering?
• Group similar objects together.
  – Genes with similar expression patterns.
• Objects in the same cluster (group) are more similar to each other than objects in different clusters.

Clustering
• What is clustering?
• Similarity/distance metrics
• Hierarchical clustering algorithms
  – Made popular by Stanford, i.e. [Eisen et al. 1998]
• K-means
  – Made popular by many groups, e.g. [Tavazoie et al. 1999]
• Self-organizing map (SOM)
  – Made popular by Whitehead, i.e. [Tamayo et al. 1999]
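As a concrete instance of one algorithm from this list, here is a minimal k-means sketch. It assumes squared Euclidean distance and takes the starting centroids as an argument so the result is reproducible; real tools choose starting points differently. The expression profiles are invented.

```python
def squared_distance(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def kmeans(profiles, centroids, iterations=10):
    """Alternate between assigning profiles to the nearest centroid and
    moving each centroid to the mean of its assigned profiles.

    Returns (clusters, centroids); clusters[k] lists the profiles
    assigned to centroid k in the last assignment step.
    """
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for profile in profiles:
            nearest = min(range(len(centroids)),
                          key=lambda k: squared_distance(profile, centroids[k]))
            clusters[nearest].append(profile)
        # An empty cluster keeps its previous centroid.
        centroids = [
            [sum(values) / len(cluster) for values in zip(*cluster)]
            if cluster else centroids[k]
            for k, cluster in enumerate(clusters)
        ]
    return clusters, centroids

# Hypothetical expression profiles: two rising genes, two falling genes.
profiles = [[1.0, 2.0, 3.0], [1.1, 2.1, 2.9], [3.0, 2.0, 1.0], [2.9, 2.2, 1.1]]
clusters, centroids = kmeans(profiles, centroids=[profiles[0], profiles[2]])
```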

Typical Tools
• SAM (Significance Analysis of Microarrays), Stanford
• GeneSpring
• Affymetrix GeneChip Operating System (GCOS)
• Cluster/Treeview
• R statistics package microarray analysis libraries

How to define similarity?
[Figure: raw matrix (n genes by p experiments) converted into a similarity matrix (n genes by n genes); X and Y mark two genes]
• Similarity metric:
  – A measure of pairwise similarity or dissimilarity
  – Examples:
    • Correlation coefficient
    • Euclidean distance

Similarity metrics
• Euclidean distance
  – Euclidean clustering = magnitude & direction
• Correlation coefficient
  – Correlation clustering = direction
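The distinction between the two metrics can be seen on a toy pair of genes whose profiles have the same shape but different magnitude. The sketch assumes the Pearson correlation coefficient (the slide just says "correlation coefficient"); the profiles are invented.

```python
import math

def euclidean_distance(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pearson_correlation(x, y):
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)

# Hypothetical profiles: gene_y rises like gene_x, but ten times higher.
gene_x = [1.0, 2.0, 3.0, 4.0]
gene_y = [10.0, 20.0, 30.0, 40.0]

distance = euclidean_distance(gene_x, gene_y)      # large: magnitudes differ
correlation = pearson_correlation(gene_x, gene_y)  # about 1: same direction
```

Correlation-based clustering would put these two genes together, while Euclidean clustering would keep them far apart.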

Sporulation-example

Self-organizing maps (SOM) [Kohonen 1995]
• Basic idea:
  – Map high-dimensional data onto a 2D grid of nodes
  – Neighboring nodes are more similar than points far away
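One training step of a SOM can be sketched as follows. This is an illustrative simplification, assuming a Gaussian neighborhood on the grid and fixed learning rate and radius (real implementations shrink both over time); the grid and sample are invented.

```python
import math

def som_update(weights, sample, learning_rate=0.5, radius=1.0):
    """Apply one SOM training step in place and return the winning node.

    weights: dict mapping (row, col) grid positions to weight vectors.
    The best-matching unit (BMU) is the node closest to the sample; every
    node is pulled toward the sample, more strongly the nearer it sits to
    the BMU on the grid.
    """
    bmu = min(weights,
              key=lambda node: sum((w - s) ** 2
                                   for w, s in zip(weights[node], sample)))
    for node, vector in weights.items():
        grid_distance_sq = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
        influence = math.exp(-grid_distance_sq / (2 * radius ** 2))
        weights[node] = [w + learning_rate * influence * (s - w)
                         for w, s in zip(vector, sample)]
    return bmu

# Hypothetical 2 x 2 grid of nodes with two-dimensional weight vectors.
weights = {(0, 0): [0.0, 0.0], (0, 1): [0.0, 1.0],
           (1, 0): [1.0, 0.0], (1, 1): [1.0, 1.0]}
bmu = som_update(weights, sample=[0.9, 0.9])
```

Because neighbors of the winning node are updated too, nodes that are close on the grid end up with similar weight vectors, which is the property described in the slide.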

Self-organizing maps (SOM)

SOM Clusters

Things learned from microarray gene expression experiments
• Pathways not known to be involved
  – Ontology?
• Novel genes involved in a known pathway
• "Like" and "unlike" tissues

Transcription Factors Regulatory Networks
• Identify co-regulated genes
• Search for common motifs (transcription factor binding sites)
  – Evaluate known motifs/factors
  – Search for new ones.
• Programs: MEME, etc.
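The "evaluate known motifs" step can be illustrated with a very small exact-match scan over the promoters of co-regulated genes. This is not how MEME works (MEME discovers new motifs statistically); it is only a sketch, and the gene names and promoter sequences are invented. The motif used, CACGTG, is the well-known E-box consensus.

```python
def motif_hits(promoters, motif):
    """Return {gene: [start positions]} for promoters containing the motif.

    Exact string matching on one strand only; overlapping matches are
    reported. Real binding-site searches usually allow mismatches and
    scan both strands.
    """
    hits = {}
    for gene, sequence in promoters.items():
        positions = []
        start = sequence.find(motif)
        while start != -1:
            positions.append(start)
            start = sequence.find(motif, start + 1)
        if positions:
            hits[gene] = positions
    return hits

# Hypothetical promoter sequences for three co-regulated genes.
promoters = {
    "gene1": "ATGCACGTGTTACACGTGA",
    "gene2": "TTTTTTTTTT",
    "gene3": "CACGTGAAAA",
}
hits = motif_hits(promoters, "CACGTG")
```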

mRNA-protein Correlation
• YPD: should have relevant data
  – Will yeast be typical?
• Electrophoresis 18: 533
  – 23 proteins on 2D gels
  – r = 0.48 for mRNA vs. protein
• Post-transcriptional and post-translational regulation important!

Other microarray formats
• Single nucleotide polymorphism (SNP) chips
  – Oligos with each of the 4 nucleotides at each SNP.
• Chromatin IP chips (ChIP:chip)
  – Determine transcription factor binding sites
  – Promoter DNA on the chip.
• Alternative splicing chips
  – Long oligos, covering alternatively spliced exons, or all exons.
• Genome tiling chips

ChIP:chip: Identification of Transcription Factor Binding Sites
• Cross-link transcription factors to DNA with formaldehyde.
• Pull out the transcription factor of interest via immunoprecipitation with an antibody, or by tagging the factor of interest with an isolatable epitope (e.g. GST fusion).
• Fractionate the DNA associated with the transcription factor, reverse the cross-links, label, and hybridize to an array of promoter DNA.
• Brown et al. (2001) Nature 409: 533-8

ChIP:chip Analysis of TF Binding Sites

On to Proteomics
DNA → RNA → Protein