Microarray analysis: Quantitation of Gene Expression Data to Networks • Reading: Ch 16 • BIO 520 Bioinformatics • Jim Lund
Microarray data • Image quantitation. • Normalization • Find genes with significant expression differences • Annotation • Clustering, pattern analysis, network analysis
Sources of Non-Biological Variation • Dye bias: differences in heat and light sensitivity, efficiency of dye incorporation • Differences in the amount of labeled cDNA hybridized to each channel in a microarray experiment ("channel" refers to a combination of a dye and a slide) • Variation across replicate slides • Variation across hybridization conditions • Variation in scanning conditions • Variation among technicians doing the lab work
Factors which impact the signal level • Amount of mRNA • Labeling efficiencies • Quality of the RNA • Laser/dye combination • Detection efficiency of photomultiplier or CCD
[Figure: HeLa vs. HepG2]
[Figure: HeLa vs. HepG2]
M vs. A Plot • $M = \log(\mathrm{Red}) - \log(\mathrm{Green})$ • $A = (\log(\mathrm{Green}) + \log(\mathrm{Red})) / 2$
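The M and A definitions can be sketched in a few lines of Python. This is only an illustration: the slide does not state the log base, so base 2 is assumed here (a common convention, since M = 1 then means a 2-fold difference), and the spot intensities are made-up numbers.

```python
import math

def ma_values(red, green):
    """Return (M, A) for one spot from its red and green intensities.

    Assumption: log base 2; the slide only says "Log".
    """
    log_r = math.log2(red)
    log_g = math.log2(green)
    m = log_r - log_g         # M: log ratio of the two channels
    a = (log_g + log_r) / 2   # A: average log intensity
    return m, a

# Hypothetical background-corrected intensities (red, green) for three spots.
spots = [(1200.0, 300.0), (500.0, 500.0), (80.0, 320.0)]
for red, green in spots:
    m, a = ma_values(red, green)
    print(f"red={red:7.1f} green={green:7.1f}  M={m:+.2f}  A={a:.2f}")
```

Plotting M against A for every spot gives the M vs. A plot; a cloud that is not centered on M = 0 across the range of A suggests intensity-dependent dye bias.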
M vs. A plots of chip pairs: before normalization
M vs. A plots of chip pairs: after quantile normalization
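As a rough sketch of what quantile normalization does: each chip's values are replaced by the mean, across chips, of the values holding the same rank, so every chip ends up with an identical distribution. The function name and the toy intensities below are illustrative assumptions, and ties are broken by position rather than averaged, which production implementations may handle differently.

```python
def quantile_normalize(chips):
    """Quantile-normalize a list of chips (equal-length lists of intensities)."""
    n = len(chips[0])
    # Sort each chip, then average across chips at each rank.
    sorted_chips = [sorted(chip) for chip in chips]
    rank_means = [sum(s[i] for s in sorted_chips) / len(chips) for i in range(n)]

    normalized = []
    for chip in chips:
        # Probe indices ordered from lowest to highest intensity on this chip.
        order = sorted(range(n), key=lambda i: chip[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]  # probe gets the mean value for its rank
        normalized.append(out)
    return normalized

# Two hypothetical chips measuring the same three probes.
chips = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
print(quantile_normalize(chips))
```

After normalization the two chips contain the same set of values; only the assignment of values to probes differs.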
Types of normalization • To total signal (linear normalization) • LOESS (LOcally WEighted polynomial regrESSion) • To "housekeeping genes" • To genomic DNA spots (Research Genetics) or mixed cDNAs • To internal spikes
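The simplest of these, normalization to total signal, can be sketched as a single scaling factor. The sketch assumes that most genes are unchanged, so the two channels should have equal total intensity; the function name and intensities are hypothetical.

```python
def normalize_to_total_signal(red, green):
    """Scale the green channel so its total signal matches the red channel.

    Returns the rescaled green intensities and the scale factor used.
    """
    scale = sum(red) / sum(green)
    return [g * scale for g in green], scale

# Hypothetical intensities: green is uniformly half as bright as red.
red = [100.0, 200.0, 300.0]
green = [50.0, 100.0, 150.0]
scaled_green, scale = normalize_to_total_signal(red, green)
print(scale, scaled_green)
```

Because one factor is applied to every spot, this corrects a global brightness difference but not an intensity-dependent bias, which is the case LOESS is meant to address.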
Microarray analysis • Data exploration: expression of gene X? • Statistical analysis: which genes show large, reproducible changes? • Clustering: grouping genes by expression pattern. • Knowledge-based analysis: Are amine synthesis genes involved in this experiment?
Fold change: the crudest method of finding differentially expressed genes [Figure: HeLa vs. HepG2, >2-fold expression change]
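A fold-change filter amounts to thresholding the log ratio. The sketch below uses invented gene names and expression values; the only element taken from the slide is the >2-fold cutoff.

```python
import math

def fold_change_hits(expr_a, expr_b, threshold=2.0):
    """Return {gene: log2 ratio} for genes changed more than `threshold`-fold.

    expr_a, expr_b: dicts mapping gene name to normalized expression.
    """
    hits = {}
    cutoff = math.log2(threshold)
    for gene in expr_a:
        log_ratio = math.log2(expr_b[gene] / expr_a[gene])
        if abs(log_ratio) > cutoff:  # catches both up- and down-regulation
            hits[gene] = log_ratio
    return hits

# Hypothetical normalized expression values in two cell lines.
hela = {"geneA": 100.0, "geneB": 100.0, "geneC": 400.0}
hepg2 = {"geneA": 450.0, "geneB": 150.0, "geneC": 90.0}
print(fold_change_hits(hela, hepg2))
```

The method is crude because it ignores replicate variability: a noisy low-intensity gene can exceed 2-fold by chance, while a small but highly reproducible change is missed.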
What do we mean by differentially expressed? • Statistically, our gene is different from the other genes. [Figure: distribution of measurements for the gene of interest (probability of a given value of the ratio vs. log ratio) compared with the distribution of average ratios for all genes (number of genes vs. log ratio)]
Finding differentially expressed genes • What affects our certainty that a gene is up- or down-regulated? – Number of sample points – Difference in means – Standard deviations of the samples [Figure: probe signal for Sample A and Sample B]
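All three factors appear directly in a t statistic. The sketch below uses Welch's unequal-variance form as an assumption (the slides name Student's t-test but give no formula), with made-up probe signals.

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    na, nb = len(sample_a), len(sample_b)
    # Standard error of the difference in means.
    se = math.sqrt(variance(sample_a) / na + variance(sample_b) / nb)
    return (mean(sample_b) - mean(sample_a)) / se

a = [1.0, 2.0, 3.0]           # hypothetical probe signals, Sample A
b = [4.0, 5.0, 6.0]           # Sample B, same spread, higher mean
b_noisy = [2.0, 5.0, 8.0]     # same mean as b, larger standard deviation

print(welch_t(a, b))          # baseline
print(welch_t(a, b_noisy))    # more spread, smaller t
print(welch_t(a * 2, b * 2))  # more sample points, larger t
```

A larger difference in means or more sample points raises the statistic; a larger standard deviation lowers it.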
Practical views on statistics • With appropriate biological replicates, it is possible to select statistically meaningful genes/patterns. • Sensitivity and selectivity are inversely related, e.g. selecting more true positives WILL result in more false positives and fewer false negatives. • False negatives are lost opportunities; false positives cost money and waste time. • A typical set of experiments, even when treated with conservative statistics, yields more genes/pathways/patterns than one can sensibly follow up, so use conservative statistics to protect against false positives when designing follow-on experiments.
Statistical Tests • Student's t-test – Correct for multiple testing! (Holm-Bonferroni) • False discovery rate • Significance Analysis of Microarrays (SAM) – http://www-stat.stanford.edu/~tibs/SAM/ • ANOVA • Principal components analysis • Special methods for periodic patterns in data
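To make the multiple-testing point concrete, here is a minimal sketch of two corrections applied to a list of per-gene p-values: the Holm-Bonferroni step-down procedure named on the slide, and the Benjamini-Hochberg procedure as one common way to control the false discovery rate (the slide does not say which FDR method is meant, so that choice is an assumption). The p-values are invented.

```python
def holm_bonferroni(pvalues, alpha=0.05):
    """Return a list of booleans: True where the null is rejected (Holm)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for rank, idx in enumerate(order):       # rank 0 = smallest p-value
        if pvalues[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break                            # stop at the first failure
    return reject

def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans: True where the null is rejected (BH FDR)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            cutoff = rank                    # largest rank passing its threshold
    reject = [False] * m
    for idx in order[:cutoff]:
        reject[idx] = True
    return reject

# Hypothetical p-values for five genes.
pvals = [0.001, 0.012, 0.025, 0.038, 0.2]
print(holm_bonferroni(pvals))
print(benjamini_hochberg(pvals))
```

Holm controls the chance of even one false positive and is the more conservative of the two; BH tolerates a fixed expected fraction of false positives among the selected genes and so usually selects more.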
Volcano plot: log(expr) vs. p-value [Figure: axes are log(fold change) and p-value]
Scatter plot showing genes with significant p-values
Pattern finding • In many cases, the patterns of differential expression are the target (as opposed to specific genes) – Clustering or other approaches for pattern identification: find genes which behave similarly across all experiments, or experiments which behave similarly across all genes – Classification: identify genes which best distinguish 2 or more classes. • The statistical reliability of the pattern or classifier is still an issue and similar considerations apply, e.g. cluster analysis of random noise will produce clusters, but they will be meaningless.
What is clustering? • Group similar objects together. – Genes with similar expression patterns. • Objects in the same cluster (group) are more similar to each other than objects in different clusters.
Clustering • What is clustering? • Similarity/distance metrics • Hierarchical clustering algorithms – Made popular by Stanford, e.g. [Eisen et al. 1998] • K-means – Made popular by many groups, e.g. [Tavazoie et al. 1999] • Self-organizing map (SOM) – Made popular by Whitehead, e.g. [Tamayo et al. 1999]
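Of the algorithms listed, k-means is the shortest to write down. The sketch below is a plain-Python illustration rather than the implementation used in any cited paper: it uses squared Euclidean distance, takes the starting centroids as an explicit argument (real tools usually pick them at random), and runs on four invented expression profiles.

```python
def sq_dist(x, y):
    """Squared Euclidean distance between two expression profiles."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def kmeans(profiles, centroids, iterations=20):
    """Cluster profiles around the given initial centroids.

    Returns (labels, centroids); labels[i] is the cluster index of profiles[i].
    """
    for _ in range(iterations):
        # Assignment step: each profile joins its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
                  for p in profiles]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break                                   # converged
        centroids = new_centroids
    return labels, centroids

# Hypothetical profiles over three time points: two rising, two falling.
profiles = [[1.0, 2.0, 3.0], [1.2, 2.1, 3.3], [3.0, 2.0, 1.0], [3.1, 1.9, 0.8]]
labels, centers = kmeans(profiles, centroids=[profiles[0], profiles[2]])
print(labels)
```

The number of clusters must be chosen in advance, and different starting centroids can give different answers, so results are usually checked across several runs.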
Typical Tools • SAM (Significance Analysis of Microarrays), Stanford • GeneSpring • Affymetrix GeneChip Operating System (GCOS) • Cluster/Treeview • R statistics package microarray analysis libraries
How to define similarity? [Figure: raw matrix (n genes × p experiments) converted to a similarity matrix (n genes × n genes); X and Y are two genes being compared] • Similarity metric: – A measure of pairwise similarity or dissimilarity – Examples: • Correlation coefficient • Euclidean distance
Similarity metrics • Euclidean distance – Euclidean clustering = magnitude and direction • Correlation coefficient – Correlation clustering = direction
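The difference between the two metrics shows up clearly on a toy example. In the sketch below (invented profiles), gene_y has the same shape as gene_x at ten times the magnitude, while gene_z is close in magnitude but moves in the opposite direction.

```python
import math

def euclidean(x, y):
    """Euclidean distance: sensitive to both magnitude and direction."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pearson(x, y):
    """Pearson correlation coefficient: sensitive to direction (shape) only."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

gene_x = [1.0, 2.0, 3.0, 4.0]
gene_y = [10.0, 20.0, 30.0, 40.0]  # same pattern as gene_x, 10x magnitude
gene_z = [4.0, 3.0, 2.0, 1.0]      # similar magnitude, opposite pattern

print("x vs y:", euclidean(gene_x, gene_y), pearson(gene_x, gene_y))
print("x vs z:", euclidean(gene_x, gene_z), pearson(gene_x, gene_z))
```

By Euclidean distance gene_x is closer to gene_z; by correlation it is identical to gene_y and the opposite of gene_z. Which answer is right depends on whether absolute expression level matters for the question being asked.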
Sporulation-example
Sporulation-example
Self-organizing maps (SOM) [Kohonen 1995] • Basic idea: – map high dimensional data onto a 2D grid of nodes – Neighboring nodes are more similar than points far away
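The basic idea can be sketched as follows. This is a deliberately small illustration, not Kohonen's or Tamayo et al.'s implementation: the Gaussian neighborhood, the linear decay of learning rate and radius, the parameter names, and the toy profiles are all assumptions made for the example.

```python
import math
import random

def best_node(weights, x):
    """Return the (row, col) of the grid node whose vector is closest to x."""
    best, best_d = None, float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((a - b) ** 2 for a, b in zip(x, w))
            if d < best_d:
                best, best_d = (r, c), d
    return best

def train_som(data, rows, cols, epochs=200, lr0=0.5, seed=0):
    """Train a rows x cols SOM; returns weights[r][c] = reference vector."""
    rng = random.Random(seed)
    dim = len(data[0])
    radius0 = max(rows, cols) / 2
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                     # learning rate decays to 0
        radius = max(radius0 * (1 - frac), 0.5)   # neighborhood shrinks
        for x in data:
            br, bc = best_node(weights, x)
            for r in range(rows):
                for c in range(cols):
                    # Nodes near the winner on the grid are pulled hardest.
                    grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-grid_d2 / (2 * radius ** 2))
                    w = weights[r][c]
                    for k in range(dim):
                        w[k] += lr * h * (x[k] - w[k])
    return weights

# Hypothetical profiles scaled to [0, 1]: two rising-late, two rising-early.
data = [[0.0, 0.1, 1.0], [0.1, 0.0, 0.9], [1.0, 0.9, 0.0], [0.9, 1.0, 0.1]]
som = train_som(data, rows=2, cols=2)
for profile in data:
    print(profile, "->", best_node(som, profile))
```

Because each update also drags the winner's grid neighbors toward the input, nodes that sit next to each other on the grid tend to end up with similar reference vectors, which is the property the slide describes.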
Self-organizing maps (SOM)
SOM Clusters
Things learned from microarray gene expression experiments • Pathways not known to be involved – Ontology? • Novel genes involved in a known pathway • “like” and “unlike” tissues
Transcription Factors Regulatory Networks • Identify co-regulated genes • Search for common motifs (transcription factor binding sites) – Evaluate known motifs/factors – Search for new ones. • Programs: MEME, etc.
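As a much-simplified illustration of searching for common motifs, the sketch below lists exact k-mers present in the promoter of every gene in a co-regulated cluster. This is not how MEME works (MEME fits a probabilistic motif model and tolerates mismatches); the gene names and sequences are invented, and the reverse strand is ignored.

```python
def shared_kmers(promoters, k=6):
    """Return the set of k-mers that occur in every promoter sequence.

    promoters: dict mapping gene name to promoter sequence (forward strand only).
    """
    kmer_sets = []
    for seq in promoters.values():
        kmer_sets.append({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return set.intersection(*kmer_sets)

# Hypothetical promoter fragments from three co-regulated genes.
promoters = {
    "gene1": "AATGACTCATT",
    "gene2": "CCTGACTCAGG",
    "gene3": "GTGACTCAAC",
}
print(shared_kmers(promoters, k=7))
```

A k-mer found in all promoters of a cluster but rarely in other promoters is a candidate transcription factor binding site; assessing that enrichment against a background set is the step this sketch leaves out.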
mRNA-protein Correlation • YPD: should have relevant data – will yeast be typical? • Electrophoresis 18: 533 – 23 proteins on 2D gels – r = 0.48 for mRNA vs. protein • Post-transcriptional and post-translational regulation important!
Other microarray formats • Single nucleotide polymorphism (SNP) chips – Oligos with each of 4 nt at each SNP. • Chromatin IP chips (ChIP:chip) – Determine transcription factor binding sites – Promoter DNA on the chip. • Alternative splicing chips – Long oligos, covering alternatively spliced exons, or all exons. • Genome tiling chips
ChIP:chip: Identification of Transcription Factor Binding Sites • Cross-link transcription factors to DNA with formaldehyde • Pull out the transcription factor of interest via immunoprecipitation with an antibody, or by tagging the factor of interest with an isolatable epitope (e.g. GST fusion). • Fractionate the DNA associated with the transcription factor, reverse the cross-links, label, and hybridize to an array of promoter DNA. • Brown et al. (2001) Nature 409: 533-8
ChIP:chip Analysis of TF Binding Sites
On to Proteomics • DNA → RNA → Protein