
Analysis of microarray data

Introduction • Microarrays are chips which measure whether genes are switched on or off in cells. • They can be used to detect sets of genes responsible for genetic diseases such as cancer. • This lecture: – introduce microarray technology – discuss a few applications – introduce statistical and computational techniques for analysing microarray data

Gene expression • All cells in an organism have the same genomic DNA. • Distinct cellular identities are due to differences in gene expression (= transcription & translation of a gene). • Whether a gene is transcribed is often determined by the presence/absence of other genes' products (esp. proteins) … • … so genes interact in complex networks: gene A switches on B, which turns off C, which upregulates (increases) A, … • Hence perturbations to a single gene can lead to changes in the expression of many genes.

Functional genomics • Next step after sequencing of the human genome: understand the connection between DNA sequence & phenotypic (actual) characteristics of the organism. • This is complex, because proteins and genes act in highly connected networks and signalling pathways in an orchestrated manner. • Traditionally molecular biology has worked on a “one gene, one function” basis & experiments tend to study the effects of a single gene or a few genes at a time, but…

Microarray chips • …microarrays can measure many genes at once. • Microarray chips are commonly glass slides with a matrix of spots printed on to them (using e.g. dot matrix technology). • A spot contains millions of identical molecules of DNA or oligonucleotide (the probes), which will bind a specific DNA sequence, such as the cDNA of a gene. • The glass slides can contain thousands of spots, each recognising a different sequence, e.g. one spot for every gene in the human genome.

Microarray experiments • Since almost all mRNA is translated into protein, the total mRNA of a cell approximately reflects the genes being expressed. • Mash up cells and extract mRNA. • Reverse transcribe the RNA into cDNA (which can be heated to make it single-stranded). • Label cDNA from reference cells green (Cy3) and cDNA from target cells red (Cy5). • Hybridise both samples, reference and target, to a single microarray chip: wash on equal amounts of target & reference sample and allow them to bind to probes which have complementary bases.

Results of microarray experiments • The spot for gene 1 is: – red if there is more mRNA of gene 1 in target cells – green if there is more mRNA of gene 1 in reference cells – yellow if it is the same in both • Actually, images of red & green fluorescence are taken separately using a laser & scanner & their intensities are measured using image software. • Data are often expressed as a matrix of relative expression levels, $M_{ij} = R_{ij} / G_{ij}$ (red target intensity over green reference intensity), indexed by genes $i$ and target samples $j$.
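The colour rule on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `spot_colour`, the 10% tolerance band used to decide "same in both", and the intensity values are all assumptions, not part of the lecture.

```python
def spot_colour(red, green, tolerance=0.1):
    """Classify one spot from its red (target) and green (reference) intensity.

    `tolerance` is a hypothetical band around a ratio of 1 within which
    the two samples are treated as equally expressed (yellow).
    """
    ratio = red / green
    if ratio > 1 + tolerance:
        return "red"      # more mRNA in target cells
    if ratio < 1 / (1 + tolerance):
        return "green"    # more mRNA in reference cells
    return "yellow"       # roughly the same in both


# Made-up intensities: rows are genes, columns are target samples.
red = [[800.0, 410.0], [150.0, 300.0]]
green = [[400.0, 400.0], [600.0, 300.0]]

# Matrix of relative expression levels, red / green, per gene and sample.
ratios = [[r / g for r, g in zip(r_row, g_row)]
          for r_row, g_row in zip(red, green)]
```

Here `ratios[0][0]` is 2.0 (gene 1 is twice as abundant in the first target sample as in the reference) and `ratios[1][0]` is 0.25.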

Microarray data • Red (Cy5) and green (Cy3) images are overlaid; each spot corresponds to a gene.

Microarray data • Reason for using relative intensities: the process of printing spots on to chips does not give a reliable, fixed number of molecules, so the intensity measurements (which correspond to the amount of bound sample cDNA) represent not only the level of expression of the gene, but also the peculiarities of the chip. • There are some disadvantages to not having the absolute gene expression values, e.g. confidence limits on the microarray measurement depend heavily on the actual values.

Principal uses of chips • Genome-scale expression analysis – Differentiation – Response to environmental factors – Disease states – Effect of drugs • Detection of sequence variation

Applications of microarrays - yeast • The fact that we can only reliably measure relative gene expression means that microarrays tend to be used for comparative experiments: • E.g. “what changes in gene expression arise when yeast is in anaerobic v. aerobic conditions?” - DeRisi et al., Science v. 278, pp. 680-686 • Spot arrays with complementary DNAs to all genes from the yeast genome (the probes). • Approx. 6400 probes

Applications of microarrays - yeast • Reverse transcribe mRNA from yeast cells harvested at various time points as conditions are varied from anaerobic to aerobic (start fermentation in sugary solution and allow yeast to deplete sugar). • 7 time points (2 hr intervals, the first 9 hrs after the cells were placed in sugary medium) • Let the sample from the first time point be the “reference” (totally anaerobic, lots of sugar). • Label reference cDNA with green dye (Cy3) and other sample cDNA (later time points) with red dye (Cy5).

Applications of microarrays - yeast • Hybridize a mixture of equal quantities of the reference sample and one of the later-time samples to a microarray chip (also do timepoint 1 against itself as a control test): one experiment/chip per timepoint. • Take images of red and green fluorescence, measure intensities, process (details of this later in lecture) and create a matrix, $M$, with entries $M_{ij}$, the ratio of red to green intensity at the spot representing gene $i$ on the chip containing sample $j$ ($j$th timepoint).

Applications of microarrays - yeast • Look for genes that are differentially expressed in aerobic and anaerobic conditions. • Find that when the sample at the initial timepoint is compared to itself, there is 99% correlation between intensity values. • Timepoint 1 v. timepoint 2: 95% of genes have < 1.5-fold difference in expression; correlation of 98% between the data at the 2 timepoints. • Timepoint 1 v. timepoint 7: c. 1700 genes out of 6400 had > 2-fold difference in expression; some genes had a much higher ratio. • Authors could infer properties of signalling pathways involved in the shift in metabolism.
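Counting genes with more than a given fold difference, as in the timepoint comparisons above, can be sketched as follows. The function name `count_changed` and the ratio values are illustrative assumptions; they are not data from the yeast study. A "2-fold difference" is taken here to mean a ratio above 2 or below ½.

```python
def count_changed(ratios, fold=2.0):
    """Count genes whose target/reference ratio differs from 1 by more
    than `fold` in either direction (up- or downregulated)."""
    return sum(1 for r in ratios if r > fold or r < 1.0 / fold)


# Hypothetical ratios for six genes at a late timepoint v. the reference.
ratios_late = [1.1, 0.4, 3.2, 0.9, 2.5, 1.0]

n_changed = count_changed(ratios_late)            # genes beyond 2-fold
fraction = n_changed / len(ratios_late)           # share of genes changed
```

For scale, the figures quoted on the slide (c. 1700 of 6400 genes beyond 2-fold at timepoint 7) correspond to roughly 27% of the probes.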

Applications of microarrays - cancer • Take a set of patients with a certain type of cancer and a set of control patients with no cancer; take cells from the tumour (or, in controls, from the region where the tumour is found in cancer patients). Extract mRNA, make cDNA and label one of the samples from a control patient green (the reference) and all other samples red. • Make or buy a chip with human genes: as many as possible, or those thought to be relevant for cancer. • Hybridise a mixture of the reference sample (green) and one of the other target samples to each chip.

Applications of microarrays - cancer • Process data and statistically analyse to find genes which have significantly higher/lower expression in cancer cells than in normal cells. These genes are likely to be important in causing cancer or in the effects of cancer. • Can also cluster data to discover different subclasses of cancer, e.g. Alizadeh et al. (2000) Nature, v. 403, pp. 503-511. • A cancer of the immune cells (lymphoma) is clinically diverse: 40% of patients respond well to therapy and have good survival. The authors used hierarchical clustering (see later) to discover two new subclasses of the cancer, classified based on gene expression profiles.

Applications of microarrays - cancer • Thinking of the relative gene expression values (in fact intensities) of each sample (patient) as a vector, the authors were able to cluster the data. • Microarray profiling of tumours can be used to classify already known tumour types into subclasses (with, e.g., implications for survival).

Different kinds of microarray • cDNA versus oligonucleotide • We have so far discussed gene expression microarrays, but there are also: – Sequencing chips: contain as probes all possible sequences of a given length k (typically k = 8-10 bases). Mark the target sample with fluorescent dye and hybridise. The spots with fluorescence are where the target bound. The corresponding sequence is part of the target spectrum (= set of k-base sequences in the target). Then use computers to assemble the whole sequence. The target cannot be too long (e.g. 150-200 bp if k = 8). – Can be used for looking for gene mutations/polymorphisms.
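The "target spectrum" on this slide is simply the set of all length-k substrings of the target sequence. A minimal sketch, assuming an idealised chip where every k-mer in the target lights up exactly its matching spot; the function names and the example sequence are made up for illustration.

```python
def spectrum(target, k):
    """Return the set of all k-base subsequences (k-mers) in `target`."""
    return {target[i:i + k] for i in range(len(target) - k + 1)}


def probe_count(k):
    """Number of spots needed to hold every possible sequence of length k
    over the four bases A, C, G, T."""
    return 4 ** k


# Toy target with a repeated motif: repeated k-mers appear only once
# in the spectrum, which is one reason long targets cannot be reassembled.
toy_spectrum = spectrum("ATGCATGC", 3)
```

With k = 8 the chip needs `probe_count(8)` = 65,536 spots, while a target of 200 bases contributes at most 193 distinct 8-mers to its spectrum.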

Analysis of microarray data • Data is a matrix, $M_{ij}$, of (absolute or, usually) relative expression values of gene $i$ in condition $j$. Often presented as log2 values, since this means that downregulation of a gene (e.g. ratio ½) is not squashed into the interval (0, 1), but takes values (e.g. −1) in (−∞, 0). • Pre-processing: There are several sources of variation in intensity in microarray experiments other than differences in gene expression between samples. These are thought of as noise and we want to remove them by pre-processing. First subtract the background intensity, which is due to binding to the wrong spot, etc. (this is usually done by the image processing software).
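The effect of the log2 transform can be seen on a handful of ratios. The values below are made up for illustration; the point is only that up- and downregulation by the same factor become symmetric about zero.

```python
import math

# Hypothetical expression ratios: 4-fold up, 2-fold up, unchanged,
# 2-fold down, 4-fold down.
ratios = [4.0, 2.0, 1.0, 0.5, 0.25]

# After the transform, a k-fold increase and a k-fold decrease have the
# same magnitude and opposite sign.
log_ratios = [math.log2(r) for r in ratios]
```

On the raw scale all downregulated genes sit between 0 and 1, whereas upregulated genes range from 1 upwards; on the log2 scale the two directions are treated equally.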

Analysis of microarray data • Normalization: Another source of noise is differences in labelling and detection efficiencies for the fluorescent labels and in the amount of RNA between the 2 samples (red/green). Normalization tries to get rid of this by dividing all the ratios by an appropriate constant to make the mean or the median of the ratios = 1 (mean/median centring respectively). If the data is in log form, simply subtract a constant. • The assumption is that, on average across all (or a chosen subset of) genes, the levels of mRNA produced will be the same in the two samples. • Alternatively, use a scatter plot of green v. red intensity & normalize to make the slope = 1.
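Median centring as described above can be sketched with the standard library. The function names and the example values are illustrative assumptions; a real analysis might instead centre on a chosen subset of genes, as the slide notes.

```python
from statistics import median


def median_centre(ratios):
    """Divide every ratio on a chip by the median ratio, so that the
    median of the result is 1."""
    c = median(ratios)
    return [r / c for r in ratios]


def median_centre_log(log_ratios):
    """Same idea for log-transformed data: subtract the median log ratio,
    so that the median of the result is 0."""
    c = median(log_ratios)
    return [x - c for x in log_ratios]


# Hypothetical chip where the red channel is systematically 4x too bright.
raw = [1.0, 2.0, 4.0, 8.0, 16.0]
centred = median_centre(raw)

raw_log = [0.0, 1.0, 2.0, 3.0, 4.0]   # log2 of the same ratios
centred_log = median_centre_log(raw_log)
```

Mean centring would follow the same pattern with `statistics.mean` in place of `median`; the median is less sensitive to a few strongly changed genes.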

Analysis of microarray data • Normalized data: $M_{ij} = \frac{1}{c}\,\frac{R_{ij} - R^{b}_{ij}}{G_{ij} - G^{b}_{ij}}$, where $R$ & $G$ are the red & green intensities, $R^{b}$ & $G^{b}$ the respective backgrounds and $c$ is the normalization constant. • Filtering: This is the process of working out which genes are differentially expressed across the different conditions (e.g. timepoints of the yeast experiment or cancer v. non-cancer) and removing from the dataset those genes which don’t vary. We will discuss this in detail later.
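A rough sketch of both steps on this slide follows. The first function computes a background-corrected, normalized ratio for one spot; the second is a deliberately crude fold-change filter, used here only as a stand-in for the statistical filtering methods the lecture defers to a later session. All names, the threshold of 1 on the log2 scale (i.e. 2-fold) and the data are assumptions.

```python
def normalised_ratio(red, green, red_bg, green_bg, c):
    """Background-subtracted red/green ratio for one spot, divided by
    the normalization constant c."""
    return (red - red_bg) / (green - green_bg) / c


def filter_genes(log_ratios, min_abs_log2=1.0):
    """Keep genes whose log2 ratio reaches `min_abs_log2` in magnitude in
    at least one condition; drop genes that never vary that much."""
    return {gene: values for gene, values in log_ratios.items()
            if max(abs(v) for v in values) >= min_abs_log2}


# Hypothetical log2 ratios for three genes across three conditions.
log_ratios = {
    "geneA": [0.1, -0.2, 0.3],    # never changes much: removed
    "geneB": [0.0, 1.5, 2.2],     # upregulated: kept
    "geneC": [-0.4, -1.3, -0.1],  # downregulated once: kept
}
kept = filter_genes(log_ratios)
```

A fixed threshold ignores measurement noise, which is why a proper statistical test is preferable when replicate measurements are available.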

Analysis of microarray data • Clustering: If you view the expression values of a single gene across different samples (a row of the expression matrix) as a vector, then the genes can be clustered based on the similarity of the vectors. Likewise, using the columns of the matrix, the samples can be clustered. This helps, e.g., to classify cancers or to find genes which are in the same network as each other or have similar functions.
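Clustering genes by the similarity of their expression vectors can be sketched with a small agglomerative (hierarchical) procedure. This is a hand-rolled illustration, not the method of any cited study: the choice of 1 − Pearson correlation as the distance, average linkage, the function names and the gene profiles are all assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def distance(x, y):
    """0 for perfectly correlated profiles, 2 for anti-correlated ones."""
    return 1.0 - pearson(x, y)


def cluster(profiles, n_clusters):
    """Repeatedly merge the two clusters with the smallest average
    pairwise distance until `n_clusters` remain."""
    clusters = [[name] for name in profiles]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(distance(profiles[a], profiles[b])
                            for a in clusters[i] for b in clusters[j])
                d = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical rows of the expression matrix (one vector per gene).
profiles = {
    "gene1": [1.0, 2.0, 3.0, 4.0],
    "gene2": [0.5, 1.1, 1.4, 2.1],
    "gene3": [4.0, 3.0, 2.0, 1.0],
    "gene4": [2.0, 1.6, 0.9, 0.4],
}
groups = cluster(profiles, 2)
```

The two rising profiles end up in one group and the two falling profiles in the other. Passing the columns of the matrix instead of the rows would cluster samples rather than genes.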

Conclusions • We have described microarray chips for analysing gene expression. • We have mentioned three key areas of analysis – Normalization – Filtering – Clustering • In the next session we will cover statistical methods necessary for filtering microarray data.