DNA microarray and array data analysis
Some of the slides are adapted from the lecture notes of Dr. Patrick Leahy of the Gene Expression Array Core Facility at CWRU.

What is a DNA Microarray?
• A DNA microarray is a new technology for measuring the levels of the mRNA gene products of a living cell.
• A microarray chip is a rectangular chip on which a grid of DNA spots is laid out. These spots form a two-dimensional array.
• Each spot in the array contains millions of copies of one DNA strand, bonded to the chip.
• Chips are made tiny so that only a small amount of RNA is needed from the experimental cells.

DNA Microarray
• Many applications in both basic and clinical research
    • determining the role a gene plays in a pathway, disease, diagnostics, pharmacology, …
• There are three main platforms for performing microarray analyses:
    • cDNA arrays (generic, multiple manufacturers)
    • Oligonucleotide arrays (GeneChips) (Affymetrix)
    • cDNA membranes (radioactive detection)

cDNA Microarray
• Spot cloned cDNAs onto a glass/nylon microscope slide
    • usually PCR-amplified segments of plasmids
• Complementary hybridization
    • CTAGCAGG -- actual gene
    • GATCGTCC -- cDNA (reverse transcriptase)
    • CUAGCAGG -- mRNA
• Label 2 mRNA samples with 2 different colors of fluorescent dye (control vs. experimental)
• Mix the two labeled mRNAs and hybridize to the chip
• Make two scans, one for each color
• Combine the images to calculate the ratios of the amounts of each mRNA that bind to each spot
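
The complementarity example and the ratio step on this slide can be illustrated with a small Python sketch. The function names are hypothetical, and expressing the per-spot ratio on a log2 scale is an assumption (the slide only says "ratios"); the sequences are the ones shown on the slide, written left to right as they appear there.

```python
import math

# Base-pairing table for turning an mRNA base into its cDNA partner.
_RNA_TO_CDNA = {"A": "T", "U": "A", "C": "G", "G": "C"}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA for a gene's coding strand (T is replaced by U)."""
    return coding_strand.replace("T", "U")


def reverse_transcribe(mrna: str) -> str:
    """Return the base-by-base complementary cDNA, in the slide's left-to-right order."""
    return "".join(_RNA_TO_CDNA[base] for base in mrna)


def log2_ratio(test_intensity: float, ctrl_intensity: float) -> float:
    """Per-spot ratio of the two dye channels on a log2 scale (assumed convention)."""
    return math.log2(test_intensity / ctrl_intensity)


mrna = transcribe("CTAGCAGG")    # actual gene -> mRNA
cdna = reverse_transcribe(mrna)  # mRNA -> cDNA
```

With this convention a spot that is four times brighter in the test channel than in the control channel gets a value of 2, and an unchanged spot gets 0.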

Spotted Microarray Process
(figure: control (CTRL) and test (TEST) samples)

cDNA Array Experiment Movie
• http://www.bio.davidson.edu/courses/genomics/chip.html

“Long Oligos”
• Like cDNAs, but instead of using a cloned gene, design a 40-70 base probe to represent each gene
• Relies on genome sequence databases and bioinformatics
• Reduces cross-hybridization
• Cheaper and possibly more sensitive than the Affymetrix system

Affymetrix
• Uses 25-base oligos synthesized in place on a chip (20 pairs of oligos for each gene)
• cRNA is labeled and scanned in a single “color”
    • one sample per chip
• Can represent as many as 47,000 transcripts on a chip (HG-U133 Plus 2.0 Array)
• Arrays get smaller every year (more genes)
• Chips are expensive (about $400/chip)
• Proprietary system: “black box” software, can only use their chips

Affymetrix Genome Arrays

Affymetrix GeneChip® Probe Array

Affymetrix GeneChip® Probe Arrays
(figure labels: GeneChip Probe Array, 1.28 cm; Hybridized Probe Cell, 24~50 µm; oligonucleotide probe; single-stranded, fluorescently labeled cRNA target; Image of Hybridized Probe Array; BGT108_DukeUniv)
• Each probe cell or feature contains millions of copies of a specific oligonucleotide probe

Affymetrix GeneChip
Probe:
• 25-base-long, single-stranded DNA oligos
Probe Cell:
• Single square-shaped feature on an array containing one type of probe
• Contains millions of probe molecules
Probe Pair:
• Perfect Match/Mismatch
Probe Set

Array Design
• Twenty oligo probes are selected from the last 600 bases at the 3’ end of the gene
• For each probe selected, a partner containing a central mutation is also made (Perfect Match and Mismatch, each a 25-mer DNA oligo)
• For each gene, a total of 20 probe pairs are arrayed on the chip
(figure labels: gene, 5’ to 3’; Probe Set; Probe Pair, PM and MM; Probe Cell, 24 µm)
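
As a small illustration of the Perfect Match/Mismatch pairing described above, the following Python sketch derives a mismatch probe from a 25-mer by altering its central base. Swapping the middle base for its complement is an assumption made for this sketch (the slide only says "a central mutation"), and the probe sequence and function name are made up.

```python
# Watson-Crick complements for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def mismatch_probe(perfect_match: str) -> str:
    """Return a copy of the probe with its central base complemented.

    For a 25-mer the central base is position 13 (index 12).
    """
    middle = len(perfect_match) // 2
    return (
        perfect_match[:middle]
        + COMPLEMENT[perfect_match[middle]]
        + perfect_match[middle + 1:]
    )


# Hypothetical 25-base perfect-match probe, for illustration only.
pm = "ACGTACGTACGTACGTACGTACGTA"
mm = mismatch_probe(pm)
```

The two probes differ at exactly one position, so the mismatch is meant to capture non-specific hybridization to a sequence that is nearly identical to the target.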

Probe Sub-types on Chips
• Known genes
    • Specific transcripts
    • Exemplars
    • Consensus
• Housekeeping genes
• Expressed sequence tags (ESTs)
• Spiked control transcripts

cRNA Preparation
• Total RNA (5-8 µg), with poly(A) tails (AAAAA)
• cDNA strand 1 synthesis: SS II reverse transcriptase, primed by an oligo(dT) primer (TTTTT) carrying a T7 RNA pol. promoter
• cDNA strand 2 synthesis: E. coli DNA pol. I
• IVT cRNA synthesis (T7 RNA pol.) amplifies and labels transcripts with biotin
• Fragmented cRNA is now ready for hybridization to the test chip

cRNA Labeled Targets
(figure: biotin (B)-labeled cRNA targets hybridized to the cDNA probes, showing specific binding and non-specific binding; post-hybridization washes; staining with fluorescently labeled (FL) streptavidin (S), which binds the biotin)

Streptavidin
(figure labels: FL, B, S)

Microarray Experiment
• Cells
• Poly(A)+ RNA (AAAA)
• cDNA
• IVT with biotin-labeled UTP (B-UTP): biotin-labeled cRNA transcript
• Fragment (heat, Mg2+): biotin-labeled cRNA fragments
• Hybridize, wash, stain, scan (1-18 hours)

.dat file
• The chip image data file (or “.dat” file) is the first part of data acquisition and appears on the computer screen upon completion of the laser scan.
• Here, we zoom in to see an individual probe set that has been highlighted.
(figure label: Probe set)

.cel file
• The first image is “sample1.dat.” Note the pixel-to-pixel variation within a probe cell.
• A “*.cel” file is automatically generated when the “*.dat” image first appears on the screen. Note that this derivative file has homogeneous signal intensity within its probe cells.
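
To make the step from “.dat” pixels to one “.cel” value per probe cell concrete, here is a minimal Python sketch. The slides do not say how the single intensity is derived, so the choices here (dropping the outer ring of pixels, then taking the 75th percentile by nearest rank) are assumptions for illustration, and the function name and pixel values are hypothetical.

```python
import math


def summarize_probe_cell(pixels, percentile=75):
    """Collapse a 2-D grid of pixel intensities into one value for the probe cell.

    Assumed procedure: ignore the border pixels, then return the requested
    percentile of the remaining intensities (nearest-rank method).
    """
    # Drop the first and last row and the first and last column.
    inner = [row[1:-1] for row in pixels[1:-1]]
    values = sorted(v for row in inner for v in row)
    if not values:
        raise ValueError("probe cell is too small to trim its border")
    rank = max(1, math.ceil(percentile / 100 * len(values)))
    return values[rank - 1]


# Hypothetical 4x4 probe cell: a noisy interior surrounded by dim border pixels.
cell = [
    [0, 0, 0, 0],
    [0, 10, 20, 0],
    [0, 30, 40, 0],
    [0, 0, 0, 0],
]
intensity = summarize_probe_cell(cell)
```

Every pixel of the probe cell is then represented by this one number, which is why the derivative image looks uniform within each cell.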

Affymetrix Algorithms
1. Signal
1.1 Adjusting MMs to purge negative values
• All MMs < PMs: no adjustment necessary
• Few MMs > PMs: change those MMs based on a weighted mean of the other MMs
• Most MMs > PMs: change MMs to be slightly less than PM
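
The three cases above can be sketched in Python as follows. This is a simplified illustration, not the vendor's implementation: the typical PM/MM relationship is estimated with a plain mean of log2(PM/MM) rather than a weighted mean, and the constants `contrast_tau` and `scale_tau` and the function name are assumptions chosen for the example.

```python
import math


def adjust_mismatches(pm, mm, contrast_tau=0.03, scale_tau=10.0):
    """Return MM values adjusted so that every PM - MM difference is positive."""
    # Typical log2(PM/MM) across the probe set (simplified: unweighted mean).
    log_ratios = [math.log2(p / m) for p, m in zip(pm, mm)]
    typical = sum(log_ratios) / len(log_ratios)

    adjusted = []
    for p, m in zip(pm, mm):
        if m < p:
            # Case 1: MM already below PM, keep it.
            adjusted.append(m)
        elif typical > contrast_tau:
            # Case 2: only a few MMs exceed PM; borrow the typical ratio
            # seen in the other probe pairs.
            adjusted.append(p / 2 ** typical)
        else:
            # Case 3: most MMs exceed PM; set MM slightly below PM.
            shrink = contrast_tau / (1 + (contrast_tau - typical) / scale_tau)
            adjusted.append(p / 2 ** shrink)
    return adjusted


few_bad = adjust_mismatches([1000, 5000, 430], [900, 2000, 500])
mostly_bad = adjust_mismatches([100, 100], [200, 300])
```

In the first call only the third probe pair is changed; in the second, both MM values end up just under their PM values.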

Affymetrix Algorithms: Signal Calculation
Having adjusted the MM values, we now calculate the signal. The PM-MM values are calculated for each probe pair:

    PM       MM      PM-MM
    1000     900       100
    5000     2000     3000
    430      230       200
    765      25        740
    355      331        24
    3005     1200     1805
    413      203       210
    20333    6197    14136
    98       40         58
    590      230       360

Unweighted mean = 2063

The unweighted mean is vulnerable to outlier data. To protect against this, we dampen the effect of outliers by using the Tukey biweight mean. PM-MM values that lie a number of standard deviations away from the mean are given low weights in accordance with the weight graph (graph: weight factor versus standard deviations, 1 to 6). Individual PM-MM data are multiplied by the weight factor before calculation of the mean. The weighted mean is then called the “signal.”

Using Tukey’s biweight mean = 1780
Signal (expression level) = 1780
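
The contrast between the unweighted mean and an outlier-resistant mean can be sketched in Python using the PM-MM values from this slide. The one-step Tukey biweight below measures distance from the median in units of the median absolute deviation; the tuning constant `c = 5` and the small `epsilon` are assumptions. Because the slide shows its weight curve only as a graph, this sketch is not expected to reproduce the slide's value of 1780; it only shows the outlier being down-weighted.

```python
from statistics import median

# PM-MM differences from the slide's ten probe pairs.
differences = [100, 3000, 200, 740, 24, 1805, 210, 14136, 58, 360]


def tukey_biweight(values, c=5.0, epsilon=1e-4):
    """One-step Tukey biweight mean: distant values get small or zero weight."""
    center = median(values)
    spread = median([abs(v - center) for v in values])  # median absolute deviation
    weighted_sum = 0.0
    weight_total = 0.0
    for v in values:
        u = (v - center) / (c * spread + epsilon)
        weight = (1 - u * u) ** 2 if abs(u) < 1 else 0.0
        weighted_sum += weight * v
        weight_total += weight
    return weighted_sum / weight_total


unweighted_mean = sum(differences) / len(differences)
robust_mean = tukey_biweight(differences)
```

The unweighted mean matches the slide's 2063 after rounding, while the robust mean is pulled far less by the single very large difference (14136).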

.xls file

ALL_vs_AML_train_set_38_sorted.res

ALL_vs_AML_train_set_38_sorted.cls
38 2 1
0 0 0 … 0 1 1 … 1
(class labels for the 38 samples: 27 in class 0, 11 in class 1)
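
A class-label file laid out like the one above can be read with a few lines of Python. This is a sketch under assumptions: the header is taken to hold the sample count and class count as its first two numbers, any line starting with `#` is skipped as a comment, and the function name is hypothetical. The example text is rebuilt from the counts on the slide rather than copied from the real file.

```python
def parse_cls(text):
    """Parse class-label text and return the list of integer labels."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    n_samples, n_classes = (int(token) for token in lines[0].split()[:2])

    # Remaining non-comment lines hold the whitespace-separated labels.
    label_lines = [line for line in lines[1:] if not line.startswith("#")]
    labels = [int(token) for line in label_lines for token in line.split()]

    if len(labels) != n_samples:
        raise ValueError(f"expected {n_samples} labels, found {len(labels)}")
    if len(set(labels)) != n_classes:
        raise ValueError(f"expected {n_classes} classes, found {len(set(labels))}")
    return labels


# Example text built from the slide's counts: 27 samples of class 0, then 11 of class 1.
example = "38 2 1\n" + " ".join(["0"] * 27 + ["1"] * 11) + "\n"
labels = parse_cls(example)
```

Checking the label count against the header catches truncated files before any downstream analysis uses the labels.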