Data Management and Analysis issues in Microarray Data

Aditya D Phatak
Persistent Systems Pvt. Ltd.
http://www.persistent.co.in

Roadmap
  • Microarray technology basics
  • Gene expression data analysis
  • Microarray data management
  • GeneChip Analysis Core at Washington University
  • Function Express (at Wash. U)

Microarray Technology Basics

Elementary Concepts
Cell -> Chromosome -> DNA -> mRNA -> Proteins -> Function
  • Nearly every cell of the body contains a full set of chromosomes and the same genes
  • Only a fraction of these genes are “switched on” or “expressed”
  • Gene expression is a highly complex and regulated process

Life Scientists Want to…
  • Identify genes that are involved in various diseases
  • Reveal new patterns of coordinated gene expression
  • Find differentially expressed genes (“targets”)
  • Find co-regulated genes
  • Find genes responsible for “biological pathways”
  • Uncover new categories of genes

DNA Microarrays
  • Microarrays allow biologists to analyze the expression of hundreds of genes within a cell in a single experiment, quickly and efficiently
  • Microarrays can be used to measure gene expression within a single sample or to compare gene expression between two different tissue samples, for example healthy and diseased tissue

DNA Microarrays: Technical Foundation
  • A set of unique probes (usually short, single-stranded DNA sequences) is immobilized as single spots on a solid surface (chemically modified glass chips)
  • mRNA is extracted from cell or tissue samples
  • A cDNA target is generated from the mRNA sample and labeled with a fluorescent dye (e.g. Cy5 and Cy3) or a radioactive label
  • The target is incubated with the array, and each probe binds its complementary target molecule if present

An example

A DNA Microarray Experiment
  • Prepare a DNA chip using chosen target DNAs
  • Generate a hybridization solution containing a mixture of fluorescently labeled cDNAs
  • Hybridize the mixture with the DNA chip
  • Detect cDNA intensity using laser technology and store the data in a computer
  • Analyze the data using computational methods

Types of Microarrays
  • Two kinds of samples are co-hybridized on the array (e.g. cDNA arrays)
  • Only one sample is hybridized and comparisons are made between arrays (e.g. Affymetrix oligonucleotide arrays)
  • Need to deal with different data formats

Gene Expression Data Analysis

Issues With Output Data
  • Data quality
  • Distinguish false positives from true positives
  • Replicate chips
  • Use independent methods to validate results
  • Dye effects
  • Position effects
  • Replication is essential

Preprocessing Tasks
  • Adjusting data
      • Filter out genes that are not expressed in any experiment
      • Log-transform data: replace all data values X by log2(X)
  • Data normalization
      • Intensities are scaled/normalized to a selected chip so that multiple chips can be compared
      • Uses data from a set of controls that have been “spiked” into the DNA and that have an average expression ratio of 1
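
The preprocessing steps on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used in the talk: the gene names, the intensity values and the expression cutoff of 50 are made up, and median scaling to a reference chip stands in for the spiked-control normalization mentioned on the slide.

```python
import math
from statistics import median

# Hypothetical raw intensities: gene -> one value per chip (illustrative numbers).
raw = {
    "geneA": [100.0, 220.0, 400.0],
    "geneB": [10.0, 12.0, 8.0],       # never above the cutoff, so it is filtered out
    "geneC": [800.0, 1500.0, 3200.0],
}

EXPRESSED_THRESHOLD = 50.0  # assumed cutoff for calling a gene "expressed"
REFERENCE_CHIP = 0          # assumed: scale every chip to the first one


def filter_unexpressed(data, threshold):
    """Keep genes that exceed the threshold on at least one chip."""
    return {g: v for g, v in data.items() if max(v) > threshold}


def scale_to_reference(data, ref):
    """Scale each chip so that its median intensity matches the reference chip."""
    n_chips = len(next(iter(data.values())))
    medians = [median(v[i] for v in data.values()) for i in range(n_chips)]
    factors = [medians[ref] / m for m in medians]
    return {g: [x * f for x, f in zip(v, factors)] for g, v in data.items()}


def log2_transform(data):
    """Replace every value X by log2(X)."""
    return {g: [math.log2(x) for x in v] for g, v in data.items()}


filtered = filter_unexpressed(raw, EXPRESSED_THRESHOLD)
normalized = scale_to_reference(filtered, REFERENCE_CHIP)
logged = log2_transform(normalized)
```

Filtering is done first so that unexpressed genes do not influence the per-chip scaling factors; the log transform comes last because scaling is multiplicative on the raw intensities.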

Analysis Issues
  • Identify genes that are involved in various diseases
  • Reveal new patterns of coordinated gene expression
  • Find differentially expressed genes (“targets”), e.g. find genes that are overexpressed in 6 out of 7 tumor samples versus 8 out of 10 normal samples by five-fold or more
  • Find co-regulated genes
  • Find genes responsible for “biological pathways”
  • Uncover new categories of genes
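
The "6 out of 7 tumor versus 8 out of 10 normal, five-fold or more" example can be read in more than one way. The sketch below assumes one reading: a gene is a target if at least 6 tumor samples are each at least five times higher than at least 8 of the normal samples. The function names, gene names and intensity values are hypothetical.

```python
def count_overexpressed(tumor, normal, fold=5.0, min_normal=8):
    """Count tumor samples that are at least `fold` times higher than
    at least `min_normal` of the normal samples."""
    count = 0
    for t in tumor:
        exceeded = sum(1 for n in normal if t >= fold * n)
        if exceeded >= min_normal:
            count += 1
    return count


def is_target(tumor, normal, fold=5.0, min_tumor=6, min_normal=8):
    """True if enough tumor samples pass the fold-change criterion."""
    return count_overexpressed(tumor, normal, fold, min_normal) >= min_tumor


# Hypothetical intensities: 7 tumor samples per gene and 10 normal samples.
# The same normal values are reused for both genes to keep the example short.
normal = [80, 90, 85, 100, 95, 70, 88, 92, 300, 310]
genes = {
    "geneX": [500, 520, 610, 505, 700, 90, 530],
    "geneY": [120, 130, 110, 140, 150, 100, 125],
}

targets = [g for g, tumor in genes.items() if is_target(tumor, normal)]
```

Allowing one tumor sample and two normal samples to fall outside the criterion makes the filter tolerant of a few outlier arrays.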

Data Mining: Extracting Meaningful Patterns
  • Supervised methods: you have a priori knowledge of the biological system and are looking for specific patterns, e.g. neighbourhood analysis, supervised tree harvesting
  • Unsupervised methods: identify patterns that you would not necessarily have been aware of beforehand, e.g. hierarchical clustering, K-means clustering, self-organizing maps (SOM), principal component analysis (PCA)
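
As an illustration of an unsupervised method, here is a minimal K-means sketch in plain Python. The expression profiles and the choice of starting centroids are assumptions made for the example; real analyses would normally use a tested library and several random restarts.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, centroids, iterations=10):
    """Plain K-means: assign points to the nearest centroid, then recompute centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            [sum(col) / len(members) for col in zip(*members)] if members else old
            for members, old in zip(clusters, centroids)
        ]
    labels = [nearest(p, centroids) for p in points]
    return labels, centroids


# Hypothetical log-expression profiles over three conditions.
profiles = [
    [1.0, 2.0, 3.0],   # rising
    [1.1, 2.1, 2.9],   # rising
    [3.0, 2.0, 1.0],   # falling
    [2.9, 2.2, 1.0],   # falling
]

# Assumed initialization: use the first and third profiles as starting centroids.
labels, centroids = kmeans(profiles, [profiles[0], profiles[2]])
```

The two rising profiles end up in one cluster and the two falling profiles in the other, without any prior labelling of the genes.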

Example of Hierarchical Clustering
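
The figure from this slide is not available, so the following is a small stand-in sketch of agglomerative hierarchical clustering. It assumes average linkage and a distance of 1 minus the Pearson correlation, a common choice for expression profiles but not necessarily the one used in the talk; the gene names and values are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_linkage(c1, c2, profiles):
    """Mean correlation distance over all gene pairs between two clusters."""
    pairs = [(a, b) for a in c1 for b in c2]
    return sum(1.0 - pearson(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)


def hierarchical_clustering(profiles):
    """Repeatedly merge the two closest clusters; return the merges in order."""
    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], profiles),
        )
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
        merges.append((a, b))
    return merges


# Hypothetical expression profiles over four conditions.
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.5],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [8.0, 6.0, 4.5, 2.0],
}

merges = hierarchical_clustering(profiles)
```

The order of the merges is what a dendrogram draws: the most similar profiles join first, and the two opposing groups join last.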

Statistical Analysis
  • Ad hoc approaches (e.g. “fold change”) do not consider the variability of measurements
  • Statistical analysis gives a more “sensitive” and “selective” result
  • Provides an estimate of confidence in the observed gene expression pattern
  • Rank the genes by statistical scores
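
To see why fold change alone can mislead, the sketch below ranks two hypothetical genes by Welch's t-statistic, used here as one example of a statistical score (the slides do not name a specific test). Both genes have the same mean difference on the log2 scale, but only one is measured consistently. The gene names and values are invented, and no p-value is computed.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two groups with possibly unequal variances."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


# Hypothetical log2 expression values: (tumor replicates, normal replicates).
genes = {
    "gene_stable": ([8.0, 8.1, 7.9, 8.0], [6.0, 6.1, 5.9, 6.0]),
    "gene_noisy": ([10.0, 6.0, 11.0, 5.0], [8.0, 4.0, 7.5, 4.5]),
}

# Mean log2 difference (the log of the fold change) is the same for both genes.
log_fold = {g: mean(t) - mean(n) for g, (t, n) in genes.items()}

# Rank by the absolute value of the statistical score, largest first.
scores = {g: welch_t(t, n) for g, (t, n) in genes.items()}
ranked = sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)
```

A fold-change cutoff would treat both genes identically, while the t-statistic places the consistently measured gene well ahead of the noisy one.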

Microarray Data Management

Sharing Gene Expression Data
  • Goals
      • Facilitate comparisons between experiments
      • Improve analysis
      • Increase confidence in results
  • Conduct multivariate analysis of data generated by multiple researchers
  • Don’t penalize those who share

Tracking All Aspects of Microarray Experiments
  • An array experiment has many steps
      • RNA preparation
      • Array fabrication, array platform
      • Scanner setting
      • Image analysis
  • Use of an integrated laboratory information management system (LIMS)
  • Common protocols and language for data sharing
      • MIAME: Minimum Information About a Microarray Experiment (from MGED)
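
One way a LIMS might record the steps listed on this slide is as a structured record with a completeness check. The class and field names below are hypothetical and simply mirror the slide; they are not the MIAME checklist or any real LIMS schema, and the example values are invented.

```python
from dataclasses import dataclass, asdict


@dataclass
class ArrayExperimentRecord:
    """Hypothetical record covering the experiment steps named on the slide."""
    experiment_name: str
    rna_preparation: str
    array_fabrication: str
    array_platform: str
    scanner_setting: str
    image_analysis: str

    def missing_fields(self):
        """Names of the fields that were left blank."""
        return [name for name, value in asdict(self).items() if not value.strip()]


record = ArrayExperimentRecord(
    experiment_name="tumor_vs_normal_01",
    rna_preparation="total RNA, protocol v2",
    array_fabrication="spotted glass slide",
    array_platform="cDNA two-color",
    scanner_setting="",                      # left blank on purpose
    image_analysis="spot-finding software, default settings",
)

incomplete = record.missing_fields()
```

Rejecting or flagging records with blank fields at submission time is what makes later comparison between experiments possible.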

Sharing Paradigms
  • What to share
      • Raw images (TIFF)
      • Extracted raw spot intensity values with background measurements
      • Processed data such as average intensity values
      • List of genes that show clear differential expression

Protection of Intellectual Property
  • Most array experiments identify dozens of genes of interest, only a few of which can be studied by one lab
  • Some results might provide substantial intellectual property rights to pharma companies
  • Which data should be shared, and when?

GeneChip Analysis Core at Washington University

Architecture of GeneChip Core
[Diagram labels: DNA samples (control, experiment); hybridize probe to array; wash; scan; image and data; format and upload; gene expression database; web/application server; web-based data analysis tools; gene annotation databases (UniGene, LocusLink, GO)]

Function Express

Why Function Express?
  • Existing analysis software provides clustering algorithms
  • This software lacks gene annotation
      • It is not possible to visualize genes based on functional classification, chromosomal localization or tissue expression
      • E.g. “Give me genes that are transcription factors, are expressed in pancreas and are located on chromosome 1p31”
  • Integration of gene annotation with clustering techniques is vital to understanding the underlying biological process
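
The example query on this slide combines three annotation criteria. A minimal sketch of such a filter is shown below; the record layout, field names and gene names are hypothetical and do not reflect the actual Function Express schema.

```python
# Hypothetical annotation records; not real gene annotations.
annotations = [
    {"gene": "GENE_1", "function": "transcription factor",
     "tissues": {"pancreas", "liver"}, "cytoband": "1p31.1"},
    {"gene": "GENE_2", "function": "transcription factor",
     "tissues": {"brain"}, "cytoband": "1p31.3"},
    {"gene": "GENE_3", "function": "kinase",
     "tissues": {"pancreas"}, "cytoband": "1p31.1"},
    {"gene": "GENE_4", "function": "transcription factor",
     "tissues": {"pancreas"}, "cytoband": "7q22.1"},
]


def find_genes(records, function, tissue, cytoband_prefix):
    """Genes matching a functional class, a tissue of expression and a chromosomal region."""
    return [
        r["gene"]
        for r in records
        if r["function"] == function
        and tissue in r["tissues"]
        and r["cytoband"].startswith(cytoband_prefix)
    ]


hits = find_genes(annotations, "transcription factor", "pancreas", "1p31")
```

Matching on a cytoband prefix lets a query for 1p31 also return genes annotated to its sub-bands.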

Features of Function Express
  • Annotates genes on chips/experiments automatically
  • Annotation is updated periodically
  • Allows examination of gene expression across different experiments conducted on different arrays and on different species

Gene Annotation in Function Express
  • Provides annotation from the UniGene, LocusLink and GO databases
  • These databases are updated frequently
  • Uses the HomoloGene database to get cross-species annotation

Cross-Species Investigation
Seeing how genes that show differential expression in an experiment on one organism (say, mouse) correlate with genes from another experiment done in another organism (say, human)
  • Find out more about interesting genes
  • Validation
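
A cross-species comparison of this kind boils down to mapping genes through an ortholog table and intersecting the two result lists. The sketch below assumes a simple one-to-one mapping held in a dictionary; the gene identifiers are placeholders, and the table only stands in for what a resource such as HomoloGene would supply.

```python
# Hypothetical one-to-one ortholog table: mouse gene -> human gene.
orthologs = {
    "mGeneA": "hGeneA",
    "mGeneB": "hGeneB",
    "mGeneC": "hGeneC",
}

# Hypothetical lists of differentially expressed genes from two experiments.
mouse_hits = {"mGeneA", "mGeneB", "mGeneD"}   # mGeneD has no known ortholog
human_hits = {"hGeneA", "hGeneC"}


def shared_hits(mouse_hits, human_hits, orthologs):
    """Mouse/human ortholog pairs that are differentially expressed in both experiments."""
    return sorted(
        (m, orthologs[m])
        for m in mouse_hits
        if orthologs.get(m) in human_hits
    )


confirmed = shared_hits(mouse_hits, human_hits, orthologs)
```

Genes that appear in both lists are candidates for validation, since the same change was seen independently in two organisms.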

Microarray Data Schema

Gene Annotation Schema

View maintenance using MQO
[Diagram: queries Q1, Q2 and Q3 over experiment data and annotation data]
  • Experiment data: append only
  • Annotation data: updated frequently from the gene annotation databases (UniGene, LocusLink, GO)
  • Update deltas may or may not be available

Screenshots of Function Express

  • To create an experiment, the user enters an experiment name and the chips included in the analysis, along with an abscissa value and x-axis label for each chip.

  • A comparison of raw (left panel) versus mean-standard deviation centered (right panel) data demonstrates that transformations reveal similar patterns of gene regulation.
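
Mean-standard deviation centering is assumed here to mean subtracting each gene's mean and dividing by its standard deviation (a z-score per gene). The short sketch below, with invented values, shows how two genes with very different raw magnitudes turn out to share the same pattern after the transformation.

```python
from statistics import mean, stdev


def center(profile):
    """Subtract the gene's mean and divide by its sample standard deviation."""
    m, s = mean(profile), stdev(profile)
    return [(x - m) / s for x in profile]


# Hypothetical raw intensities: same shape, tenfold difference in scale.
low_gene = [100.0, 200.0, 300.0]
high_gene = [1000.0, 2000.0, 3000.0]

low_centered = center(low_gene)
high_centered = center(high_gene)
```

On the raw scale the first gene would look almost flat next to the second; after centering, both follow the same rising profile.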

  • The query generator allows the user to create virtually any combination of logical queries using a simple GUI.

  • The Gene Inspector (A), Gene Annotation (B), Comments (C), and Chip Data Inspector (D) windows are shown.
  • Each window is updated when the probe selection changes in the Spreadsheet window.

Function Express Client
