Gene Expression Analysis Using Microarrays Dr Mushtaq Ahmed


Gene Expression Analysis Using Microarrays
Dr Mushtaq Ahmed
Technology Incubation Division, Persistent Systems Private Ltd, Pune
Persistent Systems Pvt. Ltd. http://www.persistent.co.in

Topics
1. Introduction
2. Data Storage and Exchange Standards
3. Analysis (Clustering)
4. Conclusion and References

1. Introduction
  • Structure Activity Relationship
  • Structural vs. Functional Genomics
  • Principles of a Microarray Experiment
  • Applications

Structure Activity Relationship
[Diagram labels] GENES (finite); PROTEINS; FUNCTIONS (infinite)
EXPERIMENTAL SETUP
  • Structural Genomics OR Prediction Work
  • Functional Genomics OR Confirmation Work

[Figure slide. Source: Yale Bioinformatics]

Principles of a Microarray Experiment: Hybridization
1. [Diagram labels] Environment; Functions; Proteins; mRNA; cDNA
2. Different incubations of cells result in up- or down-regulation of different sets of genes.
3. A microarray provides a medium for matching known and unknown DNA samples based on base-pairing rules, and for automating the identification of the unknowns.
4. The set of expressed genes (at the mRNA stage) is isolated and identified using hybridization on a microarray chip.

HTS Using Hybridization
Microarray Chip
  • Probe: oligos/cDNA (gene templates)
  • Target: cDNA (variables to be detected)
[Diagram labels] Samples; Hybridization; Analysis of outcome; Pathways; Targets/Leads; Disease Class.; Functional Annotation; Physiological states

Timeline for drug discovery
  • Discovery (5 yrs): 5000
    – Gene expression study
  • Pre-Clinical (1 yr): 50
  • Clinical (6 yrs): 5
  • Review (2 yrs): 1
  • Marketed

2. Data Storage and Exchange Standards
  • Raw and Processed Data
  • Conceptual View of Database
  • Example of ArrayExpress
  • Issues
  • Standardization for Exchange

Raw data – images
[Image: cDNA plotted microarray]
  • Red (Cy5) dot: overexpressed or up-regulated
  • Green (Cy3) dot: underexpressed or down-regulated
  • Yellow dot: equally expressed
  • Intensity: “absolute” level
  • red/green: ratio of expression
    – 2: 2x overexpressed
    – 0.5: 2x underexpressed
  • log2(red/green): “log ratio”
    – 1: 2x overexpressed
    – -1: 2x underexpressed
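The ratio and log-ratio conventions on this slide can be sketched in a few lines of Python. This is only an illustration: the function names `expression_ratio` and `log_ratio` are hypothetical, and the intensities are assumed to be positive, background-corrected values.

```python
import math

def expression_ratio(red: float, green: float) -> float:
    """Return red/green for one spot (Cy5 intensity over Cy3 intensity)."""
    if red <= 0 or green <= 0:
        raise ValueError("intensities must be positive")
    return red / green

def log_ratio(red: float, green: float) -> float:
    """Return log2(red/green); symmetric around 0 for up- and down-regulation."""
    return math.log2(expression_ratio(red, green))

# Hypothetical spot intensities (red, green), for illustration only.
for red, green in [(200.0, 100.0), (100.0, 100.0), (50.0, 100.0)]:
    print(red, green, expression_ratio(red, green), log_ratio(red, green))
```

The log ratio is often preferred because a 2x increase and a 2x decrease sit at the same distance from zero (1 and -1), whereas the plain ratio maps them to 2 and 0.5.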

Microarray Expression Value Representation
[Diagram labels] expression value types; composite spots; primary measurements; primary images; derived values; composite images; e.g., green/red ratios
Source: MGED

Gene expression database – a conceptual view
[Diagram labels] Samples; Genes; Gene expression matrix; Gene expression levels; Gene annotations; Sample annotations
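One way to picture this conceptual view is a matrix of expression levels with annotation tables attached to its two axes. The sketch below is an assumption, not the schema of any real database: the class name, the choice of genes as rows and samples as columns, and all example values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class ExpressionDataset:
    genes: List[str]                      # row labels (assumed orientation)
    samples: List[str]                    # column labels (assumed orientation)
    levels: List[List[float]]             # levels[i][j]: gene i in sample j
    gene_annotations: Dict[str, dict] = field(default_factory=dict)
    sample_annotations: Dict[str, dict] = field(default_factory=dict)

    def level(self, gene: str, sample: str) -> float:
        """Look up one expression level by gene and sample name."""
        return self.levels[self.genes.index(gene)][self.samples.index(sample)]

# Hypothetical data, for illustration only.
dataset = ExpressionDataset(
    genes=["geneA", "geneB"],
    samples=["control", "treated"],
    levels=[[1.0, 2.0], [1.0, 0.5]],
    gene_annotations={"geneA": {"function": "unknown"}},
    sample_annotations={"treated": {"treatment": "compound X"}},
)
print(dataset.level("geneA", "treated"))
```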


DAG Representation of Biomaterials
[Diagram labels] Sample source; treatment A; new state of sample source; treatment; Primary sample 1; Primary sample 2; Derived sample 2; extraction; Extract 1; Extract 2; labeling; Labeled extract 1; Labeled extract 2; Hybridization
Source: MGED
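A directed acyclic graph of biomaterials can be sketched as a mapping from each material to the materials it was derived from. The edges below are hypothetical, since the original figure's exact arrows are not recoverable from the slide text; only the node and step names are taken from it.

```python
# Each key is a material; each value lists (step, parent material) pairs.
# The edges are an illustrative assumption, not the original figure.
derived_from = {
    "Primary sample 1": [("treatment", "Sample source")],
    "Primary sample 2": [("treatment", "Sample source")],
    "Extract 1": [("extraction", "Primary sample 1")],
    "Extract 2": [("extraction", "Primary sample 2")],
    "Labeled extract 1": [("labeling", "Extract 1")],
    "Labeled extract 2": [("labeling", "Extract 2")],
    "Hybridization": [
        ("hybridization", "Labeled extract 1"),
        ("hybridization", "Labeled extract 2"),
    ],
}

def provenance(material, graph):
    """Return every material that `material` was derived from (its ancestors)."""
    seen = []
    stack = [material]
    while stack:
        node = stack.pop()
        for _step, parent in graph.get(node, []):
            if parent not in seen:
                seen.append(parent)
                stack.append(parent)
    return seen

print(provenance("Hybridization", derived_from))
```

Because a material can have more than one parent (a hybridization uses two labeled extracts here), a tree is not enough and a DAG is the natural structure.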

ArrayExpress (MGED) Design
[Figure slide. Source: MGED]

ArrayExpress (MGED) Architecture
[Diagram labels] Web server; application server; MAML data; ArrayExpress data submission & Curation database; Curation pipeline; image server?; data warehouse
Source: MGED

Issues in Storage
  • Size of Data
    – Experiments
        • 100 000 genes, 320 cell types
        • 2000 compounds, 3 time points, 2 concentrations, 2 replicates
    – Data
        • 8 x 10^11 data points
        • 1 x 10^15 bytes = 1 petabyte of data
  • Others
    – Raw data are images
    – Lack of standard measurement units for gene expression
    – Lack of standards for sample annotation
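The data-point count on this slide is the product of the experimental dimensions, which a short calculation confirms. The bytes-per-point figure is only back-computed from the slide's one-petabyte total; the slide does not state it, and the variable names are hypothetical.

```python
genes = 100_000
cell_types = 320
compounds = 2_000
time_points = 3
concentrations = 2
replicates = 2

# One data point per combination of all experimental dimensions.
data_points = (
    genes * cell_types * compounds * time_points * concentrations * replicates
)
print(data_points)  # 768000000000, i.e. roughly 8 x 10^11

# Storage per data point implied by a total of 10^15 bytes (an inference,
# plausibly large because the raw data are images rather than single numbers).
PETABYTE = 10**15
bytes_per_point = PETABYTE / data_points
print(round(bytes_per_point))
```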

Standardization
  • MIAME (Minimum Information About a Microarray Experiment)
    – Experimental design, Array design
    – Samples, Hybridisations
    – Measurements, Controls
  • OMG-LSR-DTF
    – Life Sciences Research Domain Task Force, Gene Expression RFP
    – Submitters: EBI (MAML), Rosetta (GEML), NetGenics
  • Proposed MAGEML (MAML + GEML)
    – Annotations + data; data stored as a set of external 2D matrices
    – Data format independent of any particular scanner or image analysis software
    – Sample and treatment can be represented as Directed Acyclic Graphs
    – Concept of composite images and composite spots

3. Data Analysis (Clustering)
  • Normalization
  • Hierarchical Clustering
  • Divisive Clustering
  • Other Methods
  • Visual Tools

Normalization
  • Assumption
    – Average expression ratio = 1
    – Amount of mRNA from both samples is the same
  • Total Intensity
    – Calculate a factor to rescale the intensities of all the genes so that total Cy3 = total Cy5
  • Regression Techniques
    – Adjust the intensities so that the slope of the scatter plot of Cy3 vs Cy5 = 1
  • Using ratio statistics
    – Based on the expression of ‘housekeeping genes’, a probability density of the ratio is developed and used for normalization
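Total-intensity normalization can be sketched as a single rescaling factor applied to one channel. This is a minimal illustration under the slide's assumption that both samples contribute the same amount of mRNA; the function name and the choice to rescale Cy5 (rather than Cy3) are assumptions.

```python
def total_intensity_normalize(cy3, cy5):
    """Rescale Cy5 so that total Cy5 equals total Cy3.

    Returns (factor, rescaled Cy5 intensities).
    """
    total_cy3 = sum(cy3)
    total_cy5 = sum(cy5)
    if total_cy5 == 0:
        raise ValueError("total Cy5 intensity must be non-zero")
    factor = total_cy3 / total_cy5
    return factor, [value * factor for value in cy5]

# Hypothetical intensities: the Cy5 channel is uniformly twice as bright.
cy3 = [100.0, 200.0, 300.0]
cy5 = [200.0, 400.0, 600.0]
factor, cy5_scaled = total_intensity_normalize(cy3, cy5)
print(factor, cy5_scaled)
```

After rescaling, the average expression ratio across the array is pulled toward 1, which is the stated assumption behind the method.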


Clustering
  • Hierarchical
    – Single, Complete and Average Linkage
  • Divisive
    – K-means
    – Self Organizing Maps (SOM)
  • Others
    – Principal Component Analysis (PCA)
    – Supervised Methods

Hierarchical clustering
  • Distance metrics or Similarity Measures
    – Euclidean, Pearson, distance of slopes, etc.
  • Cost functions
    – Single Linkage
        • Min distance of any two members (one from each of the two clusters)
    – Complete Linkage
        • Max distance of any two members (one from each of the two clusters)
    – Average Linkage
        • UPGMA
        • Within Groups
    – Ward’s Method
        • Join the pair of clusters that produces the smallest possible increase in the sum of squared errors
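The single, complete and average linkage rules differ only in how they summarise the pairwise distances between two clusters. The sketch below uses Euclidean distance and a naive merge loop; the function names are hypothetical, and Ward's method and the Pearson measure are left out for brevity.

```python
import math
from itertools import product

def euclidean(a, b):
    """Euclidean distance between two expression vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def linkage_distance(c1, c2, method="single"):
    """Distance between two clusters under the chosen linkage rule."""
    distances = [euclidean(a, b) for a, b in product(c1, c2)]
    if method == "single":      # closest pair, one member from each cluster
        return min(distances)
    if method == "complete":    # farthest pair, one member from each cluster
        return max(distances)
    if method == "average":     # mean over all cross-cluster pairs (UPGMA)
        return sum(distances) / len(distances)
    raise ValueError(f"unknown method: {method}")

def agglomerate(points, k, method="single"):
    """Repeatedly merge the two closest clusters until k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        pairs = (
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        i, j = min(
            pairs,
            key=lambda ij: linkage_distance(clusters[ij[0]], clusters[ij[1]], method),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical two-dimensional expression profiles.
print(agglomerate([(0, 0), (0, 1), (5, 5), (5, 6)], k=2))
```

Recording the order and distance of each merge, instead of stopping at k clusters, would give the data needed to draw a dendrogram.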


Divisive clustering
  • K-means
    – ‘k’ random (or specified) points are used to create clusters; the average vectors of the clusters are then used iteratively
    – Knowledge of the probable number of clusters (k) is needed
    – Used in combination with PCA and hierarchical clustering
  • Self Organizing Maps
    – User-defined geometric configurations as partitions
    – Random vectors generated for each partition and TRAINED till convergence (ANN based)
  • Visualization Methods
    – Help in cluster visualization
        • Scatter plot, web plot, histogram
    – May help in clustering itself
        • E.g., SuperGrouper utility of maxdView
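The K-means iteration described on this slide can be sketched as follows. To keep the example deterministic it takes specified starting points rather than random ones; the function names are hypothetical and the data are made up for illustration.

```python
import math

def mean_vector(vectors):
    """Component-wise average of a non-empty list of vectors."""
    n = len(vectors)
    return tuple(sum(column) / n for column in zip(*vectors))

def k_means(points, centers, max_iter=100):
    """Assign points to the nearest center, then recompute centers, until stable."""
    centers = [tuple(c) for c in centers]
    clusters = [[] for _ in centers]
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(
                range(len(centers)), key=lambda i: math.dist(p, centers[i])
            )
            clusters[nearest].append(p)
        # An empty cluster keeps its previous center.
        new_centers = [
            mean_vector(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters

points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
centers, clusters = k_means(points, centers=[(0.0, 0.0), (10.0, 10.0)])
print(centers)
print(clusters)
```

The result depends on k and on the starting points, which is why the slide notes that the probable number of clusters must be known in advance.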


Other Clustering Methods
  • PCA (Principal Component Analysis)
    – Closely related to SVD (Singular Value Decomposition), which is commonly used to compute it
    – Reduces dimensionality of gene expression space
    – Finds the best view that helps separate data into groups
  • Supervised Methods
    – SVM (Support Vector Machine)
    – Previous knowledge of which genes are expected to cluster is used for training
    – Binary classifier uses ‘feature space’ and ‘kernel function’ to define an optimal ‘hyperplane’
    – Also used for classification of samples: ‘expression fingerprinting’ for disease classification
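The dimensionality reduction behind PCA can be sketched by finding the direction of greatest variance and projecting the data onto it. This sketch uses power iteration on the covariance matrix instead of an SVD routine, purely to stay within the standard library; the function names and the data are hypothetical.

```python
import math

def center(rows):
    """Subtract the per-column mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def first_principal_component(rows, iterations=200):
    """Unit vector along the direction of greatest variance (power iteration)."""
    centered = center(rows)
    n, d = len(centered), len(centered[0])
    # Sample covariance matrix, d x d.
    cov = [
        [sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    v = [1.0] * d  # start vector, assumed not orthogonal to the answer
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v

def project(rows, component):
    """Coordinate of each centered row along the given component."""
    return [sum(x * c for x, c in zip(row, component)) for row in center(rows)]

# Hypothetical profiles lying on a line: one component captures all variance.
rows = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 4.0)]
component = first_principal_component(rows)
print(component)
print(project(rows, component))
```

Here two-dimensional profiles collapse to a single coordinate each without losing any spread, which is the sense in which PCA reduces the dimensionality of gene expression space.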


4. Conclusion and References
  • Microarrays make HTS with hybridization possible
  • No single standard unit for measuring expression levels
  • Handling and interpretation not yet exact
  • Assumption: elements in a cluster must share some commonality
  • Classification depends on the method used for clustering, the normalization, and the distance function
  • No “correct” way of classification; “biological understanding” is the ultimate guide
  • Provides extension to existing knowledge (e.g., classifying a novel gene into a known pathway)

Software
  • Databases
    – Public repositories
        • GEO (NCBI), GeneX (NCGR), ArrayExpress (EBI)
    – In-house databases
        • Stanford, MIT, University of Pennsylvania
    – Organism-specific databases
        • Mouse Genome Informatics Database
    – Proprietary databases
        • Gene Logic, NCI, Synergy (NetGenics), Genomics Knowledge Platform (Incyte)
  • Analysis Tools
    – Public Domain
        • maxdView (University of Manchester)
        • CyberT, RCluster interfaces of GeneX
    – Proprietary
        • Spotfire, Xpression NTI (InforMax Inc.)

References
  • Microarray Gene Expression Database Group
    – http://www.mged.org
  • National Center for Genome Resources
    – http://genex.ncgr.org
  • University of Manchester, Bioinformatics Group
    – http://bioinf.man.ac.uk/microarray/resources.html
  • Nature Reviews Genetics
    – http://www.nature.com/nrg/