Gene Expression Analysis Using Microarrays Dr Mushtaq Ahmed
- Slides: 31
Gene Expression Analysis Using Microarrays
Dr Mushtaq Ahmed
Technology Incubation Division, Persistent Systems Private Ltd, Pune
Persistent Systems Pvt. Ltd. http://www.persistent.co.in
Topics
1. Introduction
2. Data Storage and Exchange Standards
3. Analysis (Clustering)
4. Conclusion and References
1. Introduction
• Structure Activity Relationship
• Structural vs. Functional Genomics
• Principles of a Microarray Experiment
• Applications
Structure Activity Relationship
[Diagram labels: GENES (finite); PROTEINS; FUNCTIONS (infinite); EXPERIMENTAL SETUP; Structural Genomics or Prediction Work; Functional Genomics or Confirmation Work]
[Figure. Source: Yale Bioinformatics]
Principles of a Microarray Experiment: Hybridization
1. Environment, Functions, Proteins, mRNA, cDNA
2. Different incubations of cells result in up- or down-regulation of different sets of genes.
3. A microarray provides a medium for matching known and unknown DNA samples based on base-pairing rules, and for automating the identification of the unknowns.
4. The set of expressed genes (at the mRNA stage) is isolated and identified using hybridization on a microarray chip.
HTS Using Hybridization
• Microarray Chip – Probe: oligos/cDNA (gene templates) + Target: cDNA (variables to be detected)
• Samples, Hybridization, Analysis of outcome
• Pathways, Targets/Leads, Disease Class., Functional Annotation, Physiological states
Timeline for drug discovery
• Discovery (5 yrs): 5000 (gene expression study)
• Pre-Clinical (1 yr): 50
• Clinical (6 yrs): 5
• Review (2 yrs): 1
• Marketed
2. Data Storage and Exchange Standards
• Raw and Processed Data
• Conceptual View of Database
• Example of ArrayExpress
• Issues
• Standardization for Exchange
Raw data – images
[Image: cDNA plotted microarray]
• Red (Cy5) dot – overexpressed or up-regulated
• Green (Cy3) dot – underexpressed or down-regulated
• Yellow dot – equally expressed
• Intensity – “absolute” level
• red/green – ratio of expression
 – 2: 2x overexpressed
 – 0.5: 2x underexpressed
• log2(red/green) – “log ratio”
 – 1: 2x overexpressed
 – -1: 2x underexpressed
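The ratio and log-ratio conventions on this slide can be checked with a few lines of Python. This is a minimal sketch: the function name `expression_ratio` and the spot intensities are hypothetical, and the inputs are assumed to be positive, background-corrected Cy5 (red) and Cy3 (green) values.

```python
import math


def expression_ratio(red, green):
    """Return (ratio, log2 ratio) for one spot's Cy5 (red) and Cy3 (green) intensity."""
    if red <= 0 or green <= 0:
        raise ValueError("intensities must be positive")
    ratio = red / green
    return ratio, math.log2(ratio)


# Hypothetical intensities for three spots (not taken from the slides).
spots = {
    "gene_a": (2000.0, 1000.0),  # red twice as bright as green
    "gene_b": (500.0, 1000.0),   # red half as bright as green
    "gene_c": (800.0, 800.0),    # equal, would appear yellow
}

for name, (red, green) in spots.items():
    ratio, log_ratio = expression_ratio(red, green)
    print(f"{name}: ratio={ratio:.2f}, log2 ratio={log_ratio:+.2f}")
```

The log ratio is symmetric around zero, so a 2x increase (+1) and a 2x decrease (-1) have the same magnitude, which the raw ratio (2 versus 0.5) does not offer.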
Microarray Expression Value Representation
[Diagram labels: expression value types; primary images; composite images; primary measurements; composite spots; derived values (e.g., green/red ratios). Source: MGED]
Gene expression database – a conceptual view
[Diagram labels: Gene expression matrix (Genes, Samples); Gene expression levels; Gene annotations; Sample annotations]
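One way to picture this conceptual view in code is a matrix of expression levels with gene labels on the rows, sample labels on the columns, and the two annotation tables kept alongside. The class and field names below are hypothetical and the values are made up; this is only a sketch of the idea, not the schema of any particular database.

```python
from dataclasses import dataclass, field


@dataclass
class ExpressionDataset:
    genes: list      # row labels
    samples: list    # column labels
    levels: list     # levels[i][j] = expression of genes[i] in samples[j]
    gene_annotations: dict = field(default_factory=dict)
    sample_annotations: dict = field(default_factory=dict)

    def level(self, gene, sample):
        """Look up one cell of the gene expression matrix."""
        return self.levels[self.genes.index(gene)][self.samples.index(sample)]

    def profile(self, gene):
        """Expression of one gene across all samples (one matrix row)."""
        return dict(zip(self.samples, self.levels[self.genes.index(gene)]))


# Hypothetical two-gene, two-sample data set.
dataset = ExpressionDataset(
    genes=["gene_a", "gene_b"],
    samples=["control", "treated"],
    levels=[[1.0, 2.0], [1.0, 0.5]],
    gene_annotations={"gene_a": "hypothetical kinase"},
    sample_annotations={"treated": "hypothetical compound, 6 h"},
)
print(dataset.profile("gene_a"))
```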
DAG Representation of Biomaterials
[Diagram labels: Sample source; treatment; A new state of sample source; Primary sample 1; Primary sample 2; Derived sample 2; extraction; Extract 1; Extract 2; labeling; Labeled extract 1; Labeled extract 2; Hybridization. Source: MGED]
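A sample history like this can be stored as a directed acyclic graph and checked for cycles. The sketch below uses Python's standard `graphlib` module (Python 3.9+). The edges are an assumed, simplified reading of the diagram labels (the original figure is not available and "Derived sample 2" is left out), so treat the graph itself as hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# node -> set of nodes it is derived from (assumed edges, for illustration only)
derived_from = {
    "primary sample 1": {"sample source"},
    "new state of sample source": {"sample source"},     # via treatment
    "primary sample 2": {"new state of sample source"},
    "extract 1": {"primary sample 1"},                    # via extraction
    "extract 2": {"primary sample 2"},
    "labeled extract 1": {"extract 1"},                   # via labeling
    "labeled extract 2": {"extract 2"},
    "hybridization": {"labeled extract 1", "labeled extract 2"},
}


def processing_order(graph):
    """Return the nodes so that every material appears after its sources."""
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        raise ValueError("sample history contains a cycle") from exc


order = processing_order(derived_from)
print(" -> ".join(order))
```

A valid ordering exists only when the graph has no cycles, which is the property that makes a DAG a natural fit for treatment and extraction histories.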
ArrayExpress (MGED) Design
[Figure. Source: MGED]
ArrayExpress (MGED) Architecture
[Diagram labels: application server; Web server; MAML data; ArrayExpress; data submission & curation; database; curation pipeline; image server?; data warehouse. Source: MGED]
Issues in Storage
• Size of Data
 – Experiments
   • 100 000 genes, 320 cell types
   • 2000 compounds, 3 time points, 2 concentrations, 2 replicates
 – Data
   • 8 × 10^11 data points
   • 1 × 10^15 bytes = 1 petabyte (PB) of data
• Others
 – Raw data are images
 – Lack of standard measurement units for gene expression
 – Lack of standards for sample annotation
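The data-point count on this slide follows from multiplying the experimental factors. The short check below reproduces it. The bytes-per-data-point figure is only what the slide's two numbers imply when divided, not something the slides state (it presumably reflects image data as well as numeric values).

```python
# Experimental factors as listed on the slide.
genes = 100_000
cell_types = 320
compounds = 2000
time_points = 3
concentrations = 2
replicates = 2

data_points = genes * cell_types * compounds * time_points * concentrations * replicates
print(f"{data_points:.2e} data points")  # close to the slide's 8 x 10^11

# Implied storage per data point if the total is 1 petabyte (1e15 bytes).
bytes_per_point = 1e15 / data_points
print(f"about {bytes_per_point:.0f} bytes per data point")
```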
Standardization
• MIAME (Minimum Information About a Microarray Experiment)
 – Experimental design, Array design
 – Samples, Hybridisations
 – Measurements, Controls
• OMG-LSR-DTF
 – Life Sciences Research Domain Task Force, Gene Expression RFP
 – Submitters: EBI (MAML), Rosetta (GEML), NetGenics
• Proposed MAGEML (MAML + GEML)
 – Annotations + data; data stored as a set of external 2D matrices
 – Data format independent of any particular scanner or image analysis software
 – Samples and treatments can be represented as Directed Acyclic Graphs
 – Concept of composite images and composite spots
3. Data Analysis (Clustering)
• Normalization
• Hierarchical Clustering
• Divisive (Partitional) Clustering
• Other Methods
• Visual Tools
Normalization
• Assumption
 – Average expression ratio = 1
 – Amount of mRNA from both samples is the same
• Total Intensity
 – Calculate a factor to rescale the intensities of all the genes so that
   • total Cy3 = total Cy5
• Regression Techniques
 – Adjust the intensities so that
   • Slope of the scatter plot of Cy3 vs Cy5 = 1
• Using ratio statistics
 – Based on the expression of ‘housekeeping genes’, a probability density of the ratio is developed and used for normalization
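The total-intensity and regression approaches can be sketched in a few lines. This is an illustrative simplification, not a production normalization: the function names and intensities are hypothetical, the Cy5 channel is the one being rescaled, and the regression line is assumed to pass through the origin.

```python
def total_intensity_normalize(cy3, cy5):
    """Rescale Cy5 so that total Cy5 equals total Cy3."""
    factor = sum(cy3) / sum(cy5)
    return [value * factor for value in cy5]


def regression_normalize(cy3, cy5):
    """Rescale Cy5 so a least-squares line through the origin (Cy5 vs Cy3) has slope 1."""
    slope = sum(g * r for g, r in zip(cy3, cy5)) / sum(g * g for g in cy3)
    return [value / slope for value in cy5]


# Hypothetical intensities in which the Cy5 channel reads 1.5x too bright.
cy3 = [100.0, 200.0, 400.0]
cy5 = [150.0, 300.0, 600.0]

print(total_intensity_normalize(cy3, cy5))
print(regression_normalize(cy3, cy5))
```

Both methods rest on the slide's assumption that the average expression ratio is 1; if most genes on the array really do change in one direction, either rescaling would hide that shift.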
Clustering
• Hierarchical
 – Single, Complete and Average Linkage
• Divisive (partitional)
 – K-means
 – Self Organizing Maps (SOM)
• Others
 – Principal Component Analysis (PCA)
 – Supervised Methods
Hierarchical clustering
• Distance metrics or similarity measures
 – Euclidean, Pearson, distance of slopes, etc.
• Cost functions
 – Single Linkage
   • Min distance of any two members (one from each of the two clusters)
 – Complete Linkage
   • Max distance of any two members (one from each of the two clusters)
 – Average Linkage
   • UPGMA
   • Within Groups
 – Ward’s Method
   • Join which produces the smallest possible increase in the sum of squared errors
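The linkage rules above differ only in how the pairwise member distances between two clusters are combined. The following sketch of bottom-up (agglomerative) clustering makes that explicit. It is a naive illustration with hypothetical names and data: Ward's method is not included, and `pearson_distance` assumes neither profile is constant.

```python
import math
from itertools import combinations


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson_distance(a, b):
    """1 - Pearson correlation, so perfectly correlated profiles have distance 0."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return 1.0 - cov / math.sqrt(var_a * var_b)


# How member-to-member distances are combined into a cluster-to-cluster cost.
LINKAGE = {
    "single": min,                                 # closest pair
    "complete": max,                               # farthest pair
    "average": lambda ds: sum(ds) / len(ds),       # mean over all pairs (UPGMA)
}


def agglomerate(profiles, n_clusters, linkage="average", dist=euclidean):
    """Merge the two cheapest clusters repeatedly until n_clusters remain."""
    combine = LINKAGE[linkage]
    clusters = [[name] for name in profiles]

    def cost(pair):
        a, b = pair
        return combine([dist(profiles[x], profiles[y]) for x in a for y in b])

    while len(clusters) > n_clusters:
        a, b = min(combinations(clusters, 2), key=cost)
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
    return clusters


# Hypothetical expression profiles over three conditions.
profiles = {
    "g1": [1.0, 2.0, 3.0],
    "g2": [1.1, 2.1, 2.9],
    "g3": [5.0, 1.0, 0.5],
    "g4": [5.2, 0.9, 0.4],
}
print(agglomerate(profiles, 2, linkage="single"))
```

Recording the order of merges, rather than stopping at a fixed number of clusters, would give the tree (dendrogram) usually drawn for hierarchical clustering.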
Divisive (partitional) clustering
• K-means
 – ‘k’ random (or specified) points are used to create clusters; the average vectors of the clusters are then used iteratively
 – Knowledge of the probable number of clusters (k) is needed
 – Used in combination with PCA and hierarchical clustering
• Self Organizing Maps
 – User-defined geometric configurations as partitions
 – Random vectors are generated for each partition and TRAINED till convergence (ANN based)
• Visualization Methods
 – Help in cluster visualization
   • Scatter plot, Web plot, histogram
 – May help in clustering itself
   • E.g., SuperGrouper utility of maxdView
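The K-means loop described above (pick k starting points, assign each profile to the nearest one, replace each starting point by the average vector of its cluster, repeat) can be sketched as follows. The function name, parameters, and data are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math
import random


def kmeans(points, k, initial=None, iterations=100, seed=0):
    """Return (assignment, centroids); `initial` may give k specified start points."""
    if initial is not None:
        centroids = [list(c) for c in initial]
    else:
        centroids = [list(p) for p in random.Random(seed).sample(points, k)]
    assignment = None
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Move each centroid to the average vector of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Hypothetical two-condition profiles forming two well-separated groups.
points = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]]
labels, centers = kmeans(points, k=2, initial=[[1.0, 1.0], [8.0, 8.0]])
print(labels)
```

With random starting points the result can depend on the draw, which is one reason the slide suggests combining K-means with PCA or hierarchical clustering to choose k and sensible starting points.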
Other Clustering Methods
• PCA (Principal Component Analysis)
 – Closely related to SVD (Singular Value Decomposition), which is often used to compute it
 – Reduces the dimensionality of the gene expression space
 – Finds the best view that helps separate data into groups
• Supervised Methods
 – SVM (Support Vector Machine)
 – Previous knowledge of which genes are expected to cluster is used for training
 – Binary classifier uses a ‘feature space’ and a ‘kernel function’ to define an optimal ‘hyperplane’
 – Also used for classification of samples: ‘expression fingerprinting’ for disease classification
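To make the dimensionality-reduction idea concrete, the sketch below finds the first principal component of a small data set by power iteration on the covariance matrix. This is an illustrative stand-in for a full PCA or SVD routine, with hypothetical names and data. It assumes the data are not all identical and that the starting vector is not orthogonal to the leading component.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (direction, scores): the leading component and each row's projection."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered data.
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    v = [1.0 / math.sqrt(d)] * d  # arbitrary unit-length starting vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores


# Hypothetical profiles that vary along a single direction (second value = 2 x first).
rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
direction, scores = first_principal_component(rows)
print(direction, scores)
```

Each row is reduced from two numbers to a single score along `direction`; plotting such scores for the first two or three components is the "best view" the slide refers to.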
4. Conclusion and References
• Microarrays make HTS with hybridization possible
• No single standard unit for measuring expression levels
• Handling and interpretation not yet exact
• Assumption: elements in a cluster must share some commonality
• Classification depends on the methods used for clustering and normalization, and on the distance function
• No “correct” way of classification; “biological understanding” is the ultimate guide
• Provides an extension to existing knowledge (e.g., classifying a novel gene into a known pathway)
Software
• Databases
 – Public repositories
   • GEO (NCBI), GeneX (NCGR), ArrayExpress (EBI)
 – In-house databases
   • Stanford, MIT, University of Pennsylvania
 – Organism-specific databases
   • Mouse Genome Informatics Database
 – Proprietary databases
   • Gene Logic, NCI, Synergy (NetGenics), Genomics Knowledge Platform (Incyte)
• Analysis Tools
 – Public Domain
   • maxdView (University of Manchester)
   • CyberT, RCluster interfaces of GeneX
 – Proprietary
   • Spotfire, Xpression NTI (InforMax Inc.)
References
• Microarray Gene Expression Database Group (MGED) – http://www.mged.org
• National Center for Genome Resources (NCGR) – http://genex.ncgr.org
• University of Manchester, Bioinformatics Group – http://bioinf.man.ac.uk/microarray/resources.html
• Nature Reviews Genetics – http://www.nature.com/nrg/