MIAME
The MIAME website: http://www.mged.org
© 2002 Norman Morrison for Manchester Bioinformatics.
Overview
• Why capture meta-data?
• The data capture challenges
  – What to capture?
  – How to capture it?
  – Who agrees what to capture?
Post-genome data
[Diagram labels: genome, transcriptome, proteome, phenome, bioinformatics, interactome, mobileome, textome, metabolome]
Why meta-data?
• Genome data is static
• Post-genome data is very state-dependent
  – Transcriptome = no. of cell types × no. of environmental conditions
  – Annotation matters
  – Data comparisons matter
• Learn from the gene-naming debacle: one entity, many names
  – Protein-tyrosine phosphatase, non-receptor type 6; Protein-tyrosine phosphatase 1C; PTP-1C; Hematopoietic cell protein-tyrosine phosphatase; SH-PTP1; Protein-tyrosine phosphatase SHP-1
  – LARD; death receptor 3 beta; WSL-1 R protein; lymphocyte associated receptor of death; death receptor 3
• We need repositories
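The naming problem above can be illustrated with a small alias lookup. This is a minimal sketch, assuming a hand-built table: the aliases are a few of those listed on the slide, while the canonical labels and the names `ALIAS_GROUPS`, `CANONICAL` and `canonical_name` are hypothetical choices made for this example.

```python
# Alias groups copied from the slide; the canonical labels on the left are
# an arbitrary choice made for this sketch, not official gene symbols.
ALIAS_GROUPS = {
    "SHP-1": [
        "Protein-tyrosine phosphatase, non-receptor type 6",
        "Protein-tyrosine phosphatase 1C",
        "PTP-1C",
        "Hematopoietic cell protein-tyrosine phosphatase",
        "SH-PTP1",
    ],
    "death receptor 3": [
        "LARD",
        "lymphocyte associated receptor of death",
    ],
}


def _normalise(name: str) -> str:
    """Lower-case and collapse whitespace so trivial variants compare equal."""
    return " ".join(name.lower().split())


# Reverse index: every normalised alias (and the label itself) -> canonical label.
CANONICAL = {
    _normalise(alias): label
    for label, aliases in ALIAS_GROUPS.items()
    for alias in [label, *aliases]
}


def canonical_name(name: str) -> str:
    """Return the canonical label for a known alias, else the name unchanged."""
    return CANONICAL.get(_normalise(name), name)
```

Two data sets annotated with different aliases only become comparable once both are mapped through something like `canonical_name`, which is the job a repository with enforced standards takes on.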
Microarray Repositories
• A repository is a primary source of data generated by experimentalists. Its main role is to enforce standards and quality thresholds and to make data widely available.
• Needs standards.
Microarray Repositories II
• Repositories allow for easier data exchange between groups
• Ensure that key details are kept
• BUT:
  – What should be captured, and how?
• Requires international cooperation
  – Minimum Information About a Microarray Experiment (MIAME)
  – Developed within MGED
MIAME – Major Sections
• Array design
  – Reporters
  – Features
  – Control elements
• Experimental design
  – Experiment type
  – Sample details
  – Hybridisations
  – Measurements
The Six Parts of MIAME
• Experimental design: the set of hybridization experiments as a whole
• Array design: each array used and each element (spot, feature) on the array
• Samples: samples used, extract preparation and labeling
• Hybridizations: procedures and parameters
• Measurements: images, quantification and specifications
• Normalization controls: types, values and specifications
MIAME Glossary
Value of audit
• Based on (qualifier, value, source)
  – Qualifier: cell type
  – Value: epithelial
  – Source: Gray's Anatomy (38th ed.)
• or
  – Qualifier: treatment
  – Value: 15 heat shock
  – Source: Smith and Jones, Nature Genet. (1992)
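The (qualifier, value, source) triple maps directly onto a small record type. A minimal sketch, using the two examples from the slide verbatim; the names `Annotation` and `unsourced` are hypothetical and not part of MIAME.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    """One auditable statement: what is described, its value, and who says so."""
    qualifier: str
    value: str
    source: str


# The two examples from the slide, copied as given.
annotations = [
    Annotation("cell type", "epithelial", "Gray's Anatomy (38th ed.)"),
    Annotation("treatment", "15 heat shock", "Smith and Jones, Nature Genet. (1992)"),
]


def unsourced(items):
    """Return the annotations whose source is empty, i.e. cannot be audited."""
    return [a for a in items if not a.source.strip()]
```

Keeping the source alongside every value is what makes the annotation auditable: `unsourced(annotations)` lists the entries that would fail such an audit.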
MIAME definitions
• Available from www.mged.org
• A minimum document to be read
• All details mentioned in MIAME should be captured somewhere
  – Know where they are
• Latest draft: Version 1.1 (Draft 5, March 5, 2002)
  – Discussed at MGED IV
• See also: A. Brazma et al., Nature Genetics, vol. 29 (December 2001), pp. 365–371
MIAME part 1: array description
• In principle this is someone else's problem
  – (e.g. Affymetrix, Clontech, etc.)
• Three levels of array design elements:
  – feature: the location on the array
  – reporter: the nucleotide sequence present at a particular location on the array
  – composite sequence: a set of reporters used collectively to measure the expression of a particular gene, exon, or splice variant
• Array design has 5 parts:
  1.1 Array related information
  1.2 Reporter information
  1.3 Feature information
  1.4 Composite sequences
  1.5 Control elements
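The three levels of array design elements can be sketched as three linked record types. This is an illustrative model only, assuming a row/column grid for feature locations; the class and field names are hypothetical and are not taken from MIAME or MAGE-OM.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Reporter:
    """The nucleotide sequence placed at one or more locations on the array."""
    name: str
    sequence: str


@dataclass(frozen=True)
class Feature:
    """A physical location on the array, holding one reporter."""
    row: int
    column: int
    reporter: Reporter


@dataclass
class CompositeSequence:
    """Reporters used collectively to measure one gene, exon or splice variant."""
    target: str
    reporters: list = field(default_factory=list)


def features_for(composite, features):
    """Return every feature whose reporter belongs to the composite sequence."""
    return [f for f in features if f.reporter in composite.reporters]
```

The same reporter may sit at several features, and several reporters may contribute to one composite sequence, which is why the three levels are kept separate.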
MIAME part 2: Experimental design
• This is your problem
• Experimental design has four parts:
  2.1 Experimental design
  2.2 Sample
  2.3 Hybridisation
  2.4 Measurements
2.1 Experimental design
• Design and purpose of the set of hybridisations
• Author, lab and contact
• Experiment type
• Experimental factors
• Number of hybridisations
• Common reference
• QC steps
• Experiment description (plus refs)
• Anything else
2.2 Sample
• Biosource properties
  – organism, contact, cell type, sex, …
• Biomaterial manipulation
  – growth conditions, in vivo treatment, compound
• Sample labelling
  – label used, amount, method
• Spiked controls
  – feature, type
• Anything else
2.3 Hybridisation
• Relationship between samples and arrays
• Protocol
  – full description
• Anything else
2.4 Measurement
• Raw data
  – scanner files, scanning protocol
• Scanning protocol
  – parameter settings
• Analysis and quantification
  – analysis output, protocol (e.g. algorithms)
• Normalisation
  – strategies and algorithms, final gene expression table
• Anything else
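The four parts of the experimental design section lend themselves to a simple completeness check. The sketch below is one possible approach, not an official MIAME validator: the section and field names in `REQUIRED_FIELDS` are hypothetical labels derived from the bullets above, and only a handful of them are included.

```python
# Hypothetical field names per section, loosely following slides 2.1 to 2.4.
REQUIRED_FIELDS = {
    "experimental_design": ["experiment_type", "experimental_factors", "contact"],
    "sample": ["organism", "cell_type", "label_used"],
    "hybridisation": ["protocol"],
    "measurement": ["raw_data", "scanning_protocol", "normalisation"],
}


def missing_fields(record: dict) -> list:
    """List 'section.field' names that are absent or blank in the record."""
    missing = []
    for section, fields in REQUIRED_FIELDS.items():
        supplied = record.get(section, {})
        for name in fields:
            value = supplied.get(name)
            if value is None or not str(value).strip():
                missing.append(f"{section}.{name}")
    return missing
```

A submission tool could refuse a record until `missing_fields` returns an empty list, which is one way to make sure the key details are kept.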
MAGE, ontologies and maxd
The MAGE website: http://www.mged.org
Outline
• MIAME is useful, but …
  – How can we represent it computationally?
  – How can we use it to share and exchange data?
• The wonderful world of XML
• The evil that is free text
  – ontologies and controlled vocabularies
• maxd: MIAME-supportive, MAGE-ML-compliant analysis of microarray data
MAGE-ML, MAGE-OM
• MIAME sets a standard for what knowledge (meta-data) to capture
• But how to do it?
• Need a knowledge model: a schema to represent the concepts and the relationships between them
Knowledge capture
• UML (Unified Modeling Language) provides a methodology for capturing knowledge in ways that are computationally tractable (cf. database schemas)
• MAGE-OM is the MGED-approved UML model, which attempts to capture the concepts in MIAME
XML
• A UML diagram is not useful by itself
• MAGE-ML is an attempt to capture MAGE-OM in XML (eXtensible Markup Language), the next generation HTML
• MAGE-ML provides a structure for a text document (marked up with tags) which describes a microarray experiment
MAGE-ML
• MAGE-ML is not nice!
  – Complex
  – Not easily human readable
  – Needs software tools to help create it
  – Very rich
• MAGE-ML is the standard we have to work to.
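To show why software tools are needed to create tagged documents of this kind, here is a minimal sketch that generates a small XML description of an experiment with Python's standard library. The element and attribute names (`Experiment`, `Sample`, `Annotation`) and the function `build_experiment_xml` are invented for illustration; they are not taken from the MAGE-ML specification, which is far richer.

```python
import xml.etree.ElementTree as ET


def build_experiment_xml(name, samples):
    """Serialise an experiment and its annotated samples as an XML string.

    `samples` is a list of dicts with an "id" and an "annotations" mapping
    of qualifier -> value. All tag names here are hypothetical.
    """
    root = ET.Element("Experiment", {"name": name})
    for sample in samples:
        node = ET.SubElement(root, "Sample", {"id": sample["id"]})
        for qualifier, value in sample["annotations"].items():
            ET.SubElement(node, "Annotation", {"qualifier": qualifier, "value": value})
    return ET.tostring(root, encoding="unicode")
```

Because the output is structured markup rather than free text, another program can parse it back with `ET.fromstring` and find each annotation without guessing at the layout.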
ArrayExpress
• ArrayExpress is the new public microarray data repository based at the EBI
• Provides tools to help create MAGE-ML
• Experiments will not be entered unless the annotation is of a high quality
Making MAGE useable
• For a repository we need a relational database, not an object model
• We have created a relational implementation of the MAGE-OM which is MIAME compliant (based on an early UML diagram for ArrayExpress): maxdSQL
Data repositories
• Relational version of MAGE-OM
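What a relational version of an object model looks like can be sketched with an in-memory SQLite database. The three tables below are a toy illustration only: the table and column names are assumptions made for this example and do not reproduce the maxdSQL schema.

```python
import sqlite3

# Toy schema: objects become tables, object references become foreign keys.
SCHEMA = """
CREATE TABLE experiment (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE sample (
    id            INTEGER PRIMARY KEY,
    experiment_id INTEGER NOT NULL REFERENCES experiment(id),
    organism      TEXT NOT NULL
);
CREATE TABLE hybridisation (
    id         INTEGER PRIMARY KEY,
    sample_id  INTEGER NOT NULL REFERENCES sample(id),
    array_name TEXT NOT NULL
);
"""


def open_repository():
    """Create an empty in-memory repository with the toy schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def hybridisations_per_experiment(conn):
    """Map each experiment name to its number of hybridisations."""
    rows = conn.execute(
        """
        SELECT e.name, COUNT(h.id)
        FROM experiment e
        LEFT JOIN sample s ON s.experiment_id = e.id
        LEFT JOIN hybridisation h ON h.sample_id = s.id
        GROUP BY e.id
        ORDER BY e.name
        """
    )
    return dict(rows.fetchall())
```

The join in `hybridisations_per_experiment` is the kind of cross-experiment query that is awkward against a pile of documents and routine against a relational store.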
Outstanding issues – free text
• MAGE provides a structure for the knowledge, not a prescription for what gets put in
• How to control what people put in the free-text areas of MIAME/MAGE (the "mickey mouse" / "." problem)
• How do we define what is meant in ways that other people/software understand?
Solution 1
• Controlled vocabularies
  – Agreed lists of terms (and definitions) that a community agrees to use
• Pros: technically simple, easy to implement
• Cons: limiting; how to get agreement?; terms on their own are not very descriptive
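A controlled vocabulary is simple enough to sketch in a few lines. This example assumes a made-up list of cell types; `CELL_TYPES` and `check_term` are hypothetical names, and a real list would have to be agreed by the community.

```python
# Hypothetical agreed list of terms for one annotation field.
CELL_TYPES = {"epithelial", "fibroblast", "lymphocyte"}


def check_term(term: str, vocabulary: set) -> str:
    """Return the normalised term if it is in the vocabulary, else raise."""
    normalised = term.strip().lower()
    if normalised not in vocabulary:
        raise ValueError(f"{term!r} is not an agreed term")
    return normalised
```

Free-text entries such as "mickey mouse" or "." are rejected outright, which shows both the appeal of the approach and its main limitation: anything not already on the list cannot be said.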
Solution 2
• Ontologies
  – Can be thought of as a set of agreed terms and the relationships between them (a taxonomy is a simple ontology in which the only relationship allowed is an is-a relationship)
• Pros: a very rich and powerful infrastructure
• Cons: complex
• Many developments: a space to watch
  – Chris Stoeckert and Helen Parkinson, http://www.cbil.upenn.edu/Ontology/MGED_ontology.html
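The simplest case mentioned above, a taxonomy with only is-a relationships, can be sketched as a child-to-parent mapping. The terms in `IS_A` are a toy example chosen for illustration and are not taken from the MGED ontology; `ancestors` and `is_kind_of` are hypothetical helper names.

```python
# Toy taxonomy: each term points to the single more general term it "is a".
IS_A = {
    "keratinocyte": "epithelial cell",
    "epithelial cell": "cell",
    "lymphocyte": "cell",
}


def ancestors(term: str, is_a: dict) -> list:
    """Follow is-a links upward, returning the ancestors nearest first."""
    chain = []
    seen = {term}
    while term in is_a:
        term = is_a[term]
        if term in seen:
            raise ValueError("cycle in is-a relationships")
        seen.add(term)
        chain.append(term)
    return chain


def is_kind_of(term: str, ancestor: str, is_a: dict) -> bool:
    """True if the term equals the ancestor or reaches it through is-a links."""
    return term == ancestor or ancestor in ancestors(term, is_a)
```

Unlike a flat term list, the relationships let software answer questions the annotator never stated explicitly, for example that a sample annotated as "keratinocyte" should match a query for "epithelial cell".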