BarleyBase: BarleyBase.org
BARLEYBASE – A MIAME-COMPLIANT EXPRESSION PROFILING DATABASE FOR PLANTS
Lishuang Shen, Jian Gong, Jianqiang Xin, Xiaoyun Tang, Rico A. Caldo, Stacy Turner, Dan Nettleton, Roger P. Wise, Julie A. Dickerson*
Virtual Reality Applications Center, Iowa State University, Ames, Iowa 50011

Abstract
BarleyBase (www.BarleyBase.org) is a USDA-funded public repository for plant microarray data. BarleyBase houses raw and normalized expression data from the 22K Affymetrix Barley1 and Arabidopsis ATH1 GeneChips, plus experiment and sample annotation, and it is expanding to other plant microarray platforms. BarleyBase features a web-based, MIAME-compliant experiment submission tool, BarleyExpress, which allows users to efficiently submit and manage their experiment descriptions, array designs, and expression analysis information. BarleyBase offers a broad set of query and display options at all data levels, from experiments and hybridizations down to probe sets and individual probes. Users can run cross-experiment queries on probe sets by expression profile and by biological information. Probe set queries are seamlessly integrated with visualization and analysis tools such as scatter plots, the R statistical toolbox, and data filters.

BarleyExpress: Web-Based Submission
• BarleyExpress is a MIAME-compliant microarray submission and annotation tool adapted from MIAMExpress.
• Submitters first enter the experiment design information.
• Annotate the experiment's factorial design with factors and factor levels.
• Batch upload raw GeneChip data files.
• Associate raw data files with each studied treatment.
• Submit protocols (optional).
• Enter sample preparation details for each hybridization; templates allow previous sample submissions to be reused.
• Finalize the experiment submission.
• Submitters grant access to designated individuals and groups.
• Plant ontology and controlled vocabulary are enforced at each step.
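The factorial-design annotation and the per-treatment file association in the submission steps above can be illustrated with a small Python sketch. It assumes a full factorial design, in which every combination of factor levels forms one treatment. The factor names, levels, file names, and the functions `enumerate_treatments` and `assign_files` are hypothetical illustrations, not BarleyExpress code.

```python
from itertools import product

# Hypothetical helpers illustrating a full factorial design; not BarleyExpress code.


def enumerate_treatments(factors):
    """Return every treatment of a full factorial design.

    `factors` maps a factor name to its list of levels. Each treatment is a
    dict {factor: level}, one per combination of levels.
    """
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*(factors[n] for n in names))]


def assign_files(treatments, file_map):
    """Group raw data file names by the treatment they belong to.

    `file_map` maps a file name to its treatment dict. Raises ValueError if a
    file names a treatment that is not part of the design.
    """
    # Use a sorted tuple of (factor, level) pairs as a hashable treatment key.
    grouped = {tuple(sorted(t.items())): [] for t in treatments}
    for fname, treatment in file_map.items():
        key = tuple(sorted(treatment.items()))
        if key not in grouped:
            raise ValueError(f"{fname}: treatment {treatment} is not in the design")
        grouped[key].append(fname)
    return grouped


if __name__ == "__main__":
    # Hypothetical design: 2 genotypes x 2 inoculation levels = 4 treatments.
    design = {"genotype": ["wild type", "mutant"], "inoculation": ["mock", "pathogen"]}
    treatments = enumerate_treatments(design)
    print(len(treatments))  # 4
    files = {
        "rep1_wt_mock.CEL": {"genotype": "wild type", "inoculation": "mock"},
        "rep2_wt_mock.CEL": {"genotype": "wild type", "inoculation": "mock"},
    }
    for key, names in assign_files(treatments, files).items():
        print(dict(key), names)
```

A fractional or unbalanced design would list its treatments explicitly instead of taking the full Cartesian product of factor levels.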
Visualization & Analysis
• Web-based microarray data analysis pipelines integrate a broad set of probe set query and display options with analysis tools.
• Interactive visualization at all data levels: experiments, hybridizations, probe sets, and probes.
• Gene list creation with cross-experiment and cross-platform probe set queries, for generating hypotheses about genes of interest.
• Identification of differentially expressed and co-expressed genes with multiple statistical tests and expression-profile filters.
• Pattern recognition on gene lists using hierarchical clustering, k-means partitioning, principal component analysis (PCA), self-organizing maps (SOM), and multidimensional scaling (MDS).
• Gene list classification by Gene Ontology.
• Data analysis and visualization use R and Bioconductor.
• Probe alignments with exemplar sequences.
• Gene prediction through links to the PlantGDB database.
• Cross-species comparative genomics through the Gramene and GrainGenes databases.
BarleyBase collaborates with PlantGDB, Gramene, and GrainGenes to perform gene prediction and cross-species comparison with Barley1 GeneChip exemplar sequences. NASCArrays shares ATH1 data. As of August 31, 2004, BarleyBase houses 20 experiment submissions from barley and Arabidopsis, with a total of 741 hybridizations.
Figure 1. BarleyBase Overview (data processing pipeline: BarleyExpress submission of raw data; MAS 5.0 and RMA processing; query and analysis by Internet users; batch download as MAGE-ML and CSV)

Data Acquisition & Processing
• Submitters submit experiment descriptions and raw expression data.
• BarleyBase normalizes submitted raw data with the Affymetrix MAS 5.0 statistical algorithm and with Robust Multi-Array Analysis (RMA).
• Summary statistics and graphs are computed for raw and normalized expression data.
• All types of data are stored in an open-source MySQL database.
• BarleyBase assigns unique accession numbers to experiments, hybridizations, and samples.
• BarleyBase generates MAGE-ML and CSV files for batch download and data exchange.
• Submissions and associated data are available for online access and analysis.
Figure 4. Expression & Annotation for Exemplar Barley1_11969
Figure 2. Major Steps in Experiment Submission

Data Access
• Batch download of complete data sets (experiment annotation, raw and normalized expression data) in MAGE-ML, comma-separated values (CSV), or CEL-file format.
• Navigation of experiment, hybridization, sample, and exemplar data.
• Gene list creation and management for gene-centric analysis.
• Access to probe sets by expression profile, with single- or cross-experiment queries.
• Gene search by biological criteria: annotation, sequence, Gene Ontology category, pathway, and gene family membership.
• Flexible, submitter-controlled data access, including group access to private submissions.
Figure 5. Visualization for Hybridizations & Gene Cluster

BarleyBase Data Model
• BarleyBase uses a hierarchical data model to store microarray gene expression data.
• The top-level data structure is the experiment: each experiment contains one or more treatments, each treatment has one or more samples as replicates, and each sample has one or more hybridizations.
• Protocols are associated with the experiment at the hybridization level.
• Five table types: Array, Expression, Experiment, Protocol, Submitter.
• Follows the MIAME principles recommended by MGED and implemented in MIAMExpress, tuned for plants and with the Extract level removed.
• Adds fields for the factors of a statistical factorial experimental design.
• Enforces plant ontology and controlled vocabulary in experiment descriptions.
• Biological annotation for probe sets and exemplars with Gene Ontology.
• Supports expression data from Affymetrix GeneChips; spotted microarray support will be added.

Future Plans
• Evolve into PlExDB, a comprehensive Plant Expression Database.
• Support other major plant species: maize, rice, soybean, and wheat.
• Support spotted cDNA and long-oligo microarray platforms.
• Analysis and visualization tool development.
• Cross-experiment, cross-platform, and cross-species data analysis.
• Exemplar annotation with Gene Ontology and pathway information.
Figure 3. Gene List Creation, Management & Analysis

Acknowledgments
The BarleyBase project is funded by USDA National Research Initiative (NRI) grant no. 02-35300-12619 and by the USDA-CSREES North American Barley Genome Project. PlantGDB, Gramene, GrainGenes, KEGG, and TAIR share tools and genomic data. NASCArrays and TAIR share Arabidopsis ATH1 GeneChip data. BarleyBase is hosted at the Iowa State University Virtual Reality Applications Center. Exemplar sequences and BLASTX NR annotations were provided by HarvEST:Barley.
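The hierarchical data model described under BarleyBase Data Model (experiment, then treatments, then replicate samples, then hybridizations, with protocols attached at the hybridization level) can be sketched with Python dataclasses. This is an illustration under assumptions, not the BarleyBase MySQL schema. The class names, fields, accession strings, and the `validate` check for the "one or more" rules are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional

# Hypothetical in-memory model of the experiment hierarchy; not the BarleyBase schema.


@dataclass
class Hybridization:
    accession: str
    cel_file: str                    # raw GeneChip data file for this hybridization
    protocol: Optional[str] = None   # protocols attach at the hybridization level


@dataclass
class Sample:
    accession: str
    hybridizations: List[Hybridization] = field(default_factory=list)


@dataclass
class Treatment:
    name: str
    samples: List[Sample] = field(default_factory=list)  # biological replicates


@dataclass
class Experiment:
    accession: str
    title: str
    treatments: List[Treatment] = field(default_factory=list)

    def hybridizations(self) -> Iterator[Hybridization]:
        """Yield every hybridization by walking experiment -> treatment -> sample."""
        for t in self.treatments:
            for s in t.samples:
                yield from s.hybridizations

    def replicate_counts(self) -> dict:
        """Number of samples (replicates) per treatment."""
        return {t.name: len(t.samples) for t in self.treatments}

    def validate(self) -> List[str]:
        """Check the 'one or more' rules at each level; return a list of problems."""
        problems = []
        if not self.treatments:
            problems.append(f"{self.accession}: no treatments")
        for t in self.treatments:
            if not t.samples:
                problems.append(f"{t.name}: no samples")
            for s in t.samples:
                if not s.hybridizations:
                    problems.append(f"{s.accession}: no hybridizations")
        return problems


if __name__ == "__main__":
    # Hypothetical accession numbers, for illustration only.
    exp = Experiment("E1", "Hypothetical infection study", [
        Treatment("mock", [Sample("S1", [Hybridization("H1", "h1.CEL")]),
                           Sample("S2", [Hybridization("H2", "h2.CEL")])]),
        Treatment("inoculated", [Sample("S3", [Hybridization("H3", "h3.CEL")])]),
    ])
    print(sum(1 for _ in exp.hybridizations()))  # 3
    print(exp.replicate_counts())  # {'mock': 2, 'inoculated': 1}
    print(exp.validate())  # []
```

Nesting the lists this way mirrors the one-to-many relationships in the model. A relational store would instead keep one table per level, with a foreign key to the parent.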
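RMA, one of the two normalizations listed under Data Acquisition & Processing, combines three steps: background correction, quantile normalization across arrays, and median-polish summarization of log2 probe intensities. The Python sketch below covers only the quantile-normalization step, on a toy matrix of probe intensities. The function name `quantile_normalize` is hypothetical. Ties are broken by array position, which is a simplification of how established implementations handle them.

```python
def quantile_normalize(arrays):
    """Quantile-normalize a list of arrays (each a list of probe intensities).

    Afterwards every array has the same distribution: the value at rank r in
    each array is replaced by the mean of the rank-r values across all arrays.
    Ties are broken by position (a simplification).
    """
    if not arrays:
        return []
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of probes")
    # For each array, the probe indices in ascending order of intensity.
    orders = [sorted(range(n), key=a.__getitem__) for a in arrays]
    # Mean of the rank-r values across all arrays (the reference distribution).
    rank_means = [
        sum(a[o[r]] for a, o in zip(arrays, orders)) / len(arrays) for r in range(n)
    ]
    # Put the reference value for each rank back at that probe's original position.
    result = []
    for o in orders:
        out = [0.0] * n
        for r, idx in enumerate(o):
            out[idx] = rank_means[r]
        result.append(out)
    return result


if __name__ == "__main__":
    # Two toy arrays with three probes each.
    print(quantile_normalize([[5, 2, 3], [4, 1, 6]]))
    # [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

For real GeneChip data, the R/Bioconductor tools named on this poster are the practical route. For example, the `affy` package's `rma()` function runs all three RMA steps on an AffyBatch read from CEL files.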