Bioinformatics tools for biologists @ the EBI: An overview
Bioinformatics • The science of storing, retrieving and analyzing large amounts of biological information • An interdisciplinary science, involving biologists, computer scientists and mathematicians • At the heart of modern biology 2 EBI Overview
“Large-scale” focus • Data explosion and new types of data • High-throughput biology • Emphasis on systems, not reductionism • Large community of users with no training in bioinformatics • Growth of applied biology – molecular medicine, agriculture, food, environmental sciences…
What is EMBL-EBI? • Based on the Wellcome Trust Genome Campus near Cambridge, UK • Part of the European Molecular Biology Laboratory • Non-profit organization
The EBI’s mission • To provide freely available data and bioinformatics services to all facets of the scientific community in ways that promote scientific progress • To contribute to the advancement of biology through basic investigator-driven research in bioinformatics • To provide advanced bioinformatics training to scientists at all levels, from PhD students to independent investigators • To help disseminate cutting-edge technologies to industry
Databases and tools www.ebi.ac.uk
New types of data • Literature and ontologies • Genomes • Protein sequence • DNA & RNA sequence • Protein structure • Gene expression • Chemical entities • Protein families, motifs and domains • Protein interactions • Pathways • Systems
Databases: molecules to systems • Genomes: Ensembl, Ensembl Genomes, EGA • Nucleotide sequence: EMBL-Bank • Literature and ontologies: CiteXplore, GO • Protein families, motifs and domains: InterPro • Microarray & gene expression data: ArrayExpress • Protein interactions: IntAct • Protein structure: PDBe • Pathways: Reactome • Proteomes: UniProt, PRIDE • Chemical entities: ChEBI • Systems: BioModels
Database collaborations
Standards development – international collaborations • Genomics Standards Consortium (GSC): http://gensc.org • Genome annotation: www.geneontology.org • Protein sequence: www.uniprot.org • Nucleotide sequence: www.insdc.org • Microarray and Gene Expression Data (MGED): www.mged.org • Cheminformatics: www.ebi.ac.uk/chebi • HUPO Proteomics Standards Initiative (PSI): www.psidev.info • Pathways: www.reactome.org, www.biopax.org • Metabolomics Standards Initiative (MSI): www.metabolomicssociety.org • Protein structure: www.wwpdb.org • Systems modeling standards: www.sbml.org
EBI website: www.ebi.ac.uk • Databases • Tools
EBI search engine: EB-eye • Search all main databases in one go
Nucleotides: European Nucleotide Archive (ENA) • ENA provides a comprehensive, accessible and publicly available repository for nucleotide sequence data • Collaboration with GenBank and DDBJ for data sharing • It consolidates information from EMBL-Bank, the European Trace Archive (containing raw data from electrophoresis-based sequencing machines) and the Sequence Read Archive (containing raw data from next-generation sequencing platforms) • Provides access to the whole scale of sequencing information: from raw data, through assembly and mapping information, through to high-level functional annotation (see figure).
Nucleotides: ENA • Download data • Navigate to view related data, e.g. taxon-specific data • Other types of data include SRA experiments
Genomes: Ensembl & Ensembl Genomes • Genome browser providing free access to the complete sequences of higher and model organisms • With Ensembl you can: ✓ Retrieve all or part of a genome sequence ✓ Perform sequence alignment using BLAST or BLAT ✓ Link to genome annotation from microarray results ✓ View expressed mRNA, protein, etc. in a chromosomal region ✓ View variations such as SNPs across strains or populations ✓ View all alternative splicing for a gene ✓ Explore homologues and phylogenetic trees across > 30 species ✓ View conserved regions across species • Ensembl Genomes extends to non-vertebrate genomes
Genomes: Ensembl • Pick a genome • Chromosomes • Genes • Genomic alignments • Synteny • Gene families • SNPs • Orthology • Across species • Within species
Genomes: Ensembl Genomes • Ensembl-like genome browser for non-vertebrate species • Ensembl Metazoa • Ensembl Bacteria • Across species • View options: you can choose to view only the current gene or the entire expanded gene tree • Select Orthologue view to see putative orthologues
Retrieving data with BioMart • BioMart is a search engine that can be used to download data into a table format • Many EBI databases are powered by BioMart • For example, you can use Ensembl BioMart to retrieve: ✓ All the genes for one species ✓ Or… only genes on one specific region of a chromosome ✓ Or… genes on one region of a chromosome associated with an InterPro domain ✓ Or… etc.
BioMart – how it works • First step: Choose a dataset • Second step: Add filters to define a gene set • Third step: Add attributes to determine column output
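The three steps (dataset, filters, attributes) map directly onto the XML query document that BioMart servers accept. The sketch below only builds such a document; it does not contact any server. The dataset, filter and attribute names used here (`hsapiens_gene_ensembl`, `chromosome_name`, `start`, `end`, `ensembl_gene_id`, `external_gene_name`) and the helper `build_biomart_query` are illustrative assumptions, not taken from the slides, and should be checked against the BioMart instance you are using.

```python
import xml.etree.ElementTree as ET


def build_biomart_query(dataset, filters, attributes):
    """Build a BioMart-style XML query string (hypothetical helper).

    dataset    -- step 1: name of the dataset to query
    filters    -- step 2: dict of filter name -> value, restricting the gene set
    attributes -- step 3: list of attribute names, one per output column
    """
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",       # tab-separated table output
        "header": "1",            # include a header row
        "uniqueRows": "1",        # drop duplicate rows
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": str(value)})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


# Assumed example: human genes in a 1 Mb region of chromosome 1.
xml_query = build_biomart_query(
    dataset="hsapiens_gene_ensembl",
    filters={"chromosome_name": "1", "start": "1000000", "end": "2000000"},
    attributes=["ensembl_gene_id", "external_gene_name"],
)
print(xml_query)
```

The resulting string would typically be submitted to a BioMart web service; the service address is not given in the slides, so it is deliberately left out here.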
BioMart results
www.biomart.org
ArrayExpress & Atlas of Gene Expression • ArrayExpress Archive is a public repository of functional genomics experiments, including gene expression, supporting scientific publications • You can query it to retrieve experimental information and download functional genomics data • Atlas of Gene Expression contains a subset of curated and re-annotated Archive data • Can be queried for individual gene expression under different biological conditions across experiments
Transcriptomes: ArrayExpress Archive: browse experiments • Search by keyword • Expand results • Spreadsheets describing the experiment, sample properties or array design
Transcriptomes: Atlas of Gene Expression • Atlas interface • Search by gene name or biological condition • Gene summary page • Experiment page
Protein sequence: UniProt • Provides the scientific community with a comprehensive, richly curated, high-quality and freely accessible resource of protein sequence and functional information • Users can perform simple and complex text-based queries, run sequence-based searches, perform multiple sequence alignments, etc. • Consists of: ✓ UniProtKB/Swiss-Prot, manually annotated ✓ UniProtKB/TrEMBL, computationally analyzed records ✓ UniRef, clustered by sequence identity ✓ UniParc, the most comprehensive publicly available non-redundant protein sequence database, unannotated ✓ UniMES, protein sequences from metagenomic and environmental data
UniProt text search for Brca1
Protein families, motifs & domains: InterPro • Integrated documentation resource for protein families, domains and functional sites • Protein signatures from different member databases describing the same biological protein family or domain are united into a single InterPro entry containing information about the signature(s) and links to the protein in UniProt • Links to Gene Ontology indicate the biological function and process that the proteins are involved in
Protein families, motifs and domains: InterPro • Compare methods of protein signature prediction • Visualize the taxonomic range for a protein signature • View architectures of proteins containing a signature
Molecular interaction database: IntAct • IntAct provides a freely available, open source database system and analysis tools for protein interaction data. • All interactions are derived from literature curation or direct user submissions • With IntAct you can: ✓ Find molecules that interact with your protein of interest ✓ Display interaction networks ✓ Analyze interaction networks using GO terms, molecule type, role, etc. ✓ Download data ✓ Install the IntAct system locally
The Protein Data Bank in Europe (PDBe) • PDBe is a resource for the collection, organization and dissemination of data about biological macromolecular structures • A suite of web-based services is available: ✓ PDBeView and PDBeLite provide a flexible and user-friendly query interface to the PDBe database ✓ PDBeAnalysis provides searches and statistical analyses of macromolecular structure and residue information ✓ PDBeFold performs pairwise or multiple comparisons as well as 3D alignments of structures ✓ PDBeChem lets you search for and visualize any molecule in the PDB’s ligand dictionary ✓ PDBePisa is an interactive tool for exploring macromolecular interfaces and surfaces, predicting probable quaternary structures (assemblies) and searching the PDB for structurally similar interfaces and assemblies ✓ PDBeMotif allows complex searches of the PDB based on small 3D motifs, sequence motifs in conjunction with ligand environment, secondary structure patterns ✓ Many more tools available
Structures: PDBe • Sequence mapping • Linking to domain data • Ligands • Assemblies • Electron density visualization • Active sites • Fold matching • Surface matching
PRoteomics IDEntifications database (PRIDE) • PRIDE is a centralized, standards-compliant, public data repository for proteomics data • Provides the proteomics community with a public repository for protein and peptide identifications together with the evidence supporting these identifications. • PRIDE is also able to capture details of post-translational modifications coordinated relative to the peptides in which they have been found.
Enzymes: IntEnz • IntEnz (Integrated relational Enzyme database) is a freely available resource focused on enzyme nomenclature. • IntEnz contains the recommendations of the Nomenclature Committee of the IUBMB on the nomenclature and classification of enzyme-catalysed reactions.
Chemical entities: ChEBI • ChEBI is a freely available, manually annotated database of small molecular entities • A molecular entity is any constitutionally or isotopically distinct atom, molecule, ion pair, radical ion, complex, conformer, etc., identifiable as a separately distinguishable entity, not directly encoded by the genome • With ChEBI you can: ✓ Find the correct chemical terminology using name, formula or registry number ✓ Visualize chemical structures ✓ Perform similarity searches ✓ View the relationship between molecules using the ChEBI ontology ✓ Bridge the gap between small molecules and the macromolecules they interact with (crosslink to UniProt and Reactome) ✓ Download chemical structures ✓ Submit new structures
Chemical entities: ChEBI • View structure, nomenclature, formula and more • View relationships in the ChEBI Ontology • View mappings to other databases such as Reactome and UniProt • Link to other databases • Download flat files, database dumps and the ChEBI Ontology for local installation
Chemogenomics: ChEMBL • ChEMBL is a publicly available database of drugs, drug-like small molecules and their targets • The data includes information about how small molecules bind to their targets, how these compounds affect cells and whole organisms, and information on the molecules’ absorption, distribution, metabolism, excretion and toxicity. • ChEMBL holds two-dimensional structures, calculated molecular properties (e.g. logP, molecular weight, Lipinski ‘Rule of Five’ parameters) and bioactivity data (such as binding constants and pharmacology). • The bioactivity data is tagged to show links between molecular targets and published assays, with a set of varying confidence levels. • Additional data on the clinical progress of compounds is being integrated into ChEMBL.
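To make the Lipinski ‘Rule of Five’ parameters concrete, the sketch below counts rule violations from four calculated properties, using the commonly quoted thresholds (molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors, with one violation usually tolerated). The `Compound` record, the function names and the property values are illustrative assumptions; this does not use, or claim to reproduce, how ChEMBL stores or computes these values.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    """Hypothetical record of calculated properties for one molecule."""
    name: str
    mol_weight: float       # molecular weight in daltons
    logp: float             # octanol-water partition coefficient (log10)
    h_bond_donors: int
    h_bond_acceptors: int


def rule_of_five_violations(c):
    """Count how many of the four Lipinski thresholds the compound exceeds."""
    checks = [
        c.mol_weight > 500,
        c.logp > 5,
        c.h_bond_donors > 5,
        c.h_bond_acceptors > 10,
    ]
    return sum(checks)


def passes_rule_of_five(c, max_violations=1):
    """Treat a compound as drug-like if it has at most `max_violations`."""
    return rule_of_five_violations(c) <= max_violations


# Approximate, illustrative values only.
aspirin = Compound("aspirin", mol_weight=180.16, logp=1.2,
                   h_bond_donors=1, h_bond_acceptors=4)
# Entirely made-up compound that breaks every threshold.
bulky = Compound("hypothetical-bulky", mol_weight=800.0, logp=6.5,
                 h_bond_donors=6, h_bond_acceptors=12)

for compound in (aspirin, bulky):
    print(compound.name,
          rule_of_five_violations(compound),
          passes_rule_of_five(compound))
```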
Chemogenomics: ChEMBL
Pathways: Reactome • A free, online, open-source curated database of pathways and reactions in human biology • Information in the database is authored by expert biologist researchers, maintained by Reactome editorial staff • Used to infer orthologous events in 22 non-human species including mouse, rat, chicken, puffer fish, worm, fly, yeast • Extensively cross-referenced to other resources e.g. NCBI, Ensembl, UCSC Genome Browser, UniProt, PubMed, KEGG, ChEBI and GO.
Pathways: Reactome • Select a pathway • View reactions and events in detail • Compare events in different species • Export pathway
Pathways: Reactome • Display expression data • Link to source databases
Biological ontologies: Gene Ontology (GO) • The GO project is a collaborative effort to address the need for consistent descriptions of gene products in different databases • GO develops ontologies that describe biological processes, cellular components and molecular functions in a species-independent manner • Several of the EBI’s databases are also annotated with GO terms
User support • 2Can bioinformatics user support – www.ebi.ac.uk/2Can • Online help pages – www.ebi.ac.uk/help • E-mail support – www.ebi.ac.uk/support
http://www.ebi.ac.uk/Information/Brochures/
Research www.ebi.ac.uk/groups
Key facts about research • The EBI provides a unique environment for bioinformatics research • Seven dedicated research groups aim to understand biology through new approaches to interpreting biological data • Services teams also carry out R&D to enhance existing services and develop new ones • Research program complements services and the two are mutually supportive
Research • Functional genomics and small RNA analysis (Enright) • Vertebrate genome annotation (Flicek) • Literature analysis and semantic data integration in life science research (Rebholz-Schuhmann) • Algorithmic methods for genome analysis (Birney) • Transcriptome analysis on a genomic scale (Brazma) • Genome analysis using evolutionary tools (Goldman) • Evolutionary biology (Marioni) • Protein sequence analysis and functional annotation (Apweiler) • Analysis of protein structure, function and evolution (Thornton) • Analysis and validation of protein structures; protein–ligand interactions (Kleywegt) • Cheminformatics and metabolism (Steinbeck) • Chemogenomics and drug discovery (Overington) • Genome-scale analysis of regulatory systems (Luscombe) • Neurobiology networks and systems (Le Novère) • Systems Biomedicine (Saez-Rodriguez) • Mammalian stem cell differentiation and development (Bertone)
Training www.ebi.ac.uk/training
A tripartite user-training programme • Training comes to you: www.ebi.ac.uk/training/roadshow • Training any time, anywhere, at any pace: www.ebi.ac.uk/training/elearning • Hands-on user training on all our core data resources for researchers: www.ebi.ac.uk/training/handson
Hands-on training for all levels of experience • Interactive training in our purpose-built IT training suite at EMBL-EBI, Hinxton, Cambridge • Learn from the EBI’s experts through a combination of talks and practical exercises • Take a tour of all our core data resources, or focus in on specific data types • Full programme at www.ebi.ac.uk/training/handson
eLearning project – pilot phase • Do you want to learn at your own pace at a time that suits you? • We are developing a new eLearning platform and need our users to help us test it • If you would like to get involved, contact: elearning@ebi.ac.uk