Overview of ChEMBL Database
Gareth Owen, ChEBI group, EMBL-EBI
The Jackson Laboratory, 18th October 2012
EBI is an Outstation of the European Molecular Biology Laboratory.
What is ChEMBL?
• Open access database for drug discovery
• Freely available (searchable and downloadable)
• Content:
  • 2D structures and calculated properties (logP, MW, Lipinski, etc.)
  • Associated bioactivity data extracted from the primary medicinal chemistry journals, such as J. Med. Chem.
  • Deposited data from neglected disease screening (e.g., malaria)
  • Subset of data from PubChem
  • Covers ~30 years of compound synthesis and testing
  • Annotated FDA-approved drugs
• Secure searching (https://www.ebi.ac.uk/chembldb)
ChEMBL Database
• Content (ChEMBL 14):
  Targets: 9,003
  Compounds: 1,376,469
  Activities: 10,129,256*
  Publications: 46,133
  * Includes ~5,900,000 from PubChem and ~100,000 from deposited malaria screening sets
• Assays are classified as:
  • Binding measurements
  • Functional assays
  • ADME/toxicity data
(Chart: ~60% proteins, ~20% organisms, ~20% cell lines)
ChEMBL Assays – Binding, Functional, ADMET
Binding Assays
• Assays which directly measure the binding of a compound to a particular target
• e.g., competition binding assays with a radioligand
• Various endpoints are measured, but the most commonly reported are:
  • IC50 (half maximal inhibitory concentration)
  • Ki (inhibition constant, a measure of binding affinity)
  • MIC (minimum inhibitory concentration)
  • % Inhibition (of activity)
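Because IC50 and Ki values span several orders of magnitude, they are often compared on a negative log scale (pIC50, pKi). A minimal sketch, assuming the value has already been standardised to nM; `p_activity` is a hypothetical helper, not part of ChEMBL:

```python
import math


def p_activity(value_nm: float) -> float:
    """Convert an IC50 or Ki given in nM to -log10 of the molar concentration."""
    if value_nm <= 0:
        raise ValueError("concentration must be positive")
    molar = value_nm * 1e-9  # 1 nM = 1e-9 M
    return -math.log10(molar)


for endpoint, nm in [("IC50", 100.0), ("Ki", 5.0)]:
    print(f"{endpoint} = {nm:g} nM -> p{endpoint} = {p_activity(nm):.2f}")
```

Higher p-values mean more potent compounds: a 10-fold drop in IC50 raises pIC50 by one unit.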
Functional Assays
• Whole organism assays (e.g., anti-infectives/anti-parasitics)
• Disease-derived cell line (e.g., human ovarian cancer cell line cytotoxicity)
• Tissue or cell-based disease model (e.g., glucose uptake by adipocytes)
• Tissue or cell-based assay for target effect (e.g., contraction of guinea-pig ileum)
• Cell-based assay over-expressing target (e.g., GPCR calcium mobilisation)
(Diagram axis: target association, increasing from whole organism assays to cell-based assays over-expressing the target)
ADMET Assays
• Assays measuring the Absorption, Distribution, Metabolism, Excretion and Toxicity properties of compounds
• Examples include:
  • Half-life of compound in rats
  • Tissue distribution of compound
  • Levels of metabolites
ChEMBL Targets
• Protein, e.g., PDE5
• Protein complex, e.g., nicotinic acetylcholine receptor
• Protein family, e.g., muscarinic receptors
• Nucleic acid, e.g., DNA
• Sub-cellular fraction, e.g., mitochondria
• Cell line, e.g., HEK293 cells
• Tissue, e.g., nervous tissue
• Organism, e.g., Drosophila
Protein Targets
• Each protein target is linked to a sequence in UniProt
• Information from UniProt is used in ChEMBL to allow searching by:
  • Protein name/description
  • Synonyms and gene names
  • Organism (and NCBI Taxonomy ID)
• Proteins in ChEMBL are also classified by family (e.g., receptor, kinase, protease, transporter)
  • Used for searching by target tree (Browse Targets)
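The target tree behind Browse Targets can be pictured as a tree of family nodes, where selecting a node returns every protein classified at or below it. A sketch with a hypothetical, heavily simplified hierarchy; the class name, node names and assignments are illustrative and are not ChEMBL's actual classification:

```python
from dataclasses import dataclass, field


@dataclass
class TargetClass:
    """A node in a (hypothetical) protein family tree."""
    name: str
    children: list["TargetClass"] = field(default_factory=list)
    targets: list[str] = field(default_factory=list)  # proteins classified at this node


def all_targets(node: TargetClass) -> list[str]:
    """Collect every protein at or below `node`, depth first."""
    found = list(node.targets)
    for child in node.children:
        found.extend(all_targets(child))
    return found


tree = TargetClass("Protein", children=[
    TargetClass("Receptor", children=[TargetClass("GPCR", targets=["Adenosine receptor A2a"])]),
    TargetClass("Kinase", targets=["IRAK1", "IRAK4"]),
    TargetClass("Protease", targets=["Thrombin"]),
    TargetClass("Transporter"),
])
print(all_targets(tree.children[1]))  # the Kinase branch only
print(all_targets(tree))              # everything under the root
```

A recursive walk keeps the sketch short; a very deep hierarchy would favour an explicit stack.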
ChEMBL Compounds
• Chemical structures are stored as .mol files
• If the stereochemistry is known, the compound is drawn as the specific enantiomer
• Tautomers of the same compound are treated as the same compound; the form shown is the one drawn in the paper
• Unique compounds are identified using Standard InChIs
• Salts and parent molecules are grouped together for displaying bioactivity data, although activity data are recorded against the specific salt form
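The last two bullets describe a lookup keyed on structure plus a display-time grouping. A minimal sketch under stated assumptions: the Standard InChI strings are placeholders, the compound IDs are invented, and each activity record is assumed to already carry a key for its parent; none of this reflects ChEMBL's internal schema:

```python
from collections import defaultdict


def register(std_inchis: list[str]) -> dict[str, str]:
    """Give each distinct Standard InChI one (hypothetical) compound ID."""
    ids: dict[str, str] = {}
    for inchi in std_inchis:
        if inchi not in ids:  # same Standard InChI -> same compound
            ids[inchi] = f"CMPD-{len(ids) + 1}"
    return ids


def group_by_parent(records: list[dict]) -> dict[str, list[dict]]:
    """Group activity records under their parent for display.

    Each record still keeps the form (parent or salt) it was measured on.
    """
    grouped: dict[str, list[dict]] = defaultdict(list)
    for rec in records:
        grouped[rec["parent"]].append(rec)
    return dict(grouped)


records = [  # hypothetical data
    {"form": "compound X", "parent": "X", "target": "A2a", "ki_nm": 12.0},
    {"form": "compound X hydrochloride", "parent": "X", "target": "A2a", "ki_nm": 15.0},
    {"form": "compound Y", "parent": "Y", "target": "A2a", "ki_nm": 340.0},
]
for parent, recs in group_by_parent(records).items():
    print(parent, [r["form"] for r in recs])
```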
ChEMBL Home Page
https://www.ebi.ac.uk/chembldb
ChEMBL Main Search Page
Drug Information
(Screenshot: clickable structure; parent and salt forms; links to small molecule resources at the EBI)
Click to display data
ChEBI Link
This will take you back to ChEMBL.
ChemSpider Links
The links work both ways: ChEMBL links to ChemSpider, and ChemSpider links back to ChEMBL. Both directions use the Standard InChI as the common identifier.
Wikipedia Links
ChEMBL is also linked with Wikipedia, again using the Standard InChI as the common identifier. These links point to the Compound Report Card in ChEMBL. They are added by a bot (CheMoBot) and can be updated with each release if required.
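Linking two resources on a shared Standard InChI amounts to a join on that key. A minimal sketch, assuming each resource can be reduced to a mapping from Standard InChI to its own identifier; the InChI keys and IDs below are hypothetical placeholders, not real ChEMBL or ChemSpider identifiers:

```python
def link_on_inchi(chembl: dict[str, str], other: dict[str, str]) -> dict[str, str]:
    """Pair identifiers from two resources that share a Standard InChI.

    Both arguments map Standard InChI -> identifier in that resource.
    """
    shared = chembl.keys() & other.keys()
    return {chembl[inchi]: other[inchi] for inchi in shared}


chembl_ids = {"INCHI-A": "CHEMBL-ID-1", "INCHI-B": "CHEMBL-ID-2"}  # hypothetical
chemspider_ids = {"INCHI-A": "CS-ID-9", "INCHI-C": "CS-ID-7"}      # hypothetical

forward = link_on_inchi(chembl_ids, chemspider_ids)  # ChEMBL -> ChemSpider
backward = {v: k for k, v in forward.items()}         # ChemSpider -> ChEMBL
print(forward, backward)
```

Inverting the dictionary for the return link assumes the mapping is one-to-one; if one resource held several records with the same Standard InChI, the reverse direction would need a list per key.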
Use Case 1 – Searching by Target
• What is known about chemical structures that bind to a specific protein (adenosine A2a)?
• What is known about their potency, selectivity and ADMET properties?
• Are there any protein structure data?
Use Case 1 – Searching by Target in ChEMBL
Choose sources to include in the search
Retrieving Bioactivity Data – Single Target
• 3D structures
• Bioactivity data for target
• Assay data for target
• Display all bioactivity data for target
• Click the pie chart to retrieve particular end-points
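The pie chart summarises how many bioactivity records exist per end-point, and clicking a slice narrows the data to that end-point. A sketch of the same two steps over a hypothetical list of records (the record layout is an assumption, not ChEMBL's export format):

```python
from collections import Counter

activities = [  # hypothetical bioactivity records for one target
    {"compound": "cmpd-1", "type": "Ki", "value": 8.0, "units": "nM"},
    {"compound": "cmpd-2", "type": "IC50", "value": 120.0, "units": "nM"},
    {"compound": "cmpd-3", "type": "Ki", "value": 45.0, "units": "nM"},
    {"compound": "cmpd-4", "type": "Inhibition", "value": 60.0, "units": "%"},
]

# The pie chart: number of records per end-point, and its share of the total.
counts = Counter(a["type"] for a in activities)
total = sum(counts.values())
for endpoint, n in counts.most_common():
    print(f"{endpoint}: {n} ({100 * n / total:.0f}%)")

# "Clicking a slice": keep only the records for one end-point.
ki_only = [a for a in activities if a["type"] == "Ki"]
```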
Filtering Bioactivities
• Select targets of interest
• Select the required activity types and define cut-offs, e.g., Ki < 100 nM
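The two filtering steps (choose targets, then an activity type with a cut-off) map directly onto a list comprehension. A minimal sketch, assuming values are already standardised to nM; rows in other units are skipped rather than converted, which is a simplification. The function name and data are hypothetical:

```python
def filter_activities(rows, targets, activity_type, max_value_nm):
    """Keep rows for the chosen targets and end-point below a cut-off (nM)."""
    return [
        r for r in rows
        if r["target"] in targets
        and r["type"] == activity_type
        and r["units"] == "nM"
        and r["value"] < max_value_nm
    ]


rows = [  # hypothetical data
    {"compound": "cmpd-1", "target": "Adenosine A2a", "type": "Ki", "value": 8.0, "units": "nM"},
    {"compound": "cmpd-2", "target": "Adenosine A2a", "type": "Ki", "value": 250.0, "units": "nM"},
    {"compound": "cmpd-3", "target": "Adenosine A1", "type": "Ki", "value": 30.0, "units": "nM"},
    {"compound": "cmpd-4", "target": "Adenosine A2a", "type": "IC50", "value": 12.0, "units": "nM"},
]
hits = filter_activities(rows, {"Adenosine A2a"}, "Ki", 100.0)  # Ki < 100 nM
print([r["compound"] for r in hits])
```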
Bioactivity Results
Results include compound structures, activity values, assay details, target details and references.
Selectivity Data
For example, ChEMBL can be searched for all data on compounds that have adenosine A2a Ki values < 100 nM.
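Once all data on the potent A2a compounds have been retrieved, a simple selectivity measure is the ratio of each compound's Ki at another target to its Ki at A2a. A sketch under assumptions: records are hypothetical, all Ki values are in nM, and when a compound/target pair has several Ki values the lowest is kept, which is an arbitrary choice (a mean would be equally defensible):

```python
def selectivity_ratios(rows, primary, cutoff_nm=100.0):
    """Ki(other target) / Ki(primary) for compounds with Ki < cutoff at `primary`."""
    best = {}  # (compound, target) -> lowest Ki in nM
    for r in rows:
        if r["type"] != "Ki":
            continue
        key = (r["compound"], r["target"])
        best[key] = min(r["value_nm"], best.get(key, float("inf")))

    potent = {c for (c, t), v in best.items() if t == primary and v < cutoff_nm}
    return {
        (c, t): v / best[(c, primary)]
        for (c, t), v in best.items()
        if c in potent and t != primary
    }


rows = [  # hypothetical data
    {"compound": "cmpd-1", "target": "A2a", "type": "Ki", "value_nm": 5.0},
    {"compound": "cmpd-1", "target": "A2a", "type": "Ki", "value_nm": 10.0},
    {"compound": "cmpd-1", "target": "A1", "type": "Ki", "value_nm": 500.0},
    {"compound": "cmpd-2", "target": "A2a", "type": "Ki", "value_nm": 250.0},
    {"compound": "cmpd-2", "target": "A1", "type": "Ki", "value_nm": 20.0},
]
print(selectivity_ratios(rows, "A2a"))
```

A ratio of 100 means the compound binds A2a about 100-fold more tightly than A1, judged by Ki.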
ADMET Data
Summary of ChEMBL bioavailability data for compounds with A2a Ki values < 100 nM (screenshot: example of bioavailability data)
Use Case 2 – Searching by Structure
• What compounds contain a particular substructure?
• What is known about their bioactivities?
• Are any of them known drugs or in clinical trials?
Search options:
• Name search, covering several types of synonyms:
  • Research codes
  • Trade names
  • INN, USAN
• Lists of identifiers
• Structure input using different sketchers
Similarity and Substructure Searching
Display/download bioactivity data
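Similarity searches of this kind are commonly based on structural fingerprints compared with the Tanimoto coefficient; the slide does not say which fingerprint or measure ChEMBL uses. A sketch assuming each structure has already been reduced to a set of "on" bit positions (the bits below are made up). Substructure search is a different operation, a subgraph match on the structures themselves, and is not shown:

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bits."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_search(query, library, threshold=0.6):
    """Return (name, score) pairs with score >= threshold, best first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    hits = [(name, s) for name, s in scored if s >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


query = {1, 4, 9, 16, 25}
library = {  # hypothetical fingerprints
    "cmpd-1": {1, 4, 9, 16, 30},
    "cmpd-2": {2, 3, 5},
    "cmpd-3": {1, 4, 9, 16, 25},
}
print(similarity_search(query, library))
```

The 0.6 threshold is only a placeholder; a sensible cut-off depends on the fingerprint used.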
Filtering Data on Lipinski and Other Properties
Display bioactivities of the subset
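Lipinski's rule of five flags compounds with molecular weight over 500, logP over 5, more than 5 hydrogen-bond donors or more than 10 hydrogen-bond acceptors. A sketch of filtering on those calculated properties, assuming they are already available per compound (the values below are hypothetical). Allowing one violation follows a common reading of the rule; set `max_violations=0` for a strict filter:

```python
from dataclasses import dataclass


@dataclass
class CalculatedProperties:
    mw: float    # molecular weight (Da)
    logp: float  # calculated logP
    hbd: int     # hydrogen-bond donors
    hba: int     # hydrogen-bond acceptors


def ro5_violations(p: CalculatedProperties) -> int:
    """Count how many of the four rule-of-five thresholds are exceeded."""
    return sum([p.mw > 500, p.logp > 5, p.hbd > 5, p.hba > 10])


def lipinski_filter(compounds, max_violations=1):
    """Keep compounds with at most `max_violations` rule-of-five violations."""
    return {name: p for name, p in compounds.items() if ro5_violations(p) <= max_violations}


compounds = {  # hypothetical values
    "cmpd-1": CalculatedProperties(mw=350.4, logp=2.1, hbd=2, hba=5),
    "cmpd-2": CalculatedProperties(mw=520.6, logp=3.0, hbd=2, hba=6),
    "cmpd-3": CalculatedProperties(mw=650.7, logp=6.2, hbd=1, hba=8),
}
print(sorted(lipinski_filter(compounds)))                    # one violation allowed
print(sorted(lipinski_filter(compounds, max_violations=0)))  # strict
```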
(Screenshots of compound data: structure, names, bioactivities, properties, cross-references, clinical trials)
Links to Other Resources
• PDBe: http://www.ebi.ac.uk/pdbe
Marketed Drugs
• Select the set of interest
• Export to Excel or export as SDF
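In an SD file, each record's data items appear as a header line of the form `> <NAME>`, followed by value lines and a blank line, and `$$$$` ends the record. A minimal reader for those data items, ignoring the structure block; the field names in the sample are hypothetical and not necessarily those in a ChEMBL export. For real work, a cheminformatics toolkit's SDF reader (e.g. RDKit's) is more robust:

```python
import re

FIELD_HEADER = re.compile(r">.*<([^>]+)>")  # e.g. "> <MOLWEIGHT>"


def read_sdf_fields(text: str) -> list[dict[str, str]]:
    """Return one dict of data items per SDF record."""
    records, fields, current = [], {}, None
    for line in text.splitlines():
        if line.strip() == "$$$$":           # end of one record
            records.append(fields)
            fields, current = {}, None
        elif (m := FIELD_HEADER.match(line)):
            current = m.group(1)             # start of a data item
            fields[current] = ""
        elif current is not None:
            if line.strip():                 # value line (may span several lines)
                fields[current] = f"{fields[current]}\n{line}" if fields[current] else line
            else:
                current = None               # blank line ends the data item
    return records


sample = "\n".join([
    "cmpd-1",
    "  placeholder",
    "",
    "  0  0  0  0  0  0  0  0  0  0999 V2000",  # empty placeholder structure
    "M  END",
    "> <ID>",
    "cmpd-1",
    "",
    "> <MOLWEIGHT>",
    "180.16",
    "",
    "$$$$",
    "",
])
print(read_sdf_fields(sample))
```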
Use Case 3 – Similar Targets
• Are there any data available on compounds that bind to proteins similar to IRAK2?
• For these compounds, what bioactivity data are there on compounds with related substructures?
• Are there any crystal structure data on these proteins?
Protein Sequence Search
• A more precise method for identifying related targets
• Input is a protein sequence of interest
• Uses the BLAST* algorithm to perform pairwise comparisons between the input sequence and all proteins in the Target Dictionary, to find the most closely related matches
• Results are scored according to similarity to the input sequence (determined by the number of amino acids that are identical or have similar properties)
*Altschul SF et al., J Mol Biol. 215(3):403–10 (1990)
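BLAST finds and scores local alignments using a substitution matrix such as BLOSUM62, which is beyond a short example. The sketch below only illustrates the scoring idea in the last bullet by counting identical and similar positions in a pair of sequences that are already aligned. The amino acid similarity groups are a simplified assumption, not BLAST's scoring scheme, and gap positions count towards the alignment length:

```python
# Simplified groups of amino acids with similar properties (an assumption).
SIMILAR_GROUPS = ["ILMV", "FWY", "KRH", "DE", "ST", "NQ", "AG", "C", "P"]
GROUP_OF = {aa: i for i, group in enumerate(SIMILAR_GROUPS) for aa in group}


def identity_and_similarity(a: str, b: str) -> tuple[float, float]:
    """Percent identical and percent similar positions of two aligned sequences ('-' = gap)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    identical = similar = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue                      # gaps score nothing
        if x == y:
            identical += 1
            similar += 1                  # identical residues also count as similar
        elif GROUP_OF.get(x) is not None and GROUP_OF.get(x) == GROUP_OF.get(y):
            similar += 1
    n = len(a)
    return 100 * identical / n, 100 * similar / n


print(identity_and_similarity("MKVLIT-E", "MRILLTAD"))
```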
Use Case 3 – Similar Targets
• Protein sequence of interest, e.g., from UniProt (http://www.uniprot.org)
• Result: data on IRAK1, IRAK3 and IRAK4, but not IRAK2
IRAK1, IRAK3 and IRAK4 Data
• Identify a substructure of interest
• What other data are available on compounds with this substructure?
Use Case 4 – Assay Keyword Search
• Some ChEMBL data (e.g., functional assays) may not be mapped to molecular targets
• You may want to perform a more general search (e.g., for a disease process, animal model or cell type of interest)
• Examples:
  1. What compounds have been tested in disease models (cholesterol lowering)?
  2. What data are available on brain penetration (brain to plasma ratio)?
Assay Search for “Cholesterol Lowering”
Assay Search for “Brain to Plasma”
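An assay keyword search is essentially a text match over assay descriptions. A sketch over hypothetical assay IDs and descriptions; treating spaces and hyphens between words alike is a design choice made here (so "brain to plasma" also finds "brain-to-plasma"), not a documented ChEMBL behaviour:

```python
import re


def keyword_search(descriptions: dict[str, str], phrase: str) -> list[str]:
    """Return IDs of assays whose description contains `phrase`, case-insensitively."""
    words = [re.escape(w) for w in phrase.split()]
    pattern = re.compile(r"\b" + r"[\s-]+".join(words) + r"\b", re.IGNORECASE)
    return [aid for aid, text in descriptions.items() if pattern.search(text)]


assays = {  # hypothetical assay descriptions
    "ASSAY-1": "Cholesterol lowering activity in hamster after oral dosing",
    "ASSAY-2": "Brain to plasma ratio in rat",
    "ASSAY-3": "Inhibition of human HMG-CoA reductase",
    "ASSAY-4": "Brain-to-plasma concentration ratio in mouse",
}
print(keyword_search(assays, "Cholesterol Lowering"))
print(keyword_search(assays, "Brain to Plasma"))
```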
Accessing ChEMBL Data
Useful Links
• ChEMBL Blog: http://chembl.blogspot.com
• If you would like help: chembl-help@ebi.ac.uk
• For ChEMBL news and data releases, subscribe to: http://listserver.ebi.ac.uk/mailman/listinfo/chembl-announce
Acknowledgements
ChEMBL Group: John Overington, Anne Hersey, Anna Gaulton, Mark Davies, Jon Chambers, Louisa Bellis, Kazuyoshi Ikeda, Patricia Bento, Shaun McGlinchey, Yvonne Light, Felix Krueger, Ben Stauch, Ruth Akhtar, Francis Atkinson, Rita Santos
EMBL-EBI: Samuel Kerrien, Sandra Orchard, Bruno Aranda, Rafael Jimenez, Reactome, UniProt and ChEBI teams
Collaborators: Imperial Cancer Research, University of Dundee, University of Cambridge, Sanger Centre, University of Maryland, NCBI, TDR, IUPHAR, Bayer-Schering, Pfizer, GSK, Schering-Plough, MMV, Novartis, St Jude Children’s Research Hospital
Former Inpharmatica colleagues
Exercises!