AMBIT Software for Data Management and (Q)SAR Applications
Nina Jeliazkova, Institute for Parallel Processing, Bulgarian Academy of Sciences, Sofia, Bulgaria (e-mail: nina@acad.bg)
Joanna Jaworska, Central Product Safety, Procter and Gamble, Belgium
QSAR 2006, 8-12 May, Lyon
AMBIT is available online at http://ambit.acad.bg
Introduction: why AMBIT?
- The limited availability of free, publicly accessible, methodologically transparent software was identified as one of the roadblocks to broader use of in-silico methods (ICCA Workshop in Setubal 2002, OECD)
- Efficient use of existing information on chemicals requires better ways of:
  - Storage: standardized formats, computer-automated verification of structures, capacity to store large amounts of data
  - Taking advantage of the rapidly evolving field of data mining and extraction of relevant information
Content
- Overview of AMBIT functional modules
- Technology choice and software capabilities
- Demonstration of the current state
  - Web application
    - Online similarity search
  - Standalone applications
    - AMBIT Database Tools
      - Descriptor search
      - Experimental data search
      - Similarity search
      - Verhaar classification scheme
    - AMBIT Discovery
      - Applicability domain
      - Grouping by different methods
Software overview (diagram)
- Data import and export, format conversions
- Similarity assessment
- Database
- Search engine: searches by CAS, SMILES and name; substructure search; similarity search
- Applicability domain
Diagram labels: EM 9-1, 2, 3; EM 9-1b; EM 9-1a, b, 2, 3; EM 9-1a
AMBIT Database today
- Not restricted to these datasets! Any dataset can be imported (e.g. DSSTox, AQUIRE, LLNA dataset …)
AMBIT: more about the internals
- Open source, relying on open standards
- Modular approach
- Standalone and web versions
- Implemented in Java, i.e.
  - Platform independent (the same application runs on Windows, Unix, Mac …)
  - Suitable for web applications
- The cheminformatics functionality relies on the open source Java library The Chemistry Development Kit (http://cdk.sourceforge.net/)
- The software is based on a relational database management system
  - Allows much faster and more convenient access to the data than flat text files
  - Our choice is the MySQL database (www.mysql.com), the most popular open source relational database
- Chemical Markup Language (CML)
  - An acknowledged method of encoding chemical data in XML
  - Being adopted by a large number of chemical organisations, from government through commercial to academia
  - Choosing CML as the internal format makes the database independent of the software that accesses it, in contrast to some proprietary solutions
AMBIT: information stored
- Structures, stored internally in (compressed) CML format, allowing transparent and easy storage of 1D, 2D or 3D representations (including mixtures)
- Multiple 3D structures per compound
- Identifiers (SMILES, InChI, CAS or other registry numbers; an unlimited number of arbitrary identifiers and synonyms)
- Inventory indicator
- Descriptors (an unlimited number of arbitrary descriptors)
- Experimental data (flexible templates for experimental data)
- QSAR models
- Literature references
- Fingerprints and atom environments for fast substructure and similarity search
- Other information generated to accelerate specific queries
The complete documentation of the AMBIT Database is available at http://ambit.acad.bg/docs
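The slide states that structures are kept as compressed CML but not how they are compressed. The sketch below assumes GZIP from the JDK's java.util.zip, applied to the UTF-8 text of a CML document, which is one straightforward way to keep CML in a binary database column. `CmlStore` and the ethanol record are hypothetical illustrations, not AMBIT code; the CML fragment follows the usual molecule/atomArray/bondArray layout with hydrogens left implicit.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper: packs CML text into bytes for a binary column and restores it. */
final class CmlStore {
    private CmlStore() {}

    /** Ethanol (heavy atoms only, hydrogens implicit) as minimal CML. */
    static final String ETHANOL_CML =
        "<molecule xmlns=\"http://www.xml-cml.org/schema\" id=\"ethanol\">"
        + "<atomArray>"
        + "<atom id=\"a1\" elementType=\"C\"/>"
        + "<atom id=\"a2\" elementType=\"C\"/>"
        + "<atom id=\"a3\" elementType=\"O\"/>"
        + "</atomArray>"
        + "<bondArray>"
        + "<bond atomRefs2=\"a1 a2\" order=\"1\"/>"
        + "<bond atomRefs2=\"a2 a3\" order=\"1\"/>"
        + "</bondArray>"
        + "</molecule>";

    /** GZIP-compresses the UTF-8 bytes of a CML document (assumed compression scheme). */
    static byte[] compress(String cml) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the GZIP stream writes its trailer, so the buffer is read only afterwards.
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(cml.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Restores the original CML text from its compressed form. */
    static String decompress(byte[] packed) {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(packed))) {
            return new String(gzip.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the stored payload is plain CML once decompressed, any CML-aware tool can read it back, which is the software independence argued for on the previous slide. For very small molecules the compressed form can be larger than the text; the gain appears with 3D coordinates and larger records.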
AMBIT Database schema (diagram): Compounds Repository, Descriptors Repository, Experimental Results Repository, QSAR models Repository, Literature References Repository, Users Repository, Queries
AMBIT: selected functionalities
- Input/output of chemical compounds, descriptors, experimental data and QSAR models (many file formats)
- Search
  - Simple search (CAS, SMILES, chemical name)
  - Descriptor search
  - Experimental data search
  - Substructure and similarity search
- Grouping
  - Verhaar classification scheme
  - Similarity (see J. Jaworska's presentation tomorrow)
- QSAR applicability domain assessment
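AMBIT keeps fingerprints to speed up substructure search. The deck does not describe the exact scheme, so the following is a minimal sketch of the common screening idea, assuming a fingerprint in which every feature of a substructure is also a feature of any molecule containing it (as with path-based hashed fingerprints). Under that assumption a target can contain the query only if all query bits are also set in the target, so most of the library is rejected cheaply before an exact subgraph-isomorphism check, which is not shown. `SubstructureScreen` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;

/** Hypothetical fingerprint pre-filter for substructure search. */
final class SubstructureScreen {
    private SubstructureScreen() {}

    /** Builds a fingerprint with the given bit positions set. */
    static BitSet bits(int... positions) {
        BitSet fp = new BitSet();
        for (int p : positions) fp.set(p);
        return fp;
    }

    /**
     * Necessary but not sufficient condition for a substructure match:
     * every bit set in the query fingerprint must also be set in the target.
     */
    static boolean mayContain(BitSet query, BitSet target) {
        BitSet uncovered = (BitSet) query.clone();
        uncovered.andNot(target); // query bits that the target lacks
        return uncovered.isEmpty();
    }

    /** Returns the ids that pass the screen, in the library's iteration order. */
    static List<String> candidates(BitSet query, Map<String, BitSet> library) {
        List<String> survivors = new ArrayList<>();
        for (Map.Entry<String, BitSet> entry : library.entrySet()) {
            if (mayContain(query, entry.getValue())) survivors.add(entry.getKey());
        }
        return survivors;
    }
}
```

Only the survivors need the expensive atom-by-atom comparison; a fingerprint screen can produce false positives but, under the stated assumption, no false negatives.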
AMBIT Online: similarity search
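The online similarity search compares a query structure with the stored compounds. The slide does not say which fingerprint or cut-off is used, so the sketch below assumes binary fingerprints compared with the Tanimoto coefficient (common bits divided by bits set in either fingerprint), a user-chosen similarity threshold, and results ranked from most to least similar. `SimilaritySearch`, `Hit` and the threshold parameter are hypothetical; Tanimoto distance, listed later among the AMBIT Discovery methods, is included as 1 minus the similarity.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Hypothetical Tanimoto-based similarity search over stored fingerprints. */
final class SimilaritySearch {
    private SimilaritySearch() {}

    /** One search result: compound id and its similarity to the query. */
    static final class Hit {
        final String id;
        final double similarity;
        Hit(String id, double similarity) { this.id = id; this.similarity = similarity; }
    }

    /** Builds a fingerprint with the given bit positions set. */
    static BitSet bits(int... positions) {
        BitSet fp = new BitSet();
        for (int p : positions) fp.set(p);
        return fp;
    }

    /** Tanimoto coefficient; two empty fingerprints score 0 by convention in this sketch. */
    static double tanimoto(BitSet a, BitSet b) {
        BitSet common = (BitSet) a.clone();
        common.and(b);
        int both = common.cardinality();
        int union = a.cardinality() + b.cardinality() - both;
        return union == 0 ? 0.0 : (double) both / union;
    }

    /** Tanimoto distance, defined as 1 - similarity. */
    static double tanimotoDistance(BitSet a, BitSet b) {
        return 1.0 - tanimoto(a, b);
    }

    /** Returns compounds with similarity >= threshold, most similar first. */
    static List<Hit> search(BitSet query, Map<String, BitSet> library, double threshold) {
        List<Hit> hits = new ArrayList<>();
        for (Map.Entry<String, BitSet> entry : library.entrySet()) {
            double s = tanimoto(query, entry.getValue());
            if (s >= threshold) hits.add(new Hit(entry.getKey(), s));
        }
        hits.sort(Comparator.comparingDouble((Hit h) -> h.similarity).reversed());
        return hits;
    }
}
```

For example, a query with bits {0, 1, 2, 3} and a compound with bits {0, 1, 2} share 3 bits out of 4 set in either, giving a Tanimoto similarity of 0.75.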
AMBIT Online: query result
Links to other databases: KEGG
Information about QSAR models
AMBIT Database Tools: standalone application
AMBIT user interface example: search by descriptor ranges
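A descriptor range search returns the compounds whose descriptor values all fall inside user-given intervals. In AMBIT the data live in MySQL, where such a query would typically use a `BETWEEN min AND max` condition per descriptor; to stay self-contained, the sketch below applies the same logic to an in-memory map. `DescriptorRangeSearch`, `Range` and the descriptor names used in the usage note are hypothetical, and compounds lacking a constrained descriptor are assumed not to match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory version of a search by descriptor ranges. */
final class DescriptorRangeSearch {
    private DescriptorRangeSearch() {}

    /** Closed interval [min, max] for one descriptor. */
    static final class Range {
        final double min;
        final double max;
        Range(double min, double max) {
            if (min > max) throw new IllegalArgumentException("min > max");
            this.min = min;
            this.max = max;
        }
        boolean contains(double value) { return value >= min && value <= max; }
    }

    /**
     * Returns ids of compounds whose every constrained descriptor lies within its range.
     * A compound without a value for a constrained descriptor is excluded.
     */
    static List<String> search(Map<String, Map<String, Double>> compounds, Map<String, Range> criteria) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Map<String, Double>> compound : compounds.entrySet()) {
            boolean matches = true;
            for (Map.Entry<String, Range> criterion : criteria.entrySet()) {
                Double value = compound.getValue().get(criterion.getKey());
                if (value == null || !criterion.getValue().contains(value)) {
                    matches = false;
                    break;
                }
            }
            if (matches) hits.add(compound.getKey());
        }
        return hits;
    }
}
```

Usage: with criteria logP in [1, 4] and MW in [100, 200], a compound with logP 2.1 and MW 150 matches, while one with logP 5.3 or one with no logP value does not.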
AMBIT Discovery: software for applicability domain and grouping
Methods:
- Descriptor space
  - Ranges
  - Euclidean distance
  - City-block distance
  - Probability density
  - Options
    - Threshold
    - Preprocessing (e.g. PCA)
    - Center
    - More…
- Structural similarity
  - Fingerprints
    - Consensus fingerprint + Tanimoto distance
    - Consensus fingerprint + missing fragments
  - Atom environments
    - Consensus atom environments + Hellinger distance
    - kNN + Tanimoto distance
    - Ranking
Results from several methods can be combined.
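Two of the listed applicability domain ideas can be sketched in a few lines. The assumptions here are ours, not taken from AMBIT Discovery: the center is the mean of the training descriptors, the threshold is the largest training-set distance to that center, the consensus fingerprint is the bitwise OR of all training fingerprints, and "missing fragments" are query bits absent from that consensus. `ApplicabilityDomainSketch` and its methods are hypothetical; preprocessing such as PCA, probability density and the atom-environment methods are not shown.

```java
import java.util.BitSet;
import java.util.List;

/** Hypothetical sketch of distance-based and fingerprint-based applicability domain checks. */
final class ApplicabilityDomainSketch {
    private ApplicabilityDomainSketch() {}

    enum Metric { EUCLIDEAN, CITY_BLOCK }

    /** Distance between two descriptor vectors of equal length. */
    static double distance(double[] a, double[] b, Metric metric) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += (metric == Metric.EUCLIDEAN) ? d * d : Math.abs(d);
        }
        return (metric == Metric.EUCLIDEAN) ? Math.sqrt(sum) : sum;
    }

    /** Mean of the (non-empty) training set, used as the domain center. */
    static double[] centroid(List<double[]> training) {
        double[] center = new double[training.get(0).length];
        for (double[] x : training) {
            for (int i = 0; i < center.length; i++) center[i] += x[i];
        }
        for (int i = 0; i < center.length; i++) center[i] /= training.size();
        return center;
    }

    /** Assumed threshold: the largest training-set distance to the center. */
    static double threshold(List<double[]> training, double[] center, Metric metric) {
        double max = 0.0;
        for (double[] x : training) max = Math.max(max, distance(x, center, metric));
        return max;
    }

    /** A query is inside the domain if it is no farther from the center than the threshold. */
    static boolean inDomain(double[] query, List<double[]> training, Metric metric) {
        double[] center = centroid(training);
        return distance(query, center, metric) <= threshold(training, center, metric);
    }

    /** Builds a fingerprint with the given bit positions set. */
    static BitSet bits(int... positions) {
        BitSet fp = new BitSet();
        for (int p : positions) fp.set(p);
        return fp;
    }

    /** Assumed consensus fingerprint: bitwise OR of all training fingerprints. */
    static BitSet consensus(List<BitSet> training) {
        BitSet all = new BitSet();
        for (BitSet fp : training) all.or(fp);
        return all;
    }

    /** Query bits that never occur in the training set ("missing fragments"). */
    static BitSet missingFragments(BitSet query, BitSet consensus) {
        BitSet missing = (BitSet) query.clone();
        missing.andNot(consensus);
        return missing;
    }
}
```

The metric matters: for training points at the corners of the square (0,0) to (2,2), the query (2.5, 1.0) is 1.5 from the center under both metrics, which lies inside the city-block threshold of 2 but outside the Euclidean threshold of about 1.41. This is one reason to combine results from several methods.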
AMBIT Discovery: data visualisation
AMBIT Discovery: results (exported to an MS Excel file)
Similarity based on mechanistic understanding
Verhaar H.J.M., van Leeuwen C., Hermens J.L.M., Classifying Environmental Pollutants. 1: Structure-Activity Relationships for Prediction of Aquatic Toxicity, Chemosphere, Vol. 25, No. 4, pp. 471-491, 1992
- Verhaar scheme
- 34 rules
- 5 classes
  - Class 1: Narcosis or baseline toxicity
  - Class 2: Less inert compounds
  - Class 3: Unspecific reactivity
  - Class 4: Compounds and groups of compounds acting by a specific mechanism
  - Class 5: Not possible to classify according to these rules
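Structurally, the scheme maps a compound to one of five classes through a set of rules, with Class 5 reserved for compounds the rules cannot classify. The sketch below shows only that skeleton: an ordered rule list in which the first matching rule decides and Class 5 is the fallback. The evaluation order, the `Rule` type and the generic compound representation are assumptions; the 34 actual rules are not reproduced, and any predicate plugged in (for example a substructure test) is a placeholder.

```java
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical skeleton of a rule-based Verhaar classifier. */
final class VerhaarSketch {
    private VerhaarSketch() {}

    /** The five Verhaar classes, as listed on the slide. */
    enum VerhaarClass {
        CLASS_1("Narcosis or baseline toxicity"),
        CLASS_2("Less inert compounds"),
        CLASS_3("Unspecific reactivity"),
        CLASS_4("Compounds and groups of compounds acting by a specific mechanism"),
        CLASS_5("Not possible to classify according to these rules");

        final String description;

        VerhaarClass(String description) { this.description = description; }
    }

    /** One rule: a placeholder structural test and the class it assigns when it matches. */
    static final class Rule<T> {
        final String name;
        final Predicate<T> test;
        final VerhaarClass assigns;

        Rule(String name, Predicate<T> test, VerhaarClass assigns) {
            this.name = name;
            this.test = test;
            this.assigns = assigns;
        }
    }

    /** Applies rules in order; the first match decides, otherwise Class 5. */
    static <T> VerhaarClass classify(T compound, List<Rule<T>> rules) {
        for (Rule<T> rule : rules) {
            if (rule.test.test(compound)) return rule.assigns;
        }
        return VerhaarClass.CLASS_5;
    }
}
```

Keeping the rules as data rather than hard-coded branches fits the modular approach on the next slide: the same rule set can be plugged into different host applications.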
Verhaar scheme implementation
- Modular approach
- Can be used within:
  - AMBIT Database Tools
  - Toxtree, as an extension (http://ecb.jrc.it/qsar/toxtree)
Summary
- Many tools have been developed, and we are working on their seamless integration
- Both the standalone and the web application are in beta and are being extensively tested
- Synergies with other projects
  - The LRI Cefic gold standard BCF database will be stored in AMBIT
  - The LRI Cefic biotransformation database will be able to communicate with AMBIT BCF
  - ECB Cramer rules software for TTC (human health): Toxtree
  - Fraunhofer Institute subchronic toxicity database (human health)
- Approaches to similarity assessment will be further extended and tested in the context of category development/read-across (ECB-funded project)
- Open source software lowers the barrier for users, facilitates dissemination and enables the reproducibility of models and results
Acknowledgment
This work is funded by CEFIC LRI EEM-9 "Building blocks for a future (Q)SAR decision support system: databases, applicability domain and structure conversions".
The Chemistry Development Kit (http://cdk.sourceforge.net)
- CDK is a freely available open source Java library for structural chemo- and bioinformatics.
- It originated in, and is hosted by, the Research Group for Molecular Informatics at Cologne University's Bioinformatics Center.
- It is maintained and enhanced by more than 20 developers from academic and industrial institutions around the world.
- It is used in more than 10 different academic and industrial projects worldwide.
- It provides methods for many common tasks in molecular informatics:
  - SMILES parsing and generation
  - Substructure searching
  - 2D and 3D rendering of chemical structures
  - I/O routines (format conversions)
  - 3D builder
  - QSAR module, etc.
Thank you! Questions?