UNIVERSITÀ DEGLI STUDI DI MILANO, Facoltà di Scienze del Farmaco
Virtual screening and collaborative computing: a new frontier in drug discovery
Alessandro Pedretti
XI Congreso Venezolano de Química, Caracas, June 18, 2013

Overview
  • Collaborative computing applied in a computational chemistry laboratory.
  • The WarpEngine paradigm for distributing calculations across the local network.
  • Virtual screening setup: choosing the best software and parameters.
  • Two WarpEngine applications to evaluate its performance.
  • Short WarpEngine practical session.

What is collaborative computing?
Main definition: The term "collaborative computing" covers technologies and computing resources based on a network communication system that allows documents and projects to be shared between users. All activities are carried out on a variety of devices such as desktops, laptops, tablets and smartphones.
In a computational chemistry laboratory: The daily work of a computational chemist requires sharing not only information and data between users, but also hardware resources.

Typical scenario in a lab
[Diagram: Internet, firewall, servers, PCs and network devices on an Ethernet infrastructure, 100-1000 Mbit/s]
  • Several PCs with heterogeneous hardware and operating systems.
  • Very high computational power, but "fragmented" across the local network.
  • It is difficult to use all of that computational power to run a single complex calculation.

Main features
  • Parallel computing without the grid paradigm.
  • Client/server architecture with hot-plug capabilities.
  • Calculations can be run with different programs without changing the main code.
  • Expandable through scripting languages.
  • High-level database interface integrated in the main code, supporting the most common SQL database engines (Access, MySQL, SQLite, SQL Server, etc.).
  • Easy configuration through a graphic interface.
  • High performance and security.

What we need to develop WarpEngine
  • High-level database interface.
  • Fast customizable Web server.
  • Script engine.
  • Graphic environment.
[Diagram labels: Property calculation, Molecule editing, MM / MD calculations, Surface mapping, Trajectory analysis, File format conversion, Database engine, Graphic interface, Plug-in expandability, Scripting languages]

Server scheme
[Diagram components: Main program, VEGA ZZ core, PowerNet plug-in, Project manager, Job manager, Client manager, Database engine, UDP server, HTTP server, IP filter]
[Link to the clients: TCP/IP, HTTP, broadcast, with an optional encrypted tunnel provided by WarpGate]

Client scheme
[Diagram components: Main program, VEGA ZZ core, PowerNet plug-in, Project manager, Multithreaded worker, UDP client, HTTP client]
[Link to the server: TCP/IP, HTTP, broadcast]
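The client scheme names a multithreaded worker. The general pattern behind such a component, several threads pulling independent jobs from a shared queue, can be sketched in Python as follows. This is only an illustration of the idea and not WarpEngine code: `run_job`, `worker` and `process_all` are hypothetical names, and the dummy score stands in for launching an external program on one molecule.

```python
import queue
import threading


def run_job(molecule_id: int) -> float:
    """Hypothetical stand-in for one calculation (e.g. docking one molecule)."""
    return molecule_id * 0.5  # dummy "score", not a real result


def worker(jobs: queue.Queue, results: dict, lock: threading.Lock) -> None:
    """Pull jobs from the shared queue until it is empty."""
    while True:
        try:
            molecule_id = jobs.get_nowait()
        except queue.Empty:
            return  # no jobs left: the thread ends
        score = run_job(molecule_id)
        with lock:  # protect the shared results dictionary
            results[molecule_id] = score


def process_all(molecule_ids, n_threads: int = 4) -> dict:
    """Run every job once, spreading the work over n_threads threads."""
    jobs: queue.Queue = queue.Queue()
    for molecule_id in molecule_ids:
        jobs.put(molecule_id)
    results: dict = {}
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(jobs, results, lock))
        for _ in range(n_threads)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


results = process_all(range(10))
print(len(results), "jobs completed")
```

Because each job is independent, the same queue-based design extends naturally from threads on one PC to clients on a network, which is presumably why single-threaded programs can still benefit from this kind of distribution.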

Application fields
WarpEngine is easily expandable through scripting languages, so it can perform several types of calculation:
  • Semi-empirical calculations
  • Ab initio calculations
  • Rescoring of docking poses
  • Multiple molecular mechanics calculations
  • Virtual screening

Drug discovery and virtual screening
Today, virtual screening is a very common approach to identifying hit compounds from large libraries of molecules in the drug discovery process. It can be classified as:
  • Ligand-based: the 3D structure of the biological target is unknown, and a set of geometric rules and/or physicochemical properties (pharmacophore model) obtained from QSAR studies is used to screen the library.
  • Structure-based: it involves molecular docking calculations between each molecule to be tested and the biological target (usually a protein). A scoring function is applied to evaluate the affinity. The 3D structure of the target must be known.

Advantages and disadvantages of virtual screening
[Diagram: Database → Virtual screening → Hit compounds]
Advantages:
  • Fast (although this depends on the library size).
  • Makes it possible to optimize in-house resources.
  • Cheap.
Disadvantages:
  • False positive rate.
  • Limited chemical space (ligand-based).
  • Cannot discriminate intrinsic activity (structure-based).
  • Results must be confirmed by experimental assays.

Choice of docking software for virtual screening
For test purposes, we chose three well-known, free docking programs:
  • AutoDock 4.2, http://autodock.scripps.edu
  • AutoDock Vina, http://vina.scripps.edu
  • PLANTS, http://www.tcd.uni-konstanz.de/research/plants.php
and the acetylcholinesterase (AChE) ligand database from the Directory of Useful Decoys (DUD, http://dud.docking.org), containing:
  • 107 true active molecules
  • 3892 true inactive molecules
All these ligands were docked into the AChE crystal structure downloaded from the PDB (1EVE) in order to evaluate the predictive power and the performance of each docking program.

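Hit rate evaluation
The hit rate (HR) measures the probability of finding active ligands in a set of molecules, and it can be calculated with the following equation:
$$\mathrm{HR} = \frac{N_{\mathrm{actives}}}{N_{\mathrm{total}}} \times 100$$
Considering the whole data set (107 actives and 3892 inactives):
$$\mathrm{HR}_{\mathrm{random}} = \frac{107}{107 + 3892} \times 100 = 2.68\%$$
The random hit rate is the probability of finding an active compound by random choice. In other words, every 100 ligands randomly selected from the data set contain, on average, 2.68 active compounds.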
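The random hit rate quoted on this slide can be checked with a few lines of Python, taking the hit rate as the percentage of active molecules in a set. This is a minimal sketch: the function name `hit_rate` is an assumption introduced for illustration, and the counts are the DUD AChE figures given in the talk.

```python
def hit_rate(n_actives: int, n_total: int) -> float:
    """Return the hit rate as the percentage of active molecules in a set."""
    if n_total <= 0:
        raise ValueError("the set must contain at least one molecule")
    return 100.0 * n_actives / n_total


# DUD AChE data set from the talk: 107 actives and 3892 inactives.
N_ACTIVES = 107
N_INACTIVES = 3892

random_hit_rate = hit_rate(N_ACTIVES, N_ACTIVES + N_INACTIVES)
print(f"Random hit rate: {random_hit_rate:.2f}%")  # prints "Random hit rate: 2.68%"
```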

Evaluation of virtual screening performance
The performance of each virtual screening program was evaluated by:
  • sorting the results by docking score;
  • calculating the hit rate in a set of top-ranked molecules (1%, 2% and 5% of the total data set);
  • calculating the enrichment factor (EF):
$$\mathrm{EF} = \frac{\mathrm{HR}_{\mathrm{top}}}{\mathrm{HR}_{\mathrm{random}}}$$
where HR_top is the hit rate in the top-ranked subset and HR_random is the random hit rate of the whole data set.
Every virtual screening calculation must give at least EF > 1.0, and EF > 2.0 to be considered efficient enough. In other words, the screening must perform at least 2-fold better than random selection.
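The evaluation procedure can be sketched in Python, taking the enrichment factor as the hit rate in the top-ranked subset divided by the hit rate of the whole data set (the usual definition, assumed here). The function name `enrichment_factor` is hypothetical, and the ranked list below is invented toy data, not a result from the talk.

```python
from typing import Sequence


def enrichment_factor(is_active_ranked: Sequence[bool], top_fraction: float) -> float:
    """EF = hit rate in the top-ranked subset / hit rate of the whole set.

    `is_active_ranked` holds one flag per molecule, already sorted from the
    best docking score to the worst.
    """
    n_total = len(is_active_ranked)
    if n_total == 0:
        raise ValueError("empty data set")
    n_top = max(1, round(n_total * top_fraction))
    hr_top = sum(is_active_ranked[:n_top]) / n_top
    hr_all = sum(is_active_ranked) / n_total
    if hr_all == 0:
        raise ValueError("the data set contains no active molecules")
    return hr_top / hr_all


# Invented toy ranking: 1000 molecules, 20 of them active (True).
toy_ranking = (
    [True] * 4 + [False] * 6      # ranks 1-10: 4 actives
    + [True] * 2 + [False] * 8    # ranks 11-20: 2 more actives
    + [True] * 4 + [False] * 26   # ranks 21-50: 4 more actives
    + [True] * 10 + [False] * 940
)

for fraction in (0.01, 0.02, 0.05):
    ef = enrichment_factor(toy_ranking, fraction)
    print(f"Top {fraction:.0%}: EF = {ef:.1f}")
```

A perfectly random ranking would give EF close to 1.0 at every cutoff, which is why values above 1.0 (and preferably above 2.0) are required.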

AutoDock and Vina results
  • Two AutoDock runs were performed, with screening and full docking parameters.
  • One Vina calculation was performed, with exhaustiveness set to 7.
  • Both programs use a similar scoring function based on the Amber force field.

PLANTS results
The PLANTS enrichment performance was evaluated by considering:
  • all three scoring functions (ChemPLP, PLP and PLP95);
  • two degrees of exhaustiveness (Speed 1 and Speed 2);
  • flexible amino acid side chains (PLP and Speed 2 only).

Hardware for the test
1 PC configured as client and server: quad-core
9 PCs configured as clients: 1 six-core, 7 quad-core, 1 dual-core, 1 single-core
Operating systems: 6 Windows 7 Pro x64, 3 Windows 7 Pro, 1 Windows XP Pro
Network connection: Ethernet 100 Mbit/s
Total: 37 cores, 42 GB RAM, > 3 TB storage

Software & data for the test
  • APBS (Adaptive Poisson-Boltzmann Solver): calculation of solvation energy.
  • PLANTS (Protein-Ligand ANT System): structure-based virtual screening.
  • Database of drugs in .mdb format: 174,398 molecules, average MW 353.70.
  • Human M2 muscarinic receptor, PDB ID: 3UON.
Both programs are single-threaded.

Real case tests
APBS: solvation energy calculation. 174,398 molecules, two APBS calculations for each molecule (reference and solvated state).
  Time required by a single-thread calculation:  13 days 5 hours
  Time required by WarpEngine:                   8 hours 36 minutes
  WarpEngine speed:                              339.10 jobs/min
PLANTS: virtual screening. 174,398 molecules, M2 target, PLP, speed 2.
  Time required by a single-thread calculation:  36 days 22 hours
  Time required by WarpEngine:                   1 day 0 hours 1 minute
  WarpEngine speed:                              121.00 jobs/min
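The timings on this slide can be turned into a speedup and a parallel efficiency with a short Python sketch. The speedup and efficiency figures are derived here and do not appear in the talk; the calculation assumes the 37 cores listed on the hardware slide, and the names `speedup`, `runs` and `N_CORES` are introduced only for illustration.

```python
from datetime import timedelta

N_CORES = 37  # total cores in the test network, as stated on the hardware slide

# (single-thread time, WarpEngine time) as reported in the talk.
runs = {
    "APBS": (timedelta(days=13, hours=5), timedelta(hours=8, minutes=36)),
    "PLANTS": (timedelta(days=36, hours=22), timedelta(days=1, minutes=1)),
}


def speedup(single: timedelta, parallel: timedelta) -> float:
    """Ratio between the single-thread time and the distributed time."""
    return single / parallel  # dividing two timedeltas yields a float


for name, (single, parallel) in runs.items():
    s = speedup(single, parallel)
    efficiency = s / N_CORES  # 1.0 would mean perfectly linear scaling
    print(f"{name}: speedup {s:.2f}x, efficiency {efficiency:.1%}")
```

Under these assumptions both tests come out close to a 37-fold speedup, which suggests that the distribution overhead is small compared with the cost of each job.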

Test Drive

Graphic interface

Conclusions
  • Collaborative computing not only helps users work together on the same project, but can also be extended efficiently to share computational resources that often remain unused.
  • WarpEngine can collect the unused computational power and direct it to large calculations, such as a virtual screening, without interfering with normal user activities.
  • The setup phase of a virtual screening plays a pivotal role in obtaining good performance, in terms of both results and calculation speed.

Acknowledgements
Giulio Vistoli, Matteo Lo Monte, Angelica Mazzolari
www.vegazz.net