Evaluating a Knowledge-based Scoring Function for Drug Discovery
Case Study: DrugScore

Joseph Jamal Rush III
October 31, 2009
Bioinformatics
Mentor: Mr. Mondal, Claflin University

Scoring Function
• Scoring is the process of evaluating a particular pose by counting the number of favorable intermolecular interactions, such as hydrogen bonds and hydrophobic contacts.
• Scoring functions are useful for drug design and other types of molecular design.

3 Classes of Scoring Functions
• Empirical Scoring Function
• Knowledge-based Scoring Function
• Force-field Scoring Function

Empirical Scoring Function
• Empirical scoring functions are calibrated directly, through multivariate regression analysis, against a set of protein-ligand complexes with experimentally determined structures and binding affinities.
• Examples: Böhm's scoring function, ChemScore, and X-SCORE.

Force-Field Scoring Function
• Force-field scoring functions generally quantify the sum of the interaction energy between the target and the ligand and the internal energy of the ligand.
• Examples: GoldScore, D-Score, and G-Score.

Knowledge-based Scoring Functions
• A knowledge-based scoring function is built on atom-atom interactions.
• It estimates binding affinity from statistical observations of intermolecular close contacts in 3D structures.
• These functions use the sum of potentials of mean force (PMF) between protein and ligand atoms, derived from the Brookhaven Protein Data Bank (PDB), as a measure of protein-ligand binding affinity.
• Examples: DrugScore and PMF

Scoring Function Under Test
• DrugScore, a knowledge-based scoring function, is used for this project.
• The DrugScore scoring function has two components: the distance-dependent potential and the surface-dependent potential. In our study we use only the distance-dependent potential.

Equations for DrugScore
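
For the distance-dependent component, a pair potential of mean force is commonly written in the form below. This is a sketch of the general PMF formalism, not a transcription of the slide; the symbols $g_{ij}$, $g$, $r_{kl}$, and $t(\cdot)$ are notation assumed here.

```latex
% Assumed notation:
%   g_{ij}(r): normalized radial pair distribution for atom types i and j
%   g(r):      reference distribution averaged over all atom-type pairs
%   t(k):      atom type of atom k; r_{kl}: distance between atoms k and l
\Delta W_{ij}(r) = W_{ij}(r) - W(r) = -\ln\frac{g_{ij}(r)}{g(r)}

% Distance-dependent score of one pose: sum over all ligand-protein atom pairs
\Delta W = \sum_{k \,\in\, \text{ligand}} \; \sum_{l \,\in\, \text{protein}}
           \Delta W_{t(k)\,t(l)}\!\left(r_{kl}\right)
```

Under this form, atom-type pairs that occur at a given distance more often than the reference receive a negative (favorable) contribution.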

Evaluation Methods
• The performance of a scoring function is measured by the docking enrichment factor.
  – The docking enrichment factor reflects how much better the docking calculations are than random selection at finding true positives (ligands) within the background database.

2 Enrichment Factors
• Own Decoy
  – The enrichment factor is calculated using only the native ligands of the protein and the decoys that correspond to those ligands.
• Entire Database
  – The enrichment factor is calculated using all the ligands and all the decoys in the database.
  – To obtain the "entire database" enrichment factor for a protein, every ligand and decoy in the database has to be docked and scored against that protein.

Calculating the Docking Enrichment Factor

Strategy: Docking Part
• Download the protein, ligand, and decoy files from the DUD database.
• Rename each protein file from rec.pdb to proteinName.pdb.
• Extract individual mol2 files for the ligands and decoys.
• Use AutoDock to get the best docking coordinates for the ligands and decoys.
• Extract the best docking coordinates of the ligands and decoys from the AutoDock log file, *.dlg.
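
The mol2 extraction step can be sketched as follows, assuming the ligand and decoy sets arrive as multi-molecule MOL2 files in which each molecule starts with an `@<TRIPOS>MOLECULE` record followed by its name on the next line. The function names are hypothetical, not those of the project's scripts.

```python
MOLECULE_RECORD = "@<TRIPOS>MOLECULE"


def split_mol2(text):
    """Split multi-molecule MOL2 text into one string per molecule."""
    molecules = []
    current = []
    for line in text.splitlines(keepends=True):
        # A new MOLECULE record closes the molecule collected so far.
        if line.startswith(MOLECULE_RECORD) and current:
            molecules.append("".join(current))
            current = []
        # Skip anything before the first MOLECULE record (e.g. comments).
        if line.startswith(MOLECULE_RECORD) or current:
            current.append(line)
    if current:
        molecules.append("".join(current))
    return molecules


def molecule_name(mol2_block):
    """Return the molecule name: the line right after the MOLECULE record."""
    return mol2_block.splitlines()[1].strip()
```

Each returned block can then be written to its own file, for example named after `molecule_name(block)`, before it is prepared for docking.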

Strategy: Scoring Part (Knowledge-Based Scoring Function)
• Extract the atom information from the protein files.
• Run the knowledge-based scoring function program to calculate the scores.
• Create output files containing the scores for each protein-ligand and protein-decoy pair.
• Sort the scores of all compound pairs for each protein, and calculate the enrichment factor for that protein.
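
The core of a distance-dependent knowledge-based score is a sum of tabulated pair potentials over protein-ligand atom pairs. The sketch below illustrates that idea only; it is not the DrugScore program. The atom tuple layout, the `potentials` lookup table, the bin width, and the cutoff are all assumptions chosen for the example.

```python
import math


def distance_score(protein_atoms, ligand_atoms, potentials,
                   bin_width=0.5, cutoff=6.0):
    """Sum tabulated pair potentials over protein-ligand atom pairs.

    protein_atoms, ligand_atoms: lists of (atom_type, x, y, z).
    potentials: dict mapping (protein_type, ligand_type) to a list of
        potential values, one per distance bin of width `bin_width`.
        The table is hypothetical; a real one is derived from PDB statistics.
    bin_width, cutoff: illustrative values in angstroms, not DrugScore's.
    """
    total = 0.0
    for ptype, px, py, pz in protein_atoms:
        for ltype, lx, ly, lz in ligand_atoms:
            r = math.dist((px, py, pz), (lx, ly, lz))
            if r >= cutoff:
                continue  # pair too far apart to contribute
            table = potentials.get((ptype, ltype))
            if table is None:
                continue  # no statistics for this atom-type pair
            index = int(r / bin_width)
            if index < len(table):
                total += table[index]
    return total
```

Running this once per docked ligand or decoy pose gives the per-compound scores that are then sorted to compute the enrichment factor.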

Tools for Protein-Ligand Docking
• AutoDock 4, a popular free docking program, is used to dock each protein with its own ligands and decoys from DUD.
• The best coordinates generated by AutoDock for each ligand and decoy docked to its own protein are used for calculating the scores.
• The run time of AutoDock 4 depends on the structure of the protein, the ligand or decoy, and the hardware of the machine it runs on.
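
Picking the best pose out of a docking log can be sketched as below. The sketch assumes the usual AutoDock 4 .dlg layout, in which every docked pose is written as a block of lines prefixed with `DOCKED: `, delimited by `MODEL` and `ENDMDL`, and carrying an "Estimated Free Energy of Binding" line. It also assumes that "best" means lowest estimated energy. Check both assumptions against the actual log files; the function name is hypothetical.

```python
import re

PREFIX = "DOCKED: "
ENERGY_RE = re.compile(
    r"Estimated Free Energy of Binding\s*=\s*"
    r"([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)"
)


def best_docked_pose(dlg_text):
    """Return (energy, lines) of the lowest-energy DOCKED model, or None."""
    best = None
    energy = None
    lines = []
    for raw in dlg_text.splitlines():
        if not raw.startswith(PREFIX):
            continue  # not part of a docked pose
        line = raw[len(PREFIX):]
        if line.startswith("MODEL"):
            energy, lines = None, []  # start collecting a new pose
        match = ENERGY_RE.search(line)
        if match:
            energy = float(match.group(1))
        lines.append(line)
        if line.startswith("ENDMDL"):
            # Pose complete: keep it if it beats the best one so far.
            if energy is not None and (best is None or energy < best[0]):
                best = (energy, lines)
            energy, lines = None, []
    return best
```

The returned lines are the pose's coordinate records with the `DOCKED: ` prefix removed, ready to be written out for scoring.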

3 Proteins Used in the Experiment
• cox1 (Cyclooxygenase 1)
  – 4492 atoms in the protein file
  – 25 ligands
  – 911 decoys
• hsp90 (Human heat shock protein 90 kinase)
  – 1627 atoms in the protein file
  – 37 ligands
  – 979 decoys
• pr (Progesterone receptor)
  – 2004 atoms in the protein file
  – 27 ligands
  – 1036 decoys

Scoring Results (tag: D = decoy, L = ligand)

tag  protein  ligand                score
D    cox1     ZINC03862207   456696768.00
L    cox1     ZINC04617752  1752637952.00

-------------------------------------------------------------------
protein  top%  n_total  n_sampled  hit_total  hit_sampled  EF_top%
-------------------------------------------------------------------
cox1        5      936         47         25            3    2.390
cox1       10      936         94         25            6    2.390
cox1       15      936        140         25            6    1.605
cox1       20      936        187         25            6    1.201
cox1       25      936        234         25            6    0.960
cox1       30      936        281         25            6    0.799
cox1       35      936        328         25            6    0.685
cox1       40      936        374         25            6    0.601
cox1       45      936        421         25            6    0.534
cox1       50      936        468         25            9    0.720
cox1       55      936        515         25            9    0.654
cox1       60      936        562         25            9    0.600
cox1       65      936        608         25            9    0.554
cox1       70      936        655         25            9    0.514
cox1       75      936        702         25            9    0.480
cox1       80      936        749         25           12    0.600
cox1       85      936        796         25           13    0.611
cox1       90      936        842         25           16    0.711
cox1       95      936        889         25           19    0.800
cox1      100      936        936         25           25    1.000

[Figure: Enrichment factor (0.0 to 3.5) versus top x% of the database (5 to 100) for cox1, hsp90, and pr]

Conclusion
• The docking algorithm, the design of the scoring function, and the accuracy of the benchmark data can all affect the evaluation.
• The results show that DrugScore gives poor enrichment for cox1 and hsp90 but better enrichment for pr. DrugScore therefore cannot be used universally to estimate binding affinity for all proteins. Overcoming this requires a universal scoring function, which is our future project.

Future Work
• To evaluate the scoring function further, we can increase the number of runs of the genetic algorithm in AutoDock 4 and check whether this improves the performance of the knowledge-based scoring function.

Acknowledgements
• Mr. Ananda Mondal
• Mrs. Pamela Shuler, HBCU-Undergraduate Program Manager
• HBCU-Undergraduate Program, HRD 0713853
• National Science Foundation
• Claflin University