QSAR prediction of physicochemical properties and biological activities
QSAR prediction of physico-chemical properties and biological activities of emerging pollutants: brominated flame retardants and perfluorinated chemicals
Paola Gramatica, Barun Bhhatarai, Simona Kovarich and Ester Papa
QSAR Research Unit in Environmental Chemistry and Ecotoxicology, DBSF - University of Insubria, Varese - Italy
E-mail: paola.gramatica@uninsubria.it, http://www.qsar.it
Sixth Indo-US Workshop on Mathematical Chemistry, Kolkata, 8-10 January 2010
THE CHEMICAL UNIVERSE
[Diagram; labels as extracted] New: 11.000 / year. More than 50.000 (Sept. 2009). 34,849,353 on the market. Regulated: 247,952. EINECS, TSCA: 100.204. Known data (experiments): 5%. QSAR predictive methods: environmental fate? Human effects? EU-REACH.
Prof. Paola Gramatica - QSAR Research Unit - DBSF - University of Insubria - Varese (Italy)
INTRODUCTION – REACH and QSAR
• Limited availability of experimental data
• Lack of knowledge of the properties and activities of existing substances
• Complexity of “old” regulations
• New EU regulation REACH: Registration, Evaluation, Authorisation and Restriction of Chemicals
• Interest in the development and validation of alternative methods, such as QSARs
The use of predictive QSAR models is suggested:
• to highlight dangerous chemicals
• to prioritize chemicals and focus the experimental tests
• to fill the data gaps
Staff in Environmental Chemistry and Ecotoxicology
DBSF - University of Insubria, Varese - Italy, http://www.qsar.it
Prof. Paola Gramatica
Dr. Ester Papa, Ph.D
Dr. Simona Kovarich
Dr. Jr. Luini Mara
Dr. Barun Bhhatarai, Ph.D
(Dr. Jiazhong Li, Ph.D)
INTRODUCTION – Brominated Flame Retardants
• Class of emerging pollutants used in a variety of consumer products (plastics, polyurethane foams, textiles, electronic equipment, ...) to increase fire resistance
• Three most marketed HPV products:
  TBBPA: Tetrabromobisphenol-A
  HBCD: Hexabromocyclododecane
  PBDE: Polybrominated Diphenyl Ethers (209 possible congeners)
• Levels in the environment and in humans have increased since they came into use
• Ban of penta- and octa-BDE formulations (DecaBDE under evaluation); HBCD in candidate list?
INTRODUCTION – Brominated Flame Retardants
Background knowledge about BFRs:
• Low water solubility
• High LogKow (> 5)
• Persistence in the environment
• Liver toxicity, thyroid toxicity, developmental toxicity
• Endocrine disruptors
The amount of experimental data available is very small and mainly relates to already banned BFRs. Knowledge of properties and ecotoxicological data needs to be extended for a better understanding of BFR behaviour and the related risks.
INTRODUCTION – Perfluorinated Compounds
• Perfluorinated compounds (PFCs) are chemicals containing a long fluorinated carbon tail attached to different functional groups
• PFCs such as perfluorooctanesulfonate (PFOS), perfluorooctanoate (PFOA) and perfluorooctane sulfonylamide (PFOSA) are stable chemicals with a wide range of industrial and consumer applications
• Degradation products of commercial PFCs are found in the environment and biota, and diPAPs (a group of PFCs used on food wrappers) were recently reported in human blood
• PFCs are considered emerging pollutants and are believed to have potential toxic effects in humans and wildlife
• PFCs, along with polyfluoro compounds, are studied for LC50 inhalation toxicity in Mouse and Rat
Predictive QSAR approaches are used to fill the data gap and to predict the toxicity of 250 PFCs in two different species, Mouse and Rat
Aims of the Modelling Studies
• Development of QSAR models for the available end-points, paying attention to external validation and applicability domain analysis
• Evaluation of the environmental behaviour and physico-chemical properties of emerging pollutants: BFRs and PFCs
• Identification of the more toxic and dangerous chemicals based on the studied end-points
• Prioritization of chemicals for experimental tests under the CADASTER project (EU-FP7 Project CADASTER)
• Mechanistic interpretation of the selected descriptors, highlighting the fate, distribution and properties of chemicals
OECD Principles for QSAR models in REACH
To facilitate the consideration of a QSAR model for regulatory purposes, it should be associated with the following information:
• a defined endpoint
• an unambiguous algorithm
• a defined domain of applicability
• appropriate measures of goodness of fit, robustness and predictivity
• a mechanistic interpretation, if possible
METHODS
Application of the OECD principles for QSAR models
1. Defined end-points: physico-chemical properties and toxicity
2. Unambiguous algorithm:
   • Chemical representation by theoretical molecular descriptors (DRAGON) selected by Genetic Algorithms
   • Statistical method: multiple linear regression (MLR) by ordinary least squares (OLS)
3. Validation for model stability and predictivity (internal and external validation)
4. Applicability Domain analysis: leverage approach by Hat matrix (MLR)
5. Interpretation of the selected molecular descriptors, if possible
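Steps 2 and 4 above can be sketched for the simplest case of a one-descriptor model. This is a minimal illustration, not the authors' software: the training data are invented, the function names are hypothetical, and the warning leverage h* = 3(p+1)/n is a commonly used convention that the slides do not state.

```python
from statistics import mean


def fit_ols(x, y):
    """Ordinary least squares for y = b0 + b1 * x (one descriptor)."""
    xm, ym = mean(x), mean(y)
    sxx = sum((xi - xm) ** 2 for xi in x)
    b1 = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y)) / sxx
    return ym - b1 * xm, b1


def leverage(x_train, x_new):
    """Hat value of x_new with respect to the training descriptor values.

    For a model with an intercept and one descriptor the hat-matrix
    diagonal reduces to h = 1/n + (x_new - mean)^2 / Sxx.
    """
    n = len(x_train)
    xm = mean(x_train)
    sxx = sum((xi - xm) ** 2 for xi in x_train)
    return 1.0 / n + (x_new - xm) ** 2 / sxx


# Hypothetical training set: descriptor values and measured property.
x_train = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_train = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]

b0, b1 = fit_ols(x_train, y_train)
# Assumed warning leverage: 3 * (p + 1) / n with p = 1 descriptor.
h_star = 3 * (1 + 1) / len(x_train)

for x_new in (3.5, 12.0):
    h = leverage(x_train, x_new)
    status = "inside AD" if h <= h_star else "outside AD (extrapolation)"
    print(f"x={x_new}: prediction={b0 + b1 * x_new:.2f}, h={h:.3f}, {status}")
```

A compound whose leverage exceeds the cutoff is structurally far from the training set, so its prediction is an extrapolation and should be treated with caution.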
RESULTS
QSAR/QSPR models developed for Brominated Flame Retardants
Simona Kovarich
RESULTS – QSPR models
Physico-chemical and degradation properties

Endpoint  Model        Train obj.  Test obj.  Desc.      R2%   Q2LOO%  Q2EXT%  AD% (on 243 BFRs)
LogKOA    Full         30          -                     97.4  96.8    -       81.9
          k-ANN split  24          6                     96.1  95.0    95.2    -
LogKOW    Full         20          -                     96.4  95.6    -       86.0
          k-ANN split  14          6                     97.1  95.9    94.7    -
MP        Full         25          -                     84.4  81.9    -       95.9
          k-ANN split  20          5                     82.2  78.5    93.7    -
LogPL     Full         34          -                     98.7  98.5    -       83.1
          k-ANN split  28          6                     98.8  98.5    98.6    -
LogS      Full         12          -          Mor23m     91.8  88.5    -       95.1
LogH      Full         7           -          BEHe7      96.9  93.3    -       55.6
LogKp*    Full         15          -          MW         94.9  93.8    -       91.4
LogHLp*   Full         15          -          T(O..Br)   94.3  92.6    -       81.9

* Photodegradation
Descriptors for LogKOA, LogKOW, MP and LogPL appear on the slide as T(O..Br), X2A, T(O..Br); their row assignment is not recoverable from the extracted text.
E. Papa, S. Kovarich, P. Gramatica, 2009. Development, validation and inspection of the applicability domain of QSPR models for physico-chemical properties of polybrominated diphenyl ethers. QSAR & Comb. Sci., 28, 790-796.
RESULTS - Model for Log Koa
LogKoa = 6.654 + 0.222 T(O..Br)

N° obj.  Descriptor  R2%    Q2boot%  Q2EXT (rand 20%) %
30       T(O..Br)    97.36  96.77    99.56

Are the predictions in the structural domain? Plot annotations: nona-deca; 90.4% into AD
Experimental range of LogKoa: 7.34 (mono-BDE) – 11.96 (hepta-BDE)
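The one-descriptor equation above is simple enough to evaluate directly. The sketch below uses the coefficients and the experimental range quoted on the slide; the T(O..Br) input values are hypothetical, and the response-range check is a much cruder screen than the structural (leverage-based) domain analysis used in the study.

```python
INTERCEPT = 6.654
SLOPE = 0.222                # coefficient of T(O..Br)
EXP_RANGE = (7.34, 11.96)    # experimental LogKoa range quoted on the slide


def predict_log_koa(t_o_br):
    """LogKoa predicted from the T(O..Br) descriptor value."""
    return INTERCEPT + SLOPE * t_o_br


def within_experimental_range(log_koa):
    """True if a prediction falls inside the measured response range."""
    low, high = EXP_RANGE
    return low <= log_koa <= high


# Hypothetical descriptor values, chosen only to illustrate the check.
for t in (5.0, 20.0, 35.0):
    pred = predict_log_koa(t)
    note = "" if within_experimental_range(pred) else " (beyond measured range)"
    print(f"T(O..Br)={t}: LogKoa={pred:.2f}{note}")
```

Predictions beyond the measured range are not necessarily wrong, but they rest on the assumption that the linear trend continues for more highly brominated congeners.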
RESULTS – Interpretation of descriptors
The same descriptor, T(O...Br), was selected as the best modelling variable for different properties that are related to each other (LogPL, LogKoa, LogKow, LogHLp). This descriptor carries two kinds of structural information: its value increases with both the number of bromine substituents and their distance from the ether oxygen, on each phenyl ring. Thus, T(O...Br) also takes into account the position of the bromine atoms on the phenyl rings.
Comparison with some existing models
Predicted and experimental data for 30 PBDEs (plot groups: mono-tri, tetra-hepta)

Author                Method   N° obj.  N° vars  R2%   Q2LOO%  Q2EXT%  RMSE (30 obj)
Papa et al. (2009)    MLR      30       1        97.4  96.8    99.6    0.23
Xu et al. (2007)      MLR      22       2        97.6  97.2    -       0.31
Chen et al. (2003)    PLS      13       10       97.9  97.5    -       -
KoaWIN (Episuite)     KOW/KAW                                          0.81
Comparison with some existing models
Predictions for 209 PBDEs: Δ increases as the number of bromine atoms increases
• YPapa = predictions by our model (range Log Koa: 7.32 – 15.09)
• YEpisuite = predictions by KoaWIN (Δmax = 3.33 log units; range Log Koa: 6.81 – 18.23)
• YXu = predictions by Xu et al. (2007) (Δmax = 1.06 log units; range Log Koa: 7.4 – 15.73)
Large differences from EPISUITE for highly brominated PBDEs
RESULTS – Environmental fate of BFRs
[Plot; labels as extracted] 5 < LogKow < 7: risk for tri- to penta-BDE. Resistance to photodegradation / Mobility
RESULTS – QSAR models
Endocrine Disrupting Activity

Endpoint        Model         Train obj.  Test obj.  Desc.              R2%   Q2LOO%  Q2EXT%  AD% (on 243 BFRs)
Log 1/RBA       Full          18          -          RDF080v, RDF035v   86.1  79.3    -       88.5
                Random split  10          8                             87.2  74.0    76.8
Log 1/IC50      Full          19          -          R7e+, GATS8e       85.9  81.7    -       94.2
PRANT           Random split  10          9                             91.3  85.9    71.2
LogT4REP        Full          17          -          qpmax, MATS6v      95.2  92.9    -       97.9
                Random split  9           8                             96.7  91.9    90.5
LogE2SULT-REP   Full          21          -          B08[C-O], GGI7     87.6  83.6    -       100
                Random split  11          10                            87.2  73.2    87.6

RBA = AhR Relative Binding Affinity = EC50(TCDD) / EC50(BFR)
PRANT = Progesterone Receptor Antagonism
T4-REP = T4-TTR Relative Competition = IC50(T4) / IC50(BFR)
E2SULT-REP = E2SULT Relative Inhibition = IC50(E2) / IC50(BFR)
E. Papa, S. Kovarich, P. Gramatica, QSAR modeling and prediction of the endocrine disrupting potencies of brominated flame retardants, submitted to J. Chem. Inf. Mod., 2010.
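The relative-potency endpoints defined above are ratios of a reference compound's potency to that of the tested BFR, modelled on a log scale. A minimal sketch of that transformation follows; the function name and the concentrations are hypothetical, and both values must be expressed in the same units.

```python
import math


def log_relative_potency(ic50_reference, ic50_compound):
    """log10 of IC50(reference) / IC50(compound).

    Positive values mean the compound is more potent than the
    reference (it reaches 50% effect at a lower concentration).
    """
    if ic50_reference <= 0 or ic50_compound <= 0:
        raise ValueError("IC50 values must be positive")
    return math.log10(ic50_reference / ic50_compound)


# Hypothetical concentrations in nM, for illustration only.
print(log_relative_potency(100.0, 10.0))    # compound 10x more potent
print(log_relative_potency(100.0, 1000.0))  # compound 10x less potent
```

Working with the logarithm of the ratio keeps the response roughly linear in the descriptors, which suits the MLR models used here.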
RESULTS - Model for LogE2SULT-REP
Equation of the “Split Model” (Random 50%):
LogE2SULT-REP = -0.56 + 2.10 B08[C-O] – 2.77 GGI7
R2 = 0.87; Q2LOO = 0.73; Q2EXT = 0.88
Plot annotation: MORE ACTIVE THAN PCP!
RESULTS
QSAR/QSPR models developed for Per-fluorinated Chemicals
Barun Bhhatarai
Results: QSAR models for LC50 inhalation

Mouse inhalation, 56 compounds. Variables selected: X3v; H-048; MLOGP; F01[C-C]
Splitting                  Compounds            R2 (%)  Q2LOO  Q2BOOT  Q2ext  R2-YScrm
SOM 28.5%                  Train: 40, Test: 16  82.99   78.09  75.46   71.62  10.32
Random by activity 20%     Train: 44, Test: 12  77.07   71.73  69.89   85.11  8.99
Full model                                      79.83   76.31  75.38   -      7.05

Rat inhalation, 52 compounds. Variables selected: Jhetv; PCR; MLOGP; B02[Cl-Cl]
Splitting                  Compounds            R2 (%)  Q2LOO  Q2BOOT  Q2ext  R2-YScrm
SOM 18.9%                  Train: 42, Test: 10  78.36   72.99  71.95   75.47  8.75
Random by activity 20%     Train: 42, Test: 10  80.01   75.21  74.12   66.70  9.91
Full model                                      78.14   73.85  73.26   -      7.64

Barun Bhhatarai and Paola Gramatica, Per- and Poly-fluoro Toxicity (LC50 inhalation) Study in Rat and Mouse using QSAR Modeling, Chem. Res. Toxicol., 2010, in press
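The difference between the fitted R2 and the leave-one-out Q2 reported above can be illustrated with a small one-descriptor example. This is a sketch on invented data using the common definition Q2LOO = 1 - PRESS/TSS; it is not the software used for the models in the table, and the function names are hypothetical.

```python
from statistics import mean


def fit_ols(x, y):
    """Ordinary least squares for y = b0 + b1 * x."""
    xm, ym = mean(x), mean(y)
    sxx = sum((xi - xm) ** 2 for xi in x)
    b1 = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y)) / sxx
    return ym - b1 * xm, b1


def r2_fit(x, y):
    """Coefficient of determination of the model fitted on all objects."""
    b0, b1 = fit_ols(x, y)
    ym = mean(y)
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - ym) ** 2 for yi in y)
    return 1.0 - rss / tss


def q2_loo(x, y):
    """Leave-one-out Q2: each object is predicted by a model refitted without it."""
    ym = mean(y)
    press = 0.0
    for i in range(len(x)):
        b0, b1 = fit_ols(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (b0 + b1 * x[i])) ** 2
    tss = sum((yi - ym) ** 2 for yi in y)
    return 1.0 - press / tss


# Hypothetical descriptor and response values.
x_demo = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_demo = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
print(f"R2 = {r2_fit(x_demo, y_demo):.4f}, Q2LOO = {q2_loo(x_demo, y_demo):.4f}")
```

For a least-squares fit Q2LOO can never exceed R2, so a small gap between the two, as in the table, suggests the model is not driven by individual compounds.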
Regression plots for the models on datasets split by SOM [plot panels: Mouse, Rat]
Descriptor analysis
RAT
• Jhetv: bond multiplicity, the heteroatoms and the number of atoms
• PCR: conventional bond-order ID number (piID) divided by the total path count
• MlogP: hydrophobicity
• B02[Cl-Cl]: presence/absence of Cl-Cl at topological distance 02
MOUSE
• MlogP: hydrophobicity
• X3v: presence of heteroatoms and of double and triple bonds
• F01[C-C]: total number of C-C bonds
• H-048: formal oxidation number of the C atom, which is the sum of the formal bond orders with electronegative atoms

• The common descriptor characterizing hydrophobicity was negative for both species
• Jhetv and X3v have similar chemical meanings and are positive for both species
• B02[Cl-Cl] is present for 5 of 52 compounds: a fitting (?) descriptor to include all Freons
Applicability Domain (AD) study on 250 PFCs
• 75.6% coverage of PFCs in the Mouse model (61 compounds are outside the structural domain) and 76.8% coverage in the Rat model (53 outside)
• Arbitrary cutoff 0.5 (dotted lines): 11 common compounds are out of domain
Focus on AD: Common out-of-domain compounds
• The compounds predicted outside the applicability domain of both the Mouse and Rat models are long-chain PFCs (>15 carbons)
• Their predictions are probably extrapolations, as the longest compounds in the training sets have 7 carbons
Toxicity Trend [figure; arrow label: increasing toxicity]
More Toxic Chemicals Predicted by PCA analysis
PFOA, PFOS: under investigation as toxic
These chemicals have been suggested to the CADASTER partners for experimental tests
QSPR of Melting point: Data splitting
Melting point: 94 compounds
• SOM split (on descriptors): 53 training, 41 Prediction I
• Random split (on response): 48 training, 46 Prediction I
• Perfluorinated chemicals (PERFORCE): 17 compounds, Prediction II
Results: Melting point (94+17)
Variables: AAC; F02[C-F]; C-013
Sets: SOM split (train 53; Prediction I, 41 test; Prediction II, 17 test); response split (train 48; Prediction I, 46 test; Prediction II, 17 test); Total 111
Columns: R2, Q2loo, Q2boot, RMSE train, RMSE ext, Q2ext*, R2 Yscr
Values as extracted (row and column positions not preserved): 46.65, 70.16, 5.18, 71.89, 77.11, 73.35, 40.86, 71.90, 25.04, 91.40, 5.16, 77.48, 48.52, 72.16, 5.84, 24.60, 92.84, 6.59, 41.86 (cv), -, 2.82, 82.85, 79.30, 38.07, 77.36, 78.45, 76.82, 76.60, 40.36
*Consonni, V., et al. J. Chem. Inf. Model., 49, 1669-1678.
AAC = mean information index on atomic composition, information indices
F02[C-F] = frequency of C-F at topological distance 02, 2D frequency fingerprints
C-013 = corresponds to CRX3 (X = electronegative atom), atom-centered fragments
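The external-validation columns above (RMSE ext and Q2ext*) can be sketched as follows. The Q2 form shown is the one usually labelled Q2F3, which normalizes the external prediction error by the training-set variance; that this is the form marked with the asterisk is an assumption based on the cited reference, and all numbers below are invented.

```python
import math


def rmse(y_obs, y_pred):
    """Root mean squared error between observed and predicted values."""
    n = len(y_obs)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(y_obs, y_pred)) / n)


def q2_f3(y_ext, y_ext_pred, y_train):
    """External Q2 in the form usually called Q2F3 (assumed here).

    Mean squared prediction error on the external set, scaled by the
    mean squared deviation of the training responses from their mean.
    """
    n_ext, n_tr = len(y_ext), len(y_train)
    y_tr_mean = sum(y_train) / n_tr
    press_ext = sum((o - p) ** 2 for o, p in zip(y_ext, y_ext_pred))
    tss_tr = sum((y - y_tr_mean) ** 2 for y in y_train)
    return 1.0 - (press_ext / n_ext) / (tss_tr / n_tr)


# Hypothetical training responses and external-set predictions.
y_train_demo = [10.0, 20.0, 30.0, 40.0]
y_ext_demo = [15.0, 35.0]
y_ext_pred_demo = [20.0, 30.0]

print(rmse(y_ext_demo, y_ext_pred_demo))
print(q2_f3(y_ext_demo, y_ext_pred_demo, y_train_demo))
```

Because the denominator depends only on the training set, values computed this way for different external sets of the same model are directly comparable.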
Analysis of Melting Point Model
MP = 148.81 (± 18.43) AAC + 4.03 (± 0.66) F02[C-F] – 14.47 (± 6.88) C-013 – 269.25
n = 111
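To see how each descriptor contributes to a predicted melting point, the equation above can be evaluated term by term. The coefficients are those on the slide; the descriptor values are hypothetical and are not taken from any real compound or from DRAGON output.

```python
MP_INTERCEPT = -269.25
MP_COEFFICIENTS = {
    "AAC": 148.81,
    "F02[C-F]": 4.03,
    "C-013": -14.47,
}


def mp_contributions(descriptors):
    """Per-descriptor contribution (coefficient * value) to the prediction."""
    return {name: coef * descriptors[name] for name, coef in MP_COEFFICIENTS.items()}


def predict_mp(descriptors):
    """Melting point predicted by the three-descriptor linear model."""
    return MP_INTERCEPT + sum(mp_contributions(descriptors).values())


# Hypothetical descriptor values for one compound.
example = {"AAC": 1.5, "F02[C-F]": 10.0, "C-013": 1.0}

for name, value in mp_contributions(example).items():
    print(f"{name}: {value:+.3f}")
print(f"Predicted MP: {predict_mp(example):.3f}")
```

The signs show the direction of each effect: higher AAC and more C-F pairs at topological distance 2 raise the predicted melting point, while each C-013 fragment lowers it.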
QSPR of Boiling point: Data splitting
Boiling point: 105 compounds
• SOM split (on descriptors): 55 training, 50 Prediction I
• Random split (on response): 53 training, 52 Prediction I
• Perfluorinated chemicals (PERFORCE): 25 compounds, Prediction II
Results: Boiling point (105+25)
Variables: Ms; ATS1m; nROH
Sets: SOM split (train 55; Prediction I, 50 test; Prediction II, 25 test); response split (train 53; Prediction I, 52 test; Prediction II, 25 test); Total 130
Columns: R2, Q2loo, Q2boot, RMSE train, RMSE ext, Q2ext*, R2 Yscr
Values as extracted (row and column positions not preserved): 87.50, 85.25, 81.38, 86.40, 83.55, 87.54, 34.54, 75.71, 5.73, 29.14, 85.17, 5.55, 28.98, 87.50, 6.12, 26.20, 89.53, 5.35, 29.42 (cv), -, 2.41, 30.23, 80.78, 88.54, 24.78, 86.26, 83.16, 87.37, 28.21
*Consonni, V., et al. J. Chem. Inf. Model., 49, 1669-1678.
Ms = mean electro-topological state, constitutional descriptor
ATS1m = autocorrelation of a topological structure, 2D autocorrelations
nROH = number of OH groups, functional group counts
Analysis of Boiling Point Model
BP = 128.43 (± 5.295) ATS1m + 93.833 (± 5.85) nROH – 54.23 (± 4.25) Ms – 43.098
n = 130
QSPR of Vapor Pressure: Data splitting + PERFORCE data
Vapor pressure: 35 compounds
• SOM split: 24 training, 11 Prediction I
• Random split: 22 training, 13 Prediction I
Results: Vapor Pressure (35)
Variables: nDB; AAC; F03[C-F]

Set                             R2     Q2loo  Q2boot  RMSE train  RMSE ext   Q2ext*  R2 Yscr
Prediction I, SOM (11 test)     91.07  84.33  81.63   0.83        0.97       87.78   12.69
Prediction I, Response (13 test) 93.75  91.23  82.13   0.64        1.14       80.36   14.08
Total (35)                      90.93  88.21  86.06   0.83        0.95 (cv)  -       8.95

*Consonni, V., et al. J. Chem. Inf. Model., 49, 1669-1678.
nDB = number of double bonds, constitutional descriptor
AAC = mean information index on atomic composition, information indices
F03[C-F] = frequency of C-F at topological distance 03, 2D frequency fingerprints
Analysis of Vapour Pressure Model
log VP = – 0.642 (± 0.405) nDB – 3.164 (± 0.924) AAC – 0.165 (± 0.025) F03[C-F] + 7.97
n = 35
Summary of QSPR models on PFCs

End point       Descriptors               n    R2    Q2loo  Q2boot  RMSE train  RMSE cv  RMSE EPI* (n)  AD%
Melting Point   AAC, F02[C-F], C-013      111  78.5  76.8   76.1    40.36       41.86    46.678 (248)   94.7
Boiling Point   Ms, ATS1m, nROH           130  88.5  87.3           27.57       29.12    43.046 (290)   97.9
Vapor Pressure  CIC0, MATS1v, TPSA(Tot)   35   90.9  88.2   87.1    0.83        0.95     1.12 (243)     94.2

* http://www.epa.gov/oppt/exposure/pubs/episuite.htm
All our models have a smaller RMSE than the EPISUITE models
Conclusions
• Predictive models were developed ad hoc for several toxicity endpoints and physico-chemical properties
• The ‘OECD principles for the validation of QSAR models, for regulatory applicability’ were strictly followed
• Simple models (linear analysis, few descriptors, robust models) with external validation were used
• Data were predicted for ~250 compounds in each set of chemicals: BFRs and PFCs
• Applicability domain analysis was also carried out for new compounds
• The QSA(P)Rs developed could be used to fill data gaps according to the new REACH regulation, facilitating the screening and prioritization of chemicals, reducing animal testing, and supporting the design of alternative and safer chemicals
Acknowledgements
Financial support by the EU FP7 Project CADASTER
http://www.qsar.it
Thanks for your attention!