Data source: CROSBI

Improvement of Ensemble of Multi-Regression Structure-Toxicity Models by Clustering of Molecules in Descriptor Space (CROSBI ID 549928)

Conference paper in proceedings | original scientific paper | international peer review

Bašic, Ivan ; Lučić, Bono ; Nikolić, Sonja ; Papeš-Šokčević, Lidija ; Nadramija, Damir. Improvement of Ensemble of Multi-Regression Structure-Toxicity Models by Clustering of Molecules in Descriptor Space // International Conference of Computational Methods in Sciences and Engineering 2008 ; Special Volume of the American Institute of Physics (AIP) - Conference Proceedings of ICCMSE 2008. Vol. 1148 / Simos, Theodore (ed.). Melville (NY): American Institute of Physics (AIP), 2009. pp. 408-411

Responsibility details

Bašic, Ivan ; Lučić, Bono ; Nikolić, Sonja ; Papeš-Šokčević, Lidija ; Nadramija, Damir

English

Improvement of Ensemble of Multi-Regression Structure-Toxicity Models by Clustering of Molecules in Descriptor Space

For a data set published by Russom et al. (Environ. Toxicol. Chem. 16, 948-967 (1997)) containing 704 organic molecules with measured acute aquatic toxicity data (96-h LC50 tests), we calculated more than 1400 molecular descriptors with the Dragon 5.0 program.[1] After excluding descriptors with almost constant values and those with very low correlation with the logarithm of LC50 values on the training set, about 620 descriptors remained and were used in the modeling process. The data set was randomly partitioned into a training set and a test set containing 560 and 144 molecules, respectively. We developed and compared two kinds of ensembles of both linear and nonlinear multi-regression models: (1) normal ensembles and (2) ensembles obtained by clustering molecules according to their similarity (clustered ensembles). Clustering was performed by calculating Euclidean distances between molecules in normalized descriptor space. In the clustered method, the final model was developed only on those training-set molecules that are close (by Euclidean distance in normalized descriptor space) to the selected test-set molecule. Although the results obtained by normal ensembles are very good (e.g. a nonlinear ensemble of 8-descriptor models: r_train = 0.91, s_train = 0.54, r_test = 0.76, s_test = 0.80), significant improvement is obtained by taking the clustering of molecules into account when developing ensembles of linear models (e.g. 200 3-descriptor models in an ensemble: r_train = 0.91, s_train = 0.53, r_test = 0.836, s_test = 0.70; or 200 5-descriptor models in an ensemble: r_train = 0.94, s_train = 0.45, r_test = 0.84, s_test = 0.70). These results clearly indicate that information about similarity between molecules can improve structure-toxicity models, and we expect that this could hold more generally.
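
The clustered-ensemble idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the neighbourhood size k, and the use of single-descriptor least-squares models averaged into an ensemble are all assumptions made for illustration, whereas the paper uses ensembles of multi-descriptor regression models. What the sketch does take from the abstract is the normalization of the descriptor space, the Euclidean distance, and the fitting of the final model only on training molecules close to the selected test molecule.

```python
import math
import statistics


def column_stats(rows):
    """Column means and standard deviations of a list of descriptor rows."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    # Constant descriptors have zero spread; fall back to 1.0 to avoid division by zero.
    stds = [statistics.pstdev(c) or 1.0 for c in cols]
    return means, stds


def normalize(row, means, stds):
    """Z-score one descriptor row using training-set statistics."""
    return [(x - m) / s for x, m, s in zip(row, means, stds)]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_indices(query, train_rows, k):
    """Indices of the k training rows closest to the query (hypothetical helper)."""
    order = sorted(range(len(train_rows)),
                   key=lambda i: euclidean(query, train_rows[i]))
    return order[:k]


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0.0:
        return my, 0.0  # no spread in x: predict the local mean
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def predict_clustered(query_raw, train_raw, train_y, k):
    """Predict one test molecule from models fitted only on its k nearest
    training molecules in normalized descriptor space (assumed scheme)."""
    means, stds = column_stats(train_raw)
    train = [normalize(r, means, stds) for r in train_raw]
    query = normalize(query_raw, means, stds)
    idx = nearest_indices(query, train, k)
    ys = [train_y[i] for i in idx]
    # Simplified ensemble: one single-descriptor linear model per descriptor,
    # with the individual predictions averaged.
    preds = []
    for j in range(len(query)):
        a, b = fit_line([train[i][j] for i in idx], ys)
        preds.append(a + b * query[j])
    return statistics.fmean(preds)


# Toy data (invented): two well-separated groups with different local trends.
toy_x = [[0, 0], [1, 1], [2, 2], [10, 10], [11, 11], [12, 12]]
toy_y = [0.0, 1.0, 2.0, 50.0, 49.0, 48.0]
local_low = predict_clustered([1.5, 1.5], toy_x, toy_y, k=3)
local_high = predict_clustered([11.0, 11.0], toy_x, toy_y, k=3)
```

In the toy data the two groups follow opposite trends, so a single global linear model would fit neither well, while the local fits recover each group's own trend. This is the intuition behind restricting model development to molecules similar to the one being predicted.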

Acute aquatic toxicity; Organic molecules; QSAR models; Molecular descriptors; Distance based similarity; Clustering of molecules; Ensemble of multi-regression models; Clustered ensembles

doi:10.1063/1.3225331

Contribution details

408-411.

2009.

published

Parent publication details

International Conference of Computational Methods in Sciences and Engineering 2008 ; Special Volume of the American Institute of Physics (AIP) - Conference Proceedings of ICCMSE 2008. Vol. 1148

Simos, Theodore

Melville (NY): American Institute of Physics (AIP)

978-0-7354-0685-8

Conference details

Unknown conference

poster

29.02.1904-29.02.2096

Research fields

Chemistry, Computer Science
