Improvement of Ensemble of Multi-Regression Structure-Toxicity Models by Clustering of Molecules in Descriptor Space

Bašic, Ivan; Lučić, Bono; Nikolić, Sonja; Papeš-Šokčević, Lidija; Nadramija, Damir

Data source: CROSBI

Improvement of Ensemble of Multi-Regression Structure-Toxicity Models by Clustering of Molecules in Descriptor Space (CROSBI ID 549928)

Conference contribution in proceedings | original scientific paper | international peer review

Bašic, Ivan ; Lučić, Bono ; Nikolić, Sonja ; Papeš-Šokčević, Lidija ; Nadramija, Damir. Improvement of Ensemble of Multi-Regression Structure-Toxicity Models by Clustering of Molecules in Descriptor Space // International Conference of Computational Methods in Sciences and Engineering 2008 ; Special Volume of the American Institute of Physics (AIP) - Conference Proceedings of ICCMSE 2008. Vol. 1148 / Simos, Theodore (ed.). Melville (NY): American Institute of Physics (AIP), 2009. pp. 408-411

Statement of responsibility

Authors

Bašic, Ivan ; Lučić, Bono ; Nikolić, Sonja ; Papeš-Šokčević, Lidija ; Nadramija, Damir

Basic data in the original language
Basic data in other languages

Language

English

Title

Improvement of Ensemble of Multi-Regression Structure-Toxicity Models by Clustering of Molecules in Descriptor Space

Abstract

For a selected data set published by Russom et al. (Environ. Toxicol. Chem. 16, 948-967 (1997)), containing 704 organic molecules with measured acute aquatic toxicity data (96-h LC50 tests), we calculated more than 1400 molecular descriptors with the Dragon 5.0 program.[1] After excluding descriptors with almost constant values and those with very low correlation with the logarithm of the LC50 values on the training set, about 620 descriptors remained and were used in the modeling process. The data set was randomly partitioned into a training set and a test set containing 560 and 144 molecules, respectively. We developed and compared two kinds of ensembles of both linear and nonlinear multi-regression models: (1) normal ensembles and (2) ensembles obtained by clustering molecules according to their similarity (clustered ensembles). Molecules were clustered by calculating their Euclidean distances in normalized descriptor space. In this method, the final model was developed only on those training-set molecules that are close (in terms of Euclidean distance in normalized descriptor space) to the selected test-set molecule. Although the results obtained by normal ensembles are very good (e.g. a nonlinear ensemble of 8-descriptor models: r_train = 0.91, s_train = 0.54, r_test = 0.76, s_test = 0.80), a significant improvement is obtained by taking the clustering of molecules into account when developing ensembles of linear models (e.g. for 200 3-descriptor models in the ensemble: r_train = 0.91, s_train = 0.53, r_test = 0.836, s_test = 0.70; or for 200 5-descriptor models in the ensemble: r_train = 0.94, s_train = 0.45, r_test = 0.84, s_test = 0.70). These results clearly indicate that information about the similarity between molecules can improve structure-toxicity models, and we expect that this could hold generally.
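The clustered-ensemble idea in the abstract can be illustrated with a minimal sketch in standard-library Python. It is not the authors' implementation. The function names, the neighbourhood size k, the random choice of descriptor subsets for each ensemble member, and the tiny ridge term used to stabilise the solver are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
import random
import statistics


def zscore_columns(train, other):
    """Normalize descriptor columns using training-set mean and standard deviation."""
    cols = list(zip(*train))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant columns

    def scale(rows):
        return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]

    return scale(train), scale(other)


def nearest_indices(query, train, k):
    """Indices of the k training rows closest to query (Euclidean distance)."""
    order = sorted(range(len(train)), key=lambda i: math.dist(query, train[i]))
    return order[:k]


def fit_linear(X, y, ridge=1e-8):
    """Least-squares coefficients [intercept, b1, ...] from the normal equations.

    The small ridge term is an assumption added only for numerical stability.
    """
    A = [[1.0] + list(row) for row in X]
    p = len(A[0])
    # Augmented system (A^T A + ridge * I | A^T y).
    M = [[sum(a[i] * a[j] for a in A) + (ridge if i == j else 0.0) for j in range(p)]
         + [sum(a[i] * t for a, t in zip(A, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(p):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * z for x, z in zip(M[r], M[c])]
    return [M[i][p] / M[i][i] for i in range(p)]


def predict(coef, row):
    """Evaluate a fitted linear model on one descriptor row."""
    return coef[0] + sum(b * v for b, v in zip(coef[1:], row))


def clustered_ensemble_predict(query, X_train, y_train, k=20, n_models=10,
                               n_desc=2, seed=0):
    """Average of small linear models fitted only on the k nearest training molecules."""
    rng = random.Random(seed)
    idx = nearest_indices(query, X_train, k)
    y_loc = [y_train[i] for i in idx]
    preds = []
    for _ in range(n_models):
        # Random descriptor subset per model (assumption; the abstract does not
        # say how descriptors are chosen for each ensemble member).
        cols = rng.sample(range(len(query)), n_desc)
        X_loc = [[X_train[i][c] for c in cols] for i in idx]
        coef = fit_linear(X_loc, y_loc)
        preds.append(predict(coef, [query[c] for c in cols]))
    return statistics.fmean(preds)
```

A possible use is to normalize the training and test descriptors with zscore_columns and then call clustered_ensemble_predict once per test molecule, so that each prediction comes from models built only on that molecule's neighbours in descriptor space. A "normal" ensemble in the abstract's sense would correspond to fitting on the whole training set instead of the k nearest molecules.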

Keywords

Acute aquatic toxicity; Organic molecules; QSAR models; Molecular descriptors; Distance based similarity; Clustering of molecules; Ensemble of multi-regression models; Clustered ensembles

Note

doi:10.1063/1.3225331

Language

not recorded

Title

not recorded

Abstract

not recorded

Keywords

not recorded

Note

not recorded

Contribution details

Pages

408-411

Year of publication

2009

Publication status

published

Host publication details

Title

International Conference of Computational Methods in Sciences and Engineering 2008 ; Special Volume of the American Institute of Physics (AIP) - Conference Proceedings of ICCMSE 2008. Vol. 1148

Editors

Simos, Theodore

Publisher

Melville (NY): American Institute of Physics (AIP)

ISBN

978-0-7354-0685-8

Conference details

Conference

Unknown conference

Type of participation

poster

Conference dates

29.02.1904-29.02.2096

Related records

Related persons

Sonja Nikolić (author)

Bono Lučić (author)

Lidija Papeš Šokčević (author)

Ivan Bašic (author)

Damir Nadramija (author)

Related institutions

Institut Ruđer Bošković (098) (author's institution)

Pliva Hrvatska d.o.o. (289) (author's institution)

Nastavni zavod za javno zdravstvo "Dr. Andrija Štampar" (121) (author's institution)

Related projects

Structure-activity relationship of flavonoids (result of work on the project)

Development of methods for modeling the properties of bioactive molecules and proteins (result of work on the project)

Field

Chemistry, Computer Science

Links

link.aip.org