A mixture model with a reference-based automatic selection of components for disease classification from protein and/or gene expression levels

Kopriva, Ivica; Filipović, Marko

Data source: CROSBI

A mixture model with a reference-based automatic selection of components for disease classification from protein and/or gene expression levels (CROSBI ID 181032)

Journal article | original scientific paper | international peer review

Kopriva, Ivica ; Filipović, Marko A mixture model with a reference-based automatic selection of components for disease classification from protein and/or gene expression levels // BMC Bioinformatics, 12 (2011), 496; 1-18. doi: 10.1186/1471-2105-12-496

Authorship information

Authors

Kopriva, Ivica ; Filipović, Marko

Basic data in the original language

Language

English

Title

A mixture model with a reference-based automatic selection of components for disease classification from protein and/or gene expression levels

Abstract

Background: Bioinformatics data analysis often uses a linear mixture model that represents each sample as an additive mixture of components. Properly constrained blind matrix factorization methods extract these components from the mixture samples alone. However, automatically selecting which extracted components to retain for classification remains an open issue.

Results: The proposed method is applied to well-studied protein and genomic datasets of ovarian, prostate and colon cancer to extract components for disease prediction. Over 100 independent two-fold cross-validations it achieves average sensitivities of 96.2% (sd = 2.7%), 97.6% (sd = 2.8%) and 90.8% (sd = 5.5%) and average specificities of 93.6% (sd = 4.1%), 99% (sd = 2.2%) and 79.4% (sd = 9.8%) for the ovarian, prostate and colon cancer datasets, respectively.

Conclusions: We propose an additive mixture model of a sample for feature extraction that uses, in principle, sparseness-constrained factorization on a sample-by-sample basis; existing methods, by contrast, factorize the complete dataset simultaneously. The sample model consists of a reference sample, representing the control and/or case (disease) group, and a test sample. Each sample is decomposed into two or more components, which are selected automatically (without using label information) as control-specific, case-specific or not differentially expressed (neutral). The number of components is determined by cross-validation. Features (m/z ratios or genes) are automatically assigned to a particular component using thresholds estimated directly from each sample. Because the decomposition is local, the expression strength of a feature can vary across samples, yet the feature is still allocated to the related disease- and/or control-specific component. Since label information is not used in the selection process, the case- and control-specific components can be used for classification; this is not the case with standard factorization methods.

Moreover, the component that the proposed method selects as disease-specific can be interpreted as a sub-mode and retained for further analysis to identify potential biomarkers. Unlike with standard matrix factorization methods, this can be done on a sample (experiment)-by-sample basis. Postulating one or more components with indifferent features allows these features to be removed from the disease- and control-specific components on a sample-by-sample basis. This yields selected components of reduced complexity and, in general, increases prediction accuracy.
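The abstract states that features are assigned to components using thresholds estimated from each sample alone, with a neutral (not differentially expressed) component alongside the specific ones. The sketch below illustrates that idea for the simplest case of one reference sample and one test sample. It is not the authors' algorithm: instead of sparseness-constrained factorization it uses the angle-based view found in some clustering-style sparse component analysis methods (if each feature belongs to a single component, the point (reference[n], test[n]) lies on that component's mixing direction). The function name `split_features`, the median-angle neutral direction and the threshold of k scaled median absolute deviations are all illustrative assumptions.

```python
import numpy as np


def split_features(reference, test, k=3.0):
    """Hypothetical sketch: assign each feature of a (reference, test) pair to a
    test-specific, reference-specific or neutral component.

    Each feature n is treated as the 2-D point (reference[n], test[n]). If every
    feature belongs to a single component, the point lies on that component's
    mixing direction, so its angle identifies the component. The neutral
    direction (median angle) and the threshold (k scaled MADs) are assumed
    rules, estimated from this one pair of samples only.
    """
    r = np.asarray(reference, dtype=float)
    t = np.asarray(test, dtype=float)
    active = (r + t) > 0                       # all-zero features carry no direction
    theta = np.arctan2(t, r)                   # angle in [0, pi/2] for nonnegative data
    center = np.median(theta[active])          # assumed neutral direction
    mad = np.median(np.abs(theta[active] - center))
    tau = k * 1.4826 * mad                     # 1.4826 scales MAD to a Gaussian sd
    up = active & (theta > center + tau)       # relatively stronger in the test sample
    down = active & (theta < center - tau)     # relatively stronger in the reference
    neutral = ~(up | down)                     # everything else, incl. all-zero features
    return up, down, neutral


# Synthetic example: feature 8 is 3x stronger in the test sample, feature 9 is
# 5x weaker, feature 10 is zero in both; the rest differ only slightly.
reference = np.full(11, 10.0)
reference[-1] = 0.0
factors = np.array([1.0, 1.05, 0.95, 1.02, 0.98, 1.0, 1.03, 0.97, 3.0, 0.2, 1.0])
up, down, neutral = split_features(reference, reference * factors)
print(np.flatnonzero(up), np.flatnonzero(down))  # [8] [9]
```

With this synthetic pair, feature 8 lands in the test-specific component, feature 9 in the reference-specific component, and the remaining features, including the all-zero one, are neutral. Using the median angle rather than a fixed 45° direction means that a global intensity difference between the two samples does not by itself mark features as differential.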

Keywords

sparse component analysis; feature extraction; disease prediction; mass spectra; gene expression levels

Publication details

Journal

BMC Bioinformatics

Volume (article number)

12 (496)

Year

2011

Pages

1-18

Publication status

published

e-ISSN

1471-2105

DOI

10.1186/1471-2105-12-496

Related records

Related persons

Marko Filipović (author)

Ivica Kopriva (author)

Related institutions

Institut Ruđer Bošković (098) (author's institution)

Related projects

Analiza višespektralih podataka (Analysis of multispectral data) (project result)

Fields

Computer science, Basic medical sciences, Mathematics

Links

doi.org

biomedcentral.com

Indexing

Scopus

Medline

Web of Science Core Collection, Science Citation Index Expanded (WoSCC-SCI-Exp)

Web of Science Core Collection, SCI-Exp, SSCI & A&HCI (WoSCC-SCI-Exp, SSCI, A&HCI)