Exploring Classification Concept Drift on a Large News Text Corpus (CROSBI ID 176034)
Journal contribution | original scientific paper | international peer review
Statement of responsibility
Šilić, Artur ; Dalbelo Bašić, Bojana
English
Concept drift has regained research interest in recent years, as many applications use data sources that change over time. We study the classification task using logistic regression on a large news collection of 248K texts spanning a period of seven years. We present extrinsic methods of concept drift detection and quantification that form training sets with different windowing techniques. On our corpus, we characterize concept drift and show that classifier performance is overestimated if the drift is neglected. We lay out paths for future work, in which we plan to refine extrinsic characterization methods and investigate the drifting of learning parameters when few examples are available.
text classification; concept drift; logistic regression
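The abstract mentions training set formation with different windowing techniques but does not say which ones were used. As a minimal sketch, assuming two common choices (a fixed-size sliding window and a growing window over time-ordered periods), the splits could be formed as below. The function names and the notion of a "period" (for example, one year of the corpus) are hypothetical and not taken from the paper.

```python
from typing import Iterator, List, Tuple


def sliding_window_splits(
    n_periods: int, window: int
) -> Iterator[Tuple[List[int], int]]:
    """Yield (training period indices, test period index) pairs.

    Assumption: training uses only the `window` periods that
    immediately precede the test period.
    """
    for test in range(window, n_periods):
        yield list(range(test - window, test)), test


def growing_window_splits(
    n_periods: int, min_train: int = 1
) -> Iterator[Tuple[List[int], int]]:
    """Yield (training period indices, test period index) pairs.

    Assumption: training uses every period before the test period,
    starting once at least `min_train` periods are available.
    """
    for test in range(min_train, n_periods):
        yield list(range(0, test)), test
```

With seven periods, `sliding_window_splits(7, window=2)` trains on periods 0 and 1 to test on period 2, then on periods 1 and 2 to test on period 3, and so on. Comparing a classifier's accuracy under such time-respecting splits with its accuracy under a random split is one way to see the overestimation the abstract refers to.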
Publication details
7181 (1)
2012.
428-437
published
0302-9743
10.1007/978-3-642-28604-9