Croatian Scientific Bibliography

Bibliographic record number: 267479

Journal

Authors: Malenica, Mislav; Šmuc, Tomislav; Šnajder, Jan; Dalbelo Bašić, Bojana
Title: Language Morphology Offset: Text Classification on a Croatian-English Parallel Corpus
Source: Information Processing & Management (0306-4573) 44 (2008), 1; 325-339
Type of work: article
Keywords: text classification; SVM; Croatian; English; morphological normalisation; stemming; lemmatization; feature selection
Abstract:
We investigate how, and to what extent, the morphological complexity of a language influences text classification using Support Vector Machines (SVM). The Croatian-English parallel corpus provides the basis for a direct comparison of two languages of radically different morphological complexity. We quantified, compared, and statistically tested the effects of morphological normalisation on SVM classifier performance in a series of parallel experiments on both languages, carried out over a wide range of feature subset sizes obtained by different feature selection methods, and applying different levels of morphological normalisation. We also quantified the trade-off between feature space size and performance for different levels of morphological normalisation, and compared the results for both languages. Our experiments have shown that the improvements in SVM classifier performance are statistically significant; they are greater for small and medium numbers of features, especially for Croatian, whereas for large numbers of features the improvements are rather small and may be negligible in practice for both languages.
Project / topic: 098-0982560-2563, 036-1300646-1986
Original language: ENG
The work is indexed in the following databases:
Current Contents Connect (CCC)
Scopus
SCI-EXP, SSCI and/or A&HCI
Science Citation Index Expanded (SCI-EXP) (part of the Web of Science Core Collection)
Social Science Citation Index (SSCI) (part of the Web of Science Core Collection)
Category: Scientific
Scientific fields:
Computer science, Information and communication sciences
Internet address (URL): http://www.sciencedirect.com/science?_ob=MImg&_imagekey=B6VC8-4NSV189-1-1&_cdi=5948&_user=3875467&_orig=browse&_coverDate=01%2F31%2F2008&_sk=999559998&view=c&wchp=dGLbVlb-zSkWb&md5=2ebddffc5f104d6a3f2770271cb9fd7a&ie=/sdarticle.pdf
Full-text URL:


