Exploring String and Word Kernels on Croatian-English Parallel Corpus (CROSBI ID 548329)
Conference paper in proceedings | original scientific paper | international peer review
Authorship information
Jonke, Zeno ; Šilić, Artur ; Dalbelo Bašić, Bojana
English
Exploring String and Word Kernels on Croatian-English Parallel Corpus
In this paper we investigate the classification performance of kernel-based document representations, as well as the influence of kernel parameters, for text classification in two morphologically different languages. We explore and compare two kernel functions that operate at different levels of a sentence. The first is the gap-weighted kernel, a member of the string kernel family, which operates at the character level and thus compares text documents by subsequences of characters. This removes the need for stemming or lemmatisation, since the kernel captures word stems automatically, which is especially important when stemming or lemmatisation tools are not available. The second method is the word sequence kernel, an extension of string kernels that operates at the word level. This approach provides a more natural representation of the text and yields a smaller document representation, which in turn reduces computation time. The two methods are compared by exploring how their performance depends on their parameters and by measuring their classification performance on the Croatian-English parallel corpus.
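A gap-weighted subsequence kernel of this kind can be computed with the well-known dynamic-programming recursion of Lodhi et al. (2002). The Python function below is a minimal sketch under our own assumptions, not the authors' implementation: the names `subsequence_kernel` and `normalized_kernel` and the parameters `n` (subsequence length) and `lam` (gap decay factor, 0 < lam <= 1) are illustrative. Because the recursion only compares sequence elements for equality, the same function gives a character-level kernel when passed strings and a word-level sequence kernel when passed lists of tokens.

```python
import math
from typing import Hashable, Sequence


def subsequence_kernel(s: Sequence[Hashable], t: Sequence[Hashable], n: int, lam: float) -> float:
    """Gap-weighted subsequence kernel K_n(s, t) (illustrative sketch).

    Every common subsequence of length n contributes lam ** (span in s + span in t),
    so matches with many gaps are down-weighted. Runs in O(n * |s| * |t|) time.
    """
    ls, lt = len(s), len(t)
    # kp[i][a][b] holds K'_i(s[:a], t[:b]); K'_0 is 1 for all prefixes.
    kp = [[[0.0] * (lt + 1) for _ in range(ls + 1)] for _ in range(n)]
    for a in range(ls + 1):
        for b in range(lt + 1):
            kp[0][a][b] = 1.0
    for i in range(1, n):
        for a in range(i, ls + 1):
            kpp = 0.0  # running value of K''_i(s[:a], t[:b]) as b grows
            for b in range(i, lt + 1):
                kpp *= lam
                if s[a - 1] == t[b - 1]:
                    kpp += lam * lam * kp[i - 1][a - 1][b - 1]
                kp[i][a][b] = lam * kp[i][a - 1][b] + kpp
    # Close each length-n match on a pair of equal final elements.
    k = 0.0
    for a in range(n, ls + 1):
        for b in range(n, lt + 1):
            if s[a - 1] == t[b - 1]:
                k += lam * lam * kp[n - 1][a - 1][b - 1]
    return k


def normalized_kernel(s: Sequence[Hashable], t: Sequence[Hashable], n: int, lam: float) -> float:
    """Cosine-normalised kernel, so that documents of different length are comparable."""
    kss = subsequence_kernel(s, s, n, lam)
    ktt = subsequence_kernel(t, t, n, lam)
    if kss == 0.0 or ktt == 0.0:
        return 0.0
    return subsequence_kernel(s, t, n, lam) / math.sqrt(kss * ktt)


if __name__ == "__main__":
    # Character level: the only shared length-2 subsequence is "ca".
    print(subsequence_kernel("cat", "car", 2, 0.5))  # 0.0625
    # Word level: the only shared length-2 word subsequence is ("the", "cat").
    print(subsequence_kernel("the cat sat".split(), "the cat ran".split(), 2, 0.5))  # 0.0625
```

The cost grows as O(n·|s|·|t|), so representing a document as a list of words rather than a string of characters shortens both sequences. This is one way to see why a word-level kernel reduces computation time.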
word kernels; string kernels; text classification
Contribution details
308-311.
2009.
published
Parent publication details
Intelligent Systems MIPRO 2009
Rijeka: Hrvatska udruga za informacijsku i komunikacijsku tehnologiju, elektroniku i mikroelektroniku - MIPRO
Conference details
International Conference MIPRO 2009
lecture
25.05.2009-29.05.2009
Opatija, Croatia