String Distance-Based Stemming of the Highly Inflected Croatian Language (CROSBI ID 554907)
Conference paper in proceedings | original scientific paper | international peer review
Responsibility details
Šnajder, Jan ; Dalbelo Bašić, Bojana
English
Stemming refers to the grouping of morphologically related words into so-called stem classes for the purpose of improving information retrieval performance. Traditional approaches to stemming are language-specific and require a substantial amount of linguistic knowledge. A viable alternative is string distance-based stemming, in which stem classes are obtained by clustering word-forms from a corpus. In this paper, we apply string distance-based stemming to the highly inflected Croatian language using a number of string distance measures proposed in the literature. We focus on evaluating the stemming performance at both the inflectional and derivational levels, and investigate how this performance relates to the choice of the distance threshold value. Although our focus is on the Croatian language, we believe our results transfer well to languages of similar morphological complexity.
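As a rough illustration of the idea summarized in the abstract, the sketch below clusters word-forms into stem classes by merging any two forms whose string distance falls at or below a threshold. This record does not say which distance measures or clustering procedure the paper uses, so the length-normalized Levenshtein distance, the single-linkage merging, the function names, and the sample word list are all assumptions made for this example only.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance divided by the longer length (an assumed normalization)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def stem_classes(words, threshold):
    """Single-linkage clustering: link two forms if their distance <= threshold."""
    parent = {w: w for w in words}

    def find(w):
        # Follow parent links to the class representative, compressing the path.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for a, b in combinations(parent, 2):
        if normalized_distance(a, b) <= threshold:
            parent[find(a)] = find(b)

    classes = {}
    for w in parent:
        classes.setdefault(find(w), set()).add(w)
    return sorted(sorted(c) for c in classes.values())


# Hypothetical sample: inflected forms of Croatian "knjiga" (book) and "stol" (table).
words = ["knjiga", "knjige", "knjigu", "knjizi", "stol", "stola", "stolu"]
strict = stem_classes(words, 0.25)
loose = stem_classes(words, 0.35)
```

In this toy sample the stricter threshold leaves "knjizi" in a class of its own, because the stem-final consonant alternation (g to z) costs a second edit, while the looser threshold merges it with the other forms of "knjiga". This is the kind of sensitivity to the threshold value that the abstract refers to.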
Stemming; morphology; string distance; Croatian language
Contribution details
411-415.
2009.
published
Host publication details
Proceedings of Recent Advances in Natural Language Processing (RANLP-2009)
Angelova, Galia ; Bontcheva, Kalina ; Mitkov, Ruslan ; Nicolov, Nicolas ; Nikolov, Nikolai
Shumen: Incoma
1313-8502
Conference details
International Conference Recent Advances in Natural Language Processing'2009 (RANLP-2009)
poster
14.09.2009-16.09.2009
Borovets, Bulgaria