Data source: CROSBI

String Distance-Based Stemming of the Highly Inflected Croatian Language (CROSBI ID 554907)

Conference paper in proceedings | original scientific paper | international peer review

Šnajder, Jan ; Dalbelo Bašić, Bojana String Distance-Based Stemming of the Highly Inflected Croatian Language // Proceedings of Recent Advances in Natural Language Processing (RANLP-2009) / Angelova, Galia ; Bontcheva, Kalina ; Mitkov, Ruslan et al. (eds.). Shumen: Incoma, 2009. pp. 411-415

Responsibility details

Šnajder, Jan ; Dalbelo Bašić, Bojana

English

String Distance-Based Stemming of the Highly Inflected Croatian Language

Stemming refers to the grouping of morphologically related words into so-called stem classes for the purpose of improving information retrieval performance. Traditional approaches to stemming are language-specific and require a substantial amount of linguistic knowledge. A viable alternative is string distance-based stemming, in which stem classes are obtained by clustering word-forms from a corpus. In this paper, we apply string distance-based stemming to the highly inflected Croatian language using a number of string distance measures proposed in the literature. We focus on evaluating the stemming performance at both the inflectional and derivational levels, and investigate how this performance relates to the choice of the distance threshold value. Although our focus is on the Croatian language, we believe our results transfer well to languages of similar morphological complexity.
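
The idea described in the abstract, clustering word-forms by string distance so that each cluster acts as a stem class, can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the distance measure (Levenshtein distance normalized by the longer word's length), the clustering method (single-link grouping with a union-find structure), the function names, the threshold values, and the sample Croatian word-forms.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance via the standard row-by-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Assumed measure: edit distance divided by the longer word's length."""
    if not a and not b:
        return 0.0
    return levenshtein(a, b) / max(len(a), len(b))


def stem_classes(words, threshold):
    """Group word-forms into classes by single-link clustering.

    Two word-forms end up in the same class if a chain of pairs, each
    with distance <= threshold, connects them (hypothetical helper).
    """
    words = sorted(set(words))
    parent = {w: w for w in words}

    def find(w):
        # Follow parent links to the class representative, halving paths.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for a, b in combinations(words, 2):
        if normalized_distance(a, b) <= threshold:
            parent[find(a)] = find(b)

    classes = {}
    for w in words:
        classes.setdefault(find(w), []).append(w)
    return sorted(sorted(c) for c in classes.values())


# Illustrative inflected forms of "kuća" (house) and "grad" (city).
forms = ["kuća", "kuće", "kući", "kućom", "grad", "grada", "gradu"]
for t in (0.3, 0.4):
    print(t, stem_classes(forms, t))
```

In this toy example the lower threshold leaves "kućom" outside the class of its related forms, while the higher one merges it in, which hints at why the choice of the distance threshold matters. The comparison of all word pairs is quadratic in vocabulary size, so a corpus-scale run would need a more economical candidate-generation step.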

Stemming; morphology; string distance; Croatian language

Contribution details

411-415.

2009.

published

Host publication details

Proceedings of Recent Advances in Natural Language Processing (RANLP-2009)

Angelova, Galia ; Bontcheva, Kalina ; Mitkov, Ruslan ; Nicolov, Nicolas ; Nikolov, Nikolai

Shumen: Incoma

1313-8502

Conference details

International Conference Recent Advances in Natural Language Processing'2009 (RANLP-2009)

poster

14.09.2009-16.09.2009

Borovets, Bulgaria

Research area

Computing