Finding Multiword Term Candidates in Croatian (CROSBI ID 492299)
Conference paper in proceedings | original scientific paper | international peer review
Responsibility details
Tadić, Marko ; Šojat, Krešimir
English
Finding Multiword Term Candidates in Croatian
The paper presents research on the statistical processing of a Croatian text corpus, with the primary aim of finding statistically significant co-occurrences of n-grams of tokens (digrams, trigrams and tetragrams). The collocations found with this method form a list of candidates for multiword terminological units, which is submitted to terminologists for further processing, i.e. manual selection of the “real terms”. The statistical measure of co-occurrence used is mutual information (MI3), accompanied by linguistic filters: stop-words and POS. The results on non-lemmatized material of a highly inflected language such as Croatian show that the MI measure alone is not sufficient to find a satisfactory number of multiword term candidates. In this case, the use of absolute frequency combined with linguistic filtering techniques gives a broader list of candidates for real terms.
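As a rough illustration of the approach the abstract describes, the sketch below scores adjacent token pairs (digrams) with a cubic mutual information measure and applies a stop-word filter. The abstract does not give the exact MI3 formula or the filter rules, so the frequency-based formulation log2(f(x,y)^3 / (f(x) * f(y))), the function name mi3_bigrams, and the rule of rejecting any pair that contains a stop word are all assumptions made for this example, not the authors' implementation. POS filtering is omitted because it would require a tagged corpus.

```python
import math
from collections import Counter


def mi3_bigrams(tokens, stop_words=frozenset(), min_freq=1):
    """Rank adjacent token pairs by an assumed MI3 formulation.

    Hypothetical helper: the formula and the filter rule are assumptions,
    not taken from the paper.
    """
    unigram = Counter(tokens)
    bigram = Counter(zip(tokens, tokens[1:]))
    scores = {}
    for (x, y), f_xy in bigram.items():
        # Absolute-frequency threshold on the pair.
        if f_xy < min_freq:
            continue
        # Stop-word filter (assumed rule): drop pairs containing a stop word.
        if x in stop_words or y in stop_words:
            continue
        # Assumed MI3: cube the joint frequency to favour frequent pairs.
        scores[(x, y)] = math.log2(f_xy ** 3 / (unigram[x] * unigram[y]))
    # Highest-scoring candidates first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


tokens = "hrvatski jezik i hrvatski jezik u tekstu".split()
ranked = mi3_bigrams(tokens, stop_words={"i", "u"})
```

Raising min_freq in this sketch corresponds loosely to leaning on absolute frequency, which the abstract reports as the more productive signal on non-lemmatized Croatian text.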
Croatian Language; multiword terms; term candidates; statistical processing; mutual information
Contribution details
102-107.
2003.
published
Host publication details
Conference details
lecture
08.09.2003-09.09.2003
Borovets, Bulgaria