Data source: CROSBI

Finding Multiword Term Candidates in Croatian (CROSBI ID 492299)

Conference paper in proceedings | original scientific paper | international peer review

Tadić, Marko ; Šojat, Krešimir. Finding Multiword Term Candidates in Croatian // Proceedings of Information Extraction for Slavic Languages 2003 Workshop (IESL2003). Sofia: BAS, 2003. pp. 102-107

Responsibility details

Tadić, Marko ; Šojat, Krešimir

English

Finding Multiword Term Candidates in Croatian

The paper presents research on the statistical processing of a corpus of Croatian texts, with the primary aim of finding statistically significant co-occurrences of n-grams of tokens (digrams, trigrams and tetragrams). The collocations found with this method form a list of candidates for multiword terminological units, which is submitted to terminologists for further processing, i.e. manual selection of the “real terms”. The statistical measure of co-occurrence used is mutual information (MI3), accompanied by linguistic filters: stop words and POS. The results on non-lemmatized material of a highly inflected language such as Croatian show that the MI measure alone is not sufficient to find a satisfactory number of multiword term candidates. In this case, the use of absolute frequency combined with linguistic filtering techniques gives a broader list of candidates for real terms.
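
As an illustration of the kind of scoring the abstract describes, the following is a minimal sketch of MI3 ranking for digrams with a stop-word filter. It is not the authors' implementation: the abstract does not give the exact formula, so the sketch assumes the commonly used cubic variant MI3(x, y) = log2(N · f(x,y)³ / (f(x) · f(y))), where N is the number of tokens. The function name `mi3_bigrams` and the parameters `stop_words` and `min_freq` are hypothetical, and the POS filter is omitted.

```python
import math
from collections import Counter


def mi3_bigrams(tokens, stop_words=frozenset(), min_freq=1):
    """Rank adjacent token pairs by an assumed MI3 variant.

    MI3(x, y) = log2(N * f(x, y)**3 / (f(x) * f(y))), with N = len(tokens).
    This formula is an assumption; the abstract does not spell it out.
    """
    n = len(tokens)
    unigram = Counter(tokens)
    bigram = Counter(zip(tokens, tokens[1:]))

    scores = {}
    for (x, y), f_xy in bigram.items():
        # Frequency threshold: skip pairs seen fewer than min_freq times.
        if f_xy < min_freq:
            continue
        # Stop-word filter: drop candidates that contain a stop word.
        if x in stop_words or y in stop_words:
            continue
        scores[(x, y)] = math.log2(n * f_xy ** 3 / (unigram[x] * unigram[y]))

    # Highest-scoring candidates first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Because the input here is raw, non-lemmatized tokens, different inflected forms of the same term are counted as separate pairs, which is the situation the abstract identifies as limiting the MI measure for Croatian.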

Croatian Language; multiword terms; term candidates; statistical processing; mutual information


Contribution details

102-107.

2003.

published

Host publication details

Conference details

lecture

08.09.2003-09.09.2003

Borovets, Bulgaria

Related research areas

Philology