Finding Multiword Term Candidates in Croatian

Tadić, Marko; Šojat, Krešimir

Data source: CROSBI

Finding Multiword Term Candidates in Croatian (CROSBI ID 492299)

Conference paper in proceedings | original scientific paper | international peer review

Tadić, Marko; Šojat, Krešimir. Finding Multiword Term Candidates in Croatian // Proceedings of Information Extraction for Slavic Languages 2003 Workshop (IESL2003). Sofia: BAS, 2003, pp. 102-107

Responsibility data

Authors

Tadić, Marko; Šojat, Krešimir

Basic data in the original language
Basic data in other languages

Language

English

Title

Finding Multiword Term Candidates in Croatian

Abstract

The paper presents research on the statistical processing of a corpus of Croatian texts, with the primary aim of finding statistically significant co-occurrences in n-grams of tokens (digrams, trigrams and tetragrams). The collocations found with this method form a list of candidates for multiword terminological units, which is submitted to terminologists for further processing, i.e. manual selection of the “real terms”. The statistical measure of co-occurrence used is mutual information (MI3), accompanied by linguistic filters: stop-words and POS. The results on non-lemmatized material of a highly inflected language such as Croatian show that the MI measure alone is not sufficient to find a satisfactory number of multiword term candidates. In this case, the use of absolute frequency combined with linguistic filtering techniques gives a broader list of candidates for real terms.
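As an illustration of the kind of scoring the abstract describes, here is a minimal Python sketch of cubic mutual information (MI3) over digrams with a stop-word filter. It is not the authors' implementation. The abstract does not give the formula, so the sketch assumes the commonly cited form MI3 = log2(P(x,y)^3 / (P(x) P(y))). The function name `mi3_bigrams`, the stop-word list, and the `min_freq` threshold are hypothetical, and the POS filter is omitted.

```python
import math
from collections import Counter

# Hypothetical stop-word list for illustration only; not taken from the paper.
STOP_WORDS = {"i", "u", "na", "je", "se", "za", "da"}

def mi3_bigrams(tokens, stop_words=STOP_WORDS, min_freq=2):
    """Rank adjacent token pairs by cubic mutual information (MI3).

    Assumed definition: MI3 = log2(P(x,y)^3 / (P(x) * P(y))).
    """
    n = len(tokens)
    unigram = Counter(tokens)
    bigram = Counter(zip(tokens, tokens[1:]))
    scores = {}
    for (x, y), f_xy in bigram.items():
        # Frequency threshold (assumed): skip rare pairs.
        if f_xy < min_freq:
            continue
        # Stop-word filter (assumed form): drop pairs containing a stop word.
        if x in stop_words or y in stop_words:
            continue
        # Simplification: unigram and digram counts are both normalised by n.
        p_xy = f_xy / n
        p_x = unigram[x] / n
        p_y = unigram[y] / n
        scores[(x, y)] = math.log2(p_xy ** 3 / (p_x * p_y))
    # Highest-scoring candidates first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

tokens = ("strojno učenje i strojno učenje na hrvatskom jeziku "
          "i strojno prevođenje").split()
candidates = mi3_bigrams(tokens)
```

On non-lemmatized text each inflected form is counted as a separate token, so the counts for a single term are split across its case forms. That is one plausible reason, consistent with the abstract's finding, why ranking by absolute frequency with filters may surface more candidates than the MI score alone.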

Keywords

Croatian language; multiword terms; term candidates; statistical processing; mutual information

Note

not recorded

Language

not recorded

Title

not recorded

Abstract

not recorded

Keywords

not recorded

Note

not recorded

Contribution data

Pages

102-107

Year of publication

2003

Publication status

published

Parent publication data

Title

Proceedings of Information Extraction for Slavic Languages 2003 Workshop (IESL2003)

Publisher

Sofia: BAS

Conference data

Conference

Information Extraction for Slavic Languages 2003 Workshop

Type of participation

lecture

Conference dates

8-9 September 2003

Conference location

Borovets, Bulgaria

Related records

Related persons

Marko Tadić (CroRIS ID: 12084; MBZ: 157043) (author)

Krešimir Šojat (CroRIS ID: 27039; MBZ: 255106) (author)

Related institutions

Filozofski fakultet u Zagrebu (130) (author's institution)

Field

Philology