Comparison of Collocation Extraction Measures for Document Indexing

Petrović, Saša; Šnajder, Jan; Dalbelo Bašić, Bojana; Kolar, Mladen

CROSBI ID: 129974

Journal article | original scientific paper

Petrović, Saša; Šnajder, Jan; Dalbelo Bašić, Bojana; Kolar, Mladen. Comparison of Collocation Extraction Measures for Document Indexing // CIT. Journal of computing and information technology, 14 (2006), 4; 321-327

Responsibility data

Authors

Petrović, Saša ; Šnajder, Jan ; Dalbelo Bašić, Bojana ; Kolar, Mladen

Basic data in the original language

Language

English

Title

Comparison of Collocation Extraction Measures for Document Indexing

Abstract

Automatic extraction of collocations from a corpus is a well-known problem in the field of natural language processing. It is typically carried out by employing a statistical measure that indicates whether two words occur together more often than by chance. As there is an abundance of such measures proposed by various authors, we have compared some of them on the task of extracting collocations from a corpus of Croatian legal documents for the purpose of document indexing. We propose and evaluate extensions of these measures for collocations consisting of three words.

Keywords

corpus statistics; collocation extraction; statistical natural language processing; document indexing

Note

International Conference on Information Technology Interfaces : ITI 2006. (28 ; 2006) / Vesna Lužar-Stiffler, Vesna Hljuz Dobrić (eds.) ; Zagreb : University of Zagreb, SRCE, 2006. ; pp. 451-456 ; Cavtat, Croatia ; 19-22 June 2006 ; ISBN 953-7138-05-4

Publication data

Journal

CIT. Journal of computing and information technology

Volume (issue)

14 (4)

Year

2006

Pages

321-327

Publication status

published

ISSN

1330-1136

Related entities

Related persons

Jan Šnajder (author)

Related institutions

Fakultet elektrotehnike i računarstva (Faculty of Electrical Engineering and Computing) (036) (author's institution)

Field

Computing