Data source: CROSBI

Making Monolingual Corpora Comparable: a Case Study of Bulgarian and Croatian (CROSBI ID 30177)

Book chapter | original scientific paper

Bekavac, Božo ; Osenova, Petya ; Simov, Kiril ; Tadić, Marko. Making Monolingual Corpora Comparable: a Case Study of Bulgarian and Croatian // Fourth International Conference on Language Resources and Evaluation LREC2004 / Lino, Maria Teresa ; Xavier, Maria Francesca ; Ferreira, Fátima et al. (eds.). Paris : Lisbon: European Language Resources Association (ELRA), 2004. pp. 1187-1190-x

Responsibility details

Bekavac, Božo ; Osenova, Petya ; Simov, Kiril ; Tadić, Marko

English

Making Monolingual Corpora Comparable: a Case Study of Bulgarian and Croatian

This paper describes the first steps towards the creation of a Bulgarian-Croatian comparable corpus. It is based on two newspaper subcorpora drawn from larger reference corpora of Bulgarian and Croatian. Initially, we rely on more extralinguistically oriented but methodologically cleaner parameters of similarity, such as specific topics, a predefined time span, and data size. The idea of ‘light’ and ‘hard’ comparable corpora is introduced. At this stage we aim to produce a ‘light’ bilingual comparable corpus. The algorithm for identifying lexical similarity and aligning linguistic units is presented, and the initial experiments are outlined.

corpus linguistics, comparable corpora, Croatian, Bulgarian

Chapter details

1187-1190-x.

published

Book details

Fourth International Conference on Language Resources and Evaluation LREC2004

Lino, Maria Teresa ; Xavier, Maria Francesca ; Ferreira, Fátima ; Costa, Rute ; Silva, Raquel

Paris : Lisbon: European Language Resources Association (ELRA)

2004.

2-9517408-1-6

Related research fields

Information and Communication Sciences, Philology, Ethnology and Anthropology