Making Monolingual Corpora Comparable: a Case Study of Bulgarian and Croatian (CROSBI ID 30177)
Book chapter | original scientific paper
Responsibility information
Bekavac, Božo ; Osenova, Petya ; Simov, Kiril ; Tadić, Marko
English
Making Monolingual Corpora Comparable: a Case Study of Bulgarian and Croatian
This paper describes the first steps towards the creation of a Bulgarian-Croatian comparable corpus. It is based on two newspaper subcorpora drawn from larger reference corpora of Bulgarian and Croatian. Initially, we rely on parameters of similarity that are more extralinguistically oriented but methodologically cleaner, such as specific topics, a pre-defined time span, and data size. The idea of ‘light’ and ‘hard’ comparable corpora is introduced. At this stage we aim to produce a ‘light’ bilingual comparable corpus. The algorithm for identifying lexical similarity and aligning linguistic units is presented, and the initial experiments are outlined.
corpus linguistics, comparable corpora, Croatian, Bulgarian
Chapter information
1187-1190-x.
published
Book information
Lino, Maria Teresa ; Xavier, Maria Francesca ; Ferreira, Fátima ; Costa, Rute ; Silva, Raquel
Paris ; Lisbon: European Language Resources Association (ELRA)
2004.
2-9517408-1-6