Building the Macedonian-Croatian Parallel Corpus (CROSBI ID 655324)
Conference paper in proceedings | original scientific paper | international peer review
Responsibility details
Cebović, Ines ; Tadić, Marko
English
Building the Macedonian-Croatian Parallel Corpus
In this paper we present a newly created parallel corpus of two under-resourced languages, the Macedonian-Croatian Parallel Corpus (mk-hr_pcorp), which was collected during 2015 at the Faculty of Humanities and Social Sciences, University of Zagreb. The mk-hr_pcorp is a unidirectional (mk -> hr) parallel corpus composed of synchronic fictional prose texts that were received already in digital form, with over 500 Kw in each language. The corpus was sentence-segmented and provides 39,735 aligned sentences. The alignment was done automatically and then post-corrected manually. The order of the alignments was shuffled, which enabled the corpus to be made available under a CC-BY license through META-SHARE. However, this shuffling prevents research on language units above the sentence level.
written corpus ; parallel corpus ; Macedonian ; Croatian
Contribution details
4241-4244.
2016.
published
Host publication details
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)
Calzolari, Nicoletta ; Choukri, Khalid ; Declerck, Thierry ; Goggi, Sara ; Grobelnik, Marko ; Maegaard, Bente ; Mariani, Joseph ; Mazo, Helene ; Moreno, Asuncion ; Odijk, Jan ; Piperidis, Stelios
Portorož : Paris: European Language Resources Association (ELRA)
978-2-9517408-9-1
Conference details
Tenth International Conference on Language Resources and Evaluation (LREC 2016)
poster
23.05.2016-28.05.2016
Portorož, Slovenia
Related research fields
Philology, Information and Communication Sciences, Interdisciplinary Humanities