Data source: CROSBI

Towards Obtaining High Quality Sentence-Aligned English-Croatian Parallel Corpus (CROSBI ID 573587)

Conference paper in proceedings | original scientific paper | international peer review

Brkić, Marija ; Matetić, Maja ; Seljan, Sanja Towards Obtaining High Quality Sentence-Aligned English-Croatian Parallel Corpus // Proceedings of the 4th IEEE International Conference on Computer Science and Information Technology ICCSIT 2011. Chengdu, 2011. pp. 1068-1070

Responsibility details

Brkić, Marija ; Matetić, Maja ; Seljan, Sanja

English

Towards Obtaining High Quality Sentence-Aligned English-Croatian Parallel Corpus

This paper presents the acquisition of a parallel bilingual corpus and all the steps involved in the process of unsupervised sentence alignment, such as tokenization, lowercasing, etc. The problem of sentence alignment is not trivial because translators do not necessarily translate one sentence in the source language into one sentence in the target language. Three different unsupervised and language-independent approaches to sentence alignment are presented, and implementations of these approaches in three different freely available tools are tested. A gold standard for English-Croatian automatic sentence alignment evaluation is created. Finally, a detailed analysis of the acquired corpus is given.

Sentence alignment ; alignment tools ; sentence alignment evaluation ; parallel corpus ; sentence-length ; word-correspondence

Contribution details

1068-1070.

2011.

published

Host publication details

Proceedings of the 4th IEEE International Conference on Computer Science and Information Technology ICCSIT 2011

Chengdu:

Conference details

4th IEEE International Conference on Computer Science and Information Technology ICCSIT 2011

lecture

10 June 2011 - 12 June 2011

Sichuan, China

Paper associations

Information and communication sciences