The applicability of lemmatisation in translation equivalents detection

Tadić, Marko; Fulgosi, Sanja; Šojat, Krešimir

Data source: CROSBI

The applicability of lemmatisation in translation equivalents detection (CROSBI ID 28289)

Book chapter | original scientific paper

Tadić, Marko ; Fulgosi, Sanja ; Šojat, Krešimir The applicability of lemmatisation in translation equivalents detection // Meaningful Texts: The Extraction of Semantic Information from Monolingual and Multilingual Corpora / Barnbrook, Geoff ; Danielsson, Pernilla ; Mahlberg, Michaela (eds.). London : New York (NY): Continuum International Publishing Group, 2004. pp. 195-206-x

Responsibility details

Authors

Tadić, Marko ; Fulgosi, Sanja ; Šojat, Krešimir

Basic data in the original language
Basic data in other languages

Language

English

Title

The applicability of lemmatisation in translation equivalents detection

Abstract

The aim of the research is to help identify translation equivalents (TEs) in 1:1 aligned sentences at the level of single-word units. The research is based on the Croatian-English parallel corpus compiled at the University of Zagreb. The method is entirely statistical, with no linguistic filter applied before or after processing, and has three steps: 1) generation of all possible pairs of tokens from 1:1 aligned sentences (Cartesian product); 2) application of mutual information (MI) to the generated pairs in order to detect candidates for real TEs; 3) sorting the pairs by the calculated MI and choosing real TEs for further use. The same method was applied to non-lemmatized and lemmatized material. The latter showed 4.5% higher precision, which confirms our hypothesis that for the Croatian-English pair (and possibly for other morphologically rich languages like Croatian) the lemmatized form of corpus data helps statistical methods of TE detection.
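The three steps described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact MI formula or counting scheme, so the version below assumes pointwise mutual information estimated from sentence-level co-occurrence counts. The function name `rank_translation_candidates`, the `min_count` parameter, and the three-sentence toy corpus are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter
from itertools import product


def rank_translation_candidates(aligned_pairs, min_count=1):
    """Rank (source, target) token pairs by pointwise mutual information.

    aligned_pairs: list of (source_tokens, target_tokens), one per 1:1
    aligned sentence pair. Counting is per sentence pair (an assumption).
    """
    n = len(aligned_pairs)
    src_counts, tgt_counts, pair_counts = Counter(), Counter(), Counter()
    for src_tokens, tgt_tokens in aligned_pairs:
        src_set, tgt_set = set(src_tokens), set(tgt_tokens)
        src_counts.update(src_set)
        tgt_counts.update(tgt_set)
        # Step 1: Cartesian product of the tokens in one aligned pair.
        pair_counts.update(product(src_set, tgt_set))

    scored = []
    for (s, t), c in pair_counts.items():
        if c < min_count:
            continue
        # Step 2: PMI = log2(P(s, t) / (P(s) * P(t))) with P estimated
        # as the fraction of sentence pairs containing the item.
        pmi = math.log2(c * n / (src_counts[s] * tgt_counts[t]))
        scored.append((s, t, pmi))

    # Step 3: sort by score, highest first; ties broken alphabetically.
    scored.sort(key=lambda item: (-item[2], item[0], item[1]))
    return scored


# Invented toy corpus of non-lemmatized tokens (illustration only).
corpus = [
    (["kuća", "je", "velika"], ["the", "house", "is", "big"]),
    (["kuća", "je", "mala"], ["the", "house", "is", "small"]),
    (["pas", "je", "velik"], ["the", "dog", "is", "big"]),
]
ranked = rank_translation_candidates(corpus)
for source, target, score in ranked[:5]:
    print(f"{source}\t{target}\t{score:.3f}")
```

In this toy input the inflected forms "velika" and "velik" are counted as separate tokens, so each co-occurs with "big" only once. Replacing both with the lemma "velik" before counting would pool those occurrences, which is the kind of effect the lemmatized run in the abstract relies on.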

Keywords

Croatian Language, English Language, Croatian-English Parallel Corpus, parallel corpus, lemmatization, translation equivalents, translation equivalents detection

Note

not recorded

Language

not recorded

Title

not recorded

Abstract

not recorded

Keywords

not recorded

Note

not recorded

Contribution details

Pages

195-206-x.

Publication status

published

Book details

Book in which the contribution was published

Meaningful Texts: The Extraction of Semantic Information from Monolingual and Multilingual Corpora

Editors

Barnbrook, Geoff ; Danielsson, Pernilla ; Mahlberg, Michaela

Publisher

London : New York (NY): Continuum International Publishing Group

Year of publication

2004.

ISBN

082647490X

Related entities

Related persons

Marko Tadić (CroRIS ID: 12084; MBZ: 157043) (author)

Sanja Fulgosi (CroRIS ID: 27116; MBZ: 256833) (author)

Krešimir Šojat (CroRIS ID: 27039; MBZ: 255106) (author)

Related institutions

Filozofski fakultet u Zagrebu (130) (author's institution)

Field

Philology

Links

is.bham.ac.uk