Data source: CROSBI

Tagset Reductions in Morphosyntactic Tagging of Croatian Texts (CROSBI ID 556339)

Conference paper in proceedings | original scientific paper | international peer review

Agić, Željko ; Tadić, Marko ; Dovedan, Zdravko. Tagset Reductions in Morphosyntactic Tagging of Croatian Texts // The Future of Information Sciences: Digital Resources and Knowledge Sharing (INFuture 2009) / Stančić, Hrvoje ; Seljan, Sanja ; Bawden, David et al. (eds.). Zagreb: Odsjek za informacijske i komunikacijske znanosti Filozofskog fakulteta Sveučilišta u Zagrebu, 2009. pp. 289-298

Responsibility details

Agić, Željko ; Tadić, Marko ; Dovedan, Zdravko

English

Tagset Reductions in Morphosyntactic Tagging of Croatian Texts

Morphosyntactic tagging of Croatian texts is performed with stochastic taggers by using a language model built on a manually annotated corpus implementing the Multext East version 3 specifications for Croatian. Tagging accuracy in this framework is largely predetermined, i.e. it depends on two things: the size of the training corpus and the number of different morphosyntactic tags encompassed by that corpus. Because the 100 kw Croatia Weekly newspaper corpus makes for a rather small language model in terms of stochastic tagging of free domain texts, the paper presents an approach dealing with tagset reductions. Several meaningful subsets of the Croatian Multext East version 3 morphosyntactic tagset specifications are created and applied to Croatian texts with the CroTag stochastic tagger, measuring overall tagging accuracy and F1-measures. The results are discussed in terms of applying different reductions in different natural language processing systems and specific tasks defined by specific user requirements.
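
The idea in the abstract can be illustrated with a minimal sketch. It assumes a hypothetical reduction that keeps only the first few positions of a positional morphosyntactic tag, applies it after tagging to both gold and predicted tags, and then computes overall accuracy and per-tag F1. The function names, the truncation rule and the four-token toy data are illustrative assumptions; they are not the reductions, the CroTag interface or the results reported in the paper, where a reduced tagset would more plausibly be used to train the tagger itself.

```python
from collections import Counter


def reduce_tag(msd, keep=None):
    """Hypothetical reduction: keep only the first `keep` positions of a tag.

    keep=None leaves the full tag untouched.
    """
    return msd[:keep]


def accuracy(gold, pred):
    """Share of tokens whose predicted tag equals the gold tag."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        return 0.0
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def f1_per_tag(gold, pred):
    """Per-tag F1 computed from true positives, false positives and false negatives."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted tag was wrong
            fn[g] += 1  # gold tag was missed
    scores = {}
    for tag in set(gold) | set(pred):
        denom = 2 * tp[tag] + fp[tag] + fn[tag]
        scores[tag] = 2 * tp[tag] / denom if denom else 0.0
    return scores


# Toy data only (assumed for illustration, not taken from the paper).
gold = ["Ncmsn", "Vmip3s", "Afpfsa", "Ncfsa"]
pred = ["Ncmsa", "Vmip3s", "Afpfsn", "Ncfsa"]

for keep in (None, 2, 1):
    g = [reduce_tag(t, keep) for t in gold]
    p = [reduce_tag(t, keep) for t in pred]
    print(keep, accuracy(g, p), f1_per_tag(g, p))
```

In this toy case the two errors concern only the case attribute, so they disappear once the tags are truncated to category and type. This is the kind of trade-off between tag granularity and accuracy that the abstract describes.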

morphosyntactic tagging; part-of-speech tagging; stochastic tagger; Multext East tagset; tagset reductions; Croatian language

Contribution details

289-298.

2009.

published

Host publication details

The Future of Information Sciences: Digital Resources and Knowledge Sharing (INFuture 2009)

Stančić, Hrvoje ; Seljan, Sanja ; Bawden, David ; Lasić-Lazić, Jadranka ; Slavić, Aida

Zagreb: Odsjek za informacijske i komunikacijske znanosti Filozofskog fakulteta Sveučilišta u Zagrebu

978-953-175-355-5

Conference details

INFuture2009

lecture

4 November 2009 to 6 November 2009

Zagreb, Croatia

Related fields

Information and communication sciences, Philology