Data source: CROSBI

Building a gold standard for event detection in Croatian (CROSBI ID 571701)

Conference paper in proceedings | original scientific paper | international peer review

Ljubešić, Nikola ; Boras, Damir ; Lauc, Tomislava Building a gold standard for event detection in Croatian / Calzolari, Nicoletta ; Choukri, Khalid ; Maegaard, Bente et al. (eds.). Valletta: European Language Resources Association (ELRA), 2010. pp. 3101-3104

Responsibility details

Ljubešić, Nikola ; Boras, Damir ; Lauc, Tomislava

English

Building a gold standard for event detection in Croatian

This paper describes the process of building a newspaper corpus annotated with events described in specific documents. The main difference to the corpora built as part of the TDT initiative is that documents are not annotated by topics, but by specific events they describe. Additionally, documents are gathered from sixteen sources and all documents in the corpus are annotated with the corresponding event. The annotation process consists of a browsing and a searching step. Experiments are performed with a threshold that could be used in the browsing step yielding the result of having to browse through only 1% of document pairs for a 2% loss of relevant document pairs. A statistical analysis of the annotated corpus is undertaken showing that most events are described by few documents while just some events are reported by many documents. The inter-annotator agreement measures show high agreement concerning grouping documents into event clusters, but show a much lower agreement concerning the number of events the documents are organized into. An initial experiment is described giving a baseline for further research on this corpus.

event detection; gold standard; newspaper text; Croatian language

Contribution details

3101-3104.

2010.

published

Host publication details

Calzolari, Nicoletta ; Choukri, Khalid ; Maegaard, Bente ; Mariani, Joseph ; Odijk, Jan ; Piperidis, Stelios ; Rosner, Mike ; Tapias, Daniel

Valletta: European Language Resources Association (ELRA)

2-9517408-6-7

Conference details

Language Resources and Evaluation Conference

poster

17.05.2010-23.05.2010

Valletta, Malta

Related research fields

Information and communication sciences