Automatic Enrichment of Croatian Morphological Lexicon Using Large Corpora and Web Search

Merkler, Danijela; Agić, Željko; Tadić, Marko

Data source: CROSBI

Automatic Enrichment of Croatian Morphological Lexicon Using Large Corpora and Web Search (CROSBI ID 616936)

Conference contribution in proceedings | conference presentation abstract | international peer review

Merkler, Danijela ; Agić, Željko ; Tadić, Marko Automatic Enrichment of Croatian Morphological Lexicon Using Large Corpora and Web Search // Proceedings of the 6th International Conference on Corpus Linguistics. Las Palmas: AELINCO, 2014. p. 42

Responsibility details

Authors

Merkler, Danijela ; Agić, Željko ; Tadić, Marko

Basic data in the original language
Basic data in other languages

Language

English

Title

Automatic Enrichment of Croatian Morphological Lexicon Using Large Corpora and Web Search

Abstract

The first version of the Croatian Morphological Lexicon (HML) was developed as early as 1994 and has been used in various experiments and systems dealing with Croatian. Since the HML is frequently used both as a stand-alone application and as a module in many other systems for processing Croatian, the lexicon is constantly being updated to newer versions by manually inserting unknown wordforms (i.e. the corresponding 3-tuples of lemmas, wordforms and morphosyntactic tags) in batches. The current version of HML consists of 110,000 lemmas and more than 4,000,000 lexicon entries. Due to the limited availability of expert human annotators and various other constraints, the manual inspection, lemma assignment and inflectional pattern selection for unknown wordforms is a rather slow procedure. Accordingly, in this paper, we propose a generic approach to (semi-)automatic generation of new candidate lemmas for HML, their verification, assignment of inflectional patterns and, finally, creation and insertion of new lexicon entries into HML in a single processing pipeline.
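The pipeline stages named in the abstract (candidate lemma generation, verification, inflectional pattern assignment, entry creation) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The pattern table `PATTERNS`, the functions `candidates`, `generate` and `enrich`, the threshold `min_attested`, the MULTEXT-East-style tags and the example word are all assumptions made for illustration. Verification here relies only on corpus frequency counts, and the web search step mentioned in the title is not modelled.

```python
from collections import Counter
from typing import NamedTuple


class Entry(NamedTuple):
    """One lexicon entry: the 3-tuple described in the abstract."""
    lemma: str
    wordform: str
    msd: str  # morphosyntactic tag


# Hypothetical toy inflectional patterns: name -> [(ending, tag), ...].
# The first ending in each list is the lemma (citation form) ending.
# Real Croatian paradigms are far richer than this.
PATTERNS = {
    "noun_fem_a": [("a", "Ncfsn"), ("e", "Ncfsg"), ("i", "Ncfsd"), ("u", "Ncfsa")],
    "noun_masc_0": [("", "Ncmsn"), ("a", "Ncmsg"), ("u", "Ncmsd")],
}


def candidates(wordform):
    """Yield (lemma, pattern) pairs that could have produced the wordform."""
    for name, endings in PATTERNS.items():
        lemma_ending = endings[0][0]
        for ending, _tag in endings:
            if wordform.endswith(ending):
                stem = wordform[: len(wordform) - len(ending)]
                if stem:
                    yield stem + lemma_ending, name


def generate(lemma, pattern):
    """Expand a lemma into all entries of the given inflectional pattern."""
    endings = PATTERNS[pattern]
    stem = lemma[: len(lemma) - len(endings[0][0])]
    return [Entry(lemma, stem + ending, tag) for ending, tag in endings]


def enrich(unknown, corpus_counts, min_attested=2):
    """Pick, per unknown wordform, the candidate with most attested forms."""
    new_entries = []
    for wordform in unknown:
        best = None
        for lemma, pattern in candidates(wordform):
            forms = {e.wordform for e in generate(lemma, pattern)}
            attested = sum(1 for f in forms if corpus_counts.get(f, 0) > 0)
            if attested >= min_attested and (best is None or attested > best[0]):
                best = (attested, lemma, pattern)
        if best:
            new_entries.extend(generate(best[1], best[2]))
    return new_entries


# Toy corpus counts; in practice these would come from a large corpus.
corpus = Counter(["kava", "kave", "kavu", "kavu"])
entries = enrich(["kavu"], corpus)
```

In this toy run the unknown wordform `kavu` yields three candidate analyses, and the feminine lemma `kava` is selected because three of its generated forms are attested in the counts. A semi-automatic setup would route low-confidence candidates to a human annotator instead of inserting them directly.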

Keywords

morphological lexicon; automatic enlargement; Croatian language

Note

not recorded

Language

not recorded

Title

not recorded

Abstract

not recorded

Keywords

not recorded

Note

not recorded

Contribution details

Pages

42

Year of publication

2014

Publication status

published

Host publication details

Title

Proceedings of the 6th International Conference on Corpus Linguistics

Publisher

Las Palmas: AELINCO

Conference details

Conference

6th International Conference on Corpus Linguistics (CILC 2014)

Type of participation

lecture

Conference dates

22-24 May 2014

Conference location

Las Palmas de Gran Canaria, Spain

Related records

Related persons

Željko Agić (CroRIS ID: 27179; MBZ: 291312) (author)

Related institutions

Filozofski fakultet u Zagrebu (130) (author's institution)

Related projects

Računalna sintaksa hrvatskoga jezika (result of work on the project)

Field

Information and communication sciences

Links

congresos.ulpgc.es