Croatian Lemmatization Server (CROSBI ID 524479)
Conference paper in proceedings | original scientific paper | international peer review
Responsibility details
Tadić, Marko
English
Croatian Lemmatization Server
The need for lemmatization in inflectionally rich languages is indisputable: it applies to a whole range of procedures, from text search up to parsing. Of the two predominant approaches to lemmatization, 1) algorithmic (generally rule-based and realized with finite-state automata, FSA) and 2) relational (generally data-driven and realized with databases), this paper opts for the latter. One reason is that formal-grammar approaches to Croatian morphology are rare and cover only part of the morphological system. The other is that a generator for Croatian has already been developed (Tadić 1994), as has the Croatian Morphological Lexicon (CML) (Tadić & Fulgosi 2003). The idea was to offer an online lemmatization and POS/MSD tagging service using CML v 4.5 as the back-end. The Croatian Lemmatization Server (CLS) is available at http://hml.hnk.ffzg.hr and, for now, offers lemmatization and POS/MSD tagging at the unigram level. For each token in the submitted text, the server returns all lemmas of which that token may be a word-form. For homographic tokens, each lemma is accompanied by all its possible POS/MSD tags, which comply with the MULTEXT-East v3 specifications for Croatian. The CLS can also be used for generation: when a lemma is entered and marked as such, all its possible word-forms are retrieved and returned.
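The relational approach described in the abstract can be illustrated with a minimal sketch. It is not the CLS implementation: the in-memory list stands in for the CML database, the function names (build_indexes, lemmatize, generate, lemmatize_text) are hypothetical, and the handful of entries and their MULTEXT-East-style tags are illustrative only and have not been checked against CML v 4.5.

```python
from collections import defaultdict

# Toy stand-in for a relational lexicon: (word_form, lemma, msd) triples.
# Entries and tags are illustrative assumptions, not data taken from CML.
ENTRIES = [
    ("kosa", "kosa", "Ncfsn"),
    ("kose", "kosa", "Ncfsg"),
    ("kose", "kosa", "Ncfpn"),
    ("kositi", "kositi", "Vmn"),
    ("kose", "kositi", "Vmip3p"),
]


def build_indexes(entries):
    """Index the triples by word-form (for lemmatization) and by lemma (for generation)."""
    by_form = defaultdict(lambda: defaultdict(list))
    by_lemma = defaultdict(set)
    for form, lemma, msd in entries:
        by_form[form][lemma].append(msd)
        by_lemma[lemma].add(form)
    return by_form, by_lemma


BY_FORM, BY_LEMMA = build_indexes(ENTRIES)


def lemmatize(token):
    """Return every candidate lemma of a token, each with all of its possible tags."""
    lemmas = BY_FORM.get(token.lower(), {})
    return {lemma: list(tags) for lemma, tags in lemmas.items()}


def generate(lemma):
    """Return all word-forms recorded for a lemma, in sorted order."""
    return sorted(BY_LEMMA.get(lemma.lower(), set()))


def lemmatize_text(text):
    """Unigram-level lookup: each whitespace-separated token is handled independently."""
    return [(tok, lemmatize(tok)) for tok in text.split()]
```

In this sketch the homographic token "kose" maps to two lemmas ("kosa" and "kositi"), each with its own tag list, while generate("kosa") returns the word-forms stored for that lemma. No disambiguation is attempted, mirroring the unigram-level behaviour the abstract describes.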
lemmatization; POS tagging; MSD tagging; Croatian; web-service
Contribution details
140-146-x.
2006.
published
Host publication details
Vulchanova, Mila Dimitrova ; Koeva, Svetla ; Krapova, Iliyana ; Vulchanov, Valentin
Sofia: Bulgarian Academy of Sciences
Conference details
Fifth International Conference Formal Approaches to South Slavic and Balkan languages (FASSBL)
lecture
18.10.2006-20.10.2006
Bulgaria