
Combining morphological resources for Croatian (CROSBI ID 599937)

Conference contribution in proceedings | abstract of a conference presentation | international peer review

Šojat, Krešimir ; Merkler, Danijela ; Štefanec, Vanja ; Srebačić, Matea ; Tadić, Marko Combining morphological resources for Croatian // 9th Mediterranean Morphology Meeting Book of Abstracts / Raffaelli, Ida ; Kerovec, Barbara ; Srebačić, Matea (eds.). Zagreb: Filozofski fakultet Sveučilišta u Zagrebu, 2013. pp. 51-52

Responsibility details

Šojat, Krešimir ; Merkler, Danijela ; Štefanec, Vanja ; Srebačić, Matea ; Tadić, Marko

English

Combining morphological resources for Croatian

Lexica with morphological information are central components of various NLP tools such as lemmatizers, stemmers and morphological analyzers. Over the past three decades, computational processing of Croatian morphology has focused primarily on inflectional phenomena. The Croatian Morphological Lexicon (HML) comprises approximately 120,000 lemmas and all their inflectional forms. In its on-line version (http://hml.ffzg.hr) HML can be used both as a lemmatizer and as a generator of inflected forms. HML is also used as the basis for morphosyntactic tagging of texts compliant with the MULTEXT-East recommendations v4.0. The processing of derivational phenomena, however, has not been in focus until recently. The need to combine these two lines of work became obvious during the development of tools for morphological analysis beyond inflection. Such tools could be used for information extraction from annotated texts and similar tasks.

The development of the Derivational Database of Croatian Verbs (CroDeriV) has begun recently. It comprises more than 14,000 verbal lemmas analyzed for morphemes. All verbs with the same root are interconnected, which makes it possible to recognize their derivational spans; e.g. the verb hodati ‘to walk’ is connected in CroDeriV to 25 verbs with the root hod. Since HML has not been provided with any derivational data and CroDeriV does not include inflectional patterns, we believe that combining these two resources can be beneficial for both of them, particularly if the procedure can be performed automatically.

In this paper we present the first attempts at automatically merging and expanding these two resources. The experiment was performed on the lexical category of verbs in Croatian, which exhibit extremely rich derivational morphology in terms of affixation. In the first step we examined the coverage of lemmas in both resources. In Table 1 the overall number of verbal lemmas in each resource is aligned with the number of lemmas that are found in one resource but not in the other, i.e. lemmas that exist in HML but are not listed in CroDeriV, and vice versa.

Table 1.
Resource    No. of verbal lemmas    Uncovered lemmas
HML         8,964                   5,716
CroDeriV    13,780                  391

The results show that a rather large set of lemmas from CroDeriV is not listed in HML. In order to include them in HML we decided to extract them and assign their inflectional patterns automatically. The assignment of inflectional patterns is possible in cases where the base verbs are already included in HML. For example, if HML contains the verb hodati ‘to walk’ but does not contain its derivative prehodati ‘to walk over’, the verb prehodati is assigned an inflectional pattern close to that of hodati, based on the derivational relation between them via the shared root. The word forms of lemmas with assigned inflectional patterns can then be easily generated and incorporated into HML. In cases where HML does not contain a particular base verb, the inflectional pattern has to be assigned manually. Conversely, CroDeriV can easily be enriched automatically with the inflectional patterns from HML via lemmas.

This procedure enriches both HML and CroDeriV, significantly extending their coverage and their usability for the further development of tools for morphological analysis of Croatian. The combined inflectional and derivational data can also be used for detailed research on Croatian morphology, especially on the distribution and frequency of conjugational classes and verbal paradigms in corpora, i.e. in an area of Croatian linguistics that has so far been based almost exclusively on the intuition of linguists.
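The two steps described in the abstract (checking lemma coverage, then transferring inflectional patterns via a shared root) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the pattern labels (`PATTERN_A`, `PATTERN_B`), the toy root analyses, and the rule that a donor verb must be a suffix of the derived verb are all assumptions made for illustration; the entries are not taken from HML or CroDeriV.

```python
from typing import Dict, Optional, Set, Tuple


def coverage_gaps(hml: Set[str], croderiv: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return the lemmas found only in HML and those found only in CroDeriV."""
    return hml - croderiv, croderiv - hml


def propose_patterns(
    missing: Set[str],
    root_of: Dict[str, str],
    hml_pattern: Dict[str, str],
) -> Dict[str, Optional[str]]:
    """Propose an inflectional pattern for each lemma missing from the lexicon.

    Assumption: a lexicon verb may act as donor if it shares the root with the
    missing lemma and the missing lemma ends with it (prefix + base verb).
    Lemmas without a donor map to None and are left for manual assignment.
    """
    proposals: Dict[str, Optional[str]] = {}
    for lemma in sorted(missing):
        root = root_of.get(lemma)
        donors = [
            base
            for base in hml_pattern
            if root is not None
            and root_of.get(base) == root
            and lemma.endswith(base)
        ]
        # Prefer the longest donor, i.e. the closest base verb.
        donor = max(donors, key=len, default=None)
        proposals[lemma] = hml_pattern[donor] if donor is not None else None
    return proposals


# Toy data (hypothetical; real entries and pattern codes will differ).
root_of = {
    "hodati": "hod",
    "prehodati": "hod",
    "pisati": "pis",
    "potpisati": "pis",
    "skakutati": "skak",
}
hml_pattern = {"hodati": "PATTERN_A", "pisati": "PATTERN_B"}

only_hml, only_croderiv = coverage_gaps(set(hml_pattern), set(root_of))
proposals = propose_patterns(only_croderiv, root_of, hml_pattern)
```

In this toy run, prehodati would inherit the pattern recorded for hodati, while a verb whose base is absent from the lexicon receives no proposal, mirroring the manual-assignment case mentioned in the abstract.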

morphological processing ; Croatian ; Croatian Morphological Lexicon ; CroDeriV

Contribution details

51-52.

2013.

published

Host publication details

9th Mediterranean Morphology Meeting Book of Abstracts

Raffaelli, Ida ; Kerovec, Barbara ; Srebačić, Matea

Zagreb: Filozofski fakultet Sveučilišta u Zagrebu

Conference details

9th Mediterranean Morphology Meeting

poster

15.09.2013-18.09.2013

Dubrovnik, Croatia

Related research fields

Information and communication sciences, Philology