Data source: CROSBI

Automatic Acquisition of Inflectional Lexica for Morphological Normalisation (CROSBI ID 137478)

Journal article | original scientific paper | international peer review

Šnajder, Jan ; Dalbelo Bašić, Bojana ; Tadić, Marko Automatic Acquisition of Inflectional Lexica for Morphological Normalisation // Information Processing & Management, 44 (2008), 5; 1720-1731. doi: 10.1016/j.ipm.2008.03.006

Authorship details

Šnajder, Jan ; Dalbelo Bašić, Bojana ; Tadić, Marko

Language: English

Because of natural language morphology, a word can take on various morphological forms. Morphological normalisation, often used in information retrieval and text mining systems, conflates the morphological variants of a word to a single representative form. In this paper, we describe an approach to lexicon-based inflectional normalisation. This approach lies between stemming and lemmatisation and is suitable for the morphological normalisation of inflectionally complex languages. To eliminate the immense effort required to compile the lexicon by hand, we focus on the problem of automatically acquiring an inflectional morphological lexicon from raw corpora. We propose a convenient and highly expressive morphology representation formalism on which the acquisition procedure is based. Our approach is applied to the morphologically complex Croatian language, but it should be equally applicable to other languages of similar morphological complexity. Experimental results show that our approach can be used to acquire a lexicon whose linguistic quality allows for rather good normalisation performance.
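The idea of lexicon acquisition from raw corpora followed by lexicon-based normalisation can be illustrated with a minimal toy sketch. This is not the authors' formalism or acquisition procedure: it assumes inflection can be modelled by hypothetical suffix-only paradigms (`fem_a`, `masc_cons`), generates candidate stem/paradigm pairs for each corpus word form, keeps a pair only if at least `min_forms` of its generated forms occur in the corpus (a hypothetical threshold), and maps every attested form to the pair's normal form. All names and functions here are illustrative.

```python
# Toy sketch of corpus-based inflectional lexicon acquisition and
# lexicon-based normalisation. The paradigms, names and threshold are
# hypothetical illustrations, not the formalism used in the paper.

# Each hypothetical paradigm lists the suffixes of one inflectional class;
# the first suffix produces the representative (normal) form.
PARADIGMS: dict[str, list[str]] = {
    "fem_a": ["a", "e", "i", "u", "om", "ama"],          # žena, žene, ženi, ...
    "masc_cons": ["", "a", "u", "om", "i", "e", "ima"],  # prozor, prozora, ...
}


def candidates(word: str) -> list[tuple[str, str]]:
    """Return every (stem, paradigm) pair that could have produced `word`."""
    pairs = []
    for name, suffixes in PARADIGMS.items():
        for suffix in suffixes:
            # Require a non-empty stem; the empty suffix matches every word.
            if word.endswith(suffix) and len(word) > len(suffix):
                pairs.append((word[: len(word) - len(suffix)], name))
    return pairs


def attested(stem: str, name: str, forms: set[str]) -> set[str]:
    """Forms generated by (stem, paradigm) that actually occur in the corpus."""
    return {stem + suffix for suffix in PARADIGMS[name]} & forms


def acquire_lexicon(tokens: list[str], min_forms: int = 3) -> dict[str, str]:
    """Map word forms to normal forms, keeping well-attested candidates only.

    When several candidates cover the same form, the one with the most
    attested forms wins (ties: the first one encountered in sorted order).
    """
    forms = set(tokens)
    lexicon: dict[str, str] = {}
    support_of: dict[str, int] = {}
    for word in sorted(forms):
        for stem, name in candidates(word):
            support = attested(stem, name, forms)
            if len(support) < min_forms:
                continue  # too little corpus evidence for this hypothesis
            normal = stem + PARADIGMS[name][0]
            for form in support:
                if len(support) > support_of.get(form, 0):
                    support_of[form] = len(support)
                    lexicon[form] = normal
    return lexicon


def normalise(tokens: list[str], lexicon: dict[str, str]) -> list[str]:
    """Replace each known form by its normal form; leave unknown words as-is."""
    return [lexicon.get(token, token) for token in tokens]


if __name__ == "__main__":
    corpus = "žena žene ženi ženom ženama prozor prozora prozorom prozorima"
    lexicon = acquire_lexicon(corpus.split())
    print(normalise(["prozorima", "ženama", "kuća"], lexicon))
    # ['prozor', 'žena', 'kuća']
```

In this toy corpus, `žene` could also be explained as a form of a masculine stem `žen`, but that hypothesis is supported by fewer attested forms (4 versus 5), so the feminine reading wins. Suffix-only paradigms cannot express stem alternations that real Croatian inflection exhibits (for example ruka, dative ruci), which is presumably one reason a more expressive representation formalism is needed in practice.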

Keywords: Morphological normalisation; morphological lexicon; lexicon acquisition; inflection; Croatian language; text mining; information retrieval

Publication details

Volume (issue): 44 (5)

Year: 2008

Pages: 1720-1731

Status: published

ISSN: 0306-4573

DOI: 10.1016/j.ipm.2008.03.006

Related research fields

Computing, Information and Communication Sciences, Philology