Croatian Scientific Bibliography (Hrvatska znanstvena bibliografija)

Bibliographic record number: 427400

Chapter/Paper in a book

Authors: Šnajder, Jan; Dalbelo Bašić, Bojana; Tadić, Marko
Title: Lexicon-Based Morphological Normalisation and its Application to Croatian Language
Book: Technologies for the Processing and Retrieval of Semi-Structured Documents: Experience from the CADIAL Project
Editor(s): Tadić, Marko; Dalbelo Bašić, Bojana; Moens, Marie-Francine
Publisher: Croatian Language Technologies Society
City: Zagreb
Year: 2009
Series: Language and Technology
Page range: 23-80
Total number of pages in the book: 238
ISBN: 978-953-55375-1-9
Keywords: Morphological normalisation, morphological lexicon, inflection, derivation, lexicon acquisition, Croatian language
Abstract:
Due to language morphology, words appear in text in various inflectional and derivational forms. This morphological variation has been shown to negatively affect the performance of most information retrieval and text mining systems. Morphological variation may be reduced by performing morphological normalisation, i.e., the conflation of morphological variants of a word into a single representative form. A lexicon-based approach to normalisation allows for high normalisation precision, which for morphologically complex languages may otherwise be difficult to achieve. In this paper we describe a two-stage lexicon-based approach to morphological normalisation that addresses both inflectional and derivational variation. To eliminate the immense effort required to compile a lexicon by hand, we devise a procedure for automatically acquiring an inflectional morphological lexicon from raw corpora. We also propose a convenient and highly expressive morphology representation formalism on which the acquisition procedure is based. We apply our approach to the morphologically complex Croatian language, but it should be equally applicable to other languages of similar morphological complexity. A detailed task-independent evaluation reveals that our approach yields good normalisation performance at both the inflectional and the derivational level.
Project / topic: 036-1300646-1986
Original language: ENG
Category: Scientific
Scientific fields:
Computing
Entered in CROSBI by: jsnajder@fer.hr (jsnajder@fer.hr), 21 Sep 2009 at 18:17


