Extracting most frequent Croatian root words using digram comparison and latent semantic analysis (CROSBI ID 508073)
Conference paper in proceedings | original scientific paper | international peer review
Responsibility information
Radoš, Zvonimir ; Jović, Franjo ; Job, Josip
English
Extracting most frequent Croatian root words using digram comparison and latent semantic analysis
A method for extracting root words from Croatian-language text is presented. The method is knowledge-free and can be applied to any language. It uses both morphological and semantic aspects of the language. The algorithm creates morpho-semantic groups of words and extracts a common root for every group. For morphological grouping, digram comparison is used to group words according to their morphological similarity. Latent semantic analysis is then applied to split the morphological groups into semantic subgroups of words. Root words are extracted from every morpho-semantic group. When the algorithm was applied to Croatian-language text, the hundred most frequent root words it produced included 60 grammatically correct root words and 25 that were correct FAP (for all practical purposes).
morphological analysis; LSA; word tree; stem; root word; knowledge-free
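The morphological grouping and root extraction steps named in the abstract can be illustrated with a minimal sketch. The abstract does not state which digram similarity measure, grouping strategy, or threshold the authors used, so the Dice coefficient, the greedy grouping, the 0.5 threshold, and all function names below are assumptions made for illustration. The latent semantic analysis step is omitted, and the longest common prefix stands in for the extracted root.

```python
def digrams(word):
    """Return the set of adjacent character pairs (digrams) in a word."""
    w = word.lower()
    return {w[i:i + 2] for i in range(len(w) - 1)}


def dice_similarity(a, b):
    """Dice coefficient over digram sets (assumed measure, not from the paper)."""
    da, db = digrams(a), digrams(b)
    if not da or not db:
        return 0.0
    return 2 * len(da & db) / (len(da) + len(db))


def group_words(words, threshold=0.5):
    """Greedy grouping: put each word into the first group whose
    first member is at least `threshold` similar, else start a new group."""
    groups = []
    for word in words:
        for group in groups:
            if dice_similarity(word, group[0]) >= threshold:
                group.append(word)
                break
        else:
            groups.append([word])
    return groups


def common_root(group):
    """Longest common prefix of a group, used as a stand-in for its root."""
    root = group[0]
    for word in group[1:]:
        while not word.startswith(root):
            root = root[:-1]
    return root


words = ["kuća", "kuće", "kući", "grad", "grada", "gradu"]
for group in group_words(words):
    print(common_root(group), group)
```

In this sketch, inflected forms of the same noun share most of their digrams and so end up in one group, while unrelated words share none. A purely morphological grouping like this can merge words that look alike but differ in meaning, which is the case the abstract's semantic splitting step is meant to address.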
ISBN 972-8865-19-8
Contribution details
370-373.
2005.
published
Host publication details
Proceedings of the 7th International Conference on Enterprise Information Systems (ICEIS 2005) : proceedings
Conference details
International Conference on Enterprise Information Systems (7 ; 2005)
lecture
24.05.2005-28.05.2005
Miami (FL), United States of America