Distributional Semantics Approach to Detecting Synonyms in Croatian Language (CROSBI ID 590915)
Conference contribution in proceedings | original scientific paper | international peer review
Responsibility information
Karan, Mladen ; Šnajder, Jan ; Dalbelo Bašić, Bojana
English
Identifying synonyms is important for many natural language processing and information retrieval applications. In this paper we address the task of automatically identifying synonyms in the Croatian language using distributional semantic models (DSMs). We build several DSMs using latent semantic analysis (LSA) and random indexing (RI) on the large hrWaC corpus. We evaluate the models on a dictionary-based similarity test: a set of synonymy questions generated automatically from a machine-readable dictionary. Results indicate that LSA models outperform RI models on this task, with an accuracy of 68.7%, 68.2%, and 61.6% on nouns, adjectives, and verbs, respectively. We analyze how word frequency and polysemy level affect performance and discuss common causes of synonym misidentification.
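As a rough illustration of the kind of pipeline the abstract describes, the following Python sketch builds a toy random indexing (RI) model and answers a synonymy question by choosing the candidate with the highest cosine similarity to the target word. It is not the authors' implementation: the function names, the vector dimensionality, the number of nonzero entries, the context window size, and the ternary index-vector scheme are all assumptions made for this example, and the record does not specify the paper's actual settings.

```python
import math
import random


def make_index_vector(dim, nonzero, rng):
    """Sparse ternary index vector: `nonzero` random positions hold
    alternating +1 / -1, every other position is 0 (assumed scheme)."""
    vec = [0] * dim
    positions = rng.sample(range(dim), nonzero)
    for n, pos in enumerate(positions):
        vec[pos] = 1 if n % 2 == 0 else -1
    return vec


def random_indexing(sentences, dim=300, nonzero=10, window=2, seed=0):
    """Build one context vector per word by summing the index vectors
    of the words that occur within `window` positions of it.
    All default parameter values are illustrative assumptions."""
    rng = random.Random(seed)
    vocab = sorted({w for s in sentences for w in s})
    index_vectors = {w: make_index_vector(dim, nonzero, rng) for w in vocab}
    context_vectors = {w: [0] * dim for w in vocab}
    for s in sentences:
        for i, word in enumerate(s):
            lo, hi = max(0, i - window), min(len(s), i + window + 1)
            for j in range(lo, hi):
                if j == i:
                    continue  # a word is not its own context
                iv = index_vectors[s[j]]
                cv = context_vectors[word]
                for d in range(dim):
                    cv[d] += iv[d]
    return context_vectors


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def answer_synonymy_question(vectors, target, candidates):
    """Pick the candidate whose vector is most similar to the target's."""
    return max(candidates, key=lambda c: cosine(vectors[target], vectors[c]))
```

In this sketch, a question is scored as correct when the chosen candidate is the dictionary synonym, and accuracy is the fraction of correctly answered questions. An LSA model would replace `random_indexing` with a truncated singular value decomposition of a word-context matrix while leaving the question-answering step unchanged.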
Named Entities ; Extraction ; Classification
Contribution details
111-116.
2012.
published
Host publication details
Proceedings of the Eighth Language Technologies Conference
Erjavec, Tomaž ; Žganec Gros, Jerneja
Ljubljana:
1581-9973
Conference details
Information Society 2012 - Eighth Language Technologies Conference
lecture
08.10.2012-09.10.2012
Ljubljana, Slovenia