Croatian Scientific Bibliography

Bibliographic record no. 938196

Conference proceedings

Authors: Svoboda, Lukáš; Beliga, Slobodan
Title: Evaluation of Croatian Word Embeddings
Source: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018) / Calzolari, N. ; Choukri, K. ; Cieri, C. ; Declerck, T. ; Goggi, S. ; Hasida, K. ; Isahara, H. ; Maegaard, B. ; Mariani, J. ; Mazo, H. ; Moreno, A. ; Odijk, J. ; Piperidis, S. ; Tokunaga, T. (eds.). - Paris, France : European Language Resources Association (ELRA), 2018. pp. 1512-1518 (ISBN: 979-10-95546-00-9).
Conference: Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
Place and date: Miyazaki, Japan, 7-12 May 2018
Keywords: Croatian word embeddings ; Croatian word analogy ; Croatian language ; Slavic language family ; Word2Vec ; FastText ; Croatian word similarity dataset ; WordSim353 ; RG65
Abstract:
Croatian is a poorly resourced and highly inflected language from the Slavic language family. Nowadays, research focuses mostly on English. We created a new word analogy dataset based on the original English Word2Vec word analogy dataset and added some linguistic aspects specific to Croatian. Next, we created Croatian WordSim353 and RG65 datasets for a basic evaluation of word similarity. We compared the created datasets on two popular word representation models, based on the Word2Vec and fastText tools. The models were trained on a 1.37B-token training corpus and tested on the new robust Croatian word analogy dataset. Results show that the models are able to create meaningful word representations. This research has shown that the free word order and the higher morphological complexity of Croatian influence the quality of the resulting word embeddings.
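The abstract names two evaluation tasks: word analogy and word similarity (WordSim353, RG65). Below is a minimal sketch of how such evaluations are commonly computed, assuming the vector-offset (3CosAdd) method of the original Word2Vec analogy test and Spearman rank correlation for the similarity datasets. It is not the authors' evaluation code, and their exact procedure may differ. All function names and the toy 2-D vectors are hypothetical and chosen only for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def solve_analogy(emb, a, b, c):
    """3CosAdd: for 'a is to b as c is to ?', return the word closest to b - a + c.
    The three question words are excluded from the candidates."""
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    candidates = (w for w in emb if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(emb[w], target))

def analogy_accuracy(emb, questions):
    """Fraction of (a, b, c, d) questions answered with d.
    Questions containing out-of-vocabulary words are skipped."""
    usable = [q for q in questions if all(w in emb for w in q)]
    if not usable:
        return 0.0
    correct = sum(solve_analogy(emb, a, b, c) == d for a, b, c, d in usable)
    return correct / len(usable)

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def pearson(x, y):
    """Pearson correlation (assumes neither list is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def similarity_score(emb, pairs):
    """Spearman rho between human scores and model cosine similarities.
    pairs: (word1, word2, human_score); pairs with unknown words are skipped."""
    usable = [(w1, w2, s) for w1, w2, s in pairs if w1 in emb and w2 in emb]
    human = [s for _, _, s in usable]
    model = [cosine(emb[w1], emb[w2]) for w1, w2, _ in usable]
    return pearson(ranks(human), ranks(model))

# Toy 2-D vectors (hypothetical); real Word2Vec/fastText vectors have hundreds of dimensions.
emb = {
    "muškarac": [1.0, 0.0],   # man
    "žena": [1.0, 1.0],       # woman
    "kralj": [3.0, 0.2],      # king
    "kraljica": [3.0, 1.2],   # queen
    "grad": [-1.0, 0.5],      # city
}
questions = [("muškarac", "žena", "kralj", "kraljica")]
pairs = [("kralj", "kraljica", 8.0), ("muškarac", "žena", 7.0), ("kralj", "grad", 1.0)]
print(analogy_accuracy(emb, questions))  # 1.0
print(similarity_score(emb, pairs))      # close to 1.0: same ordering as the human scores
```

Spearman correlation is used here because similarity benchmarks are usually judged on whether the model ranks word pairs in the same order as human annotators, not on the raw cosine values.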
Indexed in databases:
Conference Proceedings Citation Index - Science (CPCI-S) (part of the Web of Science Core Collection)
Conference Proceedings Citation Index - Social Science & Humanities (CPCI-SSH) (part of the Web of Science Core Collection)
Type of participation: Oral presentation
Type of contribution in proceedings: Full paper (more than 1500 words)
Review type: International peer review
Original language: English
Category: Scientific
Scientific fields:
Computing, Information and communication sciences
Full text: 938196.1111.pdf (attached 15 May 2018 at 09:43)
URL: http://www.lrec-conf.org/proceedings/lrec2018/pdf/1111.pdf
Entered into CROSBI by: Slobodan Beliga (sbeliga@inf.uniri.hr), 15 May 2018 at 09:43


