Web indexing and search with local language support (CROSBI ID 502395)
Conference contribution in proceedings | professional paper | international peer review
Statement of responsibility
Krstinić, Damir ; Slapničar, Ivan
English
Web indexing and search with local language support
Web search is becoming essential to everyday life, driven by the need to extract relevant knowledge from the enormous amount of available data. In modern information retrieval systems, data is modeled as a term-by-document matrix. A user query is represented as a vector, so a database search becomes a simple vector operation. The Latent Semantic Indexing (LSI) method reduces the size of the term-by-document matrix and improves the performance of the information retrieval system. The great majority of these systems are built for the English language. Although they can be applied to documents in other languages, they can suffer from incomplete term recognition. We focus on languages with a complex set of grammar rules, where improvement can be achieved by giving the indexing system basic knowledge of the language and the ability to recognize different forms of the same word. Using this technique, the original matrix can be reduced by an order of magnitude and important term-document connections strengthened. We are developing a web indexing engine with local language support using Ispell dictionary files. As part of this effort, Croatian language dictionary files have been developed.
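The effect described in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It builds a small term-by-document matrix, represents a query as a vector over the same terms, and ranks documents by cosine similarity. The `BASE_FORM` table, the sample documents, and all function names are hypothetical; a real system would derive the word-form mapping from Ispell dictionary and affix files, and the LSI (SVD) reduction step is left out here.

```python
import math
from collections import Counter

# Hypothetical toy table mapping inflected Croatian forms to a base form.
# Assumption: a real system would derive this from Ispell dictionary files.
BASE_FORM = {
    "knjige": "knjiga", "knjigu": "knjiga",
    "gradu": "grad", "grada": "grad",
}

# Hypothetical sample documents, used only for illustration.
DOCS = ["knjige u gradu", "grad i more", "more i sunce"]


def normalize(word):
    """Lowercase a word and map it to its base form when one is known."""
    word = word.lower()
    return BASE_FORM.get(word, word)


def term_counts(text, use_base_forms):
    """Count the terms of a text, optionally merging known word forms."""
    to_term = normalize if use_base_forms else str.lower
    return Counter(to_term(w) for w in text.split())


def build_matrix(docs, use_base_forms=True):
    """Return (terms, columns): sorted terms and one count vector per document."""
    counts = [term_counts(doc, use_base_forms) for doc in docs]
    terms = sorted(set().union(*counts))
    columns = [[c[t] for t in terms] for c in counts]
    return terms, columns


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def search(query, docs, use_base_forms=True):
    """Score every document against the query; higher means more similar."""
    terms, columns = build_matrix(docs, use_base_forms)
    q_counts = term_counts(query, use_base_forms)
    q = [q_counts[t] for t in terms]
    return [cosine(q, col) for col in columns]


if __name__ == "__main__":
    print(len(build_matrix(DOCS, use_base_forms=False)[0]), "terms without merging")
    print(len(build_matrix(DOCS, use_base_forms=True)[0]), "terms with merging")
    print(search("knjiga grada", DOCS, use_base_forms=False))
    print(search("knjiga grada", DOCS, use_base_forms=True))
```

In this toy setting the query "knjiga grada" shares no literal term with any document, so it only finds matches once the different forms of "knjiga" and "grad" are merged. Merging also removes a row from the matrix, which is a small-scale version of the reduction the abstract refers to.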
WWW; Internet; information retrieval; text search; vector spaces; latent semantic indexing; LSI; singular value decomposition; SVD; grammar; web spider
Contribution details
488-492-x.
2003.
published
Host publication details
Proceedings of SoftCOM 2003
D. Begušić, N. Rožić
Split: Fakultet elektrotehnike, strojarstva i brodogradnje Sveučilišta u Splitu
Conference details
SoftCOM 2003
lecture
07.10.2003-10.10.2003
Ancona, Italy; Venice, Italy; Dubrovnik, Croatia; Split, Croatia