Dimensionality reduction in representation of textual documents (CROSBI ID 546301)
Conference contribution in proceedings | abstract of conference presentation | international peer review
Authorship details
Dobša, Jasminka
English
Dimensionality reduction in representation of textual documents
The task of information retrieval is to extract the documents relevant to a given query from a collection of textual documents. In the vector space model, documents are represented as vectors in a high-dimensional space. Such a representation suffers from problems caused by neglecting the relations between index terms: documents relevant to a user query are recognized only if the query and the document share terms. For this reason, reparametrization methods have been developed that represent documents in a lower-dimensional space, in which documents on similar topics are clustered even if their term profiles differ slightly. Two methods for representing documents in a lower-dimensional space are presented here: latent semantic indexing and concept indexing. In latent semantic indexing, the original vector space representations of documents are projected onto the first k left singular vectors of the term-document matrix, while in concept indexing they are projected onto the centroids of clusters. Adding new documents to the collection poses a particular problem. The vectors onto which documents are projected are constructed from the representations of all documents in the collection, so computing the reduced-dimension representations of newly added documents requires recomputing the SVD (for latent semantic indexing) or the concept decomposition (for concept indexing). The solution to this problem is to develop methods that give approximate representations of newly added documents in the reduced-dimension space. Possible solutions for approximate representations will be presented.
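The two projections described in the abstract can be illustrated with a minimal NumPy sketch. It is not taken from the presentation. The function names `lsi_basis`, `concept_basis`, and `fold_in` are hypothetical, the centroids are computed with a simple spherical k-means chosen here for illustration, and `fold_in` shows one common approximation (a least-squares projection of a new document onto the existing basis, often called folding-in), which may differ from the approximate methods the author presents.

```python
import numpy as np


def lsi_basis(A, k):
    """First k left singular vectors of the term-document matrix A (terms x docs)."""
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    return U[:, :k]


def concept_basis(A, k, n_iter=20, seed=0):
    """Unit-length cluster centroids (concept vectors) as columns.

    Assumption: clusters come from a basic spherical k-means; the abstract
    does not say which clustering algorithm is used.
    """
    rng = np.random.default_rng(seed)
    # Normalize documents to unit length so dot products are cosines.
    X = A / np.maximum(np.linalg.norm(A, axis=0), 1e-12)
    # Start from k distinct documents as initial centroids.
    C = X[:, rng.choice(X.shape[1], size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = np.argmax(C.T @ X, axis=0)  # most similar centroid per document
        for j in range(k):
            members = X[:, labels == j]
            if members.shape[1] > 0:  # keep the old centroid if a cluster is empty
                c = members.sum(axis=1)
                C[:, j] = c / np.maximum(np.linalg.norm(c), 1e-12)
    return C


def fold_in(basis, d):
    """Approximate reduced representation of a new document vector d.

    Least-squares coefficients of d in the span of the basis columns; the
    basis itself is not recomputed. For an orthonormal basis (the LSI case)
    this equals basis.T @ d.
    """
    coeffs, *_ = np.linalg.lstsq(basis, d, rcond=None)
    return coeffs
```

With a term-document matrix `A` and a new document vector `d` over the same index terms, `fold_in(lsi_basis(A, k), d)` and `fold_in(concept_basis(A, k), d)` give k-dimensional representations of `d` without recomputing either decomposition. The cost of this shortcut is that the basis no longer reflects the added documents, so the approximation can degrade as more documents are folded in.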
information retrieval; latent semantic indexing; concept indexing
Contribution details
24-24.
2008.
published
Host publication details
4th Croatian Mathematical Congress, CroMC2008
Rudolf Scitovski
Osijek:
Conference details
4th Croatian Mathematical Congress, CroMC2008
lecture
17.06.2008-20.06.2008
Osijek, Croatia