Automated news item categorization (CROSBI ID 507168)
Conference paper in proceedings | original scientific paper | international peer review
Responsibility details
Bačan, Hrvoje ; Gulija, Darko ; Pandžić, Igor
English
Automated news item categorization
We present a system for automatic categorization of news items into a standard set of categories. The system has been built specifically for news stories written in the Croatian language. It uses the standard set of news categories established by the International Press Telecommunications Council (IPTC). The categorization algorithm transforms each document into a vector of weights corresponding to an automatically chosen set of keywords. This process is performed on a large training set of news items, forming a multi-dimensional space populated by news items of known categories. An unknown news item is likewise transformed into a vector of keyword weights and then categorized using the k-NN method in this space. The system has been trained on a collection of approximately 2700 manually categorized news items provided by the Croatian News Agency and tested on a different set of approximately 500 randomly chosen news items from the same source. The automatic categorization gave a correct result for 85% of the tested news items.
text categorization; machine learning; news categorization; IPTC
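The pipeline outlined in the abstract (keyword-weight vectors plus k-NN) can be illustrated with a minimal sketch. The record does not state the weighting scheme, the similarity measure, the keyword selection rule, or the value of k, so the choices below are assumptions: smoothed TF-IDF weights, cosine similarity, a document-frequency threshold `min_df`, and `k=3`. The training sentences and the labels `sport` and `politics` are hypothetical toy data, not items from the Croatian News Agency collection or actual IPTC codes.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase whitespace tokenization (assumption); a real system for
    # Croatian would likely also need stemming or lemmatization.
    return text.lower().split()


def vectorize(text, idf):
    # Map a document to a sparse vector of keyword weights (TF * IDF).
    # Words outside the chosen keyword set are ignored.
    tf = Counter(w for w in tokenize(text) if w in idf)
    return {w: count * idf[w] for w, count in tf.items()}


def fit(train_docs, min_df=1):
    # Choose keywords by document frequency (assumed selection rule) and
    # place every training item in the resulting keyword space.
    n = len(train_docs)
    df = Counter()
    for text, _ in train_docs:
        df.update(set(tokenize(text)))
    idf = {w: math.log((1 + n) / (1 + c)) + 1.0
           for w, c in df.items() if c >= min_df}
    vectors = [(vectorize(text, idf), label) for text, label in train_docs]
    return idf, vectors


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(weight * b.get(w, 0.0) for w, weight in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(text, idf, vectors, k=3):
    # Majority vote among the k most similar training items.
    query = vectorize(text, idf)
    ranked = sorted(vectors, key=lambda item: cosine(query, item[0]),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy training set standing in for manually categorized news.
TRAIN = [
    ("team wins football match", "sport"),
    ("player scores goal in match", "sport"),
    ("parliament votes on budget law", "politics"),
    ("government proposes new law", "politics"),
]

IDF, VECTORS = fit(TRAIN)
print(classify("football player scores", IDF, VECTORS))
```

With a realistic corpus, `min_df` and `k` would be tuned on held-out data, and accuracy would be measured on a separate test set, as the abstract describes.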
Contribution details
57-62-x.
2005.
published
Host publication details
Sumi, Yasuyuki ; Nishida, Toyoaki
Kitakyushu: Kyoto University
Conference details
JSAI 2005 Workshop on Conversational Informatics, in conjunction with the 19th Annual Conference of The Japanese Society for Artificial Intelligence JSAI 2005
lecture
13.06.2005-14.06.2005
Kitakyushu, Japan