Compressing Gazetteers Revisited (CROSBI ID 561497)
Conference paper in proceedings | original scientific paper | international peer review
Responsibility data
Budišćak, Ivan ; Piskorski, Jakub ; Ristov, Strahil
English
Compressing Gazetteers Revisited
Finite-state automata are the state-of-the-art representation of gazetteers in NLP. This paper compares different methods for gazetteer compression based on two independently published algorithms for automata substructure recognition. The more recent algorithm, which we denote REC-FSA (Recursive Finite State Automaton), was designed specifically for gazetteer compression and was reported as the most space-efficient approach at the time of publication. In this paper we apply the older method, denoted here REC-FSA-2, and obtain an improvement of approximately 30% in the compression rate compared to the more recent algorithm. However, the more recent algorithm is much faster. We employ a previously published modification of REC-FSA-2, which we denote REC-FSA-2-DICT, to achieve a viable compromise between compression efficiency and time complexity. The results reported here represent the state of the art in gazetteer compression.
Recursive Finite State Automata; Automata Compression; Gazetteer Compression
Contribution data
2009.
published
Host publication data
Pre-proceedings of the Eighth International Workshop on Finite-State Methods and Natural Language Processing 2009 workshop
Watson, Bruce ; Kourie, Derrick ; Cleophas, Loek ; Rautenbach, Pierre
Pretoria: University of Pretoria
978-1-86854-743-2
Conference data
Eighth International Workshop on Finite-State Methods and Natural Language Processing
lecture
21.07.2009-24.07.2009
Pretoria, South Africa