A method for compressing lexicons, DCC02, Data Compression Conference (CROSBI ID 482818)
Conference contribution in proceedings | abstract of conference presentation | international peer review
Responsibility data
Ristov, Strahil ; Laporte, Eric
English
A method for compressing lexicons, DCC02, Data Compression Conference
A natural language lexicon is a set of strings in which each string consists of a word and its associated linguistic data. Its computer representation is a structure that returns the appropriate linguistic data for a given input word, and it should be both small and fast. We propose a method for lexicon compression based on an existing efficient method for compressing tries. Straightforward trie compression becomes ineffective when the strings are long, so the words and the associated data sets are compressed separately, additionally processed, and linked with an auxiliary index structure. The index file is compressed with canonical Huffman codes. For an example French phonetic lexicon of 660,000 entries and 18 Mbytes, the overall size of the searchable compressed string set is 7% of the original size.
natural language lexicon; spelling-to-phonetic conversion; compressed trie; index compression
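The structure described in the abstract (words and data stored separately, linked by an index that is then coded with canonical Huffman codes) can be illustrated with a minimal Python sketch. This is not the authors' implementation: sorted lists stand in for the compressed tries, each word is assumed to have exactly one data string, and all function names and the toy entries are hypothetical.

```python
import heapq
from bisect import bisect_left
from collections import Counter

def build_lexicon(entries):
    """Split (word, data) pairs into a word list, a data table, and an index."""
    by_word = dict(entries)                # assumption: one data string per word
    words = sorted(by_word)                # stand-in for the compressed word trie
    data = sorted(set(by_word.values()))   # stand-in for the compressed data trie
    data_id = {d: i for i, d in enumerate(data)}
    index = [data_id[by_word[w]] for w in words]  # word ordinal -> data ordinal
    return words, data, index

def lookup(words, data, index, word):
    """Return the data string linked to `word`, or None if it is absent."""
    i = bisect_left(words, word)
    if i == len(words) or words[i] != word:
        return None
    return data[index[i]]

def huffman_code_lengths(symbols):
    """Compute Huffman code lengths for the values that occur in `symbols`."""
    freq = Counter(symbols)
    if len(freq) == 1:                     # a single symbol still needs one bit
        return {next(iter(freq)): 1}
    # Heap items are (weight, tie-breaker, symbols in this subtree).
    heap = [(f, n, [s]) for n, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freq, 0)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, s1 = heapq.heappop(heap)
        f2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:                  # every merged symbol gets one bit deeper
            lengths[s] += 1
        heapq.heappush(heap, (f1 + f2, tie, s1 + s2))
        tie += 1
    return lengths

def canonical_codes(lengths):
    """Assign canonical codes: ordered by (length, symbol), counting upward."""
    codes, code, prev_len = {}, 0, 0
    for sym, length in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        code <<= length - prev_len         # pad with zeros when the length grows
        codes[sym] = format(code, "0{}b".format(length))
        code += 1
        prev_len = length
    return codes

# Toy entries (hypothetical spelling -> phonetic pairs, not taken from the paper).
entries = [("chat", "Sa"), ("chats", "Sa"), ("eau", "o"),
           ("eaux", "o"), ("haut", "o"), ("mot", "mo")]
words, data, index = build_lexicon(entries)
codes = canonical_codes(huffman_code_lengths(index))
bitstream = "".join(codes[i] for i in index)
print(lookup(words, data, index, "eaux"), len(bitstream))
```

Because many words share the same data string, the index contains repeated values, and that repetition is what lets Huffman coding shorten it. A canonical code needs only the code lengths to be stored, since the codewords themselves can be regenerated from them.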
Contribution data
470-x.
2002.
published
Host publication data
DCC 2002
Storer, James; Cohn, Martin
IEEE, Computer Society
Conference data
Data Compression Conference
poster
02.04.2002-04.04.2002
Snowbird (UT), United States of America