Comparative analysis of stemmers, lemmatizers and POS taggers for English language (CROSBI ID 417376)
Thesis | university undergraduate final thesis
Responsibility information
Krušić, Lucija
Martinčić-Ipšić, Sanda
English
Comparative analysis of stemmers, lemmatizers and POS taggers for English language
Stemming algorithms, lemmatizers and part-of-speech (POS) taggers are a crucial part of text processing tools and an important part of speech technologies. They belong to the subcategory of natural language processing tasks that deals with syntax (the field of linguistics that studies the rules and processes that determine the structure of sentences). Natural language processing is an interdisciplinary field of study that deals with the interaction between human languages and computers. The main goal of this bachelor thesis is to compare two stemming algorithms with a lemmatizer and to compare three POS taggers for the English language. The thesis includes a theoretical overview of stemming algorithms and lemmatizers as well as of various approaches to POS tagging. It also includes a practical comparison of the Porter stemmer, the Paice-Husk stemmer and the WordNet lemmatizer, and of the Stanford, NlpDotNet and Genia POS taggers. Furthermore, the success rates of the Stanford POS tagger, the NlpDotNet tagger and the Genia tagger are measured. The results are expressed as percentages, based on the number of errors in a given text. The results show that the WordNet lemmatizer was the most successful, with an average of 76.34%, followed by the Paice-Husk stemmer, which correctly stemmed 64.09% of words, and the Porter stemmer, which had a 62.42% success rate. Among the POS taggers, the Stanford tagger proved to be the most successful with 91.06% of words tagged correctly, followed by Genia with 87% and NlpDotNet with 86.18%.
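The abstract reports results as percentages based on the number of errors in a text. The following is a minimal sketch of how such a success rate could be computed, assuming each tool's output is compared token by token against a manually prepared reference. The function name `success_rate` and the toy tag lists are hypothetical and are not taken from the thesis, whose exact scoring procedure is not described in this record.

```python
def success_rate(predicted, gold):
    """Return the percentage of positions where predicted matches gold.

    Assumption: both sequences are aligned token by token, so an error
    is any position where the tool's output differs from the reference.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot score an empty sequence")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return 100.0 * correct / len(gold)


# Hypothetical toy data: reference POS tags and one tagger's output.
gold_tags = ["DT", "NN", "VBZ", "JJ"]
predicted_tags = ["DT", "NN", "VBZ", "NN"]

rate = success_rate(predicted_tags, gold_tags)
print(f"{rate:.2f}%")  # 3 of 4 tags match
```

The same function could score stemmer or lemmatizer output if `gold` holds the expected stems or lemmas, again under the assumption of a one-to-one token alignment.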
stemming, lemmatization, POS tagging, Natural language processing
Publication information
70
21.09.2017.
defended
Information on the institution that awarded the academic degree
Rijeka