Data source: CROSBI

Comparative analysis of stemmers, lemmatizers and POS taggers for English language (CROSBI ID 417376)

Thesis | University undergraduate final (bachelor's) thesis

Krušić, Lucija. Comparative analysis of stemmers, lemmatizers and POS taggers for English language / Martinčić-Ipšić, Sanda (mentor); Rijeka, 2017.

Statement of responsibility

Krušić, Lucija

Martinčić-Ipšić, Sanda (mentor)

Language: English

Stemming algorithms, lemmatizers and part-of-speech (POS) taggers are a crucial part of text processing tools and an important component of speech technologies. They belong to the subcategory of natural language processing tasks that deal with syntax (the field of linguistics that studies the rules and processes that determine the structure of sentences). Natural language processing is an interdisciplinary field of study concerned with the interaction between human languages and computers. The main goal of this bachelor's thesis is to compare two stemming algorithms with a lemmatizer and to compare three POS taggers for the English language. The thesis gives a theoretical overview of stemming algorithms and lemmatizers, as well as of various approaches to POS tagging. It then presents a practical comparison of the Porter stemmer, the Paice-Husk stemmer and the WordNet lemmatizer, and measures the success rates of the Stanford, NlpDotNet and Genia POS taggers. The results are expressed as percentages based on the number of errors in a given text. The WordNet lemmatizer was the most successful, with an average success rate of 76.34%, followed by the Paice-Husk stemmer, which correctly stemmed 64.09% of words, and the Porter stemmer, with a success rate of 62.42%. Among the POS taggers, the Stanford tagger was the most successful, with 91.06% of words tagged correctly, followed by Genia with 87% and NlpDotNet with 86.18%.
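The success rates reported in the abstract are percentages of correctly processed words. As a minimal sketch, assuming each system's output is aligned token by token with a hand-checked gold standard and that every mismatch counts as one error (this record does not state the thesis's exact error-counting rules), the metric could be computed as below. The helper name `success_rate` and all sample words, lemmas and tags are hypothetical illustrations, not data from the thesis.

```python
from typing import Sequence


def success_rate(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Percentage of tokens whose predicted form equals the gold form.

    Hypothetical helper: assumes a one-to-one alignment between the
    system output (stems, lemmas or POS tags) and the gold standard.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must be aligned (same length)")
    if not gold:
        raise ValueError("gold standard is empty")
    # Every mismatch between output and gold counts as one error.
    errors = sum(1 for p, g in zip(predicted, gold) if p != g)
    return 100.0 * (len(gold) - errors) / len(gold)


# Hypothetical gold lemmas for: running, studies, better, mice, was
gold_lemmas = ["run", "study", "good", "mouse", "be"]
# Hypothetical lemmatizer output, written by hand for illustration only.
lemmatizer_output = ["run", "study", "better", "mouse", "be"]

# Hypothetical POS-tag comparison using Penn Treebank-style tags.
gold_tags = ["DT", "NN", "VBZ", "JJ"]
tagger_output = ["DT", "NN", "VBZ", "NN"]

print(f"lemmatizer: {success_rate(lemmatizer_output, gold_lemmas):.2f}%")  # prints "lemmatizer: 80.00%"
print(f"POS tagger: {success_rate(tagger_output, gold_tags):.2f}%")  # prints "POS tagger: 75.00%"
```

Scoring a stemmer this way requires a gold list of expected stems rather than lemmas, since stems (for example the Porter-style "studi") need not be dictionary words. If NLTK is installed, real outputs to score could come from `nltk.stem.PorterStemmer`, `nltk.stem.LancasterStemmer` (NLTK's implementation of the Paice/Husk stemmer) and `nltk.stem.WordNetLemmatizer`; whether these match the implementations used in the thesis is not stated in this record.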

stemming, lemmatization, POS tagging, Natural language processing

Publication details

70

21.09.2017.

Defended

Institution that awarded the academic degree

Rijeka

Related fields

Information and Communication Sciences, Computing