CroRIS - CROSBI

Data source: CROSBI

Comparative analysis of stemmers, lemmatizers and POS taggers for English language (CROSBI ID 417376)

Thesis | university undergraduate final thesis

Krušić, Lucija. Comparative analysis of stemmers, lemmatizers and POS taggers for English language / Martinčić-Ipšić, Sanda (mentor). Rijeka, 2017.

Responsibility

Authors

Krušić, Lucija

Mentors

Martinčić-Ipšić, Sanda

Basic data in the original language

Language

English

Title

Comparative analysis of stemmers, lemmatizers and POS taggers for English language

Abstract

Stemming algorithms, lemmatizers and part-of-speech (POS) taggers are a crucial part of text processing tools and an important part of speech technologies. They belong to the subcategory of natural language processing tasks that deals with syntax (the field of linguistics that studies the rules and processes that determine the structure of sentences). Natural language processing is an interdisciplinary field of study that deals with the interaction between human languages and computers. The main goal of this bachelor thesis is to compare two stemming algorithms with a lemmatizer, and to compare three POS taggers for the English language. The thesis includes a theoretical overview of stemming algorithms and lemmatizers as well as of various approaches to POS tagging. It also includes a practical comparison between the Porter stemmer, the Paice-Husk stemmer and the WordNet lemmatizer, and between the Stanford, NlpDotNet and Genia POS taggers. Furthermore, the success rates of Stanford's POS tagger, the NlpDotNet tagger and the Genia tagger are measured. The results are expressed as percentages, based on the number of errors in a given text. The results show that the WordNet lemmatizer was the most successful with an average of 76.34%, followed by the Paice-Husk stemmer, which correctly stemmed 64.09% of words, and the Porter stemmer, which had a 62.42% success rate. Among the POS taggers, Stanford's tagger proved to be the most successful with 91.06% of correctly tagged words, followed by Genia with 87% and NlpDotNet with 86.18%.
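The success rates in the abstract are percentages of correctly processed words. A minimal sketch of such a computation follows, assuming the tool output and a hand-checked reference are available as parallel lists of tokens. The function name `success_rate` and the sample words are hypothetical; the abstract does not describe the exact counting procedure used in the thesis.

```python
def success_rate(predicted, reference):
    """Return the percentage of positions where predicted equals reference."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("reference must not be empty")
    # Count positions where the tool output matches the reference.
    correct = sum(p == r for p, r in zip(predicted, reference))
    return 100.0 * correct / len(reference)


# Hypothetical data (not from the thesis): four words, one processed incorrectly.
reference = ["run", "cat", "study", "go"]
predicted = ["run", "cat", "studi", "go"]
print(f"{success_rate(predicted, reference):.2f}%")  # 75.00%
```

The same function applies to POS tagging if the two lists hold tags instead of stems or lemmas.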

Keywords

stemming, lemmatization, POS tagging, Natural language processing

Note

not recorded

Basic data in other languages

Language

not recorded

Title

not recorded

Abstract

not recorded

Keywords

not recorded

Note

not recorded

Publication data

Number of pages

Defense date

21.09.2017.

Publication status

defended

Degree-granting institution

Place

Rijeka

Related entities

Related persons

Sanda Martinčić-Ipšić (mentor)

Related institutions

Sveučilište u Rijeci, Fakultet informatike i digitalnih tehnologija (318) (author's institution)

Field

Information and communication sciences, Computer science