Croatian Scientific Bibliography

Bibliographic record number: 125424

Chapter/Paper in a book

Authors: Tadić, Marko
Title: Building the Croatian National Corpus
Book: Third International Conference on Language Resources and Evaluation LREC2002
Editor(s): González Rodriguez, M. ; Suarez Araujo, C. P.
Publisher: ELRA
City: Paris-Las Palmas
Year: 2002
Page range: 441-446
ISBN: 2-9517408-0-8
Keywords: Croatian language, Corpus building, Croatian national corpus, POS tagging
Abstract:
The paper presents the work done so far on building the Croatian National Corpus (HNK), which has been compiled since 1998 at the Institute of Linguistics, Faculty of Philosophy, University of Zagreb. The size, time span, composition, and criteria for text selection are presented. The HNK consists of two parts: 1) a 30-million corpus of contemporary Croatian, and 2) the Croatian Electronic Textual Archive. The procedures for corpus mark-up and processing are discussed. One of the most interesting features of the corpus since its launch in 1998 has been its availability for querying through the WWW. The paper closes with future directions: enlarging the 30-million corpus to 100 million in the next few years, enhanced corpus management and querying, and further annotation and processing.
Project / topic: 0130418
Source language: ENG
Category: Scientific
Scientific fields:
Philology
Full text of the paper: 125424.MT4LREC2002.pdf
URLs: http://www.hnk.ffzg.hr/txts/mt4LREC2002.pdf
http://www.hnk.ffzg.hr/txts/mt4LREC2002.zip


