Bibliographic record number: 581314


Authors: Vučković, Kristina; Silberztein, Max; Varadi, Tamas
Title: Corpus Analysis with NooJ
Type: Workshop
Year: 2012
Keywords: corpus processing; linguistic units; queries; annotations; morphology; syntax
Abstract: NooJ is a freeware language-engineering development environment used to formalize and integrate nine levels of linguistic phenomena: orthography and typography; lexical, inflectional and derivational morphology; local, structural and transformational syntax; and semantics. For each of these levels, NooJ provides linguists with one or more formal frameworks specifically designed to facilitate the description of the corresponding phenomena, as well as parsing, development and debugging tools designed to be as computationally efficient as possible, ranging from finite-state to Turing machines. This approach distinguishes NooJ from other computational linguistic frameworks, which provide a single formalism that is supposed to cover all linguistic phenomena. As an engineering development environment, NooJ contains tools to help construct, test, debug, maintain and accumulate large sets of linguistic resources, as well as tools to process large texts and corpora. The system has been under development since 2002 and has been used to build over 20 language modules. As a corpus processing tool, NooJ allows researchers in various social sciences to extract information from any text or corpus (including untagged ones) by applying sophisticated queries based on concepts rather than word forms, and to build indices and concordances, automatically annotate texts, perform statistical analyses on concepts, etc. NooJ is freely available and runs on Windows, Linux, Solaris and Mac OS X; linguistic modules can already be freely downloaded for over a dozen languages. See the NooJ website for more information; its “doc & help” page provides references to NooJ-related publications. This workshop intends to help participants master three basic NooJ functionalities: corpus processing; formalization of linguistic units; and syntactic parsing and the automatic annotation of texts.
Project / theme: 130-1300646-1776
Original language: ENG
Research fields:
Information and communication sciences
Contrib. to CROSBI by: (, 24 May 2012 at 16:12

