An Online Syntactic and Semantic Framework for Lexical Relations Extraction Using Natural Language Deterministic Model (CROSBI ID 426315)
Thesis | doctoral dissertation
Responsibility details
Orešković, Marko
Čubrilo, Mirko ; Essert, Mario
English
An Online Syntactic and Semantic Framework for Lexical Relations Extraction Using Natural Language Deterministic Model
Given the extraordinary growth in online documents, methods for the automated extraction of semantic relations first became popular and, shortly afterwards, necessary. This thesis proposes a new deterministic language model, with an associated artifact, that acts as an online Syntactic and Semantic Framework (SSF) for the extraction of morphosyntactic and semantic relations. The model covers all fundamental linguistic fields: Morphology (formation, composition, and word paradigms), Lexicography (storing words and their features in network lexicons), Syntax (the composition of words into meaningful parts: phrases, sentences, and pragmatics), and Semantics (determining the meaning of phrases). To achieve this, a new tagging system with more complex structures was developed. Instead of the commonly used vectored systems, this new tagging system uses tree-like T-structures with hierarchical, grammatical Word of Speech (WOS) and Semantic of Word (SOW) tags. For relation extraction, it was necessary to develop a syntactic (sub)model of language, which ultimately is the foundation for performing semantic analysis. This was achieved by introducing a new 'O-structure', which represents the union of WOS/SOW features from the T-structures of words and enables the creation of syntagmatic patterns. Such patterns are a powerful mechanism for extracting conceptual structures (e.g., metonymies, similes, or metaphors), breaking sentences into main and subordinate clauses, or detecting a sentence’s main constituent parts (subject, predicate, and object). Since all program modules are developed as general and generative entities, SSF can be used for any of the Indo-European languages, although validation and network lexicons have been developed for the Croatian language only.
The SSF has three types of lexicons (morphs/syllables, words, and multi-word expressions), and the main words lexicon is included in the Global Linguistic Linked Open Data (LLOD) Cloud, allowing interoperability with all other world languages. The SSF model and its artifact represent a complete natural language model which can be used to extract lexical relations from single sentences, paragraphs, and large collections of documents.
syntax analysis, semantic analysis, lexical relations extraction, new lexicon types, hierarchical tagset structure, linked open data
Publication details
237
15.03.2019.
defended
10.13140/RG.2.2.31092.19849
Details of the institution that awarded the academic degree
Fakultet organizacije i informatike
Zagreb