Croatian Adult Spoken Language Corpus (HrAL): overview and first analysis (CROSBI ID 657520)
Conference contribution in proceedings | abstract of conference presentation | international peer review
Responsibility details
Hržica, Gordana ; Kuvač Kraljević, Jelena
English
Croatian Adult Spoken Language Corpus (HrAL): overview and first analysis
Spoken-language corpora are based on spontaneous, unscripted speech that spans a variety of styles, registers and dialects. Consequently, such corpora are the most comprehensive source of data on the everyday language of ordinary speakers. This paper has two main goals: 1. To present the first Croatian spoken-language corpus, the Croatian Adult Spoken Language Corpus (HrAL; Kuvač Kraljević, Hržica, 2016), together with its structure and its possible applications in different linguistic disciplines. HrAL was built by sampling spontaneous conversations of 617 speakers from all Croatian counties, and it comprises more than 250 000 tokens and more than 100 000 types. 2. To present research on linguistic complexity in adult speakers of Croatian. The interrelation between two syntactic complexity measures was analysed: length of the production unit, as measured by the mean length of communication unit (MLCU); and syntactic sophistication, as measured by the ratio of relative clauses (RRC) to the total number of C-units. Results indicate a significant positive correlation between these two measures, confirming that speakers who produce longer utterances also produce rarer and more complex syntactic structures. Since HrAL reflects actual use of language in everyday situations, it is expected to provide objective information about the Croatian language and deeper insights into its usage. HrAL is available within TalkBank, a large database of spoken-language corpora covering different languages (https://talkbank.org), in the Conversational Analyses corpora, within the Conversational Banks subsection. Data were transcribed, coded and segmented using the transcription format Codes for Human Analysis of Transcripts (CHAT) and the Computerised Language Analysis (CLAN) suite of programmes within the TalkBank toolkit. Such open access should provide opportunities for using HrAL in research on spoken Croatian and its varieties, and also in cross-linguistic studies comparing various linguistic properties.
Kuvač Kraljević, Jelena; Hržica, Gordana (2016). Croatian Adult Spoken Language Corpus (HrAL). Fluminensia: Journal for philological research, 28/2.
MacWhinney, Brian (2007). The TalkBank Project. In J. C. Beal, K. P. Corrigan & H. L. Moisl (eds.), Creating and Digitizing Language Corpora: Synchronic Databases, Vol. 1. Houndmills: Palgrave-Macmillan, 163-180.
conversational analysis, spoken language corpora, Croatian
Contribution details
179
2017
published
Host publication details
12th Slavic Linguistics Society Meeting Book of Abstracts
Conference details
12th Slavic Linguistics Society Meeting
poster
21-24 September 2017
Ljubljana, Slovenia