Institute of Croatian Language and Linguistics
Recent publications
This corpus-based, qualitative study examines the valency patterns of Croatian verbs that come to encode verbal activity, and thus enter the semantic field of verbs of speaking, through metaphorical and metonymic extension. The study is framed within cognitive linguistics, specifically the usage-based model. The analysis focuses on examples such as the metaphorical use of verbs denoting animal sounds (e.g., blejati ‘to bleat’) and verbs associated with bodily processes (e.g., srati ‘to shit’). These uses often convey negative or stereotypical attitudes towards speakers or their messages, and sometimes extreme disdain. This paper contributes to the understanding of cross-domain figurative extensions of verb meanings and of the valency adaptations that accompany them. A dataset of 438 example sentences containing 152 verbs with figurative extensions targeting the domain of speaking was meticulously compiled from Croatian corpora. This dataset enabled manual annotation and analysis of how valency patterns are transferred and adapted across domains. The study addresses the following key questions: 1. What source domains are used for verbs of speaking as targets? 2. Do verbs retain the valency patterns of the source domain or adopt those of the target domain? 3. Is the passivization of transitive verbs possible in metaphorical contexts? The findings indicate that with a metaphorical shift in meaning, verbs often adopt new valency patterns from the target domain. Our examples of valency change resulting from a metaphorical meaning shift demonstrate that verbs can appear with arguments not explicitly subcategorized by the verb itself.
Children’s acquisition of variation in the target language depends on a number of factors that are not yet well understood. This study probes the acquisition of morphological variation in two unrelated languages, Croatian (Slavic) and Estonian (Finnic), focussing on parallel forms of a lexeme that express a single grammatical category (a phenomenon known as morphological overabundance). We conducted a cross-linguistic elicitation experiment with 140 monolingual, typically developing children aged 3;0 to 6;11 (80 learning Croatian, 60 learning Estonian). We elicited genitive plural forms in Croatian and partitive plural forms in Estonian, using lexemes that are either invariant or allow more than one form. Children in both languages were less accurate with lexemes that have parallel forms, indicating that morphological variation hindered acquisition. Pattern type frequency was found to affect accuracy in both languages. Children’s choice between two parallel forms was unaffected by age, but significant language-specific differences emerged.
Proverbs are an important component of cultural literacy and are often encountered in everyday life. Corpus-based studies of proverbs typically focus on proverb frequency. Here we address the challenges of using general language corpora and corpus searches to estimate proverb frequency. Using two general language corpora available in Sketch Engine, HrWaC2.2 and NoTenTen17, to search for semantically equivalent versions of the same proverb in Croatian and Norwegian, we explore various search options and their (dis)advantages, as well as the problem of proverb modifications, which hinder attempts to obtain a reliable picture of proverb frequency in a corpus. Based on the corpus evidence, we provide insights into the types of proverb modification. Finally, we propose that an optimal search tool would consider a proverb’s features beyond frequency, such as its semantic class, syntactic complexity and abstractness, as contributors to its cognitive load. This would allow a better selection of these expressions in clinical testing, education, research and entertainment.
Early research on the first language acquisition of figurative language indicated that the skills for comprehending and producing figurative language develop relatively late, but recent studies contest this view. This study explores the early production of metaphorical meanings (e.g., shark meaning ‘a rapacious, crafty person’) and metonymic meanings (e.g., house meaning ‘an organisation’) in English polysemous nouns and verbs. It uses the Braunwald corpus, which tracks a single child’s speech from the age of 1 year 5 months to 7 years. We examine the first productions of these meanings with respect to age, order of acquisition and part of speech (noun vs. verb). Our study shows that children start using figurative meanings at a much earlier age than previously thought. At this early stage, metonymic meanings emerge first, and metaphorical meanings follow a few months later. These findings challenge the earlier view that children develop figurative language skills only from 3 years of age. They show that early development covers not only pre-figurative skills but also the production of highly conventional figurative meanings. Such meanings may not require the full development of the complex set of cognitive skills needed for cross-domain comparison.
This paper seeks to determine the place of the contemporary Standard Croatian future II in the verbal system and to define its meaning more accurately. What applies to Standard Croatian is assumed to apply to the majority of Štokavian and Čakavian dialects as well. While the primary focus of this work is the modern future II, some of its older meanings are mentioned briefly in Section 4. The modern Croatian future II is peculiar in several ways, partly because of its unclear place in the verbal system: some linguists treat it as a tense (i.e. an indicative tense), others as a mood (i.e. a non-indicative tense). This paper attempts to resolve that disagreement by comparing future II with future I in similar contexts. The difference between the two future forms is difficult to interpret as a difference between categories, but in any case it is modal rather than temporal. In that sense, future II functions as an ‘oblique’ mood in relation to future I, which belongs to the indicative. Whereas future I refers to a future process anchored in the present, future II refers to one that is not. The anchoredness of a future process in the present is understood chiefly as a relatively concrete concept, one that strengthens the conviction that the process will be realised. The non-indicative status of future II is also corroborated by languages that use the subjunctive to express the meanings of future II. Future II as an oblique mood, whether formed with the l-participle or the infinitive, is a logical consequence of its development from earlier indicative forms.
Entrenchment and schematisation are the two most important cognitive processes in language acquisition. This article investigates the role of these two processes, operationalised as token and type frequency respectively, in the production of overgeneralised verb forms by Croatian preschool children. The investigation uses a parental questionnaire and a computational simulation of language acquisition. The questionnaire participants were parents of children aged 3;0–5;11 (n = 174). Parents of most children (93%) reported the parallel use of both adult-like and overgeneralised verb forms, suggesting that Croatian-speaking preschool children have not yet fully acquired the verbal system. The likelihood of overgeneralised forms being reported decreased with the children’s age and with verb type frequency. The results of the computational simulation show that patterns with a higher type frequency also show a greater preference for the correct form, while individual lexical items show both learning and unlearning tendencies during the process.
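A small count over an input sample shows how the two frequency measures named above differ. The Python sketch below is illustrative only. The (lemma, pattern) pairs are hypothetical toy data, not the study's questionnaire items or simulation input. The pattern labels contrast present tenses in -am without a stem change with present tenses in -em with a stem change (e.g. pisati, pišem).

```python
from collections import Counter, defaultdict

# Hypothetical toy input: (lemma, present-tense pattern) pairs, one per
# occurrence in a sample of child-directed speech. Not real study data.
input_tokens = [
    ("pisati", "-em"), ("pisati", "-em"), ("pisati", "-em"),  # pišem
    ("skakati", "-em"),                                       # skačem
    ("čitati", "-am"), ("čitati", "-am"),                     # čitam
    ("gledati", "-am"),                                       # gledam
    ("igrati", "-am"),                                        # igram
]

def token_frequency(tokens):
    """Token frequency: how many times each lemma occurs in the sample."""
    return Counter(lemma for lemma, _pattern in tokens)

def type_frequency(tokens):
    """Type frequency: how many distinct lemmas instantiate each pattern."""
    lemmas_by_pattern = defaultdict(set)
    for lemma, pattern in tokens:
        lemmas_by_pattern[pattern].add(lemma)
    return {pattern: len(lemmas) for pattern, lemmas in lemmas_by_pattern.items()}

print(token_frequency(input_tokens).most_common(1))  # [('pisati', 3)]
print(type_frequency(input_tokens))                  # {'-em': 2, '-am': 3}
```

In this toy sample, pisati has the highest token frequency, but its -em pattern has the lower type frequency. Because the two measures can diverge in this way, they can be tested as separate predictors.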
The automatic detection of implicitly offensive language is a challenge for NLP, as such language is subtle, context-dependent and plausibly deniable. Yet detecting it is becoming increasingly important as large language models are more widely used to generate human-quality text. This study argues that current difficulties in detecting implicit offence are exacerbated by three factors: (a) inadequate definitions of implicit and explicit offence; (b) an insufficient typology of implicit offence; and (c) a lack of detailed analysis of implicitly offensive linguistic data. Based on a qualitative analysis of an implicitly offensive dataset, the study proposes a new typology of implicitly offensive language. The typology is accompanied by a detailed, example-led account, an operational definition of implicitly offensive language, and a thorough analysis of the role of figurative language and humour in each type. Our analyses identify three main issues with previous datasets and typologies used in NLP approaches: (a) conflating content and form in annotation; (b) treating figurativeness, particularly metaphor, as the main device of implicitness while ignoring its equally important role in explicit offence; and (c) an over-focus on form-specific datasets (e.g., focusing only on offensive comparisons), which fails to reflect the full complexity of offensive language use.
Studies on verbal overgeneralization often focus on languages with low morphological complexity. The Croatian conjugational system exhibits varying degrees of complexity. This complexity stems not primarily from the number of inflectional morphemes but from an elaborate system of stem changes. During early language development, children face the difficult task of acquiring this system, and they use overgeneralized forms to cope with its complexity. To date, studies have relied on corpus-based methods to retrieve overgeneralizations in child language, with limited success in capturing this phenomenon. This study investigated the production of overgeneralized verb forms by Croatian monolingual children aged 2;6 to 5;11, using a questionnaire in which parents report overgeneralizations used by their children. We tested the relationship between the production of overgeneralized forms and two features of the input language (token frequency and class size). We hypothesized that the rate of overgeneralization would depend on these features, i.e. that it would be higher for infrequent verbs and for verbs from smaller classes. The items selected for the questionnaire were verbs with stem changes used by parents in the longitudinal Croatian corpus of child language. Parents reported overgeneralized forms in all verb classes, and both verb frequency and class size correlated negatively with the proportion of overgeneralizations. Our results show that children gradually abstract morphological systems in a way that is highly sensitive to the properties of the input.
Manner of speaking verbs denote the transfer of a message through speech. They emphasize the volume, intensity or comprehensibility of the speech, the psychophysical condition of the speaker, and/or the impression that the speaker leaves on the hearer. In this article, these verbs are divided semantically into four subclasses: 1. verbs emphasizing volume, 2. verbs of incomprehensible speaking, 3. verbs of meaningless speaking and complaining, and 4. verbs emphasizing an emotional component. Their syntactic properties have been extensively researched in English. In Croatian, however, these verbs have received no special attention and have been described as monovalent, although they can be bivalent and even trivalent. In all four semantic subclasses, the recipient can be expressed by a dative complement. With verbs of loud speaking and verbs expressing negative emotions, the recipient can also be expressed by the prepositional complements na ‘at’ + accusative and za ‘after’ + instrumental. The theme can be expressed in several ways:
- by a quotation, a clausal complement, the prepositional complement o ‘about’ + locative, or an accusative complement;
- sometimes by the prepositional complements protiv ‘against’ + genitive or za ‘for’ + accusative;
- with fewer verbs, by the prepositional phrases za ‘for’ + instrumental or nad ‘over’ + instrumental.

Interestingly, there are certain restrictions on combining these complements within the same clause, and the article describes them in more detail.
Bojničić’s work is marked by exceptional contributions to Croatian cultural history and the auxiliary historical sciences. Overshadowed by it is his Hungarian Grammar (1888, 1896, 1905, 1912), which went through several revised editions at the turn of the century. During this period one philological school was giving way to another, as the Zagreb philological school was supplanted by the school of the Croatian Vukovians. The grammar (more precisely, its first edition) was approved as a textbook by the Department of Worship and Education of the Royal Hungarian Ministry, which also awarded it in 1889. Nevertheless, it was eventually reviewed negatively in Nastavni vjesnik, the official gazette of the very body that had awarded it. Some Hungarian sources at the beginning of the 20th century gave it a similarly negative assessment. The criticism mostly concerned the grammar’s (Croatian) metalanguage, its statement of incorrect rules and, in the view of certain critics, its unsystematic organisation. Bojničić himself was reproached for his inadequate philological training. For these reasons, among others, the critics described the grammar as a handbook unsuitable for school use. The consulted Croatian and Hungarian sources do not state unambiguously how many editions the grammar actually had. Of the four editions mentioned, only the first was subjected to this philological assessment, and the author incorporated some corrections into the later editions of his grammar.
This paper compares the four editions of Bojničić’s grammar and identifies the linguistic, terminological and lexical changes in its source language (Croatian). It examines to what extent these changes were prompted by the published criticism and to what extent individual editions reflect the change of philological schools. In interpreting the changes across editions, particular attention is paid to the linguistic features characteristic of the norm of the Zagreb philological school. The aim is to show that these features are present in all four editions, irrespective of when they were published and of the language-political circumstances and influences under which they were produced. Ultimately, the paper seeks to determine whether the negative reception the grammar met with in part of the philological public of its time was justified. In other words, it asks whether Bojničić should be labelled an author whose work, including his grammar, lacks adequate professional grounding. The alternative is that, regardless of his training and the criticism directed at him, he deserves credit for his indisputable contribution to Croatian–Hungarian grammaticography.
In contemporary linguistics, the terms subordination and coordination are most commonly used in syntax. There, coordination refers to the relation between independent clauses and subordination to the relation involving dependent ones. In this paper, we approach coordination from the point of view of the syntagm, with the aim of describing the relation between the syntagm and the word. For the purpose of this analysis, a “word” is understood as an orthographic unit (a language unit written between two blank spaces) that is also, in terms of derivational morphology, a compound with at least two stems. In this respect, the most challenging issues are the derivation of semi-compounds, in the traditional sense, and the function of the dash. The dash has two functions: a formal (orthographic and derivational) one and a semantico-syntactic one. The formal function is to connect at least two constituents (stems). The semantico-syntactic function relates, on the one hand, to subordination (connecting parts of a word, i.e., one stem is superordinate to the others; also called a single-term compound). On the other hand, it relates to coordination (connecting two or more words, i.e., stems of equal rank; multi-term compounds). A similar problem arises with compounds that formally have the same status as the syntagm or word phrase from which they were derived. In this paper, this derivational-morphological problem is explored from the perspective of orthographic diachrony. We analyse the approaches in Croatian orthography manuals that reflect the concept of coordination at the level of the syntagm and at the level of the word (i.e., a compound consisting of at least two stems). Our analysis covers all parts of speech, with the goal of listing, describing and explaining all the problematic cases.
The research discussed in this paper was prompted by problems the authors encountered while defining certain linguistic terms within two projects: Croatian Linguistic Terminology – Jena and Croatian Web Dictionary – Mrežnik. Further problems arose while compiling the School Grammar of the Croatian Language. The classification and terminology of coordinate compound sentences are analysed on a corpus of contemporary traditional grammars and papers. The language corpus compiled within the Jena project gives full insight into how well individual terms are attested, and it provides the contexts and meanings in which each term is used. Besides analysing differences in how coordinate compound sentences are classified, the study focuses on terminological aspects. It also examines problems in defining individual terms related to coordinate compound sentences. These problems arise, on the one hand, from the existence of synonymous pairs or series and, on the other, from the polysemy of some terms in this terminology. The aim of this paper is to propose a classification of coordinate compound sentences within the framework of traditional grammar, primarily for school grammars and textbooks. It also aims to resolve some terminological problems and to encourage further discussion of certain problems in the classification and terminology of coordinate compound sentences.
The present paper presents and discusses aspects of OFFENSIVE LANGUAGE linguistic annotation. These include the creation, annotation practice, curation and evaluation of an OFFENSIVE LANGUAGE annotation taxonomy scheme first proposed in Lewandowska-Tomaszczyk et al. (2021). An extended offensive language ontology comprises 17 categories structured on 4 hierarchical levels. This ontology has been shown to represent the encoding of the defined offensive language schema. It was trained using non-contextual word embeddings (Word2Vec and FastText) and then compared with data obtained through pairwise training and testing analysis of the existing categories in the HateBERT model (Lewandowska-Tomaszczyk et al. submitted). The study reports on the annotation practice in WG 4.1.1 “Incivility in media and social media”. This working group belongs to COST Action CA 18209 European network for Web-centred linguistic data science (Nexus Linguarum). Annotation was carried out with the INCEpTION tool (https://github.com/inception-project/inception), a semantic annotation platform that assists annotators. The results partly support the proposed ontology of explicit offense and of positive implicitness types, which provide more variance among widely recognized types of figurative language (e.g., metaphorical, metonymic, ironic). The annotation system and the representation of linguistic data were also evaluated through the annotators’ comments, collected by means of a questionnaire and an open discussion. The annotation results and the questionnaire showed low or medium inter-annotator agreement for some categories. They also showed that annotators found it harder to distinguish between category items than between aspect items, with the category items offensive, insulting and abusive being the most difficult in this respect. On the basis of these results, the need to simplify the taxonomy for further annotation work has been recognized.
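The abstract reports low or medium inter-annotator agreement without naming the statistic used. Purely as an illustration, the Python sketch below computes Cohen’s kappa, a common chance-corrected agreement measure for two annotators. Whether the study used this measure is an assumption, and the labels are invented examples built from the category names mentioned above.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: proportion of items given the same label by both.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement: chance overlap of the two label distributions.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1:
        return 1.0  # both annotators used one and the same label throughout
    return (p_o - p_e) / (1 - p_e)

# Hypothetical labels for six items, using category names from the text.
annotator_1 = ["offensive", "insulting", "abusive", "offensive", "insulting", "offensive"]
annotator_2 = ["offensive", "abusive", "abusive", "insulting", "insulting", "offensive"]
print(round(cohens_kappa(annotator_1, annotator_2), 2))  # 0.5
```

Here the annotators agree on four of six items, an observed agreement of about 0.67. Chance agreement is about 0.33, however, so kappa is only about 0.5, which is the kind of medium value the text describes.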
In this paper, the authors focus on three Croatian Glagolitic editions printed in Senj as case studies. They attempt to form a clearer picture of the language of the non-liturgical works printed at the Senj Glagolitic printing press, particularly in light of previously established conceptions of the 16th-century Croatian literary language. By examining the vowel and consonant features of the Senj editions Korizmenjak, Mirakuli and Naručnik, the authors describe in detail the linguistic characteristics of these texts at this level of description. Special attention is given to both Church Slavonic and vernacular (Čakavian) features. The analysis confirms that many of the described features are characteristic not only of the Čakavian of that time but also of Croatian Church Slavonic. On the one hand, this indicates a tendency to preserve the literary tradition of the printed word. On the other hand, it confirms the presence of Čakavian phonological features.
Over decades of work on processing Croatian terminology in STRUNA, we have identified two problems that can create an extremely confusing situation for the end-user. The first is homonymy, where the same term denotes multiple concepts. This problem was anticipated, since it is well known that certain terms can have different or slightly different meanings in various fields of knowledge or even within the same field. We handle homonyms by allowing multiple entries in STRUNA. The second problem arises fairly often: a new entry is introduced into STRUNA even though the same term with a semantically identical definition already exists. Anticipating this problem, we implemented, from the very beginning of STRUNA, the option of accepting an existing terminological entry and associating it with the ongoing project. Unfortunately, for various reasons, editors rarely use this feature, so this issue will have to be addressed in the future. In this paper, we present these problems as experienced by the end-user of STRUNA. We also suggest possible solutions, both within individual projects (fields) and across fields.
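A check over term entries can separate the two problems described above. The Python below is a hypothetical illustration, not STRUNA’s actual data model or code. It assumes each entry has an identifier, a term, a definition and a field. Entries that share a term but differ in definition are treated as homonyms, and entries whose term and normalised definition coincide are flagged as probable duplicates. Detecting genuinely semantically identical definitions would need more than this simple string normalisation.

```python
from collections import defaultdict
from typing import NamedTuple

class Entry(NamedTuple):
    """Hypothetical terminological entry; not STRUNA's real schema."""
    entry_id: int
    term: str
    definition: str
    field: str

def normalise(text: str) -> str:
    """Crude normalisation: lowercase and collapse whitespace."""
    return " ".join(text.lower().split())

def classify_entries(entries):
    """Return (homonyms, duplicates) for a list of entries.

    homonyms maps a term to the ids of entries with distinct definitions;
    duplicates lists (kept_id, redundant_id) pairs with the same definition.
    """
    by_term = defaultdict(list)
    for entry in entries:
        by_term[normalise(entry.term)].append(entry)
    homonyms, duplicates = {}, []
    for term, group in by_term.items():
        first_by_definition = {}
        for entry in group:
            key = normalise(entry.definition)
            if key in first_by_definition:
                duplicates.append((first_by_definition[key].entry_id, entry.entry_id))
            else:
                first_by_definition[key] = entry
        if len(first_by_definition) > 1:
            homonyms[term] = sorted(e.entry_id for e in first_by_definition.values())
    return homonyms, duplicates

entries = [
    Entry(1, "root", "The part of a word that carries its core lexical meaning.", "linguistics"),
    Entry(2, "root", "A value at which a function equals zero.", "mathematics"),
    Entry(3, "Root", "the part of a word that carries its  core lexical meaning.", "linguistics"),
]
print(classify_entries(entries))  # ({'root': [1, 2]}, [(1, 3)])
```

Homonyms are legitimate and remain separate entries, as the text describes. The flagged pairs are the cases where an editor could instead have accepted the existing entry.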
The paper examines the sources of a linguistic corpus explored and interpreted in the framework of the RETROGRAM project, carried out by the Institute of Croatian Language and Linguistics. The abovementioned sources comprise three pre-standard historical grammar books which, aside from the rules governing Italian, German and Latin, additionally include Croatian equivalents of grammatical terms, morphological paradigms as well as linguistic annotation.
The first part of the paper gives an overview of the Croatian Linguistic Terminology – Jena project, stating its goals and achievements. The project is now (March 2023) in its final year. The second part of the paper discusses the project’s plans and challenges. Special attention is paid to two relations: that of linguistic terminology to anthropological (anthropolinguistic) terminology, and that of the Jena project to the ANTRONA project. ANTRONA is the first humanities and social sciences terminology project in the Struna program. The central part of the paper focuses on some of the most important issues and challenges, illustrated with examples, in translating Croatian onomastic terminology into English and vice versa.
The paper analyzes various language problems connected with speaking about people of non-binary sex/gender in Croatian. The research is based on corpus analysis. For this analysis, one of the authors compiled the Croatian gender corpus using the Sketch Engine corpus-building software. The authors analyze the following issues connected with speaking about non-binary people: the use of masculine, neuter or feminine nouns; the use of masculine, neuter or feminine pronouns and verb forms; ways of addressing a non-binary person; and normative problems observed in the Croatian gender corpus. Some issues are compared with the situation in other languages.
This paper presents a linguistic comparison of two editions of the Kajkavian literary lectionary Čtejenja i evangelijumi, from 1831 and 1851. The Čtejenja went through a series of editions, which makes them well suited to historical sociolinguistic research. Comparing and analysing the language of the lectionary makes it possible to trace language development and change at all levels, almost from one decade to the next. The aim of the study was to identify the linguistic changes that affected the Kajkavian literary language over these two decades, at the graphic and orthographic, phonological, morphological, syntactic and lexical levels. The study also asks whether literary Kajkavian became Štokavianised over those two decades and, if so, to what extent.
Institution pages aggregate content on ResearchGate related to an institution. The members listed on this page have self-identified as being affiliated with this institution. Publications listed on this page were identified by our algorithms as relating to this institution. This page was not created or approved by the institution.
35 members
Ivana Brač
  • Department of Croatian Standard Language
Siniša Runjaić
  • Department of General Linguistics
Kristian Lewis
  • Department of Croatian Standard Language
Information
Address
Centar, Croatia
Head of institution
Željko Jozić