Ukrainian Feminine Personal Nouns in Online Dictionaries and Corpora

Abstract

The paper discusses how Ukrainian feminine personal nouns are represented in Ukrainian online dictionaries and how corpus data can be used in their exploration. Particular attention is paid to the Web Dictionary of Ukrainian Feminine Personal Nouns (2022, published online) and its coverage of these lexical items in comparison with other dictionaries. The discussion of corpora focuses on the General Regionally Annotated Corpus of Ukrainian (GRAC) against the background of two other large Ukrainian corpora. The use of GRAC in the compilation of the said dictionary is explained, and the results of corpus-based explorations of feminine personal nouns are presented, highlighting the unique features of the corpus that are useful for their study. The paper concludes with a summary of the current situation with the feminine terms in modern Ukrainian and outlines prospects for further research.
Olena Synchak1 and Vasyl Starko2
1 Ukrainian Catholic University, 2a Kozelnytska Str., Lviv, 79026, Ukraine
2 Ukrainian Catholic University, 2a Kozelnytska Str., Lviv, 79026, Ukraine
1. Introduction
Feminine personal nouns in Slavic languages have recently become the focus of attention of
linguists from various perspectives: word formation models [7], [10], [21], [29], [50], semantic and
pragmatic features [13], [20], [24], gender-fair language [38], [39], political correctness [29], [49],
language policy [25], and others. The social perception of these lexical items has been studied by
psychologists [11]. In the field of lexicography, dictionaries of feminine terms in Polish [22] and
Russian [34] have been published, and a number of studies are devoted to different lexicographic
traditions for codifying these lexical items [16], [17], [21], [23].
Feminine personal nouns have constituted a distinct lexical subsystem of the Ukrainian language
for a long time now, as shown in the works of Ivan Feketa [9], Ivan Kovalyk [14], Maria Brus [3], [4],
Svitlana Semeniuk [40], Olena Synchak [47], and others. Feminine terms in the Ukrainian dictionaries
have been studied by Feketa [9], Yaryna Puzyrenko [31], Alla Arkhanhelska [1], and Brus [3], [4].
These items are currently one of the most active and dynamic groups in the Ukrainian lexicon. Our
estimate is that the number of feminine forms has increased tenfold over the past two decades [48].
Clearly, this rapid growth is a reaction of the derivational system of Ukrainian to societal demands,
particularly in response to the increased participation of women in the business and civic activities of
Ukrainian society and the wider processes of democratization [19], [46]. The creation of new
feminine terms has become so common in Ukrainian that it is second only to borrowings from other
languages as a vehicle for lexicon expansion. However, the former, unlike the latter, uses Ukrainian
building material.
An in-depth lexicographic description of feminine personal nouns based, among other sources, on
up-to-date corpus data remains an outstanding issue in Ukrainian linguistics. This paper aims to offer
an analysis of how these lexical items are covered in traditional and especially online dictionaries of
Ukrainian, focusing specifically on the recently published Web Dictionary of Ukrainian Feminine
Personal Nouns (WDUF) [48]. An analysis of this lexical group in Ukrainian corpora is provided with
an emphasis on the General Regionally Annotated Corpus of Ukrainian (GRAC) and its functionality.
Conclusions are drawn regarding the representation of feminine personal nouns in Ukrainian
dictionaries and corpora.
2. Related Works
Ukrainian has a long history of representing feminine personal nouns in dictionaries. Lavrentii
Zyzanii included seven such nouns in his lexicon (1596) [53]. Borys Hrinchenko’s Slovar
(1907–1909) contains 935 feminine forms of personal nouns, e.g., авторка ‘(female) author’, актриса
‘actress’, лікарка ‘(female) doctor’, малярка ‘(female) painter’, свідка ‘(female) witness’,
співробітниця ‘(female) co-worker’, and others [12]. In the dictionaries published in 1920–1930,
when the Ukrainian language experienced a revival in Soviet Ukraine, Ukrainianization went hand in
hand with feminization. For example, the Russian-Ukrainian dictionary edited by Ahatanhel Krymsky
and Serhii Yefremov [15] provides 2,117 feminine personal nouns as equivalents to Russian
headwords, e.g., народознавиця ‘(female) ethnologist’, музиколюбка ‘(female) music lover’, літунка
‘(female) pilot’, позовниця ‘(female) plaintiff’, and others. A dictionary of business Ukrainian (1930)
[8] contains 533 feminine terms: роботодавиця ‘(female) employer’, посвідчиця ‘(female) attestor’,
податковиця ‘(female) tax official’, статистиця ‘(female) statistician’, etc. The 11-volume
explanatory Dictionary of Ukrainian published in Soviet Ukraine (SUM-11, 1970–1980) [2] has 3,500
feminine forms in its register. The number of such terms has grown in each successive era. However,
feminine personal nouns registered in older dictionaries only sporadically appear in new monolingual
dictionaries of Ukrainian.
As a result of the policy to shut down Ukrainianization, over 1,000 feminine forms from the
dictionaries published in 1920–1930 (the most significant of these lexicographic works are available
online) did not make it into SUM-11, e.g., мовознавиця ‘(female) linguist’,
літературознавиця ‘(female) literary critic’, мисткиня ‘(female) artist’, and many others. Part of
the word stock in SUM-11 is a reflection of the policies of internationalization and Russification,
namely words that are similar to Russian were provided in the place of more distinctly Ukrainian
forms, e.g., бухгалтерка but not рахівничка ‘(female) accountant’. However, SUM-11 does include
feminine forms derived from loan words, such as композиторка ‘(female) composer’, or constructed
from Ukrainian components, e.g., квартиронаймачка ‘(female) tenant’.
After a long lexicographic hiatus, a new explanatory dictionary was produced in Ukraine. In its
second edition, the Large Explanatory Dictionary of Modern Ukrainian (LEDMU, 2005) [5]
contained 250,000 entries, including 3,546 feminine personal nouns. Even though this one-volume
dictionary incorporated much of SUM-11, it did introduce new feminine personal nouns and brought
back some of the old ones which were effectively proscribed in Soviet times. First, in a number of
cases, LEDMU provides feminine forms where SUM-11 had only masculine forms, e.g., авангардистка
‘(female) avant-garde artist or writer’. Second, the dictionary brought back into circulation, albeit
inconsistently, feminine forms registered in the dictionaries of the Ukrainianization era, for example,
ватажка, верховодиця ‘(female) group leader’. Lastly, LEDMU describes a number of new
feminine personal nouns, such as віндсерфінгістка ‘(female) windsurfer’ and позикотримачка
‘(female) lender’.
The Universal Dictionary by Zoriana Kunch [18] also offers expanded treatment of feminine
personal nouns and provides some previously unregistered items, e.g., бодибілдерка ‘(female)
bodybuilder’. Other valuable sources of feminine terms include the dictionaries of new words by
Anatolii Neliuba [27], [28], which, until 2022, contained arguably the most comprehensive list of
recently coined words of this kind collected from the language of the Ukrainian mass media, e.g.,
барменка ‘(female) bartender’ and виборчиня ‘(female) voter’. The Russian-Ukrainian Medical
Dictionary by Stanislav Nechai [26] turned out to be a surprisingly rich source of feminine
professional titles, including those with the suffix -ин(я), e.g., антрополоґиня ‘(female)
anthropologist’.
The most recent 20-volume explanatory dictionary of Ukrainian (SUM-20, vols. 1-12 have been
published so far) [42] covers feminine forms from the dictionaries discussed above and includes some
new additions, e.g., документалістка ‘(female) documentary filmmaker’, and a handful of words
that were missing from dictionaries since the 1920s (літературознавиця, мовознавиця, and others).
However, in general the coverage of feminine personal nouns in SUM-20 is inconsistent as it omits a
number of items previously registered in other lexicographic works, for example, авіаторка
‘(female) pilot’, архітекторка ‘(female) architect’, конгресменка ‘(female) congressperson’, and
others. Moreover, the compilers completely ignore all feminine derivatives with the suffix -ин(я),
such as аналітикиня ‘(female) analyst’, біологиня ‘(female) biologist’, etc.
In summary, even the most recent dictionaries fail to provide systematic coverage of feminine
personal nouns, namely those that are registered in other lexicographic works and found in texts written over
the past 100 years. Furthermore, such nouns are not treated on par with the masculine forms: they are
not supplied with full-fledged definitions and have outdated illustrations, if any at all. Nor do
dictionaries help users choose the most appropriate feminine derivative in cases of multiple possible
formations, e.g., міністерка – міністриня – міністреса ‘(female) minister’. Thus, there was a need
for a dictionary that would fill those gaps by systematically describing and codifying feminine
personal nouns.
To this end, the Web Dictionary of Ukrainian Feminine Personal Nouns (WDUF in what follows)
[48] was compiled by Olena Synchak, with Hanna Dydyk-Meush serving in the capacity of the
academic editor and Vasyl Starko as an academic consultant. In early 2022, the dictionary was
published on the dictionary portal [35]. Unlike other explanatory dictionaries, WDUF
provides full definitions, supplies illustrations from a corpus, and lists dictionaries (published since
1893) that also register the feminine noun in question. We defer a more detailed description of WDUF
until section 3 and will now provide an overview of other online dictionaries.
2.1. Ukrainian feminine personal nouns in online dictionaries prior to 2022
Online dictionaries that describe the lexical items under discussion fall into three categories:
1) paper editions that were scanned and made available online;
2) fully digitized and searchable paper dictionaries;
3) originally electronic/online editions.
The first group contains rather short lists of feminine derivatives of professional names [30], [32],
[33] and a scanned version of one large general-purpose Ukrainian dictionary [5]. All of these
publications lack search capability and are only suitable for browsing.
The second group includes SUM-11, which has a searchable text version conveniently
complemented by scanned images of the original paper publication [43], and the dictionary
portal, which comprises a number of dictionaries, mostly published in the 1920s and later banned and
removed from circulation under the Soviet regime [35]. This collection includes a highly valuable,
extensive Russian-Ukrainian Dictionary [15]. The r2u portal was specifically designed to provide an
array of flexible search options and is well-suited for the task of finding feminine derivatives. With
the addition of WDUF (which differs from other dictionaries in that it has not been published on
paper at the time of writing), the portal has greatly expanded its coverage of these lexical items and,
importantly, provides information on the modern revival of feminine forms registered in old
dictionaries but ignored or suppressed in Soviet times, e.g., ворогиня ‘(female) enemy’.
The third group includes two resources published by the Ukrainian Linguistic-Information Fund
(ULIF). One is the Dictionaries of Ukraine Online portal [44], which includes a dictionary of word
flection. This is essentially an orthographical dictionary which also provides a full paradigm for each
headword, including feminine forms. For example, for the word корифейка (female) luminary’ the
user will find all the inflected forms along with the correct spelling and grammatical information
(noun, feminine, animate). However, despite its fairly large register (262,000 items), this dictionary
has not been regularly updated and thus omits many recent feminine derivatives. Moreover, this
resource does not provide any definitions or data on usage. These two types of information can be
found in SUM-20, another dictionary produced by ULIF, which is published both on paper and
online. In the online format, it can be browsed or searched by keyword. So far, 12 out of 20
alphabetically arranged volumes have been published, and this part (60% of the entire dictionary)
contains, according to our analysis, 24% of the feminine personal nouns presented in WDUF. By
extrapolation, when SUM-20 is completed (supposedly in 2028), this figure will rise to 40%. This
quite low overlap can be attributed, at least partly, to the difference in lexicographic approaches and
methods of collecting language data. In any case, SUM-20 omits a number of feminine derivative
forms, most markedly those with the suffix -ин(я), which have blossomed in the language of the
Ukrainian mass media over the past several years.
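The extrapolation above is plain proportional scaling. The sketch below reproduces it under the illustrative assumption that the unpublished volumes will overlap with WDUF at the same rate as the published ones; the function name is hypothetical, and only the two percentages come from the text.

```python
def extrapolate_share(observed_share: float, fraction_published: float) -> float:
    """Scale an observed share linearly to the complete dictionary.

    Assumes the unpublished part contributes at the same rate as the
    published part (an illustrative assumption, not a finding).
    """
    if not 0 < fraction_published <= 1:
        raise ValueError("fraction_published must be in (0, 1]")
    return observed_share / fraction_published


# Figures from the text: 24% overlap found in the 60% published so far.
projected = extrapolate_share(0.24, 0.60)
print(f"Projected overlap at completion: {projected:.0%}")
```

If feminine personal nouns are unevenly distributed across the alphabet, the projected figure would shift accordingly.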
2.2. VESUM and USL
The Large Electronic Dictionary of Ukrainian (VESUM) [36] is a morphological dictionary
designed primarily for machine use but also equipped with a web interface for human use.
VESUM is the largest Ukrainian dictionary of its kind: its current version 5.7.5
contains over 416,000 lemmas from which more than 6.5 million wordforms are generated. The
entries in the web interface consist of the lemma and all indirect forms, each supplied with a
morphological tag representing the part of speech and grammatical features. For example, here is an
incomplete paradigm for the word воїнка ‘(female) warrior’:
воїнка noun:anim:f:v_naz
воїнки noun:anim:f:v_rod
воїнці noun:anim:f:v_dav
where the tag anim signifies ‘animate noun’, f stands for ‘feminine’, and v_naz, v_rod, and v_dav
represent cases (nominative, genitive, and dative, respectively).
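To show how such entries can be processed by machine, here is a minimal sketch that splits a "wordform tag" line into its features. It maps only the tag values explained above; the class and function names are hypothetical and do not belong to any VESUM tooling.

```python
from dataclasses import dataclass
from typing import Optional

# Only the tag values explained in the text are mapped here.
GENDERS = {"f": "feminine"}
CASES = {"v_naz": "nominative", "v_rod": "genitive", "v_dav": "dative"}


@dataclass(frozen=True)
class Analysis:
    wordform: str
    pos: str
    animate: bool
    gender: Optional[str]
    case: Optional[str]


def parse_entry(line: str) -> Analysis:
    """Parse one 'wordform tag' line such as 'воїнка noun:anim:f:v_naz'."""
    wordform, tag = line.split()
    pos, *features = tag.split(":")
    gender = next((GENDERS[x] for x in features if x in GENDERS), None)
    case = next((CASES[x] for x in features if x in CASES), None)
    return Analysis(wordform, pos, "anim" in features, gender, case)


paradigm = [
    "воїнка noun:anim:f:v_naz",
    "воїнки noun:anim:f:v_rod",
    "воїнці noun:anim:f:v_dav",
]
analyses = [parse_entry(line) for line in paradigm]
for a in analyses:
    print(a.wordform, a.pos, a.gender, a.case)
```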
VESUM has successfully been used in various projects, including the morphological tagging of
the GRAC corpus [41] and enabling full-text search in Ukrainian Wikipedia. One of the distinctive
features of VESUM is that its authors constantly expand it by scouring texts for words that are not
registered in any other dictionaries of Ukrainian. Overall, VESUM contains more than 5,000 feminine
personal nouns. This kind of broad, text-based coverage makes it possible to establish with a high
degree of reliability which derivational models are most productive for constructing these lexemes.
Our analysis shows that by far the most productive model is the one involving the suffix -к(а), while
forms ending in -иця are a distant second. A total of six models are presented below in the order of
their productivity. The numbers signify how many words of a particular type are found in VESUM:
3,260 with -ка, e.g., блогерка ‘(female) blogger’
1,460 with -иця, e.g., антикорупційниця ‘(female) anticorruption activist’
260 with -иня, e.g., біологиня ‘(female) biologist’
160 with -иха, e.g., гончариха ‘(female) potter’
70 with -ша, e.g., директорша ‘(female) school principal’
35 with -иса or -еса, e.g., патронеса ‘(female) patron’.
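A count of this kind can be approximated by grouping lemmas by their endings. The sketch below is a simplified illustration, not the procedure used for the figures above: it assumes the input has already been restricted to feminine personal nouns (for instance, by tag), because ending matching alone would also catch unrelated words. The names are hypothetical.

```python
from collections import Counter
from typing import Optional

# Final letters of the lemma mapped to the derivational models listed above.
SUFFIX_MODELS = {
    "ка": "-ка",
    "иця": "-иця",
    "иня": "-иня",
    "иха": "-иха",
    "ша": "-ша",
    "иса": "-иса/-еса",
    "еса": "-иса/-еса",
}


def suffix_model(lemma: str) -> Optional[str]:
    """Return the model whose ending matches the lemma, longest ending first."""
    for ending in sorted(SUFFIX_MODELS, key=len, reverse=True):
        if lemma.endswith(ending):
            return SUFFIX_MODELS[ending]
    return None


# The example words cited in the list above.
examples = [
    "блогерка", "антикорупційниця", "біологиня",
    "гончариха", "директорша", "патронеса",
]
counts = Counter(suffix_model(word) for word in examples)
print(counts)
```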
Even though VESUM is intended for machine use, it can also be highly useful for human users
who seek morphological information for Ukrainian, either for individual words or groups of words.
When the dictionary is accessed via its web interface, a word in the canonical form (lemma) or
any indirect form can be searched. Because the majority of feminine derivative forms are created by
means of suffixation, users can find feminine terms, even those they may not be aware of, by querying
VESUM with the corresponding masculine form followed by an asterisk, which replaces zero or more
characters. For example, the search query блогер* will retrieve such words as блогерка, блогерша
‘(female) blogger’ and інстаграм-блогерка ‘(female) Instagram blogger’.
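The wildcard behaviour can be imitated on a local word list with Python's standard fnmatch module. How VESUM implements its search is not described here; in particular, matching the pattern against the last hyphen-separated component is an assumption made only to reproduce the compound hit in the example above. The function name and the toy word list are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def wildcard_search(pattern: str, lemmas: Iterable[str]) -> List[str]:
    """Return lemmas matching a pattern in which '*' stands for zero or more characters.

    The pattern is tried against the whole lemma and against its last
    hyphen-separated component (an assumption of this sketch).
    """
    hits = []
    for lemma in lemmas:
        last_component = lemma.rsplit("-", 1)[-1]
        if fnmatchcase(lemma, pattern) or fnmatchcase(last_component, pattern):
            hits.append(lemma)
    return hits


# Toy word list for illustration only.
lemmas = ["блогер", "блогерка", "блогерша", "інстаграм-блогерка", "дайверка"]
results = wildcard_search("блогер*", lemmas)
print(results)
```

Because the asterisk also matches zero characters, the masculine form itself is returned alongside its feminine derivatives.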
The Ukrainian Semantic Lexicon (USL) [45] is another machine-readable dictionary for Ukrainian
that covers feminine terms. At present, it is a subset of VESUM in which each lemma is supplied with
semantic tags. Noteworthily, all nouns designating living beings are already part of USL, meaning
that all feminine terms from VESUM have semantic tags. More specifically, they fall into a group of
words that have the semantic tag 1:conc:hum, where 1 is the number of the sense, the tag conc means
‘concrete noun’, and hum designates a human being. This sequence is sometimes supplemented with
more specific tags, e.g., prof refers to a profession and occupation: дайверка ‘(female) diver’
3. Methods and Materials
From the outset, WDUF was designed as a corpus-based dictionary. To this end, a comparative
analysis of Ukrainian corpora has been undertaken in order to determine which of them are most
suited to the task of studying the usage of feminine personal nouns.
At present, there are a handful of Ukrainian corpora that differ in size, types of texts, and search
functionality. One of the first professionally constructed corpora of Ukrainian [6] has a little over 100
million tokens and has, unfortunately, limited search functionality and output options. It is possible to
run lemma searches, but even some of the more common modern feminine terms (such as блогерка)
are not part of its lemma list. Some of the bigger corpora (over 500 million tokens) are composed
largely or exclusively of texts pulled from the Internet and offer access via the popular Bonito search
engine. Of these, only two corpora [51], [52] have been lemmatized and offer lemma search capability
for established feminine forms. However, the dictionaries used for tagging these corpora lack a
number of feminine terms that have gained prominence relatively recently. For example, the
word блогерша (largely recognized as less preferable to блогерка but still found in texts) is not
lemmatized or POS-tagged and hence marked ‘unknown’ in one of these corpora [51]. In contrast,
this word is incorrectly stemmed to the non-existent form блогерший, rather than lemmatized, in the
other corpus [52]. This example clearly demonstrates two things. First, two different strategies for
handling out-of-vocabulary items have been used in constructing these corpora: the conservative
strategy and the greedy stemming approach. Second, having an extensive morphological dictionary
that is regularly updated is critical for corpus representation of feminine personal nouns in Modern
Ukrainian, which are a highly dynamic segment of the lexicon.
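The contrast between the two strategies can be made concrete with a toy example. The tools actually used for corpora [51] and [52] are not described here, so everything below is hypothetical: the lexicon is a stand-in, and the single ending-rewrite rule was chosen only because it reproduces the erroneous output reported above.

```python
from typing import Optional, Tuple

# Toy lexicon; блогерша is deliberately missing, i.e., out of vocabulary.
LEXICON = {"блогерка": "блогерка", "блогерки": "блогерка"}

# Hypothetical rewrite rule: treat a final -ша as an inflected ending of a lemma in -ший.
GUESS_RULES = [("ша", "ший")]


def analyze_conservative(word: str) -> Tuple[Optional[str], str]:
    """Return (lemma, status); out-of-vocabulary words are left unanalysed."""
    if word in LEXICON:
        return LEXICON[word], "known"
    return None, "unknown"


def analyze_greedy(word: str) -> Tuple[Optional[str], str]:
    """Return (lemma, status); out-of-vocabulary words get a rule-based guess."""
    if word in LEXICON:
        return LEXICON[word], "known"
    for ending, replacement in GUESS_RULES:
        if word.endswith(ending):
            return word[: -len(ending)] + replacement, "guessed"
    return None, "unknown"


print(analyze_conservative("блогерша"))
print(analyze_greedy("блогерша"))
```

The conservative analyser loses the word for lemma searches, while the greedy one files it under a lemma that does not exist; adding the word to the lexicon removes both problems.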
The only dictionary that we know of which fulfills these criteria is VESUM, and each successive
iteration of the GRAC corpus is tagged with the newest available version of VESUM. Moreover,
words that are unrecognized in GRAC are analyzed and later added, roughly in descending order
of frequency, to VESUM. This kind of tandem between the morphological dictionary and the corpus
is mutually beneficial as each iteration increases the quality and coverage of both. Therefore, in what
follows, we will focus our attention on GRAC, keeping in mind that supplementary information,
statistics, and usage examples can be found in other corpora of Ukrainian, as discussed above.
3.1. General Regionally Annotated Corpus of Ukrainian (GRAC)
GRAC is a large reference corpus of Ukrainian comprising over 860 million tokens, more than
90,000 texts written by 20,000 authors over the span of more than 200 years (1816–2021) [41]. While
following the general principles for the construction of national (reference) corpora, GRAC adds a
regional dimension to its markup, which is highly valuable for linguistic research and is unique
among Ukrainian corpora.
There are several ways to search for feminine forms in GRAC:
1. Lemma search (works if the word is in VESUM).
2. Wordform search (for a specific wordform and for those lexemes that are not covered by VESUM).
3. CQL query. The Corpus Query Language is a handy tool for constructing flexible queries that
combine different types of features.
4. Semantic search. GRAC is semantically tagged using USL and the semantic tag 1:conc:hum
can be used in conjunction with other features, such as the ending of a lemma, to restrict the
search query to feminine forms only.
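As an illustration of the third and fourth options, the sketch below assembles a CQL token expression that combines a lemma ending, a morphological tag, and the semantic tag. The attribute names lemma, tag, and sem are assumptions and may differ from the positional attributes actually defined in GRAC; the function name is hypothetical. The bracketed, ampersand-joined syntax with regular-expression values follows general CQL conventions.

```python
def feminine_noun_query(lemma_ending: str, semantic_tag: str = "1:conc:hum") -> str:
    """Build a CQL token expression for animate feminine nouns with a given lemma ending.

    Attribute names (lemma, tag, sem) are assumed for illustration.
    Arguments are inserted verbatim, so they must not contain regex metacharacters.
    """
    conditions = [
        f'lemma=".*{lemma_ending}"',   # lemma ends in the given suffix
        'tag="noun:anim:f.*"',         # animate feminine noun, any case
        f'sem=".*{semantic_tag}.*"',   # semantic tag marking a human being
    ]
    return "[" + " & ".join(conditions) + "]"


query = feminine_noun_query("иня")
print(query)
```

A query built this way would be pasted into the CQL field of the corpus interface; swapping the ending yields the counterpart queries for the other suffixes.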
In terms of composition, GRAC is the most diverse Ukrainian corpus to date, and as such it offers
arguably the most balanced snapshot of the distribution of feminine forms in the Ukrainian language.
Using CQL queries, we have found that feminine nouns occur in GRAC more than 2.8 million times
(over 4,000 pm), which serves to show that they are in no way alien to Ukrainian. On the contrary,
they are, in fact, used quite frequently in texts. As far as the derivative power of feminine suffixes is
concerned, corpus data yields a list that is similar to the one based on VESUM, but note that the
second and third positions are swapped:
-ка 2.2 million hits, 3,175 pm
-иня 335,000 hits, 500 pm
-иця 215,000 hits, 320 pm
-иха 32,000 hits, 48 pm
-иса/-еса 32,000 hits, 48 pm
-ша 19,000 hits, 29 pm
The -иня forms are not as numerous but occur more frequently than the ones ending in -иця.
So far, we have limited our attention to single-word feminine forms. GRAC data shows, however,
that compounding is a highly productive way of forming feminine forms. Ukrainian has a universal
and frequent model жінка-<noun> (woman-<noun>). In the place of nouns, mostly masculine forms
occur in texts, e.g., жінка-водій ‘(female) driver’, but feminine terms are also registered, e.g.,
жінка-мусульманка ‘Muslim woman’. Such compound feminine personal nouns are totally absent
from large dictionaries of Ukrainian [2], [42]. In GRAC, they occur in some 13,500 instances (16
pm), and the model with the initial element жінка- ‘woman’ accounts for over 6,400 of these.
Other frequent compounds involve the following nouns in the initial position:
дівчина- ‘girl’, over 3,000 hits
пані(-) ‘lady’, over 2,700 hits
баба- ‘old woman’, 250 hits
леді- ‘lady’, 160 hits.
GRAC has been tagged using a package from the NLP UK project [37]. It relies on VESUM but
additionally includes the dynamic module which handles OOV items, particularly those that were
constructed using regular patterns. Compound nouns are prime targets for this dynamic module, and it
correctly tags approximately 95% of them. This way compound feminine terms, as well as feminine
forms that involve highly productive prefixes, such as ультра- ‘ultra-’, are recognized “on the fly” in
GRAC and in any other text that is tagged using the NLP UK utilities, resulting in significantly better
lemmatization and morphological analysis of OOV items.
GRAC has an extensive annotation scheme, which includes regional tags and temporal markup.
This makes GRAC unique among Ukrainian corpora as it offers tools for studying the geographical
and temporal distribution of linguistic units. Conveniently, GRAC has a separate interface for
building time plots. Below is an illustration of how GRAC data can be used to track the tendencies in
the use of feminine forms.
Figure 1: The dynamics of three distinct Ukrainian words referring to a female salesperson.
As can be seen in Fig. 1, the word продавщиця ‘female salesperson’, which is very close to the
Russian продавщица and the only form registered in SUM-11 [2], dominated in Soviet times and until
2010. Starting from the late 1980s and throughout the independence period, we can see that
продавчиня rose from virtual non-existence, reaching an equilibrium around 2010, and then assuming
the dominant position. Meanwhile, the form продавниця, registered in the dictionaries on,
peaked in the 1930s but has been very rarely used ever since. This kind of corpus-based temporal
analysis is critical to the understanding of language trends and the usage of words and can go a long
way in improving their lexicographical description.
The tendency to use feminine forms is set to continue in Ukrainian. Even now, there are thousands
of feminine personal nouns which, for various reasons, are underrepresented in corpora and missing
from dictionaries. One significant case in point involves demonyms derived from the names of small
settlements in Ukraine and elsewhere in the world. Certain lexical groups are going to experience
greater expansion than others, and these forms will also need to be registered. It is expected that
feminine forms will be introduced for the full classification of professions (some 9,000 titles) and will
begin to gradually appear in texts. Therefore, there is a need to supplement the GRAC corpus with a
diverse selection of new texts and to continue updating VESUM.
4. Experiment
The compilation of the Web Dictionary of Ukrainian Feminine Personal Nouns involved a number
of lexicographical methods, corpus tools, and decisions regarding composition, structure, and online
publication. The main aspects of this multifaceted process are explained below.
As far as the scope of WDUF is concerned, a decision was made to limit the focus to nouns that
describe women with regard to their profession, type of activity, social status, character traits,
personal preferences, as well as agent nouns, etc. Outside the scope of WDUF are demonyms
(українка ‘(female) Ukrainian’), ethnic names (лемкиня ‘(female) Lemko’), kinship terms (тітка
‘aunt’), and diminutive forms (сестричка ‘little sister’) [48]. New entries will be added with time as
the format of the web dictionary allows for additions to be made with certain regularity, for example,
on a biannual or annual basis.
Spelling variants for feminine forms are provided in WDUF within the same dictionary entry
(ріелторка, рієлторка ‘(female) real estate agent’), while derivational alternatives are described in
separate cross-linked entries (мовознавиця, мовознавка, мовознавчиня ‘(female) linguist’). Entries
with derivational variants are nested (see Figs. 3 and 5 below), meaning that the definitions and
illustrations are provided for each variant feminine form. Noteworthily, all derivational alternatives
are presented on an equal basis and their contexts and lexicographic coverage are studied with equal
The dictionary provides full-fledged definitions for feminine personal nouns, e.g., фахівчиня з
лінгвістики ‘female specialist in linguistics’ for лінгвістка. This contrasts with other Ukrainian
explanatory dictionaries that include only a reference to the respective masculine form, e.g., жін. до
лінгвіст (‘feminine for лінгвіст “(male) linguist”’) [42]. In WDUF, senses are explained in a
descriptive way in most cases and women are fully legitimate objects of lexicographic definitions
formulated using the following constructions: та, хто... ‘one that …’; фахівчиня з… ‘a (female)
specialist in …’, etc. Grammar information is also tailored to feminine forms: the dictionary supplies
only plural genitive case endings as this is where problems may arise with these nouns. Each feminine
form is provided with the corresponding masculine form in order to enable search by the latter.
Illustrative examples have been selected from texts over the past 100 years (see Fig. 2). The
majority of examples were discovered in the GRAC corpus and via Google search, in particular in
Google Books. The following key criteria were followed in the selection of illustrations in order to
achieve high variety and quality: 1) diachronic coverage; 2) territorial distribution; 3) stylistic variety;
4) respectable authorship. Moreover, new, nontraditional methods have been used to collect
illustrative material. Some illustrations have been taken from example sentences provided by
language panel members. Some others have been supplied by the compiler herself in order to support
Ukrainian equivalents competing with loan words (легенезнавиця ‘(female) pulmonologist’) or to
reinvigorate a forgotten derivational model, such as when a masculine stem ending in -ик changes to
-иц(я) to form a feminine form, e.g., семіотиця ‘(female) semiotician’.
GRAC data has helped reveal such an extensive and detailed picture of how feminine personal
nouns have been used in Ukrainian in the past 100 years that it can hardly be re-created in
hand-picked citations drawn from a limited number of sources. Figure 2 illustrates a selection of quotations
from 1911 until 2019, adduced in reverse chronological order, from GRAC. For those users who need
more information, the dictionary portal provides, for each search word, quick links that
lead to the GRAC corpus (concordance view), VESUM (inflectional paradigms), SUM-11
(definitions), and the dictionary portal (English equivalents). Thus, a WDUF user is always
just one click away from the full selection of illustrations provided in the most recent version of the
GRAC corpus.
Figure 2: The feminine term адвокатка ‘(female) attorney’ in WDUF: a selection of illustrations
over the past century and quick links to other resources marked with a red rectangle.
Corpus data has also led to the discovery of new senses. For example, SUM-20 registers the word
книгарка in the sense ‘(female) librarian’ and marks it as ‘obsolete’. Texts in GRAC contain
examples in which this word is used in modern sources in this sense (thus, not obsolete), as well as in
two other, previously unregistered senses: ‘(female) bookstore worker or owner’ and ‘(female) book
supplier’. (Full description and illustrations are provided in the respective entry in WDUF.)
GRAC allows separating and studying unrecognized words, which has facilitated the investigation
of new feminine terms and those that have fallen into disuse. We have also been able to describe
previously ignored derivational models, such as the word flection model in which a feminine form is
coined with the help of a feminine ending -а and vowel elision in the suffix: підліток → підлітка
‘male → female teenager’, свідок → свідка ‘male → female witness’. Derivational models can also be
studied in terms of their diachronic evolution on the basis of GRAC. For example, we have
established that while feminine forms ending in -киня were used in Ukrainian in the late 19th century,
this model was supplanted by -чиня in the mid-20th century, e.g., мовкиня → мовчиня ‘(female)
speaker’.
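The vowel-elision model mentioned above (підліток, свідок) can be stated as a simple string rule. The sketch below covers only masculine nouns ending in -ок, as in the two cited examples; it is an illustration of the pattern, not a general generator of feminine forms, and the function name is hypothetical.

```python
def feminine_by_elision(masculine: str) -> str:
    """Derive a feminine form from a masculine noun in -ок.

    The vowel of the suffix is dropped and the feminine ending -а is added:
    -ок becomes -ка.
    """
    if not masculine.endswith("ок"):
        raise ValueError("this sketch only handles masculine nouns ending in -ок")
    return masculine[:-2] + "ка"


for masculine in ("підліток", "свідок"):
    print(masculine, "->", feminine_by_elision(masculine))
```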
4.1. Language panel
The language panel comprising over 30 specialists has been polled about the most challenging
cases involving a choice between multiple derivatives. To our knowledge, this is the first Ukrainian
dictionary that makes use of a language panel, a highly useful source of information which has
become standard in large monolingual dictionaries of English.
Feminine forms that are not well-established in the Ukrainian language were submitted for review
to a 33-member language panel composed of writers, linguists, literary scholars, and translators. The
questionnaire contained 41 questions divided into three groups:
1. Select the best feminine form from among the provided alternatives. For example, фізикиня,
фізичка, and фізиця ‘(female) physicist’.
2. Select a word which, in your opinion, best matches the definition. For example, a female athlete
competing in kayak racing: a) байдаркарка; b) байдарниця; c) байдарочниця; d) other.
3. Create feminine forms based on the given masculine forms and construct sentences with them,
e.g., фільмознавець ‘(male) film critic’.
Panel results are shown in the form of charts in the majority of entries that contain derivational
alternatives. In most cases, the experts and the compiler agree in their preferred choices (Figs. 3
and 5), but sometimes they diverge, e.g., on the designations for female critics: критикиня,
критикеса, критиця. In these cases, the users need to make their own decisions based on the
information provided.
The web dictionary reflects not only quantitative but also qualitative results of the poll. The third
type of task involved deriving a feminine form from the corresponding masculine noun and
constructing a sentence to place the feminine personal noun in context. This task helped obtain high-
quality illustrations for the newly coined feminine forms and thus reveal their syntactic potential. For
example, the panelists constructed several feminine forms from the masculine noun рідномовець
‘native speaker’: рідномовиця (65%), рідномовка (20%), рідномовчиня (10%), and рідномовниця
(5%). The sentences they provided strongly suggested that the first option has both qualitative
(semantic) and quantitative (relative preference) advantage over the others (Fig. 3).
Figure 3: The feminine forms for ‘native speaker’ in WDUF, including the compiler’s
recommendations and language panel results.
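The shares reported for the panel can be obtained by a simple tally of the forms proposed by the respondents. The following Python sketch illustrates the calculation. The number of responses (20) and the helper name tally_forms are assumptions made purely for illustration, chosen so that the resulting shares match those cited above; they are not the actual panel data.

```python
from collections import Counter


def tally_forms(responses):
    """Return each proposed form's share of all responses, in percent."""
    counts = Counter(responses)
    total = sum(counts.values())
    # most_common() orders the forms from the most to the least preferred.
    return {form: round(100 * n / total) for form, n in counts.most_common()}


# Hypothetical raw answers (20 in total), NOT the real questionnaire data.
responses = (
    ["рідномовиця"] * 13
    + ["рідномовка"] * 4
    + ["рідномовчиня"] * 2
    + ["рідномовниця"] * 1
)

shares = tally_forms(responses)
```

A tally of this kind captures only the quantitative side of the poll; the qualitative evidence comes from the sentences the panelists wrote.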
5. Results
WDUF contains 2,000 feminine personal nouns and is focused on four main groups:
1) new feminine forms used in texts but missing from dictionaries (36%, Fig. 4);
2) forgotten forms that contribute to the understanding of competing models today;
3) widely used feminine derivatives that are variously described in other dictionaries (e.g.,
registered in one but missing from others);
4) alternative derivatives.
Figure 4: Percentage of WDUF headwords registered in other Ukrainian dictionaries.
The dictionary does not aspire to cover all feminine personal nouns in Ukrainian. We estimate that
there are 6,000 to 9,000 such forms currently in use, and this number is set to increase. For example, the
National Commission for the Standards of Ukrainian has so far published 400 feminine professional
titles, while intending to feminize a list of 9,000 items. Nevertheless, WDUF offers the most complete
coverage of feminine personal nouns to date, including the most recent coinages, and provides sense
division, ample illustrative contexts, cross-linked derivational alternatives and synonymic terms,
alternative spellings, and other types of information.
5.1. Search capabilities in WDUF
Similar to other dictionaries on the dictionary portal, WDUF has full-text functionality.
This means, among other things, that whenever a user searches for a masculine form in WDUF, the
respective feminine form (and, in many cases, multiple such forms) will automatically be
retrieved and displayed as search results.
In addition to searching among headwords, a search query for a certain feminine personal noun
will find it in illustrations across the entire body of the dictionary, i.e., also in other entries. For
example, the word міністерка ‘(female) minister’ will be found and highlighted as both a headword
and an in-text word in illustrations for the derivational alternatives міністриня and міністерка
(Fig. 5) and for other headwords, such as податковиця ‘(female) tax official’ and прем’єр-
міністерка ‘(female) prime minister’ (Fig. 6). This kind of extended search capability enables the
user to better study the use of a specific feminine personal noun.
Figure 5: Full-text search results for міністерка in WDUF, headword entries only.
Figure 6: Search hits for міністерка highlighted in other entries in WDUF.
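The behavior described in this subsection can be approximated by a search that looks at every field of an entry rather than at headwords only. The Python sketch below is a minimal illustration under stated assumptions: the Entry structure, its field names, and the sample illustrations are hypothetical (the sentences are invented placeholders), and the code does not reproduce the portal's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Hypothetical, simplified dictionary entry."""
    headword: str
    masculine: str
    illustrations: list = field(default_factory=list)


def full_text_search(entries, query):
    """Return (headword, matching fields) for every entry containing the query."""
    q = query.lower()
    hits = []
    for e in entries:
        fields = []
        if q in e.headword.lower():
            fields.append("headword")
        if q in e.masculine.lower():
            fields.append("masculine")
        if any(q in s.lower() for s in e.illustrations):
            fields.append("illustration")
        if fields:
            hits.append((e.headword, fields))
    return hits


# Invented sample entries; the illustrations are placeholders, not WDUF text.
entries = [
    Entry("міністерка", "міністр", ["Міністерка виступила в парламенті."]),
    Entry("міністриня", "міністр", ["Міністриня відповіла на запитання."]),
    Entry("податковиця", "податківець",
          ["Податковиця звітувала, а міністерка слухала."]),
]

by_feminine = full_text_search(entries, "міністерка")
by_masculine = full_text_search(entries, "міністр")
```

In this sketch, a query for the masculine form returns both feminine alternatives, while a query for міністерка also finds the word inside the illustration of another entry. Plain substring matching is used for brevity; it does not handle inflected forms.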
The dictionary portal allows wildcard search: an asterisk (*) replaces zero or more
characters, while a question mark (?) replaces exactly one character. With the help of these wildcards,
users can search for multiple words representing the same derivational model. For example, the search
queries *знавиця and *графка in WDUF will retrieve all feminine nouns that end with these letter
sequences. This function will be especially useful for the linguistic community as it offers convenient
tools to explore the material of WDUF in derivational, semantic, and other studies.
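The wildcard semantics described above can be sketched by translating a query into a regular expression. The Python example below is an illustration only: the function names and the short headword list are assumptions (the list is a small sample, not the WDUF word list), and the portal's own search engine may work differently.

```python
import re


def wildcard_to_regex(pattern):
    """Translate * (zero or more characters) and ? (exactly one) into a regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # every other character is literal
    return re.compile("".join(parts), re.IGNORECASE)


def wildcard_search(headwords, pattern):
    """Return the headwords that match the whole wildcard pattern."""
    rx = wildcard_to_regex(pattern)
    return [w for w in headwords if rx.fullmatch(w)]


# Illustrative sample of headwords (an assumption, not the dictionary's data).
headwords = [
    "фізикиня", "фізичка", "фізиця",
    "мовознавиця", "літературознавиця", "фотографка",
]

ending_znavytsia = wildcard_search(headwords, "*знавиця")
ending_hrafka = wildcard_search(headwords, "*графка")
one_letter_gap = wildcard_search(headwords, "фізи?я")
```

Matching against the whole word (fullmatch) is what makes a leading asterisk select words by their ending, i.e., by derivational model.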
6. Discussion
WDUF belongs to the third category of online dictionaries (as described in section 2) as it has been
specifically designed and created for electronic/online publication. Unlike most other online
Ukrainian dictionaries, WDUF has full-text search functionality (as do other r2u dictionaries), which
makes it possible to quickly find lexical items using various parameters. Users can set the scope of
their search queries: either all dictionaries or a single lexicographic work. Another highly useful feature
allows users to search for a masculine form and find one or multiple respective feminine derivatives
next to it in WDUF. This will be especially helpful to those speakers who are in doubt as to the best
possible feminine derivative for a given masculine personal noun.
What sets WDUF apart from other reference works covering feminine forms is that, in each entry,
it brings together all available types of information on the headword, which is otherwise typically
scattered across different types of dictionaries: spelling; stress pattern; definition; examples of use;
derivatives; synonyms (cross-linked within WDUF); illustrations with a historical perspective (from
the most recent ones to the oldest available examples, going back to the 19th century); references to
other dictionaries that include the headword in question; recommendations from the compiler;
language panel results.
WDUF offers a more complete picture of the use and derivation of feminine forms than
monolingual dictionaries, such as the Soviet-time SUM-11 or even the modern SUM-20. This has
been achieved thanks to the discovery of such nouns in Ukrainian dictionaries published in the 1920s,
banned under the Soviet regime and later brought back into circulation in electronic form on the dictionary portal. Moreover, other modern dictionaries [18], [27], [28], as well as corpus
data (GRAC) and Google search have been utilized.
Until recently, the prevalent understanding in Ukrainian linguistics was that the majority of
feminine personal nouns are used in the colloquial and, rarely, journalistic style. With just a few
exceptions, this view is reinforced by Ukrainian monolingual dictionaries. For example, SUM-20
(but, curiously, not SUM-11) provides the headword депутатка ‘(female) council member’ with the
remark ‘colloquial’, while illustrating it with sentences dating back to the 1950s and 1960s
[42]. A search in the GRAC corpus reveals, however, that this word, which has an absolute frequency
of more than 1,500 and a relative frequency of 1.85 pm, is used in official speech, for example, in
the minutes of the Verkhovna Rada of Ukraine from 1990 to the present. A number of journalistic
and fiction texts published from the early 20th century to the present provide ample evidence
that депутатка is actually a neutral term, e.g., Засідання розпочали зі звернення депутатки від
«Європейської Солідарності» Ірини Геращенко <…>. (Ukraina Moloda, 2019).
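For readers unfamiliar with corpus statistics, the relation between absolute and relative frequency can be sketched as follows. The example assumes that ‘pm’ stands for occurrences per million tokens; the function name and the figures used in the call are hypothetical and do not represent the actual size of GRAC.

```python
def relative_frequency_pm(hits, corpus_tokens):
    """Convert an absolute frequency into occurrences per million tokens."""
    if corpus_tokens <= 0:
        raise ValueError("corpus_tokens must be positive")
    return hits * 1_000_000 / corpus_tokens


# Hypothetical figures for illustration only (not GRAC's real token count).
example = relative_frequency_pm(1_500, 1_000_000_000)
```

Because the relative value is normalized by corpus size, it allows frequencies to be compared across subcorpora of different sizes, such as genres, regions, or time periods.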
The practical value of the GRAC corpus in the study of Ukrainian feminine terms lies, above all,
in its extensive markup. Particularly, its regional markup allows comparisons to be made between the
language of mainland Ukraine and the Ukrainian diaspora, which is something no other Ukrainian
corpus can offer. Furthermore, comparisons can be made between genres and time periods. GRAC
data supports the claim that feminine personal nouns are widely used not only in colloquial texts and
official language, but also in legal texts (court decisions, minutes, etc.) and academic publications
(papers and theses). For example, відповідачка ‘(female) defendant’ is used in the official style,
соціологиня ‘(female) sociologist’ in the academic style, and банкірка ‘(female) banker’ in the
journalistic style. The genre analysis of feminine terms provides a solid foundation for stylistic
remarks in WDUF and has helped cast off the ‘colloquial’ remark assigned to such forms as медичка
‘(female) medic’ and хімічка ‘(female) chemist’ in SUM-11.
7. Conclusions and Further Research
The Web Dictionary of Ukrainian Feminine Personal Nouns is an online lexicographic resource
that comprises 2,000 headwords. It is an effort to systematize and codify feminine forms that
constitute one of the most dynamic strata of the Ukrainian lexicon. In the process of research and
dictionary compilation, a number of online and offline resources have proved useful, notably the older
Ukrainian dictionaries on the dictionary portal, the dictionaries of new Ukrainian words,
and the General Regionally Annotated Corpus of Ukrainian.
The GRAC corpus has unparalleled temporal, geographical, and genre variety of texts, making it
possible to select excellent illustrative examples of feminine personal nouns in context over the past
century and revise some of the stylistic restrictions imposed on them by other monolingual Ukrainian
dictionaries. GRAC texts helped discover new senses of feminine forms, half-forgotten derivational
models, and derivational variants, as well as trace the dynamics of derivational models in diachrony.
Furthermore, thanks to GRAC’s textual data, WDUF makes, for the first time in the lexicographic
description of Ukrainian feminine personal nouns, an attempt to bring together two standards of
Ukrainian: that of mainland Ukraine and that of the Ukrainian diaspora. GRAC has made it possible
to rigorously study the use of feminine forms from the late 19th century to the present, leading to
the discovery of words which fell into disuse in the 20th century, e.g., боркиня ‘(female) wrestler,
fighter’, критиця ‘(female) critic’, творкиня ‘(female) creator’, etc.
Full-text, flexible searchability implemented in WDUF meets the needs of various groups of users,
from occasional visitors to students and journalists to professional linguists.
VESUM contains a large selection of feminine forms, including the most recent coinages. When used
in conjunction with NLP tools for Ukrainian from the r2u group, it serves as a key resource enabling
the identification and POS tagging of feminine terms in any Ukrainian text. Newly discovered
feminine personal nouns that are presented in WDUF will be added to VESUM. All feminine
personal nouns contained in VESUM are supplied with semantic tags and are part of the Ukrainian
Semantic Lexicon, a machine-readable dictionary used for semantic tagging.
Among Ukrainian corpora available for research, GRAC has the most diverse collection of texts,
the most extensive markup (temporal, regional, genre, etc.), as well as a rich set of corpus tools. These
features make it the prime corpus resource for the study of Ukrainian feminine terms. GRAC’s tools
for the study and visual presentation of lexical dynamics across time are unique for Ukrainian. They
also enable researchers to study the behavior of large groups of words rather than individual lexemes.
Based on GRAC data, we have found that by far the most productive suffix used to form feminine
personal nouns in Ukrainian is the suffix -к(а), followed by -ин(я) and -иц(я). We have also
discovered that, in addition to single-word feminine forms, compounding is a productive (in fact,
universal) way of designating women in Ukrainian. Such compounds are conspicuously absent from
Ukrainian dictionaries.
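A rough idea of how suffix productivity can be estimated from a word list is given by the Python sketch below. It is a crude orthographic heuristic that classifies nouns by their final letters, not the procedure actually applied to GRAC data; the function names are assumptions, and the sample consists only of forms cited in this paper, so the counts say nothing about real proportions.

```python
from collections import Counter

# Suffix labels mapped to the word-final letter sequences used to detect them.
SUFFIXES = {"-к(а)": "ка", "-ин(я)": "иня", "-иц(я)": "иця"}


def classify(noun):
    """Assign a noun to a suffix model by its ending (orthographic heuristic)."""
    for label, ending in SUFFIXES.items():
        if noun.endswith(ending):
            return label
    return "other"


def suffix_counts(nouns):
    """Count how many nouns fall under each suffix model."""
    return Counter(classify(n) for n in nouns)


# Small sample of forms mentioned in this paper (illustration only).
sample = [
    "депутатка", "міністерка", "банкірка", "медичка", "хімічка",
    "відповідачка", "соціологиня", "міністриня", "фізикиня",
    "податковиця", "критиця", "критикеса",
]

counts = suffix_counts(sample)
```

A study on corpus data would additionally require lemmatization and morphological tagging, since word-final letters alone cannot distinguish feminine personal nouns from other nouns with the same endings.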
Feminine personal nouns need to be further collected, studied, and described in dictionaries on the
basis of language corpora and other textual data. Using special corpus tools available in GRAC, we
have already identified a series of such nouns that are not registered in any Ukrainian lexicographic
A combination of corpora and the online/electronic format of lexicographic description offers the
best opportunities for rapid and complete coverage of feminine personal nouns, whose number in
Ukrainian is constantly growing.
8. Acknowledgements
This project has been financially supported by the Believe in Yourself Foundation at the Ukrainian
Catholic University in Lviv, Ukraine.
