Svetlozara Leseva
Bulgarian Academy of Sciences | BAS · Department of Computational Linguistics
PhD
48 Publications
7,253 Reads
150 Citations
Publications (48)
In this paper I illustrate the semantic description of verbs provided in three semantic resources (FrameNet, VerbNet and VerbAtlas) in comparative terms with a view to identifying common and distinct components in their representation and obtaining a preliminary idea of the resources’ interoperability. To this end, I provide a comparison of a small...
This study offers insights into the similarities and differences between the zero suffix and overt English suffixes involved in verb-to-noun and noun-to-verb derivation. It is based on morphosemantically related pairs of noun and verb senses released as a Princeton WordNet standoff file, which are annotated with a set of fourteen semantic relations...
We carry out a large-scale study of noun-verb zero derivation pairs in English in order to identify possible semantic contrasts between the two derivational directions: V-to-N (zero nouns) and N-to-V (zero verbs). We compile a dataset of 4,879 N-V word sense pairs from the Princeton WordNet, which are annotated for noun and verb semantic classes an...
The paper presents a contrastive analysis of the possibilities of forming inchoative verbs from state predicates in Bulgarian and Russian. The derivational patterns under consideration are much more productive in Bulgarian, but the aspectual properties of predicates impose significant constraints on their combinability with inchoative prefixes in b...
This paper presents a critical overview of thematic classes of stative verbs. To this end, we analyse three well-known thematic classifications of stative verb classes. While the main focus is on the works by Paducheva (1996; 2004), Spencer and Zaretskaya (2003) and Van Valin and LaPolla (1997), where relevant we comment on research by other author...
The paper presents work in progress on the compilation and automatic annotation of a dataset comprising examples of stative verbs in parallel Bulgarian-Russian corpora with the goal of facilitating the elaboration of a classification of stative verbs in the two languages based on their lexical and semantic properties. We extract stative verbs from...
Our work is focused on the conceptual description of verbs by employing two main resources – the lexical semantic network WordNet and the conceptual frames from FrameNet. We implement a method for inheritance-based mapping between the two resources by transferring the frame assignments from a hypernym to its hyponyms. We discover that the method pe...
The paper presents current efforts towards linking two large lexical semantic resources, WordNet and FrameNet, with a view to their mutual enrichment and to facilitating the access, extraction and analysis of various types of semantic and syntactic information. In the second part of the paper, we go on to examine the relation of inheritance and o...
The paper deals with the conceptual and syntactic specialisation that takes place between more general and more specific or constrained semantic and lexical descriptions as represented in two previously linked resources, FrameNet and WordNet, where lexical units (LUs) instantiating a particular frame in FrameNet are linked to a semantically correspon...
This paper outlines procedures for enhancing WordNet with conceptual information from FrameNet. The mapping of the two resources is non-trivial. We define a number of techniques for the validation of the consistency of the mapping and the extension of its coverage which make use of the structure of both resources and the systematic relations betwee...
This paper presents the principles and procedures involved in the construction of a classification of verbs using information from three semantic resources: WordNet, FrameNet and VerbNet. We adopt the FrameNet frames as the primary categories of the proposed classification and transfer them to WordNet synsets. The hierarchical relationships between the...
A Conference Chronicle - The Annual International Conference of the Institute for Bulgarian Language Prof. Lyubomir Andreychin at the Bulgarian Academy of Sciences (ConfIBL 2019)
The paper presents the Dictionary of Bulgarian Multiword Expressions. We outline the main features of Bulgarian MWEs, their description and classification based on morphosyntactic, structural and semantic criteria. Further, we discuss the organisation of the Dictionary and the components of the description of the MWEs, as well as the links to other...
This paper presents the extraction, representation and management of metadata in the Bulgarian National Corpus. We briefly present the current state of the Corpus and the general principles on which its development lies: uniformity, diversity of text samples, automatic compilation, extensive metadata, multi-layered linguistic annotation. The releva...
This paper presents a machine learning method for automatic identification and classification of morphosemantic relations (MSRs) between verb and noun synset pairs in the Bulgarian WordNet (BulNet). The core training data comprise 6,641 morphosemantically related verb–noun literal pairs from BulNet. The core data were preprocessed quality-wise by a...
In the context of developing wordnets and using them in various applications, we have been enriching the Romanian and Bulgarian resources with morphosemantic relations that can help broaden the wordnet content and improve potential NLP applications. In this paper, we build on our previous results, adding to our presentation data from English...
This paper presents work in progress on a machine learning method for classification of morphosemantic relations between verb and noun synsets. The training data comprises 5,584 verb–noun synset pairs from the Bulgarian WordNet, where the morphosemantic relations were automatically transferred from the Princeton WordNet morphosemantic database. Th...
The paper discusses several key concepts related to the development of corpora and reconsiders them in light of recent developments in NLP. On the basis of an overview of present-day corpora, we conclude that the dominant practices of corpus design do not adequately utilise the available technologies and, as a result, fail to meet the demands of corpus lingu...
The paper presents the partially automatically annotated and fully manually validated Bulgarian-English Sentence- and Clause-Aligned Corpus. The discussion covers the motivation behind the corpus development, the structure and content of the corpus, illustrated by statistical data, the segmentation and alignment strategy and the tools used in the c...
The paper presents a new resource-light, flexible method for clause alignment which combines the Gale-Church algorithm with internally collected textual information. The method does not resort to any pre-developed linguistic resources, which makes it very appropriate for resource-light clause alignment. We experiment with a combination of the method...
The paper outlines an approach to the representation of lexical prefixes with a view to what is known as their argument-structure changing properties. The prefixes are treated as abstract frame-evoking predicates that interact with and subordinate the frame of the constant. A case study illustrates the specifics of this interaction and its effects...
The paper presents a tool assisting manual annotation of linguistic data developed at the Department of Computational Linguistics, IBL-BAS. Chooser is a general-purpose modular application for corpus annotation based on the principles of commonality and reusability of the created resources, language and theory independence, extendibility and user-f...
The Bulgarian Sense Tagged Corpus is derived from the "Brown" Corpus of Bulgarian and annotated with word senses from the Bulgarian WordNet. The paper gives a brief account of the already available and currently developed language resources and tools which enabled the compilation and annotation of the Bulgarian Sense Tagged Corpus. We briefly descr...
The Bulgarian Part-of-Speech (POS) and Word-Sense (WS) Tagged Corpora are derived from the "Brown" Corpus of Bulgarian, automatically annotated with POS and WS tags, respectively, and manually disambiguated with the annotation application Chooser. The adopted methodology for constructing and preprocessing the source corpora is briefly described. The...