About
48 Publications
6,097 Reads
432 Citations
Introduction
I am a computational linguist doing basic and applied research mainly in the areas of lexis, syntax, and semantics and their interdependencies. My research focuses on modeling language and developing resources (computational lexica, annotated corpora, and computational grammars) aimed at training and evaluating Natural Language Processing tools. I am particularly interested in Multi-Word Entries Identification, NERC, Textual Entailment, Sentiment Analysis and Stance Detection.
Additional affiliations
September 1993 - September 1998
Position
- Research Assistant
Description
- As a member of the NLP team, I was engaged in the creation of various dictionaries of Modern Greek (MG): a morphological dictionary; a bilingual EL-EN & EN-EL dictionary for human users, in collaboration with Collins HarperCollins Publishers Ltd. & the lexicographic Centre COBUILD; a Pronunciation & Usage dictionary for Foreigners (with Brasshouse Language Centre); a pronunciation & usage dictionary of Greek dialects of N. Italy; and the KORAIS Dictionary (Anastasiadou-Simeonidou, A., V. Giouli, E. Koutsogeorgopoulou, and G. Kokkinakis).
January 1998 - present
Publications (48)
This paper presents our implementation of interactive services related to map-based exploration of cultural heritage geospatial data. We first present our methodology for interlinking geographical entities (e.g., Points of Interest), i.e., the identification of same real-world geographical representations between different data sources. Then, we pr...
Multiword expressions (MWEs) are sequences of words that pose a challenge to the computational processing of human languages due to their idiosyncrasies and the mismatch between their phrasal structure and their semantics. These idiosyncrasies are of lexical, morphosyntactic and semantic nature, namely: non-compositionality, i.e., the meaning of...
Thesauri have long been recognized as valuable structured resources aiding Information Retrieval systems. A thesaurus provides a precise and controlled vocabulary which serves to coordinate data indexing and retrieval. The paper presents a bilingual Greek and English specialized thesaurus that is being developed as the backbone of a platform aimed...
The paper gives an account of an infrastructure that will be integrated into a platform aimed at providing a multi-faceted experience to visitors of Northern Greece using mythology as a starting point. This infrastructure comprises a multi-lingual and multi-modal corpus (i.e., a corpus of textual data supplemented with images and video) that belong...
Language resources of any type are of paramount importance to several Natural Language Processing applications; developing and maintaining, however, quality lexical semantic resources is still a laborious and costly task that presents various challenges. In this respect, there is an ever-growing demand for resources that are visible, easily accessi...
The authors report on a recent survey on monolingual dictionaries available on the Greek market. General dictionaries outnumber spelling and educational ones and enjoy a prestigious status. Only one general dictionary is digitally born and only two are available through the web, but several are available as CDs. Most of the prestigious dictionaries...
Multiword expressions can have both idiomatic and literal occurrences. For instance, pulling strings can be understood either as making use of one’s influence, or literally. Distinguishing these two cases has been addressed in linguistics and psycholinguistics studies, and is also considered one of the major challenges in MWE processing. We suggest...
The article presents the results of a survey on dictionary use in Europe, focusing on general monolingual dictionaries. The survey is the broadest survey of dictionary use to date, covering close to 10,000 dictionary users (and non-users) in nearly thirty countries. Our survey covers varied user groups, going beyond the students and translators who...
Multiword expressions (MWEs) are known as a “pain in the neck” due to their idiosyncratic behaviour. While some categories of MWEs have been largely studied, verbal MWEs (VMWEs) such as to take a walk, to break one’s heart or to turn off have been relatively rarely modelled. We describe an initiative meant to bring about substantial progress in und...
This paper describes the PARSEME Shared Task 1.1 on automatic identification of verbal multi-word expressions. We present the annotation methodology, focusing on changes from last year's shared task. Novel aspects include enhanced annotation guidelines, additional annotated data for most languages, corpora for some new languages, and new evaluation...
In this paper, we present a case study on colour and emotion terms and their cultural references in the framework of the COST European Network of e-Lexicography (ENeL), working towards Pan-European lexicography. We take an initial use case of red in connection with emotions (anger) and look at its roots across different European languages, includi...
This paper presents work on collecting comparable corpora for 9 language pairs: Estonian-English, Latvian-English, Lithuanian-English, Greek-English, Greek-Romanian, Croatian-English, Romanian-English, Romanian-German and Slovenian-English. The objective of this work was to gather texts from the same domains and genres and with a similar level of c...
Computational approaches to sentiment analysis focus on the identification, extraction, summarization and visualization of emotion and opinion expressed in texts. These tasks require large-scale language resources (LRs) developed either manually or semi-automatically. Building them from scratch, however, is a laborious and costly task, and re-using...
The proposed paper reports on work in progress aimed at the development of a conceptual lexicon of Modern Greek (MG) and the encoding of MWEs in it. Morphosyntactic and semantic properties of these expressions were specified formally and encoded in the lexicon. The resulting resource will be applicable for a number of NLP applications.
This work concerns the development of language resources for Greek, with the aim of further training and evaluating computational systems for the analysis of sentiment and opinion expressed in texts. In particular, the work concerns the annotation of a text corpus with respect to the expression of sentiment and opinion by individuals or groups of individ...
This paper presents corpus work aimed at manually annotating sentiment expressions in the Greek (EL) sub-corpus of a corpus of movies coupled with both orthographic (EN) transcriptions and subtitles in (EL) and (ES). Our effort involves the treatment of emotion predicates and emotion-related concepts in naturally occurring texts and their integrati...
We hereby present work aimed at giving an account of Greek verbs denoting emotion that is placed within a larger context, aimed towards defining and describing the semantic field of emotions by means of identifying, selecting, classifying and organizing a core lexicon of emotions in a conceptual Data Base. The ultimate goal is the exhaustive descri...
We present a multi-lingual Lexical Resource (LR) developed in the context of a lexicographic project that involves the development of user-oriented dictionaries for immigrants in Greece. The LR caters to languages that as of yet remain disconnected, and also encompasses a variety of styles that are relevant to communicative situations that the targ...
This paper presents an ongoing effort focusing on the development of an audiovisual corpus resource and its annotation in terms of sentiments and opinions. A modular annotation schema has been employed, based on the specifications of existing schemas and extending or adapting them to cater for the peculiarities of the corpus-specific data.
In this paper we describe ongoing work aimed at the creation of a suite of specialized Language Resources (LRs) intended for users not previously targeted, namely, adult immigrants in Greece. With the ultimate goal of helping them integrate into Greek society, we aim to provide support touching on basic linguistic, social and everyday issues. T...
In this paper, we present work aimed at the linguistic annotation of Greek corpora that belong to the humanities domain, the focus being on the methodological principles as well as the implementation framework adopted. This framework builds on an existing XML annotation platform that was initially developed in an Information Extraction setting an...
There has been a long tradition in the digitization and manual documentation of cultural heritage data, yet the need for indexing and retrieval that goes beyond mere bibliographic information has only recently been recognized. This chapter reports on completed work aimed at highlighting textual cultural resources that, as of yet, remain under-explo...
This paper reports on work aimed at (a) developing an application tailored to integrate and highlight textual cultural resources that, as of yet, remain under-exploited, and (b) creating the necessary infrastructure with the support and customization of Language Technologies (LT). The ultimate goal was to promote the study of cultural...
This paper reports on completed work carried out in the framework of an EU-funded project aimed at (a) developing a bilingual collection of cultural texts in Greek and Bulgarian, (b) creating a number of accompanying resources that will facilitate study of the primary texts across languages, and (c) integrating a system which aims to provide web-en...
The paper reports on completed work aimed at the creation of a resource, namely, the Greek Textual Entailment Corpus (GTEC) that is appropriate for guiding training and evaluation of a system that recognizes Textual Entailment in Greek texts. The corpus of textual units was collected in view of a range of NLP applications, where semantic interpreta...
This paper reports on completed work carried out in the framework of the INTERA project, and specifically, on the production of multilingual resources (LRs) for eContent purposes. The paper presents the methodology adopted for the development of the corpus (acquisition and processing of the textual data), discusses the divergence of the initial ass...
In this paper, we describe work in progress for the development of a Greek named entity recognizer. The system aims at information extraction applications where large scale text processing is needed. Speed of analysis, system robustness, and results accuracy have been the basic guidelines for the system’s design. Pattern matching techniques have be...
In this paper we describe a method for the efficient parsing of real-life Greek texts at the surface syntactic level. A grammar consisting of non-recursive regular expressions describing Greek phrase structure has been compiled into a cascade of finite state transducers used to recognize syntactic constituents. The implemented parser lends itself t...
This paper proposes a flexible and unified tagging architecture that could be incorporated into a number of applications like information extraction, cross-language information retrieval, term extraction, or summarization, while providing an essential component for subsequent syntactic processing or lexicographical work. A feature-based multi-tiere...
In this paper, we describe work in progress for the development of a named entity recognizer for Greek. The system aims at information extraction applications where large scale text processing is needed. Speed of analysis, system robustness, and results accuracy have been the basic guidelines for the system's design. Our system is an automated pipe...
This paper presents two dictionaries which are currently under development at the Wire Communications Laboratory (WCL) of the University of Patras within the framework of the EU program Socrates/Lingua: (i) an Electronic Dictionary of Pronunciation and Usage of Greek Dialects of Southern Italy, and (ii) an Electronic Dictionary of Pronunciation and...
This paper reports on the multilingual Language Resources (MLRs), i.e. parallel corpora and terminological lexicons for less widely digitally available languages, that have been developed in the INTERA project and the methodology adopted for their production. Special emphasis is given to the reality factors that have influenced the MLRs development...
The paper reports on the development methodology of a system aimed at multi-domain multi-lingual recognition and classification of names in texts, the focus being on the linguistic resources used for training and testing purposes. The corpus presented here has been collected and annotated in the framework of different projects the critical issue be...