Penny Labropoulou

Institute for Language and Speech Processing, Athena R.C.

MSc in Machine Translation, UMIST




Publications (43)
Conference Paper
Full-text available
This paper presents the CLARIN:EL infrastructure, which comprises three pillars: the language resources and technologies Platform, the Portal and the Knowledge Centre. It serves as a comprehensive and interoperable environment that supports language-related research in the fields of language technology, language studies, digital humanities, and po...
Conference Paper
Full-text available
Understanding the relation between the meanings of words is an important part of comprehending natural language. Prior work has either focused on analysing lexical semantic relations in word embeddings or probing pretrained language models (PLMs), with some exceptions. Given the rarity of highly multilingual benchmarks, it is unclear to what extent...
Full-text available
Short presentation about conversion of publication metadata and other bibliodata to Linked Open Data using Wikibase and Wikidata, read at DH2023 ADHO conference in Graz (Austria); part of the panel "Fostering Collaboration to Enable Bibliodata-driven Research in the Humanities" (DARIAH Working Group Bibliodata)
Full-text available
This paper presents LexMeta, a metadata model for the description of lexical resources, such as dictionaries, word lists, glossaries, etc., to be used in language data catalogues mainly targeting the lexicographic and broader humanities communities but also users exploiting such resources in their research and applications. A comparative review of...
Full-text available
The ELG platform enables producers of language resources and language technology tools and services to upload, describe, share, and distribute their services and products as well as to describe their companies, academic organisations and projects. This chapter presents the functionalities offered through web-based user interfaces for describing LT...
Full-text available
One of the objectives of the European Language Grid is to help overcome the fragmentation of the European Language Technology community by bringing together language resources and technologies, information about them, Language Technology consumers, providers and the wider public. This chapter describes the mechanisms ELG has put in place to build i...
Full-text available
This chapter describes the European Language Grid cloud platform from the point of view of a consumer who wishes to access language resources or make use of language technology tools and services. Three aspects are discussed: 1. the web-based user interface (UI) for casual and non-technical users, 2. the underlying REST APIs that drive the UI but ca...
Full-text available
The European Language Grid is meant to develop into the primary platform of the European Language Technology community. In addition to LT tools and services (Chapter 7) and Language Resources (Chapter 8), ELG represents the actual members of this community, i.e., the companies and research organisations that develop language technologies and that...
Full-text available
This chapter provides an overview of what is available in ELG in terms of datasets, corpora and other language resources (LRs) and how this has been achieved. We look at the procedures and steps that have been followed to complete the full resource ingestion cycle, which goes from repository and LR identification to metadata description and ingesti...
Full-text available
In the fragmented Language Technology (LT) landscape of multilingual Europe, ELG has set out to bring together language resources and technologies (LRTs) and boost the LT sector and its activities. The primary goal is to build a scalable and comprehensive cloud platform for providers, developers, integrators and consumers of language resources and...
Conference Paper
Full-text available
In this paper, we present LexMeta, a metadata model for the description of human-readable and computational lexical resources in catalogues. Our initial motivation is the extension of the LexBib knowledge graph with the addition of metadata for dictionaries, making it a catalogue of and about lexicographical works. The scope of the proposed model,...
Conference Paper
Full-text available
Language researchers are usually aware of intellectual property and personal data (PD) requirements. The problem, however, arises when these two legal regimes have conflicting requirements. For instance, copyright law requires the acknowledgement of the author, while personal data law enshrines the data minimisation principle. It is a practical...
Full-text available
This article provides a comprehensive and up-to-date survey of models and vocabularies for creating linguistic linked data (LLD) focusing on the latest developments in the area and both building upon and complementing previous works covering similar territory. The article begins with an overview of some recent trends which have had a significant im...
Conference Paper
Full-text available
The article focuses on determining responsible parties and the division of potential liability arising from sharing language data (LD) containing personal data (PD). A key issue here is to identify who must ensure GDPR compliance. The authors aim to answer 1) whether an individual researcher is a controller and 2) whether sha...
Full-text available
The study focuses on the sharing of language data containing personal data, which constitutes the processing of personal data. In international practice, it is not unequivocally clear how responsibility for the processing of personal data is divided between an individual researcher and a research institution. For example, the model of France and Germany differs from that of Estonia, Lithuania and Finland. A distinctive approach...
Conference Paper
Full-text available
The article analyses the responsibility for ensuring compliance with the General Data Protection Regulation (GDPR) in research settings. As a general rule, organisations are considered the data controller (the party responsible for GDPR compliance). Research constitutes a unique setting influenced by academic freedom. This raises the question of wh...
Full-text available
This article investigates the compatibility of the current CLARIN license categorization scheme with the open science paradigm. The first part presents the main concepts and theoretical framework required for the analysis, while the second part discusses the use of the CLARIN categorization system, divided into PUB (public), ACA (academic), and RES...
Conference Paper
Full-text available
With regard to the wider area of AI/LT platform interoperability, we concentrate on two core aspects: (1) cross-platform search and discovery of resources and services; (2) composition of cross-platform service workflows. We devise five different levels (of increasing complexity) of platform interoperability that we suggest implementing in a wider...
Full-text available
With 24 official EU and many additional languages, multilingualism in Europe and an inclusive Digital Single Market can only be enabled through Language Technologies (LTs). European LT business is dominated by hundreds of SMEs and a few large players. Many are world-class, with technologies that outperform the global players. However, European LT b...
Full-text available
The current scientific and technological landscape is characterised by the increasing availability of data resources and processing tools and services. In this setting, metadata have emerged as a key factor facilitating management, sharing and usage of such digital assets. In this paper we present ELG-SHARE, a rich metadata schema catering for the...
Full-text available
The authors address the legal issues relating to the creation and use of language models. The article begins with an explanation of the development of language technologies. The authors analyse the technological process within the framework of copyright, related rights and personal data protection law. The authors also cover commercial use of language...
Full-text available
Most NLP tools implement machine learning algorithms that produce models. Pre-trained models are an essential asset in NLP and data mining infrastructures. When can we legally train a model from annotated corpora? Can we distribute the trained model and, if so, under which conditions?
Conference Paper
Full-text available
A significant concern in processing natural language data is the often unclear legal status of the input and output data/resources. In this paper, we investigate this problem by discussing a typical activity in Natural Language Processing: the training of a machine learning model from an annotated corpus. We examine which legal rules apply at relev...
Conference Paper
Full-text available
This paper is a first analysis of the legal interoperability issues in the framework of the OpenMinTeD (OMTD) project (www.openminted.eu), which aims to create an open, service-oriented e-Infrastructure for Text and Data Mining (TDM) of scientific and scholarly content. The paper offers an overview of the methods for achieving such interoperabili...
Conference Paper
Full-text available
META-SHARE is an infrastructure for sharing Language Resources (LRs) where significant effort has been put into providing carefully curated metadata about LRs. However, in the face of the flood of data that is used in computational linguistics, a manual approach cannot suffice. We present the development of the META-SHARE ontology, which transform...
Conference Paper
Full-text available
This paper presents a metadata model for the description of language resources proposed in the framework of the META-SHARE infrastructure, aiming to cover both datasets and tools/technologies used for their processing. It places the model in the overall framework of metadata models, describes the basic principles and features of the model, elaborat...
Conference Paper
Full-text available
This paper presents the metadata schema for describing language resources (LRs) currently under development for the needs of META-SHARE, an open distributed facility for the exchange and sharing of LRs. An essential ingredient in its setup is the existence of formal and standardized LR descriptions, cornerstone of the interoperability layer of an...
Conference Paper
Full-text available
This paper reports on completed work carried out in the framework of the INTERA project, and specifically, on the production of multilingual resources (LRs) for eContent purposes. The paper presents the methodology adopted for the development of the corpus (acquisition and processing of the textual data), discusses the divergence of the initial ass...
Full-text available
This paper reports on the multilingual Language Resources (MLRs), i.e. parallel corpora and terminological lexicons for less widely digitally available languages, that have been developed in the INTERA project and the methodology adopted for their production. Special emphasis is given to the reality factors that have influenced the MLRs development...
Full-text available
This paper presents an automatic Generator of dictionary definitions for concrete entities, based on information extracted from a Computational Lexicon (CL) containing semantic information. The aim of the adopted approach, combining NLG techniques with the exploitation of the formalised and systematic lexical information stored in CL, is to produce...
Full-text available
This paper presents the Hellenic National Corpus (HNC), which is the corpus of Modern Greek developed by the Institute for Language and Speech Processing (ILSP). The presentation describes all stages of the creation of the corpus: collection of the material, tagging and tokenizing, construction of the database and the online implementation which aims at r...

