Sokratis Sofianopoulos

Institute for Language and Speech Processing | ILSP · Machine Translation

PhD

About

29
Publications
1,267
Reads
125
Citations
Citations since 2016
12 Research Items
44 Citations
[Chart: citations per year, 2016 to 2022]
Additional affiliations
March 2005 - present
Institute for Language and Speech Processing
Position
  • Software Engineer, Researcher

Publications (29)
Conference Paper
Full-text available
This paper presents a collection of parallel corpora generated by exploiting the COVID-19-related dataset of metadata created with the Europe Media Monitor (EMM) / Medical Information System (MediSys) processing chain of news articles. We describe how we constructed comparable monolingual corpora of news articles related to the current pandemic and...
Chapter
Following the detailed description of the PRESEMT Machine Translation system and the report on its performance, the current chapter focuses on the system’s portability. Portability is a term intended to signify the process of integrating a new language pair into the system. This involves reviewing all the necessary system modules and resources and...
Chapter
The topic of the current chapter is the evaluation of PRESEMT's performance in terms of the translation quality achieved, both per se and in comparison with other MT systems. While human evaluators can be employed for this task (subjective evaluation), assessing an MT system in terms of fluency (i.e. grammaticality...
Chapter
This chapter reviews the research work discussed in the previous chapters of the present volume, summarising the outcomes of the research within the PRESEMT project. Building on this review, a set of key directions for future work is identified in order to further improve the MT methodology. A brief report of the m...
Chapter
This chapter presents in detail the main translation process of PRESEMT, delving deeper into the core of the system and its inner workings.
Chapter
This chapter introduces the general design characteristics of PRESEMT and provides a detailed description of all resources required as well as all pre-processing steps needed, such as corpora processing and model creation.
Chapter
This chapter describes a number of improvements made to the basic PRESEMT system. These improvements target different modules of the PRESEMT architecture, for which alternative implementations are proposed in an effort to increase translation accuracy. The first...
Chapter
This chapter contains a general introduction to the topic of the present book. It presents the current challenges of Machine Translation (MT), in particular for languages where only a limited amount of specialised resources is readily available. To that end, a comprehensive review of the state-of-the-art in MT is performed. Focus is placed on relat...
Book
This book provides a unified view of a new methodology for Machine Translation (MT). This methodology extracts information from widely available resources (extensive monolingual corpora) while assuming only the existence of a very limited parallel corpus, thus having a starting point distinct from that of Statistical Machine Translation (SMT). In this book, a...
Chapter
The present chapter reviews the development of a hybrid Machine Translation (MT) methodology, which is readily portable to new language pairs. This MT methodology (which has been developed within the PRESEMT project) is based on sampling mainly monolingual corpora, with very limited use of parallel corpora, thus supporting portability to new langua...
Conference Paper
This paper reports on a first prototype implementation for combining and extending a data infrastructure with linguistic processing services, bringing language datasets and basic language processing services together in a unified platform thus boosting the organic growth of data and facilitating language technology research and development. The MET...
Conference Paper
Full-text available
The present article investigates the fusion of different language models to improve translation accuracy. A hybrid MT system that combines example-based MT and Statistical MT principles, recently developed in the European Commission-funded PRESEMT project, is used as a starting point. In this article, the syntactically-defined phrasal language models...
Conference Paper
The current paper evaluates the performance of the PRESEMT methodology, which facilitates the creation of machine translation (MT) systems for different language pairs. This methodology aims to develop a hybrid MT system that extracts translation information from large, predominantly monolingual corpora, using pattern recognition techniques. PRESEM...
Conference Paper
The current paper presents a language-independent methodology, which facilitates the creation of machine translation (MT) systems for various language pairs. This methodology is implemented in the PRESEMT hybrid MT system. PRESEMT has the lowest possible requirements on specialised resources and tools, given that for many languages (especially less...
Conference Paper
Full-text available
This document contains a brief presentation of the PRESEMT project, which aims to develop a novel language-independent methodology for creating a flexible and adaptable MT system.
Article
In this article, aspects regarding the optimisation of machine translation systems via evolutionary computation algorithms are examined. The article focuses on pattern-recognition-based machine translation systems that use large monolingual corpora in the target language from which statistical information is extracted. The research reported here...
Article
Full-text available
The present article introduces a phrase-alignment approach that involves the processing of a small bilingual corpus in order to extract suitable structural information. This is used in the PRESEMT project, whose aim is the quick development of phrase-based Machine Translation (MT) systems for new language pairs. A main bottleneck of such systems is...
Article
In this paper, an automated method is proposed for optimising the real-valued parameters of a hybrid Machine Translation (MT) system that employs pattern recognition techniques together with extensive monolingual corpora in the target language from which statistical information is extracted. The absence of a parallel corpus prohibits the use of the...
Article
Full-text available
METIS-II was an EU-FET MT project running from October 2004 to September 2007, which aimed at translating free text input without resorting to parallel corpora. The idea was to use “basic” linguistic tools and representations and to link them with patterns and statistics from the monolingual target-language corpus. The METIS-II project has four par...
Conference Paper
Full-text available
In this paper we describe the METIS-II system and its evaluation on each of the language pairs: Dutch, German, Greek, and Spanish to English. The METIS-II system envisaged developing a data-driven approach in which no parallel corpus is required and in which no full parser or extensive rule sets are needed. We describe the evaluation on a developme...
Article
Full-text available
In this paper, we explain why we have adopted pattern matching for MT purposes and why we have embedded it into a hybrid approach. "Patterns" here are understood as independent meaningful sub-sentential segments obtained in a systematic way. We describe the nature and size of the patterns used as well as the comparison algorithm developed. We d...
Conference Paper
Full-text available
The innovative feature of the system presented in this paper is the use of pattern-matching techniques to retrieve translations, resulting in a flexible, language-independent approach, which employs a limited amount of explicit a priori linguistic knowledge. Furthermore, while all state-of-the-art corpus-based approaches to Machine Translation (MT)...
Article
Full-text available
METIS-II, the MT system presented in this paper, does not view translation as a transfer process between a source language (SL) and a target one (TL), but rather as a matching procedure of patterns within a language pair. More specifically, translation is considered to be an assignment problem, i.e. a problem of discovering each time the best ma...
Article
In this paper an innovative approach is presented for MT, which is based on pattern matching techniques, relies on extensive target language monolingual corpora and employs a series of similarity weights between the source and the target language. Our system is based on the notion of 'patterns', which are viewed as 'models' of target language s...
Article
Full-text available
In the present article, a hybrid approach is proposed for implementing a machine translation system using a large monolingual corpus coupled with a bilingual lexicon and basic NLP tools. In the first phase of the METIS system, a source language (SL) sentence, after being tagged, lemmatised and translated by a flat lemma-to-lemma lexicon, was ma...

Projects

Project (1)