Costantine D Spyropoulos
National Center for Scientific Research Demokritos | NCSR · Institute of Informatics and Telecommunications
Doctor of Engineering
About
141 Publications
28,147 Reads
3,597 Citations
Additional affiliations
January 1987 - present
Publications (141)
The need for low-cost health monitoring is increasing with the continuous increase of the elderly population. In this context, unobtrusive audiovisual monitoring methods can be of great importance. More particularly, the diameter of the pupil is a valuable source of information, since, apart from pathological cases, it can reveal the emotional stat...
In this work we face the challenge of estimating a ship's main-engine rotational speed from vessel data series, in the context of sea vessel route optimization. To this end, we study the value of different vessel data types as predictors of the engine rotational speed. As a result, we utilize speed data under a time-series view and examine how extr...
This paper presents a method towards estimating a clinical depression-specific score, namely the Beck Depression Inventory (BDI) score, based on analysis of mid-term audio features. A combination of support vector machines and semi-supervised learning has been applied to map the mid-term features to the BDI score. The method has been evaluated on t...
Unobtrusive everyday health monitoring can be of great use to the elderly population. In particular, pupil size may be a valuable source of information, since, apart from pathological cases, it can reveal the emotional state, the fatigue and the ageing. To allow for unobtrusive monitoring to gain acceptance, one should seek efficient meth...
This chapter summarises the approach and main achievements of the research project BOEMIE (Bootstrapping Ontology Evolution with Multimedia Information Extraction). BOEMIE introduced a new approach towards the automation of knowledge acquisition from multimedia content. In particular, it developed and demonstrated the notion of evolving multimedia...
This book aims to cover the state of the art in the fields of ontology evolution and information extraction from multimedia, while also promoting the synergy between these two fields. The contents stem largely from the research work conducted over a period of three years under the framework of the research project BOEMIE (Bootstrapping Ontology Evo...
In this paper we describe a semi-automated approach for ontology learning. Exploiting an ontology-based multimodal information extraction system, the ontology learning subsystem accumulates documents that are insufficiently analysed and through clustering proposes new concepts, relations and interpretation rules to be added to the ontology.
The basic goal of human robot interaction is to establish an effective communication between the two parties. In particular, robot emotion, speech, and facial expressions determine the way humans regard the robot, and they are deemed as essential for a natural form of communication. Addressing those issues is the focal point of this paper, while as...
The huge amount of information available on the Web creates the need for effective information extraction systems that are able to produce metadata that satisfy users' information needs. The development of such systems, in the majority of cases, depends on the availability of an appropriately annotated corpus in order to learn extraction models....
In this paper we propose a novel relation extraction method, based on grammatical inference. Following a semi-supervised learning approach, the text that connects named entities in an annotated corpus is used to infer a context-free grammar. The grammar learning algorithm is able to infer grammars from positive examples only, controlling overgener...
In this paper we perform a comparative evaluation of machine learning methods on the task of identifying the correct sense of a word, based on the context in which it appears. This task is known as word sense disambiguation (WSD) and is one of the hardest and most interesting issues in language engineering. Research on the use of machine learning t...
The information explosion of the Web aggravates the problem of effective information retrieval. Even though linguistic approaches found in the literature perform linguistic annotation by creating metadata in the form of tokens, lemmas or part-of-speech tags, this process is insufficient. This is due to the fact that these linguistic metadat...
As the number of health-related web sites in various languages increases, so does the need for control mechanisms that give
the users adequate guarantee on whether the web resources they are visiting meet a minimum level of quality standards. Based
upon state-of-the-art technology in the areas of semantic web, content analysis and quality labelling...
Ontologies are an essential component in Information Systems since they enable knowledge re-use and sharing in a formal, homogeneous
and unambiguous way. A domain ontology captures knowledge in a static way, as it is a snapshot of knowledge from a particular
point of view in a specific time-period. However, in open and dynamic settings, where knowled...
Ontologies are widely used for formalizing and organizing the knowledge of a particular domain of interest. This facilitates knowledge sharing and re-use by both people and systems. Ontologies are becoming increasingly important in the biomedical domain since they enable knowledge sharing in a formal, homogeneous and unambiguous way. Knowledge in a...
The paper presents a platform that facilitates the use of tools for collecting domain specific web pages as well as for extracting
information from them. It also supports the configuration of such tools to new domains and languages. The platform provides
a user friendly interface through which the user can specify the domain specific resources (ont...
This article investigates the effectiveness of voting and stacked generalization (also known as stacking) in the context of information extraction (IE). A new stacking framework is proposed that accommodates well-known approaches for IE. The key idea is to perform cross-validation on the base-level data set, which consists of text documents annotat...
Ontologies are becoming increasingly important in the biomedical domain since they enable the re-use and sharing of knowledge in a formal, homogeneous and unambiguous way. In the rapidly growing field of biomedicine, knowledge is usually evolving and therefore an ontology maintenance process is required to keep the ontological knowledge up-to-date....
The M-PIRO project targets the concept of personalized information objects, that is, entities capable of responding to requests for information by taking into account what the requester already knows, what they are most interested in, and how the related information is to be made available. M-PIRO's technology allows textual and spoken descriptio...
This chapter provides an overview of complementary research in the active research areas: AI planning technology and intelligent agents technology. It has been widely acknowledged that modern intelligent agents approaches should combine methodologies, techniques and architectures from many areas of Computer Science, Cognitive Science, Operation Res...
The BOEMIE project proposes a bootstrapping approach to knowledge acquisition, which uses multimedia ontologies for fused extraction of semantics from multiple modalities, and feeds back the extracted information, aiming to automate the ontology evolution process.
In this paper we present eg-GRIDS, an algorithm for inducing context-free grammars that is able to learn from positive sample sentences. The presented algorithm, similar to its GRIDS predecessors, uses simplicity as a criterion for directing inference, and a set of operators for exploring the search space. In addition to the basic beam search s...
This paper presents a new framework for extracting information from collections of Web pages across different sites. In the
proposed framework, a standard wrapper induction algorithm is used that exploits named entity information that has been previously
identified. The idea of post-processing the extraction results is introduced for resolving ambi...
This paper presents a novel method for extracting information from collections of Web pages across different sites. Our method uses a standard wrapper induction algorithm and exploits named entity information. We introduce the idea of post-processing the extraction results for resolving ambiguous facts and improve the overall extraction performance...
This paper proposes a meta-learning framework in the context of information extraction from the Web. The proposed framework relies on learning a meta-level classifier, based on the output of base-level information extraction systems. Such systems are typically trained to recognize relevant information within documents, i.e., streams of lexical uni...
The EC-funded R&D project, CROSSMARC, is developing technology for extracting information from domain-specific web pages, employing language technology methods as well as machine learning methods in order to facilitate technology porting to new domains. CROSSMARC also employs localisation methodologies and user modelling techniques in o...
This paper defines a new stacked generalization framework in the context of information extraction (IE) from online sources. The proposed setting removes the constraint of applying classifiers at the base-level. A set of IE systems are trained instead to identify relevant fragments within text documents, which differs significantly from the task of...
In this paper we present a new computationally efficient algorithm for inducing context-free grammars that is able to learn from positive sample sentences. This new algorithm uses simplicity as a criterion for directing inference, and the search process of the new algorithm has been optimised by utilising the results of a theoretical analysis reg...
It has recently been argued that a Naive Bayesian classifier can be used to filter unsolicited bulk e-mail ("spam"). We conduct a thorough evaluation of this proposal on a corpus that we make publicly available, contributing towards standard benchmarks. At the same time we investigate the effect of attribute-set size, training-corpus size, lemmatiz...
This paper is a survey of recent work in the field of web usage mining for the benefit of research on the personalization of Web-based information services. The essence of personalization is the adaptability of information systems to the needs of their users. This issue is becoming increasingly important on the Web, as non-expert users are overwhelmed...
This paper presents a large-scale Greek morphological lexicon, developed at the Software & Knowledge Engineering Laboratory
(SKEL) of NCSR “Demokritos”. The paper describes the lexicon architecture and the procedure to develop and update it. The
morphological lexicon was used to develop a lemmatiser and a morphological analyser that were exploited...
The EC-funded R&D project, CROSSMARC, is developing technology for extracting information from domain-specific web pages, employing language technology methods as well as machine learning methods in order to facilitate technology porting to new domains. CROSSMARC also employs localisation methodologies and user modelling techniques in order to pro...
This paper presents Ellogon, a multi-lingual, cross-platform, general-purpose text engineering environment. Ellogon was designed in order to aid both researchers in natural language processing, as well as companies that produce language engineering systems for the end-user. Ellogon provides a powerful TIPSTER-based infrastructure for managing, stor...
This paper presents an extensive empirical evaluation of memory-based learning in the context of anti-spam filtering, a novel cost-sensitive application of text categorization. Unsolicited commercial e-mail, also known as "spam", floods the mailboxes of users, causing frustration, wasting bandwidth and money, and exposing minors to unsuitable conte...
Ellogon is a multi-lingual, cross-operating system, general-purpose natural language engineering infrastructure. Ellogon has been used extensively in various NLP applications. It is currently provided for free for research use to research and academic organisations. In this paper, we outline its architecture and data model, present Ellogon features...
In this paper we present how the use of a general-purpose text engineering platform has facilitated the development of a cross-lingual information extraction system and its adaptation to new domains and languages. Our approach to cross-lingual information extraction from the Web covers all the way from the identification of Web sites of interest, t...
Interest in the analysis of user behaviour on the Internet has been increasing rapidly, especially since the advent of electronic commerce. In this context, we argue here for the usefulness of constructing communities of users with common behaviour, making use of machine learning techniques. In particular, we assume that the users of any service on...
This paper presents a method that assists in maintaining a rule-based named-entity recognition and classification system. The underlying idea is to use a separate system, constructed with the use of machine learning, to monitor the performance of the rule-based system.
We describe the symbolic authoring facilities of the M-PIRO project. M-PIRO is developing technology that allows personalized multilingual object descriptions, in both textual and spoken form, to be produced from symbolic information in a database and small fragments of text. The technology is being tested in the context of electronic museums, wher...
This paper presents a large-scale Greek morphological lexicon, developed by the Software & Knowledge Engineering Laboratory (SKEL) of NCSR "Demokritos". The paper describes the lexicon architecture, the procedure followed to develop it, as well as the provided functionalities to update it. The morphological lexicon was used to develop a lemmatiser...
In this article we compare the performance of various machine learning algorithms on the task of constructing word-sense disambiguation rules from data.
In this paper we examine the acquisition of user stereotypes and communities automatically from users' data. Stereotypes are built using supervised learning (C4.5) on personal data extracted from a set of questionnaires answered by the users of a news filtering system. Particular emphasis is given to the characteristic features of the task of learn...
This paper compares two alternative approaches to the problem of acquiring named-entity recognition and classification systems from training corpora, in two different languages. The process of named-entity recognition and classification is an important subtask in most language engineering applications, in particular information extraction, where...
This paper addresses the problem of Information Extraction (IE) system customization to new domains and extraction needs with the use of PatEdit, an IE Pattern Editor. PatEdit is a human-assisted knowledge engineering tool, that facilitates the production of IE patterns. First, we present the problem of IE system customisation and the use of human...
This book constitutes the refereed proceedings of the Second Hellenic Conference on Artificial Intelligence, SETN 2002, held in Thessaloniki, Greece, in April 2002.
The 42 revised full papers presented together with two invited contributions were carefully reviewed and selected for inclusion in the book. The papers are organized in topical sections...
In recent years machine learning has made its way from artificial intelligence into areas of administration, commerce, and industry. Data mining is perhaps the most widely known demonstration of this migration, complemented by less publicized applications of machine learning like adaptive systems in industry, financial prediction, medical diagnosis...
We report on recent work on human-robot spoken dialogue interaction in the context of
We evaluate empirically a scheme for combining
classifiers, known as stacked generalization, in the
context of anti-spam filtering, a novel cost-sensitive
application of text categorization. Unsolicited commercial
e-mail, or "spam", floods mailboxes, causing frustration,
wasting bandwidth, and exposing minors to unsuitable
content. Using a public c...
This paper presents a method that assists in maintaining a rule-based named-entity recognition and classification system. The underlying idea is to use a separate system, constructed with the use of machine learning, to monitor the performance of the rule-based system. The training data for the second system is generated with the use of the rule-ba...
This paper presents the Web Usage Mining system KOINOTITES, which uses data mining techniques for the construction of user communities on the Web. User communities model groups of visitors in a Web site, who have similar interests and navigational behaviour. We present the architecture of the system and the results that we obtained in a real Web si...
The wide availability and accessibility of information have made its management and deployment even more difficult. To this end, remarkable effort has been made for the development of information systems that handle the processing, analysis and management of information. However, the success of these systems does not only depend on the quality of i...
We present a method to automatically detect pornographic content on the Web. Our method combines techniques from language
engineering and image analysis within a machine-learning framework. Experimental results show that it achieves nearly perfect
performance on a set of hard cases.
Hospital management is a hard task due to the complexity of the organization, the costly infrastructure, the specialized services offered to different patients and the need for prompt reaction to emergencies. Artificial Intelligence planning and scheduling methods can offer substantial support to the management of hospitals, and help raise the st...