About
Publications: 79
Reads: 14,561
Citations: 1,777
Introduction
I am a research associate at the Institute of Informatics and Telecommunications (IIT) of NCSR "Demokritos", where I am involved in national and international projects.
Current institution
Additional affiliations
April 2005 - April 2008
October 2008 - July 2020
Education
April 2005 - June 2008
September 2003 - September 2004
September 1999 - June 2003
Publications (79)
Complex diseases pose challenges in prediction due to their multifactorial and polygenic nature. This study employed machine learning (ML) to analyze genomic data from the UK Biobank, aiming to predict the genomic predisposition to complex diseases like multiple sclerosis (MS) and Alzheimer’s disease (AD). We tested logistic regression (LR), ensemb...
We address the need for focused retrieval and integration of biomedical literature, enabling improved subject annotation of documents with domain concepts for which no manual labels are available. To do so, we propose a novel zero-shot method, called PN Relabeler, that improves heuristic concept-level annotations by relabeling the documents Predict...
Objective
This paper presents the novel BioASQ Synergy research process which aims to facilitate the interaction between biomedical experts and automated question answering systems.
Materials and Methods
The proposed research allows systems to provide answers to emerging questions, which in turn are assessed by experts. The assessment of the exper...
The vision of IASIS project is to turn the wave of big biomedical data heading our way into actionable knowledge for decision makers. This is achieved by integrating data from disparate sources, including genomics, electronic health records and bibliography, and applying advanced analytics methods to discover useful patterns. The goal is to turn la...
The large-scale biomedical semantic indexing and question-answering challenge (BioASQ) aims at the continuous advancement of methods and tools to meet the needs of biomedical researchers and practitioners for efficient and precise access to the ever-increasing resources of their domain. With this purpose, during the last eleven years, a series of a...
Complex diseases pose challenges in disease prediction due to their multifactorial and polygenic nature. In this work, we explored the prediction of two complex diseases, multiple sclerosis (MS) and Alzheimer's disease (AD), using machine learning (ML) methods and genomic data from UK Biobank. Different ML methods were applied, including logistic r...
Biomedical experts are facing challenges in keeping up with the vast amount of biomedical knowledge published daily. With millions of citations added to databases like MEDLINE/PubMed each year, efficiently accessing relevant information becomes crucial. Traditional term-based searches may lead to irrelevant or missed documents due to homonyms, syno...
This is an overview of the eleventh edition of the BioASQ challenge in the context of the Conference and Labs of the Evaluation Forum (CLEF) 2023. BioASQ is a series of international challenges promoting advances in large-scale biomedical semantic indexing and question answering. This year, BioASQ consisted of new editions of the two established ta...
The BioASQ question answering (QA) benchmark dataset contains questions in English, along with golden standard (reference) answers and related material. The dataset has been designed to reflect real information needs of biomedical experts and is therefore more realistic and challenging than most existing datasets. Furthermore, unlike most previous...
The large-scale biomedical semantic indexing and question-answering challenge (BioASQ) aims at the continuous advancement of methods and tools to meet the need of biomedical researchers and practitioners for efficient and precise access to the ever-increasing resources of their domain. With this purpose, during the last ten years a series of annual...
Semantic indexing of biomedical literature is usually done at the level of MeSH descriptors, representing topics of interest for the biomedical community. Several related but distinct biomedical concepts are often grouped together in a single coarse-grained descriptor and are treated as a single topic for semantic indexing. This study proposes a ne...
This paper presents an overview of the tenth edition of the BioASQ challenge in the context of the Conference and Labs of the Evaluation Forum (CLEF) 2022. BioASQ is an ongoing series of challenges that promotes advances in the domain of large-scale biomedical semantic indexing and question answering. In this edition, the challenge was composed of...
In this paper, we present Knowledge4COVID-19, a framework that aims to showcase the power of integrating disparate sources of knowledge to discover adverse drug effects caused by drug-drug interactions among COVID-19 treatments and pre-existing condition drugs. Initially, we focus on constructing the Knowledge4COVID-19 knowledge graph (KG) from the...
There is a pressing need for advanced semantic annotation technologies of medical content, in particular medical publications, clinical trials and clinical records. Search engines and information retrieval systems require semantic annotation and indexing systems to support more advanced user search queries. Considering the relevance of disease conc...
The development of the CRISPR-Cas9 technology has provided a simple yet powerful system for genome editing. Current gRNA design tools serve as an important platform for the efficient application of the CRISPR systems. However, most of the existing tools are black-box models that suffer from limitations, such as variable performance and unclear mech...
The development of the CRISPR-Cas9 technology has provided a simple yet powerful system for targeted genome editing. Compared with previous gene-editing tools, the CRISPR-Cas9 system identifies target sites by the complementarity between the guide RNA (gRNA) and the DNA sequence, which is less expensive and time-consuming, as well as more precise a...
The tenth version of the BioASQ Challenge will be held as an evaluation Lab within CLEF2022. The motivation driving BioASQ is the continuous advancement of approaches and tools to meet the need for efficient and precise access to the ever-increasing biomedical knowledge. In this direction, a series of annual challenges are organized, in the fields...
The clustered regularly interspaced short palindromic repeat (CRISPR)/CRISPR-associated protein 9 (Cas9) system has become a successful and promising technology for gene-editing. To facilitate its effective application, various computational tools have been developed. These tools can assist researchers in the guide RNA (gRNA) design process by pred...
The Medical Subject Headings (MeSH) thesaurus is a controlled vocabulary widely used in biomedical knowledge systems, particularly for semantic indexing of scientific literature. As the MeSH hierarchy evolves through annual version updates, some new descriptors are introduced that were not previously available. This paper explores the conceptual pr...
Advancing the state-of-the-art in large-scale biomedical semantic indexing and question answering is the main focus of the BioASQ challenge. BioASQ organizes respective tasks where different teams develop systems that are evaluated on the same benchmark datasets that represent the real information needs of experts in the biomedical domain. This pap...
In this paper, we present an overview of the eighth edition of the BioASQ challenge, which ran as a lab in the Conference and Labs of the Evaluation Forum (CLEF) 2020. BioASQ is a series of challenges aiming at the promotion of systems and methodologies for large-scale biomedical semantic indexing and question answering. To this end, shared tasks a...
In this document, we report an analysis of the Public MeSH Note field of the new descriptors introduced in the MeSH thesaurus between 2006 and 2020. The aim of this analysis was to extract information about the previous status of these new descriptors as Supplementary Concept Records. The Public MeSH Note field contains information in semi-structur...
This paper describes the ninth edition of the BioASQ Challenge, which will run as an evaluation Lab in the context of CLEF2021. The aim of BioASQ is the promotion of systems and methods for highly precise biomedical information access. This is done through the organization of a series of challenges (shared tasks) on large-scale biomedical semantic...
Knowledge Graphs provide insights from data extracted in various domains. In this paper, we present an approach discovering probable drug-to-drug interactions, through the generation of a Knowledge Graph from disease-specific literature. The Graph is generated using natural language processing and semantic indexing of biomedical publications and op...
In this work, we propose a method for the automated refinement of subject annotations in biomedical literature at the level of concepts. Semantic indexing and search of biomedical articles in MEDLINE/PubMed are based on semantic subject annotations with MeSH descriptors that may correspond to several related but distinct biomedical concepts. Such s...
The results of the seventh edition of the BioASQ challenge are presented in this paper. The aim of the BioASQ challenge is the promotion of systems and methodologies through the organization of a challenge on the tasks of large-scale biomedical semantic indexing and question answering. In total, 30 teams with more than 100 systems participated in t...
This paper describes the eighth edition of the BioASQ Challenge, which will run as an evaluation Lab in the context of CLEF2020. The aim of BioASQ is the promotion of systems and methods for highly precise biomedical information access. This is done through the organization of a series of challenges (shared tasks) on large-scale biomedical semantic...
Biomedical researchers working on a specific disease need up-to-date and unified access to knowledge relevant to the disease of their interest. Knowledge is continuously accumulated in scientific literature and other resources such as biomedical ontologies. Identifying the specific information needed is a challenging task and computational tools ca...
In this work, we study the task of predicting a stock's closing price for the following day, based on technical analysis, news articles and public opinions. The intuition of this study lies in the fact that technical analysis contains information about the event, but not the cause of the change, while data like news articles and public opinions...
Evaluation in empirical computer science is essential to show progress and assess technologies developed. Several research domains such as information retrieval have long relied on systematic evaluation to measure progress: here, the Cranfield paradigm of creating shared test collections, defining search tasks, and collecting ground truth for these...
Artificial Intelligence has been an active research field in Greece for over forty years, and there are more than thirty AI groups throughout the country covering almost all subareas of AI. One milestone for AI research in Greece was in 1988, when the Hellenic Artificial Intelligence Society (EETN) was founded as a non-profit, scientific organizati...
The past years have seen a growing amount of research on question answering (QA) over Semantic Web data, shaping an interaction paradigm that allows end users to profit from the expressive power of Semantic Web standards while, at the same time, hiding their complexity behind an intuitive and easy-to-use interface. On the other hand, the growing am...
The workshop on Medical Information Retrieval took place at SIGIR 2016 in Pisa, Italy on July 21. The workshop programme included seven oral presentations of refereed papers, four posters and an invited keynote presentation. This allowed time for lively discussions among the 27 participants. These made clear the significant and diverse challenges i...
BioASQ is a series of challenges that aims to assess the performance of information systems in supporting two tasks that are central to the biomedical question answering process: (a) the indexing of large volumes of unlabelled data, primarily scientific articles, with biomedical concepts, (b) the processing of biomedical questions and the generatio...
In this report, we summarize the outcome of the "Evaluation-as-a-Service" workshop that was held on the 5th and 6th March 2015 in Sierre, Switzerland. The objective of the meeting was to bring together initiatives that use cloud infrastructures, virtual machines, APIs (Application Programming Interface) and related projects that provide evaluation...
Modern online social networks, such as Twitter and Instagram, are nowadays important sources for publishing information and content around breaking news stories and incidents related to public safety, ranging from natural disasters and aeroplane accidents to terrorist attacks and industrial accidents. A crucial issue regarding such information and...
This article provides an overview of the first BIOASQ challenge, a competition on large-scale biomedical semantic indexing and question answering (QA), which took place between March and September 2013. BIOASQ assesses the ability of systems to semantically index very large numbers of biomedical scientific articles, and to return concise and user-u...
In the past years social media services received content contributions from millions of users, making them a fruitful source for data analysis. In this paper we present a novel approach for mining Twitter data in order to extract factual information concerning trending events. Our approach is based on relation extraction between named entities, suc...
The most common methods for examining genomic sequence composition are based on the bag-of-words approach and thus largely ignore the original sequence structure or the relative positioning of its constituent oligonucleotides. We here present a novel methodology that takes into account both word representation and relative positioning at various lengt...
In this work, we consider a transfer learning approach based on K-means for splice site recognition. We use different representations for the sequences, based on n-gram graphs. In addition, a novel representation based on the secondary structure of the sequences is proposed. We evaluate our approach on genomic sequence data from model organisms of...
A new transfer learning method is presented in this paper, addressing a particularly hard transfer learning problem: the case where the target domain shares only a subset of its classes with the source domain and only unlabeled data are provided for the target domain. This is a situation that occurs frequently in real-world applications, such as th...
News and social media are emerging as a dominant source of information for numerous applications. However, their vast unstructured content presents challenges to efficient extraction of such information. In this paper, we present the SYNC3 system that aims to intelligently structure content from both traditional news media and the blogosphere. To ac...
In this paper, we address the problem of learning aspect models with partially labeled data for the task of document categorization. The motivation of this work is to take advantage of the amount of available unlabeled data together with the set of labeled examples to learn latent models whose structure and underlying hypotheses take more accuratel...
Ontology learning is the process of acquiring (constructing or integrating) an ontology (semi-)automatically. Being a knowledge acquisition task, it is a complex activity, which becomes even more complex in the context of the BOEMIE project, due to the management of multimedia resources and the multi-modal semantic interpretation that they require...
In this paper, we address the problem of learning aspect models with partially labeled examples. We propose a method which benefits from both semi-supervised and active learning frameworks. In particular, we combine a semi-supervised extension of the PLSA algorithm [11] with two active learning techniques. We perform experiments over four different...
In this paper we describe a semi-automated approach for ontology learning. Exploiting an ontology-based multimodal information extraction system, the ontology learning subsystem accumulates documents that are insufficiently analysed and through clustering proposes new concepts, relations and interpretation rules to be added to the ontology.
This paper investigates a new extension of the Probabilistic Latent Semantic Analysis (PLSA) model [6] for text classification where the training set is partially labeled. The proposed approach iteratively labels the unlabeled documents and estimates the probabilities of its labeling errors. These probabilities are then taken into account in the es...