Andre Lamurias

Aalborg University · Department of Computer Science

About

54 Publications
16,165 Reads
548 Citations
Introduction
Andre Lamurias currently works at the Department of Computer Science, Aalborg University.
Additional affiliations
September 2013 - present
University of Lisbon
Position
  • PhD Student


Publications (54)
Preprint
Microbes have a profound impact on our health and environment, but our understanding of the diversity and function of microbial communities is severely limited. Through DNA sequencing of microbial communities (metagenomics), DNA fragments (reads) of the individual microbes can be obtained, which through assembly graphs can be combined into long con...
Preprint
Despite recent advancements in sequencing technologies and assembly methods, obtaining high-quality microbial genomes from metagenomic samples is still not a trivial task. Current metagenomic binners do not take full advantage of assembly graphs and are not optimized for long-read assemblies. Deep graph learning algorithms have been proposed in oth...
Article
Full-text available
Objective: In 2016, the International Agency for Research on Cancer, part of the World Health Organization, released the Exposome-Explorer, the first database dedicated to biomarkers of exposure for environmental risk factors for diseases. The database contents resulted from a manual literature search that yielded over 8,500 citations, but only a s...
Preprint
Full-text available
Wikipedia is an online encyclopedia available in 285 languages. It composes an extremely relevant Knowledge Base (KB), which could be leveraged by automatic systems for several purposes. However, the structure and organisation of such information are not prone to automatic parsing and understanding and it is, therefore, necessary to structure this...
Preprint
Full-text available
In 2016, the International Agency for Research on Cancer, part of the World Health Organization, released the Exposome-Explorer, the first database dedicated to biomarkers of exposure for environmental risk factors for diseases. The database contents resulted from a manual literature search that yielded over 8500 citations, but only a small fractio...
Article
Full-text available
Biomedical relation extraction (RE) datasets are vital in the construction of knowledge bases and to potentiate the discovery of new interactions. There are several ways to create biomedical RE datasets, some more reliable than others, such as resorting to domain expert annotations. However, the emerging use of crowdsourcing platforms, such as Amaz...
Conference Paper
Full-text available
This manuscript describes the participation of the Lasige-BioTM team in the NER and EE tasks of the ChEMU evaluation lab. We have fine-tuned the BioBERT NER model to locate and tag named entities and the BioBERT RE model to detect relations between trigger words and named entities. For the NER task, we obtained an F1-score of 0.9392 (exact matching)...
Article
Full-text available
Background: Named Entity Linking systems are a powerful aid to the manual curation of digital libraries, which is getting increasingly costly and inefficient due to the information overload. Models based on the Personalized PageRank (PPR) algorithm are one of the state-of-the-art approaches, but these have low performance when the disambiguation gr...
Article
Full-text available
Question Answering (QA) is a natural language processing task that aims at obtaining relevant answers to user questions. While some progress has been made in this area, biomedical questions are still a challenge to most QA approaches, due to the complexity of the domain and limited availability of training sets. We present a method to automatically...
Chapter
Using different sources of information to support automated extracting of relations between biomedical concepts contributes to the development of our understanding of biological systems. The primary comprehensive source of these relations is biomedical literature. Several relation extraction approaches have been proposed to identify relations betwe...
Article
Full-text available
Accessible negative results are relevant for researchers and clinicians not only to limit their search space but also to prevent the costly re-exploration of research hypotheses. However, most biomedical relation extraction datasets do not seek to distinguish between a false and a negative relation among two biomedical entities. Furthermore, datase...
Conference Paper
Full-text available
Deep learning models achieve state-of-the-art results in Natural Language Processing (NLP) tasks, such as Question Answering (QA), across different domains, mostly thanks to pre-trained language models such as BERT [1]. However, there is a lack of models designed for NLP tasks in the multilingual panorama, especially in specific domains such as the...
Conference Paper
Full-text available
We propose a new multilingual, parallel corpus for Named Entity Linking benchmarking which comprises English, Portuguese and Spanish clinical case reports. The medical diagnostic entities in the reports were annotated with the respective code of the International Classification of Diseases 10-Clinical Modification (ICD10-CM) terminology and its P...
Preprint
Full-text available
Question Answering (QA) is a natural language processing task that aims at retrieving relevant answers to user questions. While much progress has been made in this area, biomedical questions are still a challenge to most QA approaches, due to the complexity of the domain and limited availability of training sets. We present a method to automaticall...
Article
Full-text available
Background: Biomedical literature concerns a wide range of concepts, requiring controlled vocabularies to maintain a consistent terminology across different research groups. However, as new concepts are introduced, biomedical literature is prone to ambiguity, specifically in fields that are advancing more rapidly, for example, drug design and deve...
Preprint
Full-text available
Using different sources of information to support automated extracting of relations between biomedical concepts contributes to the development of our understanding of biological systems. The primary comprehensive source of these relations is biomedical literature. Several relation extraction approaches have been proposed to identify relations betwe...
Preprint
Full-text available
Human phenotype-gene relations are fundamental to fully understand the origin of some phenotypic abnormalities and their associated diseases. Biomedical literature is the most comprehensive source of these relations, however, we need Relation Extraction tools to automatically recognize them. Most of these tools require an annotated corpus and to th...
Article
Full-text available
Background: Recent studies have proposed deep learning techniques, namely recurrent neural networks, to improve biomedical text mining tasks. However, these techniques rarely take advantage of existing domain-specific resources, such as ontologies. In Life and Health Sciences there is a vast and valuable set of such resources publicly available, whi...
Chapter
Full-text available
In bioinformatics, semantic similarity has been used to compare different types of biomedical entities, such as proteins, compounds and phenotypes, based on their biological role instead of what they look like. This manuscript presents a definition of semantic similarity between biomedical entities described by a common semantic base (e.g. ontology...
Chapter
Full-text available
Biomedical literature has become a rich source of information for various applications. Automatic text mining methods can make the process of extracting information from a large set of documents more efficient. However, since natural language is not easily processed by computer programs, it is necessary to develop algorithms to transform text in...
Article
Full-text available
Named-entity recognition aims at identifying the fragments of text that mention entities of interest, that afterwards could be linked to a knowledge base where those entities are described. This manuscript presents our minimal named-entity recognition and linking tool (MER), designed with flexibility, autonomy and efficiency in mind. To an...
Preprint
Full-text available
Named-Entity Recognition aims at identifying the fragments of text that mention entities of interest, that afterwards could be linked to a knowledge base where those entities are described. This manuscript presents our Minimal Named-Entity Recognition and Linking tool (MER), designed with flexibility, autonomy and efficiency in mind. To annotate a...
Preprint
Full-text available
Recent studies have proposed deep learning techniques, namely recurrent neural networks, to improve biomedical text mining tasks. However, these techniques rarely take advantage of existing domain-specific resources, such as ontologies. In Life and Health Sciences there is a vast and valuable set of such resources publicly available, which are cont...
Article
Full-text available
Tolerogenic cell therapies provide an alternative to conventional immunosuppressive treatments of autoimmune disease and address, among other goals, the rejection of organ or stem cell transplants. Since various methodologies can be followed to develop tolerogenic therapies, it is important to be aware and up to date on all available studies that m...
Article
Full-text available
Named-Entity Recognition is commonly used to identify biological entities such as proteins, genes, and chemical compounds found in scientific articles. The Human Phenotype Ontology (HPO) is an ontology that provides a standardized vocabulary for phenotypic abnormalities found in human diseases. This article presents the Identifying Human Phenotypes...
Conference Paper
Full-text available
Natural Language Processing (NLP) and text mining techniques require annotated datasets to develop and evaluate new approaches. These datasets are commonly developed by domain experts, who manually annotate a corpus of documents with the relevant information. This process is expensive and time-consuming, and for this reason, it has been suggested t...
Conference Paper
Full-text available
This paper presents our approach to participate in the SemEval 2017 Task 12: Clinical TempEval challenge, specifically in the event and time expressions span and attribute identification subtasks (ES, EA, TS, TA). Our approach consisted in training Conditional Random Fields (CRF) classifiers using the provided annotations, and in creating manually...
Conference Paper
Full-text available
Named-Entity Recognition (NER) aims at identifying the fragments of a given text that mention a given entity of interest. This manuscript presents our Minimal named-Entity Recognizer (MER), designed with flexibility, autonomy and efficiency in mind. To annotate a given text, MER only requires a lexicon (text file) with the list of terms representin...
Conference Paper
Full-text available
This article presents our approach to the CEMP task of BioCreative V.5, which consisted in using our system, IBEnt, to identify chemical entity mentions in patents through machine learning and semantic similarity techniques. The features used combine the results of a CRF classifier, two lexical matching methods (FiGO and MER) and semantic similarit...
Article
Full-text available
Many biomedical relation extraction approaches are based on supervised machine learning, requiring an annotated corpus. Distant supervision aims at training a classifier by combining a knowledge base with a corpus, reducing the amount of manual effort necessary. This is particularly useful for biomedicine because many databases and ontologies have...
Data
IBRel-miRNA corpus. Corpus generated using the MeSH term “miRNA” and annotated automatically with miRNA and gene entities. (TSV)
Data
TransmiR corpus. Corpus generated using the abstracts referenced in the entries of the TransmiR database. (TSV)
Data
IBRel-CF corpus. Corpus generated using the keywords “cystic fibrosis” and “miRNA” and annotated automatically with miRNA and gene entities. (TSV)
Data
Results with variable window sizes. Results using SL kernel and IBRel with the window size parameter 1 and 5. (ODS)
Conference Paper
Literature is one of the major sources of current biomedical information, in form of papers, patents or other types of written reports. To understand a biological system, it is necessary to integrate the results of various studies, which is highly time-consuming due to the increasing amount of published studies. One possible approach to solving thi...
Conference Paper
Full-text available
The amount of clinical notes is increasing and there is a need to implement methods to automatically extract useful information from them, like temporal expressions or clinical events. For example, in the SemEval 2016 (Semantic Evaluation) competition there is a Clinical TempEval task that provides clinical notes and pathology reports for cancer...
Conference Paper
Full-text available
Automatic methods are being developed and applied to transform textual biomedical information into machine-readable formats. Machine learning techniques have been a prominent approach to this problem. However, there is still a lack of systems that are easily accessible to users. For this reason, we developed a web tool to facilitate the access to o...
Article
Full-text available
The automatic extraction of chemical information from text requires the recognition of chemical entity mentions as one of its key steps. When developing supervised named entity recognition (NER) systems, the availability of a large, manually annotated text corpus is desirable. Furthermore, large corpora permit the robust evaluation and comparison o...
Article
Full-text available
Our approach to the BioCreative IV challenge of recognition and classification of drug names (CHEMDNER task) aimed at achieving high levels of precision by applying semantic similarity validation techniques to Chemical Entities of Biological Interest (ChEBI) mappings. Our assumption is that the chemical entities mentioned in the same fragment of te...
Conference Paper
Full-text available
This paper presents the system we developed for the CHEMDNER task of BioCreative V. This system was adapted from the IICE framework, which combines Conditional Random Fields, implemented by Stanford NER, Brown clustering, implemented by Percy Liang's C-based algorithm and a semantic similarity based on the h-index concept, applied to the ChEBI ont...
Article
Full-text available
Interactions between chemical compounds described in biomedical text can be of great importance to drug discovery and design, as well as pharmacovigilance. We developed a novel system, “Identifying Interactions between Chemical Entities” (IICE), to identify chemical interactions described in text. Kernel-based Support Vector Machines first identify...
Article
Full-text available
Interactions between chemical compounds described in biomedical text can be of great importance to drug discovery and design, as well as pharmacovigilance. We developed a novel system, “Identifying Interactions between Chemical Entities” (IICE), to identify chemical interactions described in text. Kernel-based Support Vector Machines first identi...
Article
Full-text available
As the number of published scientific papers grows every day, there is also an increasing necessity for automated named entity recognition (NER) systems capable of identifying relevant entities mentioned in a given text, such as chemical entities. Since high precision values are crucial to deliver useful results, we developed an NER method, Identifyi...



Projects (2)
Project
This project intends to study deep learning together with distant supervision techniques to improve Named Entity Disambiguation (NED) solutions, more specifically by automatically identifying deep relations that may enrich current coherence graph model approaches. Additionally, the project intends to explore the full structure of KBs together with the deep relations by using semantic measures for propagating similarity in the coherence graph models. These novel approaches will be assessed on different gold-standard corpora using different KBs, including popular and highly complex biomedical ontologies and DBpedia.
Archived project
Build a competitive system for SemEval 2017 Task 12: Clinical TempEval.