
Mariana Neves
Bundesinstitut für Risikobewertung | BfR
PhD Computer Science
Publications: 58
Reads: 9,877
Citations: 2,425
Additional affiliations
October 2013 - present
April 2013 - October 2013
Publications (58)
Background: European Union legislature requires replacement of animal experiments with alternative methods, whenever such methods are suitable to reach the intended scientific objective. However, searching for alternative methods in the scientific literature is a time-consuming task that requires careful screening of an enormously large number of e...
Finding publications that propose alternative methods to animal experiments is an important but time-consuming task since researchers need to perform various queries to literature databases and screen many articles to assess two important aspects: the relevance of the article to the research question, and whether the article's proposed approach qua...
Background: The engineering of elaborate and innovative tools to navigate the ever-growing biomedical knowledge base, instanced in PubMed/Medline, must be guided by genuine case studies addressing 'real world' user needs. Furthermore, algorithm-based predictions regarding 'similarity', 'relatedness' or 'relevance' of pieces of information (e.g. rele...
Motivation: Annotation tools are applied to build training and test corpora, which are essential for the development and evaluation of new natural language processing algorithms. Further, annotation tools are also used to extract new information for a particular use case. However, owing to the high number of existing annotation tools, finding the...
In this paper, we provide an overview of the seventh annual edition of the CLEF eHealth evaluation lab. CLEF eHealth 2019 continues our evaluation resource building efforts around the easing and support of patients, their next of kin, clinical staff, and health scientists in understanding, accessing, and authoring electronic health information in...
Since 2012, CLEF eHealth has focused on evaluation resource building efforts around the easing and support of patients, their next of kin, clinical staff, and health scientists in understanding, accessing, and authoring eHealth information in a multilingual setting. This year’s lab offers three tasks: Task 1 on multilingual information extraction;...
The generation of natural language from Resource Description Framework (RDF) data has recently gained significant attention due to the continuous growth of Linked Data. A number of these approaches generate natural language in languages other than English; however, no work has been proposed to generate Brazilian Portuguese texts out of RDF. We addr...
Motivation: Text mining has become an important tool for biomedical research. The most fundamental text-mining task is the recognition of biomedical named entities (NER), such as genes, chemicals and diseases. Current NER methods rely on pre-defined features which try to capture the specific surface properties of entity types, properties of the ty...
This paper describes our submission to the 2017 BioASQ challenge. We participated in Task B, Phase B, which is concerned with biomedical question answering (QA). We focus on factoid and list questions, using an extractive QA model, that is, we restrict our system to output substrings of the provided text snippets. At the core of our system, we use Fa...
Factoid question answering (QA) has recently benefited from the development of deep learning (DL) systems. Neural network models outperform traditional approaches in domains where large datasets exist, such as SQuAD (ca. 100,000 questions) for Wikipedia articles. However, these systems have not yet been applied to QA in more specific domains, such...
Researchers usually query the large biomedical literature in PubMed via keywords, logical operators and filters, none of which is very intuitive. Question answering systems are an alternative to keyword searches. They allow questions in natural language as input and results reflect the given type of question, such as short answers and summaries. Fe...
Question answering (QA) systems are crucial when searching for exact answers to natural language questions in the biomedical domain. Answers to many such questions can be extracted from the 26 million biomedical publications currently included in MEDLINE when relying on appropriate natural language processing (NLP) tools. In this work we descr...
BioC is a new simple XML format for sharing biomedical text and annotations and libraries to read and write that format. This promotes the development of interoperable tools for natural language processing (NLP) of biomedical text. The interoperability track at the BioCreative IV workshop featured contributions using or highlighting the BioC format...
Collections of documents annotated with semantic entities and relationships are crucial resources to support development and evaluation of text mining solutions for the biomedical domain. Here I present an overview of 36 corpora and show an analysis of the semantic annotations they contain. Annotations for entity types were classified into six seman...
Background / Purpose: We have evaluated the use of a question answering system for literature curation.
Main conclusion: Evaluation carried out for curation of gene/protein in cells and anatomical parts shows that our question answering system could retrieve relevant text passages when provided with a question in natural language.
CellFinder (http://www.cellfinder.org) is a comprehensive one-stop resource for molecular data characterizing mammalian cells in different tissues and in different development stages. It is built from carefully selected data sets stemming from other curated databases and the biomedical literature. To date, CellFinder describes 3394 cell types and 5...
Biomedical literature curation is the process of automatically and/or manually deriving knowledge from scientific publications and recording it into specialized databases for structured delivery to users. It is a slow, error-prone, complex, costly and, yet, highly important task. Previous experiences have proven that text mining can assist in its m...
New approaches to biomedical text mining crucially depend on the existence of comprehensive annotated corpora. Such corpora, commonly called gold standards, are important for learning patterns or models during the training phase, for evaluating and comparing the performance of algorithms and also for better understanding the information sought for...
The regeneration of vital organs and tissues remains one of the biggest medical challenges. However, the use of embryonic stem cells and induced pluripotent stem cells allows novel replacement strategies. Our project aims to create a stem cell data repository by linking information from existing public databases and by performing text mining on the...
Bio-molecular event extraction from literature is recognized as an important task of bio text mining and, as such, many relevant systems have been developed and made available during the last decade. While such systems provide useful services individually, there is a need for a meta-service to enable comparison and ensemble of such services, offeri...
Background: Competitions in text mining have been used to measure the performance of automatic text processing solutions against a manually annotated gold standard corpus (GSC). The preparation of the GSC is time-consuming and costly, and the final corpus consists at most of a few thousand documents annotated with a limited set of semantic grou...
We describe our approach for the extraction of drug-drug interactions from literature. The proposed method builds majority voting ensembles of contrasting machine learning methods, which exploit different linguistic feature spaces. We evaluated our approach in the context of the DDI Extraction 2011 challenge, where using document-wise cross-valid...
Gene/protein recognition and normalization are important prerequisite steps for many biological text mining tasks. Even if great efforts have been dedicated to these problems and effective solutions have been reported, the availability of easily integrated tools to perform these tasks is still deficient. We therefore propose Moara, a Java library t...
Gene/protein recognition and normalization are important preliminary steps for many biological text mining tasks, such as information retrieval, protein-protein interactions, and extraction of semantic information, among others. Despite dedication to these problems and effective solutions being reported, easily integrated tools to perform these tas...
The BioNLP'09 Shared Task on Event Extraction presented an evaluation on the extraction of biological events related to genes/proteins from the literature. We propose a system that uses the case-based reasoning (CBR) machine learning approach for the extraction of the entities (events, sites and location). The mapping of the proteins in the texts t...
Nineteen teams presented results for the Gene Mention Task at the BioCreative II Workshop. In this task participants designed systems to identify substrings in sentences corresponding to gene name mentions. A variety of different methods were used and the results varied with a highest achieved F1 score of 0.8721. Here we present brief descriptions...
This work proposes a case-based classifier to tackle the gene/protein mention problem in biomedical literature. The so-called gene mention problem consists of the recognition of gene and protein entities in scientific texts. A classification process that decides whether a term is a gene mention or not is carried out for each word in the text. It i...
This work proposes a new solution to the gene normalization problem by using simple gene/protein dictionaries, global alignment with predefined costs and document similarity for the disambiguation step. A mix of taggers was used to extract the mentions, which are further processed in the search for candidate gene/protein identifiers and a disambig...
Asterias (http://www.asterias.info) is an open-source, web-based suite for the analysis of gene expression and aCGH data. Asterias implements validated statistical methods, and most of the applications use parallel computing, which permits taking advantage of multicore CPUs and computing clusters. Access to, and further analysis of, additional bio...
The work presented here proposes a case-based classification for the gene mention task in the BioCreAtIvE 2 challenge. The classification performed by the system for each word in an article is based on the selection of the best or most similar case in a base of known and unknown cases. The procedure showed good results, precision of 71.68 and recal...
This work describes the CitationFinder, a knowledge-based system for the automatic classification of Web pages which contain citations of publications. The system relies on a knowledge base of production rules with associated certainty factors. It was constructed on the basis of a corpus of 1,000 sample pages and presented a very satisfactory perfo...
We present here the methods we used during our participation in the first round of the CALBC challenge, which consists of the annotation of a test corpus composed of 100,000 abstracts on immunology. We participated in Task A, using the Moara system for the extraction of gene/protein boundaries. We also participated in Task B using natural...
This regularity allows the automatic identification of such pages by computational systems based on domain knowledge. The work presented here describes the CitationFinder, a knowledge-based system for the automatic classification of Web pages which contain citations of technical and scientific publicati...
Clinical texts constitute a very important source of information for medical studies. This huge collection of data is amenable to the application of the many biomedical text mining tools that help automate its analysis. Tasks such as classification according to predefined categories, clustering and extraction of specific entitie...
Motivation: Gene/protein recognition and normalization are important preceding steps for many biological text mining tasks, such as protein-protein interaction. Even if great efforts have been dedicated to these problems and effective solutions have been reported, the availability of easily integrated tools to perform these tasks is deficient. We...