Mariana Neves

Bundesinstitut für Risikobewertung | BfR

PhD Computer Science

About

58
Publications
9,877
Reads
2,425
Citations
Citations since 2017
26 Research Items
1866 Citations
[Chart: citations since 2017; x-axis 2017–2023, y-axis 0–300]
Additional affiliations
October 2013 - present
Hasso Plattner Institute
Position
  • PostDoc Position
April 2013 - October 2013
Charité Universitätsmedizin Berlin
Position
  • PostDoc Position

Publications

Publications (58)
Preprint
Full-text available
Background: European Union legislature requires replacement of animal experiments with alternative methods, whenever such methods are suitable to reach the intended scientific objective. However, searching for alternative methods in the scientific literature is a time-consuming task that requires careful screening of an enormously large number of e...
Article
Full-text available
Finding publications that propose alternative methods to animal experiments is an important but time-consuming task since researchers need to perform various queries to literature databases and screen many articles to assess two important aspects: the relevance of the article to the research question, and whether the article's proposed approach qua...
Preprint
Full-text available
Background: The engineering of elaborate and innovative tools to navigate the ever-growing biomedical knowledge base, instanced in PubMed/Medline, must be guided by genuine case studies addressing 'real world' user needs. Furthermore, algorithm-based predictions regarding 'similarity', 'relatedness' or 'relevance' of pieces of information (e.g. rele...
Article
Full-text available
Motivation: Annotation tools are applied to build training and test corpora, which are essential for the development and evaluation of new natural language processing algorithms. Further, annotation tools are also used to extract new information for a particular use case. However, owing to the high number of existing annotation tools, finding the...
Chapter
Full-text available
In this paper, we provide an overview of the seventh annual edition of the CLEF eHealth evaluation lab. CLEF eHealth 2019 continues our evaluation resource building efforts around the easing and support of patients, their next-of-kins, clinical staff, and health scientists in understanding, accessing, and authoring electronic health information in...
Chapter
Full-text available
Since 2012 CLEF eHealth has focused on evaluation resource building efforts around the easing and support of patients, their next-of-kins, clinical staff, and health scientists in understanding, accessing, and authoring eHealth information in a multilingual setting. This year’s lab offers three tasks: Task 1 on multilingual information extraction;...
Article
Full-text available
The generation of natural language from Resource Description Framework (RDF) data has recently gained significant attention due to the continuous growth of Linked Data. A number of these approaches generate natural language in languages other than English; however, no work has been proposed to generate Brazilian Portuguese texts out of RDF. We addr...
Article
Full-text available
Motivation: Text mining has become an important tool for biomedical research. The most fundamental text-mining task is the recognition of biomedical named entities (NER), such as genes, chemicals and diseases. Current NER methods rely on pre-defined features which try to capture the specific surface properties of entity types, properties of the ty...
Article
This paper describes our submission to the 2017 BioASQ challenge. We participated in Task B, Phase B, which is concerned with biomedical question answering (QA). We focus on factoid and list questions, using an extractive QA model, that is, we restrict our system to output substrings of the provided text snippets. At the core of our system, we use Fa...
Article
Factoid question answering (QA) has recently benefited from the development of deep learning (DL) systems. Neural network models outperform traditional approaches in domains where large datasets exist, such as SQuAD (ca. 100,000 questions) for Wikipedia articles. However, these systems have not yet been applied to QA in more specific domains, such...
Article
Full-text available
Researchers usually query the large biomedical literature in PubMed via keywords, logical operators and filters, none of which is very intuitive. Question answering systems are an alternative to keyword searches. They allow questions in natural language as input and results reflect the given type of question, such as short answers and summaries. Fe...
Conference Paper
Full-text available
Question answering (QA) systems are crucial when searching for exact answers for natural language questions in the biomedical domain. Answers to many such questions can be extracted from the 26 million biomedical publications currently included in MEDLINE when relying on appropriate natural language processing (NLP) tools. In this work we descr...
Article
Full-text available
BioC is a new simple XML format for sharing biomedical text and annotations and libraries to read and write that format. This promotes the development of interoperable tools for natural language processing (NLP) of biomedical text. The interoperability track at the BioCreative IV workshop featured contributions using or highlighting the BioC format...
Article
Full-text available
Collections of documents annotated with semantic entities and relationships are crucial resources to support development and evaluation of text mining solutions for the biomedical domain. Here I present an overview of 36 corpora and show an analysis of the semantic annotations they contain. Annotations for entity types were classified into six seman...
Conference Paper
Full-text available
Background / Purpose: We have evaluated the use of a question answering system for literature curation. Main conclusion: Evaluation carried out for curation of gene/protein in cells and anatomical parts shows that our question answering system could retrieve relevant text passages when provided with a question in natural language.
Article
Full-text available
CellFinder (http://www.cellfinder.org) is a comprehensive one-stop resource for molecular data characterizing mammalian cells in different tissues and in different development stages. It is built from carefully selected data sets stemming from other curated databases and the biomedical literature. To date, CellFinder describes 3394 cell types and 5...
Article
Full-text available
Biomedical literature curation is the process of automatically and/or manually deriving knowledge from scientific publications and recording it into specialized databases for structured delivery to users. It is a slow, error-prone, complex, costly and, yet, highly important task. Previous experiences have proven that text mining can assist in its m...
Article
Full-text available
New approaches to biomedical text mining crucially depend on the existence of comprehensive annotated corpora. Such corpora, commonly called gold standards, are important for learning patterns or models during the training phase, for evaluating and comparing the performance of algorithms and also for better understanding the information sought for...
Conference Paper
Full-text available
The regeneration of vital organs and tissues remains one of the biggest medical challenges. However, the use of embryonic stem cells and induced pluripotent stem cells allows novel replacement strategies. Our project aims to create a stem cell data repository by linking information from existing public databases and by performing text mining on the...
Article
Full-text available
Bio-molecular event extraction from literature is recognized as an important task of bio text mining and, as such, many relevant systems have been developed and made available during the last decade. While such systems provide useful services individually, there is a need for a meta-service to enable comparison and ensemble of such services, offeri...
Article
Full-text available
Background: Competitions in text mining have been used to measure the performance of automatic text processing solutions against a manually annotated gold standard corpus (GSC). The preparation of the GSC is time-consuming and costly and the final corpus consists at the most of a few thousand documents annotated with a limited set of semantic grou...
Article
Full-text available
We describe our approach for the extraction of drug-drug interactions from literature. The proposed method builds majority voting ensembles of contrasting machine learning methods, which exploit different linguistic feature spaces. We evaluated our approach in the context of the DDI Extraction 2011 challenge, where using document-wise cross-valid...
Chapter
Gene/protein recognition and normalization are important prerequisite steps for many biological text mining tasks. Even if great efforts have been dedicated to these problems and effective solutions have been reported, the availability of easily integrated tools to perform these tasks is still deficient. We therefore propose Moara, a Java library t...
Article
Full-text available
Gene/protein recognition and normalization are important preliminary steps for many biological text mining tasks, such as information retrieval, protein-protein interactions, and extraction of semantic information, among others. Despite dedication to these problems and effective solutions being reported, easily integrated tools to perform these tas...
Article
Full-text available
The BioNLP'09 Shared Task on Event Extraction presented an evaluation on the extraction of biological events related to genes/proteins from the literature. We propose a system that uses the case-based reasoning (CBR) machine learning approach for the extraction of the entities (events, sites and location). The mapping of the proteins in the texts t...
Article
Full-text available
Nineteen teams presented results for the Gene Mention Task at the BioCreative II Workshop. In this task participants designed systems to identify substrings in sentences corresponding to gene name mentions. A variety of different methods were used and the results varied with a highest achieved F1 score of 0.8721. Here we present brief descriptions...
Article
Full-text available
This work proposes a case-based classifier to tackle the gene/protein mention problem in biomedical literature. The so called gene mention problem consists of the recognition of gene and protein entities in scientific texts. A classification process aiming at deciding if a term is a gene mention or not is carried out for each word in the text. It i...
Article
Full-text available
This work proposes a new solution to the gene normalization problem by using simple gene/protein dictionaries, global alignment with predefined costs and document similarity for the disambiguation step. A mix of taggers was used to extract the mentions that are further processed in the search of the candidate gene/protein identifiers and a disambig...
Conference Paper
This work proposes a case-based classifier to tackle the gene/protein mention problem in biomedical literature. The so called gene mention problem consists of the recognition of gene and protein entities in scientific texts. A classification process aiming at deciding if a term is a gene mention or not is carried out for each word in the text. It i...
Article
Full-text available
Asterias (http://www.asterias.info) is an open-source, web-based suite for the analysis of gene expression and aCGH data. Asterias implements validated statistical methods, and most of the applications use parallel computing, which permits taking advantage of multicore CPUs and computing clusters. Access to, and further analysis of, additional bio...
Article
Full-text available
The work presented here proposes a case-based classification for the gene mention task in the BioCreAtIvE 2 challenge. The classification performed by the system for each word in an article is based on the selection of the best or most similar case in a base of known and unknown cases. The procedure showed good results, precision of 71.68 and recal...
Conference Paper
Full-text available
This work describes the CitationFinder, a knowledge-based system for the automatic classification of Web pages which contain citations of publications. The system relies on a knowledge base of production rules with associated certainty factors. It was constructed on the basis of a corpus of 1,000 sample pages and presented a very satisfactory perfo...
Article
Full-text available
We present here the methods we have used during our participation in the first round of the CALBC challenge, which consists of the annotation of a testing corpus composed of 100,000 abstracts on immunology. We participated in Task A, using the Moara system for the extraction of gene/protein boundaries. We also participated in Task B using natural...
Article
http://www.cin.ufpe.br/{~mln, ~fab} Abstract. This regularity allows the automatic identification of such pages by computational systems based on domain knowledge. The work presented here describes the CitationFinder, a knowledge-based system for the automatic classification of Web pages which contain citations of technical and scientific publicati...
Article
Full-text available
Clinical texts constitute a very important source of information for medical studies. This huge collection of data is amenable to the application of the many biomedical text mining tools that help automate its analysis. Tasks such as classification according to predefined categories, clustering and extraction of specific entitie...
Article
Motivation: Gene/protein recognition and normalization are important preceding steps for many biological text mining tasks, such as protein-protein interaction. Even if great efforts have been dedicated to these problems and effective solutions have been reported, the availability of easily integrated tools to perform these tasks is deficient. We...
