About
54
Publications
10,957
Reads
324
Citations
Introduction
Additional affiliations
February 2020 - February 2023
September 2004 - February 2020
April 2001 - August 2004
Certisign Certificadora Digital S.A.
Position
- System Administrator
Education
September 2007 - August 2012
August 1999 - December 2000
August 1993 - July 1997
Publications (54)
Mapping the chemical space of compounds to chemical structures remains a challenge in metabolomics. Despite the advancements in untargeted liquid chromatography-mass spectrometry (LC–MS) to achieve a high-throughput profile of metabolites from complex biological resources, only a small fraction of these metabolites can be annotated with confidence....
RNA sequencing has become an increasingly affordable way to profile gene expression. Here we introduce a scientific workflow implementing several open-source software tools, executed by the Parsl parallel scripting language in a high-performance computing environment. We have applied the workflow to single-cardiomyocyte RNA-seq data retrieved from...
Mapping the chemical space of compounds to chemical structures remains a challenge in metabolomics. Despite the advancements in untargeted Liquid Chromatography Mass Spectrometry (LCMS) to achieve a high-throughput profile of metabolites from complex biological resources, only a small fraction of these metabolites can be annotated with confidence....
The increasing volumes of data produced by high-throughput instruments coupled with advanced computational infrastructures for scientific computing have enabled what is often called a Fourth Paradigm for scientific research based on the exploration of large datasets. Current scientific research is often interdisciplinary, making data integrat...
The field of distributional ecology has seen considerable recent attention, particularly surrounding the theory, protocols, and tools for Ecological Niche Modeling (ENM) or Species Distribution Modeling (SDM). Such analyses have grown steadily over the past two decades—including a maturation of relevant theory and key concepts—but methodological co...
We present an analysis of the I/O behavior of a version of the ParslRNA-Seq scientific workflow coupled to high-performance computing (HPC) environments. The article discusses which modifications to the workflow model improve computational performance and scalability by reducing the cost of I/O operations.
We present a version of the ParslRNA-Seq scientific workflow for analyzing Differential Gene Expression (DGE) experiments, coupled to high-performance computing environments, which improved total execution time by up to 70%. The performance of ParslRNA-Seq was validated through a comparative analysis of DGE data in cardiomyocytes...
Evolutionary processes and the dispersal of Dengue genomes in Brazil are relevant to understanding the impact of emerging arboviruses and to their endemic-epidemic and social surveillance. Phylogenetic trees and networks make it possible to display evolutionary and reticulate events in viruses, which arise from their high diversity, mutation rate, and frequent homologous recombination. We present...
The unprecedented size of the human population, along with its associated economic activities, has an ever‐increasing impact on global environments. Across the world, countries are concerned about the growing resource consumption and the capacity of ecosystems to provide resources. To effectively conserve biodiversity, it is essential to make indic...
This article presents a comparative performance analysis of the ParslRNA-Seq scientific workflow with respect to the multithreading parameters of Parsl and Bowtie, in order to ensure rational use and efficient allocation of computational resources on the Santos Dumont supercomputer.
Large-scale scientific experiments are considered complex because of the modeling of their activities, their execution, and the analysis of large volumes of data. In bioinformatics, these experiments are modeled as scientific workflows using concepts from high-performance computing and data science. In this article we present the workflow ParslRNA-...
A database can be defined as an organized collection of data. In scientific research, the use of databases is growing, as can be seen in projects across many fields, for example in biology. Databases are also increasingly present in researchers' routines, particularly in the omics. This chapter describes...
The Santos Dumont supercomputer is the fourth largest in Latin America according to the TOP500 list, with a total of 5.1 petaflops. It is hosted at the Laboratório Nacional de Computação Científica (Petrópolis, Brazil), a research unit of Brazil's Ministry of Science, Technology and Innovation. In this work we present the main characteristics...
Science gateways have gained increasing attention in recent years from diverse communities. Science gateways are software solutions that enable the integration of reusable data and specialized techniques via Web servers while hiding the complexity of the underlying high-performance computing resources. Several projects and initiatives have bee...
Biological collections have been historically regarded as fundamental sources of scientific information on biodiversity. They are commonly associated with a variety of biases, which must be characterized and mitigated before data can be consumed. In this work, we are motivated by taxonomic and collector biases, which can be understood as the effect...
The well-being of human and wildlife health involves many challenges, such as monitoring the movement of pathogens; expanding health surveillance; collecting data and extracting information to identify and predict risks; integrating specialists from different areas to handle data, species and distinct social and environmental contexts; and the comm...
Reproducibility is a fundamental requirement of the scientific process since it enables outcomes to be replicated and verified. Computational scientific experiments can benefit from improved reproducibility for many reasons, including validation of results and reuse by other scientists. However, designing reproducible experiments remains a challeng...
Advances in sequencing techniques have led to exponential growth in biological data, demanding the development of large-scale bioinformatics experiments. Because these experiments are computation- and data-intensive, they require high-performance computing techniques and can benefit from specialized technologies such as Scientific Workflow Managemen...
Bioinformatics experiments are rapidly and constantly evolving due to improvements in sequencing technologies. These experiments usually demand high-performance computation and produce huge quantities of data. They also require different programs to be executed in a certain order, allowing the experiments to be modeled as workflows. However, users do...
Music streaming platforms are increasingly popular, democratizing and facilitating access to music content. This effect extends the reach and penetration of different musical styles, increasing the diversity of genres listened to in different countries around the world. In order to better understand this diversity and identify countries with...
In this paper we describe two network models as a base for understanding the relevance of social processes involving collectors for shaping the composition of biological collections. Species-Collector Networks (SCNs) represent the interests of collectors towards particular species, while Collector CoWorking Networks (CWNs) represent collaborative t...
The well-being of wildlife health involves many challenges, such as monitoring the movement of pathogens; expanding health surveillance beyond humans; collecting data and extracting information to identify and predict risks; integrating specialists from different areas to handle data, species and distinct social and environmental contexts; and, th...
Advances in sequencing techniques have led to exponential growth in biological data, demanding the development of large-scale bioinformatics experiments. Because these experiments are computation- and data-intensive, they require high-performance computing (HPC) techniques and can benefit from specialized technologies such as Scientific Workflow Ma...
Spatial analysis tools and synthesis of results are key to identifying the best solutions in biodiversity conservation. The importance of process automation is associated with increased efficiency and performance both in the data pre-processing phase and in the post-analysis of the results generated by the packages and modeling programs. The Model-...
There are many steps in analyzing transcriptome data, from the acquisition of raw data to the selection of a subset of representative genes that explain a scientific hypothesis. The data produced can be represented as networks of interactions among genes and these may additionally be integrated with other biological databases, such as Protein-Prote...
Background: There are many steps in analyzing transcriptome data, from the acquisition of raw data to the selection of a subset of representative genes that explain a scientific hypothesis. The data produced may additionally be integrated with other biological databases, such as Protein-Protein Interactions and annotations. However, the results of t...
Scientific experiments usually demand high performance computing (HPC) and involve the execution of a flow of activities, so they can be modeled as scientific workflows. Scientific Workflow Management Systems (SWfMS) provide ways of defining and executing these experiments in HPC environments and they produce detailed information about the workflow...
Bioinformatics experiments are rapidly and constantly evolving due to improvements in sequencing technologies. These experiments usually demand high-performance computation and produce huge quantities of data. They also require different programs to be executed in a certain order, allowing the experiments to be modeled as workflows. However, users do...
A new open access database, Brazilian Marine Biodiversity (BaMBa) (https://marinebiodiversity.lncc.br), was developed in order to maintain large datasets from the Brazilian marine environment. Essentially, any environmental information can be added to BaMBa. Certified datasets obtained from integrated holistic studies, comprising physical–chemical...
Seamounts are considered important sources of biodiversity and minerals. However, their biodiversity and health status are not well understood; therefore, potential conservation problems are unknown. The mesophotic reefs of the Vitória-Trindade Seamount Chain (VTC) were investigated via benthic community and fish surveys, metagenomic and water chem...
Data generated by environmental research in Antarctica are essential in evaluating how its biodiversity and environment are affected by global-scale changes triggered by ever-increasing human activities. In this work, we describe BrAntIS, the Brazilian Information System on Antarctic Environmental Research, which enables the acquiring, storing, and...
In this article we describe the Brazilian Biodiversity Information System, which aims to provide an infrastructure for gathering, integrating, and analyzing data produced by various institutions in this area. Both its architecture and the workflow for harvesting and indexing data on species occurrences and checklists, one of the already implemented...
The automation of large scale computational scientific experiments can be accomplished with the use of scientific workflow management systems, which allow for the definition of their activities and data dependencies. The manual analysis of the data resulting from their execution is burdensome, due to the usually large amounts of information. Proven...
In this abstract, we describe provenance traces generated from executions of scientific workflows managed by the Swift parallel scripting system. They follow a provenance data model, used by MTCProv, the provenance management component of Swift. It is similar to PROV, representing most of its core concepts and including additional information about...
Biodiversity data is becoming increasingly available and its volume is growing rapidly. Integration of different related repositories is also advancing through a number of successful initiatives. This data can be given as input to sophisticated computer models for predicting potential distribution of species. As the amounts of manipulated data incr...
Scientific research is increasingly assisted by computer-based experiments. Such experiments are often composed of a vast number of loosely-coupled computational tasks that are specified and automated as scientific workflows. This large scale is also characteristic of the data that flows within such “many-task” computations (MTC). Provenance inform...
Large-scale scientific computations are often organized as a composition of many computational tasks linked through data flow. After the completion of a computational scientific experiment, a scientist has to analyze its outcome, for instance, by checking inputs and outputs of computational tasks that are part of the experiment. This analysis can b...
The Swift parallel scripting language allows for the specification, execution and analysis of large-scale computations in parallel and distributed environments. It incorporates a data model for recording and querying provenance information. In this article we describe these capabilities and evaluate the interoperability with other systems through t...
Scientists increasingly rely on workflow management systems to perform large-scale computational scientific experiments. These systems often collect provenance information that is useful in the analysis and reproduction of such experiments. On the other hand, this provenance data may be exposed to security threats which can result, for instance, in...
The Swift parallel scripting language allows for the specification, execution and analysis of large-scale computations in parallel and distributed environments. It incorporates a data model for recording and querying provenance information. In this article we describe these capabilities and evaluate interoperability with other systems through the u...
Secure provenance techniques are essential in generating trustworthy provenance records, where one is interested in protecting their integrity, confidentiality, and availability. In this work, we suggest an architecture to provide protection of authorship and temporal information in grid-enabled provenance systems. It can be used in the resolution...
Monitoring the execution of distributed tasks within the workflow execution is not easy and is frequently controlled manually. This work presents a lightweight middleware monitor to design and control the parallel execution of tasks from a distributed scientific workflow. This middleware can be connected into a workflow management system. This midd...
In this work, we provide some insights about the problem of managing credentials in grid environments. Since user mobility is a very common requirement in grid implementations, centralized credential servers are frequently used to store their cryptographic keys. We study some possible solutions for environments with stronger requirements regarding...
We present an efficient strategy for the application of the inference rules of a completion procedure for finitely presented groups. This procedure has been proposed by Cremanns and Otto and uses a combinatorial structure called word-cycle. Our strategy is complete in the sense that a set of persistent word-cycles can be used to solve the reduced w...
We present some slight improvements to a semi-decision algorithm for Presburger arithmetic originally developed by Shostak that increase the class of formulas effectively decidable. Furthermore, we show how decision algorithms for Presburger arithmetic may be combined with conditional rewrite techniques for automated deduction in algebraic specific...
Projects (7)
The Rede Avançada em Biologia Computacional (RABICÓ), the Advanced Network in Computational Biology, will be headquartered at LNCC. It aims to offer a set of modern, up-to-date software for developing applications that require high computational power and advanced scientific visualization resources; to enable and stimulate integrated research with multidisciplinary teams on problems related to human, animal, plant, and environmental health; and, above all, to train qualified personnel at the undergraduate and graduate levels.