About
71 Publications
15,522 Reads
461 Citations
Introduction
Current institution
Additional affiliations
February 2020 - February 2023
April 2001 - August 2004
Certisign Certificadora Digital S.A.
Position: System Administrator
September 2004 - February 2020
Education
September 2007 - August 2012
August 1999 - December 2000
August 1993 - July 1997
Publications (71)
Artificial intelligence (AI) is revolutionizing biodiversity research by enabling advanced data analysis, species identification, and habitat monitoring, thereby enhancing conservation efforts. Ensuring reproducibility in AI-driven biodiversity research is crucial for fostering transparency, verifying results, and promoting the credibility of ecol...
In the era of rapidly expanding human genomics in research and healthcare, efficient data reuse is essential to maximise benefits for society. In response, Federated EGA was launched in 2022, and as of 2024, the FEGA Network is composed of seven national nodes. Here we describe the complexities, challenges, and achievements of FEGA, unravelling the...
The Workflows Community Summit gathered 111 participants from 18 countries to discuss emerging trends and challenges in scientific workflows, focusing on six key areas: time-sensitive workflows, AI-HPC convergence, multi-facility workflows, heterogeneous HPC environments, user experience, and FAIR computational workflows. The integration of AI and...
Recent trends within computational and data sciences show an increasing recognition and adoption of computational workflows as tools for productivity, reproducibility, and democratized access to platforms and processing know-how. As digital objects to be shared, discovered, and reused, computational workflows benefit from the FAIR principles, which...
The German Human Genome-Phenome Archive (GHGA) is a cross-institutional project and German National Research Data Infrastructure (NFDI) consortium for the development of a scientific gateway for secure omics data sharing based on FAIR principles to act as the German node of the federated European Genome Archive (fEGA), participating also in the Eur...
Artificial Intelligence (AI) is revolutionizing biodiversity research by enabling advanced data analysis, species identification, and habitat monitoring, thereby enhancing conservation efforts. Ensuring reproducibility in AI-driven biodiversity research is crucial for fostering transparency, verifying results, and promoting the credibility of ecol...
In the battle of the host against lentiviral pathogenesis, the immune response is crucial. However, several questions remain unanswered about the interaction with different viruses and their influence on disease progression. The simian immunodeficiency virus (SIV) infecting nonhuman primates (NHP) is widely used as a model for the study of the huma...
Scientific workflows facilitate the automation of data analysis tasks by integrating various software and tools executed in a particular order. To enable transparency and reusability in workflows, it is essential to implement the FAIR principles. Here, we describe our experiences implementing the FAIR principles for metabolomics workflows using the...
In the battle of the host against lentiviral pathogenesis, the immune response is crucial. However, several questions remain unanswered about the interaction with different viruses and their influence on disease progression. The simian immunodeficiency virus (SIV) infecting nonhuman primates (NHP) is widely used as a model for the study of the huma...
Scientific workflows facilitate the automation of different data analysis tasks by integrating various software and tools executed in a particular order. To enable transparency, accessibility, and reusability in workflows, it is essential to implement the 17 FAIR principles as much as possible. To do so, the research data management community has s...
With increasing numbers of human omics data, there is an urgent need for adequate resources for data sharing while also standardizing and harmonizing data processing. As part of the National Research Data Infrastructure (NFDI), the German Human Genome-Phenome Archive (GHGA) strives to connect the data from German researchers and their institutions...
This article discusses the choice of modifications to the execution scheme of the ParslRNA-Seq workflow that improve computational performance and scalability, based on reducing the cost of I/O operations by using SSDs instead of the Lustre parallel file system on the Santos Dumont supercomputer.
Scientific workflows have become integral tools in broad scientific computing use cases. Science discovery is increasingly dependent on workflows to orchestrate large and complex scientific experiments that range from execution of a cloud-based data preprocessing pipeline to multi-facility instrument-to-edge-to-HPC computational workflows. Given th...
Mapping the chemical space of compounds to chemical structures remains a challenge in metabolomics. Despite the advancements in untargeted liquid chromatography-mass spectrometry (LC–MS) to achieve a high-throughput profile of metabolites from complex biological resources, only a small fraction of these metabolites can be annotated with confidence....
Transcriptomics experiments are often expressed as scientific workflows and benefit from high-performance computing environments. In these environments, workflow management systems can allow handling independent or communicating tasks across nodes, which may be heterogeneous. Specifically, transcriptomics workflows may treat large volumes of data....
RNA sequencing has become an increasingly affordable way to profile gene expression analyses. Here we introduce a scientific workflow implementing several open-source software tools executed by the Parsl parallel scripting language in a high-performance computing environment. We have applied the workflow to single-cardiomyocyte RNA-seq data retrieved from...
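For readers unfamiliar with Parsl, the sketch below illustrates the general style of pipeline described above: external tools chained as asynchronous apps with file dependencies. It is a minimal illustration, not the published ParslRNA-Seq code; the aligner invocation, the htseq-count step, and the file paths are placeholders.

```python
# Minimal Parsl-style pipeline sketch (not the published ParslRNA-Seq workflow).
# Tool invocations and file paths are illustrative placeholders.
import parsl
from parsl import bash_app
from parsl.configs.local_threads import config
from parsl.data_provider.files import File

parsl.load(config)  # simple local configuration; HPC runs would use an HTEX/Slurm config

@bash_app
def align(reads, index, outputs=[]):
    # Run an external aligner; Parsl executes this shell command asynchronously.
    return f"bowtie2 -x {index} -U {reads} -S {outputs[0]}"

@bash_app
def count(inputs=[], annotation="genes.gtf", outputs=[]):
    # Quantify reads per gene from the alignment produced by the previous step.
    return f"htseq-count {inputs[0]} {annotation} > {outputs[0]}"

if __name__ == "__main__":
    sam = align("sample.fastq", "genome_index", outputs=[File("sample.sam")])
    table = count(inputs=[sam.outputs[0]], outputs=[File("counts.txt")])
    table.result()  # block until the chained tasks finish
```

Passing the DataFuture `sam.outputs[0]` into the second app is what lets Parsl infer the data dependency and schedule the two steps in the right order.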
Mapping the chemical space of compounds to chemical structures remains a challenge in metabolomics. Despite the advancements in untargeted Liquid Chromatography Mass Spectrometry (LCMS) to achieve a high-throughput profile of metabolites from complex biological resources, only a small fraction of these metabolites can be annotated with confidence....
The increasing volumes of data produced by high-throughput instruments coupled with advanced computational infrastructures for scientific computing have enabled what is often called a Fourth Paradigm for scientific research based on the exploration of large datasets. Current scientific research is often interdisciplinary, making data integrat...
The field of distributional ecology has seen considerable recent attention, particularly surrounding the theory, protocols, and tools for Ecological Niche Modeling (ENM) or Species Distribution Modeling (SDM). Such analyses have grown steadily over the past two decades—including a maturation of relevant theory and key concepts—but methodological co...
We present an analysis of the I/O behavior of the ParslRNA-Seq scientific workflow version coupled to high-performance computing (HPC) environments. The article discusses which modifications to the workflow design lead to improved computational performance and scalability, based on reducing the cost of I/O operations.
We present a version of the ParslRNA-Seq scientific workflow for Differential Gene Expression (DGE) analyses, coupled to High-Performance Computing environments, which showed improvements of up to 70% in total execution time. The performance of ParslRNA-Seq was validated through a comparative analysis of DGE data in cardiomyocytes...
Evolutionary processes and the spread of Dengue genomes in Brazil are relevant to the endemo-epidemic and social impact and surveillance of emerging arboviruses. Phylogenetic trees and networks make it possible to display evolutionary and reticulate events in viruses arising from high diversity, high mutation rates, and frequent homologous recombination. We present...
This article presents a comparative performance analysis of the ParslRNA-Seq scientific workflow with respect to the multithreading parameterization of Parsl and Bowtie, in order to ensure rational use and efficient allocation of computational resources on the Santos Dumont supercomputer.
The unprecedented size of the human population, along with its associated economic activities, has an ever‐increasing impact on global environments. Across the world, countries are concerned about the growing resource consumption and the capacity of ecosystems to provide resources. To effectively conserve biodiversity, it is essential to make indic...
Large-scale scientific experiments are considered complex due to the modeling of their activities, their execution, and the analysis of large volumes of data. In bioinformatics, these experiments are modeled as scientific workflows using high-performance computing and data science concepts. In this article we present the ParslRNA-Seq workflow...
A database can be defined as an organized collection of data. In scientific research, the use of databases is growing, as can be seen in projects across many areas, such as biology. Databases are also increasingly present in researchers' routines, in particular in the omics fields. This chapter describes...
The Santos Dumont supercomputer is the fourth largest in Latin America according to the TOP500 list, with a total of 5.1 petaflops. It is hosted at the Laboratório Nacional de Computação Científica (Petrópolis, Brazil), a research unit of Brazil's Ministry of Science, Technology and Innovation. In this work we present the main characteristics...
Science gateways have gained increasing attention in recent years from diverse communities. Science gateways are software solutions that enable the integration of reusable data and specialized techniques via Web servers while hiding the complexity of the underlying high-performance computing resources. Several projects and initiatives have bee...
Biological collections have been historically regarded as fundamental sources of scientific information on biodiversity. They are commonly associated with a variety of biases, which must be characterized and mitigated before data can be consumed. In this work, we are motivated by taxonomic and collector biases, which can be understood as the effect...
Music streaming platforms are increasingly popular, making musical content easier to access. This effect extends the reach of different musical styles, increasing the diversity of musical genres listened to in different countries around the world. In order to better understand this diversity, this article builds and analyzes a network...
The well-being of human and wildlife health involves many challenges, such as monitoring the movement of pathogens; expanding health surveillance; collecting data and extracting information to identify and predict risks; integrating specialists from different areas to handle data, species and distinct social and environmental contexts; and the comm...
Reproducibility is a fundamental requirement of the scientific process since it enables outcomes to be replicated and verified. Computational scientific experiments can benefit from improved reproducibility for many reasons, including validation of results and reuse by other scientists. However, designing reproducible experiments remains a challeng...
Reproducibility is a fundamental requirement of the scientific process since it enables outcomes to be replicated and verified. Computational scientific experiments can benefit from improved reproducibility for many reasons, including validation of results and reuse by other scientists. However, designing reproducible experiments remains a challeng...
This work aims to evaluate the computational performance of the Model-R ecological niche modeling framework under the Spark programming model for massive data (Big Data) processing on a supercomputing platform. With the exponential growth of ecological and environmental data, it becomes necessary for modeling tools...
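As a rough illustration of the programming model being evaluated (not the Model-R framework itself), the PySpark sketch below partitions and summarizes occurrence records in parallel; the input file name and column names are assumptions.

```python
# Minimal PySpark sketch of distributing occurrence-record preprocessing.
# This is not Model-R; the CSV path and column names are assumed for illustration.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("occurrence-preprocessing").getOrCreate()

# Hypothetical CSV of species occurrence records with coordinate columns.
occ = spark.read.csv("occurrences.csv", header=True, inferSchema=True)

# Drop records without coordinates and count occurrences per species in parallel.
clean = occ.dropna(subset=["decimalLatitude", "decimalLongitude"])
per_species = clean.groupBy("species").agg(F.count("*").alias("n_records"))

per_species.show(10)
spark.stop()
```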
Advances in sequencing techniques have led to exponential growth in biological data, demanding the development of large-scale bioinformatics experiments. Because these experiments are computation-and data-intensive, they require high-performance computing techniques and can benefit from specialized technologies such as Scientific Workflow Managemen...
Bioinformatics experiments are rapidly and constantly evolving due to improvements in sequencing technologies. These experiments usually demand high performance computation and produce huge quantities of data. They also require different programs to be executed in a certain order, allowing the experiments to be modeled as workflows. However, users do...
Music streaming platforms are increasingly popular, democratizing and facilitating access to music content. This effect extends the reach and the penetration of different musical styles, increasing the diversity of genres listened to in different countries around the world. In order to better understand this diversity and identify countries with...
In this paper we describe two network models as a base for understanding the relevance of social processes involving collectors for shaping the composition of biological collections. Species-Collector Networks (SCNs) represent the interests of collectors towards particular species, while Collector CoWorking Networks (CWNs) represent collaborative t...
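A toy species-collector network can be expressed as a bipartite graph. The sketch below, with invented records, only illustrates the data structure and one simple collector-side projection; it is not the paper's analysis, and the species and collector names are made up.

```python
# Toy bipartite Species-Collector Network (SCN); records are invented for illustration.
import networkx as nx
from networkx.algorithms import bipartite

records = [
    ("collector_A", "Handroanthus albus"),
    ("collector_A", "Cecropia pachystachya"),
    ("collector_B", "Handroanthus albus"),
]

G = nx.Graph()
G.add_nodes_from({c for c, _ in records}, bipartite="collector")
G.add_nodes_from({s for _, s in records}, bipartite="species")
G.add_edges_from(records)

# Project onto collectors: two collectors are linked if they collected the same species,
# one simple way to derive a collaboration-style network from the SCN.
collectors = {n for n, d in G.nodes(data=True) if d["bipartite"] == "collector"}
coworking = bipartite.weighted_projected_graph(G, collectors)
print(list(coworking.edges(data=True)))
```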
The well-being of wildlife health involves many challenges, such as monitoring the movement of pathogens; expanding health surveillance beyond humans; collecting data and extracting information to identify and predict risks; integrating specialists from different areas to handle data, species and distinct social and environmental contexts; and th...
Advances in sequencing techniques have led to exponential growth in biological data, demanding the development of large-scale bioinformatics experiments. Because these experiments are computation- and data-intensive, they require high-performance computing (HPC) techniques and can benefit from specialized technologies such as Scientific Workflow Ma...
Spatial analysis tools and synthesis of results are key to identifying the best solutions in biodiversity conservation. The importance of process automation is associated with increased efficiency and performance both in the data pre-processing phase and in the post-analysis of the results generated by the packages and modeling programs. The Model-...
There are many steps in analyzing transcriptome data, from the acquisition of raw data to the selection of a subset of representative genes that explain a scientific hypothesis. The data produced can be represented as networks of interactions among genes and these may additionally be integrated with other biological databases, such as Protein-Prote...
Background
There are many steps in analyzing transcriptome data, from the acquisition of raw data to the selection of a subset of representative genes that explain a scientific hypothesis. The data produced may additionally be integrated with other biological databases, such as Protein-Protein Interactions and annotations. However, the results of t...
Scientific experiments usually demand high performance computing (HPC) and involve the execution of a flow of activities, so they can be modeled as scientific workflows. Scientific Workflow Management Systems (SWfMS) provide ways of defining and executing these experiments in HPC environments and they produce detailed information about the workflow...
Bioinformatics experiments are rapidly and constantly evolving due to improvements in sequencing technologies. These experiments usually demand high performance computation and produce huge quantities of data. They also require different programs to be executed in a certain order, allowing the experiments to be modeled as workflows. However, users do...
A new open access database, Brazilian Marine Biodiversity (BaMBa) (https://marinebiodiversity.lncc.br), was developed in order to maintain large datasets from the Brazilian marine environment. Essentially, any environmental information can be added to BaMBa. Certified datasets obtained from integrated holistic studies, comprising physical–chemical...
Seamounts are considered important sources of biodiversity and minerals. However, their biodiversity and health status are not well understood; therefore, potential conservation problems are unknown. The mesophotic reefs of the Vitória-Trindade Seamount Chain (VTC) were investigated via benthic community and fish surveys, metagenomic and water chem...
Data generated by environmental research in Antarctica are essential in evaluating how its biodiversity and environment are affected by global-scale changes triggered by ever-increasing human activities. In this work, we describe BrAntIS, the Brazilian Information System on Antarctic Environmental Research, which enables the acquiring, storing, and...
In this article we describe the Brazilian Biodiversity Information System, which aims to provide an infrastructure for gathering, integrating, and analyzing data produced by various institutions in this area. Both its architecture and the workflow for harvesting and indexing data on species occurrences and checklists, one of the already implemented...
The automation of large scale computational scientific experiments can be accomplished with the use of scientific workflow management systems, which allow for the definition of their activities and data dependencies. The manual analysis of the data resulting from their execution is burdensome, due to the usually large amounts of information. Proven...
In this abstract, we describe provenance traces generated from executions of scientific workflows managed by the Swift parallel scripting system. They follow a provenance data model, used by MTCProv, the provenance management component of Swift. It is similar to PROV, representing most of its core concepts and including additional information about...
Biodiversity data is becoming increasingly available and its volume is growing rapidly. Integration of different related repositories is also advancing through a number of successful initiatives. This data can be given as input to sophisticated computer models for predicting potential distribution of species. As the amounts of manipulated data incr...
Scientific research is increasingly assisted by computer-based experiments. Such experiments are often composed of a vast number of loosely-coupled computational tasks that are specified and automated as scientific workflows. This large scale is also characteristic of the data that flows within such “many-task” computations (MTC). Provenance inform...
Large-scale scientific computations are often organized as a composition of many computational tasks linked through data flow. After the completion of a computational scientific experiment, a scientist has to analyze its outcome, for instance, by checking inputs and outputs of computational tasks that are part of the experiment. This analysis can b...
The Swift parallel scripting language allows for the specification, execution and analysis of large-scale computations in parallel and distributed environments. It incorporates a data model for recording and querying provenance information. In this article we describe these capabilities and evaluate the interoperability with other systems through t...
Scientists increasingly rely on workflow management systems to perform large-scale computational scientific experiments. These systems often collect provenance information that is useful in the analysis and reproduction of such experiments. On the other hand, this provenance data may be exposed to security threats which can result, for instance, in...
The Swift parallel scripting language allows for the specification, execution and analysis of large-scale computations in parallel and distributed environments. It incorporates a data model for recording and querying provenance information. In this article we describe these capabilities and evaluate interoperability with other systems through the u...
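Since the provenance model discussed above is described elsewhere in this list as close to W3C PROV (see the MTCProv summary), a minimal PROV document for a single task run can be written with the Python prov package. The namespace, entity, and activity names below are illustrative and are not taken from the Swift/MTCProv schema.

```python
# Minimal W3C PROV document for one task execution, using the Python 'prov' package.
# Namespace and record names are illustrative, not the Swift/MTCProv model.
from prov.model import ProvDocument

doc = ProvDocument()
doc.add_namespace("ex", "http://example.org/workflow#")

reads = doc.entity("ex:sample.fastq")
sam = doc.entity("ex:sample.sam")
align = doc.activity("ex:align-task-001")

doc.used(align, reads)            # the task consumed the input file
doc.wasGeneratedBy(sam, align)    # and produced the output file
doc.wasDerivedFrom(sam, reads)    # the output is derived from the input

print(doc.get_provn())            # human-readable PROV-N view of the document
```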
Secure provenance techniques are essential in generating trustworthy provenance records, where one is interested in protecting their integrity, confidentiality, and availability. In this work, we suggest an architecture to provide protection of authorship and temporal information in grid-enabled provenance systems. It can be used in the resolution...
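The abstract above concerns integrity and authorship protection of provenance records. As a generic illustration of the integrity side only, and not the architecture proposed in the paper, a record can carry a keyed digest so that tampering with its fields (including temporal information) is detectable; the key handling here is deliberately simplified.

```python
# Generic illustration of integrity protection for a provenance record using an HMAC.
# Not the paper's architecture; the key and record fields are placeholders.
import hmac, hashlib, json

SECRET_KEY = b"demo-key-do-not-use-in-production"

def sign_record(record: dict) -> dict:
    payload = json.dumps(record, sort_keys=True).encode()
    tag = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return {**record, "hmac": tag}

def verify_record(signed: dict) -> bool:
    record = {k: v for k, v in signed.items() if k != "hmac"}
    payload = json.dumps(record, sort_keys=True).encode()
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signed["hmac"])

entry = sign_record({"task": "align-001", "author": "alice",
                     "timestamp": "2010-05-01T12:00:00Z"})
assert verify_record(entry)
entry["timestamp"] = "2010-05-02T12:00:00Z"   # tampering with temporal information...
assert not verify_record(entry)               # ...is detected on verification
```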
Monitoring the execution of distributed tasks within a workflow is not easy and is frequently controlled manually. This work presents a lightweight middleware monitor to design and control the parallel execution of tasks from a distributed scientific workflow. This middleware can be connected to a workflow management system. This midd...
In this work, we provide some insights about the problem of managing credentials in grid environments. Since user mobility is a very common requirement in grid implementations, centralized credential servers are frequently used to store their cryptographic keys. We study some possible solutions for environments with stronger requirements regarding...
We present an efficient strategy for the application of the inference rules of a completion procedure for finitely presented groups. This procedure has been proposed by Cremanns and Otto and uses a combinatorial structure called word-cycle. Our strategy is complete in the sense that a set of persistent word-cycles can be used to solve the reduced w...
We present some slight improvements to a semi-decision algorithm for Presburger arithmetic originally developed by Shostak that increase the class of formulas effectively decidable. Furthermore, we show how decision algorithms for Presburger arithmetic may be combined with conditional rewrite techniques for automated deduction in algebraic specific...
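For readers unfamiliar with the setting, a standard textbook fact (not taken from this paper) illustrates the kind of formula such decision procedures manipulate: eliminating a quantifier in Presburger arithmetic generally introduces a divisibility constraint.

```latex
% Classic Presburger-arithmetic example: quantifier elimination
% introduces a divisibility (congruence) predicate.
\[
  \exists x\,(y = x + x) \;\Longleftrightarrow\; 2 \mid y .
\]
% In general, quantifier elimination over (\mathbb{Z}, +, \le, 0, 1) stays within
% the language only once congruence predicates k \mid t are added, which is why
% procedures for Presburger arithmetic must handle such constraints.
```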