Renan Souza

IBM Research

Doctor of Philosophy

About

37 Publications
7,355 Reads
131 Citations
Citations since 2017: 34 research items, 130 citations

Publications (37)
Preprint
Full-text available
Machine Learning (ML) has become essential in several industries. In Computational Science and Engineering (CSE), the complexity of the ML lifecycle comes from the large variety of data, scientists' expertise, tools, and workflows. If data are not tracked properly during the lifecycle, it becomes infeasible to recreate an ML model from scratch or to...
Preprint
Full-text available
Machine Learning (ML) has already fundamentally changed several businesses. More recently, it has also been profoundly impacting the computational science and engineering domains, like geoscience, climate science, and health science. In these domains, users need to perform comprehensive data analyses combining scientific data and ML models to provi...
Article
Full-text available
Scientific workflows need to be iteratively, and often interactively, executed for large input datasets. Reducing data from input datasets is a powerful way to reduce overall execution time in such workflows. When this is accomplished online (i.e., without requiring the user to stop execution to reduce the data, and then resume), it can save much t...
Article
Full-text available
In long-lasting scientific workflow executions in HPC machines, computational scientists (the users in this work) often need to fine-tune several workflow parameters. These tunings are done through user steering actions that may significantly improve performance (e.g, reduce execution time) or improve the overall results. However, in executions tha...
Conference Paper
Full-text available
Computational Science and Engineering (CSE) projects are typically developed by multidisciplinary teams. Despite being part of the same project, each team manages its own workflows, using specific execution environments and data processing tools. Analyzing the data processed by all workflows globally is a core task in a CSE project. However, this a...
Conference Paper
Modern applications commonly need to manage various types of datasets, usually composed of heterogeneous data and schemas manipulated by disparate tools and techniques in an ad-hoc way. This demo presents HKPoly, a solution that tackles the challenge of mapping and linking heterogeneous data, providing data access encapsulation by employing s...
Conference Paper
Large-scale workflows that execute on High-Performance Computing machines need to be dynamically steered by users. This means that users analyze big data files, assess key performance indicators, fine-tune parameters, and evaluate the tuning impacts while the workflows generate multiple files, which is challenging. If one does not keep track of suc...
Conference Paper
Full-text available
In our data-driven society, there are hundreds of possible data systems in the market with a wide range of configuration parameters, making it very hard for enterprises and users to choose the most suitable data systems. There is a lack of representative empirical evidence to help users make an informed decision. Using benchmark results is a widely...
Article
Machine learning (ML) has already fundamentally changed several businesses. More recently, it has also been profoundly impacting the computational science and engineering domains, like geoscience, climate science, and health science. In these domains, users need to perform comprehensive data analyses combining scientific data and ML models to provi...
Preprint
Full-text available
Interactive computing notebooks, such as Jupyter notebooks, have become a popular tool for developing and improving data-driven models. Such notebooks tend to be executed either in the user's own machine or in a cloud environment, having drawbacks and benefits in both approaches. This paper presents a solution developed as a Jupyter extension that...
Preprint
Full-text available
Scientific workflows are a cornerstone of modern scientific computing, and they have underpinned some of the most significant discoveries of the last decade. Many of these workflows have high computational, storage, and/or communication demands, and thus must execute on a wide range of large-scale platforms, from large clouds to upcoming exascale H...
Preprint
Full-text available
Complex scientific experiments from various domains are typically modeled as workflows and executed on large-scale machines using a Parallel Workflow Management System (WMS). Since such executions usually last for hours or days, some WMSs provide user steering support, i.e., they allow users to run data analyses and, depending on the results, adapt...
Article
Full-text available
Complex scientific experiments from various domains are typically modeled as workflows and executed on large-scale machines using a Parallel Workflow Management System (WMS). Since such executions usually last for hours or days, some WMSs provide user steering support, i.e., they allow users to run data analyses and, depending on the results, adapt...
Conference Paper
Machine Learning (ML) is a core concept behind Artificial Intelligence systems, which are driven by data and generate ML models. These models are used for decision making, and it is crucial to trust their outputs by, e.g., understanding the process that derives them. One way to explain the derivation of ML models is by tracking the whole ML lifecy...
Conference Paper
Full-text available
This work introduces the Cycle Orchestrator, a microservices infrastructure to structure and manage workflows related to heterogeneous data from the O&G domain. Through a knowledge-based perspective, it leverages reasoning, explainability and collaboration among stakeholders.
Conference Paper
Full-text available
This paper showcases the Cycle Orchestrator, a microservices infrastructure designed to structure and manage workflows related to heterogeneous data, through a knowledge-based perspective. It aims at leveraging reasoning, explainability and collaboration among users over experiments that comprise workflow executions. We briefly discuss design...
Preprint
Full-text available
Machine Learning (ML) has increased its role, becoming essential in several industries. However, questions around training data lineage, such as "where has the dataset used to train this model come from?"; the introduction of several new data protection legislation; and, the need for data governance requirements, have hindered the adoption of ML mo...
Conference Paper
Full-text available
Usually, modern applications manipulate datasets with diverse models, usages, and storage systems. “One size fits all” approaches are not sufficient for heterogeneous data, storage systems, and schemas. The rise of new kinds of data stores and processing, like NoSQL data stores, distributed file systems, and new data processing frameworks, brought new possibilit...
Thesis
Full-text available
Computational Science and Engineering (CSE) workflows are large-scale, require High Performance Computing (HPC) execution, and have the exploratory nature of science. During the long run, which often lasts for hours or days, users need to steer the workflow by dynamically analyzing it and adapting it to improve the quality of results or to reduce the ex...
Article
Full-text available
The development lifecycle of Deep Learning (DL) models requires humans (the model trainers) to analyze and steer the training evolution. They analyze intermediate data, fine-tune hyperparameters, and stop when a resulting model is satisfying. The problem is that existing solutions for DL do not track the trainer actions. There are no explicit data...
Preprint
In long-lasting scientific workflow executions in HPC machines, computational scientists (the users in this work) often need to fine-tune several workflow parameters. These tunings are done through user steering actions that may significantly improve performance (e.g., reduce execution time) or improve the overall results. However, in executions th...
Conference Paper
Identification of geological features in seismic images is a typical activity for the discovery of oil reservoirs. Geoscientists spend hours recognizing structures such as salt bodies, faults, and other geological features in the subsurface. Trying to automate such activity is of high interest for academia and O&G industry. Deep Learning (DL) has b...
Chapter
Capturing provenance data for runtime analysis has several challenges in high performance computational science engineering applications. The main issues are avoiding significant overhead in data capture, loading and runtime query support; and coupling provenance capture mechanisms with applications built with highly efficient numerical libraries,...
Chapter
Full-text available
Due to the exploratory nature of scientific experiments, computational scientists need to steer dataflows running on High-Performance Computing (HPC) machines by tuning parameters, modifying input datasets, or adapting dataflow elements at runtime. This happens in several application domains, such as in Oil and Gas where they adjust simulation para...
Conference Paper
Full-text available
Data-intensive science requires the integration of two fairly different paradigms: high-performance computing (HPC) and data-intensive scalable computing (DISC), as exemplified by frameworks such as Hadoop and Spark. In this context, the SciDISC project addresses the grand challenge of scientific data analysis using DISC, by developing architecture...
Conference Paper
Full-text available
This paper presents Ravel, a multiagent systems (MAS) platform aimed to integrate natural language understanding components with orchestration components of dialogues between human beings and agents. Ravel enables the specification of (social) conversations norms, using deontic logic, for use in contexts where multiple agents and human users are co...
Article
Computer simulations may be composed of several scientific programs chained in a coherent flow running in High Performance Computing and cloud environments. These runs may present different execution behaviors associated with the parallel flow of data among programs. Gaining insight into the parallel flow of data is important for several applications...
Article
Full-text available
Multi-party Conversational Systems are systems with natural language interaction between one or more people or systems. From the moment an utterance is sent to a group to the moment it is replied to by a group member, several activities must be done by the system: utterance understanding, information search, reasoning, among others...
Conference Paper
High Performance Computing (HPC) resources have become the key actor for achieving more ambitious challenges in many disciplines. In this step beyond, an explosion on the available parallelism and the use of special purpose processors are crucial. With such a goal, the HPC4E project applies new exascale HPC techniques to energy industry simulations...
Conference Paper
Full-text available
Applying new exascale HPC techniques to energy industry simulations is absolutely needed nowadays. In this sense, the common procedure is to customize these techniques for the specific energy sector of interest in order to go beyond the state-of-the-art in the required HPC exascale simulations. With this aim, the HPC4E project is de...
Conference Paper
Full-text available
Is it possible to develop a reliable QA-Corpus using social media data? What are the challenges faced when attempting such a task? In this paper, we discuss these questions and present our findings when developing a QA-Corpus on the topic of Brazilian finance. In order to populate our corpus, we relied on opinions from experts on Brazilian finance...
Conference Paper
Full-text available
Abstract. Computational simulations are generally composed of chains of scientific applications and executed in high-performance computing environments. Such executions commonly present bottlenecks associated with the flow of data between the applications. Several code profiling tools have supported the analysis of performance data...
