
Rodrigo Rocha
The University of Edinburgh | UoE · School of Informatics
PhD
About
34 Publications
13,541 Reads
241 Citations
Introduction
I am a PhD student at the University of Edinburgh, UK. I received two M.Sc. degrees in Computer Science, one from the University of Edinburgh and one from the Federal University of Minas Gerais (UFMG), Brazil, in 2015, and a B.Sc. in Computer Science from the Pontifical Catholic University of Minas Gerais (PUC Minas), Brazil, in 2012. I also worked as an assistant professor at PUC Minas, teaching in the Computer Science and Information Systems programs.
Education
September 2017 - August 2021
September 2016 - August 2017
February 2013 - July 2015
Publications (34)
The stencil pattern is important in many scientific and engineering domains, spurring great interest from researchers and industry. In recent years, various optimizations have been proposed for parallel stencil applications running on graphics processing units (GPUs). In particular, tiling is a technique that can significantly enhance application p...
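To illustrate what "tiling" means for a stencil, here is a minimal CPU sketch in Python, assuming a 2D 5-point Jacobi stencil and a hypothetical tile size. It is not the GPU tiling scheme from the paper: it only shows that visiting the interior tile by tile (as a GPU thread block would, usually after staging the tile plus a one-cell halo in on-chip memory) produces the same result as the plain sweep.

```python
def jacobi_step(grid):
    """One untiled 5-point Jacobi sweep; boundary cells are copied unchanged."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            out[i][j] = (grid[i - 1][j] + grid[i + 1][j]
                         + grid[i][j - 1] + grid[i][j + 1]) / 4.0
    return out


def jacobi_step_tiled(grid, tile=4):
    """Same sweep, but the interior is visited tile by tile.

    `tile` is a hypothetical tile edge length. Each (ti, tj) tile reads a
    one-cell halo around itself, which is what a GPU block would cache."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for ti in range(1, rows - 1, tile):
        for tj in range(1, cols - 1, tile):
            for i in range(ti, min(ti + tile, rows - 1)):
                for j in range(tj, min(tj + tile, cols - 1)):
                    out[i][j] = (grid[i - 1][j] + grid[i + 1][j]
                                 + grid[i][j - 1] + grid[i][j + 1]) / 4.0
    return out
```

Because each output cell reads only the previous grid, tiles are independent and can run in parallel; the performance benefit on real hardware comes from data reuse within a tile, which this sketch does not model.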
Functional programming languages, since their early days, have been regarded as the holy grail of parallelism. Indeed, the absence of race conditions, coupled with algorithmic skeletons such as map and reduce, has allowed developers to devise many different techniques aimed at the automatic parallelization of programs. However,...
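A minimal Python sketch of why map and reduce are natural parallel skeletons, assuming a pure mapped function and an associative combining operator (the helper names `parallel_map` and `tree_reduce` are hypothetical). Map applies the function to independent elements, so there is nothing to race on; reduce can be evaluated as a balanced tree whose pairs at each level are independent.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
import operator


def parallel_map(fn, items, workers=4):
    # Each element is processed independently; Executor.map keeps input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fn, items))


def tree_reduce(op, items):
    # Combine neighbours pairwise, level by level (about log2(n) levels).
    # Gives the same answer as a left fold only if `op` is associative.
    level = list(items)
    if not level:
        raise ValueError("tree_reduce of empty sequence")
    while len(level) > 1:
        nxt = [op(level[k], level[k + 1]) for k in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd element carries over to the next level
        level = nxt
    return level[0]
```

For example, mapping `lambda v: v * v` over `range(10)` and tree-reducing with `operator.add` yields 285, the same as `functools.reduce`. With a non-associative operator such as subtraction the two would disagree, which is why automatic parallelizers must establish associativity first.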
Auto-vectorizing compilers automatically generate vector (SIMD) instructions out of scalar code. The state-of-the-art algorithm for straight-line code vectorization is Superword-Level Parallelism (SLP). In this work we identify a major limitation at the core of the SLP algorithm, in the performance-critical step of collecting the vectorization cand...
Auto-vectorization techniques allow the compiler to automatically generate SIMD vector code from scalar code. SLP is a commonly used algorithm for converting straight-line code into vector code, complementing traditional loop-based vectorizers. It works by scanning the input code looking for groups of instructions that can be combined int...
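The grouping step described above can be sketched on a toy IR. This is a minimal sketch, assuming a hypothetical statement form `dst[idx] = lhs[idx] op rhs[idx]` and a hypothetical vector width of 4; real SLP implementations start from seeds such as adjacent stores and also check data dependences, which this sketch ignores.

```python
from collections import defaultdict
from typing import NamedTuple


class Stmt(NamedTuple):
    # Hypothetical toy IR statement: dst[idx] = lhs[idx] <op> rhs[idx]
    dst: str
    idx: int
    op: str
    lhs: str
    rhs: str


def find_packs(stmts, width=4):
    """Group isomorphic statements on consecutive elements into packs of `width`."""
    groups = defaultdict(list)
    for s in stmts:
        # Same opcode and same arrays => isomorphic "shape".
        groups[(s.dst, s.op, s.lhs, s.rhs)].append(s)
    packs = []
    for members in groups.values():
        members.sort(key=lambda s: s.idx)
        run = []
        for s in members:
            if run and s.idx != run[-1].idx + 1:
                run = []  # gap in memory: these lanes cannot share one vector access
            run.append(s)
            if len(run) == width:
                packs.append(tuple(run))  # candidate for one vector instruction
                run = []
    return packs
```

Eight statements `c[i] = a[i] + b[i]` for `i` in 0..7 yield two packs of four, while two `d[0]`, `d[2]` multiplications are left scalar because their indices are not contiguous.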
Resource-constrained devices for embedded systems are becoming increasingly important. In such systems, memory is highly restrictive, making code size in most cases even more important than performance. Compared to more traditional platforms, memory is a larger part of the cost and code occupies much of it. Despite that, compilers make little effor...
Creating applications that achieve maximum computational performance on modern architectures is a complex task. Besides knowledge of parallelism, the programmer needs a broad understanding of several other aspects of the application. For this reason, modern compilers try to parallelize algorithms automati...
Loop unrolling is a widely adopted loop transformation, commonly used to enable subsequent optimizations. Straight-line-code vectorization (SLP) is an optimization that benefits from unrolling. SLP converts isomorphic instruction sequences into vector code. Since unrolling generates repeated isomorphic instruction sequences, it enables SLP to...
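A minimal Python sketch of the interaction between unrolling and SLP, using a SAXPY-style loop chosen purely for illustration (the function names are hypothetical). Unrolling by 4 turns one loop statement into four isomorphic statements on consecutive elements; the "packed" version models, with list slices, the single 4-wide operation SLP would conceptually emit for each such group.

```python
def saxpy_scalar(alpha, x, y):
    # Rolled loop: one statement per iteration, nothing for SLP to pair up.
    out = list(y)
    for i in range(len(x)):
        out[i] = alpha * x[i] + y[i]
    return out


def saxpy_unrolled(alpha, x, y):
    # Unrolled by 4: the four body statements are isomorphic
    # (same operations, consecutive indices), the shape SLP looks for.
    n, out, i = len(x), list(y), 0
    while i + 4 <= n:
        out[i] = alpha * x[i] + y[i]
        out[i + 1] = alpha * x[i + 1] + y[i + 1]
        out[i + 2] = alpha * x[i + 2] + y[i + 2]
        out[i + 3] = alpha * x[i + 3] + y[i + 3]
        i += 4
    while i < n:  # remainder loop for leftover elements
        out[i] = alpha * x[i] + y[i]
        i += 1
    return out


def saxpy_packed(alpha, x, y):
    # Conceptual SLP output: each group of four isomorphic statements
    # becomes one 4-wide operation (modeled here with list slices).
    n, out, i = len(x), list(y), 0
    while i + 4 <= n:
        out[i:i + 4] = [alpha * a + b for a, b in zip(x[i:i + 4], y[i:i + 4])]
        i += 4
    for j in range(i, n):
        out[j] = alpha * x[j] + y[j]
    return out
```

All three functions compute the same result; the point is that only the unrolled form exposes groups of isomorphic statements for a straight-line vectorizer to pack.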
In the context of video processing, image noise caused by acquisition, transfer and image compression can be attenuated by video denoising algorithms. However, their computational cost must be as low as possible to allow them to be applied to real-time applications. In this paper, we propose STMKF, a real-time video denoising algorithm based on Kal...
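The abstract is truncated before STMKF's formulation, so the sketch below is not STMKF. It is a plain per-pixel temporal Kalman filter in Python, assuming each pixel's true intensity is roughly constant between frames. `process_var` and `measurement_var` are hypothetical tuning values. Its per-pixel cost is a handful of arithmetic operations, which is the kind of low cost the abstract asks of real-time denoisers.

```python
def kalman_denoise(frames, process_var=0.01, measurement_var=1.0):
    """Temporal Kalman filter applied independently to each pixel.

    frames: list of frames, each a flat list of pixel intensities.
    Returns a list of filtered frames of the same shape."""
    estimate = list(frames[0])                       # initialise with first frame
    error_var = [measurement_var] * len(estimate)    # initial uncertainty
    output = [list(estimate)]
    for frame in frames[1:]:
        for i, z in enumerate(frame):
            p_pred = error_var[i] + process_var          # predict: value unchanged, uncertainty grows
            gain = p_pred / (p_pred + measurement_var)   # Kalman gain
            estimate[i] += gain * (z - estimate[i])      # correct with the noisy measurement
            error_var[i] = (1.0 - gain) * p_pred         # updated uncertainty
        output.append(list(estimate))
    return output
```

A smaller `process_var` smooths more aggressively but reacts more slowly to genuine motion, which is why practical video denoisers typically add motion handling on top of a filter like this.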
SLP auto-vectorization converts straight-line code into vector code. It scans input code for groups of instructions that can be combined into vectors and replaces them with their corresponding vector instructions. This work introduces Super-Node SLP (SN-SLP), a new SLP-style algorithm, optimized for expressions that include a commutative operator...
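Why commutative operators matter for SLP can be seen in a small example. This is a minimal Python sketch of basic operand reordering for commutative operations, not SN-SLP itself (which goes further; see the paper). Lanes are hypothetical `(op, left, right)` tuples whose operands are `(array, index)` loads. If lane 1 computes `B[1] + A[1]` while lane 0 computes `A[0] + B[0]`, the left operands `A[0], B[1]` cannot form one vector load until the commutative `+` in lane 1 is flipped.

```python
COMMUTATIVE = {"+", "*"}


def reorder_operands(lanes):
    """Swap operands of commutative ops so each operand slot reads the same
    array in every lane, using lane 0 as the reference."""
    ref_op, ref_left, ref_right = lanes[0]
    result = [lanes[0]]
    for op, left, right in lanes[1:]:
        if op in COMMUTATIVE and left[0] != ref_left[0] and right[0] == ref_left[0]:
            left, right = right, left  # a + b == b + a, so the swap preserves semantics
        result.append((op, left, right))
    return result


def slot_is_vectorizable(lanes, slot):
    """Slot 1 (left) or 2 (right) can become one vector load if all lanes use
    the same opcode and read the same array at consecutive indices."""
    same_op = all(lane[0] == lanes[0][0] for lane in lanes)
    operands = [lane[slot] for lane in lanes]
    same_array = all(arr == operands[0][0] for arr, _ in operands)
    consecutive = all(operands[k][1] == operands[0][1] + k for k in range(len(operands)))
    return same_op and same_array and consecutive
```

With `[("+", ("A", 0), ("B", 0)), ("+", ("B", 1), ("A", 1))]` neither slot is vectorizable as written; after `reorder_operands` both slots become contiguous `A` and `B` loads. A non-commutative operator such as `-` is deliberately left untouched here.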
In this paper we propose a tool that uses static analysis to detect stencil computations in nested loops of C/C++ code, together with a code generator that, based on the neighborhood pattern of the stencil computation, generates optimized CUDA code. To validate our tool, we analyzed a set of codes present...
This paper proposes an adaptation of the PSkel framework to the low-power manycore processor MPPA-256. The framework simplifies the development of iterative stencil applications for the MPPA-256 by hiding implementation details from the developer. The results obtained on the MPPA-256 showed a reduction in energy consumption of...
The OpenACC programming model simplifies programming for accelerator devices such as GPUs. Its abstract accelerator model defines a least common denominator for accelerator devices and thus cannot represent architectural specifics of these devices without losing portability. Therefore, this general-purpose approach delivers good performance on...
The main challenge faced by automatic parallelization tools in functional languages is that parallelism is often hidden under the syntax of complex recursive functions. In this paper, we propose an algebraic framework for automatically parallelizing two special classes of recursive functions. We show that these classes are comprehensiv...
Most high-performance data processing (a.k.a. big data) systems allow users to express their computation using abstractions (like MapReduce), which simplify the extraction of parallelism from applications. Most frameworks, however, do not allow users to specify how communication must take place: That element is deeply embedded into the run-time sys...
This paper proposes an adaptation of PSkel, a framework based on parallel skeletons that supports the stencil pattern, to an emerging manycore processor called MPPA-256. The results showed that the adopted solution scales well, reducing execution time and energy consumption by up to 6x.
The stencil pattern enables the parallel computation of elements as a function of their neighborhood. Currently, several frameworks support this pattern on different parallel architectures. Despite the variety of real applications implemented for these frameworks, they represent only a fraction of stencil applications. This paper proposes a be...
Association rules are specific techniques for finding co-occurrences of frequent items in a non-numeric database. Among the most widespread techniques, the Apriori algorithm stands out because it is used both for market-basket analysis and for predicting scheduling policies in parallel architectures...
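A minimal Python sketch of the Apriori frequent-itemset step, assuming transactions are sets of item names and `min_support` is an absolute count (the shopping-basket data in the usage note is a made-up example). Apriori grows itemsets level by level and prunes any candidate that has an infrequent subset, since a superset can never be more frequent than its subsets.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-itemset candidates.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            # Prune step: every (k-1)-subset must itself be frequent.
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Count step: one pass over the transactions per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent
```

On five toy baskets such as `{"bread", "milk"}`, `{"bread", "diapers", "beer", "eggs"}`, and so on, with `min_support=3`, the sketch keeps pairs like `{"diapers", "beer"}` and discards `{"bread", "milk", "diapers"}`, which appears in only two baskets. Association rules are then derived from these frequent itemsets in a separate step not shown here.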
In this paper we present a distributed algorithm for detecting cycles in large-scale directed graphs, along with its correctness proof and analysis. The algorithm is then extended to find strong components in directed graphs. We indicate an application to detecting cycles in number-theoretic functions such as the proper divisor function. Our protot...
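To make the number-theoretic application concrete, here is a minimal sequential Python sketch, not the distributed algorithm from the paper. It builds the graph n → s(n), where s(n) is the sum of the proper divisors of n, restricted to 1..limit (the bound is a hypothetical parameter), and finds its cycles. Because every node has at most one outgoing edge, a single walk per unvisited node suffices. Fixed points are perfect numbers and 2-cycles are amicable pairs.

```python
def proper_divisor_sum(n):
    """Sum of the divisors of n that are smaller than n (0 for n = 1)."""
    if n < 2:
        return 0
    total, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            total += d
            if d != n // d:
                total += n // d
        d += 1
    return total


def find_cycles(limit):
    """Cycles of n -> s(n) that stay entirely within 1..limit."""
    succ = {}
    for n in range(1, limit + 1):
        s = proper_divisor_sum(n)
        if 1 <= s <= limit:
            succ[n] = s
    state = {}  # 1 = on the current walk, 2 = fully explored
    cycles = []
    for start in range(1, limit + 1):
        if start in state:
            continue
        path, node = [], start
        while node is not None and node not in state:
            state[node] = 1
            path.append(node)
            node = succ.get(node)
        if node is not None and state[node] == 1:  # walk re-entered itself
            cycle = path[path.index(node):]
            i = cycle.index(min(cycle))  # rotate so the smallest element comes first
            cycles.append(cycle[i:] + cycle[:i])
        for p in path:
            state[p] = 2
    return sorted(cycles)
```

For `limit=1300` this finds the perfect numbers 6, 28 and 496 and the amicable pairs (220, 284) and (1184, 1210). A distributed version must partition this graph across machines, which is where the paper's algorithm and its correctness proof come in.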
Most high-performance data processing (a.k.a. big data) systems allow users to express their computation using abstractions (like map-reduce) that simplify the extraction of parallelism from applications. Most frameworks, however, do not allow users to specify how communication must take place: that element is deeply embedded into the run-time system...
Questions (4)
What is a good book or paper reference for matrix rings M_n(R) over noncommutative rings or semirings R, where the basic matrix operations on elements of M_n(R) (addition, multiplication, inverses, etc.) are defined? I'm mainly interested in the definition and main properties of multiplication on M_n(R).
Cheers
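For context, the standard entrywise definitions can be written out. This block only restates the usual construction, assuming R is a (semi)ring with commutative addition and a multiplicatively absorbing zero (0a = a0 = 0), but not necessarily commutative multiplication. Associativity survives without commutativity; the transpose identity does not.

```latex
% A = (a_{ij}), B = (b_{ij}) \in M_n(R); R need not be commutative.
(A + B)_{ij} = a_{ij} + b_{ij}, \qquad
(AB)_{ij} = \sum_{k=1}^{n} a_{ik}\, b_{kj} \quad (\text{factor order } a_{ik} \text{ then } b_{kj}), \qquad
I_{ij} = \begin{cases} 1 & i = j \\ 0 & i \neq j \end{cases}

% Associativity uses only associativity of \cdot, distributivity, and commutativity of +:
((AB)C)_{ij} = \sum_{l} \Big( \sum_{k} a_{ik} b_{kl} \Big) c_{lj}
             = \sum_{k} \sum_{l} a_{ik} \,( b_{kl} c_{lj} )
             = (A(BC))_{ij}

% What fails: ((AB)^{T})_{ij} = \sum_{k} a_{jk} b_{ki}, whereas
% (B^{T} A^{T})_{ij} = \sum_{k} b_{ki} a_{jk}; these agree in general only if entries commute.

% Inverse: A is invertible if some B \in M_n(R) satisfies AB = BA = I.
```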
Let F(G) be a graph transformation on G, e.g. the line graph transformation L(G).
What is the best general approach for showing that $G_1 \cong G_2 \iff F(G_1) \cong F(G_2)$?
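For the specific case F = L, the backward direction is known to need a caveat. Whitney's line-graph theorem states that two connected graphs with isomorphic line graphs are isomorphic, with the single exception of the triangle K3 and the claw K_{1,3}, whose line graphs are both K3. The minimal Python sketch below checks this exception by brute force. Graphs are represented as hypothetical `(vertices, set_of_frozenset_edges)` pairs, and the permutation search is exponential, so it is only suitable for tiny graphs.

```python
from itertools import combinations, permutations


def line_graph(edges):
    """Vertices of L(G) are the edges of G; two are adjacent if they share an endpoint."""
    nodes = [frozenset(e) for e in edges]
    line_edges = {frozenset((a, b)) for a, b in combinations(nodes, 2) if a & b}
    return nodes, line_edges


def is_isomorphic(g1, g2):
    """Brute-force isomorphism test for small graphs given as (vertices, edges)."""
    (n1, e1), (n2, e2) = g1, g2
    if len(n1) != len(n2) or len(e1) != len(e2):
        return False
    for perm in permutations(n2):
        mapping = dict(zip(n1, perm))
        if all(frozenset(mapping[v] for v in e) in e2 for e in e1):
            return True
    return False


K3 = ([0, 1, 2], {frozenset({0, 1}), frozenset({1, 2}), frozenset({0, 2})})
K13 = ([0, 1, 2, 3], {frozenset({0, 1}), frozenset({0, 2}), frozenset({0, 3})})
```

Here `is_isomorphic(line_graph(K3[1]), line_graph(K13[1]))` is true while `is_isomorphic(K3, K13)` is false. So for a general transformation F, the forward direction follows whenever F is defined isomorphism-invariantly, but the backward direction typically needs a structural argument and may hold only up to exceptional cases like this one.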
How can cycle detection be used in bioinformatics? What are the main applications of cycle detection in this context? For example, can it be used in protein-protein interaction networks, gene regulation networks, etc.?
Cheers!
What do you think are the main pros and cons of posting unpublished work on ResearchGate? Is there any value in doing so? When is it appropriate?