Renato Ferreira

Federal University of Minas Gerais | UFMG · Departamento de Ciência da Computação

Ph.D.

About

130
Publications
29,824
Reads
1,995
Citations
Additional affiliations
August 2002 - September 2017
Federal University of Minas Gerais
Position
  • Professor (Associate)
August 2001 - September 2017
Federal University of Minas Gerais
Position
  • Professor (Associate)

Publications (130)
Article
Full-text available
Digital Pathology is a fast-growing field. It enables assessing disease grading, treatment progression, patient prognosis, etc. These analyses are carried out using whole slide tissue images (WSI). Processing of WSIs is costly due to their high resolutions (may reach over 100,000 × 100,000 pixels). Modern parallel machines offer adequate com...
Conference Paper
This article examines the underutilization of detailed criminal data, in collaboration with the Military Police of Minas Gerais, Brazil. We propose a new methodology, materialized in a tool, that is able to transform raw data into strategic information for public security decision-making. The tool evaluation unfolds in three phases: characterizing the...
Preprint
Full-text available
Content-Based Multimedia Retrieval (CBMR) has become very popular in several applications, driven by the growing routine use of multimedia data. Since the datasets used in real-world applications are very large and descriptor’s dimensionality is high, querying is an expensive, albeit important functionality. Further, exact search is prohibitive in...
Article
Full-text available
Similarity search is a key operation in content-based multimedia retrieval (CBMR) applications. Online CBMR applications, which are the focus of this work, perform a large number of search operations on dynamic datasets, which are updated at run-time. Additionally, the rates of search and data insertion (update) operations vary during the executio...
Preprint
Full-text available
Similarity search is a key operation in content-based multimedia retrieval (CBMR) applications. Online CBMR applications, which are the focus of this work, have to search in large and dynamic datasets that are updated during the execution while offering low response times. Additionally, these applications are submitted to workloads that vary at runtime...
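For reference, a minimal exact k-nearest-neighbor scan over a static descriptor matrix, the baseline that online CBMR indexes approximate; the array shapes and names below are illustrative assumptions, not taken from the paper.

```python
# Exact brute-force kNN over descriptors -- the baseline that online CBMR
# indexes approximate. All names and shapes here are illustrative.
import numpy as np

def knn_search(database: np.ndarray, queries: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k nearest database descriptors for each query (L2 distance)."""
    # Pairwise squared distances via ||q - d||^2 = ||q||^2 - 2 q.d + ||d||^2.
    d2 = (
        (queries ** 2).sum(axis=1, keepdims=True)
        - 2.0 * queries @ database.T
        + (database ** 2).sum(axis=1)
    )
    return np.argsort(d2, axis=1)[:, :k]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    db = rng.standard_normal((10_000, 128)).astype(np.float32)  # static snapshot
    q = rng.standard_normal((5, 128)).astype(np.float32)
    print(knn_search(db, q, k=10))
```

An online service with run-time insertions and fluctuating query rates replaces this exact scan with an approximate, updatable index, which is the setting the abstract describes.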
Article
Full-text available
Motivation: Deep learning has recently attained excellent results in Digital Pathology. A challenge with its use is that high-quality, representative training data sets are required to build robust models. Data annotation in the domain is labor intensive and demands substantial time commitment from expert pathologists. Active Learning (AL) is a strate...
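A minimal uncertainty-sampling loop illustrating the general AL idea referenced here; the classifier, batch size, and helper names are assumptions for the sketch and do not reproduce the paper's strategy.

```python
# Generic uncertainty-sampling active-learning loop (illustrative sketch only;
# not the paper's AL strategy or model). Assumes numpy features and 0/1 labels.
import numpy as np
from sklearn.linear_model import LogisticRegression

def active_learning(x_pool, y_oracle, n_rounds=5, batch=10, seed=0):
    rng = np.random.default_rng(seed)
    labeled = list(rng.choice(len(x_pool), size=batch, replace=False))  # seed set
    model = LogisticRegression(max_iter=1000)
    for _ in range(n_rounds):
        model.fit(x_pool[labeled], y_oracle[labeled])   # train on labeled samples
        proba = model.predict_proba(x_pool)
        uncertainty = 1.0 - proba.max(axis=1)           # least-confident score
        uncertainty[labeled] = -np.inf                  # never re-pick labeled items
        picks = np.argsort(uncertainty)[-batch:]        # most uncertain samples
        labeled.extend(picks.tolist())                  # the "expert" labels them
    return model, labeled

if __name__ == "__main__":
    from sklearn.datasets import make_classification
    x, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    model, labeled = active_learning(x, y)
    print(f"labeled {len(labeled)} of {len(x)} samples")
```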
Article
Content-based multimedia retrieval (CBMR) applications are becoming very popular in several online services which handle large volumes of data and are submitted to high query rates. While these applications may be complex, finding the nearest neighboring objects (multimedia descriptors) is typically their most time-consuming operation. In order to...
Article
Full-text available
Background: Deep learning methods have demonstrated remarkable performance in pathology image analysis, but they are computationally very demanding. The aim of our study is to reduce their computational cost to enable their use with large tissue image datasets. Methods: We propose a method called Network Auto-Reduction (NAR) that simplifies a Convol...
Article
The analysis of high resolution whole slide tissue images is a computationally expensive task, which adversely impacts effective use of pathology imaging data in research. We propose runtime solutions to enable efficient execution of pathology image analysis applications on modern distributed memory hybrid platforms equipped with both CPUs and GPUs...
Article
Background and objective: Computerized pathology image analysis is an important tool in research and clinical settings, which enables quantitative tissue characterization and can assist a pathologist's evaluation. The aim of our study is to systematically quantify and minimize uncertainty in output of computer based pathology image analysis. Meth...
Article
Nearest neighbors search is a core operation found in several online multimedia services. These services have to handle very large databases, while, at the same time, they must minimize the query response times observed by users. This is especially complex because those services deal with fluctuating query workloads (rates). Consequently, they must...
Chapter
Associative classification refers to a class of algorithms that are very efficient in classification problems. Data in such domains are multidimensional, with data instances represented as points of a fixed-length attribute space, and are organized into two large sets: training and testing datasets. Models, known as classifiers, are mined in the trai...
Article
The similarity search in high-dimensional spaces is a core operation found in several online multimedia retrieval applications. With the popularity of these applications, they are required to handle very large and increasing datasets, while keeping the response time low. This problem is worsened in the context of online applications, mostly due to...
Article
Creating textured 3D meshes of objects for real-time applications can be a laborious, slow and expensive task, demanding specific, highly specialized human resources such as 2D and 3D artists. In this paper, we present a fully automatic 3D modeling methodology based on silhouette carving, capable of creating textured 3D meshes from three pieces of...
Chapter
Full-text available
Detection of Cardiac Arrhythmia (CA) is performed through clinical analysis of a patient's electrocardiogram (ECG) to prevent cardiovascular diseases. Machine Learning algorithms have been presented as promising tools to aid CA diagnosis, with emphasis on those related to automatic classification. However, these algorithms suffer from two...
Chapter
Mobile phones have recently stopped being just devices for basic communication and have become providers of many applications that require increasing performance for a good user experience. Inside today's mobile phones we find different processing units (PUs) with high computational capacity, such as multicore architectures and co-processors like GPUs...
Article
Full-text available
Massive data generation has been pushing for significant advances in computing architectures, resulting in heterogeneous architectures composed of different types of processing units. The filter-stream paradigm is typically used to exploit the parallel processing power of these new architectures. The efficiency of applications in this paradigm...
Article
The emergence of applications that demand efficient handling of growing amounts of data has stimulated the development of new computing architectures with several Processing Units (PUs), such as CPU cores, graphics processing units (GPUs) and the Intel Xeon Phi (MIC). Aiming to better exploit these architectures, recent works focus on proposing novel r...
Conference Paper
Following the evolution of desktops, mobile architectures are currently witnessing growth in processing power and complexity with the addition of different processing units like multi-core CPUs and GPUs. To facilitate programming and coordinating resource usage in these heterogeneous architectures, we present ParallelME, a Parallel Mobile Engine de...
Conference Paper
Due to the recent increase in the volume of data being generated, organizing this data has become one of the biggest problems in Computer Science. Among the different strategies proposed to deal with it efficiently and effectively, we highlight those related to clustering, more specifically, density-based clustering strategies, which...
Article
Most high-performance data processing (a.k.a. big data) systems allow users to express their computation using abstractions (like MapReduce), which simplify the extraction of parallelism from applications. Most frameworks, however, do not allow users to specify how communication must take place: That element is deeply embedded into the run-time sys...
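A toy in-process word count illustrating the map-reduce abstraction mentioned above; real frameworks insert a distributed shuffle between the two phases, which is exactly the communication step the abstract says users usually cannot customize. The function names are illustrative.

```python
# Toy in-process map-reduce word count. Real frameworks add partitioning,
# a distributed shuffle, and fault tolerance between the two phases.
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    for line in lines:
        for word in line.split():
            yield word, 1                              # emit (key, value) pairs

def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    groups: Dict[str, List[int]] = defaultdict(list)   # group values by key
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    return {key: sum(values) for key, values in groups.items()}

if __name__ == "__main__":
    text = ["big data systems simplify parallelism", "data systems hide communication"]
    print(reduce_phase(shuffle(map_phase(text))))
```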
Article
Full-text available
We carry out a comparative performance study of multi-core CPUs, GPUs and Intel Xeon Phi (Many Integrated Core (MIC)) with a microscopy image analysis application. We experimentally evaluate the performance of computing devices on core operations of the application. We correlate the observed performance with the characteristics of computing devices...
Article
Collective opinions observed in social media represent valuable information for a range of applications. In the pursuit of such information, current methods require prior knowledge of each individual opinion to determine the collective one in a post collection. Differently, we assume that collective analysis could be better performed when exploit...
Article
Full-text available
We carry out a comparative performance study of multi-core CPUs, GPUs and Intel Xeon Phi (Many Integrated Core - MIC) with a microscopy image analysis application. We experimentally evaluate the performance of computing devices on core operations of the application. We correlate the observed performance with the characteristics of computing devices...
Conference Paper
Full-text available
The availability of surveillance cameras placed in public locations has increased vastly in recent years, providing a safe environment to people at the cost of the huge amount of visual data collected. Such data are mostly processed manually, a task which is labor intensive and prone to errors. Therefore, automatic approaches must be employed to enab...
Conference Paper
High performance computing is experiencing a major paradigm shift with the introduction of accelerators, such as graphics processing units (GPUs) and Intel Xeon Phi (MIC). These processors have made available a tremendous computing power at low cost, and are transforming machines into hybrid systems equipped with CPUs and accelerators. Although the...
Article
Simultaneous Multi-Threading (SMT) is a hardware model in which different threads share the same processing unit. This model is a compromise between high parallelism and low hardware cost. Minimal Multi-Threading (MMT) is a recently proposed architecture that shares instruction decoding and execution between threads running the same program in an...
Conference Paper
Full-text available
Most high-performance data processing (aka big-data) systems allow users to express their computation using abstractions (like map-reduce) that simplify the extraction of parallelism from applications. Most frameworks, however, do not allow users to specify how communication must take place: that element is deeply embedded into the run-time system...
Conference Paper
Full-text available
Even as Web 2.0 grows, e-mail continues to be one of the most used forms of communication on the Internet, being responsible for the generation of huge amounts of data. Spam traffic, for example, accounts for terabytes of data daily. It becomes necessary to create tools that are able to process these data efficiently, in large volumes, in order...
Conference Paper
Full-text available
Text-based social media channels, such as Twitter, produce torrents of opinionated data about the most diverse topics and entities. The analysis of such data (aka. sentiment analysis) is quickly becoming a key feature in recommender systems and search engines. A prominent approach to sentiment analysis is based on the application of classification...
Conference Paper
Full-text available
The emergence of different applications that deal with growing amounts of data in reasonable time has stimulated the development of new computing architectures consisting of different processing units (PUs). Runtime environments have been proposed in order to exploit these resources as much as possible by offering a variety of methods for dynamical...
Conference Paper
Full-text available
Computer Vision problems applied to visual surveillance have been studied for several years aiming at finding accurate and efficient solutions, required to allow the execution of surveillance systems in real environments. The main goal of such systems is to analyze the scene focusing on the detection and recognition of suspicious activities perf...
Article
Full-text available
With the advent of Web 2.0, we see a new and differentiated scenario: there is more data than can be effectively analyzed. Organizing this data has become one of the biggest problems in Computer Science. Many algorithms have been proposed for this purpose, highlighting those related to the Data Mining area, specifically the clustering algorith...
Conference Paper
Applications that deal with large amounts of data within acceptable time have been driving the development of new architectures composed of different processing units (PUs). Runtime environments have been proposed to exploit these resources, offering methods capable of scheduling tasks across different PUs. Although most...
Conference Paper
Full-text available
The advent of the Web 2.0 has given rise to an interesting phenomenon: there is currently much more data than what can be effectively analyzed without relying on sophisticated automatic tools. Some of these tools, which target the organization and extraction of useful knowledge from this huge amount of data, rely on machine learning and data or tex...
Article
Full-text available
This paper presents a compilation technique that performs automatic parallelization of canonical loops. Canonical loops are a pattern observed in many well known algorithms, such as frequent itemsets, K-means and K nearest neighbors. Automatic parallelization allows application developers to focus on the algorithmic details of the problem they are...
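A hand-written sketch of what such a canonical loop looks like once parallelized, assuming the pattern is an independent per-element computation followed by an associative reduction; the names are illustrative, not the compiler's output.

```python
# Hand-parallelized version of a "canonical loop": an independent per-element
# computation followed by an associative merge. Illustrative names only.
from functools import reduce
from multiprocessing import Pool

def per_item_work(x: int) -> int:
    return x * x                                    # independent loop body

def merge(a: int, b: int) -> int:
    return a + b                                    # associative/commutative reduction

def canonical_loop_parallel(items, workers: int = 4) -> int:
    with Pool(workers) as pool:
        partials = pool.map(per_item_work, items)   # loop iterations run in parallel
    return reduce(merge, partials, 0)               # merge the partial results

if __name__ == "__main__":
    print(canonical_loop_parallel(range(1_000)))    # sum of squares of 0..999
```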
Conference Paper
Full-text available
Simultaneous Multi-Threading (SMT) is a hardware model in which different threads share the same instruction fetching unit. This model is a compromise between high parallelism and low hardware cost. Minimal Multi-Threading (MMT) is a technique recently proposed to share instructions and execution between threads in a SMT machine. In this paper we p...
Article
Real-time search algorithms solve the problem of path planning regardless of the size and complexity of the maps and of the massive presence of entities in the same environment. In such methods, the learning step aims to avoid local minima and improve the results of future searches, ensuring convergence to the optimal path when the same planning t...
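As one concrete instance of this class of methods, a minimal Learning Real-Time A* (LRTA*) trial on a small grid, with the heuristic table carrying the learning across trials; the grid, costs, and tie-breaking are illustrative assumptions, not the algorithm evaluated in the paper.

```python
# Minimal LRTA* sketch on a 4-connected grid: one-step lookahead, greedy move,
# and a heuristic-update ("learning") step that persists across trials.
def lrta_star_trial(grid, start, goal, h):
    """Run one trial from start to goal, updating the heuristic table h in place.
    grid[r][c] == 1 marks an obstacle. Returns the path taken."""
    rows, cols = len(grid), len(grid[0])

    def neighbors(s):
        r, c = s
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                yield nr, nc

    def h0(s):  # admissible default: Manhattan distance to the goal
        return abs(s[0] - goal[0]) + abs(s[1] - goal[1])

    state, path = start, [start]
    while state != goal:
        # One-step lookahead: f(s') = edge cost (1) + current heuristic of s'.
        best = min(neighbors(state), key=lambda s: 1 + h.get(s, h0(s)))
        # Learning step: raise h(state) so later trials avoid this local minimum.
        h[state] = max(h.get(state, h0(state)), 1 + h.get(best, h0(best)))
        state = best
        path.append(state)
    return path

if __name__ == "__main__":
    grid = [[0, 0, 0, 0],
            [0, 1, 1, 0],
            [0, 0, 0, 0]]
    h = {}                       # learned heuristic, shared across trials
    for trial in range(3):
        print(len(lrta_star_trial(grid, (0, 0), (2, 3), h)) - 1, "moves")
```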
Conference Paper
The task of extracting information from datasets that become larger on a daily basis, such as those collected from the web, is an increasing challenge, but it also provides more interesting insights and analyses. Current analyses have gone beyond content and now focus on tracking and understanding users' relationships and interactions. Such computation is...
Conference Paper
Full-text available
How do we analyze sentiments over a set of opinionated Twitter messages? This issue has been widely studied in recent years, with a prominent approach being based on the application of classification techniques. Basically, messages are classified according to the implicit attitude of the writer with respect to a query term. A major concern, however...
Article
Full-text available
The increases in multi-core processor parallelism and in the flexibility of many-core accelerator processors, such as GPUs, have turned traditional SMP systems into hierarchical, heterogeneous computing environments. Fully exploiting these improvements in parallel system design remains an open problem. Moreover, most of the current tools for th...
Article
Novel pathogens have the potential to become critical issues of national security, public health and economic welfare. As demonstrated by the response to Severe Acute Respiratory Syndrome (SARS) and influenza, genomic sequencing has become an important method for diagnosing agents of infectious disease. Despite the value of genomic sequences in cha...
Conference Paper
Full-text available
Frequent itemset mining (FIM) is a core operation for several data mining applications such as association rule computation, correlations, document classification, and many others, and has been extensively studied over the last decades. Moreover, databases are becoming increasingly larger, thus requiring higher computing power to mine them in reaso...
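For context, a tiny single-machine Apriori sketch of the FIM problem; the parallel algorithms studied in the paper are not reproduced here, and the data and support threshold below are illustrative.

```python
# Single-machine Apriori: grow candidate itemsets level by level and keep
# those whose support (number of containing transactions) meets the threshold.
def apriori(transactions, min_support):
    transactions = [frozenset(t) for t in transactions]
    k_sets = {frozenset([i]) for t in transactions for i in t}  # 1-itemsets
    frequent, k = {}, 1
    while k_sets:
        # Count the support of each candidate itemset.
        counts = {c: sum(c <= t for t in transactions) for c in k_sets}
        current = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(current)
        # Join frequent k-itemsets to build (k+1)-candidates.
        keys = list(current)
        k_sets = {a | b for i, a in enumerate(keys)
                  for b in keys[i + 1:] if len(a | b) == k + 1}
        k += 1
    return frequent

if __name__ == "__main__":
    data = [{"a", "b", "c"}, {"a", "c"}, {"a", "d"}, {"b", "c"}]
    print(apriori(data, min_support=2))
```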
Conference Paper
Full-text available
The increases in multi-core processor parallelism and in the flexibility of many-core accelerator processors, such as GPUs, have turned traditional SMP systems into hierarchical, heterogeneous computing environments. Fully exploiting these improvements in parallel system design remains an open problem. Moreover, most of the current tools for the de...
Conference Paper
Full-text available
We have been witnessing a continuous growth of both heterogeneous computational platforms (e.g., Cell blades, or the joint use of traditional CPUs and GPUs) and multi-core processor architecture; and it is still an open question how applications can fully exploit such computational potential efficiently. In this paper we introduce a run-time envir...
Conference Paper
Full-text available
We are witnessing an increasing adoption of GPUs for performing general purpose computation, which is usually known as GPGPU. The main challenge in developing such applications is that they often do not fit the model required by the graphics processing devices, limiting the scope of applications that may benefit from the computing power provi...
Article
Full-text available
Accurate segmentation of tissue microarrays is a challenging topic because of some of the similarities exhibited by normal tissue and tumor regions. Processing speed is another consideration when dealing with imaged tissue microarrays as each microscopic slide may contain hundreds of digitized tissue discs. In this paper, a fast and accurate image...
Chapter
Introduction · The architecture · Runtime framework · Parallel algorithms for data mining · Visual metaphors · Case studies · Future developments · Conclusions and future work
Conference Paper
Full-text available
GPUs have recently evolved into very fast parallel co-processors capable of executing general purpose computations extremely efficiently. At the same time, multi-core CPU evolution continued and today's CPUs have 4-8 cores. These two trends, however, have followed independent paths in the sense that we are aware of very few works that consider bo...
Article
Full-text available
Translational research projects target a wide variety of diseases, test many different kinds of biomedical hypotheses, and employ a large assortment of experimental methodologies. Diverse data, complex execution environments, and demanding security and reliability requirements make the implementation of these projects extremely challenging and requ...
Conference Paper
New architectural trends in chip design resulted in machines with multiple processing units as well as efficient communication networks, leading to the wide availability of systems that provide multiple levels of parallelism, both inter- and intra-machine. Developing applications that efficiently make use of such systems is a challenge, specially f...
Conference Paper
The development of high level abstractions for programming distributed systems is becoming a crucial effort in computer science. Several frameworks have been proposed, which expose simplified programming abstractions that are useful for a broad class of applications and can be implemented efficiently on distributed systems. One such system is Anthi...
Article
Full-text available
Design templates that involve discovery, analysis, and integration of information resources commonly occur in many scientific research projects. In this paper we present examples of design templates from the biomedical translational research domain and discuss the requirements imposed on Grid middleware infrastructures by them. Using caGrid, which...
Article
Full-text available
Analyzing gene expression patterns is becoming a highly relevant task in the Bioinformatics area. This analysis makes it possible to determine the behavior patterns of genes under various conditions, fundamental information for treating diseases, among other applications. A recent advance in this area is the Tricluster algorithm, which is the fir...
Article
Full-text available
Scientific workflow systems have been introduced in response to the demand of researchers from several domains of science who need to process and analyze increasingly larger datasets. The design of these systems is largely based on the observation that data analysis applications can be composed as pipelines or networks of computations on data. In t...
Article
Full-text available
caGrid is a middleware system which combines the Grid computing, the service oriented architecture, and the model driven architecture paradigms to support development of interoperable data and analytical resources and federation of such resources in a Grid environment. The functionality provided by caGrid is an essential and integral component of t...
Conference Paper
Fault tolerance is a desirable feature in distributed high-performance systems, since applications tend to run for long periods of time and faults become more likely as the number of nodes in the system increases. However, most distributed environments lack any fault-tolerance features, since they tend to be hard to implement and use, and often hurt...
Conference Paper
Full-text available
The identification of replicas in a database is fundamental to improve the quality of the information. Deduplication is the task of identifying replicas in a database that refer to the same real world entity. This process is not always trivial, because data may be corrupted during their gathering, storing or even manipulation. Problems such as miss...
Conference Paper
Full-text available
This paper presents a fault tolerance framework for applications that process data using a distributed network of user-defined operations in a pipelined fashion. The framework saves intermediate results and messages exchanged among application components in a distributed data management system to facilitate quick recovery from failures. The experim...