Caetano Traina Jr

University of São Paulo | USP · Institute of Mathematical and Computer Sciences (ICMC) (São Carlos)

Full Professor

About

345 Publications
39,307 Reads
3,639 Citations
Citations since 2016
76 Research Items
1,429 Citations
[Chart: citations per year, 2016 to 2022]
Additional affiliations
January 2002 - December 2011
Universidade de São Paulo

Publications (345)
Article
The amount of data generated daily by different sources grows exponentially and brings new challenges to information technology experts. The recorded data usually include heterogeneous attribute types, such as the traditional date, numerical, textual, and categorical information, as well as complex ones, such as images, videos, and multidimensi...
Chapter
As modern applications gather more and more data, the data types also become more complex. Traditional retrieval operations based on identity and order comparisons are not suitable for those types. Instead, similarity operators are much more interesting for querying complex data and are gaining increasing attention. Similarity queries retrieve the...
Article
Real-world applications generate large amounts of images every day. With the generalized use of social media, users frequently share images acquired by smartphones. Also, hospitals, clinics, exhibits, factories, and other facilities generate images with potential use for many applications. Processing the generated images usually requires feature ex...
Preprint
Full-text available
Physicians work on very tight schedules and need decision-support tools to help them do their work in a timely and dependable manner. Examining piles of sheets with test results and using systems with little visualization support to provide diagnostics is daunting, but that is still the usual way for physicians' daily pro...
Article
Physicians work on very tight schedules and need decision-support tools to help them do their work in a timely and dependable manner. Examining piles of sheets with test results and using systems with little visualization support to provide diagnostics is daunting, but that is still the usual way for physicians' daily pro...
Chapter
The continuous growth in data collection requires effective and efficient capabilities to support Knowledge Discovery in Databases (KDD) over large amounts of complex data. However, activities such as data acquisition, cleaning, preparation, and recording may lead to incompleteness, impairing the KDD processes, especially because most analysis me...
Conference Paper
In this paper, we present FeatSet, a compilation of visual features extracted from open image datasets reported in the literature. FeatSet has a collection of 11 visual features, consisting of color, texture, and shape representations of the images acquired from 13 datasets. We organized the available features in a standard collection, including th...
Conference Paper
With the COVID-19 pandemic, many hospitals have collected Electronic Health Records (EHRs) from patients and shared them publicly. EHRs include heterogeneous attribute types, such as image exams, numerical, textual, and categorical information. Simply posing similarity queries over EHRs can underestimate the semantics and potential information of p...
Article
Similarity searches can be modeled by means of distances following the Metric Spaces Theory and constitute a fast and explainable query mechanism behind content-based image retrieval (CBIR) tasks. However, classical distance-based queries, e.g., Range and k-Nearest Neighbors, may be unsuitable for exploring large datasets because the retrieved elem...
Conference Paper
The current COVID-19 pandemic has promoted the periodic release of several health databases aimed at discovering relationships in the data, detecting similar problems in patients, and studying the evolution of the disease. A way to exploit the data is to use visualization techniques, which can lead to the discovery of insights and patterns, as well...
Article
Chronic dermatological ulcers cause great discomfort to patients, and while monitoring the size of wounds over time provides significant clues about the healing evolution and the clinical condition of patients, the lack of practical applications in existing studies impairs users’ access to appropriate treatment and diagnosis methods. We propose the...
Article
Full-text available
Data analysis is increasingly being used as an unbiased and accurate way to evaluate many aspects of society and their evolution over the years. This article presents an analysis of students' characteristics, between 2012 and 2017, in the most important exam for entry into higher education in Brazil, the Exame Nacional do Ensino Médio (ENEM). The i...
Preprint
Full-text available
Lack of data and data quality issues are among the main bottlenecks that prevent further artificial intelligence adoption within many organizations, pushing data scientists to spend most of their time cleaning data before being able to answer analytical questions. Hence, there is a need for more effective and efficient data cleaning solutions, whic...
Conference Paper
Incompleteness harms the quality of content-based retrieval and analysis in similarity queries. Missing data are usually handled with exclusion and imputation methods, the latter inferring possible values to fill the gaps. However, such approaches can introduce bias into the data and lose useful information. Similarity queries cannot be performed over incomplete comp...
Conference Paper
Full-text available
Several studies have been performed worldwide to improve health services using data generated by digital medical systems. The increasing volume of data generated by these systems is making the use of knowledge discovery and data analysis techniques essential to improve the quality of the health services, which are offered by the medical facilities....
Conference Paper
Full-text available
Recursive queries are one of the main mechanisms in Relational Database Management Systems to process topology-aware, or graph-like, queries. However, existing works focus only on optimizing the recursive query statements and processing, disregarding the potential physical arrangements that might improve performance. In this work, we propose to use...
Article
Background and objectives: Bedridden patients presenting chronic skin ulcers often need to be examined at home. Healthcare professionals follow the evolution of the patients' condition by regularly taking pictures of the wounds, as different aspects of the wound can indicate the healing stages of the ulcer, including depth, location, and size. The...
Article
Full-text available
Similarity search is fundamental to storing and retrieving the large volumes of complex data required by many real-world applications. A useful mechanism for this concept is the query-by-similarity. Based on their topological properties, metric similarity functions can be used to index sets of data which can be queried effectively and efficiently by the so...
Conference Paper
Full-text available
Bedridden patients with skin lesions (ulcers) often do not have access to specialized clinic equipment. It is important to allow healthcare practitioners to use their smartphones to leverage information regarding the proper treatment to be carried out. Existing applications require special equipment, such as heat sensors, or focus only on general infor...
Preprint
Full-text available
Similarity searching supports several computational tasks, such as classification and content-based retrieval. A plethora of indexes has been proposed to enhance similarity queries, the Omni family being one of the most versatile. The main strength of Omni methods is that they handle the data elements regarding a small set of carefully selected...
Conference Paper
Full-text available
Emotion and feeling recognition has been studied across a wide range of research over the past decades. The advancement and spread of social networks and online applications, such as Facebook, Instagram, and WhatsApp, have motivated the sharing of news and communications among users. Specifically, in social networks, users can give reactions related to diff...
Conference Paper
Full-text available
Similarity searching is employed for content-based image retrieval (CBIR) as a fast and explainable query mechanism. However, standard similarity searches may be unsuitable for querying large data sources as retrieved elements are prone to be very similar among themselves. While adding diversity into similarity searching enhances the result set sem...
Preprint
Full-text available
Background. The image-based identification of distinct tissues within dermatological wounds enhances patients' care since it requires no intrusive evaluations. This manuscript presents an approach, which we named QTDU, that combines deep learning models with superpixel-driven segmentation methods for assessing the quality of tissues from dermatological u...
Article
Full-text available
Background: The image-based identification of distinct tissues within dermatological wounds enhances patients' care since it requires no intrusive evaluations. This manuscript presents an approach, which we named QTDU, that combines deep learning models with superpixel-driven segmentation methods for assessing the quality of tissues from dermatological...
Preprint
https://arxiv.org/abs/1906.10288
Segmentation of medical images is critical for making several processes of analysis and classification more reliable. With the growing number of people presenting back pain and related problems, the semi-automatic segmentation and 3D reconstruction of vertebral bodies became even more important to support decision m...
Conference Paper
Full-text available
As the variety and complexity of collected data increase, so does the need to analyze them by similarity. However, current Database Management Systems (DBMS) do not provide effective support for similarity queries, and research on the subject has, until now, provided only limited support through tools that are neither simple to deploy nor...
Article
The amount and variety of digital data currently being generated, stored, and analyzed, including images, videos, and time series, have brought challenges to data administrators, analysts, and developers, who struggle to meet the expectations of both data owners and end users. The majority of applications demand searching complex data by t...
Article
Background and objective: Identifying abnormalities in chest CT scans is an important and challenging task, demanding time and effort from specialists. Different parts of a single lung image may present both normal and abnormal characteristics. Thus, labeling an entire lung as healthy (normal) or not is inaccurate. Methods: In this work we propo...
Article
Full-text available
A DBMS optimizer module makes its decisions by modeling query costs upon the distribution of the data space. Cost modeling of similarity queries, however, requires representing the distribution of distances rather than of data. Therefore, finding a suitable representation (or synopsis) for the distance distribution has a major impact in...
Article
Full-text available
Content-based retrieval remains one of the main problems with respect to controversies and challenges in digital healthcare over big data. To properly address this problem, there is a need for efficient computational techniques, especially in scenarios involving queries across multiple data repositories. In such scenarios, the common computat...
Article
While Information Retrieval (IR) systems have gained success in Web-style search engines in the past two decades, the DataBase (DB) paradigm remains prevalent in handling data in enterprise environments and digital libraries, and is gaining even more importance in the Semantic Web with the increasing need to handle partly structured (N...
Conference Paper
Full-text available
Exploring large medical image sets by means of traditional similarity query criteria (e.g., neighborhood) can be fruitless if retrieved images are too similar among themselves. This demonstration introduces Kundaha, an exploration tool that assists experts in retrieving and navigating results from a diversified similarity perspective of user-pos...
Conference Paper
Complex networks contribute to computational research through their ability to represent systems modeled by vertices and edges. They provide means to describe urban structures through road networks, expressing predicates that refer to flow and transport in urban zones. This work aims to describe the interac...
Chapter
Full-text available
Complex networks are nowadays employed in several applications. Modeling urban street networks is one of them, in particular to analyze criminal aspects of a city. Several research groups have focused on this application, but so far there has been no well-defined methodology for employing complex networks in a whole crime analysis proces...
Article
Full-text available
In the past decade, there has been an increasing need for semantic-aware data search and indexing in textual (structured and NoSQL) databases, as full-text search systems became available to non-experts where users have no knowledge about the data being searched and often formulate query keywords which are different from those used by the authors i...
Conference Paper
Full-text available
Relational Database Management Systems (RDBMSs) are widely employed in several applications, including those that deal with data modeled as graphs. Existing solutions store every edge in a distinct row of the edge table; however, in most cases, such modeling does not provide adequate performance. In this work, we propose Edge-k, a technique to gro...
Article
Relational Database Management Systems (RDBMSs) are widely employed in several applications, including those that deal with data modeled as graphs. Existing solutions store every edge in a distinct row of the edge table; however, in most cases, such modeling does not provide adequate performance. In this work, we propose Edge-k, a technique to gro...
Conference Paper
Signature-based bag-of-visual-words techniques have been employed in image retrieval and analysis, with the benefit of dismissing expensive clustering processes. However, such techniques require multiple parameters, which may be unintuitive and in most cases depend on the application domain. In this paper,...
Conference Paper
Full-text available
TendeR-Sims (Tender Retrieval by Similarity) is a system that helps to search a database for satisfiable requests for tender lots by filtering out irrelevant lots, so that companies can easily discover the contracts they can win. The system implements the Similarity-aware Relational Division Operator in a commercial Relational Database Management System...
Article
Full-text available
Complex networks are commonly used to model urban street networks, which aids the analysis of criminal activities in cities. Despite several works focusing on this application, there is a lack of a clear methodology focused on the analysis of crime behavior. In this sense, we propose a methodology for employing complex networks in the anal...
Conference Paper
Grouping operators summarize data in a DBMS by arranging elements in groups using identity comparisons. However, for metric data, grouping by identity is seldom useful, since adopting the concept of similarity is often a better fit. There are operators that can group data elements using similarity. However, the existing operators do not achieve good res...
Conference Paper
Full-text available
In this paper, we present MAMMOSET, a compilation of datasets consisting of regions of interest (ROIs) of mammograms. MAMMOSET is composed of data collected from three diversified sources, namely DDSM, MINI-MIAS, and VIENNA. Accordingly, the images of MAMMOSET were obtained from distinct medical scanners and annotated in different manners. Our cont...
Conference Paper
Full-text available
As the amount of data represented as graphs grows, several frameworks are employing relational databases to manage them. However, the existing solutions store graphs by creating a row for each edge in an edge table. In this paper, we propose Edge-k, a novel storage approach that combines additional columns in the edges table, allowing one to tune the numbe...
Article
Modern Database Management Systems (DBMSs) retrieve songs that resemble those in a music dataset, identify plagiarism in a set of documents, or provide past cases to physicians by taking into account the characteristics of a query exam. All such tasks require the comparison of data by similarity, which can be expressed in terms of distance-based qu...
Article
Outlier removal from matched points is a crucial step in image matching/mosaicing. A single outlier might lead to incorrect adjustments. To deal with this issue, graph-based approaches have been proposed in the literature, but they are usually very time-consuming. Aimed at minimizing the processing time and keeping the matching accuracy, we de...
Article
Social media has become a popular and important tool for human communication. However, due to this popularity, spam and the distribution of malicious content by computer-controlled users, known as bots, have become a widespread problem. At the same time, when people use social media, they generate valuable data that can be used to understand the patt...
Conference Paper
Monitoring systems aimed at improving decision-making in emergency scenarios currently benefit from crowdsourced information. The main issue with this kind of data is that the gathered reports quickly become too similar among themselves. Hence, too many similar reports, namely near-duplicates, do not add valuable knowledge to assist crisi...
Poster
Full-text available
Here we show that epidemic propagation models, allied with network mapping techniques and community-related measures, can aid in the characterization of criminal behavior and dispersion in a city.