William Robson Schwartz

Federal University of Minas Gerais | UFMG · Departamento de Ciência da Computação

Ph.D. in Computer Science

About

209
Publications
88,121
Reads
5,218
Citations
Introduction
William Robson Schwartz is an Associate Professor in the Department of Computer Science at the Federal University of Minas Gerais, Brazil. He received his PhD degree in Computer Science from the University of Maryland, College Park, USA in 2010. His research interests include Computer Vision, Surveillance, Forensics, and Biometrics. He is the head of the Smart Surveillance Interest Group (SSIG), a research group focusing on large-scale surveillance.
Additional affiliations
February 2012 - present
Federal University of Minas Gerais
Position
  • Professor (Assistant)
October 2010 - February 2012
University of Campinas
Position
  • PostDoc Position
Education
March 1999 - March 2003
Universidade Federal do Paraná
Field of study
  • Computer Science


Publications (209)
Article
Automatic license plate recognition (ALPR) has been the focus of much research in recent years. In general, ALPR is divided into the following problems: detection of on-track vehicles, license plate detection, segmentation of license plate characters, and optical character recognition (OCR). Even though commercial solutions are available for co...
Conference Paper
Pedestrian detection is a well-known problem in Computer Vision. To improve detection, several feature descriptors have been proposed and combined. However, there are cases where the most powerful features fail to discriminate between false positives similar to the human body structure and actual true positives, which is a critical problem for app...
Conference Paper
Full-text available
Suitable feature representation is essential for performing video analysis and understanding in applications within the smart surveillance domain. In this paper, we propose a novel spatiotemporal feature descriptor based on co-occurrence matrices computed from the optical flow magnitude and orientation. Our method, called Optical Flow Co-occurrence...
Article
Face identification is an important research topic due to its application to areas such as surveillance, forensics and human-computer interaction. In the past few years, a myriad of methods for face identification have been proposed in the literature, with just a few among them focusing on scalability. In this work, we propose a simple but efficient...
Conference Paper
Person re-identification (Re-ID) maintains a global identity for an individual as they move across a large area covered by multiple cameras. Re-ID enables multi-camera monitoring of individual activity, which is critical for surveillance systems. However, the low-resolution images combined with the different poses, illumination conditions and came...
Conference Paper
Modern visual pattern recognition models are based on deep convolutional networks. Such models are computationally expensive, hindering applicability on resource-constrained devices. To handle this problem, we propose three strategies. The first removes unimportant structures (neurons or layers) of convolutional networks, reducing their computation...
Preprint
Full-text available
Face recognition has been one of the most relevant and explored fields of Biometrics. In real-world applications, face recognition methods usually must deal with scenarios where not all probe individuals were seen during the training phase (open-set scenarios). Therefore, open-set face recognition is a subject of increasing interest as it deals wit...
Preprint
Full-text available
Deeply learned representations are the state-of-the-art descriptors for face recognition methods. These representations encode latent features that are difficult to explain, compromising the confidence and interpretability of their predictions. Most attempts to explain deep features are visualization techniques that are often open to interpretation...
Article
Full-text available
Facial biometrics tend to be spontaneous, instinctive and less human-intrusive. They are regularly employed in the authentication of authorized users and personnel to protect data from violation attacks. A face spoofing attack usually comprises the illegal attempt to access valuable undisclosed information as a trespasser attempts to impersonate an i...
Article
Full-text available
Gait is a biometric trait that identifies individuals by the way they walk. It has recently been gaining attention because it can be collected at a distance and does not require subject cooperation, which is desirable in surveillance scenarios. Despite these advantages, the literature reports challenging situations where gait recognition is...
Article
Full-text available
This paper presents an efficient and layout-independent Automatic License Plate Recognition (ALPR) system based on the state-of-the-art YOLO object detector that contains a unified approach for license plate (LP) detection and layout classification to improve the recognition results using post-processing rules. The system is conceived by evaluating...
Article
The diversity of pedestrian detectors proposed in recent years has encouraged some works to fuse them to achieve a more accurate detection. The intuition behind it is to combine the detectors based on their spatial consensus. The hypothesis is that a location pointed by multiple detectors has a high probability of actually belonging to a pedestrian,...
Preprint
Full-text available
This paper presents an efficient and layout-independent Automatic License Plate Recognition (ALPR) system based on the state-of-the-art YOLO object detector that contains a unified approach for license plate (LP) detection and layout classification to improve the recognition results using post-processing rules. The system is conceived by evaluating...
Conference Paper
Modern convolutional networks such as ResNet and NASNet have achieved state-of-the-art results in many computer vision applications. These architectures consist of stages, which are sets of layers that operate on representations in the same resolution. It has been demonstrated that increasing the number of layers in each stage improves the predicti...
Conference Paper
Dimensionality reduction plays an important role in computer vision problems since it reduces computational cost and is often capable of yielding more discriminative data representation. In this context, Partial Least Squares (PLS) has presented notable results in tasks such as image classification and neural network optimization. However, PLS is i...
Conference Paper
A lot of information may be extracted from the Earth’s surface through aerial images. This information may assist in myriad applications, such as urban planning, crop and forest management, disaster relief, etc. However, the process of distilling this information is strongly based on efficiently encoding the spatial features, a challenging task. Fa...
Conference Paper
This work addresses the activity recognition problem. We propose two different representations based on motion information for activity recognition. The first representation is a novel temporal stream for two-stream Convolutional Neural Networks (CNNs) that receives as input images computed from the optical flow magnitude and orientation to learn t...
Conference Paper
Full-text available
Open-set face recognition describes a scenario where unknown subjects, unseen during the training stage, appear at test time. Not only does it require methods that accurately identify individuals of interest, but it also demands approaches that effectively deal with unfamiliar faces. This work details a scalable open-set face identification approach to galler...
Preprint
This paper presents a novel human skin detection approach based on the employment of a dual autoencoder architecture, composed of models to detect background and skin zones concomitantly. Our method, named Dual-Autoencoder Skin Predictor (DASP), associates the outputs of two autoencoders through a composite loss, responsible for minimizing the error bet...
Preprint
Full-text available
A face spoofing attack occurs when an intruder attempts to impersonate someone who carries a gainful authentication clearance. It is a trending topic due to the increasing demand for biometric authentication on mobile devices, high-security areas, among others. This work introduces a new database named Sense Wax Attack dataset (SWAX), comprised of...
Conference Paper
Full-text available
A face spoofing attack occurs when an intruder attempts to impersonate someone who carries a gainful authentication clearance. It is a trending topic due to the increasing demand for biometric authentication on mobile devices, high-security areas, among others. This work introduces a new database named Sense Wax Attack dataset (SWAX), comprised of...
Preprint
Full-text available
Communication through gestures plays a relevant role in human life, in which a non-verbal channel is used to propagate information among individuals. Due to its applicability in several contexts, gesture recognition has been investigated by different approaches. However, most of these methods do not address issues such as class imbalance and...
Preprint
Modern convolutional networks such as ResNet and NASNet have achieved state-of-the-art results in many computer vision applications. These architectures consist of stages, which are sets of layers that operate on representations in the same resolution. It has been demonstrated that increasing the number of layers in each stage improves the predicti...
Article
Modern visual pattern recognition methods are based on convolutional networks since they are able to learn complex patterns directly from the data. However, convolutional networks are computationally expensive in terms of floating point operations (FLOPs), energy consumption and memory requirements, which hinder their deployment on low-power and re...
Preprint
Full-text available
This paper presents an approach to perform human activity recognition in videos through the employment of a deep recurrent network, taking as inputs appearance and optical flow information. Our method proposes a novel architecture named BubbleNET, which is based on a recurrent layer dispersed into several modules (referred to as bubbles) along with...
Article
Sensor-based Human Activity Recognition (sensor-based HAR) has been used in many real-world applications, providing valuable knowledge to many areas, such as human-object interaction, medicine, the military and security. Recently, wearable devices have progressively gained momentum due to the relevant data provided by their sensors, which could be empl...
Article
Full-text available
The predictive ability of convolutional neural networks (CNNs) can be improved by increasing their depth. However, increasing depth also increases computational cost significantly, in terms of both floating point operations and memory consumption, hindering applicability on resource-constrained systems such as mobile and internet of things (IoT) de...
Conference Paper
Full-text available
A face spoofing attack occurs when an intruder attempts to impersonate someone with a desirable authentication clearance. To detect such intrusions, many researchers have dedicated their efforts to study visual liveness detection as the primary indicator to block spoofing violations. In this work, we contemplate low-power devices through the combin...
Chapter
Full-text available
License plate recognition is an important task applied to a myriad of important scenarios. Even though there are several methods for performing license plate recognition, our approach is designed to work not only on high resolution license plates but also when the license plate characters are not recognizable by humans. Early approaches divided the...
Chapter
Gait is a biometric trait that differentiates individuals by the way they walk. Research on this topic has gained attention since it is unobtrusive and can be collected at a distance, which is desirable in surveillance scenarios. Most of the previous works have focused on human silhouette as representation. However, they suffer from many factors such as move...
Conference Paper
This work addresses the person re-identification problem, which consists of matching images of individuals captured by multiple and non-overlapping surveillance cameras. Works in the literature tackle this problem by proposing robust feature descriptors and matching functions, where the latter is responsible for assigning the correct identity for individual...
Conference Paper
Sensor-based Human Activity Recognition (HAR) provides valuable knowledge to many areas. Recently, wearable devices have gained space as a relevant source of data. However, there are two issues: large number of heterogeneous sensors available and the temporal nature of the sensor data. To handle these issues, we propose a multimodal approach that p...
Conference Paper
Communication through gestures plays a relevant role in human life, in which a non-verbal language is used to propagate information among individuals. To recognize gestures, computers need to represent and interpret human appearance and motion, involving hands, arms, face, head and/or body, in a mathematical sense. Despite the high applicability in...
Preprint
In recent years, the computer vision research community has studied how to model temporal dynamics in videos for 3D human action recognition. To that end, two main baseline approaches have been researched: (i) Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM); and (ii) skeleton image representations used as input to a C...
Conference Paper
Full-text available
Due to the availability of large-scale skeleton datasets, 3D human action recognition has recently attracted the attention of the computer vision community. Many works have focused on encoding skeleton data as skeleton image representations based on the spatial structure of the skeleton joints, in which the temporal dynamics of the sequence is encoded as vari...
Preprint
Due to the availability of large-scale skeleton datasets, 3D human action recognition has recently attracted the attention of the computer vision community. Many works have focused on encoding skeleton data as skeleton image representations based on the spatial structure of the skeleton joints, in which the temporal dynamics of the sequence is encoded as vari...
Article
The temporal component of videos provides an important clue for activity recognition, as a number of activities can be reliably recognized based on the motion information. In view of that, this work proposes a novel temporal stream for two-stream convolutional networks based on images computed from the optical flow magnitude and orientation, named...
Preprint
Full-text available
The recent impressive results of deep learning-based methods on computer vision applications brought fresh air to the research and industrial community. This success is mainly due to the process that allows those methods to learn data-driven features, generally based upon linear operations. However, in some scenarios, such operations do not have a...
Article
Semantic segmentation requires methods capable of learning high-level features while dealing with large volumes of data. Toward this goal, convolutional networks can learn specific and adaptable features based on the data. However, these networks are not capable of processing a whole remote sensing image, given its huge size. To overcome such limita...
Thesis
Full-text available
The dynamic video summarization of surveillance videos has several critical applications, mainly due to the wide availability of digital cameras in environments such as airports, train and bus stations, shopping centers, stadiums, buildings, schools, hospitals, roads, among others. This study presents an approach for the generation of dynamic summa...
Preprint
Dimensionality reduction plays an important role in computer vision problems since it reduces computational cost and is often capable of yielding more discriminative data representation. In this context, Partial Least Squares (PLS) has presented notable results in tasks such as image classification and neural network optimization. However, PLS is i...
Article
Full-text available
We tackle automatic meter reading (AMR) by leveraging the high capability of convolutional neural networks (CNNs). We design a two-stage approach that employs the Fast-YOLO object detector for counter detection and evaluates three different CNN-based approaches for counter recognition. In the AMR literature, most datasets are not available to the r...
Conference Paper
Gestures are related to a non-verbal language used in the interaction between subjects. Due to its applicability in several contexts, gesture recognition has been investigated by different research efforts, often focusing on capturing motion and appearance in videos. However, most of these methods do not properly explore the well-defined gesture te...
Conference Paper
Video understanding is the next frontier of computer vision, in which activity recognition plays a major role. Despite the recent improvements in holistic activity recognition, further researching part-based models such as context may allow us to better understand what is important for activities and thus improve our current activity recognition mo...
Conference Paper
Full-text available
Person Re-Identification is all about determining a person's entire course as s/he walks around camera-equipped zones. More precisely, person Re-ID is the problem of matching human identities captured from non-overlapping surveillance cameras. In this work, we propose an approach that learns a new low-dimensional metric space in an attempt to cut d...
Conference Paper
Full-text available
With the increasing number of cameras available in cities, video traffic analysis can provide useful insights for the transportation segment. One such analysis is Automatic License Plate Recognition (ALPR). Previous approaches divided this task into several cascaded subtasks, i.e., vehicle location, license plate detection, character seg...
Conference Paper
Discovering regions of sports interest in a set of images acquired from a scene at different times and possibly from different viewpoints and cameras is a crucial step for many applications. Physical activity can be effective at all stages of chronic disease; therefore, finding regions with the presence of physical activities might contribut...
Preprint
Modern pattern recognition methods are based on convolutional networks since they are able to learn complex patterns that benefit the classification. However, convolutional networks are computationally expensive and require a considerable amount of memory, which limits their deployment on low-power and resource-constrained systems. To handle these...
Article
Full-text available
An increasing number of works have investigated the use of convolutional neural network (ConvNet) approaches to perform human activity recognition (HAR) based on wearable sensor data. These approaches present state-of-the-art results in HAR, outperforming traditional approaches, such as handcrafted methods and 1D convolutions. Motivated by this, i...
Article
Full-text available
Human activity recognition based on wearable sensor data has been an attractive research topic due to its application in areas such as healthcare and smart environments. In this context, many works have presented remarkable results using accelerometer, gyroscope and magnetometer data to represent the categories of activities. However, the current s...
Conference Paper
Full-text available
Automatic License Plate Recognition (ALPR) has been a frequent topic of research due to many practical applications. However, many of the current solutions are still not robust in real-world situations, commonly depending on many constraints. This paper presents a robust and efficient ALPR system based on the state-of-the-art YOLO object detector....
Article
Person reidentification (Re-ID) aims at establishing global identities for individuals as they move across a camera network. It is a challenging task due to the drastic appearance changes that occur between cameras as a consequence of different pose and illumination conditions. Pairwise matching models yield state-of-the-art results in most of the p...
Preprint
Full-text available
Human activity recognition based on wearable sensor data has been an attractive research topic due to its application in areas such as healthcare and smart environments. In this context, many works have presented remarkable results using accelerometer, gyroscope and magnetometer data to represent the categories of activities. However, the current s...
Preprint
Full-text available
The variety of pedestrian detectors proposed in recent years has encouraged some works to fuse pedestrian detectors to achieve a more accurate detection. The intuition behind this is to combine the detectors based on their spatial consensus. We propose a novel method called Content-Based Spatial Consensus (CSBC), which, in addition to relying on spatial...
Conference Paper
Features extracted with deep learning have now achieved state-of-the-art results in many tasks. However, to reuse a learned deep model, transfer learning with fine-tuning needs to be employed, which requires re-training the whole model or part of it to extract useful features in the new domain. This step is burdensome and requires heavy computing p...