J.C. Sanmiguel

  • PhD
  • Professor (Associate) at Autonomous University of Madrid

About

78
Publications
11,031
Reads
975
Citations
Current institution
Autonomous University of Madrid
Current position
  • Professor (Associate)


Publications (78)
Article
Camera networks that reconfigure while performing multiple tasks have unique requirements, such as concurrent task allocation with limited resources, the sharing of data among fields of view across the network, and coordination among heterogeneous devices.
Article
We propose an adaptive framework to estimate the quality of video tracking algorithms without ground-truth data. The framework is divided into two main stages, namely, the estimation of the tracker condition to identify temporal segments during which a target is lost and the measurement of the quality of the estimated track when the tracker is succ...
Article
We propose an approach to create camera coalitions in resource-constrained camera networks and demonstrate it for collaborative target tracking. We cast coalition formation as a decentralized resource allocation process where the best cameras among those viewing a target are assigned to a coalition based on marginal utility theory. A manager is dyn...
Article
Full-text available
Camera networks require heavy visual-data processing and high-bandwidth communication. In this paper, we identify key factors underpinning the development of resource-aware algorithms and we propose a comprehensive energy consumption model for the resources employed by smart-camera networks, which are composed of cameras that process data locally an...
Conference Paper
Full-text available
In several video surveillance applications, such as the detection of abandoned/stolen objects or parked vehicles, the detection of stationary foreground objects is a critical task. In the literature, many algorithms have been proposed that deal with the detection of stationary foreground objects, the majority of them based on background subtraction...
Article
Full-text available
Merging parameters of multiple models has resurfaced as an effective strategy to enhance task performance and robustness, but prior work is limited by the high costs of ensemble creation and inference. In this paper, we leverage the abundance of freely accessible trained models to introduce a cost-free approach to model merging. It focuses on a lay...
Article
Full-text available
Unsupervised domain adaptation (UDA) offers a compelling solution to bridge the gap between labelled synthetic data and unlabelled real‐world data for training semantic segmentation models, given the high costs associated with manual annotation. However, the visual differences between the synthetic and real images pose significant challenges to the...
Preprint
Full-text available
Semantic segmentation is a computer vision task where classification is performed at a pixel level. Due to this, the process of labeling images for semantic segmentation is time-consuming and expensive. To mitigate this cost there has been a surge in the use of synthetically generated data -- usually created using simulators or videogames -- which,...
Preprint
Full-text available
This paper introduces a novel synthetic dataset that captures urban scenes under a variety of weather conditions, providing pixel-perfect, ground-truth-aligned images to facilitate effective feature alignment across domains. Additionally, we propose a method for domain adaptation and generalization that takes advantage of the multiple versions of e...
Preprint
Full-text available
Segmentation models are typically constrained by the categories defined during training. To address this, researchers have explored two independent approaches: adapting Vision-Language Models (VLMs) and leveraging synthetic data. However, VLMs often struggle with granularity, failing to disentangle fine-grained concepts, while synthetic data-based...
Preprint
Full-text available
Due to the difficulty of replicating the real conditions during training, supervised algorithms for spacecraft pose estimation experience a drop in performance when trained on synthetic data and applied to real operational data. To address this issue, we propose a test-time adaptation approach that leverages the temporal redundancy between images a...
Article
Due to the difficulty of replicating the real conditions during training, supervised algorithms for spacecraft pose estimation experience a drop in performance when trained on synthetic data and applied to real operational data. To address this issue, we propose a test-time adaptation approach that leverages the temporal redundancy between images a...
Preprint
Full-text available
Merging parameters of multiple models has resurfaced as an effective strategy to enhance task performance and robustness, but prior work is limited by the high costs of ensemble creation and inference. In this paper, we leverage the abundance of freely accessible trained models to introduce a cost-free approach to model merging. It focuses on a lay...
Preprint
Full-text available
In unsupervised domain adaptation (UDA), where models are trained on source data (e.g., synthetic) and adapted to target data (e.g., real-world) without target annotations, addressing the challenge of significant class imbalance remains an open issue. Despite considerable progress in bridging the domain gap, existing methods often experience perfor...
Conference Paper
Full-text available
Diffusion models represent a new paradigm in text-to-image generation. Beyond generating high-quality images from text prompts, models such as Stable Diffusion have been successfully extended to the joint generation of semantic segmentation pseudo-masks. However, current extensions primarily rely on extracting attentions linked to prompt words us...
Article
Full-text available
Accurate training of deep neural networks for semantic segmentation requires a large number of pixel-level annotations of real images, which are expensive to generate or not even available. In this context, Unsupervised Domain Adaptation (UDA) can transfer knowledge from unlimited synthetic annotations to unlabeled real images of a given domain. UD...
Article
Knowledge Distillation (KD) is a strategy for the definition of a set of transferability gangways to improve the efficiency of Convolutional Neural Networks. Feature-based Knowledge Distillation is a subfield of KD that relies on intermediate network representations, either unaltered or depth-reduced via maximum activation maps, as the source knowl...
Article
Full-text available
Pixel-wise image segmentation is key for many Computer Vision applications. The training of deep neural networks for this task has expensive pixel-level annotation requirements, thus motivating a growing interest in synthetic data to provide unlimited data and its annotations. In this paper, we focus on the generation and application of synthetic...
Preprint
In semantic segmentation, training data down-sampling is commonly performed because of limited resources, adapting image size to the model input, or improving data augmentation. This down-sampling typically employs different strategies for the image data and the annotated labels. Such discrepancy leads to mismatches between the down-sampled pixels...
Preprint
Full-text available
How would you fairly evaluate two multi-object tracking algorithms (i.e. trackers), each one employing a different object detector? Detectors keep improving, thus trackers can make less effort to estimate object states over time. Is it then fair to compare a new tracker employing a new detector with another tracker using an old detector? In this pa...
Preprint
Full-text available
This letter focuses on the task of Multi-Target Multi-Camera vehicle tracking. We propose to associate single-camera trajectories into multi-camera global trajectories by training a Graph Convolutional Network. Our approach simultaneously processes all cameras providing a global solution, and it is also robust to large camera unsynchronizations. F...
Preprint
Knowledge Distillation (KD) is a strategy for the definition of a set of transferability gangways to improve the efficiency of Convolutional Neural Networks. Feature-based Knowledge Distillation is a subfield of KD that relies on intermediate network representations, either unaltered or depth-reduced via maximum activation maps, as the source knowl...
Preprint
Cross-camera image data association is essential for many multi-camera computer vision tasks, such as multi-camera pedestrian detection, multi-camera multi-target tracking, 3D pose estimation, etc. This association task is typically stated as a bipartite graph matching problem and often solved by applying minimum-cost flow techniques, which may be compu...
Article
Full-text available
Multi-Target Multi-Camera (MTMC) vehicle tracking is an essential task of visual traffic monitoring, one of the main research fields of Intelligent Transportation Systems. Several offline approaches have been proposed to address this task; however, they are not compatible with real-world applications due to their high latency and post-processing re...
Preprint
Full-text available
Cross-camera image data association is essential for many multi-camera computer vision tasks, such as multi-camera pedestrian detection, multi-camera multi-target tracking, 3D pose estimation, etc. This association task is typically stated as a bipartite graph matching problem and often solved by applying minimum-cost flow techniques, which may be...
Article
Full-text available
Advances in deep-learning-based pipelines have led to breakthroughs in a variety of microscopy image diagnostics. However, a sufficiently big training data set is usually difficult to obtain due to high annotation costs. In the case of banded chromosome images, the creation of big enough libraries is difficult for multiple pathologies due to the ra...
Article
Cross-camera image data association is essential for many multi-camera computer vision tasks, such as multi-camera pedestrian detection, multi-camera multi-target tracking, 3D pose estimation, etc. This association task is typically modeled as a bipartite graph matching problem and often solved by applying minimum-cost flow techniques, which may be...
Preprint
State-of-the-art deep learning approaches for skin lesion recognition often require pretraining on larger and more varied datasets, to overcome the generalization limitations derived from the reduced size of the skin lesion imaging datasets. ImageNet is often used as the pretraining dataset, but its transferring potential is hindered by the domain...
Preprint
Full-text available
Advances in deep-learning-based pipelines have led to breakthroughs in a variety of microscopy image diagnostics. However, a sufficiently big training data set is usually difficult to obtain due to high annotation costs. In the case of banded chromosome images, the creation of big enough libraries is difficult for multiple pathologies due to the ra...
Preprint
Full-text available
Multi-Target Multi-Camera (MTMC) vehicle tracking is an essential task of visual traffic monitoring, one of the main research fields of Intelligent Transportation Systems. Several offline approaches have been proposed to address this task; however, they are not compatible with real-world applications due to their high latency and post-processing re...
Preprint
This paper presents a novel approach for segmenting moving objects in unconstrained environments using guided convolutional neural networks. This guiding process relies on foreground masks from independent algorithms (i.e. state-of-the-art algorithms) to implement an attention mechanism that incorporates the spatial location of foreground and backg...
Chapter
Full-text available
Drones equipped with cameras have been rapidly deployed in a wide range of applications, such as agriculture, aerial photography, fast delivery, and surveillance. As the core steps in those applications, video object detection and tracking have attracted much research effort in recent years. However, the current video object detection and tracking algorithm...
Article
Full-text available
Applying people detectors to unseen data is challenging since pattern distributions, such as viewpoints, motion, poses, backgrounds, occlusions and people sizes, may significantly differ from those of the training dataset. In this paper, we propose a coarse-to-fine framework to adapt frame by frame people detectors during runtime classification...
Article
Full-text available
Finding optimal parametrizations for people detectors is a complicated task due to the large number of parameters and the high variability of application scenarios. In this paper, we propose a framework to adapt and improve any detector automatically in multi-camera scenarios where people are observed from various viewpoints. By accurately transfer...
Article
Full-text available
During the last few years, abandoned object detection has emerged as a hot topic in the video-surveillance community. As a consequence, a myriad of systems has been proposed for automatic monitoring of public and private places, while addressing several challenges affecting detection performance. Due to the complexity of these systems, researchers...
Article
A plethora of algorithms have been defined for foreground segmentation, a fundamental stage for many computer vision applications. In this work, we propose a post-processing framework to improve foreground segmentation performance of background subtraction algorithms. We define a hierarchical framework for extending segmented foreground pixels to u...
Article
Foreground segmentation is a key stage in multiple computer vision applications, where existing algorithms are commonly evaluated making use of ground-truth data. Reference-free or stand-alone evaluations that estimate segmented foreground quality are an alternative methodology to overcome the limitations inherent to ground-truth based evaluations....
Article
WiSE-Mnet++ is a holistic simulator that abstracts the key functions of smart-camera networks and models the main operations to account for hardware capabilities, the complexities of visual data, and their associated high-data-rate communication.
Chapter
Performance evaluation of visual tracking approaches (trackers) based on ground-truth data makes it possible to determine their strengths and weaknesses. In this paper, we present a methodology for tracker evaluation that quantifies performance against variations of the tracker input (data and configuration). It addresses three aspects: dataset, performance c...
Article
Background estimation in video consists of extracting a foreground-free image from a set of training frames. Moving and stationary objects may affect the background visibility, thus invalidating the assumption, common in the related literature, that the background is the temporally dominant data. In this paper, we present a temporal-spatial block-level approach...
Article
We propose a decision-level approach to fuse the output of multiple trackers based on their estimated individual performance. The proposed approach is divided into three steps. First, we group trackers into clusters based on the spatiotemporal pair-wise correlation of their short-term trajectories. Then, we evaluate performance based on reverse-tim...
Article
We present a block-wise approach to detect stationary objects based on spatio-temporal change detection. First, block candidates are extracted by filtering out consecutive blocks containing moving objects. Then, an online clustering approach groups similar blocks at each spatial location over time via statistical variation of pixel ratios. The stab...
Article
A novel approach for part-based people detection in images that uses contextual information is proposed. Two sources of context are distinguished regarding the local (neighbour) information and the relative importance of the parts in the model. Local context determines part visibility which is derived from the spatial location of static objects in...
Article
A novel approach is proposed for online evaluation of video tracking without ground-truth data. The temporal evolution of the covariance features is exploited to detect the stability of the tracker output over time. A model validation strategy performs such detection without learning the failure cases of the tracker under evaluation. Then, the trac...
Conference Paper
Full-text available
The choice of the most suitable fusion scheme for smart camera networks depends on the application as well as on the available computational and communication resources. In this paper we discuss and compare the resource requirements of five fusion schemes, namely centralised fusion, flooding, consensus, token passing and dynamic clustering. The Ex...
Conference Paper
We propose a novel approach for stationary foreground detection in crowds based on the spatio-temporal evolution of multiple features. A generic framework is presented to detect stationarity where history images model the spatio-temporal feature patterns. A feature is proposed based on structural information over each pixel neighborhood for dealing...
Conference Paper
Full-text available
Consensus-based target tracking in camera networks faces three major problems: non-linearity in the measurement model, temporary lack of measurements (naivety) due to the limited field of view (FOV) and redundancy in the iterative exchange of information. In this paper we propose two consensus-based distributed algorithms for non-linear systems usi...
Article
We present an approach for determining the temporal consistency of particle filters in video tracking based on model validation of their uncertainty over sliding windows. The filter uncertainty is related to the consistency of the dispersion of the filter hypotheses in the state space. We learn an uncertainty model via a mixture of Gamma distributi...
Article
This paper presents an approach for skin detection which is able to adapt its parameters to image data captured from video monitoring tasks with a medium field of view. It is composed of two detectors designed to get high and low probable skin pixels (respectively, regions and isolated pixels). Each one is based on thresholding two color channels,...
Technical Report
During recent years, automatic video-surveillance systems have experienced a great development driven by the growing need for security. Many approaches exist whose performance is not clear for a large variety of available scenarios. To precisely identify which ones operate better for each scenario, empirical performance evaluation has been widely u...
Conference Paper
Stationary foreground detection is a common stage in many video-surveillance applications. In this paper, we propose an approach for stationary foreground detection in video based on the spatio-temporal variation of foreground and motion data. Foreground data are obtained by Background Subtraction to detect regions of interest. Motion data allows t...
Article
This paper presents an approach for real-time video event recognition that combines the accuracy and descriptive capabilities of, respectively, probabilistic and semantic approaches. Based on a state-of-the-art knowledge representation, we define a methodology for building recognition strategies from event descriptions that consider the uncertainty of...
Conference Paper
We present an approach for performance evaluation of deterministic video trackers without ground-truth data. The proposed approach detects if a tracker is correctly operating over time using two main steps. First, it transforms the output of the localization step into a distribution of the target state, which emulates a multi-hypothesis tracker. Th...
Article
A novel approach is proposed for discriminating between abandoned or stolen previously detected stationary foreground regions in video surveillance. It is based on measuring the colour contrast of the contour of the stationary object under analysis at pixel level. Two contrasts are computed by analysing such a contour in the current and background...
Conference Paper
In this paper we propose an approach based on active contours to discriminate previously detected static foreground regions between abandoned and stolen. Firstly, the static foreground object contour is extracted. Then, an active contour adjustment is performed on the current and the background frames. Finally, similarities between the initial cont...
Article
The authors present a feedback-based approach for event detection in video surveillance that improves the detection accuracy and dynamically adapts the computational effort depending on the complexity of the analysed data. A core feedback structure is proposed based on defining different levels of detail for the analysis performed and estimating th...
Conference Paper
This paper presents a real-time video event recognition system for controlled environments. It is able to recognize human activities and interactions with the objects of the environment by exploiting different cues like trajectory analysis, skin detection and people recognition of the foreground blobs of the scene. Time variations of these features...
Article
This paper presents a distributed and scalable framework for video analysis that automatically estimates the optimal workflow required for the analysis of different application domains. It integrates several technologies related with data acquisition, visual analysis tools, communication protocols, and data storage. Moreover, hierarchical semantic...
Conference Paper
In this paper we describe a new algorithm focused on obtaining stationary foreground regions, which is useful for applications like the detection of abandoned/stolen objects and parked vehicles. Firstly, a sub-sampling scheme based on background subtraction techniques is implemented to obtain stationary foreground regions. Secondly, some modificati...
Conference Paper
Failure of tracking algorithms is inevitable in real and on-line tracking systems. The online estimation of the track quality is therefore desirable for detecting tracking failures while the algorithm is operating. In this paper, we propose a taxonomy and present a comparative evaluation of online quality estimators for video object tracking. The m...
Conference Paper
In video-surveillance systems, the moving object segmentation stage (commonly based on background subtraction) has to deal with several issues like noise, shadows and multimodal backgrounds. Hence, its failure is inevitable and its automatic evaluation is a desirable requirement for online analysis. In this paper, we propose a hierarchy of existing perfo...
Conference Paper
In this paper we describe how the knowledge related to a specific domain and the available visual analysis tools can be used to create dynamic visual analysis systems for video surveillance. Firstly, the knowledge is described in terms of application domain (types of objects, events… that can appear in such domain) and system capabilities (algorith...
Conference Paper
This paper starts from the idea of automatically choosing the appropriate thresholds for a shadow detection algorithm. It is based on the maximization of the agreement between two independent shadow detectors without training data. Firstly, this shadow detection algorithm is described and then, it is adapted to analyze video surveillance sequences....
Conference Paper
Full-text available
In this paper, we propose an ontology for representing the prior knowledge related to video event analysis. It is composed of two types of knowledge related to the application domain and the analysis system. Domain knowledge involves all the high level semantic concepts in the context of each examined domain (objects, events, context...) whilst sys...
Conference Paper
Full-text available
In this paper a new approach for detecting unattended or stolen objects in surveillance video is proposed. It is based on the fusion of evidence provided by three simple detectors. As a first step, the moving regions in the scene are detected and tracked. Then, these regions are classified as static or dynamic objects and human or nonhuman objects....
Conference Paper
This paper describes a generic, scalable, and distributed framework for real-time video-analysis intended for research, prototyping and services deployment purposes. The architecture considers multiple cameras and is based on a server/client model. The information generated by each analysis module and the context information are made accessible to...
Conference Paper
This paper presents the results of analysing the effect of different motion segmentation techniques in a system that transmits the information captured by a static surveillance camera in an adaptive way based on the on-line generation of descriptions and their descriptions at different levels of detail. The video sequences are analyzed to detect...
Conference Paper
This paper presents a system to transmit the information from a static surveillance camera in an adaptive way, from low to higher bit-rate, based on the on-line generation of descriptions. The proposed system is based on a server/client model: the server is placed in the surveillance area and the client is placed in a user side. The server analyzes...
