Fernando Díaz-de-María

University Carlos III de Madrid | UC3M · Department of Signal Theory and Communications

PhD in Telecommunication Engineering

About

123 Publications
11,825 Reads
1,371 Citations
Introduction
His current research interests include image and video processing and computer vision. Current research focuses on object detection and recognition, image segmentation, medical images, and visual attention.

Publications (123)
Article
Cell detection and tracking applied to in vivo fluorescence microscopy has become an essential tool in biomedicine to characterize 4D (3D space plus time) biological processes at the cellular level. Traditional approaches to cell motion analysis by microscopy imaging, although based on automatic frameworks, still require manual supervision at some...
Article
Full-text available
Transcriptional and proteomic profiling of individual cells have revolutionized interpretation of biological phenomena by providing cellular landscapes of healthy and diseased tissues [1,2]. These approaches, however, do not describe dynamic scenarios in which cells continuously change their biochemical properties and downstream ‘behavioural’ outputs [3]...
Article
Although CNNs are a very powerful tool for image retrieval, the need for training datasets properly adapted to the application at hand hinders the usefulness of such networks, especially since the datasets need to be free of noise to avoid spoiling the learning process. An ad hoc preprocessing of the dataset to mitigate the noise is a possible so...
Article
Full-text available
Human eye movements while driving reveal that visual attention largely depends on the context in which it occurs. Furthermore, an autonomous vehicle which performs this function would be more reliable if its outputs were understandable. Capsule Networks have been presented as a great opportunity to explore new horizons in the Computer Vision field,...
Article
Vanishing Point (VP) detection is a computer vision task that can be useful in many different fields of application. In this work, we present a VP detection algorithm for natural landscape images based on a multi-threshold edge extraction process that combines several representations of an image, and on novel clustering and cluster refinement proc...
Article
Full-text available
Image perception can vary considerably between subjects, yet some sights are regarded as aesthetically pleasant more often than others due to their specific visual content, this being particularly true in tourism-related applications. We introduce the ESITUR project, oriented towards the development of 'smart tourism' solutions aimed at improving t...
Article
The process of determining relevant landmarks within a certain region is a challenging task, mainly due to its subjective nature. Many of the current lines of work include the use of density-based clustering algorithms as the base tool for such a task, as they permit the generation of clusters of different shapes and sizes. However, there are still...
Article
Full-text available
Modern computer vision techniques have to deal with vast amounts of visual data, which implies a computational effort that often has to be accomplished in broad and challenging scenarios. The interest in efficiently solving these image and video applications has led researchers to develop methods to expertly drive the corresponding processing to co...
Article
The electrodermal activity (EDA) is a psychophysiological indicator which can be considered a somatic marker of the emotional and attentional reaction of subjects towards stimuli. EDA measurements are not biased by the cognitive process of giving an opinion or a score to characterize the subjective perception, and group-level EDA recordings integra...
Article
License plate detection is a common problem in traffic surveillance applications. Although some solutions have been proposed in the literature, their success is usually restricted to very specific scenarios, with their performance dropping in more demanding conditions. One of the main challenges to be addressed for this kind of system is the varyi...
Article
Full-text available
Content based video indexing and retrieval (CBVIR) is a lively area of research which focuses on automating the indexing, retrieval and management of videos. This area has a wide spectrum of promising applications where assessing the impact of audiovisual productions emerges as a particularly interesting and motivating one. In this paper we present...
Code
This file is the MATLAB code related to the paper "Optimized Update/Prediction Assignment for Lifting Transforms on Graphs", Eduardo Martínez-Enríquez, Jesús Cid-Sueiro, Fernando Díaz-de-María and Antonio Ortega. The user will find three different folders with all the material needed to recreate the results in the paper. 1) Folder Optimized-Updat...
Article
Full-text available
Computer-Aided Diagnosis (CAD) systems for melanoma detection have received a lot of attention during the last decades because of the utmost importance of detecting this type of skin cancer in its early stages. However, despite the many research efforts devoted to this matter, these systems are not yet used in everyday clinical practice. Very li...
Article
Full-text available
Transformations on graphs can provide compact representations of signals with many applications in denoising, feature extraction or compression. In particular, lifting transforms have the advantage of being critically sampled and invertible by construction, but the efficiency of the transform depends on the choice of a good bipartition of the graph...
Conference Paper
Full-text available
This paper contributes to the field of affective video content analysis through the novel employment of electrodermal activity (EDA) measurements as ground truth for machine learning algorithms. The variation of the electrical properties of the skin, known as EDA, is a psychophysiological indicator widely used in medicine, psychology and neuroscien...
Article
The latest High Efficiency Video Coding standard (HEVC) provides a set of new coding tools to achieve a significantly higher coding efficiency than previous standards. In this standard, the pixels are first grouped into Coding Units (CU), then Prediction Units (PU), and finally Transform Units (TU). All these coding levels are organized into a quad...
Article
The latest High Efficiency Video Coding (HEVC) standard relies on a large number of coding tools from which the encoder should choose for every coding unit. This optimization process is based on the minimization of a Lagrangian cost function that evaluates the distortion produced and the bit-rate needed to encode each coding unit. The value of the...
Article
Full-text available
In this work we describe and optimize a general scheme based on lifting transforms on graphs for video coding. A graph is constructed to represent the video signal. Each pixel becomes a node in the graph and links between nodes represent similarity between them. Therefore, spatial neighbors and temporal motion-related pixels can be linked, while no...
Article
In the last few years large-scale image retrieval has attracted a lot of attention from the multimedia community. Usual approaches addressing this task first generate an initial ranking of the reference images using fast approximations that do not take into consideration the spatial arrangement of local features in the image (e.g. the Bag-of-Words...
Article
Full-text available
Automatic aesthetics prediction of multimedia content is bound to be a powerful tool for artificial intelligence due to the wide range of applications where it could be used. With this paper we contribute to the research in the field of video aesthetics assessment by carrying out a comparative study of (1) the performance of eight families of visua...
Conference Paper
Full-text available
The modeling of visual attention has gained much interest during the last few years since it allows complex visual processes to be efficiently driven to particular areas of images or video frames. Although the literature concerning bottom-up saliency models is vast, we still lack generic approaches modeling top-down task and context-driven visual at...
Article
The emerging high-efficiency video coding standard achieves higher coding efficiency than previous standards by virtue of a set of new coding tools such as the quadtree coding structure. In this novel structure, the pixels are organized into coding units (CU), prediction units, and transform units, the sizes of which can be optimized at every level...
Article
A two-level variable bit rate (VBR) control algorithm for hierarchical video coding, specifically tailored for the new High Efficiency Video Coding (HEVC) standard, is presented here. A long-term level monitors the current bit count along a sliding window of a few seconds, comprising several intra periods (IPs) and shifted on an IP basis. This long...
Article
Full-text available
In this paper we propose a temporal segmentation and a keyframe selection method for User-Generated Video (UGV). Since UGV is rarely structured in shots and users’ interests are usually revealed through camera movements, a UGV temporal segmentation system has been proposed that generates a video partition based on a camera motion classification. Mot...
Article
Flicker is a common video-compression-related temporal artifact. It occurs when co-located regions of consecutive frames are not encoded in a consistent manner, especially when Intra frames are periodically inserted at low and medium bit rates. In this paper we propose a flicker reduction method which aims to make the luminance changes between pixe...
Conference Paper
Full-text available
This paper tackles the problem of automatic brain tumor classification from Magnetic Resonance Imaging (MRI) where, traditionally, general-purpose texture and shape features extracted from the Region of Interest (tumor) have become the usual parameterization of the problem. Two main contributions are made in this context. First, a novel set of clin...
Conference Paper
Full-text available
In this paper, we present a computational model capable of predicting the viewer perception of YouTube car TV commercials by using a set of low-level audio and visual descriptors. Our research goal relies on the hypothesis that these descriptors could reflect to some extent the objective value of the videos and, in turn, the average viewer's perc...
Article
In this paper, we present a computational model capable of predicting the viewer perception of car advertisement videos by using a set of low-level video descriptors. Our research goal relies on the hypothesis that these descriptors could reflect the aesthetic value of the videos and, in turn, their viewers’ perception. To that effect, and a...
Article
The motion estimation (ME) process used in the H.264/AVC reference software is based on minimizing a cost function that involves two terms (distortion and rate) that are properly balanced through a Lagrangian parameter, usually denoted as $\lambda_{motion}$. In this paper we propose an algorithm to improve the conventional way of estimating $\lambda_...
Conference Paper
Flicker is a common video coding artifact that occurs especially at low and medium bit rates. In this paper we propose a temporal filter-based method to reduce flicker. The proposed method has been designed to be compliant with conventional video coding standards, i.e., to generate a bitstream that is decodable by any standard decoder implementatio...
Article
Latent topic models have become a popular paradigm in many computer vision applications due to their capability to discover semantics in visual content in an unsupervised manner. Relying on the Bag-of-Words representation, they consider images as mixtures of latent topics that generate visual words according to some specific distributions. However, the perform...
Conference Paper
The basis functions of lifting transform on graphs are completely determined by finding a bipartition of the graph and defining the prediction and update filters to be used. In this work we consider the design of prediction filters that minimize the quadratic prediction error and therefore the energy of the detail coefficients, which will give rise...
Conference Paper
In this paper we propose a system for automatic detection of specific events and abnormal behaviors in crowded scenes. In particular, we focus on the parametrization by proposing a set of mid-level spatio-temporal features that successfully model the characteristic motion of typical events in crowd behaviors. Furthermore, due to the fact that some...
Article
The latest H.264/AVC video coding standard achieves high compression rates in exchange for high computational complexity. Nowadays, however, many application scenarios require the encoder to meet some complexity constraints. This paper proposes a novel complexity control method that relies on hypothesis testing that can handle time-variant conten...
Article
The embedded speech-centric interface for handheld wireless devices has been implemented on a commercially available PDA as a part of an application that allows real-time access to stock prices through GPRS. In this article, we have focused mainly on the optimization of the ASR subsystem for minimizing the use of the handheld computational resource...
Conference Paper
In this paper we present an extended feature extraction procedure for Automatic Speech Recognition (ASR) over 3G UMTS channels [2], within a bitstream-based Network Speech Recognition (NSR) architecture. This procedure takes advantage of the Unequal Error Protection (UEP) policy that is applied by the channel coder that highly protects selected par...
Article
Temporal scalability is supported in scalable video coding (SVC) by means of hierarchical prediction structures, where the higher layers can be ignored for frame rate reduction. Nevertheless, this kind of scalability is not totally exploited by the rate control (RC) algorithms since the hypothetical reference decoder (HRD) requirement is only satis...
Conference Paper
This paper proposes a probabilistic generative model that concurrently tackles the problems of image retrieval and detection of the region-of-interest (ROI). By introducing a latent variable that classifies the matches as true or false, we specifically focus on the application of geometric constraints to the keypoint matching process and the achieve...
Conference Paper
A simplified protocol and associated metrics based on Signal Detection Theory (SDT) for subjective Video Quality Assessment (VQA) are proposed with the aim of filling the gap existing between the lack of discrimination abilities of objective Quality Estimates (especially when perceptually motivated processing methods are involved) and the costly norm...
Conference Paper
Perceptual coding has become of great interest in modern video coding due to the need for higher compression rates. Many previous works have been carried out to incorporate perceptual information into hybrid video encoders, either modifying the quantization parameter according to a certain perceptual resource allocation map or preprocessing video seq...
Article
Full-text available
In recent years, support vector machines (SVMs) have shown excellent performance in many applications, especially in the presence of noise. In particular, SVMs offer several advantages over artificial neural networks (ANNs) that have attracted the attention of the speech processing community. Nevertheless, their high computational requirements pr...
Article
The H.264/AVC standard achieves a high coding efficiency compared to previous standards. However, this gain is accomplished at great computational cost, with mode decision being one of the most demanding subsystems. In this paper, a two-level classification-based approach to the inter mode decision problem is proposed. A first classifier detects SK...
Article
Full-text available
In this paper, we propose a novel variable bit rate (VBR) controller for real-time H.264/scalable video coding (SVC) applications. The proposed VBR controller relies on the fact that consecutive pictures within the same scene often exhibit similar degrees of complexity, and consequently should be encoded using similar quantization parameter (QP) va...
Conference Paper
Full-text available
We propose a complete video encoder based on directional “non-separable” transforms that allow spatial and temporal correlation to be jointly exploited. These lifting-based wavelet transforms are applied on graphs that link pixels in a video sequence based on motion information. In this paper, we first consider a low complexity version of this tran...
Article
In this paper we propose a complete preprocessing system for restoring low-quality Quick Response code images. The target application for the system is to restore and decode a code image in a cell phone when the image is shown in another cell phone display. Therefore, the system design focuses on low complexity solutions that are effective for a...
Article
Full-text available
In this paper, an estimation method for perceptual video coding is proposed. The method employs a camera motion compensated vector map computed by means of a hierarchical motion estimation (HME) procedure and a Restricted Affine Transformation (RAT)-based modeling of the camera motion. To allow for a computationally efficient solution, the number of la...
Article
Full-text available
The most recent video coding standards are usually based on a rate-distortion optimization (RDO) process that has been formulated in terms of an unconstrained Lagrangian optimization. The RDO provides outstanding results in exchange for a high computational cost, especially for the Inter frames, which require a computationally heavy motion estima...
Article
Full-text available
Hybrid speech recognizers, where the estimation of the emission pdf of the states of hidden Markov models (HMMs), usually carried out using Gaussian mixture models (GMMs), is substituted by artificial neural networks (ANNs), have several advantages over the classical systems. However, to obtain performance improvements, the computational requirement...
Data
Full-text available
Hybrid speech recognizers, where the estimation of the emission pdf of the states of hidden Markov models (HMMs), usually carried out using Gaussian mixture models (GMMs), is substituted by artificial neural networks (ANNs), have several advantages over the classical systems. However, to obtain performance improvements, the computational requirement...
Conference Paper
Full-text available
In this paper we propose a novel rate control initialization algorithm for real-time H.264/scalable video coding. In particular, a two-step approach is proposed. First, the initial quantization parameter (QP) for each layer is determined by means of a parametric rate-quantization (R-Q) modeling that depends on the layer identifier (base or enhancem...
Conference Paper
In this paper we propose a novel rate control initialization algorithm for real-time H.264/scalable video coding. In particular, a two-step approach is proposed. First, the initial quantization parameter (QP) for each layer is determined by means of a parametric rate-quantization (R-Q) modeling that depends on the layer identifier (base or enhancem...
Conference Paper
In this paper we propose a novel VBR controller for real-time H.264/SVC video coding. Since consecutive pictures within the same scene often exhibit similar degrees of complexity, the proposed VBR controller allows for just an incremental variation of QP with respect to that of the previous picture, so preventing unnecessary QP fluctuations. For th...
Article
In this article, an improved version of one of the most cited intra mode decision algorithms in H.264/AVC video coding is proposed with the aim of improving its efficiency and performance. The reference algorithm determines the interpolation/extrapolation spatial direction (mode) for achieving the best intra prediction using the Sobel gradient calcul...
Conference Paper
This paper evaluates the capabilities of model-based distances between time series to identify the musical genre of songs. In contrast with standard approaches, this kind of metrics can take into account the structure of the songs by modeling the dynamics of the parameter sequences. We tackle the problem from a non-supervised and from a supervised...
Conference Paper
In this paper, a novel rate control algorithm for real-time VBR hierarchical video coding is proposed. The algorithm works at two levels that are called long- and short-term levels. The long-term level aims at ensuring that the bit count does not exceed the maximum allowed amount for a few-second long window. To this end, it considers a sliding win...
Article
Full-text available
The rate control problem has been extensively studied in parallel to the development of the different video coding standards. The bit allocation via Cauchy-density-based rate-distortion (R-D) modeling of the discrete cosine transform (DCT) coefficients has proved to be one of the most accurate solutions at picture level. Nevertheless, in some specif...
Article
The H.264/AVC video coding standard achieves a high coding efficiency compared to previous standards. However, the encoder complexity results in a very high computational cost due to motion estimation and macroblock mode decisions. In this paper, we propose a simple, content adaptive mode decision method suitable for a variety of applications. Expe...
Article
The use of feature enhancement techniques to obtain estimates of the clean parameters is a common approach for robust automatic speech recognition (ASR). However, the decoding algorithm typically ignores how accurate these estimates are. Uncertainty decoding methods incorporate this type of information. In this paper, we develop a formulation of th...
Article
This paper proposes a novel similarity measure for clustering sequential data. We first construct a common state-space by training a single probabilistic model with all the sequences in order to get a unified representation for the dataset. Then, distances are obtained attending to the transition matrices induced by each sequence in that state-spac...
Article
Additive noise generates important losses in automatic speech recognition systems. In this paper, we show that one of the causes contributing to these losses is the fact that conventional recognisers take into consideration feature values that are outliers. The method that we call bounded-distance HMM is a suitable method to avoid that outliers con...
Conference Paper
For the last few years bag-of-words models have been successfully applied to the information retrieval field. However, their application to visual content suffers from an important shortcoming: they model images as sets of unordered visual words rather than consider their spatial and geometric layout. Visual information is highly organized along the...
Conference Paper
We propose a new algorithm for sequence segmentation based on recent advances in semi-parametric sequence clustering. This approach implies the use of model-based distance measures between sequences, as well as a variant of spectral clustering specially tailored for segmentation. The method is highly flexible since it allows for the use of any prob...
Conference Paper
The H.264/AVC standard achieves a high coding efficiency compared with previous standards. However, it does so at a very high computational cost, with motion estimation being one of the most demanding subsystems. In this paper a hierarchical classification-based approach to the inter mode decision (MD) problem is proposed. A first classifier detect...
Article
In this paper, we propose to quantify the quality of the recorded voice through objective nonlinear measures. Quantification of speech signal quality has been traditionally carried out with linear techniques since the classical model of voice production is a linear approximation. Nevertheless, nonlinear behaviors in the voice production pro...
Article
Full-text available
We review the existing alternatives for defining model-based distances for clustering sequences and propose a new one based on the Kullback-Leibler divergence. This distance is shown to be especially useful in combination with spectral clustering. For improved performance in real-world scenarios, a model selection scheme is also proposed.
Article
Full-text available
In this paper, we propose to quantify the quality of the recorded voice through objective nonlinear measures. Quantification of speech signal quality has been traditionally carried out with linear techniques since the classical model of voice production is a linear approximation. Nevertheless, nonlinear behaviors in the voice production process hav...
Conference Paper
The accuracy of the Cauchy probability density function for modeling of the discrete cosine transform coefficient distribution has already been proved for the frame layer of the rate control subsystem of a hybrid video coder. Nevertheless, in some specific applications operating in real-time low-delay environments, a basic unit layer is recommended...
Conference Paper
This paper presents a rate control (RC) algorithm for the scalable extension of the H.264/AVC video coding standard. The proposed rate controller is designed for real-time video streaming with buffer constraint. Since a large buffer delay and bit rate variation are allowed in this kind of application, our proposal reduces the quantization paramete...