Fernando Pérez-Cruz

University Carlos III de Madrid , Getafe, Madrid, Spain

Publications (88) · 104.91 Total impact

  • ABSTRACT: Crowdsourcing has been proven to be an effective and efficient tool to annotate large datasets. User annotations are often noisy, so methods to combine the annotations to produce reliable estimates of the ground truth are necessary. We claim that considering the existence of clusters of users in this combination step can improve the performance. This is especially important in early stages of crowdsourcing implementations, where the number of annotations is low. At this stage there is not enough information to accurately estimate the bias introduced by each annotator separately, so we have to resort to models that consider the statistical links among them. In addition, finding these clusters is interesting in itself as knowing the behavior of the pool of annotators allows implementing efficient active learning strategies. Based on this, we propose in this paper two new fully unsupervised models based on a Chinese Restaurant Process (CRP) prior and a hierarchical structure that allows inferring these groups jointly with the ground truth and the properties of the users. Efficient inference algorithms based on Gibbs sampling with auxiliary variables are proposed. Finally, we perform experiments, both on synthetic and real databases, to show the advantages of our models over state-of-the-art algorithms.
    07/2014;
  • ABSTRACT: The analysis of comorbidity is an open and complex research field in the branch of psychiatry, where clinical experience and several studies suggest that the relation among the psychiatric disorders may have etiological and treatment implications. In this paper, we are interested in applying latent feature modeling to find the latent structure behind the psychiatric disorders that can help to examine and explain the relationships among them. To this end, we use the large amount of information collected in the National Epidemiologic Survey on Alcohol and Related Conditions (NESARC) database and propose to model these data using a nonparametric latent model based on the Indian Buffet Process (IBP). Due to the discrete nature of the data, we first need to adapt the observation model for discrete random variables. We propose a generative model in which the observations are drawn from a multinomial-logit distribution given the IBP matrix. The implementation of an efficient Gibbs sampler is accomplished using the Laplace approximation, which allows integrating out the weighting factors of the multinomial-logit likelihood model. We also provide a variational inference algorithm for this model, which provides a complementary (and less expensive in terms of computational complexity) alternative to the Gibbs sampler, allowing us to deal with a larger amount of data. Finally, we use the model to analyze comorbidity among the psychiatric disorders diagnosed by experts from the NESARC database.
    01/2014;
  • ABSTRACT: Modern communications systems use multiple-input multiple-output (MIMO) and high-order QAM constellations for maximizing spectral efficiency. However, as the number of antennas and the order of the constellation grow, the design of efficient and low-complexity MIMO receivers poses major technical challenges. For example, symbol detection can no longer rely on maximum likelihood detection or sphere-decoding methods, as their complexity increases exponentially with the number of transmitters/receivers. In this paper, we propose a low-complexity high-accuracy MIMO symbol detector based on the Expectation Propagation (EP) algorithm. EP allows iteratively approximating, in polynomial time, the posterior distribution of the transmitted symbols. We also show that our EP MIMO detector outperforms classic and state-of-the-art solutions, reducing the symbol error rate at a reduced computational complexity.
    IEEE Transactions on Communications 01/2014; 62(8):2840-2849. · 1.75 Impact Factor
  • ABSTRACT: Automated screening systems are commonly used to detect some agent in a sample and take a global decision about the subject (e.g. ill/healthy) based on these detections. We propose a Bayesian methodology for taking decisions in (sequential) screening systems that considers the false alarm rate of the detector. Our approach assesses the quality of its decisions and provides lower bounds on the achievable performance of the screening system from the training data. In addition, we develop a complete screening system for sputum smears in tuberculosis diagnosis, and show, using a real-world database, the advantages of the proposed framework when compared to the commonly used count detections and threshold approach.
    IEEE Journal of Biomedical and Health Informatics 10/2013;
  • ABSTRACT: Gaussian processes (GPs) are versatile tools that have been successfully employed to solve nonlinear estimation problems in machine learning but are rarely used in signal processing. In this tutorial, we present GPs for regression as a natural nonlinear extension to optimal Wiener filtering. After establishing their basic formulation, we discuss several important aspects and extensions, including recursive and adaptive algorithms for dealing with nonstationarity, low-complexity solutions, non-Gaussian noise models, and classification scenarios. Furthermore, we provide a selection of relevant applications to wireless digital communications.
    IEEE Signal Processing Magazine 01/2013; 30(4):40-50. · 3.37 Impact Factor
  • ABSTRACT: In this paper, we propose the tree-structured expectation propagation (TEP) algorithm for low-density parity-check (LDPC) decoding over the binary additive white Gaussian noise (BI-AWGN) channel. By approximating the posterior distribution by a tree-structure factorization, the TEP has been proven to improve belief propagation (BP) decoding over the binary erasure channel (BEC). We show for the AWGN channel how the TEP decoder is also able to capture additional information disregarded by the BP solution, which leads to a noticeable reduction of the error rate for finite-length codes. We show that for the range of codes of interest, the TEP gain is obtained with a slight increase in complexity over that of the BP algorithm. An efficient way of constructing the tree-like structure is also described.
    2013 IEEE International Symposium on Information Theory Proceedings (ISIT); 01/2013
  • ABSTRACT: In this work, we analyze the finite-length performance of low-density parity-check (LDPC) ensembles decoded over the binary erasure channel (BEC) using the tree-expectation propagation (TEP) algorithm. In a previous paper, we showed that the TEP improves the BP performance for decoding regular and irregular short LDPC codes, but the perspective was mainly empirical. In this work, given the degree distribution of an LDPC ensemble, we explain and predict the range of code lengths for which the TEP improves the BP solution. In addition, for LDPC ensembles that present a single critical point, we propose a scaling law to accurately predict the performance in the waterfall region. These results are of critical importance to design practical LDPC codes for the TEP decoder.
    IEEE International Symposium on Information Theory Proceedings (ISIT); 07/2012
  • ABSTRACT: Low-density parity-check convolutional (LDPCC) codes asymptotically achieve channel capacity under belief propagation (BP) decoding. In this paper, we decode LDPCC codes using the Tree-Expectation Propagation (TEP) decoder, recently proposed as an alternative decoding method to the BP algorithm for the binary erasure channel (BEC). We show that, for LDPCC codes, the TEP decoder improves the BP solution with a comparable complexity or, alternatively, it allows using shorter codes to achieve similar error rates. We also propose a window-sliding scheme for the TEP decoder to reduce the decoding latency.
    IEEE Communications Letters 05/2012; 16(5):726-729. · 1.16 Impact Factor
  • ABSTRACT: We describe the channel equalization problem, and its prior estimate of the channel state information (CSI), as a joint Bayesian estimation problem to improve each symbol's posterior estimate at the input of the channel decoder. Our approach takes into consideration not only the uncertainty due to the noise in the channel, but also the uncertainty in the CSI estimate. However, this solution cannot be computed in linear time, because it depends on all the transmitted symbols. Hence, we also put forward an approximation for each symbol's posterior, using the expectation propagation algorithm, which is optimal from the Kullback–Leibler divergence viewpoint and yields an equalization with a complexity identical to the BCJR algorithm. We also use a graphical model representation of the full posterior, in which the proposed approximation can be readily understood. The proposed posterior estimates are more accurate than those computed using the ML estimate for the CSI. In order to illustrate this point, we measure the error rate at the output of a low-density parity-check decoder, which needs the exact posterior for each symbol to detect the incoming word and is sensitive to a mismatch in those posterior estimates. For example, for QPSK modulation and a channel with three taps, we can expect gains over 0.5 dB with the same computational complexity as the ML receiver.
    IEEE Transactions on Signal Processing 05/2012; 60(5):2672-2676. · 2.81 Impact Factor
  • ABSTRACT: Strategies for generating knowledge in medicine have included observation of associations in clinical or research settings and, more recently, development of pathophysiological models based on molecular biology. Although critically important, they limit hypothesis generation to an incremental pace. Machine learning and data mining are alternative approaches to identifying new vistas to pursue, as is already evident in the literature. In concert with these analytic strategies, novel approaches to data collection can enhance the hypothesis pipeline as well. In data farming, data are obtained in an 'organic' way, in the sense that they are entered by patients themselves and available for harvesting. In contrast, in evidence farming (EF), it is the provider who enters medical data about individual patients. EF differs from regular electronic medical record systems because frontline providers can use it to learn from their own past experience. In addition to the possibility of generating large databases with farming approaches, it is likely that we can further harness the power of large data sets collected using either farming or more standard techniques through implementation of data-mining and machine-learning strategies. Exploiting large databases to develop new hypotheses regarding neurobiological and genetic underpinnings of psychiatric illness is useful in itself, but also affords the opportunity to identify novel mechanisms to be targeted in drug discovery and development.
    Molecular Psychiatry 01/2012; 17(10):956-9. · 15.15 Impact Factor
  • ABSTRACT: We present the tree-structure expectation propagation (Tree-EP) algorithm to decode low-density parity-check (LDPC) codes over discrete memoryless channels (DMCs). EP generalizes belief propagation (BP) in two ways. First, it can be used with any exponential family distribution over the cliques in the graph. Second, it can impose additional constraints on the marginal distributions. We use this second property to impose pair-wise marginal constraints over pairs of variables connected to a check node of the LDPC code's Tanner graph. Thanks to these additional constraints, the Tree-EP marginal estimates for each variable in the graph are more accurate than those provided by BP. We also reformulate the Tree-EP algorithm for the binary erasure channel (BEC) as a peeling-type algorithm (TEP) and we show that the algorithm has the same computational complexity as BP and it decodes a higher fraction of errors. We describe the TEP decoding process by a set of differential equations that represents the expected residual graph evolution as a function of the code parameters. The solution of these equations is used to predict the TEP decoder performance in both the asymptotic regime and the finite-length regime over the BEC. While the asymptotic threshold of the TEP decoder is the same as the BP decoder for regular and optimized codes, we propose a scaling law (SL) for finite-length LDPC codes, which accurately approximates the TEP improved performance and facilitates its optimization.
    IEEE Transactions on Information Theory 01/2012; · 2.62 Impact Factor
  • ABSTRACT: We propose a decoding algorithm for LDPC codes that achieves the maximum likelihood (ML) solution over the binary erasure channel (BEC). In this channel, the tree-structured expectation propagation (TEP) decoder improves the peeling decoder (PD) by processing check nodes of degree one and two. However, it does not achieve the ML solution, as the tree structure of the TEP allows only for approximate inference. In this paper, we provide the procedure to construct the structure needed for exact inference. This algorithm, denoted as generalized tree-structured expectation propagation (GTEP), modifies the code graph by recursively eliminating any check node and merging this information in the remaining graph. The GTEP decoder upon completion either provides the unique ML solution or a tree graph in which the number of parent nodes indicates the multiplicity of the ML solution. We also explain the algorithm as a Gaussian elimination method, relating the GTEP to other ML solutions. Compared to previous approaches, it presents an equivalent complexity, it exhibits a simpler graphical message-passing procedure and, most interestingly, the algorithm can be generalized to other channels.
    IEEE Transactions on Communications 01/2012; PP(99):1-9. · 1.75 Impact Factor
  • ABSTRACT: We present the tree-structure expectation propagation (Tree-EP) algorithm to decode low-density parity-check (LDPC) codes over discrete memoryless channels (DMCs). Expectation propagation generalizes belief propagation (BP) in two ways. First, it can be used with any exponential family distribution over the cliques in the graph. Second, it can impose additional constraints on the marginal distributions. We use this second property to impose pairwise marginal constraints over pairs of variables connected to a check node of the LDPC code's Tanner graph. Thanks to these additional constraints, the Tree-EP marginal estimates for each variable in the graph are more accurate than those provided by BP. We also reformulate the Tree-EP algorithm for the binary erasure channel (BEC) as a peeling-type algorithm (TEP) and we show that the algorithm has the same computational complexity as BP and it decodes a higher fraction of errors. We describe the TEP decoding process by a set of differential equations that represents the expected residual graph evolution as a function of the code parameters. The solution of these equations is used to predict the TEP decoder performance in both the asymptotic regime and the finite-length regime over the BEC. While the asymptotic threshold of the TEP decoder is the same as the BP decoder for regular and optimized codes, we propose a scaling law for finite-length LDPC codes, which accurately approximates the TEP improved performance and facilitates its optimization.
    IEEE International Workshop on Machine Learning for Signal Processing (MLSP); 01/2012
  • ABSTRACT: Spatially-coupled (SC) LDPC codes are constructed from a set of L regular sparse codes of length M. In the asymptotic limit of these parameters, SC codes present an excellent decoding threshold under belief propagation (BP) decoding, close to the maximum a posteriori (MAP) threshold of the underlying regular code. In the finite-length regime, we need both dimensions, L and M, to be sufficiently large, yielding a very large code length and decoding latency. In this paper, and for the erasure channel, we show that the finite-length performance of SC codes is improved if we consider the tree-structured expectation propagation (TEP) algorithm in the decoding stage. When applied to the decoding of SC LDPC codes, it allows using shorter codes to achieve similar error rates. We also propose a window-sliding scheme for the TEP decoder to reduce the decoding latency.
    2012 IEEE Information Theory Workshop (ITW); 01/2012
  • ABSTRACT: In this work we address the design of degree distributions (DD) of low-density parity-check (LDPC) codes for the tree-expectation propagation (TEP) decoder. The optimization problem to find distributions to maximize the TEP decoding threshold for a fixed-rate code cannot be analytically solved. We derive a simplified optimization problem that can be easily solved since it is based on the analytic expressions of the peeling decoder. Two kinds of solutions are obtained from this problem: we either design LDPC ensembles for which the BP threshold equals the MAP threshold or we get LDPC ensembles for which the TEP threshold outperforms the BP threshold, even achieving the MAP capacity in some cases. Hence, we prove that there exist ensembles for which the MAP solution can be obtained with linear complexity even though the BP threshold does not achieve the MAP threshold.
    2011 IEEE International Symposium on Information Theory Proceedings (ISIT); 09/2011
  • ABSTRACT: This letter proposes a multioutput support vector regression (M-SVR) method for the simultaneous estimation of different biophysical parameters from remote sensing images. General retrieval problems require multioutput (and potentially nonlinear) regression methods. M-SVR extends the single-output SVR to multiple outputs, maintaining the advantages of a sparse and compact solution by using an ε-insensitive cost function. The proposed M-SVR is evaluated in the estimation of chlorophyll content, leaf area index and fractional vegetation cover from hyperspectral compact high-resolution imaging spectrometer images. The achieved improvement with respect to the single-output regression approach suggests that M-SVR can be considered a convenient alternative for nonparametric biophysical parameter estimation and model inversion.
    IEEE Geoscience and Remote Sensing Letters 08/2011; · 1.82 Impact Factor
  • 06/2011;
  • P.M. Olmos, J.J. Murillo-Fuentes, F. Pérez-Cruz
    ABSTRACT: In this paper, we propose the Tree-structured Expectation Propagation (TEP) algorithm to decode finite-length Low-Density Parity-Check (LDPC) codes. The TEP decoder is able to continue decoding once the standard Belief Propagation (BP) decoder fails, presenting the same computational complexity as the BP decoder. The BP algorithm is dominated by the presence of stopping sets (SSs) in the code graph. We show that the TEP decoder, without previous knowledge of the graph, naturally avoids some fairly common SSs. This results in a significant improvement in the system performance.
    IEEE Communications Letters 03/2011; · 1.16 Impact Factor
  • ABSTRACT: In some applications, the probability of error of a given classifier is too high for its practical application, but we are allowed to gather more independent test samples from the same class to reduce the probability of error of the final decision. From the point of view of hypothesis testing, the solution is given by the Neyman-Pearson lemma. However, there is no equivalent result to the Neyman-Pearson lemma when the likelihoods are unknown, and we are given a training dataset. In this brief, we explore two alternatives. First, we combine the soft (probabilistic) outputs of a given classifier to produce a consensus labeling for K test samples. In the second approach, we build a new classifier that directly computes the label for K test samples. For this second approach, we need to define an extended input space training set and incorporate the known symmetries in the classifier. This latter approach gives more accurate results, as it only requires an accurate classification boundary, while the former needs an accurate posterior probability estimate for the whole input space. We illustrate our results with well-known databases.
    IEEE Transactions on Neural Networks 01/2011; 22(1):158-63. · 2.95 Impact Factor
  • IEEE Communications Letters. 01/2011; 15:235-237.

Publication Stats

590 Citations
104.91 Total Impact Points

Institutions

  • 2001–2013
    • University Carlos III de Madrid
      • Department of Signal Theory and Communications
      • Department of Electrical Engineering
      Getafe, Madrid, Spain
  • 2006–2011
    • Universidad de Sevilla
      • Signal and Communications Theory
      Seville, Andalusia, Spain
  • 2007–2010
    • Princeton University
      • Department of Electrical Engineering
      Princeton, NJ, United States
  • 2004–2006
    • University College London
      London, England, United Kingdom
    • Complutense University of Madrid
      • Department of Financial Economy and Accounting I (Financial and Actuarial Economy)
      Madrid, Madrid, Spain
  • 2005
    • Oxford Centre for Computational Neuroscience
      Oxford, England, United Kingdom
  • 2002
    • University of Valencia
      • Departamento de Ingeniería Electrónica
      Valencia, Valencia, Spain
  • 2000–2001
    • University of Alcalá
      • Departamento de Teoría de la Señal y Comunicaciones
      Alcalá de Henares, Madrid, Spain
  • 1998
    • University of Vigo
      Vigo, Galicia, Spain