Weifeng Liu

Universidad de Cantabria, Santander, Cantabria, Spain

Publications (20) · 34.58 Total impact

  • ABSTRACT: Similarity is a key concept for quantifying temporal signals or static measurements. Similarity is difficult to define mathematically; however, one rarely dwells on this difficulty and naturally translates similarity into correlation. This is one more example of how ingrained second-order moment descriptors of the probability density function are in scientific thinking. Successful engineering or pattern recognition solutions built on these methodologies rely heavily on the Gaussianity and linearity assumptions, for exactly the same reasons discussed in Chapter 3.
    04/2010: pages 385-413
  • Kernel Adaptive Filtering: A Comprehensive Introduction, 03/2010: pages 1-26; ISBN: 9780470608593
  • ABSTRACT: We present a kernel-based recursive least-squares (KRLS) algorithm on a fixed memory budget, capable of recursively learning a nonlinear mapping and tracking changes over time. In order to deal with the growing support inherent to online kernel methods, the proposed method uses a combined strategy of growing and pruning the support. In contrast to a previous sliding-window based technique, the presented algorithm does not prune the oldest data point at every time instant but instead aims to prune the least significant data point. We also introduce a label update procedure to equip the algorithm with tracking capability. Simulations show that the proposed method obtains better performance than state-of-the-art kernel adaptive filtering techniques with similar memory requirements.
    Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2010, 14-19 March 2010, Sheraton Dallas Hotel, Dallas, Texas, USA; 01/2010
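    A minimal sketch of one possible pruning criterion for a fixed-budget kernel RLS dictionary, assuming a coefficient vector alpha and a stored inverse Gram matrix; the exact significance measure used in the paper may differ, and the function name is illustrative.

```python
import numpy as np

def least_significant_center(alpha, K_inv):
    """Return the index of the dictionary center whose removal is expected
    to perturb the current KRLS solution the least.

    alpha : (m,) coefficient vector of the kernel expansion
    K_inv : (m, m) inverse of the (regularized) kernel Gram matrix

    The significance measure |alpha_j| / [K_inv]_{jj} is a common pruning
    criterion in fixed-budget kernel RLS variants; it is used here only as
    an illustration, not as the paper's exact rule.
    """
    significance = np.abs(alpha) / np.diag(K_inv)
    return int(np.argmin(significance))
```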
  • Weifeng Liu, Il Park, José C. Príncipe
    ABSTRACT: This paper discusses an information-theoretic approach to designing sparse kernel adaptive filters. To determine which data are useful to learn and which are redundant, a subjective information measure called surprise is introduced. Surprise captures the amount of information a datum contains that is transferable to a learning system. Based on this concept, we propose a systematic sparsification scheme that can drastically reduce the time and space complexity without harming the performance of kernel adaptive filters. Nonlinear regression, short-term chaotic time-series prediction, and long-term time-series forecasting examples are presented.
    IEEE Transactions on Neural Networks 11/2009; 20(12):1950-61. · 2.95 Impact Factor
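    A hedged sketch of the surprise measure under a Gaussian predictive model: the surprise of a new input-output pair is taken as its negative log predictive likelihood, and simple thresholds decide whether the datum is abnormal, learnable, or redundant. The function and threshold names are illustrative, not the paper's notation.

```python
import numpy as np

def surprise(d, y_pred, var_pred):
    """Negative log-likelihood of a new desired sample d under the Gaussian
    predictive distribution N(y_pred, var_pred) produced by the current
    kernel filter (constant terms dropped)."""
    return 0.5 * np.log(var_pred) + (d - y_pred) ** 2 / (2.0 * var_pred)

# Hypothetical thresholds T1 > T2 (values purely illustrative):
#   S > T1         -> abnormal, discard the datum
#   T2 <= S <= T1  -> learnable, add the datum to the dictionary
#   S < T2         -> redundant, discard the datum
```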
  • ABSTRACT: This paper presents a kernelized version of the extended recursive least squares (EX-KRLS) algorithm which, for the first time, implements a general linear state model in reproducing kernel Hilbert spaces (RKHS), or equivalently a general nonlinear state model in the input space. The centerpiece of this development is a reformulation of the well-known extended recursive least squares (EX-RLS) algorithm in RKHS which only requires inner product operations between input vectors, thus enabling the application of the kernel property (commonly known as the kernel trick). The first part of the paper presents a set of theorems that show the generality of the approach. The EX-KRLS is preferable to 1) a standard kernel recursive least squares (KRLS) algorithm in applications that require tracking the state vector of general linear state-space models in the kernel space, or 2) an EX-RLS when the application requires nonlinear observation and state models. The second part of the paper evaluates the EX-KRLS on nonlinear Rayleigh multipath channel tracking and on a Lorenz system modeling problem. We show that the proposed algorithm outperforms the standard KRLS and EX-RLS in both simulations.
    IEEE Transactions on Signal Processing 11/2009; · 2.81 Impact Factor
  • ABSTRACT: The linear least mean squares (LMS) algorithm has recently been extended to a reproducing kernel Hilbert space, resulting in an adaptive filter built from a weighted sum of kernel functions evaluated at each incoming data sample. With time, the size of the filter as well as the computation and memory requirements increase. In this paper, we propose a new, efficient methodology for constraining the growth of the radial basis function (RBF) network resulting from the kernel LMS algorithm without significant sacrifice in performance. The method involves sequential Gaussian elimination steps on the Gram matrix to test the linear dependency of the feature vector corresponding to each new input with respect to all the previous feature vectors. This gives an efficient way of continuing the learning while restricting the number of kernel functions used.
    Signal Processing 03/2009; 89:257-265. · 2.24 Impact Factor
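    The sketch below shows a standard approximate linear-dependence test that plays the same role as the Gram-matrix elimination described in the abstract: a new kernel unit is added only if its feature vector lies sufficiently outside the span of the stored ones. This is a generic formulation, not necessarily the paper's exact procedure.

```python
import numpy as np

def is_linearly_dependent(k_new, K_dict, k_vec, tol=1e-3):
    """Approximate linear-dependence test for a new feature vector.

    k_new  : kernel(x_new, x_new), a scalar
    K_dict : Gram matrix of the current dictionary
    k_vec  : kernel evaluations between x_new and each dictionary point

    The residual k_new - k_vec^T K_dict^{-1} k_vec measures how far the new
    feature vector lies outside the span of the stored ones; a sequential
    Gaussian elimination on the Gram matrix reaches an equivalent decision.
    """
    coeffs = np.linalg.solve(K_dict, k_vec)
    residual = k_new - k_vec @ coeffs
    return residual < tol   # True -> do not add a new kernel unit
```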
  • ABSTRACT: This paper demonstrates the effectiveness of a nonlinear extension of the matched filter for signal detection in certain kinds of non-Gaussian noise. The decision statistic is based on a new measure of similarity that can be considered an extension of the correlation statistic used in the matched filter. The optimality of the matched filter is predicated on second-order statistics and hence leaves room for improvement, especially when the assumption of Gaussianity does not hold. The proposed method incorporates higher-order moments in the decision statistic and shows an improvement in the receiver operating characteristics (ROC) for non-Gaussian noise, in particular impulsive noise. The performance of the proposed method is demonstrated for detection in two widely used impulsive noise models, the alpha-stable model and the two-term Gaussian mixture model. Moreover, unlike other kernel-based approaches and those using characteristic functions directly, this method remains computationally tractable and can easily be implemented in real time.
    Signal Processing 01/2009; · 2.24 Impact Factor
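    A minimal sketch of a correntropy-style decision statistic for detection, assuming a Gaussian kernel; the bandwidth, thresholding rule, and function name are illustrative rather than the paper's exact formulation.

```python
import numpy as np

def correntropy_statistic(r, s, sigma=1.0):
    """Correntropy-based similarity between a received frame r and a
    template s: the average Gaussian kernel of the sample-wise differences.
    Large impulsive outliers, which would dominate the usual correlation
    statistic of the matched filter, are bounded by the kernel."""
    d = np.asarray(r, dtype=float) - np.asarray(s, dtype=float)
    return np.mean(np.exp(-d ** 2 / (2.0 * sigma ** 2)))

# Detection declares "signal present" when the statistic exceeds a threshold
# chosen for the desired false-alarm rate (the threshold is not fixed here).
```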
  • ABSTRACT: The minimum average correlation energy (MACE) filter is well known for object recognition. This paper proposes a nonlinear extension of the MACE filter using the recently introduced correntropy function. Correntropy is a positive-definite function that generalizes the concept of correlation by utilizing second- and higher-order moments of the signal statistics. Because of its positive-definite nature, correntropy induces a new reproducing kernel Hilbert space (RKHS). Taking advantage of the linear structure of this RKHS, it is possible to formulate the MACE filter equations in the RKHS induced by correntropy and obtain an approximate solution. Due to the nonlinear relation between the feature space and the input space, the correntropy MACE (CMACE) can potentially improve upon the MACE performance while preserving the shift-invariance property (additional computation for all shifts is required in the CMACE). To alleviate the computational complexity of the solution, this paper also presents a fast CMACE using the fast Gauss transform (FGT). We apply the CMACE filter to the MSTAR public release synthetic aperture radar (SAR) data set as well as the PIE database of human faces and show that the proposed method exhibits better distortion tolerance and outperforms the linear MACE in both generalization and rejection abilities.
    Pattern Recognition 01/2009; · 2.58 Impact Factor
  • Weifeng Liu, J.C. Principe
    ABSTRACT: In this paper, we investigate the well-posedness of the kernel adaline. The kernel adaline finds the linear coefficients of a radial basis function network using deterministic gradient descent. We show that gradient descent provides an inherent regularization as long as the training is properly early-stopped. Along with other popular regularization techniques, this result is examined within a unifying regularization-function framework. This understanding provides an alternative and possibly simpler way to obtain regularized solutions compared with the cross-validation approach of regularization networks.
    IEEE International Joint Conference on Neural Networks (IJCNN 2008), IEEE World Congress on Computational Intelligence; 07/2008
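    A small sketch of a kernel adaline trained by deterministic gradient descent with early stopping on a validation set, the mechanism that supplies the implicit regularization discussed in the abstract. The hyperparameters and the validation split are illustrative assumptions.

```python
import numpy as np

def train_kernel_adaline(K_train, y_train, K_val, y_val,
                         eta=0.1, max_epochs=500, patience=10):
    """Gradient descent on the coefficients of an RBF network.

    K_train : (n, n) Gram matrix of the training inputs
    K_val   : (n_val, n) kernel matrix between validation inputs and the
              training centers
    Training is stopped early when the validation error stops improving,
    which acts as the implicit regularizer."""
    n = K_train.shape[0]
    alpha = np.zeros(n)
    best_alpha, best_val, wait = alpha.copy(), np.inf, 0
    for _ in range(max_epochs):
        err = y_train - K_train @ alpha           # training residuals
        alpha += eta * K_train.T @ err / n        # gradient step on the MSE
        val_mse = np.mean((y_val - K_val @ alpha) ** 2)
        if val_mse < best_val:
            best_alpha, best_val, wait = alpha.copy(), val_mse, 0
        else:
            wait += 1
            if wait >= patience:                  # early stopping
                break
    return best_alpha
```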
  • ABSTRACT: The combination of the famed kernel trick and the least-mean-square (LMS) algorithm provides an interesting sample-by-sample update for an adaptive filter in reproducing kernel Hilbert spaces (RKHS), which is named in this paper the KLMS. Unlike the accepted view in kernel methods, this paper shows that in the finite training data case, the KLMS algorithm is well posed in RKHS without the addition of an extra regularization term to penalize solution norms, as was suggested by Kivinen [Kivinen, Smola and Williamson, "Online Learning with Kernels," IEEE Transactions on Signal Processing, vol. 52, no. 8, pp. 2165-2176, Aug. 2004] and Smale [Smale and Yao, "Online Learning Algorithms," Foundations of Computational Mathematics, vol. 6, no. 2, pp. 145-176, 2006]. This result is the main contribution of the paper and enhances the present understanding of the LMS algorithm from a machine learning perspective. The effect of the KLMS step size is also studied from the viewpoint of regularization. Two experiments are presented to support our conclusion that with finite data the KLMS algorithm can be readily used in high-dimensional spaces, and particularly in RKHS, to derive nonlinear, stable algorithms with performance comparable to batch, regularized solutions.
    IEEE Transactions on Signal Processing 03/2008; · 2.81 Impact Factor
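    A compact sketch of the KLMS recursion with a Gaussian kernel: each new sample becomes a center of a growing RBF network, weighted by the step size times the prediction error, with no explicit regularization term. The step size and bandwidth values are illustrative.

```python
import numpy as np

def klms(inputs, desired, eta=0.5, sigma=1.0):
    """Kernel least-mean-square filter: a growing RBF network whose centers
    are the past inputs and whose weights are eta times the prediction
    errors made on those inputs."""
    centers, weights, errors = [], [], []
    for u, d in zip(inputs, desired):
        if centers:
            k = np.exp(-np.sum((np.array(centers) - u) ** 2, axis=1)
                       / (2.0 * sigma ** 2))
            y = np.dot(weights, k)       # prediction from current network
        else:
            y = 0.0
        e = d - y                        # prediction error
        centers.append(np.atleast_1d(u))
        weights.append(eta * e)          # new kernel unit weighted by eta*e
        errors.append(e)
    return centers, np.array(weights), np.array(errors)
```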
  • ABSTRACT: The optimality of second-order statistics depends heavily on the assumption of Gaussianity. In this paper, we further elucidate the probabilistic and geometric meaning of the recently defined correntropy function as a localized similarity measure. A close relationship between correntropy and M-estimation is established. Connections and differences between correntropy and kernel methods are presented. As such, correntropy has vastly different properties compared with second-order statistics, which can be very useful in non-Gaussian signal processing, especially in impulsive noise environments. Examples are presented to illustrate the technique.
    IEEE Transactions on Signal Processing 12/2007; · 2.81 Impact Factor
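    A minimal sketch of the sample correntropy estimator with a Gaussian kernel, plus an induced distance that makes the localized, M-estimation-like behavior visible; the kernel normalization is simplified here, so treat it as illustrative.

```python
import numpy as np

def correntropy(x, y, sigma=1.0):
    """Sample estimator of correntropy V(X, Y) = E[G_sigma(X - Y)] with a
    Gaussian kernel G_sigma of bandwidth sigma (unnormalized, max value 1)."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return np.mean(np.exp(-d ** 2 / (2.0 * sigma ** 2)))

def correntropy_induced_distance(x, y, sigma=1.0):
    """Distance built from the localized similarity: small errors behave
    like an L2 norm while large (impulsive) errors saturate."""
    return np.sqrt(1.0 - correntropy(x, y, sigma))
```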
  • ABSTRACT: Information theoretic learning (ITL) is a signal processing framework that goes far beyond traditional techniques based on second-order statistics, which rely heavily on linearity and Gaussianity assumptions. The information potential (IP) and symmetric information potential (SIP) are important concepts in ITL used for system adaptation and data inference. In this paper, a mathematical analysis of the bias and the variance of their estimators is presented. Our results show that the variances decrease at a rate of O(1/N) as the sample size N increases, and that a bound exists for the biases. A simple numerical simulation is presented to support our analysis.
    Machine Learning for Signal Processing, 2007 IEEE Workshop on; 09/2007
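    A short sketch of the plug-in information potential estimator referenced in the abstract, assuming a Gaussian kernel; the SIP estimator is analogous but is omitted here because its exact form is not spelled out above.

```python
import numpy as np

def information_potential(x, sigma=1.0):
    """Plug-in estimator of the information potential,
    IP(X) ~ (1/N^2) * sum_i sum_j G_sigma(x_i - x_j),
    whose negative logarithm estimates Renyi's quadratic entropy."""
    x = np.asarray(x, dtype=float)
    diffs = x[:, None] - x[None, :]
    return np.mean(np.exp(-diffs ** 2 / (2.0 * sigma ** 2)))

# Illustrative usage: Renyi's quadratic entropy estimate of a sample vector.
# H2 = -np.log(information_potential(samples, sigma=1.0))
```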
  • Conference Paper: Kernel LMS
    ABSTRACT: In this paper, a nonlinear adaptive algorithm based on a kernel-space least mean squares (LMS) approach is presented. With most neural network based methods for time-series modeling it is difficult to implement sample-by-sample adaptation, which seriously limits the applicability of adaptive nonlinear filters in the many optimal signal processing and communication applications where data arrive sequentially. This paper shows that the kernel LMS algorithm provides a computationally simple and effective way to train nonlinear systems for system modeling without the need for regularization, without convergence to local minima, and without the need for a separate block of data as a training set.
    Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on; 05/2007
  • ABSTRACT: In this paper, a new quantity called the symmetric information potential (SIP) is proposed to measure reflection symmetry and to estimate the location parameter of probability density functions. SIP is defined as an inner product in the space of probability density functions and has a close relation to information theoretic learning. A simple nonparametric estimator computed directly from the data exists. Experiments demonstrate that this concept can be very useful when dealing with impulsive data distributions, in particular alpha-stable distributions.
    Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on; 05/2007
  • ABSTRACT: Correntropy has recently been introduced as a generalized correlation function between two stochastic processes, capturing both the higher-order statistics and the temporal structure of the processes in one functional form. Based on this blend, we propose a unified criterion for instantaneous blind source separation (BSS). The criterion simultaneously exploits both spatial and spectral characteristics of the sources. Consequently, the new algorithm is able to separate independent, identically distributed (i.i.d.) sources, which requires higher-order statistics, and it is also able to separate temporally correlated Gaussian sources with distinct spectra, which requires temporal information. The performance of the proposed method is compared with that of popular BSS methods that depend solely on either higher-order statistics (FastICA, JADE) or second-order statistics at different lags (SOBI). The new algorithm outperforms the conventional methods on mixtures of sub-Gaussian and super-Gaussian sources.
    Signal Processing 01/2007; 87:1872-1881. · 2.24 Impact Factor
  • ABSTRACT: In this paper we introduce a new cost function, leading to the information theoretic mean shift algorithm, to capture the "predominant structure" in the data. We formulate the problem as minimizing the entropy of the data subject to the constraint that the Cauchy-Schwarz distance between the new and the original dataset is fixed to a constant value. We show that Gaussian mean shift and Gaussian blurring mean shift are special cases of this generalized algorithm, giving a new perspective on the idea of mean shift. Furthermore, the algorithm can also be used to capture the principal curve of the data, making it applicable to manifold learning.
    Machine Learning for Signal Processing, 2006. Proceedings of the 2006 16th IEEE Signal Processing Society Workshop on; 10/2006
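    A minimal sketch of Gaussian mean shift, the special case that the abstract says is recovered by the proposed cost function; the bandwidth and iteration count are illustrative.

```python
import numpy as np

def gaussian_mean_shift(data, n_iter=50, sigma=1.0):
    """Gaussian mean shift: every point is repeatedly moved to the
    kernel-weighted average of the data, so points climb toward the modes
    of the Parzen density estimate. In the blurring variant the anchor set
    is updated as well; here it is kept fixed."""
    x = np.array(data, dtype=float)           # points being shifted, (N, p)
    anchors = np.array(data, dtype=float)     # fixed data set
    for _ in range(n_iter):
        d2 = ((x[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)
        w = np.exp(-d2 / (2.0 * sigma ** 2))  # (N, N) kernel weights
        x = w @ anchors / w.sum(axis=1, keepdims=True)
    return x
```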
  • ABSTRACT: Minimization of the error entropy (MEE) cost function was introduced for nonlinear and non-Gaussian signal processing. In this paper, we show that this cost function is closely related to the recently introduced correntropy criterion and to M-estimation, which theoretically explains the robustness of MEE to outliers. Based on this understanding, we propose a modification of the MEE cost function, named minimization of error entropy with fiducial points, which sets the bias of MEE in an elegant and robust way. The performance of this new criterion is compared with the original MEE and the mean square error (MSE) criterion in robust regression and in short-term prediction of a chaotic time series.
    Machine Learning for Signal Processing, 2006. Proceedings of the 2006 16th IEEE Signal Processing Society Workshop on; 10/2006
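    A hedged sketch of one common way to write the combined criterion: the error information potential (the MEE term) blended with the correntropy of the errors about zero, which acts as the fiducial point that fixes the bias. The weighting parameter and function names are assumptions, not the paper's notation.

```python
import numpy as np

def mee_fiducial_cost(errors, sigma=1.0, lam=0.5):
    """Blend of the pairwise error information potential (MEE term) and the
    correntropy of the errors evaluated at the fiducial point zero."""
    e = np.asarray(errors, dtype=float)
    def g(z):
        return np.exp(-z ** 2 / (2.0 * sigma ** 2))
    mee_term = np.mean(g(e[:, None] - e[None, :]))   # pairwise error IP
    fiducial_term = np.mean(g(e))                    # correntropy about zero
    return lam * fiducial_term + (1.0 - lam) * mee_term

# Training maximizes this quantity (or minimizes its negative) over the
# filter parameters that generate the errors.
```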
  • ABSTRACT: The measure of similarity normally utilized in statistical signal processing is based on second-order moments. In this paper, we reveal the probabilistic meaning of correntropy as a new localized similarity measure based on information theoretic learning (ITL) and kernel methods. As such, it has vastly different properties compared with the mean square error (MSE), which can be very useful in nonlinear, non-Gaussian signal processing. Two examples are presented to illustrate the technique.
    Neural Networks, 2006. IJCNN '06. International Joint Conference on; 01/2006
  • ABSTRACT: The previous chapter defined cross-correntropy for a pair of scalar random variables and presented applications in statistical inference. This chapter extends the definition of correntropy to random (or stochastic) processes, which are families of random variables indexed by a set. In statistical signal processing the index set is time; we are interested in random variables that are functions of time, and the goal is to quantify their statistical dependencies (although the index set can also be defined over inputs or channels of multivariate random variables). The autocorrelation function, which measures the statistical dependency between random variables at two different times, is conventionally utilized for this goal. Hence, we generalize the definition of autocorrelation to an autocorrentropy function. The name correntropy was coined to reflect the fact that the function "looks like" correlation, but the sum over the lags (or over dimensions of the multivariate random variable) is the information potential (i.e., the argument of Renyi's quadratic entropy). The definition of cross-correntropy for random variables carries over to time series with a minor but important change in the domain of the variables, which is now an index set of lags. When it is clear from the context, we simplify the terminology and refer to the different functions (autocorrentropy or cross-correntropy) simply as the correntropy function, but keep the word "function" to distinguish them from the quantities in Chapter 10.
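    A minimal sketch of a sample estimator of the autocorrentropy function for a single time series, assuming a Gaussian kernel and stationarity; the notation and bandwidth are illustrative.

```python
import numpy as np

def autocorrentropy(x, max_lag, sigma=1.0):
    """Sample autocorrentropy function of a time series x: for each lag m
    it averages the Gaussian kernel of x[t] - x[t - m], generalizing the
    autocorrelation function with higher-order moment information."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    V = np.empty(max_lag + 1)
    for m in range(max_lag + 1):
        d = x[m:] - x[:N - m]
        V[m] = np.mean(np.exp(-d ** 2 / (2.0 * sigma ** 2)))
    return V
```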