IEEE Transactions on Pattern Analysis and Machine Intelligence (IEEE T PATTERN ANAL)

Publisher: IEEE Computer Society; Institute of Electrical and Electronics Engineers


Theory and application of computers in pattern analysis and machine intelligence. Topics include computer vision and image processing; knowledge representation, inference systems, and probabilistic reasoning. Extensive bibliographies.

  • Website
    IEEE Transactions on Pattern Analysis and Machine Intelligence website
  • Other titles
    IEEE transactions on pattern analysis and machine intelligence, Institute of Electrical and Electronics Engineers transactions on pattern analysis and machine intelligence
  • Material type
    Periodical, Internet resource
  • Document type
    Journal / Magazine / Newspaper, Internet Resource

Publisher details

Institute of Electrical and Electronics Engineers

  • Pre-print
    • Author can archive a pre-print version
  • Post-print
    • Author can archive a post-print version
  • Conditions
    • Authors' own and employers' publicly accessible webpages
    • Preprint - Must be removed upon publication of final version and replaced with either a full citation to the IEEE work with a Digital Object Identifier, a link to the article abstract in IEEE Xplore, or the authors' post-print
    • Preprint - Set phrase must be added once submitted to IEEE for publication ("This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible")
    • Preprint - Set phrase must be added when accepted by IEEE for publication ("(c) 20xx IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other users, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.")
    • Preprint - IEEE must be informed as to the electronic address of the pre-print
    • Postprint - Publisher copyright and source must be acknowledged (see above set statement)
    • Publisher's version/PDF cannot be used
    • Publisher copyright and source must be acknowledged
  • Classification
    green

Publications in this journal

    ABSTRACT: We analyze sequential image labeling methods that sample the posterior label field in order to gather contextual information. We propose an effective method that extracts local Taylor coefficients from the posterior at different scales. Results show that our proposal outperforms state-of-the-art methods on MSRC-21, CAMVID, eTRIMS8 and KAIST2 datasets.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 01/2014;
    ABSTRACT: The objective of this work is to learn descriptors suitable for the sparse feature detectors used in viewpoint invariant matching. We make a number of novel contributions towards this goal. First, it is shown that learning the pooling regions for the descriptor can be formulated as a convex optimisation problem that selects the regions using sparsity. Second, it is shown that descriptor dimensionality reduction can also be formulated as a convex optimisation problem, using Mahalanobis matrix nuclear norm regularisation. Both formulations are based on discriminative large margin learning constraints. As the third contribution, we evaluate the performance of the compressed descriptors, obtained from the learnt real-valued descriptors by binarisation. Finally, we propose an extension of our learning formulations to a weakly supervised case, which allows us to learn the descriptors from unannotated image collections. It is demonstrated that the new learning methods improve over the state of the art in descriptor learning on the annotated local patches data set of Brown et al. and unannotated photo collections of Philbin et al.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 01/2014; 36(8):1573-1585.
    ABSTRACT: Soft biometrics are a new form of biometric identification which uses physical or behavioral traits that can be naturally described by humans. Unlike other biometric approaches, this allows identification based solely on verbal descriptions, bridging the semantic gap between biometrics and human description. To permit soft biometric identification, the description must be accurate, yet conventional human descriptions comprising absolute labels and estimations are often unreliable. A novel method of obtaining human descriptions is introduced which utilizes comparative categorical labels to describe differences between subjects. This approach has been shown to address many problems associated with absolute categorical labels; most critically, the descriptions contain more objective information and have increased discriminatory capabilities. Relative measurements of the subjects' traits can be inferred from comparative human descriptions using the Elo rating system. The resulting soft biometric signatures have been demonstrated to be robust and to allow accurate recognition of subjects. Relative measurements can also be obtained from other forms of human representation. This is demonstrated using a support vector machine to determine relative measurements from gait biometric signatures, allowing retrieval of subjects from video footage by means of human comparisons and bridging the semantic gap.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 01/2014; 36(6):1216-1228.
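For readers unfamiliar with the Elo rating system mentioned in the abstract above, it reduces pairwise comparisons to scalar scores with one simple update rule. The sketch below is the standard chess-rating formulation; the K-factor, starting rating, and the example comparisons are illustrative and not taken from the paper:

```python
def elo_update(r_a, r_b, outcome, k=32.0):
    """Update two Elo ratings after one comparison.
    outcome: 1.0 if A 'wins' (e.g. judged taller), 0.0 if B wins, 0.5 for a tie."""
    expected_a = 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))
    r_a_new = r_a + k * (outcome - expected_a)
    r_b_new = r_b + k * ((1.0 - outcome) - (1.0 - expected_a))
    return r_a_new, r_b_new

def rate_subjects(n_subjects, comparisons, k=32.0, start=1500.0):
    """Turn a stream of comparative labels into relative trait scores."""
    ratings = [start] * n_subjects
    for a, b, outcome in comparisons:
        ratings[a], ratings[b] = elo_update(ratings[a], ratings[b], outcome, k)
    return ratings

# e.g. subject 0 judged taller than 1 and 2, and subject 1 taller than 2
ratings = rate_subjects(3, [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0)])
```

After these three comparisons the ratings are ordered consistently with the judgements (subject 0 highest, subject 2 lowest), which is exactly the kind of relative measurement the abstract describes inferring.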
    ABSTRACT: Video analysis often begins with background subtraction. This problem is often approached in two steps: a background model followed by a regularisation scheme. A model of the background allows it to be distinguished on a per-pixel basis from the foreground, whilst the regularisation combines information from adjacent pixels. We present a new method based on Dirichlet process Gaussian mixture models, which are used to estimate per-pixel background distributions. This is followed by probabilistic regularisation. Using a non-parametric Bayesian method allows per-pixel mode counts to be automatically inferred, avoiding over-/under-fitting. We also develop novel model learning algorithms for continuous update of the model in a principled fashion as the scene changes. These key advantages enable us to outperform the state-of-the-art alternatives on four benchmarks.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 01/2014; 36(4):670-683.
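The per-pixel background modelling idea in the abstract above can be illustrated with a much simpler stand-in: a running-Gaussian model with a threshold test. This is a minimal sketch of the general classify-then-adapt loop, not the paper's Dirichlet process mixture; the update rate `rho` and threshold `k` are illustrative:

```python
def update_background(mean, var, pixel, rho=0.05, k=2.5):
    """One per-pixel step of a running-Gaussian background model:
    classify the pixel against the current model, then adapt the model
    only when the pixel is deemed background."""
    is_foreground = abs(pixel - mean) > k * (var ** 0.5)
    if not is_foreground:
        mean = (1.0 - rho) * mean + rho * pixel
        var = (1.0 - rho) * var + rho * (pixel - mean) ** 2
    return mean, var, is_foreground

# a stable background pixel around intensity 100, then a passing object
mean, var = 100.0, 25.0
for value in [101.0, 99.0, 102.0]:
    mean, var, fg = update_background(mean, var, value)   # background each time
label = update_background(mean, var, 200.0)[2]            # flagged as foreground
```

A mixture model (and, in the paper, a non-parametric one) generalises this by maintaining several such modes per pixel, so multimodal backgrounds such as waving trees are handled.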
    ABSTRACT: Model-free trackers can track arbitrary objects based on a single (bounding-box) annotation of the object. Whilst the performance of model-free trackers has recently improved significantly, simultaneously tracking multiple objects with similar appearance remains very hard. In this paper, we propose a new multi-object model-free tracker (using a tracking-by-detection framework) that resolves this problem by incorporating spatial constraints between the objects. The spatial constraints are learned along with the object detectors using an online structured SVM algorithm. The experimental evaluation of our structure-preserving object tracker (SPOT) reveals substantial performance improvements in multi-object tracking. We also show that SPOT can improve the performance of single-object trackers by simultaneously tracking different parts of the object. Moreover, we show that SPOT can be used to adapt generic, model-based object detectors during tracking to tailor them towards a specific instance of that object.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 01/2014; 36(4):756-769.
    ABSTRACT: Fusing multiple continuous expert annotations is a crucial problem in machine learning and computer vision, particularly when dealing with uncertain and subjective tasks related to affective behavior. Inspired by the concept of inferring shared and individual latent spaces in Probabilistic Canonical Correlation Analysis (PCCA), we propose a novel, generative model that discovers temporal dependencies on the shared/individual spaces (Dynamic Probabilistic CCA, DPCCA). To accommodate temporal lags, which are prominent among continuous annotations, we further introduce a latent warping process, leading to the DPCCA with Time Warpings (DPCTW) model. Finally, we propose two supervised variants of DPCCA/DPCTW which incorporate inputs (i.e., visual or audio features), both in a generative (SG-DPCCA) and discriminative manner (SD-DPCCA). We show that the resulting family of models (i) can be used as a unifying framework for solving the problems of temporal alignment and fusion of multiple annotations in time, (ii) can automatically rank and filter annotations based on latent posteriors or other model statistics, and (iii) by incorporating dynamics, annotation-specific bias modeling, noise estimation, time warping and supervision, outperforms state-of-the-art methods for both the aggregation of multiple, yet imperfect, expert annotations and the alignment of affective behavior.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 01/2014; 36(7):1299-1311.
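The latent warping process in the DPCTW model above addresses the same problem as classic dynamic time warping: aligning sequences that agree in shape but lag each other in time. As background, here is the standard deterministic DTW recursion (a textbook sketch, not the paper's probabilistic model):

```python
def dtw_distance(x, y):
    """Dynamic time warping: minimum cumulative alignment cost between
    two 1-D sequences, allowing non-linear temporal warping."""
    inf = float("inf")
    n, m = len(x), len(y)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # advance x only
                                 cost[i][j - 1],      # advance y only
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]

a = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0]
b = [0.0, 0.0, 1.0, 2.0, 3.0, 2.0, 1.0]  # the same annotation, lagged by one frame
```

Here `dtw_distance(a, b)` is 0.0 even though the sequences differ sample-by-sample, because warping absorbs the lag; a plain Euclidean comparison would report a large error.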
    ABSTRACT: This paper presents a principled framework for analyzing collective activities at different levels of semantic granularity from videos. Our framework is capable of jointly tracking multiple individuals, recognizing activities performed by individuals in isolation (i.e., atomic activities such as walking or standing), recognizing the interactions between pairs of individuals (i.e., interaction activities), and understanding the activities of groups of individuals (i.e., collective activities). A key property of our work is that it can coherently combine bottom-up information stemming from detections or fragments of tracks (tracklets) with top-down evidence. Top-down evidence is provided by a newly proposed descriptor that captures the coherent behavior of groups of individuals in a spatial-temporal neighborhood of the sequence. Top-down evidence provides contextual information for establishing accurate associations between detections or tracklets across frames and, thus, for obtaining more robust tracking results. Bottom-up evidence percolates upwards so as to automatically infer collective activity labels. Experimental results on two challenging data sets demonstrate our theoretical claims and indicate that our model achieves enhanced tracking results and the best collective classification results to date.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 01/2014; 36(6):1242-1257.
    ABSTRACT: For many problems in computer vision, human learners are considerably better than machines. Humans possess highly accurate internal recognition and learning mechanisms that are not yet understood, and they frequently have access to more extensive training data through a lifetime of unbiased experience with the visual world. We propose to use visual psychophysics to directly leverage the abilities of human subjects to build better machine learning systems. First, we use an advanced online psychometric testing platform to make new kinds of annotation data available for learning. Second, we develop a technique for harnessing these new kinds of information, termed "perceptual annotations", for support vector machines. A key intuition for this approach is that while it may remain infeasible to dramatically increase the amount of data and high-quality labels available for the training of a given system, measuring the exemplar-by-exemplar difficulty and pattern of errors of human annotators can provide important information for regularizing the solution of the system at hand. A case study on the problem of face detection demonstrates that this approach yields state-of-the-art results on the challenging FDDB data set.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 01/2014; 36(8):1679-1686.
    ABSTRACT: Generalized autoregressive conditional heteroscedasticity (GARCH) models have long been considered one of the most successful families of approaches for volatility modeling in financial return series. In this paper, we propose an alternative approach based on methodologies widely used in the field of statistical machine learning. Specifically, we propose a novel nonparametric Bayesian mixture of Gaussian process regression models, each component of which models the noise variance process that contaminates the observed data as a separate latent Gaussian process driven by the observed data. This way, we essentially obtain a Gaussian process-mixture conditional heteroscedasticity (GPMCH) model for volatility modeling in financial return series. We impose a nonparametric prior with power-law nature over the distribution of the model mixture components, namely the Pitman-Yor process prior, to allow for better capture of modeled data distributions with heavy tails and skewness. Finally, we provide a copula-based approach for obtaining a predictive posterior for the covariances over the asset returns modeled by means of a postulated GPMCH model. We evaluate the efficacy of our approach in a number of benchmark scenarios, and compare its performance to state-of-the-art methodologies.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 01/2014; 36(5):888-900.
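For context on the baseline the GPMCH model above is positioned against, the GARCH(1,1) conditional variance recursion fits in a few lines. This is the standard textbook recursion with illustrative parameter values, not code from the paper:

```python
def garch11_variances(returns, omega=0.1, alpha=0.1, beta=0.8):
    """Conditional variance recursion of a GARCH(1,1) model:
    sigma2[t] = omega + alpha * returns[t-1]**2 + beta * sigma2[t-1]."""
    # start at the unconditional variance omega / (1 - alpha - beta)
    sigma2 = [omega / (1.0 - alpha - beta)]
    for r in returns[:-1]:
        sigma2.append(omega + alpha * r * r + beta * sigma2[-1])
    return sigma2

# a large return shock at t=1 raises the conditional variance at t=2
variances = garch11_variances([0.0, 3.0, 0.0, 0.0])
```

The recursion makes the volatility-clustering behaviour explicit: a large squared return feeds into the next period's variance, which then decays geometrically at rate `beta`.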
    ABSTRACT: We propose the visual tracker sampler, a novel tracking algorithm that can work robustly in challenging scenarios, where several kinds of appearance and motion changes of an object can occur simultaneously. The proposed tracking algorithm accurately tracks a target by searching for appropriate trackers in each frame. Since the real-world tracking environment varies severely over time, the trackers should be adapted or newly constructed depending on the current situation, so that each specific tracker takes charge of a certain change in the object. To do this, our method obtains several samples of not only the states of the target but also the trackers themselves during the sampling process. The trackers are efficiently sampled using the Markov Chain Monte Carlo (MCMC) method from the predefined tracker space by proposing new appearance models, motion models, state representation types, and observation types, which are the important ingredients of visual trackers. All trackers are then integrated into one compound tracker through an Interacting MCMC (IMCMC) method, in which the trackers interactively communicate with one another while running in parallel. By exchanging information with others, each tracker further improves its performance, thus increasing overall tracking performance. Experimental results show that our method tracks the object accurately and reliably in realistic videos, where appearance and motion drastically change over time, and outperforms even state-of-the-art tracking methods.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 01/2014; 36(7):1428-1441.
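The tracker sampler above draws both target states and entire trackers via MCMC. As a minimal illustration of the underlying machinery, here is a random-walk Metropolis-Hastings sampler on a toy one-dimensional target; the target density, proposal scale, and sample count are illustrative and have nothing to do with the paper's tracker space:

```python
import math
import random

def metropolis_hastings(log_target, x0, n_samples, step=0.5, seed=0):
    """Random-walk Metropolis-Hastings over an unnormalised log-density."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        # accept with probability min(1, target(proposal) / target(x))
        log_alpha = log_target(proposal) - log_target(x)
        if rng.random() < math.exp(min(0.0, log_alpha)):
            x = proposal
        samples.append(x)
    return samples

# toy target: a standard normal, log p(x) = -x^2 / 2 up to a constant
samples = metropolis_hastings(lambda x: -0.5 * x * x, 0.0, 20000)
```

The samples converge to the target distribution without ever normalising it, which is what makes MCMC usable over spaces as unwieldy as "all visual trackers"; the paper's IMCMC extension runs several such chains in parallel and lets them exchange information.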
    ABSTRACT: In this paper, we propose a new framework for tackling the face recognition problem. Face recognition is formulated as a groupwise deformable image registration and feature matching problem. The main contributions of the proposed method lie in the following aspects: (1) each pixel in a facial image is represented by an anatomical signature obtained from its most salient scale local region, determined by the survival exponential entropy (SEE) information-theoretic measure; (2) based on the anatomical signature calculated at each pixel, a novel Markov random field based groupwise registration framework is proposed to formulate face recognition as a feature-guided deformable image registration problem, where the similarity between different facial images is measured on a nonlinear Riemannian manifold based on the deformable transformations; (3) the proposed method does not suffer from the generalizability problem that commonly affects learning-based algorithms. The proposed method has been extensively evaluated on four publicly available databases: FERET, CAS-PEAL-R1, FRGC ver 2.0, and LFW. It is also compared with several state-of-the-art face recognition approaches, and experimental results demonstrate that the proposed method consistently achieves the highest recognition rates among all the methods under comparison.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 01/2014; 36(4):657-669.
    ABSTRACT: This paper presents a new bilateral filtering method specially designed for practical stereo vision systems. Parallel algorithms are preferred in these systems due to the real-time performance requirement. Edge-preserving filters like the bilateral filter have been demonstrated to be very effective for high-quality local stereo matching. A hardware-efficient bilateral filter is thus proposed in this paper. Implemented on an NVIDIA GeForce GTX 580 GPU, it can process a one-megapixel color image at around 417 frames per second. This filter can be directly used for the cost aggregation required in any local stereo matching algorithm. Quantitative evaluation shows that it outperforms all other local stereo methods in terms of both accuracy and speed on the Middlebury benchmark. It ranks 12th out of over 120 methods on the Middlebury data sets, and the average runtime (including matching cost computation, occlusion handling, and post-processing) is only 15 milliseconds (67 frames per second).
    IEEE Transactions on Pattern Analysis and Machine Intelligence 01/2014; 36(5):1026-1032.
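The property that makes the bilateral filter attractive for cost aggregation is edge preservation: weights combine spatial closeness with intensity similarity, so averaging does not cross intensity edges. A minimal 1-D sketch (illustrative sigmas; the paper's contribution is a hardware-efficient reformulation, not this direct form):

```python
import math

def bilateral_filter_1d(signal, radius=2, sigma_s=1.0, sigma_r=0.2):
    """Edge-preserving 1-D bilateral filter: each output sample is a
    weighted average whose weights combine spatial closeness (sigma_s)
    and intensity similarity (sigma_r)."""
    out = []
    for i, center in enumerate(signal):
        wsum = vsum = 0.0
        for j in range(max(0, i - radius), min(len(signal), i + radius + 1)):
            w = math.exp(-((i - j) ** 2) / (2.0 * sigma_s ** 2)
                         - ((signal[j] - center) ** 2) / (2.0 * sigma_r ** 2))
            wsum += w
            vsum += w * signal[j]
        out.append(vsum / wsum)
    return out

step = [0.0] * 5 + [1.0] * 5   # a step edge, e.g. a depth discontinuity
smoothed = bilateral_filter_1d(step)
```

Across the step, the intensity term drives the cross-edge weights toward zero, so the edge survives filtering; a plain Gaussian blur of the same radius would smear it.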
    ABSTRACT: Traditionally, the hinge loss is used to construct support vector machine (SVM) classifiers. The hinge loss is related to the shortest distance between sets, and the corresponding classifier is hence sensitive to noise and unstable under re-sampling. In contrast, the pinball loss is related to the quantile distance, and the resulting classifier is less sensitive. The pinball loss has been thoroughly studied and widely applied in regression, but it has not been used for classification. In this paper, we propose an SVM classifier with the pinball loss, called pin-SVM, and investigate its properties, including noise insensitivity, robustness, and misclassification error. In addition, an insensitive zone is applied to the pin-SVM to obtain a sparse model. Compared to the SVM with the hinge loss, the proposed pin-SVM has the same computational complexity and enjoys noise insensitivity and re-sampling stability.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 01/2014;
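The contrast between the two losses is easy to state concretely. Assuming the common formulation of the pinball loss for classification (identity on the positive part of the margin residual, slope -tau on the negative part), the sketch below shows that the losses agree on margin violations but differ on points classified well beyond the margin:

```python
def hinge_loss(y, score):
    """Hinge loss: zero once the margin y * score >= 1 is satisfied."""
    return max(0.0, 1.0 - y * score)

def pinball_loss(y, score, tau=0.5):
    """Pinball loss for classification: identical to hinge on margin
    violations (u >= 0), but keeps a small slope -tau on well-classified
    points, tying the boundary to a quantile rather than to the nearest
    (possibly noisy) points."""
    u = 1.0 - y * score
    return u if u >= 0.0 else -tau * u

# a margin violation is penalised identically by both losses
violation = (hinge_loss(+1, 0.5), pinball_loss(+1, 0.5))
# a point far beyond the margin: hinge gives 0, pinball keeps a small penalty
safe = (hinge_loss(+1, 3.0), pinball_loss(+1, 3.0))
```

The nonzero penalty on safe points is what makes the pinball classifier depend on the whole distribution of margins instead of only the closest samples, which is the source of the noise insensitivity claimed in the abstract.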
