IEEE Transactions on Pattern Analysis and Machine Intelligence (IEEE T PATTERN ANAL)

Publisher: IEEE Computer Society; Institute of Electrical and Electronics Engineers

Journal description

Theory and application of computers in pattern analysis and machine intelligence. Topics include computer vision and image processing; knowledge representation, inference systems, and probabilistic reasoning. Extensive bibliographies.

Current impact factor: 5.78

Impact Factor Rankings

Year    Impact factor
2015    Available summer 2016
2014    5.781
2013    5.694
2012    4.795
2011    4.908
2010    5.027
2009    4.378
2008    5.96
2007    3.579
2006    4.306
2005    3.81
2004    4.352
2003    3.823
2002    2.923
2001    2.289
2000    2.094
1999    1.882
1998    1.417
1997    1.668
1996    2.085
1995    1.94
1994    2.006
1993    1.917
1992    1.906

[Figure: impact factor over time]

Additional details

5-year impact: 7.76
Cited half-life: >10.0
Immediacy index: 0.71
Eigenfactor: 0.05
Article influence: 3.31
Website: IEEE Transactions on Pattern Analysis and Machine Intelligence website
Other titles: IEEE transactions on pattern analysis and machine intelligence; Institute of Electrical and Electronics Engineers transactions on pattern analysis and machine intelligence
ISSN: 0162-8828
OCLC: 4253074
Material type: Periodical, Internet resource
Document type: Journal / Magazine / Newspaper, Internet Resource

Publisher details

Institute of Electrical and Electronics Engineers

  • Pre-print
    • Author can archive a pre-print version
  • Post-print
    • Author can archive a post-print version
  • Conditions
    • Author's pre-print on Author's personal website, employer's website or publicly accessible server
    • Author's post-print on Author's server or Institutional server
    • Author's pre-print must be removed upon publication of the final version and replaced with a full citation to the IEEE work with a Digital Object Identifier, a link to the article abstract in IEEE Xplore, or the Author's post-print
    • Author's pre-print must be accompanied by a set phrase once submitted to IEEE for publication ("This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible")
    • Author's pre-print must be accompanied by a set phrase when accepted by IEEE for publication ("(c) 20xx IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.")
    • IEEE must be informed as to the electronic address of the pre-print
    • If funding rules apply, authors may post the Author's post-print version in the funder's designated repository
    • Author's Post-print - Publisher copyright and source must be acknowledged with citation (see above set statement)
    • Author's Post-print - Must link to publisher version with DOI
    • Publisher's version/PDF cannot be used
    • Publisher copyright and source must be acknowledged
  • Classification

Publications in this journal

  • ABSTRACT: We study the problem of classifying actions of human subjects using depth movies generated by Kinect or other depth sensors. Representing the human body as dynamical skeletons, we study the evolution of their (skeletons’) shapes as trajectories on Kendall’s shape manifold. The action data is typically corrupted by large variability in execution rates within and across subjects, which causes major problems in statistical analyses. To address that issue, we adapt a recently developed framework of Su et al. [1], [2] to this problem domain. Here, the variable execution rates correspond to re-parameterizations of trajectories, and one uses a parameterization-invariant metric for aligning, comparing, averaging, and modeling trajectories. This is based on a combination of transported square-root vector fields (TSRVFs) of trajectories and the standard Euclidean norm, which allows computational efficiency. We develop a comprehensive suite of computational tools for this application domain: smoothing and denoising skeleton trajectories using median filtering, up- and down-sampling actions in time domain, simultaneous temporal registration of multiple actions, and extracting invertible Euclidean representations of actions. Due to invertibility, these Euclidean representations allow both discriminative and generative models for statistical analysis. For instance, they can be used in an SVM-based classification of original actions, as demonstrated here using the MSR Action-3D, MSR Daily Activity and 3D Action Pairs datasets. This approach, using only the skeletal data, achieves state-of-the-art classification results on these datasets.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 12/2015; DOI:10.1109/TPAMI.2015.2439257
  • ABSTRACT: Certain inner feelings and physiological states like pain are subjective states that cannot be directly measured, but can be estimated from spontaneous facial expressions. Since they are typically characterized by subtle movements of facial parts, analysis of the facial details is required. To this end, we formulate a new regression method for continuous estimation of the intensity of facial behavior interpretation, called Doubly Sparse Relevance Vector Machine (DSRVM). DSRVM enforces double sparsity by jointly selecting the most relevant training examples (a.k.a. relevance vectors) and the most important kernels associated with facial parts relevant for interpretation of observed facial expressions. This advances prior work on multi-kernel learning, where sparsity of relevant kernels is typically ignored. Empirical evaluation on challenging Shoulder Pain videos and the benchmark DISFA and SEMAINE datasets demonstrates that DSRVM outperforms competing approaches with a multi-fold reduction of running times in training and testing.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 11/2015; DOI:10.1109/TPAMI.2015.2501824
  • ABSTRACT: Principal component analysis (PCA) is widely applied in various areas; one of its typical applications is face recognition. Many versions of PCA have been developed for face recognition. However, most of these approaches are sensitive to grossly corrupted entries in a 2D matrix representing a face image. In this paper, we try to reduce the influence of gross-like variations in lighting, facial expressions and occlusions to improve the robustness of PCA. In order to achieve this goal, we present a simple but effective unsupervised preprocessing method, two-dimensional whitening reconstruction (TWR), which includes two stages: 1) a whitening process on a 2D face image matrix rather than a concatenated 1D vector; 2) 2D face image matrix reconstruction. TWR reduces the pixel redundancy of the internal image while maintaining important intrinsic features. In this way, negative effects introduced by gross-like variations are greatly reduced. Furthermore, a face image preprocessed with TWR can be approximated as a Gaussian signal, on which PCA is more effective. Experiments on benchmark face databases demonstrate that the proposed method can significantly improve the robustness of PCA methods on classification and clustering, especially for faces with severe illumination changes.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 11/2015; DOI:10.1109/TPAMI.2015.2501810
  • ABSTRACT: We propose a real-time method to accurately track the human head pose in the 3-dimensional (3D) world. Using an RGB-Depth camera, a face template is reconstructed by fitting a 3D morphable face model, and the head pose is determined by registering this user-specific face template to the input depth video.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 11/2015; DOI:10.1109/TPAMI.2015.2500221
  • ABSTRACT: This paper addresses modelling data using the Watson distribution. The Watson distribution is one of the simplest distributions for analyzing axially symmetric data. This distribution has gained some attention in recent years due to its modeling capability. However, its Bayesian inference is fairly understudied due to difficulty in handling the normalization factor. Recent development of Markov chain Monte Carlo (MCMC) sampling methods can be applied for this purpose. However, these methods can be prohibitively slow for practical applications. A deterministic alternative is provided by variational methods that convert inference problems into optimization problems. In this paper, we present a variational inference for Watson mixture models. First, the variational framework is used to side-step the intractability arising from the coupling of latent states and parameters. Second, the variational free energy is further lower bounded in order to avoid intractable moment computation. The proposed approach provides a lower bound on the log marginal likelihood and retains distributional information over all parameters. Moreover, we show that it can regulate its own complexity by pruning unnecessary mixture components while avoiding over-fitting. We discuss potential applications of the modeling with Watson distributions in the problem of blind source separation, and clustering gene expression data sets.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 11/2015; DOI:10.1109/TPAMI.2015.2498935
  • ABSTRACT: Understanding human behavior through nonverbal features is of interest in several applications, such as surveillance, ambient assisted living and human-robot interaction. In this article, in order to analyze human behaviors in a social context, we propose a new approach which explores interrelations between body part motions in scenarios with people in conversation. The novelty of this method is that we analyze body motion-based features in the frequency domain to estimate different human social patterns: Interpersonal Behaviors (IBs) and a Social Role (SR). To analyze the dynamics and interrelations of people's body motions, a human movement descriptor is used to extract discriminative features, and a multi-layer Dynamic Bayesian Network (DBN) technique is proposed to model the existing dependencies. Laban Movement Analysis (LMA) is a well-known human movement descriptor, which provides efficient mid-level information of human body motions. The mid-level information is useful to extract the complex interdependencies. The DBN technique is tested in different scenarios to model the mentioned complex dependencies. The study is applied for obtaining four IBs (Interest, Indicator, Empathy and Emphasis) to estimate one SR (Leading). The obtained results give a good indication of the capabilities of the proposed approach for people interaction analysis with potential applications in human-robot interaction.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 11/2015; DOI:10.1109/TPAMI.2015.2496209
  • ABSTRACT: There are two sides to every story of visual saliency modeling in the frequency domain. On the one hand, image saliency can be effectively estimated by applying simple operations to the frequency spectrum. On the other hand, it is still unclear which part of the frequency spectrum contributes the most to popping-out targets and suppressing distractors. Toward this end, this paper tentatively explores the secret of image saliency in the frequency domain. From the results obtained in several qualitative and quantitative experiments, we find that the secret of visual saliency may mainly hide in the phases of intermediate frequencies. To explain this finding, we reinterpret the concept of discrete Fourier transform from the perspective of template-based contrast computation and thus develop several principles for designing the saliency detector in the frequency domain. Following these principles, we propose a novel approach to design the saliency detector under the assistance of prior knowledge obtained through both unsupervised and supervised learning processes. Experimental results on a public image benchmark show that the learned saliency detector outperforms 18 state-of-the-art approaches in predicting human fixations.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 11/2015; 37(12):1-1. DOI:10.1109/TPAMI.2015.2424870
  • ABSTRACT: This paper describes a new method of finding thin, elongated structures in images and volumes. We use shortest paths to minimize very general functionals of higher-order curve properties, such as curvature and torsion. Our method uses line graphs to find the optimal path on a given discretization, often on the order of seconds on a single computer. The curves are then refined using local optimization, making it possible to recover very smooth curves. We are able to place constraints on our curves such as maximum integrated curvature, or a maximum curvature at any point of the curve. To our knowledge, we are the first to perform experiments in three dimensions with curvature and torsion regularization. The largest graphs we process have over a hundred billion arcs. Experiments on medical images and in multi-view reconstruction show the significance and practical usefulness of higher order regularization.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 11/2015; 37(12):1-1. DOI:10.1109/TPAMI.2015.2409869
  • ABSTRACT: We study the space of first order models of smooth frame fields using the method of moving frames. By exploiting the Maurer-Cartan matrix of connection forms we develop geometrical embeddings for frame fields which lie on spherical, ellipsoidal and generalized helicoid surfaces. We design methods for optimizing connection forms in local neighborhoods and apply these to a statistical analysis of heart fiber geometry, using diffusion magnetic resonance imaging. This application of moving frames corroborates and extends recent characterizations of muscle fiber orientation in the heart wall, but also provides for a rich geometrical interpretation. In particular, we can now obtain direct local measurements of the variation of the helix and transverse angles, of fiber fanning and twisting, and of the curvatures of the heart wall in which these fibers lie.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 11/2015; 37(12):1-1. DOI:10.1109/TPAMI.2015.2408352
  • ABSTRACT: It is practical to assume that an individual view is unlikely to be sufficient for effective multi-view learning. Therefore, integration of multi-view information is both valuable and necessary. In this paper, we propose the Multi-view Intact Space Learning (MISL) algorithm, which integrates the encoded complementary information in multiple views to discover a latent intact representation of the data. Even though each view on its own is insufficient, we show theoretically that by combining multiple views we can obtain abundant information for latent intact space learning. Employing the Cauchy loss (a technique used in statistical learning) as the error measurement strengthens robustness to outliers. We propose a new definition of multi-view stability and then derive the generalization error bound based on multi-view stability and Rademacher complexity, and show that the complementarity between multiple views is beneficial for the stability and generalization. MISL is efficiently optimized using a novel Iteratively Reweight Residuals (IRR) technique, whose convergence is theoretically analyzed. Experiments on synthetic data and real-world datasets demonstrate that MISL is an effective and promising algorithm for practical applications.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 11/2015; 37(12):1-1. DOI:10.1109/TPAMI.2015.2417578
  • ABSTRACT: Color, infrared and flash images captured in different fields can be employed to effectively eliminate noise and other visual artifacts. We propose a two-image restoration framework considering input images from different fields, for example, one noisy color image and one dark-flashed near-infrared image. The major issue in such a framework is to handle all structure divergence and find commonly usable edges and smooth transitions for visually plausible image reconstruction. We introduce a novel scale map as a competent representation to explicitly model derivative-level confidence and propose new functions and a numerical solver to effectively infer it following our important structural observations. Multispectral shadow detection is also used to make our system more robust. Our method is general and shows a principled way to solve multispectral restoration problems.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 11/2015; 37(12):1-1. DOI:10.1109/TPAMI.2015.2417569
  • ABSTRACT: We present an algorithm that integrates image co-segmentation into feature matching, and can robustly yield accurate and dense feature correspondences. Inspired by the fact that correct feature correspondences on the same object typically have coherent transformations, we cast the task of feature matching as a density estimation problem in the homography space. Specifically, we project the homographies of correspondence candidates into the parametric Hough space, in which geometric verification of correspondences can be activated by voting. The precision of matching is then boosted. On the other hand, we leverage image co-segmentation, which discovers object boundaries, to determine relevant voters and speed up Hough voting. In addition, correspondence enrichment can be achieved by inferring the concerted homographies that are propagated between the features within the same segments. The recall is hence increased. In our approach, feature matching and image co-segmentation are tightly coupled. Through an iterative optimization process, more and more correct correspondences are detected owing to object boundaries revealed by co-segmentation. The proposed approach is comprehensively evaluated. Promising experimental results on four datasets manifest its effectiveness.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 11/2015; 37(12):1-1. DOI:10.1109/TPAMI.2015.2420556
  • ABSTRACT: Many tasks in computer vision, such as action classification and object detection, require us to rank a set of samples according to their relevance to a particular visual category. The performance of such tasks is often measured in terms of the average precision (AP). Yet it is common practice to employ the support vector machine (SVM) classifier, which optimizes a surrogate 0-1 loss. The popularity of SVM can be attributed to its empirical performance. Specifically, in fully supervised settings, SVM tends to provide similar accuracy to AP-SVM, which directly optimizes an AP-based loss. However, we hypothesize that in the significantly more challenging and practically useful setting of weakly supervised learning, it becomes crucial to optimize the right accuracy measure. In order to test this hypothesis, we propose a novel latent AP-SVM that minimizes a carefully designed upper bound on the AP-based loss function over weakly supervised samples. Using publicly available datasets, we demonstrate the advantage of our approach over standard loss-based learning frameworks on three challenging problems: action classification, character recognition and object detection.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 11/2015; 37(12):1-1. DOI:10.1109/TPAMI.2015.2414435
  • ABSTRACT: A robust and accurate hue descriptor that is useful in modeling human color perception and for computer vision applications is explored. The hue descriptor is based on the peak wavelength of a Gaussian-like function (called a wraparound Gaussian) and is shown to correlate as well as CIECAM02 hue to the hue designators of papers from the Munsell and Natural Color System color atlases and to the hue names found in Moroney's Color Thesaurus. The new hue descriptor is also shown to be significantly more stable under a variety of illuminants than CIECAM02. The use of wraparound Gaussians as a hue model is similar in spirit to the use of subtractive Gaussians proposed by Mizokami et al., but overcomes many of their limitations.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 11/2015; 37(12):1-1. DOI:10.1109/TPAMI.2015.2420560
  • ABSTRACT: A typical scene category contains an enormous number of distinct scene configurations that are composed of objects and regions of varying shapes in different layouts. In this paper, we first propose a representation named Hierarchical Space Tiling (HST) to quantize the huge and continuous scene configuration space. Then, we augment the HST with attributes (nouns and adjectives) to describe the semantics of the objects and regions inside a scene. We present a weakly supervised method for simultaneously learning the scene configurations and attributes from a collection of natural images associated with descriptive text. The precise locations of attributes are unknown in the input and are mapped to the HST nodes through learning. Starting with a full HST, we iteratively estimate the HST model under a learning-by-parsing framework. Given a test image, we compute the most probable parse tree with the associated attributes by dynamic programming. We quantitatively analyze the representative efficiency of HST, show the learned representation is less ambiguous and has semantically meaningful inner concepts. In applications, we apply our model to four tasks: scene classification, attribute recognition, attribute localization, and pixel-wise scene labeling, and show the performance improvements as well as higher efficiency.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 11/2015; 37(12):1-1. DOI:10.1109/TPAMI.2015.2424880
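
Illustrative note: the average precision (AP) measure named in the latent AP-SVM abstract above can be sketched in a few lines. This is a minimal sketch of the standard ranking metric only, not the authors' learning method. The function name `average_precision` and the toy scores and labels are hypothetical, chosen for illustration.

```python
from typing import Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Return the AP of a ranking: the mean of precision@k taken at the
    rank k of every relevant (label == 1) sample."""
    # Rank sample indices by descending score; sorted() is stable, so ties
    # keep their input order.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits = 0
    precisions = []
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            precisions.append(hits / rank)  # precision at this rank
    # By convention here, a ranking with no relevant samples scores 0.0.
    return sum(precisions) / len(precisions) if precisions else 0.0


# Hypothetical toy data: relevant samples land at ranks 1 and 3.
scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, 0, 1, 0]
ap = average_precision(scores, labels)  # (1/1 + 2/3) / 2
```

Because AP depends on the whole ordering rather than on per-sample decisions, a classifier trained on a per-sample surrogate loss does not optimize it directly, which is the gap the abstract refers to.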