IEEE Transactions on Pattern Analysis and Machine Intelligence (IEEE T PATTERN ANAL)

Publisher: IEEE Computer Society; Institute of Electrical and Electronics Engineers

Journal description

Theory and application of computers in pattern analysis and machine intelligence. Topics include computer vision and image processing; knowledge representation, inference systems, and probabilistic reasoning. Extensive bibliographies.

Current impact factor: 5.78

Impact Factor Rankings

2016 Impact Factor Available summer 2017
2014 / 2015 Impact Factor 5.781
2013 Impact Factor 5.694
2012 Impact Factor 4.795
2011 Impact Factor 4.908
2010 Impact Factor 5.027
2009 Impact Factor 4.378
2008 Impact Factor 5.96
2007 Impact Factor 3.579
2006 Impact Factor 4.306
2005 Impact Factor 3.81
2004 Impact Factor 4.352
2003 Impact Factor 3.823
2002 Impact Factor 2.923
2001 Impact Factor 2.289
2000 Impact Factor 2.094
1999 Impact Factor 1.882
1998 Impact Factor 1.417
1997 Impact Factor 1.668
1996 Impact Factor 2.085
1995 Impact Factor 1.94
1994 Impact Factor 2.006
1993 Impact Factor 1.917
1992 Impact Factor 1.906
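For readers unfamiliar with the metric, the two-year impact factor for year Y is the number of citations received in Y by items the journal published in years Y-1 and Y-2, divided by the number of citable items published in those two years. A minimal sketch of the arithmetic (the counts below are hypothetical, chosen only to illustrate how a figure like the one above arises):

```python
def impact_factor(citations_this_year, citable_items_prev_two_years):
    """Two-year impact factor: citations received this year to articles
    published in the previous two years, divided by the number of those
    articles."""
    return citations_this_year / citable_items_prev_two_years

# Hypothetical counts: 1734 citations to 300 citable items gives 5.78,
# close to the journal's reported figure.
score = impact_factor(1734, 300)
```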

Additional details

5-year impact 7.76
Cited half-life >10.0
Immediacy index 0.71
Eigenfactor 0.05
Article influence 3.31
Website IEEE Transactions on Pattern Analysis and Machine Intelligence website
Other titles IEEE transactions on pattern analysis and machine intelligence, Institute of Electrical and Electronics Engineers transactions on pattern analysis and machine intelligence
ISSN 0162-8828
OCLC 4253074
Material type Periodical, Internet resource
Document type Journal / Magazine / Newspaper, Internet Resource

Publisher details

Institute of Electrical and Electronics Engineers

  • Pre-print
    • Author can archive a pre-print version
  • Post-print
    • Author can archive a post-print version
  • Conditions
    • Author's pre-print on Author's personal website, employer's website, or publicly accessible server
    • Author's post-print on Author's server or institutional server
    • Author's pre-print must be removed upon publication of the final version and replaced with either a full citation to the IEEE work with a Digital Object Identifier, a link to the article abstract in IEEE Xplore, or the Author's post-print
    • Author's pre-print must be accompanied by a set phrase once submitted to IEEE for publication ("This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible")
    • Author's pre-print must be accompanied by a set phrase when accepted by IEEE for publication ("(c) 20xx IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other users, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.")
    • IEEE must be informed of the electronic address of the pre-print
    • If funding rules apply, authors may post the Author's post-print version in the funder's designated repository
    • Author's post-print: publisher copyright and source must be acknowledged with a citation (see the set phrase above)
    • Author's post-print: must link to the publisher version with a DOI
    • Publisher's version/PDF cannot be used
    • Publisher copyright and source must be acknowledged

Publications in this journal

    ABSTRACT: We consider a problem of clustering a sequence of multinomial observations by way of a model selection criterion. We propose a form of a penalty term for the model selection procedure. Our approach subsumes both the conventional AIC and BIC criteria and also extends them so that they remain applicable to sequences of sparse multinomial observations, where, even within the same cluster, the number of multinomial trials may differ across observations. In addition, as a preliminary estimation step to maximum likelihood estimation, and more generally, to maximum Lq estimation, we propose to use reduced rank projection in combination with non-negative factorization. We motivate our approach by showing that our model selection criterion and preliminary estimation step yield consistent estimates under simplifying assumptions. We also illustrate our approach through numerical experiments using real and simulated data.
    No preview · Article · Feb 2016 · IEEE Transactions on Pattern Analysis and Machine Intelligence
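The abstract above builds on the standard AIC and BIC criteria. As background, a generic sketch of those baseline criteria only (not the authors' extended penalty), applied to choosing among hypothetical mixture fits:

```python
import math

def aic(log_lik, n_params):
    # Akaike information criterion: 2k - 2 ln L (lower is better).
    return 2 * n_params - 2 * log_lik

def bic(log_lik, n_params, n_obs):
    # Bayesian information criterion: k ln n - 2 ln L; penalizes
    # parameters more heavily as the sample size grows.
    return math.log(n_obs) * n_params - 2 * log_lik

# Hypothetical fits: (log-likelihood, free parameters) for 1, 2 and 4 clusters.
fits = [(-1200.0, 10), (-1150.0, 20), (-1145.0, 40)]
n_obs = 500
best = min(range(len(fits)), key=lambda i: bic(fits[i][0], fits[i][1], n_obs))
```

Under BIC the middle model wins here: the third fit's small likelihood gain does not justify doubling the parameter count.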
    ABSTRACT: In this work, we present an approach to fuse video with sparse orientation data obtained from inertial sensors to improve and stabilize full-body human motion capture. Even though video data is a strong cue for motion analysis, tracking artifacts occur frequently due to ambiguities in the images, rapid motions, occlusions or noise. As a complementary data source, inertial sensors allow for accurate estimation of limb orientations even under fast motions. However, accurate position information cannot be obtained in continuous operation. Therefore, we propose a hybrid tracker that combines video with a small number of inertial units to compensate for the drawbacks of each sensor type: on the one hand, we obtain drift-free and accurate position information from video data and, on the other hand, we obtain accurate limb orientations and good performance under fast motions from inertial sensors. In several experiments we demonstrate the increased performance and stability of our human motion tracker.
    No preview · Article · Feb 2016 · IEEE Transactions on Pattern Analysis and Machine Intelligence
    ABSTRACT: Partial differential equations (PDEs) have been used to formulate image processing problems for several decades. Generally, a PDE system consists of two components: the governing equation and the boundary condition. In most previous work, both are hand-designed using mathematical expertise. However, in real-world visual analysis tasks, such predefined, fixed-form PDEs may not be able to describe the complex structure of the visual data. More importantly, it is hard to incorporate labeling information and discriminative distribution priors into these PDEs. To address the above issues, we propose a new PDE framework, named learning to diffuse (LTD), to adaptively design the governing equation and the boundary condition of a diffusion PDE system for various vision tasks on different types of visual data. To the best of our knowledge, the problems considered in this paper (i.e., saliency detection and object tracking) have never been addressed by PDE models before. Experimental results on various challenging benchmark databases show the superiority of LTD against existing state-of-the-art methods for all the tested visual analysis tasks.
    No preview · Article · Feb 2016 · IEEE Transactions on Pattern Analysis and Machine Intelligence
    ABSTRACT: Soft biometrics have been emerging to complement other traits and are particularly useful for poor quality data. In this paper, we propose an efficient algorithm to estimate human head poses and to infer soft biometric labels based on the 3D morphology of the human head. Starting by considering a set of pose hypotheses, we use a learning set of head shapes synthesized from anthropometric surveys to derive a set of 3D head centroids that constitutes a metric space. Next, representing queries by sets of 2D head landmarks, we use projective geometry techniques to rank efficiently the joint 3D head centroids / pose hypotheses according to their likelihood of matching each query. The rationale is that the most likely hypotheses are sufficiently close to the query, so a good solution can be found by convex energy minimization techniques. Once a solution has been found, the 3D head centroid and the query are assumed to have similar morphology, yielding the soft label. Our experiments point toward the usefulness of the proposed solution, which can improve the effectiveness of face recognizers and can also be used as a privacy-preserving solution for biometric recognition in public environments.
    No preview · Article · Feb 2016 · IEEE Transactions on Pattern Analysis and Machine Intelligence
    ABSTRACT: Light-field cameras have become widely available in both consumer and industrial applications. However, most previous approaches do not model occlusions explicitly, and therefore fail to capture sharp object boundaries. A common assumption is that for a Lambertian scene, a pixel will exhibit photo-consistency, which means all viewpoints converge to a single point when focused to its depth. However, in the presence of occlusions this assumption fails to hold, making most current approaches unreliable precisely where accurate depth information is most important - at depth discontinuities. In this paper, an occlusion-aware depth estimation algorithm is developed; the method also enables identification of occlusion edges, which may be useful in other applications. It can be shown that although photo-consistency is not preserved for pixels at occlusions, it still holds in approximately half the viewpoints. Moreover, the line separating the two view regions (occluded object vs. occluder) has the same orientation as that of the occlusion edge in the spatial domain. By ensuring photo-consistency in only the occluded view region, depth estimation can be improved. Occlusion predictions can also be computed and used for regularization. Experimental results show that our method outperforms current state-of-the-art light-field depth estimation algorithms, especially near occlusion boundaries.
    No preview · Article · Jan 2016 · IEEE Transactions on Pattern Analysis and Machine Intelligence
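The photo-consistency idea in the abstract above can be illustrated with a toy sketch: score a depth hypothesis by the spread of pixel intensities across viewpoints, and, near an occlusion, score only the better half of the views. Note the split below is by intensity, purely for illustration; the paper selects the view region by the occlusion edge's spatial orientation:

```python
import statistics

def photo_consistency_cost(intensities):
    # A Lambertian point seen from all views should have near-identical
    # intensities, so low variance means high photo-consistency.
    return statistics.pvariance(intensities)

def occlusion_aware_cost(intensities):
    # Toy version: at an occlusion roughly half the views still see the
    # true surface, so keep only the more consistent half.
    s = sorted(intensities)
    half = len(s) // 2
    return min(statistics.pvariance(s[:half]), statistics.pvariance(s[half:]))

# Four views see the surface (intensity 0.2), four are occluded (0.9):
views = [0.2, 0.2, 0.2, 0.2, 0.9, 0.9, 0.9, 0.9]
```

The full-view cost is high (the occluder contaminates the measurement) while the half-view cost is zero, matching the abstract's observation that photo-consistency still holds in approximately half the viewpoints.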
    ABSTRACT: Facial expressions are an important way through which humans interact socially. Building a system capable of automatically recognizing facial expressions from images and video has been an intense field of study in recent years. Interpreting such expressions remains challenging, and much research is needed on the way they relate to human affect. This paper presents a general overview of automatic RGB, 3D, thermal and multimodal facial expression analysis. We define a new taxonomy for the field, encompassing all steps from face detection to facial expression recognition, and describe and classify the state-of-the-art methods accordingly. We also present the important datasets and the benchmarking of the most influential methods. We conclude with a general discussion about trends, important questions and future lines of research.
    No preview · Article · Jan 2016 · IEEE Transactions on Pattern Analysis and Machine Intelligence
    ABSTRACT: This paper studies the estimation of Dirichlet process mixtures over discrete incomplete rankings. The generative model for each mixture component is the generalized Mallows (GM) model, an exponential family model for permutations which extends seamlessly to top-t rankings. While the GM is remarkably tractable in comparison with other permutation models, its conjugate prior is not. Our main contribution is to derive the theory and algorithms for sampling from the desired posterior distributions under this DPM. We introduce a family of partially collapsed Gibbs samplers, containing as one extreme point an exact algorithm based on slice-sampling, and at the other a fast approximate sampler with superior mixing that is still very accurate in all but the lowest ranks. We empirically demonstrate the effectiveness of the approximation in reducing mixing time, the benefits of the Dirichlet process approach over alternative clustering techniques, and the applicability of the approach to exploring large real-world ranking datasets.
    No preview · Article · Jan 2016 · IEEE Transactions on Pattern Analysis and Machine Intelligence
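As background on the (non-generalized) Mallows model the abstract above starts from: it places weight exp(-θ·d(σ, σ₀)) on each permutation σ, where d is the Kendall tau distance to a central ranking σ₀. A brute-force sketch over a tiny item set (the paper's generalized model and top-t extension are more involved):

```python
import math
from itertools import combinations, permutations

def kendall_tau(p, q):
    # Number of item pairs that the two rankings order differently.
    pos = {item: i for i, item in enumerate(q)}
    return sum(1 for a, b in combinations(p, 2) if pos[a] > pos[b])

def mallows_pmf(sigma, center, theta, items):
    # P(sigma) = exp(-theta * d(sigma, center)) / Z(theta), with Z computed
    # by brute force; only feasible for very small item sets.
    z = sum(math.exp(-theta * kendall_tau(list(p), center))
            for p in permutations(items))
    return math.exp(-theta * kendall_tau(sigma, center)) / z
```

The central ranking is the mode, and probability decays exponentially in the number of pairwise disagreements with it; θ controls how concentrated the distribution is.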

  • No preview · Article · Jan 2016 · IEEE Transactions on Pattern Analysis and Machine Intelligence

  • No preview · Article · Jan 2016 · IEEE Transactions on Pattern Analysis and Machine Intelligence
    ABSTRACT: Human parsing, namely partitioning the human body into semantic regions, has drawn much attention recently for its wide applications in human-centric analysis. Previous works often consider solving the problem of human pose estimation as the prerequisite of human parsing. We argue that these approaches cannot obtain optimal pixel-level parsing due to the inconsistent targets between the different tasks. In this work, we directly address the problem of human parsing by using the novel Parselet representation as the building blocks of our parsing model. Parselets are a group of parsable segments which can generally be obtained by low-level over-segmentation algorithms and bear strong semantic meaning. We then build a deformable mixture parsing model (DMPM) for human parsing to simultaneously handle the deformation and multi-modalities of Parselets. The proposed model has two unique characteristics: (1) the possible numerous modalities of Parselet ensembles are exhibited as the "And-Or" structure of sub-trees; (2) to further solve the practical problem of Parselet occlusion or absence, we directly model the visibility property at some leaf nodes. The DMPM thus directly solves the problem of human parsing by searching for the best graph configuration from a pool of Parselet hypotheses without intermediate tasks. Fast rejection based on hierarchical filtering is employed to ensure the overall efficiency. Comprehensive evaluations on a new large-scale human parsing dataset, crawled from the Internet with high-resolution images and thoroughly annotated pixel-level semantic labels, as well as on a benchmark dataset, demonstrate the encouraging performance of the proposed approach.
    No preview · Article · Dec 2015 · IEEE Transactions on Pattern Analysis and Machine Intelligence
    ABSTRACT: We propose a facial alignment algorithm that is able to jointly deal with the presence of facial pose variation, partial occlusion of the face, and varying illumination and expressions. Our approach proceeds from sparse to dense landmarking steps using a set of specific models trained to best account for the shape and texture variation manifested by facial landmarks and facial shapes across pose and various expressions. We also propose the use of a novel ℓ1-regularized least squares approach that we incorporate into our shape model, which is an improvement over the shape model used by several prior Active Shape Model (ASM) based facial landmark localization algorithms. Our approach is compared against several state-of-the-art methods on many challenging test datasets and exhibits a higher fitting accuracy on all of them.
    No preview · Article · Dec 2015 · IEEE Transactions on Pattern Analysis and Machine Intelligence
    ABSTRACT: Cross-modal retrieval has recently drawn much attention due to the widespread existence of multimodal data. It takes one type of data as the query to retrieve relevant data objects of another type, and generally involves two basic problems: the measure of relevance and coupled feature selection. Most previous methods just focus on solving the first problem. In this paper, we aim to deal with both problems in a novel joint learning framework. To address the first problem, we learn projection matrices to map multimodal data into a common subspace, in which the similarity between different modalities of data can be measured. In the learning procedure, the ℓ21-norm penalties are imposed on the projection matrices separately to solve the second problem, which selects relevant and discriminative features from different feature spaces simultaneously. A multimodal graph regularization term is further imposed on the projected data, which preserves the inter-modality and intra-modality similarity relationships. An iterative algorithm is presented to solve the proposed joint learning problem, along with its convergence analysis. Experimental results on cross-modal retrieval tasks demonstrate that the proposed method outperforms the state-of-the-art subspace approaches.
    No preview · Article · Dec 2015 · IEEE Transactions on Pattern Analysis and Machine Intelligence
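The ℓ21-norm penalty mentioned in the abstract above has a simple closed form: the sum of the Euclidean norms of a matrix's rows. Because a whole row must shrink together, minimizing it drives entire rows to zero, which is what makes it a joint feature-selection penalty. A minimal sketch of the norm itself (not the paper's full objective):

```python
import math

def l21_norm(W):
    """Sum of the Euclidean (l2) norms of the rows of W. Penalizing this
    zeroes out whole rows, i.e., discards whole features at once."""
    return sum(math.sqrt(sum(x * x for x in row)) for row in W)

W = [[3.0, 4.0],   # row norm 5
     [0.0, 0.0],   # a zeroed-out feature contributes nothing
     [0.0, 2.0]]   # row norm 2
```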
    ABSTRACT: The idea of modeling object-object relations has been widely leveraged in many scene understanding applications. However, as objects are designed by humans and for human usage, when we reason about a human environment, we reason about it through an interplay between the environment, objects and humans. In this paper, we model environments not only through objects, but also through latent human poses and human-object interactions. In order to handle the large number of latent human poses and the large variety of their interactions with objects, we present the Infinite Latent Conditional Random Field (ILCRF), which models a scene as a mixture of CRFs generated from Dirichlet processes. In each CRF, we model objects and object-object relations as existing nodes and edges, and hidden human poses and human-object relations as latent nodes and edges. ILCRF generatively models the distribution of different CRF structures over these latent nodes and edges. We apply the model to the challenging applications of 3D scene labeling and robotic scene arrangement. In extensive experiments, we show that our model significantly outperforms the state-of-the-art results in both applications. We further use our algorithm on a robot for arranging objects in a new scene using the two aforementioned applications.
    No preview · Article · Dec 2015 · IEEE Transactions on Pattern Analysis and Machine Intelligence
    ABSTRACT: Object detection performance, as measured on the canonical PASCAL VOC Challenge datasets, plateaued in the final years of the competition. The best-performing methods were complex ensemble systems that typically combined multiple low-level image features with high-level context. In this paper, we propose a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 50% relative to the previous best result on VOC 2012—achieving a mAP of 62.4%. Our approach combines two ideas: (1) one can apply high-capacity convolutional networks (CNNs) to bottom-up region proposals in order to localize and segment objects and (2) when labeled training data are scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, boosts performance significantly. Since we combine region proposals with CNNs, we call the resulting model an R-CNN or Region-based Convolutional Network. Source code for the complete system is available at
    No preview · Article · Dec 2015 · IEEE Transactions on Pattern Analysis and Machine Intelligence
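Detection metrics such as the mAP figure quoted in the abstract above rest on intersection-over-union (IoU) matching between predicted boxes (e.g., region proposals) and ground truth; a detection conventionally counts as correct when IoU ≥ 0.5. A minimal sketch of the standard IoU computation for axis-aligned boxes:

```python
def iou(box_a, box_b):
    # Boxes as (x1, y1, x2, y2); IoU = intersection area / union area.
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    return inter / (area_a + area_b - inter)
```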
    ABSTRACT: This paper addresses the problem of simultaneous estimation of different linear deformations, resulting in a global non-linear transformation, between an original object and its broken fragments. A general framework is proposed without using correspondences, where the solution of a polynomial system of equations directly provides the parameters of the alignment. We quantitatively evaluate the proposed algorithm on a large synthetic dataset containing 2D and 3D images, where linear (rigid-body and affine) transformations are considered. We also conduct an exhaustive analysis of the robustness against segmentation errors and the numerical stability of the proposed method. Moreover, we present experiments on 2D real images as well as on volumetric medical images.
    No preview · Article · Dec 2015 · IEEE Transactions on Pattern Analysis and Machine Intelligence