IEEE Transactions on Pattern Analysis and Machine Intelligence (IEEE T PATTERN ANAL)

Publisher: IEEE Computer Society; Institute of Electrical and Electronics Engineers

Journal description

Theory and application of computers in pattern analysis and machine intelligence. Topics include computer vision and image processing; knowledge representation, inference systems, and probabilistic reasoning. Extensive bibliographies.

Current impact factor: 5.69

Impact Factor Rankings

2015 Impact Factor Available summer 2015
2013 / 2014 Impact Factor 5.694
2012 Impact Factor 4.795
2011 Impact Factor 4.908
2010 Impact Factor 5.027
2009 Impact Factor 4.378
2008 Impact Factor 5.96
2007 Impact Factor 3.579
2006 Impact Factor 4.306
2005 Impact Factor 3.81
2004 Impact Factor 4.352
2003 Impact Factor 3.823
2002 Impact Factor 2.923
2001 Impact Factor 2.289
2000 Impact Factor 2.094
1999 Impact Factor 1.882
1998 Impact Factor 1.417
1997 Impact Factor 1.668
1996 Impact Factor 2.085
1995 Impact Factor 1.94
1994 Impact Factor 2.006
1993 Impact Factor 1.917
1992 Impact Factor 1.906

Impact factor over time

Additional details

5-year impact 6.14
Cited half-life 0.00
Immediacy index 0.63
Eigenfactor 0.05
Article influence 3.24
Website IEEE Transactions on Pattern Analysis and Machine Intelligence website
Other titles IEEE transactions on pattern analysis and machine intelligence, Institute of Electrical and Electronics Engineers transactions on pattern analysis and machine intelligence
ISSN 0162-8828
OCLC 4253074
Material type Periodical, Internet resource
Document type Journal / Magazine / Newspaper, Internet Resource

Publisher details

Institute of Electrical and Electronics Engineers

  • Pre-print
    • Author can archive a pre-print version
  • Post-print
    • Author can archive a post-print version
  • Conditions
    • Author's pre-print on Author's personal website, employer's website or publicly accessible server
    • Author's post-print on Author's server or Institutional server
    • Author's pre-print must be removed upon publication of final version and replaced with either full citation to IEEE work with a Digital Object Identifier or link to article abstract in IEEE Xplore or replaced with Author's post-print
    • Author's pre-print must be accompanied with set-phrase, once submitted to IEEE for publication ("This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible")
    • Author's pre-print must be accompanied with set-phrase, when accepted by IEEE for publication ("(c) 20xx IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.")
    • IEEE must be informed as to the electronic address of the pre-print
    • If funding rules apply authors may post Author's post-print version in funder's designated repository
    • Author's Post-print - Publisher copyright and source must be acknowledged with citation (see above set statement)
    • Author's Post-print - Must link to publisher version with DOI
    • Publisher's version/PDF cannot be used
    • Publisher copyright and source must be acknowledged
  • Classification
    • green

Publications in this journal

  •
    ABSTRACT: We study the problem of classifying actions of human subjects using depth movies generated by Kinect or other depth sensors. Representing the human body as a dynamical skeleton, we study the evolution of skeleton shapes as trajectories on Kendall’s shape manifold. The action data is typically corrupted by large variability in execution rates within and across subjects, which causes major problems in statistical analyses. To address that issue, we adapt a recently developed framework of Su et al. [1], [2] to this problem domain. Here, the variable execution rates correspond to re-parameterizations of trajectories, and one uses a parameterization-invariant metric for aligning, comparing, averaging, and modeling trajectories. This is based on a combination of transported square-root vector fields (TSRVFs) of trajectories and the standard Euclidean norm, which allows computational efficiency. We develop a comprehensive suite of computational tools for this application domain: smoothing and denoising skeleton trajectories using median filtering, up- and down-sampling actions in the time domain, simultaneous temporal registration of multiple actions, and extracting invertible Euclidean representations of actions. Due to invertibility, these Euclidean representations allow both discriminative and generative models for statistical analysis. For instance, they can be used in an SVM-based classification of original actions, as demonstrated here using the MSR Action-3D, MSR Daily Activity and 3D Action Pairs datasets. This approach, using only the skeletal data, achieves state-of-the-art classification results on these datasets.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 12/2015; DOI:10.1109/TPAMI.2015.2439257
  •
    ABSTRACT: We present a calibration method for a time-of-flight (ToF) sensor and color camera pair that correctly aligns the 3D measurements with the color image. We have designed a 2.5D pattern board with irregularly placed holes that can be accurately detected in low-resolution depth images from a ToF camera as well as in high-resolution color images. To improve the accuracy of the 3D measurements of a ToF camera, we propose to perform ray correction and range bias correction. Through ray correction, we reset the transformation of the ToF sensor that converts the radial distance into the scene depth in Cartesian coordinates. Then we capture a planar scene from different depths to correct the distance error, which is shown to depend not only on the distance but also on the pixel location. The range error profiles along the calibrated distance are classified according to their wiggling shapes, and each cluster of profiles with similar shape is separately estimated using a B-spline function. The standard deviation of the remaining random noise is recorded as uncertainty information for the distance measurements. We show the performance of our calibration method quantitatively and qualitatively on various datasets, and validate the impact of our method by demonstrating an RGB-D shape refinement application.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 07/2015; 37(7):1501-1513. DOI:10.1109/TPAMI.2014.2363827
  •
    ABSTRACT: We consider the problem of parameter estimation and energy minimization for a region-based semantic segmentation model. The model divides the pixels of an image into non-overlapping connected regions, each of which is assigned to a semantic class. In the context of energy minimization, the main problem we face is the large number of putative pixel-to-region assignments. We address this problem by designing an accurate linear programming based approach for selecting the best set of regions from a large dictionary. The dictionary is constructed by merging and intersecting segments obtained from multiple bottom-up over-segmentations. The linear program is solved efficiently using dual decomposition. In the context of parameter estimation, the main problem we face is the lack of fully supervised data. We address this issue by developing a principled framework for parameter estimation using diverse data. More precisely, we propose a latent structural support vector machine formulation, where the latent variables model any missing information in the human annotation. Of particular interest to us are three types of annotations: (i) images segmented using generic foreground or background classes; (ii) images with bounding boxes specified for objects; and (iii) images labeled to indicate the presence of a class. Using large, publicly available datasets we show that our methods are able to significantly improve the accuracy of the region-based model.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 07/2015; 37(7):1373-1386. DOI:10.1109/TPAMI.2014.2372766
  •
    ABSTRACT: Higher-order Markov Random Fields, which can capture important properties of natural images, have become increasingly important in computer vision. While graph cuts work well for first-order MRF’s, until recently they have rarely been effective for higher-order MRF’s. Ishikawa’s graph cut technique [1], [2] shows great promise for many higher-order MRF’s. His method transforms an arbitrary higher-order MRF with binary labels into a first-order one with the same minima. If all the terms are submodular the exact solution can be easily found; otherwise, pseudoboolean optimization techniques can produce an optimal labeling for a subset of the variables. We present a new transformation with better performance than [1], [2], both theoretically and experimentally. While [1], [2] transforms each higher-order term independently, we use the underlying hypergraph structure of the MRF to transform a group of terms at once. For $n$ binary variables, each of which appears in terms with $k$ other variables, at worst we produce $n$ non-submodular terms, while [1], [2] produces $O(nk)$. We identify a local completeness property under which our method performs even better, and show that under certain assumptions several important vision problems (including common variants of fusion moves) have this property. We show experimentally that our method produces smaller weight of non-submodular edges, and that this metric is directly related to the effectiveness of QPBO [3]. Running on the same field of experts dataset used in [1], [2] we optimally label significantly more variables (96 versus 80 percent) and converge more rapidly to a lower energy. Preliminary experiments suggest that some other higher-order MRF’s used in stereo [4] and segmentation [5] are also locally complete and would thus benefit from our work.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 07/2015; 37(7):1387-1395. DOI:10.1109/TPAMI.2014.2382109
  •
    ABSTRACT: Many inference tasks in pattern recognition and artificial intelligence lead to partition functions in which addition and multiplication are abstract binary operations forming a commutative semiring. By generalizing max-sum diffusion (one of the convergent message-passing algorithms for approximate MAP inference in graphical models), we propose an iterative algorithm to upper bound such partition functions over commutative semirings. The iteration of the algorithm is remarkably simple: change any two factors of the partition function such that their product remains the same and their overlapping marginals become equal. In many commutative semirings, repeating this iteration for different pairs of factors converges to a fixed point when the overlapping marginals of every pair of factors coincide. We call this state marginal consistency. During this process, an upper bound on the partition function monotonically decreases. This abstract algorithm unifies several existing algorithms, including max-sum diffusion and basic constraint propagation (or local consistency) algorithms in constraint programming. We further construct a hierarchy of marginal consistencies of increasingly higher levels and show that any such level can be enforced by adding identity factors of higher arity (order). Finally, we discuss instances of the framework for several semirings, including the distributive lattice and the max-sum and sum-product semirings.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 07/2015; 37(7):1455-1468. DOI:10.1109/TPAMI.2014.2363833
  •
    ABSTRACT: Robust human gait recognition is challenging because of the presence of covariate factors such as carrying condition, clothing, walking surface, etc. In this paper, we model the effect of covariates as an unknown partial feature corruption problem. Since the locations of corruptions may differ for different query gaits, relevant features may become irrelevant when the walking condition changes. In this case, it is difficult to train one fixed classifier that is robust to a large number of different covariates. To tackle this problem, we propose a classifier ensemble method based on the random subspace method (RSM) and majority voting (MV). Its theoretical basis suggests it is insensitive to locations of corrupted features, and thus can generalize well to a large number of covariates. We also extend this method by proposing two strategies, i.e., local enhancing (LE) and hybrid decision-level fusion (HDF), to suppress the ratio of false votes to true votes (before MV). The performance of our approach is competitive against the most challenging covariates like clothing, walking surface, and elapsed time. We evaluate our method on the USF dataset and OU-ISIR-B dataset, and it has much higher performance than other state-of-the-art algorithms.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 07/2015; 37(7):1-1. DOI:10.1109/TPAMI.2014.2366766
  •
    ABSTRACT: Markov random fields containing higher-order terms are becoming increasingly popular due to their ability to capture complicated relationships as soft constraints involving many output random variables. In computer vision, an important class of constraints encodes a preference for label consistency over large sets of pixels and can be modeled using higher-order terms known as lower linear envelope potentials. In this paper we develop an algorithm for learning the parameters of binary Markov random fields with weighted lower linear envelope potentials. We first show how to perform exact energy minimization on these models in time polynomial in the number of variables and number of linear envelope functions. Then, with tractable inference in hand, we show how the parameters of the lower linear envelope potentials can be estimated from labeled training data within a max-margin learning framework. We explore three variants of the lower linear envelope parameterization and demonstrate results on both synthetic and real-world problems.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 07/2015; 37(7):1336-1346. DOI:10.1109/TPAMI.2014.2366760
  •
    ABSTRACT: Use of higher order clique potentials in MRF-MAP problems has been limited primarily because of the inefficiencies of the existing algorithmic schemes. We propose a new combinatorial algorithm for computing optimal solutions to label MRF-MAP problems with higher order clique potentials. The algorithm runs in time in the worst case ( is size of clique and is the number of pixels). A special gadget is introduced to model flows in a higher order clique and a technique for building a flow graph is specified. Based on the primal dual structure of the optimization problem, the notions of the capacity of an edge and a cut are generalized to define a flow problem. We show that in this flow graph, when the clique potentials are submodular, the max flow is equal to the min cut, which also is the optimal solution to the problem. We show experimentally that our algorithm provides significantly better solutions in practice and is hundreds of times faster than solution schemes like Dual Decomposition [1], TRWS [2] and Reduction [3], [4], [5]. The framework represents a significant advance in handling higher order problems, making optimal inference practical for medium sized cliques.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 07/2015; 37(7):1323-1335. DOI:10.1109/TPAMI.2014.2388218
  •
    ABSTRACT: The papers in this special section address the programs and services supported by graphical models in computer vision. This section explores the main challenges in this framework (modeling novel priors, learning, inference) and presents innovative solutions. The papers cover the aspects of modeling novel priors, inference algorithms and parameter learning methods in the context of higher order graphical models.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 07/2015; 37(7):1321-1322. DOI:10.1109/TPAMI.2015.2434651
  •
    ABSTRACT: The particle filter is a powerful tool for state tracking using non-linear observations. We present a multiscale method that accelerates the tracking computation by particle filters. Unlike the conventional approach, which calculates weights over all particles in each cycle of the algorithm, we sample a small subset from the source particles using matrix decomposition methods. Then, we apply a function extension algorithm that uses the particle subset to recover the density function for all the remaining particles not included in the chosen subset. The computational effort is substantial especially when multiple objects are tracked concurrently. The proposed algorithm significantly reduces the computational load. By using the Fast Gaussian Transform, the complexity of the particle selection step is reduced to a linear time in and , where is the number of particles and is the number of particles in the selected subset. We demonstrate our method on both simulated and real data, such as object tracking in video sequences.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 07/2015; 37(7):1-1. DOI:10.1109/TPAMI.2015.2392754
  •
    ABSTRACT: We present a very general algorithm for structured prediction learning that is able to efficiently handle discrete MRFs/CRFs (including both pairwise and higher-order models) so long as they can admit a decomposition into tractable subproblems. At its core, it relies on a dual decomposition principle that has been recently employed in the task of MRF optimization. By properly combining such an approach with a max-margin learning method, the proposed framework manages to reduce the training of a complex high-order MRF to the parallel training of a series of simple slave MRFs that are much easier to handle. This leads to a very efficient and general learning scheme that relies on solid mathematical principles. We thoroughly analyze its theoretical properties, and also show that it can yield learning algorithms of increasing accuracy since it naturally allows a hierarchy of convex relaxations to be used for loss-augmented MAP-MRF inference within a max-margin learning approach. Furthermore, it can be easily adapted to take advantage of the special structure that may be present in a given class of MRFs. We demonstrate the generality and flexibility of our approach by testing it on a variety of scenarios, including training of pairwise and higher-order MRFs, training by using different types of regularizers and/or different types of dissimilarity loss functions, as well as by learning of appropriate models for a variety of vision tasks (including high-order models for compact pose-invariant shape priors, knowledge-based segmentation, image denoising, stereo matching as well as high-order Potts MRFs).
    IEEE Transactions on Pattern Analysis and Machine Intelligence 07/2015; 37(7):1425-1441. DOI:10.1109/TPAMI.2014.2368990
  •
    ABSTRACT: Symmetric Positive Definite (SPD) matrices emerge as data descriptors in several applications of computer vision such as object tracking, texture recognition, and diffusion tensor imaging. Clustering these data matrices forms an integral part of these applications, for which soft-clustering algorithms (K-Means, Expectation Maximization, etc.) are generally used. As is well-known, these algorithms need the number of clusters to be specified, which is difficult when the dataset scales. To address this issue, we resort to the classical nonparametric Bayesian framework by modeling the data as a mixture model using the Dirichlet Process (DP) prior. Since these matrices do not conform to Euclidean geometry, but rather belong to a curved Riemannian manifold, existing DP models cannot be directly applied. Thus, in this paper, we propose a novel DP mixture model framework for SPD matrices. Using the log-determinant divergence as the underlying dissimilarity measure to compare these matrices, and further using the connection between this measure and the Wishart distribution, we derive a novel DPM model based on the Wishart-Inverse-Wishart conjugate pair. We apply this model to several applications in computer vision. Our experiments demonstrate that our model is scalable to the dataset size and at the same time achieves superior accuracy compared to several state-of-the-art parametric and nonparametric clustering algorithms.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 07/2015;
  •
    ABSTRACT: We present a fully automatic system for extracting the semantic structure of a typical academic presentation video, which captures the whole presentation stage with abundant camera motions such as panning, tilting, and zooming. Our system automatically detects and tracks both the projection screen and the presenter whenever they are visible in the video. By analyzing the image content of the tracked screen region, our system is able to detect slide progressions and extract a high-quality, non-occluded, geometrically-compensated image for each slide, resulting in a list of representative images that reconstruct the main presentation structure. Afterwards, our system recognizes text content and extracts keywords from the slides, which can be used for keyword-based video retrieval and browsing. Experimental results show that our system is able to generate more stable and accurate screen localization results than commonly-used object tracking methods. Our system also extracts more accurate presentation structures than general video summarization methods, for this specific type of video.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 06/2015; 37(6):1233-1246. DOI:10.1109/TPAMI.2014.2361133
  •
    ABSTRACT: This paper proposes a deterministic explanation for mutual-information-based image registration (MI registration). The explanation is that MI registration works because it aligns certain image partitions. This notion of aligning partitions is new, and is shown to be related to Schur- and quasi-convexity. The partition-alignment theory of this paper goes beyond explaining mutual information. It suggests other objective functions for registering images. Some of these newer objective functions are not entropy-based. Simulations with noisy images show that the newer objective functions work well for registration, lending support to the theory. The theory proposed in this paper opens a number of directions for further research in image registration. These directions are also discussed.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 06/2015; 37(6):1286-1296. DOI:10.1109/TPAMI.2014.2361512
  •
    ABSTRACT: Semantic segmentation and object detection are nowadays dominated by methods operating on regions obtained as a result of a bottom-up grouping process (segmentation) but use feature extractors developed for recognition on fixed-form (e.g. rectangular) patches, with full images as a special case. This is most likely suboptimal. In this paper we focus on feature extraction and description over free-form regions and study the relationship with their fixed-form counterparts. Our main contributions are novel pooling techniques that capture the second-order statistics of local descriptors inside such free-form regions. We introduce second-order generalizations of average and max-pooling that together with appropriate non-linearities, derived from the mathematical structure of their embedding space, lead to state-of-the-art recognition performance in semantic segmentation experiments without any type of local feature coding. In contrast, we show that codebook-based local feature coding is more important when feature extraction is constrained to operate over regions that include both foreground and large portions of the background, as typical in image classification settings, whereas for high-accuracy localization setups, second-order pooling over free-form regions produces results superior to those of the winning systems in the contemporary semantic segmentation challenges, with models that are much faster in both training and testing.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 06/2015; 37(6):1177-1189. DOI:10.1109/TPAMI.2014.2361137
  •
    ABSTRACT: While 3D object-centered shape-based models are appealing in comparison with 2D viewer-centered appearance-based models for their lower model complexities and potentially better view generalizabilities, the learning and inference of 3D models has been much less studied in the recent literature due to two factors: i) the enormous complexities of 3D shapes in geometric space; and ii) the gap between 3D shapes and their appearances in images. This paper aims at tackling the two problems by studying an And-Or Tree (AoT) representation that consists of two parts: i) a geometry-AoT quantizing the geometry space, i.e. the possible compositions of 3D volumetric parts and 2D surfaces within the volumes; and ii) an appearance-AoT quantizing the appearance space, i.e. the appearance variations of those shapes in different views. In this AoT, an And-node decomposes an entity into constituent parts, and an Or-node represents alternative ways of decomposition. Thus it can express a combinatorial number of geometry and appearance configurations through small dictionaries of 3D shape primitives and 2D image primitives. In the quantized space, the problem of learning a 3D object template is transformed into a structure search problem, which can be efficiently solved by a dynamic programming algorithm that maximizes the information gain. We focus on learning 3D car templates from the AoT and collect a new car dataset featuring more diverse views. The learned car templates integrate both the shape-based model and the appearance-based model to combine the benefits of both. In experiments, we show three aspects: 1) the AoT is more efficient than the frequently used octree method in space representation; 2) the learned 3D car template matches the state-of-the-art performance on car detection and pose estimation in a public multi-view car dataset; and 3) in our new dataset, the learned 3D template solves the joint task of simultaneous object detection, pose/view estimation, and part localization. It can generalize over unseen views and performs better than version 5 of the DPM model in terms of object detection and semantic part localization.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 06/2015; 37(6):1190-1205. DOI:10.1109/TPAMI.2014.2362141
  •
    ABSTRACT: Connected operators provide well-established solutions for digital image processing, typically in conjunction with hierarchical schemes. In graph-based frameworks, such operators basically rely on symmetric adjacency relations between pixels. In this article, we introduce a notion of directed connected operators for hierarchical image processing, by also considering non-symmetric adjacency relations. The induced image representation models are no longer partition hierarchies (i.e., trees), but directed acyclic graphs that generalize standard morphological tree structures such as component trees, binary partition trees or hierarchical watersheds. We describe how to efficiently build and handle these richer data structures, and we illustrate the versatility of the proposed framework in image filtering and image segmentation.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 06/2015; 37(6):1162-1176. DOI:10.1109/TPAMI.2014.2366145
  •
    ABSTRACT: The high complexity of multi-scale, category-level object detection in cluttered scenes is efficiently handled by Hough voting methods. However, the main shortcoming of the approach is that mutually dependent local observations are independently casting their votes for intrinsically global object properties such as object scale. Object hypotheses are then assumed to be a mere sum of their part votes. Popular representation schemes are, however, based on a dense sampling of semi-local image features, which are consequently mutually dependent. We take advantage of part dependencies and incorporate them into probabilistic Hough voting by deriving an objective function that connects three intimately related problems: i) grouping mutually dependent parts, ii) solving the correspondence problem conjointly for dependent parts, and iii) finding concerted object hypotheses using extended groups rather than based on local observations alone. Early commitments are avoided by not restricting parts to only a single vote for a locally best correspondence and we learn a weighting of parts during training to reflect their differing relevance for an object. Experiments successfully demonstrate the benefit of incorporating part dependencies through grouping into Hough voting. The joint optimization of groupings, correspondences, and votes not only improves the detection accuracy over standard Hough voting and a sliding window baseline, but it also reduces the computational complexity by significantly decreasing the number of candidate hypotheses.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 06/2015; 37(6):1134-1147. DOI:10.1109/TPAMI.2014.2363456