IEEE Transactions on Pattern Analysis and Machine Intelligence (IEEE T PATTERN ANAL)

Publisher: IEEE Computer Society; Institute of Electrical and Electronics Engineers, Institute of Electrical and Electronics Engineers

Journal description

Theory and application of computers in pattern analysis and machine intelligence. Topics include computer vision and image processing; knowledge representation, inference systems, and probabilistic reasoning. Extensive bibliographies.

Current impact factor: 5.69

Impact Factor Rankings

2015 Impact Factor Available summer 2015
2013 / 2014 Impact Factor 5.694
2012 Impact Factor 4.795
2011 Impact Factor 4.908
2010 Impact Factor 5.027
2009 Impact Factor 4.378
2008 Impact Factor 5.96
2007 Impact Factor 3.579
2006 Impact Factor 4.306
2005 Impact Factor 3.81
2004 Impact Factor 4.352
2003 Impact Factor 3.823
2002 Impact Factor 2.923
2001 Impact Factor 2.289
2000 Impact Factor 2.094
1999 Impact Factor 1.882
1998 Impact Factor 1.417
1997 Impact Factor 1.668
1996 Impact Factor 2.085
1995 Impact Factor 1.94
1994 Impact Factor 2.006
1993 Impact Factor 1.917
1992 Impact Factor 1.906

Impact factor over time

Impact factor

Additional details

5-year impact 6.14
Cited half-life 0.00
Immediacy index 0.63
Eigenfactor 0.05
Article influence 3.24
Website IEEE Transactions on Pattern Analysis and Machine Intelligence website
Other titles IEEE transactions on pattern analysis and machine intelligence, Institute of Electrical and Electronics Engineers transactions on pattern analysis and machine intelligence
ISSN 0162-8828
OCLC 4253074
Material type Periodical, Internet resource
Document type Journal / Magazine / Newspaper, Internet Resource

Publisher details

Institute of Electrical and Electronics Engineers

  • Pre-print
    • Author can archive a pre-print version
  • Post-print
    • Author can archive a post-print version
  • Conditions
    • Author's pre-print on Author's personal website, employers website or publicly accessible server
    • Author's post-print on Author's server or Institutional server
    • Author's pre-print must be removed upon publication of final version and replaced with either full citation to IEEE work with a Digital Object Identifier or link to article abstract in IEEE Xplore or replaced with Authors post-print
    • Author's pre-print must be accompanied with set-phrase, once submitted to IEEE for publication ("This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible")
    • Author's pre-print must be accompanied with set-phrase, when accepted by IEEE for publication ("(c) 20xx IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other users, including reprinting/ republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.")
    • IEEE must be informed as to the electronic address of the pre-print
    • If funding rules apply authors may post Author's post-print version in funder's designated repository
    • Author's Post-print - Publisher copyright and source must be acknowledged with citation (see above set statement)
    • Author's Post-print - Must link to publisher version with DOI
    • Publisher's version/PDF cannot be used
    • Publisher copyright and source must be acknowledged
  • Classification
    ​ green

Publications in this journal

  • [Show abstract] [Hide abstract]
    ABSTRACT: Random forests works by averaging several predictions of de-correlated trees. We show a conceptually radical approach to generate a random forest: random sampling of many trees from a prior distribution, and subsequently performing a weighted ensemble of predictive probabilities. Our approach uses priors that allow sampling of decision trees even before looking at the data, and a power likelihood that explores the space spanned by combination of decision trees. While each tree performs Bayesian inference to compute its predictions, our aggregation procedure uses the power likelihood rather than the likelihood and is therefore strictly speaking not Bayesian. Nonetheless, we refer to it as a Bayesian random forest but with a built-in safety. The safeness comes as it has good predictive performance even if the underlying probabilistic model is wrong. We demonstrate empirically that our Safe-Bayesian random forest outperforms MCMC or SMC based Bayesian decision trees in term of speed and accuracy, and achieves competitive performance to entropy or Gini optimised random forest, yet is very simple to construct.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 06/2015; 37(6):1297-1303. DOI:10.1109/TPAMI.2014.2362751
  • [Show abstract] [Hide abstract]
    ABSTRACT: Connected operators provide well-established solutions for digital image processing, typically in conjunction with hierarchical schemes. In graph-based frameworks, such operators basically rely on symmetric adjacency relations between pixels. In this article, we introduce a notion of directed connected operators for hierarchical image processing, by also considering non-symmetric adjacency relations. The induced image representation models are no longer partition hierarchies (i.e., trees), but directed acyclic graphs that generalize standard morphological tree structures such as component trees, binary partition trees or hierarchical watersheds. We describe how to efficiently build and handle these richer data structures, and we illustrate the versatility of the proposed framework in image filtering and image segmentation.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 06/2015; 37(6):1162-1176. DOI:10.1109/TPAMI.2014.2366145
  • [Show abstract] [Hide abstract]
    ABSTRACT: This paper introduces a new high dynamic range (HDR) imaging algorithm which utilizes rank minimization. Assuming a camera responses linearly to scene radiance, the input low dynamic range (LDR) images captured with different exposure time exhibit a linear dependency and form a rank-1 matrix when stacking intensity of each corresponding pixel together. In practice, misalignments caused by camera motion, presences of moving objects, saturations and image noise break the rank-1 structure of the LDR images. To address these problems, we present a rank minimization algorithm which simultaneously aligns LDR images and detects outliers for robust HDR generation. We evaluate the performances of our algorithm systematically using synthetic examples and qualitatively compare our results with results from the state-of-the-art HDR algorithms using challenging real world examples.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 06/2015; 37(6):1219-1232. DOI:10.1109/TPAMI.2014.2361338
  • [Show abstract] [Hide abstract]
    ABSTRACT: The high complexity of multi-scale, category-level object detection in cluttered scenes is efficiently handled by Hough voting methods. However, the main shortcoming of the approach is that mutually dependent local observations are independently casting their votes for intrinsically global object properties such as object scale. Object hypotheses are then assumed to be a mere sum of their part votes. Popular representation schemes are, however, based on a dense sampling of semi-local image features, which are consequently mutually dependent. We take advantage of part dependencies and incorporate them into probabilistic Hough voting by deriving an objective function that connects three intimately related problems: i) grouping mutually dependent parts, ii) solving the correspondence problem conjointly for dependent parts, and iii) finding concerted object hypotheses using extended groups rather than based on local observations alone. Early commitments are avoided by not restricting parts to only a single vote for a locally best correspondence and we learn a weighting of parts during training to reflect their differing relevance for an object. Experiments successfully demonstrate the benefit of incorporating part dependencies through grouping into Hough voting. The joint optimization of groupings, correspondences, and votes not only improves the detection accuracy over standard Hough voting and a sliding window baseline, but it also reduces the computational complexity by significantly decreasing the number of candidate hypotheses.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 06/2015; 37(6):1134-1147. DOI:10.1109/TPAMI.2014.2363456
  • [Show abstract] [Hide abstract]
    ABSTRACT: While 3D object-centered shape-based models are appealing in comparison with 2D viewer-centered appearance-based models for their lower model complexities and potentially better view generalizabilities, the learning and inference of 3D models has been much less studied in the recent literature due to two factors: i) the enormous complexities of 3D shapes in geometric space; and ii) the gap between 3D shapes and their appearances in images. This paper aims at tackling the two problems by studying an And-Or Tree (AoT) representation that consists of two parts: i) a geometry-AoT quantizing the geometry space, i.e. the possible compositions of 3D volumetric parts and 2D surfaces within the volumes; and ii) an appearance-AoT quantizing the appearance space, i.e. the appearance variations of those shapes in different views. In this AoT, an And-node decomposes an entity into constituent parts, and an Or-node represents alternative ways of decompositions. Thus it can express a combinatorial number of geometry and appearance configurations through small dictionaries of 3D shape primitives and 2D image primitives. In the quantized space, the problem of learning a 3D object template is transformed to a structure search problem which can be efficiently solved in a dynamic programming algorithm by maximizing the information gain. We focus on learning 3D car templates from the AoT and collect a new car dataset featuring more diverse views. The learned car templates integrate both the shape-based model and the appearance-based model to combine the benefits of both. In experiments, we show three aspects: 1) the AoT is more efficient than the frequently used octree method in space representation; 2) the learned 3D car template matches the state-of-the art performances on car detection and pose estimation in a public multi-view car dataset; and 3) in our new dataset, the learned 3D template solves the joint task of simultaneous object detection, pose/view estimation, and part locali- ation. It can generalize over unseen views and performs better than the version 5 of the DPM model in terms of object detection and semantic part localization.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 06/2015; 37(6):1190-1205. DOI:10.1109/TPAMI.2014.2362141
  • [Show abstract] [Hide abstract]
    ABSTRACT: Semantic segmentation and object detection are nowadays dominated by methods operating on regions obtained as a result of a bottom-up grouping process (segmentation) but use feature extractors developed for recognition on fixed-form (e.g. rectangular) patches, with full images as a special case. This is most likely suboptimal. In this paper we focus on feature extraction and description over free-form regions and study the relationship with their fixed-form counterparts. Our main contributions are novel pooling techniques that capture the second-order statistics of local descriptors inside such free-form regions. We introduce second-order generalizations of average and max-pooling that together with appropriate non-linearities, derived from the mathematical structure of their embedding space, lead to state-of-the-art recognition performance in semantic segmentation experiments without any type of local feature coding. In contrast, we show that codebook-based local feature coding is more important when feature extraction is constrained to operate over regions that include both foreground and large portions of the background, as typical in image classification settings, whereas for high-accuracy localization setups, second-order pooling over free-form regions produces results superior to those of the winning systems in the contemporary semantic segmentation challenges, with models that are much faster in both training and testing.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 06/2015; 37(6):1177-1189. DOI:10.1109/TPAMI.2014.2361137
  • [Show abstract] [Hide abstract]
    ABSTRACT: This paper proposes a deterministic explanation for mutual-information-based image registration (MI registration). The explanation is that MI registration works because it aligns certain image partitions. This notion of aligning partitions is new, and is shown to be related to Schur- and quasi-convexity. The partition-alignment theory of this paper goes beyond explaining mutual- information. It suggests other objective functions for registering images. Some of these newer objective functions are not entropy-based. Simulations with noisy images show that the newer objective functions work well for registration, lending support to the theory. The theory proposed in this paper opens a number of directions for further research in image registration. These directions are also discussed.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 06/2015; 37(6):1286-1296. DOI:10.1109/TPAMI.2014.2361512
  • [Show abstract] [Hide abstract]
    ABSTRACT: We present a fully automatic system for extracting the semantic structure of a typical academic presentation video, which captures the whole presentation stage with abundant camera motions such as panning, tilting, and zooming. Our system automatically detects and tracks both the projection screen and the presenter whenever they are visible in the video. By analyzing the image content of the tracked screen region, our system is able to detect slide progressions and extract a high-quality, non-occluded, geometrically-compensated image for each slide, resulting in a list of representative images that reconstruct the main presentation structure. Afterwards, our system recognizes text content and extracts keywords from the slides, which can be used for keyword-based video retrieval and browsing. Experimental results show that our system is able to generate more stable and accurate screen localization results than commonly-used object tracking methods. Our system also extracts more accurate presentation structures than general video summarization methods, for this specific type of video.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 06/2015; 37(6):1233-1246. DOI:10.1109/TPAMI.2014.2361133
  • [Show abstract] [Hide abstract]
    ABSTRACT: A robust and effective specular highlight removal method is proposed in this paper. It is based on a key observation—the maximum fraction of the diffuse colour component in diffuse local patches in colour images changes smoothly. The specular pixels can thus be treated as noise in this case. This property allows the specular highlights to be removed in an image denoising fashion: an edge-preserving low-pass filter (e.g., the bilateral filter) can be used to smooth the maximum fraction of the colour components of the original image to remove the noise contributed by the specular pixels. Recent developments in fast bilateral filtering techniques enable the proposed method to run over faster than state-of-the-art techniques on a standard CPU and differentiates it from previous work.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 06/2015; 37(6):1304-1311. DOI:10.1109/TPAMI.2014.2360402
  • [Show abstract] [Hide abstract]
    ABSTRACT: A proper temporal model is essential to analysis tasks involving sequential data. In computer-assisted surgical training, which is the focus of this study, obtaining accurate temporal models is a key step towards automated skill-rating. Conventional learning approaches can have only limited success in this domain due to insufficient amount of data with accurate labels. We propose a novel formulation termed Relative Hidden Markov Model and develop algorithms for obtaining a solution under this formulation. The method requires only relative ranking between input pairs, which are readily available from training sessions in the target application, hence alleviating the requirement on data labeling. The proposed algorithm learns a model from the training data so that the attribute under consideration is linked to the likelihood of the input, hence supporting comparing new sequences. For evaluation, synthetic data are first used to assess the performance of the approach, and then we experiment with real videos from a widely-adopted surgical training platform. Experimental results suggest that the proposed approach provides a promising solution to video-based motion skill evaluation. To further illustrate the potential of generalizing the method to other applications of temporal analysis, we also report experiments on using our model on speech-based emotion recognition.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 06/2015; 37(6):1206-1218. DOI:10.1109/TPAMI.2014.2361121
  • [Show abstract] [Hide abstract]
    ABSTRACT: We propose a face alignment framework that relies on the texture model generated by the responses of discriminatively trained part-based filters. Unlike standard texture models built from pixel intensities or responses generated by generic filters (e.g. Gabor), our framework has two important advantages. First, by virtue of discriminative training, invariance to external variations (like identity, pose, illumination and expression) is achieved. Second, we show that the responses generated by discriminatively trained filters (or patch-experts) are sparse and can be modeled using a very small number of parameters. As a result, the optimization methods based on the proposed texture model can better cope with unseen variations. We illustrate this point by formulating both part-based and holistic approaches for generic face alignment and show that our framework outperforms the state-of-the-art on multiple ”wild” databases. The code and dataset annotations are available for research purposes from
    IEEE Transactions on Pattern Analysis and Machine Intelligence 06/2015; 37(6):1312-1320. DOI:10.1109/TPAMI.2014.2362142
  • [Show abstract] [Hide abstract]
    ABSTRACT: Autoencoders are popular feature learning models, that are conceptually simple, easy to train and allow for efficient inference. Recent work has shown how certain autoencoders can be associated with an energy landscape, akin to negative log-probability in a probabilistic model, which measures how well the autoencoder can represent regions in the input space. The energy landscape has been commonly inferred heuristically, by using a training criterion that relates the autoencoder to a probabilistic model such as a Restricted Boltzmann Machine (RBM). In this paper we show how most common autoencoders are naturally associated with an energy function, independent of the training procedure, and that the energy landscape can be inferred analytically by integrating the reconstruction function of the autoencoder. For autoencoders with sigmoid hidden units, the energy function is identical to the free energy of an RBM, which helps shed light onto the relationship between these two types of model. We also show that the autoencoder energy function allows us to explain common regularization procedures, such as contractive training, from the perspective of dynamical systems. As a practical application of the energy function, a generative classifier based on class-specific autoencoders is presented.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 06/2015; 37(6):1261-1273. DOI:10.1109/TPAMI.2014.2362140
  • [Show abstract] [Hide abstract]
    ABSTRACT: We study the problem of classifying actions of human subjects using depth movies generated by Kinect or other depth sensors. Representing human body as dynamical skeletons, we study the evolution of their (skeletons’) shapes as trajectories on Kendall’s shape manifold. The action data is typically corrupted by large variability in execution rates within and across subjects and, thus, causing major problems in statistical analyses. To address that issue, we adopt a recently-developed framework of Su et al. to this problem domain. Here, the variable execution rates correspond to re-parameterizations of trajectories, and one uses a parameterization-invariant metric for aligning, comparing, averaging, and modeling trajectories. This is based on a combination of transported square-root vector fields (TSRVFs) of trajectories and the standard Euclidean norm, that allows computational efficiency. We develop a comprehensive suite of computational tools for this application domain: smoothing and denoising skeleton trajectories using median filtering, up- and down-sampling actions in time domain, simultaneous temporal- registration of multiple actions, and extracting invertible Euclidean representations of actions. Due to invertibility these Euclidean representations allow both discriminative and generative models for statistical analysis. For instance, they can be used in a SVM-based classification of original actions as demonstrated here using MSR Action-3D, MSR Daily Activity and 3D Action Pairs datasets. This approach, using only the skeletal data, achieves the state-of-the-art classification results on these datasets.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 05/2015;
  • [Show abstract] [Hide abstract]
    ABSTRACT: We propose a new family of message passing techniques for MAP estimation in graphical models which we call Sequential Reweighted Message Passing (SRMP). Special cases include well-known techniques such as Min-Sum Diffusion (MSD) and a faster Sequential Tree-Reweighted Message Passing (TRW-S). Importantly, our derivation is simpler than the original derivation of TRW-S, and does not involve a decomposition into trees. This allows easy generalizations. The new family of algorithms can be viewed as a generalization of TRW-S from pairwise to higher-order graphical models. We test SRMP on several real-world problems with promising results.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 05/2015; 37(5):919-930. DOI:10.1109/TPAMI.2014.2363465
  • [Show abstract] [Hide abstract]
    ABSTRACT: Modeling intensity of facial action units from spontaneously displayed facial expressions is challenging mainly because of high variability in subject-specific facial expressiveness, head-movements, illumination changes, etc. These factors make the target problem highly context-sensitive. However, existing methods usually ignore this context-sensitivity of the target problem. We propose a novel Conditional Ordinal Random Field (CORF) model for context-sensitive modeling of the facial action unit intensity, where the W5+ (who, when , what, where, why and how) definition of the context is used. While the proposed model is general enough to handle all six context questions, in this paper we focus on the context questions: who (the observed subject), how (the changes in facial expressions), and when (the timing of facial expressions and their intensity). The context questions who and how are modeled by means of the newly introduced context-dependent covariate effects, and the context question when is modeled in terms of temporal correlation between the ordinal outputs, i.e., intensity levels of action units. We also introduce a weighted softmax-margin learning of CRFs from data with skewed distribution of the intensity levels, which is commonly encountered in spontaneous facial data. The proposed model is evaluated on intensity estimation of pain and facial action units using two recently published datasets (UNBC Shoulder Pain and DISFA) of spontaneously displayed facial expressions. Our experiments show that the proposed model performs significantly better on the target tasks compared to the state-of-the-art approaches. Furthermore, compared to traditional learning of CRFs, we show that the proposed weighted learning results in more robust parameter estimation from th- imbalanced intensity data.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 05/2015; 37(5):944-958. DOI:10.1109/TPAMI.2014.2356192
  • [Show abstract] [Hide abstract]
    ABSTRACT: Soft-constraint semi-supervised affinity propagation (SCSSAP) adds supervision to the affinity propagation (AP) clustering algorithm without strictly enforcing instance-level constraints. Constraint violations lead to an adjustment of the AP similarity matrix at every iteration of the proposed algorithm and to addition of a penalty to the objective function. This formulation is particularly advantageous in the presence of noisy labels or noisy constraints since the penalty parameter of SCSSAP can be tuned to express our confidence in instance-level constraints. When the constraints are noiseless, SCSSAP outperforms unsupervised AP and performs at least as well as the previously proposed semi-supervised AP and constrained expectation maximization. In the presence of label and constraint noise, SCSSAP results in a more accurate clustering than either of the aforementioned established algorithms. Finally, we present an extension of SCSSAP which incorporates metric learning in the optimization objective and can further improve the performance of clustering.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 05/2015; 37(5):1041-1052. DOI:10.1109/TPAMI.2014.2359454
  • [Show abstract] [Hide abstract]
    ABSTRACT: We present a method to track the shape of an object from video. The method uses a joint shape and appearance model of the object, which is propagated to match shape and radiance in subsequent frames, determining object shape. Self-occlusions and dis-occlusions of the object from camera and object motion pose difficulties to joint shape and appearance models in tracking. They are unable to adapt to new shape and appearance information, leading to inaccurate shape detection. In this work, we model self-occlusions and dis-occlusions in a joint shape and appearance tracking framework. Self-occlusions and the warp to propagate the model are coupled, thus we formulate a joint optimization problem. We derive a coarse-to-fine optimization method, advantageous in tracking, that initially perturbs the model by coarse perturbations before transitioning to finer-scale perturbations seamlessly. This coarse-to-fine behavior is automatically induced by gradient descent on a novel infinite-dimensional Riemannian manifold that we introduce. The manifold consists of planar parameterized regions, and the metric that we introduce is a novel Sobolev metric. Experiments on video exhibiting occlusions/dis-occlusions, complex radiance and background show that occlusion/dis-occlusion modeling leads to superior shape accuracy.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 05/2015; 37(5):1053-1066. DOI:10.1109/TPAMI.2014.2360380
  • [Show abstract] [Hide abstract]
    ABSTRACT: Clothing recognition is a societally and commercially important yet extremely challenging problem due to large variations in clothing appearance, layering, style, and body shape and pose. In this paper, we tackle the clothing parsing problem using a retrieval-based approach. For a query image, we find similar styles from a large database of tagged fashion images and use these examples to recognize clothing items in the query. Our approach combines parsing from: pre-trained global clothing models, local clothing models learned on the fly from retrieved examples, and transferred parse-masks (Paper Doll item transfer) from retrieved examples. We evaluate our approach extensively and show significant improvements over previous state-of-the-art for both localization (clothing parsing given weak supervision in the form of tags) and detection (general clothing parsing). Our experimental results also indicate that the general pose estimation problem can benefit from clothing parsing.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 05/2015; 37(5):1028-1040. DOI:10.1109/TPAMI.2014.2353624