IEEE Transactions on Image Processing (IEEE T IMAGE PROCESS)

Publisher: IEEE Signal Processing Society; Institute of Electrical and Electronics Engineers

Journal description

This journal focuses on the signal processing aspects of image acquisition, processing, and display, especially modeling, design, and analysis with a strong mathematical basis.

Current impact factor: 3.63

Impact Factor Rankings

2015 Impact Factor Available summer 2016
2014 Impact Factor 3.625
2013 Impact Factor 3.111
2012 Impact Factor 3.199
2011 Impact Factor 3.042
2010 Impact Factor 2.606
2009 Impact Factor 2.848
2008 Impact Factor 3.315
2007 Impact Factor 2.462
2006 Impact Factor 2.715
2005 Impact Factor 2.428
2004 Impact Factor 2.011
2003 Impact Factor 2.642
2002 Impact Factor 2.553
2001 Impact Factor 2.185
2000 Impact Factor 2.078
1999 Impact Factor 2.695
1998 Impact Factor 1.364
1997 Impact Factor 1.063

Additional details

5-year impact 4.48
Cited half-life 7.70
Immediacy index 0.44
Eigenfactor 0.04
Article influence 1.58
Website IEEE Transactions on Image Processing website
Other titles IEEE transactions on image processing, Institute of Electrical and Electronics Engineers transactions on image processing, Image processing
ISSN 1941-0042
OCLC 24103523
Material type Periodical, Internet resource
Document type Journal / Magazine / Newspaper, Internet Resource

Publisher details

Institute of Electrical and Electronics Engineers

  • Pre-print
    • Author can archive a pre-print version
  • Post-print
    • Author can archive a post-print version
  • Conditions
    • Author's pre-print on Author's personal website, employers website or publicly accessible server
    • Author's post-print on Author's server or Institutional server
    • Author's pre-print must be removed upon publication of final version and replaced with either full citation to IEEE work with a Digital Object Identifier or link to article abstract in IEEE Xplore or replaced with Authors post-print
    • Author's pre-print must be accompanied with set-phrase, once submitted to IEEE for publication ("This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible")
    • Author's pre-print must be accompanied with set-phrase, when accepted by IEEE for publication ("(c) 20xx IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other users, including reprinting/ republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.")
    • IEEE must be informed as to the electronic address of the pre-print
    • If funding rules apply authors may post Author's post-print version in funder's designated repository
    • Author's Post-print - Publisher copyright and source must be acknowledged with citation (see above set statement)
    • Author's Post-print - Must link to publisher version with DOI
    • Publisher's version/PDF cannot be used
    • Publisher copyright and source must be acknowledged
  • Classification
    green

Publications in this journal

  •
    ABSTRACT: This paper proposes a two-stage texture synthesis algorithm. In the first stage, a structure tensor map carrying information about the local orientation is synthesized from the exemplar's data; in the second stage, this map is used to constrain the synthesis of the texture. Since the algorithm should reproduce the visual aspect, statistics, and morphology of the input sample as faithfully as possible, the method is tested on various textures and compared objectively with existing methods, highlighting its strength in successfully synthesizing the output texture in many situations where traditional algorithms fail to reproduce the exemplar's patterns. The promising results pave the way towards the accurate synthesis of large, multi-scale patterns, as in carbon material samples showing laminar structures, for example.
    IEEE Transactions on Image Processing 11/2015; 24(11):4082-4095. DOI:10.1109/TIP.2015.2458701
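    The orientation map of the first stage rests on the classical structure tensor; as a rough illustration (not the authors' pipeline — the test image, smoothing scale, and library calls here are assumptions), the per-pixel dominant orientation can be computed as:

    ```python
    import numpy as np
    from scipy.ndimage import gaussian_filter

    def structure_tensor_orientation(img, sigma=2.0):
        """Per-pixel dominant orientation from the smoothed structure tensor."""
        gy, gx = np.gradient(img.astype(float))
        # Tensor components, averaged over a local Gaussian window.
        Jxx = gaussian_filter(gx * gx, sigma)
        Jxy = gaussian_filter(gx * gy, sigma)
        Jyy = gaussian_filter(gy * gy, sigma)
        # Orientation of the dominant eigenvector, in radians.
        return 0.5 * np.arctan2(2 * Jxy, Jxx - Jyy)

    # Toy example: vertical stripes, whose gradient is purely horizontal.
    x = np.tile(np.sin(np.linspace(0, 8 * np.pi, 64)), (64, 1))
    theta = structure_tensor_orientation(x)
    ```

    In a synthesis pipeline such a map would itself be synthesized from the exemplar and then used to steer the second-stage texture synthesis.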
  •
    ABSTRACT: A recently developed demosaicing methodology, called residual interpolation (RI), has demonstrated superior performance over conventional color-component difference interpolation (CDI). However, the existing RI-based methods fail to fully exploit the potential of the RI strategy on the reconstruction of the most important G channel, as only the R and B channels are restored through the RI strategy. Since any reconstruction error introduced in the G channel is carried over into the demosaicing of the other two channels, the restoration of the G channel is highly instrumental to the quality of the final demosaiced image. In this paper, a novel iterative residual interpolation (IRI) process is developed for reconstructing a highly accurate G channel first; in essence, it can be viewed as an iterative refinement of the estimates of the missing pixel values on the G channel. The key novelty of the proposed IRI process is that all three channels mutually guide each other until a stopping criterion is met. Based on the restored G channel, the mosaiced R and B channels are then reconstructed by the existing RI method without iteration. Extensive simulations on two commonly used test datasets for demosaicing algorithms demonstrate that our algorithm achieves the best performance in most cases, compared with existing state-of-the-art demosaicing methods, in both objective and subjective evaluations.
    IEEE Transactions on Image Processing 10/2015; DOI:10.1109/TIP.2015.2482899
  •
    ABSTRACT: Recognizing human activities from videos is a fundamental research problem in computer vision. Recently, there has been a growing interest in analyzing human behavior from data collected with wearable cameras. First-person cameras continuously record several hours of their wearers' lives. To cope with this vast amount of unlabeled and heterogeneous data, novel algorithmic solutions are required. In this paper, we propose a multitask clustering framework for the analysis of activities of daily living from visual data gathered with wearable cameras. Our intuition is that, even if the data are not annotated, it is possible to exploit the fact that the tasks of recognizing everyday activities of multiple individuals are related, since people typically perform the same actions in similar environments (e.g., people working in an office often read and write documents). In our framework, rather than clustering data from different users separately, we propose to look for clustering partitions that are coherent among related tasks. In particular, two novel multitask clustering algorithms, derived from a common optimization problem, are introduced. Our experimental evaluation, conducted both on synthetic data and on publicly available first-person vision data sets, shows that the proposed approach outperforms several single-task and multitask learning methods.
    IEEE Transactions on Image Processing 10/2015; 24(10):2984-2995. DOI:10.1109/TIP.2015.2438540
  •
    ABSTRACT: Conditional random fields (CRFs) are a flexible yet powerful probabilistic approach and have shown advantages in popular applications across various areas, including text analysis, bioinformatics, and computer vision. Traditional CRF models, however, are incapable of selecting relevant features and suppressing noise from noisy original features. Moreover, conventional optimization methods often converge slowly when training CRFs, and degrade significantly for tasks with a large number of samples and features. In this paper, we propose robust CRFs (RCRFs) to simultaneously select relevant features and suppress noise. An optimal gradient method (OGM) is further designed to train RCRFs efficiently. Specifically, the proposed RCRFs employ the l1 norm of the model parameters to regularize the objective used by traditional CRFs, thereby enabling discovery of the relevant unary and pairwise features of CRFs. In each iteration of OGM, the gradient direction is determined jointly by the current gradient and the historical gradients, and the Lipschitz constant is leveraged to specify the proper step size. We show that OGM can tackle RCRF model training very efficiently, achieving the optimal convergence rate O(1/k²) (where k is the number of iterations). This rate is theoretically superior to the O(1/k) convergence rate of previous first-order optimization methods. Extensive experiments on three practical image segmentation tasks demonstrate the efficacy of OGM in training our proposed RCRFs.
    IEEE Transactions on Image Processing 10/2015; 24(10):3124-36. DOI:10.1109/TIP.2015.2438553
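    The O(1/k²) rate is characteristic of Nesterov-type accelerated (optimal) gradient methods, which combine the current gradient with momentum from past iterates and use the 1/L step size dictated by the Lipschitz constant. A minimal sketch on an illustrative smooth least-squares objective (not the RCRF objective; the problem data below are made up):

    ```python
    import numpy as np

    def nesterov_descent(grad, L, x0, iters=1000):
        """Accelerated gradient method for an L-smooth convex function."""
        x, y, t = x0.copy(), x0.copy(), 1.0
        for _ in range(iters):
            x_new = y - grad(y) / L                      # gradient step from the lookahead point
            t_new = 0.5 * (1 + np.sqrt(1 + 4 * t * t))   # momentum schedule
            y = x_new + ((t - 1) / t_new) * (x_new - x)  # extrapolation using the history
            x, t = x_new, t_new
        return x

    # Illustrative smooth problem: min ||A x - b||^2, with L = 2 * ||A^T A||_2.
    rng = np.random.default_rng(0)
    A, b = rng.standard_normal((20, 5)), rng.standard_normal(20)
    L = 2 * np.linalg.norm(A.T @ A, 2)
    x_star, *_ = np.linalg.lstsq(A, b, rcond=None)
    x_hat = nesterov_descent(lambda x: 2 * A.T @ (A @ x - b), L, np.zeros(5))
    ```

    The same update, applied to an l1-regularized objective, would use a proximal (soft-thresholding) step in place of the plain gradient step.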
  •
    ABSTRACT: Complex visual data contain discriminative structures that are difficult to fully capture with any single feature descriptor. While recent work on domain adaptation focuses on adapting a single handcrafted feature, it is important to perform adaptation on a hierarchy of features to exploit the richness of visual data. We propose a novel framework for domain adaptation using a sparse and hierarchical network (DASH-N). Our method jointly learns a hierarchy of features together with transformations that rectify the mismatch between different domains. The building block of DASH-N is the latent sparse representation. It employs a dimensionality reduction step that prevents the data dimension from increasing too fast as one traverses deeper into the hierarchy. Experimental results show that our method compares favorably with competing state-of-the-art methods. In addition, a multi-layer DASH-N is shown to perform better than a single-layer DASH-N.
    IEEE Transactions on Image Processing 09/2015; DOI:10.1109/TIP.2015.2479405
  •
    ABSTRACT: Liver segmentation remains a challenging task in medical image processing due to the complexity of the liver's anatomy, its low contrast with adjacent organs, and the presence of pathologies. This work develops and validates an automated method for segmenting the liver in CT images. The proposed framework consists of three steps: preprocessing, initialization, and segmentation. In the first step, a statistical shape model is constructed based on principal component analysis, and the input image is smoothed using curvature anisotropic diffusion filtering. In the second step, the mean shape model is positioned using thresholding and Euclidean distance transformation to obtain a coarse localization in a test image; the initial mesh is then locally and iteratively deformed towards the coarse boundary, constrained to stay close to a subspace of shapes describing the anatomical variability. Finally, to accurately detect the liver surface, a deformable graph cut is proposed, which effectively integrates the properties and inter-relationship of the input images and the initialized surface. The proposed method was evaluated on 50 CT scans from two publicly available databases, Sliver07 and 3Dircadb. The experimental results showed that the proposed method is effective and accurate for detecting the liver surface.
    IEEE Transactions on Image Processing 09/2015; DOI:10.1109/TIP.2015.2481326
  •
    ABSTRACT: With the development of depth data acquisition technologies, access to high-precision depth with more than 8 bits per pixel has become much easier, and determining how to efficiently represent and compress such high-precision depth is essential for practical depth storage and transmission systems. In this paper, we propose a layered high-precision depth compression framework based on an 8-bit image/video encoder to achieve efficient compression with low complexity. Within this framework, considering the characteristics of high-precision depth, a depth map is partitioned into two layers: the most significant bits (MSBs) layer and the least significant bits (LSBs) layer. The MSBs layer provides the rough depth value distribution, while the LSBs layer records the details of the depth value variation. For the MSBs layer, an error-controllable pixel-domain encoding scheme is proposed to exploit the data correlation of the general depth information with sharp edges and to guarantee that the LSBs layer remains in an 8-bit format after absorbing the quantization error from the MSBs layer. For the LSBs layer, a standard 8-bit image/video codec is leveraged to perform the compression. The experimental results demonstrate that the proposed coding scheme achieves real-time depth compression with satisfactory reconstruction quality. Moreover, thanks to the error control algorithm, the compressed depth data generated by this scheme achieve better performance in view synthesis and gesture recognition applications than conventional coding schemes.
    IEEE Transactions on Image Processing 09/2015; DOI:10.1109/TIP.2015.2481324
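    The layer partition itself is straightforward bit-plane arithmetic. A toy sketch of splitting a 16-bit depth map into two 8-bit layers (the paper's error-controllable MSBs coding is not modeled here; this shows only the lossless split/merge):

    ```python
    import numpy as np

    def split_depth(depth16):
        """Split a 16-bit depth map into two 8-bit layers.

        The MSBs layer carries the coarse depth distribution; the LSBs
        layer carries the fine variation, so each layer fits a standard
        8-bit image codec.
        """
        msb = (depth16 >> 8).astype(np.uint8)    # most significant bits
        lsb = (depth16 & 0xFF).astype(np.uint8)  # least significant bits
        return msb, lsb

    def merge_depth(msb, lsb):
        """Lossless reconstruction of the 16-bit depth from both layers."""
        return (msb.astype(np.uint16) << 8) | lsb

    depth = np.array([[0, 300], [65535, 4096]], dtype=np.uint16)
    msb, lsb = split_depth(depth)
    restored = merge_depth(msb, lsb)  # bit-exact copy of `depth`
    ```

    In the actual scheme the MSBs layer is further coded with error control so that the residual passed to the LSBs layer still fits 8 bits.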
  •
    ABSTRACT: This paper introduces the vector sparse matrix transform (vector SMT), a new decorrelating transform suitable for performing distributed processing of high-dimensional signals in sensor networks. We assume that each sensor in the network encodes its measurements into vector outputs instead of scalar ones. The proposed transform iteratively decorrelates pairs of vector outputs until the full set of vectors is decorrelated. In our experiments, we simulate distributed anomaly detection by a network of cameras monitoring a spatial region. Each camera records an image of the monitored environment from its particular viewpoint and outputs a vector encoding the image. Our results, with both artificial and real data, show that the proposed vector SMT effectively decorrelates image measurements from the multiple cameras in the network while maintaining low overall communication energy consumption. Since it enables joint processing of the multiple vector outputs, our method provides significant improvements in anomaly detection accuracy compared to the baseline case in which the images are processed independently.
    IEEE Transactions on Image Processing 09/2015; DOI:10.1109/TIP.2015.2481709
  •
    ABSTRACT: Real-world stereo images are inevitably affected by radiometric differences, including variations in exposure, vignetting, lighting, and noise. Stereo images with severe radiometric distortion can have large radiometric differences and include locally nonlinear changes. In this paper, we first introduce an adaptive orthogonal integral image, an improved version of the orthogonal integral image. Based on matching by tone mapping and the adaptive orthogonal integral image, we then propose a robust and accurate matching cost function that can tolerate locally nonlinear intensity distortion. By using the adaptive orthogonal integral image, the proposed matching cost function can adaptively construct support regions of arbitrary shapes and sizes for different pixels in the reference image, so it operates robustly near object boundaries. Furthermore, we develop techniques to automatically estimate the values of the parameters of our proposed function. We conduct experiments using the proposed matching cost function and compare it with functions employing the census transform, supporting local binary pattern, and adaptive normalized cross correlation, as well as a mutual information-based matching cost function, on different stereo datasets. By using the adaptive orthogonal integral image, the proposed matching cost function reduces the error from 21.51% to 15.73% on the Middlebury dataset, and from 15.9% to 10.85% on the KITTI dataset, compared with using the orthogonal integral image. The experimental results indicate that the proposed matching cost function is superior to state-of-the-art matching cost functions under radiometric variation.
    IEEE Transactions on Image Processing 09/2015; DOI:10.1109/TIP.2015.2481702
  •
    ABSTRACT: While color information is known to provide rich discriminative clues for visual inference, most modern visual trackers limit themselves to the grayscale realm. Despite recent efforts to integrate color in tracking, there is a lack of comprehensive understanding of the role color information can play. In this paper, we attack this problem by conducting a systematic study from both the algorithm and benchmark perspectives. On the algorithm side, we comprehensively encode 10 chromatic models into 16 carefully selected state-of-the-art visual trackers. On the benchmark side, we compile a large set of 128 color sequences with ground truth and challenge-factor annotations (e.g., occlusion). A thorough evaluation is conducted by running all the color-encoded trackers, together with two recently proposed color trackers. A further validation is conducted on an RGBD tracking benchmark. The results clearly show the benefit of encoding color information for tracking. We also perform detailed analysis of several issues, including the behavior of various combinations of color model and visual tracker, the degree of difficulty of each sequence for tracking, and how different challenge factors affect tracking performance. We expect the study to provide guidance, motivation, and a benchmark for future work on encoding color in visual tracking.
    IEEE Transactions on Image Processing 09/2015; DOI:10.1109/TIP.2015.2482905
  •
    ABSTRACT: Feature descriptors around local interest points are widely used in human action recognition for both images and videos. However, each kind of descriptor describes the local characteristics around the reference point from only one cue. To enhance the descriptive and discriminative ability across multiple cues, this paper proposes a descriptor learning framework that optimizes the descriptors at the source by learning a projection from multiple descriptor spaces to a new Euclidean space. In this space, the multiple cues and characteristics of the different descriptors are fused and complement each other. To make the new descriptor more discriminative, we learn the multi-cue projection by minimizing the ratio of within-class scatter to between-class scatter, thereby enhancing the discriminative ability of the projected descriptor. In experiments, we evaluate our framework on the tasks of action recognition from still images and videos. Experimental results on two benchmark image datasets and two benchmark video datasets demonstrate the effectiveness and superior performance of our method.
    IEEE Transactions on Image Processing 09/2015; DOI:10.1109/TIP.2015.2479917
  •
    ABSTRACT: In many image processing and pattern recognition problems, the visual content of images is currently described by high-dimensional features, which are often redundant and noisy. To address this, we propose a novel unsupervised feature selection scheme, namely Nonnegative Spectral analysis with Constrained Redundancy (NSCR), which jointly leverages nonnegative spectral clustering and redundancy analysis. The proposed method can directly identify a discriminative subset of the most useful and redundancy-constrained features. Nonnegative spectral analysis is developed to learn more accurate cluster labels of the input images, during which feature selection is performed simultaneously. The joint learning of the cluster labels and the feature selection matrix enables selection of the most discriminative features. Row-wise sparse models with a general ℓ2,p-norm (0 < p ≤ 1) are leveraged to make the proposed model suitable for feature selection and robust to noise. In addition, the redundancy between features is explicitly exploited to control the redundancy of the selected subset. The problem is formulated as an optimization with a well-defined objective function, solved by a simple yet efficient iterative algorithm. Finally, we conduct extensive experiments on 9 diverse image benchmarks, including face data, handwritten digit data, and object image data. The proposed method achieves encouraging results in comparison with several representative algorithms, which demonstrates its effectiveness for unsupervised feature selection.
    IEEE Transactions on Image Processing 09/2015; DOI:10.1109/TIP.2015.2479560
  •
    ABSTRACT: Edge-preserving regularization using partial differential equation (PDE) based methods, although extensively studied and widely used for image restoration, still has limitations in adapting to local structures. We propose a spatially adaptive multiscale variable exponent-based anisotropic variational PDE method that overcomes current shortcomings, such as oversmoothing and staircasing artifacts, while retaining and enhancing edge structures across scales. Our model automatically balances between Tikhonov and total variation (TV) regularization effects using scene content information, by incorporating a spatially varying edge-coherence exponent map constructed from the eigenvalues of the filtered structure tensor. The multiscale exponent model we develop leads to a novel restoration method that preserves edges better and provides selective denoising without generating artifacts, for both additive and multiplicative noise models. Mathematical analysis of the proposed method in variable exponent space establishes the existence of a minimizer and its properties. The discretization we use satisfies the maximum-minimum principle, which guarantees that artificial edge regions are not created. Extensive experimental results on synthetic and natural images indicate that the proposed Multiscale Tikhonov-Total Variation (MTTV) and Dynamical MTTV (D-MTTV) methods perform better than many contemporary denoising algorithms in terms of several metrics, including signal-to-noise ratio improvement and structure preservation. Promising extensions to handle multiplicative noise models and multichannel imagery are also discussed.
    IEEE Transactions on Image Processing 09/2015; 24(12). DOI:10.1109/TIP.2015.2479471
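    As background for the Tikhonov-vs-TV balancing idea, a generic edge-adaptive diffusion step (in the Perona-Malik spirit, not the MTTV model itself; the conductivity function and parameters below are illustrative assumptions) can be sketched as:

    ```python
    import numpy as np

    def adaptive_diffusion_step(u, dt=0.1, k=0.1):
        """One explicit step of edge-adaptive diffusion.

        The conductivity g is near 1 in flat regions (strong, Tikhonov-like
        smoothing) and near 0 at edges (TV-like edge preservation).
        """
        gy, gx = np.gradient(u)
        g = 1.0 / (1.0 + (gx**2 + gy**2) / k**2)  # edge-stopping conductivity
        # Divergence of g * grad(u).
        div = np.gradient(g * gy, axis=0) + np.gradient(g * gx, axis=1)
        return u + dt * div

    # Noisy step edge: diffusion removes noise while keeping the edge.
    rng = np.random.default_rng(1)
    clean = np.where(np.arange(64) < 32, 0.0, 1.0)[None, :].repeat(64, axis=0)
    noisy = clean + 0.05 * rng.standard_normal(clean.shape)
    denoised = noisy.copy()
    for _ in range(20):
        denoised = adaptive_diffusion_step(denoised)
    ```

    The paper's contribution is, roughly, making this flat-vs-edge trade-off spatially adaptive and multiscale through a variable exponent map rather than a fixed conductivity function.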
  •
    ABSTRACT: Sparse representation shows impressive results for image classification; however, it cannot well characterize the discriminant structure of data, which is important for classification. This paper seeks a projection matrix such that the low-dimensional representations well characterize the discriminant structure embedded in high-dimensional data and simultaneously fit the sparse representation-based classifier (SRC). Specifically, the Fisher discriminant criterion (FDC) is used to extract the discriminant structure, and sparse representation is simultaneously considered to guarantee that the projected data are well suited to SRC. Thus, our method, called SRC-FDC, characterizes both the spatial Euclidean distribution and the local reconstruction relationship, enabling SRC to achieve better performance. Extensive experiments on the AR, CMU-PIE, and Extended Yale B face image databases, the USPS digit database, and the COIL20 database illustrate that the proposed method is more efficient than other SRC-based feature extraction methods.
    IEEE Transactions on Image Processing 09/2015; DOI:10.1109/TIP.2015.2479559
  •
    ABSTRACT: This paper presents a hierarchical framework for detecting local and global anomalies via hierarchical feature representation and Gaussian process regression (GPR), which is fully non-parametric, robust to noisy training data, and supports sparse features. While most research on anomaly detection has focused on detecting local anomalies, we are more interested in global anomalies that involve multiple normal events interacting in an unusual manner, such as car accidents. To simultaneously detect local and global anomalies, we cast the extraction of normal interactions from the training videos as the problem of finding the frequent geometric relations of nearby sparse spatio-temporal interest points (STIPs). A codebook of interaction templates is then constructed and modeled using GPR, based on which a novel inference method for computing the likelihood of an observed interaction is also developed. Thereafter, these local likelihood scores are integrated into globally consistent anomaly masks, from which anomalies can be succinctly identified. To the best of the authors' knowledge, this is the first time GPR has been employed to model the relationship of nearby STIPs for anomaly detection. Simulations on four widely used datasets show that the new method outperforms the main state-of-the-art methods with a lower computational burden.
    IEEE Transactions on Image Processing 09/2015; 24(12). DOI:10.1109/TIP.2015.2479561
  •
    ABSTRACT: We propose a video stabilization algorithm that extracts a guaranteed number of reliable feature trajectories for robust mesh grid warping. We first estimate feature trajectories through a video sequence and transform the feature positions into rolling-free smoothed positions. When the number of estimated trajectories is insufficient, we generate virtual trajectories by augmenting incomplete trajectories using a low-rank matrix completion scheme. Next, we detect feature points on a large moving object and exclude them, so as to stabilize camera movements rather than object movements. With the selected feature points, we set a mesh grid on each frame and warp each grid cell by moving the original feature positions to the smoothed ones. For robust warping, we formulate a cost function based on the reliability weights of each feature point and each grid cell. The cost function consists of a data term, a structure-preserving term, and a regularization term. By minimizing the cost function, we determine the robust mesh grid warping and achieve the stabilization. Experimental results demonstrate that the proposed algorithm reconstructs videos more stably than conventional algorithms.
    IEEE Transactions on Image Processing 09/2015; 24(12). DOI:10.1109/TIP.2015.2479918
  •
    ABSTRACT: This paper deals with designing sensing matrices for compressive sensing (CS) systems. Traditionally, the optimal sensing matrix is designed so that the Gram of the equivalent dictionary is as close as possible to a target Gram with small mutual coherence. A novel design strategy is proposed in which, unlike the traditional approaches, the measure takes into account both the mutual coherence behavior of the equivalent dictionary and the sparse representation errors of the signals. The optimal sensing matrix is defined as the one that minimizes this measure and is hence expected to be more robust against sparse representation errors. A closed-form solution is derived for the optimal sensing matrix with a given target Gram. An alternating minimization based algorithm is also proposed for addressing the same problem with the target Gram searched within a set of relaxed equiangular tight frame Grams. Experiments show that the sensing matrix obtained using the proposed approach outperforms existing ones obtained with a fixed dictionary, in terms of signal reconstruction accuracy for synthetic data and peak signal-to-noise ratio for real images.
    IEEE Transactions on Image Processing 09/2015; 24(12). DOI:10.1109/TIP.2015.2479474
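    The two ingredients of such a design measure, the Gram of the equivalent dictionary and its mutual coherence, can be computed directly. A minimal sketch with random, unoptimized matrices (the sizes and the identity target Gram are illustrative assumptions, not the paper's optimized solution):

    ```python
    import numpy as np

    def mutual_coherence(D):
        """Largest absolute inner product between distinct unit-norm columns."""
        Dn = D / np.linalg.norm(D, axis=0, keepdims=True)
        G = Dn.T @ Dn                  # Gram of the normalized columns
        np.fill_diagonal(G, 0.0)
        return np.abs(G).max()

    rng = np.random.default_rng(0)
    Psi = rng.standard_normal((64, 128))  # dictionary
    Phi = rng.standard_normal((16, 64))   # sensing matrix
    D = Phi @ Psi                         # equivalent dictionary

    # Frobenius distance of the equivalent-dictionary Gram from a target
    # Gram (here the identity, the ideal maximally incoherent case).
    Dn = D / np.linalg.norm(D, axis=0, keepdims=True)
    gram_gap = np.linalg.norm(Dn.T @ Dn - np.eye(128), 'fro')
    mu = mutual_coherence(D)
    ```

    A sensing-matrix design procedure would then minimize a weighted combination of such a Gram distance and a sparse-representation-error term over Phi.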