IEEE Transactions on Image Processing (IEEE T IMAGE PROCESS)

Publisher: IEEE Signal Processing Society; Institute of Electrical and Electronics Engineers

Journal description

This journal focuses on the signal processing aspects of image acquisition, processing, and display, especially modeling, design, and analysis with a strong mathematical basis.

Current impact factor: 3.63

Impact Factor Rankings

2015 Impact Factor Available summer 2016
2014 Impact Factor 3.625
2013 Impact Factor 3.111
2012 Impact Factor 3.199
2011 Impact Factor 3.042
2010 Impact Factor 2.606
2009 Impact Factor 2.848
2008 Impact Factor 3.315
2007 Impact Factor 2.462
2006 Impact Factor 2.715
2005 Impact Factor 2.428
2004 Impact Factor 2.011
2003 Impact Factor 2.642
2002 Impact Factor 2.553
2001 Impact Factor 2.185
2000 Impact Factor 2.078
1999 Impact Factor 2.695
1998 Impact Factor 1.364
1997 Impact Factor 1.063

Additional details

5-year impact 4.48
Cited half-life 7.70
Immediacy index 0.44
Eigenfactor 0.04
Article influence 1.58
Website IEEE Transactions on Image Processing website
Other titles IEEE transactions on image processing, Institute of Electrical and Electronics Engineers transactions on image processing, Image processing
ISSN 1941-0042
OCLC 24103523
Material type Periodical, Internet resource
Document type Journal / Magazine / Newspaper, Internet Resource

Publisher details

Institute of Electrical and Electronics Engineers

  • Pre-print
    • Author can archive a pre-print version
  • Post-print
    • Author can archive a post-print version
  • Conditions
    • Author's pre-print on Author's personal website, employer's website or publicly accessible server
    • Author's post-print on Author's server or Institutional server
    • Author's pre-print must be removed upon publication of the final version and replaced with either a full citation to the IEEE work with a Digital Object Identifier, a link to the article abstract in IEEE Xplore, or the Author's post-print
    • Author's pre-print must be accompanied with set-phrase, once submitted to IEEE for publication ("This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible")
    • Author's pre-print must be accompanied with set-phrase, when accepted by IEEE for publication ("(c) 20xx IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.")
    • IEEE must be informed as to the electronic address of the pre-print
    • If funding rules apply authors may post Author's post-print version in funder's designated repository
    • Author's Post-print - Publisher copyright and source must be acknowledged with citation (see above set statement)
    • Author's Post-print - Must link to publisher version with DOI
    • Publisher's version/PDF cannot be used
    • Publisher copyright and source must be acknowledged

Publications in this journal

    ABSTRACT: Horror content sharing on the Web is a growing phenomenon that can interfere with our daily life and affect the mental health of those involved. As an important form of expression, horror images have their own characteristics that can evoke extreme emotions. In this paper, we present a novel context-aware multi-instance learning (CMIL) algorithm for horror image recognition. The CMIL algorithm identifies horror images and picks out the regions that cause the sensation of horror in these horror images. It obtains contextual cues among adjacent regions in an image using a random walk on a contextual graph. Borrowing the strength of the fuzzy support vector machine (FSVM), we define a heuristic optimization procedure based on the FSVM to search for the optimal classifier for the CMIL. To improve the initialization of the CMIL, we propose a novel visual saliency model based on the tensor analysis. The average saliency value of each segmented region is set as its initial fuzzy membership in the CMIL. The advantage of the tensor-based visual saliency model is that it not only adaptively selects features, but also dynamically determines fusion weights for saliency value combination from different feature subspaces. The effectiveness of the proposed CMIL model is demonstrated by its use in horror image recognition on two large-scale image sets collected from the Internet.
    IEEE Transactions on Image Processing 12/2015; 24(12):5193-5205. DOI:10.1109/TIP.2015.2479400
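The contextual-cue step in the abstract above rests on a random walk over a graph of adjacent image regions. A minimal sketch of that idea, with hypothetical edge weights (in CMIL the weights would come from visual similarity between adjacent regions, not the constants used here), computes the walk's stationary distribution by power iteration:

```python
def stationary_distribution(weights, iters=1000):
    """Stationary distribution of a random walk whose transition matrix
    row-normalizes the non-negative weight matrix `weights`."""
    n = len(weights)
    P = [[w / sum(row) for w in row] for row in weights]  # row-stochastic
    p = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iters):
        p = [sum(p[i] * P[i][j] for i in range(n)) for j in range(n)]
    return p

# Hypothetical 4-region adjacency graph: regions 1 and 2 are strongly
# connected, so the walk concentrates probability mass on them.
w = [[0.1, 0.9, 0.1, 0.1],
     [0.9, 0.1, 0.8, 0.1],
     [0.1, 0.8, 0.1, 0.6],
     [0.1, 0.1, 0.6, 0.1]]
p = stationary_distribution(w)
```

For a symmetric weight matrix like this one, the stationary mass of a node is proportional to its weighted degree, so the heavily connected regions end up with the largest probabilities.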
    ABSTRACT: One of the light field capturing techniques is focused plenoptic capturing. By placing a microlens array in front of the photosensor, focused plenoptic cameras capture both spatial and angular information of a scene in each microlens image and across microlens images. The capturing results in a significant amount of redundant information, and the captured image is usually of large resolution. A coding scheme that removes the redundancy before coding is advantageous for efficient compression, transmission and rendering. In this paper, we propose a lossy coding scheme to efficiently represent plenoptic images. The format contains a sparse image set and its associated disparities. The reconstruction is performed by disparity-based interpolation and inpainting, and the reconstructed image is later employed as a prediction reference for the coding of the full plenoptic image. As an outcome of the representation, the proposed scheme inherits a scalable structure with three layers. The results show that plenoptic images are compressed efficiently, with over 60 percent bit rate reduction compared to HEVC intra, and over 20 percent compared to the HEVC block copying mode.
    IEEE Transactions on Image Processing 11/2015; DOI:10.1109/TIP.2015.2498406
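The gains above are reported as percent bit-rate reductions against an anchor codec at comparable quality. As a quick arithmetic illustration (the bit counts below are hypothetical, not taken from the paper):

```python
def bitrate_reduction_pct(anchor_bits, proposed_bits):
    """Percent of the anchor codec's rate saved by the proposed coder."""
    return 100.0 * (anchor_bits - proposed_bits) / anchor_bits

# E.g. a plenoptic image coded at 4.0 Mbit by the anchor and at
# 1.5 Mbit by the proposed scheme at comparable quality:
saving = bitrate_reduction_pct(4_000_000, 1_500_000)  # 62.5 percent saved
```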
    ABSTRACT: The features used in many image analysis-based applications are frequently of very high dimension. Feature extraction offers several advantages in high-dimensional cases, and many recent studies have used multi-task feature extraction approaches, which often outperform single-task feature extraction approaches. However, most of these methods are limited in that they only consider data represented by a single type of feature, even though features usually represent images from multiple modalities. We therefore propose a novel large margin multi-modal multi-task feature extraction (LM3FE) framework for handling multi-modal features for image classification. In particular, LM3FE simultaneously learns the feature extraction matrix for each modality and the modality combination coefficients. In this way, LM3FE not only handles correlated and noisy features, but also utilizes the complementarity of different modalities to further help reduce feature redundancy in each modality. The large margin principle employed also helps to extract strongly predictive features so that they are more suitable for prediction (e.g., classification). An alternating algorithm is developed for problem optimization and each sub-problem can be efficiently solved. Experiments on two challenging real-world image datasets demonstrate the effectiveness and superiority of the proposed method.
    IEEE Transactions on Image Processing 11/2015; DOI:10.1109/TIP.2015.2495116
    ABSTRACT: When applying sparse representation techniques to images, the standard approach is to independently compute the representations for a set of overlapping image patches. This method performs very well in a variety of applications, but results in a representation that is multi-valued and not optimised with respect to the entire image. An alternative representation structure is provided by convolutional sparse representation, in which a sparse representation of an entire image is computed by replacing the linear combination of a set of dictionary vectors by the sum of a set of convolutions with dictionary filters. The resulting representation is both single-valued and jointly optimised over the entire image. While this form of sparse representation has been applied to a variety of problems in signal and image processing and computer vision, the computational expense of the corresponding optimisation problems has restricted application to relatively small signals and images. This paper presents new, efficient algorithms that substantially improve on the performance of other recent methods, contributing to the development of this type of representation as a practical tool for a wider range of problems.
    IEEE Transactions on Image Processing 11/2015; DOI:10.1109/TIP.2015.2495260
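Sparse representations of this kind are commonly computed with iterative shrinkage-thresholding (ISTA). The sketch below is a tiny dense (patch-style, not convolutional) instance with a hypothetical two-atom dictionary, minimizing 0.5*||Dx - s||^2 + lam*||x||_1; the convolutional variant discussed in the abstract replaces Dx with a sum of convolutions with dictionary filters:

```python
def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def rmatvec(A, y):  # A^T y
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]

def soft(v, t):
    """Soft-thresholding: the proximal operator of the L1 norm."""
    return [max(abs(u) - t, 0.0) * (1.0 if u >= 0 else -1.0) for u in v]

def ista(D, s, lam, step=0.5, iters=200):
    """Iterative shrinkage-thresholding for 0.5*||Dx-s||^2 + lam*||x||_1."""
    x = [0.0] * len(D[0])
    for _ in range(iters):
        r = [a - b for a, b in zip(matvec(D, x), s)]  # residual Dx - s
        g = rmatvec(D, r)                             # gradient D^T r
        x = soft([xi - step * gi for xi, gi in zip(x, g)], step * lam)
    return x

# Orthonormal toy dictionary: the minimizer is soft(D^T s, lam) = [1.5, 0.0].
D = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
s = [2.0, 0.3, 0.0]
x = ista(D, s, lam=0.5)
```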
    ABSTRACT: Two-dimensionally indexed Random Coefficients Autoregressive (2D-RCA) models are obtained by introducing appropriate random field coefficients into an AR model on Z2. The study of such models is motivated by their capability to capture the space-varying behavior of the volatility. A Generalized Method of Moments (GMM) approach is considered to estimate the 2D-RCA models. Consistency and asymptotic normality (CAN) of the estimates are derived. The estimated parameters are then used as pixel features in texture image classification.
    IEEE Transactions on Image Processing 11/2015; DOI:10.1109/TIP.2015.2494740
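A one-dimensional analogue shows the moment idea behind such estimators: for X_t = (a + b_t) X_{t-1} + e_t with zero-mean b_t, the lag-1 autocovariance satisfies E[X_t X_{t-1}] = a E[X_{t-1}^2], so the mean coefficient is recoverable from sample moments. This is only a sketch of the principle on a 1D RCA(1) model, not the paper's full 2D GMM:

```python
import random

def simulate_rca1(a, sig_b, sig_e, n, seed=0):
    """Simulate X_t = (a + b_t) X_{t-1} + e_t with Gaussian b_t and e_t.
    Stationarity needs a^2 + sig_b^2 < 1."""
    rng = random.Random(seed)
    x, xs = 0.0, []
    for _ in range(n):
        x = (a + rng.gauss(0.0, sig_b)) * x + rng.gauss(0.0, sig_e)
        xs.append(x)
    return xs

def estimate_a(xs):
    """Method of moments: a_hat = sum x_t x_{t-1} / sum x_{t-1}^2."""
    num = sum(xs[t] * xs[t - 1] for t in range(1, len(xs)))
    den = sum(xs[t - 1] ** 2 for t in range(1, len(xs)))
    return num / den

xs = simulate_rca1(a=0.5, sig_b=0.2, sig_e=1.0, n=50_000)
a_hat = estimate_a(xs)  # close to the true mean coefficient 0.5
```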
    ABSTRACT: In this paper, a novel approach to local 3D surface matching representation suitable for a range of 3D vision applications is introduced. Local 3D surface patches around key-points on the 3D surface are represented by 2D images such that the representing 2D images enjoy certain characteristics which positively impact matching accuracy, robustness and speed. First, the proposed representation is complete, in the sense that there is no information loss during its computation. Second, the 2D representations are strictly invariant to all 3DoF rotations. To make optimal use of surface information, the sensitivity of the representations to surface information is adjustable. This also provides the proposed matching representation with the means to adjust optimally to a particular class of problems/applications or to an acquisition technology. Each 2D matching representation is a sequence of adjustable integral kernels, where each kernel is efficiently computed from a triple of precise 3D curves (profiles) formed by intersecting three concentric spheres with the 3D surface. Robust techniques for sampling the profiles and establishing correspondences among them are devised. Based on the proposed matching representation, two techniques for the detection of key-points are presented: the first is suitable for static images, while the second is suitable for 3D videos. The approach was tested on the FRGC v2.0, 3D TEC and Bosphorus datasets, and superior face recognition performance was achieved. Additionally, the proposed approach was used in object class recognition and tested on a Kinect dataset.
    IEEE Transactions on Image Processing 10/2015; DOI:10.1109/TIP.2015.2492826
    ABSTRACT: Representation-based classifiers (RCs) have attracted considerable attention in face recognition in recent years. However, most existing RCs use the mean square error (MSE) criterion as the cost function, which relies on the Gaussianity assumption of the error distribution and is sensitive to non-Gaussian noise. This may severely degrade the performance of MSE-based RCs in recognizing facial images with random occlusion and corruption. In this paper, we present a minimum error entropy based atomic representation (MEEAR) framework for face recognition. Unlike existing MSE-based RCs, our framework is based on the minimum error entropy criterion, which does not depend on the error distribution and is shown to be more robust to noise. Specifically, MEEAR produces a discriminative representation vector by minimizing the atomic norm regularized Renyi's entropy of the reconstruction error. Optimality conditions are provided for the general atomic representation model. As a general framework, MEEAR can also be used as a platform to develop new classifiers; two effective MEE-based RCs are proposed by defining appropriate atomic sets. Experimental results on popular face databases show that MEEAR can improve both recognition accuracy and reconstruction results compared with state-of-the-art MSE-based RCs.
    IEEE Transactions on Image Processing 10/2015; DOI:10.1109/TIP.2015.2492819
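Minimum-error-entropy criteria are usually evaluated through a Parzen estimate of the error distribution: Renyi's quadratic entropy of the errors is -log V, where the information potential V averages a Gaussian kernel over all error pairs. The sketch below is the standard information-theoretic-learning estimator, not the paper's exact regularized objective:

```python
import math

def information_potential(errors, sigma=1.0):
    """V = (1/n^2) * sum_ij G_sigma(e_i - e_j). Larger V means the
    errors are more concentrated (lower quadratic Renyi entropy)."""
    n = len(errors)
    c = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    return sum(c * math.exp(-(ei - ej) ** 2 / (2.0 * sigma ** 2))
               for ei in errors for ej in errors) / (n * n)

def renyi_quadratic_entropy(errors, sigma=1.0):
    return -math.log(information_potential(errors, sigma))

concentrated = [0.0, 0.1, -0.1, 0.05]  # well-fit reconstruction errors
heavy_tailed = [0.0, 0.1, -0.1, 8.0]   # one gross outlier (occlusion)
```

The outlier barely moves V, since its kernel terms are near zero; this is the robustness MEE exploits, whereas under MSE the squared outlier would dominate the cost.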
    ABSTRACT: Face recognition with still face images has been widely studied, while research on video-based face recognition is comparatively inadequate, especially in terms of benchmark datasets and comparisons. Real-world video-based face recognition applications require techniques for three distinct scenarios: Video-to-Still (V2S), Still-to-Video (S2V) and Video-to-Video (V2V), respectively taking video or still image as query or target. To the best of our knowledge, few datasets and evaluation protocols have been benchmarked for all three scenarios. In order to facilitate the study of this specific topic, this paper contributes a benchmarking and comparative study based on a newly collected still/video face database, named COX Face DB. Specifically, we make three contributions. Firstly, we collect and release a large-scale still/video face database to simulate video surveillance with three different video-based face recognition scenarios (i.e., V2S, S2V and V2V). Secondly, for benchmarking the three scenarios designed on our dataset, we review and experimentally compare a number of existing set-based methods. Thirdly, we further propose a novel Point-to-Set Correlation Learning (PSCL) method, and experimentally show that it can be used as a promising baseline method for V2S/S2V face recognition on COX Face DB. Extensive experimental results clearly demonstrate that video-based face recognition needs more effort, and that our COX Face DB is a good benchmark dataset for evaluation.
    IEEE Transactions on Image Processing 10/2015; DOI:10.1109/TIP.2015.2493448
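The elementary notion behind V2S matching is a point-to-set distance: a still-image feature (a point) is compared against the set of a subject's video-frame features. The baseline below uses the minimum Euclidean distance; PSCL instead learns a correlation between point and set. This sketch, with made-up 2-D features, only illustrates the matching rule being benchmarked:

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def point_to_set_distance(query, frame_set):
    """Distance from a still-image feature to a set of frame features."""
    return min(euclidean(query, f) for f in frame_set)

# Two hypothetical gallery subjects, each a set of video-frame features.
gallery = {
    "subject_A": [(0.0, 0.1), (0.2, 0.0), (0.1, 0.1)],
    "subject_B": [(3.0, 3.1), (2.9, 3.0)],
}
still_query = (0.15, 0.05)
match = min(gallery, key=lambda s: point_to_set_distance(still_query, gallery[s]))
```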
    ABSTRACT: Blind image deconvolution (BID) aims to remove or reduce the degradations that have occurred during acquisition or processing. It is a challenging ill-posed problem because the degraded image does not carry enough information to recover both the point spread function (PSF) and the clear image unambiguously. Although many powerful algorithms have appeared recently, BID remains an active research area due to the diversity of degraded images and degradations. Closed-loop control systems are characterized by their powerful ability to stabilize the behaviour response and overcome external disturbances by designing an effective feedback optimization. In this research, we employed feedback control to enhance the stability of BID by driving the current estimation quality of the PSF to the desired level without manually selecting restoration parameters, using an effective combination of machine learning with feedback optimization. The foremost challenge when designing a feedback structure is to construct or choose a suitable performance metric as a controlled index and feedback information. Our proposed quality metric is based on the blur assessment of deconvolved patches to identify the best PSF and compute its relative quality. A Kalman-filter-based extremum seeking approach is employed to find the optimum value of the controlled variable. To find better restoration parameters, learning algorithms such as multilayer perceptrons and bagged decision trees are used to estimate the generic PSF support size instead of trial-and-error methods. The problem is modelled as a combination of pattern classification and regression using multiple training features, including noise metrics, blur metrics and low-level statistics. A multi-objective genetic algorithm is used to find key patches from multiple saliency maps, which enhances performance and saves computation by avoiding ineffectual regions of the image. The proposed scheme is shown to outperform corresponding open-loop schemes, which often fail or need many assumptions regarding images and thus produce sub-optimal results.
    IEEE Transactions on Image Processing 10/2015; DOI:10.1109/TIP.2015.2492825
    ABSTRACT: A major difference between amateur and professional video lies in the quality of camera paths. Previous work on video stabilization has considered how to improve amateur video by smoothing the camera path. In this paper, we show that additional changes to the camera path can further improve video aesthetics. Our new optimization method achieves multiple simultaneous goals: (i) stabilizing video content over short time scales, (ii) ensuring simple and consistent camera paths over longer time scales, and (iii) improving scene composition by automatically removing distractions, a common occurrence in amateur video. Our approach uses an L1 camera path optimization framework, extended to handle multiple constraints. Two passes of optimization are used to address both low-level and high-level constraints on the camera path. Experimental and user study results show that our approach outputs video which is perceptually better than the input, or than the results of using stabilization only.
    IEEE Transactions on Image Processing 10/2015; DOI:10.1109/TIP.2015.2493959
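The paper's L1 framework solves a constrained path optimization; as a much simpler stand-in for the low-level stabilization goal only (not the authors' method), a moving-average smoother already reduces the frame-to-frame jerkiness of a 1-D camera path, measured here by the mean absolute second difference:

```python
def smooth(path, radius=2):
    """Moving-average smoothing with a clamped window at the ends."""
    out = []
    for t in range(len(path)):
        lo, hi = max(0, t - radius), min(len(path), t + radius + 1)
        out.append(sum(path[lo:hi]) / (hi - lo))
    return out

def jerkiness(path):
    """Mean absolute second difference: a crude 'shakiness' score."""
    acc = [path[t + 1] - 2 * path[t] + path[t - 1]
           for t in range(1, len(path) - 1)]
    return sum(abs(a) for a in acc) / len(acc)

# Hypothetical shaky pan: constant motion plus alternating hand jitter.
shaky = [t + (0.5 if t % 2 else -0.5) for t in range(60)]
stabilized = smooth(shaky)
```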
    ABSTRACT: Quality evaluation of underwater images is a key goal of underwater video image retrieval and intelligent processing. To date, no metric has been proposed for underwater colour image quality evaluation. The special absorption and scattering characteristics of the water medium do not allow direct application of natural colour image quality metrics, especially across different underwater environments. In this paper, subjective testing for underwater image quality was organized. The statistical distribution of underwater image pixels in the CIELab colour space, related to the subjective evaluation, indicates that sharpness and colourfulness correlate well with subjective image quality perception. Based on these findings, a new underwater colour image quality evaluation (UCIQE) metric, a linear combination of chroma, saturation and contrast, is proposed to quantify the non-uniform colour cast, blurring and low contrast that characterize underwater engineering and monitoring images. Experiments are conducted to illustrate the performance of the proposed UCIQE metric and its capability to measure underwater image enhancement results. They show that the proposed metric has performance comparable to the leading natural colour image quality metrics and the underwater grayscale image quality metrics in the literature, and can predict with higher accuracy the relative amount of degradation for similar image content in underwater environments. Importantly, UCIQE is a simple and fast solution for real-time underwater video processing. The effectiveness of the presented measure is also demonstrated by subjective evaluation, with the results showing good correlation between UCIQE and the subjective Mean Opinion Score (MOS).
    IEEE Transactions on Image Processing 10/2015; DOI:10.1109/TIP.2015.2491020
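UCIQE's structure is a linear combination of the chroma standard deviation, luminance contrast, and mean saturation in CIELab. The sketch below uses that structure with placeholder weights c1..c3 and a crude max-minus-min luminance contrast (the published coefficients were fit to subjective scores and are not reproduced here), on pixels given directly as (L, a, b) triples:

```python
import math

def uciqe_like(pixels, c1=0.45, c2=0.30, c3=0.25):
    """UCIQE-style score for a list of (L, a, b) CIELab triples, L in [0, 100].
    The weights c1..c3 are hypothetical placeholders."""
    chroma = [math.sqrt(a * a + b * b) for (_, a, b) in pixels]
    mean_c = sum(chroma) / len(chroma)
    sigma_c = math.sqrt(sum((c - mean_c) ** 2 for c in chroma) / len(chroma))
    ls = [l for (l, _, _) in pixels]
    # Crude stand-in for the top-minus-bottom percentile luminance contrast.
    contrast_l = (max(ls) - min(ls)) / 100.0
    # Saturation taken as chroma relative to luminance.
    sat = [c / l for c, (l, _, _) in zip(chroma, pixels) if l > 0]
    mean_s = sum(sat) / len(sat) if sat else 0.0
    return c1 * sigma_c + c2 * contrast_l + c3 * mean_s

colorful = [(20, 40, -30), (70, -35, 25), (50, 5, 60), (85, -10, -45)]
hazy     = [(48, 2, 3), (50, 1, 2), (52, 2, 1), (49, 3, 2)]
```

A colourful, high-contrast pixel set scores higher than a flat greyish one, matching the degradations (colour cast, low contrast) the metric targets.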
    ABSTRACT: In this paper, we propose a zero-mean white Gaussian noise removal method using high-resolution frequency analysis. It is difficult to separate an original image component from a noise component when using the discrete Fourier transform (DFT) or discrete cosine transform (DCT) for analysis because sidelobes occur in the results. Two-dimensional non-harmonic analysis (2D NHA) is a high-resolution frequency analysis technique that improves noise removal accuracy because of its sidelobe reduction feature. However, the spectra generated by NHA are distorted because the image signal is non-stationary. In this paper, we therefore analyze each region with a homogeneous texture in the noisy image. Non-uniform regions that occur due to segmentation are analyzed by an extended 2D NHA method called Mask NHA. We conducted an experiment using a simulation image and found that Mask NHA denoising attains a higher PSNR value than state-of-the-art methods if a suitable segmentation result can be obtained from the input image, even though parameter optimization was incomplete. This experimental result exhibits the upper limit on the PSNR attainable by our Mask NHA denoising method. The performance of Mask NHA denoising is expected to approach this limit as the segmentation method improves.
    IEEE Transactions on Image Processing 10/2015; DOI:10.1109/TIP.2015.2494461
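Denoising quality above is reported in PSNR; for 8-bit images it is 10*log10(255^2 / MSE) over the pixel errors. A minimal reference implementation (generic, not specific to this paper):

```python
import math

def psnr(orig, recon, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-size images,
    given here as flat lists of pixel intensities."""
    mse = sum((a - b) ** 2 for a, b in zip(orig, recon)) / len(orig)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak * peak / mse)

# Every pixel off by 5 gray levels -> MSE = 25 -> about 34.15 dB.
value = psnr([10, 20, 30, 40], [15, 25, 35, 45])
```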