IEEE Transactions on Image Processing (IEEE T IMAGE PROCESS)

Publisher: IEEE Signal Processing Society; Institute of Electrical and Electronics Engineers


This journal will focus on the signal processing aspects of image acquisition, processing, and display, especially where it concerns modeling, design, and analysis having a strong mathematical basis.

  • Impact factor
  • 5-year impact
  • Cited half-life
  • Immediacy index
  • Eigenfactor
  • Article influence
  • Website
    IEEE Transactions on Image Processing website
  • Other titles
    IEEE transactions on image processing, Institute of Electrical and Electronics Engineers transactions on image processing, Image processing
  • ISSN
  • OCLC
  • Material type
    Periodical, Internet resource
  • Document type
    Journal / Magazine / Newspaper, Internet Resource

Publisher details

Institute of Electrical and Electronics Engineers

  • Pre-print
    • Author can archive a pre-print version
  • Post-print
    • Author can archive a post-print version
  • Conditions
    • Author's own and employer's publicly accessible webpages
    • Preprint - Must be removed upon publication of the final version and replaced with a full citation to the IEEE work with a Digital Object Identifier, a link to the article abstract in IEEE Xplore, or the author's post-print
    • Preprint - Set phrase must be added once submitted to IEEE for publication ("This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible")
    • Preprint - Set phrase must be added when accepted by IEEE for publication ("(c) 20xx IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.")
    • Preprint - IEEE must be informed as to the electronic address of the pre-print
    • Postprint - Publisher copyright and source must be acknowledged (see above set statement)
    • Publisher's version/PDF cannot be used
    • Publisher copyright and source must be acknowledged
  • Classification
    green

Publications in this journal

  • ABSTRACT: In this paper, we investigate the impact of spatial, temporal, and amplitude resolution on the perceptual quality of a compressed video. Subjective quality tests were carried out on a mobile device, and a total of 189 processed video sequences with 10 source sequences were included in the test. Subjective data reveal that the impact of spatial resolution (SR), temporal resolution (TR), and quantization stepsize (QS) can each be captured by a function with a single content-dependent parameter, which indicates the decay rate of the quality with each resolution factor. The joint impact of SR, TR, and QS can be accurately modeled by the product of these three functions with only three parameters. The impact of SR and QS on the quality is independent of that of TR, but there are significant interactions between SR and QS. Furthermore, the model parameters can be predicted accurately from a few content features derived from the original video. The proposed model correlates well with the subjective ratings with a Pearson correlation coefficient of 0.985 when the model parameters are predicted from content features. The quality model is further validated on six other subjective rating data sets with very high accuracy and outperforms several well-known quality models.
    IEEE Transactions on Image Processing 06/2014; 23(6):2473-86.
  • ABSTRACT: We address single image super-resolution using a statistical prediction model based on sparse representations of low- and high-resolution image patches. The suggested model allows us to avoid any invariance assumption, which is a common practice in sparsity-based approaches treating this task. Prediction of high-resolution patches is obtained via MMSE estimation and the resulting scheme has the useful interpretation of a feedforward neural network. To further enhance performance, we suggest data clustering and cascading several levels of the basic algorithm. We suggest a training scheme for the resulting network and demonstrate the capabilities of our algorithm, showing its advantages over existing methods based on a low- and high-resolution dictionary pair, in terms of computational complexity, numerical criteria, and visual appearance. The suggested approach offers a desirable compromise between low computational complexity and reconstruction quality when compared with state-of-the-art methods for single image super-resolution.
    IEEE Transactions on Image Processing 06/2014; 23(6):2569-82.
  • ABSTRACT: In Part 1 of this two-part study, we present a method of imaging and velocity estimation of ground moving targets using passive synthetic aperture radar. Such a system uses a network of small, mobile receivers that collect scattered waves due to transmitters of opportunity, such as commercial television, radio, and cell phone towers. Therefore, passive imaging systems have significant cost, manufacturing, and stealth advantages over active systems. We describe a novel generalized Radon transform-type forward model and a corresponding filtered-backprojection-type image formation and velocity estimation method. We form a stack of position images over a range of hypothesized velocities, and show that the targets can be reconstructed at the correct position whenever the hypothesized velocity is equal to the true velocity of targets. We then use entropy to determine the most accurate velocity and image pair for each moving target. We present extensive numerical simulations to verify the reconstruction method. Our method does not require a priori knowledge of transmitter locations and transmitted waveforms. It can determine the location and velocity of multiple targets moving at different velocities. Furthermore, it can accommodate arbitrary imaging geometries. In Part 2, we present the resolution analysis and analysis of positioning errors in passive SAR images due to erroneous velocity estimation.
    IEEE Transactions on Image Processing 06/2014; 23(6):2487-500.
  • ABSTRACT: Fluorescence diffuse optical tomography (FDOT) is an emerging molecular imaging modality that uses near infrared light to excite the fluorophore injected into tissue and to reconstruct the fluorophore concentration from boundary measurements. The FDOT image reconstruction is a highly ill-posed inverse problem due to a large number of unknowns and a limited number of measurements. However, the fluorophore distribution is often very sparse in the imaging domain since fluorophores are typically designed to accumulate in relatively small regions. In this paper, we use a compressive sensing (CS) framework to design light illumination and detection patterns to improve the reconstruction of sparse fluorophore concentration. Unlike the conventional FDOT imaging where spatially distributed light sources illuminate the imaging domain one at a time and the corresponding boundary measurements are used for image reconstruction, we assume that the light sources illuminate the imaging domain simultaneously several times and the corresponding boundary measurements are linearly filtered prior to image reconstruction. We design a set of optical intensities (illumination patterns) and a linear filter (detection pattern) applied to the boundary measurements to improve the reconstruction of sparse fluorophore concentration maps. We show that the FDOT sensing matrix can be expressed as a columnwise Kronecker product of two matrices determined by the excitation and emission light fields. We derive relationships between the incoherence of the FDOT forward matrix and these two matrices, and use these results to reduce the incoherence of the FDOT forward matrix. We present extensive numerical simulations and the results of a real phantom experiment to demonstrate the improvements in image reconstruction due to the CS-based light illumination and detection patterns in conjunction with relaxation and greedy-type reconstruction algorithms.
    IEEE Transactions on Image Processing 06/2014; 23(6):2609-24.
  • ABSTRACT: In this paper, a novel local pattern descriptor generated by the proposed local vector pattern (LVP) in high-order derivative space is presented for use in face recognition. Based on the vector of each pixel constructed by computing the values between the referenced pixel and the adjacent pixels with diverse distances from different directions, the vector representation of the referenced pixel is generated to provide the one-dimensional structure of micropatterns. By devising a pairwise direction of vector for each pixel, the LVP reduces the feature length via comparative space transform (CST) to encode various spatial surrounding relationships between the referenced pixel and its neighborhood pixels. In addition, the concatenation of LVPs is compacted to produce more distinctive features. To effectively extract more detailed discriminative information in a given subregion, the vector of LVP is refined by varying local derivative directions from the nth-order LVP in (n-1)th-order derivative space, which is a much more resilient structure of micropatterns than standard local pattern descriptors. The proposed LVP is compared with the existing local pattern descriptors, including local binary pattern (LBP), local derivative pattern (LDP), and local tetra pattern (LTrP), to evaluate performance on input grayscale face images. Moreover, extensive experiments conducted on benchmark face image databases, FERET, CAS-PEAL, CMU-PIE, Extended Yale B and LFW, demonstrate that the proposed LVP in high-order derivative space indeed performs much better than LBP, LDP and LTrP in face recognition.
    IEEE Transactions on Image Processing 05/2014;
  • ABSTRACT: Efficient video representation models are critical for many video analysis and processing tasks. In this paper, we present a framework based on the concept of finding the sparsest solution to model video frames. To model the spatio-temporal information, frames from one scene are decomposed into two components: (i) a common frame, which describes the visual information common to all the frames in the scene/segment, and (ii) a set of innovative frames, which depicts the dynamic behaviour of the scene. The proposed approach exploits and builds on recent results in the field of compressed sensing to jointly estimate the common frame and the innovative frames for each video segment. We refer to the proposed modeling framework as CIV (Common and Innovative Visuals). We show how the proposed model can be utilized to find scene change boundaries and extend CIV to videos from multiple scenes. Furthermore, the proposed model is robust to noise and can be used for various video processing applications without relying on motion estimation and detection or image segmentation. Results for object tracking, video editing (object removal, inpainting) and scene change detection are presented to demonstrate the efficiency and the performance of the proposed model.
    IEEE Transactions on Image Processing 05/2014;
  • ABSTRACT: We propose a new mathematical and algorithmic framework for unsupervised image segmentation, which is a critical step in a wide variety of image processing applications. We have found that most existing segmentation methods are not successful on histopathology images, which prompted us to investigate segmentation of a broader class of images, namely those without clear edges between the regions to be segmented. We model these images as occlusions of random images, which we call textures, and show that local histograms are a useful tool for segmenting them. Based on our theoretical results, we describe a flexible segmentation framework that draws on existing work on nonnegative matrix factorization and image deconvolution. Results on synthetic texture mosaics and real histology images show the promise of the method.
    IEEE Transactions on Image Processing 05/2014; 23(5):2033-46.
  • ABSTRACT: We propose a new strategy to evaluate the quality of multi- and hyperspectral images, from the perspective of human perception. We define the spectral image difference as the overall perceived difference between two spectral images under a set of specified viewing conditions (illuminants). First, we analyze the stability of seven image-difference features across illuminants, by means of an information-theoretic strategy. We demonstrate, in particular, that in the case of common spectral distortions (spectral gamut mapping, spectral compression, spectral reconstruction), chromatic features vary much more than achromatic ones despite considering chromatic adaptation. Then, we propose two computationally efficient spectral image difference metrics and compare them to the results of a subjective visual experiment. A significant improvement is shown over existing metrics such as the widely used root-mean-square error.
    IEEE Transactions on Image Processing 05/2014; 23(5):2058-68.
  • ABSTRACT: In image classification tasks, one of the most successful algorithms is the bag-of-features (BoF) model. Although the BoF model has many advantages, such as simplicity, generality, and scalability, it still suffers from several drawbacks, including the limited semantic description of local descriptors, the lack of robust structures upon single visual words, and the absence of efficient spatial weighting. To overcome these shortcomings, various techniques have been proposed, such as extracting multiple descriptors, spatial context modeling, and interest region detection. Though they have been proven to improve the BoF model to some extent, a coherent scheme to integrate the individual modules is still lacking. To address the problems above, we propose a novel framework with spatial pooling of complementary features. Our model expands the traditional BoF model in three aspects. First, we propose a new scheme for combining texture and edge-based local features together at the descriptor extraction level. Next, we build geometric visual phrases to model spatial context upon complementary features for midlevel image representation. Finally, based on a smoothed edgemap, a simple and effective spatial weighting scheme is performed to capture the image saliency. We test the proposed framework on several benchmark data sets for image classification. The extensive results show the superior performance of our algorithm over the state-of-the-art methods.
    IEEE Transactions on Image Processing 05/2014; 23(5):1994-2008.
  • ABSTRACT: We present a sequential framework for change detection. This framework allows us to use multiple images from reference and mission passes of a scene of interest in order to improve detection performance. It includes a change statistic that is easily updated when additional data becomes available. Detection performance using this statistic is predictable when the reference and image data are drawn from known distributions. We verify our performance prediction by simulation. Additionally, we show that detection performance improves with additional measurements on a set of synthetic aperture radar images and a set of visible images with unknown probability distributions.
    IEEE Transactions on Image Processing 05/2014; 23(5):2405-13.
  • IEEE Transactions on Image Processing 05/2014; 23(5):2277-2290.
  • ABSTRACT: Newly developed hypertext transfer protocol (HTTP)-based video streaming technologies enable flexible rate-adaptation under varying channel conditions. Accurately predicting the users' quality of experience (QoE) for rate-adaptive HTTP video streams is thus critical to achieve efficiency. An important aspect of understanding and modeling QoE is predicting the up-to-the-moment subjective quality of a video as it is played, which is difficult due to hysteresis effects and nonlinearities in human behavioral responses. This paper presents a Hammerstein-Wiener model for predicting the time-varying subjective quality (TVSQ) of rate-adaptive videos. To collect data for model parameterization and validation, a database of longer-duration videos with time-varying distortions was built and the TVSQs of the videos were measured in a large-scale subjective study. The proposed method is able to reliably predict the TVSQ of rate-adaptive videos. Since the Hammerstein-Wiener model has a very simple structure, the proposed method is suitable for online TVSQ prediction in HTTP-based streaming.
    IEEE Transactions on Image Processing 05/2014; 23(5):2206-21.
  • ABSTRACT: This paper proposes a novel saliency detection framework termed saliency tree. For effective saliency measurement, the original image is first simplified using adaptive color quantization and region segmentation to partition the image into a set of primitive regions. Then, three measures, i.e., global contrast, spatial sparsity, and object prior, are integrated with regional similarities to generate the initial regional saliency for each primitive region. Next, a saliency-directed region merging approach with a dynamic scale control scheme is proposed to generate the saliency tree, in which each leaf node represents a primitive region and each non-leaf node represents a non-primitive region generated during the region merging process. Finally, by exploiting a regional center-surround scheme based node selection criterion, a systematic saliency tree analysis including salient node selection, regional saliency adjustment and selection is performed to obtain final regional saliency measures and to derive the high-quality pixel-wise saliency map. Extensive experimental results on five datasets with pixel-wise ground truths demonstrate that the proposed saliency tree model consistently outperforms the state-of-the-art saliency models.
    IEEE Transactions on Image Processing 05/2014; 23(5):1937-52.
  • ABSTRACT: In inverse synthetic aperture radar (ISAR) imaging, a target is usually regarded as consisting of a few strong (specular) scatterers, and the distribution of these strong scatterers is sparse in the imaging volume. In this paper, we propose to incorporate the sparse signal recovery method in a 3D multiple-input multiple-output radar imaging algorithm. The sequential order one negative exponential (SOONE) function, which forms a homotopy between the l1 and l0 norms, is proposed to measure the sparsity. Gradient projection is used to solve a constrained nonconvex SOONE function minimization problem and recover the sparse signal. However, while the gradient projection method is computationally simple, it is not robust when a matrix in the algorithm is ill conditioned. We thus further propose using diagonal loading and singular value decomposition methods to improve the robustness of the algorithm. In order to handle targets with large flat surfaces, a combined amplitude and total-variation objective function is also proposed to regularize the shapes of the flat surfaces. Simulation results show that the proposed gradient projection of SOONE function method is better than orthogonal matching pursuit, CoSaMP, l1-magic, Bayesian method with Laplace prior, smoothed l0 method, and l1-ls in high SNR cases for recovery of ±1 random-spike sparse signals. The quality of the simulated 3D images and real data ISAR images obtained using the new method is better than that of the conventional correlation method and minimum l2 norm method, and competitive with the aforementioned sparse signal recovery algorithms.
    IEEE Transactions on Image Processing 05/2014; 23(5):2168-83.