IEEE Transactions on Image Processing (IEEE T IMAGE PROCESS)

Publisher: IEEE Signal Processing Society; Institute of Electrical and Electronics Engineers


This journal will focus on the signal processing aspects of image acquisition, processing, and display, especially where it concerns modeling, design, and analysis having a strong mathematical basis.

Impact factor 3.11

  • Website
    IEEE Transactions on Image Processing website
  • Other titles
    IEEE transactions on image processing, Institute of Electrical and Electronics Engineers transactions on image processing, Image processing
  • Material type
    Periodical, Internet resource
  • Document type
    Journal / Magazine / Newspaper, Internet Resource

Publisher details

Institute of Electrical and Electronics Engineers

  • Pre-print
    • Author can archive a pre-print version
  • Post-print
    • Author can archive a post-print version
  • Conditions
    • Author's pre-print on Author's personal website, employer's website or publicly accessible server
    • Author's post-print on Author's server or Institutional server
    • Author's pre-print must be removed upon publication of final version and replaced with either full citation to IEEE work with a Digital Object Identifier or link to article abstract in IEEE Xplore or replaced with Author's post-print
    • Author's pre-print must be accompanied with set-phrase, once submitted to IEEE for publication ("This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible")
    • Author's pre-print must be accompanied with set-phrase, when accepted by IEEE for publication ("(c) 20xx IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.")
    • IEEE must be informed as to the electronic address of the pre-print
    • If funding rules apply authors may post Author's post-print version in funder's designated repository
    • Author's Post-print - Publisher copyright and source must be acknowledged with citation (see above set statement)
    • Author's Post-print - Must link to publisher version with DOI
    • Publisher's version/PDF cannot be used
    • Publisher copyright and source must be acknowledged
  • Classification
    green

Publications in this journal

  • ABSTRACT: We address single image super-resolution using a statistical prediction model based on sparse representations of low- and high-resolution image patches. The suggested model allows us to avoid any invariance assumption, which is a common practice in sparsity-based approaches treating this task. Prediction of high-resolution patches is obtained via MMSE estimation, and the resulting scheme has the useful interpretation of a feedforward neural network. To further enhance performance, we suggest data clustering and cascading several levels of the basic algorithm. We suggest a training scheme for the resulting network and demonstrate the capabilities of our algorithm, showing its advantages over existing methods based on a low- and high-resolution dictionary pair in terms of computational complexity, numerical criteria, and visual appearance. The suggested approach offers a desirable compromise between low computational complexity and reconstruction quality when compared with state-of-the-art methods for single image super-resolution.
    IEEE Transactions on Image Processing 06/2014; 23(6):2569-82.
  • ABSTRACT: In this paper, we investigate the impact of spatial, temporal, and amplitude resolution on the perceptual quality of a compressed video. Subjective quality tests were carried out on a mobile device, with a total of 189 processed video sequences from 10 source sequences included in the test. Subjective data reveal that the impact of spatial resolution (SR), temporal resolution (TR), and quantization stepsize (QS) can each be captured by a function with a single content-dependent parameter, which indicates the decay rate of the quality with each resolution factor. The joint impact of SR, TR, and QS can be accurately modeled by the product of these three functions with only three parameters. The impact of SR and QS on quality is independent of that of TR, but there are significant interactions between SR and QS. Furthermore, the model parameters can be predicted accurately from a few content features derived from the original video. The proposed model correlates well with the subjective ratings, with a Pearson correlation coefficient of 0.985 when the model parameters are predicted from content features. The quality model is further validated on six other subjective rating data sets with very high accuracy and outperforms several well-known quality models.
    IEEE Transactions on Image Processing 06/2014; 23(6):2473-86.
  • ABSTRACT: In Part 1 of this two-part study, we present a method for imaging and velocity estimation of ground moving targets using passive synthetic aperture radar. Such a system uses a network of small, mobile receivers that collect scattered waves originating from transmitters of opportunity, such as commercial television, radio, and cell phone towers. Therefore, passive imaging systems have significant cost, manufacturing, and stealth advantages over active systems. We describe a novel generalized Radon transform-type forward model and a corresponding filtered-backprojection-type image formation and velocity estimation method. We form a stack of position images over a range of hypothesized velocities and show that the targets can be reconstructed at the correct position whenever the hypothesized velocity is equal to the true velocity of the targets. We then use entropy to determine the most accurate velocity and image pair for each moving target. We present extensive numerical simulations to verify the reconstruction method. Our method does not require a priori knowledge of transmitter locations and transmitted waveforms. It can determine the location and velocity of multiple targets moving at different velocities. Furthermore, it can accommodate arbitrary imaging geometries. In Part 2, we present the resolution analysis and an analysis of positioning errors in passive SAR images due to erroneous velocity estimation.
    IEEE Transactions on Image Processing 06/2014; 23(6):2487-500.
  • ABSTRACT: Fluorescence diffuse optical tomography (FDOT) is an emerging molecular imaging modality that uses near-infrared light to excite the fluorophore injected into tissue and reconstructs the fluorophore concentration from boundary measurements. FDOT image reconstruction is a highly ill-posed inverse problem due to the large number of unknowns and the limited number of measurements. However, the fluorophore distribution is often very sparse in the imaging domain, since fluorophores are typically designed to accumulate in relatively small regions. In this paper, we use the compressive sensing (CS) framework to design light illumination and detection patterns that improve the reconstruction of sparse fluorophore concentration. Unlike conventional FDOT imaging, where spatially distributed light sources illuminate the imaging domain one at a time and the corresponding boundary measurements are used for image reconstruction, we assume that the light sources illuminate the imaging domain simultaneously several times and that the corresponding boundary measurements are linearly filtered prior to image reconstruction. We design a set of optical intensities (illumination patterns) and a linear filter (detection pattern) applied to the boundary measurements to improve the reconstruction of sparse fluorophore concentration maps. We show that the FDOT sensing matrix can be expressed as a columnwise Kronecker product of two matrices determined by the excitation and emission light fields. We derive relationships between the incoherence of the FDOT forward matrix and these two matrices, and use these results to reduce the incoherence of the FDOT forward matrix. We present extensive numerical simulations and the results of a real phantom experiment to demonstrate the improvements in image reconstruction due to the CS-based light illumination and detection patterns in conjunction with relaxation and greedy-type reconstruction algorithms.
    IEEE Transactions on Image Processing 06/2014; 23(6):2609-24.
  • ABSTRACT: In this paper, a novel local pattern descriptor generated by the proposed local vector pattern (LVP) in high-order derivative space is presented for use in face recognition. Based on the vector of each pixel, constructed by computing the values between the referenced pixel and the adjacent pixels at diverse distances from different directions, the vector representation of the referenced pixel is generated to provide the one-dimensional structure of micropatterns. By devising a pairwise direction of vector for each pixel, the LVP reduces the feature length via a comparative space transform (CST) to encode various spatial surrounding relationships between the referenced pixel and its neighborhood pixels. In addition, the concatenation of LVPs is compacted to produce more distinctive features. To effectively extract more detailed discriminative information in a given subregion, the vector of LVP is refined by varying local derivative directions from the nth-order LVP in the (n-1)th-order derivative space, which is a much more resilient structure of micropatterns than standard local pattern descriptors. The proposed LVP is compared with existing local pattern descriptors, including the local binary pattern (LBP), local derivative pattern (LDP), and local tetra pattern (LTrP), on input grayscale face images. Moreover, extensive experiments conducted on the benchmark face image databases FERET, CAS-PEAL, CMU-PIE, Extended Yale B, and LFW demonstrate that the proposed LVP in high-order derivative space performs much better than LBP, LDP, and LTrP in face recognition.
    IEEE Transactions on Image Processing 05/2014;
  • ABSTRACT: Efficient video representation models are critical for many video analysis and processing tasks. In this paper, we present a framework based on the concept of finding the sparsest solution to model video frames. To model the spatio-temporal information, frames from one scene are decomposed into two components: (i) a common frame, which describes the visual information common to all the frames in the scene/segment, and (ii) a set of innovative frames, which depict the dynamic behaviour of the scene. The proposed approach exploits and builds on recent results in the field of compressed sensing to jointly estimate the common frame and the innovative frames for each video segment. We refer to the proposed modeling framework as CIV (Common and Innovative Visuals). We show how the proposed model can be utilized to find scene change boundaries and extend CIV to videos from multiple scenes. Furthermore, the proposed model is robust to noise and can be used for various video processing applications without relying on motion estimation and detection or image segmentation. Results for object tracking, video editing (object removal, inpainting), and scene change detection are presented to demonstrate the efficiency and performance of the proposed model.
    IEEE Transactions on Image Processing 05/2014;
  • ABSTRACT: Image reranking is effective for improving the performance of a text-based image search. However, existing reranking algorithms are limited for two main reasons: 1) the textual meta-data associated with images is often mismatched with their actual visual content and 2) the extracted visual features do not accurately describe the semantic similarities between images. Recently, user click information has been used in image reranking, because clicks have been shown to more accurately describe the relevance of retrieved images to search queries. However, a critical problem for click-based methods is the lack of click data, since only a small number of web images have actually been clicked on by users. Therefore, we aim to solve this problem by predicting image clicks. We propose a multimodal hypergraph learning-based sparse coding method for image click prediction and apply the obtained click data to the reranking of images. We adopt a hypergraph to build a group of manifolds, which explore the complementarity of different features through a group of weights. Unlike a graph, which has an edge between two vertices, a hyperedge in a hypergraph connects a set of vertices and helps preserve the local smoothness of the constructed sparse codes. An alternating optimization procedure is then performed, and the weights of different modalities and the sparse codes are simultaneously obtained. Finally, a voting strategy is used to describe the predicted click as a binary event (click or no click) from the images' corresponding sparse codes. Thorough empirical studies on a large-scale database including nearly 330K images demonstrate the effectiveness of our approach for click prediction when compared with several other methods. Additional image reranking experiments on real-world data show that click prediction improves the performance of prominent graph-based image reranking algorithms.
    IEEE Transactions on Image Processing 05/2014; 23(5):2019-32.
  • ABSTRACT: Many imaging applications require the implementation of space-varying convolution for accurate restoration and reconstruction of images. Here, we use the term space-varying convolution to refer to linear operators whose impulse response has slow spatial variation. In addition, these space-varying convolution operators are often dense, so direct implementation of the convolution operator is typically computationally impractical. One such example is the problem of stray light reduction in digital cameras, which requires the implementation of a dense space-varying deconvolution operator. However, other inverse problems, such as iterative tomographic reconstruction, can also depend on the implementation of dense space-varying convolution. While space-invariant convolution can be efficiently implemented with the fast Fourier transform, this approach does not work for space-varying operators. Consequently, direct convolution is often the only option for implementing space-varying convolution. In this paper, we develop a general approach to the efficient implementation of space-varying convolution and demonstrate its use in the application of stray light reduction. Our approach, which we call matrix source coding, is based on lossy source coding of the dense space-varying convolution matrix. Importantly, by coding the transformation matrix, we not only reduce the memory required to store it but also dramatically reduce the computation required to implement matrix-vector products. Our algorithm is able to reduce computation by approximately factoring the dense space-varying convolution operator into a product of sparse transforms. Experimental results show that our method can dramatically reduce the computation required for stray light reduction while maintaining high accuracy.
    IEEE Transactions on Image Processing 05/2014; 23(5):1965-79.
  • ABSTRACT: In inverse synthetic aperture radar (ISAR) imaging, a target is usually regarded as consisting of a few strong (specular) scatterers, and the distribution of these strong scatterers is sparse in the imaging volume. In this paper, we propose to incorporate the sparse signal recovery method into a 3D multiple-input multiple-output radar imaging algorithm. The sequential order one negative exponential (SOONE) function, which forms a homotopy between the l1 and l0 norms, is proposed to measure sparsity. Gradient projection is used to solve a constrained nonconvex SOONE function minimization problem and recover the sparse signal. However, while the gradient projection method is computationally simple, it is not robust when a matrix in the algorithm is ill-conditioned. We thus further propose using diagonal loading and singular value decomposition methods to improve the robustness of the algorithm. In order to handle targets with large flat surfaces, a combined amplitude and total-variation objective function is also proposed to regularize the shapes of the flat surfaces. Simulation results show that the proposed gradient projection of SOONE function method is better than orthogonal matching pursuit, CoSaMP, l1-magic, the Bayesian method with Laplace prior, the smoothed l0 method, and l1-ls in high-SNR cases for recovery of ±1 random-spike sparse signals. The quality of the simulated 3D images and real-data ISAR images obtained using the new method is better than that of the conventional correlation method and the minimum l2 norm method, and competitive with the aforementioned sparse signal recovery algorithms.
    IEEE Transactions on Image Processing 05/2014; 23(5):2168-83.
  • ABSTRACT: Photo cropping is a widely used tool in the printing industry, photography, and cinematography. Conventional cropping models suffer from three challenges. First, they de-emphasize the role of semantic content, which is many times more important than low-level features in photo aesthetics. Second, existing models lack a sequential ordering, whereas humans look at semantically important regions sequentially when viewing a photo. Third, it is difficult to leverage inputs from multiple users, yet experience from multiple users is particularly critical in cropping because photo assessment is quite subjective. To address these challenges, this paper proposes semantics-aware photo cropping, which crops a photo by simulating the process of humans sequentially perceiving semantically important regions of a photo. We first project the local features (graphlets in this paper) onto the semantic space, which is constructed based on the category information of the training photos. An efficient learning algorithm is then derived to sequentially select semantically representative graphlets of a photo, and the selection process can be interpreted as a path, which simulates humans actively perceiving semantics in a photo. Furthermore, we learn a prior distribution of such active graphlet paths from training photos that are marked as aesthetically pleasing by multiple users. The learned priors enforce the corresponding active graphlet path of a test photo to be maximally similar to those from the training photos. Experimental results show that: 1) the active graphlet path accurately predicts human gaze shifting and is thus more indicative of photo aesthetics than conventional saliency maps and 2) the cropped photos produced by our approach outperform those of its competitors in both qualitative and quantitative comparisons.
    IEEE Transactions on Image Processing 05/2014; 23(5):2235-45.
  • IEEE Transactions on Image Processing 05/2014; 23(5):2277-90.
  • ABSTRACT: In image classification tasks, one of the most successful algorithms is the bag-of-features (BoF) model. Although the BoF model has many advantages, such as simplicity, generality, and scalability, it still suffers from several drawbacks, including the limited semantic description of local descriptors, the lack of robust structures upon single visual words, and the absence of efficient spatial weighting. To overcome these shortcomings, various techniques have been proposed, such as extracting multiple descriptors, spatial context modeling, and interest region detection. Though they have been proven to improve the BoF model to some extent, there is still no coherent scheme for integrating the individual modules. To address these problems, we propose a novel framework with spatial pooling of complementary features. Our model extends the traditional BoF model in three aspects. First, we propose a new scheme for combining texture and edge-based local features at the descriptor extraction level. Next, we build geometric visual phrases to model spatial context upon complementary features for midlevel image representation. Finally, based on a smoothed edge map, a simple and effective spatial weighting scheme is performed to capture image saliency. We test the proposed framework on several benchmark data sets for image classification. The extensive results show the superior performance of our algorithm over state-of-the-art methods.
    IEEE Transactions on Image Processing 05/2014; 23(5):1994-2008.
  • ABSTRACT: We present a sequential framework for change detection. This framework allows us to use multiple images from reference and mission passes of a scene of interest in order to improve detection performance. It includes a change statistic that is easily updated when additional data becomes available. Detection performance using this statistic is predictable when the reference and image data are drawn from known distributions. We verify our performance prediction by simulation. Additionally, we show that detection performance improves with additional measurements on a set of synthetic aperture radar images and a set of visible images with unknown probability distributions.
    IEEE Transactions on Image Processing 05/2014; 23(5):2405-13.
  • ABSTRACT: Newly developed hypertext transfer protocol (HTTP)-based video streaming technologies enable flexible rate adaptation under varying channel conditions. Accurately predicting the users' quality of experience (QoE) for rate-adaptive HTTP video streams is thus critical to achieving efficiency. An important aspect of understanding and modeling QoE is predicting the up-to-the-moment subjective quality of a video as it is played, which is difficult due to hysteresis effects and nonlinearities in human behavioral responses. This paper presents a Hammerstein-Wiener model for predicting the time-varying subjective quality (TVSQ) of rate-adaptive videos. To collect data for model parameterization and validation, a database of longer-duration videos with time-varying distortions was built, and the TVSQs of the videos were measured in a large-scale subjective study. The proposed method is able to reliably predict the TVSQ of rate-adaptive videos. Since the Hammerstein-Wiener model has a very simple structure, the proposed method is suitable for online TVSQ prediction in HTTP-based streaming.
    IEEE Transactions on Image Processing 05/2014; 23(5):2206-21.
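
The passive SAR abstract (Part 1 of the two-part study) forms a stack of position images over hypothesized velocities and uses entropy to pick the most accurate velocity and image pair. The following is a minimal sketch of that selection step only, not the authors' implementation. It assumes three things: images are 2D lists of nonnegative magnitudes, entropy is the Shannon entropy of magnitudes normalized to sum to one, and the best-focused (lowest-entropy) image is preferred. The names `image_entropy`, `select_velocity`, and the toy stack are hypothetical.

```python
import math


def image_entropy(image):
    """Shannon entropy of an image whose magnitudes are normalized to sum to 1."""
    total = sum(sum(row) for row in image)
    if total == 0:
        return 0.0
    h = 0.0
    for row in image:
        for v in row:
            p = v / total
            if p > 0:
                h -= p * math.log(p)
    return h


def select_velocity(stack):
    """Return (velocity, image) with minimum entropy.

    `stack` maps each hypothesized velocity to the image reconstructed under it
    (hypothetical data layout; minimum entropy is an assumed focus criterion).
    """
    best_v = min(stack, key=lambda v: image_entropy(stack[v]))
    return best_v, stack[best_v]


# Toy stack: the correct hypothesis focuses the target into a single pixel,
# wrong hypotheses smear the same energy over several pixels.
stack = {
    -1.0: [[0, 1, 1], [1, 1, 0], [0, 0, 0]],
    0.0: [[0, 0, 0], [0, 4, 0], [0, 0, 0]],
    1.0: [[1, 0, 1], [0, 1, 0], [1, 0, 0]],
}
v, img = select_velocity(stack)
print(v)  # 0.0
```

A focused point target concentrates all energy in one pixel, giving entropy 0. The smeared images spread it evenly over four pixels, giving entropy log 4, so the minimum picks the correct hypothesis.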
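
The FDOT abstract expresses the sensing matrix as a columnwise Kronecker product of an excitation matrix and an emission matrix, and it relates the forward matrix's incoherence to those two factors. The sketch below is not the authors' derivation. It only illustrates the standard identity behind such relationships: ⟨a_i ⊗ b_i, a_j ⊗ b_j⟩ = ⟨a_i, a_j⟩⟨b_i, b_j⟩, and column norms multiply the same way. As a result, the mutual coherence of the columnwise product is at most the product of the two factors' coherences. Matrices are plain lists of rows, and the values of `A` and `B` are arbitrary examples.

```python
import math


def columnwise_kron(a, b):
    """Columnwise Kronecker (Khatri-Rao) product of a (m x n) and b (p x n).

    Column j of the (m*p) x n result is kron(a[:, j], b[:, j]).
    """
    n = len(a[0])
    assert len(b[0]) == n, "a and b must have the same number of columns"
    return [[a[i][j] * b[k][j] for j in range(n)]
            for i in range(len(a)) for k in range(len(b))]


def mutual_coherence(mat):
    """Largest absolute normalized inner product between two distinct columns."""
    n = len(mat[0])
    cols = [[row[j] for row in mat] for j in range(n)]
    norms = [math.sqrt(sum(x * x for x in c)) for c in cols]
    mu = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(x * y for x, y in zip(cols[i], cols[j]))
            mu = max(mu, abs(dot) / (norms[i] * norms[j]))
    return mu


A = [[1, 0, 1], [0, 1, 1]]   # hypothetical "excitation" factor
B = [[1, 1, 0], [1, -1, 1]]  # hypothetical "emission" factor
K = columnwise_kron(A, B)
print(round(mutual_coherence(K), 6))                          # 0.5
print(round(mutual_coherence(A) * mutual_coherence(B), 6))   # 0.5
```

Here each factor has coherence 1/√2, and the product attains the bound 1/2. In general, the bound shows why designing the two factors to be incoherent also controls the coherence of the combined sensing matrix.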
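
The LVP abstract benchmarks against the local binary pattern (LBP). For readers unfamiliar with that baseline, here is a minimal sketch of the basic 3×3 LBP operator. It is not LVP itself, whose encoding the abstract describes only at a high level. Each of the 8 neighbors is thresholded against the center pixel, the resulting bits form an 8-bit code, and a face region is then described by a histogram of codes. The clockwise neighbor ordering and the "neighbor ≥ center" comparison are one common convention, assumed here.

```python
# Offsets of the 8 neighbors, clockwise from the top-left pixel (assumed convention).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img, r, c):
    """8-bit basic LBP code of interior pixel (r, c): bit k is set if neighbor k >= center."""
    center = img[r][c]
    code = 0
    for k, (dr, dc) in enumerate(OFFSETS):
        if img[r + dr][c + dc] >= center:
            code |= 1 << k
    return code


def lbp_histogram(img):
    """256-bin histogram of LBP codes over all interior pixels of a grayscale image."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist


patch = [
    [10, 20, 30],
    [40, 25, 10],
    [5, 25, 60],
]
print(lbp_code(patch, 1, 1))  # 180: neighbors 30, 60, 25, 40 set bits 2, 4, 5, 7
```

In face recognition, such histograms are typically computed per subregion and concatenated. The LVP and LDP descriptors in the abstract replace the plain intensity comparison with derivative-based or vector-based encodings.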
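
The bag-of-features abstract extends the traditional BoF model. For reference, here is a minimal sketch of that traditional baseline only. Each local descriptor is assigned to its nearest codeword, and the image is represented by the normalized histogram of assignments. The sketch makes two assumptions: the codebook is learned beforehand (for example, by k-means), and hard nearest-codeword assignment is used. The paper's extensions (complementary texture and edge features, geometric visual phrases, edge-map spatial weighting) are not shown, and the toy codebook and descriptors are hypothetical.

```python
def nearest_word(descriptor, codebook):
    """Index of the codeword closest to `descriptor` in squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda k: sum((d - w) ** 2 for d, w in zip(descriptor, codebook[k])))


def bof_histogram(descriptors, codebook):
    """L1-normalized bag-of-features histogram with hard assignment."""
    hist = [0.0] * len(codebook)
    for desc in descriptors:
        hist[nearest_word(desc, codebook)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
descriptors = [[0.1, 0.0], [0.9, 1.2], [1.1, 0.8], [0.0, 0.9]]
print(bof_histogram(descriptors, codebook))  # [0.25, 0.5, 0.25]
```

The histogram discards where each descriptor occurred in the image. This is the loss of spatial information that the abstract's spatial context modeling and spatial weighting are meant to address.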
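
The TVSQ abstract attributes the method's suitability for online prediction to the simple structure of a Hammerstein-Wiener model: a static input nonlinearity, a linear dynamic block, and a static output nonlinearity in cascade. The sketch below shows only that generic structure, updated one sample at a time. The IIR form of the linear block, the coefficient values, and the tanh and affine nonlinearities are hypothetical placeholders, not the model fitted in the paper.

```python
import math
from collections import deque


class HammersteinWiener:
    """Static input nonlinearity f -> linear IIR filter -> static output nonlinearity g.

    Assumed linear block:
        z[n] = sum_k b[k] * x[n-k] - sum_{k>=1} a_fb[k-1] * z[n-k],   x[n] = f(u[n])
    Output: y[n] = g(z[n]).
    """

    def __init__(self, b, a_fb, f, g):
        self.b = list(b)        # feedforward coefficients for lags 0, 1, ...
        self.a_fb = list(a_fb)  # feedback coefficients for lags 1, 2, ...
        self.f = f              # static input nonlinearity
        self.g = g              # static output nonlinearity
        # Fixed-length histories, newest sample first; the filter starts at rest.
        self.x_hist = deque([0.0] * len(self.b), maxlen=len(self.b))
        self.z_hist = deque([0.0] * len(self.a_fb), maxlen=len(self.a_fb))

    def step(self, u):
        """Consume one input sample and return one output prediction."""
        self.x_hist.appendleft(self.f(u))
        z = sum(bk * xk for bk, xk in zip(self.b, self.x_hist))
        z -= sum(ak * zk for ak, zk in zip(self.a_fb, self.z_hist))
        self.z_hist.appendleft(z)
        return self.g(z)


# Hypothetical parameters: z[n] = 0.2*x[n] + 0.8*z[n-1] (unit DC gain),
# a tanh input map, and an affine output map.
model = HammersteinWiener(b=[0.2], a_fb=[-0.8], f=math.tanh, g=lambda z: 1.0 + 4.0 * z)
for u in [1.0, 1.0, 0.2, 0.2, 1.0]:
    print(round(model.step(u), 3))
```

Each call to `step` costs a fixed number of operations and keeps only a bounded history. This is what makes such a cascade cheap to run alongside playback. The recursive feedback term lets the prediction lag behind sudden changes in the input, which is one simple way to mimic the hysteresis the abstract mentions.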