IEEE Transactions on Image Processing (IEEE T IMAGE PROCESS)

Publisher: IEEE Signal Processing Society; Institute of Electrical and Electronics Engineers


This journal will focus on the signal processing aspects of image acquisition, processing, and display, especially where it concerns modeling, design, and analysis having a strong mathematical basis.

  • Impact factor
  • 5-year impact
  • Cited half-life
  • Immediacy index
  • Eigenfactor
  • Article influence
  • Website
    IEEE Transactions on Image Processing website
  • Other titles
    IEEE transactions on image processing, Institute of Electrical and Electronics Engineers transactions on image processing, Image processing
  • ISSN
  • OCLC
  • Material type
    Periodical, Internet resource
  • Document type
    Journal / Magazine / Newspaper, Internet Resource

Publisher details

Institute of Electrical and Electronics Engineers

  • Pre-print
    • Author can archive a pre-print version
  • Post-print
    • Author can archive a post-print version
  • Conditions
    • Author's own and employer's publicly accessible webpages
    • Preprint - Must be removed upon publication of final version and replaced with either full citation to IEEE work with a Digital Object Identifier or link to article abstract in IEEE Xplore or author's post-print
    • Preprint - Set phrase must be added once submitted to IEEE for publication ("This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible")
    • Preprint - Set phrase must be added when accepted by IEEE for publication ("(c) 20xx IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.")
    • Preprint - IEEE must be informed as to the electronic address of the pre-print
    • Postprint - Publisher copyright and source must be acknowledged (see above set statement)
    • Publisher's version/PDF cannot be used
    • Publisher copyright and source must be acknowledged
  • Classification
    green

Publications in this journal

  • ABSTRACT: In Part 1 of this two-part study, we present a method of imaging and velocity estimation of ground moving targets using passive synthetic aperture radar. Such a system uses a network of small, mobile receivers that collect scattered waves due to transmitters of opportunity, such as commercial television, radio, and cell phone towers. Therefore, passive imaging systems have significant cost, manufacturing, and stealth advantages over active systems. We describe a novel generalized Radon transform-type forward model and a corresponding filtered-backprojection-type image formation and velocity estimation method. We form a stack of position images over a range of hypothesized velocities, and show that the targets can be reconstructed at the correct position whenever the hypothesized velocity is equal to the true velocity of targets. We then use entropy to determine the most accurate velocity and image pair for each moving target. We present extensive numerical simulations to verify the reconstruction method. Our method does not require a priori knowledge of transmitter locations and transmitted waveforms. It can determine the location and velocity of multiple targets moving at different velocities. Furthermore, it can accommodate arbitrary imaging geometries. In Part 2, we present the resolution analysis and analysis of positioning errors in passive SAR images due to erroneous velocity estimation.
    IEEE Transactions on Image Processing 06/2014; 23(6):2487-500.
  • ABSTRACT: Fluorescence diffuse optical tomography (FDOT) is an emerging molecular imaging modality that uses near-infrared light to excite the fluorophore injected into tissue and to reconstruct the fluorophore concentration from boundary measurements. The FDOT image reconstruction is a highly ill-posed inverse problem due to a large number of unknowns and limited number of measurements. However, the fluorophore distribution is often very sparse in the imaging domain since fluorophores are typically designed to accumulate in relatively small regions. In this paper, we use a compressive sensing (CS) framework to design light illumination and detection patterns to improve the reconstruction of sparse fluorophore concentration. Unlike the conventional FDOT imaging where spatially distributed light sources illuminate the imaging domain one at a time and the corresponding boundary measurements are used for image reconstruction, we assume that the light sources illuminate the imaging domain simultaneously several times and the corresponding boundary measurements are linearly filtered prior to image reconstruction. We design a set of optical intensities (illumination patterns) and a linear filter (detection pattern) applied to the boundary measurements to improve the reconstruction of sparse fluorophore concentration maps. We show that the FDOT sensing matrix can be expressed as a columnwise Kronecker product of two matrices determined by the excitation and emission light fields. We derive relationships between the incoherence of the FDOT forward matrix and these two matrices, and use these results to reduce the incoherence of the FDOT forward matrix. We present extensive numerical simulations and the results of a real phantom experiment to demonstrate the improvements in image reconstruction due to the CS-based light illumination and detection patterns in conjunction with relaxation and greedy-type reconstruction algorithms.
    IEEE Transactions on Image Processing 06/2014; 23(6):2609-24.
  • ABSTRACT: We address single image super-resolution using a statistical prediction model based on sparse representations of low- and high-resolution image patches. The suggested model allows us to avoid any invariance assumption, which is a common practice in sparsity-based approaches treating this task. Prediction of high resolution patches is obtained via MMSE estimation and the resulting scheme has the useful interpretation of a feedforward neural network. To further enhance performance, we suggest data clustering and cascading several levels of the basic algorithm. We suggest a training scheme for the resulting network and demonstrate the capabilities of our algorithm, showing its advantages over existing methods based on a low- and high-resolution dictionary pair, in terms of computational complexity, numerical criteria, and visual appearance. The suggested approach offers a desirable compromise between low computational complexity and reconstruction quality, when comparing it with state-of-the-art methods for single image super-resolution.
    IEEE Transactions on Image Processing 06/2014; 23(6):2569-82.
  • ABSTRACT: In this paper, we investigate the impact of spatial, temporal, and amplitude resolution on the perceptual quality of a compressed video. Subjective quality tests were carried out on a mobile device, with a total of 189 processed video sequences and 10 source sequences included in the test. Subjective data reveal that the impact of spatial resolution (SR), temporal resolution (TR), and quantization stepsize (QS) can each be captured by a function with a single content-dependent parameter, which indicates the decay rate of the quality with each resolution factor. The joint impact of SR, TR, and QS can be accurately modeled by the product of these three functions with only three parameters. The impact of SR and QS on the quality is independent of that of TR, but there are significant interactions between SR and QS. Furthermore, the model parameters can be predicted accurately from a few content features derived from the original video. The proposed model correlates well with the subjective ratings with a Pearson correlation coefficient of 0.985 when the model parameters are predicted from content features. The quality model is further validated on six other subjective rating data sets with very high accuracy and outperforms several well-known quality models.
    IEEE Transactions on Image Processing 06/2014; 23(6):2473-86.
  • ABSTRACT: Efficient video representation models are critical for many video analysis and processing tasks. In this paper, we present a framework based on the concept of finding the sparsest solution to model video frames. To model the spatio-temporal information, frames from one scene are decomposed into two components: (i) a common frame, which describes the visual information common to all the frames in the scene/segment, and (ii) a set of innovative frames, which depicts the dynamic behaviour of the scene. The proposed approach exploits and builds on recent results in the field of compressed sensing to jointly estimate the common frame and the innovative frames for each video segment. We refer to the proposed modeling framework as CIV (Common and Innovative Visuals). We show how the proposed model can be utilized to find scene change boundaries and extend CIV to videos from multiple scenes. Furthermore, the proposed model is robust to noise and can be used for various video processing applications without relying on motion estimation and detection or image segmentation. Results for object tracking, video editing (object removal, inpainting) and scene change detection are presented to demonstrate the efficiency and the performance of the proposed model.
    IEEE Transactions on Image Processing 05/2014;
  • ABSTRACT: In this paper, a novel local pattern descriptor generated by the proposed local vector pattern (LVP) in high-order derivative space is presented for use in face recognition. Based on the vector of each pixel constructed by computing the values between the referenced pixel and the adjacent pixels with diverse distances from different directions, the vector representation of the referenced pixel is generated to provide the one-dimensional structure of micropatterns. With the devised pairwise direction of vector for each pixel, the LVP reduces the feature length via comparative space transform (CST) to encode various spatial surrounding relationships between the referenced pixel and its neighborhood pixels. Besides, the concatenation of LVPs is compacted to produce more distinctive features. To effectively extract more detailed discriminative information in a given subregion, the vector of LVP is refined by varying local derivative directions from the nth-order LVP in (n-1)th-order derivative space, which is a much more resilient structure of micropatterns than standard local pattern descriptors. The proposed LVP is compared with the existing local pattern descriptors including local binary pattern (LBP), local derivative pattern (LDP), and local tetra pattern (LTrP) to evaluate the performances from input grayscale face images. Moreover, extensive experiments conducted on benchmark face image databases, FERET, CAS-PEAL, CMU-PIE, Extended Yale B and LFW, demonstrate that the proposed LVP in high-order derivative space indeed performs much better than LBP, LDP and LTrP in face recognition.
    IEEE Transactions on Image Processing 05/2014;
  • ABSTRACT: We propose a new mathematical and algorithmic framework for unsupervised image segmentation, which is a critical step in a wide variety of image processing applications. We have found that most existing segmentation methods are not successful on histopathology images, which prompted us to investigate segmentation of a broader class of images, namely those without clear edges between the regions to be segmented. We model these images as occlusions of random images, which we call textures, and show that local histograms are a useful tool for segmenting them. Based on our theoretical results, we describe a flexible segmentation framework that draws on existing work on nonnegative matrix factorization and image deconvolution. Results on synthetic texture mosaics and real histology images show the promise of the method.
    IEEE Transactions on Image Processing 05/2014; 23(5):2033-46.
  • ABSTRACT: Image reranking is effective for improving the performance of a text-based image search. However, existing reranking algorithms are limited for two main reasons: 1) the textual meta-data associated with images is often mismatched with their actual visual content and 2) the extracted visual features do not accurately describe the semantic similarities between images. Recently, user click information has been used in image reranking, because clicks have been shown to more accurately describe the relevance of retrieved images to search queries. However, a critical problem for click-based methods is the lack of click data, since only a small number of web images have actually been clicked on by users. Therefore, we aim to solve this problem by predicting image clicks. We propose a multimodal hypergraph learning-based sparse coding method for image click prediction, and apply the obtained click data to the reranking of images. We adopt a hypergraph to build a group of manifolds, which explore the complementarity of different features through a group of weights. Unlike a graph that has an edge between two vertices, a hyperedge in a hypergraph connects a set of vertices, and helps preserve the local smoothness of the constructed sparse codes. An alternating optimization procedure is then performed, and the weights of different modalities and the sparse codes are simultaneously obtained. Finally, a voting strategy is used to describe the predicted click as a binary event (click or no click), from the images' corresponding sparse codes. Thorough empirical studies on a large-scale database including nearly 330 K images demonstrate the effectiveness of our approach for click prediction when compared with several other methods. Additional image reranking experiments on real-world data show the use of click prediction is beneficial to improving the performance of prominent graph-based image reranking algorithms.
    IEEE Transactions on Image Processing 05/2014; 23(5):2019-32.
  • ABSTRACT: Many imaging applications require the implementation of space-varying convolution for accurate restoration and reconstruction of images. Here, we use the term space-varying convolution to refer to linear operators whose impulse response has slow spatial variation. In addition, these space-varying convolution operators are often dense, so direct implementation of the convolution operator is typically computationally impractical. One such example is the problem of stray light reduction in digital cameras, which requires the implementation of a dense space-varying deconvolution operator. However, other inverse problems, such as iterative tomographic reconstruction, can also depend on the implementation of dense space-varying convolution. While space-invariant convolution can be efficiently implemented with the fast Fourier transform, this approach does not work for space-varying operators. So direct convolution is often the only option for implementing space-varying convolution. In this paper, we develop a general approach to the efficient implementation of space-varying convolution, and demonstrate its use in the application of stray light reduction. Our approach, which we call matrix source coding, is based on lossy source coding of the dense space-varying convolution matrix. Importantly, by coding the transformation matrix, we not only reduce the memory required to store it; we also dramatically reduce the computation required to implement matrix-vector products. Our algorithm is able to reduce computation by approximately factoring the dense space-varying convolution operator into a product of sparse transforms. Experimental results show that our method can dramatically reduce the computation required for stray light reduction while maintaining high accuracy.
    IEEE Transactions on Image Processing 05/2014; 23(5):1965-79.
  • ABSTRACT: In this paper, we propose FeatureMatch, a generalised approximate nearest-neighbour field (ANNF) computation framework, between a source and target image. The proposed algorithm can estimate ANNF maps between any image pairs, not necessarily related. This generalisation is achieved through appropriate spatial-range transforms. To compute ANNF maps, global colour adaptation is applied as a range transform on the source image. Image patches from the pair of images are approximated using low-dimensional features, which are used along with a KD-tree to estimate the ANNF map. This ANNF map is further improved based on image coherency and spatial transforms. The proposed generalisation enables us to handle a wider range of vision applications, which have not been tackled using the ANNF framework. We illustrate two such applications, namely 1) optic disk detection and 2) super resolution. The first application deals with medical imaging, where we locate optic disks in retinal images using a healthy optic disk image as common target image. The second application deals with super resolution of synthetic images using a common source image as dictionary. We make use of ANNF mappings in both these applications and show experimentally that our proposed approaches are faster and accurate, compared with the state-of-the-art techniques.
    IEEE Transactions on Image Processing 05/2014; 23(5):2193-205.
  • ABSTRACT: In recent years, there has been growing interest in mapping visual features into compact binary codes for applications on large-scale image collections. Encoding high-dimensional data as compact binary codes reduces the memory cost for storage. Besides, it benefits the computational efficiency since similarity can be efficiently measured by Hamming distance. In this paper, we propose a novel flexible scale invariant feature transform (SIFT) binarization (FSB) algorithm for large-scale image search. The FSB algorithm explores the magnitude patterns of the SIFT descriptor. It is unsupervised and the generated binary codes are demonstrated to be dispreserving. Besides, we propose a new searching strategy to find target features based on the cross-indexing in the binary SIFT space and original SIFT space. We evaluate our approach on two publicly released data sets. The experiments on a large-scale partial-duplicate image retrieval system demonstrate the effectiveness and efficiency of the proposed algorithm.
    IEEE Transactions on Image Processing 05/2014; 23(5):2047-57.
  • ABSTRACT: In this paper, we present an efficient multiscale low-rank representation for image segmentation. Our method begins with partitioning the input images into a set of superpixels, followed by seeking the optimal superpixel-pair affinity matrix, both of which are performed at multiple scales of the input images. Since low-level superpixel features are usually corrupted by image noise, we propose to infer the low-rank refined affinity matrix. The inference is guided by two observations on natural images. First, looking into a single image, local small-size image patterns tend to recur frequently within the same semantic region, but may not appear in semantically different regions. The internal image statistics are referred to as replication prior, and we quantitatively justify it on real image databases. Second, the affinity matrices at different scales should be consistently solved, which leads to the cross-scale consistency constraint. We formulate these two purposes with one unified formulation and develop an efficient optimization procedure. The proposed representation can be used for both unsupervised and supervised image segmentation tasks. Our experiments on public data sets demonstrate the presented method can substantially improve segmentation accuracy.
    IEEE Transactions on Image Processing 05/2014; 23(5):2159-67.
  • ABSTRACT: This paper presents a method for learning overcomplete dictionaries of atoms composed of two modalities that describe a 3D scene: 1) image intensity and 2) scene depth. We propose a novel joint basis pursuit (JBP) algorithm that finds related sparse features in two modalities using conic programming and we integrate it into a two-step dictionary learning algorithm. The JBP differs from related convex algorithms because it finds joint sparsity models with different atoms and different coefficient values for intensity and depth. This is crucial for recovering generative models where the same sparse underlying causes (3D features) give rise to different signals (intensity and depth). We give a bound for recovery error of sparse coefficients obtained by JBP, and show numerically that JBP is superior to the group lasso algorithm. When applied to the Middlebury depth-intensity database, our learning algorithm converges to a set of related features, such as pairs of depth and intensity edges or image textures and depth slants. Finally, we show that JBP outperforms state-of-the-art methods on depth inpainting for time-of-flight and Microsoft Kinect 3D data.
    IEEE Transactions on Image Processing 05/2014; 23(5):2122-32.
  • ABSTRACT: Texture enhancement presents an ongoing challenge, in spite of the considerable progress made in recent years. Whereas most of the effort has been devoted so far to enhancement of regular textures, stochastic textures, which are encountered in most natural images, still pose an outstanding problem. The purpose of enhancement of stochastic textures is to recover details that were lost during the acquisition of the image. In this paper, a texture model, based on fractional Brownian motion (fBm), is proposed. The model is global and does not entail using image patches. The fBm is a self-similar stochastic process. Self-similarity is known to characterize a large class of natural textures. The fBm-based model is evaluated and a single-image regularized superresolution algorithm is derived. The proposed algorithm is useful for enhancement of a wide range of textures. Its performance is compared with single-image superresolution methods and its advantages are highlighted.
    IEEE Transactions on Image Processing 05/2014; 23(5):2096-108.

Related Journals