Article

Abstract and Figures

Saliency or salient region extraction from images is still a challenging field, as it requires some understanding of the image and its nature. A technique that is suitable for some applications is not necessarily useful in others; thus, saliency identification is application dependent. Based on a survey of existing methods of saliency detection, a new technique to extract the salient regions from an image is proposed that utilizes local features of the region surrounding each pixel. The level of saliency is decided based on the irregularity of the region compared with other regions. To make the process fully automatic, a new Fuzzy-based thresholding technique has also been developed. In addition, a survey of existing saliency evaluation techniques has been carried out and new evaluation methods are proposed. The proposed saliency extraction technique has been compared with other algorithms reported in the literature, and the results are discussed in detail.


... Referring to our previous work in this field, it has been proven that irregularity can be used as a reliable measure in extracting saliency [6], [9], which we will adopt in this study. In this paper we shall use the principles of irregularity, both globally and locally, to develop a new robust algorithm for saliency extraction. ...
... FM is the F-measure, computed with a weighting factor which was selected to be 0.3 in ref. [26]. Also, we shall use the exclusive OR measure (XOR) that was proposed in [6] and [9] and is calculated as given in Eq. (4). ...
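The two evaluation measures referred to in this snippet can be sketched as follows. The weighting factor of 0.3 is applied in the standard weighted-F-measure form common in the saliency literature; the exact form of the XOR measure in Eq. (4) is not reproduced on this page, so the normalisation used here is an assumption:

```python
import numpy as np

def f_measure(pred, gt, beta2=0.3):
    """Weighted F-measure between a binary saliency mask and ground truth.

    beta2 is the weighting factor (0.3, as in the cited work), applied in
    the usual form (1 + b) * P * R / (b * P + R)."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    precision = tp / max(pred.sum(), 1)
    recall = tp / max(gt.sum(), 1)
    denom = beta2 * precision + recall
    return (1 + beta2) * precision * recall / denom if denom > 0 else 0.0

def xor_error(pred, gt):
    """XOR-based error: fraction of pixels where mask and ground truth
    disagree (0 = perfect agreement).  The exact form of Eq. (4) may
    differ; this is one plausible normalisation."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    return np.logical_xor(pred, gt).mean()
```

A perfect mask gives an F-measure of 1.0 and an XOR error of 0.0, so the two measures move in opposite directions as the mask degrades.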
... Two statistical measures have been used for this purpose: the expected value and the deviation. Since in regular regions the expected value μij is very close to the pixel values and the standard deviation σij is small, the irregularity measure has been derived from these two measures as shown in Eq. (9) [6,9]: ...
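The statistical ingredients named here (local expected value μij and standard deviation σij) can be computed with a moving-window filter. Eq. (9) itself is not reproduced on this page, so the way the two deviations are combined below is an illustrative assumption, not the authors' exact formula:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def irregularity_map(img, size=7):
    """Per-pixel irregularity from local statistics.

    mu is the local expected value and sigma the local standard deviation
    of a size x size neighbourhood.  In regular regions mu is close to the
    pixel value and sigma is small, so a simple (assumed) sum of the two
    deviations serves here as the irregularity measure."""
    img = img.astype(float)
    mu = uniform_filter(img, size)
    var = uniform_filter(img ** 2, size) - mu ** 2
    sigma = np.sqrt(np.clip(var, 0.0, None))  # clip guards tiny negatives
    return np.abs(img - mu) + sigma
```

On a perfectly flat region the map is zero everywhere; an isolated bright pixel produces the strongest response at that pixel, which matches the intuition stated in the snippet.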
Article
Full-text available
Saliency extraction is a technique inspired by the human approach of processing only a selected portion of the visual information received. This feature of the human visual system helps reduce the processing the brain needs to extract important information and neglect general and unimportant information. This paper presents a novel approach to identifying the saliency of regions in a scene, from which objects likely to be salient can be extracted. The proposed approach uses two stages, namely local saliency identification (LSI) and global saliency identification (GSI), and uses irregularity as the saliency measure in both stages. Local saliency uses the structure of the object to determine saliency, while global saliency identifies the saliency of the region based on its contrast in relation to the entire background. An object is considered salient if it satisfies both local and global criteria. In this work, the key challenges and limitations of existing methods, such as sensitivity to texture and noise, the need to manually define certain parameters, and the need for pre-knowledge of the nature of the image, were considered and appropriate solutions have been suggested. The proposed algorithm was tested on a set of 1000 images selected from the MSRA saliency identification standard dataset and benchmarked against state-of-the-art approaches. The results obtained showed very good efficiency, as is evident from the values obtained with the evaluation method used; e.g., the value of the F-measure reached 96.5 per cent in some cases. The limitation of the approach was with complex objects which themselves comprise more than one important region, such as an image of a person. This is discussed thoroughly in the results section.
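The two-stage idea described in this abstract, keeping only objects that are salient both locally and globally, can be illustrated with a toy sketch. The deviation measures and fixed thresholds below are placeholders for illustration, not the paper's LSI/GSI formulations:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def two_stage_saliency(img, size=9, local_t=10.0, global_t=20.0):
    """Toy sketch of the two-stage criterion: a pixel is kept only if it
    is locally irregular (deviates from its neighbourhood mean) AND
    globally irregular (deviates from the whole-image mean).  Thresholds
    and measures are illustrative, not the paper's."""
    img = img.astype(float)
    local_dev = np.abs(img - uniform_filter(img, size))   # LSI-like term
    global_dev = np.abs(img - img.mean())                 # GSI-like term
    return (local_dev > local_t) & (global_dev > global_t)
```

A small bright patch on a dark background passes both tests, while the background fails the global test, so only the patch survives the conjunction.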
... The strengths and weaknesses of the mentioned methods are beyond the scope of this paper; a sufficient discussion is found in [26] and [27], which also include a new method of extracting saliency based on the irregularity of the intensity of the region. In this method, some statistical measures are used to measure the irregularity of the region. ...
... Regions with high irregularity measure are considered to be salient regions. In this research, we shall adopt Irregularity-Based Saliency given in [26] and [27] as a measure to extract the salient regions of an image. ...
Article
Full-text available
Due to the importance of searching for an image in a database in various applications, many algorithms have been proposed to identify the contents of the image. Algorithms that identify the content of the image as a whole can offer good results in some applications and fail to produce satisfactory results in other applications. Therefore, searching for an object inside the image was used to overcome the limitations of identifying the image as a whole. Hence, studies focused on segmenting the image into small sub-images and identified their contents. In this paper, we introduce a new algorithm inspired by human attention and utilises the saliency principles to identify the contents of an image and search for similar objects in the images stored in a database. We also demonstrate that the use of salient objects produces better and more accurate results in the image retrieval process. A new retrieval algorithm is therefore presented here, focused on identifying the objects extracted from the salient regions. To assess the efficiency of the proposed algorithm, a new evaluation method is also proposed which considers the order of the retrieved image in assessing the efficiency of the algorithm.
... Salient object detection aims to identify the spatial locations and scales of the most attention-grabbing object in a given image [1,2,4,8,33,41]. It is an important problem in the computer vision community and has been shown to be quite helpful for various vision tasks, such as object recognition [29], adaptive image display [11,32], and content-aware image editing. (Fig. 1: Illustration of our proposed two-stage saliency process.) ...
... Chen et al. [12] simultaneously evaluate global contrast differences and spatial coherence. Al-Azawi et al. [4] extract the salient regions utilizing local features of the region surrounding each pixel. Perazzi et al. [38] combine global and local contrast estimation via high-dimensional Gaussian filters. ...
Article
Full-text available
Salient object detection aims to identify both spatial locations and scales of the salient object in an image. However, previous saliency detection methods generally fail in detecting the whole objects, especially when the salient objects are actually composed of heterogeneous parts. In this work, we propose a saliency bias and diffusion method to effectively detect the complete spatial support of salient objects. We first introduce a novel saliency-aware feature to bias the objectness detection for saliency detection on a given image and incorporate the saliency clues explicitly in refining the saliency map. Then, we propose a saliency diffusion method to fuse the saliency confidences of different parts from the same object for discovering the whole salient object, which uses the learned visual similarities among object regions to propagate the saliency values across them. Benefiting from such bias and diffusion strategy, the performance of salient object detection is significantly improved, as shown in the comprehensive experimental evaluations on four benchmark data sets, including MSRA-1000, SOD, SED, and THUS-10000.
... In these studies, it was argued that objects that are larger in size, located at the centre of the image, or with high intensity or colour contrast can be assigned higher importance than others [8]. From our previous studies, it was found that irregularity and rarity can attract human attention and, therefore, irregular regions can be considered regions of interest [9], [10]. ...
Article
Full-text available
Developing systems inspired by human intelligence has been a research interest for many decades. Human vision is one of the fields researchers have tried to simulate and imitate in order to develop powerful machine vision systems. This research focuses on the advantages of extracting regions of interest from a scene in reducing processing power and storage requirements. It explains how saliency extraction is inspired by human visual attention (HVA) principles and how computational saliency extraction can be used to simulate these principles. The research includes a brief explanation of HVA and salient point extraction, in addition to the relationship between them. Empirical experiments and results are presented as well, to support the discussion and the aim of this research.
... When the observer's attention is stimulated by a location in the scene, he orients his vision toward that location; his brain then filters out the unnecessary information and searches for the information it deems of interest. Irregularity in colours, lighting, shape, size, or texture can stimulate the observer's attention [16], [17]. For example, in the image shown in Fig. 1, it is clear that the observer has focussed on the centre of the image for two reasons: firstly, the observer mostly starts observation from the centre of the scene, and secondly, the colour of the object in the centre is irregular compared with the surrounding regions. ...
Article
Full-text available
Eye tracking is one of the important technologies used to identify an individual's interest by recording and analysing his/her eye movements. The attention and interest identified can then be used in various applications such as marketing and education. In this research, we utilise this technology in a consumer behaviour application, namely retail item display and shelf organisation. The gaze-point data obtained from eye trackers are analysed, and consumer interest is discussed based on the analysis. The analyses show that human attention can be attracted by adding some irregularity in colour, shape, and size to the scene.
... Parts of this section were published in Multimedia Applications and Tools Journal, 2014 [178] and in IEEE ICCIC 2013 [179]. ...
Thesis
This research introduces an image retrieval system which is, in different ways, inspired by the human vision system. The main problems with existing machine vision systems and image understanding are studied and identified, in order to design a system that relies on human image understanding. The main improvement of the developed system is that it uses human attention principles in the process of image content identification. Human attention is represented by saliency extraction algorithms, which extract the salient regions or, in other words, the regions of interest. This work presents a new approach to saliency identification which relies on the irregularity of the region. Irregularity is clearly defined and measuring tools are developed. These measures are derived from the formality and variation of the region with respect to the surrounding regions. Both local and global saliency have been studied, and appropriate algorithms were developed based on the local and global irregularity defined in this work. The need for suitable automatic clustering techniques motivated us to study the available clustering techniques and to develop a technique that is suitable for salient point clustering. Based on the fact that humans usually look at the region surrounding the gaze point, an agglomerative clustering technique is developed utilising the principles of blob extraction and intersection. Automatic thresholding was needed at different stages of the system development; therefore, a Fuzzy thresholding technique was developed. Evaluation methods for saliency region extraction have been studied and analysed; subsequently, we have developed evaluation techniques based on the extracted regions (or points) and compared them with the ground truth data. The proposed algorithms were tested against standard datasets and compared with the existing state-of-the-art algorithms.
Both quantitative and qualitative benchmarking are presented in this thesis, and a detailed discussion of the results has been included. The benchmarking showed promising results for the different algorithms. The developed algorithms have been utilised in designing an integrated saliency-based image retrieval system which uses the salient regions to give a description of the scene. The system auto-labels the objects in the image by identifying the salient objects and assigns labels based on the knowledge database contents. In addition, the system identifies the unimportant part of the image (the background) to give a full description of the scene.
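The thesis's own Fuzzy thresholding technique is not reproduced here, but the general idea of fuzzy automatic thresholding can be illustrated with a sketch of the classic Huang-and-Wang-style approach (membership by distance to class mean, threshold chosen by minimising fuzzy entropy). This is an assumption-level stand-in, not the thesis's algorithm:

```python
import numpy as np

def fuzzy_threshold(img, levels=256):
    """Huang-and-Wang-style fuzzy thresholding sketch (NOT the thesis's
    own technique): for each candidate threshold, pixels receive a
    membership based on distance to their class mean, and the threshold
    minimising total fuzzy entropy is returned."""
    hist, _ = np.histogram(img, bins=levels, range=(0, levels))
    g = np.arange(levels, dtype=float)
    c = levels - 1                       # constant controlling membership spread
    best_t, best_e = 1, np.inf
    for t in range(1, levels - 1):
        w0, w1 = hist[:t + 1].sum(), hist[t + 1:].sum()
        if w0 == 0 or w1 == 0:
            continue                     # degenerate split, skip
        mu0 = (hist[:t + 1] * g[:t + 1]).sum() / w0
        mu1 = (hist[t + 1:] * g[t + 1:]).sum() / w1
        mu = np.where(g <= t, mu0, mu1)
        m = 1.0 / (1.0 + np.abs(g - mu) / c)           # membership in [0.5, 1]
        m = np.clip(m, 1e-12, 1.0 - 1e-12)             # keep logs finite
        # Shannon fuzzy entropy per grey level, weighted by the histogram
        s = -(m * np.log(m) + (1.0 - m) * np.log(1.0 - m))
        e = (hist * s).sum()
        if e < best_e:
            best_e, best_t = e, t
    return best_t
```

For a cleanly bimodal histogram the fuzzy entropy is minimal for any threshold between the two modes, so the returned value always separates them.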
Presentation
Keynote presentation at the conference.
Conference Paper
Full-text available
Abstract— This research presents an approach to the application of salient region extraction in identifying the contents of an image, referred to as the Saliency-Based Image Retrieval System (SBIR). A discussion of content-based image retrieval (CBIR), image identification, and the evaluation of image retrieval systems is also presented in this paper. Mathematical representation, derivation, and analysis of the colour histogram are presented and discussed briefly. Three main types of image retrieval and recognition are listed and discussed based on the area of the image used in the recognition process: the entire image, the region containing the object, or the object only. To compare the results obtained from SBIR with those obtained from CBIR, we discuss image retrieval evaluation measures such as precision and recall.
Conference Paper
Full-text available
Abstract—This research focuses on the use of human visual attention (HVA) principles to improve machine vision processing by understanding these principles and simulating them in salient region extraction. It includes a brief explanation of HVA and salient point extraction and a comparison between them. It also reviews some state-of-the-art approaches and research on representing HVA computationally. A comparison between the results obtained using these approaches and the experiments conducted with eye trackers is presented and discussed in this research as well.
Article
To extract and combine the features of the original images, a novel algorithm based on visual salient features and cross-contrast is proposed in this paper. The original images were decomposed into low-frequency subband coefficients and bandpass direction subband coefficients using the nonsubsampled contourlet transform. Three visual-salient-feature maps are constructed based on the local energy, the contrast, and the gradient, respectively, and the fused low-frequency subband coefficients are obtained by utilizing these visual saliency maps. The cross-contrast is obtained by computing the ratio between the local grey mean of the bandpass direction subband coefficients and the local grey mean of the fused low-frequency subband coefficients. The bandpass direction subband coefficients are then obtained via the cross-contrast. Comparison experiments have been performed on different image sets, and the experimental results demonstrate that the proposed method performs better in both subjective and objective quality.
Article
Full-text available
We address the issue of visual saliency from three perspectives. First, we consider saliency detection as a frequency domain analysis problem. Second, we achieve this by employing the concept of nonsaliency. Third, we simultaneously consider the detection of salient regions of different size. The paper proposes a new bottom-up paradigm for detecting visual saliency, characterized by a scale-space analysis of the amplitude spectrum of natural images. We show that the convolution of the image amplitude spectrum with a low-pass Gaussian kernel of an appropriate scale is equivalent to an image saliency detector. The saliency map is obtained by reconstructing the 2D signal using the original phase and the amplitude spectrum, filtered at a scale selected by minimizing saliency map entropy. A Hypercomplex Fourier Transform performs the analysis in the frequency domain. Using available databases, we demonstrate experimentally that the proposed model can predict human fixation data. We also introduce a new image database and use it to show that the saliency detector can highlight both small and large salient regions, as well as inhibit repeated distractors in cluttered images. In addition, we show that it is able to predict salient regions on which people focus their attention.
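The core recipe of this abstract, smoothing the amplitude spectrum at several Gaussian scales, reconstructing with the original phase, and selecting the scale whose saliency map has minimal entropy, can be sketched for a single grayscale channel (the paper itself uses a Hypercomplex Fourier Transform on colour images, which is not reproduced here):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def scale_space_saliency(img, scales=(1, 2, 4, 8)):
    """Single-channel sketch of the amplitude-spectrum scale-space idea:
    smooth the amplitude spectrum at each Gaussian scale, reconstruct the
    image with the original phase, and keep the map with minimal entropy."""
    f = np.fft.fft2(img.astype(float))
    amp, phase = np.abs(f), np.angle(f)
    best, best_h = None, np.inf
    for s in scales:
        smooth_amp = gaussian_filter(amp, s)
        sal = np.abs(np.fft.ifft2(smooth_amp * np.exp(1j * phase))) ** 2
        sal = gaussian_filter(sal, 2)                 # post-smoothing
        p, _ = np.histogram(sal, bins=64)
        p = p / p.sum()
        h = -(p[p > 0] * np.log(p[p > 0])).sum()      # map entropy
        if h < best_h:
            best_h, best = h, sal
    return best / (best.max() + 1e-12)
```

Smoothing the amplitude spectrum suppresses the spectral spikes contributed by large regular regions, so an isolated object on a flat background dominates the reconstructed map.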
Conference Paper
Full-text available
In this paper, we propose an efficient saliency model using regional color and spatial information. The original image is first segmented into a set of regions using a superpixel segmentation algorithm. For each region, its color saliency is evaluated based on the color similarity measures with other regions, and its spatial saliency is evaluated based on its color distribution and spatial position. The final saliency map is generated by combining color saliency measures and spatial saliency measures of regions. Experimental results on a public dataset containing 1000 images demonstrate that our computationally efficient saliency model outperforms the other six state-of-the-art models on saliency detection performance.
Conference Paper
Full-text available
In computer vision applications it is necessary to extract the regions of interest in order to reduce the search space and to improve image content identification. Human-oriented regions of interest can be extracted by collecting some feedback from the user, usually provided by ranking the identified regions in the image; this rank is then used to adapt the identification process. Nowadays eye-tracking technology is widely used in different applications; one suggested application uses the data collected from the eye-tracking device, which represents the user's gaze points, to extract the regions of interest. In this paper we introduce a new agglomerative clustering algorithm which uses a blob-extraction technique and statistical measures to cluster the gaze points obtained from the eye tracker. The algorithm is fully automatic, meaning it does not need any human intervention to specify the stopping criterion. In the suggested algorithm the points are replaced with small regions (blobs), and these blobs are grouped together to form a cloud from which the interesting regions are constructed.
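The blob-intersection idea described above can be sketched compactly: each gaze point is replaced by a disc, and overlapping discs merge into one connected "cloud", so no explicit stopping criterion is needed. The disc radius below is an illustrative parameter, not one from the paper:

```python
import numpy as np
from scipy.ndimage import label

def cluster_gaze_points(points, shape, radius=5):
    """Blob-intersection clustering sketch: each (y, x) gaze point
    becomes a disc of the given radius; overlapping discs merge into one
    connected component.  Returns an integer label per input point
    (points in the same cloud share a label)."""
    canvas = np.zeros(shape, dtype=bool)
    yy, xx = np.ogrid[:shape[0], :shape[1]]
    for y, x in points:
        canvas |= (yy - y) ** 2 + (xx - x) ** 2 <= radius ** 2
    labels, _ = label(canvas)    # default 4-connectivity is fine for discs
    return [labels[y, x] for y, x in points]
```

Two fixations closer than twice the radius fall into the same cloud, while a distant fixation starts a new one.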
Conference Paper
Full-text available
Saliency or salient region extraction from images is still a challenging field, since it needs some understanding of the image and its nature. A technique that is suitable in some applications is not necessarily useful in other applications; thus, saliency enhancement is application oriented. In this paper, a new technique for extracting the salient regions from an image is proposed which utilizes the local features of the region surrounding each pixel. The level of saliency is then decided based on a global comparison of the saliency-enhanced image. To make the process fully automatic, a new Fuzzy-based thresholding technique has also been proposed. The paper contains a survey of the state-of-the-art methods of saliency evaluation, and a new saliency evaluation technique is proposed.
Article
Full-text available
In this paper an efficient method to learn and recognize objects from unlabeled natural scenes using patch-based object representation is proposed. In the domain of object recognition, it is often the case that images have to be classified based on objects which make up only a very limited part of the image. Hence, patches (local features) are used to describe properties of certain regions of an image. The proposed algorithm directly matches the parts distributed in a reference image that contains the object to those extracted in the test image, and hence reports better matching. The experimental evaluation of the proposed method is done using the well-known Caltech database.
Article
Full-text available
Content-based image retrieval (CBIR) has become one of the most active research areas in the past few years. Most of the attention from the research community has been focused on indexing techniques based on global feature distributions. However, these global distributions have limited discriminating power because they are unable to capture local image information. The use of interest points in content-based image retrieval allows the image index to represent local properties of the image. Classic corner detectors can be used for this purpose. However, they have drawbacks when applied to various natural images for image retrieval, because visual features need not be corners and corners may gather in small regions. In this paper, we present a salient point detector. The detector is based on the wavelet transform, detecting global variations as well as local ones. The wavelet-based salient points are evaluated for image retrieval with a retrieval system using color and texture features. The results show that salient points with Gabor features perform better than the other point detectors from the literature and than randomly chosen points. Significant improvements are achieved in terms of retrieval accuracy and computational complexity when compared to the global feature approaches.
Article
Full-text available
Attention and memory are very closely related, and their aim is to simplify the acquired data into an intelligent structured data set. Two main points are discussed in this paper. The first is the presentation of a novel visual attention model for still images which includes both a bottom-up and a top-down approach. The bottom-up model is based on the rarity of structures within the image during the forgetting process. The top-down information uses mouse-tracking experiments to build models of global behavior for a given kind of image. The second interesting point is that the relative importance of bottom-up and top-down information depends on the specificity of each image. For the three different sets of images within the database, the importance of the top-down information is different. The proposed models' assessment is carried out on a 91-image database.
Conference Paper
Full-text available
An important step in content-based image retrieval is finding an interesting object within an image. We propose a method for extracting an interesting object from a complex background. Interesting objects are generally located near the center of the image and contain regions with significant color distribution. The significant color is the color that co-occurs more frequently near the center of the image than in the background. A core object region is selected as a region in which many pixels have the significant color, and it is then grown by iteratively merging its neighbor regions and ignoring background regions. The final merging result, called a central object, may include differently color-characterized regions and/or two or more connected objects of interest. The central objects automatically extracted with our method matched well with significant objects chosen manually.
Conference Paper
Full-text available
Detecting moving objects against dynamic backgrounds remains a challenge in computer vision and robotics. This paper presents a surprisingly simple algorithm to detect objects in such conditions. Based on theoretical analysis, we show that 1) the displacement of the foreground and the background can be represented by the phase change of the Fourier spectra, and 2) the motion of background objects can be extracted by Phase Discrepancy in an efficient and robust way. The algorithm does not rely on prior training on particular features or categories of an image and can be implemented in 9 lines of MATLAB code. In addition to the algorithm, we provide a new database for moving object detection with 20 video clips, 11 subjects and 4785 bounding boxes to be used as a public benchmark for algorithm evaluation.
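The key Fourier property behind this abstract is that a global shift of the whole scene changes only the phase of the spectrum, never the amplitude, so amplitude differences isolate content that moved differently from the background. The sketch below follows one plausible reading of the paper's short recipe (amplitude difference recombined with each frame's phase); it is an illustration, not the authors' exact code:

```python
import numpy as np

def phase_discrepancy(frame1, frame2):
    """Fourier phase-discrepancy sketch: a pure global translation leaves
    the amplitude spectrum unchanged, so the amplitude difference
    highlights independently moving content.  The amplitude difference is
    recombined with each frame's phase and the two magnitudes summed."""
    f1 = np.fft.fft2(frame1.astype(float))
    f2 = np.fft.fft2(frame2.astype(float))
    a1, a2 = np.abs(f1), np.abs(f2)
    p1, p2 = np.angle(f1), np.angle(f2)
    return (np.abs(np.fft.ifft2((a1 - a2) * np.exp(1j * p1))) +
            np.abs(np.fft.ifft2((a2 - a1) * np.exp(1j * p2))))
```

A circularly shifted copy of a frame yields an exactly zero map (amplitudes are identical), while an object that moves independently of the background produces a nonzero response.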
Article
Full-text available
This paper introduces a new computational visual-attention model for static and dynamic saliency maps. First, we use the Earth Mover's Distance (EMD) to measure the center-surround difference in the receptive field, instead of using the Difference-of-Gaussian filter that is widely used in many previous visual-attention models. Second, we propose to take two steps of biologically-inspired nonlinear operations for combining different features: combining subsets of basic features into a set of super features using the L^m-norm and then combining the super features using the Winner-Take-All mechanism. Third, we extend the proposed model to construct dynamic saliency maps from videos, by using EMD for computing the center-surround difference in the spatio-temporal receptive field. We evaluate the performance of the proposed model on both static image data and video data. Comparison results show that the proposed model outperforms several existing models under a unified evaluation setting.
Article
Full-text available
Saliency detection plays important roles in many image processing applications, such as regions-of-interest extraction and image resizing. Existing saliency detection models are built in the uncompressed domain. Since most images on the Internet are typically stored in a compressed domain such as joint photographic experts group (JPEG), we propose a novel saliency detection model in the compressed domain in this paper. The intensity, color, and texture features of the image are extracted from discrete cosine transform (DCT) coefficients in the JPEG bit-stream. The saliency value of each DCT block is obtained based on the Hausdorff distance calculation and feature map fusion. Based on the proposed saliency detection model, we further design an adaptive image retargeting algorithm in the compressed domain. The proposed image retargeting algorithm utilizes a multi-operator scheme comprising block-based seam carving and image scaling to resize images. A new definition of texture homogeneity is given to determine the number of block-based seams to remove. Thanks to the accurate saliency information derived directly from the compressed domain, the proposed image retargeting algorithm effectively preserves the visually important regions of images, efficiently removes the less crucial regions, and therefore significantly outperforms the relevant state-of-the-art algorithms, as demonstrated by the in-depth analysis in the extensive experiments.
Conference Paper
Full-text available
Local image descriptors computed in areas around salient points in images are essential for many algorithms in computer vision. Recent work suggests using as many salient points as possible. While sophisticated classifiers have been proposed to cope with the resulting large number of descriptors, processing this large amount of data is computationally costly. In this paper, computational methods are proposed to compute salient points designed to allow a reduction in the number of salient points while maintaining state of the art performance in image retrieval and object recognition applications. To obtain a more sparse description, a color salient point and scale determination framework is proposed operating on color spaces that have useful perceptual and saliency properties. This allows for the necessary discriminative points to be located, allowing a significant reduction in the number of salient points and obtaining an invariant (repeatability) and discriminative (distinctiveness) image description. Experimental results on large image datasets show that the proposed method obtains state of the art results with the number of salient points reduced by half. This reduction in the number of points allows subsequent operations, such as feature extraction and clustering, to run more efficiently. It is shown that the method provides less ambiguous features, a more compact description of visual data, and therefore a faster classification of visual data.
Article
Full-text available
A simple method for detecting salient regions in images is proposed. It requires only edge detection, threshold decomposition, the distance transform, and thresholding. Moreover, it avoids the need for setting any parameter values. Experiments show that the resulting regions are relatively coarse, but overall the method is surprisingly effective, and has the benefit of easy implementation. Quantitative tests were carried out on Liu et al.'s dataset of 5000 images. Although the ratings of our simple method were not as good as their approach which involved an extensive training stage, they were comparable to several other popular methods from the literature. Further tests on Kootstra and Schomaker's dataset of 99 images also showed promising results.
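The four ingredients named in this abstract (edge detection, threshold decomposition, the distance transform, and thresholding) can be combined in a short sketch. The slice count and edge operator below are illustrative choices, not necessarily those of the cited method:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt, sobel

def simple_saliency(img, n_levels=8):
    """Sketch of the edge / threshold-decomposition / distance-transform
    pipeline: decompose the image into binary slices, find each slice's
    edges, accumulate the distance transform of each edge map, and invert
    so that edge-dense (salient) areas score highest."""
    img = img.astype(float)
    acc = np.zeros_like(img)
    levels = np.linspace(img.min(), img.max(), n_levels + 2)[1:-1]
    for t in levels:
        sl = (img >= t).astype(float)              # threshold decomposition
        edges = np.hypot(sobel(sl, 0), sobel(sl, 1)) > 0
        if edges.any():
            acc += distance_transform_edt(~edges)  # distance from edges
    sal = acc.max() - acc                          # invert: near edges = salient
    return sal / (sal.max() + 1e-12)
```

Pixels on or near object boundaries accumulate small distances across all slices and therefore end up with the highest saliency, while empty corners of the image score lowest.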
Article
Full-text available
The common problem in content-based image retrieval (CBIR) is the selection of features. Image characterization with a smaller number of features, involving lower computational cost, is always desirable. Edges are a strong feature for characterizing an image. This paper presents a robust technique for extracting the edge map of an image, followed by the computation of global features (such as fuzzy compactness) using grey-level as well as shape information of the edge map. Unlike other existing techniques, it does not require pre-segmentation for the computation of features. The algorithm is also computationally attractive, as it computes different features with a limited number of selected pixels.
Conference Paper
Full-text available
A new bottom-up visual saliency model, Graph-Based Visual Saliency (GBVS), is proposed. It consists of two steps: first forming activation maps on certain feature channels, and then normalizing them in a way which highlights conspicuity and admits combination with other maps. The model is simple, and biologically plausible insofar as it is naturally parallelized. This model powerfully predicts human fixations on 749 variations of 108 natural images, achieving 98% of the ROC area of a human-based control, whereas the classical algorithms of Itti & Koch ([2], [3], [4]) achieve only 84%.
Conference Paper
Full-text available
This paper presents a computational framework for saliency maps. It employs the Earth Mover's Distance based on a weighted histogram (EMD-wH) to measure the center-surround difference, instead of the Difference-of-Gaussian (DoG) filter used by traditional models. In addition, the model employs not only traditional features such as color, intensity, and orientation but also the local entropy, which expresses the local complexity. The major advantage of combining the local entropy map is that it can detect salient regions which are not complex regions. Also, it uses a general framework to integrate the feature dimensions instead of summing the features directly. This model considers both local and global salient information, in contrast to existing models that consider only one or the other. Furthermore, the "large scale bias" and "central bias" hypotheses are used in this model to select the fixation locations in the saliency maps at different scales. The performance of this model is assessed by comparing its saliency maps with human fixation density. The results from this model are finally compared to those from other bottom-up models for reference.
Conference Paper
Full-text available
Attention and memory are very closely related, and their aim is to simplify the acquired data into an intelligently structured data set. Two main points are discussed in this paper. The first is the presentation of a novel visual attention model for still images which includes both a bottom-up and a top-down approach. The bottom-up model is based on the rarity of structures within the image during the forgetting process. The top-down information uses mouse-tracking experiments to build models of global behavior for a given kind of image. The proposed models are assessed on a 91-image database. The second interesting point is that the relative importance of bottom-up and top-down attention depends on the specificity of each image. In unknown images the bottom-up influence remains very important, while in specific kinds of images (like web sites) top-down attention brings the major information.
Conference Paper
Full-text available
The ability of the human visual system to detect visual saliency is extraordinarily fast and reliable. However, computational modeling of this basic intelligent behavior still remains a challenge. This paper presents a simple method for visual saliency detection. Our model is independent of features, categories, or other forms of prior knowledge of the objects. By analyzing the log-spectrum of an input image, we extract the spectral residual of the image in the spectral domain, and propose a fast method to construct the corresponding saliency map in the spatial domain. We test this model on both natural pictures and artificial images such as psychological patterns. The results indicate fast and robust saliency detection by our method.
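The spectral residual pipeline described in this abstract can be sketched in a few lines of Python. This is a minimal reconstruction, not the authors' code: the 3x3 averaging window follows the paper's general description, while the final blur width and the normalisation step are assumptions.

```python
import numpy as np
from scipy.ndimage import uniform_filter, gaussian_filter

def spectral_residual_saliency(gray, avg_size=3, blur_sigma=2.5):
    """Saliency map from the spectral residual of the log-amplitude spectrum."""
    f = np.fft.fft2(gray.astype(float))
    log_amp = np.log(np.abs(f) + 1e-8)       # log-amplitude spectrum
    phase = np.angle(f)                       # phase is kept unchanged
    # Spectral residual = log spectrum minus its local average
    residual = log_amp - uniform_filter(log_amp, size=avg_size)
    # Back to the spatial domain; squaring and smoothing form compact blobs
    sal = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    sal = gaussian_filter(sal, sigma=blur_sigma)
    return (sal - sal.min()) / (sal.max() - sal.min() + 1e-12)
```

Because only one forward and one inverse FFT are needed, the method runs in near real time even on modest hardware, which matches the speed claim above.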
Conference Paper
Full-text available
A representation for observing local image content is proposed for the purpose of considering the distinguishing characteristics of visual content that tends to draw a human observer's gaze. Within this representation, the spectral profile distinguishing fixated from non-fixated locations is considered. Finally, the possibility of designing saliency operators based on the proposed local magnitude spectrum representation is explored, revealing a promising domain for predicting human gaze patterns.
Article
This paper presents a computational framework for saliency maps. It employs the Earth Mover's Distance based on weighted-Histogram (EMD-wH) to measure the center-surround difference, instead of the Difference-of-Gaussian (DoG) filter used by traditional models. In addition, the model employs not only the traditional features such as colors, intensity and orientation but also the local entropy which expresses the local complexity. The major advantage of combining the local entropy map is that it can detect the salient regions which are not complex regions. Also, it uses a general framework to integrate the feature dimensions instead of summing the features directly. This model considers both local and global salient information, in contrast to the existing models that consider only one or the other. Furthermore, the "large scale bias" and "central bias" hypotheses are used in this model to select the fixation locations in the saliency map of different scales. The performance of this model is assessed by comparing their saliency maps and human fixation density. The results from this model are finally compared to those from other bottom-up models for reference.
Book
This richly illustrated book describes the use of interactive and dynamic graphics as part of multidimensional data analysis. Chapters include clustering, supervised classification, and working with missing values. A variety of plots and interaction methods are used in each analysis, often starting with brushing linked low-dimensional views and working up to manual manipulation of tours of several variables. The role of graphical methods is shown at each step of the analysis, not only in the early exploratory phase, but in the later stages, too, when comparing and evaluating models. All examples are based on freely available software: GGobi for interactive graphics and R for static graphics, modeling, and programming. The printed book is augmented by a wealth of material on the web, encouraging readers to follow the examples themselves. The web site has all the data and code necessary to reproduce the analyses in the book, along with movies demonstrating the examples. The book may be used as a text in a class on statistical graphics or exploratory data analysis, for example, or as a guide for the independent learner. Each chapter ends with a set of exercises. The authors are both Fellows of the American Statistical Association, past chairs of the Section on Statistical Graphics, and co-authors of the GGobi software. Dianne Cook is Professor of Statistics at Iowa State University. Deborah Swayne is a member of the Statistics Research Department at AT&T Labs.
Article
Detection of visually salient image regions is useful for applications like object segmentation, adaptive compression, and object recognition. Recently, full-resolution saliency maps that retain well-defined boundaries have attracted attention. In these maps, boundaries are preserved by retaining substantially more frequency content from the original image than older techniques. However, if the salient regions comprise more than half the pixels of the image, or if the background is complex, the background gets highlighted instead of the salient object. In this paper, we introduce a method for salient region detection that retains the advantages of such saliency maps while overcoming their shortcomings. Our method exploits features of color and luminance, is simple to implement and is computationally efficient. We compare our algorithm to six state-of-the-art salient region detection methods using publicly available ground truth. Our method outperforms the six algorithms by achieving both higher precision and better recall. We also show application of our saliency maps in an automatic salient object segmentation scheme using graph-cuts.
Article
Detection of visually salient image regions is useful for applications like object segmentation, adaptive compression, and object recognition. In this paper, we introduce a method for salient region detection that outputs full resolution saliency maps with well-defined boundaries of salient objects. These boundaries are preserved by retaining substantially more frequency content from the original image than other existing techniques. Our method exploits features of color and luminance, is simple to implement, and is computationally efficient. We compare our algorithm to five state-of-the-art salient region detection methods with a frequency domain analysis, ground truth, and a salient object segmentation application. Our method outperforms the five algorithms both on the ground-truth evaluation and on the segmentation task by achieving both higher precision and better recall.
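The frequency-tuned idea summarised above, scoring each pixel by its colour distance from the mean image colour after a slight blur, can be sketched as follows. This is a simplified illustration: the paper operates in the CIE Lab colour space, whereas this sketch stays in RGB to avoid a colour-conversion dependency, and the blur width here is an assumption.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def frequency_tuned_saliency(img):
    """Per-pixel distance between the mean image colour and a blurred image.

    `img` is an HxWxC float array; the paper uses CIE Lab, RGB is a
    stand-in here to keep the sketch dependency-free.
    """
    img = img.astype(float)
    mean_color = img.reshape(-1, img.shape[2]).mean(axis=0)   # global mean colour
    # Small Gaussian blur suppresses high-frequency noise and texture detail
    blurred = np.stack([gaussian_filter(img[..., c], sigma=1.0)
                        for c in range(img.shape[2])], axis=-1)
    sal = np.linalg.norm(blurred - mean_color, axis=-1)       # Euclidean distance
    return (sal - sal.min()) / (sal.max() - sal.min() + 1e-12)
```

Because the map is computed at full resolution, object boundaries stay sharp, which is what makes this family of methods attractive for segmentation.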
Article
Many computational models of visual attention have been created from a wide variety of different approaches to predict where people look in images. Each model is usually introduced by demonstrating performances on new images, and it is hard to make immediate comparisons between models. To alleviate this problem, we propose a benchmark data set containing 300 natural images with eye tracking data from 39 observers to compare model performances. We calculate the performance of 10 models at predicting ground truth fixations using three different metrics. We provide a way for people to submit new models for evaluation online. We find that the Judd et al. and Graph-based visual saliency models perform best. In general, models with blurrier maps and models that include a center bias perform well. We add and optimize a blur and center bias for each model and show improvements. We compare performances to baseline models of chance, center and human performance. We show that human performance increases with the number of humans to a limit. We analyze the similarity of different models using multidimensional scaling and explore the relationship between model performance and fixation consistency. Finally, we offer observations about how to improve saliency models in the future.
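The fixation-prediction metrics used in such benchmarks are typically ROC-based: saliency values at fixated pixels are scored against values elsewhere in the image. A minimal rank-based (Mann-Whitney) AUC might look like the following; note this is the basic uniform-negative AUC, not the shuffled variant some benchmarks also report, and it assumes distinct saliency values (ties are broken arbitrarily).

```python
import numpy as np

def saliency_auc(sal_map, fixation_mask):
    """ROC area: saliency at fixated pixels vs. all other pixels.

    `fixation_mask` is a boolean array of the same shape as `sal_map`.
    """
    pos = sal_map[fixation_mask]       # saliency at fixated locations
    neg = sal_map[~fixation_mask]      # saliency everywhere else
    # 0-based ranks of all values; AUC via the Mann-Whitney U statistic
    ranks = np.argsort(np.argsort(np.concatenate([pos, neg])))
    r_pos = ranks[:len(pos)].sum()
    n_pos, n_neg = len(pos), len(neg)
    return (r_pos - n_pos * (n_pos - 1) / 2) / (n_pos * n_neg)
```

A score of 1.0 means every fixated pixel outranks every non-fixated pixel; 0.5 is chance level, matching the chance baseline mentioned in the abstract.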
Article
Table of contents: Text: Regression; Inference in Regression; Attributes as Explanatory Variables; Nonlinear Relationships; Regression and Time Series; Lagged Variables; Regression Miscellanea; More on Inference in Regression; Autoregressive Models; The Classification Problem; More on Classification; Models of Systems. Cases. Appendices. Selected References. Index.
Conference Paper
Several salient object detection approaches have been published, assessed using different evaluation scores and datasets, which results in discrepancies in model comparison. This calls for a methodological framework to compare existing models and evaluate their pros and cons. We analyze benchmark datasets and scoring techniques and, for the first time, provide a quantitative comparison of 35 state-of-the-art saliency detection models. We find that some models perform consistently better than others. Saliency models that intend to predict eye fixations perform worse on segmentation datasets than salient object detection algorithms. Further, we propose combined models, showing that integrating the few best models outperforms all individual models across datasets. By analyzing the consistency among the best models and among humans for each scene, we identify the scenes where models or humans fail to detect the most salient object. We highlight the current issues and propose future research directions.
Conference Paper
Image quality assessment is one application out of many that can be aided by the use of computational saliency models. Existing visual saliency models have not been extensively tested under a quality assessment context. Also, these models are typically geared towards predicting saliency in non-distorted images. Recent work has also focussed on mimicking the human visual system in order to predict fixation points from saliency maps. One such technique (GAFFE) that uses foveation has been found to perform well for non-distorted images. This work extends the foveation framework by integrating it with saliency maps from well known saliency models. The performance of the foveated saliency models is evaluated based on a comparison with human ground-truth eye-tracking data. For comparison, the performance of the original non-foveated saliency predictions is also presented. It is shown that the integration of saliency models with a foveation based fixation finding framework significantly improves the prediction performance of existing saliency models over different distortion types. It is also found that the information maximization based saliency maps perform the best consistently over different distortion types and levels under this foveation based framework.
Conference Paper
What makes an object salient? Most previous work asserts that distinctness is the dominating factor. The difference between the various algorithms is in the way they compute distinctness. Some focus on the patterns, others on the colors, and several add high-level cues and priors. We propose a simple, yet powerful, algorithm that integrates these three factors. Our key contribution is a novel and fast approach to compute pattern distinctness. We rely on the inner statistics of the patches in the image for identifying unique patterns. We provide an extensive evaluation and show that our approach outperforms all state-of-the-art methods on the five most commonly-used datasets.
Conference Paper
Saliency estimation has become a valuable tool in image processing. Yet, existing approaches exhibit considerable variation in methodology, and it is often difficult to attribute improvements in result quality to specific algorithm properties. In this paper we reconsider some of the design choices of previous methods and propose a conceptually clear and intuitive algorithm for contrast-based saliency estimation. Our algorithm consists of four basic steps. First, our method decomposes a given image into compact, perceptually homogeneous elements that abstract unnecessary detail. Based on this abstraction we compute two measures of contrast that rate the uniqueness and the spatial distribution of these elements. From the element contrast we then derive a saliency measure that produces a pixel-accurate saliency map which uniformly covers the objects of interest and consistently separates fore- and background. We show that the complete contrast and saliency estimation can be formulated in a unified way using high-dimensional Gaussian filters. This contributes to the conceptual simplicity of our method and lends itself to a highly efficient implementation with linear complexity. In a detailed experimental evaluation we analyze the contribution of each individual feature and show that our method outperforms all state-of-the-art approaches.
Article
Image segmentation is one of the most important techniques in image processing. It is widely used in different applications such as computer vision, digital pattern recognition, robot vision, etc. The histogram was the earliest feature used for isolating objects from their background; it is widely applicable in applications in which one needs to divide the image into distinct regions, such as background and object. The thresholding technique is the most popular solution, in which a value on the histogram is selected to separate the regions. This value, known as the threshold, should be specified in an appropriate way. One method uses the global minimum of the histogram to divide the image into white and black (a binary image). Due to spatial and grey-level uncertainty and ambiguity, extracting the threshold value in a crisp way is not always suitable. To overcome such problems, the proposed method uses two membership functions to measure the whiteness and blackness of a member element. A pixel's assignment to one of the regions depends on the membership values it has under these membership functions.
Conference Paper
Abstract— Image segmentation is one of the most important techniques in image processing; it is widely used in a variety of applications such as computer vision, digital pattern recognition, and robot vision. The histogram was the earliest feature used for segmenting objects from their background and is widely applicable when one needs to divide the image into two distinct regions, such as background and object. The thresholding technique is the most popular solution, in which a value on the histogram is selected to separate the two regions. This value, called the threshold, should be specified in an appropriate way. One method uses the global minimum of the histogram to divide the image into white and black (a binary image). Due to the spatial and gray-level uncertainty and ambiguity that images tend to contain, crisp selection of the threshold value is not always suitable. To overcome this problem, the proposed method uses two membership functions to measure the whiteness and blackness of a member element. A pixel's assignment to one of the regions depends on the membership values it has under these membership functions. Index Terms— Image segmentation, bimodal histogram, fuzzy intelligence, thresholding.
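The two-membership idea described in these abstracts can be illustrated with a minimal sketch: each pixel receives a whiteness and a blackness membership and is assigned to whichever is larger. The linear membership functions below, anchored at the darkest and brightest grey levels, are an assumption for illustration; the paper's exact membership shapes may differ.

```python
import numpy as np

def fuzzy_binarize(gray):
    """Assign each pixel to 'white' or 'black' by its larger membership.

    Uses linear memberships anchored at the image's grey-level extremes;
    this is an illustrative choice, not the paper's exact functions.
    """
    g = gray.astype(float)
    lo, hi = g.min(), g.max()
    whiteness = (g - lo) / (hi - lo + 1e-12)   # rises towards bright pixels
    blackness = 1.0 - whiteness                # rises towards dark pixels
    return whiteness >= blackness              # True where 'white' wins
```

With these linear memberships the rule reduces to a mid-range threshold; the benefit of the fuzzy formulation comes from shaping the memberships to the histogram rather than to the raw grey-level range.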
Article
A review of clustering methodology is presented, with emphasis on algorithm performance and the resulting implications for applied research. After an overview of the clustering literature, the clustering process is discussed within a seven-step framework. The four major types of clustering methods can be characterized as hierarchical, partitioning, overlapping, and ordination algorithms. The validation of such algorithms refers to the problem of determining the ability of the methods to recover cluster configurations which are known to exist in the data. Validation approaches include mathematical derivations, analyses of empirical datasets, and Monte Carlo simulation methods. Next, interpretation and inference procedures in cluster analysis are discussed. Inference procedures involve testing for significant cluster structure and the problem of determining the number of clusters in the data. The paper concludes with two sets of recommendations. One set deals with topics in clustering that would benefit from continued research into the methodology. The other set offers recommendations for applied analyses within the framework of the clustering process.
Article
To detect visually salient elements of complex natural scenes, computational bottom-up saliency models commonly examine several feature channels such as color and orientation in parallel. They compute a separate feature map for each channel and then linearly combine these maps to produce a master saliency map. However, only a few studies have investigated how different feature dimensions contribute to the overall visual saliency. We address this integration issue and propose to use covariance matrices of simple image features (known as region covariance descriptors in the computer vision community; Tuzel, Porikli, & Meer, 2006) as meta-features for saliency estimation. As low-dimensional representations of image patches, region covariances capture local image structures better than standard linear filters, but more importantly, they naturally provide nonlinear integration of different features by modeling their correlations. We also show that first-order statistics of features can easily be incorporated into the proposed approach to improve performance. Our experimental evaluation on several benchmark data sets demonstrates that the proposed approach outperforms state-of-the-art models on various tasks including prediction of human eye fixations, salient object detection, and image retargeting.
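A region covariance descriptor of the kind used above is simply the covariance matrix of per-pixel feature vectors collected over a patch. The five features chosen below (intensity, x, y, and gradient magnitudes) are one common choice for illustration, not necessarily the exact channel set of the paper.

```python
import numpy as np

def region_covariance(patch):
    """Covariance descriptor of simple per-pixel features over a 2-D patch.

    Feature choice (intensity, x, y, |dI/dx|, |dI/dy|) is illustrative;
    off-diagonal entries encode the correlations between channels that
    give the descriptor its nonlinear feature integration.
    """
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w]                       # pixel coordinates
    gy, gx = np.gradient(patch.astype(float))         # image derivatives
    feats = np.stack([patch.ravel(), xs.ravel(), ys.ravel(),
                      np.abs(gx).ravel(), np.abs(gy).ravel()])
    return np.cov(feats)                              # 5x5 symmetric descriptor
```

Saliency is then estimated by comparing a patch's descriptor against those of surrounding patches, typically with a metric suited to symmetric positive-definite matrices.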
Article
The human visual system possesses the remarkable ability to pick out salient objects in images. Even more impressive is its ability to do the very same in the presence of disturbances. In particular, the ability persists despite the presence of noise, poor weather, and other impediments to perfect vision. Meanwhile, noise can significantly degrade the accuracy of automated computational saliency detection algorithms. In this article, we set out to remedy this shortcoming. Existing computational saliency models generally assume that the given image is clean, and a fundamental and explicit treatment of saliency in noisy images is missing from the literature. Here we propose a novel and statistically sound method for estimating saliency based on a nonparametric regression framework, investigate the stability of saliency models for noisy images, and analyze how state-of-the-art computational models respond to noisy visual stimuli. The proposed model of saliency at a pixel of interest is a data-dependent weighted average of dissimilarities between a center patch around that pixel and other patches. To further enhance the degree of accuracy in predicting human fixations and of stability to noise, we incorporate a global and multiscale approach by extending the local analysis window to the entire input image, and even further to multiple scaled copies of the image. Our method consistently outperforms six other state-of-the-art models (Bruce & Tsotsos, 2009; Garcia-Diaz, Fdez-Vidal, Pardo, & Dosil, 2012; Goferman, Zelnik-Manor, & Tal, 2010; Hou & Zhang, 2007; Seo & Milanfar, 2009; Zhang, Tong, & Marks, 2008) for both noise-free and noisy cases.
Article
In this paper we formulate a hierarchical configurable deformable template (HCDT) to model articulated visual objects, such as horses and baseball players, for tasks such as parsing, segmentation, and pose estimation. HCDTs represent an object by an AND/OR graph where the OR nodes act as switches which enable the graph topology to vary adaptively. This hierarchical representation is compositional, and the node variables represent positions and properties of subparts of the object. The graph and the node variables are required to obey the summarization principle, which enables an efficient compositional inference algorithm to rapidly estimate the state of the HCDT. We specify the structure of the AND/OR graph of the HCDT by hand and learn the model parameters discriminatively by extending max-margin learning to AND/OR graphs. We illustrate the three main aspects of HCDTs (representation, inference, and learning) on the tasks of segmenting, parsing, and pose (configuration) estimation for horses and humans. We demonstrate that the inference algorithm is fast and that max-margin learning is effective. We show that HCDTs give state-of-the-art results for segmentation and pose estimation when compared to other methods on benchmarked datasets. Keywords: Hierarchy; Shape representation; Object parsing; Segmentation; Structure learning; Max margin
Conference Paper
Point matching techniques have been widely used in the content-based image retrieval process. In this paper, a new color image salient point detector is proposed based on the wavelet transform and the Barnard detector. First, an image is divided into three channels and each channel is decomposed with the wavelet transform separately. Second, for a given wavelet coefficient, a corresponding wavelet coefficient can be found at a finer scale according to the absolute maximum value. The Barnard detector is used to find potential salient points in the image. Finally, color salient points are extracted by using a self-adaptable threshold and the continuous points set reduction method. Experiments show that this method is robust and that the extracted salient points give a satisfying representation of an image. It can also improve retrieval performance effectively.
Article
This study presents an efficient saliency model mainly aiming at content-based applications such as salient object segmentation. The input colour image is first pre-segmented into a set of regions using the mean shift algorithm. A set of Gaussian models are estimated on the basis of segmented regions, and then for each pixel, a set of normalised colour likelihood measures to different Gaussian models are calculated. The colour saliency measure and spatial saliency measure of each Gaussian model are evaluated based on its colour distinctiveness and the spatial distribution, respectively. Finally, the pixel-wise colour saliency map and spatial saliency map are generated by summing the colour and spatial saliency measures of Gaussian models weighted by the normalised colour likelihood measures, and they are further combined to obtain the final saliency map. Experimental results on a dataset with 1000 images and ground truths demonstrate the better saliency detection performance of our saliency model.
Conference Paper
Detection of visually salient image regions is useful for applications like object segmentation, adaptive compression, and object recognition. Recently, full-resolution saliency maps that retain well-defined boundaries have attracted attention. In these maps, boundaries are preserved by retaining substantially more frequency content from the original image than older techniques. However, if the salient regions comprise more than half the pixels of the image, or if the background is complex, the background gets highlighted instead of the salient object. In this paper, we introduce a method for salient region detection that retains the advantages of such saliency maps while overcoming their shortcomings. Our method exploits features of color and luminance, is simple to implement and is computationally efficient. We compare our algorithm to six state-of-the-art salient region detection methods using publicly available ground truth. Our method outperforms the six algorithms by achieving both higher precision and better recall. We also show application of our saliency maps in an automatic salient object segmentation scheme using graph-cuts.
Conference Paper
Using prior knowledge about object(s) is greatly beneficial for accurate image segmentation. In this paper, we present an image segmentation method that performs segmentation using a set of user-provided object templates. For the object templates, we call the salient points in the object the characteristic points of the object. Based on the correspondence between the salient points of an arbitrary image and the characteristic points of the objects, object templates are mapped onto the image as bit masks. All templates for the same object are combined to generate the final object/background segmentation. Experiments show that this method can create good segmentations.
Conference Paper
Visual attention analysis provides an alternative methodology to semantic image understanding in many applications such as adaptive content delivery and region-based image retrieval. In this paper, we propose a feasible and fast approach to attention area detection in images based on contrast analysis. The main contributions are threefold: 1) a new saliency map generation method based on local contrast analysis is proposed; 2) by simulating human perception, a fuzzy growing method is used to extract attended areas or objects from the saliency map; and 3) a practicable framework for image attention analysis is presented, which provides three-level attention analysis, i.e., attended view, attended areas and attended points. This framework facilitates visual analysis tools or vision systems to automatically extract attentions from images in a manner like human perception. User study results indicate that the proposed approach is effective and practicable.
Conference Paper
In this paper, we propose a novel image retargeting algorithm to resize images based on the extracted saliency information from the compressed domain. Firstly, we utilize DCT coefficients in JPEG bitstream to perform saliency detection with the consideration of the human visual sensitivity. The obtained saliency information is used to determine the relative visual importance of each 8 x 8 block for the image. Furthermore, we propose a new adaptive block-level seam removal operation for connected blocks to resize the image. Thanks to the directly derived saliency information from the compressed domain, the proposed image retargeting algorithm effectively preserves the objects of attention, efficiently removes the less crucial regions, and therefore significantly outperforms the relevant state-of-the-art algorithms, as demonstrated with the careful analysis and in the extensive experiments.
Conference Paper
In image retrieval, global features related to color or texture are commonly used to describe the image content. The problem with this approach is that these global features cannot capture all parts of the image having different characteristics. Therefore, local computation of image information is necessary. By using salient points to represent local information, more discriminative features can be computed. In this paper, we compare a wavelet-based salient point extraction algorithm with two corner detectors using the criteria of repeatability rate and information content. We also show that extracting color and texture information at the locations given by our salient points provides significantly improved results in terms of retrieval accuracy, computational complexity, and storage space of feature vectors as compared to global feature approaches.
Conference Paper
We present a new metric between histograms such as SIFT descriptors and a linear time algorithm for its computation. It is common practice to use the L2 metric for comparing SIFT descriptors. This practice assumes that SIFT bins are aligned, an assumption which is often not correct due to quantization, distortion, occlusion etc. In this paper we present a new Earth Mover's Distance (EMD) variant. We show that it is a metric (unlike the original EMD [1] which is a metric only for normalized histograms). Moreover, it is a natural extension of the L1 metric. Second, we propose a linear time algorithm for the computation of the EMD variant, with a robust ground distance for oriented gradients. Finally, extensive experimental results on the Mikolajczyk and Schmid dataset [2] show that our method outperforms state of the art distances.
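For 1-D histograms with a linear ground distance, an EMD of this general family can be computed directly with SciPy. This sketch assumes normalised histograms and uses bin indices as the ground distance; the paper's variant additionally covers unnormalised histograms and a thresholded ground distance, which this sketch does not implement.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def histogram_emd(h1, h2):
    """1-D Earth Mover's Distance between two normalised histograms.

    Bin index serves as the ground distance; for equal-mass histograms
    this equals the classical EMD with a linear cost.
    """
    bins = np.arange(len(h1))
    return wasserstein_distance(bins, bins, u_weights=h1, v_weights=h2)
```

Moving all mass from bin 0 to bin 2, for example, costs exactly 2 under this metric, which is the behaviour the L1-extension property above guarantees for aligned bins.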