Chapter

Scale space


Abstract

Scale-space theory is a framework for multi-scale image representation, which has been developed by the computer vision community with complementary motivations from physics and biological vision. The idea is to handle the multi-scale nature of real-world objects, which implies that objects may be perceived in different ways depending on the scale of observation. If one aims at developing automatic algorithms for interpreting images of unknown scenes, there is no way to know a priori what scales are relevant. Hence, the only reasonable approach is to consider representations at all scales simultaneously. From axiomatic derivations it has been shown that, given the requirement that coarse-scale representations should correspond to true simplifications of fine-scale structures, convolution with Gaussian kernels and Gaussian derivatives is singled out as a canonical class of image operators for the earliest stages of visual processing. These image operators can be used as a basis for solving a large variety of visual tasks, including feature detection, feature classification, stereo matching, motion descriptors, shape cues and image-based recognition. By complementing scale-space representation with a module for automatic scale selection based on the maximization of normalized derivatives over scales, early visual modules can be made scale invariant. In this way, visual modules will be able to automatically adapt to the unknown scale variations that may occur due to objects and substructures of varying physical size as well as objects with varying distances to the camera. An interesting similarity to biological vision is that the scale-space operators closely resemble receptive field profiles registered in neurophysiological studies of the mammalian retina and visual cortex.
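The scale-selection principle mentioned above, maximizing scale-normalized derivatives over scales, can be sketched in a few lines. This is a toy illustration with NumPy/SciPy, not code from the chapter: the synthetic Gaussian blob, image size, and scale range are illustrative assumptions. For a Gaussian blob of standard deviation sigma0, the scale-normalized Laplacian response at the blob center is maximal when the filter scale matches the blob scale.

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

# Synthetic image: a single Gaussian blob of standard deviation sigma0
# (an illustrative assumption, standing in for an unknown-scale structure).
sigma0 = 8.0
x = np.arange(128)
X, Y = np.meshgrid(x, x)
img = np.exp(-((X - 64) ** 2 + (Y - 64) ** 2) / (2 * sigma0 ** 2))

# Scale-normalized Laplacian |sigma^2 * Laplacian(L)| evaluated at the
# blob center, over a range of candidate scales.
sigmas = np.linspace(2, 16, 29)
responses = [abs(s ** 2 * gaussian_laplace(img, s)[64, 64]) for s in sigmas]

# The selected scale is the one maximizing the normalized response;
# it should land close to sigma0.
best = sigmas[int(np.argmax(responses))]
print(best)
```

This is the mechanism that lets a visual module adapt to unknown scale variations: the argmax over scales recovers the size of the structure without prior knowledge.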


... Authors in [21,22] have shown that using image blobs [18] tackles the limitations encountered by block-based, keypoint-based, and segment-based CMFD techniques. Therefore, to estimate the geometric transformation parameters from CMF, we will use image blobs and scale-rotation-invariant features. ...
... Image blobs are regions in a digital image that differ in properties, such as brightness or color, compared to surrounding regions [18,19]. The goal of a blob detector is to identify and mark these regions. ...
... Blobs are maxima of the LoG response in scale-space, and the radius of each blob is approximately √2·σ [18]. The LoG is multiplied by σ² to achieve scale independence. ...
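The blob-detection scheme sketched in the snippets above can be made concrete. The following is an illustrative toy example (not code from the cited works): a single disk-shaped blob is detected as a maximum of the scale-normalized LoG over position and scale, and its radius is recovered as √2·σ at the selected scale.

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

# Synthetic image: a bright disk of radius r0 on a dark background
# (illustrative assumption standing in for a copy-move blob).
r0 = 10.0
x = np.arange(96)
X, Y = np.meshgrid(x, x)
img = ((X - 48) ** 2 + (Y - 48) ** 2 <= r0 ** 2).astype(float)

# Scale-normalized LoG: multiply by sigma^2 so responses are comparable
# across scales; negate so a bright blob gives a positive maximum.
sigmas = np.linspace(3, 12, 19)
stack = np.stack([-s ** 2 * gaussian_laplace(img, s) for s in sigmas])

# The strongest response over (sigma, y, x) gives blob position and scale;
# the blob radius is approximately sqrt(2) * sigma at that scale.
k, i, j = np.unravel_index(np.argmax(stack), stack.shape)
radius = np.sqrt(2) * sigmas[k]
print((i, j, radius))
```

For a disk of radius r0 the normalized LoG peaks at σ = r0/√2, so the recovered radius lands near r0; this is the convention behind the √2·σ factor in the snippet above.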
Article
Full-text available
A copy-move forgery is a passive tampering wherein one or more regions have been copied and pasted within the same image. Often, geometric transformations, including scale, rotation, and rotation+scale, are applied to the forged areas to conceal the counterfeits from copy-move forgery detection methods. Recently, copy-move forgery detection using image blobs has been used to tackle the limitations of existing detection methods. However, the main limitation of blob-based copy-move forgery detection methods is the inability to estimate the geometric transformation. To tackle this limitation, this article presents a technique that detects copy-move forgery and estimates the geometric transformation parameters between the authentic region and its duplicate using image blobs and scale-rotation-invariant keypoints. The proposed algorithm involves the following steps: image blobs are found in the image being analyzed; scale-rotation-invariant features are extracted; the keypoints that are located within the same blob are identified; feature matching is performed between keypoints located within different blobs to find similar features; finally, the blobs with matched keypoints are post-processed and a 2D affine transformation is computed to estimate the geometric transformation parameters. Our technique is flexible and can easily incorporate various scale-rotation-invariant keypoints, including AKAZE, ORB, BRISK, SURF, and SIFT, to enhance its effectiveness. The proposed algorithm is implemented and evaluated on images forged with copy-move regions combined with geometric transformations from standard datasets. The experimental results indicate that the new algorithm is effective for geometric transformation parameter estimation.
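The final step of the pipeline described above, computing a 2D affine transformation from matched keypoints to estimate the geometric transformation parameters, can be sketched with a least-squares fit. This is an illustrative sketch, not the authors' implementation; the simulated rotation, scale, and translation values are assumptions.

```python
import numpy as np

def estimate_affine(src, dst):
    """Least-squares 2D affine transform mapping src points to dst points.

    Returns a 2x3 matrix A such that dst ~ A @ [x, y, 1]^T.
    Needs at least 3 non-collinear correspondences.
    """
    n = len(src)
    M = np.hstack([src, np.ones((n, 1))])   # n x 3 homogeneous source points
    A, *_ = np.linalg.lstsq(M, dst, rcond=None)
    return A.T                              # 2 x 3 affine matrix

# Simulate a copy-move forgery: the duplicated region is rotated by 30
# degrees, scaled by 1.5 and translated (illustrative parameters).
theta, scale, t = np.radians(30), 1.5, np.array([40.0, -10.0])
R = scale * np.array([[np.cos(theta), -np.sin(theta)],
                      [np.sin(theta),  np.cos(theta)]])
src = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [7.0, 3.0]])
dst = src @ R.T + t

A = estimate_affine(src, dst)
# The scale factor can be read off as sqrt(det) of the linear part.
recovered_scale = np.sqrt(np.linalg.det(A[:, :2]))
print(recovered_scale)
```

In practice the correspondences come from the matched keypoints inside blob pairs, and a robust estimator (e.g. RANSAC) would wrap this fit to reject outlier matches.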
... We propose a novel alternative approach using image blobs [15,20] and BRISK feature to overcome some of the limitations of Block-based, Keypoints-based, and Segments-based CMFD techniques. Our primary contribution is to show experimentally that image blobs improve the performance of several previously studied features, and in particular BRISK features in CMFD. ...
... Image blobs are image regions that differ in properties, such as brightness or color compared to surrounding regions and the goal of blob detection is to identify and mark these regions. Most common blob detectors include Laplacian of Gaussian (LoG) [11,15] and the Difference of Gaussian (DoG) [15,16] operators. LoG blob filter is the second derivative of Gaussian filter. ...
Article
Full-text available
One of the most frequently used types of digital image forgery is copying one area of an image and pasting it into another area of the same image; this is known as copy-move forgery. To overcome the limitations of existing block-based and keypoint-based copy-move forgery detection methods, in this paper we present an effective technique for copy-move forgery detection that utilizes image blobs and keypoints. The proposed method is based on image blobs and the Binary Robust Invariant Scalable Keypoints (BRISK) feature. It involves the following stages: the regions of interest called image blobs and the BRISK features are found in the image being analyzed; BRISK keypoints that are located within the same blob are identified; finally, matching is performed between BRISK keypoints located in different blobs to find similar keypoints for copy-move regions. The proposed method is implemented and evaluated on the copy-move forgery standard datasets MICC-F8multi, MICC-F220, and CoMoFoD. The experimental results show that the proposed method is effective for geometric transformations, such as scaling and rotation, and is robust to post-processing operations, such as noise addition, blurring, and JPEG compression.
... Scale-space theory provides a framework for objects that occur at different scales and positions and that can be observed from a two-dimensional grayscale image f(x, y). The Gaussian scale-space representation L(x, y; s) of f(x, y) is a family of derived smoothed images at different scales (s) and locations (x, y) (Lindeberg, 2008, 2010), given by L(x, y; s) = g(x, y; s) * f(x, y) ...
... Schematic diagram for tree detection and its uncertainty assessment. Scale-space theory provides a framework dealing with image structure where objects occur at different scales and positions. The input in this study is given by a two-dimensional orchard image f(x, y), resulting in a family L(x, y; s) of derived smoothed images from finer to coarser scale, where tree objects are defined at any scale level (Lindeberg, 2008, 2010): ...
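The Gaussian scale-space representation L(x, y; s) = g(x, y; s) * f(x, y) referenced in these snippets amounts to one Gaussian smoothing per scale level. A minimal sketch (the synthetic image and scale levels are illustrative assumptions):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# f(x, y): a toy grayscale image (random values as a stand-in for
# real image structure).
rng = np.random.default_rng(0)
f = rng.random((64, 64))

# L(x, y; s) = g(x, y; s) * f(x, y): one smoothed image per scale level s,
# from fine to coarse (here s is the standard deviation of the kernel).
scales = [1.0, 2.0, 4.0, 8.0]
L = {s: gaussian_filter(f, s) for s in scales}

# Coarser scales are true simplifications of finer ones: the variability
# of the smoothed image decreases monotonically with s.
stds = [L[s].std() for s in scales]
assert stds == sorted(stds, reverse=True)
```

Tree detection then operates on this family rather than on f alone, so that crowns of any size appear as well-localized structures at some scale level.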
Thesis
Full-text available
Precision orchard management as a specific form of precision agriculture aims at supporting decision makers and farm managers by providing strategies to optimize crop production. Multiple information sources are used. In this thesis, the use of remote sensing images is explored for that purpose. In the past, an orchard was the smallest management scale to deal with it, whereas nowadays it concerns individual trees and leaves. This research explored downscaling methods for satellite images, bridging the gap between the tree patterns and detailed geographical information of trees on the ground. It focused on using both coarse and very high resolution satellite images with the aim of providing meaningful information at different level of scales. First, downscaling cokriging was carried out to match the spatial resolutions when obtaining Land Surface Temperature (LST) and Actual EvapoTranspiration (AET) from remote sensing images. We first applied it to a 1000m resolution MODIS LST product. We also downscaled a coarse AET map to a 250m resolution. For both procedures, the 250 m resolution MODIS NDVI product was the co-variable. The two procedures were applied to an agricultural area with a traditional irrigation network in Iran. The study showed that AET values obtained with the two downscaling procedures were similar to each other, but that AET showed a higher spatial variability if obtained with downscaled LST. We concluded that LST had a large effect on producing AET maps from Remote Sensing (RS) images and that downscaling cokriging was helpful to provide daily AET maps at medium spatial resolution. Second, super resolution mapping (SRM) was applied to a high resolution GeoEye image of a vineyard in Iran with the aim to determine the Actual EvapoTranspiration (AET) and Potential EvapoTranspiration (PET). The Surface Energy Balance System (SEBS) applied for that purpose requires the use of a thermal band, provided by a Landsat TM image of a 30 m resolution. 
Image fusion downscaled this information to the 0.5 m × 0.5 m scale level. Grape trees in the vineyard planted in rows allowed us to distinguish three levels: field, rows of trees and individual trees. The study concluded that modern satellite derived information in combination with recently developed image analysis methods is able to provide reliable AET values at the row level, but not yet for every individual tree. Third, a framework based upon scale-space theory for detecting and delineating individual trees was developed. The study focused on extracting reliable and detailed information from very High Resolution (VHR) satellite images for the detection of individual trees. The images contain detailed information on spectral and geometrical properties of trees. Individual trees were modeled using a bell shaped spectral profile. Gaussian scale-space theory was applied to search for extrema in the scale-space domain. The procedures were tested on two orchards with different tree types, tree sizes and tree observation patterns. Local extrema of the determinant of the Hessian corresponded well to the geographical coordinates and the size of individual trees. False detections arising from a slight asymmetry of trees were distinguished from multiple detections of the same tree with different extents. The study demonstrated how the suggested framework can be used for image segmentation for orchards with different types of trees. It concluded that Gaussian scale-space theory can be applied to extract information from VHR satellite images for individual tree detection. This may lead to improved decision making for irrigation and crop water requirement purposes in future studies. Fourth, a refined tree crown model based upon Gaussian scale-space theory was developed from very high resolution satellite images. It focused on investigating the use of scale-space theory to detect individual trees in orchards. Trees were characterized by blobs, i.e., bell-shaped surfaces. 
Their modelling required the identification of local maxima, whereas location of the maxima in the scale direction provided information about the tree size. The study presents a two-step procedure to relate the detected blobs to tree objects in the field. A Gaussian blob model identified tree crowns and an improved tree crown model was applied by modifying this model in the scale direction. Three representative cases were evaluated: an area with isolated vitellaria trees, an orchard with walnut trees and one with oil palm trees. Results showed that the refined Gaussian blob model improves upon the traditional Gaussian blob model by discriminating well between false and correct detections and accurately identifying size and position of trees. We concluded that the presented two-step modeling procedure is useful to automatically detect individual trees from VHR satellite images for at least three representative cases. To summarize, this research focused on satellite based methods at different levels of scales for orchard management. It improved the monitoring of trees, the detection of changes, mapping of tree health and determination of crop water requirement.
... Scale-space theory provides a framework for objects that occur at different scales and positions and that can be observed from a two-dimensional grayscale image f(x, y). The Gaussian scale-space representation L(x, y; s) of f(x, y) is a family of derived smoothed images at different scales (s) and locations (x, y) [27,28]: ...
Article
Full-text available
This research investigates the use of scale-space theory to detect individual trees in orchards from very-high resolution (VHR) satellite images. Trees are characterized by blobs, for example, bell-shaped surfaces. Their modeling requires the identification of local maxima in Gaussian scale space, whereas location of the maxima in the scale direction provides information about the tree size. A two-step procedure relates the detected blobs to tree objects in the field. First, a Gaussian blob model identifies tree crowns in Gaussian scale space. Second, an improved tree crown model modifies this model in the scale direction. The procedures are tested on the following three representative cases: an area with vitellaria trees in Mali, an orchard with walnut trees in Iran, and one case with oil palm trees in Indonesia. The results show that the refined Gaussian blob model improves upon the traditional Gaussian blob model by effectively discriminating between false and correct detections and accurately identifying size and position of trees. A comparison with existing methods shows an improvement of 10-20% in true positive detections. We conclude that the presented two-step modeling procedure of tree crowns using Gaussian scale space is useful to automatically detect individual trees from VHR satellite images for at least three representative cases.
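The determinant-of-Hessian scale-space maxima underlying the tree crown model above can be sketched as follows. This is an illustrative toy example, not the paper's code: the bell-shaped "crown", image size, and scale range are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def det_hessian(img, sigma):
    """Scale-normalized determinant of the Hessian at scale sigma."""
    Lxx = gaussian_filter(img, sigma, order=(0, 2))
    Lyy = gaussian_filter(img, sigma, order=(2, 0))
    Lxy = gaussian_filter(img, sigma, order=(1, 1))
    return sigma ** 4 * (Lxx * Lyy - Lxy ** 2)

# A bell-shaped "tree crown" of width sigma0, as in the Gaussian blob model.
sigma0 = 6.0
x = np.arange(96)
X, Y = np.meshgrid(x, x)
img = np.exp(-((X - 48) ** 2 + (Y - 48) ** 2) / (2 * sigma0 ** 2))

# A local maximum of det-Hessian over (x, y, sigma) marks the crown: the
# spatial position gives the tree location, and the scale its extent.
sigmas = np.linspace(3, 12, 19)
responses = [det_hessian(img, s)[48, 48] for s in sigmas]
best = sigmas[int(np.argmax(responses))]
print(best)
```

The location of the maximum in the scale direction is what carries the size information: for this bell-shaped crown, the selected scale lands near sigma0.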
... Let us introduce the following model of a point that belongs to a fiber image: it is a point that forms a local maximum (ridge) of the source document image brightness profile in the brightness gradient direction. The most popular criterion for checking whether a selected point fits the introduced model is the one [25] based on whether the first eigenvalue of the Hessian matrix is negative. However, this criterion responds not only to ridges but also to the boundaries of objects that are wider than the fibers. ...
Preprint
In this work we consider the problem of detecting fluorescent security fibers in images of identity documents captured under ultraviolet light. As an example we use images of the second and third pages of the Russian passport and show features that render known methods and approaches based on image binarization inapplicable. We propose a solution based on ridge detection in the gray-scale image of the document with a preliminarily normalized background. The algorithm was tested on a private dataset consisting of both authentic and model passports. Abandoning binarization allowed the proposed detector to function reliably and stably on the target dataset.
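The Hessian-eigenvalue ridge criterion referenced in the snippet above can be sketched as follows. This is an illustrative reimplementation under assumptions (a toy image with a single bright fiber; the smoothing scale is arbitrary), not the paper's detector.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def ridge_strength(img, sigma):
    """Negative smallest Hessian eigenvalue: large on bright ridges.

    A point is a ridge candidate when the principal (most negative)
    eigenvalue of the Hessian is negative, as in the criterion above.
    """
    Lxx = gaussian_filter(img, sigma, order=(0, 2))
    Lyy = gaussian_filter(img, sigma, order=(2, 0))
    Lxy = gaussian_filter(img, sigma, order=(1, 1))
    # Eigenvalues of [[Lxx, Lxy], [Lxy, Lyy]] in closed form.
    half_trace = 0.5 * (Lxx + Lyy)
    delta = np.sqrt(0.25 * (Lxx - Lyy) ** 2 + Lxy ** 2)
    lam_min = half_trace - delta
    return np.where(lam_min < 0, -lam_min, 0.0)

# A bright vertical line (a "fiber") on a dark background.
img = np.zeros((64, 64))
img[:, 32] = 1.0

S = ridge_strength(img, sigma=2.0)
# The response peaks on the fiber, not beside it.
print(S[32, 32] > S[32, 40])
```

As the snippet notes, this criterion also fires on the bright side of wide object boundaries, which is why the paper needs background normalization and further filtering on top of it.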
... However, if the target line can be modeled as a line passing through an area of constant brightness, it leads to doubled boundaries. To detect such lines, ridge detectors can be used [31,26]. However, all these methods detect curves without curvature restrictions or, in other words, can detect lightning-like curves, which do not always satisfy the problem conditions. ...
Preprint
In this paper we consider a method for detecting end-to-end curves of limited curvature, like k-link polylines with the bending angle between adjacent segments in a given range. The approximation accuracy is achieved by maximizing a quality function over the image matrix. The method is based on a dynamic programming scheme constructed over the results of Fast Hough Transform calculations for image bands. The asymptotic complexity of the proposed method is O(h⋅(w+h/k)⋅log(h/k)), where h and w are the image dimensions and k is the number of links in the approximating polyline; this is an analogue of the complexity of the fast Fourier transform or the fast Hough transform. We also show the results of the proposed method on synthetic and real data.
Article
Full-text available
Deep learning models have achieved impressive performance in various tasks, but they are usually opaque with regards to their inner complex operation, obfuscating the reasons for which they make decisions. This opacity raises ethical and legal concerns regarding the real-life use of such models, especially in critical domains such as in medicine, and has led to the emergence of the eXplainable Artificial Intelligence (XAI) field of research, which aims to make the operation of opaque AI systems more comprehensible to humans. The problem of explaining a black-box classifier is often approached by feeding it data and observing its behaviour. In this work, we feed the classifier with data that are part of a knowledge graph, and describe the behaviour with rules that are expressed in the terminology of the knowledge graph, that is understandable by humans. We first theoretically investigate the problem to provide guarantees for the extracted rules and then we investigate the relation of “explanation rules for a specific class” with “semantic queries collecting from the knowledge graph the instances classified by the black-box classifier to this specific class”. Thus we approach the problem of extracting explanation rules as a semantic query reverse engineering problem. We develop algorithms for solving this inverse problem as a heuristic search in the space of semantic queries and we evaluate the proposed algorithms on four simulated use-cases and discuss the results.
Article
Full-text available
In recent years, keypoint-based image detection algorithms have become essential to many image processing applications. They should be stable and invariant, especially against image distortion and noise caused by different illumination conditions. Thus, the challenge is to design a faster and more robust detector in terms of accuracy and saliency of the detected keypoints. Toward this objective, the flexibility of artificial intelligence (AI) and its ability to learn and adapt has made it the primary choice for achieving this goal. In this paper, we propose a novel detector that combines the power of neural networks to detect robust feature points and fuzzy logic to select among them only the most significant. A neural network is implemented as a supervised machine learning technique. It is trained on a predefined database of straight edges (SEs) with different patterns representing a set of flow directions. The aim is to decompose a given contour into a set of connected SEs and estimate the flow direction for each. The transition points between nonlinear SEs are classified as edge corners (ECs). Finally, the set of these ECs is pruned by a fuzzy logic system to keep only the significant ones, based on key corner parameters that can contribute strongly to the matching process. Experimental results clearly demonstrate the robustness and saliency of our newly proposed NF-ECD in extracting keypoints. In addition, the NF-ECD achieves the best performance compared to state-of-the-art keypoint detection algorithms: in experiments conducted on the illumination set of the HPatches dataset, the repeatability score reaches 72.6%, while the average computational time obtained on the Object Recognition Dataset is 2.18 s, the lowest among similar detectors. NF-ECD also shows an effective reduction in the matching runtime.
Article
Full-text available
The world is composed of objects, the ground, and the sky. Visual perception of objects requires solving two fundamental challenges: 1) segmenting visual input into discrete units and 2) tracking identities of these units despite appearance changes due to object deformation, changing perspective, and dynamic occlusion. Current computer vision approaches to segmentation and tracking that approach human performance all require learning, raising the question, Can objects be segmented and tracked without learning? Here, we show that the mathematical structure of light rays reflected from environment surfaces yields a natural representation of persistent surfaces, and this surface representation provides a solution to both the segmentation and tracking problems. We describe how to generate this surface representation from continuous visual input and demonstrate that our approach can segment and invariantly track objects in cluttered synthetic video despite severe appearance changes, without requiring learning.
Article
Full-text available
Scale-invariant keypoint detection is a fundamental problem in low-level vision. To accelerate keypoint detectors (e.g. DoG, Harris-Laplace, Hessian-Laplace) that are developed in Gaussian scale-space, various fast detectors (e.g., SURF, CenSurE, and BRISK) have been developed by approximating Gaussian filters with simple box filters. However, there is no principled way to design the shape and scale of the box filters. Additionally, the involved integral image technique makes it difficult to figure out the continuous kernels that correspond to the discrete ones used in these detectors, so there is no guarantee that those good properties such as causality in the original Gaussian space can be inherited. To address these issues, in this paper, we propose a unified B-spline framework for scale-invariant keypoint detection. Owing to an approximate relationship to Gaussian kernels, the B-spline framework provides a mathematical interpretation of existing fast detectors based on integral images. In addition, from B-spline theories, we illustrate the problem in repeated integration, which is the generalized version of the integral image technique. Finally, following the dominant measures for keypoint detection and automatic scale selection, we develop B-spline determinant of Hessian (B-DoH) and B-spline Laplacian-of-Gaussian (B-LoG) as two instantiations within the unified B-spline framework. For efficient computation, we propose to use repeated running-sums to convolve images with B-spline kernels with fixed orders, which avoids the problem of integral images by introducing an extra interpolation kernel. Our B-spline detectors can be designed in a principled way without the heuristic choice of kernel shape and scales and naturally extend the popular SURF and CenSurE detectors with more complex kernels. Extensive experiments on the benchmark dataset demonstrate that the proposed detectors outperform the others in terms of repeatability and efficiency.
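The integral-image technique that the box-filter detectors above rely on can be sketched briefly: after one linear-time pass, any axis-aligned box sum costs four lookups, which is what makes SURF/CenSurE-style approximations of Gaussian filtering fast. This is a generic illustration, not code from the paper.

```python
import numpy as np

def integral_image(img):
    """Summed-area table, padded with a zero row/column for easy indexing."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(0).cumsum(1)
    return ii

def box_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] in O(1) from the integral image."""
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]

# Box filtering via integral images: the constant-time primitive behind
# box-filter approximations of Gaussian smoothing.
rng = np.random.default_rng(1)
img = rng.random((32, 32))
ii = integral_image(img)

# Check one 5x5 box sum against the direct computation.
print(np.isclose(box_sum(ii, 3, 4, 8, 9), img[3:8, 4:9].sum()))
```

The paper's observation is that this trick obscures which continuous kernel is actually being applied; its B-spline framework replaces the heuristic box shapes with kernels whose continuous counterparts are known.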
Article
Full-text available
Background Automated segmentation of coronary arteries is a crucial step for computer-aided coronary artery disease (CAD) diagnosis and treatment planning. Correct delineation of the coronary artery is challenging in X-ray coronary angiography (XCA) due to the low signal-to-noise ratio and confounding background structures. Methods A novel ensemble framework for coronary artery segmentation in XCA images is proposed, which utilizes deep learning and filter-based features to construct models using the gradient boosting decision tree (GBDT) and deep forest classifiers. The proposed method was trained and tested on 130 XCA images. For each pixel of interest in the XCA images, a 37-dimensional feature vector was constructed based on (1) the statistics of multi-scale filtering responses in the morphological, spatial, and frequency domains; and (2) the feature maps obtained from trained deep neural networks. The performance of these models was compared with those of common deep neural networks on metrics including precision, sensitivity, specificity, F1 score, AUROC (the area under the receiver operating characteristic curve), and IoU (intersection over union). Results With hybrid under-sampling methods, the best performing GBDT model achieved a mean F1 score of 0.874, AUROC of 0.947, sensitivity of 0.902, and specificity of 0.992; while the best performing deep forest model obtained a mean F1 score of 0.867, AUROC of 0.95, sensitivity of 0.867, and specificity of 0.993. Compared with the evaluated deep neural networks, both models had better or comparable performance for all evaluated metrics with lower standard deviations over the test images. Conclusions The proposed feature-based ensemble method outperformed common deep convolutional neural networks in most performance metrics while yielding more consistent results. Such a method can be used to facilitate the assessment of stenosis and improve the quality of care in patients with CAD.
Chapter
Detecting corner points in digital images is based on determining significant geometric locations. Corner points provide significant clues for shape analysis and representation; they capture salient features of an object, which can be used in different phases of processing. In shape analysis problems, for example, a shape can be reformulated compactly and with sufficient accuracy if the corners are properly located. This chapter selects seven well-cited algorithms from the literature to review, compare, and analyze empirically. It provides an overview of these selected algorithms so that users can easily pick an appropriate one for their specific applications and requirements.
Article
Textures and patterns are the distinguishing characteristics of objects. Texture classification plays a fundamental role in computer vision and image processing applications. In this paper, texture classification using a PDE (partial differential equation) approach and the wavelet transform is presented. The proposed method uses the wavelet transform to obtain the directional information of the image. A PDE for anisotropic diffusion is employed to obtain the texture component of the image. The feature set is obtained by computing different statistical features from the texture component. Linear discriminant analysis (LDA) enhances the separability of the texture feature classes. The features obtained from LDA are class representatives. The proposed approach is experimented on three gray-scale texture datasets: VisTex, Kylberg, and Oulu. The classification accuracy of the proposed method is evaluated using a k-NN classifier. The experimental results show the effectiveness of the proposed method as compared to other methods in the literature.
Article
Full-text available
This paper presents a brain T1-weighted structural magnetic resonance imaging (MRI) biomarker that combines several individual MRI biomarkers (cortical thickness measurements, volumetric measurements, hippocampal shape, and hippocampal texture). The method was developed, trained, and evaluated using two publicly available reference datasets: a standardized dataset from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) and the imaging arm of the Australian Imaging Biomarkers and Lifestyle flagship study of ageing (AIBL). In addition, the method was evaluated by participation in the Computer-Aided Diagnosis of Dementia (CADDementia) challenge. Cross-validation using ADNI and AIBL data resulted in a multi-class classification accuracy of 62.7% for the discrimination of healthy normal controls (NC), subjects with mild cognitive impairment (MCI), and patients with Alzheimer’s disease (AD). This performance generalized to the CADDementia challenge, where the method, trained using the ADNI and AIBL data, achieved a classification accuracy of 63.0%. The obtained classification accuracy resulted in a first place in the challenge, and the method was significantly better (McNemar’s test) than the bottom 24 methods out of the total of 29 methods contributed by 15 different teams in the challenge. The method was further investigated with learning curve and feature selection experiments using ADNI and AIBL data. The learning curve experiments suggested that neither more training data nor a more complex classifier would have improved the obtained results. The feature selection experiment showed that both common and uncommon individual MRI biomarkers contributed to the performance; hippocampal volume, ventricular volume, hippocampal texture, and parietal lobe thickness were the most important. This study highlights the need for both subtle, localized measurements and global measurements in order to discriminate NC, MCI, and AD simultaneously based on a single structural MRI scan.
It is likely that additional non-structural MRI features are needed to further improve the obtained performance, especially to improve the discrimination between NC and MCI.
Article
Full-text available
In this paper the use of nonlinear cross-diffusion systems to model image restoration is investigated, theoretically and numerically. In the first case, well-posedness, scale-space properties and long-time behaviour are analyzed. From a numerical point of view, a computational study of the performance of the models is carried out, suggesting their diversity and potential to treat image filtering problems. The present paper is a continuation of a previous work of the same authors, devoted to linear cross-diffusion models. Keywords: Cross-diffusion, Complex diffusion, Image restoration.
Chapter
Image matching is a part of many computer vision or image processing applications, such as object recognition, registration, panoramic images and image mosaics, three-dimensional (3D) reconstruction and modeling, stereovision or even indexing and searching for images via content. This chapter examines a general scenario where the amount of visual information is limited and where no prior knowledge is available. The current process of image matching consists of three main stages: detecting the feature points/regions; calculating the descriptors such as Daisy descriptor and multi-scale oriented patches (MOPS) descriptor, in this region, normalized if necessary; and matching the feature points of the two images using their descriptors and estimating the geometric transformation by removing false matches.
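The matching stage of the three-step pipeline described above (detect, describe, match) can be sketched with synthetic descriptors. The nearest-neighbor search with a ratio test below is one standard way to reject ambiguous matches; it is an illustrative assumption, since the chapter describes matching generically, and the descriptor data is fabricated for the example.

```python
import numpy as np

def match_descriptors(d1, d2, ratio=0.8):
    """Nearest-neighbor descriptor matching with a distance-ratio test.

    A match (i, j) is kept only when d2[j] is much closer to d1[i] than
    the second-nearest candidate, which rejects ambiguous matches.
    """
    dists = np.linalg.norm(d1[:, None, :] - d2[None, :, :], axis=2)
    order = np.argsort(dists, axis=1)
    best, second = order[:, 0], order[:, 1]
    rows = np.arange(len(d1))
    keep = dists[rows, best] < ratio * dists[rows, second]
    return [(i, int(best[i])) for i in np.flatnonzero(keep)]

# Hypothetical data: descriptors of the second image are noisy copies of
# those of the first, as if the same features were re-detected.
rng = np.random.default_rng(2)
d1 = rng.random((10, 16))
d2 = d1 + 0.01 * rng.standard_normal((10, 16))
matches = match_descriptors(d1, d2)
print(len(matches))
```

The surviving matches would then feed the final stage of the pipeline, estimating the geometric transformation while removing false matches.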
Conference Paper
This paper investigates the key and difficult issues in stereo measurement, including camera calibration, feature extraction, stereo matching, and depth computation, and then puts forward a novel matching method combining seed region growing and SIFT feature matching. It first uses SIFT characteristics as the matching criterion for feature-point matching, and then takes the feature points as seed points for region growing to obtain better depth information. Experiments are conducted to validate the efficiency of the proposed method using standard matching graphs, and the proposed method is then applied to dimensional measurement of mechanical parts. The results show that the measurement error is less than 0.5 mm for medium-sized mechanical parts, which can meet the demands of precision measurement.
Article
Full-text available
Most tactile sensors are based on the assumption that touch depends on measuring pressure. However, the pressure distribution at the surface of a tactile sensor cannot be acquired directly and must be inferred from the deformation field induced by the touched object in the sensor medium. Currently, there is no consensus as to which components of strain are most informative for tactile sensing. Here, we propose that shape-related tactile information is more suitably recovered from shear strain than normal strain. Based on a contact mechanics analysis, we demonstrate that the elastic behavior of a haptic probe provides a robust edge detection mechanism when shear strain is sensed. We used a jamming-based robot gripper as a tactile sensor to empirically validate that shear strain processing gives accurate edge information that is invariant to changes in pressure, as predicted by the contact mechanics study. This result has implications for the design of effective tactile sensors as well as for the understanding of the early somatosensory processing in mammals. A reliable mapping between the physical world and the acquired data is a basic issue faced by any artificial or natural sensory system. For instance, in vision, a fundamental challenge is to access robustly the geometry of a body from the structure of the captured light intensity, despite variations in the viewing conditions. This task is difficult because the intrinsic geometry of a body is not mapped one-to-one to the geometry of images
Article
The extraction of an ideal age feature is a challenging task in vibration-based bearing remaining useful life (RUL) estimation. To address this problem, a new approach is proposed on the basis of time-frequency representation (TFR) and supervised dimensionality reduction. Firstly, the S transform and a Gaussian pyramid are employed to obtain TFRs at multiple scales. Textural features of the TFRs are used as the high-dimensional features. Then, a two-step supervised dimensionality reduction technique, i.e. principal component analysis (PCA) followed by linear discriminant analysis, is employed to reduce the dimensionality, in which the target dimension and the number of classes are taken as variable parameters. Finally, a simple multiple linear regression model is utilized to estimate the RUL. Experimental results indicate that the proposed approach outperforms methods using traditional statistical features and/or PCA. Additionally, variable conditions of load and speed should be considered in the future to further improve the proposed approach.
Article
In a traditional photoelectric tracking system, a fixed-scale filter is usually used for target detection. The target features, especially the target size, change continuously as the distance between the target and the tracking system changes. As a result, a fixed-scale filter cannot follow the changing target size, and it is difficult for the system to obtain the best extraction results. To address this, a small-target detection method based on adaptive scale is proposed. Based on Laplacian scale-space theory, normalized Laplacian scale-space images are studied, in which the target signal is enhanced while noise is simultaneously suppressed. By finding the best target parameters in both scale and space, small targets at different scales can be detected effectively. Simulation tests and experimental results show that the proposed method remarkably improves the system's detection capability for small targets of different sizes.
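The scale-selection step described above, maximizing the normalized Laplacian response over scales, can be sketched in one dimension. All parameter choices and the toy blob signal below are illustrative assumptions, not taken from the paper:

```python
import math

def gauss_kernel(t, radius):
    # sampled 1-D Gaussian with variance t (a simple discrete approximation)
    k = [math.exp(-x * x / (2.0 * t)) for x in range(-radius, radius + 1)]
    s = sum(k)
    return [v / s for v in k]

def smooth(signal, t):
    # convolve with the Gaussian, replicating values at the borders
    r = max(1, int(4 * math.sqrt(t)))
    k = gauss_kernel(t, r)
    n = len(signal)
    return [sum(w * signal[min(max(i + j - r, 0), n - 1)]
                for j, w in enumerate(k)) for i in range(n)]

def norm_laplacian(signal, t):
    # scale-normalized second difference t * L_xx (gamma = 1)
    L = smooth(signal, t)
    return [t * (L[i - 1] - 2.0 * L[i] + L[i + 1]) for i in range(1, len(L) - 1)]

# a 1-D "blob" of width 8 samples; the maximum over scales of the
# scale-normalized response selects an interior scale matched to its size
signal = [1.0 if 28 <= i < 36 else 0.0 for i in range(64)]
scales = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0]
strengths = [max(abs(v) for v in norm_laplacian(signal, t)) for t in scales]
selected = scales[strengths.index(max(strengths))]
```

The selected scale grows with the width of the blob, which is what lets a detector of this kind adapt to targets of unknown size.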
Article
Texture classification is an important approach for the effective classification of digital images. Extraction of features is a challenging task due to spatial entanglement, orientation mixing and high-frequency overlapping. The partial differential equation (PDE) transform is an efficient method for functional mode decomposition. This paper presents a novel approach to texture modeling based on PDEs. Each image f is decomposed into a family of derived sub-images: f is split into a structure component u, obtained with anisotropic diffusion, and a texture component v, calculated as the difference between the original image and u. The feature set is obtained by applying the local directional binary patterns (LDBP) approach and extracting co-occurrence parameters. The separability of texture classes is enhanced using linear discriminant analysis (LDA). The features obtained from LDA are class representatives. The proposed approach is validated on sixteen Brodatz textures. The k-NN classifier is used for classification. The experimental results indicate that the proposed approach leads to significant improvements in classification accuracy, a reduction in feature dimensionality, and a reduction in computational and time complexity.
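The u + v split described above can be illustrated with a minimal 1-D Perona-Malik scheme, a standard form of anisotropic diffusion; the conductivity parameter k, the step size and the toy signal are illustrative assumptions, not taken from the paper:

```python
def perona_malik_1d(signal, k=0.2, dt=0.2, steps=10):
    # explicit scheme for 1-D edge-preserving diffusion:
    #   u_t = d/dx( c(u_x) * u_x ),  with conductivity c(g) = 1 / (1 + (g/k)^2)
    u = list(signal)
    n = len(u)
    for _ in range(steps):
        grad = [u[i + 1] - u[i] for i in range(n - 1)]       # forward differences
        flux = [g / (1.0 + (g / k) ** 2) for g in grad]      # c(g) * g
        u = [u[i] + dt * ((flux[i] if i < n - 1 else 0.0)
                          - (flux[i - 1] if i > 0 else 0.0)) for i in range(n)]
    return u

# toy signal: a step edge (structure) plus a fine oscillation (texture)
f = [(1.0 if i >= 10 else 0.0) + 0.05 * (-1) ** i for i in range(20)]
u = perona_malik_1d(f)                 # structure: edge kept, oscillation smoothed
v = [a - b for a, b in zip(f, u)]      # texture residual: v = f - u
```

Large gradients (the edge) see a small conductivity and survive, while small texture gradients are diffused away and end up in the residual v.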
Article
The stability of feature matching is a fundamental problem for many robotic tasks, such as visual servoing and navigation. This paper presents a new feature extractor that improves the robustness of feature matching under large scale change. The new extractor consists of a fast and scalable Laplacian-of-Gaussian (LoG) approximator based on a blocky Mexican-hat wavelet, and an optimized sampling distribution for the features in the multi-resolution scale space. The sampling distribution is a critical factor in boosting the matching rate, yet it has not been discussed in depth in recent studies. In the evaluation, the new algorithm is compared with SIFT and SURF and demonstrates a significant improvement in matching rate.
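LoG approximators of this kind build on the classical fact that a difference of Gaussians (DoG) approximates the scale-normalized Laplacian of Gaussian. A 1-D numerical check of that relation (the scale t and the ratio k are illustrative choices):

```python
import math

def gaussian(x, t):
    # 1-D Gaussian with variance t
    return math.exp(-x * x / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)

def log_kernel(x, t):
    # second derivative of the Gaussian (the 1-D Laplacian of Gaussian)
    return (x * x / t - 1.0) / t * gaussian(x, t)

def dog_kernel(x, t, k):
    # difference of Gaussians at variances k*t and t
    return gaussian(x, k * t) - gaussian(x, t)

# since dg/dt = (1/2) d^2g/dx^2 (the heat equation), for k close to 1:
#   DoG(x) ~= 0.5 * (k - 1) * t * LoG(x)
t, k = 4.0, 1.05
xs = [0.1 * i for i in range(-100, 101)]
err = max(abs(dog_kernel(x, t, k) - 0.5 * (k - 1.0) * t * log_kernel(x, t))
          for x in xs)
peak = max(abs(dog_kernel(x, t, k)) for x in xs)
```

The closer k is to 1, the better the approximation; SIFT-style pyramids exploit exactly this relation.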
Article
A data-adaptive multiscale spatial decomposition model is proposed to deal with skew-distributed data (e.g., population or GDP). Relying on the filtering characteristics that follow from the bandwidth of kernel smoothing, parallel spatial kernel smoothing with different bandwidths is constructed as a spatial filter bank for filtering spatial variations at different spatial scales. The filtering residual, a function of the spatial scale, is then extracted by the parallel spatial kernel smoothing. Using a change-point detection model based on the second derivative, the standard deviation of the residual data is used to identify robust significant scales. We then present an iterative algorithm to extract and remove significant spatial variations at different scales. With well-designed stopping criteria, the full hierarchical spatial scale structure of the original spatial process can be established adaptively, without assigning the decomposition levels manually. The computation process and the statistical and spatial distribution characteristics are demonstrated with case studies of 2003 Chinese population data and GDP data, and the results show that the proposed model is suitable for decomposing spatial data with spatial heterogeneity. Comparison with 2D wavelet decomposition suggests that our model has better data-adaptive and shape-preserving ability.
Article
Full-text available
We present an improved model and theory for time-causal and time-recursive spatio-temporal receptive fields, obtained by a combination of Gaussian receptive fields over the spatial domain and first-order integrators, or equivalently truncated exponential filters coupled in cascade, over the temporal domain. Compared to previous spatio-temporal scale-space formulations in terms of non-enhancement of local extrema or scale invariance, these receptive fields are based on different scale-space axiomatics over time, ensuring non-creation of new local extrema or zero-crossings with increasing temporal scale. Specifically, extensions are presented concerning (i) parameterizing the intermediate temporal scale levels, (ii) analysing the resulting temporal dynamics, (iii) transferring the theory to a discrete implementation in terms of recursive filters over time, (iv) computing scale-normalized spatio-temporal derivative expressions for spatio-temporal feature detection and (v) computational modelling of receptive fields in the lateral geniculate nucleus (LGN) and the primary visual cortex (V1) in biological vision. We show how scale-normalized temporal derivatives can be defined for these time-causal scale-space kernels and how the composed theory can be used for computing basic types of scale-normalized spatio-temporal derivative expressions in a computationally efficient manner.
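A minimal sketch of the time-recursive part, assuming a standard first-order update of the form out[n] = out[n-1] + (in[n] - out[n-1]) / (1 + mu); the test signal and the mu values are illustrative, not taken from the paper:

```python
import random

def first_order_integrator(signal, mu):
    # time-recursive smoothing; only one memory cell (prev) is needed,
    # so the filter is causal and suitable for real-time processing
    out, prev = [], 0.0
    for x in signal:
        prev = prev + (x - prev) / (1.0 + mu)
        out.append(prev)
    return out

def count_local_maxima(s):
    return sum(1 for i in range(1, len(s) - 1)
               if s[i] > s[i - 1] and s[i] > s[i + 1])

random.seed(0)
noisy = [random.random() for _ in range(200)]

# a cascade of integrators gives one temporal scale level; coarser levels
# must not create new local extrema over time
smoothed = noisy
for mu in (1.0, 2.0, 4.0):
    smoothed = first_order_integrator(smoothed, mu)
```

Each pass is causal and needs only one state variable per signal, which is what makes such cascades attractive for real-time temporal scale-space.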
Article
Full-text available
We present a theory by which idealized models of auditory receptive fields can be derived in a principled axiomatic manner, from a set of structural properties to (i) enable invariance of receptive field responses under natural sound transformations and (ii) ensure internal consistency between spectro-temporal receptive fields at different temporal and spectral scales. For defining a time-frequency transformation of a purely temporal sound signal, it is shown that the framework allows for a new way of deriving the Gabor and Gammatone filters as well as a novel family of generalized Gammatone filters, with additional degrees of freedom to obtain different trade-offs between the spectral selectivity and the temporal delay of time-causal temporal window functions. When applied to the definition of a second layer of receptive fields from a spectrogram, it is shown that the framework leads to two canonical families of spectro-temporal receptive fields, in terms of spectro-temporal derivatives of either spectro-temporal Gaussian kernels for non-causal time or a cascade of time-causal first-order integrators over the temporal domain and a Gaussian filter over the log-spectral domain. For each filter family, the spectro-temporal receptive fields can be either separable over the time-frequency domain or adapted to local glissando transformations that represent variations in logarithmic frequencies over time. Within each domain of either non-causal or time-causal time, these receptive field families are derived by uniqueness from the assumptions. It is demonstrated how the presented framework allows for computation of basic auditory features for audio processing and that it leads to predictions about auditory receptive fields with good qualitative similarity to biological receptive fields measured in the inferior colliculus (ICC) and primary auditory cortex (A1) of mammals.
Article
Full-text available
Texture is an important feature that aids in identifying objects or regions of interest irrespective of the source of the image. In this paper, a novel and simple isopattern-based texture feature is introduced. Spatial gray-scale dependencies represented by bit planes are analyzed for specific patterns, which are accumulated in bins. These are scaled by a half-normal weighting function to provide the isopattern texture feature. The ability of this texture feature to capture textural variations of images despite the presence of illumination, scale and rotation changes is demonstrated by conducting texture analysis on the Brodatz and OuTex texture datasets and by measuring its classification accuracy on the Kylberg dataset. The results of these experiments indicate that the proposed textural feature picks up variation in texture significantly and achieves a better texture classification accuracy of 98.26% when compared with state-of-the-art features such as Gabor, GLCM and LBP.
Article
Full-text available
Automatic quantification of cardinal histologic features of nonalcoholic fatty liver disease (NAFLD) may reduce human variability and allow continuous rather than semiquantitative assessment of injury. We recently developed an automated classifier that can detect and quantify macrosteatosis with greater than or equal to 95% precision and recall (sensitivity). Here, we report our early results on the classifier's performance in detecting lobular inflammation and hepatocellular ballooning. Automatic quantification of lobular inflammation and ballooning was performed on digital images of hematoxylin and eosin-stained slides of liver biopsy samples from 59 individuals with normal liver histology and varying severity of NAFLD. Two expert hepatopathologists scored liver biopsies according to the nonalcoholic steatohepatitis clinical research network scoring system and provided annotations of lobular inflammation and hepatocyte ballooning on the digital images. The classifier had precision and recall of 70% and 49% for lobular inflammation, and 91% and 54% for hepatocyte ballooning. In addition, the classifier had an area under the curve of 95% for lobular inflammation and 98% for hepatocyte ballooning. The Spearman rank correlation coefficient for comparison with pathologist grades was 45.2% for lobular inflammation and 46% for hepatocyte ballooning. Our novel observations demonstrate that automatic quantification of cardinal NAFLD histologic lesions is feasible and offers promise for further development of automatic quantification as a potential aid to pathologists evaluating NAFLD biopsies in clinical practice and clinical trials.
Article
Motivated by robust principal component analysis, an infrared small-target image is regarded as a low-rank background matrix corrupted by sparse target and noise matrices. A new target-background separation model is accordingly designed, and an adaptive detection method for infrared small targets is presented. Firstly, multi-scale transform and patch transform are used to generate an image patch set for infrared small-target detection; secondly, target-background separation of each patch is achieved by recovering the low-rank and sparse matrices using an adaptive weighting parameter; thirdly, image reconstruction and fusion are carried out to obtain the entire separated background and target images; finally, infrared small-target detection is realized by threshold segmentation of a template-matching similarity measure. To validate the performance of the proposed method, three experiments (target-background separation, background clutter suppression and infrared small-target detection) are performed over different clutter backgrounds with real infrared small targets in single-frame or sequence images. The experimental results demonstrate that the proposed method can not only suppress background clutter effectively, even under strong noise interference, but also detect targets accurately with a low false-alarm rate.
Conference Paper
Keypoint or interest point detection is the first step in many computer vision algorithms. The detection performance of the state-of-the-art detectors is, however, strongly influenced by compression artifacts, especially at low bit rates. In this paper, we design a novel quantization table for the widely-used JPEG compression standard which leads to improved feature detection performance. After analyzing several popular scale-space based detectors, we propose a novel quantization table which is based on the observed impact of scale-space processing on the DCT basis functions. Experimental results show that the novel quantization table outperforms the JPEG default quantization table in terms of feature repeatability, number of correspondences, matching score, and number of correct matches.
Conference Paper
This paper presents a novel method to find corners that are well located and stable interest points in a given image. Our corners are defined as intersection points of non collinear straight image edges, which are very robust against various image transformations like image scaling, rotation, translation and also to viewpoint and illumination changes. Some light updates on the linking edge step that should be applied in order to extract edges and their intersections that construct the searched corners are also discussed. Experiments conducted on real images demonstrate that the proposed method is able to achieve a very good performance in accuracy, stability and especially computational efficiency in comparison with existing methods.
Conference Paper
Full-text available
Detection and analysis of tables in document images has been one of the most researched topics in document image processing. In this study, we define novel methods for the detection and analysis of tables in document images and show their performance on realistic table examples. The main method developed is projection-scale-space (PSS), in which local and global constraints of the table are analyzed row by row for consistency. PSS is robust to the character set used in a document, the image resolution and the noise ratio of a document image, and can perform detection in a highly effective manner. Furthermore, the proposed method works on tables with and without borders and is able to analyze the rows and columns of tables. The proposed method has been tested on a dataset of 105 documents containing 130 tables, and the system's high performance has been demonstrated quantitatively.
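While PSS itself is specific to this work, the underlying projection-profile idea can be sketched as follows; the toy binary image and the threshold are illustrative assumptions:

```python
def horizontal_projection(image):
    # image: list of rows of 0/1 pixels (1 = ink); profile[r] = ink count in row r
    return [sum(row) for row in image]

def text_rows(profile, threshold=1):
    # group consecutive rows whose projection reaches the threshold
    rows, start = [], None
    for i, v in enumerate(profile):
        if v >= threshold and start is None:
            start = i
        elif v < threshold and start is not None:
            rows.append((start, i - 1))
            start = None
    if start is not None:
        rows.append((start, len(profile) - 1))
    return rows

# toy "document": two ink bands separated by white space
image = [[0] * 20 for _ in range(12)]
for r in (2, 3, 7, 8, 9):
    for c in range(4, 16):
        image[r][c] = 1
bands = text_rows(horizontal_projection(image))
# bands == [(2, 3), (7, 9)]
```

The same profile, taken column-wise, yields candidate column separators; analysing such profiles at several smoothing scales is the kind of consistency check a projection-scale-space method builds on.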
Article
This paper proposes a novel family of local feature descriptors, a variant of the speeded-up robust features (SURF) descriptor, which is capable of demonstrably better performance. The conventional SURF descriptor is an efficient implementation of the SIFT descriptor. Although the SURF descriptor can represent the nature of the underlying image pattern, it is still sensitive to more complicated deformations such as large viewpoint and rotation changes. To solve this problem, our family of descriptors, called MDGHM-SURF, is based on the modified discrete Gaussian-Hermite moment (MDGHM), which uses a movable mask to represent the local feature information of non-square images. Whereas conventional SURF uses first-order derivatives, MDGHM-SURF uses the MDGHM, which carries more feature information than first-order derivative-based local descriptors such as SURF and SIFT. Consequently, by redefining the conventional SURF descriptor using the MDGHM, MDGHM-SURF can extract more distinctive features than conventional SURF. The results of evaluations conducted with six types of deformations indicate that our proposed method outperforms other SURF-related algorithms in matching accuracy.
Conference Paper
Smooth gradient distributions are obtained through an explicit filtering technique that complements an adjoint Navier-Stokes method in the framework of CAD-free shape optimisation. The gradients of the objective functional are obtained through a continuous adjoint method that uses consistent discretisation schemes devised based on the primal discretisation. After a verification study based on the direct-differentiation method, the approach is used to optimise the shape of ducts for incompressible flow. The suggested filtering approach is shown to be first-order equivalent to the well-established smoothing based on so-called Sobolev gradients.
Article
Full-text available
The performance of matching and object recognition methods based on interest points depends on both the properties of the underlying interest points and the choice of associated image descriptors. This paper demonstrates advantages of using generalized scale-space interest point detectors in this context for selecting a sparse set of points for computing image descriptors for image-based matching. For detecting interest points at any given scale, we make use of the Laplacian ∇²L, the determinant of the Hessian det HL, and four new unsigned or signed Hessian feature strength measures D₁L, D̃₁L, D₂L and D̃₂L, which are defined by generalizing the definitions of the Harris and Shi-and-Tomasi operators from the second moment matrix to the Hessian matrix. Then, feature selection over different scales is performed either by scale selection from local extrema over scale of scale-normalized derivatives or by linking features over scale into feature trajectories and computing a significance measure from an integrated measure of normalized feature strength over scale. A theoretical analysis is presented of the robustness of the differential entities underlying these interest points under image deformations, in terms of invariance properties under affine image deformations or approximations thereof. Disregarding the effect of the rotationally symmetric scale-space smoothing operation, the determinant of the Hessian det HL is a truly affine covariant differential entity, and the Hessian feature strength measures D₁L and D̃₁L have a major contribution from the affine covariant determinant of the Hessian, implying that local extrema of these differential entities will be more robust under affine image deformations than local extrema of the Laplacian operator or of the Hessian feature strength measures D₂L and D̃₂L.
It is shown how these generalized scale-space interest points allow for a higher ratio of correct matches and a lower ratio of false matches compared to previously known interest point detectors within the same class. The best results are obtained using interest points computed with scale linking and with the new Hessian feature strength measures D₁L, D̃₁L and the determinant of the Hessian det HL as the differential entities that lead to the best matching performance under perspective image transformations with significant foreshortening, better than the more commonly used Laplacian operator, its difference-of-Gaussians approximation or the Harris-Laplace operator. We propose that these generalized scale-space interest points, when accompanied by associated local scale-invariant image descriptors, should allow for better performance of interest point based methods for image-based matching, object recognition and related visual tasks.
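Two of the differential entities used for interest point detection, the Laplacian and the determinant of the Hessian, can be estimated from a smoothed image by central differences. A minimal sketch, with an illustrative toy blob (the values are not from the paper):

```python
def hessian_measures(L, i, j):
    # central-difference estimates of the Hessian of a smoothed image L at (i, j)
    Lxx = L[i][j + 1] - 2.0 * L[i][j] + L[i][j - 1]
    Lyy = L[i + 1][j] - 2.0 * L[i][j] + L[i - 1][j]
    Lxy = (L[i + 1][j + 1] - L[i + 1][j - 1]
           - L[i - 1][j + 1] + L[i - 1][j - 1]) / 4.0
    laplacian = Lxx + Lyy                      # trace of the Hessian
    det_hessian = Lxx * Lyy - Lxy * Lxy       # determinant of the Hessian
    return laplacian, det_hessian

# a small bright blob: both measures respond strongly at its centre
L = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.2, 0.5, 0.2, 0.0],
    [0.0, 0.5, 1.0, 0.5, 0.0],
    [0.0, 0.2, 0.5, 0.2, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]
lap, det_h = hessian_measures(L, 2, 2)
```

At a bright blob centre the Laplacian is strongly negative while the determinant of the Hessian is positive; saddle-like structures instead give a negative determinant.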
Article
Full-text available
The α scale space is a recent theory that opens new possibilities for phase-based image processing. It is a parameterised class (α ∈ ]0,1]) of linear scale-space representations, which allows a continuous connection beyond the well-known Gaussian scale space (α = 1). In this paper, we make use of this unified representation to derive new families of band-pass quadrature filters, built from derivatives and differences of the α scale-space generating kernels. This construction leads to generalised α kernel filters, including the commonly known families derived from the Gaussian and Poisson kernels. The properties of each family are first studied, and then experiments on one- and two-dimensional signals are shown to exemplify how the suggested filters can be used for edge detection. This work is complemented by an experimental evaluation, which demonstrates that the newly proposed filters are a good alternative to the commonly used log-Gabor filter.
Chapter
Full-text available
The notion of scale selection refers to methods for estimating characteristic scales in image data and for automatically determining locally appropriate scales in a scale-space representation, so as to adapt subsequent processing to the local image structure and compute scale-invariant image features and image descriptors. An essential aspect of the approach is that it allows for a bottom-up determination of inherent scales of features and objects without first recognizing them or delimiting, or alternatively segmenting, them from their surroundings. Scale selection methods have also been developed from other viewpoints, such as performing noise suppression and exploring top-down information.
Article
A novel spatial filtering method is presented to detect small targets in a cluttered background for infrared search and track (IRST). The filtering process consists of localised directional Laplacian-of-Gaussian (LoG) filtering followed by minimum selection, which removes false detections around cloud edges while maintaining small-target detection capability. Experimental results validate the feasibility of the proposed method (called min-local-LoG).
Article
Automated assessment of histological features of non-alcoholic fatty liver disease (NAFLD) may reduce human variability and provide continuous rather than semiquantitative measurement of these features. As part of a larger effort, we perform automatic classification of steatosis, the cardinal feature of NAFLD, and other regions that manifest as white in images of hematoxylin and eosin-stained liver biopsy sections. These regions include macrosteatosis, central veins, portal veins, portal arteries, sinusoids and bile ducts. Digital images of hematoxylin and eosin-stained slides of 47 liver biopsies from patients with normal liver histology (n = 20) and NAFLD (n = 27) were obtained at 20× magnification. The images were analyzed using supervised machine learning classifiers created from annotations provided by two expert pathologists. The classification algorithm performs with 89% overall accuracy. It identified macrosteatosis, bile ducts, portal veins and sinusoids with high precision and recall (≥82%). Identification of central veins and portal arteries was less robust but still good. The accuracy of the classifier in identifying macrosteatosis is the best reported. The accurate automated identification of macrosteatosis achieved with this algorithm has useful clinical and research-related applications. The accurate detection of liver microscopic anatomical landmarks may facilitate important subsequent tasks, such as localization of other histological lesions according to liver microscopic anatomy.
Conference Paper
We propose a method to perform automatic detection of electrophysiology (EP) catheters in fluoroscopic sequences. Our approach does not need any initialization, is completely automatic, and can detect an arbitrary number of catheters at the same time. The method is based on the usage of blob detectors and clustering in order to detect all catheter electrodes, overlapping or not, within the X-ray images. The proposed technique is validated on 1422 fluoroscopic images, yielding a tip detection rate of 99.3% and a mean distance of 0.5 mm from manually labeled ground-truth centroids for all electrodes.
Article
The Gaussian scale-space is a singular integral convolution operator with a scaled Gaussian kernel. For a large class of singular integral convolution operators with differentiable kernels, a general method for constructing mother wavelets for continuous wavelet transforms is developed, and Calderón-type inversion formulas, in both integral and semi-discrete forms, are derived for functions in L^p spaces. In the case of the Gaussian scale-space, the semi-discrete inversion formula can further be expressed as a sum of wavelet transforms with the even-order derivatives of the Gaussian as mother wavelets. Similar results are obtained for the B-spline scale-space, in which the high-frequency component of a function between two consecutive dyadic scales can be represented as a finite linear combination of wavelet transforms with the derivatives of the B-spline or the spline framelets of Ron and Shen as mother wavelets.
Article
As schlieren image data quantity has increased with faster frame rates, we are now faced with literally thousands of images to analyze. This presents an opportunity to study global flow structures over time that may not be evident from traditional surface measurements. Oblique structures, such as shock waves and contact surfaces, which give critical flowfield information, are common in many of these images. As data sets have become large, a degree of automation is desirable to extract these features to derive information on their behavior through the sequence. This paper employs a methodology based on computer vision techniques to provide an empirical estimate of oblique structure angles through an unsteady sequence. The methodology has been applied to a complex flowfield with multiple shock structures in a small region of interest (88 × 128 pixels). This study obtains converged detection success rates of 94% and 97% for these structures and shows that computer vision techniques can be effective for the evaluation of optical data sets.
Article
Full-text available
A receptive field constitutes a region in the visual field where a visual cell or a visual operator responds to visual stimuli. This paper presents a theory for what types of receptive field profiles can be regarded as natural for an idealized vision system, given a set of structural requirements on the first stages of visual processing that reflect symmetry properties of the surrounding world. These symmetry properties include (i) covariance properties under scale changes, affine image deformations, and Galilean transformations of space-time as occur for real-world image data as well as specific requirements of (ii) temporal causality implying that the future cannot be accessed and (iii) a time-recursive updating mechanism of a limited temporal buffer of the past as is necessary for a genuine real-time system. Fundamental structural requirements are also imposed to ensure (iv) mutual consistency and a proper handling of internal representations at different spatial and temporal scales. It is shown how a set of families of idealized receptive field profiles can be derived by necessity regarding spatial, spatio-chromatic, and spatio-temporal receptive fields in terms of Gaussian kernels, Gaussian derivatives, or closely related operators. Such image filters have been successfully used as a basis for expressing a large number of visual operations in computer vision, regarding feature detection, feature classification, motion estimation, object recognition, spatio-temporal recognition, and shape estimation. Hence, the associated so-called scale-space theory constitutes a both theoretically well-founded and general framework for expressing visual operations. There are very close similarities between receptive field profiles predicted from this scale-space theory and receptive field profiles found by cell recordings in biological vision. 
Among the family of receptive field profiles derived by necessity from the assumptions, idealized models with very good qualitative agreement are obtained for (i) spatial on-center/off-surround and off-center/on-surround receptive fields in the fovea and the LGN, (ii) simple cells with spatial directional preference in V1, (iii) spatio-chromatic double-opponent neurons in V1, (iv) space-time separable spatio-temporal receptive fields in the LGN and V1, and (v) non-separable space-time tilted receptive fields in V1, all within the same unified theory. In addition, the paper presents a more general framework for relating and interpreting these receptive fields conceptually and possibly predicting new receptive field profiles as well as for pre-wiring covariance under scaling, affine, and Galilean transformations into the representations of visual stimuli. This paper describes the basic structure of the necessity results concerning receptive field profiles regarding the mathematical foundation of the theory and outlines how the proposed theory could be used in further studies and modelling of biological vision. It is also shown how receptive field responses can be interpreted physically, as the superposition of relative variations of surface structure and illumination variations, given a logarithmic brightness scale, and how receptive field measurements will be invariant under multiplicative illumination variations and exposure control mechanisms.
Article
Gaussian filtering is a basic tool for image processing. Noise reduction, scale-space generation and edge detection are examples of tasks where different Gaussian filters can be successfully utilized. However, their implementation in a conventional digital processor by applying a convolution kernel throughout the image is quite inefficient: not only is the value of every single pixel considered successively, but contributions from its neighbors also need to be taken into account. Processing of the frame is serialized, and memory access is intensive and recurrent. The result is a low operation speed or, alternatively, a high power consumption. This inefficiency is especially remarkable for filters with large variance, as the kernel size increases significantly. In this paper, a different approach to Gaussian filtering is proposed, oriented to applications with very low power budgets. The key point is a reconfigurable focal-plane binning. Pixels are grouped according to the targeted resolution by means of a division grid. Two consecutive shifts of this grid in opposite directions then spread information to the neighborhood of each pixel in parallel. The outcome is equivalent to applying a 3×3 binomial filter kernel, which in turn is a good approximation of a Gaussian filter, to the original image. The variance of the closest Gaussian filter is around 0.5. By repeating the operation, Gaussian filters with larger variances can be achieved. A rough estimation of the necessary energy for each repetition until reaching the desired filter is below 20 nJ for a QCIF-size array. Finally, experimental results from a QCIF proof-of-concept focal-plane array manufactured in 0.35 μm CMOS technology are presented. A maximum RMSE of only 1.2% is obtained by the on-chip Gaussian filtering with respect to the corresponding equivalent ideal filter implemented off-chip.
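The repeated binomial smoothing performed on the focal plane can be emulated in software with separable [1, 2, 1]/4 passes; the impulse test image below is illustrative:

```python
def binomial3x3(image):
    # one pass of the separable [1, 2, 1]/4 kernel in each direction, i.e. a
    # 3x3 binomial kernel with per-axis variance 0.5; n repeated passes
    # approximate a Gaussian of per-axis variance n/2
    h, w = len(image), len(image[0])

    def at(img, i, j):  # replicate values at the borders
        return img[min(max(i, 0), h - 1)][min(max(j, 0), w - 1)]

    tmp = [[(at(image, i, j - 1) + 2.0 * at(image, i, j) + at(image, i, j + 1)) / 4.0
            for j in range(w)] for i in range(h)]
    return [[(at(tmp, i - 1, j) + 2.0 * at(tmp, i, j) + at(tmp, i + 1, j)) / 4.0
             for j in range(w)] for i in range(h)]

impulse = [[0.0] * 7 for _ in range(7)]
impulse[3][3] = 1.0
out = binomial3x3(impulse)
# centre weight is (2/4) * (2/4) = 0.25, and the total mass is preserved
```

Repeating the call accumulates variance additively, mirroring the chip's strategy of reaching larger Gaussian variances by iteration.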
Article
Full-text available
The brain is able to maintain a stable perception although the visual stimuli vary substantially on the retina due to geometric transformations and lighting variations in the environment. This paper presents a theory for achieving basic invariance properties already at the level of receptive fields. Specifically, the presented framework comprises (i) local scaling transformations caused by objects of different size and at different distances to the observer, (ii) locally linearized image deformations caused by variations in the viewing direction in relation to the object, (iii) locally linearized relative motions between the object and the observer and (iv) local multiplicative intensity transformations caused by illumination variations. The receptive field model can be derived by necessity from symmetry properties of the environment and leads to predictions about receptive field profiles in good agreement with receptive field profiles measured by cell recordings in mammalian vision. Indeed, the receptive field profiles in the retina, LGN and V1 are close to ideal with respect to what is motivated by the idealized requirements. By complementing receptive field measurements with selection mechanisms over the parameters in the receptive field families, it is shown how true invariance of receptive field responses can be obtained under scaling transformations, affine transformations and Galilean transformations. Thereby, the framework provides a mathematically well-founded and biologically plausible model for how basic invariance properties can be achieved already at the level of receptive fields and support invariant recognition of objects and events under variations in viewpoint, retinal size, object motion and illumination.
The theory can explain the different shapes of receptive field profiles found in biological vision, which are tuned to different sizes and orientations in the image domain as well as to different image velocities in space-time, from a requirement that the visual system should be invariant to the natural types of image transformations that occur in its environment.
Chapter
Full-text available
So far we have been concerned with the theory of scale-space representation and its application to feature detection in image data. A basic functionality of a computer vision system, however, is the ability to derive information about the three-dimensional shape of objects in the world.
Book
Full-text available
Chapter
Full-text available
The notion of multi-scale representation is essential to many aspects of early visual processing. This article deals with the axiomatic formulation of the special type of multi-scale representation known as scale-space representation. Specifically, this work is concerned with the problem of how different choices of basic assumptions (scale-space axioms) restrict the class of permissible smoothing operations. A scale-space formulation previously expressed for discrete signals is adapted to the continuous domain. The basic assumptions are that the scale-space family should be generated by convolution with a one-parameter family of rotationally symmetric smoothing kernels that satisfy a semi-group structure and obey a causality condition expressed as a non-enhancement requirement of local extrema. Under these assumptions, it is shown that the smoothing kernel is uniquely determined to be a Gaussian. Relations between this scale-space formulation and recent formulations based on scale invariance are explained in detail. Connections are also pointed out to approaches based on non-uniform smoothing.
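The semi-group structure this abstract refers to can be checked numerically: cascading two Gaussian filters is equivalent to a single Gaussian whose variance is the sum of the two, which is what lets coarse scales be computed by further smoothing of finer scales. A small sketch (our own illustration, using SciPy's standard Gaussian filter):

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

# Semi-group property: smoothing to scale sigma1 and then sigma2
# equals a single smoothing with sigma = sqrt(sigma1^2 + sigma2^2).
rng = np.random.default_rng(0)
signal = rng.standard_normal(512)

two_step = gaussian_filter1d(gaussian_filter1d(signal, sigma=2.0), sigma=1.5)
one_step = gaussian_filter1d(signal, sigma=(2.0**2 + 1.5**2) ** 0.5)

# Deviation stems only from kernel truncation and is very small.
max_dev = float(np.max(np.abs(two_step - one_step)))
```

This additivity of variances is exactly the one-parameter semi-group structure that, together with the non-enhancement condition, singles out the Gaussian.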
Conference Paper
Full-text available
In this paper, we present a novel scale- and rotation-invariant interest point detector and descriptor, coined SURF (Speeded Up Robust Features). It approximates or even outperforms previously proposed schemes with respect to repeatability, distinctiveness, and robustness, yet can be computed and compared much faster. This is achieved by relying on integral images for image convolutions; by building on the strengths of the leading existing detectors and descriptors (in casu, using a Hessian matrix-based measure for the detector, and a distribution-based descriptor); and by simplifying these methods to the essential. This leads to a combination of novel detection, description, and matching steps. The paper presents experimental results on a standard evaluation set, as well as on imagery obtained in the context of a real-life object recognition application. Both show SURF’s strong performance.
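The integral images that make SURF's box-filter convolutions fast can be sketched in a few lines: once the summed-area table is built, the sum over any rectangle costs four lookups regardless of its size. This is a minimal illustration of the data structure (our own sketch, not the SURF implementation):

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero first row/column for easy indexing."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def box_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] from four lookups, independent of box size."""
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]
```

SURF exploits this to evaluate its Hessian-based box-filter approximations at any scale in constant time per pixel.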
Article
Full-text available
This paper defines a multiple resolution representation for the two-dimensional gray-scale shapes in an image. This representation is constructed by detecting peaks and ridges in the difference of lowpass (DOLP) transform. Descriptions of shapes which are encoded in this representation may be matched efficiently despite changes in size, orientation, or position. Motivations for a multiple resolution representation are presented first, followed by the definition of the DOLP transform. Techniques are then presented for encoding a symbolic structural description of forms from the DOLP transform. This process involves detecting local peaks and ridges in each bandpass image and in the entire three-dimensional space defined by the DOLP transform. Linking adjacent peaks in different bandpass images gives a multiple resolution tree which describes shape. Peaks which are local maxima in this tree provide landmarks for aligning, manipulating, and matching shapes. Detecting and linking the ridges in each DOLP bandpass image provides a graph which links peaks within a shape in a bandpass image and describes the positions of the boundaries of the shape at multiple resolutions. Detecting and linking the ridges in the DOLP three-space describes elongated forms and links the largest peaks in the tree. The principles for determining the correspondence between symbols in pairs of such descriptions are then described. Such correspondence matching is shown to be simplified by using the correspondence at lower resolutions to constrain the possible correspondence at higher resolutions.
Conference Paper
Full-text available
This paper describes an extension of a technique for the recognition and tracking of everyday objects in cluttered scenes. The goal is to build a system in which ordinary desktop objects serve as physical icons in a vision-based system for man-machine interaction. In such a system, the manipulation of objects replaces user commands. A view-variant recognition technique, developed by the second author, has been adapted by the first author for a problem of recognising and tracking objects on a cluttered background in the presence of occlusions. This method is based on sampling a local appearance function at discrete viewpoints by projecting it onto a vector of receptive fields which have been normalised to local scale and orientation. This paper reports on the experimental validation of the approach, and of its extension to the use of receptive fields based on colour. The experimental results indicate that the second author's technique does indeed provide a method for building a fast and robust recognition technique. Furthermore, the extension to coloured receptive fields provides a greater degree of local discrimination and an enhanced robustness to variable background conditions. The approach is suitable for the recognition of general objects as physical icons in an augmented reality.
Article
Full-text available
In this paper we propose a novel approach for detecting interest points invariant to scale and affine transformations. Our scale and affine invariant detectors are based on the following recent results: (1) Interest points extracted with the Harris detector can be adapted to affine transformations and give repeatable results (geometrically stable). (2) The characteristic scale of a local structure is indicated by a local extremum over scale of normalized derivatives (the Laplacian). (3) The affine shape of a point neighborhood is estimated based on the second moment matrix.
Article
Full-text available
Physiological evidence is presented that visual receptive fields in the primate eye are shaped like the sum of a Gaussian function and its Laplacian. A new 'difference-of-offset-Gaussians' or DOOG neural mechanism was identified, which provided a plausible neural mechanism for generating such Gaussian derivative-like fields. The DOOG mechanism and the associated Gaussian derivative model provided a better approximation to the data than did the Gabor or other competing models. A model-free Wiener filter analysis provided independent confirmation of these results. A machine vision system was constructed to simulate human foveal retinal vision, based on Gaussian derivative filters. It provided edge and line enhancement (deblurring) and noise suppression, while retaining all the information in the original image.
Article
Full-text available
The relative efficiency of any particular image-coding scheme should be defined only in relation to the class of images that the code is likely to encounter. To understand the representation of images by the mammalian visual system, it might therefore be useful to consider the statistics of images from the natural environment (i.e., images with trees, rocks, bushes, etc). In this study, various coding schemes are compared in relation to how they represent the information in such natural images. The coefficients of such codes are represented by arrays of mechanisms that respond to local regions of space, spatial frequency, and orientation (Gabor-like transforms). For many classes of image, such codes will not be an efficient means of representing information. However, the results obtained with six natural images suggest that the orientation and the spatial-frequency tuning of mammalian simple cells are well suited for coding the information in such images if the goal of the code is to convert higher-order redundancy (e.g., correlation between the intensities of neighboring pixels) into first-order redundancy (i.e., the response distribution of the coefficients). Such coding produces a relatively high signal-to-noise ratio and permits information to be transmitted with only a subset of the total number of cells. These results support Barlow’s theory that the goal of natural vision is to represent the information in the natural environment with minimal redundancy.
Article
Full-text available
Quantification of the degree of stenosis or vessel dimensions are important for diagnosis of vascular diseases and planning vascular interventions. Although diagnosis from three-dimensional (3-D) magnetic resonance angiograms (MRA's) is mainly performed on two-dimensional (2-D) maximum intensity projections, automated quantification of vascular segments directly from the 3-D dataset is desirable to provide accurate and objective measurements of the 3-D anatomy. A model-based method for quantitative 3-D MRA is proposed. Linear vessel segments are modeled with a central vessel axis curve coupled to a vessel wall surface. A novel image feature to guide the deformation of the central vessel axis is introduced. Subsequently, concepts of deformable models are combined with knowledge of the physics of the acquisition technique to accurately segment the vessel wall and compute the vessel diameter and other geometrical properties. The method is illustrated and validated on a carotid bifurcation phantom, with ground truth and medical experts as comparisons. Also, results on 3-D time-of-flight (TOF) MRA images of the carotids are shown. The approach is a promising technique to assess several geometrical vascular parameters directly on the source 3-D images, providing an objective mechanism for stenosis grading.
Conference Paper
Full-text available
We describe and demonstrate a texture region descriptor which is invariant to affine geometric and photometric transformations, and insensitive to the shape of the texture region. It is applicable to texture patches which are locally planar and have stationary statistics. The novelty of the descriptor is that it is based on statistics aggregated over the region, resulting in richer and more stable descriptors than those computed at a point. Two texture matching applications of this descriptor are demonstrated: (1) it is used to automatically identify regions of the same type of texture, but with varying surface pose, within a single image; (2) it is used to support wide baseline stereo, i.e. to enable the automatic computation of the epipolar geometry between two images acquired from quite separated viewpoints. Results are presented on several sets of real images.
Conference Paper
Full-text available
We present a robust method for automatically matching features in images corresponding to the same physical point on an object seen from two arbitrary viewpoints. Unlike conventional stereo matching approaches we assume no prior knowledge about the relative camera positions and orientations. In fact in our application this is the information we wish to determine from the image feature matches. Features are detected in two or more images and characterised using affine texture invariants. The problem of window effects is explicitly addressed by our method: our feature characterisation is invariant to linear transformations of the image data including rotation, stretch and skew. The feature matching process is optimised for a structure-from-motion application where we wish to ignore unreliable matches at the expense of reducing the number of feature matches.
Conference Paper
Full-text available
This paper describes a probabilistic object recognition technique which does not require correspondence matching of images. This technique is an extension of our earlier work (1996) on object recognition using matching of multi-dimensional receptive field histograms. In the earlier paper we have shown that multi-dimensional receptive field histograms can be matched to provide object recognition which is robust in the face of changes in viewing position and independent of image plane rotation and scale. In this paper we extend this method to compute the probability of the presence of an object in an image. The paper begins with a review of the method and previously presented experimental results. We then extend the method for histogram matching to obtain a genuine probability of the presence of an object. We present experimental results on a database of 100 objects showing that the approach is capable of recognizing all objects correctly by using only a small portion of the image. Our results show that receptive field histograms provide a technique for object recognition which is robust, has low computational cost and a computational complexity which is linear with the number of pixels.
Conference Paper
Full-text available
When extracting features from image data, the type of information that can be extracted may be strongly dependent on the scales at which the feature detectors are applied. This article presents a systematic methodology for addressing this problem. A mechanism is presented for automatic selection of scale levels when detecting one-dimensional features, such as edges and ridges. A novel concept of a scale-space edge is introduced, defined as a connected set of points in scale-space at which: (i) the gradient magnitude assumes a local maximum in the gradient direction, and (ii) a normalized measure of the strength of the edge response is locally maximal over scales. An important property of this definition is that it allows the scale levels to vary along the edge. Two specific measures of edge strength are analysed in detail. It is shown that by expressing these in terms of γ-normalized derivatives, an immediate consequence of this definition is that fine scales are selected for sharp edges (so as to reduce the shape distortions due to scale-space smoothing), whereas coarse scales are selected for diffuse edges, such that an edge model constitutes a valid abstraction of the intensity profile across the edge. With slight modifications, this idea can be used for formulating a ridge detector with automatic scale selection, having the characteristic property that the selected scales on a scale-space ridge instead reflect the width of the ridge
Article
Full-text available
The appearance of an object is composed of local structure. This local structure can be described and characterized by a vector of local features measured by local operators such as Gaussian derivatives or Gabor filters. This article presents a technique where appearances of objects are represented by the joint statistics of such local neighborhood operators. As such, this represents a new class of appearance based techniques for computer vision. Based on joint statistics, the paper develops techniques for the identification of multiple objects at arbitrary positions and orientations in a cluttered scene. Experiments show that these techniques can identify over 100 objects in the presence of major occlusions. Most remarkably, the techniques have low complexity and therefore run in real-time. 1. Introduction The paper proposes a framework for the statistical representation of the appearance of arbitrary 3D objects. This representation consists of a probability density function or jo...
Book
Full-text available
A basic problem when deriving information from measured data, such as images, originates from the fact that objects in the world, and hence image structures, exist as meaningful entities only over certain ranges of scale. "Scale-Space Theory in Computer Vision" describes a formal theory for representing the notion of scale in image data, and shows how this theory applies to essential problems in computer vision such as computation of image features and cues to surface shape. The subjects range from the mathematical foundation to practical computational techniques. The power of the methodology is illustrated by a rich set of examples. This book is the first monograph on scale-space theory. It is intended as an introduction, reference, and inspiration for researchers, students, and system designers in computer vision as well as related fields such as image processing, photogrammetry, medical image analysis, and signal processing in general.
Article
Full-text available
Since the pioneering work by Witkin (1983) and Koenderink (1984) on the notion of "scale-space representation", a large number of different scale-space formulations have been stated, based on different types of assumptions (usually referred to as scale-space axioms). The main subject of this chapter is to provide a synthesis between these linear scale-space formulations and to show how they are related. Another aim is to show how the scale-space formulations, which were originally expressed for continuous data on spatial domains without preferred directions, can be extended to discrete data as well as to spatio-temporal domains with preferred directions. Connections will also be pointed out to approaches
Article
Full-text available
The fact that objects in the world appear in different ways depending on the scale of observation has important implications if one aims at describing them. It shows that the notion of scale is of utmost importance when processing unknown measurement data by automatic methods. In their seminal works, Witkin (1983) and Koenderink (1984) proposed to approach this problem by representing image structures at different scales in a so-called scale-space representation. Traditional scale-space theory building on this work, however, does not address the problem of how to select appropriate local scales for further analysis. This article proposes a systematic approach for dealing with this problem: a heuristic principle is presented stating that local extrema over scales of different combinations of gamma-normalized derivatives are likely candidates to correspond to interesting structures. Specifically, it is proposed that this idea can be used as a major mechanism in algorithms for automatic scale selection, which adapt the local scales of processing to the local image structure. Support is given in terms of a general theoretical investigation of the behaviour of the scale selection method under rescalings of the input pattern and by experiments on real-world and synthetic data. Support is also given by a detailed analysis of how different types of feature detectors perform when integrated with a scale selection mechanism and then applied to characteristic model patterns. Specifically, it is described in detail how the proposed methodology applies to the problems of blob detection, junction detection, edge detection, ridge detection and local frequency estimation.
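The scale selection principle summarized in this abstract can be illustrated with the classic blob-detection case: the scale-normalized Laplacian t·∇²L (with t = σ², γ = 1) attains an extremum over scales at the scale matching the blob. A minimal sketch under those assumptions (our own illustration; function names are ours, not from the paper):

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

def characteristic_scale(img, y, x, sigmas):
    """Return the sigma maximizing the scale-normalized Laplacian
    magnitude  sigma^2 * |Laplacian of Gaussian|  at pixel (y, x)."""
    responses = [s**2 * np.abs(gaussian_laplace(img, s))[y, x] for s in sigmas]
    return sigmas[int(np.argmax(responses))]

# For a Gaussian blob of standard deviation s0, the extremum over
# scales occurs at sigma = s0; for a binary disc of radius r it
# occurs near sigma = r / sqrt(2).
```

Selecting the scale of the extremum makes the detector respond at a scale proportional to the blob size, which is what yields scale invariance.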
Article
Full-text available
In this paper we present a robust method for automatically matching features in images corresponding to the same physical point on an object seen from two arbitrary viewpoints. Unlike conventional stereo matching approaches we assume no prior knowledge about the relative camera positions and orientations. In fact in our application this is the information we wish to determine from the image feature matches. Features are detected in two or more images and characterised using affine texture invariants. The problem of window effects is explicitly addressed by our method - our feature characterisation is invariant to linear transformations of the image data including rotation, stretch and skew. The feature matching process is optimised for a structure-from-motion application where we wish to ignore unreliable matches at the expense of reducing the number of feature matches.
Article
Full-text available
When computing descriptors of image data, the type of information that can be extracted may be strongly dependent on the scales at which the image operators are applied. This article presents a systematic methodology for addressing this problem. A mechanism is presented for automatic selection of scale levels when detecting one-dimensional image features, such as edges and ridges. A novel concept of a scale-space edge is introduced, defined as a connected set of points in scale-space at which: (i) the gradient magnitude assumes a local maximum in the gradient direction, and (ii) a normalized measure of the strength of the edge response is locally maximal over scales. An important consequence of this definition is that it allows the scale levels to vary along the edge. Two specific measures of edge strength are analysed in detail, the gradient magnitude and a differential expression derived from the third-order derivative in the gradient direction. For a certain way of normalizing these ...
Chapter
We describe a technique for image encoding in which local operators of many scales but identical shape serve as the basis functions. The representation differs from established techniques in that the code elements are localized in spatial frequency as well as in space. Pixel-to-pixel correlations are first removed by subtracting a low-pass filtered copy of the image from the image itself. The result is a net data compression since the difference, or error, image has low variance and entropy, and the low-pass filtered image may be represented at reduced sample density. Further data compression is achieved by quantizing the difference image. These steps are then repeated to compress the low-pass image. Iteration of the process at appropriately expanded scales generates a pyramid data structure. The encoding process is equivalent to sampling the image with Laplacian operators of many scales. Thus, the code tends to enhance salient image features. A further advantage of the present code is that it is well suited for many image analysis tasks as well as for image compression. Fast algorithms are described for coding and decoding.
Article
An abstract is not available.
Thesis
In this work a scale-space framework has been presented which does not require any monotony assumption (comparison principle). We have seen that, besides the fact that many global smoothing scale-space properties are maintained, new possibilities with respect to image restoration appear. Rather than deducing a unique equation from first principles, we have analyzed well-posedness and scale-space properties of a general family of regularized anisotropic diffusion filters. Existence and uniqueness results, continuous dependence of the solution on the initial image, maximum-minimum principles, invariances, Lyapunov functionals, and convergence to a constant steady state have been established. The large class of Lyapunov functionals makes it possible to regard these filters in numerous ways as simplifying, information-reducing transformations. These global smoothing properties do not contradict seemingly opposite local effects such as edge enhancement. For this reason it is possible to design scale-spaces with restoration properties giving segmentation-like results. Prerequisites have been stated under which one can prove well-posedness and scale-space results in the continuous, semidiscrete and discrete settings. Each of these frameworks stands on its own and does not require the others. On the other hand, the prerequisites in all three settings reveal many similarities and, as a consequence, representatives of the semidiscrete class can be obtained by suitable spatial discretizations of the continuous class, while representatives of the discrete class may arise from time discretizations of semidiscrete filters. The degree of freedom within the proposed class of filters can be used to tailor the filters towards specific restoration tasks. Therefore, these scale-spaces do not need to be uncommitted; they give the user the liberty to incorporate a-priori knowledge, for instance concerning the size and contrast of especially interesting features.
The analyzed class comprises linear diffusion filtering and the nonlinear isotropic model of Catté, Lions, Morel, Coll and Whitaker and Pizer, but also novel approaches have been proposed: the use of diffusion tensors instead of scalar-valued diffusivities puts us in a position to design genuinely anisotropic diffusion processes which may reveal advantages at noisy edges. Last but not least, the fact that these filters are steered by the structure tensor instead of the regularized gradient allows them to be adapted to more sophisticated tasks such as the enhancement of coherent flow-like structures. In view of these results, anisotropic diffusion deserves to be regarded as much more than an ad-hoc strategy for transforming a degraded image into a more pleasant-looking one. It is a flexible and mathematically sound class of methods which unites the advantages of two worlds: scale-space analysis and image restoration.
Article
Linear scale-space is considered to be a modern bottom-up tool in computer vision. The American and European vision community, however, is unaware of the fact that it has already been axiomatically derived in 1959 in a Japanese paper by Taizo Iijima. This result formed the starting point of vast linear scale-space research in Japan ranging from various axiomatic derivations over deep structure analysis to applications to optical character recognition. Since the outcomes of these activities are unknown to western scale-space researchers, we give an overview of the contribution to the development of linear scale-space theories and analyses. In particular, we review four Japanese axiomatic approaches that substantiate linear scale-space theories proposed between 1959 and 1981. By juxtaposing them to ten American or European axiomatics, we present an overview of the state-of-the-art in Gaussian scale-space axiomatics. Furthermore, we show that many techniques for analysing linear scale-space have also been pioneered by Japanese researchers.
Article
We describe a technique for image encoding in which local operators of many scales but identical shape serve as the basis functions. The representation differs from established techniques in that the code elements are localized in spatial frequency as well as in space. Pixel-to-pixel correlations are first removed by subtracting a lowpass filtered copy of the image from the image itself. The result is a net data compression since the difference, or error, image has low variance and entropy, and the low-pass filtered image may be represented at reduced sample density. Further data compression is achieved by quantizing the difference image. These steps are then repeated to compress the low-pass image. Iteration of the process at appropriately expanded scales generates a pyramid data structure. The encoding process is equivalent to sampling the image with Laplacian operators of many scales. Thus, the code tends to enhance salient image features. A further advantage of the present code is that it is well suited for many image analysis tasks as well as for image compression. Fast algorithms are described for coding and decoding.
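The encode/decode cycle described in this abstract (subtract an upsampled low-pass copy, recurse on the low-pass image, then invert by adding the differences back) can be sketched compactly. This is our own minimal illustration using a small Gaussian as the low-pass filter and linear interpolation for upsampling, not the paper's specific generating kernels or fast algorithms:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def laplacian_pyramid(img, levels):
    """Each level stores the difference between the image and an
    upsampled low-pass copy; the final level is the low-pass residue."""
    pyramid = []
    current = img.astype(float)
    for _ in range(levels):
        low = gaussian_filter(current, sigma=1.0)
        down = low[::2, ::2]
        up = zoom(down, 2.0, order=1)[:current.shape[0], :current.shape[1]]
        pyramid.append(current - up)   # bandpass (difference) image
        current = down
    pyramid.append(current)            # low-pass residue
    return pyramid

def reconstruct(pyramid):
    """Invert the encoding by upsampling and adding back each difference."""
    current = pyramid[-1]
    for diff in reversed(pyramid[:-1]):
        up = zoom(current, 2.0, order=1)[:diff.shape[0], :diff.shape[1]]
        current = up + diff
    return current
```

Without quantization of the difference images the reconstruction is exact, which is what makes the representation usable for compression: the quantization error is the only loss.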
Article
Detection of tubular structures in 3D images is an important issue for vascular medical imaging. We present in this paper a new approach for centerline detection and reconstruction of 3D tubular structures. Several models of vessels are introduced for estimating the sensitivity of the image second-order derivatives according to elliptical cross section, to curvature of the axis, or to partial volume effects. Our approach uses a multiscale analysis for extracting vessels of different sizes according to the scale. For a given model of vessel, we derive an analytic expression of the relationship between the radius of the structure and the scale at which it is detected. The algorithm gives both centerline extraction and radius estimation of the vessels allowing their reconstruction. The method has been tested on synthetic images, an image of a phantom, and real images, with encouraging results.
Article
We present a computational framework for stereopsis based on the outputs of linear spatial filters tuned to a range of orientations and scales. This approach goes beyond edge-based and area-based approaches by using a richer image description and incorporating several stereo cues that have previously been neglected in the computer vision literature. A technique based on using the pseudo-inverse is presented for characterizing the information present in a vector of filter responses. We show how in our framework viewing geometry can be recovered to determine the locations of epipolar lines. An assumption that visible surfaces in the scene are piecewise smooth leads to differential treatment of image regions corresponding to binocularly visible surfaces, surface boundaries, and occluded regions that are only monocularly visible. The constraints imposed by viewing geometry and piecewise smoothness are incorporated into an iterative algorithm that gives good results on random-dot stereograms, artificially generated scenes, and natural grey-level images.
Article
This article describes a method for reducing the shape distortions due to scale-space smoothing that arise in the computation of 3-D shape cues using operators (derivatives) defined from scale-space representation. More precisely, we are concerned with a general class of methods for deriving 3-D shape cues from a 2-D image data based on the estimation of locally linearized deformations of brightness patterns. This class constitutes a common framework for describing several problems in computer vision (such as shape-from-texture, shape-from disparity-gradients, and motion estimation) and for expressing different algorithms in terms of similar types of visual front-end-operations. It is explained how surface orientation estimates will be biased due to the use of rotationally symmetric smoothing in the image domain. These effects can be reduced by extending the linear scale-space concept into an affine Gaussian scalespace representation and by performing affine shape adaptation of the smoothing kernels. This improves the accuracy of the surface orientation estimates, since the image descriptors, on which the methods are based, will be relative invariant under affine transformations, and the error thus confined to the higher-order terms in the locally linearized perspective transformation. A straightforward algorithm is presented for performing shape adaptation in practice. Experiments on real and synthetic images with known orientation demonstrate that in the presence of moderately high noise levels the accuracy is improved by typically one order of magnitude.
Conference Paper
For grey-value images, it is well accepted that the neighborhood rather than the pixel carries the geometrical interpretation. Interestingly, the spatial configuration of the neighborhood is the basis for the perception of humans. Common practice in color image processing is to use the color information without considering the spatial structure. We aim at a physical basis for the local interpretation of color images. We propose a framework for spatial color measurement, based on the Gaussian scale-space theory. We consider a Gaussian color model, which inherently uses the spatial and color information in an integrated model. The framework is well-founded in physics as well as in measurement science. The framework delivers sound and robust spatial color invariant features. The usefulness of the proposed measurement framework is illustrated by edge detection, where edges are discriminated as shadow, highlight, or object boundary. Other applications of the framework include color invariant image retrieval and color constant edge detection.
Conference Paper
The extrema in a signal and its first few derivatives provide a useful general-purpose qualitative description for many kinds of signals. A fundamental problem in computing such descriptions is scale: a derivative must be taken over some neighborhood, but there is seldom a principled basis for choosing its size. Scale-space filtering is a method that describes signals qualitatively, managing the ambiguity of scale in an organized and natural way. The signal is first expanded by convolution with gaussian masks over a continuum of sizes. This "scale-space" image is then collapsed, using its qualitative structure, into a tree providing a concise but complete qualitative description covering all scales of observation. The description is further refined by applying a stability criterion, to identify events that persist over large changes in scale.
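The causality property underlying this construction, that coarser scales only simplify a signal, can be observed directly: the number of local extrema of a one-dimensional signal does not increase as the Gaussian scale grows. A small numerical sketch (our own illustration, with a random-walk test signal):

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def count_extrema(signal):
    """Count local extrema as sign changes of the discrete derivative."""
    d = np.diff(signal)
    return int(np.sum(np.sign(d[:-1]) * np.sign(d[1:]) < 0))

rng = np.random.default_rng(0)
noisy = np.cumsum(rng.standard_normal(400))  # a noisy random-walk signal

# As scale increases, extrema can only disappear, never appear,
# which is what makes the scale-space tree well defined.
counts = [count_extrema(gaussian_filter1d(noisy, s)) for s in (1, 2, 4, 8, 16)]
```

Tracking where these extrema vanish as scale increases is exactly what produces the interval-tree description of the signal.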
Article
We study the recognition of surfaces made from different materials such as concrete, rug, marble, or leather on the basis of their textural appearance. Such natural textures arise from spatial variation of two surface attributes: (1) reflectance and (2) surface normal. In this paper, we provide a unified model to address both these aspects of natural texture. The main idea is to construct a vocabulary of prototype tiny surface patches with associated local geometric and photometric properties. We call these 3D textons. Examples might be ridges, grooves, spots or stripes or combinations thereof. Associated with each texton is an appearance vector, which characterizes the local irradiance distribution, represented as a set of linear Gaussian derivative filter outputs, under different lighting and viewing conditions. Given a large collection of images of different materials, a clustering approach is used to acquire a small (on the order of 100) 3D texton vocabulary. Given a few (1 to 4) images of any material, it can be characterized using these textons. We demonstrate the application of this representation for recognition of the material viewed under novel lighting and viewing conditions. We also illustrate how the 3D texton model can be used to predict the appearance of materials under novel conditions.
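The clustering step that produces the texton vocabulary can be illustrated with a tiny k-means over synthetic filter-response vectors (the data, dimensionality, and seeding are stand-ins, not the paper's filter bank or training procedure):

```python
import numpy as np

def kmeans(data, k, iters=20):
    """Plain Lloyd's algorithm; centers seeded from evenly spaced data points."""
    centers = data[np.linspace(0, len(data) - 1, k).astype(int)].copy()
    for _ in range(iters):
        # Assign each response vector to its nearest center
        d = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = data[labels == j].mean(axis=0)
    return labels, centers

rng = np.random.default_rng(0)
# Two synthetic "texton" populations in an 8-D filter-response space,
# standing in for e.g. ridge-like and spot-like local appearance
ridge_like = rng.normal(loc=0.0, scale=0.1, size=(50, 8))
spot_like = rng.normal(loc=3.0, scale=0.1, size=(50, 8))
data = np.vstack([ridge_like, spot_like])
labels, centers = kmeans(data, k=2)
```

The real vocabulary is built from filter outputs under many lighting and viewing conditions; the sketch only shows the clustering primitive.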
Article
This paper presents the measurement of colored object reflectance, under different, general assumptions regarding the imaging conditions. We exploit the Gaussian scale-space paradigm for color images to define a framework for the robust measurement of object reflectance from color images. Object reflectance is derived from a physical reflectance model based on the Kubelka-Munk theory for colorant layers. Illumination and geometrical invariant properties are derived from the reflectance model. Invariance and discriminative power of the color invariants are experimentally investigated, showing the invariants to be successful in discounting shadow, illumination, highlights, and noise. Extensive experiments show the different invariants to be highly discriminative, while maintaining invariance properties. The presented framework for color measurement is well-founded in the physics of color as well as in measurement science. Hence, the proposed invariants are considered more adequate for the measurement of invariant color features than existing methods.
Article
This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
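The nearest-neighbor matching step with the distance-ratio test can be sketched as follows (the 0.8 threshold follows the paper; the descriptors here are synthetic stand-ins, not actual SIFT outputs):

```python
import numpy as np

def ratio_test_match(queries, database, ratio=0.8):
    """Match each query descriptor to its nearest database descriptor,
    accepting only if the nearest distance is well below the second nearest."""
    matches = {}
    for i, q in enumerate(queries):
        d = np.linalg.norm(database - q, axis=1)
        nn, nn2 = np.argsort(d)[:2]
        if d[nn] < ratio * d[nn2]:
            matches[i] = int(nn)
    return matches

rng = np.random.default_rng(1)
database = rng.normal(size=(200, 128))      # stand-in 128-D descriptor database
database /= np.linalg.norm(database, axis=1, keepdims=True)
# Queries: slightly perturbed copies of three known database entries
idx = [5, 42, 100]
queries = database[idx] + 0.01 * rng.normal(size=(3, 128))
matches = ratio_test_match(queries, database)
```

The ratio test rejects ambiguous matches where the nearest and second-nearest neighbors are nearly equidistant, which is the common failure mode with large databases.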
Article
It is shown that a convolution with certain reasonable receptive field (RF) profiles yields the exact partial derivatives of the retinal illuminance blurred to a specified degree. Arbitrary concatenations of such RF profiles yield again similar ones of higher order and for a greater degree of blurring. By replacing the illuminance with its third order jet extension we obtain position dependent geometries. It is shown how such a representation can function as the substrate for “point processors” computing geometrical features such as edge curvature. We obtain a clear dichotomy between local and multilocal visual routines. The terms of the truncated Taylor series representing the jets are partial derivatives whose corresponding RF profiles closely mimic the well known units in the primary visual cortex. Hence this description provides a novel means to understand and classify these units. Taking the receptive field outputs as the basic input data one may devise visual routines that compute geometric features on the basis of standard differential geometry exploiting the equivalence with the local jets (partial derivatives with respect to the space coordinates).
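Computing a local jet amounts to inner products with sampled Gaussian derivative kernels. A 1-D sketch (helper names are hypothetical), checked on a quadratic signal whose Gaussian-smoothed derivatives are known in closed form (smoothing x^2 gives x^2 + sigma^2, so Lx = 2x and Lxx = 2):

```python
import numpy as np

def gaussian_derivative_kernels(sigma, radius):
    """Sampled Gaussian plus its first and second derivatives."""
    t = np.arange(-radius, radius + 1).astype(float)
    g = np.exp(-t**2 / (2 * sigma**2))
    g /= g.sum()
    g1 = -t / sigma**2 * g                      # d/dx of the Gaussian
    g2 = (t**2 - sigma**2) / sigma**4 * g       # d^2/dx^2 of the Gaussian
    return g, g1, g2

def local_jet(f, x0, sigma):
    """Jet (L, Lx, Lxx) of signal f at index x0, via convolution at one point."""
    radius = int(6 * sigma)
    g, g1, g2 = gaussian_derivative_kernels(sigma, radius)
    window = f[x0 - radius : x0 + radius + 1][::-1]   # convolution flips the window
    return window @ g, window @ g1, window @ g2

x = np.arange(-100, 101).astype(float)
f = x**2
x0 = np.where(x == 5)[0][0]    # index of the point x = 5
L, Lx, Lxx = local_jet(f, x0, sigma=3.0)
# Expected: L ~ 25 + 9 = 34, Lx ~ 10, Lxx ~ 2
```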
Article
In practice the relevant details of images exist only over a restricted range of scale. Hence it is important to study the dependence of image structure on the level of resolution. It seems clear enough that visual perception treats images on several levels of resolution simultaneously and that this fact must be important for the study of perception. However, no applicable mathematically formulated theory to deal with such problems appears to exist. In this paper it is shown that any image can be embedded in a one-parameter family of derived images (with resolution as the parameter) in essentially only one unique way if the constraint that no spurious detail should be generated when the resolution is diminished, is applied. The structure of this family is governed by the well known diffusion equation (a parabolic, linear, partial differential equation of the second order). As such the structure fits into existing theories that treat the front end of the visual system as a continuous stack of homogeneous layers, characterized by iterated local processing schemes. When resolution is decreased the image becomes less articulated because the extrema ("light and dark blobs") disappear one after the other. This erosion of structure is a simple process that is similar in every case. As a result any image can be described as a juxtaposed and nested set of light and dark blobs, wherein each blob has a limited range of resolution in which it manifests itself. The structure of the family of derived images permits a derivation of the sampling density required to sample the image at multiple scales of resolution.
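A direct consequence of the diffusion structure is the semigroup (cascade) property: smoothing at scale sigma_1 followed by sigma_2 is equivalent to a single smoothing at sqrt(sigma_1^2 + sigma_2^2). A quick numerical check on a 1-D signal (scales and the test signal are illustrative):

```python
import numpy as np

def gaussian_smooth(signal, sigma):
    """Convolve with a truncated, normalized Gaussian (sigma in samples)."""
    radius = int(np.ceil(5 * sigma))
    t = np.arange(-radius, radius + 1)
    g = np.exp(-t**2 / (2.0 * sigma**2))
    g /= g.sum()
    return np.convolve(signal, g, mode="same")

x = np.arange(400, dtype=float)
f = np.exp(-(x - 200)**2 / (2 * 15.0**2))   # a smooth bump, ~0 at the borders

twice = gaussian_smooth(gaussian_smooth(f, 3.0), 4.0)
once = gaussian_smooth(f, 5.0)               # sqrt(3^2 + 4^2) = 5
max_diff = np.max(np.abs(twice - once))
```

This cascade property is precisely why coarse scales are "true simplifications" of fine scales: every level can be reached by further diffusion from any finer level.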
Article
Neurons in the central visual pathways process visual images within a localized region of space, and a restricted epoch of time. Although the receptive field (RF) of a visually responsive neuron is inherently a spatiotemporal entity, most studies have focused exclusively on spatial aspects of RF structure. Recently, however, the application of sophisticated RF-mapping techniques has enabled neurophysiologists to characterize RFs in the joint domain of space and time. Studies that use these techniques have revealed that neurons in the geniculostriate pathway exhibit striking RF dynamics. For a majority of cells, the spatial structure of the RF changes as a function of time; thus, these RFs can be characterized adequately only in the space-time domain. In this review, the spatiotemporal RF structure of neurons in the lateral geniculate nucleus and primary visual cortex is discussed.
Conference Paper
We present a framework for texture recognition based on local affine-invariant descriptors and their spatial layout. At modelling time, a generative model of local descriptors is learned from sample images using the EM algorithm. The EM framework allows the incorporation of unsegmented multitexture images into the training set. The second modelling step consists of gathering co-occurrence statistics of neighboring descriptors. At recognition time, initial probabilities computed from the generative model are refined using a relaxation step that incorporates co-occurrence statistics. Performance is evaluated on images of an indoor scene and pictures of wild animals.
Article
This paper addresses the problem of retrieving images from large image databases. The method is based on local grayvalue invariants which are computed at automatically detected interest points. A voting algorithm and semilocal constraints make retrieval possible. Indexing allows for efficient retrieval from a database of more than 1,000 images. Experimental results show correct retrieval in the case of partial visibility, similarity transformations, extraneous features, and small perspective deformations.
Article
A method that treats linear neighborhood operators within a unified framework that enables linear combinations, concatenations, resolution changes, or rotations of operators to be treated in a canonical manner is presented. Various families of operators with special kinds of symmetries (such as translation, rotation, magnification) are explicitly constructed in 1-D, 2-D, and 3-D. A concept of `order' is defined, and finite orthonormal bases of functions closely connected with the operators of various orders are constructed. Linear transformations between the various representations are considered. The method is based on two fundamental assumptions: a decrease of resolution should not introduce spurious detail, and the local operators should be self-similar under changes of resolution. These assumptions merely sum up the even more general need for homogeneity, isotropy, scale invariance, and separability of independent dimensions of front-end processing in the absence of a priori information.
Article
For grey-value images, it is well accepted that the neighborhood rather than the pixel carries the geometrical interpretation. Interestingly, the spatial configuration of the neighborhood is the basis for the perception of humans. Common practice in color image processing is to use the color information without considering the spatial structure. We aim at a physical basis for the local interpretation of color images.
Article
We propose a representation of images in which a global, but not a local topology is defined. The topology is restricted to resolutions up to the extent of the local region of interest (ROI). Although the ROI's may contain many pixels, there is no spatial order on the pixels within the ROI, the only information preserved is the histogram of pixel values within the ROI's. This can be considered as an extreme case of a textel (texture element) image: The histogram is the limit of texture where the spatial order has been completely disregarded. We argue that locally orderless images are ubiquitous in perception and the visual arts. Formally, the orderless images are most aptly described by three mutually intertwined scale spaces. The scale parameters correspond to the pixellation ("inner scale"), the extent of the ROI's ("outer scale") and the resolution in the histogram ("tonal scale"). We describe how to construct locally orderless images, how to render them, and how to use them in a variety of local and global image processing operations.
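The three scales can be made concrete in a 1-D sketch: an inner-scale smoothing of the signal itself, a Gaussian outer-scale window selecting the region of interest, and a tonal-scale kernel softening the histogram bins (function name and all parameter choices here are illustrative):

```python
import numpy as np

def local_soft_histogram(signal, x0, bin_centers,
                         inner_sigma, outer_sigma, tonal_sigma):
    """Locally orderless description: a soft histogram of signal values
    around x0, governed by inner, outer, and tonal scale parameters."""
    x = np.arange(len(signal), dtype=float)
    # Inner scale: blur the signal itself
    t = np.arange(-int(5 * inner_sigma), int(5 * inner_sigma) + 1)
    g = np.exp(-t**2 / (2 * inner_sigma**2)); g /= g.sum()
    L = np.convolve(signal, g, mode="same")
    # Outer scale: spatial weight of the region of interest around x0
    w = np.exp(-(x - x0)**2 / (2 * outer_sigma**2)); w /= w.sum()
    # Tonal scale: each sample spreads unit mass softly over the bins
    h = np.exp(-(bin_centers[:, None] - L[None, :])**2 / (2 * tonal_sigma**2))
    h /= h.sum(axis=0, keepdims=True)
    return h @ w                              # weight samples by the ROI window

rng = np.random.default_rng(2)
# A two-level step signal with mild noise
signal = np.where(np.arange(300) < 150, 0.2, 0.8) + 0.02 * rng.normal(size=300)
bins = np.linspace(0.0, 1.0, 21)
h = local_soft_histogram(signal, x0=75, bin_centers=bins,
                         inner_sigma=2.0, outer_sigma=10.0, tonal_sigma=0.05)
# For an ROI deep inside the 0.2 region, the mass concentrates near the 0.2 bin.
```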