We propose a method for characterizing spatial region data. The method efficiently constructs a k-dimensional feature vector using concentric spheres in 3D (circles in 2D) radiating out of a region's center of mass. These signatures capture structural and internal volume properties. We evaluate our approach by performing experiments on classification and similarity searches, using artificial and real datasets. To generate artificial regions we introduce a region growth model. Similarity searches on artificial data demonstrate that our technique, although straightforward, compares favorably to mathematical morphology, while being two orders of magnitude faster. Experiments with real datasets show its effectiveness and general applicability.
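The concentric-circle signature described above can be illustrated with a minimal 2D sketch. The abstract does not say which quantity is recorded per circle, so this example assumes the fraction of region pixels falling inside each of k equally spaced circles around the center of mass; the function name `concentric_signature` and the equal radius spacing are hypothetical choices, not the authors' definition.

```python
import math


def concentric_signature(pixels, k):
    """Return a k-dimensional vector holding the fraction of region pixels
    inside each of k concentric circles centered on the center of mass.
    (Assumed feature definition, for illustration only.)"""
    if not pixels or k < 1:
        raise ValueError("need a non-empty region and k >= 1")
    n = len(pixels)
    # Center of mass of the region
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n
    dists = [math.hypot(x - cx, y - cy) for x, y in pixels]
    r_max = max(dists) or 1.0  # guard against a single-pixel region
    signature = []
    for i in range(1, k + 1):
        radius = r_max * i / k
        # Small tolerance so pixels exactly on the circle are counted
        inside = sum(1 for d in dists if d <= radius + 1e-12)
        signature.append(inside / n)
    return signature


# Example: a filled 5x5 square region
square = [(x, y) for x in range(5) for y in range(5)]
sig = concentric_signature(square, 4)
```

The vector is non-decreasing and ends at 1.0 by construction, so two regions can be compared with any ordinary vector distance.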
This paper presents a flexible framework to build a target-specific, part-based representation for arbitrary articulated or rigid objects. The aim is to successfully track the target object in 2D, through multiple scales and occlusions. This is realized by employing a hierarchical, iterative optimization process on the proposed representation of structure and appearance. To this end, each rigid part of an object is described by a hierarchical spring system represented by an attributed graph pyramid. Hierarchical spring systems encode the spatial relationships of the features (attributes of the graph pyramid) describing the parts and enforce them by spring-like behavior during tracking. Articulation points connecting the parts of the object allow position information to be transferred from reliable to ambiguous parts. Tracking is done in an iterative process by combining the hypotheses of simple trackers with the hypotheses extracted from the hierarchical spring systems.
3D electron microscopy aims at the reconstruction of density volumes corresponding to the electrostatic potential distribution of macromolecules. There are many factors limiting the resolution achievable when this technique is applied to biological macromolecules: microscope imperfections, molecule flexibility, lack of projections from certain directions, unknown angular distribution, noise, etc. In this communication we explore the quality gain in the reconstruction obtained by including a priori knowledge such as particle symmetry, occupied volume, known surface relief, density nonnegativity and similarity to a known volume. If the reconstruction is represented as a series expansion, such constraints can be expressed by a set of equations that the expansion coefficients must satisfy. In this work, these equation sets are specified and combined in a novel way with the ART + blobs reconstruction algorithm. The effect of each one on the reconstruction of a realistic phantom is explored. Finally, the application of these restrictions to 3D reconstructions from experimental data is studied.
A new approach, based on hierarchical soft correspondence detection, is presented for significantly improving the speed of our previous HAMMER image registration algorithm. Currently, HAMMER takes a relatively long time, e.g., up to 80 minutes, to register two regular sized images using a Linux machine (with 2.40GHz CPU and 2-Gbyte memory). This is because the results of correspondence detection, used to guide the image warping, can be ambiguous in complex structures, and thus the image warping has to be conservative and accordingly takes a long time to complete. In this paper, a hierarchical soft correspondence detection technique is employed to detect correspondences more robustly, thereby allowing the image warping to be completed straightforwardly and fast. By incorporating this hierarchical soft correspondence detection technique into the HAMMER registration framework, the robustness and the accuracy of registration (in terms of low average registration error) can both be achieved. Experimental results on real and simulated data show that the new registration algorithm, based on the hierarchical soft correspondence detection, can run nine times faster than HAMMER while keeping similar registration accuracy.
A method for spatio-temporally smooth and consistent estimation of cardiac motion from MR cine sequences is proposed. Myocardial motion is estimated within a 4-dimensional (4D) registration framework, in which all 3D images obtained at different cardiac phases are simultaneously registered. This facilitates spatio-temporally consistent estimation of motion as opposed to other registration-based algorithms which estimate the motion by sequentially registering one frame to another. To facilitate image matching, an attribute vector (AV) is constructed for each point in the image, and is intended to serve as a "morphological signature" of that point. The AV includes intensity, boundary, and geometric moment invariants (GMIs). Hierarchical registration of two image sequences is achieved by using the most distinctive points for initial registration of two sequences and gradually adding less-distinctive points to refine the registration. Experimental results on real data demonstrate good performance of the proposed method for cardiac image registration and motion estimation. The motion estimation is validated via comparisons with motion estimates obtained from MR images with myocardial tagging.
Kidney cancer occurs in both a hereditary (inherited) and sporadic (non-inherited) form. It is estimated that almost a quarter of a million people in the USA are living with kidney cancer, and their number grows as 51,000 are diagnosed with the disease every year. In clinical practice, the response to treatment is monitored by manual measurements of tumor size, which are 2D, do not reflect the 3D geometry and enhancement of tumors, and show high intra- and inter-operator variability. We propose a computer-assisted radiology tool to assess renal tumors in contrast-enhanced CT for the management of tumor diagnoses and responses to new treatments. The algorithm employs anisotropic diffusion (for smoothing), a combination of fast-marching and geodesic level-sets (for segmentation), and a novel statistical refinement step to adapt to the shape of the lesions. It also quantifies the 3D size, volume and enhancement of the lesion and allows serial management over time. Tumors are robustly segmented and the comparison between manual and semi-automated quantifications shows disparity within the limits of inter-observer variability. The analysis of lesion enhancement for tumor classification shows great separation between cysts, von Hippel-Lindau syndrome lesions and hereditary papillary renal carcinomas (HPRC) with p-values below 0.004. The results on temporal evaluation of tumors from serial scans illustrate the potential of the method to become an important tool for disease monitoring, drug trials and noninvasive clinical surveillance.
Analysis of functional magnetic resonance imaging (fMRI) data in its native, complex form has been shown to increase the sensitivity both for data-driven techniques, such as independent component analysis (ICA), and for model-driven techniques. The promise of an increase in sensitivity and specificity in clinical studies provides a powerful motivation for utilizing both the phase and magnitude data; however, the unknown and noisy nature of the phase poses a challenge. In addition, many complex-valued analysis algorithms, such as ICA, suffer from an inherent phase ambiguity, which introduces additional difficulty for group analysis. We present solutions for these issues, which have been among the main reasons phase information has been traditionally discarded, and show their effectiveness when used as part of a complex-valued group ICA algorithm application. The methods we present thus allow the development of new fully complex data-driven and semi-blind methods to process, analyze, and visualize fMRI data. We first introduce a phase ambiguity correction scheme that can be either applied subsequent to ICA of fMRI data or incorporated into the ICA algorithm in the form of prior information to eliminate the need for further processing for phase correction. We also present a Mahalanobis distance-based thresholding method, which incorporates both magnitude and phase information into a single threshold and can be used to increase the sensitivity in the identification of voxels of interest. This method shows particular promise for identifying voxels that have significant susceptibility changes but are located in low-magnitude (i.e. activation) areas. We demonstrate the performance gain of the introduced methods on actual fMRI data.
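A single threshold over magnitude and phase can be pictured with a small sketch of the squared Mahalanobis distance in two dimensions. This is only an illustration of the distance itself: the mean, covariance, and threshold values below are made-up inputs, and the function names `mahalanobis_sq` and `select_voxels` are hypothetical, not taken from the paper.

```python
def mahalanobis_sq(point, mean, cov):
    """Squared Mahalanobis distance of a 2-D point (magnitude, phase)
    from `mean`, given a 2x2 covariance matrix `cov`."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance must be positive definite")
    dx = point[0] - mean[0]
    dy = point[1] - mean[1]
    # The inverse of [[a, b], [c, d]] is (1/det) * [[d, -b], [-c, a]]
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det


def select_voxels(voxels, mean, cov, threshold_sq):
    """Indices of voxels whose squared distance exceeds one joint threshold."""
    return [i for i, v in enumerate(voxels)
            if mahalanobis_sq(v, mean, cov) > threshold_sq]


# Illustrative (magnitude, phase) pairs and noise statistics
mean = (0.0, 0.0)
cov = ((1.0, 0.0), (0.0, 4.0))
voxels = [(0.1, 0.2), (3.0, 0.0), (0.0, 5.0)]
selected = select_voxels(voxels, mean, cov, threshold_sq=4.0)
```

In this toy input the third voxel has zero magnitude deviation and is selected on its phase alone, which is the kind of case a magnitude-only threshold would miss.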
This paper provides exact analytical expressions for the first and second moments of the true error for linear discriminant analysis (LDA) when the data are univariate and taken from two stochastic Gaussian processes. The key point is that we assume a general setting in which the sample data from each class do not need to be identically distributed or independent within or between classes. We compare the true errors of designed classifiers under the typical i.i.d. model and when the data are correlated, providing exact expressions and demonstrating that, depending on the covariance structure, correlated data can result in classifiers with either greater error or less error than when training with uncorrelated data. The general theory is applied to autoregressive and moving-average models of the first order, and it is demonstrated using real genomic data.
We describe an annealing procedure that computes the normalized N-cut of a weighted graph G. The first phase transition computes the solution of the approximate normalized 2-cut problem, while the low temperature solution computes the normalized N-cut. The intermediate solutions provide a sequence of refinements of the 2-cut that can be used to split the data into K clusters with 2 ≤ K ≤ N. This approach only requires specification of the upper limit on the number of expected clusters N, since by controlling the annealing parameter we can obtain any number of clusters K with 2 ≤ K ≤ N. We test the algorithm on an image segmentation problem and apply it to a problem of clustering high dimensional data from the sensory system of a cricket.
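For readers unfamiliar with the objective, the sketch below evaluates the normalized cut of a given partition, using the common definition: the sum over clusters of cut(A, complement) / vol(A). It does not implement the annealing procedure. The function name `normalized_cut` and the toy graph are assumptions for illustration.

```python
def normalized_cut(weights, labels):
    """Normalized cut of a partition of a weighted graph.

    weights: symmetric adjacency matrix as a list of lists
    labels:  cluster label for each node
    """
    n = len(weights)
    total = 0.0
    for cluster in set(labels):
        cut = 0.0  # weight of edges leaving the cluster
        vol = 0.0  # total weight of edges incident to the cluster
        for i in range(n):
            if labels[i] != cluster:
                continue
            for j in range(n):
                w = weights[i][j]
                vol += w
                if labels[j] != cluster:
                    cut += w
        if vol == 0:
            raise ValueError("cluster with zero volume")
        total += cut / vol
    return total


# Two tightly connected pairs joined by one weak edge
W = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.1, 0.0],
    [0.0, 0.1, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
good = normalized_cut(W, [0, 0, 1, 1])  # cuts only the weak edge
bad = normalized_cut(W, [0, 1, 0, 1])   # cuts both strong edges
```

The partition that separates the two pairs scores much lower than the one that splits them, which is the behavior a normalized-cut minimizer relies on.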
Accumulating evidence suggests that characteristics of pre-treatment FDG-PET could be used as prognostic factors to predict outcomes in different cancer sites. Current risk analyses are limited to visual assessment or direct uptake value measurements. We are investigating intensity-volume histogram metrics and shape and texture features extracted from PET images to predict patients' response to treatment. These approaches were demonstrated using datasets from cervix and head and neck cancers, where AUCs of 0.76 and 1.0 were achieved, respectively. The preliminary results suggest that the proposed approaches could potentially provide better tools and discriminant power for utilizing functional imaging in clinical prognosis.
In this paper we propose a microcalcification classification scheme, assisted by content-based mammogram retrieval, for breast cancer diagnosis. We recently developed a machine learning approach for mammogram retrieval where the similarity measure between two lesion mammograms was modeled after expert observers. In this work we investigate how to use retrieved similar cases as references to improve the performance of a numerical classifier. Our rationale is that by adaptively incorporating local proximity information into a classifier, it can help to improve its classification accuracy, thereby leading to an improved "second opinion" to radiologists. Our experimental results on a mammogram database demonstrate that the proposed retrieval-driven approach with an adaptive support vector machine (SVM) could improve the classification performance from 0.78 to 0.82 in terms of the area under the ROC curve.
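Since performance here is reported as the area under the ROC curve, a short sketch of how that number can be computed may help. It uses the standard rank interpretation: the AUC equals the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half. The function name `roc_auc` and the scores are illustrative assumptions, not data from the paper.

```python
def roc_auc(pos_scores, neg_scores):
    """Area under the ROC curve from classifier scores, computed as the
    fraction of (positive, negative) pairs ranked correctly."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos_scores) * len(neg_scores))


# Illustrative scores: three of the four pairs are ranked correctly
auc = roc_auc([0.9, 0.3], [0.5, 0.1])
```

This pairwise form is quadratic in the number of cases, which is fine for small studies; a sort-based version would be preferable for large ones.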
Many attempts have been made to characterize latent structures in "texture spaces" defined by attentive similarity judgments. While an optimal description of perceptual texture space remains elusive, we suggest that the similarity judgments gained from these procedures provide a useful standard for relating image statistics to high-level similarity. In the present experiment, we ask subjects to group natural textures into visually similar clusters. We also represent each image using the features employed by three different parametric texture synthesis models. Given the cluster labels for our textures, we use linear discriminant analysis to predict cluster membership. We compare each model's assignments to human data for both positive and contrast-negated textures, and evaluate relative model performance.
With the development of micron-scale imaging techniques, capillaries can be conveniently visualized using methods such as two-photon and whole mount microscopy. However, the presence of background staining, leaky vessels and the diffusion of small fluorescent molecules can lead to significant complexity in image analysis and loss of information necessary to accurately quantify vascular metrics. One solution to this problem is the development of accurate thresholding algorithms that reliably distinguish blood vessels from surrounding tissue. Although various thresholding algorithms have been proposed, our results suggest that without appropriate pre- or post-processing, the existing approaches may fail to obtain satisfactory results for capillary images that include areas of contamination. In this study, we propose a novel local thresholding algorithm, called directional histogram ratio at random probes (DHR-RP). This method explicitly considers the geometric features of tube-like objects in conducting image binarization, and performs reliably in distinguishing small vessels from either clean or contaminated background. Experimental and simulation studies suggest that our DHR-RP algorithm is superior to existing thresholding methods.
Many problems in paleontology reduce to finding those features that best discriminate among a set of classes. A clear example is the classification of new specimens. However, these classifications are generally challenging because the number of discriminant features and the number of samples are limited. This has been the fate of LB1, a new specimen found in the Liang Bua Cave of Flores. Several authors have attributed LB1 to a new species of Homo, H. floresiensis. According to this hypothesis, LB1 is either a member of the early Homo group or a descendant of an ancestor of the Asian H. erectus. Detractors have put forward an alternate hypothesis, which stipulates that LB1 is in fact a microcephalic modern human. In this paper, we show how we can employ a new Bayes optimal discriminant feature extraction technique to help resolve this type of issue. In this process, we present three types of experiments. First, we use this Bayes optimal discriminant technique to develop a model of morphological (shape) evolution from Australopiths to H. sapiens. LB1 fits perfectly in this model as a member of the early Homo group. Second, we build a classifier based on the available cranial and mandibular data appropriately normalized for size and volume. Again, LB1 is most similar to early Homo. Third, we build a brain endocast classifier to show that LB1 is not within the normal range of variation in H. sapiens. These results combined support the hypothesis of a very early shared ancestor for LB1 and H. erectus, and illustrate how discriminant analysis approaches can be successfully used to help classify newly discovered specimens.
Identifying and validating novel phenotypes from images that arrive online is a major challenge in high-content RNA interference (RNAi) screening. Newly discovered phenotypes should be visually distinct from existing ones and make biological sense. An online phenotype discovery method featuring adaptive phenotype modeling and iterative cluster merging using improved gap statistics is proposed. Clustering results based on compactness criteria and Gaussian mixture models (GMM) for existing phenotypes iteratively modify each other through multiple hypothesis tests and model optimization based on minimum classification error (MCE). The method works well at discovering new phenotypes adaptively when applied to both synthetic datasets and RNAi high content screen (HCS) images with ground truth labels.
This paper proposes a new approach based on missing value pattern discovery for classifying incomplete data. This approach is particularly designed for classification of datasets with a small number of samples and a high percentage of missing values, where available missing value treatment approaches do not usually work well. Based on the pattern of the missing values, the proposed approach finds subsets of samples for which most of the features are available and trains a classifier for each subset. Then, it combines the outputs of the classifiers. Subset selection is translated into a clustering problem, allowing derivation of a mathematical framework for it. A trade-off is established between the computational complexity (number of subsets) and the accuracy of the overall classifier. To deal with this trade-off, a numerical criterion is proposed for the prediction of the overall performance. The proposed method is applied to seven datasets from the popular University of California, Irvine data mining archive and an epilepsy dataset from Henry Ford Hospital, Detroit, Michigan (total of eight datasets). Experimental results show that the classification accuracy of the proposed method is superior to that of the widely used multiple imputations method and four other methods. They also show that the level of superiority depends on the pattern and percentage of missing values.
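As a rough illustration of working from the pattern of missing values, the sketch below groups samples that share exactly the same set of missing features. This is a deliberate simplification: the paper formulates subset selection as a clustering problem, whereas this example only buckets identical patterns. The function name `group_by_missing_pattern` and the use of `None` to mark a missing value are assumptions.

```python
from collections import defaultdict


def group_by_missing_pattern(samples):
    """Map each missing-value pattern to the indices of the samples that
    share it. A pattern is a tuple of booleans (True = feature missing)."""
    groups = defaultdict(list)
    for idx, row in enumerate(samples):
        pattern = tuple(value is None for value in row)
        groups[pattern].append(idx)
    return dict(groups)


# Four samples with three features each; None marks a missing value
data = [
    [1.0, None, 3.0],
    [2.0, None, 1.0],
    [None, 5.0, 2.0],
    [1.0, 2.0, 3.0],
]
groups = group_by_missing_pattern(data)
```

Each group could then train its own classifier on the features that are present for all of its members, with the outputs combined afterwards.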
This paper proposes a new nonlinear classifier based on a generalized Choquet integral with signed fuzzy measures to enhance the classification accuracy and power by capturing all possible interactions among two or more attributes. This generalized approach was developed to address unsolved Choquet-integral classification issues such as allowing for flexible location of projection lines in n-dimensional space, automatic search for the least misclassification rate based on Choquet distance, and penalty on misclassified points. A special genetic algorithm is designed to implement this classification optimization with fast convergence. Both the numerical experiment and empirical case studies show that this generalized approach improves and extends the functionality of this Choquet nonlinear classification in more real-world multi-class multi-dimensional situations.
An ensemble of clustering solutions or partitions may be generated for a number of reasons. If the data set is very large, clustering may be done on tractable size disjoint subsets. The data may be distributed at different sites for which a distributed clustering solution with a final merging of partitions is a natural fit. In this paper, two new approaches to combining partitions, represented by sets of cluster centers, are introduced. The advantage of these approaches is that they provide a final partition of data that is comparable to the best existing approaches, yet scale to extremely large data sets. They can be 100,000 times faster while using much less memory. The new algorithms are compared against the best existing cluster ensemble merging approaches, clustering all the data at once and a clustering algorithm designed for very large data sets. The comparison is done for fuzzy and hard k-means based clustering algorithms. It is shown that the centroid-based ensemble merging algorithms presented here generate partitions of quality comparable to the best label vector approach or clustering all the data at once, while providing very large speedups.
CT Colonography (CTC) is an emerging minimally invasive technique for screening and diagnosing colon cancers. Computer Aided Detection (CAD) techniques can increase sensitivity and reduce false positives. Inspired by the way radiologists detect polyps via 3D virtual fly-through in CTC, we borrowed the idea from geographic information systems to employ a topographical height map in colonic polyp measurement and false positive reduction. After a curvature based filtering and a 3D CT feature classifier, a height map is computed for each detection using a ray-casting algorithm. We design a concentric index to characterize the concentric pattern in the polyp height map based on the fact that polyps are protrusions from the colon wall and round in shape. The height map is optimized through a multi-scale spiral spherical search to maximize the concentric index. We derive several topographic features from the map and compute texture features based on wavelet decomposition. We then send the features to a committee of support vector machines for classification. We have trained our method on 394 patients (71 polyps) and tested it on 792 patients (226 polyps). Results showed that we can achieve 95% sensitivity at 2.4 false positives per patient and the height map features can reduce false positives by more than 50%. We compute the polyp height and width measurements and correlate them with manual measurements. The Pearson correlations are 0.74 (p=0.11) and 0.75 (p=0.17) for height and width, respectively.
We propose an approach to shape detection of highly deformable shapes in images via manifold learning with regression. Our method does not require shape key points be defined at high contrast image regions, nor do we need an initial estimate of the shape. We only require sufficient representative training data and a rough initial estimate of the object position and scale. We demonstrate the method for face shape learning, and provide a comparison to a nonlinear Active Appearance Model. Our method is extremely accurate, to nearly pixel precision, and is capable of accurately detecting the shape of faces undergoing extreme expression changes. The technique is robust to occlusions such as glasses and gives reasonable results for extremely degraded image resolutions.
Deformable shape detection is an important problem in computer vision and pattern recognition. However, standard detectors are typically limited to locating only a few salient landmarks such as landmarks near edges or areas of high contrast, often conveying insufficient shape information. This paper presents a novel statistical pattern recognition approach to locate a dense set of salient and non-salient landmarks in images of a deformable object. We explore the fact that several object classes exhibit a homogeneous structure such that each landmark position provides some information about the position of the other landmarks. In our model, the relationship between all pairs of landmarks is naturally encoded as a probabilistic graph. Dense landmark detections are then obtained with a new sampling algorithm that, given a set of candidate detections, selects the most likely positions so as to maximize the probability of the graph. Our experimental results demonstrate accurate, dense landmark detections within and across different databases.
This paper presents an approach to recognizing two-dimensional multiscale objects on a reconfigurable mesh architecture with horizontal and vertical broadcasting. The object models are described in terms of a convex/concave multiscale boundary decomposition that is represented by a tree structure. The problem of matching an observed object against a model is formulated as a tree matching problem. A parallel dynamic programming solution to this problem is presented that requires O(max(n,m)) time on an n×m reconfigurable mesh, where n and m are the sizes of the two trees.
A new method is presented for adaptive document image binarization, where the page is considered as a collection of subcomponents such as text, background and picture. The problems caused by noise, illumination and many source type related degradations are addressed. The algorithm uses document characteristics to determine (surface) attributes, often used in document segmentation. Using characteristic analysis, two new algorithms are applied to determine a local threshold for each pixel. An algorithm based on soft decision control is used for thresholding the background and picture regions. An approach utilizing local mean and variance of gray values is applied to textual regions. Tests were performed with images including different types of document components and degradations. The results show that the method adapts and performs well in each case.
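A threshold driven by local mean and variance can be sketched as follows. The abstract does not state the formula, so this example assumes a Sauvola-style rule, T = m * (1 + k * (s / R - 1)), where m and s are the mean and standard deviation of the window around each pixel. The window radius and the values of `k` and `R` are illustrative parameters, not values confirmed by the source.

```python
import math


def local_threshold_binarize(image, radius=1, k=0.5, R=128.0):
    """Binarize a grayscale image (list of rows, values 0..255) with a
    per-pixel threshold built from the local mean and standard deviation.
    Output: 1 = background (bright), 0 = text (dark).
    Assumed Sauvola-style rule: T = m * (1 + k * (s / R - 1))."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Gather the window, clipped at the image border
            window = [
                image[j][i]
                for j in range(max(0, y - radius), min(h, y + radius + 1))
                for i in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            m = sum(window) / len(window)
            s = math.sqrt(sum((v - m) ** 2 for v in window) / len(window))
            t = m * (1 + k * (s / R - 1))
            out[y][x] = 1 if image[y][x] > t else 0
    return out


# A bright 5x5 page with one dark "ink" pixel in the middle
page = [[200] * 5 for _ in range(5)]
page[2][2] = 20
binary = local_threshold_binarize(page)
```

Because the threshold falls where the local contrast is low, flat bright regions stay background while the dark pixel is still picked out. This brute-force version recomputes each window; integral images would be the usual speed-up.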
This paper describes a new approach to adaptive digital halftoning with the least squares model-based (LSMB) method. A framework is presented for the adaptive control of smoothness and sharpness of the halftone patterns according to local image characteristics. The proposed method employs explicit, quantitative models of the human visual system represented as 2D linear filters (eye filters). In contrast with the standard LSMB method, where a single eye filter is employed uniformly over the image, the model parameters are controlled according to local image characteristics for each pixel. Because of the adaptive selection of eye filters for the pixels, image enhancement is incorporated into the halftoning process. Effectiveness of the proposed approach is demonstrated through experiments using real data, compared with the error-diffusion algorithm and the standard LSMB method.
A general form of morphological operators is developed. The structuring elements of the operators adapt their shapes according to the local features of the processed image and can be any shape formed by connecting a given number of predetermined basic elements. The basic element can be a single pixel or any other connected small shape. It is shown that while sharing the basic properties of the conventional morphological operators, the operators also have some distinguished properties. Those properties are related to intuitive descriptions of the geometric performances of the operators and the development of fast algorithms for the implementation of the operators. The efficiency of the operators in image processing is discussed, and two application examples are given.
Facial feature extraction algorithms that can be used for automated visual interpretation and recognition of human faces are presented. The contours of the eye and mouth can be captured by a deformable template model because their shapes are analytically describable. However, the shapes of the eyebrow, nostril and face are difficult to model using a deformable template. We extract them by using an active contour model (snake). In the experiments, 12 models are photographed, and the feature contours are extracted for each portrait.
This paper investigates surface approximation using a mesh
optimization approach. The mesh optimization problem is how to locate a
limited number n of grid points such that the established mesh of n grid
points approximates the digital surface of N sample points as closely as
possible. The resulting combinatorial problem has an NP-hard search
space of C(N, n) instances, i.e., the number of ways of choosing n grid
points out of N sample points. A genetic algorithm-based method has been
proposed for establishing optimal approximating mesh surfaces. It was
shown that the GA-based method is effective in searching the
combinatorial space which is intractable when n and N are in the order
of thousands. This paper proposes an efficient coarse-to-fine
evolutionary algorithm with a novel 2D orthogonal crossover for
obtaining an optimal solution to the mesh optimization problem. It is
shown empirically that the proposed coarse-to-fine evolutionary
algorithm outperforms the existing GA-based method in solving the mesh
optimization problem in terms of both approximation quality and
convergence speed, especially in solving large mesh optimization
problems.
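To give a sense of the C(N, n) search space mentioned above, the following sketch counts the ways of choosing n grid points out of N sample points with Python's exact integer binomial. The function name `search_space_size` and the example sizes are illustrative assumptions.

```python
import math


def search_space_size(N, n):
    """Number of ways to choose n grid points out of N sample points."""
    return math.comb(N, n)


small = search_space_size(5, 2)
# With N and n in the thousands the count is astronomically large;
# log10 gives its order of magnitude.
magnitude = math.log10(search_space_size(10000, 1000))
```

For N = 10000 and n = 1000 the count has well over a thousand decimal digits, which is why exhaustive enumeration is out of the question and heuristic search is used instead.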
In this paper, a new sub-pixel mapping method inspired by the clonal selection algorithm (CSA) in artificial immune systems (AIS) is proposed, namely clonal selection sub-pixel mapping (CSSM). In CSSM, the sub-pixel mapping problem becomes one of assigning land cover classes to the sub-pixels while maximizing the spatial dependence using the clonal selection algorithm. CSSM inherits the biological properties of human immune systems, i.e. clone, mutation, memory, to build a memory-cell population with a diverse set of local optimal solutions. Based on the memory-cell population, CSSM outputs the value of the memory cell and finds the optimal sub-pixel mapping result. The proposed method was tested using synthetic and degraded real imagery. Experimental results demonstrate that the proposed approach outperforms traditional sub-pixel mapping algorithms, and hence provides an effective option for sub-pixel mapping of remote sensing imagery.
In this paper we propose a system for localization of cephalometric landmarks. The process of localization is carried out in two steps: deriving a smaller expectation window for each landmark using a trained neuro-fuzzy system (NFS), and then applying a template-matching algorithm to pinpoint the exact location of the landmark. Four points are located on each image using edge detection. The four points are used to extract more features such as distances, shifts and rotation angles of the skull. A limited number of representative groups to be used for training is selected based on k-means clustering. The most effective features are selected based on a Fisher discriminant for each feature set. Using fuzzy linguistic if-then rules, a membership degree is assigned to each of the selected features and fed to the NFS. The NFS is trained, utilizing gradient descent, to learn the relation between the sizes, rotations and translations of landmarks and their locations. The data for training is obtained manually from one image from each cluster. Images whose features are located closer to the center of their cluster are used for extracting data for the training set. The expected locations on target images can then be predicted using the trained NFS. For each landmark a parametric template space is constructed from a set of templates extracted from several images based on the clarity of the landmark in that image. The template is matched to the search windows to find the exact location of the landmark. Decomposition of landmark shapes is used to desensitize the algorithm to size differences. The system is trained to locate 20 landmarks on a database of 565 images. Preliminary results show a recognition rate of more than 90%.
It is important to model strokes and their relationships for
on-line handwriting recognition, because they reflect character
structures. We propose to model them explicitly and statistically with
Bayesian networks. A character is modeled with stroke models and their
relationships. Strokes, parts of handwriting traces that are
approximately linear, are modeled with a set of point models and their
relationships. Points are modeled with conditional probability tables
and distributions for pen status and X, Y positions in the 2-D space,
given the information of related points. A Bayesian network is adopted
to represent a character model, whose nodes correspond to point models
and arcs their dependencies. The proposed system was tested on the
recognition of on-line handwritten digits. It showed higher recognition
rates than the HMM based recognizer with chaincode features and was
comparable to other published systems.
A novel biological model for color constancy (CC) is presented.
This is the first biological model that succeeds in achieving automatic
color correction of real still and video images (registered patent). It
is based on two chromatic adaptation mechanisms in retinal color-coded
cells: local adaptation and remote chromatic adaptation. Our simulations
calculate the perceived image in order to correct image colors (as is
commonly required in cameras). The results indicate that the
contribution of adaptation mechanisms to CC is significant and robust,
and that the model succeeds in performing color correction of still
images and video sequences under single and multiple illumination
conditions.
A region-based method is presented for the recognition of roads
and bridges in fully polarimetric synthetic aperture radar (SAR) images.
For the recognition of bridges, a CFAR detector is first used to extract
strong backscatterers from bridge fences; the backscatterers are then
grouped using a Hough transform to find potential bridge fences;
finally, bridges are delineated according to the polarimetric features
of the regions between the potential fences. After the delineation of
bridges, another Hough transform is used to recognize roads.
Morphological filtering is used to suppress speckle in SAR images.
Results on high resolution SAR images obtained from Lincoln Laboratory
The aim of motion stereo is to extract the 3-D information of an
object from images of a moving camera using the geometric relationships
between corresponding points. This paper presents an accurate and robust
motion stereo algorithm employing multiple images, taken under a general
motion. The object functions for individual stereo pairs are
represented with respect to the distance; these object functions are
then integrated considering the position of cameras and the shape of the
object functions. By integrating the general motion stereo images, we
not only reduce the ambiguities in correspondence, but also improve the
precision of the reconstruction. Also by introducing an adaptive window
technique, we can alleviate the effect of projective distortion in
matching features and greatly improve the accuracy. Experimental results
on a synthetic and a real data set are presented to demonstrate the
performance of the proposed algorithm.
A character image database plays an important role not only in the development stage but also in the evaluation of a handwritten character recognition system. Such a database is either obtained from an outside source or custom-made. At present there is no measure that indicates the level of recognition difficulty of a given database. If such a measure were available, it could be used in many ways. In particular, it would be valuable when comparing and evaluating the performance results of various systems, since these are usually obtained on different databases whose recognition difficulties are unknown. In this paper we propose such a measure. We first define the entropy of a point of an image in the database. Then we obtain the measure by applying some normalizing factors to this entropy. Note that such a measure should be used to compare the recognition difficulties only of databases of the same character set. We show that the proposed measure can be used for databases that differ not only in the number of images per class but also in image size. Finally, using real databases, we confirm that the proposed measure reflects the relative recognition difficulties of databases.
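As a rough illustration of a per-point entropy, the sketch below assumes binary images of equal size belonging to one class, estimates at each pixel the fraction of images in which it is set, and averages the binary entropy over all pixels. This is only one plausible reading: the abstract does not give the exact definition or the normalizing factors, so `pixel_entropy` and `mean_pixel_entropy` are hypothetical names and the averaging is an assumption.

```python
import math


def pixel_entropy(p):
    """Binary entropy (in bits) of a pixel that is set with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def mean_pixel_entropy(images):
    """Average per-pixel entropy over same-sized binary images (0/1 values).

    Dividing by the pixel count is an assumed normalization that makes
    values comparable across image sizes.
    """
    n = len(images)
    height, width = len(images[0]), len(images[0][0])
    total = 0.0
    for r in range(height):
        for c in range(width):
            p = sum(img[r][c] for img in images) / n
            total += pixel_entropy(p)
    return total / (height * width)
```

Under this reading, a class whose samples are pixel-for-pixel identical scores 0, while pixels that are set in about half of the samples contribute the maximum of 1 bit each.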
In this paper, a modular clutter rejection technique using region-based principal component analysis (PCA) is proposed. Our modular clutter rejection system uses dynamic ROI (region of interest) extraction to overcome the problem of poorly centered targets. In dynamic ROI extraction, a representative ROI is moved in several directions with respect to the center of the potential target image to extract a number of ROIs. Each module in the proposed system applies region-based PCA to generate the feature vectors, which are subsequently used to decide the identity of the potential target. We also present experimental results on real-life data, evaluating and comparing the performance of the clutter rejection systems with static and dynamic ROI extraction.
There is substantial interest in retrieving images from a large database using the textual information contained in the images. An algorithm that automatically locates the textual regions in the input image would facilitate this task; the optical character recognizer can then be applied to only those regions of the image which contain text. We present a method for automatically locating text in complex color images. The algorithm first finds the approximate locations of text lines using horizontal spatial variance, and then extracts text components in these boxes using color segmentation. The proposed method has been used to locate text in compact disc (CD) and book cover images, as well as in the images of traffic scenes captured by a video camera. Initial results are encouraging and suggest that these algorithms can be used in image retrieval applications.
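The first stage, finding approximate text-line locations from horizontal spatial variance, can be sketched as follows. The sketch assumes a grayscale image given as a list of rows, uses the variance of each whole row as the measure, and groups consecutive high-variance rows into bands; the local windowing, the threshold value and the names `row_variance` and `text_row_bands` are assumptions, not details from the abstract.

```python
def row_variance(row):
    """Variance of gray levels along one horizontal line."""
    mean = sum(row) / len(row)
    return sum((v - mean) ** 2 for v in row) / len(row)


def text_row_bands(image, threshold):
    """Group consecutive rows whose horizontal variance exceeds `threshold`.

    Returns a list of (first_row, last_row) pairs, inclusive. Text lines tend
    to produce strong intensity changes along the horizontal direction, so
    high-variance bands are candidate text-line locations.
    """
    bands = []
    start = None
    for r, row in enumerate(image):
        high = row_variance(row) > threshold
        if high and start is None:
            start = r
        elif not high and start is not None:
            bands.append((start, r - 1))
            start = None
    if start is not None:
        bands.append((start, len(image) - 1))
    return bands
```

Each returned band would then be passed to the color segmentation stage, which this sketch does not cover.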
The paper deals with the creation of visual segment trees involved
in MPEG-7 description schemes. After a brief overview of MPEG-7
description schemes in general and of the Segment Description Scheme in
particular, tools allowing the creation of segment trees are discussed.
It is proposed to create a binary partition tree in a first step and to
restructure the tree in a second step. Several examples involving
spatial and temporal partition trees are presented.
This paper studies the use of curvature in addition to the gradient of
gray-scale character images in order to improve the accuracy of
handwritten numeral recognition. Three procedures, based on the
curvature coefficient, biquadratic interpolation and gradient vector
interpolation, are proposed for calculating the curvature of the
equi-gray-scale curves of an input image. The efficiency of the feature
vector is tested by recognition experiments for the handwritten numeral
database IPTP CDROM1, which is a ZIP code database provided by the
Institute for Posts and Telecommunications Policy (IPTP). The
experimental results show the usefulness of the curvature feature, and a
recognition rate of 99.40%, which is the highest that has ever been
reported for this database, is achieved.
Texture segmentation deals with the identification of regions where distinct textures exist. In this paper, a new scheme for texture segmentation using hierarchical wavelet decomposition is proposed. In the first step, using Daubechies' 4-tap filter, an original image is decomposed into three detail images and one approximation image. The decomposition can be applied recursively to the approximation image to generate the next lower resolution level of the pyramid. The segmentation starts at the lowest resolution using the K-means clustering scheme, and the result is propagated through the pyramid to higher resolutions with continuously improving segmentation.
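One level of the decomposition step can be sketched in pure Python. The sketch assumes the standard orthonormal Daubechies 4-tap coefficients, periodic boundary extension, and image dimensions that are even and at least 4; the boundary handling and the function names `dwt1d` and `dwt2d` are assumptions, and the K-means stage is not shown.

```python
import math

_S3 = math.sqrt(3.0)
_NORM = 4.0 * math.sqrt(2.0)
# Daubechies 4-tap low-pass filter and its quadrature-mirror high-pass filter.
H = [(1 + _S3) / _NORM, (3 + _S3) / _NORM, (3 - _S3) / _NORM, (1 - _S3) / _NORM]
G = [H[3], -H[2], H[1], -H[0]]


def dwt1d(x):
    """One analysis level on a 1D signal, with periodic extension (assumed)."""
    n = len(x)
    approx = [sum(H[k] * x[(2 * i + k) % n] for k in range(4)) for i in range(n // 2)]
    detail = [sum(G[k] * x[(2 * i + k) % n] for k in range(4)) for i in range(n // 2)]
    return approx, detail


def _filter_columns(mat):
    """Apply dwt1d down every column and return (low, high) as row-major lists."""
    low_cols, high_cols = [], []
    for col in zip(*mat):
        a, d = dwt1d(list(col))
        low_cols.append(a)
        high_cols.append(d)
    return [list(r) for r in zip(*low_cols)], [list(r) for r in zip(*high_cols)]


def dwt2d(image):
    """Return (LL, LH, HL, HH): one approximation image and three detail images."""
    low_rows, high_rows = [], []
    for row in image:
        a, d = dwt1d(row)
        low_rows.append(a)
        high_rows.append(d)
    ll, lh = _filter_columns(low_rows)
    hl, hh = _filter_columns(high_rows)
    return ll, lh, hl, hh
```

Calling `dwt2d` again on the returned `LL` image yields the next, coarser pyramid level, which is where the segmentation described above would start.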
This paper provides a generalization of non-reducible descriptors by extending the concept of distance between patterns of different classes. Generalized non-reducible descriptors are used in supervised pattern recognition problems where the feature vectors consist of Boolean variables. Generalized non-reducible descriptors are expressed as conjunctions and correspond to syndromes in medical diagnosis. Generalized non-reducible descriptors minimize the number of operations in the decision rules. A mathematical model to construct generalized non-reducible descriptors, a computational procedure, and numerical examples are discussed.
We consider the problem of reconstructing the shape of an object
from multiple images related by translations, when only small portions
of the object can be observed in each image. Lindenbaum and Bruckstein
(1988) have considered this problem in the specific case where the
translating object is seen by small sensors, for application to the
understanding of insect vision. Their solution is limited by the fact
that its run time is exponential in the number of images and sensors. We
show that the problem can be solved in time that is polynomial in the
number of sensors, but is in fact NP-complete when the number of images
is unbounded. We therefore consider the special case of convex objects,
which we can solve efficiently even when many images are used.
We address the problem of image-based form document retrieval. The
essential element of this problem is the definition of a similarity
measure that is applicable in real situations, where query images are
allowed to differ from the database images. Based on the definition of
form signature, we have proposed a similarity measure that is
insensitive to translation, scaling, moderate skew (<5°) and
variations in the geometrical proportions of the form layout. This
similarity measure also has a good tolerance to line detection errors.
We have developed a prototype form retrieval system which has been
tested on a database containing 100 different kinds of forms.
In this paper, we first approximate the Gaussian function at any scale by a finite linear combination of Gaussian functions at dyadic scales; consequently, the scale space can be constructed much more efficiently: we only perform smoothing at these dyadic scales, and the smoothed signals at other scales can be found by calculating linear combinations of these discrete-scale signals. We show that the approximation error is small enough for our approach to be used in most areas of computer vision. We analyse the behavior of zero-crossings (ZCs) across scales and show that features at any scale can be found efficiently by tracking from the dyadic scales; thus we show that the new representation is necessary and complete. In the case that the derivatives are calculated by a special multiscale filter, we show that all the derivative signals can be treated in the same way.
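The idea of approximating a Gaussian at an arbitrary scale by a linear combination of dyadic-scale Gaussians can be illustrated with an ordinary least-squares fit on a sampled grid. The abstract does not say how the weights are obtained, so the least-squares criterion, the normalized Gaussian form, the sampling grid and the names `solve` and `fit_dyadic` are all assumptions made for this sketch.

```python
import math


def gaussian(x, sigma):
    """Unit-area Gaussian with standard deviation sigma."""
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))


def solve(a, b):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_dyadic(sigma, dyadic_sigmas, xs):
    """Least-squares weights for a Gaussian at `sigma` in the dyadic basis.

    Returns (weights, sum of squared errors on the sample grid `xs`).
    """
    basis = [[gaussian(x, s) for x in xs] for s in dyadic_sigmas]
    target = [gaussian(x, sigma) for x in xs]
    gram = [[sum(p * q for p, q in zip(bi, bj)) for bj in basis] for bi in basis]
    rhs = [sum(p * t for p, t in zip(bi, target)) for bi in basis]
    weights = solve(gram, rhs)
    approx = [sum(w * b[i] for w, b in zip(weights, basis)) for i in range(len(xs))]
    sse = sum((a - t) ** 2 for a, t in zip(approx, target))
    return weights, sse
```

Because smoothing is linear, the same weights can be applied to signals already smoothed at the dyadic scales, for example `fit_dyadic(3.0, [1.0, 2.0, 4.0], xs)` for an intermediate scale of 3.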
Based on high-level geometric knowledge, especially symmetry, imposed upon objects in images, we demonstrate how to edit images in terms of correct 3-D shape and relationships of the objects, without explicitly performing full 3-D reconstruction. Symmetry is proposed as the central notion that unifies, both conceptually and algorithmically, all types of geometric regularities such as parallelism, orthogonality, and similarity. The methods are extremely simple, accurate and easy to implement, and they demonstrate the power of applying scene knowledge.
This paper demonstrates the importance of active behavior in machine vision systems. It proposes using the activity of the observer for object finding and recognition. The presented robotic system is based on active and qualitative vision approaches which are supplemented by a hierarchical, multiresolution image representation and a parallel implementation of the active vision algorithms. The system uses a hierarchical search strategy to locate and then identify objects in a three-dimensional space. The paper describes the recognition algorithm and its parallel implementation. The preliminary results show near real-time performance.
A texture segmentation algorithm inspired by the multichannel filtering theory for visual information processing in the early stages of the human visual system is presented. The channels are characterized by a bank of Gabor filters that nearly uniformly covers the spatial-frequency domain. A systematic filter selection scheme based on reconstruction of the input image from the filtered images is proposed. Texture features are obtained by subjecting each (selected) filtered image to a nonlinear transformation and computing a measure of energy in a window around each pixel. An unsupervised square-error clustering algorithm is then used to integrate the feature images and produce a segmentation. A simple procedure to incorporate spatial adjacency information in the clustering process is proposed. Experiments on images with natural textures as well as artificial textures with identical second and third-order statistics are reported. The algorithm appears to perform as predicted by preattentive texture discrimination by a human.
Interpreting images is a difficult task to automate. Image
interpretation essentially consists of both low level and high level
vision tasks. In this paper, we develop a joint scheme for segmentation
and image interpretation in a multiresolution framework, where
segmentation (low level) and interpretation (high level) interleave. The
idea is that the interpretation block should be able to guide the
segmentation block, which in turn helps the interpretation block produce
a better interpretation. We assume that the conditional probability of the
interpretation labels, given the knowledge vector and the measurement
vector, is a Markov random field (MRF) and formulate the problem as a MAP
estimation problem at each resolution. We find the optimal
interpretation labels by using the simulated annealing algorithm. The
proposed algorithm is validated on some real scene images.
This paper considers supervised-learning, structure- and parameter-adaptive binary pattern recognition when a non-Gaussian pattern is observed in Gaussian noise. To obtain a feasible solution, certain judicious approximations are made. Two examples are presented to demonstrate the learning capability of the proposed algorithms.