Independent travel is a well-known challenge for blind and visually impaired persons. In this paper, we propose a proof-of-concept computer-vision-based wayfinding aid that helps blind people independently access unfamiliar indoor environments. To find different rooms (e.g. an office, a lab, or a bathroom) and other building amenities (e.g. an exit or an elevator), we combine object detection with text recognition. First, we develop a robust and efficient algorithm that detects doors, elevators, and cabinets from their general geometric shape by combining edge and corner features. The algorithm is general enough to handle large intra-class variations in appearance across indoor environments, as well as small inter-class differences between objects such as doors and door-like cabinets. Next, to distinguish objects within a class (e.g. an office door from a bathroom door), we extract and recognize text associated with the detected objects. For text recognition, we first extract text regions from signs with multiple colors and possibly complex backgrounds, and then apply character localization and topological analysis to filter out background interference. The extracted text is recognized using off-the-shelf optical character recognition (OCR) software. The object type, orientation, location, and text information are presented to the blind traveler as speech.
Characterizing breast lesions as benign or malignant is particularly difficult for small lesions: they do not exhibit typical characteristics of malignancy and are harder to segment because their margins are harder to visualize. Previous attempts at using dynamic or morphologic criteria to classify small lesions (mean lesion diameter of about 1 cm) have not yielded satisfactory results. The goal of this work was to improve classification performance for such small, diagnostically challenging lesions while eliminating the need for precise lesion segmentation. To this end, we introduce a method for topological characterization of lesion enhancement patterns over time. Three Minkowski Functionals were extracted from all five post-contrast images of sixty annotated lesions on dynamic breast MRI exams. For each Minkowski Functional, the topological features extracted from each post-contrast image of a lesion were combined into a high-dimensional texture feature vector. These feature vectors were classified in a machine learning task with support vector regression. For comparison, conventional Haralick texture features derived from gray-level co-occurrence matrices (GLCM) were also used. A new method for extracting thresholded GLCM features was also introduced and investigated. The best classification performance was observed with the Minkowski Functionals area and perimeter, thresholded GLCM features f8 and f9, and conventional GLCM features f4 and f6. However, both the Minkowski Functionals and the thresholded GLCM features achieved these results without lesion segmentation, whereas the performance of conventional GLCM features deteriorated significantly when lesions were not segmented (p < 0.05). This suggests that such advanced spatio-temporal characterization can improve classification performance for small lesions while eliminating the need for precise segmentation.
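As a concrete illustration of the topological features above, the following sketch computes two of the 2D Minkowski functionals (area and perimeter) of a thresholded image and stacks them over several post-contrast images into one feature vector. The function names, the 4-neighbourhood perimeter definition, and the per-threshold stacking are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def minkowski_area_perimeter(image, threshold):
    """Two 2D Minkowski functionals of the binary pattern obtained by
    thresholding a gray-level image: area is the number of foreground
    pixels, perimeter the number of foreground/background edges in the
    4-neighbourhood."""
    fg = np.asarray(image) >= threshold
    area = int(fg.sum())
    # Pad with background so border pixels contribute boundary edges.
    p = np.pad(fg, 1, mode="constant", constant_values=False)
    perimeter = int((p[1:, :] != p[:-1, :]).sum()    # edges between rows
                    + (p[:, 1:] != p[:, :-1]).sum())  # edges between columns
    return area, perimeter

def topological_feature_vector(post_contrast_images, thresholds):
    """Concatenate (area, perimeter) over all post-contrast images and
    gray-level thresholds into one high-dimensional feature vector."""
    feats = [v for img in post_contrast_images
               for t in thresholds
               for v in minkowski_area_perimeter(img, t)]
    return np.array(feats, dtype=float)
```

Because these functionals are computed over the whole region of interest at many thresholds, no lesion contour is required, which matches the segmentation-free property claimed above.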
In this work we explored class separability in feature spaces built on extended representations of pixel planes (EPP) produced using scale pyramid, subband pyramid, and image transforms. The image transforms included Chebyshev, Fourier, wavelets, gradient and Laplacian; we also utilized transform combinations, including Fourier, Chebyshev and wavelets of the gradient transform, as well as Fourier of the Laplacian transform. We demonstrate that all three types of EPP promote class separation. We also explored the effect of EPP on suboptimal feature libraries, using only textural features in one case and only Haralick features in another. The effect of EPP was especially clear for these suboptimal libraries, where the transform-based representations were found to increase separability to a greater extent than scale or subband pyramids. EPP can be particularly useful in new applications where optimal features have not yet been developed.
Target recognition is a multi-level process requiring a sequence of algorithms at low, intermediate, and high levels. Generally, such systems are open loop, with no feedback between levels, and assuring their performance at a given probability of correct identification (PCI) and probability of false alarm (Pf) is a key challenge in computer vision and pattern recognition research. In this paper, a robust closed-loop system for recognition of SAR images based on reinforcement learning is presented. The parameters of the model-based SAR target recognition are learned. The method meets performance specifications by using PCI and Pf as feedback for the learning system. It has been experimentally validated by learning the parameters of the recognition system for SAR imagery, successfully recognizing articulated targets, targets of different configurations, and targets at different depression angles.
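The abstract does not specify the reinforcement-learning formulation, but the closed-loop idea of feeding (PCI, Pf) back to a parameter learner can be sketched as a simple bandit: each candidate parameter setting is an arm, the measured (PCI, Pf) pair is the feedback, and the reward penalises exceeding the false-alarm budget. All names here (`learn_parameter`, `evaluate`, the reward shape) are hypothetical stand-ins, not the paper's method.

```python
import random

def learn_parameter(evaluate, candidates, max_pf, iters=200, eps=0.2, seed=0):
    """Epsilon-greedy closed-loop parameter learning: pick a candidate
    recognition parameter, observe (PCI, Pf) from `evaluate` (a stand-in
    for running the recogniser on validation data), and keep the arm
    whose reward best meets the specification."""
    rng = random.Random(seed)
    value = {c: 0.0 for c in candidates}
    count = {c: 0 for c in candidates}
    for _ in range(iters):
        if rng.random() < eps:
            arm = rng.choice(candidates)          # explore
        else:
            arm = max(candidates, key=lambda c: value[c])  # exploit
        pci, pf = evaluate(arm)
        # Reward high PCI; penalise exceeding the false-alarm budget.
        reward = pci - 10.0 * max(pf - max_pf, 0.0)
        count[arm] += 1
        value[arm] += (reward - value[arm]) / count[arm]   # running mean
    return max(candidates, key=lambda c: value[c])
```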
Edges are useful features for structural image analysis, but the output of standard edge detectors must be thresholded to remove the many spurious edges. This paper describes experiments with both new and old techniques for (1) determining edge saliency (as alternatives to gradient magnitude) and (2) automatically determining appropriate edge threshold values. Examples of edge saliency measures are lifetime, wiggliness, spatial width, and phase congruency. Examples of thresholding techniques use the Rayleigh distribution to model the edge gradient magnitude histogram, relaxation labelling, and an edge-curve "length"-"average gradient magnitude" feature space.
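The Rayleigh-based technique mentioned above can be sketched as follows: fit a Rayleigh distribution to the gradient-magnitude histogram by maximum likelihood and place the threshold where the fitted survival probability drops below a chosen fraction. The `keep_fraction` parameter and function name are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def rayleigh_edge_threshold(grad_mag, keep_fraction=0.05):
    """Fit a Rayleigh distribution to gradient magnitudes
    (MLE: sigma^2 = E[x^2] / 2) and return the threshold t such that a
    Rayleigh-distributed magnitude exceeds t with probability
    `keep_fraction`; gradients above t are kept as salient edges."""
    g = np.asarray(grad_mag, dtype=float).ravel()
    sigma = np.sqrt(np.mean(g ** 2) / 2.0)
    # Invert the Rayleigh survival function exp(-t^2 / (2 sigma^2)).
    return sigma * np.sqrt(-2.0 * np.log(keep_fraction))
```

Because the noise-dominated bulk of the histogram determines sigma, the threshold adapts automatically to each image's noise level rather than being set by hand.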
There is rapid growth in sensor technology in regions of the electromagnetic spectrum beyond the visible. In parallel, the growth of powerful, inexpensive computers and digital electronics has made many new imaging applications possible. Although there are many approaches to sensor fusion, this paper provides a discussion of relevant infrared phenomenology and attempts to apply known methods of human color vision to achieve image fusion. Two specific topics of importance are color contrast enhancement and color constancy.
Chinese calligraphy is a traditional East Asian art. In this paper, an interactive calligraphy guiding system is proposed that scores written characters using image processing and fuzzy inference techniques. The written documents are automatically segmented. Three quantified features of each written character, its center, its size, and its projections, are extracted to score the calligraphy. With this system, users can learn and practice Chinese calligraphy at home.
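The three features named above are straightforward to compute from a binarized, segmented character; a minimal sketch follows. The exact definitions (ink centroid for the center, bounding-box area for the size, row/column ink sums for the projections) are plausible readings of the abstract, not the paper's confirmed choices.

```python
import numpy as np

def character_features(char_mask):
    """Three quantified features of a segmented character mask:
    center (ink centroid), size (bounding-box area), and the
    horizontal/vertical projection profiles."""
    ys, xs = np.nonzero(char_mask)
    center = (float(ys.mean()), float(xs.mean()))
    size = int((ys.max() - ys.min() + 1) * (xs.max() - xs.min() + 1))
    row_proj = char_mask.sum(axis=1)  # ink per row
    col_proj = char_mask.sum(axis=0)  # ink per column
    return center, size, row_proj, col_proj
```

A fuzzy scorer would then compare these features against those of a model character to grade centering, proportion, and stroke distribution.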
This paper presents a writer-independent system for on-line handwriting recognition that can handle both cursive script and hand-print. The pen trajectory is recorded by a touch-sensitive pad, such as those used by note-pad computers. The input to the system contains the pen trajectory information, encoded as a time-ordered sequence of feature vectors. Features include the X and Y coordinates, pen-lifts, speed, direction, and curvature of the pen trajectory. A time-delay neural network with local connections and shared weights is used to estimate a posteriori probabilities for characters in a word. A hidden Markov model segments the word into characters in a way that optimizes the global word score, taking a dictionary into account. A geometrical normalization scheme and a fast but efficient dictionary search are also presented. Trained on 20,000 unconstrained cursive words from 59 writers and using a 25,000-word dictionary, the authors reached an 89% character and 80% word recognition rate on test data from a disjoint set of writers.
A complete and practical isolated-object recognition system has been developed that is very robust to changes in the scale, position, and orientation of the objects, as well as to noise and to local shape deformations caused by perspective projection, segmentation errors, and the non-rigid material of some objects. The system has been tested on a wide variety of 3-D objects with different shapes and surface properties. A light-box setup is used to obtain silhouette images, which are segmented to extract the physical boundaries of the objects; each boundary is classified as either convex or concave. Convex curves are recognized using their four high-scale curvature extrema points. Curvature scale space (CSS) representations are computed for concave curves. The CSS representation is a multi-scale organization of the natural invariant features of a curve. A three-stage coarse-to-fine matching algorithm quickly detects the correct object in each case.
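Both the curvature extrema used for convex curves and the CSS representation rest on the discrete curvature of the boundary. A minimal sketch, assuming a uniformly sampled closed contour and periodic central differences (the paper's exact discretization is not given), is:

```python
import numpy as np

def closed_curve_curvature(curve):
    """Discrete curvature kappa = (x'y'' - y'x'') / (x'^2 + y'^2)^(3/2)
    along a closed contour of shape (N, 2), using periodic central
    differences so the curve wraps around. CSS tracks the zero-crossings
    of kappa as the curve is smoothed by Gaussians of increasing width."""
    x, y = curve[:, 0], curve[:, 1]
    dx = (np.roll(x, -1) - np.roll(x, 1)) / 2.0    # first derivatives
    dy = (np.roll(y, -1) - np.roll(y, 1)) / 2.0
    ddx = np.roll(x, -1) - 2.0 * x + np.roll(x, 1)  # second derivatives
    ddy = np.roll(y, -1) - 2.0 * y + np.roll(y, 1)
    denom = np.maximum((dx ** 2 + dy ** 2) ** 1.5, 1e-12)
    return (dx * ddy - dy * ddx) / denom
```

For a convex shape the four largest values of this array give the high-scale extrema points used for matching.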
This paper presents a simple method for representing motion in successively layered silhouettes that directly encode system time, termed the timed Motion History Image (tMHI). This representation can be used both to (a) determine the current pose of the object and to (b) segment and measure the motions induced by the object in a video scene. These segmented regions are not “motion blobs”, but motion regions naturally connected to the moving parts of the object of interest. This method may be used as a very general gesture recognition “toolbox”; we use it to recognize waving and overhead clapping motions to control a music synthesis program.
Kernel-based object tracking refers to computing the translation of an isotropic object kernel from one video frame to the next. The kernel is commonly chosen as a primitive geometric shape, and its translation is computed by maximizing the likelihood between the current and past object observations. When the object does not have an isotropic shape, the kernel includes non-object regions, which biases the motion estimation and results in loss of the tracked object. In this paper, we propose an asymmetric object kernel for improving tracking performance. An important advantage of an asymmetric kernel over an isotropic kernel is its precise representation of the object shape, which enhances tracking performance by discarding the non-object regions. The second contribution of our paper is a new adaptive kernel scale and orientation selection method, a task currently handled by greedy algorithms. In our approach, the scale and orientation are introduced as additional dimensions to the spatial image coordinates, in which the mode seeking, and hence tracking, is achieved simultaneously in all coordinates. As demonstrated in a set of experiments, the proposed method has better tracking performance, with comparable execution time, than kernel tracking methods used in practice.
Keywords: Mode seeking, Visual tracking, Asymmetric kernels, Scale selection, Orientation selection
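The joint-space idea above, seeking the mode over scale and orientation together with the spatial coordinates, reduces to mean shift in a higher-dimensional state space. A generic Gaussian-kernel mean-shift sketch (not the paper's specific kernel or likelihood) makes this concrete:

```python
import numpy as np

def mean_shift_mode(samples, start, bandwidth, max_iter=100, tol=1e-6):
    """Gaussian-kernel mean-shift mode seeking over weighted samples.
    Running it on (x, y, scale, orientation) state vectors instead of
    plain image coordinates is the joint-space tracking idea."""
    x = np.asarray(start, dtype=float)
    pts = np.asarray(samples, dtype=float)
    for _ in range(max_iter):
        # Gaussian weights of all samples around the current estimate.
        w = np.exp(-0.5 * np.sum((pts - x) ** 2, axis=1) / bandwidth ** 2)
        x_new = w @ pts / w.sum()        # weighted mean = mean-shift step
        if np.linalg.norm(x_new - x) < tol:
            break
        x = x_new
    return x
```

Each iteration moves the estimate toward the local density mode, so position, scale, and orientation converge simultaneously rather than by alternating greedy searches.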
The 4-D approach to real-time machine vision presented in the companion paper (Dickmanns and Graefe 1988, this volume) is applied here to two problem areas of widespread interest in robotics. Following a discussion of the vision hardware used, first, the precise position control for planar docking between 3-D vehicles is discussed; second, the application to high-speed road vehicle guidance is demonstrated. With the 5-ton test vehicle VaMoRs, speeds up to 96 km/h (limited by the speed capability of the basic vehicle) have been reached. The available test run, more than 20 km long, has been driven autonomously several times under various weather conditions.
A new 2D code called Secure 2D code is designed in this paper, and both an encoder and a decoder are proposed. Secure 2D code can store any kind of data and provides high security. For security, the input data are divided into two parts: general and secret. The general data are transformed into a 2D code pattern, and the secret data are then hidden in the 2D code pattern. To raise the reading speed and allow various reading environments, some features are added around the boundary of the 2D code pattern. For reliability, Reed–Solomon (RS) coding is adopted to handle damaged patterns.
The automation of the analysis of large volumes of seismic data is a data mining problem in which a large database of 3D images is searched by content to identify the regions that are of most interest to the oil industry. In this paper we perform this search using the 3D orientation histogram as a texture analysis tool to represent and identify regions within the data that are compatible with a query texture.
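A 3D orientation histogram can be built by binning the gradient directions of the volume by azimuth and elevation, weighted by gradient magnitude. The sketch below is one plausible construction under those assumptions (the bin counts, weighting, and normalisation are illustrative, not the paper's exact settings):

```python
import numpy as np

def orientation_histogram_3d(volume, n_az=8, n_el=4):
    """3D orientation histogram: bin the gradient directions of a volume
    by azimuth/elevation, weighting each vote by gradient magnitude, and
    normalise so histograms of different regions are comparable."""
    gz, gy, gx = np.gradient(volume.astype(float))
    mag = np.sqrt(gx ** 2 + gy ** 2 + gz ** 2).ravel()
    az = np.arctan2(gy, gx).ravel()                      # [-pi, pi]
    el = np.arcsin(np.clip(gz.ravel() / np.maximum(mag, 1e-12), -1.0, 1.0))
    ia = np.clip(((az + np.pi) / (2 * np.pi) * n_az).astype(int), 0, n_az - 1)
    ie = np.clip(((el + np.pi / 2) / np.pi * n_el).astype(int), 0, n_el - 1)
    hist = np.zeros((n_az, n_el))
    np.add.at(hist, (ia, ie), mag)                       # magnitude-weighted votes
    return hist / max(hist.sum(), 1e-12)
```

Comparing the histogram of a candidate region against that of the query texture (e.g. by histogram intersection) then implements the content-based search.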
The recent advances in 3-D imaging technologies give rise to databases of human shapes, from which statistical shape models can be built. These statistical models represent prior knowledge of the human shape and enable us to solve shape reconstruction problems from partial information. Generating a human shape from traditional anthropometric measurements is such a problem, since these 1-D measurements encode 3-D shape information. Combined with a statistical shape model, these easy-to-obtain measurements can be leveraged to create 3-D human shapes. However, existing methods limit the created shapes to the space spanned by the database and thus require a large amount of training data. In this paper, we introduce a technique that extrapolates the statistically inferred shape to fit the measurement data using nonlinear optimization. This ensures that the generated shape is both human-like and satisfies the measurement conditions. We demonstrate the effectiveness of the method and compare it to existing approaches through extensive experiments, using both synthetic data and real human measurements.
In the past, deformation/fracture (D/F) characteristics, defined as load-deformation relationships up to the point at which the material fractures, have been analyzed and evaluated at the surface. The D/F characteristics are affected by more than 10,000 micro-scale internal structures, such as air bubbles (pores), dispersed particles, and cracks, in 1 mm³; it is therefore necessary to analyze nano-scale D/F characteristics inside materials. In this paper, we propose an analysis method that obtains displacement vectors of dispersed particles from nano-order 3D-CT images. The problem of matching over 10,000 dispersed particles between deformation states is solved by stratified matching.
Angularity is a critically important property in terms of the performance of natural particulate materials. It is also one of the most difficult to measure objectively using traditional methods. Here we present an innovative and efficient approach to the determination of particle angularity using image analysis. The direct use of three-dimensional data offers a more robust solution than the two-dimensional methods proposed previously. The algorithm is based on the application of mathematical morphological techniques to range imagery, and effectively simulates the natural wear processes by which rock particles become rounded. The analysis of simulated volume loss is used to provide a valuable measure of angularity that is geometrically commensurate with the traditional definitions. Experimental data obtained using real particle samples are presented and results correlated with existing methods in order to demonstrate the validity of the new approach. The implementation of technologies such as these has the potential to offer significant process optimisation and environmental benefits to the producers of aggregates and their composites. The technique is theoretically extendable to the quantification of surface texture.
Processing images acquired by multi-camera systems is nowadays an effective and convenient way of performing 3D reconstruction. The basic output, i.e. the 3D location of points, can easily be further processed to acquire additional kinematic data: velocity and acceleration. Hence, many such reconstruction systems are referred to as 3D kinematic systems and are very broadly used, among other tasks, for human motion analysis. A prerequisite for the actual reconstruction of the unknown points is the calibration of the multi-camera system. At present, many popular 3D kinematic systems offer so-called wand calibration, using a rigid bar with attached markers, which end users prefer over many traditional methods. In this work, a brief critique of different calibration strategies is given and typical calibration approaches for 3D kinematic systems are explained. In addition, alternative ways of calibration are proposed, especially for the initialization stage. More specifically, the proposed methods rely not only on the enforcement of known distances between markers, but also on the orthogonality of two or three rigidly linked wands. Moreover, the proposed ideas utilize commonly available calibration tools and shorten the typical calibration procedure. The obtained reconstruction accuracy is comparable with that of commercial 3D kinematic systems.
In this paper, we present an automatic horizon-picking algorithm, based on a surface detection technique, to detect horizons in 3D seismic data. The surface detection technique, together with the use of 6-connectivity, allows us to detect fragments of horizons that are afterwards combined to form full horizons. The criteria for combining the fragments are similarity of orientation of the fragments, as expressed by their normal vectors, and proximity using 18-connectivity. The identified horizons are interrupted at faults, as required by the experts.
Scanning by a moving range sensor from the air is one of the most effective methods of obtaining range data of large-scale objects, since it can measure regions invisible from the ground. The obtained data, however, suffer distortions due to the sensor motion during the scanning period. Besides these distorted range data, range data sets taken by other sensors fixed on the ground are usually available. Based on the regions of overlap visible from both the moving sensor and the fixed ones, we propose an extended alignment algorithm to rectify the distorted range data and to align them to the models from the fixed sensors. Using CAD models, we estimate the accuracy and effectiveness of the proposed method. We then apply it to real data sets to prove the validity of the method.
Keywords: Moving range sensor, Alignment with motion parameters, Large-scale site modeling
Despite the great progress achieved in 3-D pose tracking in recent years, occlusions and self-occlusions are still an open issue. This is particularly true in silhouette-based tracking, where even visible parts cannot be tracked as long as they do not affect the object silhouette. Multiple cameras or motion priors can overcome this problem; however, multiple cameras or appropriate training data are not always readily available. We propose a framework in which the pose of 3-D models is found by minimising the 2-D projection error through minimisation of an energy function depending on the pose parameters. This framework makes it possible to handle occlusions and self-occlusions by tracking multiple objects and object parts simultaneously. To this end, each part is described by its own image region, each of which is modeled by one probability density function. This allows occlusions to be dealt with explicitly, including self-occlusions between different parts of the same object as well as occlusions between different objects. The results we present for simulations and real-world scenes demonstrate the improvements achieved in monocular and multi-camera settings. These improvements are substantiated by quantitative evaluations.
Keywords: Pose estimation, Model-based tracking, Kinematic chain, Computer vision, Human motion analysis, Occlusion handling
The Perceptive Workbench endeavors to create a spontaneous and unimpeded interface between the physical and virtual worlds. Its vision-based methods for interaction constitute an alternative to wired input devices and tethered tracking. Objects are recognized and tracked when placed on the display surface. By using multiple infrared light sources, the object's 3-D shape can be captured and inserted into the virtual interface. This ability permits spontaneity, since either preloaded objects or objects selected at run-time by the user can become physical icons. Integrated into the same vision-based interface is the ability to identify 3-D hand position, pointing direction, and sweeping arm gestures. Such gestures can enhance selection, manipulation, and navigation tasks. The Perceptive Workbench has been used for a variety of applications, including augmented reality gaming and terrain navigation. This paper focuses on the techniques used in implementing the Perceptive Workbench and on the system's performance.
The Lucas–Kanade tracker (LKT) is a commonly used method for tracking target objects over 2D images. The key principle behind the object tracking of an LKT is to warp the object appearance so as to minimize the difference between the warped object's appearance and a pre-stored template. Accordingly, the 2D pose of the tracked object, in terms of translation, rotation, and scaling, can be recovered from the warping. To extend the LKT to 3D pose estimation, a model-based 3D LKT assumes a 3D geometric model for the target object in 3D space and tries to infer the 3D object motion by minimizing the difference between the projected 2D image of the 3D object and the pre-stored 2D image template. In this paper, we propose an extended model-based 3D LKT for estimating 3D head poses by tracking human heads in video sequences. In contrast to the original model-based 3D LKT, which uses a template with each pixel represented by a single intensity value, the proposed model-based 3D LKT exploits an adaptive template in which each template pixel is modeled by a continuously updated Gaussian distribution during head tracking. This probabilistic template modeling improves the tracker's ability to handle temporal fluctuation of pixels caused by continuous environmental changes such as varying illumination and dynamic backgrounds. Owing to the new probabilistic template modeling, we reformulate head pose estimation as a maximum likelihood estimation problem, rather than the original difference minimization procedure. Based on the new formulation, an algorithm to estimate the best head pose is derived. The experimental results show that the proposed extended model-based 3D LKT achieves higher accuracy and reliability than the conventional one. In particular, the proposed LKT is very effective in handling varying illumination, which cannot be well handled by the original model-based 3D LKT.
Keywords: Visual tracking, Lucas–Kanade tracker, 3D pose estimation, Maximum likelihood estimation, Adaptive template
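A per-pixel Gaussian template as described above can be realised with a running mean and variance under exponential forgetting, with the pose then scored by the summed Gaussian log-likelihood. The forgetting-factor update below is a common way to implement such a template, offered as a sketch rather than the paper's exact update rule; `rho` and `var_floor` are assumed parameters.

```python
import numpy as np

def update_gaussian_template(mean, var, warped, rho=0.05, var_floor=1e-4):
    """Per-pixel running Gaussian template: exponential forgetting of the
    mean and variance lets the template absorb slow appearance changes
    such as illumination drift and dynamic backgrounds."""
    new_mean = (1.0 - rho) * mean + rho * warped
    new_var = (1.0 - rho) * var + rho * (warped - new_mean) ** 2
    return new_mean, np.maximum(new_var, var_floor)  # keep variance positive

def template_log_likelihood(mean, var, warped):
    """Summed Gaussian log-likelihood of a warped observation under the
    template; the ML head pose maximises this over the pose parameters."""
    return float(np.sum(-0.5 * (np.log(2.0 * np.pi * var)
                                + (warped - mean) ** 2 / var)))
```

Pixels with large learned variance (e.g. flickering background) automatically contribute less to the pose score, which is the practical benefit of the probabilistic template over a single-intensity one.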
Texture analysis techniques have been used extensively for surface inspection, in which small defects that appear as local anomalies in textured surfaces must be detected. Traditional surface inspection methods concentrate mainly on homogeneous textures. In this paper, we propose a 3D Fourier reconstruction scheme to tackle the problem of surface inspection on sputtered glass substrates, which contain inhomogeneous textures. Such sputtered surfaces can be found in touch panels and liquid crystal displays (LCDs).
Since an inhomogeneously textured surface does not have repetition or self-similarity properties in the image, a sequence of faultless images together with the inspection image is used to construct a 3D image, so that the periodic patterns of the surface can be observed along the additional frame axis. Bandreject filtering is used to eliminate the frequency components associated with the faultless textures of the spatial-domain image, and the 3D inverse Fourier transform is then carried out to reconstruct the image. The resulting image effectively removes the background textures and distinctly preserves anomalies. This converts the difficult problem of defect detection in complicated inhomogeneous textures into simple thresholding in nontextured images. Experimental results from a number of sputtered glass surfaces have shown the efficacy of the proposed 3D Fourier image reconstruction scheme.
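The pipeline above (stack faultless images with the inspection image, reject the frequency band carrying the background texture, inverse-transform, threshold) can be sketched with NumPy's FFT. The radial band `r_lo`/`r_hi` is an illustrative assumption; the paper derives the rejected components from the faultless images rather than from a fixed radius.

```python
import numpy as np

def bandreject_reconstruct(stack, r_lo, r_hi):
    """3D Fourier reconstruction sketch: `stack` holds faultless frames
    plus the inspection image as the last frame. Zero out a radial band
    of 3D frequencies (standing in for the background-texture components)
    and inverse-transform; defects survive as residual anomalies."""
    F = np.fft.fftshift(np.fft.fftn(stack.astype(float)))
    zc, yc, xc = np.array(F.shape) // 2
    z, y, x = np.ogrid[:F.shape[0], :F.shape[1], :F.shape[2]]
    r = np.sqrt((z - zc) ** 2 + (y - yc) ** 2 + (x - xc) ** 2)
    F[(r >= r_lo) & (r <= r_hi)] = 0          # reject the textured band
    recon = np.real(np.fft.ifftn(np.fft.ifftshift(F)))
    return recon[-1]                          # reconstructed inspection frame
```

Thresholding the returned frame then flags defect pixels, which is the "simple thresholding in nontextured images" step described above.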
This paper presents a machine vision system for the high-accuracy 3D measurement of holes on the surface of industrial components. This is a very important application for inline quality inspection in assembly plants. A CAD-based stereo vision approach is adopted. The introduction of several novel techniques enables the system to achieve high robustness in versatile industrial environments, rapid response, and accuracy below 0.1 mm. These are demonstrated by extensive experiments with synthetic and real data.