Machine Vision and Applications

Published by Springer Nature

Online ISSN: 1432-1769


Print ISSN: 0932-8092


Toward a Computer Vision-based Wayfinding Aid for Blind Persons to Access Unfamiliar Indoor Environments

April 2013




Xiaodong Yang


Chucai Yi


Independent travel is a well-known challenge for blind and visually impaired persons. In this paper, we propose a proof-of-concept computer vision-based wayfinding aid for blind people to independently access unfamiliar indoor environments. In order to find different rooms (e.g., an office, a lab, or a bathroom) and other building amenities (e.g., an exit or an elevator), we incorporate object detection with text recognition. First, we develop a robust and efficient algorithm to detect doors, elevators, and cabinets based on their general geometric shape, by combining edges and corners. The algorithm is general enough to handle large intra-class variations of objects with different appearances across indoor environments, as well as small inter-class differences between objects such as doors and door-like cabinets. Next, in order to distinguish objects of the same class (e.g., an office door from a bathroom door), we extract and recognize the text information associated with the detected objects. For text recognition, we first extract text regions from signs with multiple colors and possibly complex backgrounds, and then apply character localization and topological analysis to filter out background interference. The extracted text is recognized using off-the-shelf optical character recognition (OCR) software. The object type, orientation, location, and text information are presented to the blind traveler as speech.
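
The door detector described above combines edges and corners into a geometric shape test. The sketch below is an illustrative stand-in, not the authors' algorithm; the aspect-ratio and tilt thresholds are invented values. It shows the kind of quadrilateral check such a detector might apply to four corner candidates:

```python
import numpy as np

def is_door_candidate(corners, min_aspect=1.5, max_aspect=3.5, max_tilt_deg=10.0):
    """Rudimentary geometric test for a door-shaped quadrilateral.

    corners: four (x, y) points ordered top-left, top-right,
    bottom-right, bottom-left (image coordinates, y growing downward).
    """
    tl, tr, br, bl = (np.asarray(c, dtype=float) for c in corners)
    width = (np.linalg.norm(tr - tl) + np.linalg.norm(br - bl)) / 2.0
    height = (np.linalg.norm(bl - tl) + np.linalg.norm(br - tr)) / 2.0
    if width == 0:
        return False
    aspect = height / width
    # Door frames are taller than wide and their side edges are near-vertical.
    left_tilt = np.degrees(np.arctan2(abs(bl[0] - tl[0]), abs(bl[1] - tl[1])))
    right_tilt = np.degrees(np.arctan2(abs(br[0] - tr[0]), abs(br[1] - tr[1])))
    return (min_aspect <= aspect <= max_aspect
            and left_tilt <= max_tilt_deg and right_tilt <= max_tilt_deg)
```

A real detector would generate the corner candidates from edge and corner maps; this function covers only the final shape test.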

Classification of small lesions in dynamic breast MRI: Eliminating the need for precise lesion segmentation through spatio-temporal analysis of contrast enhancement

October 2013




Markus B Huber


Thomas Schlossbauer




Axel Wismüller
Characterizing breast lesions as benign or malignant is especially difficult for small lesions: they do not exhibit typical characteristics of malignancy, and they are harder to segment because their margins are harder to visualize. Previous attempts at using dynamic or morphologic criteria to classify small lesions (mean lesion diameter of about 1 cm) have not yielded satisfactory results. The goal of this work was to improve the classification performance for such small, diagnostically challenging lesions while concurrently eliminating the need for precise lesion segmentation. To this end, we introduce a method for topological characterization of lesion enhancement patterns over time. Three Minkowski functionals were extracted from all five post-contrast images of sixty annotated lesions on dynamic breast MRI exams. For each Minkowski functional, the topological features extracted from each post-contrast image of a lesion were combined into a high-dimensional texture feature vector. These feature vectors were classified in a machine learning task with support vector regression. For comparison, conventional Haralick texture features derived from gray-level co-occurrence matrices (GLCM) were also used. A new method for extracting thresholded GLCM features was also introduced and investigated here. The best classification performance was observed with the Minkowski functionals area and perimeter, the thresholded GLCM features f8 and f9, and the conventional GLCM features f4 and f6. However, both the Minkowski functionals and the thresholded GLCM features achieved these results without lesion segmentation, whereas the performance of the conventional GLCM features deteriorated significantly when lesions were not segmented (p < 0.05). This suggests that such advanced spatio-temporal characterization can improve the classification performance achieved in small lesions, while simultaneously eliminating the need for precise segmentation.
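
Two of the 2D Minkowski functionals used above, area and perimeter, can be computed from a binary (e.g. thresholded contrast-enhancement) image by simple pixel counting. This numpy sketch omits the third functional (the Euler characteristic):

```python
import numpy as np

def minkowski_area_perimeter(mask):
    """Area (pixel count) and perimeter (count of exposed 4-neighbour
    pixel edges) of a binary image -- two of the 2D Minkowski functionals."""
    m = np.asarray(mask, dtype=bool)
    area = int(m.sum())
    padded = np.pad(m, 1, constant_values=False)
    perimeter = 0
    # An edge is exposed when a foreground pixel's neighbour is background.
    for shift_axis, shift in ((0, 1), (0, -1), (1, 1), (1, -1)):
        neighbour = np.roll(padded, shift, axis=shift_axis)
        perimeter += int(np.logical_and(padded, ~neighbour).sum())
    return area, perimeter
```

In the paper these quantities are extracted per post-contrast image and concatenated over time into the texture feature vector.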

Fig. 2 Examples of EPP using scale (a) and subband (b) pyramids. A scale pyramid produces a set of images in which each level is scaled by half. A subband pyramid is implemented by applying a windowed …
Table 2 Data sets used
Fig. 3 The computational chain for features computed with the MPF library on scale/subband pyramids. The MPF library is applied to the original image, resulting in a feature vector of length L. The same MPF is applied to each level of the pyramid to generate level-specific features, resulting in an overall feature-vector length of 4L.
Fig. 4 Examples of transforms for one instance from the Yale dataset. Image transforms shown: (a) log(|F[P]|), (b) |∇P|, (c) P, (d) Ch[P], (e) W[P], (f) log(|F[|∇P|]|), (g) log(|F[|P|]|), (h) Ch[|∇P|], and …
Fig. 5 Three chains computed on the original pixel plane ((a) L-chain), on planes produced by single transforms ((b) ITF1(L) chain), and by compound transforms ((c) ITF2(L) chain). The set of features computed from each transform-derived pixel plane is transform-specific, so additional transforms and transform combinations result in additional features in the final feature vector.


Improving class separability using extended pixel planes: A comparative study

September 2012



In this work we explored class separability in feature spaces built on extended representations of pixel planes (EPP) produced using scale pyramid, subband pyramid, and image transforms. The image transforms included Chebyshev, Fourier, wavelets, gradient and Laplacian; we also utilized transform combinations, including Fourier, Chebyshev and wavelets of the gradient transform, as well as Fourier of the Laplacian transform. We demonstrate that all three types of EPP promote class separation. We also explored the effect of EPP on suboptimal feature libraries, using only textural features in one case and only Haralick features in another. The effect of EPP was especially clear for these suboptimal libraries, where the transform-based representations were found to increase separability to a greater extent than scale or subband pyramids. EPP can be particularly useful in new applications where optimal features have not yet been developed.
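
A scale pyramid of the kind used for EPP can be sketched with 2x2 block averaging in numpy (a simplified stand-in for proper low-pass downsampling):

```python
import numpy as np

def scale_pyramid(image, levels=3):
    """Scale pyramid: each level halves the previous one by 2x2 block
    averaging; odd rows/columns are cropped before halving."""
    pyramid = [np.asarray(image, dtype=float)]
    for _ in range(levels):
        prev = pyramid[-1]
        h, w = prev.shape[0] // 2 * 2, prev.shape[1] // 2 * 2
        cropped = prev[:h, :w]
        halved = cropped.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
        pyramid.append(halved)
    return pyramid
```

The same feature library is then applied to every level, multiplying the length of the final feature vector by the number of levels.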

Adaptive target recognition
Target recognition is a multi-level process requiring a sequence of algorithms at low, intermediate, and high levels. Generally, such systems are open loop, with no feedback between levels, and assuring their performance at a given probability of correct identification (PCI) and probability of false alarm (Pf) is a key challenge in computer vision and pattern recognition research. In this paper, a robust closed-loop system for the recognition of SAR images, based on reinforcement learning, is presented. The parameters of the model-based SAR target recognition system are learned, and the method meets performance specifications by using PCI and Pf as feedback for the learning system. It has been experimentally validated by learning the parameters of the recognition system for SAR imagery, successfully recognizing articulated targets, targets of different configurations, and targets at different depression angles.

Edges: saliency measures and automatic thresholding
Edges are useful features for structural image analysis, but the output of standard edge detectors must be thresholded to remove the many spurious edges. This paper describes experiments with both new and old techniques for (1) determining edge saliency (as alternatives to gradient magnitude) and (2) automatically determining appropriate edge threshold values. Examples of edge saliency measures are lifetime, wiggliness, spatial width, and phase congruency. Examples of thresholding techniques include using the Rayleigh distribution to model the edge gradient magnitude histogram, relaxation labelling, and an edge-curve "length" versus "average gradient magnitude" feature space.
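
One of the thresholding techniques mentioned, fitting a Rayleigh distribution to the gradient-magnitude histogram, reduces to a one-line parameter estimate. The sketch below (with an invented multiplier k) illustrates the idea:

```python
import numpy as np

def rayleigh_edge_threshold(grad_mag, k=3.0):
    """Model the gradient-magnitude histogram with a Rayleigh distribution
    (a common model for noise-induced gradients) and threshold at k*sigma,
    where sigma is the Rayleigh maximum-likelihood estimate sqrt(mean(g^2)/2)."""
    g = np.asarray(grad_mag, dtype=float).ravel()
    sigma = np.sqrt(np.mean(g ** 2) / 2.0)
    return k * sigma
```

Gradients above the returned value are unlikely to be pure noise and are kept as salient edges; in practice the fit should be restricted to the low-magnitude (noise-dominated) part of the histogram.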

Extending color vision methods to bands beyond the visible

February 1999



There is rapid growth in sensor technology for regions of the electromagnetic spectrum beyond the visible. In parallel, the growth of powerful, inexpensive computers and digital electronics has made many new imaging applications possible. Although there are many approaches to sensor fusion, this paper provides a discussion of relevant infrared phenomenology and attempts to apply known methods of human color vision to achieve image fusion. Two specific topics of importance are color contrast enhancement and color constancy.

An interactive grading and learning system for Chinese calligraphy
Chinese calligraphy is a traditional East Asian art. In this paper, an interactive calligraphic guiding system is proposed to grade written characters using image processing and fuzzy inference techniques. The written documents are automatically segmented, and three quantized features (the center, the size, and the projections of each written character) are extracted to score the calligraphy. Through this system, users can learn and practice Chinese calligraphy at home.

On-line cursive script recognition using time delay neural networks and hidden Markov models

May 1994



Presents a writer-independent system for on-line handwriting recognition that can handle both cursive script and hand-print. The pen trajectory is recorded by a touch-sensitive pad, such as those used by note-pad computers. The input to the system contains the pen trajectory information, encoded as a time-ordered sequence of feature vectors. Features include X and Y coordinates, pen-lifts, and the speed, direction, and curvature of the pen trajectory. A time delay neural network with local connections and shared weights is used to estimate a posteriori probabilities for characters in a word. A hidden Markov model segments the word into characters in a way that optimizes the global word score, taking a dictionary into account. A geometrical normalization scheme and a fast but efficient dictionary search are also presented. Trained on 20,000 unconstrained cursive words from 59 writers and using a 25,000-word dictionary, the authors reached an 89% character and 80% word recognition rate on test data from a disjoint set of writers.
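
The trajectory features mentioned (speed, direction, curvature) can be derived from successive pen positions. A minimal numpy sketch, assuming uniformly sampled (x, y) coordinates:

```python
import numpy as np

def trajectory_features(points):
    """Per-sample speed, writing direction, and curvature (turn angle)
    computed from a time-ordered pen trajectory of (x, y) samples."""
    p = np.asarray(points, dtype=float)
    d = np.diff(p, axis=0)                    # displacement between samples
    speed = np.linalg.norm(d, axis=1)         # assumes a uniform sampling rate
    direction = np.arctan2(d[:, 1], d[:, 0])  # angle of pen movement
    curvature = np.diff(direction)            # change of direction
    # Wrap turn angles into (-pi, pi].
    curvature = (curvature + np.pi) % (2 * np.pi) - np.pi
    return speed, direction, curvature
```

In the described system such per-sample features, together with pen-lift flags, form the time-ordered feature vectors fed to the time delay neural network.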

Silhouette-based object recognition through curvature scale space

June 1993



A complete and practical isolated-object recognition system has been developed which is very robust with respect to scale, position, and orientation changes of the objects, as well as to noise and local deformations of shape due to perspective projection, segmentation errors, and the non-rigid material of some objects. The system has been tested on a wide variety of 3-D objects with different shapes and surface properties. A light-box setup is used to obtain silhouette images, which are segmented to obtain the physical boundaries of the objects; these boundaries are classified as either convex or concave. Convex curves are recognized using their four high-scale curvature extrema points. Curvature scale space (CSS) representations are computed for concave curves. The CSS representation is a multi-scale organization of the natural invariant features of a curve. A three-stage coarse-to-fine matching algorithm quickly detects the correct object in each case.
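
The CSS representation tracks curvature zero crossings of the boundary curve as it is smoothed at increasing scales. This is a sketch of the per-scale curvature computation for a closed digital curve, using circular Gaussian smoothing via the FFT (an illustration, not the authors' implementation):

```python
import numpy as np

def css_curvature(x, y, sigma=2.0):
    """Curvature of a closed digital curve after circular Gaussian smoothing
    at scale sigma; tracking the zero crossings of this quantity over
    increasing sigma yields the CSS representation."""
    n = len(x)
    k = np.arange(n, dtype=float)
    k = np.minimum(k, n - k)                    # circular distance to sample 0
    g = np.exp(-0.5 * (k / sigma) ** 2)
    g /= g.sum()
    G = np.fft.fft(g)
    xs = np.fft.ifft(np.fft.fft(x) * G).real    # circular convolution
    ys = np.fft.ifft(np.fft.fft(y) * G).real
    # Circular central differences for first and second derivatives.
    dx = 0.5 * (np.roll(xs, -1) - np.roll(xs, 1))
    dy = 0.5 * (np.roll(ys, -1) - np.roll(ys, 1))
    ddx = np.roll(xs, -1) - 2 * xs + np.roll(xs, 1)
    ddy = np.roll(ys, -1) - 2 * ys + np.roll(ys, 1)
    return (dx * ddy - dy * ddx) / np.maximum((dx**2 + dy**2) ** 1.5, 1e-12)
```

A convex curve (e.g. a circle) has single-signed curvature and hence no zero crossings, which is why convex curves in this system are handled by their curvature extrema instead of a CSS image.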

Motion segmentation and pose recognition with motion history gradients

February 2000



This paper presents a simple method for representing motion in successively layered silhouettes that directly encode system time, termed the timed Motion History Image (tMHI). This representation can be used both to (a) determine the current pose of the object and to (b) segment and measure the motions induced by the object in a video scene. These segmented regions are not "motion blobs" but motion regions naturally connected to the moving parts of the object of interest. The method can be used as a very general gesture recognition "toolbox"; we use it to recognize waving and overhead clapping motions to control a music synthesis program.
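
The tMHI update itself is a two-line rule: stamp silhouette pixels with the current time and let pixels older than the history duration fall to zero. A numpy sketch:

```python
import numpy as np

def update_tmhi(mhi, silhouette, timestamp, duration):
    """timed Motion History Image update: pixels inside the current
    silhouette are stamped with the current time; pixels that fall
    outside the history duration are cleared."""
    mhi = np.where(silhouette, float(timestamp), mhi)
    mhi[mhi < timestamp - duration] = 0.0
    return mhi
```

Pose and motion are then read off this image: the most recent layer gives the current silhouette, and the gradients of the layered timestamps give the local direction of motion.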

Yilmaz, A.: Kernel Based Object Tracking Using Asymmetric Kernels with Adaptive Scale and Orientation Selection. Machine Vision and Applications 22(2), 255-268

March 2011



Kernel-based object tracking refers to computing the translation of an isotropic object kernel from one video frame to the next. The kernel is commonly chosen as a primitive geometric shape, and its translation is computed by maximizing the likelihood between the current and past object observations. When the object does not have an isotropic shape, the kernel includes non-object regions, which biases the motion estimation and results in loss of the tracked object. In this paper, we propose to use an asymmetric object kernel to improve tracking performance. An important advantage of an asymmetric kernel over an isotropic kernel is its precise representation of the object shape. This property enhances tracking performance by discarding non-object regions. The second contribution of our paper is the introduction of a new adaptive kernel scale and orientation selection method, where such selection is currently achieved by greedy algorithms. In our approach, the scale and orientation are introduced as additional dimensions to the spatial image coordinates, in which the mode seeking, and hence tracking, is achieved simultaneously in all coordinates. As demonstrated in a set of experiments, the proposed method has better tracking performance, with comparable execution time, than kernel tracking methods used in practice. Keywords: Mode seeking, visual tracking, asymmetric kernels, scale selection, orientation selection.

Applications of Dynamic Monocular Machine Vision. Machine Vision and Applications 1, 241-261

January 1988



The 4-D approach to real-time machine vision presented in the companion paper (Dickmanns and Graefe 1988, this volume) is applied here to two problem areas of widespread interest in robotics. Following a discussion of the vision hardware used, first, precise position control for planar docking between 3-D vehicles is discussed; second, the application to high-speed road vehicle guidance is demonstrated. With the 5-ton test vehicle VaMoRs, speeds up to 96 km/h (limited by the speed capability of the basic vehicle) have been reached. The available test run, more than 20 km in length, has been driven autonomously several times under various weather conditions.

A system for a new two-dimensional code: Secure 2D code

October 1998



A new 2D code, called Secure 2D code, is designed in this paper, and both an encoder and a decoder are proposed. Secure 2D code can store any kind of data and provides high security. With regard to security, the input data are divided into two parts: general and secret. The general data are transformed into a 2D code pattern, and the secret data are then hidden in that pattern. To raise the reading speed and allow various reading environments, some features are added around the 2D code pattern boundary. As for reliability, a Reed-Solomon (RS) code is adopted to handle damaged patterns.

Data mining for large scale 3D seismic data analysis

November 2009



The automation of the analysis of large volumes of seismic data is a data mining problem, in which a large database of 3D images is searched by content to identify the regions that are of most interest to the oil industry. In this paper we perform this search using the 3D orientation histogram as a texture analysis tool to represent and identify regions within the data that are compatible with a query texture.
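
A 3D orientation histogram of the kind used here can be built from volume gradients. This numpy sketch (the bin counts are arbitrary choices) weights each voxel's orientation by its gradient magnitude:

```python
import numpy as np

def orientation_histogram_3d(volume, bins=(8, 4)):
    """Histogram of 3D gradient orientations (azimuth, elevation),
    weighted by gradient magnitude -- a simple texture descriptor
    for a 3D data block."""
    gz, gy, gx = np.gradient(np.asarray(volume, dtype=float))
    mag = np.sqrt(gx ** 2 + gy ** 2 + gz ** 2)
    azimuth = np.arctan2(gy, gx)                    # in [-pi, pi]
    elevation = np.arctan2(gz, np.hypot(gx, gy))    # in [-pi/2, pi/2]
    hist, _, _ = np.histogram2d(
        azimuth.ravel(), elevation.ravel(), bins=bins,
        range=[[-np.pi, np.pi], [-np.pi / 2, np.pi / 2]],
        weights=mag.ravel())
    return hist / max(hist.sum(), 1e-12)            # normalise to sum 1
```

Comparing the normalised histogram of a candidate block with that of a query texture (e.g. by histogram intersection) then turns texture search into a nearest-neighbour problem.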

Estimating 3D Human Shapes from Measurements

September 2011



The recent advances in 3-D imaging technologies give rise to databases of human shapes, from which statistical shape models can be built. These statistical models represent prior knowledge of the human shape and enable us to solve shape reconstruction problems from partial information. Generating a human shape from traditional anthropometric measurements is one such problem, since these 1-D measurements encode 3-D shape information. Combined with a statistical shape model, these easy-to-obtain measurements can be leveraged to create 3-D human shapes. However, existing methods limit the created shapes to the space spanned by the database and thus require a large amount of training data. In this paper, we introduce a technique that extrapolates the statistically inferred shape to fit the measurement data using nonlinear optimization. This method ensures that the generated shape is both human-like and satisfies the measurement conditions. We demonstrate the effectiveness of the method and compare it to existing approaches through extensive experiments, using both synthetic data and real human measurements.
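
The statistical-inference step that the paper's nonlinear extrapolation builds on can be sketched as a regularised least-squares fit of shape-model coefficients to linearised measurements. The matrices A (measurements from shape) and B (shape basis) and the regulariser are illustrative assumptions, not the paper's formulation:

```python
import numpy as np

def infer_shape(A, B, mean_shape, measurements, reg=1e-3):
    """Least-squares estimate of shape-model coefficients c such that
    A @ (mean_shape + B @ c) ~= measurements, with Tikhonov regularisation
    pulling c toward the mean of the training population."""
    M = A @ B
    rhs = measurements - A @ mean_shape
    # Stack a sqrt(reg)*I block to realise the regulariser inside lstsq.
    c, *_ = np.linalg.lstsq(
        np.vstack([M, np.sqrt(reg) * np.eye(B.shape[1])]),
        np.concatenate([rhs, np.zeros(B.shape[1])]),
        rcond=None)
    return mean_shape + B @ c
```

A purely database-spanned solution like this one is exactly what the paper improves on: its nonlinear optimization then deforms the inferred shape outside the training span until the measurements are met exactly.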

Proposal of a method to analyze 3D deformation/fracture characteristics inside materials based on a stratified matching approach

August 2010



In the past, deformation/fracture (D/F) characteristics, defined as load-deformation relationships up to the point of fracture, have been analyzed and evaluated at the surface. The D/F characteristics are affected by more than 10,000 micro-scale internal structures, such as air bubbles (pores), dispersed particles, and cracks, per 1 mm³; it is therefore necessary to analyze nano-scale D/F characteristics inside materials. In this paper, we propose an analysis method that obtains the displacement vectors of dispersed particles from nano-order 3D-CT images. The problem of matching over 10,000 dispersed particles between deformation states is solved by a stratified matching approach. Keywords: Material evaluation, 3D-CT, PTV.

A mathematical morphology approach to image based 3D particle shape analysis

December 2005



Angularity is a critically important property in terms of the performance of natural particulate materials. It is also one of the most difficult to measure objectively using traditional methods. Here we present an innovative and efficient approach to the determination of particle angularity using image analysis. The direct use of three-dimensional data offers a more robust solution than the two-dimensional methods proposed previously. The algorithm is based on the application of mathematical morphological techniques to range imagery, and effectively simulates the natural wear processes by which rock particles become rounded. The analysis of simulated volume loss is used to provide a valuable measure of angularity that is geometrically commensurate with the traditional definitions. Experimental data obtained using real particle samples are presented and results correlated with existing methods in order to demonstrate the validity of the new approach. The implementation of technologies such as these has the potential to offer significant process optimisation and environmental benefits to the producers of aggregates and their composites. The technique is theoretically extendable to the quantification of surface texture.

Calibration of 3D kinematic systems using orthogonality constraints

November 2007



Processing images acquired by multi-camera systems is nowadays an effective and convenient way of performing 3D reconstruction. The basic output, i.e., the 3D location of points, can easily be further processed to acquire additional kinematic data: velocity and acceleration. Hence, many such reconstruction systems are referred to as 3D kinematic systems and are very broadly used, among other tasks, for human motion analysis. A prerequisite for the actual reconstruction of unknown points is the calibration of the multi-camera system. At present, many popular 3D kinematic systems offer so-called wand calibration, using a rigid bar with attached markers, which from the end user's point of view is preferable to many traditional methods. In this work, a brief critique of different calibration strategies is given and typical calibration approaches for 3D kinematic systems are explained. In addition, alternative ways of calibrating are proposed, especially for the initialization stage. More specifically, the proposed methods rely not only on the enforcement of known distances between markers, but also on the orthogonality of two or three rigidly linked wands. Moreover, the proposed ideas make use of common existing calibration tools and shorten the typical calibration procedure. The obtained reconstruction accuracy is comparable with that of commercial 3D kinematic systems.

Horizon picking in 3D seismic data volumes

October 2004



In this paper, we present an automatic horizon-picking algorithm, based on a surface detection technique, to detect horizons in 3D seismic data. The surface detection technique and the use of 6-connectivity allow us to detect fragments of horizons that are afterwards combined to form full horizons. The criteria for combining the fragments are similarity of orientation of the fragments, as expressed by their normal vectors, and proximity using 18-connectivity. The identified horizons are interrupted at faults, as required by the experts.
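
The orientation criterion for combining fragments can be sketched as an angle test between the fragments' mean normal vectors (the 15-degree threshold is an invented example; the paper's 18-connectivity proximity test is omitted):

```python
import numpy as np

def mergeable(normal_a, normal_b, max_angle_deg=15.0):
    """Orientation criterion for combining two horizon fragments:
    their mean normal vectors must agree within a small angle."""
    a = np.asarray(normal_a, dtype=float)
    b = np.asarray(normal_b, dtype=float)
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    angle = np.degrees(np.arccos(np.clip(np.dot(a, b), -1.0, 1.0)))
    return angle <= max_angle_deg
```

In the full algorithm this test is applied only to fragment pairs that are also close under 18-connectivity, so that parallel but separate horizons are not merged.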

Determination of motion parameters of a moving range sensor approximated by polynomials for rectification of distorted 3D data

November 2011



Scanning by a moving range sensor from the air is one of the most effective ways to obtain range data of large-scale objects, since it can measure regions invisible from the ground. The obtained data, however, suffer distortions due to the sensor motion during the scanning period. Besides these distorted range data, range data sets taken by other sensors fixed on the ground are usually also available. Based on the overlapping regions visible from both the moving sensor and the fixed ones, we propose an extended alignment algorithm to rectify the distorted range data and to align them to the models built from the fixed sensors. Using CAD models, we estimate the accuracy and effectiveness of the proposed method; we then apply it to real data sets to demonstrate its validity. Keywords: Moving range sensor, alignment with motion parameters, large-scale site modeling.

Region-based pose tracking with occlusions using 3D models

May 2012



Despite the great progress achieved in 3-D pose tracking in recent years, occlusions and self-occlusions are still an open issue. This is particularly true in silhouette-based tracking, where even visible parts cannot be tracked as long as they do not affect the object silhouette. Multiple cameras or motion priors can overcome this problem; however, multiple cameras or appropriate training data are not always readily available. We propose a framework in which the pose of 3-D models is found by minimising the 2-D projection error through minimisation of an energy function depending on the pose parameters. This framework makes it possible to handle occlusions and self-occlusions by tracking multiple objects and object parts simultaneously. To this end, each part is described by its own image region, each of which is modeled by one probability density function. This makes it possible to deal with occlusions explicitly, including self-occlusions between different parts of the same object as well as occlusions between different objects. The results we present for simulations and real-world scenes demonstrate the improvements achieved in monocular and multi-camera settings. These improvements are substantiated by quantitative evaluations, e.g., based on the HumanEVA benchmark. Keywords: Pose estimation, model-based tracking, kinematic chain, computer vision, human motion analysis, occlusion handling.

The perceptive workbench: Computer-vision-based gesture tracking, object tracking, and 3D reconstruction for augmented desks

January 2003



The Perceptive Workbench endeavors to create a spontaneous and unimpeded interface between the physical and virtual worlds. Its vision-based methods for interaction constitute an alternative to wired input devices and tethered tracking. Objects are recognized and tracked when placed on the display surface. By using multiple infrared light sources, the object's 3-D shape can be captured and inserted into the virtual interface. This ability permits spontaneity, since either preloaded objects or those objects selected at run-time by the user can become physical icons. Integrated into the same vision-based interface is the ability to identify 3-D hand position, pointing direction, and sweeping arm gestures. Such gestures can enhance selection, manipulation, and navigation tasks. The Perceptive Workbench has been used for a variety of applications, including augmented reality gaming and terrain navigation. This paper focuses on the techniques used in implementing the Perceptive Workbench and the system's performance.

Extending 3D Lucas–Kanade tracking with adaptive templates for head pose estimation

October 2010



The Lucas–Kanade tracker (LKT) is a commonly used method to track target objects over 2D images. The key principle behind the object tracking of an LKT is to warp the object appearance so as to minimize the difference between the warped object’s appearance and a pre-stored template. Accordingly, the 2D pose of the tracked object in terms of translation, rotation, and scaling can be recovered from the warping. To extend the LKT for 3D pose estimation, a model-based 3D LKT assumes a 3D geometric model for the target object in the 3D space and tries to infer the 3D object motion by minimizing the difference between the projected 2D image of the 3D object and the pre-stored 2D image template. In this paper, we propose an extended model-based 3D LKT for estimating 3D head poses by tracking human heads on video sequences. In contrast to the original model-based 3D LKT, which uses a template with each pixel represented by a single intensity value, the proposed model-based 3D LKT exploits an adaptive template with each template pixel modeled by a continuously updated Gaussian distribution during head tracking. This probabilistic template modeling improves the tracker’s ability to handle temporal fluctuation of pixels caused by continuous environmental changes such as varying illumination and dynamic backgrounds. Due to the new probabilistic template modeling, we reformulate the head pose estimation as a maximum likelihood estimation problem, rather than the original difference minimization procedure. Based on the new formulation, an algorithm to estimate the best head pose is derived. The experimental results show that the proposed extended model-based 3D LKT achieves higher accuracy and reliability than the conventional one does. Particularly, the proposed LKT is very effective in handling varying illumination, which cannot be well handled in the original LKT. Keywords: Visual tracking, Lucas–Kanade tracker, 3D pose estimation, maximum likelihood estimation, adaptive template.
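
The continuously updated per-pixel Gaussian template can be sketched as an exponentially weighted running mean and variance (the update rate alpha is an invented example value):

```python
import numpy as np

def update_gaussian_template(mean, var, frame, alpha=0.05, min_var=1e-4):
    """Per-pixel running Gaussian template: exponentially weighted updates
    of mean and variance, so slow appearance changes such as illumination
    drift are absorbed into the template."""
    diff = frame - mean
    mean = mean + alpha * diff
    # Standard exponentially weighted variance recursion.
    var = (1 - alpha) * (var + alpha * diff ** 2)
    return mean, np.maximum(var, min_var)
```

Under this model the matching cost becomes a per-pixel log-likelihood (differences are weighted by 1/var), which is what turns the pose estimate into the maximum likelihood problem described above.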

Defect detection in inhomogeneously textured sputtered surfaces using 3D Fourier image reconstruction

November 2007



Texture analysis techniques have been used extensively for surface inspection, in which small defects that appear as local anomalies in textured surfaces must be detected. Traditional surface inspection methods concentrate mainly on homogeneous textures. In this paper, we propose a 3D Fourier reconstruction scheme to tackle the problem of surface inspection on sputtered glass substrates that contain inhomogeneous textures. Such sputtered surfaces can be found in touch panels and liquid crystal displays (LCDs). Since an inhomogeneously textured surface does not have repetition or self-similarity properties in the image, a sequence of faultless images along with the inspection image is used to construct a 3D image, so that the periodic patterns of the surface can be observed along the additional frame axis. Bandreject filtering is used to eliminate the frequency components associated with faultless textures, and the 3D inverse Fourier transform is then carried out to reconstruct the image in the spatial domain. The resulting image effectively removes background textures and distinctly preserves anomalies. This converts the difficult problem of defect detection in complicated inhomogeneous textures into simple thresholding in non-textured images. Experimental results from a number of sputtered glass surfaces have shown the efficacy of the proposed 3D Fourier image reconstruction scheme.
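
The core of the scheme, removing frequency components shared by all frames of the constructed 3D image, can be sketched in its simplest form: zeroing the frame-axis zero-frequency bin, which cancels any texture that is identical across frames (the paper's bandreject filter is more general than this):

```python
import numpy as np

def reject_static_background(stack):
    """3D Fourier bandreject along the frame axis (axis 0): zero the
    zero-frequency bin, i.e. the component constant across frames, so
    texture shared by all frames vanishes and only frame-specific
    anomalies survive the inverse transform."""
    F = np.fft.fftn(np.asarray(stack, dtype=float))
    F[0, :, :] = 0.0                  # reject the frame-axis DC component
    return np.fft.ifftn(F).real
```

This particular filter is equivalent to subtracting the per-pixel mean over frames; thresholding the magnitude of the result then isolates the anomalies.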

Stereo vision system for precision dimensional inspection of 3D holes

January 2003



This paper presents a machine vision system for the high-accuracy 3D measurement of holes on the surface of industrial components. This is a very important application for inline quality inspection in assembly plants. A CAD-based stereo vision approach is adopted. The introduction of several novel techniques enables the system to achieve high robustness in versatile industrial environments, rapid response, and accuracy below 0.1 mm. These are demonstrated by extensive experiments with synthetic and real data.
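
The geometric core of CAD-based stereo measurement is triangulation. A standard midpoint method (a generic sketch, not the paper's full pipeline) recovers the 3D point closest to the two viewing rays:

```python
import numpy as np

def triangulate_midpoint(o1, d1, o2, d2):
    """Midpoint triangulation: the 3D point closest to two viewing rays,
    each given by a camera centre o and a direction d (not necessarily
    unit length). Assumes the rays are not parallel."""
    o1, d1, o2, d2 = (np.asarray(v, dtype=float) for v in (o1, d1, o2, d2))
    b = o2 - o1
    d11, d12, d22 = d1 @ d1, d1 @ d2, d2 @ d2
    denom = d11 * d22 - d12 ** 2          # zero only for parallel rays
    # Ray parameters of the mutually closest points (normal equations).
    t1 = (d22 * (b @ d1) - d12 * (b @ d2)) / denom
    t2 = (d12 * (b @ d1) - d11 * (b @ d2)) / denom
    p1 = o1 + t1 * d1
    p2 = o2 + t2 * d2
    return (p1 + p2) / 2.0
```

With calibrated cameras, the residual distance between p1 and p2 also serves as a per-point quality measure, which is useful when chasing sub-0.1 mm accuracy.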
