Active appearance models (AAMs) have demonstrated great utility when employed for non-rigid face alignment/tracking. The "simultaneous" algorithm for fitting an AAM achieves good non-rigid face registration performance, but has poor real-time performance (2-3 fps). The "project-out" algorithm for fitting an AAM achieves faster than real-time performance (> 200 fps) but suffers from poor generic alignment performance. In this paper we introduce an extension to a discriminative method for non-rigid face registration/tracking referred to as a constrained local model (CLM). Our proposed method achieves performance superior to the "simultaneous" AAM algorithm, along with real-time fitting speeds (35 fps). We gain this performance by improving upon the canonical CLM formulation in a number of ways, employing: (i) linear SVMs as patch experts, (ii) a simplified optimization criterion, and (iii) a composite rather than additive warp update step. Most notably, our simplified optimization criterion for fitting the CLM divides the problem of finding a single complex registration/warp displacement into that of finding N simple warp displacements. From these N simple warp displacements, a single complex warp displacement is estimated using a weighted least-squares constraint. Another major advantage of this simplified optimization stems from its ability to be parallelized, a step which we also theoretically explore in this paper. We refer to our approach for fitting the CLM as the "exhaustive local search" (ELS) algorithm. Experiments were conducted on the CMU Multi-PIE database.
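The weighted least-squares step this abstract describes — fusing N simple per-landmark displacements into one global warp update — can be sketched in a few lines of numpy. This is a minimal illustrative sketch, not the paper's implementation; the names (J for the shape Jacobian, dx for local displacements, w for patch-expert confidences) are assumptions.

```python
import numpy as np

def combine_local_displacements(J, dx, w):
    """Solve for a single shape-parameter update dp from N local
    landmark displacements dx via weighted least squares:
        dp = argmin_p  sum_i w_i * (J_i p - dx_i)^2
    J:  (2N, p) Jacobian of landmark coordinates w.r.t. shape params.
    dx: (2N,)   stacked local x/y displacements from the patch experts.
    w:  (2N,)   per-coordinate confidence weights."""
    W = np.diag(w)
    A = J.T @ W @ J            # weighted normal equations
    b = J.T @ W @ dx
    return np.linalg.solve(A, b)

# Toy example: 3 landmarks (6 coordinates), 2 shape parameters.
rng = np.random.default_rng(0)
J = rng.standard_normal((6, 2))
dp_true = np.array([0.5, -0.25])
dx = J @ dp_true               # noiseless local displacements
w = np.ones(6)
dp = combine_local_displacements(J, dx, w)
```

With noiseless displacements and a full-rank Jacobian, the weighted solve recovers the true update exactly; in a real fitter the weights would come from the patch-expert responses.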
The manual signs in sign languages are generated and interpreted using three basic building blocks: handshape, motion, and place of articulation. When combined, these three components (together with palm orientation) uniquely determine the meaning of the manual sign. This means that pattern recognition techniques that employ only a subset of these components are inappropriate for interpreting the sign or for building automatic recognizers of the language. In this paper, we define an algorithm to model these three basic components from a single video sequence of two-dimensional pictures of a sign. Recognition of these three components is then combined to determine the class of the signs in the videos. Experiments are performed on a database of (isolated) American Sign Language (ASL) signs. The results demonstrate that, using semi-automatic detection, all three components can be reliably recovered from two-dimensional video sequences, allowing for an accurate representation and recognition of the signs.
Pain is typically assessed by patient self-report. Self-reported pain, however, is difficult to interpret and may be impaired or in some circumstances (i.e., young children and the severely ill) not even possible. To circumvent these problems behavioral scientists have identified reliable and valid facial indicators of pain. Hitherto, these methods have required manual measurement by highly skilled human observers. In this paper we explore an approach for automatically recognizing acute pain without the need for human observers. Specifically, our study was restricted to automatically detecting pain in adult patients with rotator cuff injuries. The system employed video input of the patients as they moved their affected and unaffected shoulder. Two types of ground truth were considered. Sequence-level ground truth consisted of Likert-type ratings by skilled observers. Frame-level ground truth was calculated from presence/absence and intensity of facial actions previously associated with pain. Active appearance models (AAM) were used to decouple shape and appearance in the digitized face images. Support vector machines (SVM) were compared for several representations from the AAM and of ground truth of varying granularity. We explored two questions pertinent to the construction, design and development of automatic pain detection systems. First, at what level (i.e., sequence- or frame-level) should datasets be labeled in order to obtain satisfactory automatic pain detection performance? Second, how important is it, at both levels of labeling, that we non-rigidly register the face?
Image registration is a key step in a great variety of biomedical imaging applications. It provides the ability to geometrically align one dataset with another, and is a prerequisite for all imaging applications that compare datasets across subjects, imaging modalities, or across time. Registration algorithms also enable the pooling and comparison of experimental findings across laboratories, the construction of population-based brain atlases, and the creation of systems to detect group patterns in structural and functional imaging data. We review the major types of registration approaches used in brain imaging today. We focus on their conceptual basis, the underlying mathematics, and their strengths and weaknesses in different contexts. We describe the major goals of registration, including data fusion, quantification of change, automated image segmentation and labeling, shape measurement, and pathology detection. We indicate that registration algorithms have great potential when used in conjunction with a digital brain atlas, which acts as a reference system in which brain images can be compared for statistical analysis. The resulting armory of registration approaches is fundamental to medical image analysis, and in a brain mapping context provides a means to elucidate clinical, demographic, or functional trends in the anatomy or physiology of the brain.
The ability to understand and manage social signals of a person we are communicating with is the core of social intelligence. Social intelligence is a facet of human intelligence that has been argued to be indispensable and perhaps the most important ...
Foreground-background segmentation has recently been applied [26,12] to the detection and segmentation of specific objects or structures of interest from the background, as an efficient alternative to techniques such as deformable templates. We introduce a graphical model (i.e., Markov random field)-based formulation of structure-specific figure-ground segmentation based on simple geometric features extracted from an image, such as local configurations of linear features, that are characteristic of the desired figure structure. Our formulation is novel in that it is based on factor graphs, which are graphical models that encode interactions among arbitrary numbers of random variables. The ability of factor graphs to express interactions of higher than pairwise order (the highest order encountered in most graphical models used in computer vision) is useful for modeling a variety of pattern recognition problems. In particular, we show how this property makes factor graphs a natural framework for performing grouping and segmentation, and demonstrate that the factor graph framework emerges naturally from a simple maximum entropy model of figure-ground segmentation. We cast our approach in a learning framework, in which the contributions of multiple grouping cues are learned from training data, and apply our framework to the problem of finding printed text in natural scenes. Experimental results are described, including a performance analysis that demonstrates the feasibility of the approach.
This paper presents a whole body surface imaging system based on stereo vision technology. We have adopted a compact and economical configuration which involves only four stereo units to image the frontal and rear sides of the body. The success of the system depends on a stereo matching process that can effectively segment the body from the background in addition to recovering sufficient geometric details. For this purpose, we have developed a novel sub-pixel, dense stereo matching algorithm which includes two major phases. In the first phase, the foreground is accurately segmented with the help of a predefined virtual interface in the disparity space image, and a coarse disparity map is generated with block matching. In the second phase, local least squares matching is performed in combination with global optimization within a regularization framework, so as to ensure both accuracy and reliability. Our experimental results show that the system can realistically capture smooth and natural whole body shapes with high accuracy.
Polynomial relations are established between the invariants of certain mixed sets of points and lines and the invariants of their projected images. The relations are obtained using the properties of a rational curve, in fact a twisted cubic, which is a covariant of the given set of points and lines.
We propose a system for human computer interaction via 3D hand movements, based on a combination of visual tracking and a cheap, off-the-shelf, accelerometer. We use a 3D model and region based tracker, resulting in robustness to variations in illumination, motion blur and occlusions. At the same time the accelerometer allows us to deal with the multimodality in the silhouette to pose function. We synchronise the accelerometer and tracker online, by casting the calibration problem as a maximum covariance problem, which we then solve probabilistically. We show the effectiveness of our solution with multiple real-world tests and demonstration scenarios.
The authors show some very recent results using the weak-calibration idea. Assuming that only the epipolar geometry of a pair of stereo images is known, encoded in the so-called fundamental matrix, it is shown that some useful three-dimensional information, such as relative positions of points and planes and 3-D convex hulls, can be computed. The notion of visibility is introduced, which allows deriving those properties. Results on both synthetic and real data are presented.
A method for the automatic reconstruction of 3D objects from multiple camera views for 3D multimedia applications is presented. Conventional 3D reconstruction techniques use equipment that restricts the flexibility of the user. To increase this flexibility, the presented method is characterized by a simple measurement environment, consisting of a new calibration pattern placed below the object that allows object and pattern to be acquired simultaneously. This ensures that each view can be calibrated individually. From the calibrated camera views, a textured 3D wireframe model is estimated using a shape-from-silhouette approach and texture mapping of the original camera views. Experiments with this system have confirmed a significant gain in flexibility for the user and a drastic reduction in equipment costs, while achieving model quality comparable to that of conventional reconstruction techniques.
We present a system to detect passenger cars in aerial images, where cars appear as small objects. We pose this as a 3D object recognition problem to account for the variation in viewpoint and the shadow. We started from psychological tests to find features important for human detection of cars. Based on these observations, we selected the boundary of the car body, the boundary of the front windshield, and the shadow as the features. Some of these features are affected by the intensity of the car and by whether or not there is a shadow along it. This information is represented in the structure of the Bayesian network that we use to integrate all features. Experiments show very promising results, even on some very challenging images.
In this paper we present a method for automatically constructing a CAD model of an unknown object from range images. The method is an incremental one that interleaves a sensing operation that acquires and merges information into the model with a planning phase to determine the next sensor position or “view”. This is accomplished by integrating a system for 3-D model acquisition with a sensor planner. The model acquisition system provides facilities for range image acquisition, solid model construction and model merging: both mesh surface and solid representations are used to build a model of the range data from each view, which is then merged with the model built from previous sensing operations. The planning system utilizes the resulting incomplete model to plan the next sensing operation by finding a sensor viewpoint that will improve the fidelity of the model. Experimental results are presented for a complex part that includes polygonal faces, curved surfaces, and large self-occlusions.
Many problems in machine learning and computer vision consist of predicting multi-dimensional output vectors given a specific set of input features. In many of these problems, there exist inherent temporal and spatial dependencies between the output vectors, as well as repeating output patterns and input-output associations, that can provide more robust and accurate predictors when modelled properly. With this intrinsic motivation, we propose a novel Output-Associative Relevance Vector Machine (OA-RVM) regression framework that augments the traditional RVM regression by being able to learn non-linear input and output dependencies. Instead of depending solely on the input patterns, OA-RVM models output structure and covariances within a predefined temporal window, thus capturing past, current and future context. As a result, output patterns manifested in the training data are captured within a formal probabilistic framework, and subsequently used during inference. As a proof of concept, we target the highly challenging problem of dimensional and continuous prediction of emotions from naturalistic facial expressions. We demonstrate the advantages of the proposed OA-RVM regression by performing both subject-dependent and subject-independent experiments using the SAL database. The experimental results show that OA-RVM regression outperforms the traditional RVM and SVM regression approaches in prediction accuracy, generating more robust and accurate models.
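The output-associative idea — augmenting the input with output context from a temporal window — can be illustrated with a simple two-stage regressor. This is a sketch under stated assumptions: plain ridge regression stands in for the RVM base learner, and a windowed mean of outputs stands in for the OA term; neither choice is from the paper.

```python
import numpy as np

def ridge_fit(X, Y, lam=1e-3):
    """Ridge regression weights (stand-in for the RVM base learner)."""
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ Y)

def augment(X, Y, w=2):
    """Append the mean of outputs within +/- w frames to each input,
    capturing past, current and future output context."""
    T = len(X)
    ctx = np.array([Y[max(0, t - w):t + w + 1].mean(axis=0) for t in range(T)])
    return np.hstack([X, ctx])

# Two-stage inference: predict with an input-only model first, then
# refine with the output-augmented model using those initial predictions.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 4))
Y = X @ rng.standard_normal((4, 1)) + 0.1 * rng.standard_normal((200, 1))
W0 = ridge_fit(X, Y)           # input-only model
Y0 = X @ W0                    # initial predictions
W1 = ridge_fit(augment(X, Y, 2), Y)   # trained with true output context
Y1 = augment(X, Y0, 2) @ W1    # refined predictions at test time
```

At test time the true outputs are unavailable, so the first-stage predictions supply the output context, mirroring the inference scheme the abstract outlines.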
The authors describe how active shape models (ASMs) have been used to accurately and robustly locate vertebrae in lateral dual energy X-ray absorptiometry (DXA) images of the spine. DXA images are of low spatial resolution, and contain significant random and structural noise, providing a difficult challenge for object location methods. All vertebrae in the image were searched for simultaneously, improving robustness in location of individual vertebrae by making use of constraints on shape provided by the position of other vertebrae. They show that the use of ASMs with minimal user interaction allows accuracy to be obtained which is as good as that achievable by human operators using a standard manual method.
The intended applications of automatic face recognition systems include venues that vary widely in demographic diversity. Formal evaluations of algorithms do not commonly consider the effects of population diversity on performance. We document the effects of racial and gender demographics on the accuracy of algorithms that match identity in pairs of face images. In particular, we focus on the effects of the “background” population distribution of non-matched identities against which identity matches are compared. The algorithm we tested was created by fusing three of the top performers from a recent US Government competition. First, we demonstrate the variability of algorithm performance estimates when the non-matched identities were demographically “yoked” by race and/or gender (i.e., “yoking” constrains non-matched pairs to be of the same race or gender). We also found a shift in the match threshold required to maintain a stable false positive rate when demographic control scenarios varied. These results were verified with two independent data sets that differed in demographic characteristics. In a second experiment, we explored the effects of progressive increases in population diversity on algorithm performance. We found systematic, but non-general, effects when the balance between majority and minority populations of non-matched identities shifted. Finally, we show that identity match accuracy differs substantially when the non-match identity population varied by race. The results indicate the importance of the demographic composition and modeling of the background population in predicting the accuracy of face recognition algorithms.
Motion databases have a strong potential to guide progress in the field of machine recognition and motion-based animation. Existing databases either have a very loose structure that does not sample the domain according to any controlled methodology or too few action samples which limits their potential to quantitatively evaluate the performance of motion-based techniques. The controlled sampling of the motor domain in the database may lead investigators to identify the fundamental difficulties of motion cognition problems and allow the addressing of these issues in a more objective way. In this paper, we describe the construction of our Human Motion Database using controlled sampling methods (parametric and cognitive sampling) to obtain the structure necessary for the quantitative evaluation of several motion-based research problems. The Human Motion Database is organized into several components: the praxicon dataset, the cross-validation dataset, the generalization dataset, the compositionality dataset, and the interaction dataset. The main contributions of this paper include (1) a survey of human motion databases describing data sources related to motion synthesis and analysis problems, (2) a sampling methodology that takes advantage of a systematic controlled capture, denoted as cognitive sampling and parametric sampling, and (3) a novel structured motion database organized into several datasets addressing a number of aspects in the motion domain.
The paper proposes a method for obstacle detection on a runway for autonomous navigation and landing of an aircraft. Detection is performed in the presence of extraneous features such as tire marks. Suitable features are extracted from the image, and warping using approximately known camera and plane parameters is performed in order to compensate for ego-motion as far as possible. Residual disparity after warping is estimated using an optical flow algorithm. Features are tracked from frame to frame so as to obtain more reliable estimates of their motion. Corrections are made to the motion parameters from the residual disparities using a robust method, and features having large residual disparities are signalled as obstacles. A sensitivity analysis of the procedure is also presented. A Bayesian framework is used at every stage so that the confidence in the estimates can be determined.
The authors present a novel algorithm for recovering the motion parameters of 3D point sets, given point coordinates in three orthographic projections before and after motion. The algorithm requires no correspondences between the two time instants and only a few correspondences in the projections at the second time instant. The 3D scatter matrix of the point sets is calculated from the 2D scatter matrices of the projections. The algorithm uses eigendecomposition of the 3D scatter matrix to determine four candidate solutions for the rotation.
The Art Gallery Problem is the problem of determining the number of observers necessary to cover an art gallery room such that every point is seen by at least one observer. This problem is well known and has a linear solution in the two-dimensional case, but little is known in the 3-D case. In this paper we present a polynomial-time solution for the 3-D version of the Art Gallery Problem. Because the problem is NP-hard, the solution presented is an approximation, and we present the bounds of our solution. Our solution uses techniques from computational geometry, graph coloring, and set coverage. A complexity analysis is presented for each step, and an analysis of the overall quality of the solution is given.
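The set-coverage step that such an approximation relies on is usually the classic greedy set-cover heuristic, which has a logarithmic approximation guarantee. A minimal sketch (observer positions and visible-point sets are invented toy data):

```python
def greedy_set_cover(universe, subsets):
    """Greedy approximation to set cover: repeatedly pick the subset
    that covers the most still-uncovered points. Achieves an
    O(log n) approximation factor for NP-hard coverage problems."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(range(len(subsets)), key=lambda i: len(subsets[i] & uncovered))
        if not subsets[best] & uncovered:
            raise ValueError("universe is not coverable by the given subsets")
        chosen.append(best)
        uncovered -= subsets[best]
    return chosen

# Each candidate observer position "sees" a set of sample points.
visible = [{1, 2, 3}, {3, 4}, {4, 5, 6}, {1, 6}]
picked = greedy_set_cover({1, 2, 3, 4, 5, 6}, visible)
```

Here two observers (positions 0 and 2) suffice to cover all six sample points, even though four candidate positions were available.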
We propose and implement a method for detecting duplicate documents in very large image databases. The method is based on a robust “signature” extracted from each document image which is used to index into a table of previously processed documents. The approach has a number of advantages over OCR or other recognition based methods, including speed and robustness to imaging distortions. To justify the approach and test the scalability, we have developed a simulator which allows us to change parameters of the system and examine performance for millions of document signatures. A complete system is implemented and tested on a test collection of technical articles and memos.
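The signature-indexing scheme can be sketched as a hash-table lookup: each new document's signature is checked against the table of previously seen signatures, and a collision flags a candidate duplicate. The toy signature below (sorted word lengths) is purely illustrative, not the paper's robust image signature.

```python
def find_duplicates(docs, signature):
    """Index each document's signature in a hash table; a collision
    flags a candidate duplicate pair. `signature` is a placeholder
    for a robust, distortion-tolerant signature function."""
    table, dups = {}, []
    for doc_id, doc in docs:
        sig = signature(doc)
        if sig in table:
            dups.append((table[sig], doc_id))   # (earlier id, new id)
        else:
            table[sig] = doc_id
    return dups

# Toy signature: sorted word-length sequence (robust to word reordering).
sig = lambda text: tuple(sorted(len(w) for w in text.split()))
docs = [(0, "the quick brown fox"), (1, "fox brown the quick"), (2, "hello world")]
d = find_duplicates(docs, sig)
```

Because the lookup is a constant-time hash probe per document, the cost of screening millions of signatures grows linearly with collection size, which is what makes the approach scale.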
The purpose of a document is to facilitate the transfer of information from its author to its readers. It is the author's job to design the document so that the information it contains can be interpreted accurately and efficiently. To do this, the author can make use of a set of stylistic tools. In this paper, we introduce the concept of document functionality, which attempts to describe the roles of documents and their components in the process of transferring information. A functional description of a document provides insight into the type of the document, into its intended uses, and into strategies for automatic document interpretation and retrieval. To demonstrate these ideas, we define a taxonomy of functional document components and show how functional descriptions can be used to reverse-engineer the intentions of the author, to navigate in document space, and to provide important contextual information to aid in interpretation.
This paper presents a novel automatic method for view synthesis (or image transfer) from a triplet of uncalibrated images, based on trinocular edge matching followed by transfer by interpolation, occlusion detection and correction, and finally rendering. The edge-based technique proposed here is of general practical relevance because it overcomes most of the problems encountered in other approaches that either rely upon dense correspondence, work in projective space, or need explicit camera calibration. Applications range from immersive media and teleconferencing to image interpolation for fast rendering and compression.
This paper presents a novel method for determining the location of the instantaneous epipole in a sequence of images acquired by an uncalibrated camera and containing a single, rigid motion (e.g., the camera moves in a static environment). The method uses the full perspective camera model and requires the estimation of the optical flow at a minimum of six image locations. The key observation is that the optical flow equations can be written in terms of the epipole in a strikingly simple form if the translational and rotational flow components are not separated, as is usually done. The epipole location can then be obtained as the minimum of a least-squares residual function associated with the computed optical flow. We report and discuss initial experiments on both synthetic and real data, and illustrate possible developments of this method towards the use of uncalibrated optical flow for 3-D motion and structure reconstruction.
In general, multiple views are required to create a complete 3-D model of an object or of a multi-roomed indoor scene. In this work, we address the problem of merging multiple textured 3-D data sets, each of which corresponds to a different view of a scene or object. There are two steps to the merging process: registration and integration. To register, or align, the data sets we use a modified version of the Iterative Closest Point algorithm; our version, which we call color ICP, considers not only 3-D information but color as well. We show that the use of color decreases registration error by an order of magnitude. Once the 3-D data sets have been registered, we integrate them to produce a seamless, composite 3-D textured model. Our approach to integration uses a 3-D occupancy grid to represent the likelihood of spatial occupancy through voting. In addition to occupancy information, we store the surface normal in each voxel of the occupancy grid. The surface normal is used to robustly extract a surface from the occupancy grid; on that surface we blend textures from multiple views.
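The pairing step of a color-augmented ICP can be sketched as nearest-neighbour search in a joint position+colour space, with colour down-weighted so it disambiguates matches without dominating geometry. This is an illustrative sketch; the weight `alpha` and the joint-space formulation are assumptions, not details from the paper.

```python
import numpy as np

def color_icp_correspondences(src_xyz, src_rgb, dst_xyz, dst_rgb, alpha=0.1):
    """For each source point, find the closest destination point in a
    joint 6-D space of 3-D position and down-weighted colour.
    alpha scales how strongly colour influences the pairing."""
    src = np.hstack([src_xyz, alpha * src_rgb])
    dst = np.hstack([dst_xyz, alpha * dst_rgb])
    # pairwise squared distances in the joint space
    d2 = ((src[:, None, :] - dst[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)

# Two destination points at the same position but different colours:
# colour breaks the tie that pure-geometry ICP cannot resolve.
src_xyz = np.array([[0.0, 0.0, 0.0]])
src_rgb = np.array([[1.0, 0.0, 0.0]])                    # red source point
dst_xyz = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
dst_rgb = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]])   # green, red
match = color_icp_correspondences(src_xyz, src_rgb, dst_xyz, dst_rgb)
```

In a full ICP loop these correspondences would feed a rigid-transform estimate, and the pairing would be repeated until convergence.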
We present a method for recovering the shape and surface reflectance of a specular object from color images taken with a rotating light source. The image sequence is taken with a fixed camera while the light source rotates around an axis parallel to the optical axis of the camera. We derive the relationship between the body reflectance at every object surface point and the rotation angle of the object. For the case of specular reflectance, we develop an algorithm to extract only the body reflectance component, which obeys the derived relationship, for estimating surface normals and the body reflectance component. As a result, the object shape and the body reflectance distribution of the object surface can be recovered. In addition, the reflectance parameters of specularity are estimated by minimizing the fitting error of the specular reflectance model. To demonstrate the effectiveness of the proposed method, we show the shape, body reflectance, and specular reflectance of specular objects, all successfully recovered by the proposed method.
A methodology for the optimal design of projection patterns for stereometric structured light systems is presented. The similarity as well as the difference between the design of projection patterns and the design of optimal signals for digital communication are discussed. The design of K projection patterns for a structured light system with L distinct planes of light is shown to be equivalent to the placement of L points in a K-dimensional space subject to certain constraints. Optimal design in the MSE sense is defined, but is shown to lead to an intractable multi-parameter global optimization problem. Intuitively appealing suboptimal solutions derived from the family of K-dimensional space-filling Hilbert curves are obtained. Preliminary experimental results are presented.
At Oxford University, much research has recently been focused on active vision, and in particular on dynamic contours. These deformable model curves are able to track objects in an image using dynamic models of the target's behaviour, which can be learnt over time. It seems natural to extend these methodologies to investigate the problem of measuring surfaces (rather than contours) undergoing motion. The range sensor developed by the authors allows them to measure moving scenes, requiring just a single image to measure depth unambiguously, even in the presence of occlusion. By acquiring range data at frame rates from the sensor, they can develop real-time 3D surface trackers. Their work looks at several key areas of dynamic model-based tracking, with the aim of applying it to the tracking of a flexing hand. Already, the surface contour is capable of tracking continuous surfaces in real time, such as a piece of paper as it flexes.
Most approaches to human action recognition tend to form complex models that require extensive parameter estimation and computation time. In this study, we show that human actions can be represented simply by pose, without dealing with a complex representation of dynamics. Based on this idea, we propose a novel pose descriptor, which we call the Histogram-of-Oriented-Rectangles (HOR), for representing and recognizing human actions in videos. We represent each human pose in an action sequence by oriented rectangular patches extracted over the human silhouette. We then form spatial oriented histograms to represent the distribution of these rectangular patches. We make use of several matching strategies to carry the information from the spatial domain described by the HOR descriptor to the temporal domain. These are (i) nearest neighbor classification, which recognizes the actions by matching the descriptors of each frame, (ii) global histogramming, which extends the idea of the Motion Energy Image proposed by Bobick and Davis to rectangular patches, (iii) a classifier-based approach using Support Vector Machines, and (iv) an adaptation of Dynamic Time Warping to the temporal representation of the HOR descriptor. For cases in which the pose descriptor alone is not sufficiently strong, such as differentiating the actions “jogging” and “running”, we also incorporate a simple velocity descriptor as a prior to the pose-based classification step. We test our system with different configurations and experiment on two commonly used action datasets: the Weizmann dataset and the KTH dataset. Results show that our method is superior to other methods on the Weizmann dataset, with a perfect accuracy rate of 100%, and is comparable to the other methods on the KTH dataset, with a very high success rate close to 90%. These results show that, with a simple and compact representation, we can achieve robust recognition of human actions compared to complex representations.
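The spatial oriented histogram at the core of such a descriptor can be sketched directly: each oriented rectangle, summarized by its centre and orientation, votes into one spatial cell and one orientation bin. The grid and bin counts below are illustrative choices, not the paper's parameters.

```python
import numpy as np

def hor_descriptor(rects, grid=3, bins=4):
    """Histogram-of-Oriented-Rectangles sketch. Each rectangle is
    (cx, cy, angle) with cx, cy in [0, 1) (centre, normalized to the
    silhouette bounding box) and angle in [0, 180) degrees.
    Counts rectangles per spatial cell and orientation bin."""
    H = np.zeros((grid, grid, bins))
    for cx, cy, ang in rects:
        gx = min(int(cx * grid), grid - 1)          # spatial cell column
        gy = min(int(cy * grid), grid - 1)          # spatial cell row
        b = min(int(ang / 180.0 * bins), bins - 1)  # orientation bin
        H[gy, gx, b] += 1
    return H.ravel()

# Two rectangles near the top-left corner, one near the bottom-right.
rects = [(0.1, 0.1, 10.0), (0.1, 0.15, 100.0), (0.9, 0.9, 10.0)]
h = hor_descriptor(rects)
```

Each frame's histogram can then be compared with nearest-neighbour matching or summed over the sequence for the global-histogramming strategy the abstract lists.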
A model-based handwritten Chinese character recognition (HCCR) system is proposed. The characters are represented by attributed relational graphs (ARG) using strokes as ARG vertices. A number of vector relational attributes are also used in the representation to improve the performance of the translation and scale invariant and rotation sensitive recognition system. Since the ETL-8 database is very noisy and broken strokes are commonly encountered, a suitable homomorphic energy function is proposed that allows the segments of a broken stroke of a test character to be matched to the corresponding model stroke. The homomorphic ARG matching energy is minimised using the self-organising Hopfield neural networks  [Suganthan, P.N., Teoh, E.K., Mital, D.P., A self-organising Hopfield network for attributed relational graph matching, Image and Vision Computing, 13(1) (1995) 61–73]. An effective formulation is introduced to determine the matching score. The formulation does not penalise the matching scores of test characters with broken strokes. Experiments were performed with 100 classes of characters in the ETL-8 database and 98.9% recognition accuracy has been achieved.
We present an approach to recognition of complex objects in cluttered three-dimensional (3D) scenes that does not require feature extraction or segmentation. Our object representation comprises descriptive images associated with oriented points on the surface of an object. Using a single point basis constructed from an oriented point, the position of other points on the surface of the object can be described by two parameters. The accumulation of these parameters for many points on the surface of the object results in an image at each oriented point. These images, localized descriptions of the global shape of the object, are invariant to rigid transformations. Through correlation of images, point correspondences between a model and scene data are established. Geometric consistency is used to group the correspondences from which plausible rigid transformations that align the model with the scene are calculated. The transformations are then refined and verified using a modified iterative closest point algorithm. The effectiveness of our representation comes from its ability to combine the descriptive nature of global object properties with the robustness to partial views and clutter of local shape descriptions. The wide applicability of our algorithm is demonstrated with results showing recognition of complex objects in cluttered scenes with occlusion.
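The two-parameter accumulation described above — mapping every surface point to a radial distance and a signed height relative to an oriented basis point, then binning — can be sketched as follows. The image size and support radius are illustrative assumptions, not values from the paper.

```python
import numpy as np

def oriented_point_image(p, n, points, size=8, support=1.0):
    """Accumulate a 2-D image for the oriented point (p, n): each
    surface point x maps to (alpha, beta), where beta is its signed
    height along n and alpha its radial distance from the axis
    through p. The coordinates are invariant to rotation about n."""
    n = n / np.linalg.norm(n)
    img = np.zeros((size, size))
    for x in points:
        d = x - p
        beta = d @ n                                    # signed height
        alpha = np.sqrt(max(d @ d - beta * beta, 0.0))  # radial distance
        if alpha < support and abs(beta) < support:
            i = int((beta + support) / (2 * support) * size)
            j = int(alpha / support * size)
            img[min(i, size - 1), min(j, size - 1)] += 1
    return img

# Two points related by rotation about n land in the same bin,
# demonstrating the rotational invariance of the representation.
pts = np.array([[0.2, 0.0, 0.0], [0.0, 0.2, 0.0], [0.0, 0.0, 0.5]])
img = oriented_point_image(np.zeros(3), np.array([0.0, 0.0, 1.0]), pts)
```

Correlating such images between model and scene oriented points yields the candidate correspondences that the geometric-consistency grouping then filters.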
This paper addresses user relevance feedback in image retrieval. We treat it as a standard two-class pattern classification problem, aiming to refine retrieval precision by learning from the user's relevance feedback data. We investigate the problem in light of two important characteristics unique to it: the small size of the sample collection and the asymmetric distributions of the positive and negative samples. We develop a novel empirical Bayesian learning approach that explicitly exploits these two characteristics; we call the methodology BALAS (Bayesian Learning in Asymmetric and Small sample collections). In BALAS, different learning strategies are used for the positive and negative sample collections, based on the two characteristics. Defining the relevancy confidence as the posterior probability of relevance, we develop an integrated ranking scheme in BALAS that complementarily combines the subjective relevancy confidence with the objective similarity measure to capture the overall retrieval semantics. Experimental evaluations confirm the rationale of the proposed ranking scheme and demonstrate that BALAS is superior to an existing relevance feedback method from the literature in capturing the overall retrieval semantics.
Shape is one of the primary low-level image features exploited in the newly emerged field of content-based image retrieval (CBIR). Among the many existing shape description methods, the Fourier descriptor (FD) is one of the most widely used, owing to its simple computation, clarity, and coarse-to-fine description capability. FD has been applied to a variety of applications, including image retrieval. FDs can be acquired in a number of ways; however, FDs acquired in different ways can have different retrieval performance. In this paper, we study shape retrieval using FDs. Specifically, we study the different ways of acquiring FDs, the number of FD features needed for general shape description, and the retrieval performance of the different FDs. A Java client–server retrieval framework has been developed to facilitate the study. The retrieval performance of the different FDs is tested using a standard shape database and a commonly used performance measure.
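One common way of acquiring an FD from a closed contour, shown here as a hedged sketch rather than any of the specific variants studied in the paper, takes the FFT of the complex coordinate signature of the boundary and normalises the magnitudes: dropping the DC term removes translation, dividing by the first harmonic removes scale, and keeping only magnitudes removes rotation and starting-point dependence.

```python
import numpy as np

def fourier_descriptor(boundary, n_coeffs=10):
    """Translation-, scale- and rotation-invariant FD from a closed
    boundary given as an (N, 2) array of (x, y) contour samples."""
    z = boundary[:, 0] + 1j * boundary[:, 1]   # complex coordinate signature
    mag = np.abs(np.fft.fft(z))
    # Drop F[0] (translation), divide by |F[1]| (scale); magnitudes
    # discard phase, hence rotation and starting point.
    return mag[1:n_coeffs + 1] / mag[1]
```

For a circle all spectral energy sits in the first harmonic, so its descriptor is (1, 0, 0, ...), which makes the invariances easy to check.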
We describe a novel continuous medial representation for object geometry and a deformable templates method for fitting the representation to images. Our representation simultaneously describes the boundary and medial loci of geometrical objects, always maintaining Blum's symmetric axis transform (SAT) relationship. Cubic B-splines define the continuous medial locus and the associated thickness field, which in turn generate the object boundary. We present geometrical properties of the representation and derive a set of constraints on the B-spline parameters. The 2D representation encompasses branching medial loci; the 3D version can model objects with a single medial surface, and the extension to branching medial surfaces is a subject of ongoing research. We present preliminary results of segmenting 2D and 3D medical images. The representation is ultimately intended for use in statistical shape analysis.
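The SAT relationship by which a medial locus and thickness field generate the boundary can be illustrated numerically. The sketch below uses a discretely sampled medial curve rather than the paper's B-spline machinery, and applies the standard 2D Blum relation b = m - r r' T +/- r sqrt(1 - r'^2) N, where derivatives are taken with respect to arc length and T, N are the unit tangent and normal; all names are assumptions for the example.

```python
import numpy as np

def medial_to_boundary(m, r):
    """Recover the two boundary curves implied by a sampled 2D medial
    curve m (K, 2) with thickness field r (K,), via Blum's SAT relation.
    """
    dm = np.gradient(m, axis=0)               # finite-difference tangent
    ds = np.linalg.norm(dm, axis=1)           # local arc-length element
    T = dm / ds[:, None]
    N = np.stack([-T[:, 1], T[:, 0]], axis=1) # unit normal (T rotated 90 deg)
    rp = np.clip(np.gradient(r) / ds, -1.0, 1.0)   # dr/ds, clipped for safety
    root = np.sqrt(1.0 - rp ** 2)
    b_plus = m - (r * rp)[:, None] * T + (r * root)[:, None] * N
    b_minus = m - (r * rp)[:, None] * T - (r * root)[:, None] * N
    return b_plus, b_minus
```

With a straight medial line and constant thickness the recovered boundary is the expected pair of parallel offset lines, which gives a quick sanity check.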
We present a two-step algorithm for the recognition of circles. The first step uses a 2D Hough Transform for the detection of the centres of the circles, and the second step validates their existence by radius histogramming. The 2D Hough Transform technique makes use of the property that the perpendicular bisector of every chord of a circle passes through its centre. We present results of experiments with synthetic data demonstrating that our method is more robust to noise than standard gradient-based methods. The promise of the method is demonstrated by its application to a natural image and to a digitized mammogram.
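The two steps can be sketched as follows. This is a minimal illustration under assumed names and parameters, not the authors' implementation: random pairs of edge points vote along the perpendicular bisector of their chord, and a candidate centre is then validated by histogramming edge-point radii.

```python
import numpy as np

def circle_centre_votes(edge_pts, grid_size, n_pairs=2000, seed=0):
    """2D accumulator: the perpendicular bisector of every chord passes
    through the centre, so each random chord votes along its bisector."""
    rng = np.random.default_rng(seed)
    acc = np.zeros((grid_size, grid_size), dtype=int)
    t = np.linspace(-grid_size, grid_size, 4 * grid_size)
    for _ in range(n_pairs):
        i, j = rng.integers(0, len(edge_pts), size=2)
        if i == j:
            continue
        p, q = edge_pts[i], edge_pts[j]
        mid, d = (p + q) / 2.0, q - p
        perp = np.array([-d[1], d[0]])
        norm = np.linalg.norm(perp)
        if norm < 1e-9:
            continue
        pts = mid + t[:, None] * (perp / norm)    # samples along the bisector
        ij = np.round(pts).astype(int)
        ok = ((ij[:, 0] >= 0) & (ij[:, 0] < grid_size) &
              (ij[:, 1] >= 0) & (ij[:, 1] < grid_size))
        # Fancy indexing counts each cell once per chord, like drawing a line.
        acc[ij[ok, 1], ij[ok, 0]] += 1
    return acc

def radius_histogram(edge_pts, centre, r_max):
    """Validate a candidate centre by histogramming edge-point radii."""
    r = np.linalg.norm(edge_pts - centre, axis=1)
    hist, edges = np.histogram(r, bins=r_max, range=(0, r_max))
    return edges[np.argmax(hist)]      # lower edge of the dominant radius bin
```

A peak in the accumulator marks a centre candidate, and a sharp peak in the radius histogram confirms that a circle of that radius actually exists there.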
Passive sensing of the 3D geometric posture of the human hand has been studied extensively over the past decade, but these research efforts have been hampered by the computational complexity of inverse kinematics and 3D reconstruction. In this paper, we focus on 3D hand posture estimation from a single 2D image. We introduce a human hand model with 27 degrees of freedom (DOFs) and analyze some of its constraints, reducing the model from 27 to 12 DOFs without any significant degradation of performance. A novel algorithm is proposed to estimate the 3D hand posture from eight 2D projected feature points. Experimental results on real images confirm that our algorithm gives good estimates of the 3D hand pose.
We present a novel representation for scale-, translation- and rotation-independent recognition of 2D object features, based on the invariance properties of the included angles of a triangle, which we exploit to construct signature histograms of local shape. We describe the practical implementation of this new technique together with its properties, and present a statistical quantification of performance in the presence of fragmentation, additive noise and clutter. The scale-invariant properties are assessed, and the results imply a fundamental limit on scale-invariant recognition from a single model.
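The invariance being exploited is that the interior angles of a triangle formed by any three feature points are unchanged by translation, rotation and uniform scaling. The sketch below, a hedged illustration with assumed names and bin counts rather than the paper's implementation, builds a signature histogram from the angles of randomly sampled point triples.

```python
import numpy as np

def angle_histogram(points, n_triples=1000, n_bins=18, seed=0):
    """Normalised histogram of the included angles (in degrees) of
    triangles formed by random triples of 2D feature points."""
    rng = np.random.default_rng(seed)
    hist = np.zeros(n_bins)
    for _ in range(n_triples):
        a, b, c = points[rng.choice(len(points), size=3, replace=False)]
        for p, q, r in ((a, b, c), (b, c, a), (c, a, b)):
            u, v = q - p, r - p
            cosang = u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)
            ang = np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))
            hist[min(int(ang / (180.0 / n_bins)), n_bins - 1)] += 1
    return hist / hist.sum()
```

Applying any similarity transform to the point set leaves the histogram essentially unchanged, which is what makes it usable as a recognition signature.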
This paper presents a new technique for deriving information on visual saliency from experimental eye-tracking data. The strengths and potential pitfalls of the method are demonstrated on feature correspondence for 2D/3D image registration. In this application, an eye-tracking system is employed to determine which features in endoscopy video images a group of human observers considers salient. Using this information, a biologically inspired saliency map is derived by transforming each observed video image into a feature-space representation. Features related to visual attention are determined by a feature normalisation process based on the relative abundance of image features in the background image versus those dwelt upon along the visual search scan paths. These features are then back-projected to the image domain to determine spatial areas of interest for each unseen endoscopy video image. The derived saliency map provides an image similarity measure that forms the heart of a new 2D/3D registration method with much reduced rendering overhead, since only selective regions of interest determined by the saliency map are processed. Significant improvements in pose-estimation efficiency are achieved without apparent reduction in registration accuracy compared to an intensity-based similarity measure.
Recent work in the psychological literature has indicated that attractive faces are in some ways “average” [J.H. Langlois, L.A. Roggman, Attractive faces are only average, Psychological Science, 1(2) (1990) 115–121] and that the apparent age of a face can be related to its proximity to the average of a computationally derived “face space” [A.J. O'Toole, T. Vetter, H. Volz, E.M. Salter, Three-dimensional caricatures of human heads: distinctiveness and the perception of facial age, Perception, 26 (1997) 719–732]. We examined the relationship between facial attractiveness, age, and “averageness”, using laser scans of faces that were put into complete correspondence with the average face [T. Vetter, V. Blanz, Estimating coloured 3D face models from single images: an example based approach, in: H. Burkhardt, B. Neumann (Eds.), Proceedings of the Fifth European Conference on Computer Vision, Freiburg, Germany, 1998, pp. 499–513]. This representation enabled selective normalization of the 3D shape versus the surface texture map of the faces. Shape-normalized faces, created by morphing the texture maps from individual faces onto the average head shape, and texture-normalized faces, created by morphing the average texture onto the shape of each individual face, were judged by human subjects to be both more attractive and younger than the original faces. The study shows that relatively global, psychologically meaningful attributes of faces can be modeled very simply in face spaces of this sort.
We propose a novel model-based coding system for video. Model-based coding aims to improve compression gain by replacing non-informative image elements with perceptually equivalent models. Images enclosing large textured regions are ideal candidates. Texture movies are obtained by filming a static texture with a moving camera. Integrating the motion information into the generative texture process allows the ‘real’ texture to be replaced with a ‘visually equivalent’ synthetic one, while preserving the correct motion perception. Global motion estimation is used to determine the movement of the camera and to identify the overlapping region between two successive frames. This information is then exploited for the generation of the texture movies. The proposed method for synthesizing 2D+1 texture movies is able to emulate any piecewise-linear trajectory. The compression performance is very encouraging: on this kind of video sequence, the proposed method improves the compression rate of a state-of-the-art MPEG-4 video coder by an order of magnitude while providing noticeably better perceptual quality. Importantly, the current implementation runs in real time on Intel PIII processors.
Two novel algorithms are presented for depth estimation using point correspondences and the ground-plane constraint. One is a direct non-iterative method; the other is a simple, well-behaved iterative technique for which the choice of initial value is straightforward. The algorithms can handle any number of points and frames, as well as points that become occluded. Once the point depths are determined, the motion parameters can be obtained by a linear least-squares technique. Extensive test results are included which show that the proposed algorithms are robust to noise and perform satisfactorily on real outdoor image sequences.
This paper presents a framework for ‘Filling In’ missing gaps in images, particularly in patches with texture. The algorithm can also be used as a fallback mode for treating missing data in video sequence reconstruction. The underlying idea is to construct a parametric model of the p.d.f. of the texture to be re-synthesised and then draw samples from that p.d.f. to create the resulting reconstruction. A Bayesian approach is used to formulate 2D autoregressive models as generative models for texture (sampled using the Gibbs sampler), given the surrounding boundary conditions. A fast implementation is presented that iterates between pixelwise updates and blockwise parametric model estimation. The novel ideas in this paper are joint parameter estimation and fast, efficient texture reconstruction using linear models.
Statistical shape models are used widely as a basis for segmenting and interpreting images. A major drawback of the approach is the need, during training, to establish a dense correspondence across a training set of segmented shapes. We show that model construction can be treated as an optimisation problem, automating the process and guaranteeing the effectiveness of the resulting models. This is achieved by optimising an objective function with respect to the correspondence. We use an information theoretic objective function that directly promotes desirable features of the model. This is coupled with an effective method of manipulating correspondence, based on re-parameterising each training shape, to build optimal statistical shape models. The method is evaluated on several training sets of shapes, showing that it constructs better models than alternative approaches.
A 3D object recognition system is described that employs a novel multiresolution representation and coarse encoding of feature information. Modifications are brought to classic feature extraction methods by proposing the use of wavelet transform maxima to direct the actions of the feature extraction modules. The reasons behind the use of a multi-channel architecture are described, together with a description of the feature extraction and coarse encoding modules. The targeted field of application being automatic categorisation of natural objects, the proposed system is designed to run on ordinary hardware platforms and to process an input in a short time frame. The system has been evaluated on a variety of 2D views of a set of five synthetic objects designed to present various degrees of similarity, as rated by a panel of human subjects. Parallels between these ratings and the system’s behaviour are drawn. Additionally, a small set of photomicrographs of fish larvae has been used to assess the system’s performance when presented with very similar, non-rigid shapes. For comparison, the parameters extracted from each image were fed into two categorisers: discriminant analysis and a multilayer feedforward neural network with backpropagation of error. Experimental evidence is presented which demonstrates the efficacy of the methods, the satisfactory categorisation performance of the system is reported, and conclusions are drawn about the system’s behaviour.
An object capturing technique using cubic Bézier curves is presented in this paper. The proposed technique produces a set of data points that are the control points of the approximating Bézier curve. The control points are determined by an efficient search algorithm that produces optimal curves. The approximation process is simplified by decomposing the outline into smaller curves; this decomposition/subdivision is performed at detected corner points as a preprocessing step. Further subdivision is done by a recursive algorithm during the approximation process. The proposed algorithm has several advantages, including computational efficiency, better shape representation, low approximation error and a high compression ratio, as demonstrated in comparison with other algorithms.
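The least-squares core of fitting one cubic Bézier segment to an ordered piece of outline can be sketched as follows. This is an assumption-laden illustration, not the paper's search algorithm: it uses chord-length parameterisation and solves for the four control points directly with linear least squares.

```python
import numpy as np

def bernstein3(t):
    """Cubic Bernstein basis matrix, shape (len(t), 4)."""
    t = np.asarray(t, dtype=float)
    return np.stack([(1 - t) ** 3,
                     3 * t * (1 - t) ** 2,
                     3 * t ** 2 * (1 - t),
                     t ** 3], axis=1)

def fit_cubic_bezier(points):
    """Least-squares fit of one cubic Bezier segment to ordered outline
    points (N, 2), using chord-length parameterisation.
    Returns (control_points, parameter_values)."""
    d = np.linalg.norm(np.diff(points, axis=0), axis=1)
    t = np.concatenate([[0.0], np.cumsum(d)])
    t /= t[-1]                                      # normalise to [0, 1]
    ctrl, *_ = np.linalg.lstsq(bernstein3(t), points, rcond=None)
    return ctrl, t
```

When the residual of a segment is too large, a subdivision scheme such as the one in the paper would split the outline piece and fit each half separately.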
Accurate detection and localisation of two-dimensional (2D) image features (or ‘key-points’) is important for vision tasks such as structure from motion, stereo matching and line labelling. Despite this importance, no adequate definition of 2D image features has been produced that encompasses the variety of features that should fall under this banner. In this paper, we present a new method for the detection of 2D image features that relies upon maximal 2D order in the phase domain of the image signal. Points of maximal phase congruency correspond to all the different types of 2D features detected by other schemes, including grey-level corners, line terminations, and a variety of junctions. An assessment of our implementation's performance is provided in terms of its robustness and its accuracy of detection and localisation of 2D image features.
Motion quantification from 2D sequential cardiac images has been performed on axial images of the left ventricle (LV) obtained from two different imaging modalities (MRI and echocardiography). Detailed point-wise motion vectors were evaluated by establishing shape correspondence between consecutive contours after reconstructing curvature information with wavelet synthesis filters at multiple levels. We present a simple approach that optimizes the shape correspondence while taking non-uniform contour variation into account. The shape matching is done by maximizing the correlation between the approximation coefficient vectors at selected levels. The algorithm has been tested on sets of 2D images, and the results are compared with those obtained from a bending-energy model. Some experimental results are also presented for validation of the algorithm.
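The correlation-maximising matching step can be illustrated with circular cross-correlation between two periodic coefficient vectors, which finds the cyclic offset aligning one contour signature with the other. The FFT-based implementation below is an assumption for the example, not the authors' code.

```python
import numpy as np

def best_cyclic_shift(a, b):
    """Return the cyclic shift k that best aligns b with a, found by
    maximising the circular cross-correlation of the zero-mean,
    unit-variance signals (computed via the FFT)."""
    a = (a - a.mean()) / (a.std() + 1e-12)
    b = (b - b.mean()) / (b.std() + 1e-12)
    # ifft(conj(FFT(a)) * FFT(b))[m] = sum_p a[p] * b[p + m]
    corr = np.fft.ifft(np.conj(np.fft.fft(a)) * np.fft.fft(b)).real
    return int(np.argmax(corr))
```

For closed contours the signature is periodic, so the circular (rather than linear) correlation is the natural choice; the recovered shift gives the point correspondence between the two contours.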
In this paper, a new framework for one-dimensional contour extraction from discrete two-dimensional data sets is presented. Contour extraction is important in many scientific fields such as digital image processing, computer vision and pattern recognition. This novel framework includes (but is not limited to) algorithms for dilated contour extraction, contour displacement, shape skeleton extraction, contour continuation, shape-feature-based contour refinement and contour simplification. Many of the new techniques depend strongly on the application of a Delaunay tessellation. To demonstrate the versatility of this novel toolbox approach, the contour extraction techniques presented here are applied to scientific problems in material science, biology and heavy-ion physics. (20 pages, 15 figures; submitted to Image and Vision Computing.)
The results of our investigation of several measurements on digitized 2D and 3D objects with fuzzy borders are presented. The performance of surface area, volume, and roundness measure estimators for digitized balls with fuzzy borders is analyzed. The method we suggest provides significant improvement in precision, compared to analogous estimation results obtained on a crisp (hard) segmentation, especially in the case of low resolution images.
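The core idea of estimating volume from a fuzzy (grey-level) segmentation rather than a crisp one can be sketched simply: sum the membership values instead of counting thresholded voxels. The function names and the linear border model below are assumptions for illustration, not the estimators studied in the paper.

```python
import numpy as np

def fuzzy_volume(membership, voxel_volume=1.0):
    """Volume estimate of a fuzzy digitized object: the sum of the
    grey-level membership values in [0, 1], scaled by the voxel volume."""
    return float(membership.sum() * voxel_volume)

def fuzzy_ball(shape, centre, radius, border=2.0):
    """Digitized ball whose membership falls linearly from 1 to 0 across
    a fuzzy border of the given width, centred on the true surface."""
    grids = np.meshgrid(*[np.arange(s) for s in shape], indexing="ij")
    rho = np.sqrt(sum((g - c) ** 2 for g, c in zip(grids, centre)))
    return np.clip((radius + border / 2.0 - rho) / border, 0.0, 1.0)
```

The fractional membership values carry sub-voxel information about where the true surface lies, which is why estimates computed on the fuzzy representation can be more precise than those computed after hard thresholding.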