ABSTRACT: This paper presents a mesh-based approach for 3D face recognition using a novel local shape descriptor and a SIFT-like matching process. Both maximum and minimum curvatures estimated in the 3D Gaussian scale space are employed to detect salient points. To comprehensively characterize 3D facial surfaces and their variations, we calculate weighted statistical distributions of multiple-order surface differential quantities, including the histogram of mesh gradient (HoG), histogram of shape index (HoS) and histogram of gradient of shape index (HoGS), within a local neighborhood of each salient point. The subsequent matching step then robustly associates corresponding points of two facial surfaces, yielding far more matched points between different scans of the same person than between scans of different people. Experimental results on the Bosphorus dataset highlight the effectiveness of the proposed method and its robustness to facial expression variations.
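The shape index underlying the HoS descriptor has a standard closed form due to Koenderink and van Doorn. A minimal NumPy sketch (an illustration, not the paper's code) of computing it from principal curvatures and binning it into a local histogram; the bin count is an assumption:

```python
import numpy as np

def shape_index(k1, k2):
    """Shape index: maps principal curvatures (k1 >= k2) to a value in
    [-1, 1], ranging from cup (-1) through saddle (0) to cap (+1).
    Undefined on perfectly planar points (k1 == k2 == 0)."""
    k1, k2 = np.asarray(k1, float), np.asarray(k2, float)
    planar = np.isclose(k1, 0.0) & np.isclose(k2, 0.0)
    s = (2.0 / np.pi) * np.arctan2(k1 + k2, k1 - k2)
    return np.where(planar, np.nan, s)

def histogram_of_shape_index(k1, k2, bins=8):
    """HoS over a local neighborhood: a normalized histogram of the
    shape-index values of the neighborhood's vertices."""
    s = shape_index(k1, k2)
    s = s[~np.isnan(s)]
    hist, _ = np.histogram(s, bins=bins, range=(-1.0, 1.0))
    return hist / max(hist.sum(), 1)
```

With `arctan2`, an umbilic convex point (k1 = k2 > 0) maps to +1 and a symmetric saddle (k1 = -k2) maps to 0, as the definition requires.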
ABSTRACT: 3D face models accurately capture facial surfaces, making precise description of facial activities possible. In this paper, we present a novel mesh-based method for 3D facial expression recognition using two local shape descriptors. To characterize the shape of the local neighborhood of facial landmarks, we calculate weighted statistical distributions of surface differential quantities, including the histogram of mesh gradient (HoG) and the histogram of shape index (HoS). A curvature estimation method based on normal cycle theory is employed on the 3D face models, alongside the common cubic-fitting curvature estimation method for comparison. Exploiting the fact that different expressions involve different local shape deformations, an SVM classifier with both linear and RBF kernels outperforms state-of-the-art results on a subset of the BU-3DFE database under the same experimental setting.
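The two SVM kernels mentioned have standard definitions; below is a generic NumPy sketch of the Gram matrices a linear and an RBF SVM would operate on when comparing descriptor histograms (the `gamma` value is an illustrative assumption, not taken from the paper):

```python
import numpy as np

def linear_kernel(X, Y):
    """Linear SVM Gram matrix: K[i, j] = <x_i, y_j>."""
    return X @ Y.T

def rbf_kernel(X, Y, gamma=1.0):
    """RBF SVM Gram matrix: K[i, j] = exp(-gamma * ||x_i - y_j||^2).
    Squared distances are expanded to avoid an explicit double loop."""
    sq = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2.0 * (X @ Y.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clamp tiny negatives
```

Here each row of `X` and `Y` would be a concatenated HoG/HoS feature vector; the kernel choice controls whether class boundaries in that feature space are linear or nonlinear.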
ABSTRACT: Three-dimensional face landmarking aims at automatically localizing facial landmarks and has a wide range of applications (e.g., face recognition, face tracking, and facial expression analysis). Existing methods assume neutral facial expressions and unoccluded faces. In this paper, we propose a general learning-based framework for reliable landmark localization on 3-D facial data under challenging conditions (i.e., facial expressions and occlusions). Our approach relies on a statistical model, called 3-D statistical facial feature model, which learns both the global variations in configurational relationships between landmarks and the local variations of texture and geometry around each landmark. Based on this model, we further propose an occlusion classifier and a fitting algorithm. Results from experiments on three publicly available 3-D face databases (FRGC, BU-3-DFE, and Bosphorus) demonstrate the effectiveness of our approach, in terms of landmarking accuracy and robustness, in the presence of expressions and occlusions.
Full-text available · Article · May 2011 · IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics)
ABSTRACT: In this paper we present the results of the 3D Shape Retrieval Contest 2011 (SHREC'11) track on face model retrieval. The aim of this track is to evaluate the performance of 3D shape retrieval algorithms that can operate on 3D face models. The benchmark dataset consists of 780 3D face scans of 130 individuals. Four groups have participated in the track with 14 method variations in total.
ABSTRACT: In this paper, we focus on one of the ImageCLEF tasks in which the LIRIS-Imagine research group participated: visual concept detection and annotation. For this task, we first propose two kinds of textual features to extract semantic meaning from the text associated with images: one is based on a semantic distance matrix between the text and a semantic dictionary, and the other carries valence and arousal meanings by making use of the Affective Norms for English Words (ANEW) dataset. We also investigate the efficiency of different visual features, including color, texture, shape and high-level features, and we test four fusion methods to combine the various features to improve performance: min, max, mean and score. The results show that combining our textual and visual features can improve performance significantly.
ABSTRACT: Recognition of facial action units (AU) is one of the two main streams in facial expression analysis. Action units deform facial appearance simultaneously in landmark locations, local texture, and geometry on 3D faces. Thus, it is necessary to extract features from multiple facial modalities to characterize these deformations comprehensively. In order to fuse the contribution of the discriminative power of all features efficiently, we propose to use our extended statistical facial feature models (SEAM) to generate feature instances corresponding to each AU class for each feature. The similarity between each feature on a face and its instances is then evaluated, yielding a set of similarity scores. All sets of scores on the face are then weighted for AU recognition. Experiments on the Bosphorus database show its state-of-the-art performance.
ABSTRACT: Automatic facial expression recognition on 3D face data is still a challenging problem. In this paper, we propose a novel approach to perform expression recognition automatically and flexibly by combining a Bayesian Belief Net (BBN) and Statistical Facial feAture Models (SFAM). A novel BBN is designed for this specific problem, together with our proposed parameter computation method. By learning global variations in face landmark configuration (morphology) and local ones in terms of texture and shape around landmarks, the morphable SFAM allows us not only to perform automatic landmarking but also to compute the beliefs that feed the BBN. Tested on the public 3D facial expression database BU-3DFE, our automatic approach recognizes expressions successfully, reaching an average recognition rate of over 82%.
ABSTRACT: In this paper, we introduce a new approach for partial 3D face recognition which makes use of shape decomposition over the rigid part of a face. To explore the descriptiveness of shape dissimilarity over an isometric part of a face, which is less likely to be influenced by expression, we transform a 3D shape to a 2D domain using conformal mapping and use shape decomposition as a similarity measurement. In our work we investigate several classifiers as well as several shape descriptors for recognition purposes. Recognition tests on a subset of the FRGC dataset show approximately an 80% rank-one recognition rate using only the eyes and nose part of the face.
ABSTRACT: The Local Binary Pattern (LBP) operator is a computationally efficient yet powerful feature for analyzing local texture structures. While the LBP operator has been successfully applied to tasks as diverse as texture classification, texture segmentation, face recognition and facial expression recognition, it has rarely been used in the domain of Visual Object Classes (VOC) recognition, mainly due to its limited ability to deal with the various changes in lighting and viewing conditions in real-world scenes. In this paper, we propose six novel multi-scale color LBP operators in order to increase the photometric invariance and discriminative power of the original LBP operator. Experimental results on the PASCAL VOC 2007 image benchmark show a significant accuracy improvement by the proposed operators compared with both the original LBP and other popular texture descriptors such as the Gabor filter.
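For reference, the original single-scale grayscale LBP operator that the proposed multi-scale color variants build on can be sketched as follows (the bit ordering below is a convention, not prescribed by this paper):

```python
import numpy as np

def lbp_3x3(img):
    """Basic 8-neighbor LBP: threshold each interior pixel's 3x3
    neighborhood at the center value and read the 8 comparison bits
    as one byte. Returns codes for interior pixels only."""
    img = np.asarray(img, dtype=np.int32)
    c = img[1:-1, 1:-1]  # center pixels
    # Neighbor offsets, clockwise from the top-left corner.
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c)
    for bit, (dy, dx) in enumerate(offs):
        n = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        code |= (n >= c).astype(np.int32) << bit
    return code
```

A texture descriptor is then the histogram of these codes over a region; the paper's color operators apply this idea per channel (and across scales) rather than on grayscale alone.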
ABSTRACT: 3D face landmarking aims at automatic localization of 3D facial features and has a wide range of applications, including face recognition, face tracking and facial expression analysis. Methods developed so far for pure 2D texture images have been shown to be sensitive to lighting condition changes. In this paper, we present a statistical model-based technique for accurate 3D face landmarking, using an "analysis by synthesis" approach. Our model learns from a training set both the variations of global face shapes and the local ones, in terms of scale-free texture and range patches around each landmark. Given a shape instance, local regions of a new face can be approximated by synthesizing texture and range instances using the texture and range models, respectively. By optimizing an objective function describing the similarity between the new face and the instances, we can find the best shape and thereby locate the landmarks. Experimented on more than 1860 face models from the FRGC datasets, our method achieves average locating errors of less than 7 mm for 15 feature points. Compared with a curvature analysis-based method also developed within our team, this learning-based method localizes more facial landmarks with generally better accuracy, at the cost of a learning step.
ABSTRACT: In this paper we present a conformal mapping-based approach for 3D face recognition. The proposed approach makes use of conformal UV parameterization for mapping and Shape Index decomposition for similarity measurement. Indeed, according to conformal geometry theory, each 3D surface with disk topology can be mapped onto a 2D domain through a global optimization, resulting in a diffeomorphism, i.e., a one-to-one and onto mapping. This allows us to reduce the 3D surface matching problem to a 2D image matching one by comparing the corresponding 2D conformal geometric maps. To deal with facial expressions, a Möbius transformation of the UV conformal space is used to 'compress' the face mimic region. Rasterized images are used as input to a (2D)²PCA recognition algorithm. Experimented on 62 subjects randomly selected from the FRGC v2 dataset, which includes different facial expressions, the proposed method displays 86.43%, 97.65% and 69.38% rank-one recognition rates in the Neutral vs. All, Neutral vs. Neutral and Neutral vs. Expression scenarios, respectively.
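The Möbius transformations used to 'compress' regions of the UV conformal space are fractional linear maps of the complex plane (UV points treated as complex numbers). A generic sketch with illustrative coefficients, not the paper's actual transform:

```python
def mobius(z, a, b, c, d):
    """Mobius map z -> (a*z + b) / (c*z + d), requiring ad - bc != 0.
    Such maps are conformal (angle-preserving) on the complex plane."""
    if abs(a * d - b * c) < 1e-12:
        raise ValueError("degenerate Mobius transform (ad - bc == 0)")
    return (a * z + b) / (c * z + d)

def disk_automorphism(z, a):
    """Special Mobius map sending the unit disk onto itself and the
    point a (|a| < 1) to the origin: z -> (z - a) / (1 - conj(a)*z).
    Shifting a region toward the center this way is one generic way
    to reweight ('compress') part of a disk-shaped conformal domain."""
    return (z - a) / (1 - a.conjugate() * z)
```

Because the map is a diffeomorphism of the disk, no surface information is lost; only the sampling density of the rasterized image changes across regions.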
ABSTRACT: Since it is very hard to achieve automatic detection of key moments in sports competitions based on visual analysis alone, we propose in this paper automatic highlights detection based on an audio classifier. The audio classifier is built on a new modeling technique of the audio spectrum called Piecewise Gaussian Modeling (PGM) and neural networks. The proposed approach was evaluated on soccer and tennis videos, though our technique has no restriction on the type of sport. It is shown that audio-based highlights detection can be effective for tennis segmentation, since 97.5% of end-of-serves were correctly classified. Goals can be detected in soccer videos using audio analysis as well. An intelligent sports-video player based on this audio analysis is proposed, permitting the user to navigate through the key moments of a sports video.
ABSTRACT: 3D face landmarking aims at automatic localization of 3D facial features and has a wide range of applications, including face recognition, face tracking and facial expression analysis. Methods developed so far for 2D images have been shown to be sensitive to lighting condition changes. In this paper, we propose a learning-based approach for reliable localization of face landmarks in 3D. Our approach relies on a statistical model, called the 3D Statistical Facial feAture Model (SFAM) in this paper, which learns both global variations in 3D face morphology and local ones around the 3D face landmarks, in terms of local texture and shape. Experimented on the FRGC v1.0 dataset, our approach shows its effectiveness and achieves 99.09% locating accuracy at 10 mm precision. The mean error and standard deviation of each landmark are respectively less than 5 mm and 4 mm.
ABSTRACT: Vision-based people counting systems have wide potential applications, including video surveillance and public resources management. Most works in the literature rely on moving object detection and tracking, assuming that all moving objects are people. In this paper, we present a people counting approach based on face detection, tracking and trajectory classification. While we use a standard face detector, we achieve face tracking by combining a new scale-invariant Kalman filter with a kernel-based tracking algorithm. From each potential face trajectory, an angle histogram of neighboring points is then extracted. Finally, an Earth Mover's Distance-based K-NN classification discriminates true face trajectories from false ones. Experimented on a video dataset of more than 160 potential people trajectories, our approach displays an accuracy rate of up to 93%.
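The per-trajectory signature and its comparison can be illustrated generically: an angle histogram over consecutive trajectory points, compared with a 1-D Earth Mover's Distance. This is a simplification (direction angles are circular, and the paper's exact binning is not given), sketched here for intuition only:

```python
import math

def angle_histogram(points, bins=8):
    """Normalized histogram of direction angles between consecutive
    trajectory points (x, y); bin count is an assumption."""
    hist = [0.0] * bins
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        ang = math.atan2(y1 - y0, x1 - x0) % (2 * math.pi)
        hist[min(int(ang / (2 * math.pi) * bins), bins - 1)] += 1
    total = sum(hist) or 1.0
    return [h / total for h in hist]

def emd_1d(h1, h2):
    """Earth Mover's Distance between two normalized 1-D histograms:
    the L1 distance between their cumulative sums."""
    d, c1, c2 = 0.0, 0.0, 0.0
    for a, b in zip(h1, h2):
        c1 += a
        c2 += b
        d += abs(c1 - c2)
    return d
```

A K-NN classifier would then label a trajectory by the majority class of its nearest neighbors under `emd_1d`; unlike bin-wise distances, EMD stays small when mass merely shifts to an adjacent angle bin.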
ABSTRACT: This paper introduces audio scenes and audio chapters in movies and presents an efficient algorithm for automatically structuring a video based on the audio stream only. The automatic solution to audio scene and chapter segmentation is evaluated on manually segmented videos.
ABSTRACT: This paper presents a novel approach for visual object classification. Based on Gestalt theory, we propose to extract features from coarse regions carrying visually significant information, such as line segments and/or color, and to include neighborhood information in them. We also introduce a new classification method based on polynomial modeling of the feature distribution, which avoids some drawbacks of a popular approach, namely "bag of keypoints". Moreover, we show that by separating features extracted from different sources into different "channels", which are then combined using a late fusion strategy, we can limit the impact of feature dimensionality and actually improve classification accuracy. Using this classifier, experiments reveal that our features lead to better results than the popular SIFT descriptors, but also that they can be combined with SIFT features to reinforce performance, suggesting that our features manage to extract information complementary to that of SIFT.
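The late-fusion step described above can be sketched as a weighted combination of per-channel class scores. The equal-weight default and the score layout are assumptions, since the abstract does not state the exact fusion rule:

```python
def late_fusion(channel_scores, weights=None):
    """Late fusion: each feature 'channel' classifies independently and
    produces a dict of per-class scores; combine by a weighted mean.
    channel_scores -- list of {class_name: score} dicts, one per channel.
    weights        -- optional per-channel weights (default: equal)."""
    n = len(channel_scores)
    weights = weights or [1.0 / n] * n
    classes = channel_scores[0].keys()
    return {c: sum(w * s[c] for w, s in zip(weights, channel_scores))
            for c in classes}
```

Fusing after per-channel classification, rather than concatenating features first, is what keeps the dimensionality of each individual classifier's input small.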
ABSTRACT: A digital picture generally contains tens of thousands of colors. Therefore, most image processing applications first need to apply a color reduction scheme before performing further sophisticated analysis operations such as segmentation. While many color reduction techniques exist in the literature, they are mainly designed for image compression and are unfortunately not suited to many image processing operations (e.g. segmentation), as they tend to alter the image's color structure and distribution. In this paper, we propose a new color reduction scheme (SICR) that uses probabilities and elements of information theory to balance the information provided by the selected colors against the need to represent them accurately. We also advocate the use of perceptually accurate metrics for evaluation. Experimental results on a diversified dataset of images selected from the internet show that our technique performs well compared to other color reduction schemes.
ABSTRACT: The audio channel conveys rich clues for content-based multimedia indexing. Interesting audio analysis problems include, besides the widely known speech recognition and speaker identification problems, speech/music segmentation, speaker gender detection, special effect recognition such as gun shots or car pursuits, and so on. All these problems can be considered as audio classification problems which need to generate a label from low-level audio signal analysis. While most audio analysis techniques in the literature are problem-specific, we propose in this paper a general framework for audio classification. The proposed technique uses a perceptually motivated model of human perception of audio classes, in the sense that it makes judicious use of certain psychophysical results, and relies on a neural network for classification. In order to assess the effectiveness of the proposed approach, extensive experiments on several audio classification problems have been carried out, including speech/music discrimination in radio/TV programs, gender recognition on a subset of the Switchboard database, highlights detection in sports videos, and musical genre recognition. The classification accuracies of the proposed technique are comparable to those obtained by problem-specific techniques, while offering the basis of a general approach to audio classification.
Full-text available · Article · Jul 2007 · Multimedia Tools and Applications
ABSTRACT: While the problem of Content-Based Image Retrieval (CBIR) and automated image indexing systems has been widely studied in past years, it still represents a challenging research field. Indeed, capturing high-level semantics from digital images based on low-level descriptors remains an issue. A review of existing systems shows that edge descriptors are among the most popular features. While color features have led to extensive work, edge features have not produced such active research, and most current systems instead rely on complementing basic edge information with other, more computationally expensive features such as texture. In this paper we propose to work on a more accurate edge feature while keeping a relatively low computation cost. We begin with a review of common edge features used in CBIR and automated indexing systems, then explain our Enhanced Fast Hough Transform algorithm and the edge descriptor we derive from it. Through a study of computational complexity, we show that the computational burden is kept minimal, and experimental results using a sample automated indexing system show that our new edge feature significantly improves over more traditional descriptors.