Facial Tracking
Paulo Brito1,
1 Universidade do Minho, Mestrado de Informática – UCE 15, Portugal
brito.paulo@netcabo.pt
Abstract. A basic goal of the field of Human-Computer Interaction (HCI) is to
improve the interaction process between users and computers by making
interfaces more usable and responsive to the user's needs. Specifically, HCI is
concerned with developing new interfaces and interaction techniques. To that
end, we need to study and implement new types of interaction, such as those
based on facial tracking. The main goals of this paper are, firstly, to survey
and evaluate existing facial processing systems and, secondly, to suggest a
possible hybrid solution, robust enough to use in different kinds of
applications.
Keywords: facial tracking, human-computer interaction, object detection,
pattern recognition
1 Introduction
Images containing faces are essential to smart vision-based human-computer
interaction. Because of this, several lines of investigation in face processing
focus their efforts on face recognition, face tracking, pose estimation and
expression recognition. Commonly used face detection algorithms can detect
faces accurately in terms of their position in a picture. A comprehensive
survey on face detection can be found in [1].
To make fully automated systems that analyze the information contained in face
images, robust and efficient face detection algorithms are required. Considering a
single image, the goal of face detection is to identify all the regions in the picture
which potentially contain a face regardless of its three-dimensional position,
orientation, and lighting conditions. Such a problem is demanding because faces are
non-rigid and have a high degree of variability in size, shape, color and texture.
Human face tracking and detection is often the first step in applications such as
video surveillance, human computer interface, face recognition and image database
management. Locating and tracking human faces is a requirement for face recognition
and/or facial expressions analysis, although it is often assumed that a face image is
available.
Disabled people with restricted voluntary motion often have great difficulty
interacting with a computer in order to access the internet and media
entertainment, or even to perform basic tasks such as searching for
information. Therefore, assistive technology devices have been developed to
diminish these limitations. However, most of them require extra devices which
are uncomfortable to wear or use, such as infrared appliances, electrodes,
goggles and mouth sticks.
As described in [2], facial tracking technology already has several industrial
applications for disabled persons. It allows, for instance, users who have
difficulty using a standard interaction device, such as a mouse, to manipulate
an on-screen cursor by moving their head.
Recently, due to the enhancement of computer performance and the widespread
use of webcams, it has become common to build Human-Computer Interaction (HCI)
systems by combining the strengths of novel and intelligent video processing
and computer vision methods. These systems track the user's motion with a
video camera and map it to the displacement of the mouse cursor on the screen.
However, to obtain real-time tracking, many problems have to be solved. One
important problem lies in the variety of existing phenotypes, such as skin or
eye color, beards and glasses. Another problem is the response time of the
system: it does not matter whether one can recognize and track a face if it
cannot be done within a tolerable, usable time frame. Also, the tracking has
to be precise and robust for practical use. These issues increase the
difficulty of recognizing faces and of tracking their features in real time.
The problem of tracking faces with a video camera places the focus of this
work on the design of vision-based perceptual user interfaces. These are
systems that use a video camera to track the user's face position in 3D, in
order to translate it to the position of a cursor or another virtual object on
a 2D screen. They are designed to provide a hands-free alternative to the
mouse, joystick and trackpad.
This paper is organized as follows. Section 2 describes and evaluates some
face tracking methodologies. Section 3 discusses a possible hybrid face
tracking solution. Finally, we present conclusions and future work in
Section 4.
1.1 Previous work
The applications that use face tracking algorithms require the system to be fast,
affordable and, most importantly, precise and robust. In particular, the precision
should be sufficient to control a cursor, while the robustness should be high enough to
allow a user the convenience and the flexibility of head motion.
Only a few hardware companies have developed hands-free mouse replacements. In
particular, in the accessibility community, several companies have developed
products which can track a head both accurately and reliably. These products,
however, use either dedicated software running on custom hardware or a
structured environment (e.g. markings on the user's face) to simplify the
tracking process. At the same time, recent advances in hardware, the fast
Universal Serial Bus (USB) and USB2 interfaces, falling camera prices, and the
increase in computing power have brought a lot of attention to the real-time
face tracking problem from the computer vision community.
The resulting vision-based solutions, though, still do not exhibit the
required precision and robustness. The approaches to vision-based face
tracking can be separated into two classes: image-based and feature-based
approaches [3].
Image-based approaches use global facial cues such as skin color, head
geometry and motion. They are robust to head rotation and scale and do not
require high-quality images. On the other hand, these approaches lack
precision and therefore cannot be used to control the cursor precisely.
In order to achieve precise and smooth face tracking, feature-based approaches
are used instead [2]. These approaches are based on tracking individual facial
features. The features can be tracked with pixel accuracy, which allows one to
convert their positions to a cursor position. The disadvantage of these
approaches is that they usually require expensive high-resolution cameras.
Also, they are not robust to head motion, especially to head rotation and
changes in scale. This is the reason why vision-based games and interfaces are
still not widespread.
Recently, however, it has been shown that the robustness of local feature
tracking can be significantly improved if, instead of the commonly used
edge-based features such as corners and the edges of brows, mouth and
nostrils, features based on the curvature of the nose are used [4]. This opens
a new range of interesting possibilities for feature-based face tracking.
2 Methodologies of face tracking
Face tracking is also a core component in enabling the computer to perceive
the user in a Human-Computer Interface system. Because of this, we want to
analyze solutions that give us the capability of building a robust and
accurate face tracking system running in real time and based only on a USB
camera and a mid-market desktop system.
In this section we study and evaluate some face tracking algorithms and their
performance regarding precision, robustness and learning capabilities.
Table 1. Categorization of methods.

  2.1 Stochastic color model and the deformable template:
      feature invariant (skin color); template matching (deformable templates)
  2.2 Robust skin color:
      feature invariant (skin color in different color spaces)
  2.3 Skin color in image by neural networks:
      feature invariant (skin color); appearance-based method (neural networks)
  2.4 QuadTree based color analysis and support vector verification:
      feature invariant (skin color); appearance-based method (support vector
      machine); knowledge-based (multi-resolution)
  2.5 Facial feature detection using Haar classifiers:
      feature invariant (facial features)
2.1 Face tracking system based on the stochastic color model and the
deformable template
This real-time face-tracking algorithm was proposed in [5]. It starts with a
single-face tracking algorithm based on statistical color modeling and the use
of a deformable template.
The method described in [6] is used to implement the face tracking. It models
color statistically with reference to a so-called deformable template. First,
for each pixel in the current video frame, the probability of the pixel
belonging to each of two classes is calculated: the face class (the foreground
class) and the non-face class (the background class). Then a deformable
template is used to group the pixels most likely to belong to the face class.
The template is deformed so that its area includes as many face pixels, and at
the same time as few background pixels, as possible. The optimal deformation
can be found using a logarithmic search.
Algorithm:
- Color model: use a Gaussian Mixture Model (GMM) to model the color
distribution of the face region and of the background region. Then, for each
pixel, calculate the log probability ratio.
- Deformable template: use a logarithmic search to deform the template and fit
it to the face optimally, as we can see in figure 1, based on the target
function (the log probability ratio) calculated in the step above.
Fig. 1. This shows the region where the face was correctly targeted. The red zone is the
foreground class and the predominant orange zone is the background class. Image adapted from
[7].
- Constraint-based multiple face tracking: expand the algorithm to multiple
face tracking based on constraints on the speed and size of the faces.
This algorithm was applied to video sequences with different occlusion
patterns, and the conclusion is that it works well for most of the sequences.
Still, the recovery process could be improved, because after occlusion a small
tracking error is detected (the tracking error is the difference between the
face detection result and the tracking result).
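As a sketch of the color-model step, the fragment below computes the log probability ratio between a face GMM and a background GMM for a single pixel. The mixture parameters are hand-picked for illustration only; the real system would fit them (e.g. with EM) on labeled frames.

```python
import math

def gaussian_logpdf(pixel, mean, var):
    # Log-density of a diagonal-covariance Gaussian over an RGB pixel.
    return sum(-0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
               for x, m, v in zip(pixel, mean, var))

def gmm_logpdf(pixel, components):
    # components: list of (weight, mean, variance) tuples.
    # Log-sum-exp over the mixture components for numerical stability.
    logs = [math.log(w) + gaussian_logpdf(pixel, m, v) for w, m, v in components]
    top = max(logs)
    return top + math.log(sum(math.exp(l - top) for l in logs))

def log_probability_ratio(pixel, face_gmm, background_gmm):
    # Positive values mean the pixel is more likely face than background.
    return gmm_logpdf(pixel, face_gmm) - gmm_logpdf(pixel, background_gmm)

# Illustrative, hand-picked mixtures (a real system fits these from data).
face_gmm = [(0.6, (200, 145, 120), (400, 400, 400)),
            (0.4, (160, 110, 90), (400, 400, 400))]
background_gmm = [(0.5, (50, 60, 50), (900, 900, 900)),
                  (0.5, (120, 120, 130), (900, 900, 900))]
```

The per-pixel ratio map is exactly the target function that the logarithmic template search then maximizes over deformations.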
2.2 Robust Skin Color Based Face Detection Algorithm
This algorithm is based on skin color and on the analysis of three color
spaces: RGB, YCbCr and HSI [8]. The solution is to compare the algorithms
based on these color spaces and combine them into a new skin color based face
detection algorithm with higher accuracy.
The study of skin color classification has gained increasing attention in
recent years due to active research in content-based image representation. It
is fair to say that the most popular approach to face localization uses color
information, and estimating the areas with skin color is often the first vital
step of such a strategy. Hence, skin color classification has become a
significant task. Much of the research in skin color based face localization
and detection is based on the RGB, YCbCr and HSI color spaces.
Skin color based face detection in RGB color space is one of the simplest
algorithms for detecting skin pixels. The perceived human skin color varies as
a function of the relative direction of the illumination. Pixels in skin
regions can be detected using a normalized color histogram, and can be further
normalized against changes in intensity by dividing by luminance. An [R, G, B]
vector is thus converted into a normalized [r, g] vector, which provides a
fast means of skin detection. This gives the skin color region where the face
is, and the output is a face detected from the skin region. This algorithm
fails when other potential skin regions, such as legs or arms, are present.
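A minimal sketch of this rg-normalization step follows. The threshold box is a hypothetical choice for illustration; in practice the bounds are derived from the normalized color histogram.

```python
def normalized_rg(pixel):
    # Convert an [R, G, B] pixel to normalized [r, g], discarding intensity.
    r, g, b = pixel
    total = r + g + b
    if total == 0:
        return 0.0, 0.0
    return r / total, g / total

def is_skin_rg(pixel, r_range=(0.36, 0.47), g_range=(0.28, 0.36)):
    # Hypothetical threshold box; a real system learns it from a histogram.
    r, g = normalized_rg(pixel)
    return r_range[0] <= r <= r_range[1] and g_range[0] <= g <= g_range[1]
```

Note that a bright and a dim pixel with the same chromaticity map to the same [r, g] point, which is exactly the intensity invariance the normalization is meant to provide.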
Skin color based face detection in YCbCr color space is based on color
statistics gathered from this color space. Studies have found that pixels
belonging to skin regions exhibit similar Cb and Cr values. Furthermore, it
has been shown that a skin color model based on the Cb and Cr values can
provide good coverage of different human skin tones. A pixel is classified as
skin tone if its [Cr, Cb] values fall inside the intervals chosen as
[Cr1, Cr2] and [Cb1, Cb2]. The skin color distribution gives the face portion
of the color image. This algorithm also has the constraint that the face
should be the only skin region in the image.
Skin color based face detection in HSI color space is based on a color
predicate in HSV color space that separates skin regions from the background.
Skin color classification in HSV color space works as in YCbCr, but here the
relevant values are the hue (H) and the saturation (S). First, minimum and
maximum thresholds are chosen, such as [H1, S1] and [H2, S2]. Then a pixel is
classified as skin tone if its [H, S] values fall within those thresholds.
Similar to the above algorithms, this one also presents the same constraint.
Combining the regions detected by the three algorithms described above, all
the skin regions present in the image are extracted. These regions then have
their features extracted so that a human face can be detected. Finally, a
bounding box is drawn around the face region with the help of the detected
facial features. The assumption can thus be stated as follows: if the skin is
detected by one or more algorithms while another algorithm gives a false
result for the same image, the face is still extracted by the combination
algorithm. This statement is based on the basic idea of the Venn diagram from
set theory [9]: if we denote the result from RGB color space as "A", the
result from YCbCr color space as "B" and the result from HSI color space as
"C", and any of the results contains the skin region, then the union of the
three will surely contain it, as we can see in figure 2.
Fig. 2. Result of combining the three color spaces into one, and doing a dilation and erosion.
Image adapted from [8].
The evaluation of this technique shows that it is good enough to localize a
human face in an image with an accuracy of 95%.
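The union of the three per-pixel rules can be sketched as below. The numeric thresholds are illustrative assumptions (the Cb/Cr intervals are typical of those reported in the skin-detection literature; the rg and hue/saturation bounds are hand-picked), not the exact values used in [8].

```python
import colorsys

def skin_rgb(pixel):
    # Normalized rg rule (hypothetical thresholds for illustration).
    r, g, b = pixel
    total = (r + g + b) or 1
    rn, gn = r / total, g / total
    return 0.36 <= rn <= 0.47 and 0.28 <= gn <= 0.36

def skin_ycbcr(pixel):
    # Cb/Cr intervals typical of those reported in the literature.
    r, g, b = pixel
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return 77 <= cb <= 127 and 133 <= cr <= 173

def skin_hsv(pixel):
    # Skin hue is roughly in [0, 50] degrees with moderate saturation.
    r, g, b = (c / 255.0 for c in pixel)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    return h * 360 <= 50 and 0.1 <= s <= 0.7

def skin_union(pixel):
    # A pixel is kept if ANY of the three rules accepts it: the union
    # "A or B or C" of the Venn-diagram argument above.
    return skin_rgb(pixel) or skin_ycbcr(pixel) or skin_hsv(pixel)
```
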
2.3 Face Detection based on Skin Color in Image by Neural Networks
As seen so far, skin color information in a color image is a very popular and
useful cue for face detection, as it allows the construction of a very rapid
classifier. This algorithm extends the skin color based face detection in
YCbCr color space described in section 2.2, as we can see in figure 3, by
adding a neural network approach.
The approach [10] is structured into three main stages: pre-processing,
segmentation, and classification using neural networks.
Fig. 3. Algorithm for face detection based on skin color in image by neural networks.
Image adapted from [10].
Different people have different skin colors, but the difference lies mostly in
intensity, not in the chrominance itself. The YCbCr color space is one of the
most successful color spaces for segmenting skin color accurately.
The discrete cosine transform (DCT) helps separate the image into parts of
differing importance (with respect to the image's visual quality). The DCT is
similar to the discrete Fourier transform: it transforms a signal or image
from the spatial domain to the frequency domain.
Neural networks are often used in face detection. In this case, multilayer
perceptron (MLP) back-propagation neural networks are used to train on the
data set and to classify features that are extracted as DCT coefficients after
the image is divided into blocks of 8x8 pixels. The main advantage of choosing
back-propagation neural networks is their simplicity and their suitability for
supervised pattern matching.
This algorithm has been tested on a dataset of frontal color face images from
the internet and achieved excellent detection rates. Nevertheless, because of
the cost of the pre-processing phase, this system is only a first step toward
a real-time face recognition system.
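The DCT feature-extraction step can be sketched as follows: a 2-D DCT-II is applied to an 8x8 block and the lowest-frequency coefficients, taken in zig-zag order, form the feature vector that would be fed to the MLP. Keeping ten coefficients is an illustrative assumption.

```python
import math

def dct_8x8(block):
    # 2-D DCT-II of an 8x8 block of pixel intensities.
    n = 8
    def alpha(u):
        return math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = sum(block[x][y]
                    * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                    * math.cos((2 * y + 1) * v * math.pi / (2 * n))
                    for x in range(n) for y in range(n))
            out[u][v] = alpha(u) * alpha(v) * s
    return out

def low_frequency_features(coeffs, k=10):
    # Keep the k lowest-frequency coefficients in zig-zag order as a
    # compact feature vector (k = 10 is an illustrative choice).
    order = sorted(((u, v) for u in range(8) for v in range(8)),
                   key=lambda uv: (uv[0] + uv[1], uv[0]))
    return [coeffs[u][v] for u, v in order[:k]]
```

For a constant block only the DC coefficient is non-zero, which is the sanity check usually applied to a DCT implementation.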
2.4 Fast Face Detection Using QuadTree based Color Analysis and Support
Vector Machine
The proposed system consists of three major components, namely the skin color
detection module, the wavelet transformation module and the support vector machine
module [11].
The skin color detection module extracts skin color region from the image. The
wavelet transformation [12] module transforms the image into a pyramid that consists
of images at different resolutions. Then it forms a primitive Quadtree [13] for
efficient skin color searching. Finally the support vector machine module determines
whether the input pattern is a face pattern or not.
In general, by starting the analysis at the lowest resolution and by limiting
the range of resolutions to be analyzed, the total number of pixels actually
processed is small and the algorithm is thus fast. In addition, no further
morphological analysis is needed, because a coarse-to-fine analysis using a
Quadtree structure is in effect an image smoothing and noise filtering
process. Although the wavelet decomposition step may seem to impose a serious
computational load compared with simple blurring to generate the image
pyramid, the wavelet coefficients calculated can be reused in the verification
step, which increases the reliability of the system. The system is meant to
detect faces reliably in reasonable time, which means that it can be used to
track faces in real time.
The color analysis shows that skin color tends to form clusters in different
color spaces. In the proposed system, color is used as the only heuristic
feature. Skin color pixels can be identified by comparing a given pixel with
the skin color distribution or model.
The skin color model is learnt offline. Face images taken in an office
environment under different lighting conditions are used for skin color
learning, and skin pixels are separated manually. The extracted skin color
pixels are converted from RGB color space to the normalized rg-color space:
r = R/(R+G+B) and g = G/(R+G+B). A histogram is then formed from the manually
selected skin pixels in rg-color space.
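The offline histogram learning step can be sketched as below; the bin count of 32 is an illustrative assumption.

```python
def learn_skin_histogram(skin_pixels, bins=32):
    # Build a normalized 2-D histogram over rg-chromaticity from the
    # manually labeled skin pixels (learned offline, as described above).
    hist = {}
    for r, g, b in skin_pixels:
        total = (r + g + b) or 1
        key = (min(int(r / total * bins), bins - 1),
               min(int(g / total * bins), bins - 1))
        hist[key] = hist.get(key, 0) + 1
    n = len(skin_pixels)
    return {key: count / n for key, count in hist.items()}

def skin_probability(hist, pixel, bins=32):
    # Look up how often this pixel's chromaticity bin occurred in training.
    r, g, b = pixel
    total = (r + g + b) or 1
    key = (min(int(r / total * bins), bins - 1),
           min(int(g / total * bins), bins - 1))
    return hist.get(key, 0.0)
```
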
In the multi-scale wavelet analysis, the image is broken down into constituent
images at different resolutions through the wavelet transform, and an image
pyramid is then formed. Under the Quadtree-based searching scheme, color
analysis starts from the top of the pyramid (the image with the lowest
resolution) and proceeds downwards to the bottom (the image with the highest
resolution). Information from analysis at a lower resolution is used in
analysis at a higher resolution, and analysis at a lower resolution can thus
determine whether it is necessary to explore a certain set of pixels at a
higher resolution. Time is hence saved by avoiding unnecessary traversal of
pixels at higher resolutions. Besides, by assuming that the faces to be
detected are of similar size, the algorithm can limit the searching depth to a
finite number of levels after skin color is first detected, and consequently
increase efficiency.
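The coarse-to-fine pruning can be sketched with a simple recursive search over a binary skin mask. Region averages stand in for the low-resolution pyramid levels; the threshold and minimum region size are illustrative assumptions.

```python
def region_fraction(mask, x, y, size):
    # Fraction of skin pixels in a square region; at a coarse pyramid
    # level this is exactly the averaged (low-resolution) pixel value.
    count = sum(mask[y + j][x + i] for j in range(size) for i in range(size))
    return count / (size * size)

def quadtree_search(mask, x, y, size, min_size, threshold, hits):
    # Reject the whole subtree early when the coarse skin fraction is low,
    # so high-resolution pixels in empty regions are never visited.
    if region_fraction(mask, x, y, size) < threshold:
        return
    if size <= min_size:
        hits.append((x, y, size))
        return
    half = size // 2
    for dx in (0, half):
        for dy in (0, half):
            quadtree_search(mask, x + dx, y + dy, half,
                            min_size, threshold, hits)
```

On a mask whose skin pixels are confined to one quadrant, only that quadrant is ever refined down to leaf regions; the other three subtrees are pruned at the coarsest level.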
Finally, there is the support vector machine (SVM), which has recently been
widely used in face detection and recognition due to its non-linear
classification power [14]. Support vector machines are a set of related
supervised learning methods used for classification and regression. Viewing
the input data as two sets of vectors in an n-dimensional space, an SVM
constructs a separating hyperplane that maximizes the margin between the two
data sets. To calculate the margin, two parallel hyperplanes are constructed,
one on each side of the separating hyperplane, each "pushed up against" one of
the data sets. Intuitively, a good separation is achieved by the hyperplane
that has the largest distance to the neighboring data points of both classes,
since in general the larger the margin, the lower the generalization error of
the classifier.
In this algorithm, during the learning phase, the SVM is trained to learn the
face pattern. During the testing phase, the wavelet coefficients corresponding
to the skin regions reported by the skin color module are converted to a
feature vector and classified by the SVM. The face pattern can then be
verified.
In this approach the search time is reduced, because candidate regions are
explored at a lower resolution and the searching is limited to an appropriate
range of resolutions. In addition, face verification ensures high detection
accuracy. Experimental results showed that this algorithm can detect faces
efficiently and reliably. Note that the approach described in this section
assumes that the faces in the image are of similar size and are in frontal
view; faces seen from different views, depths and sizes will cause poor
results in this face tracking algorithm.
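The maximum-margin idea can be illustrated on a toy separable data set. The sketch below is not a general SVM solver: it assumes a configuration in which the two closest points across the classes are the support vectors, so the maximum-margin hyperplane is their perpendicular bisector.

```python
def toy_max_margin(positive, negative):
    # Find the closest pair of points across the two classes; under the
    # stated assumption these are the support vectors.
    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    p, n = min(((p, n) for p in positive for n in negative),
               key=lambda pair: dist2(*pair))
    # The hyperplane w.x + b = 0 is the perpendicular bisector of p and n.
    w = [a - b for a, b in zip(p, n)]
    midpoint = [(a + b) / 2 for a, b in zip(p, n)]
    b = -sum(wi * mi for wi, mi in zip(w, midpoint))
    return w, b

def classify(w, b, x):
    # Sign of the decision function: +1 for the positive (face) side.
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
```

In the real system the vectors x would be the wavelet-coefficient feature vectors of candidate skin regions, and the hyperplane would be found by solving the full quadratic program.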
2.5 Facial Feature Detection using Haar Classifiers
Viola and Jones [15] introduced a method to detect faces within an image
accurately and efficiently enough for real-time face tracking. This method can
be adapted to accurately detect facial features [16]. However, the area of the
image being analyzed for a facial feature needs to be regionalized to the
location with the highest probability of containing the feature. By
regionalizing the detection area, false positives are eliminated and the speed
of detection is increased due to the reduction of the area examined.
The core basis for Haar classifier object detection is the features extracted from the
Haar transform. These features, rather than using the intensity values of a pixel, use
the change in contrast values between adjacent rectangular groups of pixels. The
contrast variances between the pixel groups are used to determine relative light and
dark areas. Haar features can easily be scaled by increasing or decreasing the size of
the pixel group being examined. This allows features to be used to detect objects of
various sizes.
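The rectangle sums that Haar features are built from can be computed in constant time with an integral image, as introduced in [15]. The sketch below computes a horizontal two-rectangle feature this way.

```python
def integral_image(img):
    # ii[y][x] holds the sum of all pixels above and to the left of (x, y),
    # so any rectangle sum costs four lookups regardless of its size.
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            ii[y + 1][x + 1] = (img[y][x] + ii[y][x + 1]
                                + ii[y + 1][x] - ii[y][x])
    return ii

def rect_sum(ii, x, y, w, h):
    # Sum of the w-by-h rectangle with top-left corner (x, y).
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def two_rect_feature(ii, x, y, w, h):
    # Horizontal two-rectangle Haar feature: left half minus right half.
    # Scaling the feature is just a matter of changing w and h.
    half = w // 2
    return (rect_sum(ii, x, y, half, h)
            - rect_sum(ii, x + half, y, half, h))
```
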
Although calculating a feature is extremely efficient and fast, calculating all
features contained within a 24 × 24 sub-image is impractical.
Fortunately, only a tiny fraction of those features are actually needed to determine
if a sub-image potentially contains the desired object. In order to eliminate as many
sub-images as possible, only a few of the features that define an object are used when
analyzing sub-images. The purpose is to eliminate a substantial amount, around 50%,
of the sub-images that do not contain the object. This process continues, increasing
the number of features used to analyze the sub-image at each stage.
The cascading of the classifiers allows only the sub-images with the maximum
probability to be analyzed for all Haar features that distinguish an object. It also
allows varying the accuracy of the classifier. One can increase both the false alarm
rate and positive hit rate by decreasing the number of stages. The inverse of this is
also true.
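The cascade described above can be sketched as an early-exit loop over stages; the stage layout and thresholds below are hypothetical.

```python
def cascade_classify(feature_values, stages):
    # Each stage is (feature_indices, threshold): the stage sums a small
    # subset of Haar feature values and rejects the sub-image as soon as
    # the sum falls below its threshold. Later stages use more features.
    for indices, threshold in stages:
        if sum(feature_values[i] for i in indices) < threshold:
            return False  # rejected early: no further features computed
    return True  # survived every stage: likely contains the object

# Hypothetical stage layout: 2 features first, then 5, then 10.
stages = [(range(0, 2), 1.0), (range(0, 5), 3.0), (range(0, 10), 7.0)]
```

Most background sub-images pay only for the first, cheapest stage, which is what makes the cascade fast; removing stages trades accuracy for speed, raising both the hit rate and the false alarm rate.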
The authors of [15] were able to achieve a 95% accuracy rate for the detection
of a human face using only 200 simple features. Using a 2 GHz computer, a Haar
classifier cascade could detect human faces at a rate of at least five frames
per second.
Detecting human facial features, such as the mouth, eyes and nose, requires
that the Haar classifier cascades first be trained. In order to train the
classifiers, the Adaptive Boosting (AdaBoost) algorithm and the Haar feature
algorithms must be implemented. Intel developed an open-source library
dedicated to easing the implementation of computer vision programs, called the
Open Computer Vision Library (OpenCV). The OpenCV library is designed to be
used with applications in HCI, robotics, biometrics, image processing and
other areas where visualization is important, and it includes an
implementation of Haar classifier detection and training.
To train the classifiers, two sets of images are needed. One set consists of
images or scenes that do not contain the object to be detected, in this case a
facial feature; this set is referred to as the negative images. The other set,
the positive images, contains one or more instances of the object. The
location of the objects within the positive images is specified by the image
name, the upper-left pixel, and the height and width of the object.
The final classifier only uses a few hundred Haar features [16]. Yet, it achieves a
very good hit rate with a relatively low false detection rate.
3 Discussion
After reviewing the various techniques, we can argue that each one has strong
and weak points. Thus the best course of action for detecting movement would
be a system that offers the best compromise between precision and ease of use.
A possible hybrid facial tracking solution would perform facial feature
detection using Haar classifiers and, for automatic recovery and a second
evaluation of the tracking, would use a robust skin color based face detection
algorithm. The strong points of each algorithm can cover the weak points of
the other: e.g. the color-based algorithm can be used for frontal and side
skin detection, in recovery situations, and to shorten the learning time,
while facial feature detection using Haar classifiers can be used to track
facial features when skin occlusion occurs.
Because this solution has implications for performance (each algorithm can be
demanding in computation time), we could use multithreading, together with a
control condition that maintains the performance of the global algorithm.
Multithreading can occur in a real sense, in that a multiprocessor unit may
have more than one processor executing instructions for a particular process
at a time, or it may be effectively simulated by multiple threads executing in
sequence. Alternatively, we could use only the strong points of each algorithm
to ease the computational load; this could be implemented by merging the two
algorithms and/or discarding some steps of each without hurting performance.
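A minimal sketch of the multithreaded arrangement is shown below, with placeholder callables standing in for the Haar-based and skin-color-based trackers (their names and outputs are assumptions for illustration).

```python
import threading
import queue

def run_trackers(frame, trackers):
    # Run each tracker (a callable taking the frame) in its own thread and
    # collect whatever results arrive; a control step can then reconcile
    # the Haar-based and color-based answers.
    results = queue.Queue()
    threads = [threading.Thread(target=lambda n=name, t=tracker:
                                results.put((n, t(frame))))
               for name, tracker in trackers.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    collected = {}
    while not results.empty():
        name, result = results.get()
        collected[name] = result
    return collected

# Hypothetical stand-ins for the two real trackers.
trackers = {
    "haar": lambda frame: ("face", 120, 80),       # placeholder result
    "skin_color": lambda frame: ("face", 118, 83)  # placeholder result
}
```

On a dual-core machine the two trackers genuinely run in parallel; on a single core the same code degrades gracefully to interleaved execution, matching the "effectively simulated" case above.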
The objective of this proposal is to merge two distinct facial tracking
techniques that complement each other at run time, using a mainstream
dual-core processor, in order to build a more robust facial tracking system.
Table 2. Strong and weak points of the mentioned algorithms.

  2.1 Stochastic color model and the deformable template
      Strong points: can work with occlusion.
      Weak points: handles only frontal face tracking; the tracking error
      after occlusion is not completely solved.
  2.2 Robust skin color
      Strong points: face detection with higher accuracy.
      Weak points: fails when other potential skin regions, such as legs or
      arms, are present; dependent on lighting conditions.
  2.3 Skin color in image by neural networks
      Strong points: learning capability provides high accuracy when
      classifying features.
      Weak points: costly pre-processing phase.
  2.4 QuadTree based color analysis and support vector verification
      Strong points: fast search time and high accuracy ensured by the SVM.
      Weak points: different views, depths and sizes cause poor results.
  2.5 Facial feature detection using Haar classifiers
      Strong points: with a 1.2 GHz AMD processor on a 320 by 240 image, a
      frame rate of 3 frames per second with regionalization, and 5 frames
      per second with a more powerful processor.
      Weak points: facial feature occlusion.
4 Conclusion and Future work
In this paper we have presented a brief survey of the most popular facial
tracking approaches and pointed out their usefulness in different practical
applications. We also proposed a method that combines some of the algorithms
discussed and takes advantage of the main strengths of each one. This method
could be used to increase the reliability and efficiency of a human-computer
interface designed for disabled people.
On the other hand, USB cameras have become very popular nowadays because of
their low prices and ease of installation. However, until now these cameras
have barely been used for anything more intelligent than web-casting and
video surveillance, with automatic motion detection being the highest-level
vision task commonly performed with them. We consider that this technology
should also be used for designing vision-based perceptual user interfaces.
A future implementation of this method will demonstrate the practical
usefulness of facial tracking. The main target group would be people who have
disabilities and have problems using a mouse to interact with a computer. The
system also has to be intelligent enough to provide users with acceptable
efficiency and usability.
Acknowledgments. Thanks to Prof. Manuel João for assistance with some of the
algorithms and insightful comments about face tracking. Thanks to Profs. Elizabeth
Carvalho and António Ramires for helpful comments.
References
1. Hjelmas, E., Low, B.K.: Face detection: A survey. Computer Vision and Image
Understanding 83 (2001)
2. Toyama, K.: Look, Ma - no hands! Hands-free cursor control with real-time
3D face tracking. In: Proc. Workshop on Perceptual User Interfaces (PUI'98),
San Francisco (1998)
3. Yang, M., Kriegman, D., Ahuja, N.: Detecting faces in images: A survey.
IEEE Trans. on Pattern Analysis and Machine Intelligence 24 (2002)
4. Gorodnichy, D.: On importance of nose for face tracking. In: Proc. Int.
Conf. on Automatic Face and Gesture Recognition (FG'2002), Washington, D.C.
(2002)
5. Huang, F.J., Chen, T.: Tracking of multiple faces for human-computer
interfaces and virtual environments. In: IEEE Int. Conf. on Multimedia and
Expo, New York (2000)
6. Rao, R., Mersereau, R.: On merging hidden Markov models with deformable
templates. In: Proc. Int. Conf. on Image Processing, Washington, D.C. (1995)
7. Advanced Multimedia Processing Lab, Carnegie Mellon University,
http://amp.ece.cmu.edu/projects/FaceTracking/
8. Singh, S.K., Chauhan, D.S., Vatsa, M., Singh, R.: A robust skin color based
face detection algorithm. Tamkang Journal of Science and Engineering 6(4),
227-234 (2003)
9. Venn diagram, Wikipedia, http://en.wikipedia.org/wiki/Venn_Diagram
10. Jiang, J., Weng, Y.: Face detection based on skin color in image by neural
networks (2007)
11. Wong, S., Wong, K.K.: Fast face detection using QuadTree based color
analysis and support vector verification. In: Image Analysis and Recognition.
LNCS, vol. 3212, pp. 676-683. Springer, Berlin Heidelberg (2004)
12. Wavelet, Wikipedia, http://en.wikipedia.org/wiki/Wavelet
13. Quadtree, Wikipedia, http://en.wikipedia.org/wiki/Quadtree
14. Support vector machine, Wikipedia,
http://en.wikipedia.org/wiki/Support_vector_machine
15. Viola, P., Jones, M.: Rapid object detection using a boosted cascade of
simple features. In: IEEE Conf. on Computer Vision and Pattern Recognition
(2001)
16. Wilson, P.I., Fernandez, J.: Facial feature detection using Haar
classifiers. Journal of Computing Sciences in Colleges 21(4), 127-133 (2006)
Article
In this paper, a detailed experimental study of face detection algorithms based on "Skin Color" has been made. Three color spaces, RGB, YCbCr and HSI are of main concern. We have compared the algorithms based on these color spaces and have combined them to get a new skin color based face detection algorithm which gives higher accuracy. Experimental results show that the proposed algorithm is good enough to localize a human face in an image with an accuracy of 95.18%.
Article
Viola and Jones [9] introduced a method to accurately and rapidly detect faces within an image. This technique can be adapted to accurately detect facial features. However, the area of the image being analyzed for a facial feature needs to be regionalized to the location with the highest probability of containing the feature. By regionalizing the detection area, false positives are eliminated and the speed of detection is increased due to the reduction of the area examined.
Conference Paper
For sophisticated background, a fast and self-adaptive face detection algorithm based on skin color is introduced. In this algorithm, histogram skin color model was built with great amount of skin color pixels in HS color space first, and then skin color segmentation was made to images by using histogram backprojection, in which a binary image of skin color area was obtained after thresholding. Morphological and Blob analysis were used to make further optimization to the segmentation result. Experimental results show that the proposed algorithm can detect faces with different sizes, rotations and expressions under different illumination conditions fast and accurately.
Article
In this paper we present a comprehensive and critical survey of face detection algorithms. Face detection is a necessary first-step in face recognition systems, with the purpose of localizing and extracting the face region from the background. It also has several applications in areas such as content-based image retrieval, video coding, video conferencing, crowd surveillance, and intelligent human–computer interfaces. However, it was not until recently that the face detection problem received considerable attention among researchers. The human face is a dynamic object and has a high degree of variability in its apperance, which makes face detection a difficult problem in computer vision. A wide variety of techniques have been proposed, ranging from simple edge-based algorithms to composite high-level approaches utilizing advanced pattern recognition methods. The algorithms presented in this paper are classified as either feature-based or image-based and are discussed in terms of their technical approach and performance. Due to the lack of standardized tests, we do not provide a comprehensive comparative evaluation, but in cases where results are reported on common datasets, comparisons are presented. We also give a presentation of some proposed applications and possible application areas.
Conference Paper
Face detection plays an important role in many applications such as surveillance, human computer interface, face recognition, and face image database management. This paper proposes a face detection algorithm for color images, which is based on an adaptive threshold and chroma chart that shows probability of skin colors. The experiments with variety of images show that the method proposed here is efficient and useful.
Conference Paper
Describes a real-time face-tracking algorithm. We start with single face tracking based on statistical color modeling and a deformable template. We then expand the algorithm to track multiple faces, possibly with occlusion, by constraining the speed and size changes of the faces. We test the algorithm on sequences with different occlusion patterns and analyze the tracking performance. We also present a tracking software library based on this algorithm. This library can be applied to human-computer interfaces, lip-reading and virtual environments