Conference Paper · PDF available

On importance of nose for face tracking

Abstract

The human nose, despite being in many cases the only facial feature clearly visible during head motion, seems to be much undervalued in face-tracking technology. This paper shows, theoretically and through experiments conducted with ordinary USB cameras, that by properly defining the nose as an extremum of the 3D curvature of the nose surface, the nose becomes the most robust facial feature: it can be seen in almost any position of the head and can be tracked very precisely, even with low-resolution cameras.
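The key operation, locating the nose tip as a curvature extremum, can be sketched compactly. This is a minimal illustration assuming a depth map of the face region is available (for example from stereo); the function name, the choice of Gaussian curvature, and taking its maximum are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def nose_tip_from_depth(Z):
    """Hypothetical sketch: locate the nose tip as an extremum of
    surface curvature. Z is a 2D depth map of the face region
    (larger values = closer to the camera)."""
    Zy, Zx = np.gradient(Z)        # first derivatives (rows = y)
    Zxy, Zxx = np.gradient(Zx)     # second derivatives of Zx
    Zyy, Zyx = np.gradient(Zy)     # second derivatives of Zy

    # Gaussian curvature of the graph surface z = Z(x, y)
    K = (Zxx * Zyy - Zxy**2) / (1.0 + Zx**2 + Zy**2)**2

    # Take the strongest curvature extremum as the nose tip
    return np.unravel_index(np.argmax(K), K.shape)
```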
... A blink detection algorithm for human-computer interaction has also been proposed by Morris [21], in which the initialization step requires motion detection to locate the eyelids. There still have not been many blink-detection systems designed to work with inexpensive USB webcams [10,11]. Some other feature detection systems use more expensive and less portable alternatives, such as digital and infrared cameras for video input [2,19,24,35]. ...
Article
Full-text available
This paper presents a system that can understand and react appropriately to human facial expressions for nonverbal communication. The notable capabilities of this system are detection of human emotions, eye blinking, head nodding, and head shaking. The key step in the system is to correctly recognize a human face with acceptable labels. The system uses the recently developed OpenCV Haar feature-based cascade classifier for face detection because it can detect faces at any angle. Emotion recognition is divided into several phases: segmentation of facial regions, extraction of facial features, and classification of features into emotions. The first phase identifies facial regions from real-time video. The second phase identifies features which can be used as classifiers to recognize facial expressions. Finally, an artificial neural network is used to classify the identified features into five basic emotions. The system can also detect eye blinking accurately; it works in active scenes where the eye moves freely and the head and the camera move independently in all directions. Finally, the system can identify natural head nodding and shaking in real time using optical flow motion tracking and can find the direction of the head during movement for nonverbal communication.
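The detection front end named here is OpenCV's standard Haar cascade interface. A minimal, self-contained sketch of that step alone (the window and neighbor parameters are illustrative; this is not the paper's full emotion pipeline):

```python
import cv2

# Load OpenCV's pretrained frontal-face Haar cascade
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture(0)            # default webcam
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1,
                                     minNeighbors=5, minSize=(60, 60))
    for (x, y, w, h) in faces:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("faces", frame)
    if cv2.waitKey(1) & 0xFF == 27:  # Esc quits
        break
cap.release()
cv2.destroyAllWindows()
```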
... With the NP system, tracking is initiated by blink detection followed by tracking of scale-insensitive structure in the vicinity of the eyes and estimation of eye positions. The nose tip is located as the brightest point in a region of interest determined by the eyes [6]. The eyes and nose determine a region of interest for segmentation of the shadow area in the open mouth (MSROI). ...
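A hedged sketch of the brightest-point rule described in [6]: given the two eye positions, scan a region between and below the eyes for the brightest pixel. The ROI geometry (one eye-distance deep) is an assumed placeholder, not the NP system's exact window:

```python
import numpy as np

def locate_nose_tip(gray, left_eye, right_eye):
    """Hypothetical brightest-point nose localization.

    gray: 2D grayscale image; left_eye/right_eye: (x, y) pixel
    positions with left_eye to the left of right_eye."""
    (lx, ly), (rx, ry) = left_eye, right_eye
    eye_dist = rx - lx
    x0, x1 = lx, rx                  # horizontally between the eyes
    y0 = max(ly, ry)                 # from eye level...
    y1 = y0 + eye_dist               # ...down roughly one eye-distance
    roi = gray[y0:y1, x0:x1]
    dy, dx = np.unravel_index(np.argmax(roi), roi.shape)
    return (x0 + dx, y0 + dy)        # brightest pixel = nose tip
```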
Preprint
Full-text available
Considerable effort has been devoted to the automatic extraction of information about action of the face from image sequences. Within the context of human-computer interaction (HCI) we may distinguish systems that allow expression from those which aim at recognition. Most of the work in facial action processing has been directed at automatically recognizing affect from facial actions. By contrast, facial gesture interfaces, which respond to deliberate facial actions, have received comparatively little attention. This paper reviews several projects on vision-based interfaces that rely on facial action for intentional HCI. Applications to several domains are introduced, including text entry, artistic and musical expression and assistive technology for motor-impaired users.
... This difference is then compared with a threshold level to decide whether the object has been detected. Different facial landmarks, such as the frontal face, eyes, mouth, and nose, are used [7]. The thresholds are predefined for each landmark using different feature boxes and features. ...
Article
Human-computer interaction has garnered a lot of attention from researchers. In order to have an effective communication channel, the system should be able to deduce a person's emotions from their facial expressions. This paper proposes an automatic emotion recognition system using live video input based on facial features. Cascaded classifiers based on Haar features have been used to extract the facial features, which have then been classified using a Naïve Bayes classifier. A Raspberry Pi has been used to create a hardware interface for decision making once the emotion has been recognized.
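A minimal sketch of the classification stage, using scikit-learn's Gaussian Naïve Bayes as a stand-in; the feature dimensionality and the random training data are placeholders, since the abstract does not spell out the exact feature encoding:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Toy stand-ins: rows are feature vectors extracted from face
# images (e.g., normalized Haar responses); labels are emotions.
rng = np.random.default_rng(0)
X_train = rng.random((100, 16))
y_train = rng.integers(0, 5, size=100)   # 5 emotion classes

clf = GaussianNB().fit(X_train, y_train)
print(clf.predict(rng.random((1, 16))))  # predicted emotion class
```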
... Recently it was shown that the robustness of tracking based on individual characteristics of the face can be significantly improved if, instead of features such as the edges and corners of the eyes, mouth, and nostrils, features based on the curvature of the nose are used [8]. This creates a new range of interesting possibilities for tracking facial features. ...
Chapter
In this paper, a system for artist face recognition and movie prediction is proposed. It comprises two phases. First, the faces in the video are recognized using an l1-minimization CNN + HOG framework, and some keyframes are selected based on a robust measure of confidence. Then the labels are propagated from the keyframes to the remaining frames using transductive learning, integrating constraints in both the feature and temporal spaces simultaneously. The output of the algorithm is tested on the Indian Movie Face Database and generates all the movies of those actors/actresses in the film.
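The propagation step can be illustrated with a toy nearest-keyframe scheme; cosine similarity over per-frame descriptors is an assumed stand-in for the paper's transductive learning with joint feature and temporal constraints:

```python
import numpy as np

def propagate_labels(features, key_idx, key_labels):
    """Hypothetical nearest-keyframe label propagation.

    features: (n_frames, d) per-frame face descriptors;
    key_idx/key_labels: indices and identity labels of keyframes.
    Each frame takes the label of its most similar keyframe."""
    F = features / np.linalg.norm(features, axis=1, keepdims=True)
    K = F[key_idx]                 # keyframe descriptors
    sim = F @ K.T                  # cosine similarity to keyframes
    return np.asarray(key_labels)[np.argmax(sim, axis=1)]
```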
Chapter
Interaction methods based on computer-vision hold the potential to become the next powerful technology to support breakthroughs in the field of human-computer interaction. Non-invasive vision-based techniques permit unconventional interaction methods to be considered, including use of movements of the face and head for intentional gestural control of computer systems. Facial gesture interfaces open new possibilities for assistive input technologies. This chapter gives an overview of research aimed at developing vision-based head and face-tracking interfaces. This work has important implications for future assistive input devices. To illustrate this concretely the authors describe work from their own research in which they developed two vision-based facial feature tracking algorithms for human computer interaction and assistive input. Evaluation forms a critical component of this research and the authors provide examples of new quantitative evaluation tasks as well as the use of model real-world applications for the qualitative evaluation of new interaction styles.
Article
Recently, various types of camera mouse have been developed using image processing. The camera mouse shows limited performance compared to the traditional optical mouse in terms of response time and usability. These problems are caused by the mismatch between the size of the monitor and that of the active pixel area of the CMOS image sensor. To overcome these limitations, we designed a new input device that uses face recognition and speech recognition simultaneously. In the proposed system, the area of the monitor is partitioned into `n` zones. Face recognition is performed using the web camera, so that the mouse pointer follows the movement of the user's face within a particular zone, and the user can switch zones by speaking the name of the zone. The multimodal mouse is analyzed using the Keystroke-Level Model, and initial experiments were performed to evaluate the feasibility and performance of the proposed system.
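A hedged sketch of the zone-partitioned pointer logic: a recognized zone name anchors the cursor at that zone's center, and tracked face motion moves it within the zone. All parameter names, the 3x3 grid, and the gain are illustrative assumptions:

```python
def pointer_position(face_dx, face_dy, zone, n_cols=3, n_rows=3,
                     screen_w=1920, screen_h=1080, gain=4.0):
    """Map a spoken zone index plus tracked face displacement
    (face_dx, face_dy, in pixels of face motion) to a pointer
    position clamped inside the chosen zone."""
    zw, zh = screen_w / n_cols, screen_h / n_rows
    cx = (zone % n_cols + 0.5) * zw      # zone center x
    cy = (zone // n_cols + 0.5) * zh     # zone center y
    x = min(max(cx + gain * face_dx, cx - zw / 2), cx + zw / 2)
    y = min(max(cy + gain * face_dy, cy - zh / 2), cy + zh / 2)
    return x, y

print(pointer_position(10, -5, zone=4))  # center zone of a 3x3 grid
```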
Article
To improve single-handed operation of mobile devices, rear touch panel operation to control commands and facial feature detection to control cursor position are proposed. Operational control is achieved through finger chord gestures on a rear touch panel, and nose movement is used to control cursor movement. Zooming is achieved by detecting the apparent distance between the left and right eyes in conjunction with a finger chord gesture. We have evaluated movement time, error rates, and the throughputs of these techniques in comparison with the conventional single-handed front touch panel thumb operations using Fitts's law. Experiments have been conducted to evaluate two operation modes, selection and zooming, in the form of reciprocal 1-D pointing tasks with 12 participants. For the target selection task, the proposed technique achieved 12% (0.26 s) shorter movement time and 4.7% smaller error rate than the conventional method on average. Especially, for long distance targets, the performance of the conventional method became remarkably inferior due to the limit of reach of the thumb, whereas the proposed technique achieved much less deterioration and obtained expected performance because the cursor could reach anywhere on the display. For the target size adjustment task, the proposed technique achieved 9% (0.22 s) shorter movement time than the conventional method and obtained a comparable error rate of less than 4%. Consequently, we could demonstrate the techniques that make single-handed select and zoom operations available anywhere on a large-sized tablet device with no blockage of the display.
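The evaluation metric used above is standard. A small helper computing Fitts's-law throughput in its common Shannon formulation (the formula is standard; the sample numbers are made up):

```python
import math

def fitts_throughput(distance, width, movement_time):
    """ID = log2(D/W + 1) bits; throughput = ID / MT (bits/s)."""
    index_of_difficulty = math.log2(distance / width + 1)
    return index_of_difficulty / movement_time

# e.g., a 480-px-distant, 40-px-wide target reached in 0.9 s
print(fitts_throughput(480, 40, 0.9))   # ~4.1 bits/s
```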
Article
Full-text available
Sensitivity to variations in illumination is a fundamental and challenging problem in face recognition. In this paper, we describe a new method based on symmetric shape-from-shading (SFS) to develop a face recognition system that is robust to changes in illumination. The basic idea of this approach is to use the symmetric SFS algorithm as a tool to obtain a prototype image which is illumination-normalized. Applying traditional SFS algorithms to real images of complex objects (in terms of their shape and albedo variations) such as faces is very challenging. It is shown that the symmetric SFS algorithm has a unique point-wise solution. In practice, given a single real face image with complex shape and varying albedo, even the symmetric SFS algorithm cannot guarantee the recovery of accurate and complete shape information. For the particular problem of face recognition, we utilize the fact that all faces share a similar shape making the direct computation of the prototype image from a giv...
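The underlying setup can be written compactly. Under the Lambertian model with a distant light source that SFS methods of this kind typically assume, and with the approximate bilateral symmetry of facial shape and albedo that symmetric SFS exploits (a sketch of the assumptions, not the authors' derivation):

```latex
I(x,y) = \rho(x,y)\,\max\bigl(0,\ \mathbf{n}(x,y)\cdot\mathbf{s}\bigr),
\qquad \rho(-x,y)=\rho(x,y), \qquad z(-x,y)=z(x,y)
```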
Article
Full-text available
With the invention of fast USB interfaces and the recent increase of computer power and decrease of camera cost, it has become very common to see a camera on top of a computer monitor. Vision-based games and interfaces, however, are still not common, despite the realization of the benefits vision could bring: hands-free control, multiple-user interaction, etc. The reason for this lies in the inability to track human faces in video both precisely and robustly. This paper describes a face tracking technique based on tracking a convex-shape nose feature which resolves this problem. The technique has been successfully applied to interactive computer games and perceptual user interfaces. These results
Article
Full-text available
Traditionally, image intensities have been processed to segment an image into regions or to find edge-fragments. Image intensities carry a great deal more information about three-dimensional shape, however. To exploit this information, it is necessary to understand how images are formed and what determines the observed intensity in the image. The gradient space, popularized by Huffman and Mackworth in a slightly different context, is a helpful tool in the development of new methods.
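In gradient-space terms, the surface orientation at each point is summarized by the gradient pair, and image formation is captured by the image irradiance equation with a reflectance map R (standard notation following Horn):

```latex
p = \frac{\partial z}{\partial x},\qquad q = \frac{\partial z}{\partial y},\qquad
I(x,y) = R\bigl(p(x,y),\,q(x,y)\bigr)
```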
Article
An adaptive logic network (ALN) is a multilayer perceptron that accepts vectors of real (or floating point) values as inputs and produces a logic 0 or 1 as output. The ALN has a number of linear threshold units (perceptrons) acting on the network inputs, and their (Boolean) outputs feed into a tree of logic gates of types AND and OR. An ALN represents a real-valued function of real variables by giving a logic 1 response to points on and under the graph of the function, and a logic 0 otherwise. It cannot compute a real-valued function directly, but it can provide information about how to perform that computation in a separate decision-tree-based program. If a function is invertible, then the same ALN can be used to derive a second decision tree to compute an inverse. Another way to look at function synthesis is that linear functions are combined by a tree expression of MAXIMUM and MINIMUM operations. In this way, ALNs can approximate any continuous function defined on a compact set to any degree of precision. The logic tree structure can control qualitative properties of learned functions, for example convexity. Constraints can be imposed on monotonicities and partial derivatives. ALNs can be used for prediction, data analysis, pattern recognition and control applications. They may be particularly useful for extremely large systems, where lazy evaluation allows large parts of a computation to be omitted. A second, earlier type of ALN is also discussed where the inputs are fixed thresholds on variables and the nodes adapt by changing their logical functions.
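The max-min view of function synthesis described above is easy to demonstrate. A toy evaluator in which each group of linear pieces is combined by MIN and the groups by MAX (the data layout and names are illustrative, not an actual ALN implementation):

```python
import numpy as np

def aln_eval(x, mins_of_planes):
    """Evaluate a MAX of MINs of linear pieces. `mins_of_planes`
    is a list of groups; each group is a list of (w, b) pairs
    defining linear functions w . x + b."""
    groups = [min(np.dot(w, x) + b for (w, b) in group)
              for group in mins_of_planes]
    return max(groups)

# |x| expressed as max(x, -x), i.e., two one-plane groups
f = [[(np.array([1.0]), 0.0)], [(np.array([-1.0]), 0.0)]]
print(aln_eval(np.array([-2.5]), f))   # 2.5
```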
Article
Detection and tracking of facial features without using any head-mounted devices may become required in various future visual communication applications, such as teleconferencing and virtual reality. In this paper we propose an automatic method of face feature detection using a method called edge pixel counting. Instead of utilizing color or gray-scale information of the facial image, the proposed edge pixel counting method utilizes edge information to estimate face feature positions such as the eyes, nose, and mouth in the first frame of a moving facial image sequence, using a variable-size face feature template. For the remaining frames, feature tracking is carried out alternately using a method called deformable template matching and edge pixel counting. One main advantage of using edge pixel counting in feature tracking is that it does not require a high inter-frame correlation around the feature areas, as is required in template matching. Some experimental results are shown to demonstrate the effectiveness of the proposed method.
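A hedged sketch of the scoring idea: count edge pixels inside a candidate feature box. The use of a Sobel magnitude and the threshold value are illustrative choices, not the paper's exact edge operator:

```python
import cv2
import numpy as np

def edge_pixel_count(gray, box, thresh=100):
    """Count edge pixels inside a candidate feature box.
    gray: grayscale frame; box: (x, y, w, h)."""
    x, y, w, h = box
    roi = gray[y:y + h, x:x + w].astype(np.float32)
    gx = cv2.Sobel(roi, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(roi, cv2.CV_32F, 0, 1)
    mag = cv2.magnitude(gx, gy)
    return int(np.count_nonzero(mag > thresh))
```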
Conference Paper
As a first step towards a perceptual user interface, a computer vision color tracking algorithm is developed and applied to tracking human faces. Computer vision algorithms that are intended to form part of a perceptual user interface must be fast and efficient. They must be able to track in real time yet not absorb a major share of computational resources: other tasks must be able to run while the visual interface is being used. The new algorithm developed here is based on a robust non-parametric technique for climbing density gradients to find the mode (peak) of probability distributions, called the mean shift algorithm. In our case, we want to find the mode of a color distribution within a video scene. Therefore, the mean shift algorithm is modified to deal with dynamically changing color probability distributions derived from video frame sequences. The modified algorithm is called the Continuously Adaptive Mean Shift (CAMSHIFT) algorithm. CAMSHIFT's tracking accuracy is compared against a Polhemus tracker, and its tolerance to noise and distractors and its performance are studied. CAMSHIFT is then used as a computer interface for controlling commercial computer games and for exploring immersive 3D graphic worlds.
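OpenCV ships an implementation of this algorithm as cv2.CamShift. A minimal face-tracking loop built on it (the initial window is hard-coded here for brevity; in practice a detector would supply it):

```python
import cv2
import numpy as np

cap = cv2.VideoCapture(0)
ok, frame = cap.read()
track_window = (200, 150, 100, 100)   # initial (x, y, w, h) over the face

# Hue histogram of the initial region drives the color probability map
x, y, w, h = track_window
hsv_roi = cv2.cvtColor(frame[y:y + h, x:x + w], cv2.COLOR_BGR2HSV)
hist = cv2.calcHist([hsv_roi], [0], None, [180], [0, 180])
cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)

crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    backproj = cv2.calcBackProject([hsv], [0], hist, [0, 180], 1)
    rot_box, track_window = cv2.CamShift(backproj, track_window, crit)
    pts = np.int32(cv2.boxPoints(rot_box))
    cv2.polylines(frame, [pts], True, (0, 255, 0), 2)
    cv2.imshow("camshift", frame)
    if cv2.waitKey(1) & 0xFF == 27:   # Esc quits
        break
cap.release()
cv2.destroyAllWindows()
```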
Article
At the heart of every model-based visual tracker lies a pose estimation routine. Recent work has emphasized the use of least-squares techniques which employ all the available data to estimate the pose. Such techniques are, however, susceptible to the sort of spurious measurements produced by visual feature detectors, often resulting in an unrecoverable tracking failure. This paper investigates an alternative approach, where a minimal subset of the data provides the pose estimate, and a robust regression scheme selects the best subset. Bayesian inference in the regression stage combines measurements taken in one frame with predictions from previous frames, eliminating the need to further filter the pose estimates. The resulting tracker performs very well on the difficult task of tracking a human face, even when the face is partially occluded. Since the tracker is tolerant of noisy, computationally cheap feature detectors, frame-rate operation is comfortably achieved on standard hardware.
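The minimal-subset idea generalizes RANSAC-style estimation. A toy version on a 2D line model rather than head pose (two points are the minimal subset for a line; the tolerance and iteration count are illustrative):

```python
import numpy as np

def ransac_line(points, n_iter=200, tol=1.0, seed=0):
    """points: (n, 2) array. A minimal subset (two points)
    proposes a line ax + by + c = 0; the proposal with the most
    inliers wins, so spurious measurements are outvoted."""
    rng = np.random.default_rng(seed)
    best_inliers, best_model = 0, None
    for _ in range(n_iter):
        (x1, y1), (x2, y2) = points[rng.choice(len(points), 2,
                                               replace=False)]
        a, b, c = y2 - y1, x1 - x2, x2 * y1 - x1 * y2
        norm = np.hypot(a, b)
        if norm < 1e-9:
            continue                       # degenerate subset
        d = np.abs(a * points[:, 0] + b * points[:, 1] + c) / norm
        inliers = int(np.count_nonzero(d < tol))
        if inliers > best_inliers:
            best_inliers, best_model = inliers, (a, b, c)
    return best_model, best_inliers
```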
Conference Paper
We have developed an artificial neural network based gaze tracking system which can be customized to individual users. A three-layer feedforward network, trained with standard error backpropagation, is used to determine the position of a user's gaze from the appearance of the user's eye. Unlike other gaze trackers, which normally require the user to wear cumbersome headgear or to use a chin rest to ensure head immobility, our system is entirely non-intrusive. Currently, the best intrusive gaze tracking systems are accurate to approximately 0.75 degrees. In our experiments, we have been able to achieve an accuracy of 1.5 degrees, while allowing head mobility. In its current implementation, our system works at 15 Hz. In this paper we present an empirical analysis of the performance of a large number of artificial neural network architectures for this task. Suggestions for further explorations of neurally based gaze trackers are presented and related to other similar artificial neural network applications, such as autonomous road following.
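A forward pass of such a three-layer feedforward network is tiny. A sketch with made-up layer sizes (the paper's actual input resolution and hidden-unit counts are not reproduced here), omitting the backpropagation training loop:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: 15x40 eye images flattened to 600 inputs,
# 32 hidden units, 2 outputs (gaze x, y). Sizes are illustrative.
W1 = rng.normal(0, 0.1, (600, 32))
W2 = rng.normal(0, 0.1, (32, 2))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gaze(eye_image):
    """One forward pass: input -> hidden (sigmoid) -> output."""
    h = sigmoid(eye_image.reshape(-1) @ W1)
    return h @ W2                  # (gaze_x, gaze_y)

print(gaze(rng.random((15, 40))))
```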
Conference Paper
This paper provides an introduction to the field of reasoning with uncertainty in Artificial Intelligence (AI), with an emphasis on reasoning with numeric uncertainty. The considered formalisms are Probability Theory and some of its generalizations, the Certainty Factor Model, Dempster-Shafer Theory, and Probabilistic Networks.
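For reference, the Bayesian formalism that most of the listed generalizations build on updates belief in a hypothesis H given evidence E by Bayes' rule:

```latex
P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}
```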