Figure 1
The head pose rotation angles: yaw is the rotation around the Y-axis, pitch around the X-axis, and roll around the Z-axis.

Source publication
Article
Full-text available
Head pose estimation is a crucial initial task for human face analysis, which is employed in several computer vision systems, such as facial expression recognition, head gesture recognition, yawn detection, etc. In this work, we propose a frame-based approach to estimate the head pose on top of the Viola and Jones (VJ) Haar-like face detector. Sev...

Contexts in source publication

Context 1
... face is usually modeled as a rigid object whose pose has three DOF, characterized by three rotation angles: pitch, roll, and yaw. With a human head facing the camera, yaw is the angle of turning the head left and right (rotation around the Y-axis); pitch is that of moving the head up and down (rotation around the X-axis); and roll is the tilt angle (rotation around the Z-axis), as shown in Figure 1. ...
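The three angles can be read as a factored rotation. As a minimal illustration (not code from the paper), the sketch below composes yaw, pitch, and roll into a single head rotation matrix with NumPy, assuming the axis conventions of Figure 1; the composition order R = Rz @ Ry @ Rx is one common choice, not necessarily the authors'.

    # Hedged sketch: compose head pose angles (degrees) into a rotation
    # matrix, using the Figure 1 conventions (yaw about Y, pitch about X,
    # roll about Z). The composition order R = Rz @ Ry @ Rx is an assumption.
    import numpy as np

    def head_rotation(yaw_deg, pitch_deg, roll_deg):
        y, p, r = np.radians([yaw_deg, pitch_deg, roll_deg])
        Ry = np.array([[ np.cos(y), 0.0, np.sin(y)],
                       [ 0.0,       1.0, 0.0      ],
                       [-np.sin(y), 0.0, np.cos(y)]])  # yaw: rotation around Y
        Rx = np.array([[1.0, 0.0,        0.0       ],
                       [0.0, np.cos(p), -np.sin(p)],
                       [0.0, np.sin(p),  np.cos(p)]])  # pitch: rotation around X
        Rz = np.array([[np.cos(r), -np.sin(r), 0.0],
                       [np.sin(r),  np.cos(r), 0.0],
                       [0.0,        0.0,       1.0]])  # roll: rotation around Z
        return Rz @ Ry @ Rx

    print(head_rotation(30.0, 10.0, -5.0))  # e.g. head turned 30 degrees to one side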
Context 2
... order to avoid such detections, we whitened out the background with the help of the depth data, as shown in Figure 9b. Consequently, the detection rates drawn in Figure 8a,c are updated and depicted in Figure 10. It is clearly shown that the detection rate using the frontal model decreases for faces in profile poses (faces with higher yaw angles). ...
Context 3
... cover a wider range of head poses in the database, we exploit both frontal and profile models for face detection, where the profile model is applied when the frontal model fails to return a true positive detection. Figure 11a depicts the detection rate using both frontal and profile models. The use of the profile model clearly yields higher detection rates for faces at significant yaw angles, reaching more than 95% in most grids. ...
Context 4
... the use of the profile model yields higher detection rates for faces at significant yaw angles, reaching more than 95% in most grids. Faces with extreme pitch angles are still hard to detect, as shown in Figure 12e. In this experiment, a white background and leave-two-out cross-validation are employed. ...
Context 5
... concatenation of HOG + HOG_d + HPC + MCDP provides the most accurate estimates, where the average errors do not exceed 5.1°, 4.6°, and 4.2° for pitch, yaw, and roll, respectively. The mean error of the three estimated angles, resulting from the use of the HOG + HOG_d + HPC + MCDP concatenation, is depicted in Figure 11b. Interestingly, the estimation is as accurate for high yaw angles as for low ones. ...
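For orientation, a rough sketch of this feature-concatenation setup is given below. It uses scikit-image HOG on a gray patch and a depth patch (standing in for HOG and HOG_d) feeding one support vector regressor per angle; the paper's HPC and MCDP depth descriptors are omitted, and all data and parameter values here are invented placeholders, not the authors' code.

    import numpy as np
    from skimage.feature import hog
    from sklearn.svm import SVR

    def pose_features(gray_patch, depth_patch):
        # HOG on the intensity patch and on the depth patch, concatenated.
        f_g = hog(gray_patch, orientations=9, pixels_per_cell=(8, 8),
                  cells_per_block=(2, 2))
        f_d = hog(depth_patch, orientations=9, pixels_per_cell=(8, 8),
                  cells_per_block=(2, 2))
        return np.concatenate([f_g, f_d])

    # Synthetic stand-in data: 20 random 64x64 "face" patches with yaw labels.
    rng = np.random.default_rng(0)
    X = np.stack([pose_features(rng.random((64, 64)), rng.random((64, 64)))
                  for _ in range(20)])
    yaw = rng.uniform(-75, 75, size=20)

    yaw_svr = SVR(kernel="rbf").fit(X, yaw)   # one regressor per angle
    print(yaw_svr.predict(X[:3]))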
Context 6
... the other hand, the estimation error increases as the pitch angle grows, which in most cases is due to false cropping. These outperforming results can be attributed to several factors: (1) employing the most distinctive features (HOG, HOG_d, HPC + MCDP) for the head pose estimation; (2) employing the profile model of the face detector, which guarantees high accuracy in the profile cases, as shown in Figure 12b,c; (3) the consistent face cropping using the proposed two-step search with the VJ face detector; (4) the generalizing capability of the exploited SVM regressors; and (5) the estimation of the parameters (of the feature extractors and regressors) using grid search with cross-validation on the training set (a minimal sketch of this tuning step follows below). Figure 12 shows samples of our cross-validation evaluation on the Biwi database, where Figure 12a,b are samples of the frontal model detection, while Figure 12c,d are of the profile model. The face in Figure 12e cannot be detected by either model. ...
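Point (5) refers to standard hyperparameter tuning. A minimal sketch of grid search with cross-validation over SVR parameters might look like the following; the grid values and synthetic data are placeholders, not the paper's settings.

    import numpy as np
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVR

    rng = np.random.default_rng(0)
    X = rng.random((100, 32))          # placeholder feature vectors
    y = rng.uniform(-75, 75, 100)      # placeholder yaw labels

    grid = GridSearchCV(
        SVR(kernel="rbf"),
        param_grid={"C": [1, 10, 100], "gamma": [1e-3, 1e-2, 1e-1],
                    "epsilon": [0.1, 1.0]},
        cv=5)                          # cross-validation on the training set
    grid.fit(X, y)
    print(grid.best_params_)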
Context 7
... Figure 12. Samples of head pose estimations taken from the Biwi database, where a concatenation of the HOG + HOG_d + HPC + MCDP feature types is employed. ...

Similar publications

Conference Paper
Full-text available
Head pose estimation is not only a crucial pre-processing task in applications such as facial expression and face recognition, but also the core task for many others, e.g., gaze estimation, driver focus of attention, and head gesture recognition. In real scenarios, the fine location and scale of a processed face patch should be consistently and automatically obta...

Citations

... Ren et al. applied the Kinect sensor for part-based hand gesture recognition, using a novel distance metric known as the Finger-Earth Mover's Distance to assess the dissimilarity of hand shapes [58]. Saeed et al. proposed a head pose estimation method based on Kinect and a new depth-based feature descriptor that provided competitive estimation accuracy while requiring less computation time [59]. In [60], Southwell and Fang proposed a human object recognition approach using Kinect and a depth information mask matching model. ...
Article
Full-text available
Low-cost and highly efficient 3D face reconstruction is a significant technique for developing virtual interaction applications. The Kinect, a typical low-cost 3D data collection device, has been widely used in many such applications. In this work, we propose a highly efficient 3D face point cloud optimization method for Kinect-based face reconstruction. Based on the different characteristics of the Kinect point cloud, diverse optimization strategies were carefully applied to improve the reconstruction quality. Our extensive experiments clearly show that our approach outperforms traditional approaches in terms of accuracy.
... The head rotates around its vertical axis, turning left and right, which creates a rotation angle between (-75, 75) degrees. This type of movement allows rotation from zero to 150 degrees from right to left [11], as shown in Figure 1. ...
... The head tilts forward and back, forming a forward tilt angle of (-60, 60) degrees; that is, it allows a forward tilt with a movement ranging from zero to 120 degrees from bottom to top [11], as shown in Figure 1. ...
... The head tilts to the right and left sides, forming a lateral tilt angle between (-40, 40) degrees; that is, it allows a lateral tilt ranging from 0 to 80 degrees from right to left [11], as shown in Figure 1. ...
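Taken together, the three ranges give a quick plausibility filter for estimated poses. A hedged sketch follows; the limits are those quoted from [11] above, not universal anatomical constants.

    RANGES = {"yaw": (-75, 75), "pitch": (-60, 60), "roll": (-40, 40)}  # degrees, per [11]

    def plausible_pose(yaw, pitch, roll):
        # True if every angle lies inside the motion range above.
        angles = {"yaw": yaw, "pitch": pitch, "roll": roll}
        return all(lo <= angles[name] <= hi for name, (lo, hi) in RANGES.items())

    print(plausible_pose(40.0, -20.0, 10.0))   # True
    print(plausible_pose(90.0, 0.0, 0.0))      # False: yaw beyond +/-75 degrees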
Article
Full-text available
The detection and tracking of head movements has been an active area of research in recent years. It contributes substantially to computer vision and underpins many of its applications. Several face detection methods and algorithms have been proposed because they are required in most modern applications, in which they act as the cornerstone of many interactive projects. Estimating the head direction angles is very useful in many fields, such as assistance for disabled people, criminal behavior tracking, and other medical applications. In this paper, a new method is proposed to estimate the head direction angles based on the Dlib face detection algorithm, which predicts 68 landmarks on the human face. The calculations are mainly based on the predicted landmarks to estimate three types of angles: yaw, pitch, and roll. A Python program has been designed to perform face detection and determine the face direction. To ensure accurate estimation, particular landmarks were selected that are not affected by the movement of the head, so the calculated angles are approximately accurate. The experimental results showed high accuracy for all three angles according to real and predicted measures. The sample standard deviations between the real and calculated angles were yaw (0.0046), pitch (0.0077), and roll (0.0021), which confirm the accuracy of the proposed method compared with other studies. Moreover, the method performs fast, which promotes accurate online tracking.
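The landmark-to-angles step is not spelled out in the abstract. One widely used recipe for obtaining yaw/pitch/roll from 68 Dlib landmarks is a PnP fit against a generic 3D face model; the sketch below follows that common recipe, where the generic model coordinates, the approximate camera matrix, the input file, and the landmark indices are assumptions, not necessarily this paper's choices.

    import cv2
    import dlib
    import numpy as np

    detector = dlib.get_frontal_face_detector()
    predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

    # Generic 3D model points (mm): nose tip, chin, eye corners, mouth corners.
    MODEL_3D = np.array([(0.0, 0.0, 0.0), (0.0, -330.0, -65.0),
                         (-225.0, 170.0, -135.0), (225.0, 170.0, -135.0),
                         (-150.0, -150.0, -125.0), (150.0, -150.0, -125.0)])
    LANDMARK_IDS = [30, 8, 36, 45, 48, 54]   # matching Dlib landmark indices

    img = cv2.imread("face.jpg")             # assumed input image with one face
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    face = detector(gray)[0]
    shape = predictor(gray, face)
    pts_2d = np.array([(shape.part(i).x, shape.part(i).y)
                       for i in LANDMARK_IDS], dtype=np.float64)

    h, w = img.shape[:2]
    cam = np.array([[w, 0, w / 2], [0, w, h / 2], [0, 0, 1]], dtype=np.float64)
    ok, rvec, tvec = cv2.solvePnP(MODEL_3D, pts_2d, cam, None)
    R, _ = cv2.Rodrigues(rvec)
    # Decompose R into Euler angles (degrees); the convention is an assumption.
    angles, *_ = cv2.RQDecomp3x3(R)
    print("pitch, yaw, roll:", angles)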
... In practice, it may not be sufficient to rely entirely on the performance of just one machine learning model. Ensemble learning [19], [20] offers a systematic solution that combines the predictive power of multiple learners by learning a weighted combination of base models [21], [22]. The resultant model gives the aggregated output of several models. ...
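As a concrete illustration of such a weighted combination, a generic scikit-learn sketch is shown below; the base models, weights, and synthetic data are placeholders, unrelated to the cited system's actual ensemble.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier, VotingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=300, random_state=0)

    ensemble = VotingClassifier(
        estimators=[("lr", LogisticRegression(max_iter=1000)),
                    ("rf", RandomForestClassifier(n_estimators=100)),
                    ("svc", SVC(probability=True))],
        voting="soft",          # average predicted probabilities...
        weights=[1, 2, 1])      # ...with a weight per base model
    ensemble.fit(X, y)
    print(ensemble.score(X, y))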
Article
Full-text available
Automatically synthesizing real-time behaviors for a six-legged walking robot poses several exciting challenges, which can be categorized into mechanical design, control software, and the combination of both. Due to the complexity of control and automation, numerous studies focus on a specific aspect of the whole challenge, either by making valid, low-power assumptions about the mechanical parts or by implementing software solutions on top of sensing capabilities and cameras. Therefore, a complete solution combining mechanical moving parts, hardware components, and software that encourages generalization should be adequately addressed. The architecture proposed in this article orchestrates (i) interlocutor face detection and recognition utilizing ensemble learning and convolutional neural networks, (ii) maneuverable automation of a six-legged robot via hexapod locomotion, and (iii) deployment on a Raspberry Pi, which has not been previously reported in the literature. The authors go one step further by enabling real-time operation. We believe that our contributions span multiple research disciplines, ranging from IoT and computer vision to machine learning and robot autonomy.
... However, only the yaw angle is considered for head orientation. Saeed et al. [5] proposed a frame-based technique to classify head poses based on Viola-Jones Haar-like features. An SVM is used to classify the head poses, which requires considerable training time on large data sets. ...
... In order to classify the head pose from a single image, a deep CNN-based system was developed that used a multi-task framework for training [9]. Some approaches estimated the head pose based on depth data [10,11], and a few used both RGB and depth data [5]. Expected head pose values and point-based geometric analysis were developed for head pose tracking [7]. ...
Chapter
Full-text available
Recently, the classification of the head pose has gained increased attention due to the rapid development of HCI/HRI interfaces. Determining the head pose plays a considerable part in interpreting a person's focus of attention in human-robot or human-human communication, since it provides explicit information about his/her attentional target. This paper proposes a geometrical feature-based human head pose classification using deep convolutional networks. An MTCNN framework is implemented to detect the human face, and a ResNet50 layered architecture is built to classify nine head poses. The system is trained with 285,000 and tested with 115,500 head pose images. The proposed system achieved 90.00% precision for the nine head pose classes.
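The classification head of such a setup is straightforward. A hedged PyTorch sketch follows, replacing the final ResNet50 layer with nine outputs; the 224x224 crop size and random input are assumptions standing in for an MTCNN face crop.

    import torch
    import torch.nn as nn
    from torchvision import models

    model = models.resnet50(weights=None)              # backbone
    model.fc = nn.Linear(model.fc.in_features, 9)      # nine head-pose classes

    face_crop = torch.randn(1, 3, 224, 224)            # stand-in for an MTCNN face crop
    logits = model(face_crop)
    print("predicted pose class:", logits.argmax(dim=1).item())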
... Comparisons on the BIWI Dataset. The proposed method is also compared with state-of-the-art methods, including CNN-syn [30], DNN [60], regression [61], Two-Stage [62], KEPLER [25], QuatNet [63], Dlib [51], FAN [48], 3DDFA [37], QT_PYR [35], 4C_4S_var4 [36], Haar-Like(LBP) [64] and HAFA [65] on the BIWI dataset. We divided these 14 methods into two groups. ...
Article
Full-text available
Mainstream methods treat head pose estimation as a supervised classification/regression problem, whose performance heavily depends on the accuracy of ground-truth labels of training data. However, it is rather difficult to obtain accurate head pose labels in practice, due to the lack of effective equipment and reasonable approaches for head pose labeling. In this paper, we propose a method which does not need to be trained with head pose labels, but matches the keypoints between a reconstructed 3D face model and the 2D input image, for head pose estimation. The proposed head pose estimation method consists of two components: 3D face reconstruction and 3D-2D keypoint matching. At the 3D face reconstruction phase, a personalized 3D face model is reconstructed from the input head image using convolutional neural networks, which are jointly optimized by an asymmetric Euclidean loss and a keypoint loss. At the 3D-2D keypoint matching phase, an iterative optimization algorithm is proposed to match the keypoints between the reconstructed 3D face model and the 2D input image efficiently under the constraint of perspective transformation. The proposed method is extensively evaluated on five widely used head pose estimation datasets, including Pointing'04, BIWI, AFLW2000, Multi-PIE, and Pandora. The experimental results demonstrate that the proposed method achieves excellent cross-dataset performance and surpasses most of the existing state-of-the-art approaches, with average MAEs of 4.78° on Pointing'04, 6.83° on BIWI, 7.05° on AFLW2000, 5.47° on Multi-PIE, and 5.06° on Pandora, although the model of the proposed method is not trained on any of these five datasets.
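For reference, the MAE figures quoted above average per-angle absolute errors over a dataset. A minimal sketch of that metric follows; the exact averaging convention used in the paper is assumed.

    import numpy as np

    def pose_mae(pred, gt):
        # pred, gt: (N, 3) arrays of (yaw, pitch, roll) in degrees.
        return np.abs(np.asarray(pred) - np.asarray(gt)).mean(axis=0)

    print(pose_mae([[10.0, 5.0, 1.0]], [[12.0, 3.0, 0.0]]))  # -> [2. 2. 1.]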
... The recent success of RGB-D cameras such as the Kinect sensor suggests broad prospects for computer applications based on three-dimensional data [32]. In recent years, with the development of low-cost sensors such as the Microsoft Kinect and Intel RealSense, RGB-D images have been widely used for face databases [33-35], hand gestures [36-40], head pose [41-43], and skeleton information [44-46]. Compared with an RGB image, an RGB-D image provides additional information about the object's three-dimensional geometric structure and more effective information for object localization and 3D measurement. ...
Article
Full-text available
Three-dimensional (3D) shape information is valuable for fruit quality evaluation. Grading fruit is one of the important postharvest tasks that fruit-processing agro-industries perform. Although the internal quality of the fruit is important, its external quality significantly influences consumers and the market price. To solve the problem of feature size extraction in 3D fruit scanning, this paper proposes an automatic fruit measurement scheme based on a 2.5-dimensional point cloud from a Kinect depth camera. To obtain a complete fruit model, not only is the surface point cloud acquired, but the bottom point cloud is also rotated into the same coordinate system, and the whole fruit model is assembled using the iterative closest point (ICP) algorithm. According to the centroid and principal direction of the fruit, cut planes are taken along the x-, y-, and z-axes to obtain the fruit's contour lines. The experiment is divided into two groups: the first group uses pears of various sizes to obtain the morphological parameters; the second uses many fruits of various colors, shapes, and textures. Comparing the predicted values with the actual values shows that the automatic extraction scheme for size information is effective, and that the methods are universal and provide a reference for the development of related applications.
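The ICP alignment step can be sketched with Open3D; the synthetic clouds, the identity initial guess, and the distance threshold below are placeholders, not the paper's settings.

    import numpy as np
    import open3d as o3d

    rng = np.random.default_rng(0)
    pts = rng.random((500, 3))                       # stand-in "surface" cloud

    source = o3d.geometry.PointCloud()
    source.points = o3d.utility.Vector3dVector(pts)
    target = o3d.geometry.PointCloud()
    target.points = o3d.utility.Vector3dVector(pts + 0.01)  # slightly shifted copy

    # Refine the alignment of the rotated bottom cloud against the surface cloud.
    reg = o3d.pipelines.registration.registration_icp(
        source, target, max_correspondence_distance=0.05, init=np.eye(4),
        estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint())
    source.transform(reg.transformation)             # merged model in one frame
    print(reg.fitness, reg.inlier_rmse)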
... For gaze estimation or head pose estimation using RGB-D cameras [26-28], RGB and depth images can also be used in different processing stages. Usually, the depth image is used for foreground segmentation, head localization, and object tracking, while the RGB image is used for eye localization and feature extraction. ...
Article
Full-text available
The driver gaze zone is an indicator of a driver's attention and plays an important role in monitoring the driver's activity. Due to poor initialization of the point-cloud transformation, gaze zone systems using RGB-D cameras and the ICP (Iterative Closest Point) algorithm do not work well under long-term head motion. In this work, a solution for continuous driver gaze zone estimation in real-world driving situations is proposed, combining multi-zone ICP-based head pose tracking and appearance-based gaze estimation. To initialize and update the coarse transformation of ICP, a particle filter with auxiliary sampling is employed for head state tracking, which accelerates the iterative convergence of ICP. Multiple templates for different gaze zones are applied to balance the template revision of ICP under large head movements. For the RGB information, an appearance-based gaze estimation method with two-stage neighbor selection is utilized, which treats gaze prediction as the combination of a neighbor query (in head pose and eye image feature space) and linear regression (between eye image feature space and gaze angle space). The experimental results show that the proposed method outperforms the baseline methods on gaze estimation and can provide stable head pose tracking for driver behavior analysis in real-world driving scenarios.
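The two-stage "neighbor query then linear regression" idea can be sketched generically as below; the synthetic features are placeholders, and the paper's head pose pre-filtering stage is omitted.

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.neighbors import NearestNeighbors

    rng = np.random.default_rng(0)
    feats = rng.random((500, 16))     # eye-image feature vectors (synthetic)
    gaze = rng.random((500, 2))       # matching gaze angles (yaw, pitch)

    def predict_gaze(query, k=20):
        # Stage 1: neighbor query in feature space.
        nn = NearestNeighbors(n_neighbors=k).fit(feats)
        _, idx = nn.kneighbors(query.reshape(1, -1))
        # Stage 2: linear regression fitted only on those neighbors.
        local = LinearRegression().fit(feats[idx[0]], gaze[idx[0]])
        return local.predict(query.reshape(1, -1))[0]

    print(predict_gaze(rng.random(16)))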
... The eye and face detection module was programmed with a modified feature-based Haar cascade face detection classifier algorithm [13,22,26,28,31,32]. The modified Haar cascade face detection classifier detects edge, line, and center-surround features of an intruding object. ...
... After the initial face detection is complete, an intelligent second-stage algorithm, designed to eliminate false positives while capturing half-hidden human faces, operates on the brightest features of the face [26,27,33-38]. For each positive sample, a dynamic threshold (ranging from 0.5 to 0.7) was set to capture the brightest part of the intruder's partially or fully covered face. ...
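The dynamic-threshold step can be sketched as keeping pixels above a fraction of the maximum intensity; the 0.5-0.7 fraction comes from the text above, while the exact rule is an assumption.

    import cv2
    import numpy as np

    def brightest_region_mask(gray, frac=0.6):
        # Keep pixels brighter than frac * max intensity; frac would be tuned
        # per sample within the 0.5-0.7 range mentioned above.
        _, mask = cv2.threshold(gray, frac * float(gray.max()), 255,
                                cv2.THRESH_BINARY)
        return mask

    gray = np.clip(np.random.default_rng(0).normal(120, 40, (64, 64)),
                   0, 255).astype(np.uint8)
    print(brightest_region_mask(gray).mean())   # fraction of "bright" pixels * 255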
Article
Full-text available
The proposed research methodology aims to design a generally implementable framework for providing a house owner/member with immediate notification of an ongoing theft (unauthorized access to their premises). For this purpose, a rigorous analysis of existing systems was undertaken to identify research gaps. The problems found with existing systems were that they can only identify the intruder after the theft, or cannot distinguish between human and non-human objects. Wireless Sensor Networks (WSNs), combined with the use of the Internet of Things (IoT) and the Cognitive Internet of Things, are expanding smart home concepts, solutions, and applications. The present research proposes a novel smart home anti-theft system that can detect an intruder even if they have partially or fully hidden their face using clothing, leather, fiber, or plastic materials. The proposed system can also detect an intruder in the dark using a CCTV camera without night vision capability. The fundamental idea was to design a cost-effective and efficient system for an individual to be able to detect any kind of theft in real time and provide instant notification to the house owner. The system also promises to implement home security with large-scale video data handling in real time. The investigation results validate the success of the proposed system. The system accuracy was enhanced from 85%, 64.13%, 56.70%, and 44.01% to 97.01%, 84.13%, 78.19%, and 66.5% in scenarios where the detected intruder had not hidden his/her face, had partially hidden it, had fully hidden it, and was detected in the dark, respectively.
... Considering the real-time, fault-tolerance, and scalability requirements, we utilize a cascade classifier with Haar [34] and LBP [35] features. The AdaBoost method [36] trains the same (weak) classifier on different training sets and then combines the classifiers obtained on these sets to form a stronger final (strong) classifier. ...
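A minimal OpenCV sketch of cascade-based face detection with the stock Haar model follows; the input file and detector parameters are placeholders.

    import cv2

    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    img = cv2.imread("frame.jpg")                    # assumed input frame
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5,
                                     minSize=(30, 30))
    for (x, y, w, h) in faces:                       # draw one box per detection
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)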
Article
Full-text available
In this paper, we propose a parking slot marking detection method based on the geometric features of parking slots. The proposed system mainly consists of two steps, namely, separating-line detection and parking slot entrance detection. First, in the separating-line detection stage, we propose a line-clustering method based on the line segment detection (LSD) algorithm. Our detection and line-clustering algorithm can detect the separating lines, which consist of a pair of parallel lines at a fixed distance, in a bird's-eye-view (BEV) image under diverse lighting and ground conditions. Parking slot candidates are then generated by pairing the separating lines according to the width of the parking slots. In the parking slot entrance detection process, we propose a multiview fusion-based learning approach that increases the number of training samples by performing perspective transformations on the acquired BEV images. The proposed method was evaluated using 353 BEV images covering diverse parking slot markings. Experiments show that the proposed method can recognize typical perpendicular and parallel rectangular parking slots, achieving a precision of 97.4% and a recall of 96.6%.
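The separating-line idea (pairs of nearly parallel segments at a fixed spacing) can be sketched with OpenCV's contrib FastLineDetector standing in for LSD (requires the opencv-contrib package); the input file and the angle and distance tolerances are invented placeholders.

    import cv2
    import numpy as np

    bev = cv2.imread("bev.png", cv2.IMREAD_GRAYSCALE)   # assumed bird's-eye-view image
    fld = cv2.ximgproc.createFastLineDetector()          # LSD-like detector
    segments = fld.detect(bev).reshape(-1, 4)            # rows: x1, y1, x2, y2

    def angle(seg):
        x1, y1, x2, y2 = seg
        return np.arctan2(y2 - y1, x2 - x1)

    # Pair nearly parallel segments whose spacing matches a marking width.
    pairs = []
    for i in range(len(segments)):
        for j in range(i + 1, len(segments)):
            if abs(angle(segments[i]) - angle(segments[j])) < np.radians(3):
                gap = np.linalg.norm(segments[i][:2] - segments[j][:2])
                if 5 < gap < 25:                         # placeholder width (px)
                    pairs.append((i, j))
    print(len(pairs), "candidate separating lines")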