Figure 3 - uploaded by Zhuoni Jie
(a) Eye Regions of Interest (EROI) and Mouth Regions of Interest (MROI) in yawning state; (b) EROI and MROI in normal state.
Contexts in source publication
Context 1
... the face region is detected. The face tracker we use works well under partial occlusion. After landmark detection, we are able to identify the face regions of interest in all the valid frames. Two regions of interest were identified: the Mouth Region of Interest (MROI) and the Eye Region of Interest (EROI), described in Fig. 3. The MROI is used to detect mouth features, which are of primary importance in yawning detection. The EROI is used to detect eye openness and possible wrinkles within the eye area, which are also important features for detecting yawning even when the mouth is covered. Taking these two regions of interest into consideration has the following ...
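The excerpt does not give code for the ROI step. The following is a minimal sketch, assuming dlib-style 68-point landmarks (where, by the common convention, indices 36-47 cover the eyes and 48-67 the mouth; the paper's exact regions and padding may differ, and the function names here are hypothetical):

```python
import numpy as np

# Hypothetical sketch: crop MROI/EROI as padded bounding boxes around
# dlib-style 68 facial landmarks. Index ranges follow the common dlib
# convention; the paper's actual region definitions may differ.
MOUTH_IDX = range(48, 68)
EYE_IDX = range(36, 48)

def roi_bbox(landmarks, idx, pad=0.2):
    """Padded (x0, y0, x1, y1) box around the selected landmark points."""
    pts = landmarks[list(idx)]
    x0, y0 = pts.min(axis=0)
    x1, y1 = pts.max(axis=0)
    px, py = pad * (x1 - x0), pad * (y1 - y0)
    return (x0 - px, y0 - py, x1 + px, y1 + py)

def extract_rois(landmarks):
    """Return both regions of interest for one frame's landmarks."""
    return {"MROI": roi_bbox(landmarks, MOUTH_IDX),
            "EROI": roi_bbox(landmarks, EYE_IDX)}
```

The padding fraction is an assumption; some slack around the landmark hull is typically needed so that wrinkles near the eyes stay inside the crop.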
Context 2
... the features selected proved invariant to different lighting conditions. As shown in Fig. 3, lighting varied according to the simulated scenes in the driving simulator, which indicates that our method remains robust under changing lighting conditions and generalisable to real road scenarios. We did not include a baseline comparison here because, to the best of our knowledge, there is no publicly available dataset of spontaneous, natural yawns for comparison. ...
Similar publications
The state of functioning (posture) of a driver at the wheel of a car involves a complex set of psychological, physiological, and physical parameters. This combination induces fatigue, which manifests itself in repeated yawning, stinging eyes, a frozen gaze, a stiff and painful neck, back pain, and other signs. The driver may fight fatigue for a few...
Traffic jams are one of the serious issues in many developed countries. After the pandemic, many employees were allowed to travel interstate to work. This contributes to more severe jams, especially in the capital and nearby states. Long-distance driving and congestion can easily make the drivers sleepy and thus lead to traffic accidents. This pape...
The use of face video information for driver fatigue detection has received extensive attention because of its low cost and non-invasiveness. However, the current vehicle-mounted embedded device has insufficient memory and limited computing power, which cannot complete the real-time detection of driver fatigue based on deep learning. Therefore, thi...
The significant number of road traffic accidents caused by fatigued drivers presents substantial risks to the public’s overall safety. In recent years, there has been a notable convergence of intelligent cameras and artificial intelligence (AI), leading to significant advancements in identifying driver drowsiness. Advances in computer vision techno...
Citations
... In a system for identifying dangerous driving behaviors, the required data should be collected first, and then feature extraction and classification should be carried out. Data collection types are mainly divided into four parts: questionnaire data [31], vehicle running state data [32], machine vision data [33], and driver physiological characteristics data [34]. Only when the collected data are accurate and comprehensive enough can they provide the necessary support for a dangerous-driving recognition model. ...
... These parameters provide the most immediate and reliable dataset, readily attainable through a variety of sensors. The Controller Area Network (CAN) bus, as referenced in [32], is instrumental in acquiring precise, real-time vehicle operation data. An inertial measurement unit (IMU) [38], which includes an accelerometer, gyroscope, and magnetometer, is employed to measure three-dimensional acceleration, angular velocity, and magnetic field alignment, as well as to assess the impact of longitudinal velocity on vehicle yaw rate. ...
In response to the rising frequency of traffic accidents and growing concerns regarding driving safety, the identification and analysis of dangerous driving behaviors have emerged as critical components in enhancing road safety. In this paper, the research progress in the recognition methods of dangerous driving behavior based on deep learning is analyzed. Firstly, the data collection methods are categorized into four types, evaluating their respective advantages, disadvantages, and applicability. While questionnaire surveys provide limited information, they are straightforward to conduct. The vehicle operation data acquisition method, being a non-contact detection, does not interfere with the driver's activities but is susceptible to environmental factors and individual driving habits, potentially leading to inaccuracies. The machine-vision-based recognition method can monitor driving behavior in real time, though its effectiveness is constrained by lighting conditions. The precision of physiological detection depends on the quality of the equipment. Then, the collected big data are utilized to extract the features related to dangerous driving behavior. The paper mainly classifies the deep learning models employed for dangerous driving behavior recognition into three categories: Deep Belief Network (DBN), Convolutional Neural Network (CNN), and Recurrent Neural Network (RNN). DBN exhibits high flexibility but suffers from relatively slow processing speeds. CNN demonstrates excellent performance in image recognition, yet it may lead to information loss. RNN possesses the capability to process sequential data effectively; however, training these networks is challenging. Finally, this paper concludes with a comprehensive analysis of the application of deep learning-based dangerous driving behavior recognition methods, along with an in-depth exploration of their future development trends.
As computer technology continues to advance, deep learning is progressively replacing fuzzy logic and traditional machine learning approaches as the primary tool for identifying dangerous driving behaviors.
... These incidents not only cause physical and mental distress to those involved but also lead to substantial financial losses due to vehicle repairs and insurance claims. Among the specific behaviors studied, driver drowsiness has been extensively analyzed through facial expression monitoring, and yawning detection has been explored by classifying the states of mouth openness [3,6]. Nevertheless, other atypical behaviors like prolonged stops in yellow box junctions and sudden lane changes remain underexplored. ...
Over recent years, video-surveillance systems have seen extensive adoption, largely driven by security imperatives, with radar-based speed detection being a common feature in traffic monitoring. Despite its prevalence, broader anomaly detection in traffic patterns has not received equivalent focus. This research develops a sophisticated deep learning framework, drawing architectural inspiration from MobileNet, ResNet50, and VGG19, to not only detect and track vehicles but also analyze trajectory data to identify nonstandard behaviors. Specifically, our model detects four distinct anomalies: overspeeding, lingering in no-stopping zones, insufficient spacing between vehicles, and violations of traffic light signals. To support this, we constructed a unique dataset comprising over 60,000 video frames. The YOLOv3 algorithm facilitated initial object recognition, which was complemented by data augmentation techniques to mitigate issues related to class imbalance and the limited availability of annotated datasets in this domain. Our enhanced model achieved an overall accuracy of 95%, with a detailed performance breakdown for each detected anomaly.
... The last phase uses well-known drowsiness measures, such as Percentage of Eye Closure (PERCLOS) and Eye Closure Duration (ECD), to determine the final state of the driver. In [19], the authors introduce a yawning-based method for real-time detection of drowsy drivers. First, the proposed method presents a novel indicator of the drowsiness state based on the presence and analysis of face touches. ...
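The two measures named above are straightforward to compute once each frame carries an eye-open/eye-closed label. A minimal sketch follows, with one simplification: standard PERCLOS counts frames in which the eyelids are at least about 80% closed over a sliding time window, while here closure is reduced to a per-frame boolean:

```python
def perclos(closed):
    """PERCLOS proxy: fraction of frames in which the eyes are closed.
    (Standard PERCLOS uses >=80% lid closure over a time window; here
    closure is simplified to a per-frame boolean.)"""
    return sum(closed) / len(closed)

def eye_closure_durations(closed, fps):
    """ECD: duration in seconds of each maximal run of closed-eye frames."""
    runs, n = [], 0
    for c in closed:
        if c:
            n += 1
        elif n:
            runs.append(n / fps)
            n = 0
    if n:  # run extending to the end of the sequence
        runs.append(n / fps)
    return runs
```

A final drowsiness decision would then threshold these values, e.g. flagging the driver when PERCLOS exceeds some cutoff over a window or when any single closure lasts much longer than a normal blink; the cutoffs are system-specific and not given in the excerpt.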
... Comparison of prior methods (ML/DL assignment inferred from the technique named):

Ref   Year  Technique             Model  Accuracy
[18]  2014  AdaBoost, MAP         ML     90%
[27]  2017  MTCNN                 DL     89.5%
[19]  2018  Mouth and Eye, SVM    ML     95%
[20]  2018  Bayesian, SVM         ML     97.3%
[21]  2018  SVM, DBSCAN           ML     89.13%
[24]  2018  MLP, Viola-Jones      ML     93%
[25]  2018  Google API            DL     93.37%
[26]  2018  DBN                   DL     96.7%
[22]  2019  CNN, SSD, MobileNet   DL     84%
[23]  2019  MCNN, KCF             DL     92%
[12]  2020  3D-CNN                DL     92%
[45]  2022  CNN, Haar Cascade     DL     91%
[46]  2022  CNN                   DL     90%
...
Nowadays, driving accidents are considered one of the most crucial challenges for governments and communities that affect transportation systems and people's lives. Unfortunately, there are many causes behind the accidents; however, drowsiness is one of the main factors that leads to a significant number of injuries and deaths. In order to reduce its effect, researchers and communities have proposed many techniques for detecting drowsiness situations and alerting the driver before an accident occurs. Mostly, the proposed solutions are visually based, where a camera is positioned in front of the driver to detect their facial behavior and then determine their situation, e.g., drowsy or awake. However, most of the proposed solutions make a trade-off between detection accuracy and speed. In this paper, we propose a novel Visual-based Alerting System for Detecting Drowsy Drivers (VAS-3D) that ensures an optimal trade-off between the accuracy and speed metrics. Mainly, VAS-3D consists of two stages: detection and classification. In the detection stage, we use pre-trained Haar cascade models to detect the face and eyes of the driver. Once the driver's eyes are detected, the classification stage uses several pre-trained Convolutional Neural Network (CNN) models to classify the driver's eyes as either open or closed, and consequently their corresponding situation, either awake or drowsy. Subsequently, we tested and compared the performance of several CNN models, such as InceptionV3, MobileNetV2, NASNetMobile, and ResNet50V2. We demonstrated the performance of VAS-3D through simulations on real drowsiness datasets and experiments on real-world scenarios based on real video streaming. The obtained results show that VAS-3D can enhance the accuracy detection of drowsy drivers by at least 7.5% (the best accuracy reached was 95.5%) and the detection speed by up to 57% (average of 0.25 ms per frame) compared to other existing models.
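The abstract above does not specify how per-frame open/closed classifications become an awake/drowsy verdict. A common rule, sketched here purely as an assumption (VAS-3D's actual decision logic is not given), is to flag drowsiness once the eyes stay classified as closed for a minimum number of consecutive frames:

```python
def drowsy_from_eye_probs(p_closed, thresh=0.5, min_consec=15):
    """Hypothetical stage-2 rule: flag 'drowsy' once the per-frame
    closed-eye probability exceeds `thresh` for `min_consec` consecutive
    frames (about 0.5 s at 30 fps). Thresholds are illustrative only."""
    run = 0
    for p in p_closed:
        run = run + 1 if p > thresh else 0
        if run >= min_consec:
            return "drowsy"
    return "awake"
```

Requiring a consecutive run, rather than thresholding single frames, suppresses false alarms from blinks and momentary classifier noise.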
... For comparison, we selected HOG & LBP combined with SVM [7] and the deep neural network-based method Dense-LSTM (Long Short-Term Memory) [17]. Our objective was to determine the performance in handling complex scenes found in FatigueView, e.g., occlusion and micro-shading. ...
... Driver behavior-based parameters are based on recognizing different behavioral cues displayed by a fatigued driver. A common focus is on facial cues such as eye movements, rapid blinking (Junaedi & Akbar, 2018), head nodding or swinging (Ghourabi et al., 2020), or regular yawning (Jie et al., 2018). All of these symptoms indicate that a person is sleep-deprived and/or fatigued. ...
Driving fatigue is the leading cause of traffic accidents in many countries, prompting the development of a number of fatigue detection devices. This paper concisely reviews the existing fatigue detection systems for transportation sectors. A rigorous systematic literature review (SLR) was utilized to find robust and high-potential material related to the research issue. According to the available literature, many fatigue detection devices have been developed and commercialized, categorized into three groups based on the detection target's features: vehicle-based parameters, behaviour-based parameters, and physiological-based parameters. However, currently available driver fatigue detection systems fall into two categories: (i) very expensive systems that are limited to specific high-end automobile models and (ii) affordable alternatives for old and cheap vehicles that are not robust. Despite the great accuracy of physiological parameters in identifying driving fatigue, practically all available fatigue detection devices rely on vehicle and driver behaviour-based parameters. As a result, this study looked into the use of physiological methods in future fatigue detection studies. The study's findings will help researchers, policymakers, and practitioners create a system to significantly reduce road accidents and improve road safety.
... Most work on grounding revolves around visual, textual, or audio modalities. However, less studied modalities in the context of grounding, such as physiological, sensorial, or behavioral, are valuable in diverse applications such as measuring driver alertness (Jie et al., 2018;Riani et al., 2020), detecting depression (Bilalpur et al., 2023) or detecting deceptive behaviors (Abouelenien et al., 2016). These modalities raise interesting questions across the entire pipeline, starting with data collection and representation, all the way to evaluation and deployment. ...
Recent progress in large language models has enabled the deployment of many generative NLP applications. At the same time, it has also led to a misleading public discourse that "it's all been solved." Not surprisingly, this has in turn made many NLP researchers, especially those at the beginning of their career, wonder about what NLP research area they should focus on. This document is a compilation of NLP research directions that are rich for exploration, reflecting the views of a diverse group of PhD students in an academic research lab. While we identify many research areas, many others exist; we do not cover those areas that are currently addressed by LLMs but where LLMs lag behind in performance, or those focused on LLM development. We welcome suggestions for other research directions to include: https://bit.ly/nlp-era-llm
...
Method                     Year  Accuracy
Zhang et al. [40]          2015  92.00%
Omidyeganeh et al. [37]    2016  75.00%
Zhang and Su [39]          2018  88.60%
Jie et al. [41]            2018  94.63%
Kassem et al. [42]         2020  96.20%
Our system in this study   2022  98.00%

Table 5: Comparison of driver fatigue detection for the YawDD dataset. ...
... So, it can be seen from the literature mentioned above that, among all the features, the characteristics of the eyes and the mouth are the most widely used, and the establishment of complex models and the large amount of data to be processed pose new challenges to the computing power of computers. More related references can also be found in [20][21][22][23][24][25][26]. ...
Fatigue driving has always received a lot of attention, but few studies have focused on the fact that human fatigue is a cumulative process over time, and there are no models available to reflect this phenomenon. Furthermore, the problem of incorrect detection due to facial expression is still not well addressed. In this article, a model based on BP neural network and time cumulative effect was proposed to solve these problems. Experimental data were used to carry out this work and validate the proposed method. Firstly, the Adaboost algorithm was applied to detect faces, and the Kalman filter algorithm was used to trace the face movement. Then, a cascade regression tree-based method was used to detect the 68 facial landmarks and an improved method combining key points and image processing was adopted to calculate the eye aspect ratio (EAR). After that, a BP neural network model was developed and trained by selecting three characteristics: the longest period of continuous eye closure, number of yawns, and percentage of eye closure time (PERCLOS), and then the detection results without and with facial expressions were discussed and analyzed. Finally, by introducing the Sigmoid function, a fatigue detection model considering the time accumulation effect was established, and the drivers' fatigue state was identified segment by segment through the recorded video. Compared with the traditional BP neural network model, the detection accuracies of the proposed model without and with facial expressions increased by 3.3% and 8.4%, respectively. The number of incorrect detections in the awake state also decreased obviously. The experimental results show that the proposed model can effectively filter out incorrect detections caused by facial expressions and truly reflect that driver fatigue is a time accumulating process.
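The EAR mentioned in the abstract above has a standard closed form (Soukupová & Čech, 2016): with the six eye landmarks ordered p1..p6 (p1/p4 the horizontal corners, p2/p3 on the upper lid, p6/p5 below them), EAR = (|p2-p6| + |p3-p5|) / (2 |p1-p4|). The abstract says the authors use an improved variant combining key points and image processing; the sketch below is only the standard formula:

```python
import math

def eye_aspect_ratio(p1, p2, p3, p4, p5, p6):
    """Standard EAR = (|p2-p6| + |p3-p5|) / (2 |p1-p4|).
    p1/p4 are the horizontal eye corners; p2, p3 sit on the upper lid
    and p6, p5 directly below them on the lower lid."""
    dist = lambda a, b: math.hypot(a[0] - b[0], a[1] - b[1])
    return (dist(p2, p6) + dist(p3, p5)) / (2.0 * dist(p1, p4))
```

An open eye yields a roughly constant EAR (often around 0.2-0.3, subject-dependent), while the value collapses toward zero as the lids close, which is what makes it usable for the eye-closure features named in the abstract.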
... Al-sudani et al. [21] proposed a yawning-based fatigue prediction method that monitors driver drowsiness levels. ...
Driver drowsiness is a severe problem and a frequent cause of traffic accidents, which are classified among the most dangerous. The National Safety Council reported that drowsy driving causes 9.5% of all crashes (100,000 cases). Therefore, preventing and minimizing driver fatigue is a significant research area. This study aims to design a nonintrusive real-time drowsiness system based on image processing and fuzzy logic techniques. It is an enhanced approach for Viola-Jones to examine different visual signs to detect the driver's drowsiness level. It extracted eye blink duration and mouth features to detect driver drowsiness based on the desired facial feature image in a specific driver video frame. The size and orientation of the captured features were tracked and handled for determining image features such as brightness, shadows, and clearness. Lastly, the fuzzy control system provides different alert sounds based on the tracked information from the face, eyes, and mouth across varied cases, such as race, wearing glasses or not, gender, and various illumination backgrounds. The experiments' results show that the proposed approach achieved a high accuracy of 94.5% in detecting driver status compared with other studies. Also, the fuzzy logic controller efficiently issued the required alert signal for the drowsy driver status, which helps to save the driver's life.
... Method / Advantages / Disadvantages:

- MultiHPOG [49]. Advantages: combines multiple features at multiple scales. Disadvantages: ad hoc features, limited generalization.
- HiKS [22]. Advantages: uses multiple features such as PCA and LDA. Disadvantages: ad hoc features, limited generalization.
- EARSequence [84]. Advantages: uses both EAR and its temporal information. Disadvantages: not robust to head movements such as looking down.
- MeanDistance [19]. Advantages: optimized EAR, more robust to noise, temporal information. Disadvantages: simple classification with limited robustness.
- MotionVector [85]. Advantages: uses optical flow to capture blinking temporally. Disadvantages: limited generalization with empirically defined thresholds.
- HOG&LBP [33]. Advantages: uses both HOG and LBP features from mouth and eyes. Disadvantages: low accuracy, especially confusions with a normally open mouth.
- Dense-LSTM [86]. Advantages: fully uses temporal information with LSTM. Disadvantages: the first stage easily overfits to non-drowsiness frames.
- HeadMotion [87]. Advantages: special tracking using a point between the eyes. Disadvantages: pre-defined rules are not sufficient.
- 2sAGCN [88]. Advantages: robust action recognition with a relatively light model. Disadvantages: depends on training data quantity and quality, especially quantity.
- TSN [89]. Advantages: integrates multiple frames for robust action recognition. Disadvantages: context information from multiple frames is not fully used.
- Eye4features [20]. Advantages: uses blinking frequency and length to detect drowsiness. Disadvantages: facial and body features are ignored, limited robustness.
- Eye16features [17]. Advantages: considers more features from the eyes. Disadvantages: facial and body features are ignored, limited robustness.
- FacialUnits [28]. Advantages: uses six drowsiness-related features and their extensions. Disadvantages: not fine-grained enough to capture subtle visual signs.
- HM-LSTM [18]. Advantages: uses four blinking features to learn an LSTM model. Disadvantages: facial and body features are ignored, limited robustness.
- DDDNet [14]. Advantages: models multiple features from eye, mouth, and face. Disadvantages: temporal information is not fully used.
- VariousNets [61]. Advantages: uses features from both multiple frames and optical flow. Disadvantages: context information from multiple frames is not fully used. ...
Although vision-based drowsiness detection approaches have achieved great success on empirically organized datasets, it remains far from being satisfactory for deployment in practice. One crucial issue lies in the scarcity and lack of datasets that represent the actual challenges in real-world applications, e.g. tremendous variation and aggregation of visual signs, challenges brought on by different camera positions and camera types. To promote research in this field, we introduce a new large-scale dataset, FatigueView, that is collected by both RGB and infrared (IR) cameras from five different positions. It contains real sleepy driving videos and various visual signs of drowsiness from subtle to obvious, e.g. with 17,403 different yawning sets totaling more than 124 million frames, far more than recent actively used datasets. We also provide hierarchical annotations for each video, ranging from spatial face landmarks and visual signs to temporal drowsiness locations and levels to meet different research requirements. We structurally evaluate representative methods to build viable baselines. With FatigueView, we would like to encourage the community to adapt computer vision models to address practical real-world concerns, particularly the challenges posed by this dataset.