Conference Paper

Detection of Distraction-related Actions on DMD: An Image and a Video-based Approach Comparison


Abstract

The recently presented Driver Monitoring Dataset (DMD) extends research lines for Driver Monitoring Systems. We explore this dataset and apply commonly used action recognition methods to this specific context, from image-based to video-based analysis. Specifically, we aim to detect driver distraction by applying action recognition techniques to classify a list of distraction-related activities. This is now possible thanks to the DMD, which offers recordings of distracted drivers in video format. We review a comparison between different state-of-the-art models for image and video classification, and discuss the feasibility of implementing image-based or video-based models in a real-context driver monitoring system. Preliminary results are presented in this article as a point of reference for future work on the DMD.
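One practical difference between the two approaches is how temporal context is handled. As a minimal illustration (not taken from the paper itself), an image-based model can be run once per frame and its per-frame predictions stabilized with a sliding majority vote, which injects some temporal context without a full video model. The class labels and window size below are hypothetical:

```python
from collections import Counter, deque

def smooth_predictions(frame_labels, window=15):
    """Stabilize per-frame class predictions with a sliding majority vote.

    frame_labels: sequence of class labels, one per video frame.
    window: number of recent frames considered for each output label.
    """
    buf = deque(maxlen=window)
    smoothed = []
    for label in frame_labels:
        buf.append(label)
        # Emit the most common label within the current window.
        smoothed.append(Counter(buf).most_common(1)[0][0])
    return smoothed

# A brief "texting" misdetection inside a "safe_drive" run is filtered out.
raw = ["safe_drive"] * 10 + ["texting"] * 2 + ["safe_drive"] * 10
print(smooth_predictions(raw, window=7)[-1])  # safe_drive
```

A video-based model (e.g. CNN + LSTM over clips) learns this temporal aggregation instead of hard-coding it, at the cost of heavier inference.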


... Experiment II compares the test accuracy to the previous state-of-the-art accuracy achieved with a 3D CNN from [4]. [4] reach 97.2% accuracy while the vision transformer experiment reaches 97.5% accuracy ( Figure 4A). The test loss descends consistently compared to training loss, indicating the transformer did not overfit. ...
... Training accuracy of the Video Swin Transformer on DMD compared to the previous state-of-the-art [4] with a CNN-based architecture ...
Preprint
Full-text available
A 20% rise in car crashes was observed in 2021 compared to 2020 as a result of increased distraction and drowsiness. Drowsy and distracted driving cause 45% of all car crashes. As a means to decrease drowsy and distracted driving, detection methods using computer vision can be designed to be low-cost, accurate, and minimally invasive. This work investigated the use of the vision transformer to outperform the state-of-the-art accuracy of 3D-CNNs. Two separate transformers were trained for drowsiness and distractedness. The drowsy video transformer model was trained on the National Tsing-Hua University Drowsy Driving Dataset (NTHU-DDD) with a Video Swin Transformer model for 10 epochs on two classes -- drowsy and non-drowsy -- simulated over 10.5 hours. The distracted video transformer was trained on the Driver Monitoring Dataset (DMD) with a Video Swin Transformer for 50 epochs over 9 distraction-related classes. The drowsiness model reached only 44% accuracy and a high loss value on the test set, indicating overfitting and poor model performance; the overfitting suggests that the training data was limited and that the applied model architecture lacked sufficient learnable parameters. The distracted model outperformed state-of-the-art models on the DMD, reaching 97.5%, indicating that with sufficient data and a strong architecture, transformers are suitable for unfit-driving detection. Future research should use newer and stronger models such as TokenLearner to achieve higher accuracy and efficiency, merge existing datasets to expand to detecting drunk driving and road rage and thereby create a comprehensive solution to prevent traffic crashes, and deploy a functioning prototype to advance the automotive safety industry.
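The overfitting diagnosis described above rests on comparing the training and validation loss curves. A simple sketch of that check, with the rule and the loss values purely illustrative (the abstract reports the qualitative trend, not these numbers):

```python
def shows_overfitting(train_losses, val_losses, patience=3):
    """Flag overfitting when validation loss rises for `patience`
    consecutive epochs while training loss keeps falling."""
    rising = 0
    for i in range(1, len(val_losses)):
        if val_losses[i] > val_losses[i - 1] and train_losses[i] < train_losses[i - 1]:
            rising += 1
            if rising >= patience:
                return True
        else:
            rising = 0
    return False

# Distracted-driving run: both losses descend together, no overfit flag.
train_ok = [1.2, 0.8, 0.5, 0.3, 0.2]
val_ok   = [1.3, 0.9, 0.6, 0.4, 0.3]
print(shows_overfitting(train_ok, val_ok))  # False

# Drowsiness run: training loss falls while validation loss climbs.
train_bad = [1.2, 0.8, 0.5, 0.3, 0.2]
val_bad   = [1.3, 1.4, 1.6, 1.9, 2.3]
print(shows_overfitting(train_bad, val_bad))  # True
```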
... For the temporal annotations of the DMD, a tool was developed to make annotation tasks easier and faster to complete (TaTo, Temporal Annotation Tool, version V1 [60]); it is Python-based and allows the visualization of frame-by-frame annotations by colours on a timeline, which makes this process more intuitive. The output metadata was structured using OpenLABEL annotation format [61]. ...
... The first approach to distraction detection was made in [62], where the DMD material was annotated with distraction-related labels that led to the definition of a training dataset called dBehaviourMD. A further iteration in the annotation process resulted in the creation of labels intended for distraction detection that were used in the research of [60]. Recently, thanks to the metadata methodology and sensor configuration defined in this work, we have built a more descriptive and detailed annotation criterion that can support distraction detection through action recognition algorithms. ...
Article
Full-text available
Tremendous advances in advanced driver assistance systems (ADAS) have been possible thanks to the emergence of deep neural networks (DNN) and Big Data (BD) technologies. Huge volumes of data can be managed and consumed as training material to create DNN models which feed functions such as lane keeping systems (LKS), automated emergency braking (AEB), lane change assistance (LCA), etc. In the ADAS/AD domain, these advances are only possible thanks to the creation and publication of large and complex datasets, which can be used by the scientific community to benchmark and leverage research and development activities. In particular, multi-modal datasets have the potential to feed DNN that fuse information from different sensors or input modalities, producing optimised models that exploit modality redundancy, correlation, complementariness and association. Creating such datasets pose a scientific and engineering challenge. The BD dimensions to cover are volume (large datasets), variety (wide range of scenarios and context), veracity (data labels are verified), visualization (data can be interpreted) and value (data is useful). In this paper, we explore the requirements and technical approach to build a multi-sensor, multi-modal dataset for video-based applications in the ADAS/AD domain. The Driver Monitoring Dataset (DMD) was created and partially released to foster research and development on driver monitoring systems (DMS), as it is a particular sub-case which receives less attention than exterior perception. Details on the preparation, construction, post-processing, labelling and publication of the dataset are presented in this paper, along with the announcement of a subsequent release of DMD material publicly available for the community.
... Various multiview multimodal methods have been proposed with different emphases. Some propose novel learning methods (e.g., supervised contrastive learning [15]), while others [1,4,[21][22][23] are focused on handling the temporal dimension. However, how to combine heterogenous data in DMS has rarely been studied. ...
Preprint
Driver Monitoring Systems (DMSs) are crucial for safe hand-over actions in Level-2+ self-driving vehicles. State-of-the-art DMSs leverage multiple sensors mounted at different locations to monitor the driver and the vehicle's interior scene and employ decision-level fusion to integrate these heterogeneous data. However, this fusion method may not fully utilize the complementarity of different data sources and may overlook their relative importance. To address these limitations, we propose a novel multiview multimodal driver monitoring system based on feature-level fusion through multi-head self-attention (MHSA). We demonstrate its effectiveness by comparing it against four alternative fusion strategies (Sum, Conv, SE, and AFF). We also present a novel GPU-friendly supervised contrastive learning framework, SuMoCo, to learn better representations. Furthermore, we refined the test split of the DAD dataset with finer-grained labels to enable the multi-class recognition of drivers' activities. Experiments on this enhanced database demonstrate that 1) the proposed MHSA-based fusion method (AUC-ROC: 97.0\%) outperforms all baselines and previous approaches, and 2) training MHSA with patch masking can improve its robustness against modality/view collapses. The code and annotations are publicly available.
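The core of feature-level fusion through MHSA is treating each view's or modality's feature vector as a token and letting attention weight them jointly. A minimal numpy sketch of that mechanism, with random weights purely for illustration (in a real DMS the projections would be learned end-to-end, and the pooling choice is an assumption here):

```python
import numpy as np

def mhsa_fuse(tokens, num_heads=4, seed=0):
    """Fuse per-view/per-modality feature vectors with one layer of
    multi-head self-attention, then mean-pool into a single embedding.

    tokens: (n_views, dim) array, one feature vector per sensor view.
    """
    n, d = tokens.shape
    assert d % num_heads == 0
    hd = d // num_heads
    rng = np.random.default_rng(seed)
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

    # Project, then split the feature dimension across heads: (h, n, hd).
    q = (tokens @ Wq).reshape(n, num_heads, hd).transpose(1, 0, 2)
    k = (tokens @ Wk).reshape(n, num_heads, hd).transpose(1, 0, 2)
    v = (tokens @ Wv).reshape(n, num_heads, hd).transpose(1, 0, 2)

    scores = q @ k.transpose(0, 2, 1) / np.sqrt(hd)   # (h, n, n)
    attn = np.exp(scores - scores.max(-1, keepdims=True))
    attn /= attn.sum(-1, keepdims=True)               # row-wise softmax

    fused = (attn @ v).transpose(1, 0, 2).reshape(n, d)  # re-merge heads
    return fused.mean(axis=0)                            # (dim,)

# Three views (e.g. face, body, and hands cameras), 64-dim features each.
views = np.random.default_rng(1).standard_normal((3, 64))
print(mhsa_fuse(views).shape)  # (64,)
```

Because the attention weights depend on the inputs, an uninformative view receives low weight at inference time, which is the advantage over fixed decision-level fusion.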
... Canas et al. [6] used a CNN and Long Short-Term Memory (LSTM) for the detection of driver behaviour. This method suffers from high computational complexity. ...
Article
Full-text available
With the current market's growing need for electric vehicles and technologies in high-end vehicles, distracted driver detection demands attention from artificial intelligence. In this paper, new strategies for improving the performance of driver detection are proposed. The proposed approach consists of two sub-systems: driver activity detection and driver fatigue detection. The former detects the activities of the driver; the latter is based on facial feature recognition and determines the driver's fatigue level. The proposed model was evaluated on activity detection and attained a classification accuracy of 99.69%, compared to 94.32% for the state-of-the-art comparison. The KNN classifier had the best accuracy for detecting driver fatigue, with a 76.33% success rate. Experimental results reveal the superiority of the proposed model over existing models, and the model can be applied in a real-life environment.
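The fatigue sub-system's KNN classifier is a standard technique; a minimal sketch of majority-vote KNN over facial features follows. The two toy features (eye-closure ratio, yawn frequency) and the data points are hypothetical, not from the paper:

```python
import numpy as np
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Classify a facial-feature vector by majority vote among its
    k nearest training samples (Euclidean distance)."""
    dists = np.linalg.norm(train_X - x, axis=1)
    nearest = np.argsort(dists)[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

# Toy 2-D "facial features": eye-closure ratio and yawn frequency.
X = np.array([[0.1, 0.0], [0.2, 0.1], [0.8, 0.9], [0.9, 0.7]])
y = ["alert", "alert", "fatigued", "fatigued"]
print(knn_predict(X, y, np.array([0.85, 0.8])))  # fatigued
```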
... Therefore, one can classify the distraction actions through video or image classification. The authors in Cañas et al. (2021) tested both approaches and found that image classification, where video frames are turned into an image dataset, performs better than video classification. They achieved an image classification accuracy of 99.5% with MobileNetV1, versus a video classification accuracy of 97.3% with MobileNetV1 + LSTM on 30-frame video clips. ...
Article
Driver distraction is one of the main causes of fatal traffic accidents. Therefore, the ability to detect driver inattention is essential in building a safe yet intelligent transportation system. Currently, the available driver distraction detection systems are either not widely available or limited to specific action classes. Various research efforts have approached the problem through different techniques, including the use of intrusive sensors, which are not feasible for mass production. Most of the work in the early 2010s used traditional machine learning approaches to perform the detection task. With the emergence of deep learning algorithms, much research has been conducted on distraction detection using neural networks. Furthermore, most of the work in the field was conducted under simulation or lab environments and did not validate the proposed systems under naturalistic scenarios. Most importantly, the research efforts in the field can be further subdivided into many subtasks. Thus, this paper aims to provide a comprehensive review of the approaches used to detect driving distractions. We review recent papers from 2014-2021 and categorize them according to the sensors used. Based on the reviewed articles, a simplified framework is proposed to visualize the detection flow, starting from the sensors used, the collected and measured data, the computed events, the inferred behaviour, and finally the inferred distraction type. Besides providing an in-depth review and concise summary of various published works, the practicality and relevancy of driver distraction detection towards increasing vehicle automation are discussed. Further, several open research challenges are identified and suggestions for future research directions are provided. We believe that this review will remain helpful despite the development towards higher levels of vehicle automation.
Chapter
This paper concerns a methodology for a semi-automatic annotation strategy for the gaze estimation material of the Driver Monitoring Dataset (DMD). It consists of a semi-automatic annotation pipeline that uses ideas from Active Learning to annotate data with an accuracy as high as possible while requiring less human intervention. A dummy model (the initial model), improved by iterative training, and other state-of-the-art (SoA) models act as the automatic label-assessment strategy that annotates new material. The newly annotated data are then used to retrain the dummy model, and the loop repeats. The results show a 60% reduction in human annotation work, with the automatically annotated images having a reliability of 99%.
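One iteration of such a semi-automatic loop can be sketched as a confidence-gated split between automatic and manual annotation. The interface and threshold below are assumptions for illustration; the actual DMD pipeline additionally cross-checks labels against other SoA models:

```python
def semi_auto_annotate(unlabeled, model_predict, threshold=0.99):
    """One iteration of the semi-automatic annotation loop: keep
    high-confidence automatic labels, route the rest to a human.

    model_predict: callable returning (label, confidence) per sample.
    """
    auto, needs_human = [], []
    for sample in unlabeled:
        label, conf = model_predict(sample)
        if conf >= threshold:
            auto.append((sample, label))   # accepted automatically
        else:
            needs_human.append(sample)     # sent for manual annotation
    return auto, needs_human

# Dummy model: confident on even-numbered samples only.
pred = lambda s: ("left_mirror", 0.995) if s % 2 == 0 else ("road", 0.60)
auto, manual = semi_auto_annotate(range(10), pred)
print(len(auto), len(manual))  # 5 5
```

Each pass retrains the initial model on `auto` plus the human-corrected samples, so the fraction routed to humans shrinks over iterations.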
Article
Detecting driver inattentive behaviors is crucial for driving safety in a driver monitoring system (DMS). Recent works treat driver distraction detection as a multiclass action recognition problem or a binary anomaly detection problem. The former approach aims to classify a fixed set of action classes. Although specific distraction classes can be predicted, this approach is inflexible in detecting unknown driver anomalies. The latter approach mixes all distraction actions into one class: anomalous driving. Because the objective focuses on finding the difference between safe and distracted driving, this approach generalizes better to unknown driver distractions. However, a detailed classification of the distraction is missing from the predictions, meaning that the downstream DMS can only treat all distractions with the same severity. In this work, we propose a two-phase anomaly proposal and classification framework, the driver anomaly detection and classification network (DADCNet), which is robust to open-set anomalies while maintaining high-level distraction understanding. DADCNet efficiently allocates multimodal and multiview inputs. The anomaly proposal network first utilizes a subset of the available modalities and views to suggest suspicious anomalous driving behavior. Then, the classification network employs more features to verify the anomaly proposal and classify the proposed distraction action. Through extensive experiments on two driver distraction datasets, our approach significantly reduces the total amount of computation during inference while maintaining high anomaly detection sensitivity and robust performance in classifying common driver distractions.
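The compute saving of the two-phase design comes from gating the expensive classifier behind the cheap proposal stage. A schematic sketch (the callables stand in for the two networks and are hypothetical, not the DADCNet API):

```python
def two_phase_dms(clips, propose, classify):
    """Two-phase inference in the spirit of DADCNet: a lightweight
    proposal network screens every clip using a subset of views, and
    the heavier classifier runs only on clips flagged as anomalous.

    propose:  cheap callable, True when a clip looks anomalous.
    classify: expensive callable returning the distraction class.
    """
    results, heavy_calls = [], 0
    for clip in clips:
        if propose(clip):                   # cheap check, few modalities
            heavy_calls += 1
            results.append(classify(clip))  # full multiview verification
        else:
            results.append("safe_driving")
    return results, heavy_calls

# Only 2 of 6 clips trigger the expensive classification stage.
clips = ["safe", "safe", "phone", "safe", "drinking", "safe"]
out, calls = two_phase_dms(clips, lambda c: c != "safe", lambda c: c)
print(calls)  # 2
```

With mostly-safe driving footage, the average per-clip cost approaches that of the proposal network alone, which is the source of the reported inference savings.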