Conference Paper

Assessment of Efficient and Cost-Effective Vehicle Detection in Foggy Weather

Authors:

Article
Full-text available
Vision-based vehicle detection in adverse weather conditions such as fog, haze, and mist is a challenging research area in the fields of autonomous vehicles, collision avoidance, and Internet of Things (IoT)-enabled edge/fog computing traffic surveillance and monitoring systems. Efficient and cost-effective vehicle detection at high accuracy and speed in foggy weather is essential for avoiding road traffic collisions in real time. To evaluate vision-based vehicle detection performance in foggy weather conditions, the state-of-the-art Vehicle Detection in Adverse Weather Nature (DAWN) and Foggy Driving (FD) datasets are self-annotated using the YOLO Label tool and customized to four vehicle detection classes: cars, buses, motorcycles, and trucks. The state-of-the-art single-stage deep learning algorithms YOLO-V5 and YOLO-V8 are considered for the vehicle detection task. Furthermore, YOLO-V5s is enhanced by introducing the attention modules Convolutional Block Attention Module (CBAM), Normalization-based Attention Module (NAM), and Simple Attention Module (SimAM) after the SPPF module, and YOLO-V5l is extended with BiFPN. The detection accuracy and running speed of these models are validated on cloud (Google Colab) and edge (local) systems. On the DAWN dataset, the mAP50 scores are 72.60% for YOLO-V5n, 75.20% for YOLO-V5s, 73.40% for YOLO-V5m, and 77.30% for YOLO-V5l; and 60.20% for YOLO-V8n, 73.50% for YOLO-V8s, 73.80% for YOLO-V8m, and 72.60% for YOLO-V8l. On the FD dataset, the mAP50 scores are 43.90% for YOLO-V5n, 40.10% for YOLO-V5s, 49.70% for YOLO-V5m, and 57.30% for YOLO-V5l; and 41.60% for YOLO-V8n, 46.90% for YOLO-V8s, 42.90% for YOLO-V8m, and 44.80% for YOLO-V8l. On the DAWN dataset, the detection speeds are 59 frames per second (FPS) for YOLO-V5n, 47 FPS for YOLO-V5s, 38 FPS for YOLO-V5m, and 30 FPS for YOLO-V5l; and 185 FPS for YOLO-V8n, 109 FPS for YOLO-V8s, 72 FPS for YOLO-V8m, and 63 FPS for YOLO-V8l. On the FD dataset, the detection speeds are 26 FPS for YOLO-V5n, 24 FPS for YOLO-V5s, 22 FPS for YOLO-V5m, and 17 FPS for YOLO-V5l; and 313 FPS for YOLO-V8n, 182 FPS for YOLO-V8s, 99 FPS for YOLO-V8m, and 60 FPS for YOLO-V8l. YOLO-V5s, its attention-enhanced variants, YOLO-V5l_BiFPN, and the YOLO-V8 algorithms are efficient and cost-effective solutions for real-time vision-based vehicle detection in foggy weather.
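The evaluation pipeline summarized above can be reproduced in outline with the ultralytics Python package. The sketch below is an assumption-laden illustration, not the authors' exact setup: the dataset config file dawn.yaml, the sample image name, and the chosen model size are placeholders.

```python
# Minimal sketch: fine-tune and validate a YOLOv8 model on a custom foggy-weather
# dataset and estimate inference speed (FPS). Assumes the `ultralytics` package and
# a hypothetical dataset config `dawn.yaml` listing the four classes
# (car, bus, motorcycle, truck) with train/val image paths.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")                 # pretrained nano model as a starting point

# Fine-tune on the self-annotated DAWN-style data (paths are placeholders).
model.train(data="dawn.yaml", epochs=100, imgsz=640)

# Accuracy: mAP@0.5 over the validation split.
metrics = model.val(data="dawn.yaml")
print(f"mAP50: {metrics.box.map50:.3f}")

# Speed: run inference on a sample image and convert per-image latency to FPS.
result = model("foggy_sample.jpg")[0]
latency_ms = sum(result.speed.values())    # preprocess + inference + postprocess (ms)
print(f"~{1000.0 / latency_ms:.1f} FPS on this hardware")
```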
Article
Full-text available
In computer vision, object detection is a classical and highly challenging problem, and obtaining accurate detection results remains difficult. With the significant advancement of deep learning techniques over the past decades, most researchers have worked on enhancing object detection, segmentation, and classification. Object detection performance is measured by both detection accuracy and inference time; two-stage detectors achieve better detection accuracy than single-stage detectors. In 2015, the real-time object detection system YOLO was published, and it has iterated rapidly, with the newest release, YOLOv8, appearing in January 2023. YOLO achieves high detection accuracy and fast inference with a single-stage detector, and many applications adopt YOLO versions because of their high inference speed. This paper presents a complete survey of YOLO versions up to YOLOv8. The article begins by explaining the performance metrics used in object detection, post-processing methods, dataset availability, and the most commonly used object detection techniques; it then discusses the architectural design of each YOLO version. Finally, the diverse range of YOLO versions is reviewed by highlighting their contributions to various applications.
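As a concrete illustration of the post-processing step such surveys cover, the sketch below implements greedy non-maximum suppression (NMS), the standard procedure that removes duplicate detections. The IoU threshold and the (x1, y1, x2, y2) box format are illustrative assumptions, not details taken from the survey.

```python
# Greedy NMS: keep the highest-scoring box, drop overlapping boxes whose IoU
# exceeds a threshold, then repeat on the remaining boxes.
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all in (x1, y1, x2, y2) form."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter + 1e-9)

def nms(boxes, scores, iou_thr=0.45):
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores)[::-1]       # process boxes from highest score down
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        order = rest[iou(boxes[i], boxes[rest]) <= iou_thr]
    return keep
```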
Article
Full-text available
The development of object detection has led to huge improvements in human interaction systems. Object detection is a challenging task because it involves many factors, including variations in pose, resolution, occlusion, and daytime versus nighttime conditions. This study surveys various aspects of object detection, including (1) the basics of object detection, (2) object detection techniques, (3) datasets, and (4) metrics and deep learning libraries. It presents a systematic analysis of recent publications on object detection, covering around 400 research articles, and synthesizes the findings to provide empirical answers to research questions. The review is based on relevant articles published from 2015 through 2022, together with discussions of challenges and future directions in this field. Furthermore, the survey examines the contributions of various researchers in their respective application domains, while emphasizing the advantages and disadvantages of the research work. Despite the success of the various methods proposed in the literature, there remains room for improvement in object detection accuracy.
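To make the accuracy metric mentioned above concrete, the sketch below computes average precision (AP) from a ranked list of detections. The input format and the all-point interpolation choice are illustrative assumptions rather than details from the survey.

```python
# AP from a ranked detection list: cumulate true/false positives, build the
# precision-recall curve, and integrate precision over recall.
import numpy as np

def average_precision(tp_flags, num_gt):
    """tp_flags: 1/0 per detection, sorted by descending confidence;
    num_gt: number of ground-truth objects of this class."""
    tp_flags = np.asarray(tp_flags, dtype=float)
    tp = np.cumsum(tp_flags)
    fp = np.cumsum(1.0 - tp_flags)
    recall = tp / max(num_gt, 1)
    precision = tp / (tp + fp)
    # Enforce a monotonically non-increasing precision envelope.
    precision = np.maximum.accumulate(precision[::-1])[::-1]
    # Integrate precision over recall (all-point interpolation).
    recall = np.concatenate(([0.0], recall))
    return float(np.sum(np.diff(recall) * precision))

# Example: 4 detections, 3 of them correct, 3 ground-truth objects in total.
print(average_precision([1, 1, 0, 1], num_gt=3))   # ~0.917
```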
Article
Full-text available
This work addresses the problem of semantic foggy scene understanding (SFSU). Although extensive research has been performed on image dehazing and on semantic scene understanding with weather-clear images, little attention has been paid to SFSU. Due to the difficulty of collecting and annotating foggy images, we choose to generate synthetic fog on real images that depict weather-clear outdoor scenes, and then leverage these synthetic data for SFSU by employing state-of-the-art convolutional neural networks (CNN). In particular, a complete pipeline to generate synthetic fog on real, weather-clear images using incomplete depth information is developed. We apply our fog synthesis on the Cityscapes dataset and generate Foggy Cityscapes with 20550 images. SFSU is tackled in two fashions: 1) with typical supervised learning, and 2) with a novel semi-supervised learning, which combines 1) with an unsupervised supervision transfer from weather-clear images to their synthetic foggy counterparts. In addition, this work carefully studies the usefulness of image dehazing for SFSU. For evaluation, we present Foggy Driving, a dataset with 101 real-world images depicting foggy driving scenes, which come with ground truth annotations for semantic segmentation and object detection. Extensive experiments show that 1) supervised learning with our synthetic data significantly improves the performance of state-of-the-art CNN for SFSU on Foggy Driving; 2) our semi-supervised learning strategy further improves performance; and 3) image dehazing marginally benefits SFSU with our learning strategy. The datasets, models and code will be made publicly available to encourage further research in this direction.
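The fog synthesis described above rests on the standard optical model I(x) = R(x) t(x) + L (1 - t(x)) with transmittance t(x) = exp(-beta d(x)). The sketch below illustrates that model on placeholder data; the attenuation coefficient beta and the atmospheric light L are assumed values, and the code is not the authors' released pipeline.

```python
# Hedged sketch of synthetic fog generation on a clear image with per-pixel depth,
# following the atmospheric scattering model used for Foggy Cityscapes-style data.
import numpy as np

def synthesize_fog(clear_rgb, depth_m, beta=0.02, atmospheric_light=0.9):
    """clear_rgb: HxWx3 float image in [0, 1]; depth_m: HxW depth in metres."""
    t = np.exp(-beta * depth_m)[..., None]          # transmittance per pixel
    foggy = clear_rgb * t + atmospheric_light * (1.0 - t)
    return np.clip(foggy, 0.0, 1.0)

# Example with random stand-ins for a weather-clear image and its depth map.
rgb = np.random.rand(256, 512, 3)
depth = np.random.uniform(5.0, 200.0, size=(256, 512))
foggy = synthesize_fog(rgb, depth, beta=0.01)       # smaller beta -> thinner fog
```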
Article
The use of object detection algorithms has become extremely important in autonomous vehicles. Object detection at high accuracy and fast inference speed is essential for safe autonomous driving, so the balance between the effectiveness and efficiency of the object detector must be considered. This paper proposes a one-stage object detection framework, based on YOLOv4, that improves detection accuracy while supporting true real-time operation. The backbone network in the proposed framework is CSPDarknet53_dcn(P), in which the last output layer of CSPDarknet53 is replaced with deformable convolution to improve detection accuracy. For feature fusion, a new module named PAN++ is designed, and five detection layers at different scales are used to improve the detection accuracy of small objects. In addition, the paper proposes an optimized network pruning algorithm to address the limited computing resources of vehicle-mounted platforms, which would otherwise prevent real-time operation; a sparse scaling factor method is used to improve the existing channel pruning algorithm. Compared to YOLOv4, YOLOv4-5D improves the mean average precision by 4.23% on the BDD dataset and 1.68% on the KITTI dataset. Finally, by pruning the model, the inference speed of YOLOv4-5D is increased by 31.3% and the memory footprint is only 98.1 MB, while the detection accuracy is almost unchanged. The proposed algorithm is capable of real-time detection at faster than 66 frames per second (fps) and shows higher accuracy than previous approaches with a similar fps.
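The channel pruning idea based on sparse scaling factors can be illustrated with the widely used batch-norm "slimming" formulation sketched below. This is an analogy to the paper's approach, not its exact algorithm; the layer size, penalty weight, and pruning ratio are assumptions.

```python
# Illustrative sketch of channel pruning driven by sparse batch-norm scaling
# factors: an L1 penalty during training pushes many gamma values toward zero;
# afterwards, channels with the smallest |gamma| are pruned.
import torch
import torch.nn as nn

def l1_penalty_on_bn(model, weight=1e-4):
    """Sparsity regularizer to be added to the training loss."""
    return weight * sum(bn.weight.abs().sum()
                        for bn in model.modules() if isinstance(bn, nn.BatchNorm2d))

def select_channels_to_keep(bn: nn.BatchNorm2d, prune_ratio=0.3):
    """Indices of channels whose |gamma| lies above the pruning threshold."""
    gammas = bn.weight.detach().abs()
    threshold = torch.quantile(gammas, prune_ratio)
    return torch.nonzero(gammas > threshold).flatten()

# Example: keep roughly the strongest 70% of channels of one BN layer.
bn = nn.BatchNorm2d(64)
bn.weight.data.uniform_(0.0, 1.0)     # stand-in for gammas learned under L1 sparsity
keep = select_channels_to_keep(bn, prune_ratio=0.3)
print(f"keeping {keep.numel()} of 64 channels")
```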
Article
We present YOLO, a unified pipeline for object detection. Prior work on object detection repurposes classifiers to perform detection. Instead, we frame object detection as a regression problem to spatially separated bounding boxes and associated class probabilities. A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. Since the whole detection pipeline is a single network, it can be optimized end-to-end directly on detection performance. Our unified architecture is also extremely fast; YOLO processes images in real time at 45 frames per second, hundreds to thousands of times faster than existing detection systems. Our system uses global image context to detect and localize objects, making it less prone to background errors than top detection systems like R-CNN. By itself, YOLO detects objects at unprecedented speeds with moderate accuracy. When combined with state-of-the-art detectors, YOLO boosts performance by 2-3 mAP points.
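The detection-as-regression formulation can be made concrete by decoding the network's output tensor. The sketch below uses the grid sizes from the original paper (S = 7, B = 2, C = 20) with random stand-in predictions; the score threshold is an assumption.

```python
# Decoding a YOLO-v1-style output: an S x S x (B*5 + C) tensor where each grid
# cell predicts B boxes (x, y, w, h, confidence) plus C class probabilities.
import numpy as np

S, B, C = 7, 2, 20
pred = np.random.rand(S, S, B * 5 + C)            # stand-in for the network output

detections = []
for row in range(S):
    for col in range(S):
        cell = pred[row, col]
        class_probs = cell[B * 5:]
        for b in range(B):
            x, y, w, h, conf = cell[b * 5: b * 5 + 5]
            # Box centre is relative to the cell; width/height relative to the image.
            cx, cy = (col + x) / S, (row + y) / S
            score = conf * class_probs.max()      # class-specific confidence
            if score > 0.25:
                detections.append((cx, cy, w, h, int(class_probs.argmax()), score))
```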
Article
A neural network model for a mechanism of visual pattern recognition is proposed in this paper. The network is self-organized by "learning without a teacher" and acquires the ability to recognize stimulus patterns based on the geometrical similarity (Gestalt) of their shapes, without being affected by their positions. This network is given the nickname "neocognitron". After completion of self-organization, the network has a structure similar to the hierarchy model of the visual nervous system proposed by Hubel and Wiesel. The network consists of an input layer (photoreceptor array) followed by a cascade connection of a number of modular structures, each of which is composed of two layers of cells connected in cascade. The first layer of each module consists of "S-cells", which show characteristics similar to simple cells or lower-order hypercomplex cells, and the second layer consists of "C-cells" similar to complex cells or higher-order hypercomplex cells. The afferent synapses to each S-cell have plasticity and are modifiable. The network is capable of unsupervised learning: no "teacher" is needed during the process of self-organization; it is only necessary to present a set of stimulus patterns repeatedly to the input layer of the network. The network has been simulated on a digital computer. After repeated presentation of a set of stimulus patterns, each stimulus pattern comes to elicit an output from only one of the C-cells of the last layer, and conversely, that C-cell becomes selectively responsive only to that stimulus pattern; none of the C-cells of the last layer responds to more than one stimulus pattern. The response of the C-cells of the last layer is not affected at all by the pattern's position, nor is it affected by small changes in the shape or size of the stimulus pattern.
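The S-cell/C-cell cascade maps naturally onto the modern convolution-plus-pooling pattern. The sketch below is an illustrative analogy using PyTorch layers, not Fukushima's original self-organizing learning rule; all layer sizes are arbitrary choices.

```python
# Alternating S-cell-like (convolution: feature extraction) and C-cell-like
# (pooling: tolerance to position shifts) stages, echoing the neocognitron cascade.
import torch
import torch.nn as nn

neocognitron_like = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=5), nn.ReLU(),    # module 1: S-cells
    nn.MaxPool2d(2),                              #           C-cells
    nn.Conv2d(8, 16, kernel_size=5), nn.ReLU(),   # module 2: S-cells
    nn.MaxPool2d(2),                              #           C-cells
)

# A small shift of the input pattern changes the deepest responses only slightly.
x = torch.zeros(1, 1, 28, 28)
x[:, :, 10:18, 10:18] = 1.0                       # a square stimulus pattern
print(neocognitron_like(x).shape)                 # torch.Size([1, 16, 4, 4])
```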
DAWN: Vehicle detection in adverse weather nature dataset
  • M. A. Kenk
  • M. Hassaballah