SMART PARKING WITH PIXEL-WISE ROI SELECTION FOR
VEHICLE DETECTION USING YOLOV8, YOLOV9, YOLOV10,
AND YOLOV11
A PREPRINT
Gustavo P. C. P. da Luz
Institute of Computing
University of Campinas (UNICAMP)
Av. Albert Einstein, 1251,
Campinas, 13083-852, SP, Brazil
ra271582@students.ic.unicamp.br
Gabriel Massuyoshi Sato
Institute of Computing
University of Campinas (UNICAMP)
Av. Albert Einstein, 1251,
Campinas, 13083-852, SP, Brazil
ra172278@students.ic.unicamp.br
Luis Fernando Gomez Gonzalez
Institute of Computing
University of Campinas (UNICAMP)
Av. Albert Einstein, 1251,
Campinas, 13083-852, SP, Brazil
gonzalez@unicamp.br
Juliana Freitag Borin
Institute of Computing
University of Campinas (UNICAMP)
Av. Albert Einstein, 1251,
Campinas, 13083-852, SP, Brazil
juliana@ic.unicamp.br
December 4, 2024
ABSTRACT
The increasing urbanization and the growing number of vehicles in cities have underscored the need
for efficient parking management systems. Traditional smart parking solutions often rely on sensors
or cameras for occupancy detection, each with its limitations. Recent advancements in deep learning
have introduced new YOLO models (YOLOv8, YOLOv9, YOLOv10, and YOLOv11), but these
models have not been extensively evaluated in the context of smart parking systems, particularly
when combined with Region of Interest (ROI) selection for object detection. Existing methods still
rely on fixed polygonal ROI selections or simple pixel-based modifications, which limit flexibility and
precision. This work introduces a novel approach that integrates Internet of Things, Edge Computing,
and Deep Learning concepts, by using the latest YOLO models for vehicle detection. By exploring
both edge and cloud computing, it was found that inference times on edge devices ranged from
1 to 92 seconds, depending on the hardware and model version. Additionally, a new pixel-wise
post-processing ROI selection method is proposed for accurately identifying regions of interest to
count vehicles in parking lot images. The proposed system achieved 99.68% balanced accuracy on a
custom dataset of 3,484 images, offering a cost-effective smart parking solution that ensures precise
vehicle detection while preserving data privacy.
Keywords: Smart Parking · IoT · Edge Computing · YOLO
1 Introduction
With the growing number of vehicles in cities, the need for improved parking management in public areas has become
more pressing. The search for available parking spaces, commonly known as "cruising," contributes to increased
traffic circulation on roads, leading to higher congestion and carbon emissions from vehicles [1]. Additionally, it causes driver dissatisfaction, which can potentially reduce economic competitiveness in areas where finding parking is difficult [2]. Motivated by the concept of smart cities, which aim to optimize resource and energy use and enhance service efficiency [3], there is a strong drive to improve this process.
On average, 31% of the land in large cities is used for parked cars, with some cities, like Los Angeles [4], reaching up to 81%. Additionally, with rapid urban population growth, the UN estimates that around 6 billion people will be living in cities by 2050 [3]. Given these figures, studies related to smart cities, including this one, are essential for improving the efficiency of urban spaces in the future.
Several existing smart parking solutions use sensors in each parking space to detect occupancy. Others employ car
sensors, monitoring cameras, or even drones to capture parking lot images [5]. Despite some successful implementations, many of these solutions have limitations. Scalability is a key factor, given the difficulty of deploying a complex Internet of Things (IoT) infrastructure for data collection and analysis in urban areas. Implementation costs are also a decisive factor in choosing the best solution for an ongoing project [3]. Privacy concerns regarding data and image collection from citizens are additional challenges.
This paper proposes an efficient and scalable smart parking solution by integrating IoT, Edge Computing, and Deep
Learning. It explores the use of recent YOLO models for vehicle detection, addressing the limitations of existing
methods. The paper evaluates different models using a pixel-wise Region of Interest (ROI) selection method, which
improves flexibility and precision in detecting vehicles within parking images. One key scalability metric considered is
the cost per parking space, with the goal of reducing costs while maintaining accurate predictions of the number of
parked cars.
The proposed smart parking system uses cameras to capture images of the parking area at predefined intervals, according
to the business needs. These images are processed by a neural network to determine the number of available parking
spaces, which can be done either on a local edge device or on a remote server in the cloud. The number of vehicles is
then sent to an IoT platform, enabling users to access this data through a website or mobile app before arriving at the
parking facility. A real parking lot is used as a case study. The main contributions of this paper are:
• proposal of a cost-effective and scalable system for vehicle detection, demonstrating high accuracy on a custom dataset using recently released pre-trained models and optimized image processing techniques;
• comparison of Edge Computing and Cloud Computing for image classification using neural networks, focusing on inference time across six different devices. This comparison highlights the efficiency of Edge Computing in scenarios where latency and accuracy are critical;
• introduction of a new, fully customized masking method for ROI selection in images. This method offers greater flexibility and precision compared to traditional approaches by allowing free-form selection rather than just polygons, making it well-suited for complex scenarios;
• to the best of the authors' knowledge, this is the first paper to compare recently released models such as YOLOv9, YOLOv10, and YOLOv11 applied to a parking lot dataset;
• a comparison of inference times for four different YOLO versions across six hardware platforms with varying computational capacities. Additionally, we measured the latency in a standardized GPU environment (NVIDIA A100), which is not found in the literature.
The rest of this paper is organized as follows: Section 2 provides an overview of related works and similar approaches using camera-based systems. Section 3 presents an overview of the system and the processing scenarios. Section 4 discusses the experiments conducted and the metrics used. Section 5 presents the results and discusses model performance, time, resources, and cost analysis. Section 6 concludes the paper and suggests possible directions for future work.
2 Related Work
Currently, there are various methods for detecting cars in parking spaces, with the predominant method being the use of
devices based on infrared, ultrasonic, or magnetic sensors.
Infrared sensors can be divided into two types: i) passive detection, which detects changes in ambient infrared radiation
when a car occupies a space - this type of sensor is also used in security alarms and automatic lighting systems; ii)
active detection, which emits infrared radiation and measures the reflected signal to detect objects. This approach is
also used in obstacle detection applications. Ultrasonic sensors emit sound waves, and detection occurs based on the
time it takes for the wave to return. The use of magnetometers for car detection is based on changes in the magnetic
field when a car approaches the sensor.
While sensor usage proves to be efficient in producing accurate results, it has implementation and scalability limitations.
The need for one or more sensors for each parking space poses a resource usage challenge, especially for large areas,
significantly raising project costs [6, 7]. Additionally, a robust infrastructure is required for each sensor to work correctly, making the solution less scalable for large parking lots. Another limitation is that this solution is often applied in controlled indoor environments (e.g., underground mall parking). In open areas with significant pedestrian traffic, sensors may struggle to distinguish between cars and people, animals, or even cars not parked in a space [8].
Another car detection strategy involves distributing tags and sensors for each vehicle [9]. This method allows radiofrequency receivers to identify the number of cars in the parking lot. However, determining the exact location of vacant spaces is not typically feasible, and the installation process can be complex [10].
Finally, car identification through images is a method for assessing the status of a parking lot. Drones can take aerial photos, but recognizing cars in vertical images poses a significant challenge [11]. Furthermore, issues related to power consumption and flight time add to the complexity of utilizing drones for this purpose [12]. The use of cameras mounted on poles is a promising method for capturing images from suitable angles for car recognition.
Before the advancements in machine learning and deep learning, classification was performed using methods like the
traditional Haar Feature-based Cascade Classifier. This method achieved a detection rate ranging from 89.3% to 95%
on a custom dataset, as reported by Sieck et al. [2020][13]. The study used an NVIDIA Jetson Nano for inference.
Acharya et al. [2018][14] proposed the use of a classifier combining a Convolutional Neural Network (CNN) with a binary Support Vector Machine (SVM), achieving 99.7% accuracy for the PKLot dataset [15] and 96.7% for a custom dataset.
Bura et al. [2018][16] proposed a custom-designed neural network model based on AlexNet to detect parking occupancy from video streamed from top-view cameras in the dataset from the NVIDIA AI City Challenge [17], combined with data from PKLot and CNRPark. The proposed network consists of 1 input layer, 1 convolution layer, 1 Rectified Linear Unit (ReLU), 1 max pooling layer, and 3 fully connected layers. It was designed to run on edge devices such as the NVIDIA Jetson TX2 or a Raspberry Pi. The proposed system also includes a set of cameras installed close to the ground to capture images of license plates for vehicle tracking. The authors achieved high precision in parking spot detection, with a 99.7% success rate on a custom dataset combined with PKLot and CNRPark [6]. The inference time per parking spot was 7.11 milliseconds.
Carrasco et al. [2021][18] propose T-YOLO, a modification of the YOLOv5 model tailored for tiny object detection. This approach achieves a precision of up to 96.34% on the PKLot dataset with a fine-tuned model. It employs an ROI mask to select the monitored area following the pre-processing method described in Section 4.3.1.
Shukla et al. [2022][19] present a system that utilizes surveillance cameras installed in parking spaces to capture frames. A CNN model then identifies free or occupied slots. The system provides real-time updates to a server, which can be accessed through mobile and web applications. Additionally, it incorporates a Long Short-Term Memory (LSTM) model to predict parking space availability based on factors such as day, time, and weather conditions. The authors report that the CNN model achieves a precision of around 97.89% under different weather conditions using the CNRPark dataset and around 97.5% on a custom dataset. The proposed system uses a centralized architecture.
Nithya et al. [2022][20] use a Faster Region-based CNN (Faster R-CNN) with YOLOv3 to detect vehicles in parking lot images captured by a web camera on an NVIDIA Jetson TX2. The accuracy results for this approach reach 98.41% on a custom dataset.
In 2023, Satyanath et al. [21] addressed the problem of vacant parking slot detection under hazy conditions. The detection system utilizes two networks: an end-to-end dehazing network and a parking slot classifier. For the former, they employ an All-in-One Dehazing Network (AOD-Net) architecture, while for the latter, they follow the mAlexNet architecture (proposed by Amato et al. [2016][22]). The proposed strategy improved the precision of the model proposed by Amato et al. [2016][22] by 10% to 15%. The runtime analysis did not consider edge devices. Rafique et al. [2023][23] use a fine-tuned YOLOv5 to implement a parking management system, evaluating the model on the PKLot dataset and a custom dataset with respective accuracies of 99.5% and 96.8%, using the pre-processing ROI method shown in Section 4.3.1. The authors found that using a pre-trained YOLO model achieved results superior to training with the PKLot dataset.
Recent advances can be seen in works such as the one from Doshi et al. [2024][24], which compared the performance of YOLOv3, YOLOv5, YOLOv7, and YOLOv8 across PKLot and an Aerial View Car Detection dataset with custom training, achieving the best results with YOLOv8. Sundaresan Geetha et al. [2024][25] evaluated how well YOLOv8 and YOLOv10 could detect different vehicles, showing a higher classification accuracy of cars for YOLOv8.
Hudda et al. [2024][26] used R-CNN and Faster R-CNN combined with multiple wireless sensor networks (WSN) in a smart parking system, optimizing energy usage by switching between WSN and vision-based sensing. They achieved 99.16% accuracy with a fine-tuned Faster R-CNN with a ResNet50 backbone on the ACPDS dataset [27]. Despite using WSN and optical verification at the edge, the work does not perform inference on-device, and thus it was not considered edge inference.
Considering all types of car recognition solutions in parking lots, the study conducted in this paper addresses many challenges faced by these systems, for example by using only pre-trained state-of-the-art models. Using cameras for car detection eliminates the need for a robust and expensive hardware structure, as in the case of sensors. Additionally, the project is implemented in an open parking area with considerable foot traffic and is prepared to handle these challenges. Data privacy can also be maintained, as image processing can happen at the edge of the system, eliminating the need to transmit camera pictures over the internet to a cloud or server.
The definition of the edge can vary across different authors. In this context, the concept of Edge Intelligence (EI) is
adopted, which represents a fusion of Edge Computing and Artificial Intelligence. This work follows the definition of
Zhou et al. [2019][28], who proposed a six-level rating system for EI based on the amount and path length of data offloading. In this case, a Level 3 EI device is considered, where a deep neural network model is trained in the cloud, but inference is performed on the device itself. This approach ensures that only the number of vehicles detected is sent to the cloud, preserving data privacy. While using vision-based approaches for vehicle detection can present challenges in terms of data privacy, it also offers an opportunity to leverage EI to reduce data transmission costs [29].
A device is considered to be at the edge of the network when it performs inference locally on the device, rather than
sending the collected data to a central location, such as a data center hosted by a cloud service provider, or any other
external server or computer that requires data transfer over the network.
Table 1 compares different approaches to the proposed solution in terms of inference location (edge/cloud), the use of a
mask for selecting a ROI, and the classifier model employed.
Table 1: Comparison of Camera-based Parking Occupancy Detection Methods

Work                           Inference at (Cloud / Edge)   ROI Mask   Classifier
Acharya et al. [14] (2018)                                              CNN + SVM
Bura et al. [16] (2018)                                                 Custom AlexNet
Sieck et al. [13] (2020)                                                Haar Cascade
Carrasco et al. [18] (2021)                                             Custom YOLOv5
Shukla et al. [19] (2022)                                               CNN + LSTM
Nithya et al. [20] (2022)                                               Faster R-CNN and YOLOv3
Satyanath et al. [21] (2023)                                            AOD-Net and mAlexNet
Rafique et al. [23] (2023)                                              YOLOv5
Hudda et al. [26] (2024)                                                Faster R-CNN and R-CNN
Doshi et al. [24] (2024)                                                YOLO: v3, v5, v7 and v8
This work (2024)                                                        YOLO: v8, v9, v10 and v11
3 System model
The main idea of the system is to use a camera to monitor several parking spaces by capturing images at predefined time
intervals. Each image goes through a deep neural network model trained to detect objects. This step can be performed
on a server in the cloud by sending the image through the Internet or on a local device, leveraging the potential of edge
computing. We tested both approaches in this work. Next, the result of the inference is used to count the number of
available spots, deducting the number of detected vehicles from the total number of spots. This result is sent to an IoT
platform, which maintains a history of data and makes the data available for applications. Figure 1 shows the layout of
the system.
Figure 1: System Architecture.
There are significant differences in processing power, bandwidth usage, privacy, and cost when choosing the location
for image processing. Taking that into account when designing a data pipeline is a necessary step in the system model proposal: choosing which data is captured, what processing happens at the edge, and how the cloud deals with this data [30]. This can provide an architecture that benefits from both edge and cloud while meeting the application's business needs.
3.1 Using a cloud server to process
In this scenario, images taken from the parking lot are sent to a cloud server. The advantage of this approach is the
reduced processing time, as the computing power in the cloud is superior to that of a device at the edge. However,
there are disadvantages such as increased bandwidth usage for sending images, bureaucratic hurdles, and privacy issues
associated with transmitting images from public places.
3.2 Using an edge device to process
In this scenario, a device such as a Raspberry Pi is connected to the cameras to provide edge processing. This alternative
avoids the transmission of images over the network, reducing bandwidth usage and enhancing privacy. However,
it presents challenges such as inference time, requiring a model that can detect cars with high accuracy but low
computational cost, given the limited hardware resources of edge devices compared to those available on cloud servers.
Specifically in the smart parking application, we want the device that senses the data (e.g., a Raspberry Pi) to be the device that infers the number of vehicles, sending only this number to the cloud.
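A minimal sketch of this edge pipeline is shown below, assuming an MQTT-based IoT platform; the broker address, topic, image path, and capture interval are illustrative placeholders, as the paper does not prescribe a specific protocol:

```python
# Edge-side loop: run YOLO inference locally and publish only the vehicle count.
import time
import paho.mqtt.client as mqtt  # paho-mqtt 1.x API
from ultralytics import YOLO

model = YOLO("yolov8n.pt")            # pre-trained COCO weights
VEHICLE_CLASSES = {"car", "truck"}    # classes counted as vehicles

def count_vehicles(image_path):
    result = model(image_path, verbose=False)[0]
    return sum(1 for c in result.boxes.cls
               if result.names[int(c)] in VEHICLE_CLASSES)

client = mqtt.Client()
client.connect("iot-platform.example.org", 1883)   # hypothetical broker
while True:
    count = count_vehicles("latest_capture.jpg")   # most recent camera frame
    client.publish("parking/lot1/vehicles", str(count))  # only the count leaves the device
    time.sleep(300)  # capture interval defined by the business needs
```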
4 Experiments
To conduct the experiments, the dataset was collected, labeled, pre-processed, and filtered. The inference model options
were chosen and the number of cars predicted in each image was compared with the labels to obtain performance
metrics for the proposed model. The experiments were conducted using various hardware configurations to compare
both cloud and edge computing strategies.
4.1 Dataset
In a parking lot within Universidade Estadual de Campinas (UNICAMP), we installed a camera. A single photo was
sufficient to cover the entire parking lot, which contained 16 spaces, as depicted in Figure 2. In parking lots with
favorable viewing angles, dozens of spaces can be monitored using just one camera. A prior study was conducted to
determine the optimal viewing angle for camera placement.
Data collection: For several weeks, photos were captured to compile and validate our dataset. The entire dataset
contains 4,477 images selected to classify 71,632 different parking scenarios, including rainy, sunny, cloudy, day, night,
crowded, and empty conditions.
Figure 2: A sample picture of the dataset.
Data labeling: We labeled the dataset, counting the number of vehicles in each image, including cars and trucks. This number of vehicles is deducted from the total number of spaces to obtain the number of free spaces.
Data pre-processing: Data was pre-processed by compressing the images from Portable Network Graphics (PNG)
format to Joint Photographic Experts Group (JPG).
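As an illustration, this conversion step could be done with Pillow (an assumption; the paper does not name the tool used):

```python
# Convert every PNG in the dataset folder to JPG; paths and quality are illustrative.
from pathlib import Path
from PIL import Image

for png in Path("dataset").glob("*.png"):
    Image.open(png).convert("RGB").save(png.with_suffix(".jpg"), quality=90)
```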
Data filtering: The dataset was filtered before running the experiments, keeping only images containing at least one car, so that images where the parking lot is empty are not counted. This helps to balance the number of backgrounds and vehicles and produce fair metrics. 22% of the original dataset was removed, resulting in 3,484 images.
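A sketch of this filtering step, assuming the labels are stored as a CSV mapping each image to its vehicle count (the actual label format is not specified in the paper):

```python
# Keep only images with at least one labeled vehicle.
import pandas as pd

labels = pd.read_csv("labels.csv")               # hypothetical columns: image, vehicle_count
filtered = labels[labels["vehicle_count"] >= 1]
print(f"Kept {len(filtered)} of {len(labels)} images")
filtered.to_csv("labels_filtered.csv", index=False)
```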
4.2 Inference Model
Many classifiers could be used and compared, as there is a wide range of possible models for object detection, as shown in Table 1. Sapkota et al. [2024][31] analyzed the advancements from traditional approaches such as classic machine learning classifiers, followed by the emergence of CNNs, Region-based CNNs, and the YOLO series. Given the superior results of YOLO reported by other works [32][33], and initial tests not showing promising results with models such as Mask R-CNN and EfficientDet, we selected recent flavors of YOLO in this study, with versions 8, 9, 10, and 11, selecting the largest and lightest model of each version (e.g., YOLOv8x and YOLOv8n).
YOLO stands for "You Only Look Once": an object detector that uses features learned by a deep convolutional neural network to detect an object in a single forward pass, unlike two-stage detectors that propose regions and then process the region candidates [34]. In YOLO, every input image is split into a grid, and each cell predicts a set of bounding boxes along with a confidence value indicating how sure the model is that each box contains an object. After this step, the model knows where the object is in the image, but not what that object is. To determine the class, each cell predicts class probabilities using the pre-trained weights. Finally, the box and class predictions are combined to identify the object.
The first version of YOLO was released in 2015 by Redmon [2016][35], with a CNN combined with two fully connected layers. The evolution is shown by Wang and Liao [2024][36], highlighting the changes in the architectures, such as the backbone and the use of Feature Pyramid Networks (FPN) [37], Spatial Pyramid Pooling Networks (SPP) [38], and Path Aggregation Networks (PAN) [39]. This work focuses on the latest versions released since 2023: YOLOv8, YOLOv9, YOLOv10, and YOLOv11.
YOLOv8 [40] uses an anchor-free model with a decoupled head, a sigmoid function for activation, and a softmax function for class probabilities. Its backbone is a variant of CSPNet with the cross-stage partial bottleneck with two convolutions (C2f) module, combining high-level features with contextual information to improve performance [41]. After the CNN, the FPN is replaced by a PAN.
YOLOv9 [42] implements Programmable Gradient Information to preserve data across the layers of the neural network and improve convergence, combined with a change in the backbone: the use of GELAN, which combines CSPNet and ELAN.
YOLOv10 [43], on the other hand, brings attention mechanisms with the position-sensitive attention (PSA) block to the CSPNet backbone, while eliminating the need for Non-Maximum Suppression (NMS) during inference.
YOLOv11 [44] brings improvements to the feature extractor, using more convolutions in a modified CSPNet backbone containing a series of PSA blocks that enhance attention mechanisms, along with further data augmentation techniques.
Some detection metrics on the 2017 benchmark Common Objects in Context (COCO) dataset [45], with 80 pre-trained classes and diverse objects, are shown in Table 2.
Table 2: Comparison of Model Performance Metrics for the COCO dataset. Adapted from Jocher et al. [2023][40].

Model      mAP¹   Reported Latency² (ms)   Measured Latency³ (ms)   Parameters⁴ (M)   FLOPs⁵ (B)
YOLOv8n    37.3   6.16                     3.61 ± 0.38              3.2               8.7
YOLOv8x    53.9   16.86                    8.42 ± 0.49              68.2              257.8
YOLOv9t    38.3   -                        5.12 ± 0.41              2.0               7.7
YOLOv9e    55.6   -                        10.69 ± 0.39             58.1              192.5
YOLOv10n   38.5   1.84                     3.14 ± 0.11              2.3               6.7
YOLOv10x   54.4   10.7                     7.46 ± 0.28              29.5              160.4
YOLOv11n   39.5   1.5                      3.61 ± 0.21              2.6               6.5
YOLOv11x   54.7   11.3                     7.87 ± 0.42              56.9              194.9

¹ mean Average Precision (mAP) measures object detection performance across multiple classes over different Intersection over Union (IoU) thresholds, from 0.5 to 0.95.
² Latency is measured on different GPUs running TensorRT.
³ Latency is measured on an NVIDIA A100 GPU running TensorRT.
⁴ Parameters refer to the number of trainable weights in the model.
⁵ Floating Point Operations (FLOPs) measure the computational complexity of the model.
The analysis of the model performance metrics for the COCO dataset suggests that YOLOv9e has the highest detection accuracy, with an mAP of 55.6. YOLOv11n, on the other hand, is more efficient, with the lowest reported latency and computational complexity. This shows that more advanced versions of YOLO are becoming better suited for resource-constrained devices, with little impact on accuracy. As shown in Table 2, some values are missing in the reported latency, and the measurements are not consistent. In some cases, an A100 GPU is used (e.g., for YOLOv8), while in others, a T4 GPU is used (e.g., for YOLOv10 and YOLOv11), and in some instances, no latency data is reported (e.g., for YOLOv9). This inconsistency motivated the introduction of a standardized latency measure using only an NVIDIA A100 GPU with TensorRT models at FP32 precision. The test involved executing 1,500 inferences on a COCO dataset image, discarding the initial 100 inferences. The results from this test show that YOLOv10 achieved the lowest latency. Section 5 presents additional inference time measurements across six devices, providing a more comprehensive assessment of the latest models' latency, including in non-GPU environments.
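A sketch of this standardized test using the Ultralytics API (file names are illustrative; `result.speed` reports per-stage times in milliseconds):

```python
# Export a TensorRT engine at FP32, then time 1,500 inferences on one COCO image,
# discarding the first 100 runs as warm-up.
import numpy as np
from ultralytics import YOLO

YOLO("yolov8n.pt").export(format="engine", half=False)  # FP32 TensorRT engine
model = YOLO("yolov8n.engine")

times_ms = [model("coco_sample.jpg", verbose=False)[0].speed["inference"]
            for _ in range(1500)]
steady = times_ms[100:]  # drop warm-up inferences
print(f"{np.mean(steady):.2f} ± {np.std(steady):.2f} ms")
```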
The dataset was not used to train the model, as we used only the pre-trained weights of the network, so 100% of the data was used for validation. In our work, we kept the premise of feeding the model with the images at their original collection size (768x1024), as the model performs an automatic resizing of images that do not match the original training shape (640x640) when using the Ultralytics library [40]. Vehicles were counted based on the result generated by the model inference, matching the class names stored in the model dictionary.
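In code, the counting step amounts to a few lines with the Ultralytics library (a sketch; the weight file and image path are illustrative):

```python
# Run a pre-trained model on a full-resolution image and count detected vehicles
# by matching class names in the model dictionary.
from ultralytics import YOLO

model = YOLO("yolov9e.pt")               # pre-trained COCO weights, no fine-tuning
result = model("parking_lot.jpg")[0]     # 768x1024 input, resized internally by the library
vehicles = [b for b in result.boxes
            if result.names[int(b.cls)] in ("car", "truck")]
print(f"{len(vehicles)} vehicles detected")
```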
4.3 Region Of Interest
The dataset contains images with vehicles positioned outside of the parking lot space, so we created a mask to select the ROI (Region Of Interest), avoiding classifying cars in unwanted places. We used two approaches to select the ROI: with pre-processing of the photos, and with post-processing after the neural network makes its predictions. For both approaches, the same mask was created by selecting a random picture and manually editing it using the free image editor GNU Image Manipulation Program (GIMP)². The region containing the wanted cars was selected and painted black, leaving the rest white, as shown in Figure 3. The mask was kept in the same shape as the original image.
Figure 3: Reference Mask used to select ROI.
4.3.1 Pre-Processing Mask Approach
In the pre-processing approach, we utilized the reference mask to change the value of all pixels corresponding to the white region to gray in the three color channels (R = 128, G = 128, B = 128). This effectively covered the regions outside of the parking lot with gray, as can be seen in Figure 4. Other works [18, 23] have used similar handcrafted masks to select the region and to consider only annotated slots when fine-tuning YOLOv5 models, fixing issues in mis-annotated datasets. All the images underwent this process before being passed to the model. Algorithm 1 shows the pseudocode, where each image's pixels are changed to contain only the selected region:
² https://www.gimp.org
Algorithm 1 Pseudocode of Pre-Processing Mask Approach
1: Input: Load reference mask (black for ROI, white for non-ROI)
2: for each image in the dataset do
3:   Load image
4:   for each pixel in the image do
5:     if pixel in reference mask is white then
6:       Set pixel value in the image to gray in the three color channels (128, 128, 128)
7:     end if
8:   end for
9:   Run model inference to detect objects in the updated image
10:  Output: Total object count for the image
11: end for
Figure 4: Sample of the pre-processing approach.
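A sketch of Algorithm 1 with OpenCV and NumPy, assuming the GIMP mask is saved as mask.png (black for the ROI, white elsewhere) with the same shape as the images; file names are illustrative:

```python
# Paint every non-ROI (white) pixel gray before inference.
import cv2
from ultralytics import YOLO

mask = cv2.imread("mask.png", cv2.IMREAD_GRAYSCALE)
model = YOLO("yolov9e.pt")

image = cv2.imread("parking_lot.jpg")
image[mask > 127] = (128, 128, 128)      # gray out everything outside the parking lot
result = model(image)[0]                 # inference on the masked image
count = sum(1 for b in result.boxes
            if result.names[int(b.cls)] in ("car", "truck"))
```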
4.3.2 Post-Processing Mask Approach
The pre-processing approach modifies the image pixels, removing parts that can be important to provide context to the model, which is a factor that can improve the performance of YOLO models [46][47][48].
To solve that, the proposed post-processing approach uses the same reference mask shown in Figure 3 and does not change any pixel values of the images inputted into the model. As shown in Algorithm 2, after the model produces the results containing the bounding boxes of the predictions, we compare the center x and y coordinates of each box with the corresponding pixel value in the reference mask and consider only the vehicles inside it, i.e., those whose centers fall on the black region of the mask.
Algorithm 2 Pseudocode of Post-Processing Mask Approach
1: Input: Load reference mask (black for ROI, white for non-ROI)
2: for each image in the dataset do
3:   Load image
4:   Run model inference to detect objects in the image
5:   Initialize: object_count
6:   for each predicted bounding box representing the object class do
7:     Get the center coordinates (x, y) of the bounding box
8:     if pixel value at (x, y) in reference mask is black then
9:       Increment object_count by 1 (valid ROI)
10:    end if
11:  end for
12:  Output: Total object count for the image
13: end for
In Figure 5, it is possible to see that 13 cars are identified; however, only 8 are counted, which is the desired effect of counting only vehicles inside the parking lot.
Figure 5: Sample of the post-processing approach.
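A sketch of Algorithm 2 using the same mask; here the image pixels are untouched and filtering happens on the predicted boxes:

```python
# Keep only detections whose bounding-box center lies on a black (ROI) mask pixel.
import cv2
from ultralytics import YOLO

mask = cv2.imread("mask.png", cv2.IMREAD_GRAYSCALE)  # black = ROI
model = YOLO("yolov9e.pt")
result = model("parking_lot.jpg")[0]     # unmodified image goes into the model

count = 0
for box in result.boxes:
    if result.names[int(box.cls)] not in ("car", "truck"):
        continue
    cx, cy = box.xywh[0][:2]             # bounding-box center coordinates
    if mask[int(cy), int(cx)] < 128:     # black pixel -> inside the ROI
        count += 1
print(f"{count} vehicles inside the parking lot")
```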
Although this post-processing method represents a novel approach for selecting a fully customized pixel-wise ROI for object detection, similar methods can be found in the Ultralytics library [40], particularly in the "Object Counting in Different Regions" section and other tutorials. However, these existing methods are limited to regular polygonal shapes and may not perform well in more complex, custom scenarios. In contrast, the proposed method is both precise and flexible, accommodating any mask shape.
4.4 Metrics
To compute the metrics and compare the results between different approaches and with existing works, the following
basic terms are defined:
• True Positives (TP): Correctly predicted empty space.
• True Negatives (TN): Correctly predicted vehicle.
• False Positives (FP): Predicted empty space, but there is a vehicle.
• False Negatives (FN): Predicted vehicle, but there is an empty space.
This choice was based on how similar smart parking solutions assess model performance [16, 49]. The following metrics were used based on the terms defined:
1. Accuracy provides an overall view of the model's performance. It is calculated as:

   $$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \quad (1)$$

2. F1-score is the harmonic mean of precision and recall, especially useful for imbalanced classes.

   Precision measures the proportion of true positive predictions among all positive predictions and is calculated as:

   $$\text{Precision} = \frac{TP}{TP + FP} \quad (2)$$

   Recall measures the proportion of true positive predictions among all actual positive instances and is calculated as:

   $$\text{Recall} = \frac{TP}{TP + FN} \quad (3)$$

   The F1-score is defined as:

   $$\text{F1-score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \quad (4)$$

3. Balanced Accuracy is useful for imbalanced datasets.

   Sensitivity measures the proportion of true positive predictions among all actual positive instances and is calculated as:

   $$\text{Sensitivity} = \frac{TP}{TP + FN} \quad (5)$$

   Specificity measures the proportion of true negative predictions among all actual negative instances and is calculated as:

   $$\text{Specificity} = \frac{TN}{TN + FP} \quad (6)$$

   Balanced Accuracy is defined as:

   $$\text{Balanced Accuracy} = \frac{\text{Sensitivity} + \text{Specificity}}{2} \quad (7)$$
We also produced the confusion matrices for both the pre- and post-processing approaches. A confusion matrix is often used to describe the performance of a classification model on a set of test data for which the true values are known.
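A small helper that computes these metrics from the four counts (a direct transcription of Equations 1-7):

```python
# Compute accuracy, F1-score, and balanced accuracy from a confusion matrix.
def classification_metrics(tp, tn, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)               # identical to sensitivity (Eq. 3 and 5)
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "f1_score": 2 * precision * recall / (precision + recall),
        "balanced_accuracy": (recall + specificity) / 2,
    }
```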
5 Results and Discussion
5.1 Model Performance
Table 3: Model Performance Metrics (n = 3,484).

ROI Method       Model      Accuracy   Bal. Acc.   F1-score
Pre-Processed    YOLOv8n    82.21%     66.24%      0.9172
Mask             YOLOv9t    84.08%     59.09%      0.8950
                 YOLOv10n   89.73%     70.20%      0.9300
                 YOLOv11n   85.75%     61.23%      0.9066
                 YOLOv8x    93.47%     76.57%      0.9579
                 YOLOv9e    94.26%     79.45%      0.9604
                 YOLOv10x   95.04%     80.37%      0.9674
                 YOLOv11x   94.30%     78.99%      0.9623
Post-Processed   YOLOv8n    96.63%     94.86%      0.9720
Mask             YOLOv9t    94.60%     86.90%      0.9644
                 YOLOv10n   97.57%     97.32%      0.9757
                 YOLOv11n   96.15%     95.10%      0.9599
                 YOLOv8x    98.88%     98.49%      0.9907
                 YOLOv9e    99.76%     99.68%      0.9975
                 YOLOv10x   98.75%     98.30%      0.9875
                 YOLOv11x   99.46%     99.39%      0.9946
Table 3 shows the results for the selected metrics for both methods and the selected deep learning models. To compare the models, we use a radar chart that shows the balanced accuracy for each model and ROI method in Figure 6.
It is possible to observe that, regardless of the model chosen, the post-processed method showed higher balanced accuracy, with significant gains such as over 28 percentage points for YOLOv8n. It was possible to notice that, as expected, the lightest models perform worse than the heaviest, but the range of this variation differs according to the model evaluated. For example, when analyzing YOLOv9t and YOLOv9e with the post-processed method, the balanced accuracy goes from 86.90% to 99.68%, a range of around 13 percentage points. This variation was not strongly observed in the other YOLO versions. YOLOv10, in particular, proved to be a stable model for both ROI methods, despite not having the highest performance. YOLOv11 presented high performance, especially in the extra-large version, yet still lower than YOLOv9, which was the best model in terms of accuracy, balanced accuracy, and F1-score. To continue the analysis, we investigated the confusion matrices for the worst and best results for each ROI method in Figures 7 and 8.
Regarding model performance, the pre-processing method showed a high number of false positives. A possible reason for this is the lack of context in the image, which had some regions removed from it, leaving some vehicles unclassified. Another possible cause is the need to leave some vehicles on the edges of the image not fully visible, as the masks were delimited so that no outside cars appear; otherwise, unwanted vehicles would also be classified. This issue was attenuated in the best-case scenario with YOLOv10x, which still left a considerable 4.82% of false positives. On the other hand, the pre-processing method showed a small number of false negatives for both the worst and best models, indicating that when a vehicle is detected, it has high confidence and the prediction can be trusted.
For the post-processing method, it is possible to observe a smaller number of false positives, with its worst case (YOLOv9t with 3.93%) lower than the best case of the pre-processing method (YOLOv10x with 4.82%). The worst case showed a considerable number of false negatives (YOLOv9t with 0.52%), higher than the pre-processing method, which can be caused by cases where the model detects more than one vehicle in a part of the image. However, for the best model found, it was the lowest number found across all cases (YOLOv9e with 0.07%). This is evidence that choosing the appropriate model and method can result in a well-balanced confusion matrix, detecting more vehicles while maintaining high detection confidence.
We also made a qualitative analysis of the results of the post-processing method for the selected models, shown in Figure 9.
Figure 6: Balanced Accuracy by model and ROI method.
Figure 7: Confusion matrices for the worst (YOLOv9t) and best (YOLOv10x) results of the pre-processing method, based on balanced accuracy.
Figure 8: Confusion matrices for the worst (YOLOv9t) and best (YOLOv9e) results of the post-processing method.
Smaller versions of the models suffer with lighting conditions and misclassify cars in some cases, presenting a larger number of false positives. In the largest versions, the predictions are more stable and the problems encountered are attenuated, despite in some cases having a higher number of false negatives.
5.2 Time Performance
Table 4: Time Analysis Metrics

Hardware          YOLO Version
                  8n            8x             9t             9e             10n           10x            11n           11x
Desktop PC GPU¹   16 ± 0.7 ms   30 ± 0.2 ms    22 ± 0.5 ms    31 ± 0.3 ms    18 ± 0.5 ms   24 ± 0.2 ms    17 ± 0.3 ms   27 ± 0.2 ms
Mobile PC GPU²    26 ± 0.5 ms   204 ± 0.7 ms   29 ± 0.6 ms    198 ± 0.9 ms   29 ± 0.6 ms   161 ± 1.2 ms   27 ± 1.1 ms   164 ± 2.0 ms
Desktop PC CPU³   45 ± 1.6 ms   395 ± 2.2 ms   62 ± 0.8 ms    470 ± 2.5 ms   49 ± 1.8 ms   315 ± 1.7 ms   45 ± 1.1 ms   351 ± 1.6 ms
Mobile PC CPU⁴    73 ± 1.7 ms   953 ± 3.3 ms   102 ± 8.7 ms   968 ± 4.5 ms   87 ± 2.3 ms   728 ± 3.4 ms   80 ± 3.0 ms   801 ± 2.5 ms
Raspberry Pi 4    1 ± 0.02 s    13 ± 1.2 s     1 ± 0.01 s     16 ± 0.2 s     1 ± 0.01 s    12 ± 0.1 s     1 ± 0.03 s    14 ± 0.1 s
Raspberry Pi 3    2 ± 0.06 s    28 ± 0.7 s     2 ± 0.02 s     92 ± 9.3 s     2 ± 0.03 s    24 ± 0.2 s     2 ± 0.02 s    46 ± 7.4 s

¹ NVIDIA RTX 4060 Ti
² NVIDIA MX450
³ AMD Ryzen 5 3600
⁴ Intel i7-11390H @ 3.40GHz
Table 4 shows the average time and standard deviation to perform model inference per picture on multiple hardware platforms. The average time was obtained by selecting 80 random pictures from the dataset and running model inference on each hardware platform. For all CPU tests, we used all available threads. These variations in inference time are a way of assessing the suitability of each device for an application, according to the business needs. In our case, a wait time of up to five minutes is acceptable, since the parking lot usually does not undergo major changes in occupation within this time window; other cases, however, may need to choose the model and hardware setup more carefully. Figure 10 shows inference time by model and hardware using a logarithmic scale on the y-axis.

Figure 9: Qualitative analysis for the selected models.

Figure 10: Average processing time for each hardware and model.
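The per-device timing procedure can be sketched as follows (paths are illustrative; `result.speed["inference"]` is the per-image inference time in milliseconds reported by Ultralytics):

```python
# Mean and standard deviation of inference time over 80 random dataset images.
import random
from pathlib import Path
import numpy as np
from ultralytics import YOLO

model = YOLO("yolov8n.pt")
images = random.sample(sorted(Path("dataset").glob("*.jpg")), 80)
times = [model(str(img), verbose=False)[0].speed["inference"] for img in images]
print(f"{np.mean(times):.0f} ± {np.std(times):.2f} ms")
```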
The Raspberry Pi units were kept in an environment with controlled temperature to prevent performance degradation due to CPU throttling. This was a crucial step for the stability observed in the results presented in the table, as overheating caused variations in the time required for each image inference. A Raspberry Pi 3 Model B+ (1 GB) and a Raspberry Pi 4 Model B (4 GB) were used to perform tests on edge devices, achieving an inference time range of 1-92 seconds, which is within the acceptable limits for the application needs.
The desktop computer is based on a Ryzen 5 3600 (6 cores / 12 threads) with 96 GB of DDR4 3200 MHz RAM. The GPU used was a 16 GB NVIDIA RTX 4060 Ti. Similar to the Raspberry Pis, the PC was operated in an environment with forced ventilation and controlled temperature, and no performance degradation due to temperature increase was observed. Using the automatic method for selecting the GPU's performance states (P-states), it was observed that the GPU did not consume more than 45 W during the tests. This is the hardware with the most computational capability used in the study and a viable choice for cloud computing, achieving near real-time inference times ranging from 16 milliseconds on the GPU to 470 milliseconds on the CPU.
The mobile PC used for this test was a Dell Inspiron 15-5510 equipped with an Intel i7-11390H processor and an NVIDIA MX450 GPU. One bottleneck observed during testing was CPU throttling, which affected inference times, underscoring the challenges of deploying high-demand models on such hardware; the CPU presented inference times three to five times higher than the mobile PC's GPU. This reflects the current tendency toward optimizing YOLO models for GPUs. The mobile GPU did not allow power consumption measurements as done on the desktop GPU; it was only possible to monitor the operating temperature, which remained constant at 76°C throughout the experiment.
We previously investigated using YOLOv8x for object detection on the Raspberry Pi 3 and Raspberry Pi 4 in a people counting application [33], indicating that beyond being possible to use this hardware for edge processing in terms of inference time, average CPU and memory usage requirements are also met, even on more resource-constrained devices such as TV boxes [50]. Tables 2 and 4 present YOLOv8x as the model with the highest FLOPs, parameters, and latency. This indicates that optimized newer models should also be compatible with these devices without issues as the state of the art advances. The newest nano versions of YOLO have better latency on resource-constrained devices, while the extra-large versions have better latency on GPUs. These results have significant implications for edge intelligence: as less resource-constrained devices are developed, lighter and more accurate deep learning models can be used near the sensor, maintaining data privacy.
5.3 Cost Analysis
We estimated the solution cost to be 177 United States Dollars (USD), comprising a Raspberry Pi 4 Model B (4 GB RAM), a surveillance camera (Raspberry Pi Camera Module 3), a power supply, a MicroSD card, and a weatherproof case. This estimate may vary slightly depending on the type of camera used. As for the solution using sensors, which is currently the most commercially used, we estimated an average of 15 USD per parking space. To define this value, we considered that a device can be used for only one parking space and is composed mostly of a sensor, a microcontroller, and a solar panel. The bills of materials for both kinds of solutions are shown in Tables 5 and 6.
Table 5: Bill of Materials for the camera-based solution

Item                               Quantity   Cost (USD)
Raspberry Pi 4 Model B (4GB RAM)   1          55
Raspberry Pi Camera Module 3       1          25
Power Supply                       1          10
MicroSD Card                       1          15
Case (Weatherproof)                1          15
Total                              5          120
Table 6: Bill of Materials for the sensor-based solution (assuming one device per space)

Item                       Quantity   Cost (USD)
Microcontroller            1          5
Small Solar Panel          1          5
Battery                    1          6
Charge Controller          1          2
Sensor                     1          2
Enclosure (Weatherproof)   1          10
Total                      6          30
Figure 11 shows the estimated costs of the solutions using sensors and cameras, with the threshold beyond which it pays off to use cameras. Despite having a higher initial cost, the solution using cameras is more economical in larger parking lots, since a single camera can monitor a higher number of parking spaces. In our case, with 16 spots, we are at four times the threshold point, as shown below. Another aspect to consider is the maintenance cost: sensor-based solutions have a high maintenance cost due to the high number of modules needed, while our solution uses few modules that would eventually need maintenance. Camera-based solutions are also easier and cheaper to monitor, as in some cases cameras are already installed at the parking lot and can be reused.
Figure 11: Cost Comparison between Cameras and Sensors.
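A worked break-even check, using the bill-of-materials totals from Tables 5 and 6 (note that the per-unit figures quoted in the text above, 177 USD and 15 USD, differ from the BOM totals; the BOM totals are used here since they match the "four times the threshold" remark):

```latex
% Number of spaces at which one camera unit costs as much as the sensor devices it replaces
N_{\text{break-even}} = \frac{C_{\text{camera}}}{C_{\text{sensor per space}}} = \frac{120}{30} = 4 \ \text{spaces},
\qquad \frac{16 \ \text{spaces}}{4 \ \text{spaces}} = 4\times \ \text{the threshold}
```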
6 Conclusion
This paper evaluates vehicle classification in various parking lot scenarios using a low-cost architecture composed of accessible hardware and modern inference methods. This approach maintains data privacy by utilizing edge devices, such as a Raspberry Pi, achieving inference times ranging from 1 to 92 seconds. Future work could explore different methods to reduce model complexity, including quantization, conversion to TensorFlow Lite, and model pruning.
Overall, the proposed post-processing method for selecting the ROI in the image performed better than the traditional pre-processing method, solving the issue of vehicles left unclassified. The tested models presented good performance metrics, achieving a balanced accuracy of 99.68% with YOLOv9e, which indicates that the newest YOLO models do not always achieve better accuracy, but they can bring deep learning closer to the edge with their lightest models. This performance could also be compared with other model variants and with regular polygonal ROI selection in future work.
As new models are released, performance could be increased even further using other neural networks, as object detection is a rapidly evolving field of research. An analysis can be made to understand how the models perform in different scenarios, and other datasets can be used for testing. In bigger parking lots, other issues can appear, as they contain a more diverse range of scenarios, with more vehicles, and potentially involve multiple cameras. This may indicate the need for different techniques, such as other image processing tools and fine-tuned models, to achieve good performance.
Using deep learning yields good results; however, it provides less insight into the model's features. The use of explainable AI can bring more trustworthiness and transparency to the users of smart city systems [51]. That can be combined with the different mask approaches to understand the impact of handcrafted masks on the neural networks.
Another direction for future work involves conducting stability tests to assess how consistent the predictions are for similar images, as well as improving the labeling of the dataset with bounding box coordinates. This would enable the calculation of additional metrics commonly used in image classification problems, such as Intersection over Union (IoU). Also, assessing inference time on different devices, such as the NVIDIA Jetson Nano, Raspberry Pi 5, and even TV boxes, could enable a benchmark with a larger range of edge device options.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have
appeared to influence the work reported in this paper.
Acknowledgments
This project was supported by CAPES (process 88887.999360/2024-00), CNPq (process 308840/2020-8 and
131653/2023-7), by the Brazilian Ministry of Science, Technology and Innovations, with resources from Law
8,248, of October 23, 1991, within the scope of PPI-SOFTEX, coordinated by Softex and published Arquitetura
Cognitiva (Phase 3), DOU 01245.003479/2024 -10, and by FAPESP (process 2023/00811-0).
Contributions
All authors contributed to interpreting the results and writing and reviewing the manuscript. G.P.C.P.D.L., G.M.S., and
L.F.G.G. conducted the experiments. J.F.B. supervised the project.
Availability of data and materials
The datasets generated and/or analysed during the current study are not available due to the sensitive nature of the research; however, code and materials that support the findings of this study are available upon reasonable request. All license plates that could appear in this work are already unidentifiable in the base picture and were also blurred.
Abbreviations
The following abbreviations are used in this paper:
AI Artificial intelligence
AOD-Net All-in-One Dehazing Network
ARM Advanced RISC Machine
CNN Convolutional Neural Network
COCO Common Objects in Context
CPU Central Processing Unit
C2f Cross-Stage Partial Bottleneck with Two Convolutions
EI Edge Intelligence
Faster R-CNN Faster Region-based Convolutional Neural Networks
FLOPs Floating Point Operations
FN False Negatives
FP False Positives
GIMP GNU Image Manipulation Program
GPU Graphics Processing Unit
IoT Internet of Things
IoU Intersection over Union
JPG Joint Photographic Experts Group
LSTM Long Short-Term Memory
mAP mean Average Precision
MaskRCNN Mask Region Convolutional Neural Network
PNG Portable Network Graphics
R-CNN Region-based Convolutional Neural Network
ReLU Rectified Linear Unit
ROI Region Of Interest
SD Secure Digital
Unicamp State University of Campinas
SVM Support Vector Machine
SWAP Swap File
TP True Positives
TN True Negatives
UN United Nations
USD United States Dollars
YOLO You Only Look Once
References
[1] Sandeep Saharan, Neeraj Kumar, and Seema Bawa. An efficient smart parking pricing system for smart city environment: A machine-learning based approach. Future Generation Computer Systems, 106:622–640, 2020.
[2] Rachel R. Weinberger, Adam Millard-Ball, and Robert C. Hampshire. Parking search caused congestion: Where's all the fuss? Transportation Research Part C: Emerging Technologies, 120:102781, 2020.
[3] Ebenezer Okai, Xiaohua Feng, and Paul Sant. Smart cities survey. In 2018 IEEE 20th International Conference on High Performance Computing and Communications; IEEE 16th International Conference on Smart City; IEEE 4th International Conference on Data Science and Systems (HPCC/SmartCity/DSS), pages 1726–1730, 2018.
[4] Trista Lin, Herve Rivano, and Frédéric Le Mouël. A survey of smart parking solutions. IEEE Transactions on Intelligent Transportation Systems, 18:3229–3253, 04 2017.
[5] J. V. Baggio, L. F. Gonzalez, and J. F. Borin. Smartparking: A smart solution using deep learning. Technical report, Instituto de Computação - UNICAMP, 2020.
[6] Giuseppe Amato, Fabio Carrara, Fabrizio Falchi, Claudio Gennaro, Carlo Meghini, and Claudio Vairo. Deep learning for decentralized parking lot occupancy detection. Expert Systems with Applications, 72:327–334, 2017.
[7] Sai Sneha Channamallu, Sharareh Kermanshachi, Jay Michael Rosenberger, and Apurva Pamidimukkala. A review of smart parking systems. Transportation Research Procedia, 73:289–296, 2023. International Scientific Conference "The Science and Development of Transport – Znanost i razvitak prometa ZIRP 2023".
[8] Zheng Xie and Xing Wei. Automatic parking space detection system based on improved yolo algorithm. In 2021 2nd International Conference on Computer Science and Management Technology (ICCSMT), pages 279–285, 2021.
[9] Zeydin Pala and Nihat Inanc. Smart parking applications using rfid technology. In 2007 1st Annual RFID Eurasia, pages 1–3. IEEE, 2007.
[10] MY Idna Idris, YY Leng, EM Tamil, NM Noor, Zaidi Razak, et al. Car park system: A review of smart parking system and its technology. Information Technology Journal, 8(2):101–113, 2009.
[11] Thomas Moranduzzo and Farid Melgani. Detecting cars in uav images with a catalog-based approach. IEEE Transactions on Geoscience and Remote Sensing, 52(10):6356–6367, 2014.
[12] Srishti Srivastava, Sarthak Narayan, and Sparsh Mittal. A survey of deep learning techniques for vehicle detection from uav images. Journal of Systems Architecture, 117:102152, 2021.
[13] Noah Sieck, Cameron Calpin, and Mohammad Almalag. Machine vision smart parking using internet of things (iots) in a smart university. In 2020 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops), pages 1–6. IEEE, 2020.
[14] Debaditya Acharya, Weilin Yan, and Kourosh Khoshelham. Real-time image-based parking occupancy detection using deep learning. Research@Locate, 4:33–40, 2018.
[15] Paulo RL De Almeida, Luiz S Oliveira, Alceu S Britto Jr, Eunelson J Silva Jr, and Alessandro L Koerich. Pklot – a robust dataset for parking lot classification. Expert Systems with Applications, 42(11):4937–4949, 2015.
[16] Harshitha Bura, Nathan Lin, Naveen Kumar, Sangram Malekar, Sushma Nagaraj, and Kaikai Liu. An edge based smart parking solution using camera networks and deep learning. In 2018 IEEE International Conference on Cognitive Computing (ICCC), pages 17–24, 2018.
[17] Milind Naphade, David C Anastasiu, Anuj Sharma, Vamsi Jagrlamudi, Hyeran Jeon, Kaikai Liu, Ming-Ching Chang, Siwei Lyu, and Zeyu Gao. The nvidia ai city challenge. In 2017 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computed, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI), pages 1–6. IEEE, 2017.
[18] Daniel Padilla Carrasco, Hatem A Rashwan, Miguel Ángel García, and Domenec Puig. T-yolo: Tiny vehicle detection based on yolo and multi-scale convolutional neural networks. IEEE Access, 11:22430–22440, 2021.
[19] Shweta Shukla, Rishabh Gupta, Sarthik Garg, Samarpan Harit, and Rijwan Khan. Real-Time Parking Space Detection and Management with Artificial Intelligence and Deep Learning System, pages 127–139. Springer International Publishing, Cham, 2022.
[20] R. Nithya, V. Priya, C. Sathiya Kumar, J. Dheeba, and K. Chandraprabha. A smart parking system: An iot based computer vision approach for free parking spot detection using faster r-cnn with yolov3 method. Wireless Personal Communications, 125:3205–3225, 2022.
[21] Gaurav Satyanath, Jajati Keshari Sahoo, and Rajendra Kumar Roul. Smart parking space detection under hazy conditions using convolutional neural networks: a novel approach. Multimedia Tools and Applications, 117:102152, 2023.
[22] Giuseppe Amato, Fabio Carrara, Fabrizio Falchi, Claudio Gennaro, and Claudio Vairo. Car parking occupancy detection using smart camera networks and deep learning. In 2016 IEEE Symposium on Computers and Communication (ISCC), pages 1212–1217, 2016.
[23] Sarmad Rafique, Saba Gul, Kaleemullah Jan, and Gul Muhammad Khan. Optimized real-time parking management framework using deep learning. Expert Systems with Applications, 220:119686, 2023.
[24] Yash Doshi, Khushi Shah, Neha Katre, Vinaya Sawant, and Stevina Correia. Comparision of yolo models for object detection from parking spot images. Educational Administration: Theory and Practice, 30(4):10401–10411, 2024.
[25] Athulya Sundaresan Geetha, Mujadded Al Rabbani Alif, Muhammad Hussain, and Paul Allen. Comparative analysis of yolov8 and yolov10 in vehicle detection: Performance metrics and model efficacy. Vehicles, 6(3):1364–1382, 2024.
[26] Shreeram Hudda, Rishabh Barnwal, Abhishek Khurana, and K Haribabu. A wsn and vision based smart, energy efficient, scalable, and reliable parking surveillance system with optical verification at edge for resource constrained iot devices. Internet of Things, page 101346, 2024.
[27] Martin Marek. Image-based parking space occupancy classification: Dataset and baseline. arXiv preprint arXiv:2107.12207, 2021.
[28] Zhi Zhou, Xu Chen, En Li, Liekang Zeng, Ke Luo, and Junshan Zhang. Edge intelligence: Paving the last mile of artificial intelligence with edge computing. Proceedings of the IEEE, 107(8):1738–1762, 2019.
[29] George Plastiras, Maria Terzi, Christos Kyrkou, and Theocharis Theocharides. Edge intelligence: Challenges and opportunities of near-sensor machine learning applications. In 2018 IEEE 29th International Conference on Application-Specific Systems, Architectures and Processors (ASAP), pages 1–7. IEEE, 2018.
[30] Luke Munn. Staying at the edge of privacy: Edge computing and impersonal extraction. Media and Communication, 8(2):270–279, 2020.
[31] Ranjan Sapkota, Rizwan Qureshi, Marco Flores Calero, Muhammad Hussain, Chetan Badjugar, Upesh Nepal, Alwin Poulose, Peter Zeno, Uday Bhanu Prakash Vaddevolu, Hong Yan, et al. YOLOv10 to its genesis: A decadal and comprehensive review of the You Only Look Once series. arXiv preprint arXiv:2406.19407, 2024.
[32] Shahriar Shakir Sumit, Junzo Watada, Anurava Roy, and DRA Rambli. In object detection deep learning methods, YOLO shows supremum to Mask R-CNN. In Journal of Physics: Conference Series, volume 1529, page 042086. IOP Publishing, 2020.
[33] Gabriel Sato, Gustavo Luz, Luis Gonzalez, and Juliana Borin. Reaproveitamento de TV boxes para aplicação de contagem de pessoas na borda em cidades inteligentes [Repurposing TV boxes for edge-based people counting in smart cities]. In Anais do VIII Workshop de Computação Urbana, pages 197–209, Porto Alegre, RS, Brasil, 2024. SBC.
[34] Sanjog Tamang, Biswaraj Sen, Ashis Pradhan, Kalpana Sharma, and Vikash Kumar Singh. Enhancing COVID-19 safety: Exploring YOLOv8 object detection for accurate face mask classification. International Journal of Intelligent Systems and Applications in Engineering, 11(2):892–897, 2023.
[35] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 779–788, 2016.
[36] Chien-Yao Wang and Hong-Yuan Mark Liao. YOLOv1 to YOLOv10: The fastest and most accurate real-time object detection systems. arXiv preprint arXiv:2408.09332, 2024.
[37] Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2117–2125, 2017.
[38] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(9):1904–1916, 2015.
[39] Shu Liu, Lu Qi, Haifang Qin, Jianping Shi, and Jiaya Jia. Path aggregation network for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8759–8768, 2018.
[40] Glenn Jocher, Ayush Chaurasia, and Jing Qiu. Ultralytics YOLO (Version 8.0.0). https://github.com/ultralytics/ultralytics, 2023. Computer software.
[41] Juan Terven, Diana-Margarita Córdova-Esparza, and Julio-Alejandro Romero-González. A comprehensive review of YOLO architectures in computer vision: From YOLOv1 to YOLOv8 and YOLO-NAS. Machine Learning and Knowledge Extraction, 5(4):1680–1716, 2023.
[42] Chien-Yao Wang, I-Hau Yeh, and Hong-Yuan Mark Liao. YOLOv9: Learning what you want to learn using programmable gradient information. arXiv preprint arXiv:2402.13616, 2024.
[43] Ao Wang, Hui Chen, Lihao Liu, Kai Chen, Zijia Lin, Jungong Han, and Guiguang Ding. YOLOv10: Real-time end-to-end object detection. arXiv preprint arXiv:2405.14458, 2024.
[44] Glenn Jocher and Jing Qiu. Ultralytics YOLO11, 2024.
[45] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft COCO: Common objects in context. In Computer Vision – ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6–12, 2014, Proceedings, Part V 13, pages 740–755. Springer, 2014.
[46] Ning Zhao, Ke Wang, Jiaxing Yang, Fengkai Luan, Liping Yuan, and Hu Zhang. CMCA-YOLO: A study on a real-time object detection model for parking lot surveillance imagery. Electronics, 13(8):1557, 2024.
[47] Goran Oreski. YOLO*C: Adding context improves YOLO performance. Neurocomputing, 555:126655, 2023.
[48] Joachim Krois, Lisa Schneider, and Falk Schwendicke. Impact of image context on deep learning for classification of teeth on radiographs. Journal of Clinical Medicine, 10(8):1635, 2021.
[49] Chantri Polprasert, Chaiyaboon Sruayiam, Prathan Pisawongprakan, and Sirapob Teravetchakarn. A camera-based smart parking system employing low-complexity deep learning for outdoor environments. In 2019 17th International Conference on ICT and Knowledge Engineering (ICT&KE), pages 1–5. IEEE, 2019.
[50] Gustavo P. C. P. da Luz, Gabriel Massuyoshi Sato, Luis Fernando Gomez Gonzalez, and Juliana Freitag Borin. Repurposing of TV boxes for a circular economy in smart cities applications. 2024.
[51] Abdul Rehman Javed, Waqas Ahmed, Sharnil Pandya, Praveen Kumar Reddy Maddikunta, Mamoun Alazab, and Thippa Reddy Gadekallu. A survey of explainable artificial intelligence for smart cities. Electronics, 12(4):1020, 2023.