Autonomous Navigation of a Mobile Robot Using
Overhead Camera and Computer Vision Methods
for Time-Critical Tasks
Anatoly Tselishchev
Department of Applied Mathematics
Lipetsk State Technical University
Lipetsk, Russia
anatolysamaris@gmail.com
Diana Ustinova
Department of Applied Mathematics and Cybernetics
Petrozavodsk State University
Petrozavodsk, Russia
Bibedaa@yandex.ru
Dmitry Savinov
Institute of Advanced Technologies “School of X”
Don State Technical University
Rostov-on-Don, Russia
savinov.dima44@yandex.ru
Abstract—This paper describes the implementation of an intelligent navigation system for the mobile robot GFS-X equipped with a manipulator, aimed at solving a range of tasks on the arena within a limited time frame. The approach to route planning and adjustment using a map obtained from an overhead camera is examined. YOLOv11 is utilized to identify objects and gather information about their locations.
Index Terms—YOLO, autonomous navigation, mobile robot, computer vision, overhead camera, object detection, image processing
I. INTRODUCTION
Robotic systems are becoming a popular solution for
performing tasks in hazardous environments and optimizing
performance in dynamically changing conditions. This ne-
cessitates ongoing research and development in the field of
intelligent control systems for robots to accomplish designated
tasks. One of the most informative sensors available in robotic
systems is the camera, which can be mounted on mobile robots
or integrated into video systems with distributed cameras,
facilitating the use of real-time computer vision and image
processing algorithms. In this work, we utilize an overhead
camera positioned above the test arena to construct an optimal
route to the target and to rapidly adjust the route in response
to significant changes in the surrounding environment, as well
as an onboard camera to verify the successful completion of
the task. Consequently, the intelligent system generates control
commands and monitors the execution of tasks.
II. TASK DEFINITION
The robot was assigned several tasks that it needed to
complete in a maze within a limited timeframe of 2 minutes:
• Pressing the green button;
• Transferring cubes to the basket corresponding to the robot's team;
• Moving a ball to the basket corresponding to the robot's team.
Points are awarded for the successful completion of each
task.
Additionally, there are actions that the robot is prohibited
from undertaking:
• Exiting the boundaries of the arena;
• Colliding with a wall;
• Colliding with the opposing robot.
Points are deducted for these violations.
Consequently, it is essential to develop a strategy for task
execution, outline optimal routes for the robot, and ensure
efficient manipulation of objects.
The following sensors are available for task execution: two
cameras (one with an overhead view and one mounted on the
robot’s body), an ultrasonic sensor, and three infrared sensors
for distance measurement.
A manipulator mounted on the robot’s body is utilized for
object interaction.
III. OVERHEAD CAMERA NAVIGATION
The fundamental algorithm for executing tasks is object detection on the arena using YOLOv11 [2], a state-of-the-art convolutional neural network architecture [1]. By
obtaining information about the detected objects and their
positions, we can perform a variety of useful actions for robot
control and trajectory planning.
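As an illustration, the minimal sketch below runs a trained YOLOv11 detector on an undistorted overhead frame with the Ultralytics API; the weights path, file names, and class names are assumptions for illustration, not values from this work.

```python
import cv2
from ultralytics import YOLO

# Hypothetical path to the custom-trained overhead-camera weights.
model = YOLO("runs/detect/train/weights/best.pt")

frame = cv2.imread("overhead_frame.jpg")      # already-undistorted overhead frame
results = model(frame, conf=0.5)[0]           # single-image inference

for box in results.boxes:
    cls_name = results.names[int(box.cls)]    # e.g. "robot", "cube", "ball"
    x1, y1, x2, y2 = box.xyxy[0].tolist()     # bounding box in pixel coordinates
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2     # object centre used for planning
    print(cls_name, round(cx), round(cy))
```

The detected class labels and box centres are what the planner consumes when building and updating the graph map described below.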
A. Camera calibration
The upper camera is a wide-angle camera with a field of view of 102 degrees. One of the characteristics of such cameras is the presence of distortions known as the "fisheye effect" [3].
Fig. 1. OpenCV camera model.
These distortions can significantly impair the accuracy of
calculations based on the positions of detected objects; there-
fore, prior to the utilization of the neural network, the frames
undergo a calibration procedure [4].
The calibration is based on the standard pinhole camera model and its distortion parameters. The camera intrinsic matrix is

K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}

where f_x and f_y are the focal lengths of the camera along the x and y axes, respectively, which determine the scale of the image, and c_x and c_y are the coordinates of the principal point (typically the center of the frame).

The distortion vector is

distortion = (k_1, k_2, p_1, p_2, k_3)

where k_1, k_2, k_3 are the radial distortion coefficients that account for distortions occurring at the edges of the image (e.g., "pincushion" or "barrel" distortion), and p_1, p_2 are the tangential distortion coefficients that account for the shift of the optical axis, which causes distortions when the optical system is not perfectly aligned.
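A minimal sketch of the undistortion step with OpenCV is given below, assuming the intrinsic matrix K and the distortion vector have already been estimated (e.g., from calibration images); the numeric values are placeholders, not the calibration results of this work.

```python
import cv2
import numpy as np

# Placeholder intrinsics and distortion coefficients (k1, k2, p1, p2, k3).
K = np.array([[800.0,   0.0, 640.0],
              [  0.0, 800.0, 360.0],
              [  0.0,   0.0,   1.0]])
dist = np.array([-0.30, 0.10, 0.001, 0.001, -0.02])

frame = cv2.imread("raw_overhead_frame.jpg")
h, w = frame.shape[:2]

# Refine the camera matrix for the current resolution and undistort the frame.
new_K, roi = cv2.getOptimalNewCameraMatrix(K, dist, (w, h), alpha=0)
undistorted = cv2.undistort(frame, K, dist, None, new_K)
```

The undistorted frame is what the detector and the planner operate on in the following subsections.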
B. Creating a dataset
The dataset [5] was created based on calibrated images
from the upper camera. To increase the training sample and
enhance the model’s ability to generalize features, augmenta-
tion techniques such as noise and shear were applied. A total
of approximately 500 images were initially captured, which
increased to 1100 after augmentation.
Annotating the dataset with the objects that are actually present during task execution ensures the highest possible accuracy for the detector model.
The model was trained for 300 epochs, achieving a mAP50
metric value of 0.88, which is a good indicator of the model’s
accuracy, considering the impact of distortions in the images.
Fig. 2. Example of a labeled frame.
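A sketch of the corresponding training run with the Ultralytics API is shown below; the 300-epoch setting follows the text, while the dataset configuration file name, model size, and image size are assumptions.

```python
from ultralytics import YOLO

# Start from pretrained YOLOv11 weights and fine-tune on the arena dataset.
model = YOLO("yolo11n.pt")

# "arena.yaml" is a hypothetical dataset config listing the image folders
# and class names (robot, cube, ball, button, basket, ...).
model.train(data="arena.yaml", epochs=300, imgsz=640)

# Evaluate on the validation split; metrics include mAP50.
metrics = model.val()
print(metrics.box.map50)
```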
C. Route planning
To plan the route effectively [6], a graph was developed, representing an abstraction of the space in which navigation [7] takes place. The simplest approach to constructing such a
graph is to use each pixel of the field as a vertex. However,
considering the large number of pixels in an image, such a
grid graph would require significant amounts of memory and
would include redundant information.
Instead, a more practical solution is to select only key points
in the space as the vertices of the graph [8]. The graph is
marked based on distances from static objects, such as walls,
boundaries, and blind spots. This results in a graph consisting
of no more than 50 vertices.
Fig. 3. Graph map initialization.
In the considered space, there are also dynamic objects
that need to be taken into account in the graph structure.
These dynamic vertices are connected to static ones, ensuring
minimal changes in the graph when objects move within the
field.
To solve the problem of finding the shortest path in the graph, the A* algorithm was used. This algorithm requires a weighted graph, which is why all edges of the graph were assigned a weight of 1. The A* algorithm has good asymptotic complexity, O(|E| log |V|), where |E| is the number of edges and |V| is the number of vertices. This makes it efficient for finding optimal routes in various scenarios.
The considered space also contains an enemy robot, and
colliding with it is unacceptable. Therefore, all edges directly
connecting vertices located near this robot were assigned a
high weight. This allows the A* algorithm to avoid routes
that intersect or approach the enemy object, thus ensuring safe
navigation and successful task execution.
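A minimal sketch of this planning step is given below. It assumes each vertex stores its pixel coordinates; the paper assigns unit weights to all edges, but in this sketch edge weights are taken as Euclidean pixel distances (heavily penalized near the opponent) so that the Euclidean heuristic remains admissible. All node names, coordinates, and the penalty factor are illustrative.

```python
import heapq
import math

def euclid(p, q):
    return math.hypot(q[0] - p[0], q[1] - p[1])

def astar(nodes, edges, start, goal):
    """nodes: {name: (x, y) in pixels}; edges: {name: [(neighbor, weight), ...]}."""
    open_set = [(euclid(nodes[start], nodes[goal]), 0.0, start, [start])]
    best_g = {start: 0.0}
    while open_set:
        _, g, node, path = heapq.heappop(open_set)
        if node == goal:
            return path
        for neighbor, weight in edges.get(node, []):
            ng = g + weight
            if ng < best_g.get(neighbor, float("inf")):
                best_g[neighbor] = ng
                f = ng + euclid(nodes[neighbor], nodes[goal])   # admissible heuristic
                heapq.heappush(open_set, (f, ng, neighbor, path + [neighbor]))
    return None

# Illustrative key points; edges near the opponent get a large penalty factor.
nodes = {"robot": (50, 60), "corridor": (200, 60),
         "near_enemy": (200, 180), "basket": (350, 180)}
PENALTY = 50.0

def edge(a, b, penalized=False):
    w = euclid(nodes[a], nodes[b]) * (PENALTY if penalized else 1.0)
    return (b, w)

edges = {
    "robot":      [edge("robot", "corridor")],
    "corridor":   [edge("corridor", "robot"), edge("corridor", "near_enemy", True),
                   edge("corridor", "basket")],
    "near_enemy": [edge("near_enemy", "corridor", True), edge("near_enemy", "basket", True)],
    "basket":     [edge("basket", "corridor")],
}

print(astar(nodes, edges, "robot", "basket"))   # -> ['robot', 'corridor', 'basket']
```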
D. Calculating the robot’s rotation
Since the robot does not have instruments to measure its
position, the upper camera was utilized for the approximate
calculation of the robot’s rotation angle relative to the camera
[9]. This enables the division of the route into sub-tasks such
as "Move forward by N centimeters" and "Perform a turn of N degrees".
Fig. 4. Example of robot’s angle approximation.
To solve this problem, we find the bounding box containing our robot and determine its center. Our robot has an LED of a specific color located at the rear of the robot's body, perpendicular to the robot's direction of movement. Using thresholding and contour detection, we find the centroid of the LED and then the vector from the LED centroid to the center of the robot. The angle between this vector and the x-axis of the image gives an approximate value of the robot's rotation angle relative to the camera, which allows us to control the robot's turning angle.
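A minimal sketch of this estimation with OpenCV follows, assuming the robot's bounding box comes from the overhead detector and the LED color range is known; the HSV thresholds are placeholders.

```python
import math
import cv2
import numpy as np

def robot_heading(frame, bbox, hsv_lo=(50, 100, 100), hsv_hi=(70, 255, 255)):
    """Estimate the robot's rotation angle (degrees) in the image plane.

    bbox is (x1, y1, x2, y2) from the overhead detector; hsv_lo/hsv_hi are
    placeholder HSV bounds for the rear LED color.
    """
    x1, y1, x2, y2 = [int(v) for v in bbox]
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0          # robot centre

    roi = frame[y1:y2, x1:x2]
    hsv = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, np.array(hsv_lo), np.array(hsv_hi))

    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    m = cv2.moments(max(contours, key=cv2.contourArea))
    if m["m00"] == 0:
        return None
    led_x = x1 + m["m10"] / m["m00"]                   # LED centroid in full-frame coords
    led_y = y1 + m["m01"] / m["m00"]

    # Vector from the rear LED to the robot centre points along the heading.
    return math.degrees(math.atan2(cy - led_y, cx - led_x))
```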
IV. ROBOT MOVEMENT
For the task of moving the robot, the main types of
commands were identified, which are generated on the server
(a laptop in the local network connected to the camera
and performing the necessary computations) and sent to the
Raspberry Pi single-board computer located within the robot:
• Forward movement for a specified distance. This command facilitates the robot's movement over a predetermined distance. Carefully selected coefficients are employed to enable smooth acceleration and deceleration, thereby reducing the error along the perpendicular axis and enhancing accuracy.
• Rotation by a specified angle. This command is used to alter the robot's direction of movement by a defined angle. The angle is determined using data from the overhead camera, and iterative adjustments ensure high precision in the rotation.
• Movement along a wall for a specified distance. This command allows the robot to traverse parallel to the wall, which increases the speed and confidence of navigation. The robot utilizes the wall as a reference, thereby reducing the risk of collisions, while a versatile regulation method ensures stable movement.
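The paper does not specify the transport between the server and the Raspberry Pi; purely as an illustration, the sketch below assumes newline-delimited JSON messages over a plain TCP socket, with hypothetical command names mirroring the list above.

```python
import json
import socket

# Hypothetical address of the Raspberry Pi on the local network.
ROBOT_ADDR = ("192.168.0.42", 5005)

def send_command(cmd: dict) -> None:
    """Send one newline-delimited JSON command to the robot."""
    with socket.create_connection(ROBOT_ADDR, timeout=1.0) as sock:
        sock.sendall((json.dumps(cmd) + "\n").encode("utf-8"))

# Example messages mirroring the command types described above (names are assumed).
send_command({"type": "move_forward", "distance_cm": 50})
send_command({"type": "rotate", "angle_deg": 90})
send_command({"type": "follow_wall", "distance_cm": 120, "side": "left"})
```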
A. Movement regulation
The control of the robot’s movement is complicated by
the absence of odometry and limited data: there are no
encoders or IMU, and delays from the overhead camera hinder
timely adjustments. The robot relies on ultrasonic and infrared
sensors. Standard PID [10] controllers under such conditions
lead to oscillatory motion: as the robot corrects the error, it
alters its direction vector, causing it to move either towards
or away from the wall [11], rather than parallel to it. This
increases the risk of collision and the loss of signal from the
ultrasonic sensor.
To address these issues, an unconventional solution was implemented: the use of a negative integral controller with buffering. This approach involves the controller accumulating
error as the robot moves. When the robot approaches the wall
[12] and the error approaches zero, the accumulated negative
error starts to exert a significant effect in the opposite direction,
thereby aligning the robot along the wall.
Unlike a classic integral controller, which only increases
its impact with accumulated error, the negative integral con-
troller allows for preemptive compensation of deviations. This
prevents oscillatory motion of the robot, enabling it to move
parallel to the wall.
An iteration of the calculation of the control input to the motor proceeds as follows. First, we calculate the PI component of the regulator:

PI_{output} = K_p \cdot e(t) - K_i \cdot \sum_{i=0}^{239} e_{buffer}[i]   (1)

where PI_{output} is the output value of the PI regulator, K_p is the proportional coefficient, e(t) is the current error, K_i is the integral coefficient, and e_{buffer} is an array storing the last 240 error values.

Then we clamp the value of PI_{output} to the range between -100 and 100 using (2) and interpolate it into the range between -20 and 20 using (3):

PI_{norm} = \max(-100, \min(100, PI_{output}))   (2)

where PI_{norm} is a value between -100 and 100, and

PI_{interp} = \frac{PI_{norm} + 100}{200} \cdot 40 - 20   (3)

where PI_{interp} is the interpolated PI-regulator value between -20 and 20.

The full formula for the control input to the motor is

Motor_{pwm} = base\_velocity + PI_{interp}   (4)

where base\_velocity is the motor velocity applied when no regulation is active.
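A minimal sketch of equations (1)-(4) in code is shown below, assuming the wall-distance error is already available from the ultrasonic and infrared sensors; the gain values and base velocity are placeholders, not the tuned coefficients from the experiments.

```python
from collections import deque

class NegativeIntegralController:
    """PI regulator with a 240-sample error buffer, following equations (1)-(4)."""

    def __init__(self, kp: float, ki: float, base_velocity: float, buffer_len: int = 240):
        self.kp = kp
        self.ki = ki
        self.base_velocity = base_velocity
        self.e_buffer = deque(maxlen=buffer_len)   # stores the last 240 error values

    def update(self, error: float) -> float:
        self.e_buffer.append(error)

        # (1) PI output with the buffered integral term subtracted.
        pi_output = self.kp * error - self.ki * sum(self.e_buffer)

        # (2) clamp to [-100, 100].
        pi_norm = max(-100.0, min(100.0, pi_output))

        # (3) interpolate into [-20, 20].
        pi_interp = (pi_norm + 100.0) / 200.0 * 40.0 - 20.0

        # (4) final PWM command for the motor.
        return self.base_velocity + pi_interp

# Placeholder gains; error = desired wall distance minus measured distance.
controller = NegativeIntegralController(kp=2.0, ki=0.05, base_velocity=60.0)
pwm = controller.update(error=3.5)
```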
B. Manipulator
For the successful execution of object manipulation, it is
essential for the robot to maintain a specified distance from
the object and to be correctly oriented towards it. This is
achieved through the regulation of the rotation angle via
cameras and the previously described motion commands. The
manipulator control system ensures the execution of sequential
commands determined during experiments, thereby guaran-
teeing successful grasping and manipulation of objects. This
approach provides high precision in manipulations even when
sensor data regarding the robot’s position and state is limited.
V. CONCLUSION
Despite the limited set of sensors, it is possible to perform tasks with a mobile robot using an overhead camera that supports all strategic and tactical levels of the robot's operation, together with an appropriate controller that ensures stable movement in accordance with the commands the robot receives. Although the testing of the intelligent system was conducted in a gaming arena environment, such a solution can be applied in real-world scenarios with more complex conditions and a greater volume of input data.
REFERENCES
[1] Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi, "You Only Look Once: Unified, Real-Time Object Detection," 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, doi: 10.1109/CVPR.2016.91.
[2] Juan Terven, Diana-Margarita Cordova-Esparza, Julio-Alejandro Romero-González, "A Comprehensive Review of YOLO Architectures in Computer Vision: From YOLOv1 to YOLOv8 and YOLO-NAS," Machine Learning and Knowledge Extraction, 2023, pp. 1680-1716, doi: 10.3390/make5040083.
[3] Christian Bräuer-Burchardt, Klaus Voss, "A new algorithm to correct fish-eye and strong wide-angle lens distortion from single images," Proceedings of the 2001 International Conference on Image Processing (ICIP), vol. 1, pp. 225-228, 2001, doi: 10.1109/ICIP.2001.958994.
[4] Z. Zhang, "A flexible new technique for camera calibration," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 11, pp. 1330-1334, Nov. 2000, doi: 10.1109/34.888718.
[5] Cristian Rocco, "Synthetic Dataset Creation For Computer Vision Application: Pipeline Proposal," 2021, doi: 10.13140/RG.2.2.12115.25126.
[6] M. Elbanhawi, M. Simic, "Sampling-Based Robot Motion Planning: A Review," IEEE Access, vol. 2, pp. 56-77, 2014, doi: 10.1109/ACCESS.2014.2302442.
[7] Edwin Olson, "AprilTag: A robust and flexible visual fiducial system," 2011 IEEE International Conference on Robotics and Automation, pp. 3400-3407, 2011, doi: 10.1109/ICRA.2011.5979561.
[8] R. Fareh, M. Baziyad, M. H. Rahman, T. Rabie, M. Bettayeb, "Investigating Reduced Path Planning Strategy for Differential Wheeled Mobile Robot," Robotica, vol. 38, no. 2, pp. 235-255, 2020, doi: 10.1017/S0263574719000572.
[9] Devi Parikh, Gavin Jancke, "Localization and Segmentation of A 2D High Capacity Color Barcode," 2008, pp. 1-6, doi: 10.1109/WACV.2008.4544033.
[10] Hendril Purnama, Tole Sutikno, Srinivasan Alavandar, Nuryono Satya Widodo, "Efficient PID Controller based Hexapod Wall Following Robot," 2019, doi: 10.23919/EECSI48112.2019.8976964.
[11] Farkh Rihem, Khaled Aljaloud, "Vision Navigation Based PID Control for Line Tracking Robot," Intelligent Automation and Soft Computing, vol. 35, pp. 901-911, 2023, doi: 10.32604/iasc.2023.027614.
[12] Heru Suwoyo, Ferryawan Kristanto, "Performance of a Wall-Following Robot Controlled by a PID-BA using Bat Algorithm Approach," International Journal of Engineering Continuity, vol. 1, pp. 56-71, 2022, doi: 10.58291/ijec.v1i1.39.