Ioannis Pitas’s research while affiliated with Aristotle University of Thessaloniki and other places


Publications (697)


RoboFireFuseNet: Robust Fusion of Visible and Infrared Wildfire Imaging for Real-Time Flame and Smoke Segmentation
  • Preprint

May 2025

Dimitrios Fotiou · Vasileios Mygdalis · Ioannis Pitas

Architecture of the proposed robust tracking module. Numbers indicate the depth of the calculated features after each layer.
The training procedure for the proposed robust tracking module (RTM) involves the following steps: in each iteration, training data (images) are loaded and then distorted, yielding clean and distorted versions of the same training samples. The RTM processes the distorted image, and its output is compared to the clean image to compute the $\mathcal{L}_1$ loss. The RTM outputs for both the clean and distorted images are then fed into the tracking procedure, which produces the tracking loss. The $\mathcal{L}_1$ loss and the tracking loss are summed, and the RTM's weights are updated accordingly. This process is repeated for 60k iterations.
Example training pairs $\mathcal{I}_e$, $\mathcal{I}_d$ with varying levels of salt-and-pepper noise.
Example training pairs $\mathcal{I}_e$, $\mathcal{I}_d$ with varying levels of white Gaussian noise.
Average overlap score on LaSOT for input images distorted by white Gaussian noise, salt-and-pepper noise, and Gaussian blurring. The baseline performance of each method is shown in gray, while the improvement achieved by employing RTM is highlighted in red.
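The combined-loss update described in the training-procedure caption above can be sketched in a few lines; this is a minimal NumPy illustration, where `rtm`, `tracking_loss`, and the salt-and-pepper distortion are hypothetical placeholders standing in for the authors' actual network, tracker, and distortion pipeline:

```python
import numpy as np

def l1_loss(pred, target):
    # Mean absolute error between the restored image and the clean image
    return float(np.mean(np.abs(pred - target)))

def add_salt_and_pepper(img, rate, rng):
    # Flip a random fraction `rate` of pixels of a [0, 1] image to 0 or 1
    noisy = img.copy()
    mask = rng.random(img.shape) < rate
    noisy[mask] = rng.integers(0, 2, size=int(mask.sum())).astype(float)
    return noisy

def rtm_training_step(rtm, tracking_loss, clean, rng, rate=0.1):
    """One iteration: distort, restore, and sum the L1 and tracking losses."""
    distorted = add_salt_and_pepper(clean, rate, rng)
    restored = rtm(distorted)                         # RTM output for the distorted image
    loss_l1 = l1_loss(restored, clean)                # restoration term vs. the clean image
    loss_track = tracking_loss(rtm(clean), restored)  # tracking term on both RTM outputs
    return loss_l1 + loss_track                       # summed loss drives the weight update
```

With a real network, the summed loss would be backpropagated through the RTM at every step, repeated for the 60k iterations the caption mentions.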


Enhancing visual object tracking robustness through a lightweight denoising module
  • Article
  • Full-text available

April 2025 · 17 Reads

The Visual Computer

Visual object tracking is crucial for numerous applications ranging from smartphones to autonomous vehicles. However, the impact of input noise on tracking performance remains underexplored. This paper presents a lightweight neural network module designed to enhance the robustness of 2D tracking methods against various types of noise. By performing image-to-image translation, the proposed robust tracking module (RTM) standardizes the operational space of tracking algorithms, thereby improving their resilience. Experimental results on benchmark datasets demonstrate the effectiveness of RTM in mitigating performance degradation caused by noise. Additionally, we introduce an evaluation toolkit that facilitates the assessment of tracking robustness against common noise types. The source code of the proposed method is available at https://github.com/iason1907/RTM.
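The robustness evaluation described in the abstract amounts to corrupting input frames with common noise models at controlled severities before tracking. A minimal sketch follows; the noise model is standard, but the function names and severity convention are my assumptions, not the released toolkit's API:

```python
import numpy as np

def white_gaussian_noise(img, sigma, rng):
    # Additive zero-mean Gaussian noise, clipped back to the valid [0, 1] range
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)

def corrupted_sequence(frames, noise_fn, severity, rng):
    """Yield (clean, corrupted) frame pairs for one noise type and severity."""
    for frame in frames:
        yield frame, noise_fn(frame, severity, rng)
```

A robustness sweep would then run the tracker on each corrupted sequence across several severities and compare overlap scores against the clean baseline.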





These Maps Are Made by Propagation: Adapting Deep Stereo Networks to Road Scenarios With Decisive Disparity Diffusion

February 2025 · 1 Read · 2 Citations

IEEE Transactions on Image Processing

Chuang-Wei Liu · Yikang Zhang · [...] · Rui Fan

Stereo matching has emerged as a cost-effective solution for road surface 3D reconstruction, garnering significant attention towards improving both computational efficiency and accuracy. This article introduces decisive disparity diffusion (D3Stereo), marking the first exploration of dense deep feature matching that adapts pre-trained deep convolutional neural networks (DCNNs) to previously unseen road scenarios. A pyramid of cost volumes is initially created using various levels of learned representations. Subsequently, a novel recursive bilateral filtering algorithm is employed to aggregate these costs. A key innovation of D3Stereo lies in its alternating decisive disparity diffusion strategy, wherein intra-scale diffusion is employed to complete sparse disparity images, while inter-scale inheritance provides valuable prior information for higher resolutions. Extensive experiments conducted on our created UDTIRI-Stereo and Stereo-Road datasets underscore the effectiveness of D3Stereo strategy in adapting pre-trained DCNNs and its superior performance compared to all other explicit programming-based algorithms designed specifically for road surface 3D reconstruction. Additional experiments conducted on the Middlebury dataset with backbone DCNNs pre-trained on the ImageNet database further validate the versatility of D3Stereo strategy in tackling general stereo matching problems. Our source code and supplementary material are publicly available at https://mias.group/D3-Stereo.
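The intra-scale diffusion step (completing a sparse disparity image from its valid neighbours) can be illustrated with a toy propagation loop. This sketch simply averages valid 4-neighbours; it is my simplification, not the paper's decisive diffusion with recursive bilateral cost aggregation:

```python
import numpy as np

def diffuse_disparities(disp, iters=10):
    """Fill invalid (NaN) entries of a sparse disparity map by propagating
    the mean of valid 4-neighbours (toy stand-in for intra-scale diffusion)."""
    d = disp.copy()
    for _ in range(iters):
        valid = ~np.isnan(d)
        vals = np.where(valid, d, 0.0)
        pv = np.pad(vals, 1)                 # zero-padded disparity values
        pc = np.pad(valid.astype(float), 1)  # zero-padded validity mask
        # Sum and count of the 4 neighbours of every pixel
        s = pv[:-2, 1:-1] + pv[2:, 1:-1] + pv[1:-1, :-2] + pv[1:-1, 2:]
        c = pc[:-2, 1:-1] + pc[2:, 1:-1] + pc[1:-1, :-2] + pc[1:-1, 2:]
        fill = ~valid & (c > 0)              # holes with at least one valid neighbour
        d[fill] = s[fill] / c[fill]
        if not np.isnan(d).any():
            break
    return d
```

In the full method, inter-scale inheritance would seed each finer level with the diffused result from the coarser one.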




SNE-RoadSegV2: Advancing Heterogeneous Feature Fusion and Fallibility Awareness for Freespace Detection

January 2025 · 5 Reads · 7 Citations

IEEE Transactions on Instrumentation and Measurement

Feature-fusion networks with duplex encoders have proven to be an effective technique to solve the road freespace detection problem. However, despite the compelling results achieved by previous research efforts, the exploration of adequate and discriminative heterogeneous feature fusion, as well as the development of fallibility-aware loss functions, remains relatively scarce. This article makes several significant contributions to address these limitations: 1) it presents a novel heterogeneous feature fusion block, comprising a holistic attention module, a heterogeneous feature contrast descriptor, and an affinity-weighted feature recalibrator, enabling a more in-depth exploitation of the inherent characteristics of the extracted features, 2) it incorporates both inter-scale and intra-scale skip connections into the decoder architecture, while eliminating redundant ones, leading to both improved accuracy and computational efficiency, and 3) it introduces two fallibility-aware loss functions that separately focus on semantic-transition and depth-inconsistent regions, collectively contributing to greater supervision during model training. Our proposed SNE-RoadSegV2, which incorporates all these innovative components, demonstrates superior performance in comparison to all other freespace detection algorithms across multiple public datasets.
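The fallibility-aware idea of supervising semantic-transition regions more strongly can be sketched as a weighted binary cross-entropy. The 4-neighbour transition test and the edge-weight value here are my simplifications for illustration, not the paper's exact loss functions:

```python
import numpy as np

def transition_weights(labels, w_edge=3.0):
    """Up-weight pixels whose label differs from any 4-neighbour (semantic transitions)."""
    p = np.pad(labels, 1, mode="edge")
    edge = ((p[:-2, 1:-1] != labels) | (p[2:, 1:-1] != labels) |
            (p[1:-1, :-2] != labels) | (p[1:-1, 2:] != labels))
    return np.where(edge, w_edge, 1.0)

def fallibility_aware_bce(pred, labels, eps=1e-7):
    """Binary cross-entropy with extra weight on transition regions."""
    pred = np.clip(pred, eps, 1 - eps)
    bce = -(labels * np.log(pred) + (1 - labels) * np.log(1 - pred))
    w = transition_weights(labels)
    return float(np.sum(w * bce) / np.sum(w))
```

The paper's second fallibility-aware loss, focused on depth-inconsistent regions, would follow the same weighting pattern with a depth-derived mask instead of the label-transition mask.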



Citations (49)


... Advancements in machine intelligence and autonomous systems have dramatically fueled the integration of environmental perception technologies into daily life and various industries [1][2][3][4][5]. This widespread adoption is prominently seen in applications such as autonomous cars [6], smart wheelchairs [7], and unmanned ground vehicles [8]. Recently, researchers have shifted their focus toward enhancing both driving safety and comfort [9,10]. ...

Reference:

A glance over the past decade: road scene parsing towards safe and comfortable autonomous driving
SNE-RoadSegV2: Advancing Heterogeneous Feature Fusion and Fallibility Awareness for Freespace Detection
  • Citing Article
  • January 2025

IEEE Transactions on Instrumentation and Measurement

... Deep learning has made remarkable advancements in fields such as autonomous driving [1]-[3], with real-time object detection technologies, exemplified by YOLOs and DETRs, gaining widespread adoption [4], [5]. However, due to factors such as light attenuation, color distortion, and difficulty distinguishing targets from coral reefs, mud, and other underwater structures, the development of high-performance real-time underwater object detection (UOD) has been relatively slow [6]. ...

These Maps Are Made by Propagation: Adapting Deep Stereo Networks to Road Scenarios With Decisive Disparity Diffusion
  • Citing Article
  • February 2025

IEEE Transactions on Image Processing

... It is widely accepted that there are three major components impacting inference performance: Efficiency, Consistency and Accuracy [42][43][44]. Efficiency refers to how quickly the model processes and returns responses. It is influenced by (a) Response Time (i.e., the time taken for the model to generate a response after receiving a query); (b) Server Busy Rate (SBR, i.e., the proportion of queries that cannot be processed due to system overload or resource constraints). ...

Efficient Data Utilization in Deep Neural Networks for Inference Reliability
  • Citing Conference Paper
  • October 2024

... Neural networks are a powerful and flexible tool for forecasting [19][20][21]. When determining what exactly should be predicted, it is necessary to specify the variables that are analyzed and used in the forecasting process. ...

Generative Representation Learning in Recurrent Neural Networks for Causal Timeseries Forecasting

IEEE Transactions on Artificial Intelligence

... A few leading countries have been incorporating UAV swarms in their military attacks [6,7] and they have been proven to be extremely effective. UAV swarms have also been used in the film industry for cinematography purposes [8,9] and in the entertainment sector to create appealing drone shows that grab the attention of tourists [10,11], such as in Dubai. The practical implications of securing UAV operations extend across diverse sectors, showcasing the transformative potential of this technology when effectively safeguarded against threats such as GPS spoofing. ...

Vision-based Drone Control for Autonomous UAV Cinematography

... A typical case where NMS methods struggle is when they operate on images depicting objects in complex scenes with multiple occlusions [17]. To overcome the limitations of existing approaches, such as complex backgrounds, increased computation time, lower accuracy, and computationally intensive protocols [18][19][20], the proposed model uses a Deep Pliable YOLOv5 model for aircraft detection on the aircraft and Airbus datasets, with an Adaptive Spatial Pooling layer that extracts features in a parallelized manner, which is more effective than traditional pooling in image recognition. ...

Efficient Feature Extraction for Non-Maximum Suppression in Visual Person Detection

... Parallel implementations have significantly improved computational efficiency on embedded GPU architectures, demonstrating more than 40 times speedups while maintaining detection accuracy [29]. Neural attention-driven techniques have emerged to better handle occlusions in pedestrian detection by jointly processing geometric and visual properties through sequenceto-sequence formulations [30]. For overlapping object scenarios, adaptive NMS strategies dynamically adjust IoU thresholds based on object density, substantially enhancing performance for weed detection applications [31]. ...
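The adaptive-NMS idea mentioned in this snippet, relaxing the IoU suppression threshold where detections are dense so that crowded regions keep more overlapping boxes, can be sketched as follows; the per-box density input and the threshold rule are illustrative assumptions, not any specific paper's code:

```python
import numpy as np

def iou(a, b):
    # Intersection-over-union of two [x1, y1, x2, y2] boxes
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def adaptive_nms(boxes, scores, densities, base_thr=0.5):
    """Greedy NMS whose suppression threshold grows with local object density."""
    order = np.argsort(scores)[::-1]         # process boxes by descending score
    keep = []
    for i in order:
        thr = max(base_thr, densities[i])    # denser neighbourhood -> laxer threshold
        if all(iou(boxes[i], boxes[j]) < thr for j in keep):
            keep.append(i)
    return keep
```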

Neural Attention-Driven Non-Maximum Suppression for Person Detection

IEEE Transactions on Image Processing

... Zhang et al. proposed a multidimensional scaling method based on the Wasserstein-Fourier distance to classify complex time series from a frequency domain perspective. A fast multidimensional scaling method on big geospatial data using neural networks is presented by Mademlis et al., where sampling of a small subset of the original dataset is conducted (Mademlis et al., 2023). Multidimensional scaling is also combined with principal component analysis to analyze varietal sedimentary provenance data (Vermeesch et al., 2023), and it is used to investigate and compare the similarity of writing prompts in the IELTS and TOEFL iBT tests (Khademi, 2023). ...

Fast Multidimensional Scaling on Big Geospatial Data Using Neural Networks

Earth Science Informatics

... In early approaches, the summarization component was composed of LSTM units that estimated the frames' importance according to their temporal dependence (thus indicating the most significant video parts for inclusion in the summary), while the reconstruction of the video based on the specified summary was performed using trainable auto-encoders (Mahasseni et al., 2017;Apostolidis et al., 2019;Yuan et al., 2020) that in some cases were combined with tailored attention mechanisms (Jung et al., 2019;Apostolidis et al., 2020;Kanafani et al., 2021). In more recent methods, the selection of the most important frames or fragments for the summary was assisted by trainable Actor-Critic models (Apostolidis et al., 2021a;Alexoudi et al., 2023), self-attention mechanisms (He et al., 2019;Jung et al., 2020;Liang et al., 2022), spatio-temporal networks (Wu et al., 2021) or knowledge distillation mechanisms (Sreeja & Kovoor, 2022). A less popular approach for unsupervised video summarization is based on the definition of hand-crafted reward functions about specific properties of the generated summary, and the use of the computed rewards for training video summarization architectures based on reinforcement learning. ...

Escaping local minima in Deep Reinforcement Learning for video summarization