Thomas B. Moeslund

Aalborg University · Department of Architecture, Design and Media Technology

About

430
Publications
115,569
Reads
12,048
Citations
Citations since 2016
204 Research Items
7006 Citations
Publications (430)
Article
A crucial part of image classification consists of capturing non-local spatial semantics of image content. This paper describes the multi-scale hybrid vision transformer (MSHViT), an extension of the classical convolutional neural network (CNN) backbone, for multi-label sewer defect classification. To better model spatial semantics in the images, f...
Chapter
There exists no comprehensive metric for describing the complexity of Multi-Object Tracking (MOT) sequences. This lack of metrics decreases explainability, complicates comparison of datasets, and reduces the conversation on tracker performance to a matter of leaderboard position. As a remedy, we present the novel MOT dataset complexity metric (MOT...
Preprint
Anomaly detection has recently gained increasing attention in the field of computer vision, likely due to its broad set of applications ranging from product fault detection on industrial production lines and impending event detection in video surveillance to finding lesions in medical scans. Regardless of the domain, anomaly detection is typically...
Article
Full-text available
This paper presents the extraction of emotional signals from traumatic brain-injured (TBI) patients through the analysis of facial features, and the implementation of an effective emotion-recognition model on the Pepper robot to assist in the rehabilitation process. The identification of emotional cues from TBI patients is very challenging due...
Preprint
Full-text available
There exists no comprehensive metric for describing the complexity of Multi-Object Tracking (MOT) sequences. This lack of metrics decreases explainability, complicates comparison of datasets, and reduces the conversation on tracker performance to a matter of leaderboard position. As a remedy, we present the novel MOT dataset complexity metric (MOT...
Preprint
Full-text available
A self-supervised multi-task learning (SSMTL) framework for video anomaly detection was recently introduced in literature. Due to its highly accurate results, the method attracted the attention of many researchers. In this work, we revisit the self-supervised multi-task learning framework, proposing several updates to the original method. First, we...
Article
Full-text available
The safe in-field operation of autonomous agricultural vehicles requires detecting all objects that pose a risk of collision. Current vision-based algorithms for object detection and classification are unable to detect unknown classes of objects. In this paper, the problem is posed as anomaly detection instead, where convolutional autoencoders are...
Article
This work tackles scene understanding for outdoor robotic navigation, solely relying on images captured by an on-board camera. Conventional visual scene understanding interprets the environment based on specific descriptive categories. However, such a representation is not directly interpretable for decision-making and constrains robot operation to...
Article
Full-text available
This paper summarizes the 2021 ChaLearn Looking at People Challenge on Understanding Social Behavior in Dyadic and Small Group Interactions (DYAD), which featured two tracks, self-reported personality recognition and behavior forecasting, both on the UDIVA v0.5 dataset. We review important aspects of this multimodal and multiview dataset consisting...
Article
Full-text available
We present the first work where re-identification of the Giant Sunfish (Mola alexandrini) is automated using computer vision and deep learning. We propose a pipeline that scores an mAP of 60.34% on a full rank of the novel TinyMola dataset, which includes 31 IDs and 91 images. The method requires no domain adaptation or training, which makes it especi...
Article
Full-text available
Sensor drift in Wastewater Treatment Plants (WWTPs) reduces the efficiency of the plants and needs to be handled. Several studies have investigated anomaly detection and fault detection in WWTPs. However, these solutions often remain as academic projects. In this study, the gap between academia and practice is investigated by applying suggested alg...
Article
Full-text available
Recent advances in computer vision are primarily driven by the usage of deep learning, which is known to require large amounts of data, and creating datasets for this purpose is not a trivial task. Larger benchmark datasets often have detailed processes with multiple stages and users with different roles during annotation. However, this can be diff...
Preprint
Full-text available
Anomaly detection in X-ray images has been an active and lasting research area in the last decades, especially in the domain of medical X-ray images. For this work, we created a real-world labeled anomaly dataset, consisting of 16-bit X-ray image data of fuel cell electrodes coated with a platinum catalyst solution and perform anomaly detection on...
Article
Full-text available
Most existing face image Super‐Resolution (SR) methods assume that the Low‐Resolution (LR) images were artificially downsampled from High‐Resolution (HR) images with bicubic interpolation. This operation changes the natural image characteristics and reduces noise. Hence, SR methods trained on such data most often fail to produce good resul...
Article
Full-text available
More frequent and thorough inspection of sewer pipes has the potential to save billions in utilities. However, the amount and quality of inspection are impeded by an imprecise and highly subjective manual process. It involves technicians judging stretches of sewer based on video from remote-controlled robots. Determining the state of sewer pipes ba...
Article
Full-text available
Outdoor fall detection, in the context of accidents such as falling from heights or into water, is a research area that has not received as much attention as other automated surveillance areas. Gathering sufficient data for developing deep-learning models for such applications has also proven not to be a straightforward task. Normally, footage of v...
Article
Full-text available
Convolutional neural networks (CNNs) were originally used for computer vision tasks, such as image classification. While several digital soil mapping studies have assessed these deep learning algorithms for the prediction of soil properties, their potential for soil classification has not yet been explored. Moreover, the use of deep lear...
Preprint
Transformer models have shown great success modeling long-range interactions. Nevertheless, they scale quadratically with input length and lack inductive biases. These limitations can be further exacerbated when dealing with the high dimensionality of video. Proper modeling of video, which can span from seconds to hours, requires handling long-rang...
Article
Full-text available
Satisfactory indoor thermal environments can improve working efficiencies of office staff. To build such satisfactory indoor microclimates, individual thermal comfort assessment is important, for which personal clothing insulation rate (Icl) and metabolic rate (M) need to be estimated dynamically. Therefore, this paper proposes a vision-based metho...
Article
Full-text available
Facial emotion recognition is an inherently complex problem due to individual diversity in facial features and racial and cultural differences. Moreover, facial expressions typically reflect the mixture of people’s emotional statuses, which can be expressed using compound emotions. Compound facial emotion recognition makes the problem even more dif...
Preprint
Full-text available
Anomaly detection is commonly pursued as a one-class classification problem, where models can only learn from normal training samples, while being evaluated on both normal and abnormal test samples. Among the successful approaches for anomaly detection, a distinguished category of methods relies on predicting masked information (e.g. patches, futur...
Preprint
Full-text available
The sewerage infrastructure is one of the most important and expensive infrastructures in modern society. In order to efficiently manage the sewerage infrastructure, automated sewer inspection has to be utilized. However, while sewer defect classification has been investigated for decades, little attention has been given to classifying sewer pipe p...
Article
Full-text available
Glare is a common local visual discomfort that is difficult to identify with conventional light sensors. This article presents an artificial intelligence algorithm that detects subjective local glare discomfort from the image analysis of the video footage of an office occupant’s face. The occupant’s face is directly used as a visual comfort sensor....
Chapter
We propose a novel means to improve the accuracy of semantic segmentation based on multi-task learning. More specifically, in our Multi-Task Semantic Segmentation and Super-Resolution (MT-SSSR) framework, we jointly train a super-resolution and semantic segmentation model in an end-to-end manner using the same task loss for both models. This allows...
Article
Full-text available
Visual object tracking remains an active research field in computer vision due to persisting challenges with various problem-specific factors in real-world scenes. Many existing tracking methods based on discriminative correlation filters (DCFs) employ feature extraction networks (FENs) to model the target appearance during the learning process. Ho...
Preprint
Full-text available
Personality computing has become an emerging topic in computer vision, due to the wide range of applications it can be used for. However, most works on the topic have focused on analyzing the individual, even when applied to interaction scenarios, and for short periods of time. To address these limitations, we present the Dyadformer, a novel multi-...
Preprint
This work tackles scene understanding for outdoor robotic navigation, solely relying on images captured by an on-board camera. Conventional visual scene understanding interprets the environment based on specific descriptive categories. However, such a representation is not directly interpretable for decision-making and constrains robot operation to...
Article
Efficient measurement of harvested corn silage from forage harvesters can be a critical tool for a farmer. Suboptimal fragmentation of kernels can affect milk yield from dairy cows when the silage is used as fodder and oversized stover particles can promote mould yielding bacteria during storage due to resulting air pockets. As a forage harvester c...
Chapter
Full-text available
State-of-the-art (SoTA) detection-based tracking methods mostly accomplish the detection and the identification feature learning tasks separately. Only a few efforts include the joint learning of detection and identification features. This work proposes two novel one-stage trackers by introducing implicit and explicit attention to the tracking rese...
Article
Full-text available
Design features such as polishing strokes share similarities with defects; this makes defect detection and quality assessment difficult to perform both manually and automatically. Human assessors rotate objects to probe different incoming illumination angles and evaluate the defect dimension against limit samples, i.e., decide whether differences between...
Article
Full-text available
Wheelchair mounted upper limb exoskeletons offer an alternative way to support disabled individuals in their activities of daily living (ADL). Key challenges in exoskeleton technology include innovative mechanical design and implementation of a control method that can assure a safe and comfortable interaction between the human upper limb and exoske...
Preprint
Full-text available
Glare is a common local visual discomfort that is difficult to identify with conventional light sensors. This article presents an artificial intelligence algorithm that detects subjective local glare discomfort from the image analysis of the video footage of an office occupant's face. The occupant's face is directly used as a visual comfort sensor....
Chapter
Systems for automatic inspection of product quality are in high demand. However, their prevalence is limited by complex development and great expenses. Since inspection systems must be engineered to specific products and environments, such systems are generally only viable with high volume product series. Inspired by human visual inspection of high...
Article
Full-text available
Automating inspection of critical infrastructure such as sewer systems will help utilities optimize maintenance and replacement schedules. The current inspection process consists of manual reviews of video as an operator controls a sewer inspection vehicle remotely. The process is slow, labor-intensive, and expensive and presents a huge potential f...
Article
Full-text available
Dense connections in convolutional neural networks (CNNs), which connect each layer to every other layer, can compensate for mid/high‐frequency information loss and further enhance high‐frequency signals. However, dense CNNs suffer from high memory usage due to the accumulation of concatenating feature‐maps stored in memory. To overcome th...
Preprint
Perhaps surprisingly, sewerage infrastructure is one of the most costly infrastructures in modern society. Sewer pipes are manually inspected to determine whether the pipes are defective. However, this process is limited by the number of qualified inspectors and the time it takes to inspect a pipe. Automation of this process is therefore of high...
Article
Full-text available
Timely maintenance of sewers is essential to preventing reduced functionality and breakdown of the systems. Due to the high costs associated with inspecting a sewer system, substantial research has focused on sewer deterioration modeling and identification of the most useful features. However, there is a lack of consensus in the findings. This stud...
Chapter
In recent years, companies such as Intel and Google have brought to market small low-power platforms that can be used to deploy and run inference of Deep Neural Networks at a low cost. These platforms can process data at the edge, such as images from a camera, to avoid the transfer of large amounts of data across a network. To determine which pl...
Chapter
The recently emerged field of explainable artificial intelligence (XAI) attempts to shed light on ‘black box’ Machine Learning (ML) models in terms understandable to humans. As several explanation methods are developed alongside different applications for a black box model, the need for expert-level evaluation in inspecting their effectiveness bec...
Preprint
Most existing face image Super-Resolution (SR) methods assume that the Low-Resolution (LR) images were artificially downsampled from High-Resolution (HR) images with bicubic interpolation. This operation changes the natural image characteristics and reduces noise. Hence, SR methods trained on such data most often fail to produce good results when a...
Preprint
Full-text available
Explainable Artificial Intelligence (XAI) has in recent years become a well-suited framework for generating human-understandable explanations of black box models. In this paper, we present a novel XAI visual explanation algorithm denoted SIDU that can effectively localize the entire object regions responsible for a prediction to their full extent. We analyze it...
Preprint
Full-text available
Despite recent significant advancements in the field of human emotion recognition, applying upper body movements along with facial expressions presents severe challenges in the field of human-robot interaction. This article presents a model that learns emotions through upper body movements and corresponds with facial expressions. Once this correspon...
Chapter
Thermal cameras are used in various domains where the vision of RGB cameras is limited. Thermographic imaging enables the visualization of objects beyond the visible range, which enables its use in many applications such as autonomous cars, night-time footage, military, or surveillance. However, the high cost of manufacturing this type of camera limits...