Toby P. Breckon’s research while affiliated with Durham University and other places


Publications (315)


Figure 1. 1-NNA-Abs50 EMD & COV EMD (Sec. 4.1) performance (%) vs. parameter size (millions) on the ShapeNet-v2 Car category. For 1-NNA-Abs50 EMD (left), lower values indicate better generation quality and fidelity. For COV EMD (right), higher values indicate better diversity. In both plots, moving left along the horizontal axis denotes a smaller model.
Figure 3. Qualitative results comparing our approach (right) with other leading contemporary approaches (left/middle); our TFDM generates high-quality and diverse point clouds. Only three illustrative object categories (airplanes, chairs, cars) are shown here.
Figure 8. Qualitative results of our model jointly trained on ten categories, presented in the following order: bag, keyboard, mug, pillow, rocket, earphone, basket, bed, bowl, and cap.
TFDM: Time-Variant Frequency-Based Point Cloud Diffusion with Mamba
  • Preprint
  • File available

March 2025 · 2 Reads

Jiaxu Liu · [...] · Toby P. Breckon

Diffusion models currently demonstrate impressive performance over various generative tasks. Recent work on image diffusion highlights the strong capabilities of Mamba (state space models) due to its efficient handling of long-range dependencies and sequential data modeling. Unfortunately, joint consideration of state space models with 3D point cloud generation remains limited. To harness the powerful capabilities of the Mamba model for 3D point cloud generation, we propose a novel diffusion framework containing a dual latent Mamba block (DM-Block) and a time-variant frequency encoder (TF-Encoder). The DM-Block applies a space-filling curve to reorder points into sequences suitable for Mamba state-space modeling, while operating in a latent space to mitigate the computational overhead that arises from direct 3D data processing. Meanwhile, the TF-Encoder takes advantage of the ability of the diffusion model to refine fine details in later recovery stages by prioritizing key points within the U-Net architecture. This frequency-based mechanism ensures enhanced detail quality in the final stages of generation. Experimental results on the ShapeNet-v2 dataset demonstrate that our method achieves state-of-the-art performance (ShapeNet-v2: 0.14% on 1-NNA-Abs50 EMD and 57.90% on COV EMD) on certain metrics for specific categories while reducing computational parameters and inference time by up to 10× and 9×, respectively. Source code is available in the Supplementary Materials and will be released upon acceptance.
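As a rough illustration of the sequence-reordering idea behind the DM-Block, the minimal sketch below sorts a point cloud along a Morton (Z-order) space-filling curve so that spatially adjacent points tend to become adjacent in the 1D sequence fed to a state-space model. The choice of Morton curve, the quantisation depth `bits`, and all function names are assumptions for illustration; the paper's actual curve and latent-space processing are not reproduced here.

```python
# Sketch: space-filling-curve reordering of an unordered point cloud into a
# sequence for state-space (Mamba-style) modelling. Morton curve assumed.
import numpy as np

def morton_key(q: np.ndarray, bits: int = 10) -> np.ndarray:
    """Interleave the bits of quantised (x, y, z) coordinates into one key."""
    key = np.zeros(len(q), dtype=np.uint64)
    for b in range(bits):
        for axis in range(3):
            key |= ((q[:, axis] >> b) & 1) << (3 * b + axis)
    return key

def reorder_points(points: np.ndarray, bits: int = 10) -> np.ndarray:
    """Sort an (N, 3) point cloud along a Morton curve so that points close
    in 3D tend to be close in the resulting 1D sequence."""
    lo, hi = points.min(0), points.max(0)
    q = ((points - lo) / (hi - lo + 1e-9) * (2**bits - 1)).astype(np.uint64)
    return points[np.argsort(morton_key(q, bits))]

pts = np.random.rand(2048, 3).astype(np.float32)
seq = reorder_points(pts)   # (2048, 3), sequence-ordered input for the SSM
```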


Fig. 3: Sensor placement. Left: the top view of the vehicle equipped with sensors. Right: our spherical camera on top of the LiDAR. Both figures show the coordinate space of each sensor.
Fig. 4: Validation loss curves for different values of γ. From top to bottom: γ = 0.2, 0.4, 0.8, 0.6, 1, 2, 5. The curves illustrate how the choice of γ influences the convergence behavior during training.
Fig. 5: The inference visualisation of the Coarse/Fine sampling strategy and Focal Loss with γ = 2 on Dur360BEV validation split. Left: Input image; Middle: Prediction; Right: Ground Truth Map.
Dur360BEV: A Real-world Single 360-degree Camera Dataset and Benchmark for Bird-Eye View Mapping in Autonomous Driving

March 2025 · 13 Reads

Wenke E · Chao Yuan · Li Li · [...] · Toby P. Breckon

We present Dur360BEV, a novel spherical camera autonomous driving dataset equipped with a high-resolution 128-channel 3D LiDAR and an RTK-refined GNSS/INS system, along with a benchmark architecture designed to generate Bird-Eye-View (BEV) maps using only a single spherical camera. This dataset and benchmark address the challenges of BEV generation in autonomous driving, particularly by reducing hardware complexity through the use of a single 360-degree camera instead of multiple perspective cameras. Within our benchmark architecture, we propose a novel spherical-image-to-BEV (SI2BEV) module that leverages spherical imagery and a refined sampling strategy to project features from 2D to 3D. Our approach also includes an innovative application of Focal Loss, specifically adapted to address the extreme class imbalance often encountered in BEV segmentation tasks. Through extensive experiments, we demonstrate that this application of Focal Loss significantly improves segmentation performance on the Dur360BEV dataset. The results show that our benchmark not only simplifies the sensor setup but also achieves competitive performance.
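As an illustration of the loss family adapted here, the following is a minimal focal-loss sketch for binary BEV occupancy segmentation, using γ = 2 (the setting visualised in Fig. 5). The tensor shapes, the α weighting, and the roughly 2% positive rate in the toy example are assumptions, not the benchmark's exact configuration.

```python
# Sketch: focal loss for heavily imbalanced binary BEV segmentation.
import torch
import torch.nn.functional as F

def focal_loss(logits: torch.Tensor, target: torch.Tensor,
               gamma: float = 2.0, alpha: float = 0.25) -> torch.Tensor:
    """Focal loss (Lin et al., 2017): scales the per-pixel cross-entropy by
    (1 - p_t)^gamma so easy negatives are down-weighted and the rare
    positive BEV cells dominate the gradient."""
    p = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, target, reduction="none")
    p_t = p * target + (1 - p) * (1 - target)          # prob of the true class
    alpha_t = alpha * target + (1 - alpha) * (1 - target)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()

logits = torch.randn(2, 1, 200, 200)                   # predicted BEV map
target = (torch.rand(2, 1, 200, 200) < 0.02).float()   # ~2% occupied cells
loss = focal_loss(logits, target)
```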


FEVER-OOD: Free Energy Vulnerability Elimination for Robust Out-of-Distribution Detection

December 2024 · 6 Reads

Modern machine learning models that excel on computer vision tasks such as classification and object detection are often overconfident in their predictions for Out-of-Distribution (OOD) examples, resulting in unpredictable behaviour in open-set environments. Recent works have demonstrated that the free energy score is an effective measure of uncertainty for OOD detection given its close relationship to the data distribution. However, despite free energy-based methods representing a significant empirical advance in OOD detection, our theoretical analysis reveals previously unexplored and inherent vulnerabilities within the free energy score formulation such that in-distribution and OOD instances can have distinct feature representations yet identical free energy scores. This phenomenon occurs when the vector direction representing the feature space difference between the in-distribution and OOD sample lies within the null space of the last layer of a neural-based classifier. To mitigate these issues, we explore lower-dimensional feature spaces to reduce the null space footprint and introduce novel regularisation to maximize the least singular value of the final linear layer, hence enhancing inter-sample free energy separation. We refer to these techniques as Free Energy Vulnerability Elimination for Robust Out-of-Distribution Detection (FEVER-OOD). Our experiments show that FEVER-OOD techniques achieve state-of-the-art OOD detection on Imagenet-100, with an average OOD false positive rate (at 95% true positive rate) of 35.83% when used with the baseline Dream-OOD model.
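The vulnerability described above can be reproduced numerically: the free energy score E(z) = -log Σ_k exp(w_k·z + b_k) is invariant to any feature shift lying in the null space of the final linear layer W. The toy dimensions and random weights below are assumptions; only the free-energy formula itself follows the paper.

```python
# Sketch: identical free energy for distinct features differing by a
# null-space direction of the last linear layer.
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(0)
num_classes, feat_dim = 10, 32      # feat_dim > num_classes => non-trivial null space
W = rng.normal(size=(num_classes, feat_dim))
b = rng.normal(size=num_classes)

def free_energy(z: np.ndarray) -> float:
    logits = W @ z + b
    m = logits.max()                # numerically stable log-sum-exp
    return -(m + np.log(np.exp(logits - m).sum()))

z_id = rng.normal(size=feat_dim)    # an "in-distribution" feature
n = null_space(W)[:, 0]             # a direction with W @ n == 0
z_ood = z_id + 5.0 * n              # distinct feature, identical logits

print(free_energy(z_id), free_energy(z_ood))   # identical scores
# FEVER-OOD counters this by shrinking feat_dim (smaller null space) and by
# regularising the least singular value of W to restore score separation.
```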


Racial Bias within Face Recognition: A Survey

November 2024 · 25 Reads · 10 Citations

ACM Computing Surveys

Facial recognition is one of the most academically studied and industrially developed areas within computer vision, with associated applications deployed globally. This widespread adoption has uncovered significant performance variation across subjects of different racial profiles, leading to focused research attention on racial bias within face recognition spanning both current causation and future potential solutions. In support, this study provides an extensive taxonomic review of research on racial bias within face recognition exploring every aspect and stage of the associated facial processing pipeline. Firstly, we discuss the problem definition of racial bias, starting with race definition, grouping strategies, and the societal implications of using race or race-related groupings. Secondly, we divide the common face recognition processing pipeline into four stages (image acquisition, face localisation, face representation, and face verification and identification) and review the relevant literature associated with each stage. The overall aim is to provide comprehensive coverage of the racial bias problem with respect to each and every stage of the face recognition processing pipeline whilst also highlighting the potential pitfalls and limitations of contemporary mitigation strategies that need to be considered within future research endeavours and commercial applications alike.




Progressively Select and Reject Pseudolabeled Samples for Open-Set Domain Adaptation

September 2024 · 10 Reads · 22 Citations

IEEE Transactions on Artificial Intelligence

Domain adaptation solves image classification problems in the target domain by taking advantage of labelled source data and unlabelled target data. Usually, the source and target domains share the same set of classes. As a special case, Open-Set Domain Adaptation (OSDA) assumes there exist additional classes in the target domain that are not present in the source domain. To solve such a domain adaptation problem, our proposed method learns discriminative common subspaces for the source and target domains using a novel Open-Set Locality Preserving Projection (OSLPP) algorithm. The source and target domain data are aligned class-wise in the learned common subspaces. To handle the open-set classification problem, our method progressively selects target samples to be pseudo-labelled as known classes, rejects outliers detected as unknown classes, and leaves the remaining target samples as uncertain. The common subspace learning algorithm OSLPP simultaneously aligns the labelled source data with the pseudo-labelled target data from known classes and pushes the rejected target data away from the known classes. The common subspace learning and the pseudo-labelled sample selection/rejection facilitate each other in an iterative learning framework, achieving state-of-the-art performance on four benchmark datasets (Office-31, Office-Home, VisDA17 and Syn2Real-O) with average HOS scores of 87.6%, 67.0%, 76.1% and 65.6%, respectively.
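A schematic sketch of the progressive select/reject loop is given below, using nearest-centroid confidence as a simplified stand-in for the paper's criteria; the OSLPP subspace learning step itself is only indicated by a comment. All thresholds, fractions, and names are illustrative assumptions.

```python
# Sketch: progressive pseudo-label selection/rejection for open-set DA.
import numpy as np

def progressive_pseudo_label(source_feat, source_lab, target_feat,
                             n_iters: int = 10, select_frac: float = 0.1):
    """Per iteration: score target samples by distance to the nearest source
    class centroid, pseudo-label the most confident as known classes, reject
    the least confident as unknown, and leave the rest uncertain."""
    classes = np.unique(source_lab)
    centroids = np.stack([source_feat[source_lab == c].mean(0) for c in classes])
    status = np.full(len(target_feat), -1)       # -1 uncertain, -2 rejected
    for it in range(1, n_iters + 1):
        d = np.linalg.norm(target_feat[:, None] - centroids[None], axis=2)
        conf = -d.min(1)                          # closer centroid => more confident
        order = np.where(status == -1)[0]
        order = order[np.argsort(conf[order])]    # ascending confidence
        k = min(int(select_frac * it * len(target_feat)), len(order))
        if k == 0:
            break
        sel = order[-k:]                          # most confident: select as known
        status[sel] = classes[d[sel].argmin(1)]
        r = max(0, min(k // 2, len(order) - k))   # least confident: reject
        status[order[:r]] = -2
        # (The full method re-learns the OSLPP subspace here from the selected
        #  and rejected samples before the next iteration.)
    return status

rng = np.random.default_rng(0)
src, src_lab = rng.normal(size=(200, 8)), rng.integers(0, 5, 200)
tgt = rng.normal(size=(300, 8))
status = progressive_pseudo_label(src, src_lab, tgt)  # -2/-1/class id per sample
```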


Figure 1: Our proposed TraIL architecture for 3D object detection leverages TraIL features from the point cloud. ➊ We take point clouds as input and augment them into differing views. ➋ The augmented point clouds are sampled into initial paired region proposals. ➌ The encoding module (TraIL MAE) extracts expressive proposal representations by considering the geometric relations among points within each proposal. ➍ We extract the concatenated features with the Multi-Head Attention Encoding module (TraIL MAE). ➎ Inter-Proposal Discrimination (IPD) and Inter-Cluster Separation (ICS), i.e. the D&S module [52], are subsequently enforced to optimize the whole network.
Component-wise ablation of our TraIL-Det.
TraIL-Det: Transformation-Invariant Local Feature Networks for 3D LiDAR Object Detection with Unsupervised Pre-Training

August 2024 · 13 Reads

3D point clouds are essential for perceiving outdoor scenes, especially within the realm of autonomous driving. Recent advances in 3D LiDAR Object Detection focus primarily on the spatial positioning and distribution of points to ensure accurate detection. However, despite their robust performance in variable conditions, these methods are hindered by their sole reliance on coordinates and point intensity, resulting in inadequate isometric invariance and suboptimal detection outcomes. To tackle this challenge, our work introduces Transformation-Invariant Local (TraIL) features and the associated TraIL-Det architecture. Our TraIL features exhibit rigid transformation invariance and effectively adapt to variations in point density, with a design focus on capturing the localized geometry of neighboring structures. They utilize the inherent isotropic radiation of LiDAR to enhance local representation, improve computational efficiency, and boost detection performance. To effectively process the geometric relations among points within each proposal, we propose a Multi-head self-Attention Encoder (MAE) with asymmetric geometric features to encode high-dimensional TraIL features into manageable representations. Our method outperforms contemporary self-supervised 3D object detection approaches in terms of mAP on KITTI (67.8, 20% label, moderate) and Waymo (68.9, 20% label, moderate) datasets under various label ratios (20%, 50%, and 100%).
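To illustrate the invariance principle (though not the paper's exact TraIL construction), the sketch below describes each point by its sorted distances to its k nearest neighbours, a local feature that is unchanged under any rigid rotation and translation of the cloud. The neighbourhood size k and helper names are assumptions.

```python
# Sketch: rigid-transformation-invariant local features from neighbour distances.
import numpy as np
from scipy.spatial import cKDTree

def local_distance_features(points: np.ndarray, k: int = 16) -> np.ndarray:
    """(N, 3) points -> (N, k) sorted nearest-neighbour distances per point."""
    tree = cKDTree(points)
    d, _ = tree.query(points, k=k + 1)   # first column is the point itself
    return np.sort(d[:, 1:], axis=1)

pts = np.random.rand(1024, 3)
feat = local_distance_features(pts)

# Invariance check: apply a random rigid transform and compare features.
R, _ = np.linalg.qr(np.random.randn(3, 3))           # random orthonormal matrix
feat_t = local_distance_features(pts @ R.T + np.array([1.0, -2.0, 0.5]))
print(np.allclose(feat, feat_t))                     # True (up to float error)
```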


Towards Open-World Object-based Anomaly Detection via Self-Supervised Outlier Synthesis

July 2024 · 11 Reads

Object detection is a pivotal task in computer vision that has received significant attention in recent years. Nonetheless, the capability of a detector to localise objects out of the training distribution remains unexplored. Whilst recent approaches in object-level out-of-distribution (OoD) detection heavily rely on class labels, such approaches contradict truly open-world scenarios where the class distribution is often unknown. In this context, anomaly detection focuses on detecting unseen instances rather than classifying detections as OoD. This work aims to bridge this gap by leveraging an open-world object detector and an OoD detector via virtual outlier synthesis. This is achieved by using the detector backbone features to first learn object pseudo-classes via self-supervision. These pseudo-classes serve as the basis for class-conditional virtual outlier sampling of anomalous features that are classified by an OoD head. Our approach empowers our overall object detector architecture to learn anomaly-aware feature representations without relying on class labels, hence enabling truly open-world object anomaly detection. Empirical validation of our approach demonstrates its effectiveness across diverse datasets encompassing various imaging modalities (visible, infrared, and X-ray). Moreover, our method establishes state-of-the-art performance on object-level anomaly detection, achieving an average recall score improvement of over 5.4% for natural images and 23.5% for a security X-ray dataset compared to the current approaches. In addition, our method detects anomalies in datasets where current approaches fail. Code available at https://github.com/KostadinovShalon/oln-ssos.
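As a condensed sketch of class-conditional virtual outlier synthesis in the spirit of this approach, the code below fits a Gaussian to each pseudo-class in feature space and keeps the lowest-likelihood draws as virtual outliers for an OoD head. The pseudo-class assignments, dimensions, and sample counts are synthetic placeholders rather than the method's actual pipeline.

```python
# Sketch: class-conditional virtual outlier sampling from per-class Gaussians.
import torch

def sample_virtual_outliers(feats: torch.Tensor, labels: torch.Tensor,
                            n_draw: int = 1000, n_keep: int = 50):
    """feats: (N, D) backbone features; labels: (N,) pseudo-class ids.
    Returns low-likelihood feature samples to train the OoD head against."""
    outliers = []
    for c in labels.unique():
        f = feats[labels == c]
        mu = f.mean(0)
        cov = torch.cov(f.T) + 1e-4 * torch.eye(f.shape[1])  # jitter for PSD
        dist = torch.distributions.MultivariateNormal(mu, cov)
        draws = dist.sample((n_draw,))
        logp = dist.log_prob(draws)
        outliers.append(draws[logp.argsort()[:n_keep]])      # least likely draws
    return torch.cat(outliers)

feats = torch.randn(500, 16) + 3         # fake in-distribution features
labels = torch.randint(0, 5, (500,))     # fake pseudo-class assignments
virtual_ood = sample_virtual_outliers(feats, labels)   # (250, 16)
```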


TraIL-Det: Transformation-Invariant Local Feature Networks for 3D LiDAR Object Detection with Unsupervised Pre-Training

July 2024 · 20 Reads · 1 Citation

3D point clouds are essential for perceiving outdoor scenes, especially within the realm of autonomous driving. Recent advances in 3D LiDAR Object Detection focus primarily on the spatial positioning and distribution of points to ensure accurate detection. However, despite their robust performance in variable conditions, these methods are hindered by their sole reliance on coordinates and point intensity, resulting in inadequate isometric invariance and suboptimal detection outcomes. To tackle this challenge, our work introduces Transformation-Invariant Local (TraIL) features and the associated TraIL-Det architecture. Our TraIL features exhibit rigid transformation invariance and effectively adapt to variations in point density, with a design focus on capturing the localized geometry of neighboring structures. They utilize the inherent isotropic radiation of LiDAR to enhance local representation, improve computational efficiency, and boost detection performance. To effectively process the geometric relations among points within each proposal, we propose a Multi-head self-Attention Encoder (MAE) with asymmetric geometric features to encode high-dimensional TraIL features into manageable representations. Our method outperforms contemporary self-supervised 3D object detection approaches in terms of mAP on KITTI (67.8, 20% label, moderate) and Waymo (68.9, 20% label, moderate) datasets under various label ratios (20%, 50%, and 100%).


Citations (54)


... Scope: Fairness and bias in machine learning are expansive topics, and their application in biometrics has drawn significant attention in recent years. Several comprehensive reviews have addressed fairness and bias in machine learning broadly [5], [20], while others focus specifically on biometrics, offering insights into various modalities such as face, fingerprint, and vein, alongside applications beyond recognition, including region of interest (ROI) detection, quality assessment, and presentation attack detection [3], [4], [13], [21]. However, as face remains the most commonly used biometric trait, a substantial portion of research on demographic bias has concentrated on this modality. ...

Reference:

Review of Demographic Bias in Face Recognition
Racial Bias within Face Recognition: A Survey
  • Citing Article
  • November 2024

ACM Computing Surveys

... Although segmentation tasks can be categorized into semantic level, instance level, panoptic level and 4D panoptic level, they can generally be divided into four types based on the input to the networks. Point-based [52,28,62,53,83,81,3,19,47,37], projection-based [71,82,51,32,2,74,14,84,10], voxel-based [13,20,86,43,35,40,41,26] and multi-modality-based [72,75,44,11,87,26,80]. Despite the notable success of LiDAR point cloud segmentation, the effectiveness of automatic annotation using them directly in a zero-shot manner remains unsatisfactory. ...

RAPiD-Seg: Range-Aware Pointwise Distance Distribution Networks for 3D LiDAR Segmentation

... This makes it challenging for untrained personnel to identify prohibited items amidst high occlusion and clutter [2,41]. Additionally, directly transferring pretrained models from natural images is difficult because X-ray images exhibit unique characteristics such as inherent material-based color variations, insufficient texture information, mutual occlusions, cluttered backgrounds, and high similarity among different objects' appearances [14,18]. ...

Performance Evaluation of Segment Anything Model with Variational Prompting for Application to Non-Visible Spectrum Imagery
  • Citing Conference Paper
  • June 2024

... in-domain performance established this representation as the default choice for most state-of-the-art detection models. However, such models are highly sensitive to rigid transformations [25,57], being more biased toward object position than local features like shape or appearance. Therefore, they generalize poorly to novel, unseen domains [9,56,62,63]. ...

TraIL-Det: Transformation-Invariant Local Feature Networks for 3D LiDAR Object Detection with Unsupervised Pre-Training

... (2) Other factors, such as the number of model parameters, max GPU memory allocated during inference, inference latency, frames per second (FPS), intersection-over-union (IoU) scores over each class, and mean IoU (mIoU) scores over all classes, should be considered together. For example, the latency or FPS of the models during testing must be taken into consideration in the real-time applications, although obtaining high IoU and mIoU scores in computer vision fields is a de facto decisive factor in paper acceptance [18]- [20]. Therefore, a comprehensive comparison study of the PCS models from an application perspective is required. ...

RAPiD-Seg: Range-Aware Pointwise Distance Distribution Networks for 3D LiDAR Segmentation

... CMD aggregates multi-view visual features to enhance point representation training, while SVC iteratively clusters point features to extract semantic classes. U3DS³ [64] generates superpoints based on geometry for clustering, followed by iterative training using clustering-based pseudo-labels. Moreover, U3DS³ [64] utilizes voxelized features for representation learning. ...

U3DS³: Unsupervised 3D Semantic Scene Segmentation

... Wang et al. [42] transfer the knowledge from a labeled source graph to an unlabeled target graph with the objective of learning feature representations and classifying the nodes of the target domain correctly. Wang et al. [43] adopt a progressive pseudo-labeling strategy to assess the reliability of target samples and exclude those with uncertain classifications, and learn shared subspaces across the source and target domains. ...

Progressively Select and Reject Pseudolabeled Samples for Open-Set Domain Adaptation
  • Citing Article
  • September 2024

IEEE Transactions on Artificial Intelligence

... Neural Representations with Vector Quantization. Vector quantization, first introduced in VQ-VAE (Oord, Vinyals, and Kavukcuoglu 2017) for image generation, has been applied in binary neural networks (Gordon et al. 2023), data augmentation (Wu et al. 2022), compression (Dupont et al. 2022), novel view synthesis (Yang et al. 2023b), point cloud completion (Fei et al. 2022), image synthesis (Gu et al. 2022), and 3D reconstruction/generation using Transformers or diffusion models (Corona-Figueroa et al. 2023;Li et al. 2023a). Unlike these approaches, we quantize input queries to approximate continuous representations for SLAM systems, addressing runtime efficiency and visibility constraints during optimization. ...

Unaligned 2D to 3D Translation with Conditional Vector-Quantized Code Diffusion using Transformers

... Neural Architecture Search (NAS) has emerged as a powerful paradigm in machine learning, offering the potential to automatically identify optimal neural network (NN) architectures for a given task [1]. In recent years, NAS has gained broad attention due to its versatility and applicability in scenarios where computational or hardware constraints demand efficient and specialized models, such as mobile devices or edge computing environments [2,3]. ...

Neural architecture search: A contemporary literature review for computer vision applications
  • Citing Article
  • October 2023

Pattern Recognition

... Although segmentation tasks can be categorized into semantic level, instance level, panoptic level and 4D panoptic level, they can generally be divided into four types based on the input to the networks. Point-based [52,28,62,53,83,81,3,19,47,37], projection-based [71,82,51,32,2,74,14,84,10], voxel-based [13,20,86,43,35,40,41,26] and multi-modality-based [72,75,44,11,87,26,80]. Despite the notable success of LiDAR point cloud segmentation, the effectiveness of automatic annotation using them directly in a zero-shot manner remains unsatisfactory. ...

Less is More: Reducing Task and Model Complexity for 3D Point Cloud Semantic Segmentation