Mohammed Bennamoun

  • The University of Western Australia

About

709
Publications
277,058
Reads
27,717
Citations
Introduction
Bennamoun is a Winthrop Professor. He served as Head of School at UWA for 5 years and as Director of a research centre at QUT for 4 years. He was an Erasmus Mundus Scholar at the University of Edinburgh. He is the co-author of a book on object recognition (Springer, 2001). He has authored over 150 publications and secured highly competitive national grants. He won the Best Supervisor of the Year Award at QUT and an award for research supervision at UWA. His research interests include computer vision and robotics.
Current institution
The University of Western Australia
Additional affiliations
January 2003 - present
The University of Western Australia
Position
  • Winthrop Professor
Description
  • Main area of current research: computer vision, with a wide range of applications in other areas.
January 1998 - December 2000
Queensland University of Technology
Position
  • Director, Space Centre for Satellite Navigation
Education
July 1992 - June 1996
Queensland University of Technology / Queen's University
Field of study
  • Computer Vision

Publications (709)
Preprint
Image-to-video (I2V) generation seeks to produce realistic motion sequences from a single reference image. Although recent methods exhibit strong temporal consistency, they often struggle when dealing with complex, non-repetitive human movements, leading to unnatural deformations. To tackle this issue, we present LatentMove, a DiT-based framework s...
Article
Full-text available
3D human pose estimation can be handled by encoding the geometric dependencies between the body parts and enforcing the kinematic constraints. Recently, transformers have been adopted to better encode the long-range dependencies between the joints across both the spatial and temporal domains. However, previous studies have highlighted the need of i...
Preprint
Full-text available
Humans naturally understand moments in a video by integrating visual and auditory cues. For example, localizing a scene in the video like "A scientist passionately speaks on wildlife conservation as dramatic orchestral music plays, with the audience nodding and applauding" requires simultaneous processing of visual, audio, and speech signals. Howev...
Article
Full-text available
Patch-based matching is a technique meant to measure the disparity between pixels in a source and target image and is at the core of various methods in computer vision. When the subpixel disparity between the source and target images is required, the cost function or the target image has to be interpolated. While cost-based interpolation is easier...
Article
Full-text available
Controlling systems with highly nonlinear or uncertain dynamics presents significant challenges, particularly when using conventional Proportional–Integral–Derivative (PID) controllers, as they can be difficult to tune. While PID controllers can be adapted for such systems using advanced tuning methods, they often struggle with lag and instability d...
Article
Full-text available
Monocular depth cues, such as shading, are fundamental for resolving three-dimensional information, such as an object’s shape. Animal colour patterns may potentially exploit this mechanism of depth perception, generating false illusions for functions such as camouflage. Reconstructing the potential percept produced by false depth cues is challengin...
Preprint
Full-text available
In Computational Pathology (CPath), the introduction of Vision-Language Models (VLMs) has opened new avenues for research, focusing primarily on aligning image-text pairs at a single magnification level. However, this approach might not be sufficient for tasks like cancer subtype classification, tissue phenotyping, and survival analysis due to the...
Preprint
Full-text available
Radio Frequency Interference (RFI) is a known growing challenge for radio astronomy, intensified by increasing observatory sensitivity and prevalence of orbital RFI sources. Spiking Neural Networks (SNNs) offer a promising solution for real-time RFI detection by exploiting the time-varying nature of radio observation and neuron dynamics together. T...
Preprint
Full-text available
Radio Frequency Interference (RFI) from anthropogenic radio sources poses significant challenges to current and future radio telescopes. Contemporary approaches to detecting RFI treat the task as a semantic segmentation problem on radio telescope spectrograms. Typically, complex heuristic algorithms handle this task of 'flagging' in combination wit...
Preprint
Full-text available
Advancements in Computer-Aided Screening (CAS) systems are essential for improving the detection of security threats in X-ray baggage scans. However, current datasets are limited in representing real-world, sophisticated threats and concealment tactics, and existing approaches are constrained by a closed-set paradigm with predefined labels. To addr...
Article
Semantic instance completion aims to recover the complete 3D shapes of foreground objects together with their labels from a partial 2.5D scan of a scene. Previous works have relied on full supervision, which requires ground-truth annotations, in the form of bounding boxes and complete 3D objects. This has greatly limited their real-world applicatio...
Preprint
Full-text available
We propose a novel framework for the statistical analysis of genus-zero 4D surfaces, i.e., 3D surfaces that deform and evolve over time. This problem is particularly challenging due to the arbitrary parameterizations of these surfaces and their varying deformation speeds, necessitating effective spatiotemporal registration. Traditionally, 4D surfac...
Preprint
Full-text available
The preservation of aquatic biodiversity is critical in mitigating the effects of climate change. Aquatic scene understanding plays a pivotal role in aiding marine scientists in their decision-making processes. In this paper, we introduce AquaticCLIP, a novel contrastive language-image pre-training model tailored for aquatic scene understanding. Aq...
Preprint
Significant progress has been made in the field of video question answering (VideoQA) thanks to deep learning and large-scale pretraining. Despite the presence of sophisticated model structures and powerful video-text foundation models, most existing methods focus solely on maximizing the correlation between answers and video-question pairs during...
Article
This paper introduces a two-phase learning approach for hyperspectral image (HSI) classification using few-shot learning. For the first phase, we present a novel spatiospectral masked autoencoder (ssMAE) - an advanced self-supervised learner. For the ssMAE backbone network, we designed a transformer encoder-decoder network, where we replaced the li...
Article
Detecting small objects in optical images and videos is a significant challenge in numerous intelligent transportation and autonomous systems. State-of-the-art generic object detection methods fail to accurately localize and identify such small objects (e.g., pedestrians, small vehicles, obstacles). Because small objects occupy only a small area in...
Article
Full-text available
With the exponential rise in global air traffic, ensuring swift passenger processing while countering potential security threats has become a paramount concern for aviation security. Although X-ray baggage monitoring is now standard, manual screening has several limitations, including the propensity for errors, and raises concerns about passenger p...
Preprint
In response to the growing threat of deepfake technology, we introduce BENet, a Cross-Domain Robust Bias Expansion Network. BENet enhances the detection of fake faces by addressing limitations in current detectors related to variations across different types of fake face generation techniques, where "cross-domain" refers to the diverse range of th...
Preprint
Full-text available
Spiking Neural Networks (SNNs) promise efficient spatio-temporal data processing owing to their dynamic nature. This paper addresses a significant challenge in radio astronomy, Radio Frequency Interference (RFI) detection, by reformulating it as a time-series segmentation task inherently suited for SNN execution. Automated RFI detection systems cap...
Preprint
This paper introduces a novel framework for unified incremental few-shot object detection (iFSOD) and instance segmentation (iFSIS) using the Transformer architecture. Our goal is to create an optimal solution for situations where only a few examples of novel object classes are available, with no access to training data for base or old classes, whi...
Article
Referring Video Object Segmentation (R-VOS) methods face challenges in maintaining consistent object segmentation due to temporal context variability and the presence of other visually similar objects. We propose an end-to-end R-VOS paradigm that explicitly models temporal instance consistency alongside the referring segmentation. Specifically, w...
Preprint
Full-text available
Fuzzy PD controllers are widely used in industry due to their excellent control performance and robustness. However, their performance heavily relies on manually designed fuzzy logic. Fuzzy neural networks (FNNs) combine neural networks and fuzzy logic, allowing them to utilize expert knowledge and possess self-learning capabilities. FNNs can be us...
Article
Full-text available
Maize (Zea mays L.) has been shown to be sensitive to temperature deviations, influencing its yield potential. The development of new maize hybrids resilient to unfavourable weather is a desirable aim for crop breeders. In this paper, we showcase the development of a multimodal deep learning model using RGB images, phenotypic, and weather data unde...
Preprint
Full-text available
We introduce Referring Human Pose and Mask Estimation (R-HPM) in the wild, where either a text or positional prompt specifies the person of interest in an image. This new task holds significant potential for human-centric applications such as assistive robotics and sports analysis. In contrast to previous works, R-HPM (i) ensures high-quality, iden...
Preprint
Full-text available
Recent years witnessed remarkable progress in computational histopathology, largely fueled by deep learning. This brought the clinical adoption of deep learning-based tools within reach, promising significant benefits to healthcare, offering a valuable second opinion on diagnoses, streamlining complex tasks, and mitigating the risks of inconsistenc...
Preprint
Full-text available
Automatic annotation of large-scale datasets can introduce noisy training data labels, which adversely affect the learning process of deep neural networks (DNNs). Consequently, Noisy Labels Learning (NLL) has become a critical research field for Convolutional Neural Networks (CNNs), though it remains less explored for Vision Transformers (ViTs). In...
Article
Plant disease outbreaks continuously challenge food security and sustainability. Traditional chemical methods used to treat diseases have environmental and health concerns, raising the need to enhance inherent plant disease resistance mechanisms. Traits, including disease resistance, can be linked to specific loci in the genome and identifying thes...
Preprint
Full-text available
We propose the first comprehensive approach for modeling and analyzing the spatiotemporal shape variability in tree-like 4D objects, i.e., 3D objects whose shapes bend, stretch, and change in their branching structure over time as they deform, grow, and interact with their environment. Our key contribution is the representation of tree-like 3D shap...
Preprint
Full-text available
This paper investigates the role of CLIP image embeddings within the Stable Video Diffusion (SVD) framework, focusing on their impact on video generation quality and computational efficiency. Our findings indicate that CLIP embeddings, while crucial for aesthetic quality, do not significantly contribute towards the subject and background consistenc...
Article
Full-text available
Estimating depth from single RGB images and videos is of widespread interest due to its applications in many areas, including autonomous driving, 3D reconstruction, digital entertainment, and robotics. More than 500 deep learning-based papers have been published in the past 10 years, which indicates the growing interest in the task. This paper pres...
Preprint
Neuromorphic sensors, specifically event cameras, revolutionize visual data acquisition by capturing pixel intensity changes with exceptional dynamic range, minimal latency, and energy efficiency, setting them apart from conventional frame-based cameras. The distinctive capabilities of event cameras have ignited significant interest in the domain o...
Preprint
Full-text available
Estimating depth from single RGB images and videos is of widespread interest due to its applications in many areas, including autonomous driving, 3D reconstruction, digital entertainment, and robotics. More than 500 deep learning-based papers have been published in the past 10 years, which indicates the growing interest in the task. This paper pres...
Article
Early diagnosis of Alzheimer's disease (AD) is crucial for its prevention, and hippocampal atrophy is a significant lesion for early diagnosis. The current DL-based AD diagnosis methods only focus on either AD classification or hippocampus segmentation independently, neglecting the correlation between the two tasks and lacking pathological interpre...
Preprint
Full-text available
This paper proposes Comprehensive Pathology Language Image Pre-training (CPLIP), a new unsupervised technique designed to enhance the alignment of images and text in histopathology for tasks such as classification and segmentation. This methodology enriches vision-language models by leveraging extensive data without needing ground truth annotations...
Article
Full-text available
Weeds pose a significant threat to agricultural production, leading to substantial yield losses and increased herbicide usage, with severe economic and environmental implications. This paper uses deep learning to explore a novel approach via targeted segmentation mapping of crop plants rather than weeds, focusing on canola (Brassica napus) as the t...
Article
This paper proposes a novel transformer-based framework to generate accurate class-specific object localization maps for weakly supervised semantic segmentation (WSSS). Leveraging the insight that the attended regions of the one-class token in the standard vision transformer can generate class-agnostic localization maps, we investigate the transfor...
Article
Full-text available
Autonomous X-ray baggage security screening has shown significant strides recently, proving itself a viable solution to the flaws in manual screening, thanks to advancements in deep learning. However, these data-hungry techniques feed on extensively annotated data involving strenuous labor, impeding their advances in baggage screening. Consequently...
Article
Full-text available
Current semi-supervised video object segmentation (VOS) methods often employ the entire features of one frame to predict object masks and update memory. This introduces significant redundant computations. To reduce redundancy, we introduce a Region Aware Video Object Segmentation (RAVOS) approach, which predicts regions of interest (ROIs) for effic...
Article
Full-text available
This study explores the effectiveness of Explainable Artificial Intelligence (XAI) for predicting suicide risk from medical tabular data. Given the common challenge of limited datasets in health-related Machine Learning (ML) applications, we use data augmentation in tandem with ML to enhance the identification of individuals at high risk of suicide...
Article
Most existing weakly supervised semantic segmentation (WSSS) methods rely on class activation mapping (CAM) to extract coarse class-specific localization maps using image-level labels. Prior works have commonly used an off-line heuristic thresholding process that combines the CAM maps with off-the-shelf saliency maps produced by a general pretraine...
Article
Full-text available
Aims: Patients with atrial fibrillation (AF) have a higher risk of ischaemic stroke and death. While anticoagulants are effective at reducing these risks, they increase the risk of bleeding. Current clinical risk scores only perform modestly in predicting adverse outcomes, especially for the outcome of death. We aimed to test the multi-label gradien...
Article
Full-text available
Ensuring the security and safety of passengers and cargo through effective baggage screening presents a critical challenge in high-traffic environments like airports, where traditional manual processes are affected by high error rates, fatigue among security personnel, and privacy concerns. These issues underscore the urgent need for sophisticated...
Article
Generative models such as generative adversarial networks and autoencoders have gained a great deal of attention in the medical field due to their excellent data generation capability. This paper provides a comprehensive survey of generative models for three-dimensional (3D) volumes, focusing on the brain and heart. A new and elaborate taxonomy of...
Article
Full-text available
Introduction: Natural language processing (NLP) uses various computational methods to analyse and understand human language, and has been applied to data acquired at Emergency Department (ED) triage to predict various outcomes. The objective of this scoping review is to evaluate how NLP has been applied to data acquired at ED triage, assess if NLP b...
Article
Long non-coding ribonucleic acids (lncRNAs) have been shown to play an important role in plant gene regulation, involving both epigenetic and transcript regulation. LncRNAs are transcripts longer than 200 nucleotides that are not translated into functional proteins but can be translated into small peptides. Machine learning models have predominantl...
Article
Camera pose estimation has long relied on geometry-based approaches and sparse 2D-3D keypoint correspondences. With the advent of deep learning methods, the estimation of camera pose parameters, i.e., the six parameters that describe position and rotation denoted by 6 Degrees of Freedom (6-DoF), has decreased from tens of meters to a few centimeter...
Preprint
Full-text available
Transformers have rapidly gained popularity in computer vision, especially in the field of object recognition and detection. Upon examining the outcomes of state-of-the-art object detection methods, we noticed that transformers consistently outperformed well-established CNN-based detectors in almost every video or image dataset. While transformer-b...
Article
Full-text available
Atrial fibrillation arises mainly due to abnormalities in the cardiac conduction system and is associated with anatomical remodeling of the atria and the pulmonary veins. Cardiovascular imaging techniques, such as echocardiography, computed tomography, and magnetic resonance imaging, are crucial in the management of atrial fibrillation, as they not...
Preprint
Full-text available
Diagnostic investigation has an important role in risk stratification and clinical decision making of patients with suspected and documented Coronary Artery Disease (CAD). However, the majority of existing tools are primarily focused on the selection of gatekeeper tests, whereas only a handful of systems contain information regarding the downstream...
Article
Full-text available
The brain is the perfect place to look for inspiration to develop more efficient neural networks. The inner workings of our synapses and neurons provide a glimpse at what the future of deep learning might look like. This article serves as a tutorial and perspective showing how to apply the lessons learned from several decades of research in deep le...
Article
Full-text available
Introduction: Surveys conducted internationally have found widespread interest in artificial intelligence (AI) amongst medical students. No similar surveys have been conducted in Western Australia (WA) and it is not known how medical students in WA feel about the use of AI in healthcare or their understanding of AI. We aim to assess WA medical stude...
Article
Cross-resolution person re-identification (CRReID) is a challenging and practical problem that involves matching low-resolution (LR) query identity images against high-resolution (HR) gallery images. Query images often suffer from resolution degradation due to the different capturing conditions from real-world cameras. State-of-the-art solutions fo...
Article
Full-text available
The astounding success achieved by artificial intelligence in healthcare and other fields proves that AI can achieve human-like performance. However, success always comes with challenges. Deep learning algorithms are data-dependent and require large datasets for training. Many junior researchers face a lack of data for a variety of reasons. Me...
Article
Full-text available
3D face recognition has been extensively investigated in the last two decades due to its wide range of applications in many areas such as security and forensics. Numerous methods have been proposed to deal with the challenges faced by 3D face recognition such as facial expressions, pose variations and occlusions. These methods have achieved superio...
Preprint
This paper proposes a novel transformer-based framework that aims to enhance weakly supervised semantic segmentation (WSSS) by generating accurate class-specific object localization maps as pseudo labels. Building upon the observation that the attended regions of the one-class token in the standard vision transformer can contribute to a class-agnos...
Preprint
Full-text available
Current referring video object segmentation (R-VOS) techniques extract conditional kernels from encoded (low-resolution) vision-language features to segment the decoded high-resolution features. We discovered that this causes significant feature drift, which the segmentation kernels struggle to perceive during the forward computation. This negative...
Article
Full-text available
Weakly supervised semantic segmentation (WSSS) commonly relies on Class Activation Mapping (CAM) to produce pseudo semantic labels using image-level annotations. However, because CAM maps often form sparse object regions with poor boundaries, they cannot provide sufficient segmentation supervision. Because off-the-shelf saliency maps can provide ri...
Article
Background: Cardiac exercise stress testing (EST) offers a non-invasive approach to the management of patients with suspected coronary artery disease (CAD). However, up to 30% of EST results are either inconclusive or non-diagnostic, which results in significant resource wastage. Our aim was to build machine learning (ML) based models, using patients demo...
Article
Background and objective: The generation of three-dimensional (3D) medical images has great application potential since it takes into account the 3D anatomical structure. Two problems prevent effective training of a 3D medical generative model: (1) 3D medical images are expensive to acquire and annotate, resulting in an insufficient number of trai...
Article
Recently, transformers have achieved great success in a number of computer vision tasks due to their excellent ability to capture long-range feature dependencies. In contrast, convolutional neural networks (CNNs) are good at extracting local features. Given that the capture of short- and long-range band dependencies are both important for hyperspec...
Article
Full-text available
Narrow-leafed lupin (Lupinus angustifolius) is an important dryland crop, providing a protein source in global grain markets. While agronomic practices have successfully controlled many dicot weeds among narrow-leafed lupins, the closely related sandplain lupin (Lupinus cosentinii) has proven difficult to control, reducing yield and harvest quality...
Preprint
Full-text available
This study investigates the effectiveness of Explainable Artificial Intelligence (XAI) techniques in predicting suicide risks and identifying the dominant causes for such behaviours. Data augmentation techniques and ML models are utilized to predict the associated risk. Furthermore, SHapley Additive exPlanations (SHAP) and correlation analysis are...
Article
Full-text available
In the last two decades, baggage scanning has become one of the prime aviation security concerns worldwide. Manual screening of baggage items is a tedious and error-prone process that also compromises privacy. Hence, many researchers have developed X-ray imagery-based autonomous systems to address these shortcomings. This paper presents a casc...
Chapter
Word sense embeddings are vector representations of polysemous words – words with multiple meanings. These induced sense embeddings, however, do not necessarily correspond to any dictionary senses of the word. To overcome this, we propose a method to find new sense embeddings with known meaning. We term this method refitting, as the new embedding i...
Preprint
Full-text available
Camera pose estimation has long relied on geometry-based approaches and sparse 2D-3D keypoint correspondences. With the advent of deep learning methods, the estimation of camera pose parameters (i.e., the six parameters that describe position and rotation) has decreased from tens of meters to a few centimeters in median error for indoor applicati...
Article
Full-text available
In the past few years, deep learning-based models have been very successful in achieving state-of-the-art results in many tasks in computer vision, speech recognition, and natural language processing. These models seem to be a natural fit for handling the ever-increasing scale of biometric recognition problems, from cellphone authentication to airp...
Preprint
Full-text available
Introduction: Millions of patients attend emergency departments (EDs) around the world every year. Patients are triaged on arrival by a trained nurse who collects structured data and an unstructured free-text history of presenting complaint. Natural language processing (NLP) uses various computational methods to analyse and understand human language...
