Walter J. Scheirer’s research while affiliated with Université Notre Dame d'Haïti and other places


Publications (250)


Adaptive self-supervised vision transformers for multisensor automatic target recognition
  • Conference Paper

May 2025

Sophia Abraham · Suya You · Jonathan Hauenstein · Walter Scheirer

Cognitive Guardrails for Open-World Decision Making in Autonomous Drone Swarms

May 2025

Jane Cleland-Huang · Pedro Antonio Alarcon Granadeno · Arturo Miguel Russell Bernal · [...] · Walter Scheirer

Small Uncrewed Aerial Systems (sUAS) are increasingly deployed as autonomous swarms in search-and-rescue and other disaster-response scenarios. In these settings, they use computer vision (CV) to detect objects of interest and autonomously adapt their missions. However, traditional CV systems often struggle to recognize unfamiliar objects in open-world environments or to infer their relevance for mission planning. To address this, we incorporate large language models (LLMs) to reason about detected objects and their implications. While LLMs can offer valuable insights, they are also prone to hallucinations and may produce incorrect, misleading, or unsafe recommendations. To ensure safe and sensible decision-making under uncertainty, high-level decisions must be governed by cognitive guardrails. This article presents the design, simulation, and real-world integration of these guardrails for sUAS swarms in search-and-rescue missions.



Visual narratives and political instability: a case study of visual media prior to the Russia-Ukraine conflict

April 2025 · 20 Reads


Novelty is a significant confounding factor affecting the safety of autonomous driving. In (A), smartphones, which emerged in the late 2000s, represent a type of novelty that fundamentally changed pedestrian behavior. In (B), safely navigating around existing traffic accidents has been a significant challenge for Tesla's Autopilot advanced driver assistance system, likely due to the rarity of such incidents in the training data. Further, these events often involve unique visual appearances on a per‐instance basis, resulting in novel interactions between vehicles, victims, bystanders, and first responders. (“September 26, 2007 accident, highway 9, CT” by Ragesoss is licensed under CC BY‐SA 3.0).
Overview of the eight open issues in open‐world learning. Each issue falls under one of the three primary categories: (1) Theory of Novelty, (2) Agent Design, and (3) Agent Evaluation.
Open issues in open world learning
  • Article
  • Full-text available

April 2025 · 15 Reads

Meaningful progress has been made in open world learning (OWL), enhancing the ability of agents to detect, characterize, and incrementally learn novelty in dynamic environments. However, novelty remains a persistent challenge for agents relying on state‐of‐the‐art learning algorithms. This article considers the current state of OWL, drawing on insights from a recent DARPA research program on this topic. We identify open issues that impede further advancements spanning theory, design, and evaluation. In particular, we emphasize the challenges posed by dynamic scenarios that are crucial to understand for ensuring the viability of agents designed for real‐world environments. The article provides suggestions for setting a new research agenda that effectively addresses these open issues.

HomOpt: A Flexible Homotopy-Based Hyperparameter Optimization Method

February 2025 · 3 Reads

Over the past few decades, machine learning has made remarkable strides, owed largely to algorithmic advancements and the abundance of high-quality, large-scale datasets. However, an equally crucial aspect in achieving optimal model performance is the fine-tuning of hyperparameters. Despite its significance, hyperparameter optimization (HPO) remains challenging due to several factors. Many existing HPO techniques rely on simplistic search methods or assume smooth and continuous loss functions, which may not always hold true. Traditional methods like grid search and Bayesian optimization often struggle to adapt swiftly and efficiently navigate the loss landscape. Moreover, the search space for HPO is frequently high-dimensional and non-convex, posing challenges in efficiently finding a global minimum. Additionally, optimal hyperparameters can vary significantly based on the dataset or task at hand, further complicating the optimization process. To address these challenges, this paper presents HomOpt, an advanced HPO methodology that integrates a surrogate model framework with homotopy optimization techniques. Unlike rigid methodologies, HomOpt offers flexibility by incorporating diverse surrogate models tailored to specific optimization tasks. Our initial investigation focuses on leveraging Generalized Additive Model (GAM) surrogates within the HomOpt framework to enhance the effectiveness of existing optimization methodologies. HomOpt's ability to expedite convergence towards optimal solutions across varied domain spaces, encompassing continuous, discrete, and categorical domains is highlighted. We conduct a comparative analysis of HomOpt applied to multiple optimization techniques (e.g., Random Search, TPE, Bayes, and SMAC), demonstrating improved objective performance on numerous standardized machine learning benchmarks and challenging open-set recognition tasks. 
We also integrate CatBoost within the HomOpt framework as a surrogate, showcasing its adaptability and effectiveness in handling more complex datasets. This integration facilitates an evaluation against state-of-the-art methods such as BOHB, particularly on challenging computer vision datasets like CIFAR-10 and ImageNet. Comparative analyses reveal HomOpt's competitive performance with reduced iterations and underscore potential optimizations in execution time. All the experimentation and method code can be found here: https://github.com/sabraha2/HOMOPT



Participant Demographics
The Challenges of Bringing Religious and Philosophical Values Into Design

February 2025 · 8 Reads

HCI is increasingly taking inspiration from philosophical and religious traditions as a basis for ethical technology designs. If these values are to be incorporated into real-world designs, there may be challenges when designers work with values unfamiliar to them. Therefore, we investigate the variance in interpretations when values are translated to technology designs. To do so, we identified social media designs that embodied the main principles of Catholic Social Teaching (CST). We then interviewed 24 technology experts with varying levels of familiarity with CST to assess their understanding of how those values would manifest in a technology design. We found that familiarity with CST did not impact participant responses: there were clear patterns in how all participant responses differed from the values we determined the designs embodied. We propose that value experts be included in the design process to more effectively create designs that embody particular values.


Figure 1: The SGSM computational pipeline that processes two texts $T_1$ and $T_2$ by first passing them through the BERT model $M$. The output of the model $M$ produces a feature set $f_i$ for every text passage $t_i$. Each of these feature sets $f_i$ is a set of story grammar labels $\ell$, which ends the labeling stage. Lastly, the matching stage compares each text passage $t_i$ from the source text $T_1$ against every text passage $t_i$ from the target text $T_2$ by calculating their Levenshtein distance, which is the metric $\mu$. The final output is a set of all possible text passage pairs $S$ with their respective Levenshtein distance scores.
Story Grammar Semantic Matching for Literary Study

February 2025 · 6 Reads

In Natural Language Processing (NLP), semantic matching algorithms have traditionally relied on the feature of word co-occurrence to measure semantic similarity. While this feature approach has proven valuable in many contexts, its simplistic nature limits its analytical and explanatory power when used to understand literary texts. To address these limitations, we propose a more transparent approach that makes use of story structure and related elements. Using a BERT language model pipeline, we label prose and epic poetry with story element labels and perform semantic matching by only considering these labels as features. This new method, Story Grammar Semantic Matching, guides literary scholars to allusions and other semantic similarities across texts in a way that allows for characterizing patterns and literary technique.
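The matching stage described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the labeling stage has already run and that each passage is represented as an ordered list of story grammar labels. The label names and the functions `levenshtein` and `match_passages` are hypothetical.

```python
from itertools import product


def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x from a
                curr[j - 1] + 1,         # insert y into a
                prev[j - 1] + (x != y),  # substitute x with y (free if equal)
            ))
        prev = curr
    return prev[-1]


def match_passages(source_labels, target_labels):
    """Score every (source, target) passage pair by label-sequence distance.

    Returns (source_index, target_index, distance) tuples, closest pairs first.
    """
    pairs = [
        (i, j, levenshtein(s, t))
        for (i, s), (j, t) in product(enumerate(source_labels),
                                      enumerate(target_labels))
    ]
    return sorted(pairs, key=lambda p: p[2])


# Hypothetical label sequences standing in for the output of the labeling stage.
source = [["setting", "character", "event"]]
target = [["setting", "character", "event"], ["event", "resolution"]]
ranked = match_passages(source, target)
```

Because only the labels are compared, two passages with no words in common can still score as close matches when their story structure is the same.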


Fig. 3. Qualitative comparison of Single-Objective and Multi-Objective models under perturbations. Blur (severity = 0.3), Gaussian Noise (severity = 0.1), and Occlusion (severity = 0.5) are applied to input images (first column). The Single-Objective model produces fragmented and inaccurate segmentations, especially in occluded and blurred regions. In contrast, the Multi-Objective model exhibits greater robustness, preserving facial structure despite degradations, with improved stability under occlusion.
Fig. 6. Impact of Segmentation Maps on GAN-Based Face Synthesis. Segmentation maps from a Single-Objective U-Net and Multi-Objective U-Nets (Linear, Sigmoid, Piecewise) serve as inputs to a Pix2Pix GAN. Single-objective segmentation introduces inconsistencies, distorting facial details. In contrast, multi-objective segmentation improves structural coherence, yielding more natural and perceptually accurate face synthesis.
Fig. 7. Effect of Segmentation Quality on Diffusion-Based Synthesis After One Epoch of Fine-Tuning. The top row shows images generated using segmentation maps from the Single-Objective U-Net, while the bottom row corresponds to the Multi-Objective (Linear) U-Net. Despite being finetuned for just one epoch, the Multi-Objective model produces structurally coherent and visually consistent images, reducing artifacts and distortions in facial features. In contrast, the Single-Objective model exhibits irregular textures and geometric inconsistencies.
Towards Fair and Robust Face Parsing for Generative AI: A Multi-Objective Approach

February 2025 · 10 Reads

Face parsing is a fundamental task in computer vision, enabling applications such as identity verification, facial editing, and controllable image synthesis. However, existing face parsing models often lack fairness and robustness, leading to biased segmentation across demographic groups and errors under occlusions, noise, and domain shifts. These limitations affect downstream face synthesis, where segmentation biases can degrade generative model outputs. We propose a multi-objective learning framework that optimizes accuracy, fairness, and robustness in face parsing. Our approach introduces a homotopy-based loss function that dynamically adjusts the importance of these objectives during training. To evaluate its impact, we compare multi-objective and single-objective U-Net models in a GAN-based face synthesis pipeline (Pix2PixHD). Our results show that fairness-aware and robust segmentation improves photorealism and consistency in face generation. Additionally, we conduct preliminary experiments using ControlNet, a structured conditioning model for diffusion-based synthesis, to explore how segmentation quality influences guided image generation. Our findings demonstrate that multi-objective face parsing improves demographic consistency and robustness, leading to higher-quality GAN-based synthesis.
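One way to picture a homotopy-based loss is as a blend between a starting objective and a target objective, controlled by a weight that moves from 0 to 1 over training. The sketch below is only an illustration under stated assumptions: the abstract does not give the loss formula, so the blend `(1 - lam) * accuracy + lam * combined` and the schedule parameters (`k`, `start`, `end`) are assumptions. The schedule names mirror the Linear, Sigmoid, and Piecewise variants mentioned in the figure captions.

```python
import math


def linear(t):
    """Weight grows in proportion to training progress t in [0, 1]."""
    return t


def sigmoid(t, k=10.0):
    """Smooth S-shaped transition centered at t = 0.5 (k is an assumed steepness)."""
    return 1.0 / (1.0 + math.exp(-k * (t - 0.5)))


def piecewise(t, start=0.3, end=0.7):
    """Hold at 0 until `start`, ramp linearly, hold at 1 after `end` (assumed breakpoints)."""
    if t <= start:
        return 0.0
    if t >= end:
        return 1.0
    return (t - start) / (end - start)


def homotopy_loss(acc_loss, fair_loss, robust_loss, t, schedule=linear):
    """Blend an accuracy-only loss into an equally weighted three-objective loss.

    t is training progress in [0, 1]; values outside that range are clamped.
    """
    lam = schedule(min(max(t, 0.0), 1.0))
    combined = (acc_loss + fair_loss + robust_loss) / 3.0
    return (1.0 - lam) * acc_loss + lam * combined
```

At `t = 0` the model trains on accuracy alone, and by `t = 1` all three objectives contribute equally; the schedule decides how quickly the fairness and robustness terms are phased in.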


Citations (47)


... The activity domains consisted of Angry Birds, Monopoly (Kejriwal and Thomas 2021), CartPole, VizDoom (Wydmuch, Kempka, and Jaśkowski 2018), Polycraft (Goss et al. 2023), and CARLA (Dosovitskiy et al. 2017). The perceptual domains consisted of video activity recognition (Prijatelj et al. 2024), natural language text identification, and image recognition (Kumar et al. 2021). ...

Reference:

Open issues in open world learning
Human Activity Recognition in an Open World
  • Citing Article
  • December 2024

Journal of Artificial Intelligence Research

... Moving forward, researchers use knowledge related to brain functions and combine machine learning with high-throughput behavioral optogenetics to stimulate very precise brain areas. They have found that the nature and magnitude of hallucinations experienced by macaques highly depend on concurrent visual input, the location of brain stimulation, and the intensity of the stimulation (Shahbazi et al., 2024). Some would argue that current neural network architectures and their connection to the human brain are fleeting given that understanding of the dynamic processes in the brain has changed and the artificial intelligence community has understandably prioritized complex optimization-based architectures over [...] • the use of convolution operations in CNNs is a consequence of our notion of how simple cells in V1 perform linear filtering, by "calculating" the weighted sum of their inputs, with weights defined by the receptive field profiles (Hubel and Wiesel, 1959, 1962), • filtering kernels in CNNs' early layers often converge (during CNN's training) to Gabor wavelets (Gabor, 1946), which were found as good models (in a least squares sense) of the receptive field profiles in simple cells in visual cortex (Daugman, 1985), ...

Perceptography unveils the causal contribution of inferior temporal cortex to visual perception

... Human vision has shown significant robustness to occlusion [83,98] and outperformed CV models on aerial detection of persons during search and rescue scenarios [60]. With previous research having successfully incorporated human perception into CV models in a variety of manners [31,34,83], we addressed the challenge of person aerial Figure 1. Development of Psych-ER, our behavioral dataset for Emergency Response (ER) aerial search, and its derived psychophysical loss. ...

Informing Machine Perception With Psychophysics
  • Citing Article
  • February 2024

Proceedings of the IEEE

... To evaluate our theoretical framework, we conduct an empirical study using two influential prototype-based architectures: ProtoVAE [27] and PrototypeDNN [51] (hereinafter ProtoDNN, for simplicity). These models are particularly well-suited to our analysis, as they define class-level explanations in terms of entire prototypes, rather than relying on fine-grained, local features that explain only part of the input [17,35,16]. We trained both architectures on the MNIST and Fashion-MNIST datasets, and for an increasing number of prototypes: $|S| \in \{10, 20, 50, 100\}$. ...

Pixel-Grounded Prototypical Part Networks
  • Citing Conference Paper
  • January 2024

... For example, in earthquake and flood scenarios, UAVs have been used to map damaged areas and locate survivors more rapidly compared to traditional ground-based methods [17]. The use of UAVs helps to overcome physical barriers that often hinder rescue teams, as demonstrated in various studies [18]. ...

NOMAD: A Natural, Occluded, Multi-scale Aerial Dataset, for Emergency Response Scenarios
  • Citing Conference Paper
  • January 2024

... A more sophisticated approach considers hyperparameter tuning as an optimization problem, aiming to find the function that maximizes model performance or minimizes errors [13]. Examples of such techniques include Bayesian optimization [10], gradient-based optimization [9], genetic algorithms [5], and surrogate models [14], [15]. However, these techniques can be challenging for inexperienced modelers and often require a large number of runs or substantial memory resources. ...

NCQS: Nonlinear Convex Quadrature Surrogate Hyperparameter Optimization
  • Citing Conference Paper
  • October 2023

... For example, novel methods for approximating Shapley values or integrating gradient-based techniques with other interpretability strategies could be explored. Another important direction is the refinement of post-hoc methods to capture higher-order interactions between input features [105]. Current methods typically focus on the individual importance of each feature, but many decision-making processes in large models involve complex interactions between multiple features. ...

Novelty in Image Classification
  • Citing Chapter
  • August 2023

Synthesis Lectures on Computer Vision

... HCR field has been studied extensively for more than five decades [1,7,25,28,31,52,53,56,65,74,85,93,94]. Earlier, the focus of research was mainly on developing feature extraction techniques and applying different classification techniques for recognition. ...

Novelty in Handwriting Recognition
  • Citing Chapter
  • August 2023

Synthesis Lectures on Computer Vision

... Approaches include approximating a complex, biased model with an ostensibly fair, interpretable one [Aivodji et al., 2019], designing an unfair model that switches to generating 'fair' predictions when being audited, akin to automotive manufacturers cheating emissions tests [Slack et al., 2020], or performing biassed sampling of the data points used to compute the SHAP values [Laberge et al., 2023a]. Recently, authors have proposed methods to detect and thwart such attacks , Carmichael and Scheirer, 2022. However, the present paper focusses on a higher level threat: the risk of (misguided) AutoML. ...

Unfooling Perturbation-Based Post Hoc Explainers
  • Citing Article
  • June 2023

Proceedings of the AAAI Conference on Artificial Intelligence

... In support of the thermal ATR task, Abraham et al. [33] improve the performance of YOLOv5 models on the DSIAC dataset via a novel homotopy-based hyperparameter optimization algorithm. Leveraging the visible imagery of the DSIAC dataset, VS et al. [34] propose a metalearning strategy for unsupervised domain adaptation in thermal ATR. ...

Efficient hyperparameter optimization for ATR using homotopy parametrization
  • Citing Conference Paper
  • June 2023