Oliver Deussen’s research while affiliated with University of Konstanz and other places


Publications (340)


Schematic of eye tracking and retinal view reconstruction of freely swimming schooling fish in 3D
The pipeline comprises three main modules: 3D posture reconstruction based on DeepShapeKit (a), eye tracking (b), and retinal view reconstruction (c).
Illustration of the three steps involved in retinal view reconstruction based on eye movement tracking
First step (yellow): position a camera at the tracked eye position in 3D. Second step (green): rotate the camera to align with the fish’s view direction. Final step (pink): configure the camera parameters based on the structure of the fish’s eye.
Evaluation of the eye tracking
a The pipeline of manually labeling eye positions after perspective transformation and running eye tracking for comparison. b–d Probability distribution functions (PDFs) of the ground-truth and detected eye view angle, pupil position, and eye position. e–g Deviations between the ground truth and the eye-detection outputs. h The pipeline of generating synthesized data as ground truth and running eye tracking for comparison. i–k PDFs of the ground-truth and detected eye view angle, pupil position, and eye position. l–n Deviations between the ground truth and the eye-detection outputs. Deviations are scaled by the average eye diameter (6.75 mm).
Eye tracking of two swimming goldfish exhibiting leader-follower behavior
a Selected pairs of goldfish within swimming groups. (i) Two-fish relationship from the bottom view. (ii) Front-fish position density map on a 3D sphere; all 45,499 pairs of data are binned in 12-degree increments, with a radius of 17.5 cm. (iii) Front-fish position heat map viewed from the bottom, showing the relative positions of the front fish from the perspective of the following fish looking upwards; each point represents the position of the front fish, with the following fish fixed at the center, and the x and y axes span ±50 cm. (iv) Front-fish position heat map viewed from the front, presenting the relative positions of the front fish from the perspective of the following fish facing left, with the same layout and axis range, illustrating the positioning patterns during the interaction. b The correlation between the normalised eye movements, ranging from −0.4 (rightmost position) to 0.4 (leftmost position), and the relative position angles.
Application of retinal view reconstruction in two swimming goldfish
a The follower’s retinal view of the front individual with tracked eye movements (i), a static eye (ii), and a randomly moved eye (iii). b The position of the front individual on the follower’s retina along the x-axis (i) and y-axis (ii), and the corresponding variances (iii and iv) as determined by bootstrap analysis.
Non-invasive eye tracking and retinal view reconstruction in free swimming schooling fish
  • Article
  • Full-text available

December 2024 · 80 Reads · Communications Biology · Oliver Deussen

Eye tracking has emerged as a key method for understanding how animals process visual information, identifying crucial elements of perception and attention. Traditional fish eye tracking often alters animal behavior due to invasive techniques, while non-invasive methods are limited to either 2D tracking or require restraining animals after training. Our study introduces a non-invasive technique for tracking and reconstructing the retinal view of free-swimming fish in a large 3D arena without behavioral training. Using 3D fish body meshes reconstructed by DeepShapeKit, our method integrates multiple camera angles, deep learning for 3D fish posture reconstruction, perspective transformation, and eye tracking. We evaluated our approach using data from two fish swimming in a flow tank, captured from two perpendicular viewpoints, and validated its accuracy using human-labeled and synthesized ground truth data. Our analysis of eye movements and retinal view reconstruction within leader-follower schooling behavior reveals that fish exhibit negatively synchronised eye movements and focus on neighbors centered in the retinal view. These findings are consistent with previous studies on schooling fish, providing further, indirect validation of our method. Our approach offers new insights into animal attention in naturalistic settings and potentially has broader implications for studying collective behavior and advancing swarm robotics.
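The three-step retinal view reconstruction outlined in the figure caption above (place a virtual camera at the tracked eye position, rotate it to the fish's view direction, then set eye-like camera parameters) can be made concrete with a minimal numpy sketch. This is an illustration under assumptions, not the paper's implementation: the camera convention, function names, and the wide field-of-view value are all assumed.

```python
# Minimal sketch of the three camera steps (assumptions: right-handed camera,
# z-up world, and a 160-degree field of view standing in for a fish eye).
import numpy as np

def look_at_rotation(view_dir, up=np.array([0.0, 0.0, 1.0])):
    """Rotation whose forward axis is the fish's view direction.
    Assumes view_dir is not parallel to `up`."""
    f = view_dir / np.linalg.norm(view_dir)        # forward
    r = np.cross(f, up); r /= np.linalg.norm(r)    # right
    u = np.cross(r, f)                             # true up
    return np.stack([r, u, f])                     # rows: right, up, forward

def retinal_camera(eye_pos, view_dir, fov_deg=160.0):
    """Step 1: place the camera at the tracked eye position.
    Step 2: rotate it to the fish's view direction.
    Step 3: configure eye-like parameters (wide field of view)."""
    R = look_at_rotation(view_dir)
    t = -R @ eye_pos                               # world -> camera translation
    return {"R": R, "t": t, "fov_deg": fov_deg}

def to_retina(point_world, cam):
    """Project a world point into normalized retinal coordinates."""
    p = cam["R"] @ point_world + cam["t"]
    x, y = p[0] / p[2], p[1] / p[2]                # pinhole projection
    scale = np.tan(np.radians(cam["fov_deg"]) / 2)
    return np.array([x, y]) / scale                # in [-1, 1] inside the FOV
```

A point on a neighboring fish could then be mapped into the follower's retinal coordinates via to_retina, which is the kind of quantity the leader-follower retinal-position analysis above is built on.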



Figure 6. Qualitative comparison with baseline methods on various editing tasks. Our results demonstrate high alignment with the text guidance while keeping consistency with the reference image.
HeadRouter: A Training-free Image Editing Framework for MM-DiTs by Adaptively Routing Attention Heads

November 2024 · 4 Reads

Diffusion Transformers (DiTs) have exhibited robust capabilities in image generation tasks. However, accurate text-guided image editing for multimodal DiTs (MM-DiTs) still poses a significant challenge. Unlike UNet-based structures, which can utilize self-/cross-attention maps for semantic editing, MM-DiTs inherently lack support for explicitly and consistently incorporating text guidance, resulting in semantic misalignment between the edited results and the texts. In this study, we disclose the sensitivity of different attention heads to different image semantics within MM-DiTs and introduce HeadRouter, a training-free image editing framework that edits the source image by adaptively routing the text guidance to different attention heads in MM-DiTs. Furthermore, we present a dual-token refinement module to refine text/image token representations for precise semantic guidance and accurate region expression. Experimental results on multiple benchmarks demonstrate HeadRouter's performance in terms of editing fidelity and image quality.
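The core routing idea can be sketched in a few lines: measure how sensitive each attention head is to the edit semantics, then reweight each head's text-guided contribution accordingly. The following PyTorch sketch is an illustration under assumptions, not HeadRouter's actual implementation; all tensor layouts and function names are assumed.

```python
# Hedged sketch: route text guidance to attention heads by their sensitivity.
import torch

def head_sensitivity(attn_maps: torch.Tensor, edit_token_idx: int) -> torch.Tensor:
    # attn_maps: (heads, image_tokens, text_tokens) attention of one MM-DiT block.
    # A head's sensitivity = average attention mass it assigns to the edit token.
    return attn_maps[:, :, edit_token_idx].mean(dim=1)          # (heads,)

def route_text_guidance(head_outputs: torch.Tensor,
                        sensitivity: torch.Tensor,
                        temperature: float = 1.0) -> torch.Tensor:
    # head_outputs: (heads, image_tokens, dim) text-conditioned per-head outputs.
    # Softmax the sensitivities into routing weights and rescale each head.
    weights = torch.softmax(sensitivity / temperature, dim=0)   # (heads,)
    return head_outputs * weights[:, None, None]                # broadcast over tokens/dim
```

The design point is that routing happens per head rather than per layer, reflecting the paper's observation that individual heads specialize in different image semantics.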


Newton Losses: Using Curvature Information for Learning with Differentiable Algorithms

October 2024 · 12 Reads

When training neural networks with custom objectives, such as ranking losses and shortest-path losses, a common problem is that they are, per se, non-differentiable. A popular approach is to continuously relax the objectives to provide gradients, enabling learning. However, such differentiable relaxations are often non-convex and can exhibit vanishing and exploding gradients, making them (already in isolation) hard to optimize. In these cases, the loss function poses the bottleneck when training a deep neural network. We present Newton Losses, a method for improving the performance of existing hard-to-optimize losses by exploiting their second-order information via their empirical Fisher and Hessian matrices. Instead of training the neural network with second-order techniques, we only utilize the loss function's second-order information to replace it with a Newton Loss, while training the network with gradient descent. This makes our method computationally efficient. We apply Newton Losses to eight differentiable algorithms for sorting and shortest paths, achieving significant improvements for less-optimized differentiable algorithms and consistent improvements even for well-optimized ones.
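To make the construction concrete: for a hard-to-optimize loss l(z) of network outputs z, one can compute a regularized Newton step z* on l and train the network toward z* with a plain quadratic loss, so that only the loss (not the network) uses second-order information. The following PyTorch sketch is an illustration under assumptions, not the authors' code; z is assumed to be a 1-D output vector for one sample.

```python
# Hedged sketch of the Newton Losses idea.
import torch

def newton_target(loss_fn, z, tikhonov=1e-3):
    z = z.detach().requires_grad_(True)
    grad = torch.autograd.grad(loss_fn(z), z)[0]           # first-order info at z
    hess = torch.autograd.functional.hessian(loss_fn, z)   # second-order info
    # (per the abstract, the empirical Fisher can stand in for the Hessian)
    hess = hess + tikhonov * torch.eye(z.numel())          # Tikhonov regularization
    return (z - torch.linalg.solve(hess, grad)).detach()   # Newton step z*

# Training step then minimizes a simple quadratic instead of loss_fn itself:
#   z = network(x)
#   loss = 0.5 * ((z - newton_target(loss_fn, z)) ** 2).sum()
```

Because z* is detached, the network itself is still trained by ordinary gradient descent, which is what keeps the method computationally efficient.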




A Comprehensive Evaluation of Arbitrary Image Style Transfer Methods

September 2024 · 13 Reads · 1 Citation · IEEE Transactions on Visualization and Computer Graphics

Despite the remarkable progress in the field of arbitrary image style transfer (AST), inconsistent evaluation continues to plague style transfer research. Existing methods often suffer from limited objective evaluation and inconsistent subjective feedback, hindering reliable comparisons among AST variants. In this study, we propose a multi-granularity assessment system that combines standardized objective and subjective evaluations. We collect a fine-grained dataset considering a range of image contexts such as different scenes, object complexities, and rich parsing information from multiple sources. Objective and subjective studies are conducted using the collected dataset. Specifically, we innovate on traditional subjective studies by developing an online evaluation system utilizing a combination of point-wise, pair-wise, and group-wise questionnaires. Finally, we bridge the gap between objective and subjective evaluations by examining the consistency between the results of the two studies. We experimentally evaluate CNN-based, flow-based, transformer-based, and diffusion-based AST methods with the proposed multi-granularity assessment system, which lays the foundation for a reliable and robust evaluation. Providing standardized measures, objective data, and detailed subjective feedback empowers researchers to make informed comparisons and drive innovation in this rapidly evolving field. The collected dataset and our online evaluation system are available at http://ivc.ia.ac.cn.
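One simple way such objective/subjective consistency can be quantified is a rank correlation between per-method objective scores and per-method subjective ratings. The sketch below is illustrative only (the placeholder numbers are made up, and the paper's actual analysis may use different statistics):

```python
# Illustrative consistency check between objective and subjective AST scores.
import numpy as np
from scipy.stats import spearmanr

objective = np.array([0.71, 0.64, 0.58, 0.80, 0.45])    # placeholder objective metric per method
subjective = np.array([3.9, 3.5, 3.1, 4.2, 2.6])        # placeholder mean subjective rating per method

rho, p_value = spearmanr(objective, subjective)          # rank correlation between the two studies
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")   # high rho => the rankings agree
```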


UADAPy: An Uncertainty-Aware Visualization and Analysis Toolbox

September 2024 · 14 Reads

Current research provides methods to communicate uncertainty and adapts classical algorithms of the visualization pipeline to take uncertainty into account. Various existing visualization frameworks include methods to present uncertain data but do not offer transformation techniques tailored to uncertain data. Therefore, we propose a software package for uncertainty-aware data analysis in Python (UADAPy), offering methods for uncertain data along the visualization pipeline. We aim to provide a platform that serves as a foundation for the further integration of uncertainty algorithms and visualizations. It provides common utility functionality to support research in uncertainty-aware visualization algorithms and makes state-of-the-art research results accessible to end users. The project is available at https://github.com/UniStuttgart-VISUS/uadapy.
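As a generic sketch of what an uncertainty-aware transformation along the visualization pipeline can look like, the code below propagates per-point Gaussian uncertainty through a fitted projection by Monte Carlo sampling. It illustrates the idea only and is emphatically NOT UADAPy's actual API; all names are assumed.

```python
# Generic sketch: Monte Carlo propagation of uncertainty through a projection.
import numpy as np

def propagate_uncertainty(means, covs, project, n_samples=100, seed=0):
    # means: (n, d) point estimates; covs: (n, d, d) per-point covariances.
    # project: any fitted map from (m, d) arrays to (m, k) arrays, e.g. a PCA.
    rng = np.random.default_rng(seed)
    projected = np.stack([
        project(rng.multivariate_normal(mu, cov, size=n_samples))  # (s, k)
        for mu, cov in zip(means, covs)
    ])                                                             # (n, s, k)
    return projected.mean(axis=1), projected.var(axis=1)           # per-point mean/variance
```

The returned per-point variances in projection space are exactly the kind of quantity an uncertainty-aware scatterplot would then encode, e.g., as ellipses.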



3D-MuPPET: 3D Multi-Pigeon Pose Estimation and Tracking

May 2024 · 145 Reads · 12 Citations · International Journal of Computer Vision

Markerless methods for animal posture tracking have been developing rapidly in recent years, but frameworks and benchmarks for tracking large animal groups in 3D are still lacking. To overcome this gap in the literature, we present 3D-MuPPET, a framework to estimate and track 3D poses of up to 10 pigeons at interactive speed using multiple camera views. We train a pose estimator to infer 2D keypoints and bounding boxes of multiple pigeons, then triangulate the keypoints to 3D. For identity matching of individuals across all views, we first dynamically match 2D detections to global identities in the first frame, then use a 2D tracker to maintain IDs across views in subsequent frames. We achieve accuracy comparable to a state-of-the-art 3D pose estimator in terms of median error and Percentage of Correct Keypoints. Additionally, we benchmark the inference speed of 3D-MuPPET, with up to 9.45 fps in 2D and 1.89 fps in 3D, and perform a quantitative tracking evaluation, which yields encouraging results. Finally, we showcase two novel applications of 3D-MuPPET. First, we train a model with data of single pigeons and achieve comparable results in 2D and 3D posture estimation for up to 5 pigeons. Second, we show that 3D-MuPPET also works outdoors without additional annotations from natural environments. Both use cases simplify the domain shift to new species and environments, greatly reducing the annotation effort needed for 3D posture tracking. To the best of our knowledge, we are the first to present a framework for 2D/3D animal posture and trajectory tracking that works in both indoor and outdoor environments for up to 10 individuals. We hope that the framework can open up new opportunities for studying animal collective behaviour and encourage further developments in 3D multi-animal posture tracking.
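The triangulation step named in the abstract, lifting matched 2D keypoints from multiple calibrated views to 3D, is commonly done with linear (DLT) triangulation, sketched below. Function names are assumptions and 3D-MuPPET's actual pipeline may differ in detail, but the math shown is the standard multi-view construction.

```python
# Hedged sketch of multi-view keypoint triangulation via DLT.
import numpy as np

def triangulate_point(projection_mats, points_2d):
    """projection_mats: list of 3x4 camera matrices P_i.
    points_2d: list of (x, y) detections of the same keypoint in each view.
    Solves A X = 0 for the homogeneous 3D point X via SVD."""
    rows = []
    for P, (x, y) in zip(projection_mats, points_2d):
        rows.append(x * P[2] - P[0])    # each view contributes two equations
        rows.append(y * P[2] - P[1])
    A = np.stack(rows)
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                          # null-space direction of A
    return X[:3] / X[3]                 # dehomogenize to a 3D point
```

Running this once per keypoint and per pigeon, after the cross-view identity matching described above, yields the per-individual 3D postures that the tracker then links over time.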


Citations (69)


... The use of multimodal data for content generation has become a central theme in recent research. Several studies [24, 45] investigate music generation synchronized with video, particularly for dance videos, by extracting rhythmic features from human motion. However, such human-centric approaches are not generalizable to the wide variety of videos found online. ...

Reference:

VidMusician: Video-to-Music Generation with Semantic-Rhythmic Alignment via Hierarchical Visual Features
Dance-to-Music Generation with Encoder-based Textual Inversion
  • Citing Conference Paper
  • December 2024

... This external validation ensures that the resulting clusters are relevant to other information or categories in the dataset: 1. A table of the nutritional-status distribution for each cluster. 2. A heatmap to help visualize the proportion of nutritional statuses in each cluster [21]. ...

The Categorical Data Map: A Multidimensional Scaling-Based Approach
  • Citing Conference Paper
  • October 2024

... In the field of visual generation, there has been a significant increase in evaluation studies over the past years, especially in text-to-image generation. Currently, static evaluation [1], [15], [18]–[20], [31], [33], [36], [38], [41] is the predominant method: it employs a one-time assessment using pre-collected datasets, which can show the generalization ability of the model on specific data. Firstly, evaluators determine testing perspectives such as concept conjunction [23] and spatial relationships [35]. ...

A Comprehensive Evaluation of Arbitrary Image Style Transfer Methods
  • Citing Article
  • September 2024

IEEE Transactions on Visualization and Computer Graphics

... Their work also uses a landmark-based approach, although in a different domain, and complements our methodology in leveraging the environment to refine 3D reconstruction and reduce error; however, our method extends this concept by utilizing Voronoi diagrams for a more robust, context-aware outlier rejection, especially in dynamic aviary settings. Similarly, [21] creates a system, called 3D-MuPPET, that estimates tracks of 3D poses of similar animals such as pigeons, using multi-camera setups in controlled environments. While their work focuses on pose estimation in controlled conditions, our work focuses on outdoor settings and environmental cues to handle harsh conditions and occlusions, aiming for enhanced tracking accuracy. ...

3D-MuPPET: 3D Multi-Pigeon Pose Estimation and Tracking

International Journal of Computer Vision

... Nevertheless, in these previous network pruning works, the training data and testing data were assumed to be i.i.d., and the domain gap was not taken into consideration. More recently, some network pruning methods for cross-domain scenarios have been proposed (Cai et al., 2021; Nguyen et al., 2022; Sun, 2023; Wu et al., 2024). Cai et al. (2021) proposed to search for a subnetwork that can help with multi-source Ki67 image analysis. ...

Lighting Image/Video Style Transfer Methods by Iterative Channel Pruning
  • Citing Conference Paper
  • April 2024

... Human-computer interaction methods:
Ming [59]: Graph, Glyph, Bar chart (EA, FD, IV, NG)
Spinner [60]: Graph, Heatmap (EA, FD, IV)
Strobelt [61]: Heatmap, Bar chart (EA, IV)
Park [62]: Heatmap, Graph, Glyph, Bar chart, Small multiple, Text style (EA, FD, IV)
DeRose [63]: Glyph, Heatmap (EA, FD, IV)
Gao [64]: Heatmap, Graph (EA, FD, IV, NG)
Syed [65]: Heatmap, Text style, Bar chart (EA, FD, IV)
Kahng [66]: Text style, Bar chart (EA, FD, IV)
Arawjo [67]: Text style, Bar chart (EA, FD, IV) ...

generAItor: Tree-in-the-Loop Text Generation for Language Model Explainability and Adaptation
  • Citing Article
  • March 2024

The ACM Transactions on Interactive Intelligent Systems

... It's also crucial to note that fish inhabit a 3D environment, produce 3D fluid dynamics, and experience 3D hydrodynamic interactions. Our recent research leveraged the RoboTwin platform [35], which boasts features such as a flow tank system, recording and tracking functionalities, advanced robotic fish models, a multi-axis position control system, thrust measurement tools, and a particle image velocimetry (PIV) system. Through the combined insights from our CFD and robotic experiments, we can comprehend how fish perceive flow and make corresponding movement decisions. ...

RoboTwin: A Platform to Study Hydrodynamic Interactions in Schooling Fish

IEEE Robotics & Automation Magazine

... DDPM Model Analysis. Recent works [8, 16, 33, 39] analyze the diffusion process to pinpoint the parts responsible for generating various visual aspects. ProSpect [39] examines how varying text conditions at different timesteps impacts aspects like material, artistic style, and content alignment. ...

ProSpect: Prompt Spectrum for Attribute-Aware Personalization of Diffusion Models
  • Citing Article
  • December 2023

ACM Transactions on Graphics

... This prompts us to choose one to simplify the comparison process for experts. We chose the similarity mode because it shows slightly better performance in BHDI and alignment with hierarchical structures, and this mode is more suitable for data analysis in practice [60]. Experts. ...

Reducing Ambiguities in Line-Based Density Plots by Image-Space Colorization

IEEE Transactions on Visualization and Computer Graphics

... jGCaMP8s has recently been used as the calcium reporter in a newly developed platform that combines Ca 2+ imaging and optogenetics in freely moving zebrafish (Chai et al., 2024). SyjGCaMP8m, a variant of GCaMP8 expressed in synaptic terminals, has been used to image bipolar cells in the inner retina of larval zebrafish, in order to investigate the role of amacrine and bipolar cells in the processing of color information (Wang et al., 2023). ...

Reverse engineering the control law for schooling in zebrafish using virtual reality