Daniel Sonntag
Deutsches Forschungszentrum für Künstliche Intelligenz | DFKI · Intelligent User Interfaces

About

232 Publications · 58,627 Reads · 1,879 Citations

Publications (232)
Conference Paper
Full-text available
Constructing a robust model that can effectively generalize to test samples under distribution shifts remains a significant challenge in the field of medical imaging. Foundation models for vision and language, pre-trained on extensive sets of natural image and text data, have emerged as a promising approach. They showcase impressive learning a...
Chapter
Active learning algorithms have become increasingly popular for training models with limited data. However, selecting data for annotation remains a challenging problem due to the limited information available on unseen data. To address this issue, we propose EdgeAL, which utilizes the edge information of unseen images as a priori information for me...
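A minimal sketch of the underlying idea, assuming a plain edge-density score can stand in for the acquisition criterion (the published EdgeAL criterion is more elaborate and also uses model predictions):

```python
# Illustrative sketch, not the EdgeAL implementation: rank unlabeled images by
# an edge-density score and pick the top-k for annotation.
import numpy as np
from scipy import ndimage

def edge_density(image: np.ndarray) -> float:
    """Mean Sobel gradient magnitude, used here as a stand-in acquisition score."""
    gx = ndimage.sobel(image.astype(float), axis=0)
    gy = ndimage.sobel(image.astype(float), axis=1)
    return float(np.mean(np.hypot(gx, gy)))

def select_for_annotation(unlabeled: list[np.ndarray], k: int = 10) -> list[int]:
    """Return indices of the k unlabeled images with the highest edge density."""
    scores = [edge_density(img) for img in unlabeled]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
```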
Chapter
Full-text available
Enhancing the interpretability and consistency of machine learning models is critical to their deployment in real-world applications. Feature attribution methods have gained significant attention, which provide local explanations of model predictions by attributing importance to individual input features. This study examines the generalization of f...
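For illustration, a minimal gradient-times-input attribution in PyTorch; the study compares several attribution methods, so this particular choice and interface are assumptions:

```python
# Minimal gradient-x-input attribution sketch (illustrative only).
import torch

def gradient_x_input(model: torch.nn.Module, x: torch.Tensor, target: int) -> torch.Tensor:
    """Attribute the target logit to input features via gradient * input.
    x is a single preprocessed input with batch dimension, e.g. (1, C, H, W)."""
    x = x.clone().requires_grad_(True)
    logits = model(x)                      # shape: (1, num_classes)
    logits[0, target].backward()           # gradient of the target logit w.r.t. x
    return (x.grad * x).detach()           # elementwise attribution map
```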
Chapter
Dialogue systems are an important and very active research area with many practical applications. However, researchers and practitioners new to the field may have difficulty with the categorisation, number and terminology of existing free and commercial systems. Our paper aims to achieve two main objectives. Firstly, based on our structured literat...
Chapter
Full-text available
AI models that can recognize and understand the semantics of graphical user interfaces (GUIs) enable a variety of use cases ranging from accessibility to automation. Recent efforts in this domain have pursued the development of a set of foundation models: generic GUI understanding models that can be used off-the-shelf to solve a variety of GUI-rela...
Conference Paper
Deep learning methods are well suited for data analysis in several domains, but application is often limited by technical entry barriers and the availability of large annotated datasets. We present an interactive machine learning tool for annotating passive acoustic monitoring datasets created for wildlife monitoring, which are time-consuming and c...
Conference Paper
Full-text available
Biodiversity loss is taking place at accelerated rates globally, and a business-as-usual trajectory will lead to missing internationally established conservation goals. Biosphere reserves are sites designed to be of global significance in terms of both the biodiversity within them and their potential for sustainable development, and are therefore i...
Conference Paper
Within the past few years, the accuracy of deep learning and machine learning models has improved significantly, while less attention has been paid to their responsibility, explainability, and interpretability. eXplainable Artificial Intelligence (XAI) methods, guidelines, concepts, and strategies offer the possibility of model evaluation fo...
Preprint
Full-text available
Active learning algorithms have become increasingly popular for training models with limited data. However, selecting data for annotation remains a challenging problem due to the limited information available on unseen data. To address this issue, we propose EdgeAL, which utilizes the edge information of unseen images as a priori information...
Preprint
Full-text available
Ensuring the trustworthiness and interpretability of machine learning models is critical to their deployment in real-world applications. Feature attribution methods have gained significant attention, which provide local explanations of model predictions by attributing importance to individual input features. This study examines the generalization o...
Article
Collecting large-scale medical datasets with fully annotated samples for training of deep networks is prohibitively expensive, especially for 3D volume data. Recent breakthroughs in self-supervised learning (SSL) offer the ability to overcome the lack of labeled training samples by learning feature representations from unlabeled data. However, most...
Preprint
Full-text available
Obtaining large pre-trained models that can be fine-tuned to new tasks with limited annotated samples has remained an open challenge for medical imaging data. While pre-trained deep networks on ImageNet and vision-language foundation models trained on web-scale data are prevailing approaches, their effectiveness on medical tasks is limited due to t...
Preprint
Full-text available
Interactive machine learning (IML) is a beneficial learning paradigm in cases of limited data availability, as human feedback is incrementally integrated into the training process. In this paper, we present an IML pipeline for image captioning which allows us to incrementally adapt a pre-trained image captioning model to a new data distribution bas...
Preprint
Full-text available
Image Captioning (IC) models can highly benefit from human feedback in the training process, especially in cases where data is limited. We present work-in-progress on adapting an IC system to integrate human feedback, with the goal to make it easily adaptable to user-specific data. Our approach builds on a base IC model pre-trained on the MS COCO d...
Article
Full-text available
Allocentric semantic 3D maps are highly useful for a variety of human–machine interaction related tasks since egocentric viewpoints can be derived by the machine for the human partner. Class labels and map interpretations, however, may differ or could be missing for the participants due to the different perspectives. Particularly, when considering...
Preprint
Full-text available
Deep learning is ubiquitous, but its lack of transparency limits its impact on several potential application areas. We demonstrate a virtual reality tool for automating the process of assigning data inputs to different categories. A dataset is represented as a cloud of points in virtual space. The user explores the cloud through movement and uses h...
Preprint
Full-text available
The impact of machine learning (ML) in many fields of application is constrained by lack of annotated data. Among existing tools for ML-assisted data annotation, one little explored tool type relies on an analogy between the coordinates of a graphical user interface and the latent space of a neural network for interaction through direct manipulatio...
Chapter
Full-text available
Future healthcare ecosystems integrating human-centered artificial intelligence (AI) will be indispensable. AI-based healthcare technologies can support diagnosis processes and make healthcare more accessible globally. In this context, we conducted a design science research project intending to introduce design principles for user interfaces (UIs)...
Preprint
Full-text available
In this paper, we propose a CNN fine-tuning method which enables users to give simultaneous feedback on two outputs: the classification itself and the visual explanation for the classification. We present the effect of this feedback strategy in a skin lesion classification task and measure how CNNs react to the two types of user feedback. To implem...
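A hedged sketch of one way such dual feedback could be turned into a training signal: a label-correction term plus a penalty on input gradients inside user-rejected regions (a "right for the right reasons"-style loss). The function name, mask format, and weighting are assumptions, not the paper's formulation:

```python
# Hypothetical combined loss for label feedback + explanation feedback.
import torch
import torch.nn.functional as F

def dual_feedback_loss(model, x, user_label, irrelevant_mask, lam=1.0):
    """x: (1,C,H,W) image; user_label: corrected class id;
    irrelevant_mask: (1,1,H,W) binary mask of regions the user rejected."""
    x = x.clone().requires_grad_(True)
    logits = model(x)
    ce = F.cross_entropy(logits, torch.tensor([user_label]))
    # gradient of the user-label log-probability w.r.t. the input
    grads = torch.autograd.grad(F.log_softmax(logits, dim=1)[0, user_label],
                                x, create_graph=True)[0]
    # penalize attribution mass falling into regions the user marked irrelevant
    explanation_penalty = (irrelevant_mask * grads.pow(2)).sum()
    return ce + lam * explanation_penalty
```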
Conference Paper
Full-text available
Explainable Artificial Intelligence (XAI) is an emerging subdiscipline of Machine Learning (ML) and human-computer interaction. Discriminative models need to be understood. An explanation of such ML models is vital when an AI system makes decisions that have significant consequences, such as in healthcare or finance. By providing an input-specific...
Article
Full-text available
Background: New methods are constantly being developed to adapt cognitive load measurement to different contexts. However, research on middle childhood students' cognitive load measurement is rare. Research indicates that the three cognitive load dimensions (intrinsic, extraneous, and germane) can be measured well in adults and teenagers using dif...
Conference Paper
Full-text available
Collecting large-scale medical datasets with fully annotated samples for training of deep networks is prohibitively expensive, especially for 3D volume data. Recent breakthroughs in self-supervised learning (SSL) offer the ability to overcome the lack of labeled training samples by learning feature representations from unlabeled data. However, most...
Preprint
Full-text available
This paper presents our project proposal for extracting biomedical information from German clinical narratives with limited amounts of annotations. We first describe the applied strategies in transfer learning and active learning for solving our problem. After that, we discuss the design of the user interface for both supplying model inspection and...
Preprint
Full-text available
Diabetic Retinopathy (DR) is a leading cause of vision loss in the world, and early DR detection is necessary to prevent vision loss and support an appropriate treatment. In this work, we leverage interactive machine learning and introduce a joint learning framework, termed DRG-Net, to effectively learn both disease grading and multi-lesion segment...
Preprint
Full-text available
Collecting large-scale medical datasets with fully annotated samples for training of deep networks is prohibitively expensive, especially for 3D volume data. Recent breakthroughs in self-supervised learning (SSL) offer the ability to overcome the lack of labeled training samples by learning feature representations from unlabeled data. However, most...
Article
Full-text available
Artificial Intelligence (AI) systems are increasingly pervasive (Internet of Things, in-car intelligent devices, robots, and virtual assistants), and their large-scale adoption makes it necessary to explain their behaviour, for example to the users who are impacted by their decisions, or to the developers who need to ensure their functionality....
Chapter
Full-text available
Despite the current dominance of typed text, writing by hand remains the most natural means of written communication and information keeping. Still, digital pen input provides a limited user experience and lacks flexibility, as most manipulations are performed on a digitized version of the text. In this paper, we present our prototype that en...
Chapter
Full-text available
Many features have been proposed for encoding the input signal from digital pens and touch-based interaction. They are widely used for analyzing and classifying handwritten texts, sketches, or gestures. Although they are well defined mathematically, many features are non-trivial and therefore difficult to understand for a human. In this paper, we p...
Conference Paper
Full-text available
Interactive Machine Learning (IML) systems incorporate humans into the learning process to enable iterative and continuous model improvements. The interactive process can be designed to leverage the expertise of domain experts with no background in machine learning, for instance, through repeated user feedback requests. However, excessive requests...
Conference Paper
Full-text available
Multi-Camera Multi-Object Tracking is currently drawing attention in the computer vision field due to its superior performance in real-world applications such as video surveillance in crowded scenes or in wide spaces. In this work, we propose a mathematically elegant multi-camera multiple object tracking approach based on a spatial-temporal lifted...
Article
Full-text available
Digital pen features model characteristics of sketches and user behavior, and can be used for various supervised machine learning (ML) applications, such as multi-stroke sketch recognition and user modeling. In this work, we use a state-of-the-art set of more than 170 digital pen features, which we implement and make publicly available. The feature...
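For context, a few classic stroke-level features of this kind can be computed directly from (x, y, timestamp) samples; the sketch below is illustrative and covers only a handful of the more than 170 published features:

```python
# Illustrative stroke-level pen features from (x, y, t) samples of one stroke.
import math

def stroke_features(points: list[tuple[float, float, float]]) -> dict[str, float]:
    """points: list of (x, y, t) samples of a single stroke, t in seconds."""
    xs, ys, ts = zip(*points)
    length = sum(math.dist((xs[i], ys[i]), (xs[i + 1], ys[i + 1]))
                 for i in range(len(points) - 1))
    duration = ts[-1] - ts[0]
    return {
        "length": length,
        "duration": duration,
        "avg_speed": length / duration if duration > 0 else 0.0,
        "bbox_diagonal": math.dist((min(xs), min(ys)), (max(xs), max(ys))),
    }
```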
Preprint
Full-text available
Training a model with access to human explanations can improve data efficiency and model performance on in- and out-of-domain data. Adding to these empirical findings, similarity with the process of human learning makes learning from explanations a promising way to establish a fruitful human-machine interaction. Several methods have been proposed f...
Conference Paper
Full-text available
Eye movements were shown to be an effective source of implicit relevance feedback in information retrieval tasks. They can be used to, e.g., estimate the relevance of read documents and expand search queries using machine learning. In this paper, we present the Reading Model Assessment tool (ReMA), an interactive tool for assessing gaze-based relev...
Preprint
Full-text available
We propose an approach for interactive learning for an image captioning model. As human feedback is expensive and modern neural network based approaches often require large amounts of supervised data to be trained, we envision a system that exploits human feedback as well as possible by multiplying the feedback using data augmentation methods, and...
Article
Full-text available
Existing skin attributes detection methods usually initialize with a pre-trained ImageNet network and then fine-tune on a medical target task. However, we argue that such approaches are suboptimal because medical datasets are largely different from ImageNet and often contain limited training samples. In this work, we propose Task Agnostic Transfer...
Article
Full-text available
Eye movements were shown to be an effective source of implicit relevance feedback in constrained search and decision-making tasks. Recent research suggests that gaze-based features, extracted from scanpaths over short news articles (g-REL), can reveal the perceived relevance of read text with respect to a previously shown trigger question. In this...
Preprint
Full-text available
Multi-Camera Multi-Object Tracking is currently drawing attention in the computer vision field due to its superior performance in real-world applications such as video surveillance in crowded scenes or in wide spaces. In this work, we propose a mathematically elegant multi-camera multiple object tracking approach based on a spatial-temporal lifted...
Article
Full-text available
Many digitalized cognitive assessments exist to increase reliability, standardization, and objectivity. Particularly in older adults, the performance of digitized cognitive assessments can lead to poorer test results if they are unfamiliar with the computer, mouse, keyboard, or touch screen. In a cross-over design study, 40 older adults (age M = 74...
Chapter
Full-text available
Image captioning is a complex artificial intelligence task that involves many fundamental questions of data representation, learning, and natural language processing. In addition, most of the work in this domain addresses the English language because of the high availability of annotated training data compared to other languages. Therefore, we inve...
Article
Full-text available
Augmenting reality via head-mounted displays (HMD-AR) is an emerging technology in education. The interactivity provided by HMD-AR devices is particularly promising for learning, but presents a challenge to human activity recognition, especially with children. Recent technological advances...
Chapter
We present a virtual reality (VR) application that enables us to interactively explore and manipulate image clusters based on layer activations of convolutional neural networks (CNNs). We apply dimensionality reduction techniques to project images into the 3D space, where the user can directly interact with the model. The user can change the positi...
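A minimal sketch of the projection step, assuming a ResNet-18 backbone and PCA; the actual application may use different networks and dimensionality-reduction techniques:

```python
# Project CNN layer activations to 3D coordinates for a point-cloud view.
# Assumes torchvision >= 0.13 (downloads ImageNet weights) and N >= 3 images.
import numpy as np
import torch
from sklearn.decomposition import PCA
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1")
backbone = torch.nn.Sequential(*list(model.children())[:-1])  # drop the classifier
backbone.eval()

def project_to_3d(images: torch.Tensor) -> np.ndarray:
    """images: (N, 3, 224, 224) preprocessed batch -> (N, 3) point coordinates."""
    with torch.no_grad():
        feats = backbone(images).flatten(1).numpy()   # (N, 512) activations
    return PCA(n_components=3).fit_transform(feats)   # (N, 3) points for the cloud
```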
Chapter
This paper investigates the problem of domain adaptation for diabetic retinopathy (DR) grading. We learn invariant target-domain features by defining a novel self-supervised task based on retinal vessel image reconstructions, inspired by medical domain knowledge. Then, a benchmark of current state-of-the-art unsupervised domain adaptation methods o...
Chapter
To improve the accuracy of convolutional neural networks in discriminating between nevi and melanomas, we test nine different combinations of masking and cropping on three datasets of skin lesion images (ISIC2016, ISIC2018, and MedNode). Our experiments, confirmed by 10-fold cross-validation, show that cropping increases classification performances...
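For illustration, two of the compared preprocessing variants can be sketched as follows, assuming a binary lesion segmentation mask is available (helper names are hypothetical):

```python
# Masking vs. cropping of a skin-lesion image given a binary lesion mask.
import numpy as np

def mask_lesion(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Zero out pixels outside the lesion (image: HxWx3, mask: HxW in {0,1})."""
    return image * mask[..., None]

def crop_lesion(image: np.ndarray, mask: np.ndarray, margin: int = 10) -> np.ndarray:
    """Crop the image to the lesion bounding box plus a small margin."""
    ys, xs = np.nonzero(mask)
    y0, y1 = max(ys.min() - margin, 0), min(ys.max() + margin, image.shape[0])
    x0, x1 = max(xs.min() - margin, 0), min(xs.max() + margin, image.shape[1])
    return image[y0:y1, x0:x1]
```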
Chapter
Full-text available
Dermatologists recognize melanomas by inspecting images in which they identify human-comprehensible visual features. In this paper, we investigate to what extent such features correspond to the saliency areas identified on CNNs trained for classification. Our experiments, conducted on two neural architectures characterized by different depth and di...
Chapter
This paper presents an investigation on the task of anomaly detection for images of skin lesions. The goal is to provide a decision support system with an extra filtering layer to inform users if a classifier should not be used for a given sample. We tested anomaly detectors based on autoencoders and three discrimination methods: feature vector dis...
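A minimal sketch of the autoencoder-based variant, scoring anomalies by reconstruction error; the architecture and scoring details here are assumptions, and the paper additionally evaluates feature-vector-distance criteria:

```python
# Small convolutional autoencoder with per-image reconstruction-error scoring.
import torch
import torch.nn as nn

class ConvAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                                 nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
        self.dec = nn.Sequential(nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
                                 nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1), nn.Sigmoid())

    def forward(self, x):
        return self.dec(self.enc(x))

def anomaly_score(model: ConvAE, x: torch.Tensor) -> torch.Tensor:
    """Per-image mean squared reconstruction error; higher means more anomalous."""
    with torch.no_grad():
        return ((model(x) - x) ** 2).mean(dim=(1, 2, 3))
```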
Preprint
Full-text available
This paper investigates the problem of domain adaptation for diabetic retinopathy (DR) grading. We learn invariant target-domain features by defining a novel self-supervised task based on retinal vessel image reconstructions, inspired by medical domain knowledge. Then, a benchmark of current state-of-the-art unsupervised domain adaptation methods o...
Article
Full-text available
Processing visual stimuli in a scene is essential for the human brain to make situation-aware decisions. These stimuli, which are prevalent subjects of diagnostic eye tracking studies, are commonly encoded as rectangular areas of interest (AOIs) per frame. Because it is a tedious manual annotation task, the automatic detection and annotation of vis...
Conference Paper
Full-text available
Scanning and processing visual stimuli in a scene is essential for the human brain to make situation-aware decisions. Adding the ability to observe the scanning behavior and scene processing to intelligent mobile user interfaces can facilitate a new class of cognition-aware user interfaces. As a first step in this direction, we implement an augment...
Preprint
We describe our work on information extraction in medical documents written in German, especially detecting negations using an architecture based on the UIMA pipeline. Based on our previous work on software modules to cover medical concepts like diagnoses, examinations, etc., we employ a version of the NegEx regular expression algorithm with a large...
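A much-simplified sketch of NegEx-style matching with a few German triggers; the deployed component uses a considerably larger trigger list and runs inside the UIMA pipeline:

```python
# Toy NegEx-style negation check: a concept is negated if a trigger occurs
# within a small token window before it.
import re

NEG_TRIGGERS = r"\b(kein(e[nmrs]?)?|nicht|ohne|ausgeschlossen|ausschluss)\b"

def is_negated(sentence: str, concept: str, window: int = 5) -> bool:
    """True if a negation trigger occurs within `window` tokens before the concept."""
    tokens = re.findall(r"\w+", sentence.lower())
    try:
        idx = tokens.index(concept.lower())
    except ValueError:
        return False
    context = " ".join(tokens[max(0, idx - window):idx])
    return re.search(NEG_TRIGGERS, context) is not None

# e.g. is_negated("Kein Hinweis auf Pneumonie", "Pneumonie") -> True
```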
Article
Artificial intelligence (AI) has attained a new level of maturity in recent years and is becoming the driver of digitalization in all areas of life. AI is a cross-sectional technology with great importance for all areas of medicine employing image data, text data and bio-data. There is no medical field that will remain unaffected by AI, with AI-ass...
Preprint
Full-text available
Existing skin attributes detection methods usually initialize with a pre-trained ImageNet network and then fine-tune on a medical target task. However, we argue that such approaches are suboptimal because medical datasets are largely different from ImageNet and often contain limited training samples. In this work, we propose Task Agnostic Transfer L...
Article
Full-text available
Currently, an increasing number of head-mounted displays (HMDs) for virtual and augmented reality (VR/AR) are equipped with integrated eye trackers. Use cases of these integrated eye trackers include rendering optimization and gaze-based user interaction. In addition, visual attention in VR and AR is interesting for applied research based on eye trac...
Chapter
Full-text available
We implement a method for re-ranking top-10 results of a state-of-the-art question answering (QA) system. The goal of our re-ranking approach is to improve the answer selection given the user question and the top-10 candidates. We focus on improving deployed QA systems that do not allow re-training or when re-training comes at a high cost. Our re-r...
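To illustrate the re-ranking interface, a minimal unsupervised variant that re-orders candidates by TF-IDF similarity to the question; the published re-ranker is learned, so this is only a stand-in:

```python
# Re-order top-10 answer candidates by TF-IDF cosine similarity to the question.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rerank(question: str, candidates: list[str]) -> list[str]:
    """Return the candidate answers sorted by decreasing similarity to the question."""
    vec = TfidfVectorizer().fit([question] + candidates)
    q = vec.transform([question])
    cands = vec.transform(candidates)
    scores = cosine_similarity(q, cands)[0]
    order = scores.argsort()[::-1]
    return [candidates[i] for i in order]
```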
Chapter
The classification of skin lesion images is known to be biased by artifacts of the surrounding skin, but it is still not clear to what extent masking out healthy skin pixels influences classification performances, and why. To better understand this phenomenon, we apply different strategies of image masking (rectangular masks, circular masks, full m...
Preprint
Full-text available
Until now, Coronavirus SARS-CoV-2 has caused more than 850,000 deaths and infected more than 27 million individuals in over 120 countries. Besides the principal polymerase chain reaction (PCR) tests, automatically identifying positive samples based on computed tomography (CT) scans can present a promising option in the early diagnosis of COVID-19. Rece...
Chapter
In this work, we propose a new approach to automatically predict the locations of visual dermoscopic attributes for Task 2 of the ISIC 2018 Challenge. Our method is based on the Attention U-Net with multi-scale images as input. We apply a new strategy based on transfer learning, i.e., training the deep network for feature extraction by adapting the...
Chapter
We present a study on the fusion of pixel data and patient metadata (age, gender, and body location) for improving the classification of skin lesion images. The experiments have been conducted with the ISIC 2019 skin lesion classification challenge data set. Taking two plain convolutional neural networks (CNNs) as a baseline, metadata are merged us...
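A hedged sketch of a late-fusion variant, concatenating CNN image features with an embedded metadata vector before the classifier; layer sizes and the class count are illustrative, not the paper's exact configuration:

```python
# Late fusion of image features and patient metadata for lesion classification.
import torch
import torch.nn as nn
from torchvision import models

class FusionNet(nn.Module):
    def __init__(self, num_metadata: int = 3, num_classes: int = 8):
        super().__init__()
        cnn = models.resnet18(weights=None)       # torchvision >= 0.13 API
        cnn.fc = nn.Identity()                    # expose 512-d image features
        self.cnn = cnn
        self.meta = nn.Sequential(nn.Linear(num_metadata, 64), nn.ReLU())
        self.head = nn.Linear(512 + 64, num_classes)

    def forward(self, image: torch.Tensor, metadata: torch.Tensor) -> torch.Tensor:
        feats = torch.cat([self.cnn(image), self.meta(metadata)], dim=1)
        return self.head(feats)
```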
Preprint
Full-text available
The aim of this task is to develop a system that automatically labels radiology images with relevant medical concepts. We describe our Deep Neural Network (DNN) based approach for tackling this problem. On the challenge test set of 3,534 radiology images, our system achieves an F1 score of 0.375 and ranks high (12th among all the systems that were...
Article
Full-text available
Image captioning is a challenging multimodal task. Significant improvements have been achieved with deep learning. Yet, captions generated by humans are still considered better, which makes image captioning an interesting application for interactive machine learning and explainable artificial intelligence methods. In this work, we aim at improving the performance a...