Sebastian Möller’s research while affiliated with Deutsches Forschungszentrum für Künstliche Intelligenz and other places

Publications (574)


Figure 4: Visualization of clustering in AG News and SST2, where stars denote cluster centroids.
Figure 5: Label distribution of AG News and SST2.
Figure 6: Example instances from AG News and SST2.
Three open sourced LLMs used in ZEROCF and FITCF.
FitCF: A Framework for Automatic Feature Importance-guided Counterfactual Example Generation
  • Preprint
  • File available

January 2025 · 7 Reads

Qianli Wang · Nils Feldhus · Simon Ostermann · [...]

Counterfactual examples are widely used in natural language processing (NLP) as valuable data to improve models, and in explainable artificial intelligence (XAI) to understand model behavior. The automated generation of counterfactual examples remains a challenging task even for large language models (LLMs), despite their impressive performance on many tasks. In this paper, we first introduce ZeroCF, a faithful approach for leveraging important words derived from feature attribution methods to generate counterfactual examples in a zero-shot setting. Second, we present a new framework, FitCF, which further verifies aforementioned counterfactuals by label flip verification and then inserts them as demonstrations for few-shot prompting, outperforming two state-of-the-art baselines. Through ablation studies, we identify the importance of each of FitCF's core components in improving the quality of counterfactuals, as assessed through flip rate, perplexity, and similarity measures. Furthermore, we show the effectiveness of LIME and Integrated Gradients as backbone attribution methods for FitCF and find that the number of demonstrations has the largest effect on performance. Finally, we reveal a strong correlation between the faithfulness of feature attribution scores and the quality of generated counterfactuals.


Factors in Crowdsourcing for Evaluation of Complex Dialogue Systems

November 2024 · 2 Reads

In the last decade, crowdsourcing has become a popular method for conducting quantitative empirical studies in human-machine interaction. The remote work on a given task in crowdworking settings suits the character of typical speech/language-based interactive systems, for instance with regard to argumentative conversations and information retrieval. Thus, crowdworking promises a valuable opportunity to study and evaluate the usability and user experience of real humans in interactions with such interactive systems. In contrast to physical attendance in laboratory studies, crowdsourcing studies offer much more flexible and easier access to large numbers of heterogeneous participants with a specific background, e.g., native speakers or domain expertise. On the other hand, the experimental and environmental conditions as well as the participants' compliance and reliability (or at least the monitoring of the latter) can be controlled much better in a laboratory. This paper seeks to present a (self-)critical examination of crowdsourcing-based studies in the context of complex (spoken) dialogue systems. It describes and discusses observed issues in crowdsourcing studies involving complex tasks and suggests solutions to improve and ensure the quality of the study results. Thereby, our work contributes to a better understanding of what needs to be considered when designing and evaluating studies with crowdworkers for complex dialogue systems.


Exploring Augmented Table Setup and Lighting Customization in a Simulated Restaurant to Improve the User Experience

November 2024 · 14 Reads

This study explored a concept for using Augmented Reality (AR) glasses to customize augmented table setup and lighting in a restaurant. The aim was to provide insights into AR usage in restaurants and contribute to existing research by introducing an extendable and versatile concept for scholars and restaurateurs. A controlled laboratory study, using a within-subjects design, was conducted to investigate the effects of a customizable augmented table setup and lighting on user experience (UX), perceived waiting time, psychological ownership, and social acceptability. A simulated restaurant environment was created using a 360-degree image in Virtual Reality (VR). The study implemented default and customizable table setup and lighting. Results from a paired samples t-test showed a statistically significant effect of table setup and lighting on the pragmatic quality of UX, hedonic quality of UX, overall UX, valence, dominance, psychological ownership, and affect. Furthermore, table setup had a significant effect on arousal and perceived waiting time. Moreover, table setup significantly affected AR interaction, isolation, and safety acceptability, while lighting only affected AR interaction acceptability. Findings suggest that these investigated variables are worth considering for AR applications in a restaurant, especially when offering customizable augmented table setup and lighting.




Figure 2: Impact of Preference Pairs Category Distribution: Presents the relative improvements in accuracy (left) and ∆ W −L (right) between M Anchor and M Rank with respect to λ.
Inference parameters per component
Distribution of anchor categories: This table presents the distribution of the categories (Consistently Correct (CC), Consistently Incorrect (CI), and Variable (V)) across datasets used during the DPO alignment phase of M Anchor.
Anchored Alignment for Self-Explanations Enhancement

October 2024 · 26 Reads

In this work, we introduce a methodology for alignment designed to enhance the ability of large language models (LLMs) to articulate their reasoning (self-explanation) even in the absence of annotated rationale explanations. Our alignment methodology comprises three key components: explanation quality assessment, self-instruction dataset generation, and model alignment. Additionally, we present a novel technique called Alignment with Anchor Preference Pairs, which improves the selection of preference pairs by categorizing model outputs into three groups: consistently correct, consistently incorrect, and variable. By applying tailored strategies to each category, we enhance the effectiveness of Direct Preference Optimization (DPO). Our experimental results demonstrate that this approach significantly improves explanation quality while maintaining accuracy compared to other fine-tuning strategies.



Working with Mixed Reality in Public: Effects of Virtual Display Layouts on Productivity, Feeling of Safety, and Social Acceptability

October 2024 · 3 Reads

Nowadays, Mixed Reality (MR) headsets are a game-changer for knowledge work. Unlike stationary monitors, MR headsets allow users to work with large virtual displays anywhere they wear the headset, whether in a professional office, a public setting like a cafe, or a quiet space like a library. This study compares four different layouts (eye level-close, eye level-far, below eye level-close, below eye level-far) of virtual displays regarding feelings of safety, perceived productivity, and social acceptability when working with MR in public. We test which layout is most preferred by users and seek to understand which factors affect users' layout preferences. The aim is to derive useful insights for designing better MR layouts. A field study in a public library was conducted using a within-subjects design. While interacting with each layout, participants worked on a planning task. The results from a repeated-measures ANOVA show a statistically significant effect on productivity but not on safety and social acceptability. Additionally, we report preferences expressed by the users regarding the layouts and using MR in public.


Digital Eyes: Social Implications of XR EyeSight

October 2024 · 19 Reads

The EyeSight feature, introduced with the new Apple Vision Pro XR headset, promises to revolutionize user interaction by simulating real human eye expressions on a digital display. This feature could enhance XR devices' social acceptability and social presence when communicating with others outside the XR experience. In this pilot study, we explore the implications of the EyeSight feature by examining social acceptability, social presence, emotional responses, and technology acceptance. Eight participants engaged in conversational tasks in three conditions to contrast experiencing the Apple Vision Pro with EyeSight, the Meta Quest 3 as a reference XR headset, and a face-to-face setting. Our preliminary findings indicate that while the EyeSight feature improves perceptions of social presence and acceptability compared to the reference headset, it does not match the social connectivity of direct human interactions.


Prospectively investigating the impact of AI on shared decision-making in post kidney transplant care (PRIMA-AI): protocol for a longitudinal qualitative study among patients, their support persons and treating physicians at a tertiary care centre

October 2024 · 16 Reads · 1 Citation

BMJ Open

Introduction: As healthcare is shifting from a paternalistic to a patient-centred approach, medical decision making becomes more collaborative involving patients, their support persons (SPs) and physicians. Implementing shared decision-making (SDM) into clinical practice can be challenging and becomes even more complex with the introduction of artificial intelligence (AI) as a potential actant in the communicative network. Although there is more empirical research on patients’ and physicians’ perceptions of AI, little is known about the impact of AI on SDM. This study will help to fill this gap. To the best of our knowledge, this is the first systematic empirical investigation to prospectively assess the views of patients, their SPs and physicians on how AI affects SDM in physician–patient communication after kidney transplantation. Using a transdisciplinary approach, this study will explore the role and impact of an AI-decision support system (DSS) designed to assist with medical decision making in the clinical encounter.

Methods and analysis: This is a plan to roll out a 2-year, longitudinal qualitative interview study in a German kidney transplant centre. Semi-structured interviews with patients, SPs and physicians will be conducted at baseline and in 3-, 6-, 12- and 24-month follow-up. A total of 50 patient–SP dyads and their treating physicians will be recruited at baseline. Assuming a dropout rate of 20% per year, it is anticipated that 30 patient–SP dyads will be included in the last follow-up with the aim of achieving data saturation. Interviews will be audio-recorded and transcribed verbatim. Transcripts will be analysed using framework analysis. Participants will be asked to report on their (a) communication experiences and preferences, (b) views on the influence of the AI-based DSS on the normative foundations of the use of AI in medical decision-making, focusing on agency along with trustworthiness, transparency and responsibility and (c) perceptions of the use of the AI-based DSS, as well as barriers and facilitators to its implementation into routine care.

Ethics and dissemination: Approval has been granted by the local ethics committee of Charité—Universitätsmedizin Berlin (EA1/177/23 on 08 August 2023). This research will be conducted in accordance with the principles of the Declaration of Helsinki (1996). The study findings will be used to develop communication guidance for physicians on how to introduce and sustainably implement AI-assisted SDM. The study results will also be used to develop lay language patient information on AI-assisted SDM. A broad dissemination strategy will help communicate the results of this research to a variety of target groups, including scientific and non-scientific audiences, to allow for a more informed discourse among different actors from policy, science and society on the role and impact of AI in physician–patient communication.


Citations (36)


... This has delayed the integration of AI-powered platforms with electronic health records (EHRs) in transplant centers. For example, in some multi-center kidney transplant studies, discrepancies in data formats and infrastructure incompatibilities have complicated efforts to implement AI solutions at scale, reducing their potential impact [112,113]. These real-world examples highlight the urgency of addressing these challenges through collaborative efforts, standardized protocols, and regulatory frameworks to ensure the successful integration of AI in transplantation medicine. ...

Reference:

The Impact of Artificial Intelligence and Machine Learning in Organ Retrieval and Transplantation: A Comprehensive Review
Prospectively investigating the impact of AI on shared decision-making in post kidney transplant care (PRIMA-AI): protocol for a longitudinal qualitative study among patients, their support persons and treating physicians at a tertiary care centre

BMJ Open

... '-' in the System Summary indicates no system description paper was submitted. cluding the baseline, Team Yseop won both subtasks in Task 2. For a more detailed description of the task and its results, we refer the reader to Raithel et al. (2024a). ...

Overview of #SMM4H 2024 – Task 2: Cross-Lingual Few-Shot Relation Extraction for Pharmacovigilance in French, German, and Japanese
  • Citing Conference Paper
  • January 2024

... The literature suggests that PA can generate emotions that significantly impact CS [46,47], providing unique experiences, impressions, and sensations that create deep and memorable connections with consumers [32,48]. However, studies such as those conducted by Faizan et al. [49] and Jee-Hoon [50] contradict this idea, asserting that there is no relationship between the two variables, clarifying that CS is influenced by other factors. ...

The Impact of Social Environment and Interaction Focus on User Experience and Social Acceptability of an Augmented Reality Game
  • Citing Conference Paper
  • June 2024

... This method is suitable for feature attributions on tabular data and classifiers based on this data type, as well as other methods like counterfactual explanations. Similarly, Castelnovo et al. [302], QoEXplainer [303] and XAIstories [301] propose to generate natural language explanations from Shapley scores through human-computer interactions. XAIstories includes separate modules for counterfactual explanations and direct natural language explanations. ...

QoEXplainer: Mediating Explainable Quality of Experience Models with Large Language Models
  • Citing Conference Paper
  • June 2024

... AI is becoming a significant part of our everyday lives and significantly shaping the digital landscape also with respect to disinformation generation. This raises the potential for the creation of disinformation, deep fakes, and propagation of hate speech, thereby greatly compromising the reliability of information ecosystems [25,29,38,47,62,73]. The infodemic during Covid-19 [7] and the war in Ukraine and Israel [15] serve as concrete examples of the severe effect of using AI for generating disinformation and shaping public opinions [48]. ...

NewsPolyML: Multi-lingual European News Fake Assessment Dataset

... As a result, the identification of disinformation continues to require human involvement. Recent research suggests that hybrid models, which integrate both human insight and artificial intelligence, have the potential to fulfill tasks beyond the reach of either entity alone [18,50]. In such cases, humans oversee AI performance to help flag questionable news, but these AI systems frequently lack the necessary transparency for dependable predictions and recommendations. ...

The Role of Explainability in Collaborative Human-AI Disinformation Detection

... Despite the promise of AI, its impact on interactions among patients, their support networks, and HCPs remains largely unexplored [144]. Although empirical studies on AI's influence on SDM are limited [145], ongoing research is investigating AI-based risk prediction's role in physician-patient interactions [146]. ...

Investigating the Impact of AI on Shared Decision-Making in Post-Kidney Transplant Care (PRIMA-AI): Protocol for a Randomized Controlled Trial
  • Citing Article
  • April 2024

JMIR Research Protocols

... LLM-based MT Several studies have explored the application of LLMs for translation tasks, highlighting their impressive performance across multiple high-resource language pairs (Xu et al., 2024a,b;Wu et al., 2024). One notable advantage of LLMs over traditional neural machine translation (NMT) systems is their ability to generate more controlled and nuanced translations, particularly when dealing with idiomatic expressions that require less literal interpretation (Manakhimova et al., 2023;Stap et al., 2024). In this work, we focus on evaluating the capabilities of LLMs in proverb translation, a challenging task due to the cultural and figurative nature of proverbs. ...

Linguistically Motivated Evaluation of the 2023 State-of-the-art Machine Translation: Can ChatGPT Outperform NMT?