Michael Barz

Deutsches Forschungszentrum für Künstliche Intelligenz | DFKI · Interactive Machine Learning

Master of Science

About

42 Publications
8,818 Reads
317 Citations
Citations since 2017
33 Research Items
312 Citations
[Chart: citations per year, 2017–2023]
Additional affiliations
November 2015 - present
Deutsches Forschungszentrum für Künstliche Intelligenz
Position
  • Researcher
Education
October 2013 - September 2015
Universität des Saarlandes
Field of study
  • Computer Science
April 2010 - September 2013
Universität des Saarlandes
Field of study
  • Computer Science

Publications (42)
Preprint
Full-text available
Diabetic Retinopathy (DR) is a leading cause of vision loss in the world, and early DR detection is necessary to prevent vision loss and support an appropriate treatment. In this work, we leverage interactive machine learning and introduce a joint learning framework, termed DRG-Net, to effectively learn both disease grading and multi-lesion segment...
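As an illustration of the joint-learning idea, the sketch below combines an image-level grading loss with a per-pixel lesion segmentation loss; the heads, weighting, and loss choices are assumptions for illustration, not DRG-Net's actual design.

```python
# Minimal multi-task loss sketch for joint DR grading and lesion
# segmentation (weighting and loss choices are illustrative assumptions,
# not DRG-Net's published design).
import torch.nn.functional as F

def joint_loss(grade_logits, grade_target, seg_logits, seg_target, alpha=0.5):
    grading = F.cross_entropy(grade_logits, grade_target)  # image-level DR grade
    segmentation = F.binary_cross_entropy_with_logits(
        seg_logits, seg_target.float())  # per-pixel lesion masks
    return alpha * grading + (1 - alpha) * segmentation
```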
Chapter
Full-text available
Despite the current dominance of typed text, writing by hand remains the most natural means of written communication and information keeping. Still, digital pen input provides limited user experience and lacks flexibility, as most manipulations are performed on a digitalized version of the text. In this paper, we present our prototype that en...
Chapter
Full-text available
Many features have been proposed for encoding the input signal from digital pens and touch-based interaction. They are widely used for analyzing and classifying handwritten texts, sketches, or gestures. Although they are well defined mathematically, many features are non-trivial and therefore difficult for a human to understand. In this paper, we p...
Conference Paper
Full-text available
Interactive Machine Learning (IML) systems incorporate humans into the learning process to enable iterative and continuous model improvements. The interactive process can be designed to leverage the expertise of domain experts with no background in machine learning, for instance, through repeated user feedback requests. However, excessive requests...
Conference Paper
Full-text available
Eye movements were shown to be an effective source of implicit relevance feedback in information retrieval tasks. They can be used, for example, to estimate the relevance of read documents and to expand search queries using machine learning. In this paper, we present the Reading Model Assessment tool (ReMA), an interactive tool for assessing gaze-based relev...
Article
Full-text available
Eye movements were shown to be an effective source of implicit relevance feedback in constrained search and decision-making tasks. Recent research suggests that gaze-based features, extracted from scanpaths over short news articles (g-REL), can reveal the perceived relevance of read text with respect to a previously shown trigger question. In this...
Chapter
Full-text available
Image captioning is a complex artificial intelligence task that involves many fundamental questions of data representation, learning, and natural language processing. In addition, most of the work in this domain addresses the English language because of the high availability of annotated training data compared to other languages. Therefore, we inve...
Article
Full-text available
Augmenting reality via head-mounted displays (HMD-AR) is an emerging technology in education. The interactivity provided by HMD-AR devices is particularly promising for learning, but presents a challenge to human activity recognition, especially with children. Recent technological advances...
Article
Full-text available
Processing visual stimuli in a scene is essential for the human brain to make situation-aware decisions. These stimuli, which are prevalent subjects of diagnostic eye tracking studies, are commonly encoded as rectangular areas of interest (AOIs) per frame. Because manual AOI annotation is tedious, the automatic detection and annotation of vis...
Conference Paper
Full-text available
Scanning and processing visual stimuli in a scene is essential for the human brain to make situation-aware decisions. Adding the ability to observe the scanning behavior and scene processing to intelligent mobile user interfaces can facilitate a new class of cognition-aware user interfaces. As a first step in this direction, we implement an augment...
Article
Full-text available
Currently, an increasing number of head-mounted displays (HMDs) for virtual and augmented reality (VR/AR) are equipped with integrated eye trackers. Use cases of these integrated eye trackers include rendering optimization and gaze-based user interaction. In addition, visual attention in VR and AR is interesting for applied research based on eye trac...
Chapter
Full-text available
We implement a method for re-ranking top-10 results of a state-of-the-art question answering (QA) system. The goal of our re-ranking approach is to improve the answer selection given the user question and the top-10 candidates. We focus on improving deployed QA systems that do not allow re-training or when re-training comes at a high cost. Our re-r...
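The re-ranking step can be illustrated independently of the deployed system: score each of the ten candidates against the question and re-sort. The token-overlap scorer below is a simple stand-in for the paper's learned re-ranker.

```python
# Sketch of post-hoc re-ranking for a deployed QA system. The cosine
# token-overlap scorer is a stand-in; the paper uses a learned model.
from collections import Counter
from math import sqrt

def score(question: str, candidate: str) -> float:
    """Cosine similarity over bag-of-words token counts."""
    q, c = Counter(question.lower().split()), Counter(candidate.lower().split())
    dot = sum(q[t] * c[t] for t in q)
    norm = sqrt(sum(v * v for v in q.values())) * sqrt(sum(v * v for v in c.values()))
    return dot / norm if norm else 0.0

def rerank(question: str, top10: list[str]) -> list[str]:
    """Re-order the top-10 answers without re-training the QA system."""
    return sorted(top10, key=lambda cand: score(question, cand), reverse=True)
```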
Article
Full-text available
Image captioning is a challenging multimodal task. Deep learning has brought significant improvements, yet captions generated by humans are still considered better, which makes image captioning an interesting application for interactive machine learning and explainable artificial intelligence methods. In this work, we aim at improving the performance a...
Chapter
Manual evaluation of individual results is a major bottleneck for natural language generation tasks: it is very time consuming and expensive, for example, when crowdsourced. In this work, we address this problem for the specific task of automatic image captioning. We automatically generate human-like judgements on grammatical correctness, im...
Preprint
Full-text available
We implement a method for re-ranking top-10 results of a state-of-the-art question answering (QA) system. The goal of our re-ranking approach is to improve the answer selection given the user question and the top-10 candidates. We focus on improving deployed QA systems that do not allow re-training or when re-training comes at a high cost. Our re-rankin...
Preprint
Full-text available
In this paper, we provide a categorisation and implementation of digital ink features for behaviour characterisation. Based on four feature sets taken from the literature, we categorise the features into classes of syntactic and semantic features. We implemented a publicly available framework to calculate these features and show its deployme...
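To make the distinction concrete, the snippet below computes a few simple syntactic ink features (path length, duration, mean speed) from a single stroke of (x, y, t) samples; the selection is illustrative and not the paper's published feature set.

```python
# Illustrative syntactic digital ink features for one stroke given as
# (x, y, t) samples; feature names are examples, not the published set.
from math import hypot

def stroke_features(stroke: list[tuple[float, float, float]]) -> dict[str, float]:
    # Path length: summed distance between consecutive pen samples.
    length = sum(hypot(x2 - x1, y2 - y1)
                 for (x1, y1, _), (x2, y2, _) in zip(stroke, stroke[1:]))
    duration = stroke[-1][2] - stroke[0][2]  # seconds
    return {
        "path_length": length,
        "duration": duration,
        "mean_speed": length / duration if duration > 0 else 0.0,
    }
```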
Conference Paper
Full-text available
Gaze estimation error can severely hamper usability and performance of mobile gaze-based interfaces given that the error varies constantly for different interaction positions. In this work, we explore error-aware gaze-based interfaces that estimate and adapt to gaze estimation error on-the-fly. We implement a sample error-aware user interface for g...
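The core idea can be sketched as follows: estimate the local gaze error and grow interactive targets accordingly. The error model below is a made-up placeholder; the paper describes how the error is actually estimated on-the-fly.

```python
# Sketch of an error-aware gaze UI: adapt target size to the locally
# estimated gaze error. The error model is a placeholder assumption.

def estimated_error_px(x, y, screen_w=1920, screen_h=1080):
    """Placeholder model: error grows towards the display borders."""
    cx, cy = screen_w / 2, screen_h / 2
    d = max(abs(x - cx) / cx, abs(y - cy) / cy)  # 0 at centre, 1 at edge
    return 20.0 + 60.0 * d  # 20 px at centre, up to 80 px at the edges

def adapted_target_size(x, y, base_size=48.0):
    """Grow a button so it still covers the gaze estimate plus its error."""
    return max(base_size, 2 * estimated_error_px(x, y))
```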
Article
Full-text available
This paper provides an overview of prominent deep learning toolkits and, in particular, reports on recent publications that contributed open source software for implementing tasks that are common in intelligent user interfaces (IUI). We provide a scientific reference for researchers and software engineers who plan to utilise deep learning technique...
Conference Paper
We present a multimodal medical 3D image system for radiologists in a virtual reality (VR) environment. Users can walk freely inside the virtual room, interact with the system using speech, browse patient records, and manipulate 3D image data with hand gestures. Medical images are retrieved from the hospital's Picture Archiving and Communication Syste...
Chapter
Full-text available
Visual search target inference subsumes methods for predicting the target object through eye tracking: a person intends to find an object in a visual scene, and we predict that target based on the fixation behavior. Knowing the search target can improve intelligent user interaction. In this work, we implement a new feature encoding, the Bag of Deep Visu...
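A bag-of-visual-words encoding over fixations can be sketched as below: take an image patch around each fixation, extract a CNN feature, assign it to the nearest codebook entry, and histogram the assignments. `cnn_features` and the codebook are assumed to exist (the codebook would be learned offline, e.g., with k-means); this is a sketch in the spirit of BoDVW, not its exact definition.

```python
# Sketch of a bag-of-visual-words encoding over fixations, in the spirit
# of the Bag of Deep Visual Words (BoDVW). `cnn_features` is a stand-in
# for a pretrained CNN feature extractor.
import numpy as np

def encode_scanpath(patches, codebook, cnn_features):
    """Histogram of nearest codebook entries over per-fixation features."""
    hist = np.zeros(len(codebook))
    for patch in patches:  # one image patch around each fixation
        f = cnn_features(patch)  # (d,) feature vector
        word = np.argmin(np.linalg.norm(codebook - f, axis=1))
        hist[word] += 1
    return hist / max(hist.sum(), 1)  # normalise to a distribution
```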
Article
Full-text available
Fine-tuning of a deep convolutional neural network (CNN) is often desired. This paper provides an overview of our publicly available py-faster-rcnn-ft software library that can be used to fine-tune the VGG_CNN_M_1024 model on custom subsets of the Microsoft Common Objects in Context (MS COCO) dataset. For example, we improved the procedure so that...
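Selecting such a custom subset can be sketched with the standard pycocotools API; the annotation path and category names below are example values, and how py-faster-rcnn-ft wires the subset into training is described in the paper and repository.

```python
# Sketch: pick a custom MS COCO subset for fine-tuning (pycocotools API;
# the annotation path and class names are example values).
from pycocotools.coco import COCO

coco = COCO("annotations/instances_train2014.json")
cat_ids = coco.getCatIds(catNms=["person", "bicycle"])
img_ids = coco.getImgIds(catIds=cat_ids)  # images containing all selected classes
print(f"Fine-tuning subset: {len(img_ids)} images, categories {cat_ids}")
```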
Conference Paper
Full-text available
In this applied research paper, we describe an architecture for seamlessly integrating factory workers in industrial cyber-physical production environments. Our human-in-the-loop control process uses novel input techniques and relies on state-of-the-art industry standards. Our architecture allows for real-time processing of semantically annotated d...
Conference Paper
Full-text available
We present a speech dialogue system that facilitates medical decision support for doctors in a virtual reality (VR) application. The therapy prediction is based on a recurrent neural network model that incorporates the examination history of patients. A central supervised patient database provides input to our predictive model and allows us, first,...
Conference Paper
Full-text available
Gaze is known to be a dominant modality for conveying spatial information, and it has been used for grounding in human-robot dialogues. In this work, we present the prototype of a gaze-supported multi-modal dialogue system that enhances two core tasks in human-robot collaboration: 1) our robot is able to learn new objects and their location from us...
Conference Paper
Full-text available
In this paper, we describe a multimodal-multisensor annotation tool for physiological computing; for example, mobile gesture-based interaction devices or health monitoring devices can be connected. It is intended as an expert authoring tool for annotating multiple video-based sensor streams with domain-specific activities. Resulting datasets can be us...
Conference Paper
Full-text available
Recent advances in eye tracking technologies have opened the way to designing novel attention-based user interfaces. This is promising for pro-active and assistive technologies for cyber-physical systems in domains such as healthcare and Industry 4.0. Prior approaches to recognizing a user's attention are usually limited to the raw gaze signal or senso...
Conference Paper
In this paper, we present WaterCoaster, a mobile device and a mobile application that motivate people to drink beverages more often and more regularly. The WaterCoaster measures the amount consumed and reminds the user to drink more if necessary. The app is designed as a game in which the user needs to take care of a virtual character living in a fis...
Conference Paper
Full-text available
Gaze estimation error is inherent in head-mounted eye trackers and seriously impacts performance, usability, and user experience of gaze-based interfaces. Particularly in mobile settings, this error varies constantly as users move in front and look at different parts of a display. We envision a new class of gaze-based interfaces that are aware of t...
Technical Report
Full-text available
Gaze estimation error is inherent in head-mounted eye trackers and seriously impacts performance, usability, and user experience of gaze-based interfaces. Particularly in mobile settings, this error varies constantly as users move in front and look at different parts of a display. We envision a new class of gaze-based interfaces that are aware of t...

Projects (2)
Project
The GeAR project investigates the effects of augmented reality (AR) on reception and production behaviour in the context of teaching and learning processes at school. Beyond that, it identifies potentials and particular challenges of using AR in this field (e.g., acceptance, usability). In exchange with teachers, subject-area supervisors, and schools, the project ultimately also explores the possibilities and limits of implementing AR-based learning environments in concrete classroom situations. GeAR is a cooperation project of Universität des Saarlandes, Technische Universität Kaiserslautern, and the Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI) in Kaiserslautern and Saarbrücken. Further cooperation partners are the Staatliches Studienseminar für das Lehramt an Grundschulen Kaiserslautern and the Hohenstaufen-Gymnasium Kaiserslautern. The GeAR project is funded by the Bundesministerium für Bildung und Forschung (BMBF) within the funding line "Digitalisierung im Bildungsbereich" (digitalization in education).