Marco A. Wiering
University of Groningen | RUG · Institute of Artificial Intelligence and Cognitive Engineering (Alice)

Lecturer

About

239 Publications · 293,733 Reads · 7,082 Citations
Introduction
Marco Wiering is a lecturer in the Department of Artificial Intelligence of the Bernoulli Institute of Mathematics, Computer Science and Artificial Intelligence at the University of Groningen, the Netherlands. His main research interests are deep learning, reinforcement learning, neural networks, support vector machines, computer vision, time-series analysis, game playing programs, optimization, and robotics.
Additional affiliations
May 1999 - October 1999
University of Amsterdam
Position
  • Postdoc
September 2007 - present
University of Groningen
Position
  • Professor
January 2000 - September 2007
Utrecht University

Publications (239)
Conference Paper
Full-text available
This paper describes a new machine learning algorithm for regression and dimensionality reduction tasks. The Neural Support Vector Machine (NSVM) is a hybrid learning algorithm consisting of neural networks and support vector machines (SVMs). The output of the NSVM is given by SVMs that take a central feature layer as their input. The feature-layer...
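The abstract is cut off here, but the described hybrid (a neural feature layer whose outputs feed support vector machines) can be illustrated with a minimal two-stage sketch. Note that this decoupled pipeline is only an assumption for illustration; the NSVM itself trains the feature layer and the SVMs together, and all names below are my own.

```python
# Minimal sketch of a "neural features + SVM head" pipeline, assuming a toy
# regression task. The actual NSVM trains both parts jointly; this decoupled
# version only illustrates the architecture described in the abstract.
import numpy as np
import torch
import torch.nn as nn
from sklearn.svm import SVR

# Toy regression data (assumption: any (X, y) with continuous targets works).
X = np.random.randn(500, 10).astype(np.float32)
y = np.sin(X[:, 0]) + 0.1 * np.random.randn(500).astype(np.float32)

# 1) Train a small MLP; its penultimate layer acts as the central feature layer.
features = nn.Sequential(nn.Linear(10, 32), nn.Tanh(), nn.Linear(32, 8), nn.Tanh())
head = nn.Linear(8, 1)
model = nn.Sequential(features, head)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
Xt, yt = torch.from_numpy(X), torch.from_numpy(y).unsqueeze(1)
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(Xt), yt)
    loss.backward()
    opt.step()

# 2) Replace the linear head by an SVM that reads the learned feature layer.
with torch.no_grad():
    Z = features(Xt).numpy()
svm_head = SVR(kernel="rbf", C=1.0).fit(Z, y)
print("SVM-on-features R^2:", svm_head.score(Z, y))
```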
Conference Paper
Full-text available
This paper describes using multi-agent reinforcement learning (RL) algorithms for learning traffic light controllers to minimize the overall waiting time of cars in a city. The RL systems learn value functions estimating expected waiting times for cars given different settings of traffic lights. Selected settings of traffic lights result from combi...
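As a rough illustration of the idea in this abstract (learning value functions that estimate expected waiting times per car and picking the light setting with the lowest total estimate), a tabular sketch is given below. The state encoding, data structures, and function names are assumptions, not the paper's algorithm.

```python
# Rough tabular sketch: learn expected waiting times Q[(car_state, light)]
# and choose the joint light setting that minimises the summed estimate.
# All encodings and names here are assumptions for illustration only.
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.9
Q = defaultdict(float)   # Q[(car_state, light_is_green)] = expected waiting time

def choose_setting(cars, settings):
    """Pick the joint light setting with the lowest total predicted waiting time.
    `cars` is an iterable of (lane, car_state); each setting maps lane -> green?"""
    def total_wait(setting):
        return sum(Q[(car, setting[lane])] for lane, car in cars)
    return min(settings, key=total_wait)

def update(car_state, was_green, next_state, next_green, waited):
    """TD update: each time step of waiting contributes `waited` (0 or 1) to the cost."""
    target = waited + GAMMA * Q[(next_state, next_green)]
    Q[(car_state, was_green)] += ALPHA * (target - Q[(car_state, was_green)])
```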
Article
Full-text available
A crucial challenge in reinforcement learning is to reduce the number of interactions with the environment that an agent requires to master a given task. Transfer learning proposes to address this issue by re-using knowledge from previously learned tasks. However, determining which source task qualifies as the most appropriate for knowledge extract...
Preprint
Full-text available
A crucial challenge in reinforcement learning is to reduce the number of interactions with the environment that an agent requires to master a given task. Transfer learning proposes to address this issue by re-using knowledge from previously learned tasks. However, determining which source task qualifies as optimal for knowledge extraction, as well...
Preprint
Full-text available
Reinforcement learning (RL) is well known for requiring large amounts of data in order for RL agents to learn to perform complex tasks. Recent progress in model-based RL allows agents to be much more data-efficient, as it enables them to learn behaviors of visual environments in imagination by leveraging an internal World Model of the environment....
Conference Paper
Full-text available
Reward shaping is an efficient way to incorporate domain knowledge into a reinforcement learning agent. Nevertheless, it is impractical and inconvenient to require prior knowledge for designing shaping rewards. Therefore, learning the shaping reward function by the agent during training could be more effective. In this paper, based on the potentia...
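The abstract breaks off at what is presumably potential-based reward shaping. As background, the standard potential-based shaping term is F(s, s') = γΦ(s') − Φ(s), which leaves the optimal policy unchanged (Ng et al., 1999); how Φ is learned during training is the paper's contribution and is not reproduced in the sketch below. The hand-written potential is a placeholder.

```python
# Minimal sketch of potential-based reward shaping: the shaped reward adds
# F(s, s') = gamma * phi(s') - phi(s), which preserves the optimal policy.
# How phi is *learned* during training is not shown here.
GAMMA = 0.99

def shaped_reward(reward, phi, s, s_next):
    """Environment reward plus the potential-based shaping term."""
    return reward + GAMMA * phi(s_next) - phi(s)

# Example with a hand-written potential (assumption, for illustration only):
phi = lambda s: -abs(s - 10)   # states closer to 10 get a higher potential
print(shaped_reward(0.0, phi, s=3, s_next=4))
```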
Chapter
The Tsetlin Machine is a recent supervised machine learning algorithm that has obtained competitive results in several benchmarks, both in terms of accuracy and resource usage. It has been used for convolution, classification, and regression, producing interpretable rules. In this paper, we introduce the first framework for reinforcement learning b...
Article
Full-text available
Background: The inclusion of facial and bodily cues (clinical gestalt) in machine learning (ML) models improves the assessment of patients' health status, as shown in genetic syndromes and acute coronary syndrome. It is unknown if the inclusion of clinical gestalt improves ML-based classification of acutely ill patients. As in previous research in...
Article
Full-text available
Critically ill patients constitute a highly heterogeneous population, with seemingly distinct patients having similar outcomes, and patients with the same admission diagnosis having opposite clinical trajectories. We aimed to develop a machine learning methodology that identifies and provides better characterization of patient clusters at high risk...
Preprint
Sub-optimal control policies in intersection traffic signal controllers (TSC) contribute to congestion and lead to negative effects on human health and the environment. Reinforcement learning (RL) for traffic signal control is a promising approach to design better control policies and has attracted considerable research interest in recent years. Ho...
Article
Full-text available
Despite having a similar post-operative complication profile, cardiac valve operations are associated with a higher mortality rate compared to coronary artery bypass grafting (CABG) operations. For long-term mortality, few predictors are known. In this study, we applied an ensemble machine learning (ML) algorithm to 88 routinely collected peri-oper...
Conference Paper
Full-text available
In the quest for efficient and robust learning methods, combining unsupervised state representation learning and reinforcement learning (RL) could offer advantages for scaling RL algorithms by providing the models with a useful inductive bias. For achieving this, an encoder is trained in an unsupervised manner with two state representation methods,...
Conference Paper
Full-text available
Q-learning is a reinforcement learning algorithm that has overestimation bias, because it learns the optimal action values by using a target that maximizes over uncertain action-value estimates. Although the overestimation bias of Q-learning is generally considered harmful, a recent study suggests that it could be either harmful or helpful dependi...
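As general background to this abstract (not the paper's specific method), the overestimation comes from maximizing over noisy estimates in the Q-learning target; Double Q-learning decouples action selection from evaluation to reduce it. A minimal sketch of the two targets:

```python
# Background sketch: the Q-learning target maximises over noisy estimates,
# which biases values upward; Double Q-learning selects the action with one
# estimator and evaluates it with another.
import numpy as np

def q_learning_target(r, q_next, gamma=0.99):
    # max over the same (noisy) estimates -> overestimation bias
    return r + gamma * np.max(q_next)

def double_q_target(r, qa_next, qb_next, gamma=0.99):
    # select with one estimator, evaluate with the other
    a_star = int(np.argmax(qa_next))
    return r + gamma * qb_next[a_star]

noisy = np.zeros(5) + np.random.randn(5) * 0.5   # true values are all zero
print(q_learning_target(0.0, noisy))              # tends to be > 0 on average
```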
Conference Paper
Full-text available
Deep Reinforcement Learning (DRL) has the potential to surpass the existing state of the art in various practical applications. However, as long as learned strategies and performed decisions are difficult to interpret, DRL will not find its way into safety-relevant fields of application. SHAP values are an approach to overcome this problem. It is e...
Conference Paper
Full-text available
In reinforcement learning, Q-learning is the best-known algorithm, but it suffers from overestimation bias, which may lead to poor performance or unstable learning. In this paper, we present a novel analysis of this problem using various control tasks. For solving these tasks, Q-learning is combined with a multilayer perceptron (MLP), experience rep...
Article
In this paper, we propose a two-stage learning framework for visual navigation in which the experience of the agent during exploration of one goal is shared to learn to navigate to other goals. We train a deep neural network for estimating the robot’s position in the environment using ground truth information provided by a classical localization an...
Preprint
Full-text available
Introduction Despite extensive research, the goal of unravelling patient heterogeneity in critical care remains largely unattained. Combining clustering analysis of routinely collected high-frequency data with the identification of features driving cluster separation may constitute a step towards improving patient characterization. Methods In this...
Article
This paper presents CentroidNetV2, a novel hybrid Convolutional Neural Network (CNN) that has been specifically designed to segment and count many small and connected object instances. This complete redesign of the original CentroidNet uses a CNN backbone to regress a field of centroid-voting vectors and border-voting vectors. The segmentation mask...
Preprint
Full-text available
The classification of DNA sequences is a key research area in bioinformatics as it enables researchers to conduct genomic analysis and detect possible diseases. In this paper, three state-of-the-art algorithms, namely Convolutional Neural Networks, Deep Neural Networks, and N-gram Probabilistic Models, are used for the task of DNA classification. F...
Preprint
Full-text available
In many reinforcement learning (RL) problems, it takes some time until a taken action by the agent reaches its maximum effect on the environment and consequently the agent receives the reward corresponding to that action by a delay called action-effect delay. Such delays reduce the performance of the learning algorithm and increase the computationa...
Conference Paper
Full-text available
Counting the number of fruits in an image is important for orchard management, but is complex due to different challenging problems such as overlapping fruits and the difficulty to create large labeled datasets. In this paper, we propose the use of a data-augmentation technique that creates novel images by adding a number of manually cropped fruits...
Chapter
Full-text available
Counting the number of fruits in an image is important for orchard management, but is complex due to different challenging problems such as overlapping fruits and the difficulty to create large labeled datasets. In this paper, we propose the use of a data-augmentation technique that creates novel images by adding a number of manually cropped fruits...
Conference Paper
Full-text available
This paper describes a novel approach to control forest fires in a simulated environment using connectionist reinforcement learning (RL) algorithms. A forest fire simulator is introduced that allows us to benchmark several popular model-free RL algorithms that are combined with multilayer perceptrons that serve as a value function approximator. For ou...
Conference Paper
Full-text available
We present a novel approach for learning an approximation of the optimal state-action value function (Q) in model-free Deep Reinforcement Learning (DRL). We propose to learn this approximation while simultaneously learning an approximation of the state-value function (V). We introduce two new DRL algorithms, called DQV-Learning and DQV-Max Learning...
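The abstract introduces DQV-Learning and DQV-Max, which learn a state-value network V alongside a state-action value network Q. The sketch below shows one way such joint V/Q learning with shared TD targets can look; the exact DQV and DQV-Max target definitions are given in the paper and may differ from this simplified approximation, and the batch layout is assumed.

```python
# Rough sketch of jointly learning V and Q from a shared TD target, in the
# spirit of DQV; the exact DQV / DQV-Max targets may differ from this
# approximation. `batch` is assumed to hold (s, a, r, s_next, done) tensors.
import torch
import torch.nn as nn

def joint_vq_losses(q_net, v_net, v_target_net, batch, gamma=0.99):
    s, a, r, s_next, done = batch
    with torch.no_grad():
        target = r + gamma * (1 - done) * v_target_net(s_next).squeeze(1)
    v_loss = nn.functional.mse_loss(v_net(s).squeeze(1), target)
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    q_loss = nn.functional.mse_loss(q_sa, target)
    return v_loss, q_loss
```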
Article
Full-text available
For performing multi-class classification, deep neural networks almost always employ a One-vs-All (OvA) classification scheme with as many output units as there are classes in a dataset. The problem of this approach is that each output unit requires a complex decision boundary to separate examples from one class from all other examples. In this pap...
Preprint
Full-text available
In this paper, a novel racing environment for OpenAI Gym is introduced. This environment operates with continuous action- and state-spaces and requires agents to learn to control the acceleration and steering of a car while navigating a randomly generated racetrack. Different versions of two actor-critic learning algorithms are tested on this envir...
Article
Full-text available
Background: Hemodynamic assessment of critically ill patients is a challenging endeavor, and advanced monitoring techniques are often required to guide treatment choices. Given the technical complexity and occasional unavailability of these techniques, estimation of cardiac function based on clinical examination is valuable for critical care physi...
Conference Paper
Full-text available
Following the rise of e-commerce there has been a dramatic increase in online criminal activities targeting online shoppers. Considering that the number of online stores has risen dramatically, manually checking these stores has become intractable. An automated process is therefore required. We approached this problem by applying machine learning t...
Article
Full-text available
A novel framework for intelligent structural control is proposed using reinforcement learning. In this approach, a deep neural network learns how to improve structural responses using feedback control. The effectiveness of the framework is demonstrated in a case study for a moment frame subjected to earthquake excitations. The performance of the le...
Preprint
Full-text available
This paper makes one step forward towards characterizing a new family of model-free Deep Reinforcement Learning (DRL) algorithms. The aim of these algorithms is to jointly learn an approximation of the state-value function (V), alongside an approximation of the state-action value function (Q). Our analysis starts with a thorough study...
Chapter
Full-text available
Sepsis is an excessive bodily reaction to an infection in the bloodstream, which causes one in five patients to deteriorate within two days after admission to the hospital. Until now, no clear tool for early detection of sepsis induced deterioration has been found. This research uses electrocardiograph (ECG), respiratory rate, and blood oxygen satu...
Thesis
Full-text available
This thesis describes and compares three algorithms for solving the 0-1 knapsack problem. The latter is a combinatorial optimization problem in which the aim is to maximize value subject to a capacity constraint. The knapsack problem is NP-complete, which means that no polynomial-time algorithm for solving every instance is known; furthermore it is als...
Article
Full-text available
Keyphrase extraction is an important part of natural language processing (NLP) research, although little research is done in the domain of web pages. The World Wide Web contains billions of pages that are potentially interesting for various NLP tasks, yet it remains largely untouched in scientific research. Current research is often only applied to...
Preprint
Background: Hemodynamic assessment of critically ill patients is a challenging endeavor, and advanced monitoring techniques are often required to guide treatment choices. Given the technical complexity and occasional unavailability of these techniques, being able to estimate cardiac function based on clinical examination is a valuable tool for criti...
Conference Paper
Full-text available
In semantic segmentation tasks the Jaccard Index, or Intersection over Union (IoU), is often used as a measure of success. While this measure is more representative than per-pixel accuracy, state-of-the-art deep neural networks are still trained on accuracy by using Binary Cross Entropy loss. In this research, an alternative is used where deep neur...
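The abstract contrasts training on Binary Cross Entropy with optimizing the Jaccard index directly. A common differentiable surrogate is the "soft" Jaccard (IoU) loss sketched below; whether this exact formulation matches the one used in the paper is an assumption.

```python
# A common differentiable surrogate for the Jaccard index ("soft IoU" loss).
# Whether this exact formulation matches the paper's is an assumption.
import torch

def soft_jaccard_loss(logits, targets, eps=1e-7):
    """logits, targets: tensors of shape (N, H, W); targets contain 0/1 masks."""
    probs = torch.sigmoid(logits)
    intersection = (probs * targets).sum(dim=(1, 2))
    union = probs.sum(dim=(1, 2)) + targets.sum(dim=(1, 2)) - intersection
    return (1.0 - (intersection + eps) / (union + eps)).mean()
```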
Conference Paper
Full-text available
The online game Agar.io has become massively popular on the internet due to its intuitive game design and its ability to instantly match players with others around the world. The game has a continuous input and action space and allows diverse agents with complex strategies to compete against each other. In this paper we focus on the pellet eating t...
Article
Full-text available
Precision agriculture using unmanned aerial vehicles (UAVs) is gaining popularity. These UAVs provide a unique aerial perspective suitable for inspecting agricultural fields. With the use of hyperspectral cameras, complex inspection tasks are being automated. Payload constraints of UAVs require low weight and small hyperspectral cameras; however, s...
Chapter
Full-text available
This paper describes a novel reinforcement learning system for learning to play the game of Tron. The system combines Q-learning, multi-layer perceptrons, vision grids, opponent modelling, and Monte Carlo rollouts in a novel way. By learning an opponent model, Monte Carlo rollouts can be effectively applied to generate state trajectories for all po...
Chapter
In precision agriculture, counting and precise localization of crops is important for optimizing crop yield. In this paper CentroidNet is introduced which is a Fully Convolutional Neural Network (FCNN) architecture specifically designed for object localization and counting. A field of vectors pointing to the nearest object centroid is trained and c...
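CentroidNet regresses, for each pixel, a vector pointing to the nearest object centroid; centroids can then be recovered by letting the pixels vote and taking peaks in the vote map. The sketch below covers only this voting/decoding step (the FCNN that predicts the vectors is not shown), and the function and variable names are my own.

```python
# Sketch of the centroid-voting decode step only; the network that predicts
# the per-pixel vectors is not shown, and all names are assumptions.
import numpy as np

def vote_centroids(vectors):
    """vectors: array (H, W, 2) of predicted offsets (dy, dx) to the nearest centroid."""
    h, w, _ = vectors.shape
    votes = np.zeros((h, w), dtype=np.int32)
    ys, xs = np.mgrid[0:h, 0:w]
    ty = np.clip(np.round(ys + vectors[..., 0]).astype(int), 0, h - 1)
    tx = np.clip(np.round(xs + vectors[..., 1]).astype(int), 0, w - 1)
    np.add.at(votes, (ty, tx), 1)   # each pixel casts one vote
    return votes                     # peaks in `votes` mark object centroids
```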
Poster
Full-text available
We introduce Deep Quality-Value (DQV) Learning, a novel Deep Reinforcement Learning (DRL) algorithm which learns significantly faster and better than Deep Q-Learning and Double Deep Q-Learning. DQV uses temporal-difference learning to train a Value neural network and uses this network for training a second Quality-value network that learns to estim...
Conference Paper
Full-text available
Sepsis is an excessive bodily reaction to an infection in the bloodstream, which causes one in five patients to deteriorate within two days after admission to the hospital. Until now, no clear tool for early detection of sepsis induced deterioration has been found. This research uses electrocardiograph (ECG), respiratory rate, and blood oxygen satu...
Chapter
Full-text available
This paper describes the use of two different deep-learning algorithms for object detection to recognize different badgers. We use recordings of four different badgers under varying background illuminations. In total four different object detection algorithms based on deep neural networks are compared: The single shot multi-box detector (SSD) with...
Article
Traffic signal control plays a pivotal role in reducing traffic congestion. Traffic signals cannot be adequately controlled with conventional methods due to the high variations and complexity in traffic environments. In recent years, reinforcement learning (RL) has shown great potential for traffic signal control because of its high adaptability, f...
Preprint
Full-text available
We introduce a novel Deep Reinforcement Learning (DRL) algorithm called Deep Quality-Value (DQV) Learning. DQV uses temporal-difference learning to train a Value neural network and uses this network for training a second Quality-value network that learns to estimate state-action values. We first test DQV's update rules with Multilayer Perceptrons a...
Preprint
Full-text available
In this paper, a new offline actor-critic learning algorithm is introduced: Sampled Policy Gradient (SPG). SPG samples in the action space to calculate an approximated policy gradient by using the critic to evaluate the samples. This sampling allows SPG to search the action-Q-value space more globally than deterministic policy gradient (DPG), enabl...
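One plausible reading of the SPG idea from this abstract (not necessarily the paper's exact update) is sketched below: sample actions around the actor's output, let the critic pick the best sample, and regress the actor toward that action. The critic signature and all hyperparameters are assumptions.

```python
# One plausible reading of the sampled-policy-gradient idea: sample actions
# around the actor's output, evaluate them with the critic, and move the
# actor toward the best sample. Critic(s, a) signature is assumed.
import torch
import torch.nn as nn

def spg_style_actor_loss(actor, critic, states, n_samples=8, sigma=0.2):
    with torch.no_grad():
        base = actor(states)                                    # (N, A)
        cands = base.unsqueeze(1) + sigma * torch.randn(
            base.size(0), n_samples, base.size(1))              # (N, S, A)
        cands = torch.cat([base.unsqueeze(1), cands], dim=1)    # keep the base action
        s_rep = states.unsqueeze(1).expand(-1, cands.size(1), -1)
        q = critic(s_rep.reshape(-1, states.size(1)),
                   cands.reshape(-1, base.size(1))).reshape(base.size(0), -1)
        best = cands[torch.arange(base.size(0)), q.argmax(dim=1)]
    return nn.functional.mse_loss(actor(states), best)
```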
Conference Paper
Full-text available
In precision agriculture, counting and precise localization of crops is important for optimizing crop yield. In this paper CentroidNet is introduced which is a Fully Convolutional Neural Network (FCNN) architecture specifically designed for object localization and counting. A field of vectors pointing to the nearest object centroid is trained and c...
Article
Full-text available
Reinforcement learning (RL) algorithms enable computer programs to learn from interacting with an environment. The goal is to learn the optimal policy that maximizes the long-term intake of a reward signal, where rewards are given to the agent for reaching particular environmental situations. The field of RL has developed a lot over the past decade....
Conference Paper
Full-text available
Pac-Xon is an arcade video game in which the player tries to fill a level space by conquering blocks while being threatened by enemies. In this paper it is investigated whether a reinforcement learning (RL) agent can successfully learn to play this game. The RL agent consists of a multi-layer perceptron (MLP) that uses a feature representation of t...
Conference Paper
Full-text available
This study focuses on supplementing data sets with data of absent classes by using other, similar data sets in which these classes are represented. The data is generated using Generative Adversarial Nets (GANs) trained on the CelebA and MNIST datasets. In particular we use and compare Coupled GANs (CoGANs), Auxiliary Classifier GANs (AC-GANs) and...
Conference Paper
Full-text available
This paper describes a novel hierarchical reinforcement learning (HRL) algorithm for training an autonomous agent to play a dungeon crawler game. As opposed to most previous HRL frameworks, the proposed HRL system does not contain complex actions that take multiple time steps. Instead there is a hierarchy of behaviours which can either execute an a...
Article
Full-text available
In this paper, we examine a novel data augmentation (DA) method that transforms an image into a new image containing multiple rotated copies of the original image. The DA method creates a grid of cells, in which each cell contains a different randomly rotated image and introduces a natural background in the newly created image. We investigate the u...
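The described augmentation builds a grid of cells, each holding a differently rotated copy of the original image placed on a background. A minimal sketch is given below; the paper inserts a natural background, whereas the plain fill and cell layout here are stand-in assumptions.

```python
# Sketch of the grid-of-rotations augmentation described in the abstract.
# The paper uses a natural background; a plain fill is used here as a
# stand-in, and the cell layout details are assumptions.
import random
from PIL import Image

def rotation_grid(img, rows=2, cols=2, fill=(127, 127, 127)):
    cell_w, cell_h = img.width, img.height
    canvas = Image.new("RGB", (cols * cell_w, rows * cell_h), fill)
    for r in range(rows):
        for c in range(cols):
            rotated = img.rotate(random.uniform(0, 360), expand=False, fillcolor=fill)
            canvas.paste(rotated, (c * cell_w, r * cell_h))
    return canvas

# Usage: rotation_grid(Image.open("example.jpg")).save("augmented.jpg")
```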
Article
Full-text available
Generative adversarial networks (GANs) have proven successful at generating realistic real-world images. In this paper we compare various GAN techniques, both supervised and unsupervised. The effects on training stability of different objective functions are compared. We add an encoder to the network, making it possible to encode images...
Conference Paper
Full-text available
Generative adversarial networks (GANs) have proven successful at generating realistic real-world images. In this paper we compare various GAN techniques, both supervised and unsupervised. The effects on training stability of different objective functions are compared. We add an encoder to the network, making it possible to encode images...
Article
Full-text available
Traffic signal control can be naturally regarded as a reinforcement learning problem. Unfortunately, it is one of the most difficult classes of reinforcement learning problems owing to its large state space. A straightforward approach to address this challenge is to control traffic signals based on continuous reinforcement learning. Although they h...
Chapter
Full-text available
Neural networks and reinforcement learning have successfully been applied to various games, such as Ms. Pacman and Go. We combine multilayer perceptrons and a class of reinforcement learning algorithms known as actor-critic to learn to play the arcade classic Donkey Kong. Two neural networks are used in this study: the actor and the critic. The act...
Chapter
Full-text available
We compare classic text classification techniques with more recent machine learning techniques and introduce a novel architecture that outperforms many state-of-the-art approaches. These techniques are evaluated on a new multi-label classification task, where the task is to predict the genre of a movie based on its subtitle. We show that pre-traine...
Conference Paper
Full-text available
In this paper we propose a new approach to Deep Neural Networks (DNNs) based on the particular needs of navigation tasks. To investigate these needs we created a labelled image dataset of a test environment and we compare classical computer vision approaches with the state of the art in image classification. Based on these results we have developed...
Article
Full-text available
Designing efficient traffic signal controllers has always been an important concern in traffic engineering. This is owing to the complex and uncertain nature of traffic environments. Within such a context, reinforcement learning has been one of the most successful methods owing to its adaptability and its online learning ability. Reinforcement lear...
Poster
Full-text available
In this paper we propose a new approach to Deep Neural Networks (DNNs) based on the particular needs of navigation tasks. To investigate these needs we created a labeled image dataset of a test environment and we compare classical computer vision approaches with the state of the art in image classification. Based on these results we have developed...