About
21 Publications · 5,930 Reads
222 Citations (since 2017)
Publications (21)
Background: Treating patients with combinations of drugs that have synergistic effects has become widespread practice in the clinic. Drugs work synergistically when the observed effect of a drug combination is larger than the effect predicted by the reference model. The reference model is a theoretical null model that returns the combined effect of...
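The reference model mentioned in the abstract can be made concrete with a small sketch. Bliss independence is one widely used reference model, chosen here purely for illustration (the paper may study different or additional reference models):

```python
def bliss_reference(effect_a: float, effect_b: float) -> float:
    """Expected combined effect under Bliss independence.

    Effects are fractional responses in [0, 1]; the reference assumes
    the two drugs act independently, so the combined effect is
    E_A + E_B - E_A * E_B.  (Illustrative model, not necessarily the
    one used in the paper.)
    """
    return effect_a + effect_b - effect_a * effect_b

# A combination whose observed effect exceeds the reference prediction
# would be called synergistic under this null model.
observed = 0.80
expected = bliss_reference(0.5, 0.4)   # 0.5 + 0.4 - 0.5*0.4 = 0.7
print(expected, observed > expected)
```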
Emulators provide approximations to computationally expensive functions and are widely used in diverse domains, despite the ever-increasing speed of computational devices. In this paper we establish a connection between two independently developed emulation methods: radial basis function networks and Gaussian process emulation. The methodological r...
It is commonplace to determine the effectiveness of the combination of drugs by comparing the observed effects to a reference model that describes the combined effect under the assumption that the drugs do not interact. Depending on what is to be understood by non-interacting behavior, several reference models have been developed in the literature....
Gaussian process (GP) emulation is a relatively recent statistical technique that provides a fast-running approximation to a complex computer model, given training data generated by the considered model. Despite its sound theoretical foundation, GP emulation falls short in practical applications where the training dataset is very large, due to nume...
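As a rough illustration of the technique described above (a minimal sketch, not the paper's implementation), a zero-mean GP emulator with a squared-exponential kernel fits in a few lines. The `np.linalg.solve` on the n×n training covariance is also where naive GP emulation becomes costly and numerically delicate for very large training sets:

```python
import numpy as np

def rbf_kernel(x1, x2, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between two sets of 1-D inputs."""
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

def gp_emulator(x_train, y_train, x_test, jitter=1e-8):
    """Posterior mean and variance of a zero-mean GP emulator.

    The O(n^3) solve on the n x n training covariance is the
    bottleneck that limits this approach for large training sets.
    """
    K = rbf_kernel(x_train, x_train) + jitter * np.eye(len(x_train))
    K_star = rbf_kernel(x_test, x_train)
    mean = K_star @ np.linalg.solve(K, y_train)
    explained = np.sum(K_star.T * np.linalg.solve(K, K_star.T), axis=0)
    var = rbf_kernel(x_test, x_test).diagonal() - explained
    return mean, var

# Emulate an "expensive" function from a handful of training runs.
x = np.linspace(0, 2 * np.pi, 8)
y = np.sin(x)                      # stand-in for the simulator output
mean, var = gp_emulator(x, y, np.array([1.0]))
print(mean[0], var[0])
```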
In this paper, we show the relationship between two seemingly unrelated approximation techniques: on the one hand, a certain class of Gaussian process-based interpolation methods; on the other, inverse distance weighting, which was developed in the context of spatial analysis, where there is often a need for interpolating from irregular...
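Inverse distance weighting itself is simple to state. A minimal one-dimensional sketch of Shepard's method follows; the power parameter of 2 is an assumed default, not a value taken from the paper:

```python
def idw_interpolate(x, points, values, power=2.0):
    """Shepard's inverse distance weighting at a query point x.

    `points` and `values` are scattered training data; `power`
    controls how quickly a point's influence decays with distance.
    """
    weights = []
    for p, v in zip(points, values):
        d = abs(x - p)
        if d == 0.0:                 # exact hit: return the known value
            return v
        weights.append((1.0 / d ** power, v))
    total = sum(w for w, _ in weights)
    return sum(w * v for w, v in weights) / total

# Interpolates between irregularly spaced samples.
pts = [0.0, 1.0, 3.0]
vals = [0.0, 2.0, 6.0]
print(idw_interpolate(0.5, pts, vals))
```

Note how the method interpolates exactly at the data points and blends neighbouring values elsewhere, with no linear algebra required; this simplicity is what the comparison with GP-based interpolation trades on.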
A common way to evaluate surrogate models is by using validation measures. This amounts to applying a chosen validation measure to a test data set that was not used to train the surrogate model. The selection of a validation measure is typically motivated by diverse guidelines, such as simplicity of the measure, ease of implementation, popularity o...
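As an illustration of applying a validation measure to held-out test data, here is a sketch using two common measures, RMSE and the coefficient of determination; these two are chosen for illustration only, since the abstract is precisely about how such a choice is usually made ad hoc:

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error: a simple surrogate-validation measure."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r_squared(y_true, y_pred):
    """Coefficient of determination on held-out test data."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Validate the surrogate on data never seen during training.
y_test = [1.0, 2.0, 3.0]            # held-out simulator outputs
y_surrogate = [1.1, 1.9, 3.2]       # surrogate predictions (hypothetical)
print(rmse(y_test, y_surrogate), r_squared(y_test, y_surrogate))
```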
Probabilistic sensitivity analysis (SA) makes it easier to incorporate background knowledge on the considered input variables than many other existing SA techniques. Such knowledge is incorporated by constructing a joint density function over the input domain. However, it rarely happens that available knowledge directly and uniquel...
In this paper, we compare three surrogate models on a highly irregular, yet real-world data set. The three methods differ strongly in mathematical sophistication and computational complexity. First, inverse distance weighting, a very intuitive method whose single parameter can be readily determined via basic, albeit nonlinear, optimization. Second,...
We describe the application of statistical emulation to the outcomes of an agent-based model. The agent-based model simulates the mechanisms that might have linked the reversal of gender inequality in higher education with observed changes in educational assortative mating in Belgium. Using the statistical emulator as a computationally fast approxi...
This paper presents a method to apply statistical emulation on very large data sets, making use of cluster analysis. It is shown how integrating cluster analysis with the interpolation method called inverse distance weighting, naturally generalizes the basic emulation framework where a single Gaussian distribution is used, to a framework where a mi...
Robustness is an important concept when dealing with clustering algorithms. While most literature directed to this concept discusses robustness with respect to changes in the given data set, this paper focuses on robustness with respect to changes in the initial conditions. We build on our previous work, where we introduced the concepts of instabil...
In this paper we present a survey on the application of recurrent neural networks to the task of statistical language modeling. Although it has been shown that these models obtain good performance on this task, often superior to other state-of-the-art techniques, they suffer from some important drawbacks, including a very long training time and lim...
In this paper we give a general definition for the concept ‘optimal clustering’ which is applicable to overlapping clusterings. Overlapping clusterings are a generalization of hard clusterings and their structure is formally developed in this paper. It is generally assumed that the domain of clustering is too heuristic to develop a general, i.e. ax...
In this paper, we generalize the hard clustering paradigm. While in this paradigm a data set is subdivided into disjoint clusters, we allow different clusters to have a nonempty intersection. The concept of hard clustering is then analysed in this general setting, and we show which specific properties hard clusterings possess in comparison to more...
Clustering is an important approach in the analysis of biological data, and often a first step to identify interesting patterns of coexpression in gene expression data. Because of the high complexity and diversity of gene expression data, many genes cannot be easily assigned to a cluster, but even if the dissimilarity of these genes with all other...
We propose a measure for the validation of clusterings of gene expression data. This measure is also useful to estimate missing gene expression levels, based on the similarity information contained in a given clustering. It is shown that this measure is an improvement over the figure of merit, an existing validation measure especially developed for...
It is well known that the clusters produced by a clustering algorithm depend on the chosen initial centers. In this paper we present a measure for the degree to which a given clustering algorithm depends on the choice of initial centers, for a given data set. This measure is calculated for four well-known offline clustering algorithms (k-means Forg...
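The dependence on initial centers can be illustrated with a toy measure (an assumption for illustration, not the measure defined in the paper): run 1-D k-means from many random Forgy-style initializations and count how often two runs end with different centers:

```python
import random

def kmeans(data, k, seed):
    """Lloyd's algorithm on 1-D data with Forgy-style random initial centers."""
    rng = random.Random(seed)
    centers = rng.sample(data, k)          # initial centers: k random data points
    for _ in range(100):
        labels = [min(range(k), key=lambda j: abs(x - centers[j])) for x in data]
        new = []
        for j in range(k):
            members = [x for x, c in zip(data, labels) if c == j]
            new.append(sum(members) / len(members) if members else centers[j])
        if new == centers:                 # converged
            break
        centers = new
    return tuple(sorted(round(c, 6) for c in centers))

def instability(data, k, runs=20):
    """Fraction of run pairs whose final centers differ: 0 means the
    result is insensitive to the initial centers on this data set.
    (Toy measure for illustration, not the paper's definition.)"""
    results = [kmeans(data, k, seed) for seed in range(runs)]
    pairs = [(a, b) for i, a in enumerate(results) for b in results[i + 1:]]
    return sum(a != b for a, b in pairs) / len(pairs)

# Two well-separated groups: every initialization reaches the same split.
data = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
print(instability(data, k=2))
```

On overlapping or irregular data the same measure rises above zero, which is exactly the kind of initialization sensitivity the abstract sets out to quantify.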