About
67 Publications · 16,373 Reads
1,222 Citations

Publications (67)
In this paper, we present a modified Xception architecture, the NEXcepTion network. Our network has significantly better performance than the original Xception, achieving top-1 accuracy of 81.5% on the ImageNet validation dataset (an improvement of 2.5%) as well as a 28% higher throughput. Another variant of our model, NEXcepTion-TP, reaches 81.8%...
Recommender systems are increasingly used for personalised navigation through large amounts of information, especially in the e-commerce domain for product purchase advice. Whilst much research effort is spent on developing recommenders further, there is little to no effort spent on analysing the impact of them - neither on the supply (company) nor...
Although the anomaly detection problem can be considered as an extreme case of class imbalance problem, very few studies consider improving class imbalance classification with anomaly detection ideas. Most data-level approaches in the imbalanced learning domain aim to introduce more information to the original dataset by generating synthetic sample...
Although over 90 oversampling approaches have been developed in the imbalance learning domain, most empirical studies and application work are still based on the “classical” resampling techniques. In this paper, several experiments on 19 benchmark datasets are set up to study the efficiency of six powerful oversampling approaches, including bo...
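The interpolation idea shared by SMOTE-style oversamplers can be sketched in a few lines of Python (a toy illustration on made-up data, not one of the six benchmarked methods):

```python
import random
import math

def smote_like(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority samples by interpolating each
    base point toward one of its k nearest minority neighbours -- the
    core idea behind SMOTE-style oversampling."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest neighbours of x among the other minority points
        others = [p for p in minority if p is not x]
        others.sort(key=lambda p: math.dist(x, p))
        nb = rng.choice(others[:k])
        u = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(xi + u * (ni - xi) for xi, ni in zip(x, nb)))
    return synthetic

minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_like(minority, n_new=4)
print(len(new_points))  # 4 synthetic samples on segments between neighbours
```

Each synthetic point lies on a line segment between two existing minority samples, so the new data stay inside the region the minority class already occupies.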
This paper presents WODAN2.0, a workflow using Deep Learning for the automated detection of multiple archaeological object classes in LiDAR data from the Netherlands. WODAN2.0 is developed to rapidly and systematically map archaeology in large and complex datasets. To investigate its practical value, a large, random test dataset—next to a small, no...
Kriging or Gaussian Process Regression is applied in many fields as a non-linear regression model as well as a surrogate model in the field of evolutionary computation. However, the computational and space complexity of Kriging, which are cubic and quadratic in the number of data points respectively, become a major bottleneck with more and more data...
In this paper, the recently published benchmark of Goldstein and Uchida [3] for unsupervised anomaly detection is extended with three anomaly detection techniques: Sparse Auto-Encoders, Isolation Forests, and Restricted Boltzmann Machines. The underlying mechanisms of these algorithms differ substantially from the more traditional anomaly detection...
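The mechanism behind Isolation Forests, one of the three added techniques, is that anomalous points are separated by fewer random splits; a small 1-D sketch on synthetic data (a toy, not the benchmarked implementation) illustrates this:

```python
import random

def path_length(x, data, rng, depth=0, max_depth=10):
    """Depth at which x is isolated by random splits; anomalies
    isolate quickly (short paths) -- the core Isolation Forest idea."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    lo, hi = min(data), max(data)
    if lo == hi:
        return depth
    split = rng.uniform(lo, hi)
    # keep only the points that fall on the same side of the split as x
    side = [v for v in data if (v < split) == (x < split)]
    return path_length(x, side, rng, depth + 1, max_depth)

def anomaly_depth(x, data, n_trees=200, seed=0):
    """Average isolation depth over many random trees."""
    rng = random.Random(seed)
    return sum(path_length(x, data, rng) for _ in range(n_trees)) / n_trees

rng_data = random.Random(1)
data = [rng_data.gauss(0, 1) for _ in range(100)]
print(anomaly_depth(8.0, data) < anomaly_depth(0.0, data))  # True: the outlier isolates faster
```

Unlike density- or distance-based detectors, no distance to the bulk of the data is ever computed; the score is purely a function of how quickly random partitioning strands the point on its own.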
In processes involving human professional judgment (e.g., in Knowledge Intensive processes) it is not easy to verify if similar cases receive similar treatment. In these processes there is a risk of dissimilar treatment as human process workers may develop their individual experiences and convictions or change their behavior due to changes in workl...
The increase of publicly available bioactivity data in recent years has fueled and catalyzed research in chemogenomics, data mining, and modeling approaches. As a direct result, over the past few years a multitude of different methods have been reported and evaluated, such as target fishing, nearest neighbor similarity-based methods, and Quantitati...
Including categorical variables with many levels in a logistic regression model easily leads to a sparse design matrix. This can result in a big, ill-conditioned optimization problem causing overfitting, extreme coefficient values and long run times. Inspired by recent developments in matrix factorization, we propose four new strategies of overcomi...
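How a many-level categorical inflates the design matrix is easy to see with plain dummy coding (a toy with a hypothetical 50-level customer-ID column):

```python
def one_hot(values):
    """Dummy-encode a categorical column: one indicator column per level."""
    levels = sorted(set(values))
    rows = [[1 if v == lvl else 0 for lvl in levels] for v in values]
    return levels, rows

# hypothetical column: 200 records over 50 distinct customer IDs
values = [f"cust_{i % 50}" for i in range(200)]
levels, X = one_hot(values)
nnz = sum(map(sum, X))
density = nnz / (len(X) * len(levels))
print(len(levels), round(density, 3))  # 50 columns, only 2% of entries nonzero
```

Every record activates exactly one of the 50 indicator columns, so the design matrix is 98% zeros; with thousands of levels this sparsity is what makes the optimization ill-conditioned.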
Tax administrations need to become more efficient due to a growing workload, higher demands from citizens, and, in many countries, staff reduction and budget cuts. The novel field of analytics has achieved successes in improving efficiencies in areas such as banking, insurance and retail. Analytics, which is often described as an extensive use of d...
Kriging or Gaussian Process Regression has been successfully applied in many fields. One of the major bottlenecks of Kriging is the complexity in both processing time (cubic) and memory (quadratic) in the number of data points. To overcome these limitations, a variety of approximation algorithms have been proposed. One of these approximation algori...
Real-life datasets that occur in domains such as industrial process control, medical diagnosis, marketing, and risk management often contain missing values. This poses a challenge for many classification and regression algorithms which require complete training sets. In this paper we present a new approach for “repairing” such incomplete datasets by c...
Missing values in datasets form a very relevant and often overlooked problem in many fields. Most algorithms are not able to handle missing values for training a predictive model or analyzing a dataset. For this reason, records with missing values are either rejected or repaired. However, both repairing and rejecting affect the dataset and the fin...
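The repair-versus-reject trade-off the abstract mentions can be illustrated with the two simplest baseline strategies, mean imputation and listwise deletion (a minimal sketch on made-up data):

```python
def mean_impute(rows):
    """'Repair': replace None by the column mean."""
    cols = list(zip(*rows))
    means = [sum(v for v in c if v is not None) / sum(v is not None for v in c)
             for c in cols]
    return [[m if v is None else v for v, m in zip(r, means)] for r in rows]

def drop_incomplete(rows):
    """'Reject' (listwise deletion): discard whole records."""
    return [r for r in rows if None not in r]

data = [[1.0, 2.0], [None, 4.0], [3.0, None], [5.0, 6.0]]
print(len(drop_incomplete(data)))  # rejecting keeps only 2 of 4 records
print(mean_impute(data)[1][0])     # repairing fills the gap with the column mean 3.0
```

Rejection discards the observed values in a record along with the missing ones, while imputation keeps the record but injects values that were never observed; either way, the dataset handed to the final model differs from the data actually collected.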
Tax administrations need to become more efficient due to an expanding workload, higher demands from citizens, and, in many countries, budget cuts. The novel field of analytics has achieved successes in improving efficiencies in areas such as banking, telecommunication and retail. Analytics, which is often described as the extensive use of data, sta...
In business and academia we are continuously trying to model and analyze complex processes in order to gain insight and optimize. One of the most popular modeling algorithms is Kriging, or Gaussian Processes. A major bottleneck with Kriging is its processing time of at least O(n³) and memory requirement of O(n²) when applying this algorithm on...
The goals of this research were: (1) to develop a system that will automatically measure changes in the emotional state of a speaker by analyzing his/her voice, (2) to validate this system with a controlled experiment and (3) to visualize the results to the speaker in 2-d space. Natural (non-acted) human speech of 77 (Dutch) speakers was collected...
In Subgroup Discovery, one is interested in finding subgroups that behave differently from the ‘average’ behavior of the entire population. In many cases, such an approach works well because the general population is rather homogeneous, and the subgroup encompasses clear outliers. In more complex situations however, the investigated population is a...
In the literature, languages have been identified as having more or less transparent orthographies, depending on the degree of predictability of their spelling-to-sound correspondences. Quantitative measures based on large-scale language corpora that are capable of objectively assessing such cross-linguistic variation are rather scarce. The quantita...
The clinical assessment of speech discrimination by professional audiologists is resource intensive. Yet discrepancies in language or dialect between the test subject and the audiologist may cause a significant bias in the test result. To address these issues, a speech audiometric test (SAT) has been designed to be language/dialect independent and...
The goal of this research was to develop a system that will automatically measure changes in the emotional state of a speaker, by analyzing his/her voice. Natural (non-acted) human speech of 77 (Dutch) speakers was collected and manually split into speech units. Three recordings per speaker were collected, in which he/she was in a positive, neut...
In this paper we present a hybrid method for identifying suspicious behavior in transactional data by combining techniques from outlier detection and subgroup discovery. Most existing outlier detection approaches focus on the identification of single outliers without providing a description of these outliers. Moreover, these methods find single out...
Conventional techniques for detecting outliers address the problem of finding isolated observations that significantly differ from other observations that are stored in a database. For example, in the context of health insurance, one might be interested in finding unusual claims concerning prescribed medicines. Each claim record may contain informa...
Recommender systems are increasingly used for personalised navigation through large amounts of information, especially in the e-commerce domain for product purchase advice. Whilst much research effort is spent on developing recommenders further, there is little to no effort spent on analysing the impact of them --- neither on the supply (company) n...
The objective of this process evaluation study was to gain insight into the reach, compliance, appreciation, usage barriers, and users' perceived effectiveness of a web-based intervention http://www.wiagesprek.nl. This intervention was aimed at empowerment of disability claimants, prior to the assessment of disability by an insurance physician.
Recommender systems are increasingly used for personalised navigation through large amounts of information, especially in the e-commerce domain for product purchase advice. Whilst much research effort is spent on developing recommenders further, there is little to no effort spent on analysing the impact of them – neither on the supply (company)...
We study wireless ad-hoc networks consisting of small microprocessors with limited memory, where the wireless communication between the processors can be highly unreliable. For this setting, we propose a number of algorithms to estimate the number of nodes in the network, and the number of direct neighbors of each node. The algorithms are simulated...
In this paper we describe an interactive approach for finding outliers in big sets of records, such as those collected by banks, insurance companies, and web shops. The key idea behind our approach is the usage of an easy-to-compute and easy-to-interpret outlier score function. This function is used to identify a set of potential outliers. The outliers, orga...
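A classic example of an easy-to-compute, easy-to-interpret outlier score is a standardized z-score (an illustrative stand-in; the truncated abstract does not specify the paper's actual score function):

```python
import math

def z_scores(xs):
    """Outlier score: standardized distance from the mean --
    cheap to compute and directly interpretable in standard deviations."""
    mu = sum(xs) / len(xs)
    sd = math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))
    return [(x - mu) / sd for x in xs]

amounts = [100, 102, 98, 101, 99, 500]  # one suspicious record
scores = z_scores(amounts)
flagged = [a for a, s in zip(amounts, scores) if abs(s) > 2]
print(flagged)  # [500]
```

A score like this lets an analyst rank all records once and then interactively inspect only the highest-scoring ones, which is the workflow the abstract describes.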
A new method for maintaining a Gaussian mixture model of a data stream that arrives in blocks is presented. The method constructs local Gaussian mixtures for each block of data and iteratively merges pairs of closest components. Time and space complexity analysis of the presented approach demonstrates that it is 1-2 orders of magnitude more eff...
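The pairwise-merge step can be sketched by moment matching two weighted 1-D Gaussian components (an illustrative simplification of the multivariate case):

```python
def merge(w1, m1, v1, w2, m2, v2):
    """Moment-preserving merge of two weighted 1-D Gaussian components:
    the merged component keeps the pair's total weight, mean and variance."""
    w = w1 + w2
    m = (w1 * m1 + w2 * m2) / w
    # total variance = weighted within-component variance
    #                + weighted spread of the component means
    v = (w1 * (v1 + (m1 - m) ** 2) + w2 * (v2 + (m2 - m) ** 2)) / w
    return w, m, v

w, m, v = merge(0.5, 0.0, 1.0, 0.5, 2.0, 1.0)
print(w, m, v)  # 1.0 1.0 2.0
```

Repeatedly merging the closest pair keeps the number of components bounded as new blocks arrive, which is where the claimed efficiency over refitting from scratch comes from.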
The research reported in this paper is concerned with assessing the usefulness of reinforcement learning (RL) for on-line calibration of parameters in evolutionary algorithms (EA). We are running an RL procedure and the EA simultaneously and the RL is changing the EA parameters on-the-fly. We evaluate this approach experimentally on a range of fitn...
In this paper we describe a practical approach for modeling navigation patterns of visitors of unstructured websites. These patterns are derived from web logs that are enriched with 3 sorts of information: (1) content type of visited pages, (2) visitor type, and (3) location of the visitor. We developed an intelligent Text Mining system, iTM, which...
It has been recently demonstrated that the classical EM algorithm for learning Gaussian mixture models can be successfully implemented in a decentralized manner by resorting to gossip-based randomized distributed protocols. In this paper we describe a gossip-based implementation of an alternative algorithm for learning Gaussian mixtures in...
The paper presents results of analysis of clickstream data in the context of the ECML/PKDD Challenge. We focused on two aspects: detection of anomalies and profiling visitors of internet shops. Several unusual patterns were discovered with the help of simple tools such as frequency tables, histograms, etcetera. Some insight into click-behaviour of...
The Internet, which is becoming an increasingly dynamic and extremely heterogeneous network, has recently become a platform for huge, fully distributed peer-to-peer overlay networks containing millions of nodes, typically for the purpose of information dissemination and file sharing. This paper targets the problem of analyzing data which are scattered ov...
The emergence of the Internet as a computing platform increases the demand for new classes of algorithms that combine massive distributed processing and complete decentralization. Moreover, these algorithms should be able to execute in an environment that is heterogeneous, changes almost continuously, and consists of millions of nodes. An important...
Monitoring large computer networks often involves aggregation of various sorts of data that are distributed across network components. Finding extreme values, counting discrete observations or computing an average or a sum of some parameter values are typical examples of such "background" activities that provide input to monitoring systems. Another...
We propose a gossip-based distributed algorithm for Gaussian mixture learning, Newscast EM. The algorithm operates on network topologies where each node observes a local quantity and can communicate with other nodes in an arbitrary point-to-point fashion. The main difference between Newscast EM and the standard EM algorithm is that the M-step in o...
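The gossip principle underlying Newscast EM — pairwise exchanges that drive every node toward a global average — can be sketched for a single scalar (a toy illustration, not the full M-step):

```python
import random

def gossip_average(values, rounds=50, seed=0):
    """Newscast-style averaging: repeatedly pick two random nodes and
    replace both values by their mean. The sum is preserved, so every
    node converges to the global average without a central coordinator."""
    rng = random.Random(seed)
    vals = list(values)
    n = len(vals)
    for _ in range(rounds * n):
        i, j = rng.sample(range(n), 2)
        avg = (vals[i] + vals[j]) / 2
        vals[i] = vals[j] = avg
    return vals

vals = gossip_average([0.0, 10.0, 20.0, 30.0])
print(all(abs(v - 15.0) < 1e-3 for v in vals))  # True: every node is near the mean 15
```

In Newscast EM the quantities being averaged this way are the sufficient statistics of the mixture, so each node ends up with the same M-step result as a centralized EM would compute.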
We present an approximate algorithm for reconstructing internals of multi-layer perceptrons from membership queries. The key component of the algorithm is a procedure for reconstructing weights of a single linear threshold unit. We prove that the approximation error, measured as the distance between the original and the reconstructed weights, is dr...
Stylistic differences among poets are usually sought in sound and semantics. In human analysis, the criteria for recognizing stylistic differences are manifold and intermingled. This study demonstrates that successful identification of poets based on their work is possible using one criterion: letter sequences. Poets show preferences for certain lett...
We present an approach to knowledge discovery in databases that is based on the idea of the attribute space partition. An inspiration for this approach was the methodology of the rough set theory. Two systems, ProbRough and TRANCE, which are representative of this approach are capable of inducing decision rules from databases with practically unlim...
Some constraint satisfaction problems are over-constrained, i.e. it is impossible to find a solution that would satisfy all the constraints. In such situations one often assigns some weights (utilities) to constraints and looks for solutions that maximize the sum of the weights of satisfied constraints. The problem of finding such soluti...
This paper contains results of a research project aiming at modelling the phenomenon of customer retention. Historical data from a database of a big mutual fund investment company have been analyzed with three techniques: logistic regression, rough data models, and genetic programming. Models created by these techniques were used to gain insights i...
Ethologists and farmers are convinced that sounds produced by animals allow the recognition of individuals and include information about their state. This may have practical implications for increasing the efficiency of holding and handling farm and other animals. The paper presents an application of digital signal processing techniques to identify...
The paper contains results of a research project aimed at application and evaluation of modern data analysis techniques in the field of marketing. The investigated techniques were: neural networks, evolutionary algorithms, CHAID and logistic regression analysis. All techniques were applied to the problem of making optimal selections for direct ma...
Many practical problems are overconstrained, i.e., it is impossible to find a solution that would satisfy all constraints. In such situations one may try to find solutions that satisfy as many constraints as possible, see [1]. In a more general approach one can assign some weights (utilities) to constraints and then look for solutions that maximize...
The paper presents a general framework for developing complex systems for tasks like project management, dynamic scheduling and design. The framework is based on the concept of solving meta-constraint satisfaction problems interactively and it uses a number of basic notions from Artificial Intelligence: states, search space, operators that modify t...
Our task was to find a model for classifying ROBECO clients into four classes according to their degree of satisfaction. Each client was represented by a vector of 30 variables, which could be split into two groups: variables related to the specific client and socio-geographical variables characterizing the area in which the client lived. The origi...