Peter van der Putten's research while affiliated with Leiden University and other places

Publications (49)

Chapter
Newspapers write for a particular readership and from a certain ideological or political perspective. This paper applies various natural language processing methods to newspaper articles to analyse to what extent the ideological positioning of newspapers is reflected in their writing. Political bias is illustrated in terms of coverage bias and age...
Article
Due to an increased presence of robots in human-inhabited environments, we observe a growing body of examples in which humans show behavior that is indicative of strong social engagement towards robots that do not possess any life-like realism in appearance or behavior. In response, we focus on the under-explored concept of a common locus as a rele...
Preprint
Full-text available
During the COVID-19 pandemic, large amounts of COVID-19 misinformation are spreading on social media. We are interested in the stance of Twitter users towards COVID-19 misinformation. However, due to the relatively recent nature of the pandemic, only a few stance detection datasets fit our task. We have constructed a new stance dataset consisting of...
Article
Full-text available
Conversational artificial agents and artificially intelligent (AI) voice assistants are becoming increasingly popular. Digital virtual assistants such as Siri, or conversational devices such as Amazon Echo or Google Home are permeating everyday life, and are designed to be more and more humanlike in their speech. This study investigates the effect...
Preprint
Full-text available
The dominant paradigm in spatiotemporal action detection is to classify actions using spatiotemporal features learned by 2D or 3D Convolutional Networks. We argue that several actions are characterized by their context, such as relevant objects and actors present in the video. To this end, we introduce an architecture based on self-attention and Gr...
Preprint
Full-text available
Sign language lexica are a useful resource for researchers and people learning sign languages. Current implementations allow a user to search a sign either by its gloss or by selecting its primary features such as handshape and location. This study focuses on exploring a reverse search functionality where a user can sign a query sign in front of a...
Chapter
The dominant paradigm in spatiotemporal action detection is to classify actions using spatiotemporal features learned by 2D or 3D Convolutional Networks. We argue that several actions are characterized by their context, such as relevant objects and actors present in the video. To this end, we introduce an architecture based on self-attention and Gr...
Chapter
Computer simulations have been used to model psychological and sociological phenomena in order to provide insight into how they affect human behavior and population-wide systems. In this study, three agent-based simulations (ABSs) were developed to model opinion dynamics in an online social media context. The main focus was to test the effects of ‘...
Conference Paper
In this paper we explore the use of deep neural networks to analyze semi-structured series of artworks. We train stacked Restricted Boltzmann Machines on the Exactitudes collection of photo series, and use this to understand the relationship between works and series, uncover underlying features and dimensions, and generate new images. The projectio...
Conference Paper
The aim of this study is to find to what extent computers can assist humans in the creative process of writing titles, using psychological tests for creativity that are typically used for humans only. To this end, a computer tool was designed that recommends new titles to users, based on knowledge generated from a pre-built corpus. This paper gives...
Conference Paper
Full-text available
It is often claimed that data pre-processing is an important factor contributing towards the performance of classification algorithms. In this paper we investigate feature selection, a common data pre-processing technique. We conduct a large scale experiment and present results on what algorithms and data sets benefit from this technique. Using met...
Conference Paper
This paper introduces The Morality Machine, a system that tracks ethical sentiment in Twitter discussions. Empirical approaches to ethics are rare, and to our knowledge this system is the first to take a machine learning approach. It is based on Moral Foundations Theory, a framework of moral values that are assumed to be universal. Carefully handcr...
Conference Paper
Full-text available
In this paper we study the effect of target set size on transfer learning in deep learning convolutional neural networks. This is an important problem as labelling is a costly task, or for new or specific classes the number of labelled instances available may simply be too small. We present results for a series of experiments where we either train...
Article
In this paper, we report on a machine learning approach to condensing class diagrams. The goal of the algorithm is to learn to identify what classes are most relevant to include in the diagram, as opposed to full reverse engineering of all classes. This paper focuses on building a classifier that is based on the names of classes in addition to desi...
Article
This paper outlines the approach developed together with the Radio Network Strategy & Design Department of a large European telecom operator in order to forecast the Air-Interface load in their 3G network, which is used for planning network upgrades and budgeting purposes. It is based on large scale intelligent data analysis and modeling at the lev...
Conference Paper
This paper outlines an approach developed as a part of a company-wide churn management initiative of a major European telecom operator. We focus on an explanatory churn model for the postpaid segment, assuming that the mobile telecom network, the key resource of operators, is also a churn driver in case it underdelivers to customers’ expectati...
Conference Paper
There is a range of techniques available to reverse engineer software designs from source code. However, these approaches generate highly detailed representations. The condensing of reverse engineered representations into more high-level design information would enhance the understandability of reverse engineered diagrams. This paper describes an a...
Conference Paper
This paper outlines the approach developed together with the Radio Network Strategy & Design Department of a large telecom operator in order to forecast the Air-Interface load in their 3G network, which is used for planning network upgrades and budgeting purposes. It is based on large scale intelligent data analysis and modeling at the level of tho...
Poster
Full-text available
This project specifically aims at reconstructing the class diagrams from source code in such a way that unnecessary detail that results from reverse engineering is eliminated. This is work in progress and in this abstract we report on early results and open problems. We have spent considerable effort to construct benchmark data sets for 10 pairs of...
Article
Full-text available
This ongoing research addresses the use of page ranking for computing relatedness coefficients between pairs of nodes in a directed graph, based on their edge structure. A novel, hybrid algorithm is proposed for a complete assessment of nodes and their connecting edges, which is then applied to a practical application, namely a recommender system...
Article
With no data, there is nothing to mine in. Multiple sources of data can exist, and linking this data together can be non-trivial. Assume we are given an instance, representing for example a customer. The problem of merging information from different sources about this particular instance, assuming it can’t be done with simple joins, is
Conference Paper
Prepaid customers in mobile telecommunications are not bound by a contract and can therefore change operators (‘churn’) at their convenience and without notification. This makes the task of predicting churn both challenging and financially rewarding. This paper presents an explorative, real world study of prepaid churn modeling by varying on three...
Chapter
Many data mining papers start with claiming that the exponential growth in the amount of data provides great opportunities for data mining. Reality can be different though. In real world applications, the number of sources over which this information is fragmented can grow at an even faster rate, resulting in barriers to widespread application of d...
Conference Paper
Full-text available
A central debate in visual perception theory is the argument for indirect versus direct perception; i.e., the use of intermediate, abstract, and hierarchical representations versus direct semantic interpretation of images through interaction with the outside world. We present a content-based representation that combines both approaches. The previou...
Article
In this paper we present an approach for benchmarking and profiling novel classification algorithms. We apply it to AIRS, an artificial immune system algorithm inspired by how the natural immune system recognizes and remembers intruders. We provide basic benchmarking results for AIRS, to our knowledge the first such test under standardised conditi...
Conference Paper
We describe a model for estimating the customer lifetime value (CLV) of customers in an e-commerce environment. The model is explained and experiments are performed on real-life data from a large Dutch Internet retailer. Our method results in CLV estimates that have similar accuracy to estimates generated by the commonly used model, while keeping t...
Article
Full-text available
Morphometrics from images, image analysis, may reveal differences between classes of objects present in the images. We have performed an image-features-based classification for the pathogenic yeast Cryptococcus neoformans. Building and analyzing image collections from the yeast under different environmental or genetic conditions may help to diagnos...
Chapter
Full-text available
The work presented here introduces a real time automatic scene classifier within content-based video retrieval. In our envisioned approach end users like documentalists, not image processing experts, build classifiers interactively, by simply indicating positive examples of a scene. Classification consists of a two stage procedure. First, small ima...
Conference Paper
Full-text available
The yeast Cryptococcus neoformans can cause dangerous infections such as meningitis. The presence of a thick capsule is shown to be correlated with virulence of a yeast cell. This paper reports on our approach towards developing a classifier for detecting virulent cells in images. We present our methods for creating samples, collecting images, prep...
Conference Paper
Artificial Immune Systems are a new class of algorithms inspired by how the immune system recognizes, attacks and remembers intruders. This is a fascinating idea, but to be accepted for mainstream data mining applications, extensive benchmarking is needed to demonstrate the reliability and accuracy of these algorithms. In our research we focus on t...
Article
The CoIL Challenge 2000 data mining competition attracted a wide variety of solutions, both in terms of approaches and performance. The goal of the competition was to predict who would be interested in buying a specific insurance product and to explain why people would buy. Unlike in most other competitions, the majority of participants provided a...
Article
Full-text available
Data fusion is the process of enriching data sets by combining information from different sources, to provide a single data set to mine in. Data fusion projects are complex, and to structure these we have built a process model for data fusion, inspired by the CRISP process model for data mining. The end goal is to build a fusion factory, where fusi...
Conference Paper
Full-text available
In this paper we position data fusion as both a key enabling technology and an interesting research topic for data mining. A fair amount of work has been done on data fusion over the past 30 years, but primarily outside the knowledge discovery community. We would like to share and summarize the main approaches taken so far from a data mining perspecti...
Conference Paper
In this paper we present a neural network for nonmetric multidimensional scaling. In our approach, the monotone transformation that is a part of every nonmetric scaling algorithm is performed by a special feedforward neural network with a modified backpropagation algorithm. Contrary to traditional methods, we thus explicitly model the monotone tran...
Conference Paper
For many direct marketing activities, organisations frequently find that customer databases do not contain enough information. Additional databases such as socio-economic databases constructed from census and survey data can be purchased to supplement customer databases. One of the difficulties in fusing separate databases however is that the infor...
Article
Full-text available
We present the problem tasks of the CoIL Challenge as they were explained to the participants. Furthermore, a general overview is given of the Challenge results.
Article
Full-text available
This work has been done as part of the EU VICAR (IST) project and the EU SCOFI project (IAP). The aim of the first project was to develop a real time video indexing, classification, annotation and retrieval system. For our systems, we have adapted the approach of Picard and Minka (3), who categorized elements of a scene automatically with so-called 'st...

Citations

... Rather than scrutinizing the logistics of building these (largely) commercial products, or assessing the experiences of end users, as has been done within a large body of earlier studies [1][2][3], I focus in this paper on the perspectives of those who must articulate, quantify, and measure usability in real time through their work on Conversational Voice Assistants (CVAs). I argue that this technology, which so often forces humans to adapt to its idiosyncrasies (training humans to work for and with AI rather than making AI work for humans), poses a significant challenge to the field of user experience. ...
... For the latter, this can be explained by the nature of the RF algorithm, which trains at each iteration a number of different trees with randomly selected features. Because of this built-in feature selection mechanism, tree-based algorithms often do not benefit from feature selection [46]. ...
... Verkoelen et al. [169] trained Restricted Boltzmann Machines (RBM) [170] with Deep Neural Networks for image classification. They used Exactitude's dataset [171], which contains 154 series of portraits of people. ...
... People commonly volunteer moral judgements on others' or their own actions, and attempts to extract these judgements automatically from social media texts have led to interesting insights on social behaviour (Teernstra et al., 2016;Johnson and Goldwasser, 2018;Hoover et al., 2020;Botzer et al., 2022). On the other hand, some researchers have argued that machines need to be explicitly trained to be able to make ethical judgements as a step towards ensuring their ethical behaviour when interacting with humans. ...
... Secondly, since current methods of identification of concentration levels can involve imaging the particulates individually or as a mass (e.g., photographs of skylines), application of neural networks has focussed on these areas. Since the accuracy of neural networks tend to be dependent on the amount of training data [116], accuracies of image-based neural networks are likely to improve over time as more data can be acquired. ...
... (2) less need for storage and memory when implementing the classifier; and (3) an increase in the ability to interpret the model generated [11]. ...
... The idea of locally occurrent unanticipated changes in churn probabilities is supported by several publications concerning the topic of churn in the neighborhood of influential churners (Dasgupta et al. 2008;Kusuma et al. 2013;Droftina et al. 2015a, b). For example, Droftina et al. (2015b, p. 1) assert that "highly influential customers deserve special attention, since their churns can also trigger churns of their peers." ...
... In contrast, logistic regression and Naive Bayes model parameters are estimated based on the potentially large number of instances and can thus be seen as more global models [21]. On the other hand, OneR generates one-level decision tree expressed in the form of a set of rules that all test one particular attribute and ZeroR predict the majority class (if nominal) or the average value (if numeric) [14]. More explanation about these algorithms can be found at [14] and [22]. ii. Classification model construction: This task is supported by WEKA [14] (tool). ...
... The problem is of relevance, if QoE inference is derived as a function of all the information available, such as network state, results of marketing campaigns, contractual, demographic, billing, handset, market, customer survey data and other factors. For instance, in [16], 750 features are jointly analyzed. The amount of information may become even larger if other critical factors are considered, such as cyber-security indicators [1], or in case network monitoring data [5] are directly put as an input to the forecast model. ...