Jan Ramon

KU Leuven · Department of Computer Science

PhD

About

149
Publications
44,680
Reads
2,978
Citations
Citations since 2017
6 Research Items
836 Citations
Additional affiliations
March 2013 - present
KU Leuven
Position
  • IWT-SBO InSPECtor
Description
  • http://www.proteinspector.com/
October 2011 - present
KU Leuven
Position
  • Probabilistic structured models: learning from large hybrid domains
Description
  • http://dtai.cs.kuleuven.be/research/projects/OTPSM11
December 2009 - present
KU Leuven
Position
  • ERC StG MiGraNT: Mining Graphs and Networks, a Theory-based approach
Description
  • http://people.cs.kuleuven.be/~jan.ramon/MiGraNT/
Education
October 1997 - October 2002
KU Leuven
Field of study
  • Computer science
October 1994 - June 1997
KU Leuven
Field of study
  • Engineering sciences
October 1993 - September 1994
KU Leuven
Field of study
  • Physics

Publications

Conference Paper
Full-text available
Mining frequent patterns in a single network (graph) poses a number of challenges. Even matching a single path pattern to a network (up to subgraph isomorphism) is NP-complete. Existing matching algorithms become intractable even for reasonably small patterns on networks that are large or have a high average degree. Based on recent advances...
Article
Full-text available
Many machine learning algorithms are based on the assumption that training examples are drawn independently. However, this assumption does not hold anymore when learning from a networked sample where two or more training examples may share common features. We propose an efficient weighting method for learning from networked examples and show the sa...
Article
Trypsin is the workhorse protease in mass spectrometry based proteomics experiments and is used to digest proteins into more readily analyzable peptides. To identify these peptides after mass spectrometric analysis, the actual digestion has to be mimicked as faithfully as possible in silico. In this paper we introduce CP-DT (Cleavage Prediction wit...
Article
Maximum common substructures (MCS) have received a lot of attention in the chemoinformatics community. They are typically used as a similarity measure between molecules, showing high predictive performance when used in classification tasks, while being easily explainable substructures. In the present work, we applied the Pairwise Maximum Common Sub...
Article
Full-text available
Significance: Systems biology involves the development of large computational models of biological systems. The radical improvement of systems biology models will necessarily involve the automation of model improvement cycles. We present here a general approach to automating systems biology model improvement. Humans are eukaryotic organisms, and the...
Article
Full-text available
Counting the number of times a pattern occurs in a database is a fundamental data mining problem. It is a subroutine in a diverse set of tasks ranging from pattern mining to supervised learning and probabilistic model learning. While a pattern and a database can take many forms, this paper focuses on the case where both the pattern and the database...
Article
Full-text available
Transposable elements (TEs) are repetitive nucleotide sequences that make up a large portion of eukaryotic genomes. They can move and duplicate within a genome, increasing genome size and contributing to genetic diversity within and across species. Accurate identification and classification of TEs present in a genome is an important step towards un...
Article
Full-text available
We provide a systematic approach to deal with the following problem. Let $X_1,\ldots,X_n$ be, possibly dependent, $[0,1]$-valued random variables. What is a sharp upper bound on the probability that their sum is significantly larger than their mean? In the case of independent random variables, a fundamental tool for bounding such probabilities is d...
Article
This article introduces a new type of structural fragment called a geometrical pattern. Such geometrical patterns are defined as molecular graphs that include a labelling of atoms together with constraints on interatomic distances. The discovery of geometrical patterns in a chemical dataset relies on the induction of multiple decision trees combine...
Article
We provide a lower bound on the probability that a binomial random variable exceeds its mean. Our proof employs estimates on the mean absolute deviation and the tail conditional expectation of binomial random variables.
Article
With the current expanded technical capabilities to perform mass spectrometry-based biomedical proteomics experiments, an improved focus on the design of experiments is crucial. As it is clear that ignoring the importance of a good design leads to an unprecedented rate of false discoveries which would poison our results, more and more tools are dev...
Article
Let $Y_v, v\in V,$ be $[0,1]$-valued random variables having a dependency graph $G=(V,E)$. We show that \[ \mathbb{E}\left[\prod_{v\in V} Y_{v} \right] \leq \prod_{v\in V} \left\{ \mathbb{E}\left[Y_v^{\frac{\chi_b}{b}}\right] \right\}^{\frac{b}{\chi_b}}, \] where $\chi_b$ is the $b$-fold chromatic number of $G$. This inequality may be seen as a dep...
Conference Paper
Kernels for structured data have gained a lot of attention in a world with an ever increasing amount of complex data, generated from domains such as biology, chemistry, or engineering. However, while many applications involve spatial aspects, up to now only few kernel methods have been designed to take 3D information into account. We introduce a no...
Article
Full-text available
We consider the naive bottom-up concatenation scheme for a context-free language and show that this scheme has the incremental polynomial time property. This means that all members of the language can be enumerated without duplicates so that the time between two consecutive outputs is bounded by a polynomial in the number of strings already generat...
Article
Statistical relational learning (SRL) is concerned with developing formalisms for representing and learning from data that exhibit both uncertainty and complex, relational structure. Most of the work in SRL has focused on modeling and learning from data that only contain discrete variables. As many important problems are characterized by the presen...
Article
Full-text available
We show that the so-called Bernstein-Hoeffding method can be applied to a larger class of generalized moments. This class contains the exponential moments whose properties play a key role in the proof of a well-known inequality of Wassily Hoeffding, regarding sums of independent and bounded random variables whose mean is assumed to be known. As...
Article
Full-text available
Subject Areas: biotechnology, computational biology, synthetic biology. There is an urgent need to make drug discovery cheaper and faster. This will enable the development of treatments for diseases currently neglected for economic reasons, such as tropical and orphan diseases, and generally increase the supply of new drugs. Here, we report the Robo...
Article
Full-text available
Background: A key challenge in the field of HIV-1 protein evolution is the identification of coevolving amino acids at the molecular level. In the past decades, many sequence-based methods have been designed to detect position-specific coevolution within and between different proteins. However, an ensemble coevolution system that integrates differen...
Conference Paper
Full-text available
Causal polytrees are singly connected causal models and they are frequently applied in practice. However, in various applications, many variables remain unobserved and causal polytrees cannot be applied without explicitly including unobserved variables. Our study thus proposes the ancestral polytree model, a novel combination of ancestral grap...
Article
Many machine learning algorithms are based on the assumption that training examples are drawn identically and independently. However, this assumption does not hold anymore when learning from a networked sample because two or more training examples may share some common objects, and hence share the features of these shared objects. We first show tha...
Article
Machine learning is a subdiscipline within artificial intelligence that focuses on algorithms that allow computers to learn to solve a (complex) problem from existing data. This ability can be used to generate a solution to a particularly intractable problem, provided that enough data are available on which to train and subsequently evaluate an algorithm. Si...
Article
Metrics for structured data have received an increasing interest in the machine learning community. Graphs provide a natural representation for structured data, but a lot of operations on graphs are computationally intractable. In this article, we present a polynomial-time algorithm that computes a maximum common subgraph of two outerplanar graphs....
Article
Graph support measures are functions measuring how frequently a given subgraph pattern occurs in a given database graph. An important class of support measures relies on overlap graphs. A major advantage of overlap-graph based approaches is that they combine anti-monotonicity with counting the occurrences of a subgraph pattern which are independent...
Article
Full-text available
Presentation available here: http://sfci2013.loria.fr/wp-content/uploads/2013/10/SFCi13-Ve_11_09h00-Comparing_chemical_fingerprints_for_ecotoxicology.pdf
Article
Full-text available
This paper explores the use of predicate logic as a modeling language. Using IDP3, a finite model generator that supports first order logic enriched with types, inductive definitions, aggregates and partial functions, search problems stated in a variant of predicate logic are solved. This variant is introduced and applied to a range of problems ste...
Article
Full-text available
We present PIUS, a tool that identifies peptides from tandem mass spectrometry data by analyzing the six-frame translation of a complete genome. It differs from earlier studies that have performed such a genomic search in two ways: (i) it considers a larger search space and (ii) it is designed for natural peptide identification rather than proteomi...
Article
Monte Carlo tree search (MCTS) is a sampling and simulation based technique for searching in large search spaces containing both decision nodes and probabilistic events. This technique has recently become popular due to its successful application to games, e.g. Poker (Van den Broeck et al., 2009) and Go (Coulom, 2006; Chaslot et al., 2006; Gelly an...
Conference Paper
Graph support measures are functions measuring how frequently a given subgraph pattern occurs in a given database graph. An important class of support measures relies on overlap graphs. A major advantage of the overlap graph based approaches is that they combine anti-monotonicity with counting occurrences of a pattern which are independent accordin...
Conference Paper
Full-text available
This paper reports on the use of the FO(·) language and the IDP framework for modeling and solving some machine learning and data mining tasks. The core component of a model in the IDP framework is an FO(·) theory consisting of formulas in first order logic and definitions; the latter are basically logic programs where clause bodies can have arbitr...
Article
Full-text available
Development of acute kidney injury (AKI) during the postoperative period is associated with increases in both morbidity and mortality. The aim of this study is to develop a statistical model capable of predicting the occurrence of AKI in patients after elective cardiac surgery.
Article
Probabilistic logical models have proven to be very successful at modelling uncertain, complex relational data. Most current models and implementations focus on modelling domains that only have discrete variables. Yet many real-world problems are hybrid and have both discrete and continuous variables. In this paper we focus on the Logical Bayesia...
Article
Full-text available
In graph mining, a frequency measure for graphs is anti-monotonic if the frequency of a pattern never exceeds the frequency of a subpattern. The efficiency and correctness of most graph pattern miners rely critically on this property. We study the case where frequent subgraphs have to be found in one graph. Vanetik, Gudes and Shimony already gave...
Article
Full-text available
The intensive care unit (ICU) length of stay (LOS) of patients undergoing cardiac surgery may vary considerably, and is often difficult to predict within the first hours after admission. The early clinical evolution of a cardiac surgery patient might be predictive for his LOS. The purpose of the present study was to develop a predictive model for I...
Article
Full-text available
Inductive logic programming, or relational learning, is a powerful paradigm for machine learning or data mining. However, in order for ILP to become practically useful, the efficiency of ILP systems must improve substantially. To this end, the notion of a query pack is introduced: it structures sets of similar queries. Furthermore, a mechanism is d...
Article
Full-text available
The standard approach to feature construction and predictive learning in molecular datasets is to employ computationally expensive graph mining techniques and to bias the feature search exploration using frequency or correlation measures. These features are then typically employed in predictive models that can be constructed using, for example, SVM...
Chapter
Full-text available
It is increasingly clear that machine learning algorithms need to be integrated in an iterative scientific discovery loop, in which data is queried repeatedly by means of inductive queries and where the computer provides guidance to the experiments that are being performed. In this chapter, we summarise several key challenges in achieving this inte...
Article
The frequent connected subgraph mining problem, i.e., the problem of listing all connected graphs that are subgraph isomorphic to at least a certain number of transaction graphs of a database, cannot be solved in output polynomial time in the general case. If, however, the transaction graphs are restricted to forests then the problem becomes tracta...
Article
Full-text available
This work studies the impact of using dynamic information as features in a machine learning algorithm for the prediction task of classifying critically ill patients in two classes according to the time they need to reach a stable state after coronary bypass surgery: less or more than 9 h. On the basis of five physiological variables (heart rate, sy...
Article
Full-text available
Probability trees are decision trees that predict class probabilities rather than the most likely class. The pruning criterion used to learn a probability tree strongly influences the size of the tree and thereby also the quality of its probability estimates. While the effect of pruning criteria on classification accuracy is well-studied, only rece...
Conference Paper
Full-text available
We investigate the use of Monte-Carlo Tree Search (MCTS) within the field of computer Poker, more specifically No-Limit Texas Hold’em. The hidden information in Poker results in so-called miximax game trees where opponent decision nodes have to be modeled as chance nodes. The probability distribution in these nodes is modeled by an opponent model t...
Article
Computerization in healthcare in general, and in the operating room (OR) and intensive care unit (ICU) in particular, is on the rise. This leads to large patient databases, with specific properties. Machine learning techniques are able to examine and to extract knowledge from large databases in an automatic way. Although the number of potential app...
Article
Many pattern recognition and machine learning approaches employ a distance metric on patterns, or a generality relation to partially order the patterns. We investigate the relationship amongst them and prove a theorem that shows how a distance metric can be derived from a partial order (and a corresponding size on patterns) under mild condition...
Article
Full-text available
This work studies the impact of using dynamic information as features in a machine learning algorithm for the prediction task of classifying critically ill patients in two classes according to the time they need to reach a stable state after coronary bypass surgery: less or more than nine hours. On the basis of five physiological variables differen...
Article
In this paper we investigate the evolutionary dynamics of strategic behavior in the game of poker by means of data gathered from a large number of real world poker games. We perform this study from an evolutionary game theoretic perspective using two Replicator Dynamics models. First we consider the basic selection model on this data, secondly we u...
Article
Full-text available
Algorithms that list graphs such that no two listed graphs are isomorphic are important building blocks of systems for mining and learning in graphs. Algorithms are already known that solve this problem efficiently for many classes of graphs of restricted topology, such as trees. In this article we introduce the concept of a dense augmentation sch...
Article
Full-text available
Bioinformatics is an application domain where information is naturally represented in terms of relations between heterogeneous objects. Modern experimentation and data acquisition techniques allow the study of complex interactions in biological systems. This raises interesting challenges for machine learning and data mining researchers, as the amoun...
Conference Paper
Full-text available
Relational reinforcement learning (RRL) has emerged in the machine learning community as a new promising subfield of reinforcement learning (RL) (e.g. [1]). It upgrades RL techniques by using relational representations for states, actions and learned value-functions or policies to allow more natural representations and abstractions of complex tasks...
Conference Paper
Full-text available
Experimental results often present a substantial fraction of missing and censored values. Here we propose a strategy to perform principal component analysis under this specific incomplete information hypothesis. This allows the reconstruction of the missing information in a way consistent with the experimental observations.
Article
Full-text available
In recent years, there has been a growing interest in using rich representations such as relational languages for reinforcement learning. However, while expressive languages have many advantages in terms of generalization and reasoning, extending existing approaches to such a relational setting is a non-trivial problem. For a relational reinforce...
Conference Paper
Full-text available
In graph mining, a frequency measure is anti-monotonic if the frequency of a pattern never exceeds the frequency of a subpattern. The efficiency and correctness of most graph pattern miners rely critically on this property. We study the case where the dataset is a single graph. Vanetik, Gudes and Shimony already gave sufficient and necessary cond...
Conference Paper
Full-text available
In machine learning, there has been an increased interest in metrics on structured data. The application we focus on is drug discovery. Although graphs have become very popular for the representation of molecules, a lot of operations on graphs are NP-complete. Representing the molecules as outerplanar graphs, a subclass within general graphs, and u...
Conference Paper
Full-text available
An important task in many scientific and engineering disciplines is to set up experiments with the goal of finding the best instances (substances, compositions, designs) as evaluated on an unknown target function using limited resources. We study this problem using machine learning principles, and introduce the novel task of active k-optimization....
Conference Paper
Full-text available
The frequent connected subgraph mining problem, i.e., the problem of listing all connected graphs that are subgraph isomorphic to at least a certain number of transaction graphs of a database, cannot be solved in output polynomial time in the general case. If, however, the transaction graphs are restricted to forests then the problem becomes tracta...
Article
Full-text available
Recently, there has been an increasing interest in directed probabilistic logical models and a variety of formalisms for describing such models has been proposed. Although many authors provide high-level arguments to show that in principle models in their formalism can be learned from data, most of the proposed learning algorithms have not yet been...
Conference Paper
Full-text available
State representation for intelligent agents is a continuous challenge as the need for abstraction is unavoidable in large state spaces. Predictive representations offer one way to obtain state abstraction by replacing a state with a set of predictions about future interactions with the world. One such formalism is the Temporal-Difference Networ...
Conference Paper
Full-text available
We propose an opponent modeling approach for no-limit Texas hold-em poker that starts from a (learned) prior, i.e., general expectations about opponent behavior, and learns a relational regression tree-function that adapts these priors to specific opponents. An important asset is that this approach can learn from incomplete information (i.e. wi...
Article
Full-text available
In this paper we investigate the evolutionary dynamics of strategic behaviour in the game of poker by means of data gathered from a large number of real-world poker games. We perform this study from an evolutionary game theoretic perspective using the Replicator Dynamics model. We investigate the dynamic properties by studying how players switch be...
Article
Full-text available
In graph mining, a frequency measure is anti-monotonic if the frequency of a pattern never exceeds the frequency of a subpattern. The efficiency and correctness of most graph pattern miners rely critically on this property. We study the case where the dataset is a single graph. Vanetik, Gudes and Shimony already gave sufficient and necessary cond...
Article
Full-text available
We study the task of approximating the k best instances with regard to a function using a limited number of evaluations. We also apply an active learning algorithm based on Gaussian processes to the problem, and evaluate it on a challenging set of structure-activity relationship prediction tasks.
Conference Paper
Full-text available
Recently, there has been an increasing interest in directed probabilistic logical models and a variety of languages for describing such models has been proposed. Although many authors provide high-level arguments to show that in principle models in their language can be learned from data, most of the proposed learning algorithms have not yet been s...
Conference Paper
Full-text available
In this paper we investigate the relation between transfer learning in reinforcement learning with function approximation and supervised learning with concept drift. We present a new incremental relational regression tree algorithm that is capable of dealing with concept drift through tree restructuring and show that it enables a reinforcement...
Conference Paper
Full-text available
We discuss how to learn non-recursive directed probabilistic logical models from relational data. This problem has been tackled before by upgrading the structure-search algorithm initially proposed for Bayesian networks. In this paper we propose to upgrade another algorithm, namely ordering-search, since for Bayesian networks this was found to work...
Article
In this paper we describe the application of data mining methods for predicting the evolution of patients in an intensive care unit. We discuss the importance of such methods for health care and other application domains of engineering. We argue that this problem is an important but challenging one for the current state of the art data mining metho...
Conference Paper
Full-text available
There is an increasing interest in upgrading Bayesian networks to the relational case, resulting in directed probabilistic logical models. Many formalisms to describe such models have been introduced and learning algorithms have been developed for several such formalisms. Most of these algorithms are upgrades of the traditional structure search alg...
Conference Paper
Full-text available
In recent years, there has been a growing interest in using rich representations such as relational languages for reinforcement learning. However, while expressive languages have many advantages in terms of generalization and reasoning, extending existing approaches to such a relational setting is a non-trivial problem. In this paper, we present...
Conference Paper
Full-text available
We consider the problem of policy learning in a Markov Decision Process (MDP). An MDP consists of a state space $S$, a set of actions $A$, a transition probability function $t(s,a,s')$ and a reward function $R : S \to \mathbb{R}$. The problem is to find a policy, a