Jan Ramon

  • PhD
  • Principal Investigator at KU Leuven

About

Publications: 150
Reads: 49,607
Citations: 3,314
Current institution
KU Leuven
Current position
  • Principal Investigator
Additional affiliations
March 2013 - present
KU Leuven
Position
  • IWT-SBO InSPECtor
Description
  • http://www.proteinspector.com/
October 2011 - present
KU Leuven
Position
  • Probabilistic structured models: learning from large hybrid domains
Description
  • http://dtai.cs.kuleuven.be/research/projects/OTPSM11
December 2009 - present
KU Leuven
Position
  • ERC StG MiGraNT: Mining Graphs and Networks, a Theory-based approach
Description
  • http://people.cs.kuleuven.be/~jan.ramon/MiGraNT/
Education
October 1997 - October 2002
KU Leuven
Field of study
  • Computer science
October 1994 - June 1997
KU Leuven
Field of study
  • Engineering sciences
October 1993 - September 1994
KU Leuven
Field of study
  • Physics

Publications

Publications (150)
Conference Paper
Full-text available
Mining frequent patterns in a single network (graph) poses a number of challenges. Even matching a single path pattern to a network (up to subgraph isomorphism) is NP-complete. Existing matching algorithms become intractable even for reasonably small patterns on networks that are large or have a high average degree. Based on recent advances...
Article
Full-text available
Many machine learning algorithms are based on the assumption that training examples are drawn independently. However, this assumption does not hold anymore when learning from a networked sample where two or more training examples may share common features. We propose an efficient weighting method for learning from networked examples and show the sa...
Article
Trypsin is the workhorse protease in mass spectrometry based proteomics experiments and is used to digest proteins into more readily analyzable peptides. To identify these peptides after mass spectrometric analysis, the actual digestion has to be mimicked as faithfully as possible in silico. In this paper we introduce CP-DT (Cleavage Prediction wit...
Article
Maximum common substructures (MCS) have received a lot of attention in the chemoinformatics community. They are typically used as a similarity measure between molecules, showing high predictive performance when used in classification tasks, while being easily explainable substructures. In the present work, we applied the Pairwise Maximum Common Sub...
Article
Full-text available
Significance: Systems biology involves the development of large computational models of biological systems. The radical improvement of systems biology models will necessarily involve the automation of model improvement cycles. We present here a general approach to automating systems biology model improvement. Humans are eukaryotic organisms, and the...
Article
Full-text available
Counting the number of times a pattern occurs in a database is a fundamental data mining problem. It is a subroutine in a diverse set of tasks ranging from pattern mining to supervised learning and probabilistic model learning. While a pattern and a database can take many forms, this paper focuses on the case where both the pattern and the database...
Article
Full-text available
Transposable elements (TEs) are repetitive nucleotide sequences that make up a large portion of eukaryotic genomes. They can move and duplicate within a genome, increasing genome size and contributing to genetic diversity within and across species. Accurate identification and classification of TEs present in a genome is an important step towards un...
Article
Full-text available
We provide a systematic approach to deal with the following problem. Let $X_1,\ldots,X_n$ be, possibly dependent, $[0,1]$-valued random variables. What is a sharp upper bound on the probability that their sum is significantly larger than their mean? In the case of independent random variables, a fundamental tool for bounding such probabilities is d...
Article
This article introduces a new type of structural fragment called a geometrical pattern. Such geometrical patterns are defined as molecular graphs that include a labelling of atoms together with constraints on interatomic distances. The discovery of geometrical patterns in a chemical dataset relies on the induction of multiple decision trees combine...
Article
We provide a lower bound on the probability that a binomial random variable is exceeding its mean. Our proof employs estimates on the mean absolute deviation and the tail conditional expectation of binomial random variables.
Preprint
We provide a lower bound on the probability that a binomial random variable is exceeding its mean. Our proof employs estimates on the mean absolute deviation and the tail conditional expectation of binomial random variables.
Article
With the current expanded technical capabilities to perform mass spectrometry-based biomedical proteomics experiments, an improved focus on the design of experiments is crucial. As it is clear that ignoring the importance of a good design leads to an unprecedented rate of false discoveries which would poison our results, more and more tools are dev...
Article
Let $Y_v, v\in V,$ be $[0,1]$-valued random variables having a dependency graph $G=(V,E)$. We show that \[ \mathbb{E}\left[\prod_{v\in V} Y_{v} \right] \leq \prod_{v\in V} \left\{ \mathbb{E}\left[Y_v^{\frac{\chi_b}{b}}\right] \right\}^{\frac{b}{\chi_b}}, \] where $\chi_b$ is the $b$-fold chromatic number of $G$. This inequality may be seen as a dep...
Conference Paper
Kernels for structured data have gained a lot of attention in a world with an ever increasing amount of complex data, generated from domains such as biology, chemistry, or engineering. However, while many applications involve spatial aspects, up to now only few kernel methods have been designed to take 3D information into account. We introduce a no...
Article
Full-text available
We consider the naive bottom-up concatenation scheme for a context-free language and show that this scheme has the incremental polynomial time property. This means that all members of the language can be enumerated without duplicates so that the time between two consecutive outputs is bounded by a polynomial in the number of strings already generat...
Article
Statistical relational learning (SRL) is concerned with developing formalisms for representing and learning from data that exhibit both uncertainty and complex, relational structure. Most of the work in SRL has focused on modeling and learning from data that only contain discrete variables. As many important problems are characterized by the presen...
Article
Full-text available
We show that the, so-called, Bernstein-Hoeffding method can be employed to a larger class of generalized moments. This class contains the exponential moments whose properties play a key role in the proof of a well-known inequality of Wassily Hoeffding, regarding sums of independent and bounded random variables whose mean is assumed to be known. As...
Article
Full-text available
Subject Areas: biotechnology, computational biology, synthetic biology. There is an urgent need to make drug discovery cheaper and faster. This will enable the development of treatments for diseases currently neglected for economic reasons, such as tropical and orphan diseases, and generally increase the supply of new drugs. Here, we report the Robo...
Article
Full-text available
Background: A key challenge in the field of HIV-1 protein evolution is the identification of coevolving amino acids at the molecular level. In the past decades, many sequence-based methods have been designed to detect position-specific coevolution within and between different proteins. However, an ensemble coevolution system that integrates differen...
Conference Paper
Full-text available
Causal polytrees are singly connected causal models and they are frequently applied in practice. However, in various applications, many variables remain unobserved and causal polytrees cannot be applied without explicitly including unobserved variables. Our study thus proposes the ancestral polytree model, a novel combination of ancestral grap...
Article
Many machine learning algorithms are based on the assumption that training examples are drawn identically and independently. However, this assumption does not hold anymore when learning from a networked sample because two or more training examples may share some common objects, and hence share the features of these shared objects. We first show tha...
Article
Machine learning is a subdiscipline within artificial intelligence that focuses on algorithms that allow computers to learn solving a (complex) problem from existing data. This ability can be used to generate a solution to a particularly intractable problem, given that enough data are available to train and subsequently evaluate an algorithm on. Si...
Article
Metrics for structured data have received an increasing interest in the machine learning community. Graphs provide a natural representation for structured data, but a lot of operations on graphs are computationally intractable. In this article, we present a polynomial-time algorithm that computes a maximum common subgraph of two outerplanar graphs....
Article
Graph support measures are functions measuring how frequently a given subgraph pattern occurs in a given database graph. An important class of support measures relies on overlap graphs. A major advantage of overlap-graph based approaches is that they combine anti-monotonicity with counting the occurrences of a subgraph pattern which are independent...
Article
Full-text available
Presentation available here: http://sfci2013.loria.fr/wp-content/uploads/2013/10/SFCi13-Ve_11_09h00-Comparing_chemical_fingerprints_for_ecotoxicology.pdf
Article
Full-text available
This paper explores the use of predicate logic as a modeling language. Using IDP3, a finite model generator that supports first order logic enriched with types, inductive definitions, aggregates and partial functions, search problems stated in a variant of predicate logic are solved. This variant is introduced and applied on a range of problems ste...
Article
Full-text available
We present PIUS, a tool that identifies peptides from tandem mass spectrometry data by analyzing the six-frame translation of a complete genome. It differs from earlier studies that have performed such a genomic search in two ways: (i) it considers a larger search space and (ii) it is designed for natural peptide identification rather than proteomi...
Article
Monte Carlo tree search (MCTS) is a sampling and simulation based technique for searching in large search spaces containing both decision nodes and probabilistic events. This technique has recently become popular due to its successful application to games, e.g. Poker Van den Broeck et al. (2009) and Go Coulom (2006); Chaslot et al. (2006); Gelly an...
Conference Paper
Graph support measures are functions measuring how frequently a given subgraph pattern occurs in a given database graph. An important class of support measures relies on overlap graphs. A major advantage of the overlap graph based approaches is that they combine anti-monotonicity with counting occurrences of a pattern which are independent accordin...
Conference Paper
Full-text available
This paper reports on the use of the FO(·) language and the IDP framework for modeling and solving some machine learning and data mining tasks. The core component of a model in the IDP framework is an FO(·) theory consisting of formulas in first order logic and definitions; the latter are basically logic programs where clause bodies can have arbitr...
Article
Full-text available
Development of acute kidney injury (AKI) during the postoperative period is associated with increases in both morbidity and mortality. The aim of this study is to develop a statistical model capable of predicting the occurrence of AKI in patients after elective cardiac surgery.
Article
Probabilistic logical models have proven to be very successful at modelling uncertain, complex relational data. Most current models and implementations focus on modelling domains that only have discrete variables. Yet many real-world problems are hybrid and have both discrete and continuous variables. In this paper we focus on the Logical Bayesia...
Article
Full-text available
In graph mining, a frequency measure for graphs is anti-monotonic if the frequency of a pattern never exceeds the frequency of a subpattern. The efficiency and correctness of most graph pattern miners rely critically on this property. We study the case where frequent subgraphs have to be found in one graph. Vanetik, Gudes and Shimony already gave...
Article
Full-text available
The intensive care unit (ICU) length of stay (LOS) of patients undergoing cardiac surgery may vary considerably, and is often difficult to predict within the first hours after admission. The early clinical evolution of a cardiac surgery patient might be predictive for his LOS. The purpose of the present study was to develop a predictive model for I...
Article
Full-text available
Inductive logic programming, or relational learning, is a powerful paradigm for machine learning or data mining. However, in order for ILP to become practically useful, the efficiency of ILP systems must improve substantially. To this end, the notion of a query pack is introduced: it structures sets of similar queries. Furthermore, a mechanism is d...
Article
Full-text available
The standard approach to feature construction and predictive learning in molecular datasets is to employ computationally expensive graph mining techniques and to bias the feature search exploration using frequency or correlation measures. These features are then typically employed in predictive models that can be constructed using, for example, SVM...
Chapter
Full-text available
It is increasingly clear that machine learning algorithms need to be integrated in an iterative scientific discovery loop, in which data is queried repeatedly by means of inductive queries and where the computer provides guidance to the experiments that are being performed. In this chapter, we summarise several key challenges in achieving this inte...
Article
The frequent connected subgraph mining problem, i.e., the problem of listing all connected graphs that are subgraph isomorphic to at least a certain number of transaction graphs of a database, cannot be solved in output polynomial time in the general case. If, however, the transaction graphs are restricted to forests then the problem becomes tracta...
Article
Full-text available
This work studies the impact of using dynamic information as features in a machine learning algorithm for the prediction task of classifying critically ill patients in two classes according to the time they need to reach a stable state after coronary bypass surgery: less or more than 9 h. On the basis of five physiological variables (heart rate, sy...
Article
Full-text available
Probability trees are decision trees that predict class probabilities rather than the most likely class. The pruning criterion used to learn a probability tree strongly influences the size of the tree and thereby also the quality of its probability estimates. While the effect of pruning criteria on classification accuracy is well-studied, only rece...
Conference Paper
Full-text available
We investigate the use of Monte-Carlo Tree Search (MCTS) within the field of computer Poker, more specifically No-Limit Texas Hold’em. The hidden information in Poker results in so called miximax game trees where opponent decision nodes have to be modeled as chance nodes. The probability distribution in these nodes is modeled by an opponent model t...
Conference Paper
Full-text available
Experimental results often present a substantial fraction of missing and censored values. Here we propose a strategy to perform principal component analysis under this specific incomplete information hypothesis. This allows the reconstruction of the missing information in a way consistent with the experimental observations.
Article
Computerization in healthcare in general, and in the operating room (OR) and intensive care unit (ICU) in particular, is on the rise. This leads to large patient databases, with specific properties. Machine learning techniques are able to examine and to extract knowledge from large databases in an automatic way. Although the number of potential app...
Article
Many pattern recognition and machine learning approaches employ a distance metric on patterns, or a generality relation to partially order the patterns. We investigate the relationship amongst them and prove a theorem that shows how a distance metric can be derived from a partial order (and a corresponding size on patterns) under mild condition...
Article
Full-text available
This work studies the impact of using dynamic information as features in a machine learning algorithm for the prediction task of classifying critically ill patients in two classes according to the time they need to reach a stable state after coronary bypass surgery: less or more than nine hours. On the basis of five physiological variables differen...
Article
In this paper we investigate the evolutionary dynamics of strategic behavior in the game of poker by means of data gathered from a large number of real world poker games. We perform this study from an evolutionary game theoretic perspective using two Replicator Dynamics models. First we consider the basic selection model on this data, secondly we u...
Article
Full-text available
Algorithms that list graphs such that no two listed graphs are isomorphic, are important building blocks of systems for mining and learning in graphs. Algorithms are already known that solve this problem efficiently for many classes of graphs of restricted topology, such as trees. In this article we introduce the concept of a dense augmentation sch...
Article
Full-text available
Bioinformatics is an application domain where information is naturally represented in terms of relations between heterogeneous objects. Modern experimentation and data acquisition techniques allow the study of complex interactions in biological systems. This raises interesting challenges for machine learning and data mining researchers, as the amoun...
Conference Paper
Full-text available
Relational reinforcement learning (RRL) has emerged in the machine learning community as a new promising subfield of reinforcement learning (RL) (e.g. [1]). It upgrades RL techniques by using relational representations for states, actions and learned value-functions or policies to allow more natural representations and abstractions of complex tasks...
Article
Full-text available
In recent years, there has been a growing interest in using rich representations such as relational languages for reinforcement learning. However, while expressive languages have many advantages in terms of generalization and reasoning, extending existing approaches to such a relational setting is a non-trivial problem. For a relational reinforce...
Conference Paper
Full-text available
In graph mining, a frequency measure is anti-monotonic if the frequency of a pattern never exceeds the frequency of a subpattern. The efficiency and correctness of most graph pattern miners rely critically on this property. We study the case where the dataset is a single graph. Vanetik, Gudes and Shimony already gave sufficient and necessary cond...
Conference Paper
Full-text available
In machine learning, there has been an increased interest in metrics on structured data. The application we focus on is drug discovery. Although graphs have become very popular for the representation of molecules, a lot of operations on graphs are NP-complete. Representing the molecules as outerplanar graphs, a subclass within general graphs, and u...
Conference Paper
Full-text available
An important task in many scientific and engineering disciplines is to set up experiments with the goal of finding the best instances (substances, compositions, designs) as evaluated on an unknown target function using limited resources. We study this problem using machine learning principles, and introduce the novel task of active k-optimization....
Conference Paper
Full-text available
The frequent connected subgraph mining problem, i.e., the problem of listing all connected graphs that are subgraph isomorphic to at least a certain number of transaction graphs of a database, cannot be solved in output polynomial time in the general case. If, however, the transaction graphs are restricted to forests then the problem becomes tracta...
Article
Full-text available
Recently, there has been an increasing interest in directed probabilistic logical models and a variety of formalisms for describing such models has been proposed. Although many authors provide high-level arguments to show that in principle models in their formalism can be learned from data, most of the proposed learning algorithms have not yet been...
Conference Paper
Full-text available
State representation for intelligent agents is a continuous challenge as the need for abstraction is unavoidable in large state spaces. Predictive representations offer one way to obtain state abstraction by replacing a state with a set of predictions about future interactions with the world. One such formalism is the Temporal-Difference Networ...
Conference Paper
Full-text available
We propose an opponent modeling approach for no-limit Texas hold-em poker that starts from a (learned) prior, i.e., general expectations about opponent behavior and learns a relational regression tree-function that adapts these priors to specific opponents. An important asset is that this approach can learn from incomplete information (i.e. wi...
Article
Full-text available
In this paper we investigate the evolutionary dynamics of strategic behaviour in the game of poker by means of data gathered from a large number of real-world poker games. We perform this study from an evolutionary game theoretic perspective using the Replicator Dynamics model. We investigate the dynamic properties by studying how players switch be...
Article
Full-text available
In graph mining, a frequency measure is anti-monotonic if the frequency of a pattern never exceeds the frequency of a subpattern. The efficiency and correctness of most graph pattern miners rely critically on this property. We study the case where the dataset is a single graph. Vanetik, Gudes and Shimony already gave sufficient and necessary cond...
Article
Full-text available
We study the task of approximating the k best instances with regard to a function using a limited number of evaluations. We also apply an active learning algorithm based on Gaussian processes to the problem, and evaluate it on a challenging set of structure-activity relationship prediction tasks.
Conference Paper
Full-text available
Recently, there has been an increasing interest in directed probabilistic logical models and a variety of languages for describing such models has been proposed. Although many authors provide high-level arguments to show that in principle models in their language can be learned from data, most of the proposed learning algorithms have not yet been s...
Conference Paper
Full-text available
In this paper we investigate the relation between transfer learning in reinforcement learning with function approximation and supervised learning with concept drift. We present a new incremental relational regression tree algorithm that is capable of dealing with concept drift through tree restructuring and show that it enables a reinforcement...
Conference Paper
Full-text available
We discuss how to learn non-recursive directed probabilistic logical models from relational data. This problem has been tackled before by upgrading the structure-search algorithm initially proposed for Bayesian networks. In this paper we propose to upgrade another algorithm, namely ordering-search, since for Bayesian networks this was found to work...
Article
In this paper we describe the application of data mining methods for predicting the evolution of patients in an intensive care unit. We discuss the importance of such methods for health care and other application domains of engineering. We argue that this problem is an important but challenging one for the current state of the art data mining metho...
Conference Paper
Full-text available
There is an increasing interest in upgrading Bayesian networks to the relational case, resulting in directed probabilistic logical models. Many formalisms to describe such models have been introduced and learning algorithms have been developed for several such formalisms. Most of these algorithms are upgrades of the traditional structure search alg...
Conference Paper
Full-text available
In recent years, there has been a growing interest in using rich representations such as relational languages for reinforcement learning. However, while expressive languages have many advantages in terms of generalization and reasoning, extending existing approaches to such a relational setting is a non-trivial problem. In this paper, we present...
Conference Paper
Full-text available
We consider the problem of policy learning in a Markov Decision Process (MDP). An MDP consists of a state space $S$, a set of actions $A$, a transition probability function $t(s,a,s')$ and a reward function $R : S \to \mathbb{R}$. The problem is to find a policy, a
Conference Paper
Full-text available
Graphs are mathematical structures that are capable of representing relational data. In the chemoinformatics context, they have become very popular for the representation of molecules. However, a lot of operations on graphs are NP-complete, so no efficient algorithms that can handle these structures exist. In this paper we focus on outerplanar...
Conference Paper
An essential component of many graph mining algorithms is a procedure for enumerating graphs such that no two enumerated graphs are isomorphic. All frequent subgraph miners require such a component [14, 5, 1, 6], but also other
Article
We propose a novel machine learning algorithm for learning mutation pathways of viruses from a population of viral DNA strands. More specifically, given a number of sequences, the algorithm constructs a phylogenetic tree that expresses the ancestry of the sequences, and at the same time builds a model describing dependencies between mutations th...
Article
Full-text available
Reinforcement learning is a well-suited approach for many decision-making problems. Lots of interesting domains are, however, not solvable in practice by this approach due to their size: traditional reinforcement learning algorithms need to store every combination of state and action which was encountered. A common method for dealing with large stat...
Conference Paper
In relational learning, predictions for an individual are based not only on its own properties but also on the properties of a set of related individuals. Many systems use aggregates to summarize this set. Features thus introduced compare the result of an aggregate function to a threshold. We consider the case where the set to be aggregated is ge...
Conference Paper
Model trees are a special case of regression trees in which linear regression models are predicted in the leaves. Little attention has been paid to model trees in relational learning, mainly because the task of learning linear regression equations in this context involves dealing with non-determinacy of predictive attributes. Whereas existing...
Article
Full-text available
In recent years there has been an increased interest in frequent pattern discovery in large databases of graph structured objects. While the frequent connected subgraph mining problem for tree datasets can be solved in incremental polynomial time, it becomes intractable for arbitrary graph databases. Existing approaches have therefore resorted to v...
Conference Paper
Full-text available
In recent years there has been an increased interest in frequent pattern discovery in large databases of graph structured objects. While the frequent connected subgraph mining problem for tree datasets can be solved in incremental polynomial time, it becomes intractable for arbitrary graph databases. Existing approaches have therefore resorted to v...
Article
Full-text available
In this paper we describe an application of data mining methods for different prediction tasks in an intensive care unit. Some of the challenging aspects of performing data mining in this domain are highlighted. The applied methods result in models with good performances within medical standards that can be valuable in assisting medical decision...
Article
Full-text available
In this paper we present the use of Gaussian Processes for regression in the application of prediction in Intensive Care. We propose a preliminary solution to predicting the evolution of a patient's state during his stay in intensive care by means of defined patient specific characteristics.
Article
Full-text available
A representation of the search space in optical pulse shaping problems employing an acousto-optic programmable dispersive filter (AOPDF) is presented for use in closed-loop learning experiments where the optimal spectral phase function to some control problem is determined by an iterative learning algorithm. The representation allows the algorithm...
Article
Full-text available
This paper offers an approach to the problem of large state spaces for reinforcement learning by constructing a state-action pair aggregation (treating similar state-action pairs as if they were the same) with the use of domain knowledge. Arbitrary aggregation is known to give possibly very large errors. In this paper it is shown how, by using...
Article
Full-text available
In this paper we describe an interesting application of temporal data mining, predicting the evolution of critically ill patients. We point out several issues which make this application particularly challenging. We outline our work in progress and discuss directions for further work.
Article
Full-text available
In this paper a method to learn a single interpretable model from a relational ensemble is presented. The new model is obtained by artificially generating partial data examples using the distributions implicit in the ensemble and by building a new relational model from this artificial data.
Conference Paper
Full-text available
Relational reinforcement learning is a Q-learning technique for relational state-action spaces. It aims to enable agents to learn how to act in an environment that has no natural representation as a tuple of constants. In this case, the learning algorithm used to approximate the mapping between state-action pairs and their so called Q(uality)-value...
Conference Paper
Full-text available
Probability trees (or Probability Estimation Trees, PET’s) are decision trees with probability distributions in the leaves. Several alternative approaches for learning probability trees have been proposed but no thorough comparison of these approaches exists. In this paper we experimentally compare the main approaches using the relational decision...
Conference Paper
Full-text available
Logical Bayesian Networks (LBNs) have recently been introduced as another language for knowledge based model construction of Bayesian networks, besides existing languages such as Probabilistic Relational Models (PRMs) and Bayesian Logic Programs (BLPs). The original description of LBNs introduces them as a variant of BLPs and discusses the differen...
Conference Paper
Full-text available
In this paper we study Relational Reinforcement Learning in a multi-agent setting. There is growing evidence in the Reinforcement Learning research community that a relational representation of the state space has many benefits over a propositional one. Complex tasks such as planning or information retrieval on the web can be represented more naturally i...
