[show abstract][hide abstract] ABSTRACT: This work shows how to leverage causal inference to understand the behavior
of complex learning systems interacting with their environment and predict the
consequences of changes to the system. Such predictions allow both humans and
algorithms to select changes that improve both the short-term and long-term
performance of such systems. This work is illustrated by experiments carried
out on the ad placement system associated with the Bing search engine.
[show abstract][hide abstract] ABSTRACT: We propose a new method to quantify the solution stability of a large class of combinatorial optimization problems arising in machine learning. As practical example we apply the method to correlation clustering, clustering aggregation, modularity clustering, and relative performance significance clustering. Our method is extensively motivated by the idea of linear programming relaxations. We prove that when a relaxation is used to solve the original clustering problem, then the solution stability calculated by our method is conservative, that is, it never overestimates the solution stability of the true, unrelaxed problem. We also demonstrate how our method can be used to compute the entire path of optimal solutions as the optimization problem is increasingly perturbed. Experimentally, our method is shown to perform well on a number of benchmark problems.
Proceedings of the 26th International Conference on Machine Learning (ICML 2009), 769-776 (2009). 01/2009;
[show abstract][hide abstract] ABSTRACT: Abstraction of complex, longer motor tasks into simpler elemental movements enables humans and animals to exhibit motor skills which have not yet been matched by robots. Humans intuitively decompose complex motions into smaller, simpler segments. For example when describing simple movements like drawing a triangle with a pen, we can easily name the basic steps of this movement. Surprisingly, such abstractions have rarely been used in artificial motor skill learning algorithms. These algorithms typically choose a new action (such as a torque or a force) at a very fast time-scale. As a result, both policy and temporal credit assignment problem become unnecessarily complex - often beyond the reach of current machine learning methods. We introduce a new framework for temporal abstractions in reinforcement learning (RL), i.e. RL with motion templates. We present a new algorithm for this framework which can learn high-quality policies by making only few abstract decisions.
Proceedings of the 26th International Conference on Machine Learning (ICML 2009), 753-760 (2009). 01/2009;
[show abstract][hide abstract] ABSTRACT: Motivated by causal inference problems, we propose a novel method for regression that minimizes the statistical dependence between regressors and residuals. The key advantage of this approach to regression is that it does not assume a particular distribution of the noise, i.e., it is non-parametric with respect to the noise distribution. We argue that the proposed regression method is well suited to the task of causal inference in additive noise models. A practical disadvantage is that the resulting optimization problem is generally non-convex and can be difficult to solve. Nevertheless, we report good results on one of the tasks of the NIPS 2008 Causality Challenge, where the goal is to distinguish causes from effects in pairs of statistically dependent variables. In addition, we propose an algorithm for efficiently inferring causal models from observational data for more than two variables. The required number of regressions and independence tests is quadratic in the number of variables, which is a significant improvement over the simple method that tests all possible DAGs.
Proceedings of the 26th International Conference on Machine Learning (ICML 2009), 745-752 (2009). 01/2009;
[show abstract][hide abstract] ABSTRACT: Bandpass filtering, orientation selectivity, and contrast gain control are prominent features of sensory coding at the level of V1 simple cells. While the effect of bandpass filtering and orientation selectivity can be assessed within a linear model, contrast gain control is an inherently nonlinear computation. Here we employ the class of $L_p$ elliptically contoured distributions to investigate the extent to which the two features---orientation selectivity and contrast gain control---are suited to model the statistics of natural images. Within this framework we find that contrast gain control can play a significant role for the removal of redundancies in natural images. Orientation selectivity, in contrast, has only a very limited potential for redundancy reduction.
Advances in Neural Information Processing Systems 21: Proceedings of the 2008 Conference, 1521-1528 (2009). 01/2009;
[show abstract][hide abstract] ABSTRACT: We introduce a family of unsupervised algorithms, numerical taxonomy clustering, to simultaneously cluster data, and to learn a taxonomy that encodes the relationship between the clusters. The algorithms work by maximizing the dependence between the taxonomy and the original data. The resulting taxonomy is a more informative visualization of complex data than simple clustering; in addition, taking into account the relations between different clusters is shown to substantially improve the quality of the clustering, when compared with state-ofthe- art algorithms in the literature (both spectral clustering and a previous dependence maximization approach). We demonstrate our algorithm on image and text data.
Advances in Neural Information Processing Systems 21: Proceedings of the 2008 Conference, 153-160 (2009). 01/2008;
[show abstract][hide abstract] ABSTRACT: EEG connectivity measures could provide a new type of feature space for inferring a subject‘s intention in Brain-Computer Interfaces (BCIs). However, very little is known on EEG connectivity patterns for BCIs. In this study, EEG connectivity during motor imagery (MI) of the left and right is investigated in a broad frequency range across the whole scalp by combining Beamforming with Transfer Entropy and taking into account possible volume conduction effects. Observed connectivity patterns indicate that modulation intentionally induced by MI is strongest in the gamma-band, i.e., above 35 Hz. Furthermore, modulation between MI and rest is found to be more pronounced than between MI of different hands. This is in contrast to results on MI obtained with bandpower features, and might provide an explanation for the so far only moderate success of connectivity features in BCIs. It is concluded that future studies on connectivity based BCIs should focus on high frequency bands and conside r ex peri mental paradigms that maximally vary cognitive demands between conditions.
Advances in Neural Information Processing Systems 21: Proceedings of the 2008 Conference, 561-568 (2009). 01/2008;
[show abstract][hide abstract] ABSTRACT: Learning in real-time applications, e.g., online approximation of the inverse dynamics model for model-based robot control, requires fast online regression techniques. Inspired by local learning, we propose a method to speed up standard Gaussian Process regression (GPR) with local GP models (LGP). The training data is partitioned in local regions, for each an individual GP model is trained. The prediction for a query point is performed by weighted estimation using nearby local models. Unlike other GP approximations, such as mixtures of experts, we use a distance based measure for partitioning of the data and weighted prediction. The proposed method achieves online learning and prediction in real-time. Comparisons with other nonparametric regression methods show that LGP has higher accuracy than LWPR and close to the performance of standard GPR and nu-SVR.
Advances in Neural Information Processing Systems 21: Proceedings of the 2008 Conference, 1193-1200 (2009). 01/2008;
[show abstract][hide abstract] ABSTRACT: Recently, fitted Q-iteration (FQI) based methods have become more popular due to their increased sample efficiency, a more stable learning process and the higher quality of the resulting policy. However, these methods remain hard to use for continuous action spaces which frequently occur in real-world tasks, e.g., in robotics and other technical applications. The greedy action selection commonly used for the policy improvement step is particularly problematic as it is expensive for continuous actions, can cause an unstable learning process, introduces an optimization bias and results in highly non-smooth policies unsuitable for real-world systems. In this paper, we show that by using a soft-greedy action selection the policy improvement step used in FQI can be simplified to an inexpensive advantage-weighted regression. With this result, we are able to derive a new, computationally efficient FQI algorithm which can even deal with high dimensional action spaces.
Advances in Neural Information Processing Systems 21: Proceedings of the 2008 Conference, 1177-1184 (2009). 01/2008;
[show abstract][hide abstract] ABSTRACT: Graph clustering methods such as spectral clustering are defined for general weighted graphs. In machine learning, however, data often is not given in form of a graph, but in terms of similarity (or distance) values between points. In this case, first a neighborhood graph is constructed using the similarities between the points and then a graph clustering algorithm is applied to this graph. In this paper we investigate the influence of the construction of the similarity graph on the clustering results. We first study the convergence of graph clustering criteria such as the normalized cut (Ncut) as the sample size tends to infinity. We find that the limit expressions are different for different types of graph, for example the r-neighborhood graph or the k-nearest neighbor graph. In plain words: Ncut on a kNN graph does something systematically different than Ncut on an r-neighborhood graph! This finding shows that graph clustering criteria cannot be studied independently of the kind of graph they are applied to. We also provide examples which show that these differences can be observed for toy and real data already for rather small sample sizes.
Advances in Neural Information Processing Systems 21: Proceedings of the 2008 Conference, 1025-1032 (2009). 01/2008;
[show abstract][hide abstract] ABSTRACT: We show how improved sequences for magnetic resonance imaging can be found through automated optimization of Bayesian design scores. Combining recent advances in approximate Bayesian inference and natural image statistics with high-performance numerical computation, we propose the first scalable Bayesian experimental design framework for this problem of high relevance to clinical and brain research. Our solution requires approximate inference for dense, non-Gaussian models on a scale seldom addressed before. We propose a novel scalable variational inference algorithm, and show how powerful methods of numerical mathematics can be modified to compute primitives in our framework. Our approach is evaluated on a realistic setup with raw data from a 3T MR scanner.
Advances in Neural Information Processing Systems 21: Proceedings of the 2008 Conference, 1441-1448 (2009). 01/2008;
[show abstract][hide abstract] ABSTRACT: From an information-theoretic perspective, a noisy transmission system such as a visual Brain-Computer Interface (BCI) speller could benefit from the use of errorcorrecting codes. However, optimizing the code solely according to the maximal minimum-Hamming-distance criterion tends to lead to an overall increase in target frequency of target stimuli, and hence a significantly reduced average target-to-target interval (TTI), leading to difficulties in classifying the individual event-related potentials (ERPs) due to overlap and refractory effects. Clearly any change to the stimulus setup must also respect the possible psychophysiological consequences. Here we report new EEG data from experiments in which we explore stimulus types and codebooks in a within-subject design, finding an interaction between the two factors. Our data demonstrate that the traditional, rowcolumn code has particular spatial properties that lead to better performance than one would expect from its TTIs and Hamming-distances alone, but nonetheless error-correcting codes can improve performance provided the right stimulus type is used.
Advances in Neural Information Processing Systems 21: Proceedings of the 2008 Conference, 665-672 (2009). 01/2008;
[show abstract][hide abstract] ABSTRACT: Motor primitives or motion templates have become an important concept for both modeling human motor control as well as generating robot behaviors using imitation learning. Recent impressive results range from humanoid robot movement generation to timing models of human motions. The automatic generation of skill libraries containing multiple motion templates is an important step in robot learning. Such a skill learning system needs to cluster similar movements together and represent each resulting motion template as a generative model which is subsequently used for the execution of the behavior by a robot system. In this paper, we show how human trajectories captured as multidimensional time-series can be clustered using Bayesian mixtures of linear Gaussian state-space models based on the similarity of their dynamics. The appropriate number of templates is automatically determined by enforcing a parsimonious parametrization. As the resulting model is intractable, we introduce a novel approximation method based on variational Bayes, which is especially designed to enable the use of efficient inference algorithms. On recorded human Balero movements, this method is not only capable of finding reasonable motion templates but also yields a generative model which works well in the execution of this complex task on a simulated anthropomorphic SARCOS arm.
Advances in Neural Information Processing Systems 21: Proceedings of the 2008 Conference, 297-304 (2009). 01/2008;
[show abstract][hide abstract] ABSTRACT: Convex learning algorithms, such as Support Vector Machines (SVMs), are often seen as highly desirable because they offer strong practical properties and are amenable to theoretical analysis. However, in this work we show how non-convexity can provide scalability advantages over convexity. We show how concave-convex programming can be applied to produce (i) faster SVMs where training errors are no longer support vectors, and (ii) much faster Transductive SVMs.
Proceedings of the 23rd International Conference on Machine Learning, 026-026 (2006). 01/2006;
[show abstract][hide abstract] ABSTRACT: We describe a trainable system for analyzing videos of developing C. elegans embryos. The system automatically detects, segments, and locates cells and nuclei in microscopic images. The system was designed as the central component of a fully automated phenotyping system. The system contains three modules 1) a convolutional network trained to classify each pixel into five categories: cell wall, cytoplasm, nucleus membrane, nucleus, outside medium; 2) an energy-based model, which cleans up the output of the convolutional network by learning local consistency constraints that must be satisfied by label images; 3) a set of elastic models of the embryo at various stages of development that are matched to the label images.
IEEE Transactions on Image Processing 10/2005; · 3.20 Impact Factor
[show abstract][hide abstract] ABSTRACT: The computation of classical higher-order statistics such as higher-order moments or spectra is difficult for images due to the huge number of terms to be estimated and interpreted. We propose an alternative approach in which multiplicative pixel interactions are described by a series of Wiener functionals. Since the functionals are estimated implicitly via polynomial kernels, the combinatorial explosion associated with the classical higher-order statistics is avoided. First results show that image structures such as lines or corners can be predicted correctly, and that pixel interactions up to the order of five play an important role in natural images.
Advances in Neural Information Processing Systems, 465-472 (2005). 01/2004;
[show abstract][hide abstract] ABSTRACT: We describe methods for computing an implicit model of a hypersurface that is given only by a finite sampling. The methods work by mapping the sample points into a reproducing kernel Hilbert space and then determining regions in terms of hyperplanes.
Advances in Neural Information Processing Systems, 1193-1200 (2005). 01/2004;
[show abstract][hide abstract] ABSTRACT: We assess the applicability of several popular learning methods for the problem of recognizing generic visual categories with invariance to pose, lighting, and surrounding clutter. A large dataset comprising stereo image pairs of 50 uniform-colored toys under 36 azimuths, 9 elevations, and 6 lighting conditions was collected (for a total of 194,400 individual images). The objects were 10 instances of 5 generic categories: four-legged animals, human figures, airplanes, trucks, and cars. Five instances of each category were used for training, and the other five for testing. Low-resolution grayscale images of the objects with various amounts of variability and surrounding clutter were used for training and testing. Nearest neighbor methods, support vector machines, and convolutional networks, operating on raw pixels or on PCA-derived features were tested. Test error rates for unseen object instances placed on uniform backgrounds were around 13% for SVM and 7% for convolutional nets. On a segmentation/recognition task with highly cluttered images, SVM proved impractical, while convolutional nets yielded 16/7% error. A real-time version of the system was implemented that can detect and classify objects in natural scenes at around 10 frames per second.
Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on; 01/2004