
Non-metric proximity measures have gained wide interest in domains such as the life sciences, robotics, and image processing. Most learning algorithms for such data focus on classification problems. Here we derive a regression algorithm for indefinite data representations based on the support vector machine. The approach avoids the heuristic eigenspectrum modifications and costly proxy-matrix approximations that are commonly used. We evaluate the method on a number of benchmark data sets using an indefinite measure.


This paper presents a theoretical foundation for an SVM solver in Kreĭn spaces. Up to now, all methods have been based either on matrix correction, on non-convex minimization, or on feature-space embedding. Here we justify and evaluate a solution that uses the original (indefinite) similarity measure, in the original Kreĭn space. This solution is the result of a stabilization procedure. We establish the correspondence between the stabilization problem (which has to be solved) and a classical SVM based on minimization (which is easy to solve). We provide simple equations to go from one to the other (in both directions). This link between the stabilization and minimization problems is the key to obtaining a solution in the original Kreĭn space. Using KSVM, one can solve SVM with usually troublesome kernels (large negative eigenvalues or large numbers of negative eigenvalues). Experiments show that our algorithm KSVM outperforms all previously proposed approaches for dealing with indefinite matrices in SVM-like kernel methods.
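
For contrast, the kind of eigenspectrum correction that KSVM makes unnecessary fits in a few lines: clip the negative eigenvalues of an indefinite symmetric similarity matrix to zero and reconstruct a PSD surrogate. This is an illustrative sketch of the "clip" heuristic only (function and variable names are ours, not from the paper):

```python
import numpy as np

def clip_correction(S):
    """Make an indefinite symmetric similarity matrix PSD by
    clipping its negative eigenvalues to zero (the heuristic
    spectrum modification that KSVM aims to avoid)."""
    S = 0.5 * (S + S.T)                # enforce exact symmetry
    w, V = np.linalg.eigh(S)           # real spectrum of a symmetric matrix
    w_clipped = np.clip(w, 0.0, None)  # drop the negative part
    return (V * w_clipped) @ V.T       # reconstruct a PSD matrix

# an indefinite toy similarity matrix with eigenvalues 3 and -1
S = np.array([[1.0, 2.0], [2.0, 1.0]])
S_psd = clip_correction(S)
```

The corrected matrix is PSD, but the information carried by the negative spectrum is discarded, which is exactly the loss the KSVM stabilization approach avoids.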

Efficient learning of a data analysis task strongly depends on the data representation. Most methods rely on (symmetric) similarity or dissimilarity representations by means of metric inner products or distances, providing easy access to powerful mathematical formalisms like kernel or branch-and-bound approaches. Similarities and dissimilarities are, however, often naturally obtained by nonmetric proximity measures that cannot easily be handled by classical learning algorithms. Major efforts have been undertaken to provide approaches that can either directly be used for such data or to make standard methods available for these types of data. We provide a comprehensive survey for the field of learning with nonmetric proximities. First, we introduce the formalism used in nonmetric spaces and motivate specific treatments for nonmetric proximity data. Second, we provide a systematization of the various approaches. For each category of approaches, we provide a comparative discussion of the individual algorithms and address complexity issues and generalization properties. In a summarizing section, we provide a larger experimental study for the majority of the algorithms on standard data sets. We also address the problem of large-scale proximity learning, which is often overlooked in this context but is of major importance for making these methods relevant in practice. The algorithms we discuss are in general applicable for proximity-based clustering, one-class classification, classification, regression, and embedding approaches. In the experimental part, we focus on classification tasks.

Forest fires are a major environmental issue, creating economic and ecological damage while endangering human lives. Fast detection is a key element for controlling such phenomena. To achieve this, one alternative is to use automatic tools based on local sensors, such as those provided by meteorological stations. In effect, meteorological conditions (e.g. temperature, wind) are known to influence forest fires, and several fire indexes, such as the forest Fire Weather Index (FWI), use such data. In this work, we explore a Data Mining (DM) approach to predict the burned area of forest fires. Five different DM techniques, including Support Vector Machines (SVM) and Random Forests, and four distinct feature-selection setups (using spatial, temporal, FWI components and weather attributes) were tested on recent real-world data collected from the northeast region of Portugal. The best configuration uses an SVM and four meteorological inputs (i.e. temperature, relative humidity, rain and wind) and is capable of predicting the burned area of small fires, which are more frequent. Such knowledge is particularly useful for improving firefighting resource management (e.g. prioritizing targets for air tankers and ground crews).

In this paper we show that many kernel methods can be adapted to deal with indefinite kernels, that is, kernels which are not positive semidefinite. They do not satisfy Mercer's condition and they induce associated functional spaces called Reproducing Kernel Kreĭn Spaces (RKKS), a generalization of Reproducing Kernel Hilbert Spaces (RKHS). Machine learning in RKKS shares many "nice" properties of learning in RKHS, such as orthogonality and projection. However, since the kernels are indefinite, we can no longer minimize the loss; instead we stabilize it. We show a general representer theorem for constrained stabilization and prove generalization bounds by computing the Rademacher averages of the kernel class. We list several examples of indefinite kernels and investigate regularization methods to solve spline interpolation. Some preliminary experiments with indefinite kernels for spline smoothing are reported for truncated spectral factorization, Landweber-Fridman iterations, and MR-II.
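
The move from minimization to stabilization can be made concrete. In an RKHS one minimizes the regularized risk, but in an RKKS the quadratic term $\langle f, f \rangle$ may be negative, so no minimum need exist and one seeks a stationary (saddle) point instead. Schematically (the notation here is ours, a sketch of the setting rather than the paper's exact formulation):

$$
\min_{f \in \mathcal{H}} \sum_{i=1}^{n} L\bigl(y_i, f(x_i)\bigr) + \lambda \langle f, f \rangle_{\mathcal{H}}
\quad\longrightarrow\quad
\operatorname{stab}_{f \in \mathcal{K}} \sum_{i=1}^{n} L\bigl(y_i, f(x_i)\bigr) + \lambda \langle f, f \rangle_{\mathcal{K}},
$$

with the representer theorem still yielding solutions of the form $f(\cdot) = \sum_{i=1}^{n} \alpha_i\, k(x_i, \cdot)$.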


In the process of designing pattern recognition systems one may choose a representation based on pairwise dissimilarities between objects. This is especially appealing when a set of discriminative features is difficult to find. Various classification systems have been studied for such a dissimilarity representation: the direct use of the nearest neighbor rule, the postulation of a dissimilarity space, and an embedding into a virtual, underlying feature vector space. It appears in several applications that the dissimilarity measures constructed by experts tend to have non-Euclidean behavior. In this paper we first analyze the causes of such choices and then experimentally verify that the non-Euclidean property of the measure can be informative.

In this paper, we examine the problem of indexing over non-metric distance functions. In particular, we focus on a general class of distance functions, namely Bregman divergence (6), to support nearest neighbor and range queries. Distance functions such as KL-divergence and Itakura-Saito distance are special cases of Bregman divergence, with wide applications in statistics, speech recognition, and time series analysis, among others. Unlike in metric spaces, key properties such as the triangle inequality and distance symmetry do not hold for such distance functions. A direct adaptation of existing indexing infrastructure developed for metric spaces is thus not possible. We devise a novel solution to handle this class of distance measures by expanding and mapping points in the original space to a new extended space. Subsequently, we show how state-of-the-art tree-based indexing methods, for low- to moderate-dimensional datasets, and vector approximation file (VA-file) methods, for high-dimensional datasets, can be adapted on this extended space to answer such queries efficiently. Improved distance-bounding techniques and distribution-based index optimization are also introduced to improve the performance of query answering and index construction respectively, which can be applied to both R-trees and VA-files. Extensive experiments are conducted to validate our approach on a variety of datasets and a range of Bregman divergence functions.
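
The unifying definition behind this family is compact: for a strictly convex generator $F$, the divergence is $D_F(p, q) = F(p) - F(q) - \langle \nabla F(q),\, p - q \rangle$. A minimal sketch showing how generalized KL and Itakura-Saito arise as special cases (generator choices follow the standard definitions; the function names are ours):

```python
import math

def bregman(p, q, F, gradF):
    """Bregman divergence D_F(p, q) = F(p) - F(q) - <gradF(q), p - q>."""
    return F(p) - F(q) - sum(g * (pi - qi) for g, pi, qi in zip(gradF(q), p, q))

# generalized KL-divergence: generator F(x) = sum_i x_i log x_i
F_kl = lambda x: sum(xi * math.log(xi) for xi in x)
grad_kl = lambda x: [math.log(xi) + 1.0 for xi in x]

# Itakura-Saito distance: generator F(x) = -sum_i log x_i
F_is = lambda x: -sum(math.log(xi) for xi in x)
grad_is = lambda x: [-1.0 / xi for xi in x]

p, q = [0.2, 0.5, 0.3], [0.3, 0.4, 0.3]
d_kl = bregman(p, q, F_kl, grad_kl)  # equals sum p_i log(p_i/q_i) - p_i + q_i
d_is = bregman(p, q, F_is, grad_is)  # equals sum p_i/q_i - log(p_i/q_i) - 1
```

Note that `bregman(p, q, ...)` and `bregman(q, p, ...)` generally differ, which is precisely the symmetry violation the paper's indexing scheme has to cope with.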

Regression is the process of learning relationships between inputs and continuous outputs from example data, which enables predictions for novel inputs. The history of regression is closely related to the history of artificial neural networks since the seminal work of Rosenblatt (1958). The aims of this paper are to provide an overview of many regression algorithms, and to demonstrate that the function representations whose parameters they regress fall into two classes: a weighted sum of basis functions, or a mixture of linear models. Furthermore, we show that the former is a special case of the latter. Our ambition is thus to provide a deep understanding of the relationship between these algorithms which, despite being derived from very different principles, use a function representation that can be captured within one unified model. Finally, step-by-step derivations of the algorithms from first principles and visualizations of their inner workings allow this article to be used as a tutorial for those new to regression.
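
The first of the two representation classes, a weighted sum of basis functions $f(x) = \sum_j w_j\,\phi_j(x)$, can be fit by ordinary least squares. A minimal sketch with a polynomial basis (a hypothetical example of the representation, not code from the paper):

```python
import numpy as np

def fit_basis(x, y, basis):
    """Least-squares weights for f(x) = sum_j w_j * phi_j(x)."""
    Phi = np.column_stack([phi(x) for phi in basis])  # design matrix
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return w

def predict(x, w, basis):
    """Evaluate the weighted sum of basis functions at x."""
    return sum(wj * phi(x) for wj, phi in zip(w, basis))

# polynomial basis {1, x, x^2}; data drawn from an exact quadratic
basis = [lambda x: np.ones_like(x), lambda x: x, lambda x: x ** 2]
x = np.linspace(-1.0, 1.0, 20)
y = 2.0 - 3.0 * x + 0.5 * x ** 2
w = fit_basis(x, y, basis)
```

With noiseless data from a function inside the span of the basis, the weights are recovered exactly, which makes the representation easy to inspect.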

Domain-specific (dis-)similarity or proximity measures, used e.g. in alignment algorithms for sequence data, are popular for analyzing complex data objects and capturing domain-specific data properties. Without an underlying vector space, these data are given as pairwise (dis-)similarities only. The few available methods for such data focus largely on similarities and do not scale to large data sets. Kernel methods are very effective for metric similarity matrices, also at large scale, but costly transformations are necessary when starting from non-metric (dis-)similarities. We propose an integrative combination of Nystroem approximation, potential double centering, and eigenvalue correction to obtain valid kernel matrices at costs linear in the number of samples. The proposed approach makes effective kernel methods accessible for such data. Experiments with several larger (dis-)similarity data sets show that the proposed method achieves much better runtime performance than the standard strategy while keeping competitive model accuracy. The main contribution is an efficient and accurate technique to convert (potentially non-metric) large-scale dissimilarity matrices into approximated positive semi-definite kernel matrices at linear cost.
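
The Nystroem step at the core of this pipeline can be sketched: pick m landmark columns C of the full matrix K together with the m-by-m landmark block W, then approximate K ≈ C W⁺ Cᵀ, avoiding the full n-by-n eigendecomposition. A minimal illustration (without the double-centering and eigenvalue-correction steps the paper adds for non-metric data):

```python
import numpy as np

def nystroem(K, landmarks):
    """Nystroem approximation of a symmetric kernel matrix K
    from a subset of its columns: K ~ C W^+ C^T."""
    C = K[:, landmarks]                  # n x m sampled columns
    W = K[np.ix_(landmarks, landmarks)]  # m x m landmark block
    return C @ np.linalg.pinv(W) @ C.T

# rank-3 Gram matrix: 3 generic landmarks recover it exactly
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
K = X @ X.T                              # rank 3, PSD
K_approx = nystroem(K, [0, 1, 2])
```

When the landmark block has the same rank as K, the reconstruction is exact; for full-rank kernels it is an approximation whose quality depends on the landmark choice.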

Reproducing kernel Kreǐn spaces are used in learning from data via kernel methods when the kernel is indefinite. In this paper, a characterization of a subset of the unit ball in such spaces is provided. Conditions are given, under which upper bounds on the estimation error and the approximation error can be applied simultaneously to such a subset. Finally, it is shown that the hyperbolic-tangent kernel and other indefinite kernels satisfy such conditions.

Recognition is the fundamental task of visual cognition, yet how to formalize the general recognition problem for computer vision remains an open issue. The problem is sometimes reduced to the simplest case of recognizing matching pairs, often structured to allow for metric constraints. However, visual recognition is broader than just pair matching, especially when we consider multi-class training data and large sets of features in a learning context. What we learn and how we learn it has important implications for effective algorithms. In this paper, we reconsider the assumption of recognition as a pair matching test, and introduce a new formal definition that captures the broader context of the problem. Through a meta-analysis and an experimental assessment of the top algorithms on popular data sets, we gain a sense of how often metric properties are violated by good recognition algorithms. By studying these violations, useful insights come to light: we make the case that locally metric algorithms should leverage outside information to solve the general recognition problem.

A common approach in structural pattern classification is to define a dissimilarity measure on patterns and apply a distance-based nearest-neighbor classifier. In this paper, we introduce an alternative method for classification using kernel functions based on edit distance. The proposed approach is applicable to both string and graph representations of patterns. By means of the kernel functions introduced in this paper, string and graph classification can be performed in an implicit vector space using powerful statistical algorithms. The validity of the kernel method cannot be established for edit distance in general. However, by evaluating theoretical criteria we show that the kernel functions are nevertheless suitable for classification, and experiments on various string and graph datasets clearly demonstrate that nearest-neighbor classifiers can be outperformed by support vector machines using the proposed kernel functions.
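
One common way to turn an edit distance d into a similarity is the exponential form k(x, y) = exp(-γ d(x, y)); the paper derives several edit-distance-based kernels, and this construction is a standard illustration rather than necessarily theirs. A sketch for strings:

```python
import math

def edit_distance(s, t):
    """Levenshtein distance via dynamic programming (two-row table)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (cs != ct)))   # substitution
        prev = cur
    return prev[-1]

def edit_kernel(s, t, gamma=0.5):
    """Similarity derived from edit distance; not PSD in general,
    which is why the validity of such kernels needs separate analysis."""
    return math.exp(-gamma * edit_distance(s, t))
```

The resulting similarity matrix is symmetric but may be indefinite, which is exactly the issue the paper's theoretical criteria address.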

We propose a data mining approach to predict human wine taste preferences that is based on easily available analytical tests at the certification step. A large dataset (when compared to other studies in this domain) is considered, with white and red vinho verde samples (from Portugal). Three regression techniques were applied, under a computationally efficient procedure that performs simultaneous variable and model selection. The support vector machine achieved promising results, outperforming the multiple regression and neural network methods. Such a model is useful to support the oenologist's wine-tasting evaluations and improve wine production. Furthermore, similar techniques can help in target marketing by modeling consumer tastes from niche markets.

We discuss the use of divergences in dissimilarity-based classification. Divergences can be employed whenever vectorial data consists of non-negative, potentially normalized features. This is, for instance, the case in spectral data or histograms. In particular, we introduce and study divergence based learning vector quantization (DLVQ). We derive cost function based DLVQ schemes for the family of γ‐divergences which includes the well-known Kullback–Leibler divergence and the so-called Cauchy–Schwarz divergence as special cases. The corresponding training schemes are applied to two different real world data sets. The first one, a benchmark data set (Wisconsin Breast Cancer) is available in the public domain. In the second problem, color histograms of leaf images are used to detect the presence of cassava mosaic disease in cassava plants. We compare the use of standard Euclidean distances with DLVQ for different parameter settings. We show that DLVQ can yield superior classification accuracies and Receiver Operating Characteristics.
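
Two members of the divergence family used above can be written down directly for discrete histograms; a minimal sketch (the DLVQ cost functions and update rules themselves are not reproduced here):

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence between discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def cauchy_schwarz_divergence(p, q):
    """D_CS(p, q) = -log( <p, q> / (||p|| ||q||) ): symmetric,
    and zero exactly when p and q are proportional."""
    dot = sum(pi * qi for pi, qi in zip(p, q))
    norm_p = math.sqrt(sum(pi * pi for pi in p))
    norm_q = math.sqrt(sum(qi * qi for qi in q))
    return -math.log(dot / (norm_p * norm_q))

p, q = [0.2, 0.5, 0.3], [0.3, 0.4, 0.3]
```

Unlike KL, the Cauchy-Schwarz divergence is symmetric in its arguments, one reason it is attractive for prototype-based learning on histogram data.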

Pairwise dissimilarity representations are frequently used as an alternative to feature vectors in pattern recognition. One of the problems encountered in the analysis of such data is that the dissimilarities are rarely Euclidean, and are sometimes non-metric too. As a result, the objects associated with the dissimilarities cannot be embedded into a Euclidean space without distortion. One way of gauging the extent of this problem is to compute the total mass associated with the negative eigenvalues of the dissimilarity matrix. However, this test does not reveal the origins of non-Euclidean or non-metric artefacts in the data. The aim of this paper is to provide simple empirical tests that can be used to determine the origins of the negative dissimilarity eigenvalues. We consider three sources of negative dissimilarity eigenvalues, namely (a) that the data reside on a manifold (here, for simplicity, we consider a sphere), (b) that the objects may be extended, and (c) that there is Gaussian error. We develop three measures based on the non-metricity and the negative spectrum to characterize the possible causes of non-Euclidean data. We then experimentally test our measures on various real-world dissimilarity datasets.
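
The negative-eigenvalue-mass test mentioned above is easy to compute: double-center the squared dissimilarity matrix to obtain a candidate Gram matrix and inspect its spectrum. A sketch (the function name is ours; the quantity goes by several names in the literature):

```python
import numpy as np

def negative_eigenfraction(D):
    """Fraction of spectral mass on negative eigenvalues of the Gram
    matrix obtained by double-centering squared dissimilarities D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    G = -0.5 * J @ D @ J
    w = np.linalg.eigvalsh(G)
    return np.abs(w[w < 0]).sum() / np.abs(w).sum()

# squared Euclidean distances embed exactly: essentially no negative mass
X = np.random.default_rng(1).standard_normal((10, 2))
D_euc = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)

# a strong triangle-inequality violation produces negative eigenvalues
D_bad = np.array([[0.0, 1.0, 1.0],
                  [1.0, 0.0, 25.0],
                  [1.0, 25.0, 0.0]])
```

A fraction near zero indicates the data are (approximately) Euclidean; a large fraction signals that one of the three sources discussed in the paper may be at work.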

In this tutorial we give an overview of the basic ideas underlying Support Vector (SV) machines for function estimation. Furthermore, we include a summary of currently used algorithms for training SV machines, covering both the quadratic (or convex) programming part and advanced methods for dealing with large datasets. Finally, we mention some modifications and extensions that have been applied to the standard SV algorithm, and discuss the aspect of regularization from an SV perspective.
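
The loss at the heart of SV regression is the ε-insensitive loss, $|y - f(x)|_\varepsilon = \max(0,\, |y - f(x)| - \varepsilon)$: residuals inside the ε-tube cost nothing, and only points on or outside the tube become support vectors. A one-line sketch:

```python
def eps_insensitive(y, f, eps=0.1):
    """Epsilon-insensitive loss: zero inside the eps-tube, linear outside."""
    return max(0.0, abs(y - f) - eps)
```

Because the loss is flat inside the tube, the fitted function depends only on a sparse subset of the training data.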

Part structure and articulation are of fundamental importance in computer and human vision. We propose using the inner-distance to build shape descriptors that are robust to articulation and capture part structure. The inner-distance is defined as the length of the shortest path between landmark points within the shape silhouette. We show that it is articulation insensitive and more effective at capturing part structures than the Euclidean distance. This suggests that the inner-distance can be used as a replacement for the Euclidean distance to build more accurate descriptors for complex shapes, especially for those with articulated parts. In addition, texture information along the shortest path can be used to further improve shape classification. With this idea, we propose three approaches to using the inner-distance. The first method combines the inner-distance and multidimensional scaling (MDS) to build articulation invariant signatures for articulated shapes. The second method uses the inner-distance to build a new shape descriptor based on shape contexts. The third one extends the second one by considering the texture information along shortest paths. The proposed approaches have been tested on a variety of shape databases, including an articulated shape data set, MPEG7 CE-Shape-1, Kimia silhouettes, the ETH-80 data set, two leaf data sets, and a human motion silhouette data set. In all the experiments, our methods demonstrate effective performance compared with other algorithms.
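
The idea can be illustrated on a binary grid: approximate the silhouette by occupied cells and measure the shortest 4-connected path that stays inside it. This is a coarse toy stand-in for the paper's geodesic shortest path between landmark points, but it shows the key effect: around a concavity, the inner distance exceeds the Euclidean distance.

```python
from collections import deque
import math

def inner_distance(mask, start, goal):
    """Shortest 4-connected path length between two cells that stays
    inside the silhouette (cells with value 1), found by BFS."""
    rows, cols = len(mask), len(mask[0])
    dist = {start: 0}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        if (r, c) == goal:
            return dist[(r, c)]
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and mask[nr][nc] and (nr, nc) not in dist:
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return math.inf  # goal unreachable within the silhouette

# a U-shaped silhouette: the path must go around the notch
mask = [[1, 0, 1],
        [1, 0, 1],
        [1, 1, 1]]
d_inner = inner_distance(mask, (0, 0), (0, 2))
d_euclid = math.dist((0, 0), (0, 2))
```

Articulating the "arms" of such a shape changes the Euclidean distances between their tips drastically while leaving the inner distances nearly unchanged, which is the insensitivity the descriptors exploit.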

This paper reports on an optimum dynamic programming (DP) based time-normalization algorithm for spoken word recognition. First, a general principle of time-normalization is given using a time-warping function. Then, two time-normalized distance definitions, called symmetric and asymmetric forms, are derived from the principle. These two forms are compared with each other through theoretical discussion and experimental study. The superiority of the symmetric-form algorithm is established. A new technique, called slope constraint, is successfully introduced, in which the slope of the warping function is restricted so as to improve discrimination between words in different categories. The characteristics of effective slope constraints are qualitatively analyzed, and the optimum slope-constraint condition is determined through experiments. The optimized algorithm is then extensively subjected to experimental comparison with various DP algorithms previously applied to spoken word recognition by different research groups. The experiments show that the present algorithm gives no more than about two-thirds the errors of even the best conventional algorithm.
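
The symmetric-form recurrence (here without the slope constraint) fits in a few lines; a minimal sketch for 1-D sequences, with diagonal steps weighted twice and the total normalized by n + m as in the symmetric form:

```python
def dtw_symmetric(a, b):
    """Symmetric-form DTW distance for 1-D sequences:
    g(i,j) = min(g(i-1,j) + d, g(i-1,j-1) + 2d, g(i,j-1) + d),
    normalized by n + m."""
    n, m = len(a), len(b)
    INF = float("inf")
    g = [[INF] * (m + 1) for _ in range(n + 1)]
    g[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            g[i][j] = min(g[i - 1][j] + d,          # vertical step
                          g[i - 1][j - 1] + 2 * d,  # diagonal step, weight 2
                          g[i][j - 1] + d)          # horizontal step
    return g[n][m] / (n + m)
```

A time-stretched copy of a word scores close to zero against the original, while a different word scores high, which is what makes the normalized distance usable for recognition.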