Frank-Michael Schleif's research while affiliated with Hochschule für angewandte Wissenschaften Würzburg-Schweinfurt and other institutions

Publications (143)

Article
Deep learning achieves state-of-the-art results in many applications. However, the generalization capabilities of the learned networks are limited to the training or source domain. The predictive power decreases when these models are evaluated in a target domain that differs from the source domain. Joint adversarial domain adaptation networks are currently...
Article
In recent years, social media has become an important part of everyday life for many people. A major challenge of social media is finding posts that are interesting to the user. Many social networks, such as Twitter, handle this problem with so-called hashtags. Users can label their own Tweets (posts) with a hashtag, while other users can search for posts cont...
Article
Concept drift is a change in the underlying data distribution that occurs especially in streaming data. Among the other challenges of streaming data classification, concept drift has to be addressed to obtain reliable predictions. Robust Soft Learning Vector Quantization as well as Generalized Learning Vector Quantization has already...
Preprint
Matrix approximations are a key element in large-scale algebraic machine learning approaches. The recently proposed method MEKA (Si et al., 2014) effectively employs two common assumptions in Hilbert spaces: the low-rank property of an inner product matrix obtained from a shift-invariant kernel function and a data compactness hypothesis by means of...
Chapter
In non-stationary environments, several constraints require algorithms to be fast, memory-efficient, and highly adaptable. While several lazy learners and tree classifiers exist for the streaming context, prototype-based classifiers have received little attention. Prototype-based classifiers, however...
Chapter
Statistical and adversarial adaptation are currently two broad categories of neural network architectures in unsupervised deep domain adaptation. The latter has become the new standard due to its good theoretical foundation and empirical performance. However, there are two shortcomings. First, recent studies show that these approaches focus too...
Conference Paper
Statistical and adversarial adaptation are currently two broad categories of neural network architectures in unsupervised deep domain adaptation. The latter has become the new standard due to its good theoretical foundation and empirical performance. However, there are two shortcomings. First, recent studies show that these approaches focus too muc...
Chapter
Over the last two decades, kernel learning has attracted enormous interest and led to the development of a variety of successful machine learning models. Selecting an efficient data representation is one of the critical aspects of obtaining high-quality results. In a variety of domains, this is achieved by incorporating expert knowledge in the used do...
Article
Life science data are often encoded in a non-standard way by means of alphanumeric sequences, graph representations, numerical vectors of variable length, or other formats. Domain-specific or data-driven similarity measures such as alignment functions have been employed with great success. The vast majority of more complex data analysis algorithms re...
Conference Paper
Like traditional algorithms, deep learning networks struggle to generalize across domain boundaries. A current solution is to train the classification model while simultaneously minimizing domain differences in the deep network. In this work, we propose a new unsupervised deep domain adaptation architecture that trains a class...
Chapter
In static environments, Random Projection (RP) is a popular and efficient technique to preprocess high-dimensional data and reduce its dimensionality. While RP has been widely used and evaluated in stationary data analysis scenarios, its behavior in non-stationary environments has not been well analyzed. In this paper, we provide an evaluation of RP on streaming data...
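For orientation, a minimal Gaussian random projection can be sketched as below. This is an illustrative sketch under stated assumptions, not the implementation evaluated in the chapter: the matrix entries are drawn i.i.d. from N(0, 1/k), which preserves squared Euclidean distances in expectation (Johnson-Lindenstrauss), and the helper names are hypothetical. Because the matrix does not depend on the data, it can be drawn once and applied to every sample of a stream.

```python
# Illustrative sketch; function names are hypothetical.
import math
import random


def gaussian_projection_matrix(d, k, seed=0):
    """Return a k x d matrix with i.i.d. N(0, 1/k) entries."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]


def project(R, x):
    """Map a d-dimensional vector x to k dimensions, y = R x."""
    return [sum(r * xi for r, xi in zip(row, x)) for row in R]


def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


# Usage: two random 1000-dimensional points, projected to 200 dimensions.
rng = random.Random(1)
x = [rng.random() for _ in range(1000)]
y = [rng.random() for _ in range(1000)]
R = gaussian_projection_matrix(d=1000, k=200)
ratio = sq_dist(project(R, x), project(R, y)) / sq_dist(x, y)
print(f"squared-distance ratio after projection: {ratio:.3f}")  # close to 1 on average
```

In a streaming setting, R stays fixed and each arriving sample is passed through `project(R, x)`; the target dimension k trades distance-preservation accuracy against memory and runtime.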
Chapter
Current supervised learning models cannot generalize well across domain boundaries, which is a known problem in many applications, such as robotics or visual classification. Domain adaptation methods are used to improve these generalization properties. However, these techniques suffer either from being restricted to a particular task, such as visua...
Preprint
Proximities are at the heart of almost all machine learning methods. If the input data are given as numerical vectors of equal length, the Euclidean distance or a Hilbertian inner product is frequently used in modeling algorithms. In a more generic view, objects are compared by a (symmetric) similarity or dissimilarity measure, which may not obey par...
Preprint
Transfer learning focuses on the reuse of supervised learning models in a new context. Prominent applications can be found in robotics, image processing, and web mining. In these fields, the learning scenarios naturally change but often remain related to each other, motivating the reuse of existing supervised models. Current transfer learning...
Preprint
The amount of real-time communication between agents in an information system has increased rapidly since the beginning of the decade. This is because the use of these systems, e.g., social media, has become commonplace in today's society. This requires analytical algorithms that learn and predict this stream of information in real time. The nature o...
Article
The amount of real-time communication between agents in an information system has increased rapidly since the beginning of the decade. This is because the use of these systems, e.g., social media, has become commonplace in today's society. This requires analytical algorithms that learn and predict this stream of information in real time. The nature of...
Conference Paper
Proximities are at the heart of almost all machine learning methods. In a more generic view, objects are compared by a (symmetric) similarity or dissimilarity measure, which may not obey particular mathematical properties. This renders many machine learning methods invalid, leading to convergence problems and the loss of generalization behavior. In...
Chapter
Concept drift is a change in the underlying data distribution that occurs especially in streaming data. Among the other challenges of streaming data classification, concept drift should be addressed to obtain reliable predictions. Robust Soft Learning Vector Quantization has already shown good performance in traditional settings a...
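To make the RSLVQ model concrete, the sketch below shows one stochastic update step in the commonly cited form of Seo and Obermayer: prototypes are centers of equal-weight isotropic Gaussians with variance sigma2, prototypes of the correct class are attracted with weight P_y(j|x) - P(j|x), and all others are repelled with weight P(j|x). It is a hedged illustration of the base classifier only; the drift-handling extensions of the chapter are not reproduced, and the helper names and hyperparameters are hypothetical.

```python
# Illustrative sketch; function names and defaults are hypothetical.
import math


def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def _log_sum_exp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def rslvq_step(x, y, prototypes, labels, sigma2=1.0, lr=0.1):
    """Return prototypes after one RSLVQ update for the labelled sample (x, y)."""
    # Log of the unnormalised Gaussian responsibilities, -d / (2 sigma^2).
    logs = [-sq_dist(x, w) / (2.0 * sigma2) for w in prototypes]
    lse_all = _log_sum_exp(logs)
    lse_y = _log_sum_exp([l for l, c in zip(logs, labels) if c == y])
    updated = []
    for w, c, l in zip(prototypes, labels, logs):
        p_all = math.exp(l - lse_all)  # P(j | x)
        if c == y:
            factor = math.exp(l - lse_y) - p_all  # P_y(j | x) - P(j | x): attract
        else:
            factor = -p_all  # repel prototypes of other classes
        coef = lr / sigma2 * factor
        updated.append([wi + coef * (xi - wi) for wi, xi in zip(w, x)])
    return updated


def class_posterior(x, cls, prototypes, labels, sigma2=1.0):
    """p(cls | x) under the same Gaussian mixture."""
    logs = [-sq_dist(x, w) / (2.0 * sigma2) for w in prototypes]
    in_cls = [l for l, c in zip(logs, labels) if c == cls]
    return math.exp(_log_sum_exp(in_cls) - _log_sum_exp(logs))


# Usage: one update should raise the posterior of the true class.
protos, labs = [[0.0, 0.0], [1.0, 1.0]], [0, 1]
x = [0.2, 0.0]
before = class_posterior(x, 0, protos, labs)
protos = rslvq_step(x, 0, protos, labs)
after = class_posterior(x, 0, protos, labs)
print(f"p(y=0 | x): {before:.3f} -> {after:.3f}")
```

Working in log space keeps the responsibilities finite even when all distances are large; the sketch assumes every class has at least one prototype.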
Article
Supervised learning employing positive semi-definite kernels has attracted wide interest and led to a variety of successful machine learning approaches. The restriction to positive semi-definite kernels and a Hilbert space is common because it simplifies the mathematical derivations of the respective learning methods, but it is also limiting because more recent...
Article
Transfer learning focuses on the reuse of supervised learning models in a new context. Prominent applications can be found in robotics, image processing, and web mining. In these fields, the learning scenarios naturally change but often remain related to each other, motivating the reuse of existing supervised models. Current transfer learning...
Conference Paper
Today's datasets, especially in streaming contexts, are increasingly non-static and require algorithms that detect and adapt to change. Recent work shows active research in the field but mostly lacks stable performance during model adaptation. In this work, a concept drift detection strategy followed by a prototype-based insertion strategy is proposed....
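Window-based statistical tests are one common way to detect drift. As an illustration only, and not necessarily the detection strategy proposed in this paper, the sketch below compares a reference window with a recent window via the two-sample Kolmogorov-Smirnov statistic and the usual asymptotic critical value; the helper names, the window size, and the significance level alpha are assumptions.

```python
# Illustrative sketch; function names, window size and alpha are hypothetical.
import math


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D = max_t |F_a(t) - F_b(t)|."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        t = min(a[i], b[j])
        # Step both empirical CDFs past every value equal to t.
        while i < n and a[i] == t:
            i += 1
        while j < m and b[j] == t:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def drift_detected(reference, recent, alpha=0.05):
    """Flag drift if D exceeds the asymptotic critical value at level alpha."""
    n, m = len(reference), len(recent)
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2.0))
    return ks_statistic(reference, recent) > c_alpha * math.sqrt((n + m) / (n * m))


# Usage: compare consecutive windows of a (hypothetical) stream with an abrupt shift.
stream = [0.0] * 40 + [1.0] * 40
w = 20
for t in range(2 * w, len(stream) + 1, w):
    print(t, drift_detected(stream[t - 2 * w:t - w], stream[t - w:t]))
```

Once a drift is flagged, a prototype-based learner could react, for example by inserting new prototypes; how that reaction is designed is the subject of the paper and is not sketched here.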
Preprint
Transfer learning focuses on the reuse of supervised learning models in a new context. Prominent applications can be found in robotics, image processing, and web mining. In these areas, learning scenarios change by nature but often remain related, which motivates the reuse of existing supervised models. While the majority of symmetric and asymmetric dom...
Conference Paper
The increasing availability of wireless networks inside buildings has opened up numerous opportunities for new, innovative smart systems. For many of these systems, acquiring context-sensitive information about the people present has become a key challenge. In particular, the position and distribution of attendants significantly influence the sy...
Conference Paper
In an era of smart information systems and smart buildings, detecting, tracking, and identifying the presence of attendants inside enclosed rooms has become a key challenge in smart building research. Therefore, several types of sensing systems have been proposed over the past decade to tackle this challenge. Depending on the...
Article
Indefinite similarity measures are frequently found in bioinformatics in the form of alignment scores, but they are also common in other fields, such as shape measures in image retrieval. Lacking an underlying vector space, the data are given as pairwise similarities only. The few algorithms available for such data do not scale to larger datasets. Focusing...
Conference Paper
Transfer learning focuses on the reuse of supervised learning models in a new context. Prominent applications can be found in robotics, image processing, and web mining. In these fields, the learning scenarios naturally change but often remain related to each other, motivating the reuse of existing supervised models. Current transfer learning...
Conference Paper
Existing algorithms for the detection of stellar structures in the Milky Way are most efficient when full phase-space and color information is available. This is rarely the case. The Gaia satellite has recently begun surveying the whole sky and is providing highly accurate positions for more than one billion sources. In this contribution, we propose two...
Conference Paper
Non-metric proximity measures have attracted wide interest in various domains such as life sciences, robotics, and image processing. The majority of learning algorithms for these data focus on classification problems. Here, we derive a regression algorithm for indefinite data representations based on the support vector machine. The approach avoids heuris...
Article
The recently proposed Krĕin space Support Vector Machine (KSVM) is an efficient classifier for indefinite learning problems, but it has quadratic to cubic complexity and a non-sparse decision function. In this paper, a Krĕin space Core Vector Machine (iCVM) solver is derived. A sparse model with linear runtime complexity can be obtained under a low ra...
Conference Paper
Kernel-based learning is very popular in machine learning, but many classical methods have at least quadratic runtime complexity. Random Fourier features are very effective at approximating shift-invariant kernels by an explicit kernel expansion. This permits the use of efficient linear models with much lower runtime complexity. As one key approach to ke...
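As a concrete instance of the explicit expansion mentioned above, the following sketch builds random Fourier features for the Gaussian kernel exp(-||x - y||^2 / (2 sigma^2)): frequencies are drawn from the kernel's spectral density N(0, sigma^-2 I), phases uniformly from [0, 2 pi), and z(x) = sqrt(2/D) cos(W x + b). It is the textbook construction, offered as an assumption-labelled sketch; the refinements studied in this paper are not reproduced, and the helper names are hypothetical.

```python
# Illustrative sketch of standard random Fourier features; names are hypothetical.
import math
import random


def rff_map(dim, n_features, sigma=1.0, seed=0):
    """Return z(.) such that z(x) . z(y) ~= exp(-||x - y||^2 / (2 sigma^2))."""
    rng = random.Random(seed)
    # Frequencies from the spectral density N(0, sigma^-2 I); phases uniform.
    W = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)] for _ in range(n_features)]
    b = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def z(x):
        return [scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + bj)
                for w, bj in zip(W, b)]

    return z


def rbf(x, y, sigma=1.0):
    return math.exp(-sum((a - c) ** 2 for a, c in zip(x, y)) / (2.0 * sigma ** 2))


# Usage: compare the exact kernel value with its random-feature estimate.
z = rff_map(dim=3, n_features=5000, sigma=1.0)
x, y = [0.1, 0.4, -0.2], [0.3, -0.1, 0.5]
approx = sum(a * c for a, c in zip(z(x), z(y)))
print(f"exact {rbf(x, y):.3f}  approx {approx:.3f}")
```

A linear model trained on z(x) then stands in for the kernel machine; the estimation error shrinks roughly like 1/sqrt(D) in the number of features D.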
Article
Indefinite similarity measures are frequently found in bioinformatics in the form of alignment scores, but they are also common in other fields, such as shape measures in image retrieval. Lacking an underlying vector space, the data are given as pairwise similarities only. The few algorithms available for such data do not scale to larger datasets. Focusing...
Chapter
In supervised learning, feature vectors are often implicitly mapped to a high-dimensional space using the kernel trick, with quadratic costs for the learning algorithm. The recently proposed random Fourier features provide an explicit mapping such that classical algorithms, often with linear complexity, can be applied. Yet, the random Fourier feature a...
Chapter
Sequence data are widely used to gain deeper insight into biological systems. From a data analysis perspective, they are given as a set of symbol sequences of varying length. In general, they are compared using non-metric score functions. In this form, the data are non-standard because they do not provide an immediate metric vector space and the...
Conference Paper
Indefinite similarity measures are frequently found in bioinformatics in the form of alignment scores. Lacking an underlying vector space, the data are given as pairwise similarities only. The Indefinite Kernel Fisher Discriminant (iKFD) is a very effective classifier for this type of data but has cubic complexity and does not scale to larger problems...
Article
Efficient learning of a data analysis task strongly depends on the data representation. Most methods rely on (symmetric) similarity or dissimilarity representations by means of metric inner products or distances, providing easy access to powerful mathematical formalisms like kernel or branch-and-bound approaches. Similarities and dissimilarities ar...
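One standard bridge between the two representations is classical double centering, as used in classical multidimensional scaling, which turns a matrix of squared dissimilarities into a similarity matrix. The sketch below is a generic illustration, not a method from this article; it assumes symmetric squared dissimilarities, and the helper name is hypothetical. For squared Euclidean distances the result is a positive semi-definite Gram matrix; for non-metric inputs it may have negative eigenvalues, that is, be indefinite.

```python
# Illustrative sketch; the function name is hypothetical.
def double_center(D):
    """Return S = -1/2 * J D J with J = I - (1/n) 1 1^T.

    D is assumed to hold symmetric *squared* dissimilarities. For squared
    Euclidean distances S is the Gram matrix of the centred points; for
    non-Euclidean D it may have negative eigenvalues (indefinite).
    """
    n = len(D)
    row = [sum(r) / n for r in D]                                 # row means
    col = [sum(D[i][j] for i in range(n)) / n for j in range(n)]  # column means
    grand = sum(row) / n                                          # overall mean
    return [[-0.5 * (D[i][j] - row[i] - col[j] + grand) for j in range(n)]
            for i in range(n)]


# Usage: squared distances between the points 0, 1 and 2 on the real line.
D = [[0.0, 1.0, 4.0], [1.0, 0.0, 1.0], [4.0, 1.0, 0.0]]
S = double_center(D)  # Gram matrix of the centred points -1, 0, 1
```

The centred points here are -1, 0 and 1, so S should reproduce their pairwise products.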
Article
In supervised learning, probabilistic models are attractive for defining discriminative models in a rigorous mathematical framework. More recently, prototype approaches, known for compact and efficient models, were defined in a probabilistic setting, but they are limited to metric vectorial spaces. Here, we propose a generalization of the discriminative probabi...
Article
Metric learning constitutes a well-investigated field for vectorial data with successful applications, e.g. in computer vision, information retrieval, or bioinformatics. One particularly promising approach is offered by low-rank metric adaptation integrated into modern variants of learning vector quantization (LVQ). This technique is scalable with...
Article
In supervised learning, the parameters of a parametric Euclidean or Mahalanobis distance can be learned effectively by so-called Matrix Relevance Learning. This adaptation is useful not only to improve the discrimination capabilities of the model, but also to identify relevant features or relevant correlated features in the input data. Clas...
Article
Odor classification by a robot equipped with an electronic nose (e-nose) is a challenging pattern recognition task, since volatiles have to be classified quickly and reliably even from short measurement sequences gathered during operation in the field. Signals obtained in these circumstances are characterized by high dimensionality,...
Article
Domain-specific (dis-)similarity or proximity measures, used, e.g., in alignment algorithms for sequence data, are popular for analyzing complex data objects and capturing domain-specific data properties. Without an underlying vector space, these data are given as pairwise (dis-)similarities only. The few available methods for such data focus widely on simi...
Article
Existing semi-supervised learning algorithms focus on vectorial data given in Euclidean space. However, many real-life data are non-metric and given as (dis-)similarities, which have not been widely addressed. We propose a conformal prototype-based classifier for dissimilarity data in semi-supervised tasks. A 'secure region' of unlabeled data is identified to imp...
Article
Neighbor-preserving embedding of relational data in low-dimensional Euclidean spaces is studied. In contrast to variants of stochastic neighbor embedding that minimize divergence measures between estimated neighborhood probability distributions, the proposed approach fits configurations in the output space by maximizing correlation with potentially as...
Article
Since it represents a model in terms of a few typical representatives, prototype-based learning such as learning vector quantization (LVQ) constitutes a directly interpretable machine learning technique. Recently, several LVQ schemes have been extended to kernelized or dissimilarity-based versions, which can be applied if data are represented b...
Conference Paper
Proximity matrices such as kernels or dissimilarity matrices provide non-standard data representations common in the life science domain. Here, we extend fast soft competitive learning to a discriminative, vector-labeled learning algorithm for proximity data. It provides a more stable and consistent integration of label information in the cost funct...
Article
Existing classification algorithms focus on vectorial data given in Euclidean space or on representations by means of positive semi-definite kernel matrices. Many real-world data, such as biological sequences, are not vectorial, are often non-Euclidean, and are given only in the form of (dis-)similarities between examples, calling for efficient and interpretabl...
Article
Prototype-based methods often display very intuitive classification and learning rules. However, popular prototype-based classifiers such as learning vector quantization (LVQ) are restricted to vectorial data only. In this contribution, we discuss techniques for extending LVQ algorithms to more general data characterized by pairwise similarities or...
Chapter
We introduce a generalization of Multivariate Robust Soft Learning Vector Quantization. The approach is a probabilistic classifier and can deal with vectorial class labelings for the training data and the prototypes. It employs t-norms, known from fuzzy learning and fuzzy set theory, in the class label assignments, leading to a more flexible model...
Conference Paper
Due to the increasing number of large data sets, efficient learning algorithms are necessary. An interpretable final model is also desirable in order to draw conclusions from the model results efficiently. Prototype-based learning algorithms have recently been extended to proximity learners to analyze data given in non-standard formats. The supe...
Conference Paper
Domain-specific (dis-)similarity or proximity measures, employed, e.g., in alignment algorithms in bioinformatics, are often used to compare complex data objects and to capture domain-specific data properties. Lacking an underlying vector space, data are given as pairwise (dis-)similarities. The few available methods for such data do not scale well to...
Conference Paper
The amount and complexity of data are increasing rapidly; however, due to time and cost constraints, only a small part of them is fully labeled. In this context, non-vectorial relational data given by pairwise (dis-)similarities without an explicit vectorial representation, such as score values in sequence alignments, are particularly challenging. Existing semi-superv...
Article
Soft competitive learning is an advanced k-means-like clustering approach that overcomes some severe drawbacks of k-means, such as initialization dependence and getting stuck in local minima. It achieves a lower distortion error than k-means and has shown very good performance in clustering complex data sets using various metrics or kernels. While very e...
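Neural gas is one classical soft competitive learning scheme: every prototype is updated by every sample, weighted by its distance rank. The following batch neural gas sketch is a generic illustration under simplifying assumptions (initialization from the first k points, geometric annealing of the neighborhood range lambda from k/2 to a small value); it is not the specific large-scale algorithm studied in this article, and the helper names are hypothetical.

```python
# Illustrative sketch of batch neural gas; names and defaults are hypothetical.
import math


def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def batch_neural_gas(data, k, epochs=50, lam_end=0.01):
    """Batch neural gas with rank-based soft assignments.

    Prototypes start at the first k data points; the neighbourhood range
    lambda is annealed geometrically from k / 2 down to lam_end.
    """
    protos = [list(p) for p in data[:k]]
    lam_start = k / 2.0
    dim = len(data[0])
    for t in range(epochs):
        lam = lam_start * (lam_end / lam_start) ** (t / max(epochs - 1, 1))
        num = [[0.0] * dim for _ in range(k)]
        den = [0.0] * k
        for x in data:
            # Rank prototypes by distance to x; the closest gets rank 0.
            order = sorted(range(k), key=lambda i: sq_dist(x, protos[i]))
            for rank, i in enumerate(order):
                h = math.exp(-rank / lam)  # soft weight, decays with rank
                den[i] += h
                num[i] = [s + h * xi for s, xi in zip(num[i], x)]
        # Each prototype becomes the h-weighted mean of all data points.
        protos = [[s / den[i] for s in num[i]] if den[i] > 0 else protos[i]
                  for i in range(k)]
    return protos


# Usage: two well-separated groups; both prototypes start in the first group.
data = [[0, 0], [1, 0], [0, 1], [1, 1], [10, 10], [11, 10], [10, 11], [11, 11]]
print(sorted(batch_neural_gas(data, k=2)))
```

With a large lambda all prototypes are pulled toward the global data mass, which reduces sensitivity to the initialization; as lambda shrinks, the update approaches a batch k-means step.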
Conference Paper
Current classification algorithms focus on vectorial data given in Euclidean or kernel spaces. Many real-world data, such as biological sequences, are not vectorial and are often non-Euclidean, given by (dis-)similarities only, calling for efficient and interpretable models. Current classifiers for such data require complex transformations and provid...
Conference Paper
In the life sciences, short time series with high-dimensional entries, such as spectrometric data or gene expression profiles taken over time, are becoming more and more popular. Data characteristics rule out classical time series analysis due to the few time points, and they prevent a simple vectorial treatment due to the high dimensionality. In thi...
Article
Recently, diverse high-quality prototype-based clustering techniques have been developed that can directly deal with data sets given by general pairwise dissimilarities rather than standard Euclidean vectors. Examples include affinity propagation, relational neural gas, and relational generative topographic mapping. Corresponding to the size of the...
Article
Prototype-based learning offers an intuitive interface to inspect large quantities of electronic data in supervised or unsupervised settings. Recently, many techniques have been extended to data described by general dissimilarities rather than Euclidean vectors, so-called relational data settings. Unlike their Euclidean counterparts, the techniques h...
Conference Paper
Recently, an extension of popular learning vector quantization (LVQ) to general dissimilarity data has been proposed: relational generalized LVQ (RGLVQ) [10, 9]. The result is an intuitive prototype-based classification scheme that can divide data characterized by pairwise dissimilarities into predefined categories. However, the technique relies on...
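Relational prototype methods of this kind commonly represent a prototype implicitly as a convex combination w = sum_j alpha_j x_j of the data points, so that distances can be computed from the dissimilarity matrix alone via d(x_i, w) = [D alpha]_i - 1/2 alpha^T D alpha. The sketch below evaluates this identity under the assumptions that D holds squared dissimilarities and that alpha sums to one; for non-Euclidean D the formula can still be evaluated, but the values need not be non-negative. The helper name is hypothetical, and the RGLVQ training itself is not shown.

```python
# Illustrative sketch; the function name is hypothetical.
def relational_distances(D, alpha):
    """Distances d(x_i, w) for all i, where w = sum_j alpha_j x_j.

    D holds pairwise squared dissimilarities and alpha sums to one. The
    identity d(x_i, w) = [D alpha]_i - 1/2 alpha^T D alpha needs only D,
    never the vectors x_j themselves.
    """
    n = len(D)
    d_alpha = [sum(D[i][j] * alpha[j] for j in range(n)) for i in range(n)]  # D alpha
    half_quad = 0.5 * sum(a * da for a, da in zip(alpha, d_alpha))        # 1/2 alpha^T D alpha
    return [da - half_quad for da in d_alpha]


# Usage: points 0, 1, 2 on a line; the prototype halfway between points 0 and 2 sits at 1.
D = [[0.0, 1.0, 4.0], [1.0, 0.0, 1.0], [4.0, 1.0, 0.0]]
print(relational_distances(D, [0.5, 0.0, 0.5]))  # [1.0, 0.0, 1.0]
```

Because only D and the coefficient vectors enter the computation, the cost per prototype is quadratic in the number of data points, which is the scaling issue several of the listed works address.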
Conference Paper
We suggest and investigate the use of Generalized Matrix Relevance Learning (GMLVQ) in the context of discriminative visualization. This prototype-based, supervised learning scheme parameterizes an adaptive distance measure in terms of a matrix of relevance factors. By means of a few benchmark problems, we demonstrate that the training process yiel...
Conference Paper
While state-of-the-art classifiers such as support vector machines offer efficient classification for kernel data, they suffer from two drawbacks: the underlying classifier acts as a black box which can hardly be inspected by humans, and non-positive definite Gram matrices require additional preprocessing steps to arrive at a valid kernel. In this...
Article
We present an extension of the recently introduced Generalized Matrix Learning Vector Quantization algorithm. In the original scheme, adaptive square matrices of relevance factors parameterize a discriminative distance measure. We extend the scheme to matrices of limited rank corresponding to low-dimensional representations of the data. This allows...
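In GMLVQ the adaptive distance is usually written as d(x, w) = (x - w)^T Omega^T Omega (x - w). Choosing a rectangular m x n matrix Omega with m < n gives the limited-rank variant, and Omega x then serves as an m-dimensional representation, for example for visualization. The sketch below only evaluates this distance and the GLVQ cost term mu = (d+ - d-) / (d+ + d-); gradient-based training of the prototypes and of Omega is omitted, and the helper names are hypothetical.

```python
# Illustrative sketch; function names are hypothetical.
def project(omega, v):
    """Apply the m x n matrix omega (list of rows) to the n-vector v."""
    return [sum(o * vi for o, vi in zip(row, v)) for row in omega]


def gmlvq_distance(x, w, omega):
    """d(x, w) = (x - w)^T Omega^T Omega (x - w) = ||Omega (x - w)||^2."""
    diff = [xi - wi for xi, wi in zip(x, w)]
    return sum(p * p for p in project(omega, diff))


def glvq_mu(x, y, prototypes, labels, omega):
    """GLVQ term (d+ - d-) / (d+ + d-); negative when the closest prototype has label y."""
    d_plus = min(gmlvq_distance(x, w, omega)
                 for w, c in zip(prototypes, labels) if c == y)
    d_minus = min(gmlvq_distance(x, w, omega)
                  for w, c in zip(prototypes, labels) if c != y)
    return (d_plus - d_minus) / (d_plus + d_minus)


# Usage: a rank-2 Omega maps 3-D data to 2-D coordinates for visualisation.
omega = [[1.0, 0.0, 0.0], [0.0, 0.5, 0.5]]
print(project(omega, [1.0, 2.0, 4.0]))  # [1.0, 3.0]
```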
Conference Paper
Prototype-based models offer an intuitive interface to given data sets by means of an inspection of the model prototypes. Supervised classification can be achieved by popular techniques such as learning vector quantization (LVQ) and extensions derived from cost functions such as generalized LVQ (GLVQ) and robust soft LVQ (RSLVQ). These methods, how...
Conference Paper
Topographic mapping offers an intuitive interface to inspect large quantities of electronic data. Recently, it has been extended to data described by general dissimilarities rather than Euclidean vectors. Unlike its Euclidean counterpart, the technique has quadratic time complexity due to the underlying quadratic dissimilarity matrix. Thus, it is i...
Conference Paper
Clustering approaches are very important methods for analyzing data sets in an initial unsupervised setting. Traditionally, many clustering approaches assume data points to be independent. Here, we present a method that makes use of local dependencies to improve clustering under guaranteed distortions. Such local dependencies are very common for data gener...
Conference Paper
Clustering approaches constitute important methods for unsupervised data analysis. Traditionally, many clustering models focus on spherical or ellipsoidal clusters in Euclidean space. Kernel methods extend these approaches to more complex cluster forms, and they have been recently integrated into several clustering techniques. While leading to very...
Conference Paper
Topographic mapping offers a very flexible tool to inspect large quantities of high-dimensional data in an intuitive way. Often, electronic data are inherently non-Euclidean, and modern data formats are connected to dedicated non-Euclidean dissimilarity measures for which classical topographic mapping cannot be used. We give an overview of extens...
Conference Paper
The increasing size and complexity of modern data sets make data mining techniques indispensable tools for inspecting biomedical data sets. Dedicated data formats and detailed information often create the need for problem-specific similarities or dissimilarities instead of the standard Euclidean norm. Therefore, a number of clus...
Article
This paper introduces a hierarchical model for the description and deconvolution of composite patterns. The patterns are described in a basis system of spectral basis functions. The mixture coefficients for the composite patterns are determined by solving a linear mixture model with nonnegative coefficients. In life science research, wet-lab mixed s...
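A linear mixture model with nonnegative coefficients amounts to nonnegative least squares: minimize ||B a - y||^2 subject to a >= 0, where the columns of B are basis spectra and y is the observed composite pattern. Projected gradient descent is one simple solver for this problem. The sketch below is only that generic solver; the hierarchical model of the paper and its actual solver are not reproduced, and the helper name and toy data are hypothetical.

```python
# Illustrative sketch; the function name and toy data are hypothetical.
def nnls_projected_gradient(B, y, iters=2000):
    """Minimise 1/2 ||B a - y||^2 subject to a >= 0 (projected gradient).

    B is given as a list of rows (n_points x n_basis); its columns are the
    basis spectra. The step 1/||B||_F^2 never exceeds 1/L, where L is the
    largest eigenvalue of B^T B, so the iteration is stable.
    """
    n, k = len(B), len(B[0])
    step = 1.0 / sum(v * v for row in B for v in row)
    a = [0.0] * k
    for _ in range(iters):
        r = [sum(B[i][j] * a[j] for j in range(k)) - y[i] for i in range(n)]  # residual B a - y
        grad = [sum(B[i][j] * r[i] for i in range(n)) for j in range(k)]      # gradient B^T r
        a = [max(0.0, a[j] - step * grad[j]) for j in range(k)]               # project onto a >= 0
    return a


# Usage: three hypothetical basis spectra sampled at five points, mixed as 2*b1 + 1.5*b3.
B = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 1], [0, 1, 1]]
y = [2.0, 0.0, 1.5, 3.5, 1.5]
coeffs = nnls_projected_gradient(B, y)
```

For larger problems, a dedicated active-set NNLS routine converges much faster than this plain first-order iteration.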
Article
In content-based image retrieval (CBIR), relevance feedback has been proven to be a powerful tool for bridging the gap between low-level visual features and high-level semantic concepts. Traditionally, relevance-feedback-driven CBIR is often considered ...
Article
We discuss the use of divergences in dissimilarity-based classification. Divergences can be employed whenever vectorial data consist of non-negative, potentially normalized features. This is the case, for instance, for spectral data or histograms. In particular, we introduce and study divergence-based learning vector quantization (DLVQ). We derive...
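To illustrate how a divergence can take the place of a distance, the sketch below computes the generalized Kullback-Leibler divergence for non-negative vectors and applies a nearest-prototype rule. The choice of this particular divergence, the argument order D(x || w), and the helper names are assumptions; the prototype training of DLVQ is omitted here.

```python
# Illustrative sketch; function names and the divergence choice are hypothetical.
import math


def generalized_kl(x, w, eps=1e-12):
    """Generalised Kullback-Leibler divergence of non-negative vectors:
    D(x || w) = sum_i [x_i log(x_i / w_i) - x_i + w_i], with 0 log 0 = 0.
    """
    total = 0.0
    for xi, wi in zip(x, w):
        wi = max(wi, eps)  # avoid log(x/0) for zero prototype entries
        total += (xi * math.log(xi / wi) if xi > 0 else 0.0) - xi + wi
    return total


def classify(x, prototypes, labels):
    """Nearest-prototype rule with the divergence in place of a distance."""
    best = min(range(len(prototypes)), key=lambda i: generalized_kl(x, prototypes[i]))
    return labels[best]


# Usage: normalised histograms as data, one prototype per class.
prototypes = [[0.6, 0.3, 0.1], [0.1, 0.3, 0.6]]
labels = ["a", "b"]
print(classify([0.7, 0.2, 0.1], prototypes, labels))
```

Unlike a metric, the divergence is not symmetric, so the argument order is a genuine modeling decision.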