# Thomas Villmann

Prof. Dr. · Hochschule Mittweida | HSMW · Department of Mathematics

## About

369 publications · 54,017 reads · 5,307 citations

Additional affiliations: March 2009 - present; January 1997 - February 2009

Education: January 1997 - December 2005; September 1991 - March 1996; September 1985 - August 1990

## Publications

The paper reconsiders multilayer perceptron networks for the case where the Euclidean inner product is replaced by a semi-inner product. This is of interest if the dissimilarity measure between data is given by a general norm, such that the Euclidean inner product is no longer consistent with that situation. We prove mathematically that the un...

This paper introduces concepts for quantum computing based learning vector quantization and prototype learning models. These concepts relate to current quantum computing patterns and quantum hardware. For this purpose, we introduce a new computing pattern for prototype updates and possible measurement strategies in the quantum computing regime. Fur...

In this paper we propose a modification of the Classification-by-Components (CbC) model for classification. In particular, we explicitly integrate Dempster-Shafer theory, which the original approach mentioned as implicitly realized but did not explain in depth. Thus, we redefine the CbC keeping the main aspects of positive and negative rea...

Learning Vector Quantization (LVQ) and its cost-function-based variant called Generalized Learning Vector Quantization (GLVQ) are powerful, yet simple and interpretable Machine Learning (ML) algorithms for multi-class classification. Even though GLVQ is an effective tool for classifying vectorial data in its native form, it is not particularly suit...

The encounter of large amounts of biological sequence data generated during the last decades and the algorithmic and hardware improvements have offered the possibility to apply machine learning techniques in bioinformatics. While the machine learning community is aware of the necessity to rigorously distinguish data transformation from data compari...

Prototype-based models like the Generalized Learning Vector Quantization (GLVQ) belong to the class of interpretable classifiers. Moreover, quantum-inspired methods are increasingly coming into focus in machine learning due to their potential for efficient computing. Further, their interesting mathematical perspectives offer new ideas for alternative learning sc...

We present an approach to discriminate SARS-CoV-2 virus types based on their RNA sequence descriptions avoiding a sequence alignment. For that purpose, sequences are preprocessed by feature extraction and the resulting feature vectors are analyzed by prototype-based classification to remain interpretable. In particular, we propose to use variants o...

We present a method that allows training a Generalized Matrix Learning Vector Quantization (GMLVQ) model for classification using data from several, possibly non-calibrated, sources without explicit transfer learning. This is achieved by using a siamese-like GMLVQ-architecture, which comprises different sets of prototypes for the target classificati...

In the present article we propose the application of variants of the mutual information function as characteristic fingerprints of biomolecular sequences for classification analysis. In particular, we consider the resolved mutual information functions based on Shannon-, Rényi-, and Tsallis-entropy. In combination with interpretable machine learning...

This paper concludes the Brilliant Challenges contest. Participants had to design interesting optimization problems and publish them using the Optil.io platform. It was the first widely-advertised contest in the area of operational research where the objective was to submit the problem definition instead of the algorithmic solutions. Thus, it is a...

Classification in a possibilistic scenario is a kind of multiple class assignment for data. One of the most prominent and interpretable classifiers is learning vector quantization (LVQ), which realizes a nearest prototype classifier model. Figuring out the problem of classifying based on possibilistic or probabilistic class labels (assignments) leads...

The paper demonstrates how to realize neural vector quantizers by means of quantum computing approaches. Particularly, we consider self-organizing maps and the neural gas vector quantizer for unsupervised learning as well as generalized learning vector quantization for classification learning. We show how quantum computing concepts can be adopted f...

Sensor fusion has gained a great deal of attention in recent years. It is used as an application tool in many different fields, especially the semiconductor, automotive, and medical industries. However, this field of research, regardless of the field of application, still presents different challenges concerning the choice of the sensors to be comb...

Motivation: Viruses are the most abundant biological entities and constitute a large reservoir of genetic diversity. In recent years, knowledge about them has increased significantly as a result of dynamic development in life sciences and rapid technological progress. This knowledge is scattered across various data repositories, making a comprehensi...

In this contribution, we consider the classification of time series and similar functional data which can be represented in complex Fourier and wavelet coefficient space. We apply versions of learning vector quantization (LVQ) which are suitable for complex-valued data, based on the so-called Wirtinger calculus. It allows for the formulation of gra...

In the present contribution we investigate the mathematical model of the trade-off between optimum classification and reject option. The model provides a threshold value in dependence of classification, rejection and error costs. The model is extended to the case that the training data are affected by label noise. We consider the respective mathema...

In this contribution the discrimination between native and mirror models of proteins according to their chirality is tackled based on the structural protein information. This information is contained in the Ramachandran plots of the protein models. We provide an approach to classify those plots by means of an interpretable machine learning classifi...

We present an approach to investigate SARS-CoV-2 virus sequences based on alignment-free methods for RNA sequence comparison. In particular, we verify a given clustering result for the GISAID data set, which was obtained analyzing the molecular differences in coronavirus populations by phylogenetic trees. For this purpose, we use alignment-free dis...

Dropout and DropConnect are useful methods to prevent multilayer neural networks from overfitting. In addition, it turns out that these tools can also be used to estimate the stability of networks regarding disturbances. Prototype-based networks are attracting more and more attention in current research because of their inherent interpretability and robust...

Adversarial attacks and the development of (deep) neural networks robust against them are currently two widely researched topics. The robustness of Learning Vector Quantization (LVQ) models against adversarial attacks has however not yet been studied to the same extent. We therefore present an extensive evaluation of three LVQ models: Generalized L...

Neural Gas is a prototype based clustering technique taking the ranking of the prototypes regarding their distance to the data samples into account. Previously, we proposed a fuzzy version of this approach, yet restricted our method to probabilistic cluster assignments. In this paper we extend this method by combining possibilistic and probabilisti...
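
As a rough illustration of the rank-based idea mentioned in this abstract, the following minimal sketch shows one update step of classic (crisp) Neural Gas, not the fuzzy or possibilistic variants the paper develops. The function name `neural_gas_step` and the parameters `lr` (learning rate) and `lam` (neighborhood range) are hypothetical names chosen for this example.

```python
import math


def neural_gas_step(prototypes, x, lr=0.5, lam=1.0):
    """One classic Neural Gas update (illustrative sketch, crisp variant).

    Every prototype moves toward the sample x, weighted by exp(-rank / lam),
    where rank 0 belongs to the prototype closest to x.
    Updates `prototypes` in place and returns the list of ranks.
    """
    # Squared Euclidean distance of each prototype to the sample.
    dists = [sum((xi - wi) ** 2 for xi, wi in zip(x, w)) for w in prototypes]
    # Sort prototype indices by distance to obtain the ranking.
    order = sorted(range(len(prototypes)), key=lambda k: dists[k])
    ranks = [0] * len(prototypes)
    for rank, k in enumerate(order):
        ranks[k] = rank
    # Rank-weighted attraction toward the sample.
    for k, w in enumerate(prototypes):
        h = math.exp(-ranks[k] / lam)
        prototypes[k] = [wi + lr * h * (xi - wi) for xi, wi in zip(x, w)]
    return ranks


protos = [[0.0], [2.0], [5.0]]
ranks = neural_gas_step(protos, [0.5], lr=0.5, lam=1.0)
print(ranks, protos)
```

The closest prototype receives the full step, and the step size decays exponentially with the rank, so distant prototypes barely move.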

An appropriate choice of the activation function plays an important role for the performance of (deep) multilayer perceptrons (MLP) in classification and regression learning. Usually, these activations are applied to all perceptron units in the network. A powerful alternative to MLPs are the prototype-based classification learning methods like (gen...

The most plausible hypothesis for explaining the origins of life on earth is the RNA world hypothesis supported by a growing number of research results from various scientific areas. Frequently, the existence of a hypothetical species on earth is supposed, with a base RNA sequence probably dissimilar from any known genomes today. It is hard to dist...

Introduction: Little is known about peril constellations in primary hemostasis contributing to an acute myocardial infarction (MI) in patients with already manifest atherosclerosis. The study aimed to establish a predicting model based on six biomarkers of primary hemostasis: platelet count, mean platelet volume, hematocrit, soluble glycoprotein V...

This paper investigates the mathematically appropriate treatment of data density estimators in machine learning approaches, if these estimators rely on data dissimilarity density models. We show exemplarily for two well-known machine learning approaches for classification and data visualization that this dependence is apparently analyzing the respe...

Adversarial attacks and the development of (deep) neural networks robust against them are currently two widely researched topics. The robustness of Learning Vector Quantization (LVQ) models against adversarial attacks has however not yet been studied to the same extent. We therefore present an extensive evaluation of three LVQ models: Generalized L...

An appropriate choice of the activation function (like ReLU, sigmoid or swish) plays an important role in the performance of (deep) multilayer perceptrons (MLP) for classification and regression learning. Prototype-based classification learning methods like (generalized) learning vector quantization (GLVQ) are powerful alternatives. These models al...

Background: Machine learning strategies are prominent tools for data analysis. Especially in life sciences, they have become increasingly important to handle the growing datasets collected by the scientific community. Meanwhile, algorithms improve in performance, but also gain complexity, and tend to neglect interpretability and comprehensiveness o...

Neural networks currently dominate the machine learning community and they do so for good reasons. Their accuracy on complex tasks such as image classification is unrivaled at the moment and with recent improvements they are reasonably easy to train. Nevertheless, neural networks are lacking robustness and interpretability. Prototype-based vector q...

This paper proposes a variant of the generalized learning vector quantizer (GLVQ) optimizing explicitly the area under the receiver operating characteristics (ROC) curve for binary classification problems instead of the classification accuracy, which is frequently not appropriate for classifier evaluation. This is particularly important in case of...

In the present contribution we consider sequence learning by means of unsupervised and supervised vector quantization, which should be invariant with respect to shifts in the sequences. A mathematical tool to achieve a respective invariant representation and comparison of sequences is the Hankel matrix with an appropriate dissimilarity measure based on...
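
For readers unfamiliar with Hankel matrices, the short sketch below builds one from a finite sequence: entry (i, j) holds the sequence element at position i + j, so every anti-diagonal is constant. The helper name `hankel_matrix` and its `rows` parameter are assumptions made for this example; the dissimilarity measure used in the paper is not reproduced here.

```python
def hankel_matrix(sequence, rows):
    """Build a Hankel matrix H with H[i][j] = sequence[i + j] (illustrative sketch).

    `rows` fixes the number of rows; the number of columns follows from
    the sequence length as len(sequence) - rows + 1.
    """
    cols = len(sequence) - rows + 1
    if rows < 1 or cols < 1:
        raise ValueError("rows must be between 1 and len(sequence)")
    return [[sequence[i + j] for j in range(cols)] for i in range(rows)]


for row in hankel_matrix([1, 2, 3, 4, 5], rows=3):
    print(row)
```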

In this paper we introduce taxonomies for similarity and dissimilarity measures, respectively, based on their mathematical properties. Further, we propose a definition for rank equivalence of (dis)similarities regarding given data for prototype based methods. Starting with this definition we provide a measure to judge the degree of equivalence, whi...

This contribution presents current methods of applied data analysis for clustering and classification. The automatic classification and clustering of large amounts of data with partly heterogeneous data structures is a central topic in many areas of forensics. Therefore, recent methods of computational intelligence ...

Learning vector quantization (LVQ) is one of the most powerful approaches for prototype based classification of vector data, intuitively introduced by Kohonen. The prototype adaptation scheme relies on its attraction and repulsion during the learning providing an easy geometric interpretability of the learning as well as of the classification decis...
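
The attraction and repulsion scheme described in this abstract can be sketched with the basic LVQ1 rule, which is the simplest heuristic variant and not necessarily the scheme studied in the paper. The names `lvq1_step`, `predict`, and the learning rate `lr` are hypothetical choices for this illustration.

```python
def squared_distance(x, w):
    """Squared Euclidean distance between two equally long vectors."""
    return sum((xi - wi) ** 2 for xi, wi in zip(x, w))


def nearest_prototype(prototypes, x):
    """Index of the prototype closest to x."""
    return min(range(len(prototypes)), key=lambda k: squared_distance(x, prototypes[k]))


def lvq1_step(prototypes, proto_labels, x, y, lr=0.1):
    """One LVQ1 update (illustrative sketch).

    The winning prototype is attracted to x if its label equals y
    and repelled from x otherwise. Updates in place, returns the winner index.
    """
    winner = nearest_prototype(prototypes, x)
    sign = 1.0 if proto_labels[winner] == y else -1.0
    prototypes[winner] = [
        wi + sign * lr * (xi - wi) for xi, wi in zip(x, prototypes[winner])
    ]
    return winner


def predict(prototypes, proto_labels, x):
    """Nearest prototype classification."""
    return proto_labels[nearest_prototype(prototypes, x)]


protos = [[0.0, 0.0], [1.0, 1.0]]
labels = [0, 1]
lvq1_step(protos, labels, [0.2, 0.0], y=0, lr=0.5)  # attraction of prototype 0
print(protos, predict(protos, labels, [0.9, 0.8]))
```

Only the winning prototype changes in each step, which is what makes the learning dynamics easy to interpret geometrically.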

Learning vector quantization models (LVQ) belong to the most successful machine learning classifiers. LVQs are intuitively designed and generally allow an easy interpretation according to the class dependent prototype principle. Originally, LVQs try to optimize the classification accuracy during adaptation, which can be misleading in case of imbala...

Tangent distances (TDs) are important concepts for data manifold distance description in machine learning. In this paper we show that the Hausdorff distance is equivalent to the TD for certain conditions. Hence, we prove the metric properties for TDs. Thereafter, we consider those TDs as dissimilarity measure in learning vector quantization (LVQ) f...

An introduction is given to the use of prototype-based models in supervised machine learning. The main concept of the framework is to represent previously observed data in terms of so-called prototypes, which reflect typical properties of the data. Together with a suitable, discriminative distance or dissimilarity measure, prototypes can be used fo...

Data dissimilarities and similarities are the key ingredients of machine learning. We give a mathematical characterization and classification of those measures based on structural properties also involving psychological-cognitive aspects of similarity determination, and investigate admissible conversions. Finally, we discuss some consequences of th...

In this paper, we propose an extension of the learning vector quantization approach to classify matrix data. Examples for those data are functional data depending on time and frequency. The resulting learning matrix quantization algorithm is similar to the vectorial approach but now based on matrix norms. We favor Schatten-p-norms as the generaliza...

An overview is given of prototype-based models in machine learning. In this framework, observations, i.e., data, are stored in terms of typical representatives. Together with a suitable measure of similarity, the systems can be employed in the context of unsupervised and supervised analysis of potentially high-dimensional, complex datasets. We disc...

This paper addresses the application of gradient-descent-based machine learning methods to complex-valued data. In particular, the focus is on classification using Learning Vector Quantization and extensions thereof. In order to apply gradient-based methods to complex-valued data we use the mathematical formalism of Wirtinger’s calculus to describe t...

Prototype-based classification is mainly influenced by the family of learning vector quantizers (LVQ) as introduced by Kohonen. The main goal is to optimize the classification accuracy while the prototypes explore the class distribution in the data space. Recent variants can deal also with dissimilarity data, i.e. only the dissimilarities between t...

Reject options in classification play a major role whenever the costs of a misclassification are higher than the costs to postpone the decision, prime examples being safety critical systems, medical diagnosis, or models which rely on user interaction and user acceptance. While optimum reject options can be computed analytically in case of a probabi...

This report documents the talks, discussions and outcome of the Dagstuhl seminar 16261 “Integration of Expert Knowledge for Interpretable Models in Biomedical Data Analysis”. The seminar brought together 37 participants from three diverse disciplines, who would normally not have opportunities to meet in such a forum, let alone discuss common intere...

We consider a reject option for prototype-based Learning Vector Quantization (LVQ), which facilitates the detection of outliers in the data during the classification process. The rejection mechanism is based on a distance-based criterion and the corresponding threshold is automatically adjusted in the training phase according to pre-defined rejecti...
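
A distance-based reject option of the kind described here can be sketched as follows. This is a simplified illustration that assumes a fixed, hand-chosen `threshold`; the automatic threshold adjustment during training described in the abstract is not reproduced. The function name `classify_with_reject` is hypothetical.

```python
import math


def classify_with_reject(prototypes, proto_labels, x, threshold):
    """Nearest prototype classification with a distance-based reject option.

    Returns the label of the closest prototype, or None (reject) when even
    the closest prototype is farther away than `threshold`, which flags x
    as a potential outlier. Illustrative sketch with a fixed threshold.
    """
    dists = [math.dist(x, w) for w in prototypes]
    winner = min(range(len(dists)), key=dists.__getitem__)
    if dists[winner] > threshold:
        return None  # reject: sample is too far from all prototypes
    return proto_labels[winner]


protos = [[0.0, 0.0], [4.0, 0.0]]
labels = ["a", "b"]
print(classify_with_reject(protos, labels, [0.5, 0.0], threshold=1.0))
print(classify_with_reject(protos, labels, [2.0, 0.0], threshold=1.0))
```

Returning `None` keeps the rejection decision explicit, so a caller can route rejected samples to a human expert or a fallback model.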

An increasing number of known RNA 3D structures contributes to the recognition of various RNA families and identification of their features. These tasks are based on an analysis of RNA conformations conducted at different levels of detail. On the other hand, the knowledge of native nucleotide conformations is crucial for structure prediction and un...

We used the machine learning strategy Generalized Matrix Learning Vector Quantization (GMLVQ) to separate enzyme classes solely by structural motif representation. As unique characteristic, GMLVQ rates the contribution of each feature to the classification performance. Remarkably, the set of valuable features differs significantly for the investiga...

In this paper we investigate possibilities of relevance learning in learning matrix quantization and discuss their mathematical properties. Learning matrix quantization can be seen as an extension of the learning vector quantization method, which is one of the most popular and intuitive prototype based vector quantization algorithms for classificat...

Exemplar based techniques such as affinity propagation represent data in terms of typical exemplars. This has two benefits: (i) the resulting models are directly interpretable by humans since representative exemplars can be inspected in the same way as data points, (ii) the model can be applied to any dissimilarity measure including non-Euclidean o...

In supervised learning the parameters of a parametric Euclidean distance or Mahalanobis distance can be effectively learned by so-called Matrix Relevance Learning. This adaptation is not only useful to improve the discrimination capabilities of the model, but also to identify relevant features or relevant correlated features in the input data. Clas...

Principal component analysis based on Hebbian learning is originally designed for data processing in Euclidean spaces. We present in this contribution an extension of Oja's Hebbian learning approach for non-Euclidean spaces. We show that for Banach spaces the Hebbian learning can be carried out using the underlying semi-inner product. Prominent exa...

Prototype-based vector quantization is usually performed in the Euclidean data space. In recent years, non-standard metrics have also become popular. For classification by support vector machines, Hilbert space representations, which are based on so-called kernel metrics, seem to be very successful. In this paper we show that gradient based learning i...

Learning vector quantization (LVQ) algorithms as powerful classifier models for class discrimination of vectorial data belong to the family of prototype-based classifiers with a learning scheme based on Hebbian learning as a widely accepted neuronal learning paradigm. Those classifier approaches estimate the class distribution and generate from thi...

Semi-automatic semantic labeling of occupancy grid maps has numerous applications in assistance robotics. This paper proposes an approach based on non-negative matrix factorization (NMF) to extract environment specific features from a given occupancy grid map. NMF also computes a description about where on the map these features need to be applied....

In this work we propose an online approach to compute a more precise assignment between parts of an upper human body model to RGBD image data. For this, a Self-Organizing Map (SOM) will be computed using a set of features where each feature is weighted by a relevance factor (RFSOM). These factors are computed using the generalized matrix learning v...

The basic concepts of distance based classification are introduced in terms of clear-cut example systems. The classical k-Nearest-Neighbor (kNN) classifier serves as the starting point of the discussion. Learning Vector Quantization (LVQ) is introduced, which represents the reference data by a few prototypes. This requires a data driven training pr...

Classification is one of the most frequent tasks in machine learning. However, the variety of classification tasks as well as classifier methods is huge. Thus the question arises: which classifier is suitable for a given problem, or how can we utilize a certain classifier model for different tasks in classification learning? This paper focuses...

The amount of available functional data like time series and hyper-spectra in remote sensing is rapidly growing and requires efficient processing that takes into account the knowledge about this special data characteristic. Usually these data are high-dimensional but with inherent correlations between neighboring vector dimensions reflecting the func...

We investigate the role of redundancy for exploratory learning of inverse functions, where an agent learns to achieve goals by performing actions and observing outcomes. We present an analysis of linear redundancy and investigate goal-directed exploration ...

We introduce a generalization of Multivariate Robust Soft Learning Vector Quantization. The approach is a probabilistic classifier and can deal with vectorial class labelings for the training data and the prototypes. It employs t-norms, known from fuzzy learning and fuzzy set theory, in the class label assignments, leading to a more flexible model...

In this contribution, we focus on reject options for prototype-based classifiers, and we present a comparison of reject options based on statistical models for prototype-based classification as compared to alternatives which are motivated by simple geometric principles. We compare the behavior of generative models such as Gaussian mixture models an...

The general aim in classification learning by supervised training is to achieve a high classification performance, frequently judged in terms of classification accuracy. A powerful method is the generalized learning vector quantizer, which realizes a gradient based optimization scheme based on a cost function approximating the usual symmetric miscl...

Prototype-based models such as learning vector quantization (LVQ) enjoy a wide popularity because they combine excellent classification and generalization ability with an intuitive learning paradigm: models are represented by few characteristic prototypes, the latter often being located at class typical positions in the data space. In this article...