# Evaluation of Machine Learning Techniques by Entropic Means

Goal: To obtain a set of mathematical and visual tools that support Exploratory Data Analysis of, e.g., classification and clustering results. So far we have developed this framework for the confusion matrices of multiclass classifiers and for multivariate data sources. The next step is to extend it to multivariate data processing.

Tools and materials: http://gpm.webs.tsc.uc3m.es/resources/tutorials/ettutorial/

Methods: Normalized Information Transfer (NIT) rate, Entropy-Modified Accuracy (EMA), Source Multivariate Entropy Balance Equation and Source Multivariate Entropy Triangle (SMET), Channel Entropy Balance Equation and Channel Bivariate Entropy Triangle (CBET)


## Project log

We introduce a framework for the evaluation of multiclass classifiers by exploring their confusion matrices. Instead of using error-counting measures of performance, we concentrate on quantifying the information transfer from true to estimated labels using information-theoretic measures. First, the Entropy Triangle allows us to visualize the balance of mutual information, variation of information and the deviation from uniformity in the true and estimated label distributions. Next, the Entropy-Modified Accuracy allows us to rank classifiers by performance, while the Normalized Information Transfer rate allows us to evaluate classifiers by the amount of information accrued during learning. Finally, if the question arises of which errors are systematically committed by the classifier, we use a generalization of Formal Concept Analysis to elicit such knowledge. All these techniques can be applied to either artificially or biologically embodied classifiers, e.g. human performance on perceptual tasks. We instantiate the framework in a number of examples to provide guidelines for the use of these tools when assessing single classifiers or populations of them (whether induced with the same technique or not), either on single tasks or on a set of them. These include UCI tasks and the more complex KDD Cup 99 competition on intrusion detection.
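As a minimal numerical sketch of the bivariate balance equation behind the Entropy Triangle (with a hypothetical 2x2 confusion matrix and Shannon entropies in bits, not data from the paper): the maximal potential entropy splits into the deviation from uniformity, twice the mutual information, and the variation of information.

```python
from math import log2

def H(probs):
    """Shannon entropy in bits, ignoring zero cells."""
    return -sum(p * log2(p) for p in probs if p > 0)

# Joint distribution P(true, predicted) for a hypothetical classifier.
pxy = [[0.40, 0.10],
       [0.05, 0.45]]

px = [sum(row) for row in pxy]              # true-label marginal
py = [sum(col) for col in zip(*pxy)]        # prediction marginal
hxy = H([p for row in pxy for p in row])    # joint entropy H(X, Y)

mi = H(px) + H(py) - hxy                    # mutual information
vi = hxy - mi                               # variation of information H(X|Y) + H(Y|X)
# Deviation from uniformity of both marginals:
dh = (log2(len(px)) - H(px)) + (log2(len(py)) - H(py))

# Balance equation: the maximal potential entropy log|X| + log|Y| splits
# into non-uniformity, twice the transferred information, and the losses.
total = log2(len(px)) + log2(len(py))
assert abs(dh + 2 * mi + vi - total) < 1e-9

# Normalizing by `total` yields the barycentric triangle coordinates.
coords = (dh / total, 2 * mi / total, vi / total)
```

Dividing each term by the maximal potential entropy gives three non-negative numbers summing to one, which is what places a classifier as a point inside the triangle.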
We introduce a variant of the Rényi entropy definition that aligns it with the well-known Hölder mean: in the new formulation, the r-th order Rényi Entropy is the logarithm of the inverse of the r-th order Hölder mean. This brings about new insights into the relationship of the Rényi entropy to quantities close to it, like the information potential and the partition function of statistical mechanics. We also provide expressions that allow us to calculate the Rényi entropies from the Shannon cross-entropy and the escort probabilities. Finally, we discuss why shifting the Rényi entropy is fruitful in some applications.
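The Hölder-mean formulation can be checked numerically. The sketch below (hypothetical distribution, orders chosen for illustration) verifies that the standard order-α Rényi entropy equals minus the logarithm of the weighted Hölder mean of shifted order r = α − 1, with the distribution itself serving as both the values and the weights.

```python
from math import log2

def renyi(p, alpha):
    """Standard Rényi entropy of order alpha (alpha != 1), in bits."""
    return log2(sum(pi ** alpha for pi in p)) / (1 - alpha)

def holder_mean(values, weights, r):
    """Weighted Hölder (power) mean of order r (r != 0)."""
    return sum(w * v ** r for v, w in zip(values, weights)) ** (1 / r)

p = [0.5, 0.25, 0.125, 0.125]
for alpha in (0.5, 2, 3):
    r = alpha - 1   # the "shifted" order of the paper's reformulation
    assert abs(renyi(p, alpha) - (-log2(holder_mean(p, p, r)))) < 1e-9
```

For α = 2 this recovers the familiar fact that the collision entropy is minus the log of the information potential, i.e. of the self-weighted arithmetic mean of the probabilities.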
Jaime Mouvet has just defended his Graduate Thesis on a Python package for using the Entropy Triangles in Python applications. It covers both the Channel Multivariate and Channel Bivariate Entropy Triangles (CMET, CBET), as well as the Source Multivariate Entropy Triangle (SMET).
Check the code at:

We gave a tutorial at IJCNN 2018 in Rio de Janeiro. The slides, further materials, and software are available at http://gpm.webs.tsc.uc3m.es/resources/tutorials/ettutorial/

Data transformation, e.g., feature transformation and selection, is an integral part of any machine learning procedure. In this paper, we introduce an information-theoretic model and tools to assess the quality of data transformations in machine learning tasks. In an unsupervised fashion, we analyze the transformation of a discrete, multivariate source of information $\overline X$ into a discrete, multivariate sink of information $\overline Y$ related by a distribution $P_{\overline X \overline Y}$. The first contribution is a decomposition of the maximal potential entropy of $(\overline X, \overline Y)$, which we call a balance equation, into its (a) non-transferable, (b) transferable but not transferred, and (c) transferred parts. Such balance equations can be represented in (de Finetti) entropy diagrams, our second set of contributions. The most important of these, the aggregate channel multivariate entropy triangle, is a visual exploratory tool to assess the effectiveness of multivariate data transformations in transferring information from input to output variables. We also show how these decompositions and balance equations apply to the entropies of $\overline X$ and $\overline Y$, respectively, and generate entropy triangles for them. As an example, we present the application of these tools to the assessment of information-transfer efficiency for Principal Component Analysis and Independent Component Analysis as unsupervised feature transformation and selection procedures in supervised classification tasks.
In this paper we use information-theoretic measures to provide a theory and tools to analyze the flow of information from a discrete, multivariate source of information $\overline X$ to a discrete, multivariate sink of information $\overline Y$ joined by a distribution $P_{\overline X \overline Y}$. The first contribution is a decomposition of the maximal potential entropy of $(\overline X, \overline Y)$ that we call a balance equation, which can also be split into decompositions for the entropies of $\overline X$ and $\overline Y$, respectively. Such balance equations accept normalizations that allow them to be represented in de Finetti entropy diagrams, our second contribution. The most important of these, the aggregate Channel Multivariate Entropy Triangle (CMET), is an exploratory tool to assess the efficiency of multivariate channels. We also present a practical contribution in applying these balance equations and diagrams to assess the information-transfer efficiency of PCA and ICA as feature transformation and selection procedures in machine learning applications.
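One reading of the source-side decomposition, sketched below for a small hypothetical joint distribution of two correlated binary variables (variable names and the example pmf are illustrative, not from the paper): the maximal potential entropy of the source splits into the divergence from uniformity, the total correlation bound up among the variables, and the residual joint entropy.

```python
from math import log2

def H(probs):
    """Shannon entropy in bits, ignoring zero cells."""
    return -sum(p * log2(p) for p in probs if p > 0)

# Joint pmf of two correlated binary variables X1, X2 (hypothetical).
pxy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

h_joint = H(pxy.values())
h1 = H([pxy[0, 0] + pxy[0, 1], pxy[1, 0] + pxy[1, 1]])  # marginal of X1
h2 = H([pxy[0, 0] + pxy[1, 0], pxy[0, 1] + pxy[1, 1]])  # marginal of X2

u = log2(2) + log2(2)        # maximal potential entropy of the source
dh = u - (h1 + h2)           # divergence from uniformity
c = (h1 + h2) - h_joint      # total correlation among the variables

# Balance: uniform bound = non-uniformity + correlation + residual entropy.
assert abs(dh + c + h_joint - u) < 1e-9
```

Normalizing the three terms by the uniform bound gives coordinates for a de Finetti-style source diagram, analogous to the channel triangle.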
We set out to demonstrate that the Rényi entropies with parameter $\alpha$ are better thought of as operating in a type of non-linear semiring called a positive semifield. We show how Rényi's postulates lead to Pap's g-calculus, where the functions carrying out the domain transformation are Rényi's information function and its inverse. In turn, Pap's g-calculus under Rényi's information function transforms the set of positive reals into a family of semirings where the "standard" product has been transformed into a sum and the "standard" sum into a power-deformed sum. Consequently, the transformed product has an inverse, whence the structure is actually that of a positive semifield. Instances of this construction lead to idempotent analysis and tropical algebra, as well as to less exotic structures. Furthermore, shifting the definition of the $\alpha$ parameter shows in full the intimate relation of the Rényi entropies to the weighted generalized power means. We conjecture that this is one of the reasons why tropical-algebra procedures, like the Viterbi algorithm of dynamic programming, morphological processing, or neural networks, are so successful in computational intelligence applications, but also why there seem to exist so many procedures to deal with "information" at large.
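The idempotent/tropical limit can be illustrated numerically with a hypothetical distribution: as the order grows large, the power mean underlying the Rényi entropy degenerates to max (or min), so the entropy approaches the min-entropy $-\log \max_i p_i$ (or the max-entropy $-\log \min_i p_i$).

```python
from math import log2

def renyi(p, alpha):
    """Standard Rényi entropy of order alpha (alpha != 1), in bits."""
    return log2(sum(pi ** alpha for pi in p)) / (1 - alpha)

p = [0.5, 0.3, 0.2]

# Large positive orders approach the min-entropy -log2(max p): the power
# mean degenerates to max, which is the tropical/idempotent regime.
assert abs(renyi(p, 100) - (-log2(max(p)))) < 0.05

# Large negative orders approach -log2(min p), the max-entropy.
assert abs(renyi(p, -100) - (-log2(min(p)))) < 0.05
```

In the tropical limit only the largest (or smallest) probability survives, which mirrors how max-plus procedures like Viterbi keep only the best-scoring path.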
We assess the behaviour of five different feature extraction methods for an acoustic event classification task, all built on the same underlying SVM technology, by means of two different techniques: accuracy and the entropy triangle. The entropy triangle is able to identify a classifier instance whose relatively high accuracy stems from an attempt to specialize in some classes to the detriment of overall behaviour. In all other cases, i.e. for fair classifiers, accuracy and the entropy triangle agree.
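A toy illustration of this failure mode of accuracy (hypothetical confusion matrices, not the paper's acoustic data): on an imbalanced task, a classifier that specializes in the majority class can beat a genuinely informative classifier on accuracy while transferring no information at all, which is exactly the situation the entropy triangle exposes.

```python
from math import log2

def H(probs):
    """Shannon entropy in bits, ignoring zero cells."""
    return -sum(p * log2(p) for p in probs if p > 0)

def accuracy(pxy):
    """Fraction of probability mass on the diagonal of the joint pmf."""
    return sum(pxy[i][i] for i in range(len(pxy)))

def mutual_info(pxy):
    """Mutual information between true and predicted labels, in bits."""
    px = [sum(row) for row in pxy]
    py = [sum(col) for col in zip(*pxy)]
    return H(px) + H(py) - H([p for row in pxy for p in row])

# Hypothetical imbalanced 2-class task with P(true) = (0.8, 0.2).
# "Specialized" classifier: always predicts the majority class.
spec = [[0.80, 0.00],
        [0.20, 0.00]]
# "Fair" classifier: lower accuracy, but actually transfers information.
fair = [[0.60, 0.20],
        [0.01, 0.19]]

assert accuracy(spec) > accuracy(fair)   # accuracy rewards the cheat
assert mutual_info(spec) == 0.0          # yet no information is transferred
assert mutual_info(fair) > 0.2           # the fair classifier transfers plenty
```

Plotted on the entropy triangle, the specialized classifier sits on the zero-information edge despite its higher accuracy, while the fair classifier moves toward the mutual-information apex.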