Project

Evaluation of Machine Learning Techniques by Entropic Means

Goal: The goal is to obtain a set of mathematical and visual tools to help in the Exploratory Data Analysis of, e.g., classification and clustering results. So far we have developed this for confusion matrices of multiclass classifiers and for multivariate data sources. The next step is to extend the framework to multivariate data processing.

Tools and materials: http://gpm.webs.tsc.uc3m.es/resources/tutorials/ettutorial/

Methods: Normalized Information Transfer (NIT) rate, Entropy-Modified Accuracy (EMA), Source Multivariate Entropy Balance Equation, Channel Entropy Balance Equation, Source Multivariate Entropy Triangle (SMET), Channel Bivariate Entropy Triangle (CBET)

Updates: 3
Recommendations: 0
Followers: 23
Reads: 328

Project log

Francisco J. Valverde-Albacete
added a research item
We introduce a framework for the evaluation of multiclass classifiers by exploring their confusion matrices. Instead of using error-counting measures of performance, we concentrate on quantifying the information transfer from true to estimated labels using information-theoretic measures. First, the Entropy Triangle allows us to visualize the balance of mutual information, variation of information and the deviation from uniformity in the true and estimated label distributions. Next, the Entropy-Modified Accuracy allows us to rank classifiers by performance, while the Normalized Information Transfer rate allows us to evaluate classifiers by the amount of information accrued during learning. Finally, if the question arises of which errors are systematically committed by the classifier, we use a generalization of Formal Concept Analysis to elicit such knowledge. All these techniques can be applied either to artificially or biologically embodied classifiers---e.g. human performance on perceptual tasks. We instantiate the framework in a number of examples to provide guidelines for the use of these tools in the case of assessing single classifiers or populations of them---whether induced with the same technique or not---either on single tasks or on a set of them. These include UCI tasks and the more complex KDD Cup 99 competition on Intrusion Detection.
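The bookkeeping behind these tools is standard Shannon theory. As a minimal sketch (our own illustration, not the project's reference code), the normalized coordinates that place a classifier on the Entropy Triangle can be computed from its confusion matrix as follows:

    import numpy as np

    def entropy(p):
        # Shannon entropy in bits of a probability vector; zero entries are ignored.
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    def entropy_triangle_coordinates(confusion):
        # Map a confusion matrix to normalized (DeltaH, 2*MI, VI) coordinates.
        # The three coordinates sum to 1, so every classifier becomes a point
        # in a 2-simplex: the de Finetti entropy diagram or entropy triangle.
        pxy = confusion / confusion.sum()           # joint distribution of (true, predicted)
        px, py = pxy.sum(axis=1), pxy.sum(axis=0)   # marginal distributions
        hx, hy, hxy = entropy(px), entropy(py), entropy(pxy.ravel())
        mi = hx + hy - hxy                          # mutual information
        vi = hxy - mi                               # variation of information, H(X|Y) + H(Y|X)
        hu = np.log2(len(px)) + np.log2(len(py))    # maximal potential entropy H_Ux + H_Uy
        dh = hu - hx - hy                           # deviation from uniformity
        return np.array([dh, 2 * mi, vi]) / hu      # normalized; sums to 1 by construction

    # Example: a reasonably good 3-class classifier.
    cm = np.array([[90, 5, 5], [4, 88, 8], [3, 7, 90]])
    print(entropy_triangle_coordinates(cm))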
Francisco J. Valverde-Albacete
added a research item
We introduce a variant of the Rényi entropy definition that aligns it with the well-known Hölder mean: in the new formulation, the r-th order Rényi Entropy is the logarithm of the inverse of the r-th order Hölder mean. This brings about new insights into the relationship of the Rényi entropy to quantities close to it, like the information potential and the partition function of statistical mechanics. We also provide expressions that allow us to calculate the Rényi entropies from the Shannon cross-entropy and the escort probabilities. Finally, we discuss why shifting the Rényi entropy is fruitful in some applications.
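The central identity is easy to verify numerically. With the shift $r = \alpha - 1$, the algebra gives $H_\alpha(P) = -\log M_r(P, P)$, where $M_r$ is the weighted Hölder mean; a quick sketch (our own, straight from the definitions):

    import numpy as np

    def renyi_entropy(p, alpha):
        # Classical Rényi entropy of order alpha (alpha != 1), in bits.
        return np.log2(np.sum(p ** alpha)) / (1 - alpha)

    def holder_mean(weights, x, r):
        # Weighted Hölder (power) mean of order r.
        return np.sum(weights * x ** r) ** (1 / r)

    p = np.array([0.5, 0.25, 0.125, 0.125])
    r = 2.0                                    # shifted order, so alpha = r + 1
    # The r-th order (shifted) Rényi entropy equals the logarithm of the
    # inverse of the r-th order Hölder mean of P, self-weighted by P.
    print(renyi_entropy(p, r + 1))             # both lines print the same value
    print(-np.log2(holder_mean(p, p, r)))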
Francisco J. Valverde-Albacete
added an update
Jaime Mouvet has just defended his Grad. Thesis on developing a Python package for using the Entropy Triangles in Python applications. It covers both the Channel Multivariate and Channel Bivariate Entropy Triangles (CMET, CBET) as well as the Source Multivariate Entropy Triangle (SMET).
Check the code at:
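As a purely illustrative sketch of what plotting on an Entropy Triangle involves (hypothetical code, not the package's actual API), the three normalized entropy coordinates of each classifier are projected onto a 2-simplex drawn with matplotlib:

    import numpy as np
    import matplotlib.pyplot as plt

    def simplex_to_xy(coords):
        # Project normalized (dH, 2mi, vi) coordinates, each row summing to 1,
        # onto the triangle with vertices (0,0), (1,0) and (0.5, sqrt(3)/2).
        vertices = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, np.sqrt(3) / 2]])
        return coords @ vertices

    # Hypothetical coordinates for three classifiers (each row sums to 1).
    points = np.array([[0.1, 0.7, 0.2],
                       [0.5, 0.2, 0.3],
                       [0.2, 0.3, 0.5]])

    triangle = np.array([[0, 0], [1, 0], [0.5, np.sqrt(3) / 2], [0, 0]])
    plt.plot(triangle[:, 0], triangle[:, 1], "k-")  # the triangle's sides
    plt.scatter(*simplex_to_xy(points).T)           # one dot per classifier
    plt.axis("equal")
    plt.axis("off")
    plt.show()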
 
Carmen Peláez-Moreno
added an update
We gave a tutorial at IJCNN18 in Rio de Janeiro. The slides, further materials, and software can be found at http://gpm.webs.tsc.uc3m.es/resources/tutorials/ettutorial/
 
Carmen Peláez-Moreno
added a research item
Data transformation, e.g., feature transformation and selection, is an integral part of any machine learning procedure. In this paper, we introduce an information-theoretic model and tools to assess the quality of data transformations in machine learning tasks. In an unsupervised fashion, we analyze the transformation of a discrete, multivariate source of information $\overline X$ into a discrete, multivariate sink of information $\overline Y$ related by a distribution $P_{\overline X \overline Y}$. The first contribution is a decomposition of the maximal potential entropy of $(\overline X, \overline Y)$, which we call a balance equation, into its (a) non-transferable, (b) transferable, but not transferred, and (c) transferred parts. Such balance equations can be represented in (de Finetti) entropy diagrams, our second set of contributions. The most important of these, the aggregate channel multivariate entropy triangle, is a visual exploratory tool to assess the effectiveness of multivariate data transformations in transferring information from input to output variables. We also show how these decomposition and balance equations apply to the entropies of $\overline X$ and $\overline Y$, respectively, and generate entropy triangles for them. As an example, we present the application of these tools to the assessment of information transfer efficiency for Principal Component Analysis and Independent Component Analysis as unsupervised feature transformation and selection procedures in supervised classification tasks.
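Writing the bivariate instance for brevity (the aggregate multivariate version replaces $X$ and $Y$ by the vectors $\overline X$ and $\overline Y$), our reading of the balance equation is

$$H_{U_X \cdot U_Y} = \underbrace{\Delta H_{P_X \cdot P_Y}}_{\text{(a) non-transferable}} + \underbrace{VI_{P_{XY}}}_{\text{(b) transferable, not transferred}} + \underbrace{2\, MI_{P_{XY}}}_{\text{(c) transferred}}$$

with $\Delta H_{P_X \cdot P_Y} = H_{U_X} + H_{U_Y} - H_{P_X} - H_{P_Y}$ and $VI_{P_{XY}} = H_{P_{X|Y}} + H_{P_{Y|X}}$; the three parts are non-negative and sum to the maximal potential entropy.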
Francisco J. Valverde-Albacete
added a research item
In this paper we use information-theoretic measures to provide a theory and tools to analyze the flow of information from a discrete, multivariate source of information $\overline X$ to a discrete, multivariate sink of information $\overline Y$ joined by a distribution $P_{\overline X \overline Y}$. The first contribution is a decomposition of the maximal potential entropy of $(\overline X, \overline Y)$ that we call a balance equation, which can also be split into decompositions for the entropies of $\overline X$ and $\overline Y$, respectively. Such balance equations accept normalizations that allow them to be represented in de Finetti entropy diagrams, our second contribution. The most important of these, the aggregate Channel Multivariate Entropy Triangle (CMET), is an exploratory tool to assess the efficiency of multivariate channels. We also present a practical contribution in the application of these balance equations and diagrams to the assessment of information transfer efficiency for PCA and ICA as feature transformation and selection procedures in machine learning applications.
Francisco J. Valverde-Albacete
added a research item
We set out to demonstrate that the Rényi entropies with parameter $\alpha$ are better thought of as operating in a type of non-linear semiring called a positive semifield. We show how Rényi's postulates lead to Pap's g-calculus, where the functions carrying out the domain transformation are Rényi's information function and its inverse. In turn, Pap's g-calculus under Rényi's information function transforms the set of positive reals into a family of semirings where the "standard" product has been transformed into a sum and the "standard" sum into a power-deformed sum. Consequently, the transformed product has an inverse, whence the structure is actually that of a positive semifield. Instances of this construction lead into idempotent analysis and tropical algebra, as well as to less exotic structures. Furthermore, shifting the definition of the $\alpha$ parameter shows in full the intimate relation of the Rényi entropies to the weighted generalized power means. We conjecture that this is one of the reasons why tropical algebra procedures, like the Viterbi algorithm of dynamic programming, morphological processing, or neural networks, are so successful in computational intelligence applications, but also why there seem to exist so many procedures to deal with "information" at large.
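A small numerical illustration of the power-deformed sum mentioned in the abstract (our own sketch): as the deformation parameter grows, the deformed sum tends to the maximum, recovering the max-plus (tropical) semiring, while large negative parameters recover min-plus.

    import numpy as np

    def deformed_sum(x, y, r):
        # Power-deformed sum (x^r + y^r)^(1/r) on the positive reals; together
        # with the ordinary product it yields a positive semifield.
        return (x ** r + y ** r) ** (1.0 / r)

    x, y = 3.0, 5.0
    for r in (1, 2, 10, 100):
        print(r, deformed_sum(x, y, r))    # approaches max(x, y) = 5 as r grows
    print(deformed_sum(x, y, -100))        # approaches min(x, y) = 3 instead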
Francisco J. Valverde-Albacete
added a research item
We assess the behaviour of 5 different feature extraction methods for an acoustic event classification task - all built on the same underlying SVM technology - by means of two different techniques: accuracy and the entropy triangle. The entropy triangle is able to find a classifier instance whose relatively high accuracy stems from an attempt to specialize in some classes to the detriment of the overall behaviour. In all other cases, with fair classifiers, accuracy and the entropy triangle agree.
Carmen Peláez-Moreno
added 2 research items
We extend a framework for the analysis of classifiers to encompass also the analysis of data sets. Specifically, we generalize a balance equation and a visualization device, the Entropy Triangle, to multivariate distributions, not only bivariate ones. With such tools we analyze a handful of UCI machine learning tasks to start addressing the question of how information gets transformed through machine learning classification tasks.
The introduction of Deep Neural Network (DNN) based acoustic models has produced dramatic improvements in performance. In particular, we have recently found that Deep Maxout Networks, a modification of DNNs' feed-forward architecture that uses a maxout activation function, provide enhanced robustness to environmental noise. In this paper we further investigate how these improvements translate into the different broad phonetic classes and how they compare to classical Hidden Markov Model (HMM) based back-ends. Our experiments demonstrate that performance is still tightly related to the particular phonetic class, with stops and affricates being the least resilient, but also that the relative improvements of both DNN variants are distributed unevenly across those classes, with the type of noise having a significant influence on the distribution. A combination of the different systems, DNN and classical HMM, is also proposed to validate our hypothesis that traditional GMM/HMM systems have a different type of error than Deep Neural Network hybrid models.
Francisco J. Valverde-Albacete
added a research item
We develop two tools to analyze the behavior of multiple-class, or multi-class, classifiers by means of entropic measures on their confusion matrix or contingency table. First we obtain a balance equation on the entropies that captures interesting properties of the classifier. Second, by normalizing this balance equation we obtain a 2-simplex in a three-dimensional entropy space and, from it, the de Finetti entropy diagram or entropy triangle. We also give examples of the assessment of classifiers with these tools.
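Concretely, with the balance equation written as $H_{U_X \cdot U_Y} = \Delta H_{P_X \cdot P_Y} + 2\,MI_{P_{XY}} + VI_{P_{XY}}$, dividing through by the left-hand side gives non-negative normalized terms

$$\Delta H' + 2\,MI' + VI' = 1,$$

so every confusion matrix maps to a point of a 2-simplex, which is then rendered as the de Finetti entropy diagram.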
Francisco J. Valverde-Albacete
added 2 research items
The implementation of the entropy triangle stemming from the 2011 paper, used in several other papers until 2014.
The most widely spread measure of performance, accuracy, suffers from a paradox: predictive models with a given level of accuracy may have greater predictive power than models with higher accuracy. Despite optimizing the classification error rate, high-accuracy models may fail to capture crucial information transfer in the classification task. We present evidence of this behavior by means of a combinatorial analysis where every possible contingency matrix of 2-, 3- and 4-class classifiers is depicted on the entropy triangle, a more reliable information-theoretic tool for classification assessment. Motivated by this, we develop from first principles a measure of classification performance that takes into consideration the information learned by classifiers. We are then able to obtain the entropy-modulated accuracy (EMA), a pessimistic estimate of the expected accuracy with the influence of the input distribution factored out, and the normalized information transfer factor (NIT), a measure of how efficient the transmission of information from the input to the output set of classes is. The EMA is a more natural measure of classification performance than accuracy when the heuristic to maximize is the transfer of information through the classifier instead of the classification error count. The NIT factor measures the effectiveness of the learning process in classifiers and also makes it harder for them to “cheat” using techniques like specialization, while also promoting the interpretability of results. Their use is demonstrated in a mind-reading task competition that aims at decoding the identity of a video stimulus based on magnetoencephalography recordings. We show how the EMA and the NIT factor reject rankings based on accuracy, choosing more meaningful and interpretable classifiers.
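A sketch of how such measures can be computed from a confusion matrix. The closed forms used below, EMA $= 2^{-H_{P_{X|Y}}}$ and NIT factor $= 2^{MI_{P_{XY}}}/k$ for a $k$-class task, are our reading of the abstract (a perplexity-based pessimistic accuracy and a normalized information transfer), so treat them as assumptions rather than the paper's verbatim definitions:

    import numpy as np

    def entropy(p):
        # Shannon entropy in bits of a probability vector; zeros are ignored.
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    def ema_and_nit(confusion):
        # Assumed definitions: EMA = 2**(-H(X|Y)), a pessimistic estimate of
        # accuracy with the input distribution factored out; NIT = 2**MI / k,
        # the fraction of the k-class capacity actually transferred.
        pxy = confusion / confusion.sum()
        px, py = pxy.sum(axis=1), pxy.sum(axis=0)
        hxy = entropy(pxy.ravel())
        mi = entropy(px) + entropy(py) - hxy
        h_x_given_y = hxy - entropy(py)            # remaining uncertainty H(X|Y)
        k = len(px)
        return 2.0 ** (-h_x_given_y), 2.0 ** mi / k

    cm = np.array([[90, 5, 5], [4, 88, 8], [3, 7, 90]])
    print(ema_and_nit(cm))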
Francisco J. Valverde-Albacete
added a research item
We introduce from first principles an analysis of the information content of multivariate distributions as information sources. Specifically, we generalize a balance equation and a visualization device, the Entropy Triangle, to multivariate distributions and find notable differences with similar analyses done on joint distributions as models of information channels. As an example application, we extend a framework for the analysis of classifiers to also encompass the analysis of data sets. With such tools we analyze a handful of UCI machine learning tasks to start addressing the question of how well datasets convey the information they are supposed to capture about the phenomena they stand for.
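One consistent reading of the source-side balance (our own sketch; the paper's exact terminology may differ) splits the maximal entropy $\sum_i \log k_i$ of a multivariate source into a deviation from uniformity, a total correlation capturing what the variables share, and the residual joint entropy:

    import numpy as np
    from collections import Counter

    def entropy_from_counts(counts):
        # Empirical Shannon entropy in bits from a Counter of observations.
        p = np.array(list(counts.values()), dtype=float)
        p /= p.sum()
        return -np.sum(p * np.log2(p))

    def source_balance(samples, cardinalities):
        # Split sum(log2 k_i) into (deviation from uniformity, total
        # correlation, residual joint entropy); the parts sum to the maximum.
        n_vars = samples.shape[1]
        h_marg = [entropy_from_counts(Counter(samples[:, i])) for i in range(n_vars)]
        h_joint = entropy_from_counts(Counter(map(tuple, samples)))
        h_max = sum(np.log2(k) for k in cardinalities)
        delta_h = h_max - sum(h_marg)         # non-uniformity of the marginals
        total_corr = sum(h_marg) - h_joint    # information shared among variables
        return delta_h, total_corr, h_joint

    # Two strongly correlated binary variables plus an independent one.
    rng = np.random.default_rng(0)
    a = rng.integers(0, 2, 1000)
    noisy_copy = a ^ (rng.random(1000) < 0.1)
    data = np.stack([a, noisy_copy, rng.integers(0, 2, 1000)], axis=1)
    print(source_balance(data, [2, 2, 2]))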
Francisco J. Valverde-Albacete
added a project goal
The goal is to obtain a set of mathematical and visual tools to help in the Exploratory Data Analysis of, e.g., classification and clustering results. So far we have developed this for confusion matrices of multiclass classifiers and for multivariate data sources. The next step is to extend the framework to multivariate data processing.