
Abstract

Previous research has shown the potential that Deep Neural Networks have in building representations that are useful not only for the task that the network was trained for but also for correlated tasks that take data from similar input distributions. For instance, recent works showed that representations built by a Convolutional Neural Network (CNN) are better than the state-of-the-art handcrafted features used for object classification, and demonstrated that the representations learned by GoogLeNet could be used for the task of Unsupervised Visual Object Categorization (UVOC), achieving about 75-90% agreement with labels assigned by humans on an unseen dataset when fed as input to a SOM-based clustering method. In this work, we propose an approach that combines SOM with Deep Learning in a synergistic way, allowing it to deal with complex data structures, such as images and sound, by backpropagating the unsupervised error through layers of neurons.
Backpropagating the Unsupervised Error of
Self-Organizing Maps to Deep Neural Networks
Pedro H. M. Braga
CIn, Universidade Federal de Pernambuco
Recife, PE, Brazil, 50.740-560
phmb4@cin.ufpe.br
Heitor R. Medeiros
CIn, Universidade Federal de Pernambuco
Recife, PE, Brazil, 50.740-560
hrm@cin.ufpe.br
Hansenclever F. Bassani
CIn, Universidade Federal de Pernambuco
Recife, PE, Brazil, 50.740-560
hfb@cin.ufpe.br
1 Introduction
Previous research has shown the potential that Deep Neural Networks have in building representations
that are useful not only for the task that the network was trained for but also for correlated tasks that
take data from similar input distributions. For instance, Nanni et al. [9] showed that representations built by a Convolutional Neural Network (CNN) are better than the state-of-the-art handcrafted features used for object classification, and Medeiros et al. [8] demonstrated that the representations learned by GoogLeNet can be used for the task of Unsupervised Visual Object Categorization (UVOC), achieving about 75-90% agreement with labels assigned by humans on an unseen dataset when fed as input to a Self-Organizing Map (SOM)-based clustering method [2].
In this work, we propose an approach that combines SOM with Deep Learning in a synergistic way, allowing it to deal with complex data structures, such as images and sound, by backpropagating the unsupervised error through layers of neurons.
2 Research Problem
A key to the success of supervised learning, especially deep supervised learning, is the availability of sufficiently large labeled training data. Unfortunately, creating a sufficiently large amount of properly labeled data with enough examples for each class is not easy. On the other hand, unlabeled data can usually be obtained easily, since advances in technology have produced datasets of increasing size, not only in terms of the number of samples but also in the number of features.
SOM [6] is a biologically inspired unsupervised learning method that maps data from a higher-dimensional input space to a lower-dimensional output space, while preserving the similarities and the topological relations found between points in the input space. SOM can create abstractions and provide a simplified way of exhibiting information, being widely used in robotics, computer vision, data mining, and natural language processing.
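To make this mapping concrete, the following is a minimal sketch of one classical SOM training step (best-matching-unit search followed by a Gaussian neighborhood update). The function name, array shapes, and hyperparameter values are illustrative assumptions, not taken from the models described here.

import torch

def som_step(weights, grid, x, lr=0.1, sigma=1.0):
    # weights: (n_nodes, dim) prototypes; grid: (n_nodes, 2) map coordinates; x: (dim,) sample.
    # 1. Best-matching unit (BMU): the node whose prototype is closest to x.
    bmu = torch.argmin(torch.sum((weights - x) ** 2, dim=1))
    # 2. Gaussian neighborhood strength of every node, centered on the BMU in map space.
    grid_d2 = torch.sum((grid - grid[bmu]) ** 2, dim=1)
    h = torch.exp(-grid_d2 / (2 * sigma ** 2))
    # 3. Move every prototype toward x, scaled by the learning rate and neighborhood strength.
    return weights + lr * h.unsqueeze(1) * (x - weights)

# Usage on random data: a 5x5 map over 10-dimensional inputs.
side, dim = 5, 10
grid = torch.stack(torch.meshgrid(torch.arange(side), torch.arange(side),
                                  indexing="ij"), dim=-1).reshape(-1, 2).float()
weights = torch.rand(side * side, dim)
for x in torch.rand(100, dim):
    weights = som_step(weights, grid, x)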
State-of-the-art SOM-based models are suitable for clustering high-level features. In [1], a fixed-topology SOM was proposed for subspace clustering by learning the relevance of each input dimension for each cluster. In [2], a time-varying structure version, called LARFDSSOM, was proposed, in which the map grows only when new nodes are required. In [3, 4], this model was extended to take advantage of labeled data when it is available, thus enabling Semi-Supervised Learning.
However, although these models are suitable for high-level features, they cannot deal with more complex data structures, such as images and sound. Therefore, combining SOM with Deep Learning appears to be a promising path to follow, and it is the research problem considered in this work.
3 Motivation
DeepCluster, proposed by Caron et al. [5], provides a framework to train Deep Learning models using K-means clustering to create pseudo-labels and adjust the weights. The work shows that good representations of visual features can be learned. Van den Oord et al. [11] introduced the VQ-VAE, a family of models that combine VAEs with vector quantization to build a discrete latent representation. Their experiments demonstrated that the discrete latent space generated by the model captures important features of the data in a completely unsupervised manner while preserving the neighborhood.
However, current SOM-based methods, such as LARFDSSOM [2], display far superior clustering results compared to K-means, being able to create a map that preserves the neighborhood and to learn the relevance of each input dimension for each cluster. This leads us to believe that the error signal provided by LARFDSSOM could be used to guide a Deep Neural Network to learn a smooth and disentangled representation in its latent space.
4 Technical Contribution
The main contribution of this work is the proposition of a new loss function based on the clustering error of SOM, formulated so that it can be used to backpropagate errors to previous layers while counteracting the tendency of the model to collapse to trivial solutions. This loss function allows learning mappings from a complex representation space to a simpler, disentangled one that SOM can handle, with smooth transitions between cluster points.
Figure 1: The distinct situations that must be handled when a batch is given.
In order to achieve this, a PyTorch implementation of a SOM-based trainable layer was developed
to take advantage of the high levels of parallelism of GPUs. This approach can be viewed as an
extension of traditional shallow SOM models. It also supports semi-supervised learning by splitting
the samples into two different batches. This process is illustrated by Fig. 1.
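The split itself can be pictured as in the sketch below. This is a hypothetical illustration; the tensor names and the convention that unlabeled samples are marked with -1 in the target tensor are assumptions made only for this sketch, not details given in the paper.

import torch

def split_batch(inputs, targets, unlabeled_mark=-1):
    # Split a mini-batch into a labeled part (which may also use the class labels)
    # and an unlabeled part (handled purely by the unsupervised SOM rule).
    # Assumes unlabeled samples are marked with `unlabeled_mark` in `targets`.
    labeled_mask = targets != unlabeled_mark
    labeled = (inputs[labeled_mask], targets[labeled_mask])
    unlabeled = inputs[~labeled_mask]
    return labeled, unlabeled

# Example: a batch of 8 samples where half of the labels are unavailable.
x = torch.randn(8, 16)
y = torch.tensor([3, -1, 0, -1, 7, -1, 1, -1])
(labeled_x, labeled_y), unlabeled_x = split_batch(x, y)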
The loss function is given by:

$$D_\omega(\mathbf{z}, \mathbf{c}_j)^2 = \sum_{i=1}^{m} \omega_{ji}\,(z_i - c_{ji})^2. \quad (1)$$
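Here, presumably following the notation of LARFDSSOM [2], z is the latent vector produced by the network, c_j is the prototype of node j, and ω_ji is the relevance of input dimension i for node j. A hedged PyTorch sketch of how this weighted distance could be computed and backpropagated through an encoder is given below; the variable names and the choice of averaging the winning node's distance over the batch are assumptions made for illustration, not the authors' exact formulation.

import torch

def weighted_distances(z, centers, relevances):
    # Eq. (1): squared distance between each latent vector z (batch, m) and each SOM
    # prototype c_j (n_nodes, m), weighted by the per-dimension relevances omega_ji
    # (n_nodes, m). Returns a (batch, n_nodes) tensor of distances.
    diff2 = (z.unsqueeze(1) - centers.unsqueeze(0)) ** 2   # (batch, n_nodes, m)
    return torch.sum(relevances.unsqueeze(0) * diff2, dim=2)

def som_loss(z, centers, relevances):
    # One possible unsupervised loss: the weighted distance to the winning node,
    # averaged over the batch. Minimizing it pulls the encoder's latent codes toward
    # the prototypes of their best-matching SOM nodes.
    d2 = weighted_distances(z, centers, relevances)
    winners = torch.argmin(d2, dim=1)
    return d2[torch.arange(z.size(0)), winners].mean()

# Example: backpropagating the SOM error through a toy encoder.
encoder = torch.nn.Sequential(torch.nn.Linear(784, 64), torch.nn.ReLU(),
                              torch.nn.Linear(64, 16))
centers = torch.rand(25, 16, requires_grad=True)   # 5x5 map over a 16-dim latent space
relevances = torch.ones(25, 16)                     # per-dimension relevances
x = torch.rand(32, 784)
loss = som_loss(encoder(x), centers, relevances)
loss.backward()                                     # gradients flow into the encoder layers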
4.1 Experimental Results
Preliminary results show that the proposed approach performs well on the image classification benchmark datasets SVHN [10], MNIST [7], and Fashion-MNIST [12], even with very low amounts of labeled data (Fig. 2). However, more detailed experiments are necessary to verify the properties of the learned representations.
Figure 2: Results obtained varying the amount of labeled data from 1% to 100% on the MNIST, Fashion-MNIST, and SVHN datasets.
Acknowledgments
The authors would like to thank the Brazilian National Council for Technological and Scientific
Development (CNPq) and Coordination for the Improvement of Higher Education Personnel (CAPES)
for supporting this research study. Moreover, the authors also gratefully acknowledge the support of
NVIDIA Corporation with the GPU Grant of a Titan V.
References
[1] H. F. Bassani and A. F. Araújo. Dimension selective self-organizing maps for clustering high dimensional data. In The 2012 International Joint Conference on Neural Networks (IJCNN), pages 1–8. IEEE, 2012.
[2] H. F. Bassani and A. F. Araújo. Dimension selective self-organizing maps with time-varying structure for subspace and projected clustering. IEEE Transactions on Neural Networks and Learning Systems, 26(3):458–471, 2014.
[3] P. H. Braga and H. F. Bassani. A semi-supervised self-organizing map for clustering and classification. In 2018 International Joint Conference on Neural Networks (IJCNN), pages 1–8. IEEE, 2018.
[4] P. H. Braga and H. F. Bassani. A semi-supervised self-organizing map with adaptive local thresholds. arXiv preprint arXiv:1907.01086, 2019.
[5] M. Caron, P. Bojanowski, A. Joulin, and M. Douze. Deep clustering for unsupervised learning of visual features. In Proceedings of the European Conference on Computer Vision (ECCV), pages 132–149, 2018.
[6] T. Kohonen. The self-organizing map. Proceedings of the IEEE, 78(9):1464–1480, 1990.
[7] Y. LeCun. The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/, 1998.
[8] H. R. Medeiros, F. D. de Oliveira, H. F. Bassani, and A. F. Araújo. Dynamic topology and relevance learning SOM-based algorithm for image clustering tasks. Computer Vision and Image Understanding, 179:19–30, 2019.
[9] L. Nanni, S. Ghidoni, and S. Brahnam. Handcrafted vs. non-handcrafted features for computer vision classification. Pattern Recognition, 71:158–172, 2017.
[10] Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu, and A. Y. Ng. Reading digits in natural images with unsupervised feature learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning, volume 2011, page 5, 2011.
[11] A. Van den Oord, O. Vinyals, et al. Neural discrete representation learning. In Advances in Neural Information Processing Systems, pages 6306–6315, 2017.
[12] H. Xiao, K. Rasul, and R. Vollgraf. Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747, 2017.