A Self-Organizing Map-based Method for Multi-Label Classification

Gustavo G. Colombini
Department of Computer Science
Federal University of São Carlos
Rodovia Washington Luís, km 235
São Carlos - SP - Brazil
Email: gustavogiordanocolombini@gmail.com

Iuri Bonna M. de Abreu
Department of Computer Science
Federal University of São Carlos
Rodovia Washington Luís, km 235
São Carlos - SP - Brazil
Email: iuri.bonna@gmail.com

Ricardo Cerri
Department of Computer Science
Federal University of São Carlos
Rodovia Washington Luís, km 235
São Carlos - SP - Brazil
Email: cerri@dc.ufscar.br
Abstract—In Machine Learning, multi-label classification is the
task of assigning an instance to two or more categories simulta-
neously. This is a very challenging task, since datasets can have
many instances and become very unbalanced. While most of the
methods in the literature use supervised learning to solve multi-
label problems, in this paper we propose the use of unsupervised
learning through neural networks. More specifically, we explore
the power of Self-Organizing Maps (Kohonen Maps), since they
have a self-organization ability and map input instances onto a map
of neurons. Because instances assigned to similar groups of labels
tend to be more similar to each other, the network tends, after
organization, to map similar training instances to nearby neurons
in the map. Testing instances
can then be mapped to specific neurons in the network, being
classified in the labels assigned to training instances mapped
to these neurons. Our proposal was experimentally compared
to other literature methods, showing competitive performances.
The evaluation was performed using freely available datasets and
measures specifically designed for multi-label problems.
I. INTRODUCTION
In the Machine Learning literature, conventional classification
problems are called single-label. In these problems, a classifier is
trained on a set of instances, each associated with a single class l
from a set of disjoint classes L, where |L| > 1. However, there are
more complex problems in which instances can be classified into
many classes simultaneously. These problems are called multi-label.
In multi-label classification, instances are associated with a set of
classes Y ⊆ L. In the past, multi-label classification was mainly
motivated by tasks such as document classification [1], [2], [3] and
medical diagnosis [4]. Many works can also be found in
Bioinformatics [5], [6], [7], [8], [9] and Image Classification [10],
[11]. In document categorization problems, documents usually belong
to more than one class, e.g., Computer Science and Biology. In
medical diagnosis problems, a patient can be suffering from diabetes
and prostate cancer at the same time. An image can contain mountain
and beach characteristics simultaneously [12]. In addition to these
fields, multi-label classification methods are also being applied to
sentiment classification, since it is possible to extract emotions
from natural language texts, such as micro-blogs [13].
Figure 1 illustrates a comparison between a conventional
classification problem, in which instances can be assigned only
to one class, and a multi-label classification problem. Figure
1(a) illustrates a classification problem where a document
belongs to only one class (“Biology” or “Computer Science”),
but never to both classes simultaneously. Figure 1(b) illustrates
a problem where a document can be assigned simultaneously
to the “Biology” and “Computer Science” classes. In Figure 1(b),
instances inside the highlighted region address both Computer
Science and Biology subjects.
Different strategies are proposed in the literature to deal with
multi-label classification. The existing strategies fall into two
approaches: algorithm-independent and algorithm-dependent.
The algorithm-independent approach uses traditional clas-
sification algorithms, transforming the original multi-label
problem into a set of single-label problems. The algorithm-
dependent approach develops specific algorithms to deal with
the multi-label problem. These algorithms can be based on
conventional classifiers, such as Support Vector Machines [14] and
Decision Trees [15].
While most of the literature methods use supervised learn-
ing to solve multi-label problems, in this paper we propose
the use of unsupervised learning through neural networks.
Because instances classified in similar sets of labels have an
intrinsic relationship, we explore the power of Self-Organizing
Maps (Kohonen Maps) [16], to map similar input instances
to network neurons. Training instances which are similar to
each other are mapped to closer neurons in the map. This
is performed by the adaptation process of the Kohonen Map.
Testing instances can then be mapped to specific neurons in
the network, being classified in the labels assigned to training
instances mapped to these neurons.
The remainder of this paper is organized as follows. Sec-
tion II reviews literature on multi-label classification; our pro-
posed method using Kohonen Maps is presented in Section III,
while Section IV presents the datasets, algorithms and evalua-
tion measures used; our experiments are reported in Section V,
together with their analysis; Finally, Section VI provides our
final considerations and future research directions.
Fig. 1. Example of classification problems: (a) conventional classification (single-label); (b) multi-label classification. Adapted from [11].
II. RELATED WORK
Many studies have been proposed based on both algorithm
dependent and algorithm independent approaches. This section
presents some of them.
A. Algorithm Independent Approach
A very simple method, based on the algorithm independent
approach, uses L classifiers, with L being the number of classes
involved in the problem. Each classifier is then associated with one
class and trained to solve a binary classification problem, in which
its class is considered against all the other classes. This method is
called Binary-Relevance (BR) [17]. A drawback of this method is
that it assumes that the classes assigned to an instance are
independent of each other. This is not always true, and ignoring all
possible correlations between classes may harm the generalization
ability of the classifiers.
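As an illustration only (not part of the original proposal), the Binary-Relevance transformation can be sketched in a few lines of Python; the base learner, here scikit-learn's logistic regression, and all names are arbitrary choices made for the example.

import numpy as np
from sklearn.linear_model import LogisticRegression

def binary_relevance_fit(X, Y):
    # Train one independent binary classifier per label (one class against all the others).
    classifiers = []
    for j in range(Y.shape[1]):
        clf = LogisticRegression(max_iter=1000)
        clf.fit(X, Y[:, j])
        classifiers.append(clf)
    return classifiers

def binary_relevance_predict(classifiers, X):
    # Concatenate the per-label binary decisions into a predicted label matrix.
    return np.column_stack([clf.predict(X) for clf in classifiers])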
Over the years, BR improvements were proposed. In the works of
Cherman et al. [18], Read et al. [19] and Dembczynski et al. [20],
methods based on the Binary-Relevance transformation were proposed.
The idea is to use the instances' classes to complement their
attribute vectors, trying to incorporate the dependencies between
labels in the learning process. In Huang and Zhou [21], instances
were clustered, and similarities calculated within each cluster.
These similarities were used to augment the original feature vectors.
Yu et al. [22] used concepts of neighborhood rough sets. The idea was
to find the possibly related labels for a given instance, excluding
all unrelated ones. Spolaôr et al. [23] used pairwise label
correlations to construct new binary labels that augment the original
feature vectors.
Another example of the algorithm independent approach is the
Label-Powerset (LP) transformation. For each instance, all the
classes assigned to it are combined into a new and unique class. With
this combination, the correlations between classes are considered,
but the number of classes involved in the problem can be considerably
increased, leading some classes to end up with very few positive
instances. This label combination strategy was used in the works of
Tsoumakas et al. [12] and Boutell et al. [10].
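A minimal sketch of the Label-Powerset transformation is given below; the helper name and the tiny example matrix are hypothetical and serve only to illustrate how each distinct label combination becomes one new class.

import numpy as np

def label_powerset_transform(Y):
    # Map each distinct label combination (row of Y) to one new single-label class.
    combinations, y_single = np.unique(Y, axis=0, return_inverse=True)
    return y_single.ravel(), combinations

# Example: four instances, three labels, three distinct labelsets.
Y = np.array([[1, 0, 1],
              [1, 0, 1],
              [0, 1, 0],
              [1, 1, 0]])
y_single, combinations = label_powerset_transform(Y)
# The first two instances share the same labelset, so they receive the same
# single-label class; any conventional classifier trained on y_single then
# predicts a whole label combination at once.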
Still in Tsoumakas et al. [12], a method called Random k-Labelsets
(RAKEL) was proposed, based on the Label-Powerset strategy. This
method iteratively builds a combination of m Label-Powerset
classifiers. Being L the set of labels of the problem, a k-labelset
is a subset Y ⊆ L with k = |Y|. The term L^k represents the set of
all the k-labelsets of L. At every iteration i = 1, ..., m, a
k-labelset Y_i is randomly selected from L^k, without replacement. A
classifier H_i is then trained for Y_i. For the classification of a
new instance, each classifier H_i takes a binary decision for each
label λ_j in the k-labelset Y_i. An average decision is calculated
for each label λ_j in L, and the final decision is positive for a
given label if the average decision is greater than a given threshold
t. The RAKEL method was proposed to take into account the
correlations between the classes and, at the same time, avoid the
disadvantage of the Label-Powerset method, where some classes might
end up with few positive instances.
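The sketch below illustrates this voting scheme under some simplifying assumptions: a decision tree is used as the underlying Label-Powerset learner, nothing prevents the same k-labelset from being drawn twice (unlike the description above), and all names are purely illustrative.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def rakel_fit(X, Y, m=10, k=3, seed=0):
    # Train m Label-Powerset classifiers, each on a randomly drawn k-labelset.
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(m):
        labelset = rng.choice(Y.shape[1], size=k, replace=False)
        combos, y_single = np.unique(Y[:, labelset], axis=0, return_inverse=True)
        clf = DecisionTreeClassifier().fit(X, y_single.ravel())
        models.append((labelset, combos, clf))
    return models

def rakel_predict(models, X, n_labels, t=0.5):
    # Average the binary votes of every labelset classifier and apply threshold t.
    votes = np.zeros((X.shape[0], n_labels))
    counts = np.zeros(n_labels)
    for labelset, combos, clf in models:
        votes[:, labelset] += combos[clf.predict(X)]   # back to binary label vectors
        counts[labelset] += 1
    return (votes / np.maximum(counts, 1) >= t).astype(int)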
B. Algorithm Dependent Approach
A decision tree-based method was proposed by Clare and King [5]. In
this work, the authors modified the C4.5 [15] algorithm to deal with
the classification of proteins according to their functions. The C4.5
algorithm defines the decision tree nodes through a measure called
entropy. The authors modified the entropy formula, originally
elaborated for single-label problems, to allow its use in multi-label
problems. Another modification made by the authors was the use of the
tree leaf nodes to represent sets of labels. A leaf node contains a
set of labels and, when reached, a separate rule is produced for each
class.
In Zhang and Zhou [6], a method based on the kNN algorithm was
proposed, called ML-kNN. In this method, for each instance, the
classes associated with its K closest neighbors are retrieved, and
the number of neighbors associated with each class is counted. The
maximum a posteriori principle is then used to define the set of
classes of a new instance.
Also in Zhang and Zhou [24] a multi-label error mea-
sure was proposed for the training of neural networks with
the Back-propagation algorithm. The measure considers the
multiple classes of the instances in the calculation of the
classification error.
In Schapire and Singer [25], [26], two extensions of the
Adaboost algorithm [27] were proposed, allowing its use in
multi-label problems. In the first one, a modification was
proposed to measure the predictive performance of the induced
model, verifying its ability to predict a correct set of classes.
In the second, a change in the algorithm makes it predict a ranking
of classes for each instance.
Thabtah et al. [28] proposed a multi-label algorithm based on class
association rules. The algorithm was called multi-class multi-label
associative classification (MMAC). Initially, a set of rules is
created, and all instances associated with this set are removed. The
remaining instances are then used to create a new set of rules. This
procedure is repeated until there are no instances left.
A classification algorithm based on maximum entropy was proposed by
Zhu et al. [29] for the task of information retrieval. The authors
used the model to explore correlations between classes in multi-label
documents.
Madjarov et al. [30] published a work in which multiple
classification methods, based on both the algorithm dependent
and independent approaches, were compared. Several evalu-
ation measures were also used in the experiments. The best
performances were obtained by methods which try to consider
the dependencies of the classes during the training process.
III. MULTI-LABEL CLASSIFICATION WITH SELF-ORGANIZING MAPS
In this section, we present our method for multi-label
classification using Kohonen Maps. Knowing the existence
of instance correlations in multi-label datasets, we used the
adaptive power of self-organizing maps to group correlated instances
in the same region of the Kohonen Map.
We call our method Self Organizing Maps Multi-label Learn-
ing (SOM-MLL).
A. Kohonen Maps
Kohonen Maps [16] are self-organizing neural networks
capable of mapping similar instances to neurons next to each
other, belonging to a two-dimensional map of neurons. This
mapping is done through a competition between the map
neurons. The winner neuron is the one whose weight vector
is the closest to the instance vector being mapped.
When an instance is mapped to a neuron, the weights of that neuron's
connections are adjusted so as to strengthen this mapping. The
neurons in a neighborhood around the winner also have their weights
adjusted, so that a region of similar neurons forms around the winner
neuron. Figure 2 illustrates a neural network connected to a training
instance.
With a Kohonen Map, it is possible to analyse the attributes which
led the instances to be mapped to the same region. The mapping is
performed by calculating the Euclidean distance between an instance's
attribute vector and the neurons' weight vectors.
Fig. 2. Mapping an instance to a Kohonen Map.
The smaller the distance between these vectors, the closer the
instance is to a neuron.
Figure 3 illustrates different views of a neuron map obtained
after training a Kohonen Map. Figure 3(a) illustrates, using a
color palette, the number of instances mapped to each neuron.
The black color represents neurons that did not have any instances
mapped to them. Figure 3(b) illustrates the distance between
instances mapped to each neuron.
The idea behind the Kohonen Map is that similar instances
are mapped to the same neighborhood of neurons. Thus,
groups of similar instances are obtained. After obtaining these
groups, the map can be used for classification. For this, we can
map test instances to the already trained map. This process is
explained in the next sections.
B. Mapping Procedure
During the network training, and when mapping a test
instance, the winner neuron is obtained using the Euclidean
distance. This measure is used to calculate the distance be-
tween the attribute vector of an instance (x) and the weight
vector (wj) of a neuron j. This calculation is presented on
Equation 1, where Arepresents the number of attributes of
an instance. The winning neuron is the one closest to the
input instance. In the case of categorical attributes, other
distance measures can be used.
\[ d_j(\mathbf{x}) = \sqrt{\sum_{i=1}^{A} (x_i - w_{ji})^2} \qquad (1) \]
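As an illustrative sketch (assuming the weights are stored as a matrix with one row per neuron), the winner selection of Equation 1 can be written as:

import numpy as np

def winner_neuron(x, W):
    # Equation 1: Euclidean distance of instance x to every neuron weight vector
    # (one row of W per neuron); the winner is the closest one.
    distances = np.sqrt(((W - x) ** 2).sum(axis=1))
    return int(np.argmin(distances))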
After obtaining the winning neuron, its weights should be adjusted to
bring it closer to the instance. Also, a topological neighbourhood
should be defined, and the weights of the neurons in this
neighbourhood are also adjusted, bringing them closer to the winning
neuron. These adjustments are necessary to guarantee that a
neighbourhood of close neurons is created around the winning neuron.
A good choice for the neighbourhood function is the Gaussian
function. This function is presented in Equation 2, where h_{j,i}
represents the neighbourhood around the winning neuron i, formed by
the other excited neurons j. Also, d_{j,i} is the lateral distance
(in the neuron grid) between the winning neuron i and an excited
neuron j. The σ parameter defines how broad the neighbourhood is,
influencing how strongly the excited neighbouring neurons participate
in the learning process.
Fig. 3. Kohonen Maps: (a) number of instances mapped to each neuron; (b) distances between instances mapped to each neuron.
\[ h_{j,i} = \exp\left(-\frac{d_{j,i}^{2}}{2\sigma^{2}}\right) \qquad (2) \]
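An illustrative sketch of Equation 2, assuming the neurons' grid coordinates are available as a matrix, is:

import numpy as np

def gaussian_neighbourhood(grid_coords, winner, sigma):
    # Equation 2: excitation h_{j,i} of every neuron j given the winning neuron i,
    # computed from the lateral (grid) distances d_{j,i}.
    d = np.linalg.norm(grid_coords - grid_coords[winner], axis=1)
    return np.exp(-(d ** 2) / (2 * sigma ** 2))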
For self-organization, the weight vector of a neuron j should be
adjusted according to the input instance x and a learning rate η.
This adjustment is given by Equation 3.
\[ \Delta w_j = \eta\, h_{j,i}\, (\mathbf{x} - w_j) \qquad (3) \]
Given a weight vector w_j at iteration t, the updated weight vector
at iteration t+1 is given by Equation 4.
\[ w_j(t+1) = w_j(t) + \eta\, h_{j,i}\, (\mathbf{x} - w_j(t)) \qquad (4) \]
The training process continues for a given number of
iterations. With repetitive presentations of the instances, the
network tends to converge, and the weights in the map tend
to follow the distribution of the input vectors. Algorithm III.1
shows the SOM-MLL training procedure.
C. Classification Procedure
To classify an instance x_i, the classes of all instances are
represented by binary vectors v_i. In these vectors, the j-th
position corresponds to the j-th class of the problem. If an instance
x_i belongs to class c_j, then the position v_{i,j} receives the
value 1, and 0 otherwise. With that representation, it is possible to
classify a test instance using a prototype vector v. After mapping a
test instance to its closest neuron, the prototype vector is obtained
by averaging the class vectors of the training instances mapped to
this neuron. The formula to obtain the vector v for a neuron n is
presented in Equation 5. In this equation, S_{n,j} is the set of
training instances mapped to neuron n which are classified in class
c_j, and S_n is the full set of instances mapped to neuron n.
\[ v_{n,j} = \frac{|S_{n,j}|}{|S_n|} \qquad (5) \]
From Equation 5, we see that each position v_{n,j} contains the
proportion of instances mapped to neuron n which are classified in
class c_j.
Algorithm III.1: SOM-MLL training procedure
Function: train-SOM-MLL(X, e)
input:  X = [q, (a + l)]: dataset with q instances, a attributes and l labels
        e: number of epochs
output: W = [n, a]: weight matrix with n neurons and a weights

Randomly initialize weight matrix W;
for i ← 1 to e do
    Randomize dataset X;
    for j ← 1 to q do
        // Select winner neuron from neuron grid
        o(x_j) = argmin_k ||x_j - w_k||;
        // Adjust weights of all excited neurons
        w_k(i+1) = w_k(i) + η(i) h_{k,o(x_j)}(i) (x_j - w_k(i));
return W;
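For illustration, a compact, self-contained Python version of Algorithm III.1 is sketched below; it assumes a rectangular grid and linearly decaying learning rate and radius, and it is not the kohonen R implementation actually used in the experiments.

import numpy as np

def train_som_mll(X, grid_shape=(5, 5), epochs=100,
                  eta_start=0.05, eta_end=0.01, sigma_start=3.0, seed=0):
    # Train the SOM on the attribute part of the data (labels are not used here).
    rng = np.random.default_rng(seed)
    rows, cols = grid_shape
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    W = rng.random((rows * cols, X.shape[1]))            # random initial weights

    for epoch in range(epochs):
        frac = epoch / max(epochs - 1, 1)
        eta = eta_start + (eta_end - eta_start) * frac   # linearly decaying learning rate
        sigma = max(sigma_start * (1.0 - frac), 1e-3)    # linearly shrinking radius
        for x in X[rng.permutation(len(X))]:             # randomize the dataset
            winner = np.argmin(((W - x) ** 2).sum(axis=1))           # Equation 1
            d = np.linalg.norm(coords - coords[winner], axis=1)      # lateral distances
            h = np.exp(-(d ** 2) / (2 * sigma ** 2))                 # Equation 2
            W += eta * h[:, None] * (x - W)                          # Equation 4
    return W, coords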
This can be interpreted as the probability of an instance belonging
to class c_j. To obtain a deterministic prediction, a threshold is
used. Thus, if a threshold value equal to 0.5 is used, all the
positions whose values are greater than or equal to 0.5 receive the
value 1, and 0 otherwise. With this, it is possible to compare the
vectors of predicted classes with the vectors of true classes.
Figure 4 illustrates a prototype vector, where a threshold value of
0.5 was used. Algorithm III.2 presents the procedure to classify a
new instance.
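For illustration, the classification step (Equations 1 and 5 plus the threshold) can be sketched as follows, assuming the weight matrix W produced by a training routine such as the one sketched after Algorithm III.1; neurons with no training instances mapped to them simply yield an all-zero prototype in this sketch.

import numpy as np

def classify_som_mll(X_train, Y_train, X_test, W, threshold=0.5):
    # Winner neuron of every training instance (Equation 1).
    train_winners = np.argmin(
        ((X_train[:, None, :] - W[None, :, :]) ** 2).sum(axis=2), axis=1)

    predictions = np.zeros((len(X_test), Y_train.shape[1]))
    for j, x in enumerate(X_test):
        winner = np.argmin(((W - x) ** 2).sum(axis=1))
        mapped = Y_train[train_winners == winner]        # label vectors of S_n
        if len(mapped) > 0:
            predictions[j] = mapped.mean(axis=0)         # prototype vector, Equation 5
    # Deterministic predictions: positions >= threshold become 1, the rest 0.
    return (predictions >= threshold).astype(int), predictions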
IV. MATERIALS AND METHODS
In this Section we present the datasets, algorithms and
evaluation measures used in our experiments.
A. Datasets
All the datasets used in this work are freely available at
http://mulan.sourceforge.net/datasets-mlc.html. We chose seven
representative ones from different application domains: audio,
images, music and biology.
Fig. 4. Predictions for SOM-MLL: (a) prototype vector; (b) final predictions after threshold application. Adapted from Cerri et al. [31].
Algorithm III.2: SOM-MLL classification procedure
Function: classify-SOM-MLL(Xtrain, Xtest, W)
input:  Xtrain = [q, (a + l)]: dataset with q instances, a attributes and l labels
        Xtest = [m, a]: dataset with m instances and a attributes
        W = [n, a]: weight matrix with n neurons and a weights
output: P = [m, l]: prediction matrix with m rows and l columns

for j ← 1 to m do
    // Select winner neuron from neuron grid
    o(x_j^test) = argmin_k ||x_j^test - w_k||;
    // Get training instances mapped to winner neuron
    T ← instances mapped to o(x_j^test);
    // Get prototype vector
    v_j ← average of the label vectors from T;
    // Associate prototype to instance
    x_j^test ← x_j^test + v_j;
    p_j ← v_j;
return P;
The main characteristics of the datasets are shown in Table I.
Each column of Table I gives, respectively, the dataset
identification, application domain, number of instances, number of
nominal and numeric attributes, number of labels, label cardinality,
label density, and number of distinct labelsets. Label Cardinality
(LC) is the average number of labels per instance, while Label
Density (LD) is LC divided by the total number of labels. In
Equations 6 and 7, m gives the total number of instances and q gives
the number of labels.
\[ LC = \frac{1}{m}\sum_{i=1}^{m} |Y_i| \qquad (6) \]
\[ LD = \frac{1}{m}\sum_{i=1}^{m} \frac{|Y_i|}{q} \qquad (7) \]
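Both statistics can be computed directly from the binary label matrix, as in the short illustrative sketch below:

import numpy as np

def label_cardinality(Y):
    # Equation 6: average number of labels per instance (Y is a binary label matrix).
    return Y.sum(axis=1).mean()

def label_density(Y):
    # Equation 7: label cardinality divided by the number of labels q.
    return label_cardinality(Y) / Y.shape[1]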
Analysing Label Cardinality (LC) and Label Density (LD)
is important to understand the behaviour of the classification
algorithms. While LC does not consider the number of labels,
LD does. LC can be used to quantify the number of alternative
labels assigned to an instance. Two datasets can have the same
LC, but different LD, causing the same classifier to behave
differently. The number of distinct labelsets is also important,
strongly influencing methods which operate on subsets of
labels [17].
B. Classification Algorithms
We compared our proposal with several algorithm depen-
dent and independent-based methods from the literature. The
algorithms used are listed below.
Support Vector Machine (SVM) [14], J48 decision tree
induction [15] and k-Nearest Neighbours (kNN) [32]. All
these algorithms were used with the Binary-Relevance and
Label-Powerset transformations;
Back-Propagation Multi-Label Learning (BPMLL) [24],
a neural network algorithm dependent-based method;
Multi-Label k-Nearest Neighbours (MLkNN) [33], a
KNN algorithm dependent-based method.
All the classification algorithms are implemented within
Mulan [34], a Java library for multi-label learning. The default
parameter values were used for all of them. Our proposed method was
implemented using the kohonen R package [35] within the R programming
language. Table II presents the SOM-MLL parameter values.
C. Evaluation Measures
Unlike single-label classification, wherein an instance is
classified either correctly or wrongly, in multi-label classifi-
cation, a classification can be considered partially correct or
partially wrong, requiring specific evaluation measures.
Let H be a multi-label classifier, with Z_i = H(x_i) the set of
labels predicted by H for a given instance x_i; Y_i the set of true
labels, L the total set of labels, and S the set of instances. Two
commonly used evaluation measures are Precision and Recall. They were
used in the work of Godbole and Sarawagi [36], and are presented in
Equations 8 and 9.
\[ Precision(H, S) = \frac{1}{|S|}\sum_{i=1}^{|S|} \frac{|Y_i \cap Z_i|}{|Z_i|} \qquad (8) \]
\[ Recall(H, S) = \frac{1}{|S|}\sum_{i=1}^{|S|} \frac{|Y_i \cap Z_i|}{|Y_i|} \qquad (9) \]
TABLE I
DATASET STATISTICS
Name Domain # Instances # Nominal # Numeric # Labels Cardinality Density # Distinct
cal500 music 502 0 68 174 26.044 0.150 502
birds audio 645 2 258 19 1.014 0.053 133
emotions music 593 0 72 6 1.869 0.311 27
flags image 194 9 10 7 3.392 0.485 54
genbase biology 662 1186 0 27 1.252 0.046 32
scene image 2407 0 294 6 1.074 0.179 15
yeast biology 2417 0 103 14 4.237 0.303 198
TABLE II
SOM-MLL PARAMETER VALUES
Parameter Value
Grid topology Hexagonal
Number of neurons 5 ×5 = 25
Neighbourhood function Gaussian
Learning rate (η) Linearly decreases from 0.05 to 0.01 at each epoch
Neighbourhood radius (σ) Starts at a value that covers 2/3 of all neighbouring neurons and linearly decreases at each epoch until it reaches the negative of that value
As Precision and Recall alone are not adequate for the
evaluation of classifiers, we also used the harmonic mean
of these two measures, called Fmeasure. Its calculation is
presented in Equation 10.
\[ Fmeasure(H, S) = \frac{2 \times Precision \times Recall}{Precision + Recall} \qquad (10) \]
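For illustration, the three example-based measures of Equations 8-10 can be computed from binary integer matrices of true (Y) and predicted (Z) labels as sketched below; instances with empty predicted or true label sets are skipped to avoid division by zero, a detail the original formulas leave unspecified.

import numpy as np

def example_based_scores(Y, Z):
    # Equations 8-10 over binary matrices of true (Y) and predicted (Z) labels.
    intersection = (Y & Z).sum(axis=1).astype(float)
    pred_sizes = Z.sum(axis=1)
    true_sizes = Y.sum(axis=1)
    precision = np.mean(intersection[pred_sizes > 0] / pred_sizes[pred_sizes > 0])
    recall = np.mean(intersection[true_sizes > 0] / true_sizes[true_sizes > 0])
    fmeasure = 2 * precision * recall / (precision + recall)
    return precision, recall, fmeasure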
In order to validate our analysis, all the experiments were
performed using the 10-fold cross validation strategy. To split
the data, we used the iterative stratification strategy proposed
by [37]. In this strategy, the desired number of instances in
each subset is calculated. Then, each instance is examined
iteratively so that the algorithm can select an appropriate sub-
set for distribution. The stratification strategy is implemented
within the utiml R package [38].
V. EXPERIMENTS AND DISCUSSION
Tables III, IV and V show the mean precision, recall and
fmeasure results obtained in our experiments. We refer to our
method as Self Organizing Maps Multi-label Learning (SOM-
MLL). For the algorithm independent-based methods, we refer
to them as Binary-Relevance (BR) or Label-Powerset (LP).
As can be seen considering the average and individual
dataset results, the performance of SOM-MLL can be consid-
ered competitive with the performances of the other literature
methods. Regarding the precision values, SOM-MLL was
able to obtain better results than some algorithm dependent
and independent based methods, especially in datasets cal500,
emotions, flags and yeast. If we look at the recall values,
Table IV shows that our method obtained smaller values than
most of the methods. This may be explained by the use of
only the winner neuron when calculating the prototype vector
used to classify a new instance. The use of a neighbourhood
of neurons can lead to a better coverage.
Considering the f-measure results, SOM-MLL could not
obtain better results than the SVM, KNN and MLkNN algo-
rithms. However, the results obtained can be considered very
promising, especially if we compare our method with J48 and
BPMLL, where we obtained competitive or better results.
We consider the results obtained so far very promising, especially
considering that there is still room for improvement in the
algorithm. As already mentioned, one modification that could improve
the results is related to the number of neurons used to construct the
average label vector of a test instance. Instead of considering only
the training instances mapped to the winning neuron, we can also
consider the training instances mapped to the neighbourhood of the
winning neuron. A threshold can be used to vary the size of this
neighbourhood. Such a modification can considerably improve the
results, especially considering that, in our current version, there
are winning neurons with only one or two training instances mapped to
them, which can lower the recall values obtained.
Another improvement that can be implemented is related to
the number of neurons used to build the grid of neurons for
training. Currently, we are using the default values provided
by the Kohonen R package. However, we could also tune
this value specifically for each dataset. The other parameters,
such as learning rate or size of the neighbourhood for weight
update, could also be tuned for each dataset.
To verify if statistically significant results were obtained,
we applied the Friedman [39] statistical test considering the
fmeasure results. The p-value obtained was 0.035, which
does not provide strong evidence about statistically significant
differences. The Nemenyi post-hoc test was then applied to
identify which pairwise comparisons presented statistically
significant differences. The critical diagram in Figure 5 shows the
results of the Nemenyi test. Methods connected by a line are those
for which no statistically significant differences were detected.
According to Figure 5, the only method which obtained
statistically better results than SOM-MLL was the SVM
with the Label Powerset transformation. We would like to
emphasize, however, that this conclusion is based on the fmeasure
values averaged over all datasets. Also, the Friedman p-value of 0.035
TABLE III
PRECISION RESULTS
Dataset SOM-MLL SVM-BR J48-BR KNN-BR SVM-LP J48-LP KNN-LP BPMLL MLkNN
cal500 0.60 ±0.02 0.62 ±0.07 0.45 ±0.08 0.35 ±0.04 0.34 ±0.05 0.34 ±0.04 0.35 ±0.04 0.35 ±0.04 0.60 ±0.06
birds 0.53 ±0.03 0.70 ±0.12 0.63 ±0.09 0.66 ±0.10 0.70 ±0.12 0.63 ±0.13 0.66 ±0.07 0.45 ±0.11 0.62 ±0.10
emotions 0.63 ±0.07 0.68 ±0.10 0.59 ±0.13 0.63 ±0.11 0.68 ±0.16 0.58 ±0.15 0.63 ±0.11 0.64 ±0.12 0.70 ±0.16
flags 0.68 ±0.05 0.72 ±0.06 0.69 ±0.14 0.68 ±0.16 0.69 ±0.12 0.66 ±0.14 0.68 ±0.16 0.69 ±0.08 0.72 ±0.10
genbase 0.93 ±0.03 0.99 ±0.02 0.99 ±0.03 0.99 ±0.02 0.99 ±0.02 0.99 ±0.03 0.99 ±0.02 0.04 ±0.04 0.98 ±0.05
scene 0.53 ±0.04 0.62 ±0.08 0.56 ±0.06 0.71 ±0.05 0.76 ±0.05 0.60 ±0.07 0.71 ±0.05 0.37 ±0.08 0.70 ±0.07
yeast 0.71 ±0.01 0.72 ±0.06 0.60 ±0.06 0.60 ±0.07 0.66 ±0.05 0.54 ±0.06 0.60 ±0.07 0.62 ±0.05 0.72 ±0.04
Average 0.65 0.72 0.64 0.66 0.68 0.62 0.66 0.45 0.72
TABLE IV
RECALL RESULTS
Dataset SOM-MLL SVM-BR J48-BR KNN-BR SVM-LP J48-LP KNN-LP BPMLL MLkNN
cal500 0.23 ±0.01 0.23 ±0.04 0.29 ±0.07 0.35 ±0.06 0.35 ±0.06 0.34 ±0.05 0.35 ±0.06 0.72 ±0.05 0.22 ±0.05
birds 0.73 ±0.02 0.66 ±0.11 0.61 ±0.10 0.67 ±0.10 0.68 ±0.09 0.62 ±0.09 0.67 ±0.10 0.52 ±0.20 0.56 ±0.09
emotions 0.60 ±0.05 0.66 ±0.11 0.57 ±0.10 0.63 ±0.08 0.71 ±0.09 0.58 ±0.17 0.63 ±0.08 0.73 ±0.11 0.63 ±0.18
flags 0.65 ±0.06 0.76 ±0.16 0.74 ±0.12 0.65 ±0.14 0.68 ±0.10 0.66 ±0.15 0.65 ±0.18 0.76 ±0.12 0.76 ±0.17
genbase 0.92 ±0.03 0.99 ±0.02 0.99 ±0.02 0.99 ±0.02 0.99 ±0.03 0.98 ±0.04 0.99 ±0.02 0.66 ±0.03 0.95 ±0.05
scene 0.51 ±0.04 0.65 ±0.07 0.64 ±0.08 0.70 ±0.05 0.75 ±0.06 0.60 ±0.06 0.70 ±0.05 0.83 ±0.16 0.69 ±0.06
yeast 0.54 ±0.01 0.58 ±0.03 0.58 ±0.07 0.60 ±0.06 0.62 ±0.04 0.54 ±0.07 0.60 ±0.06 0.69 ±0.05 0.59 ±0.07
Average 0.59 0.64 0.63 0.65 0.68 0.61 0.65 0.70 0.62
TABLE V
FMEASURE RESULTS
Dataset SOM-MLL SVM-BR J48-BR KNN-BR SVM-LP J48-LP KNN-LP BPMLL MLkNN
cal500 0.32 ±0.01 0.34 ±0.07 0.34 ±0.07 0.34 ±0.05 0.34 ±0.05 0.33 ±0.05 0.34 ±0.05 0.45 ±0.03 0.32 ±0.06
birds 0.56 ±0.03 0.66 ±0.10 0.61 ±0.09 0.65 ±0.10 0.68 ±0.09 0.61 ±0.09 0.65 ±0.10 0.44 ±0.12 0.58 ±0.09
emotions 0.60 ±0.06 0.60 ±0.11 0.55 ±0.08 0.60 ±0.06 0.67 ±0.12 0.55 ±0.14 0.60 ±0.08 0.66 ±0.10 0.63 ±0.16
flags 0.64 ±0.05 0.73 ±0.11 0.70 ±0.13 0.65 ±0.16 0.67 ±0.09 0.66 ±0.15 0.65 ±0.15 0.70 ±0.10 0.73 ±0.11
genbase 0.92 ±0.04 0.99 ±0.02 0.99 ±0.03 0.99 ±0.02 0.99 ±0.03 0.99 ±0.04 0.99 ±0.02 0.06 ±0.06 0.96 ±0.05
scene 0.52 ±0.04 0.62 ±0.07 0.56 ±0.04 0.70 ±0.05 0.75 ±0.06 0.51 ±0.06 0.70 ±0.05 0.49 ±0.11 0.69 ±0.06
yeast 0.59 ±0.01 0.61 ±0.03 0.56 ±0.06 0.57 ±0.07 0.62 ±0.04 0.51 ±0.06 0.57 ±0.07 0.63 ±0.06 0.62 ±0.05
Average 0.59 0.65 0.61 0.64 0.67 0.59 0.64 0.49 0.64
Fig. 5. Critical Diagram for the Nemenyi post-hoc Statistical Test.
does not provide strong evidence of statistically significant
differences. Considering the individual datasets, SOM-MLL
obtained better precision and recall results than SVM-LP in
some datasets, resulting in very competitive fmeasure values in
some cases. See for example datasets cal500, flags and yeast.
Again, considering a neighbourhood of neurons, better results
can be obtained.
VI. CONCLUSIONS AND FUTURE WORKS
In this paper, we proposed a method called Self Organizing
Maps Multi-label Learning (SOM-MLL), which uses Kohonen
Maps for multi-label classification. The training instances are
mapped to the neurons of a grid, which organizes itself so that
similar instances are mapped to the same region of the grid. The idea
is that instances classified into similar sets of classes end up in
the same region of the grid. To classify a new test instance, it is
mapped to a neuron of the grid, and the classes of the training
instances mapped to this neuron are used to label the test instance.
The experiments showed that SOM-MLL presented compet-
itive and promising results compared to the literature methods
investigated, especially considering there is still a lot of room
for improving the algorithm.
As future work, we plan to extend our method, allowing
a neighbourhood of neurons to be used to classify a new
instance, instead of only the winning neuron. Also, different
neighbourhood sizes and topologies can be tested,
together with different algorithm parameters. More multi-label
classification algorithms and datasets should also be used in
the experimental comparisons.
ACKNOWLEDGMENT
The authors would like to thank CAPES, CNPq and
FAPESP for their financial support, especially grant #2015/14300-1,
São Paulo Research Foundation (FAPESP).
REFERENCES
[1] T. Gonçalves and P. Quaresma, “A preliminary approach to the multilabel
classification problem of portuguese juridical documents,” in EPIA,
2003, pp. 435–444.
[2] B. Lauser and A. Hotho, “Automatic multi-label subject indexing in a
multilingual environment,” in Proc. of the 7th European Conference in
Research and Advanced Technology for Digital Libraries, ECDL 2003,
vol. 2769. Springer, 2003, pp. 140–151.
[3] X. Luo and N. A. Zincir-Heywood, “Evaluation of two systems on multi-
class multi-label document classification,” in International Syposium on
Methodologies for Intelligent Systems, 2005, pp. 161–169.
[4] A. Karalic and V. Pirnat, “Significance level based multiple tree classi-
fication,” in Informatica, vol. 15, no. 5, 1991, p. 12.
[5] A. Clare and R. D. King, “Knowledge discovery in multi-label phe-
notype data,” in 5th European Conference on Principles of Data
Mining and Knowledge Discovery (PKDD2001), ser. LNAI, vol. 2168.
Springer, 2001, pp. 42–53.
[6] M.-L. Zhang and Z.-H. Zhou, “A k-Nearest Neighbor Based Algorithm
for Multi-label Classification,” vol. 2. The IEEE Computational
Intelligence Society, 2005, pp. 718–721 Vol. 2.
[7] A. Elisseeff and J. Weston, “Kernel Methods for Multi-labelled Classi-
fication and Categorical Regression Problems,” in Advances in Neural
Information Processing Systems. MIT Press, 2001, pp. 681–687.
[8] C. Vens, J. Struyf, L. Schietgat, S. Džeroski, and H. Blockeel, “Decision
trees for hierarchical multi-label classification,” Machine Learning,
vol. 73, no. 2, pp. 185–214, 2008.
[9] R. Cerri, R. C. Barros, A. C. P. L. F. de Carvalho, and Y. Jin, “Reduction
strategies for hierarchical multi-label classification in protein function
prediction,” BMC Bioinformatics, vol. 17, no. 1, p. 373, 2016.
[10] M. R. Boutell, J. Luo, X. Shen, and C. M. Brown, “Learning multi-label
scene classification,” Pattern Recognition, vol. 37, no. 9, pp. 1757–1771,
2004.
[11] X. Shen, M. Boutell, J. Luo, and C. Brown, “Multilabel machine learning
and its application to semantic scene classification,” in Society of Photo-
Optical Instrumentation Engineers (SPIE) Conference Series, vol. 5307,
Dec. 2003, pp. 188–199.
[12] G. Tsoumakas and I. Katakis, “Multi Label Classification: An
Overview,” International Journal of Data Warehousing and Mining,
vol. 3, no. 3, pp. 1–13, 2007.
[13] S. M. Liu and J.-H. Chen, “A multi-label classification based approach
for sentiment classification,” Expert Systems with Application, pp. 1083–
1093, 2015.
[14] V. N. Vapnik, The Nature of Statistical Learning Theory (Information
Science and Statistics). Springer-Verlag New York, Inc., 1999.
[15] J. R. Quinlan, C4.5: programs for machine learning. San Francisco,
CA, USA: Morgan Kaufmann Publishers Inc., 1993.
[16] T. K. Kohonen, “The self-organizing map,” Proceedings of the IEEE,
vol. 78, no. 9, pp. 1464–1480, Sept 1990. [Online]. Available:
http://dx.doi.org/10.1109/5.58325
[17] G. Tsoumakas, I. Katakis, and I. P. Vlahavas, “Mining Multi-label
Data,” in Data Mining and Knowledge Discovery Handbook, 2nd ed.,
O. Maimon and L. Rokach, Eds. Springer, 2010, pp. 667–685.
[18] E. A. Cherman, J. Metz, and M. C. Monard, “Incorporating label depen-
dency into the binary relevance framework for multi-label classification,”
Expert Systems with Applications, vol. 39, no. 2, pp. 1647–1655, Feb.
2012.
[19] J. Read, B. Pfahringer, G. Holmes, and E. Frank, “Classifier chains for
multi-label classification,” in Proceedings of the European Conference
on Machine Learning and Knowledge Discovery in Databases: Part II,
ser. ECML PKDD ’09. Berlin, Heidelberg: Springer-Verlag, 2009, pp.
254–269.
[20] K. Dembczynski, W. Cheng, and E. Hüllermeier, “Bayes optimal multi-
label classification via probabilistic classifier chains,” in Proceedings of
the 27th International Conference on Machine Learning, J. Fürnkranz
and T. Joachims, Eds. Omnipress, 2010, pp. 279–286.
[21] S.-J. Huang and Z.-H. Zhou, “Multi-label learning by exploiting label
correlations locally,” in AAAI Conference on Artificial Intelligence, 2012,
pp. 949–955.
[22] Y. Yu, W. Pedrycz, and D. Miao, “Multi-label classification by exploiting
label correlations,” Expert Syst. Appl., vol. 41, no. 6, pp. 2989–3004,
2014.
[23] N. Spolaôr, M. C. Monard, G. Tsoumakas, and H. D. Lee, “A systematic
review of multi-label feature selection and a new method based on label
construction,” Neurocomputing, vol. 180, no. C, pp. 3–15, 2016.
[24] M.-L. Zhang and Z.-H. Zhou, “Multilabel Neural Networks with Ap-
plications to Functional Genomics and Text Categorization,” IEEE
Transactions on Knowledge and Data Engineering, vol. 18, pp. 1338–
1351, 2006.
[25] R. E. Schapire and Y. Singer, “Improved Boosting Algorithms Using
Confidence-rated Predictions,” in Machine Learning, vol. 37. Hingham,
MA, USA: Kluwer Academic Publishers, 1999, pp. 297–336.
[26] ——, “BoosTexter: a boosting-based system for text categorization,” in
Machine Learning, vol. 39. Hingham, MA, USA: Kluwer Academic
Publishers, 2000, pp. 135–168.
[27] Y. Freund and R. E. Schapire, “A decision-theoretic generalization of
on-line learning and an application to boosting,” in European Conference
on Computational Learning Theory, 1995, pp. 23–37.
[28] F. A. Thabtah, P. Cowling, Y. Peng, R. Rastogi, K. Morik, M. Bramer,
and X. Wu, “Mmac: A new multi-class, multi-label associative classi-
fication approach,” in Fourth IEEE International Conference on Data
Mining, 2004, pp. 217–224.
[29] S. Zhu, X. Ji, W. Xu, and Y. Gong, “Multi-labelled classification using
maximum entropy method,” in International conference on research and
development in information retrieval. New York, NY, USA: ACM,
2005, pp. 274–281.
[30] G. Madjarov, D. Kocev, D. Gjorgjevikj, and S. Dzeroski, “An extensive
experimental comparison of methods for multi-label learning,” Pattern
Recognition, vol. 45, no. 9, pp. 3084–3104, 2012.
[31] R. Cerri, G. L. Pappa, A. C. P. Carvalho, and A. A. Freitas, “An extensive
evaluation of decision tree-based hierarchical multilabel classification
methods and performance measures,” Computational Intelligence, 2013,
accepted for publication. [Online]. Available:
http://dx.doi.org/10.1111/coin.12011
[32] D. W. Aha, D. Kibler, and M. K. Albert, “Instance-based learning
algorithms,” Machine Learning, vol. 6, no. 1, pp. 37–66, 1991.
[33] M.-L. Zhang and Z.-H. Zhou, “ML-KNN: A lazy learning approach
to multi-label learning,” Pattern Recognition, vol. 40, pp. 2038–2048,
July 2007. [Online]. Available: http://portal.acm.org/citation.cfm?id=
1234417.1234635
[34] G. Tsoumakas, E. Spyromitros-Xioufis, J. Vilcek, and I. Vlahavas,
“Mulan: A java library for multi-label learning,” Journal of Machine
Learning Research, vol. 12, pp. 2411–2414, 2011.
[35] R. Wehrens and L. Buydens, “Self- and super-organising maps in r:
the kohonen package,” J. Stat. Softw., vol. 21, no. 5, 2007. [Online].
Available: http://www.jstatsoft.org/v21/i05
[36] S. Godbole and S. Sarawagi, “Discriminative methods for multi-labeled
classification,” in 8th Pacific-Asia Conference on Knowledge Discovery
and Data Mining. Springer, 2004, pp. 22–30. [Online]. Available:
http://www.springerlink.com/content/maa4ag38jd3pwrc0
[37] K. Sechidis, G. Tsoumakas, and I. Vlahavas, On the Stratification
of Multi-label Data. Berlin, Heidelberg: Springer Berlin Heidelberg,
2011, pp. 145–158. [Online]. Available: http://dx.doi.org/10.1007/978-3-642-23808-6_10
[38] A. Rivolli, utiml: Utilities for Multi-Label Learning, 2016, r package
version 0.1.0. [Online]. Available: http://CRAN.R-project.org/package=
utiml
[39] J. Demšar, “Statistical Comparisons of Classifiers over Multiple Data
Sets,” Journal of Machine Learning Research, vol. 7, pp. 1–30, 2006.