Journal of Information Security, 2012, 3, 245-250
doi:10.4236/jis.2012.34031 Published Online October 2012 (http://www.SciRP.org/journal/jis)
Unsupervised Multi-Level Non-Negative Matrix
Factorization Model: Binary Data Case
Qingquan Sun1, Peng Wu2, Yeqing Wu1, Mengcheng Guo1, Jiang Lu1
1Department of Electrical and Computer Engineering, The University of Alabama, Tuscaloosa, USA
2School of Information Engineering, Wuhan University of Technology, Wuhan, China
Email: quanqian123@hotmail.com
Received July 21, 2012; revised August 26, 2012; accepted September 7, 2012
ABSTRACT
Rank determination is one of the most significant issues in non-negative matrix factorization (NMF) research. However, the rank determination problem has not received as much emphasis as the sparseness regularization problem, and the rank of the base matrix usually has to be assumed. In this paper, we propose an unsupervised multi-level non-negative matrix factorization model to extract the hidden data structure and seek the rank of the base matrix. From a machine learning point of view, the learning result depends on the prior knowledge supplied. In our unsupervised multi-level model, we construct a three-level data structure for the non-negative matrix factorization algorithm. Such a construction applies more prior knowledge to the algorithm and obtains a better approximation of the real data structure. The final basis selection is achieved through L2-norm optimization. We evaluate our approach on binary datasets. The results demonstrate that our approach is able to retrieve the hidden structure of the data and thus determine the correct rank of the base matrix.
Keywords: Non-Negative Matrix Factorization; Bayesian Model; Rank Determination; Probabilistic Model
1. Introduction
Non-negative matrix factorization (NMF) was proposed
by Lee and Seung [1] in 1999. NMF has become a
widely used technique over the past decade in machine
learning and data mining fields. The most significant properties of NMF are non-negativity, intuitiveness, and parts-based representation. Specific applications of the NMF
algorithm include image recognition [2], audio and acous-
tic signal processing [3], semantic analysis and content
surveillance [4]. In NMF, given a non-negative dataset $\mathbf{V} \in \mathbb{R}_{+}^{M \times N}$, the objective is to find two non-negative factor matrices $\mathbf{W} \in \mathbb{R}_{+}^{M \times K}$ and $\mathbf{H} \in \mathbb{R}_{+}^{K \times N}$. Here W is called the base matrix and H is named the feature matrix. In addition, W and H satisfy

$$\mathbf{V} \approx \mathbf{W}\mathbf{H} \quad \text{s.t.} \quad \mathbf{W} \ge 0,\; \mathbf{H} \ge 0 \qquad (1)$$

K is the rank of the base matrix, and it satisfies the inequality $(M + N)K < MN$, i.e., $K < MN/(M+N)$.
For NMF research, the cost function and initialization problems have been the main issues for researchers; recently, the rank determination problem has also become popular. The rank of the base matrix is indeed an important parameter for evaluating the accuracy of structure extraction. On the one hand, it reflects the real features and properties of the data; on the other hand, more accurate learning helps us understand and analyze the data better, thus improving performance in applications such as recognition [5,6], surveillance, and tracking. The main challenge of the rank determination problem is that the rank is pre-defined: it is hard to know the correct rank of the base matrix before the components are updated. As with the cost function, previous methods add no further priors to the algorithm. That is why the canonical NMF method and traditional probabilistic methods (ML, MAP) cannot handle the rank determination problem. Therefore, in this paper we propose an unsupervised multi-level model to automatically seek the correct rank of the base matrix. Furthermore, we use the L2-norm to show the contribution of the hyper-prior to learning the correct bases. Experimental results on two binary datasets demonstrate that our method is efficient and robust.
The rest of this paper is organized as follows: Section
2 provides a brief review of related work. In Section 3, we describe our unsupervised multi-level NMF model in detail. The experimental results on two binary datasets
are shown in Section 4. Section 5 concludes the paper.
2. Related Work
As mentioned above, the rank determination problem is a newly popular issue in NMF research, and in fact only a few works discuss it. Although the author in [7] proposed a method based on sampler selection, it
needs to pass through all possible values of the rank of the base matrix to choose the best one. Obviously, this approach is not practical for unsupervised learning. In [8], the author proposed a rank determination method based on automatic relevance determination, in which a parameter is associated with the columns of W and an EM algorithm is then used to find a subset of bases; however, this subset does not accurately represent the true bases. In fact, the role of this hyper-parameter is to affect the updating procedure of the base matrix and the feature matrix, and thereby the components' distributions.
The only feasible solution so far has been fully Bayesian models. Such methods have been proposed in [9], where the author presents an EM-based fully Bayesian algorithm to discover the rank of the base matrix. EM-based methods are an approximate solution; a somewhat more accurate alternative is Gibbs sampling, which is used to find the correct rank in [10]. Although such methods are flexible, they require successive calculation of the marginal likelihood for each possible value of the rank K. The drawback is the large computational cost involved: when such methods are applied to real-time applications or to large-scale datasets, the high computational load is impractical. Motivated by this situation, we propose a low-computation, robust multi-level model for NMF to solve the rank determination problem. Our unsupervised model with multi-level priors calculates the rank of the base matrix only once and is able to find the correct rank given a large enough initial rank K. Therefore, our method involves less computation. This will be discussed in detail in the next section.
3. Unsupervised Multi-Level Non-Negative
Matrix Factorization Model
In our unsupervised multi-level NMF model, we intro-
duce a hyper-prior level. Hence, there are three levels in
our model: the data model, the prior model, and the hyper-prior model.
The model structure is shown in Figure 1.

Figure 1. Unsupervised multi-level non-negative matrix factorization model.

We seek the solutions by optimizing the maximum a posteriori (MAP) criterion. Our approach can be described by the following equation, where $\stackrel{c}{=}$ denotes equality up to an additive constant and $\boldsymbol{\lambda}$ is the prior on both W and H:

$$\mathrm{MAP}(\mathbf{W},\mathbf{H},\boldsymbol{\lambda}) \stackrel{c}{=} \log p(\mathbf{V} \mid \mathbf{W}\mathbf{H}) + \log p(\mathbf{W} \mid \boldsymbol{\lambda}) + \log p(\mathbf{H} \mid \boldsymbol{\lambda}) + \log p(\boldsymbol{\lambda}) \qquad (2)$$
The difference between our approach and the traditional MAP criterion is that the traditional one adds no hyper-prior to the model. Moreover, in our model we update the hyper-priors recursively rather than fixing them as constants.
3.1. Model Construction
In the NMF algorithm, the updating rules are based on the specific data model. Therefore, the first step is to set a data model for our problem. Here, in our experiment, we assume that the data follows a Poisson distribution; consequently, the cost function of our model will be the generalized KL-divergence. Given a variable $x$ that follows a Poisson distribution with parameter $\lambda$, we have $p(x \mid \lambda) = \lambda^{x} e^{-\lambda} / \Gamma(x+1)$. Thus, in the NMF algorithm, given the dataset V, we have the likelihood

$$p(\mathbf{V} \mid \mathbf{W}\mathbf{H}) = \prod_{m,n} \frac{(\mathbf{W}\mathbf{H})_{mn}^{\,v_{mn}}\, e^{-(\mathbf{W}\mathbf{H})_{mn}}}{\Gamma(v_{mn}+1)} \qquad (3)$$
The generalized KL-divergence is given by:

$$D_{KL}(\mathbf{V} \,\|\, \mathbf{W}\mathbf{H}) = \sum_{m,n} \left( v_{mn} \log \frac{v_{mn}}{(\mathbf{W}\mathbf{H})_{mn}} - v_{mn} + (\mathbf{W}\mathbf{H})_{mn} \right) \qquad (4)$$
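As a sketch, Eq. (4) translates directly into NumPy; the small `eps` guard against log(0) and division by zero is our own addition for binary data, not part of the paper's formulation:

```python
import numpy as np

def generalized_kl(V, W, H, eps=1e-12):
    """Generalized KL-divergence D(V || WH) from Eq. (4)."""
    WH = W @ H
    # eps guards the log and the division when entries of V or WH are zero
    return np.sum(V * np.log((V + eps) / (WH + eps)) - V + WH)
```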
Thus, the log-likelihood of the dataset V can be rewritten as:

$$\log p(\mathbf{V} \mid \mathbf{W}\mathbf{H}) = -D_{KL}(\mathbf{V} \,\|\, \mathbf{W}\mathbf{H}) + \sum_{m,n} \left( v_{mn} \log v_{mn} - v_{mn} - \log \Gamma(v_{mn}+1) \right) \qquad (5)$$
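As a quick numerical check of this identity (our own, not from the paper), the Poisson log-likelihood of synthetic count data matches the right-hand side of (5):

```python
import numpy as np
from scipy import stats
from scipy.special import gammaln

rng = np.random.default_rng(1)
V = rng.poisson(2.0, size=(6, 7)).astype(float)   # synthetic count data
W = rng.random((6, 3)) + 0.1
H = rng.random((3, 7)) + 0.1
WH = W @ H

eps = 1e-12
D = np.sum(V * np.log((V + eps) / WH) - V + WH)              # Eq. (4)
const = np.sum(V * np.log(V + eps) - V - gammaln(V + 1.0))   # data-only terms
assert np.isclose(stats.poisson.logpmf(V, WH).sum(), -D + const)  # Eq. (5)
```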
From (2) and (5) we can conclude that maximizing the posterior is equivalent to maximizing the log-likelihood, and maximizing the log-likelihood is equivalent to minimizing the KL-divergence; hence maximizing the posterior is equivalent to minimizing the KL-divergence. Therefore, it is possible to find a base matrix W and a feature matrix H that approximate the dataset V via the maximum a posteriori criterion.
In the data model $p(\mathbf{V} \mid \mathbf{W}\mathbf{H})$ we regard $\mathbf{W}\mathbf{H}$ as the parameter of the data V. With respect to the base matrix W and the feature matrix H, we also introduce a parameter $\boldsymbol{\lambda}$ as a prior on them. Moreover, we define an independent exponential distribution, with prior parameter $\lambda_k$, for each column of W and each row of H, because the exponential distribution has a sharper peak. Other exponential-family distributions, such as the Gaussian or Gamma distribution, could of course be chosen instead. The columns of W and the rows of H therefore obey:

$$p(w_{mk} \mid \lambda_k) = \lambda_k e^{-\lambda_k w_{mk}} \qquad (6)$$

$$p(h_{kn} \mid \lambda_k) = \lambda_k e^{-\lambda_k h_{kn}} \qquad (7)$$
Then the log-likelihoods of the priors can be rewritten as:

$$\log p(\mathbf{W} \mid \boldsymbol{\lambda}) = \sum_{m,k} \left( \log \lambda_k - \lambda_k w_{mk} \right) \qquad (8)$$

$$\log p(\mathbf{H} \mid \boldsymbol{\lambda}) = \sum_{k,n} \left( \log \lambda_k - \lambda_k h_{kn} \right) \qquad (9)$$
Compared with setting $\boldsymbol{\lambda}$ to a constant, the diversity of the $\lambda_k$ and their recursive updating enable the inference procedure to converge to a stationary point. By calculating the L2-norm of each column of the base matrix W, we discover that the columns finally separate into two clusters: one cluster contains the columns whose L2-norms are much larger than 0, whereas in the other cluster the L2-norm values are 0 or almost 0.
In order to find the best value for $\lambda_k$, we introduce a hyper-prior for $\lambda_k$. Since $\lambda_k$ is the parameter of an exponential distribution, we define $\lambda_k$ to follow a Gamma distribution, which is the conjugate prior of the exponential distribution:

$$p(\lambda_k \mid a_k, b_k) = \frac{b_k^{a_k}}{\Gamma(a_k)}\, \lambda_k^{a_k - 1} \exp(-b_k \lambda_k) \qquad (10)$$
Here $a_k$ and $b_k$ are the hyper-priors of $\lambda_k$. Thus, the log-likelihood of $\lambda_k$ is given as:

$$\log p(\lambda_k \mid a_k, b_k) = a_k \log b_k - \log \Gamma(a_k) + (a_k - 1)\log \lambda_k - b_k \lambda_k \qquad (11)$$
3.2. Inference
After the establishment of the data model and the derivation of the log-likelihood of each prior, we can obtain the maximum a posteriori objective:

$$\begin{aligned} \mathrm{MAP}(\mathbf{W},\mathbf{H},\boldsymbol{\lambda}) = {}& \sum_{m,n} \left( v_{mn}\log v_{mn} - v_{mn} - \log\Gamma(v_{mn}+1) \right) - D_{KL}(\mathbf{V}\,\|\,\mathbf{W}\mathbf{H}) \\ &+ \sum_{m,k} \left( \log\lambda_k - \lambda_k w_{mk} \right) + \sum_{k,n} \left( \log\lambda_k - \lambda_k h_{kn} \right) \\ &+ \sum_{k} \left( a_k \log b_k - \log\Gamma(a_k) + (a_k - 1)\log\lambda_k - b_k\lambda_k \right) \end{aligned} \qquad (12)$$
Since the first term in (12) does not depend on the priors, and we have already discussed the relationship between the posterior probability and the KL-divergence, we optimize the remaining terms to seek the solutions of this criterion. In this paper, we choose gradient descent as our updating rule. Although the multiplicative method is simpler, it provides no detailed derivation of why the approach works; in contrast, gradient descent gives a clear derivation of the whole updating procedure. We use this method to infer the priors W and H, as well as the hyper-priors $\boldsymbol{\lambda}$ and b. First we find the gradients of the objective with respect to the parameters:
$$\frac{\partial f}{\partial w_{mk}} = \sum_{n} h_{kn} - \sum_{n} \frac{v_{mn}}{(\mathbf{W}\mathbf{H})_{mn}}\, h_{kn} + \lambda_k \qquad (13)$$

$$\frac{\partial f}{\partial h_{kn}} = \sum_{m} w_{mk} - \sum_{m} \frac{v_{mn}}{(\mathbf{W}\mathbf{H})_{mn}}\, w_{mk} + \lambda_k \qquad (14)$$

$$\frac{\partial f}{\partial \lambda_k} = \sum_{m} w_{mk} + \sum_{n} h_{kn} + b_k - \frac{M + N + a_k - 1}{\lambda_k} \qquad (15)$$

$$\frac{\partial f}{\partial b_k} = \lambda_k - \frac{a_k}{b_k} \qquad (16)$$

Here $f$ denotes the negative of the MAP objective (12), so that the updates minimize $f$.
Then we use gradient coefficients to eliminate the subtraction operation in the updating procedure for W and H, which guarantees the non-negativity constraint. The parameters $\lambda_k$ and $b_k$ are updated by zeroing their gradients. The updating rules are listed as follows:

$$w_{mk} \leftarrow w_{mk}\, \frac{\sum_{n} \dfrac{v_{mn}}{(\mathbf{W}\mathbf{H})_{mn}}\, h_{kn}}{\sum_{n} h_{kn} + \lambda_k} \qquad (17)$$

$$h_{kn} \leftarrow h_{kn}\, \frac{\sum_{m} \dfrac{v_{mn}}{(\mathbf{W}\mathbf{H})_{mn}}\, w_{mk}}{\sum_{m} w_{mk} + \lambda_k} \qquad (18)$$

$$\lambda_k = \frac{M + N + a_k - 1}{\sum_{m} w_{mk} + \sum_{n} h_{kn} + b_k} \qquad (19)$$

$$b_k = \frac{a_k}{\lambda_k} \qquad (20)$$
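The updates (17)-(20) translate into a compact NumPy loop. The following is a minimal sketch under our reading of the paper; the `eps` guard, the iteration count, and the initialization are our own choices, and the shared hyper-parameter a = 2 follows the settings in Section 4:

```python
import numpy as np

def multilevel_nmf(V, K, a=2.0, b0=0.05, n_iter=500, eps=1e-12):
    """Sketch of the multiplicative updates (17)-(20) with hyper-prior updates."""
    M, N = V.shape
    rng = np.random.default_rng(0)
    W = rng.random((M, K)) + eps
    H = rng.random((K, N)) + eps
    lam = np.ones(K)               # prior parameters lambda_k
    b = np.full(K, b0)             # hyper-priors b_k (a is shared across k)
    for _ in range(n_iter):
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + lam[None, :])  # (17)
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + lam[:, None])  # (18)
        lam = (M + N + a - 1.0) / (W.sum(axis=0) + H.sum(axis=1) + b)    # (19)
        b = a / lam                                                      # (20)
    return W, H, lam
```

Because each factor is rescaled multiplicatively, non-negativity is preserved automatically, which is exactly the point of the gradient-coefficient trick described above.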
Then we find the correct bases and determine the order of the data model by

$$R = |B| \qquad (21)$$

where $B$ is the set of base vectors with non-vanishing norm,

$$B = \left\{ \mathbf{w}_k \;:\; \|\mathbf{w}_k\|_2 > 0 \right\} \qquad (22)$$

and $R$ is the rank of the base matrix.
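Following (21) and (22), the rank is read off from the column norms of the learned base matrix. In practice the "zero" cluster only vanishes approximately, so this sketch uses a small tolerance (our own choice) instead of exact zero:

```python
import numpy as np

def determine_rank(W, tol=1e-3):
    """Count base vectors with non-vanishing L2-norm, Eqs. (21)-(22)."""
    norms = np.linalg.norm(W, axis=0)        # ||w_k||_2 for each column
    norms = norms / (norms.max() + 1e-12)    # normalize, as plotted in Figure 7
    B = norms > tol                          # set B of effective base vectors
    return int(B.sum())                      # R = |B|
```

For example, `R = determine_rank(W)` applied to the factorization returned by the loop above yields the estimated rank.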
4. Experimental Results and Evaluation
In this section, we apply our unsupervised multi-level NMF algorithm to two binary datasets: one is the fence dataset, and the other is the well-known swimmer dataset. Both experiments demonstrate the efficacy of our method on the rank determination issue.
4.1. Fence Dataset
We first performed our experiments on the fence dataset. Here we define the data with four row bars (each of size 1 × 32) and four column bars (each of size 32 × 1). The size of each image is 32 × 32 with a zero-valued background, and the value of each pixel inside the eight bars is one. Each image is separated into five parts in both the horizontal and the vertical direction. Additionally, in each image the number of row bars and the number of column bars are the same; for instance, if a sample image contains two row bars, then it also contains two column bars. Hence, the total number of images in the fence dataset is N = 69 (a generation sketch is given at the end of this subsection). Samples of the fence dataset are shown in Figure 2.

Figure 2. Sample images of fence dataset.

Here, we set the initial rank K = 16 (the initial value of the rank K needs to be larger than the real rank of the base matrix) and the hyper-parameters a = 2, $\mathbf{b} = [0.05, \ldots, 0.05]_{1 \times K}$. Figure 3 shows the base matrix and the feature matrix learned via our unsupervised multi-level NMF approach. We can see that the data is sparse, especially the base matrix. In both images, the colored parts denote the effective bases or features, and the black parts denote irrelevant bases or features. In addition, from an image processing perspective, we can conclude that, compared with the values of the effective bases and features, the values of the irrelevant bases and features are very small, since the color of such pixels is very dark. We can clearly find eight colored column vectors in the first image. Among these eight colored vectors, four are composed of several separated colored pixels, whereas the other four are composed of assembled pixels. Actually, the former four vectors are row bars and the latter four are column bars: we reshape the dataset into columns during the factorization procedure, hence row bars and column bars have different structures. Furthermore, there are also eight rows in the second image, which are the corresponding coefficients of the bases.

Figure 3. Base matrix W and feature matrix H learned via our algorithm.

In order to show the bases clearly, we draw them in Figure 4. Although we set the initial rank of the base matrix to K = 16, only eight images have non-zero values. Moreover, these eight images show 4 row bars and 4 column bars appearing at different positions. The results are perfectly consistent with the design of the fence dataset. Therefore, we can conclude that our algorithm is powerful and efficient at finding the real basic components and the correct rank.

Figure 4. The bases obtained by our algorithm on the fence dataset.
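The fence dataset can be reconstructed from the description above. The sketch below is our own reading: the four bar positions are assumptions (the paper only states that the bars split each 32 × 32 image into five parts), and images are stored as columns, matching the reshape-by-columns convention mentioned earlier; the combination count $\sum_{k=1}^{4} \binom{4}{k}^2 = 16 + 36 + 16 + 1$ gives exactly N = 69.

```python
import numpy as np
from itertools import combinations

SIZE = 32
POS = [5, 11, 18, 24]   # assumed bar positions splitting the image into 5 parts

def fence_dataset():
    """Binary fence images: k row bars and k column bars, k = 1..4 (N = 69)."""
    images = []
    for k in range(1, 5):
        for rows in combinations(POS, k):
            for cols in combinations(POS, k):
                img = np.zeros((SIZE, SIZE))
                img[list(rows), :] = 1.0     # row bars, size 1 x 32
                img[:, list(cols)] = 1.0     # column bars, size 32 x 1
                images.append(img.reshape(-1))
    return np.array(images).T                # V: 1024 x 69, one image per column

V = fence_dataset()
print(V.shape)  # (1024, 69)
```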
4.2. Swimmer Dataset

The other dataset we used is the swimmer dataset, a typical dataset for feature extraction. Due to its clear definition and its composition of 16 dynamic parts, it is well suited to the unique characteristic of the NMF algorithm, which is to learn part-based data. The swimmer dataset, however, is a gray-level image dataset; since our experiment focuses on binary data, we first convert the gray-level dataset to a binary one and then apply our approach to perform inference. The swimmer dataset contains 256 images in total, each of which depicts a swimming gesture using one torso and four dynamic limbs. The size of each image is 32 × 32, and each dynamic part can appear at four different positions. Figure 5 shows some sample images of the swimmer dataset.

Figure 5. Sample images of the swimmer dataset.

In this experiment, the initial rank is set to K = 25, and the initial values of the hyper-parameters are a = 2, $\mathbf{b} = [0.05, \ldots, 0.05]_{1 \times K}$.
Figure 6 shows the experimental results for the swimmer dataset. It can be observed that for this dataset, too, our algorithm finds the correct bases. In this figure there are 25 base images; the black ones correspond to irrelevant bases, while the other 17 images depict the torso and the limbs at each possible position. We can see that the correct torso and limbs are discovered successfully.

Figure 6. The bases of the swimmer dataset learned by our algorithm.

The differences between the black images and the correct base images are shown in Figure 7, which depicts the L2-norm of each column of the base matrix. The total number of points in this figure equals the initial rank. Obviously, the points are classified into two clusters: a zero-value cluster and a larger-value cluster. Thus the rank of the base matrix for the swimmer dataset is $R = |B| = 17$. The L2-norms of the base matrix not only tell us how to find the correct bases, but also how to determine the correct rank of the base matrix.

Figure 7. L2-norm of base vectors (normalized L2-norm plotted against base vector index).
5. Conclusion

We have presented an unsupervised multi-level non-negative matrix factorization algorithm which is powerful and efficient at seeking the correct rank of a data model. This is achieved by introducing a multi-prior structure. The experimental results on binary datasets adequately demonstrate the efficacy of our algorithm. Compared with fully Bayesian methods, it is simpler and more convenient. The crucial points of this method are how to introduce the hyper-priors and what kind of prior is appropriate for a given data model. The algorithm can also be extended to other data models and noise models. Although our experiments are based on binary datasets, the algorithm is suitable for other datasets such as gray-level datasets, color datasets, etc.
REFERENCES

[1] D. D. Lee and H. S. Seung, "Learning the Parts of Objects by Non-Negative Matrix Factorization," Nature, Vol. 401, No. 6755, 1999, pp. 788-791.

[2] Z. Yuan and E. Oja, "Projective Non-negative Matrix Factorization for Image Compression and Feature Extraction," Springer, Heidelberg, 2005.

[3] C. Fevotte, N. Bertin and J. L. Durrieu, "Non-Negative Matrix Factorization with the Itakura-Saito Divergence: With Application to Music Analysis," Neural Computation, Vol. 21, No. 3, 2009, pp. 793-830.

[4] M. W. Berry and M. Browne, "Email Surveillance Using Non-negative Matrix Factorization," Computational and Mathematical Organization Theory, Vol. 11, No. 3, 2005, pp. 249-264.

[5] Q. Sun, F. Hu and Q. Hao, "Context Awareness Emergence for Distributed Binary Pyroelectric Sensors," Proceedings of the 2010 IEEE Conference on Multisensor Fusion and Integration for Intelligent Systems, Salt Lake City, 5-7 September 2010, pp. 162-167.

[6] F. Hu, Q. Sun and Q. Hao, "Mobile Targets Region-of-Interest via Distributed Pyroelectric Sensor Network: Towards a Robust, Real-Time Pyroelectric Sensor Network," Proceedings of the 2010 IEEE Conference on Sensors, Waikoloa, 1-4 November 2010, pp. 1832-1836.

[7] Y. Xue, C. S. Tong and Y. C. W. Chen, "Clustering-Based Initialization for Non-negative Matrix Factorization," Applied Mathematics and Computation, Vol. 205, No. 2, 2008, pp. 525-536.

[8] Z. Yang, Z. Zhu and E. Oja, "Automatic Rank Determination in Projective Non-negative Matrix Factorization," Proceedings of the 9th International Conference on LVA/ICA, St. Malo, 27-30 September 2010, pp. 514-521.

[9] A. T. Cemgil, "Bayesian Inference for Non-negative Matrix Factorization Models," Computational Intelligence and Neuroscience, Vol. 2009, 2009, Article ID 785152.

[10] M. Said, D. Brie, A. Mohammad-Djafari and C. Cedric, "Separation of Nonnegative Mixture of Nonnegative Sources Using a Bayesian Approach and MCMC Sampling," IEEE Transactions on Signal Processing, Vol. 54, No. 11, 2006, pp. 4133-4145.