Robust Extreme Learning Machine With its
Application to Indoor Positioning
Xiaoxuan Lu, Han Zou, Hongming Zhou, Lihua Xie, Fellow, IEEE, and Guang-Bin Huang, Senior Member, IEEE
Abstract—The increasing demand for location-based services has spurred the rapid development of indoor positioning systems (IPSs), also known as indoor localization systems. However, the performance of IPSs suffers from noisy measurements. In this paper, two kinds of robust extreme learning machines (RELMs), corresponding to the close-to-mean constraint and the small-residual constraint, are proposed to address the issue of noisy measurements in IPSs. Based on whether the feature mapping in the extreme learning machine is explicit, we provide random-hidden-nodes and kernelized formulations of RELMs, respectively, via second order cone programming. Furthermore, the computation of the covariance in the feature space is discussed. Simulations and real-world indoor localization experiments are extensively carried out, and the results demonstrate that the proposed algorithms can not only improve accuracy and repeatability, but also reduce the deviation and worst case error of IPSs compared with other baseline algorithms.
Index Terms—Indoor positioning system (IPS), robust extreme learning machine (RELM), second order cone programming (SOCP).
I. INTRODUCTION
DUE to the non-line-of-sight transmission channels between a satellite and a receiver, wireless indoor positioning has been extensively studied and a number of solutions have been proposed in the past two decades. Unlike other wireless technologies, such as ultrawideband and radio frequency identification, which require the deployment of extra infrastructure, the existing IEEE 802.11 network infrastructures, such as WiFi routers, are widely available in large numbers of commercial and residential buildings. In addition, nearly every mobile device now is equipped with a WiFi receiver [1].
WiFi-based machine learning (ML) approaches have become popular in indoor positioning in recent years [2]. The fingerprinting method based on WiFi received signal strength (RSS), in particular, has received a lot of attention.
The fingerprinting localization procedure usually involves two
stages: 1) offline calibration stage and 2) online matching stage.
Manuscript received September 4, 2014; revised December 13, 2014; accepted January 25, 2015. Date of publication February 24, 2015; date of current version December 14, 2015. This work was supported in part by the National Research Foundation of Singapore under Grant NRF2011NRF-CRP001-090 and Grant NRF2013EWT-EIRP004-012, and in part by the Natural Science Foundation of China under NSFC 61120106011. This paper was recommended by Associate Editor X. Wang.
The authors are with the School of Electrical and Electronics Engineering, Nanyang Technological University, Singapore 639798 (e-mail: xlu010@ntu.edu.sg).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TCYB.2015.2399420
During the offline stage, a site survey is conducted and the signal strengths received at each location from various access points (APs) are recorded in a radio map. During the online stage, users' positions can be estimated by matching the online RSSs with the fingerprints stored in the radio map. The online matching strategy, which models the relationship between physical locations and the RSS map with different ML algorithms, is crucial for the performance of indoor positioning systems (IPSs). Neural networks (NNs) and support vector machines (SVMs) [3], as two sophisticated ML techniques, have both been utilized in fingerprinting-based indoor positioning [4].
However, both NN- and SVM-based IPSs face two challenges. On one hand, NN and SVM training is time-consuming, and this issue becomes more serious in fingerprinting-based positioning systems because a large amount of training data is required for generating a radio map. Their high computational costs leave us little leeway, especially in large-scale scenarios, to improve the performance and robustness of ML-based IPSs. On the other hand, noisy measurements are inevitable, considering that manual observational errors of calibrated points occur throughout the calibration phase. In addition, signal variation and ambient dynamics also affect the signals received by APs. These adverse factors can be considered as uncertainties, which may degrade the performance of IPSs. Many researchers bypass optimizing ML methods to enhance the robustness of IPSs, since doing so would aggravate the slow training rate. Kothari et al. [5] utilized the integration of two complementary localization algorithms, dead reckoning and WiFi signal strength fingerprinting, to achieve robust indoor localization; nevertheless, a disadvantage of dead reckoning is that its errors are cumulative, since new positions are calculated solely from previous ones. Meng et al. [6] proposed a robust noniterative three-step location sensing method, but its capability of reducing the worst case error (WCE) and variance is comparatively limited. Other robust indoor localization algorithms demand either extra infrastructure or users' interaction during the calibration phase, which is not cost-efficient in practice.
These undesirable results motivate us to reconsider the problem: can we find an ML technique that is fast in training and capable of handling the robustness issue in IPSs? As a novel learning technique, the extreme learning machine (ELM) has demonstrated outstanding performance in training speed, prediction accuracy, and generalization ability [7], [8]. Several IPSs have already leveraged ELM to deliver accurate location estimation with fast training speed [1], [9], [10].
Extending ELM, this paper proposes two robust ELMs (RELMs), which can be implemented in random-hidden-nodes form or kernelized form depending on the situation, to boost the robustness of IPSs.
The problem of uncertainty and robustness has been intensively studied in recent years. Wang et al. [11] proposed an ELM-tree model based on the heuristics of uncertainty reduction, which is computationally lightweight for big data classification. A fuzzy integral method has been adopted to study probabilistic feed-forward neural networks [12]. Horata et al. [13] proposed an approach, also named RELM, to improve computational robustness by extended complete orthogonal decomposition and outlier robustness by reweighted least squares. Unlike these works, considering the noises in IPSs as discussed above, we propose our algorithm under a stochastic framework. It is worthwhile to mention that RELMs are based on second order cone programming (SOCP), which is widely adopted in robust convex optimization problems. Simulation and real-world experimental results both demonstrate that RELM-based IPSs outperform IPSs based on other baseline algorithms in terms of accuracy, repeatability (REP), and WCE.
An outline of this paper is as follows. In Section II, we introduce the preliminaries for this paper, including the basic components of a WiFi-based IPS, background on ELM, and its comparison with SVR. Two second order moment constraints, i.e., the close to mean (CTM) and small residual (SR) constraints, with their geometric interpretations, are given in Section III. The random-hidden-nodes and kernelized formulations of RELMs are derived in Sections IV and V, respectively. How to calculate the covariance in the feature space is studied in Section VI. In Section VII, the proposed algorithms are evaluated by both simulation and real-world IPSs. The conclusion is drawn in Section VIII.
II. PRELIMINARIES
A. WiFi Indoor Positioning
A large body of indoor positioning problems can be cast as regression problems. As shown in Table I, the input variable $x = (x_1, x_2, \ldots, x_d)$ is a vector of RSS values received from the APs in the environment, and $t = (t_1, t_2)$ is the indoor 2-D physical coordinate of a target's location. When an AP is undetectable at a position, its corresponding RSS is taken as $-100$ dBm. The problem here is to train and approximate the regression model.
Although in some works the procedure of collecting signal strength involves physically moving a wireless device all around the target area, as in [14] and [15], we only pick out some spatially representative locations, i.e., reference (calibration) points, from the target area, and conduct sampling at each reference point for a period of time to build up a radio map.
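To make the two-stage fingerprinting procedure concrete, here is a minimal Python sketch of a radio map and a nearest-fingerprint matcher. It is an illustrative baseline only, not the RELM method developed in this paper, and all function and variable names are placeholders.

```python
# Illustrative sketch of the two-stage fingerprinting procedure: offline radio
# map construction and online matching by a nearest-fingerprint rule.
import numpy as np

def build_radio_map(rss_samples, positions):
    """Offline stage: average the RSS samples collected at each reference point.
    rss_samples: list of (n_i x d) arrays, positions: k x 2 reference coordinates."""
    fingerprints = np.array([s.mean(axis=0) for s in rss_samples])  # k x d
    return fingerprints, positions

def locate(rss, fingerprints, positions):
    """Online stage: match the observed RSS vector to the closest fingerprint."""
    i = np.argmin(np.linalg.norm(fingerprints - rss, axis=1))
    return positions[i]
```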
B. Introduction to ELM
Originally inspired by biological learning to overcome the challenging issues faced by back propagation (BP) learning algorithms, ELM is a kind of ML algorithm based on a generalized single-hidden-layer feedforward NN (SLFN) architecture [16]. It has been demonstrated to provide good generalization performance at an extremely fast learning speed [17]–[19].

TABLE I: Input Variable: RSS (x) and Output: Location (t)
Let $\Upsilon = \{(x_i, t_i);\ i = 1, 2, \ldots, N\}$ be a training set consisting of $N$ patterns, where $x_i \in \mathbb{R}^{1 \times d}$ and $t_i \in \mathbb{R}^{1 \times m}$; the goal of regression is to find the relationship between $x_i$ and $t_i$. Since the only parameters to be optimized are the output weights, the training of ELM is equivalent to solving a least squares problem [20].

In the training process, the first stage is that the hidden neurons of ELM map the inputs onto a feature space
$$h : x_i \mapsto h(x_i) \quad (1)$$
where $h(x_i) \in \mathbb{R}^{1 \times L}$.

We denote by $H$ the hidden layer output matrix (randomized matrix)
$$H = \begin{bmatrix} h(x_1) \\ h(x_2) \\ \vdots \\ h(x_N) \end{bmatrix}_{N \times L} \quad (2)$$
with $L$ the dimension of the feature space and $\beta \in \mathbb{R}^{L \times m}$ the output weight matrix that connects the hidden layer with the output layer. Then, each output of ELM is given by
$$t_i = h(x_i)\beta, \quad i = 1, 2, \ldots, N. \quad (3)$$
ELM theory aims to reach not only the smallest training error but also the smallest norm of the output weights [16]
$$\min_{\xi,\ \beta \in \mathbb{R}^{L \times m}} L_P = \frac{1}{2}\|\beta\|_{\sigma_1}^{\sigma_1} + \frac{C}{2}\sum_{i=1}^{N} \xi_i$$
$$\text{s.t. } \|h(x_i)\beta - t_i\|_{\sigma_2}^{\sigma_2} = \xi_i, \quad i = 1, 2, \ldots, N \quad (4)$$
where $\sigma_1 > 0$, $\sigma_2 > 0$, $\sigma_1, \sigma_2 = 0, 1/2, 1, 2, \ldots, +\infty$,¹ $C$ is the penalty coefficient on the training errors, and $\xi_i \in \mathbb{R}^m$ is the error vector with respect to the $i$th training pattern.
The simplest example of the above is basic ELM [17]
$$\min_{\beta \in \mathbb{R}^{L \times m}} L_P = \sum_{i=1}^{N} \xi_i \quad \text{s.t. } \|h(x_i)\beta - t_i\|^2 = \xi_i, \quad i = 1, 2, \ldots, N \quad (5)$$
which can be solved by the least squares method
$$\beta = H^{\dagger}T \quad (6)$$
where $H^{\dagger}$ is the Moore–Penrose generalized inverse of $H$.

¹Unless explicitly specified, $\sigma_1 = \sigma_2 = 2$ for all norm notations in this paper.
Extending basic ELM, [21] proposed an optimization-based ELM (OPT-ELM) for the binary classification problem by introducing inequality constraints. We follow [21] to give a form of OPT-ELM for regression problems:
$$\min_{\xi,\ \beta \in \mathbb{R}^{L \times m}} L_P = \frac{1}{2}\|\beta\|^2 + \frac{C}{2}\sum_{i=1}^{N} \xi_i$$
$$\text{s.t. } \|h(x_i)\beta - t_i\| \le \varepsilon + \xi_i, \quad \xi_i \ge 0, \quad i = 1, 2, \ldots, N \quad (7)$$
where $\varepsilon$ is a slack variable. This formulation is very similar to support vector regression (SVR) in the nonlinear case [3], [22], which has the following form:
$$\min_{\xi, w, b} L_{P_{\mathrm{SVM}}} = \frac{1}{2}\|w\|^2 + \frac{C}{2}\sum_{i=1}^{N} \xi_i$$
$$\text{s.t. } \|w \cdot \phi(x_i) + b - t_i\| \le \varepsilon + \xi_i, \quad \xi_i \ge 0, \quad i = 1, 2, \ldots, N \quad (8)$$
where $\phi(\cdot)$ is the nonlinear feature mapping function in SVR, $w$ is the output weight vector, and $b$ is the approximation (output) bias. $\varepsilon$ and $\xi_i$ are as defined in the OPT-ELM case.
A detailed comparison between ELM and SVM for classification problems is given in [21] and [23]; in the next section we extend this comparison to regression problems. For convenience of description, we henceforth follow [16] and refer to the formulation of (7) as OPT-ELM, while basic ELM stands for the formulation of (5). The terminology ELM in the rest of this paper has a broader meaning, which can be considered as the collection of basic ELM and its random-hidden-nodes-based variants.²
C. Comparisons Between ELM and SVR
Both formulations of ELM and SVR are within the scope of quadratic programming; however, the decision variable $b$, i.e., the bias term, does not exist in ELM.
SVR and its variants emphasize the importance of the bias $b$ in their implementation. The reason is that the separation capability of SVM was considered more important than its regression capability when SVM was first proposed to handle binary classification applications. Against this background, its universal approximation capability may somehow have been neglected [3]. Due to the inborn fact that the feature mapping $\phi(\cdot)$ in SVR is unknown, it is difficult to study the universal approximation capability of SVR without the explicitness of the feature mapping. Since $\phi(\cdot)$ is unknown and may not have the universal approximation capability, given a target function $f(\cdot)$ and any small precision $\varepsilon$, there may not exist a $w$ such that $w \cdot \phi(x)$ approximates $f(x)$ within $\varepsilon$. In other words, there may exist some system errors even if SVM and its variants with appropriate kernels can classify different classes well, and these system errors need to be absorbed by the bias $b$. This may be the reason why, in principle, the bias $b$ has to remain in the optimization constraints [16].
²We particularly avoid including kernel ELM and its variants in the above collection, given the fact that they do not possess the most significant property of ELM: random feature mapping.
On the other hand, all the parameters of the ELM mapping $h(x)$ are randomly generated, and $h(x)$ is fully known to users. According to [17]–[19], ELM with almost any nonlinear piecewise continuous function $h(x)$ has the universal approximation capability. Therefore, the bias $b$ is not necessary in the output nodes of ELM.
In addition, from the optimization point of view, fewer decision variables to be determined implies lower computational costs, and this computational superiority becomes more obvious as the scale of the training data gets larger.
Kernel ELM is somewhat superior to SVR owing to its flexibility in kernels; namely, the feature mapping that forms the kernels can be an unknown mapping or a random feature mapping. More on kernel ELM will be given in Section V.
Huang [16] pointed out that the "redundant" $b$ renders SVR suboptimal compared with ELM if the same kernel is used in both, because the feasible solution space of SVR is a subset of the ELM feasible solution space.
We shall point out that the main difference between ELM and SVR lies in their different starting points. SVR [24] was developed at first as an extension of SVM. As mentioned above, SVM was designed for binary classification at first, and the subsequent variants for regression problems were developed on the basis of SVM without addressing the problem caused by $b$. By contrast, ELM was originally proposed for regression, the feature mappings $h(x)$ are known, and the universal approximation capability was considered in the first place. Thus, in ELM, the approximation error tends to be zero and $b$ should not be present [16], [21], [23].
III. ROBUST ELM
A. Uncertainties of Input and Output Data
RELM is proposed under a stochastic framework. Assume that both the input data $x$ and the output data $t$ are perturbed by noises. Since $H$ is the feature space after a nonlinear mapping from the input space, if the input data are contaminated, $H$ is also mixed with disturbances. We follow [25] and assume the disturbances in the feature space are additive:
$$h(x_i) = h(x_i)_{\mathrm{true}} + (\iota_1)_i, \qquad t_i = (t_i)_{\mathrm{true}} + (\iota_2)_i \quad (9)$$
where $(\iota_1)_i$ and $(\iota_2)_i$ are uncorrelated perturbations in the feature space and the output space with proper dimensions, respectively. The new vector $y_i \in \mathbb{R}^{1 \times (L+m)}$ collects the $i$th input and output observations, i.e., $y_i = [h(x_i), t_i]$. We now give the following definitions:
$$\bar{h}(x_i) = E(h(x_i)), \quad \bar{t}_i = E(t_i)$$
$$\Sigma^i_{hh} = \mathrm{Cov}(h(x_i), h(x_i)), \quad \Sigma^i_{tt} = \mathrm{Cov}(t_i, t_i) \quad (10)$$
where $E(\cdot)$ and $\mathrm{Cov}(\cdot)$ denote the expectation and covariance operators for random variables, respectively. Since the perturbations in the feature space $(\iota_1)_i$ and the output space $(\iota_2)_i$ are uncorrelated, i.e., $\Sigma^i_{ht} = 0$, we have
$$\bar{y}_i = E([h(x_i), t_i]) = [\bar{h}(x_i), \bar{t}_i]$$
$$\Sigma^i_{yy} = \mathrm{Cov}(y_i, y_i) = \begin{bmatrix} \Sigma^i_{hh} & 0 \\ 0 & \Sigma^i_{tt} \end{bmatrix}_{(L+m) \times (L+m)}. \quad (11)$$
LU et al.: RELM WITH ITS APPLICATION TO INDOOR POSITIONING 197
The $i$th prediction error is denoted by $e_i \in \mathbb{R}^{1 \times m}$ and its expectation $\bar{e}_i$ is defined as follows:
$$e_i = h(x_i)\beta - t_i, \qquad \bar{e}_i = \bar{h}(x_i)\beta - \bar{t}_i. \quad (12)$$
It follows from [25] and [26] that, by inserting CTM and SR constraints into SVR, the predictions can be made robust to perturbations in the data set.
CTM is a criterion by which we require the prediction errors to be insensitive to the distribution of the noises in the input and output data
$$\Pr_{x_i, y_i}\{|e_i - \bar{e}_i| \ge \theta_i\} \le \eta, \quad i = 1, 2, \ldots, N \quad (13)$$
where $x_i, y_i$ are the input and output data, $\theta_i$ is the confidence threshold, and $\eta$ denotes the maximum tolerance of the deviation.
An alternative way to boost the robustness is to restrict the residual to be small, which leads to the SR constraint
$$\Pr_{x_i, y_i}\{|e_i| \ge \xi_i + \varepsilon\} \le \eta \quad (14)$$
where $\xi_i$ corresponds to the prediction error and $\varepsilon$ is a slack variable. Compared with the CTM constraint, the SR constraint requires the estimator to be robust against deviations that lead to larger estimation errors, rather than against deviations from the center. In fact, both CTM and SR constraints are robust constraints utilized to bound the probabilities of highly deviated errors subject to second order moment constraints.
B. Sufficient Condition of CTM Constraint
It should be pointed out that the above two robust constraints only consider the scalar output case; however, the outputs of IPSs are usually vectors. Moreover, ELM and kernel ELM algorithms are inherently different from SVR, so different constraints should be provided for our problem setting. We now give our CTM constraint for this paper
$$\Pr_{h(x_i), t_i}\left\{\|e_i - \bar{e}_i\|^2 \ge \theta_i^2\right\} \le \tau, \quad i = 1, 2, \ldots, N \quad (15)$$
where $\theta_i$ is still a confidence threshold and $\tau$ stands for some probability. Nevertheless, CTM constraints in this form are intractable. The multidimensional Chebyshev inequality is leveraged to convert the original constraints into tractable ones.
Lemma 1 [27]: Let $z$ be an $m$-dimensional random row vector with expected value $\bar{z}$ and positive-definite covariance $\Sigma$; then
$$\Pr\left\{(z - \bar{z})\Sigma^{-1}(z - \bar{z})^T \ge \theta^2\right\} \le \frac{m}{\theta^2}. \quad (16)$$
Proposition 1: For $z$ and $\Sigma$ defined in Lemma 1, if $\|z\|^2 \ge \vartheta\|\Sigma\|$, then $z\Sigma^{-1}z^T \ge \vartheta$.
Proof: Since $\Sigma$ is a real-valued symmetric matrix, it can be diagonalized as $\Sigma = P^{-1}\Lambda P$, where $\Lambda$ is a diagonal matrix with the eigenvalues of $\Sigma$ on its diagonal. It can be shown that
$$\Lambda \le \|\Sigma\| I \Rightarrow \Lambda^{-1} \ge \frac{1}{\|\Sigma\|} I \quad (17)$$
which leads to
$$z\Sigma^{-1}z^T = zP^{-1}\Lambda^{-1}Pz^T \ge \frac{zz^T}{\|\Sigma\|} \quad (18)$$
and (18) gives rise to
$$\|z\|^2 \ge \vartheta\|\Sigma\| \Rightarrow z\Sigma^{-1}z^T \ge \vartheta. \quad (19)$$
Proposition 1 also implies
$$\Pr\left\{\|z\|^2 \ge \vartheta\|\Sigma\|\right\} \le \Pr\left\{z\Sigma^{-1}z^T \ge \vartheta\right\}. \quad (20)$$
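A quick Monte Carlo check of Lemma 1 and Proposition 1 can be run as below; the zero-mean Gaussian distribution and the particular covariance matrix are illustrative assumptions, since the bounds themselves are distribution-free.

```python
# Empirical check: the sphere event in (20) implies the ellipsoid event, and
# the ellipsoid event obeys the multidimensional Chebyshev bound (16).
import numpy as np

rng = np.random.default_rng(1)
m, n_samples, vartheta = 3, 200_000, 9.0
Sigma = np.array([[2.0, 0.3, 0.0], [0.3, 1.0, 0.2], [0.0, 0.2, 0.5]])
z = rng.multivariate_normal(np.zeros(m), Sigma, size=n_samples)

sphere = np.sum(z**2, axis=1) >= vartheta * np.linalg.norm(Sigma, 2)
ellipsoid = np.einsum('ij,jk,ik->i', z, np.linalg.inv(Sigma), z) >= vartheta

print(sphere.mean() <= ellipsoid.mean())   # Proposition 1 / implication (20)
print(ellipsoid.mean() <= m / vartheta)    # Chebyshev bound (16), theta^2 = vartheta
```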
Theorem 1: Let $\beta \in \mathbb{R}^{L \times m}$ and $\omega = [\beta^T, -\mathbf{1}]^T \in \mathbb{R}^{(L+m) \times m}$, and let $\Sigma^i_{yy}$ be defined as in (11); then a sufficient condition for (15) is
$$\left\|\left(\Sigma^i_{yy}\right)^{\frac{1}{2}}\omega\right\| \le \theta_i\sqrt{\tau/m} \quad (21)$$
where $\mathbf{1}$ is a vector with all entries equal to 1 and proper length.
Proof: Substituting $e_i - \bar{e}_i$ for $z$ in (16), we have
$$\Pr_{h(x_i), t_i}\left\{(e_i - \bar{e}_i)\left(\Sigma^i_{ee}\right)^{-1}(e_i - \bar{e}_i)^T \ge \theta_i^2\right\} \le \frac{m}{\theta_i^2} \quad (22)$$
which, together with (20), leads to
$$\Pr_{h(x_i), t_i}\left\{\|e_i - \bar{e}_i\|^2 \ge \theta_i^2\right\} \le \Pr_{h(x_i), t_i}\left\{(e_i - \bar{e}_i)\left(\Sigma^i_{ee}\right)^{-1}(e_i - \bar{e}_i)^T \ge \frac{\theta_i^2}{\left\|\Sigma^i_{ee}\right\|}\right\} \le \frac{m\left\|\Sigma^i_{ee}\right\|}{\theta_i^2}. \quad (23)$$
Thus, $m\|\Sigma^i_{ee}\|/\theta_i^2 \le \tau$ is a sufficient condition for (15). Taking into account that
$$\Sigma^i_{ee} = \omega^T \Sigma^i_{yy} \omega \quad (24)$$
inserting (24) into $m\|\Sigma^i_{ee}\| \le \theta_i^2\tau$ and then taking the square root on both sides, (21) follows.
C. Sufficient Condition of SR Constraint
The sufficient condition for the SR constraint can be derived in the same fashion. The SR constraint in our case is
$$\Pr_{h(x_i), t_i}\left\{\|e_i\|^2 \ge (\xi_i + \varepsilon)^2\right\} \le \tau, \quad i = 1, 2, \ldots, N. \quad (25)$$
Theorem 2: Let $\beta \in \mathbb{R}^{L \times m}$, $\omega = [\beta^T, -\mathbf{1}]^T \in \mathbb{R}^{(L+m) \times m}$, and let $\Sigma^i_{yy}$ be defined as in (11); then a sufficient condition for (25) is
$$\left\|\begin{bmatrix}\left(\Sigma^i_{yy}\right)^{\frac{1}{2}}\omega \\ \bar{h}(x_i)\beta - \bar{t}_i\end{bmatrix}\right\| \le (\xi_i + \varepsilon)\sqrt{\tau/m} \quad (26)$$
where $\mathbf{1}$ is a vector with all entries equal to 1 and proper length.
Proof: Taking $e_i e_i^T \in \mathbb{R}$ as a random variable, from Markov's inequality, we have
$$\Pr_{h(x_i), t_i}\left\{\|e_i\|^2 \ge (\xi_i + \varepsilon)^2\right\} = \Pr_{h(x_i), t_i}\left\{e_i e_i^T \ge (\xi_i + \varepsilon)^2\right\} \le \frac{E\left(e_i e_i^T\right)}{(\xi_i + \varepsilon)^2}.$$
Denoting by $\mathrm{tr}(\cdot)$ the trace operator of a matrix,
$$E\left(e_i e_i^T\right) = E\left(\mathrm{tr}\left(e_i^T e_i\right)\right) = E\left(\mathrm{tr}\left(e_i^T e_i - \bar{e}_i^T \bar{e}_i\right)\right) + \mathrm{tr}\left(\bar{e}_i^T \bar{e}_i\right) = \mathrm{tr}\left(\Sigma^i_{ee} + \bar{e}_i^T \bar{e}_i\right). \quad (27)$$
$\Sigma^i_{ee}$ and $\bar{e}_i^T \bar{e}_i$ are both positive semi-definite, which implies that $\Sigma^i_{ee} + \bar{e}_i^T \bar{e}_i$ is positive semi-definite. Since
$$\left\|\Sigma^i_{ee} + \bar{e}_i^T \bar{e}_i\right\| = \max\{\lambda_1, \ldots, \lambda_m\} \quad (28)$$
where $\lambda_i$ stands for an eigenvalue of $\Sigma^i_{ee} + \bar{e}_i^T \bar{e}_i$, we have
$$\mathrm{tr}\left(\Sigma^i_{ee} + \bar{e}_i^T \bar{e}_i\right) \le m\left\|\Sigma^i_{ee} + \bar{e}_i^T \bar{e}_i\right\| \quad (29)$$
which leads to
$$m\left\|\Sigma^i_{ee} + \bar{e}_i^T \bar{e}_i\right\| = m\left\|\begin{bmatrix}\left(\Sigma^i_{yy}\right)^{\frac{1}{2}}\omega \\ \bar{h}(x_i)\beta - \bar{t}_i\end{bmatrix}\right\|^2. \quad (30)$$
By letting
$$\frac{m}{(\xi_i + \varepsilon)^2}\left\|\begin{bmatrix}\left(\Sigma^i_{yy}\right)^{\frac{1}{2}}\omega \\ \bar{h}(x_i)\beta - \bar{t}_i\end{bmatrix}\right\|^2 \le \tau \quad (31)$$
and taking the square root on both sides, we claim that (26) is a sufficient condition for (25).

Fig. 1. Shadow area indicates the possible region the random variable may fall into.
D. Geometric Interpretation
The geometric interpretations of the above claims are as follows.
1) Proposition 1 can be interpreted as stating that the chance of a random variable lying outside a sphere with radius $\sqrt{\vartheta\|\Sigma\|}$ is greater than that of the random variable lying outside an ellipsoid with radius $\sqrt{\vartheta}$ and covariance matrix $\Sigma$. This is intuitive because the largest semi-axis of the ellipsoid is equal to the radius of the sphere and they share the same center. Fig. 1 shows the illustration when the ellipsoid and sphere are projected onto a 2-D space.
2) The above CTM robust criterion can be understood as a restriction that each training data point $y_i$ picked from the ellipsoid $\mathcal{E}_i(\bar{y}_i, \Sigma^i_{yy}, (m/\tau)^{1/2})$ satisfies the inequality
$$\|e_i - \bar{e}_i\| \le \theta_i \quad (32)$$
where
$$\mathcal{E}_i\left(\bar{y}_i, \Sigma^i_{yy}, \sqrt{m/\tau}\right) = \left\{y_i \,\middle|\, (y_i - \bar{y}_i)\left(\Sigma^i_{yy}\right)^{-1}(y_i - \bar{y}_i)^T \le \frac{m}{\tau}\right\}. \quad (33)$$
From Theorem 1, we have
$$\sqrt{\frac{m}{\tau}}\left\|\left(\Sigma^i_{yy}\right)^{\frac{1}{2}}\omega\right\| \le \theta_i. \quad (34)$$
Further, by noting that
$$\|e_i - \bar{e}_i\| = \|(y_i - \bar{y}_i)\omega\| = \left\|(y_i - \bar{y}_i)\left(\Sigma^i_{yy}\right)^{-\frac{1}{2}}\left(\Sigma^i_{yy}\right)^{\frac{1}{2}}\omega\right\| \le \left\|(y_i - \bar{y}_i)\left(\Sigma^i_{yy}\right)^{-\frac{1}{2}}\right\|\left\|\left(\Sigma^i_{yy}\right)^{\frac{1}{2}}\omega\right\| \le \sqrt{\frac{m}{\tau}}\left\|\left(\Sigma^i_{yy}\right)^{\frac{1}{2}}\omega\right\| \quad (35)$$
it is clear that the above geometric interpretation for the CTM constraint holds.
3) A similar geometric interpretation can be given for the SR constraint. Let
$$\tilde{\Sigma}^i_{yy} = \Sigma^i_{yy} + \bar{y}_i^T \bar{y}_i; \quad (36)$$
the SR constraint enforces that each training data point $y_i$ picked from the ellipsoid $\mathcal{E}_i(0, \tilde{\Sigma}^i_{yy}, \sqrt{m/\tau})$
$$\mathcal{E}_i\left(0, \tilde{\Sigma}^i_{yy}, \sqrt{m/\tau}\right) = \left\{y_i \,\middle|\, y_i\left(\tilde{\Sigma}^i_{yy}\right)^{-1} y_i^T \le \frac{m}{\tau}\right\} \quad (37)$$
satisfies the following inequality:
$$\|e_i\| \le \xi_i + \varepsilon. \quad (38)$$
The procedure to verify this interpretation follows the same fashion as the CTM case
$$\|e_i\| = \|y_i\omega\| = \left\|y_i\left(\tilde{\Sigma}^i_{yy}\right)^{-\frac{1}{2}}\left(\tilde{\Sigma}^i_{yy}\right)^{\frac{1}{2}}\omega\right\| \le \left\|y_i\left(\tilde{\Sigma}^i_{yy}\right)^{-\frac{1}{2}}\right\|\left\|\left(\tilde{\Sigma}^i_{yy}\right)^{\frac{1}{2}}\omega\right\| \le \sqrt{\frac{m}{\tau}}\left\|\left(\tilde{\Sigma}^i_{yy}\right)^{\frac{1}{2}}\omega\right\|. \quad (39)$$
From Theorem 2, we have
$$\left\|\begin{bmatrix}\left(\Sigma^i_{yy}\right)^{\frac{1}{2}}\omega \\ \bar{h}(x_i)\beta - \bar{t}_i\end{bmatrix}\right\|^2 = \left\|\Sigma^i_{ee} + \bar{e}_i^T \bar{e}_i\right\| = \left\|\omega^T\left(\Sigma^i_{yy} + \bar{y}_i^T \bar{y}_i\right)\omega\right\| = \left\|\omega^T \tilde{\Sigma}^i_{yy}\,\omega\right\| \le \frac{\tau}{m}(\xi_i + \varepsilon)^2. \quad (40)$$
Taking square roots in (40) yields
$$\left\|\left(\tilde{\Sigma}^i_{yy}\right)^{\frac{1}{2}}\omega\right\| \le \sqrt{\frac{\tau}{m}}(\xi_i + \varepsilon) \quad (41)$$
which together with (39) implies
$$\|e_i\| \le \xi_i + \varepsilon. \quad (42)$$
IV. ROBUST ELM FOR REGRESSION
Based on the preliminary results of the last section, we now formulate the CTM-constrained RELM (CTM-RELM) and the SR-constrained RELM (SR-RELM) for noisy input and output data.
A. CTM-Based RELM
By adding the second order moment constraint of Theorem 1 to the basic ELM formulation, the CTM-RELM is formulated as
$$\min_{\beta, b, \theta, \xi} L_P = b + C\sum_{i=1}^{N}\xi_i + D\sum_{i=1}^{N}\theta_i$$
$$\text{s.t. } \|h(x_i)\beta - t_i\| \le \varepsilon + \xi_i$$
$$\left\|\left(\Sigma^i_{yy}\right)^{\frac{1}{2}}\omega\right\| \le \theta_i\sqrt{\tau/m}$$
$$\xi_i \ge 0, \quad i = 1, 2, \ldots, N$$
$$\|\beta\| \le b \quad (43)$$
where $C$ is defined as in (7) and $D$ is a penalty coefficient that controls the deviation of the prediction errors.
B. SR-Based RELM
Likewise, Theorem 2 leads to an SOCP problem formulation
$$\min_{\beta, b, \xi} L_P = b + C\sum_{i=1}^{N}\xi_i$$
$$\text{s.t. } \left\|\begin{bmatrix}\left(\Sigma^i_{yy}\right)^{\frac{1}{2}}\omega \\ \bar{h}(x_i)\beta - \bar{t}_i\end{bmatrix}\right\| \le (\xi_i + \varepsilon)\sqrt{\tau/m}$$
$$\xi_i \ge 0, \quad i = 1, 2, \ldots, N$$
$$\|\beta\| \le b. \quad (44)$$
V. KERNELIZATION FOR RELMS
As discussed in Section II-C, the kernel trick is adopted in SVR. In fact, the kernel trick can also be applied to ELM. We have indicated that the explicit nonlinear feature mapping with random hidden nodes in ELM can bring some advantages compared to SVR. Nevertheless, this does not mean that the kernel trick is useless for ELM. In reality, the universal approximation capability of ELM cannot always be fulfilled due to the curse of dimensionality. Kernel methods enable access to very high-dimensional, even infinite-dimensional, feature spaces at a low computational cost in both space and time [28]. In the case of a Gaussian kernel, the feature map lives in an infinite-dimensional space, i.e., it has an infinite number of hidden nodes $L$, which enables ELM to work as a universal approximator [18]. Some related works have adopted the kernel method in ELM and produced desirable results [23], [29].³ In this section, we slightly modify the CTM and SR constraints and then incorporate them into the kernelized formulations of RELMs.
³For terminology consistency, we use kernel ELM to refer to the kernel-trick-based ELM and its variants.
It follows from [23] that the optimal weight matrix $\beta$ in ELM has the form
$$\beta = H^T P \quad (45)$$
where $P \in \mathbb{R}^{N \times m}$. Once the model, i.e., $\beta$, is determined, we can make predictions by
$$f(x) = h(x)\beta = \sum_{i=1}^{N} h(x)h(x_i)^T P_i. \quad (46)$$
Based on the definition of the ELM kernel, we have
$$f(x) = \sum_{i=1}^{N} k(x, x_i)P_i \quad (47)$$
where $k(\cdot, \cdot)$ is a kernel function. The kernel matrix of ELM is defined as [16]
$$K = HH^T: \quad K_{i,j} = h(x_i) \cdot h(x_j)^T = k(x_i, x_j); \quad (48)$$
when the number of training samples is $N$, $K \in \mathbb{R}^{N \times N}$.
any kernel function can be used provided it produces symmet-
ric, positive semi-definite kernel matrices [28]. In our case, we
restrict Knot only to satisfy the modularity but also have all
of its entries being real numbers. Thus, we can decompose K
in such way
K=K1
2K1
2(49)
where K1/2is real symmetric. From (45) and (48), we get
βTβ=PTKP =K1
2PTK1
2P(50)
which leads to β=K1/2P.
We now give the kernelized CTM constraint
$$\left\|\left(\Sigma^i_{yy}\right)^{\frac{1}{2}}\right\|\left\|\begin{bmatrix}K^{\frac{1}{2}}P \\ -\mathbf{1}\end{bmatrix}\right\| \le \theta_i\sqrt{\tau/m}, \quad i = 1, 2, \ldots, N \quad (51)$$
where $\mathbf{1}$ is a matrix with all entries equal to 1 and dimension $m \times m$. Note that (51) is a sufficient condition for (21), since
$$\left\|\left(\Sigma^i_{yy}\right)^{\frac{1}{2}}\omega\right\| \le \left\|\left(\Sigma^i_{yy}\right)^{\frac{1}{2}}\right\|\|\omega\| \le \left\|\left(\Sigma^i_{yy}\right)^{\frac{1}{2}}\right\|\left\|\begin{bmatrix}K^{\frac{1}{2}}P \\ -\mathbf{1}\end{bmatrix}\right\| \quad (52)$$
where $\omega = [\beta^T, -\mathbf{1}]^T$, and the kernelized CTM-RELM takes the form
$$\min_{P, b, \theta, \xi} L_P = b + C\sum_{i=1}^{N}\xi_i + D\sum_{i=1}^{N}\theta_i$$
$$\text{s.t. } \|K_{i,:}P - t_i\| \le \varepsilon + \xi_i$$
$$\left\|\left(\Sigma^i_{yy}\right)^{\frac{1}{2}}\right\|\left\|\begin{bmatrix}K^{\frac{1}{2}}P \\ -\mathbf{1}\end{bmatrix}\right\| \le \theta_i\sqrt{\tau/m}$$
$$\xi_i \ge 0, \quad i = 1, 2, \ldots, N$$
$$\left\|K^{\frac{1}{2}}P\right\| \le b. \quad (53)$$
A similar approach yields the kernelized SR-RELM formulation
$$\min_{P, b, \xi} L_P = b + C\sum_{i=1}^{N}\xi_i$$
$$\text{s.t. } \left\|\begin{bmatrix}\left\|\left(\Sigma^i_{yy}\right)^{\frac{1}{2}}\right\|\begin{bmatrix}K^{\frac{1}{2}}P \\ -\mathbf{1}\end{bmatrix} \\ K_{i,:}P - t_i\end{bmatrix}\right\| \le (\xi_i + \varepsilon)\sqrt{\tau/m}$$
$$\xi_i \ge 0, \quad i = 1, 2, \ldots, N$$
$$\left\|K^{\frac{1}{2}}P\right\| \le b. \quad (54)$$
VI. COVARIANCE IN THE FEATURE SPACE
We first calculate the covariance when the nonlinear mapping functions are known explicitly. We write $h(x)$ as
$$h(x) = [G(a_1, b_1, x), \ldots, G(a_L, b_L, x)] \quad (55)$$
where $a_i, b_i$ are the randomly generated weights and bias connecting the input to the $i$th hidden node, and $G(a_i, b_i, x)$ is the activation function.
A statistical method is provided to derive the covariance in the feature space. For each input $x_i$, we randomly generate $Z$ samples $\{x_i^1, x_i^2, \ldots, x_i^Z\}$ according to the distribution of $x_i$ with mean $\bar{x}_i$ and covariance $\Sigma^i_{xx}$. Then the covariance matrix of $h(x_i)$ can be approximated by
$$\Sigma^i_{hh} = \frac{1}{Z}\sum_{z=1}^{Z}\tilde{h}\left(x_i^z\right)^T\tilde{h}\left(x_i^z\right) \quad (56)$$
where
$$\tilde{h}\left(x_i^z\right) = h\left(x_i^z\right) - \frac{1}{Z}\sum_{z=1}^{Z}h\left(x_i^z\right). \quad (57)$$
However, the covariance in the kernel case is more delicate and cannot be derived explicitly. Note that in the kernelized cases (53) and (54), only the norm of the covariance $\Sigma^i_{yy}$ is needed, that is
$$\left\|\Sigma^i_{yy}\right\| = \left\|\begin{bmatrix}\Sigma^i_{hh} & 0 \\ 0 & \Sigma^i_{tt}\end{bmatrix}\right\| = \max\left\{\left\|\Sigma^i_{hh}\right\|, \left\|\Sigma^i_{tt}\right\|\right\}. \quad (58)$$
$\|\Sigma^i_{tt}\|$ can be readily calculated, and we now give a solution to approximate $\|\Sigma^i_{hh}\|$. The $L_2$-norm of the real symmetric matrix $\Sigma^i_{hh}$ equals its largest eigenvalue. Let $\lambda$ and $v$ be an eigenvalue and its corresponding eigenvector
$$\lambda v = \Sigma^i_{hh}v. \quad (59)$$
It has been proved in [30] that $\lambda$ of $\Sigma^i_{hh}$ also satisfies
$$Z\lambda\alpha = \tilde{K}^i\alpha \quad (60)$$
where $\tilde{K}^i = K^i - LK^i - K^iL + LK^iL$ and $L \in \mathbb{R}^{Z \times Z}$ with each entry $L_{j,l} = 1/Z$. Here, the $Z \times Z$ matrix $K^i$ is defined by
$$K^i_{j,l} := k\left(x_i^j, x_i^l\right) = h\left(x_i^j\right) \cdot h\left(x_i^l\right)^T. \quad (61)$$
Fig. 2. Positions of the WiFi AP, offline calibration points, and online testing points in the simulated field.
Hence, we can compute the $L_2$-norm of $\Sigma^i_{hh}$ from the set of eigenvalues of $\tilde{K}^i$
$$\left\|\Sigma^i_{hh}\right\| = \frac{1}{Z}\max \lambda\left(\tilde{K}^i\right) \quad (62)$$
where $\lambda(\tilde{K}^i)$ is the set of all eigenvalues of $\tilde{K}^i$.
VII. PERFORMANCE VERIFICATION
A. Simulation Results and Evaluation
We developed a simulation environment using MATLAB R2013a in order to evaluate the performance of our proposed algorithms before any real-world experiment was conducted. As shown in Fig. 2, we assume a 20 × 20 m room where four WiFi APs are installed at the four corners of the room. The most commonly used path loss model for indoor environments is the ITU indoor propagation model [31]. Since it provides a relation between the total path loss PL (dBm) and the distance $d$ (m), it is adopted to simulate the WiFi signal generated from each WiFi AP. The indoor path loss model can be expressed as
$$\mathrm{PL}(d) = \mathrm{PL}_0 - 10\alpha\log(d) + X \quad (63)$$
where $\mathrm{PL}_0$ is the path loss coefficient, set to $-40$ dBm in our simulation, $\alpha$ is the path loss exponent, and $X$ represents random noise.
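As a small illustration of generating simulated RSS from (63), consider the sketch below; PL0 = −40 dBm follows the text above, while the path loss exponent value and the function names are illustrative assumptions (the exponent is not stated here).

```python
# Simulated RSS readings from the indoor path loss model (63).
import numpy as np

rng = np.random.default_rng(0)

def simulate_rss(positions, aps, alpha=3.0, pl0=-40.0):
    """positions: n x 2 points, aps: k x 2 AP coordinates -> n x k RSS (dBm)."""
    d = np.linalg.norm(positions[:, None, :] - aps[None, :, :], axis=2)
    d = np.maximum(d, 1.0)                       # avoid log(0) right at an AP
    noise = rng.standard_normal((len(positions), len(aps)))   # X ~ N(0, 1)
    return pl0 - 10.0 * alpha * np.log10(d) + noise

aps = np.array([[0, 0], [20, 0], [0, 20], [20, 20]])   # four corner APs, as in Fig. 2
train_pts = rng.uniform(0, 20, size=(30, 2))
rss = simulate_rss(train_pts, aps)
```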
The distribution of the RSS indication from four real-world APs in our IPS is illustrated in Fig. 3. As shown in Fig. 3, the signals collected by one AP can be quite different even at the same location due to noises and outliers. Therefore, four different types of data with disturbances are generated based on (63), i.e., data mixed with Gaussian noise $X \sim N(0, 1)$, data mixed with Student's noise $X \sim T(0, 1, 1)$, data mixed with gamma noise $X \sim \mathrm{Ga}(1, 1)$, and data contaminated by one-sided outliers (20% contamination rate),⁴ to test the performance of RELMs. To make our simulation more practical, 100 testing samples are artificially generated at each training point and testing point, respectively, using (63) with different perturbations.
⁴The strategy of adding outliers here is similar to that of [13].
Fig. 3. RSS index of distribution of four APs at one position.
We apply our RELMs to the simulated data and compare our proposed algorithms with basic ELM, OPT-ELM, kernel ELM, and SVR [32]. In the CTM-RELM formulation, there are three hyperparameters, $C$, $D$, and $\tau$, to be tuned. $C$ and $D$ are both selected by a grid method from the exponential sequence $[2^{-5}, 2^{-4}, \ldots, 2^{5}]$ utilizing fivefold cross-validation on the training data set. $\tau$ increases from 0.1 to 1 with a step size of 0.1. In the SR-RELM case, there are two hyperparameters, $C$ and $\tau$, to be tuned; they are both selected with the same strategy as for CTM-RELM. For both RELMs, the slack variable $\varepsilon$ is empirically selected as 0.05. The SOCP problems are solved with the CVX MATLAB toolbox [33]. Since the performance of ELM and its variants is not sensitive to the number of hidden nodes $L$ as long as it is larger than some threshold [23], we fix $L$ as 500 for our proposed algorithms, basic ELM, and OPT-ELM to facilitate the comparison of computational costs. The width $\lambda$ of the Gaussian kernel used in SVR and kernel ELM is selected from the exponential sequence $[2^{-5}, 2^{-4}, \ldots, 2^{5}]$ utilizing fivefold cross-validation.
Four performance measures are introduced: mean root square error (MRSE), standard deviation (STD), WCE, and REP over $r$ repeated realizations. Note that MRSE, STD, and WCE in this case are taken as means over the $r$ repeated realizations. REP is measured by the deviation of the MRSE over the repeated realizations; this measure is proposed based on the fact that ELM with the same parameters, e.g., the number of hidden nodes, on the same training data set may produce quite different results. $r$ in our experiment is selected as 30:
$$\mathrm{MRSE} = \frac{1}{r}\sum_{j=1}^{r}\left[\frac{1}{s}\sum_{i=1}^{s}\left\|t_i - h_i\hat{\beta}\right\|\right]_j$$
$$\mathrm{STD} = \frac{1}{r}\sum_{j=1}^{r}\left[\sqrt{\sum_{i=1}^{s}\left(\left\|t_i - h_i\hat{\beta}\right\| - \frac{1}{s}\sum_{i=1}^{s}\left\|t_i - h_i\hat{\beta}\right\|\right)^2}\,\right]_j$$
$$\mathrm{WCE} = \frac{1}{r}\sum_{j=1}^{r}\left[\max_{i \in S}\left\|t_i - h_i\hat{\beta}\right\|\right]_j$$
$$\mathrm{REP} = \sqrt{\frac{1}{r}\sum_{j=1}^{r}\left[\frac{1}{s}\sum_{i=1}^{s}\left\|t_i - h_i\hat{\beta}\right\| - \mathrm{MRSE}\right]_j^2}$$
where $s$ is the number of testing samples and $S$ is the index set of the testing samples, i.e., $[1, 2, \ldots, s]$.

Fig. 4. Cumulative percentile of error distance for simulation data sets.
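In code, with `errs[j, i]` holding the error norm for testing sample i in realization j, the four measures as reconstructed above read as follows; the array layout is an assumption for illustration.

```python
# errs: r x s array, errs[j, i] = ||t_i - h_i beta_hat|| in realization j.
import numpy as np

def performance_measures(errs):
    per_run_mean = errs.mean(axis=1)                  # (1/s) sum_i ||.|| per realization
    mrse = per_run_mean.mean()                        # MRSE
    std = np.sqrt(((errs - per_run_mean[:, None])**2).sum(axis=1)).mean()
    wce = errs.max(axis=1).mean()                     # mean worst-case error
    rep = np.sqrt(((per_run_mean - mrse)**2).mean())  # repeatability
    return mrse, std, wce, rep
```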
As shown in Fig. 4, the two proposed algorithms outperform the other four algorithms in terms of accuracy and WCE. More exact numbers can be found in Table II, from which we see that the REP of the RELM-based systems is improved compared with the basic ELM- and OPT-ELM-based ones. The enhancement of the REP is due to the additional constraints in our algorithms, which shrink the size of the solution search space. Note that the shrinking happening here is different from the one discussed in [21], in which the loss of solution search freedom of SVR is caused by the redundant $b$ [16].
B. Evaluation in Real-World IPSs
The system architecture of our WiFi-based IPS is shown in Fig. 5. The main components of this system are existing commercial WiFi APs, mobile devices with WiFi capability, a location server, and a web-based monitoring system. The following is a brief operation procedure of our WiFi-based IPS. First of all, a data collection App for Android devices was developed. After the mobile device turns on its WiFi module, it collects RSS information from different APs every second and sends this information to a location server. The responsibility of the location server is to analyze the RSS and calculate the estimated position of the mobile device. Then, the user can obtain his or her real-time position through our web-based monitoring system directly on his or her mobile device.
We conducted real-world indoor localization experiments to evaluate the performance of the proposed RELM approaches. The testbed is the Internet of Things Laboratory in the School of Electrical and Electronic Engineering, Nanyang Technological University. The area of the testbed is around 580 m² (35.1 × 16.6 m).
TABLE II: Comparison of Simulation Results
Fig. 5. System architecture of our WiFi-based IPS.
The layout of the testbed is shown in Fig. 7. Eight D-Link DIR-605L WiFi cloud routers are utilized as WiFi APs for our experiments. The Android application is installed on a Samsung I929 Galaxy SII mobile phone. All the WiFi RSS fingerprints at the offline calibration points and online testing points are collected using this phone for performance evaluation.
The RELM model was built up by the following steps. During the offline phase, 30 offline calibration points were selected and 200 WiFi RSS fingerprints were collected at each point. The positions of these 30 offline calibration points are demonstrated in Fig. 7. By leveraging these 6000 WiFi RSS fingerprints and their physical positions as training inputs and training targets (outputs), respectively, the RELM model was constructed. During the online phase, we continued to collect WiFi RSS fingerprints at online testing points for five days. On each day, two distinct online testing points were selected in order to reflect the environmental dynamics.

Fig. 6. Cumulative percentile of error distance for IPS testing results.
Fig. 7. Positions of the WiFi APs, offline calibration points, and online testing points in the test-bed.
TABLE III: Comparison of Experimental Testing Results
The positions of these ten online testing points are also presented in Fig. 7. Two hundred WiFi RSS fingerprints are collected at each point. The parameter setting for the proposed and compared algorithms in this experiment is similar to the one introduced in Section VII-A, apart from the number of hidden units, which is set to 1000.
The testing results with respect to the four performance measures given in Section VII-A are shown in Table III. Fig. 6 illustrates the comparison in terms of the cumulative percentile of error distance, which shows that the proposed CTM-RELM can provide higher accuracy and has an obvious effect in reducing the STD compared to ELM and OPT-ELM. On the other hand, SR-RELM also gives an accuracy as good as CTM-RELM and has better performance in confining the WCE. The above results are reasonable, since the two robust constraints have different emphases. In addition, both CTM-RELM and SR-RELM give better REP performance than basic ELM.
The proposed algorithms incur longer training times due to the introduction of second order moment constraints instead of linear constraints. However, a slightly longer training time is not a concern in IPSs, considering that it is the calibration phase, e.g., the procedure of radio map generation, that accounts for the large body of the time consumption. Besides, RELMs inherit their simplicity, e.g., random feature mapping, dispensing with the bias $b$, and the single-layer structure, from ELM; therefore their training time is still competitive compared with SVR and its variants.
VIII. CONCLUSION
Before concluding this paper, we provide some important discussions.
1) Choice of the Measure for Accuracy: It is noteworthy that we adopt MRSE instead of the conventional root mean square error (RMSE) as our measure. This is because MRSE makes more practical sense than RMSE for IPSs and has been widely adopted in indoor positioning contests [2]. The measure of REP is introduced in particular for ELM because it produces variation over repeated realizations; namely, with the same parameter settings, e.g., the number of hidden nodes, on the same training set, ELM may produce different results. This is mainly because the number of hidden units is not infinite, so the universal approximation using SLFNs with random nodes may not be exact [18]. However, it should be noted that most iteratively tuned algorithms, such as BP, actually also face this unreproducibility issue, and from the perspective of STD, ELM is even more stable.
2) Abandonment of Kernelized RELMs: Although we have proposed the kernelized CTM-RELM and SR-RELM, we did not adopt them in the simulation or the real-world experiments due to their limits in scaling. Firstly, the number of decision variables in the kernelized CTM-RELM formulation is $N \times m + 2N + 1$, while that of the CTM-RELM is $L \times m + 2N + 1$. Considering that the number of training data $N$ is usually several times
larger than the number of hidden nodes $L$, we would encounter memory issues if we implemented the kernelized CTM-RELM. The same logic applies to the SR-RELM case. Secondly, kernel-based algorithms enjoy computational efficiency in optimization problems when the dimension $d$ of the features is larger than $N$, while in our case the size of the features is far smaller than the number of training samples; therefore it is not cost-effective to conduct training with kernels.⁵ Thirdly, prediction by kernel-based methods takes $O(Nd)$ time since it uses the dual variables, while prediction using random-hidden-nodes-based methods via the primal variables, e.g., ELM, OPT-ELM, and RELMs, only takes $O(d)$ [28]. The testing times listed in Tables II and III are consistent with the above claim. Although a slightly longer training time is within the tolerance of IPSs, fast prediction speed is highly demanded, as IPS servers need to provide real-time positioning services for large crowds in dense indoor environments such as shopping malls, cinemas, and airports. However, for small-scale data sets, or where the size of the features is very large, kernelized RELMs can be leveraged.
3) Implementation Tricks for RELMs: Calculating the covariance and mean is tricky for regression problems, since one would otherwise have to use only one sample to approximate its corresponding statistics. In this paper, we take advantage of a specificity of the learning problem in IPSs: grouping. The whole data set can be divided into several groups by their calibration points, and in any group the members should "theoretically" have the same RSS (input) and coordinates (output). In reality this is impossible due to the uncertainties discussed above. However, the members of one group can be used to calculate the mean and covariance needed to represent that group in the problem formulations (see the sketch after this list). By this "grouping" trick, we can further reduce the number of constraints in (43) and (44) from $N$ to $N/g$, where $g$ is the size of a group, i.e., the number of samplings at one calibration point. This trick can be directly extended to RELMs for classification problems.
4) Assumption About Additive Noises in the Feature Space: Though we assume that the noises lying in the feature space are additive, the simulation was conducted under circumstances in which the inputs were corrupted with additive disturbances, and the simulation results demonstrate that RELMs are effective for these cases. In fact, assuming the noises in the feature space to be additive is conventional among a number of ML and optimization researchers [34]–[36]. It is possible that our assumption becomes invalid under some circumstances, e.g., inputs mixed with multiplicative noises. However, the case of multiplicative noises lying in RSS is rare in indoor environments [37]. When they are not significant, those multiplicative noises can be seen as outliers, and Section VII-A has shown that RELMs can address outliers (20% contamination rate) well.
⁵Indeed, kernel ELM possesses a fast training speed, because it adopts the normal equation method, i.e., it is equality-constrained-optimization-based [16]. But when inequality constraints are added in the convex optimization setting (inequality constraints can bring about the benefit of sparsity in solutions [23], [29]), the normal closed-form method may not work anymore. Some recent work on ELM, e.g., sparse ELM [29], has already used the inequality-constraints-based formulation. Thus, the above claim about the computational costs still holds for kernel ELM.
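As a concrete illustration of the grouping trick in discussion point 3, the following sketch computes one mean and covariance per calibration point; the function and variable names are assumptions for illustration.

```python
# Grouping trick: samples sharing a calibration point form one group, which
# contributes a single mean and covariance to the RELM constraints (N -> N/g).
import numpy as np

def group_statistics(H, T, point_ids):
    """H: N x L features, T: N x m targets, point_ids: calibration point of each sample."""
    stats = []
    for pid in np.unique(point_ids):
        idx = point_ids == pid
        Y = np.hstack([H[idx], T[idx]])        # y_i = [h(x_i), t_i]
        y_bar = Y.mean(axis=0)                 # group mean
        Sigma_yy = np.cov(Y, rowvar=False)     # (L+m) x (L+m) group covariance
        stats.append((y_bar, Sigma_yy))
    return stats                               # one constraint per group
```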
To sum up, this paper proposed CTM-RELM and SR-RELM to address the problem of noisy measurements in IPSs by introducing the CTM and SR constraints into OPT-ELM, and further gave two SOCP-based formulations. The kernelized RELMs and the method for calculating the theoretical covariance matrix in the feature space were further discussed. Simulation results and real-world indoor localization experiments both demonstrated that the CTM-RELM-based IPS can provide higher accuracy and smaller STD than IPSs based on the other algorithms, while the SR-RELM-based IPS can provide better accuracy and smaller WCEs. The REP of the proposed algorithms was also demonstrated to be better.
Future work will focus on how to reduce the computational costs of the proposed algorithms for IPSs with large data sets; sparse matrix techniques will be leveraged to make this possible. Meanwhile, more performance testing of RELMs will be conducted for classification problems with different combinations of $\sigma_1$ and $\sigma_2$ for the norms.
REFERENCES
[1] H. Zou, X. Lu, H. Jiang, and L. Xie, “A fast and precise indoor localiza-
tion algorithm based on an online sequential extreme learning machine,”
Sensors, vol. 15, no. 1, pp. 1804–1824, Jan. 2015.
[2] Q. Yang, S. J. Pan, and V. W. Zheng, “Estimating location using Wi-Fi,”
IEEE Intell. Syst., vol. 23, no. 1, pp. 8–13, Jan./Feb. 2008.
[3] C. Cortes and V. Vapnik, “Support-vector networks,” Mach. Learn.,
vol. 20, no. 3, pp. 273–297, Mar. 1995.
[4] H. Liu, H. Darabi, P. Banerjee, and J. Liu, “Survey of wireless indoor
positioning techniques and systems,” IEEE Trans. Syst., Man, Cybern. C,
Appl. Rev., vol. 37, no. 6, pp. 1067–1080, Nov. 2007.
[5] N. Kothari, B. Kannan, E. D. Glasgow, and M. B. Dias, “Robust indoor localization on a commercial smart phone,” Proc. Comput. Sci., vol. 10, pp. 1114–1120, Aug. 2012.
[6] W. Meng, W. Xiao, W. Ni, and L. Xie, “Secure and robust Wi-Fi finger-
printing indoor localization,” in Proc. Int. Conf. Indoor Position. Indoor
Nav. (IPIN), Guimarães, Portugal, Sep. 2011, pp. 1–7.
[7] G.-B. Huang and L. Chen, “Convex incremental extreme learning
machine,” Neurocomputing, vol. 70, no. 16, pp. 3056–3062, Oct. 2007.
[8] W. Xi-Zhao, S. Qing-Yan, M. Qing, and Z. Jun-Hai, “Architecture selec-
tion for networks trained with extreme learning machine using local-
ized generalization error model,” Neurocomputing, vol. 102, pp. 3–9,
Feb. 2013.
[9] W. Xiao, P. Liu, W.-S. Soh, and Y. Jin, “Extreme learning machine for
wireless indoor localization,” in Proc. 11th Int. Conf. Inf. Process. Sens.
Netw., Beijing, China, Apr. 2012, pp. 101–102.
[10] J. Liu, Y. Chen, M. Liu, and Z. Zhao, “SELM: Semi-supervised
ELM with application in sparse calibrated location estimation,”
Neurocomputing, vol. 74, no. 16, pp. 2566–2572, Sep. 2011.
[11] R. Wang, Y.-L. He, C.-Y. Chow, F.-F. Ou, and J. Zhang, “Learning
ELM-tree from big data based on uncertainty reduction,” Fuzzy Sets
Syst., vol. 258, pp. 79–100, Jan. 2015.
[12] J. Zhai, H. Xu, and Y. Li, “Fusion of extreme learning machine with
fuzzy integral,” Int. J. Uncertain. Fuzz. Knowl.-Based Syst., vol. 21,
pp. 23–34, Dec. 2013.
[13] P. Horata, S. Chiewchanwattana, and K. Sunat, “Robust extreme learning
machine,” Neurocomputing, vol. 102, pp. 31–44, Feb. 2013.
[14] L. M. Ni, Y. Liu, Y. C. Lau, and A. P. Patil, “LANDMARC: Indoor location sensing using active RFID,” Wireless Netw., vol. 10, no. 6, pp. 701–710, Nov. 2004.
[15] H. Zou, H. Wang, L. Xie, and Q.-S. Jia, “An RFID indoor positioning system by using weighted path loss and extreme learning machine,” in Proc. 1st IEEE Int. Conf. Cyber-Phys. Syst. Netw. Appl. (CPSNA), Taipei, Taiwan, Aug. 2013, pp. 66–71.
[16] G.-B. Huang, “An insight into extreme learning machines: Random
neurons, random features and kernels,” Cogn. Comput., vol. 6, no. 3,
pp. 1–15, Sep. 2014.
[17] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, “Extreme learning
machine: Theory and applications,” Neurocomputing, vol. 70, nos. 1–3,
pp. 489–501, Dec. 2006.
[18] G.-B. Huang, L. Chen, and C.-K. Siew, “Universal approximation using
incremental constructive feedforward networks with random hidden
nodes,” IEEE Trans. Neural Netw., vol. 17, no. 4, pp. 879–892, Jul. 2006.
[19] M.-B. Li, G.-B. Huang, P. Saratchandran, and N. Sundararajan, “Fully complex extreme learning machine,” Neurocomputing, vol. 68, pp. 306–314, Oct. 2005.
[20] G. Huang, S. Song, J. N. Gupta, and C. Wu, “Semi-supervised and
unsupervised extreme learning machines,” IEEE Trans. Cybern., vol. 44,
no. 12, pp. 2405–2417, Dec. 2014.
[21] G.-B. Huang, X. Ding, and H. Zhou, “Optimization method based
extreme learning machine for classification,” Neurocomputing, vol. 74,
no. 1, pp. 155–163, Dec. 2010.
[22] A. J. Smola and B. Schölkopf, “A tutorial on support vector regression,” Stat. Comput., vol. 14, no. 3, pp. 199–222, Aug. 2004.
[23] G.-B. Huang, H. Zhou, X. Ding, and R. Zhang, “Extreme learning
machine for regression and multiclass classification,” IEEE Trans. Syst.,
Man, Cybern. B, Cybern., vol. 42, no. 2, pp. 513–529, Apr. 2012.
[24] V. Vapnik, S. E. Golowich, and A. Smola, “Support vector method for
function approximation, regression estimation, and signal processing,”
in Proc. Adv. Neural Inf. Process. Syst., 1997, pp. 281–287.
[25] P. K. Shivaswamy, C. Bhattacharyya, and A. J. Smola, “Second order
cone programming approaches for handling missing and uncertain data,”
J. Mach. Learn. Res., vol. 7, pp. 1283–1314, Jul. 2006.
[26] G. Huang, S. Song, C. Wu, and K. You, “Robust support vector regres-
sion for uncertain input and output data,” IEEE Trans. Neural Netw.
Learn. Syst., vol. 23, no. 11, pp. 1690–1700, Nov. 2012.
[27] J. Navarro, “A very simple proof for the multivariate Chebyshev inequality,” Commun. Stat. Theory Methods, Dec. 2013.
[28] K. P. Murphy, Machine Learning: A Probabilistic Perspective.
Cambridge, MA, USA: MIT Press, 2012.
[29] Z. Bai, G.-B. Huang, D. Wang, H. Wang, and M. B. Westover, “Sparse
extreme learning machine for classification,” IEEE Trans. Cybern.,
vol. 25, no. 4, pp. 836–843, Apr. 2014.
[30] B. Schölkopf, A. Smola, and K.-R. Müller, “Nonlinear component analysis as a kernel eigenvalue problem,” Neural Comput., vol. 10, no. 5, pp. 1299–1319, Jul. 1998.
[31] T. Chrysikos, G. Georgopoulos, and S. Kotsopoulos, “Site-specific val-
idation of ITU indoor path loss model at 2.4 GHz,” in Proc. IEEE Int.
Symp. World Wireless Mobile Multimedia Netw. Workshops (WoWMoM),
Kos, Greece, Jun. 2009, pp. 1–6.
[32] J. A. Suykens et al., Least Squares Support Vector Machines, vol. 4. River Edge, NJ, USA: World Scientific, 2002.
[33] M. C. Grant, S. P. Boyd, and Y. Ye. (Jun. 2014). CVX: MATLAB
Software for Disciplined Convex Programming (Web Page and Software).
[Online]. Available: http://cvxr.com/cvx
[34] H. Xu, C. Caramanis, and S. Mannor, “Robustness and regularization of
support vector machines,” J. Mach. Learn. Res., vol. 10, pp. 1485–1510,
Jul. 2009.
[35] D. Bertsimas, D. B. Brown, and C. Caramanis, “Theory and applica-
tions of robust optimization,” SIAM Rev., vol. 53, no. 3, pp. 464–501,
Aug. 2011.
[36] K. P. Bennett and E. Parrado-Hernández, “The interplay of optimiza-
tion and machine learning research,” J. Mach. Learn. Res.,vol.7,
pp. 1265–1281, Jul. 2006.
[37] A. Goldsmith, Wireless Communications. Cambridge, NY, USA:
Cambridge Univ. Press, 2005.
Xiaoxuan Lu received the B.Eng. degree from the Nanjing University of
Aeronautics and Astronautics, Nanjing, China, in 2013. He is currently
pursuing the M.Eng. degree from the School of Electrical and Electronic
Engineering, Nanyang Technological University, Singapore.
His current research interests include machine learning, mobile computing,
signal processing, and their applications to energy reduction in buildings.
Han Zou received the B.Eng. (First Class Honors) degree from Nanyang
Technological University, Singapore, in 2012, where he is currently pursuing
the Ph.D. degree from the School of Electrical and Electronic Engineering.
He is currently a Graduate Student Researcher with the Berkeley Education
Alliance for Research in Singapore Limited, Singapore. His current research
interests include wireless sensor networks, mobile computing, indoor posi-
tioning and navigation systems, indoor human activity sensing and inference,
and occupancy modeling in buildings.
Hongming Zhou received the B.Eng. and Ph.D. degrees from Nanyang
Technological University, Singapore, in 2009 and 2014, respectively.
He is currently a Research Fellow with the School of Electrical and
Electronic Engineering, Nanyang Technological University. His current
research interests include classification and regression algorithms such as
extreme learning machines, neural networks, and support vector machines
as well as their applications including heating, ventilation and air condition-
ing system control applications, biometrics identification, image retrieval, and
financial index prediction.
Lihua Xie (F’07) received the B.E. and M.E. degrees from the Nanjing
University of Science and Technology, Nanjing, China, in 1983 and 1986,
respectively, and the Ph.D. degree from the University of Newcastle,
Callaghan, NSW, Australia, in 1992, all in electrical engineering.
Since 1992, he has been at the School of Electrical and Electronic
Engineering, Nanyang Technological University, Singapore. From 1986 to
1989, he was a Teacher at the Department of Automatic Control, Nanjing
University of Science and Technology. From 2006 to 2011, he was a
Changjiang Visiting Professor at the South China University of Technology,
Guangzhou, China. From 2011 to 2014, he was a Professor and the Head of
Division of Control and Instrumentation at Nanyang Technological University,
Singapore. His current research interests include robust control and estimation,
networked control systems, multiagent networks, and unmanned systems. He
has published over 260 journal papers and co-authored two patents and six
books.
Prof. Xie has served as an Editor of IET Book Series in Control and an
Associate Editor of a number of journals including the IEEE TRANSACTIONS
ON AUTOMATIC CONTROL,Automatica, the IEEE TRANSACTIONS ON
CONTROL SYSTEMS TECHNOLOGY, and the IEEE TRANSACTIONS ON
CIRCUITS AND SYSTEMS-II.
Guang-Bin Huang (SM’04) received the B.Sc. degree in applied mathematics
and M.Eng. degree in computer engineering from Northeastern University,
Shenyang, China, in 1991 and 1994, respectively, and the Ph.D. degree in
electrical engineering from Nanyang Technological University, Singapore, in
1999.
He was at the Applied Mathematics Department and Wireless
Communication Department of Northeastern University. From 2001, he was
an Assistant Professor and an Associate Professor (with tenure) at the School
of Electrical and Electronic Engineering, Nanyang Technological University,
Singapore. He is the Principal Investigator of several industrial sponsored
research and development projects. He has also led/implemented several key
industrial projects including the Chief Architect/Designer and the Technical
Leader of Singapore Changi Airport Cargo Terminal 5 Inventory Control
System Upgrading Project. His current research interests include big data
analytics, human computer interface, brain computer interface, image process-
ing/understanding, machine-learning theories and algorithms, extreme learning
machine, and pattern recognition. He was the Highly Cited Researcher listed
in 2014—The World’s Most Influential Scientific Minds by Thomson Reuters.
He was also invited to give keynotes on numerous international conferences.
Dr. Huang was the recipient of the Best Paper Award from the IEEE
TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS in
2013. He is currently serving as an Associate Editor of Neurocomputing,
Cognitive Computation,Neural Networks, and the IEEE TRANSACTIONS ON
CYBERNETICS.
... Further, second-order cone programming, which is extensively used in robust convex optimization problems, was specifically integrated into ELM [29]. Probabilistic regularized ELM (PRELM) [30], considering the modeling error distribution, constructed a new objective function to minimize the mean and variance of the modeling error. ...
... The computational burden of the novel ELM is relatively heavy [29] Probabilistic regularized ELM PRELM 2018 Its objective function still retains the square loss function ( 2 -norm). However, it has been pointed out that the 2 -norm is prone to be badly affected by outliers. ...
Article
Full-text available
The extreme learning machine (ELM) is a well-known approach for training single hidden layer feedforward neural networks (SLFNs) in machine learning. However, ELM is most effective when used for regression on datasets with simple Gaussian distributed error because it often employs a squared loss in its objective function. In contrast, real-world data is often collected from unpredictable and diverse contexts, which may contain complex noise that cannot be characterized by a single distribution. To address this challenge, we propose a robust mixture ELM algorithm, called Mixture-ELM, that enhances modeling capability and resilience to both Gaussian and non-Gaussian noise. The Mixture-ELM algorithm uses an adjusted objective function that blends Gaussian and Laplacian distributions to approximate any continuous distribution and match the noise. The Gaussian mixture accurately models the residual distribution, while the inclusion of the Laplacian distribution addresses the limitations of the Gaussian distribution in identifying outliers. We derive a solution to the novel objective function using the expectation maximization (EM) and iteratively reweighted least squares (IRLS) algorithms. We evaluate the effectiveness of the algorithm through numerical simulation and experiments on benchmark datasets, thereby demonstrating its superiority over other state-of-the-art machine learning methods in terms of robustness and generalization.
... In [47], the authors utilized a stacked AE-ELM to efficiently compress seismic data, achieving comparable performance with much lower training times and effort, compared to the benchmark. Lu et al. [48] utilized the AE-ELM structure for fingerprintbased indoor positioning, efficiently removing the noise from the measurements, increasing the positioning accuracy and reducing the worst-case error, compared to the other IPS. The suitability of stacked AE-ELM was also evaluated in [21], where the proposed model outperformed the commonly utilized fingerprinting methods such as k-NN. ...
Article
Full-text available
Indoor positioning based on machinelearning models has attracted widespread interest in the last few years, given its high performance and usability. Supervised, semi-supervised, and unsupervised models have thus been widely used in this field not only to estimate the user position but also to compress, clean, and denoise fingerprinting datasets. Some scholars have focused on developing, improving and optimizing machine learning models to provide accurate solutions to the end user. This paper introduces a novel method to initialize the input weights in Autoencoder Extreme Learning Machine (AE-ELM), namely Factorised Input Data (FID), which is based on the normalized form of the orthogonal component of the input data. AE-ELM with Factorised Input Data (FID) weight initialization is used to efficiently reduce the radio map. Once the dimensionality of the dataset is reduced, we use k-nearest neighbors (k-NN) to perform the position estimation. This research work includes a comparative analysis with several traditional ways to initialize the input weights in AE-ELM, showing that FID provides a significantly better reconstruction error. Finally, we perform an assessment with 13 indoor positioning datasets collected from different buildings and in different countries. We show that the dimensionality of the datasets can be reduced more than 11 times on average, while the positioning error suffers only a small increment of 15% (on average) in comparison to the baseline.
... During the optimization process, we mainly focus on the optimization problem of the connection weight β and the tuning parameter ζ. To obtain consistent and globally optimal estimators for the iterative process, we implement an l1-norm extreme learning machine (l1-norm ELM), which replaces the l2 loss with the l1 loss function (i.e., median regression) [35], as the initial model. This model is insensitive to outliers. ...
Article
Full-text available
In time series forecasting with outliers and random noise, parameter estimation in a neural network via minimizing the l2 loss is unreliable. Therefore, an adaptive rescaled lncosh loss function is proposed in this article to handle time series modeling with outliers and random noise. It overcomes the limitation of the single distribution of traditional loss functions and can switch among the l1, l2, and Huber losses. A tuning parameter in the loss function is estimated by using a "working" likelihood approach according to estimated residuals. From the proposed loss function, a robust adaptive rescaled lncosh neural network (RARLNN) regression model is developed for highly accurate predictions. In the training phase of the model, an iterative learning procedure is presented to estimate the tuning parameter and train the neural network in iterations. A new prediction interval construction method is also developed based on quantile theory. The proposed RARLNN model is applied to two groups of wind speed forecasting tasks. The results show that the proposed RARLNN model is more conducive to enhancing forecasting accuracy and stability from the perspectives of noise distribution and outliers.
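For intuition, a rescaled lncosh loss of the kind described can be written as a² ln(cosh(r/a)); a numerically stable Python version (our own formulation; the article's exact rescaling may differ) is:

import numpy as np

def rescaled_lncosh(r, a):
    # a**2 * ln(cosh(r/a)), computed stably via
    # ln(cosh(x)) = |x| + log1p(exp(-2|x|)) - ln 2.
    # Behaves like r**2 / 2 for small |r| (l2-like) and like a*|r| for
    # large |r| (l1-like); the tuning parameter a interpolates between
    # the two, giving Huber-like behavior.
    x = np.abs(r) / a
    return a**2 * (x + np.log1p(np.exp(-2.0 * x)) - np.log(2.0))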
Article
Short-range indoor localization is one of the key necessities in automation industries and healthcare setups. With its increasing demand, the need for more precise positioning systems is rapidly growing. Millimeter-wave (mm-wave) technology is emerging to enable highly precise localization performance. However, due to the limited availability of low-cost mm-wave sensors, it is challenging to accelerate research on real data. Furthermore, noise due to the hardware components of a sensor incurs perturbation in the received signal, which corrupts the estimation of range and the angle of arrival (AoA). Owing to the huge success of data-driven algorithms in solving regression problems, we propose a data-driven approach that employs two deep learning (DL) based regression models, i.e., a dense neural network and a convolutional neural network, and compares their performance with two machine learning based regression models, linear regression and support vector regression, to reduce errors in the estimates of AoA and range obtained via a mm-wave sensor. Our main goal is to optimize the localization measurements acquired from a low-cost mm-wave sensor for short-range applications. This will accelerate the development of proofs of concept and foster research on cost-effective mm-wave based indoor positioning systems. All experiments were conducted using over-the-air data collected with a mm-wave sensor, and the validity of the experiments was verified in unseen environments. The results obtained from our experimental evaluations, both for in-sample and out-of-sample testing, indicate improvements in the estimation of AoA and range with our proposed DL models. The improvements achieved were greater than 15% for AoA estimation and over 85% for range estimation compared to the baseline methods.
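The correction setup described above amounts to regressing ground truth on noisy sensor estimates; a toy sketch with synthetic data and a simple SVR standing in for the paper's DL models (all values and names are illustrative):

import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
range_true = rng.uniform(0.5, 5.0, 200)                               # metres
range_raw = 1.2 * range_true + rng.normal(0, 0.1, 200)                # biased, noisy sensor estimate
model = SVR().fit(range_raw[:150, None], range_true[:150])            # learn the systematic error
mae = np.mean(np.abs(model.predict(range_raw[150:, None]) - range_true[150:]))
print(mae)                                                            # held-out mean absolute error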
Article
Full-text available
span lang="EN-US">In recent years, indoor navigation and localization has become a popular alternative to paper-based maps. However, the most popular navigation approach of using the global positioning satellite (GPS) does not work well indoors and the majority of current approaches designed for indoor navigation does not provide realistic solutions to key challenges, including implementation cost, accuracy, longer computation processes, and practicality. The step count method was proposed to solve these issues. This paper introduces GoMap - a mobile-based indoor locator map application, which combines the step counting technique and augmented reality (AR). The design and architecture of GoMap is described in this paper. Two small-scale studies were conducted to demonstrate the performance of GoMap. The first study found that GoMap’s performance and accuracy was comparable to other step counting app such as “Google Fit”. The second part of the study demonstrated the feasibility of the application when used in a real-world setting. The findings from the studies show that GoMap is a promising application that can help the indoor navigation process.</span
Article
With the advent of Bluetooth Low Energy (BLE)-enabled smartphones, there has been considerable interest in investigating BLE-based distancing/positioning methods (e.g., for social distancing applications). In this paper, we present a novel hybrid learning method to support Mobile Ad-hoc Distancing (MAD) / Positioning (MAP) using BLE-enabled smartphones. Compared to traditional BLE-based distancing/positioning methods, the hybrid learning method provides the following unique features and contributions. First, it combines unsupervised learning, supervised learning and genetic algorithms for enhancing distance estimation accuracy. Second, unsupervised learning is employed to identify three pseudo channels/clusters for enhanced RSSI data processing. Third, its underlying mechanism is based on a new pattern-inspired approach to enhance the machine learning process. Fourth, it provides a flagging mechanism to alert users if a predicted distance is accurate or not. Fifth, it provides a model aggregation scheme with an innovative two-dimensional genetic algorithm to aggregate the distance estimation results of different machine learning models. As an application of hybrid learning for distance estimation, we also present a new MAP scenario with an iterative algorithm to estimate mobile positions in an ad-hoc environment. Experimental results show the effectiveness of the hybrid learning method. In particular, hybrid learning without flagging and with flagging outperform the baseline by 57 and 65 percent respectively in terms of mean absolute error. By means of model aggregation, a further 4 percent improvement can be realized. The hybrid learning approach can also be applied to previous work to enhance distance estimation accuracy and provide valuable insights for further research.
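A bare-bones sketch of the pseudo-channel idea (our reading: cluster RSSI feature vectors, then fit one distance regressor per cluster; the flagging and GA aggregation stages are omitted, and all names are ours):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVR

def fit_pseudo_channel_models(rssi, dist, n_channels=3, seed=0):
    # Unsupervised step: partition RSSI samples into pseudo channels.
    km = KMeans(n_clusters=n_channels, n_init=10, random_state=seed).fit(rssi)
    # Supervised step: one distance regressor per pseudo channel.
    models = [SVR().fit(rssi[km.labels_ == c], dist[km.labels_ == c])
              for c in range(n_channels)]
    return km, models

def predict_distance(km, models, rssi):
    labels = km.predict(rssi)          # route each sample to its channel
    return np.array([models[c].predict(x[None, :])[0]
                     for c, x in zip(labels, rssi)])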
Article
Seamless positioning and navigation requires the integration of outdoor and indoor positioning systems. Until recently, these systems have mostly functioned in silos. Though GNSS has become the standalone solution for outdoors, no unified positioning modality has been found for indoor environments; Wi-Fi and Bluetooth signals, however, are popular choices. Increased adoption of different machine learning techniques for indoor–outdoor context detection and localization can be witnessed in the recent literature. The difficulty of precise data annotation, the need for sensor fusion, and the effect of different hardware configurations pose critical challenges that affect the success of indoor–outdoor (IO) positioning systems. Wireless sensor-based techniques are explicitly programmed, hence estimating locations dynamically becomes challenging. Machine learning and deep learning techniques can be used to overcome such situations and react appropriately by self-learning through experiences and actions, without human intervention or reprogramming. Hence, the focus of this work is to present readers with a comprehensive survey of the applicability of machine learning and deep learning to achieve seamless navigation. The paper systematically discusses the application perspectives, research challenges, and frameworks of mostly ML-based and a few DL-based positioning approaches. Comparisons across various parameters, such as the technology used, the procedure applied, the output metric, and the challenges, are presented, along with experimental results on benchmark datasets. The paper contributes to bridging IO localization approaches with IO detection techniques so as to pave the way for research on seamless positioning. Recent advances and possible future research directions in the context of IO localization have also been articulated.
Article
Full-text available
Low-cost localization solutions for indoor environments have a variety of real-world applications, ranging from emergency evacuation to mobility aids for people with disabilities. In this paper, we introduce a methodology for indoor localization using a commercial smartphone, combining dead reckoning and WiFi signal strength fingerprinting. Additionally, we outline an automated procedure for collecting WiFi calibration data that uses a robot equipped with a laser rangefinder and a fiber optic gyroscope. These measurements, along with a generated robot map of the environment, are combined using a particle filter for robust pose estimation. The uniqueness of our approach lies in the implementation of the complementary nature of the solution as well as in the efficient adaptation to the smartphone platform. The system was tested with multiple participants in two different indoor environments and achieved localization accuracies on the order of 5 meters, sufficient for a variety of navigation and context-aware applications.
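A minimal particle-filter step of the kind described, fusing a dead-reckoning motion update with a WiFi RSS likelihood (the motion and measurement models, noise levels, and the rss_map lookup are illustrative assumptions):

import numpy as np

def pf_step(particles, weights, step_vec, rss_obs, rss_map, sigma=4.0, rng=None):
    rng = rng or np.random.default_rng()
    # 1) propagate each particle by the odometry step plus process noise
    particles = particles + step_vec + rng.normal(0, 0.3, particles.shape)
    # 2) reweight by how well the predicted RSS matches the observation
    predicted = rss_map(particles)                     # fingerprint model lookup
    weights = weights * np.exp(-np.sum((predicted - rss_obs) ** 2, axis=1)
                               / (2 * sigma**2))
    weights = weights / weights.sum()
    # 3) resample when the effective sample size collapses
    if 1.0 / np.sum(weights**2) < 0.5 * len(weights):
        idx = rng.choice(len(weights), len(weights), p=weights)
        particles = particles[idx]
        weights = np.full(len(weights), 1.0 / len(weights))
    return particles, weights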
Article
Full-text available
Nowadays, developing indoor positioning systems (IPSs) has become an attractive research topic due to the increasing demands on location-based service (LBS) in indoor environments. WiFi technology has been studied and explored to provide indoor positioning service for years in view of the wide deployment and availability of existing WiFi infrastructures in indoor environments. A large body of WiFi-based IPSs adopt fingerprinting approaches for localization. However, these IPSs suffer from two major problems: the intensive costs of manpower and time for offline site survey and the inflexibility to environmental dynamics. In this paper, we propose an indoor localization algorithm based on an online sequential extreme learning machine (OS-ELM) to address the above problems accordingly. The fast learning speed of OS-ELM can reduce the time and manpower costs for the offline site survey. Meanwhile, its online sequential learning ability enables the proposed localization algorithm to adapt in a timely manner to environmental dynamics. Experiments under specific environmental changes, such as variations of occupancy distribution and events of opening or closing of doors, are conducted to evaluate the performance of OS-ELM. The simulation and experimental results show that the proposed localization algorithm can provide higher localization accuracy than traditional approaches, due to its fast adaptation to various environmental dynamics.
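For concreteness, OS-ELM's sequential phase is a recursive least-squares update on the hidden-layer outputs; a compact sketch (the sigmoid hidden nodes and the small ridge term are our choices):

import numpy as np

class OSELM:
    def __init__(self, n_in, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.uniform(-1, 1, (n_in, n_hidden))   # random, fixed
        self.b = rng.uniform(-1, 1, n_hidden)
        self.P = None
        self.beta = None

    def _h(self, X):
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))  # sigmoid nodes

    def fit_initial(self, X, T):                         # offline boot chunk
        H = self._h(X)
        self.P = np.linalg.inv(H.T @ H + 1e-6 * np.eye(H.shape[1]))
        self.beta = self.P @ H.T @ T

    def update(self, X, T):                              # one new online chunk
        H = self._h(X)
        K = self.P @ H.T @ np.linalg.inv(np.eye(len(X)) + H @ self.P @ H.T)
        self.P = self.P - K @ H @ self.P
        self.beta = self.beta + self.P @ H.T @ (T - H @ self.beta)

    def predict(self, X):
        return self._h(X) @ self.beta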
Article
Full-text available
Extreme learning machines (ELMs) have proven to be efficient and effective learning mechanisms for pattern classification and regression. However, ELMs are primarily applied to supervised learning problems. Only a few existing research papers have used ELMs to explore unlabeled data. In this paper, we extend ELMs for both semi-supervised and unsupervised tasks based on the manifold regularization, thus greatly expanding the applicability of ELMs. The key advantages of the proposed algorithms are as follows: 1) both the semi-supervised ELM (SS-ELM) and the unsupervised ELM (US-ELM) exhibit learning capability and computational efficiency of ELMs; 2) both algorithms naturally handle multiclass classification or multicluster clustering; and 3) both algorithms are inductive and can handle unseen data at test time directly. Moreover, it is shown in this paper that all the supervised, semi-supervised, and unsupervised ELMs can actually be put into a unified framework. This provides new perspectives for understanding the mechanism of random feature mapping, which is the key concept in ELM theory. Empirical study on a wide range of data sets demonstrates that the proposed algorithms are competitive with the state-of-the-art semi-supervised or unsupervised learning algorithms in terms of accuracy and efficiency.
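In the semi-supervised case, the manifold regularization amounts to adding a graph-Laplacian penalty to the ELM objective; a rough sketch (the kNN graph, its symmetrization, and the constants C and lam are our assumptions):

import numpy as np
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import laplacian

def ss_elm(X, y, labeled_mask, n_hidden=100, C=1.0, lam=0.1, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1, 1, (X.shape[1], n_hidden))
    b = rng.uniform(-1, 1, n_hidden)
    H = np.tanh(X @ W + b)                            # all points, labeled or not
    A = kneighbors_graph(X, 10, include_self=False)
    L = laplacian(0.5 * (A + A.T)).toarray()          # symmetrized graph Laplacian
    Cmat = np.diag(np.where(labeled_mask, C, 0.0))    # penalize labeled rows only
    Y = np.where(labeled_mask, y, 0.0)[:, None]
    beta = np.linalg.solve(np.eye(n_hidden) + H.T @ Cmat @ H + lam * H.T @ L @ H,
                           H.T @ Cmat @ Y)
    return W, b, beta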
Article
It is clear that the learning speed of feedforward neural networks is in general far slower than required, and this has been a major bottleneck in their applications for the past decades. Two key reasons behind this may be: (1) slow gradient-based learning algorithms are extensively used to train neural networks, and (2) all the parameters of the networks are tuned iteratively by such learning algorithms. Unlike these conventional implementations, this paper proposes a new learning algorithm called extreme learning machine (ELM) for single-hidden layer feedforward neural networks (SLFNs), which randomly chooses hidden nodes and analytically determines the output weights of SLFNs. In theory, this algorithm tends to provide good generalization performance at extremely fast learning speed. The experimental results, based on a few artificial and real benchmark function approximation and classification problems, including very large complex applications, show that the new algorithm can produce good generalization performance in most cases and can learn thousands of times faster than conventional popular learning algorithms for feedforward neural networks.
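The whole training procedure fits in a few lines; a minimal sketch (tanh nodes and uniform initialization are our choices):

import numpy as np

def elm_train(X, T, n_hidden=50, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1, 1, (X.shape[1], n_hidden))   # random hidden weights, never tuned
    b = rng.uniform(-1, 1, n_hidden)                 # random biases
    H = np.tanh(X @ W + b)                           # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ T                     # analytic least-squares solution
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta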
Article
The support-vector network is a new learning machine for two-group classification problems. The machine conceptually implements the following idea: input vectors are non-linearly mapped to a very high-dimensional feature space. In this feature space a linear decision surface is constructed. Special properties of the decision surface ensure high generalization ability of the learning machine. The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors. We here extend this result to non-separable training data. High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated. We also compare the performance of the support-vector network to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.
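With a modern library, the same idea is a few lines: an RBF kernel plays the role of the non-linear map to a high-dimensional feature space, and the soft-margin parameter C handles non-separable data (a toy example of ours, not from the paper):

from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(noise=0.2, random_state=0)   # non-separable two-group toy data
clf = SVC(kernel="rbf", C=1.0).fit(X, y)       # soft-margin kernel SVM
print(clf.score(X, y))                          # training accuracy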
Article
Contents: Support Vector Machines; Basic Methods of Least Squares Support Vector Machines; Bayesian Inference for LS-SVM Models; Robustness; Large Scale Problems; LS-SVM for Unsupervised Learning; LS-SVM for Recurrent Networks and Control.
Article
Extreme learning machines (ELMs) basically give answers to two fundamental learning problems: (1) Can the fundamentals of learning (i.e., feature learning, clustering, regression, and classification) be achieved without tuning hidden neurons (including biological neurons), even when the output shapes and function modeling of these neurons are unknown? (2) Does there exist a unified framework for feedforward neural networks and feature space methods? ELMs, which have built some tangible links between machine learning techniques and biological learning mechanisms, have recently attracted increasing attention from researchers in widespread research areas. This paper provides an insight into ELMs in three aspects, viz., random neurons, random features, and kernels. This paper also shows that in theory ELMs (with the same kernels) tend to outperform support vector machines and their variants in both regression and classification applications, with much easier implementation.
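Under this unified view, the kernel variant of ELM reduces training to a single linear solve, f(x) = [K(x, x_1), ..., K(x, x_N)] (I/C + Ω)^{-1} T with Ω_ij = K(x_i, x_j); a sketch with an RBF kernel (parameter values are illustrative):

import numpy as np

def kernel_elm_fit(X, T, C=10.0, gamma=0.1):
    sq = np.sum(X**2, axis=1)
    Omega = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))  # RBF Gram matrix
    return np.linalg.solve(np.eye(len(X)) / C + Omega, T)               # alpha

def kernel_elm_predict(Xtest, Xtrain, alpha, gamma=0.1):
    d = (np.sum(Xtest**2, 1)[:, None] + np.sum(Xtrain**2, 1)[None, :]
         - 2 * Xtest @ Xtrain.T)
    return np.exp(-gamma * d) @ alpha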
Article
In this short note a very simple proof of Chebyshev's inequality for random vectors is given. This inequality provides a lower bound for the percentage of the population of an arbitrary random vector X with finite mean μ = E(X) and a positive definite covariance matrix V = Cov(X) whose Mahalanobis distance with respect to V from the mean μ is less than a fixed value. The main advantage of the proof is that it is a simple exercise for a first-year probability course. An alternative proof based on principal components is also provided. This proof can be used to study the case of a singular covariance matrix V.
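For reference, the inequality in question is standardly stated as follows (notation ours):

% Multivariate Chebyshev inequality: for a random vector X in R^d with
% mean mu and positive definite covariance V, and any k > 0,
\[
  P\big( (X - \mu)^{\top} V^{-1} (X - \mu) \ge k^{2} \big) \le \frac{d}{k^{2}},
\]
% which follows from Markov's inequality, since
% E[(X - \mu)^{\top} V^{-1} (X - \mu)] = \operatorname{tr}(V^{-1} V) = d.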