Robust Extreme Learning Machine With its
Application to Indoor Positioning
Xiaoxuan Lu, Han Zou, Hongming Zhou, Lihua Xie, Fellow, IEEE, and Guang-Bin Huang, Senior Member, IEEE
Abstract—The increasing demands of location-based services have spurred the rapid development of indoor positioning systems (IPSs), also known as indoor localization systems. However, the performance of IPSs suffers from noisy measurements. In this paper, two kinds of robust extreme learning machines (RELMs), corresponding to the close-to-mean constraint and the small-residual constraint, have been proposed to address the issue of noisy measurements in IPSs. Based on whether the feature mapping in the extreme learning machine is explicit, we respectively provide random-hidden-nodes and kernelized formulations of RELMs by second order cone programming. Furthermore, the computation of the covariance in the feature space is discussed. Simulations and real-world indoor localization experiments are extensively carried out, and the results demonstrate that the proposed algorithms can not only improve the accuracy and repeatability, but also reduce the deviation and worst-case error of IPSs compared with other baseline algorithms.

Index Terms—Indoor positioning system (IPS), robust extreme learning machine (RELM), second order cone programming (SOCP).
I. INTRODUCTION

DUE to the nonline-of-sight transmission channels between a satellite and a receiver, wireless indoor positioning has been extensively studied and a number of solutions have been proposed in the past two decades. Unlike other wireless technologies, such as ultrawideband and radio frequency identification, which require the deployment of extra infrastructure, the existing IEEE 802.11 network infrastructures, such as WiFi routers, are widely available in large numbers of commercial and residential buildings. In addition, nearly every mobile device now is equipped with a WiFi receiver [1].

WiFi-based machine learning (ML) approaches have become popular in indoor positioning in recent years [2]. The fingerprinting method based on WiFi received signal strength (RSS), in particular, has received a lot of attention. The fingerprinting localization procedure usually involves two stages: 1) an offline calibration stage and 2) an online matching stage.
Manuscript received September 4, 2014; revised December 13, 2014; accepted January 25, 2015. Date of publication February 24, 2015; date of current version December 14, 2015. This work was supported in part by the National Research Foundation of Singapore under Grant NRF2011NRF-CRP001-090 and Grant NRF2013EWT-EIRP004-012, and in part by the Natural Science Foundation of China under NSFC 61120106011. This paper was recommended by Associate Editor X. Wang.
The authors are with the School of Electrical and Electronics Engineering, Nanyang Technological University, Singapore 639798 (e-mail: xlu010@ntu.edu.sg).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TCYB.2015.2399420
During the offline stage, a site survey is conducted and the signal strengths received at each location from various access points (APs) are recorded in a radio map. During the online stage, users' positions can be estimated by matching the online RSSs with the fingerprints stored in the radio map. The online matching strategy, based on the relationships between physical locations and the RSS map modeled by different ML algorithms, is crucial for the performance of indoor positioning systems (IPSs). Neural networks (NNs) and support vector machines (SVMs) [3], as two sophisticated ML techniques, have both been utilized in fingerprinting-based indoor positioning [4].

However, both NN- and SVM-based IPSs face two challenges. On one hand, NN and SVM training are time-consuming, and this issue becomes more serious in fingerprinting-based positioning systems, because a large amount of training data is required for generating a radio map. Their high computational costs leave us little leeway, especially in some large-scale scenarios, to improve the performance and robustness of ML-based IPSs. On the other hand, noisy measurements are inevitable, considering that manual observational errors of calibrated points occur throughout the calibration phase. In addition, signal variation and ambient dynamics also affect the signals received by APs. These adverse factors can be considered as uncertainties, which may degrade the performance of IPSs. Many researchers bypass optimizing ML methods to enhance the robustness of IPSs, since doing so would aggravate the problem of slow training. Kothari et al. [5] utilized the integration of complementary localization algorithms of dead reckoning and WiFi signal strength fingerprinting to achieve robust indoor localization; nevertheless, a disadvantage of dead reckoning is that the errors are cumulative, since new positions are calculated solely from previous ones. Meng et al. [6] proposed a robust noniterative three-step location sensing method, but its capability of reducing the worst-case error (WCE) and variance is comparatively limited. Other robust indoor localization algorithms demand either extra infrastructure or users' interaction during calibration phases, which is not cost-efficient in reality.

These undesirable results motivate us to reconsider the problem: can we find an ML technique which is fast in training and has the capability of handling the robustness issue in IPSs? As a novel learning technique, the extreme learning machine (ELM) has demonstrated outstanding performance in training speed, prediction accuracy, and generalization ability [7], [8]. Several IPSs have already leveraged ELM to deliver accurate location estimation with fast training speed [1], [9], [10].
Extended from ELM, this paper proposes two robust ELMs (RELMs), which can be implemented in the random-hidden-nodes form or the kernelized form depending on the situation, to boost the robustness of IPSs.

The problem of uncertainty and robustness has been intensively studied in recent years. Wang et al. [11] proposed an ELM-tree model based on the heuristic of uncertainty reduction, which is computationally lightweight, for big data classification. A fuzzy integral method has been adopted to study probabilistic feed-forward neural networks [12]. Horata et al. [13] proposed an approach, also named RELM, to improve computational robustness by extended complete orthogonal decomposition and outlier robustness by reweighted least squares. Unlike these works, considering the noises in IPSs as discussed above, we propose our algorithm under a stochastic framework. It is worthwhile to mention that RELMs are based on second order cone programming (SOCP), which is widely adopted in robust convex optimization problems. Simulation and real-world experimental results both demonstrate that RELM-based IPSs outperform IPSs based on other baseline algorithms in terms of accuracy, repeatability (REP), and WCE.

An outline of this paper is as follows. In Section II, we introduce the preliminaries for this paper, including the basic components of a WiFi-based IPS, background on ELM, and its comparison with SVR. Two second order moment constraints, i.e., the close-to-mean (CTM) and small-residual (SR) constraints, with their geometric interpretations, are given in Section III. The random-hidden-nodes and kernelized formulations of RELMs are derived in Sections IV and V, respectively. How to calculate the covariance in the feature space is studied in Section VI. In Section VII, the proposed algorithms are evaluated by both simulation and real-world IPSs. The conclusion is drawn in Section VIII.
II. PRELIMINARIES

A. WiFi Indoor Positioning

A large body of indoor positioning problems can be cast as regression problems. As shown in Table I, the input variable $\mathbf{x} = (x_1, x_2, \ldots, x_d)$ is a vector of RSS values received from the APs in the environment, and $\mathbf{t} = (t_1, t_2)$ is the indoor 2-D physical coordinate of a target's location. When an AP is undetectable at a position, its corresponding RSS is taken as $-100$ dBm. The problem here is to train and approximate the regression model.

Although in some works the procedure of collecting signal strength involves physically moving a wireless device all around the target area, as in [14] and [15], we only pick out some spatially representative locations, i.e., reference (calibration) points, from the target area, and conduct sampling at each reference point for a period of time to build up a radio map.
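For concreteness, the sketch below shows one way the fingerprint data of Table I can be organized (we use Python for all illustrative snippets here; the paper's own experiments use MATLAB). All sizes and values are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Radio map: N reference points, d access points, 2-D physical coordinates.
# The RSS of an undetectable AP is recorded as -100 dBm, as described above.
N, d = 30, 8                              # assumed numbers of points and APs
rng = np.random.default_rng(0)

X = rng.uniform(-90, -30, size=(N, d))    # placeholder RSS fingerprints (dBm)
X[rng.random((N, d)) < 0.1] = -100.0      # mark ~10% of AP readings as undetected
T = rng.uniform(0, 20, size=(N, 2))       # 2-D coordinates of reference points (m)

# The regression task of this section: learn f such that f(X[i]) ~ T[i].
```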
B. Introduction to ELM

Originally inspired by biological learning to overcome the challenging issues faced by back propagation (BP) learning algorithms, ELM is a kind of ML algorithm based on a generalized single-hidden-layer feedforward NN (SLFN) architecture [16]. It has been demonstrated to provide good generalization performance at an extremely fast learning speed [17]–[19].

TABLE I. Input variable: RSS (x) and output: location (t).

Let $\Upsilon = \{(\mathbf{x}_i, \mathbf{t}_i);\ i = 1, 2, \ldots, N\}$ be a training set consisting of patterns, where $\mathbf{x}_i \in \mathbb{R}^{1 \times d}$ and $\mathbf{t}_i \in \mathbb{R}^{1 \times m}$; then the goal of regression is to find the relationship between $\mathbf{x}_i$ and $\mathbf{t}_i$. Since the only parameters to be optimized are the output weights, the training of ELM is equivalent to solving a least squares problem [20].

In the training process, the first stage is that the hidden neurons of ELM map the inputs onto a feature space

$$h : \mathbf{x}_i \mapsto h(\mathbf{x}_i) \tag{1}$$

where $h(\mathbf{x}_i) \in \mathbb{R}^{1 \times L}$.

We denote $\mathbf{H}$ as the hidden layer output matrix (randomized matrix)

$$\mathbf{H} = \begin{bmatrix} h(\mathbf{x}_1) \\ h(\mathbf{x}_2) \\ \vdots \\ h(\mathbf{x}_N) \end{bmatrix}_{N \times L} \tag{2}$$

with $L$ the dimension of the feature space, and $\beta \in \mathbb{R}^{L \times m}$ as the output weight matrix that connects the hidden layer with the output layer. Then, each output of ELM is given by

$$\mathbf{t}_i = h(\mathbf{x}_i)\beta, \quad i = 1, 2, \ldots, N. \tag{3}$$
ELM theory aims to reach not only the smallest training error but also the smallest norm of the output weights [16]

$$\min_{\xi,\ \beta \in \mathbb{R}^{L \times m}} L_P = \frac{1}{2}\|\beta\|_{\sigma_1}^{p_1} + \frac{C}{2}\sum_{i=1}^{N}\|\xi_i\|_{\sigma_2}^{p_2}$$
$$\text{s.t. } \|h(\mathbf{x}_i)\beta - \mathbf{t}_i\|_2^2 = \xi_i, \quad i = 1, 2, \ldots, N \tag{4}$$

where $\sigma_1 > 0$, $\sigma_2 > 0$, $p_1, p_2 = 0, 1/2, 1, 2, \ldots, +\infty$,¹ $C$ is the penalty coefficient on the training errors, and $\xi_i \in \mathbb{R}^m$ is the error vector with respect to the $i$th training pattern.

A simple example of the above is the basic ELM [17]

$$\min_{\beta \in \mathbb{R}^{L \times m}} L_P = \sum_{i=1}^{N}\xi_i \quad \text{s.t. } \|h(\mathbf{x}_i)\beta - \mathbf{t}_i\|^2 = \xi_i, \quad i = 1, 2, \ldots, N \tag{5}$$

which can be solved by the least squares method

$$\beta = \mathbf{H}^{\dagger}\mathbf{T} \tag{6}$$

where $\mathbf{H}^{\dagger}$ is the Moore–Penrose generalized inverse of $\mathbf{H}$.

¹Unless explicitly specified, $\sigma_1 = \sigma_2 = 2$ for all norm notations in this paper.
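As an illustration of (1)–(3) and (5)–(6), a minimal basic-ELM sketch follows: a random sigmoid hidden layer plus the Moore–Penrose solution. The activation function, dimensions, and data are assumptions of this example, not choices made by the paper.

```python
import numpy as np

def elm_features(X, A, b):
    """Random sigmoid hidden layer h(x) of Eq. (1); A is d x L, b is 1 x L."""
    return 1.0 / (1.0 + np.exp(-(X @ A + b)))

rng = np.random.default_rng(0)
d, L, m = 8, 500, 2                        # input dim, hidden nodes, output dim
A = rng.standard_normal((d, L))            # random input weights (never trained)
b = rng.standard_normal((1, L))            # random hidden biases

X_train = rng.standard_normal((100, d))    # placeholder training data
T_train = rng.standard_normal((100, m))

H = elm_features(X_train, A, b)            # hidden layer output matrix, Eq. (2)
beta = np.linalg.pinv(H) @ T_train         # least squares solution, Eq. (6)
T_hat = elm_features(X_train, A, b) @ beta # predictions via Eq. (3)
```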
Extended from the basic ELM, [21] proposed an optimization-based ELM (OPT-ELM) for the binary classification problem by introducing inequality constraints. We follow [21] to give a form of OPT-ELM for regression problems:

$$\min_{\xi,\ \beta \in \mathbb{R}^{L \times m}} L_P = \frac{1}{2}\|\beta\|^2 + \frac{C}{2}\sum_{i=1}^{N}\xi_i$$
$$\text{s.t. } \|h(\mathbf{x}_i)\beta - \mathbf{t}_i\| \le \varepsilon + \xi_i,\quad \xi_i \ge 0,\ i = 1, 2, \ldots, N \tag{7}$$

where $\varepsilon$ is a slack variable. This formulation is very similar to support vector regression (SVR) in a nonlinear case [3], [22], which is of the following form:

$$\min_{\xi, \mathbf{w}, b} L_{P_{\mathrm{SVM}}} = \frac{1}{2}\|\mathbf{w}\|^2 + \frac{C}{2}\sum_{i=1}^{N}\xi_i$$
$$\text{s.t. } \|\mathbf{w}\cdot\phi(\mathbf{x}_i) + b - \mathbf{t}_i\| \le \varepsilon + \xi_i,\quad \xi_i \ge 0,\ i = 1, 2, \ldots, N \tag{8}$$

where $\phi(\cdot)$ is the nonlinear feature mapping function in SVR, $\mathbf{w}$ is the output weight vector, and $b$ is the approximation (output) bias. $\varepsilon$ and $\xi_i$ are as defined in the OPT-ELM case.
Detailed comparisons between ELM and SVM for classification problems are given in [21] and [23]; in the next section we extend this comparison to regression problems. For convenience of description, we henceforth follow [16] in referring to the formulation of (7) as OPT-ELM, while basic ELM stands for the formulation of (5). The terminology ELM in the rest of this paper has a broader meaning, which can be considered as the collection of basic ELM and its random-hidden-nodes-based variants.²
C. Comparisons Between ELM and SVR

Both formulations of ELM and SVR are within the scope of quadratic programming; however, the decision variable $b$, i.e., the bias term, does not exist in ELM.

SVR and its variants emphasize the importance of the bias $b$ in their implementation. The reason is that the separation capability of SVM was considered more important than its regression capability when SVM was first proposed to handle binary classification applications. Against this background, its universal approximation capability may somehow have been neglected [3]. Due to the inborn reason that the feature mapping $\phi(\cdot)$ in SVR is unknown, it is difficult to study the universal approximation capability of SVR without the explicitness of the feature mapping. Since $\phi(\cdot)$ is unknown and may not have the universal approximation capability, given a target function $f(\cdot)$ and any small precision $\varepsilon$, there may not exist a $\mathbf{w}$ such that $\mathbf{w}\cdot\phi(\mathbf{x}) \approx f(\mathbf{x})$. In other words, there may exist some systematic errors even if SVM and its variants with appropriate kernels can classify different classes well, and these systematic errors need to be absorbed by the bias $b$. This may be the reason why, in principle, the bias $b$ has to remain in the optimization constraints [16].

²We particularly avoid including kernel ELM and its variants in the above collection, given the fact that they do not possess the most significant property of ELM, namely, random feature mapping.
On the other hand, all the parameters of the ELM mapping $h(\mathbf{x})$ are randomly generated, and $h(\mathbf{x})$ is ultimately known to users. According to [17]–[19], ELM with almost any nonlinear piecewise continuous function $h(\mathbf{x})$ has the universal approximation capability. Therefore, the bias $b$ is not necessary in the output nodes of ELM.

In addition, from the optimization point of view, fewer decision variables to be determined implies lower computational costs, and this computational superiority becomes more obvious as the scale of the training data gets larger.

Kernel ELM is somewhat superior to SVR owing to its flexibility in kernels. Namely, the feature mapping used to form the kernels can be an unknown mapping or a random feature mapping. More on kernel ELM will be given in Section V.

Huang [16] pointed out that the "redundant" $b$ renders SVR suboptimal compared with ELM if the same kernels are used in both, because the feasible solution space of SVR is a subset of the ELM feasible solution space.

We shall indicate that the main difference between ELM and SVR lies in their different starting points. SVR [24] was developed at first as an extension of SVM. As mentioned above, SVM was designed for binary classification at first, and the subsequent variants for regression problems were developed on the basis of SVM without addressing the problem caused by $b$. By contrast, ELM was originally proposed for regression, the feature mappings $h(\mathbf{x})$ are known, and the universal approximation capability was considered in the first place. Thus, in ELM, the approximation error tends to be zero and $b$ should not be present [16], [21], [23].
III. ROBUST ELM

A. Uncertainties of Input and Output Data

RELM is proposed under a stochastic framework. Assume that both the input $\mathbf{x}$ and output data $\mathbf{t}$ are perturbed by noises. Since $\mathbf{H}$ is the feature space after the nonlinear mapping from the input space, if the input data is contaminated, $\mathbf{H}$ is also mixed with disturbances. We follow [25] in assuming that the disturbances in the feature space are additive:

$$h(\mathbf{x}_i) = h(\mathbf{x}_i)_{\mathrm{true}} + (\iota_1)_i,\qquad \mathbf{t}_i = (\mathbf{t}_i)_{\mathrm{true}} + (\iota_2)_i \tag{9}$$

where $(\iota_1)_i$ and $(\iota_2)_i$ are uncorrelated perturbations in the feature space and output space with proper dimensions, respectively. The new vector $\mathbf{y}_i \in \mathbb{R}^{1 \times (L+m)}$ is the $i$th input and output observation, i.e., $\mathbf{y}_i = [h(\mathbf{x}_i), \mathbf{t}_i]$. We now give the following definitions:

$$\bar{h}(\mathbf{x}_i) = E(h(\mathbf{x}_i)),\quad \bar{\mathbf{t}}_i = E(\mathbf{t}_i)$$
$$\Sigma_{hh}^{i} = \mathrm{Cov}(h(\mathbf{x}_i), h(\mathbf{x}_i)),\quad \Sigma_{tt}^{i} = \mathrm{Cov}(\mathbf{t}_i, \mathbf{t}_i) \tag{10}$$

where $E(\cdot)$ and $\mathrm{Cov}(\cdot)$ denote the expectation and covariance operators for random variables, respectively. Since the perturbations in the feature space $(\iota_1)_i$ and output space $(\iota_2)_i$ are uncorrelated, i.e., $\Sigma_{ht}^{i} = 0$, we have

$$\bar{\mathbf{y}}_i = E([h(\mathbf{x}_i), \mathbf{t}_i]) = [\bar{h}(\mathbf{x}_i), \bar{\mathbf{t}}_i]$$
$$\Sigma_{yy}^{i} = \mathrm{Cov}(\mathbf{y}_i, \mathbf{y}_i) = \begin{bmatrix} \Sigma_{hh}^{i} & 0 \\ 0 & \Sigma_{tt}^{i} \end{bmatrix}_{(L+m)\times(L+m)}. \tag{11}$$
The $i$th prediction error is denoted by $\mathbf{e}_i \in \mathbb{R}^{1 \times m}$ and its expectation $\bar{\mathbf{e}}_i$ is defined as follows:

$$\mathbf{e}_i = h(\mathbf{x}_i)\beta - \mathbf{t}_i,\qquad \bar{\mathbf{e}}_i = \bar{h}(\mathbf{x}_i)\beta - \bar{\mathbf{t}}_i. \tag{12}$$

It follows from [25] and [26] that, by inserting CTM and SR constraints into SVR, the predictions can be made robust to perturbations in the data set.

CTM is a criterion requiring the prediction errors to be insensitive to the distribution of the noises in the input and output data

$$\Pr_{\mathbf{x}_i, \mathbf{y}_i}\{|e_i - \bar{e}_i| \ge \theta_i\} \le \eta,\quad i = 1, 2, \ldots, N \tag{13}$$

where $\mathbf{x}_i, \mathbf{y}_i$ are the input and output data, $\theta_i$ is a confidence threshold, and $\eta$ denotes the maximum tolerance of the deviation.

An alternative way to boost the robustness is to restrict the residual to be small, which leads to the SR constraint

$$\Pr_{\mathbf{x}_i, \mathbf{y}_i}\{|e_i| \ge \xi_i + \varepsilon\} \le \eta \tag{14}$$

where $\xi_i$ corresponds to the prediction error and $\varepsilon$ is a slack variable. Compared with the CTM constraint, the SR constraint requires the estimator to be robust against deviations that lead to larger estimation errors, rather than against deviations from the center. In fact, both CTM and SR constraints are robust constraints utilized to bound the probabilities of highly deviated errors subject to second order moment constraints.
B. Sufficient Condition of the CTM Constraint

It should be pointed out that the above two robust constraints only consider a scalar output, whereas the outputs of IPSs are usually vectors. Moreover, ELM and kernel ELM algorithms are inherently different from SVR; therefore, different constraints should be provided for our problem setting. We now give the CTM constraint used in this paper

$$\Pr_{h(\mathbf{x}_i), \mathbf{t}_i}\left\{\|\mathbf{e}_i - \bar{\mathbf{e}}_i\|^2 \ge \theta_i^2\right\} \le \tau,\quad i = 1, 2, \ldots, N \tag{15}$$

where $\theta_i$ is still a confidence threshold and $\tau$ here stands for some probability. Nevertheless, CTM constraints in this form are intractable. The multidimensional Chebyshev inequality is leveraged to convert the original constraints into tractable ones.
Lemma 1 [27]: Let $\mathbf{z}$ be an $m$-dimensional random row vector with expected value $\bar{\mathbf{z}}$ and positive-definite covariance $\Sigma$; then

$$\Pr\left\{(\mathbf{z} - \bar{\mathbf{z}})\Sigma^{-1}(\mathbf{z} - \bar{\mathbf{z}})^T \ge \theta^2\right\} \le \frac{m}{\theta^2}. \tag{16}$$

Proposition 1: For $\mathbf{z}$ and $\Sigma$ defined in Lemma 1, if $\|\mathbf{z}\|^2 \ge \|\Sigma\|\theta^2$, then $\mathbf{z}\Sigma^{-1}\mathbf{z}^T \ge \theta^2$.

Proof: Since $\Sigma$ is a real-valued symmetric matrix, it can be diagonalized as $\Sigma = \mathbf{P}^{-1}\Lambda\mathbf{P}$, where $\Lambda$ is a real-valued matrix with the eigenvalues of $\Sigma$ on its diagonal. It can be shown that

$$\Lambda \le \|\Sigma\|\mathbf{I} \Rightarrow \Lambda^{-1} \ge \|\Sigma\|^{-1}\mathbf{I} \tag{17}$$

which leads to

$$\mathbf{z}\Sigma^{-1}\mathbf{z}^T = \mathbf{z}\mathbf{P}^{-1}\Lambda^{-1}\mathbf{P}\mathbf{z}^T \ge \frac{\mathbf{z}\mathbf{z}^T}{\|\Sigma\|} \tag{18}$$

and (18) gives rise to

$$\|\mathbf{z}\|^2 \ge \|\Sigma\|\theta^2 \Rightarrow \mathbf{z}\Sigma^{-1}\mathbf{z}^T \ge \theta^2. \tag{19}$$

Proposition 1 also implies

$$\Pr\left\{\|\mathbf{z}\|^2 \ge \|\Sigma\|\theta^2\right\} \le \Pr\left\{\mathbf{z}\Sigma^{-1}\mathbf{z}^T \ge \theta^2\right\}. \tag{20}$$
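A quick Monte-Carlo check of Lemma 1 and of the event inclusion behind (20) can be run as follows. The Gaussian distribution and the numbers are assumptions of this sketch; the bounds themselves are distribution-free given the moments.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 2, 200000
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])                 # positive definite
Z = rng.multivariate_normal(np.zeros(m), Sigma, size=n)    # zero-mean samples

theta2 = 4.0
quad = np.einsum('ij,jk,ik->i', Z, np.linalg.inv(Sigma), Z)  # z Sigma^{-1} z^T

# Lemma 1: Pr{z Sigma^{-1} z^T >= theta^2} <= m / theta^2
print(np.mean(quad >= theta2), '<=', m / theta2)

# Proposition 1 / Eq. (20): the sphere event is contained in the ellipsoid event
sphere = np.sum(Z**2, axis=1) >= np.linalg.norm(Sigma, 2) * theta2
print(np.mean(sphere), '<=', np.mean(quad >= theta2))
```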
Theorem 1: Let $\beta \in \mathbb{R}^{L \times m}$ and $\omega = [\beta^T, -\mathbf{1}]^T \in \mathbb{R}^{(L+m) \times m}$, and let $\Sigma_{yy}^{i}$ be defined in (11); then a sufficient condition for (15) is

$$\left\|\left(\Sigma_{yy}^{i}\right)^{\frac{1}{2}}\omega\right\| \le \theta_i\sqrt{\tau/m} \tag{21}$$

where $\mathbf{1}$ is a vector with all entries equal to 1 and of proper length.

Proof: Substituting $\mathbf{e}_i - \bar{\mathbf{e}}_i$ for $\mathbf{z}$ in (16), we have

$$\Pr_{h(\mathbf{x}_i), \mathbf{t}_i}\left\{(\mathbf{e}_i - \bar{\mathbf{e}}_i)\left(\Sigma_{ee}^{i}\right)^{-1}(\mathbf{e}_i - \bar{\mathbf{e}}_i)^T \ge \theta_i^2\right\} \le \frac{m}{\theta_i^2} \tag{22}$$

which, together with (20), leads to

$$\Pr_{h(\mathbf{x}_i), \mathbf{t}_i}\left\{\|\mathbf{e}_i - \bar{\mathbf{e}}_i\|^2 \ge \theta_i^2\right\} \le \Pr_{h(\mathbf{x}_i), \mathbf{t}_i}\left\{(\mathbf{e}_i - \bar{\mathbf{e}}_i)\left(\Sigma_{ee}^{i}\right)^{-1}(\mathbf{e}_i - \bar{\mathbf{e}}_i)^T \ge \frac{\theta_i^2}{\left\|\Sigma_{ee}^{i}\right\|}\right\} \le \frac{m\left\|\Sigma_{ee}^{i}\right\|}{\theta_i^2}. \tag{23}$$

Thus, $m\|\Sigma_{ee}^{i}\| \le \theta_i^2\tau$ is a sufficient condition for (15). By taking into account that

$$\Sigma_{ee}^{i} = \omega^T\Sigma_{yy}^{i}\omega \tag{24}$$

inserting (24) into $m\|\Sigma_{ee}^{i}\| \le \theta_i^2\tau$ and then taking the square root on both sides, (21) follows.
C. Sufficient Condition of the SR Constraint

The sufficient condition of the SR constraint can be derived in the same fashion. The SR constraint in our case is

$$\Pr_{h(\mathbf{x}_i), \mathbf{t}_i}\left\{\|\mathbf{e}_i\|^2 \ge (\xi_i + \varepsilon)^2\right\} \le \tau,\quad i = 1, 2, \ldots, N. \tag{25}$$

Theorem 2: Let $\beta \in \mathbb{R}^{L \times m}$ and $\omega = [\beta^T, -\mathbf{1}]^T \in \mathbb{R}^{(L+m) \times m}$, and let $\Sigma_{yy}^{i}$ be defined in (11); then a sufficient condition for (25) is

$$\left\|\begin{bmatrix}\left(\Sigma_{yy}^{i}\right)^{\frac{1}{2}}\omega \\ \bar{h}(\mathbf{x}_i)\beta - \bar{\mathbf{t}}_i\end{bmatrix}\right\| \le (\xi_i + \varepsilon)\sqrt{\tau/m} \tag{26}$$

where $\mathbf{1}$ is a vector with all entries equal to 1 and of proper length.

Proof: Taking $\mathbf{e}_i\mathbf{e}_i^T \in \mathbb{R}$ as a random variable, from Markov's inequality we have

$$\Pr_{h(\mathbf{x}_i), \mathbf{t}_i}\left\{\|\mathbf{e}_i\|^2 \ge (\xi_i + \varepsilon)^2\right\} = \Pr_{h(\mathbf{x}_i), \mathbf{t}_i}\left\{\mathbf{e}_i\mathbf{e}_i^T \ge (\xi_i + \varepsilon)^2\right\} \le \frac{E\left(\mathbf{e}_i\mathbf{e}_i^T\right)}{(\xi_i + \varepsilon)^2}.$$
Fig. 1. The shaded area indicates the possible region the random variable may fall into.
Denote by $\mathrm{tr}(\cdot)$ the trace operator of a matrix

$$E\left(\mathbf{e}_i\mathbf{e}_i^T\right) = E\left(\mathrm{tr}\left(\mathbf{e}_i^T\mathbf{e}_i\right)\right) = E\left(\mathrm{tr}\left(\mathbf{e}_i^T\mathbf{e}_i\right)\right) - \mathrm{tr}\left(\bar{\mathbf{e}}_i^T\bar{\mathbf{e}}_i\right) + \mathrm{tr}\left(\bar{\mathbf{e}}_i^T\bar{\mathbf{e}}_i\right) = \mathrm{tr}\left(\Sigma_{ee}^{i} + \bar{\mathbf{e}}_i^T\bar{\mathbf{e}}_i\right). \tag{27}$$

$\Sigma_{ee}^{i}$ and $\bar{\mathbf{e}}_i^T\bar{\mathbf{e}}_i$ are both positive semi-definite, which implies that $\Sigma_{ee}^{i} + \bar{\mathbf{e}}_i^T\bar{\mathbf{e}}_i$ is positive semi-definite. Since

$$\left\|\Sigma_{ee}^{i} + \bar{\mathbf{e}}_i^T\bar{\mathbf{e}}_i\right\| = \max\{\lambda_1, \ldots, \lambda_m\} \tag{28}$$

where $\lambda_i$ stands for an eigenvalue of $\Sigma_{ee}^{i} + \bar{\mathbf{e}}_i^T\bar{\mathbf{e}}_i$, we have

$$\mathrm{tr}\left(\Sigma_{ee}^{i} + \bar{\mathbf{e}}_i^T\bar{\mathbf{e}}_i\right) \le m\left\|\Sigma_{ee}^{i} + \bar{\mathbf{e}}_i^T\bar{\mathbf{e}}_i\right\| \tag{29}$$

which leads to

$$m\left\|\Sigma_{ee}^{i} + \bar{\mathbf{e}}_i^T\bar{\mathbf{e}}_i\right\| = m\left\|\begin{bmatrix}\left(\Sigma_{yy}^{i}\right)^{\frac{1}{2}}\omega \\ \bar{h}(\mathbf{x}_i)\beta - \bar{\mathbf{t}}_i\end{bmatrix}\right\|^2. \tag{30}$$

By letting

$$\frac{m}{(\xi_i + \varepsilon)^2}\left\|\begin{bmatrix}\left(\Sigma_{yy}^{i}\right)^{\frac{1}{2}}\omega \\ \bar{h}(\mathbf{x}_i)\beta - \bar{\mathbf{t}}_i\end{bmatrix}\right\|^2 \le \tau \tag{31}$$

and taking the square root on both sides, we claim that (26) is a sufficient condition for (25).
D. Geometric Interpretation

The geometric interpretations of the above claims are as follows.

1) Proposition 1 can be interpreted as follows: the chance of a random variable lying outside a sphere with radius $\sqrt{\|\Sigma\|}\,\theta$ is no greater than that of the random variable lying outside an ellipsoid with radius $\theta$ and covariance matrix $\Sigma$. This is intuitive because the largest semi-axis of the ellipsoid is equal to the radius of the sphere and they share the same center. Fig. 1 shows the illustration when the ellipsoid and sphere are projected onto a 2-D space.

2) The above CTM robust criterion can be understood as a restriction that each training datum $\mathbf{y}_i$ picked from the ellipsoid $\mathcal{E}_i\left(\bar{\mathbf{y}}_i, \Sigma_{yy}^{i}, (m/\tau)^{1/2}\right)$ satisfies the inequality

$$\|\mathbf{e}_i - \bar{\mathbf{e}}_i\| \le \theta_i \tag{32}$$

where

$$\mathcal{E}_i\left(\bar{\mathbf{y}}_i, \Sigma_{yy}^{i}, \sqrt{\frac{m}{\tau}}\right) = \left\{\mathbf{y}_i \,\middle|\, (\mathbf{y}_i - \bar{\mathbf{y}}_i)\left(\Sigma_{yy}^{i}\right)^{-1}(\mathbf{y}_i - \bar{\mathbf{y}}_i)^T \le \frac{m}{\tau}\right\}. \tag{33}$$

From Theorem 1, we have

$$\sqrt{\frac{m}{\tau}}\left\|\left(\Sigma_{yy}^{i}\right)^{\frac{1}{2}}\omega\right\| \le \theta_i. \tag{34}$$

Further, by noting that

$$\|\mathbf{e}_i - \bar{\mathbf{e}}_i\| = \|(\mathbf{y}_i - \bar{\mathbf{y}}_i)\omega\| = \left\|(\mathbf{y}_i - \bar{\mathbf{y}}_i)\left(\Sigma_{yy}^{i}\right)^{-\frac{1}{2}}\left(\Sigma_{yy}^{i}\right)^{\frac{1}{2}}\omega\right\| \le \left\|(\mathbf{y}_i - \bar{\mathbf{y}}_i)\left(\Sigma_{yy}^{i}\right)^{-\frac{1}{2}}\right\|\left\|\left(\Sigma_{yy}^{i}\right)^{\frac{1}{2}}\omega\right\| \le \sqrt{\frac{m}{\tau}}\left\|\left(\Sigma_{yy}^{i}\right)^{\frac{1}{2}}\omega\right\| \tag{35}$$

it is obvious that the above geometric interpretation for the CTM constraint holds.
3) A similar geometric interpretation can be given for the SR constraint. Let

$$\tilde{\Sigma}_{yy}^{i} = \Sigma_{yy}^{i} + \bar{\mathbf{y}}_i^T\bar{\mathbf{y}}_i; \tag{36}$$

then an SR constraint enforces that each training datum $\mathbf{y}_i$ picked from the ellipsoid $\mathcal{E}_i\left(0, \tilde{\Sigma}_{yy}^{i}, \sqrt{m/\tau}\right)$

$$\mathcal{E}_i\left(0, \tilde{\Sigma}_{yy}^{i}, \sqrt{\frac{m}{\tau}}\right) = \left\{\mathbf{y}_i \,\middle|\, \mathbf{y}_i\left(\tilde{\Sigma}_{yy}^{i}\right)^{-1}\mathbf{y}_i^T \le \frac{m}{\tau}\right\} \tag{37}$$

satisfies the following inequality:

$$\|\mathbf{e}_i\| \le \xi_i + \varepsilon. \tag{38}$$

The procedure to verify this interpretation follows the same fashion as the CTM case

$$\|\mathbf{e}_i\| = \|\mathbf{y}_i\omega\| = \left\|\mathbf{y}_i\left(\tilde{\Sigma}_{yy}^{i}\right)^{-\frac{1}{2}}\left(\tilde{\Sigma}_{yy}^{i}\right)^{\frac{1}{2}}\omega\right\| \le \left\|\mathbf{y}_i\left(\tilde{\Sigma}_{yy}^{i}\right)^{-\frac{1}{2}}\right\|\left\|\left(\tilde{\Sigma}_{yy}^{i}\right)^{\frac{1}{2}}\omega\right\| \le \sqrt{\frac{m}{\tau}}\left\|\left(\tilde{\Sigma}_{yy}^{i}\right)^{\frac{1}{2}}\omega\right\|. \tag{39}$$

From Theorem 2, we have

$$\left\|\begin{bmatrix}\left(\Sigma_{yy}^{i}\right)^{\frac{1}{2}}\omega \\ \bar{h}(\mathbf{x}_i)\beta - \bar{\mathbf{t}}_i\end{bmatrix}\right\|^2 = \left\|\Sigma_{ee}^{i} + \bar{\mathbf{e}}_i^T\bar{\mathbf{e}}_i\right\| = \left\|\omega^T\left(\Sigma_{yy}^{i} + \bar{\mathbf{y}}_i^T\bar{\mathbf{y}}_i\right)\omega\right\| = \left\|\omega^T\tilde{\Sigma}_{yy}^{i}\omega\right\| \le \frac{\tau}{m}(\xi_i + \varepsilon)^2. \tag{40}$$

Taking square roots in (40) yields

$$\left\|\left(\tilde{\Sigma}_{yy}^{i}\right)^{\frac{1}{2}}\omega\right\| \le \sqrt{\frac{\tau}{m}}\,(\xi_i + \varepsilon) \tag{41}$$

which, together with (39), implies

$$\|\mathbf{e}_i\| \le \xi_i + \varepsilon. \tag{42}$$
IV. ROBUST ELM FOR REGRESSION

Based on the preliminary results of the last section, we now formulate the CTM-constrained RELM (CTM-RELM) and the SR-constrained RELM (SR-RELM) for noisy input and output data.

A. CTM-Based RELM

By adding the second order moment constraint of Theorem 1 to the basic ELM formulation, the CTM-RELM is formulated as

$$\min_{\beta, b, \theta, \xi} L_P = b + C\sum_{i=1}^{N}\xi_i + D\sum_{i=1}^{N}\theta_i$$
$$\text{s.t. } \|h(\mathbf{x}_i)\beta - \mathbf{t}_i\| \le \varepsilon + \xi_i$$
$$\left\|\left(\Sigma_{yy}^{i}\right)^{\frac{1}{2}}\omega\right\| \le \theta_i\sqrt{\tau/m}$$
$$\xi_i \ge 0,\ i = 1, 2, \ldots, N$$
$$\|\beta\| \le b \tag{43}$$

where $C$ is defined in (7) and $D$ is a penalty coefficient to control the deviation of the prediction errors.
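A minimal sketch of the SOCP (43) in Python with cvxpy is given below; the paper itself solves these problems with the CVX MATLAB toolbox [33]. The stacking $\omega = [\beta; -I_m]$ and the use of Frobenius norms for the matrix-valued terms are our concrete reading of the formulation, and the function and argument names are illustrative.

```python
import numpy as np
import cvxpy as cp

def ctm_relm(H, T, Syy_half, C=1.0, D=1.0, tau=0.5, eps=0.05):
    """Sketch of the CTM-RELM SOCP (43). H: N x L mean hidden outputs,
    T: N x m targets, Syy_half: list of N square roots of Sigma_yy^i."""
    N, L = H.shape
    m = T.shape[1]
    beta = cp.Variable((L, m))
    b = cp.Variable(nonneg=True)
    xi = cp.Variable(N, nonneg=True)
    theta = cp.Variable(N, nonneg=True)

    omega = cp.vstack([beta, -np.eye(m)])   # omega = [beta; -I_m], so e_i = y_i omega
    cons = [cp.norm(beta, 'fro') <= b]
    for i in range(N):
        cons.append(cp.norm(H[i] @ beta - T[i]) <= eps + xi[i])   # residual constraint
        cons.append(cp.norm(Syy_half[i] @ omega, 'fro')
                    <= theta[i] * np.sqrt(tau / m))               # CTM constraint, Eq. (21)
    obj = cp.Minimize(b + C * cp.sum(xi) + D * cp.sum(theta))
    cp.Problem(obj, cons).solve()
    return beta.value
```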
B. SR-Based RELM

Likewise, Theorem 2 also leads to an SOCP problem formulation

$$\min_{\beta, b, \xi} L_P = b + C\sum_{i=1}^{N}\xi_i$$
$$\text{s.t. } \left\|\begin{bmatrix}\left(\Sigma_{yy}^{i}\right)^{\frac{1}{2}}\omega \\ \bar{h}(\mathbf{x}_i)\beta - \bar{\mathbf{t}}_i\end{bmatrix}\right\| \le (\xi_i + \varepsilon)\sqrt{\tau/m}$$
$$\xi_i \ge 0,\ i = 1, 2, \ldots, N$$
$$\|\beta\| \le b. \tag{44}$$
V. KERNELIZATION FOR RELMS

As discussed in Section II-C, the kernel trick is adopted in SVR. In fact, the kernel trick can also be applied to ELM. We have indicated that the explicit nonlinear feature mapping with random hidden nodes in ELM can bring about some advantages compared to SVR. Nevertheless, this does not mean that the kernel trick is useless for ELM. In reality, the universal approximation capability of ELM cannot be fully realized due to the curse of dimensionality. Kernel methods enable access to the corresponding very high-dimensional, even infinite-dimensional, feature spaces at a low computational cost in both space and time [28]. In the case of a Gaussian kernel, the feature map lives in an infinite-dimensional space, i.e., it has an infinite number of hidden nodes $L$, which enables ELM to work as a universal approximator [18]. Some related works have adopted the kernel method in ELM and produced desirable results [23], [29].³ In this section, we slightly modify the CTM and SR constraints and then incorporate them into the kernelized formulations of RELMs.

³For terminology consistency, we use kernel ELM to refer to the kernel-trick-based ELM and its variants.
It follows from [23] that the optimal weight matrix $\beta$ in ELM has the form

$$\beta = \mathbf{H}^T\mathbf{P} \tag{45}$$

where $\mathbf{P} \in \mathbb{R}^{N \times m}$. Once the model, i.e., $\beta$, is determined, we can make predictions by

$$f(\mathbf{x}) = h(\mathbf{x})\beta = \sum_{i=1}^{N} h(\mathbf{x})h(\mathbf{x}_i)^T\mathbf{P}_i. \tag{46}$$

Based on the definition of the ELM kernel, we have

$$f(\mathbf{x}) = \sum_{i=1}^{N} k(\mathbf{x}, \mathbf{x}_i)\mathbf{P}_i \tag{47}$$

where $k(\cdot, \cdot)$ is a kernel function. The kernel matrix of ELM is defined as [16]

$$\mathbf{K} = \mathbf{H}\mathbf{H}^T:\ \mathbf{K}_{i,j} = h(\mathbf{x}_i)\cdot h(\mathbf{x}_j)^T = k(\mathbf{x}_i, \mathbf{x}_j); \tag{48}$$

when the number of training samples is $N$, $\mathbf{K} \in \mathbb{R}^{N \times N}$.

The intrinsic modularity of kernel machines also means that any kernel function can be used provided it produces symmetric, positive semi-definite kernel matrices [28]. In our case, we restrict $\mathbf{K}$ not only to satisfy this modularity but also to have all of its entries being real numbers. Thus, we can decompose $\mathbf{K}$ as

$$\mathbf{K} = \mathbf{K}^{\frac{1}{2}}\mathbf{K}^{\frac{1}{2}} \tag{49}$$

where $\mathbf{K}^{1/2}$ is real symmetric. From (45) and (48), we get

$$\beta^T\beta = \mathbf{P}^T\mathbf{K}\mathbf{P} = \left(\mathbf{K}^{\frac{1}{2}}\mathbf{P}\right)^T\mathbf{K}^{\frac{1}{2}}\mathbf{P} \tag{50}$$

which leads to $\|\beta\| = \left\|\mathbf{K}^{\frac{1}{2}}\mathbf{P}\right\|$.
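When the random feature map is explicit, the ELM kernel of (48) and the symmetric square root $\mathbf{K}^{1/2}$ of (49) can be computed directly, e.g., via an eigendecomposition; the sigmoid map and the sizes below are assumptions of this sketch.

```python
import numpy as np

def elm_kernel(X1, X2, A, b):
    """ELM kernel of Eq. (48): k(x, x') = h(x) h(x')^T with an explicit random
    feature map; an unknown mapping (e.g., a Gaussian kernel) could be used instead."""
    H1 = 1.0 / (1.0 + np.exp(-(X1 @ A + b)))
    H2 = 1.0 / (1.0 + np.exp(-(X2 @ A + b)))
    return H1 @ H2.T

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 8))
A = rng.standard_normal((8, 500))
b = rng.standard_normal((1, 500))

K = elm_kernel(X, X, A, b)        # N x N kernel matrix, K = H H^T
w, V = np.linalg.eigh(K)          # K is real symmetric positive semi-definite
K_half = V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T   # Eq. (49)
```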
We now give the kernelized CTM constraint

$$\left\|\left(\Sigma_{yy}^{i}\right)^{\frac{1}{2}}\right\|\left\|\begin{bmatrix}\mathbf{K}^{\frac{1}{2}}\mathbf{P} \\ -\mathbf{1}\end{bmatrix}\right\| \le \theta_i\sqrt{\tau/m},\quad i = 1, 2, \ldots, N \tag{51}$$

where $\mathbf{1}$ is an $m \times m$ matrix with all entries equal to 1. Note that (51) is a sufficient condition of (21), since

$$\left\|\left(\Sigma_{yy}^{i}\right)^{\frac{1}{2}}\omega\right\| \le \left\|\left(\Sigma_{yy}^{i}\right)^{\frac{1}{2}}\right\|\|\omega\| \le \left\|\left(\Sigma_{yy}^{i}\right)^{\frac{1}{2}}\right\|\left\|\begin{bmatrix}\mathbf{K}^{\frac{1}{2}}\mathbf{P} \\ -\mathbf{1}\end{bmatrix}\right\| \tag{52}$$

where $\omega = [\beta^T, -\mathbf{1}]^T$, and the kernelized CTM-RELM is of the form

$$\min_{\mathbf{P}, b, \theta, \xi} L_P = b + C\sum_{i=1}^{N}\xi_i + D\sum_{i=1}^{N}\theta_i$$
$$\text{s.t. } \|\mathbf{K}_{i,:}\mathbf{P} - \mathbf{t}_i\| \le \varepsilon + \xi_i$$
$$\left\|\left(\Sigma_{yy}^{i}\right)^{\frac{1}{2}}\right\|\left\|\begin{bmatrix}\mathbf{K}^{\frac{1}{2}}\mathbf{P} \\ -\mathbf{1}\end{bmatrix}\right\| \le \theta_i\sqrt{\tau/m}$$
$$\xi_i \ge 0,\ i = 1, 2, \ldots, N$$
$$\left\|\mathbf{K}^{\frac{1}{2}}\mathbf{P}\right\| \le b. \tag{53}$$
A similar fashion can be adopted to derive the kernelized SR-RELM formulation

$$\min_{\mathbf{P}, b, \xi} L_P = b + C\sum_{i=1}^{N}\xi_i$$
$$\text{s.t. } \left\|\begin{bmatrix}\left\|\left(\Sigma_{yy}^{i}\right)^{\frac{1}{2}}\right\|\begin{bmatrix}\mathbf{K}^{\frac{1}{2}}\mathbf{P} \\ -\mathbf{1}\end{bmatrix} \\ \mathbf{K}_{i,:}\mathbf{P} - \mathbf{t}_i\end{bmatrix}\right\| \le (\xi_i + \varepsilon)\sqrt{\tau/m}$$
$$\xi_i \ge 0,\ i = 1, 2, \ldots, N$$
$$\left\|\mathbf{K}^{\frac{1}{2}}\mathbf{P}\right\| \le b. \tag{54}$$
VI. COVARIANCE IN THE FEATURE SPACE

We first calculate the covariance when the nonlinear mapping functions are known explicitly. We write $h(\mathbf{x})$ as follows:

$$h(\mathbf{x}) = [G(\mathbf{a}_1, b_1, \mathbf{x}), \ldots, G(\mathbf{a}_L, b_L, \mathbf{x})] \tag{55}$$

where $\mathbf{a}_i, b_i$ are the randomly generated weights and bias connecting the input and the $i$th hidden node, and $G(\mathbf{a}_i, b_i, \mathbf{x})$ is the activation function.

A statistical method is provided to estimate the covariance in the feature space. For each input $\mathbf{x}_i$, we randomly generate $Z$ samples $\{\mathbf{x}_i^1, \mathbf{x}_i^2, \ldots, \mathbf{x}_i^Z\}$ according to the distribution of $\mathbf{x}_i$ with mean $\bar{\mathbf{x}}_i$ and covariance $\Sigma_{xx}^{i}$. Then the covariance matrix of $h(\mathbf{x}_i)$ can be approximated by

$$\Sigma_{hh}^{i} = \frac{1}{Z}\sum_{z=1}^{Z}\tilde{h}\left(\mathbf{x}_i^z\right)^T\tilde{h}\left(\mathbf{x}_i^z\right) \tag{56}$$

where

$$\tilde{h}\left(\mathbf{x}_i^z\right) = h\left(\mathbf{x}_i^z\right) - \frac{1}{Z}\sum_{z=1}^{Z}h\left(\mathbf{x}_i^z\right). \tag{57}$$
However, the covariance in the kernel case is more delicate and cannot be derived explicitly. Note that, in the kernelized cases (53) and (54), only the norm of the covariance $\Sigma_{yy}^{i}$ is needed, that is

$$\left\|\Sigma_{yy}^{i}\right\| = \left\|\begin{bmatrix}\Sigma_{hh}^{i} & 0 \\ 0 & \Sigma_{tt}^{i}\end{bmatrix}\right\| = \max\left\{\left\|\Sigma_{hh}^{i}\right\|, \left\|\Sigma_{tt}^{i}\right\|\right\}. \tag{58}$$

$\|\Sigma_{tt}^{i}\|$ can be readily calculated, and we now give a solution to approximate $\|\Sigma_{hh}^{i}\|$. The $L_2$-norm of the real symmetric matrix $\Sigma_{hh}^{i}$ equals its largest eigenvalue. Let $\lambda$ and $\mathbf{v}$ be an eigenvalue and its corresponding eigenvector

$$\lambda\mathbf{v} = \Sigma_{hh}^{i}\mathbf{v}. \tag{59}$$

It has been proved in [30] that $\lambda$ also satisfies

$$Z\lambda\alpha = \tilde{\mathbf{K}}^i\alpha \tag{60}$$

where $\tilde{\mathbf{K}}^i = \mathbf{K}^i - \mathbf{L}\mathbf{K}^i - \mathbf{K}^i\mathbf{L} + \mathbf{L}\mathbf{K}^i\mathbf{L}$ and $\mathbf{L} \in \mathbb{R}^{Z \times Z}$ with each entry $\mathbf{L}_{j,k} = 1/Z$. Here, the $Z \times Z$ matrix $\mathbf{K}^i$ over the $Z$ samples of $\mathbf{x}_i$ is defined by

$$\mathbf{K}^i_{j,k} := k\left(\mathbf{x}_i^j, \mathbf{x}_i^k\right) = h\left(\mathbf{x}_i^j\right)\cdot h\left(\mathbf{x}_i^k\right)^T. \tag{61}$$

Hence, we can compute the $L_2$-norm of $\Sigma_{hh}^{i}$ from the set of eigenvalues of $\tilde{\mathbf{K}}^i$

$$\left\|\Sigma_{hh}^{i}\right\| = \frac{1}{Z}\max\lambda\left(\tilde{\mathbf{K}}^i\right) \tag{62}$$

where $\lambda(\tilde{\mathbf{K}}^i)$ is the set of all the eigenvalues of $\tilde{\mathbf{K}}^i$.

Fig. 2. Positions of the WiFi AP, offline calibration points, and online testing points in the simulated field.
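In the kernel case, the norm (62) thus reduces to an eigenvalue computation on the centered $Z \times Z$ kernel matrix; a minimal sketch:

```python
import numpy as np

def cov_norm_kernel(Ki):
    """||Sigma_hh^i|| from the Z x Z kernel matrix K^i over the Z samples of
    one input, via the centering and eigenvalues of Eqs. (60)-(62)."""
    Z = Ki.shape[0]
    L = np.full((Z, Z), 1.0 / Z)
    Ki_t = Ki - L @ Ki - Ki @ L + L @ Ki @ L     # centered kernel matrix K~^i
    return np.max(np.linalg.eigvalsh(Ki_t)) / Z  # Eq. (62)
```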
VII. PERFORMANCE VERIFICATION

A. Simulation Results and Evaluation

We develop a simulation environment using MATLAB R2013a in order to evaluate the performance of our proposed algorithms before any real-world experiment is conducted. As shown in Fig. 2, we assume a 20 × 20 m room where four WiFi APs are installed at the four corners. The most commonly used path loss model for indoor environments is the ITU indoor propagation model [31]. Since it provides a relation between the total path loss PL (dBm) and the distance d (m), it is adopted to simulate the WiFi signal generated from each WiFi AP. The indoor path loss model can be expressed as

$$PL(d) = PL_0 - 10\alpha\log(d) + X \tag{63}$$

where $PL_0$ is the path loss coefficient, set to 40 dBm in our simulation, $\alpha$ is the path loss exponent, and $X$ represents random noise.

The distribution of the RSS indication (RSSI) from four real-world APs in our IPS is illustrated in Fig. 3. As shown in Fig. 3, the signals collected from one AP can be quite different even at the same location due to noises and outliers. Therefore, four different types of disturbed data are generated based on (63), i.e., data mixed with Gaussian noise $X \sim N(0, 1)$, data mixed with Student's t noise $X \sim T(0, 1, 1)$, data mixed with gamma noise $X \sim Ga(1, 1)$, and data contaminated by one-sided outliers (20% contamination rate; the strategy of adding outliers is similar to that of [13]), to test the performance of RELMs. To make our simulation more practical, 100 testing samples are artificially generated at each training point and testing point, respectively, using (63) with different perturbations.
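A sketch of the simulator behind (63) under these noise types follows. The base-10 logarithm and the value of the path loss exponent are assumptions of this example (the paper sets $PL_0 = 40$ dBm but does not state $\alpha$):

```python
import numpy as np

def path_loss(d, pl0=40.0, alpha=3.0, rng=None, noise='gauss'):
    """Path loss samples from Eq. (63); the noise options mirror the Gaussian,
    gamma, and Student's t disturbances used in the simulation."""
    rng = rng or np.random.default_rng(0)
    if noise == 'gauss':
        X = rng.standard_normal(np.shape(d))       # X ~ N(0, 1)
    elif noise == 'gamma':
        X = rng.gamma(1.0, 1.0, np.shape(d))       # X ~ Ga(1, 1)
    else:
        X = rng.standard_t(1, np.shape(d))         # Student's t noise, df = 1
    return pl0 - 10.0 * alpha * np.log10(d) + X
```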
We apply our RELMs to the simulated data and compare the proposed algorithms with basic ELM, OPT-ELM, kernel ELM, and SVR [32]. In the CTM-RELM formulation, there are three hyperparameters to be tuned: $C$, $D$, and $\tau$. $C$ and $D$ are both selected by a grid method from the exponential sequence $[2^{-5}, 2^{-4}, \ldots, 2^{5}]$ utilizing fivefold cross-validation on the training data set, and $\tau$ increases from 0.1 to 1 with a step size of 0.1. In the SR-RELM case, there are two hyperparameters to be tuned, $C$ and $\tau$, and they are selected with the same strategy as for CTM-RELM. For both RELMs, the slack variable $\varepsilon$ is empirically set to 0.05. The SOCP problems are solved by the CVX MATLAB toolbox [33]. Since the performance of ELM and its variants is not sensitive to the number of hidden nodes $L$ as long as it is larger than some threshold [23], we fix $L$ as 500 for our proposed algorithms, basic ELM, and OPT-ELM to facilitate the comparison of computational costs. The width of the Gaussian kernel $\lambda$ used in SVR and kernel ELM is selected from the exponential sequence $[2^{-5}, 2^{-4}, \ldots, 2^{5}]$ utilizing fivefold cross-validation.

Fig. 3. RSSI distribution of four APs at one position.
Four performance measures are introduced: the mean root square error (MRSE), standard deviation (STD), WCE, and REP over $r$ repeated realizations. Note that MRSE, STD, and WCE in this case are taken as means over the $r$ repeated realizations. REP is measured by the deviation of the MRSE over the repeated realizations; this measure is proposed based on the fact that ELM with the same parameters, e.g., the number of hidden nodes, on the same training data set may produce quite different results. In our experiment, $r$ is set to 30:
$$\mathrm{MRSE} = \frac{1}{r}\sum_{j=1}^{r}\left[\frac{1}{s}\sum_{i=1}^{s}\left\|\mathbf{t}_i - \mathbf{h}_i\hat{\beta}\right\|\right]_j$$

$$\mathrm{STD} = \frac{1}{r}\sum_{j=1}^{r}\left[\sqrt{\sum_{i=1}^{s}\left(\left\|\mathbf{t}_i - \mathbf{h}_i\hat{\beta}\right\| - \frac{1}{s}\sum_{i=1}^{s}\left\|\mathbf{t}_i - \mathbf{h}_i\hat{\beta}\right\|\right)^2}\right]_j$$

$$\mathrm{WCE} = \frac{1}{r}\sum_{j=1}^{r}\left[\max_{i \in S}\left\|\mathbf{t}_i - \mathbf{h}_i\hat{\beta}\right\|\right]_j$$

$$\mathrm{REP} = \sqrt{\frac{1}{r}\sum_{j=1}^{r}\left[\frac{1}{s}\sum_{i=1}^{s}\left\|\mathbf{t}_i - \mathbf{h}_i\hat{\beta}\right\| - \mathrm{MRSE}\right]_j^2}$$

where $s$ is the number of testing samples and $S = \{1, 2, \ldots, s\}$ is the index set of the testing samples.

Fig. 4. Cumulative percentile of error distance for simulation data sets.
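These four measures can be computed from an $r \times s$ array whose $(j, i)$ entry is the error distance of realization $j$ on testing sample $i$; note that the STD, as printed above, does not renormalize the inner sum by $s$:

```python
import numpy as np

def measures(errors):
    """MRSE, STD, WCE, REP from an r x s array of error distances
    ||t_i - h_i beta_hat|| over r repeated realizations."""
    per_run_mean = errors.mean(axis=1)                        # inner averages
    mrse = per_run_mean.mean()
    std = np.sqrt(((errors - per_run_mean[:, None])**2).sum(axis=1)).mean()
    wce = errors.max(axis=1).mean()
    rep = np.sqrt(((per_run_mean - mrse)**2).mean())
    return mrse, std, wce, rep
```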
As shown in Fig. 4, the two proposed algorithms outperform the other four algorithms in terms of accuracy and WCE. More exact numbers can be found in Table II, from which we see that the REP of the RELM-based systems is improved compared with the basic ELM- and OPT-ELM-based ones. The enhancement of the REP is due to the additional constraints introduced by our algorithms, which shrink the solution search space. Note that the shrinking happening here is different from the one discussed in [21], in which the loss of solution search freedom of SVR is caused by the redundant $b$ [16].
B. Evaluation in Real-World IPSs

The system architecture of our WiFi-based IPS is shown in Fig. 5. The main components of this system are the existing commercial WiFi APs, mobile devices with WiFi functionality, a location server, and a web-based monitoring system. The following is a brief operation procedure of our WiFi-based IPS. First of all, a data collection app for Android devices was developed. After the mobile device turns on its WiFi module, it collects RSS information from the different APs every second and sends this information to the location server. The responsibility of the location server is to analyze the RSS and calculate the estimated position of the mobile device. The user can then obtain his or her real-time position through our web-based monitoring system directly on the mobile device.

We conducted real-world indoor localization experiments to evaluate the performance of the proposed RELM approaches. The testbed is the Internet of Things Laboratory in the School of Electrical and Electronic Engineering, Nanyang Technological University. The area of the testbed is around 580 m² (35.1 × 16.6 m).
TABLE II. Comparison of simulation results.
Fig. 5. System architecture of our WiFi-based IPS.
The layout of the testbed is shown in Fig. 7. Eight D-Link DIR-605L WiFi cloud routers are utilized as WiFi APs for our experiments. The Android application is installed on a Samsung I929 Galaxy SII mobile phone. All the WiFi RSS fingerprints at the offline calibration points and online testing points are collected using this phone for performance evaluation.

The RELM model was built up by the following steps. During the offline phase, 30 offline calibration points were selected and 200 WiFi RSS fingerprints were collected at each point. The positions of these 30 offline calibration points are shown in Fig. 7. By leveraging these 6000 WiFi RSS fingerprints and their physical positions as training inputs and training targets (outputs), respectively, the RELM model was constructed. During the online phase, we continued to collect WiFi RSS fingerprints at online testing points for five days. On each day, two distinct online testing points were selected in order to reflect the environmental dynamics. The positions of these ten online testing points are also presented in Fig. 7. Two hundred WiFi RSS fingerprints were collected at each point.

Fig. 6. Cumulative percentile of error distance for IPS testing results.

Fig. 7. Positions of the WiFi APs, offline calibration points, and online testing points in the testbed.

TABLE III. Comparison of experimental testing results.
The parameter settings for the proposed and compared algorithms in this experiment are similar to those introduced in Section VII-A, apart from the number of hidden units, which is set to 1000.

The testing results with respect to the four performance measures given in Section VII-A are shown in Table III. Fig. 6 illustrates the comparison in terms of the cumulative percentile of error distance, which shows that the proposed CTM-RELM provides higher accuracy and has an obvious effect in reducing the STD compared to ELM and OPT-ELM. On the other hand, SR-RELM also gives an accuracy as good as CTM-RELM and performs better at confining the WCE. The above results are reasonable, since the two robust constraints have different emphases. In addition, both CTM-RELM and SR-RELM give better REP than basic ELM.

The proposed algorithms incur longer training time due to the introduction of second order moment constraints instead of linear constraints. However, a slightly longer training time is not a concern in IPSs, considering that it is the calibration phase, e.g., the procedure of radio map generation, that accounts for the large body of the time consumption. Besides, RELMs inherit the simplicity of ELM, e.g., random feature mapping, dispensing with the bias $b$, and the single-layer structure; therefore their training time is still competitive compared with SVR and its variants.
VIII. CONCLUSION

Before concluding this paper, we provide some important discussions.

1) Choice of the Measure for Accuracy: It is noteworthy that we adopt MRSE instead of the conventional root mean square error (RMSE) as our measure. This is because MRSE makes more practical sense than RMSE for IPSs and has been widely adopted in indoor positioning contests [2]. The measure of REP is introduced in particular for ELM because it produces variation over repeated realizations; namely, with the same parameter settings, e.g., the number of hidden nodes, on the same training set, ELM may produce different results. This is mainly due to the fact that the number of hidden units is not infinite, so that the universal approximation using SLFNs with random nodes may not be exact [18]. However, it should be noted that most iteratively tuned algorithms, such as BP, actually also face this unreproducibility issue, and from the perspective of STD, ELM is even more stable.
2) Abandonment of Kernelized RELMs: Although we have proposed the kernelized CTM-RELM and SR-RELM, we did not adopt them in the simulation and real-world experiments due to their limits in scaling. Firstly, the size of the decision variables in the kernelized CTM-RELM formulation is $N \times m + 2N + 1$, while that of the CTM-RELM is $L \times m + 2N + 1$. Considering that the number of training data $N$ is usually several times larger than the number of hidden nodes $L$, we would encounter memory issues if we implemented the kernelized CTM-RELM. The same logic applies to the SR-RELM case. Secondly, kernel-based algorithms enjoy computational efficiency in optimization problems when the dimension $d$ of the feature is larger than $N$, while in our case the size of the feature is far smaller than the number of training samples; therefore it is not cost-effective to conduct training with kernels.⁵ Thirdly, prediction by kernel-based methods takes $O(Nd)$ time since it uses the dual variables, while prediction using random-hidden-nodes-based methods with primal variables, e.g., ELM, OPT-ELM, and RELMs, only takes $O(d)$ [28]. The testing times listed in Tables II and III are consistent with the above claim. Although a slightly longer training time is within the tolerance of IPSs, fast prediction speed is highly demanded, as IPS servers need to provide real-time positioning services for large crowds in some dense indoor environments such as shopping malls, cinemas, and airports. However, for small-scale data sets, or where the size of the features is very large, kernelized RELMs can be leveraged.
3) Implementation Tricks for RELMs: How to calculate the covariance and mean is tricky for regression problems, since one would otherwise have to use only one sample to approximate its corresponding statistics. In this paper, we take advantage of a specificity of the learning problem in IPSs: grouping. The whole data set can be divided into several groups by the calibration points they belong to, and within any group, the members "theoretically" should have the same RSS (input) and coordinates (output). In reality this is impossible due to the uncertainties discussed above. However, the members of one group can be intuitively used to calculate the mean and covariance needed to represent that group in the problem formulations (see the sketch after this list). By this "grouping" trick, we can further reduce the number of constraints in (43) and (44) from $N$ to $N/g$, where $g$, the size of a group, is the number of samples taken at one calibration point. This trick can be directly extended to RELMs for classification problems.
4) Assumption of Additive Noises in the Feature Space: Though we assume that the noises lying in the feature space are additive, the simulation is conducted under circumstances where the inputs are corrupted with additive disturbances, and the simulation results demonstrate that RELMs are effective for these cases. In fact, assuming that noises in the feature space are additive is conventional among a number of ML and optimization researchers [34]–[36]. It is possible that our assumption becomes invalid under some circumstances, e.g., inputs mixed with multiplicative noises. However, the case of multiplicative noises in RSS is rare in indoor environments [37]. When they are not significant, such multiplicative noises can be seen as outliers, and Section VII-A has shown that RELMs can address outliers (20% contamination rate) well.

⁵Indeed, kernel ELM possesses fast training speed, because it adopts the normal equation method, i.e., it is equality-constrained-optimization-based [16]. But when inequality constraints are added in the convex optimization setting (inequality constraints can bring about the benefit of sparsity in solutions [23], [29]), the normal closed-form method may not work anymore. Some recent work on ELM, e.g., sparse ELM [29], has already used the inequality-constraints-based formulation. Thus, the above claim about the computational costs still holds for kernel ELM.
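A sketch of the grouping trick from point 3 above: samples sharing a calibration point are pooled to estimate the group mean and covariance of $\mathbf{y}_i = [h(\mathbf{x}_i), \mathbf{t}_i]$. The use of np.cov (which normalizes by $Z - 1$ rather than the $Z$ of (56)) is an implementation detail of this sketch:

```python
import numpy as np

def group_stats(H, T, labels):
    """Per-calibration-point mean and covariance of y = [h(x), t]; H: n x L
    hidden outputs, T: n x m targets, labels: calibration-point index per sample."""
    stats = {}
    for g in np.unique(labels):
        Y = np.hstack([H[labels == g], T[labels == g]])   # the group's y_i vectors
        stats[g] = (Y.mean(axis=0), np.cov(Y, rowvar=False))
    return stats
```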
To sum up, this paper proposed CTM-RELM and SR-RELM to address the problem of noisy measurements in IPSs by introducing the CTM and SR constraints into OPT-ELM, and further gave two SOCP-based formulations. The kernelized RELMs and the method to calculate the theoretical covariance matrix in the feature space were further discussed. Simulation results and real-world indoor localization experiments both demonstrated that the CTM-RELM-based IPS can provide higher accuracy and smaller STD than IPSs based on other algorithms, while the SR-RELM-based IPS can provide better accuracy and smaller WCE. The REP of the proposed algorithms was also demonstrated to be better.

Future work will focus on how to reduce the computational costs of the proposed algorithms for IPSs with large data sets; sparse matrix techniques will be leveraged to make this possible. Meanwhile, more performance testing of RELMs will be conducted for classification problems with different combinations of $\sigma_1$ and $\sigma_2$ for the norms.
REFERENCES

[1] H. Zou, X. Lu, H. Jiang, and L. Xie, "A fast and precise indoor localization algorithm based on an online sequential extreme learning machine," Sensors, vol. 15, no. 1, pp. 1804–1824, Jan. 2015.
[2] Q. Yang, S. J. Pan, and V. W. Zheng, "Estimating location using Wi-Fi," IEEE Intell. Syst., vol. 23, no. 1, pp. 8–13, Jan./Feb. 2008.
[3] C. Cortes and V. Vapnik, "Support-vector networks," Mach. Learn., vol. 20, no. 3, pp. 273–297, Mar. 1995.
[4] H. Liu, H. Darabi, P. Banerjee, and J. Liu, "Survey of wireless indoor positioning techniques and systems," IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 37, no. 6, pp. 1067–1080, Nov. 2007.
[5] N. Kothari, B. Kannan, E. D. Glasgwow, and M. B. Dias, "Robust indoor localization on a commercial smart phone," Proc. Comput. Sci., vol. 10, pp. 1114–1120, Aug. 2012.
[6] W. Meng, W. Xiao, W. Ni, and L. Xie, "Secure and robust Wi-Fi fingerprinting indoor localization," in Proc. Int. Conf. Indoor Position. Indoor Nav. (IPIN), Guimarães, Portugal, Sep. 2011, pp. 1–7.
[7] G.-B. Huang and L. Chen, "Convex incremental extreme learning machine," Neurocomputing, vol. 70, no. 16, pp. 3056–3062, Oct. 2007.
[8] W. Xi-Zhao, S. Qing-Yan, M. Qing, and Z. Jun-Hai, "Architecture selection for networks trained with extreme learning machine using localized generalization error model," Neurocomputing, vol. 102, pp. 3–9, Feb. 2013.
[9] W. Xiao, P. Liu, W.-S. Soh, and Y. Jin, "Extreme learning machine for wireless indoor localization," in Proc. 11th Int. Conf. Inf. Process. Sens. Netw., Beijing, China, Apr. 2012, pp. 101–102.
[10] J. Liu, Y. Chen, M. Liu, and Z. Zhao, "SELM: Semi-supervised ELM with application in sparse calibrated location estimation," Neurocomputing, vol. 74, no. 16, pp. 2566–2572, Sep. 2011.
[11] R. Wang, Y.-L. He, C.-Y. Chow, F.-F. Ou, and J. Zhang, "Learning ELM-tree from big data based on uncertainty reduction," Fuzzy Sets Syst., vol. 258, pp. 79–100, Jan. 2015.
[12] J. Zhai, H. Xu, and Y. Li, "Fusion of extreme learning machine with fuzzy integral," Int. J. Uncertain. Fuzz. Knowl.-Based Syst., vol. 21, pp. 23–34, Dec. 2013.
[13] P. Horata, S. Chiewchanwattana, and K. Sunat, "Robust extreme learning machine," Neurocomputing, vol. 102, pp. 31–44, Feb. 2013.
[14] L. M. Ni, Y. Liu, Y. C. Lau, and A. P. Patil, "LANDMARC: Indoor location sensing using active RFID," Wireless Netw., vol. 10, no. 6, pp. 701–710, Nov. 2004.
[15] H. Zou, H. Wang, L. Xie, and Q.-S. Jia, "An RFID indoor positioning system by using weighted path loss and extreme learning machine," in Proc. 1st IEEE Int. Conf. Cyber-Phys. Syst. Netw. Appl. (CPSNA), Taipei, Taiwan, Aug. 2013, pp. 66–71.
[16] G.-B. Huang, "An insight into extreme learning machines: Random neurons, random features and kernels," Cogn. Comput., vol. 6, no. 3, pp. 1–15, Sep. 2014.
[17] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, "Extreme learning machine: Theory and applications," Neurocomputing, vol. 70, nos. 1–3, pp. 489–501, Dec. 2006.
[18] G.-B. Huang, L. Chen, and C.-K. Siew, "Universal approximation using incremental constructive feedforward networks with random hidden nodes," IEEE Trans. Neural Netw., vol. 17, no. 4, pp. 879–892, Jul. 2006.
[19] M.-B. Li, G.-B. Huang, P. Saratchandran, and N. Sundararajan, "Fully complex extreme learning machine," Neurocomputing, vol. 68, pp. 306–314, Oct. 2005.
[20] G. Huang, S. Song, J. N. Gupta, and C. Wu, "Semi-supervised and unsupervised extreme learning machines," IEEE Trans. Cybern., vol. 44, no. 12, pp. 2405–2417, Dec. 2014.
[21] G.-B. Huang, X. Ding, and H. Zhou, "Optimization method based extreme learning machine for classification," Neurocomputing, vol. 74, no. 1, pp. 155–163, Dec. 2010.
[22] A. J. Smola and B. Schölkopf, "A tutorial on support vector regression," Stat. Comput., vol. 14, no. 3, pp. 199–222, Aug. 2004.
[23] G.-B. Huang, H. Zhou, X. Ding, and R. Zhang, "Extreme learning machine for regression and multiclass classification," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 42, no. 2, pp. 513–529, Apr. 2012.
[24] V. Vapnik, S. E. Golowich, and A. Smola, "Support vector method for function approximation, regression estimation, and signal processing," in Proc. Adv. Neural Inf. Process. Syst., 1997, pp. 281–287.
[25] P. K. Shivaswamy, C. Bhattacharyya, and A. J. Smola, "Second order cone programming approaches for handling missing and uncertain data," J. Mach. Learn. Res., vol. 7, pp. 1283–1314, Jul. 2006.
[26] G. Huang, S. Song, C. Wu, and K. You, "Robust support vector regression for uncertain input and output data," IEEE Trans. Neural Netw. Learn. Syst., vol. 23, no. 11, pp. 1690–1700, Nov. 2012.
[27] J. Navarro, "A very simple proof for the multivariate Chebyshev inequality," Commun. Stat. Theory Methods, Dec. 2013.
[28] K. P. Murphy, Machine Learning: A Probabilistic Perspective. Cambridge, MA, USA: MIT Press, 2012.
[29] Z. Bai, G.-B. Huang, D. Wang, H. Wang, and M. B. Westover, "Sparse extreme learning machine for classification," IEEE Trans. Cybern., vol. 25, no. 4, pp. 836–843, Apr. 2014.
[30] B. Schölkopf, A. Smola, and K.-R. Müller, "Nonlinear component analysis as a kernel eigenvalue problem," Neural Comput., vol. 10, no. 5, pp. 1299–1319, Jul. 1998.
[31] T. Chrysikos, G. Georgopoulos, and S. Kotsopoulos, "Site-specific validation of ITU indoor path loss model at 2.4 GHz," in Proc. IEEE Int. Symp. World Wireless Mobile Multimedia Netw. Workshops (WoWMoM), Kos, Greece, Jun. 2009, pp. 1–6.
[32] J. A. Suykens et al., Least Squares Support Vector Machines, vol. 4. River Edge, NJ, USA: World Scientific, 2002.
[33] M. C. Grant, S. P. Boyd, and Y. Ye. (Jun. 2014). CVX: MATLAB Software for Disciplined Convex Programming. [Online]. Available: http://cvxr.com/cvx
[34] H. Xu, C. Caramanis, and S. Mannor, "Robustness and regularization of support vector machines," J. Mach. Learn. Res., vol. 10, pp. 1485–1510, Jul. 2009.
[35] D. Bertsimas, D. B. Brown, and C. Caramanis, "Theory and applications of robust optimization," SIAM Rev., vol. 53, no. 3, pp. 464–501, Aug. 2011.
[36] K. P. Bennett and E. Parrado-Hernández, "The interplay of optimization and machine learning research," J. Mach. Learn. Res., vol. 7, pp. 1265–1281, Jul. 2006.
[37] A. Goldsmith, Wireless Communications. Cambridge, NY, USA: Cambridge Univ. Press, 2005.
Xiaoxuan Lu received the B.Eng. degree from the Nanjing University of
Aeronautics and Astronautics, Nanjing, China, in 2013. He is currently
pursuing the M.Eng. degree from the School of Electrical and Electronic
Engineering, Nanyang Technological University, Singapore.
His current research interests include machine learning, mobile computing,
signal processing, and their applications to energy reduction in buildings.
Han Zou received the B.Eng. (First Class Honors) degree from Nanyang Technological University, Singapore, in 2012, where he is currently pursuing the Ph.D. degree at the School of Electrical and Electronic Engineering.
He is currently a Graduate Student Researcher with the Berkeley Education
Alliance for Research in Singapore Limited, Singapore. His current research
interests include wireless sensor networks, mobile computing, indoor posi-
tioning and navigation systems, indoor human activity sensing and inference,
and occupancy modeling in buildings.
Hongming Zhou received the B.Eng. and Ph.D. degrees from Nanyang
Technological University, Singapore, in 2009 and 2014, respectively.
He is currently a Research Fellow with the School of Electrical and
Electronic Engineering, Nanyang Technological University. His current
research interests include classification and regression algorithms such as
extreme learning machines, neural networks, and support vector machines
as well as their applications including heating, ventilation and air condition-
ing system control applications, biometrics identification, image retrieval, and
financial index prediction.
Lihua Xie (F’07) received the B.E. and M.E. degrees from the Nanjing
University of Science and Technology, Nanjing, China, in 1983 and 1986,
respectively, and the Ph.D. degree from the University of Newcastle,
Callaghan, NSW, Australia, in 1992, all in electrical engineering.
Since 1992, he has been at the School of Electrical and Electronic
Engineering, Nanyang Technological University, Singapore. From 1986 to
1989, he was a Teacher at the Department of Automatic Control, Nanjing
University of Science and Technology. From 2006 to 2011, he was a
Changjiang Visiting Professor at the South China University of Technology,
Guangzhou, China. From 2011 to 2014, he was a Professor and the Head of
Division of Control and Instrumentation at Nanyang Technological University,
Singapore. His current research interests include robust control and estimation,
networked control systems, multiagent networks, and unmanned systems. He
has published over 260 journal papers and co-authored two patents and six
books.
Prof. Xie has served as an Editor of IET Book Series in Control and an
Associate Editor of a number of journals including the IEEE TRANSACTIONS
ON AUTOMATIC CONTROL,Automatica, the IEEE TRANSACTIONS ON
CONTROL SYSTEMS TECHNOLOGY, and the IEEE TRANSACTIONS ON
CIRCUITS AND SYSTEMS-II.
Guang-Bin Huang (SM’04) received the B.Sc. degree in applied mathematics
and M.Eng. degree in computer engineering from Northeastern University,
Shenyang, China, in 1991 and 1994, respectively, and the Ph.D. degree in
electrical engineering from Nanyang Technological University, Singapore, in
1999.
He was with the Applied Mathematics Department and the Wireless Communication Department of Northeastern University. Since 2001, he has been an Assistant Professor and then an Associate Professor (with tenure) at the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore. He is the Principal Investigator of several industry-sponsored research and development projects. He has also led/implemented several key
industrial projects including the Chief Architect/Designer and the Technical
Leader of Singapore Changi Airport Cargo Terminal 5 Inventory Control
System Upgrading Project. His current research interests include big data
analytics, human computer interface, brain computer interface, image process-
ing/understanding, machine-learning theories and algorithms, extreme learning
machine, and pattern recognition. He was the Highly Cited Researcher listed
in 2014—The World’s Most Influential Scientific Minds by Thomson Reuters.
He was also invited to give keynotes on numerous international conferences.
Dr. Huang was the recipient of the Best Paper Award from the IEEE
TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS in
2013. He is currently serving as an Associate Editor of Neurocomputing,
Cognitive Computation,Neural Networks, and the IEEE TRANSACTIONS ON
CYBERNETICS.
... With the rapid development of machine learning, Refs. [27][28][29] propose to improve the positioning accuracy. Their common characteristic is that they all need a large amount of data for training. ...
... Their common characteristic is that they all need a large amount of data for training. As a simple and fast learning method, the Extreme Learning Machine (ELM) has been widely used in the field of fingerprint-based positioning [29]. There are few works based on extreme values. ...
... ELM [29] uses the neural network of machine learning to train the positioning model and then predicts the corresponding position according to the signal strength. ...
Article
Full-text available
Wi-Fi-based fingerprint indoor positioning technology has gained special attention, but the development of this technology has been full of challenges such as positioning time cost and positioning accuracy. Therefore, selecting reasonable Wireless Access Points (APs) for positioning is essential, as the more APs used for positioning, the higher the online computation, energy and time cost. Furthermore, the received signal strength (RSS) is easily affected by diverse interference (obstacles, multipath effects, etc.), decreasing the positioning accuracy. AP selection and positioning algorithms are proposed in this paper to solve these issues. The proposed AP selection algorithm fuses RSS distribution and interval overlap degree to select a small number of APs with high importance for positioning. The proposed positioning algorithm uses the location distance between reference points (RPs) to construct a circle and leverages extreme values (maximum and minimum values) of circles to determine the possibility that the test point (TP) appears in each circle, then it finds useful APs to determine the weight of RPs. Extensive experiments are conducted in two different areas, and the results show the effectiveness of the proposed algorithm.
... (3) Dempster-Shafer Reasoning. For the inference method, its main feature is to properly deal with undetermined problems, especially for conditional probability, and it can realize a specific posterior method, but the disadvantage of this method is that its framework is relatively limited; meanwhile, conflicting combinations are prone to exist [16,17]. ...
Article
Full-text available
Outdoor positioning can often achieve accurate positioning according to GPS and mobile phone signaling, while indoor positioning is difficult to meet the needs of practical application due to the limitations of satellite reception. In order to effectively solve the problem of large error in the individual positioning strategy in the indoor environment, this paper applies multisensor in the multisource information fusion indoor positioning system. By using the positioning results of multiple sensors to limit the range of geomagnetic matching for combined matching, the matching error can be effectively reduced. Then, the global optimal value of indoor network is calculated by using the multi-information data fusion algorithm, which can optimize the initial value and threshold of the multi-information data fusion algorithm, improve the network accuracy as much as possible, and accelerate the convergence speed at the same time. After completing the optimization processing, the indoor network can obtain the combined positioning and predicted positioning results, so as to facilitate the fusion training to the actual position coordinates, and finally obtain the optimal positioning results. The simulation results show that the mean square error predicted by the multi-information data fusion algorithm calculated by the multi-information data fusion algorithm can be effectively reduced by 76%, and the fusion positioning accuracy can be improved by 48% compared with the accuracy of a single positioning strategy. The method proposed in this paper effectively improves the positioning accuracy, indicating that the positioning performance is better.
... It is suitable for both supervised and unsupervised learning problems. ELM can randomly select input weights and biases and then determine output weights by simple matrix calculations instead of using traditional gradient-based learning methods [55][56][57][58]; a minimal sketch of this recipe is given after the next entry. Traditional ELM is a single-hidden-layer feedforward neural network, which has advantages in terms of learning rate and generalization ability when compared with other learning systems, such as the single-layer perceptron and SVM [59]. ...
Article
Full-text available
Permanent-magnet linear motors (PMLMs) are widely used in various fields of industrial production, and the optimization design of the PMLM is increasingly attracting attention in order to improve the comprehensive performance of the motor. The primary problem of PMLM optimization design is the establishment of a motor model, and this paper summarizes the modeling of the PMLM electromagnetic field. First, PMLM parametric modeling methods (model-driven methods) such as the equivalent circuit method, analytical method, and finite element method, are introduced, and then non-parametric modeling methods (data-driven methods) such as the surrogate model and machine learning are introduced. Non-parametric modeling methods have the characteristics of higher accuracy and faster computation, and are the mainstream approach to motor modeling at present. However, surrogate models and traditional machine learning models such as support vector machine (SVM) and extreme learning machine (ELM) approaches have shortcomings in dealing with the high-dimensional data of motors, and some machine learning methods such as random forest (RF) require a large number of samples to obtain better modeling accuracy. Considering the modeling problem in the case of the high-dimensional electromagnetic field of the motor under the condition of a limited number of samples, this paper introduces the generative adversarial network (GAN) model and the application of the GAN in the electromagnetic field modeling of PMLM, and compares it with the mainstream machine learning models. Finally, the development of motor modeling that combines model-driven and data-driven methods is proposed.
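The following minimal sketch illustrates the ELM recipe noted in the citation context above: input weights and biases drawn at random, output weights obtained by a single least-squares solve. The tanh activation and the data shapes are illustrative assumptions.

```python
import numpy as np

def elm_train(X, T, n_hidden=100, rng=np.random.default_rng(0)):
    """Minimal ELM: random hidden layer, least-squares output weights."""
    W = rng.standard_normal((X.shape[1], n_hidden))  # random input weights
    b = rng.standard_normal(n_hidden)                # random biases
    H = np.tanh(X @ W + b)                           # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ T                     # Moore-Penrose solution
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Toy usage: RSS vectors mapped to 2-D coordinates (synthetic data).
X = np.random.rand(200, 6)        # 200 fingerprints, 6 APs
T = np.random.rand(200, 2)        # corresponding (x, y) positions
W, b, beta = elm_train(X, T)
pred = elm_predict(X, W, b, beta)
```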
... Since this neural network does not use traditional gradient-based learning algorithms, its training time is remarkably low. ELM has been widely used for classification, regression, clustering and dimensionality reduction, providing good general performance [8]-[12]. Likewise, [13] proposed a new method based on the support vector machine (SVM) for classification and data undersampling to deal with unbalanced radio-maps. ...
Preprint
Full-text available
Machine learning models have become an essential tool in current indoor positioning solutions, given their high capabilities to extract meaningful information from the environment. Convolutional neural networks (CNNs) are among the most used neural networks (NNs) because they are capable of learning complex patterns from the input data. Another model used in indoor positioning solutions is the Extreme Learning Machine (ELM), which provides an acceptable generalization performance as well as a fast speed of learning. In this paper, we offer a lightweight combination of CNN and ELM, which provides a quick and accurate classification of building and floor, suitable for power- and resource-constrained devices. As a result, the proposed model is 58% faster than the benchmark, with a slight improvement in the classification accuracy (by less than 1%).
... However, since many resource-limited smart devices have been extensively deployed under the framework of the internet of things (IoT), finding lightweight online approaches suited to energy-limited device-assisted localization remains highly appealing in resource-constrained environments, such as localization systems for survivor rescue and unmanned aerial vehicle (UAV)-aided systems [1]. Recently, an effective extreme-learning-based robust algorithm was proposed by Lu et al. to locate targets in indoor environments under the close-to-mean and small-residual constraints [23]. More recently, a reinforcement learning with particle filtering framework was developed for wireless indoor positioning, whose robustness has been demonstrated in two typical indoor scenarios [36]. It is noted that the genetic algorithm combined with extreme learning is also a powerful algorithm for indoor localization, as reported in [34]. ...
Article
Mixed noise, such as Gaussian noise together with abrupt noise, widely exists in indoor environments and often degrades the performance of positioning systems under the Internet of Things (IoT). In this paper, a novel kernel function named the generalized Student's t kernel (GSt) and a resulting sparse generalized Student's t kernel adaptive filter (SGStKAF) are proposed to address this problem. The proposed SGStKAF utilizes the kernel mean p-power error criterion (KMPE) with an L1-norm penalty and has three significant features. First, the generalized Student's t kernel suppresses abrupt noise effectively. Second, the L1-norm penalty guarantees that the fixed-point sub-iteration is available, so a more precise solution can be obtained in a few iterations. Finally, the L1 constraint also yields a sparse neural-network structure for implementing the proposed method. Three experiments and comparisons demonstrate the effectiveness of the proposed positioning framework in terms of accuracy and robustness in both simulation and real-world indoor environments.
Article
Full-text available
With the rapid growth of indoor location-based services (LBS) along with internet of things (IoT) applications in daily life, the development of indoor positioning systems has attracted significant attention. Since the global positioning system (GPS) struggles to perform indoors, cost-effective Wi-Fi fingerprinting has become a popular alternative for indoor positioning over the past decade. Accurate identification of the floor level in complex multi-floor buildings is a prerequisite for the success of such positioning systems. Nevertheless, Wi-Fi positioning accuracy is not satisfactory, mainly due to signal overlap between adjacent floors with hollow areas, device heterogeneity, and signal attenuation. In this paper, to overcome these issues, we propose a two-label hierarchical extreme learning machine. Affinity propagation clustering is applied to place overlapping signal points in the same cluster, improving the deep-learning-based ELM's performance by reducing the learning parameters of the classification layer. Two cluster labels and a floor label are then assigned to each sample, and the proposed model is trained on the newly labeled data. In the positioning phase, the final location is calculated from the determined cluster and floor. Experimental results show that the proposed method increases the average positioning accuracy by up to 40% compared with state-of-the-art methods, and it also outperforms the compared methods in floor identification.
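As one illustration of the clustering step, the snippet below uses scikit-learn's AffinityPropagation to attach a cluster label alongside the known floor label; the fingerprint data is synthetic and the downstream hierarchical classifier is not reproduced here.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

# Hypothetical fingerprint matrix: one row of RSS values per sample.
rss = np.random.rand(300, 20)
floor = np.random.randint(0, 4, size=300)   # known floor label per sample

ap = AffinityPropagation(random_state=0).fit(rss)
cluster = ap.labels_                        # cluster label per sample

# Each sample now carries two labels (cluster, floor) for a
# two-label hierarchical classifier of the kind the abstract describes.
labels = np.stack([cluster, floor], axis=1)
```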
Article
The electrical activity of the brain is recorded and measured with electroencephalography (EEG) by placing electrodes on the scalp. It is a well-known and versatile methodology used in both clinical and academic research. In this work, a sparse representation of the EEG signals is first obtained using the K-Singular Value Decomposition (K-SVD) algorithm, and features are extracted with the Self-Organizing Map (SOM) technique. The extracted features are classified with an Extreme Learning Machine (ELM) and with the proposed classification variants of ELM, namely an Ensemble ELM model and a Nature-Inclined ELM model. The proposed Ensemble ELM model combines a Modified AdaBoost.RT based on wavelet thresholding with ELM. The proposed Nature-Inclined ELM combines ELM with well-known swarm-intelligence algorithms: Genetic Algorithm based ELM (GA-ELM), Particle Swarm Optimization based ELM (PSO-ELM), Ant Colony Optimization based ELM (ACO-ELM), Artificial Bee Colony based ELM (ABC-ELM), and Glowworm Swarm Optimization based ELM (GSO-ELM). The extracted features are also classified with a deep learning methodology using a Long Short-Term Memory (LSTM) Recurrent Neural Network (RNN). A further methodology using Non-negative Matrix Factorization (NMF) and Affinity Propagation Congregation based Mutual Information (APCMI) with transfer learning techniques is also proposed and implemented once the sparse modelling is done. The proposed methodology is evaluated on two EEG datasets, an epilepsy dataset and a schizophrenia dataset, and a comprehensive analysis yields very promising results.
Article
Full-text available
Location-based services in different applications push research toward outdoor localization of user equipment in Long Term Evolution (LTE) networks. Telecom operators can introduce valuable services to users based on their location, in both emergency and ordinary situations. This paper introduces DeepFeat, a deep-learning-based framework for outdoor localization using a rich feature set in LTE networks. DeepFeat works on the mobile-operator side and leverages many mobile network features and other metrics to achieve high localization accuracy. To reduce computation and complexity, we introduce a feature selection module that chooses the most appropriate features as inputs to the deep learning model. This module reduces the computation and complexity by around 20.6%, while enhancing system accuracy. The feature selection module uses correlation and Chi-squared algorithms to reduce the feature set to only 12 inputs regardless of the area size, compared to the large number of cell-tower inputs in similar systems, which grows exponentially with the size of the test area. To further improve accuracy, a One-to-Many augmenter is introduced to extend the dataset and improve the system's overall performance. The results show the impact of DeepFeat's careful feature selection on system performance. DeepFeat achieved a median localization accuracy of 13.179 m in an outdoor mid-scale area of 6.27 km² and 13.7 m in a large-scale area of 45 km². Compared with other state-of-the-art deep-learning-based localization systems that leverage a small number of features, the carefully selected DeepFeat feature set enhances localization accuracy by at least 286%.
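The abstract mentions correlation and Chi-squared feature selection; a minimal Chi-squared-only sketch with scikit-learn follows. The correlation filter and DeepFeat's actual pipeline are omitted, and the data and dimensions are made up.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

# Hypothetical LTE measurements: non-negative features, coarse location labels.
X = np.random.rand(1000, 40)           # 40 raw network features (chi2 needs X >= 0)
y = np.random.randint(0, 25, 1000)     # coarse location class per sample

selector = SelectKBest(chi2, k=12).fit(X, y)   # keep the 12 highest-scoring features
X_reduced = selector.transform(X)              # shape (1000, 12)
```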
Preprint
Full-text available
Complementary to the fine-grained channel state information (CSI) from the physical layer and coarse-grained received signal strength indicator (RSSI) measurements, the mid-grained spatial beam attributes (e.g., beam SNR) that are available at millimeter-wave (mmWave) bands during the mandatory beam training phase can be repurposed for Wi-Fi sensing applications. In this paper, we propose a multi-band Wi-Fi fusion method for Wi-Fi sensing that hierarchically fuses the features from both the fine-grained CSI at sub-6 GHz and the mid-grained beam SNR at 60 GHz in a granularity matching framework. The granularity matching is realized by pairing two feature maps from the CSI and beam SNR at different granularity levels and linearly combining all paired feature maps into a fused feature map with learnable weights. To further address the issue of limited labeled training data, we propose an autoencoder-based multi-band Wi-Fi fusion network that can be pre-trained in an unsupervised fashion. Once the autoencoder-based fusion network is pre-trained, we detach the decoders, append multi-task sensing heads to the fused feature map, fine-tune the fusion block, and re-train the multi-task heads from scratch. The multi-band Wi-Fi fusion framework is thoroughly validated on in-house experimental Wi-Fi sensing datasets spanning three tasks: 1) pose recognition; 2) occupancy sensing; and 3) indoor localization. Comparison with four baseline methods (i.e., CSI-only, beam SNR-only, input fusion, and feature fusion) demonstrates that the granularity matching improves multi-task sensing performance. Quantitative performance is evaluated as a function of the number of labeled training data, the latent space dimension, and the fine-tuning learning rates.
Article
Magnetic fields are often utilized for position sensing of mobile devices. In typical sensing systems, multiple sensors detect the magnetic fields generated by target devices, and the detected field data must be converted to device-position data. This conversion is not trivial because it is a nonlinear inverse problem. In this study, we propose a machine-learning approach suited to the data conversion required in magnetic-field-based position sensing of target devices. Two different sets of training data are used: one composed of the raw magnetic-field data detected by the sensors, the other of logarithmically represented field data. Learning with these two training sets yields two different predictor functions, and the prediction accuracy of the target position improves when both predictor functions are used. In our simulations, the error of the target position estimated with the predictor functions is within 10 cm in a 2 m × 2 m × 2 m cubic space in 87% of the tested target-device states, and predicting the positions of the target device takes 4 ms. Being both accurate and fast, the method can be utilized for real-time tracking of moving objects and people.
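The abstract leaves the regression model unspecified, so the sketch below illustrates only the two-predictor idea with stand-in MLP regressors trained on raw and logarithmic field data and combined by averaging. Apart from the raw/log pairing, everything here is an assumption.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Hypothetical data: rows of magnetic-field magnitudes, (x, y, z) targets.
B = np.random.rand(500, 9) + 1e-6      # 9 sensors, strictly positive readings
pos = np.random.rand(500, 3) * 2.0     # positions inside a 2 m cube

# Train one predictor on raw fields and one on logarithmic fields.
raw_model = MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000).fit(B, pos)
log_model = MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000).fit(np.log(B), pos)

# Combine the two predictor functions, e.g. by averaging their estimates.
b_new = B[:5]
estimate = 0.5 * (raw_model.predict(b_new) + log_model.predict(np.log(b_new)))
```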
Article
Full-text available
Low-cost localization solutions for indoor environments have a variety of real-world applications, ranging from emergency evacuation to mobility aids for people with disabilities. In this paper, we introduce a methodology for indoor localization using a commercial smartphone, combining dead reckoning and WiFi signal strength fingerprinting. Additionally, we outline an automated procedure for collecting WiFi calibration data that uses a robot equipped with a laser rangefinder and a fiber-optic gyroscope. These measurements, along with a generated robot map of the environment, are combined using a particle filter for robust pose estimation. The uniqueness of our approach lies in exploiting the complementary nature of the two sources as well as in the efficient adaptation to the smartphone platform. The system was tested with multiple participants in two different indoor environments and achieved localization accuracies on the order of 5 meters, sufficient for a variety of navigation and context-aware applications.
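A generic particle filter of the kind described, fusing a dead-reckoning step with a WiFi position fix, can be sketched as follows. The motion and measurement models are simplified stand-ins, not the authors' implementation, and all parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 500
particles = rng.uniform(0, 20, size=(N, 2))   # initial poses in a 20 m arena
weights = np.full(N, 1.0 / N)

def pf_step(particles, weights, step_vec, wifi_pos, sigma_step=0.3, sigma_wifi=4.0):
    """One predict/update/resample cycle fusing dead reckoning and WiFi."""
    # Predict: apply the dead-reckoning displacement plus motion noise.
    particles = particles + step_vec + rng.normal(0, sigma_step, particles.shape)
    # Update: reweight by the likelihood of the WiFi fingerprint position fix.
    d2 = np.sum((particles - wifi_pos) ** 2, axis=1)
    weights = weights * np.exp(-0.5 * d2 / sigma_wifi ** 2)
    weights /= weights.sum()
    # Resample (multinomial) to avoid weight degeneracy.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))

particles, weights = pf_step(particles, weights,
                             step_vec=np.array([0.5, 0.1]),
                             wifi_pos=np.array([10.0, 5.0]))
estimate = particles.mean(axis=0)   # posterior mean as the pose estimate
```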
Article
Full-text available
Nowadays, developing indoor positioning systems (IPSs) has become an attractive research topic due to the increasing demands on location-based service (LBS) in indoor environments. WiFi technology has been studied and explored to provide indoor positioning service for years in view of the wide deployment and availability of existing WiFi infrastructures in indoor environments. A large body of WiFi-based IPSs adopt fingerprinting approaches for localization. However, these IPSs suffer from two major problems: the intensive costs of manpower and time for offline site survey and the inflexibility to environmental dynamics. In this paper, we propose an indoor localization algorithm based on an online sequential extreme learning machine (OS-ELM) to address the above problems accordingly. The fast learning speed of OS-ELM can reduce the time and manpower costs for the offline site survey. Meanwhile, its online sequential learning ability enables the proposed localization algorithm to adapt in a timely manner to environmental dynamics. Experiments under specific environmental changes, such as variations of occupancy distribution and events of opening or closing of doors, are conducted to evaluate the performance of OS-ELM. The simulation and experimental results show that the proposed localization algorithm can provide higher localization accuracy than traditional approaches, due to its fast adaptation to various environmental dynamics.
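For reference, the chunk-wise update that gives OS-ELM its speed has the standard recursive-least-squares form (following Liang et al.), where $H_{k+1}$ and $T_{k+1}$ are the hidden-layer outputs and targets of the newly arrived data chunk:

$$P_{k+1} = P_k - P_k H_{k+1}^{\top}\left(I + H_{k+1} P_k H_{k+1}^{\top}\right)^{-1} H_{k+1} P_k,$$

$$\beta^{(k+1)} = \beta^{(k)} + P_{k+1} H_{k+1}^{\top}\left(T_{k+1} - H_{k+1}\beta^{(k)}\right),$$

initialized from the offline batch by $P_0 = (H_0^{\top} H_0)^{-1}$ and $\beta^{(0)} = P_0 H_0^{\top} T_0$. Each new chunk thus costs only a small matrix inversion, which is what allows the localization model to track environmental dynamics online.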
Article
Full-text available
Extreme learning machines (ELMs) have proven to be efficient and effective learning mechanisms for pattern classification and regression. However, ELMs are primarily applied to supervised learning problems. Only a few existing research papers have used ELMs to explore unlabeled data. In this paper, we extend ELMs for both semi-supervised and unsupervised tasks based on the manifold regularization, thus greatly expanding the applicability of ELMs. The key advantages of the proposed algorithms are as follows: 1) both the semi-supervised ELM (SS-ELM) and the unsupervised ELM (US-ELM) exhibit learning capability and computational efficiency of ELMs; 2) both algorithms naturally handle multiclass classification or multicluster clustering; and 3) both algorithms are inductive and can handle unseen data at test time directly. Moreover, it is shown in this paper that all the supervised, semi-supervised, and unsupervised ELMs can actually be put into a unified framework. This provides new perspectives for understanding the mechanism of random feature mapping, which is the key concept in ELM theory. Empirical study on a wide range of data sets demonstrates that the proposed algorithms are competitive with the state-of-the-art semi-supervised or unsupervised learning algorithms in terms of accuracy and efficiency.
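Schematically, the semi-supervised objective adds a graph-Laplacian penalty to the regularized ELM loss (per-sample weighting omitted here for brevity), with $h(x_i)$ the hidden-layer row for sample $i$, $l$ the number of labeled samples, $L$ the graph Laplacian built over labeled and unlabeled samples, and $H$ stacking all hidden-layer rows:

$$\min_{\beta}\ \tfrac{1}{2}\lVert\beta\rVert^2 + \tfrac{C}{2}\sum_{i=1}^{l}\lVert t_i - h(x_i)\beta\rVert^2 + \tfrac{\lambda}{2}\,\mathrm{Tr}\!\left(\beta^{\top} H^{\top} L H \beta\right).$$

The manifold term encourages nearby inputs to receive nearby outputs, which is how unlabeled data contributes to training.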
Article
Full-text available
Extreme learning machine (ELM) was initially proposed for single-hidden-layer feedforward neural networks (SLFNs). In the hidden layer (feature mapping), nodes are randomly generated independently of the training data. Furthermore, a unified ELM was proposed, providing a single framework to simplify and unify different learning methods, such as SLFNs, least squares support vector machines, proximal support vector machines, and so on. However, the solution of the unified ELM is dense, and thus plenty of storage space and testing time are usually required for large-scale applications. In this paper, a sparse ELM is proposed as an alternative solution for classification, reducing storage space and testing time. In addition, the unified ELM obtains its solution by matrix inversion, whose computational complexity is between quadratic and cubic with respect to the training size; it still requires plenty of training time for large-scale problems, even though it is much faster than many other traditional methods. In this paper, an efficient training algorithm is specifically developed for sparse ELM. The quadratic programming problem involved in sparse ELM is divided into a series of smallest possible sub-problems, each of which is solved analytically. Compared with SVM, sparse ELM obtains better generalization performance with much faster training speed. Compared with the unified ELM, sparse ELM achieves similar generalization performance for binary classification applications, and when dealing with large-scale binary classification problems, it realizes even faster training speed than the unified ELM.
Article
It is clear that the learning speed of feedforward neural networks is in general far slower than required, and this has been a major bottleneck in their applications for the past decades. Two key reasons may be: (1) slow gradient-based learning algorithms are extensively used to train neural networks, and (2) all the parameters of the networks are tuned iteratively by such learning algorithms. Unlike these conventional implementations, this paper proposes a new learning algorithm called extreme learning machine (ELM) for single-hidden-layer feedforward neural networks (SLFNs), which randomly chooses hidden nodes and analytically determines the output weights of SLFNs. In theory, this algorithm tends to provide good generalization performance at extremely fast learning speed. Experimental results based on a few artificial and real benchmark function approximation and classification problems, including very large complex applications, show that the new algorithm can produce good generalization performance in most cases and can learn thousands of times faster than conventional popular learning algorithms for feedforward neural networks.
Article
The support-vector network is a new learning machine for two-group classification problems. The machine conceptually implements the following idea: input vectors are non-linearly mapped to a very high-dimension feature space. In this feature space a linear decision surface is constructed. Special properties of the decision surface ensure the high generalization ability of the learning machine. The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors. We here extend this result to non-separable training data. High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated. We also compare the performance of the support-vector network to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.
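The extension to non-separable data mentioned here is the familiar soft-margin primal problem

$$\min_{w,\,b,\,\xi}\ \tfrac{1}{2}\lVert w\rVert^2 + C\sum_{i=1}^{N}\xi_i \quad \text{s.t.} \quad y_i\!\left(w^{\top}\phi(x_i) + b\right) \ge 1 - \xi_i,\ \ \xi_i \ge 0,$$

where the slack variables $\xi_i$ absorb margin violations and $C$ trades margin width against training error.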
Article
Extreme learning machines (ELMs) basically give answers to two fundamental learning problems: (1) Can the fundamentals of learning (i.e., feature learning, clustering, regression and classification) be achieved without tuning hidden neurons (including biological neurons), even when the output shapes and function modeling of these neurons are unknown? (2) Does there exist a unified framework for feedforward neural networks and feature-space methods? ELMs, which have built tangible links between machine learning techniques and biological learning mechanisms, have recently attracted increasing attention from researchers across a wide range of research areas. This paper provides an insight into ELMs in three aspects, viz., random neurons, random features and kernels. It also shows that, in theory, ELMs (with the same kernels) tend to outperform the support vector machine and its variants in both regression and classification applications, with much easier implementation.
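In the kernel case referred to here, the unified ELM output function takes the closed form

$$f(x) = \begin{bmatrix} K(x, x_1) & \cdots & K(x, x_N) \end{bmatrix}\left(\frac{I}{C} + \Omega\right)^{-1} T, \qquad \Omega_{ij} = K(x_i, x_j),$$

so no hidden-layer feature map needs to be evaluated explicitly; this is the kernelized counterpart of the random-hidden-nodes formulation used throughout the paper.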
Article
In this short note a very simple proof of Chebyshev's inequality for random vectors is given. This inequality provides a lower bound for the percentage of the population of an arbitrary random vector X with finite mean μ = E(X) and a positive definite covariance matrix V = Cov(X) whose Mahalanobis distance with respect to V from the mean μ is less than a fixed value. The main advantage of the proof is that it is a simple exercise for a first-year probability course. An alternative proof based on principal components is also provided, which can be used to study the case of a singular covariance matrix V.
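The inequality in question states that for a random vector $X \in \mathbb{R}^d$ with mean $\mu$ and positive definite covariance $V$,

$$P\!\left((X - \mu)^{\top} V^{-1} (X - \mu) \ge \varepsilon\right) \le \frac{d}{\varepsilon},$$

so at least a fraction $1 - d/\varepsilon$ of the population lies within Mahalanobis distance $\sqrt{\varepsilon}$ of the mean; bounds of this type underpin the close-to-mean constraint used in the paper.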