
Robust Extreme Learning Machine With Its Application to Indoor Positioning

Xiaoxuan Lu, Han Zou, Hongming Zhou, Lihua Xie, Fellow, IEEE, and Guang-Bin Huang, Senior Member, IEEE

Abstract—The increasing demand for location-based services has spurred the rapid development of indoor positioning systems (IPSs), also referred to as indoor localization systems. However, the performance of IPSs suffers from noisy measurements. In this paper, two kinds of robust extreme learning machines (RELMs), corresponding to the close-to-mean constraint and the small-residual constraint, are proposed to address the issue of noisy measurements in IPSs. Based on whether the feature mapping in the extreme learning machine is explicit, we provide random-hidden-nodes and kernelized formulations of RELMs, respectively, via second order cone programming. Furthermore, the computation of the covariance in the feature space is discussed. Simulations and real-world indoor localization experiments are carried out extensively, and the results demonstrate that the proposed algorithms can not only improve accuracy and repeatability, but also reduce the deviation and worst case error of IPSs compared with other baseline algorithms.

Index Terms—Indoor positioning system (IPS), robust extreme learning machine (RELM), second order cone programming (SOCP).

I. INTRODUCTION

DUE to the nonline-of-sight transmission channels between a satellite and a receiver, wireless indoor positioning has been extensively studied and a number of solutions have been proposed in the past two decades. Unlike other wireless technologies, such as ultrawideband and radio frequency identification, which require the deployment of extra infrastructure, the existing IEEE 802.11 network infrastructure, such as WiFi routers, is widely available in large numbers of commercial and residential buildings. In addition, nearly every mobile device now is equipped with a WiFi receiver [1].

WiFi-based machine learning (ML) approaches have become popular in indoor positioning in recent years [2]. The fingerprinting method based on WiFi received signal strength (RSS), in particular, has received a lot of attention. The fingerprinting localization procedure usually involves two stages: 1) an offline calibration stage and 2) an online matching stage.


During the offline stage, a site survey is conducted and the signal strengths received at each location from the various access points (APs) are recorded in a radio map. During the online stage, users' positions can be estimated by matching the online RSSs against the fingerprints stored in the radio map. The online matching strategy, which exploits the relationship between physical locations and the RSS map as modeled by different ML algorithms, is crucial for the performance of indoor positioning systems (IPSs). Neural networks (NNs) and support vector machines (SVMs) [3], as two sophisticated ML techniques, have both been utilized in fingerprinting-based indoor positioning [4].
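As a concrete illustration of the online matching stage (our sketch, not part of the paper's method), the following minimal example performs nearest-neighbor matching of an online RSS vector against a radio map; the variable names and the Euclidean metric are illustrative assumptions.

```python
import numpy as np

def match_fingerprint(rss_online, radio_map_rss, radio_map_pos):
    """Nearest-neighbor matching: return the reference position whose
    stored RSS fingerprint is closest (Euclidean) to the online reading.

    rss_online:    (d,) online RSS vector, one entry per AP
    radio_map_rss: (n_points, d) RSS fingerprints of the reference points
    radio_map_pos: (n_points, 2) 2-D coordinates of the reference points
    """
    dists = np.linalg.norm(radio_map_rss - rss_online, axis=1)
    return radio_map_pos[np.argmin(dists)]
```

The ML-based matching strategies discussed in this paper replace this simple metric with a learned regression model.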

However, both NN-based and SVM-based IPSs face two challenges. On one hand, NN and SVM training is time-consuming, and this issue becomes more serious in fingerprinting-based positioning systems because a large amount of training data is required for generating a radio map. Their high computational costs leave us little leeway, especially in some large-scale scenarios, to improve the performance and robustness of ML-based IPSs. On the other hand, noisy measurements are inevitable, considering that manual observational errors at calibrated points occur throughout the calibration phase. In addition, signal variation and ambient dynamics also affect the signals received from the APs. These adverse factors can be considered as uncertainties, which may degrade the performance of IPSs. Many researchers bypass optimizing ML methods to enhance the robustness of IPSs since doing so would aggravate the slow training rate. Kothari et al. [5] utilized the integration of the complementary localization approaches of dead reckoning and WiFi signal strength fingerprinting to achieve robust indoor localization; nevertheless, a disadvantage of dead reckoning is that its errors are cumulative, since new positions are calculated solely from previous ones. Meng et al. [6] proposed a robust noniterative three-step location sensing method, but its capability of reducing the worst case error (WCE) and variance is comparatively limited. Other robust indoor localization algorithms demand either extra infrastructure or users' interaction during calibration phases, which is not cost-efficient in reality.

These undesirable results motivate us to reconsider the problem: can we find an ML technique that is fast in training and capable of handling the robustness issue in IPSs? As a novel learning technique, the extreme learning machine (ELM) has demonstrated outstanding performance in training speed, prediction accuracy, and generalization ability [7], [8]. Several IPSs have already leveraged ELM to deliver accurate location estimation with fast training speed [1], [9], [10].


Extending ELM, this paper proposes two robust ELMs (RELMs), which can be implemented in random-hidden-nodes form or kernelized form depending on the situation, to boost the robustness of IPSs.

The problem of uncertainty and robustness has been intensively studied in recent years. Wang et al. [11] proposed an ELM tree model, based on heuristics of uncertainty reduction, that is computationally lightweight for big data classification. A fuzzy integral method was adopted to study probabilistic feed-forward neural networks [12]. Horata et al. [13] proposed an approach, also named RELM, that improves computational robustness by extended complete orthogonal decomposition and outlier robustness by reweighted least squares. Unlike these works, considering the noises in IPSs as discussed above, we propose our algorithms under a stochastic framework. It is worthwhile to mention that the RELMs are based on second order cone programming (SOCP), which is widely adopted in robust convex optimization problems. Simulation and real-world experimental results both demonstrate that RELM-based IPSs outperform IPSs based on other baseline algorithms in terms of accuracy, repeatability (REP), and WCE.

An outline of this paper is as follows. In Section II, we introduce the preliminaries for this paper, including the basic components of a WiFi-based IPS, background on ELM, and its comparison with SVR. Two second order moment constraints, i.e., the close-to-mean (CTM) and small-residual (SR) constraints, with their geometric interpretations, are given in Section III. The random-hidden-nodes and kernelized formulations of RELMs are derived in Sections IV and V, respectively. How to calculate the covariance in the feature space is studied in Section VI. In Section VII, the proposed algorithms are evaluated by both simulation and real-world IPSs. The conclusion is drawn in Section VIII.

II. PRELIMINARIES

A. WiFi Indoor Positioning

An enormous body of indoor positioning problems fall into the category of regression problems. As shown in Table I, the input variable $\mathbf{x} = (x_1, x_2, \ldots, x_d)$ is a vector of RSS readings received from the APs in the environment, and $\mathbf{t} = (t_1, t_2)$ is the indoor 2-D physical coordinate of a target's location. When an AP is undetectable at a position, its corresponding RSS is taken as −100 dBm. The problem here is to train and approximate the regression model.

Although in some works the procedure of collecting signal strengths involves physically moving a wireless device all around the target area, as in [14] and [15], we only pick out some spatially representative locations, i.e., reference (calibration) points, from the target area, and conduct sampling at each reference point for a period of time to build up the radio map.

B. Introduction to ELM

Originally inspired by biological learning to overcome the challenging issues faced by back propagation (BP) learning algorithms, ELM is a kind of ML algorithm based on a generalized single-hidden-layer feedforward NN (SLFN) architecture [16]. It has been demonstrated to provide good generalization performance at an extremely fast learning speed [17]–[19].

TABLE I. Input Variable: RSS ($\mathbf{x}$) and Output: Location ($\mathbf{t}$).

Let $\Upsilon = \{(\mathbf{x}_i, \mathbf{t}_i);\ i = 1, 2, \ldots, N\}$ be a training set consisting of patterns, where $\mathbf{x}_i \in \mathbb{R}^{1 \times d}$ and $\mathbf{t}_i \in \mathbb{R}^{1 \times m}$; then the goal of regression is to find the relationship between $\mathbf{x}_i$ and $\mathbf{t}_i$. Since the only parameters to be optimized are the output weights, the training of ELM is equivalent to solving a least squares problem [20].

In the training process, the first stage is that the hidden neurons of ELM map the inputs onto a feature space

$$h : \mathbf{x}_i \rightarrow h(\mathbf{x}_i) \tag{1}$$

where $h(\mathbf{x}_i) \in \mathbb{R}^{1 \times L}$.

We denote $\mathbf{H}$ as the hidden layer output matrix (randomized matrix)

$$\mathbf{H} = \begin{bmatrix} h(\mathbf{x}_1) \\ h(\mathbf{x}_2) \\ \vdots \\ h(\mathbf{x}_N) \end{bmatrix}_{N \times L} \tag{2}$$

with $L$ the dimension of the feature space and $\beta \in \mathbb{R}^{L \times m}$ the output weight matrix that connects the hidden layer with the output layer. Then, each output of ELM is given by

$$\mathbf{t}_i = h(\mathbf{x}_i)\beta, \quad i = 1, 2, \ldots, N. \tag{3}$$

ELM theory aims to reach not only the smallest training error but also the smallest norm of the output weights [16]

$$\min_{\xi,\,\beta \in \mathbb{R}^{L \times m}} L_P = \frac{1}{2}\|\beta\|_{p}^{\sigma_1} + \frac{C}{2}\sum_{i=1}^{N}\|\xi_i\|_{q}^{\sigma_2} \quad \text{s.t.}\;\; h(\mathbf{x}_i)\beta - \mathbf{t}_i = \xi_i, \;\; i = 1, 2, \ldots, N \tag{4}$$

where $\sigma_1 > 0$, $\sigma_2 > 0$, $p, q = 0, 1/2, 1, 2, \ldots, +\infty$,¹ $C$ is the penalty coefficient on the training errors, and $\xi_i \in \mathbb{R}^m$ is the error vector with respect to the $i$th training pattern.

The simplest example of the above is basic ELM [17]

$$\min_{\beta \in \mathbb{R}^{L \times m}} L_P = \sum_{i=1}^{N}\xi_i \quad \text{s.t.}\;\; \|h(\mathbf{x}_i)\beta - \mathbf{t}_i\|^2 = \xi_i, \;\; i = 1, 2, \ldots, N \tag{5}$$

which can be solved by the least squares method

$$\beta = \mathbf{H}^{\dagger}\mathbf{T} \tag{6}$$

where $\mathbf{H}^{\dagger}$ is the Moore–Penrose generalized inverse of $\mathbf{H}$ and $\mathbf{T} = \left[\mathbf{t}_1^T, \ldots, \mathbf{t}_N^T\right]^T$ stacks the training targets.

¹Unless explicitly specified, $\sigma_1 = \sigma_2 = 2$ for all norm notations in this paper.
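To make the basic ELM pipeline of (1)–(6) concrete, here is a minimal sketch (ours, not from the paper) with sigmoid hidden nodes: the hidden-layer parameters are random and only $\beta$ is learned, via the pseudoinverse as in (6).

```python
import numpy as np

def train_basic_elm(X, T, L=500, seed=0):
    """Basic ELM (5)-(6): random sigmoid hidden layer, least-squares output weights."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((X.shape[1], L))   # random input weights a_i
    b = rng.standard_normal(L)                 # random hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ A + b)))     # hidden layer output matrix (2)
    beta = np.linalg.pinv(H) @ T               # Moore-Penrose solution (6)
    return A, b, beta

def predict_elm(X, A, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ A + b)))
    return H @ beta                            # outputs t = h(x) beta as in (3)
```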


Extending basic ELM, [21] proposed an optimization-method-based ELM (OPT-ELM) for the binary classification problem by introducing inequality constraints. We follow [21] to give a form of OPT-ELM for regression problems:

$$\min_{\xi,\,\beta \in \mathbb{R}^{L \times m}} L_P = \frac{1}{2}\|\beta\|^2 + \frac{C}{2}\sum_{i=1}^{N}\xi_i \quad \text{s.t.}\;\; \|h(\mathbf{x}_i)\beta - \mathbf{t}_i\| \le \varepsilon + \xi_i, \;\; \xi_i \ge 0, \;\; i = 1, 2, \ldots, N \tag{7}$$

where $\varepsilon$ is a slack variable. This formulation is very similar to support vector regression (SVR) in the nonlinear case [3], [22], which takes the following form:

$$\min_{\xi,\,\mathbf{w},\,b} L_{P_{\mathrm{SVM}}} = \frac{1}{2}\|\mathbf{w}\|^2 + \frac{C}{2}\sum_{i=1}^{N}\xi_i \quad \text{s.t.}\;\; \|\mathbf{w}\cdot\phi(\mathbf{x}_i) + b - \mathbf{t}_i\| \le \varepsilon + \xi_i, \;\; \xi_i \ge 0, \;\; i = 1, 2, \ldots, N \tag{8}$$

where $\phi(\cdot)$ is the nonlinear feature mapping function in SVR, $\mathbf{w}$ is the vector of output weights, and $b$ is the approximation (output) bias. $\varepsilon$ and $\xi_i$ are as defined in the OPT-ELM case.

A detailed comparison between ELM and SVM for classification problems is given in [21] and [23], and in the next section we extend this comparison to regression problems. For convenience of description, we henceforth follow [16] in referring to the formulation of (7) as OPT-ELM, while basic ELM stands for the formulation of (5). The terminology ELM in the rest of this paper has a broader meaning: it can be considered as the collection of basic ELM and its random-hidden-nodes-based variants.²

C. Comparisons Between ELM and SVR

Both formulations of ELM and SVR fall within the scope of quadratic programming; however, the decision variable $b$, i.e., the bias term, is not present in ELM.

SVR and its variants emphasize the importance of the bias $b$ in their implementation. The reason is that the separation capability of SVM was considered more important than its regression capability when SVM was first proposed to handle binary classification applications. Against this background, its universal approximation capability may somehow have been neglected [3]. Because the feature mapping $\phi(\cdot)$ in SVR is unknown, it is difficult to study the universal approximation capability of SVR without the explicitness of the feature mapping. Since $\phi(\cdot)$ is unknown and may not have the universal approximation capability, given a target function $f(\cdot)$ and any small precision $\varepsilon$, there may not exist a $\mathbf{w}$ such that $\|\mathbf{w}\cdot\phi(\mathbf{x}) - f(\mathbf{x})\| < \varepsilon$. In other words, there may exist some systematic errors even if SVM and its variants with appropriate kernels can separate different classes well, and these systematic errors need to be absorbed by the bias $b$. This may be the reason why, in principle, the bias $b$ has to remain in the optimization constraints [16].

²We deliberately avoid including kernel ELM and its variants in the above collection, given that they do not possess the most significant property of ELM: random feature mapping.

On the other hand, all the parameters of the ELM mapping $h(\mathbf{x})$ are randomly generated, and $h(\mathbf{x})$ is ultimately known to users. According to [17]–[19], ELM with almost any nonlinear piecewise continuous function $h(\mathbf{x})$ has the universal approximation capability. Therefore, the bias $b$ is not necessary in the output nodes of ELM.

In addition, from the optimization point of view, fewer decision variables to be determined implies lower computational cost, and this computational superiority becomes more obvious as the scale of the training data gets larger.

Kernel ELM is somewhat superior to SVR thanks to its flexibility in kernels: the feature mapping that forms the kernels can be an unknown mapping or a random feature mapping. A fuller introduction to kernel ELM is given in Section V.

Huang [16] pointed out that the "redundant" $b$ renders SVR suboptimal compared with ELM if the same kernel is used in both, because the feasible solution space of SVR is a subset of the ELM feasible solution space.

We emphasize that the main difference between ELM and SVR lies in their different starting points. SVR [24] was first developed as an extension of SVM. As mentioned above, SVM was designed for binary classification at first, and the subsequent variants for regression problems were developed on the basis of SVM without addressing the problem caused by $b$. By contrast, ELM was originally proposed for regression, its feature mappings $h(\mathbf{x})$ are known, and the universal approximation capability was considered from the outset. Thus, in ELM, the approximation error tends to zero and $b$ should not be present [16], [21], [23].

III. ROBUST ELM

A. Uncertainties of Input and Output Data

RELM is proposed under a stochastic framework. Assume that both the input data $\mathbf{x}$ and the output data $\mathbf{t}$ are perturbed by noises. Since $\mathbf{H}$ is the feature space after the nonlinear mapping from the input space, if the input data are contaminated, $\mathbf{H}$ is also mixed with disturbances. We follow [25] in assuming that the disturbances in the feature space are additive:

$$h(\mathbf{x}_i) = h(\mathbf{x}_i)_{\mathrm{true}} + (\iota_1)_i, \qquad \mathbf{t}_i = (\mathbf{t}_i)_{\mathrm{true}} + (\iota_2)_i \tag{9}$$

where $(\iota_1)_i$ and $(\iota_2)_i$ are uncorrelated perturbations in the feature space and output space with proper dimensions, respectively. The new vector $\mathbf{y}_i \in \mathbb{R}^{1 \times (L+m)}$ collects the $i$th input and output observations, i.e., $\mathbf{y}_i = [h(\mathbf{x}_i), \mathbf{t}_i]$. We now give the following definitions:

$$\bar{h}(\mathbf{x}_i) = E(h(\mathbf{x}_i)), \quad \bar{\mathbf{t}}_i = E(\mathbf{t}_i), \quad \Sigma_{hh}^{i} = \mathrm{Cov}(h(\mathbf{x}_i), h(\mathbf{x}_i)), \quad \Sigma_{tt}^{i} = \mathrm{Cov}(\mathbf{t}_i, \mathbf{t}_i) \tag{10}$$

where $E(\cdot)$ and $\mathrm{Cov}(\cdot)$ denote the expectation and covariance operators for random variables, respectively. Since the perturbations in the feature space $(\iota_1)_i$ and the output space $(\iota_2)_i$ are uncorrelated, i.e., $\Sigma_{ht}^{i} = 0$, we have

$$\bar{\mathbf{y}}_i = E([h(\mathbf{x}_i), \mathbf{t}_i]) = \left[\bar{h}(\mathbf{x}_i), \bar{\mathbf{t}}_i\right], \qquad \Sigma_{yy}^{i} = \mathrm{Cov}(\mathbf{y}_i, \mathbf{y}_i) = \begin{bmatrix} \Sigma_{hh}^{i} & 0 \\ 0 & \Sigma_{tt}^{i} \end{bmatrix}_{(L+m)\times(L+m)}. \tag{11}$$


The $i$th prediction error is denoted by $e_i \in \mathbb{R}^{1 \times m}$ and its expectation $\bar{e}_i$ is defined as follows:

$$e_i = h(\mathbf{x}_i)\beta - \mathbf{t}_i, \qquad \bar{e}_i = \bar{h}(\mathbf{x}_i)\beta - \bar{\mathbf{t}}_i. \tag{12}$$

It follows from [25] and [26] that, by inserting CTM and SR constraints into SVR, the predictions can be made robust to perturbations in the data set.

CTM is a criterion by which we require the prediction errors to be insensitive to the distribution of the noises in the input and output data

$$\Pr_{\mathbf{x}_i, \mathbf{y}_i}\{|e_i - \bar{e}_i| \ge \theta_i\} \le \eta, \quad i = 1, 2, \ldots, N \tag{13}$$

where $\mathbf{x}_i, \mathbf{y}_i$ are the input and output data, $\theta_i$ is the confidence threshold, and $\eta$ denotes the maximum tolerance of the deviation.

An alternative way to boost the robustness is to restrict the residual to be small, which leads to the SR constraint

$$\Pr_{\mathbf{x}_i, \mathbf{y}_i}\{|e_i| \ge \xi_i + \varepsilon\} \le \eta \tag{14}$$

where $\xi_i$ corresponds to the prediction error and $\varepsilon$ is a slack variable. Compared with the CTM constraint, the SR constraint requires the estimator to be robust against deviations that lead to large estimation errors, rather than against deviations from the center. In fact, both CTM and SR constraints are robust constraints utilized to bound the probabilities of highly deviated errors subject to second order moment constraints.

B. Sufficient Condition of CTM Constraint

It should be pointed out that the above two robust constraints only consider the scalar output case; however, the outputs of IPSs are usually vectors. Moreover, ELM and kernel ELM algorithms are inherently different from SVR, so different constraints should be provided for our problem setting. We now give the CTM constraint used in this paper

$$\Pr_{h(\mathbf{x}_i), \mathbf{t}_i}\left\{\|e_i - \bar{e}_i\|^2 \ge \theta_i^2\right\} \le \tau, \quad i = 1, 2, \ldots, N \tag{15}$$

where $\theta_i$ is still a confidence threshold and $\tau$ here stands for some probability. Nevertheless, CTM constraints in this form are intractable. The multidimensional Chebyshev inequality is leveraged to convert the original constraints into tractable ones.

Lemma 1 [27]: Let $\mathbf{z}$ be an $m$-dimensional random row vector with expected value $\bar{\mathbf{z}}$ and positive-definite covariance $\Sigma$; then

$$\Pr\left\{(\mathbf{z} - \bar{\mathbf{z}})\Sigma^{-1}(\mathbf{z} - \bar{\mathbf{z}})^T \ge \theta^2\right\} \le \frac{m}{\theta^2}. \tag{16}$$

Proposition 1: For $\mathbf{z}$ and $\Sigma$ defined in Lemma 1, if $\|\mathbf{z}\|^2 \ge \|\Sigma\|\vartheta$, then $\mathbf{z}\Sigma^{-1}\mathbf{z}^T \ge \vartheta$.

Proof: Since $\Sigma$ is a real-valued symmetric matrix, it can be diagonalized as $\Sigma = P^{-1}\Lambda P$, where $\Lambda$ is a real-valued diagonal matrix with the eigenvalues of $\Sigma$ on its diagonal. It can be shown that

$$\Lambda \preceq \|\Sigma\| I \;\Rightarrow\; \Lambda^{-1} \succeq \|\Sigma\|^{-1} I \tag{17}$$

which leads to

$$\mathbf{z}\Sigma^{-1}\mathbf{z}^T = \mathbf{z}P^{-1}\Lambda^{-1}P\mathbf{z}^T \ge \frac{\mathbf{z}\mathbf{z}^T}{\|\Sigma\|} \tag{18}$$

and (18) gives rise to

$$\|\mathbf{z}\|^2 \ge \|\Sigma\|\vartheta \;\Rightarrow\; \mathbf{z}\Sigma^{-1}\mathbf{z}^T \ge \vartheta. \tag{19}$$

Proposition 1 also implies

$$\Pr\left\{\|\mathbf{z}\|^2 \ge \|\Sigma\|\vartheta\right\} \le \Pr\left\{\mathbf{z}\Sigma^{-1}\mathbf{z}^T \ge \vartheta\right\}. \tag{20}$$
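As a quick empirical sanity check of (20) (ours, not in the paper), the sketch below compares the two tail probabilities for a Gaussian row vector; the particular covariance, threshold, and sample count are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n_samples, vartheta = 2, 100_000, 4.0
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])       # positive-definite covariance
Sigma_inv = np.linalg.inv(Sigma)
spec = np.linalg.norm(Sigma, 2)                  # spectral norm ||Sigma||

Z = rng.multivariate_normal(np.zeros(m), Sigma, size=n_samples)
outside_sphere = np.mean(np.sum(Z**2, axis=1) >= spec * vartheta)
outside_ellipsoid = np.mean(np.einsum("ij,jk,ik->i", Z, Sigma_inv, Z) >= vartheta)
print(outside_sphere, "<=", outside_ellipsoid)   # event inclusion behind (20)
```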

Theorem 1: Let $\beta \in \mathbb{R}^{L \times m}$, $\omega = [\beta^T, -\mathbf{1}]^T \in \mathbb{R}^{(L+m) \times m}$, and let $\Sigma_{yy}^{i}$ be defined as in (11); then a sufficient condition for (15) is

$$\left\|\left(\Sigma_{yy}^{i}\right)^{\frac{1}{2}}\omega\right\| \le \theta_i\sqrt{\tau/m} \tag{21}$$

where $-\mathbf{1}$ is a vector of all entries $-1$ with proper length.

Proof: Substituting $e_i$, $\theta_i$ for $\mathbf{z}$, $\theta$ in (16), we have

$$\Pr_{h(\mathbf{x}_i), \mathbf{t}_i}\left\{(e_i - \bar{e}_i)\left(\Sigma_{ee}^{i}\right)^{-1}(e_i - \bar{e}_i)^T \ge \theta_i^2\right\} \le \frac{m}{\theta_i^2} \tag{22}$$

which, together with (20), leads to

$$\Pr_{h(\mathbf{x}_i), \mathbf{t}_i}\left\{\|e_i - \bar{e}_i\|^2 \ge \theta_i^2\right\} \le \Pr_{h(\mathbf{x}_i), \mathbf{t}_i}\left\{(e_i - \bar{e}_i)\left(\Sigma_{ee}^{i}\right)^{-1}(e_i - \bar{e}_i)^T \ge \frac{\theta_i^2}{\left\|\Sigma_{ee}^{i}\right\|}\right\} \le \frac{m\left\|\Sigma_{ee}^{i}\right\|}{\theta_i^2}. \tag{23}$$

Thus, $m\left\|\Sigma_{ee}^{i}\right\|/\theta_i^2 \le \tau$ is a sufficient condition for (15). Taking into account that

$$\Sigma_{ee}^{i} = \omega^T\Sigma_{yy}^{i}\omega \tag{24}$$

inserting (24) into $m\left\|\Sigma_{ee}^{i}\right\|/\theta_i^2 \le \tau$ and then taking the square root on both sides, (21) follows.

C. Sufficient Condition of SR Constraint

The sufficient condition of the SR constraint can be derived in the same fashion. The SR constraint in our case is

$$\Pr_{h(\mathbf{x}_i), \mathbf{t}_i}\left\{\|e_i\|^2 \ge (\xi_i + \varepsilon)^2\right\} \le \tau, \quad i = 1, 2, \ldots, N. \tag{25}$$

Theorem 2: Let $\beta \in \mathbb{R}^{L \times m}$, $\omega = [\beta^T, -\mathbf{1}]^T \in \mathbb{R}^{(L+m) \times m}$, and let $\Sigma_{yy}^{i}$ be defined as in (11); then a sufficient condition for (25) is

$$\left\|\begin{bmatrix}\left(\Sigma_{yy}^{i}\right)^{\frac{1}{2}}\omega \\ \bar{h}(\mathbf{x}_i)\beta - \bar{\mathbf{t}}_i\end{bmatrix}\right\| \le (\xi_i + \varepsilon)\sqrt{\tau/m} \tag{26}$$

where $-\mathbf{1}$ is a vector of all entries $-1$ with proper length.

Proof: Taking $e_ie_i^T \in \mathbb{R}$ as a random variable, from Markov's inequality we have

$$\Pr_{h(\mathbf{x}_i), \mathbf{t}_i}\left\{\|e_i\|^2 \ge (\xi_i + \varepsilon)^2\right\} = \Pr_{h(\mathbf{x}_i), \mathbf{t}_i}\left\{e_ie_i^T \ge (\xi_i + \varepsilon)^2\right\} \le \frac{E\left(e_ie_i^T\right)}{(\xi_i + \varepsilon)^2}.$$


Fig. 1. The shaded area indicates the possible region into which the random variable may fall.

Denote by $\mathrm{tr}(\cdot)$ the trace operator of a matrix. Then

$$E\left(e_ie_i^T\right) = E\left(\mathrm{tr}\left(e_i^Te_i\right)\right) = E\left(\mathrm{tr}\left(e_i^Te_i - \bar{e}_i^T\bar{e}_i\right)\right) + \mathrm{tr}\left(\bar{e}_i^T\bar{e}_i\right) = \mathrm{tr}\left(\Sigma_{ee}^{i} + \bar{e}_i^T\bar{e}_i\right). \tag{27}$$

$\Sigma_{ee}^{i}$ and $\bar{e}_i^T\bar{e}_i$ are both positive semi-definite, which implies that $\Sigma_{ee}^{i} + \bar{e}_i^T\bar{e}_i$ is positive semi-definite. Since

$$\left\|\Sigma_{ee}^{i} + \bar{e}_i^T\bar{e}_i\right\| = \max\{\lambda_1, \ldots, \lambda_m\} \tag{28}$$

where $\lambda_i$ stands for an eigenvalue of $\Sigma_{ee}^{i} + \bar{e}_i^T\bar{e}_i$, we have

$$\mathrm{tr}\left(\Sigma_{ee}^{i} + \bar{e}_i^T\bar{e}_i\right) \le m\left\|\Sigma_{ee}^{i} + \bar{e}_i^T\bar{e}_i\right\| \tag{29}$$

which leads to

$$m\left\|\Sigma_{ee}^{i} + \bar{e}_i^T\bar{e}_i\right\| = m\left\|\begin{bmatrix}\left(\Sigma_{yy}^{i}\right)^{\frac{1}{2}}\omega \\ \bar{h}(\mathbf{x}_i)\beta - \bar{\mathbf{t}}_i\end{bmatrix}\right\|^2. \tag{30}$$

By letting

$$\frac{m}{(\xi_i + \varepsilon)^2}\left\|\begin{bmatrix}\left(\Sigma_{yy}^{i}\right)^{\frac{1}{2}}\omega \\ \bar{h}(\mathbf{x}_i)\beta - \bar{\mathbf{t}}_i\end{bmatrix}\right\|^2 \le \tau \tag{31}$$

and taking the square root on both sides, we conclude that (26) is a sufficient condition for (25).

D. Geometric Interpretation

The geometric interpretations of the above claims are as follows.

1) Proposition 1 can be interpreted as saying that the chance of a random variable lying outside a sphere of radius $\sqrt{\|\Sigma\|\vartheta}$ is no greater than the chance of it lying outside an ellipsoid of radius $\sqrt{\vartheta}$ with covariance matrix $\Sigma$. This is intuitive because the largest semi-axis of the ellipsoid is equal to the radius of the sphere and the two share the same center. Fig. 1 shows the illustration when the ellipsoid and sphere are projected onto a 2-D space.

2) The above CTM robust criterion can be understood as a restriction that each training datum $\mathbf{y}_i$ picked from the ellipsoid $\mathcal{E}_i\left(\bar{\mathbf{y}}_i, \Sigma_{yy}^{i}, \sqrt{m/\tau}\right)$ satisfies the inequality

$$\|e_i - \bar{e}_i\| \le \theta_i \tag{32}$$

where

$$\mathcal{E}_i\left(\bar{\mathbf{y}}_i, \Sigma_{yy}^{i}, \sqrt{m/\tau}\right) = \left\{\mathbf{y}_i \,\middle|\, (\mathbf{y}_i - \bar{\mathbf{y}}_i)\left(\Sigma_{yy}^{i}\right)^{-1}(\mathbf{y}_i - \bar{\mathbf{y}}_i)^T \le \frac{m}{\tau}\right\}. \tag{33}$$

From Theorem 1, we have

$$\sqrt{m/\tau}\left\|\left(\Sigma_{yy}^{i}\right)^{\frac{1}{2}}\omega\right\| \le \theta_i. \tag{34}$$

Further, by noting that

$$\|e_i - \bar{e}_i\| = \|(\mathbf{y}_i - \bar{\mathbf{y}}_i)\omega\| = \left\|(\mathbf{y}_i - \bar{\mathbf{y}}_i)\left(\Sigma_{yy}^{i}\right)^{-\frac{1}{2}}\left(\Sigma_{yy}^{i}\right)^{\frac{1}{2}}\omega\right\| \le \left\|(\mathbf{y}_i - \bar{\mathbf{y}}_i)\left(\Sigma_{yy}^{i}\right)^{-\frac{1}{2}}\right\|\left\|\left(\Sigma_{yy}^{i}\right)^{\frac{1}{2}}\omega\right\| \le \sqrt{\frac{m}{\tau}}\left\|\left(\Sigma_{yy}^{i}\right)^{\frac{1}{2}}\omega\right\| \tag{35}$$

it is obvious that the above geometric interpretation for the CTM constraint holds.

3) A similar geometric interpretation can be given for the SR constraint. Letting

$$\tilde{\Sigma}_{yy}^{i} = \Sigma_{yy}^{i} + \bar{\mathbf{y}}_i^T\bar{\mathbf{y}}_i \tag{36}$$

the SR constraint enforces that each training datum $\mathbf{y}_i$ picked from the ellipsoid $\mathcal{E}_i\left(0, \tilde{\Sigma}_{yy}^{i}, \sqrt{m/\tau}\right)$

$$\mathcal{E}_i\left(0, \tilde{\Sigma}_{yy}^{i}, \sqrt{m/\tau}\right) = \left\{\mathbf{y}_i \,\middle|\, \mathbf{y}_i\left(\tilde{\Sigma}_{yy}^{i}\right)^{-1}\mathbf{y}_i^T \le \frac{m}{\tau}\right\} \tag{37}$$

satisfies the following inequality:

$$\|e_i\| \le \xi_i + \varepsilon. \tag{38}$$

The procedure to verify this interpretation follows the same fashion as the CTM case

$$\|e_i\| = \|\mathbf{y}_i\omega\| = \left\|\mathbf{y}_i\left(\tilde{\Sigma}_{yy}^{i}\right)^{-\frac{1}{2}}\left(\tilde{\Sigma}_{yy}^{i}\right)^{\frac{1}{2}}\omega\right\| \le \left\|\mathbf{y}_i\left(\tilde{\Sigma}_{yy}^{i}\right)^{-\frac{1}{2}}\right\|\left\|\left(\tilde{\Sigma}_{yy}^{i}\right)^{\frac{1}{2}}\omega\right\| \le \sqrt{\frac{m}{\tau}}\left\|\left(\tilde{\Sigma}_{yy}^{i}\right)^{\frac{1}{2}}\omega\right\|. \tag{39}$$

From Theorem 2, we have

$$\left\|\begin{bmatrix}\left(\Sigma_{yy}^{i}\right)^{\frac{1}{2}}\omega \\ \bar{h}(\mathbf{x}_i)\beta - \bar{\mathbf{t}}_i\end{bmatrix}\right\|^2 = \left\|\Sigma_{ee}^{i} + \bar{e}_i^T\bar{e}_i\right\| = \left\|\omega^T\left(\Sigma_{yy}^{i} + \bar{\mathbf{y}}_i^T\bar{\mathbf{y}}_i\right)\omega\right\| = \left\|\omega^T\tilde{\Sigma}_{yy}^{i}\omega\right\| \le \frac{\tau}{m}(\xi_i + \varepsilon)^2. \tag{40}$$

Taking square roots in (40) yields

$$\left\|\left(\tilde{\Sigma}_{yy}^{i}\right)^{\frac{1}{2}}\omega\right\| \le \sqrt{\frac{\tau}{m}}(\xi_i + \varepsilon) \tag{41}$$

which together with (39) implies

$$\|e_i\| \le \xi_i + \varepsilon. \tag{42}$$


IV. ROBUST ELM FOR REGRESSION

Based on the preliminary results of the last section, we now formulate the CTM-constrained RELM (CTM-RELM) and the SR-constrained RELM (SR-RELM) for noisy input and output data.

A. CTM-Based RELM

By adding the second order moment constraint of Theorem 1 to the ELM formulation, the CTM-RELM is formulated as

$$\min_{\beta, b, \theta, \xi} L_P = b + C\sum_{i=1}^{N}\xi_i + D\sum_{i=1}^{N}\theta_i$$
$$\text{s.t.}\;\; \|h(\mathbf{x}_i)\beta - \mathbf{t}_i\| \le \varepsilon + \xi_i$$
$$\left\|\left(\Sigma_{yy}^{i}\right)^{\frac{1}{2}}\omega\right\| \le \theta_i\sqrt{\tau/m}$$
$$\xi_i \ge 0, \quad i = 1, 2, \ldots, N$$
$$\|\beta\| \le b \tag{43}$$

where $C$ is defined as in (7) and $D$ is a penalty coefficient that controls the deviation of the prediction errors.

B. SR-Based RELM

Likewise, Theorem 2 leads to an SOCP problem formulation

$$\min_{\beta, b, \xi} L_P = b + C\sum_{i=1}^{N}\xi_i$$
$$\text{s.t.}\;\; \left\|\begin{bmatrix}\left(\Sigma_{yy}^{i}\right)^{\frac{1}{2}}\omega \\ \bar{h}(\mathbf{x}_i)\beta - \bar{\mathbf{t}}_i\end{bmatrix}\right\| \le (\xi_i + \varepsilon)\sqrt{\tau/m}$$
$$\xi_i \ge 0, \quad i = 1, 2, \ldots, N$$
$$\|\beta\| \le b. \tag{44}$$
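For concreteness, here is a minimal sketch of CTM-RELM (43) in CVXPY (ours; the paper solves these SOCPs with the CVX MATLAB toolbox [33]). The helper names are assumptions, and we use the Frobenius norm on the matrix terms, which upper-bounds the spectral norm and keeps the problem an SOCP.

```python
import numpy as np
import cvxpy as cp

def ctm_relm(H, T, Syy_sqrt, C=1.0, D=1.0, tau=0.5, eps=0.05):
    """Sketch of CTM-RELM (43). H: (N, L) hidden-layer outputs, T: (N, m)
    targets, Syy_sqrt: list of N matrices (Sigma_yy^i)^(1/2), each (L+m, L+m)."""
    N, L = H.shape
    m = T.shape[1]
    beta = cp.Variable((L, m))
    b = cp.Variable()
    xi = cp.Variable(N, nonneg=True)
    theta = cp.Variable(N)
    omega = cp.vstack([beta, -np.ones((m, m))])   # omega = [beta; -1], Theorem 1
    cons = [cp.norm(beta, "fro") <= b]
    for i in range(N):
        cons += [cp.norm(H[i] @ beta - T[i]) <= eps + xi[i],
                 # Frobenius norm upper-bounds the spectral norm used in (43)
                 cp.norm(Syy_sqrt[i] @ omega, "fro") <= theta[i] * np.sqrt(tau / m)]
    cp.Problem(cp.Minimize(b + C * cp.sum(xi) + D * cp.sum(theta)), cons).solve()
    return beta.value
```

SR-RELM (44) differs only in that each cone constraint stacks $\bar{h}(\mathbf{x}_i)\beta - \bar{\mathbf{t}}_i$ beneath the covariance term and drops the $\theta_i$ variables.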

V. KERNELIZATION FOR RELMS

As discussed in Section II-C, the kernel trick is adopted in SVR. In fact, the kernel trick can also be applied to ELM. We have indicated that the explicit nonlinear feature mapping with random hidden nodes in ELM brings some advantages over SVR. Nevertheless, this does not mean that the kernel trick is useless for ELM. In reality, the universal approximation capability of ELM cannot always be fulfilled due to the curse of dimensionality. Kernel methods enable access to the corresponding very high-dimensional, even infinite-dimensional, feature spaces at a low computational cost in both space and time [28]. In the case of a Gaussian kernel, the feature map lives in an infinite-dimensional space, i.e., it has an infinite number of hidden nodes $L$, which enables ELM to work as a universal approximator [18]. Some related works have adopted the kernel method in ELM and produced desirable results [23], [29].³ In this section, we slightly modify the CTM and SR constraints and then incorporate them into the kernelized formulations of RELMs.

³For terminological consistency, we use kernel ELM to refer to the kernel-trick-based ELM and its variants.

It follows from [23] that the optimal weight matrix $\beta$ in ELM has the form

$$\beta = \mathbf{H}^T P \tag{45}$$

where $P \in \mathbb{R}^{N \times m}$. Once the model, i.e., $\beta$, is determined, we can make predictions by

$$f(\mathbf{x}) = h(\mathbf{x})\beta = \sum_{i=1}^{N} h(\mathbf{x})h(\mathbf{x}_i)^T P_i. \tag{46}$$

Based on the definition of the ELM kernel, we have

$$f(\mathbf{x}) = \sum_{i=1}^{N} k(\mathbf{x}, \mathbf{x}_i)P_i \tag{47}$$

where $k(\cdot,\cdot)$ is a kernel function. The kernel matrix of ELM is defined as [16]

$$K = \mathbf{H}\mathbf{H}^T : \quad K_{i,j} = h(\mathbf{x}_i)\cdot h(\mathbf{x}_j)^T = k(\mathbf{x}_i, \mathbf{x}_j) \tag{48}$$

so that, when the number of training samples is $N$, $K \in \mathbb{R}^{N \times N}$.
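As an illustration of (47) and (48) (ours, assuming a Gaussian kernel), the kernel matrix and prediction rule can be realized as follows.

```python
import numpy as np

def gaussian_kernel(X1, X2, width=1.0):
    """Pairwise Gaussian kernel k(x, x') = exp(-||x - x'||^2 / width)."""
    sq = (np.sum(X1**2, axis=1)[:, None] + np.sum(X2**2, axis=1)[None, :]
          - 2.0 * X1 @ X2.T)
    return np.exp(-sq / width)

def kernel_elm_predict(X_test, X_train, P, width=1.0):
    """Prediction f(x) = sum_i k(x, x_i) P_i as in (47)."""
    return gaussian_kernel(X_test, X_train, width) @ P
```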

The intrinsic modularity of kernel machines means that any kernel function can be used provided it produces symmetric, positive semi-definite kernel matrices [28]. In our case, we restrict $K$ not only to satisfy this modularity but also to have all of its entries real. Thus, we can decompose $K$ as

$$K = K^{\frac{1}{2}}K^{\frac{1}{2}} \tag{49}$$

where $K^{1/2}$ is real symmetric. From (45) and (48), we get

$$\beta^T\beta = P^TKP = \left(K^{\frac{1}{2}}P\right)^T K^{\frac{1}{2}}P \tag{50}$$

which leads to $\|\beta\| = \left\|K^{\frac{1}{2}}P\right\|$.

We now give the kernelized CTM constraint

$$\left\|\left(\Sigma_{yy}^{i}\right)^{\frac{1}{2}}\right\| \left\|\begin{bmatrix} K^{\frac{1}{2}}P \\ -\mathbf{1} \end{bmatrix}\right\| \le \theta_i\sqrt{\tau/m}, \quad i = 1, 2, \ldots, N \tag{51}$$

where $-\mathbf{1}$ is a matrix of all entries $-1$ with dimension $m \times m$. Note that (51) is a sufficient condition for (21), since

$$\left\|\left(\Sigma_{yy}^{i}\right)^{\frac{1}{2}}\omega\right\| \le \left\|\left(\Sigma_{yy}^{i}\right)^{\frac{1}{2}}\right\|\|\omega\| \le \left\|\left(\Sigma_{yy}^{i}\right)^{\frac{1}{2}}\right\| \left\|\begin{bmatrix} K^{\frac{1}{2}}P \\ -\mathbf{1} \end{bmatrix}\right\| \tag{52}$$

where $\omega = [\beta^T, -\mathbf{1}]^T$, and the kernelized CTM-RELM takes the form

$$\min_{P, b, \theta, \xi} L_P = b + C\sum_{i=1}^{N}\xi_i + D\sum_{i=1}^{N}\theta_i$$
$$\text{s.t.}\;\; \|K_{i,:}P - \mathbf{t}_i\| \le \varepsilon + \xi_i$$
$$\left\|\left(\Sigma_{yy}^{i}\right)^{\frac{1}{2}}\right\| \left\|\begin{bmatrix} K^{\frac{1}{2}}P \\ -\mathbf{1} \end{bmatrix}\right\| \le \theta_i\sqrt{\tau/m}$$
$$\xi_i \ge 0, \quad i = 1, 2, \ldots, N$$
$$\left\|K^{\frac{1}{2}}P\right\| \le b. \tag{53}$$


A similar fashion can be adopted to derive the kernelized SR-RELM formulation

$$\min_{P, b, \xi} L_P = b + C\sum_{i=1}^{N}\xi_i$$
$$\text{s.t.}\;\; \left\|\begin{bmatrix} \left\|\left(\Sigma_{yy}^{i}\right)^{\frac{1}{2}}\right\|\begin{bmatrix} K^{\frac{1}{2}}P \\ -\mathbf{1} \end{bmatrix} \\ K_{i,:}P - \mathbf{t}_i \end{bmatrix}\right\| \le (\xi_i + \varepsilon)\sqrt{\tau/m}$$
$$\xi_i \ge 0, \quad i = 1, 2, \ldots, N$$
$$\left\|K^{\frac{1}{2}}P\right\| \le b. \tag{54}$$

VI. COVARIANCE IN THE FEATURE SPACE

We first calculate the covariance when the nonlinear mapping functions are known explicitly. We write $h(\mathbf{x})$ as

$$h(\mathbf{x}) = [G(\mathbf{a}_1, b, \mathbf{x}), \ldots, G(\mathbf{a}_L, b, \mathbf{x})] \tag{55}$$

where $\mathbf{a}_i, b$ are the randomly generated weights and bias connecting the input to the $i$th hidden node, and $G(\mathbf{a}_i, b, \mathbf{x})$ is the activation function.

A statistical method is provided to derive the covariance in the feature space. For each input $\mathbf{x}_i$, we randomly generate $Z$ samples $\{\mathbf{x}_i^1, \mathbf{x}_i^2, \ldots, \mathbf{x}_i^Z\}$ according to the distribution of $\mathbf{x}_i$ with mean $\bar{\mathbf{x}}_i$ and covariance $\Sigma_{xx}^{i}$. Then the covariance matrix of $h(\mathbf{x}_i)$ can be approximated by

$$\Sigma_{hh}^{i} = \frac{1}{Z}\sum_{z=1}^{Z}\tilde{h}\left(\mathbf{x}_i^z\right)^T\tilde{h}\left(\mathbf{x}_i^z\right) \tag{56}$$

where

$$\tilde{h}\left(\mathbf{x}_i^z\right) = h\left(\mathbf{x}_i^z\right) - \frac{1}{Z}\sum_{z=1}^{Z}h\left(\mathbf{x}_i^z\right). \tag{57}$$
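A direct numpy rendering of (56) and (57) might look as follows (our sketch; `feature_map` stands for the explicit mapping $h(\cdot)$, and the Gaussian sampling model for the inputs is an assumption).

```python
import numpy as np

def feature_covariance(x_mean, x_cov, feature_map, Z=200, seed=0):
    """Approximate Sigma_hh^i per (56)-(57) by sampling around one input x_i."""
    rng = np.random.default_rng(seed)
    samples = rng.multivariate_normal(x_mean, x_cov, size=Z)  # Z draws of x_i
    Hs = np.vstack([feature_map(x) for x in samples])         # (Z, L) features
    Hc = Hs - Hs.mean(axis=0)                                 # centering, (57)
    return (Hc.T @ Hc) / Z                                    # (L, L) estimate, (56)
```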

However, the covariance in the kernel case is more delicate and cannot be derived explicitly. Note that, in the kernelized cases (53) and (54), only the norm of the covariance $\Sigma_{yy}^{i}$ is needed, that is

$$\left\|\Sigma_{yy}^{i}\right\| = \left\|\begin{bmatrix}\Sigma_{hh}^{i} & 0 \\ 0 & \Sigma_{tt}^{i}\end{bmatrix}\right\| = \max\left\{\left\|\Sigma_{hh}^{i}\right\|, \left\|\Sigma_{tt}^{i}\right\|\right\}. \tag{58}$$

$\left\|\Sigma_{tt}^{i}\right\|$ can be readily calculated, and we now give a solution to approximate $\left\|\Sigma_{hh}^{i}\right\|$. The $L_2$-norm of the real symmetric matrix $\Sigma_{hh}^{i}$ equals its largest eigenvalue. Let $\lambda$ and $\mathbf{v}$ be an eigenvalue and its corresponding eigenvector

$$\lambda\mathbf{v} = \Sigma_{hh}^{i}\mathbf{v}. \tag{59}$$

It has been proved in [30] that $\lambda$ also satisfies

$$Z\lambda\alpha = \tilde{K}_i\alpha \tag{60}$$

where $\tilde{K}_i = K_i - LK_i - K_iL + LK_iL$ and $L \in \mathbb{R}^{Z \times Z}$ with each entry $L_{i,j} = 1/Z$. Here, the $Z \times Z$ matrix $K_i$ is defined by

$$(K_i)_{z,z'} := k\left(\mathbf{x}_i^z, \mathbf{x}_i^{z'}\right) = h\left(\mathbf{x}_i^z\right)\cdot h\left(\mathbf{x}_i^{z'}\right)^T. \tag{61}$$

Fig. 2. Positions of the WiFi APs, offline calibration points, and online testing points in the simulated field.

Hence, we can compute the $L_2$-norm of $\Sigma_{hh}^{i}$ from the set of eigenvalues of $\tilde{K}_i$

$$\left\|\Sigma_{hh}^{i}\right\| = \frac{1}{Z}\max\lambda\left(\tilde{K}_i\right) \tag{62}$$

where $\lambda(\tilde{K}_i)$ is the set of all the eigenvalues of $\tilde{K}_i$.
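The computation in (60)–(62) amounts to centering the sample kernel matrix and taking its largest eigenvalue. A minimal sketch (ours, reusing the illustrative `gaussian_kernel` helper from Section V):

```python
import numpy as np

def norm_sigma_hh(samples, kernel):
    """||Sigma_hh^i|| via (60)-(62): largest eigenvalue of the centered
    kernel matrix of the Z samples drawn around x_i, divided by Z."""
    Z = len(samples)
    Ki = kernel(samples, samples)               # (Z, Z) kernel matrix, (61)
    L = np.full((Z, Z), 1.0 / Z)                # centering matrix, L_{j,k} = 1/Z
    Ki_tilde = Ki - L @ Ki - Ki @ L + L @ Ki @ L
    return np.linalg.eigvalsh(Ki_tilde).max() / Z   # (62)
```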

VII. PERFORMANCE VERIFICATION

A. Simulation Results and Evaluation

We developed a simulation environment in MATLAB R2013a in order to evaluate the performance of the proposed algorithms before conducting any real-world experiment. As shown in Fig. 2, we assume a 20 × 20 m room where four WiFi APs are installed at the four corners. The most commonly used path loss model for indoor environments is the ITU indoor propagation model [31]. Since it provides a relation between the total path loss PL (dBm) and the distance $d$ (m), it is adopted to simulate the WiFi signal generated from each WiFi AP. The indoor path loss model can be expressed as

$$\mathrm{PL}(d) = \mathrm{PL}_0 - 10\alpha\log(d) + X \tag{63}$$

where $\mathrm{PL}_0$ is the path loss coefficient, set to −40 dBm in our simulation, $\alpha$ is the path loss exponent, and $X$ represents random noise.

The distribution of the RSS indication from four real-world APs in our IPS is illustrated in Fig. 3, which shows that the signals collected from one AP can be quite different even at the same location due to noises and outliers. Therefore, four different types of disturbed data are generated based on (63), i.e., data mixed with Gaussian noise $X \sim N(0,1)$, data mixed with Student's noise $X \sim T(0,1,1)$, data mixed with gamma noise $X \sim \mathrm{Ga}(1,1)$, and data contaminated by one-sided outliers (20% contamination rate),⁴ to test the performance of the RELMs. To make the simulation more practical, 100 testing samples are artificially generated at each training point and testing point, respectively, using (63) with different perturbations.

⁴The strategy of adding outliers here is similar to that of [13].
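A compact sketch of this data generation (ours; the paper uses MATLAB, and the AP geometry and the value of $\alpha$ here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
aps = np.array([[0.0, 0.0], [0.0, 20.0], [20.0, 0.0], [20.0, 20.0]])  # corner APs
PL0, alpha = -40.0, 3.0          # PL0 from the paper; alpha is an assumed exponent

def simulate_rss(pos, noise=lambda n: rng.normal(0.0, 1.0, n)):
    """RSS vector at 'pos' from the ITU-style model (63), one entry per AP."""
    d = np.linalg.norm(aps - pos, axis=1)
    return PL0 - 10.0 * alpha * np.log10(d) + noise(len(aps))

rss = simulate_rss(np.array([5.0, 7.0]))   # Gaussian-noise case X ~ N(0, 1)
```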

We apply our RELMs to the simulated data and compare our proposed algorithms with basic ELM, OPT-ELM, kernel ELM, and SVR [32].


Fig. 3. Distribution of the RSS index of four APs at one position.

In the CTM-RELM formulation, there are three hyperparameters, $C$, $D$, and $\tau$, to be tuned. $C$ and $D$ are both selected by grid search from the exponential sequence $[2^{-5}, 2^{-4}, \ldots, 2^{5}]$ using fivefold cross-validation on the training data set, and $\tau$ increases from 0.1 to 1 with a step size of 0.1. In the SR-RELM case, there are two hyperparameters, $C$ and $\tau$, which are selected with the same strategy as for CTM-RELM. For both RELMs, the slack variable $\varepsilon$ is empirically set to 0.05. The SOCP problems are solved by the CVX MATLAB toolbox [33]. Since the performance of ELM and its variants is not sensitive to the number of hidden nodes $L$ as long as it is larger than some threshold [23], we fix $L$ at 500 for our proposed algorithms, basic ELM, and OPT-ELM to facilitate the comparison of computational costs. The width of the Gaussian kernel $\lambda$ used in SVR and kernel ELM is selected from the exponential sequence $[2^{-5}, 2^{-4}, \ldots, 2^{5}]$ using fivefold cross-validation.
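The tuning loop is a plain grid search; a sketch (ours, with a hypothetical `cv_score` callback standing in for the fivefold cross-validated error) might read:

```python
import itertools
import numpy as np

grid_CD = [2.0**k for k in range(-5, 6)]    # exponential grid [2^-5, ..., 2^5]
grid_tau = np.arange(0.1, 1.01, 0.1)        # tau from 0.1 to 1, step 0.1

def tune_ctm_relm(cv_score):
    """Return the (C, D, tau) triple with the best cross-validated score."""
    return min(itertools.product(grid_CD, grid_CD, grid_tau),
               key=lambda p: cv_score(C=p[0], D=p[1], tau=p[2]))
```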

Four performance measures are introduced: mean root square error (MRSE), standard deviation (STD), WCE, and REP over $r$ repeated realizations. Note that MRSE, STD, and WCE in this case are taken as means over the $r$ repeated realizations. REP is measured by the deviation of the MRSE over the repeated realizations; this measure is proposed based on the fact that ELM with the same parameters (e.g., the number of hidden nodes) on the same training data set may produce quite different results. In our experiments, $r$ is set to 30.

$$\mathrm{MRSE} = \frac{1}{r}\sum_{j=1}^{r}\left(\frac{1}{s}\sum_{i=1}^{s}\left\|\mathbf{t}_i - h_i\hat{\beta}\right\|\right)_j$$

$$\mathrm{STD} = \frac{1}{r}\sum_{j=1}^{r}\left(\sqrt{\sum_{i=1}^{s}\left(\left\|\mathbf{t}_i - h_i\hat{\beta}\right\| - \frac{1}{s}\sum_{i=1}^{s}\left\|\mathbf{t}_i - h_i\hat{\beta}\right\|\right)^2}\right)_j$$

$$\mathrm{WCE} = \frac{1}{r}\sum_{j=1}^{r}\left(\max_{i \in S}\left\|\mathbf{t}_i - h_i\hat{\beta}\right\|\right)_j$$

$$\mathrm{REP} = \sqrt{\frac{1}{r}\sum_{j=1}^{r}\left(\frac{1}{s}\sum_{i=1}^{s}\left\|\mathbf{t}_i - h_i\hat{\beta}\right\| - \mathrm{MRSE}\right)_j^2}$$

where $s$ is the number of testing samples and $S = \{1, 2, \ldots, s\}$ is the index set of the testing samples.

Fig. 4. Cumulative percentile of error distance for the simulation data sets.
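In code, the four measures can be computed from the per-realization error distances as follows (our sketch, mirroring the formulas above).

```python
import numpy as np

def performance_measures(errors):
    """errors: (r, s) array; errors[j, i] = ||t_i - h_i beta_hat|| in run j."""
    per_run_mean = errors.mean(axis=1)                  # inner averages over s
    mrse = per_run_mean.mean()
    std = np.mean([np.sqrt(np.sum((e - e.mean())**2)) for e in errors])
    wce = errors.max(axis=1).mean()
    rep = np.sqrt(np.mean((per_run_mean - mrse)**2))
    return mrse, std, wce, rep
```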

As shown in Fig. 4, the two proposed algorithms outperform the other four algorithms in terms of accuracy and WCE. More exact numbers can be found in Table II, from which we see that the REP of the RELM-based systems is improved compared with the basic ELM- and OPT-ELM-based ones. The enhancement of the REP is due to the additional constraints introduced by our algorithms, which shrink the solution search space. Note that the shrinking happening here is different from the one discussed in [21], in which the loss of solution search freedom of SVR is caused by the redundant $b$ [16].

B. Evaluation in Real-World IPSs

The system architecture of our WiFi-based IPS is shown in Fig. 5. The main components of the system are existing commercial WiFi APs, mobile devices with WiFi capability, a location server, and a web-based monitoring system. A brief operation procedure of the WiFi-based IPS is as follows. First of all, a data collection app for Android devices was developed. After the mobile device turns on its WiFi module, it collects RSS information from the different APs every second and sends this information to the location server. The responsibility of the location server is to analyze the RSS and calculate the estimated position of the mobile device. The user can then obtain his or her real-time position directly on the mobile device through our web-based monitoring system.

We conducted real-world indoor localization experiments to evaluate the performance of the proposed RELM approaches. The testbed is the Internet of Things Laboratory in the School of Electrical and Electronic Engineering, Nanyang Technological University. The area of the testbed is around 580 m² (35.1 × 16.6 m).

202 IEEE TRANSACTIONS ON CYBERNETICS, VOL. 46, NO. 1, JANUARY 2016

TABLE II. Comparison of Simulation Results.

Fig. 5. System architecture of our WiFi-based IPS.

The layout of the testbed is shown in Fig. 7. Eight D-Link DIR-605L WiFi cloud routers are utilized as WiFi APs in our experiments. The Android application is installed on a Samsung I929 Galaxy SII mobile phone, and all the WiFi RSS fingerprints at the offline calibration points and online testing points are collected using this phone for performance evaluation.

The RELM model was built up by the following steps. During the offline phase, 30 offline calibration points were selected and 200 WiFi RSS fingerprints were collected at each point. The positions of these 30 offline calibration points are shown in Fig. 7. By leveraging these 6000 WiFi RSS fingerprints and their physical positions as training inputs and training targets (outputs), respectively, the RELM model was constructed. During the online phase, we continued to collect WiFi RSS fingerprints at online testing points for five days. On each day, two distinct online testing points were selected in order to reflect the environmental dynamics. The positions of these ten online testing points are also presented in Fig. 7, and 200 WiFi RSS fingerprints were collected at each of them. The parameter settings for the proposed and compared algorithms in this experiment are similar to those introduced in Section VII-A, apart from the number of hidden units, which is set to 1000.

Fig. 6. Cumulative percentile of error distance for IPS testing results.

Fig. 7. Positions of the WiFi APs, offline calibration points, and online testing points in the testbed.

TABLE III. Comparison of Experimental Testing Results.

The testing results with respect to the four performance measures given in Section VII-A are shown in Table III. Fig. 6 illustrates the comparison in terms of the cumulative percentile of error distance, which shows that the proposed CTM-RELM provides higher accuracy and has an obvious effect in reducing the STD compared with ELM and OPT-ELM. On the other hand, SR-RELM gives an accuracy as good as CTM-RELM and performs better at confining the WCE. These results are reasonable, since the two robust constraints have different emphases. In addition, both CTM-RELM and SR-RELM give better REP than basic ELM.

The proposed algorithms incur longer training times due to the introduction of second order moment constraints in place of linear constraints. However, a slightly longer training time is not a concern in IPSs, considering that it is the calibration phase, e.g., the procedure of generating the radio map, that accounts for the large body of the time consumption. Besides, the RELMs inherit the simplicity of ELM, e.g., random feature mapping, dispensing with the bias $b$, and the single-layer structure; therefore their training time is still competitive compared with SVR and its variants.

VIII. CONCLUSION

Before concluding this paper, we provide some important discussions.

1) Choice of the Measure for Accuracy: It is noteworthy that we adopt MRSE instead of the conventional root mean square error (RMSE) as our measure, because MRSE makes more practical sense than RMSE for IPSs and has been widely adopted in indoor positioning contests [2]. The measure of REP is introduced in particular for ELM because ELM produces variation over repeated realizations; namely, with the same parameter settings (e.g., the number of hidden nodes) on the same training set, ELM may produce different results. This is mainly because the number of hidden units is not infinite, so the universal approximation using SLFNs with random nodes may not be exact [18]. However, it should be noted that most iteratively tuned algorithms, such as BP, also face this reproducibility issue, and from the perspective of STD, ELM is even more stable.

2) Abandonment of Kernelized RELMs: Although we have proposed the kernelized CTM-RELM and SR-RELM, we did not adopt them in the simulation and real-world experiments due to their limits in scaling. Firstly, the size of the decision variables in the kernelized CTM-RELM formulation is $N \times m + 2N + 1$, while that of CTM-RELM is $L \times m + 2N + 1$. Considering that the number of training data $N$ is usually several times larger than the number of hidden nodes $L$, we would encounter memory issues if we implemented the kernelized CTM-RELM, and the same logic applies to the SR-RELM case. Secondly, kernel-based algorithms enjoy computational efficiency in optimization problems when the dimension $d$ of the features is larger than $N$, while in our case the size of the features is far smaller than the number of training samples, so it is not cost-effective to conduct training with kernels.⁵ Thirdly, prediction by kernel-based methods takes $O(Nd)$ time since it uses the dual variables, while prediction using random-hidden-nodes-based methods with primal variables, e.g., ELM, OPT-ELM, and the RELMs, only takes $O(d)$ [28]. The testing times listed in Tables II and III are consistent with the above claim. Although a slightly longer training time is within tolerance for IPSs, fast prediction is highly demanded, as IPS servers need to provide real-time positioning services for large crowds in dense indoor environments such as shopping malls, cinemas, and airports. However, for small-scale data sets, or where the size of the features is very large, the kernelized RELMs can be leveraged.

3) Implementation Tricks for RELMs: Calculating the covariance and mean is tricky for regression problems, since one would otherwise have to use only one sample to approximate its corresponding statistics. In this paper, we take advantage of a specificity of the learning problem in IPSs: grouping. The whole data set can be divided into several groups by the calibration points they belong to, and within any group the members should "theoretically" have the same RSS (input) and coordinates (output). In reality this does not hold, due to the uncertainties discussed above; however, the members of a given group can be used to calculate the mean and covariance that represent the group in the problem formulations. By this grouping trick, we can further reduce the number of constraints in (43) and (44) from $N$ to $N/g$, where $g$ is the size of a group, i.e., the number of samples at one calibration point. This trick can be directly extended to RELMs for classification problems. A sketch of this grouping computation is given after this list of discussions.

4) Assumption About Additive Noises in the Feature Space: Although we assume that the noises lying in the feature space are additive, the simulation was conducted under circumstances where the inputs were corrupted with additive disturbances, and the simulation results demonstrate that the RELMs are effective in these cases. In fact, assuming that noises in the feature space are additive is conventionally adopted by a number of ML and optimization researchers [34]–[36]. It is possible that our assumption becomes invalid under some circumstances, e.g., inputs mixed with multiplicative noises. However, the case of multiplicative noises in RSS is rare in indoor environments [37]. When they are not significant, those multiplicative noises can be seen as outliers, and Section VII-A has shown that the RELMs can handle outliers (20% contamination rate) well.

⁵Indeed, kernel ELM possesses fast training speed, because it adopts the normal equation method, i.e., it is equality-constrained-optimization-based [16]. But when inequality constraints are added in the convex optimization setting (inequality constraints can bring about the benefit of sparsity in solutions [23], [29]), the normal closed-form method may not work anymore. Some recent work on ELM, e.g., sparse ELM [29], has already used the inequality-constraints-based formulation. Thus, the above claim about computational costs still holds for kernel ELM.

To sum up, this paper proposed CTM-RELM and SR-RELM to address the problem of noisy measurements in IPSs by introducing the CTM and SR constraints into OPT-ELM, and further gave two SOCP-based formulations. The kernelized RELMs and the method to calculate the theoretical covariance matrix in the feature space were further discussed. Simulation results and real-world indoor localization experiments both demonstrated that the CTM-RELM-based IPS provides higher accuracy and smaller STD than IPSs based on the other algorithms, while the SR-RELM-based IPS provides better accuracy and smaller WCE. The REP of the proposed algorithms was also demonstrated to be better.

Future work will focus on reducing the computational costs of the proposed algorithms for IPSs with large data sets; sparse matrix techniques will be leveraged to make this possible. Meanwhile, more performance testing of the RELMs will be conducted on classification problems with different combinations of $\sigma_1$ and $\sigma_2$ for the norms.

REFERENCES

[1] H. Zou, X. Lu, H. Jiang, and L. Xie, "A fast and precise indoor localization algorithm based on an online sequential extreme learning machine," Sensors, vol. 15, no. 1, pp. 1804–1824, Jan. 2015.
[2] Q. Yang, S. J. Pan, and V. W. Zheng, "Estimating location using Wi-Fi," IEEE Intell. Syst., vol. 23, no. 1, pp. 8–13, Jan./Feb. 2008.
[3] C. Cortes and V. Vapnik, "Support-vector networks," Mach. Learn., vol. 20, no. 3, pp. 273–297, Mar. 1995.
[4] H. Liu, H. Darabi, P. Banerjee, and J. Liu, "Survey of wireless indoor positioning techniques and systems," IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 37, no. 6, pp. 1067–1080, Nov. 2007.
[5] N. Kothari, B. Kannan, E. D. Glasgwow, and M. B. Dias, "Robust indoor localization on a commercial smart phone," Proc. Comput. Sci., vol. 10, pp. 1114–1120, Aug. 2012.
[6] W. Meng, W. Xiao, W. Ni, and L. Xie, "Secure and robust Wi-Fi fingerprinting indoor localization," in Proc. Int. Conf. Indoor Position. Indoor Nav. (IPIN), Guimarães, Portugal, Sep. 2011, pp. 1–7.
[7] G.-B. Huang and L. Chen, "Convex incremental extreme learning machine," Neurocomputing, vol. 70, no. 16, pp. 3056–3062, Oct. 2007.
[8] W. Xi-Zhao, S. Qing-Yan, M. Qing, and Z. Jun-Hai, "Architecture selection for networks trained with extreme learning machine using localized generalization error model," Neurocomputing, vol. 102, pp. 3–9, Feb. 2013.
[9] W. Xiao, P. Liu, W.-S. Soh, and Y. Jin, "Extreme learning machine for wireless indoor localization," in Proc. 11th Int. Conf. Inf. Process. Sens. Netw., Beijing, China, Apr. 2012, pp. 101–102.
[10] J. Liu, Y. Chen, M. Liu, and Z. Zhao, "SELM: Semi-supervised ELM with application in sparse calibrated location estimation," Neurocomputing, vol. 74, no. 16, pp. 2566–2572, Sep. 2011.
[11] R. Wang, Y.-L. He, C.-Y. Chow, F.-F. Ou, and J. Zhang, "Learning ELM-tree from big data based on uncertainty reduction," Fuzzy Sets Syst., vol. 258, pp. 79–100, Jan. 2015.
[12] J. Zhai, H. Xu, and Y. Li, "Fusion of extreme learning machine with fuzzy integral," Int. J. Uncertain. Fuzz. Knowl.-Based Syst., vol. 21, pp. 23–34, Dec. 2013.
[13] P. Horata, S. Chiewchanwattana, and K. Sunat, "Robust extreme learning machine," Neurocomputing, vol. 102, pp. 31–44, Feb. 2013.
[14] L. M. Ni, Y. Liu, Y. C. Lau, and A. P. Patil, "LANDMARC: Indoor location sensing using active RFID," Wireless Netw., vol. 10, no. 6, pp. 701–710, Nov. 2004.
[15] H. Zou, H. Wang, L. Xie, and Q.-S. Jia, "An RFID indoor positioning system by using weighted path loss and extreme learning machine," in Proc. 1st IEEE Int. Conf. Cyber-Phys. Syst. Netw. Appl. (CPSNA), Taipei, Taiwan, Aug. 2013, pp. 66–71.
[16] G.-B. Huang, "An insight into extreme learning machines: Random neurons, random features and kernels," Cogn. Comput., vol. 6, no. 3, pp. 1–15, Sep. 2014.
[17] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, "Extreme learning machine: Theory and applications," Neurocomputing, vol. 70, nos. 1–3, pp. 489–501, Dec. 2006.
[18] G.-B. Huang, L. Chen, and C.-K. Siew, "Universal approximation using incremental constructive feedforward networks with random hidden nodes," IEEE Trans. Neural Netw., vol. 17, no. 4, pp. 879–892, Jul. 2006.
[19] M.-B. Li, G.-B. Huang, P. Saratchandran, and N. Sundararajan, "Fully complex extreme learning machine," Neurocomputing, vol. 68, pp. 306–314, Oct. 2005.
[20] G. Huang, S. Song, J. N. Gupta, and C. Wu, "Semi-supervised and unsupervised extreme learning machines," IEEE Trans. Cybern., vol. 44, no. 12, pp. 2405–2417, Dec. 2014.
[21] G.-B. Huang, X. Ding, and H. Zhou, "Optimization method based extreme learning machine for classification," Neurocomputing, vol. 74, no. 1, pp. 155–163, Dec. 2010.
[22] A. J. Smola and B. Schölkopf, "A tutorial on support vector regression," Stat. Comput., vol. 14, no. 3, pp. 199–222, Aug. 2004.
[23] G.-B. Huang, H. Zhou, X. Ding, and R. Zhang, "Extreme learning machine for regression and multiclass classification," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 42, no. 2, pp. 513–529, Apr. 2012.
[24] V. Vapnik, S. E. Golowich, and A. Smola, "Support vector method for function approximation, regression estimation, and signal processing," in Proc. Adv. Neural Inf. Process. Syst., 1997, pp. 281–287.
[25] P. K. Shivaswamy, C. Bhattacharyya, and A. J. Smola, "Second order cone programming approaches for handling missing and uncertain data," J. Mach. Learn. Res., vol. 7, pp. 1283–1314, Jul. 2006.
[26] G. Huang, S. Song, C. Wu, and K. You, "Robust support vector regression for uncertain input and output data," IEEE Trans. Neural Netw. Learn. Syst., vol. 23, no. 11, pp. 1690–1700, Nov. 2012.
[27] J. Navarro, "A very simple proof for the multivariate Chebyshev inequality," Commun. Stat. Theory Methods, Dec. 2013.
[28] K. P. Murphy, Machine Learning: A Probabilistic Perspective. Cambridge, MA, USA: MIT Press, 2012.
[29] Z. Bai, G.-B. Huang, D. Wang, H. Wang, and M. B. Westover, "Sparse extreme learning machine for classification," IEEE Trans. Cybern., vol. 25, no. 4, pp. 836–843, Apr. 2014.
[30] B. Schölkopf, A. Smola, and K.-R. Müller, "Nonlinear component analysis as a kernel eigenvalue problem," Neural Comput., vol. 10, no. 5, pp. 1299–1319, Jul. 1998.
[31] T. Chrysikos, G. Georgopoulos, and S. Kotsopoulos, "Site-specific validation of ITU indoor path loss model at 2.4 GHz," in Proc. IEEE Int. Symp. World Wireless Mobile Multimedia Netw. Workshops (WoWMoM), Kos, Greece, Jun. 2009, pp. 1–6.
[32] J. A. Suykens et al., Least Squares Support Vector Machines, vol. 4. River Edge, NJ, USA: World Scientific, 2002.
[33] M. C. Grant, S. P. Boyd, and Y. Ye. (Jun. 2014). CVX: MATLAB Software for Disciplined Convex Programming (Web Page and Software). [Online]. Available: http://cvxr.com/cvx
[34] H. Xu, C. Caramanis, and S. Mannor, "Robustness and regularization of support vector machines," J. Mach. Learn. Res., vol. 10, pp. 1485–1510, Jul. 2009.
[35] D. Bertsimas, D. B. Brown, and C. Caramanis, "Theory and applications of robust optimization," SIAM Rev., vol. 53, no. 3, pp. 464–501, Aug. 2011.
[36] K. P. Bennett and E. Parrado-Hernández, "The interplay of optimization and machine learning research," J. Mach. Learn. Res., vol. 7, pp. 1265–1281, Jul. 2006.
[37] A. Goldsmith, Wireless Communications. Cambridge, NY, USA: Cambridge Univ. Press, 2005.

Xiaoxuan Lu received the B.Eng. degree from the Nanjing University of Aeronautics and Astronautics, Nanjing, China, in 2013. He is currently pursuing the M.Eng. degree at the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore.

His current research interests include machine learning, mobile computing, signal processing, and their applications to energy reduction in buildings.

Han Zou received the B.Eng. (First Class Honors) degree from Nanyang Technological University, Singapore, in 2012, where he is currently pursuing the Ph.D. degree at the School of Electrical and Electronic Engineering.

He is currently a Graduate Student Researcher with the Berkeley Education Alliance for Research in Singapore Limited, Singapore. His current research interests include wireless sensor networks, mobile computing, indoor positioning and navigation systems, indoor human activity sensing and inference, and occupancy modeling in buildings.

Hongming Zhou received the B.Eng. and Ph.D. degrees from Nanyang Technological University, Singapore, in 2009 and 2014, respectively.

He is currently a Research Fellow with the School of Electrical and Electronic Engineering, Nanyang Technological University. His current research interests include classification and regression algorithms, such as extreme learning machines, neural networks, and support vector machines, as well as their applications, including heating, ventilation, and air conditioning system control, biometric identification, image retrieval, and financial index prediction.

Lihua Xie (F'07) received the B.E. and M.E. degrees from the Nanjing University of Science and Technology, Nanjing, China, in 1983 and 1986, respectively, and the Ph.D. degree from the University of Newcastle, Callaghan, NSW, Australia, in 1992, all in electrical engineering.

Since 1992, he has been with the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore. From 1986 to 1989, he was a Teacher at the Department of Automatic Control, Nanjing University of Science and Technology. From 2006 to 2011, he was a Changjiang Visiting Professor at the South China University of Technology, Guangzhou, China. From 2011 to 2014, he was a Professor and the Head of the Division of Control and Instrumentation at Nanyang Technological University, Singapore. His current research interests include robust control and estimation, networked control systems, multiagent networks, and unmanned systems. He has published over 260 journal papers and co-authored two patents and six books.

Prof. Xie has served as an Editor of the IET Book Series in Control and an Associate Editor of a number of journals, including the IEEE TRANSACTIONS ON AUTOMATIC CONTROL, Automatica, the IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY, and the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-II.

Guang-Bin Huang (SM'04) received the B.Sc. degree in applied mathematics and the M.Eng. degree in computer engineering from Northeastern University, Shenyang, China, in 1991 and 1994, respectively, and the Ph.D. degree in electrical engineering from Nanyang Technological University, Singapore, in 1999.

He was with the Applied Mathematics Department and the Wireless Communication Department of Northeastern University. From 2001, he was an Assistant Professor and then an Associate Professor (with tenure) at the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore. He is the Principal Investigator of several industry-sponsored research and development projects, and has led and implemented several key industrial projects, including serving as the Chief Architect/Designer and Technical Leader of the Singapore Changi Airport Cargo Terminal 5 Inventory Control System Upgrading Project. His current research interests include big data analytics, human computer interfaces, brain computer interfaces, image processing/understanding, machine-learning theories and algorithms, extreme learning machines, and pattern recognition. He was listed among the Highly Cited Researchers in 2014 (The World's Most Influential Scientific Minds) by Thomson Reuters. He has also been invited to give keynotes at numerous international conferences.

Dr. Huang was the recipient of the Best Paper Award from the IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS in 2013. He is currently serving as an Associate Editor of Neurocomputing, Cognitive Computation, Neural Networks, and the IEEE TRANSACTIONS ON CYBERNETICS.