Neural, Parallel & Scientific Computations 9 (2001) 19-28
Local Convergence of Tri-Level Alternating Optimization
Richard J. Hathaway(1), Yingkang Hu(1), and James C. Bezdek(2)
(1) Mathematics and Computer Science Department, Georgia Southern University, Statesboro, GA 30460
(2) Computer Science Department, University of West Florida, Pensacola, FL 32514
Abstract
Tri-level alternating optimization (TLAO) of a real-valued function f(w) consists of partitioning the vector variable w into three parts, say w = (x,y,z), and alternating optimizations over each of the three parts while holding the other two at their newest values. Alternating optimization is not usually the best approach to optimizing a function. However, in cases when (x,y,z) has special structure such that each of the partial optimizations can be performed very easily, the method can be simple to implement and computationally competitive with other popular approaches such as quasi-Newton or conjugate gradient methods. A convergence analysis of tri-level alternating optimization is given which shows that the method is locally, q-linearly convergent to minimizers at which the second derivative of the objective function is positive definite. A useful recent application of tri-level alternating optimization in the area of pattern recognition is described.
Keywords - alternating optimization, local convergence, pattern recognition
1. INTRODUCTION
In this paper we consider the convergence analysis of a technique for computing local solutions to the problem:

    min_{w in R^s} f(w),    (1)

where f: R^s -> R is twice differentiable. The technique is called tri-level alternating optimization (TLAO). Application of this technique requires partitioning the variable w in R^s as w = (x,y,z), with x in R^p, y in R^q, and z in R^r. TLAO attempts to minimize f using an iteration that sequentially minimizes f over each of the grouped subsets of x, y,
and z variables. The TLAO procedure is stated next; the notation "arg min" is used to denote the argument that minimizes, i.e., the minimizer.

Tri-Level Alternating Optimization (TLAO) of f: R^s -> R

TLAO-1  Partition w in R^s as w = (x,y,z), with x in R^p, y in R^q, and z in R^r, p + q + r = s. Pick the initial iterate w^(0) = (x^(0), y^(0), z^(0)) and a stopping criterion. For example, choose a vector norm ||.|| and termination threshold eps, and stop when ||w^(k+1) - w^(k)|| < eps or when k > T, where T is a maximum iteration limit. Set k = 0.

TLAO-2  Compute x^(k+1) = arg min_{x in R^p} f(x, y^(k), z^(k)).    (2)

TLAO-3  Compute y^(k+1) = arg min_{y in R^q} f(x^(k+1), y, z^(k)).    (3)

TLAO-4  Compute z^(k+1) = arg min_{z in R^r} f(x^(k+1), y^(k+1), z).    (4)

TLAO-5  If ||w^(k+1) - w^(k)|| < eps or k > T, then quit; otherwise, set k = k + 1 and go to TLAO-2.
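To make the procedure concrete, a minimal Python sketch of the TLAO loop is given below. The sketch is illustrative and not part of the original algorithm statement: it performs each partial minimization with a generic unconstrained minimizer (scipy.optimize.minimize), whereas in applications one would exploit closed-form solutions for the partial minimizations whenever they exist.

import numpy as np
from scipy.optimize import minimize

def tlao(f, x0, y0, z0, eps=1e-8, T=100):
    """Tri-level alternating optimization of f(x, y, z).

    Each step holds two blocks fixed and minimizes over the third
    (steps TLAO-2 through TLAO-4); iteration stops when the full
    iterate moves less than eps, or after T sweeps (TLAO-5).
    """
    x, y, z = map(np.asarray, (x0, y0, z0))
    for k in range(T):
        w_old = np.concatenate([x, y, z])
        # TLAO-2: x <- arg min_x f(x, y, z)
        x = minimize(lambda u: f(u, y, z), x).x
        # TLAO-3: y <- arg min_y f(x, y, z)
        y = minimize(lambda u: f(x, u, z), y).x
        # TLAO-4: z <- arg min_z f(x, y, z)
        z = minimize(lambda u: f(x, y, u), z).x
        if np.linalg.norm(np.concatenate([x, y, z]) - w_old) < eps:
            break
    return x, y, z

# Example: a coupled quadratic in (x, y, z); the minimizer is x = y = z = 1.
f = lambda x, y, z: (x - 1).dot(x - 1) + (y - x).dot(y - x) + (z - y).dot(z - y)
print(tlao(f, np.zeros(2), np.zeros(2), np.zeros(2)))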
A bi-level version of this approach is analyzed in Bezdek et al. (1987). The bi-level version has been widely used to optimize numerous fuzzy clustering criteria. Our interest in the tri-level version is motivated in part by the need to validate the optimization procedure employed in the recently devised pattern recognition tool in Hathaway and Bezdek (1999) that is briefly described in Section 3. This technique is a modification of the popular fuzzy c-means algorithm (Bezdek, 1981) and is capable of effectively clustering incomplete data. Incomplete data vectors are data vectors that are missing values for some (but not all) components. This note will help supply the underlying convergence theory for the useful new clustering technique in Hathaway and Bezdek (1999) and other statistical and fuzzy methods for pattern recognition that alternate optimizations over three sets of variables.
We mention that the global convergence of TLAO to minimizers (or in some cases, saddle points) of f follows easily from the general convergence theory of Zangwill (1969), and is based on the monotonic decrease in the objective function values as the iteration proceeds. In short, Zangwill's theory can be used to show that under mild assumptions any limit point of a TLAO sequence is a point (x*, y*, z*) satisfying (2-4) with x^(k) = x^(k+1) = x*, y^(k) = y^(k+1) = y*, and z^(k) = z^(k+1) = z*. This type of point could either be a minimizer or a saddle point, but in practice, computed (x*, y*, z*) values are almost never saddle points.
The next section gives the local analysis of TLAO. Section 3 briefly describes a new clustering algorithm that uses TLAO to optimize a particular clustering criterion. The final section contains concluding remarks and some ideas regarding worthwhile future work.
2. LOCAL CONVERGENCE ANALYSIS OF TLAO
Let f: R^p x R^q x R^r -> R and partition w = (x,y,z) in R^p x R^q x R^r. We show in this section that TLAO is locally, q-linearly convergent to any minimizer of f for which the Hessian of f is positive definite. Corresponding to (2-4) we define X: R^q x R^r -> R^p, Y: R^p x R^r -> R^q, and Z: R^p x R^q -> R^r, as:

    X(y,z) = arg min_{x in R^p} f(x,y,z)    (5)

    Y(x,z) = arg min_{y in R^q} f(x,y,z)    (6)

    Z(x,y) = arg min_{z in R^r} f(x,y,z)    (7)

The reasoning used in this section is invariant to translation of any minimizer of f by a constant vector, so we can simplify our notation by assuming that the local minimizer of interest is (0,0,0) in R^p x R^q x R^r. We first show that under reasonable assumptions, X, Y, and Z are continuously differentiable near (0,0) in R^q x R^r, R^p x R^r, and R^p x R^q, respectively. (Hereafter, we sometimes leave it to the reader to infer the applicable dimensions of points such as (0,0) rather than explicitly mentioning them.) We let f''(w) denote the s x s Hessian matrix of f.
Lemma 2.1 Let f: R^p x R^q x R^r -> R satisfy the conditions:
(i) f is C^2 in a neighborhood of (0,0,0);
(ii) f''(0,0,0) is positive definite; and
(iii) (0,0,0) is a local minimizer of f.
Then in some neighborhood of (0,0) in R^q x R^r, the minimizing function X(y,z) in (5) exists and is continuously differentiable. Similar results hold for Y(x,z) and Z(x,y).
Proof. Partition f''(x,y,z) as

    [ fxx(x,y,z)  fxy(x,y,z)  fxz(x,y,z) ]
    [ fyx(x,y,z)  fyy(x,y,z)  fyz(x,y,z) ] .
    [ fzx(x,y,z)  fzy(x,y,z)  fzz(x,y,z) ]

By (i) and (ii), fxx(x,y,z) is positive definite and nonsingular in a neighborhood of (0,0,0). The implicit function theorem guarantees a continuously differentiable function X: R^q x R^r -> R^p, defined in a neighborhood of (0,0) in R^q x R^r, satisfying fx(X(y,z),y,z) = 0. This implies x = X(y,z) is a critical point of f(.,y,z), and this together with (iii) gives us that X(0,0) = 0. Since (X(y,z),y,z) is near (0,0,0) for (y,z) near (0,0), it follows using (i) and (ii) that fxx(X(y,z),y,z) is positive definite for (y,z) near (0,0), and this implies that X(y,z) is a minimizer of f(.,y,z). This shows that the continuously differentiable function guaranteed by the implicit function theorem is in fact the minimizing function in (5). Similar arguments give the results for Y(x,z) and Z(x,y).
For notational convenience in the following, we define A = f''(0,0,0) and partition it as

        [ Axx  Axy  Axz ]   [ fxx(0,0,0)  fxy(0,0,0)  fxz(0,0,0) ]
    A = [ Ayx  Ayy  Ayz ] = [ fyx(0,0,0)  fyy(0,0,0)  fyz(0,0,0) ] .    (8)
        [ Azx  Azy  Azz ]   [ fzx(0,0,0)  fzy(0,0,0)  fzz(0,0,0) ]

Define the mapping S: R^p x R^q x R^r -> R^p x R^q x R^r corresponding to one iteration through steps TLAO-2, 3 and 4 as:

    S(x,y,z) = ( S1(x,y,z), S2(x,y,z), S3(x,y,z) )    (9a)
             = ( X(y,z), Y(X(y,z),z), Z(X(y,z), Y(X(y,z),z)) ).    (9b)

The results of Lemma 2.1 imply that S is continuously differentiable in a neighborhood of (0,0,0) with S(0,0,0) = (0,0,0). Let rho(S'(0,0,0)) denote the spectral radius of S'(0,0,0). As will be seen in the proof of Theorem 2.1, the fundamental property needed to establish convergence of a TLAO sequence is that rho(S'(0,0,0)) < 1, which is proved in Lemma 2.2.

Lemma 2.2 Let f: R^p x R^q x R^r -> R satisfy the conditions of Lemma 2.1 and let S: R^p x R^q x R^r -> R^p x R^q x R^r be defined by (9). Then rho(S'(0,0,0)) < 1.

Proof. Partition S'(0,0,0) as

    [ S1x(0,0,0)  S1y(0,0,0)  S1z(0,0,0) ]
    [ S2x(0,0,0)  S2y(0,0,0)  S2z(0,0,0) ] .
    [ S3x(0,0,0)  S3y(0,0,0)  S3z(0,0,0) ]

In calculating S'(0,0,0), we will need the various partials Xy(0,0), Xz(0,0), Yx(0,0), Yz(0,0), Zx(0,0), and Zy(0,0), which are obtained first. We suppress the argument (0,0) in the following. To obtain Xy, differentiate fx(X(y,z),y,z) = 0 with respect to y and evaluate at (0,0), which yields fxx(0,0,0) Xy + fxy(0,0,0) = 0, so that:

    Xy = - Axx^{-1} Axy.    (10a)

Differentiating fx(X(y,z),y,z) = 0 with respect to z and evaluating at (0,0) gives

    Xz = - Axx^{-1} Axz.    (10b)

The remaining partials are calculated by differentiating fy(x,Y(x,z),z) = 0 with respect to x and z, and fz(x,y,Z(x,y)) = 0 with respect to x and y. They are:

    Yx = - Ayy^{-1} Ayx    (10c)
    Yz = - Ayy^{-1} Ayz    (10d)
    Zx = - Azz^{-1} Azx    (10e)
    Zy = - Azz^{-1} Azy    (10f)

Now the components of S'(0,0,0) are calculated using (10) as:

    S1x(0,0,0) = 0 in R^{p x p};    (11a)
    S2x(0,0,0) = 0 in R^{q x p};    (11b)
    S3x(0,0,0) = 0 in R^{r x p};    (11c)
    S1y(0,0,0) = Xy = - Axx^{-1} Axy;    (11d)
    S2y(0,0,0) = Yx Xy = Ayy^{-1} Ayx Axx^{-1} Axy;    (11e)
    S3y(0,0,0) = Zx Xy + Zy Yx Xy = Azz^{-1} Azx Axx^{-1} Axy - Azz^{-1} Azy Ayy^{-1} Ayx Axx^{-1} Axy;    (11f)
    S1z(0,0,0) = Xz = - Axx^{-1} Axz;    (11g)
    S2z(0,0,0) = Yx Xz + Yz = Ayy^{-1} Ayx Axx^{-1} Axz - Ayy^{-1} Ayz;    (11h)
    S3z(0,0,0) = Zx Xz + Zy Yx Xz + Zy Yz = Azz^{-1} Azx Axx^{-1} Axz - Azz^{-1} Azy Ayy^{-1} Ayx Axx^{-1} Axz + Azz^{-1} Azy Ayy^{-1} Ayz.    (11i)
We can now establish that rho(S'(0,0,0)) < 1 by recognizing an important relationship between S'(0,0,0) and A. Define the matrices B, C, and D as

        [ Axx   0    0  ]       [ 0  -Axy  -Axz ]       [ Axx   0    0  ]
    B = [ Ayx  Ayy   0  ] , C = [ 0    0   -Ayz ] , D = [  0   Ayy   0  ] .    (12)
        [ Azx  Azy  Azz ]       [ 0    0     0  ]       [  0    0   Azz ]

Note that A = B - C. Since A is positive definite, it follows that D is positive definite and B is nonsingular. A straightforward but tedious calculation yields

    S'(0,0,0) = B^{-1} C.    (13)

By Theorem 7.1.9 in Ortega (1972) and the assumption that A is symmetric and positive definite, we have that rho(S'(0,0,0)) = rho(B^{-1}C) < 1 if A = B - C is a P-regular splitting. By definition, B - C is a P-regular splitting if B is nonsingular and B + C is positive definite. By earlier comments, it only remains to show that B + C is positive definite. The symmetric part of B + C is

    (1/2)[(B + C) + (B + C)^T] = (1/2)(B + C^T) + (1/2)(B^T + C) = (1/2)D + (1/2)D = D,    (14)

which is positive definite.
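The splitting relationship in (12)-(13) is easy to probe numerically. The following sketch (hypothetical Python/NumPy code, not part of the original analysis) draws a random symmetric positive definite A, forms B and C as in (12) for a partition with p = q = r = 3, and confirms that the spectral radius of B^{-1}C is below one, as Lemma 2.2 asserts.

import numpy as np

rng = np.random.default_rng(0)
p = q = r = 3
s = p + q + r

# Random symmetric positive definite "Hessian" A = f''(0,0,0).
M = rng.standard_normal((s, s))
A = M @ M.T + s * np.eye(s)

# B: block lower triangle of A (diagonal blocks included);
# C: negated strict block upper triangle, so that A = B - C as in (12).
edges = [0, p, p + q, s]
B = np.zeros_like(A)
C = np.zeros_like(A)
for i in range(3):
    for j in range(3):
        blk = A[edges[i]:edges[i+1], edges[j]:edges[j+1]]
        if j <= i:
            B[edges[i]:edges[i+1], edges[j]:edges[j+1]] = blk
        else:
            C[edges[i]:edges[i+1], edges[j]:edges[j+1]] = -blk

S_prime = np.linalg.solve(B, C)             # S'(0,0,0) = B^{-1} C, eq. (13)
rho = max(abs(np.linalg.eigvals(S_prime)))
print(rho)                                  # Lemma 2.2: rho < 1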
We now give the main result for local convergence, which is essentially an adaptation of Ostrowski's theorem (Theorem 8.1.7 in Ortega, 1972).

Theorem 2.1 Let w* be a local minimizer of f: R^s -> R for which f''(w*) is positive definite, and let f be C^2 on a neighborhood of w*. Then there is a neighborhood U of w* such that for any w^(0) in U, the corresponding TLAO iteration sequence {w^(k)}, defined using S in (9) as w^(k+1) = S(w^(k)), converges q-linearly to w*.
Proof. As discussed earlier, we can assume that w* = 0. It is necessary to show

    lim_{k -> inf} w^(k+1) = lim_{k -> inf} S^{k+1}(w^(0)) = 0    (15)

for all choices of w^(0) close enough to w*. Apply Lemma 2.2 to obtain

    rho = rho(S'(0,0,0)) < 1.    (16)

Pick delta > 0 such that rho + 2 delta < 1. By Theorem 3.8 in Stewart (1973), there exists a norm ||.||_delta on R^s such that for all w in R^s,

    ||S'(0,0,0) w||_delta <= (rho + delta) ||w||_delta.    (17)

Since S' is continuous near w* = (x*,y*,z*) = (0,0,0), there is a number r > 0 such that

    ||S'(w1) w2||_delta <= (rho + 2 delta) ||w2||_delta    (18)

for all w1 and w2 in B_r = {w in R^s : ||w||_delta <= r}. From (18) and the fact that S(0) = 0, we have:

    ||S(w)||_delta = || int_0^1 S'(tw) w dt ||_delta <= int_0^1 ||S'(tw) w||_delta dt <= (rho + 2 delta) ||w||_delta.    (19)

The result of (19) establishes that for initialization of TLAO near w*, the error is reduced by the factor (rho + 2 delta) < 1 at each iteration, which gives the local q-linear convergence of {w^(k)} to w*.
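For a quadratic objective f(w) = (1/2) w^T A w with A symmetric positive definite, each partial minimization is an exact linear solve, and one full TLAO sweep is exactly the map w -> B^{-1} C w of (13), so the q-linear factor of Theorem 2.1 can be observed directly. A small sketch follows (hypothetical Python/NumPy code with scalar blocks p = q = r = 1); the averaged per-sweep contraction factor approaches rho(B^{-1}C).

import numpy as np

# f(w) = 0.5 * w.T A w with symmetric positive definite A and scalar
# blocks (p = q = r = 1). Exact partial minimization over each variable
# makes one TLAO sweep the block Gauss-Seidel step w <- B^{-1} C w.
A = np.array([[4.0, 1.0, 1.0],
              [1.0, 3.0, 1.0],
              [1.0, 1.0, 2.0]])
B = np.tril(A)                 # block lower triangle of A
C = B - A                      # negated strict upper triangle, A = B - C
S = np.linalg.solve(B, C)
rho = max(abs(np.linalg.eigvals(S)))

w0 = np.array([1.0, -2.0, 3.0])
w = w0.copy()
K = 25
for _ in range(K):
    w = S @ w                  # one full TLAO sweep
# The average per-sweep contraction factor tends to rho as K grows.
print((np.linalg.norm(w) / np.linalg.norm(w0)) ** (1.0 / K), rho)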
3. AN EXAMPLE OF TLAO
An important problem from the area of pattern recognition is that of partitioning a set of data X = {x_1,...,x_n} in R^s into natural data groupings (or clusters). A popular and effective method for partitioning data into c fuzzy clusters {C_1,...,C_c} is based on minimization of the fuzzy c-means functional (Bezdek, 1981):

    J_m(U,V) = Sum_{i=1}^{c} Sum_{k=1}^{n} (U_ik)^m ||x_k - v_i||^2,    (20)

where: m > 1 is the fuzzification parameter;
U = [U_ik], where U_ik = the degree to which x_k belongs to C_i;
V = [v_1,...,v_c], where v_i in R^s is the center of C_i; and
||.|| is an inner product norm on R^s.

The optimization of (20) is attempted over all V in R^{s x c} and U in M_fcn, where

    M_fcn = { U in R^{c x n} : U_ik in [0,1] for all i,k; Sum_{i=1}^{c} U_ik = 1 for all k; Sum_{k=1}^{n} U_ik > 0 for all i }.    (21)

The most popular method for optimizing (20) is a bi-level alternating optimization over the U and V variables known as the fuzzy c-means algorithm (Bezdek, 1981). It calculates a new V from the most recent U via
    v_ji = ( Sum_{k=1}^{n} (U_ik)^m x_jk ) / ( Sum_{k=1}^{n} (U_ik)^m ),  for all j,i,    (22)

and a new U from the most recent V by

    U_ik = ( ||x_k - v_i||^{-2/(m-1)} ) / ( Sum_{h=1}^{c} ||x_k - v_h||^{-2/(m-1)} ),  for all i,k,    (23)

where v_ji and x_jk are the jth components of v_i and x_k, respectively, and ||x_k - v_h|| > 0 for all h. (See Bezdek (1981) for the necessary condition that supplants (23) when one or more ||x_k - v_h|| = 0.)
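A compact sketch of this bi-level iteration, alternating (22) and (23), is given below in hypothetical Python/NumPy code (the paper itself contains no implementation). It assumes the Euclidean norm and that every distance ||x_k - v_i|| is positive.

import numpy as np

def fcm(X, c, m=2.0, eps=1e-6, T=100, seed=0):
    """Bi-level AO for fuzzy c-means: X is n x s; returns (U, V).

    U is c x n with columns summing to one; V is c x s cluster centers.
    """
    n, s = X.shape
    rng = np.random.default_rng(seed)
    U = rng.random((c, n))
    U /= U.sum(axis=0)                     # columns satisfy the M_fcn constraint
    for _ in range(T):
        U_old = U.copy()
        W = U ** m                         # weights U_ik^m
        V = (W @ X) / W.sum(axis=1, keepdims=True)            # update (22)
        d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=-1)  # ||x_k - v_i||^2
        inv = d2 ** (-1.0 / (m - 1.0))     # assumes all distances positive
        U = inv / inv.sum(axis=0)                              # update (23)
        if np.abs(U - U_old).max() < eps:
            break
    return U, V

# Two well-separated synthetic clusters.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, (20, 2)), rng.normal(5.0, 0.5, (20, 2))])
U, V = fcm(X, c=2)
print(V)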
Real-world data sets sometimes contain incomplete data, that is, feature vectors with one or more missing values (Jain and Dubes, 1988; Dixon, 1979). Cases of this kind can arise from human or sensor error, subsequent data corruption, etc. For example, datum component x_jk may be missing, so that x_k in R^s has the form (x_1k, ..., ?, ..., x_sk)^T. Unfortunately, the iteration described by (22) and (23) requires access to complete data. If the number of data with missing components is small, then one option is to delete that portion of the data set and base the clustering entirely on those data containing no missing components. This approach is problematic if the proportion of data with missing components is significant, and in all cases it fails to provide a classification of those data deleted from the cluster analysis. Recently a missing data version of the fuzzy c-means algorithm (Hathaway and Bezdek, 1999) has been devised for the incomplete data case, and we give the algorithm here as a useful example of TLAO.
In the following we will use x~ to represent missing data components; e.g., x_k = (x_1k, x_2k, x~_3k, x_4k, ..., x_sk)^T. The collection of all missing data components will be denoted by X~, a subset of X. We adapt fuzzy c-means to missing data using a principle of optimality similar to that used in maximum likelihood methods from statistics. In this case, we assume that the missing data is consistent with the data set having strong cluster structure. We implement this optimal completion strategy by dynamically estimating the missing data components to be those numbers that optimize the cluster structure, as measured by minimum values of the criterion in (20).
The incomplete data version of fuzzy c-means from Hathaway and Bezdek (1999) minimizes (20) over U, V, and X~. The minimization is done using TLAO based on the following updating. The current values of V and U are used to estimate the missing data components X~ by:

    x~_jk = ( Sum_{i=1}^{c} (U_ik)^m v_ji ) / ( Sum_{i=1}^{c} (U_ik)^m ),  for all x~_jk in X~.    (24)
The current missing data values X~ are then used to complete X so that (22) can be used to calculate the new V. The third inner step of one complete TLAO iteration is then done using the completed X and the new V in (23) to calculate the new U. Preliminary testing of this missing data approach for clustering has demonstrated it to be highly effective. The convergence theory of the last section guarantees the procedure to be locally convergent to minimizers, at a q-linear rate.
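One full TLAO sweep of this optimal completion strategy, (24) followed by (22) and (23), can be sketched as follows (hypothetical Python/NumPy code, not from Hathaway and Bezdek (1999)). The boolean mask `miss` marks the missing entries X~, whose current estimates are stored in place in the working copy of X.

import numpy as np

def ocs_fcm_sweep(X, miss, U, V, m=2.0):
    """One TLAO sweep of optimal-completion-strategy fuzzy c-means.

    X    : n x s data array whose entries flagged by `miss` hold the
           current estimates of the missing values X~.
    miss : n x s boolean mask of missing entries.
    U, V : current c x n memberships and c x s centers.
    """
    W = U ** m                                    # c x n weights U_ik^m
    # Step 1, update (24): x~_jk <- sum_i U_ik^m v_ji / sum_i U_ik^m.
    est = (W.T @ V) / W.sum(axis=0)[:, None]      # n x s completion estimates
    X = np.where(miss, est, X)
    # Step 2, update (22): new centers from the completed data.
    V = (W @ X) / W.sum(axis=1, keepdims=True)
    # Step 3, update (23): new memberships from completed data, new centers.
    d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=-1)
    inv = d2 ** (-1.0 / (m - 1.0))                # assumes positive distances
    U = inv / inv.sum(axis=0)
    return X, U, V

Iterating the sweep until the iterates stop moving, starting from some initial completion of X (for instance, feature means) and an initial U, gives the full tri-level procedure.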
4. DISCUSSION
Tri-level alternating optimization attempts to optimize a function f(x,y,z) using an
iteration that alternates optimizations over each of three (vector) variables while holding
the other rwo fixed. The method is locally, q-linearly conversent to any minimizer at
which the second derivative is positive definite. The giobal analysis of this method fits
into the general convergence theory of Zangwill (1969), which guarantees that any limit
point of an iteration sequence must satisry the first order necessary conditions for
minimizing f.
A recent example of TLAO for an important problem in pattern recognition was given. The TLAO scheme applied to the fuzzy c-means function allows clustering of data sets where some of the data vectors have missing components. Preliminary numerical tests of the approach have shown it to produce good results even when there is a substantial proportion of incomplete data.
Alternating optimization is often not the best approach to optimization, but it is certainly worth consideration if there is a natural partitioning of the variables so that each of the partial minimizations is simple. While the convergence rate of the AO approach is in general only q-linear, if the partial minimizations are simple, the method can still be competitive or superior to joint optimization approaches with faster rates (e.g., q-superlinear) of convergence (Hu and Hathaway, 1999).
One of the most interesting mathematical questions concerning this approach is how to systematically and efficiently determine the "best" partitioning of the variables so that the value of rho in (16) is as small as possible. We expect the value of rho to be small when the partitioning produces groups of variables that are "largely independent". For example, minimization of f(x,y,z) = x^2 + y^2 + z^2 can be done in one TLAO iteration (rho = 0) because there is complete independence among the three variables. Other computationally oriented questions concern how to best formulate a relaxation scheme and how an alternating optimization approach can best be hybridized with a q-superlinearly (or faster) convergent local method. Finally, the authors plan to unify the convergence theory of grouped coordinate descent and extend it to the case of m-level alternating optimization, and to survey many important instances of alternating optimization type schemes in pattern recognition and statistics.
REFERENCES
Bezdek, J.C. (1981). Pattern Recognition with Fuzzy Objective Function Algorithms. New York: Plenum Press.

Bezdek, J.C., Hathaway, R.J., Howard, R.E., Wilson, C.A., & Windham, M.P. (1987). Local convergence analysis of a grouped variable version of coordinate descent. Journal of Optimization Theory and Applications, v. 54, 471-477.

Dixon, J.K. (1979). Pattern recognition with partly missing data. IEEE Transactions on Systems, Man and Cybernetics, v. 9, 617-621.

Hu, Y., & Hathaway, R.J. (1999). On efficiency of optimization in fuzzy c-means, preprint.

Hathaway, R.J., & Bezdek, J.C. (1999). Fuzzy c-means clustering of incomplete data, preprint.

Jain, A.K., & Dubes, R.C. (1988). Algorithms for Clustering Data. Englewood Cliffs, NJ: Prentice-Hall.

Ortega, J.M. (1972). Numerical Analysis: A Second Course. New York: Academic Press.

Stewart, G.W. (1973). Introduction to Matrix Computations. New York: Academic Press.

Zangwill, W. (1969). Nonlinear Programming: A Unified Approach. Englewood Cliffs, NJ: Prentice-Hall.