

AURCAT 34(3) 341-422 (1973)

AUTOMATION AND REMOTE CONTROL

Russian Original Vol. 34, No. 3, Part 1, March, 1973

August 1, 1973

АВТОМАТИКА И ТЕЛЕМЕХАНИКА

(AVTOMATIKA I TELEMEKHANIKA)

TRANSLATED FROM RUSSIAN

This translation is published under the editorial direction of the Instrument Society of America.

CONSULTANTS BUREAU, NEW YORK

AUTOMATION AND REMOTE CONTROL

A translation of Avtomatika i Telemekhanika

Consultants Bureau journals appear about six months after the publication of the original Russian issue. For bibliographic accuracy, the English issue published by Consultants Bureau carries the same number and date as the original Russian from which it was translated. For example, a Russian issue published in December will appear in a Consultants Bureau English translation about the following June, but the translation issue will carry the December date. When ordering any volume or particular issue of a Consultants Bureau journal, please specify the date and, where applicable, the volume and issue numbers of the original Russian. The material you will receive will be a translation of that Russian volume or issue.

August 1, 1973

Volume 34, Number 3, Part 1: March, 1973

CONTENTS

DETERMINATE SYSTEMS
Regulators Guaranteeing the Autonomy of a Controlled System - V. I. Buyakas
On Control Fields of Dynamic Systems - V. N. Rozova
On the Solvability of the Problem of the Analytical Design of a Controller for Nonstationary Systems - I. E. Zabello

STOCHASTIC SYSTEMS
Correlation Function of the Output Signal of a System with Possible Fault and Stationary Input - A. N. Sklyarevich
Algorithms for Servicing Stochastic Flows of Demands in an Inventory-Control Problem - V. M. Vishnevskii and E. S. Kochetkov
On Controlled Semi-Markov Processes with Several Target Functions - M. G. Teplitskii

ADAPTIVE SYSTEMS
Pseudogradient Adaptation and Training Algorithms - B. T. Polyak and Ya. Z. Tsypkin
Training Procedures which Combine Stochastic Approximation and Minimization of the Empirical Risk - L. P. Sysoev

AUTOMATA
Sequential Decomposition of Probabilistic Automata - Yu. M. Afanas'ev, A. I. Krysanov, and Yu. P. Letunov
Modes of Operation of the Feedback Loop in an Asynchronous Logical Network - Yu. L. Tomfel'd

The Russian press date (podpisano k pechati) of this issue was 2/20/1973. Publication therefore did not occur prior to this date, but must be assumed to have taken place reasonably soon thereafter.

ADAPTIVE SYSTEMS

PSEUDOGRADIENT ADAPTATION AND TRAINING ALGORITHMS

B. T. Polyak and Ya. Z. Tsypkin                                        UDC 62-50

The authors propose a unified approach, based on the notion of the pseudogradient, to the analysis of various stochastic algorithms for minimizing functionals. A general theorem regarding convergence is proved; this is used to substantiate stochastic-approximation algorithms (regular and search), random-search algorithms, generalized stochastic gradient algorithms, and various particular algorithms for identification and pattern-recognition problems.

Introduction

At present, there is no lack of various algorithms for finding the unconditional extremum of a functional J(c) which defines an optimality criterion.

When the functional J(c) is differentiable and its gradient ∇J(c) can be measured (even if noise is present), regular algorithms are used in which the random realization of the gradient appears. Many algorithms can be regarded as "distorted" gradient algorithms; steps are made not directly in the direction of the antigradient, but they involve the use of this quantity. For example, the gradient may be multiplied by a matrix, only the signs of its components may be allowed for, certain terms in the gradient expression may be discarded, etc. If the gradient cannot be measured but it is possible to compute the values of realizations of J(c), various search algorithms are used. Here the direction of motion is taken to be either a finite-difference approximation of the gradient (methods of the Kiefer-Wolfowitz type), or a stochastic vector (random-search methods), or a deterministic vector that is not directly related to the gradient (e.g., methods of coordinate descent). More complex situations in which the optimality criterion is not differentiable are also possible. In some of these cases it is possible to use generalized concepts of the gradient (i.e., the method of generalized stochastic gradients). Finally, many adaptation and training algorithms are essentially of a nongradient nature: no functional exists with respect to which these algorithms are gradient algorithms.

The aim of our paper is to develop a general approach which will encompass all the above diverse situations from a unified viewpoint. This approach, which rests on the notion of the pseudogradient, consists in the following. We consider an iteration algorithm of the form

    c[n] = c[n-1] - γ[n] s[n],

where s[n] is a certain random direction of motion, which in general depends on all the preceding values of c and on n, and γ[n] > 0 is a scalar factor. We assume that there exists a certain deterministic smooth functional J(c), i.e., the optimality criterion. This functional can be either specified a priori (if the initial problem is to minimize it) or can be introduced artificially. It is important to stress that it is not assumed that it is possible to compute the values of J(c) or ∇J(c) even with a random error (i.e., these quantities may not be accessible to measurement). We call s[n] the pseudogradient of J(c) at the point c[n-1] if we have the condition

    ∇J(c[n-1])ᵀ M s[n] ≥ 0,

i.e., if the vector s[n] on average makes an acute angle with the gradient; in other words, if -s[n] is on average a direction of decrease of the functional. Of course, at each point there exists a set of pseudogradients of one functional J(c). We should also note that no assumptions are made regarding the continuity of M s[n] as a function of

Translated from Avtomatika i Telemekhanika, No. 3, pp. 45-68, March, 1973. Original article submitted July 20, 1972.

© 1973 Consultants Bureau, a division of Plenum Publishing Corporation, 227 West 17th Street, New York, N.Y. 10011. All rights reserved. This article cannot be reproduced for any purpose whatsoever without permission of the publisher. A copy of this article is available from the publisher for $15.00.

c[n-1]. If at each step s[n] is a pseudogradient of J(c), then the iteration algorithm will be called a pseudogradient algorithm. It turns out that the class of pseudogradient algorithms is very broad and encompasses almost all known adaptation and training algorithms.

Below we will prove a general theorem regarding convergence of pseudogradient algorithms and will give various examples of its use. Thus we will be able to obtain a number of both old and new results regarding the convergence of various algorithms from a unified viewpoint. Analysis of the algorithms reduces basically to a check on their pseudogradient nature.
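The iteration and the pseudogradient condition above can be sketched numerically. The following is an illustrative sketch, not taken from the paper: the functional J, the distortion matrix D, the noise level, and the step sequence are all assumptions. The point is only that a direction satisfying ∇J(c)ᵀ M s[n] ≥ 0 need not be the gradient itself for the iteration to converge.

```python
import numpy as np

# Sketch of the iteration c[n] = c[n-1] - gamma[n] * s[n], where s[n] satisfies
# only the pseudogradient condition grad_J(c)' E[s] >= 0.  Here (assumed example)
# J(c) = 0.5 ||c||^2, and s[n] is the gradient distorted by a fixed positive
# definite diagonal matrix D plus zero-mean noise, a typical "distorted gradient".

rng = np.random.default_rng(0)
D = np.diag([0.5, 2.0])            # positive definite distortion (assumed)

def grad_J(c):
    return c                       # gradient of J(c) = 0.5 ||c||^2

def pseudogradient(c):
    # grad_J(c)' E[s] = grad_J(c)' D grad_J(c) >= 0, so s is a pseudogradient
    return D @ grad_J(c) + rng.normal(scale=0.1, size=2)

c = np.array([5.0, -3.0])
for n in range(1, 5001):
    gamma = 1.0 / n                # satisfies sum(gamma) = inf, sum(gamma^2) < inf
    c = c - gamma * pseudogradient(c)
print(np.linalg.norm(c))           # small: c approaches the minimum point c* = 0
```

The distortion D changes the direction of each step, yet the acute-angle property alone is enough for convergence here.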

1. Gradient and Pseudogradient Algorithms

Let us recall the adaptive approach to the construction of training algorithms [1, 2]. We will denote the functional to be minimized (optimality criterion) by J(c), and its realization at the n-th step (at the point c[n-1]) by Q(x[n], c[n-1]), i.e.,

    J(c) = M_x{Q(x, c)} = ∫ Q(x, c) p(dx).                    (1.1)

The measurement error for J(c) at the n-th step will be denoted by ξ[n]:

    ξ[n] = Q(x[n], c[n-1]) - J(c[n-1]).                       (1.2)

We introduce similar notation for the gradient of J(c) and its realizations and the error in measuring the gradient:

    ∇J(c) = M_x{∇_c Q(x, c)} = ∫ ∇_c Q(x, c) p(dx),           (1.3)

    ζ[n] = ∇_c Q(x[n], c[n-1]) - ∇J(c[n-1]).                  (1.4)

In accordance with the extremum condition ∇J(c) = 0, the gradient algorithm for minimizing J(c) can be written in the form

    c[n] = c[n-1] - γ[n] ∇_c Q(x[n], c[n-1]),                 (1.5)

where γ[n] > 0 is a scalar factor. Examples of gradient algorithms for various problems can be found in [1-4]. Moreover, many algorithms which are not strictly gradient algorithms in form (e.g., Kiefer-Wolfowitz algorithms or random-search algorithms) can be treated as gradient algorithms by changing to different (smoothed) functionals [5].
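The gradient algorithm (1.5) can be illustrated with a minimal numeric sketch; the particular Q, distribution of x, and step sequence below are assumptions, not from the paper.

```python
import numpy as np

# Sketch of the gradient algorithm (1.5): steps use the random realization
# grad_c Q(x[n], c[n-1]) rather than the unmeasurable exact gradient of J.
# Assumed example: Q(x, c) = (c - x)^2 with x ~ N(2, 1), so that
# J(c) = M{Q(x, c)} is minimized at c* = M x = 2.

rng = np.random.default_rng(1)
c = 10.0
for n in range(1, 20001):
    x = rng.normal(2.0, 1.0)
    c -= (1.0 / n) * 2.0 * (c - x)   # gamma[n] = 1/n, realization gradient 2(c - x)
print(c)                             # close to the minimum point c* = 2
```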

In this paper we will investigate general training algorithms from a different point of view. We will not attempt to construct a functional for each algorithm for which the algorithm is a gradient algorithm. This eliminates the difficulties associated with the complexity or impossibility of obtaining such a functional. A change to pseudogradient algorithms yields a number of advantages as compared to strictly gradient algorithms. In the first place, direct computation of the gradient is impossible in many cases. This is the situation, for example, when only values of the functional are accessible to measurement. Second, frequently it is theoretically possible but extremely laborious to compute the gradient (e.g., it is necessary to construct sensitivity functions). Pseudogradient algorithms can be much simpler. Third, in certain problems strictly gradient algorithms converge very slowly. Convergence can be accelerated by choosing directions of motion that differ from the gradient. For example, in deterministic minimization problems, methods of the Newton type converge much more rapidly than gradient methods. Finally, the notion of gradient becomes meaningless for nonsmooth functionals. At the same time algorithms obtained, e.g., from heuristic considerations are pseudogradient with respect to other artificially introduced smooth functionals (which act as Lyapunov functions). An example can be provided by J(c) = ½‖c - c*‖², where c* is the optimum value of c. Obviously, it is impossible to compute J(c) or ∇J(c) since the value of c* is unknown. The pseudogradient condition, however (which in this case assumes the form (c[n-1] - c*)ᵀ M s[n] ≥ 0), can usually be checked without difficulty. A number of such examples will be described in §5.

Table 1 shows several typical pseudogradient algorithms. It is assumed that the functional is of the form (1.1), and that realizations of the functional Q(x, c) or of the gradient ∇_c Q(x, c) can be measured. The meaning of certain notation as well as more precise conditions imposed on J(c) and on the algorithm parameters such that the algorithms become pseudogradient algorithms will be formulated in §§3-5. The literature references in Table 1 are not intended to be complete and indicate only the papers in which the idea of the algorithm was first formulated (although perhaps in a somewhat different form). In addition to the general algorithms in Table 1, we will also


TABLE 1. Name of algorithm: Algorithm

Gradient (Robbins-Monro stochastic approximation algorithm): c[n] = c[n-1] - γ[n] ∇_c Q(x[n], c[n-1])

Transformed gradient: Γ[n] is a matrix; c[n] = c[n-1] - Γ[n] ∇_c Q(x[n], c[n-1])

Sign: c[n] = c[n-1] - γ[n] sign ∇_c Q(x[n], c[n-1])

Simplified gradient: ∇_c Q(x, c) = A(c) q(x, c), A is a matrix, q is a vector; c[n] = c[n-1] - γ[n] q(x[n], c[n-1])

Search (Kiefer-Wolfowitz stochastic approximation): e_i are unit vectors, α[n] is a scalar; c_i[n] = c_i[n-1] - γ[n] [Q(x'[n], c[n-1] + α[n]e_i) - Q(x''[n], c[n-1])] / α[n]

Regular random-search algorithm: q[n] is a random vector; c[n] = c[n-1] - γ[n] q[n] (q[n]ᵀ ∇_c Q(x[n], c[n-1]))

Random search with paired trials: q[n] is a random vector; c[n] = c[n-1] - γ[n] q[n] [Q(x'[n], c[n-1] + α[n]q[n]) - Q(x''[n], c[n-1] - α[n]q[n])] / α[n]

Coordinate descent: q[n] = e_i with probability p_i[n] ≥ ε > 0; c[n] = c[n-1] - γ[n] q[n] [Q(x'[n], c[n-1] + α[n]q[n]) - Q(x''[n], c[n-1] - α[n]q[n])] / α[n]

Generalized stochastic gradient: Q(x, c) is nondifferentiable but convex in c, ∇̃_c Q(x, c) is the generalized gradient; c[n] = c[n-1] - γ[n] ∇̃_c Q(x[n], c[n-1])

investigate certain specific algorithms which allow for the specific features of the problems in question (in particular, adaptive identification and pattern-recognition algorithms).

Of particular importance for evaluating the effectiveness of pseudogradient algorithms are the problems of rate of convergence, stability, and overall quality of the algorithms. In this paper we will limit ourselves to establishing the facts of convergence.
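The search (Kiefer-Wolfowitz) row of Table 1 can be sketched numerically: only noisy values of the functional are measurable, and the step direction is a finite-difference approximation of the gradient. The particular Q, the noise level, and the step and trial-size sequences below are illustrative assumptions.

```python
import numpy as np

# Sketch of a Kiefer-Wolfowitz-type search algorithm: per-component forward
# differences of noisy functional values replace the unmeasurable gradient.
# Assumed example: J(c) = ||c||^2, measured with additive N(0, 0.1^2) noise.

rng = np.random.default_rng(2)

def Q(c):                                # noisy realization of J(c) = ||c||^2
    return float(c @ c) + rng.normal(scale=0.1)

c = np.array([3.0, -2.0])
k = len(c)
for n in range(1, 3001):
    gamma, alpha = 1.0 / n, 1.0 / n ** 0.25   # step and trial-size sequences
    g = np.zeros(k)
    for i in range(k):
        e = np.zeros(k)
        e[i] = 1.0
        g[i] = (Q(c + alpha * e) - Q(c)) / alpha   # finite-difference component
    c = c - gamma * g
print(np.linalg.norm(c))                  # small: c approaches the minimum point 0
```

Note the trade-off in α[n]: a smaller trial size reduces the finite-difference bias but amplifies the measurement noise, which is why α[n] is driven to zero more slowly than γ[n].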

2. Convergence of Pseudogradient Algorithms

Until recently, convergence of various classes of stochastic algorithms has been investigated independently. The known facts in this area are summarized in [3, 14]. Let us recall only certain basic results.

The simplest case of strictly gradient algorithms was considered as early as the classical paper of Robbins and Monro [6]. In the most general situation, in which neither uniqueness nor even existence of a minimum point of the functional is assumed, the convergence of gradient algorithms was established in [15, 12] (Theorem 2). The investigation of search algorithms begins with the work of Kiefer and Wolfowitz [10]. The convergence of the multidimensional analog of the Kiefer-Wolfowitz method was proved by Blum [16] (see also [15, 11]). Rastrigin initiated the use of random-search methods (see, e.g., [9]). Certain refinements and substantiations of random-search algorithms may be found in [11, 17-19]. Finally, gradient algorithms were generalized to the case of nondifferentiable convex functionals in [11, 13].


As for the substantiation of general nongradient algorithms, the first result in this area is Blum's [16]. He considered an iterative process for finding the root of the regression equation and proved its convergence by means of an analog of the Lyapunov function which he introduced. It is interesting to note that, although Blum's work is often cited, his results are evidently not sufficiently well known. This is probably the common fate of many classical studies. Blum's approach was subsequently developed and strengthened by Aizerman, Braverman, and Rozonoer [20]. A number of interesting results can be found in the monograph of Nevel'son and Khas'minskii [35].

The above papers considered algorithms which operate in the presence of additive noise resulting from inaccuracies in measuring the gradients or functionals. On the other hand, many papers were devoted to nongradient algorithms for deterministic extremum problems (see, in particular, [21, 22] and the bibliography given there). As a rule, such algorithms cannot be used when noise is present.

It is possible to have intermediate types of algorithms which operate only in the absence of additive noise but which make it possible to find the extremums of functionals when the error in measuring the gradients or functionals is of a relative nature. This means that the error decreases and tends to zero as the minimum point is approached. This kind of error is generated, for example, by multiplicative noise.

Below we will prove a basic theorem regarding the convergence of discrete pseudogradient algorithms. It turns out that, by using just this theorem alone, it is possible to establish convergence of almost all known adaptation and training algorithms, both with and without additive and multiplicative noise. Thus the theorem encompasses both deterministic and stochastic optimization problems.

Let us now give an exact formulation of the theorem. Consider the algorithm

    c[n] = c[n-1] - γ[n] s[n],                                (2.1)

where all the c[n] belong to the Hilbert space H, c[0] is fixed, γ[n] > 0 are deterministic scalar factors that depend only on n, and s[n] are realizations of the random vector s with values in H, whose distribution depends, generally speaking, on c[0], ..., c[n-1], s[1], ..., s[n-1], and n. Obviously, any iteration algorithm may be written in the form (2.1). For example, if the initial algorithm has the form

    c[n] = c[n-1] - Γ[n] ∇_c Q(x[n], c[n-1]),

where Γ[n] is an operator that depends on c[0], ..., c[n-1], x[1], ..., x[n], n, we may take s[n] = γ[n]⁻¹ Γ[n] ∇_c Q(x[n], c[n-1]), where γ[n] is a deterministic scalar factor that is chosen in some fashion, and thus reduce the algorithm to the basic form. We will repeatedly use this possibility in the future.

Assume that for fixed c[0], ..., c[n-1], s[1], ..., s[n-1], n we have the conditional mathematical expectations of s and ‖s‖² (which will be denoted by M s[n] and M‖s[n]‖² respectively). Assume also that in H we have specified a deterministic functional J(c) (the optimality criterion) that is bounded from below and differentiable, and whose gradient ∇J(c) satisfies a Lipschitz condition, i.e.,

    J(c) ≥ J* > -∞,                                           (2.2)

    ‖∇J(c + g) - ∇J(c)‖ ≤ L‖g‖,   c, g ∈ H.                   (2.3)

Assume that algorithm (2.1) is a pseudogradient algorithm:

    ∇J(c[n-1])ᵀ M s[n] ≥ 0.                                   (2.4)

We impose a constraint on the step length in the algorithm:

    M‖s[n]‖² ≤ λ[n] + K₁ J(c[n-1]) + K₂ ∇J(c[n-1])ᵀ M s[n].   (2.5)

This condition means that M‖s[n]‖² as a function of n does not increase more rapidly than some sequence λ[n] (where possibly λ[n] → ∞), and as a function of c it increases not more rapidly than J(c) or ∇J(c)ᵀ M s. The factors γ[n] satisfy the conditions

    Σ_{n=1}^∞ γ[n] = ∞,                                       (2.6)

    Σ_{n=1}^∞ λ[n] γ²[n] < ∞.                                 (2.7)

It is clear that if the λ[n] are bounded, then condition (2.7) is a corollary of (2.8).

Theorem 1. Assume that conditions (2.2)-(2.7) hold, as well as either of the following:

    Σ_{n=1}^∞ γ²[n] < ∞,                                      (2.8)

    λ[n] ≡ 0,   K₁ = 0,   lim sup_{n→∞} γ[n] < 2/(L K₂).      (2.9)

Then for any c[0] the sequence c[n] generated by algorithm (2.1) is almost certainly such that the limit lim J(c[n]) exists for it and

    lim_{n→∞} ∇J(c[n-1])ᵀ M s[n] = 0.                         (2.10)

The proof of the theorem, which is based on standard application of facts regarding the convergence of semimartingales, is given in the appendix. The condition Σγ²[n] < ∞, which implies that γ[n] → 0, insures the convergence of the algorithm when additive noise is present. In the absence of such noise (formally for λ[n] ≡ 0, K₁ = 0), expression (2.8) is replaced by the less rigid requirement lim sup_{n→∞} γ[n] < 2/(L K₂), which does not require that γ[n] tend to zero. We should note that in the second case it is possible to obtain a higher rate of convergence.
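The standard step-size choice γ[n] = 1/n, used as an illustration here (the paper does not prescribe a particular sequence), satisfies both the divergent-sum and square-summable conditions, which a short numeric check makes concrete:

```python
import math

# Numeric illustration of gamma[n] = 1/n: the partial sums of gamma[n] grow
# without bound (condition Sum gamma[n] = inf), while the partial sums of
# gamma[n]^2 stay bounded (condition Sum gamma[n]^2 < inf; the series
# converges to pi^2 / 6).

N = 10 ** 6
s1 = sum(1.0 / n for n in range(1, N + 1))       # ~ ln(N) + 0.577, unbounded in N
s2 = sum(1.0 / n ** 2 for n in range(1, N + 1))  # bounded, approaches pi^2 / 6
print(s1, s2, math.pi ** 2 / 6)
```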

We should emphasize that, under the assumptions made in the theorem, it is impossible to assert that algorithm (2.1) minimizes the functional J(c). This is not surprising since, for example, all the conditions of the theorem hold for s[n] ≡ 0. If in the pseudogradient condition we require strict inequality for all c[n-1] other than the minimum points, we can obtain the stronger assertions given below.

Corollary 1. Assume that, in addition to the conditions of Theorem 1,

    ∇J(c[n-1])ᵀ M s[n] ≥ δ(ε) > 0   for J(c[n-1]) ≥ J* + ε    (2.11)

for all ε > 0. Then J(c[n]) → J* almost certainly.

Corollary 2. Assume that, in addition to the conditions in Theorem 1, the space H is finite-dimensional (H = R^k), sets of the form {c: J(c) ≤ const} are bounded, and

    ∇J(c[n-1])ᵀ M s[n] ≥ δ(ε) > 0   for ‖∇J(c[n-1])‖ ≥ ε      (2.12)

for all ε > 0. Then there almost certainly exists a subsequence n_i and a point c* such that

    ∇J(c*) = 0,   c[n_i] → c*,   J(c[n]) → J(c*).             (2.13)

For an arbitrary set C ⊂ H and point c ∈ H we denote the distance from c to C by ρ(c, C), i.e.,

    ρ(c, C) = inf_{v ∈ C} ‖c - v‖.                            (2.14)

Corollary 3. Assume that, in addition to the conditions of Theorem 1, the set C* of minimum points of J(c) is not empty and

    inf J(c) > J*   for ρ(c, C*) ≥ ε,                         (2.15)

    ∇J(c[n-1])ᵀ M s[n] ≥ δ(ε) > 0   for ρ(c[n-1], C*) ≥ ε     (2.16)

for all ε > 0. Then almost certainly ρ(c[n], C*) → 0 and J(c[n]) → J*. In particular, if C* consists of the single point c*, then c[n] → c* almost certainly.

The proofs of Corollaries 1-3 are very simple and are therefore omitted.

For the case in which C* consists of a single point and (2.8) holds, Corollary 3 is a direct generalization of the above results of Blum [16]. As compared to the latter, however, it is not assumed that a second derivative of J(c) exists, that M‖s[n]‖² is bounded, or that H is finite-dimensional. Corollary 3 is also similar to Theorem VIII of Aizerman, Braverman, and Rozonoer [20] and to Theorem 4.4 in [35] (in [20, 35] it is assumed that J(c) is twice differentiable, and λ[n] = const in the condition analogous to (2.5)). Theorem 1 and Corollaries 1 and 2 with condition (2.8), in which it is not assumed that minimum points exist, are themselves generalizations of the above results relating to strictly gradient algorithms [15, 12]. Finally, given condition (2.9) the assertions of the theorem and corollaries relate to certain results regarding the convergence of deterministic minimization algorithms (see [21, 22] and the bibliography given there).

The material which follows (§§3-5) is devoted to an analysis of the convergence of various algorithms by means of Theorem 1. If the initial problem involves the minimization of a smooth functional J(c), then the substantiation of a specific algorithm reduces to checking the conditions of Theorem 1, primarily the pseudogradient condition (2.4). Here algorithms are divided into two large classes, regular and search (which will be dealt with in §§3 and 4 respectively). In the former, a stochastic realization of the gradient is used in constructing the algorithm, while in the second it is assumed that only realizations of the functional are accessible. In the more complex situation in which the function to be minimized is not smooth, it is possible to utilize Theorem 1 by constructing a different artificially introduced functional with respect to which the algorithm is a pseudogradient algorithm. Examples of this approach will be given in §5.

3. Regular Algorithms

Assume that in space H we are given a functional J(c) satisfying conditions (2.2) and (2.3). It is assumed that at the points c[n-1] the random realization of the gradient ∇_c Q(x[n], c[n-1]) is known. We recall that ζ[n] = ∇_c Q(x[n], c[n-1]) - ∇J(c[n-1]) denotes the random error in measuring the gradient, M ζ[n] = 0, and assume that the variance of ζ[n] satisfies the condition

    M‖ζ[n]‖² ≤ λ[n] + K₁ J(c[n-1]) + K₃ ‖∇J(c[n-1])‖².        (3.1)

The most important particular cases of this condition are as follows:

    M‖ζ[n]‖² ≤ σ²,                                            (3.1')

i.e., additive noise (the gradient is measured with a specified absolute error);

    M‖ζ[n]‖² ≤ K₃ ‖∇J(c[n-1])‖²,                              (3.1'')

i.e., multiplicative noise (the gradient is measured with a specified relative error).

Let us consider the gradient algorithm (multidimensional Robbins-Monro stochastic approximation algorithm)

    c[n] = c[n-1] - γ[n] ∇_c Q(x[n], c[n-1]).                 (3.2)

Theorem 2. Assume that conditions (2.2), (2.3), (2.6), (2.7), (3.1), and either of (2.8) or (2.9) hold (with K₂ = 1 + K₃). Then for any c[0] in algorithm (3.2), the quantity lim J(c[n]) almost certainly exists, and lim ‖∇J(c[n])‖ = 0 almost certainly. If, moreover, H = R^k and the sets {c: J(c) ≤ const} are bounded, then there almost certainly exists a point c*, ∇J(c*) = 0, and a sequence n_i such that c[n_i] → c* and J(c[n]) → J(c*) almost certainly. If the point c* for which ∇J(c*) = 0 is unique, then c[n] → c* almost certainly.

Proof. We have M s[n] = ∇J(c[n-1]), so that ∇J(c[n-1])ᵀ M s[n] = ‖∇J(c[n-1])‖² and expressions (2.4) and (2.12) hold. On the other hand, M‖s[n]‖² = ‖∇J(c[n-1])‖² + M‖ζ[n]‖², so that expression (3.1) implies that (2.5) holds with K₂ = 1 + K₃. Using Corollary 2, we obtain all the assertions of Theorem 2.

The proof is given to show how simple it is to apply Theorem 1 in this case. Henceforth we will omit elementary computations of this type. It is for this reason, in particular, that the proofs of Theorems 3, 4, and 6 are omitted.


Theorem 2 (allowing for (2.8)) strengthens the result of Theorem 1 in [15]. In a somewhat more general situation (when M ζ[n] = b[n], Σ ‖b[n]‖ < ∞), a similar assertion was proved in [12] (Theorem 2). Finally, subject to condition (2.9), Theorem 2 implies results that generalize those already known for the deterministic case [21].

Let us now analyze the direct generalization of the gradient method in which the scalar factor γ[n] is replaced by the linear operator Γ[n] (in the finite-dimensional case, Γ[n] is a matrix):

    c[n] = c[n-1] - Γ[n] ∇_c Q(x[n], c[n-1]).                 (3.3)

Let us assume that Γ[n] satisfies the following conditions:

    Σ_{n=1}^∞ ‖Γ[n]‖ = ∞,                                     (3.4)

    cᵀ Γ[n] c ≥ λ ‖Γ[n]‖ ‖c‖²,   λ > 0   for all c ∈ H.       (3.5)

Conditions (3.4) and (3.5) hold, for example, if Γ[n] = γ[n]Γ, and Γ is a symmetrical positive definite operator (this being equivalent to transformation of the metric of the initial space in conjunction with a strictly gradient method; see [7]). However, the operator Γ[n] need not be symmetrical in general. For example, it can have the form Γ[n] = γ[n]Γ, Γ = Γ₁ + Γ₂, where Γ₁ is a symmetrical positive definite operator and Γ₂ is a skew-symmetrical operator, i.e., cᵀΓ₂c = 0 (for the deterministic case with a quadratic functional J(c), iterative methods with such operators were considered in [23]). We write γ[n] = ‖Γ[n]‖.

Theorem 3. When conditions (2.2), (2.3), (2.7), (3.4), (3.5), and either of conditions (2.8) and (2.9) hold, the assertions of Theorem 2 are valid for algorithm (3.3).

We can investigate the algorithm in which the gradient is subject to a transformation of a somewhat different type in exactly the same way. Specifically, let us assume that

    ∇_c Q(x, c) = A(c) q(x, c),                               (3.6)

where A(c) is a linear operator from H to H, and the vector q(x, c) is capable of being measured. For example, A(c) can be a sensitivity matrix, whose computation is extremely complex. Thus, if the functional has the form

    Q(x, c) = ½‖R(c) + x‖²,   ∇_c Q(x, c) = R'(c)ᵀ(R(c) + x), (3.7)

then A(c) = R'(c)ᵀ is the sensitivity matrix (Jacobi matrix), whose construction requires that we calculate the partial derivatives, and q(x, c) = R(c) + x is a measurable vector. Of interest is the method in which the operator A(c) is discarded, namely

    c[n] = c[n-1] - γ[n] q(x[n], c[n-1]),                     (3.8)

where the noise ζ[n] = q(x[n], c[n-1]) - M_x q(x, c[n-1]) satisfies conditions (3.1). Similar methods were considered in [24], as regards the functional J(c) = ‖R(c)‖² in the absence of noise. We will assume that A(c) is positive definite (although not necessarily symmetrical) and bounded, i.e.,

    c̄ᵀ A(c) c̄ ≥ λ‖c̄‖²,   λ > 0,   and   ‖A(c)‖ ≤ a           (3.9)

for all c, c̄ ∈ H.

Theorem 4. When conditions (2.2), (2.3), (2.6), (2.7), (3.1), (3.9), and either of conditions (2.8) and (2.9) hold, for any c[0] in algorithm (3.8) we will have lim M_x q(x, c[n]) = 0 almost certainly (and consequently lim ‖∇J(c[n])‖ = 0).

Let us now analyze another group of pseudogradient algorithms in which the components of the gradient are subject to a nonlinear transformation, and let us discuss the "sign" algorithm which is most characteristic for this group. Henceforth a subscript next to a vector will denote the corresponding component of the vector; the notation sign c, where c = (c₁, ..., c_k), will denote a vector with components (sign c₁, ..., sign c_k); and the function sign t (t being a scalar) is equal to 1 for t > 0, -1 for t < 0, and 0 for t = 0. Let us consider an algorithm in which only the signs of the gradient components are allowed for (see [8]):

    c[n] = c[n-1] - γ[n] sign ∇_c Q(x[n], c[n-1]).            (3.10)

We will assume that each noise component assumes positive and negative values equiprobably and that there exists a nonzero probability that the noise is small in absolute value, i.e.,

    P(ζ[n]_i > 0) = P(ζ[n]_i < 0),                            (3.11)

    P(0 < ζ[n]_i ≤ ε) ≥ δ(ε),   P(-ε ≤ ζ[n]_i < 0) ≥ δ(ε)     (3.12)

for all ε > 0 and all c[0], ..., c[n-1], n, where δ(ε) is a monotonically increasing function, δ(ε) > 0 for ε > 0.

Theorem 5. Given conditions (2.2), (2.3), (2.6), (2.8), (3.11), and (3.12), the same assertions as in Theorem 2 hold for algorithm (3.10).

The proof is given in the appendix. We should emphasize that both conditions (3.11) and (3.12) are essential (when they are violated, the algorithm can lose the pseudogradient property). The necessity of (3.11) has already been noted in the literature [25]. We will give a simple example that shows that failure to observe (3.12) can cause the algorithm to diverge. Assume that the vector c is one-dimensional, J = c²/2, and ζ[n] assumes the values ±1 equiprobably for all c[0], ..., c[n-1], n. Then for |c[n-1]| > 1 a step in algorithm (3.10) is always made in the requisite direction (toward the point 0). But for |c[n-1]| < 1 we will have sign ∇_c Q(x[n], c[n-1]) = sign ζ[n], so that a step will be made in either direction with probability 1/2. Thus we will have a random walk on the segment (-1, 1).

We should also note that in algorithm (3.10) we have ‖s[n]‖² = k, except for points at which some of the components of ∇_c Q(x[n], c[n-1]) are equal to zero, so that we cannot have λ[n] ≡ 0, K₁ = 0 in (2.5). Therefore condition (2.8) cannot be replaced by condition (2.9) that γ[n] is bounded. For the algorithm to converge without assuming that γ[n] → 0, it must be modified. For example, we can take the algorithm

    c[n] = c[n-1] - γ[n] ‖∇_c Q(x[n], c[n-1])‖ sign ∇_c Q(x[n], c[n-1]),

and substantiate its convergence in the absence of additive noise with condition (2.9).
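The divergence example above is easy to simulate. In this sketch the constant step size and the run length are assumptions; the noise ξ = ±1 and the functional J = c²/2 are those of the example.

```python
import numpy as np

# Simulation of the divergence example for the sign algorithm (3.10):
# J(c) = c^2 / 2 (one-dimensional), gradient c measured with noise +-1,
# which violates condition (3.12).  For |c| < 1, sign(c + xi) = sign(xi),
# so the iterate performs a pure random walk and never settles at 0.

rng = np.random.default_rng(3)
gamma = 0.01                        # constant step size (assumed)
c, far = 0.0, 0
N = 10 ** 6
for n in range(N):
    xi = rng.choice([-1.0, 1.0])
    c -= gamma * np.sign(c + xi)    # sign of the noisy gradient realization
    far += abs(c) > 0.5
print(far / N)                      # a large fraction of time is spent far from 0
```

With a decreasing step sequence the walk would merely freeze at an arbitrary point of (-1, 1), which is why conditions (3.11)-(3.12) on the noise distribution cannot be dropped.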

Let us now consider regular algorithms in which the vector q[n] is chosen in some random fashion (independently of the realization of the gradient) and steps are made either in the direction of q[n] or of -q[n] depending on which of these vectors makes an acute angle with the realization of the gradient:

    c[n] = c[n-1] - γ[n] q[n] (∇_c Q(x[n], c[n-1])ᵀ q[n]).    (3.13)

Assume that the vector q[n] is distributed such that for any c ∈ H, ‖c‖ = 1 the rms value of its projection onto c is nonzero (in other words, for each hyperplane there exists a nonzero probability that q[n] will appear outside it, or, equivalently, the covariation matrix of q[n] is nonsingular):

    M(q[n]ᵀc)² ≥ λ > 0.                                       (3.14)

For example, expression (3.14) holds in the space R^k if q[n] has a positive distribution density on some bounded convex body for which 0 is an interior or surface point. Another example in which (3.14) holds is random coordinate descent, in which unit vectors e_i with probabilities p_i[n] ≥ ε > 0 are taken as q[n].

Henceforth we will assume that the random quantities q[n] and ζ[n] are independent, the variance of ζ[n] is bounded, and the vectors q[n] are not excessively large:

    M{ζ[n]ᵀ q[n]} = 0,   M‖ζ[n]‖² ≤ σ²,   M‖q[n]‖^i ≤ a_i   (i = 1, ..., 4).   (3.15)

Theorem 6. Assume that conditions (2.2), (2.3), (2.6), (3.14), and (3.15) hold, and c[0] is arbitrary. Then if either expression (2.8) holds or σ² = 0 and lim sup_{n→∞} γ[n] < 2λ/(L a₄), then the assertions of Theorem 2 hold for algorithm (3.13).

We should note that if in (3.13) we allow only for the sign of ∇_c Q(x[n], c[n-1])ᵀ q[n], i.e., consider the algorithm

    c[n] = c[n-1] - γ[n] q[n] sign(∇_c Q(x[n], c[n-1])ᵀ q[n]),

it will converge only subject to additional assumptions of the type of (3.11) and (3.12) regarding the error ζ[n].
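Algorithm (3.13) can be sketched numerically. The functional, the Gaussian choice of q[n] (whose identity covariance makes (3.14) hold), the noise level, and the step sequence below are illustrative assumptions.

```python
import numpy as np

# Sketch of the regular random-search algorithm (3.13): a random direction q[n],
# chosen independently of the gradient realization, is scaled by its projection
# onto the measured gradient, so the step goes along +-q[n], whichever makes an
# acute angle with the gradient.  Assumed example: J(c) = 0.5 ||c||^2 with
# additive gradient noise of standard deviation 0.1.

rng = np.random.default_rng(4)
c = np.array([4.0, -4.0, 2.0])
for n in range(1, 20001):
    q = rng.normal(size=3)                          # E[q q'] = I, so (3.14) holds
    grad_real = c + rng.normal(scale=0.1, size=3)   # noisy gradient realization
    c = c - (1.0 / n) * q * float(grad_real @ q)
print(np.linalg.norm(c))                            # small: c approaches 0
```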


Except for algorithm (3.8), we have nowhere made specific the form of the functional to be minimized. Naturally, for narrower classes of functionals the convergence conditions of the general algorithms can be refined and, moreover, specific minimization algorithms are possible for them.

Let us consider one example of this type, an identification problem. Assume that we have a linear object to whose input we feed vectors x[n] ∈ R^k with certain specified (but unknown) distributions, and at whose output we measure the scalar quantity (c*)ᵀx[n] in the presence of noise ξ[n]. We are required to recover the vector c* from the measured values of x[n] and y[n] = (c*)ᵀx[n] + ξ[n]. We introduce the functional

    J(c) = M_{x,ξ}{(y - cᵀx)²}.                               (3.16)

Then the gradient algorithm for minimizing it has the form

    c[n] = c[n-1] - γ[n] (c[n-1]ᵀx[n] - y[n]) x[n].           (3.17)

vadous mфifiсations of (3.17) aге also possible, e.g.,


c[n] = c[n−1] − γ[n] [(c[n−1]^T x[n] − y[n]) / ||x[n]||²] x[n],   (3.18)

c[n] = c[n−1] − γ[n] (c[n−1]^T x[n] − y[n]) sign x[n],   (3.19)

c[n] = c[n−1] − γ[n] sign(c[n−1]^T x[n] − y[n]) x[n].   (3.20)

Bibliographical data regarding these algorithms can be found in [1, 2, 4]. We note only that (3.18) is the Kaczmarz algorithm, (3.19) is the Nagumo-Noda algorithm, and (3.20) is the "perceptron" algorithm.
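The recursions (3.17)-(3.20) are simple to state in code. The sketch below is illustrative rather than taken from the original: the function names, the pure-Python vector arithmetic, and the constant step size are our assumptions.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def sgn(t):
    return (t > 0) - (t < 0)

def identify(samples, k, variant="gradient", gamma=0.1):
    """Recursive identification of c* from pairs (x[n], y[n]).

    variant: "gradient" -> (3.17),
             "kaczmarz" -> (3.18) with gamma[n] = 1,
             "sign_x"   -> (3.19), only the signs of x[n] are used,
             "sign_err" -> (3.20), only the sign of the residual is used.
    """
    c = [0.0] * k                       # c[0] is arbitrary
    for x, y in samples:
        err = dot(c, x) - y             # residual c[n-1]^T x[n] - y[n]
        if variant == "gradient":
            step = [gamma * err * xi for xi in x]
        elif variant == "kaczmarz":
            step = [err * xi / dot(x, x) for xi in x]
        elif variant == "sign_x":
            step = [gamma * err * sgn(xi) for xi in x]
        else:  # "sign_err"
            step = [gamma * sgn(err) * xi for xi in x]
        c = [ci - si for ci, si in zip(c, step)]
    return c
```

With noise-free measurements, the Kaczmarz variant recovers c* after a modest number of random samples.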

Let us make some assumptions regarding the random readings x[n] and noise ξ[n].

1. The noise ξ[n] is unbiased and is of bounded variance:

Mξ[n] = 0,   Mξ²[n] ≤ σ².   (3.21)

2. The vectors x[n] are not excessively large, and the covariation matrix is nonsingular:

M||x[n]||^i ≤ a_i   (i = 1, …, 4),   M{(c^T x)²} ≥ λ||c||².   (3.22)

3. x[n] and ξ[n] are independent.   (3.23)

In substantiating algorithm (3.19), we will require a stronger assumption:

4. All the components of x[n] are independent, and the mean value of each is equal to zero:

P(dx) = P₁(dx₁) ⋯ P_k(dx_k),   Mx_i = 0,   M|x_i| ≥ ε > 0.   (3.24)

Theorem 7. Assume that conditions (3.21)-(3.23) hold, and c[0] is arbitrary. Then:

1. If either conditions (2.6) and (2.8) hold, or Mξ²[n] = 0 and lim γ[n] < 2λ²/a₂a₄, then in algorithm (3.17) we have c[n] → c* (a.c.).

2. If Mξ²[n] = 0, then in algorithm (3.18) we have c[n] → c* (a.c.).

3. If conditions (3.24) hold and either (2.6) and (2.8) hold, or Mξ²[n] = 0 and lim γ[n] < 2ε/ka₂, then in algorithm (3.19) we have c[n] → c* (a.c.).

4. If conditions (2.6), (2.8), (3.11), and (3.12) hold, then in algorithm (3.20) we have c[n] → c* (a.c.).

The proof of the theorem is given in the appendix. It is based on a check of the pseudogradient nature of the algorithms with respect to either functional (3.16) or the functional J(c) = (1/2)||c − c*||². We should note that the conditions indicated in the theorem are essential. For example, Kaczmarz' algorithm (3.18) may not converge in the presence of noise (i.e., for Mξ²[n] > 0). Condition (3.24) cannot be discarded for algorithm (3.19). For example,


assume that k = 2 and both components x₁[n], x₂[n] have the same sign (in particular, x[n] may be uniformly distributed in the region |x₁| ≤ 1, |x₂| ≤ 1, x₁x₂ > 0, and then (3.22) holds but not (3.24)). Here the vector sign x[n] will always be collinear to the vector (1, 1). Therefore, except for the case in which c₁[0] − c₂[0] = c₁* − c₂*, algorithm (3.19) will not converge. Finally, assumptions (3.11), (3.12) regarding the noise ξ[n] are also essential for algorithm (3.20). For example, if ξ[n] = ±1 equiprobably, then as c[n] approaches c* the sign of the difference c[n−1]^T x[n] − y[n] = (c[n−1] − c*)^T x[n] − ξ[n] will be determined only by the sign of ξ[n], and with equal probability the step will be made either in the direction of x[n] or of −x[n], and thus convergence will not occur. Moreover, it is impossible to take γ[n] = γ > 0 in algorithm (3.20) even in the absence of noise (since in this case the step length will not tend to zero).

4. Search Algorithms

Now assume that the value of the functional J(c) (and not of the gradient ∇J(c), as before) is capable of being measured at an arbitrary point c.

Let us consider a general scheme which embraces most search algorithms. At the point c[n−1] we choose, in a stochastic or deterministic fashion, l vectors q^1[n], …, q^l[n]. The number l may assume any value. For example, in the random-search or coordinate-descent method we have l = 1, in the Kiefer-Wolfowitz method l = k (where k is the dimension of the space), and in methods of experiment planning l > k. Then at the l + 1 points c[n−1], c[n−1] + α[n]q^1[n], …, c[n−1] + α[n]q^l[n] we compute the values of J(c) with respective errors ξ^0[n], ξ^1[n], …, ξ^l[n]. We denote the measured values respectively by

Q(x^0[n], c[n−1]) = J(c[n−1]) + ξ^0[n],
Q(x^1[n], c[n−1] + α[n]q^1[n]) = J(c[n−1] + α[n]q^1[n]) + ξ^1[n],
…,
Q(x^l[n], c[n−1] + α[n]q^l[n]) = J(c[n−1] + α[n]q^l[n]) + ξ^l[n].

Then we make a step in accordance with the formula

c[n] = c[n−1] − (γ[n]/α[n]) Σ_{i=1}^{l} [Q(x^i[n], c[n−1] + α[n]q^i[n]) − Q(x^0[n], c[n−1])] q^i[n].   (4.1)

The numerical coefficients γ[n], α[n] satisfy the conditions

α[n] → 0,   lim γ[n]/α[n] = 0   (4.2)

(where the second means that the lengths of the "working steps" γ[n] must tend to zero more rapidly than the lengths of the "trial steps" α[n]). We will also make use of the stronger requirement

Σ γ²[n]/α²[n] < ∞.   (4.3)

As regards the vectors q^i[n], we will assume that the following condition holds: for each c ∈ H and all n

M{ Σ_{i=1}^{l} (∇J(c)^T q^i[n])² } ≥ λ̄ ||∇J(c)||²,   λ̄ > 0.   (4.4)

This condition has the same meaning as (3.14).

Assume, moreover, that the q^i[n] are not excessively large, and the noise ξ^i[n] is not correlated with q^i[n], unbiased, and of bounded variance:

M||q^i[n]||^j ≤ a_j   (j = 1, …, 6),   Mξ^i[n] = 0,
Mξ^i[n]q^i[n] = 0,   M(ξ^i[n])² ≤ σ².   (4.5)


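As an illustration of scheme (4.1), the sketch below performs one step of the general recursion and builds the symmetrical Kiefer-Wolfowitz variant of Table 2 on top of it. The step sequences γ[n] = 0.5/n and α[n] = n^(−1/4) are merely one admissible choice under (4.2), and the function names are ours.

```python
def search_step(c, J, gamma, alpha, directions, noise=lambda: 0.0):
    """One iteration of the general search scheme (4.1).

    directions: the trial vectors q^1[n], ..., q^l[n].  Every
    measurement of J is corrupted by an independent call to noise().
    All trial points are evaluated at the same base point c[n-1].
    """
    base = J(c) + noise()                      # Q(x^0[n], c[n-1])
    new_c = list(c)
    for q in directions:
        trial = [ci + alpha * qi for ci, qi in zip(c, q)]
        diff = (J(trial) + noise()) - base     # finite-difference estimate
        for i, qi in enumerate(q):
            new_c[i] -= (gamma / alpha) * diff * qi
    return new_c

def kiefer_wolfowitz(J, c0, steps=50):
    """Symmetrical Kiefer-Wolfowitz variant from Table 2: l = 2k trials
    along +e_i and -e_i, with gamma[n]/alpha[n] -> 0 as required by (4.2)."""
    k = len(c0)
    c = list(c0)
    for n in range(1, steps + 1):
        gamma = 0.5 / n                        # working step
        alpha = 1.0 / n ** 0.25                # trial step
        dirs = []
        for i in range(k):
            e = [1.0 if j == i else 0.0 for j in range(k)]
            dirs.append(e)
            dirs.append([-v for v in e])
        c = search_step(c, J, gamma, alpha, dirs)
    return c
```

Note that the paired trials along +e_i and -e_i cancel the curvature terms of the finite differences, so on a noiseless quadratic the scheme behaves like an exact gradient method.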

TABLE 2

Name of algorithm                                  | l                | Form of q^i[n]                                                        | Reference
Asymmetrical variant of Kiefer-Wolfowitz algorithm | l = k            | q^i[n] = e_i                                                          | [16, 15]
Symmetrical variant of Kiefer-Wolfowitz algorithm  | l = 2k           | q^i[n] = e_i (i = 1, …, k), q^i[n] = −e_{i−k} (i = k + 1, …, 2k)      | [10, 26, 21]
Random search with unidirectional trial            | l = 1            | q^1[n] uniformly distributed on the unit sphere                       | [11]
Random search with paired trials                   | l = 2            | q^1[n] uniformly distributed on the unit sphere, q^2[n] = −q^1[n]     | [9, 18]
Stochastic m-gradient algorithm                    | l = m, 1 ≤ m ≤ k | q^i[n] random orthonormalized vectors                                 | [9]
Random coordinate descent                          | l = 1            | q^1[n] = e_i with probability p_i[n] ≥ α > 0                          | [34]
Experiment planning methods                        | l > k            | q^i[n] deterministic vectors that do not lie in one subspace          | [28]

Theorem 8. Assume that conditions (2.2), (2.3), (4.2), (2.6), (4.4), and (4.5) hold, and c[0] is arbitrary. Assume that either (2.8) and (4.3) hold, or σ² = 0 and lim γ[n] ≤ γ̄, where γ̄ is sufficiently small. Then in algorithm (4.1) we will have

lim ||∇J(c[n])|| = 0   (a.c.).

The proof of the theorem (see the appendix) implies that method (4.1) is not in general a pseudogradient method; the pseudogradient condition can be violated in a few steps (at points at which α[n] is not small as compared to ||∇J(c[n−1])||). "Basically," however, the pseudogradient property holds, and this makes it possible to use Theorem 1.

Table 2 gives examples of algorithms to which Theorem 8 is applicable. In the table, k denotes the dimension of the space, and e₁, …, e_k are unit vectors. We should emphasize that in a number of algorithms the vectors q^i[n] are deterministic. Table 2 also indicates the paper in which the particular algorithm was proposed or substantiated. In most of these (except for [15, 21]) the proof of convergence is given for the case in which the minimum point c* exists and is unique. In some cases, condition (4.2) on the trial steps is replaced by the more rigid condition Σ α[n]γ[n] < ∞, and it is possible to prove a stronger assertion regarding convergence (e.g., convergence with respect to the functional). In [11] the idea was first expressed of substantiating random-search methods by means of general theorems on convergence. Table 2 indicates a large group of methods known as "methods of experiment planning" [28]. Despite the common nature of the problem, the methods in this group were developed independently of stochastic-approximation search algorithms, and, as far as we know, no systematic investigation was made of their convergence.

Like the gradient algorithm, algorithms of type (4.1) may be subjected to various transformations. For example, the vector on the right side of (4.1) may be multiplied by a positive definite matrix [27]. Instead of the difference Q(x^i[n], c[n−1] + α[n]q^i[n]) − Q(x^0[n], c[n−1]), we may allow only for its sign in (4.1). Such a method (for l = 1) was proposed in [8] and substantiated in [17]. To prove convergence of this "sign" algorithm, it is necessary to make special assumptions regarding the noise of the same type as (3.11) and (3.12). Moreover, we may change c[n−1] only when the measured value of the functional decreases ("random search with return when the step is unsuccessful" [9]), where the trial and working steps may be combined. The proof of convergence of a similar algorithm is given in [19]. Finally, from among the vectors q^1[n], …, q^l[n] we may choose only the one for which the quantity Q(x^i[n], c[n−1] + α[n]q^i[n]) − Q(x^0[n], c[n−1]) is maximum ("random search with optimum trial"). We will not discuss the substantiation of all these algorithms, since the technique of proof remains the same as above.

5. Algorithms for Minimizing Nonsmooth Functionals

In all examples that we have considered heretofore, the initial problem involves the minimization of a functional satisfying conditions (2.2), (2.3). If the functional to be minimized is nonsmooth (or not bounded from below), then to apply Theorem 1 we must construct a smooth functional J(c) artificially. There is no need for the values of J(c) or ∇J(c) to be accessible to measurement; it is sufficient that it be possible to check the pseudogradient condition.

Let us first consider the case in which we are required to minimize a convex continuous (but nondifferentiable) functional J̃(c) in the space H. At each point, this functional has a generalized gradient (support functional), which we will denote by ∇J̃(c). This is a vector from H which is determined (generally speaking, not uniquely) from the condition

J̃(c + ā) ≥ J̃(c) + ∇J̃(c)^T ā   (5.1)

for all c, ā ∈ H. At the points of differentiability, ∇J̃(c) is the ordinary gradient. Let us assume that at the point c[n−1] the generalized gradient ∇J̃(c[n−1]) is capable of being measured with error ξ[n], as a result of which we obtain the random vector ∇_c Q(x[n], c[n−1]) = ∇J̃(c[n−1]) + ξ[n]. We will assume that

Mξ[n] = 0,   M||ξ[n]||² ≤ σ²(1 + ||c||²).   (5.2)

The algorithm we are investigating is a direct analog of the gradient algorithm (method of generalized stochastic gradient):

c[n] = c[n−1] − γ[n] ∇_c Q(x[n], c[n−1]).   (5.3)

If there is no noise, then, unlike the smooth case, algorithm (5.3) will not converge for γ[n] = γ > 0. However, we can propose a somewhat different method for choosing the step length (which recalls Kaczmarz' algorithm (3.18)), which converges fairly rapidly in the absence of noise:

c[n] = c[n−1] − [(J̃(c[n−1]) − J̃*) / ||∇J̃(c[n−1])||²] ∇J̃(c[n−1]).   (5.4)

Here J̃* = inf_c J̃(c) is assumed to be known, and the quantities J̃(c[n−1]), ∇J̃(c[n−1]) are capable of being measured. We will assume that J̃(c) does not increase very fast:

||∇J̃(c)||² ≤ K(1 + ||c||²),   (5.5')

sup ||∇J̃(c)|| < ∞   for ||c|| ≤ R < ∞.   (5.5'')

Theorem 9. Assume that J̃(c) is a convex continuous functional, and that the value of c[0] is arbitrary. Then:

1. If conditions (2.6), (2.8), and (5.5') hold, then in algorithm (5.3) we have lim J̃(c[n]) = J̃* (a.c.) (in particular, also for J̃* = −∞). If, moreover, the set C* of minimum points is not empty, then lim J̃(c[n]) = J̃* (a.c.) and c[n] → c* ∈ C* (a.c.).

2. If (5.5'') holds and J̃* > −∞, then in algorithm (5.4) we have lim J̃(c[n]) = J̃*. If C* is not empty, then lim J̃(c[n]) = J̃* and c[n] → c* ∈ C*.

The proof of the theorem (see the appendix) is based on the introduction of the functional J(c) = (1/2)||c − c*||², where c* is any minimum point, and on a check of the pseudogradient condition for algorithms (5.3) and (5.4) with respect to this functional. The result of Theorem 9, with slight variations, has already been given (see [11-13] as regards algorithm (5.3), and [29] as regards (5.4)). Here we only show how easily it can be obtained from the general theorem (Theorem 1).
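A minimal sketch of the step rule (5.4), assuming the minimal value J̃* is known and the subgradient is measured without noise; the concrete functional and subgradient in the test are illustrative choices of ours.

```python
def generalized_gradient_method(f, subgrad, c0, f_star, steps=200):
    """Algorithm (5.4): at each step move along a subgradient g of the
    convex functional f with the Kaczmarz-like step length
    (f(c) - f_star) / ||g||**2, where f_star = inf f is assumed known."""
    c = list(c0)
    for _ in range(steps):
        g = subgrad(c)
        gg = sum(gi * gi for gi in g)
        if gg == 0.0:          # a zero subgradient: c is a minimum point
            break
        t = (f(c) - f_star) / gg
        c = [ci - t * gi for ci, gi in zip(c, g)]
    return c
```

Each step decreases ||c − c*||² by at least (f(c) − f*)²/||g||², which is the geometric fact exploited in the proof of Theorem 9.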


Let us now consider the situation in which the functional to be minimized J̃(c) is nonconvex and discontinuous. Here we could introduce a certain functional J(c) obtained by smoothing J̃(c) and apply some regular or search algorithm. But each computation of J(c) or ∇J(c) would require repeated calculation of J̃(c). We could proceed otherwise and apply some random search algorithm directly to J̃(c). Then each step would require only two computations of J̃(c), and the method would be pseudogradient with respect to the smoothed functional J(c) (since the random steps in the algorithm realize a process of smoothing J̃(c)).

Thus, let us consider the search algorithm

c[n] = c[n−1] − γ[n] q[n] (J̃(c[n−1] + q[n]) − J̃(c[n−1])).   (5.6)

Here q[n] is a random vector, and it is assumed that the values of J̃(c[n−1]) are measured without error. We should emphasize that, as compared to the versions of the random-search method that we considered earlier, here there is no parameter α[n] that tends to zero: the length of the trial steps in algorithm (5.6) does not decrease. On the contrary, the value of γ[n] must tend to zero, despite the absence of noise. The distribution of q[n] will be assumed to be stationary with a density p(q) which is radial, i.e., depends only on the length of q, and is sufficiently regular:

p(dq[n]) = p(q)dq,   p(q) = p(q') for ||q|| = ||q'||,
∫ ||q||² p(q) dq < ∞,   ∫ || q p(q) − (q + ā) p(q + ā) || dq ≤ K||ā||   (5.7)

(for example, q[n] may be uniformly distributed on a unit ball or have a normal distribution with zero mean value and variance σ²I).
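The scheme (5.6) can be sketched as follows, with q[n] drawn from a Gaussian density (radial, as allowed by (5.7)) and γ[n] = 1/n; the concrete test functional, the parameter values, and the seed are illustrative assumptions.

```python
import random

def smoothed_search(f, c0, steps=4000, sigma=0.5, seed=7):
    """Random-search scheme (5.6): the trial vector q[n] has a radial
    (Gaussian) density, the trial length does NOT shrink, and only the
    working step gamma[n] = 1/n tends to zero.  The method is
    pseudogradient with respect to the smoothed functional (5.8)."""
    rng = random.Random(seed)
    k = len(c0)
    c = list(c0)
    for n in range(1, steps + 1):
        gamma = 1.0 / n
        q = [rng.gauss(0.0, sigma) for _ in range(k)]
        diff = f([ci + qi for ci, qi in zip(c, q)]) - f(c)
        c = [ci - gamma * diff * qi for ci, qi in zip(c, q)]
    return c
```

For a smooth f the smoothed functional (5.8) has the same minimizer, so on a quadratic the iterates drift toward the true minimum.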

We construct a new density p̄(q) from the condition ∇p̄(q) = −q p(q). It is clear that since p(q) depends only on ||q||, this equation has a solution. It is not difficult to determine it explicitly. We introduce the smoothed functional

J(c) = ∫_H J̃(c + q) p̄(q) dq.   (5.8)

If

|J̃(c)| ≤ R   (5.9)

for all c, then, as is easy to show (see appendix), this functional satisfies conditions (2.2), (2.3).

Theorem 10. Under conditions (2.6), (2.8), (5.7), and (5.9), and for any c[0], in algorithm (5.6) we have lim ||∇J(c[n])|| = 0 (a.c.). If the only stationary point of J(c) is a minimum point, then algorithm (5.6) almost certainly converges to it.

Thus algorithm (5.6) minimizes not the initial functional J̃(c) but the "smoothed" functional J(c). This approach to the problem is natural, since it is meaningless to seek the minimum of a discontinuous function J̃(c): the value of J̃(c) on any ensemble of points conveys no information regarding its values at other points. We could similarly investigate the matter of the smoothed functionals to which other minimization search algorithms correspond.

In conclusion, let us consider certain iterative algorithms for recognition training. It is known (see, e.g., [1, 4, 20, 30-32]) that many pattern-recognition problems can be reduced to the problem of solving a system of homogeneous linear inequalities in Hilbert space:

c^T x ≤ 0   for all x ∈ A.   (5.10)

Here A is a finite or infinite set whose elements appear randomly in succession in accordance with a certain specified probability distribution P(dx) on A. The problem has much in common with the identification problem considered in § 3 (which reduces to the solution of a system of linear equations that appear at random), and similar algorithms can be used for its solution. We denote by x[n] a point from A which appears at the n-th step.



Let us consider the following iterative algorithms:

c[n] = c[n−1] − γ[n] [(1 + sign c[n−1]^T x[n])/2] (c[n−1]^T x[n]) x[n],   (5.11)

c[n] = c[n−1] − [(1 + sign c[n−1]^T x[n])/2] (c[n−1]^T x[n] / ||x[n]||²) x[n],   (5.12)

c[n] = c[n−1] − [(1 + sign c[n−1]^T x[n])/2] ((c[n−1]^T x[n] + δ) / ||x[n]||²) x[n],   δ > 0,   (5.13)

c[n] = c[n−1] − γ[n] [(1 + sign c[n−1]^T x[n])/2] x[n].   (5.14)

The connections between algorithms (5.11) and (3.17), (5.12) and (3.18), and (5.14) and (3.20) are clear. Bibliographical data regarding algorithms (5.11)-(5.14) can be found in the works cited above. It is also shown there that the attempt to consider these algorithms as gradient algorithms generally leads to nonsmooth functionals.
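Two of these recursions can be sketched in a single routine; the finite training set and the cyclic presentation of its points below are our simplifications of the random appearance of x[n] assumed in the text, and the boundary case c^T x = 0 is treated as already satisfied.

```python
def train_inequalities(xs, c0, variant="perceptron", gamma=1.0, passes=10):
    """Iterative solution of the system c^T x <= 0, x in A, cf. (5.10).

    variant "perceptron" -> (5.14): subtract gamma * x when c^T x > 0;
    variant "relaxation" -> (5.12): subtract (c^T x / ||x||**2) x,
    the Kaczmarz-like analog.
    """
    c = list(c0)
    for _ in range(passes):
        for x in xs:
            v = sum(ci * xi for ci, xi in zip(c, x))
            if v <= 0.0:               # inequality already satisfied
                continue
            if variant == "perceptron":
                step = gamma           # fixed-length correction, (5.14)
            else:
                step = v / sum(xi * xi for xi in x)   # projection, (5.12)
            c = [ci - step * xi for ci, xi in zip(c, x)]
    return c
```

Under the representability hypothesis the perceptron variant stops correcting after finitely many updates, in line with part 4 of Theorem 11.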

We will assume that inequalities (5.10) are solvable:

C* = {c ∈ H : c^T x ≤ 0 for all x ∈ A} ≠ ∅   (5.15)

(in terms of the initial problem, this means that error-free recognition is possible).

Sometimes we will require a stronger assumption regarding the existence of a point c* ∈ C* such that

(c*)^T x ≤ −ε < 0   for all x ∈ A   (5.16)

("representability hypothesis").

We will assume that the statistics of the vectors x[n] are sufficiently representative; in particular, if c does not satisfy (5.10), then there exists a nonzero probability that this fact will be revealed by means of the vectors in question:

P{c^T x > 0} > 0   for c ∉ C*.   (5.17)

Finally, we assume that the set A is bounded:

||x|| ≤ d   for x ∈ A.   (5.18)

Theorem 11. Assume that conditions (5.15), (5.17), and (5.18) hold, and that c[0] is arbitrary. Then:

1. If Σ γ[n] = ∞, γ[n] > 0, lim γ[n] < 2/d² (e.g., γ[n] = γ, 0 < γ < 2/d²), then in algorithm (5.11) we have c[n] → c* ∈ C* (a.c.).

2. In algorithm (5.12) we have c[n] → c* ∈ C* (a.c.).

3. If (5.16) holds, then algorithm (5.13) is almost certainly finite (i.e., c[n] ≡ c* ∈ C* for some n).

4. If (2.6) and (2.8) hold, then in algorithm (5.14) we have c[n] → c* ∈ C* (a.c.). If (5.16) holds, then algorithm (5.14) is almost certainly finite for any γ[n] = γ > 0.

The proof of the theorem, which is given in the appendix, is based on a check of conditions (2.4) and (2.5) as applied to the functional J(c) = (1/2)||c − c*||². What is new as compared to the preceding theorems in this section is the assertion that certain algorithms are finite; this assertion is also implied by Theorem 1. The first result regarding the finiteness of the perceptron algorithm (5.14) with γ[n] = const is that of Novikoff [30]. It was subsequently strengthened in [20]. Proofs of the convergence of algorithms of the same types as (5.11)-(5.13) (and certain more general ones) were obtained in particular in [31, 32]. Theorem 11 offers an approach to the uniform derivation of such assertions (some of which, e.g., those regarding the convergence of algorithms in the absence of the representability hypothesis (5.16), are perhaps new). We should also note that algorithms (5.11)-(5.14) can readily be generalized to the case in which (5.10) is replaced by a system of more general inequalities (e.g., convex nonlinear and inhomogeneous ones).


APPENDIX

Proof of Theorem 1. From (2.3) we obtain for any c, ā ∈ H

|J(c + ā) − J(c) − ∇J(c)^T ā| ≤ (L/2)||ā||².   (A.1)

If in (A.1) we substitute c[n−1] for c and c[n] from (2.1) for c + ā, we obtain

J(c[n]) ≤ J(c[n−1]) − γ[n] ∇J(c[n−1])^T s[n] + (L/2)γ²[n]||s[n]||².   (A.2)

We denote by F the minimum σ-algebra generated by the random quantities s[1], …, s[n−1], and by μ[n] the quantity J(c[n]) − J*. We take the conditional mathematical expectation of both sides of inequality (A.2) and use (2.5):

M{μ[n]|F} ≤ μ[n−1] − γ[n] ∇J(c[n−1])^T M{s[n]|F} + (L/2)γ²[n] M{||s[n]||² | F}
≤ μ[n−1](1 + (K₁L/2)γ²[n]) − γ[n] ∇J(c[n−1])^T M{s[n]|F}(1 − (K₂L/2)γ[n]) + (L/2)γ²[n]λ[n] + (K₁L/2)γ²[n]J*.   (A.3)

In view of (2.8) or (2.9) we have 1 − (K₂L/2)γ[n] > 0 for sufficiently large n, and therefore the term subtracted on the right side of (A.3) is nonnegative in accordance with (2.4); i.e., (A.3) assumes the form

M{μ[n]|F} ≤ μ[n−1](1 + (K₁L/2)γ²[n]) + (L/2)γ²[n]λ[n] + (K₁L/2)γ²[n]J*.

In the case of (2.9) we have K₁ = 0, λ[n] = 0, so that M{μ[n]|F} ≤ μ[n−1], i.e., μ[n] is a semimartingale. For the case of (2.8), we can use the semimartingale theorem in exactly the same way as done in [33]. In all cases we obtain that lim μ[n] almost certainly exists, and the unconditional expectations Mμ[n] are uniformly bounded:

Mμ[n] ≤ C.   (A.4)

We should note that this result is valid even if (2.8) is replaced by the weaker condition γ[n] → 0.

In inequality (A.3) we now change to the unconditional expectations:

Mμ[n] ≤ (1 + (K₁L/2)γ²[n]) Mμ[n−1] − γ[n](1 − (K₂L/2)γ[n]) M{∇J(c[n−1])^T M{s[n]|F}} + (L/2)γ²[n]λ[n] + (K₁L/2)γ²[n]J*.   (A.5)

We sum (A.5) over n from 1 to ∞. In view of (A.4) and (2.7) we have, both for condition (2.8) and for (2.9),

(K₁L/2) Σ γ²[n] Mμ[n−1] < ∞,   (L/2) Σ γ²[n] λ[n] < ∞,   (K₁L/2) J* Σ γ²[n] < ∞,

and therefore

Σ γ[n] (1 − (K₂L/2)γ[n]) M{∇J(c[n−1])^T M{s[n]|F}} < ∞.

For sufficiently large n we will have 1 − (K₂L/2)γ[n] ≥ δ > 0, for both (2.8) and (2.9), and therefore


Σ γ[n] M{∇J(c[n−1])^T M{s[n]|F}} < ∞.   (A.6)


But Σ γ[n] = ∞ while ∇J(c[n−1])^T M{s[n]|F} ≥ 0, and therefore it follows from (A.6) that there exists a subsequence n_i for which

lim M{∇J(c[n_i − 1])^T M{s[n_i]|F}} = 0.   (A.7)

Equation (A.7) implies that there exists a subsequence n_i' such that ∇J(c[n_i' − 1])^T M{s[n_i']|F} → 0 (a.c.). Allowing for (2.4), this yields lim ∇J(c[n−1])^T M{s[n]|F} = 0 (a.c.), and the theorem is thus proved.

We should note that if in (2.9) the condition lim γ[n] < 2/LK₂ is replaced by the more rigid condition 0 < ε₁ ≤ γ[n] ≤ 2(1 − ε₂)/LK₂, ε₂ > 0, and we also require that the inequality

∇J(c[n−1])^T Ms[n] ≥ K||∇J(c[n−1])||²,   K > 0,

hold (a particular case of (2.12)), then the above calculations would also imply the convergence J(c[n]) → J* at the rate of a geometrical progression, and c[n] would converge to the minimum point c* at the same rate. We will not discuss this matter in detail, since the topic of rate of convergence is not considered in this paper.

Proof of Theorem 5. For algorithm (3.10) we have s[n]_i = sign ∇_c Q(x[n], c[n−1])_i, so that

Ms[n]_i = P{∇_c Q(x, c[n−1])_i > 0} − P{∇_c Q(x, c[n−1])_i < 0}.

If ∇J(c[n−1])_i > 0, we obtain from this, using (3.11) and (3.12),

Ms[n]_i = P{ξ[n]_i > −∇J(c[n−1])_i} − P{ξ[n]_i < −∇J(c[n−1])_i}
≥ P{−∇J(c[n−1])_i ≤ ξ[n]_i < 0} ≥ δ(∇J(c[n−1])_i).

Similarly, for ∇J(c[n−1])_i < 0 we have Ms[n]_i ≤ −δ(−∇J(c[n−1])_i). Therefore we always have ∇J(c[n−1])_i Ms[n]_i ≥ |∇J(c[n−1])_i| δ(|∇J(c[n−1])_i|), so that

∇J(c[n−1])^T Ms[n] ≥ (||∇J(c[n−1])||/√k) δ(||∇J(c[n−1])||/√k).

Thus expressions (2.4) and (2.12) hold. Since ||s[n]|| ≤ √k, expression (2.5) also holds, and therefore Corollary 2 is applicable.

Proof of Theorem 7. It can be checked directly that for functional (3.16)

J(c) ≥ 0,   ∇J(c) = M_x{((c − c*)^T x) x},
||∇J(c + ā) − ∇J(c)|| = ||M_x{(ā^T x) x}|| ≤ ||ā|| M||x||² ≤ a₂||ā||,

so that conditions (2.2) and (2.3) hold with L = a₂. Furthermore, from (3.22) we have

||∇J(c)|| ||c − c*|| ≥ ∇J(c)^T (c − c*) = M_x{[(c − c*)^T x]²} ≥ λ||c − c*||²,
||∇J(c)|| ≥ λ||c − c*||,
J(c) − J(c*) = M_{x,ξ}{(2y − c^T x − c*^T x)(c*^T x − c^T x)} = M_x{[(c − c*)^T x]²} ≥ λ||c − c*||²,

i.e., J(c) satisfies conditions (2.15) and (2.16). For algorithm (3.17) we check conditions (2.4) and (2.5):

Ms[n] = ∇J(c[n−1]),   ∇J(c[n−1])^T Ms[n] = ||∇J(c[n−1])||²,
M||s[n]||² = M_{x,ξ}{(c[n−1]^T x − y)² ||x||²} = M_x{[(c[n−1] − c*)^T x]² ||x||²} + Mξ²[n] M||x||²
≤ ||c[n−1] − c*||² M||x||⁴ + σ²a₂ ≤ σ²a₂ + (a₄/λ²)||∇J(c[n−1])||².

Application of Corollary 3 yields the first assertion of the theorem.

In algorithm (3.18) we will assume that γ[n] = 1, while as J(c) we take (1/2)||c − c*||². Then

∇J(c[n−1])^T Ms[n] = M_x{[(c[n−1] − c*)^T x]² / ||x||²},
M||s[n]||² = M_x{[(c[n−1] − c*)^T x]² / ||x||²},

i.e., expressions (2.4) and (2.5) hold with λ[n] = 0, K₁ = 0, K₂ = 1. Since γ[n] = 1 < 2/LK₂ = 2, expression (2.9) holds, and thus Theorem 1 and Corollary 3 are applicable; this yields the second assertion of the theorem.

For algorithm (3.19) with J(c) = (1/2)||c − c*||² we obtain, allowing for (3.24),

Ms[n]_i = M_x{ [Σ_{j=1}^{k} (c[n−1]_j − c*_j) x_j] sign x_i } = (c[n−1]_i − c*_i) M|x_i|,

∇J(c[n−1])^T Ms[n] = Σ_{i=1}^{k} (c[n−1]_i − c*_i)² M|x_i| ≥ ε||c[n−1] − c*||²,

M||s[n]||² ≤ k ( M_x{[(c[n−1] − c*)^T x]²} + Mξ²[n] ) ≤ ka₂||c[n−1] − c*||² + kσ².

In other words, expressions (2.4) and (2.5) hold with λ[n] = kσ², K₁ = 0, K₂ = ka₂/ε, and this yields the third assertion of the theorem.

Finally, for (3.20) with J(c) = (1/2)||c − c*||², we obtain, using (3.11) and (3.12),

Ms[n] = M_{x,ξ}{x sign(c[n−1]^T x − y)} = M_x{ x [P(ξ[n] < (c[n−1] − c*)^T x) − P(ξ[n] > (c[n−1] − c*)^T x)] },
∇J(c[n−1])^T Ms[n] ≥ M_x{ |(c[n−1] − c*)^T x| δ(|(c[n−1] − c*)^T x|) },
M||s[n]||² ≤ M||x||² ≤ a₂.

Thus Theorem 1 is applicable, and we obtain that lim M_x{|(c[n−1] − c*)^T x| δ(|(c[n−1] − c*)^T x|)} = 0 (a.c.), while the set {c[n]} is almost certainly bounded. Therefore there almost certainly exist a point c̄ and a subsequence n_i such that c[n_i] → c̄ and

lim M_x{ |(c[n_i − 1] − c*)^T x| δ(|(c[n_i − 1] − c*)^T x|) } = 0.

Consider the function φ(c) = M_x{|(c − c*)^T x| δ(|(c − c*)^T x|)}. Since δ(ε) in (3.12) can always be chosen to be continuous, φ(c) is also a continuous function of c. Therefore φ(c̄) = 0. But since δ(ε) > 0 for ε > 0, the conditions φ(c̄) = 0 and (3.22) imply that c̄ = c*. Since J(c[n]) = (1/2)||c[n] − c*||² converges, and for the sequence n_i we have J(c[n_i]) → 0, it follows that c[n] → c* (a.c.) for the entire sequence c[n], and this completes the proof.

Proof of Theorem 8. Using (4.4), (4.5), and (A.1), we obtain

∇J(c[n−1])^T Ms[n] = (1/α[n]) M{ Σ_{i=1}^{l} ∇J(c[n−1])^T q^i[n] [J(c[n−1] + α[n]q^i[n]) − J(c[n−1])] }
= M{ Σ_{i=1}^{l} [ (∇J(c[n−1])^T q^i[n])² + (1/α[n])(J(c[n−1] + α[n]q^i[n]) − J(c[n−1]) − α[n]∇J(c[n−1])^T q^i[n]) ∇J(c[n−1])^T q^i[n] ] }
≥ λ̄||∇J(c[n−1])||² − (Lα[n]/2) M{ Σ_{i=1}^{l} |∇J(c[n−1])^T q^i[n]| ||q^i[n]||² }
≥ λ̄||∇J(c[n−1])|| ( ||∇J(c[n−1])|| − La₃lα[n]/(2λ̄) ).

As can be seen from the resultant bound, algorithm (4.1) is not in general a pseudogradient algorithm. But if α[n] is small as compared to ||∇J(c[n−1])||, the pseudogradient condition holds; this fact will subsequently be utilized.

Let us now estimate M||s[n]||²:

M||s[n]||² = (1/α²[n]) M{ || Σ_{i=1}^{l} [J(c[n−1] + α[n]q^i[n]) − J(c[n−1]) + ξ^i[n] − ξ^0[n]] q^i[n] ||² }
≤ (2l/α²[n]) M{ Σ_{i=1}^{l} [α[n]|∇J(c[n−1])^T q^i[n]| + (L/2)α²[n]||q^i[n]||²]² ||q^i[n]||² } + (2l/α²[n]) M{ Σ_{i=1}^{l} (ξ^i[n] − ξ^0[n])² ||q^i[n]||² }
≤ 4a₄l²||∇J(c[n−1])||² + L²a₆l²α²[n] + 8σ²a₂l²/α²[n].

We consider those realizations of the stochastic process (4.1) for which ||∇J(c[n])|| ≥ ε > 0 for all n. Then the above estimates imply, for α[n] ≤ λ̄ε/(La₃l) (which holds for sufficiently large n in view of (4.2)),

∇J(c[n−1])^T Ms[n] ≥ λ̄||∇J(c[n−1])|| ( ||∇J(c[n−1])|| − ε/2 ) ≥ (λ̄/2)||∇J(c[n−1])||².

Therefore for α[n] ≤ λ̄ε/(La₃l) we have

M||s[n]||² ≤ 8σ²a₂l²/α²[n] + L²a₆l²α²[n] + (8a₄l²/λ̄) ∇J(c[n−1])^T Ms[n].

Thus expressions (2.4) and (2.5) hold for such realizations with λ[n] = 8σ²a₂l²/α²[n] + L²a₆l²α²[n], K₁ = 0, K₂ = 8a₄l²/λ̄. If (4.3) holds, condition (2.7) is also valid. If, however, σ = 0 and lim γ[n] is sufficiently small, expression (2.9) holds. Consequently, in both cases, for ||∇J(c[n])|| ≥ ε, Theorem 1 is applicable, and therefore lim ∇J(c[n−1])^T Ms[n] = 0; and since ||∇J(c[n−1])||² ≤ (2/λ̄) ∇J(c[n−1])^T Ms[n], the probability of the realizations for which ||∇J(c[n])|| ≥ ε > 0 for all n is equal to zero. This is equivalent to the assertion of the theorem.

Proof of Theorem 9. We take an arbitrary J̄ > J̃* and a point c̄ for which J̃(c̄) < J̄, and introduce the functional J(c) = (1/2)||c − c̄||². Of course, J(c) satisfies (2.2) and (2.3) with L = 1, and ∇J(c) = c − c̄. Furthermore, for algorithm (5.3) we have, in view of (5.1), (5.2), and (5.5'),

∇J(c[n−1])^T Ms[n] = (c[n−1] − c̄)^T ∇J̃(c[n−1]) ≥ J̃(c[n−1]) − J̃(c̄),
M||s[n]||² = ||∇J̃(c[n−1])||² + M||ξ[n]||² ≤ (K + σ²)(1 + ||c[n−1]||²) ≤ 4(K + σ²)(1 + ||c̄||² + J(c[n−1])).

Let us consider those realizations of algorithm (5.3) for which J̃(c[n]) ≥ J̄ for all n. Then the above inequalities imply for them that conditions (2.4) and (2.5) hold and Theorem 1 is applicable. Therefore lim ∇J(c[n−1])^T Ms[n] = 0 (a.c.), but ∇J(c[n−1])^T Ms[n] ≥ J̄ − J̃(c̄) > 0. This implies that the probability of such realizations (with J̃(c[n]) ≥ J̄) is zero. In view of the fact that J̄ > J̃* is arbitrary, this means that lim J̃(c[n]) = J̃* (a.c.).

Now assume C* is nonempty. We introduce J(c) = (1/2)||c − c*||², c* ∈ C*. As above,

∇J(c[n−1])^T Ms[n] ≥ J̃(c[n−1]) − J̃* ≥ 0,
M||s[n]||² ≤ 4(K + σ²)(1 + ||c*||² + J(c[n−1])).

Using Theorem 1, we obtain that lim J̃(c[n]) = J̃* (a.c.), while lim J(c[n]) almost certainly exists, i.e., the sequence {c[n]} is almost certainly bounded. Therefore we can almost certainly choose a subsequence n_i and a point c̄ ∈ H such that c[n_i] → c̄ weakly and J̃(c[n_i]) → J̃* (in view of the fact that bounded sets in H are weakly compact). But the functional J̃(c) is convex and continuous, and therefore it is weakly lower semicontinuous. Consequently, J̃(c̄) = J̃*, i.e., c̄ ∈ C*. We replace c* by c̄ in the definition of J(c). Then J(c[n_i]) → 0, but as before lim J(c[n]) almost certainly exists. Thus J(c[n]) → 0 (a.c.), i.e., c[n] → c̄ (a.c.).

Let us now prove convergence of algorithm (5.4), limiting ourselves to the case of nonempty C*. Taking γ[n] = 1, J(c) = (1/2)||c − c*||², c* ∈ C*, we obtain

∇J(c[n−1])^T Ms[n] ≥ (J̃(c[n−1]) − J̃*)² / ||∇J̃(c[n−1])||²,
M||s[n]||² = (J̃(c[n−1]) − J̃*)² / ||∇J̃(c[n−1])||²,

i.e., conditions (2.2)-(2.5) hold with L = 1, λ[n] = 0, K₁ = 0, K₂ = 1. Therefore expression (2.9) also holds, and Theorem 1 implies that

lim (J̃(c[n−1]) − J̃*)² / ||∇J̃(c[n−1])||² = 0.

Moreover, J(c[n]) tends to a limit, i.e., all the {c[n]} are bounded. Therefore expression (5.5'') implies that the ||∇J̃(c[n−1])|| are bounded, and hence the assertion we have proved yields lim J̃(c[n]) = J̃*. The concluding part of the proof is the same as for algorithm (5.3).

Proof of Theorem 10. By writing J(c) in the form J(c) = ∫ J̃(z) p̄(z − c) dz, we can use the rule of differentiation with respect to a parameter. As a result we obtain

∇J(c) = −∫ J̃(z) ∇p̄(z − c) dz = ∫ J̃(c + q) q p(q) dq.

Allowing for this expression, as well as for (5.7) and (5.9), we find that J(c) satisfies condition (2.3). The validity of (2.2) follows immediately from (5.9). Furthermore, because of (5.7) we have ∫ q p(q) dq = 0, so that

Ms[n] = ∫ (J̃(c[n−1] + q) − J̃(c[n−1])) q p(q) dq = ∇J(c[n−1]),
M||s[n]||² = ∫ (J̃(c[n−1] + q) − J̃(c[n−1]))² ||q||² p(q) dq ≤ 4R² ∫ ||q||² p(q) dq < ∞,

i.e., conditions (2.4) and (2.5) hold, and therefore Theorem 1 is applicable.

Proof of Theorem 11. We will write (c^T x)₊ = max{0, c^T x}. Assuming that c* is an arbitrary point from C*, we introduce J(c) = (1/2)||c − c*||²; then conditions (2.2) and (2.3) hold with L = 1.

For algorithm (5.11), since c*^T x ≤ 0 for all x, we obtain

∇J(c[n−1])^T Ms[n] = M{ (c[n−1]^T x)₊ (c[n−1] − c*)^T x } ≥ M{ [(c[n−1]^T x)₊]² },
M||s[n]||² = M{ [(c[n−1]^T x)₊]² ||x||² } ≤ d² M{ [(c[n−1]^T x)₊]² },

i.e., (2.4) and (2.5) hold with λ[n] = 0, K₁ = 0, K₂ = d², and (2.6) and (2.9) hold under the assumptions we have made regarding γ[n]. From Theorem 1 we obtain that lim M{[(c[n−1]^T x)₊]²} = 0 (a.c.), and the set {c[n]} is almost certainly bounded. Therefore there exists a point c̄ ∈ H to which the subsequence c[n_i] converges weakly. The functional φ(c) = M{[(c^T x)₊]²}, as can be checked directly, is continuous and convex in c, and therefore it is weakly lower semicontinuous. From this we obtain that φ(c̄) = 0. But if c̄ ∉ C*, expression (5.17) implies that M{[(c̄^T x)₊]²} > 0. Therefore c̄ ∈ C*. In view of the fact that c* is arbitrary in the definition of J(c), we take c* = c̄. Then the almost certain convergence of J(c[n]) and the convergence of the subsequence c[n_i] → c̄ imply that the entire sequence c[n] converges to c̄.

In algorithm (5.12) we have

∇J(c[n−1])^T Ms[n] = M{ (c[n−1]^T x)₊ (c[n−1] − c*)^T x / ||x||² } ≥ M{ [(c[n−1]^T x)₊]² / ||x||² },
M||s[n]||² = M{ [(c[n−1]^T x)₊]² / ||x||² }.

The remaining reasoning is as above.

Algorithm (5.14) can be similarly investigated:

∇J(c[n−1])^T Ms[n] ≥ M{(c[n−1]^T x)₊},
M||s[n]||² ≤ M||x||² ≤ d².

Let us now prove that algorithms (5.13) and (5.14) are finite when "representability hypothesis" (5.16) holds. Let us note first of all that ε in (5.16) can be regarded as arbitrarily large because of the fact that inequalities (5.10) are homogeneous (e.g., by replacing c* by λc* for large positive λ). Furthermore, in processes (5.13) and (5.14) we can discard all points at which c[n−1]^T x[n] < 0 (since c does not change at them). For the resulting "compressed" processes, we should understand Ms[n], M||s[n]||² to be the conditional expectations of the corresponding quantities subject to the condition c[n−1]^T x ≥ 0. For "compressed" process (5.14) we can write

∇J(c[n−1])^T Ms[n] = M{ (c[n−1] − c*)^T x | c[n−1]^T x ≥ 0 }
≥ M{ −c*^T x | c[n−1]^T x ≥ 0 } ≥ ε,
M||s[n]||² ≤ d².

Therefore we may assume that conditions (2.4) and (2.5) hold with λ[n] = 0, K₁ = 0, K₂ = d²/ε. Since ε may be taken to be arbitrarily large, for any γ > 0 the condition γ < 2/LK₂ = 2ε/d² holds, i.e., (2.9) holds. Using Theorem 1, we obtain that lim ∇J(c[n−1])^T Ms[n] = 0 (a.c.) for the "compressed" process. But this contradicts the above bound ∇J(c[n−1])^T Ms[n] ≥ ε > 0. Consequently, with probability 1 it is impossible to construct a "compressed" process consisting of an infinite number of iterations. As a result, the initial process terminates almost certainly after a finite number of steps. The resultant point is the solution, in view of condition (5.16).

Thе гeasoniпg for the сase of algoгithm (5.13) is similaг. We will give only the сorгesponding bounds.

Taking 7[п] = 1, we obиin

∇J(c[n−1])ᵀMs[n] = M{((c[n−1]ᵀx + δ)/‖x‖²)(c[n−1] − c*)ᵀx | c[n−1]ᵀx ≥ 0},
M‖s[n]‖² = M{(c[n−1]ᵀx + δ)²/‖x‖² | c[n−1]ᵀx ≥ 0}.

Since we may assume that δ ≤ ε, the resultant bounds imply that M‖s[n]‖² ≤ ∇J(c[n−1])ᵀMs[n]. Thus condition (2.5) holds with λ[n] = 0, K₁ = 0, K₂ = 1, and therefore (2.9) holds as well, since γ[n] = 1 < 2/K₂ = 2.
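A relaxation-type step of the kind suggested by these bounds can be sketched the same way. The update rule below, c ← c − ((cᵀx + δ)/‖x‖²)x at a violating point x, is an assumption consistent with the (5.13) bounds rather than the paper's verbatim definition, and the data and δ are illustrative; the step lands c exactly on the hyperplane cᵀx = −δ for the current point.

```python
import numpy as np

rng = np.random.default_rng(2)

# Same kind of hypothetical data: c* = (-1, 0), eps = 1.
X = np.column_stack([rng.uniform(1.0, 2.0, 50), rng.uniform(-2.0, 2.0, 50)])

delta = 0.1                              # the argument needs delta <= eps
c = np.zeros(2)
corrections = 0
while True:
    bad = [x for x in X if c @ x >= 0]   # points still violating c.x < 0
    if not bad:
        break
    x = bad[0]
    # Relaxation step: afterwards c.x = -delta exactly for this point.
    c = c - ((c @ x + delta) / (x @ x)) * x
    corrections += 1
    assert corrections < 10_000          # guards the finiteness claim

print("corrections:", corrections)
assert all(c @ x < 0 for x in X)
```

Loosely, the requirement δ ≤ ε is what keeps each step from overshooting the separating cone; it is the numerical counterpart of the bound M‖s[n]‖² ≤ ∇J(c[n−1])ᵀMs[n] above.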


