A Parallel Architecture for the Self-Sorting FFT Algorithm

F. Arguello

J.D. Bruguera

E.L. Zapata

December 1995

Technical Report No: UMA-DAC-95/26

Published in:

Journal of Parallel and Distributed Computing

vol. 31, no. 1, pp. 88-97, November 1995

University of Malaga

Department of Computer Architecture

C. Tecnologico • PO Box 4114 • E-29080 Malaga • Spain

Francisco Argüello, Javier D. Bruguera and Emilio L. Zapata

Dept. Electrónica y Computación, Facultad de Física, Universidad de Santiago, [...] Santiago, Spain

Dept. Arquitectura de Computadores, Facultad de Informática, Universidad de Málaga, [...] Málaga, Spain

Mailing Address: Emilio L. Zapata
Dept. Arquitectura de Computadores
Universidad de Malaga
Plaza El Ejido, [...] Malaga
Spain
PHONE: [...]
FAX: [...]
e-mail: ezapata@ctima.uma.es

This work was supported in part by the Ministry of Education and Science (CICYT) of Spain under contract TIC[...]C[...] and Xunta de Galicia XUGA[...]B[...].

[...] Transform (FFT) because, unlike the generally used algorithms, it does not require shuffling the sequence to be transformed (digit reversal). In this work we propose a parallel architecture that implements the SS radix $r$ ($r \ge 2$) algorithm. The data flow of the algorithm is decomposed, in a natural way, into two sections that are implemented by means of FIFO queues located in the processors and an interprocessor connection network (perfect unshuffle). The resulting design is regular and modular and, whenever possible, presents constant geometry. The total processing time required is $nN/(rQ)$ cycles for a transform of size $N = r^n$ computed using $Q = r^q$ processors. Consequently, there are no cycle losses.

Keywords: Self-Sorting Algorithm, Parallel Architecture, Pipeline Design, FFT.

1 INTRODUCTION

The self-sorting (SS) algorithm [?] is a version of the successive doubling (SD) method. This is a general and efficient procedure for the design of fast algorithms. It is based on the divide and conquer strategy and establishes the solution of a problem by working with a group of subproblems of the same type and smaller size. This way, the redundancies present in many algorithms are eliminated, resulting in faster algorithms. Another application of the SD method is the extraction of the parallelism present in these algorithms.

Considering the types of data flows generated by the SD method, we can distinguish between in-place, split-radix, constant-geometry and SS algorithms [?]. In-place algorithms [?] minimize the memory requirements, as the partial results are written over the data used for their calculation. Split-radix [?] algorithms are characterized by the fact that they require a smaller number of arithmetic operations than other algorithms. However, split-radix algorithms present irregularities in the data flow that hinder the balancing of the computational load in parallel architectures [?]. Constant-geometry algorithms [?] present identical data flows in every stage of the transform. Finally, SS algorithms [?], unlike the other versions of the SD method, are characterized by the fact that they provide an output sequence that is digit reversed with respect to the input sequence.

The parallel implementation of an SS algorithm is advantageous in terms of speed with respect to an unordered algorithm. In the implementation of an FFT algorithm [?] with an unordered output (in-place or constant-geometry, for instance), the ordering of the data sequence can be carried out in two basic ways. First, the ordering can be carried out by means of an interconnection network, such as the one in ref. [?]. However, this solution cannot be used in partitioned algorithms, as a global data shuffle, and not a shuffle within each partition, must be carried out. Secondly, if the FFT device is used as a coprocessor by a host processor, the sequence can be written in an unordered format and the host can perform the ordering operation. Another alternative is to generate an adequate sequencing of the addresses in order to write the data in memory in an ordered format; this solution is used in [?]. In [?] they propose an address generator using application-specific hardware.

Given the importance of the FFT, many efforts have been devoted to the design of systolic architectures that efficiently exploit the parallelism associated with the SD method. Until very recently, no designs contemplating the partitioning of the data have appeared in the literature.

The SS FFT algorithm is introduced in section 2. The notation used will be introduced in section 3. In this same section, we present the design of the processor implementing the SS algorithm. In section 4 we extend the architecture to a processor column that is interconnected by means of the perfect unshuffle permutation. Finally, in section 5 we evaluate the proposed designs.

2 SELF-SORTING FFT ALGORITHM

The process leading to the Fast Fourier Transform (FFT) is based on the factoring property of the DFT. If we have a sequence $x(m)$, $0 \le m < N$, the DFT $X(k)$, $0 \le k < N$, of this sequence is defined by means of the following expression:

$$X(k) = \sum_{m=0}^{N-1} W_N^{mk}\, x(m) \qquad (1)$$

where $W_N = \exp(-j 2\pi/N)$. If $N$ is decomposed into a set of $n$ factors, $N = N_n N_{n-1} \cdots N_1$, then the DFT of length $N$ can be computed in $n$ stages through multiple DFTs of length $N_i$ (numbering the stages $i = 1, 2, \ldots, n$). Usually, in every stage $N_i = r$ and, consequently, the input and output indices of the transform, $m$ and $k$, can be written in base $r$; for example,

$$m = m_n r^{n-1} + m_{n-1} r^{n-2} + \cdots + m_2 r + m_1 = (m_n, m_{n-1}, \ldots, m_2, m_1) \qquad (2)$$

Substituting these factorings of $m$ and $k$ in equation (1), we obtain the standard radix $r$ FFT algorithm that was initially derived by Cooley and Tukey [?].
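As a small illustration of the two definitions above, the following Python sketch evaluates the DFT sum directly and expands an index into its base-$r$ digits. The function names `dft`, `digits` and `from_digits` are hypothetical helpers introduced here for illustration; they are not part of the paper, and the direct sum is the $O(N^2)$ reference computation, not a fast algorithm.

```python
import cmath

def dft(x):
    """Direct evaluation of X(k) = sum_{m=0}^{N-1} W_N^(m k) x(m)."""
    N = len(x)
    W = cmath.exp(-2j * cmath.pi / N)  # W_N = exp(-j 2 pi / N)
    return [sum(W ** (m * k) * x[m] for m in range(N)) for k in range(N)]

def digits(m, r, n):
    """Base-r digits of m, most significant first: (m_n, ..., m_2, m_1)."""
    out = []
    for _ in range(n):
        out.append(m % r)  # peel off the least significant digit
        m //= r
    return out[::-1]

def from_digits(ds, r):
    """Inverse of digits(): m = m_n r^(n-1) + ... + m_2 r + m_1."""
    value = 0
    for d in ds:
        value = value * r + d
    return value
```

For instance, with $r = 2$ and $n = 4$ the index 11 expands to the digits (1, 0, 1, 1), and `from_digits` recovers 11 from them.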

Any FFT algorithm basically implies two processes: a shuffling of the data sequence (digit reversal) and a set of $n$ calculation stages where the "butterflies" are computed. There are, however, different FFT algorithms. Although their results are the same, the difference may be found in the computations carried out and in the way in which the intermediate results are stored [?]. Considering the types of data flows, we can distinguish between in-place, constant-geometry and SS algorithms. In-place algorithms [?] minimize the memory requirements, as the partial results are written over the data used for their calculation. The constant-geometry algorithms [?] present identical data flows in every stage of the transform. Finally, in SS algorithms [?] it is not necessary to digit reverse the sequence (unlike in-place and constant-geometry algorithms), as it is carried out in situ during the execution of the transform.
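To make the idea of in-situ reordering concrete, here is a minimal Python sketch of one well-known self-sorting formulation, the radix-2 Stockham autosort FFT. It is offered only as an illustrative instance of a self-sorting algorithm: it is an assumption that it resembles the SS data flow of this paper, and the name `stockham_fft` is hypothetical. Each stage reads from one buffer and writes to another at shifted positions, so the result comes out in natural order without a separate digit-reversal pass.

```python
import cmath

def stockham_fft(x):
    """Radix-2 Stockham autosort FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    src = [complex(v) for v in x]
    dst = [0j] * n
    half = n // 2   # number of distinct twiddle factors in this stage
    stride = 1      # distance between the two outputs of a butterfly
    while half >= 1:
        for j in range(half):
            w = cmath.exp(-2j * cmath.pi * j / (2 * half))
            for k in range(stride):
                a = src[k + j * stride]
                b = src[k + j * stride + n // 2]
                # Outputs are written to permuted positions, which is
                # what performs the reordering during the transform.
                dst[k + 2 * j * stride] = a + b
                dst[k + 2 * j * stride + stride] = (a - b) * w
        src, dst = dst, src  # ping-pong between the two buffers
        half //= 2
        stride *= 2
    return src
```

The price of avoiding the shuffle in this sketch is the second buffer, in contrast with in-place algorithms, which overwrite their inputs but leave the output in digit-reversed order.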

In table I we summarize the most frequently used algorithms, indicating the positions where the inputs and outputs of each butterfly are located. In the $i$-th stage ($i = 1, 2, \ldots, n$), the butterflies are made up of the set of $r$ inputs that differ only in digit $m_{n-i+1}$, producing the $r$ outputs that differ in digit $k_i$. The last three algorithms of table I are variants of the first three obtained by reverting the indices and are, consequently, very similar to them.
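The grouping rule for stage $i$ can be sketched in Python as follows. This is only an illustration of which logical indices meet in a butterfly; it says nothing about where each algorithm of table I stores them, and the function name `butterfly_groups` is a hypothetical helper, not something defined in the paper.

```python
def butterfly_groups(r, n, i):
    """Groups of r indices that differ only in digit m_{n-i+1} (stage i = 1..n)."""
    if not 1 <= i <= n:
        raise ValueError("stage i must satisfy 1 <= i <= n")
    weight = r ** (n - i)  # digit m_{n-i+1} carries weight r^(n-i)
    groups = []
    for base in range(r ** n):
        # Use the index whose selected digit is 0 as the group representative.
        if (base // weight) % r == 0:
            groups.append([base + d * weight for d in range(r)])
    return groups
```

For $r = 2$ and $n = 3$, stage 1 pairs indices that differ in the most significant digit, (0, 4), (1, 5), (2, 6), (3, 7), while stage 3 pairs neighbours, (0, 1), (2, 3), (4, 5), (6, 7).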

The algorithms in table I are not the only possible alternatives. Using a different approach, Johnson and Burrus [?] showed that the radix 2 algorithm can be restructured so that it is both self-sorting and in-place. This technique was generalized by Temperton [?] for its application to different radices. Other fast algorithms with different properties are found in the book by Rabiner and Gold [?]. Similar principles to those applied to the Fourier transform can be

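$$t_{?}\,t_{?}\,t_{?}\,t_{?} \;\rightarrow\; t_{?}\,t_{?}\,t_{?}\,t_{?} \;\rightarrow\; t_{?}\,t_{?}\,t_{?}\,t_{?} \;\rightarrow\; t_{?}\,t_{?}\,t_{?}\,t_{?} \;\rightarrow\; t_{?}\,t_{?}\,t_{?}\,t_{?} \qquad (?)$$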

where, for the sake of clarity, we have underlined the digits that are shifted. After the stages that make up the SS algorithm, the output index is the digit reversal of the input index. Figure [?] shows the data flow of the SS algorithm for a sequence of sixteen data items.
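The digit-reversal relation between input and output indices can be checked with a short Python sketch. The helper name `digit_reverse` is hypothetical; the code only computes the permutation itself and does not model the stage-by-stage cyclic shifts of the SS algorithm.

```python
def digit_reverse(index, r, n):
    """Reverse the n base-r digits of index: (d_n, ..., d_1) -> (d_1, ..., d_n)."""
    rev = 0
    for _ in range(n):
        rev = rev * r + index % r  # move the lowest digit to the top
        index //= r
    return rev

# Sixteen data items in radix 2 (n = 4 digits).
order = [digit_reverse(i, 2, 4) for i in range(16)]
```

Applying the function twice returns the original index, so the permutation is its own inverse.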

As shown in table I, the SS algorithm admits a variant that consists in performing the cyclic displacements to the left, instead of performing them to the right. The architecture that results in this case [?] has identical properties to the one we will describe in the following sections.

3 RADIX r PROCESSOR ARCHITECTURE

In this section we introduce the notation that will be used, together with the design of a single-processor architecture that implements the radix $r$ SS algorithm. In the next section this formulation will be generalized by means of the introduction of parallelism. This will permit the design of a recirculating PE column. For both tasks we will base our approach on the partial perfect unshuffle permutation.

3.1 Notation

We consider a sequence to be transformed $\{a_i\}$ of size $N = r^n$, $r$ being the radix. We will assume that this sequence is arranged as a two-dimensional matrix of $r^v$ columns of $r$ data items as follows ($n = v + 1$):

$$
\begin{pmatrix}
a_{r^v-1} & \cdots & a_1 & a_0 \\
a_{2r^v-1} & \cdots & a_{r^v+1} & a_{r^v} \\
\vdots & & \vdots & \vdots \\
a_{r^{v+1}-1} & \cdots & a_{r^{v+1}-r^v+1} & a_{r^{v+1}-r^v}
\end{pmatrix}
\qquad (?)
$$

To indicate a sequence of this type, we will use a two-dimensional representation of the index for each data item, written in the radix base. This way, the most significant digit of each index (in base $r$) will denote the row number, whereas the rest of the digits will denote the index of the column as follows:

$$(x, z)_r = ((x), (z_v, \ldots, z_1))_r \qquad (?)$$

with $x, z_k \in \{0, 1, \ldots, r-1\}$. The index of the $i$-th data item will be $i = x \cdot r^v + z_v \cdot r^{v-1} + \cdots + z_2 \cdot r + z_1$ ($n = v + 1$). This representation is chosen because it is the most appropriate one for expressing the different operations to be carried out. For simplicity in the notation, from now on we will write expression (?) without subindex $r$.
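The two-dimensional index representation can be illustrated with a short Python sketch that splits a linear index into its row digit $x$ and column digits $(z_v, \ldots, z_1)$ and recombines them. The names `split_index` and `join_index` are hypothetical helpers chosen for this example.

```python
def split_index(i, r, v):
    """Split i (0 <= i < r**(v+1)) into row x and column digits (z_v, ..., z_1)."""
    x, col = divmod(i, r ** v)  # x is the most significant base-r digit
    z = []
    for _ in range(v):
        z.append(col % r)
        col //= r
    return x, tuple(reversed(z))

def join_index(x, z, r):
    """Recombine: i = x r^v + z_v r^(v-1) + ... + z_2 r + z_1."""
    i = x
    for d in z:
        i = i * r + d
    return i
```

With $r = 2$ and $v = 3$ (sixteen items), index 11 lies in row $x = 1$ with column digits (0, 1, 1), that is, column 3 of the second row.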

If we consider matrix (?) as a two-dimensional sequence of data flowing from left to right, whereas the computation of the butterflies of the transforms is carried out over each of the columns of the matrix, we can interpret the two-dimensional representation $(x, z)$ as (path, cycle). Coordinate $x$, which counts the rows from top to bottom, determines the parallelism of each stage (a butterfly with $r$ data items is processed each cycle), whereas coordinate $z$, that counts
