
A Parallel Architecture for the Self-Sorting FFT Algorithm

F. Arguello

J.D. Bruguera

E.L. Zapata

December 1995

Technical Report No: UMA-DAC-95/26

Published in:

Journal of Parallel and Distributed Computing

vol. 31, no. 1, pp. 88-97, November 1995

University of Malaga

Department of Computer Architecture

C. Tecnologico • PO Box 4114 • E-29080 Malaga • Spain


Francisco Argüello, Javier D. Bruguera and Emilio L. Zapata

Dept. Electrónica y Computación, Facultad de Física,
Universidad de Santiago, Santiago, Spain

Dept. Arquitectura de Computadores,
Facultad de Informática, Universidad de Málaga,
Málaga, Spain

Mailing Address: Emilio L. Zapata
Dept. Arquitectura de Computadores
Universidad de Málaga
Plaza El Ejido, Málaga
Spain
e-mail: ezapata@ctima.uma.es

This work was supported in part by the Ministry of Education and Science (CICYT) of Spain under contract TIC????????C?? and Xunta de Galicia XUGA?????B???.


[...] transform (FFT) because, unlike the generally used algorithms, it does not require shuffling the sequence to be transformed (digit reversal). In this work we propose a parallel architecture that implements the SS radix $r$ ($r \geq 2$) algorithm. The data flow of the algorithm is decomposed, in a natural way, into two sections that are implemented by means of FIFO queues located in the processors and an interprocessor connection network (perfect unshuffle). The resulting design is regular and modular and, whenever possible, presents constant geometry. The total processing time required is $nN/(rQ)$ cycles for a transform of size $N = r^n$ computed using $Q = r^q$ processors. Consequently, there are no cycle losses.

Keywords: Self-Sorting Algorithm, Parallel Architecture, Pipeline Design, FFT.

1 INTRODUCTION

The self-sorting (SS) algorithm [?] is a version of the successive doubling (SD) method. This is a general and efficient procedure for the design of fast algorithms. It is based on the divide and conquer strategy and establishes the solution of a problem by working with a group of subproblems of the same type and smaller size. This way, the redundancies present in many algorithms are eliminated, resulting in faster algorithms. Another application of the SD method is the extraction of the parallelism present in these algorithms.

Considering the types of data flows generated by the SD method, we can distinguish between in-place, split-radix, constant-geometry and SS algorithms [?]. In-place algorithms [?] minimize the memory requirements, as the partial results are written over the data used for their calculation. Split-radix algorithms [?] are characterized by the fact that they require a smaller number of arithmetic operations than other algorithms. However, split-radix algorithms present irregularities in the data flow that hinder the balancing of the computational load in parallel architectures [?]. Constant-geometry algorithms [?] present identical data flows in every stage of the transform. Finally, SS algorithms [?], unlike the other versions of the SD method, are characterized by the fact that they provide an output sequence that is digit reversed with respect to the input sequence.

The parallel implementation of an SS algorithm, with respect to an unordered algorithm, is advantageous in terms of speed. In the implementation of an FFT algorithm [?] with an unordered output (in-place or constant-geometry, for instance), the ordering of the data sequence can be carried out in two basic ways. First, the ordering can be carried out by means of an interconnection network, such as the one in ref. [?]. However, this solution cannot be used in partitioned algorithms, as a global data shuffle, and not a shuffle within each partition, must be carried out. Secondly, if the FFT device is used as a coprocessor by a host processor, the sequence can be written in an unordered format and the host can perform the ordering operation. Another alternative is to generate an adequate sequencing of the addresses in order to write the data in memory in an ordered format; this solution is used in [?]. In [?] the authors propose an address generator using application-specific hardware.

Given the importance of the FFT, many efforts have been devoted to the design of systolic architectures that efficiently exploit the parallelism associated with the SD method. Until very recently, no designs contemplating the partitioning of the data have appeared in the literature.


[...] The SS FFT algorithm is introduced in section 2. The notation used will be introduced in section 3. In this same section, we present the design of the processor implementing the SS algorithm. In section 4 we extend the architecture to a processor column that is interconnected by means of the perfect unshuffle permutation. Finally, in section 5 we evaluate the proposed designs.

2 SELF-SORTING FFT ALGORITHM

The process leading to the Fast Fourier Transform (FFT) is based on the factoring property of the DFT. If we have a sequence $x(m)$, $0 \leq m < N$, the DFT $X(k)$, $0 \leq k < N$, of this sequence is defined by means of the following expression:

$$X(k) = \sum_{m=0}^{N-1} W_N^{mk}\, x(m) \qquad (1)$$

where $W_N = \exp(-j 2\pi / N)$. If $N$ is decomposed into a set of $n$ factors, $N = N_n N_{n-1} \cdots N_1$, then the DFT of length $N$ can be computed in $n$ stages through multiple DFTs of length $N_i$, numbering the stages $i = 1, \ldots, n$. Usually, in every stage $N_i = r$ and, consequently, the input and output indices of the transform, $m$ and $k$, can be written in base $r$; for example,

$$m = m_n r^{n-1} + m_{n-1} r^{n-2} + \cdots + m_2 r + m_1 = (m_n, m_{n-1}, \ldots, m_2, m_1) \qquad (2)$$

Substituting these factorings of $m$ and $k$ in equation (1), we obtain the standard radix $r$ FFT algorithm that was initially derived by Cooley and Tukey [?].
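As an illustration of the definitions above, the following minimal sketch evaluates the DFT directly from equation (1) and converts an index to and from its base $r$ digits as in equation (2). The function names `dft`, `digits` and `from_digits` are hypothetical and chosen only for this example; the direct evaluation costs $O(N^2)$ operations and is meant as a reference, not as a fast algorithm.

```python
import cmath


def dft(x):
    """Direct evaluation of X(k) = sum_m W_N^(m*k) x(m), with W_N = exp(-2j*pi/N)."""
    n_points = len(x)
    w = cmath.exp(-2j * cmath.pi / n_points)
    return [
        sum(w ** (m * k) * x[m] for m in range(n_points))
        for k in range(n_points)
    ]


def digits(m, r, n):
    """Base-r digits (m_n, ..., m_2, m_1) of m, most significant digit first."""
    out = []
    for _ in range(n):
        out.append(m % r)  # extract the least significant digit
        m //= r
    return tuple(reversed(out))


def from_digits(ds, r):
    """Inverse of digits(): m = m_n r^(n-1) + ... + m_2 r + m_1."""
    m = 0
    for d in ds:
        m = m * r + d
    return m
```

For instance, `digits(11, 2, 4)` gives `(1, 0, 1, 1)`, since $11 = 1 \cdot 2^3 + 0 \cdot 2^2 + 1 \cdot 2 + 1$.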

Any FFT algorithm basically implies two processes: a shuffling of the data sequence (digit reversal) and a set of $n$ calculation stages where the "butterflies" are computed. There are, however, different FFT algorithms; although their results are the same, the difference may be found in the computations carried out and in the way in which the intermediate results are stored [?]. Considering the types of data flows, we can distinguish between in-place, constant-geometry and SS algorithms. In-place algorithms [?] minimize the memory requirements, as the partial results are written over the data used for their calculation. The constant-geometry algorithms [?] present identical data flows in every stage of the transform. Finally, in SS algorithms [?] it is not necessary to digit reverse the sequence, unlike in-place and constant-geometry algorithms, as the reordering is carried out in situ during the execution of the transform.
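To make the self-sorting property concrete, the sketch below implements a textbook radix 2 autosort FFT in the Stockham style. This is an assumption made for illustration: it is a well-known self-sorting formulation, not necessarily the exact data flow used by the architecture in this paper, and the name `stockham_fft` is hypothetical. Each stage reads from one buffer and writes to another, so no separate digit reversal pass is needed and the output appears in natural order.

```python
import cmath


def stockham_fft(x):
    """Radix-2 autosort FFT (Stockham style); len(x) must be a power of two."""
    total = len(x)
    if total == 0 or total & (total - 1):
        raise ValueError("length must be a power of two")
    src = [complex(v) for v in x]
    dst = [0j] * total
    n = total  # length of the sub-transforms still to be computed
    s = 1      # stride: number of interleaved sub-transforms
    while n > 1:
        half = n // 2
        for p in range(half):
            wp = cmath.exp(-2j * cmath.pi * p / n)  # twiddle factor W_n^p
            for q in range(s):
                a = src[q + s * p]
                b = src[q + s * (p + half)]
                # The write positions differ from the read positions:
                # this is where the reordering happens, stage by stage.
                dst[q + s * (2 * p)] = a + b
                dst[q + s * (2 * p + 1)] = (a - b) * wp
        src, dst = dst, src  # ping-pong between the two buffers
        n = half
        s *= 2
    return src
```

The price of avoiding the shuffle in this formulation is the second buffer, in contrast with in-place algorithms, which overwrite their inputs but leave the output in digit-reversed order.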

In table I we summarize the most frequently used algorithms, indicating the positions where the inputs and outputs of each butterfly are located. In the $i$-th stage ($i = 1, \ldots, n$) the butterflies are made up with the set of the $r$ inputs that only differ in digit $m_{n-i+1}$, producing the $r$ outputs that differ in digit $k_i$. The last three algorithms of table I are variants of the first three obtained by reversing the indices and are, consequently, very similar to them.
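The grouping rule for the butterflies can be sketched as follows. The helper `butterfly_groups` is hypothetical: it only enumerates, for stage $i$, the sets of $r$ input indices that differ in digit $m_{n-i+1}$ alone (whose weight is $r^{n-i}$). It says nothing about the storage positions of those inputs, which depend on the particular algorithm of table I.

```python
def butterfly_groups(r, n, i):
    """Sets of r input indices differing only in digit m_(n-i+1), for stage i in 1..n."""
    if not 1 <= i <= n:
        raise ValueError("stage i must satisfy 1 <= i <= n")
    weight = r ** (n - i)  # weight of digit m_(n-i+1) in the base-r expansion
    groups = []
    for base in range(r ** n):
        # Use the index whose selected digit is 0 as the group representative.
        if (base // weight) % r == 0:
            groups.append([base + d * weight for d in range(r)])
    return groups
```

For $r = 2$ and $n = 3$, stage 1 pairs the indices that differ in the most significant digit, `[[0, 4], [1, 5], [2, 6], [3, 7]]`, whereas stage 3 pairs neighbours, `[[0, 1], [2, 3], [4, 5], [6, 7]]`.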

The algorithms in table I are not the only possible alternatives. Using a different approach, Johnson and Burrus [?] showed that the radix 2 algorithm can be restructured so that it is both self-sorting and in-place. This technique was generalized by Temperton [?] for its application to different radices. Other fast algorithms with different properties are found in the book by Rabiner and Gold [?]. Similar principles to those applied to the Fourier transform can be [...]


[Equation illegible in the source: a chain of four-digit indices in the digits $t$, with the shifted digits underlined.]

where, for the sake of clarity, we have underlined the digits that are shifted. After the 4 stages that make up the SS algorithm, the output index is the digit reversal of the input index. Figure ? shows the data flow of the SS algorithm for a sequence of sixteen data items.
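The way in which successive cyclic shifts can produce a digit reversal may be sketched as follows. The per-stage schedule used here is an assumption, since the corresponding equation is not legible in this copy: stage 1 cyclically shifts all the digits one place to the right, stage 2 shifts all but the first digit, and so on. The names `rotate_right_suffix` and `index_trace` are hypothetical.

```python
def rotate_right_suffix(ds, start):
    """Cyclically shift ds[start:] one place to the right, keeping ds[:start] fixed."""
    head, tail = tuple(ds[:start]), tuple(ds[start:])
    if not tail:
        return head
    return head + (tail[-1],) + tail[:-1]


def index_trace(ds):
    """Digit tuple before the first stage and after each of the len(ds) stages."""
    trace = [tuple(ds)]
    for start in range(len(ds)):
        # Assumed schedule: the shifted field shrinks by one digit per stage.
        trace.append(rotate_right_suffix(trace[-1], start))
    return trace
```

Under this assumption, `index_trace(("t4", "t3", "t2", "t1"))` passes through `("t1", "t4", "t3", "t2")`, `("t1", "t2", "t4", "t3")` and `("t1", "t2", "t3", "t4")`, so that the final index is the digit reversal of the initial one. The last stage shifts a single digit and therefore leaves the index unchanged.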

As shown in table I, the SS algorithm admits a variant that consists in performing the cyclic displacements to the left, instead of performing them to the right. The architecture that results in this case [?] has identical properties to the one we will describe in the following sections.

3 RADIX r PROCESSOR ARCHITECTURE

In this section we will introduce the notation that will be used, together with the design of a single processor architecture that implements the radix $r$ SS algorithm. In the next section this formulation will be generalized by means of the introduction of parallelism. This will permit the design of a recirculating PE column. For both tasks we will base our approach on the partial perfect unshuffle permutation.

3.1 Notation

We consider a sequence to be transformed $\{a_i\}$ of size $N = r^n$, $r$ being the radix. We will assume that this sequence is arranged as a two dimensional matrix of $r^v$ columns of $r$ data items as follows ($n = v + 1$):

$$\begin{pmatrix}
a_{r^v-1} & \cdots & a_1 & a_0 \\
a_{2r^v-1} & \cdots & a_{r^v+1} & a_{r^v} \\
\vdots & & \vdots & \vdots \\
a_{r^{v+1}-1} & \cdots & a_{r^{v+1}-r^v+1} & a_{r^{v+1}-r^v}
\end{pmatrix}$$

To indicate a sequence of this type, we will use a two dimensional representation of the index for each data item, written in the radix base. This way, the most significant digit of each index (in base $r$) will denote the row number, whereas the rest of the digits will denote the index of the column as follows:

$$(x|z)_r = (x \,|\, z_v \cdots z_1)_r$$

with $x, z_k \in \{0, 1, \ldots, r-1\}$. The index of the $i$-th data item will be $i = x\,r^v + z_v r^{v-1} + \cdots + z_2 r + z_1$ ($n = v + 1$). This representation is chosen because it is the most appropriate one for expressing the different operations to be carried out. For simplicity in the notation, from now on we will write this representation without the subindex $r$.
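A minimal sketch of this notation, under the stated convention that the most significant digit $x$ selects the row and the remaining $v$ digits select the column. The helpers `split_index` and `matrix_layout` are hypothetical names introduced only for this example.

```python
def split_index(i, r, v):
    """Return (x, (z_v, ..., z_1)) such that i = x*r^v + z_v*r^(v-1) + ... + z_1."""
    if not 0 <= i < r ** (v + 1):
        raise ValueError("index out of range for N = r^(v+1)")
    x, z = divmod(i, r ** v)  # x: row number, z: column index
    zs = []
    for _ in range(v):
        zs.append(z % r)
        z //= r
    return x, tuple(reversed(zs))


def matrix_layout(r, v):
    """Indices i of a_i by row; the column index z decreases from left to right."""
    cols = r ** v
    return [[x * cols + z for z in range(cols - 1, -1, -1)] for x in range(r)]
```

With $r = 2$ and $v = 2$ the layout is `[[3, 2, 1, 0], [7, 6, 5, 4]]`, that is, two rows of four columns with $a_0$ in the rightmost column of the first row.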

If we consider the matrix of the sequence as a two dimensional sequence of data flowing from left to right, whereas the computation of the butterflies of the transform is carried out over each of the columns of the matrix, we can interpret the two dimensional representation $(x|z)$ as (path|cycle). Coordinate $x$, which counts the rows from top to bottom, determines the parallelism of each stage (a butterfly with $r$ data items is processed each cycle), whereas coordinate $z$, that counts
