Parallel merged multiplier-accumulator coprocessor optimized for digital filters

Article in Computers & Electrical Engineering 36(5):864-873 · September 2010
DOI: 10.1016/j.compeleceng.2008.04.005 · Source: DBLP
To improve the speed of VLSI signal processing systems, a new architecture for a high-speed multiply–accumulate (MAC) unit optimized for digital filters is proposed. The unit is designed as a coprocessor for the LEON2 RISC processor [LEON2 Processor; 2005 [Online]]. The coprocessing block contains four parallel MAC units, two dual-port coefficient register files, a three-port general register file and a control unit. To exploit the four parallel units, several SIMD-format instructions have been added to the LEON2 instruction set. Each MAC unit has two 16-bit inputs, a 32-bit output register and a programmable round-saturate block. The MAC unit uses a new architecture that embeds the accumulate module within the partial-product summation tree of the multiplier with minimum overhead. A central control unit controls the inputs of the four MACs and the loading of the output registers. Experimental results demonstrate high-performance digital filter implementations at rates of up to 33 million input samples per second in a 0.18 μm technology.
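The datapath described in the abstract (16-bit operands, a 32-bit accumulator, a round-saturate stage and four MAC lanes working side by side) can be illustrated with a small behavioural model. This is only a sketch under stated assumptions, not the paper's design: the names `mac`, `round_to_q15` and `fir_four_lanes` are hypothetical, the Q15/Q30 fixed-point format and round-half-up mode are assumed, and the choice to let each lane compute a different output sample is one plausible SIMD mapping that the abstract does not confirm.

```python
INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1
INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1


def saturate32(value):
    """Clamp an integer to the signed 32-bit range."""
    return max(INT32_MIN, min(INT32_MAX, value))


def mac(acc, a, b):
    """One multiply-accumulate step: 16-bit operands, saturating 32-bit accumulator."""
    assert INT16_MIN <= a <= INT16_MAX and INT16_MIN <= b <= INT16_MAX
    return saturate32(acc + a * b)


def round_to_q15(acc):
    """Assumed rounding mode: round a Q30 accumulator half up to Q15, then saturate to 16 bits."""
    rounded = (acc + (1 << 14)) >> 15
    return max(INT16_MIN, min(INT16_MAX, rounded))


def fir_four_lanes(samples, coeffs, lanes=4):
    """FIR filter in which each pass produces `lanes` outputs, one per MAC lane.

    Assumed mapping: lane i accumulates output sample base + i.
    """
    outputs = []
    for base in range(0, len(samples), lanes):
        accs = [0] * lanes
        for k, coeff in enumerate(coeffs):
            for lane in range(lanes):  # these iterations would run in parallel in hardware
                n = base + lane
                if n < len(samples) and n - k >= 0:
                    accs[lane] = mac(accs[lane], coeff, samples[n - k])
        outputs.extend(accs[: min(lanes, len(samples) - base)])
    return outputs
```

For instance, `fir_four_lanes([1, 2, 3, 4, 5], [1, 1])` computes a two-tap moving sum, with the first four outputs produced in one pass over the coefficients and the fifth in a second pass.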
  • ABSTRACT: In this paper we propose two high-performance multiplication-accumulation (MAC) designs. Targeting operation at a higher frequency, we investigate two different techniques based on the carry-save representation to reduce the delay impact of the accumulation process on the MAC operation. We conducted detailed experimental measurements to verify the advantages of the proposed MAC designs over two existing ones. Both proposed designs operate at a higher frequency without any penalty in area or power consumption.
    Conference Paper · Dec 2014
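The carry-save idea mentioned in the entry above can be sketched generically: the running total is kept as two words (a sum word and a carry word), so each accumulation step needs only bitwise logic with no carry propagation, and a single ordinary addition resolves the result at the end. The code below shows the textbook 3:2 carry-save adder, not either of the two specific designs from that paper; the 32-bit width and the function names are assumptions.

```python
MASK = (1 << 32) - 1  # assumed 32-bit accumulator width


def csa(x, y, z):
    """3:2 carry-save adder: returns (s, c) with s + c == x + y + z modulo 2**32."""
    s = x ^ y ^ z                                # per-bit sum without carries
    c = ((x & y) | (x & z) | (y & z)) << 1       # per-bit carries, shifted into place
    return s & MASK, c & MASK


def accumulate_carry_save(products):
    """Accumulate values in carry-save form; one carry-propagate add at the end."""
    s, c = 0, 0
    for p in products:
        s, c = csa(s, c, p & MASK)               # no carry chain inside the loop
    return (s + c) & MASK                        # final carry-propagate addition
```

The result is the two's-complement sum modulo 2**32, so negative inputs wrap in the usual way.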