Conference Paper

A Two Level Architecture for High Throughput DCT-Processor and Implementing on FPGA

DOI: 10.1109/ReConFig.2010.67
Conference: Reconfigurable Computing and FPGAs (ReConFig), 2010 International Conference on
Source: IEEE Xplore

ABSTRACT Frequency analysis using the discrete cosine transform (DCT) is used in a wide variety of algorithms, such as image processing algorithms. This paper proposes a new high-throughput architecture for a DCT processor. The system has a two-level architecture that uses parallelism and pipelining, and it has been synthesized on a Xilinx Virtex-5 FPGA. Synthesis results show that the system works at 150 MHz. Applying the DCT to each 8×8 matrix of the image takes 67 clock cycles. In other words, applying the DCT to each pixel takes approximately one clock cycle (67/64 ≈ 1.05).
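The per-pixel figure in the abstract can be checked with a few lines of arithmetic. The sketch below uses only the two reported numbers (150 MHz, 67 cycles per 8×8 block); the throughput estimate additionally assumes that blocks are processed back to back with no overlap between consecutive blocks, which the abstract does not state, so treat it as a rough lower bound rather than a reported result.

```python
# Figures reported in the abstract.
CLOCK_HZ = 150e6          # synthesis clock frequency
CYCLES_PER_BLOCK = 67     # clock cycles for one 8x8 block
BLOCK_PIXELS = 8 * 8      # pixels per block

# Average number of clock cycles spent per pixel.
cycles_per_pixel = CYCLES_PER_BLOCK / BLOCK_PIXELS

# Assumption (not from the abstract): one block starts only after the
# previous one finishes, i.e. no overlap between consecutive blocks.
blocks_per_second = CLOCK_HZ / CYCLES_PER_BLOCK
pixels_per_second = blocks_per_second * BLOCK_PIXELS

print(f"cycles per pixel : {cycles_per_pixel:.3f}")
print(f"blocks per second: {blocks_per_second:,.0f}")
print(f"pixels per second: {pixels_per_second:,.0f}")
```

Under that assumption the design would sustain roughly 143 million pixels per second; if the pipeline overlaps consecutive blocks, the real figure could be higher.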

  • ABSTRACT: This paper shows the implementation of an 8×8 scaled two-dimensional Discrete Cosine Transform (2D-DCT) processor based on the Residue Number System (RNS). The row-column decomposition technique is used, and each 1D-DCT processor has been derived by applying a previously developed scaled Fast Cosine Transform (FCT) algorithm that requires a reduced number of multiplications. Simulations of binary 2's complement and RNS versions of the scaled 2D-DCT processor using VHDL over Field-Programmable Logic (FPL) devices provide a throughput improvement for the proposed RNS-based 2D-DCT processor of up to 148% when 8-bit moduli are used. This is achieved due to the synergy between RNS and modern FPL device families.
    Signals, Systems and Computers, 2000. Conference Record of the Thirty-Fourth Asilomar Conference on; 02/2000
  • ABSTRACT: Multimedia applications, and in particular the encoding and decoding of standard image and video formats, are usually a typical target for Systems-on-Chip (SoC). The bi-dimensional Discrete Cosine Transformation (2D-DCT) is a commonly used frequency transformation in graphic compression algorithms. Many hardware implementations, adopting disparate algorithms, have been proposed for Field Programmable Gate Arrays (FPGA). These designs focus either on performance or area, and often do not succeed in balancing the two aspects. In this paper, we present a design of a fast 2D-DCT hardware accelerator for an FPGA-based SoC. This accelerator makes use of a single seven-stage 1D-DCT pipeline able to alternate computation for the even and odd coefficients in every cycle. In addition, it uses special memories to perform the transpose operations. Our hardware takes 80 clock cycles at 107 MHz to generate a complete 8x8 2D DCT, from the writing of the first input sample to the reading of the last result (including the overhead of the interface logic). We show that this architecture provides optimal performance/area ratio with respect to several alternative designs.
    2007 IEEE Computer Society Annual Symposium on VLSI (ISVLSI 2007), May 9-11, 2007, Porto Alegre, Brazil; 01/2007
  • ABSTRACT: Due to the importance of the discrete cosine transform (DCT) in the field of transform coding of images, various algorithms and architectures for real-time 2-D DCT processor designs have been proposed. In this paper we present a new fast algorithm for the 8×8 2-D DCT based on partial sums, together with its corresponding hardware architecture for VLSI realization. The algorithm requires the fewest multipliers in theory, and the system is a serial-in serial-out system. Theoretical proof and simulation results on FPGA devices show the efficiency of the algorithm. The kernel architecture corresponding to the algorithm is regular, has lower complexity, and achieves high throughput.
    VLSI Design and Video Technology, 2005. Proceedings of 2005 IEEE International Workshop on; 06/2005
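
Several of the abstracts above rely on row-column decomposition: an 8×8 2-D DCT is computed as 1-D DCTs over the rows, a transpose, and 1-D DCTs over the columns. The following is a minimal software sketch of that idea only. It uses the direct orthonormal DCT-II formula, not the fast, scaled, or RNS-based algorithms of the listed papers, and the function names `dct_1d`, `transpose`, and `dct_2d` are illustrative choices, not taken from any of them.

```python
import math


def dct_1d(x):
    """Orthonormal DCT-II of a sequence, computed directly in O(N^2)."""
    n = len(x)
    out = []
    for k in range(n):
        # The DC term uses a different normalization factor.
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(
            x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)
        )
        out.append(scale * acc)
    return out


def transpose(m):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def dct_2d(block):
    """2-D DCT by row-column decomposition."""
    rows = [dct_1d(r) for r in block]             # 1-D DCT on every row
    cols = [dct_1d(c) for c in transpose(rows)]   # 1-D DCT on every column
    return transpose(cols)                        # restore row-major layout


# Usage: a constant block has all of its energy in the DC coefficient.
flat = [[1.0] * 8 for _ in range(8)]
coeffs = dct_2d(flat)
```

In hardware the transpose step is typically handled by a dedicated memory, as the second abstract notes; here it is simply a list transpose.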

