Conference Paper
A Two Level Architecture for High Throughput DCT-Processor and Implementing on FPGA
DOI: 10.1109/ReConFig.2010.67 Conference: Reconfigurable Computing and FPGAs (ReConFig), 2010 International Conference on
Source: IEEE Xplore
 Citations (17)
 Cited In (0)

Conference Paper: A new RNS architecture for the computation of the scaled 2D-DCT on field-programmable logic
ABSTRACT: This paper shows the implementation of an 8×8 scaled two-dimensional Discrete Cosine Transform processor (2D-DCT) based on the Residue Number System (RNS). The row-column decomposition technique is used and each 1D-DCT processor has been derived by the application of a previously developed scaled Fast Cosine Transform (FCT) algorithm that requires a reduced number of multiplications. Simulations of binary 2's complement and RNS version of the scaled 2D-DCT processor using VHDL over Field-Programmable Logic (FPL) devices provide a throughput improvement for the proposed RNS-based 2D-DCT processor of up to 148% when 8-bit moduli are used. This is achieved due to the synergy between RNS and modern FPL device families.
Signals, Systems and Computers, 2000. Conference Record of the Thirty-Fourth Asilomar Conference on; 02/2000
Conference Paper: A Pipelined Fast 2D-DCT Accelerator for FPGA-based SoCs.
ABSTRACT: Multimedia applications, and in particular the encoding and decoding of standard image and video formats, are usually a typical target for Systems on-Chip (SoC). The bi-dimensional Discrete Cosine Transformation (2D-DCT) is a commonly used frequency transformation in graphic compression algorithms. Many hardware implementations, adopting disparate algorithms, have been proposed for Field Programmable Gate Arrays (FPGA). These designs focus either on performance or area, and often do not succeed in balancing the two aspects. In this paper, we present a design of a fast 2D-DCT hardware accelerator for a FPGA-based SoC. This accelerator makes use of a single seven stages 1D-DCT pipeline able to alternate computation for the even and odd coefficients in every cycle. In addition, it uses special memories to perform the transpose operations. Our hardware takes 80 clock cycles at 107 MHz to generate a complete 8x8 2D-DCT, from the writing of the first input sample to the reading of the last result (including the overhead of the interface logic). We show that this architecture provides optimal performance/area ratio with respect to several alternative designs.
2007 IEEE Computer Society Annual Symposium on VLSI (ISVLSI 2007), May 9-11, 2007, Porto Alegre, Brazil; 01/2007
Conference Paper: A new fast algorithm for 8×8 2D DCT and its VLSI implementation
ABSTRACT: Due to the importance of the discrete cosine transform (DCT) in the field of transform coding of images, various algorithms and architectures for real-time 2D DCT processor designs have been proposed. In this paper we present a new fast algorithm for 8×8 2D DCT based on partial sum and its corresponding hardware architecture for VLSI realization. The algorithm costs fewest multipliers in theory and the system is a serial-in serial-out system. Theoretical proof and simulation results on FPGA devices show the efficiency of the algorithm. The kernel architecture corresponding to the algorithm is regular with lower complexity and performs high throughput.
VLSI Design and Video Technology, 2005. Proceedings of 2005 IEEE International Workshop on; 06/2005