A high-performance/low-latency vector rotational CORDIC architecture based on extended elementary angle set and trellis-based searching schemes

Dept. of Electr. Eng., Nat. Central Univ., Chung-li, Taiwan
IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing 10/2003; DOI: 10.1109/TCSII.2003.816923
Source: IEEE Xplore

ABSTRACT The coordinate rotation digital computer (CORDIC) algorithm is a well-known iterative method for computing vector rotations. For applications that require only forward rotation (or vector rotation), the angle recoding (AR) technique provides a relaxed approach to speeding up the CORDIC algorithm. In this paper, we further apply the concept of the AR technique to extend the elementary angle set in the microrotation phase. This technique is called the extended elementary-angle set (EEAS) scheme. The proposed EEAS scheme provides a more flexible way of decomposing the target rotation angle in the CORDIC operation, and its quantization error is smaller than that of the AR technique. To solve the optimization problem encountered in the EEAS scheme, we also propose a novel search algorithm, called the trellis-based searching (TBS) algorithm. Compared with the greedy algorithm used in the conventional AR technique, the proposed TBS algorithm yields a clear signal-to-quantization-noise ratio (SQNR) improvement. Moreover, in the scaling phase of the EEAS-based CORDIC algorithm, we suggest a novel scaling operation, called the Extended Type-II (ET-II) scaling operation. The ET-II scaling operation applies the same design concepts as the EEAS scheme. It results in a much smaller quantization error than the conventional Type-I scaling operation in the numerical approximation of the scaling factor. By combining these new schemes, the proposed EEAS-based CORDIC algorithm can improve the overall SQNR performance by up to 25 dB compared with previous works. Also, for the same target SQNR, it requires only about 66% of the iterations in the iterative CORDIC structure, or about 66% of the hardware complexity in the parallel CORDIC structure, compared with the conventional AR technique. Hence, high-performance/low-latency CORDIC very large-scale integration architectures can be achieved without degrading the SQNR performance.
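To make the decomposition idea in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It builds the conventional elementary angle set atan(2^-s) and an extended set, then decomposes a target angle greedily over each. The extended set's form, atan(a0·2^-s0 + a1·2^-s1) with a0, a1 in {-1, 0, 1}, is an assumption about how the EEAS could be constructed; the abstract does not give the formula. The function names (`ar_angles`, `eeas_angles`, `greedy_decompose`) and the `width` parameter are hypothetical.

```python
import math


def ar_angles(width):
    """Conventional CORDIC elementary angles atan(2^-s), s = 0..width-1."""
    return sorted({math.atan(2.0 ** -s) for s in range(width)})


def eeas_angles(width):
    """Assumed extended set: atan(a0*2^-s0 + a1*2^-s1), a0, a1 in {-1, 0, 1}.

    Only positive angles are stored; the sign is chosen during decomposition.
    """
    angles = set()
    for s0 in range(width):
        for s1 in range(width):
            for a0 in (-1, 0, 1):
                for a1 in (-1, 0, 1):
                    t = a0 * 2.0 ** -s0 + a1 * 2.0 ** -s1
                    if t > 0:
                        angles.add(math.atan(t))
    return sorted(angles)


def greedy_decompose(theta, angles, iterations):
    """Greedily pick signed angles that shrink the residual angle.

    Returns (chosen_steps, residual). A step is skipped (and the loop
    ends) when no candidate reduces the residual magnitude.
    """
    residual = theta
    chosen = []
    for _ in range(iterations):
        # Candidate closest in magnitude to the current residual.
        best = min(angles, key=lambda a: abs(abs(residual) - a))
        if abs(abs(residual) - best) >= abs(residual):
            break  # no candidate improves on doing nothing
        step = math.copysign(best, residual)
        chosen.append(step)
        residual -= step
    return chosen, residual


if __name__ == "__main__":
    theta = 0.6  # target rotation angle in radians (arbitrary example)
    for name, angle_set in (("AR", ar_angles(8)), ("EEAS", eeas_angles(8))):
        steps, res = greedy_decompose(theta, angle_set, iterations=4)
        print(name, len(angle_set), "angles, residual", abs(res))
```

Because the extended set contains the conventional one, the first greedy step can never do worse with it. Over several steps, however, a greedy search over the larger set is not guaranteed to find the best combination, which is the optimization problem the abstract says the TBS algorithm addresses.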

  • ABSTRACT: The COordinate Rotation DIgital Computer (CORDIC) algorithm is a well-known technique for realizing complex arithmetic functions using simple shift-add operations. This paper presents a novel, completely scaling-free CORDIC algorithm in rotation mode for high-performance hyperbolic computations. We target algorithm-level improvements to achieve low area and a low power-delay product on FPGA. Instead of complex search algorithms, we use the most-significant-one bit detection technique to identify the micro-rotation sequence, which significantly reduces the number of pipeline stages. The proposed technique uses mathematical identities to extend the range of convergence. The eight-stage pipelined architecture requires a ROM in the preprocessing unit for storing the initial coordinate values, while the ROM for storing the elementary angles is eliminated. The FPGA implementation of the proposed processor requires 46.35% fewer gates and has 31.81% less delay than Xilinx Core IP-CORDIC v3.0. Moreover, on average it consumes 75.96% less power than Xilinx CORDIC v3.0. Hence, the proposed technique provides an area- and power-delay-efficient VLSI implementation for calculating hyperbolic functions and exponents. The detailed algorithm design, along with the FPGA implementation and the area and time complexities, is presented in this paper.
    The Computer Journal 05/2012; 55(5):616-628. DOI:10.1093/comjnl/bxr109 · 0.89 Impact Factor
  • ABSTRACT: Synthetic transmit aperture (STA) imaging has recently been widely investigated in ultrasound systems because of its high frame rate and low hardware cost. Since the high-resolution image (HRI) of STA is formed by summing low-resolution images (LRIs), it is susceptible to inter-firing motion. In this paper, we propose a low-complexity global motion compensation algorithm. We use the common region of interest $(\mathrm{ROI}_{\mathrm{com}})$ between different transmissions of STA imaging to beamform backward and forward beam vectors. Then, the magnitude and direction of motion can be evaluated by cross-correlations between specific beam vectors in STA imaging. Compared with the uncompensated image in a two-dimensional (2D) motion environment, the proposed motion compensation algorithm improves the contrast ratio (CR) and contrast-to-noise ratio (CNR) by 13.73 and 2.04 dB, respectively. The proposed algorithm also improves the CR and CNR by about 7.84 and 1.36 dB, respectively, compared with the reference work. In the Field II breath model, the proposed method improves the CR and CNR by about 6.65 and 1.04 dB, respectively, over the reference method. Moreover, we propose a low-complexity delay generator in the architecture design to further reduce the computational complexity of the whole beamforming system. Finally, we verify the proposed low-complexity motion compensation beamforming engine with a VLSI implementation in 90 nm CMOS technology. In the post-layout result, the core size is 2.39 mm$^2$ at a 125 MHz operating frequency, and the frame rate of the beamforming system is 42.23 frames per second.
    IEEE Transactions on Signal Processing 02/2014; 62(4):840-851. DOI:10.1109/TSP.2013.2295551 · 3.20 Impact Factor
  • ABSTRACT: This paper presents a new approach to designing multiplierless constant rotators. The approach is based on a combined coefficient selection and shift-and-add implementation (CCSSI). First, complete freedom is given to the selection of the coefficients, i.e., no constraints on the coefficients are set in advance and all the alternatives are taken into account. Second, the shift-and-add implementation uses advanced single constant multiplication (SCM) and multiple constant multiplication (MCM) techniques that lead to low-complexity multiplierless implementations. Third, the rotators are designed by jointly optimizing the coefficient selection and the shift-and-add implementation. As a result, the CCSSI provides an extended design space that offers a larger number of alternatives than previous works. Furthermore, the design space is explored in a simple and efficient way. The proposed approach has wide applications in numerous hardware scenarios. These include rotations by single or multiple angles, rotators in single or multiple branches, and different scalings of the outputs. Experimental results for various scenarios are provided. In all of them, the proposed approach achieves significant improvements with respect to the state of the art.
    IEEE Transactions on Circuits and Systems I: Regular Papers 07/2014; 61(7):2002-2012. DOI:10.1109/TCSI.2014.2304664 · 2.30 Impact Factor