A high-performance/low-latency vector rotational CORDIC architecture based on extended elementary angle set and trellis-based searching schemes

Dept. of Electr. Eng., Nat. Central Univ., Chung-li, Taiwan
IEEE Transactions on Circuits and Systems II Analog and Digital Signal Processing 10/2003; DOI: 10.1109/TCSII.2003.816923
Source: IEEE Xplore

ABSTRACT The coordinate rotational digital computer (CORDIC) algorithm is a well-known iterative method for the computation of vector rotation. For applications that require forward rotation (or vector rotation) only, the angle recoding (AR) technique provides a relaxed approach to speed up the operation of the CORDIC algorithm. In this paper, we further apply the concept of the AR technique to extend the elementary angle set in the microrotation phase. This technique is called the extended elementary-angle set (EEAS) scheme. The proposed EEAS scheme provides a more flexible way of decomposing the target rotation angle in CORDIC operation, and its quantization error performance is better than that of the AR technique. Meanwhile, to solve the optimization problem encountered in the EEAS scheme, we also propose a novel search algorithm, called the trellis-based searching (TBS) algorithm. Compared with the greedy algorithm used in the conventional AR technique, the proposed TBS algorithm yields a clear signal-to-quantization-noise ratio (SQNR) improvement. Moreover, in the scaling phase of the EEAS-based CORDIC algorithm, we suggest a novel scaling operation, called the Extended Type-II (ET-II) scaling operation. The ET-II scaling operation applies the same design concepts as the EEAS scheme. It results in a much smaller quantization error than the conventional Type-I scaling operation in the numerical approximation of the scaling factor. By combining the aforementioned new schemes, the proposed EEAS-based CORDIC algorithm can improve the overall SQNR performance by up to 25 dB compared with previous works. Also, given the same target SQNR performance, it requires only about 66% of the iterations in the iterative CORDIC structure, or about 66% of the hardware complexity in the parallel CORDIC structure, compared with the conventional AR technique. Hence, high-performance/low-latency CORDIC very large-scale integration architectures can be achieved without degrading the SQNR performance.
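
For orientation, the conventional greedy angle recoding that the abstract uses as its baseline can be sketched roughly as follows. This is a minimal illustration only, not the paper's EEAS or TBS scheme: it assumes the standard elementary angles atan(2^-i), a target angle of magnitude at most pi/4, and floating-point arithmetic in place of fixed-point hardware. The function names (`elementary_angles`, `greedy_angle_recoding`, `rotate`) are hypothetical and chosen for this example.

```python
import math

def elementary_angles(n_bits):
    """Standard CORDIC elementary angles atan(2^-i), i = 0..n_bits-1."""
    return [math.atan(2.0 ** -i) for i in range(n_bits)]

def greedy_angle_recoding(theta, n_bits=16, max_steps=None):
    """Greedily decompose theta into signed elementary angles.

    At each step the elementary angle closest to the magnitude of the
    residual is selected; steps that would not shrink the residual are
    skipped by stopping early. Returns ([(index, sign), ...], residual).
    """
    angles = elementary_angles(n_bits)
    if max_steps is None:
        max_steps = n_bits  # assumption: at most one step per bit of precision
    residual = theta
    steps = []
    for _ in range(max_steps):
        mag = abs(residual)
        i = min(range(n_bits), key=lambda k: abs(mag - angles[k]))
        if abs(mag - angles[i]) >= mag:
            break  # no elementary angle reduces the residual any further
        sign = 1 if residual >= 0 else -1
        steps.append((i, sign))
        residual -= sign * angles[i]
    return steps, residual

def rotate(x, y, steps):
    """Apply the selected shift-and-add microrotations, then undo the gain."""
    gain = 1.0
    for i, sign in steps:
        t = 2.0 ** -i
        x, y = x - sign * y * t, y + sign * x * t
        gain *= math.sqrt(1.0 + t * t)
    # In hardware the scaling is itself approximated by shifts and adds;
    # an exact division is used here only to keep the sketch short.
    return x / gain, y / gain

steps, residual = greedy_angle_recoding(0.6)
x, y = rotate(1.0, 0.0, steps)
print(len(steps), residual, x, y)
```

Because the greedy search commits to the locally closest angle at every step, it can miss better combinations; that is the gap a search over several candidate paths, such as the trellis-based search named in the abstract, is meant to close.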

  • ABSTRACT: Lattice structures have several advantages over the tapped delay line form, especially for the hardware implementation of general digital filters. They are also efficient for implementing quadrature mirror filters (QMFs), because perfect reconstruction is preserved under coefficient quantization. Moreover, if the lattice coefficients are implemented in signed powers-of-two (SPT), the hardware complexity can also be reduced. However, the discrete coefficient space of the SPT representation is sparse when the number of nonzero bits is small. This paper proposes an orthogonal QMF lattice structure with SPT coefficients that has a much denser discrete coefficient space than the conventional structure. While conventional approaches directly quantize the lattice coefficients into SPT form, the proposed algorithm performs the quantization in the SPT angle space. For this, each lattice stage is implemented as a cascade of several variants of the COordinate Rotation DIgital Computer (CORDIC). The resulting angle space and the corresponding discrete coefficient space are much denser than those generated by the conventional direct quantization approach. An efficient coefficient search algorithm for this structure is also proposed. Since the proposed architecture provides a denser coefficient space, it shows less coefficient quantization error than the conventional QMF lattice.
    Circuits and Systems I: Regular Papers, IEEE Transactions on 07/2006; 53(6-53):1254 - 1265. DOI:10.1109/TCSI.2006.870478 · 2.30 Impact Factor
  • ABSTRACT: Synthetic transmit aperture (STA) imaging has recently been widely investigated in ultrasound systems because of its high frame rate and low hardware cost. Since the high-resolution image (HRI) of STA is formed by summing low-resolution images (LRIs), it is susceptible to inter-firing motion. In this paper, we propose a low-complexity global motion compensation algorithm. We use the common region of interest $(\mathrm{ROI}_{\mathrm{com}})$ between different transmissions of STA imaging to beamform backward and forward beam vectors. Then, the magnitude and direction of motion can be evaluated by cross-correlations between specific beam vectors in STA imaging. Compared with the uncompensated image in a two-dimensional (2D) motion environment, the proposed motion compensation algorithm improves the contrast ratio (CR) and contrast-to-noise ratio (CNR) by 13.73 and 2.04 dB, respectively. The proposed algorithm also improves the CR and CNR by about 7.84 and 1.36 dB, respectively, compared with the reference work. In the Field II breath model, the proposed method improves the CR and CNR by about 6.65 and 1.04 dB, respectively, over the reference method. Moreover, we propose a low-complexity delay generator in the architecture design to further reduce the computational complexity of the whole beamforming system. Finally, we verify the proposed low-complexity motion compensation beamforming engine with a VLSI implementation in 90 nm CMOS technology. In the post-layout result, the core size is 2.39 mm$^2$ at a 125 MHz operating frequency, and the frame rate of the beamforming system is 42.23 frames per second.
    IEEE Transactions on Signal Processing 02/2014; 62(4):840-851. DOI:10.1109/TSP.2013.2295551 · 3.20 Impact Factor
  • ABSTRACT: This paper presents an area-time efficient CORDIC algorithm that completely eliminates the scale factor. By suitable selection of the order of approximation of the Taylor series, the proposed CORDIC circuit meets the accuracy requirement and attains the desired range of convergence. In addition, we propose an algorithm to redefine the elementary angles in order to reduce the number of CORDIC iterations. A generalized micro-rotation selection technique based on high-speed most-significant-1 detection obviates the complex search algorithms otherwise needed to identify the micro-rotations. The proposed CORDIC processor provides the flexibility to adjust the number of iterations depending on the accuracy, area, and latency requirements. Compared with existing recursive architectures, the proposed one has a 17% lower slice-delay product on a Xilinx Spartan XC2S200E device.
    IEEE Transactions on Very Large Scale Integration (VLSI) Systems 08/2012; 20(8):1542-1546. DOI:10.1109/TVLSI.2011.2158459 · 1.14 Impact Factor