ABSTRACT: A vector algorithm for the computation of the two-dimensional Discrete Cosine Transform (2D-VDCT) is presented. The formulation of the 2D-VDCT by means of elements of multilinear algebra not only offers a formalism for describing the algorithm, but also enables the derivation, by purely algebraic manipulations, of an algorithm that is well suited for implementation on vector-SIMD signal processors with a scalable level of parallelism. The vector formulation of the 2D-VDCT can be implemented in a matrix-oriented language, and a suitable compiler generates code for our family of STA (Synchronous Transfer Architecture) vector architectures with different amounts of SIMD parallelism. We show in this paper how significant speedup factors can be achieved with this methodology.
I. INTRODUCTION
The two-dimensional DCT plays a paramount role in video and image compression techniques. Over the years, many fast algorithms have been proposed for the computation of the DCT. Most publications related to implementation issues of the DCT concentrate on VLSI implementations. In this paper we address the implementation of a fast DCT algorithm on our family of STA processor cores featuring SIMD-vector parallelism. In recent years, the SIMD-vector computational model has made its way from classical supercomputers to real-time embedded applications. In fact, vector signal processors have emerged with the promise of delivering flexibility and processing power for number-crunching algorithms at reasonable levels of power consumption. In (1), we presented a novel microarchitecture for designing and implementing low-power, high-performance DSP cores. We call this architectural template Synchronous Transfer Architecture (STA).
Moreover, in (2) we presented a hardware design methodology that enables the rapid silicon implementation of SIMD-vector processors with different levels of parallelism based on our STA architectural template. The fast computation of signal transforms such as the DCT is based on iterative divide-and-conquer algorithms: the transformation matrix is expressed as a function of smaller transformation matrices. Thus, the original computation, which operates on high-dimensional vector spaces, is reduced to computations with smaller transformation matrices that operate on lower-dimensional vector spaces. The iterative formulation of the original transformation matrix is achieved by a suitable permutation of the input samples. Elements of multilinear algebra are especially suitable for describing this sort of algorithm. On the one hand, the rich framework offered by multilinear algebra allows for expressing the recursive nature of divide-and-conquer algorithms. On the other hand, it also enables the manipulation and derivation of new algorithms by exploiting purely algebraic properties. Especially interesting are those algebraic manipulations that reveal the vector operations of the algorithm, since they lead to formulations of algorithms that process data in vector fashion. These ideas are discussed in detail in (3), (4), and they encouraged many researchers to publish a series of papers. Most of these papers address the derivation of vector algorithms for the classical example of the Fast Fourier Transform (FFT). Especially interesting is the work by Franchetti (5), where an algorithm for the vector computation of the FFT is presented. In this paper we present the design of a vector algorithm for the two-dimensional DCT based on the framework of multilinear algebra. Once a suitable algorithm is designed, we implement it in a matrix-oriented language such as Matlab™.
Such a language allows for expressing vector algorithms described in the notation of multilinear algebra. A suitable compiler can recognize these operators and generate a sequence of vector machine instructions for our family of STA DSP cores. We show that significant speedup factors are achieved by this methodology. The remainder of this paper is organized as follows. In Section II we present our STA architectural template. In Section III we introduce some elements of multilinear algebra. In Section IV we use this algebraic framework to derive the 2D-VDCT algorithm. In Section V we introduce our compiler infrastructure and the results obtained from automatic code generation. Finally, in Section VI we present our conclusions.
Embedded Computer Systems: Architectures, Modeling, and Simulation 5th International Workshop, SAMOS 2005, Samos, Greece, July 18-20, 2005, Proceedings; 01/2005
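The Kronecker-product view underlying the 2D-VDCT abstract above can be checked numerically. The following minimal NumPy sketch (Python rather than the Matlab mentioned in the text; the function names `dct_matrix`, `dct2_row_column`, and `dct2_kronecker` are hypothetical) assumes the orthonormal DCT-II on an N×N block and shows that the separable row-column form Y = C X C^T equals the single product vec(Y) = (C ⊗ C) vec(X). It illustrates the multilinear-algebra identity only; it is not the 2D-VDCT factorization derived in the paper.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix: C[k, j] = s_k * cos(pi * (2j + 1) * k / (2n))."""
    k = np.arange(n).reshape(-1, 1)
    j = np.arange(n).reshape(1, -1)
    c = np.cos(np.pi * (2 * j + 1) * k / (2 * n))
    c[0, :] *= np.sqrt(1.0 / n)   # s_0 = sqrt(1/n)
    c[1:, :] *= np.sqrt(2.0 / n)  # s_k = sqrt(2/n) for k > 0
    return c

def dct2_row_column(x: np.ndarray) -> np.ndarray:
    """Separable 2D DCT: 1D DCTs of all columns, then of all rows, Y = C X C^T."""
    c = dct_matrix(x.shape[0])
    return c @ x @ c.T

def dct2_kronecker(x: np.ndarray) -> np.ndarray:
    """Same transform as one matrix-vector product, vec(Y) = (C kron C) vec(X).

    Uses the identity vec(A X B) = (B^T kron A) vec(X) for column-major vec().
    """
    n = x.shape[0]
    c = dct_matrix(n)
    y_vec = np.kron(c, c) @ x.flatten(order="F")
    return y_vec.reshape((n, n), order="F")

x = np.random.default_rng(0).standard_normal((8, 8))
print(np.allclose(dct2_row_column(x), dct2_kronecker(x)))  # True
```

`np.kron` materializes an N²×N² matrix, so this form is only useful for checking the identity; a vector algorithm instead exploits the structure of the Kronecker factors and the associated permutations.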
ABSTRACT: Today's communications systems, especially in the field of wireless communications, rely on many different algorithms to provide applications with constantly increasing data rates and higher quality. This development, combined with the characteristics of the wireless channel and the invention of turbo codes, has particularly increased the importance of interleaver algorithms. In this paper, we demonstrate the feasibility of exploiting hardware parallelism to accelerate the interleaving procedure. Based on a heuristic algorithm, we present the achievable speedup for different interleavers as a function of the degree of hardware parallelism. The parallelization is generic in the sense that the assumed underlying hardware is a parallel-datapath DSP architecture, and it therefore provides the flexibility of software solutions.
Journal of VLSI Signal Processing 01/2005; 39. · 0.73 Impact Factor
ABSTRACT: Recently, communication systems with multiple transmit and receive antennas (MIMO) have been introduced and proven suitable for achieving high spectral efficiency. Assuming full channel knowledge at the receiver, so-called sphere detectors perform a tree search and generate multiple hypotheses about the transmitted symbol, from which soft-output information can be derived for each bit. Most schemes constrain the search to a certain radius, estimated from the desired number of hypotheses and statistical channel properties. However, the actual number of hypotheses found in this case still deviates strongly from the desired quantity. In this paper, we introduce a new algorithm that determines a more precise search radius by considering the current channel realization and received signal. Simulation results show improved performance, a strong reduction of the deviation in the number of hypotheses, and thus a low variance in complexity, which is very important for practical implementation.
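The sphere constraint behind these detectors can be sketched by brute force: every candidate x with ||y - Hx||² ≤ r² is a hypothesis. The sketch below assumes a small complex 2×2 MIMO system with a QPSK alphabet and uses exhaustive enumeration instead of the pruned tree search a real sphere detector performs; the function name is hypothetical, and it does not implement the radius-selection algorithm proposed in the abstract. It only shows how the number of hypotheses depends on the chosen radius and on the channel realization.

```python
import itertools
import numpy as np

def hypotheses_in_sphere(y, h, constellation, radius):
    """All candidate vectors x with ||y - H x||^2 <= radius^2, sorted by distance.

    Exhaustive search for illustration; a sphere detector finds the same set
    with a pruned tree search instead of visiting every candidate.
    """
    found = []
    for cand in itertools.product(constellation, repeat=h.shape[1]):
        x = np.array(cand)
        dist2 = float(np.sum(np.abs(y - h @ x) ** 2))
        if dist2 <= radius ** 2:
            found.append((dist2, cand))
    found.sort(key=lambda item: item[0])
    return found

rng = np.random.default_rng(1)
h = (rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))) / np.sqrt(2)
qpsk = [complex(a, b) / np.sqrt(2) for a in (-1, 1) for b in (-1, 1)]
x_true = np.array([qpsk[0], qpsk[3]])
y = h @ x_true + 0.1 * (rng.standard_normal(2) + 1j * rng.standard_normal(2))
for r in (0.5, 1.0, 2.0):
    print(r, len(hypotheses_in_sphere(y, h, qpsk, r)))
```

A fixed radius yields a different hypothesis count for every channel and noise realization, which is the deviation the abstract sets out to reduce.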
ABSTRACT: The outage behavior of a single-cell DS-CDMA system using higher-order modulation schemes is investigated for the uplink in the presence of flat Rayleigh fading channels. For this purpose, results of the asymptotic analysis of large CDMA systems with random spreading codes are used to obtain the signal-to-interference-and-noise ratio after the linear minimum mean-squared-error receiver. This information is used to determine the outage probabilities for different rates, both for coding at capacity and for modulation and coding at cut-off rate. The influence of the system load on the outage is determined analytically and compared to simulation results. Furthermore, an expression is presented that allows one to easily determine the additional signal-to-noise ratio required for modulation and coding at cut-off rate from the results for coding at capacity. Finally, the investigations are extended to determine the maximum spectral efficiency for given outage constraints and modulation schemes. The optimum loads corresponding to the maximum efficiencies are presented and shown to vary for the different modulation schemes.
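The large-system SINR after the MMSE receiver can be computed from the Tse and Hanly fixed-point equation for random spreading. The sketch below assumes unit noise power, i.i.d. Rayleigh fading with the same mean received SNR for all users, Gaussian signaling (coding at capacity), and load α = K/N; higher-order modulation and the cut-off-rate analysis of the abstract are not covered, and the function names are hypothetical. Under these assumptions each user's SINR factorizes as P_k·η, where η solves η = 1 / (1 + α E[P / (1 + Pη)]), so the outage probability has a closed form in η.

```python
import numpy as np

def mmse_efficiency(load, mean_snr, n_quad=40, tol=1e-12, max_iter=10_000):
    """Solve eta = 1 / (1 + load * E[P / (1 + P * eta)]) by fixed-point iteration.

    P = mean_snr * g with g ~ Exp(1) (Rayleigh power gain), noise power 1.
    The expectation uses Gauss-Laguerre quadrature: E[f(g)] ~ sum(w * f(nodes)).
    User k then has SINR_k = P_k * eta.
    """
    nodes, weights = np.polynomial.laguerre.laggauss(n_quad)
    p = mean_snr * nodes
    eta = 0.0
    for _ in range(max_iter):
        new = 1.0 / (1.0 + load * np.sum(weights * p / (1.0 + p * eta)))
        if abs(new - eta) < tol:
            return new
        eta = new
    return eta

def outage_probability(rate, load, mean_snr):
    """P[log2(1 + SINR_k) < rate] when the user's power gain g ~ Exp(1)."""
    eta = mmse_efficiency(load, mean_snr)
    g_min = (2.0 ** rate - 1.0) / (mean_snr * eta)  # outage iff g < g_min
    return 1.0 - np.exp(-g_min)

for load in (0.25, 0.5, 1.0):
    print(load, outage_probability(rate=1.0, load=load, mean_snr=10.0))
```

With `load=0` the expression reduces to the single-user Rayleigh outage 1 - exp(-(2^R - 1)/SNR), and increasing the load lowers η and therefore raises the outage probability.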
ABSTRACT: Recently, the concept of forced convergence decoding for Low-Density Parity-Check (LDPC) codes has been introduced. By restricting message passing in the iterative process to the nodes that still contribute significantly to the decoding result, this approach allows for a substantial reduction in decoding complexity at a negligible deterioration in performance. We analyze this novel technique using EXIT charts and show how it compares to, and can be combined with, other complexity reduction techniques. Our findings imply that forced convergence works effectively in conjunction with other complexity reduction techniques while retaining its attractiveness in terms of the complexity-performance trade-off.
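To make the idea of restricting message passing concrete, here is a minimal flooding min-sum decoder in which a variable node is deactivated once the magnitude of its a-posteriori LLR exceeds a threshold θ; frozen nodes keep sending their last messages. The fixed-threshold rule and the function name are assumptions for illustration, not necessarily the exact criterion analyzed in the paper, and no EXIT-chart analysis is attempted.

```python
import numpy as np

def min_sum_forced_convergence(H, llr_ch, theta=np.inf, max_iter=50):
    """Flooding min-sum LDPC decoding with a simple forced-convergence rule.

    LLRs use the convention log(P(bit=0) / P(bit=1)). A variable node whose
    |a-posteriori LLR| exceeds theta is frozen: its outgoing messages are no
    longer updated. theta=np.inf gives plain min-sum.
    Returns (hard_decision, iterations_used, variable_node_updates).
    """
    H = np.asarray(H, dtype=bool)
    llr_ch = np.asarray(llr_ch, dtype=float)
    m, n = H.shape
    q = np.where(H, llr_ch[None, :], 0.0)  # variable-to-check messages
    r = np.zeros((m, n))                   # check-to-variable messages
    active = np.ones(n, dtype=bool)
    hard = (llr_ch < 0).astype(int)
    updates = 0
    for it in range(1, max_iter + 1):
        # Check-node update: sign product and minimum magnitude of the other inputs.
        for c in range(m):
            idx = np.flatnonzero(H[c])
            vals = q[c, idx]
            signs = np.where(vals < 0, -1.0, 1.0)
            mags = np.abs(vals)
            sign_all = np.prod(signs)
            for k, v in enumerate(idx):
                r[c, v] = sign_all * signs[k] * np.delete(mags, k).min()
        # A-posteriori LLRs; variable-node update for active nodes only.
        post = llr_ch + r.sum(axis=0)
        for v in np.flatnonzero(active):
            rows = H[:, v]
            q[rows, v] = post[v] - r[rows, v]
            updates += 1
        active &= np.abs(post) <= theta  # freeze nodes that are already reliable
        hard = (post < 0).astype(int)
        if not np.any((H.astype(int) @ hard) % 2):  # all parity checks satisfied
            return hard, it, updates
    return hard, max_iter, updates

# (7,4) Hamming parity-check matrix, all-zero codeword, one unreliable wrong bit
H = np.array([[1, 1, 0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]])
llr = np.full(7, 4.0)
llr[0] = -1.0
print(min_sum_forced_convergence(H, llr, theta=5.0))
```

The `updates` counter is a crude complexity measure: with a finite `theta`, frozen variable nodes stop contributing new messages in later iterations.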
ABSTRACT: Cooperative relaying has recently emerged as a viable option for future wireless networks. By simultaneously exploiting the path-loss savings known from relaying scenarios and the diversity inherent to any scheme involving spatially separated transmitters, this technique is able to leverage gains from both relaying and spatial diversity techniques. In this paper, we study different cooperative relaying protocols and compare their performance with that of direct transmission and conventional relaying. We investigate under which conditions the developed techniques provide gains over other approaches. Our results confirm that cooperative relaying is an effective means of enhancing the performance of wireless systems whenever temporal and frequency diversity is scarce.
European Transactions on Telecommunications 01/2005; 16:5-16. · 1.05 Impact Factor
ABSTRACT: The system spectral efficiency is considered for the single-cell DS-CDMA uplink with random spreading and flat Rayleigh fading channels. Based on the asymptotic analysis of large systems, the signal-to-interference-and-noise ratio after the linear minimum mean-squared-error detector is determined analytically for multi-code transmission. This measure is used to obtain the maximum rate, and thus the efficiency, by applying the concepts of coding at capacity and at cut-off rate. The efficiency of multiple codes is compared to that of multi-level modulation for different signal-to-noise ratios and system loads. It is shown that multi-code transmission improves efficiency for lower loads but cannot increase its maximum value. In contrast, with multi-level modulation the maximum efficiency can be increased in the low-noise region. The accuracy of these results is investigated by means of simulations; simulation and asymptotic theoretical results agree very closely. Finally, the optimal parameters are compared to the UMTS parameter set. The study suggests increasing both the spreading factor and the modulation order to achieve better spectral efficiency when using a linear minimum mean-squared-error detector.
ABSTRACT: Many digital signal processors (DSPs), and also microprocessors, employ the single-instruction multiple-data (SIMD) paradigm for controlling their data paths. While this can provide high computational power and efficiency, not all applications can profit from this feature. One important application of DSPs is recursive filtering. Due to their data dependencies, recursive filters cannot exploit the capabilities of SIMD-controlled DSPs. This paper introduces enhancements of the SIMD control paradigm to accommodate recursive filters. Three methods for calculating recursive filters on SIMD-controlled DSPs, together with their requirements for control and data transfer, are presented. Their performance and hardware requirements are evaluated to determine the most efficient solution in terms of the area-time (AT) product.
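The data dependency can be seen on a first-order section y[n] = a·y[n-1] + x[n]. One generic workaround, assumed here and not necessarily one of the three methods presented in the paper, is block look-ahead: for a block of P lanes, y[n+k] = a^(k+1)·y[n-1] + Σ_{j≤k} a^(k-j)·x[n+j], so all P outputs depend only on the last output of the previous block and can be computed in one SIMD-style matrix-vector step. The NumPy sketch below (hypothetical function names) checks this against the sequential recursion.

```python
import numpy as np

def iir1_sequential(x, a, y_init=0.0):
    """Reference: y[n] = a * y[n-1] + x[n], one sample at a time."""
    y = np.empty(len(x))
    prev = y_init
    for n, xn in enumerate(x):
        prev = a * prev + xn
        y[n] = prev
    return y

def iir1_block_lookahead(x, a, lanes=4, y_init=0.0):
    """Compute the same recursion `lanes` outputs at a time.

    Within a block, y[n+k] = a**(k+1) * y[n-1] + sum_{j<=k} a**(k-j) * x[n+j],
    so every lane needs only the last output of the previous block.
    """
    k = np.arange(lanes)
    diff = k[:, None] - k[None, :]
    # Lower-triangular Toeplitz matrix T[k, j] = a**(k - j) for j <= k, else 0.
    t = np.where(diff >= 0, float(a) ** np.maximum(diff, 0), 0.0)
    feedback = float(a) ** (k + 1)
    x = np.asarray(x, dtype=float)
    pad = (-len(x)) % lanes  # zero-pad to a whole number of blocks
    blocks = np.concatenate([x, np.zeros(pad)]).reshape(-1, lanes)
    out = np.empty_like(blocks)
    prev = y_init
    for b, xb in enumerate(blocks):
        out[b] = feedback * prev + t @ xb  # one vector step over all lanes
        prev = out[b, -1]
    return out.ravel()[: len(x)]

x = np.random.default_rng(2).standard_normal(37)
print(np.allclose(iir1_block_lookahead(x, 0.9, lanes=8), iir1_sequential(x, 0.9)))  # True
```

The price is extra multiplications per output (the Toeplitz product) and, for |a| close to 1 with many lanes, a wider dynamic range of the look-ahead coefficients, which matters on fixed-point DSPs.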
ABSTRACT: To achieve high performance at low hardware overhead as an alternative to application-specific integrated circuits (ASICs), application-specific DSPs (AS-DSPs) are increasingly used. However, designing them is still a tedious, time-consuming, and error-prone task, since each application has to be analyzed thoroughly, which is usually done by hand. Recently, we proposed a platform approach to designing data paths for AS-DSPs. In this paper we introduce a compiler-based methodology to speed up and simplify the customization of the data-path platform. Based on the SUIF compiler framework from Stanford University, we implemented analysis passes to determine the kind and useful number of functional units, the potential degree of parallelism, and the required connectivity between functional units.
Parallel Computing in Electrical Engineering, 2004. PARELEC 2004. International Conference on; 10/2004
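The kind of analysis described in the AS-DSP abstract above (functional-unit mix, degree of parallelism) can be sketched on a data-flow graph with an as-soon-as-possible (ASAP) schedule. The graph encoding and the functions `asap_levels` and `analyze_dfg` are hypothetical and unrelated to the SUIF API; the sketch assumes an acyclic graph, unit-latency operations, and unlimited resources.

```python
from collections import Counter, defaultdict

def asap_levels(dfg):
    """Return {node: ASAP level}; nodes without predecessors get level 0.

    dfg maps node -> (op_type, [predecessor nodes]) and must be acyclic.
    """
    level = {}

    def visit(node):
        if node not in level:
            _, preds = dfg[node]
            level[node] = 1 + max((visit(p) for p in preds), default=-1)
        return level[node]

    for node in dfg:
        visit(node)
    return level

def analyze_dfg(dfg):
    """Estimate functional units per op type, schedule depth, and average parallelism."""
    level = asap_levels(dfg)
    ops_per_level = defaultdict(Counter)
    for node, lvl in level.items():
        ops_per_level[lvl][dfg[node][0]] += 1
    depth = max(level.values()) + 1        # critical path length in steps
    units = Counter()
    for counts in ops_per_level.values():  # peak concurrent use of each op type
        for op, n in counts.items():
            units[op] = max(units[op], n)
    return dict(units), depth, len(dfg) / depth

# 4-tap dot product: four multiplies feeding an adder tree
dfg = {
    "m0": ("mul", []), "m1": ("mul", []), "m2": ("mul", []), "m3": ("mul", []),
    "a0": ("add", ["m0", "m1"]), "a1": ("add", ["m2", "m3"]),
    "a2": ("add", ["a0", "a1"]),
}
units, depth, parallelism = analyze_dfg(dfg)
print(units, depth, parallelism)  # units == {"mul": 4, "add": 2}, depth == 3
```

Taking the per-level peak gives the unit count for a fully parallel ASAP schedule; a real customization flow would trade this against area by rescheduling with fewer units.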
ABSTRACT: Nodes that cannot reach the base station directly are given the possibility to access it via other nodes, making a much larger network feasible. But this is not the whole truth. Even though the source node may save energy, relaying nodes have to spend transmit energy as well. Moreover, in order to receive the packets properly, the relays have to be in receive mode for some time. Of course, several MAC schemes have been developed to reduce idle listening and overhearing. Nevertheless, even in fully synchronized networks, where all nodes know their neighbors and their duty cycles, the receive energy and transmit energy have to be included in the calculation. Additional communication overhead has to be considered as well. Even worse, nodes close to the base station would have to handle more traffic, leading to an energy-unbalanced system and a decreased network lifetime. So the overall energy consumption for a data transmission using multiple hops may be worse than for the
ABSTRACT: Future wireless communications systems are expected to provide ever higher data rates. Still, devices have to be produced at reasonable cost in order to be affordable to customers. The widely known impairments of analog RF circuits ("dirt effects") tend to become more severe with the large transmission bandwidths and high carrier frequencies that usually come with increased data throughput.
ABSTRACT: Cooperative relaying is a recently developed concept that allows single-antenna devices to obtain gains from spatial diversity. So far, the performance of these schemes has mainly been investigated in comparison to conventional multiple-antenna systems and conventional relaying techniques. Yet, the cooperation of mobile terminals offers another important enhancement. Whenever other sources of diversity are scarce, transmission over a statistically independent relay path can provide a significant amount of spatial diversity, which can then be exploited by error correction techniques to effectively combat fading. We therefore examine the performance of cooperative relaying protocols in slow and fast fading regimes, in comparison to approaches that exploit temporal diversity. Our results imply that user cooperation is a powerful means of enhancing link-level performance in environments where temporal diversity is limited and delay constraints preclude the use of larger interleavers.
ABSTRACT: The paper considers various relaying strategies for wireless networks. We comparatively discuss and analyse direct transmission, conventional "multihop" relaying, and the novel concepts of cooperative relaying from the viewpoint of system-level performance. While conventional relaying exploits path-loss savings, cooperative relaying additionally takes advantage of two inherent properties of relay-based systems: the broadcast nature of the wireless medium and the diversity offered by the relay channel. Following a description of these concepts, we analyse the performance of such systems, by way of example, for power-controlled cellular and ad hoc CDMA systems. The resulting power savings and capacity improvements suggest that cooperative relaying may be an interesting candidate for future cellular and ad hoc network architectures.
ABSTRACT: This paper provides a general analysis of cooperative amplify-and-forward relay networks and gives a unified and comparative view of various proposed systems. The studied cooperative relay networks offer spatial diversity gains that conventional multi-antenna systems often fail to achieve due to limitations of the feasible number of antennas per terminal and correlated propagation. We first present a general analysis for the case of multiple antennas per node, which is subsequently used to investigate the relative attractiveness of cooperative schemes for various parameters that include SNR, spectral efficiency, and spatial correlation. We conclude that such virtual antenna arrays are an attractive option to overcome the drawbacks of conventional relaying systems.
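The diversity argument can be illustrated with the classic single-antenna amplify-and-forward setup (one source, one relay, one destination, maximum-ratio combining at the destination), in which the relay path contributes an end-to-end SNR of γ_sr·γ_rd / (γ_sr + γ_rd + 1). The Monte Carlo sketch below assumes independent Rayleigh fading on all links and a repetition protocol that occupies two time slots, hence the factor 1/2 on the rate; the function name is hypothetical, and this is not the general multi-antenna analysis of the paper.

```python
import numpy as np

def outage_direct_vs_af(mean_snr_sd, mean_snr_sr, mean_snr_rd, rate,
                        trials=200_000, seed=0):
    """Monte Carlo outage probabilities (direct, amplify-and-forward) in Rayleigh fading."""
    rng = np.random.default_rng(seed)
    # Instantaneous SNRs are exponential with the given means (Rayleigh amplitudes).
    g_sd = mean_snr_sd * rng.exponential(size=trials)
    g_sr = mean_snr_sr * rng.exponential(size=trials)
    g_rd = mean_snr_rd * rng.exponential(size=trials)
    p_direct = np.mean(np.log2(1 + g_sd) < rate)
    g_relay = g_sr * g_rd / (g_sr + g_rd + 1)  # end-to-end SNR of the AF relay path
    # Two slots (source broadcast, relay forward) halve the achievable rate.
    p_af = np.mean(0.5 * np.log2(1 + g_sd + g_relay) < rate)
    return p_direct, p_af

for snr_db in (10, 20, 30):
    snr = 10 ** (snr_db / 10)
    print(snr_db, outage_direct_vs_af(snr, snr, snr, rate=1.0))
```

At high SNR the cooperative outage falls off faster than the direct one (second-order diversity), while at low SNR the rate penalty of the two-slot protocol can make direct transmission preferable.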
ABSTRACT: Recently, cooperative diversity has emerged as a means of providing gains from spatial diversity to devices with single antennas. Yet, the performance of these protocols remains limited in symmetric networks. In this paper, we investigate the performance of a novel "detached" cooperative diversity protocol that is designed for asymmetric networks, both in terms of outage probability and frame error rate. The influence of data rate, path loss, and network geometry on the performance of the proposed protocol is studied, and the usage region, in which cooperative schemes outperform direct transmission, is derived.
ABSTRACT: The growing use of digital signal processors (DSPs) in embedded systems necessitates optimizing compilers that support their special architectural features. Besides irregular DSP architectures for reducing chip size and energy consumption, single instruction multiple data (SIMD) functionality is frequently integrated with the intention of improving performance. In order to obtain an energy-efficient system consisting of processor and compiler, it is necessary to optimize hardware as well as software. It is not obvious that SIMD operations can save any energy: if n operations are executed in parallel, each of them might consume the same amount of energy as if they were executed sequentially. Up to now, no work has investigated the influence of compiler-generated code containing SIMD operations on energy consumption. This paper explores the energy-saving potential of SIMD operations for a DSP by using a generic compilation framework that includes an integrated instruction-level energy cost model for our target architecture. Effects of SIMD operations on energy consumption are shown for several benchmarks and an MP3 application.
ABSTRACT: Application-tailored signal processors fill the gap between ASICs and general-purpose DSPs. Single Instruction Multiple Data (SIMD) signal processors offer high computational power with low control overhead. This paper describes the development of a multiprocessor OFDM system using automatically generated SIMD-DSP cores. The focus of this case study was to test our integrated design flow, which is based on our core generation tool. We show how our design methodology reduces the design cycle in comparison with other HW/SW co-design tools and traditional design flows.
Computer Systems: Architectures, Modeling, and Simulation, Third and Fourth International Workshops, SAMOS 2003 and SAMOS 2004, Samos, Greece, July 21-23, 2003 and July 19-21, 2004, Proceedings; 01/2004