ABSTRACT: Developing a high-quality compiler for application-specific instruction-set processors (ASIPs), including DSPs for multimedia applications, is challenging. The specialization in ASIPs often calls for extensions to high-level languages so that designers can exploit the specialized capabilities. This in turn requires the compiler frontend to handle the new syntax and carry the designers' intentions across to the backend. The backend also requires extra effort to make optimized use of the specialized features of ASIPs. Meanwhile, because applications are diverse, the compiler must compensate for some shortcomings of ASIPs: library functions are added to support operations, such as floating-point arithmetic, that the ASIP may not support. As embedded parallelism develops, advanced ASIP compilers also need to support parallelism for future applications. This paper describes the High-performance C Compiler (HCC) and its implementation for an industrial ASIP and its processor family. HCC is a C compiler extended and retargeted from GCC. A compiler extension framework is proposed for processing programming syntax extensions of standard ANSI C for the ASIPs. With target-specific implementation, an added library of optimized arithmetic functions, a chip definition file (CDF), and header files for the corresponding ASIPs, HCC can be enhanced to match the processing capabilities of the target processors. Finally, this paper describes a new static allocation and scheduling scheme for loop parallelization, based on the OpenMP specification, to reduce load imbalance. We have conducted analysis and extensive experiments to verify the correctness and effectiveness of HCC with the presented ideas. The results show that HCC has stable performance and excellent code quality, and that it has been used in the market.
Article · Jan 2015 · Multimedia Tools and Applications
ABSTRACT: Traditional barcode recognition algorithms usually work for codes on flat surfaces but not for codes on cylindrical ones. This paper proposes a low-cost approach to recognizing curved QR codes printed on bottles or cans. Finder patterns are extracted by detecting module-width proportions and contour corners, and an efficient direct least-squares ellipse fitting method is employed to extract the elliptic edge and the boundary of the code region. The code is then reconstructed by mapping directly from the stereoscopic coordinates to the image plane using 3D back-projection, so that the data in the code can be restored. Compared with previous approaches, the proposed algorithm requires less computation and achieves higher recognition accuracy on both flat and cylindrical surfaces.
ABSTRACT: This paper proposes an innovative method for analyzing software specifications using a model based on the Markov chain. All software is executed via the instruction set of the processor. Since the instruction set can be classified and divided into a finite series of states (i.e., a finite-state machine), programs built on it exhibit different characteristics of the Markov process in each state. More importantly, the transition probabilities can be calculated through the Markov chain in the form of different sparse matrices, and, with the help of the Discrete Fourier Transform (DFT), a model of the software can be obtained. Once the modeling is done, the software can be optimized in both the hardware design and the compiling process, which differs from the usual optimization applied only during hardware design. Experimental results show that the model greatly reduces the difficulty of solving problems such as hit rate and cache consistency.
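The transition-probability step described in this abstract can be illustrated with a minimal sketch. It assumes a first-order Markov chain estimated from an instruction trace; the instruction classes (`alu`, `load`, `store`, `branch`), the trace, and the function name are hypothetical, and the DFT-based modeling step is not reproduced here.

```python
def transition_matrix(trace, states):
    """Estimate first-order Markov transition probabilities from a trace.

    Row i, column j holds the observed probability that state j
    immediately follows state i in the trace.
    """
    index = {s: i for i, s in enumerate(states)}
    n = len(states)
    counts = [[0] * n for _ in range(n)]
    # Count every adjacent (current, next) pair in the trace.
    for cur, nxt in zip(trace, trace[1:]):
        counts[index[cur]][index[nxt]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # States never seen as a predecessor keep an all-zero row.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Hypothetical instruction classes and trace, for illustration only.
STATES = ["alu", "load", "store", "branch"]
TRACE = ["load", "alu", "alu", "store", "branch", "load", "alu", "store"]
P = transition_matrix(TRACE, STATES)

for state, row in zip(STATES, P):
    print(state, [round(p, 3) for p in row])
```

Most entries of such a matrix are zero for real instruction traces, which is consistent with the sparse-matrix form mentioned in the abstract.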
ABSTRACT: In this paper, a new Side-poly electrode is introduced to improve the peak electric field of the conventional Trench-NPT-IGBT. By providing an additional electric field opposite to the original one, the proposed structure counteracts the electric field concentration under the bottom of the trench, and therefore increases the breakdown voltage and reduces the threshold voltage at the same time. Simulation results show that the new IGBT device has a breakdown voltage 90 V higher and a threshold voltage 0.63 V lower than the conventional Trench-NPT-IGBT.
ABSTRACT: Complicated calculations and data dependencies have long limited further application of the H.264/AVC standard. To address this problem, this paper proposes a dual-parallel architecture that combines block-parallel and mode-parallel processing to speed up the process. Since increased parallelism leads to increased hardware consumption, a formula-sharing method is presented to reduce the hardware cost. Experimental results show that, synthesized with a TSMC 0.18 μm CMOS cell library, the new architecture requires fewer than 135 K gates and can encode 1080p HD video sequences at 30 frames per second (fps) when running at 136 MHz.
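To put the reported operating point in perspective, the following sketch computes the average clock-cycle budget per 16x16 macroblock. It assumes that 1080p is coded as 1920x1088 (height padded to a multiple of 16, as is usual in H.264) and that every macroblock passes through the architecture once; the function name is hypothetical and the result is a back-of-the-envelope figure, not one reported by the paper.

```python
def cycles_per_macroblock(width, height, fps, clock_hz, mb_size=16):
    """Average number of clock cycles available per macroblock."""
    mbs_per_frame = (width // mb_size) * (height // mb_size)
    return clock_hz / (mbs_per_frame * fps)


# Assumption: 1080p coded as 1920x1088, i.e. 120 x 68 = 8160 macroblocks.
budget = cycles_per_macroblock(1920, 1088, fps=30, clock_hz=136_000_000)
print(round(budget, 1))  # about 555.6 cycles per macroblock
```

Under these assumptions the architecture has roughly 555 cycles to process each macroblock, which indicates how much work the block-level and mode-level parallelism must overlap.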
ABSTRACT: This paper presents an identity authentication scheme designed specifically for cloud storage via USB token. A secure cloud storage model that fully displays the features of the proposed scheme is introduced, and detailed cryptanalysis is carried out to show that the scheme meets all the existing criteria for USB token-based identity authentication. In addition, a comparison with other related authentication schemes shows that the proposed scheme is both applicable and efficient.
ABSTRACT: The adaptive traffic flow regulation (ATFT) algorithm is applied to adjust combinations of the AIFS, CWmin and CWmax parameters on an 802.11e-based wireless network platform. A simple but effective strategy for combining the three parameters, with an adaptive-priority tuning mechanism, is developed to achieve high quality of service (QoS). The method analyzes internal contention among traffic flows to resolve traffic congestion, giving higher priority to services with stricter real-time requirements. When a flow backs off to the zero state, ATFT adjusts the collision probability by regulating AIFS, CWmin and CWmax to achieve higher performance in delivering large volumes of data. Three performance analyses of four traffic flows on the wireless network platform demonstrate the adaptive and improved parameter tuning results.
Article · Dec 2012 · International Journal of Advancements in Computing Technology
ABSTRACT: In Microsoft Office, file security is mainly protected by user authentication and file encryption. The cryptographic keys are usually derived from a password, so the password-based key derivation function (PBKDF) is the core of the security scheme. However, the security of the Office PBKDF has not yet been established. In this paper, the PBKDF algorithm is analyzed through the game-playing approach, and an upper bound is given on the adversary's advantage in distinguishing the KDF from a random function. Based on that, we discuss the practical security of Office encrypted files and show that Office is secure when the user password is longer than 6 characters.
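The general idea of password-based key derivation, and why password length matters, can be sketched as follows. This uses the standard PBKDF2 construction from Python's `hashlib` as a stand-in; it is not claimed to be the derivation function Office actually uses, and the iteration count, key length, and 62-character alphabet are illustrative assumptions.

```python
import hashlib


def derive_key(password: str, salt: bytes,
               iterations: int = 100_000, length: int = 16) -> bytes:
    """Derive a key with generic PBKDF2-HMAC-SHA1 (not Office's own KDF)."""
    return hashlib.pbkdf2_hmac(
        "sha1", password.encode("utf-8"), salt, iterations, dklen=length
    )


def search_space(alphabet_size: int, max_length: int) -> int:
    """Number of candidate passwords of length 1..max_length."""
    return sum(alphabet_size ** n for n in range(1, max_length + 1))


# Assumption: passwords drawn from letters and digits (62 symbols).
key = derive_key("correct horse", salt=b"example-salt")
print(key.hex())
print(search_space(62, 6))  # candidates an exhaustive search must try
```

Each guess costs the attacker the full iteration count, so the total brute-force effort grows with both the iteration count and the size of the password space.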
ABSTRACT: Traditional methods usually assume that QR codes are printed on flat surfaces. This work proposes a method for extracting QR codes printed on both flat and cylindrical surfaces. First, finder patterns are extracted by detecting the gravity points and corners of closed contours. An efficient Hough transform with a restricted parameter space is then employed to extract the boundary of the code region. Finally, the code is reconstructed by mapping directly from 3D coordinates to the image plane, so that the data in the code can be restored. The experiment shows that the method is effective.
ABSTRACT: Thanks to OpenCL (Open Computing Language), applications can be easily ported among different GPUs, multi-core CPUs, and other processors. After describing the encryption algorithms and the OpenCL programming framework, we present OpenCL-based implementations of the AES finalists Rijndael, Serpent and Twofish in XTS mode. Benchmark testing is performed on 4 mainstream GPUs and multi-core CPUs, and the results are compared with implementations based on the traditional serial programming model and on CUDA. The OpenCL implementation achieves 1.1 to 2 times the throughput of traditional serial programming on a multi-core CPU and up to 23 times on a GPU, while it is 10% to 20% slower than CUDA on a GTX285. The results demonstrate that OpenCL offers a portable language for GPU programming, though it entails a performance penalty.
ABSTRACT: This paper proposes a method for extracting multiple QR codes from illegible images. The method uses contour tracing and corner detection to identify key corner points, and then locates potential code regions by their finder patterns. The timing patterns are used to confirm a real code region. Moreover, image pre-processing procedures are merged to reduce time and memory consumption. We test the method on images taken by an embedded camera device under different conditions, such as normal illumination, highlight spots and inadequate illumination. The results show that the method is fast for multi-QR-code extraction and practical for a digital camera with an embedded DSP capable of performing advanced image processing algorithms on the fly.
ABSTRACT: System design and performance are presented for an experimental FM-DCSK radio system with a blind timing acquisition scheme. A transmitter and receiver architecture is proposed, and a novel two-stage blind bit synchronization algorithm for fast and efficient timing acquisition is introduced. This synchronization scheme exploits the waveform repetition pattern that is naturally present in the DCSK transmitted-reference signal structure. The BER performance of such systems is evaluated under AWGN and multipath channels; it is fairly close to that of perfect synchronization, with a difference of 0.2 dB at an SNR of 10 to 15 dB. Key building blocks of the circuit implementation are also presented.
ABSTRACT: An optimizing operator for focusing algorithms is proposed for an automatic microscope system. Based on the characteristics of an optimal evaluation function, this operator is applicable to automatic graphics splicing with a microscope system. Experiments show that the optimized evaluation function becomes much sharper near the focusing point and changes more slowly far from it. The signal-to-noise ratio (S/N) is therefore greatly improved and focusing becomes much easier. In addition, the operator needs only a few additions and multiplications. This optimizing operator clearly improved both the precision and the speed of our system for automatic graphics splicing.
ABSTRACT: Thanks to the Compute Unified Device Architecture (CUDA) introduced by NVIDIA, the graphics processing unit (GPU) has become a promising platform for cryptography applications. In this paper we present an efficient implementation of MD5-RC4 encryption on an NVIDIA GPU using the CUDA programming framework. The MD5-RC4 encryption algorithm was implemented on an NVIDIA GeForce 9800GTX GPU, and its performance is compared with an implementation running on an AMD Sempron Processor LE-1200 CPU. The results show that our GPU-based implementation achieves a speedup of about 3 to 5 times for the MD5-RC4 encryption algorithm.
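For readers unfamiliar with the combination, a plain CPU reference of an MD5-RC4 scheme might look like the sketch below. It assumes the RC4 key is simply the MD5 digest of the password; the abstract does not spell out the exact key derivation, so `md5_rc4` is a hypothetical composition, and nothing here reflects the GPU implementation itself.

```python
import hashlib


def rc4(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt data with RC4 (the operation is symmetric)."""
    # Key-scheduling algorithm: permute the state array using the key.
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    # Pseudo-random generation: XOR each byte with the keystream.
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


def md5_rc4(password: bytes, data: bytes) -> bytes:
    """Assumed composition: RC4 keyed with the MD5 digest of the password."""
    key = hashlib.md5(password).digest()
    return rc4(key, data)


ciphertext = md5_rc4(b"secret", b"hello world")
print(ciphertext.hex())
print(md5_rc4(b"secret", ciphertext))  # decrypts back to the plaintext
```

Because each candidate password yields an independent MD5 and RC4 computation, workloads that test many passwords map naturally onto one GPU thread per candidate.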
ABSTRACT: This paper presents an integrated scheduling algorithm for unicast and multicast that provides QoS service and handles mixed unicast and multicast data. The key idea is to schedule the six QoS types independently and in parallel, schedule unicast and multicast with a single integrated algorithm, and then arbitrate among them for access to the switching fabric. The QoS mechanism is designed according to the DiffServ model defined by the IETF, and a special queue system is designed to support the QoS implementation. Finally, the integrated algorithm is implemented in hardware, and the implementation results show that the algorithm works correctly and that all features behave as designed.
ABSTRACT: The MD4-family algorithms have been widely applied in cryptography. It has recently been found that MD4-family algorithms are also suitable for random number generators. Since the MD4-family algorithms are compute-intensive, they can be accelerated on graphics processing units (GPUs) to generate massive amounts of high-quality random numbers. This paper presents GPU-based acceleration of MD4-family algorithms, and the results show that these implementations achieve a 100-times speedup over an AMD Sempron Processor LE-1200 CPU.
ABSTRACT: This paper presents a new method called EPDG-GA, which uses the edge partitions dominator graph (EPDG) and a genetic algorithm (GA) for branch coverage testing. First, a set of critical branches (CBs) is obtained by analyzing the EPDG of the program under test; covering all the CBs implies covering all the branches of the control flow graph (CFG). Then, the fitness functions are instrumented at the right positions by analyzing the pre-dominator tree (PreDT), and two metrics are developed to prioritize the CBs. A coverage table is established to record CB information and keep track of whether each branch has been executed. The GA is used to generate test data that cover the CBs and thereby all the branches. Comparison results show that this approach is more efficient than random testing.
ABSTRACT: Efficient architectures for realizing the MDCT/IMDCT are presented. Based on the symmetry property of trigonometric functions, the N-point MDCT formula is transformed into an odd-even index parallel process that can be realized by two different types of decomposition: a recursively formed DCT-II kernel or an FFT-based DCT-IV kernel. A butterfly unit is then employed to increase the computational speed. Furthermore, these new architectures are suitable for computing both the MDCT and the IMDCT, which improves hardware efficiency. For verification, an MPEG-2 AAC experiment using the optimized FFT-based MDCT architecture is implemented on an Altera Stratix FPGA. The results show that the proposed structure provides superior performance in terms of computation rate, data throughput and hardware utilization. In addition, these MDCT/IMDCT architectures can be employed in many international audio standard systems.