-
ABSTRACT: This paper proposes a novel algorithm and its very large scale integration design for context-based adaptive variable length code (CAVLC) decoding. In order to improve the throughput of the CAVLC decoder, we propose two new methods: multiple level decoding (MLD) and nonzero skipping for run_before decoding (NZS). By performing parallel operations on the level decoder, MLD can decode two levels in one cycle in most situations, and NZS can produce several values of run_before in the same cycle. These two methods have the advantages of low complexity and regularity. The proposed architecture needs 141 cycles/macroblock. Moreover, the proposed CAVLC decoder can run at 33.5 MHz to meet the real-time requirement for 1920 × 1088 resolution. The power consumption for the 1920 × 1088 resolution is about 1.83 mW. The operation frequency can be reduced by about 29.1% to 71.5% compared with other architectures. Thanks to the lower operation frequency, it is suitable for many low-power applications. The synthesis result shows that the gate count is 13175 gates, and the maximum frequency can reach 160 MHz.
IEEE Transactions on Circuits and Systems for Video Technology 04/2011; · 1.65 Impact Factor
-
ABSTRACT: Detection of moving objects in video streams is the first relevant step of information extraction in many computer vision applications. Aside from the intrinsic usefulness of being able to segment video streams into moving and background components, detecting moving objects provides a focus of attention for recognition, classification, and activity analysis, making these later steps more efficient. In this paper, a background subtraction method based on Bayesian estimation is proposed. The basis of our solution is a Bayesian likelihood test which can distinguish between foreground variation and dynamic background variation. Prior knowledge is brought to bear on the likelihood test through an appropriately specified a priori probability modeled as a Markov random field. Based on this approach, decision thresholds vary depending on context, thus improving detection performance substantially. We compare our method with other modeling techniques and report experimental results, in terms of detection accuracy, for color video sequences that represent typical situations critical for video surveillance systems. Quantitative evaluation and comparison with existing methods show that the proposed algorithm provides much improved results.
Genetic and Evolutionary Computing (ICGEC), 2010 Fourth International Conference on; 01/2011
-
IEICE Transactions. 01/2011; 94-D:2513-2522.
-
IEEE Trans. Circuits Syst. Video Techn. 01/2011; 21:1646-1658.
-
IEEE Trans. Circuits Syst. Video Techn. 01/2011; 21:311-319.
-
Signal Processing Systems. 01/2011; 62:97-112.
-
IEEE Transactions on Multimedia. 01/2011; 13:29-39.
-
ABSTRACT: This research utilizes embedded videotexts to achieve an intelligent multimedia display. We design a videotext in picture (TiP) display system which can extract the videotexts in the subchannel and then combine these videotexts with the main channel. This system was constructed on a dual-core platform to achieve real-time videotext extraction and display. A schedulable design framework was proposed to partition the TiP display with videotext extraction into pipelined stages. A data-aware transfer scheme was designed in which some data can be reused. Single instruction multiple data (SIMD) based mechanisms were created to enhance the computational efficiency of the numerous convolutions and accumulations in videotext extraction. Quadruple buffering was used to process the input and output of videotext extraction simultaneously. To optimize the labeling and filling tasks, multi-banking and multi-tasking were developed. The evaluation results indicated that the proposed techniques can speed up the processing of TiP display with videotext extraction. An equivalent comparison showed that the proposed techniques are more proficient at realizing videotext extraction.
IEEE Transactions on Consumer Electronics 09/2010; · 0.94 Impact Factor
-
02/2010; ISBN: 978-953-307-049-0
-
Proceedings of the International Conference on Image Processing, ICIP 2010, September 26-29, Hong Kong, China; 01/2010
-
Proceedings of the International Conference on Image Processing, ICIP 2010, September 26-29, Hong Kong, China; 01/2010
-
ABSTRACT: This paper proposes a novel processing order and an efficient architecture for real-time implementation of the deblocking filter in the H.264/AVC video coding standard. The deblocking filter imposes intensive data and computation requirements and increases the execution time of both encoding and decoding. The proposed processing order, the double-cross processing order, is constructed as a parallel flow to improve processing speed and reduce memory access. Moreover, the proposed architecture can save about 38-80% of memory access as compared with other designs. Based on this highly efficient architecture, the processing performance can be enhanced, and the operation frequency for standardized video specifications can be reduced. For the general video specification HDTV1080p (1920 × 1080 @30 fps), the operation frequency of the proposed architecture is only 11.5 MHz. For the high resolution QFHD specification (3840 × 2160 @30 fps), the operation frequency of the proposed architecture is only 46.6 MHz. The implementation result is about 20.14 K gates, and the memory requirement is 64 × 32 bits. The power dissipation for the QFHD specification is 7.7 mW at 46.6 MHz operating frequency.
IEEE Transactions on Consumer Electronics 12/2009; · 0.94 Impact Factor
-
ABSTRACT: Automatic understanding of events happening at a site is the ultimate goal for many visual surveillance systems. Understanding of events requires that certain lower level computer vision tasks be performed. These include foreground detection, labeling foreground parts, and tracking targets. To achieve these tasks, it is necessary to build background subtraction and foreground tracking for the scene. This paper proposes a hardware-oriented algorithm for background subtraction and foreground tracking. To achieve real-time processing and flexibility, the system is then mapped to a SoC architecture with a single camera. The architecture contains two acceleration units and a programmable microprocessor unit. The microprocessor provides high flexibility for event understanding in different surveillance settings through user programs, and the proposed accelerator hardware unit is used to increase the overall throughput. Simulation results show that the foreground detection and tracking results are satisfactory. The performance of the proposed architecture, estimated in terms of the number of clock cycles, is presented to justify its real-time processing ability for 30 CIF frames per second.
Information Assurance and Security, 2009. IAS '09. Fifth International Conference on; 09/2009
-
ABSTRACT: Iterative decoding of convolutional turbo code (CTC) has a large memory power consumption. To reduce the power consumption of the state metrics cache (SMC), low-power memory-reduced traceback maximum a posteriori algorithm (MAP) decoding is proposed. Instead of storing all state metrics, the traceback MAP decoding reduces the size of the SMC by accessing difference metrics. The proposed traceback computation requires no complicated reversion checker, path selection, and reversion flag cache. For double-binary (DB) MAP decoding, radix-2×2 and radix-4 traceback structures are introduced to provide a tradeoff between power consumption and operating frequency. These two traceback structures achieve around a 20% power reduction of the SMC, and around a 7% power reduction of the DB MAP decoders. In addition, a high-throughput 12-mode WiMAX CTC decoder applying the proposed radix-2×2 traceback structure is implemented by using a 0.13-µm CMOS process in a core area of 7.16 mm². Based on postlayout simulation results, the proposed decoder achieves a maximum throughput rate of 115.4 Mbps and an energy efficiency of 0.43 nJ/bit per iteration.
Circuits and Systems I: Regular Papers, IEEE Transactions on 06/2009; · 1.97 Impact Factor
-
ABSTRACT: This paper presents an implementation of a low-power and pure-hardware advanced-audio-coding (AAC) audio decoder system. Based on the characteristics of each decoding block, the AAC system is partitioned into four separate modules. For low-power and low-complexity considerations, architectural- and algorithmic-level approaches are adopted in both the individual modules and the whole system. In the parallel PLA-based codeword decoder, we achieve a constant output rate of Huffman decoding in 2.5 cycles for the worst case, and memory usage is decreased compared to that in the binary-tree memory-based method. In the reduced lookup table inverse quantizer, a table lookup with interpolation scheme is adopted which reduces the size of the lookup table from 8192 to 256. In the hardware-shared signal processor, we use a hardware-sharing technique which integrates several similar blocks into a common hardware unit to reduce cost and enhance hardware utilization. In the fully pipelined filterbank, a fast algorithm largely decreases the numbers of multiplications and additions to factors of 24 and 144 for the short and long blocks, respectively. Corresponding hardware for filterbank processing is proposed with a fully pipelined architecture. For stereo processing, a single hardware unit is shared by the channel pairs to keep cost low. The hardware operations of each module are well scheduled with high utilization of the pipeline, and furthermore, parallel processing among blocks is used to increase efficiency. A 48% power saving can be reached by using the pipeline and parallel techniques of the channel pair. The proposed AAC decoder is realized in UMC 0.18-µm 1P6M technology and is operated at only 3 MHz in the worst case. The power dissipation is only 2.45 mW at the sampling frequency of 44.1 kHz.
Circuits and Systems I: Regular Papers, IEEE Transactions on 02/2009; · 1.97 Impact Factor
-
ABSTRACT: The H.264/AVC inter-prediction is performed with variable block-size motion estimation (VBSME) for block sizes such as 16×16, 16×8, 8×16, 8×8, 8×4, 4×8 and 4×4, which causes the high complexity of H.264 motion estimation (ME). This investigation develops an architecture for a combined fast motion estimation algorithm with edge information mode decision (EIMD) and predict hexagon search (PHS). Compared with other popular ME architectures, the proposed architecture has a large search range and low processing frequency. For the general specification of SDTV (720×480) with 4 reference frames and a search range of 256×256, the proposed architecture needs only 18.66 MHz. For the very high specification of QFHD (3840×2160) with 1 reference frame and a search range of 256×256, the proposed architecture requires only 112 MHz. The gate count of the proposed architecture is 300 K, and the memory usage is 12.6 KB.
Multimedia, 2008. ISM 2008. Tenth IEEE International Symposium on; 01/2009
-
ABSTRACT: Many videotexts exist in TV programs. Some videotexts provide valuable information. Thus, an efficient design to extract these videotexts is needed. Existing videotext extractors work on the PC platform and have difficulty achieving real-time extraction and integration. Therefore, this work designs a videotext extractor on a dual-core platform. A distributed design framework for a dual-core platform is proposed. The extraction task is dispatched to the ARM and the DSP. The ARM core executes capture, display, control, and extraction threads. The DSP core performs algorithms. The ARM and the DSP communicate by buffers and solid channels. On the DSP side, several techniques are applied to optimize the videotext extractor. They include software pipeline, internal memory, adjusted program, assembly optimization, and DMA. To achieve high performance, two DMA transfer schemes are proposed. This system is implemented on the TI DaVinci DM6446 platform. All input videos are 720 × 480 at 30 fps, captured from a real-time DVB-T system. The simulation result shows that this extractor can process the large-size frames, and all the videotext can be extracted. With this novel architecture, the extraction speed can be enhanced to 23 frames per second.
Asia-Pacific Services Computing Conference, 2008. APSCC '08. IEEE; 01/2009
-
International Symposium on Circuits and Systems (ISCAS 2009), 24-27 May 2009, Taipei, Taiwan; 01/2009
-
Proceedings of the International Conference on Image Processing, ICIP 2009, 7-10 November 2009, Cairo, Egypt; 01/2009
-
Proceedings of the 2009 IEEE International Conference on Multimedia and Expo, ICME 2009, June 28 - July 2, 2009, New York City, NY, USA; 01/2009