-
[show abstract]
[hide abstract]
ABSTRACT: Due to high bandwidth demand on memory system of stream applications, most of stream processors use software-managed streaming memory. However, this memory disadvantages ease of programming, compatibility, and supporting irregular stream access, which hinder the usage of stream processor in broader application domains. Meanwhile, hardware-managed coherent caches overcome these shortcomings of software-managed streaming memory with side-effect due to lack of supporting stream. For this problem, this paper developed a streamization cache whose performance is comparable to streaming memory but is more easy to use. The paper presents the motivation and details of our proposed design, including three stream-specific techniques for cache on data fetch policy, replacement policy and multi-client access. Moreover, a streamization cache instance is implemented in FT64, a 64-bit high performance stream processor. Based on a set of streaming application benchmark, the paper estimates the performance, power consumption and the area cost of the proposed architecture. Results show that these streamization techniques for cache are worthwhile.
High Performance Computing (HiPC), 2009 International Conference on; 01/2010
-
[show abstract]
[hide abstract]
ABSTRACT: There has recently been much interest in stream processing, both in industry (Cell, Storm series, NVIDIA G80, AMD FIRESTREAM) and academia (IMAGINE). Some researchers have accelerated a lot of applications in media processing, scientific computing and signal processing with a special programming style called stream programming. This paper presents a framework to program DSP with this special programming style. Stream program can run on DSP without any architectural support. H264 encoding is selected to evaluate our technique. The result shows that significant speedup is achieved, ranging from 3.2x for cavlc up to 7.1x for analysis.
Embedded and Multimedia Computing, 2009. EM-Com 2009. 4th International Conference on; 01/2010
-
Proceedings of the 17th International Conference on Multimedia 2009, Vancouver, British Columbia, Canada, October 19-24, 2009; 01/2009
-
[show abstract]
[hide abstract]
ABSTRACT: In this paper shows the extension of application domains, hardware-managed memory structures such as caches are drawing attention for dealing with irregular stream applications. However, since a real application usually has both regular and irregular stream characteristics, conventional stream register files, caches, or combinations thereof have shortcomings. This article focuses on combining software- and hardware-managed memory structures and presents a new syncretic memory system based on the ft64 stream accelerator.
IEEE Micro 08/2008; · 1.78 Impact Factor
-
IEEE Micro. 01/2008; 28:51-70.
-
Advances in Computer Systems Architecture, 12th Asia-Pacific Conference, ACSAC 2007, Seoul, Korea, August 23-25, 2007, Proceedings; 01/2007
-
High Performance Computing - HiPC 2007, 14th International Conference, Goa, India, December 18-21, 2007, Proceedings; 01/2007
-
[show abstract]
[hide abstract]
ABSTRACT: This paper begins with the study of MASA stream architecture; followed by the discussion of the stream algorithm on MASA, together with experiments using a cycle-accurate hardware simulator to analyze the performance of the implementation and to measure application run-time. The comparison with the traditional, general purpose processors code confirms MASA's potential to deliver high performance
Computer and Information Science, 2006 and 2006 1st IEEE/ACIS International Workshop on Component-Based Software Engineering, Software Architecture and Reuse. ICIS-COMSAR 2006. 5th IEEE/ACIS International Conference on; 08/2006
-
Advances in Computer Systems Architecture, 11th Asia-Pacific Conference, ACSAC 2006, Shanghai, China, September 6-8, 2006, Proceedings; 01/2006
-
Image Analysis and Recognition, Second International Conference, ICIAR 2005, Toronto, Canada, September 28-30, 2005, Proceedings; 01/2005
-
[show abstract]
[hide abstract]
ABSTRACT: Real-time H.264 encoding of high-definition (HD) video (up to 1080p) is a challenge workload to most existing programmable processors. Instead, the novel programmable parallel processors such as stream processor, Graphic processor unit (GPU) and DSP offer a different and very promising technology for these demands. Thus, parallel computing for H.264 encoding on these processors is becoming a hot research point. It's challenged, because most emerging parallel processors focus on supporting Data Level Parallel (DLP), while the dependency inherently existing in traditional H.264 encoding algorithm significantly restricts exploiting DLP. Facing the challenge, this paper presents data parallel processing methods for key modules of H.264 encoder which can eliminate the dependency restriction. The result shows that key modules including Intra-prediction, Inter-prediction and CAVLC achieve significant speedup on stream processor by using these data parallel processing methods.