Conference Paper

Rate analysis for streaming applications with on-chip buffer constraints

ETH Zurich, Switzerland
Conference: Proceedings of the 2004 Conference on Asia South Pacific Design Automation: Electronic Design and Solution Fair 2004, Yokohama, Japan, January 27-30, 2004
Source: DBLP


While mapping a streaming application (such as multimedia or network packet processing) onto a specified architecture, an important issue is to determine the input stream rates that the architecture can support for any given mapping. This is subject to typical constraints: on-chip buffers must not overflow, and specified playout buffers (which feed audio or video devices) must not underflow, so that the quality of the audio/video output is maintained. The main difficulty in this problem arises from the high variability in execution times of stream processing algorithms, coupled with the bursty nature of the streams to be processed. We present a mathematical framework for such a rate analysis of streaming applications, and illustrate its feasibility through a detailed case study of an MPEG-2 decoder application. When integrated into a tool for automated design-space exploration, such an analysis can be used for fast performance evaluation of different stream processing architectures.
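The feasibility question in the abstract can be illustrated with a minimal network-calculus sketch. The paper's framework is more general; the token-bucket arrival curve, rate-latency service curve, function names, and all parameter values below are simplifying assumptions of ours, not the paper's formulation.

```python
# Illustrative sketch (not the paper's exact model): a feasibility check for
# an input stream rate, assuming a token-bucket upper arrival curve
# alpha(d) = burst + rate*d and a rate-latency service curve
# beta(d) = max(0, capacity*(d - latency)).

def backlog_bound(rate, burst, capacity, latency):
    """Worst-case buffer backlog sup_d [alpha(d) - beta(d)].

    For token-bucket arrivals served by a rate-latency server with
    capacity >= rate, the supremum is reached at d = latency, giving
    backlog = burst + rate * latency.
    """
    if rate > capacity:
        return float("inf")  # the queue grows without bound
    return burst + rate * latency

def max_supported_rate(burst, capacity, latency, buffer_size):
    """Largest long-term input rate whose backlog bound fits the buffer."""
    if burst > buffer_size:      # even a single burst overflows the buffer
        return 0.0
    if latency == 0:
        return capacity
    return min(capacity, (buffer_size - burst) / latency)

# Example: burst of 4 stream objects, processing capacity of 2 objects per
# unit time, worst-case scheduling latency of 3 time units, and an on-chip
# buffer of 8 objects.
r = max_supported_rate(burst=4, capacity=2, latency=3, buffer_size=8)
print(r)  # -> 1.3333333333333333
assert backlog_bound(r, burst=4, capacity=2, latency=3) <= 8
```

The check mirrors the abstract's question: the largest supportable input rate is the one whose worst-case backlog still fits within the on-chip buffer.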

Cited by:
    • "Hu et al. propose an efficient greedy algorithm to size the input queues in a NoC router given the application traffic characteristics such that the NoC performance is maximized while satisfying a total buffering resource budget [29]. Maxiaguine et al. present a mathematical framework for the performance analysis of streaming applications once the on-chip buffer constraints are given [39]. The problems that we address in this paper are different from the ones in these works due to the particular constraints imposed by the AND-firing policy of the shells and the distinct buffering roles that relay stations and shell queues play in LID. "
    ABSTRACT: Latency-insensitive protocols allow system-on-chip (SoC) engineers to decouple the design of the computing cores from the design of the intercore communication channels while following the synchronous design paradigm. In a latency-insensitive system (LIS), each core is encapsulated within a shell, which is a synthesized interface module that dynamically controls its operation. At each clock period, if new data have not arrived on an input channel or if a stalling request has arrived on an output channel, the shell stalls the core and buffers other incoming valid data for future processing. The combination of finite buffers and backpressure from stalling can cause throughput degradation. Previous works addressed this problem by increasing buffer space to reduce the backpressure requests or inserting extra buffering to balance the channel latency around a LIS. We explore the theoretical complexity of these approaches and propose a heuristic algorithm for efficient queue sizing (QS). We evaluate the heuristic algorithm with experiments over a large set of synthetically generated systems and with a case study of a real SoC system. We find that the topology of a LIS can impact not only how much throughput degradation will occur but also the difficulty of finding optimal QS solutions.
    IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 27(12):2277–2290, 2008. DOI: 10.1109/TCAD.2008.2008914
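The shell behavior this abstract describes (stall on missing input or downstream backpressure, buffer in finite queues) can be sketched with a toy clock-driven simulation. The two-stage pipeline, function name, and parameters below are our own illustration, not the authors' model or their queue-sizing algorithm.

```python
# Illustrative sketch, assuming a two-stage latency-insensitive pipeline:
# a producer core feeds a consumer through a finite queue; when the queue
# is full, backpressure stalls the producer for that clock period.
from collections import deque

def pipeline_throughput(cycles, queue_size, consumer_period):
    """Simulate `cycles` clock periods. The producer emits one datum per
    cycle unless stalled by a full queue; the consumer drains one datum
    every `consumer_period` cycles. Returns the effective throughput."""
    queue = deque()
    consumed = 0
    for t in range(cycles):
        if len(queue) < queue_size:            # no backpressure: producer fires
            queue.append(t)
        if t % consumer_period == 0 and queue:  # slow consumer fires
            queue.popleft()
            consumed += 1
    return consumed / cycles

# A consumer that accepts data only every 2nd cycle caps throughput at 0.5
# regardless of queue size: extra buffering cannot hide a rate mismatch,
# though (as the abstract notes) it can reduce stalls from transient bursts.
print(pipeline_throughput(1000, queue_size=2, consumer_period=2))  # -> 0.5
```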
    • "Examples of formal models used for performance evaluation are Markov processes, Queuing Networks and Timed Petri Nets. In the field of performance evaluation for SoC design, Network Calculus [5] and Stochastic Automata Networks [6] have also been recently proposed. "
    ABSTRACT: This article proposes a hardware/software partitioning method targeted to performance-constrained systems for datapath applications. Exploiting a platform-based design, a Timed Petri Net formalism is proposed to represent the mapping of the application onto the platform, allowing performance estimations to be extracted statically in early phases of the design process and without the need for expensive simulations. The mapping process is generalized in order to allow an automatic exploration of the solution space, which identifies the best performance/area configurations among several application-architecture combinations. The method is evaluated by implementing a typical datapath performance-constrained system, i.e. a packet processing application.
    Proceedings of the 6th International Conference on Hardware/Software Codesign and System Synthesis, CODES+ISSS 2008, Atlanta, GA, USA, October 19-24, 2008; 01/2008
    • "Now suppose that we are also given that α^u_x(2) = 4, which means that within a time interval of length 2 there might be a burst of at most 4 stream objects. Following this specification, if 4 stream objects arrive at b during the time interval [0, 2], then over the time interval (2, 10] at most 1 stream object can arrive. Hence, although the "long-term/average" arrival rate of the stream is 0.5 stream objects per unit time, there might be occasional bursts. "
    ABSTRACT: In this paper we address the "rate analysis" problem for media-processing platforms consisting of multiple processor cores connected in a pipelined fashion. More precisely, we aim at determining tight bounds on the rates at which multimedia streams can be fed into such architectures. These bounds depend on architectural constraints (e.g. the available on-chip memory, bus arbitration policies, etc.), as well as the application characteristics (e.g. application partitioning and mapping, workload rates generated by different tasks, etc.). The proposed framework for rate analysis can be used for fast design space exploration to determine how these bounds change with different architectural parameters, mapping of the application, or changing the QoS requirements associated with the input streams.
    12th IEEE Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA 2006), 16-18 August 2006, Sydney, Australia; 01/2006
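The arrival-curve reasoning quoted in the excerpt above (α^u_x(2) = 4 together with a long-term rate of 0.5) can be checked mechanically. The sketch below is our own illustration, not the paper's method: it represents an upper arrival curve as pointwise window constraints and tests whether a concrete arrival trace conforms. The function name and the traces are assumptions; using closed windows is a modeling choice.

```python
# An upper arrival curve alpha_u gives, for each window length, the maximum
# number of stream objects that may arrive in ANY window of that length.
# The constraint values below (alpha_u(2) = 4, alpha_u(10) = 5) follow the
# excerpt; everything else is our own illustration.

def conforms(arrival_times, alpha_u):
    """Check a trace of arrival times against upper arrival-curve
    constraints alpha_u: {window_length: max_objects}."""
    ts = sorted(arrival_times)
    for length, max_objs in alpha_u.items():
        for i, start in enumerate(ts):
            # count arrivals in the closed window [start, start + length];
            # worst-case windows always start at an arrival instant
            in_window = sum(1 for t in ts[i:] if t <= start + length)
            if in_window > max_objs:
                return False
    return True

alpha_u = {2: 4, 10: 5}  # burst of 4 per 2 time units; 5 per 10 time units

# A burst of 4 objects in [0, 2] followed by one more at t = 9 conforms:
print(conforms([0.0, 0.5, 1.0, 2.0, 9.0], alpha_u))       # True

# ...but adding a 5th object already at t = 5 puts 6 arrivals inside one
# window of length 10, violating alpha_u(10) = 5:
print(conforms([0.0, 0.5, 1.0, 2.0, 5.0, 9.0], alpha_u))  # False
```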

