Conference Paper

Rate analysis for streaming applications with on-chip buffer constraints

ETH Zurich, Switzerland
Conference: Proceedings of the 2004 Conference on Asia South Pacific Design Automation: Electronic Design and Solution Fair 2004, Yokohama, Japan, January 27-30, 2004
Source: DBLP


When mapping a streaming application (such as multimedia or network packet processing) onto a specified architecture, an important issue is to determine the input stream rates that the architecture can support for any given mapping. This is subject to typical constraints: on-chip buffers must not overflow, and specified playout buffers (which feed audio or video devices) must not underflow, so that the quality of the audio/video output is maintained. The main difficulty in this problem arises from the high variability in the execution times of stream processing algorithms, coupled with the bursty nature of the streams to be processed. We present a mathematical framework for such a rate analysis of streaming applications and illustrate its feasibility through a detailed case study of an MPEG-2 decoder application. When integrated into a tool for automated design-space exploration, such an analysis can be used for fast performance evaluation of different stream processing architectures.
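The kind of buffer-constrained rate analysis described in the abstract can be illustrated with a small network-calculus-style sketch. This is a hedged toy model, not the paper's actual framework: the curve shapes, the token-bucket arrival model, and the `max_supported_rate` search are all assumptions made for illustration.

```python
# Hedged sketch (assumed model, not the paper's exact framework):
# alpha(t) = upper arrival curve, the max items arriving in any window of length t.
# beta(t)  = lower service curve, the min items processed in any window of length t.

def backlog_bound(alpha, beta, horizon):
    """Worst-case buffer fill: sup over t of [alpha(t) - beta(t)], finite horizon."""
    return max(alpha(t) - beta(t) for t in range(horizon + 1))

def max_supported_rate(buffer_size, burst, beta, horizon, step=0.01):
    """Largest long-term rate r such that a token-bucket arrival curve
    alpha(t) = burst + r*t keeps the worst-case backlog within buffer_size."""
    r = 0.0
    while True:
        candidate = r + step
        alpha = lambda t, r=candidate: burst + r * t  # bind candidate rate
        if backlog_bound(alpha, beta, horizon) > buffer_size:
            return r  # next step would overflow the buffer
        r = candidate

# Toy processing element: guarantees at least 2 items/tick of service
# after an initial latency of 3 ticks (assumed service curve).
beta = lambda t: max(0, 2 * (t - 3))
rate = max_supported_rate(buffer_size=20, burst=5, beta=beta, horizon=1000)
print(rate)  # maximum sustainable long-term input rate for this buffer
```

With these toy curves the search settles near the service rate of 2 items/tick, since the buffer easily absorbs the initial burst; a smaller buffer or a longer service latency would lower the answer.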

Cited by:
    • "Hu et al. propose an efficient greedy algorithm to size the input queues in a NoC router given the application traffic characteristics such that the NoC performance is maximized while satisfying a total buffering resource budget [29]. Maxiaguine et al. present a mathematical framework for the performance analysis of streaming applications once the on-chip buffer constraints are given [39]. The problems that we address in this paper are different from the ones in these works due to the particular constraints imposed by the AND-firing policy of the shells and the distinct buffering roles that relay stations and shell queues play in LID. "
    ABSTRACT: Latency-insensitive protocols allow system-on-chip (SoC) engineers to decouple the design of the computing cores from the design of the intercore communication channels while following the synchronous design paradigm. In a latency-insensitive system (LIS), each core is encapsulated within a shell, which is a synthesized interface module that dynamically controls its operation. At each clock period, if new data have not arrived on an input channel or if a stalling request has arrived on an output channel, the shell stalls the core and buffers other incoming valid data for future processing. The combination of finite buffers and backpressure from stalling can cause throughput degradation. Previous works addressed this problem by increasing buffer space to reduce the backpressure requests or inserting extra buffering to balance the channel latency around a LIS. We explore the theoretical complexity of these approaches and propose a heuristic algorithm for efficient queue sizing (QS). We evaluate the heuristic algorithm with experiments over a large set of synthetically generated systems and with a case study of a real SoC system. We find that the topology of a LIS can impact not only how much throughput degradation will occur but also the difficulty of finding optimal QS solutions.
    Article · Jan 2009 · IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
    • "Examples of formal models used for performance evaluation are Markov processes, Queuing Networks and Timed Petri Nets. In the field of performance evaluation for SoC design, Network Calculus [5] and Stochastic Automata Networks [6] have also been recently proposed. "
    ABSTRACT: This article proposes a hardware/software partitioning method targeted to performance-constrained systems for datapath applications. Exploiting a platform-based design, a Timed Petri Net formalism is proposed to represent the mapping of the application onto the platform, allowing performance estimations to be statically extracted in early phases of the design process without the need for expensive simulations. The mapping process is generalized in order to allow an automatic exploration of the solution space, which identifies the best performance/area configurations among several application-architecture combinations. The method is evaluated by implementing a typical datapath performance-constrained system, i.e. a packet processing application.
    Conference Paper · Jan 2008
    • "• Lastly, we show that the interface-theoretic formulation of the rate analysis problem has several advantages. These include the possibility of component-level analysis, which is computationally more efficient than the global, monolithic analysis proposed in [9], [11]. This implies easier component-level design space exploration and resource dimensioning. "
    ABSTRACT: Interface-based design is now considered to be one of the keys to tackling the increasing complexity of modern embedded systems. The central idea is that different components comprising such systems can be developed independently and a system designer can connect them together only if their interfaces match, without knowing the details of their internals. We use the concept of rate interfaces for compositional (correct-by-construction) design of embedded systems whose components communicate through data streams. Using the associated rate interface algebra, two components can be connected together if the output rate of one component is "compatible" with the input rate of the other component. We formalize this notion of compatibility and show that such an algebra is non-trivial because it has to accurately model the burstiness in the arrival rates of such data streams and the variability in their processing requirements. We discuss how rate interfaces simplify compositional design and at the same time help in functional and performance verification which would be difficult to address otherwise. Finally, we illustrate these advantages through a realistic case study involving a component-based design of a multiprocessor architecture running a picture-in-picture application.
    Conference Paper · Dec 2006
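The notion of rate compatibility in the abstract above can be sketched in a few lines. This is an assumed toy model, not the paper's actual interface algebra: an interface is represented here as a pair of curves bounding the number of stream items in any time window, and the names `compatible`, `producer`, and `consumer` are illustrative.

```python
# Hedged sketch (assumed model): an interface is a pair (lower, upper) of
# curves bounding the number of stream items in any window of length t.

def compatible(producer_out, consumer_in, horizon):
    """The producer's guaranteed output bounds must nest inside the
    consumer's accepted input bounds at every window length."""
    p_lo, p_hi = producer_out
    c_lo, c_hi = consumer_in
    return all(c_lo(t) <= p_lo(t) and p_hi(t) <= c_hi(t)
               for t in range(horizon + 1))

# Toy example: producer emits between t and 2+t items in any window of
# t ticks; consumer accepts anything between 0 and 5+t items.
producer = (lambda t: t, lambda t: 2 + t)
consumer = (lambda t: 0, lambda t: 5 + t)
print(compatible(producer, consumer, horizon=100))  # True: bounds nest
```

A check like this captures why the algebra must model burstiness: compatibility is a condition over all window lengths, so a producer whose short-term bursts exceed the consumer's bound fails even if its long-term rates match.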