S. Yalamanchili’s research while affiliated with Georgia Institute of Technology and other places

Publications (41)


Figure 1. ShareStreams System Architecture: Processor-attached Configuration
Figure 4. ShareStreams Scheduler Timeline
Figure 5. Register Block: Critical Update Logic paths
Figure 6. Decision Block: Concurrent Single-Cycle Evaluation with Predicate Logic
Figure 7. Area-Clock Rate Characteristics (Virtex I)

ShareStreams: A Scalable Architecture and Hardware Support for High-Speed QoS Packet Schedulers
  • Conference Paper

May 2004 · 95 Reads · 11 Citations · S. Yalamanchili

ShareStreams (scalable hardware architectures for stream schedulers) is a unified hardware architecture for realizing a range of wire-speed packet scheduling disciplines for output link scheduling. This paper presents opportunities to exploit parallelism, design issues, trade-offs, and an evaluation of the FPGA hardware architecture for use in switch network interfaces. The architecture uses processor resources for queuing and data movement, and FPGA hardware resources for accelerating decisions and priority updates. The hardware stores state in register-based blocks. Stream service attributes are compared using single-cycle decision blocks arranged in a novel single-stage recirculating network. The architecture provides effective mechanisms to trade hardware complexity for lower execution time in a predictable manner. The hardware, realized in Virtex-I and Virtex-II FPGAs, can meet the packet-time requirements of 10 Gbps links for 256 stream queues with window-constrained scheduling disciplines. With priority-class/fair-queuing scheduling disciplines using 16 service classes, it can schedule 1536 stream queues within 10 Gbps packet-times.
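The idea of comparing stream service attributes in decision blocks whose winners recirculate can be illustrated with a small software tournament. Each round compares adjacent pairs, one "decision block" per pair, and the survivors are fed back until a single stream remains. N streams therefore need ceil(log2 N) rounds.

This is a minimal sketch under stated assumptions, not the paper's predicate logic. The attribute fields are hypothetical: a deadline plus an assumed window constraint x/y, meaning up to x misses per window of y packets. The ordering rule (earliest deadline, then tighter x/y, then lower id) is also an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class StreamAttr:
    # Hypothetical per-stream service attributes (illustration only).
    stream_id: int
    deadline: int  # earlier deadline is more urgent
    x: int         # assumed window constraint: up to x misses ...
    y: int         # ... per window of y packets (y > 0)


def wins(a: StreamAttr, b: StreamAttr) -> bool:
    """One 'decision block': True if stream a should be served before b.

    Assumed ordering: earliest deadline first, then the tighter window
    constraint (smaller x/y), then the lower stream id as a final tie-break.
    """
    if a.deadline != b.deadline:
        return a.deadline < b.deadline
    # Compare x_a/y_a with x_b/y_b by cross-multiplication (no division).
    if a.x * b.y != b.x * a.y:
        return a.x * b.y < b.x * a.y
    return a.stream_id < b.stream_id


def select_winner(streams: List[StreamAttr]) -> Tuple[StreamAttr, int]:
    """Recirculate pairwise winners until one stream remains.

    Returns the winning stream and the number of comparison rounds used.
    """
    if not streams:
        raise ValueError("no streams to schedule")
    current = list(streams)
    rounds = 0
    while len(current) > 1:
        survivors = []
        # In hardware every pair is compared in the same cycle; here we loop.
        for i in range(0, len(current) - 1, 2):
            a, b = current[i], current[i + 1]
            survivors.append(a if wins(a, b) else b)
        if len(current) % 2:  # odd stream out passes through unchanged
            survivors.append(current[-1])
        current = survivors
        rounds += 1
    return current[0], rounds
```

Consider five streams (ids 0 to 4) with deadlines 9, 3, 7, 3, 8 and equal window constraints. Here `select_winner` returns stream 1 after three rounds, i.e. ceil(log2 5). Trading more decision blocks per round for fewer rounds is the kind of complexity/execution-time trade-off the abstract describes.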


A new switch scheduling algorithm to improve QoS in the multimedia router

January 2003 · 10 Reads · 2 Citations · F.J. Quiles · [...] · S. Yalamanchili

The multimedia router (MMR) aims to provide QoS to multimedia flows, which coexist with conventional best-effort traffic, by means of a compact single-chip router designed for cluster and local area environments. Because the router is based on a multiplexed crossbar, hardware-efficient link and switch scheduling algorithms are needed. Their goal is to achieve high utilization while guaranteeing the QoS required by the multimedia connections. This work presents a novel switch scheduling algorithm, the candidate conflict arbiter (CCA), that can be efficiently implemented in the MMR. Simulation results show that this proposal outperforms previously proposed algorithms in maximum achieved throughput while still providing QoS to the multimedia flows.


Investigating Switch Scheduling Algorithms to Support QoS in the Multimedia Router

April 2002 · 78 Reads · 5 Citations

The primary objective of the Multimedia Router (MMR) project is the design and implementation of a compact router optimized for multimedia applications. The router is targeted for use in cluster and LAN interconnection networks, which impose different constraints than WANs and therefore call for different router solutions. The goal is to provide architectural support for a range of Quality of Service (QoS) guarantees at latencies comparable to those of state-of-the-art multiprocessor cut-through routers. A critical design parameter for achieving this is the switch scheduling algorithm. In earlier work, the authors proposed an efficient crossbar arbitration scheme, the Candidate-Order Arbiter algorithm. In this paper, the performance of this proposal is analyzed and compared with other well-known schemes. The results show that QoS may not be guaranteed by a switch scheduling algorithm aimed solely at maximizing crossbar utilization. Moreover, simulations show that our approach is suitable for guaranteeing high bandwidth utilization, up to 78%, while still providing QoS to both CBR and VBR traffic.
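A generic greedy arbiter helps show why priority-driven crossbar scheduling and maximum utilization can pull in different directions. Requests are visited in descending priority. Each is granted only if its input and output ports are both still free in this cycle.

This is a textbook-style sketch, not the Candidate-Order Arbiter or the CCA. The `(input, output, priority)` request format is a hypothetical choice for illustration.

```python
from typing import List, Tuple

# Hypothetical request format: (input_port, output_port, priority).
Request = Tuple[int, int, float]


def greedy_crossbar_match(requests: List[Request]) -> List[Tuple[int, int]]:
    """Grant requests in descending priority; each port is used at most once."""
    busy_in, busy_out = set(), set()
    grants = []
    for inp, out, _prio in sorted(requests, key=lambda r: r[2], reverse=True):
        if inp not in busy_in and out not in busy_out:
            busy_in.add(inp)
            busy_out.add(out)
            grants.append((inp, out))
    return grants


def utilization(grants: List[Tuple[int, int]], num_ports: int) -> float:
    """Fraction of output ports that transmit in this cycle."""
    return len(grants) / num_ports
```

Take the requests `[(0, 0, 0.9), (0, 1, 0.8), (1, 0, 0.1)]` on a 2x2 crossbar. Strict priority order grants only `(0, 0)`, giving 50% utilization, although the matching `(0, 1), (1, 0)` would keep both outputs busy. This is the tension the abstract refers to when it notes that maximizing utilization alone does not guarantee QoS, and vice versa.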


A multimedia router architecture to provide high performance and QoS guarantees to mixed traffic

February 2002 · 6 Reads · 2 Citations

The explosive growth in the use of scalable and cost-effective clusters and local area environments calls for high-performance networks capable of providing QoS to multimedia flows. Thus, the main goal of the MultiMedia Router (MMR) project is to design a single-chip router able to efficiently handle both multimedia flows and best-effort traffic. In this paper we focus on the performance evaluation of the MMR architecture using a mix of CBR, VBR and best-effort workloads. Preliminary simulation results show that, by using simple link and switch scheduling algorithms, the router achieves a link bandwidth utilization of 80% while still providing QoS guarantees to both CBR and VBR traffic in the presence of best-effort traffic.




Figure 1. The Architecture of the MultiMedia Router (MMR)
Figure 3. Jitter vs. Offered Load: Fixed and Biased Priorities
Figure 4. Delay vs. Offered Load: Fixed and Biased Priorities
Figure 5. Delay and Jitter vs. Offered Load: Fixed and Biased Priorities, Autonet, Perfect Switch

MMR: a high-performance MultiMedia Router - architecture and design trade-offs

February 1999 · 487 Reads · 46 Citations

This paper presents the architecture of a router designed to efficiently support traffic generated by multimedia applications. The router is targeted for use in clusters and LANs rather than in WANs, the latter being served by communication substrates such as ATM. The distinguishing features of the proposed router architecture are:
- the use of small fixed-size buffers;
- a large number of virtual channels;
- link-level virtual channel flow control;
- support for dynamic modification of connection bandwidth and priorities;
- coordinated scheduling of connections across all output channels.

The paper begins with a discussion of the design choices and architectural trade-offs made in the current MultiMedia Router (MMR) project. The performance evaluation section presents some preliminary results on the coordinated scheduling of constant bit rate (CBR) traffic streams.
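With small fixed-size buffers per virtual channel, link-level flow control must keep a sender from overrunning the receiver's buffers. One common way to do this is credit-based flow control. The abstract does not say which link-level scheme the MMR uses, so the mechanism and the class below are assumptions for illustration only.

```python
from typing import Dict


class CreditLink:
    """Sender-side view of a link with per-virtual-channel credit counters.

    Hypothetical sketch: one credit per free flit slot in the receiver's
    buffer for that virtual channel.
    """

    def __init__(self, num_vcs: int, buffer_flits: int):
        # Each VC starts with as many credits as the receiver has slots.
        self.credits: Dict[int, int] = {vc: buffer_flits for vc in range(num_vcs)}

    def can_send(self, vc: int) -> bool:
        return self.credits[vc] > 0

    def send_flit(self, vc: int) -> None:
        if not self.can_send(vc):
            raise RuntimeError(f"VC {vc} has no credits; flit must wait")
        self.credits[vc] -= 1  # one receiver buffer slot is now occupied

    def return_credit(self, vc: int) -> None:
        # The receiver forwarded a flit downstream and freed a slot.
        self.credits[vc] += 1
```

Because each virtual channel has its own counter, a blocked connection exhausts only its own credits. Other virtual channels on the same physical link can keep sending.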


QUIC: a quality of service network interface layer for communication in NOWs

February 1999 · 17 Reads · 2 Citations

This project explores the development of a hardware/software infrastructure to enable the provision of quality of service (QoS) guarantees in high-performance networks used to configure clusters of workstations/PCs. These networks of workstations (NOWs) have emerged as a viable high-performance computational vehicle and are also being called upon to support access to multimedia datasets. Example applications include Web servers, video-on-demand servers, immersive environments, virtual meetings, multi-player 3-D games, interactive simulations, and collaborative design environments. Such applications must often share the interconnect with traditional compute-intensive parallel/distributed applications, which are usually driven by latency requirements rather than by jitter, loss rate, or throughput requirements. The challenge is to develop a communication infrastructure that manages network resources effectively enough to meet these diverse QoS requirements. The major components of QUIC include:
- the use of powerful processors embedded in the network interfaces;
- scheduling paradigms for concurrently satisfying distinct QoS requirements over multiple streams;
- reconfigurable hardware support to enable complex scheduling decisions to be made in the desired time frames;
- a flexible and extensible virtual communication machine that provides a uniform interface for dynamically adding hardware/software functionality to the network interfaces (NIs).

This paper reviews the goals, approach and current status of this project.


Dynamically configurable message flow control for fault-tolerant routing

February 1999 · 42 Reads · 27 Citations

IEEE Transactions on Parallel and Distributed Systems

Fault-tolerant routing protocols in modern interconnection networks rely heavily on the network flow control mechanisms used. Optimistic flow control mechanisms, such as wormhole switching (WS), realize very good performance but are prone to deadlock in the presence of faults. Conservative flow control mechanisms, such as pipelined circuit switching (PCS), ensure the existence of a path to the destination prior to message transmission, achieving reliable transmission at the expense of performance. This paper proposes a general class of flow control mechanisms that can be dynamically configured to trade off reliability and performance. Routing protocols can then be designed so that they use a more conservative flow control mechanism in the vicinity of faults. Meanwhile, the majority of messages, which traverse fault-free portions of the network, use WS-like flow control to maximize performance. We refer to such protocols as two-phase protocols. This ability provides new avenues for optimizing message-passing performance in the presence of faults. A fully adaptive two-phase protocol is proposed and compared via simulation with protocols based on WS and PCS. The architecture of a network router supporting configurable flow control is also described.
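The two-phase idea can be pictured as a per-hop choice of flow control mode. The message uses a conservative, PCS-style mode when it is within some distance of a known fault, and WS-style switching elsewhere. This is a hypothetical policy sketch, not the paper's fully adaptive protocol. The 2D mesh coordinates, Manhattan distance, and `threshold` parameter are all assumptions.

```python
from enum import Enum
from typing import Iterable, List, Sequence, Tuple

Node = Tuple[int, int]  # assumed 2D mesh coordinates


class FlowControl(Enum):
    WORMHOLE = "WS"       # optimistic: high performance, no path set-up
    CONSERVATIVE = "PCS"  # path is reserved before data is sent


def manhattan(a: Node, b: Node) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def choose_flow_control(current: Node, faults: Iterable[Node], threshold: int = 1) -> FlowControl:
    """Hypothetical policy: go conservative within `threshold` hops of a fault."""
    near_fault = any(manhattan(current, f) <= threshold for f in faults)
    return FlowControl.CONSERVATIVE if near_fault else FlowControl.WORMHOLE


def modes_along(path: Sequence[Node], faults: Iterable[Node], threshold: int = 1) -> List[FlowControl]:
    """Flow control mode chosen at each node of a route."""
    fault_list = list(faults)
    return [choose_flow_control(n, fault_list, threshold) for n in path]
```

Consider a route along the x-axis from (0, 0) to (3, 0) with a fault at (3, 1). The message stays in wormhole mode for the first three nodes and switches to the conservative mode only at (3, 0). Raising `threshold` widens the conservative region, trading performance for reliability.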


On adaptive resource allocation for complex real-time applications

January 1998 · 23 Reads · 138 Citations

Resource allocation for high-performance real-time applications is challenging. The difficulty comes from the applications' data-dependent nature, dynamic changes in their external environment, and limited resource availability in their target embedded system platforms. These challenges may be met by adaptive resource allocation (ARA) mechanisms that promptly adjust resource allocation to changes in an application's resource needs whenever there is a risk of failing to satisfy its timing constraints. By taking advantage of an application's adaptation capabilities, ARA eliminates the need for 'over-sizing' real-time systems to meet worst-case application needs. This paper proposes a model for describing an application's adaptation capabilities and the runtime variation of its resource needs. The paper also proposes a satisfiability-driven set of performance metrics for capturing the impact of ARA mechanisms on the performance of adaptable real-time applications. The relevance of the proposed metrics is demonstrated experimentally, using a synthetic application designed to represent time-critical applications in C3I systems.
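An adaptive allocation loop can be sketched as follows. Each period, the application's current workload determines how many processors it needs to finish by its deadline, and the allocation is adjusted within the available pool. A simple satisfaction ratio, the fraction of periods in which the deadline is met, stands in for the paper's satisfiability-driven metrics.

The linear-speedup work model, the function names, and the metric are all assumptions made for illustration, not the paper's model.

```python
import math
from typing import List, Tuple


def required_procs(work: float, deadline: float, speed: float = 1.0) -> int:
    """Processors needed to finish `work` by `deadline`, assuming linear speedup."""
    if deadline <= 0 or speed <= 0:
        raise ValueError("deadline and speed must be positive")
    return max(1, math.ceil(work / (speed * deadline)))


def adapt(workloads: List[float], deadline: float, pool: int, speed: float = 1.0) -> Tuple[List[int], float]:
    """Re-allocate processors every period; return allocations and satisfaction ratio."""
    allocations: List[int] = []
    met = 0
    for work in workloads:
        # Grow or shrink the allocation to the current need, capped by the pool.
        alloc = min(pool, required_procs(work, deadline, speed))
        allocations.append(alloc)
        if work / (alloc * speed) <= deadline:
            met += 1
    ratio = (met / len(workloads)) if workloads else 1.0
    return allocations, ratio
```

Take workloads of 4, 10, and 30 units, a deadline of 2 per period, and a pool of 4 processors. The loop allocates 2, 4, and 4 processors and meets the deadline in one period out of three. A static system sized for the worst case would instead need 15 processors in every period.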


Citations (30)


... Finally, for parallel applications that exhibit phase behavior, there may be different degrees of parallelism in different phases. Application runtimes can recognize such behavior and can share information about the needs of future phases with system-level resource management [6,42]. ...

Reference:

Attaining system performance points
On adaptive resource allocation for complex real-time applications
  • Citing Article

... In [15], GQ* and FiConn were empirically compared on the basis of network throughput, latency, load balancing, fault tolerance, and cost-to-build (network throughput was measured as the aggregate bottleneck throughput). Network throughput, load balancing, and fault tolerance were evaluated with respect to a routing algorithm, GQSRouting, for $GQ^*_{k,n}$. GQSRouting is based on well-known fault-tolerant routing algorithms for GQ, surveyed by Young and Yalamanchili in [43]. The experiments in [15] show that GQ* outperforms FiConn in all evaluations undertaken. ...

Adaptive routing in generalized hypercube architectures
  • Citing Article
  • January 1991
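In a generalized hypercube, a node address is a mixed-radix tuple. Any two nodes that differ in exactly one digit are directly linked, so a minimal route fixes one differing digit per hop. The route length therefore equals the number of differing digits.

The sketch below shows only fault-free, dimension-order minimal routing. It is not GQSRouting or any of the fault-tolerant algorithms surveyed in [43], and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple

Node = Tuple[int, ...]


def gh_route(src: Sequence[int], dst: Sequence[int], radices: Sequence[int]) -> List[Node]:
    """Dimension-order minimal route in a generalized hypercube.

    Nodes differing in exactly one digit are adjacent, so each hop sets one
    differing digit to its destination value (lowest dimension first).
    """
    if not (len(src) == len(dst) == len(radices)):
        raise ValueError("address length must match the number of dimensions")
    for s, d, r in zip(src, dst, radices):
        if not (0 <= s < r and 0 <= d < r):
            raise ValueError("digit out of range for its radix")
    path: List[Node] = [tuple(src)]
    cur = list(src)
    for dim, target in enumerate(dst):
        if cur[dim] != target:
            cur[dim] = target  # one hop along dimension `dim`
            path.append(tuple(cur))
    return path
```

With radices (4, 3, 2), routing from (0, 2, 1) to (3, 2, 0) takes two hops, via (3, 2, 1). Fault-tolerant variants typically reorder dimensions or take a detour digit when the preferred next node or link is faulty.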

... Image processing in general involves intensive computation. Parallel computing appears to be the economical way to achieve real-time performance [3,4]. The SIMD mesh architecture is considered a natural parallel architecture for image processing because it directly mirrors image data structures and can be easily implemented in hardware. ...

Parallel Processing Methodologies for Image Processing and Computer Vision
  • Citing Chapter
  • December 1993

... The problem of scheduling AAPC was studied in [10,11,25] for circuit-switched meshes, tori and hypercubes. A similar problem, called scheduling total (or complete) exchange, and its variations in which an arbitrary communication pattern needs to be scheduled, were studied for rings (unidirectional) in addition to meshes, tori and hypercubes in [2,12,26,27,30], where store-and-forward routing was assumed. Another similar problem, called scheduling all-to-all broadcasting, in which each node needs to broadcast the same packet to all other nodes, was studied for meshes, tori and hypercubes in [4,15,16]. ...

All-to-all personalized exchange in two-dimension and 3-dimension tori
  • Citing Article

... There are several switching techniques. These include packet switching (store-and-forward), circuit switching, virtual cut-through switching, wormhole switching, mad postman switching, buffered wormhole switching, pipelined circuit switching, and scouting switching [4,13,2,7,6,11,15,10,20]. ...

Distributed deadlock-free routing in faulty pipelined k-ary n-cubes
  • Citing Article
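The practical difference between store-and-forward and the cut-through family (virtual cut-through, wormhole) shows up in no-load latency. The following uses the usual simplified model, an approximation that ignores contention, wire delay and per-flit overheads. The symbols are:
- $D$: hops traversed;
- $L$: message length in bits;
- $b$: channel bandwidth in bits/s;
- $t_r$: per-hop routing delay.

Store-and-forward pays the serialization delay $L/b$ at every hop, while wormhole pays it roughly once:

```latex
T_{\mathrm{SAF}} = D\left(t_r + \frac{L}{b}\right), \qquad
T_{\mathrm{WH}} \approx D\,t_r + \frac{L}{b}

% Worked example: D = 4, L = 1024 bits, b = 1 Gb/s (L/b = 1.024 us), t_r = 20 ns
T_{\mathrm{SAF}} = 4\,(0.020 + 1.024)\,\mu\mathrm{s} = 4.176\,\mu\mathrm{s}, \qquad
T_{\mathrm{WH}} \approx 4 \cdot 0.020 + 1.024 = 1.104\,\mu\mathrm{s}
```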

... Another way to achieve deadlock freedom is based on flow control, which prevents the formation of buffer occupation-request cycles in the network [105]. Majumder et al. [106] proposed a solution called remote control (RC). It is based on the fact that intra-chiplet networks are guaranteed to be deadlock-free at design time. Deadlocks therefore involve only the inter-chiplet BR, during periods of high congestion of outbound and inbound packets. ...

Interconnection Networks
  • Citing Book
  • January 2002
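The absence of buffer occupation-request cycles can be checked on a channel dependency graph. The classic sufficient condition (Dally and Seitz) is that a deterministic routing function is deadlock-free if its channel dependency graph is acyclic. The minimal check below uses an iterative depth-first search. The channel names and dependency sets in the tests are illustrative.

```python
from typing import Dict, List


def has_cycle(deps: Dict[str, List[str]]) -> bool:
    """Return True if the channel dependency graph contains a cycle.

    deps[c] lists the channels a packet holding channel c may request next.
    Iterative DFS with colors: 0 = unvisited, 1 = on current path, 2 = done.
    """
    nodes = set(deps)
    for targets in deps.values():
        nodes.update(targets)
    color = {n: 0 for n in nodes}
    for start in nodes:
        if color[start]:
            continue
        color[start] = 1
        stack = [(start, iter(deps.get(start, [])))]
        while stack:
            node, successors = stack[-1]
            nxt = next(successors, None)
            if nxt is None:
                color[node] = 2  # all successors explored
                stack.pop()
            elif color[nxt] == 1:
                return True  # back edge: a cyclic chain of buffer requests
            elif color[nxt] == 0:
                color[nxt] = 1
                stack.append((nxt, iter(deps.get(nxt, []))))
    return False
```

A unidirectional ring whose four channels depend on each other in a loop has a cyclic dependency graph. Routing the wrap-around hop onto a second set of virtual channels breaks the loop, which is the usual dateline technique, and the check then returns `False`.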

... In this case, a priority algorithm based on bandwidth and delay has been selected, the SIABP (Simple-IABP) algorithm [2]. The algorithm behaves well under high traffic loads and is the core of the link scheduling algorithm in the MMR router [3]. The algorithm also increases the priority of packets in proportion to their waiting time in the input buffer queue. ...

Investigating Switch Scheduling Algorithms to Support QoS in the Multimedia Router
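A biased priority of this kind can be sketched as the ratio of a packet's queuing delay to its connection's inter-arrival time. Priority then grows linearly with waiting time, and a higher-bandwidth connection (shorter inter-arrival time) ages faster.

This follows the commonly described IABP idea under stated assumptions. The exact SIABP arithmetic, a simplified version of IABP, is not reproduced here, and the field and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class QueuedPacket:
    # Hypothetical fields for illustration.
    conn_id: int
    arrival_time: float   # when the packet entered the input queue
    inter_arrival: float  # connection's nominal inter-arrival time (1 / rate)


def biased_priority(pkt: QueuedPacket, now: float) -> float:
    """Assumed IABP-style priority: queuing delay / inter-arrival time."""
    waiting = now - pkt.arrival_time
    return waiting / pkt.inter_arrival


def pick_next(queue: List[QueuedPacket], now: float) -> Optional[QueuedPacket]:
    """Serve the highest biased priority; ties go to the earliest arrival."""
    if not queue:
        return None
    return max(queue, key=lambda p: (biased_priority(p, now), -p.arrival_time))
```

At time 10, a packet that has waited 4 time units on a connection with inter-arrival time 4 has priority 1.0. A packet that has waited only 2 units on a connection with inter-arrival time 1 has priority 2.0, so the higher-rate connection is served first despite its shorter wait.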

... From Table 1 [14], [15], [19], [22]-[27]. Table 3 summarizes results obtained with various queue management architectures, knowing that the throughput of the QMRD [17] system depends on the protocol data unit (PDU) payload size, the reported OD-QM [13] results are for 512 active queues, and 64 bytes per packet. To make sure that our design is comparable, it was implemented with a total of 512 PDIs queue capacity, 64/32 bit priority, and the worst case egress port throughput is reported assuming 64-byte packets, supporting pipelined enqueue, dequeue and replace operations in a single clock cycle, i.e., O(1). ...

ShareStreams: A Scalable Architecture and Hardware Support for High-Speed QoS Packet Schedulers

... However, this is avoided by pipelining flit transmission at the phit level. The authors have proposed several link and switch scheduling algorithms [16][3][4]. Link scheduling algorithms are based on the concept of biased priorities [8][10][15]. ...

A new switch scheduling algorithm to improve QoS in the multimedia router
  • Citing Conference Paper
  • January 2003