ABSTRACT: Congestion has been an open issue in lossless interconnection networks for the last twenty years. In this environment, because packets cannot be dropped, congestion may lead to a dramatic degradation of network performance. Although congestion was traditionally avoided by overdimensioning the network, this solution is no longer valid because cost and power consumption concerns tend to reduce the size of the network in current systems. The restriction on network size makes the system work closer to the network saturation point, thus increasing the probability of congestion. Therefore, an effective congestion management mechanism is essential to avoid network performance degradation. However, except for the recently proposed RECN mechanism, none of the congestion management strategies proposed in the literature has been able to achieve, at the same time, the efficiency and scalability demanded by emerging systems. In this paper we analyze the difficulties in handling congestion in lossless interconnection networks, identifying the key points for the design of an efficient and scalable congestion management technique. We also describe how these ideas have been implemented in the RECN design. Moreover, we show that RECN is fully compatible with the Advanced Switching standard, allowing efficient and scalable congestion management in real systems.
ABSTRACT: Interconnection networks are a key element in a wide variety of systems: massively parallel processors, local and system area networks, clusters of PCs and workstations, and Internet Protocol routers. They are essential to high performance in the form of high-bandwidth communications, with low latency, "quality of service" (guaranteed service levels), efficient switching, and flexibility of network topology, as embodied in Myrinet, InfiniBand, Quadrics, Advanced Switching, and similar interconnects. But, despite all the advances that modern interconnects offer, congestion is a growing problem as "lossless" interconnection networks (those that do not allow data packets to be discarded) come to the fore.
ABSTRACT: Compared to the overdimensioned designs of the past, current interconnection networks operate closer to the point of saturation and run a higher risk of congestion. Among proposed strategies for congestion management, only the regional explicit congestion notification (RECN) mechanism achieves both the required efficiency and the scalability that emerging systems demand.
ABSTRACT: As VLSI technology advances, the interconnection network represents a larger percentage of the total system cost and power consumption. In fact, a current trend in network design is to reduce the number of components. However, this leads to systems working closer to the saturation point, and therefore an efficient congestion management technique is required. In that sense, RECN has been recently proposed for Advanced Switching (AS). RECN detects the formation of congestion trees and dynamically allocates queues for storing congested packets, thus eliminating the HOL blocking introduced by congestion trees. These queues are deallocated when congestion vanishes. We have identified two shortcomings that may affect RECN scalability and implementation. Firstly, although RECN allocates queues in an efficient way, resource deallocation is performed in order, thus losing efficiency and wasting resources. This leads to an excessive memory requirement at switch ports. Secondly, both the allocation and deallocation mechanisms involve the use of specific control packets not supported by the AS standard, thus preventing RECN implementation. In this sense we provide a detailed description of the current RECN deallocation mechanism. In this paper we present an enhanced RECN version (RECN-DD) where these problems have been eliminated. Specifically, we propose a new distributed queue deallocation mechanism that reduces the number of required resources and does not require the use of control packets. Moreover, we propose a new congestion notification mechanism that does not require non-standard AS packets. Instead, flow control packets are used to notify congestion, thus simplifying the implementation of RECN-DD in AS.
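The resource cost of ordered deallocation can be illustrated with a toy model. The sketch below is not the RECN or RECN-DD algorithm: the class `QueuePool`, its methods, and in particular the stack-like release rule used for `in_order=True` are assumptions made only for illustration, since the abstract does not state the exact ordering rule. It shows how an ordering constraint can keep an idle queue allocated, while independent release returns it to the pool at once.

```python
class QueuePool:
    """Toy pool of set-aside queues at one switch port (hypothetical model)."""

    def __init__(self, size, in_order):
        self.size = size          # number of queues available at the port
        self.in_order = in_order  # True: ordered release, False: independent
        self.allocated = []       # congestion-tree ids, in allocation order
        self.idle = set()         # ids whose congestion has vanished

    def allocate(self, tree_id):
        """Reserve a queue for a detected congestion tree; False if pool is full."""
        if tree_id in self.allocated:
            return True
        if len(self.allocated) >= self.size:
            return False
        self.allocated.append(tree_id)
        return True

    def congestion_vanished(self, tree_id):
        """Mark a queue as no longer needed, then free what the policy allows."""
        if tree_id in self.allocated:
            self.idle.add(tree_id)
        self._release()

    def _release(self):
        if self.in_order:
            # Assumed rule: only the most recently allocated queue may be freed.
            while self.allocated and self.allocated[-1] in self.idle:
                self.idle.discard(self.allocated.pop())
        else:
            # Independent rule: every idle queue is freed immediately.
            self.allocated = [t for t in self.allocated if t not in self.idle]
            self.idle.clear()

    def in_use(self):
        return len(self.allocated)


ordered = QueuePool(size=2, in_order=True)
independent = QueuePool(size=2, in_order=False)
for pool in (ordered, independent):
    pool.allocate("A")
    pool.allocate("B")
    pool.congestion_vanished("A")  # tree A clears while tree B persists

print(ordered.in_use(), ordered.allocate("C"))          # 2 False
print(independent.in_use(), independent.allocate("C"))  # 1 True
```

Under the assumed ordered rule, the queue for tree A stays reserved until tree B also clears, so a new tree C cannot be served; with independent release, the same two-queue pool accommodates it.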
ABSTRACT: Designers of large parallel computers and clusters are becoming increasingly concerned with the cost and power consumption of the interconnection network. A simple way to reduce them consists of reducing the number of network components and increasing their utilization. However, doing so without a suitable congestion management mechanism may lead to dramatic throughput degradation when the network enters saturation. Congestion management strategies for lossy networks (computer networks) are well known, but relatively little effort has been devoted to congestion management in lossless networks (parallel computers, clusters, and on-chip networks). Additionally, congestion is much more difficult to solve in this context due to the formation of congestion trees.
In this paper we study the dynamic evolution of congestion trees. We show that, contrary to the common belief, trees do not only grow from the root toward the leaves. There exist cases where trees grow from the leaves to the root, cases where several congestion trees grow independently and later merge, and even cases where some congestion trees completely overlap while being independent. This complex evolution and its implications on switch architecture are analyzed, proposing enhancements to a recently proposed congestion management mechanism and showing the impact on performance of different design decisions.
ABSTRACT: In this paper, we propose a new congestion management strategy for lossless multistage interconnection networks that scales as network size and/or link bandwidth increase. Instead of eliminating congestion, our strategy avoids performance degradation beyond the saturation point by eliminating the HOL blocking produced by congestion trees. This is achieved in a scalable manner by using separate queues for congested flows. These are dynamically allocated only when congestion arises, and deallocated when congestion subsides. Performance evaluation results show that our strategy responds to congestion immediately and completely eliminates the performance degradation produced by HOL blocking while using only a small number of additional queues.
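The effect of a separate queue for congested flows can be seen in a minimal simulation of one input port. This is a sketch under simplifying assumptions, not the mechanism evaluated in the paper: the function `cold_departures`, the labels "hot" and "cold", and the rule that the congested output accepts one packet every `hot_period` cycles are all hypothetical, and packets are classified statically rather than through dynamic queue allocation.

```python
from collections import deque


def cold_departures(packets, hot_period, separate_queue):
    """Return the cycles at which packets for the uncongested output leave.

    packets: destination labels ("hot" or "cold") in arrival order, all
             assumed to be buffered at the port at cycle 0.
    hot_period: the congested ("hot") output accepts one packet every
                hot_period cycles; the "cold" output is always ready.
    separate_queue: if True, "hot" packets go to their own queue.
    """
    normal, congested = deque(), deque()
    for dest in packets:
        if separate_queue and dest == "hot":
            congested.append(dest)
        else:
            normal.append(dest)

    departures = []
    cycle = 0
    while normal or congested:
        for queue in (normal, congested):
            if not queue:
                continue
            head = queue[0]
            output_ready = head == "cold" or cycle % hot_period == 0
            if output_ready:
                queue.popleft()
                if head == "cold":
                    departures.append(cycle)
                break  # the port forwards at most one packet per cycle
        cycle += 1
    return departures


traffic = ["hot", "cold", "hot", "cold"]
print(cold_departures(traffic, hot_period=4, separate_queue=False))  # [1, 5]
print(cold_departures(traffic, hot_period=4, separate_queue=True))   # [0, 1]
```

With a single FIFO, the second "cold" packet waits behind a "hot" packet until cycle 5 even though its own output is idle, which is HOL blocking. With the congested flow set aside, both "cold" packets leave in the first two cycles.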
ABSTRACT: Interconnection networks used in clusters of PCs are often dimensioned with certain restrictions. One restriction could be the reduction of power consumption and overall cost. In this case, the network size must be reduced. Another restriction is to guarantee that the system offers a minimum bandwidth. In this case, the network size must be increased. In both cases, the head-of-line (HOL) blocking effect (related to network congestion) may appear, degrading network performance and thus preventing the correct sizing of the network. Therefore, some mechanism should be implemented to reduce or eliminate this problem, so that the network can be dimensioned as desired while keeping network performance at its maximum. In this paper we analyze the impact on network performance of different mechanisms for handling HOL blocking when interconnection networks with mesh topology are dimensioned in several ways. We show that the previously proposed RECN congestion control mechanism is key to efficiently eliminating HOL blocking in meshes and, therefore, allows correct network sizing.