Figure 1: Sample Dragonfly topology with h=2 (p=2, a=4), 36 routers and 72 compute nodes.


Source publication
Conference Paper
Dragonfly networks have been recently proposed for the interconnection network of forthcoming exascale supercomputers. Relying on large-radix routers, they build a topology with low diameter and high throughput, divided into multiple groups of routers. While minimal routing is appropriate for uniform traffic patterns, adversarial traffic patterns c...

Contexts in source publication

Context 1
... a given h, the maximum-size network is composed of 2h² + 1 groups, 4h³ + 2h routers and 4h⁴ + 2h² processing nodes. The total number of ports per router is 4h − 1. Figure 1 represents an example of a Dragonfly with h = 2. A Dragonfly network built from routers with 64 ports (h = 16, such as the PERCS technology [3]) scales to more than 256K processing nodes, leading to multi-million-core supercomputers. ...
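As a side note (not part of the cited text), these sizing formulas are easy to verify numerically. The following minimal Python sketch assumes the balanced Dragonfly parameters used above (p = h terminals per router, a = 2h routers per group, h global links per router):

```python
def dragonfly_size(h):
    """Maximum balanced Dragonfly built from routers with h global ports,
    assuming p = h terminals and a = 2h routers per group."""
    groups  = 2 * h**2 + 1       # a*h + 1 groups
    routers = groups * 2 * h     # a routers in each group
    nodes   = routers * h        # p terminals on each router
    ports   = 4 * h - 1          # p + (a - 1) + h ports per router
    return groups, routers, nodes, ports

print(dragonfly_size(2))   # (9, 36, 72, 7)  -> matches Figure 1
print(dragonfly_size(16))  # (513, 16416, 262656, 63) -> more than 256K nodes
```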
Context 2
... et al. study the impact of such hotspots using traces of real application simulations [5]. They quantify the large impact of such link saturation by observing that different traces on a large simulated system execute 1.72 to 3.06 times faster simply by enabling non-minimal routing (Table 6 and Figure 13 of their paper, DEF vs. DFI). However, they do not consider adaptive routing mechanisms in their study. ...
Context 3
... these packets will have to be forwarded through the h subsequent global links. As global wiring is typically consecutive (observe the topology in Figure 1), all these links happen to be in the next router, R_o. Then, all misrouted traffic received in R_i has to be forwarded to R_o through the single local link connecting both routers. ...

Similar publications

Article
In large-scale supercomputers, the interconnection network plays a key role in system performance. Network topology largely defines the performance and cost of the interconnection network. Direct topologies are sometimes used due to their reduced hardware cost, but the number of network dimensions is limited by the physical 3D space, which leads to an...

Citations

... Both versions of UGAL require 3 VCs to provide deadlock freedom in Dragonflies. By contrast, there are proposals of deadlock-free adaptive routing algorithms for Dragonflies that do not need VCs to prevent deadlocks [23,24], although in practice they require other additional network resources. In general, deadlock-free adaptive routing algorithms for Dragonflies introduce some degree of network complexity and demand a higher number of network resources with respect to deterministic minimal routing. ...
Preprint
The Dragonfly topology is currently one of the most popular network topologies in high-performance parallel systems. The interconnection networks of many of these systems are built from components based on the InfiniBand specification. However, due to some constraints in this specification, the available versions of the InfiniBand subnet manager (OpenSM) do not include routing engines based on some popular deadlock-free routing algorithms proposed theoretically for Dragonflies, such as the one proposed by Kim and Dally based on Virtual-Channel shifting. In this paper we propose a straightforward method to integrate this routing algorithm in OpenSM as a routing engine, explaining in detail the configuration required to support it. We also provide experimental results, obtained both from a real InfiniBand-based cluster and from simulation, to validate the new routing engine and to compare its performance and requirements against other routing engines currently available in OpenSM.
... The network accommodates a total of 1056 servers. With a diameter of 3, the distribution and connection of global links follows the PalmTree arrangement, as specified in [10], and four virtual channels are employed. ...
... The network exhibits a diameter of 3, connecting a total of 1332 endpoints. The distribution and connection of global links follows the PalmTree arrangement, as detailed in [10], and two virtual channels are employed. ...
Conference Paper
Hotspot traffic patterns are a common phenomenon in HPC topologies, causing significant and lasting network performance degradation. This deterioration persists over time, and its impact intensifies even after the detrimental traffic injection into the network has ceased. To understand its causes and effects, we analyze network behavior under different hotspot traffic scenarios and compare performance across various topologies. We examine both the performance drop due to traffic flows with endpoint contention and the recovery process of the network after this phenomenon has occurred, if swift action is taken to mitigate it. Our results show that some topologies are more resilient to hotspot traffic than others, both in reducing the performance drop and in accelerating the recovery process. In particular, Flattened Butterfly is more resilient to congestion and consistently demonstrates rapid recovery. The results of the analysis reinforce the need for mechanisms with effective and expeditious action to reduce the magnitude and duration of the performance drop. Furthermore, they highlight behavioral differences between topologies that can affect the effectiveness of mechanisms using congestion-based metrics.
... Due to the importance of UGAL with local information, many improvements have been proposed, including allowing adaptive routing decisions to be made after the source router [11,12,18], using a static threshold to decide whether to direct the traffic to the non-minimal path [10], using average queue occupancy to overcome "phantom" congestion [27], and dynamically adjusting the bias value based on traffic condition [9,20]. None of the existing proposals challenge the latency approximation formula used in UGAL-L, as shown in Algorithm 1. ...
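For context, the UGAL-L decision referred to above compares a latency estimate (local queue occupancy multiplied by path length) for the minimal path against that of a random Valiant path. Below is a minimal sketch with illustrative names and an assumed optional threshold; it is not the cited Algorithm 1 verbatim:

```python
def ugal_local(min_queue, min_hops, val_queue, val_hops, threshold=0):
    """UGAL-L rule at the source router: estimate each path's latency as
    (local output-queue occupancy) * (hop count) and misroute only when
    the Valiant path looks cheaper. The threshold biases toward minimal."""
    if min_queue * min_hops <= val_queue * val_hops + threshold:
        return "minimal"
    return "valiant"

# Congested minimal path (40 flits, 3 hops) vs idle Valiant path (5 flits, 5 hops)
print(ugal_local(40, 3, 5, 5))  # -> "valiant"
```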
... Since then, various enhancements of the UGAL routing scheme have been developed. Garcia et al. [11] identified local congestion inside intermediate groups and improved UGAL by randomly selecting intermediate nodes for non-minimal routing. Garcia et al. [11] also proposed On-the-Fly Adaptive Routing (OFAR), which can dynamically change packet routes from minimal to non-minimal in both intra- and inter-group communication. Additional congestion management mechanisms have been studied in OFAR-CM [13]. ...
... This pattern is based on the servers in a group sending messages to servers in the following group, causing minimal routes to utilize the single link between the two groups. Subsequently, in [16], this idea was improved to obtain an adverse pattern that additionally presented problems for a particular Valiant scheme. ...
... However, a cycle alternating local and global links only has a load of 2, as each global-local-global path inside the cycle would have an alternative local-global-local (or shortest) path outside the cycle. A worse pattern for the Dragonfly is the ADV-k from [16], where each node in group g sends traffic to a node in group g + k. This pattern can be seen as a cycle going through each group once. ...
... In the case of the Dragonfly, as mentioned earlier, a well-known adverse traffic pattern denoted ADV-h is also simulated [16]. In this traffic pattern, each packet from a server in group g has its destination set to a randomly selected server in group g + h, where h represents the number of global links per switch. ...
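To make the pattern concrete, here is a small sketch of an ADV-h destination generator; the function and parameter names are assumptions for illustration, not taken from [16]:

```python
import random

def adv_h_destination(src_node, h, nodes_per_group, num_groups):
    """ADV-h: a node in group g sends to a random node in group
    (g + h) mod num_groups, so all minimal routes out of a group
    contend for the same few global links."""
    g = src_node // nodes_per_group
    dst_group = (g + h) % num_groups
    return dst_group * nodes_per_group + random.randrange(nodes_per_group)

# Dragonfly of Figure 1: h = 2, p = 2, a = 4 -> 9 groups of 8 nodes
print(adv_h_destination(src_node=0, h=2, nodes_per_group=8, num_groups=9))
```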
Article
Since today’s HPC and data center systems can comprise hundreds of thousands of servers and beyond, it is crucial to equip them with a network that provides high performance. New topologies proposed to achieve such performance need to be evaluated under different traffic conditions, aiming to closely replicate real-world scenarios. While most optimizations should be guided by common traffic patterns, it is essential to ensure that no pathological traffic pattern can compromise the entire system. Determining synthetic adversarial traffic patterns for a network typically relies on a thorough understanding of its topology and routing. In this paper, we address the problem of identifying a generic adversarial traffic pattern for low-diameter direct interconnection networks. We first focus on Random Regular Graphs (RRGs), which represent a typical case for these networks. Moreover, RRGs have been proposed as topologies for interconnection networks due to their superior scalability and expandability, among other advantages. We introduce Ant Mill, an adversarial traffic pattern for RRGs when using routes of minimal length. Secondly, we demonstrate that the Ant Mill traffic pattern is also adversarial in other low-diameter direct interconnection networks such as Slimfly, Dragonfly, and Projective networks. Ant Mill is thoroughly motivated and evaluated, enabling future studies of low-diameter direct interconnection networks to leverage its findings.
... Examples of indirect and direct interconnect networks for HPC and data centers [13][14][15]. ...
Article
Reflecting the recent slow-down in Moore’s law and the proliferation of artificial intelligence/machine learning workloads, the performance and energy consumption of networks are becoming barriers in high-performance computing (HPC) and data centers. Optical switches are expected to break these barriers, and indeed their introduction has recently commenced in data centers. This paper discusses how optical switching technologies can innovate future intra-data-center networks. Hyperscale data centers are much bigger in scale, and their network requirements differ slightly from those of HPC. This paper focuses on data center networks, since the impact of optical technologies will be more significant in data centers than in HPC. In addition to the scale issue, important metrics to be considered for network design are traffic characteristics and latency, both of which are highlighted in this paper. For hybrid (electrical packet and optical circuit) switching networks, the target latency for the optical circuit switch network (connection setup/teardown time) is shown to be around 10 µs, and the needed technologies are clarified and verified by experiments. The optical switch can simplify the present multi-tier switching network above tier-1 switches into a single-tier configuration, which is possible with the development of efficient large-port-count optical switches. Among the different switching architectures, combining the space and wavelength dimensions is shown to be one of the best solutions. Fast switching needs fast device response times. Si photonics devices using Mach–Zehnder interferometers or ring-resonator-based switches and tunable filters are the most promising candidates: they offer cost-effective mass production and fast operation, making them excellent choices for the optical switches envisaged. Another critical technology to maximize the benefits of optical switches is a simple and low-latency control mechanism. Different approaches have been suggested, as summarized in this work. Among them, harnessing optical switch parallelism is a unique technique that matches recent advances in electrical switch chips. A fast control network is realized by using a fully decentralized and asynchronous control mechanism. A hyperscale data center offers a wide variety of services, and no one system fits all needs. Optimizing parameters is an important task for maximizing the impact of optical switching in different kinds of data centers.
... This is the inspiration for the research presented in this paper. There are several hybrid interconnects that use different topologies at each level, as in IBM BlueGene [11], [12], [13], Dragonfly [14], [15], X-Mesh [16] and Torus [17]. Furthermore, newer technologies in optical networks and photonic switches are emerging. ...
... Recently, a lot of interest has been devoted to multi-level direct networks as promising scalable topologies for high-performance computing, such as PERCS [30], Dragonfly [14], [15] and Cray [30]. These networks, built from high-radix switches, give the impression of an all-to-all interconnection. ...
Preprint
Multi-level direct networks, fueled by the evolving technology of active optical cables and increasing pin bandwidth, achieve reduced diameter and cost with high-radix switches. These networks, like the Dragonfly, are becoming the preferred candidates for extreme-scale parallel machines such as exascale computers. In this paper, we introduce the Hyper Z-Tree topology, which deploys the Z-Tree, a variant of the fat tree, as a computing node of a Generalized Hypercube (GHC) configuration. The resulting configuration provides higher bisection bandwidth, lower latency for some applications, and higher throughput. Furthermore, the levels of the fat tree offer several path diversities across the GHC dimensions, yielding a more fault-tolerant architecture. To profit from these path diversities, we propose two adaptive routing algorithms, which are extensions of the routing algorithm suggested for the HyperX topology. These two algorithms exhibit better latency and throughput than HyperX.
... The universal globally-adaptive load-balanced routing (UGAL) [8] was designed to route dynamically by exploiting an indicator of local congestion such as output queue occupancy. Several variants of UGAL have also been proposed, including [2], [7], [9]. However, the local indicator usually fails to accurately reflect congestion at the network level [1], and therefore leads to wrong decisions and degradation in both network latency and throughput. ...
... Traditional routing methods. Traditional routing algorithms can generally be divided into two categories: non-adaptive routing, such as MIN, VALg [7], and VALn [6], and adaptive routing, such as UGALg [8], UGALn [2], and Progressive Adaptive Routing (PAR) [9]. MIN always forwards packets along the shortest path and ensures that packets reach their destination routers within 3 hops. ...
... For example, using some local congestion information at the source router, UGALg (UGALn) chooses between a minimal path and a VALg (VALn) non-minimal path, as sketched below. PAR [9] generalizes UGALn by further allowing some intermediate routers to adjust routing paths according to their local congestion information. ...
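As an illustration of the VALg/VALn distinction (a sketch under assumed naming, not code from the cited papers), the two schemes differ only in where the random Valiant intermediate point is drawn:

```python
import random

def valiant_intermediate(num_groups, routers_per_group, scheme="VALg"):
    """VALg picks a random intermediate *group* (any of its routers may
    be traversed); VALn pins a random intermediate *router*.
    Returns a (group, router-within-group-or-None) pair."""
    if scheme == "VALg":
        return random.randrange(num_groups), None
    r = random.randrange(num_groups * routers_per_group)
    return r // routers_per_group, r % routers_per_group

print(valiant_intermediate(num_groups=9, routers_per_group=4, scheme="VALn"))
```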
... Work by Bhatele et al. [21] showed that non-minimal routing on contiguous allocations could provide performance benefits similar to minimal routing with random placement on two-level direct networks, with the caveat that it created additional traffic. Garcia et al. [22] worked to improve adaptive routing by incorporating measures of congestion along intermediate hops in potential non-minimal paths. Roweth et al. [14] found low utilization of optical links in simulations of DOE workloads on dragonfly networks. ...
... This is because the targeted stack is often based on TCP, which suffers performance degradation whenever packets become reordered. In contrast, HPC networks usually use packet-level adaptivity, and research focuses on choosing good congestion signals, often with hardware modifications [82], [83]. ...
Article
The recent line of research into topology design focuses on lowering network diameter. Many low-diameter topologies such as Slim Fly or Jellyfish that substantially reduce cost, power consumption, and latency have been proposed. A key challenge in realizing the benefits of these topologies is routing. On one hand, these networks provide shorter path lengths than established topologies such as Clos or torus, leading to performance improvements. On the other hand, the number of shortest paths between each pair of endpoints is much smaller than in Clos, but there is a large number of non-minimal paths between router pairs. This hampers or even makes it impossible to use established multipath routing schemes such as ECMP. In this article, to facilitate high-performance routing in modern networks, we analyze existing routing protocols and architectures, focusing on how well they exploit the diversity of minimal and non-minimal paths. We first develop a taxonomy of different forms of support for multipathing and overall path diversity. Then, we analyze how existing routing schemes support this diversity. Among others, we consider multipathing with both shortest and non-shortest paths, support for disjoint paths, or enabling adaptivity. To address the ongoing convergence of HPC and “Big Data” domains, we consider routing protocols developed for both HPC systems and for data centers as well as general clusters. Thus, we cover architectures and protocols based on Ethernet, InfiniBand, and other HPC networks such as Myrinet. Our review will foster developing future high-performance multipathing routing protocols in supercomputers and data centers.
... The implementation used in the Niagara supercomputer relies on Mellanox InfiniBand hardware. To select the optimal path, Dragonfly+ uses a variation of OFAR adaptive routing [61], which re-evaluates the optimal path at each hop. Explicit control messages are sent among the switches to signal congestion and avoid creating hotspots in the network. ...