Conference Paper

Combining OpenFabrics Software and Simulation Tools for Modeling InfiniBand-Based Interconnection Networks

Authors:
  • University of Castilla-La Mancha, Albacete, Spain

... We have performed simulations and experiments with real IB-based hardware, using a framework which integrates IB control software, IB-based hardware and OMNeT++-based simulators (see Fig. 12). We have extended previously proposed tools [2], [35], [36]. ...
... We use several modules of the OpenFabrics Software [14], such as ibsim, which simulates the control traffic in the IB fabric; the IB tools, used to test and verify the configuration of the IB fabric; and OpenSM. Moreover, we have developed the RAAP HPC (RHPC) tools [35]. RHPC provides utilities to obtain information about the network, such as the topology or routing paths, which can be used to feed the OMNeT++-based simulator described in the next section. ...
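The workflow described above extracts fabric information (topology, routing paths) from the control software and feeds it to a simulator. The following minimal sketch illustrates that idea only; the dump format and the `parse_topology` helper are hypothetical, not the actual RHPC tools or `ibnetdiscover` output:

```python
# Hypothetical, simplified topology dump: one link per line,
# "<node> <port> -> <peer> <peer_port>". Real tools emit richer formats.

def parse_topology(dump: str) -> dict:
    """Return {node: [(local_port, peer), ...]} from the toy dump format."""
    topo = {}
    for line in dump.strip().splitlines():
        src, port, _, dst, _ = line.split()
        topo.setdefault(src, []).append((int(port.lstrip("p")), dst))
    return topo

dump = """\
S1 p1 -> S2 p3
S1 p2 -> H1 p1
S2 p1 -> H2 p1
"""
print(parse_topology(dump))
# {'S1': [(1, 'S2'), (2, 'H1')], 'S2': [(1, 'H2')]}
```

An adjacency list like this is the kind of input a topology-agnostic simulator can consume directly when building its network model.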
Article
Full-text available
Dragonfly topologies are gathering great interest as one of the most promising interconnect options for High-Performance Computing systems. Dragonflies contain physical cycles that may lead to traffic deadlocks unless the routing algorithm prevents them properly. Previous topology-aware algorithms are difficult to implement, or even unfeasible, in systems based on the InfiniBand (IB) architecture, which is the most widely used network technology in HPC systems. In this paper, we present a new deterministic, minimal-path routing algorithm for Dragonfly that prevents deadlocks using VLs according to the IB specification, so that it can be straightforwardly implemented in IB-based networks. We have called this proposal D3R (Deterministic Deadlock-free Dragonfly Routing). D3R is scalable, as it requires only 2 VLs to prevent deadlocks regardless of network size, i.e. fewer VLs than those required by the deadlock-free routing engines available in IB that are suitable for Dragonflies. Alternatively, D3R achieves higher throughput if an additional VL is used to reduce internal contention in the Dragonfly groups. We have implemented D3R as a new routing engine in OpenSM, the control software including the subnet manager in IB. We have evaluated D3R by means of simulation and by experiments performed in a real IB-based cluster, the results showing that, in general, D3R outperforms other routing engines.
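The abstract states that 2 VLs suffice to break Dragonfly deadlock cycles, but does not spell out the assignment rule. A common pattern in this family of schemes, shown here purely as an illustration and not as D3R's actual rule, is to start packets on VL0 and escalate to VL1 once the packet has crossed its global (inter-group) link, so the inter-group dependency cycle cannot close:

```python
# Illustrative 2-VL escalation for a minimal Dragonfly path
# (local hop -> global hop -> local hop). NOT the exact D3R rule.

def next_vl(current_vl: int, crossing_global_link: bool) -> int:
    """VL for the next hop: escalate to VL1 after the global hop."""
    if crossing_global_link:
        return 1          # every hop after the global link uses VL1
    return current_vl     # intra-group hops keep the current VL

hops = [("local", False), ("global", True), ("local", False)]
vl, path_vls = 0, []
for _, is_global in hops:
    path_vls.append(vl)
    vl = next_vl(vl, is_global)
print(path_vls)  # [0, 0, 1]
```

The key property is that the VL sequence along any minimal path is non-decreasing, so no cyclic channel dependency can form within a single VL.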
... In more detail, in order to get results for this evaluation we have performed both simulations and experiments with real IB hardware, using a framework which integrates IB control software, IB-based hardware and OMNeT++-based simulators [30]. Basically, in the context of this study we have extended a previously proposed methodology [18,31], adding support for managing and using SL2VL tables to the flit-level IB simulator contributed by Mellanox Technologies™ [18], and to an in-house packet-level technology-agnostic interconnection networks simulator [32]. ...
Preprint
Full-text available
The Dragonfly topology is currently one of the most popular network topologies in high-performance parallel systems. The interconnection networks of many of these systems are built from components based on the InfiniBand specification. However, due to some constraints in this specification, the available versions of the InfiniBand network controller (OpenSM) do not include routing engines based on some popular deadlock-free routing algorithms proposed theoretically for Dragonflies, such as the one proposed by Kim and Dally based on Virtual-Channel shifting. In this paper we propose a straightforward method to integrate this routing algorithm in OpenSM as a routing engine, explaining in detail the configuration required to support it. We also provide experimental results, obtained both from a real InfiniBand-based cluster and from simulation, to validate the new routing engine and to compare its performance and requirements against other routing engines currently available in OpenSM.
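In InfiniBand, a packet's Service Level (SL) is fixed at the source, and each switch maps SL to VL through per-port SL2VL tables, which is the mechanism that lets Virtual-Channel shifting be emulated. The sketch below shows the lookup idea only; the table contents are illustrative, not the configuration from the paper:

```python
# Illustrative SL2VL lookup: the table is indexed by (input port,
# output port) and maps SL -> VL, so a switch can place the same SL on
# a different VL depending on where the packet is in its path.

sl2vl = {
    ("local_in", "global_out"): {0: 0},   # before the global hop: VL0
    ("global_in", "local_out"): {0: 1},   # after the global hop: VL1
}

def lookup_vl(in_port: str, out_port: str, sl: int) -> int:
    """Resolve the VL a packet is buffered in at this switch."""
    return sl2vl[(in_port, out_port)][sl]

print(lookup_vl("global_in", "local_out", 0))  # 1
```

Because the SL itself never changes in flight, all of the "shifting" behavior has to be encoded in these per-hop tables, which is exactly the constraint the paper works around.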
... It was outside the scope of this work to design simulation parameters reflecting a wide array of use cases; instead, an existing simulation tool for HPC systems was used: Sauron [92]. This tool, based on the OMNeT++ framework, has been used in several studies of HPC routing techniques [93,94], including several combining queueing schemes and adaptive routing [55,70,71], and energy-saving techniques [101]. Within this framework, simple parameters were chosen to model a network that is small, yet large enough to exhibit the behaviour studied in that case. ...
Thesis
Building efficient supercomputers requires optimising communications, and their exaflopic scale causes an unavoidable risk of relatively frequent failures. For a cluster with given networking capabilities and applications, performance is achieved by providing a good route for every message while minimising resource access conflicts between messages. This thesis focuses on the fat-tree family of networks, for which we define several overarching properties so as to efficiently take into account a realistic superset of this topology, while keeping a significant edge over agnostic methods. Additionally, a partially novel static congestion risk evaluation method is used to compare algorithms. A generic optimisation is presented for some applications on clusters with heterogeneous equipment. The proposed algorithms use distinct approaches to improve centralised static routing by combining computation speed, fault-resilience, and minimal congestion risk.
... In more detail, in order to get results for this evaluation we have performed both simulations and experiments with real IB hardware, using a framework which integrates IB control software, IB-based hardware and OMNeT++-based simulators [30]. Basically, in the context of this study we have extended a previously proposed methodology [18,31], adding support for managing and using SL2VL tables to the flit-level IB simulator contributed by Mellanox ... In this section we analyze the impact of having different VL buffer sizes and the use of Virtual Output Queuing (VOQ) on the performance of the analyzed routing engines. In that sense, the InfiniBand (IB) specification requires separate buffering resources at switch and HCA ports, enough space in ... different interconnection network configurations and switch features [32]. ...
Article
Full-text available
Article
Simulation is used to evaluate and validate the behavior and performance of computing systems, in particular the interconnection network in the context of high-performance computing. For the simulation to be performed, the simulator program must be provided with a mechanism that generates network traffic or workload. Although synthetic traffic has been widely used, communication from real applications is a better and more representative workload. With this kind of network workload, the simulations can become slower, especially when simulating Exascale systems. In this paper, we extend the VEF trace framework, originally designed for feeding off-chip networks with MPI traffic, including new functionality related to on-chip communications and introducing improvements to speed up the simulations. This way, the VEF framework makes it possible to study the behavior of Exascale interconnection networks with realistic traffic and in reasonably short times.
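Trace-driven workload generation of the kind described above boils down to replaying recorded communication events in timestamp order. The sketch below conveys that idea only; the record layout `(time, src, dst, bytes)` is hypothetical and not the real VEF trace format:

```python
# Hedged sketch of trace replay: records are delivered to the simulated
# network in timestamp order, regardless of their order in the file.
import heapq

trace = [
    (5.0, 0, 1, 4096),   # (time, src, dst, bytes)
    (1.0, 1, 2, 1024),
    (3.0, 2, 0, 2048),
]

events = list(trace)
heapq.heapify(events)            # min-heap keyed on the timestamp

replay_order = []
while events:
    t, src, dst, size = heapq.heappop(events)
    replay_order.append((t, src, dst, size))

print(replay_order)  # events in timestamp order: 1.0, 3.0, 5.0
```

A real MPI-trace replayer must additionally honor inter-message dependencies (a receive cannot complete before the matching send), which is one reason such frameworks are more involved than a plain event queue.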
Article
The interconnection network architecture is crucial for High-Performance Computing (HPC) clusters, since it must meet the increasing computing demands of applications. Current trends in the design of these networks are based on increasing link speed, while reducing latency and the number of components in order to lower cost. The InfiniBand Architecture (IBA) is an example of a powerful interconnect technology, delivering huge amounts of information in a few microseconds. IBA-based hardware is able to deliver EDR and HDR speeds (i.e. 100 and 200 Gb/s, respectively). Unfortunately, congestion situations and their derived problems (i.e. Head-of-Line blocking and buffer hogging) are a serious threat to the performance of both the interconnection network and the entire HPC cluster. In this paper, we propose a new approach to provide IBA-based networks with techniques for reducing congestion problems. We propose Flow2SL-ITh, a technique that combines a static queuing scheme (SQS) with the closed-loop congestion control mechanism included in IBA-based hardware (a.k.a. injection throttling, ITh). Flow2SL-ITh separates traffic flows, storing them in different virtual lanes (VLs) in order to reduce HoL blocking, while the injection rate of congested flows is throttled. While congested traffic drains, there is no buffer sharing among traffic flows stored in different VLs, which reduces the negative effects of congestion. We have implemented Flow2SL-ITh in OpenSM, the open-source implementation of the IBA subnet manager (SM). Experimental results obtained by running simulations and real workloads in a small IBA cluster show that Flow2SL-ITh outperforms existing techniques by up to 44% under some traffic scenarios.
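The core of a static queuing scheme like the one combined in Flow2SL-ITh is a deterministic mapping from flows to SLs/VLs, so that different flows tend to queue separately and HoL blocking between them is reduced. The sketch below shows one such mapping as a hedged illustration; the actual Flow2SL function is defined in the paper and is not reproduced here:

```python
# Illustrative static queuing scheme: spread flows over the available
# VLs by a deterministic function of the destination. NOT the actual
# Flow2SL mapping.

NUM_VLS = 4

def flow_to_sl(src: int, dst: int) -> int:
    """Deterministically map a (src, dst) flow to a service level."""
    return dst % NUM_VLS   # simple destination-based spreading

flows = [(0, 5), (1, 6), (2, 7), (3, 5)]
print({f: flow_to_sl(*f) for f in flows})
# {(0, 5): 1, (1, 6): 2, (2, 7): 3, (3, 5): 1}
```

Note that flows sharing a destination still share a VL (here, both flows to node 5 map to SL 1), which is deliberate: a congestion tree rooted at one destination stays confined to one VL while injection throttling drains it.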
Article
Full-text available
Both QoS support and congestion management techniques become essential to achieve good network performance in current high-speed interconnection networks. The most effective techniques traditionally considered for both issues, however, require too many resources to be implemented. In this paper, we propose a new cost-effective switch architecture able to face the challenges of congestion management and, at the same time, to provide QoS. The efficiency of our proposal is based on using the resources (queues) used by RECN (an efficient Head-of-Line blocking elimination technique) also for QoS support, without increasing queue requirements. The provided results show that the new switch architecture is able to guarantee QoS levels without any degradation due to congestion situations. Index Terms—High-speed interconnection networks, quality of service, congestion management.
Article
The growing system size of high-performance computers results in a steady decrease of the mean time between failures. Exchanging network components often requires whole-system downtime, which increases the cost of failures. In this work, we study a fail-in-place strategy where broken network elements remain untouched. We show that a fail-in-place strategy is feasible for today's networks, that the degradation is manageable, and we provide guidelines for the design. Our network failure simulation tool chain allows system designers to extrapolate the performance degradation based on expected failure rates, and it can be used to evaluate the current state of a system. In a case study of real-world HPC systems, we analyze the performance degradation throughout the system's lifetime under the assumption that faulty network components are not repaired, which results in a recommendation to change the routing algorithm used, in order to improve both the network performance and the fail-in-place characteristic.
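At its core, a fail-in-place study removes failed elements from the network graph and asks whether the routing can still connect all endpoints, and at what cost. The toy sketch below illustrates that loop on a 4-node ring with BFS rerouting; the paper's tool chain of course models real fabrics and real routing engines, which this does not attempt:

```python
# Toy fail-in-place check: fail a link, verify connectivity, and
# recompute a route around the failure (BFS shortest path).
from collections import deque

def bfs_path(adj, src, dst):
    """Shortest path from src to dst, or None if disconnected."""
    prev, seen, q = {}, {src}, deque([src])
    while q:
        u = q.popleft()
        if u == dst:
            path = [dst]
            while path[-1] != src:
                path.append(prev[path[-1]])
            return path[::-1]
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                prev[v] = u
                q.append(v)
    return None

ring = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {2, 0}}
ring[0].discard(1); ring[1].discard(0)      # link (0, 1) fails in place
print(bfs_path(ring, 0, 1))                 # rerouted: [0, 3, 2, 1]
```

Repeating this over sampled failure patterns, and recording path lengths and link loads instead of just connectivity, gives the kind of degradation curve the abstract describes.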
Conference Paper
One of the objectives of the decade for High-Performance Computing systems is to reach the exascale level of computing power before 2018, which requires strong design efforts. High-speed, low-latency interconnection networks are essential elements of exascale HPC systems; indeed, the performance of the whole system depends on that of the interconnection network. In order to develop and test new techniques suited to exascale HPC systems, software-based network simulators are commonly used. As developing a network simulator from scratch is a difficult task, several platforms help developers, OMNeT++ being one of the most popular. In this paper, we propose a new generic network simulator exploiting the features of the OMNeT++ framework. The proposed tool is a first step towards modeling the high-performance interconnection networks of exascale HPC systems: the message switching layer, routing and arbitration algorithms, and buffer organizations have been modeled according to the current and expected characteristics of these systems. In addition, the tool has been designed so that it is possible to simulate networks of large size. Simulation results, validated against real systems, show the accuracy of the model.
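One of the switch mechanisms such simulators model is output-port arbitration. The round-robin arbiter below is a hedged, self-contained illustration of that kind of model, not code from the paper's OMNeT++ tool:

```python
# Illustrative round-robin arbiter: each cycle, grant the first
# requesting input after the last granted one (with wrap-around),
# so every input eventually gets served.

def round_robin(requests, last_granted):
    """Return the granted input index, or None if no input requests."""
    n = len(requests)
    for i in range(1, n + 1):
        candidate = (last_granted + i) % n
        if requests[candidate]:
            return candidate
    return None

grants, last = [], 0
for reqs in [[1, 1, 0, 1], [1, 1, 0, 1], [1, 1, 0, 1]]:
    last = round_robin(reqs, last)
    grants.append(last)
print(grants)  # [1, 3, 0]
```

Even this tiny model shows the fairness property the scheme is chosen for: with the same three inputs requesting every cycle, the grant rotates among them instead of starving any one.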
Conference Paper
Efficient deadlock-free routing strategies are crucial to the performance of large-scale computing systems. Many methods exist, but it remains a challenge to achieve the lowest latency and highest bandwidth for irregular or unstructured high-performance networks. We investigate a novel routing strategy based on the single-source shortest-path routing algorithm and extend it to use virtual channels to guarantee deadlock freedom. We show that this algorithm achieves minimal latency and high bandwidth with only a low number of virtual channels and can be implemented in practice. We demonstrate that the problem of finding the minimal number of virtual channels needed to route a general network deadlock-free is NP-complete, and we propose different heuristics to solve the problem. We implement all proposed algorithms in the Open Subnet Manager of InfiniBand and compare the number of needed virtual channels and the bandwidths of multiple real and artificial network topologies which are established in practice. Our approach allows the existing virtual channels to be used more effectively to guarantee deadlock freedom, increasing the effective bandwidth by up to a factor of two. Application benchmarks show an improvement of up to 95%. Our routing scheme is not limited to InfiniBand, but can be deployed on existing InfiniBand installations to increase network performance transparently, without modifications to the user applications.
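The check at the heart of VL-based deadlock freedom is a cycle test on the channel dependency graph: the routes assigned to one virtual channel layer are deadlock-free iff the dependencies they induce between channels form no cycle. The sketch below is the textbook test only, not the paper's NP-completeness construction or its heuristics:

```python
# DFS cycle detection on a channel dependency graph
# {channel: [channels it depends on]}. A cycle means deadlock risk;
# moving one dependency to another VL breaks it.

def has_cycle(deps):
    WHITE, GREY, BLACK = 0, 1, 2
    color = {c: WHITE for c in deps}
    def visit(c):
        color[c] = GREY                      # on the current DFS path
        for d in deps.get(c, ()):
            if color.get(d, WHITE) == GREY:  # back edge: cycle found
                return True
            if color.get(d, WHITE) == WHITE and visit(d):
                return True
        color[c] = BLACK                     # fully explored
        return False
    return any(color[c] == WHITE and visit(c) for c in deps)

# A 3-channel dependency cycle (deadlock risk on one VL) ...
print(has_cycle({"a": ["b"], "b": ["c"], "c": ["a"]}))   # True
# ... broken by moving one dependency onto a second VL:
print(has_cycle({"a": ["b"], "b": ["c"], "c": []}))      # False
```

A heuristic for the minimization problem the paper proves NP-complete would repeatedly run a test like this per VL layer, escalating offending routes to a fresh VL until every layer's dependency graph is acyclic.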