
Jesus Escudero-Sahuquillo
- PhD in Computer Science
- Professor (Associate) at University of Castilla-La Mancha, Albacete, Spain
About
70
Publications
4,196
Reads
471
Citations
Introduction
Jesus Escudero-Sahuquillo is an Associate Professor at UCLM, Spain. His research focuses on interconnection networks for HPC systems and Datacenters, and on strategies for improving them, such as congestion management, routing algorithms, network topologies, and power saving. In these fields, he has published around 45 papers in peer-reviewed international conferences and journals. He has also participated in several projects funded by public and private bodies.
Current institution
University of Castilla-La Mancha, Albacete, Spain
Current position
- Professor (Associate)
Publications (70)
Supercomputers (SCs) enable advanced research for a variety of scientific fields, and data centers (DCs) power our day-to-day services. These two massive systems work at scales, in terms of storage and computing power, which are not comparable to our everyday devices. As such, they require state-of-the-art technology to constantly evolve and meet o...
Over the past decade, specialized computing and storage devices, such as GPUs, TPUs, and high-speed storage, have been increasingly integrated into server nodes within Supercomputers and Data Centers. The advent of high-bandwidth memory (HBM) has facilitated a more compact design for these components, enabling multiple units to be interconnected wi...
The Dragonfly topology is currently one of the most popular network topologies in high-performance parallel systems. The interconnection networks of many of these systems are built from components based on the InfiniBand specification. However, due to some constraints in this specification, the available versions of the InfiniBand network controlle...
The interconnection network is a key element in High-Performance Computing (HPC) and Datacenter (DC) systems whose performance depends on several design parameters, such as the topology, the switch architecture, and the routing algorithm. Among the most common topologies in HPC systems, the Fat-Tree offers several shortest-path routes between any p...
The interconnection network is a crucial subsystem in High-Performance Computing clusters and Data-centers, guaranteeing high bandwidth and low latency to the applications' communication operations. Unfortunately, congestion situations may spoil network performance unless the network design applies specific countermeasures. Adaptive routing algorit...
Interconnection networks are crucial in data centers and supercomputers, ensuring high communication bandwidth and low latency under demanding traffic patterns from data-intensive applications. These patterns can cause congestion, affecting system performance if not addressed efficiently. Current congestion control techniques, like DCQCN, struggle...
The InfiniBand (IB) interconnection technology is widely used in the networks of modern supercomputers and data centers. Among other advantages, the IB-based network devices allow for building multiple network topologies, and the IB control software (subnet manager) supports several routing engines suitable for the most common topologies. However,...
InfiniBand networking technology is widely utilized in modern high-performance systems. This work describes the implementation of the hybrid network topology known as KNS in a real HPC cluster using an InfiniBand interconnection network. We have used the CELLIA cluster (Cluster for the Evaluation of Low-Latency Architectures), which consists of 50 compute and s...
The interconnection network is an ever more important subsystem in current Datacenters and supercomputers, where data- and computing-hungry applications, used in high-performance computing (HPC), artificial intelligence (AI), or Cloud fields, demand a growing number of communication operations among the server hosts. These demands require that the net...
Current high-performance interconnection networks for HPC and Data-Center systems incorporate mechanisms to prevent congestion from degrading network performance. Specifically, the popular InfiniBand specification defines a mechanism to reduce the injection rate of the traffic flows contributing to congestion. However, the efficiency of this mechan...
Data centers are a fundamental infrastructure in the Big-Data era, where applications and services demand a high amount of data and minimum response times. The interconnection network is an essential subsystem in the data center, as it must guarantee high communication bandwidth and low latency to the communication operations of applications, other...
In high-performance computing systems, the interference among applications that share network resources is one of the major causes of performance degradation and increased variability among different executions of the same application. In this article, we propose a virtual partitioning scheme for dragonfly networks that combines a job-allocation pol...
Modern high-speed interconnection networks include support for the provision of quality of service (QoS) to the applications. The output scheduling algorithm plays an important role in the QoS provision, choosing the packets to be delivered from the output buffers. InfiniBand, one of the most used interconnection technologies, includes a table-base...
High-performance interconnection networks are essential subsystems of HPC systems and data centers. In this paper we present a switch model whose arbitration unit has been modified to be more suitable for networks that support traffic with different needs or priorities. We base our work on the NetFPGA-SUME platform, which consists of a board...
The performance of lossy data-center networks (DCNs) may degrade due to packet dropping (and possible retransmission) under congestion. In this article we propose and evaluate a solution to deal with congestion in lossy DCNs, based on the same approach as the Dynamic Virtual Lanes (DVL) technique, previously proposed for lossless DCNs. This approac...
The number of endnodes in high-performance computing (HPC) and Datacenter (DC) systems is constantly increasing. Hence, it is crucial to minimize the impact of network congestion to guarantee a suitable network performance. InfiniBand is a prominent interconnect technology that allows implementing efficient topologies and routing algorithms, as wel...
The architecture of modern DataCenters (DCs) has evolved to meet the stringent communication latency requirements of applications. RDMA technologies such as RoCEv2 have become mainstream to reduce latency, but their performance is impaired in systems with lossy networks due to the overload introduced by packet retransmissions. Thus, lossless networ...
Hybrid and direct topologies are cost-efficient and scalable options to interconnect thousands of end nodes in high-performance computing (HPC) systems. They offer a rich path diversity, high bisection bandwidth, and a reduced diameter guaranteeing low latency. In these topologies, efficient deterministic routing algorithms can be used to balance s...
Interconnection network performance is a key issue in HPC systems and datacenters, especially as their number of end nodes grows, to cope with application needs. The network topology and the routing algorithm are important factors for performance and cost. Topologies such as fat-tree or Dragonfly were proposed to maximize network performance while...
The interconnection network architecture is crucial for High-Performance Computing (HPC) clusters, since it must meet the increasing computing demands of applications. Current trends in the design of these networks are based on increasing link speed, while reducing latency and number of components in order to lower the cost. The InfiniBand Architec...
Dragonfly topologies are gathering great interest as one of the most promising interconnect options for High-Performance Computing systems. Dragonflies contain physical cycles that may lead to traffic deadlocks unless the routing algorithm prevents them properly. Previous topology-aware algorithms are difficult to implement, or even unfeasible, in...
Simulation is often used to evaluate the behavior and the performance of computing systems. Specifically, in the field of high-performance interconnection networks for HPC clusters, simulation has been extensively used to verify and validate network operation models and to evaluate their performance. Nevertheless, experiments cond...
The performance of interconnection networks is a challenging issue for High-Performance Computing (HPC) systems, which becomes even more important when the number of interconnected endnodes grows. In that sense, Dragonfly interconnection patterns are a very popular option to configure the network topology, especially for large systems, as they are...
The number of endnodes in high-performance computing systems has grown significantly in the last years. Hence, the interconnection network has become an essential issue as it may end up being the system bottleneck if it is not properly designed. In that sense, the Dragonfly topology has become very popular for interconnecting high-performance compu...
Current high-performance platforms such as Datacenters or High-Performance Computing systems rely on high-speed interconnection networks able to cope with the ever-increasing communication requirements of modern applications. In particular, in high-performance systems that must offer differentiated services to applications which involve traffic prio...
Interconnection networks are key components in high-performance computing (HPC) systems, and their performance has a strong influence on that of the overall system. However, at high load, congestion and its negative effects (e.g., Head-of-line blocking) threaten the performance of the network, and thus that of the entire system. Congestion control (CC)...
As parallel computing systems increase in size, the interconnection network is becoming a critical subsystem. The current trend in network design is to use as few components as possible to interconnect the end nodes, thereby reducing cost and power consumption. However, this increases the probability of congestion appearing in the network. As conge...
Head-of-Line (HoL) blocking is a well-known phenomenon that may dramatically degrade the performance of modern high-performance interconnection networks. Many techniques have been proposed to solve this problem, most of them based on separating traffic flows into different queues at switch ports. However, the efficiency of these proposals may v...
One of the objectives of the decade for High-Performance Computing systems is to reach the exascale level of computing power before 2018, which will require strong efforts in their design. In that sense, high-speed, low-latency interconnection networks are essential elements for exascale HPC systems. Indeed, the performance of the whole system...
The fat-tree is one of the most common topologies among the interconnection networks of the systems currently used for high-performance parallel computing. Among other advantages, fat-trees allow the use of simple but very efficient routing schemes. One of them is a deterministic routing algorithm that has been recently proposed, offering a similar...
High-speed interconnection networks are essential elements for different high-performance parallel-computing systems. One of the most common interconnection network topologies is the fat-tree, whose advantages have turned it into the favorite topology of many interconnect designers. One of these advantages is the possibility of using simple but eff...
Existing congestion control mechanisms in interconnects can be divided into two general approaches. One is to throttle traffic injection at the sources that contribute to congestion, and the other is to isolate the congested traffic in specially designated resources. These two approaches have different, but non-overlapping weaknesses. In this paper...
Interconnection networks are essential elements in current computing systems. For this reason, achieving the best network performance, even in congestion situations, has been a primary goal in recent years. In that sense, there exist several techniques focused on eliminating the main negative effect of congestion: the Head of Line (HOL) blockin...
The fat-tree is one of the most common topologies for the interconnection networks of PC Clusters which are currently used for high-performance parallel computing. Among other advantages, fat-trees allow the use of simple but very efficient routing schemes. One of them is a deterministic routing algorithm that has been recently proposed, offering s...
As the number of components in cluster-based systems increases, cost and power consumption also increase. One way to reduce both problems is using smaller networks with adequate congestion management mechanisms. Recent successful proposals (RECN) eliminate the negative effects of congestion, the Head-of-Line (HOL) blocking, leaving congestion harmless....