
Enrique Vallejo, PhD
- Professor (Associate) at University of Cantabria
About
60 Publications
18,594 Reads
731 Citations
Current institution
Additional affiliations
January 2007 - present
Publications (60)
Many shared-memory parallel systems use lock-based synchronization mechanisms to provide mutual exclusion or reader-writer access to memory locations. Software locks are inefficient either in memory usage, lock transfer time, or both. Proposed hardware locking mechanisms are either too specific (for example, requiring static assignment of threads t...
The interconnection network comprises a significant portion of the cost of large parallel computers, both in economic terms and power consumption. Several previous proposals exploit large-radix routers to build scalable low-distance topologies with the aim of minimizing these costs. However, they fail to consider potential unbalance in the network...
High-performance computing (HPC) is recognized as one of the pillars for further progress in science, industry, medicine, and education. Current HPC systems are being developed to overcome emerging architectural challenges in order to reach Exascale level of performance, projected for the year 2020. The much larger embedded and mobile market allows...
Low-diameter network topologies require non-minimal routing to avoid network congestion, such as Valiant routing. This increases base latency but avoids congestion issues. Optimized restricted variants focus on reducing path length. However, these optimizations only reduce paths for local traffic, where source and destination of each packet belong...
Low-diameter network topologies require non-minimal routing, such as Valiant routing, to avoid network congestion under challenging traffic patterns such as the so-called adversarial ones. However, this mechanism tends to increase the average path length, base latency, and network load. The use of shorter non-minimal paths has the potential to enhance perf...
In recent years, numerous multicore RISC-V platforms have emerged. Development frameworks such as OpenPiton are employed in designs that aim to scale to a large number of cores. While OpenPiton presents a large flexibility, supporting different requirements and processing cores, some of its design decisions result in designs that are not optimized...
SynFull is a widely employed tool that generates realistic traffic patterns for the performance evaluation of a NoC. In this work, we identify the main limitations of SynFull: high variability and long simulation time and also that these limitations increase when SynFull is integrated with RTL designs. SynFull-RTL employs a statistical approach, si...
Many-core processors demand scalable, efficient and low latency NoCs. Bypass routers are an affordable solution to attain low latency in relatively simple topologies like the mesh. SMART improves on traditional bypass routers implementing multi-hop bypass which reduces the importance of the distance between pairs of nodes. Nevertheless, the conserv...
Minimizing latency and power are key goals in the design of NoC routers. Different proposals combine lookahead routing and router bypass to skip the arbitration and buffering, reducing router delay. However, the conditions to use the bypass require completely empty buffers in the intermediate routers. This restricts the amount of flits that use the...
Minimizing latency and power are key goals in the design of NoC routers. Different proposals combine lookahead routing and router bypass to skip the arbitration and buffering, reducing router delay. However, the conditions to use them require completely empty buffers in the intermediate routers. This restricts the amount of flits that use the bypa...
Current compute-intensive applications largely exceed the resources of single-core processors. To face this problem, multi-core processors along with parallel computing techniques have become a solution to increase the computational performance. Likewise, multi-processors are fundamental to support new technologies and new science applications chal...
Low latency and low implementation cost are two key requirements in NoCs. SMART routers implement multi-hop bypass, obtaining latency values close to an ideal point-to-point interconnect. However, it requires a significant amount of resources such as Virtual Channels (VCs), which are not used as efficiently as possible, preventing bypass in certain...
Execution time of parallel applications depends on the balanced execution and synchronization of all its processes. However, most interconnects exhibit significant throughput unfairness, introducing load unbalance. At high loads, such unfairness significantly degrades the performance of some nodes, and eventually the whole system. Different strateg...
Low-diameter networks require non-minimal adaptive routing to deal with varying traffic characteristics and avoid pathological performance. Such routing relies on local estimations of network congestion derived from link-level flow control credits. Dragonfly networks based on the extensions of commodity Ethernet networks using OpenFlow have been pro...
Minimizing latency and power are key goals in the design of NoC routers. Different proposals combine lookahead routing and router bypass to skip the arbitration and buffering stages of their pipeline, reducing router delay to a single-cycle. However, the conditions to use the bypass are unnecessarily conservative, requiring completely empty buffers...
The growing complexity of multi-core architectures has motivated a wide range of software mechanisms to improve the orchestration of parallel executions. Task parallelism has become a very attractive approach thanks to its programmability, portability and potential for optimizations. However, with the expected increase in core counts, fine-grained...
Valiant routing randomizes network traffic to avoid pathological congestion issues by diverting traffic to a random intermediate switch. It has received significant attention in recently proposed high-radix, low-diameter topologies, which are prone to congestion issues. It has been implemented obliviously, or as the basis of some non-minimal adapti...
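The basic Valiant mechanism described above can be illustrated with a minimal Python sketch. It assumes a connected network given as a hypothetical adjacency dictionary (switch to list of neighbouring switches) and uses BFS for the two minimal segments; the helper names `shortest_path` and `valiant_route` are illustrative only and are not taken from the publication.

```python
import random
from collections import deque


def shortest_path(adj, src, dst):
    """Minimal path via BFS; adj maps each switch to its neighbours (graph assumed connected)."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt in adj[node]:
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    # Walk the predecessor chain back from dst to src.
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def valiant_route(adj, src, dst, rng=random):
    """Valiant routing: minimal path to a random intermediate switch, then minimal path to dst."""
    mid = rng.choice(list(adj))
    first = shortest_path(adj, src, mid)
    second = shortest_path(adj, mid, dst)
    return first + second[1:]  # do not repeat the intermediate switch


# Usage: an 8-switch ring.
ring = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}
print(shortest_path(ring, 0, 3))                   # minimal route
print(valiant_route(ring, 0, 3, random.Random(1)))  # randomized route via an intermediate switch
```

On an 8-switch ring a packet from switch 0 to switch 3 takes 3 hops minimally, but may take up to 8 hops under Valiant routing; this is the path-length penalty that restricted non-minimal variants try to reduce.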
The Graph500 benchmark attempts to steer the design of High-Performance Computing systems to maximize the performance under memory-constricted application workloads. A realistic simulation of such benchmarks for architectural research is challenging due to size and detail limitations. By contrast, synthetic traffic workloads constitute one of the l...
The course Introducción a las Redes de Computadores (Introduction to Computer Networks) at the University of Cantabria shows low student performance, and students complain about the lack of practical sessions that reinforce the theoretical content. To tackle the problem, a set of audiovisual materials has been developed to replace some of the lectures and increase the amount...
Commodity Ethernet networks are used in many HPC systems. Extensions based on OpenFlow have been proposed for large HPC deployments, considering scalability and power consumption concerns. Such designs employ low-diameter topologies to minimize power consumption, such as Flattened Butterflies or Dragonflies. However, these topologies require non-mi...
As BigData applications have gained momentum over the last years, the Graph500 benchmark has appeared in an attempt to steer the design of HPC systems to maximize the performance under memory-constricted application workloads. A realistic simulation of such benchmarks for architectural research is challenging due to size and detail limitations, and...
Dragonfly networks arrange network routers in a two-level hierarchy, providing a competitive cost-performance solution for large systems. Non-minimal adaptive routing (adaptive misrouting) is employed to fully exploit the path diversity and increase the performance under adversarial traffic patterns. Network fairness issues arise in the dragonfly f...
The interconnection network comprises a significant portion of the cost of large parallel computers, both in economic terms and power consumption. Several previous proposals exploit large-radix routers to build scalable low-distance topologies with the aim of minimizing these costs. However, they fail to consider potential unbalance in the network...
Dragonfly networks have a two-level hierarchical arrangement of the network routers, and allow for a competitive cost-performance solution in large systems. Non-minimal adaptive routing is employed to fully exploit the path diversity and increase the performance under adversarial traffic patterns. Throughput unfairness prevents a balanced use of th...
Traces of parallel programs are a valuable resource for users of HPC systems because they provide insight into the efficiency of the execution of their applications, allowing them to improve their code. Additionally, for computer scientists they are useful as an input to simulation tools to guide the development of high performance computing (HPC) system...
Current High-Performance Computing (HPC) and data center networks rely on large-radix routers. Hamming graphs (Cartesian products of complete graphs) and dragonflies (two-level direct networks with nodes organized in groups) are some direct topologies proposed for such networks. The original definition of the dragonfly topology is very loose, with...
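The Hamming graph definition quoted above (a Cartesian product of complete graphs) can be made concrete with a short Python sketch, written for illustration under our own assumptions: routers are labelled by coordinate tuples, and two routers are linked when they differ in exactly one coordinate. The minimal hop count is then the number of differing coordinates, so the diameter equals the number of dimensions. The function names are hypothetical.

```python
from itertools import product


def hamming_graph(sizes):
    """Adjacency of K_{k1} x K_{k2} x ...: neighbours differ in exactly one coordinate."""
    nodes = list(product(*(range(k) for k in sizes)))
    adj = {}
    for node in nodes:
        nbrs = []
        for dim, k in enumerate(sizes):
            for value in range(k):
                if value != node[dim]:
                    nbrs.append(node[:dim] + (value,) + node[dim + 1:])
        adj[node] = nbrs
    return adj


def distance(u, v):
    """Minimal hop count = number of coordinates in which u and v differ."""
    return sum(a != b for a, b in zip(u, v))


g = hamming_graph((4, 4))
print(len(g), len(g[(0, 0)]))                        # routers, router degree
print(max(distance(u, v) for u in g for v in g))    # diameter
```

For `hamming_graph((4, 4))` this gives 16 routers of degree 6 (3 neighbours per dimension) and diameter 2.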
Adaptive deadlock-free routing mechanisms are required to handle variable traffic patterns in dragonfly networks. However, distance-based deadlock avoidance mechanisms typically employed in Dragonflies increase the router cost and complexity as a function of the maximum allowed path length. This paper presents on-the-fly adaptive routing (OFAR), a...
Dragonfly topologies are recent network designs that are considered one of the most promising interconnect options for Exascale systems. They offer a low diameter and low network cost, but do so at the expense of path diversity, which makes them vulnerable to certain adversarial traffic patterns. Indirect routing approaches can alleviate the perfor...
High-radix hierarchical networks are cost-effective topologies for large scale computers. In such networks, routers are organized in super nodes, with local and global interconnections. These networks, known as Dragonflies, outperform traditional topologies such as multi-trees or tori, in cost and scalability. However, depending on the traffic patt...
Dragonfly networks are appealing topologies for large-scale Data center and HPC networks, that provide high throughput with low diameter and moderate cost. However, they are prone to congestion under certain frequent traffic patterns that saturate specific network links. Adaptive non-minimal routing can be used to avoid such congestion. That kind o...
Twisted torus topologies have been proposed as an alternative to toroidal rectangular networks, improving distance parameters and providing network symmetry. However, twisting is apparently less amenable to task mapping algorithms of real life applications. In this paper we make an analytical study of different mapping and concentration techniques...
Dragonfly networks are composed of interconnected groups of routers. Adaptive routing allows packets to be forwarded minimally or non-minimally adapting to the traffic conditions in the network. While minimal routing sends traffic directly between groups, non-minimal routing employs an intermediate group to balance network load. A random selection...
The performance of an interconnection network is typically measured by two metrics: average latency and peak network throughput. Average network throughput is usually reported in the belief the network is fair and all source nodes are supposedly able to inject at the same rate. However, most systems exhibit significant network unfairness under non-...
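As a small illustration of why average throughput alone can hide unfairness, the Python sketch below reports the average, the minimum, and the max/min ratio of per-node accepted throughput. The function name and the input values are hypothetical, not results from the paper.

```python
def throughput_summary(accepted):
    """Summarise per-source accepted throughput (hypothetical units, e.g. flits/node/cycle)."""
    if not accepted:
        raise ValueError("need at least one node")
    average = sum(accepted) / len(accepted)
    minimum = min(accepted)
    # The max/min ratio exposes starved nodes that the average hides.
    ratio = float("inf") if minimum == 0 else max(accepted) / minimum
    return {"average": average, "minimum": minimum, "max_min_ratio": ratio}


print(throughput_summary([0.5, 0.5, 0.5, 0.25]))
```

With accepted loads of `[0.5, 0.5, 0.5, 0.25]` the average is 0.4375, yet one node receives only half the service of the others.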
This work attempts to compare size and cost of two network topologies proposed for large-radix routers: concentrated torus and dragonflies. We study and compare the scalability, cost and fault tolerance of each network. On average, we found that a concentrated torus can be a cost-efficient option for middle-range networks.
Dragonfly networks have been recently proposed for the interconnection network of forthcoming exascale supercomputers. Relying on large-radix routers, they build a topology with low diameter and high throughput, divided into multiple groups of routers. While minimal routing is appropriate for uniform traffic patterns, adversarial traffic patterns c...
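The size of a dragonfly follows from three parameters. The Python sketch below assumes the canonical, fully connected arrangement commonly used in the literature (our assumption; the snippet does not fix these parameters): `p` terminals per router, `a` routers per group connected all-to-all, and `h` global links per router, with exactly one global link between every pair of groups. Under these assumptions a minimal route between groups is at most local, global, local (3 router-to-router hops).

```python
def dragonfly_size(p, a, h):
    """Size of a fully connected dragonfly (assumed canonical arrangement)."""
    groups = a * h + 1          # one global link between every pair of groups
    radix = p + (a - 1) + h     # terminal ports + local all-to-all ports + global ports
    terminals = p * a * groups
    return {"groups": groups, "radix": radix, "terminals": terminals}


print(dragonfly_size(2, 4, 2))
```

`dragonfly_size(2, 4, 2)` yields 9 groups of radix-7 routers and 72 terminals; the often-cited balanced configuration uses a = 2p = 2h.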
Transactional Memory (TM) intends to simplify the design and implementation of the shared-memory data structures used in parallel software. Many Software TM systems are based on writer-locks to protect the data being modified. Such implementations can suffer from the “privatization” problem, in which transactional and non-transactional accesses to...
Many current parallel computers are built around a torus interconnection network. Machines from Cray, HP, and IBM, among others, make use of this topology. In terms of topological advantages, square (2D) or cubic (3D) tori would be the topologies of choice. However, for different practical reasons, 2D and 3D tori with different number of nodes per...
The field of Computer Networks has evolved quickly during the last years. In this paper we consider different aspects of those changes that condition the design of a subject in Networking; specifically, we discuss social, technological and economic aspects. Additionally, the changes proposed in the University studies for the European Higher Educati...
Without care, Hardware Transactional Memory presents several performance pathologies that can degrade its performance. Among them, writers of commonly read variables can suffer from starvation. Though different solutions have been proposed for HTM systems, hybrid systems can still suffer from this performance problem, given that software transactio...
To reduce the overhead of Software Transactional Memory (STM) there are many recent proposals to build hybrid systems that use architectural support either to accelerate parts of a particular STM algorithm (Ha-TM), or to form a hybrid system allowing hardware-transactions and software-transactions to inter-operate in the same address space (Hy-TM)....
Although they have been the main server technology for many years, multiprocessors are undergoing a renaissance due to multi-core chips and the attractive scalability properties of combining a number of such multi-core chips into a system. The widespread use of multiprocessor systems will make performance losses due to consistency models and synchr...
Many parallel computers use Tori interconnection networks. Machines from Cray, HP and IBM, among others, exploit these topologies. In order to maintain full network symmetry, 2D and 3D Tori must have the same number of nodes (k) per dimension resulting in square or cubic topologies. Nevertheless, for practical reasons, computer engineers have desig...
This paper explores the suitability of dense circulant graphs of degree four for the design of on-chip interconnection networks. Networks based on these graphs reduce the Torus diameter by a factor of √2, which translates into significant performance gains for unicast traffic. In addition, they are clearly superior to Tori when managing collective com...
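The √2 diameter reduction can be checked numerically. This Python sketch assumes one specific dense family of degree-four circulant graphs, C_N(k, k+1) with N = 2k² + 2k + 1 (our choice for illustration; the snippet does not name the family), and compares its BFS diameter with that of a near-square torus of similar size. Since circulant graphs are vertex-transitive, a BFS from node 0 suffices. Function names are hypothetical.

```python
from collections import deque


def circulant_diameter(n, jumps):
    """Diameter of the circulant graph C_n(jumps) via BFS from node 0 (vertex-transitive)."""
    dist = {0: 0}
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for j in jumps:
            for v in ((u + j) % n, (u - j) % n):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
    return max(dist.values())


def torus_diameter(rows, cols):
    """Diameter of a 2D rows x cols torus with wraparound links."""
    return rows // 2 + cols // 2


k = 7
n = 2 * k * k + 2 * k + 1  # 113 nodes
print(circulant_diameter(n, (k, k + 1)), torus_diameter(10, 11))
```

For k = 7 (N = 113) the circulant has diameter 7, while a 10×11 torus with 110 nodes has diameter 10; as N grows, the ratio approaches √2.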
Chip Multiprocessors (CMPs) are an efficient way of designing and using the huge number of transistors on a chip. Different cores on a chip can compose a shared memory system with a very low-latency interconnect at a very low cost. Unfortunately, consistency models and synchronization styles of popular programming models for multiprocessors impose se...
Multiprocessors are coming into wide-spread use in many application areas, yet there are a number of challenges to achieving a good tradeoff between complexity and performance. For example, while implementing memory coherence and consistency is essential for correctness, efficient implementation of critical sections and synchronization points is de...
Circulant graphs have been extensively studied in the technical literature. Midimew networks are a class of distance-related optimal circulant graphs of degree four which have applications in network engineering and coding theory. In this research, a new layout for Midimew networks which keeps the maximum link length below √5 is presented, conside...
Circulant graphs have been extensively studied in the technical literature. Midimew networks are a class of distance-related optimal circulant graphs of degree four which have applications in network engineering and coding theory. In a previous work, a new layout for Midimew networks which keeps the maximum link length below √5 has been presented. The most i...
We present in this paper some of the topological properties of an interesting class of Circulant graphs whose nodes are labeled by a subset of the Gaussian integers. Such graphs and the problems we solve on them have direct applications to the design of interconnection networks and, in addition, they can be considered in the design of perfect er...
Multiprocessors are coming into wide-spread use in many application areas, yet there are a number of challenges to achieving a good tradeoff between complexity and performance. For example, while implementing memory coherence and consistency is essential for correctness, efficient implementation of critical sections and synchronization points is de...