Greg Faanes’s scientific contributions


Publications (2)


Cray Cascade: A scalable HPC system based on a Dragonfly network
  • Conference Paper

November 2012 · 454 Reads · 238 Citations

Greg Faanes · [...] · James Reinhard

Higher global bandwidth requirements for many applications and lower network cost have motivated the use of the Dragonfly network topology for high performance computing systems. In this paper we present the architecture of the Cray Cascade system, a distributed memory system based on the Dragonfly [1] network topology. We describe the structure of the system, its Dragonfly network and the routing algorithms. We describe a set of advanced features supporting both mainstream high performance computing applications and emerging global address space programming models. We present a combination of performance results from prototype systems and simulation data for large systems. We demonstrate the value of the Dragonfly topology and the benefits obtained through extensive use of adaptive routing.


Figure 1. BlackWidow node organization. 
Table 1. System configurations for benchmarking.
Figure 4. Block diagram of the BlackWidow 8-pipe vector unit, with detail of pipe 0.
Figure 5. Block diagram of the 4-way superscalar processor.
The Cray BlackWidow: A highly scalable vector multiprocessor
  • Conference Paper
  • Full-text available

November 2007 · 1,723 Reads · 60 Citations

This paper describes the system architecture of the Cray BlackWidow scalable vector multiprocessor. The BlackWidow system is a distributed shared memory (DSM) architecture that is scalable to 32K processors, each with a 4-way dispatch scalar execution unit and an 8-pipe vector unit capable of 20.8 Gflops for 64-bit operations and 41.6 Gflops for 32-bit operations at the prototype operating frequency of 1.3 GHz. Global memory is directly accessible with processor loads and stores and is globally coherent. The system supports thousands of outstanding references to hide remote memory latencies, and provides a rich suite of built-in synchronization primitives. Each BlackWidow node is implemented as a 4-way SMP with up to 128 Gbytes of DDR2 main memory capacity. The system supports common programming models such as MPI and OpenMP, as well as global address space languages such as UPC and CAF. We describe the system architecture and microarchitecture of the processor, memory controller, and router chips. We give preliminary performance results and discuss design tradeoffs. General terms: design; architecture; performance. Keywords: multiprocessor; shared memory; vector; MPP


Citations (2)


... In addition, they offer a good scalability due to their hierarchical structure. Thanks to these advantages, Dragonfly topologies have been configured in several real systems such as the IBM PERCS [2], the Cray XC series [3] or the Niagara (the Canada's fastest supercomputer) [4], which implements a Dragonfly+ architecture [5]. ...

Reference:

Leveraging InfiniBand Controller to Configure Deadlock-Free Routing Engines for Dragonflies
Cray Cascade: A scalable HPC system based on a Dragonfly network
  • Citing Conference Paper
  • November 2012

... Although both the ARM SVE and the RISC-V vector extensions took inspiration from the more traditional vector architectures, such as the Cray-1 [33], there is a remarkable difference between them. While ARM SVE allows implementations from 128-bits up to 2048-bits, RISC-V does not limit the MVL, spacing from short and medium size vectors, to long vector designs, which are akin to classic vector supercomputers [8], [21], [33] and modern VPs [7], [39]. For example, the Aurora VP from NEC [39] can multiply-accumulate two 256 element double-precision floating-point vectors in a single instruction. ...

The Cray BlackWidow: A highly scalable vector multiprocessor