-
M. Blocksome,
C. Archer,
T. Inglett,
P. McCarthy,
M. Mundy,
J. Ratterman,
A. Sidelnik,
B. Smith,
G. Almasi,
J. Castanos,
D. Lieber,
J. Moreira,
S. Krishnamoorthy,
V. Tipparaju,
J. Nieplocha
ABSTRACT: This paper discusses the design and implementation of a one-sided communication interface for the IBM Blue Gene/L supercomputer. This interface facilitates ARMCI and the Global Arrays toolkit and can be used by other one-sided communication libraries. New protocols, interrupt-driven communication, and compute node kernel enhancements were required to enable these libraries. Three possible methods for enabling ARMCI on the Blue Gene/L software stack are discussed. A detailed look into the development process shows how the implementation of the one-sided communication interface was completed. This was accomplished on a compressed time scale with the collaboration of various organizations within IBM and open source communities. In addition to enabling the one-sided libraries, bandwidth enhancements were made for communication along a diagonal on the Blue Gene/L torus network. The maximum bandwidth improved by a factor of three. This work will enable a variety of one-sided applications to run on Blue Gene/L.
SC 2006 Conference, Proceedings of the ACM/IEEE; 12/2006
-
ABSTRACT: In this paper, we present two approaches to improve the execution of OpenMP applications on the IBM Cyclops multithreaded architecture. The two solutions are independent, and both aim to obtain better performance through better management of cache locality. The first solution is based on software modifications to the OpenMP runtime library to balance stack accesses across all data caches. The second solution is a small hardware modification that changes the data cache mapping behavior, with the same goal. Both solutions help parallel applications improve scalability and obtain better performance on this kind of architecture. In fact, they could also be applied to future multi-core processors. We have executed (using simulation) some of the NAS benchmarks to evaluate these proposals. The results show how, with small changes in both the software and the hardware, we achieve very good scalability in parallel applications. Our results also show that standard execution environments oriented to multiprocessor architectures can be easily adapted to exploit multithreaded processors.
Parallel and Distributed Processing Symposium, 2005. Proceedings. 19th IEEE International; 05/2005
-
N.R. Adiga,
G. Almasi,
G.S. Almasi,
Y. Aridor,
R. Barik,
D. Beece,
R. Bellofatto,
G. Bhanot,
R. Bickford,
M. Blumrich, [......],
M. Tubbs,
G. Ulsh,
C. Wait,
J. Wittrup,
M. Bae,
K. Dockser,
L. Kissel,
M.K. Seager,
J.S. Vetter,
K. Yates
ABSTRACT: This paper gives an overview of the BlueGene/L Supercomputer. This is a jointly funded research partnership between IBM and the Lawrence Livermore National Laboratory as part of the United States Department of Energy ASCI Advanced Architecture Research Program. Application performance and scaling studies have recently been initiated with partners at a number of academic and government institutions, including the San Diego Supercomputer Center and the California Institute of Technology. This massively parallel system of 65,536 nodes is based on a new architecture that exploits system-on-a-chip technology to deliver target peak processing power of 360 teraFLOPS (trillion floating-point operations per second). The machine is scheduled to be operational in the 2004-2005 time frame, at price/performance and power consumption/performance targets unobtainable with conventional architectures.
Supercomputing, ACM/IEEE 2002 Conference; 12/2002
-
G. Almasi,
G.S. Almasi,
D. Beece,
R. Bellofatto,
G. Bhanot,
R. Bickford,
M. Blumrich,
A.A. Bright,
J. Brunheroto,
C. Cascaval, [......],
B.D. Steinmacher-Burow,
K. Strauss,
R. Swetz,
T. Takken,
P. Vranas,
T.J.C. Ward,
J. Brown,
T. Liebsch,
A. Schram,
G. Ulsh
ABSTRACT: System-on-a-chip technology allows a level of integration that can be leveraged to develop inexpensive high-performance, low-power computing nodes. When used in aggregate, this approach promises to challenge conventional supercomputer architectures in the high-performance computing arena. Systems under consideration reach into the hundreds of thousands of nodes per machine. Architectures for these systems are described.
Solid-State Circuits Conference, 2002. Digest of Technical Papers. ISSCC. 2002 IEEE International; 02/2002
-
G. Almasi,
G.S. Almasi,
D. Beece,
R. Bellofatto,
G. Bhanot,
R. Bickford,
M. Blumrich,
A.A. Bright,
J. Brunheroto,
C. Cascaval, [......],
T. Takken,
R.B. Tremaine,
M. Tsao,
P. Vranas,
T.J.C. Ward,
M. Wazlowski,
J. Brown,
T. Liebsch,
A. Schram,
G. Ulsh
ABSTRACT: Summary form only given. Large powerful networks coupled to state-of-the-art processors have traditionally dominated supercomputing. As technology advances, this approach is likely to be challenged by a more cost-effective System-On-A-Chip approach, with higher levels of system integration. The scalability of applications to architectures with tens to hundreds of thousands of processors is critical to the success of this approach. Significant progress has been made in mapping numerous compute-intensive applications, many of them grand challenges, to parallel architectures. Applications hoping to efficiently execute on future supercomputers of any architecture must be coded in a manner consistent with an enormous degree of parallelism. The BG/L program is developing a peak nominal 180 TFLOPS (360 TFLOPS for some applications) supercomputer to serve a broad range of science applications. BG/L generalizes QCDOC, the first System-On-A-Chip supercomputer that is expected in 2003. BG/L consists of 65,536 nodes and contains five integrated networks: a 3D torus, a combining tree, a Gb Ethernet network, a barrier/global interrupt network, and JTAG.
Cluster Computing, 2002. Proceedings. 2002 IEEE International Conference on; 02/2002
-
F. Allen,
G. Almasi,
W. Andreoni,
D. Beece,
B.J. Berne,
A. Bright,
J. Brunheroto,
C. Cascaval,
J. Castanos,
P. Coteus, [......],
Y. Sham,
S. Singh,
M. Snir,
F. Suits,
R. Swetz,
W.C. Swope,
N. Vishnumurthy,
T.J.C. Ward,
H. Warren,
R. Zhou
IBM Systems Journal 02/2001; 40:310-327. · 1.29 Impact Factor