Conference Paper

First-principles calculations of electron states of a silicon nanowire with 100,000 atoms on the K computer.

DOI: 10.1145/2063384.2063386
In proceedings of: Conference on High Performance Computing Networking, Storage and Analysis, SC 2011, Seattle, WA, USA, November 12-18, 2011
Source: DBLP

ABSTRACT: Real-space DFT (RSDFT) is a simulation technique well suited to massively parallel architectures for performing first-principles electronic-structure calculations based on density functional theory. We report here unprecedented simulations of the electron states of silicon nanowires with up to 107,292 atoms, carried out during the initial performance evaluation phase of the K computer being developed at RIKEN. The RSDFT code has been parallelized and optimized to make effective use of the various capabilities of the K computer. Simulation results for the self-consistent electron states of a silicon nanowire with 10,000 atoms were obtained in a run lasting about 24 hours and using 6,144 cores of the K computer. A sustained performance of 3.08 peta-flops was measured for one iteration of the SCF calculation in a 107,292-atom Si nanowire calculation using 442,368 cores, which is 43.63% of the peak performance of 7.07 peta-flops.
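The fraction-of-peak figure quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the 16 GFLOPS per-core peak is an assumption not stated in the abstract. Because the abstract's figures are rounded, the ratio recomputed from 3.08 and 7.07 comes out near 43.6% rather than exactly the reported 43.63%.

```python
# Minimal sketch: relate sustained and peak performance figures.
# ASSUMPTION (not from the abstract): each core peaks at 16 GFLOPS.

def peak_pflops(cores: int, gflops_per_core: float = 16.0) -> float:
    """Theoretical peak in PFLOPS for a given core count."""
    return cores * gflops_per_core / 1e6  # 1 PFLOPS = 1e6 GFLOPS


def fraction_of_peak(sustained_pflops: float, peak: float) -> float:
    """Sustained performance as a fraction of peak (0..1)."""
    return sustained_pflops / peak


if __name__ == "__main__":
    # Figures as rounded in the abstract.
    reported_peak = 7.07
    sustained = 3.08
    cores = 442_368

    print(f"peak from core count: {peak_pflops(cores):.3f} PFLOPS")
    print(f"fraction of reported peak: "
          f"{100 * fraction_of_peak(sustained, reported_peak):.1f}%")
```

Keeping the peak calculation separate from the ratio makes it easy to apply the same check to other core counts or machines by changing only the per-core figure.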

  • ABSTRACT: As an entry for the 2012 Gordon-Bell performance prize, we report performance results of astrophysical N-body simulations of one trillion particles performed on the full system of the K computer. This is the first gravitational trillion-body simulation in the world. We describe the scientific motivation, the numerical algorithm, the parallelization strategy, and the performance analysis. Unlike many previous Gordon-Bell prize winners that used the tree algorithm for astrophysical N-body simulations, we used the hybrid TreePM method, which achieves a similar level of accuracy while calculating the short-range force with the tree algorithm and solving the long-range force with the particle-mesh algorithm. We developed a highly tuned gravity kernel for short-range forces and a novel communication algorithm for long-range forces. The average performance on 24576 and 82944 nodes of the K computer is 1.53 and 4.45 Pflops, respectively, which corresponds to 49% and 42% of the peak speed.
    11/2012;
  • ABSTRACT: This paper presents the Netuno supercomputer, a large-scale cluster installed at the Federal University of Rio de Janeiro in Brazil. A detailed performance evaluation of Netuno is presented, depicting its computational and I/O performance, as well as the results for two real-world applications. Since building a high-performance cluster for running a wide range of applications is a non-trivial task, some lessons learned from assembling and operating this cluster, such as the excellent performance of the OpenMPI library and the relevance of employing an efficient parallel file system over the traditional NFS system, can be useful knowledge to support the design of new systems. Currently, Netuno is being heavily used to run large-scale simulations in the areas of ocean modeling, meteorology, engineering, physics, and geophysics.
    International Journal of Parallel Programming 04/2014; · 0.40 Impact Factor
  • ABSTRACT: In this paper, we propose a decomposition approach to differential eigenvalue problems with Abelian or non-Abelian symmetries. In this approach, we divide the original differential problem into eigenvalue subproblems that require fewer eigenpairs and can be solved independently. Our approach can be seamlessly combined with grid-based discretizations such as finite difference, finite element, or finite volume methods. We place the approach in a two-level parallelization setting, which reduces CPU time substantially. For illustration and application, we implement our approach with finite elements and carry out electronic structure calculations of some symmetric cluster systems, in which we solve for thousands of eigenpairs with millions of degrees of freedom and demonstrate the effectiveness of the approach.
    Journal of Scientific Computing 12/2013; · 1.71 Impact Factor