Conference Paper

First-principles calculations of electron states of a silicon nanowire with 100,000 atoms on the K computer

DOI: 10.1145/2063384.2063386
Conference: Conference on High Performance Computing Networking, Storage and Analysis, SC 2011, Seattle, WA, USA, November 12-18, 2011
Source: DBLP

ABSTRACT

Real space DFT (RSDFT) is a simulation technique most suitable for massively-parallel architectures to perform first-principles electronic-structure calculations based on density functional theory. We here report unprecedented simulations on the electron states of silicon nanowires with up to 107,292 atoms carried out during the initial performance evaluation phase of the K computer being developed at RIKEN. The RSDFT code has been parallelized and optimized so as to make effective use of the various capabilities of the K computer. Simulation results for the self-consistent electron states of a silicon nanowire with 10,000 atoms were obtained in a run lasting about 24 hours and using 6,144 cores of the K computer. A 3.08 peta-flops sustained performance was measured for one iteration of the SCF calculation in a 107,292-atom Si nanowire calculation using 442,368 cores, which is 43.63% of the peak performance of 7.07 peta-flops.
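The performance figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only. It takes the sustained rate, peak rate, core count, and efficiency from the abstract, and additionally assumes a peak of 16 GFLOPS per core, a value that is not stated in the abstract. Under that assumption the peak implied by the core count lands close to the quoted 7.07 peta-flops, and the ratio of the two rounded figures comes out near, though not exactly at, the quoted 43.63%, which is consistent with the published values having been rounded.

```python
# Figures quoted in the abstract.
SUSTAINED_PFLOPS = 3.08        # sustained rate for one SCF iteration
PEAK_PFLOPS = 7.07             # quoted peak rate of the partition used
CORES = 442_368                # cores used in the 107,292-atom run
QUOTED_EFFICIENCY_PCT = 43.63  # quoted fraction of peak

# ASSUMPTION (not stated in the abstract): peak rate of a single core.
ASSUMED_PEAK_GFLOPS_PER_CORE = 16.0


def efficiency_pct(sustained: float, peak: float) -> float:
    """Return sustained performance as a percentage of peak."""
    return 100.0 * sustained / peak


# Peak implied by the core count (1 PFLOPS = 1e6 GFLOPS).
peak_from_cores = CORES * ASSUMED_PEAK_GFLOPS_PER_CORE / 1e6

# Efficiency recomputed from the two rounded figures in the abstract.
ratio_from_rounded = efficiency_pct(SUSTAINED_PFLOPS, PEAK_PFLOPS)

# Sustained rate implied by the quoted efficiency and the core-count peak.
sustained_implied = QUOTED_EFFICIENCY_PCT / 100.0 * peak_from_cores

# Average sustained rate per core, in GFLOPS.
sustained_per_core_gflops = SUSTAINED_PFLOPS * 1e6 / CORES

print(f"peak implied by core count : {peak_from_cores:.3f} PFLOPS")
print(f"efficiency from rounded    : {ratio_from_rounded:.2f} %")
print(f"sustained implied by 43.63%: {sustained_implied:.3f} PFLOPS")
print(f"sustained per core         : {sustained_per_core_gflops:.2f} GFLOPS")
```

If the assumed per-core peak is changed, only the first and third printed values move. The efficiency recomputed from the rounded figures depends solely on numbers taken from the abstract.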

  • Source
    • "A set of two simulations was conducted under present and future conditions to examine the change in the properties of TCs. The integration period of NICAM is for the first time extended to 20 years, which is achieved by using a recently developed supercomputer (the K computer; Hasegawa et al. 2011)."

    Full-text · Dataset · Sep 2015
  • Source
    • "Parallel processing for RSDFT needs to consider whole process parallelism. As we mentioned, RSDFT needs to parallelize the orthogonalization routine and other routines, such as updating wave function using the CG method and updating potential fields, in addition to the eigensolver [2]. Although the number of processes may be too large for the eigensolver, it is necessary for the parallelization of other parts of RSDFT, and it is favorable to avoid any extra costs such as matrix re-distribution."

    Full-text · Article · Jan 2015
  • Source
    ABSTRACT: In this paper, a parallel symmetric eigensolver with very small matrices in massively parallel processing is considered. We define very small matrices that fit the sizes of caches per node in a supercomputer. We assume that the sizes also fit the exa-scale computing requirements of current production runs of an application. To minimize communication time, we added several communication avoiding and communication reducing algorithms based on Message Passing Interface (MPI) non-blocking implementations. A performance evaluation with up to full nodes of the FX10 system indicates that (1) the MPI non-blocking implementation is 3x as efficient as the baseline implementation, (2) the hybrid MPI execution is 1.9x faster than the pure MPI execution, (3) our proposed solver is 2.3x and 22x faster than a ScaLAPACK routine with optimized blocking size and cyclic-cyclic distribution, respectively.
    Full-text · Article · Oct 2014