First-principles calculations of electron states of a silicon nanowire with 100,000 atoms on the K computer.
ABSTRACT: Real-space DFT (RSDFT) is a simulation technique most suitable for massively parallel architectures to perform first-principles electronic-structure calculations based on density functional theory (DFT). Here we report unprecedented simulations of the electron states of silicon nanowires with up to 107,292 atoms, carried out during the initial performance evaluation phase of the K computer being developed at RIKEN. The RSDFT code has been parallelized and optimized to make effective use of the various capabilities of the K computer. Simulation results for the self-consistent electron states of a silicon nanowire with 10,000 atoms were obtained in a run lasting about 24 hours and using 6,144 cores of the K computer. A sustained performance of 3.08 petaflops was measured for one iteration of the self-consistent field (SCF) calculation in a 107,292-atom Si nanowire calculation using 442,368 cores, which is 43.63% of the peak performance of 7.07 petaflops.
- Source available from: Takahiro Katagiri
ABSTRACT: In this paper, a parallel symmetric eigensolver with very small matrices in massively parallel processing is considered. We define very small matrices that fit the sizes of caches per node in a supercomputer. We assume that the sizes also fit the exa-scale computing requirements of current production runs of an application. To minimize communication time, we added several communication avoiding and communication reducing algorithms based on Message Passing Interface (MPI) non-blocking implementations. A performance evaluation with up to full nodes of the FX10 system indicates that (1) the MPI non-blocking implementation is 3x as efficient as the baseline implementation, (2) the hybrid MPI execution is 1.9x faster than the pure MPI execution, (3) our proposed solver is 2.3x and 22x faster than a ScaLAPACK routine with optimized blocking size and cyclic-cyclic distribution, respectively.
ABCLib Working Notes. 10/2014; 11.
- ABSTRACT: In the era of petascale supercomputing, load balancing is crucial. Although dynamic load balancing is widespread, it is increasingly difficult to implement effectively with thousands of processors or more, prompting a second look at static load-balancing techniques even though the optimal allocation of tasks to processors is an NP-hard problem. We propose a heuristic static load-balancing algorithm, employing fitted benchmarking data, as an alternative to dynamic load balancing. The problem of allocating CPU cores to tasks is formulated as a mixed-integer nonlinear optimization problem, which is solved by using an optimization solver. On 163,840 cores of Blue Gene/P, we achieved a parallel efficiency of 80% for an execution of the fragment molecular orbital method applied to model protein-ligand complexes quantum-mechanically. The obtained allocation is shown to outperform dynamic load balancing by at least a factor of 2, thus motivating the use of this approach on other coarse-grained applications.
High Performance Computing, Networking, Storage and Analysis (SC), 2012 International Conference for; 01/2012
- ABSTRACT: In this paper, we propose a decomposition approach to differential eigenvalue problems with Abelian or non-Abelian symmetries. In this approach, we divide the original differential problem into eigenvalue subproblems that require fewer eigenpairs and can be solved independently. Our approach can be seamlessly incorporated with grid-based discretizations such as finite difference, finite element, or finite volume methods. We place the approach in a two-level parallelization setting, which saves CPU time remarkably. For illustration and application, we implement our approach with finite elements and carry out electronic structure calculations of some symmetric cluster systems, in which we solve thousands of eigenpairs with millions of degrees of freedom and demonstrate the effectiveness of the approach.
Journal of Scientific Computing. 12/2013; 1.71 Impact Factor