-
ABSTRACT: We present a scalable parallelization scheme for high-order stencil computations that also optimizes memory behavior on multicore clusters. Our multilevel approach combines: (i) inter-node parallelization via spatial decomposition; (ii) inter-core parallelization via multithreading and explicit non-uniform memory access (NUMA) control; (iii) data locality optimizations through auto-tuned tiling for efficient use of hierarchical memory; and (iv) register blocking and single-instruction multiple-data (SIMD) data parallelism to make full use of registers and exploit data locality. The scheme is applied to a sixth-order stencil-based finite-difference time-domain (FDTD) code. Weak-scaling parallel efficiency exceeds 98% on 32,768 BlueGene/P processors. Multithreading with explicit NUMA control attains a 9.9-fold speedup on a dual 12-core AMD Opteron system. Data locality optimizations achieve a 7.7-fold reduction in the last-level cache miss rate on Intel Nehalem, while register blocking increases data parallelism and thereby achieves 5.9 Gflops on a single core. Combined register blocking and multithreading achieve a 5.8-fold speedup on a single quad-core Nehalem.
The Journal of Supercomputing 12/2012; 62(2):946-966. · 0.58 Impact Factor
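To make the kind of kernel this scheme optimizes concrete, here is a minimal sketch (in Python, for readability rather than speed) of a sixth-order centered first-derivative stencil on a uniform 1D grid, with the interior swept in fixed-size tiles. The weights are the standard sixth-order central-difference coefficients; the function name `d1_sixth_order_tiled`, the tile size, and the 1D setting are illustrative assumptions, not the paper's FDTD code, and NUMA control, SIMD, and message passing are not modeled.

```python
# Standard sixth-order centered weights for df/dx at offsets 1, 2, 3.
# The stencil is antisymmetric: offset -k uses -c_k.
C6 = (3.0 / 4.0, -3.0 / 20.0, 1.0 / 60.0)
HALO = 3  # a sixth-order centered stencil reaches three points each way


def d1_sixth_order_tiled(f, h, tile=64):
    """Approximate df/dx on a uniform grid with spacing h.

    Points within HALO of either end are left at 0.0. The interior is
    visited tile by tile (the tile size is a hypothetical tuning knob);
    each output point uses the same arithmetic regardless of tiling.
    """
    n = len(f)
    out = [0.0] * n
    c1, c2, c3 = C6  # coefficients held in locals, loosely like register reuse
    for start in range(HALO, n - HALO, tile):
        stop = min(start + tile, n - HALO)
        for i in range(start, stop):
            out[i] = (c1 * (f[i + 1] - f[i - 1])
                      + c2 * (f[i + 2] - f[i - 2])
                      + c3 * (f[i + 3] - f[i - 3])) / h
    return out
```

Because the truncation error of this stencil involves the seventh derivative, it is exact (up to rounding) for polynomials of degree six or less, and changing `tile` alters only the traversal order, not the result. In a compiled code, that traversal order is what keeps the working set in cache.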
-
ABSTRACT: Exciton dynamics at an interface between an electron donor, rubrene, and a C60 acceptor is studied by nonadiabatic quantum molecular dynamics simulation. Simulation results reveal an essential role of the phenyl groups in rubrene in increasing the charge-transfer rate by an order of magnitude. The atomistic mechanism of the enhanced charge transfer is found to be the amplification of aromatic breathing modes by the phenyl groups, which causes large fluctuations of electronic excitation energies. These findings provide insight into molecular structure design for efficient solar cells, while explaining recent experimental observations.
The Journal of Chemical Physics 05/2012; 136(18):184705. · 3.09 Impact Factor
-
ABSTRACT: Dynamic irregular applications such as molecular dynamics (MD) simulation often suffer considerable performance deterioration during execution. To address this problem, an optimal data-reordering schedule has been developed for runtime memory-access optimization of MD simulations on parallel computers. Analysis of the memory-access penalty during MD simulations shows that the performance improvement from computation and data reordering degrades gradually as data translation lookaside buffer (TLB) misses increase. We have also found correlations between the performance degradation and physical properties such as the simulated temperature, as well as computational parameters such as the spatial-decomposition granularity. Based on a performance model and pre-profiling of data fragmentation behaviors, we have developed an optimal runtime data-reordering schedule, thereby achieving speedups of 1.35, 1.36 and 1.28, respectively, for MD simulations of silica at temperatures of 300 K, 3,000 K and 6,000 K.
01/2012: pages 781-792; ISBN: 9783642328190
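The abstract's contribution is a schedule for *when* to reorder; the reordering step itself is often done by sorting atoms by spatial cell so that atoms that interact sit close together in memory. The following is a minimal sketch of one such cell-based reordering (not necessarily the authors'), assuming an orthorhombic periodic box; `cell_order` and `reorder` are hypothetical names, and the performance model and pre-profiling that set the schedule are not shown.

```python
def cell_order(positions, box, cell_size):
    """Return a permutation of atom indices grouped by spatial cell.

    Cells are numbered in row-major (x, y, z) order, so atoms in the same
    or nearby cells end up contiguous after reordering.
    """
    ncell = [max(1, int(L // cell_size)) for L in box]

    def cell_index(p):
        # Wrap into [0, L) for periodic boundaries, then bin; min() guards
        # against a coordinate landing exactly on the upper edge.
        ix, iy, iz = (min(int((p[d] % box[d]) / box[d] * ncell[d]), ncell[d] - 1)
                      for d in range(3))
        return (ix * ncell[1] + iy) * ncell[2] + iz

    return sorted(range(len(positions)), key=lambda a: cell_index(positions[a]))


def reorder(arrays, perm):
    """Apply the same permutation to every per-atom array (positions, velocities, ...)."""
    return [[arr[i] for i in perm] for arr in arrays]
```

For example, with a 10×10×10 box and 5-unit cells, atoms at (9, 9, 9), (1, 1, 1), (9, 1, 1) and (1, 1, 9) are reordered as [1, 3, 2, 0]. As atoms diffuse, this ordering degrades, which is why the timing of repeated reorderings matters.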
-
ABSTRACT: The optical planar waveguide-mode sensor is a promising candidate for highly sensitive biosensing in fields such as protein adsorption, receptor-ligand interaction and surface bacteria adhesion. To make the waveguide-mode sensor system more practical, a spectral-readout waveguide sensor is proposed to take advantage of its high speed, compactness and low cost. Based on our previously proposed monolithic waveguide-mode sensor composed of a SiO2 waveguide layer and a single-crystalline Si layer [1], the mechanism for achieving high sensitivity is revealed by numerical simulations. The optimal achievable sensitivities for a series of waveguide structures are summarized in a contour map and are found to be better than those of previously reported angle-scan waveguide sensors.
Optics Express 10/2011; 19(21):20205-13. · 3.59 Impact Factor
-
ABSTRACT: Deformation, plasticity, and flow in silica-based glasses have been studied for decades, and yet important questions remain about the atomistic mechanisms underlying these processes. Our molecular dynamics simulations of nanoindentation indicate that these mechanical processes have a unified underlying atomistic mechanism. The simulations reveal that indentation nucleates under-coordinated silicon and oxygen defects, which migrate by switching bonds in string-like processes. We also observe defect annihilation in the plastic region underneath and the pileup region around the indenter. These defects have also been observed in simulations of nanovoid coalescence under hydrostatic tension and in nanovoid deformation and breakup in shearing silica glass.
Applied Physics Letters 09/2011; 99(11):111906-111906-3. · 3.84 Impact Factor
-
ABSTRACT: The optical reflectance of He-Ne laser light on a waveguide-mode sensor was measured as a function of incident angle, with either a metal (Au, Cr or Pt) film or metal nanoparticles attached to the waveguide surface of the sensor. A dip appears in the reflectance spectrum at the incident angle where waveguide-mode excitation is induced. The dip moves toward a lower angle when the attached metal is a film, whereas it shifts toward a higher angle when the metal is an ensemble of nanoparticles. This difference in the direction of shift is well explained by theoretical calculations using average refractive indices of the metal-containing layers. The result indicates that the sensor can be used to estimate whether a metal nanostructure is film-like or an ensemble of spherical nanoparticles.
Nanotechnology 06/2011; 22(24):245503. · 3.98 Impact Factor
-
ABSTRACT: We have developed a scalable hierarchical parallelization scheme for molecular dynamics (MD) simulation on multicore clusters. The scheme exploits multilevel parallelism combining: (1) inter-node parallelism using spatial decomposition via message passing; (2) inter-core parallelism using cellular decomposition via multithreading with a master/worker model; and (3) data-level optimization via single-instruction multiple-data (SIMD) parallelism with various code transformation techniques. By using a hierarchy of parallelisms, the scheme exposes very high concurrency and data locality, thereby achieving: (1) an inter-node weak-scaling parallel efficiency of 0.985 on 106,496 BlueGene/L nodes (0.975 on 32,768 BlueGene/P nodes) and an inter-node strong-scaling parallel efficiency of 0.90 on 8,192 BlueGene/L nodes; (2) an inter-core multithread parallel efficiency of 0.65 for eight threads on a dual quad-core Xeon platform; and (3) a SIMD speedup of about 2 for problem sizes ranging from 3,072 to 98,304 atoms. Furthermore, the effect of memory-access penalty on SIMD performance is analyzed, and an application-based SIMD analysis scheme is proposed to help programmers determine whether their applications are amenable to SIMDization.
The Journal of Supercomputing. 01/2011; 57:20-33.
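Cellular decomposition rests on binning atoms into cells whose edge is at least the interaction cutoff, so each atom only needs to be checked against atoms in its own and the 26 surrounding cells. Below is a minimal serial sketch of that linked-cell neighbor search, assuming an orthorhombic periodic box and a cutoff `rc` below half the box length; the function names are hypothetical, and the thread assignment of cells, message passing, and SIMD code are not modeled. A brute-force reference is included for checking.

```python
import itertools


def min_image_dist2(pa, pb, box):
    """Squared distance under periodic boundaries (minimum-image convention)."""
    s = 0.0
    for d in range(3):
        dx = pa[d] - pb[d]
        dx -= box[d] * round(dx / box[d])
        s += dx * dx
    return s


def cell_of(p, box, ncell):
    """Integer cell coordinates of point p after wrapping into the box."""
    return tuple(min(int((p[d] % box[d]) / box[d] * ncell[d]), ncell[d] - 1)
                 for d in range(3))


def cell_list_pairs(positions, box, rc):
    """All pairs (i, j), i < j, closer than rc, found via linked cells."""
    ncell = [max(1, int(L // rc)) for L in box]  # cell edge >= rc
    cells = {}
    for i, p in enumerate(positions):
        cells.setdefault(cell_of(p, box, ncell), []).append(i)
    pairs = set()
    for (cx, cy, cz), members in cells.items():
        # Scan this cell and its 26 neighbors, wrapping periodically.
        for ox, oy, oz in itertools.product((-1, 0, 1), repeat=3):
            nb = ((cx + ox) % ncell[0], (cy + oy) % ncell[1], (cz + oz) % ncell[2])
            for i in members:
                for j in cells.get(nb, ()):
                    if i < j and min_image_dist2(positions[i], positions[j], box) < rc * rc:
                        pairs.add((i, j))  # the set removes repeats from wrapped cells
    return pairs


def brute_force_pairs(positions, box, rc):
    """O(N^2) reference used to check the cell-list result."""
    n = len(positions)
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if min_image_dist2(positions[i], positions[j], box) < rc * rc}
```

The cell list reduces the pair search from O(N^2) to roughly O(N) at fixed density, and the cells are also a natural unit of work to hand to threads.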
-
ABSTRACT: Rapid sequencing of individual human genomes is a prerequisite for genomic medicine, in which diseases will be prevented by preemptive cures. Quantum-mechanical tunneling through single-stranded DNA in a solid-state nanopore has been proposed for rapid DNA sequencing, but unfortunately the tunneling current alone cannot distinguish the four nucleotides due to large fluctuations in molecular conformation and solvent. Here, we propose a machine-learning approach applied to the tunneling current-voltage (I-V) characteristic for efficient discrimination between the four nucleotides. We first combine principal component analysis (PCA) and fuzzy c-means (FCM) clustering to learn the "fingerprints" of the electronic density of states (DOS) of the four nucleotides, which can be derived from the I-V data. We then apply the hidden Markov model and the Viterbi algorithm to sequence a time series of DOS data (i.e., to solve the sequencing problem). Numerical experiments show that the PCA-FCM approach can classify unlabeled DOS data with 91% accuracy. Furthermore, the classification is found to be robust against moderate levels of noise, i.e., 70% accuracy is retained with a signal-to-noise ratio of 26 dB. The PCA-FCM-Viterbi approach provides a 4-fold increase in accuracy for the sequencing problem compared with PCA alone. In conjunction with recent developments in nanotechnology, this machine-learning method may pave the way to the much-awaited rapid, low-cost genome sequencer. Comment: 19 pages, 7 figures
12/2010;
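The clustering stage can be illustrated with a minimal sketch of standard fuzzy c-means, in which each sample gets a graded membership in every cluster rather than a single label. It operates on generic feature vectors (in the abstract's pipeline these would be PCA-projected DOS data, which are not reproduced here); the function name, the fuzzifier `m = 2`, the fixed iteration count, and the caller-supplied initial centers are all assumptions for illustration.

```python
def fuzzy_c_means(points, centers, m=2.0, iters=50, eps=1e-12):
    """Standard fuzzy c-means on equal-length vectors.

    Returns (centers, U), where U[k][i] is the membership of point k in
    cluster i from the last membership update; each row of U sums to 1.
    """
    c = len(centers)
    dim = len(points[0])
    centers = [list(v) for v in centers]
    U = []
    for _ in range(iters):
        # Membership update: u_ki = 1 / sum_j (d_ki / d_kj)^(2/(m-1)).
        U = []
        for x in points:
            d = [max(sum((a - b) ** 2 for a, b in zip(x, v)) ** 0.5, eps)
                 for v in centers]
            U.append([1.0 / sum((d[i] / d[j]) ** (2.0 / (m - 1.0)) for j in range(c))
                      for i in range(c)])
        # Center update: mean of all points weighted by membership^m.
        for i in range(c):
            w = [u[i] ** m for u in U]
            total = sum(w)
            centers[i] = [sum(wk * x[k] for wk, x in zip(w, points)) / total
                          for k in range(dim)]
    return centers, U
```

On two well-separated groups of 2D points, for example, each point ends up with membership close to 1 in its own group's cluster, and the cluster centers settle near the group means. Soft memberships are what let a downstream model, such as the hidden Markov model in the abstract, weigh ambiguous samples rather than commit to a hard label.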
-
ABSTRACT: We have developed an optical system for detecting colored nanomaterials in aqueous solutions, based on the concept of evanescent-field-coupled waveguide-mode sensors. In this study, we found that the waveguide modes induced in the sensor are intrinsically sensitive to a change in optical absorption, that is, a 'change in color'. The system detects less than one gold nanoparticle (diameter: 20 nm) adsorbed per square micrometer. It is also demonstrated that a dye can be used to significantly enhance the signal due to adsorption of molecules. The developed sensor is rarely affected by impurity adsorption. The system is expected to serve as an effective sensing tool for metal colloids, nanoparticles, and colored biomolecules in solution.
Optics Express 07/2010; 18(15):15732-40. · 3.59 Impact Factor
-
ABSTRACT: Impurities segregated to the grain boundaries of a material can fundamentally alter its fracture behavior. A prime example is sulfur segregation-induced embrittlement of nickel, where an observed relation between sulfur-induced amorphization of grain boundaries and embrittlement has remained unexplained. Here, 48-million-atom reactive-force-field molecular dynamics simulations provide the missing link: an order-of-magnitude reduction of grain-boundary shear strength due to amorphization, combined with tensile-strength reduction, allows the crack tip to always find an easy propagation path.
Physical Review Letters 04/2010; 104(15):155502. · 7.37 Impact Factor
-
ABSTRACT: Silica glass was irradiated with swift heavy ions, with the ion species and energy chosen to induce the largest damaged regions. These regions were then selectively etched with hydrofluoric acid vapour to form nanopores on the glass surface. Subsequently, gold nanoparticles were embedded into the nanopores by vacuum evaporation followed by thermal treatment. In the resulting plasmonic structure, the localized surface plasmon excitation wavelength around the gold nanoparticles showed a redshift when water was introduced into the nanopores, in good agreement with theoretical calculation. This indicates that the fabricated structure can serve as a sensing element that detects the adhesion of substances such as biomolecules to the nanoparticles by measuring the redshift.
Nanotechnology 11/2009; 20(47):475306. · 3.98 Impact Factor
-
ABSTRACT: In this review, we present our recent results for atomistic mechanisms of damage nucleation and growth and dynamic fracture in silica glass. These results have been obtained with multimillion-to-billion atom, parallel, molecular dynamics simulations of (1) the interaction and coalescence of nanovoids in amorphous silica subjected to dilatational strain and (2) the nucleation, growth and healing of wing cracks and damage nanocavities in silica glass under impact loading. We also give an overview of our current efforts to perform dynamic fracture simulations over microsecond time scales and multiscale simulations of stress corrosion cracking in silica glass.
Journal of Physics D: Applied Physics 10/2009; 42(21):214011. · 2.54 Impact Factor
-
ABSTRACT: We study shear deformation and breakup of voids in silica glass using molecular dynamics simulations. As the shear strain increases, two kinds of defects, threefold-coordinated silicon and nonbridging oxygen atoms, appear as spherical voids deform elastically into ellipsoidal shapes. For shear strains ε > 15%, nanocracks appear on void surfaces and voids deform plastically into a threadlike structure. Nanocracks are nucleated by the migration of threefold-coordinated Si and nonbridging O on -Si-O-Si-O- rings. For ε > 40%, the threadlike structures break up into several fragments.
Physical Review Letters 07/2009; 103(3):035501. · 7.37 Impact Factor
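The two defect types named above are usually identified by counting bonded neighbors: a threefold-coordinated Si has three O neighbors instead of four, and a nonbridging O is bonded to only one Si instead of two. A minimal O(N^2) sketch of that classification follows; the Si-O bond cutoff `rc = 2.0` (in ångströms), the periodic orthorhombic box, and the function names are assumptions for illustration, not parameters taken from the paper.

```python
def min_image_dist2(pa, pb, box):
    """Squared distance under periodic boundaries (minimum-image convention)."""
    s = 0.0
    for d in range(3):
        dx = pa[d] - pb[d]
        dx -= box[d] * round(dx / box[d])
        s += dx * dx
    return s


def classify_defects(species, positions, box, rc=2.0):
    """Return (threefold_si, nonbridging_o) as lists of atom indices.

    Only Si-O pairs closer than rc (assumed cutoff) count as bonds.
    """
    n = len(positions)
    counts = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if {species[i], species[j]} == {"Si", "O"} and \
                    min_image_dist2(positions[i], positions[j], box) < rc * rc:
                counts[i] += 1
                counts[j] += 1
    threefold_si = [i for i in range(n) if species[i] == "Si" and counts[i] == 3]
    nonbridging_o = [i for i in range(n) if species[i] == "O" and counts[i] == 1]
    return threefold_si, nonbridging_o
```

For instance, a Si with three O neighbors at 1.6 Å, one of which is shared with a second Si, yields one threefold Si and two nonbridging O. Tracking these index lists over time is one way to follow defect migration; production analyses would use a cell list rather than the double loop.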
-
ABSTRACT: We computationally study the conformational transition in d,l-peptide nanorings (PNRs) and nanotubes (PNTs) based on total energy calculations. Ab initio energy calculations were carried out to investigate the static states of PNRs, whereas molecular dynamics (MD) calculations were employed to examine their dynamical states. We then discuss the time-dependent (TD) features of the transition process from E-type to B-type and vice versa. The conformational transition occurs easily from E-type equatorial (Eeq) to B-type axial (Bax) but is irreversible in the opposite direction because of a larger activation energy. TD tracing of the two dihedral angles in the individual amino acid residues reveals that the conformational change propagates along the peptide skeleton ring at nearly the speed of sound. We further extend our study to tubular forms and show that PNTs can form two kinds of homogeneous tubes, composed of E rings (E-tube) or of B rings (B-tube), and that these two kinds of PNRs should be mixed to produce a binary alloyed PNT.
The Journal of Physical Chemistry B 02/2009; 113(5):1473-84. · 3.70 Impact Factor
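Tracing a conformational change through dihedral angles means evaluating, at every MD frame, the torsion defined by four consecutive atoms. A minimal sketch of the signed dihedral angle follows, positive for clockwise rotation when viewed along the p1 to p2 bond. Which two angles the authors tracked is not stated in the abstract; the backbone φ and ψ angles are one plausible assumption, and `dihedral` is a hypothetical helper name.

```python
import math


def _sub(a, b):
    return [x - y for x, y in zip(a, b)]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees, in the range [-180, 180]."""
    b0 = _sub(p0, p1)          # points from p1 back to p0
    b1 = _sub(p2, p1)          # central bond
    b2 = _sub(p3, p2)
    n1 = math.sqrt(_dot(b1, b1))
    b1 = [x / n1 for x in b1]
    # Project the outer bonds onto the plane perpendicular to the central bond.
    k0, k2 = _dot(b0, b1), _dot(b2, b1)
    v = [x - k0 * y for x, y in zip(b0, b1)]
    w = [x - k2 * y for x, y in zip(b2, b1)]
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))
```

For backbone angles of residue i, φ would use atoms C(i-1), N(i), Cα(i), C(i) and ψ would use N(i), Cα(i), C(i), N(i+1). Using `atan2` rather than `acos` keeps the sign and avoids precision loss near 0° and 180°.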
-
23rd IEEE International Symposium on Parallel and Distributed Processing, IPDPS 2009, Rome, Italy, May 23-29, 2009; 01/2009
-
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, PDPTA 2009, Las Vegas, Nevada, USA, July 13-17, 2009, 2 Volumes; 01/2009
-
Euro-Par 2009 Parallel Processing, 15th International Euro-Par Conference, Delft, The Netherlands, August 25-28, 2009. Proceedings; 01/2009
-
Aiichiro Nakano, Rajiv K. Kalia, Ken-ichi Nomura, Ashish Sharma, Priya Vashishta, Fuyuki Shimojo, Adri C. T. van Duin, William A. Goddard, Rupak Biswas, Deepak Srivastava, Lin H. Yang
IJHPCA. 01/2008; 22:113-128.