Article

Abstract

Studying the dynamics of a large number of particles interacting through long-range forces, commonly referred to as the "N-body problem", is a central aspect of many different branches of physics. In recent years, physicists have made significant advances in the development of fast N-body algorithms to deal efficiently with such complex problems. This book gives a thorough introduction to these so-called "tree methods", setting out the basic principles and giving many practical examples of their use. The authors assume no prior specialist knowledge, and they illustrate the techniques throughout with reference to a broad range of applications. The book will be of great interest to graduate students and researchers working on the modeling of systems in astrophysics, plasma physics, nuclear and particle physics, condensed matter physics and materials science.
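For orientation, the cost that the book's tree methods are designed to avoid is the O(N^2) direct summation over all particle pairs. Below is a minimal sketch of that baseline, assuming unit masses and a softening length eps; both are illustrative choices, not taken from the book.

```python
import numpy as np

def direct_forces(pos, eps=1e-3):
    """Brute-force O(N^2) pairwise forces on N unit-mass bodies.

    pos : (N, 3) array of positions; eps softens the 1/r^2 singularity.
    Tree methods replace this all-pairs loop with a hierarchy of
    pseudo-particles, reducing the cost to O(N log N) or O(N).
    """
    n = len(pos)
    forces = np.zeros_like(pos)
    for i in range(n):
        d = pos - pos[i]                     # separation vectors to all bodies
        r2 = (d * d).sum(axis=1) + eps**2    # softened squared distances
        r2[i] = np.inf                       # exclude self-interaction
        forces[i] = (d / r2[:, None] ** 1.5).sum(axis=0)
    return forces
```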


... 21 to realize N-independence and further improve the efficiency of the current framework. Here, the PTROM employs hierarchical decomposition and source agglomeration via the Barnes-Hut tree method [6,55]. The Barnes-Hut tree method builds a hierarchical quad-tree (or an oct-tree in three dimensions) data structure, Ξ, that performs recursive partitioning over the entire domain (the root node) that contains all N bodies. ...
... Figure 3 illustrates the hierarchical data structure generated by the Barnes-Hut tree method. The Barnes-Hut tree method is well-documented in the literature, where pseudo codes and flowcharts to build the hierarchical data structure can be found in [55]. ...
... Traditionally, computing the Barnes-Hut tree decomposition and corresponding source clustering is performed over the N-body state-space, where both tree decomposition and clusters are updated at incremental time steps throughout a simulation. However, building the tree data structure and performing source clustering are N-dependent operations [55], which would not overcome the N-dependent OCC barrier in the hyper-reduction step, as discussed in Section 3.5. To overcome the need to perform multiple online tree construction and clustering of the state space, the PTROM constructs the hierarchical data structure and source clustering in a weighted POD space, which occurs offline and only once. ...
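The recursive partitioning and source agglomeration described in these excerpts can be sketched in a few lines. The two-dimensional quadtree below is a generic illustration; the class and parameter names are my own and do not reproduce the PTROM or book implementation.

```python
import numpy as np

class QuadNode:
    """A node of a Barnes-Hut quadtree over a square cell of half-width `half`."""

    def __init__(self, center, half, depth=0, max_depth=32):
        self.center = np.asarray(center, dtype=float)
        self.half, self.depth, self.max_depth = half, depth, max_depth
        self.children = []          # four sub-cells once the node is split
        self.bodies = []            # (position, mass) pairs held by a leaf
        self.mass = 0.0             # agglomerated mass of the pseudo-particle
        self.com = np.zeros(2)      # agglomerated centre of mass

    def insert(self, p, m):
        p = np.asarray(p, dtype=float)
        # update the pseudo-particle (monopole moment) carried by this cell
        self.com = (self.com * self.mass + p * m) / (self.mass + m)
        self.mass += m
        if not self.children and (not self.bodies or self.depth >= self.max_depth):
            self.bodies.append((p, m))
            return
        if not self.children:
            self._split()
        self._child_for(p).insert(p, m)

    def _split(self):
        h = self.half / 2.0
        for dx in (-h, h):
            for dy in (-h, h):
                self.children.append(
                    QuadNode(self.center + [dx, dy], h, self.depth + 1, self.max_depth))
        for p, m in self.bodies:    # push previously stored bodies down one level
            self._child_for(p).insert(p, m)
        self.bodies = []

    def _child_for(self, p):
        idx = 2 * int(p[0] >= self.center[0]) + int(p[1] >= self.center[1])
        return self.children[idx]
```

A root node spanning the whole domain is created first and each body is inserted in turn; every internal cell then carries the total mass and centre of mass of everything below it, which is what the force evaluation later treats as a single pseudo-particle when the cell is far enough away.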
Article
Full-text available
This work presents a data-driven reduced-order modeling framework to accelerate the computations of nonlocal and N-body methods that model dynamical systems. The proposed framework differs from traditional acceleration methods, like the Barnes–Hut method, which requires online tree building of the state space, or the fast-multipole method, which requires rigorous a priori analysis of governing kernels and online tree building. Our approach combines Barnes–Hut hierarchical decomposition, projection-based reduced-order modeling via the least-squares Petrov–Galerkin (LSPG) projection, and hyper-reduction by way of the Gauss–Newton with approximated tensor (GNAT) approach. The resulting projection-tree reduced-order model (PTROM) enables a drastic reduction in operational count complexity by constructing sparse hyper-reduced pairwise interactions of the non-compact N-body dynamical system. As a result, the presented framework is capable of achieving an operational count complexity that is independent of N, the number of bodies in the numerical domain. Capabilities of the PTROM method are demonstrated on the two-dimensional fluid-dynamic Biot–Savart kernel within a parametric and reproductive setting. Results show the PTROM is capable of achieving over 2000× wall-time speed-up with respect to the full-order model, where the speed-up increases with N. The resulting solution delivers quantities of interest with errors that are less than 0.1% with respect to full-order model.
... A classical system that consists of N particles interacting through long-range forces, called the N-body problem, is a famous example in this class of problems and has interested physicists for centuries, because it appears in a wide range of applications and in many disciplines [4]. Each particle moves according to Newton's equations of motion, which can be calculated by various numerical integrators provided that the stimulating force is known. ...
... The other interactions correspond to the direct summation. Since the particles are so close, the approximation through a pseudo-particle cannot be applied [4]. A variant of the tree code is the Warren-Salmon-HOT scheme (Hashed Oct-Tree) [12]. ...
... Once the global data structure is in place, it is a straightforward matter to calculate the pseudo-particle properties for each node, which will be accessed via the node's hash entry. For every twig the calculation of the properties is simple, because the multipole moments can be successively shifted up to their parent level using displacement vectors [4]. ...
... , where erfc is the complementary error function and h is an integer reciprocal-lattice vector; α is a parameter that determines the relative convergence speed of the two series, but the total potential Φ_total is independent of it (221). It is possible to achieve good accuracy with relatively low values of p and h; as suggested by Sangster and Dixon (222), a practical choice of these parameters is: ...
... In order to get higher local precision at the center of charge, we expand the potential with contributions from its father nodes. Some techniques, such as tabulating the higher multipole moment corrections and the fast multipole method, can be applied to further accelerate the performance of the tree algorithm. These are not discussed in this work and can be found in the work of Pfalzner and Gibbon (221). ...
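For reference, the split the first excerpt alludes to is the standard Ewald decomposition of the Coulomb energy into an erfc-screened real-space sum, a reciprocal-space sum over vectors h, and a self term. The textbook form is shown below; this is the generic expression, not necessarily the exact notation of the cited work.

$$
\Phi_{\mathrm{total}}
= \tfrac{1}{2}\sum_{i\neq j} q_i q_j\,\frac{\operatorname{erfc}(\alpha r_{ij})}{r_{ij}}
\;+\; \frac{2\pi}{V}\sum_{\mathbf{h}\neq 0}\frac{e^{-|\mathbf{h}|^2/4\alpha^2}}{|\mathbf{h}|^2}
\Bigl|\sum_j q_j\, e^{\,i\mathbf{h}\cdot\mathbf{r}_j}\Bigr|^2
\;-\;\frac{\alpha}{\sqrt{\pi}}\sum_j q_j^2 ,
$$

where the splitting parameter α shifts work between the two sums but leaves Φ_total unchanged, which is the independence property the excerpt mentions.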
Article
Here, we used the coarse-grain molecular dynamics (CGMD) method to establish a simulation model for the axon plasma membrane (APM) that was then used to study the mechanical properties of the APM and to investigate how the axon plasma membrane skeleton (APMS) affects diffusion of membrane proteins in the axon. Super-resolution microscopy has illustrated that the APMS consists of periodic actin ring-like structures along its length connected by spectrin tetramers and anchored to the lipid bilayer via ankyrin. Based on these experimental results, we developed a CGMD model for the APMS. In particular, the model comprises representations of periodic actin rings, spectrin tetramers, ankyrin, and ankyrin associated sodium channels. The model was validated using atomic force microscopy experimental results, which showed that axons are almost ~6 fold stiffer than the soma and ~2 fold stiffer than dendrites. Using the APMS model, we demonstrated that because the spectrin filaments are under tension, the thermal motion of the actin-associated ankyrin particles is minimal. In addition, we showed that any axonal injuries causing laceration of spectrin filaments will likely lead to a permanent disruption of the membrane skeleton due to the inability of spectrin filaments to spontaneously form their initial under-tension configuration. Then, we extended the APMS model by adding a representation of the lipid bilayer to investigate the effect of the APMS on the diffusion of APM proteins. To reconcile the experimental observations, which show restricted diffusion of integral monotopic proteins (IMPs) of the outer leaflet, with our simulations, we conjectured the existence of actin-anchored proteins that form a fence restricting the longitudinal diffusion of IMPs of the outer leaflet. Our simulations also revealed that spectrin filaments could impede transverse diffusion in the inner leaflet of the axon and in some conditions modify diffusion from normal to abnormal. Finally, we introduced the Barnes-Hut tree algorithm to simulate the long-range potential with both open and periodic boundary conditions. In particular, we simulated the electrostatic potential between particles and validated the simulation method by measuring the electric field of an infinite plane and the Rayleigh-Taylor instability in the presence of the electric charges.
... Consequently, a number of alternative approaches have been developed. Examples are lattice sum methods [57,58,59,60,61], reaction field methods [62], cut-off methods [63,64,65,66], the isotropic periodic sum method [66], and hierarchical methods [67] such as multigrid [68,69,70,71,72] and fast multipole methods [73]. Further approaches can be found in the excellent review [74]. ...
... To mention just a few which are well adapted to the requirements of MD, we have charge group cut-off [52], the isotropic periodic sum method [66], Lekner summation [148,149,150], Ewald [58] summation, smooth particle Ewald [142] summation and particle-particle-particle-mesh (P3M) [1]. There are also several variations of hierarchical methods [67]; a few examples are the method of Barnes and Hut (BH) [151], multigrid [68,69,70,71,72], the fast multipole method (FMM), with [152] and without [153] multipoles, and the cell multipole method [154]. ...
... After computing the solution and its normal derivative on the boundary with BEM, Equation (2) is often evaluated using fast multipole or Barnes-Hut schemes [Greengard and Rokhlin 1987; Pfalzner and Gibbon 1997]. These acceleration strategies are often necessary due to the quadratic complexity of evaluating the BIE. ...
Preprint
Grid-free Monte Carlo methods such as walk on spheres can be used to solve elliptic partial differential equations without mesh generation or global solves. However, such methods independently estimate the solution at every point, and hence do not take advantage of the high spatial regularity of solutions to elliptic problems. We propose a fast caching strategy which first estimates solution values and derivatives at randomly sampled points along the boundary of the domain (or a local region of interest). These cached values then provide cheap, output-sensitive evaluation of the solution (or its gradient) at interior points, via a boundary integral formulation. Unlike classic boundary integral methods, our caching scheme introduces zero statistical bias and does not require a dense global solve. Moreover we can handle imperfect geometry (e.g., with self-intersections) and detailed boundary/source terms without repairing or resampling the boundary representation. Overall, our scheme is similar in spirit to virtual point light methods from photorealistic rendering: it suppresses the typical salt-and-pepper noise characteristic of independent Monte Carlo estimates, while still retaining the many advantages of Monte Carlo solvers: progressive evaluation, trivial parallelization, geometric robustness, etc. We validate our approach using test problems from visual and geometric computing.
... (Pascual-Cid and Kaltenbrunner 2009). To highlight the arborescence of the discussion and to distinguish the arguments of every branch of the thread, the tool applies a flexible force-directed graph layout that accelerates charge interaction through the Barnes-Hut approximation (Pfalzner and Gibbon 2005). In addition, to identify the messages that receive more attention, the size of the nodes is proportional to the number of votes. ...
Article
Full-text available
Online debate tools for participatory democracy and crowdsourcing legislation are limited by different factors. One of them arises when discussion of proposals reaches a large number of contributions and therefore citizens encounter difficulties in mapping the arguments that constitute the dialectical debate. To address this issue, we present a visualization tool that shows the discussion of any proposal as an interactive radial tree. The tool builds on Decide Madrid, a recently created platform for direct democracy launched by the City Council of Madrid. Decide Madrid is one of the most relevant platforms that allows citizens to propose, debate and prioritise city policies.
... These equations are solved by time integration schemes based on finite difference methods such as the Gear algorithm or the Verlet method (31)(32). Given the positions and their time derivatives at time t, the scheme yields the same quantities at a later time. ...
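As a concrete instance of the Verlet-type update mentioned above, here is a velocity-Verlet step in Python; the force callback and unit masses are assumptions made for the sketch, not details of the cited work.

```python
def velocity_verlet(pos, vel, force, dt, n_steps):
    """Advance unit-mass particles with the velocity-Verlet scheme.

    pos, vel : position and velocity arrays at time t
    force    : callable returning the force array for given positions
    Given the positions and their time derivatives at time t, each step
    produces the same quantities at t + dt.
    """
    f = force(pos)
    for _ in range(n_steps):
        vel = vel + 0.5 * dt * f      # half kick
        pos = pos + dt * vel          # drift
        f = force(pos)                # forces at the new positions
        vel = vel + 0.5 * dt * f      # second half kick
    return pos, vel
```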
Article
Full-text available
Understanding the microscopic dispersion and aggregation of nanoparticles in nanoscale media has become an important challenge during the last decades. Molecular dynamics is one of the important techniques for tackling many of the complex problems faced by rheologists and engineers. Progress in investigations at the nanoscale, whether experimental or computational, has helped in understanding physical phenomena at the molecular scale. In addition, important developments have been made in predicting the behavior of confined fluids and lubricants at the nanoscale. In this review we discuss some of the progress made in illustrating aggregation mechanisms in nanofluids. Our main focus is on the application of molecular modeling to the effect of aggregation on the nano-rheology of nanofluids.
... The surface density of interstellar objects is tracked by 6 × 10^4 initial test particles (white contours; in projection). A tree code (Barnes & Hut 1986) is used for modelling the self-gravity of the gas (Pfalzner & Gibbon 1996; Bédorf et al. 2014). In contrast, as only a few dozen sink particles are created in these simulations, their self-gravity is calculated to machine precision using direct N-body integration, for which we use Huayno (Jänes et al. 2014). ...
Preprint
Full-text available
Interstellar objects (ISOs), the parent population of 1I/Oumuamua and 2I/Borisov, are abundant in the interstellar medium of the Milky Way. This means that the interstellar medium, including molecular cloud regions, has three components: gas, dust, and ISOs. From the observational constraints for the field density of ISOs drifting in the solar neighbourhood, we infer a typical molecular cloud of 10 pc diameter contains some 10^18 ISOs. At typical sizes ranging from hundreds of metres to tens of km, ISOs are entirely decoupled from the gas dynamics in these molecular clouds. Here we address the question of whether ISOs can follow the collapse of molecular clouds. We perform low-resolution simulations of the collapse of molecular clouds containing initially static ISO populations toward the point where stars form. In this proof-of-principle study, we find that the interstellar objects definitely follow the collapse of the gas, and many become bound to the new-forming numerical approximations to future stars (sinks). At minimum, 40% of all sinks have one or more ISO test particles gravitationally bound to them for the initial ISO distributions tested here. This value corresponds to at least 10^10 actual interstellar objects being bound after three initial free-fall times. Thus, ISOs are a relevant component of star formation. We find that more massive sinks bind disproportionately large fractions of the initial ISO population, implying competitive capture of ISOs. Sinks can also be solitary, as their ISOs can become unbound again, particularly if sinks are ejected from the system. Emerging planetary systems will thus develop in remarkably varied environments, ranging from solitary to richly populated with bound ISOs.
... Managing spatially scattered data with tree structures offers clear advantages in memory use and fast searching. In studies of stellar evolution and galaxy formation, spatial multi-level trees are widely used to manage mass particles in space [14,15], enabling fast computation of the forces on particles and analysis of galaxy mass distributions. In this work, we use a spatial multi-level "tree" to manage objects in n-dimensional space, propose two fast search algorithms adapted to given search conditions, and implement Delaunay space partitioning and cluster construction in two and three dimensions. ...
... There are various adaptation methods, explored by many researchers from the mid 1980s through the mid 90s, to improve the PM and P3M methods, such as the Nested Grid Particle-Mesh and Tree-Codes methods [http://www.amara.com/papers/nbody.html; Pfalzner and Gibbon, 1996]. However, the most successful algorithm solving the problem of the long-range interactions is based on the Multi-Grid separation of scales (see Fig. 7). ...
Chapter
Full-text available
Mesoscopic features embedded within macroscopic phenomena in colloids and suspensions, when coupled together with microstructural dynamics and boundary singularities, produce complex multiresolutional patterns, which are difficult to capture with the continuum model using partial differential equations, i.e., the Navier-Stokes equation and the Cahn-Hilliard equation. The continuum model must be augmented with discretized microscopic models, such as molecular dynamics (MD), in order to provide an effective solver across the diverse scales with different physics. The high degree of spatial and temporal disparities of this approach makes it a computationally demanding task. In this survey we present the off-grid discrete-particle methods, which can be applied in modeling cross-scale properties of complex fluids. We can view the cross-scale endeavor characteristic of a multiresolutional homogeneous particle model as a manifestation of the interactions present in the discrete particle model, which allow them to produce the microscopic and macroscopic modes in the mesoscopic scale. First, we describe discrete-particle models in which the following spatio-temporal scales are obtained by subsequent coarse-graining of hierarchical systems consisting of atoms, molecules, fluid particles, and moving mesh nodes. We then show some examples of 2D and 3D modeling of the Rayleigh-Taylor mixing, phase separation, colloidal arrays, colloidal dynamics in the mesoscale, and blood flow in microscopic vessels. The modeled multiresolutional patterns look amazingly similar to those found in laboratory experiments and can mimic a single micelle, colloidal crystals, large-scale colloidal aggregates up to scales of hydrodynamic instabilities, and the macroscopic phenomenon involving the clustering of red blood cells in capillaries. We can summarize the computationally homogeneous discrete particle model in the following hierarchical scheme: nonequilibrium molecular dynamics (NEMD), dissipative particle dynamics (DPD), fluid particle model (FPM), smoothed particle hydrodynamics (SPH), and thermodynamically consistent DPD. An idea of a powerful toolkit over the GRID can be formed from these discrete particle schemes to model successfully multiple-scale phenomena such as biological vascular and mesoscopic porous-media systems.
... Particle-mesh (PM) methods proved insufficient to obtain the required force resolution and were replaced by P3M (Particle-Particle PM) algorithms (e.g., Ref. 3), and tree codes (Ref. 18). Because of the high degree of clustering in cosmological simulations, P3M codes have been mostly displaced by tree codes (nevertheless, as demonstrated by HACC, P3M can be resurrected for CPU/GPU systems). To localize tree walks and make handling periodic boundary conditions easier, hybrid TreePM methods were introduced, and form the mainstay of gravity-only cosmology simulations. ...
Article
Supercomputing is evolving toward hybrid and accelerator- based architectures with millions of cores. The Hardware/Hybrid Accelerated Cosmology Code (HACC) framework exploits this diverse landscape at the largest scales of problem size, obtaining high scalability and sustained performance. Developed to satisfy the science requirements of cosmological surveys, HACC melds particle and grid methods using a novel algorithmic structure that flexibly maps across architectures, including CPU/GPU, multi/many-core, and Blue Gene systems. In this Research Highlight, we demonstrate the success of HACC on two very different machines, the CPU/GPU system Titan and the BG/Q systems Sequoia and Mira, attaining very high levels of scalable performance. We demonstrate strong and weak scaling on Titan, obtaining up to 99.2% parallel efficiency, evolving 1.1 trillion particles. On Sequoia, we reach 13.94 PFlops (69.2% of peak) and 90% parallel efficiency on 1,572,864 cores, with 3.6 trillion particles, the largest cosmological benchmark yet performed. HACC design concepts are applicable to several other supercomputer applications.
... The MicPIC dynamics can be solved efficiently by using the particle-particle particle-mesh (P3M) concept introduced originally for electrostatic simulations by Eastwood and Hockney [36] and is capable of tracking 10^7 particles on a single CPU (∼10^10 expected with parallelization). Microscopic resolution with comparable particle numbers was so far restricted to electrostatic P3M or tree schemes [37,38], which however neglect laser propagation and magnetic fields. The key advantages of MicPIC over conventional PIC approaches are the atomistic resolution of the plasma dynamics (including the surface) as well as the capability to directly model strongly coupled plasmas. ...
Chapter
This chapter provides an overview over the numerical method of microscopic particle-in-cell (MicPIC), its validation, and some applications. It focuses on clusters exposed to intense light fields, as they present an ideal testbed for MicPIC for the following reasons. First, analytical solutions (Mie solution) exist, by which the validity of the MicPIC approach can be tested. Second, nano-plasma processes can be investigated over a wide range of sizes, changing the weight of plasma volume-to-surface processes. Third, laser-cluster interaction has important applications in the areas of nanophotonics, nonlinear optics, and strong-field laser physics. The chapter reviews the validity ranges of the different theoretical approaches to classical light-matter interaction. It presents the formal theory behind MicPIC, its implementation, and validation. The chapter explores the application of MicPIC to the microscopic analysis of light-matter processes in cluster nanoplasmas.
... Simulating such an n-body system would have a computational complexity of O(n^2); to overcome this problem, D3 uses the Barnes-Hut (Pfalzner et al., 2005) approximation algorithm. In this, a quadtree is applied to accelerate the charge interactions between the particles, reducing the computational complexity to O(n log(n)). ...
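The quadtree acceleration the excerpt describes rests on an opening criterion: a cell of size s at distance d is treated as a single pseudo-charge whenever s/d falls below a threshold θ. Below is a sketch of that traversal, reusing the QuadNode class from the earlier sketch; θ = 0.5 and the inverse-square kernel are illustrative assumptions.

```python
import numpy as np

def repulsion(node, p, theta=0.5, eps=1e-9):
    """Approximate the total inverse-square repulsion on a particle at p.

    If a cell is far enough away (size / distance < theta), its
    agglomerated charge acts as a single pseudo-particle; otherwise its
    four children are opened recursively. In a real layout engine the
    particle's own leaf would be excluded from the sum.
    """
    p = np.asarray(p, dtype=float)
    d_vec = p - node.com
    d = np.linalg.norm(d_vec) + eps
    if not node.children or (2.0 * node.half) / d < theta:
        return node.mass * d_vec / d**3          # one far-field interaction
    return sum(repulsion(c, p, theta, eps) for c in node.children if c.mass > 0)
```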
Article
Full-text available
In this paper, a novel framework for social user clustering is proposed. Given a current controversial political topic, the Louvain Modularity algorithm is used to detect communities of users sharing the same political preferences. The political alignment of a set of users is labeled manually by a human expert and then the quality of the community detection is evaluated against this gold standard. In the last section, we propose a novel force-directed graph algorithm to generate a visual representation of the detected communities.
... Simulating such an n-body system would have a computational complexity of O(n^2); to overcome this problem, D3 uses the Barnes-Hut [9] approximation algorithm. In this, a quadtree is applied to accelerate the charge interactions between the particles, reducing the computational complexity to O(n log(n)). ...
Chapter
In this paper, a novel agent-based platform for Twitter user clustering is proposed. We describe how our system tracks the activity for a given topic in the social network and how to detect communities of users with similar political preferences by means of the Louvain Modularity. The quality of this clustering method is evaluated against a subset of human-labeled user profiles. Finally, we propose combining community detection with a force-directed graph algorithm to produce a visual representation of the political communities.
... For an N-particle simulation, it generally requires computational time of the order of O[N log(N)] using a tree code algorithm (e.g. [24]). Thus an estimated reduction in computational time is 12 times if 2.0 g of particles is used instead of 20.0 g. ...
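The quoted factor of about 12 follows directly from the N log N scaling, assuming the particle count scales linearly with the mass; for instance, with N ≈ 10^6 particles at 20.0 g and N ≈ 10^5 at 2.0 g (an illustrative choice, not stated in the cited work):

$$
\frac{t_{20}}{t_{2}} \;=\; \frac{N_{20}\log N_{20}}{N_{2}\log N_{2}}
\;=\; 10\cdot\frac{\log 10^{6}}{\log 10^{5}} \;=\; 10\cdot\frac{6}{5} \;=\; 12 .
$$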
Article
Full-text available
A model for polydisperse particle clouds has been developed in this study. We extended the monodisperse particle cloud model of Lai et al. (Environ Fluid Mech 13(5):435–463, 2013) to the case of polydisperse particles. The particle cloud is first considered to be a thermal or buoyant vortex ring, with the thermal induced velocity field modeled by an expanding spherical Hill’s vortex. The buoyancy of the composite thermal is assumed to be the sum of buoyancy contributed by the all particles inside the thermal. Individual particles (of different particle properties) in the cloud are then tracked by the particle tracking equation using the computed induced velocity field. The turbulent dispersion effect is also accounted for by using a random walk model. Experiments of polydisperse particle clouds were carried out to validate the model. The agreement between model predictions and experiments was reasonable. We further validate our model by comparing it with the LES study of Wang et al. (J Hydraul Eng ASCE 141(7):06015006, 2014). The limitations of our model are then discussed with reference to the comparison. Overall, although some flow details are not captured by our model, the simplicity and generality of the model makes it useful in engineering applications.
... In this case there are 936 vertices, and 32 supernodes. Under a reasonable assumption [8,83] of the distribution of vertex positions, it can be proved that building the quadtree takes a time complexity of O(|V| log |V|). Finding all the supernodes with reference to a vertex i can be done in a time complexity of O(log |V|). ...
Article
With the prevalence of big data, there is a growing need for algorithms and techniques for visualizing very large and complex graphs. In this article, we review layout algorithms and interactive exploration techniques for large graphs. In addition, we briefly look at software and datasets for graph visualization, as well as challenges that need to be addressed. WIREs Comput Stat 2015, 7:115–136. doi: 10.1002/wics.1343 This article is categorized under: Statistical Learning and Exploratory Methods of the Data Sciences > Exploratory Data Analysis Data: Types and Structure > Graph and Network Data Statistical and Graphical Methods of Data Analysis > Statistical Graphics and Visualization
... The second is a parallel N-body tree method implemented for traditional CPU-cluster environments. 19 Details of the Biot-Savart acceleration methods are given in Appendix A. ...
Conference Paper
A new coupled Eulerian/Lagrangian CFD method is presented for rotorcraft wake flow modeling. Specifically, the Vortex Particle Method is coupled with an overset, finite-difference URANS algorithm to solve the wall-bounded and wake flow. The coupled algorithm is presented in detail along with the necessary parallel computing algorithms. The coupled algorithm is then used to model the flow over a NACA0015 airfoil wing at 12° angle-of-attack. Results from the coupled algorithm are compared to a baseline CFD solution and, where available, experimental data. Surface pressure profiles, sectional loads and tip vortex visualization and velocity profiles are compared to assess the effectiveness of the coupling algorithm. Comparisons show that the coupled approach can capture the tip vortex far better than the baseline URANS solution. However, the particle solution can be significantly impacted by the excessive dissipation through the Eulerian domain.
... Each square is checked, and recursively opened, until the inequality (1) is satisfied. Under a reasonable assumption [2,48] of the distribution of vertices, it can be proved that building the quadtree takes a time complexity of O(|V| log |V|). Finding all the supernodes with reference to a vertex i can be done in a time complexity of O(log |V|). ...
Article
Full-text available
Graphs are often used to encapsulate relationships between objects. Graph drawing enables visualization of such relationships. The usefulness of this visual representation is dependent on whether the drawing is aesthetic. While there are no strict criteria for the aesthetics of a drawing, it is generally agreed, for example ...
... Popular alternatives include pure particle-based methods (tree codes) or multi-scale grid-based methods (AMR codes), or hybrids of the two (TreePM, particle-particle particle-mesh, P3M). It is not our purpose here to go into many details of the algorithms and their implementations; good coverage of the background material can be found in Barnes & Hut (1986), Hockney & Eastwood (1988), Warren & Salmon (1993), Pfalzner & Gibbon (1996), Dubinski et al. (2004), Springel (2005), and Dolag et al. (2008). ...
Article
Current and future surveys of large-scale cosmic structure are associated with a massive and complex datastream to study, characterize, and ultimately understand the physics behind the two major components of the 'Dark Universe', dark energy and dark matter. In addition, the surveys also probe primordial perturbations and carry out fundamental measurements, such as determining the sum of neutrino masses. Large-scale simulations of structure formation in the Universe play a critical role in the interpretation of the data and extraction of the physics of interest. Just as survey instruments continue to grow in size and complexity, so do the supercomputers that enable these simulations. Here we report on HACC (Hardware/Hybrid Accelerated Cosmology Code), a recently developed and evolving cosmology N-body code framework, designed to run efficiently on diverse computing architectures and to scale to millions of cores and beyond. HACC can run on all current supercomputer architectures and supports a variety of programming models and algorithms. It has been demonstrated at scale on Cell- and GPU-accelerated systems, standard multi-core node clusters, and Blue Gene systems. HACC's design allows for ease of portability, and at the same time, high levels of sustained performance on the fastest supercomputers available. We present a description of the design philosophy of HACC, the underlying algorithms and code structure, and outline implementation details for several specific architectures. We show selected accuracy and performance results from some of the largest high resolution cosmological simulations so far performed, including benchmarks evolving more than 3.6 trillion particles.
... The MicPIC dynamics can be solved efficiently by using the particle-particle particle-mesh (P3M) concept introduced originally for electrostatic simulations by Eastwood and Hockney [36] and is capable of tracking 10^7 particles on a single CPU (∼10^10 expected with parallelization). Microscopic resolution with comparable particle numbers was so far restricted to electrostatic P3M or tree schemes [37,38], which however neglect laser propagation and magnetic fields. The key advantages of MicPIC over conventional PIC approaches are the atomistic resolution of the plasma dynamics (including the surface) as well as the capability to directly model strongly coupled plasmas. ...
Article
The dynamics of solid-density nanoplasmas driven by intense lasers takes place in the strongly-coupled plasma regime, where collisions play an important role. The microscopic particle-in-cell method has enabled the complete classical electromagnetic description of these processes. The theoretical foundation of the approach and its relation to existing methods are reviewed. Selected applications to laser cluster processes are presented that have been inaccessible to numerical simulation so far.
... It too easily becomes dominated by the direct summation part and becomes unacceptably slow for larger systems. There are various adaptation methods, explored by many researchers from the mid 1980s through the mid 90s, to improve the PM and P3M methods, such as the Nested Grid Particle-Mesh and Tree-Codes methods [http://www.amara.com/papers/nbody.html; Pfalzner and Gibbon, 1996]. However, the most successful algorithm solving the problem of the long-range interactions is based on the Multi-Grid separation of scales (see Fig. 7). The Fast Multipole Method (FMM) is a tree code that uses two representations of the potential field. As shown in Fig. 8, the two representations are: far field (multipole) and local expansions ...
Article
Full-text available
Mesoscopic features embedded within macroscopic phenomena in colloids and suspensions, when coupled together with micro-structural dynamics and boundary singularities, produce complex multi-resolution patterns, which are difficult to capture with the continuum model using partial differential equations, i.e., the Navier-Stokes equation and the Cahn-Hilliard equation. The continuum model must be augmented with discretized microscopic models, such as molecular dynamics (MD), in order to provide an effective solver across the diverse scales with different physics. The high degree of spatial and temporal disparities of this approach makes it a computationally demanding task. In this survey we present the off-grid discrete-particle methods, which can be applied in modeling cross-scale properties of complex fluids. We can view the cross-scale endeavor characteristic of a multi-resolution homogeneous particle model as a manifestation of the interactions present in the discrete particle model, which allow them to produce the microscopic and macroscopic modes in the mesoscopic scale. First, we describe discrete-particle models in which the following spatio-temporal scales are obtained by subsequent coarse-graining of hierarchical systems consisting of atoms, molecules, fluid particles and moving mesh nodes. We then show some examples of 2D and 3D modeling of the Rayleigh-Taylor mixing, phase separation, colloidal arrays, colloidal dynamics in the mesoscale and blood flow in microscopic vessels. The modeled multi-resolution patterns look amazingly similar to those found in laboratory experiments and can mimic a single micelle, colloidal crystals, large-scale colloidal aggregates up to scales of hydrodynamic instabilities and the macroscopic phenomenon involving the clustering of red blood cells in capillaries. We can summarize the computationally homogeneous discrete particle model in the following hierarchical scheme: non-equilibrium molecular dynamics (NEMD), dissipative particle dynamics (DPD), fluid particle model (FPM), smoothed particle hydrodynamics (SPH) and thermodynamically consistent DPD. An idea of a powerful toolkit over the GRID can be formed from these discrete particle schemes to model successfully multiple-scale phenomena such as biological vascular and mesoscopic porous-media systems.
Article
Full-text available
The fragment molecular orbital (FMO) method is an efficient quantum chemical method suitable for calculating the electronic structures of large molecular systems. FMO can be accelerated by several approximations, an important one being the approximation to the environmental electrostatic potential (ESP) exerted on the fragment monomers or dimers. The environmental ESP is often approximated using the Mulliken atomic orbital charge (AOC) for proximal fragment dimers (ESP-AOC approximation) and the Mulliken point charge (PTC) for distant dimers (ESP-PTC approximation). Recently, another approximation method based on Cholesky decomposition with adaptive metric has been proposed for environmental ESP (ESP-CDAM, Okiyama et al., Bull. Chem. Soc. Jpn., 2021, 94, 91). In the current article, the energy gradient is derived under the ESP-CDAM approximation and implemented in FMO-based molecular dynamics (FMO-MD). Several test FMO-MD simulations are performed to compare the accuracy of the ESP-CDAM approximation with that of the conventional methods. The results show that ESP-CDAM is more accurate than ESP-AOC.
Article
In theory, diffusion curves promise complex color gradations for infinite-resolution vector graphics. In practice, existing realizations suffer from poor scaling, discretization artifacts, or insufficient support for rich boundary conditions. Previous applications of the boundary element method to diffusion curves have relied on polygonal approximations, which either forfeit the high-order smoothness of Bézier curves, or, when the polygonal approximation is extremely detailed, result in large and costly systems of equations that must be solved. In this paper, we utilize the boundary integral equation method to accurately and efficiently solve the underlying partial differential equation. Given a desired resolution and viewport, we then interpolate this solution and use the boundary element method to render it. We couple this hybrid approach with the fast multipole method on a non-uniform quadtree for efficient computation. Furthermore, we introduce an adaptive strategy to enable truly scalable infinite-resolution diffusion curves.
Article
Grid-free Monte Carlo methods such as walk on spheres can be used to solve elliptic partial differential equations without mesh generation or global solves. However, such methods independently estimate the solution at every point, and hence do not take advantage of the high spatial regularity of solutions to elliptic problems. We propose a fast caching strategy which first estimates solution values and derivatives at randomly sampled points along the boundary of the domain (or a local region of interest). These cached values then provide cheap, output-sensitive evaluation of the solution (or its gradient) at interior points, via a boundary integral formulation. Unlike classic boundary integral methods, our caching scheme introduces zero statistical bias and does not require a dense global solve. Moreover we can handle imperfect geometry (e.g., with self-intersections) and detailed boundary/source terms without repairing or resampling the boundary representation. Overall, our scheme is similar in spirit to virtual point light methods from photorealistic rendering: it suppresses the typical salt-and-pepper noise characteristic of independent Monte Carlo estimates, while still retaining the many advantages of Monte Carlo solvers: progressive evaluation, trivial parallelization, geometric robustness, etc. We validate our approach using test problems from visual and geometric computing.
Article
Full-text available
Local graph symmetry groups act in a non-identical fashion on just a proper (local) subset of a graph’s vertices, and consequent theorems for adjacency matrices simplify eigen-solutions. These theorems give a way to deal with a hierarchy of local sub-symmetries, such as are manifested by so-called “dendrimers”, which are (highly) branched polymers obtainable at a given generation number r from the polymer at the preceding generation number (r − 1) by connecting d copies of new branching monomer units to each end-unit of this preceding tree-like dendrimer, the initial generation number r = 1 consisting of a single monomer unit connected to d others. Our local symmetry methodology leads to an (essentially) analytic eigen-solution for the Bethe tree case, with the branching units just single sites—but further there result novel (qualitatively distinctive) features: eigenvector localization and eigenvalue clumping. Moreover, these novel characteristics persist for more general “dendrimers”, here considered and illustrated in the context of electronic structure of conjugated-carbon π-networks. The overall view here is of a systematic development and characterization for such dendrimer polymers paralleling some aspects of the standard development and characteristics for linear-chain benzenoid polymers—for instance, that of plotting eigen-energies as a function of symmetry. Clumping of eigen-spectra, and localization features in dendrimer eigenfunctions occur and are examined.
Article
Full-text available
Monte Carlo simulations of the failure of unidirectional fibre composites typically require numerous evaluations of the stress-state in partially damaged composite patches. In a simulated composite patch comprised of N fibres, of which N_b fibres are broken in a common cross-sectional plane transverse to the fibre direction, the stress overloads in the intact fibres are given by the weighted superposition of the unit break solutions associated with each of the breaks. Determining the weights involves solving N_b linear equations, and determining overloads in the intact fibres requires matrix-vector multiplication. These operations require O(N_b^3) and O(N N_b) floating point operations, respectively. These costs become prohibitive for large N and N_b; they limit Monte Carlo failure simulations to composite patches of only a few thousand fibres. In the present work, a fast algorithm to determine the overloads in a partially damaged composite, requiring O(N_b^{1/3} N log N) floating point operations, is proposed. This algorithm is based on the discrete Fourier transform. The efficiency of the proposed method derives from the computational simplicity of weighted superposition in Fourier space. Computations of the stress state ahead of large circular clusters of breaks in composite patches comprised of about one million fibres are used to demonstrate the efficiency of the proposed algorithm.
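The "weighted superposition in Fourier space" the abstract refers to is, at heart, a circular convolution evaluated as a pointwise product of transforms. Below is a minimal one-dimensional illustration; the decaying kernel, unit weights, and periodic lattice are assumptions for the sketch and not the paper's actual influence function or weight solve.

```python
import numpy as np

def overloads_via_fft(break_mask, unit_kernel):
    """Superpose a unit-break overload profile at every broken fibre.

    break_mask  : array holding the superposition weight at each broken
                  fibre (zeros elsewhere) on a periodic 1-D lattice
    unit_kernel : overload profile caused by a single break at index 0
    The circular convolution is computed as a pointwise product in
    Fourier space, costing O(N log N) instead of O(N * N_b).
    """
    return np.real(np.fft.ifft(np.fft.fft(break_mask) * np.fft.fft(unit_kernel)))

# illustrative use: 16 fibres, breaks at positions 3 and 4 with unit weights
mask = np.zeros(16)
mask[[3, 4]] = 1.0
kernel = np.exp(-np.arange(16) / 2.0)   # made-up decaying overload profile
profile = overloads_via_fft(mask, kernel)
```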
Article
A new approach to Darwin or magnetoinductive plasma simulation is presented, which combines a mesh-free field solver with a robust time-integration scheme avoiding numerical divergence errors in the solenoidal field components. The mesh-free formulation employs an efficient parallel Barnes-Hut tree algorithm to speed up the computation of fields summed directly from the particles, avoiding the necessity of divergence cleaning procedures typically required by particle-in-cell methods. The time-integration scheme employs a Hamiltonian formulation of the Lorentz force, circumventing the development of violent numerical instabilities associated with time differentiation of the vector potential. It is shown that a semi-implicit scheme converges rapidly and is robust to further numerical instabilities which can develop from a dominant contribution of the vector potential to the canonical momenta. The model is validated by various static and dynamic benchmark tests, including a simulation of the Weibel-like filamentation instability in beam-plasma interactions.
Chapter
This chapter analyzes nationwide supplier–buyer relationship data for nearly a million firms and 4 million transactions in Japan. The production network constructed by firms through their transaction relations reflects the characteristics of economic activities in Japan. For an intuitive understanding of the network structure, we first visualize the network in three-dimensional space using a spring–electrostatic model. In this model, we replace nodes (firms) and links (transaction relations) by particles with identical charges and springs. This visualization shows that the network is highly heterogeneous, with some firms being tightly connected and forming groups, between which there are much looser connections. Such industrial communities are identified here using algorithms that maximize modularity, which measures the share of links encircled by a given partition of nodes, with reference to the expected share of intra-links for corresponding random networks with the same node partitions. Since major communities thereby detected are still very heterogeneous, the detection of communities is repeated within them. The 10 largest communities and their principal sub-communities are then characterized by areal and industry sectoral attributes of firms. In addition, how closely the sub-communities are related to each other is quantified by introducing a metric of “distance” between them. Finally, the hierarchical relationship between the communities is clarified by considering directional features of the transactions.
Article
A new method for calculating the resistance tensors of arbitrarily shaped particles and the translational and rotational self-diffusivity in suspensions of such particles is developed. This approach can be harnessed to efficiently and accurately predict the hydrodynamic and transport properties of large macromolecules such as antibodies in solution. Particles are modeled as a rigid composite of spherical beads, and the continuum equations for low Reynolds number fluid mechanics are used to calculate the drag on the composite or its diffusivity in a solution of other composites. The hydrodynamic calculations are driven by a graphics processing unit (GPU) implementation of the particle-mesh-Ewald technique which offers log-linear scaling with respect to the complexity of the composite-bead particles modeled as well as high speed execution leveraging the hyper-parallelization of the GPU. Matrix-free expressions for the hydrodynamic resistance and translational and rotational diffusivity of composite bead particles are developed, which exhibit substantial improvements in computational complexity over existing approaches. The effectiveness of these methods is demonstrated through a series of calculations for composite-bead particles having a spherical geometry, and the results are compared to exact solutions for spheres. Included in the supplementary material is an implementation of the proposed algorithm which functions as a plug-in for the GPU molecular dynamics suite HOOMD-blue.
Article
We overview the Fast Multipole Method (FMM) and the Barnes-Hut tree method. These algorithms evaluate the mutual gravitational interaction between N particles in O(N) or O(N log N) time, respectively. We present basic algorithms as well as recent developments, such as Anderson's method of using Poisson's formula, the use of FFT, and other optimization techniques. We also summarize the current states of the two algorithms. Though FMM with O(N) scaling is theoretically preferred over the O(N log N) tree method, comparisons of existing implementations proved otherwise.
Article
We overview our GRAPE (GRAvity PipE) and GRAPE-DR project to develop dedicated computers for astrophysical N-body simulations. The basic idea of GRAPE is to attach a custom-built computer dedicated to the calculation of gravitational interaction between particles to a general-purpose programmable computer. By this hybrid architecture, we can achieve both a wide range of applications and very high peak performance. GRAPE-6, completed in 2002, achieved the peak speed of 64 Tflops. The next machine, GRAPE-DR, will have the peak speed of 2 Pflops and will be completed in 2008. We discuss the physics of stellar systems, evolution of general-purpose high-performance computers, our GRAPE and GRAPE-DR projects and issues of numerical algorithms.
Article
When ultrahigh irradiance laser pulses interact with high energy electrons, the emission of radiation from these electrons can significantly affect the motion of the electrons themselves (radiation reaction). In this paper we present results for a single particle and for multiple particles interacting with a laser. We can directly solve the interaction between the electrons numerically, allowing an accurate description of the dynamics. Results of a single high energy, 1 GeV, electron interacting with a laser pulse of irradiance 10^22 W/cm^2 show significant energy loss over a short time, indicating the possibility of a very powerful gamma ray source. We examine the collective effects of the simplest case of two electrons with equal energies, but with initial spatial offsets, undergoing strong radiation reaction effects. We find that due to the mutual repulsive force acting on the electrons, the electrons scatter off each other and acquire different energies due to both mutual repulsion and radiation reaction.
Article
Full-text available
Variations of k-d trees represent a fundamental data structure used in Computational Geometry with numerous applications in science, for example particle track fitting in the software of the LHC experiments, and in simulations of N-body systems in the study of the dynamics of interacting galaxies, particle beam physics, and molecular dynamics in biochemistry. The many-body tree methods devised by Barnes and Hut in the 1980s and the Fast Multipole Method introduced in 1987 by Greengard and Rokhlin use variants of k-d trees to reduce the computation time upper bounds to O(n log n) and even O(n) from O(n^2). We present an algorithm that uses the principle of well-separated pairs decomposition to always produce compressed trees in O(n log n) work. We present and evaluate parallel implementations for the algorithm that can take advantage of multi-core architectures.
Conference Paper
The critical path, which describes the longest execution sequence without wait states in a parallel program, identifies the activities that determine the overall program runtime. Combining knowledge of the critical path with traditional parallel profiles, we have defined a set of compact performance indicators that help answer a variety of important performance-analysis questions, such as identifying load imbalance, quantifying the impact of imbalance on runtime, and characterizing resource consumption. By replaying event traces in parallel, we can calculate these performance indicators in a highly scalable way, making them a suitable analysis instrument for massively parallel programs with thousands of processes. Case studies with real-world parallel applications confirm that - in comparison to traditional profiles - our indicators provide enhanced insight into program behavior, especially when evaluating partitioning schemes of MPMD programs.
Article
Rotor-induced-flow modeling and prediction has been one of the central issues for rotorcraft performance, control, stability, loads, and vibration analysis for decades. Traditional singularity-based methods used in most current comprehensive rotorcraft analysis codes are limited by the potential flow assumption and thus have to rely on empirical formulations (e.g., vortex decay factor, vortex core size, etc.) to reach a solution. This paper discusses the development and validation of a viscous vortex particle model for modeling the complicated rotor wake vorticity transportation and diffusion. Instead of solving the viscous vorticity equations through numerical discretization over the flowfield grid, the vortex particle method addresses the solution through a Lagrangian formulation in which there is no artificial numerical dissipation involved. The Lagrangian approach also allows the application of the hierarchical TreeCode and the fast multipole method. These methods can dramatically improve the computational efficiency of the viscous vortex particle simulation, which enables it to be used for practical and comprehensive rotorcraft analysis.
Article
Event traces are helpful in understanding the performance behavior of parallel applications since they allow the in-depth analysis of communication and synchronization patterns. However, the absence of synchronized clocks on most cluster systems may render the analysis ineffective because inaccurate relative event timings may misrepresent the logical event order and lead to errors when quantifying the impact of certain behaviors or confuse the users of time-line visualization tools by showing messages flowing backward in time. In our earlier work, we have developed a scalable algorithm called the controlled logical clock that eliminates inconsistent inter-process timings postmortem in traces of pure MPI applications, potentially running on large processor configurations. In this paper, we first demonstrate that our algorithm also proves beneficial in computational grids, where a single application is executed using the combined computational power of several geographically dispersed clusters. Second, we present an extended version of the algorithm that—in addition to message-passing event semantics—also preserves and restores shared-memory event semantics, enabling the correction of traces from hybrid applications.
Article
Full-text available
We discuss a microscopic particle-in-cell (MicPIC) approach that allows bridging of the microscopic and macroscopic realms of laser-driven plasma physics. The simultaneous resolution of collisions and electromagnetic field propagation in MicPIC enables the investigation of processes that have been inaccessible to rigorous numerical scrutiny so far. This is illustrated by the two main findings of our analysis of pre-ionized, resonantly laser-driven clusters, which can be realized experimentally in pump–probe experiments. In the linear response regime, MicPIC data are used to extract the individual microscopic contributions to the dielectric cluster response function, such as surface and bulk collision frequencies. We demonstrate that the competition between surface collisions and radiation damping is responsible for the maximum in the size-dependent lifetime of the Mie surface plasmon. The capacity to determine the microscopic underpinning of optical material parameters opens new avenues for modeling nano-plasmonics and nano-photonics systems. In the non-perturbative regime, we analyze the formation and evolution of recollision-induced plasma waves in laser-driven clusters. The resulting dynamics of the electron density and local field hot spots opens a new research direction for the field of attosecond science.
Conference Paper
This paper provides a performance and programmability comparison of high-level parallel programming support in Haskell, F# and Scala. Developing several parallel versions, we employ skeleton-based, semi-explicit and explicit approaches to parallelism. We focus on advanced language features for separating computational and coordination aspects of the code and tuning performance. We also assess the impact of functional purity and multi-paradigm design of the languages on program development and performance. Basis for these comparisons are several Barnes-Hut implementations of the n-body problem in all three languages, on both Linux and Windows. Our performance measurements on state-of-the-art multi-cores achieve a speedup up to 5.62 (on 8 cores) with a highly-tuned Haskell version. For comparable implementations in Scala and F# we achieve speedups of 4.51 (on 8 cores) and 2.28 (on 4 cores), respectively. We observe that near best speedups are achieved using the highest level abstraction in these languages.
Article
We shed light on the industrial structure of the economic system in Japan by combining visualization techniques and community analysis. The production network consisting of sub-million nodes (firms) and three million links (transactions) is visualized taking advantage of MD simulation techniques. Also, communities inherent in such a large-scale network are extracted through maximization of the modularity using both greedy (bottom-up) and bisection (top-down) algorithms; the bisection method works better. It is shown that nodes belonging to the same community are located close to each other in the visualization (three-dimensional) space.
Article
Sporadically, relativistic energies have been considered already in the foregoing chapters. As the laser flux density in the near infrared and visible long wavelength regime exceeds I ≃ 10^18 W cm^-2, the electron quiver energy assumes relativistic values. A brief presentation of basic relativity may be useful here.
Article
The acceleration and transport of energetic particles produced by high intensity laser interaction with solid targets is studied using a recently developed plasma simulation technique. Based on a parallel tree algorithm, this method provides a powerful, mesh-free approach to numerical plasma modelling, permitting 'whole target' investigations without the need for artificial particle and field boundaries. Moreover, it also offers a natural means of treating three-dimensional, collisional transport effects hitherto neglected or suppressed in conventional explicit particle-in-cell simulation. Multi-million particle simulations of this challenging interaction regime using the code PEPC (Pretty Efficient Parallel Coulomb-solver: http://www.fz-juelich.de/zam/pepc) have been performed on the JUMP and BlueGene/L computers for various open-boundary geometries. These simulations highlight the importance of target resistivity and surface effects on the fast electron current flow.
Article
A parallel tree code for rapid computation of long-range Coulomb forces based on the Warren-Salmon 'Hashed Oct-Tree' algorithm is described. Communication overhead is minimised by bundling multipole data for large groups of particles prior to shipment. Implementations on the Cray T3E and the IBM p690 cluster show the expected O(N log N) scaling with particle number, as well as good scaling properties with the number of processors.