
HONPAS is an ab initio electronic structure program for linear-scaling, or O(N), first-principles calculations of large and complex systems using standard norm-conserving pseudopotentials, numerical atomic orbital (NAO) basis sets, and periodic boundary conditions. HONPAS is developed in the framework of the SIESTA methodology and focuses on the development and implementation of efficient O(N) algorithms for ab initio electronic structure calculations. The Heyd–Scuseria–Ernzerhof (HSE) screened hybrid density functional has been implemented using a NAO2GTO scheme to evaluate the electron repulsion integrals (ERIs) with NAOs. ERI screening techniques make HSE functional calculations very efficient and linear scaling. Density matrix purification algorithms have been implemented, and the PSUTC2 and SUTC2 methods have been developed to treat spin-unrestricted systems with and without a predetermined spin multiplicity, respectively. After the self-consistent field (SCF) process, additional O(N) post-SCF calculations of frontier molecular orbitals and maximally localized Wannier functions have also been developed and implemented. Finally, an O(N) method based on density matrix perturbation theory has been proposed and implemented to treat electric fields in solids. This article provides an overall introduction to the capabilities of HONPAS and the implementation details of its different O(N) algorithms. © 2014 Wiley Periodicals, Inc.


... To demonstrate the computational efficiency of our framework, we conducted a comparative analysis with DFT methodologies on GaAs for electron-phonon interaction at the HSE level and on CsV3Sb5 for superconductivity. We used the HONPAS package 22,23 to compute the HSE Hamiltonian matrices of GaAs supercells. For CsV3Sb5, we employed Quantum ESPRESSO 28,29 coupled with the electron-phonon Wannier (EPW) 2,30 approach to calculate superconducting properties. ...

... Utilizing HamGNN, we propose an efficient framework for computing the electron-phonon interaction, as shown in Fig. 1b. As the first step, we generate a training dataset of ab initio tight-binding Hamiltonian matrices using software such as OpenMX 20,21, HONPAS 22,23 or ABACUS 24,25 to train a HamGNN model. If HSE Hamiltonian matrices are used as the training set, HamGNN can predict EPCs at the HSE level. ...

... We calculated the Hamiltonian matrices for 30 GaAs supercells with dimensions of 2 × 2 × 2 using the HSE functional within the HONPAS 22,23 package. These supercells were constructed by expanding the experimental primitive cell of GaAs, and each atom was randomly perturbed by 0.02 Å. ...

The calculation of electron–phonon couplings (EPCs) is essential for understanding various fundamental physical properties, including electrical transport, optical and superconducting behaviors in materials. However, obtaining EPCs through fully first-principles methods is notably challenging, particularly for large systems or when employing advanced functionals. Here we introduce a machine learning framework to accelerate EPC calculations by utilizing atomic orbital-based Hamiltonian matrices and gradients predicted by an equivariant graph neural network. We demonstrate that our method not only yields EPC values in close agreement with first-principles results but also enhances calculation efficiency by several orders of magnitude. Application to GaAs using the Heyd–Scuseria–Ernzerhof functional reveals the necessity of advanced functionals for accurate carrier mobility predictions, while for the large Kagome crystal CsV3Sb5, our framework reproduces the experimentally observed double domes in pressure-induced superconducting phase diagrams. This machine learning framework offers a powerful and efficient tool for the investigation of diverse EPC-related phenomena in complex materials.

... The success of hybrid functionals has also prompted the development of efficient numerical techniques for reducing the computational cost and scaling of HFX calculations over the past two decades. Currently, hybrid functional calculations for periodic systems are available in a range of DFT packages with plane-wave (PW) (Marsman et al., 2008; Spencer and Alavi, 2008; Broqvist et al., 2009; Hu et al., 2017b), Gaussian-type orbital (GTO) (Guidon et al., 2008; Lee et al., 2022), and numerical atomic orbital (NAO) (Shang et al., 2011; Levchenko et al., 2015; Qin et al., 2015; Lin et al., 2020) basis sets. For PW basis sets, a low-rank approximation called the adaptively compressed exchange (ACE) operator (Lin, 2016; Hu et al., 2017a) has been proposed, resulting in significant acceleration of hybrid functional calculations. ...

... In fact, current linear-scaling electronic structure packages, such as SIESTA, CONQUEST (Torralba et al., 2008), OPENMX (Ozaki and Kino, 2005), FHI-aims (Blum et al., 2009), HONPAS (Qin et al., 2015; 2020a) and ABACUS (Li et al., 2016), prefer to adopt NAO basis sets. Compared to exponentially decaying GTOs, NAOs are strictly localized in real space, which makes them more convenient for linear-scaling calculations. ...

... We have previously proposed this scheme, called NAO2GTO (Shang et al., 2011), to take full advantage of both NAOs and GTOs. In conjunction with several integral screening techniques, HFX calculations based on the NAO2GTO scheme can be very efficient and scale linearly (Shang et al., 2011; Qin et al., 2015). In practice, however, NAOs cannot be fitted accurately with a small number (e.g., 3-6) of GTOs, which can seriously affect the accuracy and even the convergence of a hybrid functional calculation. ...
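A minimal sketch of the fitting idea, under stated assumptions: a hypothetical strictly localized radial function `toy_nao` (not an actual SIESTA or HONPAS orbital) is fitted by a linear combination of Gaussians whose exponents are fixed to an even-tempered series, and only the coefficients are solved by linear least squares. The real NAO2GTO fit may choose exponents differently. The sketch shows the two issues raised above: the fit quality depends strongly on the number of GTOs, and any finite sum of Gaussians is nonzero beyond the NAO cutoff.

```python
import numpy as np

def toy_nao(r, rc=6.0):
    """Hypothetical strictly localized radial function (zero for r >= rc)."""
    return np.where(r < rc, np.exp(-r) * (1.0 - r / rc) ** 2, 0.0)

def fit_gaussians(r, f, n_gto, alpha_min=0.05, alpha_max=20.0):
    """Least-squares fit of f(r) by sum_i c_i * exp(-alpha_i * r**2).

    Exponents are fixed to a geometric (even-tempered) series, an assumption
    for illustration; only the linear coefficients c_i are fitted."""
    alphas = np.geomspace(alpha_min, alpha_max, n_gto)
    basis = np.exp(-np.outer(r ** 2, alphas))      # (n_points, n_gto)
    coeffs, *_ = np.linalg.lstsq(basis, f, rcond=None)
    return alphas, coeffs, basis @ coeffs

r = np.linspace(0.0, 8.0, 801)
f = toy_nao(r)
for n in (3, 6, 12):
    _, _, fit = fit_gaussians(r, f, n)
    rms = np.sqrt(np.mean((fit - f) ** 2))
    tail = np.max(np.abs(fit[r > 6.0]))            # fitted GTOs leak past rc
    print(f"{n:2d} GTOs: rms error {rms:.2e}, max |fit| beyond rc {tail:.2e}")
```

The nonzero tail beyond `rc` is the kind of NAO/GTO mismatch that motivates using the fitted orbitals themselves as the numerical basis.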

The NAO2GTO scheme provides an efficient way to evaluate the electron repulsion integrals (ERIs) over numerical atomic orbitals (NAOs) with auxiliary Gaussian-type orbitals (GTOs). However, the NAO2GTO fitting can significantly affect the accuracy and convergence of hybrid functional calculations. To address this issue, here we propose to use the fitted orbitals as a new numerical basis to properly handle the mismatch between NAOs and fitted GTOs. We present an efficient and linear-scaling implementation of analytical gradients of the Hartree-Fock exchange (HFX) energy for periodic HSE06 calculations with fitted NAOs in the HONPAS package. In our implementation, the ERIs and their derivatives for the HFX matrix and forces are evaluated analytically with the auxiliary GTOs, while other terms are calculated using numerically discretized GTOs. Several integral screening techniques are employed to reduce the number of required ERI derivatives. We benchmark the accuracy and efficiency of our implementation and demonstrate that our results for lattice constants, bulk moduli, and band gaps of several typical semiconductors are in good agreement with experimental values. We also show that the calculation of HFX forces based on a master-worker dynamic parallel scheme is highly efficient and scales linearly with system size. Finally, we study the geometry optimization and polaron formation due to an excess electron in rutile TiO2 by means of HSE06 calculations to further validate the applicability of our implementation.

... It has been demonstrated that linear-scaling DFT calculations with CONQUEST and CP2K can simulate extremely large systems containing millions of atoms using tens of thousands of cores [21, 23–25]. In particular, several Chinese domestic HPC DFT codes, such as HONPAS [26], DGDFT [27], ABACUS [28] and BDF [29], have been developed and maintained with independent intellectual property rights and are supported on Chinese home-grown supercomputers. For example, DGDFT can scale up to nearly ten million cores on the Sunway TaihuLight supercomputer for studying large-scale systems with tens of thousands of atoms [27]. ...

... In this section, we recall the basics of Kohn-Sham density functional theory (KS-DFT) [2,3] as implemented in PWDFT [12], HONPAS [26] and DGDFT [38], including the KS equations, the exchange-correlation functionals and the self-consistent field procedure. ...

... In this section, we first introduce three high performance DFT software packages developed by our group, including PWDFT [12], HONPAS [26] and DGDFT [38]. Then, we describe the theoretical algorithms and parallel implementations of them on modern supercomputers. ...

High performance computing (HPC) plays an essential role in enabling first-principles calculations based on Kohn–Sham density functional theory (KS-DFT) for investigating the quantum structural and electronic properties of large-scale molecules and solids in condensed matter physics, quantum chemistry and materials science. This review focuses on recent advances in HPC software development for large-scale KS-DFT calculations of systems containing tens of thousands of atoms on modern heterogeneous supercomputers, especially HPC software with independent intellectual property rights supported on Chinese domestic exascale supercomputers. We first introduce three types of DFT software developed on modern heterogeneous supercomputers, namely PWDFT (Plane-Wave Density Functional Theory), HONPAS (Hefei Order-N Packages for Ab initio Simulations) and DGDFT (Discontinuous Galerkin Density Functional Theory), which are based on three different types of basis sets (plane waves, numerical atomic orbitals and adaptive local basis functions, respectively). Then, we describe the theoretical algorithms and parallel implementations of these three software packages on modern heterogeneous supercomputers in detail. Finally, we conclude this review and propose several promising research directions for future large-scale KS-DFT calculations on exascale supercomputers.

... In particular, when numerical atomic orbitals (NAOs) are used as basis functions to expand the Kohn-Sham (KS) orbitals, this issue becomes more serious, since time-consuming numerical integration over NAOs also results in a large prefactor. 6 Although strictly localized NAOs have been widely adopted in modern linear-scaling density functional theory (DFT) codes with excellent capabilities for local or semilocal DFT calculations, such as SIESTA, 7 CONQUEST, 8 OPENMX, 9 FHI-aims, 10 HONPAS, 11 and ABACUS, 12 hybrid functional calculations with NAOs are rarely available. Several approximate approaches, such as the NAO2GTO scheme 13 and the local resolution-of-the-identity (RI) approximation, 14,15 have been developed to reduce both the scaling and the prefactor of NAO-based HFX calculations; however, either their accuracy or their efficiency is unsatisfactory. ...

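... where $\omega = 0.11$ Bohr$^{-1}$ in HSE06 3 is a screening parameter defining the range separation, $E_X^{\mathrm{SR,HF}}$ denotes the short-range nonlocal Hartree-Fock exchange (HFX) energy, and $E_X^{\mathrm{SR,PBE}}(\omega)$ and $E_C^{\mathrm{PBE}}$ are the semilocal short-range exchange and full-range correlation energies of Perdew−Burke−Ernzerhof (PBE). 46 Within NAOs, the short-range HFX energy for a closed molecular system or a periodic system with Γ-point-only sampling of the Brillouin zone (BZ) is given by

$$E_X^{\mathrm{SR,HF}} = -\frac{1}{4}\sum_{\mu\nu\lambda\sigma} D_{\mu\lambda}\, D_{\nu\sigma}\, (\mu\nu|\lambda\sigma)^{\mathrm{SR}} \qquad (10)$$

and the corresponding short-range HFX Hamiltonian matrix is expressed as

$$K_{\mu\nu}^{\mathrm{SR}} = -\frac{1}{2}\sum_{\lambda\sigma} D_{\lambda\sigma}\, (\mu\lambda|\nu\sigma)^{\mathrm{SR}} \qquad (11)$$

where the indices μ, ν, λ, and σ run over the $N_b$ NAOs $\{\phi(\mathbf{r})\}$, $D_{\mu\nu}$ is the two-electron density matrix, and $(\mu\nu|\lambda\sigma)^{\mathrm{SR}}$ is the short-range ERI tensor defined as

$$(\mu\nu|\lambda\sigma)^{\mathrm{SR}} = \iint \phi_\mu(\mathbf{r})\,\phi_\nu(\mathbf{r})\,\frac{\operatorname{erfc}(\omega|\mathbf{r}-\mathbf{r}'|)}{|\mathbf{r}-\mathbf{r}'|}\,\phi_\lambda(\mathbf{r}')\,\phi_\sigma(\mathbf{r}')\,d\mathbf{r}\,d\mathbf{r}'$$

...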
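As a quick numerical illustration of the range separation, the Coulomb kernel splits exactly as $1/r = \operatorname{erfc}(\omega r)/r + \operatorname{erf}(\omega r)/r$. The sketch below (the function name `coulomb_split` is ours) evaluates both parts with $\omega = 0.11$ bohr$^{-1}$. The short-range part, which is the kernel of $(\mu\nu|\lambda\sigma)^{\mathrm{SR}}$, becomes negligible beyond a few tens of bohr, which is what makes distance-based screening of the short-range ERIs effective.

```python
import math

OMEGA = 0.11  # HSE06 screening parameter in bohr^-1, as quoted in the text

def coulomb_split(r, omega=OMEGA):
    """Split 1/r (r > 0, in bohr) into short-range and long-range parts."""
    short_range = math.erfc(omega * r) / r
    long_range = math.erf(omega * r) / r
    return short_range, long_range

for r in (1.0, 10.0, 30.0, 50.0):
    sr, lr = coulomb_split(r)
    print(f"r = {r:4.0f} bohr: SR = {sr:.3e}, LR = {lr:.3e}, SR fraction = {sr * r:.3e}")
```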

... In this section, we demonstrate the computational accuracy and efficiency of the K-means-based ISDF approach for hybrid functional (HSE06) calculations. We implement this approach in the HONPAS package, 11 which is currently developed based on the SIESTA code. 7 For all our calculations, the NAOs are generated using the default parameters in SIESTA, and the norm-conserving PBE pseudopotentials of the Troullier-Martins type 47 are used to represent the core−valence interaction. ...

The interpolative separable density fitting (ISDF) is an efficient and accurate low-rank decomposition method to reduce the high computational cost and memory usage of the Hartree-Fock exchange (HFX) calculations with numerical atomic orbitals (NAOs). In this work, we present a machine learning K-means clustering algorithm to select the interpolation points in ISDF, which offers a much cheaper alternative to the expensive QR factorization with column pivoting (QRCP) procedure. We implement this K-means-based ISDF decomposition to accelerate hybrid functional calculations with NAOs in the HONPAS package. We demonstrate that this method can yield a similar accuracy for both molecules and solids at a much lower computational cost. In particular, K-means can remarkably reduce the computational cost of selecting the interpolation points by nearly two orders of magnitude compared to QRCP, resulting in a speedup of ∼10 times for ISDF-based HFX calculations.
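A minimal sketch of K-means-based selection of interpolation points, assuming a weighted Lloyd iteration on the real-space grid with a density-like weight and snapping each centroid to its nearest grid point. The function name, the weight choice and the two-blob toy "density" are illustrative assumptions, not the HONPAS implementation. Each iteration costs O(n_grid × n_mu), which is the reason K-means is much cheaper than QRCP on the full pair-product matrix.

```python
import numpy as np

def kmeans_interpolation_points(grid, weights, n_mu, n_iter=50, seed=0):
    """Weighted K-means (Lloyd iterations) over real-space grid points.

    grid:    (n_grid, 3) Cartesian coordinates
    weights: (n_grid,) positive weights, e.g. a density-like quantity
    Returns the indices of the grid points nearest to the final centroids,
    used here as ISDF interpolation points."""
    rng = np.random.default_rng(seed)
    # seed the centroids by sampling grid points with probability ~ weight
    start = rng.choice(len(grid), size=n_mu, replace=False, p=weights / weights.sum())
    centroids = grid[start].astype(float)
    for _ in range(n_iter):
        d2 = ((grid[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)              # assign each point to a cluster
        new = centroids.copy()
        for k in range(n_mu):
            mask = labels == k
            wk = weights[mask]
            if wk.sum() > 0.0:                  # empty clusters keep their centroid
                new[k] = (grid[mask] * wk[:, None]).sum(axis=0) / wk.sum()
        if np.allclose(new, centroids):
            break
        centroids = new
    # snap every centroid to its nearest grid point
    d2 = ((grid[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return np.unique(d2.argmin(axis=0))

# toy example: 12^3 grid in a 10 x 10 x 10 bohr box with two density "blobs"
n = 12
ax = np.linspace(0.0, 10.0, n, endpoint=False)
grid = np.stack(np.meshgrid(ax, ax, ax, indexing="ij"), axis=-1).reshape(-1, 3)
weights = (np.exp(-0.5 * ((grid - 3.0) ** 2).sum(axis=1))
           + np.exp(-0.5 * ((grid - 7.0) ** 2).sum(axis=1)))
idx = kmeans_interpolation_points(grid, weights, n_mu=20)
print(len(idx), "interpolation points selected out of", len(grid))
```

The selected points concentrate where the weight is large, which is the property that lets a small number of interpolation points represent the orbital pair products.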

... Nowadays, with the rapid development of modern heterogeneous supercomputers, high-performance computing (HPC) has become a powerful tool for accelerating DFT calculations of large-scale systems. Several highly efficient DFT software packages based on low-scaling methods have been developed, such as SIESTA (Soler et al., 2002), OPENMX (Ozaki and Kino, 2005), CP2K (Kühne et al., 2020), CONQUEST (Gillan et al., 2007), PROFESS (Ho et al., 2008), FREEON (Challacombe, 2014), ONETEP (Skylaris et al., 2005), BigDFT (Genovese et al., 2008; Mohr et al., 2014), FHI-aims (Blum et al., 2009), ABACUS (Chen et al., 2010, 2011), HONPAS (Qin et al., 2015), and DGDFT (Lin et al., 2012; Hu et al., 2015a,b; Banerjee et al., 2016; Zhang et al., 2017). These codes can take full advantage of the massive parallelism available on HPC architectures, benefiting from the local data communication of the sparse Hamiltonian matrices generated with local basis sets. In linear-scaling DFT calculations, the key HPC kernel is parallel sparse matrix-matrix multiplication. ...

... In this work, we present a parallel implementation of the linear-scaling density matrix second-order trace-correcting purification (TC2) algorithm (Niklasson, 2002) to solve the KS equations with NAOs in the HONPAS package (Qin et al., 2015). We propose to use the MPI_Allgather function to handle such sparse matrix multiplication in the CSR format, which scales linearly up to hundreds of processing cores on modern heterogeneous supercomputers. ...

... The KMG requires initial approximate Wannier functions and prior knowledge of the chemical potential. In the HONPAS-SIESTA package (Qin et al., 2015), we implement density matrix purification algorithms, including the trace-preserving canonical purification scheme of PM (Palser and Manolopoulos, 1998; Daniels and Scuseria, 1999), trace-correcting purification (TC) (Niklasson, 2002), and trace-resetting density matrix purification (TRS) (Niklasson et al., 2003). ...

Linear-scaling density functional theory (DFT) is an efficient method for describing the electronic structures of molecules, semiconductors, and insulators that avoids the high cubic-scaling cost of conventional DFT calculations. Here, we present a parallel implementation of the linear-scaling density matrix trace-correcting (TC) purification algorithm to solve the Kohn–Sham (KS) equations with numerical atomic orbitals in the HONPAS package. Such a linear-scaling density matrix purification algorithm is based on Kohn's nearsightedness principle, which results in a sparse Hamiltonian matrix with localized basis sets in DFT calculations. Therefore, sparse matrix multiplication is the most time-consuming step in the density matrix purification algorithm for linear-scaling DFT calculations. We propose to use the MPI_Allgather function for parallel programming to handle the sparse matrix multiplication in the compressed sparse row (CSR) format, which can scale up to hundreds of processing cores on modern heterogeneous supercomputers. We demonstrate the computational accuracy and efficiency of this parallel density matrix purification algorithm by performing large-scale DFT calculations on boron nitride nanotubes containing tens of thousands of atoms.
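A serial sketch of TC2 purification with SciPy CSR matrices, under stated assumptions: an orthogonal basis (for NAOs the overlap matrix would have to be handled first), Gershgorin estimates of the spectral bounds, and a periodic two-site tight-binding chain as a gapped toy Hamiltonian. The names `gershgorin_bounds` and `tc2_purify` are hypothetical, and the MPI_Allgather distribution described above is not reproduced. Dropping tiny elements after every multiplication is the ingredient that keeps the matrices sparse and, for gapped systems, allows linear scaling.

```python
import numpy as np
import scipy.sparse as sp

def gershgorin_bounds(H):
    """Cheap lower/upper bounds on the spectrum of a symmetric sparse matrix."""
    d = H.diagonal()
    radius = np.asarray(abs(H).sum(axis=1)).ravel() - np.abs(d)
    return float((d - radius).min()), float((d + radius).max())

def tc2_purify(H, n_occ, tol=1e-8, max_iter=100, drop=1e-12):
    """Second-order trace-correcting (TC2) purification on CSR matrices.

    Returns a sparse approximation of the density matrix, i.e. the projector
    onto the n_occ lowest eigenstates of H (orthogonal basis assumed)."""
    H = sp.csr_matrix(H)
    e_min, e_max = gershgorin_bounds(H)
    n = H.shape[0]
    # map the spectrum into [0, 1] with occupied states closest to 1
    X = ((e_max * sp.identity(n, format="csr") - H) / (e_max - e_min)).tocsr()
    for _ in range(max_iter):
        X2 = X @ X
        tr_x, tr_x2 = X.diagonal().sum(), X2.diagonal().sum()
        if abs(tr_x - tr_x2) < tol:        # tr(X - X^2) -> 0 once X is idempotent
            break
        # X^2 lowers the trace, 2X - X^2 raises it: steer the trace toward n_occ
        X = (X2 if tr_x > n_occ else 2 * X - X2).tocsr()
        X.data[np.abs(X.data) < drop] = 0.0  # drop tiny elements to keep sparsity
        X.eliminate_zeros()
    return X

# toy model: periodic chain with alternating on-site energies (+1/-1) and
# nearest-neighbour hopping -1, which has a band gap of 2
n = 200
H = sp.lil_matrix((n, n))
for i in range(n):
    H[i, i] = 1.0 if i % 2 == 0 else -1.0
    j = (i + 1) % n
    H[i, j] = H[j, i] = -1.0
P = tc2_purify(H.tocsr(), n_occ=n // 2)
print("trace:", P.diagonal().sum(), "nonzeros:", P.nnz)
```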

... Specifically, the computational expense of processing a single snapshot for 2 × 2 × 2 or 3 × 3 × 4 systems in N$^2$AMD is reduced by four orders of magnitude, as shown in Figure 3e. It should be noted that HONPAS, which we used for comparison with our ML frameworks, employs the NAO2GTO scheme to compute electron repulsion integrals and their derivatives, making it substantially faster than other widely used DFT codes [51]. However, even with this optimization, a NAMD simulation requires at least thousands of such single-point calculations, rendering the use of hybrid functionals in DFT-NAMD impractical due to the high computational cost. ...

... DFT calculations of TiO$_2$ and GaAs are performed with the HONPAS code [51], which implements NAO basis sets and norm-conserving pseudopotentials (NCPP). The valence electron configuration is 3s$^2$3p$^6$3d$^2$4s$^2$ for Ti atoms, 2s$^2$2p$^4$ for O atoms, 4s$^2$4p$^1$ for Ga atoms and 4s$^2$4p$^3$ for As atoms. ...

Non-adiabatic molecular dynamics (NAMD) simulations have become an indispensable tool for investigating excited-state dynamics in solids. In this work, we propose a general framework, N$^2$AMD, which employs an E(3)-equivariant deep neural Hamiltonian to boost the accuracy and efficiency of NAMD simulations. The preservation of the Euclidean symmetry of the Hamiltonian enables N$^2$AMD to achieve state-of-the-art performance. Distinct from conventional machine learning methods that predict key quantities in NAMD, N$^2$AMD computes these quantities directly with a deep neural Hamiltonian, ensuring high accuracy, efficiency, and consistency. Furthermore, N$^2$AMD demonstrates excellent generalizability and enables seamless integration with advanced NAMD techniques and infrastructures. Taking several extensively investigated semiconductors as prototypical systems, we successfully simulate carrier recombination in both pristine and defective systems at large scales, where conventional NAMD often significantly underestimates or even qualitatively mispredicts lifetimes. This framework not only boosts the efficiency and precision of NAMD simulations but also opens new avenues to advance materials research.

... In both algorithms, the calculation of ERIs is independent of each other, so the communication time is minimized. We show our speedup results to demonstrate the performance of these static parallel distributed algorithms in the Hefei Order-N packages for ab initio simulations (HONPAS) Qin et al. (2014). ...

... In order to take advantage of both types of atomic orbitals, we have proposed a new scheme called NAO2GTO (Shang et al., 2011), in which GTOs can be used for the analytical computation of ERIs in a straightforward and efficient way, while NAOs can be employed to set the strict cutoff for atomic orbitals. After employing several ERI screening techniques, the construction of the HFX matrix can be very efficient and scale linearly (Shang et al., 2011; Qin et al., 2014). ...
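To illustrate why GTOs permit analytical ERIs, the sketch below evaluates the simplest case, a primitive $(ss|ss)$ integral, using the Gaussian product theorem and the zeroth-order Boys function $F_0$. It also evaluates the erf-attenuated long-range version, so the short-range (erfc) integral used in HSE follows as their difference. Only unnormalized s-type primitives are covered; production codes use recurrence relations for higher angular momentum, and all function names here are ours.

```python
import math

def boys_f0(t):
    """Zeroth-order Boys function F0(t) = integral_0^1 exp(-t u^2) du."""
    if t < 1e-12:
        return 1.0 - t / 3.0                  # Taylor expansion near t = 0
    return 0.5 * math.sqrt(math.pi / t) * math.erf(math.sqrt(t))

def _pair(a, A, b, B):
    """Gaussian product theorem for two unnormalized s-type primitives."""
    p = a + b
    P = [(a * Ai + b * Bi) / p for Ai, Bi in zip(A, B)]
    ab2 = sum((Ai - Bi) ** 2 for Ai, Bi in zip(A, B))
    return p, P, math.exp(-a * b / p * ab2)

def eri_ssss(a, A, b, B, c, C, d, D, omega=None):
    """(ab|cd) over s-type primitive GTOs with the full 1/r operator, or with
    the long-range operator erf(omega r)/r when omega is given."""
    p, P, k_ab = _pair(a, A, b, B)
    q, Q, k_cd = _pair(c, C, d, D)
    rho = p * q / (p + q)
    pq2 = sum((Pi - Qi) ** 2 for Pi, Qi in zip(P, Q))
    pref = 2.0 * math.pi ** 2.5 / (p * q * math.sqrt(p + q)) * k_ab * k_cd
    if omega is None:
        return pref * boys_f0(rho * pq2)
    scale = omega ** 2 / (omega ** 2 + rho)   # attenuated effective exponent
    return pref * math.sqrt(scale) * boys_f0(rho * scale * pq2)

def eri_ssss_sr(a, A, b, B, c, C, d, D, omega=0.11):
    """Short-range part via erfc(omega r)/r = 1/r - erf(omega r)/r."""
    return (eri_ssss(a, A, b, B, c, C, d, D)
            - eri_ssss(a, A, b, B, c, C, d, D, omega=omega))

O = (0.0, 0.0, 0.0)
for dist in (0.0, 5.0, 20.0):
    C = (dist, 0.0, 0.0)
    full = eri_ssss(1.0, O, 1.0, O, 1.0, C, 1.0, C)
    sr = eri_ssss_sr(1.0, O, 1.0, O, 1.0, C, 1.0, C)
    print(f"|PQ| = {dist:4.1f}: full = {full:.6f}, short-range = {sr:.3e}")
```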

Hybrid density-functional calculation is one of the most commonly adopted electronic structure theories in computational chemistry and materials science because of its balance between accuracy and computational cost. Recently, we have developed a novel scheme called NAO2GTO to achieve linear scaling (Order-N) calculations for hybrid density-functionals. In our scheme, the most time-consuming step is the calculation of the electron repulsion integrals (ERIs), so creating an even distribution of these ERIs in a parallel implementation is of particular importance. Here, we present two static scalable distributed algorithms for the ERIs computation. Firstly, the ERIs are distributed over ERIs shell pairs. Secondly, the ERIs are distributed over ERIs shell quartets. In both algorithms, the ERIs are computed independently of each other, so the communication time is minimized. We show our speedup results to demonstrate the performance of these static parallel distributed algorithms in the Hefei Order-N packages for ab initio simulations (HONPAS).
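A toy sketch of the two static distributions, assuming unit cost per unique quartet, no screening and 8-fold permutational symmetry (only quartets with kl ≤ ij are computed). Everything here is hypothetical bookkeeping, not HONPAS code. Distributing bra pairs leaves the per-rank quartet counts uneven, because later pairs own more quartets, while dealing out quartets round-robin balances the counts to within one. With screening and mixed angular momenta the per-quartet cost varies, so real imbalances can be larger than in this toy.

```python
def shell_pairs(n_shells):
    """All unique shell pairs (i, j) with i >= j (no screening in this toy)."""
    return [(i, j) for i in range(n_shells) for j in range(i + 1)]

def distribute_by_pairs(pairs, n_ranks):
    """Static scheme 1: rank r owns every bra pair whose index % n_ranks == r
    and computes all unique quartets (ij|kl) with kl <= ij for that pair."""
    load = [0] * n_ranks
    for ij in range(len(pairs)):
        load[ij % n_ranks] += ij + 1        # number of ket pairs kl <= ij
    return load

def distribute_by_quartets(pairs, n_ranks):
    """Static scheme 2: deal the flattened list of unique quartets out
    round-robin, so every rank gets the same count to within one."""
    load = [0] * n_ranks
    q = 0
    for ij in range(len(pairs)):
        for _ in range(ij + 1):
            load[q % n_ranks] += 1
            q += 1
    return load

pairs = shell_pairs(20)
print("pair-based load:   ", distribute_by_pairs(pairs, 16))
print("quartet-based load:", distribute_by_quartets(pairs, 16))
```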

... For example, Sodt et al. proposed the atomic resolution-of-identity for exchange (ARI-K) by employing local fitting domains, 27 Merlot et al. introduced a simpler pair-atomic resolution-of-identity (PARI-K) approximation, 28 and Neese et al. developed a chain-of-spheres exchange (COSX) method by combining seminumerical integration with RI. 29,30 Furthermore, for larger basis sets, the occupied-orbital RI-K (occ-RI-K) method with an iterative framework has been proposed, 31 which suggests that the adaptively compressed exchange (ACE) operator developed by Lin can also be applied. 32 Compared to analytical GTOs, strictly localized numerical atomic orbitals (NAOs) are more convenient and flexible for linear-scaling DFT calculations and have been widely adopted in linear-scaling DFT codes, such as SIESTA, 33 CONQUEST, 34 OPENMX, 35 FHI-aims, 36 HONPAS, 37 and ABACUS. 38 However, hybrid functional calculations with NAOs are rarely available, since multicenter NAO integrals are difficult to evaluate. ...

... To overcome this problem, our group has proposed an NAO2GTO scheme to approximately evaluate ERIs using auxiliary GTOs to represent NAOs in the HONPAS package. 37,44 After using integral screening techniques, the HFX calculation is found to be efficient and linear scaling. Furthermore, Ren et al. have successfully extended the RI approach to NAOs in the FHI-aims code. ...
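The Schwarz inequality $|(\mu\nu|\lambda\sigma)| \le \sqrt{(\mu\nu|\mu\nu)}\,\sqrt{(\lambda\sigma|\lambda\sigma)}$ is one standard integral screening criterion (whether HONPAS uses exactly this test is an assumption here). The toy sketch places one s-type Gaussian per atom on a 1D chain, with a hypothetical exponent, spacing and threshold, and counts the shell pairs whose Schwarz factor survives the threshold. The count grows linearly with the number of atoms instead of quadratically, which is the first ingredient of linear-scaling HFX.

```python
import math

def schwarz_factor(a, r_ab):
    """sqrt((ab|ab)) for two unnormalized s-type Gaussians with the same
    exponent a separated by r_ab (closed form via Gaussian product theorem)."""
    p = 2.0 * a
    eri = 2.0 * math.pi ** 2.5 / (p * p * math.sqrt(2.0 * p)) * math.exp(-a * r_ab ** 2)
    return math.sqrt(eri)

def significant_pairs(n_atoms, spacing=2.0, a=0.5, thresh=1e-6):
    """Count pairs (i >= j) on a 1D chain with Schwarz factor above thresh.
    A quartet (ij|kl) is negligible whenever Q_ij * Q_kl falls below the
    screening tolerance, so pairs failing this test never enter quartets."""
    count = 0
    for i in range(n_atoms):
        for j in range(i + 1):
            if schwarz_factor(a, spacing * (i - j)) > thresh:
                count += 1
    return count

for n in (100, 200, 400):
    print(n, "atoms:", significant_pairs(n), "significant of", n * (n + 1) // 2, "pairs")
```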

The high cost associated with the evaluation of Hartree-Fock exchange (HFX) makes hybrid functionals computationally challenging for large systems. In this work, we present an efficient way to accelerate HFX calculations with numerical atomic basis sets. Our approach is based on the recently proposed interpolative separable density fitting (ISDF) decomposition to construct a low-rank approximation of the HFX matrix, which avoids explicit calculation of the electron repulsion integrals (ERIs) and significantly reduces the computational cost. We implement the ISDF method for hybrid functional (PBE0) calculations in the HONPAS package. We take benzene and polycyclic aromatic hydrocarbon molecules as examples and demonstrate that hybrid functional calculations with ISDF yield quite promising results at a significantly reduced computational cost. In particular, the ISDF approach reduces the total cost of evaluating the HFX matrix by nearly two orders of magnitude compared to the conventional approach of directly evaluating the ERIs.
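A minimal sketch of the ISDF decomposition $\phi_i(\mathbf{r})\phi_j(\mathbf{r}) \approx \sum_\mu \zeta_\mu(\mathbf{r})\,\phi_i(\mathbf{r}_\mu)\phi_j(\mathbf{r}_\mu)$ on a 1D toy grid, assuming interpolation points chosen by QR with column pivoting (the costly step that K-means can replace) and interpolation vectors obtained by least squares. The four Gaussian "orbitals" are hypothetical: their 16 pair products span only 7 independent functions, so 7 interpolation points reproduce them essentially exactly, while 3 points do not.

```python
import numpy as np
from scipy.linalg import qr

def isdf(phi, n_mu):
    """ISDF of orbital pair products on a real-space grid.

    phi: (n_grid, n_orb) orbital values. Returns (points, zeta, Z) where
    Z[r, (i, j)] = phi_i(r) phi_j(r) is approximated by zeta @ Z[points].
    Interpolation points are chosen by QR with column pivoting (QRCP)."""
    n_grid, n_orb = phi.shape
    Z = np.einsum("ri,rj->rij", phi, phi).reshape(n_grid, n_orb * n_orb)
    _, _, piv = qr(Z.T, mode="economic", pivoting=True)  # pivots are grid indices
    points = np.sort(piv[:n_mu])
    C = Z[points]                                         # (n_mu, n_pairs)
    # interpolation vectors from the least-squares problem zeta @ C ~= Z
    zeta = np.linalg.lstsq(C.T, Z.T, rcond=None)[0].T     # (n_grid, n_mu)
    return points, zeta, Z

x = np.linspace(-10.0, 10.0, 201)
phi = np.stack([np.exp(-(x - c) ** 2) for c in (-3.0, -1.0, 1.0, 3.0)], axis=1)
for n_mu in (3, 7):
    pts, zeta, Z = isdf(phi, n_mu)
    err = np.linalg.norm(zeta @ Z[pts] - Z) / np.linalg.norm(Z)
    print(f"n_mu = {n_mu}: relative error {err:.1e}")
```

Once the $\zeta_\mu$ are known, four-index ERIs reduce to integrals between interpolation vectors, which is where the savings come from.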

... We have proposed a mixed scheme called NAO2GTO [29] to take advantage of both types of atomic orbitals. In the NAO2GTO method, the strict cutoff of the atomic orbitals is provided by the NAOs, and each NAO is then fitted with several GTOs so that the ERIs can be calculated analytically. After employing several ERI screening techniques, the construction of the HFX matrix can be very efficient and scale linearly [29,31]. ...

... In this work, a new dynamic parallel distribution algorithm based on the NAO2GTO scheme [29] has been proposed and implemented in the Order-N HONPAS code [31]. In our approach, the ERI calculations are perfectly load-balanced and can scale to very large numbers of cores thanks to the 2-level master-worker distribution of shell pairs. ...

This work presents a dynamic parallel distribution scheme for Hartree–Fock exchange (HFX) calculations based on the real-space NAO2GTO framework. The most time-consuming step, the electron repulsion integral (ERI) calculation, is perfectly load-balanced with a 2-level master–worker dynamic parallel scheme; the density matrix and the HFX matrix are both stored in sparse format; and the network communication time is minimized by communicating only the indices of the batched ERIs and the final sparse HFX matrix. The performance of this dynamic scalable distributed algorithm has been demonstrated by several examples of large-scale hybrid density-functional calculations on the Tianhe-2 supercomputer, including both molecular and solid-state systems of different dimensionalities, and shows good scalability.
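A toy simulation of why dynamic dispatch balances better than a fixed assignment, assuming hypothetical batch costs and a single-level master that always hands the next batch to the first idle worker (the real scheme is 2-level and communicates batched ERI indices, which is not modeled). Because each worker stays busy from time zero until its last batch finishes, its busy time equals its finish time, so the maximum over workers is the makespan.

```python
import heapq

def dynamic_schedule(costs, n_workers):
    """Simulate master-worker dispatch: the master hands each batch to
    whichever worker becomes idle first. Returns per-worker busy time."""
    idle_at = [(0.0, w) for w in range(n_workers)]   # (time worker is free, id)
    heapq.heapify(idle_at)
    busy = [0.0] * n_workers
    for c in costs:
        t, w = heapq.heappop(idle_at)
        busy[w] += c
        heapq.heappush(idle_at, (t + c, w))
    return busy

def static_schedule(costs, n_workers):
    """Round-robin assignment fixed before the calculation starts."""
    busy = [0.0] * n_workers
    for i, c in enumerate(costs):
        busy[i % n_workers] += c
    return busy

# hypothetical batch costs: every 8th batch of shell pairs is 10x more expensive
costs = [10.0 if i % 8 == 0 else 1.0 for i in range(64)]
print("static  makespan:", max(static_schedule(costs, 8)))
print("dynamic makespan:", max(dynamic_schedule(costs, 8)))
```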

... In order to take advantage of both types of atomic orbitals, we have proposed a new scheme called NAO2GTO (Shang et al., 2011), in which GTOs can be used for the analytical computation of ERIs in a straightforward and efficient way, while NAOs can be employed to set the strict cutoff for atomic orbitals. After employing several ERI screening techniques, the construction of the HFX matrix can be very efficient and scale linearly (Qin et al., 2014; Shang et al., 2011). ...

... Previously, for codes using localized atomic orbitals, the parallelization of ERIs has mainly been implemented for finite, isolated systems (Alexeev et al., 2002; Chow et al., 2015; Liu et al., 2014; Schmidt et al., 1993), and only a few reports exist on the treatment of periodic boundary conditions with such basis sets (Bush et al., 2011; Guidon et al., 2008), in which Order-N screening for the ERI calculations has not been considered. The purpose of this work is to present static parallel distribution algorithms for the NAO2GTO scheme (Shang et al., 2011) with Order-N performance in the Hefei Order-N packages for ab initio simulations code (Qin et al., 2014). In our approaches, the ERI calculations are not replicated but distributed over CPU cores; as a result, both the memory and the CPU requirements of the ERI calculation are parallelized. ...

Hybrid density-functional calculation is one of the most commonly adopted electronic structure theories in computational chemistry and materials science because of its balance between accuracy and computational cost. Recently, we have developed a novel scheme called NAO2GTO to achieve linear scaling (Order-N) calculations for hybrid density-functionals. In our scheme, the most time-consuming step is the calculation of the electron repulsion integrals (ERIs) part, so creating an even distribution of these ERIs in parallel implementation is an issue of particular importance. Here, we present two static scalable distributed algorithms for the ERIs computation. Firstly, the ERIs are distributed over ERIs shell pairs. Secondly, the ERIs are distributed over ERIs shell quartets. In both algorithms, the calculation of ERIs is independent of each other, so the communication time is minimized. We show our speedup results to demonstrate the performance of these static parallel distributed algorithms in the Hefei Order-N packages for ab initio simulations.

... The optimization of the lattice geometry and electronic structure was performed with the generalized gradient approximation of Perdew, Burke, and Ernzerhof (GGA-PBE) 58,59. Recognizing that GGA-PBE significantly underestimates band gaps, part of the electronic structure calculations was also carried out using the hybrid Heyd-Scuseria-Ernzerhof functional (HSE06) 60 with the HONPAS package 61,62. The electron-ion interactions were described by Troullier-Martins norm-conserving pseudopotentials 63,64 in the factorized Kleinman-Bylander form 65. ...

The boron nitride (BN) analogue of 8-16-4 graphyne, termed SBNyne, is proposed for the first time. Its physical properties were explored using first-principles calculations and classical molecular dynamics (MD) simulations. Thermal stability assessments reveal that SBNyne maintains structural integrity up to 1000 K. We found that SBNyne exhibits a wide indirect bandgap of 4.58 eV using HSE06 and 3.20 eV using PBE. It displays strong optical absorption in the ultraviolet region while remaining transparent in the infrared and visible regions. Additionally, SBNyne exhibits significantly lower thermal conductivity compared to h-BN. Phonon spectrum analysis indicates that out-of-plane phonons predominantly contribute to the vibrational density of states only at very low frequencies, explaining its low thermal conductivity. These findings expand the knowledge of BN-based 2D materials and open new avenues for their design and advanced technological applications.

... For the exchange-correlation functional, the Perdew-Burke-Ernzerhof (PBE) form of the generalized gradient approximation (GGA) was used 55. To determine band gaps more accurately, we used the Heyd-Scuseria-Ernzerhof (HSE06) hybrid functional with the HONPAS package 56,57. The double-zeta polarized (DZP) basis set was used, with the orbital confining cut-off set at 0.01 Ry. ...

In this study, the effects of interlayer interaction and biaxial strain on the electronic structure, phonon dispersion and optical properties of monolayer and bilayer BAs are studied using first-principles calculations within the framework of density functional theory. The interlayer coupling in bilayer BAs causes splitting of the out-of-plane acoustic (ZA) and optical (ZO) modes. For both structures, positive phonon frequencies across the Brillouin zone are observed under biaxial tensile strain from 0 to 8%, which indicates their dynamical stability under tensile strain. Also, the phonon band gap between the longitudinal acoustic (LA) and longitudinal optical (LO)/transverse optical (TO) modes of monolayer and bilayer BAs decreases under tensile strain. An appreciable degree of optical anisotropy is noticeable in the materials for parallel and perpendicular polarizations, accompanied by significant absorption in the ultraviolet and visible regions. The absorption edge of bilayer BAs lies at a lower energy than that of monolayer BAs. The results demonstrate that the phonon dispersion and optoelectronic properties of BAs sheets can be tuned by both interlayer interaction and biaxial strain, which is promising for optoelectronic and thermoelectric applications.

... For methodologies such as Hartree-Fock (HF) [6–14] and Density Functional Theory (DFT) [15–24], both of the above-mentioned challenges have been extensively addressed [25,26]. Both have been implemented in highly parallel codes that scale very well with large amounts of computational resources, as work can be distributed over different nodes and/or GPUs using distributed Fock builds. ...

We present here a massively parallel implementation of the recently developed CPS(D-3) excitation energy model, which is based on Cluster Perturbation Theory. The new algorithm extends the one developed in [P. Baudin, et al, J. Chem. Phys., 150.13, 134110 (2019)] to leverage multiple nodes and to utilize graphics processing units to accelerate heavy tensor contractions. Furthermore, we show that the extended algorithm scales efficiently with increasing amounts of computational resources and that the developed code enables CPS(D-3) excitation energy calculations on large molecular systems with a low time-to-solution. More specifically, calculations on systems with over 100 atoms and 1000 basis functions are possible in a few hours of wall-clock time. This establishes CPS(D-3) excitation energies as a computationally efficient alternative to those obtained from the Coupled Cluster Singles and Doubles model.

... However, KSSOLV 2.0 is not designed for performing large-scale electronic structure calculations. For these types of calculations, many existing software tools can be used alternatively, such as Gaussian [22], NWChem (NorthWest computational Chemistry) [23], Q-CHEM [24], BDF (Beijing Density Functional program package) [25], and PySCF (Python-based Simulations of Chemistry Framework) [26] within Gaussian-type orbital (GTO) basis set; SIESTA (Spanish Initiative for Electronic Simulations with Thousands of Atoms) [27], HONPAS (Hefei Order-N Packages for Ab initio Simulations) [28,29,30], FHI-aims (Fritz Haber Institute ab initio molecular simulations) [31] and ABACUS (Atomic-orbital Based Ab-initio Computation at Ustc) [32] within numerical atomic orbital (NAO) basis set; and VASP (Vienna Ab initio Simulation Package) [33], ABINIT [34], QE (QUANTUM ESPRESSO) [35], PWmat [36], PWDFT (Plane-Wave Density Functional Theory) [37] within plane-wave basis set. These DFT codes are often written in languages such as FORTRAN and C++, and parallelized with OpenMP, MPI, and CUDA. ...

KSSOLV (Kohn-Sham Solver) is a MATLAB toolbox for performing Kohn-Sham density functional theory (DFT) calculations with a plane-wave basis set. KSSOLV 2.0 preserves the design features of the original KSSOLV software to allow users and developers to easily set up a problem and perform ground-state calculations as well as to prototype and test new algorithms. Furthermore, it includes new functionalities such as new iterative diagonalization algorithms, k-point sampling for electron band structures, geometry optimization and advanced algorithms for performing DFT calculations with local, semi-local, and hybrid exchange-correlation functionals. It can be used to study the electronic structures of both molecules and solids. We describe these new capabilities in this work through a few use cases. We also demonstrate the numerical accuracy and computational efficiency of KSSOLV on a variety of examples.
Program summary
Program title: Kohn-Sham Solver 2.0 (KSSOLV 2.0)
CPC Library link to program files: https://doi.org/10.17632/pp8vgvfcv4.1
Developer's repository link: https://bitbucket.org/berkeleylab/kssolv2.0/src/release/
Licensing provisions: BSD 3-clause
Programming language: MATLAB
Nature of problem: KSSOLV 2.0 is used to perform Kohn-Sham density functional theory based electronic structure calculations to study chemical and material properties of molecules and solids. The key problem to be solved is a constrained energy minimization problem, which can also be formulated as a nonlinear eigenvalue problem.
Solution method: KSSOLV 2.0 implements both the self-consistent field (SCF) iteration with a variety of acceleration strategies and a direct constrained minimization algorithm. It is written entirely in MATLAB and uses MATLAB's object-oriented programming features to make it easy to use and modify.
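The SCF iteration named in the solution method can be illustrated on a toy problem. The sketch below is not KSSOLV code (KSSOLV is written in MATLAB); it assumes a hypothetical 1D model Hamiltonian H[rho] = T + V_ext + alpha * diag(rho) on a finite-difference grid, and it uses plain linear density mixing as a stand-in for KSSOLV's acceleration strategies. The function name `toy_scf` and all parameter values are illustrative only.

```python
import numpy as np

def toy_scf(n_grid=40, n_occ=4, alpha=0.3, beta=0.3, tol=1e-10, max_iter=500):
    """Toy SCF loop: solve H[rho] psi = eps psi until rho stops changing.

    Hypothetical model, not taken from KSSOLV:
    H[rho] = T + V_ext + alpha * diag(rho) on a 1D grid with spacing 1.
    """
    x = np.arange(n_grid) - (n_grid - 1) / 2.0
    # Kinetic operator -1/2 d^2/dx^2 from the three-point finite-difference stencil.
    T = np.eye(n_grid) - 0.5 * (np.eye(n_grid, k=1) + np.eye(n_grid, k=-1))
    v_ext = 0.05 * x**2  # assumed harmonic confinement

    rho = np.full(n_grid, n_occ / n_grid)  # uniform starting density
    for iteration in range(1, max_iter + 1):
        H = T + np.diag(v_ext + alpha * rho)
        _, psi = np.linalg.eigh(H)  # eigenvalues in ascending order
        rho_out = np.sum(psi[:, :n_occ] ** 2, axis=1)  # fill the lowest n_occ orbitals
        residual = np.linalg.norm(rho_out - rho)
        if residual < tol:
            return rho_out, iteration, residual
        rho = (1.0 - beta) * rho + beta * rho_out  # simple linear mixing
    raise RuntimeError("SCF did not converge within max_iter iterations")

rho, n_iter, res = toy_scf()
print(f"converged in {n_iter} iterations, residual {res:.1e}, electrons {rho.sum():.6f}")
```

Linear mixing is the simplest choice that damps oscillations between input and output densities; Anderson or Pulay-type mixing would normally converge in fewer iterations.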

... Linear-scaling DFT (Goedecker, 1999) is an efficient way to obtain the structural and electronic properties of molecules, semiconductors, and insulators while avoiding the cubic-scaling cost of conventional DFT calculations. Luo et al. described an efficient parallel implementation of the linear-scaling density matrix trace-correcting purification algorithm (Niklasson, 2002) for solving the Kohn-Sham equations with numerical atomic orbitals in the HONPAS package (Qin et al., 2015). The authors performed large-scale DFT calculations on boron nitride nanotubes containing tens of thousands of atoms, with parallel scaling up to hundreds of processing cores on modern heterogeneous supercomputers. ...

... In recent decades, the rapid development of modern heterogeneous supercomputers has made high performance computing (HPC) a powerful tool for accelerating Kohn-Sham density functional theory (KS-DFT) calculations of large-scale systems. In particular, several efficient HPC KS-DFT software packages for ground-state electronic structure calculations with small localized basis sets have been developed, such as SIESTA [7], CP2K [8], CONQUEST [9], FHI-aims [10], BigDFT [11], HONPAS [12][13][14], and DGDFT [15][16][17], which take advantage of the massive parallelism of modern heterogeneous supercomputers. For example, large-scale KS-DFT calculations on systems containing tens of thousands of atoms have been performed with CP2K [8], CONQUEST [9], and DGDFT [16,17], scaling up to hundreds of thousands of cores on the Cray, Edison, Cori, and Sunway TaihuLight supercomputers. ...

High performance computing is a powerful tool for accelerating Kohn–Sham density functional theory calculations on modern heterogeneous supercomputers. Here, we describe a massively parallel implementation of large-scale linear-response time-dependent density functional theory (LR-TDDFT) for calculating the excitation energies and wave functions of solids with a plane-wave basis set. We adopt a two-level parallelization strategy that combines the Message Passing Interface (MPI) with Open Multi-Processing (OpenMP) to handle the matrix operations and data communication involved in constructing and diagonalizing the LR-TDDFT Hamiltonian matrix. Numerical results show that the LR-TDDFT calculations scale up to 24,576 processing cores on modern heterogeneous supercomputers when studying the excited-state properties of bulk silicon systems containing thousands of atoms (4,096 atoms). We further demonstrate that LR-TDDFT calculations can be used to investigate the photoinduced charge separation of water molecules adsorbed on the rutile TiO2 (110) surface from an excitonic perspective.
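The core numerical task described above, constructing and diagonalizing an LR-TDDFT response matrix, can be sketched densely and serially for a tiny model. This is an assumption-laden toy: it uses the Tamm-Dancoff approximation (the abstract does not say whether the full Casida problem or the TDA is solved), takes the coupling kernel as an input instead of computing Hartree and exchange-correlation integrals, and ignores the MPI/OpenMP distribution entirely. The function name `tda_excitations` and the orbital energies are hypothetical.

```python
import numpy as np

def tda_excitations(eps_occ, eps_vir, kernel):
    """Excitation energies from a Tamm-Dancoff (TDA) response matrix.

    A[(i,a),(j,b)] = delta_ij * delta_ab * (eps_a - eps_i) + kernel[(i,a),(j,b)]
    The coupling `kernel` is supplied by the caller; in LR-TDDFT it would hold
    Hartree and exchange-correlation integrals (not computed here).
    """
    eps_occ, eps_vir = np.asarray(eps_occ), np.asarray(eps_vir)
    # Orbital-energy differences, flattened so that pair (i, a) -> i * n_vir + a.
    delta = (eps_vir[None, :] - eps_occ[:, None]).ravel()
    A = np.diag(delta) + kernel
    omega, vecs = np.linalg.eigh(A)  # symmetric A -> real excitation energies
    return omega, vecs.reshape(len(eps_occ), len(eps_vir), -1)

eps_occ = [-0.60, -0.45]
eps_vir = [0.10, 0.25, 0.40]
n_pair = len(eps_occ) * len(eps_vir)
rng = np.random.default_rng(0)
v = rng.normal(size=n_pair)
kernel = 0.05 * np.outer(v, v)  # hypothetical positive semidefinite coupling
omega, amplitudes = tda_excitations(eps_occ, eps_vir, kernel)
print(np.round(omega, 4))
```

With a zero kernel the excitation energies reduce to the bare orbital-energy differences; a positive semidefinite coupling can only shift each of them upward. The matrix dimension grows as n_occ * n_vir, which is why large systems need the distributed construction and diagonalization described in the abstract.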

... Similarly, to keep the exponential factor smaller than unity and avoid numerical divergence, we inserted the VBM/CBM value into the exponential factor when calculating the MP2 correction to the VBM/CBM. The canonical and Laplace-transformed MP2 methods described above have been implemented in the order-N HONPAS code (Qin et al., 2014). ...

We present an implementation of the canonical and Laplace-transformed formulation of the second-order Møller–Plesset perturbation theory under periodic boundary conditions using numerical atomic orbitals. To validate our approach, we show that our results of the Laplace-transformed MP2 correlation correction for the total energy and the band gap are in excellent agreement with the results of the canonical MP2 formulation. We have calculated the binding energy curve for the stacked trans-polyacetylene at the Hartree–Fock + MP2 level as a preliminary application.
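The Laplace-transformed formulation rests on the identity 1/Δ = ∫_0^∞ e^(-Δt) dt for Δ > 0, which turns the MP2 energy denominator Δ = ε_a + ε_b - ε_i - ε_j into a product of per-orbital exponentials at each quadrature point. The minimal sketch below compares the canonical and Laplace-transformed energies for a tiny model, assuming placeholder random values for the (ia|jb) integrals and a simple exponential-substitution quadrature (not necessarily the grid HONPAS uses). Shifting the exponent by a reference energy mu inside the gap, in the spirit of the VBM/CBM shift described above, keeps every exponential factor at or below 1. All function names are hypothetical.

```python
import numpy as np

def laplace_grid(u_min=-25.0, u_max=5.0, h=0.25):
    """Points t_k and weights w_k such that 1/x ~= sum_k w_k * exp(-x * t_k).

    Substitutes t = exp(u) in 1/x = int_0^inf exp(-x t) dt and applies an
    equally spaced rule in u. Intended for x of order 0.1 to 10 (atomic units).
    """
    u = np.arange(u_min, u_max + 0.5 * h, h)
    t = np.exp(u)
    w = h * t  # dt = exp(u) du; endpoint contributions are negligible
    return t, w

def _numerator(g):
    # Closed-shell form (ia|jb) * [2 (ia|jb) - (ib|ja)] with g[i, a, j, b] = (ia|jb).
    return g * (2.0 * g - g.transpose(0, 3, 2, 1))

def mp2_energy_canonical(g, eps_occ, eps_vir):
    eo, ev = np.asarray(eps_occ), np.asarray(eps_vir)
    denom = (ev[None, :, None, None] + ev[None, None, None, :]
             - eo[:, None, None, None] - eo[None, None, :, None])
    return -np.sum(_numerator(g) / denom)

def mp2_energy_laplace(g, eps_occ, eps_vir, mu):
    eo, ev = np.asarray(eps_occ), np.asarray(eps_vir)
    num = _numerator(g)
    energy = 0.0
    for t_k, w_k in zip(*laplace_grid()):
        # Splitting the exponent at a reference mu inside the gap keeps every
        # factor <= 1: exp(-(mu - eps_i) t) and exp(-(eps_a - mu) t).
        occ = np.exp(-(mu - eo) * t_k)
        vir = np.exp(-(ev - mu) * t_k)
        energy -= w_k * np.einsum("iajb,i,a,j,b->", num, occ, vir, occ, vir)
    return energy

eps_occ = np.array([-1.0, -0.7, -0.5])
eps_vir = np.array([0.3, 0.6, 1.2])
rng = np.random.default_rng(1)
g = 0.1 * rng.normal(size=(3, 3, 3, 3))  # placeholder (ia|jb) values, not real integrals
mu = 0.5 * (eps_occ.max() + eps_vir.min())  # reference energy inside the gap
e_can = mp2_energy_canonical(g, eps_occ, eps_vir)
e_lap = mp2_energy_laplace(g, eps_occ, eps_vir, mu)
print(e_can, e_lap)
```

Because each quadrature point factorizes over i, a, j, b, the Laplace form avoids building the four-index denominator; that factorization is what low-scaling MP2 schemes exploit.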

... [47][48][49][50][51][52] For hybrid density functionals (HDFs), which require computing two-electron Coulomb repulsion integrals (ERIs), NAO-based implementations have appeared only recently. Existing implementations avoid a direct evaluation of ERIs over NAOs in one way or another: by expanding the NAOs in terms of GTOs, 53,54 or by employing the resolution-of-the-identity (RI) technique [55][56][57][58] (as well as its refined variant, the interpolative separable density fitting scheme 59,60). Within the RI approximation 61-63 (also known as variational density fitting 64-66), the four-index ERIs are decomposed into three- and two-index ones, which significantly reduces both the storage requirement and the computational cost. ...

We present an efficient, linear-scaling implementation for building the (screened) Hartree-Fock exchange (HFX) matrix for periodic systems within the framework of numerical atomic orbital (NAO) basis functions. Our implementation is based on the localized resolution of the identity approximation, by which two-electron Coulomb repulsion integrals can be obtained by computing only two-center quantities, a feature that is highly beneficial for NAOs. By exploiting the locality of the basis functions and efficiently prescreening the intermediate three- and two-index tensors, one can achieve linear scaling of the computational cost for building the HFX matrix with respect to system size. Our implementation is massively parallel, thanks to an MPI/OpenMP hybrid parallelization strategy for distributing the computational load and memory storage. Together, these factors enable highly efficient hybrid functional calculations for large-scale periodic systems. In this work we describe the key algorithms and implementation details of the HFX build as implemented in the ABACUS code package. The performance and scalability of our implementation with respect to system size and the number of CPU cores are demonstrated for selected benchmark systems with up to 4096 atoms.
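The RI decomposition mentioned above, (pq|rs) ≈ Σ_PQ (pq|P) [V^-1]_PQ (Q|rs) with the two-index Coulomb metric V_PQ = (P|Q), can be checked numerically on a toy model. This sketch is not the ABACUS localized-RI scheme: it assumes a 1D grid, a hypothetical exponential kernel as a positive-definite stand-in for 1/|r - r'|, Gaussian auxiliary functions, and no restriction of the auxiliary set to the atoms of each pair. The function name `ri_integrals` is illustrative.

```python
import numpy as np

def ri_integrals(pairs, aux, kernel, dx):
    """Exact and RI-approximated (pq|rs) on a 1D grid.

    pairs:  (n_grid, n_pair) pair densities rho_pq(r)
    aux:    (n_grid, n_aux) auxiliary functions chi_P(r)
    kernel: (n_grid, n_grid) symmetric positive-definite stand-in for 1/|r - r'|
    """
    coulomb = kernel * dx**2               # include quadrature weights
    exact = pairs.T @ coulomb @ pairs      # four-index (pq|rs)
    three = pairs.T @ coulomb @ aux        # three-index (pq|P)
    metric = aux.T @ coulomb @ aux         # two-index (P|Q)
    # (pq|rs) ~= sum_PQ (pq|P) [metric^-1]_PQ (Q|rs); solve instead of inverting.
    approx = three @ np.linalg.solve(metric, three.T)
    return exact, approx

x = np.linspace(-5.0, 5.0, 201)
dx = x[1] - x[0]
kernel = np.exp(-np.abs(x[:, None] - x[None, :]))  # assumed model interaction
centers = np.linspace(-3.0, 3.0, 6)
aux = np.exp(-((x[:, None] - centers[None, :]) / 0.7) ** 2)  # 6 Gaussian auxiliaries

rng = np.random.default_rng(2)
in_span = aux @ rng.normal(size=(6, 4))  # pair densities the auxiliary set can represent
generic = np.exp(-((x[:, None] - rng.uniform(-3, 3, 4)[None, :]) / 0.3) ** 2)

exact1, approx1 = ri_integrals(in_span, aux, kernel, dx)
exact2, approx2 = ri_integrals(generic, aux, kernel, dx)
print(np.max(np.abs(exact1 - approx1)), np.diag(exact2 - approx2))
```

Two properties show up: RI is exact when the pair densities lie in the span of the auxiliary functions, and with the Coulomb metric the diagonal integrals (pq|pq) are never overestimated, since the error equals the Coulomb self-energy of the fitting residual.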

... These low-scaling methods rely principally on the nearsightedness principle in molecules and semiconductors, and they have been widely implemented with small localized real-space basis sets, such as Gaussian-type orbitals [8] and numerical atomic orbitals [4], which yield a sparse Hamiltonian in real space. Based on these low-scaling methods, several highly efficient KS-DFT materials simulation packages have been developed, such as SIESTA [9], CP2K [10], CONQUEST [11], and HONPAS [12], which can take full advantage of the massive parallelism available on modern high performance computing (HPC) architectures because the sparse Hamiltonian generated in small localized basis sets requires only local data communication. However, the accuracy of these low-scaling methods depends strongly on the parameters of the localized basis sets and is difficult to improve systematically, compared with large uniform basis sets such as plane waves. ...

High performance computing (HPC) is a powerful tool for accelerating Kohn-Sham density functional theory (KS-DFT) calculations on modern heterogeneous supercomputers. Here, we describe a massively parallel implementation of the discontinuous Galerkin density functional theory (DGDFT) method on the Sunway TaihuLight supercomputer. The DGDFT method uses adaptive local basis (ALB) functions generated on the fly during the self-consistent field (SCF) iteration to solve the KS equations with a precision comparable to that of a plane-wave basis set. In particular, the DGDFT method adopts a two-level parallelization strategy that handles various data distribution, task scheduling, and data communication schemes, and combines it with the master-slave multi-thread heterogeneous parallelism of the SW26010 processor, enabling large-scale HPC KS-DFT calculations on the Sunway TaihuLight supercomputer. We show that the DGDFT method can scale up to 8,519,680 processing cores (131,072 core groups) on the Sunway TaihuLight supercomputer for studying the electronic structures of two-dimensional (2D) metallic graphene systems that contain tens of thousands of carbon atoms.

... 55 In that case, the PAO-ERIs are transformed into a set of contracted GTO-ERIs, which are then calculated analytically. 56,57 Instead, we use a route that circumvents the calculation of the ERIs and works for any smooth finite-range functions, which is particularly well suited for O(N) approaches based on the pseudopotential approximation. The key part is to perform the sum over the index lν before solving for the Coulomb potential of the pair densities; this simple re-ordering increases the efficiency of the procedure markedly. ...

We survey the underlying theory behind the large-scale and linear scaling density functional theory code CONQUEST, which shows excellent parallel scaling and can be applied to thousands of atoms with diagonalization and millions of atoms with linear scaling. We give details of the representation of the density matrix and the approach to finding the electronic ground state, and discuss the implementation of molecular dynamics with linear scaling. We give an overview of the performance of the code, focusing in particular on the parallel scaling, and provide examples of recent developments and applications.
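The re-ordering idea in the passage above (performing a sum over one basis index before solving for the Coulomb potential of the pair densities) works because the Coulomb operator is linear. The toy check below assumes a 1D grid, a hypothetical exponential kernel whose matrix-vector product stands in for a Poisson solve, and an exchange-type contraction K[mu, lam] = Σ_{nu,sigma} D[nu, sigma] (mu nu|lam sigma). It contracts the density matrix into the basis first, so no four-index tensor is ever formed. Which index CONQUEST actually sums first, and how it solves for the potential, are not reproduced here; both function names are hypothetical.

```python
import numpy as np

def exchange_via_eri(phi, D, G, dx):
    """K[mu, lam] = sum_{nu, sig} D[nu, sig] (mu nu|lam sig), via the full ERI tensor."""
    pair = phi[:, None, :] * phi[None, :, :]  # phi_mu(r) * phi_nu(r)
    eri = np.einsum("mnr,rs,lqs->mnlq", pair, G, pair) * dx**2
    return np.einsum("ns,mnls->ml", D, eri)

def exchange_reordered(phi, D, G, dx):
    """Same K, but contract D into the basis first, then one potential per pair."""
    n_basis = phi.shape[0]
    chi = D @ phi  # chi_nu(r) = sum_sig D[nu, sig] * phi_sig(r)
    K = np.zeros((n_basis, n_basis))
    for nu in range(n_basis):
        rho = phi * chi[nu]              # pair densities phi_lam(r) * chi_nu(r)
        V = (G @ rho.T * dx).T           # model "Poisson solve": V[lam](r)
        K += (phi * phi[nu]) @ V.T * dx  # sum_r phi_mu(r) phi_nu(r) V[lam](r)
    return K

rng = np.random.default_rng(3)
x = np.linspace(-4.0, 4.0, 60)
dx = x[1] - x[0]
phi = np.exp(-((x[None, :] - rng.uniform(-2, 2, 5)[:, None]) ** 2))  # 5 model basis functions
D = rng.normal(size=(5, 5))
D = 0.5 * (D + D.T)  # symmetric model density matrix
G = np.exp(-np.abs(x[:, None] - x[None, :]))  # stand-in Coulomb kernel
K_full = exchange_via_eri(phi, D, G, dx)
K_fast = exchange_reordered(phi, D, G, dx)
print(np.max(np.abs(K_full - K_fast)))
```

Both routes give the same matrix; the re-ordered one needs only one potential per (lam, nu) pair and never stores an object with four basis indices.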

... $f_0(F) \ldots))$ (6). Recursive polynomial expansions are implemented in a number of electronic structure codes, including CP2K [25], ERGO [23,24], HONPAS [13], and LATTE [5]. The initial polynomial $f_0$ usually maps the eigenvalues of $F$ into the interval $[0, 1]$ in reverse order. ...

The recursive polynomial expansion for construction of a density matrix approximation with rigorous error control [J. Chem. Phys. 128, 074106 (2008)] is implemented in the quantum chemistry program Ergo [SoftwareX 7, 107 (2018)] using the Chunks and Tasks matrix library [Parallel Comput. 57, 87 (2016)]. The expansion is based on second-order polynomials and accelerated by the scale-and-fold technique [J. Chem. Theory Comput. 7, 1233 (2011)]. We evaluate the performance of the implementation by computing the density matrix from the Fock matrix in the large-scale self-consistent field calculations. We demonstrate that the amount of communicated data per worker process tends to a constant with increasing system size and number of computer nodes such that the amount of work per worker process is fixed.
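The recursive polynomial expansion mentioned above can be illustrated with the trace-correcting second-order scheme (SP2) of the kind attributed to Niklasson (2002). This is a dense, serial sketch under stated assumptions: an orthonormal basis, a gap in the spectrum of F, and Gershgorin bounds for the initial linear map f_0 (codes may estimate spectral bounds differently). It does not reproduce Ergo's rigorous error control, scale-and-fold acceleration, or sparse-matrix storage. The function name `sp2_density_matrix` is hypothetical.

```python
import numpy as np

def sp2_density_matrix(F, n_occ, tol=1e-9, max_iter=100):
    """Density matrix by trace-correcting second-order spectral projection (SP2).

    Dense, serial sketch. Assumes an orthonormal basis and a gap between the
    n_occ-th and (n_occ+1)-th eigenvalues of F. Returns (D, iterations).
    """
    F = np.asarray(F, dtype=float)
    n = F.shape[0]
    # Gershgorin circles give spectral bounds without diagonalizing F.
    radii = np.sum(np.abs(F), axis=1) - np.abs(np.diag(F))
    e_min = np.min(np.diag(F) - radii)
    e_max = np.max(np.diag(F) + radii)
    # f_0: maps the spectrum of F into [0, 1] in reverse order (low energies -> near 1).
    X = (e_max * np.eye(n) - F) / (e_max - e_min)
    for iteration in range(1, max_iter + 1):
        X2 = X @ X
        if np.linalg.norm(X - X2) < tol:  # X is (numerically) idempotent
            return X, iteration
        tr_x, tr_x2 = np.trace(X), np.trace(X2)
        # Pick the second-order polynomial whose trace lands closer to n_occ:
        # x -> x^2 lowers the trace, x -> 2x - x^2 raises it.
        if abs(tr_x2 - n_occ) < abs(2.0 * tr_x - tr_x2 - n_occ):
            X = X2
        else:
            X = 2.0 * X - X2
    raise RuntimeError("SP2 did not converge; check that the spectrum has a gap")

rng = np.random.default_rng(4)
eigs = np.concatenate([np.linspace(-2.0, -0.5, 5), np.linspace(0.5, 2.0, 7)])
Q, _ = np.linalg.qr(rng.normal(size=(12, 12)))
F = Q @ np.diag(eigs) @ Q.T  # model Fock matrix with 5 states below the gap
D, n_iter = sp2_density_matrix(F, n_occ=5)
print(n_iter, np.trace(D))
```

Only matrix products and traces are needed, which is why such expansions pair naturally with sparse matrix-matrix multiplication in linear-scaling codes.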

... We use the generalized gradient approximation of Perdew, Burke, and Ernzerhof (GGA-PBE) 40 exchange-correlation functional with collinear spin polarization and the double-zeta plus polarization (DZP) orbital basis set to describe the valence electrons within the framework of a linear combination of numerical atomic orbitals (LCAO). 41 Because semi-local GGA-PBE calculations are less reliable in predicting the electronic structures of ZZGNFs, screened hybrid HSE06 42 calculations implemented in HONPAS [43][44][45] (Hefei Order-N Packages for Ab Initio Simulations, based on SIESTA) are also used to compute the electronic and magnetic properties of ZZGNFs. All atomic coordinates are fully relaxed using the conjugate gradient (CG) algorithm until the energy and force convergence criteria of 10⁻⁴ eV and 0.02 eV/Å, respectively, are reached. ...

Graphene is a nonmagnetic semimetal and cannot be used directly in electronic and spintronic devices. Here, we demonstrate that zigzag graphene nanoflakes (GNFs), also known as graphene quantum dots, can exhibit strong edge magnetism and tunable energy gaps owing to localized edge states. Using large-scale first-principles density functional theory calculations and detailed analysis based on model Hamiltonians, we show that the zigzag edge states in GNFs (C$_{6n^2}$H$_{6n}$, n = 1–25) become much stronger and more localized as the system size increases. The enhanced edge states induce strong electron–electron interactions along the edges of GNFs, ultimately driving a magnetic configuration transition from nonmagnetic to intra-edge ferromagnetic and inter-edge antiferromagnetic when the diameter exceeds 4.5 nm (C480H60). Our analysis shows that the inter-edge antiferromagnetic superexchange interaction between two nearest-neighbor zigzag edges in GNFs at the nanoscale (around 10 nm) can be stabilized at room temperature and is much stronger than that between two parallel zigzag edges in graphene nanoribbons, which cannot be stabilized at ultra-low temperature (3 K). Furthermore, such strong and localized edge states also make GNFs semiconducting, with tunable energy gaps controlled mainly by the system size. Our results show that the quantum confinement effect, inter-edge superexchange (antiferromagnetic), and intra-edge direct exchange (ferromagnetic) interactions are crucial for the electronic and magnetic properties of zigzag GNFs at the nanoscale.

... In contrast to our strictly atom-centered approach, this work employs bond-centered auxiliary basis sets. Another technique is the expansion of the NAOs in terms of a Gaussian function set, which is implemented in the SIESTA code [50, 51]. While this strategy has the advantage of offering an analytic solution, each NAO must be fitted by several Gaussians, which partially undoes the benefits of the compactness of NAO basis sets. ...

A key component in calculations of exchange and correlation energies is the Coulomb operator, which requires the evaluation of two-electron integrals. For localized basis sets, these four-center integrals are most efficiently evaluated with the resolution of identity (RI) technique, which expands basis-function products in an auxiliary basis. In this work we show the practical applicability of a localized RI-variant (‘RI-LVL’), which expands products of basis functions only in the subset of those auxiliary basis functions which are located at the same atoms as the basis functions. We demonstrate the accuracy of RI-LVL for Hartree–Fock calculations, for the PBE0 hybrid density functional, as well as for RPA and MP2 perturbation theory. Molecular test sets used include the S22 set of weakly interacting molecules, the G3 test set, as well as the G2–1 and BH76 test sets, and heavy elements including titanium dioxide, copper and gold clusters. Our RI-LVL implementation paves the way for linear-scaling RI-based hybrid functional calculations for large systems and for all-electron many-body perturbation theory with significantly reduced computational and memory cost.
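The Gaussian-expansion strategy described in the passage above (fitting each NAO radial function with several Gaussians so that ERIs can be evaluated analytically) can be illustrated with a linear least-squares fit. This is a simplification, not the SIESTA/HONPAS procedure: it assumes fixed, even-tempered exponents (real schemes may also optimize exponents nonlinearly) and a hypothetical strictly localized radial function that only mimics an NAO's compact support. The function name `fit_gaussians` is illustrative.

```python
import numpy as np

def fit_gaussians(r, f, exponents):
    """Least-squares fit f(r) ~= sum_k c_k * exp(-alpha_k * r**2), alpha_k fixed."""
    basis = np.exp(-np.outer(r**2, exponents))  # (n_r, n_gauss) design matrix
    coeffs, *_ = np.linalg.lstsq(basis, f, rcond=None)
    rms = np.sqrt(np.mean((basis @ coeffs - f) ** 2))
    return coeffs, rms

r = np.linspace(0.0, 6.0, 301)
r_cut = 5.0
# Hypothetical strictly localized radial function (zero beyond r_cut), NAO-like in shape only.
f_nao = np.where(r < r_cut, (1.0 - (r / r_cut) ** 2) ** 2, 0.0)
alphas = 0.05 * 2.0 ** np.arange(6)  # even-tempered exponents (assumed)
_, rms2 = fit_gaussians(r, f_nao, alphas[:2])
_, rms6 = fit_gaussians(r, f_nao, alphas)
print(f"RMS error with 2 Gaussians: {rms2:.2e}, with 6: {rms6:.2e}")
```

The error drops as Gaussians are added, but no finite Gaussian sum reproduces the strict cutoff exactly; this is the trade-off the passage points out, since every extra Gaussian per NAO multiplies the number of primitive integrals.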

... When the electron density is computed from the expression given in (8), we use the recently developed PEXSI method [28][29][30] to compute the diagonal blocks of the density matrix. This technique avoids the diagonalization procedure, which has O(N³) complexity. ...

We describe a massively parallel implementation of the recently developed discontinuous Galerkin density functional theory (DGDFT) [J. Comput. Phys. 2012, 231, 2140] method, for efficient large-scale Kohn-Sham DFT based electronic structure calculations. The DGDFT method uses adaptive local basis (ALB) functions generated on the fly during the self-consistent field (SCF) iteration to represent the solution to the Kohn-Sham equations. The use of the ALB set provides a systematic way to improve the accuracy of the approximation. It minimizes the number of degrees of freedom required to represent the solution to the Kohn-Sham problem for a desired level of accuracy. In particular, DGDFT can reach plane-wave accuracy with far fewer degrees of freedom. By using the pole expansion and selected inversion (PEXSI) technique to compute the electron density, energy, and atomic forces, we can make the computational complexity of DGDFT scale at most quadratically with the number of electrons for both insulating and metallic systems. We show that DGDFT can achieve 80% parallel efficiency on 128,000 high performance computing cores when used to study the electronic structure of two-dimensional (2D) phosphorene systems with 3,500-14,000 atoms. This high parallel efficiency results from a two-level parallelization scheme that we describe in detail.

... Sparse matrix-matrix multiplication is used in particular in polynomial expansion [23] and minimization methods [21] to compute the density matrix. Such methods are used in a number of electronic structure codes, such as Conquest [17], CP2K [32], Ergo [29], FreeON [5], Honpas [24], and Onetep [19], to achieve a computational cost that increases only linearly with system size. The matrix sparsity ranges from tens to thousands of nonzeros per row, depending on the underlying model and the basis set used. ...

We present a library for parallel block-sparse matrix-matrix multiplication
on distributed memory clusters. The library is based on the Chunks and Tasks
programming model [Parallel Comput. 40, 328 (2014)]. Acting as matrix library
developers, using this model we do not have to explicitly deal with
distribution of work and data or communication between computational nodes in a
distributed memory cluster. The mapping of work and data to physical resources
is instead handled by a Chunks and Tasks library. We evaluate our matrix
library using the Chunks and Tasks library CHT-MPI, see
www.chunks-and-tasks.org, which dynamically distributes data and work between
nodes. This allows for exploitation of a priori unknown and irregular sparsity
patterns to improve performance. On each node, available Graphics Processing
Units (GPUs) are used for matrix-matrix multiplication tasks. When all GPUs are
busy, tasks are also running on the available CPUs, thus making use of the full
computing capacity of each node.
Matrices are represented by sparse quadtrees of chunk objects. The leaves in
the hierarchy are block-sparse submatrices. Sparsity may occur at any level in
the hierarchy and/or within the submatrix leaves. Overlap between computation
and communication is achieved for both the data transfers between nodes in the
cluster and for the data transfers to/from GPUs.
We evaluate the performance of the matrix-matrix multiplication
implementations for dense and banded matrices and matrices with randomly
distributed submatrix blocks. We demonstrate the performance of the symmetric
matrix square operation, taking advantage of multiplicand and product matrices
being symmetric. We also present symmetric matrix square calculations for
matrices from linear scaling electronic structure calculations. The sparse
symmetric matrix square is a major computational challenge for such
calculations.
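The basic idea of block-sparse multiplication (store only nonzero submatrix blocks and multiply only blocks whose inner block indices match) can be sketched serially. This is not the Chunks and Tasks library or CHT-MPI: it uses a flat dictionary of dense NumPy blocks instead of a sparse quadtree, and has no work distribution, communication overlap, or GPU offload. The helper names `to_blocks`, `block_sparse_matmul`, and `from_blocks` are hypothetical.

```python
import numpy as np
from collections import defaultdict

def to_blocks(A, bs):
    """Split a dense matrix into {(I, J): block}, keeping only nonzero blocks."""
    nb_r, nb_c = -(-A.shape[0] // bs), -(-A.shape[1] // bs)  # ceiling division
    blocks = {}
    for I in range(nb_r):
        for J in range(nb_c):
            blk = A[I * bs:(I + 1) * bs, J * bs:(J + 1) * bs]
            if np.any(blk):
                blocks[(I, J)] = blk.copy()
    return blocks

def block_sparse_matmul(A_blocks, B_blocks):
    """C = A @ B, touching only pairs of stored blocks A[I, K] and B[K, J]."""
    B_by_row = defaultdict(list)  # index B's blocks by block-row K
    for (K, J), blk in B_blocks.items():
        B_by_row[K].append((J, blk))
    C = {}
    for (I, K), a in A_blocks.items():
        for J, b in B_by_row.get(K, ()):
            if (I, J) in C:
                C[(I, J)] += a @ b
            else:
                C[(I, J)] = a @ b
    return C

def from_blocks(blocks, shape, bs):
    """Assemble a dense matrix from a block dictionary (missing blocks are zero)."""
    A = np.zeros(shape)
    for (I, J), blk in blocks.items():
        A[I * bs:I * bs + blk.shape[0], J * bs:J * bs + blk.shape[1]] = blk
    return A

rng = np.random.default_rng(5)
n, bs = 40, 8
i, j = np.indices((n, n))
A = rng.normal(size=(n, n)) * (np.abs(i - j) <= 3)  # banded test matrix
A_blocks = to_blocks(A, bs)
C_blocks = block_sparse_matmul(A_blocks, A_blocks)
C = from_blocks(C_blocks, (n, n), bs)
print(f"stored A blocks: {len(A_blocks)} of {(n // bs) ** 2}")
```

For the banded example only the block-tridiagonal part of A is stored, so the work scales with the number of nonzero block pairs instead of the full dense product.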

... Although various linear scaling O(N) methods [3][4][5] have been proposed to improve the efficiency of DFT calculations, they rely on the nearsightedness principle, which leads to exponentially localized density matrices in real space for systems with a finite energy gap or at finite temperature. On the other hand, most existing linear scaling DFT codes, such as SIESTA, 6 CONQUEST, 7 OPENMX 8 and HONPAS, 9 are based on contracted, localized real-space basis sets, such as Gaussian-type orbitals or numerical atomic orbitals. 4 It is relatively difficult to improve the accuracy of methods based on such contracted basis functions systematically, compared with methods based on conventional uniform basis sets, for example, the plane-wave basis set. 10 The disadvantage of using uniform basis sets is the relatively large number of basis functions required per atom. ...

With the help of our recently developed massively parallel DGDFT
(Discontinuous Galerkin Density Functional Theory) methodology, we perform
large-scale Kohn-Sham density functional theory calculations on phosphorene
nanoribbons with armchair edges (ACPNRs) containing a few thousands to ten
thousand atoms. The use of DGDFT allows us to systematically achieve
conventional plane wave basis set type of accuracy, but with a much smaller
number (about 15) of adaptive local basis (ALB) functions per atom for this
system. The relatively small number of degrees of freedom required to represent
the Kohn-Sham Hamiltonian, together with the use of the pole expansion and
selected inversion (PEXSI) technique that circumvents the need to diagonalize
the Hamiltonian, results in a highly efficient and scalable computational scheme
for analyzing the electronic structures of ACPNRs as well as their dynamics. The
total wall clock time for calculating the electronic structures of large-scale
ACPNRs containing 1080-10800 atoms is only 10-25 s per self-consistent field
(SCF) iteration, with accuracy fully comparable to that obtained from
conventional planewave DFT calculations. For the ACPNR system, we observe that
the DGDFT methodology can scale to 5,000-50,000 processors. We use DGDFT based
ab-initio molecular dynamics (AIMD) calculations to study the thermodynamic
stability of ACPNRs. Our calculations reveal that a 2 × 1 edge reconstruction
appears in ACPNRs at room temperature.

The recent synthesis of Goldene, a 2D atomic monolayer of gold, has opened new avenues in exploring novel materials. However, the question of when multilayer Goldene transitions into bulk gold remains unresolved. This study used density functional theory calculations to address this fundamental question. Our findings reveal that multilayer Goldene retains an AA-like stacking configuration of up to six layers, with no observation of Bernal-like stacking as seen in graphene. Goldene spontaneously transitions to a bulk-like gold structure at seven layers, adopting a rhombohedral (ABC-like) stacking characteristic of bulk face-centered cubic (FCC) gold. The atomic arrangement converges entirely to the bulk gold lattice for more than ten layers. Quantum confinement significantly impacts the electronic properties, with monolayer and bulk Goldene exhibiting a single Dirac cone at the X-point of the Brillouin zone. In contrast, multilayer Goldene shows two Dirac cones at the X- and Y-points. Additionally, monolayer Goldene exhibits anisotropic optical absorption, which is absent in bulk gold. This study provides a deeper understanding of multilayer Goldene's structural and electronic properties and stacked 2D materials in general.

The numerical atomic orbital (NAO) basis sets offer a computationally efficient option for electronic structure calculations, as they require fewer basis functions than other types of basis sets. Moreover, their strict localization allows easy combination with current linear scaling methods, enabling efficient calculations of large physical systems. In recent years, NAO bases have become increasingly popular in modern electronic structure codes. This article reviews ab initio electronic structure calculations using NAO bases. We begin by introducing the basic formalism of NAO‐based electronic structure methods, including NAO basis set generation, self‐consistent calculations, and force and stress calculations. We then discuss some recent advances in methods based on NAO bases, such as real‐time time‐dependent density functional theory (rt‐TDDFT), efficient implementation of hybrid functionals, and other advanced electronic structure methods. Finally, we introduce the ab initio tight‐binding model, which can be generated directly after the self‐consistent calculations. The model allows efficient calculation of electronic structures and of the associated topological and optical properties of the systems.
This article is categorized under: Electronic Structure Theory > Ab Initio Electronic Structure Methods
Electronic Structure Theory > Density Functional Theory
Structure and Mechanism > Computational Materials Science

Nanosystems play an important role in many applications. Owing to their complexity, it is challenging to characterize their structure and properties accurately. An important means to this end is computational simulation grounded in ab initio electronic structure calculations. Low-scaling and accurate electronic structure algorithms have been developed in recent years; in particular, the efficiency of hybrid density functional calculations for periodic systems has been significantly improved. With electronic structure information, simulation methods can be developed to obtain data directly comparable with experiments. For example, scanning tunneling microscopy images can be effectively simulated with advanced algorithms. When the system of interest is strongly coupled to its environment, as in the Kondo effect, solving the hierarchical equations of motion is an effective means of computational characterization. Furthermore, first-principles simulation of excited-state dynamics has emerged rapidly in recent years, and nonadiabatic molecular dynamics methods play an important role. For chemical processes involving nanosystems, such as graphene growth, multiscale simulation methods are needed to characterize their atomic details. In this review, we summarize some recent progress in methodology development for the computational characterization of nanosystems. Advanced algorithms and software are essential for better understanding the nanoworld.

Graphene quantum dots (GQDs) exhibit abundant magnetic edge states with promising applications in spintronics. Hexagonal zigzag GQDs possess a ground state with antiferromagnetic (AFM) inter-edge coupling, followed by a metastable state with ferromagnetic (FM) inter-edge coupling. By analyzing the Hubbard model and performing large-scale spin-polarized density functional theory calculations on systems containing thousands of atoms, we predict a series of new mixed magnetic edge states of GQDs arising from the size effect, namely mix-n, where n is the number of spin-arrangement parts at each edge, with parallel spins within the same part and antiparallel spins between adjacent parts. In particular, we demonstrate that the mix-2 state of bare GQDs (C$_{6N^2}$) appears when N ≥ 4 and the mix-3 state appears when N ≥ 6, where N is the number of six-membered rings along each edge, whereas the mix-2 and mix-3 magnetic states appear in hydrogenated GQDs at N = 13 and N = 15, respectively.

Electronic structure methods based on quantum mechanics (QM) are widely employed to predict the molecular and optoelectronic properties of molecular materials. The computational costs of these QM methods, ranging from density functional theory (DFT) and time-dependent DFT (TDDFT) to wave-function theory (WFT), usually increase sharply with system size, causing the curse of dimensionality and hindering QM calculations for large systems such as long polymer oligomers and complex molecular aggregates. In such cases, low-scaling QM methods and machine learning (ML) techniques have been adopted in recent years to reduce the computational costs and thus assist computational and data-driven molecular material design. In this review, we illustrate low-scaling ground-state and excited-state QM approaches and their applications to long oligomers, self-assembled supramolecular complexes, stimuli-responsive materials, mechanically interlocked molecules, and excited-state processes in molecular aggregates. Variable electrostatic parameters in modified force fields with a polarization model are also introduced. On the basis of QM computational or experimental datasets, several ML algorithms, including explainable models, deep learning, and online learning methods, have been employed to predict molecular energies, forces, electronic structure properties, and optical or electrical properties of materials. Low-scaling algorithms with periodic boundary conditions are expected to become further applicable to functional materials, perhaps in combination with machine learning for fast prediction of the lattice energies, crystal structures, and spectroscopic properties of periodic functional materials.

In the past decade, developments of computational technology around density functional theory (DFT) calculations have considerably increased the system sizes which can be practically simulated. The advent of robust high performance computing algorithms which scale linearly with system size has unlocked numerous opportunities for researchers. This fact enables computational physicists and chemists to investigate systems of sizes which are comparable to systems routinely considered by experimentalists, leading to collaborations with a wide range of techniques and communities. This has important consequences for the investigation paradigms which should be applied to reduce the intrinsic complexity of quantum mechanical calculations of many thousand atoms. It becomes important to consider portions of the full system in the analysis, which have to be identified, analyzed, and employed as building‐blocks from which decomposed physico‐chemical observables can be derived. After introducing the state‐of‐the‐art in the large scale DFT community, we will illustrate the emerging research practices in this rapidly expanding field, and the knowledge gaps which need to be bridged to face the stimulating challenge of the simulation of increasingly realistic systems.
This article is categorized under: Electronic Structure Theory > Density Functional Theory
Software > Simulation Methods
Structure and Mechanism > Computational Materials Science

We present an efficient, linear-scaling implementation for building the (screened) Hartree-Fock exchange (HFX) matrix for periodic systems within the framework of numerical atomic orbital (NAO) basis functions. Our implementation is based on the localized resolution of the identity approximation, by which two-electron Coulomb repulsion integrals can be obtained by computing only two-center quantities, a feature that is highly beneficial for NAOs. By exploiting the locality of basis functions and efficient prescreening of the intermediate three- and two-index tensors, one can achieve linear scaling of the computational cost of building the HFX matrix with respect to system size. Our implementation is massively parallel, thanks to an MPI/OpenMP hybrid parallelization strategy for distributing the computational load and memory storage. Together, these factors enable highly efficient hybrid functional calculations for large-scale periodic systems. In this work, we describe the key algorithms and implementation details of the HFX build as implemented in the ABACUS code package. The performance and scalability of our implementation with respect to system size and the number of CPU cores are demonstrated for selected benchmark systems of up to 4096 atoms.

This work presents a dynamic parallel distribution scheme for Hartree-Fock exchange (HFX) calculations based on the real-space NAO2GTO framework. The most time-consuming step, the calculation of electron repulsion integrals (ERIs), is perfectly load-balanced with a two-level master-worker dynamic parallel scheme; both the density matrix and the HFX matrix are stored in sparse format; and network communication time is minimized by communicating only the indices of the batched ERIs and the final sparse HFX matrix. The performance of this dynamic, scalable distributed algorithm is demonstrated by several examples of large-scale hybrid density functional calculations on the Tianhe-2 supercomputer, including both molecular and solid-state systems of different dimensionalities, which show good scalability.

We present an implementation of hybrid density functional approximations for periodic systems within a pseudopotential-based, numerical atomic orbital (NAO) framework. The two-electron Coulomb repulsion integrals are evaluated using the localized resolution-of-identity (LRI) approximation. The accuracy of the LRI approximation is benchmarked unambiguously against independent reference results obtained by expanding the products of NAOs in terms of plane waves. An alternative strategy for constructing auxiliary basis sets is proposed, and its accuracy is assessed and compared with that of the previously used procedure. Finally, the reliability of our implementation is benchmarked against other established implementations within different numerical frameworks for the calculated band gaps of a set of semiconductors and insulators.

An atomic-orbital basis set framework is presented for carrying out velocity-gauge real-time time-dependent density functional theory (TDDFT) simulations in periodic systems employing range-separated hybrid functionals. Linear optical response obtained from real-time propagation of the time-dependent Kohn-Sham equations including nonlocal exchange is considered in prototypical solid-state materials such as bulk Si, LiF and monolayer hexagonal-BN. Additionally, core excitations in monolayer hexagonal-BN at the B and N K-edges are investigated, and the role of long-range and short-range nonlocal exchange in capturing valence and core excitonic effects is discussed. Results obtained using this time-domain atomic orbital basis set framework are shown to be consistent with equivalent frequency-domain planewave results in the literature. The developments discussed lead to a time-domain generalized Kohn-Sham TDDFT implementation for the treatment of core and valence electron dynamics and light-matter interaction in periodic solid-state systems.

Extended Lagrangian Born-Oppenheimer molecular dynamics [Phys. Rev. Lett. 100, 123004 (2008)] is formulated for general Hohenberg-Kohn density functional theory and compared to the extended Lagrangian framework of first principles molecular dynamics by Car and Parrinello [Phys. Rev. Lett. 55, 2471 (1985)]. It is shown how extended Lagrangian Born-Oppenheimer molecular dynamics overcomes several shortcomings of regular, direct Born-Oppenheimer molecular dynamics, while improving or maintaining important features of Car-Parrinello simulations. The accuracy of the electronic degrees of freedom in extended Lagrangian Born-Oppenheimer molecular dynamics, with respect to the exact Born-Oppenheimer solution, is of second order in the size of the integration time step and of fourth order in the potential energy surface. Improved stability over recent formulations of extended Lagrangian Born-Oppenheimer molecular dynamics is achieved by generalizing the theory to finite temperature ensembles, using fractional occupation numbers in the calculation of the inner-product kernel of the extended harmonic oscillator that appears as a preconditioner in the electronic equations of motion. Materials systems that normally exhibit slow self-consistent field convergence can be simulated using integration time steps of the same order as in direct Born-Oppenheimer molecular dynamics, but without the requirement of an iterative, non-linear electronic ground state optimization prior to the force evaluations and without a systematic drift in the total energy. In combination with proposed low-rank and on-the-fly updates of the kernel, this formulation provides an efficient and general framework for quantum based Born-Oppenheimer molecular dynamics simulations.
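A scalar toy sketch can make the extended-Lagrangian idea concrete: an auxiliary density n is propagated with a Verlet step of an extended harmonic oscillator, d²n/dt² = −ω²K(q[n] − n), so each time step needs only one evaluation of the self-consistent-field map q[n] instead of an iterative ground-state optimization. Everything here is an assumption for illustration: the linear map q[n] = a(t) + J·n, the kernel K = (J − 1)⁻¹ (exact only for this linear map), the values of J, δt and κ = δt²ω², and the hypothetical function name `xlbomd_toy`. No nuclear motion, dissipation, or fractional occupation numbers are modeled.

```python
import numpy as np

def xlbomd_toy(n_steps=200, dt=0.05, kappa=1.5, jac=0.9):
    """Extended-Lagrangian propagation of a scalar auxiliary density n(t) (toy model).

    Assumed SCF map: q[n](t) = a(t) + jac * n. Its fixed point (the Born-Oppenheimer
    density) is n*(t) = a(t) / (1 - jac); a(t) is chosen so that n*(t) = sin(t).
    """
    kernel = 1.0 / (jac - 1.0)            # K = (J - 1)^(-1), exact for this linear map
    t = dt * np.arange(-1, n_steps + 1)   # includes one step before t = 0
    n_star = np.sin(t)                    # exact ground-state "density" at each time
    a = (1.0 - jac) * n_star              # drives the toy SCF map
    n = np.empty_like(t)
    n[0], n[1] = n_star[0], n_star[1]     # start on the Born-Oppenheimer surface
    for k in range(1, len(t) - 1):
        q = a[k] + jac * n[k]             # a single SCF-map evaluation per time step
        # Verlet step for d2n/dt2 = -omega^2 K (q[n] - n), with kappa = dt^2 * omega^2
        n[k + 1] = 2.0 * n[k] - n[k - 1] - kappa * kernel * (q - n[k])
    return t, n, n_star

t, n, n_star = xlbomd_toy()
max_deviation = np.max(np.abs(n - n_star))   # how closely n(t) follows n*(t)
```

With κ < 4 the Verlet recursion is stable, and n(t) stays within a small, O(δt²) distance of the moving ground state n*(t) without any inner SCF loop.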

Parameterless stopping criteria for recursive polynomial expansions to construct the density matrix in electronic structure calculations are proposed. Based on convergence order estimation the new stopping criteria automatically and accurately detect when the calculation is dominated by numerical errors and continued iteration does not improve the result. Difficulties in selecting a stopping tolerance and appropriately balancing it in relation to parameters controlling the numerical accuracy are avoided. Thus, our parameterless stopping criteria stand in contrast to the standard approach to stop as soon as some error measure goes below a user-defined parameter or tolerance. We demonstrate that the stopping criteria work well both in dense and sparse matrix calculations and in large-scale self-consistent field calculations with the quantum chemistry program Ergo (www.ergoscf.org).
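The recursive expansion itself is short. The sketch below uses the SP2 (second-order spectral projection) recursion in an orthogonal basis. As a stand-in for the paper's convergence-order-based criteria, it uses a simplified parameterless heuristic: stop once the idempotency error ‖X − X²‖_F no longer decreases over a two-step window. The heuristic, the hypothetical function name, and the test Hamiltonian are assumptions, and exact spectral bounds from `eigvalsh` are used only for brevity.

```python
import numpy as np

def sp2_density_matrix(h, n_occ, max_iter=100):
    """Zero-temperature density matrix by SP2 purification (orthogonal basis).

    Stops when the idempotency error ||X - X^2||_F no longer decreases over a
    two-step window. This is a simplified stagnation heuristic, not the paper's rule.
    """
    evals = np.linalg.eigvalsh(h)     # exact bounds for brevity; O(N) codes estimate them
    e_min, e_max = evals[0], evals[-1]
    x = (e_max * np.eye(len(h)) - h) / (e_max - e_min)   # spectrum mapped into [0, 1]
    errors = []
    for _ in range(max_iter):
        x2 = x @ x
        errors.append(np.linalg.norm(x - x2))
        if len(errors) >= 3 and errors[-1] >= errors[-3]:
            break                     # further steps only recycle rounding errors
        # pick the SP2 branch that moves the trace closer to the electron count
        if abs(np.trace(x2) - n_occ) < abs(2.0 * np.trace(x) - np.trace(x2) - n_occ):
            x = x2
        else:
            x = 2.0 * x - x2
    return x, errors

rng = np.random.default_rng(0)
q, _ = np.linalg.qr(rng.standard_normal((40, 40)))
eps = np.concatenate([np.linspace(-1.0, -0.5, 15), np.linspace(0.5, 1.0, 25)])
h = q @ np.diag(eps) @ q.T
p, errors = sp2_density_matrix(h, n_occ=15)
p_exact = q[:, :15] @ q[:, :15].T     # projector onto the 15 lowest eigenvectors
```

No tolerance appears anywhere in the loop: it ends when rounding errors dominate the idempotency error.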

In this work, methods for the efficient simulation of large systems embedded in a molecular environment are presented. These methods combine linear-scaling (LS) Kohn-Sham (KS) density functional theory (DFT) with subsystem (SS) DFT. LS DFT is efficient for large subsystems, while SS DFT is linear scaling with a smaller prefactor for large sets of small molecules. The combination of SS and LS, which is an embedding approach, can result in a ten-fold speedup over a pure LS simulation for large systems in aqueous solution. In addition to a ground-state Born-Oppenheimer SS+LS implementation, a time-dependent density functional theory based Ehrenfest molecular dynamics (EMD) using density matrix propagation is presented that allows for performing non-adiabatic dynamics. Density matrix based EMD in the SS framework is naturally linear scaling and appears suitable to study the electronic dynamics of molecules in solution. In the LS framework, linear scaling results as long as the density matrix remains sparse during time propagation. However, we generally find a less than exponential decay of the density matrix after a sufficiently long EMD run, preventing LS EMD simulations with arbitrary accuracy. The methods are tested on various systems, including spectroscopy on dyes, the electronic structure of TiO2 nanoparticles, electronic transport in carbon nanotubes, and the satellite tobacco mosaic virus in explicit solution.

Density matrix perturbation theory [Niklasson and Challacombe, Phys. Rev.
Lett. 92, 193001 (2004)] is generalized to canonical (NVT) free energy
ensembles in tight-binding, Hartree-Fock or Kohn-Sham density functional
theory. The canonical density matrix perturbation theory can be used to
calculate temperature dependent response properties from the coupled perturbed
self-consistent field equations as in density functional perturbation theory.
The method is well suited to take advantage of sparse matrix algebra to achieve
linear scaling complexity in the computational cost as a function of system
size for sufficiently large non-metallic materials and metals at high
temperatures.
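Canonical density matrix perturbation theory obtains the temperature-dependent response with recursive, sparse-matrix-friendly expansions. As a small reference for what that response is, the sketch computes the same first-order quantity by diagonalization: in the eigenbasis of H₀, δP_ij = (f_i − f_j)/(ε_i − ε_j)·(H₁ − δμ I)_ij, with f′(ε_i) on the diagonal, and δμ chosen so that Tr δP = 0 (fixed electron count, as in the canonical ensemble). This O(N³) route is not the paper's algorithm. The function names, β, and the random test matrices are assumptions, and the result is checked against a central finite difference.

```python
import numpy as np

def fermi(e, mu, beta):
    return 1.0 / (1.0 + np.exp(beta * (e - mu)))

def find_mu(evals, n_elec, beta, lo=-50.0, hi=50.0):
    """Bisection for the chemical potential that fixes the electron count."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if fermi(evals, mid, beta).sum() < n_elec:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def canonical_response(h0, h1, n_elec, beta):
    """First-order change of P = f(H0 - mu) under H0 -> H0 + lam*H1 at fixed electron count."""
    e, v = np.linalg.eigh(h0)
    mu = find_mu(e, n_elec, beta)
    f = fermi(e, mu, beta)
    fprime = -beta * f * (1.0 - f)                  # df/de
    de = e[:, None] - e[None, :]
    close = np.abs(de) < 1e-10
    # divided differences (f_i - f_j)/(e_i - e_j), replaced by f'(e_i) for (near-)equal e
    kern = np.where(close, fprime[:, None], (f[:, None] - f[None, :]) / np.where(close, 1.0, de))
    h1e = v.T @ h1 @ v                              # perturbation in the eigenbasis
    dmu = np.sum(np.diag(kern) * np.diag(h1e)) / np.sum(np.diag(kern))   # keeps Tr(dP) = 0
    dp = v @ (kern * (h1e - dmu * np.eye(len(e)))) @ v.T
    return dp, dmu

rng = np.random.default_rng(1)
a = rng.standard_normal((8, 8))
h0 = 0.5 * (a + a.T)
b = rng.standard_normal((8, 8))
h1 = 0.5 * (b + b.T)
beta, n_elec = 10.0, 3.0
dp, dmu = canonical_response(h0, h1, n_elec, beta)

def density_matrix(h):
    e, v = np.linalg.eigh(h)
    return v @ np.diag(fermi(e, find_mu(e, n_elec, beta), beta)) @ v.T

lam = 1e-5
dp_fd = (density_matrix(h0 + lam * h1) - density_matrix(h0 - lam * h1)) / (2 * lam)
```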

We discuss in this review recent progress, especially by our group, on linear scaling algorithms for electronic structure calculations with numerical atomic basis sets. The principles of the construction of numerical basis sets and the Hamiltonian are introduced first. Then we discuss how to solve the single-electron equation self-consistently, and how to obtain electronic properties via post-self-consistent-field processes in a linear scaling way. The linear response calculation with linear scaling is also introduced. Numerical implementation is emphasized, with some applications presented for demonstration purposes.

The global and local convergence properties of a class of augmented Lagrangian methods for solving nonlinear programming problems are considered. In such methods, simple bound constraints are treated separately from more general constraints and the stopping rules for the inner minimization algorithm have this in mind. Global convergence is proved, and it is established that a potentially troublesome penalty parameter is bounded away from zero.

From first principles, we investigated the adsorption of a hydrogen atom
on zigzag single-walled boron nitride nanotubes with and without radial
deformation. The adsorption is exothermic. We found that the adsorption
energy and site can be modified by the radial deformation. When the
deformation is small, H prefers to adsorb on the boron atom, which
creates an acceptor state in the gap. However, when the deformation is
large enough, H prefers to adsorb on the nitrogen atom in the high
curvature region of the radially deformed boron nitride nanotube and
creates a donor state. This adsorption and doping behavior can be
explained by the frontier-orbital theory and hydrogen level in
semiconductors.

A new approach to linear scaling construction of the density matrix is proposed, based on trace resetting purification of an effective Hamiltonian. Trace resetting is related to the trace preserving canonical purification scheme of Palser and Manolopoulos [Phys. Rev. B 58, 12704 (1999)] in that both work with a predefined occupation number and require neither adjustment nor prior knowledge of the chemical potential. In the trace resetting approach, trace conservation is not strictly enforced, allowing greater flexibility in the choice of purification polynomial and improved performance for Hamiltonian systems with high or low filling. However, optimal polynomials may in some cases admit unstable solutions, requiring a resetting mechanism to bring the solution back into the domain of convergent purification. A quartic trace resetting method is developed, along with an analysis of stability and of error accumulation due to incomplete sparse-matrix methods that employ a threshold $\tau$ to achieve sparsity. It is argued that threshold-metered purification errors in the density matrix are $O(\tau \Delta_g^{-1})$ at worst, where $\Delta_g$ is the gap at the chemical potential. In the high filling regime, purification-derived total energies are shown to converge smoothly with $\tau^2$ for RPBE/STO-6G C60 and an RPBE0/STO-3G Ti-substituted zeolite. For the zeolite, the quartic trace resetting method is found to be both faster and over an order of magnitude more accurate than the Palser-Manolopoulos method. In the low filling limit, true linear scaling is demonstrated for RHF/6-31G** water clusters, and the trace resetting method is found to be both faster and an order of magnitude more accurate than the Palser-Manolopoulos scheme. Basis set progression of RPBE chlorophyll reveals the quartic trace resetting method to be up to four orders of magnitude more accurate than the Palser-Manolopoulos algorithm in the limit of low filling. 
Furthermore, the ability of trace resetting and trace preserving algorithms to deal with degeneracy and fractional occupation is discussed.

We evaluate the performance of ab initio GW calculations for the ionization energies and highest occupied molecular orbital-lowest unoccupied molecular orbital gaps of 13 gas phase molecules of interest for organic electronic and photovoltaic applications, including the C60 fullerene, pentacene, free-base porphyrins and phthalocyanine, PTCDA, and standard monomers such as thiophene, fluorene, benzothiazole, or thiadiazole. Standard G0W0 calculations, that is, starting from eigenstates obtained with local or semilocal functionals, significantly improve the ionization energy and band gap as compared to density functional theory Kohn-Sham results, but the calculated quasiparticle values remain too small as a result of overscreening. Starting from Hartree-Fock-like eigenvalues provides much better results and is equivalent to performing self-consistency on the eigenvalues, with a resulting accuracy of 2%–4% as compared to experiment. Our calculations are based on an efficient Gaussian-basis implementation of GW with explicit treatment of the dynamical screening through contour deformation techniques.

We present first-principles calculations of optimally localized Wannier functions for Cu and use these for an ab initio determination of Hubbard (Coulomb) matrix elements. We use a standard linearized muffin-tin orbital calculation in the atomic-sphere approximation to calculate Bloch functions, and from these determine maximally localized Wannier functions using a method proposed by Marzari and Vanderbilt. The resulting functions were highly localized, with more than 89% of the norm of the function within the central site for the occupied Wannier states. Two methods for calculating Coulomb matrix elements from Wannier functions are presented and applied to fcc Cu. For the unscreened on-site Hubbard U for the Cu 3d bands, we have obtained about 25 eV. These results are also compared with results obtained from a constrained local-density approximation calculation.

Linear scaling algorithms based on Fermi operator expansions (FOE) have been considered significantly slower than other alternative approaches in evaluating the density matrix in Kohn–Sham density functional theory, despite their attractive simplicity. In this work, two new improvements to the FOE method are introduced. First, novel fast summation methods are employed to evaluate a matrix polynomial or Chebyshev matrix polynomial with matrix multiplications totalling roughly twice the square root of the degree of the polynomial. Second, six different representations of the Fermi operators are compared to assess the smallest possible degree of polynomial expansion for a given target precision. The optimal choice appears to be the complementary error function. Together, these advances make the FOE method competitive with the best existing alternatives. © 2003 American Institute of Physics.
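A dense, O(N³) sketch of a Fermi operator expansion under stated assumptions: the Fermi function is replaced by the complementary-error-function form 0.5·erfc((ε − μ)/σ), which the abstract identifies as the best choice; its Chebyshev coefficients come from NumPy's `chebinterpolate`; and the series is summed with the plain three-term recurrence. The fast summation that needs only about 2√(degree) matrix products is not implemented. The hypothetical function name, σ, the degree, and the spectral bounds passed in are illustrative.

```python
import math
import numpy as np
from numpy.polynomial import chebyshev as C

def foe_density_matrix(h, mu, sigma, e_lo, e_hi, deg=200):
    """P ~ 0.5 * erfc((H - mu) / sigma) from a Chebyshev series in H.

    [e_lo, e_hi] must enclose the spectrum of h. The series is summed with the plain
    three-term recurrence (about deg matrix products), not a fast summation scheme.
    """
    center, radius = 0.5 * (e_hi + e_lo), 0.5 * (e_hi - e_lo)
    erfc = np.vectorize(math.erfc)
    # target function of the scaled variable x in [-1, 1], with e = center + radius * x
    coef = C.chebinterpolate(lambda x: 0.5 * erfc((center + radius * x - mu) / sigma), deg)
    n = h.shape[0]
    hs = (h - center * np.eye(n)) / radius      # spectrum of hs lies in [-1, 1]
    t_prev, t_curr = np.eye(n), hs              # T_0(hs), T_1(hs)
    p = coef[0] * t_prev + coef[1] * t_curr
    for c in coef[2:]:
        t_prev, t_curr = t_curr, 2.0 * hs @ t_curr - t_prev   # T_{k+1} = 2 hs T_k - T_{k-1}
        p += c * t_curr
    return p

rng = np.random.default_rng(2)
q, _ = np.linalg.qr(rng.standard_normal((30, 30)))
evals = np.linspace(-1.0, 1.0, 30)
h = q @ np.diag(evals) @ q.T
p = foe_density_matrix(h, mu=0.05, sigma=0.1, e_lo=-1.05, e_hi=1.05)
p_ref = q @ np.diag([0.5 * math.erfc((e - 0.05) / 0.1) for e in evals]) @ q.T
```

The required degree grows as the smearing σ shrinks relative to the spectral width, which is why the choice of Fermi-operator representation matters.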

Recent developments in and around the SIESTA method of first-principles simulation of condensed matter are described and reviewed, with emphasis on (i) the applicability of the method for large and varied systems, (ii) efficient basis sets for the standards of accuracy of density-functional methods, (iii) new implementations, and (iv) extensions beyond ground-state calculations.

The density matrix divide-and-conquer technique for the solution of Kohn–Sham density functional theory has been implemented within the framework of the SIESTA methodology. Implementation details are provided where the focus is on the scaling of the computation time and memory use, in both serial and parallel versions. We demonstrate the linear-scaling capabilities of the technique by providing ground state calculations of moderately large insulating, semiconducting and (near-) metallic systems. This linear-scaling technique has made it feasible to calculate the ground state properties of quantum systems consisting of tens of thousands of atoms with relatively modest computing resources. A comparison with the existing order-N functional minimization (Kim–Mauri–Galli) method is made for the insulating and semiconducting systems.

We consider a new algorithm, an interior-reflective Newton approach, for the problem of minimizing a smooth nonlinear function of many variables, subject to upper and/or lower bounds on some of the variables. This approach generates strictly feasible iterates by using a new affine scaling transformation and following piecewise linear paths (reflection paths). The interior-reflective approach does not require identification of an “activity set”. In this paper we establish that the interior-reflective Newton approach is globally and quadratically convergent. Moreover, we develop a specific example of interior-reflective Newton methods which can be used for large-scale and sparse problems.

Microscopic mechanisms of the puzzling insulating ferromagnetic (FM) order observed in La422 were identified based on calculated parameters of the interactions between half-filled localized Wannier states (WSs). The long-omitted FM direct exchange was shown to overwhelm the antiferromagnetic (AF) superexchange. The spatial distribution of the constructed low-energy all-electron WSs provided detailed insight into the exchange processes and the partial contributions to the spin moment.

The unpolarized absorption and circular dichroism spectra of the
fundamental vibrational transitions of the chiral molecule,
4-methyl-2-oxetanone, are calculated ab initio. Harmonic force fields
are obtained using Density Functional Theory (DFT), MP2, and SCF
methodologies and a 5S4P2D/3S2P (TZ2P) basis set. DFT calculations use
the Local Spin Density Approximation (LSDA), BLYP, and Becke3LYP (B3LYP)
density functionals. Mid-IR spectra predicted using LSDA, BLYP, and
B3LYP force fields are of significantly different quality, the B3LYP
force field yielding spectra in clearly superior, and overall excellent,
agreement with experiment. The MP2 force field yields spectra in
slightly worse agreement with experiment than the B3LYP force field. The
SCF force field yields spectra in poor agreement with experiment. The
basis set dependence of B3LYP force fields is also explored: the 6-31G
and TZ2P basis sets give very similar results while the 3-21G basis set
yields spectra in substantially worse agreement with experiment.

Research on Lanczos procedures for eigenelement computations continues. Although many interesting results have been obtained, many of the theoretical questions concerning Lanczos procedures have not been satisfactorily resolved. Much of the existing literature on Lanczos procedures has not adequately incorporated the effects of roundoff errors due to the inexactness of computer arithmetic. Numerical experiments with various Lanczos procedures have, however, clearly demonstrated their advantages and capabilities.
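In a small sketch, the roundoff issue can be sidestepped by full reorthogonalization, the simplest (and most expensive) remedy: every new Lanczos vector is orthogonalized against all previous ones, and the extreme eigenvalues are read off the tridiagonal matrix. The test matrix and the hypothetical function name are assumptions. Selective or partial reorthogonalization, and Lanczos variants that deliberately skip reorthogonalization, are not shown.

```python
import numpy as np

def lanczos_ritz_values(a, m, rng):
    """m-step Lanczos with full reorthogonalization; returns the Ritz values (ascending)."""
    n = a.shape[0]
    q = np.zeros((n, m))
    alpha, beta = np.zeros(m), np.zeros(m - 1)
    v = rng.standard_normal(n)
    q[:, 0] = v / np.linalg.norm(v)
    for j in range(m):
        w = a @ q[:, j]
        alpha[j] = q[:, j] @ w
        # Gram-Schmidt against all Lanczos vectors so far, applied twice; this replaces the
        # three-term recurrence and suppresses the loss of orthogonality caused by roundoff
        for _ in range(2):
            w -= q[:, : j + 1] @ (q[:, : j + 1].T @ w)
        if j < m - 1:
            beta[j] = np.linalg.norm(w)
            q[:, j + 1] = w / beta[j]
    t = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)   # Lanczos tridiagonal matrix
    return np.linalg.eigvalsh(t)

rng = np.random.default_rng(5)
q0, _ = np.linalg.qr(rng.standard_normal((200, 200)))
spectrum = np.append(np.linspace(0.0, 1.0, 199), 2.0)   # one well-separated top eigenvalue
a = q0 @ np.diag(spectrum) @ q0.T
ritz = lanczos_ritz_values(a, m=40, rng=rng)
```

A well-separated extreme eigenvalue converges in far fewer than n steps; interior and tightly clustered eigenvalues converge much more slowly.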

Over the last decades, linear‐scaling quantum–chemical methods (QM) have become an important tool for studying large molecular systems, so that already with modest computer resources molecules with more than a thousand atoms are well in reach. The key feature of the methods is the reduction of the steep scaling of the computational effort of conventional ab initio schemes to linear while reliability and accuracy of the underlying quantum–chemical approximation is preserved in the most successful schemes. This review gives a brief overview of selected linear‐scaling approaches at the Hartree–Fock and density‐functional theory level with a particular emphasis on density matrix‐based approaches. The focus is not only on energetics, but also on the calculation of molecular properties providing an important link between theory and experiment. In addition, the usefulness of linear‐scaling QM approaches within quantum mechanical/molecular mechanical (QM/MM) hybrid schemes is briefly discussed. WIREs Comput Mol Sci 2013, 3:614–636. doi: 10.1002/wcms.1138
This article is categorized under: Electronic Structure Theory > Ab Initio Electronic Structure Methods

The implementation of the orbital minimization method (OMM) for solving the
self-consistent Kohn-Sham (KS) problem for electronic structure calculations in
a basis of non-orthogonal numerical atomic orbitals of finite-range is
reported. We explore the possibilities for using the OMM as an exact
cubic-scaling solver for the KS problem, and compare its performance with that
of explicit diagonalization in realistic systems. We analyze the efficiency of
the method depending on the choice of line search algorithm and on two free
parameters, the scale of the kinetic energy preconditioning and the
eigenspectrum shift. The results of several timing tests are then discussed,
showing that the OMM can achieve a noticeable speedup with respect to
diagonalization even for minimal basis sets for which the number of occupied
eigenstates represents a significant fraction of the total basis size (>15%).
We investigate the hard and soft parallel scaling of the method on multiple
cores, finding a performance equal to or better than diagonalization depending
on the details of the OMM implementation. Finally, we discuss the possibility
of making use of the natural sparsity of the operator matrices for this type of
basis, leading to a method that scales linearly with basis size.
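A compact sketch of the OMM under simplifying assumptions: an orthogonal basis (the paper uses non-orthogonal NAOs, which brings in the overlap matrix), no kinetic-energy preconditioning, no sparsity, and a spin-free energy. The functional E = Tr[(2I − CᵀC)CᵀHC] is minimized by Polak-Ribière conjugate gradients. Because E(C + tD) is an exact quartic in t, the line search here is done exactly. The eigenspectrum shift η is chosen above the highest eigenvalue so that H − ηI is negative definite. All function names and test values are illustrative.

```python
import numpy as np

def omm_energy_grad(c, h):
    """OMM functional E = Tr[(2I - S) C^T H C] with S = C^T C, and its gradient dE/dC."""
    s, hc = c.T @ c, c.T @ h @ c
    e = np.trace((2.0 * np.eye(len(s)) - s) @ hc)
    g = 2.0 * (2.0 * h @ c - h @ c @ s - c @ hc)
    return e, g

def quartic_line_min(c, d, h):
    """Exact line search: E(c + t d) is a quartic in t; return its global minimizer."""
    s0, s1, s2 = c.T @ c, d.T @ c + c.T @ d, d.T @ d
    h0, h1, h2 = c.T @ h @ c, d.T @ h @ c + c.T @ h @ d, d.T @ h @ d
    tr = np.trace
    poly = [-tr(s2 @ h2),                                           # t^4
            -tr(s1 @ h2) - tr(s2 @ h1),                             # t^3
            2 * tr(h2) - tr(s0 @ h2) - tr(s1 @ h1) - tr(s2 @ h0),   # t^2
            2 * tr(h1) - tr(s0 @ h1) - tr(s1 @ h0),                 # t^1
            2 * tr(h0) - tr(s0 @ h0)]                               # t^0
    roots = np.roots(np.polyder(poly))
    real = roots[np.abs(roots.imag) < 1e-8].real
    return real[np.argmin(np.polyval(poly, real))]

def omm_minimize(h, n_occ, eta, rng, tol=1e-9, max_iter=2000):
    """Conjugate-gradient OMM in an orthogonal basis; eta shifts the spectrum below zero."""
    n = h.shape[0]
    hs = h - eta * np.eye(n)
    c = rng.standard_normal((n, n_occ)) / np.sqrt(n)
    e, g = omm_energy_grad(c, hs)
    d = -g
    for _ in range(max_iter):
        c = c + quartic_line_min(c, d, hs) * d
        e, g_new = omm_energy_grad(c, hs)
        if np.linalg.norm(g_new) < tol:
            break
        beta = max(0.0, np.sum(g_new * (g_new - g)) / np.sum(g * g))   # Polak-Ribiere+
        d, g = -g_new + beta * d, g_new
    return c, e + eta * n_occ      # undo the shift: E(H) = E(H - eta I) + eta * n_occ

rng = np.random.default_rng(3)
q, _ = np.linalg.qr(rng.standard_normal((20, 20)))
eps = np.linspace(-2.0, 2.0, 20)
h = q @ np.diag(eps) @ q.T
c, energy = omm_minimize(h, n_occ=5, eta=2.5, rng=rng)
p = c @ np.linalg.solve(c.T @ c, c.T)      # projector onto the span of the OMM orbitals
p_exact = q[:, :5] @ q[:, :5].T
```

No diagonalization or explicit orthogonalization is needed: at the minimum the OMM orbitals come out orthonormal and span the occupied subspace.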

A new linear scaling method for computation of the Cartesian Gaussian-based Hartree-Fock exchange matrix is described, which employs a method numerically equivalent to standard direct SCF, and which does not enforce locality of the density matrix. With a previously described method for computing the Coulomb matrix [J. Chem. Phys. 106, 5526 (1997)], linear scaling incremental Fock builds are demonstrated for the first time. Microhartree accuracy and linear scaling are achieved for restricted Hartree-Fock calculations on sequences of water clusters and polyglycine α-helices with the 3-21G and 6-31G basis sets. Eightfold speedups are found relative to our previous method. For systems with a small ionization potential, such as graphitic sheets, the method naturally reverts to the expected quadratic behavior. Also, benchmark 3-21G calculations attaining microhartree accuracy are reported for the P53 tetramerization monomer involving 698 atoms and 3836 basis functions.

The hybrid functional method within the HSE06 scheme is tested on various oxides such as TiO2, SrTiO3, ZnO, SnO2, MgO, SiO2, and Al2O3. Since the canonical mixing parameter still underestimates the energy gap, we optimize it by fitting the energy gap to the experimental value. It is found that optimized values lie between 0.2 and 0.4 depending on the material. The structural properties are examined and the lattice parameters calculated with the HSE06 functional are in better agreement with experiment compared to (semi)local functional results. The relative shifts in the valence and conduction edges are provided, which can serve as first-order corrections to the semilocal functional results on defect levels or interfacial band offsets.

This paper deals with the ground state of an interacting electron gas in an external potential $v(\mathbf{r})$. It is proved that there exists a universal functional of the density, $F[n(\mathbf{r})]$, independent of $v(\mathbf{r})$, such that the expression $E \equiv \int v(\mathbf{r})\, n(\mathbf{r})\, d\mathbf{r} + F[n(\mathbf{r})]$ has as its minimum value the correct ground-state energy associated with $v(\mathbf{r})$. The functional $F[n(\mathbf{r})]$ is then discussed for two situations: (1) $n(\mathbf{r}) = n_0 + \tilde{n}(\mathbf{r})$, $\tilde{n}/n_0 \ll 1$, and (2) $n(\mathbf{r}) = \varphi(\mathbf{r}/r_0)$ with arbitrary $\varphi$ and $r_0 \to \infty$. In both cases $F$ can be expressed entirely in terms of the correlation energy and linear and higher order electronic polarizabilities of a uniform electron gas. This approach also sheds some light on generalized Thomas-Fermi methods and their limitations. Some new extensions of these methods are presented.

If, in an n-dimensional crystal, the structure of a simple (d=1) or complex (d>1) energy band fulfills proper symmetry conditions, the band can be spanned by a set of Wannier functions and, in many cases, the following statements can be established. (1) There exists a set of Bloch waves (d=1) or quasi Bloch waves (d>1) which are periodic and analytic functions of the complex wave vector K=K′+iK′′ in a domain of the complex K space defined by an equation of the form |K′′|<A where A is a positive constant. (2) The corresponding Wannier functions fall off exponentially at infinity.

In an n-dimensional crystal, an energy band is usually made of several branches which are connected with each other. Accordingly, the Bloch states of wave vector K which are eigenfunctions of a one-electron Hamiltonian H=-Δ+V and which belong to a given band B define a subspace S(K) of finite dimensionality. For a large class of potentials, two properties concerning the subspaces S(K) which are associated with a fixed band B have been proved for n-dimensional crystals. (1) The projection operator P(K) on S(K) can be defined for complex values of K, and its matrix elements 〈r|P(K)|r′〉 are analytic in a strip of the complex K space; this strip is centered on the real K space and is independent of r and r′. (2) The projection operator P=∫dⁿK P(K) (integration over the Brillouin zone) has matrix elements 〈r|P|r′〉 which decrease exponentially as the length |r−r′| goes to infinity.

We present an analytical study of the spatial decay rate $\gamma$ of the one-particle density matrix, $\rho(\mathbf{r},\mathbf{r}') \sim \exp(-\gamma|\mathbf{r}-\mathbf{r}'|)$, for systems described by single-particle orbitals in periodic potentials in arbitrary dimensions. This decay reflects electronic locality in condensed matter systems and is also crucial for $O(N)$ density functional methods. We find that $\gamma$ behaves contrary to the conventional wisdom that generically $\gamma \sim \sqrt{\Delta}$ in insulators and $\gamma \sim \sqrt{T}$ in metals, where $\Delta$ is the direct band gap and $T$ is the temperature. Rather, in semiconductors $\gamma \sim \Delta$, and in metals at low temperature $\gamma \sim T$.
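The distinction between γ ∼ Δ and γ ∼ √Δ can be probed numerically in a one-dimensional tight-binding model; the model is an assumption, while the analysis above is general. For a dimerized chain with hoppings 1 ± δ the gap is 4δ, so doubling δ doubles Δ. The sketch fits the decay of ρ(i, j) for two gaps and compares the rates: a ratio near 2 is consistent with γ ∼ Δ, whereas √Δ scaling would give about 1.41. The fit also absorbs the power-law prefactor of the 1D density matrix, so the ratio may come out somewhat below 2; treat it as a qualitative check. The hypothetical function name, chain length, and fit window are illustrative.

```python
import numpy as np

def chain_decay_rate(delta, n_cells=400, window=(20, 70)):
    """Fitted decay rate (per unit cell) of the density matrix of a dimerized chain.

    Hoppings alternate between 1 + delta (intra-cell) and 1 - delta (inter-cell), so the
    band gap is 4 * delta. Returns -slope of log|rho(i0, j)| versus cell separation m.
    """
    n = 2 * n_cells
    h = np.zeros((n, n))
    for i in range(n - 1):
        t = 1.0 + delta if i % 2 == 0 else 1.0 - delta
        h[i, i + 1] = h[i + 1, i] = -t
    _, v = np.linalg.eigh(h)
    occ = v[:, : n // 2]                     # half filling, zero temperature
    rho = occ @ occ.T
    i0 = 2 * (n_cells // 2)                  # first site of a cell in the middle of the chain
    m = np.arange(window[0], window[1] + 1)
    j = i0 + 2 * m + 1                       # second site of the cell m cells away
    slope = np.polyfit(m, np.log(np.abs(rho[i0, j])), 1)[0]
    return -slope

gamma_small_gap = chain_decay_rate(0.05)     # gap 0.2
gamma_large_gap = chain_decay_rate(0.10)     # gap 0.4
ratio = gamma_large_gap / gamma_small_gap    # ~2 if gamma ~ gap, ~1.41 if gamma ~ sqrt(gap)
```

Only opposite-sublattice elements are fitted, because at half filling on this bipartite chain the same-sublattice off-diagonal elements of ρ vanish.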

Although it is usually stated that the Hartree–Fock method formally scales as N4, where N is the number of basis functions employed in the calculation, it is also well known that mathematical bounds computed with the Schwarz inequality can be used to screen and eliminate four‐center two‐electron integrals smaller than a certain threshold. In this work, quantitative data is presented to illustrate the effects of this integral screening on the scaling properties of the Hartree–Fock (HF) method. Calculations are performed on a range of carbon–hydrogen model systems, two‐dimensional graphitic sheets, and three‐dimensional diamond pieces, to determine the effective scaling exponent α of the computational expense. The data obtained in this paper for calculations including over 250 carbon atoms and 1500 basis functions shows two significant trends: (1) in the asymptotic limit of large molecules, α is found to be approximately 2.2–2.3, and (2) for molecules of modest size, α is still very much less than 4. Therefore, integral screening is quantitatively shown to substantially reduce the Hartree–Fock scaling from its formal value of N4.
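The effect of Schwarz screening can be reproduced with a toy model: identical s-type primitive Gaussians on a one-dimensional chain, for which (ab|ab) has a closed form, so the bound Q_ab = (ab|ab)^{1/2} is cheap to compute. Counting the quartets with Q_ab·Q_cd ≥ τ shows that, in this idealized 1D case, the number of surviving integrals grows roughly as N² instead of N⁴. The chain geometry, exponent, threshold, and function names are assumptions; production codes screen contracted shells rather than single primitives.

```python
import numpy as np

def schwarz_factors(n_atoms, spacing=1.5, alpha=1.0):
    """sqrt((ab|ab)) for all ordered pairs of identical s-type primitive Gaussians on a chain."""
    x = spacing * np.arange(n_atoms)
    p = 2.0 * alpha                                       # combined exponent alpha + beta
    mu = alpha * alpha / p                                # reduced exponent alpha*beta/p
    k_ab = np.exp(-mu * (x[:, None] - x[None, :]) ** 2)   # Gaussian product prefactor
    # (ab|ab) = 2 pi^(5/2) / (p^2 sqrt(2p)) * K_ab^2 * F0(0), with F0(0) = 1
    pref = 2.0 * np.pi ** 2.5 / (p * p * np.sqrt(2.0 * p))
    return np.sqrt(pref) * k_ab

def count_surviving_quartets(n_atoms, tau=1e-10):
    """Number of ordered (ab|cd) quartets whose Schwarz bound Q_ab * Q_cd reaches tau."""
    q = np.sort(schwarz_factors(n_atoms).ravel())
    # smallest ket factor each bra pair needs; pairs with Q_ab = 0 can never survive
    with np.errstate(over="ignore"):
        needed = np.divide(tau, q, out=np.full_like(q, np.inf), where=q > 0)
    return int(np.sum(q.size - np.searchsorted(q, needed, side="left")))

quartets_100 = count_surviving_quartets(100)
quartets_200 = count_surviving_quartets(200)
growth = quartets_200 / quartets_100      # ~4 (N^2-like) rather than 16 (N^4)
```

Only a small fraction of the N⁴ quartets survive, and that fraction keeps shrinking as the chain grows.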

A general approach to the parallel sparse-blocked matrix–matrix multiply is developed in the context of linear scaling self-consistent-field (SCF) theory. The data-parallel message passing method uses non-blocking communication to overlap computation and communication. The space filling curve heuristic is used to achieve data locality for sparse matrix elements that decay with “separation”. Load balance is achieved by solving the bin packing problem for blocks with variable size. With this new method as the kernel, parallel performance of the simplified density matrix minimization (SDMM) for solution of the SCF equations is investigated for RHF/6-31G** water clusters and RHF/3-21G estane globules. Sustained rates above 5.7 GFLOPS for the SDMM have been achieved for (H$_2$O)$_{200}$ with 95 Origin 2000 processors. Scalability is found to be limited by load imbalance, which increases with decreasing granularity, due primarily to the inhomogeneous distribution of variable block sizes.
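The serial kernel of a sparse-blocked multiply fits in a few lines: matrices are stored as dictionaries of dense blocks of variable size, and only block pairs that share an inner index are multiplied. The optional `eps` argument drops block products whose norm bound ‖A_IK‖·‖B_KJ‖ is not above `eps`; this is a common filtering idea and is not taken from the paper. The data-parallel parts (non-blocking communication, space-filling-curve ordering, bin packing for load balance) are not shown, and all names and block sizes are illustrative.

```python
import numpy as np
from collections import defaultdict

def block_sparse_matmul(a_blocks, b_blocks, eps=0.0):
    """C = A @ B for matrices stored as {(I, J): dense block}; skips block products
    whose norm bound ||A_IK|| * ||B_KJ|| is <= eps (eps=0 keeps every nonzero product)."""
    b_by_row = defaultdict(list)                  # index B by its block row K
    for (k, j), blk in b_blocks.items():
        b_by_row[k].append((j, blk, np.linalg.norm(blk)))
    c_blocks = {}
    for (i, k), a_blk in a_blocks.items():
        a_norm = np.linalg.norm(a_blk)
        for j, b_blk, b_norm in b_by_row.get(k, ()):
            if a_norm * b_norm <= eps:
                continue
            prod = a_blk @ b_blk
            if (i, j) in c_blocks:
                c_blocks[(i, j)] += prod
            else:
                c_blocks[(i, j)] = prod
    return c_blocks

sizes = [2, 3, 1, 4, 2, 3]                        # variable block sizes
offsets = np.concatenate([[0], np.cumsum(sizes)])
rng = np.random.default_rng(4)

def random_banded(bandwidth):
    nb = len(sizes)
    return {(i, j): rng.standard_normal((sizes[i], sizes[j]))
            for i in range(nb) for j in range(nb) if abs(i - j) <= bandwidth}

def to_dense(blocks):
    m = np.zeros((offsets[-1], offsets[-1]))
    for (i, j), blk in blocks.items():
        m[offsets[i]:offsets[i + 1], offsets[j]:offsets[j + 1]] = blk
    return m

a_blk, b_blk = random_banded(1), random_banded(1)
c_blk = block_sparse_matmul(a_blk, b_blk)         # block bandwidth grows from 1 to 2
```

Uneven block sizes make the cost per block product vary, which is the source of the load imbalance the abstract reports.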

We propose a new trust region approach for minimizing a nonlinear function subject to simple bounds. Unlike most existing methods, our proposed method does not require that a quadratic programming subproblem, with inequality constraints, be solved in each iteration. Instead, a solution to a trust region subproblem is defined by minimizing a quadratic function subject only to an ellipsoidal constraint. The iterates generated are strictly feasible. Our proposed method reduces to a standard trust region approach for the unconstrained problem when there are no upper or lower bounds on the variables. Global and local quadratic convergence is established. Preliminary numerical experiments are reported indicating the practical viability of this approach.

A projector expansion method is presented for an efficient and accurate implementation of the first-principles electronic structure calculations using pseudopotentials and atomic basis functions. By expressing the rapidly varying local potential in the vicinity of nuclei by a separable projector expansion, the difficulty involved in the grid integration using the regular real-space grid is remarkably reduced without increasing the computational effort. To illustrate the capability, it is shown that the proposed method significantly suppresses not only a spurious oscillation in the energy curve for the atomic displacement involved in a weak interaction such as hydrogen bonding, but also the dependence of optimized structure on relative position to the real-space grid in the geometry optimization within a modest grid fineness.

A method is described for the computation of the generalized Wannier functions corresponding to a composite group of electromagnetic bands in a photonic crystal. The Wannier functions are optimally localized, in that they have minimum real-space spread, defined as the average over the set of the second moment of the functions. The mathematical approach follows that developed by N. Marzari and D. Vanderbilt [Phys. Rev. B 56, 12 847 (1997)] to obtain similarly well-localized Wannier functions for electrons in crystalline solids. Results are presented for the lowest few TE and TM bands in two-dimensional photonic crystals consisting of square and triangular lattices of holes and rods.

A density matrix divide‐and‐conquer method is proposed for electronic structure calculations of large molecules. It is based on partitioning the density matrix and is thus applicable to both density‐functional and Hartree–Fock methods. Compared with the original formulation in terms of the electron density, the present method is more efficient and equally accurate. © 1995 American Institute of Physics.
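
The sketch below illustrates the density-matrix partition. It assumes a one-dimensional tight-binding Hamiltonian in an orthonormal basis. Each subsystem is a core plus a buffer and is diagonalized on its own. It contributes only the rows of its local density matrix that belong to its core. A common chemical potential is then fixed by bisection so that tr P equals the electron count. The row-wise assembly with final symmetrization is a simplified stand-in for the paper's partition weights. `dc_density_matrix`, `buffer` and `kT` are hypothetical names.

```python
import numpy as np


def fermi(e, mu, kT):
    """Fermi-Dirac occupation written with tanh to avoid overflow."""
    return 0.5 * (1.0 - np.tanh((e - mu) / (2.0 * kT)))


def dc_density_matrix(H, cores, buffer, n_elec, kT=0.01):
    """Simplified divide-and-conquer density matrix (orthonormal basis, 1D ordering).

    cores: index arrays partitioning range(n); subsystem = core +- `buffer` orbitals.
    """
    n = H.shape[0]
    subsystems = []
    for core in cores:
        idx = np.arange(max(core[0] - buffer, 0), min(core[-1] + buffer + 1, n))
        e, c = np.linalg.eigh(H[np.ix_(idx, idx)])  # local diagonalization only
        subsystems.append((core, idx, e, c))

    def assemble(mu):
        P = np.zeros((n, n))
        for core, idx, e, c in subsystems:
            p_local = (c * fermi(e, mu, kT)) @ c.T
            rows = np.searchsorted(idx, core)  # core positions inside the subsystem
            P[np.ix_(core, idx)] = p_local[rows, :]
        return 0.5 * (P + P.T)  # symmetrize the row-wise assembly

    # Bisection on a common chemical potential so that tr P = n_elec
    bound = np.abs(H).sum(axis=1).max() + 1.0  # all eigenvalues lie in [-bound, bound]
    lo, hi = -bound, bound
    for _ in range(100):
        mu = 0.5 * (lo + hi)
        if np.trace(assemble(mu)) < n_elec:
            lo = mu
        else:
            hi = mu
    return assemble(0.5 * (lo + hi))


# Usage: 120-site chain with alternating on-site energies +-1 and hopping -0.5
n = 120
H = np.diag([1.0 if i % 2 == 0 else -1.0 for i in range(n)])
H += np.diag(np.full(n - 1, -0.5), 1) + np.diag(np.full(n - 1, -0.5), -1)
cores = [np.arange(i, i + 6) for i in range(0, n, 6)]
P_dc = dc_density_matrix(H, cores, buffer=16, n_elec=n // 2)
e, c = np.linalg.eigh(H)
P_exact = c[:, : n // 2] @ c[:, : n // 2].T
```

In a gapped system the density matrix decays exponentially with distance. The buffer width therefore controls the error, while the cost grows only with the number of subsystems.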

A new method for the multipole evaluation of contracted Cartesian Gaussian-based electron repulsion integrals is described, and implemented in linear scaling methods for computation of the Hartree–Fock exchange matrix. The new method, which relies on a nonempirical multipole acceptability criterion [J. Chem. Phys. 109, 8764 (1998)], renders the work associated with integral evaluation independent of the basis set contraction length. Benchmark calculations on a series of three-dimensional water molecule clusters and graphitic sheets with highly contracted basis sets indicate that the new method is up to 4.6 times faster than a well optimized direct integral evaluation routine. For calculations involving lower levels of contraction a factor of 2 speedup is typically observed. Importantly, the method achieves these large gains in computational efficiency while maintaining numerical equivalence with standard direct self consistent field theory. © 1999 American Institute of Physics.

We introduce the near‐field exchange method for calculating Hartree–Fock exchange in time scaling near‐linearly with system size. Benchmarks on polyglycine chains, water clusters, and diamond pieces show that microhartree accuracy and substantial speedups (up to 10×) over traditional calculations can be obtained for electrically insulating systems larger than 300 atoms. © 1996 American Institute of Physics.

We present a new method (LinK) to form the exact exchange matrix, as needed in Hartree–Fock and hybrid density functional theory calculations, with an effort capable of scaling only linearly with molecular size. It preserves the highly optimized structure of conventional direct self-consistent field (SCF) methods with only negligible prescreening overhead and does not impose predefined decay properties. For systems with larger band gaps, LinK outperforms conventional methods already at small system sizes. Owing to its negligible screening overhead, it is also competitive with conventional SCF schemes for both small molecules and systems with small band gaps. For the formation of an exchange-type matrix in coupled perturbed SCF theory, LinK can exhibit sublinear scaling, or more precisely, a computational effort independent of molecular size. © 1998 American Institute of Physics.

Recurrence expressions are derived for various types of molecular integrals over Cartesian Gaussian functions using the recurrence formula for three‐center overlap integrals. Several characteristics inherent in the recursive formalism allow an efficient scheme to be developed for molecular integral computations. For electron repulsion integrals and their derivatives, the present scheme is found to be superior to other currently available methods, with a significant saving of computer time. A long innermost loop in the scheme facilitates fast computation on vector processing computers.
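
The recursive idea is easiest to see in the simplest case. The sketch below covers one-dimensional two-centre overlap integrals over unnormalized Cartesian Gaussians; the paper itself starts from three-centre overlaps and extends to ERIs. The recurrence is S_{i+1,j} = X_PA S_{ij} + (i S_{i-1,j} + j S_{i,j-1})/(2p), with the analogous relation in j. It is seeded by S_00 = √(π/p) exp(−μ X_AB²), where p = a + b, μ = ab/p and P = (aA + bB)/p. The function name is hypothetical.

```python
import math


def os_overlap_1d(a, A, b, B, imax, jmax):
    """S[i][j] = integral of (x-A)^i (x-B)^j exp(-a(x-A)^2 - b(x-B)^2) dx,
    built by the Obara-Saika recurrence for two-centre overlaps."""
    p = a + b
    mu = a * b / p
    P = (a * A + b * B) / p
    XPA, XPB = P - A, P - B
    S = [[0.0] * (jmax + 1) for _ in range(imax + 1)]
    S[0][0] = math.sqrt(math.pi / p) * math.exp(-mu * (A - B) ** 2)
    # Raise i along the j = 0 column
    for i in range(imax):
        S[i + 1][0] = XPA * S[i][0] + (i * S[i - 1][0] / (2 * p) if i > 0 else 0.0)
    # Raise j for every i
    for j in range(jmax):
        for i in range(imax + 1):
            term = 0.0
            if i > 0:
                term += i * S[i - 1][j]
            if j > 0:
                term += j * S[i][j - 1]
            S[i][j + 1] = XPB * S[i][j] + term / (2 * p)
    return S


S = os_overlap_1d(0.8, 0.3, 1.3, -0.5, 3, 2)
```

Each entry costs a few multiplications, independent of i and j. A direct expansion of the polynomial prefactors would cost more.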

A simplified version of the Li, Nunes and Vanderbilt [Phys. Rev. B 47, 10891 (1993)] and Daw [Phys. Rev. B 47, 10895 (1993)] density matrix minimization is introduced that requires four fewer matrix multiplies per minimization step relative to previous formulations. The simplified method also exhibits superior convergence properties, such that the bulk of the work may be shifted to the quadratically convergent McWeeny purification, which brings the density matrix to idempotency. Both orthogonal and nonorthogonal versions are derived. The AINV algorithm of Benzi, Meyer, and Tůma [SIAM J. Sci. Comp. 17, 1135 (1996)] is introduced to linear scaling electronic structure theory, and found to be essential in transformations between orthogonal and nonorthogonal representations. These methods have been developed with an atom-blocked sparse matrix algebra that sustains megaflop rates as high as 50% of the theoretical peak, and implemented in the MondoSCF suite of linear scaling SCF programs. For the first time, linear scaling Hartree–Fock theory is demonstrated with three-dimensional systems, including water clusters and estane polymers. The nonorthogonal minimization is shown to be uncompetitive with minimization in an orthonormal representation. An early onset of linear scaling is found for both minimal and double zeta basis sets, and crossovers with a highly optimized eigensolver are achieved. Calculations with up to 6000 basis functions are reported. The scaling of errors with system size is investigated for various levels of approximation. © 1999 American Institute of Physics.
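
McWeeny purification maps each eigenvalue x of P to 3x² − 2x³. This pushes values below 1/2 toward 0 and values above 1/2 toward 1, with quadratic convergence near the fixed points. The sketch below works in an orthonormal representation only, although the paper also treats nonorthogonal ones. The tolerance and iteration cap are arbitrary choices. The plain McWeeny step does not conserve tr P.

```python
import numpy as np


def mcweeny_purify(P, tol=1e-12, max_iter=100):
    """Iterate P <- 3P^2 - 2P^3 (orthonormal basis) until ||P^2 - P|| < tol."""
    P = np.array(P, dtype=float)
    for _ in range(max_iter):
        P2 = P @ P
        if np.linalg.norm(P2 - P) < tol:
            break
        P = 3.0 * P2 - 2.0 * P2 @ P
    return P


# Usage: a symmetric matrix whose eigenvalues are near, but not at, 0 and 1
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
P0 = Q @ np.diag([0.9, 0.8, 0.3, 0.1, 0.05]) @ Q.T
P = mcweeny_purify(P0)
P_target = Q[:, :2] @ Q[:, :2].T  # projector onto the two "occupied" directions
```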

An efficient method is presented for evaluating two‐electron Cartesian Gaussian integrals, and their first derivatives with respect to nuclear coordinates. It is based on the recurrence relation (RR) of Obara and Saika [J. Chem. Phys. 84, 3963 (1986)], and an additional new RR, which are combined together in a general algorithm applicable to any angular momenta. This algorithm exploits the fact that the new RR can be applied outside contraction loops. It is shown, by floating point operation counts and comparative timings, to be generally superior to existing methods, particularly for basis sets containing d functions.
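
The new recurrence depends only on the centre separation A − B, not on the exponents. That is why it can be applied after contraction. The sketch below is a one-dimensional overlap analogue; the real scheme works on ERIs. It assumes unnormalized contracted Gaussians given as (coefficient, exponent) pairs. Because x − B = (x − A) + (A − B), one has (i|j+1) = (i+1|j) + (A − B)(i|j). Only contracted (l|0) quantities are accumulated inside the primitive loops, and angular momentum is transferred afterward. Both function names are hypothetical.

```python
import math


def vrr_s_i0(a, A, b, B, imax):
    """Primitive (i|0) overlaps, i = 0..imax, by the Obara-Saika vertical step."""
    p = a + b
    XPA = (a * A + b * B) / p - A
    s = [math.sqrt(math.pi / p) * math.exp(-a * b / p * (A - B) ** 2)]
    for i in range(imax):
        s.append(XPA * s[i] + (i * s[i - 1] / (2 * p) if i > 0 else 0.0))
    return s


def contracted_overlap_hrr(prims_a, A, prims_b, B, i, j):
    """Contracted 1D overlap (i|j): contract (l|0) first, then apply the
    exponent-free horizontal relation (l|m+1) = (l+1|m) + (A-B)(l|m)."""
    # 1) Contraction loops touch only (l|0), l = 0..i+j
    t = [0.0] * (i + j + 1)
    for ca, a in prims_a:
        for cb, b in prims_b:
            s = vrr_s_i0(a, A, b, B, i + j)
            for l in range(i + j + 1):
                t[l] += ca * cb * s[l]
    # 2) HRR applied once, outside the contraction loops; each pass raises j by one
    XAB = A - B
    for _ in range(j):
        t = [t[l + 1] + XAB * t[l] for l in range(len(t) - 1)]
    return t[i]


pa = [(0.4, 0.5), (0.6, 2.0)]
pb = [(0.7, 0.9), (0.3, 3.1)]
val = contracted_overlap_hrr(pa, 0.2, pb, -0.6, 2, 2)
```

The number of HRR passes is independent of the contraction length. This illustrates why moving the new relation outside the contraction loops pays off for highly contracted basis sets.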

Two methods are suggested for performing ground-state
electronic-structure calculations within the independent-electron
approximation. Both methods involve the purification of a specified
initial density matrix, which can be done either canonically (at a fixed
electron count N_e) or grand canonically (at a fixed chemical
potential μ). Linear system-size scaling is achieved either way, as
we illustrate in example tight-binding calculations on carbon nanotubes.
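
The sketch below shows the canonical variant. It assumes an orthonormal basis and the trace-conserving update usually credited to Palser and Manolopoulos. The initial guess is a linear map of H whose spectrum lies in [0, 1] and whose trace equals the occupied-orbital count. Each step uses c = tr(P² − P³)/tr(P − P²) to choose between two cubic polynomials. Spectral bounds come from Gershgorin circles, and the tolerance and iteration cap are illustrative. Dense matrices are used, so this shows the algebra only; linear scaling requires sparse multiplies.

```python
import numpy as np


def canonical_purification(H, n_occ, tol=1e-10, max_iter=200):
    """Trace-conserving (fixed electron count) purification, orthonormal basis."""
    n = H.shape[0]
    # Gershgorin bounds on the spectrum of H
    radius = np.abs(H).sum(axis=1) - np.abs(np.diag(H))
    h_min = np.min(np.diag(H) - radius)
    h_max = np.max(np.diag(H) + radius)
    mu_bar = np.trace(H) / n
    lam = min(n_occ / (h_max - mu_bar), (n - n_occ) / (mu_bar - h_min))
    # Initial guess: eigenvalues in [0, 1], trace equal to n_occ
    P = (lam / n) * (mu_bar * np.eye(n) - H) + (n_occ / n) * np.eye(n)
    for _ in range(max_iter):
        P2 = P @ P
        P3 = P2 @ P
        denom = np.trace(P - P2)
        if abs(denom) < tol:
            break
        c = np.trace(P2 - P3) / denom
        if c >= 0.5:
            P = ((1.0 + c) * P2 - P3) / c
        else:
            P = ((1.0 - 2.0 * c) * P + (1.0 + c) * P2 - P3) / (1.0 - c)
    return P


# Usage: a Hamiltonian with four occupied levels below a gap
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((8, 8)))
evals = np.array([-2.0, -1.7, -1.3, -1.0, 1.0, 1.3, 1.7, 2.0])
H = Q @ np.diag(evals) @ Q.T
P = canonical_purification(H, 4)
P_ref = Q[:, :4] @ Q[:, :4].T
```

Both update polynomials keep tr P fixed at every step. No chemical potential has to be known in advance.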

The adaptive finite-element method proposed in our previous work [Phys.
Rev. B 54 (1996) 7602] is extended to fully self-consistent
calculations of realistic materials. Our method is highly adaptive,
sparse, parallel, and suited to O(N) methods, thanks to the
localized finite-element basis functions. Accurate ionic forces can also
be calculated at practical computational cost. Applications to the
structural properties of diamond, c-BN, and the C60 molecule,
and to molecular dynamics with O(N³) scaling, are shown first,
followed by detailed error analyses. Then the O(N) method based on the
orbital formulation is realized within our approach.
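
The sparsity argument can be seen in one dimension. The sketch below uses uniform linear (hat-function) elements rather than the paper's adaptive three-dimensional elements. Each basis function overlaps only its neighbours, so H and S are tridiagonal. The Schrödinger equation then becomes the generalized eigenproblem Hc = ESc. The test potential is a harmonic well in atomic units, with Dirichlet boundaries at ±L. `fem_eigenvalues` and its parameters are hypothetical.

```python
import numpy as np
from scipy.linalg import eigh


def fem_eigenvalues(potential, L=8.0, n_el=400, n_eigs=3):
    """Linear finite elements for -1/2 psi'' + V psi = E psi on [-L, L], psi(+-L) = 0."""
    nodes = np.linspace(-L, L, n_el + 1)
    h = nodes[1] - nodes[0]
    n = n_el + 1
    H = np.zeros((n, n))
    S = np.zeros((n, n))
    # 3-point Gauss-Legendre rule on the reference element [-1, 1]
    gp = np.array([-np.sqrt(0.6), 0.0, np.sqrt(0.6)])
    gw = np.array([5.0, 8.0, 5.0]) / 9.0
    N = np.array([0.5 * (1.0 - gp), 0.5 * (1.0 + gp)])  # hat functions at Gauss points
    ke = 0.5 * np.array([[1.0, -1.0], [-1.0, 1.0]]) / h  # 1/2 * int N_a' N_b' dx
    for e in range(n_el):
        x = nodes[e] + 0.5 * h * (gp + 1.0)
        w = 0.5 * h * gw
        me = (N * w) @ N.T                   # int N_a N_b dx
        ve = (N * (w * potential(x))) @ N.T  # int N_a V N_b dx
        idx = [e, e + 1]
        H[np.ix_(idx, idx)] += ke + ve
        S[np.ix_(idx, idx)] += me
    inner = slice(1, n - 1)  # drop the boundary nodes (Dirichlet conditions)
    return eigh(H[inner, inner], S[inner, inner], eigvals_only=True)[:n_eigs]


E = fem_eigenvalues(lambda x: 0.5 * x ** 2)
```

Dense arrays are used for brevity. A sparse solver would exploit the tridiagonal structure, which is the property that makes finite elements attractive for O(N) formulations.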

We present a complete linear scaling method for hybrid Kohn−Sham density functional theory electronic structure calculations and demonstrate its performance. Particular attention is given to the linear scaling computation of the Kohn−Sham exchange-correlation matrix directly in sparse form within the generalized gradient approximation. The described method makes efficient use of sparse data structures at all times and scales linearly with respect to both computational time and memory usage. Benchmark calculations at the BHandHLYP/3-21G level of theory are presented for polypeptide helix molecules with up to 53 250 atoms. Threshold values for computational approximations were chosen on the basis of their impact on the occupied subspace so that the different parts of the calculations were carried out at balanced levels of accuracy. The largest calculation used 307 204 Gaussian basis functions on a single computer with 72 GB of memory. Benchmarks for three-dimensional water clusters are also included, as well as results using the 6-31G** basis set.

Hybrid density functionals are very successful in describing a wide range of molecular properties accurately. In large molecules and solids, however, calculating the exact (Hartree-Fock) exchange is computationally expensive, especially for systems with metallic characteristics. In the present work, we develop a new hybrid density functional based on a screened Coulomb potential for the exchange interaction which circumvents this bottleneck. The results obtained for structural and thermodynamic properties of molecules are comparable in quality to the most widely used hybrid functionals. In addition, we present results of periodic boundary condition calculations for both semiconducting and metallic single wall carbon nanotubes. Using a screened Coulomb potential for Hartree-Fock exchange enables fast and accurate hybrid calculations, even of usually difficult metallic systems. The high accuracy of the new screened Coulomb potential hybrid, combined with its computational advantages, makes it widely applicable to large molecules and periodic systems.
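
The screening rests on splitting the Coulomb operator with the error function: 1/r = erfc(ωr)/r + erf(ωr)/r. Exact exchange is kept only for the short-range erfc part. The sketch below computes that split. The default ω = 0.11 bohr⁻¹ is the value commonly quoted for HSE06. It is an assumption here, since this abstract gives no number.

```python
import math


def coulomb_split(r, omega=0.11):
    """Return (short-range, long-range) parts of 1/r; omega in bohr^-1 (assumed value)."""
    return math.erfc(omega * r) / r, math.erf(omega * r) / r


# Fraction of the bare Coulomb interaction carried by the short-range part
for r in (0.5, 2.0, 10.0, 30.0):
    sr, lr = coulomb_split(r)
    print(f"r = {r:5.1f} bohr  SR fraction = {sr * r:.2e}")
```

The short-range fraction decays like a Gaussian tail. This is why the exact-exchange part can be truncated in space even for metallic systems.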

Density functional theory (DFT) is the most widely used technique in the realm of first-principles electronic structure methods. Principally, this is because DFT in the Kohn–Sham (KS) formalism offers the appealing combination of relatively high accuracy and relatively low computational cost. Despite their great successes, traditional semilocal functionals fail to describe some important problems in solid state physics and materials science, the most conspicuous example being the notorious band gap problem. More sophisticated functionals providing greater accuracy without sacrificing computational efficiency are therefore needed. The Heyd–Scuseria–Ernzerhof (HSE) screened hybrid density functional [J. Heyd, G. E. Scuseria, and M. Ernzerhof, J. Chem. Phys. 118, 8207 (2003); J. Heyd and G. E. Scuseria, J. Chem. Phys. 121, 1187 (2004)] successfully addresses some of the chief problems which plague semilocal functionals by including only the important parts of exact nonlocal Hartree–Fock-type exchange. This work discusses some of the concepts underlying HSE and provides illustrative examples highlighting the successes of HSE in numerous solid state applications.
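
For reference, the standard HSE form assembles the exchange-correlation energy from range-separated pieces. It uses the mixing fraction a = 1/4 and the same screening parameter ω in both ranges. Only the short-range Hartree–Fock term requires exact-exchange integrals:

```latex
E_{xc}^{\mathrm{HSE}} = a\,E_{x}^{\mathrm{HF,SR}}(\omega) + (1-a)\,E_{x}^{\mathrm{PBE,SR}}(\omega)
  + E_{x}^{\mathrm{PBE,LR}}(\omega) + E_{c}^{\mathrm{PBE}},
\qquad a = \tfrac{1}{4},
\qquad \frac{1}{r} = \underbrace{\frac{\operatorname{erfc}(\omega r)}{r}}_{\mathrm{SR}}
  + \underbrace{\frac{\operatorname{erf}(\omega r)}{r}}_{\mathrm{LR}}
```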

Three improvements on the direct self-consistent field method are proposed and tested, which together increase CPU efficiency by about 50%: (i) selective storage of costly integral batches; (ii) an improved integral bound for prescreening; (iii) decomposition of the current density matrix into a linear combination of previous density matrices, for which the two-electron contributions to the Fock matrix are already available, plus a minimized remainder ΔD. Constructing the current Fock matrix then requires processing only the small ΔD, which enhances prescreening.
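
The abstract does not say which integral bound is improved, so the sketch below uses the standard Schwarz inequality |(ij|kl)| ≤ Q_ij Q_kl, with Q_ij = √(ij|ij). A quartet contributing (ij|kl) D_kl can be skipped when Q_ij Q_kl |D_kl| falls below a threshold τ. The sketch builds a Coulomb (J) matrix over unnormalized s-type Gaussians. It assumes the simplest form of item (iii): ΔD is just the difference from the previous density matrix, rather than the minimized remainder of a linear combination. Every skipped term is below τ by construction, so the incremental J differs from a full build by less than n²τ per element. All names are hypothetical.

```python
import itertools
import math

import numpy as np


def boys_f0(t):
    """Zeroth-order Boys function F0(t)."""
    return 1.0 if t < 1e-12 else 0.5 * math.sqrt(math.pi / t) * math.erf(math.sqrt(t))


def eri_ssss(a, A, b, B, c, C, d, D):
    """(ab|cd) over unnormalized s-type Gaussians exp(-alpha |r - R|^2)."""
    p, q = a + b, c + d
    P, Q = (a * A + b * B) / p, (c * C + d * D) / q
    k_ab = math.exp(-a * b / p * float(np.dot(A - B, A - B)))
    k_cd = math.exp(-c * d / q * float(np.dot(C - D, C - D)))
    t = p * q / (p + q) * float(np.dot(P - Q, P - Q))
    return 2.0 * math.pi ** 2.5 / (p * q * math.sqrt(p + q)) * k_ab * k_cd * boys_f0(t)


def coulomb_matrix(basis, dens, tau=0.0):
    """J_ij = sum_kl (ij|kl) dens_kl, skipping quartets with Q_ij Q_kl |dens_kl| < tau."""
    n = len(basis)
    Qs = np.array([[math.sqrt(eri_ssss(*basis[i], *basis[j], *basis[i], *basis[j]))
                    for j in range(n)] for i in range(n)])
    J = np.zeros((n, n))
    computed = 0
    for i, j, k, l in itertools.product(range(n), repeat=4):
        if Qs[i, j] * Qs[k, l] * abs(dens[k, l]) < tau:
            continue  # Schwarz: this term is smaller than tau in magnitude
        J[i, j] += eri_ssss(*basis[i], *basis[j], *basis[k], *basis[l]) * dens[k, l]
        computed += 1
    return J, computed


# Usage: 8 s functions on a line; the density changes only slightly between SCF cycles
basis = [(1.0, np.array([1.5 * i, 0.0, 0.0])) for i in range(8)]
dist = np.abs(np.subtract.outer(np.arange(8), np.arange(8)))
D_old = np.exp(-dist)
D_new = D_old + 1e-4 * np.exp(-0.5 * dist)
tau = 1e-8
J_full, n_full = coulomb_matrix(basis, D_new)
J_old, _ = coulomb_matrix(basis, D_old)
J_delta, n_delta = coulomb_matrix(basis, D_new - D_old, tau)
J_inc = J_old + J_delta  # incremental build: only the small difference is screened
```

Screening ΔD instead of the full D skips more quartets at the same threshold, because |ΔD_kl| is small. That is the mechanism item (iii) exploits.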

Linear-scaling methods, or
O(N) methods, have computational and memory requirements which scale linearly with the number of atoms in the system, N, in contrast to standard approaches which scale with the cube of the number of atoms. These methods, which rely on the short-ranged nature of electronic structure, will allow accurate, ab initio simulations of systems of unprecedented size. The theory behind the locality of electronic structure is described and related to physical properties of systems to be modelled, along with a survey of recent developments in real-space methods which are important for efficient use of high-performance computers. The linear-scaling methods proposed to date can be divided into seven different areas, and the applicability, efficiency and advantages of the methods proposed in these areas are then discussed. The applications of linear-scaling methods, as well as the implementations available as computer programs, are considered. Finally, the prospects for and the challenges facing linear-scaling methods are discussed.

Methods exhibiting linear scaling with respect to the size of the system, the so-called O(N) methods, are an essential tool for the calculation of the electronic structure of large syst