Interpolative Separable Density Fitting through Centroidal Voronoi Tessellation With Applications to Hybrid Functional Electronic Structure Calculations

Kun Dong, Wei Hu, and Lin Lin
Center for Applied Mathematics, Cornell University, Ithaca, New York 14853, United
States, Computational Research Division, Lawrence Berkeley National Laboratory,
Berkeley, California 94720, United States, and Department of Mathematics, University of
California, Berkeley, California 94720, United States
E-mail: kd383@cornell.edu; whu@lbl.gov; linlin@math.berkeley.edu
arXiv:1711.01531v1 [physics.comp-ph] 5 Nov 2017
Abstract
The recently developed interpolative separable density fitting (ISDF) decomposition
is a powerful way for compressing the redundant information in the set of orbital
pairs, and has been used to accelerate quantum chemistry calculations in a number
of contexts. The key ingredient of the ISDF decomposition is to select a set of
non-uniform grid points, so that the values of the orbital pairs evaluated at such grid points
can be used to accurately interpolate those evaluated at all grid points. The set of
non-uniform grid points, called the interpolation points, can be automatically selected by a
QR factorization with column pivoting (QRCP) procedure. This is the computationally
most expensive step in the construction of the ISDF decomposition. In this work, we
propose a new approach to find the interpolation points based on the centroidal Voronoi
tessellation (CVT) method, which offers a much less expensive alternative to the QRCP
procedure when ISDF is used in the context of hybrid functional electronic structure
calculations. The CVT method only uses information from the electron density, and
can be efficiently implemented using a K-Means algorithm. We find that this new
method achieves comparable accuracy to the ISDF-QRCP method, at a cost that is
negligible in the overall hybrid functional calculations. For instance, for a system
containing 1000 silicon atoms simulated using the HSE06 hybrid functional on 2000
computational cores, the QRCP-based method for finding the interpolation
points costs 434.2 seconds, while the CVT procedure only takes 3.2 seconds. We also
find that the ISDF-CVT method enhances the smoothness of the potential energy
surface in the context of ab initio molecular dynamics (AIMD) simulations with hybrid
functionals.
1 Introduction
Orbital pairs of the form $\{\phi_i(\mathbf{r})\psi_j(\mathbf{r})\}_{i,j=1}^{N}$, where $\phi_i, \psi_j$ are single particle orbitals, appear
ubiquitously in quantum chemistry. A few examples include the Fock exchange operator, the
MP2 amplitude, and the polarizability operator.1,2 When $N$ is proportional to the number
of electrons $N_e$ in the system, the total number of orbital pairs is $N^2 \sim O(N_e^2)$. On the
other hand, the number of degrees of freedom needed to resolve all orbital pairs on a dense
grid is only $O(N_e)$. Hence as $N_e$ becomes large, the set of all orbital pairs contains apparent
redundant information. In order to compress the redundant information and to design more
efficient numerical algorithms, many algorithms in the past few decades have been developed.
Pseudospectral decomposition,3,4 Cholesky decomposition,5–8 density fitting (DF) or
resolution of identity (RI),9,10 and tensor hypercontraction (THC)11,12 are only a few examples
towards this goal. When the single particle orbitals $\phi_i, \psi_j$ are already localized functions,
“local methods” or “linear scaling methods”13–15 can be applied to construct such
decomposition with cost that scales linearly with respect to $N_e$. Otherwise, the storage cost of the
matrix to represent all orbital pairs on a grid is already $O(N_e^3)$, and the computational cost
of compressing the orbital pairs is then typically $O(N_e^4)$.
Recently, Lu and Ying developed a new decomposition called the interpolative separable
density fitting (ISDF),16 which takes the following form
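$$\phi_i(\mathbf{r})\psi_j(\mathbf{r}) \approx \sum_{\mu=1}^{N_\mu} \zeta_\mu(\mathbf{r}) \left(\phi_i(\hat{\mathbf{r}}_\mu)\psi_j(\hat{\mathbf{r}}_\mu)\right). \tag{1}$$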
For a given $\mathbf{r}$, if we view $\phi_i(\mathbf{r})\psi_j(\mathbf{r})$ as a row of the matrix $\{\phi_i\psi_j\}$ discretized on a dense
grid, then the ISDF decomposition states that all such matrix rows can be approximately
expanded using a linear combination of matrix rows with respect to a selected set of
interpolation points $\{\hat{\mathbf{r}}_\mu\}_{\mu=1}^{N_\mu}$. The coefficients of such linear combination, or interpolating
vectors, are denoted by $\{\zeta_\mu(\mathbf{r})\}_{\mu=1}^{N_\mu}$. Here $N_\mu$ can be interpreted as the numerical rank of the
ISDF decomposition. Compared to the standard density fitting method, the three-tensor
$(\phi_i(\hat{\mathbf{r}}_\mu)\psi_j(\hat{\mathbf{r}}_\mu))$ with three indices $i, j, \mu$ takes a separable form. This reduces the storage cost
of the decomposed tensor from $O(N_e^3)$ to $O(N_e^2)$, and the computational cost from $O(N_e^4)$ to
$O(N_e^3)$. Note that if the interpolation points $\{\hat{\mathbf{r}}_\mu\}_{\mu=1}^{N_\mu}$ are chosen to be on a uniform grid, then
the ISDF decomposition reduces to the pseudospectral decomposition, where $N_\mu \sim O(N_e)$
but with a large preconstant. For instance, the pseudospectral decomposition can be highly
inefficient for molecular systems, where the grid points in the vacuum contribute nearly
negligibly to the representation of the orbital pairs. On the other hand, by selecting the
interpolation points carefully, e.g. through a randomized QR factorization with column
pivoting (QRCP) procedure,17 the number of interpolation points can be significantly reduced.
The QRCP based ISDF decomposition has been applied to accelerate a number of
applications such as two-electron integral computation,16 correlation energy in the random phase
approximation,18 density functional perturbation theory,19 and hybrid density functional
calculations.20 For example, in the context of iterative solver for hybrid density functional
calculations, the Fock exchange operator $V_X$ defined in terms of a set of orbitals $\{\phi_i\}$ needs
to be repeatedly applied to another set of Kohn-Sham orbitals $\{\psi_j\}$
$$(V_X[\{\phi_i\}]\psi_j)(\mathbf{r}) = -\sum_{i=1}^{N_e} \phi_i(\mathbf{r}) \int K(\mathbf{r},\mathbf{r}')\,\phi_i(\mathbf{r}')\psi_j(\mathbf{r}')\,\mathrm{d}\mathbf{r}', \tag{2}$$
where $K(\mathbf{r},\mathbf{r}')$ is the kernel for the Coulomb or the screened Coulomb operator. The
integration in Eq. (2) is often carried out by solving Poisson-like equations, using e.g. a fast Fourier
transform (FFT) method, and the computational cost is $O(N_e^3)$ with a large preconstant.
This is typically the most time consuming component in hybrid functional calculations, and
can be accelerated by the ISDF decomposition for the orbital pairs $\{\phi_i\psi_j\}$.
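To see where the saving comes from, one can substitute Eq. (1) into the integral of Eq. (2). The following is a sketch of that step only; the auxiliary function $W_\mu$ is a name introduced here for illustration and does not appear in the text.

```latex
\int K(\mathbf{r},\mathbf{r}')\,\phi_i(\mathbf{r}')\psi_j(\mathbf{r}')\,\mathrm{d}\mathbf{r}'
\;\approx\; \sum_{\mu=1}^{N_\mu} W_\mu(\mathbf{r})\,\phi_i(\hat{\mathbf{r}}_\mu)\psi_j(\hat{\mathbf{r}}_\mu),
\qquad
W_\mu(\mathbf{r}) = \int K(\mathbf{r},\mathbf{r}')\,\zeta_\mu(\mathbf{r}')\,\mathrm{d}\mathbf{r}'.
```

Under this substitution the kernel is applied once per interpolating vector, that is $N_\mu$ times, rather than once per orbital pair.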
In Ref. 16, the interpolation points and the interpolation vectors are determined
simultaneously through a randomized QR factorization with column pivoting (QRCP) applied to
$\{\phi_i(\mathbf{r})\psi_j(\mathbf{r})\}$ directly. We recently found that the randomized QRCP procedure has $O(N_e^3)$
complexity but with a relatively large preconstant, and may not be competitive enough when
used repeatedly. In order to overcome such difficulty, we proposed a different approach in
Ref. 20 that determines the two parts separately and reduces the computational cost. We
use the relatively expensive randomized QRCP procedure to find the interpolation points
in advance, and only recompute the interpolation vectors whenever $\{\phi_i(\mathbf{r})\psi_j(\mathbf{r})\}$ has been
updated, using an efficient least squares procedure that exploits the separable nature of the
matrix to be approximated. As a result, we can significantly accelerate hybrid functional
calculations using the ISDF decomposition in all but the first SCF iteration.
In this work, we further remove the need of performing the QRCP decomposition
completely, and hence significantly reduce the computational cost. Note that an effective choice
of the set of interpolation points should satisfy the following two conditions. 1) The
distribution of the interpolation points should roughly follow the distribution of the electron density.
In particular, there should be more points where the electron density is high, and fewer or even
zero points where the electron density is very low. 2) The interpolation points should not be
very close to each other. Otherwise matrix rows represented by the interpolation points are
nearly linearly dependent, and the matrix formed by the interpolation vectors will be highly
ill-conditioned. The QRCP procedure satisfies both 1) and 2) simultaneously, and thus is an
effective way for selecting the interpolation points. Here we demonstrate that 1) and 2) can
also be satisfied through a much simpler centroidal Voronoi tessellation (CVT) procedure
applied to a weight vector such as the electron density.
The Voronoi tessellation technique has been widely used in computer science,21 and
scientific and engineering applications such as image processing,22 pattern recognition,23
and numerical integration.24 The concept of Voronoi tessellation can be simply understood
as follows. Given a discrete set of weighted points, the CVT procedure divides a domain
into a number of regions, each consisting of a collection of points that are closest to its
weighted centroid. Here we choose the electron density as the weight, and the centroids
as the interpolation points. The centroids must be located where the electron density is
significant, and hence satisfy the requirement 1). The centroids are also mutually separated
from each other by a finite distance due to the nearest neighbor principle, and hence satisfy
the requirement 2). Although detailed analysis of the error stemming from such choice of
interpolation points is very difficult for general nonlinear functions, we find that the CVT
procedure approximately minimizes the residual of the ISDF decomposition (1). In practice,
the CVT procedure only applies to one vector (the electron density) instead of $O(N_e^2)$ vectors
and hence is very efficient.
We apply the ISDF-CVT method to accelerate hybrid functional calculations in a planewave
basis set. We perform such calculations for different systems with insulating (liquid water),
semiconducting (bulk silicon), and metallic (disordered silicon aluminum alloy) characters,
as well as ab initio molecular dynamics (AIMD) simulations. We find that the ISDF-CVT
method achieves similar accuracy to that obtained from the ISDF-QRCP method, with
significantly improved efficiency. For instance, for a bulk silicon system containing 1000 silicon
atoms computed on 2000 computational cores, the QRCP procedure finds the interpolation
points in 434.2 seconds, while the CVT procedure only takes 3.2 seconds. Since the
solution of the CVT procedure is continuous with respect to changes in the electron density, we
also find that the CVT procedure produces a smoother potential energy surface than that
by the QRCP procedure in the context of ab initio molecular dynamics (AIMD) simulations.
The remainder of the paper is organized as follows. We briefly introduce the ISDF
decomposition in section 2. In section 3 we describe the ISDF-CVT procedure and its
implementation for hybrid functional calculations. We present numerical results of the ISDF-CVT
method in section 4, and conclude in section 5. We also provide the theoretical justification
of the CVT method in Appendix A.
2 Interpolative Separable Density Fitting (ISDF) decomposition
In this section, we briefly introduce the ISDF decomposition16 evaluated using the method
developed in Ref. 20, which employs a separate treatment of the interpolation points and
interpolation vectors.
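First, assume that the interpolation points $\{\hat{\mathbf{r}}_\mu\}_{\mu=1}^{N_\mu}$ are known. Then the interpolation vectors
can be efficiently evaluated using a least squares method as follows. Using a linear algebra
notation, Eq. (1) can be written as
$$Z \approx \Theta C, \tag{3}$$
where each column of $Z$ is given by $Z_{ij}(\mathbf{r}) = \phi_i(\mathbf{r})\psi_j(\mathbf{r})$ sampled on a dense real space
grid $\{\mathbf{r}_i\}_{i=1}^{N_g}$, and $\Theta = [\zeta_1, \zeta_2, \ldots, \zeta_{N_\mu}]$ contains the interpolating vectors. Each column of $C$
indexed by $(i, j)$ is given by
$$[\phi_i(\hat{\mathbf{r}}_1)\psi_j(\hat{\mathbf{r}}_1), \cdots, \phi_i(\hat{\mathbf{r}}_\mu)\psi_j(\hat{\mathbf{r}}_\mu), \cdots, \phi_i(\hat{\mathbf{r}}_{N_\mu})\psi_j(\hat{\mathbf{r}}_{N_\mu})]^T.$$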
Eq. (3) is an over-determined linear system with respect to the interpolation vectors $\Theta$. The
least squares approximation to the solution is given by
$$\Theta = Z C^T (C C^T)^{-1}. \tag{4}$$
It may appear that the matrix-matrix multiplications $ZC^T$ and $CC^T$ take $O(N_e^4)$ operations
because the size of $Z$ is $N_g \times (N_e N)$ and the size of $C$ is $N_\mu \times (N_e N)$. However, both
multiplications can be carried out with fewer operations due to the separable structure of $Z$
and $C$. The computational complexity for computing the interpolation vectors is $O(N_e^3)$, and
numerical results indicate that the preconstant is also much smaller than that involved in
hybrid functional calculations.20 Hence the interpolation vectors can be obtained efficiently
using the least squares procedure.
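One way to read the separable structure is that an entry of $ZC^T$ is a sum over pairs $(i, j)$ of a product that factorizes into a sum over $i$ times a sum over $j$. The sketch below checks this on a small random example. It is an illustration under stated assumptions, not the implementation of Ref. 20: the function names, the array layout (`phi[g][i]` is orbital $i$ at grid point $g$) and the toy sizes are all made up here, and plain Python lists are used so that it runs without dependencies.

```python
import random

def separable_product(phi, psi, idx):
    """Return Z C^T from orbital values only, never forming Z or C.

    phi[g][i], psi[g][j]: orbital values at grid point g.
    idx: grid indices of the interpolation points (hypothetical choice).
    """
    out = []
    for g in range(len(phi)):
        row = []
        for m in idx:
            p = sum(a * b for a, b in zip(phi[g], phi[m]))  # sum_i phi_i(r_g) phi_i(r_mu)
            q = sum(a * b for a, b in zip(psi[g], psi[m]))  # sum_j psi_j(r_g) psi_j(r_mu)
            row.append(p * q)
        out.append(row)
    return out

def explicit_product(phi, psi, idx):
    """Reference: build the rows of Z and C over all pairs (i, j), then multiply."""
    def pair_row(g):
        return [a * b for a in phi[g] for b in psi[g]]
    z = [pair_row(g) for g in range(len(phi))]
    c = [pair_row(m) for m in idx]
    return [[sum(x * y for x, y in zip(zr, cr)) for cr in c] for zr in z]

random.seed(0)
n_grid, n_orb = 12, 3
phi = [[random.uniform(-1, 1) for _ in range(n_orb)] for _ in range(n_grid)]
psi = [[random.uniform(-1, 1) for _ in range(n_orb)] for _ in range(n_grid)]
idx = [1, 5, 8]

fast = separable_product(phi, psi, idx)
slow = explicit_product(phi, psi, idx)
max_diff = max(abs(f - s) for fr, sr in zip(fast, slow) for f, s in zip(fr, sr))
```

Because the rows of $C$ are the rows of $Z$ at the interpolation points, $CC^T$ is obtained by picking the rows `idx` out of the same product, so no separate computation is needed for it in this sketch.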
The problem for finding a suitable set of interpolation points $\{\hat{\mathbf{r}}_\mu\}_{\mu=1}^{N_\mu}$ can be formulated
as the following linear algebra problem. Consider the discretized matrix $Z$ of size $N_g \times N^2$,
and find $N_\mu$ rows of $Z$ so that the rest of the rows of $Z$ can be approximated by the linear
combination of the selected $N_\mu$ rows. This is called an interpolative decomposition,25 and
a standard method to achieve such a decomposition is the QR factorization with column
pivoting (QRCP) procedure25 as
$$Z^T \Pi = QR. \tag{5}$$
Here $Z^T$ is the transpose of $Z$, $Q$ is an $N^2 \times N_g$ matrix that has orthonormal columns, $R$
is an upper triangular matrix, and $\Pi$ is a permutation matrix chosen so that the magnitudes
of the diagonal elements of $R$ form a non-increasing sequence. The magnitude of each
diagonal element of $R$ indicates how important the corresponding column of the permuted $Z^T$
is, and whether the corresponding grid point should be chosen as an interpolation point.
The QRCP factorization can be terminated when the $(N_\mu + 1)$-th diagonal element of $R$
becomes less than a predetermined threshold. The leading $N_\mu$ columns of the permuted $Z^T$
are considered to be linearly independent numerically. The corresponding grid points are
chosen as the interpolation points. The indices for the chosen interpolation points $\{\hat{\mathbf{r}}_\mu\}$ can
be obtained from indices of the nonzero entries of the first $N_\mu$ columns of the permutation
matrix $\Pi$.
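The pivoting rule can be illustrated with a greedy pivoted Gram-Schmidt procedure, which at every step picks the column of $Z^T$ with the largest remaining norm and then orthogonalizes all other columns against it. This is a minimal sketch on a tiny dense matrix, not the randomized QRCP of Refs. 16 and 20; the function name, the stopping threshold and the toy data are assumptions made for the illustration.

```python
def select_points_pivoted(zt_columns, n_select):
    """Greedy column selection that mimics the pivot order of QRCP.

    zt_columns[g] is the column of Z^T belonging to grid point g,
    i.e. the values of all orbital pairs at that point.
    """
    resid = [list(col) for col in zt_columns]  # working copies, deflated in place
    chosen = []
    for _ in range(n_select):
        norms = [sum(x * x for x in col) for col in resid]
        g = max(range(len(resid)), key=norms.__getitem__)
        if norms[g] <= 1e-24:  # nothing left to pick: numerical rank reached
            break
        chosen.append(g)
        scale = norms[g] ** 0.5
        q = [x / scale for x in resid[g]]
        for col in resid:  # orthogonalize every column against q
            proj = sum(a * b for a, b in zip(q, col))
            for k in range(len(col)):
                col[k] -= proj * q[k]
    return chosen

# Toy data: point 1 is almost parallel to point 0, point 3 has a tiny norm ("vacuum").
columns = [
    [2.0, 0.0, 0.0],
    [1.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 0.01],
]
chosen = select_points_pivoted(columns, 2)
```

In this toy example the procedure picks point 0 first and point 2 second, skipping point 1 even though its norm is the second largest, because almost nothing of it remains after orthogonalization against point 0.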
The QRCP decomposition satisfies the requirements 1) and 2) discussed in the
introduction. First, QRCP permutes matrix columns of $Z^T$ with large norms to the front, and
pushes matrix columns of $Z^T$ with small norms to the back. Note that the square of the
vector 2-norm of the column of $Z^T$ labeled by $\mathbf{r}$ is just
$$\sum_{i,j=1}^{N} \phi_i^2(\mathbf{r})\psi_j^2(\mathbf{r}) = \left(\sum_{i=1}^{N} \phi_i^2(\mathbf{r})\right)\left(\sum_{j=1}^{N} \psi_j^2(\mathbf{r})\right). \tag{6}$$
In the case when $\phi_i, \psi_j$ are the set of occupied orbitals, the norm of each column of $Z^T$
is simply the electron density. Hence the interpolation points chosen by QRCP will occur
where the electron density is significant. Second, once a column is selected, all other columns
are immediately orthogonalized with respect to the chosen column. Hence nearly linearly
dependent matrix columns will not be selected repeatedly. As a result, the interpolation
points chosen by QRCP are well separated spatially.
It turns out that the direct application of the QRCP procedure (5) still requires $O(N_e^4)$
computational complexity. The key idea used in Ref. 16 to lower the cost is to randomly
subsample columns of the matrix $Z$ to form a smaller matrix $\tilde{Z}$ of size $N_g \times \tilde{N}_\mu$, where $\tilde{N}_\mu$
is only slightly larger than $N_\mu$. Applying the QRCP procedure to this subsampled matrix $\tilde{Z}$
approximately yields the choice of interpolation points, but the computational complexity is
reduced to $O(N_e^3)$. In the context of hybrid density functional calculations, we demonstrated
that the cost of the randomized QRCP method can be comparable to that of applying the
exchange operator in the planewave basis set.20 However, the ISDF decomposition can still
significantly reduce the computational cost, since the interpolation points only need to be
determined once for a fixed geometric configuration.
3 Centroidal Voronoi Tessellation based ISDF decomposition
In this section, we demonstrate that the interpolation points can also be selected from a
Voronoi tessellation procedure. For a $d$-dimensional space, the Voronoi tessellation partitions
a set of points $\{\mathbf{r}_i\}_{i=1}^{N_g}$ in $\mathbb{R}^d$ into a number of disjoint cells. The partition is based on the
distance of each point to a finite set of points, called its generators. In our context, let
$\{\hat{\mathbf{r}}_\mu\}_{\mu=1}^{N_\mu}$ denote such a set of generators. The cell corresponding to a given generator $\hat{\mathbf{r}}_\mu$,
defined through a cluster of points $C_\mu$, is
$$C_\mu = \{\mathbf{r}_i \mid \mathrm{dist}(\mathbf{r}_i, \hat{\mathbf{r}}_\mu) < \mathrm{dist}(\mathbf{r}_i, \hat{\mathbf{r}}_\nu) \text{ for all } \nu \neq \mu\}. \tag{7}$$
The distance can be chosen to be any metric, e.g. the $L^2$ distance $\mathrm{dist}(\mathbf{r}, \mathbf{r}') = \|\mathbf{r} - \mathbf{r}'\|$.
In the case when the distances of a point $\mathbf{r}$ to $\hat{\mathbf{r}}_\mu, \hat{\mathbf{r}}_\nu$ are exactly the same, we may arbitrarily
assign $\mathbf{r}$ to one of the clusters.
The centroidal Voronoi tessellation (CVT) is a specific type of Voronoi tessellation in
which the generator $\hat{\mathbf{r}}_\mu$ is chosen to be the centroid of its cell. Given a weight function $\rho(\mathbf{r})$
(such as the electron density), the centroid of a cluster $C_\mu$ is defined as
$$c(C_\mu) = \frac{\sum_{\mathbf{r}_j \in C_\mu} \mathbf{r}_j\,\rho(\mathbf{r}_j)}{\sum_{\mathbf{r}_j \in C_\mu} \rho(\mathbf{r}_j)}. \tag{8}$$
Combined with the $L^2$ distance, CVT can be viewed as a minimization problem over both
all possible partitions into cells and the centroids,26
$$\{C_\mu^*, c_\mu^*\} = \arg\min_{\{C_\mu, c_\mu\}} \sum_{\mu=1}^{N_\mu} \sum_{\mathbf{r}_k \in C_\mu} \rho(\mathbf{r}_k)\,\|\mathbf{r}_k - c_\mu\|^2, \tag{9}$$
and the interpolation points are then chosen to be the minimizers $\hat{\mathbf{r}}_\mu = c(C_\mu^*) = c_\mu^*$.
Following the discussion in the introduction, using the electron density as the weight function in
Eq. (9) enforces that the interpolation points are located where the electron density is significant,
and hence satisfies the requirement 1). Since the cells $C_\mu^*$ are disjoint, the centroids $c_\mu^*$ are
also separated by a finite distance from each other and hence satisfy the requirement
2). In Appendix A we provide another theoretical justification in the sense that the CVT
method approximately minimizes the residual error of the ISDF decomposition.
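The link between Eq. (8) and Eq. (9) can be made explicit. As a short sketch, with the cells held fixed, setting the gradient of the objective in Eq. (9) with respect to one centroid to zero gives back the weighted centroid of Eq. (8):

```latex
\frac{\partial}{\partial c_\mu} \sum_{\mathbf{r}_k \in C_\mu} \rho(\mathbf{r}_k)\,\|\mathbf{r}_k - c_\mu\|^2
= -2 \sum_{\mathbf{r}_k \in C_\mu} \rho(\mathbf{r}_k)\,(\mathbf{r}_k - c_\mu) = 0
\quad\Longrightarrow\quad
c_\mu = \frac{\sum_{\mathbf{r}_k \in C_\mu} \mathbf{r}_k\,\rho(\mathbf{r}_k)}{\sum_{\mathbf{r}_k \in C_\mu} \rho(\mathbf{r}_k)}.
```

Conversely, with the centroids held fixed, each term $\rho(\mathbf{r}_k)\|\mathbf{r}_k - c_\mu\|^2$ is smallest when $\mathbf{r}_k$ is assigned to its nearest centroid, which is the Voronoi rule of Eq. (7).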
Many algorithms have been developed to efficiently compute the Voronoi tessellation.27
One of the most widely used methods is Lloyd’s algorithm,28 which in the discrete case is equivalent
to the K-Means algorithm.26 The K-Means algorithm is an iterative method that greedily
minimizes the objective by taking alternating steps between $\{C_\mu\}$ and $\{c_\mu\}$. In this work, we
adopt a weighted version of the K-Means algorithm, which is given in Algorithm 1.
Note that the K-Means algorithm can be straightforwardly parallelized. We distribute the
grid points evenly at the beginning. The classification step is the most time consuming step,
and can be locally computed for each group of grid points. After this step, the weighted sum
and total weight of all clusters can be reduced from and broadcast to all processors for the
next iteration.
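Algorithm 1: Weighted K-Means Algorithm to Find Interpolation Points for Density Fitting
Input : Grid points $\{\mathbf{r}_i\}_{i=1}^{N_g}$, weight function $\rho(\mathbf{r})$, initial centroids $\{c_\mu^{(0)}\}$
Output: Interpolation points $\{\hat{\mathbf{r}}_\mu\}_{\mu=1}^{N_\mu}$
Set $t \leftarrow 0$
do
    Classification step:
    for $i = 1$ to $N_g$ do
        Assign point $\mathbf{r}_i$ to the cluster $C_\mu^{(t)}$ if $c_\mu^{(t)}$ is the closest centroid to $\mathbf{r}_i$
    end
    Update step:
    for $\mu = 1$ to $N_\mu$ do
        $c_\mu^{(t+1)} \leftarrow \sum_{\mathbf{r}_j \in C_\mu^{(t)}} \mathbf{r}_j\,\rho(\mathbf{r}_j) \,/\, \sum_{\mathbf{r}_j \in C_\mu^{(t)}} \rho(\mathbf{r}_j)$
    end
    Set $t \leftarrow t + 1$
while $\{c_\mu^{(t)}\}$ not converged and maximum steps not reached
for $\mu = 1$ to $N_\mu$ do
    Set $\hat{\mathbf{r}}_\mu \leftarrow c_\mu^{(t)}$
end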
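The loop of Algorithm 1 can be sketched in a few lines of plain Python. This is a serial illustration only, not the MPI implementation in PWDFT: the function name, the convergence test on the centroid shift, the tolerance, and the rule of leaving the centroid of an empty cluster where it is are assumptions made here.

```python
def weighted_kmeans(points, weights, centroids, max_steps=100, tol=1e-10):
    """Weighted K-Means (Lloyd iteration) on a discrete set of points.

    points: list of coordinate tuples; weights: non-negative weight per point;
    centroids: initial centroids. Returns (centroids, labels).
    """
    dim = len(points[0])
    centroids = [tuple(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(max_steps):
        # Classification step: attach every point to its closest centroid.
        for i, p in enumerate(points):
            labels[i] = min(
                range(len(centroids)),
                key=lambda m: sum((a - b) ** 2 for a, b in zip(p, centroids[m])),
            )
        # Update step: weighted centroid of each cluster, as in Eq. (8).
        sums = [[0.0] * dim for _ in centroids]
        total = [0.0] * len(centroids)
        for p, w, m in zip(points, weights, labels):
            total[m] += w
            for k in range(dim):
                sums[m][k] += w * p[k]
        # Assumption: a cluster with zero total weight keeps its old centroid.
        new = [tuple(s / t for s in sums[m]) if t > 0 else centroids[m]
               for m, t in enumerate(total)]
        shift = max(sum((a - b) ** 2 for a, b in zip(c, n))
                    for c, n in zip(centroids, new))
        centroids = new
        if shift < tol:
            break
    return centroids, labels

# Two well separated groups of weighted points in 2D (made-up data).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0), (11.0, 10.0), (10.0, 11.0)]
weights = [1.0, 1.0, 2.0, 3.0, 1.0, 1.0]
cents, labels = weighted_kmeans(points, weights, [(0.0, 0.0), (10.0, 10.0)])
```

In this example the first centroid settles at (0.25, 0.5) and the second at (10.2, 10.2), each pulled towards the heavier points of its cluster. The per-cluster arrays `sums` and `total` are the only quantities that would need to be combined across processes if the points were distributed.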
In order to demonstrate the CVT procedure, we consider the weight function ρ(r) given by
the summation of 4 Gaussian functions in a 2D domain. The initial choice of centroids, given
by 40 uniformly distributed random points, together with its associated Voronoi tessellation
are plotted in Figure 1 (a). Figure 1 (b) demonstrates the converged centroids and the
associated Voronoi tessellation using the weighted K-Means algorithm. We observe that the
centroids concentrate where the weight function is significant, and are well separated.
Figure 1: Schematic illustration of the CVT procedure for 4 Gaussian functions in a 2D
domain, including (a) initial random choice of centroids and Voronoi tessellation and (b) centroidal
Voronoi tessellation generated by the weighted K-Means algorithm.
We also show how the interpolation points are placed and moved during the ammonia-borane
(BH3NH3) decomposition reaction process. Figure 2 (a) shows the electron density
of the molecule at the compressed, equilibrium, and dissociated configurations, respectively,
according to the energy landscape in Fig. 2 (c). We plot the interpolation points found by
the weighted K-Means algorithm in Fig. 2 (b). At the compressed configuration, all the
interpolation points are distributed evenly around the molecule. As the bond length increases,
some interpolation points are transferred from BH3 to NH3. Finally at the dissociated
configuration, the NH3 has more interpolation points around the molecule, since there are more
electrons in NH3 than BH3. Along the decomposition reaction process, both the transfer of
the interpolation points and the potential energy landscape are smooth with respect to the
change of the bond length.
Figure 2: The decomposition reaction process of BH3NH3 computed with hybrid functional
(HSE06) calculations by using the CVT procedure to select interpolation points, including
(a) the electron density (yellow isosurfaces), (b) the interpolation points (yellow squares)
$\{\hat{\mathbf{r}}_\mu\}_{\mu=1}^{N_\mu}$ ($N_\mu = 8$) selected from the real space grid points $\{\mathbf{r}_i\}_{i=1}^{N_g}$ ($N_g = 66^3$) when the BN
distance is 1.3, 1.7 and 2.8 Å, respectively, and (c) the binding energy as a function of BN
distance for BH3NH3 in a 10 Å × 10 Å × 10 Å box. The white, pink and blue balls
denote hydrogen, boron and nitrogen atoms, respectively.
4 Numerical results
We demonstrate the accuracy and efficiency of the ISDF-CVT method for hybrid functional
calculations by using the DGDFT (Discontinuous Galerkin Density Functional Theory)
software package.29–33 DGDFT is a massively parallel electronic structure software package
designed for large scale DFT calculations involving up to tens of thousands of atoms. It
includes a self-contained module called PWDFT for performing planewave based electronic
structure calculations (mostly for benchmarking and validation purposes). We implemented
the ISDF-CVT method in PWDFT. We use the Message Passing Interface (MPI) to handle
data communication, and the Hartwigsen-Goedecker-Hutter (HGH) norm-conserving
pseudopotential.34 All calculations use the HSE06 functional.35 All calculations are carried out on
the Edison systems at the National Energy Research Scientific Computing Center (NERSC).
Each node consists of two Intel “Ivy Bridge” processors with 24 cores in total and 64 gigabytes
(GB) of memory. Our implementation only uses MPI. The number of cores is equal to the
number of MPI ranks used in the simulation.
In this section, we demonstrate the performance of the ISDF-CVT method for
accelerating hybrid functional calculations by using three types of systems.36 They consist of bulk
silicon systems (Si64, Si216 and Si1000), a bulk water system with 64 molecules ((H2O)64) and
a disordered silicon aluminum alloy system (Al176Si24). Bulk silicon systems (Si64, Si216 and
Si1000) and bulk water system ((H2O)64) are semiconducting with a relatively large energy
gap $E_{\mathrm{gap}} > 1.0$ eV, and the Al176Si24 system is metallic with a small energy gap $E_{\mathrm{gap}} < 0.1$
eV. All systems are closed shell systems, and the number of occupied bands is $N_{\mathrm{band}} = N_e/2$.
In order to compute the energy gap in the systems, we also include two unoccupied bands
in all calculations.
4.1 Accuracy: Si216 and Al176Si24
We demonstrate the accuracy of the CVT-based ISDF decomposition in hybrid functional
calculation for semiconducting Si216 and metallic Al176Si24 systems, respectively. Although
there is no general theoretical guarantee for the convergence of the K-Means algorithm and
the convergence can depend sensitively on the initialization,37,38 we find that in the current
context the initialization has little impact on the final accuracy of the approximation. Hence
we use random initialization for the K-Means algorithm. In all calculations, the adaptively
compressed exchange (ACE) technique is used to accelerate hybrid functional calculations
without loss of accuracy.39 The results obtained in this work are labeled as ACE-ISDF
(CVT), which are compared against those obtained from the previous work based on the
QRCP decomposition20 labeled as ACE-ISDF (QRCP). In both cases, we introduce a rank
parameter $c$ to control the trade-off between efficiency and accuracy, by setting the number
of interpolation points $N_\mu = c N_e$. We measure the error using the valence band maximum
(VBM) energy level, the conduction band minimum (CBM) energy level, the energy gap,
the Hartree-Fock energy, the total energy, and the atomic forces, respectively. The last three
quantities are defined as
$$\Delta E_{\mathrm{HF}} = |E_{\mathrm{HF}}^{\text{ACE-ISDF (CVT)}} - E_{\mathrm{HF}}^{\text{ACE}}| / N_A$$
$$\Delta E = |E^{\text{ACE-ISDF (CVT)}} - E^{\text{ACE}}| / N_A$$
$$\Delta F = \max_I \|F_I^{\text{ACE-ISDF (CVT)}} - F_I^{\text{ACE}}\|$$
where $N_A$ is the number of atoms and $I$ is the atom index.
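The three error measures translate directly into code. The helper names and the numbers below are hypothetical and only illustrate the definitions; they are not values from the calculations reported here.

```python
import math

def energy_error_per_atom(e_approx, e_ref, n_atoms):
    """|E_approx - E_ref| / N_A, used for both the Hartree-Fock and the total energy."""
    return abs(e_approx - e_ref) / n_atoms

def max_force_error(f_approx, f_ref):
    """Maximum over atoms I of the Euclidean norm of F_I(approx) - F_I(ref)."""
    return max(math.dist(a, b) for a, b in zip(f_approx, f_ref))

# Made-up numbers for a two-atom system (energies in Ha, forces in Ha/Bohr).
d_e = energy_error_per_atom(-8.003, -8.000, n_atoms=2)
d_f = max_force_error(
    [(0.0, 0.0, 0.1), (0.3, 0.4, 0.0)],
    [(0.0, 0.0, 0.1), (0.0, 0.0, 0.0)],
)
```

Note that the force error is a maximum over atoms rather than an average, so a single poorly described atom determines the reported value.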
Table 1 shows that the accuracy of the ACE-ISDF (CVT) method can systematically
improve as the rank parameter $c$ increases. When the rank parameter is large enough, the
accuracy is fully comparable to that obtained from the benchmark calculations. For a more
modest choice $c = 6.0$, the error of the energy per atom reaches below the chemical accuracy
of 1 kcal/mol ($1.6 \times 10^{-3}$ Ha/atom), and the error of the force is around $10^{-3}$ Ha/Bohr. This
is comparable to the accuracy obtained from ACE-ISDF (QRCP), and to e.g. linear scaling
methods for insulating systems with reasonable amount of truncation needed to achieve
significant speedup.40 In fact, when compared with ACE-ISDF (QRCP) in Figure 3, we find
that the CVT based ISDF decomposition achieves slightly higher accuracy, though there is
no theoretical guarantee for this to hold in general. The last column of Table 1 shows the
runtime of the K-Means algorithm. As $c$ increases, the number of interpolation points and
hence the number of cells increases proportionally. Hence we observe that the runtime of
K-Means scales linearly with respect to $c$.
Figure 3: The accuracy of ACE-ISDF based hybrid functional calculations (HSE06) obtained
by using the CVT and QRCP procedures to select the interpolation points, with varying rank
parameter $c$ from 4 to 20 for Si216 and Al176Si24, including the error of (a) Hartree-Fock energy
$\Delta E_{\mathrm{HF}}$ (Ha/atom) and (b) total energy $\Delta E$ (Ha/atom).
Table 1: The accuracy of ACE-ISDF based hybrid functional calculations (HSE06) obtained
by using the CVT method to select interpolation points, with varying rank parameter $c$
for semiconducting Si216 and metallic Al176Si24 systems. The unit for VBM ($E_{\mathrm{VBM}}$), CBM
($E_{\mathrm{CBM}}$) and the energy gap $E_{\mathrm{gap}}$ is eV. The unit for the error in the Hartree-Fock exchange
energy $\Delta E_{\mathrm{HF}}$ and the total energy $\Delta E$ is Ha/atom, and the unit for the error in atomic forces
$\Delta F$ is Ha/Bohr. We use the results from the ACE-enabled hybrid functional calculations as
the reference. The last column shows the time for K-Means with different $c$ values, with 434
cores for Si216 and 314 cores for Al176Si24 on Edison.

ACE-ISDF: Semiconducting Si216 (Nband = 432)
c     E_VBM   E_CBM   E_gap    ΔE_HF     ΔE        ΔF        T_KMEANS
4.0   6.7467  8.3433  -1.5967  2.69E-03  3.08E-03  5.04E-03  0.228
5.0   6.6852  8.2231  -1.5379  9.46E-04  1.12E-03  2.29E-03  0.248
6.0   6.6640  8.1522  -1.4882  3.76E-04  4.62E-04  1.05E-03  0.301
7.0   6.6550  8.1163  -1.4613  1.55E-04  1.98E-04  6.49E-04  0.312
8.0   6.6510  8.1030  -1.4520  7.33E-05  9.55E-05  3.07E-04  0.349
9.0   6.6490  8.0980  -1.4490  3.60E-05  4.96E-05  2.30E-04  0.398
10.0  6.6479  8.0959  -1.4480  1.78E-05  2.64E-05  1.30E-04  0.477
12.0  6.6472  8.0945  -1.4473  4.46E-06  8.91E-06  8.37E-05  0.530
16.0  6.6469  8.0937  -1.4468  1.51E-07  1.41E-06  3.20E-05  0.773
20.0  6.6468  8.0935  -1.4467  4.06E-07  3.33E-07  1.20E-05  0.830
24.0  6.6468  8.0935  -1.4467  2.99E-07  1.06E-07  5.18E-06  0.931
ACE   6.6468  8.0934  -1.4466  0.00E+00  0.00E+00  0.00E+00  0.000

ACE-ISDF: Metallic Al176Si24 (Nband = 312)
c     E_VBM   E_CBM   E_gap    ΔE_HF     ΔE        ΔF        T_KMEANS
4.0   7.9258  8.0335  -0.1076  3.80E-03  4.03E-03  8.01E-03  0.430
5.0   7.8537  7.9596  -0.1059  1.60E-03  1.69E-03  3.18E-03  0.535
6.0   7.8071  7.9127  -0.1056  6.07E-04  6.39E-04  1.48E-03  0.611
7.0   7.7843  7.8860  -0.1017  2.07E-04  2.17E-04  1.03E-03  0.731
8.0   7.7749  7.8749  -0.1000  7.43E-05  7.77E-05  4.40E-04  0.948
9.0   7.7718  7.8710  -0.0992  3.02E-05  3.20E-05  1.98E-04  0.947
10.0  7.7709  7.8697  -0.0989  1.48E-05  1.60E-05  1.80E-04  1.096
12.0  7.7703  7.8690  -0.0987  4.64E-06  5.60E-06  8.51E-05  1.305
16.0  7.7702  7.8688  -0.0986  6.35E-07  1.41E-06  3.24E-05  1.646
20.0  7.7701  7.8687  -0.0986  1.70E-08  5.30E-07  1.91E-05  2.037
ACE   7.7701  7.8687  -0.0986  0.00E+00  0.00E+00  0.00E+00  0.000
4.2 Efficiency: Si1000
We report the efficiency of the ISDF-CVT method by performing hybrid DFT calculations
for a bulk silicon system with 1000 atoms ($N_{\mathrm{band}} = 2000$) on 2000 computational cores as
shown in Table 2, with respect to various choices of the kinetic energy cutoff ($E_{\mathrm{cut}}$). With
the number of interpolation points fixed at $N_\mu = 12000$, both QRCP and K-Means scale
linearly with the number of grid points $N_g$. Yet K-Means is around two orders
of magnitude faster than QRCP. The determination of interpolation vectors, which consists
of solving a least squares problem, previously cost a fifth of the ISDF runtime but now
becomes the dominating component in the CVT-based ISDF decomposition. Notice that the
ISDF method allows us to reduce the number of Poisson-like equations from $N_e^2 = 4 \times 10^6$ to
$N_\mu = 12000$, which results in a significant speedup in terms of the cost of the FFT operations.
Table 2: The wall clock time (in seconds) spent in the components of the ACE-ISDF and
ACE enabled hybrid DFT calculations related to the exchange operator, for Si1000 on 2002
Edison cores at different $E_{\mathrm{cut}}$ levels. Interpolation points (IP) are selected via either the QRCP
or the CVT (K-Means) procedure with the same rank parameter $c = 6.0$; IV denotes the interpolation
vectors. $N_g$ is the number of grid points in real space.
Si1000          ACE-ISDF                                        ACE
Ecut   Ng       IP (QRCP)   IP (K-Means)   IV (FFT)             FFT
10     74^3       38.06        0.70         12.48 (0.33)        85.15
20     104^3     126.39        1.24         36.48 (0.71)       143.54
30     128^3     240.87        2.03         68.50 (1.43)       268.88
40     148^3     434.16        3.26        108.18 (3.10)       783.27
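The ratios quoted in the text can be checked directly from Table 2. The short script below is only an arithmetic sketch: the timings are transcribed from the table, while the function name `summarize` and the choice to take the "ISDF runtime" as IP + IV (with the parenthesized FFT time treated as already included in IV) are assumptions made for illustration.

```python
# Timings in seconds, transcribed from Table 2:
# Ecut -> (IP via QRCP, IP via K-Means, IV, FFT time of plain ACE)
table2 = {
    10: (38.06, 0.70, 12.48, 85.15),
    20: (126.39, 1.24, 36.48, 143.54),
    30: (240.87, 2.03, 68.50, 268.88),
    40: (434.16, 3.26, 108.18, 783.27),
}


def summarize(rows):
    """Return, per Ecut, the QRCP/K-Means speedup and the IV share of IP + IV."""
    summary = {}
    for ecut, (qrcp, kmeans, iv, _ace_fft) in rows.items():
        summary[ecut] = {
            "ip_speedup": qrcp / kmeans,           # how much faster K-Means is
            "iv_share_qrcp": iv / (qrcp + iv),     # IV share when IP uses QRCP
            "iv_share_cvt": iv / (kmeans + iv),    # IV share when IP uses K-Means
        }
    return summary


summary = summarize(table2)
for ecut, s in summary.items():
    print(f"Ecut={ecut}: speedup={s['ip_speedup']:.1f}, "
          f"IV share (QRCP)={s['iv_share_qrcp']:.2f}, "
          f"IV share (CVT)={s['iv_share_cvt']:.2f}")

# Reduction in the number of Poisson-like equations: N_e^2 versus N_mu.
n_e, n_mu = 2000, 12000
poisson_reduction = n_e ** 2 / n_mu
print(f"Poisson-like solves reduced by a factor of about {poisson_reduction:.0f}")
```

Under these assumptions the IV step is a minority of the cost when QRCP is used and nearly all of it when K-Means is used, which is the shift described in the paragraph above.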
4.3 AIMD: Si64 and (H2O)64
In this section, we demonstrate the accuracy of the ACE-ISDF (CVT) method in the context
of AIMD simulations for a bulk silicon system Si64 under the NVE ensemble and a liquid
water system (H2O)64 under the NVT ensemble. The MD time step size is
1.0 femtosecond (fs). For the Si64 system, the initial MD structure (initial temperature
T = 300 K) is optimized by hybrid DFT calculations, and we perform the simulation for
0.5 ps. For the (H2O)64 system, we perform the simulation for 2.0 ps to sample the radial
distribution function after equilibrating the system starting from a prepared initial guess.41
We use a single-level Nosé-Hoover thermostat42,43 at T = 295 K, with the thermostat mass
set to 85000 au.
In the AIMD simulation, the interpolation points need to be recomputed for each atomic
configuration. At the initial MD step, although the initialization strategy does not impact
the accuracy of the physical observables, it can impact the convergence rate of the K-Means
algorithm. We measure the convergence in terms of the fraction of points that switch clusters
between two consecutive iterations. Figure 4 (a) shows the convergence of the K-Means
algorithm with interpolation points initially chosen from a random distribution and from
the QRCP solution, respectively. We find that the K-Means algorithm spends around half
of its iterations waiting for the last 0.1% of the points to settle on their respective clusters.
However, these points often lie on the boundaries of the clusters and have little effect
on the positions of the centroids (interpolation points). Therefore, we terminate the
K-Means algorithm whenever the fraction of points that switch clusters falls below the 0.1%
threshold. It is evident that QRCP initialization leads to faster convergence than random
sampling. However, in the AIMD simulation, a very good initial guess of the interpolation
points can be obtained simply by reusing those from the previous MD step. Figure 4 (b) shows
that the number of K-Means iterations in the MD simulation can be very small, which
demonstrates the effectiveness of this initialization strategy.
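The stopping rule and the warm start described above can be illustrated with a small weighted K-Means loop. This is a minimal sketch and not the authors' implementation: the function name `weighted_kmeans`, the one-dimensional toy grid, the two-Gaussian model density and the 0.05 shift standing in for "the next MD step" are all assumptions made for illustration, and the centroids are left as continuous positions rather than mapped back to grid points.

```python
import math
import random


def weighted_kmeans(points, weights, init_centroids, switch_tol=1e-3, max_iter=500):
    """Lloyd-style weighted K-Means.

    Stops once the fraction of points that change cluster between two
    consecutive iterations falls below switch_tol (0.1% by default).
    Returns (centroids, labels, number_of_iterations).
    """
    centroids = [tuple(c) for c in init_centroids]
    labels = [-1] * len(points)
    dim = len(points[0])
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point joins its nearest centroid.
        new_labels = []
        for p in points:
            dists = [sum((x - c) ** 2 for x, c in zip(p, cen)) for cen in centroids]
            new_labels.append(dists.index(min(dists)))
        switched = sum(a != b for a, b in zip(labels, new_labels)) / len(points)
        labels = new_labels
        # Update step: density-weighted centroid of each cluster.
        sums = [[0.0] * dim for _ in centroids]
        totals = [0.0] * len(centroids)
        for p, w, m in zip(points, weights, labels):
            totals[m] += w
            for d in range(dim):
                sums[m][d] += w * p[d]
        centroids = [
            tuple(s / totals[m] for s in sums[m]) if totals[m] > 0 else centroids[m]
            for m in range(len(centroids))
        ]
        if switched < switch_tol:
            break
    return centroids, labels, iteration


def density(x, shift=0.0):
    """Toy stand-in for the electron density: two Gaussian bumps."""
    return math.exp(-(x - 3.0 - shift) ** 2) + math.exp(-(x - 7.0 - shift) ** 2)


grid = [(0.1 * i,) for i in range(100)]             # toy real-space grid on [0, 10)
rho0 = [density(p[0]) for p in grid]                # density at the first MD step
rho1 = [density(p[0], shift=0.05) for p in grid]    # slightly displaced density

cold_init = random.Random(0).sample(grid, 4)        # random initialization
c0, _, iters_cold = weighted_kmeans(grid, rho0, cold_init)
c1, _, iters_warm = weighted_kmeans(grid, rho1, c0)  # warm start from previous step
print("cold-start iterations:", iters_cold)
print("warm-start iterations:", iters_warm)
```

Because the labels start out unassigned, the first iteration always counts every point as having switched, so even a perfect initial guess needs two iterations in this sketch.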
Figure 4: Comparison of the ISDF-CVT method using either random or QRCP initialization
for hybrid DFT AIMD simulations on the bulk silicon system Si64 and the liquid water system
(H2O)64, including (a) the fraction of points that switch clusters in each K-Means iteration
and (b) the number of K-Means iterations during each MD step.
Figure 5 (a-b) shows the position and velocity of a Si atom over an MD trajectory
of 500 fs obtained using the ACE, ACE-ISDF (QRCP) and ACE-ISDF (CVT) methods
(rank parameter c = 8.0), respectively. The three trajectories fully overlap with each other,
indicating that ISDF is a promising method for reducing the cost of hybrid functional
calculations with controllable loss of accuracy. Figure 5 (c) shows the total potential energy
obtained by the three methods along the MD trajectory, where the difference among the three
methods is more noticeable. This is because the ISDF decomposition is a low rank
decomposition of the pair products of orbitals, which leads to error in the Fock exchange
energy and hence the total potential energy. Nonetheless, we find that this difference merely
results in a shift of the potential energy surface along the MD trajectory, and hence has
little effect on physical observables defined via relative potential energy differences. Furthermore, the
CVT method yields a potential energy trajectory that is much smoother than that
obtained from QRCP. This is because the interpolation points obtained from CVT are driven
by the electron density, which varies smoothly along the MD trajectory. This property does
not hold for the QRCP method. This means that the CVT method can be more effective
when a smooth potential energy surface is desirable, such as in geometry optimization.
The absolute error of the potential energy from the CVT method is coincidentally
smaller than that from QRCP, but we are not aware of reasons for this behavior to hold
in general. Finally, Figure 5 (d) shows that both the CVT-based and QRCP-based ISDF
decompositions lead to controlled energy drift in the NVE simulation.
We also apply the ACE-ISDF (CVT) and ACE-ISDF (QRCP) methods to hybrid DFT
AIMD simulations of the liquid water system (H2O)64 under the NVT ensemble to sample the
radial distribution function shown in Figure 6. We find that the results from all three methods
agree very well, and our result is in quantitative agreement with previous hybrid functional
DFT calculations,41 where the remaining difference with respect to the experimental result
can largely be attributed to nuclear quantum effects.44
Figure 5: Comparison of hybrid HSE06 DFT AIMD simulations using the ISDF-CVT
and ISDF-QRCP methods, as well as the exact nested two-level SCF iteration procedure as the
reference, on bulk silicon Si64, including three coordinates (X, Y and Z directions) of (a)
position and (b) velocity of a specific Si atom, (c) potential energy and (d) relative energy
drift during MD steps.
Figure 6: The oxygen-oxygen radial distribution functions gOO(r) of the liquid water system
(H2O)64 at T = 295 K obtained from hybrid DFT AIMD simulations with the ISDF-CVT
and ISDF-QRCP methods, as well as the exact nested two-level SCF iteration procedure as the
reference.
5 Conclusion
In this work, we demonstrate that the interpolative separable density fitting decomposition
(ISDF) can be efficiently performed through a separated treatment of interpolation points
and interpolation vectors. We find that the centroidal Voronoi tessellation method (CVT)
provides an effective choice of interpolation points using only the electron density as the
input information. The resulting interpolation points are by design inhomogeneous
in real space, concentrated in regions where the electron density is significant, and
well separated from each other. These are all key ingredients for obtaining a low rank
decomposition that is accurate and a well conditioned set of interpolation vectors. We demonstrate
that the CVT-based ISDF decomposition can be an effective strategy for reducing the cost
of hybrid functional calculations for large systems. The CVT-based method achieves
accuracy similar to that obtained from QRCP, with significantly improved efficiency.
For a supercell containing 1000 silicon atoms on 2000 computational cores, the
QRCP-based method for finding the interpolation points takes 434.2 seconds, while the
CVT procedure only takes 3.2 seconds. Since the solution of the CVT method depends
continuously on the electron density, we also find that the CVT method produces a
smoother potential energy surface than the QRCP method in the context of ab initio
molecular dynamics simulation. Our analysis indicates that it might be possible to further
improve the quality of the interpolation points by taking into account the gradient
information in the weight vector. We also expect that the CVT-based strategy can be useful in
other contexts where the ISDF decomposition is applicable, such as ground state calculations
with rung-5 exchange-correlation functionals, and excited state calculations. These will be
explored in future work.
6 Acknowledgments
This work was partly supported by the National Science Foundation under grant No. DMS-1652330,
the DOE under grant No. DE-SC0017867, the DOE CAMERA project (L. L.), and
by the DOE Scientific Discovery through Advanced Computing (SciDAC) program (K. D.,
W. H. and L. L.). The authors thank the National Energy Research Scientific Computing
(NERSC) center and the Berkeley Research Computing (BRC) program at the University
of California, Berkeley for making computational resources available. We thank Anil Damle
and Robert Saye for useful discussions.
A Minimization of the approximate residual
The ISDF decomposition is a highly nonlinear process, and in general we cannot expect the
choice of interpolation points from CVT decomposition to maximally reduce the error of the
decomposition. Here we demonstrate that the choice of the interpolation points from the
centroidal Voronoi tessellation algorithm approximately minimizes the residual for the ISDF
decomposition, and hence provides a heuristic solution to the problem of finding interpolation
points.
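For simplicity we assume $\phi_i = \psi_i$, and hence each row of $Z$ is
$Z(\mathbf{r}) = [\phi_i(\mathbf{r})\phi_j(\mathbf{r})]_{i,j=1}^{N}$.
Now suppose we cluster all matrix rows of $Z$ into sub-collections $\{C_\mu\}_{\mu=1}^{N_\mu}$, and for each $C_\mu$ we
choose a representative matrix row $Z(\mathbf{r}_\mu)$. Then the error of the ISDF can be approximately
characterized as
$$ R = \sum_{\mu=1}^{N_\mu} \sum_{\mathbf{r}_k \in C_\mu} \left\| Z(\mathbf{r}_k) - \mathrm{Proj}_{\mathrm{span}\{Z(\mathbf{r}_\mu)\}} Z(\mathbf{r}_k) \right\|^2, \tag{10} $$
where the projection is defined according to the $L^2$ inner product as
$$ \mathrm{Proj}_{\mathrm{span}\{Z(\mathbf{r}_\mu)\}} Z(\mathbf{r}_k) = \frac{Z(\mathbf{r}_k) \cdot Z(\mathbf{r}_\mu)}{Z(\mathbf{r}_\mu) \cdot Z(\mathbf{r}_\mu)} Z(\mathbf{r}_\mu). \tag{11} $$
Let $\Phi$ be the $N_g \times N$ matrix with each row $\Phi(\mathbf{r}) = [\phi_i(\mathbf{r})]_{i=1}^{N}$, then the electron density $\rho(\mathbf{r})$
is equal to $\Phi(\mathbf{r}) \cdot \Phi(\mathbf{r})$. Using the relation
$$ Z(\mathbf{r}_\mu) \cdot Z(\mathbf{r}_\mu) = (\Phi(\mathbf{r}_\mu) \cdot \Phi(\mathbf{r}_\mu))^2 = \rho(\mathbf{r}_\mu)^2, \tag{12} $$
we have
$$ R = \sum_{\mu=1}^{N_\mu} \sum_{\mathbf{r}_k \in C_\mu} \rho(\mathbf{r}_k)^2 \left[ 1 - \frac{(\Phi(\mathbf{r}_k) \cdot \Phi(\mathbf{r}_\mu))^4}{\rho(\mathbf{r}_k)^2 \rho(\mathbf{r}_\mu)^2} \right] = \sum_{\mu=1}^{N_\mu} \sum_{\mathbf{r}_k \in C_\mu} \rho(\mathbf{r}_k)^2 \left[ 1 - \cos^4(\theta(\mathbf{r}_k, \mathbf{r}_\mu)) \right]. \tag{13} $$
Here $\theta(\mathbf{r}_k, \mathbf{r}_\mu)$ is the angle between the vectors $\Phi(\mathbf{r}_k)$ and $\Phi(\mathbf{r}_\mu)$. Using the fact that
$$ \rho(\mathbf{r}_k) \left[ 1 - \cos^4(\theta(\mathbf{r}_k, \mathbf{r}_\mu)) \right] \le 2\, \Phi(\mathbf{r}_k) \cdot \Phi(\mathbf{r}_k) \sin^2(\theta(\mathbf{r}_k, \mathbf{r}_\mu)) \le 2 \left\| \Phi(\mathbf{r}_k) - \Phi(\mathbf{r}_\mu) \right\|^2, \tag{14} $$
we have
$$ R \le 2 \sum_{\mu=1}^{N_\mu} \sum_{\mathbf{r}_k \in C_\mu} \rho(\mathbf{r}_k) \left\| \Phi(\mathbf{r}_k) - \Phi(\mathbf{r}_\mu) \right\|^2 \approx 2 \sum_{\mu=1}^{N_\mu} \sum_{\mathbf{r}_k \in C_\mu} \rho(\mathbf{r}_k) \left\| \nabla_{\mathbf{r}} \Phi(\mathbf{r}_\mu) \right\|^2 \left\| \mathbf{r}_k - \mathbf{r}_\mu \right\|^2. \tag{15} $$
If we further neglect the spatial inhomogeneity of the gradient $\nabla_{\mathbf{r}} \Phi(\mathbf{r})$, we arrive at the
minimization criterion for the centroidal Voronoi decomposition.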
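The per-point identities behind this derivation are easy to check numerically. The sketch below draws random orbital-value vectors (an assumption made purely for testing; real orbitals are not random) and compares three quantities for each pair of points: the squared projection residual of the pair-product rows, the closed form $\rho(\mathbf{r}_k)^2[1 - \cos^4\theta]$, and the upper bound $2\rho(\mathbf{r}_k)\|\Phi(\mathbf{r}_k) - \Phi(\mathbf{r}_\mu)\|^2$. The helper names `pair_row`, `residual_terms` and `check_bounds` are hypothetical.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pair_row(phi):
    """Row of Z at one grid point: all products phi_i * phi_j."""
    return [a * b for a in phi for b in phi]


def residual_terms(phi_k, phi_mu):
    """Return (direct residual, closed form, upper bound) for one point pair."""
    z_k, z_mu = pair_row(phi_k), pair_row(phi_mu)
    # Squared norm of Z(r_k) minus its projection onto span{Z(r_mu)}.
    coeff = dot(z_k, z_mu) / dot(z_mu, z_mu)
    direct = sum((a - coeff * b) ** 2 for a, b in zip(z_k, z_mu))
    # Closed form: rho_k^2 * (1 - cos^4 theta), with rho = Phi . Phi.
    rho_k, rho_mu = dot(phi_k, phi_k), dot(phi_mu, phi_mu)
    cos_sq = dot(phi_k, phi_mu) ** 2 / (rho_k * rho_mu)
    closed = rho_k ** 2 * (1.0 - cos_sq ** 2)
    # Upper bound: 2 * rho_k * ||Phi_k - Phi_mu||^2.
    bound = 2.0 * rho_k * sum((a - b) ** 2 for a, b in zip(phi_k, phi_mu))
    return direct, closed, bound


def check_bounds(trials=1000, n_orbitals=4, seed=0):
    """Largest |direct - closed| over random pairs, and whether the bound held."""
    rng = random.Random(seed)
    max_gap, bound_ok = 0.0, True
    for _ in range(trials):
        phi_k = [rng.uniform(-1.0, 1.0) for _ in range(n_orbitals)]
        phi_mu = [rng.uniform(-1.0, 1.0) for _ in range(n_orbitals)]
        direct, closed, bound = residual_terms(phi_k, phi_mu)
        max_gap = max(max_gap, abs(direct - closed))
        bound_ok = bound_ok and closed <= bound * (1.0 + 1e-12) + 1e-12
    return max_gap, bound_ok


max_gap, bound_ok = check_bounds()
print("largest |direct - closed|:", max_gap)
print("upper bound respected in all trials:", bound_ok)
```

The check covers only the exact steps; the final Taylor-expansion step, which introduces the gradient and the distance between grid points, is an approximation and is not tested here.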
References
(1) Szabo, A.; Ostlund, N. Modern Quantum Chemistry: Introduction to Advanced Electronic Structure Theory; McGraw-Hill: New York, 1989.
(2) Martin, R. Electronic Structure: Basic Theory and Practical Methods; Cambridge Univ. Pr.: West Nyack, NY, 2004.
(3) Murphy, R. B.; Beachy, M. D.; Friesner, R. A.; Ringnalda, M. N. Pseudospectral localized Møller-Plesset methods: Theory and calculation of conformational energies. J. Chem. Phys. 1995, 103, 1481.
(4) Reynolds, G.; Martinez, T. J.; Carter, E. A. Local weak pairs spectral and pseudospectral singles and doubles configuration interaction. J. Chem. Phys. 1996, 105, 6455.
(5) Beebe, N. H. F.; Linderberg, J. Simplifications in the generation and transformation of two-electron integrals in molecular calculations. Int. J. Quantum Chem. 1977, 12, 683.
(6) Koch, H.; Sánchez de Merás, A.; Pedersen, T. B. Reduced scaling in electronic structure calculations using Cholesky decompositions. J. Chem. Phys. 2003, 118, 9481–9484.
(7) Aquilante, F.; Pedersen, T. B.; Lindh, R. Low-cost evaluation of the exchange Fock matrix from Cholesky and density fitting representations of the electron repulsion integrals. J. Chem. Phys. 2007, 126, 194106.
(8) Manzer, S.; Horn, P. R.; Mardirossian, N.; Head-Gordon, M. Fast, accurate evaluation of exact exchange: The occ-RI-K algorithm. J. Chem. Phys. 2015, 143, 024113.
(9) Ren, X.; Rinke, P.; Blum, V.; Wieferink, J.; Tkatchenko, A.; Sanfilippo, A.; Reuter, K.; Scheffler, M. Resolution-of-Identity Approach to Hartree-Fock, Hybrid Density Functionals, RPA, MP2 and GW with Numeric Atom-Centered Orbital Basis Functions. New J. Phys. 2012, 14, 053020.
(10) Weigend, F. A fully direct RI-HF algorithm: Implementation, optimised auxiliary basis sets, demonstration of accuracy and efficiency. Phys. Chem. Chem. Phys. 2002, 4, 4285–4291.
(11) Parrish, R. M.; Hohenstein, E. G.; Martínez, T. J.; Sherrill, C. D. Tensor hypercontraction. II. Least-squares renormalization. J. Chem. Phys. 2012, 137, 224106.
(12) Parrish, R. M.; Hohenstein, E. G.; Martínez, T. J.; Sherrill, C. D. Discrete variable representation in electronic structure theory: Quadrature grids for least-squares tensor hypercontraction. J. Chem. Phys. 2013, 138, 194107.
(13) Goedecker, S. Linear scaling electronic structure methods. Rev. Mod. Phys. 1999, 71, 1085–1123.
(14) Bowler, D. R.; Miyazaki, T. O(N) methods in Electronic Structure Calculations. Rep. Prog. Phys. 2012, 75, 036503.
(15) Guidon, M.; Hutter, J.; Vandevondele, J. Auxiliary density matrix methods for Hartree-Fock exchange calculations. J. Chem. Theory Comput. 2010, 6, 2348–2364.
(16) Lu, J.; Ying, L. Compression of the Electron Repulsion Integral Tensor in Tensor Hypercontraction Format with Cubic Scaling Cost. J. Comput. Phys. 2015, 302, 329–335.
(17) Golub, G. H.; Van Loan, C. F. Matrix computations, 4th ed.; Johns Hopkins Univ. Press: Baltimore, 2013.
(18) Lu, J.; Thicke, K. Cubic scaling algorithms for RPA correlation using interpolative separable density fitting. J. Comput. Phys. 2017, 351, 187–202.
(19) Lin, L.; Xu, Z.; Ying, L. Adaptively Compressed Polarizability Operator for Accelerating Large Scale Ab Initio Phonon Calculations. Multiscale Model. Simul. 2017, 15, 29–55.
(20) Hu, W.; Lin, L.; Yang, C. Interpolative Separable Density Fitting Decomposition for Accelerating Hybrid Density Functional Calculations With Applications to Defects in Silicon. J. Chem. Theory Comput. 2017, accepted.
(21) Aurenhammer, F. Voronoi diagrams - a survey of a fundamental geometric data structure. ACM Computing Surveys (CSUR) 1991, 23, 345–405.
(22) Du, Q.; Gunzburger, M.; Ju, L.; Wang, X. Centroidal Voronoi tessellation algorithms for image compression, segmentation, and multichannel restoration. Journal of Mathematical Imaging and Vision 2006, 24, 177–194.
(23) Ogniewicz, R. L.; Kübler, O. Hierarchic Voronoi skeletons. Pattern Recognition 1995, 28, 343–359.
(24) Becke, A. D. A multicenter numerical integration scheme for polyatomic molecules. The Journal of Chemical Physics 1988, 88, 2547–2553.
(25) Chan, T. F.; Hansen, P. C. Some Applications of the Rank Revealing QR Factorization. SIAM J. Sci. Statist. Comput. 1992, 13, 727–741.
(26) MacQueen, J. Some methods for classification and analysis of multivariate observations. Proc. of the Fifth Berkeley Symp. On Math. Stat. and Prob. 1967; pp 281–297.
(27) Medvedev, N. The algorithm for three-dimensional Voronoi polyhedra. Journal of Computational Physics 1986, 67, 223–229.
(28) Lloyd, S. Least squares quantization in PCM. IEEE Transactions on Information Theory 1982, 28, 129–137.
(29) Lin, L.; Lu, J.; Ying, L.; E, W. Adaptive Local Basis Set for Kohn-Sham Density Functional Theory in a Discontinuous Galerkin Framework I: Total Energy Calculation. J. Comput. Phys. 2012, 231, 2140–2154.
(30) Hu, W.; Lin, L.; Yang, C. DGDFT: A Massively Parallel Method for Large Scale Density Functional Theory Calculations. J. Chem. Phys. 2015, 143, 124110.
(31) Hu, W.; Lin, L.; Yang, C. Edge Reconstruction in Armchair Phosphorene Nanoribbons Revealed by Discontinuous Galerkin Density Functional Theory. Phys. Chem. Chem. Phys. 2015, 17, 31397–31404.
(32) Banerjee, A. S.; Lin, L.; Hu, W.; Yang, C.; Pask, J. E. Chebyshev Polynomial Filtered Subspace Iteration in the Discontinuous Galerkin Method for Large-Scale Electronic Structure Calculations. J. Chem. Phys. 2016, 145, 154101.
(33) Zhang, G.; Lin, L.; Hu, W.; Yang, C.; Pask, J. E. Adaptive Local Basis Set for Kohn-Sham Density Functional Theory in a Discontinuous Galerkin Framework II: Force, Vibration, and Molecular Dynamics Calculations. J. Comput. Phys. 2017, 335, 426–443.
(34) Hartwigsen, C.; Goedecker, S.; Hutter, J. Relativistic Separable Dual-Space Gaussian Pseudopotentials from H to Rn. Phys. Rev. B 1998, 58, 3641.
(35) Heyd, J.; Scuseria, G. E.; Ernzerhof, M. Erratum: "Hybrid functionals based on a screened Coulomb potential" [J. Chem. Phys. 118, 8207 (2003)]. J. Chem. Phys. 2006, 124, 219906.
(36) Hu, W.; Lin, L.; Yang, C. Projected Commutator DIIS Method for Accelerating Hybrid Functional Electronic Structure Calculations. J. Chem. Theory Comput. 2017, accepted.
(37) Arthur, D.; Vassilvitskii, S. How slow is the k-means method? Proceedings of the twenty-second annual symposium on Computational geometry. 2006; pp 144–153.
(38) Arthur, D.; Vassilvitskii, S. k-means++: The advantages of careful seeding. Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms. 2007; pp 1027–1035.
(39) Lin, L. Adaptively Compressed Exchange Operator. J. Chem. Theory Comput. 2016, 12, 2242–2249.
(40) Dawson, W.; Gygi, F. Performance and Accuracy of Recursive Subspace Bisection for Hybrid DFT Calculations in Inhomogeneous Systems. J. Chem. Theory Comput. 2015, 11, 4655–4663.
(41) DiStasio, R. A., Jr.; Santra, B.; Li, Z.; Wu, X.; Car, R. The Individual and Collective effects of Exact Exchange and Dispersion Interactions on the Ab Initio Structure of Liquid Water. J. Chem. Phys. 2014, 141, 084502.
(42) Nosé, S. A Unified Formulation of the Constant Temperature Molecular Dynamics Methods. J. Chem. Phys. 1984, 81, 511.
(43) Hoover, W. G. Canonical Dynamics: Equilibrium Phase-Space Distributions. Phys. Rev. A 1985, 31, 1695.
(44) Morrone, J.; Car, R. Nuclear quantum effects in water. Phys. Rev. Lett. 2008, 101, 017801.
... The question we want to address is very closely related to a widely studied in computational chemistry, namely tensor hypercontraction (THC) [14,15,[21][22][23][24]. In fact, that technique was originally introduced in order to reduce the computation requirements related to evaluating the ERI tensor, similar to our purpose. ...
... , N . In that context, M is called the THC-rank, and through numerous studies it has been verified that it typically scales linearly in N [21][22][23][24]. ...
... In fact, we can simply bound the latter expression byε V ≤ ϵ V N 2 ∥V ijkl ∥ 2 /∥V ∥ (see App. C). Even though we do not expect this bound to be tight and ratherε V ≈ ϵ V nevertheless, as suggested by Figs. 1, 2 and other empirical studies [21][22][23][24], ϵ V goes down quickly when increasing the THC rank M , and thereforeε V can be sufficiently small by choosing a proper M ≪ N 2 . Additionally, in case the quantum algorithm is used to prepare the ground state of the original Hamiltonian (by, for instance, adiabatic evolution), we already gave evidence in the Hydrogen chain, for which we can benchmark the final result by means of DMRG computations, that M can be kept as small as O(N ) (see Section II). ...
Preprint
We propose a quantum algorithm to simulate the dynamics in quantum chemistry problems. It is based on adding fresh qubits at each Trotter step, which enables a simpler implementation of the dynamics in the extended system. After each step, the extra qubits are recycled, so that the whole process accurately approximates the correct unitary evolution. A key ingredient of the approach is an isometry that maps a simple, diagonal Hamiltonian in the extended system to the original one. We give a procedure to compute this isometry, while minimizing the number of extra qubits required. We estimate the error at each time step, as well as the number of gates, which scales as O(N2)O(N^2), where N is the number of orbitals. We illustrate our results with two examples: the Hydrogen chain and the FeMoCo molecule. In the Hydrogen chain we observe that the error scales in the same way as the Trotter error. For FeMoCo, we estimate the number of gates in a fault-tolerant setup.
... In this work, we present an improved K-means clustering algorithm [13][14][15] as an alternative to the QRCP procedure 10,11 in ISDF, 12 ...
... In order to accelerate the DFPT calculations within plane waves, we introduce the ACP 8,9 algorithm to reduce the computational complexity of DFPT calculations from O(N 4 ) to O(N 3 ) by combining the ISDF 10-12 algorithm. We present the improved K-means clustering algorithm [13][14][15] to select the interpolation points in ISDF, which offers a much cheaper quadratic-scaling alternative to the expensive cubic-scaling QRCP algorithm to accelerate the ACP based DFPT calculations within plane waves as shown in Figure 1. ...
... The orbitals pairs can be expanded approximately by a set of auxiliary basis functions In this study, we adopt a much faster interpolation points scheme based on the K-means clustering method with a computational complexity of O(N 2 e ), [13][14][15] significantly accelerating the computing process of the ACP algorithm. The K-means clustering algorithm, being an unsupervised machine learning method, is specifically designed to address optimization ...
... The transposed Khatri-Rao product of the Kohn-Sham orbitals can be expressed by z ij = ϕ i (r)ψ * j (r) ≈ Nµ µ=1 ζ µ (r)ϕ i (r µ )ψ * j (r µ ), where {r µ } Nµ µ=1 are a set of interpolation points from grid points {r i } Nr i=1 in real space, N µ is proportional to N ϕ N ψ (N µ = t N ϕ N ψ , t is the rank truncation constant), and ζ µ (r) is the auxiliary basis functions (ABFs). The ISDF decomposition has already been successfully applied in several types of multi-center integrals in the Kohn-Sham DFT calculations within Gaussian-type orbitals (GTOs), numerical atomic orbitals (NAOs), and plane-wave (PW) basis sets, such as hybrid DFT calculations, [31][32][33][34] RPA correlation, 30 quantum Monte Carlo (QMC) simulations, 35 TDDFT, 36 MP2, 37 GW 38,39 and BSE 40 calculations, for molecular and periodic systems. ...
... The standard approach is the randomized QRCP as mentioned previously, 29 which is accurate but expensive in the ISDF decomposition process. 31 Another approach is the centroidal Voronoi tessellation (CVT) algorithm proposed by Dong et al., 32 which only requires the information from the electron density in the DFT calculations. The CVT method can be performed easily by K-means clustering algorithm, a classical unsupervised machine learning algorithm. ...
... As K-means clustering converges only to a local optimal solution, the accuracy of K-means clustering is heavily reliant on the selection of initial centroids and the definition of distance and centroids. 32,34,41,42 For realvalued orbitals, recent numerical results [31][32][33][34] have demonstrated that K-means clustering algorithms with relatively simple weight definitions could reduce the computation cost and yield reasonably high accuracy. ...
... This low-rank approximation inherently maintains cubic complexity and, when integrated into plane wave basis set algorithms, effectively addresses the aforementioned prefactor issue of magnitudes exceeding 10 3 . 27,[29][30][31][32][33][34][35][36][37] With the incorporation of the ISDF algorithm, implementations of both low-scaling EXX and RPA algorithms under a plane wave basis set no longer suffer from the challenges posed by high prefactor values which hamper memory and computational efficiency. ...
... In addition to the QRCP algorithm, another approach is the centroidal Voronoi tessellation (CVT) algorithm proposed by Dong et al., 30 which can be performed easily by K-means clustering algorithm, a classical unsupervised machine learning algorithm. Such K-means clustering algorithm aims at dividing real space grid points into some clusters. ...
... Compared to QRCP, the K-means algorithm substantially reduces both memory requirements and computational complexity. 27,30,32,35,[41][42][43][44][45] Since K-means clustering only converges to a local optimal solution, the accuracy of K-means clustering strongly depends on the selection of initial centroids and the definition of weight function and centroids. 35,41 In our current work, we introduce a novel weight function that is concurrently applicable to both EXX and RPA calculations. ...
... The so-called interpolation points {r k } are selected with randomized QR decomposition or K-means clustering algorithms and the ISDF-equivalent to the ABFs are then obtained by least-square optimization. 38 The ISDF method has been used to accelerate different electronic structure methods, including HF or hybrid functionals, 37,[39][40][41][42] MP2, 43 RPA, 36 GW, 44 the Bethe-Salpeter equation 45,46 (BSE), or quantum Monte Carlo, 47 and has been implemented in plane-wave, real-space, GTO, and NAO basis set codes. 38 Recently, Duchemin and Blase 48 proposed a separable real-space RI scheme (RI-RS), where ISDF-like expansion coefficients were obtained over a compact set of real-space points by fitting them to the RI-V coefficients. ...
Article
Full-text available
Four-center two-electron Coulomb integrals routinely appear in electronic structure algorithms. The resolution-of-the-identity (RI) is a popular technique to reduce the computational cost for the numerical evaluation of these integrals in localized basis-sets codes. Recently, Duchemin and Blase proposed a separable RI scheme [J. Chem. Phys. 150, 174120 (2019)], which preserves the accuracy of the standard global RI method with the Coulomb metric and permits the formulation of cubic-scaling random phase approximation (RPA) and GW approaches. Here, we present the implementation of a separable RI scheme within an all-electron numeric atom-centered orbital framework. We present comprehensive benchmark results using the Thiel and the GW100 test set. Our benchmarks include atomization energies from Hartree–Fock, second-order Møller–Plesset (MP2), coupled-cluster singles and doubles, RPA, and renormalized second-order perturbation theory, as well as quasiparticle energies from GW. We found that the separable RI approach reproduces RI-free HF calculations within 9 meV and MP2 calculations within 1 meV. We have confirmed that the separable RI error is independent of the system size by including disordered carbon clusters up to 116 atoms in our benchmarks.
Article
Full-text available
The simulation of chemistry is among the most promising applications of quantum computing. However, most prior work exploring algorithms for block encoding, time evolving, and sampling in the eigenbasis of electronic structure Hamiltonians has either focused on modeling finite-sized systems, or has required a large number of plane-wave basis functions. In this work, we extend methods for quantum simulation with Bloch orbitals constructed from symmetry-adapted atom-centered orbitals so that one can model periodic ab initio Hamiltonians using only a modest number of basis functions. We focus on adapting existing algorithms based on combining qubitization with tensor factorizations of the Coulomb operator. Significant modifications of those algorithms are required to obtain an asymptotic speedup leveraging translational (or, more broadly, Abelian) symmetries. We implement block encodings using known tensor factorizations and a new Bloch orbital form of tensor hypercontraction. Finally, we estimate the resources required to deploy our algorithms to classically challenging model materials relevant to the chemistry of lithium nickel oxide battery cathodes within the surface code. We find that even with these improvements, the quantum runtime of these algorithms is on the order of thousands of days and further algorithmic improvements are required to make realistic quantum simulation of materials practical.
Article
We present a low-scaling algorithm for the random phase approximation (RPA) with k-point sampling in the framework of tensor hypercontraction (THC) for electron repulsion integrals (ERIs). The THC factorization is obtained via a revised interpolative separable density fitting (ISDF) procedure with a momentum-dependent auxiliary basis for generic single-particle Bloch orbitals. Our formulation does not require preoptimized interpolating points or auxiliary bases, and the accuracy is systematically controlled by the number of interpolating points. The resulting RPA algorithm scales linearly with the number of k-points and cubically with the system size without any assumption on sparsity or locality of orbitals. The errors of ERIs and RPA energy show rapid convergence with respect to the size of the THC auxiliary basis, suggesting a promising and robust direction to construct efficient algorithms of higher order many-body perturbation theories for large-scale systems.
Article
Hybrid density functional theory (DFT) remains intractable for large periodic systems due to the demanding computational cost of exact exchange. We apply the tensor hypercontraction (THC) (or interpolative separable density fitting) approximation to periodic hybrid DFT calculations with Gaussian-type orbitals using the Gaussian plane wave approach. This is done to lower the computational scaling with respect to the number of basis functions (N) and k-points (Nk) at a fixed system size. Additionally, we propose an algorithm to fit only occupied orbital products via THC (i.e., a set of points, NISDF) to further reduce computation time and memory usage. This algorithm has linear scaling cost with k-points, no explicit dependence of NISDF on basis set size, and overall cubic scaling with unit cell size. Significant speedups and reduced memory usage may be obtained for moderately sized k-point meshes, with additional gains for large k-point meshes. Adequate accuracy can be obtained using THC-oo-K for self-consistent calculations. We perform illustrative hybrid density function theory calculations on the benzene crystal in the basis set and thermodynamic limits to highlight the utility of this algorithm.
Article
Full-text available
The commutator direct inversion of the iterative subspace (commutator DIIS or C-DIIS) method developed by Pulay is an efficient and the most widely used scheme in quantum chemistry to accelerate the convergence of self consistent field (SCF) iterations in Hartree-Fock theory and Kohn-Sham density functional theory. The C-DIIS method requires the explicit storage of the density matrix, the Fock matrix and the commutator matrix. Hence the method can only be used for systems with a relatively small basis set, such as the Gaussian basis set. We develop a new method that enables the C-DIIS method to be efficiently employed in electronic structure calculations with a large basis set such as planewaves for the first time. The key ingredient is the projection of both the density matrix and the commutator matrix to an auxiliary matrix called the gauge-fixing matrix. The resulting projected commutator-DIIS method (PC-DIIS) only operates on matrices of the same dimension as the that consists of Kohn-Sham orbitals. The cost of the method is comparable to that of standard charge mixing schemes used in large basis set calculations. The PC-DIIS method is gauge-invariant, which guarantees that its performance is invariant with respect to any unitary transformation of the Kohn-Sham orbitals. We demonstrate that the PC-DIIS method can be viewed as an extension of an iterative eigensolver for nonlinear problems. We use the PC-DIIS method for accelerating Kohn-Sham density functional theory calculations with hybrid exchange-correlation functionals, and demonstrate its superior performance compared to the commonly used nested two-level SCF iteration procedure.
Article
Full-text available
We present a new efficient way to perform hybrid density functional theory (DFT) based electronic structure calculation. The new method uses an interpolative separable density fitting (ISDF) procedure to construct a set of numerical auxiliary basis vectors and a compact approximation of the matrix consisting of products of occupied orbitals represented in a large basis set such as the planewave basis. Such an approximation allows us to reduce the number of Poisson solves from \Or(N_{e}^2) to \Or(N_{e}) when we apply the exchange operator to occupied orbitals in an iterative method for solving the Kohn-Sham equations, where NeN_{e} is the number of electrons in the system to be studied. We show that the ISDF procedure can be carried out in \Or(N_{e}^3) operations, with a much smaller pre-constant compared to methods used in existing approaches. When combined with the recently developed adaptively compressed exchange (ACE) operator formalism, which reduces the number of times the exchange operator needs to be updated, the resulting ACE-ISDF method significantly reduces the computational cost \REV{associated with the exchange operator} by nearly two orders of magnitude compared to existing approaches for a large silicon system with 1000 atoms. We demonstrate that the ACE-ISDF method can produce accurate energies and forces for insulating and metallic systems, and that it is possible to obtain converged hybrid functional calculation results for a 1000-atom bulk silicon within 10 minutes on 2000 computational cores. We also show that ACE-ISDF can scale to 8192 computational cores for a 4096-atom bulk silicon system. We use the ACE-ISDF method to geometrically optimize a 1000-atom silicon system with a vacancy defect using the HSE06 functional and computes its electronic structure.
We present a new cubic scaling algorithm for the calculation of the RPA correlation energy. Our scheme splits up the dependence between the occupied and virtual orbitals in $\chi^0$ by use of Cauchy's integral formula. This introduces an additional integral to be carried out, for which we provide a geometrically convergent quadrature rule. Our scheme also uses the newly developed Interpolative Separable Density Fitting algorithm to further reduce the computational cost in a way analogous to that of the Resolution of Identity method.
The Discontinuous Galerkin (DG) electronic structure method employs an adaptive local basis set to solve the equations of density functional theory in a discontinuous Galerkin framework. The methodology is implemented in the Discontinuous Galerkin Density Functional Theory (DGDFT) code for large-scale parallel electronic structure calculations. In DGDFT, the basis is generated on-the-fly to capture the local material physics, and can systematically attain chemical accuracy with only a few tens of degrees of freedom per atom. Hence, DGDFT combines the key advantage of planewave basis sets in terms of systematic improvability with that of localized basis sets in reducing basis size. A central issue for large-scale calculations, however, is the computation of the electron density from the discretized Hamiltonian in an efficient and scalable manner. We show in this work how Chebyshev polynomial filtered subspace iteration (CheFSI) can be used to address this issue and push the envelope in large-scale materials simulations in a DG framework. In particular, this strategy makes it possible to attack complex materials problems involving many thousands of atoms routinely. We describe how the subspace filtering steps can be done in an efficient and scalable manner using a two-dimensional parallelization scheme, thanks to properties of the DG basis set and the DG Hamiltonian matrix. The on-the-fly nature of the basis requires additional care in carrying out the subspace iterations. We demonstrate the parallel scalability of the DGDFT-CheFSI approach using large-scale two-dimensional graphene sheets and bulk three-dimensional lithium-ion electrolyte systems as examples. Employing 55,296 computational cores, the time per self-consistent field iteration for a sample of the bulk 3D electrolyte containing 8,586 atoms is 90 seconds, and the time for a graphene sheet containing 11,520 atoms is 75 seconds.
Nosé has modified Newtonian dynamics so as to reproduce both the canonical and the isothermal-isobaric probability densities in the phase space of an N-body system. He did this by scaling time (with $s$) and distance (with $V^{1/D}$ in $D$ dimensions) through Lagrangian equations of motion. The dynamical equations describe the evolution of these two scaling variables and their two conjugate momenta $p_s$ and $p_v$. Here we develop a slightly different set of equations, free of time scaling. We find the dynamical steady-state probability density in an extended phase space with variables $x$, $p_x$, $V$, $\dot{\epsilon}$, and $\zeta$, where the $x$ are reduced distances and the two variables $\dot{\epsilon}$ and $\zeta$ act as thermodynamic friction coefficients. We find that these friction coefficients have Gaussian distributions. From the distributions the extent of small-system non-Newtonian behavior can be estimated. We illustrate the dynamical equations by considering their application to the simplest possible case, a one-dimensional classical harmonic oscillator.
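The harmonic-oscillator case mentioned above can be sketched in a few lines. The example below covers only the thermostat (canonical) part, with a single friction coefficient $\zeta$ and no volume variable. It uses a commonly quoted form of the equations of motion with unit mass and unit frequency, a thermostat "mass" `Q`, and a generic fourth-order Runge-Kutta integrator; all of these choices, and the function names, are assumptions made for illustration. The auxiliary variable `eta` (the time integral of `zeta`) is tracked only so that the conserved extended energy can be monitored.

```python
import numpy as np

def nose_hoover_rhs(state, T=1.0, Q=1.0):
    """Time derivative of (q, p, zeta, eta) for a thermostatted 1D oscillator."""
    q, p, zeta, eta = state
    return np.array([
        p,                  # dq/dt
        -q - zeta * p,      # dp/dt: harmonic force plus thermostat friction
        (p * p - T) / Q,    # dzeta/dt: friction grows when kinetic energy is too high
        zeta,               # deta/dt: bookkeeping variable for the conserved energy
    ])

def rk4_step(state, dt, T=1.0, Q=1.0):
    """One classical fourth-order Runge-Kutta step."""
    k1 = nose_hoover_rhs(state, T, Q)
    k2 = nose_hoover_rhs(state + 0.5 * dt * k1, T, Q)
    k3 = nose_hoover_rhs(state + 0.5 * dt * k2, T, Q)
    k4 = nose_hoover_rhs(state + dt * k3, T, Q)
    return state + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def extended_energy(state, T=1.0, Q=1.0):
    """Quantity conserved by the exact dynamics; useful as an integration check."""
    q, p, zeta, eta = state
    return 0.5 * p * p + 0.5 * q * q + 0.5 * Q * zeta * zeta + T * eta

def integrate(state0, dt, n_steps, T=1.0, Q=1.0):
    """Return the trajectory as an array of shape (n_steps + 1, 4)."""
    traj = [np.asarray(state0, dtype=float)]
    for _ in range(n_steps):
        traj.append(rk4_step(traj[-1], dt, T, Q))
    return np.array(traj)

trajectory = integrate([1.0, 0.0, 0.0, 0.0], dt=0.002, n_steps=2000)
```

Monitoring `extended_energy` along the trajectory is a simple way to check the time step; a drift in this quantity signals integration error rather than thermostat action.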
Phonon calculations based on first principle electronic structure theory, such as the Kohn-Sham density functional theory, have wide applications in physics, chemistry and material science. The computational cost of first principle phonon calculations typically scales steeply as $\mathcal{O}(N_e^4)$, where $N_e$ is the number of electrons in the system. In this work, we develop a new method to reduce the computational complexity of computing the full dynamical matrix, and hence the phonon spectrum, to $\mathcal{O}(N_e^3)$. The key concept for achieving this is to compress the polarizability operator adaptively with respect to the perturbation of the potential due to the change of the atomic configuration. Such adaptively compressed polarizability operator (ACP) allows accurate computation of the phonon spectrum. The reduction of complexity only weakly depends on the size of the band gap, and our method is applicable to insulators as well as semiconductors with small band gaps. We demonstrate the effectiveness of our method using one-dimensional and two-dimensional model problems.
The Fock exchange operator plays a central role in modern quantum chemistry. The large computational cost associated with the Fock exchange operator hinders Hartree-Fock calculations and Kohn-Sham density functional theory calculations with hybrid exchange-correlation functionals, even for systems consisting of hundreds of atoms. We develop the adaptively compressed exchange operator (ACE) formulation, which greatly reduces the computational cost associated with the Fock exchange operator without loss of accuracy. The ACE formulation does not depend on the size of the band gap, and thus can be applied to insulating, semiconducting as well as metallic systems. In an iterative framework for solving Hartree-Fock-like systems such as in planewave based methods, the ACE formulation only requires moderate modification of the code. The ACE formulation can also be advantageous for other types of basis sets, especially when the storage cost of the exchange operator is expensive. Numerical results indicate that the ACE formulation can become advantageous even for small systems with tens of atoms. In particular, the cost of each self-consistent field iteration for the electron density in the ACE formulation is only marginally larger than that of the generalized gradient approximation (GGA) calculation, and thus offers orders of magnitude speedup for Hartree-Fock-like calculations.
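The low-rank construction behind ACE can be sketched with dense linear algebra. The example assumes the commonly described recipe: apply the exchange operator to the occupied orbitals once to get $W = V_X \Psi$, form $M = \Psi^* W$ (negative definite), factor $-M = L L^*$, and set $\xi = W L^{-*}$ so that $V_{ACE} = -\xi \xi^*$ agrees with $V_X$ on the occupied orbitals. In the toy usage, a random negative definite matrix stands in for the true exchange operator, and the function names are hypothetical.

```python
import numpy as np

def ace_vectors(W, Psi):
    """Return xi such that V_ACE = -xi @ xi^H reproduces W = V_X @ Psi on Psi.

    W   : (n_grid, n_occ) result of applying the exchange operator to Psi
    Psi : (n_grid, n_occ) occupied orbitals (columns)
    """
    M = Psi.conj().T @ W
    M = 0.5 * (M + M.conj().T)        # symmetrize against rounding error
    L = np.linalg.cholesky(-M)        # -M = L L^H, valid because M is negative definite
    # xi = W L^{-H}, computed as (L^{-1} W^H)^H
    return np.linalg.solve(L, W.conj().T).conj().T

def apply_ace(xi, X):
    """Apply the compressed operator to a block of vectors without forming it."""
    return -(xi @ (xi.conj().T @ X))

# Toy usage with a stand-in operator (an assumption, not a real exchange operator).
rng = np.random.default_rng(0)
n_grid, n_occ = 30, 4
A = rng.standard_normal((n_grid, n_grid))
VX = -(A @ A.T) - np.eye(n_grid)                      # symmetric negative definite
Psi, _ = np.linalg.qr(rng.standard_normal((n_grid, n_occ)))
W = VX @ Psi
xi = ace_vectors(W, Psi)
mismatch = np.abs(apply_ace(xi, Psi) - W).max()
```

Once `xi` is built, applying the compressed operator costs two thin matrix products per block of vectors, which is why the expensive exchange application only needs to be repeated when the orbitals defining the operator are updated.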