Interpolative Separable Density Fitting through
Centroidal Voronoi Tessellation With
Applications to Hybrid Functional Electronic
Structure Calculations
Kun Dong,∗,† Wei Hu,∗,‡ and Lin Lin∗,¶,‡
Center for Applied Mathematics, Cornell University, Ithaca, New York 14853, United
States, Computational Research Division, Lawrence Berkeley National Laboratory,
Berkeley, California 94720, United States, and Department of Mathematics, University of
California, Berkeley, California 94720, United States
E-mail: kd383@cornell.edu; whu@lbl.gov; linlin@math.berkeley.edu
Abstract
The recently developed interpolative separable density fitting (ISDF) decomposition
is a powerful way to compress the redundant information in the set of orbital
pairs, and has been used to accelerate quantum chemistry calculations in a number
of contexts. The key ingredient of the ISDF decomposition is to select a set of non-
uniform grid points, so that the values of the orbital pairs evaluated at such grid points
can be used to accurately interpolate those evaluated at all grid points. The set of non-
uniform grid points, called the interpolation points, can be automatically selected by a
arXiv:1711.01531v1 [physics.comp-ph] 5 Nov 2017
QR factorization with column pivoting (QRCP) procedure. This is the computationally
most expensive step in the construction of the ISDF decomposition. In this work, we
propose a new approach to find the interpolation points based on the centroidal Voronoi
tessellation (CVT) method, which offers a much less expensive alternative to the QRCP
procedure when ISDF is used in the context of hybrid functional electronic structure
calculations. The CVT method only uses information from the electron density, and
can be efficiently implemented using a K-Means algorithm. We find that this new
method achieves comparable accuracy to the ISDF-QRCP method, at a cost that is
negligible in the overall hybrid functional calculations. For instance, for a system
containing 1000 silicon atoms simulated using the HSE06 hybrid functional on 2000
computational cores, the QRCP-based method takes 434.2 seconds to find the interpolation
points, while the CVT procedure takes only 3.2 seconds. We also find that the ISDF-CVT
method enhances the smoothness of the potential energy surface in the context of
ab initio molecular dynamics (AIMD) simulations with hybrid functionals.
1 Introduction
Orbital pairs of the form $\{\phi_i(\mathbf{r})\psi_j(\mathbf{r})\}_{i,j=1}^{N}$, where $\phi_i, \psi_j$ are
single-particle orbitals, appear ubiquitously in quantum chemistry. A few examples include the
Fock exchange operator, the MP2 amplitude, and the polarizability operator.1,2 When $N$ is
proportional to the number of electrons $N_e$ in the system, the total number of orbital pairs
is $N^2 \sim O(N_e^2)$. On the other hand, the number of degrees of freedom needed to resolve
all orbital pairs on a dense grid is only $O(N_e)$. Hence as $N_e$ becomes large, the set of all
orbital pairs contains apparently redundant information. Many algorithms have been developed
in the past few decades to compress this redundant information and to design more efficient
numerical algorithms. Pseudospectral decomposition,3,4 Cholesky decomposition,5–8 density
fitting (DF) or resolution of identity (RI),9,10 and tensor hypercontraction (THC)11,12 are only
a few examples. When the single-particle orbitals $\phi_i, \psi_j$ are already localized functions,
“local methods” or “linear scaling methods”13–15 can be applied to construct such a
decomposition with a cost that scales linearly with respect to $N_e$. Otherwise, the storage cost
of the matrix representing all orbital pairs on a grid is already $O(N_e^3)$, and the computational
cost of compressing the orbital pairs is then typically $O(N_e^4)$.
Recently, Lu and Ying developed a new decomposition called the interpolative separable
density fitting (ISDF),16 which takes the following form
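\[
\phi_i(\mathbf{r})\psi_j(\mathbf{r}) \approx \sum_{\mu=1}^{N_\mu} \zeta_\mu(\mathbf{r})
\left( \phi_i(\hat{\mathbf{r}}_\mu)\psi_j(\hat{\mathbf{r}}_\mu) \right). \tag{1}
\]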
For a given $\mathbf{r}$, if we view $\phi_i(\mathbf{r})\psi_j(\mathbf{r})$ as a row of the matrix
$\{\phi_i\psi_j\}$ discretized on a dense grid, then the ISDF decomposition states that all such
matrix rows can be approximately expanded as a linear combination of the matrix rows associated
with a selected set of interpolation points $\{\hat{\mathbf{r}}_\mu\}_{\mu=1}^{N_\mu}$. The coefficients
of this linear combination, or interpolating vectors, are denoted by
$\{\zeta_\mu(\mathbf{r})\}_{\mu=1}^{N_\mu}$. Here $N_\mu$ can be interpreted as the numerical rank of
the ISDF decomposition. Compared to the standard density fitting method, the three-tensor
$(\phi_i(\hat{\mathbf{r}}_\mu)\psi_j(\hat{\mathbf{r}}_\mu))$ with three indices $i, j, \mu$ takes a
separable form. This reduces the storage cost of the decomposed tensor from $O(N_e^3)$ to
$O(N_e^2)$, and the computational cost from $O(N_e^4)$ to $O(N_e^3)$. Note that if the interpolation
points $\{\hat{\mathbf{r}}_\mu\}_{\mu=1}^{N_\mu}$ are chosen to be on a uniform grid, then the ISDF
decomposition reduces to the pseudospectral decomposition, where $N_\mu \sim O(N_e)$ but with a
large preconstant. For instance, the pseudospectral decomposition can be highly inefficient for
molecular systems, where the grid points in the vacuum contribute almost nothing to the
representation of the orbital pairs. On the other hand, by selecting the interpolation points
carefully, e.g. through a randomized QR factorization with column pivoting (QRCP) procedure,17
the number of interpolation points can be significantly reduced.
The QRCP-based ISDF decomposition has been applied to accelerate a number of applications
such as two-electron integral computation,16 correlation energy in the random phase
approximation,18 density functional perturbation theory,19 and hybrid density functional
calculations.20 For example, in the context of an iterative solver for hybrid density functional
calculations, the Fock exchange operator $V_X$ defined in terms of a set of orbitals $\{\phi_i\}$
needs to be repeatedly applied to another set of Kohn-Sham orbitals $\{\psi_j\}$:
\[
\left( V_X[\{\phi_i\}]\psi_j \right)(\mathbf{r}) = -\sum_{i=1}^{N_e} \phi_i(\mathbf{r})
\int K(\mathbf{r},\mathbf{r}')\,\phi_i(\mathbf{r}')\,\psi_j(\mathbf{r}')\,\mathrm{d}\mathbf{r}', \tag{2}
\]
where $K(\mathbf{r},\mathbf{r}')$ is the kernel for the Coulomb or the screened Coulomb operator. The
integration in Eq. (2) is often carried out by solving Poisson-like equations, using e.g. a fast
Fourier transform (FFT) method, and the computational cost is $O(N_e^3)$ with a large preconstant.
This is typically the most time consuming component in hybrid functional calculations, and
can be accelerated by the ISDF decomposition for the orbital pairs {ϕiψj}.
In Ref. 16, the interpolation points and the interpolation vectors are determined simultaneously
through a randomized QR factorization with column pivoting (QRCP) applied to
$\{\phi_i(\mathbf{r})\psi_j(\mathbf{r})\}$ directly. We recently found that the randomized QRCP
procedure has $O(N_e^3)$ complexity but with a relatively large preconstant, and may not be
competitive enough when used repeatedly. To overcome this difficulty, we proposed a different
approach in Ref. 20 that determines the two parts separately and reduces the computational cost.
We use the relatively expensive randomized QRCP procedure to find the interpolation points
in advance, and only recompute the interpolation vectors whenever
$\{\phi_i(\mathbf{r})\psi_j(\mathbf{r})\}$ has been updated, using an efficient least squares
procedure that exploits the separable nature of the matrix to be approximated. As a result, we
can significantly accelerate hybrid functional calculations using the ISDF decomposition in all
but the first SCF iteration.
In this work, we remove the need to perform the QRCP decomposition entirely, and hence
significantly reduce the computational cost. Note that an effective choice of the set of
interpolation points should satisfy the following two conditions. 1) The distribution of the
interpolation points should roughly follow the distribution of the electron density. In
particular, there should be more points where the electron density is high, and fewer or even
no points where the electron density is very low. 2) The interpolation points should not be
very close to each other. Otherwise the matrix rows represented by the interpolation points are
nearly linearly dependent, and the matrix formed by the interpolation vectors will be highly
ill-conditioned. The QRCP procedure satisfies both 1) and 2) simultaneously, and thus is an
effective way of selecting the interpolation points. Here we demonstrate that 1) and 2) can
also be satisfied through a much simpler centroidal Voronoi tessellation (CVT) procedure
applied to a weight vector such as the electron density.
The Voronoi tessellation technique has been widely used in computer science,21 and
scientific and engineering applications such as image processing,22 pattern recognition,23
and numerical integration.24 The concept of Voronoi tessellation can be simply understood
as follows. Given a discrete set of weighted points, the CVT procedure divides a domain
into a number of regions, each consisting of a collection of points that are closest to its
weighted centroid. Here we choose the electron density as the weight, and the centroids
as the interpolation points. The centroids must be located where the electron density is
significant, and hence satisfy the requirement 1). The centroids are also mutually separated
from each other by a finite distance due to the nearest neighbor principle, and hence satisfy
the requirement 2). Although detailed analysis of the error stemming from such choice of
interpolation points is very difficult for general nonlinear functions, we find that the CVT
procedure approximately minimizes the residual of the ISDF decomposition (1). In practice,
the CVT procedure only applies to one vector (the electron density) instead of $O(N_e^2)$ vectors
and hence is very efficient.
We apply the ISDF-CVT method to accelerate hybrid functional calculations in a planewave
basis set. We perform such calculations for different systems with insulating (liquid water),
semiconducting (bulk silicon), and metallic (disordered silicon aluminum alloy) characters,
as well as ab initio molecular dynamics (AIMD) simulations. We find that the ISDF-CVT
method achieves similar accuracy to that obtained from the ISDF-QRCP method, with
significantly improved efficiency. For instance, for a bulk silicon system containing 1000 silicon
atoms computed on 2000 computational cores, the QRCP procedure takes 434.2 seconds to find
the interpolation points, while the CVT procedure takes only 3.2 seconds. Since the solution
of the CVT procedure is continuous with respect to changes in the electron density, we
also find that the CVT procedure produces a smoother potential energy surface than the
QRCP procedure in AIMD simulations.
The remainder of the paper is organized as follows. We briefly introduce the ISDF
decomposition in section 2. In section 3 we describe the ISDF-CVT procedure and its
implementation for hybrid functional calculations. We present numerical results of the ISDF-CVT
method in section 4, and conclude in section 5. We also provide the theoretical justification
of the CVT method in Appendix A.
2 Interpolative Separable Density Fitting (ISDF) decomposition
In this section, we briefly introduce the ISDF decomposition16 evaluated using the method
developed in Ref. 20, which employs a separate treatment of the interpolation points and
interpolation vectors.
First, assume the interpolation points $\{\hat{\mathbf{r}}_\mu\}_{\mu=1}^{N_\mu}$ are known. Then the
interpolation vectors can be efficiently evaluated using a least squares method as follows.
Using linear algebra notation, Eq. (1) can be written as
\[
Z \approx \Theta C, \tag{3}
\]
where each column of $Z$ is given by $Z_{ij}(\mathbf{r}) = \phi_i(\mathbf{r})\psi_j(\mathbf{r})$ sampled
on a dense real-space grid $\{\mathbf{r}_i\}_{i=1}^{N_g}$, and
$\Theta = [\zeta_1, \zeta_2, \ldots, \zeta_{N_\mu}]$ contains the interpolating vectors. Each column
of $C$ indexed by $(i, j)$ is given by
\[
\left[ \phi_i(\hat{\mathbf{r}}_1)\psi_j(\hat{\mathbf{r}}_1), \cdots,
\phi_i(\hat{\mathbf{r}}_\mu)\psi_j(\hat{\mathbf{r}}_\mu), \cdots,
\phi_i(\hat{\mathbf{r}}_{N_\mu})\psi_j(\hat{\mathbf{r}}_{N_\mu}) \right]^T.
\]
Eq. (3) is an over-determined linear system with respect to the interpolation vectors $\Theta$.
The least squares approximation to the solution is given by
\[
\Theta = Z C^T \left( C C^T \right)^{-1}. \tag{4}
\]
It may appear that the matrix-matrix multiplications $ZC^T$ and $CC^T$ take $O(N_e^4)$ operations
because the size of $Z$ is $N_g \times (N_e N)$ and the size of $C$ is $N_\mu \times (N_e N)$. However,
both multiplications can be carried out with fewer operations due to the separable structure
of $Z$ and $C$. The computational complexity for computing the interpolation vectors is
$O(N_e^3)$, and numerical results indicate that the preconstant is also much smaller than that
involved in hybrid functional calculations.20 Hence the interpolation vectors can be obtained
efficiently using the least squares procedure.
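The role of the separable structure can be illustrated with a small numerical check. The sketch below is not taken from the implementation in Ref. 20; it assumes real-valued orbitals and uses hypothetical names (`phi_hat[i][mu]` standing for $\phi_i(\hat{\mathbf{r}}_\mu)$, `psi_hat[j][mu]` for $\psi_j(\hat{\mathbf{r}}_\mu)$). It verifies that $CC^T$ equals the elementwise product of two small Gram matrices, so the $N_\mu \times (N_e N)$ matrix $C$ never has to be formed explicitly.

```python
import random


def gram_direct(phi_hat, psi_hat):
    """Form every column of C explicitly, then return C C^T (the expensive route)."""
    n_mu = len(phi_hat[0])
    # One column of C per orbital pair (i, j); each column has N_mu entries.
    cols = [[p[mu] * q[mu] for mu in range(n_mu)] for p in phi_hat for q in psi_hat]
    return [[sum(c[m] * c[n] for c in cols) for n in range(n_mu)] for m in range(n_mu)]


def gram_separable(phi_hat, psi_hat):
    """Return C C^T as the elementwise product of two N_mu x N_mu Gram matrices."""
    n_mu = len(phi_hat[0])

    def gram(orbitals):
        return [[sum(o[m] * o[n] for o in orbitals) for n in range(n_mu)]
                for m in range(n_mu)]

    g_phi, g_psi = gram(phi_hat), gram(psi_hat)
    return [[g_phi[m][n] * g_psi[m][n] for n in range(n_mu)] for m in range(n_mu)]


# Illustrative random data: 4 orbitals of each kind sampled at 3 interpolation points.
random.seed(0)
n_orb, n_mu = 4, 3
phi_hat = [[random.uniform(-1, 1) for _ in range(n_mu)] for _ in range(n_orb)]
psi_hat = [[random.uniform(-1, 1) for _ in range(n_mu)] for _ in range(n_orb)]

direct = gram_direct(phi_hat, psi_hat)
separable = gram_separable(phi_hat, psi_hat)
max_diff = max(abs(direct[m][n] - separable[m][n])
               for m in range(n_mu) for n in range(n_mu))
print(max_diff)  # agreement up to floating-point rounding
```

The same factorization argument applies to $ZC^T$, with one index running over all grid points instead of the interpolation points.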
The problem of finding a suitable set of interpolation points
$\{\hat{\mathbf{r}}_\mu\}_{\mu=1}^{N_\mu}$ can be formulated as the following linear algebra problem.
Consider the discretized matrix $Z$ of size $N_g \times N^2$, and find $N_\mu$ rows of $Z$ so that
the rest of the rows of $Z$ can be approximated by a linear combination of the selected $N_\mu$
rows. This is called an interpolative decomposition,25 and a standard method to achieve such a
decomposition is the QR factorization with column pivoting (QRCP) procedure25
\[
Z^T \Pi = QR. \tag{5}
\]
Here $Z^T$ is the transpose of $Z$, $Q$ is an $N^2 \times N_g$ matrix that has orthonormal columns,
$R$ is an upper triangular matrix, and $\Pi$ is a permutation matrix chosen so that the magnitudes
of the diagonal elements of $R$ form a non-increasing sequence. The magnitude of each
diagonal element of $R$ indicates how important the corresponding column of the permuted $Z^T$
is, and whether the corresponding grid point should be chosen as an interpolation point.
The QRCP factorization can be terminated when the $(N_\mu + 1)$-th diagonal element of $R$
becomes less than a predetermined threshold. The leading $N_\mu$ columns of the permuted $Z^T$
are considered to be linearly independent numerically. The corresponding grid points are
chosen as the interpolation points. The indices for the chosen interpolation points
$\{\hat{\mathbf{r}}_\mu\}$ can be obtained from the indices of the nonzero entries of the first $N_\mu$
columns of the permutation matrix $\Pi$.
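The pivoting rule described above can be sketched in a few lines. This is a minimal illustration rather than the randomized QRCP of Ref. 16 or a production routine: it runs a greedy, pivoted Gram-Schmidt sweep over the rows of a small dense $Z$ (equivalently, the columns of $Z^T$), always picking the row with the largest remaining norm. The function name `select_rows_by_pivoting`, the tolerance `tol`, and the toy matrix are assumptions made for this example.

```python
import math


def select_rows_by_pivoting(Z, n_mu, tol=1e-12):
    """Greedy pivoted Gram-Schmidt on the rows of Z; returns the selected row indices."""
    residual = [list(row) for row in Z]  # working copy; row k is column k of Z^T
    selected = []
    for _ in range(n_mu):
        norms = [math.sqrt(sum(x * x for x in row)) for row in residual]
        pivot = max(range(len(residual)), key=lambda k: norms[k])
        if norms[pivot] <= tol:
            break  # remaining rows are numerically in the span of the selected ones
        q = [x / norms[pivot] for x in residual[pivot]]
        selected.append(pivot)
        # Orthogonalize every row against the newly selected direction.
        for row in residual:
            proj = sum(a * b for a, b in zip(row, q))
            for k in range(len(row)):
                row[k] -= proj * q[k]
    return selected


# Toy matrix: row 1 is parallel to row 0, and row 3 is zero (a "vacuum" grid point).
Z = [[1.0, 0.0],
     [0.9, 0.0],
     [0.0, 0.5],
     [0.0, 0.0]]
points = select_rows_by_pivoting(Z, n_mu=3)
print(points)  # [0, 2]
```

Row 1 is never selected because it becomes zero once row 0 has been projected out, which is the mechanism behind requirement 2); the zero row is never selected because its norm is negligible, which reflects requirement 1).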
The QRCP decomposition satisfies the requirements 1) and 2) discussed in the introduction.
First, QRCP permutes matrix columns of $Z^T$ with large norms to the front, and
pushes matrix columns of $Z^T$ with small norms to the back. Note that the square of the
vector 2-norm of the column of $Z^T$ labeled by $\mathbf{r}$ is just
\[
\sum_{i,j=1}^{N} \phi_i^2(\mathbf{r})\psi_j^2(\mathbf{r})
= \left( \sum_{i=1}^{N} \phi_i^2(\mathbf{r}) \right) \left( \sum_{j=1}^{N} \psi_j^2(\mathbf{r}) \right). \tag{6}
\]
In the case when $\phi_i, \psi_j$ are the set of occupied orbitals, the norm of each column of $Z^T$
is simply the electron density. Hence the interpolation points chosen by QRCP will occur
where the electron density is significant. Second, once a column is selected, all other columns
are immediately orthogonalized with respect to the chosen column. Hence nearly linearly
dependent matrix columns will not be selected repeatedly. As a result, the interpolation
points chosen by QRCP are well separated spatially.
It turns out that the direct application of the QRCP procedure (5) still requires $O(N_e^4)$
computational complexity. The key idea used in Ref. 16 to lower the cost is to randomly
subsample columns of the matrix $Z$ to form a smaller matrix $\widetilde{Z}$ of size
$N_g \times \widetilde{N}_\mu$, where $\widetilde{N}_\mu$ is only slightly larger than $N_\mu$. Applying
the QRCP procedure to this subsampled matrix $\widetilde{Z}$ approximately yields the choice of
interpolation points, but the computational complexity is reduced to $O(N_e^3)$. In the context
of hybrid density functional calculations, we demonstrated that the cost of the randomized
QRCP method can be comparable to that of applying the exchange operator in the planewave
basis set.20 However, the ISDF decomposition can still significantly reduce the computational
cost, since the interpolation points only need to be computed once for a fixed geometric
configuration.
3 Centroidal Voronoi Tessellation based ISDF decomposition
In this section, we demonstrate that the interpolation points can also be selected from a
Voronoi tessellation procedure. For a $d$-dimensional space, the Voronoi tessellation partitions
a set of points $\{\mathbf{r}_i\}_{i=1}^{N_g}$ in $\mathbb{R}^d$ into a number of disjoint cells. The
partition is based on the distance of each point to a finite set of points, called generators.
In our context, let $\{\hat{\mathbf{r}}_\mu\}_{\mu=1}^{N_\mu}$ denote such a set of generators. The cell
of a given generator $\hat{\mathbf{r}}_\mu$, defined through a cluster of points $C_\mu$, is
\[
C_\mu = \left\{ \mathbf{r}_i \mid \mathrm{dist}(\mathbf{r}_i, \hat{\mathbf{r}}_\mu)
< \mathrm{dist}(\mathbf{r}_i, \hat{\mathbf{r}}_\nu) \text{ for all } \nu \neq \mu \right\}. \tag{7}
\]
The distance can be chosen to be any metric, e.g. the $L^2$ distance
$\mathrm{dist}(\mathbf{r}, \mathbf{r}') = \|\mathbf{r} - \mathbf{r}'\|$. In the case when the distances of
a point $\mathbf{r}$ to $\hat{\mathbf{r}}_\mu$ and $\hat{\mathbf{r}}_\nu$ are exactly the same, we may
arbitrarily assign $\mathbf{r}$ to one of the clusters.
The centroidal Voronoi tessellation (CVT) is a specific type of Voronoi tessellation in
which the generator $\hat{\mathbf{r}}_\mu$ is chosen to be the centroid of its cell. Given a weight
function $\rho(\mathbf{r})$ (such as the electron density), the centroid of a cluster $C_\mu$ is
defined as
\[
\mathbf{c}(C_\mu) = \frac{\sum_{\mathbf{r}_j \in C_\mu} \mathbf{r}_j \, \rho(\mathbf{r}_j)}
{\sum_{\mathbf{r}_j \in C_\mu} \rho(\mathbf{r}_j)}. \tag{8}
\]
Combined with the $L^2$ distance, CVT can be viewed as a minimization problem over both
all possible partitions of the cells and the centroids as26
\[
\{C_\mu^*, \mathbf{c}_\mu^*\} = \arg\min_{\{C_\mu, \mathbf{c}_\mu\}}
\sum_{\mu=1}^{N_\mu} \sum_{\mathbf{r}_k \in C_\mu} \rho(\mathbf{r}_k) \|\mathbf{r}_k - \mathbf{c}_\mu\|^2, \tag{9}
\]
and the interpolation points are then chosen to be the minimizers
$\hat{\mathbf{r}}_\mu = \mathbf{c}(C_\mu^*) = \mathbf{c}_\mu^*$. Following the discussion in the
introduction, using the electron density as the weight function in Eq. (9) ensures that the
interpolation points are located where the electron density is significant, and hence satisfies
requirement 1). Since the cells $C_\mu^*$ are disjoint, the centroids $\mathbf{c}_\mu^*$ are also
separated from each other by a finite distance, and hence satisfy requirement 2). In Appendix A
we provide another theoretical justification in the sense that the CVT method approximately
minimizes the residual error of the ISDF decomposition.
Many algorithms have been developed to efficiently compute the Voronoi tessellation.27
One of the most widely used methods is Lloyd’s algorithm,28 which in the discrete case is
equivalent to the K-Means algorithm.26 The K-Means algorithm is an iterative method that greedily
minimizes the objective by taking alternating steps between $\{C_\mu\}$ and $\{\mathbf{c}_\mu\}$. In
this work, we adopt a weighted version of the K-Means algorithm, which is given in Algorithm 1.
Note that the K-Means algorithm can be straightforwardly parallelized. We distribute the
grid points evenly at the beginning. The classification step is the most time consuming step,
and can be computed locally for each group of grid points. After this step, the weighted sum
and total weight of all clusters are reduced across, and broadcast to, all processors for the
next iteration.
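Algorithm 1: Weighted K-Means Algorithm to Find Interpolation Points for Density Fitting
Input : Grid points $\{\mathbf{r}_i\}_{i=1}^{N_g}$, weight function $\rho(\mathbf{r})$, initial centroids $\{\mathbf{c}_\mu^{(0)}\}$
Output: Interpolation points $\{\hat{\mathbf{r}}_\mu\}_{\mu=1}^{N_\mu}$
Set $t \leftarrow 0$
do
    Classification step:
    for $i = 1$ to $N_g$ do
        Assign point $\mathbf{r}_i$ to the cluster $C_\mu^{(t)}$ if $\mathbf{c}_\mu^{(t)}$ is the closest centroid to $\mathbf{r}_i$
    end
    Update step:
    for $\mu = 1$ to $N_\mu$ do
        $\mathbf{c}_\mu^{(t+1)} \leftarrow \sum_{\mathbf{r}_j \in C_\mu^{(t)}} \mathbf{r}_j \rho(\mathbf{r}_j) \,/\, \sum_{\mathbf{r}_j \in C_\mu^{(t)}} \rho(\mathbf{r}_j)$
    end
    Set $t \leftarrow t + 1$
while $\{\mathbf{c}_\mu^{(t)}\}$ not converged and maximum steps not reached
for $\mu = 1$ to $N_\mu$ do
    Set $\hat{\mathbf{r}}_\mu \leftarrow \mathbf{c}_\mu^{(t)}$
end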
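A serial, pure-Python reading of Algorithm 1 might look as follows. This is only a sketch under stated assumptions, not the parallel PWDFT implementation: the name `weighted_kmeans`, the choice to stop when no point changes cluster, and the rule that an empty cluster keeps its previous centroid are all illustrative choices that the text does not prescribe.

```python
def weighted_kmeans(points, weights, centroids, max_iter=100):
    """Alternate nearest-centroid classification and weighted centroid updates."""
    centroids = [tuple(c) for c in centroids]
    dim = len(points[0])
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Classification step: assign each point to its closest centroid.
        new_labels = [
            min(range(len(centroids)),
                key=lambda m: sum((x - y) ** 2 for x, y in zip(p, centroids[m])))
            for p in points
        ]
        if new_labels == labels:
            break  # assumed stopping rule: no point switched cluster
        labels = new_labels
        # Update step: weighted centroid of each non-empty cluster, as in Eq. (8).
        for m in range(len(centroids)):
            members = [i for i, lab in enumerate(labels) if lab == m]
            total = sum(weights[i] for i in members)
            if total > 0:  # assumed rule: an empty cluster keeps its old centroid
                centroids[m] = tuple(
                    sum(weights[i] * points[i][d] for i in members) / total
                    for d in range(dim))
    return centroids, labels


# Toy 1D example: five weighted grid points and two initial centroids.
grid = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,)]
rho = [1.0, 2.0, 1.0, 3.0, 1.0]
final_centroids, final_labels = weighted_kmeans(grid, rho, [(0.0,), (5.0,)])
print(final_centroids)  # [(1.0,), (10.25,)]
print(final_labels)     # [0, 0, 0, 1, 1]
```

In the toy run the second centroid settles at 10.25 rather than the unweighted midpoint 10.5, because the point at 10.0 carries three times the weight of the point at 11.0.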
In order to demonstrate the CVT procedure, we consider the weight function ρ(r) given by
the summation of 4 Gaussian functions in a 2D domain. The initial choice of centroids, given
by 40 uniformly distributed random points, together with its associated Voronoi tessellation
are plotted in Figure 1 (a). Figure 1 (b) demonstrates the converged centroids and the
associated Voronoi tessellation using the weighted K-Means algorithm. We observe that the
centroids concentrate where the weight function is significant, and are well-separated.
Figure 1: Schematic illustration of the CVT procedure for 4 Gaussian functions in a 2D domain,
including (a) the initial random choice of centroids and its Voronoi tessellation and (b) the
centroidal Voronoi tessellation generated by the weighted K-Means algorithm.
We also show how the interpolation points are placed and moved during the ammonia-borane
(BH3NH3) decomposition reaction process. Figure 2 (a) shows the electron density
of the molecule at the compressed, equilibrium, and dissociated configurations, respectively,
according to the energy landscape in Fig. 2 (c). We plot the interpolation points found by
the weighted K-Means algorithm in Fig. 2 (b). At the compressed configuration, all the
interpolation points are distributed evenly around the molecule. As the bond length increases,
some interpolation points are transferred from BH3 to NH3. Finally at the dissociated
configuration, NH3 has more interpolation points around the molecule, since there are more
electrons in NH3 than in BH3. Along the decomposition reaction process, both the transfer of
the interpolation points and the potential energy landscape are smooth with respect to the
change of the bond length.
Figure 2: The decomposition reaction process of BH3NH3 computed with hybrid functional
(HSE06) calculations by using the CVT procedure to select interpolation points, including
(a) the electron density (yellow isosurfaces), (b) the interpolation points (yellow squares)
$\{\hat{\mathbf{r}}_\mu\}_{\mu=1}^{N_\mu}$ ($N_\mu = 8$) selected from the real space grid points
$\{\mathbf{r}_i\}_{i=1}^{N_g}$ ($N_g = 66^3$) when the BN distance is 1.3, 1.7 and 2.8 Å, respectively,
and (c) the binding energy as a function of BN distance for BH3NH3 in a
10 Å × 10 Å × 10 Å box. The white, pink and blue balls denote hydrogen, boron and nitrogen
atoms, respectively.
4 Numerical results
We demonstrate the accuracy and efficiency of the ISDF-CVT method for hybrid functional
calculations by using the DGDFT (Discontinuous Galerkin Density Functional Theory) software
package.29–33 DGDFT is a massively parallel electronic structure software package designed
for large scale DFT calculations involving up to tens of thousands of atoms. It
includes a self-contained module called PWDFT for performing planewave based electronic
structure calculations (mostly for benchmarking and validation purposes). We implemented
the ISDF-CVT method in PWDFT. We use the Message Passing Interface (MPI) to handle
data communication, and the Hartwigsen-Goedecker-Hutter (HGH) norm-conserving
pseudopotential.34 All calculations use the HSE06 functional.35 All calculations are carried out
on the Edison system at the National Energy Research Scientific Computing Center (NERSC).
Each node consists of two Intel “Ivy Bridge” processors with 24 cores in total and 64 gigabytes
(GB) of memory. Our implementation only uses MPI. The number of cores is equal to the
number of MPI ranks used in the simulation.
In this section, we demonstrate the performance of the ISDF-CVT method for accelerating
hybrid functional calculations by using three types of systems.36 They consist of bulk
silicon systems (Si64, Si216 and Si1000), a bulk water system with 64 molecules ((H2O)64) and
a disordered silicon aluminum alloy system (Al176Si24). The bulk silicon systems (Si64, Si216 and
Si1000) and the bulk water system ((H2O)64) are semiconducting with a relatively large energy
gap $E_{\mathrm{gap}} > 1.0$ eV, and the Al176Si24 system is metallic with a small energy gap
$E_{\mathrm{gap}} < 0.1$ eV. All systems are closed shell systems, and the number of occupied bands
is $N_{\mathrm{band}} = N_e/2$. In order to compute the energy gap in the systems, we also include
two unoccupied bands in all calculations.
4.1 Accuracy: Si216 and Al176Si24
We demonstrate the accuracy of the CVT-based ISDF decomposition in hybrid functional
calculations for semiconducting Si216 and metallic Al176Si24 systems, respectively. Although
there is no general theoretical guarantee for the convergence of the K-Means algorithm and
the convergence can depend sensitively on the initialization,37,38 we find that in the current
context, initialization has little impact on the final accuracy of the approximation. Hence
we use random initialization for the K-Means algorithm. In all calculations, the adaptively
compressed exchange (ACE) technique is used to accelerate hybrid functional calculations
without loss of accuracy.39 The results obtained in this work are labeled as ACE-ISDF
(CVT), and are compared against those obtained from the previous work based on the
QRCP decomposition,20 labeled as ACE-ISDF (QRCP). In both cases, we introduce a rank
parameter $c$ to control the trade-off between efficiency and accuracy, by setting the number
of interpolation points $N_\mu = cN_e$. We measure the error using the valence band maximum
(VBM) energy level, the conduction band minimum (CBM) energy level, the energy gap,
the Hartree-Fock energy, the total energy, and the atomic forces, respectively. The last three
quantities are defined as
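\begin{align*}
\Delta E_{\mathrm{HF}} &= \left| E_{\mathrm{HF}}^{\text{ACE-ISDF (CVT)}} - E_{\mathrm{HF}}^{\text{ACE}} \right| / N_A, \\
\Delta E &= \left| E^{\text{ACE-ISDF (CVT)}} - E^{\text{ACE}} \right| / N_A, \\
\Delta F &= \max_{I} \left\| F_I^{\text{ACE-ISDF (CVT)}} - F_I^{\text{ACE}} \right\|,
\end{align*}
where $N_A$ is the number of atoms and $I$ is the atom index.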
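For concreteness, the per-atom energy error and the maximum force error could be evaluated as in the following sketch. The function names and the numbers in the usage example are hypothetical and purely illustrative; they are not results from this work.

```python
import math


def energy_error_per_atom(e_approx, e_ref, n_atoms):
    """|E_approx - E_ref| / N_A, the form used for both the HF and total energies."""
    return abs(e_approx - e_ref) / n_atoms


def max_force_error(forces_approx, forces_ref):
    """Largest Euclidean norm of the per-atom force difference (max over atoms I)."""
    return max(math.dist(fa, fr) for fa, fr in zip(forces_approx, forces_ref))


# Made-up two-atom example, only to show how the quantities are formed.
delta_e = energy_error_per_atom(-10.0, -10.5, n_atoms=2)
delta_f = max_force_error([(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)],
                          [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)])
print(delta_e, delta_f)  # 0.25 5.0
```

Note that the force error takes the maximum over atoms rather than an average, so a single poorly converged atom dominates the reported value.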
Table 1 shows that the accuracy of the ACE-ISDF (CVT) method can systematically
improve as the rank parameter $c$ increases. When the rank parameter is large enough, the
accuracy is fully comparable to that obtained from the benchmark calculations. For a more
modest choice $c = 6.0$, the error of the energy per atom reaches below the chemical accuracy
of 1 kcal/mol ($1.6 \times 10^{-3}$ Ha/atom), and the error of the force is around $10^{-3}$ Ha/Bohr.
This is comparable to the accuracy obtained from ACE-ISDF (QRCP), and to e.g. linear scaling
methods for insulating systems with a reasonable amount of truncation needed to achieve
significant speedup.40 In fact, when compared with ACE-ISDF (QRCP) in Figure 3, we find
that the CVT-based ISDF decomposition achieves slightly higher accuracy, though there is
no theoretical guarantee for this to hold in general. The last column of Table 1 shows the
runtime of the K-Means algorithm. As $c$ increases, the number of interpolation points and
hence the number of cells increases proportionally. Hence we observe that the runtime of
K-Means scales linearly with respect to $c$.
Figure 3: The accuracy of ACE-ISDF based hybrid functional calculations (HSE06) obtained
by using the CVT and QRCP procedures to select the interpolation points, with varying rank
parameter $c$ from 4 to 20 for Si216 and Al176Si24, including the error of (a) Hartree-Fock energy
$\Delta E_{\mathrm{HF}}$ (Ha/atom) and (b) total energy $\Delta E$ (Ha/atom).
Table 1: The accuracy of ACE-ISDF based hybrid functional calculations (HSE06) obtained
by using the CVT method to select interpolation points, with varying rank parameter $c$
for semiconducting Si216 and metallic Al176Si24 systems. The unit for VBM ($E_{\mathrm{VBM}}$), CBM
($E_{\mathrm{CBM}}$) and the energy gap $E_{\mathrm{gap}}$ is eV. The unit for the error in the Hartree-Fock
exchange energy $\Delta E_{\mathrm{HF}}$ and the total energy $\Delta E$ is Ha/atom, and the unit for the
error in atomic forces $\Delta F$ is Ha/Bohr. We use the results from the ACE-enabled hybrid
functional calculations as the reference. The last column shows the time for K-Means with
different $c$ values, with 434 cores for Si216 and 314 cores for Al176Si24 on Edison.

ACE-ISDF: Semiconducting Si216 (N_band = 432)
c      E_VBM    E_CBM    E_gap     ΔE_HF      ΔE         ΔF         T_KMEANS
4.0    6.7467   8.3433   -1.5967   2.69E-03   3.08E-03   5.04E-03   0.228
5.0    6.6852   8.2231   -1.5379   9.46E-04   1.12E-03   2.29E-03   0.248
6.0    6.6640   8.1522   -1.4882   3.76E-04   4.62E-04   1.05E-03   0.301
7.0    6.6550   8.1163   -1.4613   1.55E-04   1.98E-04   6.49E-04   0.312
8.0    6.6510   8.1030   -1.4520   7.33E-05   9.55E-05   3.07E-04   0.349
9.0    6.6490   8.0980   -1.4490   3.60E-05   4.96E-05   2.30E-04   0.398
10.0   6.6479   8.0959   -1.4480   1.78E-05   2.64E-05   1.30E-04   0.477
12.0   6.6472   8.0945   -1.4473   4.46E-06   8.91E-06   8.37E-05   0.530
16.0   6.6469   8.0937   -1.4468   1.51E-07   1.41E-06   3.20E-05   0.773
20.0   6.6468   8.0935   -1.4467   4.06E-07   3.33E-07   1.20E-05   0.830
24.0   6.6468   8.0935   -1.4467   2.99E-07   1.06E-07   5.18E-06   0.931
ACE    6.6468   8.0934   -1.4466   0.00E+00   0.00E+00   0.00E+00   0.000

ACE-ISDF: Metallic Al176Si24 (N_band = 312)
c      E_VBM    E_CBM    E_gap     ΔE_HF      ΔE         ΔF         T_KMEANS
4.0    7.9258   8.0335   -0.1076   3.80E-03   4.03E-03   8.01E-03   0.430
5.0    7.8537   7.9596   -0.1059   1.60E-03   1.69E-03   3.18E-03   0.535
6.0    7.8071   7.9127   -0.1056   6.07E-04   6.39E-04   1.48E-03   0.611
7.0    7.7843   7.8860   -0.1017   2.07E-04   2.17E-04   1.03E-03   0.731
8.0    7.7749   7.8749   -0.1000   7.43E-05   7.77E-05   4.40E-04   0.948
9.0    7.7718   7.8710   -0.0992   3.02E-05   3.20E-05   1.98E-04   0.947
10.0   7.7709   7.8697   -0.0989   1.48E-05   1.60E-05   1.80E-04   1.096
12.0   7.7703   7.8690   -0.0987   4.64E-06   5.60E-06   8.51E-05   1.305
16.0   7.7702   7.8688   -0.0986   6.35E-07   1.41E-06   3.24E-05   1.646
20.0   7.7701   7.8687   -0.0986   1.70E-08   5.30E-07   1.91E-05   2.037
ACE    7.7701   7.8687   -0.0986   0.00E+00   0.00E+00   0.00E+00   0.000
4.2 Efficiency: Si1000
We report the efficiency of the ISDF-CVT method by performing hybrid DFT calculations
for a bulk silicon system with 1000 atoms ($N_{\mathrm{band}} = 2000$) on 2000 computational cores as
shown in Table 2, with respect to various choices of the kinetic energy cutoff ($E_{\mathrm{cut}}$). With
the number of interpolation points fixed at $N_\mu = 12000$, the runtimes of both QRCP and
K-Means scale linearly with the number of grid points $N_g$. Yet K-Means is around two orders
of magnitude faster than QRCP. The determination of the interpolation vectors, which consists
of solving a least squares problem, previously cost a fifth of the ISDF runtime but now
becomes the dominating component in the CVT-based ISDF decomposition. Notice that the
ISDF method allows us to reduce the number of Poisson-like equations from
$N_e^2 = 4 \times 10^6$ to $N_\mu = 12000$, which results in a significant speedup in terms of the
cost of the FFT operations.
Table 2: The wall clock time (in seconds) spent in the components of the ACE-ISDF and
ACE enabled hybrid DFT calculations related to the exchange operator, for Si1000 on 2002
Edison cores at different $E_{\mathrm{cut}}$ levels. Interpolation points are selected via either the
QRCP or CVT procedure with the same rank parameter $c = 6.0$. $N_g$ is the number of grid
points in real space.

Si1000          ACE-ISDF                                  ACE
E_cut   N_g     IP_QRCP   IP_KMEANS   IV (FFT)         FFT
10      74^3    38.06     0.70        12.48 (0.33)     85.15
20      104^3   126.39    1.24        36.48 (0.71)     143.54
30      128^3   240.87    2.03        68.50 (1.43)     268.88
40      148^3   434.16    3.26        108.18 (3.10)    783.27
4.3 AIMD: Si64 and (H2O)64
In this section, we demonstrate the accuracy of the ACE-ISDF (CVT) method in the context
of AIMD simulations for a bulk silicon system Si64 under the NVE ensemble, and a liquid
water system (H2O)64 under the NVT ensemble, respectively. The MD time step size is
1.0 femtosecond (fs). For the Si64 system, the initial MD structure (initial temperature
$T = 300$ K) is optimized by hybrid DFT calculations, and we perform the simulation for
0.5 ps. For the (H2O)64 system, we perform the simulation for 2.0 ps to sample the radial
distribution function after equilibrating the system starting from a prepared initial guess.41
We use a single level Nosé-Hoover thermostat42,43 at $T = 295$ K, and the choice of mass of
the Nosé-Hoover thermostat is 85000 au.
In the AIMD simulation, the interpolation points need to be recomputed for each atomic
configuration. At the initial MD step, although the initialization strategy does not impact
the accuracy of the physical observable, it can impact the convergence rate of the K-Means
algorithm. We measure the convergence in terms of the fraction of points that switch clusters
between two consecutive iterations. Figure 4 (a) shows the convergence of the K-Means
algorithm with interpolation points initially chosen from a random distribution and from
the QRCP solution, respectively. We find that the K-Means algorithm spends around half
of its iterations waiting for the last 0.1% of the points to settle into their clusters.
However, these points often lie on the boundaries of the clusters and have little effect
on the positions of the centroids (interpolation points). Therefore, we terminate the
K-Means algorithm once the fraction of points that switch clusters falls below the 0.1%
threshold. It is evident that QRCP initialization leads to faster convergence than random
sampling. However, in the AIMD simulation, a very good initial guess of the interpolation
points can be obtained simply by reusing those from the previous MD step. Figure 4 (b) shows
that the number of K-Means iterations in the MD simulation can be very small, which
demonstrates the effectiveness of this initialization strategy.
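The termination rule and the warm start described above can be sketched as follows. This is a minimal serial illustration and not the implementation used in this work: the function name `weighted_kmeans`, the dense distance matrix, and the toy density in the usage example are all assumptions made for the sketch.

```python
import numpy as np


def weighted_kmeans(points, weights, centroids, tol=1e-3, max_iter=100):
    """Weighted K-Means (Lloyd iteration) with a switch-fraction stopping rule.

    points    : (Ng, d) grid coordinates
    weights   : (Ng,) non-negative weights, e.g. the electron density on the grid
    centroids : (K, d) initial guess (random, QRCP, or the previous MD step)
    tol       : stop once the fraction of points that change cluster is below tol
    Returns (centroids, labels, n_iter).
    """
    points = np.asarray(points, dtype=float)
    weights = np.asarray(weights, dtype=float)
    centroids = np.array(centroids, dtype=float)
    labels = np.full(len(points), -1)
    for n_iter in range(1, max_iter + 1):
        # Assignment step: nearest centroid in Euclidean distance.
        d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        switched = np.mean(new_labels != labels)
        labels = new_labels
        # Update step: weighted centre of mass of each non-empty cluster.
        for k in range(len(centroids)):
            mask = labels == k
            w = weights[mask]
            if w.sum() > 0:
                centroids[k] = (w[:, None] * points[mask]).sum(axis=0) / w.sum()
        if switched < tol:
            break
    return centroids, labels, n_iter


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grid = rng.random((2000, 3))
    rho = np.exp(-20.0 * ((grid - 0.50) ** 2).sum(axis=1))  # toy density
    init = grid[rng.choice(len(grid), 20, replace=False)]   # random start
    c_old, _, n_cold = weighted_kmeans(grid, rho, init)
    # Slightly shifted density, standing in for the next MD step.
    rho_next = np.exp(-20.0 * ((grid - 0.51) ** 2).sum(axis=1))
    _, _, n_warm = weighted_kmeans(grid, rho_next, c_old)   # warm start
    print("cold start iterations:", n_cold, " warm start iterations:", n_warm)
```

The default `tol=1e-3` corresponds to the 0.1% threshold. The first pass always counts every point as switched, so at least two iterations are performed.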
Figure 5 (a-b) show the position and velocity of a Si atom over an MD trajectory
of 500 fs obtained using the ACE, ACE-ISDF (QRCP) and ACE-ISDF (CVT) methods
(rank parameter c = 8.0), respectively. The three trajectories fully overlap with each other,
indicating that ISDF is a promising method for reducing the cost of hybrid functional
calculations with controllable loss of accuracy. Figure 5 (c) shows the total potential energy
obtained by the three methods along the MD trajectory, and the difference among the three
Figure 4: Comparison of the ISDF-CVT method using either random or QRCP initialization
for hybrid DFT AIMD simulations on the bulk silicon system Si64 and the liquid water system
(H2O)64, including (a) the fraction of points that switch clusters in each K-Means iteration
and (b) the number of K-Means iterations during each MD step.
methods is more noticeable. This is due to the fact that ISDF decomposition is a low rank
decomposition for the pair product of orbitals, which leads to error in the Fock exchange
energy and hence the total potential energy. Nonetheless, we find that this difference merely
results in a shift of the potential energy surface along the MD trajectory, and hence has
little effect on physical observables defined via relative potential energy differences.
Furthermore, the CVT method yields a potential energy trajectory that is much smoother than that
obtained from QRCP. This is because the interpolation points obtained from CVT are driven
by the electron density, which varies smoothly along the MD trajectory. This property does
not hold for the QRCP method. This means that the CVT method can be more effective
when a smooth potential energy surface is desirable, as in the case of geometry optimization.
The absolute error of the potential energy from the CVT method happens to be
smaller than that from QRCP, but we are not aware of any reason for this behavior to hold
in general. Finally, Figure 5 (d) shows that both the CVT-based and QRCP-based ISDF
decompositions lead to controlled energy drift in the NVE simulation.
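For readers who wish to reproduce a drift curve of this kind from their own trajectory data, a minimal sketch follows. It assumes the common definition of the relative drift, (E_tot(t) - E_tot(0)) / E_tot(0); the function names and the linear fit are illustrative choices and are not taken from this work.

```python
import numpy as np


def relative_energy_drift(e_tot):
    """Relative drift (E(t) - E(0)) / E(0) of a total-energy time series.

    Assumed definition; the sign follows E(0), so it flips for negative energies.
    """
    e = np.asarray(e_tot, dtype=float)
    return (e - e[0]) / e[0]


def drift_rate(e_tot, dt_fs=1.0):
    """Least-squares slope of the relative drift, per femtosecond."""
    drift = relative_energy_drift(e_tot)
    t = dt_fs * np.arange(len(drift))
    slope, _intercept = np.polyfit(t, drift, 1)
    return slope
```

With a 1.0 fs time step, `drift_rate(e_tot)` gives the average relative drift per MD step.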
We also apply the ACE-ISDF (CVT) and ACE-ISDF (QRCP) methods to hybrid DFT
AIMD simulations of the liquid water system (H2O)64 under the NVT ensemble to sample the
radial distribution function shown in Figure 6. We find that the results from all three methods
agree very well, and our result is in quantitative agreement with previous hybrid functional
DFT calculations,41 where the remaining difference with respect to the experimental result
can be to a large extent attributed to nuclear quantum effects.44
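A radial distribution function of this kind can be accumulated from stored MD frames as sketched below. The sketch assumes a cubic periodic box and a single atomic species (for example, the oxygen atoms only); the function name and the normalization by the number of distinct pairs are choices made here for illustration, not details taken from this work.

```python
import numpy as np


def radial_distribution(positions, box, r_max, n_bins=100):
    """g(r) for one species in a cubic periodic box, averaged over frames.

    positions : (n_frames, n_atoms, 3) Cartesian coordinates
    box       : cubic box edge length (same units as positions)
    r_max     : largest distance binned; should not exceed box / 2
    Returns (bin centres, g(r)).
    """
    positions = np.asarray(positions, dtype=float)
    n_frames, n_atoms, _ = positions.shape
    edges = np.linspace(0.0, r_max, n_bins + 1)
    counts = np.zeros(n_bins)
    iu = np.triu_indices(n_atoms, k=1)  # each unordered pair once
    for frame in positions:
        diff = frame[:, None, :] - frame[None, :, :]
        diff -= box * np.round(diff / box)  # minimum-image convention
        dist = np.linalg.norm(diff, axis=2)[iu]
        counts += np.histogram(dist, bins=edges)[0]
    shell_vol = 4.0 / 3.0 * np.pi * (edges[1:] ** 3 - edges[:-1] ** 3)
    # Expected pair count per shell for uniformly distributed atoms.
    n_pairs = 0.5 * n_atoms * (n_atoms - 1)
    ideal = n_pairs * shell_vol / box ** 3 * n_frames
    r = 0.5 * (edges[1:] + edges[:-1])
    return r, counts / ideal
```

With this normalization, uniformly random positions give g(r) close to 1 for r up to half the box length.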
5 Conclusion
In this work, we demonstrate that the interpolative separable density fitting (ISDF)
decomposition can be efficiently performed through a separate treatment of interpolation points
and interpolation vectors. We find that the centroidal Voronoi tessellation (CVT) method
provides an effective choice of interpolation points using only the electron density as the
Figure 5: Comparison of hybrid HSE06 DFT AIMD simulations using the ISDF-CVT
and ISDF-QRCP methods, with the exact nested two-level SCF iteration procedure as the
reference, on the bulk silicon system Si64, including three coordinates (X, Y and Z directions) of (a)
position and (b) velocity of a specific Si atom, (c) potential energy and (d) relative energy
drift during MD steps.
Figure 6: The oxygen-oxygen radial distribution functions gOO(r) of the liquid water system
(H2O)64 at T = 295 K obtained from hybrid DFT AIMD simulations with the ISDF-CVT
and ISDF-QRCP methods, with the exact nested two-level SCF iteration procedure as the
reference.
input information. The resulting interpolation points are by design inhomogeneous
in real space, concentrated in regions where the electron density is significant, and
well separated from each other. These are all key ingredients for obtaining a low rank
decomposition that is accurate and a well conditioned set of interpolation vectors. We demonstrate
that the CVT-based ISDF decomposition can be an effective strategy for reducing the cost
of hybrid functional calculations for large systems. The CVT-based method achieves accuracy
similar to that obtained from QRCP, with significantly improved efficiency.
For a supercell containing 1000 silicon atoms on 2000 computational cores, the
QRCP-based method takes 434.2 seconds to find the interpolation points, while the
CVT procedure takes only 3.2 seconds. Since the solution of the CVT method depends
continuously on the electron density, we also find that the CVT method produces a
smoother potential energy surface than the QRCP method in the context of ab initio
molecular dynamics simulation. Our analysis indicates that it might be possible to further
improve the quality of the interpolation points by taking into account the gradient
information in the weight vector. We also expect that the CVT-based strategy can be useful in
other contexts where the ISDF decomposition is applicable, such as ground state calculations
with rung-5 exchange-correlation functionals, and excited state calculations. These will be
explored in future work.
6 Acknowledgments
This work was partly supported by the National Science Foundation under grant No.
DMS-1652330, the DOE under grant No. DE-SC0017867, the DOE CAMERA project (L. L.), and
by the DOE Scientific Discovery through Advanced Computing (SciDAC) program (K. D.,
W. H. and L. L.). The authors thank the National Energy Research Scientific Computing
(NERSC) center and the Berkeley Research Computing (BRC) program at the University
of California, Berkeley for making computational resources available. We thank Anil Damle
and Robert Saye for useful discussions.
A Minimization of the approximate residual
The ISDF decomposition is a highly nonlinear process, and in general we cannot expect the
choice of interpolation points from CVT decomposition to maximally reduce the error of the
decomposition. Here we demonstrate that the choice of the interpolation points from the
centroidal Voronoi tessellation algorithm approximately minimizes the residual for the ISDF
decomposition, and hence provides a heuristic solution to the problem of finding interpolation
points.
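For simplicity we assume $\varphi_i = \psi_i$, and hence each row of $Z$ is
$Z(\mathbf{r}) = [\varphi_i(\mathbf{r})\varphi_j(\mathbf{r})]_{i,j=1}^{N}$.
Now suppose we cluster all matrix rows of $Z$ into sub-collections $\{C_\mu\}_{\mu=1}^{N_\mu}$,
and for each $C_\mu$ we choose a representative matrix row $Z(\mathbf{r}_\mu)$. Then the error
of the ISDF can be approximately characterized as
$$
R = \sum_{\mu=1}^{N_\mu} \sum_{\mathbf{r}_k \in C_\mu}
\left\| Z(\mathbf{r}_k) - \mathrm{Proj}_{\mathrm{span}\{Z(\mathbf{r}_\mu)\}} Z(\mathbf{r}_k) \right\|^2, \tag{10}
$$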
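where the projection is defined according to the $L^2$ inner product as
$$
\mathrm{Proj}_{\mathrm{span}\{Z(\mathbf{r}_\mu)\}} Z(\mathbf{r}_k)
= \frac{Z(\mathbf{r}_k)\cdot Z(\mathbf{r}_\mu)}{Z(\mathbf{r}_\mu)\cdot Z(\mathbf{r}_\mu)}\, Z(\mathbf{r}_\mu). \tag{11}
$$
Let $\Phi$ be the $N_g \times N$ matrix with each row $\Phi(\mathbf{r}) = [\varphi_i(\mathbf{r})]_{i=1}^{N}$;
then the electron density $\rho(\mathbf{r})$ is equal to $\Phi(\mathbf{r})\cdot\Phi(\mathbf{r})$. Using the relation
$$
Z(\mathbf{r}_\mu)\cdot Z(\mathbf{r}_\mu) = \left(\Phi(\mathbf{r}_\mu)\cdot\Phi(\mathbf{r}_\mu)\right)^2 = \rho(\mathbf{r}_\mu)^2, \tag{12}
$$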
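we have
$$
R = \sum_{\mu=1}^{N_\mu} \sum_{\mathbf{r}_k \in C_\mu} \rho(\mathbf{r}_k)^2
\left[ 1 - \frac{\left(\Phi(\mathbf{r}_k)\cdot\Phi(\mathbf{r}_\mu)\right)^4}{\rho(\mathbf{r}_k)^2 \rho(\mathbf{r}_\mu)^2} \right]
= \sum_{\mu=1}^{N_\mu} \sum_{\mathbf{r}_k \in C_\mu} \rho(\mathbf{r}_k)^2
\left[ 1 - \cos^4\!\left(\theta(\mathbf{r}_k,\mathbf{r}_\mu)\right) \right]. \tag{13}
$$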
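Here $\theta(\mathbf{r}_k,\mathbf{r}_\mu)$ is the angle between the vectors $\Phi(\mathbf{r}_k)$ and
$\Phi(\mathbf{r}_\mu)$. Using the fact that
$$
\rho(\mathbf{r}_k)\left[ 1 - \cos^4\!\left(\theta(\mathbf{r}_k,\mathbf{r}_\mu)\right) \right]
\le 2\,\Phi(\mathbf{r}_k)\cdot\Phi(\mathbf{r}_k)\,\sin^2\!\left(\theta(\mathbf{r}_k,\mathbf{r}_\mu)\right)
\le 2\left\|\Phi(\mathbf{r}_k) - \Phi(\mathbf{r}_\mu)\right\|^2, \tag{14}
$$
we have
$$
R \le 2 \sum_{\mu=1}^{N_\mu} \sum_{\mathbf{r}_k \in C_\mu} \rho(\mathbf{r}_k)
\left\|\Phi(\mathbf{r}_k) - \Phi(\mathbf{r}_\mu)\right\|^2
\approx 2 \sum_{\mu=1}^{N_\mu} \sum_{\mathbf{r}_k \in C_\mu} \rho(\mathbf{r}_k)
\left\|\nabla_{\mathbf{r}}\Phi(\mathbf{r}_\mu)\right\|^2 \left\|\mathbf{r}_k - \mathbf{r}_\mu\right\|^2. \tag{15}
$$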
If we further neglect the spatial inhomogeneity of the gradient $\nabla_{\mathbf{r}}\Phi(\mathbf{r})$,
we arrive at the minimization criterion for the centroidal Voronoi decomposition.
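The identity (13) and the bound (14) can be checked numerically for a single pair of grid points. The sketch below is illustrative only: the function names are hypothetical, and random vectors stand in for the orbital values $\Phi(\mathbf{r}_k)$ and $\Phi(\mathbf{r}_\mu)$.

```python
import numpy as np


def isdf_row_residual(phi_k, phi_mu):
    """Squared residual of Z(r_k) after projection onto span{Z(r_mu)}, as in (10)-(11)."""
    z_k = np.outer(phi_k, phi_k).ravel()    # Z(r_k) = [phi_i(r_k) phi_j(r_k)]
    z_mu = np.outer(phi_mu, phi_mu).ravel()
    proj = (z_k @ z_mu) / (z_mu @ z_mu) * z_mu
    return np.sum((z_k - proj) ** 2)


def closed_form(phi_k, phi_mu):
    """rho(r_k)^2 [1 - cos^4(theta)], the summand of (13)."""
    rho_k = phi_k @ phi_k
    rho_mu = phi_mu @ phi_mu
    cos2 = (phi_k @ phi_mu) ** 2 / (rho_k * rho_mu)
    return rho_k ** 2 * (1.0 - cos2 ** 2)


def upper_bound(phi_k, phi_mu):
    """2 rho(r_k) ||Phi(r_k) - Phi(r_mu)||^2, the summand of the bound in (15)."""
    return 2.0 * (phi_k @ phi_k) * np.sum((phi_k - phi_mu) ** 2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    phi_k, phi_mu = rng.standard_normal(8), rng.standard_normal(8)
    print(isdf_row_residual(phi_k, phi_mu), closed_form(phi_k, phi_mu),
          upper_bound(phi_k, phi_mu))
```

The first two printed values should agree to rounding error, and neither should exceed the third.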
References
(1) Szabo, A.; Ostlund, N. Modern Quantum Chemistry: Introduction to Advanced
Electronic Structure Theory; McGraw-Hill, New York, 1989.
(2) Martin, R. Electronic Structure – Basic Theory and Practical Methods; Cambridge
Univ. Pr.: West Nyack, NY, 2004.
(3) Murphy, R. B.; Beachy, M. D.; Friesner, R. A.; Ringnalda, M. N. Pseudospectral
localized Møller-Plesset methods: Theory and calculation of conformational energies. J.
Chem. Phys. 1995, 103, 1481.
(4) Reynolds, G.; Martinez, T. J.; Carter, E. A. Local weak pairs spectral and
pseudospectral singles and doubles configuration interaction. J. Chem. Phys. 1996, 105, 6455.
(5) Beebe, N. H. F.; Linderberg, J. Simplifications in the generation and transformation of
two-electron integrals in molecular calculations. Int. J. Quantum Chem. 1977, 12, 683.
(6) Koch, H.; Sánchez de Merás, A.; Pedersen, T. B. Reduced scaling in electronic structure
calculations using Cholesky decompositions. J. Chem. Phys. 2003, 118, 9481–9484.
(7) Aquilante, F.; Pedersen, T. B.; Lindh, R. Low-cost evaluation of the exchange Fock
matrix from Cholesky and density fitting representations of the electron repulsion
integrals. J. Chem. Phys. 2007, 126, 194106.
(8) Manzer, S.; Horn, P. R.; Mardirossian, N.; Head-Gordon, M. Fast, accurate evaluation
of exact exchange: The occ-RI-K algorithm. J. Chem. Phys. 2015, 143, 024113.
(9) Ren, X.; Rinke, P.; Blum, V.; Wieferink, J.; Tkatchenko, A.; Sanfilippo, A.; Reuter, K.;
Scheffler, M. Resolution-of-Identity Approach to Hartree-Fock, Hybrid Density
Functionals, RPA, MP2 and GW with Numeric Atom-Centered Orbital Basis Functions.
New J. Phys. 2012, 14, 053020.
(10) Weigend, F. A fully direct RI-HF algorithm: Implementation, optimised auxiliary basis
sets, demonstration of accuracy and efficiency. Phys. Chem. Chem. Phys. 2002, 4,
4285–4291.
(11) Parrish, R. M.; Hohenstein, E. G.; Martínez, T. J.; Sherrill, C. D. Tensor
hypercontraction. II. Least-squares renormalization. J. Chem. Phys. 2012, 137, 224106.
(12) Parrish, R. M.; Hohenstein, E. G.; Martínez, T. J.; Sherrill, C. D. Discrete variable
representation in electronic structure theory: Quadrature grids for least-squares tensor
hypercontraction. J. Chem. Phys. 2013, 138, 194107.
(13) Goedecker, S. Linear scaling electronic structure methods. Rev. Mod. Phys. 1999, 71,
1085–1123.
(14) Bowler, D. R.; Miyazaki, T. O(N) methods in Electronic Structure Calculations. Rep.
Prog. Phys. 2012, 75, 036503.
(15) Guidon, M.; Hutter, J.; Vandevondele, J. Auxiliary density matrix methods for
Hartree-Fock exchange calculations. J. Chem. Theory Comput. 2010, 6, 2348–2364.
(16) Lu, J.; Ying, L. Compression of the Electron Repulsion Integral Tensor in Tensor
Hypercontraction Format with Cubic Scaling Cost. J. Comput. Phys. 2015, 302, 329–335.
(17) Golub, G. H.; Van Loan, C. F. Matrix computations, 4th ed.; Johns Hopkins Univ.
Press: Baltimore, 2013.
(18) Lu, J.; Thicke, K. Cubic scaling algorithms for RPA correlation using interpolative
separable density fitting. J. Comput. Phys. 2017, 351, 187–202.
(19) Lin, L.; Xu, Z.; Ying, L. Adaptively Compressed Polarizability Operator for
Accelerating Large Scale Ab Initio Phonon Calculations. Multiscale Model. Simul. 2017, 15,
29–55.
(20) Hu, W.; Lin, L.; Yang, C. Interpolative Separable Density Fitting Decomposition for
Accelerating Hybrid Density Functional Calculations With Applications to Defects in
Silicon. J. Chem. Theory Comput. 2017, accepted.
(21) Aurenhammer, F. Voronoi diagrams – a survey of a fundamental geometric data structure.
ACM Computing Surveys (CSUR) 1991, 23, 345–405.
(22) Du, Q.; Gunzburger, M.; Ju, L.; Wang, X. Centroidal Voronoi tessellation algorithms
for image compression, segmentation, and multichannel restoration. Journal of
Mathematical Imaging and Vision 2006, 24, 177–194.
(23) Ogniewicz, R. L.; Kübler, O. Hierarchic voronoi skeletons. Pattern recognition 1995,
28, 343–359.
(24) Becke, A. D. A multicenter numerical integration scheme for polyatomic molecules. The
Journal of chemical physics 1988, 88, 2547–2553.
(25) Chan, T. F.; Hansen, P. C. Some Applications of the Rank Revealing QR Factorization.
SIAM J. Sci. Statist. Comput. 1992, 13, 727–741.
(26) MacQueen, J. Some methods for classification and analysis of multivariate observations.
Proc. of the Fifth Berkeley Symp. On Math. Stat. and Prob. 1967; pp 281–297.
(27) Medvedev, N. The algorithm for three-dimensional Voronoi polyhedra. Journal of
computational physics 1986, 67, 223–229.
(28) Lloyd, S. Least squares quantization in PCM. IEEE transactions on information theory
1982, 28, 129–137.
(29) Lin, L.; Lu, J.; Ying, L.; E, W. Adaptive Local Basis Set for Kohn-Sham Density
Functional Theory in a Discontinuous Galerkin Framework I: Total Energy Calculation.
J. Comput. Phys. 2012, 231, 2140–2154.
(30) Hu, W.; Lin, L.; Yang, C. DGDFT: A Massively Parallel Method for Large Scale Density
Functional Theory Calculations. J. Chem. Phys. 2015, 143, 124110.
(31) Hu, W.; Lin, L.; Yang, C. Edge Reconstruction in Armchair Phosphorene Nanoribbons
Revealed by Discontinuous Galerkin Density Functional Theory. Phys. Chem. Chem.
Phys. 2015, 17, 31397–31404.
(32) Banerjee, A. S.; Lin, L.; Hu, W.; Yang, C.; Pask, J. E. Chebyshev Polynomial Filtered
Subspace Iteration in the Discontinuous Galerkin Method for Large-Scale Electronic
Structure Calculations. J. Chem. Phys. 2016, 145, 154101.
(33) Zhang, G.; Lin, L.; Hu, W.; Yang, C.; Pask, J. E. Adaptive Local Basis Set for
Kohn-Sham Density Functional Theory in a Discontinuous Galerkin Framework II: Force,
Vibration, and Molecular Dynamics Calculations. J. Comput. Phys. 2017, 335, 426–443.
(34) Hartwigsen, C.; Goedecker, S.; Hutter, J. Relativistic Separable Dual-Space Gaussian
Pseudopotentials from H to Rn. Phys. Rev. B 1998, 58, 3641.
(35) Heyd, J.; Scuseria, G. E.; Ernzerhof, M. Erratum: "Hybrid functionals based on a
screened Coulomb potential" [J. Chem. Phys. 118, 8207 (2003)]. J. Chem. Phys. 2006,
124, 219906.
(36) Hu, W.; Lin, L.; Yang, C. Projected Commutator DIIS Method for Accelerating
Hybrid Functional Electronic Structure Calculations. J. Chem. Theory Comput. 2017,
accepted.
(37) Arthur, D.; Vassilvitskii, S. How slow is the k-means method? Proceedings of the
twenty-second annual symposium on Computational geometry. 2006; pp 144–153.
(38) Arthur, D.; Vassilvitskii, S. k-means++: The advantages of careful seeding. Proceedings
of the eighteenth annual ACM-SIAM symposium on Discrete algorithms. 2007; pp 1027–1035.
(39) Lin, L. Adaptively Compressed Exchange Operator. J. Chem. Theory Comput. 2016,
12, 2242–2249.
(40) Dawson, W.; Gygi, F. Performance and Accuracy of Recursive Subspace Bisection for
Hybrid DFT Calculations in Inhomogeneous Systems. J. Chem. Theory Comput. 2015,
11, 4655–4663.
(41) DiStasio, R. A., Jr.; Santra, B.; Li, Z.; Wu, X.; Car, R. The Individual and Collective effects
of Exact Exchange and Dispersion Interactions on the Ab Initio Structure of Liquid
Water. J. Chem. Phys. 2014, 141, 084502.
(42) Nosé, S. A Unified Formulation of the Constant Temperature Molecular Dynamics
Methods. J. Chem. Phys. 1984, 81, 511.
(43) Hoover, W. G. Canonical Dynamics: Equilibrium Phase-Space Distributions. Phys.
Rev. A 1985, 31, 1695.
(44) Morrone, J.; Car, R. Nuclear quantum effects in water. Phys. Rev. Lett. 2008, 101,
017801.