Performance Optimization of Tensor Contraction Expressions for Many-Body Methods in Quantum Chemistry

The Ohio State University, Columbus, Ohio, USA.
The Journal of Physical Chemistry A (Impact Factor: 2.78). 11/2009; 113(45):12715-23. DOI: 10.1021/jp9051215
Source: PubMed

ABSTRACT: Complex tensor contraction expressions arise in accurate electronic structure models in quantum chemistry, such as the coupled cluster method. This paper addresses two complementary aspects of performance optimization of such tensor contraction expressions. Transformations using algebraic properties of commutativity and associativity can be used to significantly decrease the number of arithmetic operations required for evaluation of these expressions. The identification of common subexpressions among a set of tensor contraction expressions can result in a reduction of the total number of operations required to evaluate the tensor contractions. The first part of the paper describes an effective algorithm for operation minimization with common subexpression identification and demonstrates its effectiveness on tensor contraction expressions for coupled cluster equations. The second part of the paper highlights the importance of data layout transformation in the optimization of tensor contraction computations on modern processors. A number of considerations, such as minimization of cache misses and utilization of multimedia vector instructions, are discussed. A library for efficient index permutation of multidimensional tensors is described, and experimental performance data is provided that demonstrates its effectiveness.
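The abstract names two techniques without showing either. The Python sketch below is illustrative only and is not the paper's algorithm or library; `min_contraction_cost`, `permute`, and `transpose_blocked` are hypothetical names. The first function exhaustively searches binary contraction orders, which is how associativity and commutativity can cut the operation count; it does not perform the common-subexpression identification the paper also describes. The cost model (loop-nest volume of each binary contraction, one multiply-add per iteration) and the four-tensor example with every index range set to 10 are assumptions. The other two functions show only the index arithmetic of a tensor index permutation and of a cache-blocked 2-D transpose; pure Python cannot exhibit the cache or vector-instruction effects the paper measures.

```python
from functools import lru_cache
from itertools import combinations, product
from math import prod


def min_contraction_cost(tensors, output, extent):
    """Hypothetical helper: exhaustive search over binary contraction orders.

    tensors: index strings, one per input tensor, e.g. ["acik", "befl"]
    output:  index string of the result, e.g. "abij"
    extent:  dict mapping each index letter to its range
    Assumed cost model: loop-nest volume (multiply-add iterations) per step.
    """
    n = len(tensors)

    def kept(subset):
        # Indices an intermediate must retain: those it touches that are still
        # needed by the output or by some tensor outside the subset.
        inside = set().union(*(tensors[i] for i in subset))
        needed = set(output).union(*(tensors[i] for i in range(n) if i not in subset))
        return frozenset(inside & needed)

    @lru_cache(maxsize=None)
    def best(subset):
        if len(subset) == 1:
            # A lone input tensor costs nothing (assumes no index is summed
            # within a single tensor).
            return 0, tensors[next(iter(subset))]
        first, *rest = sorted(subset)
        best_cost, best_plan = None, None
        # Keep `first` on the left so each unordered split is tried once.
        for r in range(len(rest)):
            for extra in combinations(rest, r):
                left = frozenset((first, *extra))
                right = subset - left
                (lc, lp), (rc, rp) = best(left), best(right)
                step = prod(extent[x] for x in kept(left) | kept(right))
                if best_cost is None or lc + rc + step < best_cost:
                    best_cost, best_plan = lc + rc + step, (lp, rp)
        return best_cost, best_plan

    return best(frozenset(range(n)))


def permute(data, shape, perm):
    """Hypothetical helper: reorder the axes of a row-major tensor stored in
    the flat list `data`. Output axis k is input axis perm[k]. Writes are
    sequential and reads are strided; a tuned implementation would typically
    tile these loops for cache reuse."""
    in_strides = [1] * len(shape)
    for d in range(len(shape) - 2, -1, -1):
        in_strides[d] = in_strides[d + 1] * shape[d + 1]
    out_shape = [shape[p] for p in perm]
    strides = [in_strides[p] for p in perm]
    out = [
        data[sum(i * s for i, s in zip(idx, strides))]
        for idx in product(*(range(m) for m in out_shape))
    ]
    return out, out_shape


def transpose_blocked(data, rows, cols, block=32):
    """Hypothetical helper: 2-D transpose with loop tiling, so that in compiled
    code each block x block tile is read and written while it stays in cache."""
    out = [None] * (rows * cols)
    for i0 in range(0, rows, block):
        for j0 in range(0, cols, block):
            for i in range(i0, min(i0 + block, rows)):
                for j in range(j0, min(j0 + block, cols)):
                    out[j * rows + i] = data[i * cols + j]
    return out


if __name__ == "__main__":
    # Example contraction (assumed for illustration):
    # S(a,b,i,j) = sum over c,d,e,f,k,l of A(a,c,i,k) B(b,e,f,l) C(d,f,j,k) D(c,d,e,l)
    ext = {x: 10 for x in "abcdefijkl"}  # assumed uniform range of 10
    cost, plan = min_contraction_cost(["acik", "befl", "dfjk", "cdel"], "abij", ext)
    print(cost, plan)  # 3000000 ('acik', (('befl', 'cdel'), 'dfjk'))
    print(prod(ext.values()))  # one loop nest over all ten indices: 10000000000
    print(permute(list(range(6)), (2, 3), (1, 0)))  # ([0, 3, 1, 4, 2, 5], [3, 2])
```

Under these assumptions the search finds a three-step sequence costing 3 × 10^6 iterations instead of 10^10 for a single loop nest over all ten indices: contract B with D over e and l, then the result with C over d and f, and finally with A over c and k.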

  • ABSTRACT: We present a new implementation of a recent open-ended response theory formulation for time- and perturbation-dependent basis sets (Thorvaldsen et al., J. Chem. Phys. 2008, 129, 214108) at the Hartree-Fock and density functional levels of theory. A novel feature of the new implementation is the use of recursive programming techniques, making it possible to write highly compact code for the analytic calculation of any response property at any valid choice of rule for the order of perturbation at which to include perturbed density matrices. The formalism is expressed in terms of the density matrix in the atomic orbital basis, allowing the recursive scheme presented here to be used in linear-scaling formulations of response theory as well as with two- and four-component relativistic wave functions. To demonstrate the new code, we present calculations of the third geometrical derivatives of the frequency-dependent second hyperpolarizability for HSOH at the Hartree-Fock level of theory, a seventh-order energy derivative involving basis sets that are both time and perturbation dependent. © 2014 Wiley Periodicals, Inc.
    Journal of Computational Chemistry 03/2014; 35(8). DOI:10.1002/jcc.23533 · 3.60 Impact Factor
  • ABSTRACT: We study and systematically evaluate a class of composable code transformations that improve arithmetic intensity in local assembly operations, which represent a significant fraction of the execution time in finite element methods. Optimizing their performance is a challenging problem. Even though affine loop nests are generally present, the short trip counts and the complexity of mathematical expressions, which vary among different problems, make it hard to determine an optimal sequence of successful transformations. Our investigation has resulted in the implementation of a compiler (called COFFEE) for local assembly kernels, fully integrated with a framework for developing finite element methods. The compiler manipulates abstract syntax trees generated from a domain-specific language by introducing domain-aware optimizations for instruction-level parallelism and register locality. Finally, it produces C code including vector SIMD intrinsics. Experiments using a range of real-world finite element problems of increasing complexity show that significant performance improvement is achieved. The generality of the approach and the applicability of the proposed code transformations to other domains are also discussed.
    ACM Transactions on Architecture and Code Optimization 01/2015; 11(4):1-25. DOI:10.1145/2687415 · 0.60 Impact Factor
  • ABSTRACT: We describe an extension of our graphics processing unit (GPU) electronic structure program TeraChem to include atom-centered Gaussian basis sets with d angular momentum functions. This was made possible by a “meta-programming” strategy that leverages computer algebra systems for the derivation of equations and their transformation to correct code. We generate a multitude of code fragments that are formally mathematically equivalent, but differ in their memory and floating-point operation footprints. We then select between different code fragments using empirical testing to find the highest performing code variant. This leads to an optimal balance of floating-point operations and memory bandwidth for a given target architecture without laborious manual tuning. We show that this approach achieves performance similar to that of our hand-tuned GPU kernels for basis sets with s and p angular momenta. We also demonstrate that mixed precision schemes (using both single and double precision) remain stable and accurate for molecules with d functions. We provide benchmarks of the execution time of entire self-consistent field (SCF) calculations using our GPU code and compare them with mature CPU-based codes, showing the benefits of the GPU architecture for electronic structure theory with appropriately redesigned algorithms. We suggest that the meta-programming and empirical performance optimization approach may be important in future computational chemistry applications, especially in the face of quickly evolving computer architectures.
    Journal of Chemical Theory and Computation 11/2012; 9(1):213–221. DOI:10.1021/ct300321a · 5.31 Impact Factor
