March 2025
·
6 Reads
·
1 Citation
Journal of Chemical Theory and Computation
March 2025
·
29 Reads
Journal of Computational Chemistry
Multi‐fidelity methods in machine learning (ML) have seen increasing usage for the prediction of quantum chemical properties. These methods, such as Δ‐ML and Multifidelity Machine Learning (MFML), have been shown to significantly reduce the computational cost of generating training data. This work implements and analyzes several multi‐fidelity methods including Δ‐ML and MFML for the prediction of electronic molecular energies at DLPNO‐CCSD(T) level, that is, at the level of coupled cluster theory including single and double excitations and perturbative triples corrections. The models for small organic molecules are evaluated not only on the basis of accuracy of prediction, but also on efficiency in terms of the time‐cost of generating training data. In addition, the models are evaluated for the prediction of energies for molecules sampled from a public dataset, in particular for atmospherically relevant molecules, isomeric compounds, and highly conjugated complex molecules.
February 2025
·
4 Reads
·
2 Citations
Scientific Data
Progress in both Machine Learning (ML) and Quantum Chemistry (QC) methods has resulted in high-accuracy ML models for QC properties. Datasets such as MD17 and WS22 have been used to benchmark these models at a given level of QC method, or fidelity, which refers to the accuracy of the chosen QC method. Multifidelity ML (MFML) methods, where models are trained on data from more than one fidelity, have been shown to be more effective than single-fidelity methods. Much research is progressing in this direction for diverse applications ranging from energy band gaps to excitation energies. One hurdle for effective research here is the lack of a diverse multifidelity dataset for benchmarking. We provide the Quantum chemistry MultiFidelity (QeMFi) dataset consisting of five fidelities calculated with the TD-DFT formalism. The fidelities differ in their basis set choice: STO-3G, 3-21G, 6-31G, def2-SVP, and def2-TZVP. QeMFi offers the community a variety of QC properties such as vertical excitation properties and molecular dipole moments. Further, QeMFi provides QC computation times, allowing for a time-benefit benchmark of multifidelity models for ML-QC.
January 2025
·
15 Reads
Multifidelity methods in machine learning (ML) have seen an increasing usage for the prediction of quantum chemical properties. These methods, such as ∆-ML and multifidelity ML, have been shown to significantly reduce the computational cost of generating training data. This work implements and analyzes several multifidelity methods including ∆-ML and multifidelity ML for the prediction of electronic molecular energies at DLPNO-CCSD(T) level, i.e., at the level of coupled cluster theory including single and double excitations and perturbative triples corrections. The models for small organic molecules are evaluated not only on the basis of accuracy of prediction, but also on efficiency in terms of the time-cost of generating training data. In addition, the models are evaluated for the prediction of energies for molecules sampled from a public dataset, in particular for atmospherically relevant molecules, isomeric compounds, and highly conjugated complex molecules.
October 2024
·
50 Reads
Natural light-harvesting antenna complexes efficiently capture solar energy using chlorophyll, i.e., magnesium porphyrin pigments, embedded in a protein matrix. Inspired by this natural configuration, artificial clay-porphyrin antenna structures have been experimentally synthesized and have demonstrated remarkable excitation energy transfer properties. The study presents the computational design and simulation of a synthetic light-harvesting system that emulates natural mechanisms by arranging cationic free-base porphyrin molecules on an anionic clay surface. We investigated the transfer of excitation energy among the porphyrin dyes using a multiscale quantum mechanics/molecular mechanics (QM/MM) approach based on the semi-empirical density functional-based tight-binding (DFTB) theory for the ground state dynamics. To improve the accuracy of our results, we incorporated an innovative multifidelity machine learning (MFML) approach, which allows the prediction of excitation energies at the numerically demanding time-dependent density functional theory level with the Def2-SVP basis set. This approach was applied to an extensive dataset of 640K geometries for the 90-atom porphyrin structures, facilitating a thorough analysis of the excitation energy diffusion among the porphyrin molecules adsorbed to the clay surface. The insights gained from this study, inspired by natural light-harvesting complexes, demonstrate the potential of porphyrin-clay systems as effective energy transfer systems.
October 2024
·
8 Reads
Machine learning interatomic potentials (MLIPs) have seen significant advances as efficient replacements for expensive quantum chemical calculations. Uncertainty estimations for MLIPs are crucial to quantify the additional model error they introduce and to leverage this information in active learning strategies. MLIPs that are based on Gaussian process regression (GPR) provide a standard deviation as a possible uncertainty measure. An alternative approach is to use ensemble-based uncertainties. Although these uncertainty measures have been applied to active learning, it has rarely been studied how they correlate with the error, and it is not always clear whether active learning actually outperforms random sampling strategies. We consider GPR models with Coulomb and SOAP representations as inputs to predict potential energy surfaces and excitation energies of molecules. We evaluate how the GPR variance and ensemble-based uncertainties relate to the error and whether model performance improves by selecting the most uncertain samples from a fixed configuration space. For the ensemble-based uncertainty estimations, we find that they often do not provide any information about the error. For the GPR standard deviation, we find that predictions with an increasing standard deviation often also have an increasing systematic bias, which is not captured by the uncertainty. In these cases, selecting training samples with the highest uncertainty leads to a model with a worse test error compared to random sampling. We conclude that confidence intervals, which are derived from the predictive standard deviation, can be highly overconfident. Selecting samples with high GPR standard deviation leads to a model that overemphasizes the borders of the configuration space represented in the fixed dataset. This may result in worse performance in more densely sampled areas but better generalization for extrapolation tasks.
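The uncertainty-driven selection described in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the one-dimensional inputs, the RBF kernel with unit prior variance, the length scale, and the helper names `rbf`, `gpr_predict`, and `most_uncertain` are all assumptions made for illustration, whereas the abstract works with Coulomb and SOAP representations of molecules.

```python
import numpy as np


def rbf(a, b, length_scale=1.0):
    """RBF kernel matrix between two 1-D arrays of inputs (toy stand-in)."""
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2.0 * length_scale ** 2))


def gpr_predict(x_train, y_train, x_query, length_scale=1.0, noise=1e-6):
    """Return the GPR predictive mean and standard deviation at x_query."""
    k_tt = rbf(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    k_qt = rbf(x_query, x_train, length_scale)
    mean = k_qt @ np.linalg.solve(k_tt, y_train)
    # Prior variance is 1 for this kernel; subtract the explained part.
    explained = np.sum(k_qt * np.linalg.solve(k_tt, k_qt.T).T, axis=1)
    std = np.sqrt(np.clip(1.0 - explained, 0.0, None))
    return mean, std


def most_uncertain(std, n_select):
    """Indices of the n_select pool points with the largest predictive std."""
    return np.argsort(std)[::-1][:n_select]


# Hypothetical data: training points cluster in [0, 2]; the pool also
# contains points far outside that region.
x_train = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
y_train = np.sin(x_train)
x_pool = np.array([0.25, 1.25, 4.0, 6.0])

mean, std = gpr_predict(x_train, y_train, x_pool)
picked = most_uncertain(std, n_select=2)
```

In this toy setting the selection favors the pool points far from the training data, which mirrors the abstract's remark that high-standard-deviation sampling emphasizes the borders of the configuration space. The predictive standard deviation here depends only on the input locations, not on the observed errors.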
October 2024
·
16 Reads
Multifidelity methods in machine learning (ML) have seen an increasing usage for the prediction of quantum chemical properties. These methods, such as ∆-ML and multifidelity ML, have been shown to significantly reduce the computational cost of generating training data. This work implements and analyzes several multifidelity methods including ∆-ML and multifidelity ML for the prediction of electronic molecular energies at DLPNO-CCSD(T) level, i.e., at the level of coupled cluster theory including single and double excitations and perturbative triples corrections. The models for small organic molecules are evaluated not only on the basis of accuracy of prediction, but also on efficiency in terms of the time-cost of generating training data. In addition, the models are evaluated for the prediction of energies for molecules sampled from a public dataset, in particular for atmospherically relevant molecules, isomeric compounds, and highly conjugated complex molecules.
October 2024
·
3 Reads
Recent progress in machine learning (ML) has made high-accuracy quantum chemistry (QC) calculations more accessible. Of particular interest are multifidelity machine learning (MFML) methods where training data from differing accuracies or fidelities are used. These methods usually employ a fixed scaling factor, γ, to relate the number of training samples across different fidelities, which reflects the cost and assumed sparsity of the data. This study investigates the impact of modifying γ on model efficiency and accuracy for the prediction of vertical excitation energies using the QeMFi benchmark dataset. Further, this work introduces QC compute time informed scaling factors that vary based on QC compute times at different fidelities. A novel error metric, error contours of MFML, is proposed to provide a comprehensive view of model error contributions from each fidelity. The results indicate that high model accuracy can be achieved with just 2 training samples at the target fidelity when a larger number of samples from lower fidelities are used. This is further illustrated through a novel concept, the Γ-curve, which compares model error against the time-cost of generating training samples, demonstrating that multifidelity models can achieve high accuracy while minimizing training data costs.
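The role of a fixed scaling factor (written γ in the MFML literature quoted on this page) can be sketched as follows. The geometric rule used here, where each step down in fidelity multiplies the number of training samples by the scaling factor, is an assumption for illustration, as are the function names and the per-sample compute times, which are placeholders and not values from QeMFi.

```python
def samples_per_fidelity(n_target, gamma, n_fidelities):
    """Training-set sizes ordered from the cheapest to the target fidelity.

    Assumption: each step down in fidelity multiplies the sample count
    by the scaling factor gamma.
    """
    if n_target < 1 or gamma < 1 or n_fidelities < 1:
        raise ValueError("all arguments must be positive integers")
    return [n_target * gamma ** (n_fidelities - 1 - f) for f in range(n_fidelities)]


def total_time_cost(sizes, time_per_sample):
    """Total cost of generating the training data across all fidelities."""
    if len(sizes) != len(time_per_sample):
        raise ValueError("need one time-per-sample value for each fidelity")
    return sum(n * t for n, t in zip(sizes, time_per_sample))


# Two samples at the target fidelity, five fidelities, scaling factor 2.
sizes = samples_per_fidelity(n_target=2, gamma=2, n_fidelities=5)

# Hypothetical per-sample compute times (arbitrary units), cheapest first.
placeholder_times = [1, 2, 4, 8, 16]
cost = total_time_cost(sizes, placeholder_times)
```

Plotting model error against a cost of this kind, for different numbers of target-fidelity samples and scaling factors, is the sort of comparison the abstract describes; the model-error side is not covered by this sketch.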
October 2024
·
7 Reads
The development of machine learning (ML) methods has made quantum chemistry (QC) calculations more accessible by reducing the compute cost incurred in conventional QC methods. This has since been translated into the overhead cost of generating training data. Increased work in reducing the cost of generating training data resulted in the development of Δ-ML and multifidelity machine learning methods which use data at more than one QC level of accuracy, or fidelity. This work compares the data costs associated with Δ-ML, multifidelity machine learning (MFML), and optimized MFML (o-MFML) in contrast with a newly introduced Multifidelity Δ-Machine Learning (MFΔML) method for the prediction of ground state energies over the multifidelity benchmark dataset QeMFi. This assessment is made on the basis of training data generation cost associated with each model and is compared with the single fidelity kernel ridge regression (KRR) case. The results indicate that the use of multifidelity methods surpasses the standard Δ-ML approaches in cases of a large number of predictions. For cases where the Δ-ML method might be favored, such as small test set regimes, the MFΔML method is shown to be more efficient than conventional Δ-ML.
October 2024
·
21 Reads
·
5 Citations
Multifidelity machine learning (MFML) for quantum chemical properties has seen strong development in recent years. The method has been shown to reduce the cost of generating training data for high-accuracy low-cost ML models. In such a set-up, the ML models are trained on molecular geometries and some property of interest computed at various computational chemistry accuracies, or fidelities. These are then combined in training the MFML models. In some multifidelity models, the training data is required to be nested, that is, the same molecular geometries are included to calculate the property across all the fidelities. In these multifidelity models, the requirement of a nested configuration restricts the kind of sampling that can be performed while selecting training samples at different fidelities. This work assesses the use of non-nested training data for two of these multifidelity methods, namely MFML and optimized MFML (o-MFML). The assessment is carried out for the prediction of ground state energies and first vertical excitation energies of a diverse collection of molecules of the CheMFi dataset. Results indicate that the MFML method still requires a nested structure of training data across the fidelities. However, the o-MFML method shows promising results for non-nested multifidelity training data with model errors comparable to the nested configurations.
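The difference between nested and non-nested training data can be made concrete with a small index-sampling sketch. The pool size, the per-fidelity sample counts, and the helper names are hypothetical; the sketch covers only how geometries are chosen, not how MFML or o-MFML models are trained.

```python
import numpy as np


def nested_indices(n_pool, sizes, seed=0):
    """Nested selection: every fidelity draws from one shared shuffle.

    `sizes` is ordered from the cheapest to the target fidelity and is
    assumed to be non-increasing, so each higher-fidelity set is a
    subset of the set below it.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_pool)
    return [order[:n] for n in sizes]


def non_nested_indices(n_pool, sizes, seed=0):
    """Non-nested selection: each fidelity samples geometries independently."""
    rng = np.random.default_rng(seed)
    return [rng.choice(n_pool, size=n, replace=False) for n in sizes]


def is_nested(index_sets):
    """True if each higher-fidelity set is contained in the one below it."""
    return all(
        set(high.tolist()) <= set(low.tolist())
        for low, high in zip(index_sets, index_sets[1:])
    )


sizes = [32, 16, 8, 4, 2]  # hypothetical counts, cheapest fidelity first
nested = nested_indices(n_pool=100, sizes=sizes)
independent = non_nested_indices(n_pool=100, sizes=sizes)
```

Here `is_nested(nested)` holds by construction, while the independently drawn sets generally share only some geometries across fidelities, which is the situation the abstract calls non-nested.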
... In addition to the single fidelity GPR and MFML models, a recently introduced MFML approach, referred to as the Γ-curve, 46 is analyzed as well. In conventional MFML theory, the training samples at the various fidelities are decided by a scaling factor, γ, that is, ...
March 2025
Journal of Chemical Theory and Computation
... Here, each fidelity is treated as inter-related to the others and a surrogate MTGPR model is created. Further, a diverse multifidelity dataset consisting of 135,000 point geometries has recently been made available [16,17] with various QC properties, such as vertical excitation energies, calculated with DFT formalism. The fidelities are differentiated by the choice of basis set used in the calculation. ...
February 2025
Scientific Data
... The value of γ = 2 is conventionally used in MFML based on previous work. 32,[49][50][51] In a recent work, the effect of different values of γ in the model error of MFML has been studied. 33 Ref. 33 reports that the use of very little training data at the target fidelity combined with increasing values of γ, results in a more data efficient model. ...
October 2024
... Multifidelity methods harnessing inherent QC hierarchies to cancel out errors across different numerical QC methods have since superseded the single fidelity ML methods. These methods include Δ-ML 12 based models such as hierarchical machine learning 13 , multifidelity machine learning (MFML) 14,15 , and optimized MFML (o-MFML) 16 . Certain other flavors of ML using multifidelity data have been proposed and tested, including multi-task Gaussian processes treating the different fidelities as interdependent tasks 17,18 . ...
March 2024
... Ramakrishnan et al. (2015) popularized the ∆-learning approach (Bogojeski et al., 2020), where a model learns to predict the difference between some prior and the reference quantum mechanical targets. Multi-fidelity learning generalizes ∆-learning by building a hierarchy of models that predict increasingly accurate levels of theory (Giselle Fernández-Godino, 2023;Vinod et al., 2023;Forrester et al., 2007;Heinen et al., 2024). Making predictions in the hierarchical multi-fidelity setting corresponds to evaluating a baseline fidelity level and then refining this prediction with models that provide corrections to more accurate levels of theory in the hierarchy. ...
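The Δ-learning idea quoted above, where a model learns the difference between a cheap prior and the reference target, can be sketched with a small kernel ridge regression written in NumPy. Everything concrete here is an assumption for illustration: the one-dimensional toy functions `baseline` and `target` stand in for a low-level and a high-level method, and the Gaussian kernel, length scale, and regularization are arbitrary choices.

```python
import numpy as np


def gaussian_kernel(a, b, length_scale=1.0):
    """Gaussian kernel matrix between two 1-D arrays of inputs."""
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2.0 * length_scale ** 2))


def krr_fit(x_train, y_train, length_scale=1.0, reg=1e-8):
    """Solve (K + reg * I) alpha = y for the regression coefficients."""
    k = gaussian_kernel(x_train, x_train, length_scale)
    return np.linalg.solve(k + reg * np.eye(len(x_train)), y_train)


def krr_predict(x_train, alpha, x_query, length_scale=1.0):
    """Evaluate the fitted kernel model at the query points."""
    return gaussian_kernel(x_query, x_train, length_scale) @ alpha


def baseline(x):
    """Toy stand-in for a cheap, low-accuracy method."""
    return np.sin(x)


def target(x):
    """Toy stand-in for an expensive, high-accuracy reference method."""
    return np.sin(x) + 0.1 * x


# Train on the difference between the reference and the baseline.
x_train = np.linspace(0.0, 5.0, 8)
alpha = krr_fit(x_train, target(x_train) - baseline(x_train))

# Predict as baseline plus learned correction.
x_test = np.linspace(0.25, 4.75, 19)
delta_pred = baseline(x_test) + krr_predict(x_train, alpha, x_test)

delta_mae = np.mean(np.abs(delta_pred - target(x_test)))
baseline_mae = np.mean(np.abs(baseline(x_test) - target(x_test)))
```

Note that the baseline still has to be evaluated at every query point. The multi-fidelity hierarchy described in the quoted passage extends this by stacking further correction models on top of the baseline.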
October 2023
Journal of Chemical Theory and Computation
... Efforts are being focused on integrating more complex fluid solvers into these creation suites. For instance, the 3D solver for the two-phase incompressible Navier-Stokes equations NaSt3DGPF was successfully coupled with Maya in a toolkit that enables the user to control the full fluid simulation within Maya's interface [42] [43]. The solver uses high-order Finite Difference discretization methods and the rendering techniques result in realistic CFD visualizations. ...
January 2019
International Journal for Uncertainty Quantification
... Δ-Machine Learning (ML) aims to efficiently elevate a DFT-MLP to close to the CCSD(T) level. 41,64,101,[106][107][108][109] The Δ-ML approach we use 101 for this purpose is given by the following equation: ...
December 2018
Journal of Chemical Theory and Computation
... has been the topic of many articles, see [5,19,26,29,31,45] to mention a few. Their commonality is that they are usually employed for the standard Sobolev spaces ...
February 2019
Journal of Scientific Computing
... Here (a) is commonly encountered for compressing forward operators in integral equations and kernel matrices. Existing codes include HLIBpro [3], [18], H2Pack [16], ASKIT [19], GOFMM [20], and GPU implementations like H2Opus [17] and hmglib [21]. They typically leverage adaptive cross approximation, proxy surface, or preselected skeletons to construct the H 2 matrix. ...
August 2017
Journal of Scientific Computing
... This work introduces algebraically constructed multilevel hierarchies [8,10,24] for the solution of elliptic problems on tensor product domains. While previous works [14,15] first constructed the multilevel hierarchy of meshes or triangulations and then discretized the problem by finite elements, the new approach first discretizes the problem on Ω on the finest (potentially unstructured) mesh T J and then constructs coarser versions of the linear system resulting from the fine discretization. ...
January 2016
Linear Algebra and its Applications