Available via license: CC BY-NC-ND 4.0
Supplementary Information:
Excitation Energy Transfer between Porphyrin
Dyes on a Clay Surface: A study employing
Multifidelity Machine Learning
Dongyu Lyu,† Matthias Holzenkamp,‡ Vivin Vinod,‡ Yannick Marcel Holtkamp,†
Sayan Maity,† Carlos R. Salazar,†,§ Ulrich Kleinekathöfer,∗,† and Peter Zaspel∗,¶
†School of Science, Constructor University, Bremen 28759, Germany
‡School of Mathematics and Natural Science, University of Wuppertal, Wuppertal 42119,
Germany
¶School of Mathematics and Natural Sciences, University of Wuppertal, Wuppertal 42119,
Germany
§Department of Physics, Chemistry and Biology (IFM), Linköping University, SE-581 83
Linköping, Sweden
E-mail: ukleinekathoefer@constructor.university;zaspel@uni-wuppertal.de
arXiv:2410.20551v1 [physics.chem-ph] 27 Oct 2024
Distributions of Excitation energies
[Figure S1, panels (a) m-type porphyrin molecules and (b) p-type porphyrin molecules: density versus site energy [eV] for DFTB, STO-3G, 3-21G, 6-31G, and def2-SVP.]
Figure S1: Distributions of excitation energies for different levels of quantum chemistry
averaged over all pigments of the same type.
Transition charges from electrostatic potentials
Atom x (Å) y (Å) z (Å) Charge (e)
N 0.421597 -2.132765 -0.040233 0.004221
N -1.983771 -0.368911 -0.035220 0.134251
N -3.605662 -4.502220 -1.031529 0.005530
N 2.050558 0.310653 -0.195799 -0.137842
N -0.340619 2.060808 -0.211058 0.004226
N 5.008518 -2.994448 1.040074 -0.003230
N -4.820361 3.055132 1.091452 -0.000599
N 3.012745 4.886841 0.779343 -0.018144
C -1.936494 -2.908660 0.017158 0.227779
C -0.592297 -3.057533 -0.006731 -0.161439
C -2.572244 -1.533003 0.043459 -0.153517
C 1.689852 -2.671401 -0.041360 0.148796
C -2.985299 0.594980 -0.016338 -0.175660
C 2.910429 -2.092278 -0.071507 -0.218195
C -2.857323 1.947617 -0.081107 0.192928
C -2.821890 -4.088093 0.085790 -0.057952
C 0.120829 -4.341481 -0.001371 0.046537
C -4.043513 -1.414690 0.156894 -0.044387
C 3.016401 -0.576386 -0.173204 0.145679
C -1.602042 2.649152 -0.219007 -0.166818
C 1.442074 -4.115945 -0.008222 -0.046181
C -4.240539 -0.120133 0.091302 0.006181
C 2.053042 2.787755 -0.289399 -0.213202
C 4.144607 -2.906612 -0.087775 0.056347
C -4.077582 2.804998 -0.094049 -0.040800
C 2.645689 1.566310 -0.252764 0.189595
C 0.630520 3.043435 -0.323455 0.172590
C 2.909964 4.009287 -0.347216 0.047661
C 4.364091 0.032361 -0.242907 0.041203
C -1.396375 4.010535 -0.353687 0.082123
C 4.082246 1.325772 -0.268444 -0.007633
C -0.025732 4.244872 -0.440427 -0.076580
C -2.902804 -4.747067 1.263714 0.018754
C 4.443901 -3.576882 -1.230780 -0.019234
C -4.455729 3.341823 -1.283890 0.012739
C -4.441843 -5.605528 -0.861493 0.014622
C -3.595555 -3.841328 -2.329381 0.003907
C -3.767104 -5.892177 1.392982 0.003225
C 3.515863 4.302707 -1.512417 -0.008985
C 6.150667 -3.800456 0.934022 -0.015111
C -5.976699 3.836917 0.977024 0.011786
C -4.550884 -6.306116 0.265665 0.002901
C 4.780040 -2.326283 2.313005 -0.005672
C -4.470466 2.530065 2.401559 0.003310
C 5.621544 -4.404871 -1.287265 -0.002870
C -5.647546 4.157855 -1.347639 0.003229
C 3.806318 6.031995 0.630129 0.002088
C 6.492501 -4.485792 -0.149290 -0.004012
C -6.411432 4.386493 -0.152522 0.000575
C 2.371226 4.668773 2.066005 0.002163
C 4.323421 5.489409 -1.610854 -0.003148
C 4.454485 6.367095 -0.484315 -0.003731
H 0.261394 -1.143165 -0.051560 0.000000
H -0.348384 -5.312816 0.010812 -0.000000
H -4.775179 -2.197865 0.252831 0.000000
H 2.195243 -4.890485 -0.003348 0.000000
H -5.225600 0.348567 0.120893 0.000000
H 5.327980 -0.443695 -0.262616 -0.000000
H -2.149577 4.785825 -0.398553 0.000000
H 4.842302 2.094002 -0.314675 0.000000
H 0.436296 5.225117 -0.552454 0.000000
H -2.344832 -4.471084 2.148587 0.000000
H 3.835005 -3.548910 -2.119916 0.000000
H -3.932434 3.191837 -2.207290 0.000000
H -5.020987 -5.888868 -1.736485 -0.000000
H -2.913668 -2.995132 -2.404093 0.000000
H -4.601480 -3.488041 -2.561983 0.000000
H -3.295689 -4.575277 -3.093514 0.000000
H -3.834607 -6.433284 2.323061 0.000000
H 3.440235 3.703766 -2.403736 0.000000
H 6.780809 -3.835399 1.823010 -0.000000
H -6.528279 3.986771 1.900444 -0.000000
H -5.197358 -7.159778 0.370213 0.000000
H 3.865095 -1.729997 2.344930 -0.000000
H 5.625547 -1.647177 2.515078 0.000000
H 4.727504 -3.063028 3.119384 0.000000
H -3.555538 1.946794 2.429889 -0.000000
H -5.301639 1.897090 2.760437 -0.000000
H -4.346592 3.366288 3.097250 0.000000
H 5.863872 -4.952648 -2.187886 -0.000000
H -5.968453 4.591096 -2.279423 -0.000000
H 3.867598 6.673476 1.512256 -0.000000
H 7.372166 -5.102891 -0.211581 0.000000
H -7.291128 4.991701 -0.210475 0.000000
H 1.781549 3.759567 2.129459 0.000000
H 1.709796 5.517388 2.282484 0.000000
H 3.142764 4.622414 2.844645 -0.000000
H 4.837824 5.728414 -2.536957 -0.000000
H 5.058418 7.248463 -0.568306 0.000000
H -0.130919 1.090401 -0.079390 0.000000
Table S1: Transition charges and Atomic coordinates (x, y, z) for m-TMPyP
Atom x (Å) y (Å) z (Å) Charge (e)
C 5.005311 1.315763 -0.062397 -0.063429
C 6.822858 2.793393 0.644527 0.016456
C 5.452531 2.475338 0.575636 0.031097
C 6.008654 0.519655 -0.631443 0.030054
C 7.373325 0.807816 -0.442659 0.010262
N 7.754891 1.940274 0.188017 0.001928
C 9.188915 2.200493 0.418145 0.013398
C -5.052217 -1.423867 -0.169090 0.051190
C -6.992483 -2.902767 0.223869 -0.011712
C -5.617956 -2.671033 0.111290 -0.041290
C -6.044986 -0.459596 -0.490947 -0.020794
C -7.423356 -0.678102 -0.297683 -0.014965
N -7.865256 -1.882258 0.091848 0.000800
C -9.287730 -2.097999 0.423846 -0.014479
C 2.323920 -6.683852 -1.045090 -0.001502
C 1.939680 -5.337467 -1.137863 -0.008750
C 1.059098 -5.468637 1.087439 -0.009720
C 1.439782 -6.810833 1.128542 -0.000243
N 2.071177 -7.382541 0.074935 -0.005793
C 2.504820 -8.798642 0.187687 0.001520
C 1.313582 -4.709116 -0.054869 0.043909
C -2.118989 6.731555 -1.097417 -0.000216
C -1.701528 5.401750 -1.199121 0.007361
C -1.069160 5.470925 1.112486 -0.000117
C -1.502945 6.798853 1.162907 0.001945
N -2.026010 7.401639 0.074949 0.003539
C -2.514783 8.782573 0.176029 -0.002770
C -1.172951 4.756131 -0.084051 -0.032454
C 0.927242 -3.257775 -0.121570 -0.216674
C 1.900035 -2.332055 -0.022505 0.170788
C 3.298715 -2.654680 0.238283 -0.040914
C 4.015500 -1.521009 0.319571 0.048500
C 3.151940 -0.382099 0.022333 -0.150854
C 3.534724 0.912782 -0.119660 0.235659
C 2.386349 1.918228 -0.278449 -0.139459
C 2.716319 3.313392 -0.590601 -0.041075
C 1.533552 3.914371 -0.474028 0.007256
C 0.477991 2.953611 -0.235422 -0.150672
C -0.839282 3.276963 -0.124360 0.185411
C -1.936775 2.350795 0.048328 -0.178131
C -3.243524 2.666098 0.380580 0.063236
C -3.946086 1.469323 0.554469 -0.072974
C -3.177941 0.416909 0.093080 0.143864
C -3.519603 -1.018215 -0.090718 -0.211386
C -2.502465 -1.941113 -0.125223 0.181780
C -2.618071 -3.364187 -0.155226 0.008662
C -1.444374 -3.979942 -0.285920 0.038749
C -0.506687 -2.849032 -0.194898 0.145077
N 1.895430 -0.958442 -0.083072 0.002560
N 1.090244 1.704229 -0.142735 0.115325
N -1.906131 0.973502 -0.123502 0.004150
N -1.119084 -1.687825 -0.124413 -0.134106
H 7.172348 3.718628 1.101911 0.000000
H 4.761783 3.168529 1.058701 0.000000
H 5.776178 -0.378675 -1.199207 -0.000000
H 8.155051 0.129758 -0.787542 0.000000
H 9.354566 3.286331 0.428975 0.000000
H 9.772077 1.746511 -0.380679 0.000000
H 9.441675 1.762876 1.379397 0.000000
H -7.392522 -3.892071 0.453755 -0.000000
H -5.065105 -3.573468 0.325220 0.000000
H -5.761459 0.517887 -0.874408 -0.000000
H -8.156848 0.098515 -0.433902 0.000000
H -9.481020 -3.168262 0.520690 0.000000
H -9.887691 -1.677071 -0.367777 0.000000
H -9.471451 -1.601390 1.374755 0.000000
H 2.832358 -7.179728 -1.865393 0.000000
H 2.161299 -4.791212 -2.057151 0.000000
H 0.565313 -5.033971 1.948220 0.000000
H 1.240358 -7.440234 1.993114 0.000000
H 3.085954 -9.072683 -0.698984 -0.000000
H 1.607704 -9.411396 0.257808 0.000000
H 3.122131 -8.887079 1.080551 0.000000
H -2.551078 7.256783 -1.950198 0.000000
H -1.822645 4.871201 -2.145350 0.000000
H -0.682918 4.995776 2.013124 0.000000
H -1.440300 7.384339 2.075035 0.000000
H -2.928017 9.113924 -0.769246 0.000000
H -1.673227 9.420067 0.469777 0.000000
H -3.299635 8.802407 0.949666 0.000000
H 3.709373 -3.638351 0.430252 -0.000000
H 5.050082 -1.504249 0.607771 0.000000
H 3.650642 3.770489 -0.840408 0.000000
H 1.401908 4.975229 -0.586045 0.000000
H -3.636689 3.653312 0.563152 0.000000
H -4.965491 1.428800 0.911641 0.000000
H -3.453088 -4.022879 -0.048589 0.000000
H -1.277767 -5.040636 -0.309835 0.000000
H 1.054709 -0.401934 -0.231692 0.000000
H -1.081090 0.425675 -0.356299 0.000000
Table S2: Transition charges and Atomic coordinates (x, y, z) for p-TMPyP
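In the TrESP approach, the transition charges of Tables S1 and S2 enter the excitonic coupling as a Coulomb sum over all atom pairs of two pigments. The following is a minimal sketch of that sum. It assumes vacuum screening by default (the `screening` factor is an illustrative parameter), and it uses two hypothetical two-atom charge sets instead of the tabulated data. The function name `tresp_coupling` is likewise only illustrative.

```python
import math

# Coulomb constant e^2/(4*pi*eps0) in eV*Angstrom (approximately 14.3996).
COULOMB_EV_ANGSTROM = 14.399645
# Conversion factor from eV to wavenumbers (cm^-1).
EV_TO_CM1 = 8065.544


def tresp_coupling(site_a, site_b, screening=1.0):
    """Coulomb coupling in eV between two sets of (x, y, z, q) transition charges.

    Coordinates are in Angstrom and charges in units of e.
    `screening` is an assumed, optional scaling factor (1.0 means no screening).
    """
    total = 0.0
    for xa, ya, za, qa in site_a:
        for xb, yb, zb, qb in site_b:
            r = math.dist((xa, ya, za), (xb, yb, zb))
            total += qa * qb / r
    return screening * COULOMB_EV_ANGSTROM * total


# Hypothetical toy charges (NOT taken from Tables S1/S2):
# two small parallel transition dipoles separated by 10 Angstrom.
pigment_1 = [(0.0, 0.0, 0.0, 0.1), (1.0, 0.0, 0.0, -0.1)]
pigment_2 = [(0.0, 10.0, 0.0, 0.1), (1.0, 10.0, 0.0, -0.1)]

v_ev = tresp_coupling(pigment_1, pigment_2)
print(f"coupling: {v_ev * EV_TO_CM1:.2f} cm^-1")
```

With the real data, each list would hold one row per atom of the respective table, shifted and rotated to the pigment geometry of the snapshot under consideration.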
Hyperparameter optimization
In the GPR ML models, the set of hyperparameters $\sigma_n^2, \sigma_f^2, l_1^2, \ldots, l_D^2$ needs to
be optimized as part of the training process. To this end, we use the marginal log likelihood
(MLL) of the model. GPyTorch (Ref. 1) provides the ADAM optimizer, which we run with a
learning rate of lr = 0.1 for a total of 200 steps on a separate, randomly chosen validation
set of 1,000 samples. Since optimizers like ADAM are sensitive to the choice of initial values, we
first select an improved set of initial values by grid search with 5-times repeated 5-fold cross
validation on the MLL optimization task over the grid given in Table S3. The best initial
values are then used in a final optimization procedure over the full validation set.
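For reference, the marginal log likelihood of a GPR model with Gaussian noise has a standard closed form, given here for $N$ samples with targets $\mathbf{y}$ and descriptors $X$. The kernel is not specified in this section, so the squared-exponential form with one length scale per descriptor dimension is an assumption made only to show where the hyperparameters enter.

```latex
\log p(\mathbf{y}\mid X,\theta)
  = -\tfrac{1}{2}\,\mathbf{y}^{\top}\left(K_\theta+\sigma_n^2 I\right)^{-1}\mathbf{y}
    -\tfrac{1}{2}\log\det\left(K_\theta+\sigma_n^2 I\right)
    -\tfrac{N}{2}\log 2\pi ,
\qquad
\left[K_\theta\right]_{ij}
  = \sigma_f^2 \exp\!\Big(-\tfrac{1}{2}\sum_{d=1}^{D}\frac{(x_{i,d}-x_{j,d})^2}{l_d^2}\Big)
```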
Table S3: Grid of possible initial values for the ADAM-based hyperparameter optimization.
Hyperparameter                      Possible values
lengthscales $l_1, \ldots, l_D$     {2, $10^{0.75}$, $10^{1.5}$}
output scale $\sigma_f$             {1}
noise $\sigma_n^2$                  {$10^{-4}$, $10^{-6}$, $10^{-8}$}
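The grid search over Table S3 combined with 5-times repeated 5-fold cross validation can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: a single shared initial length scale for all dimensions is assumed, and `score_fn` is a hypothetical callback standing in for the ADAM-based MLL optimization and its evaluation on the held-out fold.

```python
import itertools
import random

# Candidate initial values as listed in Table S3.
LENGTHSCALES = [2.0, 10 ** 0.75, 10 ** 1.5]
OUTPUT_SCALES = [1.0]
NOISES = [1e-4, 1e-6, 1e-8]


def initial_value_grid():
    """All combinations of initial values (one shared length scale is assumed)."""
    return [
        {"lengthscale": l, "outputscale": s, "noise": n}
        for l, s, n in itertools.product(LENGTHSCALES, OUTPUT_SCALES, NOISES)
    ]


def repeated_kfold(n_samples, n_folds=5, n_repeats=5, seed=0):
    """Yield (train_idx, test_idx) index lists for repeated k-fold cross validation."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        folds = [idx[k::n_folds] for k in range(n_folds)]
        for k in range(n_folds):
            test = folds[k]
            train = [i for j, fold in enumerate(folds) if j != k for i in fold]
            yield train, test


def select_initial_values(score_fn, n_samples=1000):
    """Return the grid point with the best mean score over all splits.

    `score_fn(params, train, test)` is a hypothetical callback that would run the
    optimization starting from `params` and return a score (higher is better).
    """
    splits = list(repeated_kfold(n_samples))

    def mean_score(params):
        return sum(score_fn(params, tr, te) for tr, te in splits) / len(splits)

    return max(initial_value_grid(), key=mean_score)
```

The grid has 3 x 1 x 3 = 9 points, and each point is evaluated on 25 train/test splits of the 1,000 validation samples.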
Preliminary Multifidelity Data Analysis
[Figure S2, three panels: "Distribution" (density versus energy in meV for LC-DFTB, STO-3G, 6-31G, SVP, and TZVP), "Fidelity Difference" (in meV, for LC-DFTB, STO-3G, 6-31G, and SVP), and "Fidelity Scatter" (SVP energy versus the energy of each fidelity in meV, for LC-DFTB, STO-3G, 6-31G, and SVP).]
Figure S2: Preliminary multifidelity data analysis as recommended in Ref. 2 for porphyrin
molecule p9. The STO-3G fidelity shows an unfavorable distribution with respect to the
target fidelity of SVP.
[Figure S3, three panels: "Distribution" (density versus energy in meV for LC-DFTB, 6-31G, SVP, and TZVP), "Fidelity Difference" (in meV, for LC-DFTB, 6-31G, and SVP), and "Fidelity Scatter" (SVP energy versus the energy of each fidelity in meV, for LC-DFTB, 6-31G, and SVP).]
Figure S3: Same as Fig. S2 but without the STO-3G energies.
[Figure S4, three panels: "Distribution" (density versus energy in meV for LC-DFTB, STO-3G, 6-31G, SVP, and TZVP), "Fidelity Difference" (in meV, for LC-DFTB, STO-3G, 6-31G, and SVP), and "Fidelity Scatter" (SVP energy versus the energy of each fidelity in meV, for LC-DFTB, STO-3G, 6-31G, and SVP).]
Figure S4: Preliminary multifidelity data analysis as recommended in Ref. 2 for the
concatenated trajectories of the p-type porphyrin molecules. The STO-3G fidelity shows an
unfavorable distribution with respect to the target fidelity of SVP.
[Figure S5, three panels: "Distribution" (density versus energy in meV for LC-DFTB, 6-31G, SVP, and TZVP), "Fidelity Difference" (in meV, for LC-DFTB, 6-31G, and SVP), and "Fidelity Scatter" (SVP energy versus the energy of each fidelity in meV, for LC-DFTB, 6-31G, and SVP).]
Figure S5: Same as Fig. S4 but without the STO-3G energies.
[Figure S6, three panels: "Distribution" (density versus energy in meV for LC-DFTB, STO-3G, 6-31G, SVP, and TZVP), "Fidelity Difference" (in meV, for LC-DFTB, STO-3G, 6-31G, and SVP), and "Fidelity Scatter" (SVP energy versus the energy of each fidelity in meV, for LC-DFTB, STO-3G, 6-31G, and SVP).]
Figure S6: Preliminary multifidelity data analysis as recommended in Ref. 2 for the
concatenated trajectories of the m-type porphyrin molecules. The STO-3G fidelity shows an
unfavorable distribution with respect to the target fidelity of SVP.
[Figure S7, three panels: "Distribution" (density versus energy in meV for LC-DFTB, 6-31G, SVP, and TZVP), "Fidelity Difference" (in meV, for LC-DFTB, 6-31G, and SVP), and "Fidelity Scatter" (SVP energy versus the energy of each fidelity in meV, for LC-DFTB, 6-31G, and SVP).]
Figure S7: Same as Fig. S6 but without the STO-3G energies.
Additional Machine Learning Results
[Figure S8, three panels: MAE [meV] versus the number of SVP training samples (2 to 1024); legend: f_b, GPR-SVP, 6-31G, 3-21G, STO-3G, DFTB; MAE values annotated in the panels: 12.3 and 24.8 meV, 24.7 and 29.1 meV, 17.8 and 22.3 meV.]
Figure S8: MFML learning curves with STO-3G fidelity included.
[Figure S9, panels (a) p-TMPyP traj-9, (b) p-TMPyP concatenated, and (c) m-TMPyP concatenated: MAE [meV] versus the time needed to generate the training data [hr] (10^1 to 10^4 hr); legend: GPR (SVP), MFML(LC-DFTB), (2), (4), (8), (16).]
Figure S9: Full time-cost plots.
Spectral Densities
[Figure S10, panels (a) m1 to (d) m4 and (e) p1 to (p) p12: spectral density J in cm^-1 (left axis, 0 to 5000) and in eV (right axis, 0 to 0.6) versus energy [eV] from 0 to 0.40 eV.]
Figure S10: Spectral densities of the individual pigments.
[Figure S11, panels (a) m-type porphyrin and (b) p-type porphyrin: spectral density J in cm^-1 (left axis) and in eV (right axis) versus energy [eV] from 0 to 0.40 eV, comparing DFTB and MFML.]
Figure S11: Comparison of the average spectral densities based on MFML and DFTB.
References
(1) Gardner, J.; Pleiss, G.; Weinberger, K. Q.; Bindel, D.; Wilson, A. G. GPyTorch: Blackbox
matrix-matrix Gaussian process inference with GPU acceleration. Advances in Neural
Information Processing Systems 2018, 31.
(2) Vinod, V.; Maity, S.; Zaspel, P.; Kleinekathöfer, U. Multifidelity Machine Learning for
Molecular Excitation Energies. J. Chem. Theory Comput. 2023, 19, 7658–7670, PMID:
37862054.
Excitation Energy Transfer between Porphyrin
Dyes on a Clay Surface: A study employing
Multifidelity Machine Learning
Dongyu Lyu,† Matthias Holzenkamp,‡ Vivin Vinod,¶ Yannick Marcel Holtkamp,†
Sayan Maity,† Carlos R. Salazar,†,§ Ulrich Kleinekathöfer,∗,† and Peter Zaspel∗,‡
†School of Science, Constructor University, Bremen 28759, Germany
‡School of Mathematics and Natural Science, University of Wuppertal, Wuppertal 42119,
Germany
¶School of Mathematics and Natural Sciences, University of Wuppertal, Wuppertal 42119,
Germany
§Department of Physics, Chemistry and Biology (IFM), Linköping University, SE-581 83
Linköping, Sweden
E-mail: ukleinekathoefer@constructor.university;zaspel@uni-wuppertal.de
Abstract
Natural light-harvesting antenna complexes efficiently capture solar energy using
chlorophyll, i.e., magnesium porphyrin pigments, embedded in a protein matrix. In-
spired by this natural configuration, artificial clay-porphyrin antenna structures have
been experimentally synthesized and have demonstrated remarkable excitation energy
transfer properties. The study presents the computational design and simulation of
a synthetic light-harvesting system that emulates natural mechanisms by arranging
cationic free-base porphyrin molecules on an anionic clay surface. We investigated the
transfer of excitation energy among the porphyrin dyes using a multiscale quantum me-
chanics/molecular mechanics (QM/MM) approach based on the semi-empirical density
functional-based tight-binding (DFTB) theory for the ground state dynamics. To im-
prove the accuracy of our results, we incorporated an innovative multifidelity machine
learning (MFML) approach, which allows the prediction of excitation energies at the
numerically demanding time-dependent density functional theory level with the def2-SVP
basis set. This approach was applied to an extensive dataset of 640K geometries for the
90-atom porphyrin structures, facilitating a thorough analysis of the excitation energy
diffusion among the porphyrin molecules adsorbed to the clay surface. The insights
gained from this study, inspired by natural light-harvesting complexes, demonstrate
the potential of porphyrin-clay systems as effective energy transfer systems.
Introduction
In nature, sunlight is collected efficiently by pigment molecules embedded in a protein matrix
and subsequently transferred to a reaction center. At the same time, remarkable progress
has been made in recent years in dye chemistry for developing functional materials.1 To identify
next-generation energy devices, researchers are exploring both experimental and theoretical
biohybrid approaches inspired by biological systems.2 Of particular interest are efforts to
translate biological principles directly into synthetic energy systems.3 To this end,
photochemical systems with a light-harvesting function on inorganic nanosheets are being
developed.4–7 In particular, porphyrins and other tetrapyrrole macrocycles have been shown to
have an impressive variety of functional properties that have been and can be exploited in
natural and artificial systems.8
Experimental results indicate that clay-porphyrin complexes have the potential to be
used in the development of highly efficient artificial light-harvesting systems.4,9 The
assembly of porphyrin molecules on clay surfaces can be effectively controlled using electrostatic
interactions.4,5,10 The authors of Ref. 4 reported an almost 100 % efficiency of the
energy-transfer reaction. In terms of adsorption, the “size-matching effect”, which entails aligning the anionic
charge distribution with the dimensions of the adsorbing molecules, has been posited as a
means of achieving enhanced adsorption. Furthermore, the observation of multiple-step
energy transfer reactions between three adsorbed dyes on the clay surface is noteworthy.5 The
combination of a metalloporphyrin as a photocatalyst and a subporphyrin as a photosen-
sitizer in a self-assembling manner has been shown to enable the photochemical conversion
of cyclohexene using a wider range of visible light without any energy loss due to the
suppression of unexpected deactivation processes.6 Other porphyrin-clay arrangements include,
e.g., the intercalation of a double-decker porphyrin metal complex into clay nanosheets.11
These examples demonstrate potential applications of porphyrin-clay systems. In this study,
we examine a clay-porphyrin system that emulates the experimental setup of Ref. 4. To this
end, we have developed a molecular multiscale approach coupled to multi-fidelity machine
learning in order to simulate the excitation energy transfer.
In several of the experimental studies, the clay saponite is utilized. While the atom-
istic structure has yet to be resolved, numerous individual features have been identified.12
Given the lack of full atomic-level details for saponite, this study will utilize montmoril-
lonite nanosheets for simulations, as has been done in other studies on clays, including the
adsorption of esters and polymers.13–16 Very recently, ClayCode, a software facilitating the
modeling of clay systems closely resembling experimentally determined structures, has been
made public.17 This will facilitate the simulation of an even greater number of clay types.
During photosynthesis in plants, algae, and some bacteria, the pigment molecules responsible
for the absorption of sunlight are (bacterio)chlorophyll molecules. These molecules play
a dual role in the process, serving both as light absorbers and as mediators of excitation
energy transfer to the reaction centers. In the experimental system, which will be emulated in
this study, tetrakis(1-methylpyridinium-3-yl) (m-TMPyP) and tetrakis(1-methylpyridinium-4-yl)
porphyrins (p-TMPyP) adsorbed on a clay surface were selected for the same objective (Ref. 4;
see Fig. 1). Therefore, the present simulation study will focus on the same porphyrin types,
with the understanding that the results may also be applicable to other porphyrin molecules.
Regarding the approach to modeling the excitation energy transfer in the porphyrin-clay
system, we will largely follow a procedure that has been successfully employed to model bi-
ological light-harvesting systems.18–23 In summary, the initial step is to conduct a molecular
dynamics (MD) simulation of the entire system. The equilibrated structure is then used
as the starting point for QM/MM (quantum mechanics/molecular mechanics) simulations
of the pigment molecules. These employ the efficient density functional tight binding
(DFTB) approach (Ref. 24) to improve the description of the vibrational motion of the pigment
molecules. Subsequently, the excited states are determined using a machine-learning ap-
proach as detailed below. At the same time, the excitonic couplings are calculated using the
popular TrESP (Transition charges from Electrostatic potential) approach.25,26 Based on this
information, a time-dependent Hamiltonian can be constructed, which in turn is used to
estimate the exciton dynamics employing the NISE (Numerical Integration of the Schrödinger
Equation) formalism,27 also known as the Ehrenfest scheme without back-reaction.28
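In compact form, the time-dependent Hamiltonian and the propagation step of the NISE scheme can be written as below. This is the standard textbook formulation; the symbols ($E_m$ for the site energies, $V_{mn}$ for the excitonic couplings, $\Delta t$ for the time step) are generic choices made for this sketch and are not notation taken from the text.

```latex
H(t) = \sum_{m} E_m(t)\,|m\rangle\langle m|
     + \sum_{m\neq n} V_{mn}(t)\,|m\rangle\langle n| ,
\qquad
|\psi(t+\Delta t)\rangle \approx
  \exp\!\left(-\frac{i}{\hbar}\,H(t)\,\Delta t\right)|\psi(t)\rangle
```

The Hamiltonian is treated as constant within each short time step, and the bath acts on the excitonic system through the fluctuations of $E_m(t)$ and $V_{mn}(t)$ only, without any feedback of the excitation on the nuclear motion.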
Concerning the calculation of the excitation energies along the QM/MM trajectories, we
aim to substantially improve their accuracy by applying multi-fidelity machine learning
(MFML) to the present problem.29,30 Starting from a training set of pairs of (descriptors
for) molecular geometries and corresponding calculated excitonic properties, we can con-
struct regression models in machine learning (ML) to be used for the determination of the
excitation energies along trajectories. These models are cheap to evaluate and ideally have
a low prediction error for unseen molecular geometries that are picked from a neighborhood
of the conformation space spanned by the training data.31 We use ML models
to predict excitation energies at the level of time-dependent density functional theory us-
ing the CAM-B3LYP functional (TD-DFT/CAM-B3LYP) with the basis set def2-SVP. To
get a strong model with low prediction error, we would need a large amount of such train-
ing samples, which would however come at a prohibitive computational cost. Instead, we
use MFML models,29,30,32,33 which are specifically constructed from training data at different
levels of fidelity, here different basis set sizes (STO-3G, 3-21G, 6-31G, def2-SVP) for the
TD-DFT/CAM-B3LYP calculations with a strongly decreasing number of samples with growing
level of the hierarchy.33 Thereby, as we will show, we clearly reduce the total amount of
computational effort over building “classical” single-fidelity ML models from standard train-
ing data. In addition, we investigate active learning strategies34–37 to reduce the required
amount of training samples, even on a single level. Active learning aims at choosing or
creating new training samples, which are maximally informative to the model and therefore
reduce redundant information in the training data. In our setup, the selection process is
done on a large amount of candidate molecular configurations (without calculated energies)
from the MD runs via uncertainty sampling.38
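The idea of uncertainty sampling can be illustrated with a small, self-contained sketch. It assumes that the uncertainty measure is the GPR predictive variance with a squared-exponential kernel, which is one common choice and is not confirmed by this section. The one-dimensional descriptors, the fixed length scale, and all function names are hypothetical. The predictive variance depends only on the descriptors, so candidates can be ranked before any excitation energy is calculated.

```python
import math


def rbf(a, b, lengthscale=1.0):
    """Squared-exponential kernel between two descriptor vectors."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * d2 / lengthscale ** 2)


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def predictive_variance(candidate, selected, noise=1e-6):
    """GPR predictive variance at `candidate` given the already selected points."""
    if not selected:
        return rbf(candidate, candidate)
    gram = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(selected)]
            for i, a in enumerate(selected)]
    k_star = [rbf(candidate, s) for s in selected]
    weights = solve(gram, k_star)
    return rbf(candidate, candidate) - sum(k * w for k, w in zip(k_star, weights))


def uncertainty_sampling(candidates, n_select):
    """Greedily pick the candidate with the largest predictive variance."""
    selected, pool = [], list(candidates)
    for _ in range(n_select):
        best = max(pool, key=lambda c: predictive_variance(c, selected))
        selected.append(best)
        pool.remove(best)
    return selected


# Hypothetical 1-D descriptors standing in for molecular geometries.
pool = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (10.0,)]
chosen = uncertainty_sampling(pool, 3)
```

In this toy pool, the near-duplicates at 0.1 and 0.2 are skipped after 0.0 has been selected, which is the redundancy reduction described above.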
The following section outlines the atomistic setup of the porphyrin-clay system, which
will be simulated at the classical MD and QM/MM levels. The MD simulations allow us
to assess the stability and movement of porphyrin molecules on the insulating clay surface.
Furthermore, we examine the nearest-neighbor distances of select porphyrin molecules and
the resulting excitonic couplings between them. Subsequently, we will address the topic
of active learning and multi-fidelity machine learning of excitation energies. Based on these
results, we can determine the spectral densities, which describe the interaction of the primary
modes and those treated as a thermal bath. The excitation energies and excitonic couplings
are used in the NISE approach to determine the exciton dynamics and the diffusion of
excitation energy along the clay surface. The contribution concludes with a brief overview
of the key findings and future directions, while a Methods section gives some more insights
into technical details of the used approaches.
MD and QM/MM simulations
In a first step, the clay surface has to be modelled. Here we decided to model a sheet
of montmorillonite since it is widely used in experiments and structural data is available.
The chemical formula of montmorillonite is $(\mathrm{K},\mathrm{Na})_x[\mathrm{Si}_4\mathrm{O}_8][\mathrm{Al}_{2-x}\mathrm{Mg}_x\mathrm{O}_2(\mathrm{OH})_2]$. In the
present study, we use Na and Cl ions as counterions dissolved in the surrounding water. The
parameter x, which can vary between 0 and 0.95, indicates the number of aluminum and
magnesium atoms in the system and can be used to adjust the charge of the surface. Four
different values of x, i.e., 0.13, 0.2, 0.45 and 0.94, were tested together with four m-TMPyP
and twelve p-TMPyP molecules (see Methods section for details of the MD and QM/MM
setups). Larger values of x make the clay surface more anionic, though the cationic porphyrin
molecules have to compete with the sodium ions for binding to the surface. It has to be
noted that in the present cationic montmorillonite model, the negative nature of the surface
is due to atom exchange in the so-called octahedral layer, while in the experimental setup
with the saponite system the ions were exchanged in the tetrahedral layer (Ref. 4; see Methods
section for details). The latter layer is closer to the surface of the material. Therefore, the
negative charges in the present modeling setup are slightly more screened at the surface,
leading to a somewhat reduced binding of the molecules to the surface. The montmorillonite
surface with an x value of 0.45 was chosen for the following calculations, since all porphyrin
molecules stayed on the surface after the MD equilibration for 30 ns. In simulations with the
other values of x, at least one dye molecule left the surface during the simulations.
(a) (b)
Figure 1: (a) The chemical structures of the two porphyrins used in experiment (Ref. 4) and in the
present study. (b) The montmorillonite surface used in our simulations. Oxygen, silicon,
and hydrogen atoms are shown in red, yellow, and white, respectively, while the aluminum
and magnesium atoms are colored in pink and blue, respectively. Visible in the structure is
the non-regular arrangement of the magnesium defects located in the middle layer.
Starting from the last frame of the MD simulation, quantum mechanics/molecular me-
chanics dynamics (QM/MM) ground state simulations are performed. For each porphyrin
molecule, 40,000 conformations including all surrounding point charges were saved with a
stride of 1 fs. Subsequently, these conformations were used as the basis of excited state
calculations using TD-LC-DFTB.26,39 In addition, TD-DFT/CAM-B3LYP calculations in
combination with four different basis sets, i.e., STO-3G, 3-21G, 6-31G and def2-SVP, were
also used to calculate the excited states of these conformations with a stride of 8, 16, 32 and
64 fs, respectively. Unlike the protein environment in natural light-harvesting systems, the
clay surface is basically rigid and has a high degree of periodicity if one neglects the non-
periodic charge distribution due to the Mg doping inside the material. The conformations
of the porphyrin molecules, however, are flexible and show fluctuations. These changes in
the conformations lead to energy fluctuations. A short segment of such energy fluctuations is
depicted in Fig. 2a. The excitation energies, also termed site energies below, lead to distri-
butions as shown in Fig. 2b. It becomes clear that the m-type porphyrins have site energies
which are, on average, higher in energy than those of the p-type porphyrin dyes. The average
site energies obtained at the different levels of theory are offset with respect to each other
but follow an overall consistent trend, as shown in Fig. S1; the average excitation energies are
shown in Fig. 2c. Pigments of the same type have somewhat different site energies due to
their different placements on the surface and the interactions with other porphyrins on the
surface. These energy values from different levels of quantum chemistry are then used as
input for the MFML model, as described in the next section.
[Figure 2, panels (a) site energy [eV] versus time [fs] (0 to 200 fs) for m1, m2, p1, and p2; (b) density versus site energy [eV] for m1 to m4 and p1 to p12; (c) average site energy [eV] per pigment for TD-LC-DFTB, STO-3G, 3-21G, 6-31G, def2-SVP, and ML.]
Figure 2: (a) An example fragment of the energy fluctuations at TD-LC-DFTB level of
theory for four porphyrins on the surface. (b) Distribution of the excited state energies
using TD-LC-DFTB, which are approximately Gaussian distributed. (c) The average site
energies using different levels of quantum chemistry and the predictions of the MFML model.
The numbering of the different m-type (m1-m4) and p-type (p1-p12) porphyrin dyes on the
clay surface is discussed below.
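The sampling strides quoted above translate directly into the number of available energies per level of theory. The short sketch below works this out. It assumes that every stride starts at the first saved frame, which is not stated explicitly in the text.

```python
N_FRAMES = 40_000   # conformations per porphyrin, saved every 1 fs
N_PIGMENTS = 16     # 4 m-TMPyP + 12 p-TMPyP molecules

# Sampling stride in fs (equal to frames here) for each level of theory.
STRIDES = {
    "TD-LC-DFTB": 1,
    "STO-3G": 8,
    "3-21G": 16,
    "6-31G": 32,
    "def2-SVP": 64,
}


def frames_for(stride, n_frames=N_FRAMES):
    """Indices of the frames evaluated at a given stride (assumed to start at frame 0)."""
    return range(0, n_frames, stride)


counts = {level: len(frames_for(stride)) for level, stride in STRIDES.items()}
for level, n in counts.items():
    print(f"{level:>10s}: {n:6d} frames per pigment, {n * N_PIGMENTS:7d} in total")

# Every stride divides the next one, so the frame sets are nested:
# each def2-SVP frame also has 6-31G, 3-21G, STO-3G and TD-LC-DFTB energies.
levels = list(STRIDES)
nested = all(
    set(frames_for(STRIDES[hi])) <= set(frames_for(STRIDES[lo]))
    for lo, hi in zip(levels, levels[1:])
)
```

Under this assumption, the 16 pigments with 40,000 frames each give the 640K geometries mentioned in the abstract, while only 625 frames per pigment carry a def2-SVP energy.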
Furthermore, an additional 100 ns-long classical MD simulation was performed producing
10,000 snapshots at a stride of 10 ps in order to be able to better evaluate the positioning and
movement of the porphyrin dyes on the clay surface and to determine the excitonic couplings
between them. In Fig. 3, three snapshots at 0, 50 and 100 ns of this additional trajectory give
an impression of the positions and conformational difference of the porphyrin molecules along
the trajectory. The m-TMPyP molecules (red) are tightly adsorbed, with all four side chain
groups interacting closely with the clay surface for most of the simulation time. It should
be noted that the adsorption positions of the side chains are mostly close to the positions of
the magnesium defects in the montmorillonite. In Refs. 4 and 40, this has been termed the
“size-matching rule”. In contrast, only two to three side chains of the p-TMPyP molecules
(blue) are adsorbed on the surface, and the rest are “wandering” in the water environment,
so that the molecules “stand” at a certain angle and move more easily on the clay surface.
(a) 0 ns (b) 50 ns (c) 100 ns
Figure 3: Several snapshots of the classical MD trajectory. The montmorillonite surface
is shown in yellow in the background, with gray beads representing the magnesium defects
inside the clay material. The m-TMPyP and p-TMPyP molecules on the surface are colored
in red and blue, respectively.
(a) Single Copy (b) All Copies
Figure 4: Scatter plot of the centers of mass of the dyes reported every 100 ps along the
100 ns trajectory in (a) a single unit cell and (b) in a plot also showing the nearest periodic
images to all sides in the plane of the clay surface.
To better visualize the range of movement of the porphyrins along the trajectory,
the center of mass of each pigment along the trajectory was extracted and plotted in Fig. 4.
Compared to the m-type porphyrins, which are approximately fixed on the surface, the
p-type ones move distinctly. These movements also significantly modify the now
time-dependent distances between porphyrins. Note in particular that although
Fig. 4a suggests a large range of motion for the porphyrin m3, it only starts moving after
molecule p2 approaches.
Figure 5: Nearest neighbor distances in nm (top row) and strongest couplings in cm−1
(bottom row) as a function of time along the 100 ns trajectory for select pigments:
(a) nearest neighbors of m1, (b) nearest neighbors of m3, (c) nearest neighbors of p2,
(d) strongest coupling to m1, (e) strongest coupling to m3, (f) strongest coupling to p2.
The colors indicate which neighbor is the nearest (or strongest coupled) at a given
time. The nearest neighbors and pigments with the largest coupling value can be across the
periodic boundary.

In natural light-harvesting systems, the chromophores are embedded in a stable protein
environment without significant changes in the distances between them. However, in
this artificial porphyrin-clay surface system, not only the distance between the porphyrin
molecules, but also which dye is the nearest neighbor can change continuously while the
porphyrins move on the surface. These changes in distance between pigments have a significant
impact on the excitonic couplings and subsequently the exciton transfer dynamics.
Considering the periodic boundary conditions, the distance between each porphyrin and its
nearest neighbor was extracted from an extended box with 3×3 copies (see Fig. 4b) over the
10,000 frames. Here, we choose three porphyrins as examples and show the nearest neighbor
distances in the top row of Fig. 5. The different colors indicate the (frequent) change of the
nearest neighbor. The average nearest neighbor distance across all pigments and snapshots
is 2.01 nm, which is of the same magnitude as the value of 2.6 nm reported in the respective
experiment.4 However, in the analysis of the experiment, a perfect hexagonal packing was
assumed for the porphyrins on the surface. The average distance to the six nearest neighbors
is 2.92 nm in our case, suggesting that our distances are on average slightly larger than the
value reported in the experiment.
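The nearest-neighbor search over the 3×3 block of periodic copies can be sketched as follows. This is a minimal illustration that assumes an orthorhombic in-plane cell and center-of-mass coordinates in nm; the function name, the cell lengths and the toy coordinates are hypothetical and are not taken from the simulation.

```python
import numpy as np

def nearest_neighbors_periodic(com_xy, box_xy):
    """Return (index, distance) of the nearest neighbor of each pigment.

    com_xy : (n, 2) in-plane center-of-mass positions of one frame
    box_xy : (2,) lengths of the (assumed orthorhombic) in-plane unit cell
    The search runs over the central cell and its 8 in-plane periodic images.
    """
    com_xy = np.asarray(com_xy, dtype=float)
    box_xy = np.asarray(box_xy, dtype=float)
    n = len(com_xy)
    # Shifts generating the 3x3 block of copies: (-1, 0, +1) cells in x and y.
    shifts = np.array([(i, j) for i in (-1, 0, 1) for j in (-1, 0, 1)]) * box_xy
    # images[s, b] is pigment b displaced by shift s.
    images = com_xy[None, :, :] + shifts[:, None, :]
    # dist[a, s, b]: distance from pigment a (central cell) to image s of b.
    diff = com_xy[:, None, None, :] - images[None, :, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Minimum over images gives the pigment-to-pigment distance under PBC.
    dmin = dist.min(axis=1)
    np.fill_diagonal(dmin, np.inf)  # a pigment is not its own neighbor
    idx = dmin.argmin(axis=1)
    return idx, dmin[np.arange(n), idx]

# Toy frame: three pigments in a 10 nm x 10 nm cell (hypothetical numbers).
coms = [[0.5, 5.0], [9.5, 5.0], [5.0, 5.0]]
idx, d = nearest_neighbors_periodic(coms, [10.0, 10.0])
```

In the toy frame the first two pigments are closest to each other across the periodic boundary, which is the situation noted in the caption of Fig. 5.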
The couplings were calculated based on the Coulomb interaction of the transition charges

$$V_{mn}(t) = \frac{f}{4\pi\epsilon_0} \sum_{I,J} \frac{q_I^{m,T} \cdot q_J^{n,T}}{\left| \mathbf{r}_I^m(t) - \mathbf{r}_J^n(t) \right|} , \tag{1}$$

where $\epsilon_0$ denotes the vacuum permittivity, $q_I^{m,T}$ and $q_J^{n,T}$ the transition charges of atoms $I$
and $J$ from pigments $m$ and $n$, while $\mathbf{r}_I^m(t)$ and $\mathbf{r}_J^n(t)$ are the positions and $f$ is the screening
factor. The transition charges were calculated with the “transition charges from electrostatic
potentials” (TrESP) method41,42 based on the CAM-B3LYP/6-31G* level of theory and can be
found in Tables S1 and S2. The positions were based on the MD trajectory with a time step
of 10 ps. Furthermore, we chose the pairwise screening factor introduced in Ref. 43 as the
screening factor $f$. In a final step, we usually rescale the coupling to match experimental
values. Since no experimental dipole moments for these types of porphyrin were available,
we used a previous rescaling factor for bacteriochlorophyll (BChl).18 The largest absolute
coupling value was extracted for each porphyrin along the trajectory. The bottom row of
Fig. 5 shows the highest coupling values for the same pigments as in the top row. The signs
of the coupling values can vary over time, as the nearest neighbor and the orientation of the
porphyrin molecules with respect to each other change over time. Therefore, the coupling
values indirectly also contain some information about the rotation of the molecules, i.e.,
mainly of the p-type molecules since the m-type molecules are rather immobile. Usually, the
closest pigment is also the one which has the largest excitonic coupling value. In cases of
similar distances to the nearest and the next nearest neighbor, however, the orientation of
the pigments to each other becomes the determining factor. The average highest absolute
coupling is 4.36 cm−1, while the median is only 2.44 cm−1, suggesting strong outliers, as can
also be seen in Fig. 5. With such strong fluctuations and even changes of sign, the usual
practice of taking average couplings is likely inadequate in our case, so that we have to take
the time dependence of the couplings into account for our further investigations.
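The Coulomb sum of Eq. 1 for a single snapshot can be sketched as below. This is only an illustration under stated assumptions: charges are given in units of e and positions in Å, the screening factor is passed in as a plain number rather than the pairwise form of Ref. 43, and no rescaling is applied. The function name and the toy charges are hypothetical.

```python
import numpy as np

COULOMB_EV_ANGSTROM = 14.399645   # e^2 / (4 pi eps0) in eV * Angstrom
EV_TO_WAVENUMBER = 8065.544       # 1 eV expressed in cm^-1

def tresp_coupling(q_m, r_m, q_n, r_n, f=1.0):
    """Coulomb coupling of Eq. (1) between two sets of transition charges.

    q_m, q_n : transition charges in units of e, shapes (A,) and (B,)
    r_m, r_n : atomic positions in Angstrom, shapes (A, 3) and (B, 3)
    f        : screening factor, here a plain number supplied by the caller
    Returns the coupling in cm^-1.
    """
    q_m, q_n = np.asarray(q_m, float), np.asarray(q_n, float)
    r_m, r_n = np.asarray(r_m, float), np.asarray(r_n, float)
    # All pairwise interatomic distances |r_I^m - r_J^n|.
    dist = np.linalg.norm(r_m[:, None, :] - r_n[None, :, :], axis=-1)
    v_ev = f * COULOMB_EV_ANGSTROM * np.sum(np.outer(q_m, q_n) / dist)
    return v_ev * EV_TO_WAVENUMBER

# Toy pigments: two small "transition dipoles" built from +/- charges.
q = np.array([0.1, -0.1])
r_a = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
r_b = r_a + np.array([0.0, 20.0, 0.0])
v_parallel = tresp_coupling(q, r_a, q, r_b)
v_flipped = tresp_coupling(q, r_a, -q, r_b)   # second pigment rotated by 180 degrees
```

Flipping the orientation of one pigment reverses the sign of the coupling while leaving its magnitude unchanged, which is why the sign carries information about the mutual orientation.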
Machine Learning
Expensive calculations of the excitation energy for the first excited state are replaced by
evaluations of MFML models, one model for all 12 p-type molecules and one model for all
m-type molecules. Prior to building these two MFML models on a large, computationally
expensive training dataset, the characteristics of the porphyrin data of the system are studied
in the context of ML model construction for cheap single-fidelity data, namely excitation
energies calculated using the TD-LC-DFTB approach. A particularly important topic is the
transferability of the final models across the pigments. In addition, the impact of using Active
Learning is studied.
Transferability Analysis
In Figure 6 we analyze the configuration spaces of the 12 different p-type molecules, which
are encountered during the MD run, and how they relate to each other. In Fig. 6a a UMAP
(Uniform Manifold Approximation and Projection) plot44 of the Coulomb representations of
the 40,000 configurations of every p-type trajectory is shown. It can be seen that, in most
cases, the configuration spaces covered by the different molecules are clearly
separated. Only in a few cases do the configuration spaces of two molecules overlap, e.g.,
for p9 and p1. These distinct configuration spaces are problematic, since they give a first
indication of poor transferability of ML models trained on a single trajectory
to the trajectories of the remaining pigments.

Figure 6: Results of a transferability study of single-fidelity models across p-type pigments:
(a) For an understanding of the variations in the porphyrin conformations, a UMAP (Uniform
Manifold Approximation and Projection) plot is provided based on the MD trajectories for
the 12 p-type porphyrin molecules. (b) Learning curves (MAE in meV versus Ntrain) for a
single-fidelity model trained on p9 and separately evaluated on all p-type pigments (test on p9
and test on other trajectories). The model does not generalize to
the other pigments. (c) Learning curves for a single-fidelity model trained on the union of
the trajectories of all p-type pigments (test on all trajectories and test on separate
trajectories). The model generalizes much better. Note that for
better visibility, the vertical axes of (b) and (c) have different ranges.

To study this transferability further, we train a single-fidelity Gaussian Process Regression
(GPR)45 model only on p9 data, with hyperparameters optimized on a separate,
randomized validation subset of 1,000 samples from the p9 trajectory. Figure 6b shows
learning curves, i.e., the prediction error as a function of a growing number of training
samples. Here, the model built on p9 data is evaluated on separate, randomized subsets of
size 2,000 for each p-type molecule. Results reflect an average over five GPR models that are
trained using different random selections of training data, excluding the test and validation
sets. The model trained exclusively on p9 fails to make reasonable predictions for configurations
from the other trajectories. The only exception is testing on p1, where the error
decreases with an increasing number of training samples, although the error is significantly
higher than for the test set of p9. In the UMAP, see Figure 6a, the configuration spaces of p1
and p9 are among the few configuration spaces of different trajectories that overlap.
This shows that the differences in configuration spaces among the various trajectories, as
illustrated in the UMAP, are indeed substantial. A model trained solely on samples from
one trajectory cannot be transferred to predict configurations from other trajectories.
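The construction of such learning curves, and the loss of accuracy outside the region covered by the training data, can be illustrated with a small self-contained sketch. Everything in it is an assumption made for illustration: a one-dimensional synthetic target replaces the excitation energies, a squared-exponential kernel with fixed hyperparameters replaces the optimized GPR model, and two disjoint intervals stand in for two trajectories with non-overlapping configuration spaces.

```python
import numpy as np

def rbf_kernel(a, b, length=1.0):
    """Squared-exponential kernel between row-wise sample sets a and b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length**2)

def gpr_predict(x_train, y_train, x_test, length=1.0, noise=1e-6):
    """Posterior mean of a zero-mean GP with fixed hyperparameters."""
    k = rbf_kernel(x_train, x_train, length) + noise * np.eye(len(x_train))
    alpha = np.linalg.solve(k, y_train)
    return rbf_kernel(x_test, x_train, length) @ alpha

def learning_curve(x_pool, y_pool, x_test, y_test, sizes, n_repeats=5, seed=0):
    """MAE on a fixed test set versus N_train, averaged over random draws."""
    rng = np.random.default_rng(seed)
    curve = []
    for n in sizes:
        maes = []
        for _ in range(n_repeats):
            pick = rng.choice(len(x_pool), size=n, replace=False)
            pred = gpr_predict(x_pool[pick], y_pool[pick], x_test)
            maes.append(np.mean(np.abs(pred - y_test)))
        curve.append(float(np.mean(maes)))
    return curve

# Synthetic stand-in for two "trajectories" that cover different regions.
rng = np.random.default_rng(1)
target = lambda x: np.sin(x[:, 0])
x_a = rng.uniform(0.0, 3.0, size=(400, 1))      # region seen in training
x_b = rng.uniform(6.0, 9.0, size=(200, 1))      # region never seen in training
x_pool, x_test_a = x_a[:300], x_a[300:]
sizes = [5, 20, 80]
mae_same = learning_curve(x_pool, target(x_pool), x_test_a, target(x_test_a), sizes)
mae_other = learning_curve(x_pool, target(x_pool), x_b, target(x_b), sizes)
```

In this toy setting the error on the training region drops with the number of samples, while the error on the unseen region stays large regardless of the training set size.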
To overcome this challenge, any subsequent model used in this overall study is trained
on the union of all trajectories of all pigments of one type. Fig. 6c depicts learning curves
for a model trained on samples from all trajectories of p-type molecules, using the same
construction approach outlined for the previous set of learning curves. This joint model
shows a prediction error that decreases with an increasing number of training samples across
the test sets of all different p-type pigments. We hence obtain a model with largely uniform
prediction errors across the individual pigments. Still, it needs to be noted that the prediction
error of the joint model is higher than the error of the model trained only on p9 when
predicting the p9 test configurations. This is, however, not surprising, as we take on average only
1,000/12 samples from the trajectory of each pigment, compared to the 1,000 from p9 alone
in the other case. At this point, we balance transferability with model accuracy.
Assessing Active learning
One important question is whether we can gain from specifically choosing training samples
out of a set of unlabeled candidate molecular configurations, in contrast to randomly or
uniformly sampling training data from a given molecular trajectory. This is what is done in
Active Learning (AL). In an exemplary study, we analyze the impact of using uncertainty
sampling with the Gaussian Process Regression standard deviation as the uncertainty measure
to select training samples, rather than using randomized sampling.
Figure 7: Comparative study between randomized training sample selection and Active
Learning using the GPR standard deviation for a model built for a single pigment (p9, left) and for
a model built on all p-type pigments (concatenated, right). Shown is the MAE in meV as a
function of Ntrain for random sampling, for sampling by GPR standard deviation and, as a
reference, for sampling by absolute error.
Figure 7 shows, on the left-hand side, learning curves comparing randomized sampling
to active learning for a model that is only built on training samples of p9. Apart from the
selection of the particular training set, the same data and the same error evaluation approach as
in the last section are used. We start the study from a model that is built on 200 randomly
selected samples and consecutively add samples from the remaining full trajectory, either with
active learning or in a randomized fashion. As a reference, we alternatively add the samples
with the highest actual absolute error, to show the optimum that we could achieve with
a greedy sampling scheme such as AL. For p9, it can be seen that uncertainty sampling
with the GPR standard deviation results in a slightly improved prediction error compared to
random sampling. The (red) reference result indicates that there would be room for further
improvement with a greedy sampling scheme if a better uncertainty measure than the GPR
standard deviation were available.
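A greedy uncertainty-sampling loop of the kind described above can be sketched as follows. The sketch rests on several assumptions that are not taken from the study: the kernel and its fixed hyperparameters, the addition of one sample per iteration, and the toy candidate pool. With fixed hyperparameters the GPR standard deviation depends only on the inputs, so no labels are needed for the selection step in this simplified version.

```python
import numpy as np

def rbf_kernel(a, b, length=1.0):
    """Squared-exponential kernel between row-wise sample sets a and b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length**2)

def gpr_std(x_train, x_query, length=1.0, noise=1e-6):
    """GPR posterior standard deviation at x_query (fixed hyperparameters)."""
    k = rbf_kernel(x_train, x_train, length) + noise * np.eye(len(x_train))
    k_q = rbf_kernel(x_query, x_train, length)
    # var(x) = k(x, x) - k_q K^{-1} k_q^T, and k(x, x) = 1 for this kernel.
    var = 1.0 - np.einsum("ij,ji->i", k_q, np.linalg.solve(k, k_q.T))
    return np.sqrt(np.clip(var, 0.0, None))

def uncertainty_sampling(x_pool, n_start, n_total, seed=0):
    """Grow a training set by repeatedly adding the most uncertain candidate."""
    rng = np.random.default_rng(seed)
    chosen = [int(i) for i in rng.choice(len(x_pool), size=n_start, replace=False)]
    while len(chosen) < n_total:
        remaining = np.setdiff1d(np.arange(len(x_pool)), chosen)
        std = gpr_std(x_pool[chosen], x_pool[remaining])
        chosen.append(int(remaining[np.argmax(std)]))
    return chosen

# Hypothetical pool: a dense cluster plus one isolated configuration (index 50).
pool = np.vstack([np.random.default_rng(0).normal(0.0, 0.3, size=(50, 2)),
                  [[5.0, 5.0]]])
picked = uncertainty_sampling(pool, n_start=5, n_total=8)
```

The isolated configuration has the largest predictive uncertainty and is therefore selected as soon as it is a candidate, which is the intended behavior of uncertainty sampling.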
In a second study, for which the results are depicted on the right-hand side of Figure 7,
active learning is tested on the union of all trajectories of all p-type pigments, as needed
for transferability (see the last section). In this case, randomized sampling gives better results
than AL, in contrast to the result for p9 alone. Based on this mixed picture, we decided to
stick to a randomized training sample selection strategy for the remaining construction of
MFML models.
Multifidelity Machine Learning Models
The final models for the prediction of the excitation energies of the p-type and m-type
porphyrin pigments are multifidelity machine learning (MFML) models. Such an MFML model
involves the use of quantum chemistry training data at different fidelities, which refer to the
accuracy of the data with respect to the actual value.32 The MFML models are built with
respect to some target fidelity and a baseline fidelity. The former, denoted hereon as $F$, refers
to the most accurate (and thereby costliest) fidelity that one is interested in predicting with
the ML model. The baseline fidelity, $f_b$, is the cheapest (and thereby the least accurate)
fidelity that is used in the MFML model. This MFML model is denoted by
$P_{\mathrm{MFML}}^{(F,\eta_F;f_b)}$, where $2^{\eta_F} = N_{\mathrm{train}}^{(F)}$ is the number of training samples used at the
target fidelity. The number of training samples at the subsequently cheaper fidelities is
determined by a scaling factor, $\gamma$, which is conventionally set to 2 based on Ref. 32. That is,
$N_{\mathrm{train}}^{f-1} = \gamma \cdot N_{\mathrm{train}}^{f}$ for all $f_b < f \le F$, so that each cheaper fidelity
receives $\gamma$ times as many training samples as the next more accurate one. The detailed
theoretical framework of MFML is further presented in
the Section Multifidelity Machine Learning Approach. In this work, the target fidelity is
TD-DFT/CAM-B3LYP with the def2-SVP basis set, while the cheapest baseline fidelity is
the LC-DFTB approach. These will be reported using a shorthand notation, that is, SVP
and DFTB, respectively. The different ML models are assessed based on the MAE using
learning curves, which depict the MAE as a function of the number of training samples used
at the target fidelity. In addition, the MAE is studied as a function of the time-cost of
generating training data for a specific ML model, be it single fidelity GPR or MFML models
with different values of $f_b$ (see Model Evaluation). These are reported alongside a recently
introduced MFML approach, termed the Γ-curve, which fixes the number of training samples
at fidelity $F$ and varies only the value of $\gamma$.46 This is shown to be superior to the conventional
approach of MFML in providing a low-cost high-accuracy ML model for the prediction of
excitation energies of porphyrin.
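The way the scaling factor fixes the number of training samples per fidelity can be made concrete with a few lines of code. This is a sketch only: the function name is hypothetical, and the ordered list of four fidelities is an assumption inferred from the methods named in the text.

```python
def mfml_training_sizes(n_target, gamma, fidelities):
    """Number of training samples per fidelity, ordered from target to baseline.

    fidelities : names ordered from the target fidelity down to the baseline
    Each step down to the next cheaper fidelity multiplies the count by gamma.
    """
    return {name: n_target * gamma**k for k, name in enumerate(fidelities)}

# Assumed ordering of fidelities, from most to least accurate.
levels = ["SVP", "6-31G", "3-21G", "DFTB"]

# Conventional MFML: gamma = 2 with 1024 samples at the target fidelity.
conventional = mfml_training_sizes(1024, 2, levels)

# One point on a Gamma-curve: few target samples, larger gamma.
gamma_curve_point = mfml_training_sizes(8, 12, levels)
```

With a fixed and small number of target-fidelity samples, increasing the scaling factor shifts almost all of the training data to the cheap fidelities, which is the idea behind varying only the scaling factor along the Γ-curve.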
Figure 8: MFML learning curves (MAE in meV versus $N_{\mathrm{train}}^{\mathrm{SVP}}$) for three cases of predicting
excitation energies for porphyrin molecules: (a) based only on a trajectory of porphyrin
pigment p9, (b) concatenated trajectories of the p-type pigments, (c) concatenated
trajectories of the m-type pigments. Each panel compares the single fidelity GPR model
(GPR-SVP) with MFML models using the baseline fidelities $f_b$ = 6-31G, 3-21G and DFTB. The
prediction error (as MAE) of the single fidelity GPR and of the MFML model with the DFTB
baseline fidelity are explicitly stated for $N_{\mathrm{train}}^{\mathrm{SVP}} = 1024$: (a) 24.7 meV and 13.9 meV,
(b) 29.1 meV and 24.8 meV, (c) 22.3 meV and 19.2 meV.
As in the previous sections, single fidelity GPR models and MFML models were built
and compared on the single pigment p9 and on the union of all p-type pigments for
transferability reasons. In addition, tests are conducted on the union of all m-type pigments. In
the all-pigment models, an even sampling of the training data is used. The models are tested
on a separate holdout test set for which the excitation energies are calculated at the target
fidelity, that is, SVP. For p-type pigments, 2,000 test samples are used, while for m-type
pigments 500 test samples are considered, to account for the lower total amount of data.
The resulting MFML learning curves for the different cases are shown in Fig. 8. The shown
learning curves are an average over ten learning curves created by shuffling the training
data set. The different learning curves in a single study are given for a growing number of
utilized fidelity levels starting from the baseline fidelity $f_b$, as indicated in the legend.
The case of training and testing on the same trajectory of p-type porphyrin molecules
is shown in Fig. 8a for different $f_b$. With the addition of cheaper baseline fidelities, one
observes that the MFML model predicts with a lower MAE in comparison to the single
fidelity GPR model built with training samples only from the target fidelity. For instance,
with $N_{\mathrm{train}}^{\mathrm{SVP}} = 1024$, the GPR model results in an MAE of 24.7 meV while the MFML model
with the baseline fidelity $f_b$ set to DFTB results in an MAE of 13.9 meV. While this is
a promising result, the transferability study from above indicated that a joint model for
all pigments of one type should be constructed. For this reason, the final MFML models
that are used to predict the excitation energies for the porphyrin molecules are built with
training data taken from a pool of trajectories for each type of porphyrin molecule. The
MFML learning curves for the p-type porphyrin molecules are shown in Fig. 8b for
different baseline fidelities. While the addition of cheaper baselines does decrease the model
MAE, this drop is not as significant as in the case of the single trajectory. The drop
in error between single fidelity GPR and MFML with the DFTB baseline fidelity is about
6 meV for $N_{\mathrm{train}}^{\mathrm{SVP}} = 1024$. However, this is anticipated since both the single fidelity GPR
and the MFML models have to cover a wider region of the conformational phase space (see
discussion on the UMAPs from Fig. 6a) as opposed to the smaller region that is to be covered
in the case of the single trajectory models. A similar observation is made for the MFML
learning curves for m-type porphyrin molecules as seen in Fig. 8c, with the single fidelity GPR
model reporting an MAE of 22 meV and the MFML model with DFTB baseline reaching
an MAE of 19 meV with 1024 training samples at the SVP fidelity. The slightly lower overall
MAE for the m-type porphyrin dyes can be explained once again by the fact that
the concatenated trajectories of this porphyrin type result in a lower number of geometries,
which in turn could span a smaller region of the conformational phase space than in
the case of the p-type porphyrin molecules. That is, the m-type set has a smaller number of
total geometries when concatenated in comparison to the total geometries of the p-TMPyP
set. The larger number of total geometries for the p-type set implies that the MFML model
with the same $N_{\mathrm{train}}^{\mathrm{SVP}}$ would contain a smaller amount of information about the conformation space of
the molecule, in contrast to the MFML model built for the m-type set. This fact is reflected in
the learning curves.
Figure 9: Time-cost of generating training data ($T_{\mathrm{train\ data}}$ in hours) versus MAE in meV for a
single fidelity GPR model, GPR (SVP), contrasted with that for the MFML model built with
baseline fidelity $f_b$ = DFTB, MFML(DFTB), for (a) a trajectory of porphyrin pigment p9 only,
(b) the concatenated trajectories of the p-type pigments, and (c) the concatenated
trajectories of the m-type pigments. The Γ(8)-curve is also depicted for increasing values
of $\gamma$ as explained in the text.
In order to better assess the computational impact of single fidelity and MFML models
for porphyrin molecules, the model error is studied as a function of the cost of generating
the training data used in the ML model. In this work, the QC calculation times as returned
by the ORCA computing software47 and the DFTB+ software24,48 are used. For the GPR, this
cost is directly related to the number of training samples. For the MFML model, this cost
includes not only the training samples used at the top fidelity, but also the cost of the
training samples used at the subsequent lower fidelities. These curves are shown in Fig. 9.
The time required for training the models and for predictions over the holdout test set with the
MFML model for the largest training set size used (that is, $N_{\mathrm{train}}^{\mathrm{SVP}} = 1024$) was 12.97 seconds
and 12.45 seconds for the p-type and m-type porphyrin molecules, respectively. Since this is
such a small contribution, only the total time for generating the training data is considered
in the MAE versus time-cost curves.
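The bookkeeping behind the time-cost axis can be sketched as a sum over fidelities of the number of training samples times the cost per calculation. The per-sample costs below are placeholders chosen purely for illustration; they are not the ORCA or DFTB+ timings used in this work, and the function and dictionary names are hypothetical.

```python
def training_data_cost(sizes, hours_per_sample):
    """Total time (in hours) needed to generate the training data of a model.

    sizes            : dict mapping fidelity name to number of training samples
    hours_per_sample : dict mapping fidelity name to cost of one calculation
    """
    return sum(n * hours_per_sample[name] for name, n in sizes.items())

# Placeholder per-geometry costs in hours; NOT measured values from this work.
hours_per_sample = {"SVP": 2.0, "6-31G": 0.5, "3-21G": 0.2, "DFTB": 0.01}

# Single fidelity GPR uses target-fidelity data only.
gpr_sizes = {"SVP": 1024}
# An MFML model additionally pays for the samples at all cheaper fidelities.
mfml_sizes = {"SVP": 1024, "6-31G": 2048, "3-21G": 4096, "DFTB": 8192}

gpr_cost = training_data_cost(gpr_sizes, hours_per_sample)
mfml_cost = training_data_cost(mfml_sizes, hours_per_sample)
```

For the same number of target-fidelity samples the MFML model is always more expensive than the single fidelity model, so the comparison in Fig. 9 is made at equal cost rather than at equal target-fidelity sample count.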
In addition to the single fidelity GPR and MFML models, a recently introduced MFML
approach, referred to as the Γ-curve,46 is analyzed as well. In conventional MFML theory,
the number of training samples at the various fidelities is determined by a scaling factor, $\gamma$, that is,
$N_{\mathrm{train}}^{f-1} = \gamma \cdot N_{\mathrm{train}}^{f}$ for $f \in \{2, \ldots, F\}$. Conventionally, an MFML model is built with $\gamma = 2$
based on previous studies.32,49–51 However, the use of different values of $\gamma$ has recently been
studied, resulting in a reportedly more efficient approach titled the Γ-curve. The Γ-curve is a
plot of MAE versus time-cost of the MFML model with increasing values of $\gamma$. The Γ-curve
is built with a fixed number of training samples at the target fidelity, SVP. Figure 9 reports
the Γ(8)-curve, that is, with $N_{\mathrm{train}}^{\mathrm{SVP}} = 8$ and varying values of $\gamma$. Different values of $N_{\mathrm{train}}^{\mathrm{SVP}}$
were considered and are shown in Fig. S9 in the supplementary material.
Figure 9a depicts the MAE versus time-cost of the ML models for the case of the single
trajectory of p-type porphyrin molecules. One observes that for a given time-cost on the
horizontal axis, the curve for the MFML model is always below that for the single fidelity
GPR model. This implies that for a given time-cost, the MFML model results in a lower
error than the single fidelity GPR model. Furthermore, the Γ(8)-curve lies lower than
the conventional MFML curve. Once again, this implies that for a given time-cost, the
multifidelity model built along the Γ(8)-curve results in a lower MAE. A similar observation
is made for the case of concatenated trajectories of the p-type and m-type molecules in
Figs. 9b and 9c, respectively. Although for the m-type porphyrin molecules the GPR curve
does come close to the conventional MFML curve, the Γ(8)-curve always lies beneath it. The
final multifidelity models that are used in this work for the prediction of excitation energies
correspond to the final data point of the Γ(8)-curve, which corresponds to $\gamma = 12$. The
multifidelity training structure for this model is $\{8,\ 12 \cdot 8 = 96,\ 12^2 \cdot 8 = 1152,\ 12^3 \cdot 8 = 13824\}$
with decreasing fidelity. For the p-type molecules, this model results in an MAE of ∼25 meV
with a time cost of about 1500 hours, while the conventional MFML model reports a similar
error with a time cost of roughly 8000 hours. The single fidelity GPR model only reaches
an MAE of 29 meV with a time-cost of 2000 hours. The use of the multifidelity model along
the Γ(8)-curve results in a time-cost benefit of roughly a factor of 5 over the conventional MFML model
with $\gamma = 2$ and $N_{\mathrm{train}}^{\mathrm{SVP}} = 1024$. For the m-type porphyrin molecules, the corresponding
time-benefit of using the multifidelity model along the Γ(8)-curve over the conventional MFML
model is about the same, with the former reporting an MAE of about 17 meV for a time-cost
of roughly 1500 hours, while the latter costs as much as 8000 hours for an MAE of about 19
meV.
Exciton dynamics
In this section, we investigate the exciton dynamics in the porphyrin-clay system by constructing
a time-dependent Frenkel exciton Hamiltonian, employing the Numerical Integration
of the Schrödinger Equation (NISE) method, and analyzing the exciton diffusion. Our
aim is to simulate exciton dynamics over 10 ns with a 1 fs time step for nine copies of the 16
pigments, totaling 144 pigments (see Fig. 4).
Building the Frenkel Exciton Hamiltonian
To model the exciton dynamics, we construct a time-dependent Frenkel exciton Hamiltonian

$$\hat{H}(t) = \sum_{i=1}^{N} \left[ E_i + \Delta E_i(t) \right] |i\rangle\langle i| + \sum_{i \neq j} V_{ij}(t)\, |i\rangle\langle j| , \tag{2}$$

where $E_i$ denotes the average site energies, $\Delta E_i(t)$ the site energy fluctuations, $V_{ij}(t)$ the
time-dependent electronic couplings, and $N$ the total number of pigments in the system.
The average site energies $E_i$ and the site energy fluctuations $\Delta E_i(t)$ are obtained from the
MFML model, which predicts the excited-state energies based on the atomic positions from
the QM/MM trajectories for each pigment, as described in earlier sections. The electronic
couplings $V_{ij}(t)$ are calculated using the TrESP method41,42 along a 100 ns MD trajectory
with a 10 ps time step, as detailed above. In previous studies, average couplings were often
used to simplify the Hamiltonian.23,28,52,53 As discussed, however, in the Section on the
MD and QM/MM simulations, using average couplings may not be suitable here due to
significant coupling fluctuations as well as the canceling of positive and negative couplings.
Therefore, in addition to constructing the Hamiltonian with average couplings, we also construct
a Hamiltonian with time-dependent couplings $V_{ij}(t)$, where the couplings between two
snapshots are obtained by linear interpolation.
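The linear interpolation between two stored coupling snapshots can be sketched as follows. The function name and the toy matrices are hypothetical; only the snapshot spacing of 10 ps is taken from the text.

```python
import numpy as np

def interpolate_couplings(v_snapshots, dt_snap_fs, t_fs):
    """Linearly interpolate coupling matrices between stored snapshots.

    v_snapshots : (n_snap, N, N) couplings at times 0, dt_snap_fs, 2*dt_snap_fs, ...
    dt_snap_fs  : spacing between snapshots in fs (10 ps = 10_000 fs in the text)
    t_fs        : time in fs at which the coupling matrix is needed
    """
    v_snapshots = np.asarray(v_snapshots, dtype=float)
    pos = t_fs / dt_snap_fs
    lo = int(np.floor(pos))
    lo = min(max(lo, 0), len(v_snapshots) - 2)   # clamp to a valid interval
    w = pos - lo                                  # weight of the later snapshot
    return (1.0 - w) * v_snapshots[lo] + w * v_snapshots[lo + 1]

# Two toy 2x2 snapshots whose coupling changes sign within 10 ps.
v = np.array([[[0.0, 4.0], [4.0, 0.0]],
              [[0.0, -2.0], [-2.0, 0.0]]])
v_quarter = interpolate_couplings(v, 10_000.0, 2_500.0)
v_end = interpolate_couplings(v, 10_000.0, 10_000.0)
```

Evaluating the interpolation on the fly at each propagation step avoids storing a coupling matrix for every 1 fs step of a 10 ns run.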
Our goal is to perform a 10 ns exciton dynamics simulation to investigate long-term exciton
transport. However, the QM/MM trajectories used to obtain the site energies and their
fluctuations are only 40 ps long. Generating 10 ns of QM/MM simulations for every pigment
is computationally prohibitive. To overcome this limitation, we employ a noise generation
algorithm based on spectral densities54 to create longer site energy fluctuation trajectories
$\Delta E_i(t)$. This algorithm allows us to extend the site energy fluctuations to the desired
simulation length while preserving the statistical properties of the original data. As the noise
generation algorithm relies on the spectral densities of the site energy fluctuations, we first
compute the spectral densities for each pigment type. As a first step, the autocorrelation
function $C(t)$ of the site energy fluctuations $\Delta E_i(t)$ is obtained from the MFML model along
the QM/MM trajectories55 using

$$C(t_j) = \frac{1}{N-j} \sum_{i=1}^{N-j} \Delta E(t_i + t_j)\, \Delta E(t_i) . \tag{3}$$

To suppress noise in the autocorrelation function, a Gaussian damping with a timescale of
5 ps is applied. The spectral density $J(\omega)$ is then obtained via the cosine transform of the
autocorrelation function

$$J(\omega) = \frac{\beta\omega}{\pi} \int_0^{\infty} C(t) \cos(\omega t)\, dt , \tag{4}$$

where $\beta = 1/(k_B T)$ denotes the inverse temperature with $k_B$ being the Boltzmann constant
and $T$ the temperature. Since all pigments of the same type are chemically identical, we
average the spectral densities over all pigments of the same type, i.e., m-type or p-type.
This averaging minimizes the contributions from specific configurations or local environments
present in the relatively short 40 ps QM/MM simulations, leading to a more
representative spectral density for each pigment type. The spectral densities for the m-type
and p-type porphyrin molecules are shown in Fig. 10, together with an experimental
spectral density of the Fenna–Matthews–Olson (FMO) complex containing bacteriochlorophyll
molecules (BChl).18,56 Notably, the porphyrins exhibit spectral features up to approximately
0.4 eV, which is significantly higher than the spectral features of BChl or chlorophyll (Chl)
in plants, typically extending only up to about 0.2 eV. These higher frequency components
might originate from the outer rings of the porphyrin molecules, which are absent in BChl
and Chl, or the central nitrogen-hydrogen (N-H) bonds, which have the highest force constant
within porphyrin molecules.57

Figure 10: Spectral densities $J(\omega)$ (in cm−1 and eV) as a function of frequency (in eV
from 0 to 0.4 eV). Panel (a) shows the average spectral densities of all m-type and p-type porphyrin
molecules, while panel (b) shows an experimental spectral density of FMO together with the
p-type average as comparison.18,56

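The route from a site-energy trace to a spectral density via Eqs. 3 and 4 can be sketched numerically as below. Several details are assumptions made for this illustration and are not specified in the text: the Gaussian window is taken as exp(-(t/τ)²) with τ = 5 ps, the temperature is set to 300 K, the integral is evaluated by the trapezoidal rule, and a synthetic trace with a single oscillation replaces the MFML-predicted site energies. The trace is also shorter than the 40 ps trajectories to keep the example fast.

```python
import numpy as np

KB_EV = 8.617333262e-5  # Boltzmann constant in eV/K

def autocorrelation(de, max_lag):
    """C(t_j) = 1/(N-j) * sum_i dE(t_i + t_j) dE(t_i) for j = 0 .. max_lag-1."""
    de = np.asarray(de, dtype=float)
    n = len(de)
    return np.array([np.dot(de[j:], de[:n - j]) / (n - j) for j in range(max_lag)])

def spectral_density(de, dt_fs, omegas, temperature=300.0, damping_fs=5000.0):
    """Spectral density J(omega) in eV from a site-energy trace in eV.

    omegas are angular frequencies in rad/fs. The Gaussian window
    exp(-(t / damping_fs)^2) is one possible reading of "Gaussian damping".
    """
    de = np.asarray(de, dtype=float)
    de = de - de.mean()                      # fluctuations around the average
    c = autocorrelation(de, max_lag=len(de) // 2)
    t = np.arange(len(c)) * dt_fs
    c = c * np.exp(-(t / damping_fs) ** 2)
    beta = 1.0 / (KB_EV * temperature)
    # Trapezoidal quadrature of the half-sided cosine transform of Eq. (4).
    integrand = c[None, :] * np.cos(np.outer(omegas, t))
    integral = dt_fs * (integrand.sum(axis=1)
                        - 0.5 * (integrand[:, 0] + integrand[:, -1]))
    return beta * omegas / np.pi * integral

# Synthetic trace: one oscillation at 0.02 rad/fs plus weak white noise.
rng = np.random.default_rng(0)
t = np.arange(20_000) * 1.0                   # 20 ps sampled every 1 fs
trace = 0.05 * np.cos(0.02 * t) + 0.005 * rng.standard_normal(t.size)
omegas = np.linspace(0.001, 0.06, 60)
j = spectral_density(trace, 1.0, omegas)
peak_omega = omegas[np.argmax(j)]
```

The resulting spectral density has its maximum at the frequency of the oscillation put into the trace; averaging such spectral densities over all pigments of one type would then be a simple mean over the returned arrays.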
To generate the noise $\Delta E_i(t)$ following the spectral densities, we first create white noise
$\eta(t)$ with zero mean and unit variance. The target noise $\Delta E_i(t)$ is then obtained by
multiplying the Fourier transform of the white noise $\tilde{\eta}(\omega)$ with the square root of the power
spectrum $\tilde{C}(\omega) = 2\pi J(\omega)/(\beta\omega)$, followed by an inverse Fourier transform54

$$\Delta E_i(t) = \Re\left( \mathrm{IFFT}\left( \tilde{\eta}(\omega) \sqrt{\tilde{C}(\omega)} \right) \right) . \tag{5}$$

This process yields noise $\Delta E_i(t)$ that follows the desired spectral density and can be used
for the desired 10 ns simulations. Combining all components, the final Hamiltonian $\hat{H}(t)$
is constructed with average site energies $E_i$ from the MFML model along the QM/MM
trajectories, time-dependent fluctuations $\Delta E_i(t)$ generated from the noise algorithm, and
the time-dependent electronic couplings $V_{ij}(t)$ obtained from average or linearly interpolated
TrESP couplings along the MD simulation.
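A discrete version of this noise generation can be sketched as follows. The sketch makes assumptions that go beyond the text: the division by the time step is the normalization that matches NumPy's FFT convention, the zero-frequency component is set to zero, and the real-input transforms `rfft`/`irfft` are used, which for real white noise is equivalent to taking the real part of a complex inverse FFT. The Drude-like toy spectral density and all numerical values are hypothetical.

```python
import numpy as np

def generate_noise(j_of_omega, beta, n_steps, dt, rng):
    """Draw one noise trajectory whose power spectrum follows J(omega).

    j_of_omega : callable returning the spectral density for omega > 0
    beta       : inverse temperature, in units consistent with J and omega
    dt         : time step; angular frequencies are in rad per unit of dt
    The division by dt is the normalization that goes with NumPy's FFT
    convention; the zero-frequency component is set to zero (assumption).
    """
    eta = rng.standard_normal(n_steps)               # white noise, unit variance
    omega = 2.0 * np.pi * np.fft.rfftfreq(n_steps, d=dt)
    power = np.zeros_like(omega)
    power[1:] = 2.0 * np.pi * j_of_omega(omega[1:]) / (beta * omega[1:])
    filtered = np.fft.rfft(eta) * np.sqrt(power / dt)
    return np.fft.irfft(filtered, n=n_steps)          # real-valued by construction

# Toy spectral density (hypothetical): corresponds to an exponentially
# decaying autocorrelation C(t) = sigma^2 * exp(-|t| / tau).
sigma, tau, beta = 0.05, 50.0, 1.0 / 0.0259          # eV, fs, 1/eV

def drude(omega):
    power = 2.0 * sigma**2 * tau / (1.0 + (omega * tau) ** 2)
    return beta * omega / (2.0 * np.pi) * power

noise = generate_noise(drude, beta, n_steps=2**18, dt=1.0,
                       rng=np.random.default_rng(0))
```

For this toy spectral density the variance of the generated trajectory should be close to σ², which gives a simple consistency check on the normalization.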
Numerical Integration of the Schrödinger Equation
To enable exciton calculations over 10 ns for 144 pigments, i.e., 9 copies of 16 pigments,
a numerically efficient algorithm is needed. To this end, we choose the Numerical Integration
of the Schrödinger Equation (NISE) approach, as it is fast, easily parallelizable and works with the
time-dependent Hamiltonian we just constructed.27,28,54 The NISE approach treats only the
system quantum mechanically and the coupling to the environment classically. This treatment
includes an implicit high-temperature limit, so that detailed balance is not fulfilled,
i.e., the long-term distribution is an even distribution rather than the expected Boltzmann
distribution.58 Modifications to the NISE that address this limitation exist,59,60 but have
shortcomings in systems with low couplings $V_{ij}$ (Ref. 60) and high-frequency contributions in the
noise (Ref. 54), both of which are present in the porphyrin-clay system. Therefore, we have to keep
this limitation in mind when interpreting the results.
Figure 11: Population dynamics over 10 ns of an exciton initially placed on the respective
pigment (p1-p12, m1-m4) in the central copy. The populations were calculated using an
average over 100 realizations. Panel (a) shows the simulation for average coupling values,
and panel (b) for time-dependent coupling values.

In the NISE method, we solve the time-dependent Schrödinger equation for the excitonic
wave function $|\psi_s(t)\rangle$

$$i\hbar \frac{\partial}{\partial t} |\psi_s(t)\rangle = \hat{H}(t)\, |\psi_s(t)\rangle , \tag{6}$$

where $\hat{H}(t)$ is the time-dependent Hamiltonian we constructed earlier. The excitonic state
can be expressed in terms of site basis states $|m\rangle$

$$|\psi_s(t)\rangle = \sum_m c_m(t)\, |m\rangle , \tag{7}$$

with time-dependent coefficients $c_m(t)$. These coefficients can be obtained by solving Eq. 6
numerically by assuming $\hat{H}(t)$ is constant for small time steps $\Delta t$, where we use a time step
of 1 fs. The population of an exciton on site $m$ is then given by

$$P_m(t) = |c_m(t)|^2 . \tag{8}$$

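The stepwise propagation with a piecewise-constant Hamiltonian can be sketched in a few lines. This is a generic illustration and not the TorchNISE implementation: the function names are hypothetical, the propagator for each step is built from an eigendecomposition, and the toy dimer with white-noise site energies is made up for the example.

```python
import numpy as np

HBAR_EV_FS = 0.6582119569  # reduced Planck constant in eV * fs

def propagate_nise(hamiltonians, site0, dt_fs=1.0):
    """Propagate an exciton through a sequence of Hamiltonians (in eV).

    hamiltonians : (n_steps, N, N) real symmetric matrices, one per time step
    site0        : index of the initially excited pigment
    Each H(t) is treated as constant during one step, so the propagator is
    exp(-i H dt / hbar), built here from an eigendecomposition.
    Returns populations of shape (n_steps + 1, N).
    """
    n_steps, n_sites, _ = hamiltonians.shape
    c = np.zeros(n_sites, dtype=complex)
    c[site0] = 1.0
    pops = np.empty((n_steps + 1, n_sites))
    pops[0] = np.abs(c) ** 2
    for k in range(n_steps):
        energies, vecs = np.linalg.eigh(hamiltonians[k])
        phases = np.exp(-1j * energies * dt_fs / HBAR_EV_FS)
        c = vecs @ (phases * (vecs.conj().T @ c))
        pops[k + 1] = np.abs(c) ** 2
    return pops

def averaged_populations(make_hamiltonians, site0, n_realizations, rng):
    """Average the populations over independent noise realizations."""
    return np.mean([propagate_nise(make_hamiltonians(rng), site0)
                    for _ in range(n_realizations)], axis=0)

# Toy dimer: fixed coupling, white-noise site energies (hypothetical numbers).
def toy_hamiltonians(rng, n_steps=200, coupling=0.005, sigma=0.02):
    h = np.zeros((n_steps, 2, 2))
    h[:, 0, 1] = h[:, 1, 0] = coupling
    h[:, [0, 1], [0, 1]] = sigma * rng.standard_normal((n_steps, 2))
    return h

pops = averaged_populations(toy_hamiltonians, 0, 20, np.random.default_rng(0))
```

Because each step is unitary, the populations sum to one at all times; there is no relaxation toward a Boltzmann distribution in this scheme.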
Finally, this procedure has to be repeated for many realizations to obtain reasonable results. The software
code TorchNISE, which is available on GitHub, is used to run the calculations.61 The exciton
dynamics are simulated by initially exciting a single porphyrin molecule in the central copy
and observing the population decay over time as depicted in Fig. 11. Two versions are
shown for the Hamiltonian constructed based on the average and on the time-dependent
coupling values. The results indicate significant differences in the decay of the exciton
population between the two cases. With average couplings, some p-type porphyrins decay
much more slowly than others. In particular, certain p-type pigments retain higher populations
over longer times, which is not observed with the time-dependent coupling values. We
suspect that averaging the couplings, which can be both positive and negative due to the
relative orientations of the transition dipole moments (see Fig. 5), leads to an underestimation
of the effective coupling strengths, as positive and negative values cancel each other out.
Moreover, the transfer from pigment m3 shows notably different behavior between the two
cases. In simulations with average couplings, the exciton population on m3 decays much
faster compared to other m-type pigments. This rapid decay is also observed with time-
dependent couplings at short times but is followed by a slowdown in the decay rate. This
behavior can be explained by examining the coupling patterns in the system: pigment m3 is
strongly coupled to p2 during certain periods (as shown in Fig. 5), and p2 becomes strongly
coupled to p1 at other times. With average couplings, these strong interactions are always
present in the Hamiltonian, allowing the exciton to quickly transfer from m3 to p2, then
to p1, and further into the system, resulting in the faster decay of the exciton population
on m3. In contrast, time-dependent couplings capture these strong interactions only during
specific intervals. The exciton initially equilibrates quickly between m3 and p2 due to their
strong coupling at that time, but transfer to other pigments is slower because they are not
coupled strongly at the same time.
Despite the improved realism provided by time-dependent couplings, our simulations
show that the exciton lifetimes for the m-type porphyrins are much longer than experimental
observations. In the experimental study by Ishida et al.,4 the exciton lifetime of the m-type
porphyrin was determined to be approximately 0.4 ns, whereas in our simulations the lifetimes
are clearly much longer. There are several potential sources for this discrepancy.
Firstly, we have included only the $Q_x$ state in our exciton dynamics simulations. Incorporating
the $Q_y$ state might increase the transfer rates significantly when weak couplings
are the result of misaligned transition dipoles. Whenever the transition dipoles of the $Q_x$
states of neighboring pigments are not aligned, the dipole of the $Q_y$ state of one pigment would
be well aligned with that of the $Q_x$ state of the other, thereby increasing the overall transfer.
Additionally, a higher time resolution of the couplings might impact the transfer rates. While
fluctuating couplings are usually considered less important in exciton calculations in
light-harvesting systems, they clearly have an impact on the present system. It is possible that
variations of the couplings
on faster timescales would affect the lifetimes further. Furthermore, in the experiment, a
hexagonal packing of the porphyrins on the clay surface is assumed, with an inter-pigment
distance of approximately 2.6 nm,4 whereas in the present simulations the average distance
to the six nearest neighbors is about 2.9 nm. Since the dipole-dipole coupling decreases with
the third power of the distance, and the incoherent transfer rate, which scales with the squared
coupling, with the sixth power, we might underestimate the transfer rates by about 50% due to the
larger distances, resulting in slower exciton transfer and longer lifetimes in the simulations.
Another possibility is the absence of thermalization effects in the NISE, as p-type porphyrin
molecules generally have lower energy levels.
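As a rough numerical check of this distance argument, the following sketch evaluates how much smaller a quantity becomes at 2.9 nm than at 2.6 nm. It assumes the standard point-dipole picture, in which the coupling scales as $R^{-3}$ and a Förster-type incoherent transfer rate, being proportional to the squared coupling, scales as $R^{-6}$. The function name `distance_scaling` is purely illustrative.

```python
def distance_scaling(r_reference_nm, r_actual_nm, power):
    """Ratio of a quantity scaling as R**(-power) at r_actual vs. r_reference."""
    return (r_reference_nm / r_actual_nm) ** power


r_experiment = 2.6  # nm, hexagonal packing assumed in the experiment
r_simulation = 2.9  # nm, average nearest-neighbor distance in the simulations

# point-dipole coupling ~ R^-3, Foerster-type rate ~ R^-6 (assumed scalings)
coupling_ratio = distance_scaling(r_experiment, r_simulation, 3)
rate_ratio = distance_scaling(r_experiment, r_simulation, 6)
```

Under these assumptions the coupling at 2.9 nm is about 72% and the transfer rate about 52% of the respective value at 2.6 nm.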
Diffusion
To quantify the exciton transport, we analyze the diffusion of the exciton over time. The
diffusion is defined as the mean squared displacement (MSD) of the exciton and can be
determined using the positions of the porphyrin molecules and the exciton populations
$$ \langle d(t)^2 \rangle = \sum_i |x_i(t) - x_0(0)|^2 P_i(t), \tag{9} $$
where $P_i(t)$ denotes the exciton population on site $i$ obtained from NISE, $x_i(t)$ the
center-of-mass position of porphyrin $i$, and $x_0(0)$ the initial center-of-mass position of the initially
excited porphyrin. As a comparison, we use a simple classical model. We place the porphyrin
molecules on a hexagonal grid at a distance of 2.6 nm, as reported in Ref. 4. The population
on any given porphyrin molecule will transfer to the six neighboring porphyrins with a
transfer rate $t_r$. In Ref. 4, the transfer rate from m- to p-type porphyrins was reported to be
2.4 ns$^{-1}$ with one m-type porphyrin surrounded by 4.5 p-type porphyrins on average. Hence,
we estimate the pigment-to-pigment transfer rate as 2.4/4.5 = 0.53 ns$^{-1}$. We further assume
this transfer rate holds for all types of transfers (m to p, p to m, p to p, m to m), because
no other transfer rates were determined. This is, of course, a significant assumption, since
in the present calculations transfer from p-type porphyrins is much faster than from m-type.
The diffusion is then calculated from the resulting population using Eq. 9.
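A classical hopping model of this kind can be sketched as a master equation on a finite hexagonal patch, followed by an evaluation of Eq. 9 with static positions. The details below are assumptions for illustration, not the authors' implementation: the patch size, the reflecting boundary, the axial-coordinate bookkeeping, and all function names are hypothetical.

```python
import numpy as np

NEIGHBORS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]


def hexagonal_patch(n_rings, spacing):
    """Axial coordinates (q, r) and Cartesian positions of a hexagonal patch."""
    axial = [(q, r) for q in range(-n_rings, n_rings + 1)
             for r in range(-n_rings, n_rings + 1) if abs(q + r) <= n_rings]
    positions = np.array([[spacing * (q + 0.5 * r),
                           spacing * np.sqrt(3.0) / 2.0 * r] for q, r in axial])
    return axial, positions


def hopping_rate_matrix(axial, rate):
    """Rate matrix W of dP/dt = W P with equal hopping rates to all neighbors."""
    index = {site: i for i, site in enumerate(axial)}
    W = np.zeros((len(axial), len(axial)))
    for (q, r), i in index.items():
        for dq, dr in NEIGHBORS:
            j = index.get((q + dq, r + dr))
            if j is not None:  # sites outside the patch are simply absent
                W[i, j] += rate  # gain from neighbor j
                W[i, i] -= rate  # loss to neighbor j
    return W


def classical_populations(W, start_index, times):
    """Solve dP/dt = W P for a symmetric W via its eigendecomposition."""
    evals, evecs = np.linalg.eigh(W)
    p0 = np.zeros(W.shape[0])
    p0[start_index] = 1.0
    coeff = evecs.T @ p0
    return (np.exp(np.outer(times, evals)) * coeff) @ evecs.T


def mean_squared_displacement(populations, positions, start_position):
    """Eq. 9 for static positions: sum_i |x_i - x_0|^2 P_i(t)."""
    squared_distance = np.sum((positions - start_position) ** 2, axis=1)
    return populations @ squared_distance
```

As long as the population stays away from the patch boundary, this model gives a mean squared displacement of $6 k a^2 t$ for a hopping rate $k$ and a lattice spacing $a$, which can serve as a check of the numerical solution.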
[Figure 12: two panels, (a) average coupling and (b) time-dependent coupling; axes: time [ns] (0 to 10) vs. diffusion [Å²] (0 to 20000); curves: NISE average, NISE p-type, NISE m-type, classical hexagonal, and, in panel (b) only, only molecule motion.]
Figure 12: Diffusion with average coupling (panel (a)) and time-dependent couplings and
positions (panel (b)) calculated from the same NISE simulation as Fig. 11. In addition, the
diffusion based on the classical hexagonal model is shown in both panels and, furthermore,
the diffusion based only on the molecular motion is shown in panel (b).
Fig. 12 displays the exciton diffusion averaged over the four NISE calculations with the
exciton placed on an m-type porphyrin in the central copy, the 12 calculations with the
exciton placed on a p-type porphyrin, and averaged over all 16 calculations. Additionally,
the classical diffusion based on the hexagonal grid is shown. Panel (a) shows the simulation
for average couplings, and panel (b) for time-dependent couplings. Panel (b) also includes the
diffusion of the porphyrin molecules themselves, i.e., $\langle d_{i,\mathrm{mol}}(t)^2 \rangle = |x_i(t) - x_i(0)|^2$, averaged
over all porphyrins.
The results show that time-dependent couplings significantly enhance the exciton dif-
fusion rate compared to average couplings, which aligns with the generally faster decay of
initial populations in Fig. 11. This also brings the NISE diffusion results closer to the classical
diffusion. The movement of the pigments themselves contributes negligibly to the diffusion,
indicating that exciton transfer is the main driver of exciton diffusion. It can also be seen
that diffusion is slower for m-type porphyrins than for p-type porphyrins, which again aligns
with Fig. 11, where the exciton decay for m-type porphyrins is slower. The diffusion
[Figure 13: two panels, (a) long time (10 ns) and (b) short time (1 ps); axes: time ([ns] in (a), [ps] in (b)) vs. diffusion constant [Å²/ps]; curves: NISE average, NISE p-type, NISE m-type, classical hexagonal, and, in panel (b) only, NISE no moving average.]
Figure 13: Diffusion “constant” over time, shown as (a) a moving average over 100 ps and (b)
compared to the instantaneous value for short times. The NISE calculations are taken from the
simulation with the time-dependent couplings.
“constant” $D(t)$ is defined as the time derivative of the diffusion
$$ D(t) = \frac{d}{dt} \langle d(t)^2 \rangle. \tag{10} $$
In the classical model, the diffusion constant remains constant. However, in the NISE-based
calculations, we observe that the diffusion “constant” fluctuates on very short timescales,
comparable to the Rabi oscillation periods between porphyrin pairs, which can be calculated to be on
the order of 30 to 150 fs based on the average energy difference and coupling (see Fig. 13(b)).62
While these very fast fluctuations are a result of the quantum description of diffusion, they
have little impact on the long-term behavior. Hence, we compute a moving average of the
diffusion constant over 100 ps to avoid the fluctuations dominating the graph and to enable
a better comparison with the classical diffusion (see Fig. 13(a)). The moving average reveals
that the diffusion constant is initially larger for p-type porphyrins but becomes similar to
that of m-type porphyrins at a timescale comparable to the lifetime of excitons placed on
the m-type porphyrins. This suggests that while p-type porphyrins facilitate faster exciton
transport initially, the diffusion rates become equal once the exciton population is distributed
between m- and p-type porphyrins. The initial quick drop in the diffusion constant can
be attributed to an initial equilibration of the exciton between strongly coupled nearest
neighbors, as discussed previously for m3 in Fig. 11. Such strongly coupled nearest neighbors
are absent on an orderly hexagonal grid. The reduction of the diffusion constant over longer
times might also be related to some population reaching the edge of our 3×3 copy simulation.
Even small populations at the edges will have a large influence on diffusion as it grows with
the square of the distance.
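The time derivative of Eq. 10 and the subsequent smoothing can be sketched with finite differences and a boxcar moving average. This is only an illustration under assumptions: the data are taken to be sampled on a uniform time grid, and the function names and the choice of a simple unweighted window are hypothetical.

```python
import numpy as np


def diffusion_constant(msd, dt):
    """Finite-difference estimate of D(t) = d<d(t)^2>/dt on a uniform grid."""
    return np.gradient(msd, dt)


def moving_average(values, window):
    """Unweighted moving average over `window` consecutive samples."""
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")
```

With a sampling interval of 1 fs, a 100 ps average would correspond to `window=100_000` in this sketch. For a mean squared displacement that grows linearly in time, both functions return the constant slope.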
A property often referred to in experimental papers is the diffusion length, which, for
classical diffusion, can be calculated from the diffusion constant $D$ and the lifetime of the
exciton $\tau$ as63
$$ L = \sqrt{D\tau}. \tag{11} $$
This equation is usually used because $D$ is often calculated directly without calculating
$\langle d(t)^2 \rangle$ first, and $D\tau$ is equal to $\langle d(\tau)^2 \rangle$ for classical diffusion. In our case, we have
calculated $\langle d(t)^2 \rangle$ directly. Hence, determining the diffusion length as $L = \sqrt{\langle d(\tau)^2 \rangle}$
is more sensible, as it avoids complications with the fluctuations in $D$ for the quantum cases.
We chose the experimental lifetime of $\tau = 5.6$ ns of the combined porphyrin-clay complex
for all our calculations.4 For the time-dependent coupling calculations, we obtain a diffusion
length of 7.3 nm for excitations starting on an m-type porphyrin molecule, 9.8 nm for
excitations starting on a p-type dye, and 12.3 nm for the classical hexagon-based diffusion.
These values are generally in line with experimental values of organic semiconductors, which
are typically between 5 and 20 nm, although longer diffusion lengths have also been ob-
served.64 For porphyrin-based systems, diffusion lengths between 7.5 and 40 nm have been
observed.65–68 Interestingly, most of these have higher diffusion constants, up to 900 Å²/ps for
porphyrin-based metal–organic frameworks,68 but shorter diffusion times, which ultimately
lead to comparable diffusion lengths.
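Evaluating the diffusion length directly from the mean squared displacement amounts to reading off $\langle d(t)^2 \rangle$ at the lifetime $\tau$ and taking the square root. A minimal sketch, assuming the curve is tabulated on a time grid and linearly interpolated (the function name and the interpolation are illustrative choices):

```python
import numpy as np


def diffusion_length(times, msd, lifetime):
    """L = sqrt(<d(tau)^2>), with <d(t)^2> linearly interpolated at tau."""
    return np.sqrt(np.interp(lifetime, times, msd))
```

For purely classical diffusion with a constant $D$, this coincides with Eq. 11, since $\langle d(\tau)^2 \rangle = D\tau$ in that case.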
Since the NISE simulations underestimate the transfer from m- to p-type porphyrins,
we expect that our calculations also underestimate the diffusion length for the porphyrin-
clay complex. If that underestimation is of similar magnitude as for the population decay
of the m-type, the porphyrin-clay complex might have one of the largest diffusion lengths
among comparable systems. This would be an important finding because a large diffusion
length is an important property for organic solar cells. To investigate this further, one
could determine the other pigment-pigment transfer rates or directly measure the diffusion
length experimentally. Alternatively, the previously mentioned improvements to the exciton
dynamics (inclusion of the $Q_y$ state, better-resolved couplings, and more closely matched
nearest-neighbor distances) could bring the simulations closer to reality and yield a more
accurate prediction of the diffusion length.
Conclusions and Outlook
Inspired by the energy transfer networks present in the light-harvesting complexes of plants,
bacteria, and algae, a significant amount of work has been done over the past years to replicate
these natural systems artificially. Experimental evidence4 has shown that porphyrin
molecules attached to inorganic clay surfaces can achieve remarkable energy transfer
efficiencies close to 100%. In this study, we tried to emulate the experimental setup in Ref.
4, i.e., cationic free-base porphyrin molecules adsorbed on an anionic inorganic clay surface.
The surface model was created using the CHARMM-GUI webserver. Two types of
porphyrins, m-TMPyP and p-TMPyP, were selected due to their positive charge, in a ratio
of 1:3 and with a total of 16 porphyrin molecules.
Classical MD simulations were conducted to equilibrate the system, allowing each porphyrin
molecule to find its optimal binding location on the clay surface. The equilibrated
structures were subsequently used as starting structures for QM/MM MD simulations, followed
by excitation energy calculations using the TD-LC-DFTB method. This workflow
established the computational protocol for investigating energy transfer processes among
the pigment molecules. In addition, a novel multifidelity machine learning-based approach,
MFML, was used to compute excitation energies at the computationally demanding TD-
DFT/Def2-SVP level for all porphyrin molecules, each consisting of 90 atoms, across 640,000
geometries. The extensive sampling of high-level quantum excitation energies was then used
to construct a time-dependent exciton Hamiltonian, enabling simulations of exciton dynam-
ics and diffusion using the NISE approach in a periodic system. The results demonstrate
the feasibility of simulating energy transfer in artificial systems by utilizing porphyrin-clay
hybrid materials. The high efficiency observed in the energy transfer process underscores the
potential for porphyrin-based materials in designing light-harvesting devices. The compu-
tational protocol in the present study effectively bridges classical and quantum simulations,
enabling an in-depth exploration of energy dynamics at the molecular level. The successful
implementation of the MFML technique for large-scale quantum calculations also highlights
the potential for machine learning to significantly accelerate complex computational tasks
in quantum chemistry.
As an outlook, the computational framework developed in this study can be extended
to design more sophisticated artificial light-harvesting networks using various dye molecules,
paving the way for further innovations in solar energy harvesting. By providing a detailed
theoretical understanding of the mechanisms underlying efficient energy transfer, this work
contributes to the development of sustainable energy solutions. Furthermore, the results
serve as a valuable benchmark for experimental investigations, offering a basis for future
experimental studies to validate and refine the proposed models. Subsequent studies might
investigate more intricate combinations of clay and dye, varying dye proportions, and external
influences such as temperature, in order to improve the computational design of
synthetic light-harvesting systems that adhere to the “size-matching” principle.
Methods
Simulation and Calculation Details
Here we give some more details for modelling the montmorillonite nanosheet. The aluminum
atoms in the octahedral layer of the system, i.e., the central layer in Fig. 1b, are partially
replaced by magnesium, which leads to a charge difference and forms a negatively charged
surface. The parameter x, which can vary between 0 and 0.95, indicates the number of
magnesium defects in the system. The cations in the formula imply that the system is in
a water environment with dissolved cations to achieve neutrality. It has to be noted that
in the experimental setup with the saponite system, the silicon atoms in the tetrahedral
layer, which is shown in yellow in Fig. 1b, were replaced by aluminum. Although the layer
of atoms exchanged in our simulations was different from that in the experiment, the distances
between replacement sites were consistent, so the “size-matching rule” still applies. The clay
surface with a size of 10.38 nm × 10.82 nm × 1 nm in vacuum was generated using the
online tool CHARMM-GUI,69,70 where the ratio of defects and ion options can be customized
and modified. Four different values of $x$ (0.13, 0.2, 0.45 and 0.94) were selected to generate
the topology and perform individual MD simulations. The size of the initial simulation box
was chosen to fit the material size, with a height of 4 nm. Periodic boundary conditions
(PBC) were first applied only along the $x$ and $y$ axes. The structures of the porphyrin molecules
were obtained from ChemSpider (CSID: 133612 and 4086), and the ACPYPE (AnteChamber
PYthon Parser interfacE) tool71 was then employed to generate the topology and coordi-
nates in GROMACS format. After randomly placing four m-TMPyP and twelve p-TMPyP
molecules on top of the clay surface, a short preliminary simulation was performed in vac-
uum for a maximum of 5 ns, allowing the porphyrins to be attracted to the clay surface
solely through electrostatic interactions. The numbers of porphyrins were chosen because this
ratio was shown to be one of those with the highest energy transfer efficiency.4 The last frame of
the simulation was extracted and then used as the initial configuration for the further MD
simulations.
Due to technical reasons, 3D-PBC was required in the subsequent MD simulations. The
height of the simulation box was extended to 25 nm, which is about 2.5 times the width, to
minimize the effect of the mirror structure on the adsorption behavior of porphyrins. After-
ward, the system was solvated using TIP3P water molecules, and neutralized with sodium
and chloride ions, followed by energy minimization, a 5 ns NVT equilibration at 300 K and
a 5 ns NPT equilibration. Thereafter, the classical MD simulations for 30 ns were performed
with a time step of 1 fs employing GROMACS 5.1.4 (Ref. 72) using the general AMBER force field
(GAFF)73 and the interface force field (IFF)74 for porphyrins and clay surface, respectively.
The Nosé-Hoover thermostat75 and Berendsen barostat76 were employed to control the temperature
and pressure, respectively. The cutoff for short-range non-bonded interactions was
set to 1.2 nm, and the long-range electrostatics were treated using the Particle Mesh Ewald
(PME) method.77 In addition, the LINCS algorithm was used for bond constraints.78
The montmorillonite surface with an $x$ value of 0.45 was chosen for the subsequent quantum
mechanics/molecular mechanics (QM/MM) MD simulations since all sixteen porphyrins were still
adsorbed on the surface after the 30 ns MD simulation, while some porphyrins detached in
the simulations using the other $x$ values. Each of the porphyrin molecules was assigned to
the QM region and treated separately using the DFTB3 approach with the 3OB-f parameter
set.79 Classical force fields were employed for the remainder of the system. A 50 ps NPT
equilibration was performed in GROMACS including the DFTB+ interface,26 followed by
another 40 ps QM/MM MD simulation with a smaller integrator time step, which was set to
0.5 fs. The atomic coordinates were stored every two steps, yielding 40,000 frames, which were
subjected to excited state calculations employing TD-LC-DFTB39 and TD-DFT with the
CAM-B3LYP functional, as implemented in DFTB+ 21.1 (Ref. 24) and the ORCA 5.0.3 package,80
respectively. Four different basis sets, STO-3G, 3-21G, 6-31G and def2-SVP, were used for
the TD-DFT calculations according to their quantum chemical hierarchy, with a time stride
of 8, 16, 32 and 64 fs, respectively.
Furthermore, another 100 ns classical MD simulation was performed after the equilibration,
from which 10,000 snapshots were taken. To evaluate the excitonic couplings between the
porphyrin molecules in these snapshots, the transition charges from electrostatic potentials
(TrESP) method was applied.41,42
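In the TrESP picture, the coupling between two pigments is the Coulomb interaction between their atomic transition charges, $V_{mn} = \frac{f}{4\pi\varepsilon_0} \sum_{I \in m} \sum_{J \in n} q_I q_J / |r_I - r_J|$. The sketch below evaluates this sum for charges in units of $e$ and coordinates in Å. The function name and the optional scaling factor `scaling` (often used to account for screening) are assumptions for illustration; the value used in the present work is not specified in this section.

```python
import numpy as np

COULOMB_EV_ANGSTROM = 14.399645  # e^2 / (4 pi eps0) in eV*Angstrom


def tresp_coupling(charges_a, coords_a, charges_b, coords_b, scaling=1.0):
    """Coulomb coupling (eV) between two sets of atomic transition charges.

    charges_*: transition charges in units of e, shape (n_atoms,)
    coords_*:  atomic positions in Angstrom, shape (n_atoms, 3)
    scaling:   hypothetical screening factor, 1.0 means no screening
    """
    qa = np.asarray(charges_a, dtype=float)
    qb = np.asarray(charges_b, dtype=float)
    ra = np.asarray(coords_a, dtype=float)
    rb = np.asarray(coords_b, dtype=float)
    # all pairwise distances between atoms of pigment a and pigment b
    distances = np.linalg.norm(ra[:, None, :] - rb[None, :, :], axis=-1)
    return scaling * COULOMB_EV_ANGSTROM * np.sum(np.outer(qa, qb) / distances)
```

Applied to every pigment pair in every snapshot, such a function would yield the time-dependent coupling matrix elements that enter the exciton Hamiltonian.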
Gaussian Process Regression
In GPR,81 a machine learning model is found for a training set $T = \{(x_i, y_i)\}_{i=1}^{N}$, with
training inputs $x_i$ and corresponding output measurements $y_i$. Then, starting from a Gaussian
process prior $GP(0, k(x, x'))$, with covariance (kernel) function $k(x, x')$, a posterior
predictive distribution is found by conditioning on the training set $T$. The posterior distribution
has the mean
$$ m(x) = k(x, X) \left[ K(X, X) + \sigma_n^2 I \right]^{-1} y \tag{12} $$
that gives predictions for unknown inputs $x$, i.e., it is the constructed ML model. By $X$, we
denote the set of training inputs and $y$ is a vector with the corresponding outputs. In the
model, $k(x, X)$ denotes the vector of pair-wise kernel evaluations between the query input
$x$ and the set of training inputs $X$, whereas $K(X, X)$ is the matrix of all pair-wise kernel
evaluations between all training inputs $X$. In addition, $\sigma_n^2$ is a hyperparameter that models
the (unknown) variance of the noise that is assumed on the outputs of the training data.
For our tests we chose a Gaussian covariance function
$$ k(x, x') = \sigma_f^2 \exp\left( -\frac{1}{2} (x - x')^T M (x - x') \right). \tag{13} $$
Here, $M = \mathrm{diag}\left( \frac{1}{l_1^2}, \ldots, \frac{1}{l_D^2} \right)$ is a diagonal matrix, where $D$ is the number of features. This
results in a different length-scale for every feature, known as automatic relevance
determination.81 Moreover, $\sigma_f^2$ is a hyperparameter, the output-scale. All hyperparameters are found
by maximising the marginal log-likelihood, with details in the supplementary
material. We used the GPR implementation of GPyTorch.82
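The posterior mean of Eq. 12 with the kernel of Eq. 13 can be written down in a few lines of plain NumPy. The sketch below is not the GPyTorch implementation used in this work; it takes the hyperparameters as given instead of optimizing the marginal log-likelihood, and the function names are hypothetical.

```python
import numpy as np


def ard_gaussian_kernel(A, B, lengthscales, sigma_f):
    """Eq. 13 with one length-scale per feature (rows of A, B are inputs)."""
    A_scaled = A / lengthscales
    B_scaled = B / lengthscales
    squared = (np.sum(A_scaled ** 2, axis=1)[:, None]
               + np.sum(B_scaled ** 2, axis=1)[None, :]
               - 2.0 * A_scaled @ B_scaled.T)
    return sigma_f ** 2 * np.exp(-0.5 * np.maximum(squared, 0.0))


def gpr_mean(X_train, y_train, X_query, lengthscales, sigma_f, sigma_n):
    """Eq. 12: m(x) = k(x, X) [K(X, X) + sigma_n^2 I]^-1 y."""
    K = ard_gaussian_kernel(X_train, X_train, lengthscales, sigma_f)
    K = K + sigma_n ** 2 * np.eye(len(X_train))
    k_query = ard_gaussian_kernel(X_query, X_train, lengthscales, sigma_f)
    # solve the linear system with a Cholesky factorization instead of an inverse
    chol = np.linalg.cholesky(K)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    return k_query @ alpha
```

With a small noise variance, the mean nearly interpolates the training outputs, and far away from all training inputs it falls back to the zero prior mean.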
In this work, the inputs $x$ are Coulomb matrices,83 representing the molecular configurations,
and the outputs $y$ are the corresponding excitation energies. For Coulomb matrices,
a sorting is often carried out to ensure invariance against atom permutation, which results
in discontinuities and makes them unfavorable for training ML models.84 In our tests, all
molecules have their atoms given in a fixed order. Therefore, we can safely ignore invariance against
atom permutation and hence use unsorted Coulomb matrices.
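An unsorted Coulomb matrix can be built directly from nuclear charges and coordinates. The sketch below uses the commonly quoted definition, with diagonal entries $0.5 Z_i^{2.4}$ and off-diagonal entries $Z_i Z_j / |R_i - R_j|$. Flattening the upper triangle into a feature vector and the choice of length unit are assumptions for illustration, since these details are not given in the text.

```python
import numpy as np


def coulomb_matrix(nuclear_charges, coordinates):
    """Unsorted Coulomb matrix for atoms given in a fixed order."""
    Z = np.asarray(nuclear_charges, dtype=float)
    R = np.asarray(coordinates, dtype=float)
    distances = np.linalg.norm(R[:, None, :] - R[None, :, :], axis=-1)
    np.fill_diagonal(distances, 1.0)  # placeholder, diagonal is overwritten below
    C = np.outer(Z, Z) / distances
    np.fill_diagonal(C, 0.5 * Z ** 2.4)
    return C


def coulomb_descriptor(nuclear_charges, coordinates):
    """Flattened upper triangle of the (symmetric) Coulomb matrix."""
    C = coulomb_matrix(nuclear_charges, coordinates)
    return C[np.triu_indices(len(C))]
```

Because no sorting is applied, the descriptor changes smoothly with the geometry, but it is only meaningful when every configuration lists its atoms in the same order.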
Active learning
Active learning (AL) algorithms try to choose or create new training samples, which are
maximally informative to an ML model. In our setup, we have a large amount of unlabeled
molecular configurations and generating the respective labels (excitation energies) via
quantum chemical calculations is expensive. For this setting, uncertainty sampling38 can be a
well-suited AL scheme: First, for a small random subset of the data, excitation energies are
calculated. Then, these data are used to train an initial model. In the iterative process to add
favorable new training samples, the standard deviation of the predictive distribution of the
GPR model
$$ \sigma(x) = \sqrt{ k(x, x) - k(x, X) \left[ K(X, X) + \sigma_n^2 I \right]^{-1} k(X, x) } \tag{14} $$
is used as uncertainty measure. Those samples are added that have the highest uncertainty.
Once a sample is selected from the remaining pool of unlabeled samples, its label, i.e., the
excitation energy for the given molecule, is calculated. The input-output pair is then added
as an additional training sample. The full procedure is summarized in Algorithm 1.
Algorithm 1 Active Learning by Uncertainty Sampling
Require: Set of unlabeled inputs $U$, initial sample size $n_{init}$, number of iterations $n_{iter}$
1: Randomly select $n_{init}$ samples from $U$ and store them in set $L$
2: Obtain labels $y(L)$ for all inputs in $L$
3: $U = U \setminus L$
4: Train initial model $m$, following eq. (12), using labeled data $(L, y(L))$
5: for $i = 1$ to $n_{iter}$ do
6:     Compute uncertainty $\sigma(x)$, from eq. (14), for each sample in $U$ using model $m$
7:     Select sample $x^*$ from $U$ with highest uncertainty
8:     Obtain label $y^*$ for selected sample $x^*$
9:     $L = L \cup \{x^*\}$
10:    $U = U \setminus \{x^*\}$
11:    Retrain model $m$, following eq. (12), using updated labeled dataset $(L, y(L))$
12: end for
13: return Trained model $m$
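The loop of Algorithm 1 can be sketched in NumPy as follows. To keep the example short, it makes simplifying assumptions that differ from the actual workflow: an isotropic Gaussian kernel with fixed hyperparameters is used, so that "retraining" reduces to recomputing the predictive standard deviation of Eq. 14 on the enlarged training set, and the expensive quantum chemical calculation is replaced by a user-supplied callable `label_fn`. All names are hypothetical.

```python
import numpy as np


def gaussian_kernel(A, B, lengthscale, sigma_f):
    squared = (np.sum(A ** 2, axis=1)[:, None]
               + np.sum(B ** 2, axis=1)[None, :] - 2.0 * A @ B.T)
    return sigma_f ** 2 * np.exp(-0.5 * np.maximum(squared, 0.0) / lengthscale ** 2)


def predictive_std(X_train, X_pool, lengthscale=1.0, sigma_f=1.0, sigma_n=1e-3):
    """Eq. 14 evaluated for every row of X_pool."""
    K = gaussian_kernel(X_train, X_train, lengthscale, sigma_f)
    K = K + sigma_n ** 2 * np.eye(len(X_train))
    k = gaussian_kernel(X_pool, X_train, lengthscale, sigma_f)
    variance = sigma_f ** 2 - np.sum(k * np.linalg.solve(K, k.T).T, axis=1)
    return np.sqrt(np.maximum(variance, 0.0))


def uncertainty_sampling(pool, label_fn, n_init, n_iter, seed=0):
    """Return indices of the selected samples and their labels."""
    rng = np.random.default_rng(seed)
    unlabeled = list(range(len(pool)))
    # steps 1-3: random initial set, label it, remove it from the pool
    labeled = [int(i) for i in rng.choice(unlabeled, size=n_init, replace=False)]
    labels = {i: label_fn(pool[i]) for i in labeled}
    unlabeled = [i for i in unlabeled if i not in labels]
    for _ in range(n_iter):
        # steps 6-7: pick the unlabeled sample with the highest uncertainty
        std = predictive_std(pool[labeled], pool[unlabeled])
        pick = unlabeled.pop(int(np.argmax(std)))
        # steps 8-10: label it and move it to the training set
        labels[pick] = label_fn(pool[pick])
        labeled.append(pick)
    return labeled, np.array([labels[i] for i in labeled])
```

A final model would then be fitted on the returned samples. In a setting where the hyperparameters are re-optimized in every iteration, the uncertainty would additionally depend on the labels collected so far.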
Multifidelity Machine Learning Approach
Multifidelity machine learning (MFML) systematically combines ML sub-models trained on
multiple fidelities, $f$, to produce a low-cost, high-accuracy ML model for predicting
the excitation energies at a target fidelity $F$.32,50 A composite index, $s = (f, \eta_f)$, is used to
identify the sub-models of MFML. In this index, we have $2^{\eta_f} = N^{(f)}_{\mathrm{train}}$. The MFML model is
built with a target fidelity, $F$, for a given baseline fidelity $f_b$, which refers to the lowest QC
fidelity used in the model. One can write the MFML model for a query molecular descriptor,
$X_q$, as
$$ P^{(F, \eta_F; f_b)}_{\mathrm{MFML}}(X_q) := \sum_{s \in S^{(F, \eta_F; f_b)}} \beta_s P^{(s)}_{\mathrm{GPR}}(X_q), \tag{15} $$
where the summation is made over the set of selected sub-models of MFML, $S^{(F, \eta_F; f_b)}$. This
selection is decided by the choice of $f_b$ and $N^{(F)}_{\mathrm{train}} = 2^{\eta_F}$, the number of training samples
at the target fidelity, as presented in Ref. 50. The $\beta_s$ from Eq. (15) are referred to as the
coefficients of the linear combination of these selected sub-models. For MFML, the $\beta_s$ are
chosen to be
$$ \beta^{\mathrm{MFML}}_s = \begin{cases} +1, & \text{if } f + \eta_f = F + \eta_F \\ -1, & \text{otherwise} \end{cases}, \tag{16} $$
based on previous work from Ref. 32.
In the MFML method, the number of training samples at the subsequent fidelities is
set by a scaling factor, $\gamma$. That is, $N^{f-1}_{\mathrm{train}} = \gamma \cdot N^{f}_{\mathrm{train}}$ for $f_b < f \le F$, so that each
lower fidelity uses more training samples, consistent with Eq. (16). The value of $\gamma = 2$
is conventionally used in MFML based on previous work.32,49–51 In a recent work, the effect
of different values of $\gamma$ on the model error of MFML has been studied.33 Ref. 33 reports
that the use of very little training data at the target fidelity, combined with increasing values
of $\gamma$, results in a more data-efficient model. These models are studied using the model
error and the time-cost of generating the training data required for the model. This specific
curve is referred to as the $\Gamma(N^F_{\mathrm{train}})$ curve, with each data point essentially being the MFML
model built for a specific value of $\gamma$. Herein, the $\Gamma(8)$ curve is built for $\gamma \in \{2, \ldots, 12\}$.
The supplementary information associated with this work also reports $\Gamma(2)$, $\Gamma(4)$, and $\Gamma(16)$
curves in Figure S9. The $\Gamma(8)$ and $\Gamma(16)$ curves were not seen to
differ significantly. The final model used in this work for the prediction of def2-SVP
energies is the MFML model along the $\Gamma(8)$ curve for $\gamma = 12$, as discussed in Figure 9.
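To make the bookkeeping of sub-models concrete, the following sketch enumerates a possible set of sub-models and evaluates the linear combination of Eq. (15). It rests on assumptions that are not spelled out in this section: each lower fidelity uses $\gamma$ times as many training samples as the next higher one, and the selection follows the telescoping pattern commonly described for MFML, in which every positive sub-model at fidelity $f$ is paired with a negative sub-model at fidelity $f-1$ trained on the same number of samples. Fidelities are indexed from 0 (baseline) upwards, and all names are hypothetical.

```python
def mfml_submodels(n_fidelities, n_target, gamma=2):
    """List of (fidelity, n_train, beta) tuples for the assumed selection."""
    target = n_fidelities - 1
    submodels = []
    for f in range(target, -1, -1):
        n_f = n_target * gamma ** (target - f)  # training set size at fidelity f
        submodels.append((f, n_f, +1))
        if f > 0:
            # correction term: next lower fidelity trained on the same samples
            submodels.append((f - 1, n_f, -1))
    return submodels


def mfml_predict(submodels, predictors, x):
    """Eq. (15): signed sum over the sub-model predictions.

    predictors maps (fidelity, n_train) to a callable returning a prediction.
    """
    return sum(beta * predictors[(f, n)](x) for f, n, beta in submodels)
```

For $\gamma = 2$ the signs produced by this pattern agree with Eq. (16). With five fidelities, eight target-fidelity samples, and $\gamma = 12$, the assumed hierarchy would contain 8, 96, 1152, 13824, and 165888 training samples from the highest to the lowest fidelity.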
Model Evaluation
The different ML models that are built in this work, both single fidelity and MFML, are
evaluated using mean absolute error (MAE) over holdout test sets. Given a test set at target
fidelity, $F$, denoted as $Q^F := \{(X_q, y^{\mathrm{ref}}_q)\}_{q=1}^{N_{\mathrm{eval}}}$, the MAE for an ML model is calculated as
$$ \mathrm{MAE} := \frac{1}{N_{\mathrm{eval}}} \sum_{q=1}^{N_{\mathrm{eval}}} \left\lVert y^{\mathrm{ML}}_q - y^{\mathrm{ref}}_q \right\rVert_1. \tag{17} $$
The model error is initially reported as a function of number of training samples used at
the target fidelity in the form of learning curves85–87 for both single fidelity and MFML
models since these are a common metric of model evaluation for the ML for QC quantities
workflow.88–90
Since the main aim of multifidelity models is to reduce the time-cost of generating training
data, this work also presents a recently introduced analysis of MAE versus time-cost of
generating training data for a certain ML model.49 For single fidelity models, this is simply
the QC calculation cost of the number of training samples used for the ML model. For MFML
models, this cost takes into account the QC computation cost of not just the training samples
used at the target fidelity, but also the cost of training samples used in the entire multifidelity
data structure.
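Both evaluation quantities are simple to compute. The sketch below shows the MAE of Eq. (17) and a time-cost estimate in which every fidelity contributes its number of training samples times the cost of one calculation at that fidelity. The per-sample costs in the usage note are made-up placeholders, not timings from this work, and the function names are illustrative.

```python
import numpy as np


def mean_absolute_error(y_ml, y_ref):
    """Eq. (17) for scalar predictions and reference values."""
    return float(np.mean(np.abs(np.asarray(y_ml) - np.asarray(y_ref))))


def training_data_cost(n_train, cost_per_sample):
    """Sum over fidelities of n_train[f] * cost_per_sample[f]."""
    return sum(n_train[f] * cost_per_sample[f] for f in n_train)
```

For a single-fidelity model the dictionaries contain one entry, e.g. `training_data_cost({"high": 8}, {"high": 10.0})`, whereas for a multifidelity model all fidelities of the data structure are included, e.g. `training_data_cost({"low": 16, "high": 8}, {"low": 1.0, "high": 10.0})`.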
Acknowledgement
The authors are grateful to the developers of the webtool CHARMM-GUI and in particular
Prof. Wonpil Im for assistance in modelling and modifying the montmorillonite nanosheet.
Furthermore, the authors acknowledge support by the DFG through the Priority Program
SPP 2363 on “Utilization and Development of Machine Learning for Molecular Applications
– Molecular Machine Learning” through the project ZA 1175/4-1 and KL 1299/25-1 and
further funding through projects ZA 1175/3-1 and KL 1299/24-1. VV and PZ would also like
to acknowledge the support of the ‘Interdisciplinary Center for Machine Learning and Data
Analytics (IZMD)’ at the University of Wuppertal. Furthermore, part of the simulations
were performed on a compute cluster funded through the DFG project INST 676/7-1 FUGG,
while part of the machine learning training was carried out on the PLEIADES cluster at the
University of Wuppertal, which is supported by the Deutsche Forschungsgemeinschaft (DFG,
grant No. INST 218/78-1 FUGG) and the Bundesministerium für Bildung und Forschung
(BMBF).
Supporting Information Available
Additional figures, tables, and explanations on excitation energies, transition charges, spec-
tral densities and the MFML approach.
References
(1) Bialas, D.; Kirchner, E.; Röhr, M. I. S.; Würthner, F. Perspectives in Dye Chemistry: A Rational Approach toward Functional Materials by Understanding the Aggregate State. J. Am. Chem. Soc. 2021, 143, 4500–4518.
(2) Richhariya, G.; Kumar, A.; Tekasakul, P.; Gupta, B. Natural Dyes for Dye Sensitized Solar Cell: A Review. Renewable Sustainable Energy Rev. 2017, 69, 705–718.
(3) Mogren Al Mogren, M.; Ahmed, N. M.; Hasanein, A. A. Molecular Modeling and Photovoltaic Applications of Porphyrin-Based Dyes: A Review. J. Saudi Chem. Soc. 2020, 24, 303–320.
(4) Ishida, Y.; Shimada, T.; Masui, D.; Tachibana, H.; Inoue, H.; Takagi, S. Efficient Excited Energy Transfer Reaction in Clay/Porphyrin Complex toward an Artificial Light-Harvesting System. J. Am. Chem. Soc. 2011, 133, 14280–14286.
(5) Ohtani, Y.; Kawaguchi, S.; Shimada, T.; Takagi, S. Energy Transfer among Three Dye Components in a Nanosheet-Dye Complex: An Approach To Evaluating the Performance of a Light-Harvesting System. J. Phys. Chem. C 2017, 121, 2052–2058.
(6) Tsukamoto, T.; Shimada, T.; Takagi, S. Artificial Photosynthesis Model: Photochemical Reaction System with Efficient Light-Harvesting Function on Inorganic Nanosheets. ACS Omega 2018, 3, 18563–18571.
(7) Nishina, H.; Hoshino, S.; Ohtani, Y.; Ishida, T.; Shimada, T.; Takagi, S. Anisotropic Energy Transfer in a Clay-Porphyrin Layered System with Environment-Responsiveness. Phys. Chem. Chem. Phys. 2020, 22, 14261–14267.
(8) Auwärter, W.; Ecija, D.; Klappenberger, F.; Barth, J. V. Porphyrins at Interfaces. Nat. Chem. 2015, 7, 105–120.
(9) Hassen, J.; Silver, J. The Unique Structure–Activity Relationship of Porphyrins and Clay Mineral Systems in Modern Applications: A Comprehensive Review. Chem. Biochem. Eng. Q. 2024, (in press).
(10) Fujimura, T.; Shimada, T.; Sasai, R.; Takagi, S. Optical Humidity Sensing Using Transparent Hybrid Film Composed of Cationic Magnesium Porphyrin and Clay Mineral. Langmuir 2018, 34, 3572–3577.
(11) Yamada, Y.; Nishino, T.; Hashimoto, A.; Toyoda, Y.; Yoshikawa, H.; Tanaka, K. Inorganic–Organic Framework Constructed by the Intercalation of a Double-Decker Porphyrin Metal Complex into Clay Nanosheets and Its Efficient Dye Adsorption Ability. ACS App. Opt. Mat. 2024, 2, 405–413.
(12) Chanturiya, V.; Minenko, V.; Makarov, D.; Suvorova, O.; Selivanova, E. Advanced Techniques of Saponite Recovery from Diamond Processing Plant Water and Areas of Saponite Application. Minerals 2018, 8, 549.
(13) Willemsen, J. A. R.; Myneni, S. C. B.; Bourg, I. C. Molecular Dynamics Simulations of the Adsorption of Phthalate Esters on Smectite Clay Surfaces. J. Phys. Chem. C 2019, 123, 13624–13636.
(14) Sun, W.; Zeng, H.; Tang, T. Synergetic Adsorption of Polymers on Montmorillonite: Insights from Molecular Dynamics Simulations. Appl. Clay Sci. 2020, 193, 105654.
(15) Willemsen, J. A. R.; Emunah, M.; Bourg, I. C. Molecular Dynamics Simulation of Organic Contaminant Adsorption on Organic-Coated Smectite Clay. Soil Sci. Soc. Am. J. 2022, 86, 238–252.
(16) Wang, J.; Wilson, R. S.; Aristilde, L. Electrostatic Coupling and Water Bridging in Adsorption Hierarchy of Biomolecules at Water–Clay Interfaces. Proc. Natl. Acad. Sci. 2024, 121.
(17) Pollak, H.; Degiacomi, M. T.; Erastova, V. Modeling Realistic Clay Systems with ClayCode. J. Chem. Theory Comput. 2024,
(18) Maity, S.; Bold, B. M.; Prajapati, J. D.; Sokolov, M.; Kubař, T.; Elstner, M.; Kleinekathöfer, U. DFTB/MM Molecular Dynamics Simulations of the FMO Light-Harvesting Complex. J. Phys. Chem. Lett. 2020, 11, 8660–8667.
(19) Maity, S.; Daskalakis, V.; Elstner, M.; Kleinekathöfer, U. Multiscale QM/MM Molecular Dynamics Simulations of the Trimeric Major Light-Harvesting Complex II. Phys. Chem. Chem. Phys. 2021, 23, 7407–7417.
(20) Maity, S.; Sarngadharan, P.; Daskalakis, V.; Kleinekathöfer, U. Time-Dependent Atomistic Simulations of the CP29 Light-Harvesting Complex. J. Chem. Phys. 2021, 155, 055103.
(21) Sarngadharan, P.; Maity, S.; Kleinekathöfer, U. Spectral Densities and Absorption Spectra of the Core Antenna Complex CP43 from Photosystem II. J. Chem. Phys. 2022, 156, 215101.
(22) Maity, S.; Kleinekathöfer, U. Recent Progress in Atomistic Modeling of Light-Harvesting Complexes: A Mini Review. Photosynth. Res. 2023, 156, 147–162.
(23) Sarngadharan, P.; Holtkamp, Y.; Kleinekathöfer, U. Protein Effects on the Excitation Energies and Exciton Dynamics of the CP24 Antenna Complex. J. Phys. Chem. B 2024, 128, 5201–5217.
(24) Hourahine, B. et al. DFTB+, a Software Package for Efficient Approximate Density Functional Theory Based Atomistic Simulations. J. Chem. Phys. 2020, 152, 124101.
(25) Renger, T. Theory of Excitation Energy Transfer: From Structure to Function. Photosynth. Res. 2009, 102, 471–485.
(26) Bold, B. M.; Sokolov, M.; Maity, S.; Wanko, M.; Dohmen, P. M.; Kranz, J. J.;
Kleinekathöfer, U.; Höfener, S.; Elstner, M. Benchmark and Performance of
Long-Range Corrected Time-Dependent Density Functional Tight Binding (LC-TD-DFTB)
on Rhodopsins and Light-Harvesting Complexes. Phys. Chem. Chem. Phys. 2020, 22,
10500–10518.
(27) Jansen, T. L. C.; Knoester, J. Nonadiabatic Effects in the Two-Dimensional Infrared
Spectra of Peptides: Application to Alanine Dipeptide. J. Phys. Chem. B 2006, 110,
22910–22916.
(28) Aghtar, M.; Liebers, J.; Strümpfer, J.; Schulten, K.; Kleinekathöfer, U. Juxtaposing
Density Matrix and Classical Path-Based Wave Packet Dynamics. J. Chem. Phys.
2012, 136, 214101.
(29) Vinod, V.; Maity, S.; Zaspel, P.; Kleinekathöfer, U. Multifidelity Machine Learning for
Molecular Excitation Energies. J. Chem. Theory Comput. 2023, 19, 7658–7670.
(30) Vinod, V.; Kleinekathöfer, U.; Zaspel, P. Optimized Multifidelity Machine Learning for
Quantum Chemistry. Mach. Learn.: Sci. Technol. 2024, 5, 015054.
(31) Westermayr, J.; Marquetand, P. Machine Learning for Electronically Excited States of
Molecules. Chem. Rev. 2020, 121, 9873–9926.
(32) Zaspel, P.; Huang, B.; Harbrecht, H.; Von Lilienfeld, O. A. Boosting Quantum Machine
Learning Models with a Multilevel Combination Technique: Pople Diagrams Revisited.
J. Chem. Theory Comput. 2019, 15, 1546–1559.
(33) Vinod, V.; Zaspel, P. Investigating Data Hierarchies in Multifidelity Machine Learning
for Excitation Energies. 2024.
(34) Rupp, M.; Bauer, M. R.; Wilcken, R.; Lange, A.; Reutlinger, M.; Boeckler, F. M.;
Schneider, G. Machine learning estimates of natural product conformational energies.
PLoS Comput. Biol. 2014, 10, e1003400.
(35) Uteva, E.; Graham, R. S.; Wilkinson, R. D.; Wheatley, R. J. Active learning in
Gaussian process interpolation of potential energy surfaces. J. Chem. Phys.
2018, 149.
(36) Zaverkin, V.; Holzmüller, D.; Steinwart, I.; Kästner, J. Exploring chemical and
conformational spaces by batch mode deep active learning. Digital Discovery 2022, 1,
605–620.
(37) Wilson, N.; Willhelm, D.; Qian, X.; Arróyave, R.; Qian, X. Batch active learning
for accelerating the development of interatomic potentials. Computational Materials
Science 2022, 208, 111330.
(38) Lewis, D. D. A sequential algorithm for training text classifiers: Corrigendum and
additional data. ACM SIGIR Forum. 1995; pp 13–19.
(39) Kranz, J. J.; Elstner, M.; Aradi, B.; Frauenheim, T.; Lutsker, V.; Garcia, A. D.;
Niehaus, T. A. Time-Dependent Extension of the Long-Range Corrected Density
Functional Based Tight-Binding Method. J. Chem. Theory Comput. 2017, 13, 1737–1747.
(40) Eguchi, M.; Takagi, S.; Tachibana, H.; Inoue, H. The ‘size matching rule’ in di-, tri-, and
tetra-cationic charged porphyrin/synthetic clay complexes: effect of the inter-charge
distance and the number of charged sites. J. Phys. Chem. Solids 2004, 65, 403–407.
(41) Madjet, M. E.; Abdurahman, A.; Renger, T. Intermolecular Coulomb Couplings from
Ab Initio Electrostatic Potentials: Application to Optical Transitions of Strongly
Coupled Pigments in Photosynthetic Antennae and Reaction Centers. J. Phys. Chem. B
2006, 110, 17268–17281.
(42) Renger, T.; Madjet, M.-A.; Schmidt am Busch, M.; Adolphs, J.; Müh, F.
Structure-based Modeling of Energy Transfer in Photosynthesis. Photosynth. Res. 2013, 116,
367–388.
(43) Megow, J.; Renger, T.; May, V. Mixed Quantum-Classical Description of Excitation
Energy Transfer in Supramolecular Complexes: Screening of the Excitonic Coupling.
ChemPhysChem 2014, 15, 478–485.
(44) McInnes, L.; Healy, J.; Melville, J. UMAP: Uniform Manifold Approximation and
Projection for Dimension Reduction. arXiv:1802.03426, 2020;
https://arxiv.org/abs/1802.03426.
(45) Rasmussen, C.; Williams, C. Gaussian Processes for Machine Learning, T. Dietterich,
Ed. 2006.
(46) Vinod, V.; Zaspel, P. Investigating Data Hierarchies in Multifidelity Machine
Learning for Excitation Energies. arXiv:2410.11392, 2024;
https://arxiv.org/abs/2410.11392.
(47) Neese, F.; Wennmohs, F.; Becker, U.; Riplinger, C. The ORCA Quantum Chemistry
Program Package. J. Chem. Phys. 2020, 152, 224108.
(48) Sokolov, M.; Bold, B. M.; Kranz, J. J.; Höfener, S.; Niehaus, T. A.; Elstner, M.
Analytical Time-Dependent Long-Range Corrected Density Functional Tight Binding
(TD-LC-DFTB) Gradients in DFTB+: Implementation and Benchmark for Excited-State
Geometries and Transition Energies. J. Chem. Theory Comput. 2021, 17, 2266–2282.
(49) Vinod, V.; Maity, S.; Zaspel, P.; Kleinekathöfer, U. Multifidelity Machine Learning for
Molecular Excitation Energies. J. Chem. Theory Comput. 2023, 19, 7658–7670, PMID:
37862054.
(50) Vinod, V.; Kleinekathöfer, U.; Zaspel, P. Optimized multifidelity machine learning for
quantum chemistry. Mach. Learn.: Sci. Technol. 2024, 5, 015054.
(51) Vinod, V.; Zaspel, P. Assessing non-nested configurations of multifidelity machine
learning for quantum-chemical properties. Mach. Learn.: Sci. Technol. 2024, 5, 045005.
(52) Renger, T.; May, V.; Kühn, O. Ultrafast Excitation Energy Transfer Dynamics in
Photosynthetic Pigment-Protein Complexes. Phys. Rep. 2001, 343, 137–254.
(53) May, V.; Kühn, O. Charge and Energy Transfer in Molecular Systems, 3rd ed.;
Wiley-VCH, 2011.
(54) Holtkamp, Y. M.; Godinez-Ramirez, E.; Kleinekathöfer, U. Spectral Densities,
Structured Noise and Ensemble Averaging within Open Quantum Dynamics. J. Chem. Phys.
2024, 161, 134101.
(55) Damjanović, A.; Kosztin, I.; Kleinekathöfer, U.; Schulten, K. Excitons in a
Photosynthetic Light-Harvesting System: A Combined Molecular Dynamics, Quantum
Chemistry and Polaron Model Study. Phys. Rev. E 2002, 65, 031919.
(56) Rätsep, M.; Freiberg, A. Electron-Phonon and Vibronic Couplings in the FMO
Bacteriochlorophyll a Antenna Complex Studied by Difference Fluorescence Line Narrowing.
J. Lumin. 2007, 127, 251–259.
(57) Kim, B.; Bohandy, J. Spectroscopy of Porphyrins. Johns Hopkins APL
Tech. Dig. 1981, 2, 153–163.
(58) Parandekar, P. V.; Tully, J. C. Detailed Balance in Ehrenfest Mixed Quantum-Classical
Dynamics. J. Chem. Theory Comput. 2006, 2, 229–235.
(59) Jansen, T. L. C. Simple Quantum Dynamics with Thermalization. J. Phys. Chem. A
2018, 122, 172–183.
(60) Holtkamp, Y.; Kowalewski, M.; Jasche, J.; Kleinekathöfer, U. Machine-Learned
Correction to Ensemble-Averaged Wave Packet Dynamics. J. Chem. Phys. 2023, 159, 094107.
(61) Holtkamp, Y.; Godinez-Ramirez, E.; Kleinekathöfer, U. TorchNISE.
https://github.com/CPBPG/TorchNISE, 2024; Accessed: 2024-10-26.
(62) Merlin, R. Rabi oscillations, Floquet states, Fermi’s golden rule, and all that: Insights
from an exactly solvable two-level model. Am. J. Phys. 2021, 89, 26–34.
(63) Scully, S. R.; McGehee, M. D. Effects of optical interference and energy transfer on
exciton diffusion length measurements in organic semiconductors. J. Appl. Phys. 2006,
100, 034907.
(64) Firdaus, Y. et al. Long-range exciton diffusion in molecular non-fullerene acceptors.
Nat. Commun. 2020, 11, 5220.
(65) Huijser, A.; Savenije, T. J.; Kroeze, J. E.; Siebbeles, L. D. A. Exciton Diffusion and
Interfacial Charge Separation in meso-Tetraphenylporphyrin/TiO2 Bilayers: Effect of
Ethyl Substituents. J. Phys. Chem. B 2005, 109, 20166–20173.
(66) Huijser, A.; Savenije, T. J.; Meskers, S. C. J.; Vermeulen, M. J. W.; Siebbeles, L.
D. A. The Mechanism of Long-Range Exciton Diffusion in a Nematically Organized
Porphyrin Layer. Journal of the American Chemical Society 2008, 130, 12496–12500,
PMID: 18717557.
(67) Kaushal, M.; Ortiz, A. L.; Kassel, J. A.; Hall, N.; Lee, T. D.; Singh, G.; Walter, M. G.
Enhancing exciton diffusion in porphyrin thin films using peripheral carboalkoxy groups
to influence molecular assembly. J. Mater. Chem. C 2016, 4, 5602–5609.
(68) Gu, C.; Zhang, H.; Yu, J.; Shen, Q.; Luo, G.; Chen, X.; Xue, P.; Wang, Z.; Hu, J.
Assembled Exciton Dynamics in Porphyrin Metal–Organic Framework Nanofilms. Nano
Letters 2021, 21, 1102–1107, PMID: 33404245.
(69) Jo, S.; Kim, T.; Iyer, V. G.; Im, W. CHARMM-GUI: A Web-Based Graphical User
Interface for CHARMM. J. Comput. Chem. 2008, 29, 1859–1865.
(70) Choi, Y. K.; Kern, N. R.; Kim, S.; Kanhaiya, K.; Afshar, Y.; Jeon, S. H.; Jo, S.;
Brooks, B. R.; Lee, J.; Tadmor, E. B.; Heinz, H.; Im, W. CHARMM-GUI
Nanomaterial Modeler for Modeling and Simulation of Nanomaterial Systems. J. Chem. Theory
Comput. 2022, 18, 479–493.
(71) Da Silva, A. W. S.; Vranken, W. F. ACPYPE-Antechamber Python Parser Interface.
BMC Res. Notes 2012, 5, 367.
(72) Abraham, M. J.; Murtola, T.; Schulz, R.; Páll, S.; Smith, J. C.; Hess, B.; Lindahl, E.
GROMACS: High Performance Molecular Simulations through Multi-Level Parallelism
from Laptops to Supercomputers. SoftwareX 2015, 1–2, 19–25.
(73) Wang, J.; Wolf, R. M.; Caldwell, J. W.; Kollman, P. A.; Case, D. A. Development and
Testing of a General Amber Force Field. J. Comput. Chem. 2004, 25, 1157–1174.
(74) Heinz, H.; Lin, T.-J.; Kishore Mishra, R.; Emami, F. S. Thermodynamically Consistent
Force Fields for the Assembly of Inorganic, Organic, and Biological Nanostructures:
The INTERFACE Force Field. Langmuir 2013, 29, 1754–1765.
(75) Evans, D. J.; Holian, B. L. The Nose–Hoover Thermostat. J. Chem. Phys. 1985, 83,
4069–4074.
(76) Berendsen, H. J. C.; Postma, J. P. M. v.; van Gunsteren, W. F.; DiNola, A. R. H. J.;
Haak, J. R. Molecular Dynamics with Coupling to an External Bath. J. Chem. Phys.
1984, 81, 3684–3690.
(77) Essmann, U.; Perera, L.; Berkowitz, M. L.; Darden, T.; Lee, H.; Pedersen, L. G. A
Smooth Particle Mesh Ewald Method. J. Chem. Phys. 1995, 103, 8577–8593.
(78) Hess, B.; Bekker, H.; Berendsen, H. J. C.; Fraaije, J. G. E. M. LINCS: A Linear
Constraint Solver for Molecular Simulations. J. Comput. Chem. 1997, 18, 1463–1472.
(79) Gaus, M.; Goez, A.; Elstner, M. Parametrization and Benchmark of DFTB3 for Organic
Molecules. J. Chem. Theory Comput. 2013, 9, 338–354.
(80) Neese, F. Software Update: The ORCA Program System—Version 5.0. WIREs Comput.
Mol. Sci. 2022, 12, e1606.
(81) Williams, C. K.; Rasmussen, C. E. Gaussian Processes for Machine Learning; MIT Press:
Cambridge, MA, 2006; Vol. 2.
(82) Gardner, J.; Pleiss, G.; Weinberger, K. Q.; Bindel, D.; Wilson, A. G. GPyTorch:
Blackbox matrix-matrix Gaussian process inference with GPU acceleration. Advances in
Neural Information Processing Systems 2018, 31.
(83) Rupp, M.; Tkatchenko, A.; Müller, K.-R.; Von Lilienfeld, O. A. Fast and accurate
modeling of molecular atomization energies with machine learning. Physical Review
Letters 2012, 108, 058301.
(84) Langer, M. F.; Goeßmann, A.; Rupp, M. Representations of molecules and materials
for interpolation of quantum-mechanical simulations via machine learning. npj
Computational Materials 2022, 8, 41.
(85) Li, W.; Duan, L.; Xu, D.; Tsang, I. W. Learning with augmented features for supervised
and semi-supervised heterogeneous domain adaptation. IEEE Transactions on Pattern
Analysis and Machine Intelligence 2013, 36, 1134–1148.
(86) Müller, K.-R.; Finke, M.; Murata, N.; Schulten, K.; Amari, S.-i. A Numerical Study
on Learning Curves in Stochastic Multilayer Feedforward Networks. Neural Comput.
1996, 8, 1085–1106.
(87) Cortes, C.; Jackel, L. D.; Solla, S.; Vapnik, V.; Denker, J. Learning Curves: Asymptotic
Values and Rate of Convergence. Advances in Neural Information Processing Systems.
1993.
(88) Westermayr, J.; Marquetand, P. Machine Learning for Electronically Excited States of
Molecules. Chem. Rev. 2020, 121, 9873–9926.
(89) Westermayr, J.; Faber, F. A.; Christensen, A. S.; von Lilienfeld, O. A.; Marquetand, P.
Neural networks and kernel ridge regression for excited states dynamics of CH2NH:
From single-state to multi-state representations and multi-property machine learning
models. Mach. Learn.: Sci. Technol. 2020, 1.
(90) Dral, P. O. Quantum chemistry in the age of machine learning. J. Phys. Chem. Lett.
2020, 11, 2336–2347.