Random forest classification for predicting lifespan-extending chemical compounds
University of Surrey Faculty of Engineering and Physical Sciences https://orcid.org/0000-0001-6234-
Brendan James Howlin ( firstname.lastname@example.org )
Department of Chemistry, FEPS, University of Surrey, Guildford, Surrey, GU2 7XH, UK
Keywords: ageing, anti-ageing drugs, lifespan extension, DrugAge, C. elegans, machine learning, random forest, molecular descriptors, molecular fingerprints, QSAR
License: This work is licensed under a Creative Commons Attribution 4.0 International License.
Ageing is a major risk factor for many conditions, including cancer, cardiovascular and neurodegenerative diseases. Pharmaceutical interventions that slow down ageing and delay the onset of age-related diseases are a growing research area. The aim of this study was to build a machine learning model, based on the data of the DrugAge database, to predict whether a chemical compound will extend the lifespan of the worm species Caenorhabditis elegans (C. elegans). Five predictive models were built using the random forest algorithm with molecular fingerprints and/or molecular descriptors as features. Feature selection was achieved using variance and mutual information-based methods. The best performing classifier, built using molecular descriptors, achieved an area under the curve (AUC) score of 0.815 for classifying the compounds in the test set. The features of the model were ranked using the Gini importance measure of the random forest algorithm. The top 30 most important features included descriptors related to atom and bond counts, topological and partial charge properties. The model was applied to predict the class of compounds in an external database consisting of 1,738 small molecules. The chemical compounds of the screening database with a predictive probability of ≥ 0.80 for increasing the lifespan of C. elegans were broadly separated into (i) flavonoids, (ii) fatty acids and conjugates, and (iii) organooxygen compounds.
Pharmacological interventions for longevity extension
Ageing is a major health, social and financial challenge, characterised by the deterioration of the physiological processes of an organism [1, 2]. Ageing is a predominant risk factor for many conditions, including various types of cancer, cardiovascular and neurodegenerative diseases [3, 4]. Interventions targeting the cellular and molecular processes of ageing can help delay and prevent age-related diseases.
Several pharmaceutical and non-pharmaceutical interventions have been identified that extend the lifespan of a variety of model organisms. Caloric restriction can slow down ageing and protect against age-related diseases by regulating signalling pathways such as the mammalian target of rapamycin (mTOR). However, the intensity of long-term dietary restriction makes it difficult to maintain. Pharmaceutical interventions are considered the most practical interventions for combating human ageing, as they are easier to maintain than dietary restrictions and are free of the ethical concerns associated with genetic interventions.
Caenorhabditis elegans in ageing research
The worm species Caenorhabditis elegans (C. elegans) is one of the most studied model organisms in longevity research and has significantly contributed to our fundamental understanding of organismal ageing. C. elegans has a short lifespan of approximately 3 weeks, which makes it well suited for longevity studies in contrast to long-lived mammals [6, 7]. Besides its short lifespan, C. elegans displays several observable and quantifiable age-related changes in its anatomical and functional features. Thus, the ageing process can be easily monitored [6, 7].
Ageing affects several important tissue systems of C. elegans, including the cuticle (skin), hypodermis, muscles, and the reproductive and nervous systems. With ageing, the cuticle becomes progressively thicker and more wrinkled. The muscle tissues of C. elegans deteriorate, resulting in a decline in locomotion. Studies have shown that the decline in locomotion predicted the time of death of individual worms more closely than chronological age did. “Chronological age” does not always correlate perfectly with “biological age”, which reflects the organism's physical state. “Biological age” is influenced by the genetic background of the organism as well as by environmental factors.
The nervous system of C. elegans displays more subtle changes with increasing age than other tissue systems. These include synaptic deterioration, a decline in learning ability and reduced regeneration capacity of motor neurons. Reproduction in C. elegans ceases with age, and the structure of its reproductive system deteriorates.
Although C. elegans has a much simpler physiology than humans, it possesses several of the key organ systems present in more complex organisms, such as the digestive, nervous and reproductive systems. Many of the mechanisms and genes that extend the lifespan of C. elegans are evolutionarily conserved across organisms, from yeast to humans. Therefore, potential lifespan-extending drugs can first be tested on worms and then assessed in mammals.
Overview of key ageing studies
Several ageing studies have identified interventions that extend the lifespan of model organisms ranging from nematodes and fruit flies to rodents. These interventions include dietary restrictions, genetic modifications and pharmaceutical interventions. Lee et al. (2006) presented the first evidence that long-term dietary deprivation can improve longevity in a multicellular species. A 2009 study showed that rapamycin, an inhibitor of the mTOR pathway, extended the lifespan of both female and male mice. In the same year, Selman et al. (2009) reported that genetic deletion of S6 protein kinase 1 increased the lifespan of mice and protected against age-related diseases.
A 2014 study developed a pharmacological network to identify pharmacological classes related to the lifespan of C. elegans. The network showed that resistance to oxidative stress and lifespan extension clustered in a few pharmacological classes, most of them related to intercellular signalling. A 2016 study developed a deep learning neural network that predicted human chronological age from a basic blood test. That study identified the five most important blood markers for determining chronological age in humans: albumin, glucose, alkaline phosphatase, urea and erythrocytes. Mamoshina et al. (2018) developed a deep learning-based haematological ageing clock using blood samples from Canadian, South Korean and Eastern European populations, with millions of subjects. The study demonstrated that population-specific ageing clocks were more accurate than generic ageing clocks in predicting chronological age and quantifying biological age.
Barardo et al. (2017) built a random forest model to predict whether a compound would increase the lifespan of C. elegans, based on the data of the DrugAge database [1, 4]. The features used to build the random forest model were molecular descriptors and gene ontology terms. Feature selection was performed using the random forest's feature importance measure. The best performing model, with an AUC score of 0.80, was applied to predict the class of the compounds in the DGIdb database.
Purpose of the work
In this study, the random forest algorithm was applied to predict whether a compound will increase the lifespan of C. elegans. This was achieved by building five predictive models, each using different descriptor types, based on the DrugAge data published by Barardo et al. (2017). The features of the models were molecular fingerprints and/or molecular descriptors calculated from the structures of the compounds in the DrugAge database. A filter-based feature selection method, mutual information, was employed to select the most relevant features. To the best of our knowledge, this is the first application of molecular fingerprints to build a machine learning model based on the entries of the DrugAge database. The best performing model was applied to predict the class of the compounds in an external database consisting of 1,738 small molecules.
Random forest models
A random forest is a supervised machine learning algorithm that is widely applied to classification tasks. It combines the votes of many decision trees, each trained on a bootstrap sample of the data using random subsets of the features, which makes it robust to overfitting in high-dimensional datasets with a small number of entries. This made it suitable for the data used in this study.
The choice of chemical descriptors can significantly affect the quality and predictions of QSAR models. Descriptors encode the chemical information of molecules in a numerical, computer-interpretable form that is suitable for model development [18, 19]. In this study, 2D and 3D molecular descriptors were calculated using the Molecular Operating Environment (MOE™) software. 2D descriptors are calculated from the 2D structure of a molecule and provide information related to its structural, topological and physicochemical properties. 3D descriptors, on the other hand, are generated from the 3D structure of a chemical compound and include electronic parameters (e.g. dipole moment), quantum-chemical descriptors (e.g. HOMO and LUMO energies), and surface area and volume descriptors [19, 22, 23].
Molecular fingerprints are a digital representation of a molecule's structure using binary vectors, where 1 indicates that a particular feature is present and 0 that it is absent. Several different categories of molecular fingerprints exist, each reflecting different aspects of a molecule. Herein, extended-connectivity fingerprints (ECFP) of 1,024- and 2,048-bit lengths and RDKit topological fingerprints of 2,048-bit length were generated using the RDKit cheminformatics toolkit in Python. Lastly, the combination of molecular descriptors with ECFPs was tested.
Results And Discussion
Visualization of chemical space
This study involved high-dimensional datasets containing hundreds of molecular fingerprint bits and descriptors. Principal component analysis (PCA) was applied to project the chemical space onto two dimensions. The resulting chemical space representations for the ECFP, RDKit fingerprints, molecular descriptors and the combination of ECFP with molecular descriptors are shown in Fig. 1.
In chemical space visualisation, structural analogues are positioned nearer to each other than to unrelated compounds. This allows dimensionality-reduction techniques such as PCA to reveal neighbourhoods of similarly structured molecules. Thus, some degree of clustering was expected among active compounds.
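The two-dimensional projection described above can be illustrated with a minimal from-scratch PCA. The paper does not state which PCA implementation was used (a library class such as scikit-learn's `PCA` would be the usual choice) or whether features were standardised first, so this is a sketch under those assumptions, not the authors' code. It centres the feature matrix, forms the sample covariance matrix and extracts the top two eigenvectors by power iteration with deflation; the function name `pca_2d` and the toy input are hypothetical.

```python
import math
import random


def pca_2d(rows, n_iter=500, seed=0):
    """Project rows (equal-length lists of numbers) onto their top two principal components."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d) of the centred data
    cov = [[sum(x[i] * x[j] for x in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(2):
        # Power iteration: repeatedly multiply by the covariance matrix and renormalise
        v = [rng.random() + 0.1 for _ in range(d)]
        for _ in range(n_iter):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(c * c for c in w)) or 1.0
            v = [c / norm for c in w]
        eigval = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        components.append(v)
        # Deflation: remove the component just found so the next pass finds the next one
        cov = [[cov[i][j] - eigval * v[i] * v[j] for j in range(d)] for i in range(d)]
    # Scores: centred data projected onto each component
    return [[sum(x[j] * pc[j] for j in range(d)) for pc in components] for x in centred]


# Toy data: the second feature is exactly twice the first, so all variance lies on one axis
coords = pca_2d([[1, 2], [2, 4], [3, 6], [4, 8]])
print([round(c[0], 3) for c in coords])
```

For this toy input the first-component scores are ±1.5√5 and ±0.5√5, i.e. the points are spread along a single axis, which is the kind of structure a 2D chemical space plot makes visible.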
Among the single descriptor types, Fig. 1a-d, the highest degree of clustering among active molecules was observed in the chemical space visualisation of the molecular descriptors. One explanation is that the chemical fingerprints used in this study were hashed fingerprints. Hashed fingerprints often lose information through bit collisions, in which different structural features are mapped to the same bit; thus, the distances between fingerprints may not correlate perfectly with the similarity of the compounds. Interestingly, the chemical space visualisation of the combined feature type, Fig. 1e, is almost identical to that of the molecular descriptors shown in Fig. 1d. This suggests that the molecular descriptors have stronger expressive power than the 1,024-bit ECFPs for the chemical space analysis of the DrugAge database.
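To illustrate how hashing causes bit collisions, the sketch below folds feature identifiers into a fixed-length bit vector using a generic hash. This is a toy scheme assumed for illustration only, not RDKit's ECFP or topological-fingerprint algorithm, and the feature strings are hypothetical. With more distinct features than bits, collisions are unavoidable; and because 1,024 divides 2,048, any two features that share a bit in a 2,048-bit vector also share one in the 1,024-bit vector.

```python
import hashlib


def fold_features(features, n_bits=1024):
    """Map each feature identifier to one bit position; return the set of 'on' bits."""
    bits = set()
    for feat in features:
        digest = hashlib.sha1(feat.encode("utf-8")).digest()
        bits.add(int.from_bytes(digest[:4], "big") % n_bits)
    return bits


def count_collisions(features, n_bits):
    """Number of distinct features that landed on a bit already used by another feature."""
    return len(set(features)) - len(fold_features(features, n_bits))


# Hypothetical substructure identifiers standing in for atom environments
features = [f"fragment_{i}" for i in range(300)]
for n_bits in (64, 1024, 2048):
    print(n_bits, "bits ->", count_collisions(features, n_bits), "collisions")
```

Every collision merges two structural features into one bit, so two molecules can appear more similar in fingerprint space than they are chemically.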
Feature selection was employed to select the most relevant features for predicting the activity of a molecule in the database. This was performed only on the training set, which contained 80% of the compounds in the dataset. Feature selection was achieved by applying variance and mutual information-based pre-selection methods. This reduced the number of features used by each model, making the computations less expensive. The median AUC scores and standard deviations from 10-fold cross-validation obtained by random forest classification for each feature combination can be found in Supplementary Table 1, Additional File 1. For each descriptor type, the feature combination with the highest AUC score in 10-fold cross-validation was selected for classifying the compounds in the test set. Where two feature combinations achieved the same AUC score, the combination with the smallest standard deviation was used.
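A minimal sketch of a variance filter followed by mutual-information ranking is shown below. The paper does not report its variance threshold or the number of features kept at each step, so `var_threshold` and `k` are hypothetical parameters, as are the column names. The plug-in MI estimator used here suits discrete features such as fingerprint bits; continuous descriptors would need binning or a nearest-neighbour estimator such as the one behind scikit-learn's `mutual_info_classif`. As in the study, selection should be fitted on the training split only.

```python
import math
from collections import Counter


def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def mutual_information(x, y):
    """Plug-in estimate (in nats) of the mutual information between two discrete sequences."""
    n = len(x)
    joint, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    # Sum over observed (a, b) of p(a, b) * log(p(a, b) / (p(a) * p(b)))
    return sum((c / n) * math.log(c * n / (px[a] * py[b])) for (a, b), c in joint.items())


def select_features(columns, y, var_threshold=0.0, k=10):
    """Drop features with variance <= var_threshold, then keep the k with the highest MI with y."""
    kept = {name: col for name, col in columns.items() if variance(col) > var_threshold}
    ranked = sorted(kept, key=lambda name: mutual_information(kept[name], y), reverse=True)
    return ranked[:k]


# Hypothetical training-set columns (fingerprint bits) and activity labels
y_train = [0, 0, 0, 1, 1, 1]
columns = {
    "bit_constant": [1, 1, 1, 1, 1, 1],  # zero variance, removed by the filter
    "bit_informative": [0, 0, 0, 1, 1, 1],
    "bit_noisy": [0, 1, 0, 1, 0, 1],
}
print(select_features(columns, y_train, k=2))
```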
The test set contained the 20% of the data not used in training the models. The performances of the random forest classifiers in 10-fold cross-validation and in classifying the compounds in the test set are shown in Table 1.
Table 1. Random forest performance in 10-fold cross-validation and on the test set.
Model           Selected features   CV AUC (median)   CV AUC stdev   Test set AUC
ECFP_1024       55                  0.794             0.048          0.793
ECFP_2048       504                 0.789             0.042          0.776
RDKit5          654                 0.836             0.053          0.777
MD              69                  0.823             0.041          0.815
ECFP_1024_MD    33                  0.828             0.040          0.806
As illustrated in Fig. 2, the predictive performance of the random forest models did not drop markedly when classifying the compounds in the test set, and the test-set AUC scores were compatible with the spread of the AUC scores from cross-validation. This indicated that overfitting was minimised.
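The compatibility claim can be checked against the table of cross-validation and test-set AUCs above. The sketch below transcribes those values and expresses each drop from cross-validation AUC to test-set AUC in units of the cross-validation standard deviation; the helper name `gap_in_sd` is illustrative.

```python
# Values transcribed from the results table: (CV AUC, CV standard deviation, test-set AUC)
RESULTS = {
    "ECFP_1024": (0.794, 0.048, 0.793),
    "ECFP_2048": (0.789, 0.042, 0.776),
    "RDKit5": (0.836, 0.053, 0.777),
    "MD": (0.823, 0.041, 0.815),
    "ECFP_1024_MD": (0.828, 0.040, 0.806),
}


def gap_in_sd(cv_auc, cv_sd, test_auc):
    """How many cross-validation standard deviations the test AUC lies below the CV AUC."""
    return (cv_auc - test_auc) / cv_sd


for model, (cv, sd, test) in RESULTS.items():
    print(f"{model:13s} drop = {cv - test:+.3f} ({gap_in_sd(cv, sd, test):.2f} SD)")

best = max(RESULTS, key=lambda m: RESULTS[m][2])
print("highest test-set AUC:", best)
```

On these numbers every drop lies within roughly one standard deviation: RDKit5 shows the largest (about 1.1 SD) and MD about 0.2 SD.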
The receiver operating characteristic (ROC) curve plots the True Positive Rate (TPR) against the False Positive Rate (FPR) at varying classification thresholds. The ROC curves displayed in Fig. 3 compare the performance of the descriptor types in classifying the samples of the test set. Analysis of the ROC curves indicates that all five random forest models performed better than a random prediction, whose ROC curve is the diagonal (AUC = 0.5).
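The ROC construction and the AUC summary can be sketched in plain Python. The labels and scores below are hypothetical. The second function uses the standard identity that the AUC equals the probability that a randomly chosen active compound is scored above a randomly chosen inactive one (ties counted as one half), so both functions should agree.

```python
def roc_points(labels, scores):
    """(FPR, TPR) pairs from sweeping the decision threshold over every distinct score."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        # Predict "active" for every compound scored at or above the threshold
        tp = sum(1 for lab, s in zip(labels, scores) if s >= threshold and lab == 1)
        fp = sum(1 for lab, s in zip(labels, scores) if s >= threshold and lab == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def auc_pairwise(labels, scores):
    """Probability that a random active outscores a random inactive (ties count 0.5)."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


labels = [0, 0, 1, 1]  # hypothetical: 1 = lifespan-extending, 0 = not
scores = [0.1, 0.4, 0.35, 0.8]  # hypothetical predicted probabilities
print(auc_trapezoid(roc_points(labels, scores)), auc_pairwise(labels, scores))
```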
The best performing model, selected by its ability to correctly classify the compounds in the test set, was used to predict the class of the compounds in the screening dataset. In general, the random forest models with a smaller number of selected features, such as ECFP_1024, MD and ECFP_1024_MD, performed better on the test set. The classifier built using only molecular descriptors, the MD model, had the greatest ability to correctly predict the class of the compounds in the test set. Combining MD with ECFP_1024 gave the model with the second-highest predictive ability (ECFP_1024_MD), but this did not result in higher performance than MD alone. The ECFP_1024 features may have added information that was not useful to the random forest classifier, making the predictions more difficult. Therefore, the MD model, which had an AUC score of 0.815 for classifying the compounds in the test set, was selected for further analysis.
The confusion matrix of the MD random forest model for predicting the class of the molecules in the test set is shown in Fig. 4. The classification accuracy of the model was 0.853 and the AUC score was 0.815.
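The Positive Predictive Value (PPV), Eq. 1, and Negative Predictive Value (NPV), Eq. 2, are calculated from the confusion matrix as follows, where TP, FP, TN and FN denote the numbers of true positives, false positives, true negatives and false negatives:
$$\mathrm{PPV} = \frac{TP}{TP + FP} \tag{1}$$
$$\mathrm{NPV} = \frac{TN}{TN + FN} \tag{2}$$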
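Eq. 1 and Eq. 2 translate directly into code. The counts below are hypothetical, chosen only to illustrate the arithmetic on an imbalanced test set; they are not the MD model's confusion matrix.

```python
def predictive_values(tp, fp, tn, fn):
    """Return (PPV, NPV, accuracy) from confusion-matrix counts."""
    ppv = tp / (tp + fp)  # Eq. 1: share of predicted actives that are truly active
    npv = tn / (tn + fn)  # Eq. 2: share of predicted inactives that are truly inactive
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return ppv, npv, accuracy


# Hypothetical counts for a test set dominated by inactive compounds
ppv, npv, acc = predictive_values(tp=8, fp=2, tn=85, fn=5)
print(f"PPV={ppv:.3f}  NPV={npv:.3f}  accuracy={acc:.3f}")
```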
In binary classification, the PPV is the proportion of predicted positives that are truly positive, and the NPV is the proportion of predicted negatives that are truly negative. Herein, the PPV and NPV indicate that the random forest model was more reliable when predicting compounds as inactive than when predicting them as active. The data used in this study were imbalanced, as approximately 79% of the samples were negative entries. Thus, a prediction that a compound is inactive had a much higher prior probability of being correct. To handle the imbalanced data, the “class_weight” argument of the random forest algorithm was set to “balanced”, which weights each class inversely to its frequency and therefore penalises misclassification of the minority class more heavily. This improved the performance of the model, as the PPV for classifying the compounds of the test set increased from 61.1% (without balancing the class weights) to 65.6% (after balancing the class weights).
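scikit-learn documents the “balanced” mode of `class_weight` as weighting each class by n_samples / (n_classes × number of samples in that class). The sketch below reproduces that formula in plain Python for a hypothetical 100-sample set with the roughly 79:21 inactive-to-active ratio reported above; the function name is illustrative, and in practice one would simply pass `class_weight="balanced"` to `RandomForestClassifier`.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight per class = n_samples / (n_classes * n_samples_in_class)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical labels: 79 inactive (0) and 21 active (1) compounds
labels = [0] * 79 + [1] * 21
weights = balanced_class_weights(labels)
print({cls: round(w, 3) for cls, w in weights.items()})
```

Each class then carries the same total weight (50 here), so an error on a rarer active compound costs about 3.8 times as much as an error on an inactive one.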
In this experiment, feature relevance was measured using the Gini importance of the random forest algorithm. The selected model, MD, was composed of 69 molecular descriptors calculated by the MOE™ software. The full feature ranking can be found in Additional File 2. The analysis focused on the 30 features with the highest Gini importance (Table 2), which included both 2D and 3D molecular descriptors.
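Gini importance (mean decrease in impurity) sums, over every split that uses a feature, the decrease in Gini impurity weighted by the fraction of samples reaching that node; scikit-learn then averages this over the trees of the forest and normalises the importances to sum to one. The sketch below computes the per-split quantity and the final normalisation for hypothetical class labels; the helper names and the two example splits are illustrative only.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a node holding these class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(parent, left, right, n_root):
    """Decrease in Gini impurity from one split, weighted by the share of samples reaching the node."""
    n = len(parent)
    child_impurity = len(left) / n * gini(left) + len(right) / n * gini(right)
    return (n / n_root) * (gini(parent) - child_impurity)


def normalise(raw_importances):
    """Rescale summed decreases so that the importances add up to one."""
    total = sum(raw_importances.values())
    return {name: value / total for name, value in raw_importances.items()}


parent = [0, 0, 0, 0, 1, 1, 1, 1]  # hypothetical root node: 4 inactive, 4 active
# A perfect split (e.g. on a hypothetical a_nN threshold) versus a weaker one
perfect = impurity_decrease(parent, [0, 0, 0, 0], [1, 1, 1, 1], n_root=len(parent))
weaker = impurity_decrease(parent, [0, 0, 0, 1], [0, 1, 1, 1], n_root=len(parent))
print(perfect, weaker, normalise({"a_nN": perfect, "TPSA": weaker}))
```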
Table 2. Top ranking MD descriptors.
Gini importance   Feature          Description
0.062             a_nN             Number of nitrogen atoms
0.029             PEOE_VSA+2       Total van der Waals surface area of atoms with a partial charge in the range of 0.10 to 0.15
0.026             vsurf_D8         Hydrophobic volume
0.024             h_pKa            The pKa of the reaction that removes a proton
0.023             SMR_VSA6         Sum of van der Waals surface areas such that the molar refractivity contribution is in the range of 0.485 to 0.560
0.023             rsynth           A value in [0,1] indicating the synthetic reasonableness, or feasibility, of the chemical structure. A value of 0 means it is unlikely that the molecule can be synthesized, while a value of 1 means that it is likely that the molecule can be synthesized. The value reflects the fraction of heavy atoms in the molecule that can be traced back to starting-material fragments resulting from retrosynthetic disconnection rules.
0.022             PEOE_VSA-4       Total van der Waals surface area of atoms with a partial charge in the range of −0.25 to −0.20
0.021             PEOE_VSA+4       Total van der Waals surface area of atoms with a partial charge in the range of 0.20 to 0.25
0.021             PEOE_VSA-6       Total van der Waals surface area of atoms with a partial charge less than −0.30
0.021             PEOE_VSA_PPOS    Total positive van der Waals surface area of atoms with a partial charge greater than 0.20
0.020             chi0_C           Carbon connectivity index (order 0)
0.020             Q_VSA_PNEG       Total negative polar van der Waals surface area of atoms with a partial charge less than −0.20
0.020             PEOE_VSA_POL     Total polar van der Waals surface area of atoms whose absolute partial charge is greater than 0.20
0.020             chi0v_C          Carbon valence connectivity index (order 0)
0.019             SMR_VSA3         Sum of van der Waals surface areas such that the molar refractivity contribution is in the range of 0.35 to 0.39
0.019             Q_VSA_PPOS       Total positive van der Waals surface area of atoms with a partial charge greater than 0.20
0.018             b_single         Number of single bonds
0.018             a_count          Number of atoms
0.018             SlogP_VSA3       Sum of van der Waals surface areas such that the logP(o/w) contribution is in the range of 0.0 to 0.1
0.018             PEOE_VSA_PNEG    Total negative polar van der Waals surface area of atoms with a partial charge less than −0.20
0.017             TPSA             Topological polar surface area
0.017             zagreb           Zagreb index
0.017             weinerPol        Wiener polarity number
0.017             opr_brigid       Number of rigid bonds
0.017             Kier3            Third kappa shape index
0.016             PEOE_VSA-1       Total van der Waals surface area of atoms with a partial charge in the range of −0.10 to −0.05
0.016             chi0             Atomic connectivity index (order 0)
0.016             Kier2            Second kappa shape index
0.016             SlogP_VSA2       Sum of van der Waals surface areas such that the logP(o/w) contribution is in the range of −0.2 to 0.0
0.015             a_nH             Number of hydrogen atoms
Top 30 features ranked by Gini importance for the MD random forest model. The descriptions of the features were taken from the MOE™ software documentation.
The highest-ranking features were broadly separated into the following categories: (i) atom and bond counts, (ii) topological descriptors and (iii) partial charge descriptors.
Atom and bond counts are simple descriptors that do not provide any information on molecular geometry or atom connectivity. The highest-ranking atom and bond count descriptors were a_nN, b_single, a_count, opr_brigid and a_nH. Although very simple, the atom and bond counts ranked above many more complex molecular descriptors. A likely reason is that atom and bond counts can partially capture overall properties of a compound such as size, hydrogen bonding and polarity, which often affect the activity of a drug. The number of nitrogen atoms, a_nN, was the top-ranking feature of the MD random forest model, with a Gini importance score of 0.062. This is consistent with the results of Barardo et al. (2017), where a_nN was also ranked highest for predicting the class of the compounds in the DrugAge database. Nitrogen atoms could affect the physicochemical properties of the drugs as well as the interactions and binding of the molecules with target residues.
The highest-ranking topological descriptors included chi0_C, chi0v_C, zagreb, weinerPol, Kier3, chi0 and Kier2. Topological descriptors take atom connectivity into account. They are computed from molecular graphs, in which atoms are represented by vertices and bonds by edges. These descriptors can provide information on the degree of branching of the structure as well as on molecular size and shape. Although topological descriptors are extensively used in predictive modelling, they are usually hard to interpret. Topological descriptors may have provided information on how well a molecule fits in the binding site and, together with atom counts, on its interactions with the binding residues.
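Three of the topological descriptors in Table 2 have simple definitions on the hydrogen-suppressed molecular graph: chi0 is the sum of 1/√δ over heavy atoms (δ being the number of bonded heavy atoms), the Zagreb index is the sum of δ², and the Wiener polarity number counts atom pairs separated by exactly three bonds. The sketch below implements these textbook definitions for two small hand-written graphs (n-butane and isobutane). MOE's implementation may differ in details, so treat it as illustrative; the graph encoding and function names are assumptions of this sketch.

```python
import math
from collections import deque


def chi0(graph):
    """Zeroth-order connectivity index: sum of 1/sqrt(degree) over heavy atoms with degree > 0."""
    return sum(1 / math.sqrt(len(nbrs)) for nbrs in graph.values() if nbrs)


def zagreb(graph):
    """First Zagreb index: sum of squared heavy-atom degrees."""
    return sum(len(nbrs) ** 2 for nbrs in graph.values())


def bond_distances(graph, start):
    """Shortest path length (in bonds) from start to every reachable atom, by breadth-first search."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for nbr in graph[atom]:
            if nbr not in dist:
                dist[nbr] = dist[atom] + 1
                queue.append(nbr)
    return dist


def wiener_polarity(graph):
    """Number of unordered atom pairs separated by exactly three bonds."""
    ordered_pairs = sum(
        1 for a in graph for d in bond_distances(graph, a).values() if d == 3
    )
    return ordered_pairs // 2  # each pair is counted once from each end


# Hydrogen-suppressed graphs: atom index -> list of bonded heavy-atom indices
butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
for name, g in (("butane", butane), ("isobutane", isobutane)):
    print(name, round(chi0(g), 3), zagreb(g), wiener_polarity(g))
```

Butane and isobutane have the same atom and bond counts but differ in all three indices, which is how such descriptors encode branching.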
The top-ranking partial charge descriptors were PEOE_VSA+2, PEOE_VSA-4, PEOE_VSA+4, PEOE_VSA-6, PEOE_VSA_PPOS, Q_VSA_PNEG, PEOE_VSA_POL, Q_VSA_PPOS and PEOE_VSA_PNEG. The “PEOE_” prefix denotes descriptors calculated using the partial equalization of orbital electronegativity (PEOE) algorithm for quantifying the partial charges in the system [34, 35]. Descriptors prefixed with “Q_”, on the other hand, were calculated using the Amber10:EHT force field. In a ligand-receptor system,
partial charges can play a key role in the binding properties of the molecule as well as molecular
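The PEOE_VSA family in Table 2 assigns each atom to a partial-charge bin 0.05 charge units wide and sums the atoms' van der Waals surface areas within each bin. The sketch below follows the bin ranges listed in Table 2 (e.g. PEOE_VSA+2 for 0.10 to 0.15, PEOE_VSA-6 below −0.30), treats each bin as half-open [lower, upper) as an assumption, and adds the PPOS, PNEG and POL totals. The per-atom charges and areas are hypothetical inputs; computing real PEOE charges and atomic surface areas requires a package such as MOE.

```python
EDGES = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30]  # bin boundaries in charge units


def bin_index(magnitude, upper_inclusive):
    """Index k of the 0.05-wide bin containing |q|; 6 means beyond 0.30."""
    for k, edge in enumerate(EDGES):
        if magnitude < edge or (upper_inclusive and magnitude == edge):
            return k
    return 6


def peoe_style_vsa(atoms):
    """atoms: list of (partial_charge, vdw_area) pairs -> dict of binned and total areas."""
    out = {f"PEOE_VSA+{k}": 0.0 for k in range(7)}
    out.update({f"PEOE_VSA-{k}": 0.0 for k in range(7)})
    out.update({"PEOE_VSA_PPOS": 0.0, "PEOE_VSA_PNEG": 0.0, "PEOE_VSA_POL": 0.0})
    for q, area in atoms:
        if q >= 0:
            # q in [k*0.05, (k+1)*0.05)
            out[f"PEOE_VSA+{bin_index(q, upper_inclusive=False)}"] += area
        else:
            # q in [-(k+1)*0.05, -k*0.05)
            out[f"PEOE_VSA-{bin_index(-q, upper_inclusive=True)}"] += area
        if q > 0.20:
            out["PEOE_VSA_PPOS"] += area
        if q < -0.20:
            out["PEOE_VSA_PNEG"] += area
        if abs(q) > 0.20:
            out["PEOE_VSA_POL"] += area
    return out


# Hypothetical (partial charge, van der Waals area) pairs for five atoms
vsa = peoe_style_vsa([(0.12, 10.0), (0.22, 5.0), (-0.22, 7.0), (-0.35, 3.0), (0.01, 20.0)])
print({name: area for name, area in vsa.items() if area})
```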
Predicting potential lifespan-extending compounds
The MD random forest model was applied to predict the class of compounds in an external database consisting of 1,738 small molecules obtained from the DrugBank database. The top-ranking compounds, with a predictive probability of ≥ 0.80 for increasing the lifespan of C. elegans, are shown in Table 3. The full ranking of the molecules in the screening database can be found in Additional File 2.
The compounds were broadly separated into the following categories: (i) flavonoids, (ii) fatty acids and conjugates, and (iii) organooxygen compounds. The compound classification was taken from the “Class” category in the chemical taxonomy section of the DrugBank database (provided by ClassyFire), or assigned manually if not available.
Table 3. Top-hit compounds from the external database.
Compound name            Predictive probability
Gamolenic acid           0.95
Sodium aurothiomalate    0.82
Chemical compounds from the screening database with a predictive probability of 0.80 or above for increasing the lifespan of C. elegans.
Flavonoids
Flavonoids are a group of plant secondary metabolites and are common polyphenols in the human diet. Major nutritional sources include tea, soy, fruits, vegetables, wine and nuts [38, 39]. Flavonoids are separated into subclasses based on their chemical structure, including flavones, flavonols, flavanones and isoflavones. Isoflavones differ from other flavonoids in having ring B attached to the C-3 position of ring C rather than the C-2 position, as shown in Fig. 5.
Flavonoids have been associated with health benefits for age-related conditions such as metabolic diseases, cancer, inflammation and cognitive decline [38, 39]. Possible mechanisms of action include antioxidant activity, radical scavenging, central nervous system effects, alteration of intestinal transport, sequestration and processing of fatty acids, PPAR activation and increased insulin sensitivity.
Diosmin was the top-hit molecule in the screening database, with a predictive probability of 0.96. Diosmin is a flavone glycoside that is either extracted from plants such as those of the Rutaceae family or obtained synthetically. It has anti-inflammatory, free radical scavenging and anti-mutagenic properties and has been used medically to treat pain and bleeding of haemorrhoids, chronic venous disease and lymphedema. Nevertheless, diosmin has poor aqueous solubility, which is a challenge for oral administration. A 2017 study found that a combination of diosmin with essential oils showed skin antioxidant, anti-ageing and sun-blocking effects in mice. The underlying mechanisms of diosmin's anti-ageing and photo-protective effects include enhancing lymphatic drainage, ameliorating capillary microcirculation inflammation, and preventing leukocyte activation, trapping and migration [42, 43].
Other flavonoids that ranked high for increasing the lifespan of C. elegans were rutin and hesperidin, with predictive probabilities of 0.95 and 0.94, respectively. Rutin (or quercetin-3-rutinoside) is a flavonol glycoside that is abundant in many plants such as passionflower, apple, tea, buckwheat seeds and citrus fruits [44, 45]. It possesses a range of biological properties, including antioxidant, anticancer, neuroprotective, cardio-protective and skin-regenerative activities [44, 45]. Rutin had a high structural similarity to other flavonoids in the DrugAge database, particularly to quercetin 3-O-β-d-glucopyranoside-(4→1)-β-d-glucopyranoside (Q3M). The Tanimoto coefficient between the RDKit fingerprints of Q3M and rutin was 0.99. The similarity map between the two compounds is shown in Fig. 6. Q3M is a flavonoid abundant in onion peel that was found to extend the lifespan of C. elegans. In the same study, although rutin was found to improve the tolerance of C. elegans to oxidative stress, which is desirable for longevity, it did not affect the worm's lifespan. Davalli et al. (2016) also reported that rutin did not improve the longevity of C. elegans. On the other hand, Chattopadhyay et al. showed that rutin promoted longevity in a species of fly.
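The Tanimoto coefficient quoted for rutin and Q3M compares two binary fingerprints as the number of shared “on” bits divided by the number of bits set in either fingerprint. A minimal sketch on fingerprints stored as sets of bit indices follows; the bit sets are hypothetical, and for RDKit fingerprint objects the same quantity is available through `DataStructs.TanimotoSimilarity`.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient |A & B| / |A | B| for fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 1.0  # convention assumed here: two empty fingerprints count as identical
    return len(fp_a & fp_b) / union


# Hypothetical on-bits for two closely related structures
fp_1 = {3, 17, 42, 101, 256, 511, 777, 900}
fp_2 = {3, 17, 42, 101, 256, 511, 777, 1001}
print(round(tanimoto(fp_1, fp_2), 3))
```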
Hesperidin has shown reactive oxygen species (ROS) inhibition and anti-ageing effects in yeast. A 2011 study found that hesperidin extracted from orange juice had a positive influence on the lifespan of […]. Another study showed that orange extracts, in which hesperidin was the predominant phenolic compound, increased the mean lifespan of […]. In the same study, orange extracts were also found to promote longevity by enhancing motility and reducing the accumulation of age pigment and ROS levels.
Soy isoflavones include genistein, glycitein and daidzein. Genistein, a compound in the DrugAge database, has been found to prolong the lifespan of C. elegans and increase its tolerance to oxidative stress. A 2005 study found that C. elegans fed with the soy isoflavone glycitein had improved resistance to oxidative stress. However, in comparison to control worms, the lifespan of C. elegans fed with glycitein was not significantly affected. The effect of daidzein on the lifespan of C. elegans in the presence of pathogenic bacteria was investigated by Fischer et al. (2012). The study found that daidzein had an estrogenic effect that extended the worm's lifespan in the presence of pathogenic bacteria and heat. Herein, we applied the MD random forest model to predict the effect of 6''-O-malonyldaidzin on the lifespan of C. elegans. 6''-O-Malonyldaidzin is an O-glycoside derivative of daidzein found in food products such as soybean, miso, soy milk and soy yoghurt. Its predicted probability of extending the lifespan of the worm was 0.84.
Fatty acids and conjugates
Lipid metabolism has an essential role in many biological processes of an organism. Lipids are used for energy storage in the form of triglycerides and can therefore aid survival under severe conditions. Additionally, lipids have a key role in intercellular and intracellular signalling as well as organelle homeostasis. Research on both invertebrates and mammals suggests that alterations in lipid levels and composition are associated with ageing and longevity [56, 57].
A recent review by Johnson and Stolzing (2019) on lipid metabolism and its role in ageing, lifespan extension and age-related conditions summarised key lipid-related interventions that promote longevity. Some of the studies presented in that review are reported here. A 2013 study showed that, in response to fasting, supplementing C. elegans with the ω-6 polyunsaturated fatty acids (PUFAs) arachidonic acid and dihomo-γ-linolenic acid increased the worm's starvation resistance and prolonged its lifespan by stimulating autophagy. Similarly, Qi et al. (2017) found that treating C. elegans with the ω-3 PUFA α-linolenic acid extended the worm's lifespan in a dose-dependent manner. The study indicated that the ω-3 fatty acid underwent oxidation to generate a group of molecules known as oxylipins. The findings suggested that the increase in the worm's lifespan could result from the combined effects of α-linolenic acid and its oxylipin metabolites. Sugawara et al. (2013) found that a low dose of fish oils, which contained the PUFAs eicosapentaenoic acid and docosahexaenoic acid, significantly increased the lifespan of C. elegans. The authors proposed that a low dose of fish oils induces moderate oxidative stress that extends the lifespan of the organism. In contrast, large amounts of fish oils had a diminishing effect on the worm's lifespan.
Gamolenic acid, or γ-linolenic acid (GLA), was the second top-hit molecule of the screening database, with a predictive probability of 0.95. GLA is an ω-6 PUFA composed of an 18-carbon chain with three double bonds at the 6th, 9th and 12th positions. Rich sources of GLA include evening primrose oil (EPO), blackcurrant oil and borage oil. In mammals, GLA is synthesised from dietary linoleic acid by the enzyme Δ6-desaturase [62, 63]. GLA is a precursor of other essential fatty acids such as arachidonic acid [62, 63]. Conditions such as hypertension and diabetes, as well as stress and various aspects of ageing, reduce the capacity of Δ6-desaturase to convert linoleic acid to GLA. This may lead to a deficiency of long-chain fatty acid derivatives and metabolites of GLA. GLA has been used as a constituent of anti-ageing supplements and has been shown to possess various therapeutic effects in humans, including improvement of age-related anomalies.
Sodium aurothiomalate, with a lifespan increase probability of 0.82, is a thia short-chain fatty acid used for the treatment of rheumatoid arthritis and has potential antineoplastic activities [37, 65]. In preclinical models, sodium aurothiomalate inhibited protein kinase C iota (PKCι) signalling, which is overexpressed in non-small cell lung, ovarian and pancreatic cancers. The chemical structure of sodium aurothiomalate is shown in Fig. 7.
Organooxygen compounds
Lactose, with a lifespan increase probability of 0.89, is a disaccharide found in milk and other dairy products. In the human intestine, lactose is hydrolysed to glucose and galactose by the enzyme lactase. Of the compounds in the DrugAge database, lactose had the highest structural similarity to trehalose. Trehalose has been found to increase the mean lifespan of C. elegans by over 30% without showing any side effects. The Tanimoto coefficient between the RDKit fingerprint representations of trehalose and lactose was 0.85. The similarity map generated using ECFP fingerprints is shown in Fig. 8. Even though lactose has a high (Tanimoto) similarity to trehalose, Xing et al. (2019) found that lactose treatment shortened the lifespan of C. elegans.
Sucrose, with a lifespan increase probability of 0.83, is a disaccharide composed of glucose and fructose. It is the main form in which carbohydrates are transported in fruits and vegetables. Other sugars such as trehalose, galactose and fructose have been found to extend the lifespan of C. elegans […, 70]. However, Zheng et al. (2017) found that treating C. elegans with sucrose had no significant effect on the organism's mean lifespan. In rats, sucrose has been found to shorten the mean lifespan and elevate blood pressure. Rovenko et al. (2015) showed that in fruit flies, high sucrose consumption decelerated pupation, increased pupal mortality and promoted obesity.
Lactulose, with a lifespan increase probability of 0.83, is a synthetic disaccharide composed of the monosaccharides galactose and fructose. Lactulose has been shown to be an effective treatment for chronic constipation in elderly patients and to improve cognitive function in patients with hepatic encephalopathy [72, 73].
Other classes of compounds
Other compounds with a predictive probability ≥ 0.80 for increasing the lifespan of C. elegans included aloin, a constituent of Aloe vera, with a predictive probability of 0.81, as well as the antibiotics fidaxomicin (predictive probability = 0.84), rifapentine (predictive probability = 0.81) and chlortetracycline (predictive probability = 0.80).
Aloe vera is a well-known plant used in medicine, cosmetics and beverages. It possesses a wide range of biological properties, including anti-inflammatory, anticancer, laxative and antioxidant activities, and it promotes the healing of dermal injuries [74, 75]. Additionally, Aloe vera has been associated with improving disorders such as diabetes, microbial diseases, and cardiovascular and liver problems. Its biological activities have been attributed to the plethora of phytochemicals present in Aloe vera sap and gel. Various studies have demonstrated that the anthraquinones and glycosides present in the sap have a key role in its anticancer, anti-inflammatory and laxative effects, tyrosinase inhibition, and free radical and proliferative activities. Chandrashekara et al. (2011) found that […] the lifespan of […] larvae. This effect was attributed to the plethora of chemicals in the extract, including proteins, lipids, amino acids and small molecules. The authors proposed that Aloe vera extract had a similar effect on the worm's lifespan as resveratrol, including neuroprotection and stimulation of regrowth or repair of nerve fibres.
Aloin is a bioactive compound found in various Aloe species. It is composed of two diastereoisomers, aloin A (barbaloin) and aloin B (isobarbaloin), which have similar chemical properties. Aloin is an anthraquinone glycoside, that is, an anthraquinone bearing a sugar moiety. Aloin has been used medically as a stimulant laxative, alleviating constipation by triggering bowel movements. In this study, the MD random forest model was applied to predict the effect of aloin A on the lifespan of C. elegans, yielding a predictive probability of 0.81. Aloin has been found to possess anti-inflammatory, antiproliferative and anticancer activities and to protect dermal fibroblasts against oxidative stress damage [77–80]. Experimental testing would be required to further investigate the effect of aloin A on the lifespan of C. elegans.
Rifapentine is a macrolactam antibiotic approved for the treatment of tuberculosis. Macrolactams are a small class of compounds consisting of cyclic amides, or analogues having unsaturation or heteroatoms replacing one or more carbon atoms in the ring. Macrolactams such as rifampicin and rifamycin have been found to increase the lifespan of C. elegans.
Advanced glycation end (AGE) products are formed by the non-enzymatic reaction of sugars, such as glucose, with proteins, lipids or nucleic acids. AGE products have been implicated in ageing and age-related diseases such as diabetes, atherosclerosis and neurodegenerative diseases. Golegaonkar et al. showed that rifampicin reduced AGE products and extended the mean lifespan of C. elegans by 60%.
The effect of two other macrolactams, rifamycin SV and rifaximin, on the worm’s lifespan was also investigated. Rifamycin SV was found to exhibit similar activity to rifampicin, whereas rifaximin lacked anti-glycating activity and did not extend the lifespan of C. elegans. The authors suggested that the anti-glycation properties of rifampicin and rifamycin could be attributed to the presence of a para-dihydroxyl moiety, which is absent in rifaximin. As shown in Fig. 9, this functional group is also present in rifapentine. Experimental testing would be required to investigate whether rifapentine possesses similar properties to rifampicin and rifamycin.
Evaluation of the chemical similarity principle
Several of the compounds identified by the random forest model had already been experimentally evaluated for increasing the lifespan of C. elegans and other model organisms. In particular, the RDKit fingerprint of rutin has a Tanimoto similarity of 0.99 to that of Q3M, an active compound. However, experimental studies found that, although it is structurally similar to active compounds, rutin does not extend the lifespan of C. elegans [47, 48]. Additionally, the Tanimoto coefficient between the RDKit fingerprint representations of lactose and trehalose, an active compound, is 0.85. Nevertheless, studies showed that treatment with lactose reduced the lifespan of C. elegans. In these cases, the chemical similarity principle, which states that chemically similar compounds tend to have similar bioactivities, does not appear to hold. One explanation, presented by Martin et al. (2002), is that protein structures are complex and flexible systems. Thus, structurally similar chemicals may bind to the active site in different orientations, interact with a different conformation of the protein, or even bind to completely different proteins.
Pharmaceutical interventions that modulate ageing-related genes and pathways are considered the most effective approach for combating human ageing and age-related diseases. Widely used strategies for identifying active compounds include screening existing drugs for potential anti-ageing activity.
In this study, the random forest algorithm was applied to analyse the DrugAge database and predict whether a compound would increase the lifespan of C. elegans. Five different random forest models were built using molecular fingerprints and/or molecular descriptors as features. Feature selection and dimensionality reduction were performed using variation and mutual information-based pre-selection methods. The best performing classifier, the MD model, used molecular descriptors and achieved an AUC score of 0.815 for classifying the compounds in the test set. Combining molecular descriptors with ECFPs did not further improve the model’s performance. The features of the MD model were ranked using the random forest’s Gini importance measure. Among the 30 most important features were molecular descriptors related to atom and bond counts, topological and partial charge properties.
The best performing model was applied to predict the class of the compounds in the screening database, which consisted of 1,738 small molecules from DrugBank. The compounds with a predictive probability of ≥ 0.80 for increasing the lifespan of C. elegans were broadly separated into (i) flavonoids, (ii) fatty acids and conjugates, and (iii) organooxygen compounds. This study also identified several molecules and extracts, such as orange extracts, rutin, lactose and sucrose, that had been experimentally evaluated but were not entries of the predictive database. Future work would include experimental testing of promising compounds such as γ-linolenic acid, aloin and rifapentine to investigate their effect on the lifespan of C. elegans.
Dataset for predicting lifespan-extending compounds
The dataset published in the study by Barardo et al. (2017) contains positive entries, which are compounds that “increase the lifespan of C. elegans”, and negative entries, which are compounds that “do not increase the lifespan of C. elegans”. In particular, the dataset contains 1,392 compounds, of which 229 are positive and 1,163 are negative entries. The positive entries of this dataset were obtained from the DrugAge database of ageing-related drugs (Build 2, release date: 01/09/2016), available on the Human Ageing Genomic Resources website [1, 84]. DrugAge provides information on drugs, compounds and supplements with anti-ageing properties that have been found to extend the lifespan of model organisms. The species include worms, mice and flies, with the majority of data representing C. elegans. The data have been obtained from studies performed under standard conditions and contain information relevant to ageing, such as average/median lifespan, maximum lifespan, strain, dosage and gender where available. The negative entries of the database used in the study of Barardo et al. (2017) were obtained from the
At the time of writing, the latest version of the DrugAge database, Build 3 (release date: 19/07/2019), corrects small errors and adds hundreds of new entries. Herein, the positive entries in the dataset used in Barardo et al. (2017) were replaced with the data from the newest version of DrugAge, Build 3. The same negative entries as in Barardo et al. (2017) were used. The modified database contained a total of 1,558 compounds, with 395 positive entries and 1,163 negative ones. In this study, the term “DrugAge database” refers to this modified dataset of 1,558 compounds.
Representation of chemical compounds
The chemical structures of the DrugAge dataset were converted into canonical SMILES strings using the Python package PubChemPy. The SMILES strings were standardised with the Standardiser tool developed by Francis Atkinson in 2014. Standardisation removed inorganic compounds, salt/solvent components and metal species, and neutralised the compounds by adding or removing hydrogen atoms. Stereoisomers were treated as duplicates, even though they may have different biological activities, because they had identical SMILES strings. For two or more stereoisomers in the same class, only one was kept; for duplicates in different classes, both were removed. After standardisation and duplicate removal, the DrugAge database was reduced to a total of 1,430 compounds, with 304 positive and 1,126 negative entries. The predictive database used in this study can be found in Additional
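The duplicate rule above (one copy kept when a SMILES string recurs within a class, every copy dropped when it recurs across classes) can be sketched in plain Python. This is a minimal illustration, assuming the inputs are already standardised (SMILES, label) pairs with 1 for “increases lifespan” and 0 otherwise; the helper name `resolve_duplicates` and the toy SMILES are hypothetical, not the authors’ code or data.

```python
from collections import defaultdict


def resolve_duplicates(records):
    """Apply the duplicate rule to (smiles, label) pairs.

    Hypothetical helper: a SMILES string that occurs several times with a
    single label is kept once; a SMILES string that occurs with conflicting
    labels is dropped entirely. Order of first occurrence is preserved.
    """
    labels = defaultdict(set)   # SMILES -> set of labels seen for it
    order = []                  # SMILES in order of first occurrence
    for smiles, label in records:
        if smiles not in labels:
            order.append(smiles)
        labels[smiles].add(label)
    return [(s, next(iter(labels[s]))) for s in order if len(labels[s]) == 1]


data = [
    ("CCO", 1), ("CCO", 1),          # same class twice -> keep one copy
    ("CC(=O)O", 1), ("CC(=O)O", 0),  # conflicting classes -> remove both
    ("c1ccccc1", 0),
]
print(resolve_duplicates(data))  # [('CCO', 1), ('c1ccccc1', 0)]
```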
Molecular descriptor generation
The standardised SMILES strings were converted into mol files in the RDKit environment and opened in the MOE™ software [25, 30]. The chemical structures were energy minimised in the Energy Minimize General mode of MOE™ using the Amber10:EHT force field. A total of 354 descriptors were calculated, including all 2D, internal coordinate-dependent 3D (i3D) and external coordinate-dependent 3D (x3D) descriptors. Due to software limitations, a few 3D descriptors ('AM1_E', 'AM1_Eele', 'AM1_HF', 'AM1_HOMO', 'AM1_IP', 'AM1_LUMO', 'MNDO_E', 'MNDO_Eele', 'MNDO_HF', 'MNDO_HOMO', 'MNDO_IP', 'MNDO_LUMO', 'PM3_E', 'PM3_Eele', 'PM3_HF', 'PM3_HOMO', 'PM3_IP', 'PM3_LUMO') could not be calculated for ten chemical structures. The missing values were replaced with the average value of the remaining chemical structures for the given descriptor.
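Mean imputation of the missing MOE™ values can be sketched with NumPy as below. This assumes the descriptors are held in a numeric matrix (rows = structures, columns = descriptors) with missing values stored as NaN; the helper name is hypothetical. scikit-learn’s `SimpleImputer(strategy="mean")` performs the same column-mean replacement.

```python
import numpy as np


def impute_column_means(X):
    """Replace NaN entries with the mean of the non-missing values in the same column."""
    X = np.asarray(X, dtype=float).copy()
    col_means = np.nanmean(X, axis=0)       # per-descriptor mean, ignoring NaN
    rows, cols = np.where(np.isnan(X))      # positions of the missing values
    X[rows, cols] = col_means[cols]
    return X


# Toy matrix: 3 structures x 2 descriptors; one value could not be computed.
X = [[1.0, -5.2],
     [2.0, np.nan],
     [3.0, -4.8]]
print(impute_column_means(X))
```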
Molecular ngerprint generation
Molecular fingerprints were generated in the Python RDKit environment from the standardised SMILES strings. ECFPs of 1,024-bit and 2,048-bit length were calculated with a radius of 2; these are denoted “ECFP_1024” and “ECFP_2048”, respectively. In addition to the ECFPs, RDKit topological fingerprints were generated with a maximum path length of 5 bonds and denoted “RDKit5”.
Five random forest models were built using five different feature types and trained with the data of the DrugAge database. The feature types explored in this study, ECFP_1024, ECFP_2048, RDKit5, MD and ECFP_1024_MD, are described in Table 4. The ECFP_1024_MD feature was a combined descriptor type consisting of 1,024-bit ECFPs and molecular descriptors.
Table 4. Description of feature types explored in this study.

| Database name | Feature description | Number of features |
| --- | --- | --- |
| ECFP_1024 | ECFP of 1,024-bit length generated in the Python RDKit environment | 1,024 |
| ECFP_2048 | ECFP of 2,048-bit length generated in the Python RDKit environment | 2,048 |
| RDKit5 | RDKit topological fingerprints with a maximum path length of 5 bonds generated in the Python RDKit environment | 2,048 |
| MD | 2D and 3D molecular descriptors calculated in MOE™ | 354 |
| ECFP_1024_MD | Combination of “ECFP_1024” and “MD” descriptors | 1,378 |
Feature selection was performed for each of the descriptor types shown in Table 4 and implemented in the scikit-learn Python library. Features with low variance were removed first, creating three sub-databases: var_100, var_95 and var_90. These removed, respectively, features with the same value in all entries, features with more than 95% constant values, and features with more than 90% constant values.
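The three low-variance filters can be sketched by measuring, for each feature, the fraction of entries taken by its most common value. This is an assumption-laden sketch: it follows the “percentage of constant values” wording above rather than scikit-learn’s `VarianceThreshold`, which thresholds the variance itself, and the helper names are hypothetical.

```python
import numpy as np


def dominant_value_fraction(X):
    """Fraction of rows taken by the most frequent value, computed per column."""
    X = np.asarray(X)
    fractions = []
    for col in X.T:
        _, counts = np.unique(col, return_counts=True)
        fractions.append(counts.max() / len(col))
    return np.array(fractions)


def low_variance_filter(X, threshold):
    """Return indices of the columns kept for a given threshold.

    Constant columns are always removed (var_100); for threshold < 1, columns
    whose dominant value covers more than `threshold` of the rows are removed too.
    """
    frac = dominant_value_fraction(X)
    keep = (frac < 1.0) & (frac <= threshold)
    return np.flatnonzero(keep)


# 40 toy rows; column dominant-value fractions are 1.0, 0.975, 0.925 and 0.5.
X = np.zeros((40, 4))
X[0, 1] = 1       # column 1: 39 zeros, 1 one
X[:3, 2] = 1      # column 2: 37 zeros, 3 ones
X[::2, 3] = 1     # column 3: alternating 0/1
for name, t in [("var_100", 1.0), ("var_95", 0.95), ("var_90", 0.90)]:
    print(name, low_variance_filter(X, t))
```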
For each of the sub-databases, Adjusted Mutual Information (AMI) was applied using the “adjusted_mutual_info_score” function of scikit-learn to rank the features by their AMI score.
The following settings were tested: using 5%, 10%, 25%, 50%, 75% and 100% of the features with the highest AMI score. For example, if var_100 for MD contained 349 features, the database with 5% of the features would consist of only the 17 highest-ranking features. This process is outlined in Supplementary Fig. 1, Additional File 1.
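Keeping a given percentage of the AMI-ranked features reduces to sorting and truncating. The sketch below assumes the AMI scores have already been computed (for example with scikit-learn’s `adjusted_mutual_info_score(labels_true, labels_pred)`, which expects discrete values) and that the number kept is rounded down, which matches the 5% of 349 → 17 example; the helper name and the synthetic scores are hypothetical.

```python
import math


def select_top_fraction(scores, fraction):
    """Return the names of the highest-scoring features.

    `scores` maps feature name -> relevance score (here, an AMI score).
    The number kept is floor(fraction * n_features), with at least one feature;
    rounding down is an assumption consistent with 5% of 349 -> 17.
    """
    n_keep = max(1, math.floor(fraction * len(scores)))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n_keep]


# Hypothetical scores for 349 features (e.g. var_100 for MD); higher is better.
scores = {f"desc_{i}": i / 349 for i in range(349)}
for fraction in (0.05, 0.10, 0.25, 0.50, 0.75, 1.00):
    print(f"{fraction:.0%}: {len(select_top_fraction(scores, fraction))} features")
```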
Cross-validation was performed in the scikit-learn Python library using the “cross_val_score” function. The predictive database was randomly split into an 80% training set and a 20% test set. The 10-fold cross-validation was performed only on the training set. Model performance was evaluated using the AUC measure. Cross-validation was repeated 10 times, yielding 10 AUC scores. The reported predictive performance was the median of the 10 AUC values obtained by cross-validation. The median, rather than the mean, AUC score was reported because it is more robust to outliers.
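To make the evaluation measure concrete, the sketch below computes the ROC AUC directly from its probabilistic definition (the chance that a randomly chosen positive is scored above a randomly chosen negative, ties counting one half) and contrasts the median and mean of a set of per-fold scores. The fold AUC values are invented for illustration; in practice scikit-learn’s `roc_auc_score` and `cross_val_score(..., scoring="roc_auc")` would be used.

```python
from statistics import mean, median


def roc_auc(y_true, y_score):
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win; this equals the area under the empirical ROC curve.
    The pairwise loop is O(n_pos * n_neg), which is fine for a toy example.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75

# Hypothetical fold AUCs with one poor outlier fold: the median is barely moved.
fold_aucs = [0.81, 0.83, 0.80, 0.82, 0.79, 0.84, 0.81, 0.80, 0.82, 0.45]
print(median(fold_aucs), mean(fold_aucs))
```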
Random forest settings
The random forest classifiers were built in the scikit-learn Python module. To handle the unbalanced data used in this study, the random forest parameter “class_weight” was set to “balanced”. The remaining parameters of the random forest classifier were left at their default settings: the models were run with 100 estimators (number of trees in the forest), and the maximum number of features considered at each tree node was the square root of the total number of features. The AUC scores were calculated with the “roc_auc_score” metric of scikit-learn, using the class probabilities returned by the “predict_proba” method.
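The two non-trivial settings can be checked by hand: with `class_weight="balanced"`, scikit-learn weights each class by n_samples / (n_classes × class count), and `max_features="sqrt"` tries the integer part of √n_features candidate features at each split. The sketch below reproduces both calculations in plain Python; for illustration it uses the class counts of the full curated DrugAge set (304 positive, 1,126 negative) rather than the 80% training split, and assumes 349 features as in the var_100 MD example.

```python
import math


def balanced_class_weights(labels):
    """Weights implied by class_weight="balanced": n_samples / (n_classes * count(class))."""
    classes = sorted(set(labels))
    n = len(labels)
    return {c: n / (len(classes) * labels.count(c)) for c in classes}


def sqrt_max_features(n_features):
    """Features tried at each split when max_features="sqrt" (integer part of the square root)."""
    return max(1, int(math.sqrt(n_features)))


# Class balance of the curated DrugAge set: 304 positives (1) and 1,126 negatives (0).
labels = [1] * 304 + [0] * 1126
print(balanced_class_weights(labels))   # minority class receives the larger weight
print(sqrt_max_features(349))           # 18

# Equivalent scikit-learn configuration (not executed here):
# RandomForestClassifier(n_estimators=100, class_weight="balanced", max_features="sqrt")
```

The weighting makes the total weight of each class equal (304 × w₁ = 1,126 × w₀), so the minority positive class is not swamped when trees choose their splits.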
Chemical space implementation
The 2D representations of the chemical space were generated by applying the PCA algorithm in the scikit-learn library. Visualisation of molecular descriptors required feature scaling, as the descriptors had different ranges. Differences in scale can negatively affect PCA, because features with larger numerical ranges are incorrectly treated as more important than others. After scaling, each molecular descriptor had a mean of zero and a standard deviation of one. Feature scaling was not required for the molecular fingerprints, as they consisted only of binary values.
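The scaling and projection steps can be sketched with NumPy: z-scoring each descriptor, then projecting onto the first two principal components obtained from a singular value decomposition. This is an illustration under stated assumptions (a toy random matrix, the population standard deviation as used by scikit-learn’s `StandardScaler`, and constant columns mapped to zero); in practice `StandardScaler` followed by `PCA(n_components=2)` gives the same coordinates up to the sign of each component.

```python
import numpy as np


def standardise(X):
    """Z-score each column (population std, as StandardScaler does); constant columns map to 0."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0            # avoid division by zero for constant descriptors
    return (X - mu) / sd


def pca_2d(X):
    """Project rows of X onto the first two principal components via SVD."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T


rng = np.random.default_rng(0)
# Toy descriptor matrix: columns on very different scales (e.g. a mass, a charge, a count).
X = np.column_stack([rng.normal(300, 80, 50), rng.normal(0.1, 0.02, 50), rng.normal(5, 2, 50)])
Z = standardise(X)
coords = pca_2d(Z)
print(coords.shape)   # (50, 2)
```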
The best performing model was applied to predict the class of the compounds in an external database, in which the effect of the compounds on the lifespan of C. elegans was mostly unknown. The external database consisted of small molecules obtained from the External Drug Links database of DrugBank (version 5.1.5, released on 2020-01-03). The External Drug Links database contains a list of drugs and links to other databases, such as PubChem and UniProt, that provide information on these compounds [36, 90, 91].
Generation of SMILES strings, standardisation and descriptor calculation were performed in the same way as for the training (DrugAge) database, as described in the above sections. Some of the entries of the DrugBank database were substances composed of more than one molecule, such as vegetable oils. These entries were either removed from the database or replaced by one of their main active ingredients. For example, “borage oil” was replaced with “gamolenic acid”. In the case of “soy isoflavones”, the major soy isoflavones (genistein, glycitein and daidzein) had already been experimentally evaluated on the lifespan of C. elegans. Therefore, the entry was replaced with “6''-O-malonyldaidzin”, a derivative of daidzein with unknown activity. Stereoisomers were treated as duplicates and only one of them was kept. Substances and stereoisomers present in both the DrugBank and DrugAge databases were removed from the screening database. The resulting database consisted of a total of 1,738 small molecules.
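Removing duplicates and training-set overlaps from the screening list amounts to a single pass with a set of already-seen SMILES strings. A minimal sketch, assuming both lists contain standardised, non-isomeric SMILES (so stereoisomers collapse to one string); the helper name and the toy SMILES are hypothetical.

```python
def build_screening_set(candidate_smiles, training_smiles):
    """Deduplicate candidates and drop any that already occur in the training set.

    Hypothetical helper: seeding `seen` with the training SMILES removes
    overlaps, and adding each kept candidate to `seen` removes repeats.
    """
    seen = set(training_smiles)
    screening = []
    for smi in candidate_smiles:
        if smi not in seen:
            screening.append(smi)
            seen.add(smi)
    return screening


training = ["CCO", "CCN"]
candidates = ["CCO", "c1ccccc1", "c1ccccc1", "CC(=O)O"]
print(build_screening_set(candidates, training))  # ['c1ccccc1', 'CC(=O)O']
```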
Tanimoto coecient and similarity maps
The Tanimoto coefficients and similarity maps were computed in the Python RDKit environment. The Tanimoto similarity is calculated between a reference molecule, which is known to be active, and a compound of interest with unknown activity.
Herein, the reference molecules were the positive entries of the DrugAge database. The compound with unknown activity was a selected entry of the screening database that achieved a predictive probability of ≥ 0.80 for increasing the lifespan of C. elegans. The Tanimoto coefficient between the compound of interest and each of the reference molecules was calculated, and the highest score, together with the reference molecule that produced it, was reported. The Tanimoto coefficients were computed using the RDKit fingerprint representations of the compounds, whereas the similarity maps were generated using ECFP fingerprint representations.
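For binary fingerprints, the Tanimoto coefficient of A and B is |A ∩ B| / (|A| + |B| − |A ∩ B|). The sketch below applies it to fingerprints stored as sets of on-bit indices and returns the best-scoring reference, mirroring the procedure above. It is a plain-Python illustration with hypothetical helper names and toy bit sets; with RDKit one would typically compute `Chem.RDKFingerprint(mol)` for each molecule and call `DataStructs.BulkTanimotoSimilarity(query_fp, reference_fps)`.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two binary fingerprints given as sets of on-bit indices.

    Two empty fingerprints are given a score of 0.0 here (an assumption).
    """
    common = len(bits_a & bits_b)
    union = len(bits_a) + len(bits_b) - common
    return common / union if union else 0.0


def best_reference_match(query_bits, references):
    """Return (highest Tanimoto score, name of the reference that achieved it)."""
    return max((tanimoto(query_bits, bits), name) for name, bits in references.items())


# Toy fingerprints: two hypothetical active references and one query compound.
references = {"active_A": {1, 2, 3, 4}, "active_B": {2, 5, 9}}
query = {1, 2, 3, 7}
print(best_reference_match(query, references))  # (0.6, 'active_A')
```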
Availability of data and materials
All software and datasets can be obtained by application to the authors at email@example.com.
The authors declare that there are no conflicts of interest.
This research was carried out as a final-year project by SK; no funding was available or used.
BJH designed and supervised the study. SK performed data curation, built the predictive models and
wrote the manuscript. BJH aided the interpretation of the findings and reviewed the manuscript providing
We are grateful to the members of the Department of Chemistry at the University of Surrey for their
support throughout the study. We also acknowledge Konstantinos Kallidromitis for reading the
manuscript and discussing the implementation and results obtained from the predictive models.
BJH (PhD) is a Senior Lecturer in Computational Chemistry at the University of Surrey, Department of
Chemistry. SK (BSc) was previously an undergraduate student at the University of Surrey and is currently a
graduate student at Imperial College London.
1. Barardo D, Thornton D, Thoppil H, et al (2017) The DrugAge database of aging-related drugs. Aging
Cell 16:594–597. https://doi.org/10.1111/acel.12585
2. Qian M, Liu B (2019) Advances in pharmacological interventions of aging in mice. Transl Med Aging
3. Blagosklonny M V. (2018) Disease or not, aging is easily treatable. Aging (Albany NY) 10:3067–
4. Barardo DG, Newby D, Thornton D, et al (2017) Machine learning for predicting lifespan-extending
chemical compounds. Aging (Albany NY) 9:1721–1737. https://doi.org/10.18632/aging.101264
5. Longo VD, Antebi A, Bartke A, et al (2015) Interventions to slow aging in humans: Are we ready?
Aging Cell 14:497–510. https://doi.org/10.1111/acel.12338
6. Mack HID, Heimbucher T, Murphy CT (2018) The nematode Caenorhabditis elegans as a model for
aging research. Drug Discov Today Dis Model 27:3–13.
7. Son HG, Altintas O, Kim EJE, et al (2019) Age-dependent changes and biomarkers of aging in
Caenorhabditis elegans. Aging Cell 18:1–11. https://doi.org/10.1111/acel.12853
8. Herndon LA, Wolkow CA, Driscoll M, Hall DH (2017) Effects of Ageing on the Basic Biology and
Anatomy of C. elegans. In: Olsen A, Gill MS (eds) Ageing: Lessons from C. elegans. Springer
International Publishing, Cham, pp 9–39
9. Das UN (2011) Molecular Basis of Health and Disease. In: Molecular Basis of Health and Disease,
1st ed. Springer, Dordrecht, pp 491–512
10. Apfeld J, Alper S (2018) What Can We Learn About Human Disease from the Nematode C. elegans?
Methods Mol Biol 1706:53–75. https://doi.org/10.1007/978-1-4939-7471-9_4
11. Curran SP, Ruvkun G (2007) Lifespan regulation by evolutionarily conserved genes essential for
viability. PLoS Genet 3:0479–0487. https://doi.org/10.1371/journal.pgen.0030056
12. Lee GD, Wilson MA, Zhu M, et al (2006) Dietary deprivation extends lifespan in Caenorhabditis
elegans. Aging Cell 5:515–524. https://doi.org/10.1111/j.1474-9726.2006.00241.x
13. Harrison DE, Strong R, Sharp ZD, et al (2009) Rapamycin fed late in life extends lifespan in
genetically heterogeneous mice. Nature 460:392–395. https://doi.org/10.1038/nature08221
14. Selman C, Tullet JMA, Wieser D, et al (2009) Ribosomal Protein S6 Kinase 1 Signaling Regulates
Mammalian Life Span. Science 326:140–144. https://doi.org/10.1126/science.1177221
15. Ye X, Linton JM, Schork NJ, et al (2014) A pharmacological network for lifespan extension in
Caenorhabditis elegans. Aging Cell 13:206–215. https://doi.org/10.1111/acel.12163
16. Putin E, Mamoshina P, Aliper A, et al (2016) Deep biomarkers of human aging: Application of deep
neural networks to biomarker development. Aging (Albany NY) 8:1021–1033.
17. Mamoshina P, Kochetov K, Putin E, et al (2018) Population Specific Biomarkers of Human Aging: A
Big Data Study Using South Korean, Canadian, and Eastern European Patient Populations. J
Gerontol A Biol Sci Med Sci 73:1482–1490. https://doi.org/10.1093/gerona/gly005
18. Winter R, Montanari F, Noé F, Clevert D-A (2019) Learning continuous and data-driven molecular
descriptors by translating equivalent chemical representations. Chem Sci 10:1692–1701.
19. Hong H, Xie Q, Ge W, et al (2008) Mold2, molecular descriptors from 2D structures for
chemoinformatics and toxicoinformatics. J Chem Inf Model 48:1337–1344.
20. Rinnie, Gaba V, Rani K, et al (2019) QSAR study on 4-alkynyldihydrocinnamic acid analogs as free
fatty acid receptor 1 agonists and antidiabetic agents: Rationales to improve activity. Arab J Chem
21. Roy K, Kar S, Das RN (2015) Chapter 2 - Chemical Information and Descriptors. In: Understanding the
Basics of QSAR for Applications in Pharmaceutical Sciences and Risk Assessment. Academic Press,
Boston, pp 47–80
22. Lo YC, Rensi SE, Torng W, Altman RB (2018) Machine learning in chemoinformatics and drug
discovery. Drug Discov Today 23:1538–1546. https://doi.org/10.1016/j.drudis.2018.05.010
23. Perkins R, Fang H, Tong W, Welsh WJ (2003) Quantitative structure-activity relationship methods:
Perspectives on drug discovery and toxicology. Environ Toxicol Chem 22:1666–1679.
24. Cereto-Massagué A, José M, Valls C, et al (2015) Molecular fingerprint similarity search in virtual
screening. Methods 71:58–63. https://doi.org/10.1016/j.ymeth.2014.08.005
25. RDKit: Open-source cheminformatics. http://www.rdkit.org. Accessed April 2020.
26. Naveja JJ, Medina-Franco JL (2019) Finding Constellations in Chemical Space Through Core
Analysis. Front Chem 7:510. https://doi.org/10.3389/fchem.2019.00510
27. Hirohara M, Saito Y, Koda Y, et al (2018) Convolutional neural network based on SMILES
representation of compounds for detecting chemical motif. BMC Bioinformatics 19:526.
28. Sonego P, Kocsor A, Pongor S (2008) ROC analysis: applications to the classification of biological
sequences and 3D structures. Brief Bioinform 9:198–209. https://doi.org/10.1093/bib/bbm064
29. Chen C, Breiman L (2004) Using Random Forest to Learn Imbalanced Data. Univ California, Berkeley
30. Chemical Computing Group Inc (2019) Molecular Operating Environment (2019.01) Montreal,
31. Bender A, Glen RC (2005) A discussion of measures of enrichment in virtual screening: Comparing
the information content of descriptors with increasing levels of sophistication. J Chem Inf Model
32. Gozalbes R, Doucet JP DF (2002) Application of Topological Descriptors in QSAR and Drug Design:
History and New Trends. Infect Disord Drug Targets 2:93–102.
33. Guha R, Willighagen E (2012) A survey of quantitative descriptions of molecular structure. Curr Top
Med Chem 12:1946–1956. https://doi.org/10.2174/156802612804910278
34. Gasteiger J, Marsili M (1980) Iterative partial equalization of orbital electronegativity—a rapid access
to atomic charges. Tetrahedron 36:3219–3228. https://doi.org/10.1016/0040-
35. Kleinoeder T (2005) Prediction of Properties of Organic Compounds - Emperical Methods and
Management of Property Data. PhD Thesis, University of Erlangen-Nuernberg.
36. Wishart DS, Feunang YD, Guo AC, et al (2018) DrugBank 5.0: a major update to the DrugBank
database for 2018. Nucleic Acids Res 46:D1074–D1082. https://doi.org/10.1093/nar/gkx1037
37. Djoumbou Feunang Y, Eisner R, Knox C, et al (2016) ClassyFire: automated chemical classification
with a comprehensive, computable taxonomy. J Cheminform 8:61. https://doi.org/10.1186/s13321-
38. Prasain JK, Carlson SH, Wyss JM (2010) Flavonoids and age-related disease: risk, benefits and
critical windows. Maturitas 66:163–171. https://doi.org/10.1016/j.maturitas.2010.01.010
39. Ayaz M, Sadiq A, Junaid M, et al (2019) Flavonoids as Prospective Neuroprotectants and Their
Therapeutic Propensity in Aging Associated Neurological Disorders. Front Aging Neurosci 11:155.
40. Ramelet AA (2011) Venoactive Drugs. In: Goldman MP, Guex JJ, Weiss RA (eds) Sclerotherapy:
Treatment of Varicose and Telangiectatic Leg Veins, 5th ed. W.B. Saunders, Edinburgh, pp 369–377
41. Mangoni AA (2012) Drugs acting on the cerebral and peripheral circulations. In: Aronson JK (ed) A
worldwide yearly survey of new data in adverse drug reactions and interactions. Elsevier, pp 311–316
42. Kamel R, Abbas H, Fayez A (2017) Diosmin/essential oil combination for dermal photo-protection
using a lipoid colloidal carrier. J Photochem Photobiol B Biol 170:49–57.
43. Bergan JJ, Schmid-Schönbein GW, Takase S (2001) Therapeutic approach to chronic venous
insufficiency and its complications: place of Daflon 500 mg. Angiology 52 Suppl 1:S43-7.
44. Ganeshpurkar A, Saluja AK (2017) The Pharmacological Potential of Rutin. Saudi Pharm J 25:149–
45. Chattopadhyay D, Chitnis A, Talekar A, et al (2017) Hormetic efficacy of rutin to promote longevity in
Drosophila melanogaster. Biogerontology 18:397–411. https://doi.org/10.1007/s10522-017-9700-1
46. Riniker S, Landrum GA (2013) Similarity maps - a visualization strategy for molecular fingerprints
and machine-learning methods. J Cheminform 5:43. https://doi.org/10.1186/1758-2946-5-43
47. Xue YL, Ahiko T, Miyakawa T, et al (2011) Isolation and Caenorhabditis elegans lifespan assay of
flavonoids from onion. J Agric Food Chem 59:5927–5934. https://doi.org/10.1021/jf104798n
48. Davalli P, Mitic T, Caporali A, et al (2016) ROS, Cell Senescence, and Novel Molecular Mechanisms in
Aging and Age-Related Diseases. Oxid Med Cell Longev 2016:3565127.
49. Sun K, Xiang L, Ishihara S, et al (2012) Anti-Aging Effects of Hesperidin on Saccharomyces
cerevisiae via Inhibition of Reactive Oxygen Species and UTH1 Gene Expression. Biosci Biotechnol
Biochem 76:640–645. https://doi.org/10.1271/bbb.110535
50. Fernández-Bedmar Z, Anter J, de La Cruz-Ares S, et al (2011) Role of Citrus Juices and Distinctive
Components in the Modulation of Degenerative Processes: Genotoxicity, Antigenotoxicity,
Cytotoxicity, and Longevity in Drosophila. J Toxicol Environ Heal Part A 74:1052–1066.
51. Wang J, Deng N, Wang H, et al (2020) Effects of orange extracts on longevity, healthspan, and stress
resistance in Caenorhabditis elegans. Molecules 25:1–17.
52. Lee EB, Ahn D, Kim BJ, et al (2015) Genistein from Vigna angularis Extends Lifespan in
Caenorhabditis elegans. Biomol Ther (Seoul) 23:77–83.
53. Gutierrez-Zepeda A, Santell R, Wu Z, et al (2005) Soy isoflavone glycitein protects against beta
amyloid-induced toxicity and oxidative stress in transgenic Caenorhabditis elegans. BMC Neurosci
54. Fischer M, Regitz C, Kahl M, et al (2012) Phytoestrogens genistein and daidzein affect immunity in
the nematode Caenorhabditis elegans via alterations of vitellogenin expression. Mol Nutr Food Res
55. Wishart DS, Feunang YD, Marcu A, et al (2018) HMDB 4.0: the human metabolome database for
2018. Nucleic Acids Res 46:D608–D617. https://doi.org/10.1093/nar/gkx1089
56. Papsdorf K, Brunet A (2019) Linking Lipid Metabolism to Chromatin Regulation in Aging. Trends Cell
Biol 29:97–116. https://doi.org/10.1016/j.tcb.2018.09.004
57. Han S, Schroeder EA, Silva-García CG, et al (2017) Mono-unsaturated fatty acids link H3K4me3
modifiers to C. elegans lifespan. Nature 544:185–190. https://doi.org/10.1038/nature21686
58. Johnson AA, Stolzing A (2019) The role of lipid metabolism in aging, lifespan regulation, and age-
related disease. Aging Cell 18:e13048. https://doi.org/10.1111/acel.13048
59. O’Rourke EJ, Kuballa P, Xavier R, Ruvkun G (2013) ω-6 Polyunsaturated fatty acids extend life span
through the activation of autophagy. Genes Dev 27:429–440.
60. Qi W, Gutierrez GE, Gao X, et al (2017) The ω-3 fatty acid α-linolenic acid extends Caenorhabditis
elegans lifespan via NHR-49/PPARα and oxidation to oxylipins. Aging Cell 16:1125–1135.
61. Sugawara S, Honma T, Ito J, et al (2013) Fish oil changes the lifespan of Caenorhabditis elegans via
lipid peroxidation. J Clin Biochem Nutr 52:139–145. https://doi.org/10.3164/jcbn.12-88
62. Khan SA, Haider A, Mahmood W, et al (2017) Gamma-linolenic acid ameliorated glycation-induced
memory impairment in rats. Pharm Biol 55:1817–1823.
63. Knauf VC, Shewmaker C, Flider F, et al (2011) Safflower with Elevated Gamma-Linolenic Acid. US
Patent 2011/0129428A1, Jun. 2, 2011.
64. Rezapour-Firouzi S (2017) Chapter 24 - Herbal Oil Supplement With Hot-Nature Diet for Multiple
Sclerosis. In: Watson RR, Killgore WDS (eds) Nutrition and Lifestyle in Neurological Autoimmune Diseases. Academic Press, pp 229–245
65. De Giorgio R, Ruggeri E, Stanghellini V, et al (2015) Chronic constipation in the elderly: a primer for
the gastroenterologist. BMC Gastroenterol 15:130. https://doi.org/10.1186/s12876-015-0366-3
66. Honda Y, Tanaka M, Honda S (2010) Trehalose extends longevity in the nematode Caenorhabditis
elegans. Aging Cell 9:558–569. https://doi.org/10.1111/j.1474-9726.2010.00582.x
67. Xing S, Zhang L, Lin H, et al (2019) Lactose induced redox-dependent senescence and activated Nrf2
pathway. Int J Clin Exp Pathol 12:2034–2045
68. Yahia EM, Carrillo-López A, Bello-Perez LA (2019) Carbohydrates. In: Yahia EM (ed) Postharvest
Physiology and Biochemistry of Fruits and Vegetables. Woodhead Publishing, pp 175–205
69. Edwards C, Canfield J, Copes N, et al (2015) Mechanisms of amino acid-mediated lifespan extension
in Caenorhabditis elegans. BMC Genet 16:8. https://doi.org/10.1186/s12863-015-0167-2
70. Zheng J, Gao C, Wang M, et al (2017) Lower Doses of Fructose Extend Lifespan in Caenorhabditis
elegans. J Diet Suppl 14:264–277. https://doi.org/10.1080/19390211.2016.1212959
71. Preuss HG, el Zein M, Areas JL, et al (1991) Effects of excess sucrose ingestion on the life span of
hypertensive rats (SHR). Geriatr Nephrol Urol 1:13–20. https://doi.org/10.1007/BF00451857
72. Rovenko BM, Kubrak OI, Gospodaryov D V, et al (2015) High sucrose consumption promotes obesity
whereas its low consumption induces oxidative stress in Drosophila melanogaster. J Insect Physiol
73. Yang N, Liu H, Jiang Y, et al (2015) Lactulose enhances neuroplasticity to improve cognitive function
in early hepatic encephalopathy. Neural Regen Res 10:1457–1462. https://doi.org/10.4103/1673-
74. Hekmatpou D, Mehrabi F, Rahzani K, Aminiyan A (2019) The Effect of Aloe Vera Clinical Trials on
Prevention and Healing of Skin Wound: A Systematic Review. Iran J Med Sci 44:1–9
75. Baruah A, Bordoloi M, Deka Baruah HP (2016) Aloe vera: A multipurpose industrial crop. Ind Crops
Prod 94:951–963. https://doi.org/10.1016/j.indcrop.2016.08.034
76. Chandrashekara KT, Shakarad MN (2011) Aloe vera or Resveratrol Supplementation in Larval Diet
Delays Adult Aging in the Fruit Fly, Drosophila melanogaster. Journals Gerontol Ser A 66A:965–971.
77. Nićiforović A, Adžić M, Zarić B, Radojčić MB (2007) Adjuvant antiproliferative and cytotoxic effect of
aloin in irradiated HeLaS3 cells. Russ J Phys Chem A 81:1463–1466.
78. Park M-Y, Kwon H-J, Sung M-K (2011) Dietary aloin, aloesin, or aloe-gel exerts anti-inflammatory
activity in a rat colitis model. Life Sci 88:486–492. https://doi.org/10.1016/j.lfs.2011.01.010
79. Kumar S, Matharasi DP, Gopi S, et al (2010) Synthesis of cytotoxic and antioxidant Schiff’s base
analogs of aloin. J Asian Nat Prod Res 12:360–370. https://doi.org/10.1080/10286021003775327
80. Liu F-W, Liu F-C, Wang Y-R, et al (2015) Aloin Protects Skin Fibroblasts from Heat Stress-Induced
Oxidative Stress Damage by Regulating the Oxidative Defense System. PLoS One 10:e0143528
81. Munsiff SS, Kambili C, Ahuja SD (2006) Rifapentine for the Treatment of Pulmonary Tuberculosis.
Clin Infect Dis 43:1468–1475. https://doi.org/10.1086/508278
82. Golegaonkar S, Tabrez SS, Pandit A, et al (2015) Rifampicin reduces advanced glycation end
products and activates DAF-16 to increase lifespan in Caenorhabditis elegans. Aging Cell 14:463–
83. Martin YC, Kofron JL, Traphagen LM (2002) Do structurally similar molecules have similar biological
activity? J Med Chem 45:4350–4358. https://doi.org/10.1021/jm020155c
84. Tacutu R, Craig T, Budovsky A, et al (2013) Human Ageing Genomic Resources: integrated databases
and tools for the biology and genetics of ageing. Nucleic Acids Res 41:D1027–D1033.
85. PubChemPy. https://pypi.org/project/PubChemPy/. Accessed April 2020.
86. Atkinson F L (2014) Standardiser. https://github.com/atkinson/standardiser. Accessed on April
87. Kotsampasakou E, Ecker GF (2017) Predicting Drug-Induced Cholestasis with the Help of Hepatic
Transporters-An in Silico Modeling Approach. J Chem Inf Model 57:608–615.
88. Pedregosa F, Varoquaux G, Gramfort A, et al (2011) Scikit-learn. J Mach Learn Res 12:2825-2830.
89. Fehér NK (2018) Exploring Predicted Drug Metabolism in in silico Toxicity Prediction. Dissertation,
University of Cambridge.
90. Kim S, Chen J, Cheng T, et al (2018) PubChem 2019 update: improved access to chemical data.
Nucleic Acids Res 47:D1102–D1109. https://doi.org/10.1093/nar/gky1033
91. The UniProt Consortium (2018) UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res 47:D506–