RESEARCH Open Access
Highly accurate skin-specific methylome
analysis algorithm as a platform to screen
and validate therapeutics for healthy aging
Alessandra Zonari, Carolina Reis de Oliveira, Kallie Alkatib, Edgar Andres Ochoa Cruz,
Lear E. Brace and Juliana Lott de Carvalho
Background: DNA methylation (DNAm) age constitutes a powerful tool to assess the molecular age and overall
health status of biological samples. Recently, it has been shown that tissue-specific DNAm age predictors may
present superior performance compared to the pan- or multi-tissue counterparts. The skin is the largest organ in
the body and bears important roles, such as body temperature control, barrier function, and protection from
external insults. As a consequence of the constant and intimate interaction between the skin and the environment,
current DNAm estimators, routinely trained using internal tissues which are influenced by other stimuli, are mostly
inadequate to accurately predict skin DNAm age.
Results: In the present study, we developed a highly accurate skin-specific DNAm age predictor, using DNAm data
obtained from 508 human skin samples. Based on the analysis of 2,266 CpG sites, we accurately calculated the
DNAm age of cultured skin cells and human skin biopsies. Age estimation was sensitive to the biological age of the
donor, cell passage, skin disease status, as well as treatment with senotherapeutic drugs.
Conclusions: This highly accurate skin-specific DNAm age predictor constitutes a holistic tool that will be of great
use in the analysis of human skin health status/molecular aging, as well as in the analysis of the potential of
established and novel compounds to alter DNAm age.
Keywords: Aging, Skin aging, Epigenetics, DNA methylation, Fibroblasts, Molecular clock, DNAm age algorithm
Aging is defined as a complex, multifactorial process associated with functional decline of organs and
tissues, leading to increased chances of death. Currently, genomic instability, telomere attrition,
epigenetic alterations, loss of proteostasis, deregulated nutrient sensing, mitochondrial dysfunction,
cellular senescence, stem cell exhaustion, and altered intercellular communication are considered the
hallmarks of aging, emerging as active areas of investigation. Age-related epigenetic alterations have
gained prominence in this context since the observation that DNA methylation (DNAm) undergoes
predictable time-dependent modifications, which can be explored as a highly accurate method to estimate
the molecular versus the chronological age of human tissues.
Recently, additional factors beyond time have been shown to influence DNAm age, such as genetic
diseases [3,4], infectious diseases, inflammatory disorders [6,7], and lifestyle factors, such as
exercise and obesity, to cite a few. DNAm data were also significantly correlated with mortality
* Correspondence: firstname.lastname@example.org
Bioinformatics and Computational Biology Lab, Division of Experimental and
Translational Research, Brazilian National Cancer Institute, Rio de Janeiro, RJ
OneSkin Technologies, San Francisco, USA
Full list of author information is available at the end of the article
Boroni et al. Clinical Epigenetics (2020) 12:105
risk. Therefore, DNAm has evolved from a chronological age estimation tool to an indicator of overall health.
Cellular senescence is an evolving concept and is currently defined as a cellular state characterized
by four main aspects: an irreversible cell-cycle arrest, an inflammatory senescence-associated secretory
phenotype (SASP), macromolecular damage (DNA, protein, and lipid damage), and altered metabolism. Other
aspects attributed to senescent cells include apoptosis resistance, senescence-associated
heterochromatin foci, and morphological alterations. Even though cellular senescence and organismal
aging are not synonymous, recent data support the observation that DNAm aging observed in vivo also
occurs in vitro in senescent cell cultures (e.g., mammalian cell cultures) in a highly reproducible
pattern [11,12]. Indeed, DNAm of cultured cells can be used to predict cellular passage, regardless of
donor chronological age. In this context, even though the mechanism by which cellular passage and time
promote DNAm alterations in cells/tissues has not been clarified, a few functional correlations between
DNA methylation and phenotype have been described. For instance, as shown by Xie et al., during
replicative cellular senescence, hypermethylation occurs mainly in promoter sites of genes responsible
for cellular biosynthesis and metabolism regulation, promoting, or at least favoring, a gradual decrease
of biosynthetic processes observed in senescent cells. Importantly, cellular senescence and organismal
aging share not only epigenetic alterations but also other molecular and phenotypic aspects, including
telomere attrition and slower renewal rate. These findings fuel the current perception that cellular
senescence is a good model for aging studies. The possibility of studying aging in vitro opens the
opportunity to apply DNAm age as a parameter to screen and/or validate potential investigational
senotherapeutic compounds, defined as molecules targeting senescent cells, either by promoting their
death (senolytics) or by altering the senescence status (senomorphics).
DNAm aging has been characterized in numerous tissues, including the skin. Nevertheless, perhaps due to
the high influence of environmental factors on skin aging, the pan-tissue algorithm developed to
estimate tissue aging failed to accurately calculate the chronological age of cultured fibroblasts.
Also, phenotypic skin analysis failed to relate to blood DNAm age. In recognition of the limited
accuracy of the pan-tissue estimator of DNAm age, Horvath et al. developed a skin and blood clock.
In the present study, we developed a skin-specific algorithm that calculates DNAm age of skin samples
with high accuracy and a low error compared to existing molecular age estimators. The present algorithm
can be used in a scalable platform to validate the effect of new and established compounds on skin
DNAm age.
The utilized workflow is depicted in Additional File 1–Fig S1. For the algorithm development, we
analyzed previously published DNAm data of human skin biopsies, retrieved from three datasets. The
specific datasets, GSE51954, E-MTAB-4385, and GSE90124, are available in public databanks (GEO and
ArrayExpress), and each comprises epigenetic data, as well as additional information about the analyzed
samples, i.e., origin tissue, donor sex, and chronological age (18–95 years old). A total of 508 samples
(40 derived from the dermis, 146 from the epidermis, and 322 derived from whole skin tissue) were
interrogated for whole-genome methylation levels by more than 450,000 CpG probes per sample. The main
characteristics of the cohort are described in Additional File 2–Supplementary Table S1.
Data normalization and pre-selection of features
Each dataset was individually processed for quality control and merged for preprocessing in order to
build a machine learning algorithm able to accurately predict DNAm age. After normalizing all datasets
by quantile (Additional File 3–Fig S2), we obtained a homogeneous dataset with 397,598 probes.
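As an illustration of what normalizing "by quantile" means, the following minimal Python sketch forces every sample to share the same empirical distribution of values. It is not the pipeline used in the study, whose implementation is not described here; the function name `quantile_normalize`, the toy beta values, and the naive handling of ties are assumptions made for this example.

```python
def quantile_normalize(columns):
    """Quantile-normalize equally long samples (each a list of beta values).

    Hypothetical helper for illustration only; ties are not averaged.
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value across all samples.
    reference = [sum(col[k] for col in sorted_cols) / len(columns) for k in range(n)]
    normalized = []
    for col in columns:
        # Probe indices ordered from the smallest to the largest value in this sample.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]  # replace each value by the reference value of its rank
        normalized.append(out)
    return normalized


# Two toy samples measured on three probes.
samples = [[0.10, 0.80, 0.50], [0.30, 0.90, 0.20]]
normalized = quantile_normalize(samples)
```

After this step both toy samples contain exactly the same set of values, only assigned to different probes according to each probe's rank within its sample.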
We then removed 1720 cross-reactive probes and 26,490 probes which were not present in the new version
of the EPIC array. Probes targeting sex chromosomes were also removed, resulting in 369,388 probes. A
feature selection step was performed to reduce the dimensionality of our dataset using a package as a
wrapper for three different algorithm implementations (Additional File 4–Fig S3). Since the number of
features is much greater than the number of samples in our dataset (the curse of dimensionality), this
is an important step to reduce model overfitting, i.e., high accuracy on training data but poor
generalization on unseen test data, while improving accuracy if a proper feature subset is chosen. By
reducing the complexity of a model, we also reduce the training time. Each algorithm ranked the 369,388
probes according to their importance to predict the sample's age. The wrapper also ranked the probes
according to the union-importance, which was calculated considering the results of the three algorithms
together. We first retrieved the top 100 probes ranked according to each algorithm and the top 2000
probes ranked according to the union-importance. We also added the top 400 probes most correlated with
age according to Pearson's correlation coefficient, totaling 2410 probes (some probes were ranked by
more than one strategy). Next, we removed redundant features, i.e., probes that were correlated with the
response but also highly correlated with each other. The final dataset consisted of 2266 probes, divided
into training (249 samples) and testing (259 samples) data subsets. Samples were
randomly selected for training and testing datasets following a balanced distribution between the donor
ages (cut-off of 5 samples per age window, wherein an age window is approximately 7 years) (Table 1,
Additional File 1–Fig S1, and Additional File 5–Fig S4), in order to avoid overfitting in older ages,
since the full dataset was enriched in older donor samples. Dermis samples were all placed in the
training dataset, due to their small number.
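The removal of probes that are informative about age but redundant with one another can be sketched as a greedy correlation filter. The Python example below is only one plausible reading of that step: the 0.9 threshold, the function names `pearson` and `drop_redundant`, and the toy probe values are hypothetical and are not taken from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def drop_redundant(features, ages, threshold=0.9):
    """Greedy filter (assumed strategy, hypothetical threshold).

    Probes are visited from the most to the least age-correlated; a probe is
    kept only if its absolute correlation with every probe already kept is
    below `threshold`.
    """
    ranked = sorted(features, key=lambda name: abs(pearson(features[name], ages)), reverse=True)
    kept = []
    for name in ranked:
        if all(abs(pearson(features[name], features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


ages = [20, 35, 50, 65, 80]
features = {
    "probe_a": [0.10, 0.20, 0.30, 0.40, 0.50],  # rises linearly with age
    "probe_b": [0.11, 0.21, 0.29, 0.41, 0.52],  # near-duplicate of probe_a
    "probe_c": [0.90, 0.60, 0.80, 0.50, 0.55],  # noisier, decreasing trend
}
selected = drop_redundant(features, ages)
```

In this toy case `probe_b` is discarded because it carries almost the same information as `probe_a`, while `probe_c` is retained because it is age-related but not redundant.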
Selection of the best skin-specific DNAm age predictor
Next, we tested five machine learning (ML) algorithms to build different models and select the best
skin-specific DNAm age predictor: random forest; support vector machines (SVMs); ridge regression, which
penalizes the size of parameter estimates by shrinking them toward zero (L2 penalty) in order to
decrease model complexity while keeping all variables in the model; Lasso (least absolute shrinkage and
selection operator), which drops some features by penalizing coefficients and driving them to zero
(L1 penalty); and elastic net regression, a regularized regression method that linearly combines the L1
and L2 penalties of the Lasso and ridge methods. After training, an optimal regression was selected
based on a minimum mean absolute error (MAE) and root mean squared error (RMSE), and maximum R²
(Additional File 6–Fig S5). Ridge and
elastic net displayed similar performances, and due to its
characteristics, elastic net was chosen. After the 50-fold
cross-validation, the best model was obtained with fraction
= 1 and lambda = 1 × 10
, corresponding to a regression
model with an R² of 0.99, RMSE of 2.34 years, and MAE of
1.94 years (Additional File 6–Fig S5).
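For reference, the three penalized regressions compared in this section can be written in a generic textbook form. The notation below is an assumption introduced for illustration: y_i is the chronological age of sample i, x_i its vector of probe methylation values, β the coefficient vector, and λ, λ1, λ2 non-negative penalty weights; the intercept is omitted for brevity. These symbols are not claimed to map one-to-one onto the "fraction" and "lambda" tuning parameters reported above.

```latex
\begin{align*}
\text{Ridge (L2 penalty):}\quad
  & \hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} \bigl(y_i - x_i^{\top}\beta\bigr)^2
    + \lambda \sum_{j=1}^{p} \beta_j^{2} \\
\text{Lasso (L1 penalty):}\quad
  & \hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} \bigl(y_i - x_i^{\top}\beta\bigr)^2
    + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert \\
\text{Elastic net:}\quad
  & \hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} \bigl(y_i - x_i^{\top}\beta\bigr)^2
    + \lambda_1 \sum_{j=1}^{p} \lvert \beta_j \rvert
    + \lambda_2 \sum_{j=1}^{p} \beta_j^{2}
\end{align*}
```

The absolute-value term can set coefficients exactly to zero and therefore performs feature selection, whereas the squared term only shrinks coefficients and keeps all probes in the model.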
The elastic net model was able to predict the testing
dataset with high confidence. The correlation between
predicted and chronological age was 0.95 (p≤2.2 ×
) with an RMSE of 3.89 years (Fig. 1a). When com-
paring algorithm performance between epidermal and
whole skin methylome data, a slightly improved accuracy
was observed for epidermis samples (Fig. 1b).
Although the machine learning step did not utilize the testing dataset during training, the testing
dataset was drawn from the same original dataset as the training data. We therefore evaluated the
accuracy of the model using a completely independent new set of 16 whole skin biopsies whose methylation
profiles were assessed using the EPIC array. By using this external dataset, we again obtained highly
accurate predictions, with a correlation between predicted and chronological age of 0.95 (p≤2.1 ×
) and an RMSE of 4.98 years, outperforming previ-
ous DNAm estimators described in the literature (Fig. 1c
and Additional File 7—Supplementary Table 2) [2,19].
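The accuracy measures reported in this section (MAE, RMSE, and the correlation between predicted and chronological age) can be computed as in the short Python sketch below. The ages are invented toy values and the function names are hypothetical; nothing here reproduces the study's data.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error, in the same unit as the ages (years)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error; penalizes large errors more than MAE does."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def pearson_r(x, y):
    """Pearson correlation between chronological and predicted ages."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


chronological = [20, 40, 60, 80]  # toy donor ages
predicted = [23, 38, 64, 77]      # toy DNAm ages

print(mae(chronological, predicted))        # 3.0
print(rmse(chronological, predicted))       # about 3.08
print(pearson_r(chronological, predicted))  # about 0.99
```

A high correlation alone does not guarantee small errors, which is why the correlation is reported together with the RMSE.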
Predictors as skin aging biomarkers
In order to find potential new biomarkers for skin aging and skin age-reversal interventions, we next
evaluated the probes interrogated in our model and the genes with which they are associated. Of the
2266 probes, 53% were positively correlated with age in the final model. The largest fraction of probes
was located within the gene body (34.5%); 11.5% were located in the 1stExon, 3.4% in the 3′UTR, 14.6%
in the 5′UTR, 20.3% in the TSS1500, and 15.6% in the TSS200.
In general, the methylation level differences of probes used in our model were strongly influenced by
tissue type (i.e., epidermis, dermis, or whole skin) and sun exposure (ultraviolet radiation (UV)
exposure). Even though the methylation level differences across different ages were relatively small, a
large drift was observed around age 30, where some probes displayed increased methylation levels
(Fig. 2a). According to the Illumina array manifest, the 2266 probes selected could be related to 1572
unique genes. Of those, 50% of genes were associated with positively correlated probes and 58% had
probes selected in their promoter region. We also compared the expression alterations across aging of
probe-associated genes, using an independent publicly available RNA-Seq dataset composed of 91 skin
biopsy samples obtained from sun-protected regions (inner arm) of donors ranging from 19 to 89 years
old. When evaluating gene expression alteration across ages, a less noticeable correlation between
probe-associated gene expression and aging could be observed (Fig. 2b).
Table 1 Training and testing data description
Dataset   Number of samples  Type of sample              Sex  Ethnicity  Age (years)
Training  249                40 dermis; 110 whole skin        Caucasian  Min. 18.00; 1st Qu. 35.70; 3rd Qu. 66.21
Testing   259                0 dermis; 212 whole skin         Caucasian  Min. 20.00; 1st Qu. 54.59; 3rd Qu. 67.67
When evaluating the pathway enrichment within all probe-associated genes, only 1272 gene IDs were
unambiguously mapped to unique Entrez Gene IDs. Gene ontology analysis determined that approximately
39% of genes were associated with developmental processes, 36% with cell communication, and 25% with
functions related to nucleic acid binding (Fig. 2c). Genes associated with positively correlated probes
were significantly enriched in pathways such as calcium signaling and cortisol synthesis and secretion
(false discovery rate (FDR) < 0.05, Bonferroni test, Fig. 2d and Additional File 8–Supplementary
Table 3), while no significant results were obtained for genes associated with negatively correlated
probes after controlling for FDR.
We next evaluated the importance of each probe, considering their contribution to the model (Loess
r-squared variable importance). The top 50 most important probes are highlighted in Fig. 3a. The largest
fraction of these probes was located in the gene body (44%) (Fig. 3b), and most were positively
correlated with age (92%) (Fig. 3c). However, only a few probe-associated genes had their mRNA
expression levels correlated with age, such as GRIA2, TBR1, RGS22 (positively correlated), and B3GNT9
(negatively correlated) (Fig. 3d). When considering the genes associated with the top 300 most important
probes for pathway enrichment analysis, cAMP signaling and neuroactive ligand-receptor pathways were
both enriched in our dataset (FDR < 0.05, Bonferroni, Table 2).
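To make the enrichment statistics more concrete, the Python sketch below shows how an over-representation p value is commonly obtained from the hypergeometric distribution and then adjusted with the Bonferroni correction named in the text. It is a generic illustration, not the tool used in the study; the function names and all gene counts are hypothetical.

```python
from math import comb


def ora_pvalue(n_universe, n_pathway, n_selected, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_universe: genes in the background set
    n_pathway:  background genes annotated to the pathway
    n_selected: genes in the list being tested
    n_overlap:  tested genes that belong to the pathway
    """
    upper = min(n_pathway, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def bonferroni(pvalues):
    """Multiply each p value by the number of tests, capping the result at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


# Hypothetical counts: 20 of 300 selected genes fall in a 200-gene pathway,
# out of a background of 10,000 genes (about 6 would be expected by chance).
p_raw = ora_pvalue(10_000, 200, 300, 20)
adjusted = bonferroni([p_raw, 0.04, 0.5])
```

A pathway is reported as enriched when its adjusted value stays below the chosen cut-off (0.05 in the text).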
Fig. 1 Age estimation accuracy of the skin-specific DNAm age predictor. a Correlation analysis between
predicted age using the elastic net model and chronological age for all samples from the testing
dataset. b Correlation evaluated considering only epidermal or whole skin samples from the testing
dataset. c Performance comparison with previously published algorithms by a correlation analysis between
predicted and chronological age using a novel dataset of whole skin biopsies (external validation)
Finally, we analyzed the overlap between the probes interrogated by the skin-specific DNAm age predictor
and both the pan-tissue and the skin and blood DNAm age estimators. As expected, the overlap between the
skin-specific DNAm age predictor and the pan-tissue algorithm was negligible (14 of 2266 probes)
(Additional File 9–Fig S6a). The number of common probes between the skin and blood DNAm age estimator
and the newly developed skin-specific DNAm age predictor was 57. Eight probes were shared among the
three predictors; they are listed in Additional File 10–Supplementary Table 4, and their methylation
levels across age are shown in Additional File 9–Fig S6b. Shared probes were associated with eight
genes, of which only six were present in our RNA-Seq dataset. When evaluating their mRNA expression, no
differences among age groups were observed (Additional File 9–Fig S6c), showing that their
Fig. 2 Effects of aging on CpGs and genes associated with the skin-specific DNAm age predictor. a Heat
map of DNA methylation levels of probes associated with the model across all samples. Only probes with
an SD between the second and third quartile are plotted. Color codes represent beta DNAm values after
row-wise z-score transformation. Probes (rows) were clustered using Pearson correlation. Samples were
ordered according to age. Features regarding tissue of origin, sun exposure, sex, and age group (age 1:
< 30 years old, age 2: between 30 and 60 years old, and age 3: > 60 years old) are also shown. b Heat
map of expression levels of CpG-related genes associated with the model across all samples. Only genes
with an SD higher than the second quartile are plotted. Color codes represent log(normalized
expression + 1) values after row-wise z-score transformation. Genes (rows) were clustered using Pearson
correlation. Samples were ordered as shown in (a). c Gene ontology (GO) enrichment summary for genes
associated with probes in the model. d Over-representation analysis using the KEGG database for genes
associated with probes positively correlated with age and e genes associated with probes negatively
correlated with age. Dark bars represent significantly enriched pathways after controlling for false
discovery rate (FDR) using the Bonferroni method
contribution to the model is probably related to epigenetic changes during aging that do not influence
the expression of the gene in which they are located.
Applications of the skin-specific DNAm age predictor
Since the skin-specific algorithm developed here accurately estimates the DNAm age of skin samples, we
decided to investigate whether the algorithm would be able to calculate the effect of different
interventional-aging therapeutics on skin DNAm age. To do so, we first verified the ability of our
algorithm to predict DNAm age differences in primary human dermal fibroblasts obtained from donors of
different chronological ages (Fig. 4a). While fibroblasts derived from a 29-year-old donor were
Fig. 3 Importance of predictors. a Variable importance for the top 50 predictors according to the Loess
r-squared variable importance given by the varImp function from the caret R package. b Frequency of
regions where the top 50 probes are located. Blue color refers to probes positively correlated with age
in the model, and red color refers to probes negatively correlated with age. c Heat map of DNA
methylation levels of the top 50 probes. Color codes represent beta DNAm values after row-wise z-score
transformation. Probes (rows) are ordered according to their importance. Samples were ordered according
to their age. d Heat map of expression levels of genes related to the top 50 CpGs associated with the
model across all samples. Only genes with an SD higher than the second quartile are plotted. Color codes
represent log(normalized expression + 1) values after row-wise z-score transformation. Genes (rows) were
clustered using Pearson correlation. Samples were ordered according to their age. Features regarding
tissue of origin, sun exposure, sex, and aging group (age 1: under 30 years old, age 2: between 30 and
60 years old, and age 3: over 60 years old) are also shown
Table 2 Pathway enrichment results for genes associated with the top 300 key probes belonging to the
skin-specific DNAm age predictor. The top 10 results are shown. Over-representation analysis (ORA) was
performed using the KEGG database. P values were corrected to control for FDR using the Bonferroni
method; statistically significant enrichment is highlighted in bold letters
pvalue FDR Genes
cAMP signaling pathway 5.7451 2.1×
Neuroactive ligand-receptor interaction 4.2918 2.0×
Circadian entrainment 6.4865 9.6 ×
Cardiac muscle contraction 7.0969 2.3 ×
Arrhythmogenic right ventricular
6.9941 2.4 ×
Calcium signaling pathway 4.2087 2.8 ×
GABAergic synapse 5.8144 4.8 ×
Type II diabetes mellitus 8.226 5.6 ×
Amyotrophic lateral sclerosis (ALS) 7.0969 8.4 ×
Alcoholism 3.7009 1.1 ×
Fig. 4 Skin-specific DNAm age predictor applications. a DNAm age of primary human dermal fibroblasts
obtained from two healthy donors of different ages. b DNAm age of primary human dermal fibroblasts
derived from an HGPS donor at different cell passage numbers. c DNAm age of human psoriatic (PP) and
paired uninvolved psoriatic (PN) skin tissues (GSE73894). d DNAm age residuals of normal epidermis
tissues, actinic keratosis (AK), and cutaneous squamous cell carcinoma (cSCC) epidermis samples
(E-MTAB-5738)
predicted to have an average age of 72.4 (standard deviation, SD: 0.393), an average age of 96.4
(SD: 1.73) was calculated for fibroblasts derived from an 84-year-old donor (p = 0.001, t test,
Fig. 4a), showing that our algorithm was able to detect the progression of chronological age in
fibroblasts of different donors. The fact that both cell preparations were analyzed at passage 22
explains, at least in part, the discrepancies between donor age and the DNAm age predicted for cultured
cells. The capacity of our algorithm to detect the effect of cell passage was also confirmed using
dermal fibroblasts isolated from a 6-year-old Hutchinson-Gilford Progeria (HGPS) patient. While cells
at passage 11 presented a mean DNAm age of 43.3 (SD: 0.673) years, the same cell culture presented a
mean DNAm age of 49.1 (SD: 0.39) years
at passage 19 (p=7.0×10
, t test, Fig. 4b).
We also tested the capability of our algorithm to predict DNAm age alterations related to disease
conditions such as psoriasis, a chronic inflammatory skin condition; cutaneous squamous cell carcinoma
(cSCC); and UV-induced precancerous lesions termed actinic keratosis (AK). We predicted the molecular
age of 39 samples from active psoriatic skin tissue (PP) and paired inactive control non-psoriatic skin
tissues (PN) obtained from patients diagnosed with the disease (data obtained from ). In contrast to
what has been previously shown, a statistically significant difference was observed between the DNAm
age of PN (mean of 39.2 years, median of 34.8, SD: 14.4) and PP (mean of 35.2 years, median of 37.0,
SD: 13.7) samples (p = 2.0 × 10
sample t test, Fig. 4c). For the AK and cSCC analysis, we predicted the DNAm age of 12 normal epidermis
samples, 16 AK epidermis samples, and 18 cSCC epidermis samples (data obtained from ). When comparing
the DNAm age residuals from healthy skin controls (mean of 8.88 DNAm residuals, SD: 9.66) and AK (mean
of −6.82 DNAm residuals, SD: 23.8), we observed a statistically significant decrease in the molecular
age of AK samples (p = 0.026, t test, Fig. 4d), which was not detected previously, when the same data
were analyzed using the pan-tissue DNAm algorithm. The epidermal stem cell origin of AK and cSCC may
justify the lower DNAm age detected by our algorithm.
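The group comparisons above rely on DNAm age residuals. The text does not define the residual at this point, so the Python sketch below assumes the simplest definition, DNAm age minus chronological age (another common choice is the residual from regressing DNAm age on chronological age). It also uses Welch's form of the t statistic, which is an assumption because the exact t test variant is not stated. All numbers are invented toy values.

```python
import math
from statistics import mean, variance


def age_residuals(dnam_ages, chrono_ages):
    """Assumed definition: predicted DNAm age minus chronological age."""
    return [d - c for d, c in zip(dnam_ages, chrono_ages)]


def welch_t(a, b):
    """Welch's t statistic for two independent groups (unequal variances allowed)."""
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


# Toy data: predicted DNAm ages followed by chronological ages, per group.
normal = age_residuals([62.0, 70.0, 58.0, 66.0], [55, 60, 52, 57])
lesion = age_residuals([50.0, 71.0, 49.0, 60.0], [58, 66, 61, 63])

t_stat = welch_t(normal, lesion)
print(normal, lesion, round(t_stat, 2))
```

A positive mean residual indicates samples that appear older than their chronological age, and a negative one indicates samples that appear younger; converting the statistic into a p value additionally requires the t distribution, which is left out of this sketch.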
Next, we assessed the capacity of the skin-specific DNAm age predictor to validate the effects of
numerous aging-related interventions. Recently, the treatment of cells with the reprogramming factors
OCT4, SOX2, KLF4, c-MYC, LIN28, and NANOG (OSKMLN) was shown to promote a partial reversion in cellular
age without altering cellular identity. Here, we applied the skin-specific DNAm age predictor to analyze
the published data and observe the effect of the reprogramming treatment on fibroblast DNAm age. No
statistically significant difference was found (p = 0.15, paired sample t test, Fig. 5a) when comparing
the DNAm age residuals from treated (mean age residuals of −1.08, SD: 2.47) and control samples (mean
age residuals of 1.08, SD: 0.95), likely due to the small dataset and the statistical test
When applying the skin-specific algorithm to evaluate known senotherapeutic treatments on human
fibroblast cultures, we treated HGPS cells with ABT-263 and Rapamycin, drugs known to be senolytic and
senomorphic, respectively, i.e., drugs that specifically target cellular senescence, an altered cell
state associated with aging and age-related diseases, and are therefore expected to reduce cellular
DNAm age [26,27]. After 3-day treatments, no statistically significant differences were observed when
comparing treated and untreated samples (Fig. 5b). Interestingly, despite the lack of statistical
significance in the DNAm alteration of 2D cultured fibroblasts, detectable alterations in other
phenotypes associated with aging were observed. Specifically, both ABT-263 treatments (1.25 μM and
5 μM) promoted a statistically significant reduction of approximately 50% in the mean
senescence-associated beta-galactosidase (SA-β-Gal) staining intensity of treated samples (p < 0.0001),
while Rapamycin (100 nM) treatment did not significantly alter the SA-β-Gal staining levels (Fig. 5c,
left graph). The treatments with ABT-263 at 1.25 and 5 μM, as well as the treatment with Rapamycin at
100 nM, promoted statistically significant reductions (p < 0.01, p < 0.001, and p < 0.0001,
respectively) in the average number of ATRX foci/cell (Fig. 5c, right graph), which is an early
predictor of senescence burden. Such phenotypic alterations corroborated the statistically significant
reduction of approximately 30% in CDKN2A (P16) mRNA expression detected in all experimental conditions
(1.25 μM ABT-263, p < 0.05; 5 μM ABT-263, p < 0.01; Rapamycin 100 nM, p < 0.01, compared to untreated
samples; Fig. 5d, left graph). The ABT-263-treated samples also presented a statistically significant
decrease in IL6 mRNA expression (ABT-263 1.25 μM, p < 0.05; ABT-263 5 μM, p < 0.001 compared to
control, according to t test analysis; Fig. 5d, right graph). In all comparisons using 2D cell cultures,
the previously published skin and blood DNAm estimator presented similar results to the estimator
presented here (Additional File 11–Fig S7).
We then evaluated the ability of our DNAm estimator to predict treatment efficacy using fresh human
skin biopsy samples, which were maintained in culture for 5 days. In this case, the skin-specific
algorithm developed in the present study was able to detect a DNAm age reduction in biopsies after
treatment with 100 nM Rapamycin (mean age of 70.4, SD of 1.10 for control versus mean age of 68.0, SD
of 3.16 for treatment, p = 0.17, t test, Fig. 6a). Conversely, the skin and blood DNAm age estimator
predicted an
increase of the molecular age after treatment (mean age of 69, SD of 3.95 for control versus mean age of
70.1, SD of 3.24 for treatment, p = 0.64, t test, Fig. S7g). The treatment with Rapamycin did not modify
the overall morphological structure of the skin assessed by hematoxylin and eosin (H&E) staining
(Fig. 6b), but it resulted in a statistically significant increase in insulin growth factor binding
protein 3 (IGFBP3) mRNA expression and a significant decrease in interleukin-8 (CXCL8 or IL8) mRNA
expression (Fig. 6c), similar to what has been published previously. CDKN2A (P16), β2 microglobulin
(B2M), and marker of proliferation Ki-67 (MKi67) presented non-statistically significant mRNA
expression alterations. In the dermis, the decreased DNAm age of Rapamycin-treated samples was
Fig. 5 Skin-specific DNAm predictor as a tool to validate the senotherapeutic potential of different
compounds. a DNAm age residuals of primary human dermal fibroblasts treated with OSKMLN reprogramming
factors (GSE142439 data) and untreated control samples (Ctrl). b–d Primary human dermal fibroblasts
derived from an HGPS donor treated with ABT-263 (ABT) at 1.25 and 5 μM, as well as 100 nM of Rapamycin
(Rapa), for 3 days. Untreated cells were considered as controls (Ctrl). b Predicted DNAm age using the
new skin-specific molecular clock, c senescence-associated beta-galactosidase (SA-β-Gal) staining
intensity per nucleus, and the number of ATRX foci/cell. d Relative gene expression of CDKN2A (P16) and
IL6 measured by qRT-PCR compared to untreated samples using ANOVA and Bonferroni. *p < 0.05;
**p < 0.01; ***p < 0.001; ****p < 0.0001
associated with a statistically significant increase in collagen type I alpha 1 (COL1A1) and IGFBP3
mRNA expression, as well as non-significant increases in hyaluronic acid synthase-2 (HAS2) and MKi67
mRNA expression (Fig. 6d).
DNA methylation molecular clocks constitute algorithms which highly correlate (r > 0.8) DNA methylation
patterns of specific biological samples with chronological age or time. Two of the first-reported clocks
include the molecular algorithm built by Hannum et al., based on methylome data of blood samples
obtained from 656 individuals, and the pan-tissue molecular algorithm built by Steve Horvath using 8000
samples of 51 different healthy tissues and cell types. Since then, many other molecular clocks have
been built, all presenting pros and cons. Of note, as knowledge regarding DNAm age accumulates, it
becomes increasingly clear that tissue-specific algorithms are potentially more accurate than
pan-tissue counterparts. Furthermore, it has recently been shown that epigenetic age, as calculated by
DNAm algorithms, is not only influenced by chronological age but also by the health status of the
sample. Therefore, such algorithms are currently considered as possible tools to predict lifespan and
healthspan. In the present study, we proposed that such molecular clocks can also be used to accurately
assess skin health, aging, and the effect of experimental interventions on skin DNAm age.
Even though the causes of the accumulation of epigenetic alterations in DNA are unknown, current
knowledge points to the participation of both intrinsic (intracellular)
Fig. 6 Effect of senotherapeutic treatments in human skin biopsies treated with 100 nM Rapamycin for
5 days. a Predicted DNAm age using the skin-specific molecular clock. b Representative images of H&E of
treated and untreated (control) samples, c mRNA expression in the epidermis, and d mRNA expression in
the dermis. Ctrl control, IGFBP3 insulin growth factor binding protein 3, B2M β2 microglobulin, IL8
interleukin-8, HAS2 hyaluronic acid synthase 2, COL1A1 collagen type 1 alpha 1. Expression was compared
to untreated samples using ANOVA and Bonferroni, or t test. Data refer to experiments performed in
triplicate
Boroni et al. Clinical Epigenetics (2020) 12:105 Page 10 of 16
and extrinsic (extracellular) processes. The skin is no
different from other tissues in this sense, as its aging re-
flects both intrinsic and extrinsic processes, such as gen-
etic and metabolic factors, as well as UV exposure and
the general exposure to the greater environment . As
observed in other tissues, skin aging is generally accom-
panied by the accumulation of senescent cells [34,35],
reduced cellular proliferation/tissue renewal, and altered
extracellular matrix, which are associated with skin
wrinkling, sagging, altered pigmentation , and cancer
. Still, accumulating evidence shows that tissues age
differently, both in the sense that aging may be desyn-
chronized within a single multicellular organism, and
also that different methylation markers may be more
relevant to calculate DNAm age from one tissue than
another. Those aspects of tissue aging became evident in
the Horvath pan-tissue molecular clock algorithm, which
determined that hormone exposed organs age more
quickly than those that are not exposed , and also
presented unequal performances when calculating the
age of different tissues . Of note, the pan-tissue molecular
clock presented high error rates when used to
calculate the age of skin samples and cells, especially fibroblasts.
In recognition of such limitations, the same author re-
cently developed another molecular clock algorithm, the
skin and blood molecular clock . Such an algorithm
was built using methylome data of human fibroblasts, ker-
atinocytes, buccal cells, endothelial cells, lymphoblastoid
cells, skin, blood, and saliva samples, leading to improved
performance compared to previous data for DNAm age
prediction in skin samples . In the present study, we
tested both the pan-tissue and the skin and blood algorithms
and compared them to the algorithm we built using
DNA methylation data of human skin biopsies only. Such
a strategy is unique in the sense that it is the first molecular
clock developed from DNA methylation data of
human skin biopsies, and, interestingly, it led to compelling
results. The focus on skin biopsy samples limited the
amount of data used to build the algorithm but resulted in
a highly efficient DNAm age predictor specific for human
skin. Furthermore, the algorithm could be executed with
high accuracy and a smaller standard deviation compared
to former DNAm age predictors. The algorithm developed
here also accurately predicted the DNAm age of fibro-
blasts, which was significantly influenced by donor age
and cell passage, as previously reported by others [11,12].
The new skin-specific algorithm developed in the
present study is based on the analysis of 2266 CpG sites,
of which less than 3% overlap with the skin and blood
algorithm developed previously . As mentioned, a
link between cause and consequence has not yet been
established between the methylation pattern of the skin
during aging and the resulting DNAm age. Nevertheless,
the analysis of the biological function attributed to the
genome regions assessed by the top 300 probes has led
to interesting insights. In the skin, the genes located in
the regions interrogated by the probes are related to several
cellular processes. The most enriched pathway in
this analysis was cAMP signaling, which has previously
been associated with organismal aging and cellular senescence.
When analyzing the DNAm data obtained from donors
of different age groups (< 30 years old, between 30
and 60 years old, and > 60 years old), we noticed that,
for the predictors used in the skin-specific DNAm algorithm,
young methylomes were more similar to each other,
whereas old methylomes appeared to be substantially
more heterogeneous. This is consistent with
published observations, which indicate that while methylation
patterning within an individual becomes more
homogeneous with age, the differences between individuals
increase .
The skin-specific DNAm predictor could also de-
tect the reduction of DNAm age of primary human
dermal fibroblasts with a partial pluripotency repro-
gramming regimen, corroborating previous reports
. Surprisingly, the treatment of progeroid fibro-
blasts with the senolytic ABT-263, although promot-
ing a decrease in the number of senescent cells and
the gene expression of aging markers, displayed a
trend of increasing the DNAm age. This data rein-
forces that alterations in gene expression and DNAm
age may highlight markers of aging with different
sensitivity, and that mRNA expression does not ne-
cessarily correlate with methylation alterations.
Short-term treatment with Rapamycin also altered
the gene expression of age markers but did not significantly
alter the DNAm age, although the treatment
promoted a trend toward decreased DNAm age. When the
same data was analyzed using the skin and blood
DNAm algorithm, Rapamycin treatment also failed
to promote a statistically significant DNAm age alteration
but resulted in a trend toward increased DNAm age,
which conflicts with the decrease in mRNA markers
related to aging and senescence. Moreover, these results
suggest that longer treatment times should be investigated
to validate the effect of senotherapeutics on
the DNAm age.
Using human skin biopsies, we noticed that Rapamy-
cin promoted a nonsignificant decrease in the calculated
DNAm age of treated samples. Nevertheless, alterations
at the gene expression level were observed with the sig-
nificant increase in IGFBP3 mRNA expression and also
a significant decrease in CXCL8 (IL8) mRNA expression,
similar to what has been published previously . In
the dermis, the decreased DNAm age of Rapamycin-
treated samples was associated with a statistically
significant increase in COL1A1 and IGFBP3 mRNA ex-
pression, suggesting DNAm age is related to several
established markers of skin aging.
The skin has a few characteristics that make
it an interesting target for DNAm age studies.
First, it is one of the few organs from which it is relatively
easy to obtain samples for analysis. It can also be
replicated in vitro with high fidelity; it is the first organ
to externalize signs of aging, and it is subject to extreme
external influence from environmental stimuli such as the
sun and pollution. Interestingly, in addition to its particularities
and the appeal of cosmetic applications, due
to its size, the skin can play a major role in organismal
inflammation levels and has been linked to numerous
chronic diseases of aging . Hu et al. recently demonstrated
that epidermal dysfunction largely accounts for
age-associated elevations in circulating cytokine levels
and that improving epidermal function reduced these
levels in mice . Ye et al. provided similar evidence in
humans . In the last 5 years, studies of psoriasis
and dermatitis have similarly determined that skin
inflammation likely increases the risk of cardiovascular
disease. Therefore, the development of a skin-specific
DNAm age predictor can offer a powerful tool for the
comprehension of skin diseases and how they may influence
overall organismal health status and DNAm age.
Since skin cancer has been correlated to age ,
DNAm analysis may also benefit skin cancer studies.
Prior to the present study, DNAm age predictors failed
to detect any influence of AK on DNAm age , arguing
against a correlation between skin aging and AK.
In the present study, we were able to detect significant
differences between the DNAm age of healthy and AK
samples as well as significant differences in psoriatic and
cSCC samples. Nevertheless, our observation of a decrease
in the DNAm age of psoriatic samples compared
to normal counterparts disagrees with previously published
data describing that psoriatic lesions are characterized
by the presence of senescent keratinocytes,
mainly observed in the mid and upper epidermal layers
. The keratinocytes derived from psoriatic lesions
have already been shown to express high levels of
CDKN2 (P16), CDKN1A (P21), as well as low CDK1 and
CCNA1, in addition to IGFBP2 , which is a compo-
nent of SASP . Still, the development of a skin-
specific DNAm age predictor may offer powerful means
for the comprehension of skin illnesses.
The application of senotherapeutic molecules for skin
treatment is a very recent concept, which currently has very
limited clinical evidence of efficacy. In our hands,
executing several technical replicates
for each DNAm analysis was important, due to the high
standard deviation observed between samples. In a study
involving a small number of patients, the topical
application of Rapamycin for 6 months promoted a reduction
in p16 protein levels in the skin and also
improved skin appearance . Here, we tested short
Rapamycin treatment in skin biopsies and noted a
nonsignificant reduction in DNAm age, as well as changes
in the mRNA expression of age-related genes. This data
supports the use of DNAm age as a parameter to be
investigated during the research
and validation of novel senotherapeutic models for skin
care as well as interventions that modulate skin aging.
The minimum amount of time required to promote significant
changes in the skin DNAm may vary widely according
to the intervention executed. Therefore, this
remains to be validated in larger and long-term studies.
Taken together, the present data suggests that in vitro
models recapitulate key aspects of skin aging and DNAm
age analysis using the present algorithm provides a more
comprehensive and highly accurate method to analyze
human skin health status and aging. Such data may sup-
port further investigation and understanding of how skin
ages, as well as how DNAm age of the skin is impacted
by health status and experimental treatments.
Methylation data access and pre-processing
Pre-processed beta values from GSE51954, GSE90124,
GSE73894, and GSE142439 datasets were individually
downloaded using the getGEO function from the “GEOquery”
R library version 2.54.1 , while E-MTAB-4385
and E-MTAB-5738 raw files were downloaded from
ArrayExpress and converted to beta values using the
“minfi” package version 1.32.0 . Datasets used for algorithm
construction were individually processed,
merged, and normalized using the function betaqn from
the “wateRmelon” package version 1.30.0 , which
performs quantile normalization on the beta values. We
checked for batch effects when combining the three
datasets for algorithm construction. Even though
removing potential batch effects from our data with the
ComBat function from the sva package version 3.34.0
[44,47] seemed to improve data homogenization, it had
the opposite effect on the final model when predicting
unseen data. Therefore, we did not remove the batch effect,
to avoid overfitting the final model. Heatmaps
were constructed using the pheatmap package version
1.0.12 using beta values, z-scaled across samples. Probes
(rows) were clustered using Pearson correlation and
samples were ordered based on their chronological age.
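The quantile normalization step described above can be illustrated with a minimal Python sketch. It is not the wateRmelon betaqn implementation; the function name, the toy beta matrix, and the naive handling of ties are assumptions made only for illustration. Each sample (column) is forced onto a shared reference distribution, taken here as the mean of the sorted values across samples.

```python
def quantile_normalize(matrix):
    """Quantile-normalize a probes x samples matrix given as a list of rows."""
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    # Sort the values of each sample (column) independently.
    sorted_cols = [sorted(row[j] for row in matrix) for j in range(n_cols)]
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [sum(col[k] for col in sorted_cols) / n_cols for k in range(n_rows)]
    result = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        # Rank the probes within sample j (ties broken by row order: an assumption).
        order = sorted(range(n_rows), key=lambda i: matrix[i][j])
        for rank, i in enumerate(order):
            result[i][j] = reference[rank]
    return result


# Hypothetical beta values: 3 probes (rows) x 3 samples (columns).
betas = [
    [0.10, 0.20, 0.15],
    [0.80, 0.70, 0.90],
    [0.50, 0.40, 0.60],
]
normalized = quantile_normalize(betas)
```

After normalization every sample shares the same set of values, so differences between samples reflect only the rank of each probe within its sample.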
Age prediction analysis with previously published
Quantile-normalized beta values for all samples were
used as input for the R codes underlying the
pan-tissue and/or skin and blood predictors . The R codes
were retrieved from the papers’ supplementary information.
For datasets where chronological age was available,
we also calculated the DNAm age residual, defined as
the residual of a linear model where the independent
variable is chronological age and the response is DNAm
age. Box plots were used to compare age predictions
among groups in different datasets.
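The DNAm age residual defined above can be sketched in a few lines of Python. This is an illustration using ordinary least squares on made-up numbers, not the code used in the study; the function name and the example ages are hypothetical.

```python
def dnam_age_residuals(chronological_age, dnam_age):
    """Residuals of the linear model dnam_age ~ chronological_age."""
    n = len(chronological_age)
    mean_x = sum(chronological_age) / n
    mean_y = sum(dnam_age) / n
    sxx = sum((x - mean_x) ** 2 for x in chronological_age)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(chronological_age, dnam_age))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Positive residual: the sample looks older than expected for its age.
    return [y - (intercept + slope * x)
            for x, y in zip(chronological_age, dnam_age)]


# Hypothetical donors: chronological age and predicted DNAm age (years).
ages = [25, 40, 55, 70]
predicted = [27.0, 38.0, 58.0, 69.0]
residuals = dnam_age_residuals(ages, predicted)
```

Because the residual removes the linear dependence on chronological age, groups with different age distributions can be compared on the same scale.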
RNA-Seq data access and pre-processing
SRA files from the project SRP082426 were downloaded
and converted to fastq files using SRA Toolkit version
2.8.2-1. Reads were trimmed using
Trimmomatic version 0.37 with default options and
mapped to the human genome (GRCh38–ENSEMBL release
88) using STAR version 2.5.3a with default parameters
for single unstranded reads as per the developer’s
manual. Htseq-count version 0.11.1 was used to
assign uniquely mapped reads to genes (excluding pseudogenes)
according to the annotation in
Homo_sapiens.GRCh38.89.gtf (ENSEMBL release 88). Read
counts were analyzed using the R package DESeq2 version
1.26.0 and libraries were normalized using the
estimateSizeFactors function of the package after samples were
split into three groups according to their chronological
age: age 1 (20 samples from donors under 30 years old);
age 2 (20 samples from donors between 30 and 60 years
old), and age 3 (51 samples from donors above 60 years
old). Heat maps were constructed using the pheatmap
package version 1.0.12 using regularized log2-transformed
counts-per-million, z-scaled across samples.
Genes (rows) were clustered using Pearson correlation
and samples were ordered based on their chronological
age. Box plots using ggplot2 package version 3.3.1
were used to compare gene expression among age groups.
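The library normalization mentioned above relies on per-sample size factors. The sketch below illustrates the median-of-ratios idea on which such size factors are commonly based; it is not the DESeq2 code, and the function name and the toy count matrix are assumptions.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors for a genes x samples count matrix."""
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        # Genes with any zero count have a geometric mean of zero; skip them.
        if any(c == 0 for c in row):
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # Size factor of a sample: median ratio to the per-gene geometric mean.
    return [math.exp(median(r)) for r in log_ratios]


# Hypothetical counts: sample 2 was sequenced twice as deeply as sample 1.
counts = [
    [10, 20],
    [100, 200],
    [5, 10],
    [0, 3],
]
sf = size_factors(counts)
normalized = [[c / sf[j] for j, c in enumerate(row)] for row in counts]
```

Dividing each count by the size factor of its sample puts libraries of different sequencing depth on a comparable scale before transformation and plotting.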
Pathway enrichment analysis
Genes associated with the probes according to the Illu-
mina manifest were retrieved from a list of probes posi-
tively and negatively correlated with age in our model.
The gene lists were then analyzed for known biological
functions or processes enriched using the over-
representation analysis (ORA) methodology in the WEB-
based GEne SeT AnaLysis Toolkit  using the Kyoto
Encyclopedia of Genes and Genomes (KEGG) database.
Genes associated with the top 300 probes ranked according
to their importance for the model were also analyzed
by this methodology. P values were controlled for
false discovery rate (FDR) using the Bonferroni method.
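Over-representation analysis is commonly implemented as a one-sided hypergeometric test, followed by a multiple-testing correction. The Python sketch below shows that calculation under this assumption; whether the toolkit used in the study applies exactly this formulation is not stated in the text, and the function names and gene counts are hypothetical.

```python
from math import comb


def ora_p_value(n_background, n_pathway, n_selected, n_overlap):
    """P(overlap >= n_overlap) under the hypergeometric distribution."""
    total = comb(n_background, n_selected)
    upper = min(n_pathway, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def bonferroni(p_values):
    """Bonferroni adjustment: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical numbers: 100 background genes, a pathway of 10 genes,
# 20 selected genes, 5 of which belong to the pathway.
p = ora_p_value(100, 10, 20, 5)
adjusted = bonferroni([p, 0.5, 0.01])
```

The Bonferroni adjustment is conservative: it multiplies each P value by the number of pathways tested, so only strongly enriched pathways remain significant.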
Cross-reactive probes , probes on the sex chromo-
somes, probes that were not present in the Infinium
MethylationEPIC Array (Illumina), and probes with
missing values were excluded. In order to reduce the data
dimensionality, the R package “FeatureSelection” was
used, which is a wrapper to select features based on three
different algorithmic implementations suited to
high-dimensional data sets: glmnet, a package that fits a
generalized linear model via penalized maximum likelihood;
xgboost, which stands for “Extreme Gradient Boosting” and is a
fast implementation of the well-known boosted trees; and
ranger, a fast implementation of random forest, particularly
suited for high-dimensional data. In this step, highly
correlated features were removed.
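The removal of highly correlated features can be illustrated with a simple greedy filter. The text does not state the procedure or the cutoff that was used, so the threshold of 0.9, the function names, and the probe identifiers below are assumptions of this sketch.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def drop_correlated(features, threshold=0.9):
    """Keep a feature only if |r| with every already kept feature is <= threshold."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical probes: cg_b is a scaled copy of cg_a, so it is redundant.
probes = {
    "cg_a": [0.1, 0.2, 0.3, 0.4],
    "cg_b": [0.2, 0.4, 0.6, 0.8],
    "cg_c": [0.9, 0.1, 0.8, 0.2],
}
selected = drop_correlated(probes)
```

The result of a greedy filter depends on the order of the features, so in practice features would typically be visited in order of decreasing importance.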
Machine learning training
Five ML algorithms implemented in the R package caret
version 6.0-86 were used with the training dataset:
the ranger implementation of random forest (using 100
trees); support vector machines with radial basis function
kernel; ridge regression, which penalizes the sum of squared
coefficients (L2 penalty); Lasso regression, which penalizes
the sum of absolute values of the coefficients (L1 penalty);
and elastic net, a convex combination of ridge and Lasso.
In each case, 50-fold cross-validation resampling was used
for the optimization of the tuning parameters. Model prediction
errors were computed using mean absolute error
(MAE) and/or root mean squared error (RMSE). Goodness
of fit and significance of the applied regression models
were evaluated by computing Pearson’s correlation coefficient
using the training data. The optimal model was selected
as the one with the smallest RMSE. We also compared
the performance of the selected model with the pan-tissue
and skin and blood DNAm age predictors by using
the same parameters in a novel dataset consisting of 16
skin biopsy samples from different donors whose methylation
levels were assessed by the Infinium MethylationEPIC
Array (Illumina).
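The three performance measures named above can be computed as in the following Python sketch. The function names and the example ages are hypothetical and serve only to show how MAE, RMSE, and Pearson's correlation coefficient relate predicted to chronological age.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error; penalizes large errors more than MAE."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def pearson_r(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical chronological ages and model predictions (years).
chronological = [30, 45, 60, 75]
predicted = [32, 44, 63, 71]
model_mae = mae(chronological, predicted)
model_rmse = rmse(chronological, predicted)
model_r = pearson_r(chronological, predicted)
```

RMSE is never smaller than MAE, and the gap between the two grows when a few samples have large errors, which is one reason to report both.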
Cell culture and treatments
Primary human dermal fibroblasts derived from an
HGPS donor were obtained from The Progeria Research
Foundation Cell and Tissue Bank and cultured in Dulbecco’s
modified Eagle medium (Invitrogen), supplemented
with 10% v/v fetal bovine serum (FBS;
Invitrogen) and 1% v/v Penicillin-Streptomycin (Invitrogen).
Before reaching complete confluence, cells were
expanded using 0.25% Trypsin/EDTA (Gibco), followed
by inactivation of the enzyme using FBS-containing
medium. Primary human dermal fibroblasts derived
from healthy donors were purchased from Coriell Institute
for Medical Research and maintained in the same conditions.
ABT-263 (ApexBio, final concentration: 1.25 or 5 μM)
or Rapamycin (Fisher Scientific, final concentration: 100
nM) was added to the cell culture media of fibroblasts for
2D assays and maintained for 3 days. Samples were collected
for analysis after three days of recovery following
ABT-263 or Rapamycin removal.
β-galactosidase staining was performed using Senescence
β-Galactosidase Staining Kit (Cell Signaling, 9860S), fol-
lowing the manufacturer’s instructions. Cells were
washed with 1× PBS and fixed for 10–15 min. Then,
cells were washed 2 times with 1× PBS and incubated
with β-Galactosidase Staining Solution overnight at 37
°C in a dry incubator without CO2. Cells were then
stained with DAPI and observed at 10× magnification
(6D High Throughput, Nikon), and blue staining was quantified
as the mean color intensity relative to the total
number of cells using CellProfiler.
ATRX foci were detected by immunofluorescence, using
ATRX (Santa Cruz Biotechnology, D5 - sc55584) antibody
diluted 1:2000, followed by goat anti-mouse IgG H&L-
Alexa Fluor® 488 (Abcam, Cambridge, MA, ab150113), as
described previously . Briefly, cells were fixed with 4%
paraformaldehyde solution for 10 min. Permeabilization
was performed for 5 min with 0.1% Triton followed by
blocking for 40 min with 0.5% Tween and 1% BSA. The
primary antibody was incubated overnight at 4 °C. After
three washes with PBS, cells were incubated at room
temperature for 1 h with the secondary antibody +
Hoechst 33342. Cells were imaged at 40× magnification
using the IN Cell Analyzer 2500 (GE Healthcare). The
analysis was performed using the IN Cell Developer toolbox.
The average number of ATRX foci per cell was defined as the
total ATRX foci divided by the total nuclei. A minimum of 150 cells was
analyzed per experimental condition.
Ex-vivo skin samples and Treatment with Rapamycin
Skin samples from a healthy donor (Female, Caucasian,
79 years) were obtained from ZenBio (Research Triangle,
NC) and maintained in an air-liquid interface in Dulbecco’s
modified Eagle medium (Invitrogen, Carlsbad, CA),
supplemented with 10% v/v FBS. The skin samples were
treated with either vehicle or 100 nM Rapamycin (Fisher
Scientific, Hampton, NH) in the media, on days 1 and 3.
After 5 days, the samples were harvested and fixed in
formalin for histology, or used for RNA and DNA isolation.
RNA isolation and RT-qPCR
RNA was isolated from skin biopsies or cell culture sam-
ples using the Quick RNA Miniprep kit (Zymo Research,
Irvine, CA) and following the manufacturer instructions.
Total RNA was then quantified and 1 μg was used for
reverse transcription, using the high-capacity cDNA
Reverse Transcription Kit (Thermo Fisher Scientific).
qPCR was performed using PerfeCTa® qPCR ToughMix®,
Low ROX™ (QuantaBio) and the TaqMan (Invitrogen)
probes for CDKN2 (P16) (Hs00923894_m1), IL6
(Hs00174131_m1), IGFBP3 (Hs00181211_m1), B2M
(Hs00187842_m1), CXCL8 (IL8) (Hs00174103_m1),
MKi67 (Ki67) (Hs04260396_g1), HAS2 (Hs00193435_
m1), COL1A1 (Hs00164004_m1), and GAPDH
DNA sample acquisition and methylation analysis
Total DNA samples were obtained from cultured cells or
human skin biopsy samples (purchased from Genoskin,
Inc., France) using the QIAamp DNA Mini Kit (QIAGEN)
and following the manufacturer’s instructions. DNA methylation
assessment was performed by The University of British
Columbia (Vancouver, Canada) using the human
Illumina Infinium EPIC 850K chip. DNA samples are described
in Table S1 and include 16 skin biopsy samples,
3 samples of technical replicates of HGPS passage 11,
3 samples of technical replicates of HGPS passage 19, 3
samples of technical replicates of human primary fibroblasts
obtained from a 29-year-old donor, 3 samples of technical
replicates of human primary fibroblasts obtained
from an 84-year-old donor, 3 samples of technical replicates
of fibroblasts treated with 100 nM Rapamycin, 1.25 or
5 μM ABT-263, or non-treated, 5 skin biopsy samples
which were considered as untreated controls, and 5 skin
biopsy samples treated with 100 nM Rapamycin. The raw
image data was processed using the commands preprocessRaw()
followed by preprocessSWAN(). Methylation
signals (M values) were then converted to ratios using
ratioConvert() and next to beta values using getBeta(), all
functions implemented in the “minfi” R package version
1.32.0 . Beta values were normalized using the
betaqn() method, which quantile normalizes betas, implemented
by the “wateRmelon” package version 1.30.0 .
Normalized beta values were used for age estimation.
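For readers less familiar with these quantities, the sketch below shows the conventional relationship between methylated and unmethylated signal intensities, beta values, and M values (the log2 ratio of methylated to unmethylated signal). It is not the minfi code; the function names, the offset of 100, and the intensities are assumptions made for illustration.

```python
import math


def beta_value(meth, unmeth, offset=100):
    """Beta value: fraction of methylated signal (offset stabilizes low intensities)."""
    return meth / (meth + unmeth + offset)


def m_from_beta(beta):
    """M value corresponding to a beta value (logit on a log2 scale)."""
    return math.log2(beta / (1.0 - beta))


def beta_from_m(m):
    """Inverse transformation: beta value from an M value."""
    return 2.0 ** m / (2.0 ** m + 1.0)


# Hypothetical probe intensities for one CpG site.
beta = beta_value(meth=900, unmeth=0)
m = m_from_beta(0.8)
round_trip = beta_from_m(m)
```

Beta values lie between 0 and 1 and are easy to interpret as methylation fractions, which is why they are the input expected by the age predictors described here.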
Data was tested for normal distribution using the
Shapiro-Wilk test. In cases where more than 2 groups
were compared, we performed one-way ANOVA
followed by Bonferroni’s multiple comparisons test. For
cases where paired samples were provided, a paired t test
was performed. P values ≤ 0.05 were considered statistically
significant. Analyses were performed using
GraphPad Prism 8 or R version 3.6.3.
Skin-specific DNAm predictor
The algorithm described here is available to the scientific
community through an Application Programming Interface
(API) that can be accessed by the link: www.mol-
clock.com. As an input, users will upload their matrix
containing pre-processed and normalized methylation
levels (beta values) measured on the Illumina BeadChip
platform and will receive as output a table with the pre-
dicted DNAm age.
Supplementary information accompanies this paper at https://doi.org/10.
Additional file 1. Supplementary Figure 1. Experimental design. (A)
Schematic representation of the workflow used for constructing and
validating the skin-specific DNAm age estimator. (B) Schematic representa-
tion of the skin-specific molecular clock applications envisioned in this work.
Additional file 2. Supplementary Table 1. Description of all data used in
Additional file 3. Supplementary Figure 2: Normalization quality control.
Density plots showing the distribution of methylation beta values before
and after normalizing all three datasets (E-MTAB-4385, GSE51954, and
GSE90124) by quantile.
Additional file 4. Supplementary Figure 3: Pre-selected probes for algo-
rithm training. Top 50 probes ranked according to different algorithms
implementations in the feature selection step. Probes that were cross-
reactive, targeting sex chromosomes or were not present in the current
version of EPIC array were excluded, as described in the methodology
Additional file 5. Supplementary Figure 4. Age distribution of samples
in training and testing datasets. Samples were randomly distributed
between training and testing datasets, following a balanced distribution
according to donor age.
Additional file 6. Supplementary Figure 5. Comparative analysis of
machine learning algorithm performance. Machine Learning (ML)
algorithms random forest (rf), support vector machines (svm), lasso,
elastic net (enet) and ridge were compared according to their
performance, as assessed by Mean Absolute Error (MAE), Root mean
squared error (RMSE) and maximum R².
Additional file 7. Supplementary Table 2. Age estimations comparison
among three DNAm age predictors for 16 biopsies samples (external
Additional file 8. Supplementary Table 3. The enrichment method Over
Representation Analysis (ORA) was performed using the Kyoto
Encyclopedia of Genes and Genomes (KEGG) database. p values were
corrected to control for FDR using the Bonferroni method and only tests
with p < 0.05 were considered significant.
Additional file 9. Supplementary Figure 6. Probes shared by different
DNAm age predictors. (A) Analysis of common probes among the Skin-
Specific, the Skin & Blood, and the Pan-Tissue DNAm age predictors. (B)
Heat map of DNA methylation levels of the eight probes shared by the
three DNAm algorithms. Color codes represent beta DNAm values after
row-wise z-score transformation. Probes (rows) are ordered according to
their importance. Samples were ordered according to their age. (C) Ex-
pression among age groups of genes associated with the shared probes
Additional file 10. Supplementary Table 4. List of probes shared by the
Skin-Specific, the Skin & Blood (H2), and the Pan-Tissue (H1) DNAm age
Additional file 11. Supplementary Figure 7. DNAm age estimation
using the Skin & Blood algorithm. Similar to the analysis performed with
the Skin-Specific DNAm age predictor, the Skin & Blood DNAm estimator
was used to calculate the DNAm age of (A) primary human dermal fibro-
blasts obtained from healthy donors from increasing chronological age.
(B) DNAm age of primary human dermal fibroblasts derived from an
HGPS donor increased with the cell passage. (C) DNAm age of human
psoriatic (PP) and paired uninvolved psoriatic (PN) skin tissues
(GSE73894). (D) DNAm age residuals of normal epidermis tissues, AK sam-
ples, and cSCC epidermis samples (E-MTAB-5738 data). (E) DNAm age
residuals of primary human dermal fibroblasts treated with OSKMLN re-
programming factors and controls (Ctrl) (GSE142439 data). (F) DNAm age
of primary human dermal fibroblasts derived from HGPS donor treated
with ABT-263 (ABT) at 1.25 and 5 μM, as well as 100 nM of Rapamycin
(Rapa) for three days. Untreated cells were considered as controls (Ctrl).
(G) DNAm age of human skin biopsies treated with 100 nM Rapamycin
(Rapa) for five days and untreated controls (Ctrl).
ABT: ABT-263; SA-BGal: Senescence-associated beta-galactosidase staining;
DNAm: DNA methylation; FDR: False discovery rate; HGPS: Hutchinson-Gilford
Progeria; H&E: Hematoxylin and eosin; KEGG: Kyoto Encyclopedia of Genes
and Genomes; MAE: Mean absolute error; ML: Machine learning; ORA: Over
representation analysis; PP: Psoriatic skin tissue; PN: Control non-psoriatic skin
tissue; Rapa: Rapamycin; RMSE: Root mean squared error; SASP: Senescence-
associated secretory phenotype; SD: Standard deviation; UC: normal skin
We would like to thank Marcos Sanderlei de Oliveira, Fabrizzio Yamamoto
Zampieri, and Alexandre Kleis for their commitment to helping us develop
the API. They were essential in this step. We also acknowledge Mylieneth
Guiang and Daniel Foyt for helping with cell culture and lab maintenance
and Bailey Marshall for testing the R-code.
Conception and design: MB, AZ, CR, LB, EA, and JC. Wet lab experiments: AZ,
LB, and KA. Data analysis: MB. Drafting the article: JC, AZ, and MB. Final
revision: All authors. The authors read and approved the final manuscript.
The present study was funded by OneSkin Technologies.
Availability of data and materials
The five datasets produced for this work have been uploaded to the GEO
database under the accession number GSE151617 (reference Series).
Ethics approval and consent to participate
Consent for publication
MB, AZ, CR, LB, EA, and JC are named as inventors of a provisional patent
directed at this invention, which is solely owned by OneSkin Technologies.
MB, AZ, CR, EA, and JC are co-founders of OneSkin Technologies.
Bioinformatics and Computational Biology Lab, Division of Experimental and
Translational Research, Brazilian National Cancer Institute, Rio de Janeiro, RJ
OneSkin Technologies, San Francisco, USA.
Sciences and Biotechnology Program, Catholic University of Brasilia, Brasilia,
Faculty of Medicine, University of Brasilia, Brasilia, Brazil.
Received: 27 April 2020 Accepted: 3 July 2020
1. López-Otín C, Blasco MA, Partridge L, Serrano M, Kroemer G. The hallmarks
of aging. Cell. 2013. 1194–1217. doi:https://doi.org/10.1016/j.cell.2013.05.039.
2. Horvath S. DNA methylation age of human tissues and cell types. Genome
Biol. 2013. p. R115. doi:https://doi.org/10.1186/gb-2013-14-10-r115.
3. Heyn H, Moran S, Esteller M. Aberrant DNA methylation profiles in the
premature aging disorders Hutchinson-Gilford Progeria and Werner
syndrome. Epigenetics. 2013;8:28–33.
4. Horvath S, Garagnani P, Bacalini MG, Pirazzini C, Salvioli S, Gentilini D, et al.
Accelerated epigenetic aging in Down syndrome. Aging Cell. 2015;14:491–5.
5. Jacob JA. Men with HIV age faster according to DNA methylation study.
6. Boehncke W-H. Systemic inflammation and cardiovascular comorbidity in
psoriasis patients: Causes and Consequences. Front Immunol. 2018;9:579.
7. Yuan M, Cao W-F, Xie X-F, Zhou H-Y, Wu X-M. Relationship of atopic
dermatitis with stroke and myocardial infarction: A meta-analysis. Medicine.
8. Quach A, Levine ME, Tanaka T, Lu AT, Chen BH, Ferrucci L, et al. Epigenetic
clock analysis of diet, exercise, education, and lifestyle factors. Aging . 2017;
9. Marioni RE, Shah S, McRae AF, Chen BH, Colicino E, Harris SE, et al. DNA
methylation age of blood predicts all-cause mortality in later life. Genome
10. Gorgoulis V, Adams PD, Alimonti A, Bennett DC, Bischof O, Bishop C, et al.
Cellular senescence: defining a path forward. Cell. 2019;179:813–27.
11. Matsuyama M, WuWong DJ, Horvath S, Matsuyama S. Epigenetic clock
analysis of human fibroblasts in vitro: effects of hypoxia, donor age, and expression
of hTERT and SV40 largeT. Aging. 2019;11:3012–22.
12. Sturm G, Cardenas A, Bind M-A, Horvath S, Wang S, Wang Y, et al. Human
aging DNA methylation signatures are conserved but accelerated in
cultured fibroblasts. Epigenetics. 2019;14:961–76.
13. Koch CM, Joussen S, Schellenberg A, Lin Q, Zenke M, Wagner W. Monitoring
of cellular senescence by DNA-methylation at specific CpG sites. Aging Cell.
14. Xie W, Kagiampakis I, Pan L, Zhang YW, Murphy L, Tao Y, et al. DNA
Methylation Patterns Separate Senescence from Transformation Potential
and Indicate Cancer Risk. Cancer Cell. 2018;33:309–321.e5.
15. Niedernhofer LJ, Robbins PD. Senotherapeutics for healthy ageing. Nature
reviews. Drug Des Discov. 2018:377.
16. Bormann F, Rodríguez-Paredes M, Hagemann S, Manchanda H, Kristof B,
Gutekunst J, et al. Reduced DNA methylation patterning and transcriptional
connectivity define human skin aging. Aging Cell. 2016;15:563–71.
17. Parrado C, Mercado-Saenz S, Perez-Davo A, Gilaberte Y, Gonzalez S, Juarranz
A. Environmental Stressors on Skin Aging. Mechanistic Insights. Front
18. Marioni RE, Belsky DW, Deary IJ, Wagner W. Association of facial ageing with DNA
methylation and epigenetic age predictions. Clin Epigenetics. 2018;10:140.
19. Horvath S, Oshima J, Martin GM, Lu AT, Quach A, Cohen H, et al. Epigenetic
clock for skin and blood cells applied to Hutchinson Gilford Progeria
Syndrome and ex vivo studies. Aging. 2018;10:1758–75.
20. Chen Y-A, Lemire M, Choufani S, Butcher DT, Grafodatskaya D, Zanke BW, et al.
Discovery of cross-reactive probes and polymorphic CpGs in the Illumina
Infinium HumanMethylation450 microarray. Epigenetics. 2013;8:203–9.
21. Mirza B, Wang W, Wang J, Choi H, Chung NC, Ping P. Machine learning and
integrative analysis of biomedical big data. Genes. 2019;10. https://doi.org/
22. Zhou F, Wang W, Shen C, Li H, Zuo X, Zheng X, et al. Epigenome-wide
association analysis identified nine skin DNA methylation loci for psoriasis. J
Invest Dermatol. 2016;136:779–87.
23. Shen C, Wen L, Ko R, Gao J, Shen X, Zuo X, et al. DNA methylation age is
not affected in psoriatic skin tissue. Clin Epigenetics. 2018. https://doi.org/
24. Rodríguez-Paredes M, Bormann F, Raddatz G, Gutekunst J, Lucena-Porcel C,
Köhler F, et al. Methylation profiling identifies two subclasses of squamous
cell carcinoma related to distinct cells of origin. Nat Commun. 2018;9:577.
25. Sarkar TJ, Quarta M, Mukherjee S, Colville A, Paine P, Doan L, et al. Transient
non-integrative expression of nuclear reprogramming factors promotes
multifaceted amelioration of aging in human cells. Nat Commun. 2020;11:1545.
26. Chang J, Wang Y, Shao L, Laberge R-M, Demaria M, Campisi J, et al.
Clearance of senescent cells by ABT263 rejuvenates aged hematopoietic
stem cells in mice. Nat Med. 2016;22:78–83.
27. Demidenko ZN, Zubova SG, Bukreeva EI, Pospelov VA, Pospelova TV,
Blagosklonny MV. Rapamycin decelerates cellular senescence. Cell Cycle.
28. Kovatcheva M, Liao W, Klein ME, Robine N, Geiger H, Crago AM, et al. ATRX
is a regulator of therapy induced senescence in human cells. Nat Commun.
29. Chung CL, Lawrence I, Hoffman M, Elgindi D, Nadhan K, Potnis M, et al.
Topical rapamycin reduces markers of senescence and aging in human skin:
an exploratory, prospective, randomized trial. Geroscience. 2019;41:861–9.
30. Bell CG, Lowe R, Adams PD, Baccarelli AA, Beck S, Bell JT, et al. DNA
methylation aging clocks: challenges and recommendations. Genome Biol.
31. Hannum G, Guinney J, Zhao L, Zhang L, Hughes G, Sadda S, et al. Genome-
wide methylation profiles reveal quantitative views of human aging rates.
Mol Cell. 2013;49:359–67.
32. Lu AT, Quach A, Wilson JG, Reiner AP, Aviv A, Raj K, et al. DNA methylation
GrimAge strongly predicts lifespan and healthspan. Aging. 2019;11:303–27.
33. Tobin DJ. Introduction to skin aging. J Tissue Viability. 2017;26:37–46.
34. Dimri GP, Lee X, Basile G, Acosta M, Scott G, Roskelley C, et al. A biomarker
that identifies senescent human cells in culture and in aging skin in vivo.
Proc Natl Acad Sci U S A. 1995;92:9363–7.
35. Ressler S, Bartkova J, Niederegger H, Bartek J, Scharffetter-Kochanek K,
Jansen-Durr P, et al. p16INK4A is a robust in vivo biomarker of cellular aging
in human skin. Aging Cell. 2006:379–89. https://doi.org/10.1111/j.1474-9726.
36. Wang AS, Dreesen O. Biomarkers of cellular senescence and skin aging.
Front Genet. 2018;9:247.
37. Lai V, Cranwell W, Sinclair R. Epidemiology of skin cancer in the mature
patient. Clin Dermatol. 2018;36:167–76.
38. Olesen MS, Starnawska A, Bybjerg-Grauholm J, Bielfeld AP, Agerholm I,
Forman A, et al. Biological age of the endometrium using DNA methylation.
39. Wang Z, Zhang L, Liang Y, Zhang C, Xu Z, Zhang L, et al. Cyclic AMP
mimics the anti-ageing effects of calorie restriction by up-regulating sirtuin.
Sci Rep. 2015;5:12012.
40. Ye L, Mauro TM, Dang E, Wang G, Hu LZ, Yu C, et al. Topical applications of
an emollient reduce circulating pro-inflammatory cytokine levels in
chronically aged humans: a pilot clinical study. J Eur Acad Dermatol
41. Hu L, Mauro TM, Dang E, Man G, Zhang J, Lee D, et al. Epidermal
dysfunction leads to an age-associated increase in levels of serum
inflammatory cytokines. J Investig Dermatol. 2017:1277–85. https://doi.org/
42. Mercurio L, Lulli D, Mascia F, Dellambra E, Scarponi C, Morelli M, et al.
Intracellular insulin-like growth factor binding protein 2 (IGFBP2) contributes
to the senescence of keratinocytes in psoriasis by stabilizing cytoplasmic
p21. Aging. 2020;12:6823–51.
43. Basisty N, Kale A, Jeon OH, Kuehnemann C, Payne T, Rao C, et al. A
proteomic atlas of senescence-associated secretomes for aging biomarker
development. PLoS Biol. 2020;18:e3000599.
44. Davis S, Meltzer PS. GEOquery: a bridge between the Gene Expression
Omnibus (GEO) and BioConductor. Bioinformatics. 2007;23:1846–7.
45. Aryee MJ, Jaffe AE, Corrada-Bravo H, Ladd-Acosta C, Feinberg AP, Hansen KD, et al.
Minfi: a flexible and comprehensive Bioconductor package for the analysis of
Infinium DNA methylation microarrays. Bioinformatics. 2014;30:1363–9.
46. Pidsley R, Wong CCY, Volta M, Lunnon K, Mill J, Schalkwyk LC. A data-driven
approach to preprocessing Illumina 450K methylation array data. BMC
47. Leek JT, Johnson WE, Parker HS, Fertig EJ, Jaffe AE, Zhang Y, Storey JD,
Torres LC (2020). sva: Surrogate variable analysis. R package version 3.36.0.
[cited 22 Jun 2020]. Available: http://bioconductor.org/packages/sva/.
48. Xu J, Spitale RC, Guan L, Flynn RA, Torre EA, Li R, et al. Novel gene
expression profile of women with intrinsic skin youthfulness by whole
transcriptome sequencing. PLoS One. 2016;11:e0165913.
49. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina
sequence data. Bioinformatics. 2014;30:2114–20.
50. Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, et al. STAR:
ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29:15–21.
51. Anders S, Pyl PT, Huber W. HTSeq--a Python framework to work with high-
throughput sequencing data. Bioinformatics. 2015;31:166–9.
52. Love MI, Huber W, Anders S. Moderated estimation of fold change and
dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15:550.
53. Wickham H. ggplot2: Elegant Graphics for Data Analysis: Springer; 2016.
54. Liao Y, Wang J, Jaehnig EJ, Shi Z, Zhang B. WebGestalt 2019: gene set
analysis toolkit with revamped UIs and APIs. Nucleic Acids Res. 2019;47:
55. Kuhn M. caret: classification and regression training [R package caret version
6.0-71]. 2016. [cited 22 Jun 2020]. Available: https://CRAN.R-project.org/
Boroni et al. Clinical Epigenetics (2020) 12:105 Page 16 of 16