Linguistic markers predict onset of Alzheimer’s disease
*, Sachin Mathur
, Mar Santamaria
, Guillermo Cecchi
*, Melissa Naylor
IBM Thomas J. Watson Research Center, IBM Research, Yorktown Heights, NY 10598, United States
Pﬁzer Worldwide Research and Development, Cambridge, MA 02139, United States
Received 27 April 2020
Revised 19 September 2020
Accepted 22 September 2020
Available online 22 October 2020
Background: The aim of this study is to use classiﬁcation methods to predict future onset of Alzheimer’s dis-
ease in cognitively normal subjects through automated linguistic analysis.
Methods: To study linguistic performance as an early biomarker of AD, we performed predictive modeling of
future diagnosis of AD from a cognitively normal baseline of Framingham Heart Study participants. The lin-
guistic variables were derived from written responses to the cookie-theft picture-description task. We com-
pared the predictive performance of linguistic variables with clinical and neuropsychological variables. The
study included 703 samples from 270 participants out of which a dataset consisting of a single sample from
80 participants was held out for testing. Half of the participants in the test set developed AD symptoms
before 85 years old, while the other half did not. All samples in the test set were collected during the cogni-
tively normal period (before MCI). The mean time to diagnosis of mild AD was 7.59 years.
Findings: Signiﬁcant predictive power was obtained, with AUC of 0.74 and accuracy of 0.70 when using lin-
guistic variables. The linguistic variables most relevant for predicting onset of AD have been identiﬁed in the
literature as associated with cognitive decline in dementia.
Interpretation: The results suggest that language performance in naturalistic probes exposes subtle early signs
of progression to AD in advance of clinical diagnosis of impairment.
Funding: Pﬁzer, Inc. provided funding to obtain data from the Framingham Heart Study Consortium, and to
support the involvement of IBM Research in the initial phase of the study. The data used in this study was
supported by Framingham Heart Study’s National Heart, Lung, and Blood Institute contract (N01-HC-25195),
and by grants from the National Institute on Aging (R01-AG016495, R01-AG008122) and the National
Institute of Neurological Disorders and Stroke (R01-NS017950).
© 2020 Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license
A key priority in Alzheimer’s disease (AD) research is the identiﬁca-
tion of early intervention strategies that will decrease the risk, delay the
onset, or slow the progression of disease. Early interventions can only
be effectively tested and implemented if the population that stands to
benefit can be identified. While many variables have been associated
with risk of AD, there is still a great need for the development of cheap,
reliable biomarkers of preclinical AD. Aging-related cognitive decline
manifests itself in almost all aspects of language comprehension and
production. Even seemingly mundane linguistic abilities, such as object
naming, engage extensive brain networks . As a result, these linguis-
tic abilities can easily be disrupted, which makes language competence
a sensitive indicator of mental dysfunction. The inﬂuential Nun Study
 provided initial evidence of a correlation between lower linguistic
performance early in life and higher incidence of cognitive decline and
conversion rates to AD late in life.
The aim of this study is to test to what extent linguistic perfor-
mance at a single time point can be utilized as a prognostic marker of
conversion to AD. We used data from the Framingham Heart Study
(FHS) , a large cohort longitudinal study spanning several decades.
As a part of FHS, qualifying participants were administered a neuro-
psychological (NP) test battery in successive visits , which
included the cookie-theft picture description task (CTT) from the Bos-
ton Aphasia Diagnostic Examination . Picture description tasks are
used to assess discourse in subjects with disorders such as aphasia
and dementia, and CTT has become the most frequently used picture
description task in clinical settings . We applied computational
techniques to extract linguistic variables from written responses to
the CTT and compared their prognostic value with that of more tradi-
tional clinical variables that could easily be obtained in the screening
period of a clinical trial, including NP test scores, demographic and
genetic information, and medical history. Using the variables
obtained when the participants were assessed to be cognitively
normal, we developed models to predict whether or not a particular
participant will develop MCI due to AD on or before 85 years old.
* Corresponding authors.
E-mail addresses: firstname.lastname@example.org (E. Eyigoz), email@example.com
(G. Cecchi), firstname.lastname@example.org (M. Naylor).
2589-5370/© 2020 Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
EClinicalMedicine 28 (2020) 100583
journal homepage: https://www.journals.elsevier.com/eclinicalmedicine
Our work signiﬁcantly differs from the current literature on pre-
dicting future onset of AD in the following ways: First, our prediction
is based on data collected while the participants were cognitively
healthy. Second, we focus exclusively on variables readily attainable
as part of the screening phase of an early-intervention trial and assess
predictive performance using only linguistic metrics derived from a
single administration of the Cookie Theft Task, a relatively simple and
naturalistic language probe. Third, we utilized a machine learning
approach to deal with a multivariate representation of linguistic per-
formance. Finally, we compare the predictive ability of language fea-
tures with that of more traditional variables associated with
identification of high risk for AD (e.g., for inclusion in a clinical trial of a
potentially disease-modifying therapy).
2.1. Cognitive assessment in the Framingham Heart Study
The FHS is a well-documented, community-based cohort study
initiated in 1948, with the purpose of longitudinal monitoring of participants’
health [3,9]. Cognitive status monitoring of the original
cohort began in 1975, and since 1981 the participants’ cognitive status
has been assessed with the Mini-Mental State Examination
(MMSE) at examinations taking place every 4 years [4,11]. Participants
in the offspring cohort have undergone MMSEs since
1991, and have undergone NP examinations every 5 or 6 years since
1999 . Annual neurologic and neuropsychological examinations
were performed when cognitive decline was reported by a family
member of the participant, upon referral by a physician or by the
investigators of the FHS, or after review of the participant’s medical
records . Cognitive status monitoring of the participants was
reviewed by the Institutional Review Board of Boston University, and
informed consent was obtained from the participants.
The neuropsychological test battery resulted in a dementia rating,
which represents the impression of the examiner who administered
the test battery . The test battery included the cookie-theft pic-
ture description task (CTT) from the Boston Aphasia Diagnostic Exam-
ination, in which participants were asked to write down the
description of the cookie-theft picture. As highlighted above, picture
description tasks are commonly used to assess discourse in subjects
with disorders such as aphasia and dementia, and, given its sensi-
tivity to cognitive impairments, CTT has become the most frequently
used picture description task in clinical settings . The FHS study
participants who qualiﬁed for inclusion in our study were among the
oldest participants of the FHS, mostly from the original cohort, which
was limited in its representativeness of the wider population .
A dementia-review panel with at least one neurologist and one
neuropsychologist reviewed possible cognitive decline and dementia
cases documented in the FHS [12,13]. Diagnosis of dementia was
based on criteria from DSM-IV, and diagnosis of Alzheimer’s disease
was based on criteria from NINCDS-ADRDA [15,16].
2.2. Predictive modeling approach
To ﬁt predictive models of future diagnosis of AD, we had to deter-
mine which participants to label as cases and which to label as con-
trols. FHS participants varied in terms of whether their data was
comprehensively reviewed by a panel of experts to determine
dementia and AD status. We ﬁrst identiﬁed a clinically deﬁned test
set by using these dementia reviews to label cases, selecting one CTT
sample from each case, and matching it to a CTT sample collected in a
control of approximately the same age, gender and level of education.
Because the FHS data available to us included a dementia review
for only 39% of participants, only 80 of the participants qualiﬁed for
inclusion in this test set. This left a very large number of participants
unused. While most of the participants did not have dementia review
data allowing for deﬁnitive labeling of cases, a dementia rating was
available for the majority of the administrations of the neuropsycho-
logical test-battery. Using these dementia ratings, we used additional
participants to create a training set. In semi-supervised learning ter-
minology, the clinical dementia-review provided the ground-truth
labels of the test data, whereas the dementia ratings provided the
weak labels of the training data. This weakly-labeled training set was
used only for machine learning training.
We validated predictive models in two ways: the hold-out method
and the cross-validation method. For the hold-out method, we made use
of the weakly-labeled training data by ﬁtting the model to weakly
labeled training data and then validating it on the held-out ground-truth
test data. For the cross-validation method we implemented 20-fold cross
validation on the test data (see the Supplementary Material for details).
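As an illustration of the cross-validation setting, the following minimal sketch splits 80 test samples into 20 folds. It is not the authors' implementation: the function names, the fixed random seed, and the absence of stratification by class are assumptions made for this example.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Split range(n_samples) into k shuffled folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at position i.
    return [indices[i::k] for i in range(k)]


def cross_validation_splits(n_samples, k, seed=0):
    """Return a list of (train_indices, test_indices) pairs, one per fold."""
    folds = k_fold_indices(n_samples, k, seed)
    splits = []
    for i, test_fold in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test_fold))
    return splits


# 80 ground-truth samples and 20 folds, as in the cross-validation setting.
splits = cross_validation_splits(n_samples=80, k=20)
```

With 80 samples and 20 folds, each model is fitted on 76 samples and evaluated on the remaining 4.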
2.3. Selection of participants and samples
In this study, the onset of AD was deﬁned as the onset of mild cog-
nitive impairment (MCI) in a participant who later received a diagno-
sis of AD. MCI is a heterogeneous condition; however, for those MCI
patients who eventually convert to AD, MCI is considered by many to
represent early-stage AD. AD patients who developed MCI
on or before age 85 (denoted as ≤85) were defined as cases.
We defined the normal-aging group as the participants who were
recorded to be dementia-free on or after age 85 (≥85). The control
group was deﬁned as the combination of the normal-aging group and
AD patients whose onset of cognitive impairment was after 85 (>85)
years old, as depicted in Fig. 1. According to this deﬁnition, all cases
have already developed cognitive impairment due to AD at 85, and
none of the controls have developed cognitive impairment due to AD at
85. Age 85 was chosen as a threshold, because this threshold was the
optimum age to provide the largest balanced test set from the FHS data
that was available to this study. As the age threshold increases, fewer participants
qualify to be controls, and more participants qualify to be cases.
Conversely, as the age threshold decreases, more participants qualify to
be controls, and fewer participants qualify to be cases. In addition to providing
the largest test set from FHS, age 85 has been widely used as a
threshold to deﬁne oldest-old in AD studies . It has been suggested
that very-late onset AD (VLOAD), as deﬁned by AD onset after the sec-
ond half of the ninth decade differs from earlier-onset AD with respect
to genetic and environmental patterns: Genetic risk factors for AD are
more inﬂuential at relatively earlier ages with decreasing inﬂuence as
age increases; while environmental factors may play a larger role in
developing VLOAD .
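The case and control definitions above can be summarized in a short sketch. The function name, the argument names, and the use of `None` for missing information are hypothetical conventions chosen for this example, not part of the FHS data format.

```python
AGE_THRESHOLD = 85


def label_participant(impairment_onset_age, last_dementia_free_age,
                      threshold=AGE_THRESHOLD):
    """Return 'case', 'control', or None if the participant does not qualify.

    impairment_onset_age: age at onset of MCI due to AD, or None if the
        participant has no recorded conversion to AD.
    last_dementia_free_age: oldest age at which the participant was recorded
        to be dementia-free, or None if unknown.
    """
    if impairment_onset_age is not None:
        # AD patients: onset on or before the threshold makes a case,
        # later onset places the participant in the control group.
        return "case" if impairment_onset_age <= threshold else "control"
    if last_dementia_free_age is not None and last_dementia_free_age >= threshold:
        # Normal-aging group: dementia-free on or after the threshold.
        return "control"
    # Neither converted nor followed long enough to qualify as a control.
    return None
```

For instance, `label_participant(88, None)` returns `"control"`, reflecting that AD patients with onset after 85 are grouped with the normal-aging participants.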
The test data set included only one sample per participant, and
each case sample was matched to a control sample using age (±2 years), gender,
and education. As our purpose was to predict conversion in cognitively
normal subjects, we included only samples collected prior to
any cognitive impairment onset. Samples from participants who did
not meet criteria for inclusion as ground-truth were used for training;
see the Supplementary Material for the details. Fig. 2 shows a dia-
gram summarizing the selection of participants and samples for the
test set and the weakly-labeled training set. The demographics of
participants in the test and training data sets can be seen in Table 1.
For the test cases, the mean time to diagnosis with mild AD from cog-
nitive normality was 7.59 years with standard deviation of 4.91, and
the mean time to cognitive impairment onset from cognitive normal-
ity was 3.93 years with standard deviation of 3.69.
2.4. Psycholinguistic analyses
In this section, we provide an overview of the psycholinguistic
analyses that were performed automatically for this study. See the
Supplementary Material for the variables computed for the analyses
presented in this section.
Fig. 1. The diagram depicts the method for selection of cases vs controls and the predictive model setting. Participants who developed MCI due to AD on or before age 85 were selected as
cases, and participants who remained dementia-free until age 85 were selected as controls. The three predictive models included only non-linguistic variables, only linguistic variables,
or both (see Table 3), collected when participants were considered cognitively normal, and were trained to predict conversion status by age 85 vs. later or no conversion.
Fig. 2. Method of selection of participants for creating a test data set and a training data set from the FHS data. The available data consisted of 3113 samples from 1254 participants,
486 of whom have been reviewed by a panel for dementia status. The participants who were reviewed by the panel were candidates for creating a test data set. Their samples were
filtered according to the inclusion criteria, and then the qualifying samples were passed through age, education and gender matching. This resulted in a test set of 80 samples.
The participants who were not reviewed by the dementia review panel were used for creating a larger weakly-labeled data set, only for the purpose of machine learning training.
Validation of predictive modeling consisted of the hold-out method (train on weak-labels, test on ground-truth), and cross-validation (train on ground-truth, test on ground-truth).
Verbosity, lexical richness, and repetitiveness were assessed by
using metrics such as number of words, number of unique words,
and frequencies of repetitions (Fig. 3). Misspellings, use of punctua-
tion, and uppercasing were analyzed to assess writing performance
and style. Language-modeling analyses were performed to model the
distributions of word sequences. Syntactic complexity was assessed
through analysis of parse trees. Semantic content was assessed
through analysis of participants’ mention of information content
units. Finally, propositional idea density analysis was used to quantify
syntactic and semantic complexity.
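A few of the simpler surface metrics named above (number of words, number of unique words, repetitions, punctuation and uppercasing) could be computed as in the sketch below. The tokenization rule, the feature names, and the sample sentence are assumptions made for illustration; the variables actually used in the study are listed in the Supplementary Material.

```python
import re
from collections import Counter


def surface_features(text):
    """Compute simple verbosity, lexical richness and style metrics."""
    # Assumed tokenization: runs of letters and apostrophes count as words.
    words = re.findall(r"[A-Za-z']+", text)
    lowered = [w.lower() for w in words]
    counts = Counter(lowered)
    n_words = len(lowered)
    n_unique = len(counts)
    return {
        "n_words": n_words,
        "n_unique_words": n_unique,
        "type_token_ratio": n_unique / n_words if n_words else 0.0,
        # Tokens that repeat a word already used earlier in the sample.
        "n_repeated_tokens": n_words - n_unique,
        "n_punctuation": sum(ch in ".,;:!?" for ch in text),
        # Fully uppercased words of more than one letter.
        "n_uppercase_words": sum(w.isupper() and len(w) > 1 for w in words),
    }


# Hypothetical description, not taken from the FHS data.
sample = "The boy is on the stool. The stool is falling."
features = surface_features(sample)
```

Syntactic, semantic and language-modeling variables require a parser, a list of information content units, and a language model respectively, and are not covered by this sketch.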
2.5. Non-linguistic variables
The non-linguistic variables are age, gender, education (dichoto-
mized as college degree vs. no college degree, and high-school vs. no
high-school degree), number of APOE e4 alleles, two binary indicator
variables capturing evidence of hypertension or diabetes, and varia-
bles resulting from the NP tests. The NP tests used in this study
include assessment of visuospatial and executive reasoning, object
naming, memory, attention, abstraction, and language skills. A total
of 13 NP tests, as listed in Table 2, resulting in 32 NP variables (see
Supplementary Table 6) were used in this study. Consequently, the
comprehensiveness of the neuropsychological assessment used in this
study surpasses that of concise assessments, such as the Montreal Cognitive
Assessment (MoCA). The clinical measures MMSE and the dementia
ratings were not included in the models, as all samples in the
ground-truth labeled test set, for both controls and cases, were col-
lected during the periods of cognitive normality and had no signiﬁ-
2.6. Variable selection and training of predictive models
In total, 87 linguistic variables were computed (see Supplemen-
tary Table 6). Two NP test scores were excluded, because they were
missing for more than half of the samples, leaving 31 NP test scores,
three clinical and two demographic variables as non-linguistic varia-
bles. Before training predictive models, variable selection was per-
formed strictly on the training data by using a univariate test between
the preclinical AD cases and the control groups for each variable and
eliminating variables that were not statistically significant (p ≥ 0.05).
The t-test was used in the cross-validation experiments and
the Wilcoxon signed rank test was used in the hold-out experiments.
The use of different univariate tests for different experiment conditions
was justified due to the difference in data size and the noise in the
weak labels. See Supplementary Material for details of the variable
selection for the hold-out and the cross-validation methods. For
training predictive models, we experimented with linear SVM, logistic
regression and Naïve Bayes classifiers. The hyperparameters of the
classifiers were set using nested cross-validation.
Fig. 3. CTT examples from FHS, including an unimpaired sample (a), an impaired sample showing telegraphic speech and lack of punctuation (b), and an even more impaired sample
showing in addition significant misspellings and minimal grammatical complexity, e.g. lack of subjects (c).
Table 1. Age demographics and education level of the ground-truth labeled and weakly-labeled data sets. The
number of samples and the number of participants are the same in the ground-truth labeled set, as only one
sample per participant was included.
                       Ground-truth labels                      Weak labels
                       Age (mean ± SD)   Participants/Samples   Age (mean ± SD)   Samples   Participants
Control   Female       78.86 ± 6.01      22                     84.34 ± 5.18      326       86
          Male         79.0 ± 4.39       18                     83.72 ± 4.99      191       46
Case      Female       78.45 ± 5.36      22                     71.79 ± 5.1       61        29
          Male         79.17 ± 4.29      18                     73.76 ± 5.43      45        29
Control   No college   78.74 ± 5.89      23                     83.83 ± 5.14      204       60
          College      79.18 ± 4.48      17                     84.36 ± 5.08      313       72
Case      No college   78.22 ± 5.19      23                     72.59 ± 5.71      22        16
          College      79.53 ± 4.42      17                     72.63 ± 5.23      84        42
Total                                    80 / 80                                  623       190
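The univariate variable-selection step of Section 2.6 can be sketched as follows. This is a simplified stand-in rather than the procedure used in the study: it computes Welch's t statistic and approximates the two-sided p-value with a standard normal distribution (an assumption made to keep the example dependency-free; it is inaccurate for small samples), and the variable names and values are invented.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


def approx_p_value(t):
    """Two-sided p-value, using a normal approximation to the t distribution."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


def select_variables(cases, controls, alpha=0.05):
    """Keep the variables whose case/control difference has p < alpha.

    cases, controls: dicts mapping a variable name to a list of values,
    computed on the training data only.
    """
    kept = []
    for name in cases:
        p = approx_p_value(welch_t(cases[name], controls[name]))
        if p < alpha:
            kept.append(name)
    return kept


# Invented training values for two hypothetical variables.
cases = {"misspellings": [3, 4, 5, 4, 6, 5], "n_words": [30, 42, 35, 38, 33, 40]}
controls = {"misspellings": [0, 1, 0, 2, 1, 0], "n_words": [31, 41, 36, 37, 34, 39]}
selected = select_variables(cases, controls)
```

Only the retained variables would then be passed to the classifiers; running the selection on the training data alone keeps the test data from influencing which variables are kept.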
2.7. Longitudinal analysis of linguistic and NP variables
To identify possible longitudinal trends present in our multi-
dimensional assessment of cognitive status, we implemented a fac-
torization analysis, using all available samples from each eligible par-
ticipant taking into account the correlational structure between both
linguistic variables and the NP test scores. For this, we used the cases
and the normal-aging participants who have a record of cognitive
impairment onset. We aligned their samples temporally by their cog-
nitive impairment date. The frequency of administration of the NP
exams varied across participants and was on average 2.2 years. In
order to normalize this variance across participants, we created syn-
thetic samples by linear interpolation with a frequency of six months.
We then used Nonnegative Matrix Factorization (NMF) on the up-
sampled dataset. To compare the progression of cases and the
normal-aging group, we projected the latter onto the factors learned
for the former. The projections of each sample on the ﬁrst factor were
computed, and then averaged over all samples in each six-month
interval to obtain the loading of each interval. For this analysis, we
used all NP variables, and linguistic variables that were statistically
signiﬁcant on the test set with t-test, that were statistically signiﬁcant
on the training set with Wilcoxon signed rank test, and linguistic var-
iables that were statistically signiﬁcant with the Cox proportional-
hazards model analysis, which is described in the following section.
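The up-sampling step described above, in which each participant's scores are linearly interpolated onto a six-month grid aligned to the date of cognitive impairment, might look like the sketch below. The function name and the example values are hypothetical, and the subsequent NMF step is not shown.

```python
def interpolate_on_grid(times, values, step=0.5):
    """Linearly interpolate one participant's scores onto a regular grid.

    times: strictly increasing times in years relative to cognitive
        impairment onset (0 = onset); at least two points are required.
    values: the score observed at each time.
    step: grid spacing in years (0.5 = six months).
    """
    n_steps = int((times[-1] - times[0]) / step + 1e-9)
    grid = [times[0] + k * step for k in range(n_steps + 1)]
    interpolated = []
    i = 0
    for t in grid:
        # Advance to the pair of observations that brackets t.
        while i < len(times) - 2 and t > times[i + 1]:
            i += 1
        t0, t1 = times[i], times[i + 1]
        weight = (t - t0) / (t1 - t0)
        interpolated.append(values[i] + weight * (values[i + 1] - values[i]))
    return grid, interpolated


# Hypothetical participant examined 4 and 2 years before onset, and at onset.
grid, scores = interpolate_on_grid([-4.0, -2.0, 0.0], [10.0, 8.0, 4.0])
```

Each variable of each participant would be interpolated in this way before the synthetic samples are stacked into the matrix that is factorized.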
2.8. Analysis of time to diagnosis with mild AD
To assess whether linguistic variables were associated with the time to
diagnosis of mild AD, we used Cox proportional-hazards models.
Date of mild AD diagnosis was obtained from the dementia review,
and the participants who were recorded as dementia free in their
dementia review were censored. If a censored participant was alive
at the date of the review, then the review date was used as the censor
date. If the participant was no longer alive at the date of the review,
then the oldest age the participant is known to be not demented was
used as the censor date. Models for each single linguistic variable
included as additional covariates age, gender, and education (i.e., college
degree vs. no college degree).
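The censoring rule described above can be expressed as a small helper that produces the (duration, event) pair expected by survival models. Two details are assumptions made for this sketch and are not stated in the text: ages are used in place of calendar dates, and the time origin is taken to be the age at which the language sample was collected.

```python
def survival_record(sample_age, diagnosis_age, alive_at_review,
                    review_age, last_nondemented_age):
    """Return (duration_in_years, event_observed) for one participant.

    diagnosis_age: age at diagnosis of mild AD, or None if the participant
        was recorded as dementia-free in the dementia review (censored).
    """
    if diagnosis_age is not None:
        # Event observed: time from the sample to the diagnosis of mild AD.
        return diagnosis_age - sample_age, True
    # Censored: use the review if the participant was alive at that point,
    # otherwise the oldest age at which they were known to be not demented.
    censor_age = review_age if alive_at_review else last_nondemented_age
    return censor_age - sample_age, False
```

For example, `survival_record(75, None, False, 88, 86)` returns `(11, False)`, since the participant was last known to be not demented at age 86.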
2.9. Role of funding source
Pﬁzer, Inc. provided funding to obtain the data from the Framing-
ham Heart Study Consortium, and to support the involvement of IBM
Research in the initial phase of the study. The funder had no role in
data analysis or interpretation, which were the responsibility of the
Univariate tests of individual variables between cases and controls
showed that future onset of AD was associated with telegraphic
speech, repetitiveness, and misspellings (see Supplementary Table
3). Telegraphic speech, as exempliﬁed in Fig. 3, is a common symp-
tom of non-ﬂuent aphasia. In telegraphic speech, grammatical struc-
ture is reduced or absent, such that language contains simpliﬁed
phrases consisting mainly of content words, with morphology and
function words largely missing [24,25]. As shown in the examples
from cognitively impaired participants in Fig. 3, telegraphic speech is
not only simpler in grammatical structure, but also marked by lack of
determiners (‘the’, ‘a’), auxiliaries (‘is’, ‘are’) and entire subjects. Furthermore,
samples from impaired participants show
misspellings and lack of punctuation.
Fig. 4. The ROC curve of the test-set with the hold-out method for the linguistic-based
model (see Table 3). This result was obtained by a Logistic Regression classiﬁer.
Table 2. Neuropsychological (NP) tests included in the predictive models along with clinical and demographic variables, separately
and in conjunction with linguistic variables. See Supplementary Table 1 for the full list of NP variables obtained through
these NP tests.
Cognitive Domain                       Description
Word retrieval                         Boston Naming Test
Learning                               The paired associate learning subtest from the Wechsler Memory Scale (WMS)
Attention and concentration            Wechsler Adult Intelligence Scale (WAIS) score for digit span
Verbal memory                          The logical memory subtest from the Wechsler Memory Scale (WMS)
Premorbid intelligence                 The reading subtest of the Wide Range Achievement Test (WRAT-3)
Verbal ability and executive control   Verbal fluency
Attention and concentration            Trail making tests A and B
Abstract reasoning                     The similarities test from the Wechsler Adult Intelligence Scale (WAIS)
Visuoperceptual organization           Hooper Visual Organization Test
Visual memory                          The visual reproduction subtest from the Wechsler Memory Scale (WMS-R)
Verbal comprehension                   The information subtest of the Wechsler Adult Intelligence Scale (WAIS-R)
Spatial visualization                  Block design test
Psychomotor speed                      Finger tapping test
Prediction performance in each experimental setting obtained by the
best performing classifier is shown in Table 3. The plots showing the
separation of the test and the training datasets by the best performing
classifier reported in Table 3 can be seen in Supplementary Figure 2. In
Table 3, all metrics for each experimental setting were obtained by the
same classiﬁer. The results obtained by other classiﬁers that are not
reported in Table 3 can be found in the Supplementary Table 2.
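For reference, the accuracy, positive predictive value and sensitivity reported in Table 3 follow directly from the counts of correct and incorrect predictions. The sketch below uses invented labels (1 = case, 0 = control) and a hypothetical function name.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, positive predictive value and sensitivity for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)  # cases predicted as cases
    tn = sum(t == 0 and p == 0 for t, p in pairs)  # controls predicted as controls
    fp = sum(t == 0 and p == 1 for t, p in pairs)  # controls predicted as cases
    fn = sum(t == 1 and p == 0 for t, p in pairs)  # cases predicted as controls
    return {
        "accuracy": (tp + tn) / len(pairs),
        "positive_predictive_value": tp / (tp + fp) if tp + fp else 0.0,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
    }


# Invented example: four cases and four controls, one error in each class.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
```

Because the test set is balanced, accuracy here is not inflated by a majority class; positive predictive value and sensitivity separate the two kinds of error.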
Supplementary Table 4 shows the weights assigned to the linguis-
tic variables by the best performing classiﬁer reported in Table 3 in
the hold-out method. Similarly, Supplementary Table 5 shows the
weights assigned to the non-linguistic variables by the best
performing classiﬁer reported in Table 3 in the hold-out method. We
performed a step-wise classiﬁcation analysis by ranking the variables
with respect to the weights assigned to them by the best performing
classifier, and by incrementally adding variables for classification
until all variables with p-value <0.05 were exhausted.
Supplementary Figure 4 shows that the highest AUC of 0.76 was
obtained using the 10 highest-ranked linguistic variables, which
can be found in Supplementary Table 4.
Table 3. Results of prediction experiments for the three models. AUC stands for the area-under-the-curve statistic. Accuracy is the ratio of
correctly predicted samples to the total number of samples. Positive predictive value is the ratio of correctly predicted positive
samples to the total predicted positive samples. Sensitivity is the ratio of correctly predicted positive samples to all
observations in the patient class. All metrics for each experimental setting were obtained by the same classifier. The best performing
classifiers in the hold-out experiments were Logistic Regression in the linguistic and non-linguistic settings, and Naïve
Bayes with the combination of linguistic and non-linguistic features. The best performing classifiers in the CV-experiments
were Logistic Regression in the linguistic settings, and Naïve Bayes in the non-linguistic setting and the combination of
linguistic and non-linguistic features.
                                CV method             Hold-out method
Linguistic variables from single CTT samples during cognitive normalcy
  Best classifier               Logistic Regression   Logistic Regression
  Accuracy                      0.65                  0.70
  AUC                           0.73                  0.74
  Positive predictive value     0.64                  0.74
  Sensitivity                   0.67                  0.62
Non-linguistic variables (age, gender, education, APOE, hypertension, diabetes, NP)
  Best classifier               Naïve Bayes           Logistic Regression
  Accuracy                      0.60                  0.59
  AUC                           0.64                  0.60
  Positive predictive value     0.64                  0.61
  Sensitivity                   0.44                  0.48
Aggregation of linguistic and non-linguistic variables
  Best classifier               Naïve Bayes           Naïve Bayes
  Accuracy                      0.67                  0.69
  AUC                           0.72                  0.67
  Positive predictive value     0.81                  0.71
  Sensitivity                   0.44                  0.62
Fig. 5. The results of the non-negative matrix factorization (NMF) analysis of the linguistic and the NP variables on longitudinal data. This plot demonstrates that the factorization of
the variables without using time information shows a temporal trend as well as a differentiation between cases and controls, which starts several years before cognitive impairment. The
controls’ samples are projected onto the factorization learned from the cases’ samples and averaged over six-month intervals. Controls are shown in blue, and cases in red. The
horizontal axis is years to/from cognitive impairment onset, where 0 stands for the date of cognitive impairment.
In order to assess statistical significance, we computed a null distribution
of AUCs for chance classification outcomes, and applied z-statistics
to estimate the probability of the AUCs obtained by the predictive
models. The z-score indicated that an AUC of 0.74 (see Fig. 4 for the
ROC curve) corresponds to a 4.26-fold increase in predictability over
chance (p<0.001). We observed a ten-point increase in accuracy
obtained by adding linguistic variables to the non-linguistic variables
(non-linguistic alone 0.59, combined 0.69). This indicates that linguistic
variables offer signiﬁcant information over the non-linguistic variables
in terms of their predictive diagnostic ability. The ratio of z-scores relative
to the null hypothesis indicated that the linguistic variables yielded
a classification performance 2.4 times better than non-linguistic variables;
the ratio of AUC gains with respect to chance provides a comparable
value (0.19/0.09 = 2.11).
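One way to obtain a null distribution of AUCs and a z-score for an observed AUC is to shuffle the class labels repeatedly, as in the sketch below. How the null distribution was generated in the study is not specified here, so the permutation scheme, the number of permutations, and the example scores are all assumptions.

```python
import random
from statistics import mean, stdev


def auc(labels, scores):
    """Probability that a random positive outscores a random negative."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    # Ties between a positive and a negative count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def auc_z_score(labels, scores, n_permutations=1000, seed=0):
    """Return the observed AUC and its z-score against shuffled labels."""
    rng = random.Random(seed)
    observed = auc(labels, scores)
    shuffled = list(labels)
    null_aucs = []
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break the link between labels and scores
        null_aucs.append(auc(shuffled, scores))
    return observed, (observed - mean(null_aucs)) / stdev(null_aucs)


# Invented classifier scores for 10 cases (label 1) and 10 controls (label 0).
labels = [1] * 10 + [0] * 10
scores = [0.9, 0.8, 0.7, 0.75, 0.6, 0.65, 0.4, 0.85, 0.55, 0.3,
          0.2, 0.35, 0.1, 0.45, 0.5, 0.25, 0.15, 0.05, 0.62, 0.32]
observed_auc, z = auc_z_score(labels, scores)
```

Dividing the z-scores of two models by one another gives a ratio of the kind used above to compare linguistic and non-linguistic variables.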
To examine the effects of education and sex on performance of the
model using linguistic variables and the hold-out method, AUC scores
were computed for participants with college degree vs participants
without a college degree, and for females vs males. The participants
with a college degree were harder to predict than participants without
a college degree (AUC of 0.70 for college-degree vs 0.76 for no-college
degree, see Supplementary Figure 3 for the ROC curves). The ratio of z-
scores indicated that classiﬁcation of the participants with no college
degree was 1.52 times better than for the participants with college
degree (as above, the gain ratio is 0.26/0.20 = 1.3). Similarly, females
were both more accurately and more conﬁdently predicted than males,
and the difference is substantial (AUC of 0.83 for females vs 0.64 for
males, see Supplementary Figure 3 for the ROC curves). The ratio of z-
scores indicated that the females were classiﬁed 2.61 times better than
males when compared to chance (the gain ratio is 0.33/0.14 = 2.35).
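The gain ratios quoted in parentheses for education and sex are the ratios of each AUC's improvement over the chance level of 0.5. A minimal sketch of that arithmetic, with a hypothetical function name, is:

```python
def auc_gain_ratio(auc_a, auc_b, chance=0.5):
    """Ratio of the AUC gains over chance for two groups or models."""
    return (auc_a - chance) / (auc_b - chance)


# No college degree (0.76) vs college degree (0.70): 0.26 / 0.20.
education_ratio = auc_gain_ratio(0.76, 0.70)
# Females (0.83) vs males (0.64): 0.33 / 0.14.
sex_ratio = auc_gain_ratio(0.83, 0.64)
```

Unlike the z-score ratio, this quantity depends only on the reported AUCs and not on the spread of the null distribution, which is why the two ratios differ slightly.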
The longitudinal analysis in Fig. 5 shows the results of NMF factor-
ization of linguistic and NP variables, and demonstrates that an unsu-
pervised grouping of the variables without using time information
indeed shows a clear temporal trend, as well as a differentiation
between cases and controls which starts several years before cogni-
tive impairment. The plot shows the change in the loading of each
time interval on the ﬁrst component obtained by NMF, where 0 in
the horizontal axis stands for the date of cognitive impairment onset.
The green line shows the controls’ progression in time, whereas the
blue line shows the cases’ progression in time, with a steeper decline
for the cases. Supplementary Figure 5 shows the loading of the factors
on the ﬁrst component from the NMF analysis, which shows the
respective contribution of linguistic and NP variables in the computa-
tion of the plot in Fig. 5 in the manuscript.
For the Cox proportional-hazards analysis, we used 143 participants,
of whom 28 were censored, with a total of 1159 person-years,
or an average of 8.10 years per person. See Table 4 for all statistically
significant linguistic variables according to the Wald statistic.
Our results show that using the referentially generic terms boy, girl,
woman instead of the more specific son, brother, sister, daughter,
mother to refer to the subjects in the picture is associated with higher
risk of AD. Our results also show that mentioning the details in the
picture, such as the dishcloth and the dishes, is associated with lower
risk of AD. Consequently, this analysis revealed that the strongest
prognostic factors of AD involved semantic processing.
Our results demonstrate that it is possible to predict future onset
of Alzheimer’s disease using language samples obtained from cogni-
tively normal individuals. Moreover, we showed that using linguistic
variables from a single administration of the cookie-theft picture
description task performed better than predictive models that incor-
porated APOE, demographic variables, and NP test results.
Linguistic competence is a behavioral marker of educational and
occupational attainment, both of which have been suggested to
increase ‘cognitive reserve’ by epidemiologic studies. Higher cognitive
reserve allows some people to be more resilient to brain pathology
than others, such that they can compensate for the dysfunction
and delay diagnosis of AD. In this regard, we found a significant
differentiation between participants with and without college educa-
tion. Furthermore, it is well-known that the prevalence of AD is sig-
niﬁcantly higher in women as compared to men, and that women
show a faster rate of progression after onset of cognitive impairment.
Similar to what we observed with educational attainment,
we found that it is much easier to predict conversion in women than
in men, suggesting that prodromal changes are more prominent in
females than in males.
The linguistic variables that we identiﬁed as most relevant for
predicting future onset of AD, prominently agraphia, telegraphic
speech and repetitiveness (see Supplementary Table 3), have been
consistently identiﬁed in the literature as associated with cognitive
decline in dementia. Repetitive speech, including repetitive
questioning, repetitive stories/statements, and repetitive themes, has been
reported in patients with dementia [32,33]. Studies on agraphia in
dementia and in AD participants have shown that patients made
more writing errors compared to controls. Declines in structural
complexity of utterances have been extensively investigated in peo-
ple with Alzheimer’s disease and dementia [35,36]. Another linguistic
element that has been associated with dementia, referential speciﬁc-
ity, was identiﬁed as having a strong weight in the survival analysis,
which is supported by a large number of studies showing that
semantic impairments are the earliest linguistic markers of
While the Cox Proportional-Hazards analysis identiﬁed semantic/
lexical factors, these factors did not prove to be discriminatory in the
classiﬁcation tasks. We believe that this is due to the differences in
the design of these analyses. An age threshold was used for inclusion
in the control vs case group in the classiﬁcation task, whereas the
Cox analysis treated all participants with a diagnosis of AD equally as
non-censored participants. As a result, among the 115 non-censored
participants in the Cox analysis, 48 of them had MCI onset after 85
years old, which would put them in the control group in the classiﬁ-
cation task. The contrasting results in these analyses indicate that, in
accordance with prior literature, semantic factors are predictive of
future diagnosis of AD for all subjects regardless of the age of onset,
as opposed to being predictive of AD onset before the mid-eighties.
Similarly, verbosity and lexical richness metrics, which stand out as
strong markers of cognitive impairment in already demented
patients, were not among the strong predictors of future
diagnosis of AD in cognitively normal individuals in our study.
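The difference between the two designs can be made concrete with a small sketch. The participant records below are hypothetical, as are the names `Participant`, `classification_label`, and `cox_event`. Treating an onset at exactly 85 as a control is also an assumption, since the text only describes onset before or after 85 years old.

```python
from dataclasses import dataclass
from typing import Optional

AGE_THRESHOLD = 85  # years; the age threshold used in the classification task


@dataclass
class Participant:
    pid: str
    onset_age: Optional[float]  # age at onset; None if no diagnosis was observed


def classification_label(p: Participant) -> str:
    """Classification design: 'case' only if onset occurred before the threshold."""
    if p.onset_age is not None and p.onset_age < AGE_THRESHOLD:
        return "case"
    return "control"


def cox_event(p: Participant) -> int:
    """Cox design: 1 for any observed diagnosis (non-censored), 0 if censored."""
    return 0 if p.onset_age is None else 1


# Hypothetical records, for illustration only.
cohort = [
    Participant("A", 78.0),   # onset before 85
    Participant("B", 88.5),   # onset after 85
    Participant("C", None),   # no diagnosis observed (censored)
]

for p in cohort:
    print(p.pid, classification_label(p), cox_event(p))
```

Participant B shows the divergence described above: an event in the Cox analysis, but a control in the classification task.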
The result of the longitudinal analysis of linguistic and NP varia-
bles, depicted in Fig. 5, shows a steeper decline in the trajectory of
aging for the AD group as compared with normal aging, which starts
during the preclinical phase. Similar clinical trajectories for AD and
normal aging were suggested in the literature.
Table 4. Results of the Cox proportional hazards models. HR stands
for hazard ratio; CI gives the lower and upper bounds of the 95%
confidence interval for the hazard ratio. HRs are for a 1 SD increase
in these measures.

ICU         HR       95% CI             P-value
falling     1.3148   (1.053–1.6417)     0.0157
dishes      0.8172   (0.6901–0.9677)    0.0193
girl        1.1895   (1.0231–1.3829)    0.024
dishcloth   0.8368   (0.712–0.9835)     0.0307
boy         1.1704   (1.0116–1.354)     0.0344
woman       1.2066   (1.0013–1.4539)    0.0484

The analysis of the written version of the CTT may be considered a
limitation of our study. The spoken version of the task may reveal
different aspects of linguistic dysfunction. Another limitation of our
study is that a thorough analysis of the correlational structure of
linguistic features and neuropsychological test scores is outside the
scope of the present article. Finally, our definition of the ‘case’ and
‘control’ labels, while designed to be as clinically relevant as possible,
is ultimately discretionary and open to interpretation.
Biomarkers such as cerebrospinal fluid or brain imaging and
neuropsychological tests [41,42] have been used to predict
progression of MCI to AD/dementia. Most recently, very promising results
were reported using neurofilament light chain (NfL) for disease
progression at the early pre-symptomatic stages of familial Alzheimer’s
disease. However, these are still technologically or logistically
demanding, and require significant specialists’ involvement. On the
other hand, simple, naturalistic and inexpensive speech probes, as
our results suggest, can provide an assistive tool for the early detec-
tion and progression monitoring of AD, particularly given that such
probes can be easily adapted to remote digital platforms with low
Elif Eyigoz contributed to the research design, implemented the
coding and ran the experiments. She performed the literature review
and drafted and edited the manuscript. Finally, she contributed to
the interpretation of the results. Melissa Naylor contributed to the
research design, reviewed and edited the manuscript, and contrib-
uted to the interpretation of the results. Guillermo Cecchi contributed
to the research design, reviewed and edited the manuscript, and
contributed to the interpretation of the results. Sachin Mathur
contributed to the coding and research design, and reviewed and edited the
manuscript. Mar Santamaria contributed to the research design and
reviewed the manuscript.
Pﬁzer, Inc. provided funding to obtain the data from the Framing-
ham Study Consortium, and funding to IBM Corp. for the initial phase
of the study. The data used in this study was supported by Framing-
ham Heart Study’s National Heart, Lung, and Blood Institute contract
(N01-HC-25195), and by grants from the National Institute on Aging
(R01-AG016495, R01-AG008122) and the National Institute of
Neurological Disorders and Stroke (R01-NS017950).
Data sharing statement
In order to gain access to the Framingham Heart Study (FHS) data,
investigators have to submit a research proposal for review by one or
more FHS review committees. Approved study proposals further
require a fully executed Data and Materials Distribution Agreement,
and an IRB approval. The Data and Materials Distribution Agreement
can be accessed from the following link: https://framingham
Declaration of Competing Interests
Elif Eyigoz and Guillermo Cecchi have worked as salaried employees
of IBM Corp. for the full duration of this project. Melissa Naylor
was a salaried employee of Pﬁzer, Inc. when assigned to this project,
until October 2018, and since then has been a salaried employee of
Takeda Pharmaceuticals. Sachin Mathur and Mar Santamaria have
worked as salaried employees of Pﬁzer, Inc. for the full duration of
this project. Guillermo Cecchi declares that IBM holds a patent (US-
9508360-B2) for the extraction of one of the features used in the lin-
Supplementary material associated with this article can be found,
in the online version, at doi:10.1016/j.eclinm.2020.100583.
 Roth CR, Helm-Estabrooks N. Boston naming test. Encyclopedia of clinical neuro-
psychology. Springer International Publishing; 2018. p. 611–5.
 Snowdon DA. Linguistic ability in early life and cognitive function and Alz-
heimer’s disease in late life. Findings from the Nun study. JAMA: J Am Med Assoc
 Dawber TR, Meadors GF, Moore Jr FE. Epidemiological approaches to heart dis-
ease: the Framingham study. Am J Public Health Nations Health 1951;41:279–86.
 Farmer ME, White LR, Kittner SJ, Kaplan E, Moes E, McNamara P, et al. Neuropsy-
chological test performance in Framingham: a descriptive study. Psychol Rep
 Seshadri S, Wolf PA, Beiser A, Au R, McNulty K, White R, et al. Lifetime risk of
dementia and Alzheimer’s disease: the impact of mortality on risk estimates in
the Framingham study. Neurology 1997;49:1498–504.
 Au R, Seshadri S, Wolf PA, Elias MF, Elias PK, Sullivan L, et al. New norms for a new
generation: cognitive performance in the Framingham offspring cohort. Exp
Aging Res 2004;30:333–58.
 Goodglass H, Kaplan E. The assessment of aphasia and related disorders. Lea &
 Cummings L. Describing the cookie theft picture: sources of breakdown in Alz-
heimer’s dementia. Pragm Soc 2019;10:153–76.
 Kannel WB, Feinleib M, McNamara PM, Garrison RJ, Castelli WP. An investigation
of coronary heart disease in families: the Framingham offspring Study. Am J Epi-
 Folstein MF, Folstein SE, McHugh PR. “Mini-mental state”: a practical method for
grading the cognitive state of patients for the clinician. J Psychiatr Res
 Seshadri S, Wolf P, Beiser A, Au R, McNulty K, White R, et al. Lifetime risk of
dementia and Alzheimer’s disease: the impact of mortality on risk estimates in
the Framingham Study. Neurology 1997;49:1498–504.
 Seshadri S, Beiser A, Au R, Wolf PA, Evans DA, Wilson RS, et al. Operationalizing
diagnostic criteria for Alzheimer’s disease and other age-related cognitive
impairment—part 2. Alzheimer’s Dement 2011;7:35–52.
 Au R, Seshadri S, Knox K, Beiser A, Himali JJ, Cabral HJ, et al. The Framingham
brain donation program: neuropathology along the cognitive continuum. Curr
Alzheimer Res 2012;9:673–86.
 American Psychiatric Association. Diagnostic and statistical manual of mental disorders
(DSM-5®). American Psychiatric Pub; 2013.
 McKhann G, Drachman D, Folstein M, Katzman R, Price D, Stadlan EM. Clinical
diagnosis of Alzheimer’s disease: report of the NINCDS-ADRDA Work Group
under the auspices of department of health and human services task force on
Alzheimer’s disease. Neurology 1984;34:939.
 Bachman DL, Wolf PA, Linn RT, Knoefel JE, Cobb JL, Belanger AJ, et al. Incidence of
dementia and probable Alzheimer’s disease in a general population: the
Framingham study. Neurology 1993;43:515.
 Morris JC, Storandt M, Miller JP, McKeel DW, Price JL, Rubin EH, et al. Mild cogni-
tive impairment represents early-stage Alzheimer disease. Arch Neurol 2001:58.
 Morris JC. Mild cognitive impairment is early-stage Alzheimer disease: time to
revise diagnostic criteria. Arch Neurol 2006;63:15–6.
 Stephan B, Hunter S, Harris D, Llewellyn D, Siervo M, Matthews F, et al. The neuro-
pathological proﬁle of mild cognitive impairment (MCI): a systematic review. Mol
 Silverman JM, Smith CJ, Marin DB, Mohs RC, Propper CB. Familial patterns of risk
in very late-onset Alzheimer disease. Arch Gen Psychiatry 2003;60:190–7.
 Silverman JM, Li G, Zaccario ML, Smith CJ, Schmeidler J, Mohs RC, et al. Patterns of
risk in ﬁrst-degree relatives of patients with Alzheimer’s disease. Arch Gen Psy-
 Silverman JM, Ciresi G, Smith CJ, Marin DB, Schnaider-Beeri M. Variability of
familial risk of Alzheimer disease across the late life span. Arch Gen Psychiatry
 Nasreddine ZS, Phillips NA, Bédirian V, Charbonneau S, Whitehead V, Collin I,
et al. The montreal cognitive assessment, MoCA: a brief screening tool for mild
cognitive impairment. J Am Geriatr Soc 2005;53:695–9.
 Goodglass H. Understanding aphasia. Academic Press; 1993.
 Thompson CK. Treatment of syntactic and morphologic deﬁcits in agrammatic
aphasia: treatment of underlying forms. Language intervention strategies in
aphasia and related neurogenic communication disorders: ﬁfth edition. Wolters
Kluwer Health Adis (ESP); 2012. p. 735–55.
 Mason SJ, Graham NE. Areas beneath the relative operating characteristics (ROC)
and relative operating levels (ROL) curves: statistical signiﬁcance and interpreta-
tion. Quart J R Meteorol Soc: J Atmos Sci Appl Meteorol Phys Oceanogr
 Stern Y. Inﬂuence of education and occupation on the incidence of Alzheimer’s
disease. JAMA 1994;271:1004–10.
 Katzman R, Terry R, DeTeresa R, Brown T, Davies P, Fuld P, et al. Clinical, patholog-
ical, and neurochemical changes in dementia: a subgroup with preserved mental
status and numerous neocortical plaques. Ann Neurol: Off J Am Neurol Assoc
Child Neurol Soc 1988;23:138–44.
 Andersen K, Launer LJ, Dewey ME, Letenneur L, Ott A, Copeland JRM, et al. Gender
differences in the incidence of AD and vascular dementia: the EURODEM studies.
Neurology 1999;53:1992.
 Viña J, Lloret A. Why women have more Alzheimer’s disease than men: gender
and mitochondrial toxicity of amyloid-β peptide. J Alzheimer’s Dis 2010;20:
 Mielke M, Vemuri P, Rocca W. Clinical epidemiology of Alzheimer’s disease:
assessing sex and gender differences. Clin Epidemiol 2014:37.
 Barton S, Findlay D, Blake RA. The management of inappropriate vocalisation in
dementia: a hierarchical approach. Int J Geriatr Psychiatry 2005;20:1180–6.
 de Lira JO, Ortiz KZ, Campanha AC, Bertolucci PHF, Minett TSC. Microlinguistic
aspects of the oral narrative in patients with Alzheimer’s disease. Int Psychoger-
 Lambert J, Eustache F, Viader F, Dary M, Rioux P, Lechevalier B, et al. Agraphia in
Alzheimer’s disease: an independent lexical impairment. Brain Lang
 Kempler D, Almor A, Tyler LK, Andersen ES, MacDonald MC. Sentence compre-
hension deﬁcits in Alzheimer’s disease: a comparison of off-line vs. on-line sen-
tence processing. Brain Lang 1998;64:297–316.
 Lyons K, Kemper S, Labarge E, Ferraro FR, Balota D, Storandt M. Oral language and
Alzheimer’s disease: a reduction in syntactic complexity. Aging, Neuropsychol
 Martin A, Fedio P. Word production and comprehension in Alzheimer’s disease:
the breakdown of semantic knowledge. Brain Lang 1983;19:124–41.
 Appell J, Kertesz A, Fisman M. A study of language functioning in Alzheimer
patients. Brain Lang 1982;17:73–91.
 Bucks RS, Singh S, Cuerden JM, Wilcock GK. Analysis of spontaneous, conversa-
tional speech in dementia of Alzheimer type: evaluation of an objective technique
for analysing lexical performance. Aphasiology 2000;14:71–91.
 Sperling RA, Aisen PS, Beckett LA, Bennett DA, Craft S, Fagan AM, et al. Toward
defining the preclinical stages of Alzheimer’s disease: recommendations from the national
institute on aging-Alzheimer’s association workgroups on diagnostic guidelines for
Alzheimer’s disease. Alzheimer’s Dement 2011;7:280–92.
 Cui Y, Liu B, Luo S, Zhen X, Fan M, Liu T, et al. Identiﬁcation of conversion from
mild cognitive impairment to Alzheimer’s disease using multivariate predictors.
PLoS ONE 2011;6:e21896.
 Pereira T, Lemos L, Cardoso S, Silva D, Rodrigues A, Santana I, et al. Predicting pro-
gression of mild cognitive impairment to dementia using neuropsychological
data: a supervised learning approach using time windows. BMC Med Inform Decis
 Preische O, Schultz S, Apel A, Kuhle J, Kaeser S, Barro C, et al. Dominantly inherited
Alzheimer network. serum neuroﬁlament dynamics predicts neurodegeneration
and clinical progression in presymptomatic Alzheimer’s disease. Nat Med