
Using structural MRI to identify bipolar disorders – 13 site machine learning study in 3020 individuals from the ENIGMA Bipolar Disorders Working Group


Abstract

Bipolar disorders (BDs) are among the leading causes of morbidity and disability. Objective biological markers, such as those based on brain imaging, could aid in clinical management of BD. Machine learning (ML) brings neuroimaging analyses to individual subject level and may potentially allow for their diagnostic use. However, fair and optimal application of ML requires large, multi-site datasets. We applied ML (support vector machines) to MRI data (regional cortical thickness, surface area, subcortical volumes) from 853 BD and 2167 control participants from 13 cohorts in the ENIGMA consortium. We attempted to differentiate BD from control participants, investigated different data handling strategies and studied the neuroimaging/clinical features most important for classification. Individual site accuracies ranged from 45.23% to 81.07%. Aggregate subject-level analyses yielded the highest accuracy (65.23%, 95% CI = 63.47–67.00, ROC-AUC = 71.49%, 95% CI = 69.39–73.59), followed by leave-one-site-out cross-validation (accuracy = 58.67%, 95% CI = 56.70–60.63). Meta-analysis of individual site accuracies did not provide above chance results. There was substantial agreement between the regions that contributed to identification of BD participants in the best performing site and in the aggregate dataset (Cohen’s Kappa = 0.83, 95% CI = 0.829–0.831). Treatment with anticonvulsants and age were associated with greater odds of correct classification. Although short of the 80% clinically relevant accuracy threshold, the results are promising and provide a fair and realistic estimate of classification performance, which can be achieved in a large, ecologically valid, multi-site sample of BD participants based on regional neurostructural measures. Furthermore, the significant classification in different samples was based on plausible and similar neuroanatomical features. Future multi-site studies should move towards sharing of raw/voxelwise neuroimaging data.
Molecular Psychiatry (2020) 25:2130–2143
Abraham Nunes1,2 et al. for the ENIGMA Bipolar Disorders Working Group
Received: 15 February 2018 / Revised: 11 June 2018 / Accepted: 24 July 2018 / Published online: 31 August 2018
© The Author(s) 2018. This article is published with open access
Introduction
Bipolar disorders (BDs) are lifelong conditions, which tend to start in adolescence or early adulthood and consequently rank among the leading causes of morbidity and disability worldwide [1,2]. Despite substantial advances in our understanding of the neurobiology of BD, the diagnostic system in psychiatry continues to be based on description of behavioral symptoms. This often results in delayed or inaccurate diagnosis of BD [3–5], which in turn leads to delayed or ineffective treatment [6]. Objective, biological markers could aid significantly in the clinical management of mental disorders [7], might reduce stigma, facilitate research and expedite the development of new treatments [8].
Brain imaging offers the unique ability to non-invasively investigate brain structure and function. Previous brain-imaging meta-analyses and large-scale multi-site studies have demonstrated that adults with BD had robust and replicable neurostructural alterations in subcortical, that is, hippocampus, amygdala, thalamus [9–11], as well as cortical regions, including inferior frontal gyrus, precentral gyrus, fusiform gyrus, middle frontal cortex [12–14]. Despite these advances and the relatively broad availability, the diagnostic potential of magnetic resonance imaging (MRI) in psychiatry has not been fully realized.
*Tomas Hajek
Extended author information available on the last page of the article
Electronic supplementary material: The online version of this article contains supplementary material, which is available to authorized users.
The translation of brain imaging from bench to the bedside has been hindered by the low sensitivity and specificity of between-group differences, by clinical heterogeneity and limited generalizability of findings from relatively small samples. The problem of low sensitivity and specificity may be overcome by novel analytical tools, such as machine learning (ML) [15,16]. Traditional mass-univariate methods of MRI data analysis focus on localized and spatially segregated patterns of between-group differences [17]. The effect sizes of such changes (Cohen’s d = 0.15–0.29 [11,14]) tend to be many times smaller than the effects needed for clinical application (Cohen’s d = 1.50–3.00 [18,19]). In contrast, the ML techniques increase predictive power by targeting multivariate alterations distributed throughout the whole brain, which may better characterize the abnormalities found in psychiatric disorders [15,20,21]. Thus, ML brings neuroimaging analyses to the level of individual subjects, and with some caveats, potentially allows for diagnostic use. When previously applied to structural MRI, ML differentiated BD from control participants with accuracies between 59.5% [22] and 73.00% [23].
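To make the gap between these effect sizes concrete, the sketch below converts a Cohen's d into the best accuracy attainable from a single measure. It assumes two equally sized groups with normally distributed values and equal variance, in which case the optimal cut-off lies midway between the group means and the accuracy equals Φ(d/2). These assumptions and the function name are illustrative and are not taken from the cited studies.

```python
from statistics import NormalDist


def max_accuracy_from_cohens_d(d: float) -> float:
    """Best single-feature accuracy under the stated assumptions: Phi(|d| / 2)."""
    return NormalDist().cdf(abs(d) / 2.0)


if __name__ == "__main__":
    # Effect sizes quoted in the text: observed (0.15-0.29) vs. needed (1.50-3.00).
    for d in (0.15, 0.29, 1.50, 3.00):
        print(f"d = {d:.2f} -> accuracy = {max_accuracy_from_cohens_d(d):.3f}")
```

Under these assumptions, d = 0.29 corresponds to roughly 56% accuracy, whereas d = 1.50 and d = 3.00 correspond to roughly 77% and 93%, which illustrates why univariate group differences of this size are insufficient for individual-level classification.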
However, ML approaches typically require large samples to optimize the performance of the classifier, provide a generalizable snapshot of the studied disorder, decrease the risk of sampling effects and allow for application of rigorous cross-validation approaches [19]. Single-site studies may provide high site-specific accuracies [24], which, however, may not generalize across samples [25,26]. Small studies may also yield a wide range of classification performances and inconsistencies in regions, which contribute to the overall classification [25–27]. Previous ML structural MRI studies in BD have typically included <50 BD participants recruited in a single site [23,28–34]. The largest currently available neurostructural ML studies investigated 128–190 BD and 127–284 control participants [35–37], from up to two sites [22,23,38].
Large, multi-site datasets will necessarily be more heterogeneous than single site, carefully controlled samples. In fact, heterogeneity is one of the defining characteristics of big-data [39]. Single-site studies with rigorous inclusion/exclusion criteria may help isolate sources of heterogeneity, but they will represent only a small fraction of the patient space. In contrast, a large, multi-site study will primarily target generalizable alterations, which are shared among the participants, regardless of illness subtype, effects of treatment and other clinical variables. This is related to the fact that different sources of heterogeneity (i.e., presence of psychosis, neuroprogression, exposure to medications) affect different brain regions and often act in opposing directions [11–14,40–42]. In addition, individual sources of heterogeneity, which are present only in some participants, are unlikely to systematically bias the findings in large, multi-site investigations. Thus, smaller, carefully controlled studies and large, multi-site datasets are complementary and ask different questions. BD is a broad and heterogeneous condition. Therefore, it is all the more important to quantify the extent to which ML can classify large, ecologically valid datasets based on neuroanatomy.
Researching generalizable brain alterations has only recently become possible through research consortia committed to aggregation and sharing of brain-imaging data across research groups. Despite the inherent limitations, retrospective data sharing initiatives create an optimal environment for application of ML strategies and for a fair and realistic estimation of the utility of MRI for classification of neuropsychiatric disorders. This approach has been utilized to improve predictive models of autism or Alzheimer dementia [26], but has not yet been applied to BD. The Enhancing Neuro Imaging Genetics through Meta-Analysis (ENIGMA) consortium is an international multi-cohort collaboration, which, by combining datasets from multiple sites, has allowed for more accurate testing of the reproducibility of disease effects in participants with schizophrenia [43], BD [11,14] or major depression [44]. Due to the multi-site nature, methodological harmonization and access to some of the largest neuroimaging datasets to date, the ENIGMA platform provides an ideal opportunity to test ML on sufficiently large and generalizable samples.
In collaboration with the ENIGMA-BD Working Group, we applied ML to structural MRI data from 3020 participants recruited in 13 independent sites around the world. We attempted to differentiate BD from control participants based on brain structure. In addition, we studied the effects of different data handling strategies on classification performance, described the neuroanatomical features, which contributed to individual subject classification and investigated the effects of clinical variables on classification.
Materials and methods
The ENIGMA-BD Working Group brings together researchers with brain imaging and clinical data from BD participants and healthy controls [11,14]. Thirteen of the sites from previously published ENIGMA studies [11,14] provided individual subject data for this ML project. Each cohort’s demographics are detailed in Supplementary Table S1. Supplementary Table S2 lists the instruments used to obtain diagnosis and clinical information. Supplementary Table S3 lists exclusion criteria for study enrollment. Briefly, all studies used standard diagnostic instruments, including SCID (N = 10), MINI (N = 2) and DIGS (N = 1). Most studies (N = 7) included bipolar spectrum disorders, five studies included only BD-I and a single study only BD-II participants. Substance abuse was an exclusion criterion in 9/13 studies. Most studies (10/13) did not exclude comorbidities, other than substance abuse. A single study recruited medication naive participants. The remaining studies did not restrict medication use. Consequently, the sample is a broad, ecologically valid and generalizable representation of BD.
All participating sites obtained approval from their local
institutional review boards and ethics committees, and all
study participants provided written informed consent.
Image processing and analyses
Structural T1-weighted MRI brain scans were acquired at each site and analyzed locally using harmonized analysis and quality control protocols from the ENIGMA consortium. Image acquisition parameters are listed in Supplementary Table S4. All groups used the same analytical protocol and performed the same visual and statistical quality assessment. These harmonized protocols were used in the previous publications by our group [11,14] and they have been applied more broadly in large-scale ENIGMA studies of other disorders. Briefly, using the freely available and extensively validated FreeSurfer software, we performed segmentations and parcellations into 7 subcortical and 34 cortical gray matter regions per hemisphere (left and right), based on the Desikan–Killiany atlas. Visual quality controls were performed on a region of interest (ROI) level aided by a visual inspection guide including pass/fail segmentation examples. Diagnostic histogram plots were generated for each site and outlier subjects were flagged for further review. All ROIs failing quality inspection were withheld from subsequent analyses. Previous analyses from the ENIGMA-BD Working Group showed that scanner field strength, voxel volume and the version of FreeSurfer used for segmentation did not significantly influence the effect size estimates. Further details regarding these analyses, as well as forest plots of cortical and subcortical effect sizes from individual sites, can be found in [11,14].
Data preprocessing
Input features were ROI cortical thicknesses (CT), surface area (SA) and subcortical volumes, a total of 150 features, and intracranial volume. As SA and CT are genetically distinct [45], influenced by different neurobiological mechanisms [46] and sometimes affected in opposite directions [47], we used both as input features. Prior to fitting of the ML classifier, we imputed missing data using mean values of the respective features, as well as centered and scaled each continuous feature.
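The preprocessing described above can be sketched as a small scikit-learn pipeline. This is a minimal illustration, not the study's code: it uses `SimpleImputer` from current scikit-learn releases (the study used v. 0.19, where mean imputation was exposed through a different class), and the toy values and helper name are hypothetical. Fitting the pipeline on training data only is one way to keep imputation and scaling statistics out of the test partition.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Feature count implied by the parcellation: thickness and surface area for
# 34 cortical regions per hemisphere, plus 7 subcortical volumes per hemisphere.
N_REGIONAL_FEATURES = 34 * 2 * 2 + 7 * 2  # 150


def make_preprocessor():
    """Mean imputation followed by centering and scaling to unit variance."""
    return make_pipeline(SimpleImputer(strategy="mean"), StandardScaler())


if __name__ == "__main__":
    # Toy data (hypothetical values) with one missing entry.
    X_train = np.array([[1.0, 10.0], [3.0, np.nan], [5.0, 30.0]])
    X_test = np.array([[2.0, 20.0]])
    prep = make_preprocessor().fit(X_train)  # statistics come from training data only
    print(prep.transform(X_train))
    print(prep.transform(X_test))
```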
Using statistical harmonization to reduce heterogeneity of data could improve accuracy [48], but at a cost to generalizability. Such approaches may compromise the train/test separation and may introduce additional assumptions, which are difficult to verify. Thus, in keeping with other studies [23,38,49], instead of statistical harmonization, we modeled between-site effects by using several different data handling strategies and investigated the association between relevant variables and classification accuracy, as described below.
Support vector machine classifier
We a priori chose to use support vector machine (SVM [50]), which is the most frequently used ML method in psychiatric brain imaging [15,51]. The present analyses implemented a linear kernel, because this limits the risk of overfitting, contains only a single parameter, see below, and the coefficients of a linear classifier can be interpreted as relative measures of feature importance. However, we also performed sensitivity analyses to determine the impact of using a non-linear kernel (radial basis function) on results. All ML analyses were implemented in the Python programming language v. 3.6 using the Scikit-Learn package v. 0.19 [52].
The linear kernel SVM has only a single parameter, C, which controls the trade-off between having zero training errors and allowing misclassifications. We decided to a priori fix the hyperparameter at C = 1, for the following reasons. First, this setting is a common choice in the existing literature [53–56]. Second, SVM performance is relatively robust to changes in C values [57]. Third, the decision to perform hyperparameter optimization has data costs, as one must perform a further nesting of cross-validation, resulting in smaller effective training sets [58]. Also, hyperparameter optimization involves many steps, which have not been standardized and which may contribute to vibration of effects, including introduction of further hyperparameters (of the optimizers), selection of the best objective function over which to optimize, selection of constraints over the hyperparameter being optimized and of the hyperparameter optimization algorithm. Nevertheless, we also performed sensitivity analyses to determine the impact of hyperparameter optimization in a nested cross-validation procedure, see Supplementary material.
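As an illustration of the classifier configuration described above, the following sketch fits a linear-kernel SVM with C fixed at 1 using scikit-learn's `SVC` and reads the per-feature weights from `coef_`. The synthetic data and the helper name are assumptions made for the example; this is not the authors' code, and their exact scikit-learn calls are not given in the text.

```python
import numpy as np
from sklearn.svm import SVC


def fit_linear_svm(X, y, C=1.0):
    """Fit a linear-kernel SVM with a fixed regularization parameter C."""
    return SVC(kernel="linear", C=C).fit(X, y)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic two-class data (hypothetical): only the first feature carries signal.
    y = np.repeat([0, 1], 50)
    X = rng.normal(size=(100, 3))
    X[:, 0] += 3.0 * y
    clf = fit_linear_svm(X, y)
    # For a linear kernel, coef_ holds one weight per input feature.
    print(clf.coef_.ravel())
```

Because C is fixed rather than tuned, no inner cross-validation loop is needed, which keeps the full training partition available for fitting.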
As the features used in the present study are engineered (i.e., the feature set does not consist of raw, voxelwise images), we opted against further feature selection. This decision was also supported by the large sample size and the fact that we had 20 times more participants than features. Importantly, in the above-described methods, the SVM models are independent across folds and no statistical harmonization, model selection or comparison was done prior to splitting the samples into testing and training. Consequently, we have minimized the potential for information leak.
Classification performance was measured using standard metrics including accuracy, sensitivity, specificity, positive predictive value, negative predictive value and area under the receiver operating characteristic curve (ROC-AUC).
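For reference, the threshold-based metrics listed above follow directly from the four confusion-matrix counts, as in the short sketch below. The function name and the example counts are hypothetical, and cases are treated as the positive class. ROC-AUC is not included because it requires continuous decision scores rather than counts.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary metrics from confusion-matrix counts (cases = positive class)."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # proportion of cases correctly identified
        "specificity": tn / (tn + fp),  # proportion of controls correctly identified
        "ppv": tp / (tp + fp),          # predicted cases that are true cases
        "npv": tn / (tn + fn),          # predicted controls that are true controls
    }


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the formulas.
    for name, value in classification_metrics(tp=40, fp=20, tn=80, fn=10).items():
        print(f"{name}: {value:.3f}")
```

With imbalanced groups, PPV and NPV depend on the case-to-control ratio, which is why they can diverge even when sensitivity and specificity are similar.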
Data handling
The first application of the above-described classifier was to the classification of cases versus controls in individual sites, referred to as site-level analyses. For each site, we fit an SVM and measured its performance using a stratified K-fold cross-validation procedure. This method is stratified insofar as the proportion of cases and controls (in respective folds) is similar in both training and validation sets. The number of folds was selected independently for each site, such that the validation set on each fold would have approximately 3 (±1) cases.
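One simple way to implement the fold-count rule above is sketched here. The text specifies only the target of approximately 3 (±1) cases per validation fold; the use of floor division, the minimum of two folds and the function name are assumptions of this example.

```python
def choose_n_splits(n_cases: int, cases_per_fold: int = 3) -> int:
    """Number of folds so that each validation fold holds about `cases_per_fold` cases."""
    # At least two folds are needed for any cross-validation.
    return max(2, n_cases // cases_per_fold)


if __name__ == "__main__":
    for n_cases in (30, 100, 853):
        print(n_cases, "cases ->", choose_n_splits(n_cases), "folds")
```

The result can be passed as `n_splits` to scikit-learn's `StratifiedKFold`, which keeps the case-to-control ratio similar across folds. For 853 cases this rule gives 284 folds, which is consistent with the number of folds reported for the pooled data.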
To further study how overall classification performance relates to different methods of data handling, we implemented three approaches. The first was a meta-analysis of diagnostic accuracy from site-level analyses, referred to as meta-analysis. This models the typical method of analyzing data in a multi-site collaboration [11,14]. The meta-analyses were done using the hierarchical summary receiver operating characteristic, implemented in HSROC package v. 2.1.8 [59], in the R programming language, see Supplementary material.
Second, we evaluated the same linear SVM parameterization used in all other analyses on a leave-one-site-out (LOSO) cross-validation procedure, referred to as LOSO analyses. In each fold of cross-validation, one site’s data were completely excluded from the training partition. The SVM was then trained on the training partition and predictive performance was evaluated on the data from the held-out site.
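The LOSO partitioning can be sketched in a few lines of plain Python. The site labels and function name below are hypothetical; scikit-learn's `LeaveOneGroupOut` produces the same kind of split when given a `groups` array.

```python
from typing import Hashable, Iterator, List, Sequence, Tuple


def leave_one_site_out(
    sites: Sequence[Hashable],
) -> Iterator[Tuple[Hashable, List[int], List[int]]]:
    """Yield (held_out_site, train_indices, test_indices), one fold per site."""
    for held_out in sorted(set(sites)):
        train = [i for i, s in enumerate(sites) if s != held_out]
        test = [i for i, s in enumerate(sites) if s == held_out]
        yield held_out, train, test


if __name__ == "__main__":
    # Hypothetical site label for each participant.
    site_labels = ["A", "A", "B", "C", "B"]
    for site, train_idx, test_idx in leave_one_site_out(site_labels):
        print(site, train_idx, test_idx)
```

Because no participant from the held-out site appears in training, each fold estimates performance on an entirely unseen site.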
Third, we fit an SVM classifier to the data pooled across all sites, using the same linear SVM parameterization as in the site-level analyses, and the same cross-validation procedure. This yielded a total of 284 folds and is further referred to as aggregate subject-level analysis.
We corrected for the effects of imbalanced data in all analyses and thereby trained the SVM classifiers on an effectively balanced dataset. To do this, we implemented the Synthetic Minority Oversampling Technique with Tomek link [60,61] using the imblearn package v. 0.3.0.dev0 [62], in the Python language v. 3.6. The computer code for the above-described analyses will be provided upon reasonable request.
Feature importance
To determine feature importance, we plotted the SVM coefficients learned (over a total of K-folds per sample) based on the aggregated data and the SVM coefficients learned from the site with the highest ROC-AUC performance. To quantify whether similar features contributed to classification in different analyses, we computed Cohen’s kappa for agreement in ranking of feature coefficients of individual regions between these two models, see Supplementary material for details of this calculation.
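For readers unfamiliar with the statistic, the sketch below computes the generic Cohen's kappa for two sequences of categorical labels. It is not the study's procedure, which is described in the Supplementary material. How feature rankings are turned into categories is left open here, and the example labels (a coarse importance category per feature) are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two equally long sequences of categorical labels."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Agreement expected by chance from the marginal label frequencies.
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:  # both raters used a single identical label throughout
        return 1.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical importance categories assigned to four features by two models.
    model_1 = ["high", "high", "low", "low"]
    model_2 = ["high", "high", "low", "high"]
    print(cohens_kappa(model_1, model_2))
```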
Investigation of clinical heterogeneity/potential confounding factors
We investigated whether any confounding factors contributed to the classification by examining the relationship between clinical/demographic variables and classification results using mixed-effects logistic regression (glmer function in the lme4 package of the R Statistical Programming Language [63]). Variables listed in Table 1 and intercepts were taken as random effects varying between sites about a group mean, see Supplementary material. For numerical stability, age, age of onset and duration of illness were scaled to have mean 0 and unit variance.

Table 1 Descriptive statistics of the whole sample
Variable                                   Controls         Cases          p-Value
N                                          2167             853
Age mean (SD)                              34.89 (12.41)    37.43          < 0.001
Sex, N (%) female                          1201 (55.4)      516 (60.5)     0.013
Diagnosis, N (%)
  BD-I                                     -                582 (68.63)
  BD-II                                    -                234 (27.59)
  BD-NOS                                   -                13 (1.53)
  SZA                                      -                19 (2.24)
Treatment at the time of scanning, N (%)
  Li                                       -                265 (33.5)
  AED                                      -                339 (43.1)
  FGA                                      -                32 (4.1)
  SGA                                      -                313 (39.9)
  AD                                       -                281 (35.5)
Mood state, N (%)
  Euthymic                                 -                475 (75.5)
  Depressed                                -                131 (20.8)
  Manic                                    -                11 (1.7)
  Hypomanic                                -                9 (1.4)
  Mixed                                    -                3 (0.5)
Age of onset mean (SD)                     -                22.36 (9.08)
Duration of illness mean                   -                14.64
History of psychosis, N (%)                -                372 (61.1)
AD antidepressants, AED antiepileptics, BD-I bipolar I disorder, BD-II bipolar II disorder, BD-NOS bipolar disorder not otherwise specified, FGA first-generation antipsychotics, Li lithium, SD standard deviation, SGA second-generation antipsychotics, SZA schizoaffective disorder
Results
We included 3020 participants (853 BD cases and 2167 controls), see Table 1.
The classification accuracy in individual sites ranged from 45.23% (95% confidence interval (95% CI) = 35.91–54.57) to 81.07% (95% CI = 78.68–83.46), see Fig. 1a. The classification performance was closely associated with the method of data handling. Meta-analysis of individual site results yielded the lowest performance, which did not exceed chance level, see Fig. 1b, Table 2. The LOSO cross-validation provided above chance classification, but performed worse than the aggregate subject-level analyses. Aggregating the data across sites yielded the highest and statistically significant classification performance, see Fig. 1c, Table 2.

Fig. 1 a Performance of SVM classifiers independently trained on each sample (mean with 95% confidence interval). Each row denotes a site in the data set, whereas each column denotes a specific performance metric. b Meta-analytic (summary) receiver operating characteristic (SROC) curves. Site-level sensitivity (Sn) and specificity (Sp) are empty circles of radius proportional to sample size. The red point is the median estimate of Sn and Sp. The solid black line is the SROC curve. Dashed diagonal represents chance performance. The red ellipse is the 95% posterior credible region, and the blue dashed line is the 95% posterior predictive region. c Receiver operating characteristic (ROC) curves for the aggregate subject-level analysis. Faint gray lines are the ROC curves for individual validation folds, and blue lines represent the mean ROC curve
Feature importance
Ranking of features, which contributed to classification in the site with the highest ROC-AUC and the aggregate subject-level analyses, see Fig. 2, showed substantial agreement (Cohen’s Kappa = 0.83, 95% CI = 0.829–0.831).
Effects of clinical heterogeneity
Among BD participants in the aggregate subject-level analysis, both age (odds ratio (OR) = 1.4, 95% CI = 1.05–1.88, p = 0.02) and antiepileptic use (OR = 1.73, 95% CI = 1.07–2.78, p = 0.02) were positively and additively associated with correct classification. There was no association between correct classification and diagnostic subgroup, treatment with first- or second-generation antipsychotics, lithium (Li), age of onset, history of psychosis, mood state or sex, see Supplementary Table S5. Age was necessarily co-linear with duration of illness (r(782) = 0.66, p < 0.001), but there was no univariate association between the duration of illness and correct classification (OR = 1.18, 95% CI = 0.98–1.43, p = 0.09).
Treatment with anticonvulsants was negatively associated with Li treatment (OR = 0.39, 95% CI = 0.19–0.80, p = 0.01), but not with any other clinical features, see Supplementary Table S6.
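Odds ratios such as those reported above are obtained by exponentiating logistic-regression coefficients. The sketch below shows the conversion together with a Wald-type 95% interval. The Wald interval is an assumption of this example, since the text does not state how the reported intervals were computed, and the coefficient and standard error used in the demonstration are hypothetical.

```python
import math
from typing import Tuple


def odds_ratio_with_ci(beta: float, se: float, z: float = 1.96) -> Tuple[float, float, float]:
    """Convert a log-odds coefficient and its standard error to (OR, lower, upper)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


if __name__ == "__main__":
    # Hypothetical coefficient for a predictor scaled to mean 0 and unit variance.
    odds_ratio, lower, upper = odds_ratio_with_ci(beta=0.34, se=0.15)
    print(f"OR = {odds_ratio:.2f}, 95% CI = {lower:.2f} to {upper:.2f}")
```

For a predictor scaled to unit variance, such as age in these analyses, the odds ratio describes the change in odds per one standard deviation increase.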
In the whole sample, both age (OR = 1.46, 95% CI = 1.17–1.81, p < 0.001) and status (BD versus controls; OR = 1.60, 95% CI = 1.28–2.01, p < 0.001), but not sex (OR = 1.21, 95% CI = 0.99–1.48, p = 0.06) were independently associated with being classified as BD.
Sensitivity analyses
Using the radial basis function kernel yielded accuracy of 68%, 95% CI = 67–69%. Hyperparameter optimization resulted in training set accuracy of 65.9%, 95% CI = 65.7–66.0 and testing set accuracy of 57.5%, 95% CI = 49.1–65.9. Thus, it is unlikely that substantial classification performance was sacrificed by forgoing kernel nonlinearity or hyperparameter optimization.
In the LOSO analysis, when we left out the sites with the highest ROC-AUC curves, i.e., Halifax, Marburg (FOR 2107), Cape Town (CIAM), we acquired ROC-AUC of 65.42%, 66.18%, 63.07%, respectively, see Fig. 3, which was comparable to the overall ROC-AUC of 60.92% in the LOSO. Thus, the overall results did not appear to be overly influenced by the best performing sites.
Discussion
When applied to structural brain-imaging data, ML differentiated BD participants from controls with above chance accuracy even in a large and heterogeneous sample of 3020 participants from 13 sites worldwide. Aggregate analyses of individual subject data yielded better performance than LOSO or meta-analysis of site-level results. Despite the multi-site nature, ML identified a set of plausible brain-imaging features, which characterized individual BD participants and generalized across samples. Age and exposure to anticonvulsants were associated with greater odds of correct classification.
Table 2 Summary of classification results from meta-analysis of sample-level classifiers, leave-one-site-out and aggregate subject-level analyses
Statistic          Meta-analysis          Leave-one-site-out     Aggregate subject-level
Accuracy (%)       -                      58.67 (56.70–60.63)    65.23 (63.47–67.00)
ROC-AUC            -                      60.92 (58.18–63.67)    71.49 (69.39–73.59)
Sensitivity (%)    42.60 (13.40–71.57)    51.99 (48.20–55.78)    66.02 (62.71–69.33)
Specificity (%)    59.14 (30.59–87.94)    64.85 (61.91–67.79)    64.90 (62.86–66.93)
PPV (%)            -                      47.25 (37.67–56.84)    44.45 (42.04–46.86)
NPV (%)            -                      67.67 (60.36–74.98)    83.73 (82.21–85.26)
Note that meta-analytic results of the HSROC package include only sensitivity and specificity of the overall meta-analytic classification. Results for meta-analytic summary are the posterior predictive value of the performance metric, reported as mean (95% credible interval; the Bayesian analog of 95% confidence intervals). Results for the aggregate subject-level and leave-one-site-out analyses are reported as mean and 95% confidence interval
NPV negative predictive value, PPV positive predictive value, ROC-AUC area under receiver operating characteristic curve

Previous studies employing raw structural MRI data have yielded accuracies between 59.50 and 73.00% [22,23] for differentiating BD from control participants. A single study using results from automated segmentation reported accuracy below 60.00% [37]. Although direct
comparison is complicated by methodological and sample
size differences, the modest accuracies in previous studies
ings appear realistic and there is little evidence for
The classification performance in the aggregated dataset was significantly above chance level and the ROC-AUC of 71.49% (69.39–73.59) reached acceptable discrimination [64,65]. However, the accuracy of 65.23% (95% CI = 63.47–67.00) fell short of the 80% threshold, which is deemed clinically relevant [66]. We need to consider several issues when interpreting these findings. BDs are difficult to diagnose even by standard methods. The Cohen’s kappa for reliability of the BD-I diagnosis is 0.56 and as low as 0.40 for BD-II [67]. In addition, the illness shows marked clinical and neurobiological heterogeneity [10,12]. Perhaps most importantly, we worked with regional brain measures, not raw/voxelwise data. This approach necessarily involves some information loss in the feature engineering process. Analyses of experimenter-defined features are increasingly outperformed by models capable of learning abstractions from raw data alone [68]. Applying deep learning [69] to raw data would likely offer the greatest increase in classification accuracy.
This study provides important clues about the impact of data handling on the classification performance. As expected, the meta-analysis of individual site results, typically the first method of data analyses in multi-site collaborations, yielded the lowest accuracy, which did not exceed chance level. The LOSO analyses performed better than the meta-analytic approach, but worse than when individual subject data were aggregated and analyzed jointly. These differences in performance are likely related to the way each method handles the conditional relationships between the sites. Meta-analyses essentially model these relationships after the fact. The LOSO analyses are hindered by the fact that data are partitioned along some factor that is not random. In contrast, pooling of data allows for random partitioning and incorporates the relationships between the sites in their raw form. In addition, the classification performance is closely linked to the size of the training sample [49,70], which increased from individual site through LOSO to aggregate analyses.
Thus, the empirical pattern of findings is convergent with theoretical prediction of how each of these methods should perform. It is also congruent with previous studies in autism [49], schizophrenia [70] and Alzheimer dementia [26], which also showed increasing performance with increasing size of the training set. It is a question whether this would also be the case in more heterogeneous conditions, such as major depression or anxiety disorders. Regardless, aggregate analyses provided the best classification performance in BD. Future multi-site brain-imaging studies should attempt to move towards sharing of individual subject data, not only site-level results.
Fig. 2 Violin plot of feature importance across cross-validation (CV)
folds for the aggregate subject-level analysis (left) and the site that
yielded the highest ROC-AUC (right). At each CV iteration, we
extracted linear support vector machine (SVM) coefficients. The set of
all coefficients from our SVM models is centered about 0. Deviation
of coefficients from zero is an indication of the relative importance of
individual features in the data. Features with positive and negative
coefficients have positive and negative associations, respectively, with
probability of classification as a case. The y-axis lists variables for
which SVM coefficients were strictly non-zero throughout all
cross-validation iterations
Fig. 3 Bar plot of the area under the receiver operating
characteristic curve (ROC-AUC) for the leave-one-site-out
(LOSO) analyses. The sites listed along the x-axis are those
that were held out at each fold
The linear SVM kernel allowed us to visualize the contribution
of individual regions to the overall classification. It
is of note that the results of a backward model should not be
used for localization [71]. We used this approach to broadly
verify the neurobiological plausibility [26], not to infer
pathophysiology. Our findings showed good validity, as
many of the same regions that have previously shown
differences between groups of BD patients and controls
contributed to the classification on individual subject level,
including hippocampus and amygdala [9–11], as well as cortical
regions, such as inferior frontal gyrus [12,14] and
precentral gyrus [13].
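The feature-selection rule described for the coefficient plot (keep variables whose linear SVM coefficients are non-zero in every cross-validation iteration) can be sketched as follows. This is an illustration under stated assumptions: the function `stable_features`, the feature names and all coefficient values are hypothetical and are not results from the study. As noted above, such weights indicate contribution to the classifier and should not be read as localization.

```python
from statistics import mean


def stable_features(coefs_by_fold, names):
    """Return {feature name: mean coefficient} for features non-zero in every fold."""
    kept = {}
    for j, name in enumerate(names):
        column = [fold[j] for fold in coefs_by_fold]
        if all(c != 0.0 for c in column):
            kept[name] = mean(column)
    return kept


# Hypothetical coefficients: one row per CV fold, one column per feature.
names = ["hippocampus_vol", "amygdala_vol", "ifg_thickness", "precentral_thickness"]
coefs_by_fold = [
    [-0.42, -0.10, 0.00, 0.15],
    [-0.38, -0.07, 0.03, 0.12],
    [-0.45, -0.12, -0.01, 0.18],
]

kept = stable_features(coefs_by_fold, names)

# Rank the retained features by absolute mean coefficient (distance from zero).
ranked = sorted(kept.items(), key=lambda kv: abs(kv[1]), reverse=True)
for name, weight in ranked:
    direction = "case" if weight > 0 else "control"
    print(f"{name}: mean coefficient {weight:+.3f} (towards {direction})")
```

Here `ifg_thickness` is dropped because its coefficient is exactly zero in one fold, which mirrors the "strictly non-zero throughout all iterations" criterion.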
In addition, we wanted to determine whether similar
features were used for classification across different analyses.
Indeed, there was substantial agreement between the
regions that contributed to the classification in the site
that yielded the highest ROC-AUC and in the aggregate
dataset, with Cohen's Kappa of 0.83 (95% CI = 0.829–0.831).
Furthermore, when we trained the classifier on data
from all but the best performing sites, the classification
performance did not drop below the overall accuracy in the
LOSO analyses. Thus, individual sites did not markedly
influence the overall findings. Taken together, these results
suggest that the classification was based on a biologically
plausible and generalizable neurostructural signature, which
is shared among subjects in a large, multi-site sample. This
is highly interesting, as existence of a generalizable biomarker
is one of the key defining features of a diagnostic
category [72].
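For readers unfamiliar with the agreement statistic, Cohen's Kappa compares the observed agreement between two labelings with the agreement expected by chance. The sketch below assumes that each analysis is reduced to a binary vector marking which features it retained; that encoding, the function `cohens_kappa` and the toy vectors are assumptions for illustration, since the exact encoding is not described in this section.

```python
def cohens_kappa(a, b):
    """Cohen's Kappa for two equal-length label sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(a)
    labels = set(a) | set(b)
    # Observed agreement: fraction of positions where both labelings match.
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of marginal label frequencies, summed over labels.
    p_chance = sum((a.count(lab) / n) * (b.count(lab) / n) for lab in labels)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when both sequences use a single label")
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical selections over ten features: 1 = retained, 0 = not retained.
best_site = [1, 1, 1, 0, 0, 0, 1, 0, 0, 0]
aggregate = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

kappa = cohens_kappa(best_site, aggregate)
print(f"kappa = {kappa:.3f}")
```

Values near 1 indicate that the two analyses retained nearly the same features, while values near 0 indicate agreement no better than chance.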
We also investigated the effects of clinical/demographic
variables on classification accuracy. Older age and
anticonvulsant treatment were associated with greater odds of
correct classification. The effect of age may be related to the
fact that illness-related alterations may get worse with age/
duration of illness [73]. Interestingly, a similar association
was noted in a meta-analysis of brain-imaging ML studies
in schizophrenia [74]. These findings also broadly agree
with another study, in which late-stage BD was easier to
classify than early-stage illness [36]. However, we did not
find an association between accuracy of classification and
duration of illness or age of onset.
The association with anticonvulsant treatment may
reflect effects of illness or medications. Treatment with
anticonvulsants was not associated with severity of illness,
diagnostic category, mood state, age of onset or personal
history of psychotic symptoms and thus did not appear to
index a specific subgroup within BD. Interestingly,
participants who were treated with anticonvulsants were less
likely to also receive Li treatment. Perhaps the
neuroprotective effects of Li, which may normalize brain
alterations in BD [10,75], could make the classification
based on brain structure more difficult. However, Li
treatment itself was not associated with classification accuracy.
Previous studies have suggested that valproate may
negatively affect brain structure [76], which could contribute to
correct differentiation of anticonvulsant-treated from control
participants. This was, however, not documented for
lamotrigine, which is also frequently used in treatment of
BD. Overall, the reasons why treatment with
anticonvulsants and age were associated with easier
classification are unclear and will be subject to future analyses.
A related question is whether the clinical/demographic
heterogeneity confounded our findings and whether age
and/or treatment with anticonvulsants contributed more to
the classification than the presence or absence of BD. Due
to selection bias, heterogeneity is more likely to affect
results in smaller studies [25]. The strength of a large, multi-
center analysis is that it will primarily target the common
alterations, which are generalizable to most participants and
not individual sources of heterogeneity, which are present
only in some [25]. In addition, both age and status were
independently and additively associated with being
classified as a BD participant in the whole sample. Also, within
the site with the highest classification performance, BD
participants and controls were balanced by age. In addition,
43.1% of patients in the whole sample were treated with
anticonvulsants and yet, we reached a 66.02% sensitivity
for correctly identifying BD participants. Last but not least,
the sites with the highest proportion of anticonvulsant-
treated participants (61.4%) and the highest discrepancy in
age showed relatively low sensitivities of 49% and 29%,
respectively. Thus, although certain clinical and
demographic variables were associated with correct classification,
it is unlikely that overall we were classifying participants
based on the presence or absence of specific clinical/
demographic variables, rather than the presence or absence
of BD.
Our study has the following limitations. Due to differ-
ences in availability, we did not include other brain-imaging
modalities or other types of data, that is, genetic, neuro-
cognitive or biochemical. Access to raw data would allow
us to use deep learning methods [68] or create a meta-model
by combining classifiers trained on the local datasets [77].
However, currently there are significant practical and legal
limitations to raw data sharing. The clinical heterogeneity
and multi-site nature, which complicate traditional between-
group comparisons, allowed us to test the ML algorithms
on a wide range of participants in a fair setting that better
resembles a clinical situation. To achieve a clearer exposi-
tion and reduce methodological heterogeneity, we decided
to use SVM. Previous studies have generally found minimal
differences between "shallow" ML methods [37]. As we
worked with regional brain measures, not voxelwise data,
we would not be able to fully exploit the power of deeper
methods [78]. The depth and breadth of phenotyping are
general issues in retrospective, multi-site data sharing
collaborations. Specific sources of heterogeneity, that is,
neuroprogression and comorbid conditions, may be particularly
difficult to quantify. Addressing them would require a
different research design. However, the large, multi-site
sample, together with the exploratory analyses and examination
of individual site results made it less likely that individual
clinical characteristics systematically confounded the
findings. Finally, attempting to differentiate BD from control
participants is the first step before moving to more clinically
relevant problems, such as differential diagnosis.
The key advantages of this study include the large,
generalizable sample, access to individual subject data from
13 sites and the conservative and scalable nature of the
analyses. This is currently the largest application of ML to
brain-imaging data in BD, with a sample size up to two orders
of magnitude greater than in previous studies. The
unique nature of the dataset provides qualitative, not only
quantitative advantages. Previous studies showed low
stability of ML results with fewer than 130 participants [70], a
threshold we exceeded 7–16 times. The multi-site dataset
maximized the training set size, provided ecologically valid
representation of the illness, allowed us to focus on
common, BD-related alterations and, for the first time, apply
LOSO cross-validation in BD brain imaging. We
completely separated the testing and training sets at each level of
analysis, thus minimizing the risk of information leak, and
specifically focused on maximizing generalizability/reducing
the risk of overfitting. The study is an example of close
international collaboration, which is one of the best ways
to create optimal datasets for ML analyses.
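One common way in which information leaks between training and testing sets is through preprocessing parameters estimated on all subjects. The sketch below illustrates the principle of separation with feature standardization: the scaling parameters are estimated on the training fold only and then applied unchanged to the test fold. Standardization is used here purely as an assumed example step, the section does not state which steps were fitted per fold, and the helper names and numbers are hypothetical.

```python
from statistics import mean, pstdev


def fit_scaler(train_rows):
    """Estimate per-feature mean and standard deviation from the training fold only."""
    columns = list(zip(*train_rows))
    means = [mean(col) for col in columns]
    # Guard against constant features, whose standard deviation is zero.
    sds = [pstdev(col) or 1.0 for col in columns]
    return means, sds


def apply_scaler(rows, means, sds):
    """Standardize rows with previously fitted parameters (no re-estimation)."""
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


# Hypothetical training and test folds with two features each.
train = [[2.0, 10.0], [4.0, 10.0], [6.0, 10.0]]
test = [[8.0, 12.0]]

means, sds = fit_scaler(train)             # uses training subjects only
train_z = apply_scaler(train, means, sds)
test_z = apply_scaler(test, means, sds)    # test data never influence the parameters

print(means, sds)
print(test_z)
```

The same pattern extends to any step that learns from data, such as feature selection, resampling or hyperparameter tuning: fit inside the training fold, then apply to the held-out fold.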
This study provides a realistic and fair estimate of
classification performance, which can be achieved in a large,
ecologically valid, multi-site sample of BD participants
based on regional neurostructural measures. Although short
of the 80% clinically relevant threshold, the 65.23% accuracy
and 71.49% ROC-AUC are promising, as we used an
engineered feature set in a difficult-to-diagnose condition,
which shows a marked clinical and neurobiological
heterogeneity. In addition, similar, biologically plausible
features contributed to classification in different analyses.
Together these findings provide a proof of concept for a
generalizable brain-imaging signature of BD, which can be
detected on individual subject level, even in a large,
multi-site sample. Although specific clinical/demographic
characteristics, such as age and anticonvulsant treatment, may
affect classification, the clinical heterogeneity did not
prevent differentiating BD from control participants. Finally,
we demonstrated that meta-analyses of individual site/study
ML performances provide a poor proxy for results, which
could be obtained by pooling of individual subject data.
These findings are an important step towards translating
brain imaging from bench to the bedside. They suggest that
a multi-site ML classifier may correctly identify previously
unseen data and aid in diagnosing individual BD
participants. Application of deep learning to raw data might
considerably increase the accuracy of classification.
Acknowledgements The researchers and studies included in this paper
were supported by Australian National Medical and Health Research
Council (Program Grant 1037196) and the Lansdowne Foundation,
Generalitat de Catalunya PERIS postdoc contract (Pla estrategic de
Recerca i Innovacio en Salut), the Spanish Ministry of Economy and
Competitiveness (PI15/00283) integrated into the Plan Nacional de
I+D+I y cofinanciado por el ISCIII-Subdirección General de
Evaluación y el Fondo Europeo de Desarrollo Regional (FEDER);
CIBERSAM; and the Comissionat per a Universitats i Recerca del
DIUE de la Generalitat de Catalunya to the Bipolar Disorders Group
(2014 SGR 398), Dalhousie Department of Psychiatry and Clinician
Investigator Program (AN), grants of the Deanery of the Medical
Faculty of the University of Münster, National Institute of Mental
Health grant MH083968, R01MH101111, R01MH107703,
K23MH85096; Desert-Pacific Mental Illness Research, Education, and
Clinical Center, 2010 NARSAD Young Investigator Award to Dr
Xavier Caseras, PRISMA U.T., Colciencias, Research Council of
Norway (249795), the South-Eastern Norway Regional Health
Authority (2014097), Research grant Health Region South-East,
Norway and Throne-Holst research grant, South African Medical
Research Council, Australian National Medical and Health Research
Council (NHMRC) Program Grant 1037196, 1063960 and 1066177,
Janette Mary O'Neil Research Fellowship, the European
Commission's 7th Framework Programme #602450 (IMAGEMEND),
#602450, the FIDMAG Hermanas Hospitalarias Research Foundation
sample is supported by the Comissionat per a Universitats i Recerca
del DIUE de la Generalitat de Catalunya (2017-SGR-1271) and several
grants funded by Instituto de Salud Carlos III (Co-funded by European
Regional Development Fund/European Social Fund "Investing in
your future"): Miguel Servet Research Contract (CPII16/00018 to EP-C
and CPII13/00018 to RS), Sara Borrell Contract grant (CD16/00264
to MF-V) and Research Projects (PI14/01148 to EP-C, PI14/01151 to
RS and PI15/00277 to EJC-R), Health Research Board (grant number
HRA_POR/2011/100), Agence Nationale pour la Recherche (ANR-
11-IDEX-0004 Labex BioPsy, ANR-10-COHO-10-01 psyCOH),
Fondation pour la Recherche Médicale (Bioinformatique pour la
biologie 2014) and the Fondation de l'Avenir (Recherche Médicale
Appliquée 2014), The Research Council of Norway (#223273,
#229129, #249711), KG Jebsen Stiftelsen (SKGJMED008), the
South-Eastern Norway Regional Health Authority, Oslo University
Hospital, the Ebbe Frøland foundation, and a research grant from Mrs.
Throne-Holst, FAPESP-Brazil (2013/03905-4), CNPq-Brazil
(#478466/2009 and 480370/2009), and the Brain & Behavior
Research Foundation (2010 NARSAD Independent Investigator
Award to GFB), Innovative Medizinische Forschung (RE111604 to
RR and RE111722 to RR); SFB-TRR58, Projects C09 and Z02 to UD
and the Interdisciplinary Center for Clinical Research (IZKF) of the
medical faculty of Münster (grant Dan3/012/17 to UD), The German
Research Foundation (DFG) as part of the Research Unit
"Neurobiology of Affective Disorders" (DFG FOR 2107; KI 588/14-1, KO
4291/3-1, KI 588/14-1, KR 3822/5-1, DA 1151/5-1, DA1151/5-2, JA
1890/7-1, AK 3822/5-1), University Research Committee, University
of Cape Town, South Africa; National Research Foundation, South
Africa, Canadian Institutes of Health Research (103703, 106469 and
64410), Nova Scotia Health Research Foundation, Dalhousie Clinical
Research Scholarship to T Hajek, NARSAD 2007 Young Investigator
and 2015 Independent Investigator Awards to T Hajek, the NIH grant
(U54 EB020403) to the ENIGMA Center for Worldwide Medicine,
Imaging & Genomics, funded as part of the NIH Big Data to
Knowledge (BD2K) initiative.
Compliance with ethical standards
Conflict of interest OAA received speaker's honorarium from Lundbeck.
JCS has participated in research funded by BMS, Forest, Merck,
Elan, J&J, consulted for Astellas and has been a speaker for Pfizer,
Abbott and Sanofi. TE has received honoraria for lecturing from
GlaxoSmithKline, Pfizer, and Lundbeck. EV has received grants and
served as consultant, advisor or speaker for the following entities: AB-
Biotics, Allergan, Angelini, Dainippon Sumitomo Pharma, Farm-
industria, Ferrer, Gedeon Richter, Janssen, Johnson and Johnson,
Lundbeck, Otsuka, Pfizer, Roche, Sanofi-Aventis, Servier, the Brain
and Behavior Foundation, the Seventh European Framework Pro-
gramme (ENBREC), the Stanley Medical Research Institute, Suno-
vion, and Takeda. The remaining authors declare that they have no
conflict of interest.
Open Access This article is licensed under a Creative Commons
Attribution 4.0 International License, which permits use, sharing,
adaptation, distribution and reproduction in any medium or format, as
long as you give appropriate credit to the original author(s) and the
source, provide a link to the Creative Commons license, and indicate if
changes were made. The images or other third party material in this
article are included in the article's Creative Commons license, unless
indicated otherwise in a credit line to the material. If material is not
included in the article's Creative Commons license and your intended
use is not permitted by statutory regulation or exceeds the permitted
use, you will need to obtain permission directly from the copyright
holder. To view a copy of this license, visit http://creativecommons.
References
1. Gustavsson A, Svensson M, Jacobi F, Allgulander C, Alonso J, Beghi E, et al. Cost of disorders of the brain in Europe 2010. Eur Neuropsychopharmacol. 2011;21:718–79.
2. Whiteford HA, Degenhardt L, Rehm J, Baxter AJ, Ferrari AJ, Erskine HE, et al. Global burden of disease attributable to mental and substance use disorders: findings from the Global Burden of Disease Study 2010. Lancet. 2013;382:1575–86.
3. Bschor T, Angst J, Azorin JM, Bowden CL, Perugi G, Vieta E, et al. Are bipolar disorders underdiagnosed in patients with depressive episodes? Results of the multicenter BRIDGE screening study in Germany. J Affect Disord. 2012;142:45–52.
4. Ghaemi SN, Sachs GS, Chiou AM, Pandurangi AK, Goodwin K. Is bipolar disorder still underdiagnosed? Are antidepressants overutilized? J Affect Disord. 1999;52:135–44.
5. Duffy A, Alda M, Hajek T, Grof P. Early course of bipolar disorder in high-risk offspring: prospective study. Br J Psychiatry.
6. Conus P, Macneil C, McGorry PD. Public health significance of bipolar disorder: implications for early intervention and prevention. Bipolar Disord. 2014;16:548–56.
7. Schmitt A, Rujescu D, Gawlik M, Hasan A, Hashimoto K, Iceta S, et al. Consensus paper of the WFSBP Task Force on Biological Markers: criteria for biomarkers and endophenotypes of schizophrenia part II: cognition, neuroimaging and genetics. World J Biol Psychiatry. 2016;17:406–28.
8. Woodcock J, Woosley R. The FDA critical path initiative and its influence on new drug development. Annu Rev Med. 2008;
9. Hajek T, Kopecek M, Kozeny J, Gunde E, Alda M, Hoschl C. Amygdala volumes in mood disorders - meta-analysis of magnetic resonance volumetry studies. J Affect Disord. 2009;115:395–410.
10. Hajek T, Kopecek M, Hoschl C, Alda M. Smaller hippocampal volumes in patients with bipolar disorder are masked by exposure to lithium: a meta-analysis. J Psychiatry Neurosci.
11. Hibar DP, Westlye LT, van Erp TG, Rasmussen J, Leonardo CD, Faskowitz J, et al. Subcortical volumetric abnormalities in bipolar disorder. Mol Psychiatry. 2016;21:1710–6.
12. Hajek T, Cullis J, Novak T, Kopecek M, Blagdon R, Propper L, et al. Brain structural signature of familial predisposition for bipolar disorder: replicable evidence for involvement of the right inferior frontal gyrus. Biol Psychiatry. 2013;73:144–52.
13. Ganzola R, Duchesne S. Voxel-based morphometry meta-analysis of gray and white matter finds significant areas of differences in bipolar patients from healthy controls. Bipolar Disord.
14. Hibar DP, Westlye LT, Doan NT, Jahanshad N, Cheung JW, Ching CRK, et al. Cortical abnormalities in bipolar disorder: an MRI analysis of 6503 individuals from the ENIGMA Bipolar Disorder Working Group. Mol Psychiatry. 2018;23:932–42.
15. Orru G, Pettersson-Yeo W, Marquand AF, Sartori G, Mechelli A. Using support vector machine to identify imaging biomarkers of neurological and psychiatric disease: a critical review. Neurosci Biobehav Rev. 2012;36:1140–52.
16. Fu CH, Costafreda SG. Neuroimaging-based biomarkers in psychiatry: clinical opportunities of a paradigm shift. Can J Psychiatry. 2013;58:499–508.
17. Davatzikos C. Why voxel-based morphometric analysis should be used with great caution when characterizing group differences. Neuroimage. 2004;23:17–20.
18. Castellanos FX, Di Martino A, Craddock RC, Mehta AD, Milham MP. Clinical applications of the functional connectome. Neuroimage. 2013;80:527–40.
19. Milham MP, Craddock RC, Klein A. Clinically useful brain imaging for neuropsychiatry: how can we get there? Depress Anxiety. 2017;34:578–87.
20. Atluri G, Padmanabhan K, Fang G, Steinbach M, Petrella JR, Lim K, et al. Complex biomarker discovery in neuroimaging data: finding a needle in a haystack. Neuroimage Clin. 2013;3:123–31.
21. Davatzikos C, Shen D, Gur RC, Wu X, Liu D, Fan Y, et al. Whole-brain morphometric study of schizophrenia revealing a spatially complex set of focal abnormalities. Arch Gen Psychiatry.
22. Schnack HG, Nieuwenhuis M, van Haren NE, Abramovic L, Scheewe TW, Brouwer RM, et al. Can structural MRI aid in clinical classification? A machine learning study in two independent samples of patients with schizophrenia, bipolar disorder and healthy subjects. Neuroimage. 2014;84:299–306.
23. Rocha-Rego V, Jogia J, Marquand AF, Mourao-Miranda J, Simmons A, Frangou S. Examination of the predictive value of structural magnetic resonance scans in bipolar disorder: a pattern classification approach. Psychol Med. 2014;44:519–32.
24. Bansal R, Staib LH, Laine AF, Hao X, Xu D, Liu J, et al. Anatomical brain images alone can accurately diagnose chronic neuropsychiatric illnesses. PLoS ONE. 2012;7:e50698.
25. Schnack HG, Kahn RS. Detecting neuroimaging biomarkers for psychiatric disorders: sample size matters. Front Psychiatry.
26. Woo CW, Chang LJ, Lindquist MA, Wager TD. Building better biomarkers: brain models in translational neuroimaging. Nat Neurosci. 2017;20:365–77.
27. Varoquaux G. Cross-validation failure: small sample sizes lead to large error bars. Neuroimage. 2017;S1053-8119:30531-1.
28. Mwangi B, Spiker D, Zunta-Soares GB, Soares JC. Prediction of pediatric bipolar disorder using neuroanatomical signatures of the amygdala. Bipolar Disord. 2014;16:713–21.
29. Jie NF, Zhu MH, Ma XY, Osuch EA, Wammes M, Theberge J, et al. Discriminating bipolar disorder from major depression based on SVM-FoBa: efficient feature selection with multimodal brain imaging data. IEEE Trans Auton Ment Dev. 2015;7:320–31.
30. Serpa MH, Ou Y, Schaufelberger MS, Doshi J, Ferreira LK, Machado-Vieira R, et al. Neuroanatomical classification in a population-based sample of psychotic major depression and bipolar I disorder with 1 year of diagnostic stability. Biomed Res Int. 2014;2014:706157.
31. Fung G, Deng Y, Zhao Q, Li Z, Qu M, Li K, et al. Distinguishing bipolar and major depressive disorders by brain structural morphometry: a pilot study. BMC Psychiatry. 2015;15:298.
32. Rubin-Falcone H, Zanderigo F, Thapa-Chhetry B, Lan M, Miller JM, Sublette ME, et al. Pattern recognition of magnetic resonance imaging-based gray matter volume measurements classifies bipolar disorder and major depressive disorder. J Affect Disord.
33. Sacchet MD, Livermore EE, Iglesias JE, Glover GH, Gotlib IH. Subcortical volumes differentiate major depressive disorder, bipolar disorder, and remitted major depressive disorder. J Psychiatr Res. 2015;68:91–8.
34. Koutsouleris N, Meisenzahl EM, Borgwardt S, Riecher-Rossler A, Frodl T, Kambeitz J, et al. Individualized differential diagnosis of schizophrenia and mood disorders using neuroanatomical biomarkers. Brain. 2015;138:2059–73.
35. Doan NT, Kaufmann T, Bettella F, Jorgensen KN, Brandt CL, Moberget T, et al. Distinct multivariate brain morphological patterns and their added predictive value with cognitive and polygenic risk scores in mental disorders. Neuroimage Clin.
36. Mwangi B, Wu MJ, Cao B, Passos IC, Lavagnino L, Keser Z, et al. Individualized prediction and clinical staging of bipolar disorders using neuroanatomical biomarkers. Biol Psychiatry Cogn Neurosci Neuroimaging. 2016;1:186–94.
37. Salvador R, Radua J, Canales-Rodriguez EJ, Solanes A, Sarro S, Goikolea JM, et al. Evaluation of machine learning algorithms and structural features for optimal MRI-based diagnostic prediction in psychosis. PLoS ONE. 2017;12:e0175683.
38. Redlich R, Almeida JJ, Grotegerd D, Opel N, Kugel H, Heindel W, et al. Brain morphometric biomarkers distinguishing unipolar and bipolar depression. A voxel-based morphometry-pattern classification approach. JAMA Psychiatry. 2014;71:1222–30.
39. Iniesta R, Stahl D, McGuffin P. Machine learning, statistical learning and the future of biological research in psychiatry. Psychol Med. 2016;46:2455–65.
40. Kempton MJ, Haldane M, Jogia J, Grasby PM, Collier D, Frangou S. Dissociable brain structural changes associated with predisposition, resilience, and disease expression in bipolar disorder. J Neurosci. 2009;29:10863–8.
41. Roberts G, Lenroot R, Frankland A, Yeung PK, Gale N, Wright A, et al. Abnormalities in left inferior frontal gyral thickness and parahippocampal gyral volume in young people at high genetic risk for bipolar disorder. Psychol Med. 2016;46:2083–96.
42. Hajek T, Cullis J, Novak T, Kopecek M, Hoschl C, Blagdon R, et al. Hippocampal volumes in bipolar disorders: opposing effects of illness burden and lithium treatment. Bipolar Disord.
43. Kelly S, Jahanshad N, Zalesky A, Kochunov P, Agartz I, Alloza C, et al. Widespread white matter microstructural differences in schizophrenia across 4322 individuals: results from the ENIGMA Schizophrenia DTI Working Group. Mol Psychiatry
44. Schmaal L, Hibar DP, Samann PG, Hall GB, Baune BT, Jahanshad N, et al. Cortical abnormalities in adults and adolescents with major depression based on brain scans from 20 cohorts worldwide in the ENIGMA Major Depressive Disorder Working Group. Mol Psychiatry. 2017;22:900–9.
45. Panizzon MS, Fennema-Notestine C, Eyler LT, Jernigan TL, Prom-Wormley E, Neale M, et al. Distinct genetic influences on cortical surface area and cortical thickness. Cereb Cortex.
46. Winkler AM, Kochunov P, Blangero J, Almasy L, Zilles K, Fox PT, et al. Cortical thickness or grey matter volume? The importance of selecting the phenotype for imaging genetics studies. Neuroimage. 2010;53:1135–46.
47. Lin A, Ching CRK, Vajdi A, Sun D, Jonas RK, Jalbrzikowski M, et al. Mapping 22q11.2 gene dosage effects on brain morphometry. J Neurosci. 2017;37:6183–99.
48. Rozycki M, Satterthwaite TD, Koutsouleris N, Erus G, Doshi J, Wolf DH, et al. Multisite machine learning analysis provides a robust structural imaging signature of schizophrenia detectable across diverse patient populations and within individuals. Schizophr Bull. 2017.
49. Abraham A, Milham MP, Di Martino A, Craddock RC, Samaras D, Thirion B, et al. Deriving reproducible biomarkers from multi-site resting-state data: an autism-based example. Neuroimage.
50. Cortes C, Vapnik V. Support-vector networks. Mach Learn.
51. Arbabshirani MR, Plis S, Sui J, Calhoun VD. Single subject prediction of brain disorders in neuroimaging: promises and pitfalls. Neuroimage. 2017;145(Pt B):137–65.
52. Pedregosa F, Varoquaux GI, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: machine learning in python. J Mach Learn Res. 2012;12:2825–30.
53. Mourao-Miranda J, Reinders AA, Rocha-Rego V, Lappin J, Rondina J, Morgan C, et al. Individualized prediction of illness course at the first psychotic episode: a support vector machine MRI study. Psychol Med. 2012;42:1037–47.
54. Ecker C, Rocha-Rego V, Johnston P, Mourao-Miranda J, Marquand A, Daly EM, et al. Investigating the predictive value of whole-brain structural MR scans in autism: a pattern classification approach. Neuroimage. 2010;49:44–56.
55. Hajek T, Cooke C, Kopecek M, Novak T, Hoschl C, Alda M. Using structural MRI to identify individuals at genetic risk for bipolar disorders: a 2-cohort, machine learning study. J Psychiatry Neurosci. 2015;40:316–24.
56. Pettersson-Yeo W, Benetti S, Marquand AF, Dell'Acqua F, Williams SC, Allen P, et al. Using genetic, cognitive and multi-modal neuroimaging data to identify ultra-high-risk and first-episode psychosis at the individual level. Psychol Med. 2013;43:
57. LaConte S, Strother S, Cherkassky V, Anderson J, Hu X. Support vector machines for temporal classification of block design fMRI data. Neuroimage. 2005;26:317–29.
58. Wolfers T, Buitelaar JK, Beckmann CF, Franke B, Marquand AF. From estimating activation locality to predicting disorder: a review of pattern recognition for neuroimaging-based psychiatric diagnostics. Neurosci Biobehav Rev. 2015;57:328–49.
59. Rutter CM, Gatsonis C. A hierarchical regression approach to meta-analysis of diagnostic test accuracy evaluations. Stat Med.
60. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: synthetic minority over-sampling technique. J Artif Intell Res.
61. He H, Garcia E. Learning from imbalanced data sets. IEEE Trans Knowl data Eng. 2010;21:12634.
62. Lemaitre G, Nogueira F, Aridas CK. Imbalanced-learn: a python toolbox to tackle the curse of imbalanced datasets in machine learning. J Mach Learn Res. 2017;18:1–5.
63. Bates D, Mächler M, Bolker B, Walker S. Fitting linear mixed-effects models using lme4. J Stat Softw. 2015;67:51.
64. Hosmer DW, Lemeshow S, Sturdivant RX. Applied logistic regression. 3rd ed. Hoboken, New Jersey, USA: Wiley; 2013.
65. Iniesta R, Hodgson K, Stahl D, Malki K, Maier W, Rietschel M, et al. Antidepressant drug-specific prediction of depression treatment outcomes from genetic and clinical variables. Sci Rep.
66. Savitz JB, Rauch SL, Drevets WC. Clinical application of brain imaging for the diagnosis of mood disorders: the current state of play. Mol Psychiatry. 2013;18:528–39.
67. Regier DA, Narrow WE, Clarke DE, Kraemer HC, Kuramoto SJ, Kuhl EA, et al. DSM-5 field trials in the United States and Canada, Part II: test-retest reliability of selected categorical diagnoses. Am J Psychiatry. 2013;170:59–70.
68. Bengio Y, Courville A, Vincent P. Representation learning: a review and new perspectives. IEEE Trans Pattern Anal Mach Intell. 2013;35:1798–828.
69. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature.
70. Nieuwenhuis M, van Haren NE, Hulshoff Pol HE, Cahn W, Kahn RS, Schnack HG. Classification of schizophrenia patients and healthy controls from structural MRI scans in two large independent samples. Neuroimage. 2012;61:606–12.
71. Haufe S, Meinecke F, Gorgen K, Dahne S, Haynes JD, Blankertz B, et al. On the interpretation of weight vectors of linear models in multivariate neuroimaging. Neuroimage. 2014;87:96–110.
72. Robins E, Guze SB. Establishment of diagnostic validity in psychiatric illness: its application to schizophrenia. Am J Psychiatry.
73. Berk M, Kapczinski F, Andreazza AC, Dean OM, Giorlando F, Maes M, et al. Pathways underlying neuroprogression in bipolar disorder: focus on inflammation, oxidative stress and neurotrophic factors. Neurosci Biobehav Rev. 2011;35:804–17.
74. Kambeitz J, Kambeitz-Ilankovic L, Leucht S, Wood S, Davatzikos C, Malchow B, et al. Detecting neuroimaging biomarkers for schizophrenia: a meta-analysis of multivariate pattern recognition studies. Neuropsychopharmacology. 2015;40:1742–51.
75. Hajek T, Weiner MW. Neuroprotective effects of lithium in human brain? Food for thought. Curr Alzheimer Res.
76. Tariot PN, Schneider LS, Cummings J, Thomas RG, Raman R, Jakimovich LJ, et al. Chronic divalproex sodium to attenuate agitation and clinical progression of Alzheimer disease. Arch Gen Psychiatry. 2011;68:853–61.
77. Dluhos P, Schwarz D, Cahn W, Van Haren N, Kahn R, Spaniel F, et al. Multi-center machine learning in imaging psychiatry: a meta-model approach. Neuroimage. 2017;155:10–24.
78. Goodfellow I, Bengio Y, Courville A. Chapter 5: Machine learning basics. In Deep Learning. Cambridge, MA, USA: MIT Press; 2016.
Abraham Nunes 1,2 · Hugo G. Schnack 3 · Christopher R. K. Ching 4,5 · Ingrid Agartz 6,7,8,9 ·
Theophilus N. Akudjedu 10 · Martin Alda 1 · Dag Alnæs 6,7 · Silvia Alonso-Lana 11,12 · Jochen Bauer 13 ·
Bernhard T. Baune 14 · Erlend Bøen 8 · Caterina del Mar Bonnin 15 · Geraldo F. Busatto 16,17 ·
Erick J. Canales-Rodríguez 11,12 · Dara M. Cannon 10 · Xavier Caseras 18 · Tiffany M. Chaim-Avancini 16,17 ·
Udo Dannlowski 19 · Ana M. Díaz-Zuluaga 20 · Bruno Dietsche 21 · Nhat Trung Doan 6,7 · Edouard Duchesnay 22 ·
Torbjørn Elvsåshagen 6,23 · Daniel Emden 19 · Lisa T. Eyler 24,25 · Mar Fatjó-Vilas 11,12,26 · Pauline Favre 22 ·
Sonya F. Foley 27 · Janice M. Fullerton 28,29 · David C. Glahn 30,31 · Jose M. Goikolea 15 · Dominik Grotegerd 19 ·
Tim Hahn 19 · Chantal Henry 32 · Derrek P. Hibar 5 · Josselin Houenou 22,33 · Fleur M. Howells 34,35 · Neda Jahanshad 5 ·
Tobias Kaufmann 6,7 · Joanne Kenney 10 · Tilo T. J. Kircher 21 · Axel Krug 21 · Trine V. Lagerberg 6 ·
Rhoshel K. Lenroot 36,37 · Carlos López-Jaramillo 20,38 · Rodrigo Machado-Vieira 16,39 · Ulrik F. Malt 40,41 ·
Colm McDonald 10 · Philip B. Mitchell 36,42 · Benson Mwangi 39 · Leila Nabulsi 10 · Nils Opel 19 · Bronwyn J. Overs 28 ·
Julian A. Pineda-Zapata 43 · Edith Pomarol-Clotet 11,12 · Ronny Redlich 19 · Gloria Roberts 36,42 · Pedro G. Rosa 16,17 ·
Raymond Salvador 11,12 · Theodore D. Satterthwaite 44 · Jair C. Soares 39 · Dan J. Stein 45 · Henk S. Temmingh 45,46 ·
Thomas Trappenberg 2 · Anne Uhlmann 45,47 · Neeltje E. M. van Haren 3,48 · Eduard Vieta 15 · Lars T. Westlye 6,7,49 ·
Daniel H. Wolf 44 · Dilara Yüksel 21 · Marcus V. Zanetti 16,17,50 · Ole A. Andreassen 6,7 · Paul M. Thompson 5 ·
Tomas Hajek 1 · for the ENIGMA Bipolar Disorders Working Group
1 Department of Psychiatry, Dalhousie University, Halifax, Nova
Scotia, Canada
2 Faculty of Computer Science, Dalhousie University,
Halifax, Nova Scotia, Canada
3 Department of Psychiatry, Brain Center Rudolf Magnus,
University Medical Center Utrecht, Utrecht University,
Utrecht, The Netherlands
4 Interdepartmental Neuroscience Program, University of California,
Los Angeles, CA, USA
5 Imaging Genetics Center, Mark and Mary Stevens Neuroimaging
and Informatics Institute, Keck School of Medicine of USC,
University of Southern California, Marina del Rey, CA, USA
6 NORMENT KG Jebsen Centre, University of Oslo, Oslo, Norway
7 Division of Mental Health and Addiction, Oslo University
Hospital, Oslo, Norway
8 Department of Psychiatric Research, Diakonhjemmet Hospital,
Oslo, Norway
9 Department of Clinical Neuroscience, Centre for Psychiatric
Research, Karolinska Institutet, Stockholm, Sweden
10 Centre for Neuroimaging and Cognitive Genomics (NICOG),
Clinical Neuroimaging Laboratory, NCBES Galway Neuroscience
Centre, College of Medicine Nursing and Health Sciences,
National University of Ireland Galway, Galway, Ireland
11 FIDMAG Germanes Hospitalàries Research Foundation,
Barcelona, Spain
12 Centro de Investigación Biomédica en Red de Salud Mental
(CIBERSAM), Madrid, Spain
13 Institute of Clinical Radiology, Medical Faculty University of
Muenster and University Hospital Muenster,
Muenster, Germany
14 Department of Psychiatry, Melbourne Medical School, The
University of Melbourne, Parkville, VIC, Australia
15 Hospital Clinic, University of Barcelona, IDIBAPS, CIBERSAM,
Barcelona, Catalonia, Spain
16 Laboratory of Psychiatric Neuroimaging (LIM-21), Department
and Institute of Psychiatry, Faculty of Medicine, University of São
Paulo, São Paulo, Brazil
17 Center for Interdisciplinary Research on Applied Neurosciences
(NAPNA), University of São Paulo, São Paulo, Brazil
18 MRC Centre for Neuropsychiatric Genetics and Genomics, Cardiff
University, Cardiff, UK
19 Department of Psychiatry, University of Münster,
Münster, Germany
20 Research Group in Psychiatry, Department of Psychiatry, Faculty
of Medicine, Universidad de Antioquia, Medellín, Antioquia, Colombia
21 Department of Psychiatry and Psychotherapy, Philipps-University
Marburg, Marburg, Germany
22 NeuroSpin, CEA, Paris-Saclay, Gif sur Yvette, France
23 Department of Neurology, Oslo University Hospital,
Oslo, Norway
24 Department of Psychiatry, University of California, San Diego,
La Jolla, CA, USA
25 Desert-Pacific Mental Illness Research, Education, and Clinical
Center, VA San Diego Healthcare System, La Jolla, CA, USA
26 Departament de Biologia Evolutiva, Ecologia i Ciències
Ambientals, Facultat de Biologia, Universitat de Barcelona,
Barcelona, Spain
27 Cardiff University Brain Research Imaging Centre, Cardiff
University, Cardiff, UK
28 Neuroscience Research Australia, Sydney, NSW, Australia
29 School of Medical Sciences, University of New South Wales,
Sydney, NSW, Australia
30 Department of Psychiatry, Yale University, New Haven, CT, USA
31 Olin Neuropsychiatric Research Center, Institute of Living,
Hartford Hospital, Hartford, CT, USA
32 Institut Pasteur, Unité Perception et Mémoire, Paris, France
33 INSERM U955 Team 15 Translational Psychiatry, University
Paris East, APHP, CHU Mondor, Fondation FondaMental,
Créteil, France
34 Neuroscience Institute, University of Cape Town, Cape Town,
South Africa
35 Translational Neuroscience Group, Department of Psychiatry and
Mental Health, Cape Town, South Africa
36 School of Psychiatry, University of New South Wales,
Sydney, NSW, Australia
37 Department of Psychiatry and Behavioural Sciences, University of
New Mexico, Albuquerque, NM, USA
38 Mood Disorders Program, Hospital Universitario San Vicente
Fundación, Medellín, Antioquia, Colombia
39 Department of Psychiatry, University of Texas Health Science
Center at Houston, Houston, TX, USA
40 Psychosomatic Unit, Division of Mental Health and Dependence,
Oslo University Hospital and University of Oslo, Oslo, Norway
41 University of Oslo, Institute of Clinical Medicine, Oslo, Norway
42 Black Dog Institute, Prince of Wales Hospital, Sydney, NSW, Australia
43 Research Group, Instituto de Alta Tecnología Médica (IATM),
Medellín, Antioquia, Colombia
44 Department of Psychiatry, University of Pennsylvania,
Philadelphia, PA, USA
45 Department of Psychiatry, SA MRC Unit on Risk & Resilience in
Mental Disorders, University of Cape Town, Cape Town, South Africa
46 Western Cape Department of Health, Valkenberg Hospital,
Cape Town, Western Cape, South Africa
47 Department of Psychiatry, University of Vermont, Burlington, VT, USA
48 Department of Child and Adolescent Psychiatry/Psychology,
Erasmus Medical Centre, Rotterdam, The Netherlands
49 Department of Psychology, University of Oslo, Oslo, Norway
50 Instituto de Ensino e Pesquisa, Hospital Sírio-Libanês, Sao Paulo, Brazil
... have most likely afflicted at least some of these studies [162,216,215]. For bipolar disorder, a large recent multi-site study [217] based on 13 cohorts from ENIGMA (N = 3020) reports aggregated subject-level accuracies of about 65% using a combination of SVMs and extracted MRI features (regional cortical thickness, surface area, and subcortical volumes). ...
By promising more accurate diagnostics and individual treatment recommendations, deep neural networks and in particular convolutional neural networks have advanced to a powerful tool in medical imaging. Here, we first give an introduction into methodological key concepts and resulting methodological promises including representation and transfer learning, as well as modelling domain-specific priors. After reviewing recent applications within neuroimaging-based psychiatric research, such as the diagnosis of psychiatric diseases, delineation of disease subtypes, normative modeling, and the development of neuroimaging biomarkers, we discuss current challenges. This includes for example the difficulty of training models on small, heterogeneous and biased data sets, the lack of validity of clinical labels, algorithmic bias, and the influence of confounding variables.
... The most negative correlation is found for the median cingulate and paracingulate gyri–inferior parietal gyrus, excluding supramarginal and angular gyri [57,58]. In addition, we identify brain regions that the machine learning algorithm considers to be significant in brain age estimation using the feature importance [59]. The feature importance values are normalized to give the top 20 functional connectivity features (Fig. 3). ...
Major depressive disorder (MDD) is one of the most common mental health conditions that has been intensively investigated for its association with brain atrophy and mortality. Recent studies suggest that the deviation between the predicted and the chronological age can be a marker of accelerated brain aging to characterize MDD. However, current conclusions are usually drawn based on structural MRI information collected from Caucasian participants. The universality of this biomarker needs to be further validated by subjects with different ethnic/racial backgrounds and by different types of data. Here we make use of the REST-meta-MDD, a large scale resting-state fMRI dataset collected from multiple cohort participants in China. We develop a stacking machine learning model based on 1101 healthy controls, which estimates a subject’s chronological age from fMRI with promising accuracy. The trained model is then applied to 1276 MDD patients from 24 sites. We observe that MDD patients exhibit a +4.43 years (p < 0.0001, Cohen’s d = 0.31, 95% CI: 2.23–3.88) higher brain-predicted age difference (brain-PAD) compared to controls. In the MDD subgroup, we observe a statistically significant +2.09 years (p < 0.05, Cohen’s d = 0.134525) brain-PAD in antidepressant users compared to medication-free patients. The statistical relationship observed is further checked by three different machine learning algorithms. The positive brain-PAD observed in participants in China confirms the presence of accelerated brain aging in MDD patients. The utilization of functional brain connectivity for age estimation verifies existing findings from a new dimension.
... Support vector machine is most frequently used among many ML methods for distinguishing BD from HCs. For example, Nunes et al. (2020) employed brain regional cortical thickness, surface area, and subcortical volumes to train the linear kernel SVM algorithm to delineate BD from HCs and obtain an accuracy of 58.67%. Other studies reported accuracy of 60 and 66.1% when SVM was trained by GM density and volume of GM and WM in distinguishing BD from HCs (Schnack et al., 2014; Serpa et al., 2014). ...
The diagnosis based on clinical assessment of pediatric bipolar disorder (PBD) may sometimes lead to misdiagnosis in clinical practice. For the past several years, machine learning (ML) methods were introduced for the classification of bipolar disorder (BD), which were helpful in the diagnosis of BD. In this study, brain cortical thickness and subcortical volume of 33 PBD-I patients and 19 age-sex matched healthy controls (HCs) were extracted from the magnetic resonance imaging (MRI) data and set as features for classification. The dimensionality reduced feature subset, which was filtered by Lasso or f_classif, was sent to the six classifiers (logistic regression (LR), support vector machine (SVM), random forest classifier, naïve Bayes, k-nearest neighbor, and AdaBoost algorithm), and the classifiers were trained and tested. Among all the classifiers, the top two classifiers with the highest accuracy were LR (84.19%) and SVM (82.80%). Feature selection was performed in the six algorithms to obtain the most important variables including the right middle temporal gyrus and bilateral pallidum, which is consistent with structural and functional anomalous changes in these brain regions in PBD patients. These findings take the computer-aided diagnosis of BD a step forward.
Introduction: Estimating the risk of manic relapse could help the psychiatrist individually adjust the treatment to the risk. Some authors have attempted to estimate this risk from baseline clinical data. Still, no studies have assessed whether the estimation could improve by adding structural magnetic resonance imaging (MRI) data. We aimed to evaluate it. Material and Methods: We followed a cohort of 78 patients with a manic episode without mixed symptoms (bipolar type I or schizoaffective disorder) at 2-4-6-9-12-15-18 months and up to 10 years. Within a cross-validation scheme, we created and evaluated a Cox lasso model to estimate the risk of manic relapse using both clinical and MRI data. Results: The model successfully estimated the risk of manic relapse (Cox regression of the time to relapse as a function of the estimated risk: hazard ratio (HR)=2.35, p=0.027; area under the curve (AUC)=0.65, expected calibration error (ECE)<0.2). The most relevant variables included in the model were the diagnosis of schizoaffective disorder, poor impulse control, unusual thought content, and cerebellum volume decrease. The estimations were poorer when we used clinical or MRI data separately. Conclusion: Combining clinical and MRI data may improve the risk of manic relapse estimation after a manic episode. We provide a website that estimates the risk according to the model to facilitate replication by independent groups before translation to clinical settings.
The digital world has been growing exponentially, changing the way we do science—both in how we generate and disseminate it. This has consequently opened up opportunities to improve mental healthcare. Mobile interventions, real-time symptom monitoring, chatbots, and digital phenotyping are among those opportunities that take mental health outside of the office, empowering patients in their own treatment. However, these approaches do not rule out the presence of mental health professionals, who not only use technology for telepsychiatry consultations, but also for more effective and data-driven diagnosis, treatment, and follow-up in the clinic. New technologies can support direct interventions with patients, the collection of data outside the office, either active or passively, and a clearer look at the data both for patients and providers with the help of data curators. To guarantee privacy, efficacy, and evaluate risks attached to the treatment provided by these apps, the American Psychiatric Association (APA) developed an evaluation model to do such assessment. This way the clinician–patient relationship can expand, providing more clinical insights to both clinicians and patients and a more empathetic look for providers at their own patients. Ethical concerns must be considered when talking about the integration of technology with mental healthcare, but its perceived advantages, such as accessibility, efficacy, and engagement, promise a new era in this important department worldwide.
Background: Major depressive disorder and bipolar disorder in adolescents are prevalent and are associated with cognitive impairment, executive dysfunction, and increased mortality. Early intervention in the initial stages of major depressive disorder and bipolar disorder can significantly improve personal health. Methods: We collected 309 samples from the Adolescent Brain Cognitive Development study, including 116 adolescents with bipolar disorder, 64 adolescents with major depressive disorder, and 129 healthy adolescents, and employed a support vector machine to develop classification models for identification. We developed a multimodal model, which combined functional connectivity of resting-state functional magnetic resonance imaging and four anatomical measures of structural magnetic resonance imaging (cortical thickness, area, volume, and sulcal depth). We measured the performances of both multimodal and single modality classifiers. Results: The multimodal classifiers showed outstanding performance compared with all five single modalities, and they are 100% for major depressive disorder versus healthy controls, 100% for bipolar disorder versus healthy controls, 98.5% (95% CI: 95.4–100%) for major depressive disorder versus bipolar disorder, 100% for major depressive disorder versus depressed bipolar disorder and the leave-one-site-out analysis results are 77.4%, 63.3%, 79.4%, and 81.7%, respectively. Conclusions: The study shows that multimodal classifiers show high classification performances. Moreover, cuneus may be a potential biomarker to differentiate major depressive disorder, bipolar disorder, and healthy adolescents. Overall, this study can form multimodal diagnostic prediction workflows for clinically feasible to make more precise diagnose at the early stage and potentially reduce loss of personal pain and public society.
Bipolar disorder (BD) is a severe mental illness associated with alterations in brain organization. Neuroimaging studies have generated a large body of knowledge regarding brain morphological and functional abnormalities in BD. Current advances in the field have focussed on the need for more precise neuroimaging biomarkers. Here we present a selective overview of precision neuroimaging biomarkers for BD, focussing on personalized metrics and novel neuroimaging methods aiming to provide mechanistic insights into the brain alterations associated with BD. The evidence presented covers (a) machine learning techniques applied to neuroimaging data to differentiate patients with BD from healthy individuals or other clinical groups; (b) the ‘brain-age-gap-estimation’ (brainAGE), which is an individualized measure of brain health; (c) diffusional kurtosis imaging (DKI), neurite orientation dispersion and density imaging (NODDI) and Positron Emission Tomography (PET) techniques that open new opportunities to measure microstructural changes in neurite/synaptic integrity and function.
Background: The lack of nonparametric statistical tests for confounding bias significantly hampers the development of robust, valid, and generalizable predictive models in many fields of research. Here I propose the partial confounder test, which, for a given confounder variable, probes the null hypotheses of the model being unconfounded. Results: The test provides a strict control for type I errors and high statistical power, even for nonnormally and nonlinearly dependent predictions, often seen in machine learning. Applying the proposed test on models trained on large-scale functional brain connectivity data (N = 1,865) (i) reveals previously unreported confounders and (ii) shows that state-of-the-art confound mitigation approaches may fail preventing confounder bias in several cases. Conclusions: The proposed test (implemented in the package mlconfound) can aid the assessment and improvement of the generalizability and validity of predictive models and, thereby, fosters the development of clinically useful machine learning biomarkers.
Background and Hypothesis: Multisite massive schizophrenia neuroimaging data sharing is becoming critical in understanding the pathophysiological mechanism and making an objective diagnosis of schizophrenia; it remains challenging to obtain a generalizable and interpretable, shareable, and evolvable neuroimaging biomarker for schizophrenia diagnosis. Study Design: A Morphometric Integrated Classification Index (MICI) was proposed as a potential biomarker for schizophrenia diagnosis based on structural magnetic resonance imaging data of 1270 subjects from 10 sites (588 schizophrenia patients and 682 normal controls). An optimal XGBoost classifier plus sample-weighted SHapley Additive explanation algorithms were used to construct the MICI measure. Study Results: The MICI measure achieved comparable performance with the sample-weighted ensembling model and merged model based on raw data (Delong test, P > 0.82) while outperformed the single-site models (Delong test, P < 0.05) in either the independent-sample testing datasets from the 9 sites or the independent-site dataset (generalizable). Besides, when new sites were embedded in, the performance of this measure was gradually increasing (evolvable). Finally, MICI was strongly associated with the severity of schizophrenia brain structural abnormality, with the patients’ positive and negative symptoms, and with the brain expression profiles of schizophrenia risk genes (interpretable). Conclusions: In summary, the proposed MICI biomarker may provide a simple and explainable way to support clinicians for objectively diagnosing schizophrenia. Finally, we developed an online model share platform to promote biomarker generalization and provide free individual prediction services.
Brain diseases impact more than 1 billion people worldwide and include a wide spectrum of diseases and disorders such as stroke, Alzheimer’s, Parkinson’s, Epilepsy and other Seizure disorders. Most of these brain illnesses are subjected to misclassification, and early diagnosis increases the possibilities of preventing or delaying the development of these disorders. Magnetic Resonance Imaging (MRI) plays an important role in the diagnosis of patients with brain disorders and offers the potential of non-invasive longitudinal monitoring and bio-markers of disease progression. Our work focuses on using machine learning and deep learning techniques for the preemptive diagnosis of Schizophrenia using Kaggle data set and Alzheimer’s using TADPOLE data set comprising of MRI features. Since the number of works using TADPOLE data set is minimum, we have chosen this for our study. Machine learning algorithms such as support vector machine (SVM), Decision Tree, Random Forest, Gaussian Naive Bayes, and 1D-CNN deep learning algorithm have been used for the classification of the disorders. It has been observed that Gaussian NB performed the best on Schizophrenia data, while Random Forest outperformed on Alzheimer’s data compared to the other classifiers.
Keywords: Magnetic Resonance Imaging (MRI); Alzheimer’s; Parkinson’s; Brain disorders; 1D-CNN; Gaussian Naive Bayes
Individuals with depression differ substantially in their response to treatment with antidepressants. Specific predictors explain only a small proportion of these differences. To meaningfully predict who will respond to which antidepressant, it may be necessary to combine multiple biomarkers and clinical variables. Using statistical learning on common genetic variants and clinical information in a training sample of 280 individuals randomly allocated to 12-week treatment with antidepressants escitalopram or nortriptyline, we derived models to predict remission with each antidepressant drug. We tested the reproducibility of each prediction in a validation set of 150 participants not used in model derivation. An elastic net logistic model based on eleven genetic and six clinical variables predicted remission with escitalopram in the validation dataset with area under the curve 0.77 (95%CI; 0.66-0.88; p = 0.004), explaining approximately 30% of variance in who achieves remission. A model derived from 20 genetic variables predicted remission with nortriptyline in the validation dataset with an area under the curve 0.77 (95%CI; 0.65-0.90; p < 0.001), explaining approximately 36% of variance in who achieves remission. The predictive models were antidepressant drug-specific. Validated drug-specific predictions suggest that a relatively small number of genetic and clinical variables can help select treatment between escitalopram and nortriptyline.
The regional distribution of white matter (WM) abnormalities in schizophrenia remains poorly understood, and reported disease effects on the brain vary widely between studies. In an effort to identify commonalities across studies, we perform what we believe is the first ever large-scale coordinated study of WM microstructural differences in schizophrenia. Our analysis consisted of 2359 healthy controls and 1963 schizophrenia patients from 29 independent international studies; we harmonized the processing and statistical analyses of diffusion tensor imaging (DTI) data across sites and meta-analyzed effects across studies. Significant reductions in fractional anisotropy (FA) in schizophrenia patients were widespread, and detected in 20 of 25 regions of interest within a WM skeleton representing all major WM fasciculi. Effect sizes varied by region, peaking at (d=0.42) for the entire WM skeleton, driven more by peripheral areas as opposed to the core WM where regions of interest were defined. The anterior corona radiata (d=0.40) and corpus callosum (d=0.39), specifically its body (d=0.39) and genu (d=0.37), showed greatest effects. Significant decreases, to lesser degrees, were observed in almost all regions analyzed. Larger effect sizes were observed for FA than diffusivity measures; significantly higher mean and radial diffusivity was observed for schizophrenia patients compared with controls. No significant effects of age at onset of schizophrenia or medication dosage were detected. As the largest coordinated analysis of WM differences in a psychiatric disorder to date, the present study provides a robust profile of widespread WM abnormalities in schizophrenia patients worldwide. Interactive three-dimensional visualization of the results is available at
The brain underpinnings of schizophrenia and bipolar disorders are multidimensional, reflecting complex pathological processes and causal pathways, requiring multivariate techniques to disentangle. Furthermore, little is known about the complementary clinical value of brain structural phenotypes when combined with data on cognitive performance and genetic risk. Using data-driven fusion of cortical thickness, surface area, and gray matter density maps (GMD), we found six biologically meaningful patterns showing strong group effects, including four statistically independent multimodal patterns reflecting co-occurring alterations in thickness and GMD in patients, over and above two other independent patterns of widespread thickness and area reduction. Case-control classification using cognitive scores alone revealed high accuracy, and adding imaging features or polygenic risk scores increased performance, suggesting their complementary predictive value with cognitive scores being the most sensitive features. Multivariate pattern analyses reveal distinct patterns of brain morphology in mental disorders, provide insights on the relative importance between brain structure, cognitive and polygenetic risk score in classification of patients, and demonstrate the importance of multivariate approaches in studying the pathophysiological substrate of these complex disorders.
Reciprocal chromosomal rearrangements at the 22q11.2 locus are associated with elevated risk of neurodevelopmental disorders. The 22q11.2 deletion confers the highest known genetic risk for schizophrenia, but a duplication in the same region is strongly associated with autism and is less common in schizophrenia cases than in the general population. Here we conducted the first study of 22q11.2 gene dosage effects on brain structure in a sample of 143 human subjects: 66 with 22q11.2 deletions (22q-del; 32 males), 21 with 22q11.2 duplications (22q-dup; 14 males), and 56 age- and sex-matched controls (31 males). 22q11.2 gene dosage varied positively with intracranial volume, gray and white matter volume, and cortical surface area (deletion < control < duplication). In contrast, gene dosage varied negatively with mean cortical thickness (deletion > control > duplication). Widespread differences were observed for cortical surface area with more localized effects on cortical thickness. These diametric patterns extended into subcortical regions: 22q-dup carriers had a significantly larger right hippocampus, on average, but lower right caudate and corpus callosum volume, relative to 22q-del carriers. Novel subcortical shape analysis revealed greater radial distance (thickness) of the right amygdala and left thalamus, and localized increases and decreases in subregions of the caudate, putamen, and hippocampus in 22q-dup relative to 22q-del carriers. This study provides the first evidence that 22q11.2 is a genomic region associated with gene-dose-dependent brain phenotypes. Pervasive effects on cortical surface area imply that this copy number variant affects brain structure early in the course of development.
Despite decades of research, the pathophysiology of bipolar disorder (BD) is still not well understood. Structural brain differences have been associated with BD, but results from neuroimaging studies have been inconsistent. To address this, we performed the largest study to date of cortical gray matter thickness and surface area measures from brain magnetic resonance imaging scans of 6503 individuals including 1837 unrelated adults with BD and 2582 unrelated healthy controls for group differences while also examining the effects of commonly prescribed medications, age of illness onset, history of psychosis, mood state, age and sex differences on cortical regions. In BD, cortical gray matter was thinner in frontal, temporal and parietal regions of both brain hemispheres. BD had the strongest effects on left pars opercularis (Cohen’s d=−0.293; P=1.71 × 10⁻²¹), left fusiform gyrus (d=−0.288; P=8.25 × 10⁻²¹) and left rostral middle frontal cortex (d=−0.276; P=2.99 × 10⁻¹⁹). Longer duration of illness (after accounting for age at the time of scanning) was associated with reduced cortical thickness in frontal, medial parietal and occipital regions. We found that several commonly prescribed medications, including lithium, antiepileptic and antipsychotic treatment showed significant associations with cortical thickness and surface area, even after accounting for patients who received multiple medications. We found evidence of reduced cortical surface area associated with a history of psychosis but no associations with mood state at the time of scanning. Our analysis revealed previously undetected associations and provides an extensive analysis of potential confounding variables in neuroimaging studies of BD.
Past work on relatively small, single-site studies using regional volumetry, and more recently machine learning methods, has shown that widespread structural brain abnormalities are prominent in schizophrenia. However, to be clinically useful, structural imaging biomarkers must integrate high-dimensional data and provide reproducible results across clinical populations and on an individual person basis. Using advanced multi-variate analysis tools and pooled data from case–control imaging studies conducted at 5 sites (941 adult participants, including 440 patients with schizophrenia), a neuroanatomical signature of patients with schizophrenia was found, and its robustness and reproducibility across sites, populations, and scanners, was established for single-patient classification. Analyses were conducted at multiple scales, including regional volumes, voxelwise measures, and complex distributed patterns. Single-subject classification was tested for single-site, pooled-site, and leave-site-out generalizability. Regional and voxelwise analyses revealed a pattern of widespread reduced regional gray matter volume, particularly in the medial prefrontal, temporolimbic and peri-Sylvian cortex, along with ventricular and pallidum enlargement. Multivariate classification using pooled data achieved a cross-validated prediction accuracy of 76% (AUC = 0.84). Critically, the leave-site-out validation of the detected schizophrenia signature showed accuracy/AUC range of 72–77%/0.73–0.91, suggesting a robust generalizability across sites and patient cohorts. Finally, individualized patient classifications displayed significant correlations with clinical measures of negative, but not positive, symptoms. Taken together, these results emphasize the potential for structural neuroimaging data to provide a robust and reproducible imaging signature of schizophrenia. A web-accessible portal is offered to allow the community to obtain individualized classifications of magnetic resonance imaging scans using the methods described herein.
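The leave-site-out validation mentioned in this abstract can be illustrated with a minimal sketch. It shows only how the data are partitioned: each site is held out in turn while a model would be trained on the remaining sites. The function name `leave_site_out_splits` and the toy site labels are assumptions made for illustration; this is not the authors' pipeline.

```python
from collections import defaultdict


def leave_site_out_splits(site_labels):
    """Yield (held_out_site, train_indices, test_indices), one split per site.

    site_labels: one site identifier per participant (hypothetical input).
    """
    # Group participant indices by the site they were scanned at.
    by_site = defaultdict(list)
    for index, site in enumerate(site_labels):
        by_site[site].append(index)

    # Hold out each site in turn; train on every other site.
    for held_out in sorted(by_site):
        test_idx = by_site[held_out]
        train_idx = sorted(
            i for site, indices in by_site.items() if site != held_out for i in indices
        )
        yield held_out, train_idx, test_idx


# Toy example: six participants from three hypothetical sites.
sites = ["A", "A", "B", "C", "C", "B"]
splits = list(leave_site_out_splits(sites))
for held_out, train_idx, test_idx in splits:
    print(held_out, train_idx, test_idx)
```

Because no participant from the held-out site appears in the training set, accuracy on that site reflects generalization to an unseen scanner and cohort rather than within-site fit.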
Background: Bipolar Disorder (BD) cannot be reliably distinguished from Major Depressive Disorder (MDD) until the first manic or hypomanic episode. Consequently, many patients with BD are treated with antidepressants without mood stabilizers, a strategy that is often ineffective and carries a risk of inducing a manic episode. We previously reported reduced cortical thickness in right precuneus, right caudal middle-frontal cortex and left inferior parietal cortex in BD compared with MDD. Methods: This study extends our previous work by performing individual level classification of BD or MDD in an expanded, currently unmedicated, cohort using gray matter volume (GMV) based on Magnetic Resonance Imaging and a Support Vector Machine. All patients were in a Major Depressive Episode and a leave-two-out analysis was performed. Results: Nineteen out of 26 BD subjects and 20 out of 26 MDD subjects were correctly identified, for a combined accuracy of 75%. The three brain regions contributing to the classification were higher GMV in bilateral supramarginal gyrus and occipital cortex indicating MDD, and higher GMV in right dorsolateral prefrontal cortex indicating BD. Limitations: This analysis included scans performed with two different headcoils and scan sequences, which limited the interpretability of results in an independent cohort analysis. Conclusions: Our results add to previously published data which suggest that regional gray matter volume should be investigated further as a clinical diagnostic tool to predict BD before the appearance of a manic or hypomanic episode.
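The combined accuracy reported in this abstract follows directly from the two per-group counts. The short sketch below reproduces that arithmetic; the function name `classification_rates` is a hypothetical helper, and only the counts (19 of 26 and 20 of 26) come from the abstract.

```python
def classification_rates(correct_a, total_a, correct_b, total_b):
    """Return per-group correct-classification rates and the combined accuracy."""
    rate_a = correct_a / total_a
    rate_b = correct_b / total_b
    combined = (correct_a + correct_b) / (total_a + total_b)
    return rate_a, rate_b, combined


# Counts taken from the abstract: 19/26 BD and 20/26 MDD correctly identified.
bd_rate, mdd_rate, accuracy = classification_rates(19, 26, 20, 26)
print(f"BD: {bd_rate:.1%}, MDD: {mdd_rate:.1%}, combined: {accuracy:.1%}")
```

With equal group sizes, the combined accuracy is simply the mean of the two per-group rates, so 39 correct out of 52 gives 75%.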
Predictive models ground many state-of-the-art developments in statistical brain image analysis: decoding, MVPA, searchlight, or extraction of biomarkers. The principled approach to establish their validity and usefulness is cross-validation, testing prediction on unseen data. Here, I would like to raise awareness on error bars of cross-validation, which are often underestimated. Simple experiments show that sample sizes of many neuroimaging studies inherently lead to large error bars, e.g., ±10% for 100 samples. The standard error across folds strongly underestimates them. These large error bars compromise the reliability of conclusions drawn with predictive models, such as biomarkers or methods developments where, unlike with cognitive neuroimaging MVPA approaches, more samples cannot be acquired by repeating the experiment across many subjects. Solutions to increase sample size must be investigated, tackling possible increases in heterogeneity of the data.
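To give a rough sense of why small test sets produce wide error bars, the sketch below computes a normal-approximation binomial interval for an accuracy estimate. This is a simplified stand-in, not the experiments described in the abstract: it assumes independent test samples and ignores the dependence between cross-validation folds. The function name `binomial_accuracy_interval` and the chosen sample sizes are illustrative assumptions.

```python
import math


def binomial_accuracy_interval(accuracy, n_samples, z=1.96):
    """Approximate 95% interval for an accuracy measured on n_samples.

    Uses the normal approximation to the binomial distribution and assumes
    each test sample is an independent trial (a simplifying assumption).
    """
    standard_error = math.sqrt(accuracy * (1.0 - accuracy) / n_samples)
    return accuracy - z * standard_error, accuracy + z * standard_error


# Interval half-width at chance-level accuracy for several sample sizes.
for n in (30, 100, 1000):
    low, high = binomial_accuracy_interval(0.5, n)
    print(f"n={n}: half-width = {(high - low) / 2:.3f}")
```

Under these assumptions, the half-width at 50% accuracy with 100 samples is about 0.098, or roughly ten percentage points, and it shrinks only with the square root of the sample size.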