Ghoshetal. BMC Psychiatry (2022) 22:427
https://doi.org/10.1186/s12888-022-03984-2
RESEARCH
What can we learn aboutthepsychiatric
diagnostic categories byanalysing patients’
lived experiences withMachine-Learning?
Chandril Chandan Ghosh1*, Duncan McVicar2, Gavin Davidson3, Ciaran Shannon4 and Cherie Armour1
Abstract
Background: To deliver appropriate mental healthcare interventions and support, it is imperative to be able to distinguish one person from the other. The current classification of mental illness (e.g., DSM) is unable to do that well, indicating the problem of diagnostic heterogeneity between disorders (i.e., the disorder categories have many common symptoms). As a result, the same person might be diagnosed with two different disorders by two independent clinicians. We argue that this problem might have arisen because these disorders were created by a group of humans (APA taskforce members) who relied more on intuition and consensus than on data. The literature suggests that human-led decisions are prone to biases, groupthink, and other factors (such as financial conflicts of interest) that can enormously influence the creation of diagnostic and treatment guidelines. Therefore, in this study, we ask whether, if we prevent such human intervention (and thereby the associated biases) and use Artificial Intelligence (A.I.) to form those disorder structures directly from the data (patient-reported symptoms), we can come up with homogeneous clusters or categories (representing disorders/syndromes: a group of co-occurring symptoms) that are adequately distinguishable from each other to be clinically useful. Additionally, we ask how these A.I.-created categories differ from (or are similar to) human-created categories. Finally, to the best of our knowledge, this is the first study to demonstrate how to use narrative qualitative data from patients with psychopathology and group their experiences using an A.I. Therefore, the current study also attempts to serve as a proof-of-concept.
Method: We used secondary data scraped from online communities and consisting of 10,933 patients’ narratives
about their lived experiences. These patients were diagnosed with one or more DSM diagnoses for mental illness.
Using Natural Language Processing techniques, we converted the text data into a numeric form. We then used an
Unsupervised Machine Learning algorithm called K-Means Clustering to group/cluster the symptoms.
Results: Using the data mining approach, the A.I. found four categories/clusters formed from the data. We presented
ten symptoms or experiences under each cluster to demonstrate the practicality of application and understanding.
We also identified the transdiagnostic factors and symptoms that were unique to each of these four clusters. We
explored the extent of similarities between these clusters and studied the difference in data density in them. Finally,
we reported the silhouette score of +0.046, indicating that the clusters are poorly distinguishable from each other
(i.e., they have highly overlapping symptoms).
Discussion: We infer that whether humans or an A.I. attempt to categorise mental illnesses, the result is that the categories of mental disorders will not be unique enough to distinguish one service seeker from another. Therefore, the categorical approach of diagnosing mental disorders can be argued to fall short of its purpose. We need to search for a classification system beyond the categorical approaches even if they have secondary merits (such as ease of communication and black-and-white (binary) decision making). However, our A.I.-based data mining approach yielded several noteworthy findings. For example, we found that some symptoms are more exclusive or unique to one cluster. In contrast, others are shared by most other clusters (i.e., identification of transdiagnostic experiences). Such differences are interesting objects of inquiry for future studies. For example, in clear contrast to the traditional diagnostic systems, while some experiences, such as auditory hallucinations, are present in all four clusters, others, such as trouble with eating, are exclusive to one cluster (representing a syndrome: a group of co-occurring symptoms). We argue that trans-diagnostic conditions (e.g., auditory hallucinations) might be prime targets for symptom-level interventions. For syndrome-level grouping and intervention, however, we argue that exclusive symptoms are the main targets.
Conclusion: The categorical approach to mental disorders is not a way forward because the categories are not unique enough and have several shared symptoms. We argue that the same symptoms can be present in more than one syndrome, although dimensionally different. However, we need additional studies to test this hypothesis. Future directions and implications are discussed.
Keywords: Classification, Taxonomy, Machine Learning, Lived experiences, Narratives
© The Author(s) 2022. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
*Correspondence: ghoshchandril@gmail.com
1 School of Psychology, Queen’s University Belfast, Belfast, United Kingdom
Full list of author information is available at the end of the article
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
Introduction
Diagnostic categories are important for mental healthcare services and research. Current diagnostic approaches have been demonstrated to be unreliable [1–3], and their usefulness questionable [4], because they are unable to clearly differentiate between different service seekers (i.e., between-disorder diagnostic heterogeneity). So, developing an alternative diagnostic approach is warranted and arguably necessary to advance research and clinical practice further. In this study, we propose an alternative way to categorise psychopathological symptoms.
The end goal of healthcare is to minimise or remove harmful or unhealthy experiences and promote well-being. The grouping of symptoms or diagnostic categories is important to any branch of health care, including mental health. It facilitates clear and consistent communication with patients, physicians, the government, and other stakeholders. It also facilitates the development of treatments that would otherwise be difficult to develop and to administer. Concerning mental ill health, psychiatry has created diagnostic categories for patients’ experiences. However, the existing approaches have been previously criticised for a range of limitations, including being “shrouded in the rhetoric of science” [5].
Problems with the traditional taxonomies, the DSM and the ICD
There is an emerging literature suggesting that the traditional diagnostic systems such as the Diagnostic and Statistical Manual of Mental Disorders (DSM–5, [6]) and the International Classification of Diseases 11th Revision (ICD-11, World Health Organization, WHO, 2020) [7] are unreliable (e.g., [1, 8, 9]). The same service seeker might receive two different diagnoses from two independent clinicians (i.e., low inter-rater reliability). In other words, two or more people (e.g., clinicians) do not arrive at the same diagnosis given an identical set of data. For example, one study demonstrated that 40% of diagnoses did not meet even a relaxed cut-off for acceptable interrater reliability [9]. There are two related concerns (as reviewed in [9]). The first is the co-occurrence of symptoms between disorders, which suggests an excessive overlap of symptoms between people who received different diagnoses. The second is that some patients’ experiences do not fit neatly with the disorders’ criteria. This indicates that some people, despite expressing significant distress or impairment and need for help, do not fit well with the criteria for the DSM’s diagnostic categories.
The root of these problems may relate to the fact that historically such disorders were derived using a top-down approach, where a committee of experts agreed upon the names of disorders and which symptoms to include for their diagnosis, based on the ideologies prominent at that time [10]. As a result, these traditional diagnoses rely on certain untested assumptions, such as the assumption that mental disorders can be organised effectively into categories, and assumptions about which symptoms to include under what label.
Furthermore, there might have been undue political and commercial influences in creating the classification systems, like the Diagnostic and Statistical Manual of Mental Disorders (DSM-5), created by the American Psychiatric Association (APA) task force members. For example, it was reported that 69% of the members responsible for creating the DSM-5 had ties to the pharmaceutical industry [11]. It has been argued that such financial ties might have had an enormous influence on the diagnostic and treatment guidelines (e.g., “pro-industry habit of thought”). In other words, the DSM was not without conflicts of interest.
Such unreliable diagnostics are problematic because such manuals shape the way healthcare professionals and society at large view mental illnesses and those suffering from them. So, an unreliable diagnostic system is expected to lead to misjudging people experiencing psychopathology on how they are experiencing it, how much help they need, and what they are capable of, with important implications such as taking away their voting rights, as in the United Kingdom, where patients can lose their right to vote “if deemed to lack mental capacity by a health care provider” [12]. Furthermore, unreliable diagnostics might lead to unreliable categorisation of research participants in clinical trials (e.g., anti-depressant trials) and other empirical studies (e.g., neuroimaging studies to differentiate brain functioning, or studies attempting to find biomarkers that distinguish patients with a particular diagnosis from people without that diagnosis), leading to the production of questionable literature (knowledge base). Combined, an unreliable diagnostic system is likely to worsen treatment. Likewise, due to the uncertainty in the validity of the current diagnostic categories, they do not always guide treatment and predict outcomes [13].
Addressing thelimitations ofthecurrent taxonomies
In the current study, the aims were to explore the pos-
sibility of an alternative diagnostic categorical system
by taking a bottom-up approach where we built a diag-
nostic system from the narrative data of patients’ lived
experiences. ere was no limit on how many disorders
or syndromes could be generated. ere was also no ini-
tial decision about which symptoms would be included
under each disorder. Instead, based solely on the struc-
ture of the data, the A.I. algorithm organised the data
using 100s of iterations performed using different symp-
toms. In simpler words, a single “iteration” refers to the
step of estimating the centroid and assigning all the data
points to the cluster based on their distance from the
centroid. We have run several iterations to improve the
quality of the clusters.
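
To make the notion of an “iteration” concrete, the following is a minimal sketch in Python with NumPy. It is not the code used in this study: the function name kmeans_iteration, the toy two-dimensional points, and the starting centroids are hypothetical and chosen only for illustration. It repeats the assign-and-re-estimate step until the centroids stop moving.

```python
import numpy as np


def kmeans_iteration(points, centroids):
    """One K-Means iteration: assign points to the nearest centroid, then re-estimate centroids."""
    # Euclidean distance from every point to every centroid, shape (n_points, k)
    distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    labels = distances.argmin(axis=1)
    new_centroids = centroids.copy()
    for j in range(len(centroids)):
        members = points[labels == j]
        if len(members) > 0:  # an empty cluster keeps its previous centroid
            new_centroids[j] = members.mean(axis=0)
    return labels, new_centroids


# Hypothetical toy data: two well-separated groups of points (illustration only)
points = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centroids = np.array([[1.0, 1.0], [9.0, 9.0]])  # assumed starting centroids

for _ in range(300):  # upper bound on the number of iterations
    labels, updated = kmeans_iteration(points, centroids)
    if np.allclose(updated, centroids):  # stop once the centroids no longer move
        break
    centroids = updated

print(labels, centroids)
```

On this toy input the loop stops after the second pass, because the first pass already places each centroid at the mean of its group.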
Emerging alternative approach
One proposed alternative way to understand psychopathology is the Hierarchical Taxonomy of Psychopathology (HiTOP).
The HiTOP represents an emerging nosological system that organises psychopathology in a hierarchical format. The components in the higher levels of the structure indicate the most common or general features (shared between patients). They consist of dimensional syndromes as a continuum (or spectra). On the other hand, the components in the lower levels of the hierarchy (in the HiTOP model) consist of signs and symptoms specific to each condition/patient. The intermediate levels of HiTOP consist of subfactors, syndromes and components/traits in descending order. It is different from the DSM, as it does not categorise conditions. Instead, it attempts to allow for a flexible patient description depending on the desired degree of specificity.
Furthermore, the HiTOP has been able to inform the RDoC framework [14]. The RDoC framework is a research framework for studying mental disorders. It aims to understand the nature of mental health and illness in terms of different degrees of dysfunction of the general psychological/biological system. The HiTOP has been argued to inform the RDoC framework [14] regarding key clinical dimensions that need to be considered and to provide clearer phenotypes for basic research. HiTOP is a possible way forward in the post-DSM era, but it is yet to be adopted in mainstream practice.
Problems withHiTOP
e HiTOP proposes a dimensional model, but clinical
care often requires black-and-white(binary) decisions.
e traditional taxonomies tend to offer a single cut-off,
that is, the diagnostic threshold. e HiTOP, which fol-
lows a dimensional route, attempts to overcome this
problem by segmenting dimensions into illness severity
(like blood pressure ranges). is is similar to the clini-
cal staging model framework that defines the extent of
progression of a disorder at a time and where a person
lies currently along the continuum of the course of a psy-
chiatric condition such as Psychotic and Related Mood
Disorders [15]. However, even when applied in practice,
when compared with the DSM, HiTOP relatively com-
plicates the communication process between different
stakeholders of mental healthcare services.
Furthermore, it is important to note that both the
HiTOP and the clinical staging model framework
attempts to re-use many of the constructs from the tradi-
tional diagnostic systems. We argue that the focus should
attempt on a symptom-level at this stage since the litera-
ture suggests that much of the constructs from the tradi-
tional diagnostic systems and literature (e.g., depression)
suffers from low validity and low reliability. So, re-using
such constructs would be repeating the same mistake.
We also argue that a limitation of those dimensions
may be that they are based on self-report scales and
questionnaires, which restricts the person’s responses to
a set of conditions (set by the researcher) without allow-
ing much room for reporting symptoms or experiences
outside that pre-fixed list of symptoms and experiences.
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
Page 4 of 17
Ghoshetal. BMC Psychiatry (2022) 22:427
is risks loss of information (e.g., maybe the person had
other more important symptoms to report, but it was
not reported due to the restricted data collection tool).
erefore, a better alternative is to collect data from peo-
ple using open-ended questions to report their phenom-
enological experiences.
Some people with relatively serious conditions might perceive and report their condition as less bothersome. For example, some people might get habituated to the distress caused over the years and accept it as part of life. Others, such as patients who are also parents, might be concerned that reporting their symptom severity honestly might lead to the professional judgement that there is an ongoing risk to the child from the parent, with the possible outcome of children being removed into alternative care.
Simultaneously, some people might, consciously or unconsciously, exaggerate relatively minor concerns and report them as more disabling or severe. For example, people whose experience of their parents was negative have reported increased pain and fear experiences [16]. There may also be incentives to exaggerate psychiatric symptoms, such as securing access to limited services or meeting narrow eligibility criteria for disability benefits in certain countries [17].
Above all, the HiTOP is based on the past literature and on questionnaires/scales based on the DSM and the ICD and, in doing so, carries forward some of the major concerns of the past approaches. For example, if questionnaires and scales such as the PTSD checklist for DSM-5 (PCL-5) are designed to collect data consistent with the DSM/ICD (to collect data allowing clinicians to ‘score’ patients on various DSM categories, such as PTSD in this case), then using data collected in this way to support the development of an alternative system (e.g., HiTOP) may be problematic because the underlying data are themselves constrained by the DSM. To avoid this problem, in this study, we do not use the scales or questionnaires found across studies that have relied on the DSM or the ICD.
Addressing thelimitations oftheemerging taxonomies
(researchers’ position)
First, traditional systems consider all mental disorders
to be categories. In contrast, the evidence to date sug-
gests that psychopathology exists on a continuum with
normal-range functioning [9]. But implementing a pure
dimensional approach may be problematic, as it would
mean we would need a scale for each symptom, and it
might be difficult for pragmatic communication pur-
poses. erefore, while we focus this study on identi-
fying clusters or categories, we acknowledge that the
symptoms might differ in scale/magnitude/frequency.
erefore, we propose that future studies should
explore a mid-way. For example, both clusters A and B
might have sleep disturbance, but the frequency might
differ. is might address some of the DSM’s concerns
related to a substantial loss of information and diagnos-
tic instability [1820]. is is different from the DSM’s
approach, where diagnoses can also be mild, moder-
ate, and severe because unlike the DSM, our focus is on
individual symptoms (not disorders or category level).
Such focus on individual symptoms is likely to be more
valid than reliance on human-made constructs such as
depression.
Note that the data we used in this study do not
have numeric values such as frequency. However, we
acknowledge the possibility of symptoms differing in
frequency or magnitude here. erefore, we do not test
the assumption and leave that to future studies on this
line.
Other than the HiTOP, some taxonomic studies have
also proposed non-categorical approaches, such as the
Network structure of psychopathology (e.g., [21, 22]) and
transdiagnostic approach [23]. In this study, we acknowl-
edge all such possibilities and promises. Still, we attempt
to build a categorical model of psychopathology because
of its acceptability in the dominant culture and clinical
practice and its merits in communication and other value
propositions (as mentioned above).
Importance andaims
is study is an important potential way forward in the
classification literature because it discovers potential syn-
dromes in psychopathology based primarily on people’s
open-ended narratives about their lived experiences with
mental illness. At its least, the current study shows how
to create a data-driven bottom-up approach to classifying
mental illnesses. e current study aims to group symp-
toms based on peoples’ lived experiences. e study asks
the following questions:
How many groups or clusters can we group peoples’
experiences with mental illness?
Which are the top ten symptoms under each of those
clusters?
Which factors or experiences are common to multi-
ple clusters?
Which factors or experiences are unique to each
cluster?
In how, these clusters are similar?
How is the density of data distributed between the
clusters?
How well each of these clusters is differentiated from
each other?
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
Page 5 of 17
Ghoshetal. BMC Psychiatry (2022) 22:427
Method
We used secondary data, which we gathered and curated for another study [1] that examined diagnostic heterogeneity among patients diagnosed with Major Depressive Disorder and Bulimia Nervosa. The dataset was used in the current study to address the different research questions mentioned above and contained personal narratives of 10,933 people who mentioned having received a mental illness diagnosis. In the version of the data used in the current study (i.e., keeping only the symptoms and removing other parts of the sentences), the patients reported an average of 4.8 symptoms with an SD of 3.8. The data can be acquired from https://github.com/Chandril/patient_narratives_processed_data (data file titled data29.csv).
Participants
Ten thousand nine hundred and thirty-three narratives were collected from an online journaling platform (LiveJournal, https://www.livejournal.com/). LiveJournal is an open social networking service that hosts multiple communities, from sports to investing to health (it is not an academic journal, as the name might suggest). The narratives we used were posts made by people who self-reported having received a psychiatric diagnosis. Anyone can sign up and share their concerns in the hope that other members of the community will respond with support or advice. There are no moderators. We specifically chose the communities meant for people with a particular psychiatric diagnosis. The members of such communities are at different stages of their recovery process. For example, while some might be in the early stage of illness, others might be those whose condition has either improved or gone (remitted). There are no content or word restrictions. We scraped all the community posts made up to 16th October 2019.
The available data do not include the patients’ sociodemographic information or their geographic location. No directly identifiable data were collected, and narratives were disassociated from usernames before analyses. We collected only English text, meaning that each patient was able to write in English (inclusion criterion). Also, we know that they were diagnosed with a mental illness only because they reported so. The sample covered 84.2% of all diagnostic categories mentioned in the DSM-5, which is a good range, but it is important to acknowledge that this was not a random, representative sample, and so there should be caution about generalising to the wider population and/or across contexts. The distribution of the diagnoses the sample received is presented visually in Fig. 1. The average number of words in the narratives was 586.5 (Standard Deviation, S.D. = 48.79). As mentioned above, after processing (i.e., keeping only the symptoms/experiences), each patient in our dataset reports an average of 4.8 (~5) symptomatic words with an S.D. of 3.82 (~4).
From the Fig.1, it can be suggested that a lot more
patients diagnosed with depressive and anxiety disorders
wrote narratives on such online platforms than ones with
sexual dysfunction and dissociative disorders. erefore,
there is no equal distribution and hence there is a pos-
sibility of bias. Our data covered narratives from patients
diagnosed with 84.2% of all the diagnostic categories
mentioned in the DSM 5. However, the database does
not include patients who have explicitly mentioned being
diagnosed with neurocognitive disorders (e.g., demen-
tia), paraphilic disorders (e.g. paedophilia), or elimination
disorders. Regarding our choice of which words would
qualify as “symptoms”—as reported in the study [1]—in
which this data was collected, we explained that “while
we did refer to the DSM and ICD for gathering collec-
tions of words, but we also focused on manual scan-
ning of the words the patients wrote about their mental
ill health experiences – without specifying or restrict-
ing ourselves to a particular disorder or syndrome (e.g.,
depression).
One concern might be that the peoples’ narratives are
‘corrupted’ by DSM given their diagnoses and the wider
discourse about the disorder in the media. But that was
demonstrated to not be the case in [1] because using the
same dataset (as used in the current study), it was found
that the patients who received the same diagnosis (e.g.,
Major Depressive Disorder) reported different symp-
toms. If their perceptions were shaped by the media or
their knowledge of the diagnosis, the study would have
found a high similarity between patients who received
the same diagnosis.
The procedure oftextual analysis
Pre-processing data: Raw data can be dirty and have
missing values, incorrect entries, among several other
issues. As reported in a recent study [22], there are 672
unique symptomatic/psychopathological experience-
based words in the dataset. Some of them are nonsen-
sible (e.g., “eb”), and some are irrelevant to the context
(e.g., “falling”). erefore, we considered only sympto-
matic words such as “stress” and “trauma” in our analysis.
Secondly, machine learning algorithm don’t understand
the text as well as they understand numbers, so before
the data is run through the machine learning algorithm,
the text data was converted into numbers (called “vec-
torisation”). e Term Frequency-Inverse Document Fre-
quency (TF-IDF) vectoriser was used for this purpose. In
the current study, the TF-IDF reflects how important a
symptom is to the dataset (in a narrative collection).It is
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
Page 6 of 17
Ghoshetal. BMC Psychiatry (2022) 22:427
often used as a weighting factor. ‘TF’ refers to the infor-
mation on how often a term appears in a narrative (or
document) and ‘IDF’ indicates the information about
the relative rarity of a term in the collection of narra-
tives (or documents). Together (TF-IDF), they represent
the importance of a “word” being inversely related to its
frequency across narratives (or documents). erefore,
in this study, the main purpose of using TF-IDF is to
indicate how important a “symptom” is in a patient nar-
rative in a collection of narratives (from all patients com-
bined)—helps to adjust for the fact that some symptoms
appear more frequently in a narrative..
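
As an illustration of this weighting, the sketch below applies scikit-learn’s TfidfVectorizer to three made-up narratives. The study does not state which library or TF-IDF variant was used, so the choice of scikit-learn and its default settings (smoothed IDF, L2-normalised rows) is an assumption, and the toy narratives are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical narratives already reduced to symptom words (illustration only)
narratives = ["stress insomnia", "stress trauma", "stress panic"]

vectoriser = TfidfVectorizer()  # assumption: library defaults, not the study's settings
tfidf = vectoriser.fit_transform(narratives)  # sparse matrix: 3 narratives x 4 symptoms

# TF-IDF weights of every symptom in the first narrative
first = tfidf[0].toarray().ravel()
weights = {term: first[idx] for term, idx in vectoriser.vocabulary_.items()}

for term in sorted(weights):
    print(f"{term}: {weights[term]:.3f}")
```

In the first narrative, “stress” and “insomnia” each occur once, but “stress” appears in every narrative while “insomnia” appears only here, so “insomnia” receives the larger weight. Symptoms absent from the narrative receive a weight of zero.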
This study used an unsupervised Machine Learning algorithm called K-Means Clustering on the text data to do the clustering (i.e., to group the unlabelled dataset into different clusters). The purpose of using K-Means Clustering is similar to the idea of taking different elements (e.g., chicken, fruits, and vegetables) and sorting or grouping them based on their similarities and differences (such as a vegetarian diet: fruits and vegetables vs. a non-vegetarian diet: chicken). In the current study, we attempted to group symptoms based on their co-occurrence. For example, if low mood and anhedonia tend to co-occur often in the sample, and if intrusive thoughts and compulsive actions tend to co-occur often, we expect that K-Means clustering will form two clusters: one with low mood and anhedonia, and the other with intrusive thoughts and compulsive actions.
In K-Means, K is the pre-defined number of clusters
to be created in the process. The K-Means
algorithm works iteratively to assign each data point
to one of the K groups based on the provided features.
In this case, we ran K-Means Clustering
(distance metric = Euclidean) with 300 iterations
(the default value) to develop the results. Since K-Means has
a random initialisation component, multiple instantiations
of K-Means are often run and an average is taken,
but this was not practical in the current study. When
such averaging is not done, the k-means++ initialisation
heuristic is commonly used instead. The k-means++
algorithm improves on the conventional initialisation
by choosing the initial values (or “seeds”) for
the K-Means clustering algorithm. We do not expect the
algorithm to carry any form of human bias in this
case, because it only had data about symptoms and the
goal was only to explore or group (not to predict X from Y).
Had the algorithm been used to analyse sociodemographic
data, and had the goal been to predict a variable,
there could have been a possibility of introducing or imitating
human bias (e.g., predicting people of black race to be
more likely to commit crimes).
Fig. 1 The distribution of diagnostic categories in the narrative sample
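The K-Means procedure described above (k-means++ seeding, Euclidean distance, and an iterative assign-and-update loop capped at 300 iterations) can be sketched in plain Python as follows. This is an illustrative sketch only, not the library implementation used in the study; the function names, the toy two-dimensional points and the fixed random seed are assumptions made for the example.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_pp_seeds(points, k, rng):
    """k-means++ seeding: later seeds tend to lie far from earlier ones."""
    centres = [rng.choice(points)]
    while len(centres) < k:
        # Each point is drawn with probability proportional to its squared
        # distance from the nearest centre chosen so far.
        d2 = [min(sq_dist(p, c) for c in centres) for p in points]
        centres.append(rng.choices(points, weights=d2, k=1)[0])
    return centres

def kmeans(points, k, max_iter=300, seed=0):
    """Lloyd's algorithm; returns (labels, centres)."""
    rng = random.Random(seed)
    centres = kmeans_pp_seeds(points, k, rng)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centre.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centres[j]))
                      for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centres[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centres

# Hypothetical data: two visibly separate groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centres = kmeans(points, k=2)
```

On this toy input the first three points end up in one cluster and the last three in the other; which cluster receives the label 0 depends on the seeding, which is why cluster numbers carry no meaning of their own.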
After that, the “elbow” method was used with the silhouette
score as its metric. The “elbow” method helps to select the
optimal number of clusters by fitting the model with a range
of values for K. The default metric in the KElbowVisualizer
used here [24] is distortion (mean sum of squared distances
to centres), but we chose the silhouette metric (mean ratio
of intra-cluster and nearest-cluster distance) instead. The
rationale was that the silhouette coefficient has been argued
to exhibit a peak, as compared to the gentle
bend in the default elbow method [25], which is easier to
visualise and reason with.
The elbow method then ran K-Means clustering on the
dataset for a range of values for k (i.e., 1–10) and, for
each value of k, computed an average score for all clusters.
As mentioned above, these clusters were evaluated using the
silhouette (mean ratio of intra-cluster and nearest-cluster
distance). In the resulting plot, if the line chart resembles
an arm, then the “elbow” (the point of inflexion on the
curve) is a good indication that the underlying model fits
best at that point. The best value of the silhouette
score is 1, and the worst value is -1. Values near 0 indicate
overlapping clusters. Negative values generally indicate
that a sample has been assigned to the wrong cluster, as a
different cluster is more similar.
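As a rough illustration of how a silhouette score behaves, the sketch below computes it from scratch for a given labelling, using the standard per-sample definition (b - a) / max(a, b). It is not the visualiser or library routine used in the study; the function names and the toy points are assumptions, and the code expects at least two clusters and points that are not all identical.

```python
import math

def silhouette_samples(points, labels):
    """Silhouette value (b - a) / max(a, b) for every sample."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the sample's own cluster
        # (the distance from p to itself is 0, hence the len(own) - 1).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(sum(math.dist(p, q) for q in members) / len(members)
                for other, members in clusters.items() if other != lab)
        scores.append((b - a) / max(a, b))
    return scores

def silhouette_score(points, labels):
    """Mean silhouette value over all samples."""
    scores = silhouette_samples(points, labels)
    return sum(scores) / len(scores)

# Hypothetical data: two tight, well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good = silhouette_score(points, [0, 0, 0, 1, 1, 1])  # labels follow the groups
bad = silhouette_score(points, [0, 1, 0, 1, 0, 1])   # labels cut across them
```

With labels that follow the two groups the score is close to +1, while labels that cut across the groups give a score near zero, with some individual samples scoring negative.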
We chose to report the top ten symptoms under each
resultant cluster as a practical starting point. Any number
of symptoms could be listed under each cluster, but a
taxonomic manual is of little benefit to a clinician trying
to diagnose a patient, or to a researcher trying to design a
treatment, if each of its syndromes has hundreds of symptoms.
We also argue that it would be premature to establish these
syndromes as biological truth, as is the case with physical
ailments such as cancer or cardiovascular conditions; doing
so is one of the mistakes of the DSM and ICD. We therefore
explicitly mention those aspects where human decisions
were made, without pretending to establish the clusters as
ground rules. We chose the top 10 symptoms under each
cluster for the current study to ensure ease of interpretation
and feasibility.
We then used Jaccard’s similarity index as the similarity
metric, which measures the similarity between two
nominal attributes by taking the intersection of both and
dividing by their union. In other words, Jaccard similarity
is the number of common attributes divided by the
number of attributes that exist in at least one of the two
objects. So, the coefficient equals zero if there are no
intersecting symptoms and equals one if all symptoms
intersect. The rationale for choosing Jaccard similarity
relates to a potential problem with the nature of the data:
narratives often have repetitive words. Because the index
operates on sets, repeated words do not inflate the score.
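The definition above translates directly into a few lines of code. The following is a minimal sketch with made-up symptom lists; the function name `jaccard` and the convention of returning 0 for two empty sets are assumptions for the example, not part of the study's procedure.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two collections."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(a | b)

# Hypothetical symptom lists; repeated words collapse when converted to sets.
narrative_1 = ["fear", "sadness", "sadness", "sleep"]
narrative_2 = ["fear", "sadness", "stress"]
print(jaccard(narrative_1, narrative_2))  # 2 shared / 4 distinct = 0.5
```

Note that the repeated word in the first list has no effect on the result, which is the property that motivates the choice of this index for narrative data.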
As mentioned above, we scraped user-generated
content about mental health experiences from a
social networking service called LiveJournal
(https://www.livejournal.com/). Researchers have repeatedly
scraped data from this website and published the results in
peer-reviewed journals. An example of one such study that used
LiveJournal to scrape data (concerning mental illness) can
be found in [26]. A more generic example of scraping
health discussions from websites can be found in [27].
Ethical approval was awarded by the Queen’s Manage-
ment School Research Ethics Committee. We followed
the guidance for internet-mediated research from the
British Psychological Society [28] and adhered to copy-
right laws in conducting this work. Direct consent could
not be obtained because of the nature of the data col-
lection. Still, implicit consent was deemed to have been
given by virtue of posting in an open forum.
Result
e elbow visualiser in Fig.2 demonstrated that there are
four clusters (potential categories of syndromes).
e amount of time to train the clustering model
perKis indicated as a dashed green line. is green line
is displayed as part of the default KElbowVisualizer’s out-
put (yellowbrick library).
“There are four clusters of psychopathologies.
The symptoms and syndromes are distributed across the
four clusters when the algorithm was run up to a maximum of
k = 10, as depicted in Table 1. The subsequent analyses
are based on the choice of 4 clusters. This choice is not arbitrary
but is based on the point of inflection on the curve suggested
by the “knee point detection algorithm” applied to
the data. This point of inflection is a good indication that
the underlying model fits best at that point. The inflection
point on the curve (= 4) does not change when the
same algorithm is run up to k = 200 (Additional file 1:
Appendix A, Fig. 2).
For practical purposes, we requested the feature
extraction module (used in the script; it belongs to the
sklearn Python library) to draw ten words for each cluster.
Therefore, each cluster had ten words under it.
Note that this decision was taken by the researchers on
pragmatic grounds. This is not to say that there cannot
be more than ten signs and symptoms under each syndrome,
because the clusters are not ground truth. However, we have
presented the results for 5 and 20 words as well (Additional file 1:
Appendix B) so that the reader can see that this choice does not
unduly influence the results and merely adds new symptoms
or experiences to the list.
Further, we argue that these syndromes are treated
more like human-made constructs built on observed
experiences of people who lived through them.
Being constructed, these clusters are open to being
revised on how many symptoms should be under
each syndrome.
We presented ten symptoms or experiences under
each cluster to keep the clusters practical to apply and understand.
Note that the cluster labels and the numbers associated
with the cluster names do not carry any meaning. So, cluster
0 can be relabelled as cluster A, cluster 1 as cluster
B, and so forth, and they will still mean the same.
Fig. 2 Silhouette score elbow for KMeans Clustering (when the algorithm was run up to a max of k = 10)
Table 1 The distribution of symptoms and syndromes within the four clusters (potential constructs for mental disorders)
Symptom or syndrome                   Cluster 0   Cluster 1   Cluster 2   Cluster 3
feeling sick                          x           x           x           x
fear                                  x           x           x           x
depressed mood and loss of interest   x           x           x           x
auditory hallucination                x           x           x           x
mania and depression                  x
pain                                  x                       x
experience of loss                    x
sadness                               x           x           x           x
sleep                                 x                                   x
eating                                x
repetitive thoughts and actions                   x
anxiety                                           x           x           x
compulsion                                        x
attention deficit                                 x                       x
rituals                                           x
isolation                                                     x
loneliness                                                    x
cry                                                           x
panic attack                                                              x
stress                                                                    x
Note. An x indicates that the symptom is listed under that cluster. The order or sequence of the symptoms does not matter here. However, the symptoms were arranged in this order to visualise the common and uncommon factors across the clusters.
Transdiagnostic factors
It can be inferred that the following four experiences are
common across all four clusters: feeling sick, fear,
depressed mood and loss of interest, and auditory
hallucination. Therefore, they can be regarded as transdiagnostic
factors. The traditional diagnostic system attempts
to group people based on their symptoms. However, in the
current study we attempted to group the symptoms based on
their co-occurrences. Clusters of symptoms generated in this
way could assist healthcare professionals in guiding
their clinical interview questions (e.g., if the patient
reports symptom X, then ask about symptom Y, which is
co-occurring in the sample).” Therefore, the finding that
auditory hallucination presented itself as a transdiagnostic
factor does not indicate that the majority of patients
experience it. Instead, the finding suggests that auditory
hallucination frequently co-occurs with the respective
symptoms under each cluster. This possibility was
supported by the high eigenvalues for these
syndromes and symptoms [22]. For example, fear (0.99),
auditory hallucination (0.95), and depressed mood and loss of
interest (0.94) had some of the highest eigenvalues (equal
to or greater than 0.80 out of 1.0). A symptom gets a high
eigenvalue if it frequently co-occurs with other symptoms.
The traditional literature on mental health would
probably call such symptoms “comorbid” and “transdiagnostic”.
On the other hand, the symptoms with low eigenvalues
are the ones that are the distinguishing features of
each cluster.
Distinguishing factors among the clusters
Among the unique factors, what distinguishes cluster 0 is
the problem with sleep and eating. Cluster 1 can be dis-
tinguished by repetitive thoughts and actions, perform-
ing rituals (indicative of compulsions). Cluster 2 can be
distinguished by the presence of feeling isolated, pain,
loneliness and crying spells. Finally, cluster 3 can be dis-
tinguished by the presence of panic attacks and stress.
Heterogeneity within these four clusters
The presence of transdiagnostic factors reflects that there
is heterogeneity within our four clusters. The next
question is to what extent the clusters overlap with
one another. A recent study used Jaccard’s coefficient
as a similarity metric on the same narrative dataset
[1]. That study aimed to evaluate the heterogeneity
within and between the two most homogenous diagnostic
categories of Major Depressive Disorder and Bulimia
Nervosa. In the current study, we used a similar methodology.
Using a simple, open-source online tool
(https://planetcalc.com/1664/), we estimated the similarity index
between two sets (in this case, each set represented one
cluster), with the elements in them representing individual
symptoms. Entering the two sets (with their respective
elements) meant that we compared the two clusters to see
how many symptoms were identical between them (e.g.,
cluster 1 vs cluster 2).
The Jaccard’s coefficients indicate that the similarity index
between the 4 clusters ranges from 0.33 to 0.43 (Table 2).
The detailed result can be found in Additional file 1:
Appendix B (Table 2).
Dierences indensities withinthefour clusters
e thickness of each silhouette in the plot (Fig.3) indi-
cates the proportion of the data split into four clusters.
e blue colour represents cluster 0, green represents
cluster 1, red represents cluster 2, and pink represents
cluster 3. Clearly, they are not of equal size. For example,
people labelled in cluster 0 have the most heterogeneity
and are poorly clustered compared to clusters 1, 2, and 3.
Uniqueness betweentheClusters
In addition to the number of clusters, how they are
distributed, and which symptoms they contain, it is also
important to report how well the clusters are differentiated. The
current study used the silhouette value to estimate how
similar an object is to its own cluster (cohesion) compared to
other clusters (separation).
The silhouette ranges from -1 to +1. The best value
is 1, and the worst value is -1. Silhouette coefficients
near +1 indicate that the sample is far away from the
neighbouring clusters. A value of 0 indicates that the
sample is on or very close to the decision boundary
between two neighbouring clusters (i.e., the clusters
overlap), and negative values indicate that those samples
might have been assigned to the wrong cluster.
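For reference, the standard definition behind these ranges can be written as below. The notation is introduced here for illustration and does not appear in the original text: a(i) is the mean distance from sample i to the other members of its own cluster, b(i) is the mean distance from sample i to the members of the nearest other cluster, and N is the number of samples.

```latex
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad -1 \le s(i) \le 1, \qquad
\bar{s} = \frac{1}{N} \sum_{i=1}^{N} s(i)
```

A sample that is much closer to its own cluster than to any other has a(i) far smaller than b(i), so s(i) approaches +1; when the two distances are similar, s(i) is near 0; and when the sample is closer to another cluster, s(i) becomes negative.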
As presented in Fig.1, the current study found the sil-
houette coefficient to be 0.046, which is closer to zero.
is indicates that the four clusters are not well separated
or distinguished from one another, indicating reinvention
of the diagnostic heterogeneity (comorbidity) between
disorders. In turn, hinting that the key problem with
DSM might not be the consensus-driven, intuition-based
approach it arrives at its classification, but that classifica-
tion itself is difficult in the context of psychopathology.
e elbow method was also tried using the distortion
score (Within-Cluster Sum of Scores) as an alternative
to the silhouette score (Additional file1: Appendix A,
Fig.1). However, for both elbow graphs with silhouette
and distortion scores, the finding remains the same; that
is, there was no distinctive clustering pattern evident in
the dataset. So, although we found clusters of symptoms,
they are not highly unique in terms of the symptoms.
Therefore, it is difficult to differentiate service seekers
from each other and make treatment decisions based on
these syndromes/clusters.
Table 2 Test of similarity of narratives obtained within and across clinical diagnoses
Columns: Cluster 1, Cluster 2, Cluster 3
Cluster 0:
  0.33 (Common conditions: auditory hallucination, depressed mood and loss of interest, fear, feeling sick, sadness)
  0.43 (Common conditions: auditory hallucination, depressed mood and loss of interest, fear, feeling sick, pain, sadness)
  0.42 (Common conditions: auditory hallucination, depressed mood and loss of interest, fear, feeling sick, sadness, sleep)
Cluster 1:
  0.43 (Common conditions: anxiety, auditory hallucination, depressed mood and loss of interest, fear, feeling sick, sadness)
  0.54 (Common conditions: anxiety, attention deficit, auditory hallucination, depressed mood and loss of interest, fear, feeling sick, sadness)
Cluster 2:
  0.43 (Common conditions: anxiety, auditory hallucination, depressed mood and loss of interest, fear, feeling sick, sadness)
  0.43 (Common conditions: anxiety, auditory hallucination, depressed mood and loss of interest, fear, feeling sick, sadness)
Cluster 3:
  0.43 (Common conditions: anxiety, auditory hallucination, depressed mood and loss of interest, fear, feeling sick, sadness)
  0.43 (Common conditions: anxiety, auditory hallucination, depressed mood and loss of interest, fear, feeling sick, sadness)
Note. The table depicts the average Jaccard’s coefficient for pairings of Clusters 0, 1, 2 and 3.
Discussion
The number ofclusters
The algorithm generated four syndromes from our dataset.
The DSM-5 has 20 disorder chapters; within these there are
specific diagnostic categories, but broadly, the DSM divides
people into 20 categories (see footnote 1). Our dataset had
patients’ narratives covering 84.2% of the DSM diagnoses (16 out of
19). In other words, when we took
narratives from patients diagnosed with 16 different
DSM diagnoses, we found 4 clusters. This might indicate
that, in an attempt to overcome the problem of heterogeneity,
the DSM has specified too many disorder subtypes
and has been stretched too far. On similar lines,
[29] argued that most such subtypes had been defined
rationally rather than derived from structural research
and failed to demarcate homogenous subgroups. So, we
argue that if there is any attempt to group people into
categories of mental disorders based on this dataset, there
would be four groupings. The extent to which this is
generalisable can be verified in future studies using similar but
different datasets. In this study, we intended to demonstrate
the approach and to draw conclusions about transdiagnostic
symptoms.
Mutually shared factors
The presence of transdiagnostic symptoms such as feeling
sick, fear, depressed mood and loss of interest, and
auditory hallucination in all four clusters hints
at the reinvention of the potential problem with the
DSM, that is, diagnostic heterogeneity (e.g., [1]).
We propose that clusters or syndromes be defined
by the symptoms exclusive to the category. Symptoms
or experiences that are mutually shared between
clusters should be studied to see if they trigger or
maintain the unique conditions and the individual
differences (e.g., protective factors and social envi-
ronment) that lead to such differences in mental
health trajectories.
The similarity betweenclusters
The Jaccard’s coefficients indicate that the similarity index
between the 4 clusters ranges from 0.33 to 0.43 (Table 2).
This means that about 60–70% of the symptoms are
shared between each pair of clusters. Therefore, we found
considerable overlap between clusters, which aligns with
the idea that a single dimension, called the p-factor, can
capture a person’s liability to mental disorder [30]. The
possible existence of this general factor of psychopathology
(p-factor) has been proposed as it captures the
shared variance across psychiatric symptoms. Additionally,
it predicts a multitude of poor outcomes and general
life impairment. A recent study along these lines demonstrates
a genetic p-factor that represents a continuous, underlying
dimension of psychiatric risk [31].
Fig. 3 Silhouette plot of KMeans Clustering
Footnote 1: The 20 chapters in the DSM are as follows: Attention-Deficit/Hyperactivity
Disorder (ADHD), Autism Spectrum Disorder, Conduct Disorder, Disruptive
Mood Dysregulation Disorder, Eating Disorders, Gender Dysphoria, Intellectual
Disability, Internet Gaming Disorder, Major Depressive Disorder and the
Bereavement Exclusion, Mild Neurocognitive Disorder, Obsessive-Compulsive
and Related Disorders, Paraphilic Disorders, Personality Disorder, Posttraumatic
Stress Disorder, Schizophrenia, Sleep-Wake Disorders, Specific Learning
Disorder, Social Communication Disorder, Somatic Symptom Disorder,
Substance-Related and Addictive Disorders.
Alternatively, the high overlap can also be taken to be
somewhat imitating the DSM’s problem of diagnostic
overlap, which might lead a patient to be diagnosed with
two different disorders when evaluated by two different
physicians. Such low reliability in the current psychiatric
diagnostic system is likely to hamper the treatment
process of the service seeker. Proponents of the medical
model of mental illness argue that comorbidity exists in
several recognised medical disorders. For example,
individuals with AIDS are relatively likely to develop yeast
infections because of their compromised immune
system. However, it is undeniable that this leads to
problems in diagnosis and treatment. Accepting comorbidity on the
basis of the argument above would amount to saying that because
physical diseases have comorbidity, it is acceptable for mental
illnesses to have comorbidities as well. We argue that future
studies and classification approaches must move away from the
categorical viewpoint of mental illnesses and explore alternative
viewpoints, such as dimensionality and networks, to
understand mental illnesses.
Proponents of the categorical classification model
argue that it is required for auxiliary stakeholder
decision-making, such as insurance coverage. For example,
in the USA, the healthcare system is mostly private and
expensive for the service seeker, so clinicians use DSM-5
diagnoses to request reimbursement from insurance
organisations. To that end, we propose that if the primary
stakeholders, the patients, do not benefit from the
categorical approach, then keeping it for the ease of secondary
or tertiary stakeholders (e.g., insurance companies
or the government) might not make sense. Instead,
research should explore how insurers can make
black-and-white decisions (e.g., Yes/No) by other means, such as
using disability scales.
Similarities withtheDSM disorders
From Table3, we can roughly see that some of the clus-
ters are reflecting approximate similarities with the DSM
diagnostic manual. For example, eating issues under
cluster 0 make it aligned to DSM-5’s Eating Disorders
such as Anorexia and Bulimia. Likewise, the presence
of repetitive and intrusive thoughts and related actions
(e.g., rituals) under cluster 1 is similar to the DSM-5’s
Obsessive–Compulsive and Related Disorders. Cluster 2
reflects experiences of depressed mood and loss of inter-
est, sadness, isolation as with patients diagnosed with
Depressive Disorders. Finally, the presence of fear, stress,
and panic attack indicates an orientation towards anxiety
disorders.
Dierences withtheDSM disorders
On the other hand, there are marked differences with the
DSM in these mined clusters. Most of which are because
of the choice of units or conditions to study. e task
force of the DSM decided on which symptoms to study
and list under each category. In doing so, the task force
may have missed thinking about or including symptoms
or experiences that were otherwise important.
In contrast, the current study did not have any restric-
tions. Instead, it mined the free-flowing data written by
the patients about their experience with mental illness.
In doing so, the current study included numerous words
representing experiences that were not present in the
DSM’s limited vocabulary. However, the current study
acknowledges that the suggestion is not that the symp-
toms included in the DSM are necessarily inaccurate but
more that the way they are currently organised into diag-
nostic categories does not fully reflect the complexity of
people’s experiences. Therefore, the current study also
considered the symptoms listed in the DSM.
The inclusion of such words in this study has helped
to create a clearer structure of psychopathology.
For example, the current study found that people who
experience issues with eating might also experience a
feeling of loss (cluster A). This raises important questions
about the temporal sequence of these two. Does the feeling
of loss trigger issues related to eating in some people,
or is it the other way round? There is an argument in the
popular media that some people use food to self-medicate
the pain of loss. Future studies can investigate this
possibility.
Other questions also arise. Are there causal relations
between the two symptoms (e.g., is symptom X causing
symptom Y)? If so, what makes some people compensate
for their feeling of loss with binge eating (for example)?
Are there any specific cognitive beliefs or socio-economic
and cultural determinants of this trajectory?
Table 3 Similarities with existing DSM categories
Clusters mined in the current study
from patients’ narratives              Approximate DSM-5 Categories
Cluster 0                              Eating Disorders
Cluster 1                              Obsessive–Compulsive and Related Disorders
Cluster 2                              Depressive Disorders
Cluster 3                              Anxiety disorders
Likewise, in cluster 1, we found that people who experience
obsessive thoughts and compulsive behaviours
might also experience attention deficits. While we do
not know which causes the other, this raises interesting
possibilities. For example, would applying cognitive training
for attention enhancement (e.g., biofeedback) reduce those
symptoms in patients experiencing such intrusive
thoughts and compulsions?
Critical evaluation ofone cluster asanexample
This attempt to build an alternative taxonomic system of
mental illness differs from previous attempts, such
as the DSM and ICD, in one major way. The DSM
and ICD relied on a top-down approach, where the creators
of the diagnostic manuals proposed the structure
of the disorders and then searched for evidence, risking
confirmation bias. In this study, we took an atheoretical
approach, where we gathered the lived experiences of
10,933 people diagnosed with mental illness and clustered
their symptoms based on their narratives.
We will review and evaluate cluster A (to provide an
example) and draw inferences about the past and future
psychopathological nosology. Although we discuss only
cluster A here, the same line of thought applies to other
clusters too. To prevent repetition and to save space, the
current article discusses cluster A, and a few others, in
this discussion section as and when required.
Eating disorders are related to the experience of pain
[32]. The cited study found that while 41.2% of the study
participants with chronic pain reported that eating disorder
symptoms developed after the onset of their pain,
35.3% reported having eating disorder symptoms before
they experienced chronic pain. The literature has also
associated eating disorders with manic and depressive
episodes [33], sleep disturbances [34], self-hatred [35],
attention deficit [36], and potential thought-related issues
such as cognitive distortions [37] and eating-related
intrusive thoughts [38].
On a related note, we propose treating these clusters
(proposed in the current study) as generic themes that
warrant further exploration, rather than as specific conditions.
The rationale behind this proposition is that the
conceptualisations are based on what patients reported in their
narratives and, therefore, on specific words. So,
while we know that there is a problem with weight and
sleep in this cluster, we do not know whether the weight
increased for some and decreased for others. Likewise,
the experience of loss can translate to multiple possibilities,
such as loss of control or overeating [39, 40], among
others. Thus, we might require one research
project, or at least an individual study, to investigate each
of the 4 clusters.
From the above discussion, we infer two lessons:
Reconceptualisation of DSM-based Disorders
The traditional nosological systems framed a
construct, such as Eating Disorder. Then other
researchers followed it to find associations of eating
disorders with other conditions such as mania, loneliness,
and so forth. But we argue that a more data-driven
approach to conceptualising psychopathological
conditions would be to formulate the construct
holistically. Thus, for example, instead of restricting
eating disorders to problems related to eating
and treating all other associated problems as
comorbidities (and hence other disorders), we argue
that because patients frequently experience all these
symptoms together, the cluster or
syndrome for an eating disorder should be reformulated
as an accumulation of all these psychopathological
experiences (found in the current study and
supported by the existing literature).
Why does a nosological system need to exist?
The points above raise an important question: what
is the purpose of a diagnostic system? We
argue that it is to group patients’ experiences that
frequently co-occur, so that it can guide
researchers in designing interventions or drugs (e.g.,
if patients experience X, Y and Z together, then the
intervention that is targeted at patients reporting
X should also cover Y and Z) and help clinicians
to ask the right clinical questions (e.g., if you
experience X, do you also experience Y and Z?).
We propose that people indicating issues related to
their eating should be asked about their experiences
related to pain, sleep and the other experiences indicated
by this cluster (cluster A). Accordingly, the purpose
of clusters is to assist clinicians in probing the
frequently associated problems that are
important for a treatment plan but that the patient
might miss, ignore, or forget to report during their
primary contact with the clinician.
Clusters were notdistinguishable
Additional evidence for the unfeasibility
of the categorical approach comes from the silhouette
score (Fig. 2). One of the criticisms of the DSM
was that it was based on the intuitive consensus of a
group of people who proposed the categories, and not on
systematic research. Subsequent empirical studies mostly
assumed it to be valid and attempted to further the
literature on mental illness. Therefore, the heterogeneity
of the DSM can be argued to stem from the faulty human-led
grouping of symptoms. So, in this study, we asked: if we
attempt to create the categories using patients’ first-hand
data (without human intervention), can we find
homogenous clusters?
The current study found a silhouette coefficient of
0.046, demonstrating that even when the approach is
purely data-driven, the categorical approach does not fit
patients’ conditions well into categories. This is not to
say that the creation of DSM categories was not arbitrary.
That might have been part of the problem, while
the other part of the problem lies with the categorical
ideology of mental illness.
Future studies can collect survey data on those specific
experiences (under the syndromes of this study) and then
use the quantitative data to build a diagnostic tool using
K-Means as a classifier and report the accuracy. If the
accuracy falls short, then it can be inferred that even using
this ground-up approach to diagnostic classification does
not work well. But suppose the accuracy is above 80%. In
that case, we can present this as a potential alternative to
the existing classification system in psychiatry.
Nature ofsymptoms
e high overlap between clusters clarifies that there are
no true clusters or categories of disorders in our sample.
But it is interesting to note that we found that symptoms
differ by the extent to which they are co-occurring or
present across several clusters. Some symptoms, such as
Auditory Hallucination, are shared between all 4 clus-
ters. It is consistent with the literature that suggests that
hallucinations are prevalent in all DSM-based disorders
[41]. On the other hand, symptoms such as trouble with
eating are mostly exclusive to one syndrome.
e syndrome of anxiety was found to be present in 3
out of 4 clusters in this study. One reason could be that
anxiety is a massive syndrome in itself. Despite people
reporting it freely, people might mean different symp-
toms when they use the term anxiety. So, because of the
nature of the data, our findings indicate the words that
patients report in this study. However, this calls for an
important future study where different anxiety symptoms
are to be tested to understand which ones come under
which syndrome.
Regardless, we argue that some symptoms are more
exclusive, whereas others are generic and present in
multiple syndromes. We argue that these
trans-diagnostic symptoms (e.g., symptoms within the
anxiety syndrome) are prime targets for symptom-level
interventions.
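The distinction between exclusive and trans-diagnostic symptoms can be made concrete by counting how many clusters each symptom appears in, and by computing Jaccard's coefficient between pairs of clusters. The sketch below uses hypothetical toy clusters arranged to mirror the pattern described in the text; they are not the study's actual cluster contents, and the function names are illustrative.

```python
from collections import Counter

# Hypothetical toy clusters, not the clusters found in the study.
clusters = {
    "cluster_1": {"auditory hallucination", "anxiety", "trouble with eating"},
    "cluster_2": {"auditory hallucination", "anxiety", "sleep disturbance"},
    "cluster_3": {"auditory hallucination", "anxiety", "pain"},
    "cluster_4": {"auditory hallucination", "low mood"},
}

def cluster_spread(clusters):
    """Number of clusters in which each symptom appears."""
    return Counter(s for symptoms in clusters.values() for s in symptoms)

def jaccard(a, b):
    """Jaccard's coefficient: shared symptoms divided by all symptoms in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

spread = cluster_spread(clusters)
# Symptoms present in more than one cluster are candidate trans-diagnostic targets.
trans_diagnostic = sorted(s for s, n in spread.items() if n > 1)
exclusive = sorted(s for s, n in spread.items() if n == 1)
overlap_1_2 = jaccard(clusters["cluster_1"], clusters["cluster_2"])
```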
The categorical approach which considers network
relations of symptoms with dimensional variations
In the current study, our conceptualisation of psycho-
pathology accepts that symptoms are interrelated as
a network, with differing eigenvalues (indicating co-
occurrences). But symptoms can also be grouped in dif-
ferent clusters. We argue that the same symptoms can be
present in more than one syndrome, but they might be
dimensionally different. For example, the frequency of
pain experienced might differ between someone in cluster A
and someone in cluster E. However, we need additional
studies using quantitative data to test this hypothesis.
Combined, we attempt to integrate three different
approaches to classification (i.e., categorical,
network-based, and dimensional) under one system.
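One common way to score symptoms in a co-occurrence network is eigenvector centrality, computed by power iteration on the co-occurrence matrix. The sketch below assumes that this is the kind of eigen-based score the text refers to, which the section does not confirm; the symptom names and co-occurrence counts are hypothetical.

```python
def eigenvector_centrality(matrix, iterations=100, tol=1e-9):
    """Power iteration on a symmetric, non-negative co-occurrence matrix."""
    n = len(matrix)
    scores = [1.0 / n] * n
    for _ in range(iterations):
        updated = [sum(matrix[i][j] * scores[j] for j in range(n)) for i in range(n)]
        total = sum(updated)
        if total == 0:
            return scores  # no co-occurrences at all
        updated = [value / total for value in updated]  # normalise to sum to 1
        if max(abs(u - s) for u, s in zip(updated, scores)) < tol:
            return updated
        scores = updated
    return scores

# Hypothetical co-occurrence counts between four symptoms (symmetric matrix).
symptoms = ["anxiety", "low mood", "sleep disturbance", "trouble with eating"]
co_occurrence = [
    [0, 5, 4, 1],
    [5, 0, 3, 0],
    [4, 3, 0, 0],
    [1, 0, 0, 0],
]

centrality = eigenvector_centrality(co_occurrence)
# Symptoms ordered from most to least central in this toy network.
ranking = sorted(zip(symptoms, centrality), key=lambda pair: -pair[1])
```

In this toy network the symptom that co-occurs most with other well-connected symptoms ranks first, which is the sense in which a generic symptom can dominate such a score.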
Limitations
The current dataset was based on people’s narratives about
their mental health experiences, posted or shared online.
To make the dataset closer to the real-world
psychiatric population, we searched for the narratives of
all the possible diagnoses anyone can receive and
compiled the narratives of lived experiences to create
clusters of symptoms. It is important to acknowledge that
the dataset does not represent all types of patients as
per the traditional diagnostic system. In other words,
during our search we realised that patients with different
diagnoses do not share their lived experiences equally.
For example,
there are conditions such as Narcissistic Personality Dis-
order (NPD) where the person may not see the problem
in themselves. Instead, they complain about the world
and others. In such a case, it may be less likely that we
will find narratives where people will say, “I got diagnosed
with NPD because I am so self-absorbed and behave nega-
tively to others at times.”
On the other hand, someone diagnosed with Generalised
Anxiety Disorder might write a narrative such as “I
was diagnosed with GAD, and I feel scared of everything
all the time.” So, it is unlikely that we will get any
narratives from people with NPD talking about narcissism.
Likewise, there may be more stigma about sexual
dysfunction than depression. So, people with sexual
dysfunction are less likely to write about it in public
forums. That said, there is merit in the current study.
The study considered a large sample of 10,933 patients.
Therefore, it may represent the experiences of a certain
section of mental healthcare service seekers.
Our sample covers narratives from patients diagnosed
with different disorders, spanning 84.2% of all the
diagnostic categories mentioned in the DSM-5. However,
our sample lacked any narratives for 3 out of the 19 cat-
egories, possibly due to issues of stigma, lack of knowl-
edge, or inability to recall or write memories due to the
condition’s nature. Specifically, the database does not
include patients who have explicitly mentioned being
diagnosed with neurocognitive disorders (e.g. dementia),
paraphilic disorders (e.g. paedophilia), or elimination
disorders.
Therefore, it is important to note that we are not
arguing for a comprehensive taxonomic system based on
this current study. Instead, we demonstrate a possible
method to create an alternative system. We collected
narratives about the lived experiences of 10,933 people
diagnosed under the traditional psychiatric system, and
we found 8 clusters in that data.
Another possible explanation for the high overlap between
clusters is that the vast majority of our sample
experienced depressive and/or anxiety symptoms; such
overrepresentation could act as a potential explanation
(from a causal perspective) for the overlap. That said,
the constructs of anxiety and depression are heterogenous
in themselves, and the fact that different questionnaires
use different symptoms to assess them indicates that there
is little agreement among experts. It might also
be that symptoms of anxiety and depression are
consequences of experiencing other psychopathological
symptoms (e.g., auditory hallucination triggering anxiety)
rather than causes of similar psychopathological
experiences across clusters.
The symptoms under each syndrome discovered in
this study are based on patients’ narratives. Therefore,
by their very nature, there are no numeric data. However,
the results serve as a qualitative reference point.
Further quantitative data need to be collected to
investigate the nature of the symptoms’ dimensional
distribution (e.g., variations
in frequencies under each syndrome).
A further limitation is that many terms are generic
because the data was based on what people wrote about
their experiences. For example, anxiety is the most
reported symptom and has the highest eigenvalue. Still,
anxiety is not a singular condition. It is a collection of
several unique symptoms, as can be seen in the items of
the GAD-7 scale. So, we have a relatively shallow level of
understanding using such sources of data. Future studies
are needed to gain more specific information about which
anxiety symptoms contribute or relate to each syndrome.
That said, the current study is an effort to contribute
towards building a literature that is independent of the
DSM- or ICD-based diagnostic categories, opening up new
possibilities of conceptualising mental illness. The DSM-
and ICD-based categories have misled the mental health
literature. Many studies use DSM/ICD without
acknowledgement or exploration of the limitations of those
frameworks. The current study attempts to break from that
tradition and proposes a novel approach to understanding
mental health diagnostics, highlighting the need to
revisit why such a system needs to exist. But at the same
time, we acknowledge that our understanding of
psychopathology is in its infancy. The current work is far
from complete or comprehensive. Future studies are needed
to build on this line. The academic community needs to
better understand the nuances of psychopathology.
It is likely that the traditional conceptualisations of
mental illnesses, such as the ones proposed by the DSM
(and then popularised by mass media), might impact how
people perceive, interpret, and tell their mental health
stories. However, a recent study [1] using the same
dataset demonstrated that the symptoms narrated by
patients who received the same diagnosis were dissimilar,
even though the DSM likely influences people’s
narratives (and choice of language).
Studying mental illness is expected to lead to the
development of more effective treatment and to help
people in crisis. However, harnessing the social network
as a data source is a new venture. Therefore, there is
still very little guidance available. A few people raise
concerns about such data collection sources as being
"intrusive", but most people from whom we sought counsel
spoke in favour of using this new source of data, and it
was acknowledged that the benefits outweigh any
“perceived” concerns, which highlights the need to revisit
the issue. We argue that almost any novel scientific
endeavour divides people into two groups: those who
support it and those who do not. This is similar to the
use of magic mushrooms to treat depression. There are
almost always some concerns with the side effects of using
psychedelic drugs. Still, the scientific communities
worldwide progress based on the rationale that the
benefits outweigh any potential harm.
In our case, we argue that there are no or negligible
risks because we accessed only public accounts, no user-
names or profiles were recorded, and all words (except
the symptomatic experiences) were removed automatically,
making the data non-traceable to specific individuals,
even by the researchers. Also, when someone writes
or posts something on an open public forum, the person
understands that anyone can read and analyse their
content. Therefore, there is implicit consent in the very
act of posting an experience online in public. As a result,
researchers analysed and drew their inferences from a
dataset that had only symptomatic words.
Future studies
We mentioned the problem of the categorical versus
dimensional approach. The DSM and ICD are categorical,
while the evidence suggests that psychopathological
experiences are dimensional by nature. We argue that
although the evidence favours the dimensional approach,
building a taxonomic system on the dimensional scale of
each symptom would yield a position on each symptom’s
scale but no diagnostic system at all to facilitate
communication,
treatment, prognosis and research. We proposed a middle
way in this study. Signs and symptoms in each syndrome
can be dimensional. This means that while 2 clusters or
syndromes might have “sleep disturbance” as a common
symptom, the symptom might be dimensionally different in
each. For example, for people under cluster A, sleep
disturbance may happen seven days a week, but for people
under cluster D, it may occur only once or twice a month.
Unfortunately, the dataset used in this current study does
not have the required quantitative data. Still, it forms a
precursor that makes such a study possible. Following this
study, the immediate next step might be to conduct a
survey collecting quantitative data on the frequency of
specific symptoms under each syndrome we found in the
current study, and to analyse the differences in each
symptom’s dimensional aspect under each syndrome.
Such future studies can find the categories via
traditional approaches such as Exploratory Factor Analysis
(EFA) and Confirmatory Factor Analysis (CFA) or via
person-centred methods like Latent Class Analysis (LCA)
and Latent Profile Analysis (LPA). Such lines of
investigation would want to know how the categories relate
to each other and to numerous other variables such as
behavioural data, how they manifest over time, how they
respond to treatment protocols, whether some are more or
less resistant, and so forth.
Furthermore, future studies can use neural text
embeddings as the preferred representation format (instead
of the TF-IDF approach) on the same dataset and compare
the results with those from this study.
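For readers unfamiliar with the TF-IDF representation mentioned above, the sketch below shows one textbook variant (term frequency multiplied by the log inverse document frequency). The exact weighting and preprocessing used in the study are not specified in this section, so the formula, the whitespace tokenisation, and the three toy narratives are assumptions made for illustration.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenised = [doc.lower().split() for doc in documents]
    n_docs = len(tokenised)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors

# Hypothetical narratives already reduced to symptom words.
narratives = [
    "anxiety insomnia anxiety",
    "anxiety hallucination",
    "anxiety pain",
]

vectors = tf_idf(narratives)
anxiety_weight = vectors[0]["anxiety"]    # term occurs in every narrative
insomnia_weight = vectors[0]["insomnia"]  # term occurs in one narrative only
```

Under this variant a word that appears in every narrative receives zero weight, whereas a neural embedding would represent each narrative as a dense vector; comparing clusterings built on the two representations is the comparison the paragraph proposes.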
To conclude, the current study, in a sense, was an
attempt to explore the DSM-based disorders using
clustering patterns from the data given by patients
diagnosed with DSM disorders. The test was to see,
unguided by the DSM diagnosis but with just the symptoms
reported by the patients, how well we can group symptoms
into categories. In doing so, we also attempt to
demonstrate how we can build alternative forms of
diagnostic systems without relying on the traditional ones
and thereby avoid their pitfalls. The hope is to encourage
patients’ perspectives to be more central to the mental
healthcare system’s design and delivery.
Supplementary Information
The online version contains supplementary material available at
https://doi.org/10.1186/s12888-022-03984-2.
Additional file 1: Appendix A. Figure 1 Elbow using the distortion score
(Within-Cluster Sum of Scores). Figure 2 Silhouette score elbow for
K-Means Clustering (when the algorithm was run up to a max of k=200).
Appendix B. Table 1 Potential categories of mental disorders based on
patients’ first-hand narratives. Table 2 Jaccard’s coefficient indicating
similarity between clusters.
Acknowledgements
N/A
Authors’ contributions
CCG contributed to the conception and design of the work; the acquisition,
analysis, and interpretation of data; the creation of a new algorithm used
in the work; and writing the original draft. D.M., G.D., and C.S. substantially
supervised the research development and reviewed and revised earlier drafts.
C.A. reviewed and substantially contributed to the original draft and rewrote
several sections of the final draft, which clarified the academic arguments
and interpretation of the results. All authors read and approved the final
manuscript.
Funding
We acknowledge the support of the European Union’s Horizon 2020 research
and innovation programme under the Marie Skłodowska-Curie grant agree-
ment No 754507 for funding this project. The funders had no other role or
intervention in this current study.
Availability of data and materials
The data can be obtained from Chandril Ghosh (ghoshchandril@gmail.com)
upon reasonable request.
Declarations
Ethics approval and consent to participate
All methods were carried out in accordance with relevant guidelines and
regulations. The current study used a secondary dataset which was previously
fully considered for its ethical aspects by the Research Ethics Committee of
Queen’s University Management School and approved.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Author details
1 School of Psychology, Queen’s University Belfast, Belfast, United Kingdom.
2 Queen’s Management School, Queen’s University Belfast, Belfast, United
Kingdom. 3 School of Social Sciences, Education and Social Work, Queen’s Uni-
versity Belfast, Belfast, United Kingdom. 4 IMPACT Research Centre, Northern
Health and Social Care Trust, Antrim, United Kingdom.
Received: 20 August 2021 Accepted: 10 May 2022
References
1. Ghosh C, McVicar D, Davidson G, Shannon C. Measuring diagnostic heterogeneity using text-mining of the lived experiences of patients. BMC Psychiatry. 2021;21(1). https://doi.org/10.1186/s12888-021-03044-1.
2. Kendler K. Classification of psychopathology: conceptual and historical background. World Psychiatry. 2018;17(3):241–2. https://doi.org/10.1002/wps.20549.
3. Maj M. Why the clinical utility of diagnostic categories in psychiatry is intrinsically limited and how we can use new approaches to complement them. World Psychiatry. 2018;17(2):121–2. https://doi.org/10.1002/wps.20512.
4. First M, Rebello T, Keeley J, Bhargava R, Dai Y, Kulygina M, et al. Do mental health professionals use diagnostic classifications the way we think they do? A global survey. World Psychiatry. 2018;17(2):187–95. https://doi.org/10.1002/wps.20525.
5. Kirk S, Kutchins H. The selling of DSM: The Rhetoric of Science in Psychiatry (1st ed.). New York: Routledge; 1992.
6. American Psychiatric Association. 2013. Diagnostic and statistical manual of mental disorders (5th ed.). https://doi.org/10.1176/appi.books.9780890425596.
7. World Health Organization. 2020. International statistical classification of diseases and related health problems (11th ed.). https://icd.who.int/.
8. Kirk S, Kutchins H. The Myth of the Reliability of DSM. J Mind Behav. 1994;15(1/2):71–86.
9. Kotov R, Krueger R, Watson D, Achenbach T, Althoff R, Bagby R, et al. The Hierarchical Taxonomy of Psychopathology (HiTOP): a dimensional alternative to traditional nosologies. J Abnorm Psychol. 2017;126(4):454–77. https://doi.org/10.1037/abn0000258.
10. Shorter E. 2013. The History of DSM. In: Paris J., Phillips J. (eds) Making the DSM-5. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-6504-1_1.
11. Cosgrove L, Krimsky S. A comparison of DSM-IV and DSM-5 panel members’ financial associations with industry: a pernicious problem persists. PLoS Med. 2012;9(3):e1001190. https://doi.org/10.1371/journal.pmed.1001190.
12. Regan P, Hudson N, McRory B. Patient participation in public elections: a literature review. Nurs Manage. 2011;17(10):32–6. https://doi.org/10.7748/nm2011.03.17.10.32.c8358.
13. Scott J, Henry C. Clinical staging models: from general medicine to mental disorders. Bjpsych Advances. 2017;23(5):292–9. https://doi.org/10.1192/apt.bp.116.016436.
14. Cuthbert B. Research Domain Criteria: toward future psychiatric nosologies. Dialogues Clin Neurosci. 2015;17(1):89–97. https://doi.org/10.31887/dcns.2015.17.1/bcuthbert.
15. McGorry P, Nelson B, Goldstone S, Yung A. Clinical staging: a heuristic and practical strategy for new research and better health and social outcomes for psychotic and related mood disorders. Can J Psychiatry. 2010;55(8):486–97. https://doi.org/10.1177/070674371005500803.
16. Ghosh C, Sen I. Relation of negative parenting style with perception of pain and fear among young adults. J Res: Bede Athenaeum. 2015;6(1):142. https://doi.org/10.5958/0976-1748.2015.00017.x.
17. Chafetz M, Underhill J. Estimated costs of malingered disability. Arch Clin Neuropsychol. 2013;28(7):633–9. https://doi.org/10.1093/arclin/act038.
18. MacCallum R, Zhang S, Preacher K, Rucker D. On the practice of dichotomization of quantitative variables. Psychol Methods. 2002;7(1):19–40. https://doi.org/10.1037/1082-989x.7.1.19.
19. Markon K, Chmielewski M, Miller C. The reliability and validity of discrete and continuous measures of psychopathology: a quantitative review. Psychol Bull. 2011;137(5):856–79. https://doi.org/10.1037/a0023678.
20. Morey R, Gold A, LaBar K, Beall S, Brown V, Haswell C, et al. Amygdala volume changes in posttraumatic stress disorder in a large case-controlled veterans group. Arch Gen Psychiatry. 2012;69(11):1169. https://doi.org/10.1001/archgenpsychiatry.2012.50.
21. Contreras A, Nieto I, Valiente C, Espinosa R, Vazquez C. The study of psychopathology from the network analysis perspective: a systematic review. Psychotherapy And Psychosomatics. 2019;88(2):71–83. https://doi.org/10.1159/000497425.
22. Ghosh C. Using data analytics and innovative research methodologies for the mapping of psychopathology (Doctoral dissertation). Queen’s University Belfast; 2022.
23. Fusar-Poli P, Solmi M, Brondino N, Davies C, Chae C, Politi P, et al. Transdiagnostic psychiatry: a systematic review. World Psychiatry. 2019;18(2):192–207. https://doi.org/10.1002/wps.20631.
24. Bengfort B, Bilbro R. Yellowbrick: Visualizing the scikit-learn model selection process. J Open Source Softw. 2019;4(35):1075. https://doi.org/10.21105/joss.01075.
25. Sarkar, T. 2019. Clustering metrics better than the elbow-method [Blog]. Retrieved from https://towardsdatascience.com/clustering-metrics-better-than-the-elbow-method-6926e1f723a6.
26. Nguyen T, Phung D, Dao B, Venkatesh S, Berk M. Affective and content analysis of online depression communities. IEEE Trans Affect Comput. 2014;5(3):217–26. https://doi.org/10.1109/taffc.2014.2315623.
27. Baskaran U, Ramanujam K. Automated scraping of structured data records from health discussion forums using semantic analysis. Inform Med Unlocked. 2018;10:149–58. https://doi.org/10.1016/j.imu.2018.01.003.
28. Kaye L, Hewson C, Buchanan T, Coulson N, Branley-Bell D, Fullwood C, Devlin L. (2021). Ethics guidelines for internet-mediated research. Retrieved 30 March 2022, from https://www.bps.org.uk/sites/www.bps.org.uk/files/Policy/Policy%20-%20Files/Ethics%20Guidelines%20for%20Internet-mediated%20Research.pdf.
29. Watson D. Subtypes, specifiers, epicycles, and eccentrics: toward a more parsimonious taxonomy of psychopathology. Clin Psychology: Sci Pract. 2003;10(2):233–8. https://doi.org/10.1093/clipsy/bpg013.
30. Caspi A, Houts R, Belsky D, Goldman-Mellor S, Harrington H, Israel S, et al. The p Factor: one general psychopathology factor in the structure of psychiatric disorders? Clin Psychol Sci. 2013;2(2):119–37. https://doi.org/10.1177/2167702613497473.
31. Selzam S, Coleman J, Caspi A, Moffitt T, Plomin R. A polygenic p factor for major psychiatric disorders. Transl Psychiatry. 2018;8(1). https://doi.org/10.1038/s41398-018-0217-4.
32. Sim L, Lebow J, Weiss K, Harrison T, Bruce B. Eating Disorders in Adolescents With Chronic Pain. J Pediatr Health Care. 2017;31(1):67–74. https://doi.org/10.1016/j.pedhc.2016.03.001.
33. McAulay C, Hay P, Mond J, Touyz S. Eating disorders, bipolar disorders and other mood disorders: complex and under-researched relationships. J Eat Disord. 2019;7(1). https://doi.org/10.1186/s40337-019-0262-2.
34. Kim S, Lee H. Sleep and Circadian Rhythm Disturbances in Eating Disorders. Chronobiol Med. 2020;2(4):141–7. https://doi.org/10.33069/cim.2020.0027.
35. Birgegård A, Björck C, Norring C, Sohlberg S, Clinton D. Anorexic self-control and bulimic self-hate: differential outcome prediction from initial self-image. Int J Eat Disord. 2009;42(6):522–30. https://doi.org/10.1002/eat.20642.
36. Yates W, Lund B, Johnson C, Mitchell J, McKee P. Attention-deficit hyperactivity symptoms and disorder in eating disorder inpatients. Int J Eat Disord. 2009;42(4):375–8. https://doi.org/10.1002/eat.20627.
37. Shafran R, Robinson P. Thought-shape fusion in eating disorders. Br J Clin Psychol. 2004;43(4):399–408. https://doi.org/10.1348/0144665042389008.
38. Perpiñá C, Roncero M, Belloch A, Sánchez-Reales S. Eating-related intrusive thoughts inventory: exploring the dimensionality of eating disorder symptoms. Psychol Rep. 2011;109(1):108–26. https://doi.org/10.2466/02.09.13.18.pr0.109.4.108-126.
39. Goossens L, Braet C, Van Vlierberghe L, Mels S. Loss of control over eating in overweight youngsters: The role of anxiety, depression and emotional eating. Eur Eat Disord Rev. 2009;17(1):68–78. https://doi.org/10.1002/erv.892.
40. Swenne I, Larsson P. Heart risk associated with weight loss in anorexia nervosa and eating disorders: risk factors for QTc interval prolongation and dispersion. Acta Paediatr. 2007;88(3):304–9. https://doi.org/10.1111/j.1651-2227.1999.tb01101.x.
41. Kelleher I, DeVylder J. Hallucinations in borderline personality disorder and common mental disorders. Br J Psychiatry. 2017;210(3):230–1. https://doi.org/10.1192/bjp.bp.116.185249.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in pub-
lished maps and institutional affiliations.
ResearchGate has not been able to resolve any citations for this publication.
Article
Full-text available
The Research Domain Criteria (RDoC) project was initiated by the National Institute of Mental Health (NIMH) in early 2009 as the implementation of Goal 1.4 of its just-issued strategic plan. In keeping with the NIMH mission, to “transform the understanding and treatment of mental illnesses through basic and clinical research,” RDoC was explicitly conceived as a research-related initiative. The statement of the relevant goal in the strategic plan reads: “Develop, for research purposes, new ways of classifying mental disorders based on dimensions of observable behavior and neurobiological measures.” Due to the novel approach that RDoC takes to conceptualizing and studying mental disorders, it has received widespread attention, well beyond the borders of the immediate research community. This review discusses the rationale for the experimental framework that RDoC has adopted, and its implications for the nosology of mental disorders in the future.
Article
Full-text available
Background The diagnostic system is fundamental to any health discipline, including mental health, as it defines mental illness and helps inform possible treatment and prognosis. Thus, the procedure to estimate the reliability of such a system is of utmost importance. The current ways of measuring the reliability of the diagnostic system have limitations. In this study, we propose an alternative approach for verifying and measuring the reliability of the existing system. Methods We perform Jaccard’s similarity index analysis between first person accounts of patients with the same disorder (in this case Major Depressive Disorder) and between those who received a diagnosis of a different disorder (in this case Bulimia Nervosa) to demonstrate that narratives, when suitably processed, are a rich source of data for this purpose. We then analyse 228 narratives of lived experiences from patients with mental disorders, using Python code script, to demonstrate that patients with the same diagnosis have very different illness experiences. Results The results demonstrate that narratives are a statistically viable data resource which can distinguish between patients who receive different diagnostic labels. However, the similarity coefficients between 99.98% of narrative pairs, including for those with similar diagnoses, are low (< 0.3), indicating diagnostic Heterogeneity. Conclusions The current study proposes an alternative approach to measuring diagnostic Heterogeneity of the categorical taxonomic systems (e.g. the Diagnostic and Statistical Manual, DSM). In doing so, we demonstrate the high Heterogeneity and limited reliability of the existing system using patients’ written narratives of their illness experiences as the only data source. Potential applications of these outputs are discussed in the context of healthcare management and mental health research.
Article
Full-text available
The usefulness of current psychiatric classification, which is based on ICD/DSM categorical diagnoses, remains questionable. A promising alternative has been put forward as the “transdiagnostic” approach. This is expected to cut across existing categorical diagnoses and go beyond them, to improve the way we classify and treat mental disorders. This systematic review explores whether self‐defining transdiagnostic research meets such high expectations. A multi‐step Web of Science literature search was performed according to an a priori protocol, to identify all studies that used the word “transdiagnostic” in their title, up to May 5, 2018. Empirical variables which indexed core characteristics were extracted, complemented by a bibliometric and conceptual analysis. A total of 111 studies were included. Most studies were investigating interventions, followed by cognition and psychological processes, and neuroscientific topics. Their samples ranged from 15 to 91,199 (median 148) participants, with a mean age from 10 to more than 60 (median 33) years. There were several methodological inconsistencies relating to the definition of the gold standard (DSM/ICD diagnoses), of the outcome measures and of the transdiagnostic approach. The quality of the studies was generally low and only a few findings were externally replicated. The majority of studies tested transdiagnostic features cutting across different diagnoses, and only a few tested new classification systems beyond the existing diagnoses. About one fifth of the studies were not transdiagnostic at all, because they investigated symptoms and not disorders, a single disorder, or because there was no diagnostic information. The bibliometric analysis revealed that transdiagnostic research largely restricted its focus to anxiety and depressive disorders. The conceptual analysis showed that transdiagnostic research is grounded more on rediscoveries than on true innovations, and that it is affected by some conceptual biases. 
To date, transdiagnostic approaches have not delivered a credible paradigm shift that can impact classification and clinical care. Practical “TRANSD”iagnostic recommendations are proposed here to guide future research in this field.
Background: Network analysis (NA) is an analytical tool that allows one to explore the map of connections and possible dynamic influences among symptoms and other elements of mental disorders. In recent years, the use of NA in psychopathology has rapidly grown, which calls for a systematic and critical analysis of its clinical utility. Methods: Following PRISMA guidelines, a systematic review of published empirical studies applying NA in psychopathology, between 2010 and 2017, was conducted. We included the literature published in PubMed and PsycINFO using as keywords any combination of “network analysis” with the terms “anxiety,” “affective disorders,” “depression,” “schizophrenia,” “psychosis,” “personality disorders,” “substance abuse” and “psychopathology.” Results: The review showed that NA has been applied to a wide range of mental disorders in adults (13 studies on anxiety disorders; 19 on mood disorders; 7 on psychosis; 1 on substance abuse; 1 on borderline personality disorder; 18 on the association of symptoms between disorders), as well as in 6 studies on childhood and adolescence. Conclusions: A critical examination of the results of each study suggests that NA helps to identify, in an innovative way, important aspects of psychopathology like the centrality of the symptoms in a given disorder as well as the mutual dynamics among symptoms. Yet, despite these promising results, the clinical utility of NA is still uncertain as there are important limitations on the analytic procedures (e.g., reliability of indices), the type of data included (e.g., typically restricted to secondary analysis of already published data), and ultimately, the psychometric and clinical validity of the results.
It has recently been proposed that a single dimension, called the p factor, can capture a person’s liability to mental disorder. Relevant to the p hypothesis, recent genetic research has found surprisingly high genetic correlations between pairs of psychiatric disorders. Here, for the first time, we compare genetic correlations from different methods and examine their support for a genetic p factor. We tested the hypothesis of a genetic p factor by applying principal component analysis to matrices of genetic correlations between major psychiatric disorders estimated by three methods (family study, genome-wide complex trait analysis, and linkage-disequilibrium score regression) and to a matrix of polygenic score correlations constructed for each individual in a UK-representative sample of 7,026 unrelated individuals. All disorders loaded positively on a first unrotated principal component, which accounted for 57%, 43%, 35%, and 22% of the variance, respectively, for the four methods. Our results showed that all four methods provided strong support for a genetic p factor that represents the pinnacle of the hierarchical genetic architecture of psychopathology.
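The analysis described in this abstract (a first unrotated principal component extracted from a correlation matrix, with all variables loading positively) can be illustrated with a minimal sketch. This is not the authors' code: the function name `first_component` and the matrix `toy_corr` are hypothetical, and the correlation values are invented purely for illustration rather than taken from the study.

```python
import numpy as np


def first_component(corr):
    """Return loadings and explained-variance share of the first
    unrotated principal component of a correlation matrix."""
    corr = np.asarray(corr, dtype=float)
    # eigh is for symmetric matrices; eigenvalues come back in ascending order.
    eigvals, eigvecs = np.linalg.eigh(corr)
    top_vec = eigvecs[:, -1]
    # The sign of an eigenvector is arbitrary; orient it so loadings are mostly positive.
    if top_vec.sum() < 0:
        top_vec = -top_vec
    # Loadings: eigenvector scaled by the square root of its eigenvalue.
    loadings = top_vec * np.sqrt(eigvals[-1])
    # For a correlation matrix the total variance equals the trace.
    explained = eigvals[-1] / np.trace(corr)
    return loadings, explained


# Hypothetical correlations between three disorders (made-up values).
toy_corr = np.array([
    [1.0, 0.5, 0.4],
    [0.5, 1.0, 0.3],
    [0.4, 0.3, 1.0],
])

loadings, explained = first_component(toy_corr)
print("loadings:", np.round(loadings, 2))
print("share of variance on first component:", round(float(explained), 2))
```

In this sketch, "all disorders loaded positively" corresponds to every entry of `loadings` being greater than zero, and the reported percentages correspond to `explained` computed separately for each method's correlation matrix.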
We report on a global survey of diagnosing mental health professionals, primarily psychiatrists, conducted as a part of the development of the ICD-11 mental and behavioural disorders classification. The survey assessed these professionals' use of various components of the ICD-10 and the DSM, their attitudes concerning the utility of these systems, and usage of "residual" (i.e., "other" or "unspecified") categories. In previous surveys, most mental health professionals reported they often use a formal classification system in everyday clinical work, but very little is known about precisely how they are using those systems. For example, it has been suggested that most clinicians employ only the diagnostic labels or codes from the ICD-10 in order to meet administrative requirements. The present survey was conducted with clinicians who were members of the Global Clinical Practice Network (GCPN), established by the World Health Organization as a tool for global participation in ICD-11 field studies. A total of 1,764 GCPN members from 92 countries completed the survey, with 1,335 answering the questions with reference to the ICD-10 and 429 to the DSM (DSM-IV, DSM-IV-TR or DSM-5). The most frequent reported use of the classification systems was for administrative or billing purposes, with 68.1% reporting often or routinely using them for that purpose. A bit more than half (57.4%) of respondents reported often or routinely going through diagnostic guidelines or criteria systematically to determine whether they apply to individual patients. Although ICD-10 users were more likely than DSM-5 users to utilize the classification for administrative purposes, other differences were either slight or not significant. Both classifications were rated to be most useful for assigning a diagnosis, communicating with other health care professionals and teaching, and least useful for treatment selection and determining prognosis. 
ICD-10 was rated more useful than DSM-5 for administrative purposes. A majority of clinicians reported using "residual" categories at least sometimes, with around 12% of ICD-10 users and 19% of DSM users employing them often or routinely, most commonly for clinical presentations that do not conform to a specific diagnostic category or when there is insufficient information to make a more specific diagnosis. These results provide the most comprehensive available information about the use of diagnostic classifications of mental disorders in ordinary clinical practice.