Figure 1 - uploaded by Thomas H McCoy
Domain comparison contour plots showing change between admission (top) and discharge (bottom). BPAD-M, bipolar disorder-mania; MDD, major depressive disorder.
Source publication
Background:
Relying on diagnostic categories of neuropsychiatric illness obscures the complexity of these disorders. Capturing multiple dimensional measures of neuropathology could facilitate the clinical and neurobiological investigation of cognitive and behavioral phenotypes.
Methods:
We developed a natural language processing-based approach t...
Contexts in source publication
Context 1
... goal of subsequent steps in phenotype derivation was to derive a set of tokens (i.e., single words or sets of two words [bigrams]) reflecting individual RDoC domains in narrative notes. We developed a multistep process that used the text of DSM-IV-TR, a list of 10 to 50 seed unigrams or bigrams manually curated per domain based on expert consensus (THM, RHP) review of the RDoC workgroup statements, and psychiatric discharge summaries to identify terms that may be conceptually similar to those experts associate with each of the five RDoC domains (23); for an overview of the entire process, see Supplemental Figure S1. Both the DSM-IV and the corpus of narrative discharge notes were normalized using the Unified Medical Language System Lexical Variant Generation package (24). ...
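The token definition in this excerpt (single words plus two-word sequences) can be illustrated with a minimal sketch. The lowercase regular-expression tokenizer below is an assumption made for illustration only; it stands in for, and is much cruder than, the Lexical Variant Generation normalization the excerpt describes. The function name `unigrams_and_bigrams` is hypothetical.

```python
import re


def unigrams_and_bigrams(text):
    """Return the set of single words and adjacent word pairs found in text."""
    # Assumption: simple lowercase alphabetic tokenization, not LVG normalization.
    words = re.findall(r"[a-z]+", text.lower())
    # A bigram is two neighbouring words joined by a single space.
    bigrams = [" ".join(pair) for pair in zip(words, words[1:])]
    return set(words) | set(bigrams)


tokens = unigrams_and_bigrams("Patient reports low mood")
```

In this toy example, `tokens` contains the unigram `mood` as well as the bigram `low mood`, so either form could be matched against a curated seed list.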
Context 2
... identified 3619 individuals with 4623 hospital discharges between 2010 and 2015, and sociodemographic and clinical descriptors are shown in Table 1. Figure 1 illustrates the distribution of cognition and negative valence estimated RDoC scores for individuals with MDD or BPD mania, at hospital admission (top) and discharge (bottom). While depressive symptoms are generally more severe among patients admitted for MDD, the range of depressive symptoms among those with mania illustrates the spectrum of mixed features. ...
Citations
... Relative to qualitative analyses, they are quantitative and scalable (easily applied to large text data sets, eg, social media, electronic health record data). Natural language methods have been valuable in several psychiatric applications, including dimensional phenotyping 15 and prediction of treatment response, 16 but we are not aware of previous applications to symptom attributions. ...
Importance
In primary chronic back pain (CBP), the belief that pain indicates tissue damage is both inaccurate and unhelpful. Reattributing pain to mind or brain processes may support recovery.
Objectives
To test whether the reattribution of pain to mind or brain processes was associated with pain relief in pain reprocessing therapy (PRT) and to validate natural language–based tools for measuring patients’ symptom attributions.
Design, Setting, and Participants
This secondary analysis of clinical trial data analyzed natural language data from patients with primary CBP randomized to PRT, placebo injection control, or usual care control groups and treated in a US university research setting. Eligible participants were adults aged 21 to 70 years with CBP recruited from the community. Enrollment extended from 2017 to 2018, with the current analyses conducted from 2020 to 2022.
Interventions
PRT included cognitive, behavioral, and somatic techniques to support reattributing pain to nondangerous, reversible mind or brain causes. Subcutaneous placebo injection and usual care were hypothesized not to affect pain attributions.
Main Outcomes and Measures
At pretreatment and posttreatment, participants listed their top 3 perceived causes of pain in their own words (eg, football injury, bad posture, stress); pain intensity was measured as last-week average pain (0 to 10 rating, with 0 indicating no pain and 10 indicating greatest pain). The number of attributions categorized by masked coders as reflecting mind or brain processes were summed to yield mind-brain attribution scores (range, 0-3). An automated scoring algorithm was developed and benchmarked against human coder–derived scores. A data-driven natural language processing (NLP) algorithm identified the dimensional structure of pain attributions.
Results
We enrolled 151 adults (81 female [54%], 134 White [89%], mean [SD] age, 41.1 [15.6] years) reporting moderate severity CBP (mean [SD] intensity, 4.10 [1.26]; mean [SD] duration, 10.0 [8.9] years). At pretreatment, 41 attributions (10%) were categorized as mind- or brain-related across intervention conditions. PRT led to significant increases in mind- or brain-related attributions, with 71 posttreatment attributions (51%) in the PRT condition categorized as mind- or brain-related, as compared with 22 (8%) in control conditions (mind-brain attribution scores: PRT vs placebo, g = 1.95 [95% CI, 1.45-2.47]; PRT vs usual care, g = 2.06 [95% CI, 1.57-2.60]). Consistent with hypothesized PRT mechanisms, increases in mind-brain attribution score were associated with reductions in pain intensity at posttreatment (standardized β = −0.25; t(127) = −2.06; P = .04) and mediated the effects of PRT vs control on 1-year follow-up pain intensity (β = −0.35 [95% CI, −0.07 to −0.63]; P = .05). The automated word-counting algorithm and human coder-derived scores achieved moderate and substantial agreement at pretreatment and posttreatment (Cohen κ = 0.42 and 0.68, respectively). The data-driven NLP algorithm identified a principal dimension of mind and brain vs biomechanical attributions, converging with hypothesis-driven analyses.
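Agreement between the automated and human-coded scores is reported above as Cohen κ. As a reminder of what that statistic measures, here is a minimal sketch of the standard formula κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The function name and the labels are invented toy values, not data or code from the trial.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Toy labels (hypothetical): 1 = mind-brain attribution, 0 = other.
human = [0, 1, 1, 0]
automated = [0, 1, 0, 0]
kappa = cohen_kappa(human, automated)
```

With these toy labels the raters agree on three of four items (p_o = 0.75) while chance agreement is 0.5, so κ is 0.5.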
Conclusions and Relevance
In this secondary analysis of a randomized trial, PRT increased attribution of primary CBP to mind- or brain-related causes. Increased mind-brain attribution was associated with reductions in pain intensity.
... 24 In the clinical setting, this methodology has been validated against expert annotation, formal cognitive testing, and clinical prediction tasks. [25][26][27][28][29][30][31] To our knowledge, it has not yet been applied to the health records from post-mortem brain donors and used in combination with neuropathological readouts. ...
... [11][12][13][14] The domain-specific symptom burden can be estimated from patient medical records using NLP. 28,32,33 RDoC symptom burdens estimated from medical records by NLP have been associated with genetic variants and clinical outcomes including suicide, hospital use, new dementia diagnosis, and progression from dementia diagnosis to death. [34][35][36] We put forward that application of NLP-based methodologies to human brain post-mortem studies may represent a significant step toward a more current and translatable interpretation of molecular and cellular read-outs in the context of transdiagnostic clinical domains and symptom constructs. ...
... The present study used a previously described and validated NLP algorithm for quantifying estimated cognitive symptoms from narrative clinical text. 28 In brief, this method relies on recognizing a pre-specified set of symptom-related terms within the available records. The term list was developed through an iterative process of refinement seeded with lists of terms developed by a group of clinical experts, including the NIMH Research Domain Criteria Working Group. ...
Introduction
Transdiagnostic dimensional phenotypes are essential to investigate the relationship between continuous symptom dimensions and pathological changes. This is a fundamental challenge to post‐mortem work, as assessments of phenotypic concepts need to rely on existing records.
Methods
We adapted well‐validated methodologies to compute National Institute of Mental Health Research Domain Criteria (RDoC) scores using natural language processing (NLP) from electronic health records (EHRs) obtained from post‐mortem brain donors and tested whether cognitive domain scores were associated with Alzheimer's disease neuropathological measures.
Results
Our results confirm an association of EHR‐derived cognitive scores with neuropathological findings. Notably, higher neuropathological load, particularly neuritic plaques, was associated with higher cognitive burden scores in the frontal (ß = 0.38, P = 0.0004), parietal (ß = 0.35, P = 0.0008), temporal (ß = 0.37, P = 0.0004) and occipital (ß = 0.37, P = 0.0003) lobes.
Discussion
This proof‐of‐concept study supports the validity of NLP‐based methodologies to obtain quantitative measures of RDoC clinical domains from post‐mortem EHR. The associations may accelerate post‐mortem brain research beyond classical case–control designs.
... In order to derive features from the unstructured clinician notes, we created a custom lexicon of suicide-relevant and psychiatric concepts using a variety of approaches including: (1) selecting signs and symptoms, and mental and behavioral process semantic types from the Unified Medical Language System (UMLS) 19 ; (2) mapping DSM symptoms and concepts from structured instruments 20 ; (3) automatically extracting features from public sources including Wikipedia and MedScape; (4) incorporating RDoC domain matrix terms 20 ; (5) selecting predictive features from coded suicide attempt prediction models 21 ; and (6) manual annotation of terms by expert clinicians. This lexicon was linked to UMLS concepts and included 480 distinct semantic concepts and 1,273 tokens or phrases. ...
Clinical risk prediction models powered by electronic health records (EHRs) are becoming increasingly widespread in clinical practice. With suicide-related mortality rates rising in recent years, it is becoming increasingly urgent to understand, predict, and prevent suicidal behavior. Here, we compare the predictive value of structured and unstructured EHR data for predicting suicide risk. We find that Naive Bayes Classifier (NBC) and Random Forest (RF) models trained on structured EHR data perform better than those based on unstructured EHR data. An NBC model trained on both structured and unstructured data yields similar performance (AUC = 0.743) to an NBC model trained on structured data alone (0.742, p = 0.668), while an RF model trained on both data types yields significantly better results (AUC = 0.903) than an RF model trained on structured data alone (0.887, p < 0.001), likely due to the RF model’s ability to capture interactions between the two data types. To investigate these interactions, we propose and implement a general framework for identifying specific structured-unstructured feature pairs whose interactions differ between case and non-case cohorts, and thus have the potential to improve predictive performance and increase understanding of clinical risk. We find that such feature pairs tend to capture heterogeneous pairs of general concepts, rather than homogeneous pairs of specific concepts. These findings and this framework can be used to improve current and future EHR-based clinical modeling efforts.
... To generate symptom domains, we applied a simple natural language processing (NLP) strategy that we have successfully applied in abundant prior work examining clinical and biological associations with neuropsychiatric symptoms (see, e.g., McCoy and Barroilhet [15,16]). NLP refers to using computational methods to extract concepts from text (i.e., natural language) rather than relying solely on coded data such as ICD-10 diagnoses. ...
Neuropsychiatric symptoms may persist following acute COVID-19 illness, but the extent to which these symptoms are specific to COVID-19 has not been established. We utilized electronic health records across 6 hospitals in Massachusetts to characterize cohorts of individuals discharged following admission for COVID-19 between March 2020 and May 2021, and compared them to individuals hospitalized for other indications during this period. Natural language processing was applied to narrative clinical notes to identify neuropsychiatric symptom domains up to 150 days following hospitalization, in addition to those reflected in diagnostic codes as measured in prior studies. Among 6619 individuals hospitalized for COVID-19 drawn from a total of 42,961 hospital discharges, the most commonly documented symptom domains between 31 and 90 days after initial positive test were fatigue (13.4%), mood and anxiety symptoms (11.2%), and impaired cognition (8.0%). In regression models adjusted for sociodemographic features and hospital course, none of these were significantly more common among COVID-19 patients; indeed, mood and anxiety symptoms were less frequent (adjusted OR 0.72, 95% CI 0.64–0.92). Between 91 and 150 days after positivity, the most commonly detected symptoms were fatigue (10.9%), mood and anxiety symptoms (8.2%), and sleep disruption (6.8%), with impaired cognition in 5.8%. Frequency was again similar among non-COVID-19 post-hospital patients, with mood and anxiety symptoms less common (aOR 0.63, 95% CI 0.52–0.75). Propensity-score matched analyses yielded similar results. Overall, neuropsychiatric symptoms were common up to 150 days after initial hospitalization, but occurred at generally similar rates among individuals hospitalized for other indications during the same period. Post-acute sequelae of COVID-19 may benefit from standard if less-specific treatments developed for rehabilitation after hospitalization.
... Uncovering the latent dimensions of psychopathology from dense symptom- and trait-level data, on the one hand, will help to identify individuals' unique (dys-)functional profile and enable targeted interventions for specific functional spectra; on the other hand, they can provide neurobiological studies with an improved scaffolding to investigate underlying pathogenic processes. Although attempts of genomic enquiry on dimensional traits established in a data-driven fashion are still scarce, efforts of GWAS on theory-based psychopathology traits (e.g., extracted from clinical notes [43], neurocognitive tests [44], self-report assessments [45]) are blooming and provide important leads for follow-up causal assessments. ...
The dense co-occurrence of psychiatric disorders questions the categorical classification tradition and motivates efforts to establish dimensional constructs with neurobiological foundations that transcend diagnostic boundaries. In this study, we examined the genetic liability for eight major psychiatric disorder phenotypes under both a disorder-specific and a transdiagnostic framework. In a deeply-phenotyped sample (n=513) consisting of 452 patients from tertiary care with mood disorders, anxiety disorders, attention-deficit/hyperactivity disorder (ADHD), autism spectrum disorders (ASD), and/or substance use disorders (SUD) and 61 unaffected comparison individuals, we derived subject-specific multi-base polygenic risk score (PRS) profiles and assessed their associations with psychiatric diagnoses, comorbidity status, as well as cross-disorder behavioral dimensions. High PRS for depression was unselectively associated with the diagnosis of SUD, ADHD, anxiety disorders, mood disorders, and the comorbidities among them. In the dimensional approach, four distinct functional domains were uncovered, namely the negative valence, social, cognitive, and regulatory systems, closely matching the major functional domains proposed by the Research Domain Criteria (RDoC) framework. Critically, the genetic predisposition for depression was selectively reflected in the functional aspect of negative valence systems but not others. This study highlights a misalignment between current psychiatric nosology and the underlying psychiatric genetic etiology, and underscores the effectiveness of the dimensional approach in both the functional characterization of psychiatric patients and the delineation of the genetic liability for psychiatric disorders.
... These can include analyses of texts generated by participants, but other sources are also promising. A recent study demonstrated the feasibility of NLP to extract measures of RDoC domains from narrative inpatient chart notes, which predicted pertinent clinical outcomes such as length of stay and increases in readmission risk (McCoy et al., 2018). These advanced technologies, combined with computational methods for analysis, have the potential to revolutionize the understanding of real-world behavior and its relationship to psychopathology on an individual basis. ...
The National Institute of Mental Health (NIMH) addressed in its 2008 Strategic Plan an emerging concern that the current diagnostic system was hampering translational research, as accumulating data suggested that the system’s disorder categories constituted heterogeneous syndromes rather than specific diseases. However, established practices in peer review placed high priority on that system’s disorders in evaluating grant applications for mental illness. To provide guidelines for alternative study designs, NIMH set a goal to develop new ways of studying psychopathology based on dimensions of measurable behavior and related neurobiological measures. The Research Domain Criteria (RDoC) project is the result, intended to build a literature that informs new conceptions of mental illness and future revisions to diagnostic manuals. The framework calls for the study of empirically derived fundamental dimensions characterized by related behavioral/psychological and neurobiological data (e.g., reward valuation, working memory). RDoC also emphasizes approaches including neurodevelopment, environmental effects, and the full range of dimensions of interest (from typical to increasingly abnormal), as well as research designs that integrate data across behavioral, biological, and self-report measures. This article provides an overview of the project’s first decade and its potential future directions. RDoC remains grounded in experimental psychopathology perspectives, and its progress is strongly linked to psychological measurement and integrative approaches to brain-behavior relationships.
... In fact, it has been explicitly suggested that the majority of samples used in published genetic discovery studies have not been collected with the required amount of phenotypic data necessary to advance diagnostics, stratification and treatment 18 . Thus, many research groups have directed their efforts to access resources with large amounts of routinely collected data, such as population biobanks and electronic health record systems, from which rich phenotypic data can be derived [18][19][20] . However, some common limitations of these include selection biases and underrepresentation of clinically severe disorders 20,21 . ...
Introduction
Current psychiatric diagnoses, although heritable, have not been clearly mapped onto distinct underlying pathogenic processes. The same symptoms often occur in multiple disorders, and a substantial proportion of both genetic and environmental risk factors are shared across disorders. However, the relationship between shared symptomatology and shared genetic liability is still poorly understood. Well-characterised, cross-disorder samples are needed to investigate this matter, but currently few exist, and severe mental disorders are poorly represented in existing biobanking efforts. Purposely curated and aggregated data from individual research groups can fulfil this unmet need, resulting in rich resources for psychiatric research.
Methods and analyses
As part of the Cardiff MRC Mental Health Data Pathfinder, we have curated and harmonised phenotypic and genetic information from 15 studies within the MRC Centre for Neuropsychiatric Genetics and Genomics to create a new data repository, DRAGON-DATA. To date, DRAGON-DATA includes over 45,000 individuals: adults or children with psychiatric diagnoses, affected probands with family members and individuals who carry a known neurodevelopmental copy number variant (ND-CNV). We have processed the available phenotype information to derive core variables that can be reliably analysed across groups. In addition, all datasets with genotype information have undergone rigorous quality control, imputation, CNV calling and polygenic score generation.
Ethics and Dissemination
DRAGON-DATA combines genetic and non-genetic information and is available as a resource for research across traditional psychiatric diagnostic categories. Its structure and governance follow standard UK ethical requirements (at the level of participating studies and the project as a whole) and conform to principles reflected in the EU data protection scheme (GDPR). Algorithms and pipelines used for data harmonisation are currently publicly available for the scientific community, and an appropriate data sharing protocol will be developed as part of ongoing projects (DATAMIND) in partnership with HDR UK.
... b, Semantic similarity to seed terms in the RDoC framework for our term lists generated using GloVe (colored) compared to a baseline from the literature (dark gray). The baseline model includes term lists generated by McCoy et al. 26 through latent semantic analysis followed by filtering. Bootstrap distributions for each domain were generated by resampling the 100-n embedding dimension with replacement over 10,000 iterations, then assessed for a one-sided difference in means (*FDR < 0.01, **FDR < 0.001). ...
... For RDoC, word embeddings were trained using GloVe 25 on a corpus of 29,828 general neuroimaging articles (Extended Data Fig. 1b Table 2) by the cosine similarity of their embeddings to the centroid of seed embeddings in each domain. This approach yielded synonyms of the RDoC domains with higher semantic similarity to seed terms than a previously published NLP approach 26 (Fig. 3b). Finally, we mapped brain circuits from each term list based on PMI-weighted co-occurrences with brain structures across the full corpus of articles with coordinates (n = 18,155 articles), restricting circuits to positive values with FDR < 0.01. ...
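The ranking step mentioned in this excerpt (cosine similarity of a term's embedding to the centroid of the seed embeddings) can be sketched in a few lines. This is an illustrative sketch only: the two-dimensional vectors are invented toy values rather than GloVe output, and the function names are hypothetical, not taken from the cited work.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    if not vectors:
        raise ValueError("at least one seed embedding is required")
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(embeddings, seed_terms, top_k=5):
    """Rank non-seed terms by similarity to the centroid of the seed embeddings."""
    seeds = set(seed_terms)
    center = centroid([embeddings[t] for t in seeds if t in embeddings])
    scored = [(t, cosine(v, center)) for t, v in embeddings.items() if t not in seeds]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Toy embeddings (hypothetical values chosen only to make the ranking visible).
toy = {"fear": [1.0, 0.0], "anxiety": [0.9, 0.1], "threat": [0.8, 0.2], "reward": [0.0, 1.0]}
ranked = rank_candidates(toy, ["fear", "anxiety"])
```

Here `threat` ranks above `reward` because its vector points in nearly the same direction as the seed centroid.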
... Categories should not only minimize the comorbidities and internal heterogeneity observed for the symptom criteria 12 , but also exhibit predictive value for clinical outcomes and treatment decisions. In this vein, RDoC has been applied to generate dimensional ratings of domains in electronic medical records that were not only associated with genes relevant to psychopathology 34 , but also predictive of time to hospital readmission 26 . We expect that a data-driven classification system for psychiatric disease would facilitate targeting of neuromodulatory treatments to the patients and stimulation sites most likely to respond. ...
Functional neuroimaging has been a mainstay of human neuroscience for the past 25 years. Interpretation of functional magnetic resonance imaging (fMRI) data has often occurred within knowledge frameworks crafted by experts, which have the potential to amplify biases that limit the replicability of findings. Here, we use a computational approach to derive a data-driven framework for neurobiological domains that synthesizes the texts and data of nearly 20,000 human neuroimaging articles. Across multiple levels of domain specificity, the structure–function links within domains better replicate in held-out articles than those mapped from dominant frameworks in neuroscience and psychiatry. We further show that the data-driven framework partitions the literature into modular subfields, for which domains serve as generalizable prototypes of structure–function patterns in single articles. The approach to computational ontology we present here is the most comprehensive characterization of human brain circuits quantifiable with fMRI and may be extended to synthesize other scientific literatures.
... The answer is clearly yes, but there are still significant improvements that are needed in EHRs as they are used both to deliver care [31] and to provide data for clinical research [32] (see Chap. 16). Some recent results show how useful information can be derived from both structured and unstructured information in an EHR [33][34][35]. In terms of finding biomarkers to solve the subgroup problem, information from EHRs will likely end up being somewhere between helpful and essential. ...
The current state of mental health informatics has been covered in the earlier chapters in this book. The focus of this chapter is on the future of this emerging field. First, a vision for that future is enumerated, imagining what an integrated approach to mental health informatics would be able to accomplish. Second, the harmonization of data across diverse mental health-relevant datasets is discussed. This is a significant obstacle to aggregating data from multiple laboratories that must be overcome. The need for training the informatics-savvy mental health research and healthcare workforce that will be required to achieve that vision is explored. Finally, a case study is presented showing what can be done using existing infrastructure and suggesting how a learning health care system can be built on top of that infrastructure.
... Natural language processing (NLP) was introduced as one of the ways to use RDoC, and hospital readmission could be predicted with RDoC domains extracted by NLP [10]. Thus, NLP can be used effectively to evaluate psychiatric notes as RDoC domains [11]. ...
... McCoy et al [11] previously described a method for estimating RDoC scores from narrative text. In summary, the method evaluates a document using a predetermined set of terms belonging to a given research domain. ...
... For instance, if 10 terms comprise a predefined list and 2 appear in a document, the note would be assigned a score of 2/10 (20%). The list of terms predetermined by the NIMH RDoC working group is publicly available on the internet [11]. The patient admission notes in this study were written in English and Korean. ...
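The scoring rule in this excerpt (the fraction of a predefined term list that appears in a note) can be written out as a short sketch. The tokenizer, the function name `domain_score`, and the example term list are assumptions for illustration; they are not the published term lists or the authors' implementation.

```python
import re


def domain_score(note_text, domain_terms):
    """Fraction of the domain's terms (unigrams or bigrams) present in the note."""
    # Assumption: simple lowercase alphabetic tokenization of English text.
    words = re.findall(r"[a-z]+", note_text.lower())
    present = set(words) | {" ".join(pair) for pair in zip(words, words[1:])}
    terms = {term.lower() for term in domain_terms}
    if not terms:
        return 0.0
    # Each term counts once, however many times it occurs in the note.
    return len(terms & present) / len(terms)


# Hypothetical 10-term list; 2 of the terms occur in the example note.
example_terms = ["sad", "hopeless", "guilt", "worthless", "anxious",
                 "fear", "panic", "worry", "low mood", "tearful"]
score = domain_score("Patient reports feeling sad and hopeless.", example_terms)
```

With 2 of 10 terms present, `score` is 0.2, matching the 2/10 (20%) example in the excerpt.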
Background:
Suicide has emerged as a serious public health concern; however, only a few studies have revealed the differences between major psychiatric disorders and suicide. Recent studies have attempted to quantify research domain criteria (RDoC) into numeric scores to systematically use them in computerized methods. The RDoC scores were used to reveal the characteristics of suicide and its association with major psychiatric disorders.
Objective:
We intended to investigate the differences in the dimensional psychopathology among hospitalized suicidal patients and the association between the dimensional psychopathology of psychiatric disorders and length of hospital stay.
Methods:
This retrospective study enrolled hospitalized suicidal patients diagnosed with major psychiatric disorders (depression, schizophrenia, and bipolar disorder) between January 2010 and December 2020 at a tertiary hospital in South Korea. The RDoC scores were calculated using the patients' admission notes. To measure the differences between psychiatric disorder cohorts, analysis of variance and the Cochran Q test were conducted and post hoc analysis for RDoC domains was performed with the independent two-sample t test. A linear regression model was used to analyze the association between the RDoC scores and sociodemographic features and comorbidity index. To estimate the association between the RDoC scores and length of hospital stay, multiple logistic regression models were applied to each psychiatric disorder group.
Results:
We retrieved 732 admissions for 571 patients (465 with depression, 73 with schizophrenia, and 33 with bipolar disorder). We found significant differences in the dimensional psychopathology according to the psychiatric disorders. The patient group with depression showed the highest negative RDoC domain scores. In the cognitive and social RDoC domains, the groups with schizophrenia and bipolar disorder scored higher than the group with depression. In the arousal RDoC domain, the depression and bipolar disorder groups scored higher than the group with schizophrenia. We identified significant associations between the RDoC scores and length of stay for the depression and bipolar disorder groups. The odds ratios (ORs) of the length of stay were increased because of the higher negative RDoC domain scores in the group with depression (OR 1.058, 95% CI 1.006-1.114) and decreased by higher arousal RDoC domain scores in the group with bipolar disorder (OR 0.537, 95% CI 0.285-0.815).
Conclusions:
This study showed the association between the dimensional psychopathology of major psychiatric disorders related to suicide and the length of hospital stay and identified differences in the dimensional psychopathology of major psychiatric disorders. This may provide new perspectives for understanding suicidal patients.