Fig 4 - available via license: CC BY
RDoC score rank correlation heatmap: We calculate the rank correlation between the selected CANTAB cognitive score metrics and the embedding-derived RDoC scores. High correlations are shown in warm colors. Significance is shown with stars: * p < 0.1, ** p < 0.05, and *** p < 0.01. https://doi.org/10.1371/journal.pone.0230663.g004
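The caption above describes a Spearman rank-correlation heatmap with star markers for significance. A minimal sketch of that computation, using SciPy's `spearmanr` on hypothetical score arrays (the variable names and data here are illustrative, not the study's):

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
cantab = rng.normal(size=(40, 3))  # 40 participants x 3 hypothetical CANTAB metrics
# Hypothetical RDoC scores, loosely related to the CANTAB metrics plus noise
rdoc = cantab @ rng.normal(size=(3, 2)) + rng.normal(size=(40, 2))

def stars(p: float) -> str:
    """Map a p-value to the significance markers used in the figure caption."""
    return "***" if p < 0.01 else "**" if p < 0.05 else "*" if p < 0.1 else ""

# One cell of the heatmap per (CANTAB metric, RDoC score) pair
for i in range(cantab.shape[1]):
    for j in range(rdoc.shape[1]):
        rho, p = spearmanr(cantab[:, i], rdoc[:, j])
        print(f"CANTAB metric {i} vs RDoC score {j}: rho={rho:+.2f} {stars(p)}")
```

Each printed line corresponds to one heatmap cell; a plotting library could then render the matrix of rho values as the heatmap itself.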
Source publication
Background
Recent initiatives in psychiatry emphasize the utility of characterizing psychiatric symptoms in a multidimensional manner. However, strategies for applying standard self-report scales to multiaxial assessment have not been well studied, particularly where the aim is to support both categorical and dimensional phenotypes.
Methods
We pr...
Citations
... For instance, Vu et al. [5] used pretrained BERT to encode participants' social media posts and the questions from the Big-Five personality questionnaire, in order to predict individual-level responses to out-of-sample Big-Five questions. Pellegrini et al. [42] used the skip-gram embedding algorithm [1] to encode psychiatric questionnaires and patients' responses to those questionnaires. They showed that the resulting embeddings can be used for effective diagnosis of some mental health conditions. ...
Text embedding models from Natural Language Processing can map text data (e.g. words, sentences, documents) to meaningful numerical representations (a.k.a. text embeddings). While such models are increasingly applied in social science research, one important issue is often not addressed: the extent to which these embeddings are high-quality representations of the information they are intended to encode. We view this quality evaluation problem from a measurement validity perspective, and propose the use of the classic construct validity framework to evaluate the quality of text embeddings. First, we describe how this framework can be adapted to the opaque and high-dimensional nature of text embeddings. Second, we apply our adapted framework to an example where we compare the validity of survey question representation across text embedding models.
... By making use of the generated pretrained text embeddings, they were able to moderately improve the prediction of individual-level responses to out-of-sample Big-Five questions, compared to not using any embeddings. Pellegrini et al. [32] used the skip-gram embedding algorithm [1] to represent the questions in 9 different questionnaires about psychiatric symptoms. The embeddings of the questions were then weighted by the numerical responses from psychiatric patients, indicating the severity of specific disease symptoms. ...
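The excerpt above describes weighting question embeddings by patients' numeric responses. One simple reading of that idea, as a hedged sketch (the function and variable names are ours, not the paper's): each question vector is scaled by the patient's severity rating, and the results are combined into a single patient-level vector.

```python
import numpy as np

def patient_embedding(question_vecs: np.ndarray, responses: np.ndarray) -> np.ndarray:
    """question_vecs: (n_questions, dim); responses: (n_questions,) severity ratings."""
    weights = responses / (responses.sum() + 1e-12)  # normalize weights to sum to 1
    return weights @ question_vecs                   # response-weighted average vector

# Toy 2-d "embeddings" for three questions; the patient endorses items 1 and 3
vecs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
ans = np.array([2.0, 0.0, 2.0])
print(patient_embedding(vecs, ans))
```

Questions the patient does not endorse contribute nothing to the representation, so the resulting vector emphasizes the semantic content of the endorsed symptoms.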
Text embedding models from Natural Language Processing can map text data (e.g. words, sentences, documents) to supposedly meaningful numerical representations (a.k.a. text embeddings). While such models are increasingly applied in social science research, one important issue is often not addressed: the extent to which these embeddings are valid representations of constructs relevant for social science research. We therefore propose the use of the classic construct validity framework to evaluate the validity of text embeddings. We show how this framework can be adapted to the opaque and high-dimensional nature of text embeddings, with application to survey questions. We include several popular text embedding methods (e.g. fastText, GloVe, BERT, Sentence-BERT, Universal Sentence Encoder) in our construct validity analyses. We find evidence of convergent and discriminant validity in some cases. We also show that embeddings can be used to predict respondents' answers to completely new survey questions. Furthermore, BERT-based embedding techniques and the Universal Sentence Encoder provide more valid representations of survey questions than do others. Our results thus highlight the necessity to examine the construct validity of text embeddings before deploying them in social science research.
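The abstract above reports predicting respondents' answers to completely new survey questions from question embeddings. A minimal, assumed instance of that general idea, not the paper's actual model: predict the answer to an unseen question as a cosine-similarity-weighted average of the respondent's answers to seen questions (names and data are illustrative).

```python
import numpy as np

def predict_answer(seen_vecs: np.ndarray, seen_answers: np.ndarray,
                   new_vec: np.ndarray) -> float:
    """Similarity-weighted average of known answers (cosine, negatives clipped)."""
    sims = seen_vecs @ new_vec / (
        np.linalg.norm(seen_vecs, axis=1) * np.linalg.norm(new_vec) + 1e-12)
    weights = np.clip(sims, 0.0, None)  # ignore dissimilar questions
    return float(weights @ seen_answers / (weights.sum() + 1e-12))

# Toy 2-d question "embeddings" and one respondent's Likert answers
seen = np.array([[1.0, 0.0], [0.0, 1.0]])
answers = np.array([5.0, 1.0])
print(predict_answer(seen, answers, np.array([1.0, 0.0])))
```

An unseen question whose embedding is close to a seen one inherits (mostly) that question's answer, which is the intuition behind out-of-sample answer prediction.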
... Word embeddings have been utilized in a wide variety of fields such as artificial intelligence [12], information retrieval [13,14], sentiment analysis [15], and psychiatry [16]. There are several word embedding techniques, such as Word2Vec, GloVe, Latent Semantic Analysis (LSA), Doc2Vec, skip-gram, and fastText. ...
With the increase in accumulated data and usage of the Internet, social media such as Twitter has become a fundamental tool for accessing all kinds of information. Processing and preparing Twitter data, and eliminating unnecessary information, is therefore rapidly gaining importance. In particular, it is very important to analyze such information and make it available in emergencies such as disasters. In the proposed study, an earthquake of magnitude Mw = 6.8 that occurred on January 24, 2020, in Elazig province, Turkey, is analyzed in detail. Tweets under twelve hashtags are clustered separately by utilizing the Social Spider Optimization (SSO) algorithm with some modifications. The sum of intra-cluster distances (SICD) is used to measure the performance of the proposed clustering algorithm. In addition, SICD, which assigns each new solution to its nearest node, is used in an integer programming model solved with the GUROBI package program on the test datasets. Optimal results are gathered and compared with the proposed SSO results. In the study, center tweets with optimal results are found by utilizing the modified SSO. Moreover, results of the proposed SSO algorithm are compared with the K-means clustering technique, the most popular clustering technique, and the proposed SSO algorithm gives better results. In this way, the general situation of society after an earthquake can be deduced in order to provide moral and material support.
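The abstract above scores clusterings with the sum of intra-cluster distances (SICD). A small sketch of that objective under its usual reading (Euclidean distance from each point to its assigned cluster center, summed); the data here are toy values, not the tweet datasets:

```python
import numpy as np

def sicd(points: np.ndarray, centers: np.ndarray, labels: np.ndarray) -> float:
    """Sum of Euclidean distances from each point to its assigned cluster center."""
    return float(np.linalg.norm(points - centers[labels], axis=1).sum())

pts = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0]])
ctrs = np.array([[0.0, 1.0], [10.0, 0.0]])
labs = np.array([0, 0, 1])
print(sicd(pts, ctrs, labs))  # distances 1 + 1 + 0
```

Lower SICD means tighter clusters, which is why it can serve both as a clustering quality measure and as an integer-programming objective.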
... In [45], the authors compared NLP-based models analyzing free-text answers to an open question against logistic regression models analyzing answers to structured psychiatric questionnaires. In [46], the authors explored applications of word embeddings to the characterization of psychiatric symptoms. In [47], adapted word embeddings were used to recognize psychiatric symptoms in psychiatrists' texts. ...
Mathematical modeling of language in Artificial Intelligence is of the utmost importance for many research areas and technological applications. Over the last decade, research on text representation has been directed towards the investigation of dense vectors popularly known as word embeddings. In this paper, we propose a cognitive-emotional scoring and representation framework for text based on word embeddings. This representation framework aims to mathematically model the emotional content of words in short free-form text messages produced by adults in follow-up for a mental health condition at the outpatient facilities of the Psychiatry Department of Hospital Fundación Jiménez Díaz in Madrid, Spain. Our contribution is a geometrical-topological framework for Sentiment Analysis, which includes a hybrid method that combines a cognitively based lexicon with word embeddings to generate graded sentiment scores for words, and a new topological method for clustering dense vector representations in high-dimensional spaces, where points are very sparsely distributed. Our framework is useful in detecting word association topics, emotional scoring patterns, and the geometrical behavior of embedded vectors, which might be useful in understanding language use in these kinds of texts. Our proposed scoring system and representation framework might be helpful in studying relations between language and behavior, and their use might have predictive potential for suicide prevention.
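The hybrid method above combines a cognitively based lexicon with word embeddings to produce graded word scores. One hedged sketch of the general technique, not the paper's actual algorithm: assign an out-of-lexicon word the average sentiment of its k nearest lexicon words in embedding space (all names and values here are illustrative).

```python
import numpy as np

def score_word(word_vec: np.ndarray, lexicon_vecs: np.ndarray,
               lexicon_scores: np.ndarray, k: int = 2) -> float:
    """Average the sentiment scores of the k nearest lexicon entries."""
    dists = np.linalg.norm(lexicon_vecs - word_vec, axis=1)
    nearest = np.argsort(dists)[:k]
    return float(lexicon_scores[nearest].mean())

# Toy 2-d "embeddings" and lexicon sentiment scores in [-1, 1]
lex_vecs = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 10.0]])
lex_scores = np.array([-1.0, 1.0, 0.5])
print(score_word(np.array([0.5, 0.0]), lex_vecs, lex_scores))
```

This propagation step is what lets a graded lexicon cover the open vocabulary found in free-form messages.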
The Research Domain Criteria (RDoC) initiative aims to organize research according to domains of brain function. Dysfunction within these domains leads to psychopathology that is classically measured with rating scales. Examining the correspondence between the specific measures assessed within rating scales and RDoC domains is necessary to assess the needs for new RDoC-focused scales. Such RDoC-focused scales have the potential of allowing translation of this work into the clinical domain of measuring psychopathology and designing treatment. Here, we describe an initial qualitative assessment by a group of 10 clinician-scientists of the alignment between RDoC domains and the items within five commonly used rating scales. In this commentary, we report limited correspondence and make recommendations for future work needed to address these limitations.