Article (Literature Review)

The compatibility of theoretical frameworks with machine learning analyses in psychological research

Authors:
To read the full-text of this research, you can request a copy directly from the authors.

Abstract

Supervised machine learning has been increasingly used in psychology and psychiatry research. Machine learning offers an important advantage over traditional statistical analyses: statistical model training on example data to enhance predictions in external test data. Additional advantages include advanced, improved statistical algorithms, and empirical methods to select a smaller set of predictor variables. Yet machine learning researchers often use large numbers of predictor variables, without using theory to guide variable selection. Such an approach leads to Type I error, spurious findings, and decreased generalizability. We discuss the importance of theory to the psychology field. We offer suggestions for using theory to drive variable selection and data analyses using machine learning in psychological research, including an example from the cyberpsychology field.
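The link between large predictor sets and Type I error can be made concrete with a short calculation. The sketch below is not taken from the article: it assumes every predictor is tested independently at the same significance level (real predictors are usually correlated, so this is only an approximation), and the function name `familywise_error_rate` is hypothetical.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive across n_tests
    independent tests, each run at significance level alpha.

    Assumption: the tests are independent, which is an idealization.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    if n_tests < 0:
        raise ValueError("n_tests must be non-negative")
    return 1.0 - (1.0 - alpha) ** n_tests


if __name__ == "__main__":
    # Show how the chance of a spurious finding grows with the predictor count.
    for n_predictors in (1, 10, 50, 100):
        rate = familywise_error_rate(0.05, n_predictors)
        print(n_predictors, round(rate, 3))
```

Under these assumptions the error rate rises quickly with the number of tested predictors, which is one way to read the abstract's argument for letting theory narrow the predictor set first.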

No full-text available

... In the enterprise IT governance group, the PubMed study [25] highlighted the importance of MLOps in the efficient management of AI models in the IT governance framework. For the ML model development and deployment group, multiple registries such as [26]-[55] contributed to a comprehensive view of how MLOps optimizes the entire lifecycle of models, from initial development to deployment in production environments. In the context of drug discovery, Yadav and Thakkar [56] highlighted the specific application of MLOps in the neural oscillation attention long short-term memory (NOA-LSTM) architecture for time series forecasting. ...
... In the first group, composed of articles [4], [26], [29], [32], [34], [43], [51], [60], [63], [66], [72], [77], [83], [86], efficiency in projects with MLOps was explored, evaluating development time, resource utilization, and adaptability in these contexts. The second group, composed of articles [17], [27], [28], [30], [33], [35], [37], [39], [41], [42], [45], [48]-[50], [52], [53], [55], [56], [58]-[61], [68], [69], [71], [73]-[75], [78]-[82], [85], [86], [88], [90], [92]-[94], [105], [106], examined performance in projects with MLOps, analyzing accuracy, processing speed, and scalability. The third group, composed of papers [10], [24], [25], [31], [36], [38], [40], [44], [46], [47], [54], [57], [62], [64], [65], [67], [70], [76], [84], [91], [95]-[104], focused on anomaly detection in projects with MLOps, evaluating the ability of the models to identify unconventional behaviors. ...
... These findings provide a solid basis for the discussion and conclusions of the systematic review, emphasizing the importance of implementing MLOps in the context of computer auditing and AI.
[26], [29], [32], [34], [43], [51], [60], [63], [66], [72], [77], [83], [86]
2. Performance in projects with MLOps: [17], [27], [28], [30], [33], [35], [37], [39], [41], [42], [45], [48]-[50], [52], [53], [55], [56], [58]-[61], [68], [69], [71], [73]-[75], [78]-[82], [85], [87], [88], [90], [92]-[94], [105], [106]
3. Detection of anomalies in projects with MLOps: [10], [24], [25], [31], [36], [38], [40], [44], [46], [47], [54], [57], [62], [64], [65], [67], [70], [76], [84], [91], [95]-[104]
4. Efficiency in projects without MLOps: [10], [25], [28], [29], [33], [34], [37], [43], [44], [51], [54], [56], [59], [60], [61], [63], [65], [67], [72], [75], [78], [80], [83], [85], [88], [89], [91]-[93], [95]-[105]
5. Performance in projects without MLOps: [4], [17], [24], [26], [27], [30]-[32], [35], [36], [38]-[42], [45]-[50], [52], [53], [55], [57], [58], [60], [64], [66], [68]-[71], [73], [74], [76], [77], [79], [81], [82], [86], [87], [90], [106]
6. Anomaly detection in projects without MLOps: [10], [87], [92], [94]-[104]
4. CONCLUSIONS
This systematic review follows a sound methodology that includes formulating the research question using the PICO approach and applying the PRISMA methodology. The results significantly contribute to understanding the implementation of the MLOps methodology in identifying anomalies in integrating AI projects in computer auditing. ...
Article
Full-text available
This systematic review focused on evaluating the impact of the machine learning operations (MLOps) methodology on anomaly detection and the integration of artificial intelligence (AI) projects in computer auditing. Data collection was carried out by searching for articles in databases, such as Scopus and PubMed, covering the period from 2018 to 2024. The rigorous application of the preferred reporting items for systematic reviews and meta-analyses (PRISMA) methodology allowed 88 significant records to be selected from an initial set of 1,389, highlighting the completeness of the selection phase. Both quantitative and qualitative analysis of the data obtained revealed emerging trends in the research and provided key insights into the implementation of MLOps in AI projects, especially in response to increasing complexity, whereby the adoption of the MLOps methodology stands out as a crucial component to optimize anomaly detection and improve integration in the context of information technology auditing. This systematic approach not only consolidates current knowledge but also stands as an essential guide for researchers and practitioners, and the information derived from this systematic review provides valuable guidance for future practices and decisions at the intersection of AI and information technology auditing.
... Monitoring changes in key variables like attachment security or emotional regulation, before and after therapy, provides valuable feedback on intervention outcomes (Rosenbusch et al., 2021). This adaptability supports continuous improvement and optimization of treatment strategies, ultimately offering a more comprehensive and effective approach to managing love addiction (Elhai & Montag, 2020). ...
... Love addiction as a psychological construct can be characterized by emotional attachment, interpersonal dependence, and self-loathing, which all blend to help trigger the development and the sustained engagement of such a craving (Gigerenzer, 2024). The specific measures that were discussed in this study (positive affect, thwarted belongingness, and interpersonal needs) are also in line with previous psychological explanations that viewed emotional health and the desire to connect with others as protective factors against pathological relationships (Elhai & Montag, 2020). ...
Chapter
Introduction: Love addiction, a negative emotional construct, can significantly impact an individual’s personal and social life. It is characterized by emotional dependency involving both positive and negative emotions, specific interpersonal needs, and elements of self-hate. In recent years, machine learning has become increasingly valuable in predicting and detecting negative emotional states in psychology. These algorithms assist in unraveling the complexity of such phenomena. This study explores the application of 12 selected machine learning algorithms to explain love addiction among Iranian students based on these psychological factors. Method: This study utilized a convenience sample of 428 Iranian students who participated in 2024. Data collection tools included demographic questionnaires and assessments of positive and negative affect, interpersonal needs, and self-hate. The dataset was analyzed using various machine learning algorithms: AdaBoost, CatBoost, decision trees (DT), Extra Trees, k-nearest neighbors (KNN), LightGBM (LGBM), logistic regression (LogReg), multilayer perceptron (MLP), naive Bayes (NB), random forest (RF), support vector machine (SVM), and XGBoost (XGB). The input features consisted of positive affect (PA), negative affect, interpersonal needs, and self-hate, while the target variable classified love addiction into low and high levels. Results: The results showed that the random forest classifier achieved the highest performance, with a mean accuracy of 0.82, sensitivity of 0.93, and an AUC value of 0.92. Other models, such as SVM and decision tree, performed well, with SVM achieving the highest sensitivity (0.98) but lower specificity. Feature importance analysis revealed that positive affect (PA), thwarted belongingness (TB), and interpersonal needs (INT) were the most important predictors of love addiction.
Conclusion: Among the 12 machine learning algorithms, random forest demonstrated the best overall performance in predicting love addiction, with superior discriminatory power (AUC = 0.92). The feature importance and the Shapley Additive Explanations (SHAP) value analyses further identified key psychological factors, such as PA and TB, which contribute to love addiction. This study highlights the potential of machine learning models in understanding and predicting psychological phenomena like love addiction, providing valuable insights for mental health professionals and researchers.
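The accuracy, sensitivity, and specificity figures reported above all derive from the same four confusion-matrix counts. The following is a minimal sketch of that arithmetic, assuming labels are coded 1 for the high class and 0 for the low class; the function name `classification_metrics` and the toy labels are hypothetical and are not the study's code or data.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for binary labels.

    Assumption: 1 marks the positive ("high") class, 0 the negative ("low") class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Share of actual positives that the model catches.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Share of actual negatives that the model correctly rejects.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


if __name__ == "__main__":
    truth = [1, 1, 1, 0, 0, 0]
    # A model that labels everyone "high" has perfect sensitivity
    # but zero specificity.
    print(classification_metrics(truth, [1, 1, 1, 1, 1, 1]))
```

The all-positive case in the usage example shows why very high sensitivity paired with lower specificity, as reported for the SVM, needs to be read alongside the other metrics.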
... In alignment with the popularity, there have been an increasing number of prominent introductory papers for psychologists that explain machine learning concepts, opportunities, and limitations (Adjerid & Kelley, 2018; Bleidorn & Hopwood, 2019; Bzdok, 2017; Dwyer et al., 2018; Elhai & Montag, 2020; Hofman et al., 2017; Hullman et al., 2022; Liem et al., 2018; Orrù et al., 2020; Rocca & Yarkoni, 2021; Tay et al., 2022; Van Lissa, 2022; Yarkoni & Westfall, 2017) as well as provide tutorials for specific methods or approaches (Boedeker & Kearns, 2019; E. E. Chen & Wojcik, 2016; De Rooij & Weeda, 2020; Jacobucci et al., 2019; Pargent et al., 2023; Rosenbusch et al., 2021). ...
... Namely, these involve handling a limited sample size, measurement error, non-independent data, and missing data. Several prominent papers have discussed the application of a machine learning approach in psychology (Adjerid & Kelley, 2018; Bzdok, 2017; Dwyer et al., 2018; Elhai & Montag, 2020; Hofman et al., 2017; Hullman et al., 2022; Orrù et al., 2020; Rocca & Yarkoni, 2021; Tay et al., 2022; Van Lissa, 2022; Yarkoni & Westfall, 2017). For example, Liem et al. (2018) outline different options for integrating machine learning into analysis pipelines in psychology, including the direct substitution of a traditional statistical model for a machine learning model. ...
Preprint
Full-text available
In recent years, machine learning has propagated into different aspects of psychological research, and supervised machine learning methods have increasingly been used as a tool for predicting human behavior or psychological characteristics when there is a large number of possible predictors. However, researchers often face practical challenges when using machine learning methods on psychological data. In this article, we identify and discuss four key challenges that often arise when applying machine learning to data collected for psychological research. The four challenge areas cover (i) limited sample size, (ii) measurement error, (iii) non-independent data, and (iv) missing data. Such challenges are extensively discussed in the “traditional” statistical literature but are often not explicitly addressed, or at least not to the same extent, in the applied machine learning community. We present how each of these challenges is dealt with first from a traditional statistics perspective and then from a machine learning perspective, and discuss the strengths and weaknesses of these solutions by comparing the approaches. We argue that the boundary between traditional statistics and machine learning is fluid, and emphasize the need for cross-disciplinary collaboration to better tackle these core challenges and improve replicability.
... One class of neuroimaging methods that allows the building of such models is the one based on supervised machine learning. Supervised machine learning (SML) has been increasingly used for neuroscientific research (Elhai & Montag, 2020) to predict class labels or continuous variables of interest (Hinton, 2011;Sarker, 2021). Compared to standard frequentist approaches, SML approaches, being multivariate, provide more sensitivity and flexibility (Schrouff et al., 2013), and, most importantly, results are tested for generalization to predict new cases. ...
... First of all, to analyze behavioural data, we applied the feature selection procedure, which produces more generalizable outcomes through the automatic addition or removal of variables at each iteration, in order to reduce dimensions (Elhai & Montag, 2020). To make a solid decision, the neighborhood component analysis (NCA) (Djerioui et al., 2019) was applied to eliminate the redundant features in our predictive model and to prevent the curse of dimensionality (Bellman, 2015). ...
Article
Narcissism is a multifaceted construct often linked to pathological conditions, and its neural correlates are still poorly understood. Previous studies have reported inconsistent findings on the brain regions involved, probably due to methodological limitations such as the low number of participants or the use of mass univariate methods or instruments. The present study aims to overcome these previous methodological limitations and build a predictive model of narcissism based on neural and psychological features underlying individual differences in narcissistic personality traits. In this respect, two machine learning-based methods were used to predict narcissistic traits from brain structural features and other normal and abnormal personality features. Results showed that a circuit including the lateral and middle frontal gyrus, the angular gyrus, Rolandic operculum and Heschl’s gyrus predicted the individual differences in narcissistic personality traits (p<0.003). Moreover, narcissistic traits were predicted by normal (openness, agreeableness, conscientiousness) and abnormal (borderline, antisocial, insecure, addicted, negativistic, machiavellianism) personality traits. These results expand the possibility of predicting personality traits from neural and psychological features and can pave the way to build possible biomarkers of personality pathology. This is the first predictive study based on a supervised machine learning approach that can be used to decode narcissistic personality traits.
... (3) Methodologically, an advanced deep learning algorithm, LMF, is employed for the analysis of psychology-based issues. Both intra- and intermodal interactions among an individual's visual, vocal, and textual information are mapped, thus leading to a more precise prediction of individual competence, which offers insights into machine learning analysis in psychological research [34][35][36]. (4) We provide the Chinese Competence Evaluation Multimodal Dataset (CH-CMD) for individual competence evaluation to the public. This dataset, comprising rich information on visual, vocal, and textual information, along with annotated competence scores, could encourage further studies of the relationship between inner and outer traits and individual competence. ...
Article
Full-text available
In social interactions, people who are perceived as competent win more chances, tend to have more opportunities, and perform better in both personal and professional aspects of their lives. However, the process of evaluating competence is still poorly understood. To fill this gap, we developed a two-step empirical study to propose a competence evaluation framework and a predictor of individual competence based on multimodal data using machine learning and computer vision methods. In study 1, from a knowledge-driven perspective, we first proposed a competence evaluation framework composed of 4 inner traits (skill, expression efficiency, intelligence, and capability) and 6 outer traits (age, eye gaze variation, glasses, length-to-width ratio, vocal energy, and vocal variation). Then, eXtreme Gradient Boosting (XGBoost) and Shapley Additive exPlanations (SHAP) were utilized to predict and interpret individual competence, respectively. The results indicate that 8 (4 inner and 4 outer) traits (in descending order: vocal energy, age, length-to-width ratio, glasses, expression efficiency, capability, intelligence, and skill) contribute positively to competence evaluation, while 2 outer traits (vocal variation and eye gaze variation) contribute negatively. In study 2, from a data-driven perspective, we accurately predicted competence with a cutting-edge multimodal machine learning algorithm, low-rank multimodal fusion (LMF), which exploits the intra- and intermodal interactions among all the visual, vocal, and textual features of an individual’s competence behavior. The results indicate that vocal and visual features contribute most to competence evaluation. In addition, we provided a Chinese Competence Evaluation Multimodal Dataset (CH-CMD) for individual competence analysis. 
This paper provides a systemic competence framework with empirical consolidation and an effective multimodal machine learning method for competence evaluation, offering novel insights into the study of individual affective traits, quality, personality, etc.
... Entrepreneurial readiness is described as a set of individual characteristics that separate individuals by demonstrating curiosity and competence in observing and analyzing their creative and productive potential and running a business by directing all abilities for self-achievement [10]. Entrepreneurial readiness is a system that identifies individual interests, assesses a person's preparedness to start a business, and increases socioeconomic welfare. ...
Article
Full-text available
Machine learning has become an exciting topic in psychology-related research, including counseling on psychological readiness for entrepreneurship. An intelligent application was developed using a machine learning model to assist the counseling process in measuring a person's psychological readiness for entrepreneurship. This application was generated using the Entrepreneurship Psychological Readiness (EPR) instrument. In this study, to find the most suitable machine learning model, two machine learning models, Naïve Bayesian (NB) and k-Nearest Neighbor (k-NN), were compared using 1095 training data records. The counseling results recommend one of four prediction classes: not ready for entrepreneurship, given training, guided, and prepared for entrepreneurship. The EPR instrument consists of 33 question items that measure eight parameters used as inputs for the prediction process. The data were randomized, and the experiment was repeated five times to check the consistency of performance of all techniques. 80% of the data was used as training data, and the other 20% was used as testing data. The results of the five trials show that the Naïve Bayesian model provides the most consistent results in predicting a person's psychological readiness for entrepreneurship, with 89.58% accuracy in testing. Therefore, the Naïve Bayesian model is recommended for use in psychological counseling to predict a person's readiness for entrepreneurship.
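The evaluation design described above (randomize, split 80/20, repeat five times) can be sketched as a small index-splitting routine. This is an illustration only, not the study's code: the function name `repeated_holdout`, the fixed seed, and the reading of 1095 as the total number of records are all assumptions.

```python
import random


def repeated_holdout(n_samples, test_fraction=0.2, repeats=5, seed=0):
    """Return `repeats` random (train_indices, test_indices) pairs.

    Each repeat reshuffles all record indices and holds out
    `test_fraction` of them for testing.
    """
    rng = random.Random(seed)  # seeded so the splits are reproducible
    n_test = round(n_samples * test_fraction)
    splits = []
    for _ in range(repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        # First n_test shuffled indices form the test set, the rest the training set.
        splits.append((indices[n_test:], indices[:n_test]))
    return splits


if __name__ == "__main__":
    # Assumption: 1095 is treated as the total record count.
    for train_idx, test_idx in repeated_holdout(1095):
        print(len(train_idx), len(test_idx))
```

Fitting each candidate model on every training split and comparing the spread of test accuracies across the five repeats is one way to check the consistency the abstract refers to.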
... However, concerns about machine learning techniques are mainly about the "black box" results they produce (Elhai & Montag, 2020;Guidotti et al., 2018). The learning algorithms can predict outcomes successfully with great accuracy, but they do not provide the causal or explanatory information that traditional methods generate and require. ...
Article
Full-text available
Virtual platforms and autonomous robotic systems have recently gained a lot of attention due to the enormous growth of novel computational techniques, such as artificial intelligence and machine learning, allowing various fields and processes to be transformed. Cognitive psychology is a field where such virtual platforms can be applied to enhance current procedures and processes, offering an objective and non-intrusive method for executing psychological tasks, especially in the case of children. More specifically, this paper presents a virtual platform, complemented with a robotic experimenter and a machine learning processing module, allowing the objective and neutral execution of psychological experiments and tasks with children, remotely or in person.
... This data-driven method employs an advanced feature selection algorithm optimized for dissecting intricate variable relationships among highly correlated predictors and reducing the dimensionality of the model. Given our research goal to identify and prioritize predictors of psychological and subjective well-being measures across diverse domains, we adopted a machine learning (ML) approach to address the limitations of traditional regression methods, such as overfitting, by adding a penalty to the regression coefficients while maintaining predictive accuracy (Dwyer et al., 2018; Elhai & Montag, 2020). Despite the use of ML approaches across various domains, such as emotion (Prout et al., 2020), personality (Gladstone et al., 2019), health (Kim et al., 2015; Lee et al., 2020a), and workplace dynamics (Sajjadiani et al., 2019), their use for systematically comparing potential predictors of happiness remains surprisingly sparse. ...
Article
Full-text available
The quest to unravel what contributes to happiness continues to captivate interest in both everyday experiences and academic discourse. Nonetheless, empirical research on the relative importance of possible candidates and their associations with two key aspects of well-being—eudaimonia (the good life) and hedonia (pleasure)—is limited. This study addresses this gap by exploring the relative strength of 32 predictors from multiple domains on psychological well-being (PWB) and subjective well-being (SWB). Using a machine learning approach on a dataset of 559 Korean adults, we identified distinct primary determinants for each well-being aspect. For PWB, meaning in life, self-esteem, and essentialist beliefs about happiness emerged as the strongest predictors requiring careful consideration. For SWB, depressive symptoms, subjective socioeconomic status, and emotional stability were salient predictors. Our findings highlight potential cultural nuances in the prioritization of happiness and offer valuable insights for policymakers and decision-makers in tailoring interventions and strategies to optimize individual well-being.
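The citing passage for this study mentions adding a penalty to the regression coefficients to limit overfitting, without naming the penalty. As an illustration only, the sketch below uses a ridge (squared-coefficient) penalty for a single centered predictor, because that case has a simple closed form; the choice of ridge, the function name `ridge_slope`, and the toy data are assumptions, not the study's method.

```python
def ridge_slope(x, y, penalty):
    """Slope of a one-predictor ridge regression on centered data.

    Minimizes sum((y_i - b * x_i)^2) + penalty * b^2 after centering
    x and y, which gives b = Sxy / (Sxx + penalty).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    if penalty < 0:
        raise ValueError("penalty must be non-negative")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx + penalty == 0:
        raise ValueError("x has no variance and penalty is zero")
    return sxy / (sxx + penalty)


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0]
    y = [2.0, 4.0, 6.0]
    # penalty = 0 reproduces ordinary least squares; larger penalties
    # shrink the coefficient toward zero.
    for lam in (0.0, 2.0, 10.0):
        print(lam, ridge_slope(x, y, lam))
```

With a zero penalty the function returns the ordinary least-squares slope, and increasing the penalty shrinks it toward zero, which is the mechanism by which penalized models trade a little bias for less overfitting.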
... Thus, integrating machine learning-based analyses into the workflow of psychological experiments can maximize accuracy and minimize issues related to reproducibility (Orrù et al., 2019). However, it is worth noting that machine learning researchers often use numerous predictor variables without the guidance of theory, which can lead to issues such as Type I errors, false discoveries, and reduced generalizability (Elhai & Montag, 2020). Therefore, combining theory-driven variable selection with machine learning methods can provide a better understanding of psychological mechanisms and the ability to more accurately predict future behavior, resulting in a more solid theoretical foundation and practical guidance for future aesthetic research and related industries. ...
... Machine learning offers sophisticated algorithms capable of detecting complex, non-linear patterns and interactions within large datasets, which are common in social media studies. This enables a deeper understanding of the nuanced effects of social media usage that traditional methods might miss (31). Additionally, machine learning can enhance predictive accuracy and provide insights into predictive factors of mental health issues, facilitating targeted interventions (32). ...
Article
Full-text available
This study delves into the complex relationship between various mental health indicators and their influencing factors among Indian youths. It specifically examines how external validation, interactions on social media platforms, and demographic variables such as gender, age, and occupation impact a range of mental health outcomes. These outcomes include experiencing negative thoughts, a disinterest in activities, low self-esteem, the development of eating disorders, disturbances in sleep patterns, symptoms of depression, difficulties in concentration, and feelings of fatigue. Employing the theoretical framework of Social Cognitive Theory, this research utilized an online random sampling method to gather data from a diverse group of 151 Indian youth participants. The findings of this study highlight the significant role that external validation and social media usage play in shaping mental health conditions among the youth. This underscores the critical need for integrating digital literacy components into mental health initiatives, aiming to foster healthier online behaviors and interactions. Furthermore, the study advocates for a holistic approach to mental health care, emphasizing the consideration of the specific needs of various demographic segments. It suggests that future mental health policies and interventions should be culturally sensitive and responsive. The results of this research underscore the pressing necessity for ongoing investigations into these vital dynamics, aiming to better understand and address the mental health challenges faced by Indian youths.
... Given the promising results of other mental health domains and to fill this research gap in the field of insomnia, this study investigates the predictive value of objectively measured smartphone usage behavior and self-reported insomnia symptoms. Given the limited research in this field and to overcome some limitations of traditional statistical analyses (e.g., overfitting, predictor collinearity, and linearity; [29]), we chose an exploratory approach utilizing supervised machine learning. In particular, a data set featuring a validated self-report questionnaire for assessing insomnia symptoms (ISI; [30]) and objectively measured smartphone usage data of the previous seven days was analyzed to investigate the following research questions: ...
Article
Full-text available
Introduction Digital phenotyping can be an innovative and unobtrusive way to improve the detection of insomnia. This study explores the correlations between smartphone usage features (SUF) and insomnia symptoms and their predictive value for detecting insomnia symptoms. Methods In an observational study of a German convenience sample, the Insomnia Severity Index (ISI) and smartphone usage data (e.g., time the screen was active, longest time the screen was inactive in the night) for the previous 7 days were obtained. SUF (e.g., min, mean) were calculated from the smartphone usage data. Correlation analyses between the ISI and SUF were conducted. For the specification of the machine learning models (ML), 80 % of the data was allocated to training, 20 % to testing, and five-fold cross-validation was used. Six algorithms (support vector machine, XGBoost, Random Forest, k-Nearest-Neighbor, Naive Bayes, and Logistic Regressions) were specified to predict ISI scores ≥15. Results 752 participants (51.1 % female, mean ISI = 10.23, mean age = 41.92) were included in the analyses. Small correlations between some of the SUF and insomnia symptoms were found. In the ML models, sensitivity was low, ranging from 0.05 to 0.27 in the testing subsample. Random Forest and Naive Bayes were the best-performing algorithms. Yet, their AUCs (0.57, 0.58 respectively) in the testing subsample indicated a low discrimination capacity. Conclusions Given the small magnitude of the correlations and low discrimination capacity of the ML models, SUFs, as measured in this study, do not appear to be sufficient for detecting insomnia symptoms. Further research is necessary to explore whether examining intra-individual variations and subpopulations or employing alternative smartphone sensors yields more promising outcomes.
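The AUC values near 0.5 reported above indicate discrimination close to chance. The sketch below shows one standard way to compute AUC, as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as half. The function name `roc_auc` and the toy scores are hypothetical; this is not the study's code.

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts, over all (positive, negative) pairs, how often the positive
    case has the higher score; ties contribute 0.5.
    Assumption: labels are coded 1 (positive) and 0 (negative).
    """
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    print(roc_auc(labels, [0.1, 0.2, 0.8, 0.9]))  # scores separate the classes
    print(roc_auc(labels, [0.5, 0.5, 0.5, 0.5]))  # scores carry no information
```

The pairwise loop is quadratic in the sample size, which is fine for illustration; rank-based formulas give the same value more efficiently on large samples.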
... Still, the demand for XAI is increasing, with some instances where it is mandated by judicial bodies [34]. However, there might be additional solutions: A recent work put forward some ideas for the psychological sciences that psychologists and data scientists might narrow down on what the machine learns [35], namely, by only feeding the machine with information, which from a theoretical perspective makes sense to predict a certain outcome. Beyond this, one could investigate how predictions from data change when some data are provided to a machine and others are excluded. ...
Article
Full-text available
Artificial intelligence (AI) is built into many products and has the potential to dramatically impact societies around the world. This short theoretical paper aims to provide a simple framework that might help us understand how the introduction and/or use of products with AI might influence the well-being of humans. It is proposed that considering the dynamic Interplay between variables stemming from Modality, Person, Area, Culture and Transparency categories will help to understand the influence of AI on well-being. The Modality category encompasses areas such as the degree of AI being interactive, informational versus actualizing, or autonomous. The Person variable contains variables such as age, gender, personality, technological self-efficacy, and perceived competence when interacting with AI, whereas the Area variable can comprise a certain product where AI is in-built or a certain domain where AI is used to make a difference (such as the health sector, military sector, education sector, etc.). The Culture variable is of importance to understand because cultural settings might shape attitudes towards AI. Finally, this might also be true for transparent AI (or understandable/explainable AI), with high degrees of transparency likely to elicit trust. The proposed model suggests that there is no easy answer when one seeks to understand the impact of AI on the world and humans. Only by considering a myriad number of variables in a model, summed up in the acronym IMPACT (Interaction/Interplay of Modality-Person-Area-Culture-Transparency), we might get closer to an understanding of how AI impacts individuals’ well-being.
... One method that can be used is clustering. Clustering is a method in statistical data analysis and multivariate statistics, and it also belongs to one of the classes of machine learning [1]. In several classifications of machine learning, clustering falls under unsupervised learning. ...
Method
If a building has several floors and several elevators, how should access for each elevator be designed? If every elevator can access every floor, waiting times become very long. One method that can be used is the clustering technique.
... The first step is feature selection. The initial version of the short-form BWAQ was created by analyzing the training set using stepwise regression and an ANOVA F-test to extract the items that contribute most to the questionnaire results (i.e., the most important features) [36,37]. The second step is machine learning modeling. ...
Article
Full-text available
For adolescents, high levels of aggression are often associated with suicide, physical injury, worsened academic performance, and crime. Therefore, there is a need for the early identification of and intervention for highly aggressive adolescents. The Buss–Warren Aggression Questionnaire (BWAQ) is one of the most widely used aggression measurement tools. It consists of 34 items, and the longer the scale, the more likely participants are to make an insufficient effort response (IER), which reduces the credibility of the results and increases the cost of implementation. This study aimed to develop a shorter BWAQ using machine learning (ML) techniques to reduce the frequency of IER and simultaneously decrease implementation costs. First, an initial version of the short-form questionnaire was created using stepwise regression and an ANOVA F-test. Then, a machine learning algorithm was used to create the optimal short-form questionnaire (BWAQ-ML). Finally, the reliability and validity of the optimal short-form questionnaire were tested using independent samples. The BWAQ-ML contains only four items, thirty fewer than the BWAQ, and its AUC, accuracy, recall, precision, and F1 score are 0.85, 0.85, 0.89, 0.83, and 0.86, respectively. BWAQ-ML has a Cronbach’s alpha of 0.84, a correlation with RPQ of 0.514, and a correlation with PTM of −0.042, suggesting good measurement performance. The BWAQ-ML can effectively measure individual aggression, and its smaller number of items improves the measurement efficiency for large samples and reduces the frequency of IER occurrence. It can be used as a convenient tool for early adolescent aggression identification and intervention.
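The ANOVA F-test screening mentioned in this entry can be illustrated with a minimal sketch. It ranks questionnaire items by a one-way ANOVA F statistic computed between outcome groups. The function names (`anova_f`, `rank_items`), the toy responses, and the two-class labels are assumptions made for illustration; the stepwise-regression step is not shown, and this is not the authors' actual pipeline.

```python
import statistics

def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of numeric scores."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.fmean(all_values)
    k, n = len(groups), len(all_values)
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - statistics.fmean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def rank_items(responses, labels):
    """Return item indices ordered from highest to lowest F statistic.

    responses: one row of item scores per participant (hypothetical data).
    labels: one class label per participant (e.g., 0 = low, 1 = high aggression).
    """
    classes = sorted(set(labels))
    scores = {}
    for item in range(len(responses[0])):
        groups = [[row[item] for row, y in zip(responses, labels) if y == c]
                  for c in classes]
        scores[item] = anova_f(groups)
    return sorted(scores, key=scores.get, reverse=True)

# Toy example: item 0 separates the two groups, items 1 and 2 do not.
toy_responses = [[1, 3, 2], [2, 2, 4], [1, 4, 3],
                 [5, 3, 2], [4, 2, 3], [5, 4, 4]]
toy_labels = [0, 0, 0, 1, 1, 1]
ranking = rank_items(toy_responses, toy_labels)
print(ranking[0])  # index of the most discriminating item
```

In a real analysis the ranking would be computed on the training set only, so that the held-out sample remains untouched by item selection.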
... It may, however, require additional substantive input to form a meaningful statement (Rubin & Donkin, 2022). This is strongly the case when a causal hypothesis is derived from the finding of an association (Elhai & Montag, 2020; Glymour et al., 2019). ...
Article
Full-text available
Transparent exploration in science invites novel discoveries by stimulating new or modified claims about hypotheses, models, and theories. In this second article of two consecutive parts, we outline how to explore data patterns that inform such claims. Transparent exploration should be guided by two contrasting goals: comprehensiveness and efficiency. Comprehensiveness calls for a thorough search across all variables and possible analyses so as not to miss anything that might be hidden in the data. Efficiency adds that new and modified claims should withstand severe testing with new data and give rise to relevant new knowledge. Efficiency aims to reduce false positive claims, which is better achieved if a bunch of results is reduced into a few claims. Means for increasing efficiency are methods for filtering local data patterns (e.g., only interpreting associations that pass statistical tests or using cross-validation) and for smoothing global data patterns (e.g., reducing associations to relations between a few latent variables). We suggest that researchers should condense their results with filtering and smoothing before publication. Coming up with just a few most promising claims saves resources for confirmation trials and keeps scientific communication lean. This should foster the acceptance of transparent exploration. We end with recommendations derived from the considerations in both parts: an exploratory research agenda and suggestions for stakeholders such as journal editors on how to implement more valuable exploration. These include special journal sections or entire journals dedicated to explorative research and a mandatory separate listing of the confirmed and new claims in a paper’s abstract.
... ANN's main advantage is its ability to learn specific rules by self-training instead of requiring a mathematical equation to describe the mapping relationship between the input and output layers. Compared with other machine learning models, MLPANN models have ascendancy over them in human perception studies [57][58][59]. Therefore, the neural fitting tool in MATLAB, which can execute many procedures concurrently and accelerate training speed significantly, was employed in this study. ...
Article
Exercisers' visual comfort is an essential factor in successful gymnasium design. Existing research has identified viable indicators of visual comfort and explained the interaction between humans and the light environment. However, it remains difficult to accurately quantify the impact of the daylight environment on human perception. Given the particularity of exercisers' behavior and activities in gymnasiums, the current general assessment model for exercisers' visual perception is lacking. Taking a university gymnasium in Harbin as a case, this study aimed to establish a computational method for assessing visual comfort from the human-centric perspective via mutual authentication between questionnaire and physiological indices and luminance. An analysis of the questionnaire responses revealed that the synthetical visual evaluation (SVE) was an appropriate visual evaluation index. Machine learning was applied to quantify the correlation between various luminance levels and human perception to assess exercisers' level of visual comfort. Multilayer Perceptron models with the best-fit optimization were selected by artificial neural networks (ANNs) to determine the most optimized visual comfort assessment model. Based on the ANNs, the correlation coefficient between luminance, SVE, and physiological indicator ranged from 85% to 90%. According to the genetic algorithm, the average luminance of the entire field of view (Lfov) was 55–135 cd/m², the average luminance of the target area (Lt) was 82–375 cd/m², and the average luminance of the window area (Lw) was 960–1950 cd/m², for a comfortable visualization.
... 7. Not abandoning theory as atheoretical approaches like machine learning become more powerful and alluring; the alternative would be to combine psychological theory building with machine learning, which is feasible (e.g., Elhai & Montag, 2020). ...
... While a growing number of studies have recognized the potential of machine learning methods in psychology research and patient care, they are also criticized for their "black-box" effect [18,19], thus limiting the interpretability and, consequently, the acceptability of ML-based models. In this work, we demonstrated that both robust and trustworthy ML models can be developed (i.e., by pairing a classification model with SHAP values to generate an explainable predictive model), allowing us to quantify at a patient level the unique and additive importance of the predictors in the classification task. ...
... Those authors entered several psychopathological and demographic variables to determine their ability to predict problematic smartphone use. They further discussed the compatibility of machine learning alongside theoretical frameworks in psychological research [30]. Additionally, neural networks and decision trees were used to predict sixth semester CGPA as a proxy for academic performance [31]. ...
Article
Full-text available
This paper reports a two-part study examining the relationship between fear of missing out (FoMO) and maladaptive behaviors in college students. This project used a cross-sectional study to examine whether college student FoMO predicts maladaptive behaviors across a range of domains (e.g., alcohol and drug use, academic misconduct, illegal behavior). Participants (N = 472) completed hard copy questionnaire packets assessing trait FoMO levels and questions pertaining to unethical and illegal behavior while in college. Part 1 utilized traditional statistical analyses (i.e., hierarchical regression modeling) to identify any relationships between FoMO, demographic variables (socioeconomic status, living situation, and gender) and the behavioral outcomes of interest. Part 2 looked to quantify the predictive power of FoMO, and demographic variables used in Part 1 through the convergent approach of supervised machine learning. Results from Part 1 indicate that college student FoMO is indeed related to many diverse maladaptive behaviors spanning the legal and illegal spectrum. Part 2, using various techniques such as recursive feature elimination (RFE) and principal component analysis (PCA) and models such as logistic regression, random forest, and Support Vector Machine (SVM), showcased the predictive power of implementing machine learning. Class membership for these behaviors (offender vs. non-offender) was predicted at rates well above baseline (e.g., 50% at baseline vs 87% accuracy for academic misconduct with just three input variables). This study demonstrated FoMO’s relationships with these behaviors as well as how machine learning can provide additional predictive insights that would not be possible through inferential statistical modeling approaches typically employed in psychology, and more broadly, the social sciences. 
Research in the social sciences stands to gain from regularly utilizing the more traditional statistical approaches in tandem with machine learning.
... randomized controlled trials remain the gold standard for drawing causal conclusions (Yarkoni & Westfall, 2017). Bringing machine learning together with theory-driven psychological research remains a challenge (Elhai & Montag, 2020). ...
Article
Full-text available
Abstract. Digital phenotyping is a new, powerful approach to carrying out psychodiagnostic tasks in many areas of psychology and medicine. The basic idea is to use digital traces from everyday life in order to test and exploit their predictive power for a wide range of applications. Prerequisites for successful implementation are elaborate smart sensing approaches as well as Big Data-based extraction (data mining) and machine learning-based analysis methods. Initial empirical studies illustrate the high potential, but also the methodological, ethical, and legal challenges involved in obtaining robust findings that go beyond chance correlations. Here, legal and ethical guidelines must ensure that the insights are used in a way that is desirable for individuals and for society as a whole. For psychology as a teaching and research domain, digital phenotyping offers many opportunities, which require active collaboration between different disciplines on the one hand and curricular extensions on the other. This narrative review offers a theoretical, non-technical introduction to the research field of digital phenotyping, with initial empirical findings and a discussion of its possibilities and limits as well as necessary fields of action.
... We randomly shuffled the sample's participant rows using a fixed number seed for later consistent replication. After shuffling, we randomly selected 80 % of the sample (n = 3420) as the training sample, and the residual 20 % (n = 855) as the hold-out test sample, a common practice in supervised machine learning (Elhai and Montag, 2020). We preprocessed the data, centering and scaling the predictors and dependent variables as z-scores, after allocation to the training and test samples (Kuhn and Johnson, 2013). ...
Article
Full-text available
As on-demand streaming technology rapidly expanded, binge-watching (i.e., watching multiple episodes of TV series back-to-back) has become a widespread activity, and substantial research has been conducted to explore its potential harmfulness. There is, however, a need for differentiating non-harmful and problematic binge-watching. This is the first study using a machine learning analytical strategy to further investigate the distinct psychological predictors of these two binge-watching patterns. A total of 4275 TV series viewers completed an online survey assessing sociodemographic variables, binge-watching engagement, and relevant predictor variables (i.e., viewing motivations, impulsivity facets, and affect). In one set of analyses, we modeled intensity of nonharmful involvement in binge-watching as the dependent variable, while in a following set of analyses, we modeled intensity of problematic involvement in binge-watching as the dependent variable. Emotional enhancement motivation, followed by enrichment and social motivations, were the most important variables in modeling non-harmful involvement. Coping/escapism motivation, followed by urgency and lack of perseverance (two impulsivity traits), were found as the most important predictors of problematic involvement. These findings indicate that non-harmful involvement is characterized by positive reinforcement triggered by TV series watching, while problematic involvement is linked to negative reinforcement motives and impulsivity traits.
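The hold-out procedure quoted in the excerpt for this entry (shuffle with a fixed seed, allocate 80% to training and 20% to testing, then z-score after allocation) can be sketched as follows. The function names, the seed value, and the synthetic two-column data are assumptions for illustration. Applying the training-set means and standard deviations to the test set is also an assumption: it is a common way to avoid information leakage, but the excerpt does not state which statistics were used.

```python
import random
import statistics

def shuffle_split(rows, train_fraction=0.8, seed=42):
    """Shuffle rows reproducibly, then split into training and hold-out sets."""
    rng = random.Random(seed)      # fixed seed so the shuffle can be replicated
    shuffled = list(rows)          # copy, so the caller's data is not modified
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

def fit_scaler(train_rows):
    """Estimate per-column mean and sample SD from the training rows only."""
    columns = list(zip(*train_rows))
    return [(statistics.fmean(c), statistics.stdev(c)) for c in columns]

def z_score(rows, params):
    """Center and scale every column using previously estimated parameters."""
    return [[(v - m) / s for v, (m, s) in zip(row, params)] for row in rows]

# Synthetic stand-in for a survey sample of 4275 participants.
rows = [[float(i), float(i % 7)] for i in range(4275)]
train, test = shuffle_split(rows)
params = fit_scaler(train)         # statistics come from the training set
train_z = z_score(train, params)
test_z = z_score(test, params)     # test set reuses the training parameters
print(len(train), len(test))
```

With 4275 rows, the split reproduces the sample sizes reported in the excerpt (n = 3420 and n = 855).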
... The use of Facebook data to predict personality traits, voter preferences, mental states or suicide risk (Eichstaedt et al. 2018; Kristensen et al. 2017; Matz et al. 2017; Reardon 2017) represents another set of nonhealthcare examples of behavioral phenotyping. The connection between a user's posts and their future behavior is nuanced and requires the use of advanced machine learning algorithms trained on vast amounts of data (Elhai and Montag 2020). In these cases, the user has less control or transparency of how their data and predictions of preferences and behavior are used. ...
Chapter
In this chapter we introduce digital phenotyping and its applications to healthcare. Despite the promise of this new form of clinical diagnosis in medicine and psychiatry, use of digital phenotyping raises several ethical concerns. We use insights derived from a clinical case study to frame these different ethical questions. We discuss how current healthcare practice and privacy policies address these questions and impose requirements for non-healthcare scientists and practitioners using digital phenotyping. We emphasize that this chapter frames the discussion from the perspective of the healthcare practitioner. We conclude by briefly reviewing more strongly theoretically based discussions of this emerging topic.
... Elaborate machine learning algorithms that include smartphone usage frequency and duration, different sensors' data, and external factors (e.g., socio-demographic data, date and time of day of smartphone usage, additional self-reported measures, etc.) have been developed to predict smartphone users' negative affective states (e.g., depression, anxiety; Hung et al. 2016; Ware et al. 2020) and stress (Reimer et al. 2017), social anxiety (Jacobson et al. 2020), and schizophrenia (Wang et al. 2017). Of note, machine learning methods are also finding their way into smartphone use research that is more dependent on self-reports (Elhai et al. 2020d; Elhai and Montag 2020). ...
Chapter
Smartphones allow for several daily life enhancements and productivity improvements. Yet, over the last decade, concerns regarding daily life adversities in relation to excessive smartphone use have been raised. This type of behavior has been regarded as “problematic smartphone use” (PSU) to describe the effects resembling a behavioral addiction. In addition to other problems in daily life, research has consistently shown that PSU is linked to various psychopathology constructs. The aim of this chapter is to provide an overview of some findings in PSU research regarding associations with psychopathology. We also discuss some of the theoretical explanations that may be helpful in conceptualizing PSU. We then take a look at self-reported PSU in relation to objectively measured smartphone use, and, finally, provide some insight into current findings and future opportunities in objectively measuring smartphone use in association with psychopathology measures. This chapter may be useful as an introductory overview into the field of PSU research.
... The authors suggested using theory to drive variable selection and data analyses when utilizing machine learning in psychological research; they also include an example from the cyberpsychology field (Elhai & Montag, 2020). ...
Article
Full-text available
Concerns about the health effects of frequent exposure to electromagnetic fields (EMF) emitted from mobile towers and handsets have been raised because of the gradual increase in cell phone usage and the frequent setting up of mobile towers. The present study targets the detrimental effects of EMF radiation on various biological systems through suppression of the immune system, mainly in connection with the online teaching and learning process. During the COVID-19 pandemic, the increased usage of the internet due to online education and online office work led to more detrimental effects of EMF radiation. The incorporation of soft computing techniques into EMF radiation research is also presented, along with a literature review focusing on the usage of soft computing techniques in the domain of EMF radiation. An online survey was conducted targeting Indian academic stakeholders (especially teachers, students, and parents, termed the population in the paper) to analyze awareness of the biohazards of EMF exposure.
... While adoption of AI in psychology is still at an early stage, its use extends into all domains of psychology. In addition to machine learning, which can be used to mine large data files (Dwyer et al., 2018) and evaluate psychological research questions (Elhai & Montag, 2020), AI has led to the development of models and theories, alongside applied uses in clinical psychology. Although psychologists typically focus on explaining human behaviour, Yarkoni and Westfall (2017) emphasise the importance of predicting behaviour, particularly for applied domains such as clinical psychology. ...
Article
Full-text available
Scientific discovery is a driving force for progress, involving creative problem-solving processes to further our understanding of the world. Historically, the process of scientific discovery has been intensive and time-consuming; however, advances in computational power and algorithms have provided an efficient route to make new discoveries. Complex tools using artificial intelligence (AI) can efficiently analyse data as well as generate new hypotheses and theories. Along with AI becoming increasingly prevalent in our daily lives and the services we access, its application to different scientific domains is becoming more widespread. For example, AI has been used for early detection of medical conditions, identifying treatments and vaccines (e.g., against COVID-19), and predicting protein structure. The application of AI in psychological science has started to become popular. AI can assist in new discoveries both as a tool that allows more freedom to scientists to generate new theories, and by making creative discoveries autonomously. Conversely, psychological concepts such as heuristics have refined and improved artificial systems. With such powerful systems, however, there are key ethical and practical issues to consider. This review addresses the current and future directions of computational scientific discovery generally and its applications in psychological science more specifically.
... It may, however, require additional substantive input to form a meaningful statement (Rubin & Donkin, 2022). This is strongly the case when a causal hypothesis is derived from the finding of an association (Elhai & Montag, 2020; Glymour et al., 2019; Ryan et al., 2019). ...
Preprint
Full-text available
Transparent exploration opens the door for scientific novelty through stimulating new or modified claims about hypotheses, models, and theories. In this second of two consecutive papers, we outline foundations, goals and means of conducting exploration. Transparency in how exploration has been done (through preregistration, open data and open analysis) is crucial for assessing the initial amount of evidence for a claim and for the explorative approach to succeed. We discuss how background knowledge may inform exploration planning between the conflicting goals of completeness and efficiency. Efficiency means that new and modified claims should withstand severe testing with new data and generate relevant new knowledge. We provide guidance on filtering local data patterns (e.g. internal cross-validation) and smoothing global data patterns. The paper ends with recommendations derived from the arguments of both papers: an exploratory research agenda and suggestions for stakeholders such as journal editors on how to implement more valuable exploration. These include special journal sections or entire journals dedicated to explorative research and a mandatory separate listing of confirmed and new claims already in a paper’s abstract.
... Please note, that in the present short perspective, we cannot discuss the many barriers to be overcome in this rapidly evolving research field. It is not trivial to deal with Big Data from a methodological aspect [29] coming in the three V's (different velocity, variety and volume) requiring analysis methods going beyond inferential statistics. And Big Data is what researchers usually are facing when studying data from the IoT. ...
Article
Full-text available
Digital data are abundantly available for researchers in the age of the Internet of Things. In the psychological and psychiatric sciences such data can be used in myriad ways to obtain insights into mental states and traits. Most importantly, such data allow researchers to record and analyze behavior in a real-world context, a scientific approach which was expensive and difficult to conduct until only recently. Much research in recent years linked digital footprints to self-report questionnaire data, likely to demonstrate proof of concept(s)—for instance linking socializing on the smartphone to self-reported extraversion (a personality trait linked to socializing)—in the sciences investigating the human mind. The present perspective piece reflects on this approach by revisiting recent work which has been carried out mining smartphone log and social media data and questions if and when self-report data will still be of relevance in psychological/psychiatric research in the near future.
... In addition, psychological studies should consider multiple time-series analyses, process evaluations, qualitative research and longitudinal surveys [11]. Existing theoretical models, postulating etiological, developmental and therapeutic aspects of PUI, should also be tested in the context of COVID-19 [23,75,76]. These measures cumulatively may offer additional understanding of underlying mechanisms and coping strategies related to mental health concerns, and propose effective strategies for intervention in behavioral addictions and PUI [16]. ...
Article
Full-text available
With the onset of the COVID-19 pandemic and the accelerated spread of the SARS-CoV-2 virus came jurisdictional limitations on mobility of citizens and distinct alterations in their daily routines. Confined to their homes, many people increased their overall internet use, with problematic use of the internet (PUI) becoming a potential reason for increased mental health concerns. Our narrative review summarizes information on the extent of PUI during the pandemic, by focusing on three types: online gaming, gambling and pornography viewing. We conclude by providing guidance for mental health professionals and those affected by PUI (with an outline of immediate research priorities and best therapeutic approaches), as well as for the general public (with an overview of safe and preventative practices).
... Too many predictors could lead to over-fitting, instability and poor generalization of a model. Thus, feature selection should be performed prior to model fitting either through a data-driven approach, a theory-driven approach, or a hybrid of these two approaches which could leverage the strengths of each (97). The number of features to retain for model development is another key consideration; in general, it is recommended that the ratio of outcome instances to predictors should be approximately 10:1 (98). ...
Article
Full-text available
Prediction and prevention of negative clinical and functional outcomes represent the two primary objectives of research conducted within the clinical high-risk for psychosis (CHR-P) paradigm. Several multivariable “risk calculator” models have been developed to predict the likelihood of developing psychosis, although these models have not been translated to clinical use. Overall, less progress has been made in developing effective interventions. In this paper, we review the existing literature on both prediction and prevention in the CHR-P paradigm and, primarily, outline ways in which expanding and combining these paths of inquiry could lead to a greater improvement in individual outcomes for those most at risk.
... Machine learning (ML) is a dynamic, robust statistical approach that allows for the identification of complex (i.e., nonlinear) relationships and interactions between a large number of predictors that lead to a given outcome. Although ML is often used in an exploratory and atheoretical manner, using theory and prior research on stress, coping, and trauma adaptation to guide variable selection serves to reduce Type I error and spurious findings and increase generalizability (Elhai & Montag, 2020). A handful of COVID-19 mental health studies have adopted an ML approach. ...
Article
Full-text available
Objective: This study explored risk and resilience factors of mental health functioning during the coronavirus disease (COVID‐19) pandemic. Methods: A sample of 467 adults (M age = 33.14, 63.6% female) reported on mental health (depression, anxiety, posttraumatic stress disorder [PTSD], and somatic symptoms), demands and impacts of COVID‐19, resources (e.g., social support, health care access), demographics, and psychosocial resilience factors. Results: Depression, anxiety, and PTSD rates were 44%, 36%, and 23%, respectively. Supervised machine learning models identified psychosocial factors as the primary significant predictors across outcomes. Greater trauma coping self‐efficacy and forward‐focused coping, but not trauma‐focused coping, were associated with better mental health. When accounting for psychosocial resilience factors, few external resources and demographic variables emerged as significant predictors. Conclusion: With ongoing stressors and traumas, employing coping strategies that emphasize distraction over trauma processing may be warranted. Clinical and community outreach efforts should target trauma coping self‐efficacy to bolster resilience during a pandemic.
... Additionally, machine learning is an inherently exploratory analytic procedure (Jordan & Mitchell, 2015). Nonetheless, we infused theory into machine learning analyses by selecting predictor variables from prior theory and relevant empirical work (Elhai & Montag, 2020). ...
Article
Objectives: Research during prior virus outbreaks has examined vulnerability factors associated with increased anxiety and fear. Design: We explored numerous psychopathology, sociodemographic, and virus exposure-related variables associated with anxiety and perceived threat of death regarding COVID-19. Method: We recruited 908 adults from Eastern China for a cross-sectional web survey, from 24 February to 15 March 2020, when social distancing was heavily enforced in China. We used several machine learning algorithms to train our statistical model of predictor variables in modeling COVID-19-related anxiety, and perceived threat of death, separately. We trained the model using many simulated replications on a random subset of participants, and subsequently externally tested on the remaining subset of participants. Results: Shrinkage machine learning algorithms performed best, indicating that stress and rumination were the most important variables in modeling COVID-19-related anxiety severity. Health anxiety was the most potent predictor of perceived threat of death from COVID-19. Conclusions: Results are discussed in the context of research on anxiety and fear from prior virus outbreaks, and from theory on outbreak-related emotional vulnerability. Implications regarding COVID-19-related anxiety are also discussed.
... Authors in this special issue review research on virtual reality [31,32], and mobile app interventions to facilitate mental healthcare [33]. Other authors discuss artificial intelligence and machine learning [34,35] and digital phenotyping [36] in order to improve observation of human behavior to better understand emotions and behavior [37]. Also covered is social robotics to implement human socialization interventions [38], and technology for improving learning outcomes among students [39]. ...
Article
Full-text available
BACKGROUND: Shame and stigma often prevent individuals with social anxiety disorder (SAD) from seeking and attending costly and time-intensive psychotherapies, highlighting the importance of brief, low-cost, and scalable treatments. Creating prescriptive outcome prediction models is thus crucial for identifying which clients with SAD might gain the most from a unique scalable treatment option. Nevertheless, widely used classical regression methods might not optimally capture complex nonlinear associations and interactions. Precision medicine approaches were thus harnessed to examine prescriptive predictors of optimization to a 14-day fully self-guided mindfulness ecological momentary intervention (MEMI) over a self-monitoring app (SM). METHOD: The current study involved 191 participants who had probable SAD. Participants were randomly assigned to MEMI (n=96) or SM (n=95). They completed self-reports of symptoms, risk factors, treatment, and socio-demographics at baseline, post-treatment, and one-month follow-up (1MFU). ML models with 17 predictors of optimization to MEMI over SM, defined as a higher probability of SAD remission from MEMI at post-treatment and 1MFU, were evaluated. The Social Phobia Diagnostic Questionnaire (SPDQ), structurally equivalent to the Diagnostic and Statistical Manual (DSM) SAD criteria, was used to define remission. These ML models included random forest and support vector machines (radial basis function kernel) and 10-fold nested cross-validation that separated model training, minimal tuning in inner folds, and model testing in outer folds. RESULTS: ML models outperformed logistic regression. The multivariable ML models using the ten most important predictors achieved good performance, with the area under the receiver operating characteristic curve (AU-ROC) values ranging from .71 to .72 at post-treatment and 1MFU. 
These pre-randomization and early-stage prescriptive predictors consistently identified which participants had the highest probability of optimization of MEMI over SM after 14 days and 6 weeks from baseline. Significant predictors included five strengths (higher trait mindfulness, lower SAD severity, presence of university education, lower SAD severity, no current psychotropic medication use), two weaknesses (higher generalized anxiety severity and clinician-diagnosed depression or anxiety disorder), and one socio-demographic variable (Chinese ethnicity). Emotion dysregulation and current psychotherapy predicted remission with inconsistent signs across time points. CONCLUSION: The AU-ROC values indicated moderately meaningful effect sizes in identifying prescriptive predictors within multivariable models for clients with SAD. Focusing on the identified notable client strengths, weaknesses, and one socio-demographic variable may enhance our ability to predict future responses to scalable treatments. Estimating the likelihood of SAD remission with a ‘prescriptive predictor calculator’ for each client may help clinicians and policymakers allocate scarce treatment resources effectively. Plausibly, clients with high remission probability benefit from receiving the MEMI as a vigilant waitlist strategy before intensive therapist-led psychotherapy. These efforts may aid in creating actionable treatment selection tools to optimize care for clients with SAD in routine healthcare settings that employ stratified care principles.
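The nested cross-validation design described in this abstract (tuning in inner folds, testing in outer folds) can be sketched at the level of index bookkeeping. This is a minimal sketch only: the function names, the seed handling, and the choice of five inner folds are assumptions, since the abstract specifies 10-fold nested cross-validation but not the inner fold count or the software used. Model fitting and tuning are left out.

```python
import random

def k_fold(indices, k, seed):
    """Split indices into k roughly equal folds after a seeded shuffle."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]

def nested_cv_splits(n_samples, outer_k=10, inner_k=5, seed=0):
    """Yield (outer_train, outer_test, inner_splits) for nested cross-validation.

    inner_splits is a list of (inner_train, inner_val) pairs drawn only from
    outer_train, so the outer test fold never influences hyperparameter tuning.
    """
    outer_folds = k_fold(range(n_samples), outer_k, seed)
    for i, outer_test in enumerate(outer_folds):
        outer_train = [idx for j, fold in enumerate(outer_folds)
                       if j != i for idx in fold]
        inner_folds = k_fold(outer_train, inner_k, seed + i + 1)
        inner_splits = []
        for m, inner_val in enumerate(inner_folds):
            inner_train = [idx for q, fold in enumerate(inner_folds)
                           if q != m for idx in fold]
            inner_splits.append((inner_train, inner_val))
        yield outer_train, outer_test, inner_splits

# 191 participants, as in the abstract; indices stand in for participant rows.
splits = list(nested_cv_splits(191))
print(len(splits), len(splits[0][2]))
```

For each outer fold, hyperparameters would be chosen using only `inner_splits`, the model refit on `outer_train`, and performance reported on `outer_test`.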
Article
The last few decades have witnessed a revolution in the field of mental health, brought about by state-of-the-art techniques of artificial intelligence (AI). Here, we review the evidence for the systematic application of AI for the detection and intervention of stress-related mental health problems. We first explore the potential application of AI in stress detection and screening through advanced computational techniques of machine learning algorithms that analyze biomarkers of stress and anxiety. Building on the accurate detection of mental health problems, we further review the evidence for AI-based stress interventions and propose the promising prospect of applying decoded neurofeedback as a personalized resilience-building intervention. Together, the current review assesses the effectiveness and major challenges of AI technologies in real-world applications and demonstrates the transforming impact of AI on the field of mental health.
Chapter
This academic chapter provides a comprehensive analysis of digital media use by children in early childhood, encompassing current trends and historical perspectives on previous research. Conceptual and measuring challenges are explored to enhance the understanding and study of this critical subject. The introduction offers insights into the prevalence and significance of digital media usage among young children in today’s society. Additionally, it provides a contextual background by reviewing historical research in the field. Conceptual challenges address the need to clarify key terms and concepts used in the research. The paper emphasizes the importance of selecting appropriate theoretical frameworks to guide the research process, with a particular focus on the context role in shaping the study’s direction. Furthermore, it discusses the content role of conceptual frameworks in organizing and structuring the research. Measuring challenges center on the strengths and limitations of data collection tools and techniques, specifically examining parent report and apps. The paper critically assesses their effectiveness and potential biases in capturing digital media use data among young children. By addressing these conceptual and measuring challenges, this paper contributes valuable insights to researchers, educators, and policymakers, enabling more informed and rigorous studies on digital media use by children in early childhood.
Article
The present study investigated whether life engagement and happiness can be predicted from gaming motives and primary emotional traits. Two machine learning algorithms (random forest model and one-dimensional convolutional neural network) were applied using a dataset from before the COVID-19 pandemic as the training dataset. The algorithms derived were then applied to test if they would be useful in predicting life engagement and happiness from gaming motives and primary emotional systems on a dataset collected during the pandemic. The best prediction values were observed for happiness, with ρ = 0.758 and an explained variance of R² = 0.575, when applying the best-performing algorithm derived from the pre-COVID dataset to the COVID dataset. Hence, the algorithm derived from the pre-pandemic dataset successfully predicted happiness (and life engagement) from the same set of variables during the pandemic. Overall, this study shows the feasibility of applying machine learning algorithms to predict life engagement and happiness from gaming motives and primary emotional systems.
Article
Political psychologists often examine the influence of psychological dispositions on political attitudes. Central to this field is the ideological asymmetry hypothesis (IAH), which asserts significant psychological differences between conservatives and liberals. According to the IAH, conservatives tend to exhibit greater resistance to change, a stronger inclination to uphold existing social systems, and heightened sensitivity to threats and uncertainty compared with their liberal counterparts. Our review and reanalysis, however, question the empirical strength of the IAH. We expose major concerns regarding the construct validity of the psychological dispositions and political attitudes traditionally measured. Furthermore, our research reveals that the internal validity of these studies is often compromised by endogeneity and selection biases. External and statistical validity issues are also evident, with many findings relying on small effect sizes derived from nonrepresentative student populations. Collectively, these data offer scant support for the IAH, indicating that simply amassing similar data is unlikely to clarify the validity of the hypothesis. We suggest a more intricate causal model that addresses the intricate dynamics between psychological dispositions and political attitudes. This model considers the bidirectional nature of these relationships and the moderating roles of individual and situational variables. In conclusion, we call for developing more sophisticated theories and rigorous research methodologies to enhance our comprehension of the psychological underpinnings of political ideology.
Article
Background and aims The procedures of psychological research methodology (especially inference built on the p value) have drawn considerable criticism in recent decades. Researcher bias and the ease with which methods can be manipulated (for example, data collection, data selection, or statistical tests) have given room to misleading and hard-to-reproduce research. The spread of machine learning can also be observed in psychology, where it provides researchers with a new toolkit. The approach shifts the emphasis from statistical proof to prediction and the associated validation processes, thereby reducing the influence of researcher subjectivity. The aim of the present study is to offer insight into machine learning methodology through practical examples, focusing on its applicability in psychology. Method In the first part of the investigation, two studies on the use of machine learning are presented, concerning human decision-making mechanisms and the mental effects of the pandemic situation. In the second part, a nonparametric statistical method and two machine learning-based procedures are compared within a classification task (the relationship between film preference and gender identity). Results The results demonstrate the advantages offered by machine learning (validation procedures and the extraction of additional information), drawing a parallel with the nonparametric procedure. Conclusions The study is intended to promote and support the applicability of machine learning for psychologists working in research. For the sake of reproducibility, the data and programming code are freely available through the contact details given in the study.
Article
Precision medicine methods (machine learning; ML) can identify which clients with generalized anxiety disorder (GAD) benefit from mindfulness ecological momentary intervention (MEMI) vs. self-monitoring app (SM). We used randomized controlled trial data of MEMI vs. SM for GAD (N = 110) and tested three ML models to predict one-month follow-up reliable improvement in GAD severity, perseverative cognitions (PC), trait mindfulness (TM), and executive function (EF). Eleven baseline predictors were tested regarding differential reliable change from MEMI vs. SM (age, sex, race, EF errors, inhibitory dyscontrol, set-shifting deficits, verbal fluency, working memory, GAD severity, TM, PC). The final top five prescriptive predictor models of all outcomes performed well (AUC = 0.752–0.886). The following variables predicted better outcome from MEMI vs. SM: Higher GAD severity predicted more GAD improvement but less EF improvement. Elevated PC, inhibitory dyscontrol, and verbal dysfluency predicted better improvement in most outcomes. Greater set-shifting and TM predicted stronger improvements in GAD symptoms and TM. Older age predicted more alleviation of GAD and PC symptoms. Women exhibited more enhancements in trait mindfulness and EF than men. Caucasians benefitted more than non-Caucasians. PC, TM, EF, and sociodemographic data could help predictive models optimize intervention selection for GAD.
Article
This study aims to create a decision tree model using machine learning to predict psychological readiness for entrepreneurship in college graduates. The research was conducted in several stages. In the early stages, a survey was conducted on 700 students aged 17-25 years from several universities in Riau, using the Entrepreneur Psychology Readiness (EPR) instrument. The survey data were then validated, yielding 604 valid records for building the machine learning models. The urgency of this research lies in finding decision rules from the best decision tree model for use in building AI-based counseling applications that measure entrepreneurial psychological readiness in college graduates. Two decision tree models were built: a pruned decision tree and an unpruned decision tree. The pruned model produces 180 decision rules, while the unpruned model produces 121 decision rules. The pruned decision tree achieves good accuracy: above 99% in the use-training-set mode and 82.87% in the percentage-split mode. The unpruned decision tree achieves 90.18% in the use-training-set mode and 80.38% in the percentage-split mode. The pruned decision tree model thus performs better than the unpruned model.
Article
Introduction Although outpatient psychodynamic psychotherapy is effective, there has been no improvement in treatment success in recent years. One way to improve psychodynamic treatment could be the use of machine learning to design treatments tailored to the individual patient's needs. In the context of psychotherapy, machine learning refers mainly to various statistical methods, which aim to predict outcomes (e.g., drop-out) of future patients as accurately as possible. We therefore searched various literature for all studies using machine learning in outpatient psychodynamic psychotherapy research to identify current trends and objectives. Methods For this systematic review, we applied the Preferred Reporting Items for systematic Reviews and Meta-Analyses Guidelines. Results In total, we found four studies that used machine learning in outpatient psychodynamic psychotherapy research. Three of these studies were published between 2019 and 2021. Discussion We conclude that machine learning has only recently made its way into outpatient psychodynamic psychotherapy research and researchers might not yet be aware of its possible uses. Therefore, we have listed a variety of perspectives on how machine learning could be used to increase treatment success of psychodynamic psychotherapies. In doing so, we hope to give new impetus to outpatient psychodynamic psychotherapy research on how to use machine learning to address previously unsolved problems.
Article
Data retrieval systems supporting the discovery and reuse of open data are emerging as important tools in the open data ecosystem. However, user satisfaction with them is relatively low. This study proposes the primacy-peak-recency effect to investigate the cognitive mechanisms underlying data searchers’ overall satisfaction. To test the primacy-peak-recency effect, primacy-peak-recency cubes consisting of eye movement indicators at primacy, peak, and recency moments and their combinations are constructed as the theoretical model. A user experiment was conducted to collect eye movement data and satisfaction scores generated during 48 doctoral students’ interactions with data retrieval systems. An ensemble machine learning framework was then applied to analyze eye movement data to assess the theoretical model. The results indicate that the primacy-peak-recency cubes are salient predictors of data searchers’ satisfaction (prediction accuracy = 0.682 and regression R² = 0.749). This finding suggests that data searchers’ complex cognitive processes at primacy, peak, and recency moments measured by uni-, bi-, and three-dimensional eye movement indicators are predictors of overall satisfaction, confirming the primacy-peak-recency effect. In addition, combinations of varying types of influential moments and multidimensional eye movement events are the best predictors of overall satisfaction. This suggests that influential moments and cognitive processes have additive effects on overall satisfaction. Combining theory-driven and data-driven approaches, this study sheds light on the potential of machine learning approaches for analyzing neuropsychological data for heuristics examination. With these insights, practical strategies to predict data searchers’ satisfaction and optimize the user-experience design of data retrieval systems are proposed.
Article
This article introduces the research community to the power of machine learning over traditional approaches when analyzing longitudinal data. Although traditional approaches work well with small to medium datasets, machine learning models are more appropriate as the available data becomes larger and more complex. Additionally, machine learning methods are ideal for analyzing longitudinal data because they do not make any assumptions about the distribution of the dependent and independent variables or the homogeneity of the underlying population. They can also analyze cases with partial information. In this article, we use the Household, Income, and Labour Dynamics in Australia (HILDA) survey to illustrate the benefits of machine learning. Using a machine learning algorithm, we analyze the relationship between job‐related variables and neuroticism across 13 years of the HILDA survey. We suggest that the results produced by machine learning can be used to generate generalizable rules from the data to augment our theoretical understanding of the domain. With a technical guide, this article offers critical information and best‐practice recommendations that can assist social science researchers in conducting machine learning analysis with longitudinal data.
Chapter
This chapter provides a short overview of the many topics falling under the umbrella terms digital phenotyping and mobile sensing. The key terms digital phenotyping and mobile sensing are also briefly introduced. Chapter 1 is meant as a starting point for gaining insight into the many areas of research covered in the second edition of this book.
Chapter
The ubiquitous presence of sensors (e.g., in smartphones) in our everyday life allows a constant real-time collection of data. This data has been successfully used in diagnosis and prediction of health outcomes and has the potential to improve health care. However, with data security and accountability as core requirements of medical applications, it remains a major challenge to integrate smart sensing information into the health care systems. One promising application is the integration into expert systems, in which smart sensing information is used to assist medical experts in their decisions. The present chapter aims to introduce expert systems, outline conceptual examples of such a smart sensing enhanced expert system, and summarize the evidence for smart sensing enhanced expert systems in health care. Lastly, the chapter will be concluded by discussing challenges in the field including ethical, privacy and security, and clinical issues followed by an outlook about future directions and developments.
Article
Research in the field of digital phenotyping and mobile sensing has seen a tremendous rise in interest over the last few years. The psychological and psychiatric sciences were early adopters of implementing these promising techniques into their research to better understand the human mind. The most often studied data to predict mental states and traits at the moment represent reaction‐time and app usage data from multi‐step human‐smartphone interactions and digital footprints left from the user's interactions with social media platforms. Interestingly, research that links reaction time measurements and other digital footprints to underlying neurobiology data from magnetic resonance imaging, electroencephalography, or molecular genetics has thus far been mostly lacking. As a starting point for discussion among neuroscientists, in this article, we review the scant literature applying digital phenotyping/mobile sensing to neuroscientific research and outline the potential of this new research approach. With the ubiquity of smartphones, many of these reviewed works focus on smartphone‐based‐studies in the neuroscientific digital phenotyping/mobile sensing field.
Article
Purpose of Review The present paper provides an accessible overview of the potential of digital phenotyping and mobile sensing not only to shed light on the nature of Internet Use Disorders (IUD), but also to provide new ideas on how to improve psycho-diagnostics of mental processes linked to IUD. Recent Findings In detail, this work focuses on the psycho-diagnostic areas of prevention, treatment, and aftercare in the realm of IUDs. Before each of these areas is presented in more detail, the terms digital phenotyping and mobile sensing are introduced against the background of an interdisciplinary research endeavor called Psychoinformatics. Obstacles to overcome in this emerging research endeavor (sensing psychological traits/states from digital footprints) are discussed together with the risks and opportunities that arise from the administration of online-tracking technologies in the field of IUDs. Summary Given the limited validity and reliability of traditional assessment via questionnaires or diagnostic interviews with respect to recall bias and tendencies to answer towards social desirability, digital phenotyping and mobile sensing offer a novel approach overcoming recall bias and other limitations of usual assessment approaches. This will not only set new standards in precisely mapping behavior, but it will also offer scientists and practitioners opportunities to detect risky Internet use patterns in a timely manner and to establish tailored feedback as a means of intervention.
Article
Background: Existing research using machine learning to investigate alcohol use among adolescents has largely neglected peer influences and tended to rely on models which selected predictors based on data availability, rather than being guided by a unifying theoretical framework. In addition, previous models of peer influence were typically estimated by using traditional regression techniques, which are known to have worse fit compared to the models estimated using machine learning methods. Methods: Addressing these limitations, we use three machine-learning algorithms to fit a theoretical model of social interactions in alcohol consumption. The model is fit to a large, nationally representative sample of U.S. school-aged adolescents and accounts for various channels of peer influence. Results: We find that extreme gradient boosting is the best performing algorithm in predicting alcohol consumption. After the algorithm ranks the explanatory variables by their importance in classification, previous-year drinking status, misperception about friends’ drinking, and average actual drinking among friends are the most important predictors of adolescent drinking. Conclusions: Our findings suggest that an effective intervention should focus on school peers and adolescents’ perceptions about drinking norms, in addition to the history of alcohol use. Our study may also increase interest in theory-driven selection of covariates for machine-learning models.
Article
In this study, problematic social media use (PSU) was modeled using machine learning with artificial neural networks (ANN) and support vector machines (SVM). Fifteen predictor variables were examined in predicting PSU, including social media usage habits (frequency of daily social media use, history of social media usage, frequency of checking social media accounts, number of shares on social media, and number of social media accounts), desire for being liked, envy of the life of others, narcissistic personality traits (exhibitionism, grandiose fantasies, manipulativeness, thrill-seeking, narcissistic admiration and narcissistic rivalry), fear of missing out (FOMO), and online socialization. The present study comprised 309 (208 females and 101 males) university students. Using ANN and SVM, estimation was performed using k-folds (k = 5) cross validation. Results demonstrated a large relationship between predictors and PSU scores. Estimation rates with ANN and SVM were each .61. Then we used forward selection procedures to determine variable importance. We found that frequency of daily social media use, frequency of checking social media accounts, desire for being liked, exhibitionism and FOMO were the five most important variables in association with PSU severity. Finally, we analyzed the extent to which these five variables predicted PSU, finding that the estimate with five variables had a higher coefficient of estimation than with the fifteen variables. Prediction rates for the five variables were .62 using ANN and .63 using SVM. Results demonstrate that several psychological and social media-related variables were important in modeling PSU severity.
Article
This article discusses the fear of missing out (FOMO) on rewarding experiences, an important psychological construct in contemporary times. We present an overview of the FOMO construct and its operational definition and measurement. Then, we review recent empirical research on FOMO’s relationship with levels of online social engagement, problematic technology and internet communication use, negative affectivity, and sociodemographic variables. Additionally, we discuss theoretical conceptualizations regarding possible causes of FOMO and how FOMO may drive problematic internet technology use. Finally, we discuss future directions for the empirical study of FOMO.
Article
Player protection and harm minimization have become increasingly important in the gambling industry along with the promotion of responsible gambling (RG). Among the most widespread RG tools that gaming operators provide are limit-setting tools that help players limit the amount of time and/or money they spend gambling. Research suggests that limit-setting significantly reduces the amount of money that players spend. If limit-setting is to be encouraged as a way of facilitating responsible gambling, it is important to know what variables are important in getting individuals to set and change limits in the first place. In the present study, 33 variables assessing the player behavior among Norsk Tipping clientele (N = 70,789) from January to March 2017 were computed. The 33 variables which reflect the players’ behavior were then used to predict the likelihood of gamblers changing their monetary limit between April and June 2017. The 70,789 players were randomly split into a training dataset of 56,532 and an evaluation set of 14,157 players (corresponding to an 80/20 split). The results demonstrated that it is possible to predict future limit-setting based on player behavior. The random forest algorithm appeared to predict limit-changing behavior much better than the other algorithms. However, on the independent test data, the random forest algorithm’s accuracy dropped significantly. The best performance on the test data along with a small decrease in accuracy in comparison to the training data was delivered by the gradient boost machine learning algorithm. The most important variables predicting future limit-setting using the gradient boost machine algorithm were players receiving feedback that they had reached 80% of their personal monthly global loss limit, personal monthly loss limit, the amount bet, theoretical loss, and whether the players had increased their limits in the past. 
With the help of predictive analytics, players with a high likelihood of changing their limits can be proactively approached.
Article
In recent years, the trustworthiness of psychological science has been questioned. A major concern is that many research findings are less robust than the published evidence suggests. Several reasons may contribute to this state of affairs. Two prominently discussed reasons are that (a) researchers use questionable research practices (so called p-hacking) when they analyze the data of their empirical studies, and (b) studies that revealed results consistent with expectations are more likely published than studies that “failed” (publication bias). The present large-scale simulation study estimates the extent to which meta-analytic effect sizes are biased by different degrees of p-hacking and publication bias, considering several factors of influence that may impact on this bias (e.g., the true effect of the phenomenon of interest). Results show that both p-hacking and publication bias contribute to a potentially severely biased impression of the overall evidence. This is especially the case when the true effect that is investigated is very small or does not exist at all. Severe publication bias alone can exert considerable bias; p-hacking exerts considerable bias only when there is also publication bias. However, p-hacking can severely increase the rate of false positives, that is, findings that suggest that a study found a real effect when, in reality, no effect exists. A key implication of the present study is that, in addition to preventing p-hacking, policies in research institutions, funding agencies, and scientific journals need to make the prevention of publication bias a top priority to ensure a trustworthy base of evidence.
Article
We examined the extent to which the Big Five domains, 30 facets, and nuances (uniquely represented by individual questionnaire items) capture age differences in personality, expecting domains to contain the least and nuances the most age-related information. We used an Internet sample (N = 24,000), evenly distributed between ages of 18 and 50 years and tested with a 300-item questionnaire. Separately based on domains, facets, and items, we trained models to predict age in one part of the sample and tested their predictive accuracy in another part. Big Five domains predicted age with an accuracy of r = .28, whereas facets' (r = .44) and items' (r = .65) predictions were more accurate. Less than 15% of the sample was needed to train models to their optimal accuracy. Residualizing the 300 items for all facets had no impact on their predictive accuracy, suggesting that age differences in specific behaviors, thoughts, and feelings (i.e., items) were not because of domains and facets but mostly unique to nuances. These findings replicated in a multisample dataset tested with another questionnaire. We found little evidence that age differences only appeared nuanced because items referred to age-graded roles or experiences. Therefore, a substantial part of personality development may be uniquely ascribed to narrow personality characteristics, suggesting the possibility for a many-dimensional representation of personality development. Besides theoretical implications, we provide concrete illustrations of how this can open new research avenues by enabling to study systematic variations between traits.
Article
Objective: Depression is a highly common mental disorder and a major cause of disability worldwide. Several psychological interventions are available, but there is a lack of evidence to decide which treatment works best for whom. This study aimed to identify subgroups of patients who respond differentially to cognitive behavioural therapy (CBT) or person-centred counselling for depression (CfD). Methods: This was a retrospective analysis of archival routine practice data for 1435 patients who received either CBT (N=1104) or CfD (N=331) in primary care. The main outcome was post-treatment reliable and clinically significant improvement (RCSI) in the PHQ-9 depression measure. A targeted prescription algorithm was developed in a training sample (N=1085) using a supervised machine learning approach (elastic net with optimal scaling). The clinical utility of the algorithm was examined in a statistically independent test sample (N=350) using chi-square analysis and odds ratios. Results: Cases in the test sample that received their model-indicated “optimal” treatment had a significantly higher RCSI rate (62.5%) compared to those who received the “suboptimal” treatment (41.7%); χ²(df = 1) = 4.79, p = .03, OR = 2.33 (95% CI = 1.09, 5.02). Conclusions: Targeted prescription has the potential to make best use of currently available evidence-based treatments, improving outcomes for patients at no additional cost to psychological services.
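The odds ratio reported in this abstract can be checked against the two RCSI rates, and the chi-square statistic for a 2×2 table follows a standard closed form. The sketch below is illustrative only: the function names are hypothetical, the chi-square uses the Pearson formula without continuity correction (the abstract does not say which variant was used), and the cell counts in the second example are invented because the abstract does not report them.

```python
def odds_ratio_from_rates(rate_a, rate_b):
    """Odds ratio comparing two success proportions (each strictly between 0 and 1)."""
    odds_a = rate_a / (1 - rate_a)
    odds_b = rate_b / (1 - rate_b)
    return odds_a / odds_b

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (df = 1, no continuity correction) for the table
        [[a, b],
         [c, d]].
    """
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

if __name__ == "__main__":
    # RCSI rates from the abstract: 62.5% (optimal) vs. 41.7% (suboptimal).
    print(round(odds_ratio_from_rates(0.625, 0.417), 2))  # → 2.33

    # Hypothetical cell counts, for illustration only.
    print(chi_square_2x2(10, 20, 30, 40))
```

The first call reproduces the reported OR of 2.33 from the two percentages alone; reproducing the reported chi-square value would require the actual cell counts.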
Article
This paper analyzes current practices in psychology in the use of research methods and data analysis procedures (DAP) and aims to determine whether researchers are now using more sophisticated and advanced DAP than were employed previously. We reviewed empirical research published recently in prominent journals from the USA and Europe corresponding to the main psychological categories of Journal Citation Reports and examined research methods, number of studies, number and type of DAP, and statistical package. The 288 papers reviewed used 663 different DAP. Experimental and correlational studies were the most prevalent, depending on the specific field of psychology. Two-thirds of the papers reported a single study, although those in journals with an experimental focus typically described more. The papers mainly used parametric tests for comparison and statistical techniques for analyzing relationships among variables. Regarding the former, the most frequently used procedure was ANOVA, with mixed factorial ANOVA being the most prevalent. A decline in the use of non-parametric analysis was observed in relation to previous research. Relationships among variables were most commonly examined using regression models, with hierarchical regression and mediation analysis being the most prevalent procedures. There was also a decline in the use of stepwise regression and an increase in the use of structural equation modeling, confirmatory factor analysis, and hierarchical linear modeling. Overall, the results show that recent empirical studies published in journals belonging to the main areas of psychology are employing more varied and advanced statistical techniques of greater computational complexity.
Article
Background: Early identification of probable post-traumatic stress disorder (PTSD) can lead to early intervention and treatment. Aims: This study aimed to evaluate supervised machine learning (ML) classifiers for the identification of probable PTSD in those who are serving, or have recently served in the United Kingdom (UK) Armed Forces. Methods: Supervised ML classification techniques were applied to a military cohort of 13,690 serving and ex-serving UK Armed Forces personnel to identify probable PTSD based on self-reported service exposures and a range of validated self-report measures. Data were collected between 2004 and 2009. Results: The predictive performance of supervised ML classifiers to detect cases of probable PTSD was encouraging when compared to a validated measure, demonstrating a capability of supervised ML to detect the cases of probable PTSD. It was possible to identify which variables contributed to the performance, including alcohol misuse, gender and deployment status. A satisfactory specificity was obtained across a range of supervised ML classifiers, but sensitivity was low, indicating a potential for false negative diagnoses. Conclusions: Detection of probable PTSD based on self-reported measurement data is feasible, may greatly reduce the burden on public health and improve operational efficiencies by enabling early intervention, before manifestation of symptoms.
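The link this abstract draws between low sensitivity and false negatives follows directly from how the two rates are defined on a confusion matrix. The sketch below uses invented counts (the abstract reports none) and a hypothetical function name.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    sensitivity = TP / (TP + FN): share of true cases the classifier detects.
    specificity = TN / (TN + FP): share of non-cases correctly ruled out.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity

if __name__ == "__main__":
    # Hypothetical counts: 100 true cases, 1000 non-cases.
    sens, spec = sensitivity_specificity(tp=30, fn=70, tn=950, fp=50)
    print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

With these made-up counts the classifier misses 70 of 100 true cases, so sensitivity is low even though specificity is high; this is the pattern that produces false negative diagnoses.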
In recent years many accurate decision support systems have been constructed as black boxes, that is, as systems that hide their internal logic from the user. This lack of explanation constitutes both a practical and an ethical issue. The literature reports many approaches aimed at overcoming this crucial weakness, sometimes at the cost of sacrificing accuracy for interpretability. The applications in which black box decision systems can be used are various, and each approach is typically developed to provide a solution for a specific problem and, as a consequence, delineates explicitly or implicitly its own definition of interpretability and explanation. The aim of this paper is to provide a classification of the main problems addressed in the literature with respect to the notion of explanation and the type of black box system. Given a problem definition, a black box type, and a desired explanation, this survey should help researchers find the proposals most useful for their own work. The proposed classification of approaches to open black box models should also be useful for putting the many open research questions in perspective.
We show that faces contain much more information about sexual orientation than can be perceived or interpreted by the human brain. We used deep neural networks to extract features from 35,326 facial images. These features were entered into a logistic regression aimed at classifying sexual orientation. Given a single facial image, a classifier could correctly distinguish between gay and heterosexual men in 81% of cases, and in 71% of cases for women. Human judges achieved much lower accuracy: 61% for men and 54% for women. The accuracy of the algorithm increased to 91% and 83%, respectively, given five facial images per person. Facial features employed by the classifier included both fixed (e.g., nose shape) and transient facial features (e.g., grooming style). Consistent with the prenatal hormone theory of sexual orientation, gay men and women tended to have gender-atypical facial morphology, expression, and grooming styles. Prediction models aimed at gender alone allowed for detecting gay males with 57% accuracy and gay females with 58% accuracy. Those findings advance our understanding of the origins of sexual orientation and the limits of human perception. Additionally, given that companies and governments are increasingly using computer vision algorithms to detect people’s intimate traits, our findings expose a threat to the privacy and safety of gay men and women.
Background: Available therapies for Alzheimer’s disease (AD) can only alleviate and delay the advance of symptoms, with the greatest impact eventually achieved when provided at an early stage. Thus, early identification of which subjects at high risk, e.g., with MCI, will later develop AD is of key importance. Currently available machine learning algorithms achieve only limited predictive accuracy or they are based on expensive and hard-to-collect information. Objective: The current study aims to develop an algorithm for a 3-year prediction of conversion to AD in MCI and PreMCI subjects based only on non-invasively and effectively collectable predictors. Methods: A dataset of 123 MCI/PreMCI subjects was used to train different machine learning techniques. Baseline information regarding sociodemographic characteristics, clinical and neuropsychological test scores, cardiovascular risk indexes, and a visual rating scale for brain atrophy was used to extract 36 predictors. Leave-pair-out cross-validation was employed as the validation strategy and a recursive feature elimination procedure was applied to identify a relevant subset of predictors. Results: 16 predictors were selected from all domains excluding sociodemographic information. The best model was a support vector machine with radial-basis function kernel (whole sample: AUC = 0.962, best balanced accuracy = 0.913; MCI sub-group alone: AUC = 0.914, best balanced accuracy = 0.874). Conclusions: Our algorithm shows very high cross-validated performances that outperform the vast majority of the currently available algorithms, and all those which use only non-invasive and effectively assessable predictors. Further testing and optimization in independent samples will warrant its application in both clinical practice and clinical trials.
Psychology advances knowledge by testing statistical hypotheses using empirical observations and data. The expectation is that most statistically significant findings can be replicated in new data and in new laboratories, but in practice many findings have replicated less often than expected, leading to claims of a replication crisis. We review recent methodological literature on questionable research practices, meta-analysis, and power analysis to explain the apparently high rates of failure to replicate. Psychologists can improve research practices to advance knowledge in ways that improve replicability. We recommend that researchers adopt open science conventions of preregistration and full disclosure and that replication efforts be based on multiple studies rather than on a single replication attempt. We call for more sophisticated power analyses, careful consideration of the various influences on effect sizes, and more complete disclosure of nonsignificant as well as statistically significant findings.
Mediated moderation (meMO) occurs when the moderation effect of the moderator (W) on the relationship between the independent variable (X) and the dependent variable (Y) is transmitted through a mediator (M). To examine this process empirically, 2 different model specifications (Type I meMO and Type II meMO) have been proposed in the literature. However, both specifications are found to be problematic, either conceptually or statistically. For example, it can be shown that each type of meMO model is statistically equivalent to a particular form of moderated mediation (moME), another process that examines the condition when the indirect effect from X to Y through M varies as a function of W. Consequently, it is difficult for one to differentiate these 2 processes mathematically. This study therefore has 2 objectives. First, we attempt to differentiate moME and meMO by proposing an alternative specification for meMO. Conceptually, this alternative specification is intuitively meaningful and interpretable, and, statistically, it offers meMO a unique representation that is no longer identical to its moME counterpart. Second, using structural equation modeling, we propose an integrated approach for the analysis of meMO as well as for other general types of conditional path models. VS, a computer software program that implements the proposed approach, has been developed to facilitate the analysis of conditional path models for applied researchers. Real examples are considered to illustrate how the proposed approach works in practice and to compare its performance against the traditional methods.
Mediation analysis has become one of the most popular statistical methods in the social sciences. However, many currently available effect size measures for mediation have limitations that restrict their use to specific mediation models. In this article, we develop a measure of effect size that addresses these limitations. We show how modification of a currently existing effect size measure results in a novel effect size measure with many desirable properties. We also derive an expression for the bias of the sample estimator for the proposed effect size measure and propose an adjusted version of the estimator. We present a Monte Carlo simulation study conducted to examine the finite sampling properties of the adjusted and unadjusted estimators, which shows that the adjusted estimator is effective at recovering the true value it estimates. Finally, we demonstrate the use of the effect size measure with an empirical example. We provide freely available software so that researchers can immediately implement the methods we discuss. Our developments here extend the existing literature on effect sizes and mediation by developing a potentially useful method of communicating the magnitude of mediation.
Cross-Validation (CV), and out-of-sample performance-estimation protocols in general, are often employed both for (a) selecting the optimal combination of algorithms and values of hyper-parameters (called a configuration) for producing the final predictive model, and (b) estimating the predictive performance of the final model. However, the cross-validated performance of the best configuration is optimistically biased. We present an efficient bootstrap method that corrects for the bias, called Bootstrap Bias Corrected CV (BBC-CV). BBC-CV's main idea is to bootstrap the whole process of selecting the best-performing configuration on the out-of-sample predictions of each configuration, without additional training of models. In comparison to the alternatives, namely the nested cross-validation and a method by Tibshirani and Tibshirani, BBC-CV is computationally more efficient, has smaller variance and bias, and is applicable to any metric of performance (accuracy, AUC, concordance index, mean squared error). Subsequently, we employ again the idea of bootstrapping the out-of-sample predictions to speed up the CV process. Specifically, using a bootstrap-based hypothesis test we stop training of models on new folds of statistically-significantly inferior configurations. We name the method Bootstrap Corrected with Early Dropping CV (BCED-CV) that is both efficient and provides accurate performance estimates.
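The bootstrap idea described in this abstract can be illustrated with a minimal sketch. It assumes the pooled out-of-sample predictions have already been reduced to a 0/1 "correct" matrix (rows are samples, columns are configurations) and that accuracy is the metric; the function name `bbc_cv` and the simulated data are hypothetical and are not taken from the paper. Each bootstrap round selects the best configuration on the resampled rows and scores that configuration on the rows left out of the resample, with no retraining.

```python
import numpy as np

def bbc_cv(correct, n_boot=500, seed=0):
    """Bootstrap bias corrected estimate from pooled out-of-sample results.

    correct: (n_samples, n_configs) array of 0/1 flags, where entry [i, c]
             says whether configuration c predicted sample i correctly
             when sample i was held out during cross-validation.
    """
    rng = np.random.default_rng(seed)
    n = correct.shape[0]
    scores = []
    for _ in range(n_boot):
        in_bag = rng.integers(0, n, size=n)   # rows drawn with replacement
        out_mask = np.ones(n, dtype=bool)
        out_mask[in_bag] = False              # rows never drawn in this round
        if not out_mask.any():
            continue                          # no left-out rows to score on
        # Select the winning configuration on the in-bag rows only.
        best = np.argmax(correct[in_bag].mean(axis=0))
        # Score the winner on the rows it was not selected on.
        scores.append(correct[out_mask, best].mean())
    return float(np.mean(scores))

# Simulated illustration: 50 configurations that are all pure guessing.
rng = np.random.default_rng(1)
correct = rng.integers(0, 2, size=(200, 50))
naive = float(correct.mean(axis=0).max())  # accuracy of the apparent winner
bbc = bbc_cv(correct)
print(f"naive best accuracy: {naive:.3f}, bias-corrected: {bbc:.3f}")
```

Because every simulated configuration is guessing, the naive accuracy of the apparent winner sits above chance purely through selection, and the corrected estimate is expected to fall back toward 0.5.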
Within the last two decades, many studies have addressed the clinical phenomenon of Internet-use disorders, with a particular focus on Internet-gaming disorder. Based on previous theoretical considerations and empirical findings, we suggest an Interaction of Person-Affect-Cognition-Execution (I-PACE) model of specific Internet-use disorders. The I-PACE model is a theoretical framework for the processes underlying the development and maintenance of an addictive use of certain Internet applications or sites promoting gaming, gambling, pornography viewing, shopping, or communication. The model is composed as a process model. Specific Internet-use disorders are considered to be the consequence of interactions between predisposing factors, such as neurobiological and psychological constitutions, moderators, such as coping styles and Internet-related cognitive biases, and mediators, such as affective and cognitive responses to situational triggers in combination with reduced executive functioning. Conditioning processes may strengthen these associations within an addiction process. Although the hypotheses regarding the mechanisms underlying the development and maintenance of specific Internet-use disorders, summarized in the I-PACE model, must be further tested empirically, implications for treatment interventions are suggested.
Pattern recognition has been employed in a myriad of industrial, commercial and academic applications. Many techniques have been devised to tackle such a diversity of applications. Despite the long tradition of pattern recognition research, there is no technique that yields the best classification in all scenarios. Therefore, as many techniques as possible should be considered in high accuracy applications. Typical related works either focus on the performance of a given algorithm or compare various classification methods. On many occasions, however, researchers who are not experts in the field of machine learning have to deal with practical classification tasks without in-depth knowledge of the underlying parameters. The adequate choice of classifiers and parameters in such practical circumstances constitutes a long-standing problem and is one of the subjects of the current paper. We carried out a performance study of nine well-known classifiers implemented in the Weka framework and compared the influence of the parameter configurations on the accuracy. The default configuration of parameters in Weka was found to provide near optimal performance for most cases, not including methods such as the support vector machine (SVM). In addition, the k-nearest neighbor method frequently achieved the best accuracy. In certain conditions, it was possible to improve the quality of SVM by more than 20% with respect to its default parameter configuration.
The veracity of substantive research claims hinges on the way experimental data are collected and analyzed. In this article, we discuss an uncomfortable fact that threatens the core of psychology’s academic enterprise: almost without exception, psychologists do not commit themselves to a method of data analysis before they see the actual data. It then becomes tempting to fine tune the analysis to the data in order to obtain a desired result—a procedure that invalidates the interpretation of the common statistical tests. The extent of the fine tuning varies widely across experiments and experimenters but is almost impossible for reviewers and readers to gauge. To remedy the situation, we propose that researchers preregister their studies and indicate in advance the analyses they intend to conduct. Only these analyses deserve the label “confirmatory,” and only for these analyses are the common statistical tests valid. Other analyses can be carried out but these should be labeled “exploratory.” We illustrate our proposal with a confirmatory replication attempt of a study on extrasensory perception.
With the advent of digital approaches to mental health, modern artificial intelligence (AI), and machine learning in particular, is being used in the development of prediction, detection and treatment solutions for mental health care. In terms of treatment, AI is being incorporated into digital interventions, particularly web and smartphone apps, to enhance user experience and optimise personalised mental health care. In terms of prediction and detection, modern streams of abundant data mean that data-driven AI methods can be employed to develop prediction/detection models for mental health conditions. In particular, an individual's 'digital exhaust', the data gathered from their numerous personal digital device and social media interactions, can be mined for behavioural or mental health insights. Language, long considered a window into the human mind, can now be quantitatively harnessed as data with powerful computer-based natural language processing to also provide a method of inferring mental health. Furthermore, natural language processing can also be used to develop conversational agents used for therapeutic intervention.
Machine learning (i.e., data mining, artificial intelligence, big data) has been increasingly applied in psychological science. Although some areas of research have benefited tremendously from a new set of statistical tools, most often in the use of biological or genetic variables, the hype has not been substantiated in more traditional areas of research. We argue that this phenomenon results from measurement errors that prevent machine-learning algorithms from accurately modeling nonlinear relationships, if indeed they exist. This shortcoming is showcased across a set of simulated examples, demonstrating that model selection between a machine-learning algorithm and regression depends on the measurement quality, regardless of sample size. We conclude with a set of recommendations and a discussion of ways to better integrate machine learning with statistics as traditionally practiced in psychological science.
Multiple comparison adjustments have a long history, yet confusion remains about which procedures control type 1 error rate in a strong sense and how to show this. Part of the confusion stems from a powerful technique called the closed testing principle, whose statement is deceptively simple, but is sometimes misinterpreted. This primer presents a straightforward way to think about multiplicity adjustment.
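To make the closed testing principle concrete, here is a minimal sketch. It assumes a Bonferroni local test for every intersection hypothesis, which is an illustrative choice and not necessarily the one used in the primer; the function names are hypothetical. An elementary hypothesis is rejected only if every intersection hypothesis containing it is rejected by its local test. With Bonferroni local tests this is known to coincide with Holm's step-down procedure, which the second function implements for comparison.

```python
from itertools import combinations

def closed_testing_bonferroni(p_values, alpha=0.05):
    """Reject H_i only if every intersection hypothesis containing i is
    rejected by a Bonferroni local test (min p <= alpha / subset size)."""
    m = len(p_values)
    rejected = []
    for i in range(m):
        others = [j for j in range(m) if j != i]
        reject_i = True
        for k in range(m):                        # k other hypotheses joined with i
            for extra in combinations(others, k):
                subset = (i,) + extra
                if min(p_values[j] for j in subset) > alpha / len(subset):
                    reject_i = False              # one retained intersection blocks H_i
                    break
            if not reject_i:
                break
        rejected.append(reject_i)
    return rejected

def holm(p_values, alpha=0.05):
    """Holm's step-down procedure, the usual shortcut for the test above."""
    m = len(p_values)
    order = sorted(range(m), key=lambda j: p_values[j])
    rejected = [False] * m
    for rank, j in enumerate(order):
        if p_values[j] > alpha / (m - rank):
            break                                 # stop at the first non-rejection
        rejected[j] = True
    return rejected

p = [0.01, 0.04, 0.03]
print(closed_testing_bonferroni(p))  # [True, False, False]
print(holm(p))                       # [True, False, False]
```

In this example the second and third hypotheses each have p below 0.05 on their own, but their joint intersection is retained (0.03 > 0.05 / 2), so neither is rejected. The explicit version enumerates every subset and is only practical for a handful of hypotheses.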
Frequently, researchers in psychology are faced with the challenge of narrowing down a large set of predictors to a smaller subset. There are a variety of ways to do this, but commonly it is done by choosing predictors with the strongest bivariate correlations with the outcome. However, when predictors are correlated, bivariate relationships may not translate into multivariate relationships. Further, any attempts to control for multiple testing are likely to result in extremely low power. Here we introduce a Bayesian variable-selection procedure frequently used in other disciplines, stochastic search variable selection (SSVS). We apply this technique to choosing the best set of predictors of the perceived unpleasantness of an experimental pain stimulus from among a large group of sociocultural, psychological, and neurobiological (functional MRI) individual-difference measures. Using SSVS provides information about which variables predict the outcome, controlling for uncertainty in the other variables of the model. This approach yields new, useful information to guide the choice of relevant predictors. We have provided Web-based open-source software for performing SSVS and visualizing the results.
We examined a model of psychopathology variables, age and sex as correlates of problematic smartphone use (PSU) severity using supervised machine learning in a sample of Chinese undergraduate students. A sample of 1097 participants completed measures querying demographics, and psychological measures of PSU, depression and anxiety symptoms, fear of missing out (FOMO), and rumination. We used several different machine learning algorithms to train our statistical model of age, sex and the psychological variables in modeling PSU severity, trained using many simulated replications on a random subset of participants, and externally tested on the remaining subset of participants. Shrinkage algorithms (lasso, ridge, and elastic net regression) performed slightly but statistically significantly better than other algorithms. Results from the training subset generalized to the test subset, without substantial worsening of fit using traditional fit indices. FOMO had the largest relative contribution in modeling PSU severity when adjusting for other covariates in the model. Results emphasize the significance of FOMO to the construct of PSU.
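The train-then-test workflow with a shrinkage estimator can be sketched as follows. This is a minimal illustration on simulated data, not the study's data or code: it covers only ridge regression (in closed form, with NumPy), and the variable names, the 70/30 split, and the penalty value are all assumptions made for the example.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression; the intercept is handled by centering
    and is therefore not penalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    return beta, y_mean - x_mean @ beta

def r_squared(X, y, beta, intercept):
    """Proportion of variance in y explained by the fitted model."""
    resid = y - (X @ beta + intercept)
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)

# Simulated predictors and outcome (hypothetical stand-ins for survey scores).
rng = np.random.default_rng(0)
n, p = 400, 6
X = rng.normal(size=(n, p))
true_beta = np.array([0.8, 0.5, 0.0, 0.0, 0.3, 0.0])
y = X @ true_beta + rng.normal(size=n)

# Random split: fit on the training subset, evaluate on the held-out subset.
idx = rng.permutation(n)
train, test = idx[:280], idx[280:]
beta, b0 = ridge_fit(X[train], y[train], lam=1.0)
r2_train = r_squared(X[train], y[train], beta, b0)
r2_test = r_squared(X[test], y[test], beta, b0)
print(f"R^2 train: {r2_train:.3f}, R^2 test: {r2_test:.3f}")
```

Comparing the two fit values is the point of the exercise: a test-set fit close to the training-set fit suggests the model generalizes, while a large drop suggests overfitting. In practice the penalty would be tuned by cross-validation inside the training subset only.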
Correct detection of measurement bias could help researchers revise models or refine psychological scales. Measurement bias detection can be viewed as a variable-selection problem, in which biased items are optimally selected from a set of items. This study investigated a number of regularization methods: ridge, lasso, elastic net (enet) and adaptive lasso (alasso), in comparison with maximum likelihood estimation (MLE) for detecting various forms of measurement bias in regard to a continuous violator using restricted factor analysis. Particularly, complex structural equation models with relatively small sample sizes were the study focus. Through a simulation study and an empirical example, results indicated that the enet outperformed other methods in small samples for identifying biased items. The alasso yielded low false positive rates for non-biased items outside of a high number of biased items. MLE performed well for the overall estimation of biased items.
Importance: Suicide is a public health problem, with multiple causes that are poorly understood. The increased focus on combining health care data with machine-learning approaches in psychiatry may help advance the understanding of suicide risk. Objective: To examine sex-specific risk profiles for death from suicide using machine-learning methods and data from the population of Denmark. Design, Setting, and Participants: A case-cohort study nested within 8 national Danish health and social registries was conducted from January 1, 1995, through December 31, 2015. The source population was all persons born or residing in Denmark as of January 1, 1995. Data were analyzed from November 5, 2018, through May 13, 2019. Exposures: Exposures included 1339 variables spanning domains of suicide risk factors. Main Outcomes and Measures: Death from suicide from the Danish cause of death registry. Results: A total of 14 103 individuals died by suicide between 1995 and 2015 (10 152 men [72.0%]; mean [SD] age, 43.5 [18.8] years and 3951 women [28.0%]; age, 47.6 [18.8] years). The comparison subcohort was a 5% random sample (n = 265 183) of living individuals in Denmark on January 1, 1995 (130 591 men [49.2%]; age, 37.4 [21.8] years and 134 592 women [50.8%]; age, 39.9 [23.4] years). With use of classification trees and random forests, sex-specific differences were noted in risk for suicide, with physical health more important to men’s suicide risk than women’s suicide risk. Psychiatric disorders and possibly associated medications were important to suicide risk, with specific results that may increase clarity in the literature. For example, stress disorders among unmarried men older than 30 years were important factors for suicide risk in the presence of depression (risk, 0.54). Generally, diagnoses and medications measured 48 months before suicide were more important indicators of suicide risk than when measured 6 months earlier. 
Individuals in the top 5% of predicted suicide risk appeared to account for 32.0% of all suicide cases in men and 53.4% of all cases in women. Conclusions and Relevance: Despite decades of research on suicide risk factors, understanding of suicide remains poor. In this study, the first to date to develop risk profiles for suicide based on data from a full population, apparent consistency with what is known about suicide risk was noted, as well as potentially important, understudied risk factors with evidence of unique suicide risk profiles among specific subpopulations.
Background: Substance use disorder (SUD) exacts enormous societal costs in the United States, and it is important to detect high-risk youths for prevention. Machine learning (ML) is a method for finding patterns in data and making predictions from them. We hypothesized that ML identifies the health, psychological, psychiatric, and contextual features that predict SUD, and that the identified features predict which high-risk individuals develop SUD. Method: Male (N = 494) and female (N = 206) participants and their informant parents were administered a battery of questionnaires across five waves of assessment conducted at 10-12, 12-14, 16, 19, and 22 years of age. Characteristics most strongly associated with SUD were identified using the random forest (RF) algorithm from approximately 1000 variables measured at each assessment. Next, the complement of features was validated, and the best models were selected for predicting SUD using seven ML algorithms. Lastly, area under the receiver operating characteristic curve (AUROC) evaluated accuracy of detecting individuals who develop SUD+/- up to thirty years of age. Results: Approximately thirty variables strongly predict SUD. The predictors shift from psychological dysregulation and poor health behavior in late childhood to non-normative socialization in mid to late adolescence. In 10-12-year-old youths, the features predict SUD+/- with 74% accuracy, increasing to 86% at 22 years of age. The RF algorithm optimally detects individuals between 10-22 years of age who develop SUD compared to other ML algorithms. Conclusion: These findings inform the items required for inclusion in instruments to accurately identify high-risk youths and young adults requiring SUD prevention.
A worrying number of psychological findings are not replicable. Diagnoses of the causes of this “replication crisis,” and recommendations to address it, have nearly exclusively focused on methods of data collection, analysis, and reporting. We argue that a further cause of poor replicability is the often weak logical link between theories and their empirical tests. We propose a distinction between discovery-oriented and theory-testing research. In discovery-oriented research, theories do not strongly imply hypotheses by which they can be tested, but rather define a search space for the discovery of effects that would support them. Failures to find these effects do not question the theory. This endeavor necessarily engenders a high risk of Type I errors—that is, publication of findings that will not replicate. Theory-testing research, by contrast, relies on theories that strongly imply hypotheses, such that disconfirmation of the hypothesis provides evidence against the theory. Theory-testing research engenders a smaller risk of Type I errors. A strong link between theories and hypotheses is best achieved by formalizing theories as computational models. We critically revisit recommendations for addressing the “replication crisis,” including the proposal to distinguish exploratory from confirmatory research, and the preregistration of hypotheses and analysis plans.
We propose an updated version of the Interaction of Person-Affect-Cognition-Execution (I-PACE) model, which we argue to be valid for several types of addictive behaviors, such as gambling, gaming, buying-shopping, and compulsive sexual behavior disorders. Based on recent empirical findings and theoretical considerations, we argue that addictive behaviors develop as a consequence of the interactions between predisposing variables, affective and cognitive responses to specific stimuli, and executive functions, such as inhibitory control and decision-making. In the process of addictive behaviors, the associations between cue-reactivity/craving and diminished inhibitory control contribute to the development of habitual behaviors. An imbalance between structures of fronto-striatal circuits, particularly between ventral striatum, amygdala, and dorsolateral prefrontal areas, may be particularly relevant to early stages and the dorsal striatum to later stages of addictive processes. The I-PACE model may provide a theoretical foundation for future studies on addictive behaviors and clinical practice. Future studies should investigate common and unique mechanisms involved in addictive, obsessive-compulsive-related, impulse-control, and substance-use disorders.
This study provides a predictive measurement tool to examine perceived anxiety from a longitudinal perspective, using a non-intrusive machine learning approach to scale human rating of anxiety in microblogs. Results suggest that our chosen machine learning approach depicts perceived user state-anxiety fluctuations over time, as well as mean trait anxiety. We further find a reverse relationship between perceived anxiety and outcomes such as social engagement and popularity. Implications on the individual, organizational, and societal levels are discussed.
Science and technology are increasingly developed, and digital media on the Internet, such as articles, commentary, videos, and animations, are becoming more and more important. English semantic analysis has many established basic technologies, and many applications are gradually emerging from them. By contrast, the basic technologies of Chinese semantic analysis have not been organized in a uniform or complete way. Chinese semantic analysis is more difficult than English semantic analysis because it is difficult to judge the true meaning of Chinese words and sentences. This study collects articles related to individual stocks from common news sites in Taiwan. After the data are preprocessed, each word is converted to word features using Word2Vec with the Skip-gram model. The lexicon stores the words most relevant to each keyword. In the prediction stage, this study calculates the impact of new articles on the stock price according to the fully trained lexicon. Finally, this study uses a deep learning approach, LSTM (Long Short-Term Memory), to evaluate the final results. The aim of this study is to adopt anticipatory computing to explore public mood and emotion from news articles, so that the future stock market trend can be predicted and the approach can serve as a reference model for related industries.
Background: This paper aims to synthesise the literature on machine learning (ML) and big data applications for mental health, highlighting current research and applications in practice. Methods: We employed a scoping review methodology to rapidly map the field of ML in mental health. Eight health and information technology research databases were searched for papers covering this domain. Articles were assessed by two reviewers, and data were extracted on the article's mental health application, ML technique, data type, and study results. Articles were then synthesised via narrative review. Results: Three hundred papers focusing on the application of ML to mental health were identified. Four main application domains emerged in the literature, including: (i) detection and diagnosis; (ii) prognosis, treatment and support; (iii) public health; and (iv) research and clinical administration. The most common mental health conditions addressed included depression, schizophrenia, and Alzheimer's disease. ML techniques used included support vector machines, decision trees, neural networks, latent Dirichlet allocation, and clustering. Conclusions: Overall, the application of ML to mental health has demonstrated a range of benefits across the areas of diagnosis, treatment and support, research, and clinical administration. With the majority of studies identified focusing on the detection and diagnosis of mental health conditions, it is evident that there is significant room for the application of ML to other areas of psychology and mental health. The challenges of using ML techniques are discussed, as well as opportunities to improve and advance the field.
The replication crisis facing the psychological sciences is widely regarded as rooted in methodological or statistical shortcomings. We argue that a large part of the problem is the lack of a cumulative theoretical framework or frameworks. Without an overarching theoretical framework that generates hypotheses across diverse domains, empirical programs spawn and grow from personal intuitions and culturally biased folk theories. By providing ways to develop clear predictions, including through the use of formal modelling, theoretical frameworks set expectations that determine whether a new finding is confirmatory, nicely integrating with existing lines of research, or surprising, and therefore requiring further replication and scrutiny. Such frameworks also prioritize certain research foci, motivate the use of diverse empirical approaches and, often, provide a natural means to integrate across the sciences. Thus, overarching theoretical frameworks pave the way toward a more general theory of human behaviour. We illustrate one such theoretical framework: dual inheritance theory.
Despite psychological scientists’ increasing interest in replicability, open science, research transparency, and the improvement of methods and practices, the clinical psychology community has been slow to engage. This has been shifting more recently, and with this review, we hope to facilitate this emerging dialogue. We begin by examining some potential areas of weakness in clinical psychology in terms of methods, practices, and evidentiary base. We then discuss a select overview of solutions, tools, and current concerns of the reform movement from a clinical psychological science perspective. We examine areas of clinical science expertise (e.g., implementation science) that should be leveraged to inform open science and reform efforts. Finally, we reiterate the call to clinical psychologists to increase their efforts toward reform that can further improve the credibility of clinical psychological science. Expected final online publication date for the Annual Review of Clinical Psychology Volume 15 is May 7, 2019. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Article
Individualized behavioral/cognitive prediction using machine learning (ML) regression approaches is increasingly applied. The specific ML regression algorithm and sample size are two key factors that non-trivially influence prediction accuracies. However, the effects of the ML regression algorithm and sample size on individualized behavioral/cognitive prediction performance have not been comprehensively assessed. To address this issue, the present study included six commonly used ML regression algorithms: ordinary least squares (OLS) regression, least absolute shrinkage and selection operator (LASSO) regression, ridge regression, elastic-net regression, linear support vector regression (LSVR), and relevance vector regression (RVR), to perform specific behavioral/cognitive predictions based on different sample sizes. Specifically, the publicly available resting-state functional MRI (rs-fMRI) dataset from the Human Connectome Project (HCP) was used, and whole-brain resting-state functional connectivity (rsFC) or rsFC strength (rsFCS) were extracted as prediction features. Twenty-five sample sizes (ranging from 20 to 700) were studied by sub-sampling from the entire HCP cohort. The analyses showed that rsFC-based LASSO regression performed remarkably worse than the other algorithms, and rsFCS-based OLS regression performed markedly worse than the other algorithms. Regardless of the algorithm and feature type, both the prediction accuracy and its stability exponentially increased with increasing sample size. The specific patterns of the observed algorithm and sample size effects were well replicated in the prediction using re-testing fMRI data, data processed by different imaging preprocessing schemes, and different behavioral/cognitive scores, thus indicating excellent robustness/generalization of the effects. The current findings provide critical insight into how the selected ML regression algorithm and sample size influence individualized predictions of behavior/cognition and offer important guidance for choosing the ML regression algorithm or sample size in relevant investigations.
Article
Background: Adolescents have high rates of nonfatal suicide attempts, but clinically practical risk prediction remains a challenge. Screening can be time consuming to implement at scale, if it is done at all. Computational algorithms may predict suicide risk using only routinely collected clinical data. We used a machine learning approach validated on longitudinal clinical data in adults to address this challenge in adolescents. Methods: This is a retrospective, longitudinal cohort study. Data were collected from the Vanderbilt Synthetic Derivative from January 1998 to December 2015 and included 974 adolescents with nonfatal suicide attempts and multiple control comparisons: 496 adolescents with other self‐injury (OSI), 7,059 adolescents with depressive symptoms, and 25,081 adolescent general hospital controls. Candidate predictors included diagnostic, demographic, medication, and socioeconomic factors. Outcome was determined by multiexpert review of electronic health records. Random forests were validated with optimism adjustment at multiple time points (from 1 week to 2 years). Recalibration was done via isotonic regression. Evaluation metrics included discrimination (AUC, sensitivity/specificity, precision/recall) and calibration (calibration plots, slope/intercept, Brier score). Results: Computational models performed well and did not require face‐to‐face screening. Performance improved as suicide attempts became more imminent. Discrimination was good in comparison with OSI controls (AUC = 0.83 [0.82–0.84] at 720 days; AUC = 0.85 [0.84–0.87] at 7 days) and depressed controls (AUC = 0.87 [95% CI 0.85–0.90] at 720 days; 0.90 [0.85–0.94] at 7 days) and best in comparison with general hospital controls (AUC = 0.94 [0.92–0.96] at 720 days; 0.97 [0.95–0.98] at 7 days). Random forests significantly outperformed logistic regression in every comparison. Recalibration improved performance as much as ninefold – clinical recommendations with poorly calibrated predictions can lead to decision errors. Conclusions: Machine learning on longitudinal clinical data may provide a scalable approach to broaden screening for risk of nonfatal suicide attempts in adolescents.
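The abstract above reports both discrimination (AUC) and calibration (Brier score). As a minimal sketch of what those two quantities measure, the plain-Python functions below compute them from labels and predicted probabilities. The function names and the toy data are assumptions made for illustration; this is not the study's pipeline, which used random forests, optimism adjustment, and isotonic recalibration.

```python
def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case (ties count as 0.5).
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true, probs):
    """Mean squared difference between predicted probability and outcome.

    Lower is better; unlike AUC, it penalizes poorly calibrated probabilities.
    """
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)


# Hypothetical toy data: 1 = event observed, 0 = no event.
labels = [0, 0, 1, 1]
predicted = [0.1, 0.4, 0.35, 0.8]
print("AUC:", auc(labels, predicted))
print("Brier score:", brier_score(labels, predicted))
```

Because AUC depends only on the ranking of scores, a model can discriminate well and still be badly calibrated, which is why the two families of metrics are reported side by side.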
Article
Background: Major depressive disorder is a high-prevalence disease associated with a heavy burden on both personal well-being and socio-economic welfare, partly as a result of lacking tailored treatment options [1]. Common single nucleotide polymorphisms (SNPs) were estimated to account for 0.42 of the variance in antidepressant response [2], confirming the hypothesis that genetic polymorphisms may be used as effective markers to provide tailored antidepressant treatments. Candidate gene and genome-wide analyses can provide a complementary strategy, since the former can be applied to clarify the role of SNPs with high pre-test probability of association with the trait and the latter is useful to study the joint effects of a number of SNPs in a gene or a set of genes [3]. Aim: We applied such a complementary strategy to the study of eight genes that are very strong candidates with previous evidence of pleiotropic effect across psychiatric traits. The genes of interest are involved in the regulation of neurotransmission (CACNA1C, CACNB2, ANK3), neural differentiation, synaptic plasticity, adhesion processes and structural organization (GRM7, TCF4, ITIH3, SYNE1) and glucocorticoid signaling (FKBP5). Methods: Three samples with major depressive disorder (total n=671) were genotyped for 44 SNPs in strong candidate genes based on biological function and previous genome-wide association studies (CACNA1C, CACNB2, ANK3, GRM7, TCF4, ITIH3, SYNE1, FKBP5). Phenotypes were response/remission after 4 weeks of treatment and treatment-resistant depression (TRD: non response/non remission to at least two antidepressant treatments). Genome-wide data from STAR*D were used to replicate findings for response/remission (Level 1, n=1409) and TRD (Level 2, n=620). Pathways including the most promising candidate genes for involvement in TRD were investigated in STAR*D Level 2. Top pathway(s) were investigated using machine learning models. Results: FKBP5 rs3800373, rs1360780 and rs9470080 showed replicated associations with response, remission or TRD. CACNA1C SNPs showed contradictory direction of association across samples. ANK3 rs1049862 AA genotype showed a replicated association with better outcome. In STAR*D the best pathway associated with TRD included CACNA1C (GO:0006942, permutated p=0.15). Neural networks and gradient boosted machine showed that independent SNPs in this pathway predicted TRD with a mean sensitivity of 0.83 and specificity of 0.56 after 10-fold cross validation repeated 100 times. Conclusions: FKBP5 polymorphisms should be considered for inclusion in antidepressant pharmacogenetic tests. CACNA1C is a good candidate and GO:0006942 includes several genes coding for ion channels expressed in the central nervous system and other genes relevant for excitatory mechanisms. CACNB2 and ANK3 showed replicated associations with phenotypes and further investigations could help in clarifying their role. This study may pave the way to the identification of sets of genetic predictors in specific pathways able to predict the risk of TRD. It is reasonable to hypothesize a certain degree of variability in the genetic variants involved in TRD across different patients, but the involved pathways are expected to be more stable. Validated genetic markers of TRD could have a pivotal role in the implementation of targeted antidepressant treatments.
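The "10-fold cross validation repeated 100 times" reported above can be sketched in plain Python as follows. Everything here is an illustrative assumption: the toy one-feature data, the midpoint-threshold classifier standing in for the neural networks and gradient boosted machine, and the choice to pool out-of-fold predictions within each repeat before averaging across repeats (the abstract does not say how the mean was formed).

```python
import random
from statistics import mean


def repeated_kfold(n, k=10, repeats=100, seed=0):
    """Yield (train_idx, test_idx) pairs for `repeats` reshuffled k-fold splits."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]  # k disjoint folds covering all cases
        for i, test in enumerate(folds):
            train = [j for f, fold in enumerate(folds) if f != i for j in fold]
            yield train, test


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity); NaN if a class is absent."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


def fit_threshold(xs, ys):
    """Hypothetical stand-in model: cut-off halfway between the class means."""
    m1 = mean(x for x, y in zip(xs, ys) if y == 1)
    m0 = mean(x for x, y in zip(xs, ys) if y == 0)
    return (m0 + m1) / 2


def cross_validate(xs, ys, k=10, repeats=100, seed=0):
    """Mean sensitivity/specificity over repeats, scoring out-of-fold predictions."""
    sens, spec = [], []
    pooled_true, pooled_pred = [], []
    splits = repeated_kfold(len(xs), k, repeats, seed)
    for split_no, (train, test) in enumerate(splits, start=1):
        # Fit on the training folds only, then predict the held-out fold.
        cut = fit_threshold([xs[i] for i in train], [ys[i] for i in train])
        pooled_true += [ys[i] for i in test]
        pooled_pred += [1 if xs[i] > cut else 0 for i in test]
        if split_no % k == 0:  # one full repeat finished
            se, sp = sensitivity_specificity(pooled_true, pooled_pred)
            sens.append(se)
            spec.append(sp)
            pooled_true, pooled_pred = [], []
    return mean(sens), mean(spec)


# Hypothetical toy data: two overlapping groups on a single predictor.
xs = [i / 10 for i in range(20)] + [1 + i / 10 for i in range(20)]
ys = [0] * 20 + [1] * 20
print(cross_validate(xs, ys))
```

Repeating the shuffle reduces the dependence of the estimate on any single partition of the data, at the cost of `k * repeats` model fits.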
Article
Nearly all aspects of modern life are in some way being changed by big data and machine learning. Netflix knows what movies people like to watch and Google knows what people want to know based on their search histories. Indeed, Google has recently begun to replace much of its existing non–machine learning technology with machine learning algorithms, and there is great optimism that these techniques can provide similar improvements across many sectors.
Article
High-dimensional data analysis is a challenge for researchers and engineers in the fields of machine learning and data mining. Feature selection provides an effective way to solve this problem by removing irrelevant and redundant data, which can reduce computation time, improve learning accuracy, and facilitate a better understanding of the learning model or data. In this study, we discuss several frequently used evaluation measures for feature selection, and then survey supervised, unsupervised, and semi-supervised feature selection methods, which are widely applied in machine learning problems, such as classification and clustering. Lastly, future challenges in feature selection are discussed.
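To make "removing irrelevant and redundant data" concrete, here is a minimal sketch of one simple supervised filter: rank features by absolute Pearson correlation with the outcome, drop those below a relevance threshold, and drop any feature that is nearly collinear with one already kept. The function names, thresholds, and toy data are assumptions for illustration; the survey covers many other evaluation measures and methods.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / sqrt(va * vb)


def select_features(features, target, relevance_min=0.3, redundancy_max=0.9):
    """Filter-style selection on a dict of {name: column of values}.

    relevance_min and redundancy_max are hypothetical thresholds chosen
    only for this example.
    """
    relevance = {name: abs(pearson(col, target)) for name, col in features.items()}
    ranked = sorted(features, key=lambda name: relevance[name], reverse=True)
    kept = []
    for name in ranked:
        if relevance[name] < relevance_min:
            continue  # irrelevant: barely related to the outcome
        if any(abs(pearson(features[name], features[k])) > redundancy_max for k in kept):
            continue  # redundant: nearly duplicates a stronger feature
        kept.append(name)
    return kept


# Hypothetical toy data.
target = [1, 2, 3, 4, 5, 6]
features = {
    "a": [1, 2, 3, 4, 5, 7],            # strongly related to the outcome
    "a_near_copy": [1, 2, 3, 4, 5, 8],  # almost the same information as "a"
    "b": [2, 1, 4, 3, 6, 5],            # related, but not a duplicate of "a"
    "noise": [1, 2, 3, 3, 2, 1],        # uncorrelated with the outcome
}
print(select_features(features, target))
```

A filter like this is applied before model fitting and ignores interactions among predictors, which is one reason the survey also discusses wrapper and embedded approaches.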
Article
Machine learning approaches for clinical psychology and psychiatry explicitly focus on learning statistical functions from multidimensional data sets to make generalizable predictions about individuals. The goal of this review is to provide an accessible understanding of why this approach is important for future practice because of its potential to augment decisions associated with the diagnosis, prognosis, and treatment of people suffering from mental illness using clinical and biological data. To this end, the limitations of current statistical paradigms in mental health research are critiqued, and an introduction is provided to the critical machine learning methods used in clinical studies. A selective literature review is then presented aiming to reinforce the usefulness of machine learning methods and provide evidence of their potential. In the context of promising initial results, the current limitations of machine learning approaches are addressed, and considerations for future clinical translation are outlined. Expected final online publication date for the Annual Review of Clinical Psychology Volume 14 is May 7, 2018. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Article
Objective: Despite efforts to identify characteristics associated with medication-placebo differences in antidepressant trials, few consistent findings have emerged to guide participant selection in drug development settings and differential therapeutics in clinical practice. Limitations in the methodologies used, particularly searching for a single moderator while treating all other variables as noise, may partially explain the failure to generate consistent results. The present study tested whether interactions between pretreatment patient characteristics, rather than a single-variable solution, may better predict who is most likely to benefit from placebo versus medication. Methods: Data were analyzed from 174 patients aged 75 years and older with unipolar depression who were randomly assigned to citalopram or placebo. Model-based recursive partitioning analysis was conducted to identify the most robust significant moderators of placebo versus citalopram response. Results: The greatest signal detection between medication and placebo in favor of medication was among patients with fewer years of education (≤12) who suffered from a longer duration of depression since their first episode (>3.47 years) (B = 2.53, t(32) = 3.01, p = 0.004). Compared with medication, placebo had the greatest response for those who were more educated (>12 years), to the point where placebo almost outperformed medication (B = -0.57, t(96) = -1.90, p = 0.06). Conclusion: Machine learning approaches capable of evaluating the contributions of multiple predictor variables may be a promising methodology for identifying placebo versus medication responders. Duration of depression and education should be considered in the efforts to modulate placebo magnitude in drug development settings and in clinical practice.
Article
Technological advances have led to increasingly large industrial quality-related datasets, calling for process monitoring methods able to handle them. In this context, the application of variable selection (VS) in quality control methods emerges as a promising research topic. This review aims at presenting the current state of the art of the integration of VS in multivariate statistical process control (MSPC) methods. Proposals aligned with the objective were identified, classified according to VS approach, and briefly presented. Research on the topic has considerably increased in the past five years. Thirty methods were identified and categorized in 10 clusters, according to the objective of improvement in MSPC and the step of process monitoring they were aimed to improve. The majority of the propositions were either targeted at exclusively monitoring potential out-of-control variables or improving the monitoring of in-control variables. MSPC improvements were centered on principal component analysis (PCA) projection methods, while VS was mainly carried out using the Least Absolute Shrinkage and Selection Operator (LASSO) method and genetic algorithms. Fault isolation was the most addressed step in process monitoring. We close the paper by proposing five topics for future research, exploring the opportunities identified in the literature.
Article
A candidate gene and a genome-wide approach were combined to study the pharmacogenetics of antidepressant response and resistance. Investigated genes were selected on the basis of pleiotropic effect across psychiatric phenotypes in previous genome-wide association studies and involvement in antidepressant response. Three samples with major depressive disorder (total n = 671) were genotyped for 44 SNPs in 8 candidate genes (CACNA1C, CACNB2, ANK3, GRM7, TCF4, ITIH3, SYNE1, FKBP5). Phenotypes were response/remission after 4 weeks of treatment and treatment-resistant depression (TRD). Genome-wide data from STAR*D were used to replicate findings for response/remission (n = 1409) and TRD (n = 620). Pathways including the most promising candidate genes were investigated in STAR*D for involvement in TRD. FKBP5 polymorphisms showed replicated but nominal associations with response, remission or TRD. CACNA1C rs1006737 and rs10848635 were the only polymorphisms that survived multiple-testing correction. In STAR*D the best pathway associated with TRD included CACNA1C (GO:0006942, permutated p = 0.15). Machine learning models showed that independent SNPs in this pathway predicted TRD with a mean sensitivity of 0.83 and specificity of 0.56 after 10-fold cross validation repeated 100 times. FKBP5 polymorphisms appear good candidates for inclusion in antidepressant pharmacogenetic tests. Pathways including the CACNA1C gene may be involved in TRD and they may provide the base for developing multi-marker predictors of TRD.
Article
Exploratory mediation analysis refers to a class of methods used to identify a set of potential mediators of a process of interest. Despite its exploratory nature, conventional approaches are rooted in confirmatory traditions, and as such have limitations in exploratory contexts. We propose a two-stage approach called exploratory mediation analysis via regularization (XMed) to better address these concerns. We demonstrate that this approach is able to correctly identify mediators more often than conventional approaches and that its estimates are unbiased. Finally, this approach is illustrated through an empirical example examining the relationship between college acceptance and enrollment.
Article
Machine learning addresses the question of how to build computers that improve automatically through experience. It is one of today’s most rapidly growing technical fields, lying at the intersection of computer science and statistics, and at the core of artificial intelligence and data science. Recent progress in machine learning has been driven both by the development of new learning algorithms and theory and by the ongoing explosion in the availability of online data and low-cost computation. The adoption of data-intensive machine-learning methods can be found throughout science, technology and commerce, leading to more evidence-based decision-making across many walks of life, including health care, manufacturing, education, financial modeling, policing, and marketing.
Chapter
When predicting a categorical outcome, some measure of classification accuracy is typically used to evaluate the model’s effectiveness. However, there are different ways to measure classification accuracy, depending on the modeler’s primary objectives. Most classification models can produce both a continuous and categorical prediction output. In Section 11.1, we review these outputs, demonstrate how to adjust probabilities based on calibration plots, recommend ways for displaying class predictions, and define equivocal or indeterminate zones of prediction. In Section 11.2, we review common metrics for assessing classification predictions such as accuracy, kappa, sensitivity, specificity, and positive and negative predictive values. This section also addresses model evaluation when costs are applied to making false positive or false negative mistakes. Classification models may also produce predicted classification probabilities. Evaluating this type of output is addressed in Section 11.3, and includes a discussion of receiver operating characteristic curves as well as lift charts. In Section 11.4, we demonstrate how measures of classification performance can be generated in R.
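The metrics listed in the chapter summary above all derive from the four cells of a 2×2 confusion matrix. The sketch below computes them in plain Python rather than in R, which the chapter itself uses; the function name and the example counts are assumptions for illustration.

```python
def binary_metrics(tp, fp, tn, fn):
    """Common classification metrics from confusion-matrix counts.

    tp/fp/tn/fn = true positives, false positives, true negatives,
    false negatives. Undefined ratios are returned as NaN.
    """
    def ratio(num, den):
        return num / den if den else float("nan")

    n = tp + fp + tn + fn
    accuracy = ratio(tp + tn, n)
    # Agreement expected by chance, from the marginal totals.
    p_chance = ratio((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp), n * n)
    return {
        "accuracy": accuracy,
        # Cohen's kappa: accuracy corrected for chance agreement.
        "kappa": ratio(accuracy - p_chance, 1 - p_chance),
        "sensitivity": ratio(tp, tp + fn),  # share of actual positives detected
        "specificity": ratio(tn, tn + fp),  # share of actual negatives detected
        "ppv": ratio(tp, tp + fp),          # positive predictive value
        "npv": ratio(tn, tn + fn),          # negative predictive value
    }


# Hypothetical counts: 50 actual positives and 50 actual negatives.
for name, value in binary_metrics(tp=40, fp=20, tn=30, fn=10).items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity condition on the true class, whereas the predictive values condition on the predicted class and therefore shift with the prevalence of the outcome in the sample.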
Article
caret has several functions that attempt to streamline the model building and evaluation process. The train function can be used to:
• evaluate, using resampling, the effect of model tuning parameters on performance
• choose the “optimal” model across these parameters
• estimate model performance from a training set
To optimize tuning parameters of models, train can be used to fit many predictive models over a grid of parameters and return the “best” model (based on resampling statistics). See Table 1 for the models currently available. As an example, the multidrug resistance reversal (MDRR) agent data is used to determine a predictive model for the “ability of a compound to reverse a leukemia cell’s resistance to adriamycin” (Svetnik et al., 2003). For each sample (i.e. compound), predictors are calculated that reflect characteristics of the molecular structure. These molecular descriptors are then used to predict assay results that reflect resistance. The data are accessed using data(mdrr). This creates a data frame of predictors called mdrrDescr and a factor vector with the observed class called mdrrClass. To start, we will:
• use unsupervised filters to remove predictors with unattractive characteristics (e.g. sparse distributions or high inter-predictor correlations)
• split the entire data set into a training and test set
• center and scale the training and test set using the predictor means and standard deviations from the training set
See the package vignette “caret Manual – Data and Functions” for more details about these operations.
> library(caret)
> data(mdrr)
> print(ncol(mdrrDescr))
[1] 342
> nzv <- nearZeroVar(mdrrDescr)
> filteredDescr <- mdrrDescr[,-nzv]
> print(ncol(filteredDescr))
[1] 297
> descrCor <- cor(filteredDescr)
> highlyCorDescr <- findCorrelation(descrCor, cutoff = 0.75)
> filteredDescr <- filteredDescr[,-highlyCorDescr]
> print(ncol(filteredDescr))
[1] 50
> set.seed(1)
> inTrain <- sample(seq(along = mdrrClass), length(mdrrClass)/2)
> trainDescr <- filteredDescr[inTrain,]
> testDescr <- filteredDescr[-inTrain,]
> trainMDRR <- mdrrClass[inTrain]
> testMDRR <- mdrrClass[-inTrain]
> print(length(trainMDRR))
[1] 264
> print(length(testMDRR))
[1] 264
> preProcValues <- preProcess(trainDescr)
> trainDescr <- predict(preProcValues, trainDescr)
> testDescr <- predict(preProcValues, testDescr)
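The last step above, centering and scaling both sets with statistics taken from the training set only, is what keeps information from the test set out of model building. The sketch below restates that idea in plain Python; it is not caret's API, the function names are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption of this example.

```python
from statistics import mean, stdev


def fit_scaler(train_rows):
    """Learn per-column means and sample standard deviations from training rows."""
    columns = list(zip(*train_rows))
    means = [mean(col) for col in columns]
    sds = [stdev(col) for col in columns]  # n - 1 denominator (assumed)
    return means, sds


def apply_scaler(rows, means, sds):
    """Center and scale rows with previously learned statistics.

    Columns that were constant in the training data are mapped to 0.0.
    """
    return [
        [(value - m) / s if s > 0 else 0.0 for value, m, s in zip(row, means, sds)]
        for row in rows
    ]


# Hypothetical toy data: two predictors, three training rows, one test row.
train = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
test = [[4.0, 40.0]]

means, sds = fit_scaler(train)                 # statistics come from train only
train_scaled = apply_scaler(train, means, sds)
test_scaled = apply_scaler(test, means, sds)   # test reuses the training statistics
print(train_scaled)
print(test_scaled)
```

Estimating the means and standard deviations on the combined data would let the test set influence the transformation, which tends to make test-set performance look better than it would on new data.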
Studying psychopathology in relation to smartphone use
  • D Marengo
  • M Settanni
Marengo D, Settanni M: Studying psychopathology in relation to smartphone use. In Digital Phenotyping and Mobile Sensing. Edited by Baumeister H, Montag C. Springer; 2019:109-124.