
Introduction to Predictive Psychodiagnostics

The article discusses the theoretical and practical features of constructing predictive classifiers based on the results of psychological tests using machine learning methods. The purpose of creating such classifiers is to obtain models that can predict the presence or absence of certain personal and professional qualities in a psychodiagnostic test respondent and the success of a particular type of activity. These forecasts help reduce hiring errors and make more informed hiring decisions.
Peter G. Menshih, Igor V. Niesov,
I. Introduction
Forming forecasts from the results of a psychodiagnostic test is a fairly common
practice in medicine [1], sports [2], psychology [3], transport [4], the army [5, 6],
and other areas. In such cases, the results of psychodiagnostic tests based on methods
such as the MMPI [7] or the Rorschach test [8–10] serve as the initial data. The test
data are paired with labels, that is, information about whether the respondent has the
properties that make up the predicted metric.
One of the disadvantages of the traditional approach to forming predictive systems is
that the predictors obtained in this way are not universal and require independent
research and development in each specific case, depending on the field of application.
In addition, in the typical case the predictive assessment is derived from a
psychodiagnostic technique that was not originally designed to produce prognostic
assessments. At the same time, a respondent's answers to the test questions give a
complete description of the person. However, in most cases these data are not used
directly but only through the values of the calculated scales. The scales, strictly
typified by the methodology, present data about the respondent as if in a vacuum,
whereas the answers themselves often contain information about a person in his
direct, living relationship with the world.
Suppose all the results of the psychodiagnostic test are used: the values of the scales
and the calculated psychotype, the answers the respondent chose in the questionnaire,
and additional data in the form of answers to questions that are not part of the
questionnaire but carry valuable information, for example about the respondent's field
of activity and position. Then, using machine learning algorithms, it is possible to
build and train a model that regresses indicators or classifies a respondent according
to criteria set by the researcher, criteria that are inaccessible to the
psychodiagnostic method itself.
Thus, predictive psychodiagnostics can be described as the process of forming a
predictive assessment of a metric based on psychodiagnostic testing data using
machine learning methods.
The development of methods capable of isolating and visualizing data inaccessible to
traditional psychodiagnostic methods is highly relevant. Given sufficient statistical
significance and predictability, such data can qualitatively change the management of
the processes of assessment and selection of personnel, career guidance, as well as
forecasting the “work path” of a person, taking into account the industry specifics of his
field of activity.
II. Methodology
The approach to creating a predictive psychodiagnostic system described in this paper
combines two sources of information. The first is objective data, such as the values of
scales produced by a basic psychodiagnostic technique (for this work, KPMI [11]). The
second is a predictive assessment of additional properties and qualities of the
respondent, obtained by approximating survey data with supervised machine learning
algorithms.
The process of creating models for a predictive psychodiagnostic system generally
consists of the following set of steps:
1. definition of metrics calculated using machine learning;
2. adding questions to the basic questionnaire, the answers to which clearly
describe the selected metrics;
3. collection of a sufficient amount of data for training the ML model (i.e.,
questionnaires filled out by respondents);
4. model training.
The model obtained as a result of the described steps takes input data on the
respondent's answers to the survey questions and the scale values calculated
according to the rules of the basic psychodiagnostic methodology.
The output of the model contains a predictive estimate of the target metric.
The basic psychodiagnostic technique on which a predictive system is built can be any,
and its choice should be guided by the specific tasks that the system must solve. At the
same time, the basic technique should, to one degree or another, correlate with the
subject area of the metrics determined using machine learning.
For example, if the task is to forecast effectiveness in people management, then methods
for assessing intellectual abilities alone will not be enough, since they lack the
stimuli needed to reveal the emotional intelligence features that matter for
interpersonal interaction.
The features used by the model can be both a set of respondents' answers to the
questionnaire questions and scale values calculated following the basic
psychodiagnostic methodology. In addition, it is possible to use a combination of
answers to questions and scale values, as well as construct new features based on
them. The choice of a specific approach depends on the algorithms used, target
metrics, and the amount of data for training and requires a separate study in the context
of a specific problem being solved.
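The feature options described above can be illustrated with a minimal sketch. The function name, the answer coding, and the example values are hypothetical and are not taken from the study; the sketch only shows how item answers and scale values might be combined into one input vector.

```python
def build_features(answers, scales, use_answers=True, use_scales=True):
    """Concatenate item answers and scale values into one flat feature vector.

    Both the helper and its flags are illustrative assumptions: the text only
    states that answers, scale values, or a combination of both can be used.
    """
    features = []
    if use_answers:
        features.extend(float(a) for a in answers)
    if use_scales:
        features.extend(float(s) for s in scales)
    if not features:
        raise ValueError("at least one feature source must be enabled")
    return features


# Hypothetical respondent: five item answers (coded 1..4) and two scale values.
answers = [1, 4, 2, 2, 3]
scales = [0.35, 0.80]

combined = build_features(answers, scales)
scales_only = build_features(answers, scales, use_answers=False)
```

Keeping the choice of feature source behind flags makes it easy to compare the variants on the same data, which is the kind of separate study the text calls for.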
It is important to note that a good psychodiagnostic technique rests on a model that
describes the psychological characteristics of a person, and this model is complete: in
one way or another, it includes the various factors that determine human behavior. This,
in turn, means that it will most likely contain the factors that determine the measured
metrics.
III. Research
A practical example of the implementation of the model of predictive psychodiagnostics
can be found in [12]. The researcher's predictive model is a three-layer neural network
with a layer configuration of 60-30-15 neurons.
The model is created to form a predictive assessment of the level of involvement of
employees in sales in a large retailer's network. The input data for the model are the
results of filling out the KPMI psychodiagnostic test. The output is a value from 0 to 1,
which characterizes the predicted level of the respondent's involvement.
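A rough sketch of such a network's forward pass is given below. Only the 60-30-15 layer configuration and the 0 to 1 output range come from the text. The ReLU hidden activations, the sigmoid output unit, the random weight initialization, and the input size of 10 are assumptions made for illustration, and the sketch does not include training.

```python
import math
import random

HIDDEN_LAYERS = [60, 30, 15]  # layer configuration reported in the text


def init_network(n_inputs, seed=0):
    """Create randomly initialized (weights, biases) pairs for each layer."""
    rng = random.Random(seed)
    sizes = [n_inputs] + HIDDEN_LAYERS + [1]  # single output unit (assumption)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, x):
    """Propagate one feature vector through the network and return a score in (0, 1)."""
    activation = list(x)
    last = len(layers) - 1
    for index, (weights, biases) in enumerate(layers):
        z = [sum(w * a for w, a in zip(row, activation)) + b
             for row, b in zip(weights, biases)]
        if index == last:
            activation = [1.0 / (1.0 + math.exp(-v)) for v in z]  # sigmoid output
        else:
            activation = [max(0.0, v) for v in z]  # ReLU hidden layers
    return activation[0]


# Hypothetical respondent with 10 input features.
network = init_network(n_inputs=10)
score = forward(network, [0.5] * 10)
```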
The answer to the NPS-type question "I am ready to recommend a job in the company
to my friends and acquaintances" was used as the label during training, with possible
values ranging from -100 to 100 in increments of 20. Respondents who chose answer
options from -100 to -20 inclusive were considered critics, that is, uninvolved
employees, and those who chose answer options 80 and 100 were considered involved
employees (promoters).
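The labelling rule can be written down as a short sketch. The function name is hypothetical, and returning None for the intermediate answers (0 to 60) is an assumption: the text does not say how those respondents were handled.

```python
VALID_ANSWERS = range(-100, 101, 20)  # -100, -80, ..., 80, 100


def engagement_label(nps_answer):
    """Map an NPS-type answer to a binary label.

    Returns 0 for critics (-100..-20), 1 for promoters (80, 100), and None for
    the intermediate answers, which this sketch assumes are left unlabeled.
    """
    if nps_answer not in VALID_ANSWERS:
        raise ValueError(f"unexpected answer value: {nps_answer}")
    if nps_answer <= -20:
        return 0
    if nps_answer >= 80:
        return 1
    return None


labels = [engagement_label(value) for value in VALID_ANSWERS]
```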
To reduce the influence of randomness, both in generating the training and validation
samples and in training the neural network itself, the procedures for generating
samples and for training and validating the model were repeated 100 times.
The average accuracy value obtained from 100 iterations was used as a performance
metric of the resulting model. In this case, accuracy is understood as the ratio of the
number of correctly classified samples to the total size of the test sample.
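The repeated evaluation procedure might look roughly like the sketch below. The nearest-class-mean classifier is a deliberately simple, hypothetical stand-in for the neural network, and the synthetic one-feature data are invented for illustration. Only the 100 repetitions, the 15% test share, and the definition of accuracy follow the text.

```python
import random
import statistics


def accuracy(y_true, y_pred):
    """Ratio of correctly classified samples to the size of the test sample."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def split(samples, test_fraction, rng):
    """Shuffle a copy of the samples and hold out a test fraction."""
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def train_nearest_mean(train):
    """Stand-in model (assumption): predict the class whose mean is closer."""
    mean0 = statistics.mean(x for x, label in train if label == 0)
    mean1 = statistics.mean(x for x, label in train if label == 1)
    return lambda x: 1 if abs(x - mean1) < abs(x - mean0) else 0


def repeated_evaluation(samples, train_fn, n_runs=100, test_fraction=0.15, seed=0):
    """Repeat split, training, and validation; return mean and std of accuracy."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_runs):
        train, test = split(samples, test_fraction, rng)
        predict = train_fn(train)
        y_true = [label for _, label in test]
        y_pred = [predict(x) for x, _ in test]
        scores.append(accuracy(y_true, y_pred))
    return statistics.mean(scores), statistics.pstdev(scores)


# Synthetic one-feature data, invented for this example.
data_rng = random.Random(1)
samples = ([(data_rng.gauss(0.0, 1.0), 0) for _ in range(100)]
           + [(data_rng.gauss(2.0, 1.0), 1) for _ in range(100)])
mean_acc, std_acc = repeated_evaluation(samples, train_nearest_mean)
print(f"mean accuracy {mean_acc:.2f}, std {std_acc:.2f}")
```

Averaging over many random splits reports a typical result rather than the outcome of one lucky or unlucky partition, and the standard deviation shows how much the result varies between runs.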
The total volume of initial data was 2,095 questionnaires, and the test sample size was
15%. The model was trained on a balanced dataset obtained from the original sample by
random undersampling.
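Random undersampling can be sketched as follows: every class is reduced to the size of the smallest one by random selection. The helper name and the toy data are hypothetical. Applying the 15% hold-out after balancing is also an assumption, since the text does not state the order of the two operations.

```python
import random


def undersample(samples, rng):
    """Balance classes by randomly keeping only as many samples per class
    as the smallest class contains."""
    by_class = {}
    for features, label in samples:
        by_class.setdefault(label, []).append((features, label))
    smallest = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, smallest))
    rng.shuffle(balanced)
    return balanced


# Toy imbalanced data: 30 samples of class 0 and 10 samples of class 1.
rng = random.Random(0)
samples = ([([0.1 * i], 0) for i in range(30)]
           + [([0.1 * i], 1) for i in range(10)])

balanced = undersample(samples, rng)
n_test = round(len(balanced) * 0.15)  # 15% test share, as in the text
test, train = balanced[:n_test], balanced[n_test:]
```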
IV. Results and Discussion
Over 100 iterations of training and validation of the resulting model, the average
accuracy is 66%, with a standard deviation of 0.05. During training, the model takes an
average of 100 epochs to reach peak performance.
The model accuracy of 66% is at least no worse than that of classical psychodiagnostic
predictive methods [8, 13, 14]. At the same time, the predictive model described in this
study is essentially not optimized, and its final performance indicator also includes the
zone of model uncertainty, that is, predictive estimates in the range from 0.4 to 0.6.
Finer tuning of the neural network architecture and careful selection of its
hyperparameters, coupled with the exclusion of predictive estimates in the range of
0.4–0.6, make it possible to form models with a performance of 75% or more. Such a
figure is hardly achievable for classical methods of constructing predictive
psychodiagnostic systems.
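One possible way to evaluate a model while setting aside the uncertainty zone is sketched below. The function name, the inclusive band boundaries, the 0.5 decision threshold, and the example scores are assumptions for illustration and do not reproduce results from the study.

```python
def accuracy_outside_band(y_true, scores, low=0.4, high=0.6, threshold=0.5):
    """Accuracy computed only on predictions outside the [low, high] band.

    Returns (accuracy, coverage), where coverage is the share of samples
    that were kept. Accuracy is None when every prediction is in the band.
    """
    kept = [(t, s) for t, s in zip(y_true, scores) if not (low <= s <= high)]
    if not kept:
        return None, 0.0
    correct = sum(1 for t, s in kept if (s >= threshold) == bool(t))
    return correct / len(kept), len(kept) / len(y_true)


# Invented example: two of six predictions fall into the uncertainty zone.
y_true = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.1, 0.45, 0.55, 0.8, 0.3]

band_accuracy, coverage = accuracy_outside_band(y_true, scores)
```

Reporting coverage together with accuracy matters here: excluding uncertain estimates can raise accuracy, but only for the share of respondents that still receive a prediction.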
V. Conclusion
Using the results of psychological testing to predict any qualities of a person within a
specific activity does not give sufficiently reliable results when using classical methods
of psychodiagnostics. This is especially true for cases in which the target metric is
specific [15]. Accuracy can be improved using machine learning methods that take
into account not only scale values but also response patterns, testing time, and
other meta-parameters. The tandem of psychodiagnostics and machine learning can
become a new milestone in determining personality traits in general and career
prospects in particular [16], and its capabilities are being actively studied [17–20].
In addition, a predictive system developed on a classical psychodiagnostic basis must
be specifically tailored to one predictive metric. This prevents the developed test from
being used for other problems, for example, assessing the involvement or satisfaction
of employees. Machine learning methods, in turn, make it possible to build predictive
systems on existing tests, in their original form or with a minimum number of additions,
provided that the test questions contain the necessary information.
In conclusion, it is worth noting that neural networks do not represent an
easy-to-understand version of the model as a whole, being "black boxes." The result of
the neural network is used "as is" without explaining how it was obtained [21]. In the
context of professional orientation tasks, personnel assessment, and
psychodiagnostics, such a feature in certain situations can make using neural networks
difficult solely because of a lack of understanding of "how it works."
VI. Acknowledgment
References
1. Arnold C. Small, James Madero, Lorie Teagno, & Michael H. Ebert. (1983). Intellect, perceptual
characteristics, and weight gain in anorexia nervosa. Journal of Clinical Psychology.
2. Lawless, J., & Grobbelaar, H. (n.d.). Sport psychological skills profile of track and field athletes
and less successful track athletes. Retrieved November 15, 2022, from
3. Pastushenya, A., Vasishchev, A., Filaretov, S., & Zharkikh, A. (2019). Improvement of
psychological tests to identify persons prone to escape from correctional institutions and places of
detention. International Penitentiary Journal, 1(2), 118–136.
4. Zaoral, A. (2009, November 30). Manual of Recommended Psychodiagnostic Methods for
Examination and Assessment of Mental Competence to Drive Motor Vehicles.
5. Muhammad, R. S., Wolters, H. M. K., & Jayne, B. S. (2020). Personality testing: enhancing
in-service selection of mid-career soldiers. Military Psychology, 32(1), 71–80.
6. DeLuca, J. (1968). Predicting the Full Scale WAIS IQ of Army Basic Trainees. The Journal of
Psychology, 68(1), 83–86.
7. Marek, R., Tarescavage, A., Ben-Porath, Y., Ashton, K., Heinberg, L., & Merrell Rish, J. (2015).
Presurgical Psychological Testing: Incremental Contribution to Predicting Failure to Follow
through with Bariatric Surgery. Surgery for Obesity and Related Diseases, 11(6), S157–S159.
8. Cmelic, S., & Henig, L. (1975). Prognostic value of the Rorschach psychodiagnostic test in
psychotherapy. Socijalna Psihijatrija, 3(4), 357–363.
9. Newmark, C.S, Konanc, J.T., Simpson, M., Boren, R.B, & Prillaman, K. (1979). Predictive validity
of the Rorschach prognostic rating scale with schizophrenic patients. Journal of Nervous and
Mental Disease, 167(3), 135–143.
10. Predictive validity of MMPI-2 and Rorschach in the diagnosis of depression and schizophrenia -
ProQuest. (n.d.).
11. KPMI (Keys to Personal Mastery Inventory) (2016).
12. Menshih, P. (2022, June 2). Engagement prediction for retail chain salespeople. Kaggle.
13. Golyanich, V. M., Bondaruk, A. F., Shapoval, V. A., & Tulupyeva, T. V. (2018). Value contradictions
as psychodiagnostic criteria of professional competence and an intrapersonal conflict.
Experimental Psychology (Russia), 11(3), 120–139.
14. Zhan'ko, Chulaevskiĭ. (2006, January 1). Ability to process information as a factor of professional
success - Europe PMC.
15. Catron J. Leo. Diagnostic utility of the Minnesota Multiphasic Personality Inventory for objective
diagnostic classifications - ProQuest. (1982).
ON STUDENTS FROM PRAGUE UNIVERSITIES. 3rd International Thematic Monograph -
Thematic Proceedings: Modern Management Tools and Economy of Tourism Sector in Present
Era, 755–768.
17. Dolce, P., Marocco, D., Maldonato, M. N., & Sperandeo, R. (2020). Toward a Machine Learning
Predictive-Oriented Approach to Complement Explanatory Modeling. An Application for
Evaluating Psychopathological Traits Based on Affective Neurosciences and Phenomenology.
Frontiers in Psychology, 11.
18. Gonzalez, O. (2020). Psychometric and machine learning approaches for diagnostic assessment
and tests of individual classification. Psychological Methods.
19. Fardouly, J., Crosby, R. D., & Sukunesan, S. (2022). Potential benefits and limitations of machine
learning in the field of eating disorders: current research and future directions. Journal of Eating
Disorders, 10(1).
20. Littlefield, A. K., Cooke, J. T., Bagge, C. L., Glenn, C. R., Kleiman, E. M., Jacobucci, R., Millner,
A. J., & Steinley, D. (2021). Machine Learning to Classify Suicidal Thoughts and Behaviors:
Implementation Within the Common Data Elements Used by the Military Suicide Research
Consortium. Clinical Psychological Science, 9(3), 467–481.
21. Chollet, F. (2018). Deep Learning with Python. Manning, Cop.