Machine Learning and Natural Language Processing in Psychotherapy
Research: Alliance as Example Use Case
Simon B. Goldberg
University of Wisconsin–Madison
Nikolaos Flemotomos and Victor R. Martinez
University of Southern California
Michael J. Tanana and Patty B. Kuo
University of Utah
Brian T. Pace
University of Utah and Veterans Affairs Palo Alto Health
Care System, Palo Alto, California
Jennifer L. Villatte
University of Washington
Panayiotis G. Georgiou
University of Southern California
Jake Van Epps and Zac E. Imel
University of Utah
Shrikanth S. Narayanan
University of Southern California
David C. Atkins
University of Washington
Artificial intelligence generally and machine learning specifically have become deeply woven into
the lives and technologies of modern life. Machine learning is dramatically changing scientific
research and industry and may also hold promise for addressing limitations encountered in mental
health care and psychotherapy. The current paper introduces machine learning and natural language
processing as related methodologies that may prove valuable for automating the assessment of
meaningful aspects of treatment. Prediction of therapeutic alliance from session recordings is used
as a case in point. Recordings from 1,235 sessions of 386 clients seen by 40 therapists at a university
counseling center were processed using automatic speech recognition software. Machine learning
algorithms learned associations between client ratings of therapeutic alliance exclusively from
session linguistic content. Using a portion of the data to train the model, machine learning algorithms
modestly predicted alliance ratings from session content in an independent test set (Spearman’s ρ = .15, p < .001). These results highlight the potential to harness natural language processing and machine learning to predict a key psychotherapy process variable that is relatively distal from linguistic content. Six practical suggestions for conducting psychotherapy research using machine learning are presented along with several directions for future research. Questions of dissemination and implementation may be particularly important to explore as machine learning improves in its ability to automate assessment of psychotherapy process and outcome.
Editor’s Note. Sigal Zilcha-Mano served as the action editor for this
article.—DMK Jr.
Simon B. Goldberg, Department of Counseling Psychology, University of Wisconsin–Madison; Nikolaos Flemotomos, Department of Electrical Engineering, University of Southern California; Victor R. Martinez, Department of Computer Science, University of Southern California; Michael J. Tanana, College of Social Work, University of Utah; Patty B. Kuo, Department of Educational Psychology, University of Utah; Brian T. Pace, Department of Educational Psychology, University of Utah, and Veterans Affairs Palo Alto Health Care System, Palo Alto, California; Jennifer L. Villatte, Department of Psychiatry and Behavioral Sciences, University of Washington; Panayiotis G. Georgiou, Department of Electrical Engineering, University of Southern California; Jake Van Epps, University of Utah Counseling Center, University of Utah; Zac E. Imel, Department of Educational Psychology, University of Utah; Shrikanth S. Narayanan, Department of Electrical Engineering, University of Southern California; David C. Atkins, Department of Psychiatry and Behavioral Sciences, University of Washington.
Michael J. Tanana, David C. Atkins, Shrikanth S. Narayanan, and Zac E.
Imel are cofounders with equity stake in a technology company, Lyssn.io,
focused on tools to support training, supervision, and quality assurance of
psychotherapy and counseling. Shrikanth S. Narayanan is chief scientist
and co-founder with equity stake of Behavioral Signals, a technology
company focused on creating technologies for emotional and behavioral
machine intelligence. The remaining authors report no conflicts of interest.
Portions of the data presented in this article were reported at the North
American Society for Psychotherapy Research meeting in Park City, UT in
September 2018. Funding was provided by the National Institutes of
Health/National Institute on Alcohol Abuse and Alcoholism (Award R01/
AA018673). Support for this research was also provided by the University
of Wisconsin-Madison, Office of the Vice Chancellor for Research and
Graduate Education with funding from the Wisconsin Alumni Research
Foundation.
Correspondence concerning this article should be addressed to Simon B.
Goldberg, Department of Counseling Psychology, University of Wisconsin–
Madison, 335 Education Building, 1000 Bascom Mall, Madison, WI 53703.
E-mail: sbgoldberg@wisc.edu
Journal of Counseling Psychology, © 2020 American Psychological Association
2020, Vol. 67, No. 4, 438–448
ISSN: 0022-0167 http://dx.doi.org/10.1037/cou0000382
Public Significance Statement
Our study suggests that client-rated therapeutic alliance can be predicted using session content
through machine learning models, albeit modestly.
Keywords: machine learning, natural language processing, methodology, artificial intelligence,
therapeutic alliance
Supplemental materials: http://dx.doi.org/10.1037/cou0000382.supp
New directions in science are launched by new tools much more than
by new concepts. The effect of a concept-driven revolution is to
explain old things in new ways. The effect of a tool-driven revolution is to discover new things that have to be explained. (Freeman Dyson, 1998, pp. 50–51)
Whether or not we know it, and certainly whether or not we like
it, machine learning (ML) is transforming modern life. From eerily
prescient Google search suggestions or Amazon product recom-
mendations to iPhones capable of understanding spoken language
(i.e., Siri), ML undergirds many of the most commonplace tech-
nologies of industrialized society. Manifestations range from the
seemingly benign or mundane to the perhaps more pernicious (e.g.,
targeted advertising). These contemporary conveniences are based
on a family of quantitative methods that are rapidly changing
science and technology and fall under the general umbrella of
artificial intelligence. The term artificial intelligence has been
defined as “the study of agents that receive percepts from the
environment and perform actions” (Russell & Norvig, 2016, p.
viii). Early work on artificial intelligence dates back to the 1950s
(e.g., Turing, 1950). ML combines pattern recognition and statis-
tical inference and plays an integral role within the inner workings
of artificial intelligence. ML can be defined as “the study of
computer algorithms capable of learning to improve their perfor-
mance of a task on the basis of their own previous experience”
(Mjolsness & DeCoste, 2001, p. 2051).
The ways that ML has impacted scientific research and industry are hard to overstate (Jordan & Mitchell, 2015; Mjolsness & DeCoste, 2001; Stead, 2018). Evidence for the widespread relevance of ML
dates back several decades (e.g., detecting fraudulent credit card
transactions; Mitchell, 1997). More recent ML-based innovations
in medicine include detection of diabetic retinopathy (Gulshan et al.,
2016), informing cancer treatment decision making (Bibault, Gi-
raud, & Burgun, 2016), and predicting disease outbreak (Chen,
Hao, Hwang, Wang, & Wang, 2017). Innovations based on ML are
occurring in basic science as well (e.g., materials science; Butler,
Davies, Cartwright, Isayev, & Walsh, 2018). While not all ML
applications in science and technology have gone smoothly (e.g.,
Google Flu consistently overestimating flu occurrence; Lazer,
Kennedy, King, & Vespignani, 2014), the potential is unequivocal.
Efforts to apply ML within mental health care are also underway
(for a recent scoping review, see Shatte, Hutchinson, & Teague,
2019). Examples include the use of passive sensing to predict
psychosis (e.g., data collected from sensors built into modern
smartphones; Insel, 2017;Wang et al., 2016), analysis of speech
signals to infer symptoms of depression (France, Shiavi, Silver-
man, Silverman, & Wilkes, 2000;Moore, Clements, Peifer, &
Weisser, 2008), prediction of treatment dropout from ecological
momentary assessment (Lutz et al., 2018), and the use of conver-
sational agents (i.e., computers) for clinical assessment and even
treatment (Miner, Milstein, & Hancock, 2017). While not incor-
porated in most settings, these ML-based innovations could dra-
matically change how mental health treatment and psychotherapy,
in particular, is provided. Importantly, once an ML algorithm has
been appropriately trained, it can be deployed at scale without
additional human judgment.
The Need for Innovation in Psychotherapy
Psychotherapy is in need of innovation. For one, mental health
care matters: mental health conditions are extremely common and
associated with enormous economic and social costs (Substance
Abuse and Mental Health Services Administration, 2014;Whit-
eford et al., 2013). Psychotherapy is a frontline treatment approach
(Cuijpers et al., 2014), with efficacy similar to psychotropic med-
ications and with potentially longer lasting benefits and fewer side
effects (Berwian, Walter, Seifritz, & Huys, 2017). Yet despite
enormous investment in psychotherapy in terms of therapist and
client time and health care dollars (Olfson & Marcus, 2010), what
actually happens in psychotherapy is largely unknown (i.e., is
unobserved). Psychotherapy research remains heavily reliant on
retrospective client or therapist self-report (e.g., Elliott, Bohart,
Watson, & Murphy, 2018;Flückiger, Del Re, Wampold, & Hor-
vath, 2018), limiting our understanding of actual therapist-client
interactions that drive treatment. We do know that treatment out-
comes vary widely, related to client (Lambert & Barley, 2001;
Thompson, Goldberg, & Nielsen, 2018), therapist (Baldwin &
Imel, 2013;Johns, Barkham, Kellett, & Saxon, 2019), relationship
(e.g., therapeutic alliance; Flückiger et al., 2018), and treatment-
specific factors.
One source of variability may be treatment quality. To date,
however, there are no established and routinely implemented
methods for quality control. The absence of quality control limits
clinical training, supervision, and the development of therapist
expertise (Tracey, Wampold, Lichtenberg, & Goodyear, 2014);
decreases the ability to demonstrate quality to payers (Fortney et
al., 2017); slows scientific progress in determining which treat-
ments are likely to succeed and why; and restricts efforts to
improve service delivery (Fairburn & Cooper, 2011). For these
reasons, psychotherapy researchers have developed numerous ob-
server rating systems to evaluate aspects of treatment quality (e.g.,
adherence and competence; Goldberg, Baldwin, et al., 2019;
Webb, DeRubeis, & Barber, 2010). Behavioral coding has been
invaluable in allowing researchers to understand what occurs in the
moment between therapists and clients that may contribute to
therapeutic change. However, human-coded rating systems are
labor intensive, expensive to implement, and not widely used in
community-based therapy (Fairburn & Cooper, 2011). Clients may
also be asked to provide evaluation of treatment quality (e.g.,
measures of satisfaction, therapeutic alliance; Flückiger et al.,
2018). Regular use of these kinds of measures, while they are robust predictors of outcome (Flückiger et al., 2018), increases burden on clients and providers, is at risk for response set biases (e.g., social desirability) and random error, and has known psychometric limitations (e.g., ceiling effects; Tryon, Blackwell, & Hammel, 2008).
The New Tools of Psychotherapy Research
Recent methodological advances may be quickly changing our
ability to process the complex data of psychotherapy (Imel, Cap-
erton, Tanana, & Atkins, 2017) and could allow automated assess-
ment of treatment quality along with other outcome and process
variables. Two related innovations include the development of
natural language processing (NLP) and ML. As spoken language
forms a key component of most psychotherapies, the ability to
rapidly and reliably process speech (or text) data may allow
routine assessment of treatment quality and evaluation of numer-
ous other constructs of interest. Several recent proof-of-concept
examples have appeared in the literature, including using NLP and
ML to reliably code motivational interviewing treatment fidelity
(Atkins, Steyvers, Imel, & Smyth, 2014;Imel et al., in press), to
differentiate classes of psychotherapy (e.g., cognitive-behavioral
therapy and psychodynamic psychotherapy; Imel, Steyvers, &
Atkins, 2015), and to identify linguistic behaviors of effective
counselors in text-based crisis counseling (Althoff, Clark, & Les-
kovec, 2016).
The current study extends these efforts further by employing
NLP and ML to predict one of the most studied process variables
in psychotherapy: the therapeutic alliance (Flückiger et al., 2018).
This was examined within the context of a large, naturalistic
psychotherapy dataset drawn from a university counseling center.
Session recordings were available for 1,235 sessions of 386 clients
seen by 40 therapists. NLP and ML methods were used to predict
client-rated alliance from session recordings.
Alliance is used as a test case to demonstrate the potential
applicability of NLP and ML for several reasons. First, alliance is
important for effective psychotherapy, based on its robust relation-
ship with outcome (Flückiger et al., 2018). Second, alliance, unlike
other more objective linguistic features (e.g., ratio of open and
closed questions in motivational interviewing adherence coding;
Miller, Moyers, Ernst, & Amrhein, 2003), requires a potentially
higher order of processing to assess (e.g., through the cognitive
and affective system of a client, therapist, or observer providing
alliance ratings). This additional level of abstraction likely makes
automated prediction more difficult, but also more widely relevant
if it can be accomplished. Third, alliance represents a relatively old
concept (Bordin, 1979;Greenson, 1965) that may be less viable for
concept-driven innovations (Dyson, 1998). New tools, however,
could drive innovation in this area. There are also important open
questions related to alliance, such as the proportion and cause of
therapist and client contributions to alliance (Baldwin, Wampold,
& Imel, 2007), the source of unreliability in alliance ratings across
rating perspectives (i.e., client, therapist, and observer; Tichenor &
Hill, 1989), the state- versus traitlike qualities of alliance (Zilcha-
Mano, 2017), the potentially causal nature of alliance as a driver of
symptom change (Falkenström, Granström, & Holmqvist, 2013;
Flückiger et al., 2018;Zilcha-Mano & Errázuriz, 2017), and ways
to include alliance assessment in routine clinical care without
increasing participant burden (Duncan et al., 2003;Goldberg,
Rowe, et al., 2019). While NLP and ML are likely not a panacea for
resolving all outstanding debates regarding alliance, they may be
useful research tools. Theoretically, these questions could be ad-
dressed more thoroughly if ML enabled alliance assessment on a
much larger scale, particularly if ML models were built in a way
to minimize construct irrelevant variance (e.g., social desirability).
Ultimately, assessment of alliance could be automated using ML,
providing clients and therapists with ongoing information about
this aspect of therapeutic process without the drawbacks (e.g., time
required, psychometric issues) of repeated self-report assessment.
Such technology could also be used to assess alliance directly from
session transcripts or recordings.
Prior to presenting a preliminary attempt at assessing alliance
using NLP and ML, it is worth introducing basic concepts involved
in each methodology. This is, of course, intended to be a cursory
treatment and interested readers are encouraged to review sources
cited below.
Basics of NLP
NLP is a subfield of computer science and linguistics focused on
the interaction between machines and humans through language
(Jurafsky & Martin, 2014). NLP aims to understand human com-
munication by processing and analyzing large quantities of textual
data. Popular applications of NLP include machine translation
(e.g., Google Translate), question-answering systems, or sentiment
analysis (e.g., extraction of sentiments within social media).
Typically, NLP applications start with a collection of raw text
documents (i.e., a language corpus). From this corpus, the first step
is to extract or estimate quantitative features from the text. One of
the most widely used NLP features is the bag-of-words represen-
tation (BoW). In BoW, each document is represented by counts of
its unique words, without regard to the ordering of these words.
Conceptually, BoW is a large crosstabulation table of words by
documents. Other common text features include N-grams (Shan-
non, 1948), which are short multiword phrases with N elements
(e.g., bigrams include 2-word phrases); dictionary-based features,
such as those provided by Linguistic Inquiry and Word Count
(LIWC; Pennebaker, Boyd, Jordan, & Blackburn, 2015) or the
General Inquirer (Stone, Bales, Namenwirth, & Ogilvie, 1962);
and dialogue acts (Okada et al., 2016), which try to capture a
high-level interaction between participants in a conversation (i.e.,
“statement,” “question,” etc.). More recently, linguistic units are
converted to a vector-space representation of either word (Mikolov,
Sutskever, Chen, Corrado, & Dean, 2013; Pennington, Socher, &
Manning, 2014) or sentence (Pagliardini, Gupta, & Jaggi, 2017)
embeddings, which capture the semantic context. Words (or sen-
tences) that appear in similar contexts appear closer to each other
in vector space, and semantic relationships are represented by the
operations of addition and subtraction (e.g., v(king) − v(man) + v(woman) ≈ v(queen), where v(w) represents the vector for word w).
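To make these feature representations concrete, the following minimal Python sketch (using scikit-learn, which is also used later in this article; the toy documents are hypothetical and not drawn from study data) extracts bag-of-words counts and tf-idf weights for unigrams and bigrams from a small corpus.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# A toy corpus: each "document" stands in for one transcript (the text is invented).
docs = [
    "i feel anxious about work and about my family",
    "we talked about goals for therapy and about homework",
    "my anxiety at work has been getting better",
]

# Bag-of-words: counts of unigrams and bigrams, ignoring word order beyond the n-gram window.
bow = CountVectorizer(ngram_range=(1, 2))
counts = bow.fit_transform(docs)          # sparse matrix: documents x n-gram vocabulary
print(counts.shape)

# tf-idf: re-weights the same counts so that n-grams common to many documents count less.
tfidf = TfidfVectorizer(ngram_range=(1, 2))
weights = tfidf.fit_transform(docs)
print(sorted(tfidf.vocabulary_)[:8])      # a peek at the learned n-gram vocabulary
```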
Basics of ML
The human brain has a remarkable ability to learn and recognize
patterns from its surrounding environment. ML comprises a set of
computational techniques simulating this capability (Haykin,
2009). As opposed to knowledge-based approaches, where a hu-
man designs an algorithm having specific rules in mind, ML is
typically based on data-driven methods and on statistical inference.
ML algorithms derive prediction rules from (typically) large
amounts of data.
Two major paradigms in ML are unsupervised and supervised
learning (Murphy, 2012). Similar to cluster analysis, unsupervised
learning does not involve an outcome to predict but rather focuses
on finding structure within a given set of data. Supervised learning
is similar to regression modeling, in which an outcome (either
discrete or continuous) is associated with a set of input data, and
the ML algorithm is tasked with finding an optimal mapping
function between the input data and the outcome (e.g., linking
linguistic content with alliance ratings). Once such a mapping has
been learned, it can be used to predict outcomes for new data.
Since the goal of ML is to apply the algorithm on previously
unseen data, ML analyses train algorithms on a subset of “training
data” but are evaluated on a separate subset of “test data.” Typical
supervised learning algorithms include support vector machines,
regularized linear or logistic regression, and decision trees (Mur-
phy, 2012). Recently, there has been rapid development and in-
creased focus on artificial neural networks and deep learning
techniques (Goodfellow, Bengio, & Courville, 2016).
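As a minimal illustration of supervised learning, the following Python sketch fits a ridge regression (the algorithm used later in this article) to synthetic data, training on one subset and evaluating on a held-out test set; all data and parameter values here are arbitrary.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic supervised-learning problem: 200 cases, 50 input features, and a continuous
# outcome that depends on the first feature plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
y = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=200)

# Hold out a test set so the learned mapping is evaluated on data it never saw in training.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = Ridge(alpha=1.0)      # regularized linear regression, a common supervised learner
model.fit(X_train, y_train)   # "training": learn the mapping from inputs to the outcome
print("test MSE:", mean_squared_error(y_test, model.predict(X_test)))
```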
Method
Participants and Setting
Data were collected at the counseling center of a large, Western
university. The counseling center provides approximately 10,000
sessions per year, with treatment focused on concerns common
among undergraduate and graduate students (e.g., depression, anx-
iety, substance use, academic concerns, relationship concerns;
Benton, Robertson, Tseng, Newton, & Benton, 2003). Treatment is
provided by a combination of licensed permanent staff (including
social workers, psychologists, and counselors) as well as trainees
pursuing masters- or doctoral-level mental health degrees (e.g.,
masters of social work, doctorate in counseling/clinical psychol-
ogy).
Data were collected between September 11, 2017 and December
11, 2018. Both clients and therapists provided consent for audio
recording of sessions and for use of recordings for the current
study. Recordings were made from microphones installed in clinic
offices and archived on clinic servers. Two microphones were
hung from the ceiling in each room. One cardioid choir mic was
hung to capture voice anywhere in the room and a second choir
mic pointed in the direction where the therapist generally sits. In
order for sessions to be recorded, clinicians had to start and stop
recordings (i.e., sessions were not recorded automatically). All
recordings were from individual therapy sessions (approximately
50 min in length). All audio recordings with associated alliance
ratings were used (i.e., no exclusions were made). Alliance is
assessed routinely in the clinic, with no standardized instructions
regarding how therapists use these ratings in therapy.
The current study was integrated into the partner clinic with
minimum modifications to the existing clinic workflow. One fea-
ture of the workflow is collecting alliance ratings prior to sessions,
rather than asking clients to complete measures both before (e.g.,
symptom ratings) and after (e.g., alliance ratings) session. When
making alliance ratings prior to session, clients were asked to
reflect on their experience of alliance at their previous session (i.e.,
time 1). In all models, alliance ratings were associated with the
session they were intended to represent (e.g., ratings made prior to
Session 2 were associated with Session 1). No alliance ratings
were made prior to the initial session. Study procedures were
approved by the relevant institutional review board.
Clients were, on average, 23.77 years old (SD = 4.86). The majority of the sample identified as female (n = 214, 55.4%), with the remainder identifying as male (n = 158), nonbinary (n = 5), genderqueer (n = 1), gender neutral (n = 3), female-to-male transgender (n = 1), and questioning (n = 2), with two choosing not to respond. The client sample predominantly identified as White (n = 294, 76.2%), with the remainder identifying as Latinx (n = 33), Asian American (n = 28), African American (n = 5), Pacific Islander (n = 2), Middle Eastern (n = 1), and multiracial (n = 21), with two choosing not to respond.
Demographic data were available from 26 of the 40 included
therapists. Therapists were, on average, 35.15 years old (SD = 14.04). The majority identified as female (n = 17, 65.4%), with the remainder identifying as male (n = 7) or genderqueer (n = 1). The majority identified as White (n = 15, 57.7%), with the remainder identifying as Latinx (n = 4), Asian American (n = 3), African American (n = 2), Middle Eastern (n = 1), and multiracial (n = 1).
Measures
Therapeutic alliance was assessed using a previously validated
(Imel, Hubbard, Rutter, & Simon, 2013) four-item version of the
Working Alliance Inventory—Short Form Revised (Hatcher &
Gillaspy, 2006) representing the bond, task, and goal dimensions
of alliance. Items included “_________ and I are working towards
mutually agreed upon goals” (goal), “I believe the way we are
working on my problem is correct” (task), “I feel that _________
appreciates me” (bond), and “_________ really understands me”
(bond). Items were rated on a 1 (Never) to 7 (Always) scale. A total score was computed by averaging across the four items. Internal consistency reliability was adequate in the current sample (α = .90). As noted above, ratings were made prior to each session
(starting with the second session) asking clients to reflect back on
their experience of alliance in the previous session. Although
alliance can be rated from various perspectives (e.g., client, ther-
apist, observer; Flückiger et al., 2018), the current study employed
client-rated alliance due to its robust link with treatment outcome,
ease of data collection, and ecological validity (i.e., the experience
of alliance largely exists in the subjective experience of the client).
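As a small illustration of the scoring just described, the following Python sketch (with invented item responses) computes the total score as the mean of the four items and estimates internal consistency with Cronbach's alpha.

```python
import numpy as np

# Hypothetical item responses: rows = clients, columns = the four alliance items (1-7 scale).
items = np.array([
    [6, 6, 5, 6],
    [7, 7, 7, 7],
    [4, 5, 4, 4],
    [6, 5, 6, 6],
], dtype=float)

# Total score: the mean of the four items for each client.
total = items.mean(axis=1)

# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of the summed score).
k = items.shape[1]
alpha = (k / (k - 1)) * (1 - items.var(axis=0, ddof=1).sum() / items.sum(axis=1).var(ddof=1))
print(total, round(alpha, 2))
```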
Data Analysis
For this study, we used 1,235 recorded sessions together with
client-reported alliance, assessed prior to the subsequent session
occurring between the same therapist and client. Audio recordings
were processed through a speech pipeline to generate automatic
speech-to-text transcriptions. The automatic speech recognition
made use of the open-source, freely available Kaldi software
(Povey et al., 2011). Components of the pipeline along with their
corresponding accuracy (vs. human transcription) using data from
the current study include: (a) a voice activity detector, where
speech segments are detected over silence or noise (unweighted average recall = 82.7%); (b) a speaker diarization system, where the speech is clustered into speaker-homogeneous groups (i.e., Speaker A, Speaker B; diarization error rate = 6.4%); (c) a speaker role recognizer, where each group is assigned the label “therapist” or “client” (misclassification rate = 0.0%); and (d) an automatic speech recognizer, which transduces speech to text (word error rate = 36.43%). The modules of the speech pipeline have been
adapted with the Kaldi speech recognition toolkit (Povey et al.,
2011) using psychotherapy sessions provided by the same coun-
seling center, but not used for the alliance prediction, thus not
inducing bias. A similar system architecture is described in Xiao et
al. (2016) and Flemotomos et al. (2019).
Linguistic features were extracted from resulting transcripts,
independently for therapist and client text. We report results using
unigrams and bigrams (i.e., 1- and 2-word pairings) weighted by
the term frequency-inverse document frequency (tf-idf; Salton &
McGill, 1986) or sentence (Sent2vec) embeddings (Pagliardini et
al., 2017). Tf-idf weighting accounts for the frequency with which
words appear within a given document (i.e., session), while also
considering its frequency within the larger corpus of text (i.e., all
sessions). This allows less commonly used words (e.g., suicide)
more weight than commonly used words (e.g., the). Thus, less
common words are treated as more important. Tf-idf weighting
was calculated across all sessions in the train set and applied to the
test set. As described earlier, Sent2vec maps sentences to vectors
of real numbers. Using Sent2vec, the session is represented as the
mean of its sentence embeddings. Models used linear regression
with L2-norm regularization (i.e., ridge regression; Hoerl & Ken-
nard, 1970), a method designed for highly correlated features, which are common in NLP data.
To estimate the performance of our method, experiments were
run using a 10-fold cross-validation: data is split into 10 parts, with
nine parts used for training at each iteration (train), and one for
evaluation (test). This is commonly used in ML and allows esti-
mation of the extent to which model results based on the training
set (train) will generalize to an independent sample (test). Train
and test sets were constructed so as not to share therapists between
them, as shared therapists could artificially inflate the model’s
accuracy. The algorithm is therefore expected to learn patterns of
words related to alliance ratings in general instead of capitalizing
on therapist-specific characteristics.
We employed two commonly used metrics of accuracy: mean
square error (MSE) and Spearman’s rank correlation (ρ). These
metrics reflect the accuracy of the ML algorithm when applied to
the test set. Specifically, mean squared error is the average of the
squared differences between the predictions and the true values
and is useful for comparing models, though its absolute value is
not interpretable. Spearman’s rank correlation measures the strength
of association between two variables, ranging from −1 to 1, with
higher values preferred.
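For readers who wish to see how these pieces fit together, the following Python sketch approximates the analytic pipeline described above: unigram and bigram tf-idf features, ridge regression, therapist-disjoint cross-validation folds, and evaluation with MSE and Spearman's ρ. This is not the authors' code (their syntax appears in the online supplemental materials); the transcripts, alliance ratings, and therapist identifiers below are simulated placeholders, and scikit-learn's GroupKFold is used as one way to keep a therapist from appearing in both train and test folds.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GroupKFold

# Hypothetical stand-ins for the real data: one transcript string, one client-rated
# alliance score, and one therapist ID per session.
rng = np.random.default_rng(0)
vocab = ["goals", "anxiety", "homework", "family", "work", "feelings", "coping", "stress"]
transcripts = [" ".join(rng.choice(vocab, size=40)) for _ in range(100)]
alliance = rng.uniform(1.75, 6.5, size=100)
therapist_ids = np.repeat(np.arange(10), 10)   # 10 therapists, 10 sessions each

# 10-fold cross-validation in which no therapist appears in both the train and test folds.
cv = GroupKFold(n_splits=10)
preds = np.empty_like(alliance)
for train_idx, test_idx in cv.split(transcripts, alliance, groups=therapist_ids):
    # Unigram + bigram tf-idf features; weights are learned on the training folds only.
    vec = TfidfVectorizer(ngram_range=(1, 2))
    X_train = vec.fit_transform([transcripts[i] for i in train_idx])
    X_test = vec.transform([transcripts[i] for i in test_idx])

    # L2-regularized (ridge) linear regression from text features to alliance ratings.
    model = Ridge(alpha=1.0)
    model.fit(X_train, alliance[train_idx])
    preds[test_idx] = model.predict(X_test)

# Evaluate the held-out predictions with the two metrics used in the paper.
rho, pval = spearmanr(alliance, preds)
print("MSE:", mean_squared_error(alliance, preds))
print("Spearman's rho:", rho, "p:", pval)
```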
Computer Software
Self-report data were processed within the R statistical environ-
ment (R Core Team, 2018). NLP and ML analyses were conducted using the
Python programming language (Python Software Foundation,
2019). Models used the “scikit-learn” toolkit (Pedregosa et al.,
2011) and the “sklearn.linear_model.Ridge” function (Hoerl &
Kennard, 1970; see Table 1 in the online supplemental materials
for syntax). Sent2vec was implemented using the method devel-
oped by Pagliardini et al. (2017) and N-grams obtained using the
text feature extraction in “scikit.”1
The time required for running
the speech pipeline and ML models can vary. In the current data,
the speech pipeline required approximately 30 min per 50-min
session using one core of an AMD Opteron Processor 6276 (2.3
GHz). The 10-fold cross-validation models took approximately 10
min on a MacBook Pro with 2.8 GHz Intel Core i7, 16 GB RAM,
and 2133 MHz LPDDR3.
Results
The sample included a total of 1,235 sessions with recordings
and associated alliance ratings (provided at the subsequent session;
n = 386 clients; 40 therapists). Clients had, on average, 3.20 sessions in the data set (SD = 2.50, range = 1 to 13) and therapists had 30.88 (SD = 32.97, range = 1 to 131). Sessions represented a variety of points in treatment, with a mean session number of 5.31 (SD = 3.37, range = 1 to 23). Across the 1,235 alliance ratings, the mean rating was 5.47 (SD = 0.83, median = 5.5, range = 1.75 to 6.50; see Figure 1 in the online supplemental
materials). Ratings showed the typical negative skew found in the
assessment of alliance (Tryon et al., 2008).
ML model results are presented in Table 1. Models are shown
using either therapist or client text as the input. Results are also
separated by feature extraction method (tf-idf, Sent2vec). The
baseline model reflects accuracy of the average rating (i.e., 5.47)
and is useful to evaluate model performance.
The predictions of three out of the four models are significantly
better than chance (Spearman’s ρ > .00, p < .01). The model that
used therapist text and extracted features using tf-idf performed
best overall, with MSE = 0.67 and ρ = 0.15, p < .001. For
illustrative purposes only, we extracted the 15 unigrams/bigrams
that were most positively or negatively correlated with alliance
ratings in our best performing model. As these features represent
only a small portion of the corresponding model, they should not
be viewed as a replacement for the full model. The 15 most
positively correlated unigrams/bigrams were: group, really, hus-
band, right, think, phone, values, maybe, divorce, got, yeah, situ-
ation, um right, don think, max. The 15 most negatively correlated
unigrams/bigrams were: counseling, yeah yeah, going, sure, cop-
ing, just want, friends, motivation, feeling, Monday, huh yeah, oh,
physical, pretty, time.
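For illustration only, the following Python sketch shows one simple way such terms might be surfaced: correlating each n-gram's tf-idf weight with the ratings and ranking the terms. This is not necessarily the procedure used in the study, and the toy documents and ratings below are invented.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy data (hypothetical); in the study these would be session transcripts and alliance ratings.
docs = [
    "we set goals together and it felt collaborative",
    "talked about my anxiety and stress at work",
    "a really helpful session about my family",
    "not sure this counseling approach is working for me",
]
ratings = np.array([6.0, 5.0, 6.5, 3.0])

vec = TfidfVectorizer(ngram_range=(1, 2))
X = vec.fit_transform(docs).toarray()
terms = vec.get_feature_names_out()

# Correlate each n-gram's tf-idf weight with the ratings (Pearson here, for simplicity)
# and rank the terms by that correlation.
corrs = np.array([np.corrcoef(X[:, j], ratings)[0, 1] for j in range(X.shape[1])])
order = np.argsort(corrs)
print("most negatively correlated:", terms[order[:5]])
print("most positively correlated:", terms[order[-5:]])
```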
1 Readers interested in working with text data in Python are encouraged to read the “scikit” and Kaldi tutorials (https://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html; https://kaldi-asr.org/doc/kaldi_for_dummies.html).
Discussion
The current study introduces two related quantitative methods—
NLP and ML—that have the potential to significantly expand
methodological tools available to psychotherapy researchers and
clinicians. The prediction of client-rated therapeutic alliance from
session recordings was used as a test case for these methods due to
the importance of alliance in psychotherapy and the potential
contribution of technologies able to reliably automate alliance
assessment. Results presented here suggest that ML models mod-
estly predict alliance ratings (ρ = .15). That is to say, there was
linguistic signal indicative of the strength of the alliance that is
detectable through ML, supporting the notion that ML may be a
useful tool for examining alliance in future studies.
It is worth contextualizing these results within the broader field
of speech signal processing and NLP as well as prior work spe-
cifically within the domain of psychotherapy research. An important
feature of the alliance, and part of the motivation to examine alliance,
is its greater degree of abstraction from the actual linguistic context of
a psychotherapy session. Compare alliance with another commonly
studied psychotherapy process variable—motivational interviewing
fidelity codes. Motivational interviewing codes are primarily lin-
guistic in nature (e.g., open vs. closed question; Miller et al., 2003)
and can be reliably coded by trained human raters and ML algo-
rithms at approximately similar levels (e.g., ρs ≈ .75 for use of
open questions over a session of motivational interviewing; Atkins
et al., 2014). Importantly, aspects of motivational interviewing
fidelity that show lower interrater reliability among human raters
(e.g., empathy) are also more difficult to predict via ML (e.g., ρs ≈ .25 for talk turns and .00 for sessions; Atkins et al., 2014).
Alliance, in contrast to most motivational interviewing fidelity
dimensions, requires in-depth processing by humans (i.e., client,
therapist, or observer) and is presumably influenced by a variety of
unobservable, nonlinguistic factors. It is exactly this nonlinguistic,
internal processing that may be more difficult for ML models to
replicate. This highlights a truism of NLP methodologies: behav-
iors more distal from linguistic content that are more difficult for
human raters to rate reliably will also be more difficult for ML
models to predict. This may make even more abstracted aspects of treatment, such as treatment outcome, yet more challenging to predict using ML.
Practical Suggestions
Given these potential limitations, there are six practical consid-
erations offered here that may increase the viability of ML to
contribute to psychotherapy research. Several of these are funda-
mental principles of ML reviewed previously but are worth high-
lighting due to the possibility that many readers may not be
familiar with them.
1. ML may be most promising for predicting observable
linguistic behaviors. For efforts employing ML using text
data, it may be valuable to start with observable behav-
iors that humans can code reliably using only text data
(e.g., treatment fidelity; Atkins et al., 2014). Human
reliability provides an estimate of the upper limit to
reliability likely to be achieved using ML models. Be-
haviors for which humans have difficulty reaching con-
sensus will likely be more challenging for ML models as
well.
2. ML models should be trained using human coding as the
gold standard. Related to the previous suggestion, it may
be prudent to develop ML models based on behaviors
that are observable and to use human-based ratings as the
standard for training ML algorithms. Thankfully, prom-
ising observer-rated measures of alliance and other psy-
chotherapy processes (e.g., empathy, treatment fidelity)
have been developed that may serve as a basis for future
ML psychotherapy research. While this has been done in
previous work on motivational interviewing (Atkins et
al., 2014;Xiao et al., 2016), this was not used in the
current study due both to resource limitations and an
interest in attempting to predict client (rather than ob-
server) ratings. However, ML models could be con-
structed predicting observer-rated alliance, which may be
less prone to client response set biases (e.g., social de-
sirability). While models using human coding as the basis
are a promising starting point, it may also be useful to
develop models attempting to predict more diffuse con-
structs that are not reliably rated by observers (e.g.,
treatment outcome).
3. ML models should be tested using large data sets. One of
the distinct advantages of ML is its potential to process
large amounts of data, an impractical task when using
human coders. However, for the development of reliable
ML algorithms, large amounts of training data are ideal.
The actual amount of data necessary varies widely de-
pending on the nature of the ML task, but data sets of
10,000 cases or more are commonly used in NLP appli-
cations. Given advances in NLP, researchers and clini-
cians who have access to high fidelity session recordings
may be able to convert existing recordings to text data for
ML models.
4. Develop models using a training set and test models using a test set. Similar to the rationale for employing separate samples for exploratory and confirmatory factor analysis (Gerbing & Hamilton, 1996), evaluation of ML algorithms requires separate samples. It is possible to get perfect accuracy within a training set, but this in no way indicates that results will be perfectly accurate in a future data set (i.e., for prediction). The need for separate samples echoes the need for large data sets when conducting ML.

Table 1
Results From Machine Learning Prediction Model

Model       Feature extraction method    MSE     ρ      p
Therapist   tf-idf                        .67    .15    <.001
            Sent2vec                     3.34    .08     .003
Client      tf-idf                        .69    .11    <.001
            Sent2vec                     3.67    .01     .800
Baseline    Average                       .69    .00     n/a

Note. Models employed unigrams and bigrams (i.e., 1- and 2-word pairings) and a linear regression with L2-norm regularization (i.e., ridge regression; Hoerl & Kennard, 1970). Models were evaluated using 10-fold cross-validation with nine parts used for model training and one used for evaluation. Therapist = therapist speech; Client = client speech; baseline = model results if the model always predicts the mean alliance rating (i.e., 5.47); MSE = mean square error; ρ = Spearman’s rank order correlation; tf-idf = term frequency-inverse document frequency weighting based on (inverse) frequency of occurrence within the document and larger corpus; Sent2vec = sentence embeddings used to map sentences to vectors of real numbers.
5. Develop interdisciplinary collaborations. Most psychother-
apy researchers are not trained in ML during graduate
school. As these models depart in some important ways
from traditional quantitative methods used in psychology
(e.g., regression and analysis of variance), it may be vital for
researchers interested in ML to build collaborations with
colleagues more versed in the intricacies of ML. Research-
ers with expertise in processing linguistic data, with back-
grounds in computer science and engineering, for example,
may be ideal complements to the clinical and context ex-
pertise brought by psychologists. Of course, interdisciplin-
ary collaborations involve their own complexity, with re-
searchers working across disciplinary cultures, practices,
and standards.
6. Have reasonable expectations and avoid the risk of “al-
chemy.” A final suggestion is that those interested in pur-
suing ML-based psychotherapy research have reasonable
expectations about the promise of these methods, and the
speed with which they will become viable tools. One con-
cern is that ML-based models simply replicate the human
biases in the patient-rated measures: if the model accurately
learns the human rating, it will also include ceiling effects,
social desirability, and other potentially construct-irrelevant
variance. In addition, it is encouraged that ML not be
viewed as a form of alchemy (Hutson, 2018) in which ML
becomes a quasimagical black box for researchers and con-
sumers of research. ML research, like other research meth-
odologies, is likely to benefit from transparency, humility,
and replication (Open Science Collaboration, 2015) along
with a healthy dose of skepticism.
Future Directions
Consistent with these practice suggestions, future work should
continue to explore important psychotherapy process and outcome
variables using linguistic, paralinguistic (e.g., prosody, pitch), and
nonverbal therapy behaviors. Ideally this is done using large data
sets (e.g., Ns ≥ 10,000 sessions). The current study focused on
alliance, but future work could use similar methods to predict
treatment outcome (e.g., Hamilton Rating Scale of Depression;
Hamilton, 1960), multicultural competence (Tao, Owen, Pace, &
Imel, 2015), empathy (Imel et al., 2014), interpersonal skill (An-
derson, Ogles, Patterson, Lambert, & Vermeersch, 2009), treat-
ment fidelity (e.g., Cognitive Therapy Rating Scale; Creed et al.,
2016;Goldberg, Baldwin, et al., 2019), and other variables previ-
ously assessed using observer ratings (e.g., innovative moments;
Gonçalves, Ribeiro, Mendes, Matos, & Santos, 2011).
Development will also ideally occur in tandem with attention to
measurement and known issues in psychotherapy research. For
example, future work should consider likely bias in the measure-
ment of alliance. Clients whose ratings are invariant across ses-
sions (e.g., consistently provided alliance ratings at the ceiling of
the measure) could be removed from ML models, perhaps even-
tually providing models that better predict the correlates of alliance
(e.g., treatment retention) than self-report. Or ML models could be
used to determine when collecting self-report alliance data would
provide information beyond what analysis of session content could
provide (e.g., models predicting discrepancies between ML-based
and self-report alliance ratings). It also may be worthwhile at-
tempting to predict therapist-level alliance scores using session
content and ratings aggregated across multiple clients.
The current cross-validation design allowed no therapist to
appear in both the train and test sets. Conceptually, this ML
approach is trying to discover a universal model for mapping
language to alliance, and as such, it is the hardest and most
conservative modeling approach. Alternative strategies would al-
low therapists to be in both train and test sets, which allows a
model to learn individual-specific mappings of text to alliance to
support prediction of future alliance scores for either therapist or
client. It could be valuable to explore these additional models in
future work.
Provided ML models continue to improve in their ability to
detect important aspects of psychotherapy, questions of dissemi-
nation and implementation will become increasingly central. Many
potentially valuable technologies have existed for years (e.g.,
models detecting depression symptoms via speech features; France
et al., 2000), yet are not widely implemented. There are, of course,
numerous reasons that innovations may not be adopted, and con-
siderable scholarship focused on precisely this research-to-practice
impasse (e.g., Wandersman et al., 2008). Part of the solution to
bringing ML-based technologies to market may require research-
ers moving outside of the traditional academic boundaries and
developing collaborations with industry. For clinicians and re-
searchers alike, there may be discomfort with the notion of part-
nering with for-profit entities with fears of disruptions in objec-
tivity that form the theoretical backbone of both science and
practice (DeAngelis, 2000). While these concerns may be valid,
these partnerships may play a central role in bringing novel tech-
nologies such as those based on ML to the therapists and clients
who could benefit from them.
Gaining buy-in from clinicians is another dissemination and
implementation barrier. Clinician discomfort discussed in relation
to measurement-based care (e.g., Boswell, Kraus, Miller, & Lam-
bert, 2015;Fortney et al., 2017;Goldberg et al., 2016) may very
well be magnified when clinicians are asked to routinely record
therapy sessions. Discomfort may be further magnified knowing
that these recordings will subsequently be analyzed by a computer
algorithm to determine treatment quality, therapeutic alliance, or
outcome. Sensitivity to these and other dissemination and imple-
mentation issues will be crucial for moving this work forward.
A final future direction to mention is the importance of ulti-
mately evaluating whether ML-based feedback— be it focused on
alliance, fidelity, or any other aspect of treatment—actually pro-
vides benefits. The benefit of interest may depend on the stake-
holder: for payers, this may involve demonstrating the quality of
services; for clinicians, this may involve demonstrating improved
client outcomes; and for researchers, this may involve demonstrat-
ing reliability and validity with reduced cost of research team time
and money. It is likely these metrics will ultimately determine
whether ML can transform psychotherapy.
Limitations
While promising, the current study has several important limi-
tations. The first is the relatively modest sample size. While large
by human coding standards, the current number of sessions eval-
uated is well below the samples often used for ML. As noted
previously, ML models improve with larger amounts of training
data. Thus, the available sample size may have reduced the ability
to predict alliance ratings from session recordings.
Another limitation is related to the available speech signal
processing technology. In particular, existing NLP technologies
have known limitations, including inaccuracy in transcription (i.e.,
misinterpreting spoken words) and errors in assigning speech to a
given speaker (i.e., diarization). These factors introduce error
variability into the text data which functions to reduce statistical
power and the accuracy of the ML models.
A third key limitation is related to the assessment of alliance.
For one, ratings were made retrospectively (i.e., about a prior
session). Collecting ratings at time points more distant from the
actual session may have reduced linkages between ratings and
session content and thereby decreased the signal available for
detection (i.e., exerting a conservative rather than liberal bias on
our ability to predict alliance ratings from session content). Sim-
ilarly, there was evidence that alliance ratings in the current study
suffered from range restriction due to the well-documented ceiling
effects for ratings of alliance (Tryon et al., 2008). Range restriction
also may have decreased statistical power and the ability to reli-
ably predict alliance ratings (Cohen, Cohen, West, & Aiken,
2003). For this reason, it may be useful to examine alliance in
other contexts in which ratings may be more variable (e.g., clients
with more severe personality psychopathology). Lastly, alliance
was assessed only by clients. While relevant and ecologically
valid, accuracy may have been improved for predicting observer-
rated alliance in which observers and ML algorithms had access to
the same information (i.e., session text).
Clinical Vignette
The algorithm developed in the current study is only a first
attempt at predicting alliance ratings using ML, but these initial
results suggest a potential future for using these technologies in
clinical research and practice. We imagine a future application in
the following vignette. This example indicates how ML-generated
analytics derived directly from the session encounter can be used
as another source of information for the therapist to reflect on their
work and potentially improve the process of therapy.
Sandra is a 43-year-old, married, African American, cisgender
female who has been struggling with social anxiety since adoles-
cence. She is a school librarian and the mother of two teenage
sons. She has recently begun working with a psychologist, Dr.
Martinez, due to “increasing stress and anxiety” at work which is
beginning to spill over into Sandra’s family life. She reports she
has trouble “asserting herself and expressing her needs” at home
and at work.
During the intake session, Dr. Martinez shares with Sandra that
the clinic has been using a recording platform that can provide Dr.
Martinez with information about how therapy might be going, in
particular, feedback on the therapy “relationship.” Sandra provides
her consent for use of the platform. Therapy starts out smoothly,
with Sandra sharing more about the difficulties she is experienc-
ing, which in recent months have included periodic panic attacks
in social situations. Dr. Martinez, who primarily operates from a
cognitive-behavioral therapy perspective, introduces exposure therapy
as a treatment approach for reducing her symptoms.
During the fifth session, Dr. Martinez initiates a conversation
about Sandra’s progress in treatment. Sandra reports that therapy is
going “just fine” and she apologizes for not having had the time to
complete the exposure exercises Dr. Martinez had recommended.
Dr. Martinez reflects that she knows it can be challenging to make
the time for engaging in therapy “homework” and that the expo-
sures themselves can be unpleasant. Sandra quickly assures Dr.
Martinez that she will try to do a better job making time for
exposures.
Through the treatment, Dr. Martinez has been reviewing ses-
sions and automated feedback on the quality of her relationship
with Sandra and has noticed that the alliance scores generated by
the system have been low in the past two sessions. Although
Sandra indicated in session that treatment was going fine, the
alliance algorithm was built using observer-rated alliance that is
less contaminated with self-report biases (e.g., social desirability).
Dr. Martinez uses this opportunity to discuss the automated feed-
back with Sandra:
You know Sandra, I was reviewing some feedback I received on our
sessions last week, and it suggested that it might be smart for me to
check in with you again on how things are going. I know you said
things are fine, but I can’t help wonder if there’s something I’m
missing. I’d really like to know.
At this point, Sandra notes that she has been having trouble with
Dr. Martinez’s therapeutic approach. Sandra shares that she has
been having significant difficulties in her marriage recently and
has experienced several racial microaggressions at work that have
contributed to her anxiety. Sandra notes that she was hoping to
discuss these events in therapy but was not sure how to bring them
up, given Dr. Martinez’s emphasis on exposure therapy and San-
dra’s difficulty completing her exposure exercises. Dr. Martinez
expresses her appreciation to Sandra for sharing this. They begin
a discussion of ways to refocus treatment to include these addi-
tional dimensions.
Conclusion
The current study introduced and attempted to model ML as a
statistical approach that may be relevant for addressing important
questions about psychotherapy. Just as ML is centrally involved in
numerous cultural, technological, and social changes, it may also
play a leading role in future innovation within psychotherapy
research and practice. Our prediction of therapeutic alliance dis-
cussed here is one of several recent examinations of potential
synergy between ML and psychotherapy. As available sample
sizes grow and technology evolves, it may well be that ML
algorithms can be developed to even more reliably detect treatment
features like alliance from session recordings. Clearly such tech-
nologies could dramatically revolutionize training and provision of
clinical services. In a way, these methods, while heavily reliant on
computers and artificial intelligence, may prove crucial in helping
human researchers and clinicians unravel the dizzying complexity
of the human interaction that is psychotherapy.
References
Althoff, T., Clark, K., & Leskovec, J. (2016). Large-scale analysis of
counseling conversations: An application of natural language processing
to mental health. Transactions of the Association for Computational
Linguistics, 4, 463– 476. http://dx.doi.org/10.1162/tacl_a_00111
Anderson, T., Ogles, B. M., Patterson, C. L., Lambert, M. J., & Ver-
meersch, D. A. (2009). Therapist effects: Facilitative interpersonal skills
as a predictor of therapist success. Journal of Clinical Psychology, 65,
755–768. http://dx.doi.org/10.1002/jclp.20583
Atkins, D. C., Steyvers, M., Imel, Z. E., & Smyth, P. (2014). Scaling up the
evaluation of psychotherapy: Evaluating motivational interviewing fi-
delity via statistical text classification. Implementation Science, 9, 49.
http://dx.doi.org/10.1186/1748-5908-9-49
Baldwin, S. A., & Imel, Z. E. (2013). Therapist effects: Findings and
methods. In M. J. Lambert (Ed.), Bergin and Garfield’s handbook of
psychotherapy and behavior change (6th ed., pp. 258 –297). Hoboken,
NJ: Wiley.
Baldwin, S. A., Wampold, B. E., & Imel, Z. E. (2007). Untangling the
alliance-outcome correlation: Exploring the relative importance of ther-
apist and patient variability in the alliance. Journal of Consulting and
Clinical Psychology, 75, 842–852.
Benton, S. A., Robertson, J. M., Tseng, W. C., Newton, F. B., & Benton,
S. L. (2003). Changes in counseling center client problems across 13
years. Professional Psychology: Research and Practice, 34, 66 –72.
http://dx.doi.org/10.1037/0735-7028.34.1.66
Berwian, I. M., Walter, H., Seifritz, E., & Huys, Q. J. (2017). Predicting
relapse after antidepressant withdrawal - a systematic review. Psychological
Medicine, 47, 426 – 437. http://dx.doi.org/10.1017/S0033291716002580
Bibault, J. E., Giraud, P., & Burgun, A. (2016). Big data and machine learning
in radiation oncology: State of the art and future prospects. Cancer Letters,
382, 110 –117. http://dx.doi.org/10.1016/j.canlet.2016.05.033
Bordin, E. S. (1979). The generalizability of the psychoanalytic concept of
the working alliance. Psychotherapy: Theory, Research & Practice, 16,
252–260. http://dx.doi.org/10.1037/h0085885
Boswell, J. F., Kraus, D. R., Miller, S. D., & Lambert, M. J. (2015).
Implementing routine outcome monitoring in clinical practice: Benefits,
challenges, and solutions. Psychotherapy Research, 25, 6 –19. http://dx
.doi.org/10.1080/10503307.2013.817696
Butler, K. T., Davies, D. W., Cartwright, H., Isayev, O., & Walsh, A.
(2018). Machine learning for molecular and materials science. Nature,
559, 547–555. http://dx.doi.org/10.1038/s41586-018-0337-2
Chen, M., Hao, Y., Hwang, K., Wang, L., & Wang, L. (2017). Disease
prediction by machine learning over big data from healthcare commu-
nities. IEEE Access: Practical Innovations, Open Solutions, 5, 8869 –
8879. http://dx.doi.org/10.1109/ACCESS.2017.2694446
Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple
regression/correlation analysis for the behavioral sciences (3rd ed.).
Mahwah, NJ: Erlbaum.
Creed, T. A., Frankel, S. A., German, R. E., Green, K. L., Jager-Hyman, S.,
Taylor, K. P.,...Beck, A. T. (2016). Implementation of transdiagnostic
cognitive therapy in community behavioral health: The Beck Commu-
nity Initiative. Journal of Consulting and Clinical Psychology, 84,
1116 –1126. http://dx.doi.org/10.1037/ccp0000105
Cuijpers, P., Sijbrandij, M., Koole, S. L., Andersson, G., Beekman, A. T.,
& Reynolds, C. F., III. (2014). Adding psychotherapy to antidepressant
medication in depression and anxiety disorders: A meta-analysis. World
Psychiatry, 13, 56 – 67. http://dx.doi.org/10.1002/wps.20089
DeAngelis, C. D. (2000). Conflict of interest and the public trust. Journal
of the American Medical Association, 284, 2237–2238. http://dx.doi.org/
10.1001/jama.284.17.2237
Duncan, B. L., Miller, S. D., Sparks, J. A., Claud, D. A., Reynolds, L. R.,
& Johnson, L. D. (2003). The Session Rating Scale: Preliminary psy-
chometric properties of a “working” alliance measure. Journal of Brief
Therapy, 3, 3–12.
Dyson, F. J. (1998). Imagined worlds (Vol. 6). Cambridge, MA: Harvard
University Press.
Elliott, R., Bohart, A. C., Watson, J. C., & Murphy, D. (2018). Therapist
empathy and client outcome: An updated meta-analysis. Psychotherapy,
55, 399 – 410. http://dx.doi.org/10.1037/pst0000175
Fairburn, C. G., & Cooper, Z. (2011). Therapist competence, therapy
quality, and therapist training. Behaviour Research and Therapy, 49,
373–378. http://dx.doi.org/10.1016/j.brat.2011.03.005
Falkenström, F., Granström, F., & Holmqvist, R. (2013). Therapeutic
alliance predicts symptomatic improvement session by session. Journal
of Counseling Psychology, 60, 317–328. http://dx.doi.org/10.1037/
a0032258
Flemotomos, N., Martinez, V., Chen, Z., Singla, K., Peri, R., Ardulov, V.,
& Narayanan, S. (2019). A speech and language pipeline for quality
assessment of recorded psychotherapy sessions. Manuscript in prepara-
tion.
Flückiger, C., Del Re, A. C., Wampold, B. E., & Horvath, A. O. (2018).
The alliance in adult psychotherapy: A meta-analytic synthesis. Psycho-
therapy, 55, 316 –340. http://dx.doi.org/10.1037/pst0000172
Fortney, J. C., Unützer, J., Wrenn, G., Pyne, J. M., Smith, G. R., Schoe-
nbaum, M.,...Harbin, H. T. (2017). A tipping point for measurement-
based care. Psychiatric Services, 68, 179 –188. http://dx.doi.org/10
.1176/appi.ps.201500439
France, D. J., Shiavi, R. G., Silverman, S., Silverman, M., & Wilkes, D. M.
(2000). Acoustical properties of speech as indicators of depression and
suicidal risk. IEEE Transactions on Biomedical Engineering, 47, 829 –
837. http://dx.doi.org/10.1109/10.846676
Gerbing, D. W., & Hamilton, J. G. (1996). Viability of exploratory factor
analysis as a precursor to confirmatory factor analysis. Structural Equa-
tion Modeling, 3, 62–72. http://dx.doi.org/10.1080/10705519609540030
Goldberg, S. B., Babins-Wagner, R., Rousmaniere, T., Berzins, S., Hoyt,
W. T., Whipple, J. L.,...Wampold, B. E. (2016). Creating a climate for
therapist improvement: A case study of an agency focused on outcomes
and deliberate practice. Psychotherapy, 53, 367–375. http://dx.doi.org/
10.1037/pst0000060
Goldberg, S. B., Baldwin, S. A., Merced, K., Caperton, D., Imel, Z. E., Atkins,
D. C., & Creed, T. (2019). The structure of competence: Evaluating the
factor structure of the Cognitive Therapy Rating Scale. Behavior Therapy.
Advance online publication. http://dx.doi.org/10.1016/j.beth.2019.05.008
Goldberg, S. B., Rowe, G., Malte, C. A., Ruan, H., Owen, J. J., & Miller,
S. D. (2019). Routine monitoring of therapeutic alliance to predict
treatment engagement in a Veterans Affairs substance use disorders
clinic. Psychological Services. Advance online publication. http://dx.doi
.org/10.1037/ser0000337
Gonçalves, M. M., Ribeiro, A. P., Mendes, I., Matos, M., & Santos, A.
(2011). Tracking novelties in psychotherapy process research: The in-
novative moments coding system. Psychotherapy Research, 21, 497–
509.
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. Cam-
bridge, MA: MIT Press.
Greenson, R. R. (1965). The working alliance and the transference neurosis.
The Psychoanalytic Quarterly, 34, 155–181. http://dx.doi.org/10.1080/
21674086.1965.11926343
Gulshan, V., Peng, L., Coram, M., Stumpe, M. C., Wu, D., Narayanas-
wamy, A., . . . Webster, D. R. (2016). Development and validation of a
deep learning algorithm for detection of diabetic retinopathy in retinal
fundus photographs. Journal of the American Medical Association, 316,
2402–2410. http://dx.doi.org/10.1001/jama.2016.17216
Hamilton, M. (1960). A rating scale for depression. Journal of Neurology,
Neurosurgery and Psychiatry, 23, 56 – 62. http://dx.doi.org/10.1136/jnnp.23
.1.56
Hatcher, R. L., & Gillaspy, J. A. (2006). Development and validation of a
revised short version of the Working Alliance Inventory. Psychotherapy
Research, 16, 12–25. http://dx.doi.org/10.1080/10503300500352500
Haykin, S. S. (2009). Neural networks and learning machines (3rd ed.).
Upper Saddle River, NJ: Pearson.
Hoerl, A. E., & Kennard, R. W. (1970). Ridge regression: Biased estima-
tion for nonorthogonal problems. Technometrics, 12(1), 55– 67.
Hutson, M. (2018). Has artificial intelligence become alchemy? Science,
360, 478. http://dx.doi.org/10.1126/science.360.6388.478
Imel, Z. E., Barco, J. S., Brown, H. J., Baucom, B. R., Baer, J. S., Kircher,
J. C., & Atkins, D. C. (2014). The association of therapist empathy and
synchrony in vocally encoded arousal. Journal of Counseling Psychol-
ogy, 61, 146 –153. http://dx.doi.org/10.1037/a0034943
Imel, Z. E., Caperton, D. D., Tanana, M., & Atkins, D. C. (2017). Technology-
enhanced human interaction in psychotherapy. Journal of Counseling Psy-
chology, 64, 385–393. http://dx.doi.org/10.1037/cou0000213
Imel, Z. E., Hubbard, R. A., Rutter, C. M., & Simon, G. (2013). Patient-
rated alliance as a measure of therapist performance in two clinical
settings. Journal of Consulting and Clinical Psychology, 81, 154 –165.
http://dx.doi.org/10.1037/a0030903
Imel, Z. E., Pace, B. T., Soma, C. S., Tanana, M., Gibson, J., Hirsch, T.,
. . . Atkins, D. C. (in press). Initial development and evaluation of an
automated, interactive, web-based therapist feedback system for moti-
vational interviewing fidelity. Psychotherapy.
Imel, Z. E., Steyvers, M., & Atkins, D. C. (2015). Computational psycho-
therapy research: Scaling up the evaluation of patient-provider interac-
tions. Psychotherapy, 52, 19 –30. http://dx.doi.org/10.1037/a0036841
Insel, T. R. (2017). Digital phenotyping. Journal of the American Medical
Association, 318, 1215–1216. http://dx.doi.org/10.1001/jama.2017.11295
Johns, R. G., Barkham, M., Kellett, S., & Saxon, D. (2019). A systematic
review of therapist effects: A critical narrative update and refinement to
review. Clinical Psychology Review, 67, 78 –93. http://dx.doi.org/10.1016/
j.cpr.2018.08.004
Jordan, M. I., & Mitchell, T. M. (2015). Machine learning: Trends, perspec-
tives, and prospects. Science, 349, 255–260. http://dx.doi.org/10.1126/
science.aaa8415
Jurafsky, D., & Martin, J. H. (2014). Speech and language processing (2nd
ed.). London, UK: Pearson.
Lambert, M. J., & Barley, D. E. (2001). Research summary on the therapeutic
relationship and psychotherapy outcome. Psychotherapy: Theory, Research,
Practice, Training, 38, 357–361. http://dx.doi.org/10.1037/0033-3204.38.4
.357
Lazer, D., Kennedy, R., King, G., & Vespignani, A. (2014). Big data. The
parable of Google Flu: Traps in big data analysis. Science, 343, 1203–
1205. http://dx.doi.org/10.1126/science.1248506
Lutz, W., Schwartz, B., Hofmann, S. G., Fisher, A. J., Husen, K., & Rubel,
J. A. (2018). Using network analysis for the prediction of treatment
dropout in patients with mood and anxiety disorders: A methodological
proof-of-concept study. Scientific Reports, 8, 7819. http://dx.doi.org/10
.1038/s41598-018-25953-0
Mikolov, T., Sutskever, I., Chen, K., Corrado, G., & Dean, J. (2013).
Distributed representations of words and phrases and their composition-
ality. Proceedings of Advances in Neural Information Processing Sys-
tems, 26, 3111–3119.
Miller, W. R., Moyers, T. B., Ernst, D., & Amrhein, P. (2003). Manual for
the motivational interviewing skills code v. 2.0. Retrieved from http://
casaa.unm.edu/codinginst.html
Miner, A. S., Milstein, A., & Hancock, J. T. (2017). Talking to machines about
personal mental health problems. Journal of the American Medical Asso-
ciation, 318, 1217–1218. http://dx.doi.org/10.1001/jama.2017.14151
Mitchell, T. M. (1997). Does machine learning really work? AI Magazine,
18, 11–20.
Mjolsness, E., & DeCoste, D. (2001). Machine learning for science: State
of the art and future prospects. Science, 293, 2051–2055. http://dx.doi
.org/10.1126/science.293.5537.2051
Moore, E., II, Clements, M. A., Peifer, J. W., & Weisser, L. (2008). Critical
analysis of the impact of glottal features in the classification of clinical
depression in speech. IEEE Transactions on Biomedical Engineering,
55, 96 –107. http://dx.doi.org/10.1109/TBME.2007.900562
Murphy, K. P. (2012). Machine learning: A probabilistic perspective.
Cambridge, MA: MIT Press.
Okada, S., Ohtake, Y., Nakano, Y. I., Hayashi, Y., Huang, H. H., Takase,
Y., & Nitta, K. (2016). Estimating communication skills using dialogue
acts and nonverbal features in multiple discussion datasets. Proceedings
of the 18th ACM International Conference on Multimodal Interaction
(pp. 169 –176). New York, NY: ACM.
Olfson, M., & Marcus, S. C. (2010). National trends in outpatient psycho-
therapy. The American Journal of Psychiatry, 167, 1456 –1463. http://
dx.doi.org/10.1176/appi.ajp.2010.10040570
Open Science Collaboration. (2015). Estimating the reproducibility of
psychological science. Science, 349, aac4716. http://dx.doi.org/10.1126/
science.aac4716
Pagliardini, M., Gupta, P., & Jaggi, M. (2017). Unsupervised learning of
sentence embeddings using compositional n-gram features. arXiv preprint arXiv:1703.02507. Retrieved from http://dx.doi.org/10.18653/v1/N18-1049
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B.,
Grisel, O.,...Vanderplas, J. (2011). Scikit-learn: Machine learning in
Python. Journal of Machine Learning Research, 12, 2825–2830.
Pennebaker, J. W., Boyd, R. L., Jordan, K., & Blackburn, K. (2015). The
development and psychometric properties of LIWC2015. Austin: Uni-
versity of Texas at Austin. Technical Report.
Pennington, J., Socher, R., & Manning, C. (2014). Glove: Global vectors
for word representation. Proceedings of the 2014 Conference on Empir-
ical Methods in Natural Language Processing (EMNLP) (pp. 1532–
1543). Stroudsburg, PA: Association for Computational Linguistics.
Povey, D., Ghoshal, A., Boulianne, G., Burget, L., Glembek, O., Goel, N.,
. . . Silovsky, J. (2011). The Kaldi speech recognition toolkit. IEEE 2011
Workshop on Automatic Speech Recognition and Understanding. Big
Island, Hawaii: IEEE Signal Processing Society.
Python Software Foundation. (2019). Python language reference (Version
3.7.2) [Computer software]. Retrieved from http://www.python.org
R Core Team. (2018). R: A language and environment for statistical
computing. Vienna, Austria: R Foundation for Statistical Computing.
Retrieved from https://www.R-project.org/
Russell, S. J., & Norvig, P. (2016). Artificial intelligence: A modern
approach (3rd ed.). Essex, UK: Pearson Education.
Salton, G., & McGill, M. J. (1986). Introduction to modern information
retrieval. New York, NY: McGraw-Hill.
Shannon, C. E. (1948). A mathematical theory of communication. The Bell
System Technical Journal, 27, 379 – 423. http://dx.doi.org/10.1002/j.1538-
7305.1948.tb01338.x
Shatte, A. B. R., Hutchinson, D. M., & Teague, S. J. (2019). Machine
learning in mental health: A scoping review of methods and applications.
Psychological Medicine, 49, 1426 –1448. http://dx.doi.org/10.1017/S00
33291719000151
Stead, W. W. (2018). Clinical implications and challenges of artificial
intelligence and deep learning. Journal of the American Medical Asso-
ciation, 320, 1107–1108. http://dx.doi.org/10.1001/jama.2018.11029
Stone, P. J., Bales, R. F., Namenwirth, J. Z., & Ogilvie, D. M. (1962). The
general inquirer: A computer system for content analysis and retrieval
based on the sentence as a unit of information. Behavioral Science, 7,
484 – 498. http://dx.doi.org/10.1002/bs.3830070412
Substance Abuse and Mental Health Services Administration. (2014).
Projections of national expenditures for treatment of mental and sub-
stance use disorders, 2010 –2020. Rockville, MD: Author.
Tao, K. W., Owen, J., Pace, B. T., & Imel, Z. E. (2015). A meta-analysis of
multicultural competencies and psychotherapy process and outcome. Jour-
nal of Counseling Psychology, 62, 337–350. http://dx.doi.org/10.1037/
cou0000086
Thompson, M. N., Goldberg, S. B., & Nielsen, S. L. (2018). Patient
financial distress and treatment outcomes in naturalistic psychotherapy.
Journal of Counseling Psychology, 65, 523–530. http://dx.doi.org/10.1037/
cou0000264
Tichenor, V., & Hill, C. E. (1989). A comparison of six measures of
working alliance. Psychotherapy: Theory, Research, Practice, Training,
26, 195–199. http://dx.doi.org/10.1037/h0085419
Tracey, T. J. G., Wampold, B. E., Lichtenberg, J. W., & Goodyear, R. K.
(2014). Expertise in psychotherapy: An elusive goal? American Psychol-
ogist, 69, 218 –229. http://dx.doi.org/10.1037/a0035099
Tryon, G. S., Blackwell, S. C., & Hammel, E. F. (2008). The magnitude of
client and therapist working alliance ratings. Psychotherapy: Theory, Re-
search, Practice, Training, 45, 546 –551. http://dx.doi.org/10.1037/a001
4338
Turing, A. M. (1950). Computing machinery and intelligence. Mind, 59,
433– 460. http://dx.doi.org/10.1093/mind/LIX.236.433
Wandersman, A., Duffy, J., Flaspohler, P., Noonan, R., Lubell, K., Still-
man, L.,...Saul, J. (2008). Bridging the gap between prevention
research and practice: The interactive systems framework for dissemi-
nation and implementation. American Journal of Community Psychol-
ogy, 41(3– 4), 171–181. http://dx.doi.org/10.1007/s10464-008-9174-z
Wang, R., Aung, M. S., Abdullah, S., Brian, R., Campbell, A. T., Choud-
hury, T.,...Tseng, V. W. (2016, September). CrossCheck: Toward
passive sensing and detection of mental health changes in people with
schizophrenia. Proceedings of the 2016 ACM International Joint Con-
ference on Pervasive and Ubiquitous Computing (pp. 886-897). New
York, NY: Association for Computing Machinery.
Webb, C. A., DeRubeis, R. J., & Barber, J. P. (2010). Therapist adherence/
competence and treatment outcome: A meta-analytic review. Journal of
Consulting and Clinical Psychology, 78, 200 –211. http://dx.doi.org/10
.1037/a0018912
Whiteford, H. A., Degenhardt, L., Rehm, J., Baxter, A. J., Ferrari, A. J.,
Erskine, H. E.,...Vos, T. (2013). Global burden of disease attributable
to mental and substance use disorders: Findings from the Global Burden
of Disease Study 2010. The Lancet, 382, 1575–1586. http://dx.doi.org/
10.1016/S0140-6736(13)61611-6
Xiao, B., Huang, C., Imel, Z. E., Atkins, D. C., Georgiou, P., & Narayanan,
S. S. (2016). A technology prototype system for rating therapist empathy
from audio recordings in addiction counseling. PeerJ Computer Science,
2, e59. http://dx.doi.org/10.7717/peerj-cs.59
Zilcha-Mano, S. (2017). Is the alliance really therapeutic? Revisiting this
question in light of recent methodological advances. American Psychol-
ogist, 72, 311–325. http://dx.doi.org/10.1037/a0040435
Zilcha-Mano, S., & Errázuriz, P. (2017). Early development of mechanisms of
change as a predictor of subsequent change and treatment outcome: The
case of working alliance. Journal of Consulting and Clinical Psychology,
85, 508 –520. http://dx.doi.org/10.1037/ccp0000192
Received March 9, 2019
Revision received July 8, 2019
Accepted August 8, 2019