Using Text Analysis Software to Detect Deception in Written Short-Answer Questions in
Employee Selection
Loch Forsyth, Jeromy Anglim¹
Abstract: This study investigated if word frequencies informed by the Newman-
Pennebaker (NP) and Reality Monitoring (RM) models could classify honest and deceptive
responses to short-answer questions often used in online employee applications. Participants (n =
106; 58% male; Mage = 30.28 years, SD = 8.85) completed two written short-answer questions
both deceptively and honestly. The questions asked participants to describe a notable personal
achievement or a time where they had demonstrated interpersonal skills. Linguistic Inquiry and
Word Count was used to calculate the prevalence of words in various linguistic categories.
Deceptive statements contained significantly fewer first-person singular pronouns, auxiliary verbs,
adverbs, conjunctions, and cognitive process words. Results revealed that the NP and RM models'
accuracy at classifying responses varied by question type.
Introduction
Research shows that applicants often lie in order to gain employment (Comer & Stephens,
2004; Jupe, Vrij, Leal, & Mann, 2016; Robinson, Shepherd, & Heywood, 1998; Vrij, 2008; Wood
et al., 2007), and that this deception can occur in a range of settings including the curriculum vitae
(Henle, Dineen, & Duffy, 2017; Jupe, Vrij, Leal, & Mann, 2016), social media profiles (Roulin &
Levashina, 2016), personality tests (Anglim, Bozic, Little, & Lievens, 2018; Anglim, Morse, De
Vries, MacCann, & Marty 2017; Birkeland, Manson, Kisamore, Brannick, & Smith, 2006), and
employment interviews (Bourdage, Roulin, & Levashina, 2017; Levashina, Hartwell, Morgeson,
& Campion, 2014; Melchers, Roulin, & Buehl, 2020). Nonetheless, a robust finding throughout
the deception literature is that most individuals score no better than chance at identifying deception
(Bond & DePaulo, 2006; Vrij, 2008). Furthermore, experienced recruiters have been found to score
no better than laypeople at identifying deceptive job applicants (Roulin, Bangerter, & Levashina,
2014), and managers often hold exaggerated beliefs about their ability to detect deception (Roulin
et al., 2014). Therefore, it is reasonable to expect deception to occur and go unrecognized
throughout the employee selection process, which threatens the integrity of this process.
As the employment application process shifts predominantly to an online format, the ability
to assess the truthfulness of the information provided by applicants becomes critically important.
One popular screening tool involves asking applicants to provide written examples of past behavior
that might illustrate relevant job competencies (Eggert, 2013). A multitude of online employment
help sites and forums offer suggestions to job seekers on how to answer these questions. These
responses are often collected by prospective employers as a form of biodata, with applicants
providing written responses and evidence of knowledge to questions specific to the advertised role
(Levashina, Morgeson, & Campion, 2009; Ployhart, 2014). Despite research on deceptive
responses to similar questions in face-to-face interview settings (Bourdage et al., 2017; Schneider,
Powell, & Roulin, 2015), no research has examined deceptive responses to written online
behavioral interview questions, typically obtained in the initial employment application phase.
¹ Please cite:
Forsyth, L., & Anglim, J. (2020). Using text analysis software to detect deception in written short-answer questions in employee
selection. International Journal of Selection and Assessment, 28(3), 236-246. https://doi.org/10.1111/ijsa.12284
Correspondence concerning this article should be addressed to Loch Forsyth, School of Health Sciences and Psychology, Faculty
of Health, Federation University Australia, PO Box 663, Ballarat VIC 3353. Email: l.forsyth@federation.edu.au. Jeromy Anglim,
School of Psychology, Deakin University, Geelong, Australia.
This is important because computer mediated communication gives the applicant the
opportunity to edit their responses and remove or modify the potential cues of deception that are
often present in face-to-face interactions (Ho, Hancock, Booth, & Liu, 2016). Nonetheless, there
is a broad literature that suggests that linguistic differences exist between statements derived from
memory of actual events and those derived from imagination or fabrication (often referred to as
the Undeutsch hypothesis, Porter & Yuille, 1995; Undeutsch, 1954; 1967). Recent research
suggests that the manual coding of applicant responses in an interview setting can assist in
identifying deception (Roulin & Powell, 2018). Unfortunately, traditional approaches related to
detecting deception in written statements which rely on manual coding are difficult, labor
intensive, and susceptible to high error rates (Vrij, 2008). They are therefore not efficient methods
for assessing the veracity of statements that are often obtained from large volumes of job
applicants. As such, the present study aims to examine the effectiveness of text analysis software
in classifying honest and deceptive statements to written past-behavioral short-answer job
application questions.
A Computerized Approach for Analyzing Language
Text analysis software can be applied to automate the counting of words in various
linguistic categories (Tausczik & Pennebaker, 2010). In particular, Linguistic Inquiry and Word
Count (LIWC; Pennebaker, Booth, Boyd, & Francis, 2015) is a popular tool for text
analysis. It can assist in several ways, including (1) operationalizing the assessment of linguistic
criteria indicative of deception, (2) automating the assessment of large volumes of responses, and
(3) guiding human decision making regarding veracity. LIWC involves the automated extraction
of word frequencies across various linguistic categories. These frequencies can then be examined
to assess whether they differentiate between honest and deceptive text (Pennebaker, 2011). The
analysis of word frequencies generated with LIWC has revealed several potential differences
between deceptive and honest statements, although there is no consensus on which word
categories hold the most reliable utility when analyzing statements. As such, we organize our
approach in terms of two models that have been used to guide LIWC analysis of deceptive and
honest statements previously: (1) Newman-Pennebaker (NP) and (2) Reality Monitoring (RM).
The Newman-Pennebaker (NP) Model
The NP Model is an empirically derived approach that posits that deceptive statements can
be distinguished from those that are honest by using the internal dictionaries of language software
to count the frequency of specific word categories (Hancock & Woodworth, 2013). This line of
research suggests that deceptive statements are generally characterized by fewer first person
singular pronouns (‘I’), fewer conjunctions (‘and, also’), more motion words (‘go, walk, move’),
and more negative emotion words (Newman, Pennebaker, Berry, & Richards, 2003). Theoretical
interpretations can be used to understand the processes that may underlie this language profile. For
instance, the reduced use of pronouns is thought to signal psychological distancing (Hancock &
Woodworth, 2013). Likewise, the greater use of negative emotion words when lying is often
attributed to what is referred to as the leakage hypothesis. This purports that the anxiety associated
with lying can result in more negative emotions that are both spoken and displayed (Ekman, 2007).
Motion words as described by Toma and Hancock (2012) such as “walk” and “go” are examples
of concrete descriptors that require fewer cognitive resources and are therefore more readily used in
deceptive responses (Newman et al., 2003).
The NP model has been successfully applied across a range of contexts including attitudes
on abortion, the existence of Weapons of Mass Destruction, and suspect links with Al-Qaeda
(Hancock & Woodworth, 2013; Newman et al., 2003). False statements were found to contain
fewer first-person singular pronouns (“I”) and exclusive terms (“except, but”), and more negative
emotion terms and verbs classed as action verbs (Hancock & Woodworth, 2013). In general, past
research has found that the NP model is capable of accurately classifying between 61% and 76% of
statements as either false or truthful (Hancock & Woodworth, 2013; Newman et al., 2003).
The Reality Monitoring (RM) Framework
The RM framework emerged from research on human memory and posits that truthful
memories differ from fabricated memories and that these differences can manifest in human
responses (Zhou, Burgoon, Nunamaker, & Twitchell, 2004). Specifically, responses based on
truthful memories are theorized to include more references to sensory, affective, and contextual
details, reflective of a true lived experience (Bond & Lee, 2005; Zhou et al., 2004). Unlike the NP
model which is primarily data driven, the RM approach was originally used by raters who manually
coded transcripts for the elements associated with RM criteria. Review articles suggest that trained
raters applying the RM approach can typically achieve around 70% accuracy in classifying
statements (Granhag, Vrij, & Verschuere, 2015; Vrij, 2008).
While text analysis software such as LIWC does not allow for the contextual elements of
a recalled memory to be coded, several word categories overlap with the RM approach. In
particular, the following categories have been used by past research (Bond & Lee, 2005): See
(View, Saw), Hear (Listen, Hearing), Space (Around, Over), Time (Hours, Day), Affect (Happy,
Ugly), and Cognitive Processes (Possible, Maybe). As the current study is using past-behavioral
questions that are often used in the initial phase of job applications, it is assumed that to answer
honestly individuals will be required to recall a memory. This will therefore require them to use
words reflective of this past event that should be richer in sensory detail. By applying the word
categories of LIWC most closely associated with the RM approach, deceptive statements such as
ones based on false memories are expected to contain fewer sensory, spatial, temporal, and
affective words and more cognitive processing words (Bond & Lee, 2005; Johnson & Raye, 1981;
Vrij, 2008). The higher rate of words associated with cognitive processes such as “possible” and
“maybe” is reflective of increased thought processes, imagination, and mental construction related
to imagined events (Kleinberg, van der Toolen, Vrij, Arntz, & Verschuere, 2018; Nahari, 2017).
The Current Study
The current study aimed to assess whether LIWC word categories associated with the NP
and RM models can be used to accurately classify responses to short-answer past-behavioral
questions typically used in online job applications. The NP model predicts deceptive language will
be distinguished from honest language by focusing on four main function word categories.
We assessed the same RM-related word categories within LIWC that Bond and Lee (2005)
successfully operationalized to accurately classify 70% of prisoner statements as either deceptive
or honest. As was undertaken by Bond and Lee (2005), the words from the collected responses to
past behavioral questions will be categorized using the default LIWC dictionary (for a review of
all LIWC word categories see Pennebaker, Boyd, Jordan, & Blackburn, 2015). While the primary
focus of this study is on the word categories directly associated with both the NP and RM models,
the frequencies of the related word stems for each model's respective word categories will also be
reported in the preliminary analysis. This is consistent with previous research that has utilized
LIWC (Newman et al., 2003) and will inform an exploratory data-driven approach.
Hypothesis 1: Informed by the NP approach, deceptive statements are predicted to contain
(a) fewer first person singular pronouns due to psychological distancing, (b) fewer conjunctions,
understood as a measure of language complexity, as deception is considered more cognitively
taxing than telling the truth (Hancock & Woodworth, 2013), (c) more motion verbs, as these
concrete descriptors are more readily accessible than cognitively complex language when liars are
relying on a fabricated memory (Newman et al., 2003), and (d) more negative emotion words, in
line with the leakage hypothesis (Ekman, 2007; Hancock & Woodworth, 2013).
Hypothesis 2: Consistent with the RM approach it was predicted that sensory, spatial,
temporal, and affective words would be more prevalent in honest statements than in deceptive
statements because they are theorized to reflect the qualities of a true memory as opposed to a
fabricated one. Cognitive process words are also expected to be less prevalent in honest statements
(Bond & Lee, 2005). Thus, overall it was hypothesized that through utilizing LIWC software the
NP and RM approaches could be applied to classify deceptive and honest responses to job
application questions at above chance levels.
Finally, we also examined an exploratory data-driven model that contained word categories
identified by LIWC as being those found to be the most prevalent in the combined deceptive
responses collected from across both past behavioral questions.
Method
Participants and Procedure
Participants were recruited through the online participant pool platform Prolific. Only
members of the platform who were proficient in English were invited to participate. Participants
received £4 for successfully completing the study. Participants completed the study
online with a personal computer. After reading a short description of the study, participants
answered demographic and background questions about deceptive behavior. They then completed
the main component of the study where they answered past behavioral short-answer employment
questions as if they were applying for a job.
An initial sample of 117 participants completed the study. Nine participants were removed
because they indicated at the conclusion of the survey that they had provided deceptive responses
in the honest condition or honest responses in the deceptive condition, and two participants were
removed because their responses did not satisfy the minimum character requirements or because
they used repetitive text entry strategies to satisfy those requirements.
The final sample consisted of 106 participants (58% male) with a mean age of 30.28 years
(SD = 8.85, age range: 18-55 years). Most participants were employed (61% employed full-time,
16% employed part-time or casually, 17% student and not employed, 2% unemployed, 2% stay at
home parent, 2% other).
In order to investigate the base rate of the sample's reported deceptive behavior, participants
were asked several questions about their past behavior and future intentions in a job applicant
context. Participants were first asked whether they had ever lied, exaggerated, or misled in an
attempt to secure employment. A minority of participants (23%) reported lying in the past in an
effort to secure employment. When asked to elaborate, these participants reported lying about work
experience (47%), personality (17%), knowledge/interests (8%), references (6%) and other matters
(11%). The main reasons for lying were financial reasons (62%) and a desire for career
advancement (24%). When participants were asked whether they were prepared to lie in the future
to secure employment, 18% indicated that they would lie, 31% would consider the possibility of
lying, and 51% indicated that they would never be prepared to lie.
The sample size of 106 resulted in 86% power to identify a statistically significant
difference in word frequencies, assuming d = 0.30, 0.5 correlation between deceptive and honest
statement word frequencies, and a .05 (two-tailed) significance threshold. The study received
ethics approval from the first author's University Faculty Human Ethics Advisory Group.
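The reported power figures can be reproduced with standard tools. The sketch below is our illustration, not the authors' code, and assumes the within-subjects comparison is analyzed as a paired t-test:

```python
# Hedged sketch: reproduces the reported power under the stated assumptions
# (d = 0.30, r = .50 between conditions, alpha = .05, two-tailed, n = 106).
# For paired data, the effect size for the difference scores is
# d_z = d / sqrt(2 * (1 - r)), which equals 0.30 when r = .50.
import numpy as np
from statsmodels.stats.power import TTestPower, TTestIndPower

d, r, n, alpha = 0.30, 0.50, 106, 0.05
d_z = d / np.sqrt(2 * (1 - r))

power = TTestPower().power(effect_size=d_z, nobs=n, alpha=alpha)
print(f"within-subjects power: {power:.2f}")  # ~0.86

# For comparison: n per group needed for the same power between subjects
n_group = TTestIndPower().solve_power(effect_size=d, power=power, alpha=alpha)
print(f"between-subjects n per group: {n_group:.0f}")  # ~205, i.e., ~4x the total sample
```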
Short-Answer Employment Questions
Participants read a statement preceding the short-answer question section of the
questionnaire that provided them with an overview of the study. It outlined that they would be
required to answer two short-answer questions both honestly and deceptively. There were two
distinct questions and participants answered both questions twice, once under instructions to
answer honestly, and again under instructions to answer deceptively (i.e., four responses in total).
The decision to use a within-subjects design was informed by Vrij (2018), who recently called for
deception researchers to use such a design in preference to between-subjects designs. In particular, the
within-subjects design also provides greater statistical power. For example, assuming d = 0.30,
a correlation between conditions of r = .50, and an alpha of .05, the present design has 86% power;
to achieve equivalent statistical power using a between-subjects design, the sample size would
need to be approximately four times larger. To control for order effects, participants were randomly
allocated into either honest-first (H1-H2-D1-D2) or deception-first (D1-D2-H1-H2) orderings.
Counterbalancing the order of conditions and not the question order ensured that no participant
was asked to answer the same question back to back, thus replicating the question sequencing in a
real-life application process. Participants were instructed to respond "honestly and truthfully" in
the honest condition and "lie and be untruthful" in the deception condition. Regardless of
condition, they were also instructed to respond in a way that they believed would give them the
best chance of securing employment, with the aim to look like a highly desirable job applicant.
The two questions were designed to be consistent with those currently used in online
applications for Australian government agencies. The two questions were (1) “Describe a notable
personal achievement, from your past or present, which demonstrates your ability to excel beyond
the norm. Provide a description of how you achieved it”, and (2) “Describe a time where you have
demonstrated your ability to collaborate with others, relying on your successful interpersonal
skills, to build and maintain effective relationships to achieve a positive outcome. What was that
outcome?” Responses were required to be between 1,000 and 2,000 characters (approximately 300
words). At the end of each question, a manipulation check was administered that required
participants to indicate whether their previous answer was honest or deceptive. Participants also
rated response satisfaction and degree of difficulty for each of their statements assessed on a 5-
point Likert scale, similar to the protocol of Hartwig, Granhag, Strömwall, and Doering (2010).
Textual analysis of the responses was performed using the software Linguistic Inquiry and
Word Count (LIWC; Pennebaker et al., 2015). LIWC is a text analysis software that analyses text
on a word-by-word basis with an internal master dictionary that is composed of 6,400 words, word
stems (partial words that are terminated with an asterisk), and selected emoticons (Pennebaker et
al., 2015). LIWC can be used to measure the prevalence of linguistic (e.g., pronoun) and
psychological (e.g., positive affect) word categories. LIWC was used to measure the word
frequency of 30 verbal categories in each short-answer question. Percentages were obtained for
both past behavioral short-answer questions in each condition. Additionally, as both short answer
questions were past behavioral in nature, and the RM and NP models have historically been used
to assess this style of question (as opposed to future intention based responses), the average honest
and deceptive percentages were obtained by averaging the percentages across the two questions within
a condition.
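LIWC itself is proprietary, but its core operation, counting dictionary matches per category as a percentage of total words, is easy to illustrate. The toy sketch below is ours; the category lists are invented stand-ins for the roughly 6,400-entry LIWC2015 dictionary, with asterisks marking word stems:

```python
import re

# Toy category dictionary (illustrative only, not the LIWC2015 lexicon).
# Entries ending in "*" are stems: "walk*" matches "walk", "walked", etc.
TOY_DICT = {
    "i":      {"i", "me", "my", "mine"},
    "conj":   {"and", "also", "but", "or"},
    "motion": {"go*", "walk*", "move*"},
    "negemo": {"hate*", "worthless", "awful"},
}

def matches(token, entry):
    """True if a token matches a dictionary entry (stem or exact word)."""
    return token.startswith(entry[:-1]) if entry.endswith("*") else token == entry

def category_percentages(text):
    """Percentage of total words falling in each category, as LIWC reports."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens) or 1
    return {cat: 100 * sum(any(matches(t, e) for e in entries) for t in tokens) / total
            for cat, entries in TOY_DICT.items()}

print(category_percentages("I walked there, and also I moved on."))
```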
Logistic regression for the NP and RM models was first applied to the responses for each
of the two individual questions and then to the averaged honest and deceptive responses from each
of the two short answer question responses.
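As a sketch of this step (ours, not the authors' code; the file and column names are hypothetical), the NP-model logistic regression could be run as follows, with accuracy scored on the fitted sample as in the original NP and RM studies:

```python
# Hedged sketch: assumes a CSV with one row per response, hypothetical
# LIWC percentage columns, and a "deceptive" label (1 = deceptive, 0 = honest).
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("liwc_scores.csv")          # hypothetical file name
np_cols = ["i", "conj", "motion", "negemo"]  # NP-model LIWC categories
X = sm.add_constant(df[np_cols])
y = df["deceptive"]

fit = sm.Logit(y, X).fit(disp=False)
print(fit.summary())                         # coefficients, as in Table 2
accuracy = ((fit.predict(X) >= 0.5) == y).mean()
print(f"in-sample classification accuracy: {accuracy:.1%}")
```

Note that this evaluates accuracy on the same responses used to fit the model; cross-validation would give a less optimistic estimate.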
In addition, two raters with professional selection and recruitment experience assessed all
responses. This assessment was completed without knowledge of LIWC scores or manipulation
condition. First, responses were checked for appropriateness as indicated by being legible and
addressing the question (i.e., not random sentences). Second, the quality of each response was
rated. Ratings were provided on a five-point scale from 1 = Does not address the requirement of
the question to 5 = Addresses the nature of the question in a highly specific and relevant manner.
When rating responses, assessors applied the widely used STAR approach to grading responses to
behavioral questions (Deters, 2017). As such, a high quality rating required the response to have
thoroughly addressed the Situation, Task, Action, and Result related to the question. The inter-rater
agreement of quality ratings was calculated for each question and condition and ranged from r =
.60 to .65 (Q1 Honest r = .60; Q2 Honest r = .60; Q1 Deceptive r = .62; Q2 Deceptive r = .65).
Results
Differences between Honest and Deceptive Responses
As a preliminary check, we first examined whether there was any evidence that participants
responded to the deceptive and honest conditions differently based on the order of completion. We
conducted several 2 by 2 mixed ANOVAs (honest versus deception condition by honest first
versus deception first order) for all the verbal categories and the question ratings. Overall, there
was little evidence of order effects or order by condition interactions. The main significant effect
after Bonferroni correction related to satisfaction with answers, whereby participants were less
satisfied overall with their answers when answering in the honest then deceptive order (p = .002).
Participants also showed a relatively greater increase in satisfaction with honest relative to
deceptive answers when presented in the honest then deceptive order (p = .004). Importantly, there
were no significant interactions between order and condition on verbal category use.
We then examined broad differences between answers in the honest and deceptive
conditions. First, there was no significant difference (d = 0.04) in the number of words in the
honest (M = 208.5, SD = 31.1) and the deceptive (M = 209.7, SD = 32.3) condition. Participants
reported that providing deceptive responses (M = 2.63, SD = 1.02; Q1 M = 2.69; Q2 M = 2.58)
was significantly more difficult than honest responses (M = 3.42, SD = 1.04; Q1 M = 3.50; Q2 M
= 3.36) across both questions (d = 0.76, p < .001). Significantly lower levels of satisfaction were
also reported by participants for their combined deceptive responses (M = 3.33, SD = 0.90; Q1 M
= 3.41; Q2 M = 3.25) compared to honest responses (M = 3.76, SD = 0.77; Q1 = 3.79; Q2 M =
3.73) for both short-answer questions (d = 0.51, p < .001). Overall, responses were rated by experts
to be of higher quality (d = 0.44, p < .001) in the deceptive condition (M = 4.19; SD = 0.75) than
in the honest condition (M = 3.81; SD = 0.99). As such, participants were able to write more
effective responses when fabricating them.
In order to provide an exploratory overview of differences, mean proportional word use in
each verbal category across deceptive and honest conditions is shown in Table 1. Paired samples
t-tests indicated that nine word categories were significantly different between the honest and
deceptive conditions at the .05 level, although after Bonferroni correction only two of these
differences remained statistically significant at the more stringent .001 level. Deceptive responses
involved less use of first person singular words, auxiliary verbs, adverbs, conjunctions, and
cognitive process words, and significantly more use of articles, social process words, hearing
words, and motion words as measured by the LIWC dictionary compared to honest responses.
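A minimal sketch of this screening step (ours, with assumed data structures) runs one paired t-test per LIWC category and flags which differences survive the Bonferroni-adjusted threshold (with 30 categories, .05/30 ≈ .0017, close to the .001 level referenced above):

```python
# Hedged sketch: honest and deceptive are dicts mapping each category name
# to arrays of per-participant percentages, paired by participant.
from scipy import stats

def screen_categories(honest, deceptive, alpha=0.05):
    cutoff = alpha / len(honest)  # Bonferroni-adjusted threshold
    results = {}
    for cat in honest:
        t, p = stats.ttest_rel(honest[cat], deceptive[cat])
        results[cat] = {"t": round(t, 2), "p": round(p, 4),
                        "sig_.05": p < alpha, "sig_bonferroni": p < cutoff}
    return results
```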
Correlations between the ratings of the quality of responses and the LIWC word
frequencies are also shown in Table 1. Notably, the LIWC word category “Negate”, which
consists of words such as “No” and “Never”, showed the strongest relationship, with a strong
negative correlation with quality of r = -.42, p < .01 within the honest statements.
Newman-Pennebaker Model
Logistic regression was used to assess the ability of the set of linguistic categories
associated with the NP model to predict deception, first for each individual question and then
across the combined responses to both questions.
Analysis of the responses to the two individual questions revealed variation in the NP
model's ability to predict deception. For Question 1 (“Describe a notable personal achievement,
from your past or present, which demonstrates your ability to excel beyond the norm. Provide a
description of how you achieved it”), the NP model demonstrated a statistically significant
improvement over the null model, χ²(4) = 14.18, p < .01, resulting in a 63.2% level of accuracy
at classifying responses to Q1. Table 2 shows the model coefficients of conjunctions (β = -.16),
motion verbs (β = .16), and first person singular pronouns (β = -.07) not to be statistically
significant, while negative emotions (β = -.30) was significant and in the theorized direction.
For Question 2 (“Describe a time where you have demonstrated your ability to
collaborate with others, relying on your successful interpersonal skills, to build and maintain
effective relationships to achieve a positive outcome. What was that outcome?”), results revealed
a statistically significant improvement over the null model, χ²(4) = 10.32, p < .05, with an
accuracy level of 55.2% when the NP model was applied to responses exclusively from Question
2. Table 2 shows the model coefficients of the NP model, revealing the Negative Emotions (β =
.33) category to be the only significant predictor, although in the opposite direction compared to
the coefficient obtained for this variable in Question 1. Motion verbs (β = .24) approached
significance, while conjunctions (β = -.12) and first person singular pronouns (β = -.08) were not
statistically significant predictors.
To test H1 across the combined responses to both questions, the NP model indicated a
statistically significant improvement over the null model, χ²(4) = 10.72, p < .05, resulting in an
overall 60% accuracy in classifying responses. Table 2 presents model coefficients revealing that
conjunctions (β = -.20) and motion verbs (β = .33) were both significant (p < .05) and in the
hypothesized direction, indicating that deceptive language contained fewer conjunctions and a
higher use of motion verbs. The beta coefficients of first person singular pronouns (β = -.07) and
negative emotions (β = -.02) were not significant.
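The odds ratios reported in Tables 2-4 are simply exponentiated logistic coefficients; for example, for the combined-responses NP model:

```python
import math
print(round(math.exp(-0.20), 2))  # conjunctions: OR ≈ 0.82, as in Table 2
print(round(math.exp(0.33), 2))   # motion verbs: OR ≈ 1.39 (1.40 in Table 2, from the unrounded beta)
```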
The Reality Monitoring Model
To test H2, the RM model was first applied to each of the two individual questions. For
Question 1 (“Describe a notable personal achievement, from your past or present, which
demonstrates your ability to excel beyond the norm. Provide a description of how you achieved
it”), logistic regression did not show a statistically significant improvement over the null model,
χ²(4) = 5.62, p > .05, and resulted in an approximately 58% level of accuracy. As shown in Table
3, no significant beta coefficients were observed. For Question 2 (“Describe a time where you
have demonstrated your ability to collaborate with others, relying on your successful
interpersonal skills, to build and maintain effective relationships to achieve a positive outcome.
What was that outcome?”), results revealed a statistically significant improvement over the null
model, χ²(4) = 15.97, p < .05, resulting in an accuracy level of 61.3% when applying the RM
model to the responses from Question 2. Table 3 reveals the affective word category as the only
verbal category in the model with a significant beta coefficient (β = .24).
To test H2 across the combined responses to both short-answer questions, logistic
regression indicated a significant improvement over the null model, χ²(6, N = 212) = 12.38, p <
.05, resulting in an overall accuracy level of 60.4%. The beta coefficient (see Table 3) for the
verbal indicator hear (β = 1.03) was the largest and the only statistically significant coefficient (p
< .05); all other beta coefficients for the word categories were not significant. Both the NP and
RM models demonstrated overall accuracy levels of 60% when the responses from both questions
were aggregated. Analysis undertaken at the level of the individual question revealed variation
in the accuracy levels associated with each model. Between the RM and NP models there was a 5-
6% difference in accuracy for each of the questions.
Exploratory Data Driven Model
In addition to the NP and RM models, an exploratory data-driven model was tested.
This model was informed solely by the verbal variables presented in Table 1 that were observed
to be significantly different between the combined honest and deceptive statements. As displayed
in Table 4, the data-driven model contained nine variables that were tested with logistic
regression. The exploratory data-driven model was statistically significant, χ²(9, N = 212) =
27.17, p < .001, resulting in an overall 66.5% accuracy. This model is biased, however, because it
consists only of verbal indicators already found to be statistically significant across conditions in
Table 1. Only two verbal indicators in the data-driven model resulted in significant beta
coefficients: Social (β = .16) and Hear (β = .87).
Discussion
This study examined the ability of textual analysis software to classify honest and deceptive
responses to written short-answer employment questions. Underscoring the importance of the topic,
23% of participants reported previously lying on a job application, and nearly 50% reported that
they were either prepared to lie or would consider lying in the future to secure employment. Several
key findings emerged. First, based on external quality ratings, participants were able to provide
deceptive responses that were judged as being superior to honest responses. Second, deceptive
responses were perceived as more difficult to write and less satisfying. Third, when applied to
each of the individual questions separately, the tested models revealed better-than-chance overall
accuracy levels. When responses from both question types were combined, the
LIWC verbal categories attributed to both the NP and RM models were able to classify statements
as deceptive or honest at accuracy levels of 60% and above. Finally, an exploratory data-driven
model outperformed the accuracy levels of the NP and RM models.
Accuracy Levels of Tested Models
To test if a computerized approach to analyzing language for deception could result in
greater levels of accuracy than chance, verbal categories from two well established models were
tested with the LIWC software. Both models showed some variation in accuracy at the individual
question level which ranged from 55.2% to 63.2%. The NP and RM models as measured by the
word categories of LIWC both resulted in approximately 60% accuracy at classifying the combined
deceptive and honest responses to both questions.
Both the NP and RM models generally underperformed the accuracy levels achieved in
previous studies that have utilized these approaches. Bond and Lee (2005) used LIWC to test both
the NP and RM models. They found that the NP model predicted 69.1% of statements accurately
while the RM approach which utilized the same word categories as employed in this study
predicted marginally better with 71% accuracy. While these accuracy levels were higher than the
present study, Bond and Lee (2005) utilized statements based on the natural language production
of prisoners, who were asked to recount a short clip they had just viewed either deceptively or
honestly. Similarly, the higher overall rates of accuracy achieved by Newman et al. (2003), who
used the NP model to predict deception, involved analyzing highly emotive statements based on
individuals' attitudes toward abortion; their study was able to classify statements with 61% accuracy.
While the current study did not utilize natural language production, it revealed similar patterns of
language use in typed written statements from an unrelated context.
Previous research has also noted that verbal indicators such as pronoun use can be sensitive
to the stakes of the context (Granhag et al., 2015; Vrij, 2008). As the participants in this study were
not truly applying for an advertised position, it is possible that the deceptiveness and honesty of
the provided responses did not receive the same level of attention that they may have
in a real-life setting. The opportunity of securing employment, and the benefits associated with it,
may therefore increase the magnitude of the effects of the verbal indicators
identified in this study. Furthermore, unlike much of the previous research that has used
transcribed natural language, this study analyzed transcripts that participants could also freely edit
and review before submitting. That this study could achieve significantly higher levels of accuracy
than chance with the NP and RM approaches from transcripts written in a computer mediated
environment is a critical finding.
This study found support for the claim that lying is more cognitively taxing than telling the truth,
based on participants' self-reports of the difficulty associated with providing a deceptive versus an honest
statement. A contemporary development in detecting deception has been to use questions that
impose more cognitive load (for a review see Granhag et al., 2015). By imposing greater cognitive
load through reverse-order questions and requesting additional information, the verbal cues to
deception may potentially be amplified (Granhag et al., 2015). This study used short answer
questions like those used in the initial phase of online job applications. Alterations to this question
structure that encourage applicants to provide additional information may prove useful when
seeking to establish honesty. Examples of how this could be implemented should look to the
Cognitive Credibility Assessment literature, which uses methods to encourage interviewees, or in
this case applicants, to provide more information in response to the question (Bull, 2014). The
collection of more information through additional question prompts, such as asking the applicant
to expand on their last answer, may yield additional material that can be
analyzed and used to investigate truthfulness (Vrij, 2018). The LIWC software has been shown
here to be a useful tool for processing large volumes of responses to detect deception in simple
past-behavior-based questions. The analysis of future responses to questions that are
structured to make lying more cognitively taxing may serve to amplify these verbal cues even more
and subsequently improve accuracy rates. As a result, the application of computerized approaches
that analyze language may hold important functionality for organizations when undertaking
employee selection processes in the future.
The Verbal Patterns in Deceptive Responses Identified
To the best of our knowledge, this was the first study to examine whether textual analysis
software could be used to classify the veracity of written responses to short-answer employment
questions. Given the paucity of similar research, it is useful to examine whether the significant
differences between word categories observed in the combined deceptive and honest responses
were consistent with previous empirical literature that has focused on different contexts.
The finding of significantly higher use of motion verbs in combined deceptive responses was
consistent with previous research, which has found higher use of motion verbs to be present
in deceptive responses across multiple experiments (Newman et al., 2003) and further confirmed
across a range of studies in a recent meta-analysis (Hauch, Masip, Blandon-Gitlin, & Sporer, 2012).
While Bond and Lee (2005) note that motion words appear to have no precedent in the literature,
they suggest that motion words as measured by LIWC, such as “went” and “go”, may also be
used to distract or divert readers' attention elsewhere. Interestingly, Newman et al. (2003) also
found that the less cognitively complex the lie, the higher the use of motion verbs in deceptive
statements one can expect. While the current study may have been less cognitively taxing than the
contexts of some previous research (e.g., providing false testimony), participants did report
significantly greater difficulty and lower levels of satisfaction when providing deceptive
statements compared to honest ones. The greater difficulty associated
with providing deceptive responses reported by participants is consistent with cognitive load
theory, which proposes that lying is more cognitively taxing due to the intentional effort required
(Granhag et al., 2015).
Consistent with past research and the theory of the increased cognitive demands associated
with deception (Hancock & Woodworth, 2013), the present study also found that conjunctions
were less prevalent in deceptive responses. A significantly reduced level of cognitive process
words may mirror the difficulty that individuals had in providing detailed deceptive responses.
Interestingly, in the current study it appears that individuals may have attempted to
overcompensate for their lack of detail by using more social process words. This may have been
an attempt by individuals to actively manage a positive impression while lying.
The current study also found that fewer personal pronouns were used in deceptive
statements. The use of personal pronouns is one of the most studied verbal indicators
across a range of contexts in the field (Campbell & Pennebaker, 2003; Pennebaker, 2011).
The use of fewer pronouns is often understood as a psychological distancing mechanism that
individuals use when engaging in deception (Granhag et al., 2015; Vrij, 2008). This also suggests
that this linguistic distancing tendency may operate in employment contexts, although it
may fluctuate depending on the nature of the question.
Interestingly, this study found no support for the leakage hypothesis that predicts that
negative emotions associated with the anxiety of lying will result in the use of more words related
to negative emotions (Ekman, 2007). It may be that this effect was not observed because, unlike
in a real employment application process, the participants in the current study
were not at risk of being held accountable for the consequences of their deception. The possibility
also exists that writing the response may have further dampened the negative emotion that individuals
may feel when telling the same lie in an interpersonal context. While Newman et al. (2003)
reported that negative emotion words did predict deception in a written context, the topic of attitudes
toward abortion that their participants wrote about may have proven more emotion-eliciting than
the questions used by the current study. Hancock and Woodworth (2013) have further suggested
that verbal language associated with deception may prove context specific and fluctuate depending
on the guilt and anxiety associated with the lie and how cognitively demanding it is.
Limitations
Several limitations should be noted. First, not knowing the ground truth associated with
each participant's response is an issue for both the current study and much of the existing deception
literature (Granhag et al., 2015; Vrij, 2008). The participants in this study were asked to provide
deceptive and honest responses. However, the absolute degree of their lies cannot be ascertained
with certainty. For instance, Leins, Fisher, and Ross (2012) found that liars will often incorporate
previous experiences into their deceptive accounts rather than provide complete fabrications. As
this study did not assess the way that participants created their deceptive responses, it is possible that
some deceptive statements contained truthful elements that were informed by real-life experience.
Second, the current study used a within-subjects design. While this allowed for much
greater statistical power and there was no evidence of carry-over effects, it would be useful for
future research to replicate these findings using between-subject designs. While between group
designs would require a greater number of participants to ensure adequate statistical power, it
would further control for the potential of carry-over effects. If similar linguistic
profiles of deception emerge across multiple studies, this will help reinforce confidence in
including these verbal variables in an applicant-based selection algorithm. Relatedly, while the study
had adequate statistical power for the primary hypotheses, even larger samples would be useful in
order to have more power for the more exploratory analyses.
Third, the study used non-applicants. While this allows for greater clarity around honest
and deceptive responses, it may not replicate the motivational context that a true job applicant
faces. Future research should explore the characteristics of actual applicant responses to a variety
of short-answer questions.
Conclusion and Future Research
As discussed in the recent review article by Nahari and colleagues (2019), automated
scoring approaches that are perfectly reliable do not exist; this may be due, in part, to the fact that
they do not incorporate contextual information. As revealed in the current study, two experienced
raters judged the deceptive responses provided by participants to be of significantly higher
quality than honest responses. Future research could examine combining automated text analysis
with some of the well-established qualitative methods of statement coding to be completed by
trained raters. A recent approach tested by Nahari (2016) involved raters identifying and
comparing just two criteria: the frequency and the intensity of perceptual and contextual details
within a statement. Frequency counts of these criteria, rather than judged intensity, led to
significantly better accuracy. Furthermore, if computer
software developers can develop linguistic programs that move beyond simply counting categories
of words, and can instead code and measure perceptual and contextual details, then greater levels
of accuracy may be obtainable than those achieved with the models used in the current study. Coh-
Metrix is one such promising software program that has a greater level of sophistication than
LIWC and therefore allows for more nuanced analysis. It also allows for the integration of speech
classifiers and semantic analysis (Levine, 2014). For instance, Coh-Metrix has been used to show
that liars have more redundancy in their sentences with more repetition of meaning and word
patterns (Duran, Hall, McCarthy, & McNamara, 2010; Levine, 2014). Future research could
further explore the potential of this and related software and assess whether accuracy levels can be
improved if used in conjunction with trained human raters.
The major contribution of the current study was to introduce textual analysis as a potential
tool for examining the veracity of responses to written short-answer questions in online recruitment
settings. Findings suggest that applicants often lie and that the prevalence of several word
categories differed between deceptive and honest statements. Experienced employment
professionals also rated deceptive responses as being higher in quality than honest ones.
The study introduced two theoretical models—the Newman-Pennebaker (NP) and Reality
Monitoring (RM) models—that could be used to frame expectations about differences in word
categories across honest and deceptive statements. While promising, the utility of these models
may vary depending on the nature of the question. While using text analysis for detecting deception
in written responses is still in the research and development phase, the potential for assisting in
recruitment and selection appears promising.
References
Anglim, J., Bozic, Little, J., & Lievens, F. (2018). Response distortion on personality tests in applicants: Comparing high-stakes to
low-stakes medical settings. Advances in Health Sciences Education, 18, 311-321.
Anglim, J., Morse, G., De Vries, R. E., MacCann, C., & Marty, A. (2017). Comparing job applicants to non-applicants using an
item-level bifactor model on the HEXACO personality inventory. European Journal of Personality, 31, 669-684.
Birkeland, S. A., Manson, T. M., Kisamore, J. L., Brannick, M. T., & Smith, M. A. (2006). A meta-analytic investigation of job
applicant faking on personality measures. International Journal of Selection and Assessment, 14, 317-335.
Bond, C. F., & DePaulo, B. M. (2006). Accuracy of deception judgments. Personality and Social Psychology Review, 10(3),
214-234.
Bond, G. D., & Lee, A. Y. (2005). Language of lies in prison: Linguistic classification of prisoners' truthful and deceptive natural
language. Applied Cognitive Psychology, 19(3), 313-329.
Bourdage J. S., Roulin, N., & Levashina, J. (2017). Impression management and faking in
job interviews. Frontiers in Psychology, 8, 1-4.
Bull, R. (Ed.). (2014). Investigative interviewing. New York, NY: Springer.
Campbell, R. S., & Pennebaker, J. W. (2003). The secret life of pronouns: Flexibility in writing style and physical health.
Psychological Science, 14, 60-65.
Comer, M. J., & Stephens, T. E. (2004). An HR guide to workplace fraud and criminal behaviour: Recognition, prevention and
management. England: Gower Publishing Limited.
Deters, J. (2017). Global leadership talent management: Successful selection of global leadership talents as an integrated
process. UK: Emerald Publishing Limited.
Duran, N. D., Hall, C., McCarthy, P. M., & McNamara, D. S. (2010). The linguistic correlates of conversational deception:
Comparing natural language processing technologies. Applied Psycholinguistics, 31(03), 439-462.
Eggert, M. A. (2013). Deception in selection: Interviewees and the psychology of deceit. England: Gower Publishing Limited.
Ekman, P. (2007). Emotions revealed: Recognizing faces and feelings to improve communication and emotional life (2nd ed.).
New York: St Martin’s Griffin.
Elaad, E. (2003). Effects of feedback on the overestimated capacity to detect lies and the underestimated ability to tell lies.
Applied Cognitive Psychology, 17, 349-363.
Granhag, P. A., Vrij, A., & Verschuere, B. (2015). Detecting deception: Current challenges and cognitive approaches. United
Kingdom: John Wiley & Sons Ltd.
Hancock, J. T., & Woodworth, M. (2013). An “Eye” for an “I”: The challenges and opportunities for spotting credibility in a
digital world. In B. S. Cooper, D. Griesel, & M. Ternes (Eds.), Applied issues in investigative interviewing, eyewitness
memory, and credibility assessment (pp. 325-340). New York: Springer.
Hartwig, M., Granhag, P. A., Strömwall, L. A., & Doering, N. (2010). Impression and information management: On the strategic
self-regulation of innocent and guilty suspects. The Open Criminology Journal, 3, 10-16.
Hauch, V., Masip, J., Blandon-Gitlin, I., & Sporer, S. L. (2012). Linguistic cues to deception assessed by computer programs: A
meta-analysis. In Proceedings of the workshop on computational approaches to deception detection (pp. 1-4).
Association for Computational Linguistics.
Henle, C. A., Dineen, B. R., & Duffy, M. K. (2017). Assessing intentional resume deception: Development and nomological
network of a resume fraud measure. Journal of Business and Psychology, 1-20.
Ho, S. M., Hancock, J. T., Booth, C., & Liu, X. (2016). Computer-mediated deception: Strategies revealed by language-action
cues in spontaneous communication. Journal of Management Information Systems, 33(2), 393-420.
Johnson, M. K., & Raye, C. L. (1981). Reality monitoring. Psychological Review, 88, 67-85.
Jupe, L., Vrij, A., Leal, S., & Mann, S. (2016). The lies we live: Using the verifiability approach to detect lying about occupation.
Journal of Articles in Support of the Null Hypothesis, 13, 1539-8714.
Kleinberg, B., van der Toolen, Y., Vrij, A., & Verschuere, B. (2018). Automated verbal credibility assessment of intentions:
The model statement technique and predictive modelling. Applied Cognitive Psychology, 32(3), 354-366.
doi:10.1002/acp.3407
Leins, D. A., Fisher, R. P., & Ross, S. J. (2012). Exploring liars' strategies for creating deceptive reports. Legal and
Criminological Psychology, 1(67), 1-11.
Levashina, J., Hartwell, C. J., Morgeson, F. P., & Campion, M. A. (2014). The structured
employment interview: Narrative and quantitative review of the research literature. Personnel Psychology, 67, 241-293.
Levashina, J., Morgeson, F. P., & Campion, M. A. (2009). They don’t do it often, but they do it
well: Exploring the relationship between applicant mental abilities and faking. International Journal of Selection and
Assessment, 17(3), 271-281.
Levine, T. R. (Ed.). (2014). Encyclopedia of deception. London: Sage Publications, Inc.
Melchers, K. G., Roulin, N., & Buehl, A.-K. (2020). A review of applicant faking in selection interviews. International Journal of
Selection and Assessment, Advance Online Access.
Nahari, G. (2016). When the long road is the shortcut: A comparison between two coding methods for content-based lie-detection
tools. Psychology, Crime and Law, 22(10), 1000-1014.
Nahari, G. (2017). Reality monitoring in the forensic context: Digging deeper into the speech of liars. Journal of Applied
Research in Memory and Cognition, 7(3), 432-440. doi: 10.1016/j.jarmac.2018.04.003
Nahari, G., Ashkenazi, T., Fisher, R. P., Granhag, P. A., Hershkowitz, I., Masip, J., Meijer, E. H., Nisin, Z., Sarid, N., Taylor, P.
J., Verschuere, B., & Vrij, A. (2019). ‘Language of lies’: Urgent issues and prospects in verbal lie detection research.
Legal and Criminological Psychology, 24, 1-23.
Newman, M. L., Pennebaker, J. W., Berry, D. S., & Richards, J. M. (2003). Lying words: Predicting deception from linguistic
styles. Personality and Social Psychology Bulletin, 29(5), 665-675.
Pennebaker, J. W. (2011). The secret lives of pronouns. New York: Bloomsbury Press.
Pennebaker, J. W., Booth, R. J., Boyd, R. L., & Francis, M. E. (2015). Linguistic Inquiry and Word Count: LIWC2015. Austin,
TX: Pennebaker Conglomerates (www.LIWC.net).
Pennebaker, J. W., Boyd, R. L., Jordan, K., & Blackburn, K. (2015). The development and
psychometric properties of LIWC2015. Austin, TX: University of Texas at Austin.
Ployhart, R. E. (2014). Personnel selection: Ensuring sustainable organizational effectiveness
through the acquisition of human capital. In S. W. J. Kozlowski (Ed.), The Oxford Handbook of Organizational
Psychology, pp. 221-246. Oxford, England: Oxford University Press.
Porter, S., & Yuille, J. C. (1995). Credibility assessment of criminal suspects through statement analysis. Psychology, Crime,
and Law, 1(4), 319-331.
Robinson, W. P., Shepherd, A., & Heywood, J. (1998). Truth, equivocation, concealment, and lies in job applications and doctor-
patient communication. Journal of Language and Social Psychology, 17, 149-164.
Roulin, N., Bangerter, A., & Levashina, J. (2014). Honest and deceptive impression management in the employment interview:
Can it be detected and how does it impact evaluations? Personnel Psychology, 68, 395-444.
Roulin, N., & Levashina, J. (2016). Impression management and social media profiles. In R.
Landers & G. Schmidt (Eds.), Social media in employee selection and recruitment: theory, practice, and current
challenges (pp. 223248). Switzerland: Springer.
Roulin, N., & Powell, D. M. (2018). Identifying applicant faking in job interviews: Examining
the role of criterion-based content analysis and storytelling. Journal of Personnel Psychology, 17(3), 143-154.
Schneider, L., Powell, D. M., & Roulin, N. (2015). Cues to deception in the employment interview. International Journal of
Selection and Assessment, 23(2), 182-190.
Tausczik, Y. R., & Pennebaker, J. W. (2010). The psychological meaning of words: LIWC and computerized text analysis methods.
Journal of Language and Social Psychology, 29(1), 1-31.
Toma, C. L., & Hancock, J. T. (2012). What lies beneath: The linguistic traces of deception in online dating profiles. Journal of
Communication, 62(1), 78-97. doi:10.1111/j.1460-2466.2011.01619.x
Undeutsch, U. (1954). Die Entwicklung der gerichtspsychologischen Gutachtertätigkeit [The development of forensic
psychological expert assessment]. Oxford, England: Hogrefe, Verlag für Psychologie.
Undeutsch, U. (1967). Forensische Psychologie [Forensic psychology]. Göttingen, Germany: Verlag für Psychologie.
Vrij, A. (2008). Detecting lies and deceit: Pitfalls and opportunities (2nd ed.). England: John Wiley & Sons Ltd.
Vrij, A. (2018). Deception and truth detecting when analyzing nonverbal and verbal cues. Applied Cognitive Psychology, 33, 1-8.
Zhou, L., Burgoon, J. K., Nunamaker, J. F., & Twitchell, D. (2004). Automated linguistics-based cues for detecting deception in
text-based asynchronous computer-mediated communication: An empirical investigation. Group Decision and
Negotiation, 13, 81-106.
Table 1
Differences in Mean Word Category Percentages When Averaged Across Honest and Deceptive Statements

Verbal Indicators                 Examples            Honest         Deceptive               d
                                                      SD      r      M       SD      r
Function Words (Total)                                3.90   -.11    54.59   3.78   -.24*    -0.16
Pronoun                           Our, They, Yours    3.22   -.11    14.61   3.00   -.22*    -0.08
Personal Pronoun                  I, them, her        2.39   -.05     9.70   2.26   -.17     -0.02
First Person Singular Pronoun¹    I, me, my           2.43   -.09     6.23   2.05   -.12     -0.23*
Impersonal Pronoun                It, They, That      1.89   -.11     4.92   1.65   -.18     -0.11
Article                           A, An, The          1.97    .14     8.13   1.93    .09      0.32***
Prepositions                      On, to, from        1.81    .21*   16.18   1.94    .24*    -0.06
Auxiliary Verb                    Am, Will, Have      1.84   -.28**   7.20   1.99   -.21*    -0.22*
Adverb                            Very, Really        1.55   -.17     3.88   1.46   -.23*    -0.23*
Conjunctions¹                     And, also           1.56    .04     6.52   1.48   -.16     -0.27**
Negate                            No, never           0.68   -.42**   0.88   0.67   -.14     -0.18
Affect (Total)²                   Happy, Ugly         1.52   -.03     4.77   1.46   -.09      0.18
Positive Emotion                  Pretty, Good        1.29    .07     3.67   1.33   -.06      0.21
Negative Emotion¹                 Hate, worthless     0.78   -.16     0.94   0.79   -.09     -0.01
Social (Total)                    Talk, They          2.73    .08     9.31   2.70    .01      0.41***
Cognitive Processes (Total)²      Possible, Maybe     2.78   -.26**  10.65   3.04   -.11     -0.21*
Insight                           Think, Know         0.92   -.02     2.14   1.00    .00     -0.17
Cause                             Effect, Hence       1.07   -.03     1.89   0.87    .19     -0.01
Discrepancies                     Would, Could        0.82    .02     1.43   0.77   -.01     -0.06
Tentative                         Maybe, Perhaps      1.08   -.29**   1.86   1.04   -.17     -0.19
Certain                           Always, Never       0.89   -.16     1.74   1.00   -.11     -0.11
Differentiations                  But, Except         1.26   -.23*    2.38   1.14   -.22*    -0.19
Perceptual Processes (Total)      Observing, Heard    0.69   -.10     1.35   0.85   -.22*     0.18
See²                              View, Saw           0.47    .03     0.42   0.42   -.19      0.00
Hear²                             Listen, Hearing     0.32   -.00     0.34   0.41   -.10      0.30**
Feel                              Feels, touch        0.43   -.22*    0.47   0.45   -.19      0.07
Relativity (Total)                Area, Bend          2.90    .03    14.67   2.92    .09      0.04
Motion¹                           Go, move, walk      0.95   -.02     2.14   1.01   -.04      0.30*
Space²                            Around, Over        1.60   -.02     7.46   1.75   -.04      0.07
Time²                             Hour, Day           1.78    .07     5.26   1.67    .18     -0.17

Note. The r columns denote correlations between quality ratings and word category use. ¹ NP model variable. ² RM model variable.
* p < .05. ** p < .01. *** p < .001.
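To make the table's units concrete: each M is a LIWC-style score, i.e., the percentage of a response's words that fall in a category, and d indexes the within-person honest-versus-deceptive difference. The sketch below illustrates that arithmetic only; the mini word list, the toy responses, and the paired-d formula (mean difference divided by the SD of the differences) are illustrative assumptions, not the authors' materials.

```python
import re
import statistics

# Hypothetical mini-dictionary; real LIWC categories contain far more entries.
FIRST_PERSON_SINGULAR = {"i", "me", "my", "mine", "myself"}

def category_percentage(text, category):
    """LIWC-style score: percentage of words in `text` that belong to `category`."""
    words = re.findall(r"[a-z']+", text.lower())
    return 100 * sum(w in category for w in words) / len(words) if words else 0.0

def paired_cohens_d(honest, deceptive):
    """One common paired-design d: mean difference over the SD of the differences."""
    diffs = [d - h for h, d in zip(honest, deceptive)]
    return statistics.mean(diffs) / statistics.stdev(diffs)

# Toy usage: the same two respondents answering honestly and deceptively.
honest = [category_percentage(t, FIRST_PERSON_SINGULAR) for t in
          ["I led the project and I delivered it early.",
           "My team praised me for my planning."]]
deceptive = [category_percentage(t, FIRST_PERSON_SINGULAR) for t in
             ["The project was delivered early by the team.",
              "The planning was widely praised."]]
print(round(paired_cohens_d(honest, deceptive), 2))  # negative: fewer I-words when deceptive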
Table 2
Logistic Regression of Newman-Pennebaker Model Predicting Deception

Predictor                        β      S.E.   Wald    p     Odds Ratio   95% CI for Odds Ratio
                                                                          Lower      Upper
Q1
First Person Singular Pronoun   -.07    .04    2.23    .14    .94          .86        1.02
Negative Emotions               -.30    .15    4.09    .04    .74          .55         .99
Conjunctions                    -.16    .08    3.66    .06    .85          .73        1.00
Motion Verbs                     .16    .11    2.20    .14   1.17          .95        1.45
Intercept                       1.50    .66    5.08    .02   4.40
Q2
First Person Singular Pronoun   -.02    .05    1.00    .75    .98           –           –
Negative Emotions                .33    .15    4.41    .04   1.39           –           –
Conjunctions                    -.12    .08    2.49    .11    .88           –           –
Motion Verbs                     .24    .12    3.85    .05   1.27           –           –
Intercept                        .18    .60    0.09    .76   1.19
Q1 & Q2 Combined Responses
First Person Singular Pronoun   -.07    .06    1.12    .29    .93          .82        1.06
Negative Emotions               -.02    .18    0.01    .91    .98          .69        1.40
Conjunctions                    -.20    .09    4.20    .04    .82          .68         .99
Motion Verbs                     .33    .15    4.80    .03   1.40         1.03        1.87
Intercept                       1.12    .77    2.12    .15   3.06

Note. The NP model consists of first-person singular pronouns, conjunctions, motion verbs, and negative emotion words. – = confidence limit not available.
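As a reading aid for Tables 2–4: each odds ratio is the exponentiated regression coefficient, and the 95% bounds follow from the Wald interval on β. Checking this against the Negative Emotions row of the combined model (agreement is to rounding of the printed β and S.E.):

\[
\mathrm{OR} = e^{\beta} = e^{-0.02} \approx 0.98, \qquad
95\%\ \mathrm{CI} = \left(e^{-0.02 - 1.96(0.18)},\; e^{-0.02 + 1.96(0.18)}\right) \approx (0.69,\ 1.40).
\]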
Table 3
Logistic Regression of Reality Monitoring Model Predicting Deception

Predictor                β      S.E.   Wald    p     Odds Ratio   95% CI for Odds Ratio
                                                                  Lower      Upper
Q1
Cognitive Processes     -.06    .05    1.70    .19    .94          .86        1.03
Affect                  -.04    .07    0.39    .53    .96          .84        1.09
Time                    -.08    .06    1.90    .17    .92          .82        1.04
Space                   -.02    .07    0.14    .71    .98          .86        1.11
See                     -.11    .21    0.28    .59    .89          .60        1.34
Hear                     .36    .26    1.88    .17   1.43          .86        2.40
Intercept               1.44   1.01    2.04    .15   4.24
Q2
Cognitive Processes     -.05    .04    1.50    .22    .95          .88        1.03
Affect                   .24    .09    7.77    .01   1.27         1.07        1.51
Time                    -.01    .07    0.04    .83    .99          .86        1.13
Space                    .09    .07    1.52    .22   1.09          .95        1.25
See                      .14    .28    0.27    .60   1.16          .67        2.00
Hear                     .63    .34    3.36    .07   1.87          .96        3.67
Intercept              -1.31   1.03    1.63    .20    .27
Q1 & Q2 Combined Responses
Cognitive Processes     -.09    .05    2.95    .09    .91          .82        1.01
Affect                   .12    .10    1.35    .24   1.12          .92        1.37
Time                    -.12    .09    1.90    .17    .89          .75        1.05
Space                    .04    .09    0.17    .68   1.04          .87        1.24
See                      .05    .33    0.02    .89   1.05          .55        2.00
Hear                    1.03    .43    5.60    .02   2.80         1.19        6.54
Intercept                .52   1.33    0.15    .69   1.69

Note. The RM model consists of sensory, spatial, temporal, affective, and cognitive process word categories.
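The models behind Tables 2 and 3 are ordinary logistic regressions on LIWC percentages. The following is a self-contained sketch under stated assumptions (synthetic data, placeholder column names mirroring the RM predictors, and statsmodels' Logit; it is not the authors' code) showing how the odds-ratio and CI columns are derived:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Synthetic stand-in data: one row per statement, LIWC percentages plus a
# 0/1 deception label. Real input would be the scored responses.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "cognitive": rng.normal(11, 3, n),
    "affect":    rng.normal(5, 1.5, n),
    "time":      rng.normal(5, 1.7, n),
    "space":     rng.normal(7.5, 1.7, n),
    "see":       rng.normal(0.4, 0.4, n).clip(0),
    "hear":      rng.normal(0.3, 0.4, n).clip(0),
})
y = rng.integers(0, 2, n)  # 1 = deceptive statement

X = sm.add_constant(df)
fit = sm.Logit(y, X).fit(disp=False)

# The tables' odds-ratio columns: exponentiated coefficients and Wald CIs.
out = pd.DataFrame({
    "OR":    np.exp(fit.params),
    "lower": np.exp(fit.conf_int()[0]),
    "upper": np.exp(fit.conf_int()[1]),
})
print(out.round(2))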
Table 4
Exploratory Data-Driven Model Predicting Deception in Combined Responses

Predictor                        β      S.E.   Wald    Odds Ratio   95% CI for Odds Ratio
                                                                    Lower      Upper
First Person Singular Pronoun    .06    .08    0.54    1.06          .91        1.23
Article                          .15    .09    2.30    1.16          .96        1.40
Auxiliary Verb                  -.11    .09    1.56     .90          .75        1.06
Adverb                          -.09    .12    0.54     .92          .73        1.15
Conjunctions                    -.13    .11    1.40     .88          .71        1.09
Social                           .16    .06    6.23    1.17         1.03        1.32
Cognitive Processes              .01    .06    0.03    1.01          .90        1.14
Hear                             .87    .45    4.12    2.38         1.03        5.50
Motion                           .30    .16    3.66    1.35          .99        1.84
Intercept                      -1.80   1.93    0.88     .16

Note. The exploratory data-driven model consists of all word categories that differed significantly between honest and deceptive statements in Table 1.
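Finally, the classification accuracies referenced in the abstract imply scoring such a model against held-out labels. One conventional way to estimate that is cross-validation; the sketch below is a generic illustration with synthetic placeholders (the shapes and the sklearn estimator are assumptions, and the paper's own evaluation procedure may differ):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Placeholder feature matrix: one row per statement, one column per Table 4
# predictor; labels mark deceptive (1) vs. honest (0) statements.
rng = np.random.default_rng(1)
X = rng.normal(size=(212, 9))      # illustrative shape only
y = rng.integers(0, 2, size=212)

# 10-fold cross-validated accuracy of a logistic-regression classifier.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         cv=10, scoring="accuracy")
print(f"mean accuracy = {scores.mean():.2f}")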
In this article, I present my view on the significant developments and theoretical/empirical tipping points in nonverbal and verbal deception and lie detection from the last 30 years and on prospects for future research in this domain. I discuss three major shifts in deception detection research: (a) From observing target persons' nonverbal behavior to analyzing their speech; (b) from lie detection based on differences between truth tellers and liars' levels of arousal to lie detection based on the different cognitive processes or strategies adopted to appear convincing; and (c) from passively observing target persons to actively interviewing them to elicit or enhance verbal cues to deceit. Finally, I discuss my ideas for future research, focusing on initiatives from my own lab. Hopefully, this will stimulate other researchers to explore innovative ideas in the verbal deception research domain, which already has seen so much progress in the last decade.