ARTICLE
Detecting contract cheating through linguistic fingerprint
Mohammed Kutbi 1, Ali H. Al-Hoorie 2 & Abbas H. Al-Shammari 3
Contract cheating, the act of students enlisting others to complete academic assignments on their behalf, poses a significant challenge in academic settings, undermining the integrity of education and assessment. It involves submitting work that is falsely represented as the student's own, thus violating academic standards and ethics. The advent of artificial intelligence-based language models, such as ChatGPT, has raised concerns about the potential impact of contract cheating. As these language models can generate human-like text with ease, there are concerns about their role in facilitating and increasing contract cheating incidents. Innovative approaches are thus needed to detect contract cheating and address its implications for academic integrity. This study introduces a machine learning (ML) model focused on identifying deviations from a learner's unique writing style (or their linguistic fingerprint) to detect contract cheating, complementing traditional plagiarism detection methods. The study involved 150 learners majoring in engineering and business who were studying English as a foreign language at a college in Saudi Arabia. The participants were asked to produce descriptive essays in English within a consistent genre over one semester. The proposed approach involved data preprocessing, followed by transformation using Term Frequency-Inverse Document Frequency (TF-IDF). To address data imbalance, random oversampling was applied, and logistic regression (LR) was trained with optimal hyperparameters obtained through grid search. Performance evaluation was conducted using various metrics. The results showed that the ML model was effective in identifying non-consistent essays with improved accuracy after implementing random oversampling. The LR model achieved an accuracy of 98.03%, precision of 98.52%, recall of 98.03%, and F1-score of 98.24%. The proposed ML model shows promise as an indicator of contract cheating incidents, providing an additional tool for educators and institutions to uphold academic integrity. However, it is essential to interpret the model results cautiously, as they do not constitute unequivocal evidence of cheating but rather serve as grounds for further investigation. We also emphasize the ethical implications of such approaches and suggest avenues for future research to explore the model's applicability among first-language writers and to conduct longitudinal studies on second-language learners' language development over longer periods.
https://doi.org/10.1057/s41599-024-03160-9 OPEN
1College of Computing and Informatics, Saudi Electronic University, Jeddah, Saudi Arabia. 2Royal Commission for Jubail and Yanbu, Jubail, Saudi Arabia.
3Faculty of Graduate Studies, Kuwait University, Kuwait City, Kuwait. email: hoorie_a@rcjy.edu.sa
HUMANITIES AND SOCIAL SCIENCES COMMUNICATIONS | (2024) 11:664 | https://doi.org/10.1057/s41599-024-03160-9 1
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
Introduction
Technology has become an integral part of modern life and has been extensively utilized in education, greatly enhancing the convenience of accessing and transmitting information. With the onset of COVID-19, technology proved indispensable for institutions worldwide as they switched to emergency remote teaching (Hodges et al. 2020). However, one major impediment to integrating technology with education is assessment (Al Shlowiy et al. 2021). When educational institutions turned to emergency remote learning, the scale of this problem became apparent. For example, reported cheating incidents increased by 146% at the University of Waterloo, jumped by 269% at the University of Calgary, doubled at the University of Houston, and quadrupled at Queensland University of Technology (Basken, 2020). Lancaster and Cotarlan (2021) analyzed student requests on one file-sharing website, Chegg, and found that exam-style requests increased by almost 200% during the pandemic. With the pandemic over, institutions became eager to go back to traditional face-to-face teaching and assessment in order to address the sudden grade inflation occurring during that period, thus reducing the potential role of technology in education.
A major part of the phenomenon described above is known as contract cheating (Lancaster and Clarke, 2016). Contract cheating occurs when a student is "requesting an original bespoke piece of work to be created for them" (Lancaster and Clarke, 2016, p. 639). Traditional plagiarism software is unable to detect this type of academic integrity breach because the text in question is original and not copied from elsewhere. Students engaged in contract cheating may obtain assistance from commercial essay mills or private tutors for a fee. Other students may obtain such assistance for free from family, friends, and other students. Obtaining work from somebody else and submitting it as one's own for course credit is a serious academic offense, whether payment was involved or not, as grades would no longer be a true reflection of the student's ability.
In this study, we attempted to develop a machine-learning model that can detect the linguistic fingerprint of the foreign language learner. We analyzed essays written by students over a semester and trained the model to detect whether an essay was written by the same student (i.e., consistent) or by a different student (non-consistent). Thus, our primary point of reference in developing this model was not detecting the similarity of the submitted text to texts written by others but detecting deviation from texts written by the same student. Foreign language learners tend to make different language mistakes consistently as they go through a series of developmental stages (Mitchell et al. 2019), and therefore, it may be possible to recognize their linguistic fingerprints (i.e., distinct styles of writing).
Contract cheating
In a study by Bretag et al. (2019), the researchers identified three main factors associated with contract cheating. The first factor was that students were dissatisfied with the learning and teaching environment. Students are usually under tremendous pressure to obtain high GPAs in order to be able to compete in the job market after graduation. This grade-focused educational environment may tempt some students to find shortcuts to achieve higher GPAs. The second factor was the availability of opportunities for cheating. Indeed, a quick Google search shows thousands of commercial services promising assistance in term essays, grant applications, conference presentations, and journal manuscripts (Bretag, 2019). Students are bombarded with advertisements, both online and offline, about discreet coursework assistance. This is a largely unregulated market, and many countries do not have clear and effective laws addressing this problem. Finally, students who speak languages other than English (LOTEs; Dörnyei and Al-Hoorie, 2017) may feel disadvantaged, potentially making contract cheating an enticing option to improve their grades.
The prevalence of such dishonest practices constitutes a serious risk to the credibility of academic institutions. Grades and degrees students receive from their institutions will no longer be credible, and consequently, education loses its value. The larger society also suffers from this situation when the proportion of the workforce (e.g., doctors, engineers) whose qualifications do not reflect their actual skill increases. When these students join academia, they might also be tempted to resort to dishonesty in a publish-or-perish environment (Bretag, 2019). Research suggests that engagement in academic dishonesty is related to involvement in dishonest behavior in other life contexts (Guerrero-Dib et al. 2020) and to corruption at the country level (Orosz et al. 2018).
Existing strategies to counteract academic dishonesty generally depend on technology. One strategy requires students to turn on their cameras during the exam to ensure that the student does not receive assistance. Another strategy is to adopt a specially designed browser (e.g., LockDown Browser) that limits access to unwanted websites and applications. These strategies are not too challenging to trick, considering the availability of smartphones. Besides, not all assessments require students to complete tasks within a specific session under invigilation. Students are frequently asked to write reports and extended essays and submit them at a later date during the semester. In these cases, institutions usually resort to plagiarism detection software (e.g., Turnitin, PlagScan, AntiPlag). This software compares the submitted text with text available online and within its database and produces what is called an "originality score" or a "PlagLevel." A clear problem with this approach is that the software will miss plagiarism cases when it cannot reach the original text (e.g., behind a paywall or still not digitalized), if the original is in a different language, or even when there are typos in either text (Weber-Wulff, 2019).
More importantly, for the present purposes, the traditional anti-plagiarism approach is not designed to detect contract cheating. Contract cheating, by definition, involves an original text that the student does not produce themselves but submits as their own. This could be through a professional agency doing it for a fee, a family member doing it as a favor, or a free online service that paraphrases the text until the plagiarism score is low enough. Even using one of the rapidly increasing number of free paraphrasing services is problematic, as there is no guarantee that the student has actually learned the material. The problem becomes more acute when the student is learning a foreign language. Foreign language learners are expected to demonstrate improvement in their language proficiency, and therefore, obtaining assistance from such automated programs defeats the purpose. In short, "the illegal services are developing at a faster pace than the systems required to curb them" (Hill et al. 2021, p. 15).
Academic integrity in the era of ChatGPT
The emergence of advanced artificial intelligence (AI) language models, such as ChatGPT, has brought forth new challenges in maintaining academic integrity. One critical issue is that ChatGPT has the capability of producing novel texts that can evade traditional plagiarism detection software, making it increasingly difficult to identify instances of contract cheating. Existing approaches to detect ChatGPT-generated texts often rely on keyword-based matching and syntax analysis, but these
methods are proving to be ineffective due to the model's sophisticated language generation abilities. Moreover, ChatGPT can generate highly original text, complicating teachers' ability to discern genuine student work from content generated by these models. Available tools designed to detect AI texts have shown poor accuracy and reliability as well as a bias toward classifying texts as human-written (Weber-Wulff et al. 2023).
In certain contexts, using ChatGPT and other AI tools may be acceptable, and even beneficial. Academics, for example, may leverage these tools to enhance their writing, improve productivity, and explore novel ideas (Kim, 2023). Guidelines for the responsible use of such tools have been developed within academic communities to ensure transparency and acknowledge the involvement of AI tools in content creation (Flanagin et al. 2023). For academic professionals, the goal is not to deceive but to augment the clarity and value of their intellectual contributions.
However, the situation differs significantly for language learners. The primary objective of language assessment is to evaluate a student's language proficiency and linguistic development. In this context, using AI tools to produce responses for language tests defeats the purpose of accurately assessing learners' abilities. Moreover, the impact of ChatGPT extends beyond direct student usage. Even if students do not utilize ChatGPT themselves, the existence of such tools empowers individuals who sell essays to students. Unscrupulous essay-writing services can exploit ChatGPT to mass-produce essays and sell them cheaply, posing a considerable threat to academic integrity and rendering traditional plagiarism detection measures less effective against this new form of contract cheating.
Analytical approaches to detect contract cheating
As contract cheating is a pressing concern for academic institutions, the development of effective analytical approaches to detect such misconduct has become vital. In this section, we review ML algorithms that might be useful in identifying contract cheating instances based on the linguistic patterns and features exhibited in student essays.
ML algorithms. Logistic regression (LR): LR is a supervised learning algorithm that predicts the probability of class membership based on relationships with predictor variables. It utilizes statistical analysis to determine binary outcomes and is valued for its ability to handle varied data sources with minimal complexity. However, LR is sensitive to minor changes in input values, which can significantly bias probability predictions, as noted by Dreiseitl and Ohno-Machado (2002). Additionally, the model's effectiveness is influenced by the dimensionality of the input vector; a higher number of predictors can increase the cost of training and risk overfitting, thereby reducing the model's generalization capabilities across different datasets.
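To make the mechanics concrete, the following is a minimal from-scratch sketch of logistic regression fitted by batch gradient descent on invented one-dimensional toy data. The single feature and all values are illustrative only; the study itself trained LR on TF-IDF vectors with tuned hyperparameters.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(X, y, lr=0.5, epochs=2000):
    """Fit a single-feature logistic regression with batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(w * xi + b)   # predicted probability of class 1
            grad_w += (p - yi) * xi
            grad_b += (p - yi)
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    return 1 if sigmoid(w * x + b) >= 0.5 else 0

# Toy data: the feature could be, e.g., a single stylometric score
X = [0.1, 0.4, 0.5, 1.5, 1.8, 2.0]
y = [0,   0,   0,   1,   1,   1]
w, b = train_logistic_regression(X, y)
```

In practice one would use a library implementation (the study used LR with grid-searched hyperparameters), but the sketch shows why small input perturbations near the decision boundary can flip the predicted probability.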
Light Gradient-Boosting Machine (LightGBM): LightGBM is a widely used gradient-boosting algorithm based on decision trees (Friedman, 2001). It is often used in tasks such as classification, ranking, and regression, and supports efficient parallel training. LightGBM uses an approximate loss function with a piecewise constant tree and a quadratic Taylor approximation at each step. It then trains the decision tree to minimize this quadratic approximation. LightGBM has proved to be an efficient algorithm and has exhibited strong classification results on distributed systems.
Oversampling techniques. Random Oversampling: Random oversampling is a technique for handling imbalanced data. It generates new samples by randomly duplicating instances from the minority class (Mohammed et al. 2020). Consequently, the minority class grows until it matches the sample distribution of the majority class. This technique is deemed the simplest and easiest way to cope with class imbalance issues while showing robust performance. The main benefit of this straightforward technique is that there is no information loss. However, it often increases the likelihood of overfitting since it copies the minority class instances.
Synthetic Minority Oversampling Technique (SMOTE): SMOTE is a sophisticated data balancing technique (Chawla et al. 2002) used to overcome the imbalance problem. SMOTE algorithms aim to generate a balanced class distribution by generating synthesized samples of the minority class via interpolation. SMOTE randomly selects a member of the minority class and then uses k-nearest neighbors (KNN) to synthesize new samples along the line segment between the minority class member and its neighbor. This process is repeated until the minority class has the same sample distribution as the majority class. SMOTE can alleviate the overfitting problem caused by random oversampling since the added samples are synthetic rather than replications.
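A minimal sketch of the interpolation idea, using k = 1 nearest neighbor for brevity (standard SMOTE typically uses k ≈ 5); the minority points below are invented for illustration:

```python
import random

def smote_1nn(minority, n_new, seed=0):
    """Minimal SMOTE sketch: interpolate between a random minority point
    and its nearest minority-class neighbor (k = 1 for brevity)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a = rng.choice(minority)
        # nearest neighbor of a within the minority class (excluding a itself)
        nb = min((p for p in minority if p is not a),
                 key=lambda p: sum((ai - pi) ** 2 for ai, pi in zip(a, p)))
        gap = rng.random()  # random point on the segment between a and nb
        synthetic.append(tuple(ai + gap * (bi - ai) for ai, bi in zip(a, nb)))
    return synthetic

minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1)]
new_points = smote_1nn(minority, n_new=5)
```

Because each synthetic point lies on a segment between two real minority samples, the new samples vary rather than repeat, which is what mitigates the overfitting risk of plain duplication.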
The present study
Based on the above, there is a need to expand the available tools to detect a wider range of academic dishonesty, including contract cheating. As a number of statistical techniques are available, the aim of the present study was, therefore, to test the efficacy of these techniques in detecting the possibility of contract cheating. To assess the efficiency of the proposed models, we focused on the following research question:
RQ. Can we create an AI-based method that detects inconsistency in writing styles?
To answer our research question, we compared the performance of three ML algorithms and three balancing techniques using four evaluation metrics (Tharwat, 2021). The experimentation phase incorporated two important steps: with and without the application of the balancing technique. This fundamental step allows a close examination of the vital role of balancing in enhancing the model's performance. These metrics can be derived from the confusion matrix outlined in Table 1.
Accuracy represents the percentage of correctly predicted consistent essays relative to the whole dataset. It is calculated as in Eq. 1.

Accuracy = (TP + TN) / (TP + FP + FN + TN)    (1)

Precision represents the exactness of the classifier: the proportion of positive class predictions that actually belong to the positive class. It measures the proportion of students predicted by the classifier to write consistent essays that are actually consistent. It is the ratio of the True Positive (TP) instances to the sum of True Positive (TP) and False Positive (FP) instances (Eq. 2).

Precision = TP / (TP + FP)    (2)
Table 1 Confusion matrix.

Actual class     | Predicted class
                 | Consistent           | Non-consistent
Consistent       | True Positive (TP)   | False Positive (FP)
Non-consistent   | False Negative (FN)  | True Negative (TN)
Recall represents the percentage of consistent essays that have been correctly predicted. It is calculated as in Eq. 3.

Recall = TP / (TP + FN)    (3)

F1-score is the harmonic mean of recall and precision. It strikes a balance between precision and recall, thereby providing a balanced evaluation of the model's performance in classifying the consistency of essays (Eq. 4).

F1-score = 2 × (Precision × Recall) / (Precision + Recall)    (4)
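Eqs. 1-4 can be computed directly from confusion-matrix counts; the counts below are illustrative, not the study's results:

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 (Eqs. 1-4)
    from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Illustrative counts (not the study's actual confusion matrix)
acc, prec, rec, f1 = classification_metrics(tp=80, fp=10, fn=20, tn=90)
```

Note how precision and recall penalize different error types (FP vs. FN), which is why both are reported alongside overall accuracy.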
Method
Participants. The participants (N = 150) were freshmen taking English language courses as prerequisites for their degree plans at an engineering and business college in Saudi Arabia. Their language proficiency level was B1-B2, as they had to successfully complete a foundation year before starting their majors.
Materials. The participants were asked to write descriptive essays about familiar topics, such as describing their campus and what they do on the weekend. The genre was kept consistent to control for genre-related vocabulary and grammatical structures. See Data Analysis for more details.
Procedure. The participants completed the assignments during
class time. One of the researchers or their class teacher was
present to answer any questions. The participants wrote one essay
every 2 weeks, so that each participant produced a total of seven
essays over the semester. The participants were asked not to
resort to assistance from dictionaries or their smartphones. They
were assured that this task was not a test, that it was voluntary,
and that their grades would not be affected. Institutional ethical
approval was obtained before the study commenced.
Data analysis
Proposed approach. This section introduces our proposed approach to classifying students' essays into consistent and non-consistent. The approach was based on five steps, as depicted in Fig. 1. The first step was data preprocessing, which involved cleaning the data by removing punctuation marks, numbers, special characters, extra spaces, empty lines, and stop words. The data transformation step was then conducted to prepare the data to be fed into the classifier. Next, an oversampling step was undertaken to increase the number of samples in the minority class, addressing the imbalance between the classes in our dataset. As a final step, we trained an ML model with its optimal settings, which were acquired empirically using the grid search method. We assessed the performance according to specific evaluation indicators discussed in detail below.
Data preprocessing: Data preprocessing is considered a vital step when dealing with classification tasks; its role mainly involves making the raw data better suited to be fed into ML and deep learning algorithms. Python offers numerous libraries ideal for handling these tasks, among the finest of which is the NLTK library (Hardeniya et al. 2016). First, the samples of students' writings were cleaned by removing special characters, white spaces, and punctuation marks, followed by a simple and effective lowercasing step to convert each word to its lowercase equivalent. Subsequently, a stop-word removal process was conducted to omit all the English stop words according to a predefined list provided by the NLTK library. These preprocessing tasks are straightforward; however, their impact on the model's performance is quite remarkable.
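A minimal sketch of the cleaning pipeline described above, using a tiny illustrative stop-word list in place of NLTK's predefined English list (`nltk.corpus.stopwords.words('english')`):

```python
import re
import string

# Small illustrative stop-word list; the study uses NLTK's full English list.
STOP_WORDS = {"the", "a", "an", "is", "in", "on", "and", "of", "to", "it"}

def preprocess(text):
    """Lowercase, strip punctuation/numbers/extra spaces, drop stop words."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    text = re.sub(r"\d+", " ", text)   # remove numbers
    tokens = text.split()              # split() also collapses extra spaces
    return " ".join(t for t in tokens if t not in STOP_WORDS)

cleaned = preprocess("The campus  is BIG, and it has 3 libraries!")
```

The order matters: lowercasing before stop-word removal ensures "The" matches "the" in the stop-word list.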
Data transformation: In this significant step, we used one of the most prevalent techniques in data transformation. Term Frequency-Inverse Document Frequency (TF-IDF) (Qaiser and Ali, 2018) is used essentially in information retrieval and text mining. This technique represents each sample as a weighted sequence of terms. The TF-IDF technique comprises two sub-steps. First, the term frequency is computed as the number of times a word appears in a document divided by the total number of words in that document, as illustrated in Eq. 5; this is then multiplied by the inverse document frequency, which scales the weight by how many essays contain the term.

W(d,t) = tf × log(N / df)    (5)

where W(d,t) denotes the weight value for term t in document d, N represents the total number of documents in the corpus, tf is the frequency of term t in document d, and df is the number of documents containing term t.
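Eq. 5 can be sketched directly in a few lines; the three-document toy corpus below is invented for illustration:

```python
import math

def tf_idf(term, doc, corpus):
    """Eq. 5 sketch: W(d,t) = tf(t,d) * log(N / df(t)), where tf is the
    term's count in the document normalized by document length, N the
    number of documents, and df the number of documents containing t."""
    tf = doc.count(term) / len(doc)
    df = sum(1 for d in corpus if term in d)
    n = len(corpus)
    return tf * math.log(n / df)

# Toy corpus of tokenized essays
corpus = [["campus", "big", "library"],
          ["campus", "small"],
          ["weekend", "football", "campus"]]

# "campus" appears in every document, so log(N/df) = log(1) = 0:
w_campus = tf_idf("campus", corpus[0], corpus)
# "library" appears in only one of the three documents:
w_library = tf_idf("library", corpus[0], corpus)
```

This illustrates why TF-IDF downweights ubiquitous words (weight 0 for a term in every essay) while upweighting terms distinctive to a particular writer's vocabulary.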
Oversampling phase: The data imbalance issue is broadly dominant in the AI field due to a shortage of many data types. It can adversely affect the model's performance because the model lacks training on particular patterns, leading to poor generalizability. To this end, miscellaneous oversampling techniques are reported in the literature, proving their mettle against this challenge. We used one of the popular oversampling techniques for our imbalanced data, called random oversampling. Before delving into details, we should be clear that this technique is applied only to the training data. Therefore, we divided the whole dataset into two subsets: training data, which represents 70% of the data according to the hold-out method, and the remaining data (30%), which is devoted to testing the final model. Random oversampling is considered the most straightforward oversampling technique to balance the dataset. It balances the data by simply duplicating the samples of the minority class. This process does not introduce any harm, such as noise, into the dataset. Nevertheless, the model is likely to overfit during training because it feeds on the copied information. Additionally, random oversampling can significantly enhance the per-class efficiency and the model's overall performance. As noted, the number of samples in the non-consistent class outnumbers the instances of the consistent class. Table 2 details how the number of samples increased after random oversampling.
Model training: In this step, we trained the model using the new training data obtained after the application of oversampling. The main aim of this model was to differentiate between students' essays, whether consistent or non-consistent, based on ML algorithms. Recently, ML, a subfield of AI, has established its ability to solve many tasks related to text mining, computer vision, and pattern classification. ML algorithms focus on building an intelligent model based on the data they learn and process. We used one of the most widely used ML classifiers, LR, in addition to fine-tuning hyperparameters to ensure the maximization of results and the usage of the optimal set of parameters. LR is a supervised learning algorithm used to predict the probability that a given observation belongs to a class based on its relationship with the predictor variables. Any slight alteration in the input data can excessively impact the predicted probability. Furthermore, the input vector's dimension should be low enough not to inflate the cost of training the model, which would otherwise lead to overfitting and poor generalization. Nonetheless, LR is considered one of the prominent ML algorithms in classification tasks with low complexity. This training step was implemented in the Python programming language and conducted via Google Colab.
Basically, some components must be considered to develop an effective classification model, such as the hyperparameter configuration, which is fundamental to building an accurate and optimal model (Elshawi et al. 2019). However, the search space of parameter value combinations is likely to be vast, and thus tuning manually becomes impractical, ineffective, and time-consuming, and often requires deep knowledge of the models. To this end, automatic hyperparameter optimization is of critical importance. Several techniques exist in this context, each with its strengths and drawbacks (Yang and Shami, 2020). We adopted the grid search function to find the optimal parameters belonging to a particular ML algorithm; this is the most straightforward hyperparameter method. Indeed, it generates a Cartesian product of all possible combinations of hyperparameters. This technique trains LR with every combination generated. It usually needs to be accompanied by a performance metric, often the accuracy metric (as in our case), calculated using the cross-validation technique on the training set. This validation guarantees that the model is exposed to many samples of the data. Grid search traverses the space of parameter values, calculates the score of each resulting model, and then selects the optimal model that provides the best results. Finally, the grid search algorithm outputs the best settings, which are used later in the actual model. To achieve the highest results with LR, we defined a search space of possible parameter values, as illustrated in Table 3.
After training the model with these values, the grid search algorithm output the following combination as the optimal one (max_iter = 200, penalty = l1, solver = liblinear).
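The Cartesian-product search can be sketched with a toy stand-in for the cross-validated score. The grid below is a subset of Table 3's values, and the scoring function is invented purely so the example is self-contained; a real run would fit and evaluate LR for each combination.

```python
from itertools import product

# Hypothetical search space mirroring part of Table 3 (values illustrative)
param_grid = {
    "penalty": ["l1", "l2"],
    "solver": ["liblinear", "saga"],
    "max_iter": [100, 200],
}

def cv_score(params):
    """Toy stand-in for cross-validated accuracy; a real pipeline would
    train and score an LR model with these parameters here."""
    score = 0.90
    score += 0.02 if params["penalty"] == "l1" else 0.0
    score += 0.02 if params["solver"] == "liblinear" else 0.0
    score += 0.01 if params["max_iter"] == 200 else 0.0
    return score

# Cartesian product of all hyperparameter combinations
names = list(param_grid)
best_params, best_score = None, float("-inf")
for values in product(*param_grid.values()):
    params = dict(zip(names, values))
    s = cv_score(params)
    if s > best_score:
        best_params, best_score = params, s
```

The exhaustive loop is exactly why grid search scales poorly: the number of fits is the product of the sizes of all value lists, which motivates keeping the search space in Table 3 small.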
Model evaluation: This step tackles the model's performance using the testing data (30% of the whole dataset), which encompasses 8491 samples, including 167 from the consistent class and 8324 from the non-consistent class. Finally, the overall performance of the studied classifiers was evaluated based on the metrics described below.
Experimental setup. Due to a lack of data in this specific field, we strived to build our dataset based on students' essays for English courses taught at a college in Saudi Arabia. The essays were written by the students by hand, and we then transferred them into an electronic version for processing. Dealing with unlabeled data is a challenging issue, primarily because wrongly labeled data can degrade a model's efficiency. The data used was unlabeled and incorporated 1050 essays from 150 students. We labeled the dataset as follows. First, each essay by Student X was paired with another essay by Student Y, and this pairing was assigned the non-consistent class label. Through this pairing, we guaranteed the non-consistency of different students' essays. Moreover, each essay by Student X was split into two sub-paragraphs maintained as two essays, and we labeled this pairing as the consistent class. The rationale behind this split is that the two halves are guaranteed to belong to the same student, even if an assignment submitted under one name was in fact obtained from elsewhere. Accordingly, the dataset contains 28,302 essays in total, 27,710 of which belong to the non-consistent class, and the remaining 592 are consistent samples.
As shown in Fig. 2, the dataset is highly unbalanced, highlighting the pressing need for applying data imbalance techniques.
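The labeling scheme can be sketched on an invented toy corpus of three students with two essays each. The exact pairing procedure and counts in the study differ (not every possible cross-student pair need be formed); this sketch only illustrates the two labeling rules.

```python
from itertools import combinations

def build_labelled_pairs(essays_by_student):
    """Sketch of the labeling scheme: essays from two different students
    form a non-consistent pair; the two halves of one essay form a
    consistent pair."""
    pairs = []
    students = list(essays_by_student)
    # Non-consistent: pair essays across different students
    for s1, s2 in combinations(students, 2):
        for e1 in essays_by_student[s1]:
            for e2 in essays_by_student[s2]:
                pairs.append(((e1, e2), "non-consistent"))
    # Consistent: split each essay into two halves by the same student
    for s in students:
        for essay in essays_by_student[s]:
            words = essay.split()
            half = len(words) // 2
            first, second = " ".join(words[:half]), " ".join(words[half:])
            pairs.append(((first, second), "consistent"))
    return pairs

# Toy corpus: 3 students with 2 essays each (all text invented)
corpus = {
    "s1": ["my campus is very big and green", "on weekends i play football"],
    "s2": ["the library opens early every day", "i like reading short stories"],
    "s3": ["our cafeteria serves good food", "i visit my family on friday"],
}
pairs = build_labelled_pairs(corpus)
```

Even on this toy corpus the cross-student pairs (12) heavily outnumber the within-essay pairs (6), which mirrors the severe class imbalance reported for the real dataset.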
Results
We implemented three experiments using three ML algorithms (naive Bayes, LR, and LightGBM) and three balancing techniques (ADASYN, SMOTE, and random oversampling). We first conducted the experiments on the whole dataset without the oversampling phase in order to isolate its positive impact on the final results.
Experimental results without the oversampling phase. In the present step, we introduced an empirical evaluation of the studied classifiers without the oversampling phase. Furthermore, we
Table 2 Increase in number of samples after implementing
the oversampling approach.
Class No. of
samples before
oversampling
No. of
samples after
oversampling
Total
training
set
Consistent 425 19,386 19,811
Non-consistent 19386 19,386
Table 3 Search space of parameters possible values.
Parameter Possible values
penalty l1, l2, elasticnet, none
solver lbfgs, newton-cg, liblinear, sag, saga
max_iter 100, 200, 300, 400
HUMANITIES AND SOCIAL SCIENCES COMMUNICATIONS | https://doi.org/10.1057/s41599-024-03160-9 ARTICLE
HUMANITIES AND SOCIAL SCIENCES COMMUNICATIONS | (2024) 11:664 | https://doi.org/10.1057/s41599-024-03160-9 5
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
relied primarily on the data preprocessed after the preprocessing
stage, and then we performed data transformation using two
effective methods: TF-IDF (1-gram) and CountVectorize (bag of
words). The results are found in Table 4.
Overall, the obtained results are fairly similar in terms of performance. Logistic regression combined with bag of words outperformed the other ML classifiers, with an accuracy of 98.06%, a precision of 99.86%, a recall of 98.06%, and an F1-score of 98.92%.
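As a hedged sketch of this classification step, the two transformations and the logistic regression classifier can be combined with scikit-learn as below; the four toy documents are stand-ins for the study's essay samples, not its actual data.

```python
# Bag-of-words (CountVectorizer) and TF-IDF (1-gram) features each feed a
# logistic regression classifier, as in the baseline experiments.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["the cat sat on the mat", "the dog ran in the park",
         "cats sit on mats often", "dogs run in parks daily"]
labels = [0, 1, 0, 1]  # toy stand-in for consistent / non-consistent

for vectorizer in (CountVectorizer(), TfidfVectorizer(ngram_range=(1, 1))):
    X = vectorizer.fit_transform(texts)        # sparse document-term matrix
    clf = LogisticRegression().fit(X, labels)  # default settings; tuning comes later
    print(type(vectorizer).__name__, clf.score(X, labels))
```

In the study itself, the same comparison was run over the full dataset, with the results summarized in Table 4.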
As reported in Table 5, the model predicted the non-consistent samples with high accuracy but predicted the consistent samples poorly. This poor prediction reveals the model's lack of learning from the consistent (minority) class. Because of the class imbalance, the overall accuracy nevertheless remains high.
Classification accuracy measures a model's performance as the number of correct predictions divided by the total number of predictions. However, accuracy is problematic when the classes are imbalanced, as in our case study: a model can score high overall while performing poorly on one class. The lower a class's accuracy, the more often the model fails to correctly predict samples of that class. Overall accuracy can therefore be an unreliable evaluation metric when examples are unequally distributed in the training set. This shortcoming highlights the pressing need to balance the classes before training an effective classification model.
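A tiny illustration (with synthetic labels, not the study's data) makes the pitfall concrete: a degenerate classifier that always predicts the majority class still achieves high overall accuracy while entirely missing the minority class.

```python
# 1 = non-consistent (majority), 0 = consistent (minority)
y_true = [1] * 95 + [0] * 5
y_pred = [1] * 100  # degenerate classifier: always predicts the majority class

overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
minority_recall = sum(t == p == 0 for t, p in zip(y_true, y_pred)) / y_true.count(0)
print(overall)          # 0.95
print(minority_recall)  # 0.0
```

This is exactly the pattern visible in Table 5: near-perfect figures for the non-consistent class and very poor figures for the consistent class, despite high overall accuracy.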
Experimental results with the oversampling phase. In this part, we discuss the results obtained after applying oversampling. This step substantially improved the results by providing the model with enough minority-class information to train on. We used random oversampling to raise the number of samples in the consistent (minority) class, giving the model more data to learn from and thus a chance to become well-trained.
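Random oversampling simply duplicates randomly chosen minority-class samples until the classes are the same size. The study applied it via standard tooling; the following from-scratch Python sketch just illustrates the mechanic.

```python
import random

def random_oversample(samples, labels, seed=0):
    """Duplicate random minority-class samples until all classes match the majority size."""
    rng = random.Random(seed)
    by_class = {}
    for s, l in zip(samples, labels):
        by_class.setdefault(l, []).append(s)
    target = max(len(group) for group in by_class.values())
    out_samples, out_labels = [], []
    for l, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for s in group + extra:
            out_samples.append(s)
            out_labels.append(l)
    return out_samples, out_labels

X = ["a", "b", "c", "d", "e", "f"]
y = ["maj", "maj", "maj", "maj", "maj", "min"]
Xr, yr = random_oversample(X, y)
print(yr.count("maj"), yr.count("min"))  # 5 5
```

Because duplicated samples carry no new information, the technique only rebalances the loss the classifier sees; this is why it is compared below against synthetic-sample methods such as SMOTE and ADASYN.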
To investigate which oversampling technique was most suitable for our ultimate model, we studied three techniques: SMOTE, random oversampling, and ADASYN. Each oversampling procedure was executed with each data transformation and ML model. Table 6 shows that most models differ significantly from the results obtained without oversampling.
The main objective of applying oversampling was to boost performance on the minority (consistent) class. Table 6 shows that random oversampling provides the best results with logistic regression. Overall performance differed only slightly from the experiment without oversampling: accuracy reached 98.03%, with a precision of 98.52%, a recall of 98.03%, and an F1-score of 98.24%.
Table 7 reports the per-class results for the best model. Accuracy on the consistent class rose to 53.88%, and precision for that class improved radically, from 3.59% to 55.62%, with a recall of 43.52% and an F1-score of 49.12%. The significant impact of random oversampling on the model's minority-class performance is thus quite noticeable.
To maximize results, we applied grid search, an automatic hyperparameter tuning technique, to the model's parameters. This step is particularly influential when dealing with ML classifiers. As shown in Table 8, the per-class performance recorded a slight increase: an accuracy of 54.63%, a precision of 56.37% (up from 55.62%), a recall of 46.11%, and an F1-score of 50.42% (up from 49.12%). Grid search was thus beneficial in enhancing the results.
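As a hedged sketch, the grid search step can be reproduced with scikit-learn's GridSearchCV. The grid below is a reduced version of the search space in Table 3, restricted to penalties the liblinear solver supports so that every combination is valid, and the data are synthetic stand-ins for the essay features.

```python
# Grid search over logistic regression hyperparameters (reduced Table 3 grid).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, n_features=20, random_state=0)
grid = {"penalty": ["l1", "l2"],          # liblinear-compatible penalties
        "solver": ["liblinear"],
        "max_iter": [100, 200, 300, 400]}
search = GridSearchCV(LogisticRegression(), grid, cv=3, scoring="f1")
search.fit(X, y)
print(search.best_params_)
```

The study's search over the full Table 3 space selected max_iter=200, penalty=l1, and solver=liblinear as the best configuration.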
In summary, the logistic regression model combined with
random oversampling achieved better results than the other
studied models and demonstrated its ability to investigate the
consistency of student essays with an overall accuracy of 98.03%.
Discussion
Contract cheating, wherein students submit work not authored by them, has emerged as a significant challenge in academic institutions worldwide. To combat this misconduct, it is crucial to employ effective and innovative approaches to detect instances of contract cheating and uphold academic integrity. In this study, we developed an ML model that tackles contract cheating by detecting deviations in the writing styles of individual students rather than relying solely on text similarity. In this section, we discuss the implications of our approach and explore the broader context of academic integrity in the era of ChatGPT and other AI tools, the effectiveness of existing approaches to detecting AI-generated texts, and the ethical considerations surrounding their use.
Table 4 Overall classification results without oversampling phase.

Data transformation   Model                Accuracy   Precision   Recall     F1-score
CountVectorizer       Naive Bayes          0.979508   0.993402    0.979508   0.985958
                      Logistic regression  0.980685   0.998618    0.980685   0.989264
                      LightGBM             0.980332   1.000000    0.980332   0.990068
TF-IDF (1-gram)       Naive Bayes          0.980332   1.000000    0.980332   0.990068
                      Logistic regression  0.979508   0.993097    0.979508   0.985796
                      LightGBM             0.980332   1.000000    0.980332   0.990068
Note. Best performing model: logistic regression with CountVectorizer.

Fig. 2 Imbalance between consistent and non-consistent samples.

Table 5 Classification results per class for the best model.

Class            Accuracy   Precision   Recall   F1-score
Consistent       0.0359     0.0359      0.6666   0.0681
Non-consistent   0.9996     0.9810      0.9996   0.9902

The advent of advanced language models, such as ChatGPT, has transformed the landscape of academic integrity. These AI-powered models can produce novel, coherent texts, rendering them virtually undetectable by traditional plagiarism detection software. As a result, students are increasingly tempted to use such tools to generate essays, reports, and assignments, leading to a rise in contract cheating incidents. The ease of access and the ability to produce custom-written content with minimal effort have made AI tools a potential enabler of academic misconduct. Educational institutions must be proactive in adopting strategies to identify and prevent contract cheating facilitated by AI technologies.
In this study, we proposed an analytical approach based on ML algorithms to detect contract cheating instances. Unlike traditional methods that rely on text similarity comparisons, our model focuses on identifying inconsistencies in a student's writing style, which can indicate potential contract cheating. Using ML, we aim to learn the distinctive patterns of each student's writing and establish a profile that allows recognition of sudden deviations from their own norm. Our results demonstrated promising accuracy in detecting discrepancies, offering valuable insights into the potential of data-driven methodologies for contract cheating detection.
However, we acknowledge that analytical approaches alone cannot completely eradicate contract cheating. As academic integrity violations continue to evolve, it is crucial to adopt a multi-pronged approach that combines analytical methods with other preventive measures, such as educational interventions, promoting a culture of academic honesty, and establishing strong policies against contract cheating. AI tools themselves are a double-edged sword in educational settings. They can support academic writing, allowing academics to enhance their research and communication, and educators may use them to improve their writing or brainstorm ideas for lectures and assignments. For language learners, however, AI tools can defeat the very purpose of testing language proficiency. If students resort to AI-generated content, they do not genuinely showcase their language skills, and educational assessments lose their accuracy and fairness.
Detecting AI-produced texts using traditional plagiarism detection methods is challenging (Weber-Wulff et al. 2023). Because content generated by AI models is not sourced from existing online repositories, it goes undetected by similarity-based plagiarism detection systems. Existing approaches, such as rule-based detectors or keyword matching, are likewise inadequate, as they do not account for the linguistic nuances and coherence achieved by advanced large language models. Consequently, academics and educators face an uphill battle in staying ahead of contract cheating attempts that utilize AI technology.
Second language learners undergo a series of developmental stages as they acquire the target language. Different learners tend to make different mistakes consistently, which facilitates the creation of individualized learner profiles. Our model exhibited a high overall level of accuracy in detecting whether a piece of writing was written by the same individual or a different one. For example, second language learners' reliance on readily available automated paraphrasing services or online translators can detrimentally affect their learning. The text they produce does not reflect their language proficiency, which is an essential evaluation criterion in second language learning courses. The purpose of language proficiency tests is to assess the learner's ability to comprehend, express, and articulate ideas in the target language. Utilizing AI language models to produce texts can undermine the accuracy of language assessments, as the generated content may not genuinely reflect the student's linguistic ability.

Table 6 Classification results for studied models after applying oversampling techniques.

Oversampling technique   Data transformation   Model                Accuracy   Precision   Recall     F1-score
Random oversampling      CountVectorizer       Naive Bayes          0.779178   0.660305    0.779178   0.702178
                                               Logistic regression  0.980332   0.985253    0.980332   0.982432
                                               LightGBM             0.702037   0.695287    0.708033   0.598914
                         TF-IDF (1-gram)       Naive Bayes          0.731245   0.589066    0.731245   0.990068
                                               Logistic regression  0.977859   0.982533    0.977859   0.979936
                                               LightGBM             0.709457   0.715469    0.709457   0.608702
ADASYN                   CountVectorizer       Naive Bayes          0.954540   0.950083    0.954540   0.952275
                                               Logistic regression  0.974326   0.986617    0.974326   0.980025
                                               LightGBM             0.829938   0.772293    0.829938   0.772037
                         TF-IDF (1-gram)       Naive Bayes          0.738076   0.601687    0.738076   0.646156
                                               Logistic regression  0.977388   0.982135    0.977388   0.979509
                                               LightGBM             0.867742   0.804498    0.867742   0.826125
SMOTE                    CountVectorizer       Naive Bayes          0.952774   0.946976    0.952774   0.949817
                                               Logistic regression  0.974679   0.987003    0.974679   0.980368
                                               LightGBM             0.832529   0.774883    0.774883   0.775671
                         TF-IDF (1-gram)       Naive Bayes          0.745731   0.612342    0.745731   0.656499
                                               Logistic regression  0.977506   0.982524    0.977506   0.979739
                                               LightGBM             0.865151   0.797907    0.865151   0.822614
Note. Best performing model: logistic regression with random oversampling and CountVectorizer.

Table 7 Classification results per class for the best model (logistic regression) with oversampling.

Class            Accuracy   Precision   Recall     F1-score
Consistent       0.5388     0.556250    0.435233   0.491228
Non-consistent   0.9915     0.984807    0.992046   0.988413

Table 8 Classification results per class for the best model (logistic regression) with oversampling after grid search (max_iter = 200, penalty = l1, solver = liblinear).

Class            Accuracy   Precision   Recall     F1-score
Consistent       0.5463     0.563758    0.461140   0.504249
Non-consistent   0.9915     0.986934    0.992167   0.989543
The accessibility and ease of use of AI language models also empower individuals involved in selling illegal essays to students. AI-generated content can be mass-produced with minimal effort, making it an attractive option for essay mills and unscrupulous individuals seeking to profit from academic misconduct. This exacerbates the problem of contract cheating: it not only helps students evade detection but also enables the unethical essay-writing industry to flourish. As educators strive to combat contract cheating, it is essential to address the root causes and disrupt the supply chain of such illegitimate services.
Despite these promising results, we emphasize that the results from this model should not be applied mechanistically. When the model flags a discrepancy, this can serve as an indicator of a potential cheating incident and mark a particular essay for further investigation, but it does not constitute unequivocal evidence of cheating. When language teachers become familiar with a student's writing style, they may become suspicious when that student submits work that greatly exceeds his or her current language proficiency. Our model formalizes this intuition and provides a probability figure that calls for further investigation. This is also helpful for teachers responsible for large classes, who may not be able to become familiar with each student's individual style. It may also, we hope, deter potential cheaters from engaging in dishonest endeavors in the first place.
For any proposed approach, it is important to consider its
ethical implications. Some approaches developed to enhance the
security of online exams have been criticized for violating privacy.
Examples include eye movement, biometric data, and keystroke
tracking (see Hill et al. 2021). There are also concerns about the
consequences of security breaches of the stored data. In our case,
the ethical issues seem less severe. The model we developed is not very different from existing anti-plagiarism software, which collects texts from the internet and from other users to build its database. Nevertheless, we encourage further examination of the ethical consequences of implementing this approach and other similar ones in educational settings.
While ML algorithms can be powerful tools for preserving
academic integrity, their implementation must be done ethically
and responsibly. As with any technology that deals with student
data and performance evaluation, concerns about privacy and
fairness are paramount. Educational institutions and researchers
need to ensure that the data used to train these models are collected and used in accordance with established ethical guidelines.
Additionally, the interpretability of ML models is essential to
build trust and transparency. Understanding how the model
arrives at its decisions can help educators and administrators
assess its reliability and applicability.
The fight against contract cheating requires a multi-pronged approach that leverages technology, educational interventions, and policy changes. As AI language models continue to evolve, one future research direction is improving detection models through advances in ML algorithms and natural language processing techniques, to better identify AI-generated content and develop tailored approaches for detecting contract cheating. Another direction is conducting longitudinal studies that examine the evolution of language proficiency among learners over time, considering the influence of AI tools on language development and the model's ability to anticipate learners' varying developmental trajectories. Researchers should also collaborate with educational institutions to develop robust policies that address contract cheating and implement appropriate consequences for offenders while simultaneously promoting a supportive learning environment. Finally, future research should examine the possibility of developing a model to detect contract cheating when an individual writes in their native language. Native speakers range from school students to graduate students, and our model may not apply to all these groups.
Conclusion
In this study, we attempted to tackle the pressing issue of contract cheating in academic institutions by exploring the possibility of developing a statistical model to detect contract cheating instances, specifically in the context of student essays. Our approach departed from conventional plagiarism detection methods in favor of identifying deviations from an individual's unique writing style, rather than relying solely on textual similarities. We acknowledge the growing impact of AI language models like ChatGPT on contract cheating, with their potential to generate novel texts that elude traditional detection mechanisms. We also acknowledge that AI tools might be able to mimic an individual's writing style, though this still requires feeding the model many examples of one's writing, making the process more burdensome than a few clicks of a button. This will similarly make it harder for essay mills to mass-produce essays.
The advent of AI language models has indeed ushered in a new
era of academic integrity, prompting educators and institutions to
adapt their approaches to safeguard the credibility of education
and assessment. The challenge lies in devising effective detection
models that can distinguish AI-generated content from genuine
human-authored work while also maintaining student privacy and
data security. Our findings underscore the need for continuous
research and development of ML algorithms that can better
identify AI-generated texts and provide nuanced approaches to
combatting contract cheating. Moreover, it is imperative to foster
a culture of academic honesty through educational interventions,
policies, and collaborative efforts that emphasize the ethical use of
AI tools and promote responsible academic conduct.
As the academic community confronts the evolving landscape of academic misconduct, a multi-faceted approach is essential. This involves leveraging technological advancements to enhance detection capabilities, encouraging students to embrace the value of authentic learning experiences, and enacting institutional policies that deter and penalize contract cheating. While AI language models hold immense potential to revolutionize education positively, their misuse for contract cheating threatens the very foundations of academic integrity. By remaining proactive in our strategies and engaging all stakeholders, we can collectively uphold the principles of academic integrity, preserve the credibility of educational assessments, and foster an environment where learning and knowledge flourish ethically.
Data availability
Data may be accessed at https://osf.io/m3zax/.
Received: 31 July 2023; Accepted: 13 May 2024;
References
Al Shlowiy A, Al-Hoorie AH, Alharbi M (2021) Discrepancy between language learners' and teachers' concerns about emergency remote teaching. J Comput Assist Learn 37(6):1528–1538. https://doi.org/10.1111/jcal.12543
Basken P (2020) Universities say student cheating exploding in Covid era. Times Higher Education. https://www.timeshighereducation.com/news/universities-say-student-cheating-exploding-covid-era
Bretag T (2019) Contract cheating will erode trust in science. Nature 574(7780):599. https://doi.org/10.1038/d41586-019-03265-1
Bretag T, Harper R, Burton M, Ellis C, Newton P, Rozenberg P, Saddiqui S, van Haeringen K (2019) Contract cheating: a survey of Australian university students. Stud High Educ 44(11):1837–1856. https://doi.org/10.1080/03075079.2018.1462788
Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 16:321–357. https://doi.org/10.1613/jair.953
Dörnyei Z, Al-Hoorie AH (2017) The motivational foundation of learning languages other than Global English. Mod Lang J 101(3):455–468. https://doi.org/10.1111/modl.12408
Dreiseitl S, Ohno-Machado L (2002) Logistic regression and artificial neural network classification models: a methodology review. J Biomed Inform 35(5):352–359. https://doi.org/10.1016/s1532-0464(03)00034-0
Elshawi R, Maher M, Sakr S (2019) Automated machine learning: state-of-the-art and open challenges. arXiv. https://doi.org/10.48550/arXiv.1906.02287
Flanagin A, Kendall-Taylor J, Bibbins-Domingo K (2023) Guidance for authors, peer reviewers, and editors on use of AI, language models, and chatbots. JAMA, advance online publication. https://doi.org/10.1001/jama.2023.12500
Friedman JH (2001) Greedy function approximation: a gradient boosting machine. Ann Stat 29(5):1189–1232. https://doi.org/10.1214/aos/1013203451
Guerrero-Dib JG, Portales L, Heredia-Escorza Y (2020) Impact of academic integrity on workplace ethical behaviour. Int J Educ Integr 16(1):2. https://doi.org/10.1007/s40979-020-0051-3
Hardeniya N, Perkins J, Chopra D, Joshi N, Mathur I (2016) Natural language processing: Python and NLTK. Packt Publishing
Hill G, Mason J, Dunn A (2021) Contract cheating: an increasing challenge for global academic community arising from COVID-19. Res Pract Technol Enhanc Learn 16(1):24. https://doi.org/10.1186/s41039-021-00166-8
Hodges C, Moore SL, Lockee B, Trust T, Bond A (2020) The difference between emergency remote teaching and online learning. EDUCAUSE Rev. https://er.educause.edu/articles/2020/3/the-difference-between-emergency-remote-teaching-and-online-learning
Kim S-G (2023) Using ChatGPT for language editing in scientific articles. Maxillofac Plast Reconstr Surg 45(1):13. https://doi.org/10.1186/s40902-023-00381-x
Lancaster T, Clarke R (2016) Contract cheating: the outsourcing of assessed student work. In: Bretag T (ed) Handbook of academic integrity. Springer, pp 639–654
Lancaster T, Cotarlan C (2021) Contract cheating by STEM students through a file sharing website: a Covid-19 pandemic perspective. Int J Educ Integr 17(1):3. https://doi.org/10.1007/s40979-021-00070-0
Mitchell R, Myles F, Marsden E (2019) Second language learning theories, 4th edn. Routledge
Mohammed R, Rawashdeh J, Abdullah M (2020, April) Machine learning with oversampling and undersampling techniques: overview study and experimental results. Paper presented at the 11th International Conference on Information and Communication Systems (ICICS), Irbid, Jordan
Orosz G, Tóth-Király I, Bőthe B, Paskuj B, Berkics M, Fülöp M, Roland-Lévy C (2018) Linking cheating in school and corruption. Eur Rev Appl Psychol 68(2):89–97. https://doi.org/10.1016/j.erap.2018.02.001
Qaiser S, Ali R (2018) Text mining: use of TF-IDF to examine the relevance of words to documents. Int J Comput Appl 181(1):25–29. https://doi.org/10.5120/ijca2018917395
Tharwat A (2021) Classification assessment methods. Appl Comput Inform 17(1):168–192. https://doi.org/10.1016/j.aci.2018.08.003
Weber-Wulff D (2019) Plagiarism detectors are a crutch, and a problem. Nature 567(7749):435. https://doi.org/10.1038/d41586-019-00893-5
Weber-Wulff D, Anohina-Naumeca A, Bjelobaba S, Foltýnek T, Guerrero-Dib J, Popoola O, Šigut P, Waddington L (2023) Testing of detection tools for AI-generated text. Int J Educ Integr 19(1):26. https://doi.org/10.1007/s40979-023-00146-z
Yang L, Shami A (2020) On hyperparameter optimization of machine learning algorithms: theory and practice. Neurocomputing 415:295–316. https://doi.org/10.1016/j.neucom.2020.07.061
Author contributions
Mohammed Kutbi: conceptualization, data curation, formal analysis, investigation, methodology, software, visualization, writing: original draft, and writing: review & editing. Ali H. Al-Hoorie: conceptualization, methodology, project administration, resources, writing: original draft, and writing: review & editing. Abbas Al-Shammari: conceptualization, data curation, methodology, writing: original draft, and writing: review & editing.
Ethical approval
The authors sought and obtained IRB approval from the ethics team at the English
Language and Preparatory Year Institute, Royal Commission for Jubail and Yanbu (dated
23 November 2020) with no number attached to it. All procedures implemented in this
study adhered to the ethical standards of the granting institution and to the tenets of the
Declaration of Helsinki.
Informed consent
Informed consent was obtained from all participants in this study to ensure adherence to ethical guidelines. Detailed consent forms explaining the scope of the study and the participants' rights were signed by each participant before the commencement of the study. The consent forms assured each participant of the confidentiality of their responses and of their right to withdraw at any time without any consequences. The forms also explained that the data would be used for research purposes after anonymization, including further exploratory research.
Competing interests
The authors declare no competing interests.
Additional information
Correspondence and requests for materials should be addressed to Ali H. Al-Hoorie.
Reprints and permission information is available at http://www.nature.com/reprints
Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
© The Author(s) 2024
... They found that AI can be useful for writing, but it requires responsible use by students and an educational environment that promotes its appropriate use. Similarly, Kutbi et al. (2024) proposed a machine learning-based model to identify inconsistency between student-written essays and ChatGPT-generated texts. They used Term Frequency-Inverse Document Frequency (TF-IDF) based data preprocessing and transformation, together with random oversampling and logistic regression, to identify inconsistency in AI-generated text, achieving an accuracy of 98%. ...
... The review results show Turnitin, GPTZero and ZeroGPT as researchers' most frequently used tools, followed by less frequent ones, such as the TF-IDF method. These results relate to the work of Kutbi et al. (2024), who used that method to identify inconsistency in AI-generated text, combined with other techniques such as random oversampling and logistic regression. Likewise, in the work of Guleria et al. (2023), the GPTZero tool was used to determine that the articles generated by this model lacked real citations and references. ...
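The TF-IDF, random-oversampling, and logistic-regression pipeline referenced in the contexts above can be sketched as follows. This is a minimal illustration only: the toy corpus, labels, and settings are placeholders, not the authors' actual dataset or hyperparameters, and oversampling is done by hand rather than with a dedicated library.

```python
import numpy as np
from scipy.sparse import vstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy corpus: 1 = consistent with the learner's usual style, 0 = inconsistent.
texts = [
    "i like to write about my town and the people there",
    "my town is small and quiet and i like the park",
    "the weather in my town is hot in summer",
    "this study employs a multifaceted analytical paradigm",
]
labels = np.array([1, 1, 1, 0])

# Step 1: TF-IDF transformation of the raw text.
X = TfidfVectorizer().fit_transform(texts)

# Step 2: random oversampling -- duplicate minority-class rows until balanced.
rng = np.random.default_rng(0)
counts = np.bincount(labels)
minority = counts.argmin()
extra = rng.choice(np.where(labels == minority)[0],
                   size=counts.max() - counts.min(), replace=True)
X_bal = vstack([X, X[extra]])
y_bal = np.concatenate([labels, labels[extra]])

# Step 3: fit logistic regression on the balanced data.
clf = LogisticRegression(max_iter=1000).fit(X_bal, y_bal)
print(clf.score(X_bal, y_bal))  # training accuracy on the toy data
```

In practice the oversampling step would be applied only to the training split, and the regularization strength of the classifier tuned by cross-validated search, as the original study describes.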
Article
Full-text available
The rapid advancement of artificial intelligence (AI) has profoundly transformed many people's lives, ChatGPT being a clear example, whose capabilities have substantially influenced the automation of tasks such as writing texts and providing information sources for researchers. This review article aims to understand the impact of AI on academic writing and why its use can be considered plagiarism. The PRISMA method was used to analyze the studies, which initially totaled 824; after exclusion for duplication and by title, 137 remained, of which those that were open access and related to the study were reviewed, yielding a total of 54 manuscripts closely related to the research topic. The results were then segmented into three questions: What are the text-matching tools that help to identify the use of AI in article writing? What strategies are used to regulate the use of AI in writing scientific articles? And what techniques are used to detect AI-generated text? These questions were key to comparing the results of the review with the findings of the authors in the reviewed literature. The results showed that, from 2022 onwards, AI became a recurring topic among researchers in China and the United States, prompting the emergence of popular techniques and software such as Turnitin or GPTZero to identify when these language models were used in writing tasks, as well as strategies to regulate their use in controlled environments. In the end, it was concluded that there is a fine line between ethics and abuse of AI capabilities, and further research and study of different techniques is recommended to recognize when these tools are blatantly used.
... Idea generation is another AI functionality discussed in 10 articles, nine of which investigated generative AI, mostly ChatGPT, in terms of its advantages and disadvantages as an idea generator and organizer, and the quality of its output (e.g., Zou & Huang, 2023). It is noteworthy that six studies introduced in particular how AI systems (AWE or online tools) were developed and then evaluated in terms of their effectiveness in assessing L2 writing, such as syntactic complexity (Châu & Bulté, 2023), lexical richness (Spring & Johnson, 2022), or addressing issues such as contract cheating (Kutbi et al., 2024). ...
Article
Full-text available
The rapid advancement of artificial intelligence (AI) technologies has introduced both opportunities and challenges to second language (L2) writing, and a new dimension to L2 writing research. To map the evolving landscape of AI-integrated L2 writing research, this systematic review analyzed 112 studies published between January 2014 and June 2024. The review focused on research contexts and participants, theoretical and methodological orientations, research setups, and key research issues. The analysis revealed that the majority of studies were conducted with undergraduates in the English as a Foreign Language (EFL) context. Most studies did not specify a theoretical framework; however, the predominant theoretical orientations were cognitive, technological-pedagogical, social, critical, and genre. The literature commonly employed eclectic or mixed methodologies, favoring (quasi-)experimental designs with short duration. The writing tasks investigated were varied, with a notable emphasis on elemental genres such as argumentative essays. The main research issues addressed included AI functionalities; students' perceptions and experiences; impacts on students' writing, cognitive, affective, and behavioral performance; teachers' perceptions, teaching practices, and literacy development; and factors influencing AI use. The findings highlight the need for future research, particularly longitudinal studies focusing on the development of AI literacy in L2 writing.
... The advent of GenAI applications and the large language models underpinning them has already impacted much research in assessment generally and language assessment in particular, with educators wrestling with threats to assessment integrity and ethical considerations on the one hand (Kutbi et al., 2024) while trying to understand how to harness its affordances to enhance (language) assessment on the other (e.g., Mizumoto et al., 2024). For the former, Lodge et al. (2023a, p. 3) suggest that "the barriers to engaging in cheating behavior (in terms of effort and risk) have been lowered significantly, and the ability to detect cheating has become significantly more difficult, if not impossible". ...
Chapter
Artificial intelligence (AI) is revolutionizing second language (L2) assessment by enhancing traditional methods and introducing innovative approaches. This chapter examines AI's applications in assessing productive and receptive L2 language skills, from automated speech recognition (ASR) for spoken assessments to automated essay scoring (AES) and automated writing evaluation (AWE). Recent advancements in generative AI (GenAI) and large language models (LLMs) have expanded opportunities for automated, scalable assessment across diverse contexts. Highlighting this, case studies from Australia, Hong Kong, and China illustrate the integration of GenAI into formative and summative assessments, showcasing strategies such as 6-P pedagogy and differentiated assessment practices. Despite promising developments, challenges persist, including concerns over assessment integrity, equity, and reliability across tasks and languages. To address these issues, the chapter advocates for professional development in AI literacy for educators, enabling ethical and effective integration of AI in assessment design and implementation. By fostering critical thinking, self-regulation, and inclusivity, AI has the potential to transform language education and prepare learners for real-world communication while still ensuring meaningful human involvement in assessment practices.
... Other research utilized stylometric features to detect machine-generated content. For instance, Kutbi et al. (2024) introduced a machine learning model with stylometry for identifying contract cheating, the act of students relying on others to complete academic assignments on their behalf, by detecting deviations from a learner's distinctive writing style, which achieved excellent accuracy in their research. Opara (2024) developed a data-driven model named "StyloAI" trained with 31 stylometric features to detect machine-generated content. ...
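Stylometric approaches like the ones described above typically reduce a text to a small numeric vector of style markers before classification. The sketch below shows four such markers; the feature names are illustrative only and are not the 31 features used by StyloAI or the feature set of Kutbi et al.

```python
import re

def stylometric_features(text: str) -> dict:
    """Compute a few simple style markers for a piece of text."""
    words = re.findall(r"[a-zA-Z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        # Mean characters per word: proxy for lexical sophistication.
        "avg_word_len": sum(map(len, words)) / len(words),
        # Unique words over total words: proxy for lexical richness.
        "type_token_ratio": len(set(words)) / len(words),
        # Mean words per sentence: proxy for syntactic complexity.
        "avg_sentence_len": len(words) / len(sentences),
        # Commas per word: a simple punctuation-habit marker.
        "comma_rate": text.count(",") / len(words),
    }

print(stylometric_features("I write. I write well, often."))
```

Vectors like these, computed per essay, can then be compared against a learner's historical profile, with large deviations flagged for review.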
Preprint
Full-text available
Recent research has investigated the problem of detecting machine-generated essays for academic purposes. To address this challenge, this research utilizes pre-trained, transformer-based models fine-tuned on Arabic and English academic essays with stylometric features. Custom models based on ELECTRA for English and AraELECTRA for Arabic were trained and evaluated using a benchmark dataset. The proposed models achieved excellent results, with an F1-score of 99.7%, ranking 2nd among 26 teams in the English subtask, and 98.4%, finishing 1st out of 23 teams in the Arabic one.
Chapter
Cultural change is often required to support a positive academic integrity environment, a challenge for small institutions and community colleges alike. This case study demonstrates how planned, incremental changes to institutional practices and assessment approaches shifted a college’s culture from rule compliance to an academic integrity culture. Employing Gallant and Drinan’s (2008) four-stage model of institutionalization, this research develops lessons for other institutions, contributes to a greater understanding of academic integrity in applied education, and illustrates how the community college setting is conducive to institutionalizing academic integrity through practical processes and tools. The chapter ends with recent institutional work to further embed academic integrity into institutional quality assurance framework standards, processes, and evaluations and how the cultural evolution of quality assurance facilitates making academic integrity an integral part of all aspects of academic programming, from design to evaluation.
Article
Full-text available
This interpretive phenomenological analysis (IPA) study is a reflection on valuable insights gained by the authors in supervising graduate students’ theses, dissertations, and project works/reports, active participation in seminars on ethics in higher education and first-hand andragogical teaching experiences in the Ghanaian setting. The study explored contract cheating among graduate students and strategic interventions used by faculty to address it. Using an interpretive phenomenological analysis (IPA) in a qualitative narrative paradigm, the study discussed the critical causal factors, strategic interventions, andragogy, heutagogy, support mechanisms and software employed to mitigate graduate students’ indulgence in contract cheating in their final year thesis, dissertations, and project works/reports. Findings revealed that contract cheating among graduate students is influenced by personal, contextual, cultural, situational, institutional, and technological factors, as well as a misconception of widespread participation in higher education. The study recommends a paradigmatic shift away from the punitive and toward the developmental approach when responding to contract cheating. The study contributes new insights to enrich the ongoing scholarly conversation on contract cheating and interventions in Ghanaian universities. Keywords: Contract Cheating, Pseudepigraphy, Essay Mills, Academic Integrity, Andragogy
Article
Full-text available
Recent advances in generative pre-trained transformer large language models have emphasised the potential risks of unfair use of artificial intelligence (AI) generated content in an academic environment and intensified efforts in searching for solutions to detect such content. The paper examines the general functionality of detection tools for AI-generated text and evaluates them based on accuracy and error type analysis. Specifically, the study seeks to answer research questions about whether existing detection tools can reliably differentiate between human-written text and ChatGPT-generated text, and whether machine translation and content obfuscation techniques affect the detection of AI-generated text. The research covers 12 publicly available tools and two commercial systems (Turnitin and PlagiarismCheck) that are widely used in the academic setting. The researchers conclude that the available detection tools are neither accurate nor reliable and have a main bias towards classifying the output as human-written rather than detecting AI-generated text. Furthermore, content obfuscation techniques significantly worsen the performance of tools. The study makes several significant contributions. First, it summarises up-to-date similar scientific and non-scientific efforts in the field. Second, it presents the result of one of the most comprehensive tests conducted so far, based on a rigorous research methodology, an original document set, and a broad coverage of tools. Third, it discusses the implications and drawbacks of using detection tools for AI-generated text in academic settings.
Article
Full-text available
Emergency remote teaching refers to the unanticipated, involuntary shift to a virtual learning environment due to, for example, a natural disaster or political instability. The sudden nature of this transition creates additional challenges to effective learning. In this article, we investigate one such challenge, namely the potential for teacher–student miscommunication. We report on a study involving 674 language learners and 61 language teachers. The participants were asked to rate a number of education-related problems that could potentially arise in the context of emergency remote teaching. Learners rated these concerns in terms of the extent to which they had actually experienced them, while teachers were asked to rate the extent to which they perceived these to be concerns for their students. The results showed that teachers believed that students required additional training on using learning management systems, that students did not take online teaching seriously, and that emergency remote teaching would encourage students to cheat. Students disagreed with these statements (ds = 0.53–0.65). We discuss the implications of these teacher–learner discrepancies in light of the need for explicit guidelines and clearer expectations of students during online learning and assessment.
Article
Full-text available
Due to COVID-19, universities with limited expertise with the digital environment had to rapidly transition to online teaching and assessment. This transition did not create a new problem but has offered more opportunities for contract cheating and diversified the types of such services. While universities and lecturers were adjusting to the new teaching styles and developing new assessment methods, opportunistic contract cheating providers have been offering $50 COVID-19 discounts and students securing the services of commercial online tutors to take their online exams or to take advantage of real-time assistance from ‘pros’ while sitting examinations. The article contributes to the discourse on contract cheating by reporting on an investigation of the scope and scale of the growing problems related to academic integrity exacerbated by an urgent transition to online assessments during the COVID-19 pandemic. The dark reality is the illegal services are developing at a faster pace than the systems required to curb them, as demonstrated by the results. The all-penetrating issues indicate systemic failures on a global scale that cannot be addressed by an individual academic or university acting alone. Multi-level solutions including academics, universities and the global community are essential. Future research must focus on developing a model of collaboration to address this problem on several levels, taking into account (1) individual academics, (2) universities, (3) countries and (4) international communities.
Article
Full-text available
Students are using file sharing sites to breach academic integrity in light of the Covid-19 pandemic. This paper analyses the use of one such site, Chegg, which offers "homework help" and other academic services to students. Chegg is often presented as a file sharing site in the academic literature, but that is just one of many ways in which it can be used. As this paper demonstrates, Chegg can be and is used for contract cheating. This is despite the apparent existence of an Honour Code on Chegg which asks students not to breach academic integrity. With pandemic-led safety considerations leading to increased online teaching and assessment, the paper analyses data relating to how Chegg is used by students in five STEM subjects, namely Computer Science, Mechanical Engineering, Electrical Engineering, Physics and Chemistry. The results show that students are using Chegg to request exam style questions. They demonstrate that contract cheating requests can be put live and answered within the short duration of an examination. The number of student requests posted for these five subjects increased by 196.25% comparing the time period April 2019 to August 2019 with the period April 2020 to August 2020. This increase corresponds with the time when many courses moved to be delivered and assessed online. The growing number of requests indicates that students are using Chegg for assessment and exam help frequently and in a way that is not considered permissible by universities. The paper concludes by recommending that academic institutions put interventions in place to minimise the risk to educational standards posed by sites such as Chegg, particularly since increased online teaching and assessment may continue after the pandemic.
Article
Full-text available
Machine learning algorithms have been used widely in various applications and areas. To fit a machine learning model into different problems, its hyper-parameters must be tuned. Selecting the best hyper-parameter configuration for machine learning models has a direct impact on the model’s performance. It often requires deep knowledge of machine learning algorithms and appropriate hyper-parameter optimization techniques. Although several automatic optimization techniques exist, they have different strengths and drawbacks when applied to different types of problems. In this paper, optimizing the hyper-parameters of common machine learning models is studied. We introduce several state-of-the-art optimization techniques and discuss how to apply them to machine learning algorithms. Many available libraries and frameworks developed for hyper-parameter optimization problems are provided, and some open challenges of hyper-parameter optimization research are also discussed in this paper. Moreover, experiments are conducted on benchmark datasets to compare the performance of different optimization methods and provide practical examples of hyper-parameter optimization. This survey paper will help industrial users, data analysts, and researchers to better develop machine learning models by identifying the proper hyper-parameter configurations effectively. Github code: https://github.com/LiYangHart/Hyperparameter-Optimization-of-Machine-Learning-Algorithms
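The hyperparameter tuning the survey above discusses can be illustrated with a minimal grid search over cross-validation. The estimator, parameter grid, and benchmark dataset below are arbitrary examples, not choices taken from the survey.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Load a small benchmark dataset.
X, y = load_iris(return_X_y=True)

# Exhaustively evaluate every combination in the grid via 5-fold CV.
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```

Grid search scales poorly as the number of hyperparameters grows, which is why the survey also covers alternatives such as random search and Bayesian optimization.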
Article
Full-text available
Well-planned online learning experiences are meaningfully different from courses offered online in response to a crisis or disaster. Colleges and universities working to maintain instruction during the COVID-19 pandemic should understand those differences when evaluating this emergency remote teaching.
Article
Full-text available
Corruption is a serious problem in Mexico and the available information regarding the levels of academic dishonesty in Mexico is not very encouraging. Academic integrity is essential in any teaching-learning process focussed on achieving the highest standards of excellence and learning. Promoting and experiencing academic integrity within the university context has a twofold purpose: to achieve the necessary learnings and skills to appropriately perform a specific profession and to develop an ethical perspective which leads to correct decision making. The objective of this study is to explore the relationship between academic integrity and ethical behaviour, particularly workplace behaviour. The study adopts a quantitative, hypothetical and deductive approach. A questionnaire was applied to 1203 college students to gather information regarding the frequency in which they undertake acts of dishonesty in different environments and in regards to the severity they assign to each type of infraction. The results reflect that students who report committing acts against academic integrity also report being involved in dishonest activities in other contexts, and that students who consider academic breaches less serious, report being engaged in academic misconduct more frequently in different contexts. In view of these results, it is unavoidable to reflect on the role that educational institutions and businesses can adopt in the development of programmes to promote a culture of academic integrity which: design educational experiences to foster learning, better prepare students to fully meet their academic obligations, highlight the benefits of doing so, prevent the severity and consequences of dishonest actions, discourage cheating and establish clear and efficient processes to sanction those students who are found responsible for academic breaches.
Article
To combat academic dishonesty, focus on educational systems and not just individual offenders, says Tracey Bretag.