Available online at www.sciencedirect.com
ScienceDirect
Procedia Computer Science 245 (2024) 282–289
www.elsevier.com/locate/procedia
1877-0509 © 2024 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0)
Peer-review under responsibility of the scientific committee of the 9th International Conference on Computer Science and Computational Intelligence 2024
doi: 10.1016/j.procs.2024.10.253
9th International Conference on Computer Science and Computational Intelligence 2024 (ICCSCI 2024)
The Impact of Augmentation and SMOTE Implementation on
the Classification Models Performance: A Case Study on
Student Academic Performance Dataset
Albert Verasius Dian Sano^a,*, Faqir M. Bhatti^b, Eka Miranda^c, Mediana Aryuni^c, Alfi Yusrotis Zakiyyah^d, Charles Bernando^c
^a Computer Science Department, School of Computer Science, Bina Nusantara University, Jakarta 11480, Indonesia
^b Riphah Institute of Computing and Applied Sciences, Riphah International University, Raiwind, Lahore, Pakistan
^c Information Systems Department, School of Information Systems, Bina Nusantara University, Jakarta 11480, Indonesia
^d Mathematics Department, School of Computer Science, Bina Nusantara University, Jakarta 11480, Indonesia
Abstract
This study aims to determine the impact of data augmentation and the Synthetic Minority Over-sampling Technique (SMOTE) on the performance of classification models trained on a small and imbalanced dataset of student academic performance. The study design involved a comprehensive experiment comparing four scenarios: 1) classification models without data augmentation or SMOTE, 2) models with data augmentation, 3) models with SMOTE, and 4) models with both data augmentation and SMOTE. Each model's performance was measured with the standard evaluation metrics: accuracy, precision, recall, and F1-score.
To test the validity of the results, three classification algorithms were implemented and evaluated for each scenario, namely Random Forest, XGBoost, and AdaBoost. The findings highlight the significant impact of data augmentation and SMOTE on classification model performance, particularly on a small and imbalanced dataset. The results show that applying both techniques simultaneously brought about the most significant increase in the evaluation metrics, compared with applying either technique separately.
The originality of this study lies in its comprehensive approach to comparing the effectiveness of data augmentation and SMOTE, as well as in its use of a student academic performance dataset, a real case in the context of artificial intelligence. The findings give researchers and practitioners valuable insight into choosing appropriate techniques for handling small and imbalanced datasets. This study is expected to make an important contribution to the more effective development of classification methodology in various domains.
* Corresponding author.
E-mail address: avds@binus.ac.id
© 2024 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0)
Peer-review under responsibility of the scientific committee of the 9th International Conference on Computer Science and
Computational Intelligence 2024
Keywords: Data augmentation; Imbalanced class; Small dataset; SMOTE
1. Introduction
In the current digital era, the use of Artificial Intelligence (AI) techniques for data analysis is growing rapidly across many fields [1][2][3], including classification and prediction. In such contexts, handling a small and imbalanced dataset is a significant challenge. In this study, we explore the impact of data augmentation and the Synthetic Minority Over-sampling Technique (SMOTE) on the performance of classification models using a student academic performance dataset.
The main objective of this study is to answer two research questions: 1) What is the impact of data augmentation and SMOTE on classification model performance? 2) Does the combined use of both techniques bring about a more significant performance increase than the use of either technique separately? Both questions are posed in the context of a small and imbalanced student academic performance dataset. Besides reflecting the main objective of this study, the research questions also frame the subsequent discussion of applying data augmentation and SMOTE to small and imbalanced datasets.
Small and imbalanced datasets are a common problem in many fields, notably medicine and finance, as well as in academic research [4][5][6]. Handling this problem appropriately has important implications for the accuracy and reliability of classification models, which can in turn improve decision making and AI applications in various fields.
This study is expected to make an important contribution by presenting a better understanding of the performance differences between data augmentation alone, SMOTE alone, and the combination of both in handling a small and imbalanced dataset of student academic performance.
The background of this study is the need to appropriately handle small and imbalanced datasets and to enhance the performance of classification models built on them. Given the ever-growing interest in applying AI techniques in education, this study is directly relevant to the needs and practices of this field.
This paper is organized as follows: after this introduction, we describe the experimental design and methods, including the dataset and the implemented techniques. We then present the results and discussion covering the four experiment scenarios, followed by a discussion of the findings. The paper closes with conclusions and directions for future research.
The topic of this study is motivated by the need for an effective approach to the small-and-imbalanced-data problem in student academic performance datasets. By understanding the impact of these techniques on performance, we aim to provide practical guidance to researchers and practitioners in selecting appropriate approaches for their data analysis.
2. Methods
2.1. Data collection
The data were collected through an online questionnaire administered from January 6, 2024, to February 28, 2024, and answered by 252 students from various departments, such as information systems, management, law, and marketing. The dataset has sixteen attributes: fifteen independent attributes and one class attribute, each derived from a statement in the questionnaire. The learning-method context of the questionnaire is Case-Based Learning (CBL). The fifteen independent attributes are listed in Table 1. The 16th statement or attribute,
which is also the class attribute, asks: "How are the student's grades during the course with CBL learning?" The possible values of this class attribute are "Stable" or "Increase". In other words, this value represents the student's academic performance under the CBL method.
Table 1. Dataset attributes.

Questionnaire statement/attribute | Mean (1-5) | Standard Dev.
CBL improves your understanding of the material studied | 4.37 | 0.676
CBL facilitates student self-learning | 4.3 | 0.735
Hypotheses can be formulated from CBL for a specific problem | 4.29 | 0.699
In CBL, incorporating prior knowledge into the current problem's context is feasible | 4.33 | 0.72
In CBL, the information collected can be assessed in relation to the problem | 4.31 | 0.694
CBL learning methods promote the enhancement of decision-making skills | 4.37 | 0.701
CBL enhances information processing skills | 4.44 | 0.651
CBL makes students learn to critically analyze information submitted by other members for discussion | 4.45 | 0.669
CBL trains you to communicate ideas to the group effectively | 4.3 | 0.762
CBL provides an opportunity to improve leadership skills | 4.15 | 0.878
CBL enables you to convey your thoughts clearly in group discussions | 4.25 | 0.792
CBL enables students to engage without needing constant guidance | 3.95 | 0.934
In CBL, you develop an appreciation for the viewpoints of others within the group | 4.44 | 0.698
Students can recognize their ethical and moral responsibilities towards fellow group members | 4.24 | 0.796
Participating in CBL groups helps you recognize personal limitations | 4.21 | 0.78
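For concreteness, the minimal sketch below shows how a dataset of this shape could be loaded and its class label encoded in Python; the file name cbl_questionnaire.csv and the column name grade_during_cbl are hypothetical, since the raw questionnaire export is not published.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# File and column names are hypothetical; the dataset has 15 Likert-scale
# columns (values 1-5) and one class column with values "Stable"/"Increase".
df = pd.read_csv("cbl_questionnaire.csv")

X = df.drop(columns=["grade_during_cbl"])                 # 15 independent attributes
y = LabelEncoder().fit_transform(df["grade_during_cbl"])  # class labels -> 0/1
```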
2.2. Data pre-processing
There are three main activities in this stage:
1. Statements or attributes were converted using a Likert scale ranging from one to five, where one indicates strongly disagree and five indicates strongly agree. Likert conversion of questionnaires is common practice in many domains [7][8]. Mean and standard deviation (SD) are reported in Table 1 because they summarize the responses to the questionnaire. The use of mean and SD for Likert-scale data is supported by several studies and statistical practices, despite the common view that Likert-scale data are ordinal [9][10]. The mean indicates the average response to each statement; a higher mean reflects greater agreement or a more positive response. The standard deviation indicates the variability of the responses; a higher standard deviation means more variability. Take the first statement as an example: "CBL improves your understanding of the material studied" has a mean of 4.37 and an SD of 0.676. The high mean indicates that respondents generally
agree or strongly agree that CBL improves their understanding. The relatively low SD suggests that most
respondents have similar opinions on that statement.
2. Data augmentation. Data augmentation, in short, is a technique for creating synthetic samples from an existing dataset [11][12][13]. It is generally used with classification models. The data augmentation technique applied in this study is random oversampling.
3. SMOTE implementation. SMOTE, in brief, oversamples the minority class of a dataset by constructing synthetic examples [14][15]. Extended variants of SMOTE have been developed for particular applications, such as DeepSMOTE, SASMOTE, and GSMOTE [14][16][17]. In this study, we implemented the basic SMOTE technique, which works on k-NN with the default k = 5, using a Python library. The key formula for generating a new synthetic sample is shown in the equation below:

X_new = X_i + δ × (X_neigh − X_i),  where δ is a random number in [0, 1]   (1)
The SMOTE algorithm can be described briefly as follows:
1. Choose a minority-class sample, say X_i.
2. Find the k nearest minority-class neighbors of X_i and pick one of them, say X_neigh.
3. Create a new sample: compute the difference between X_neigh and X_i, multiply it by a random number between 0 and 1, and add the result to X_i, as in equation (1).
4. Repeat steps 1 to 3 to create as many synthetic samples as needed.
We skip elaborating on standard pre-processing steps such as data cleaning, as the data were already clean, with no missing values and no redundancy.
2.3. Data splitting
We used the hold-out method for splitting the dataset, with an 80:20 ratio for training and testing data, respectively, as this is the most commonly practiced split. This approach is widely accepted and supported by numerous studies and reviews as a standard practice: it ensures sufficient data for training while reserving enough for testing to validate model performance, and it is valued for its simplicity and effectiveness in preventing overfitting and ensuring robust model evaluation [18][19]. In our experiment we also tried k-fold cross-validation with k = 10. Both splitting techniques are widely practiced because of their simplicity and effectiveness in evaluating model performance [19]. However, since the two evaluations produced nearly identical results, we kept the hold-out method and dropped k-fold cross-validation, to avoid multiplying the experimental scenarios and to keep the focus on the impact of data augmentation and SMOTE.
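A minimal sketch of both splitting strategies with scikit-learn follows; stratifying the hold-out split on the class label is an assumption, as the paper does not specify it.

```python
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.ensemble import RandomForestClassifier

# Hold-out split, 80:20; stratifying on the class label is an assumption.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# The k-fold alternative that was tried and then set aside (k = 10).
cv_scores = cross_val_score(RandomForestClassifier(random_state=42), X, y, cv=10)
print(cv_scores.mean())
```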
2.4. Classification model implementation
We chose three commonly practiced classification algorithms for this experiment: Random Forest, XGBoost, and AdaBoost. All three are ensemble methods, meaning they combine multiple models that work together to improve overall performance. Briefly:
1. Random Forest. An ensemble algorithm that combines many decision trees to increase accuracy and reduce overfitting. It uses bootstrap sampling to generate subsets of the training data; each tree gives a prediction, and the final decision is made by majority vote across all trees.
2. XGBoost. This algorithm builds models iteratively, with each new model improving on the previous ones by correcting their errors. It uses regularization techniques to prevent overfitting and improve generalization.
3. AdaBoost. A simple boosting algorithm that combines weak models, i.e., small decision trees, into a robust model. It gives more weight to samples misclassified in the previous iteration, thereby focusing on errors, and adaptively adjusts the model to improve classification performance.
These three algorithms are widely used and known to be very effective in machine learning [20][21][22]. Each algorithm was run under four scenarios: 1) without data augmentation or SMOTE, 2) with data augmentation, 3) with SMOTE, and 4) with both data augmentation and SMOTE, as sketched below.
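The setup could be arranged along the following lines; default hyperparameters are an assumption, since the paper reports no tuning.

```python
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from xgboost import XGBClassifier

# Default hyperparameters are an assumption; the paper reports no tuning.
models = {
    "Random Forest": RandomForestClassifier(random_state=42),
    "XGBoost": XGBClassifier(random_state=42, eval_metric="logloss"),
    "AdaBoost": AdaBoostClassifier(random_state=42),
}

# The same loop is run once per scenario, swapping in that scenario's
# (original, augmented, SMOTE-resampled, or combined) training data.
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)  # scored as in Section 2.5
```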
2.5. Model performance evaluation
We use the standard evaluation metrics for classification in this step: accuracy, precision, recall, and F1-score. The equations for computing these metrics are given below.

Accuracy = (TP + TN) / (TP + TN + FP + FN)   (2)

• TP (True Positive) is the count of positive values correctly identified as positive.
• TN (True Negative) is the count of negative values correctly identified as negative.
• FP (False Positive) is the count of negative values incorrectly identified as positive.
• FN (False Negative) is the count of positive values incorrectly identified as negative.

Precision = TP / (TP + FP)   (3)

Recall = TP / (TP + FN)   (4)

F1-score = 2 × (Precision × Recall) / (Precision + Recall)   (5)
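Equivalently, these metrics can be computed with scikit-learn; average="weighted" is an assumption here, though it is consistent with Tables 2-5, where recall equals accuracy in every row.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# average="weighted" is an assumption; it matches the reported tables,
# where weighted recall coincides with accuracy.
acc  = accuracy_score(y_test, y_pred)
prec = precision_score(y_test, y_pred, average="weighted")
rec  = recall_score(y_test, y_pred, average="weighted")
f1   = f1_score(y_test, y_pred, average="weighted")
print(f"Acc {acc:.3%}  Prec {prec:.3%}  Rec {rec:.3%}  F1 {f1:.3%}")
```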
3. Results and Discussion
As stated earlier, the experiment comprised four scenarios, each run with the three classification algorithms.
Scenario 1: This scenario used the original dataset, which is small and imbalanced, without data augmentation or SMOTE. The evaluation results are shown in Table 2.
Table 2. Findings of classification model performance using scenario 1.

Algorithm | Accuracy | Precision | Recall | F1-score
Random Forest | 46.939% | 48.105% | 46.939% | 46.806%
XGBoost | 53.061% | 54.736% | 53.061% | 52.787%
AdaBoost | 55.102% | 56.560% | 55.102% | 54.990%
The results in Table 2 show relatively low scores for accuracy, precision, recall, and F1-score, a consequence of the small and imbalanced dataset. AdaBoost outperforms the other two algorithms, but overall the scores are low, ranging between 46.939% and 56.560%. We use these results as the baseline for measuring the impact of data augmentation and/or SMOTE on the dataset.
Scenario 2: This scenario applied data augmentation without SMOTE. The evaluation results are shown in Table 3.
Table 3. Findings of classification model performance using scenario 2.

Algorithm | Accuracy | Precision | Recall | F1-score
Random Forest | 89.000% | 89.027% | 89.000% | 89.004%
XGBoost | 89.000% | 89.027% | 89.000% | 89.004%
AdaBoost | 72.000% | 72.361% | 72.000% | 72.000%
Comparing Table 3 to the baseline in Table 2, data augmentation has a clearly significant impact: the scores are much higher, ranging between 72.000% and 89.027%. In this scenario AdaBoost underperforms the other two algorithms, while Random Forest and XGBoost produce identical results.
Scenario 3: This scenario applied SMOTE without data augmentation. The evaluation results are shown in Table 4.
Table 4. Findings of classification model performance using scenario 3.

Algorithm | Accuracy | Precision | Recall | F1-score
Random Forest | 51.020% | 53.459% | 51.020% | 49.990%
XGBoost | 48.980% | 50.890% | 48.980% | 48.209%
AdaBoost | 57.143% | 59.039% | 57.143% | 56.893%
Comparing Table 4 to the baseline in Table 2, there is a small increase in the evaluation scores for Random Forest and AdaBoost, and a small decrease for XGBoost. Overall, the scores remain low and close to the baseline, ranging between 48.209% and 59.039%. We infer that the impact of SMOTE alone is small; moreover, as the XGBoost results show, it does not always have a positive impact.
Scenario 4: This scenario applied both data augmentation and SMOTE. The evaluation results are shown in Table 5.
Table 5. Findings of classification model performance using scenario 4.

Algorithm | Accuracy | Precision | Recall | F1-score
Random Forest | 89.000% | 89.004% | 89.000% | 88.993%
XGBoost | 90.500% | 90.791% | 90.500% | 90.459%
AdaBoost | 73.000% | 73.097% | 73.000% | 72.863%
Comparing Table 5 to the baseline in Table 2, there is a significant increase in the evaluation scores. Compared to Table 3, however, the increase is small for XGBoost and AdaBoost, and Random Forest performs essentially the same.
From these four scenarios, we find that applying data augmentation and SMOTE together to the small and imbalanced dataset significantly improves accuracy, precision, recall, and F1-score. In detail, data augmentation contributes most of the improvement, while SMOTE contributes very little.
4. Conclusion
The original small and imbalanced dataset yields relatively poor performance from the evaluated classification models, with accuracy, precision, recall, and F1-score ranging from 46.939% to 56.560%. Data augmentation has a very significant impact on model performance over this small and imbalanced dataset; SMOTE, on the other hand, has a much smaller impact.
The combined use of data augmentation and SMOTE gives the best results, but the most significant contribution comes from data augmentation. We conclude that data augmentation is
much more effective than SMOTE at improving classification model performance on a small and imbalanced dataset. For future research, given the relatively small impact of basic SMOTE in this experiment, SMOTE variants such as Borderline-SMOTE and SMOTE-ENN are worth investigating.
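Both variants are available in the imbalanced-learn library, so they could be swapped into the same pipeline; a minimal sketch, assuming the training data from Section 2.3:

```python
from imblearn.over_sampling import BorderlineSMOTE
from imblearn.combine import SMOTEENN

# Borderline-SMOTE: oversample only minority samples near the class boundary.
X_bl, y_bl = BorderlineSMOTE(k_neighbors=5, random_state=42).fit_resample(X_train, y_train)

# SMOTE-ENN: SMOTE oversampling followed by Edited Nearest Neighbours cleaning.
X_se, y_se = SMOTEENN(random_state=42).fit_resample(X_train, y_train)
```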
Acknowledgements
This work is supported by Bina Nusantara University as part of BINUS International Research – Applied, under the grant entitled "Learning Analytics Tools Based on Machine Learning in Predicting Student Academic Performance in Case-Based Learning (CBL)", contract number 069/VRRTT/III/2024, dated March 18, 2024.
References
[1] B. Burger, D. K. Kanbach, S. Kraus, M. Breier, and V. Corvello, “On the use of AI-based tools like ChatGPT to support management
research,” Eur. J. Innov. Manag., vol. 26, no. 7, pp. 233–241, 2023, doi: 10.1108/EJIM-02-2023-0156.
[2] H. Crompton and D. Burke, “Artificial intelligence in higher education: the state of the field,” Int. J. Educ. Technol. High. Educ., vol.
20, no. 1, 2023, doi: 10.1186/s41239-023-00392-8.
[3] F. Ramzan, C. Sartori, S. Consoli, and D. Reforgiato Recupero, “Generative Adversarial Networks for Synthetic Data Generation in
Finance: Evaluating Statistical Similarities and Quality Assessment,” AI, vol. 5, no. 2, pp. 667–685, May 2024, doi:
10.3390/ai5020035.
[4] J. M. Johnson and T. M. Khoshgoftaar, “Survey on deep learning with class imbalance,” J. Big Data, vol. 6, no. 1, 2019, doi:
10.1186/s40537-019-0192-5.
[5] L. Dube and T. Verster, “Enhancing classification performance in imbalanced datasets: A comparative analysis of machine learning
models,” Data Sci. Financ. Econ., vol. 3, no. 4, pp. 354–379, 2023.
[6] M. S. Kraiem, F. Sánchez-Hernández, and M. N. Moreno-García, “Selecting the suitable resampling strategy for imbalanced data
classification regarding dataset properties. An approach based on association models,” Appl. Sci., vol. 11, no. 18, 2021, doi:
10.3390/app11188546.
[7] J. C. Westland, “Information loss and bias in likert survey responses,” PLoS One, vol. 17, no. 7 July, pp. 1–17, 2022, doi:
10.1371/journal.pone.0271949.
[8] A. T. Jebb, V. Ng, and L. Tay, “A Review of Key Likert Scale Development Advances: 1995–2019,” Front. Psychol., vol. 12, no. May,
pp. 1–14, 2021, doi: 10.3389/fpsyg.2021.637547.
[9] A. D. Averin, A. A. Yakushev, O. A. Maloshitskaya, S. A. Surby, O. I. Koifman, and I. P. Beletskaya, “Synthesis of porphyrin-
diazacrown ether and porphyrin-cryptand conjugates for fluorescence detection of copper(II) ions,” Russ. Chem. Bull., vol. 66, no. 8,
pp. 1456–1466, 2017, doi: 10.1007/s11172-017-1908-3.
[10] J. C. F. de Winter and D. Dodou, “Five-point likert items: T test versus Mann-Whitney-Wilcoxon,” Pract. Assessment, Res. Eval., vol.
15, no. 11, 2010, doi: 10.7275/bj1p-ts64.
[11] C. Shorten, T. M. Khoshgoftaar, and B. Furht, “Text Data Augmentation for Deep Learning,” J. Big Data, vol. 8, no. 1, 2021, doi:
10.1186/s40537-021-00492-0.
[12] C. Shorten and T. M. Khoshgoftaar, “A survey on Image Data Augmentation for Deep Learning,” J. Big Data, vol. 6, no. 1, 2019, doi:
10.1186/s40537-019-0197-0.
[13] A. Mumuni and F. Mumuni, “Data augmentation: A comprehensive survey of modern approaches,” Array, vol. 16, no. August, p.
100258, 2022, doi: 10.1016/j.array.2022.100258.
[14] T. Kosolwattana, C. Liu, R. Hu, S. Han, H. Chen, and Y. Lin, “A self-inspected adaptive SMOTE algorithm (SASMOTE) for highly
imbalanced data classification in healthcare,” BioData Min., vol. 16, no. 1, pp. 1–14, 2023, doi: 10.1186/s13040-023-00330-4.
[15] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: Synthetic Minority Over-sampling Technique," J. Artif. Intell. Res., vol. 16, pp. 321–357, 2002, doi: 10.1613/jair.953.
[16] D. Dablain, B. Krawczyk, and N. V. Chawla, “DeepSMOTE: Fusing Deep Learning and SMOTE for Imbalanced Data,” IEEE Trans.
Neural Networks Learn. Syst., vol. 34, no. 9, pp. 6390–6404, 2023, doi: 10.1109/TNNLS.2021.3136503.
[17] J. Fonseca and F. Bacao, “Geometric SMOTE for imbalanced datasets with nominal and continuous features,” Expert Syst. Appl., vol.
234, no. September 2022, p. 121053, 2023, doi: 10.1016/j.eswa.2023.121053.
[18] L. Alzubaidi et al., "Review of deep learning: concepts, CNN architectures, challenges, applications, future directions," J. Big Data, vol. 8, no. 1, 2021.
[19] E. Kee, J. J. Chong, Z. J. Choong, and M. Lau, “A Comparative Analysis of Cross-Validation Techniques for a Smart and Lean Pick-
and-Place Solution with Deep Learning,” Electron., vol. 12, no. 11, 2023, doi: 10.3390/electronics12112371.
[20] Z. Jin, J. Shang, Q. Zhu, C. Ling, W. Xie, and B. Qiang, “RFRSF: Employee Turnover Prediction Based on Random Forests and
Survival Analysis,” Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics), vol. 12343 LNCS,
pp. 503–515, 2020, doi: 10.1007/978-3-030-62008-0_35.
[21] T. Chen and C. Guestrin, “XGBoost: A scalable tree boosting system,” Proc. ACM SIGKDD Int. Conf. Knowl. Discov. Data Min., vol.
13-17-Augu, pp. 785–794, 2016, doi: 10.1145/2939672.2939785.
[22] Y. Freund and R. E. Schapire, “A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting,” J. Comput.
Syst. Sci., vol. 55, no. 1, pp. 119–139, 1997, doi: 10.1006/jcss.1997.1504.