Figure 3 presents the FP rate (FPR) on the x-axis; the FPR equals 1 - specificity, where specificity is a performance measure of the negative class. The TP rate (TPR), represented on the y-axis, is also known as sensitivity or recall and measures the performance of the positive class. In this sense, a model with optimal performance has a ROC curve that hugs the upper-left corner of the graph, that is, an FP rate of 0% and a TP rate of 100%. An AUROC close to 100% means the model can distinguish well between the two classes of interest (1: DROPOUT YES and 0: DROPOUT NO). In contrast, an AUROC close to zero represents a model that inverts the classes, e.g., classifies DROPOUT YES as NO and vice versa. When the AUROC is 50%, the model cannot separate the classes (it might as well toss a coin). Half of the models shown in Figure 3 exhibit an AUROC greater than 90%; the RF-based model exceeded this level, reaching an AUROC of 94%. The lowest degree of separation was yielded by the LR model, which also obtained the lowest accuracy among the validated models.

Precision, Recall and F1 score were measured using the scikit-learn library. The results obtained for these three metrics can be visualised in Table VI. The results observed for each model show little variation across these metrics. The values are calculated as the average over the two classes (1 and 0, respectively). The Extra Trees algorithm achieved the highest average recall, with a value of 0.94. The minimum average of 0.79 ...
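As an illustration of how these quantities can be obtained with scikit-learn, the sketch below trains a stand-in Random Forest on synthetic data and computes the AUROC along with precision, recall and F1 averaged over the two classes (macro averaging is assumed here); the dataset, split and model settings are placeholders for demonstration only, not the configuration behind the results reported above.

```python
# Hedged sketch: AUROC, precision, recall and F1 with scikit-learn.
# The synthetic data and Random Forest settings are illustrative placeholders,
# not the configuration used for the results in the paper.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, roc_curve, precision_recall_fscore_support
from sklearn.model_selection import train_test_split

# Placeholder features standing in for student activity data;
# class 1 = DROPOUT YES, class 0 = DROPOUT NO.
X, y = make_classification(n_samples=1000, n_features=20,
                           weights=[0.7, 0.3], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# AUROC is computed from the predicted probability of the positive class.
proba = model.predict_proba(X_test)[:, 1]
auroc = roc_auc_score(y_test, proba)

# Points of the ROC curve: FPR on the x-axis, TPR on the y-axis,
# as plotted in a figure like Figure 3.
fpr, tpr, _ = roc_curve(y_test, proba)

# Precision, recall and F1 averaged over the two classes (macro average),
# mirroring the per-class averaging described in the text.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, model.predict(X_test), average="macro")

print(f"AUROC: {auroc:.2f}  Precision: {precision:.2f}  "
      f"Recall: {recall:.2f}  F1: {f1:.2f}")
```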
Similar publications
School dropout is a significant concern universally. This paper investigates the incorporation of spatial dependency in estimating the topographical effect of school dropout rates in India. This study utilizes the secondary data on primary, upper primary, and secondary school dropout rates of the different districts of India available at the Unifie...
Introduction: Although many studies have examined the associations between growth problems in infancy and age at school entry, grade repetition, school dropout and schooling level in developing countries, no synthesis of the evidence has been conducted. We aim to review evidence of the effects of stunting, or height-for-age, on schooling level and scho...
This study evaluates socio-spatial segregation in Machala, Ecuador, analyzing socioeconomic and demographic factors that impact the quality of life of residents. It focuses on inequalities in access to basic services, urban dynamics, education, health and employment in different neighborhoods. The findings reveal a majority of young people, aged 10...
This study investigates the role of social capital within the university context in retaining working students. It specifically examines the effects of university social capital factors—such as teacher–student relationships, peer networks, and support services—on the dropout intentions of working students, emphasizing the mediating role of employab...
Within the scope of school-based community studies coordinated by Maltepe University Homeless and Working Children Application and Research Center (SOYAC), a program aiming at increasing the resilience of students was implemented in a secondary school with high school dropouts. This international program, briefly known as RESCUR, was implemented fo...
Citations
... Like other studies that have automated forum analysis with ML, the present study chose to develop a binary classifier with ML. As in other studies, this study requires attention to the feature engineering process, which can contribute to and facilitate the prediction process through dimensionality reduction (a problem that can affect even advanced, non-linear techniques) and interpretability [Oliveira et al. 2019, Yoo et al. 2022]. ...
The concepts of Freirean dialogue have been applied and organized in recent years so that it is possible to delineate characteristics of forum messages using the theory. This work proposed a binary text classifier for the presence of Valorização da Autonomia (valuing of autonomy) in forum messages and also compared the performance of two text encoding techniques. The results indicated, with statistical significance, that Sentence-BERT was superior to the TF-IDF method as an encoding method.
... Research by Oliveira et al. [7] identifies students at risk of dropping out based on learning activity data in an LMS, using several machine learning methods such as k-Nearest Neighbors, C-Support Vector Classification, Logistic Regression, Random Forest, Adaptive Boosting, Gradient Boosting, and Extremely Randomized Trees. In addition, Helal et al. [8] used Naive Bayes (NB), Support Vector Machine (SVM), and Decision Tree (DT) to predict students at risk of not successfully completing a course, based on data on online learning activities and social factors. ...
Students’ academic performance is a key aspect of online learning success. Online learning applications known as Learning Management Systems (LMS) store various online learning activities. In this research, students’ academic performances in online course X are predicted such that teachers could identify students who are at risk much sooner. The prediction uses tree-based ensemble methods such as Random Forest, XGBoost (Extreme Gradient Boosting), and LightGBM (Light Gradient Boosting Machine). Random Forest is a bagging method, whereas XGBoost and LightGBM are boosting methods. The data recorded in LMS UI, or EMAS (e-Learning Management Systems) is collected. The data consists of activity data for 232 students (219 passed, 13 failed) in course X. This data is divided into three proportions (80:20, 70:30, and 60:40) and three periods (the first, first two, and first three months of the study period). Data is pre-processed using the SMOTE method to handle imbalanced data and implemented in all categories, with and without feature selection. The prediction results are compared to determine the best time for predicting students’ academic performance and how well each model can predict the number of unsuccessful students. The implementation results show that students’ academic performance can be predicted at the end of the second month, with best prediction rates of 86.8%, 80%, and 75% for the LightGBM, Random Forest, and XGBoost models, respectively, with feature selection. Therefore, with this prediction, students who could fail still have time to improve their academic performance.
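As a rough, hedged sketch of the kind of pipeline described above (SMOTE oversampling followed by tree-based ensembles), the snippet below combines imbalanced-learn's SMOTE with Random Forest, XGBoost and LightGBM classifiers; the synthetic data, split proportion and default hyperparameters are placeholders and do not reproduce the EMAS dataset or the study's actual configuration.

```python
# Hedged sketch of a SMOTE + tree-ensemble pipeline similar to the one described
# above. Synthetic data and default hyperparameters are placeholders only.
from imblearn.over_sampling import SMOTE
from lightgbm import LGBMClassifier
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Placeholder for LMS activity features with a strong class imbalance
# (many passing students, few failing ones).
X, y = make_classification(n_samples=232, n_features=15,
                           weights=[0.94, 0.06], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# SMOTE oversamples the minority class on the training split only.
X_res, y_res = SMOTE(random_state=0).fit_resample(X_train, y_train)

models = {
    "RandomForest": RandomForestClassifier(random_state=0),
    "XGBoost": XGBClassifier(random_state=0, eval_metric="logloss"),
    "LightGBM": LGBMClassifier(random_state=0),
}
for name, clf in models.items():
    clf.fit(X_res, y_res)
    acc = accuracy_score(y_test, clf.predict(X_test))
    print(f"{name}: accuracy = {acc:.3f}")
```

In this sketch SMOTE is applied only to the training split, so the test set keeps the original class imbalance; whether the study applied it the same way is not stated in the abstract.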
... This theme consists of studies predicting student dropouts from online courses using ML models. The theme comprises nine papers that use several ML algorithms to predict student dropouts from online courses [22,[46][47][48][49][50][51][52][53]. Of these, eight were experimental papers [46][47][48][49][50][51][52][53] and one was a literature review [22]. However, that review does not adhere to the systematic literature review guidelines used in this study. ...
... Lian et al. [49] claimed 89% accuracy in the dropout prediction task with a gradient boosting decision tree model (GBDT). Oliveira et al. [51] highlighted that the highest accuracy is delivered by the RF (88%). On the other hand, LR performed the worst, with an accuracy of 79%. ...
The use of artificial intelligence and machine learning techniques across all disciplines has exploded in the past few years, with the ever-growing size of data and the changing needs of higher education, such as digital education. Similarly, online educational information systems have a huge amount of data related to students in digital education. This educational data can be used with artificial intelligence and machine learning techniques to improve digital education. This study makes two main contributions. First, the study follows a repeatable and objective process of exploring the literature. Second, the study outlines and explains the literature’s themes related to the use of AI-based algorithms in digital education. The study findings present six themes related to the use of machines in digital education. The synthesized evidence in this study suggests that machine learning and deep learning algorithms are used in several themes of digital learning. These themes include using intelligent tutors, dropout predictions, performance predictions, adaptive and predictive learning and learning styles, analytics and group-based learning, and automation. Artificial neural network and support vector machine algorithms appear to be utilized across all the identified themes, followed by random forest, decision tree, naive Bayes, and logistic regression algorithms.
Background
The growth of online education has provided flexibility and access to a wide range of courses. However, the self‐paced and often isolated nature of these courses has been associated with increased dropout and failure rates. Researchers employed machine learning approaches to identify at‐risk students, but multiple issues have not been addressed concerning the definition of at‐risk students, as well as the strengths and limitations of different machine learning models to predict at‐risk students.
Objectives
This systematic review aims to provide a comprehensive overview of the past 10‐year research focusing on applying machine learning techniques for predicting at‐risk students (i.e., failure, dropouts) in online learning environments.
Methods
Studies were extracted from the ACM Digital Library, IEEE Xplore Digital Library, Web of Science, ERIC, ProQuest, and EBSCO. A total of 161 studies published from 2014 to 2024 were included in the review.
Results and Conclusions
Findings revealed (1) four primary at‐risk definitions outlined in the reviewed studies, each focusing on specific stages of student engagement and performance in a course; (2) most studies relied on student behavioural engagement and academic factors as at‐risk predictors; (3) the adoption of deep learning and ensemble deep learning networks has significantly increased in the past 5 years, often outperforming classical machine learning models, while studies in which classical machine learning excelled often relied on ensemble methodology and smaller sample sizes; (4) current machine learning practice, evaluated against a list of criteria, showed concerns regarding reproducibility, generalisability, and interpretability.
Predicting in advance the likelihood of students failing a course or withdrawing from a degree program has emerged as one of the widely embraced applications of Learning Analytics. While the literature extensively addresses the identification of at-risk students, it often doesn’t evolve into actual interventions, focusing more on reporting experimental outcomes than on translating them into real-world impact. The goal of early identification is straightforward, empowering educators to intervene before actual failure or dropout, but not enough attention is paid to what happens after the students are flagged as at risk. Interventions like personalized feedback, automated alerts, and targeted support can be game-changers, reducing failure and dropout rates. However, as this paper shows, few studies actually dig into the effectiveness of these strategies or measure their impact on student outcomes. Even more striking is the lack of research targeting stakeholders beyond students, like educators, administrators, and curriculum designers, who play a key role in driving meaningful interventions. The paper explores recent literature on automated academic risk prediction, focusing on interventions in selected papers. Our findings highlight that only about 14% of studies propose actionable interventions, and even fewer implement them. Despite these challenges, we can see that a global momentum is building around Learning Analytics, and institutions are starting to tap into the potential of these tools. However, academic databases, loaded with valuable insights, remain massively underused. To move the field forward, we propose actionable strategies, like developing intervention frameworks that engage multiple stakeholders, creating standardized metrics for measuring success and expanding data sources to include both traditional academic systems and alternative datasets. By tackling these issues, this paper doesn’t just highlight what is missing; it offers a roadmap for researchers and practitioners alike, aiming to close the gap between prediction and action. It’s time to go beyond identifying risks and start making a real difference where it matters most.
In the educational data mining (EDM) field, predicting at-risk students, student retention, dropout and performance have been attractive tasks among researchers. However, it is difficult to develop accurate models without first performing proper feature selection and class balancing. Therefore, the goal of this study is to review the current and future perspectives and trends within the field of EDM over the past 10 years. The goal is to understand the state-of-the-art methods and techniques involving feature selection, class balancing and machine learning models. From the analysis, it is understood that there are plenty of research gaps yet to be explored.
Academic debt can cause significant damage to the Russian economy and the higher education system in the medium term (on the horizon of 5–10 years). The purpose of the study is to identify the key problems based on the results of a comprehensive empirical analysis of the formation of massive academic debt (using the example of the “Business Informatics” degree programme at a Russian university) and to substantiate ways to improve the activities of universities in order to overcome them and reduce students’ academic debt. Research methods are general scientific (deduction, induction, generalization, comparative analysis, etc.), as well as special ones (correlation and regression, statistical, sociological surveys, etc.). Analytics and visualization of quantitative data were carried out using MS PowerBI software. Research results. It was revealed that: a) high incoming scores do not guarantee trouble-free education at the university; b) students with low scores (but not less than 160–170) are also able to master quite complicated university programs; c) the presence of academic debts does not depend on the type of disciplines studied (economics / information technology). The number of student dropouts in the studied sample (up to 50% of those who entered, with a non-linear dependence on total USE scores) testifies to the presence of reasons that are not related to the incoming educational potential of students. The results of the study made it possible to structure these reasons into three groups: insufficient motivation, self-organization problems, and “incomplete maturation”. Five groups of students with an increased risk of accumulating academic debt have been identified. The article substantiates the use of indirect educational influence on students through a special mobile application.