Contexts in source publication

Context 1
... Python data science libraries (pandas, numpy, matplotlib, scikit-learn, and others) were used to implement the data processing pipelines, referred to simply as data pipelines. Figure 1 shows the data pipeline created for this work. ...
Context 2
... data pipeline as structured in Fig 1 ...

Similar publications

Article
Unplanned pregnancy is a pregnancy that is either mistimed or unwanted at the time of conception. It is a core concept in understanding the fertility of populations and the unmet need for contraception. Unintended pregnancy is associated with an increased risk of morbidity for women, and with health behaviors during pregnancy with adverse effects....
Article
Objective: The study entitled "Educational Lag in Higher Secondary Education Institutions in Ixtapaluca, Valle de Chalco, and Chalco" aims to analyze the impact of the COVID-19 pandemic on school dropout rates and educational quality in these municipalities. Methodology: The research uses a mixed-methods approach, combining quantitative and qualitati...
Article
Student attrition poses a major challenge to academic institutions, funding bodies and students. With the rise of Big Data and predictive analytics, a growing body of work in higher education research has demonstrated the feasibility of predicting student dropout from readily available macro-level (e.g., socio-demographics or early performance metr...
Article
Early school leaving has profound socio-economic implications, and planning effective prevention programs within schools is crucial for countering it. Based on the hypothesis that dropout is the last step of a process that ends in the student’s decision to leave school, we studied the interplay between multiple risk factors of school...

Citations

... Like other studies that automated tasks in forums with ML, the present study chose to develop a binary ML classifier. As in those studies, attention must be paid to the feature engineering process, which can support and simplify prediction through dimensionality reduction (a problem that can affect even advanced, nonlinear techniques) and interpretability [Oliveira et al. 2019, Yoo et al. 2022]. ...
Conference Paper
The concepts of Freirean dialogue have been applied and organized in recent years so that characteristics of forum messages can be delimited using the theory. This work proposed a binary text classifier for the presence of Valuing of Autonomy in forum messages and also compared the performance of two text encoding techniques. The results indicated, with statistical significance, that Sentence-BERT outperformed TF-IDF as an encoding method.
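To make the TF-IDF side of that comparison concrete, here is a minimal sketch of one common TF-IDF variant (term frequency times log(N / df)). The abstract does not say which variant, tokenizer, or library the authors used, so the formula, the whitespace tokenizer, and the example messages below are all assumptions for illustration. Sentence-BERT is not shown because it depends on a pretrained model.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenised = [doc.lower().split() for doc in docs]  # naive whitespace tokenizer (assumption)
    n_docs = len(tokenised)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical forum messages, invented for illustration only.
messages = [
    "you can choose your own approach",
    "you must follow the instructions",
    "you decide how to solve it",
]
vectors = tfidf(messages)
print(vectors[0]["you"])     # term present in every message, so its weight is 0.0
print(vectors[0]["choose"])  # term unique to the first message, so its weight is positive
```

A term that occurs in every message carries no discriminating information under this weighting, which is one reason purely lexical encodings can miss meaning that sentence-level embeddings capture.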
... Oliveira et al. [7] identified students at risk of dropping out from LMS learning activity data using several machine learning methods: k-Nearest Neighbors, C-Support Vector Classification, Logistic Regression, Random Forest, Adaptive Boosting, Gradient Boosting, and Extremely Randomized Trees. In addition, Helal et al. [8] used Naive Bayes (NB), Support Vector Machine (SVM), and Decision Tree (DT) to predict students at risk of not graduating from a course, based on data on online learning activities and social factors. ...
Article
Students’ academic performance is a key aspect of online learning success. Online learning applications known as Learning Management Systems (LMS) store various online learning activities. In this research, students’ academic performance in online course X is predicted so that teachers can identify at-risk students much sooner. The prediction uses tree-based ensemble methods: Random Forest, XGBoost (Extreme Gradient Boosting), and LightGBM (Light Gradient Boosting Machine). Random Forest is a bagging method, whereas XGBoost and LightGBM are boosting methods. The data were collected from LMS UI, also called EMAS (e-Learning Management Systems), and consist of activity data for 232 students (219 passed, 13 failed) in course X. The data are divided into three proportions (80:20, 70:30, and 60:40) and three periods (the first, first two, and first three months of the study period). During pre-processing, the SMOTE method is applied to handle class imbalance in all categories, with and without feature selection. The prediction results are compared to determine the best time for predicting students’ academic performance and how well each model can predict the number of unsuccessful students. The results show that students’ academic performance can be predicted at the end of the second month, with best prediction rates of 86.8%, 80%, and 75% for the LightGBM, Random Forest, and XGBoost models, respectively, with feature selection. With this prediction, students who could fail therefore still have time to improve their academic performance.
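The class-balancing step mentioned in this abstract can be illustrated with a minimal pure-Python sketch of the core SMOTE idea: each synthetic minority sample is an interpolation between a minority point and one of its k nearest minority neighbours. This is not the study's implementation (the abstract does not describe one). The function name `smote`, the two-feature activity vectors, and the choice of k are assumptions; only the class counts (219 passed, 13 failed) come from the abstract. In practice a library implementation such as `SMOTE` from imbalanced-learn would normally be used, and oversampling is usually applied to the training split only.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority-class neighbours."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (the base itself is excluded).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


data_rng = random.Random(42)
# Hypothetical 2-feature activity vectors for the 13 failed students; not the study's data.
failed = [(data_rng.uniform(0, 10), data_rng.uniform(0, 3)) for _ in range(13)]
n_passed = 219  # majority-class size reported in the abstract

extra = smote(failed, n_new=n_passed - len(failed))
balanced_failed = failed + extra
print(len(extra), len(balanced_failed))  # 206 219
```

Because every synthetic point lies on a segment between two real minority points, the oversampled class stays inside the region the original 13 samples occupy, rather than duplicating them exactly.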
... This theme consists of studies predicting student dropouts from online courses using ML models. The theme consists of nine papers using several ML algorithms to predict student dropouts from online courses [22, 46-53]. We found eight experimental papers [46-53], one of which was a literature review [22]. However, that study does not adhere to the systematic literature review guidelines used in this study. ...
... Lian et al. [49] claimed 89% accuracy in the dropout prediction task with a gradient boosting decision tree model (GBDT). Oliveira et al. [51] highlighted that the highest accuracy is delivered by the RF (88%). On the other hand, LR performed the worst, with an accuracy of 79%. ...
Article
The use of artificial intelligence and machine learning techniques across all disciplines has exploded in the past few years, with the ever-growing size of data and the changing needs of higher education, such as digital education. Similarly, online educational information systems have a huge amount of data related to students in digital education. This educational data can be used with artificial intelligence and machine learning techniques to improve digital education. This study makes two main contributions. First, the study follows a repeatable and objective process of exploring the literature. Second, the study outlines and explains the literature’s themes related to the use of AI-based algorithms in digital education. The study findings present six themes related to the use of machines in digital education. The synthesized evidence in this study suggests that machine learning and deep learning algorithms are used in several themes of digital learning. These themes include using intelligent tutors, dropout predictions, performance predictions, adaptive and predictive learning and learning styles, analytics and group-based learning, and automation. Artificial neural network and support vector machine algorithms appear to be utilized among all the identified themes, followed by random forest, decision tree, naive Bayes, and logistic regression algorithms.
Article
Predicting in advance the likelihood of students failing a course or withdrawing from a degree program has emerged as one of the widely embraced applications of Learning Analytics. While the literature extensively addresses the identification of at-risk students, it often doesn’t evolve into actual interventions, focusing more on reporting experimental outcomes than on translating them into real-world impact. The goal of early identification is straightforward, empowering educators to intervene before actual failure or dropout, but not enough attention is paid to what happens after the students are flagged as at risk. Interventions like personalized feedback, automated alerts, and targeted support can be game-changers, reducing failure and dropout rates. However, as this paper shows, few studies actually dig into the effectiveness of these strategies or measure their impact on student outcomes. Even more striking is the lack of research targeting stakeholders beyond students, like educators, administrators, and curriculum designers, who play a key role in driving meaningful interventions. The paper explores recent literature on automated academic risk prediction, focusing on interventions in selected papers. Our findings highlight that only about 14% of studies propose actionable interventions, and even fewer implement them. Despite these challenges, we can see that a global momentum is building around Learning Analytics, and institutions are starting to tap into the potential of these tools. However, academic databases, loaded with valuable insights, remain massively underused. To move the field forward, we propose actionable strategies, like developing intervention frameworks that engage multiple stakeholders, creating standardized metrics for measuring success and expanding data sources to include both traditional academic systems and alternative datasets. 
By tackling these issues, this paper doesn’t just highlight what is missing; it offers a roadmap for researchers and practitioners alike, aiming to close the gap between prediction and action. It’s time to go beyond identifying risks and start making a real difference where it matters most.
Article
In the educational data mining (EDM) field, predicting at-risk students, student retention, dropout, and performance has attracted considerable attention from researchers. However, it is difficult to develop accurate models without first performing proper feature selection and class balancing. Therefore, this study reviews current and future perspectives and trends within the field of EDM over the past 10 years. The goal is to understand the state-of-the-art methods and techniques involving feature selection, class balancing, and machine learning models. The analysis shows that many research gaps remain to be explored.
Article
Academic debt can cause significant damage to the Russian economy and the higher education system in the medium term (on a horizon of 5–10 years). The purpose of the study is to identify the key problems based on a comprehensive empirical analysis of how massive academic debt forms (using the example of the “Business Informatics” field of study at a Russian university) and to substantiate ways for universities to overcome them and reduce students’ academic debt. Research methods include general scientific ones (deduction, induction, generalization, comparative analysis, etc.) as well as special ones (correlation and regression analysis, statistical methods, sociological surveys, etc.). Analytics and visualization of quantitative data were carried out using MS PowerBI software. Research results. It was revealed that: a) high incoming scores do not guarantee trouble-free education at the university; b) students with low scores (but not less than 160–170) are also able to master quite complicated university programs; c) the presence of academic debts does not depend on the type of disciplines studied (economics / information technology). The number of student dropouts in the studied sample (up to 50% of those who entered, with a non-linear dependence on the total USE scores) testifies to the presence of reasons that are not related to the incoming educational potential of students. The results of the study made it possible to structure them into three groups: insufficient motivation, self-organization problems, and “incomplete maturation”. Five groups of students with an increased risk of accumulating academic debt have been identified. The article substantiates the use of indirect educational influence on students through a special mobile application.