Article

Abstract

An increasing number of higher education institutions have deployed learning management systems (LMSs) to support learning and teaching processes. Accordingly, data-driven research has been conducted to understand the impact of student participation within these systems on student outcomes. However, most research has focused on small samples or has used variables that are expensive to measure, which limits its generalizability. This article presents a prediction model based on low-cost variables and a sophisticated algorithm to predict early which students attending large classes (with more than 50 enrollments) are at risk of failing a course, enabling instructors and educational managers to carry out early interventions to prevent course failure. The results outperform other approaches in terms of accuracy, cost, and generalization. Moreover, LMS usage information improved the model by up to 12.28% in terms of root-mean-square error, enabling better early identification of at-risk students.
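As an illustration of the approach described in the abstract, the following is a minimal sketch (not the authors' pipeline) of predicting a final course grade from low-cost academic-record features plus LMS usage features, and of comparing the RMSE with and without the LMS variables; the feature names and the synthetic data are assumptions.

```python
# Sketch: predict final course grade from low-cost features and compare RMSE
# with and without LMS usage variables. Feature names are illustrative only.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n = 1000
# Academic-record features (assumed): prior GPA, credits attempted.
gpa_prior = rng.uniform(1.0, 4.0, n)
credits = rng.integers(12, 22, n)
# LMS usage features (assumed): weekly logins, resources viewed.
logins = rng.poisson(8, n)
views = rng.poisson(25, n)
grade = 0.6 * gpa_prior + 0.02 * logins + 0.01 * views + rng.normal(0, 0.3, n)

def rmse_for(features):
    X_tr, X_te, y_tr, y_te = train_test_split(
        np.column_stack(features), grade, test_size=0.3, random_state=0)
    model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
    return mean_squared_error(y_te, model.predict(X_te)) ** 0.5

rmse_base = rmse_for([gpa_prior, credits])                 # academic records only
rmse_lms = rmse_for([gpa_prior, credits, logins, views])   # + LMS usage
print(f"RMSE without LMS: {rmse_base:.3f}, with LMS: {rmse_lms:.3f}")
```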


... In another study (Yousafzai et al., 2020), a genetic algorithm is used to select 29 optimal features to predict exam performance. Sandoval et al. (2018) proposed a prediction model based on low-cost variables to identify undergraduate students at risk of failing in courses with large enrollments. Helal et al. (2018) considered student heterogeneity while constructing models for predicting students' academic performance. ...
... Performance prediction using pre-admission data: high school GPA, GAT and admission test scores (predict performance at the end of the first year of an undergraduate program) (Tan et al., 2022), (Martínez-Navarro et al., 2021), (Alharthi, 2021), (Erika B. Varga, 2021), (Adekitan and N-Osaghae, 2019). Performance prediction using academic data from the initial years of an undergraduate program: GPA of the first and second year and grades in some courses (predict graduating GPA) (Hashim et al., 2020), (Qazdar et al., 2019), (Miguéis et al., 2018), (Asif et al., 2017), (Hoffait and Schyns, 2017), (Jia and Maloney, 2015). Performance prediction using low-cost variables: class participation, resource availability, heterogeneity, and class strength (predict future academic performance) (Tomasevic et al., 2020), (Yousafzai et al., 2020), (Xu et al., 2019), (Helal et al., 2018), (Sandoval et al., 2018), (Thiele et al., 2016), (Xing et al., 2015). Performance prediction using non-academic variables in addition to academic data: behavioral and emotional characteristics, social and demographic features (forecast future academic performance) (Wild et al., 2023), (Kukkar et al., 2023), (Yao et al., 2019), (Nti et al., 2022), (Karagiannopoulou et al., 2021), (Keser and Aghalarova, 2021), (Fernandes et al., 2019), (Thiele et al., 2016) ...
... We conclude from the various experiments that student performance can be predicted under a relative grading scheme. Current research (Iqbal et al., 2017; Sandoval et al., 2018) has mainly concentrated on systems that use standards-based grading. No significant work exists to handle systems with relative grading, an emerging scheme used in many international tests. ...
Article
Educational data mining is widely deployed to extract valuable information and patterns from academic data. This research explores new features that can help predict the future performance of undergraduate students and identify at-risk students early on. It answers some crucial and intuitive questions that are not addressed by previous studies. Most of the existing research is conducted on data from 2-3 years in an absolute grading scheme. We examined the effects of historical academic data of 15 years on predictive modeling. Additionally, we explore the performance of undergraduate students in a relative grading scheme and examine the effects of grades in core courses and initial semesters on future performances. As a pilot study, we analyzed the academic performance of Computer Science university students. Many exciting discoveries were made; the duration and size of the historical data play a significant role in predicting future performance, mainly due to changes in curriculum, faculty, society, and evolving trends. Furthermore, predicting grades in advanced courses based on initial pre-requisite courses is challenging in a relative grading scheme, as students’ performance depends not only on their efforts but also on their peers. In short, educational data mining can come to the rescue by uncovering valuable insights from academic data to predict future performance and identify the critical areas that need significant improvement.
... Researchers have also studied how to predict which students are likely to fail when they attend large classes with more than 50 enrolled students [9]. This is important because it enables instructors to carry out early interventions to prevent course failure [9]. Previous research had focused on small samples of students and used variables that are expensive to measure, which limits its generalizability. ...
... In this setting, the Random Tree proved to be the best algorithm compared with linear regression and robust linear regression, with RMSE = 0.100 and R² = 0.6203 on the Large Courses set in the M-LMS scenario [9]. ...
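For reference, the excerpt above compares a tree-based model against linear and robust linear regression using RMSE and R². A hedged sketch of that kind of comparison with scikit-learn follows, where HuberRegressor stands in for robust linear regression and RandomForestRegressor for the tree model; these substitutions and the synthetic data are assumptions, not the cited experiment.

```python
# Sketch: compare linear, robust linear, and tree-based regressors on RMSE and R^2.
import numpy as np
from sklearn.linear_model import LinearRegression, HuberRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(1)
X = rng.normal(size=(800, 5))                  # illustrative student features
y = X @ np.array([0.4, 0.3, 0.0, 0.2, 0.1]) + rng.normal(0, 0.2, 800)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

for name, model in [("linear", LinearRegression()),
                    ("robust linear", HuberRegressor()),
                    ("random forest", RandomForestRegressor(random_state=1))]:
    pred = model.fit(X_tr, y_tr).predict(X_te)
    rmse = mean_squared_error(y_te, pred) ** 0.5
    print(f"{name:14s} RMSE={rmse:.3f} R2={r2_score(y_te, pred):.3f}")
```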
Chapter
Student performance is a key factor of interest for many parties, including students, parents, instructors, and administrators. Early prediction is needed so that the person responsible can monitor students early and help develop better citizens for the nation. This paper improves the Bagged Tree approach to predict student performance across four main classes: distinction, pass, fail, and withdrawn. Accuracy is used as the evaluation parameter for this prediction technique. The Bagged Tree with Bag, AdaBoost, and RUSBoost learners helps to predict student performance on massive datasets. The RUSBoost algorithm proved very suitable for imbalanced datasets, achieving an accuracy of 98.6% with feature selection and 99.1% without feature selection, outperforming the other learner types even though the data contain more than 30,000 records.
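A minimal sketch of handling an imbalanced four-class outcome (distinction / pass / fail / withdrawn) with bagged trees and RUSBoost, in the spirit of the chapter above; it assumes scikit-learn and the imbalanced-learn package, and the class proportions, features, and data are synthetic assumptions rather than the chapter's dataset.

```python
# Sketch: bagged trees vs. RUSBoost on an imbalanced 4-class outcome
# (distinction / pass / fail / withdrawn). Data and proportions are synthetic.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import balanced_accuracy_score
from imblearn.ensemble import RUSBoostClassifier  # requires imbalanced-learn

X, y = make_classification(n_samples=5000, n_classes=4, n_informative=6,
                           weights=[0.15, 0.55, 0.2, 0.1], random_state=2)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=2)

for name, clf in [("bagged trees", BaggingClassifier(random_state=2)),
                  ("RUSBoost", RUSBoostClassifier(random_state=2))]:
    clf.fit(X_tr, y_tr)
    score = balanced_accuracy_score(y_te, clf.predict(X_te))
    print(f"{name:12s} balanced accuracy = {score:.3f}")
```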
... Some studies use data on student demographics and socio-economic factors along with internal assessment as well (Kotsiantis and ...). In addition to predictors, researchers have explored various methods for predicting student performance. Some of them (Hämäläinen and Vinni 2006; Hong Yu et al. 2018), as well as support vector machines (Santana et al. 2017), random forests (Ahmed and Sadiq 2018; Chen et al. 2018; Hasan et al. 2018; Sandoval et al. 2018), deep learning (Kim et al. 2018) and linear regression (Yang et al. 2018), are also evident in the literature. Apart from the application of traditional data mining approaches, researchers have also proposed some specialised algorithms to predict student performance (Hasheminejad and Sarvmili 2018; Márquez-Vera et al. 2013; Meier et al. 2016; Uddin and Lee 2017; Xu et al. 2017; Zollanvari et al. 2017). ...
... As mentioned earlier, some researchers have even tried to predict actual marks (Huang and Fang 2013; Lu et al. 2018; Romero et al. 2008; Sandoval et al. 2018; Yang et al. 2018). The existing studies have used both regression and classification techniques for this purpose. ...
... For example, Hong et al. (2017), Lu et al. (2018) and Yang et al. (2018) have used video viewing behaviour for estimating student performance. Student behaviour in an online forum (Mueen et al. 2016; Ornelas and Ordonez 2017; Romero et al. 2008; Widyahastuti and Tjhin 2017; Yoo and Kim 2014; Yu et al. 2018), learning management system (Conijn et al. 2017; Kim et al. 2018; Ostrow et al. 2015; Sandoval et al. 2018; Xing et al. 2015), their movement pattern, and activity during web browsing (Chaturvedi and Ezeife 2017) also help in predicting student performance. Thai-Nghe et al. (2009) and Zollanvari et al. (2017) have used teaching quality and psychological factors of students for classifying student performance. ...
Article
Full-text available
Student performance modelling is one of the challenging and popular research topics in educational data mining (EDM). Multiple factors influence performance in non-linear ways, making this field all the more attractive to researchers. The widespread availability of educational datasets further catalyses this interest, especially in online learning. Although several EDM surveys are available in the literature, we could find only a few specific surveys on student performance analysis and prediction. These specific surveys are limited in nature and primarily focus on studies that try to identify possible predictors or to model student performance. However, the previous works do not address the temporal aspect of prediction. Moreover, we could not find any specific survey that focuses only on classroom-based education. In this paper, we present a systematic review of EDM studies on student performance in classroom learning. It focuses on identifying the predictors, the methods used for such identification, and the time and aim of prediction. It is, significantly, the first systematic survey of EDM studies that considers only classroom learning and also addresses the temporal aspect. This paper presents a review of 140 studies in this area. The meta-analysis indicates that researchers achieve significant prediction efficiency during the tenure of the course. However, performance prediction before course commencement needs special attention.
... Their work focused on ten LMS features, resulting in a "disengagement classification score" that identified 93% of students who ultimately disengaged by flagging them, prospectively, as moderate or high risk during the semester. Sandoval et al. (2018) used Sakai LMS data from 21,314 students across 811 large courses at a single institution to develop a predictive analytic for early identification of students at risk for failing. ...
... These variables were chosen based on their known predictive value in the literature (Calvert, 2014; Feild et al., 2018; Morris et al., 2015; Motz et al., 2019; Sandoval et al., 2018; You, 2015) and their ease of comprehension for end users. The data extract defining how these variables were pulled from the LMS is publicly available (Quick et al., 2020). ...
Preprint
Full-text available
This paper presents two studies examining the effectiveness of using learning analytics to inform targeted, proactive advising interventions aimed at improving student success. Study 1 validates a simple learning management system (LMS) learning analytic as predictive of end-of-term outcomes and persistence. Results suggest that this analytic measure, based on students’ activity in the LMS, has predictive utility for identifying students who might benefit from a proactive advising intervention. In Study 2, a randomized experiment with 458 undergraduate pre-major students, we test the hypothesis that an LMS-informed proactive advising intervention would improve end-of-term outcomes and persistence. Students in the treatment group exhibited, on average, an increase of nearly one-third of a grade point in their term GPAs, a reduction in DFWs earned, and an 80% higher likelihood of persisting compared to the control group. These findings provide strong evidence for the effectiveness of proactive advising interventions, where advisors’ efforts are targeted using learning analytics. They suggest that by transparently providing advisors with comprehensible insights, institutions might improve student outcomes and promote the use of data-informed interventions in academic advising.
... Consequently, the flexibility of teaching staff should include an interest in applying innovative pedagogies and collaborative work, incorporating digital literacies (Blau et al., 2020), mobile learning (Bereczki et al., 2021), and learning mediated in different learning spaces (Foellmer et al., 2021) for both students and staff, as well as the use of learning analytics (Sandoval et al., 2018) to provide as personalised a learning approach as possible. ...
... This calls for preparing teachers for today's higher education through a focus on two skill areas: collaborative problem solving through feedback and ICT literacy in digital networks; an approach to formative assessment; and teacher professional development modules, generating flexibility in educational communities to foster teaching in digital literacies (Blau et al., 2020), mobile learning (Bereczki et al., 2021), learning spaces (Chen et al., 2018; Foellmer et al., 2021), and learning analytics (Sandoval et al., 2018). ...
Chapter
Full-text available
Educational organisations around the world are looking for new ways to lay the foundations of a strong, egalitarian, and resilient community for the future, in the face of the climate emergency and the coronavirus crisis. This document develops a literature review from the qualitative paradigm with a naturalistic model, using a phenomenological approach that allows an objective hermeneutic understanding of the constructs: (1) hybrid learning, (2) blended learning, and (3) flexible learning. Through the deductive method, the following questions are answered: (1) What is the meaning of hybrid learning? (2) When do we speak of blended learning? (3) What is understood by flexible learning? (4) How are the constructs related? (5) What is the national context of learning in higher education in the new era of normality? In turn, the authors propose a model for managing flexible learning in today's higher education and present proposals regarding teaching practice, planning and design, the practice of values, continuous professional updating, and educational management in the post-pandemic period.
... Thanks to predictive modeling and learning analytics techniques, identifying at-risk students and predicting their learning performance in a class is now possible. A predictive model can be used as an Early Warning System (EWS) to identify and predict at-risk students in a course and inform both the teacher and the students (Sandoval et al., 2018;Howard et al., 2018;Waddington et al., 2016). ...
... In the first group, we found the work of Sandoval and his co-authors, who stated that the "students' grade point average (GPA)" was the most relevant indicator among 36 other indicators, followed by "the school in which the students were enrolled" as a moderately relevant indicator (Sandoval et al., 2018). In the work of Howard et al. (2018), the authors affirm that "continuous assessment" is the best indicator among three categories: "students' background information," "students' engagement," and "continuous assessment results." ...
Thesis
Full-text available
During their learning process, learners may encounter learning difficulties that affect the quality of their academic outcomes. These difficulties may be triggered by external factors, such as inadequate teaching content, or by internal factors closely related to some learners' specific characteristics. Rather than looking for flaws in learners, it is more effective to explore extrinsic factors such as the form and relevance of the pedagogical content, which is more adaptable to change and improvement than learner-specific factors. Struggling learners need different forms of support to learn more effectively and, if possible, catch up with their peers in terms of academic success. It is therefore essential to identify learners with difficulties at early stages so that they can benefit from appropriate support, which in turn requires determining the signs and indicators used to identify these learners in face-to-face learning contexts in general and in e-learning environments in particular. This research is situated in this context and focuses on the learning difficulties faced by learners when using distance learning systems, as well as the intelligent tools available to help them overcome these difficulties. The use of distributed artificial intelligence techniques, and in particular intelligent agents, can solve the problem of detecting learners' learning difficulties and offer them appropriate support at the right time. In recent years, new intelligent tools adopting new learning theories have continually been integrated into modern learning systems through predictive modelling used as Early Warning Systems (EWS), which identify and predict at-risk learners in a given learning unit and inform both the teacher and the learners concerned. By collecting and analysing learners' behaviour through the traces they leave and by using Distributed Artificial Intelligence (DAI) algorithms such as Multi-Agent Systems (MAS), it is possible to model, track, and monitor the current or even future behaviour of each learner separately, identifying who is doing well and who is likely to face difficulties, thereby providing valuable time to intervene and help these learners. To achieve these goals, a set of cognitive agents has been designed and implemented to detect learners' difficulties on the one hand and to predict learners' failure or success on the other, based on their behaviours. Prototypes validating the ideas proposed in this research were developed and tested under real learning conditions. The results obtained are considered very promising and encouraging.
... Online higher education has attracted extensive attention in the COVID-19 period, with the goal of improving the quality of personalization, monitoring, and evaluation in learning. AI performance prediction models have been used as a promising method in online higher education to accurately predict and monitor students' learning performance using student learning data and AI algorithms (Aydogdu, 2021; Sandoval et al., 2018; Tomasevic et al., 2020). The existing AI performance prediction models have been developed from the AI model perspective, with the objective of predicting the learning performance that students are likely to achieve given all the input information (Cen et al., 2016). ...
... Online higher education has been improved by different types of AI prediction models such as early warning systems, recommender systems, and tutoring and learner models (Sandoval et al., 2018). As an important component of AI prediction models, providing feedback to instructors and students has become a critical strand in recent research (Bravo-Agapito et al., 2021). ...
Article
Full-text available
As a cutting-edge field of artificial intelligence in education (AIEd) that depends on advanced computing technologies, the AI performance prediction model is widely used to identify at-risk students who tend to fail, establish student-centered learning pathways, and optimize instructional design and development. A majority of the existing AI prediction models focus on the development and optimization of the accuracy of AI algorithms rather than applying AI models to provide students with timely and continuous feedback and improve the students' learning quality. To fill this gap, this research integrated an AI performance prediction model with learning analytics approaches with the goal of improving student learning effects in a collaborative learning context. Quasi-experimental research was conducted in an online engineering course to examine the differences in students' collaborative learning effects with and without the support of the integrated approach. Results showed that the integrated approach increased student engagement, improved collaborative learning performance, and strengthened student satisfaction with learning. This research contributes by proposing an integrated approach of AI models and learning analytics (LA) feedback and by providing paradigmatic implications for the future development of AI-driven learning analytics.
... PLA is also beneficial for institutions that intend to develop an educational system to support both students and teachers. The institutions can establish learning analytics offices so as to monitor the quality of education and to provide student support services as well as professional development programs (Palmer, 2013;Sandoval et al., 2018). In addition to the main stakeholders, PLA can be useful for developers of personalized learning environments, which timely provide students adaptive tasks, resources, supports, and feedback. ...
... First, PLA studies emphasized the role of instructors to prevent learners' dropout and encourage high-level achievements. Instructors can detect students at risk using PLA results in order to prevent their dropout and timely provide learning supports to improve behavioral, cognitive, social, and emotional engagement in online learning environments (Nespereira et al., 2015;Sandoval et al., 2018). PLA studies also implied that teachers should design learning activities to promote high-level engagement, assess both quantity and quality of students' participation, and seek training to monitor and improve their teaching practice (Mwalumbwe & Mtebe, 2017;Saqr et al., 2018;Soffer & Cohen, 2019). ...
... In this context, the issue of which variables to consider is essential. Sandoval et al. (2018) stated that selecting low-cost variables is more practical and helpful in producing more generalisable results. For this purpose, this study referenced the frequency of usage and the participation mode. ...
... For this purpose, this study referenced the frequency of usage and the participation mode. Time-related metrics (e.g., time spent on a task, individual work time, time spent on an activity) were not calculated because the calculation is labour-intensive (Bravo-Agapito et al., 2021; Sandoval et al., 2018). ...
Article
Full-text available
This study aimed to examine the behaviour of learners across a whole system and in various courses to reveal the interrelation between learners' system interaction, age, programme features and course design. We obtained data from the system logs of 1,634 learners enrolled in distance learning programmes. We performed hierarchical clustering analysis to describe system interactions; then, we carried out a sequential pattern analysis to examine navigational behaviours by clusters. The results showed that the system interactions (e.g., content, live lesson, assignment, exam, discussion) across the whole system differ by age and programme. The behaviour profiles of the learners changed when different course designs were presented. Learners who interacted more with any component (e.g., live lesson or content) according to their needs were more successful than those with limited interaction and assessment-oriented (those with limited interactions outside of the assignment). In an information and communication technology course, learners whose system interactions were sufficient to receive rewards were more likely to succeed. The sequential pattern analysis showed that the assessment-oriented cluster interacted with the assignment in the midterm weeks; the award-oriented cluster interacted with the content or completed their assignment and received an award. Consequently, it is difficult to determine or generalise the intervention unless the system, programme and course design features are standard.
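A hedged sketch of the hierarchical clustering step described above, grouping learners by counts of their system interactions; the interaction categories, counts, and number of clusters are assumptions, not the study's data or settings.

```python
# Sketch: hierarchical clustering of learners by LMS interaction counts.
# Interaction categories and counts are illustrative, not the study's data.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(3)
# Columns: content views, live-lesson joins, assignments, exam attempts, forum posts.
interactions = rng.poisson(lam=[30, 5, 8, 3, 10], size=(200, 5)).astype(float)
# Standardize so no single interaction type dominates the distance metric.
z = (interactions - interactions.mean(axis=0)) / interactions.std(axis=0)

tree = linkage(z, method="ward")                     # agglomerative clustering
labels = fcluster(tree, t=4, criterion="maxclust")   # cut into 4 behaviour profiles
for c in range(1, 5):
    profile = interactions[labels == c].mean(axis=0).round(1)
    print(f"cluster {c}: mean interactions per category = {profile}")
```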
... Researchers focus on the LMS recorded behavioural data for prediction, based on the assumption that records in the LMS can represent certain behaviours or traits of the user. These behaviours or traits are associated with their academic performance (Conijn et al., 2016;Dominguez et al., 2016;Shruthi and Chaitra, 2016;Adejo and Connolly, 2018;Helal et al., 2018;Sandoval et al., 2018;Akçapınar et al., 2019;Liao et al., 2019;Sukhbaatar et al., 2019;Mubarak et al., 2020b;Waheed et al., 2020). Different studies are concerned with different issues. ...
... The experimental results verify the effectiveness of its algorithm. From a methodological point of view, many studies belong to the hybrid-model type of research (Sandoval et al., 2018; Yu et al., 2018a; Zhou et al., 2018; Akçapınar et al., 2019; Baneres et al., 2019; Hassan et al., 2019; Hung et al., 2019; Polyzou and Karypis, 2019). The underlying logic of this type of research is that algorithms differ in their optimization and search strategies, so the most suitable algorithm for course failure prediction is found by comparison. ...
Article
Full-text available
Anomalies in education affect the personal careers of students and universities' retention rates. Understanding the laws behind educational anomalies promotes the development of individual students and improves the overall quality of education. However, the inaccessibility of educational data hinders the development of the field. Previous research in this field used questionnaires, which are time- and cost-consuming and hardly applicable to large-scale student cohorts. With the popularity of educational management systems and the rise of online education during the prevalence of COVID-19, a large amount of educational data is available online and offline, providing an unprecedented opportunity to explore educational anomalies from a data-driven perspective. As an emerging field, educational anomaly analytics rapidly attracts scholars from a variety of fields, including education, psychology, sociology, and computer science. This paper intends to provide a comprehensive review of data-driven analytics of educational anomalies from a methodological standpoint. We focus on the following five types of research that received the most attention: course failure prediction, dropout prediction, mental health problems detection, prediction of difficulty in graduation, and prediction of difficulty in employment. Then, we discuss the challenges of current related research. This study aims to provide references for educational policymaking while promoting the development of educational anomaly analytics as a growing field.
... In all cases, an early forecast is desired to enable proactive teaching actions aimed at providing students with sufficient support to improve their performance and avoid attrition [6], [7]. In this respect, some recent experiments have proven that tools such as intelligent tutoring systems, early warning systems (EWSs), and recommender systems can be very useful in higher education [8]. ...
... It should also be noted that most predictive student-related attributes in each CP were manually derived, because previous works have suggested that student's performance predictions can be maximized when domain knowledge is used as support to select the best performing set of input data [27]. Moreover, a broad variety of previous works have also made use of this approach [8], [15], [29], [51]. Nonetheless, automatic selection of features in each CP was also explored. ...
Article
Early warning systems (EWSs) have proven to be useful in identifying students at risk of failing both online and conventional courses. Although some general systems have reported acceptable ability to work in modules with different characteristics, those designed from a course-specific perspective have recently provided better outcomes. Hence, the main goal of this work is to design a tailored EWS for a conventional course in power electronic circuits. For that purpose, the effectiveness of some common classifiers in predicting at-risk students has been analyzed. Although only slight differences in their performance were noticed, an ensemble classifier combining the outputs from several of them has proven to be the best performer. As a major contribution, a novel weighted voting combination strategy has been proposed to exploit global information about how basic prediction algorithms perform at several time points during the semester and on diverse subsets of student-related features. Predictions at five critical points have been analyzed, revealing that the end of the fourth week is the optimal time to identify students at risk of failing the course. At that moment, accuracies of about 85-90% have been reached. Moreover, several scenarios with different subsets of student-related attributes have been considered at every time point. Besides common parameters from students' background and continuous assessment, novel features estimating students' performance progression on weekly assignments have been introduced. The proposal of this set of new input variables is another key contribution, because they have improved predictions of at-risk students by more than 5% at every time point.
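A minimal sketch of a weighted soft-voting ensemble for flagging at-risk students, loosely in the spirit of the article above; the base classifiers, weights, and synthetic data are assumptions and do not reproduce the article's tuned combination strategy.

```python
# Sketch: weighted soft-voting ensemble for early at-risk classification.
# Base classifiers, weights, and data are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=600, n_informative=8, weights=[0.7, 0.3],
                           random_state=4)  # y=1 ~ at-risk student
ensemble = VotingClassifier(
    estimators=[("logreg", LogisticRegression(max_iter=1000)),
                ("tree", DecisionTreeClassifier(max_depth=4)),
                ("nb", GaussianNB())],
    voting="soft",
    weights=[2.0, 1.0, 1.5],   # e.g. weights derived from earlier time points
)
print("CV accuracy:", cross_val_score(ensemble, X, y, cv=5).mean().round(3))
```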
... These researchers argue that some demographic factors can affect the academic performance of students at different study levels (Ali et al., 2013;Shum & Crick, 2012;Tempelaar et al., 2015). Other demographic characteristics that are used in the literature are family income, socio-economic status, race and ethnicity (Aguiar et al., 2014;Costa et al., 2017;Miguéis et al., 2018;Sandoval et al., 2018;Wolff et al., 2013). ...
... Other metrics include Mean Absolute Error (MAE) and Root Mean Square Error (RMSE; Chai & Draxler, 2014; Howard et al., 2018; Ioanna Lykourentzou et al., 2009; Kotsiantis, 2012). The average accuracy of a model for both classification and regression problems is measured using Average Prediction Accuracy (PAP) and Average Accurate Prediction (APA; Huang & Fang, 2013; Sandoval et al., 2018). Details of the evaluation measures used in the literature are shown in Table 3. ...
Article
Predictive models on students’ academic performance can be built by using historical data for modelling students’ learning behaviour. Such models can be employed in educational settings to determine how new students will perform and in predicting whether these students should be classed as at-risk of failing a course. Stakeholders can use predictive models to detect learning difficulties faced by students and thereby plan effective interventions to support students. In this paper, we present a systematic literature review on how predictive analytics have been applied in the higher education domain. Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines, we conducted a literature search from 2008 to 2018 and explored current trends in building data-driven predictive models to gauge students’ performance. Machine learning techniques and strategies used to build predictive models in prior studies are discussed. Furthermore, limitations encountered in interpreting data are stated and future research directions proposed.
... They report that RFs outperformed other models with a precision of 71.5%. Similarly, RFs also outperformed LR and Robust Linear Regression in the study presented by Sandoval et al. [33], who used academic records and behaviour in the institutional learning management system to predict the final grade of students attending large classes. Falát and Piscová [34] also used several supervised Machine Learning methods to predict students' grade point average (GPA), namely LR, DTs, and RFs, with the latter being the one that provides the best predictive ability. ...
Article
Full-text available
For a lot of beginners, learning to program is challenging; similarly, for teachers, it is difficult to draw on students’ prior knowledge to help the process because it is not quite obvious which abilities are significant for developing programming skills. This paper seeks to shed some light on the subject by identifying which previously recorded variables have the strongest correlation with passing an introductory programming course. To do this, a data set was collected including data from four cohorts of students who attended an introductory programming course, common to all Engineering programmes at a Chilean university. With this data set, several classifiers were built, using different Machine Learning methods, to determine whether students pass or fail the course. In addition, models were trained on subsets of students by programme duration and engineering specialisation. An accuracy of 68% was achieved, but the analysis by specialisation shows that both accuracy and the significant variables vary depending on the programme. The fact that classification methods select different predictors depending on the specialisation suggests that there is a variety of factors that affect a student’s ability to succeed in a programming course, such as overall academic performance, language proficiency, and mathematical and scientific skills.
... By using data mining methods, namely the random forest, logistic regression, and artificial neural network algorithms, the accuracy of predicting students who are at risk can be increased (Hoffait & Schyns, 2017). The risk of failing a course can be predicted early in the course so that an instructor can implement early intervention steps to prevent students from failing the course (Sandoval, Gonzalez, Alarcon, Pichara, & Montenegro, 2018). Several studies have been conducted to analyze student performance in a coherent vertical curriculum, such as cluster analysis on aggregated data and segmented data (Priyambada, Er, & Yahya, 2017). ...
Article
The ability to predict students' performance is important not only for the students but also for academic stakeholders in higher education institutes. Predictions can be made by using data stored in an academic information system on students' behavior related to taking courses, which are an important part of a higher education institute with a coherent vertical curriculum. A student's course-taking behavior can be used as an indicator of their potential performance by investigating the alignment of their course-taking activities with curriculum guidelines. Domain knowledge is also considered as a variable due to the varying compositions of courses in curriculum guidelines, and past performance also needs to be taken into consideration. The result of the prediction can be used to help academic stakeholders take actions such as intervening to ensure that students graduate on time. In this paper, we propose a two-layer ensemble learning technique that combines ensemble learning and ensemble-based progressive prediction and utilizes students' learning behavior data and domain knowledge together with current and past performance. The results show that the accuracy of our proposed framework on a real-world student dataset is improved.
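As a rough illustration of a two-layer ensemble, the following sketch uses stacking, where base learners feed a second-layer meta-learner; it is a generic stand-in under assumed synthetic data, not the authors' ensemble-based progressive prediction framework.

```python
# Sketch: a two-layer ensemble (stacking) as a generic stand-in; base learners
# feed a second-layer meta-learner. Data are synthetic assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1500, n_informative=10, random_state=5)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=5)

two_layer = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=100, random_state=5)),
                ("knn", KNeighborsClassifier())],
    final_estimator=LogisticRegression(max_iter=1000),  # second layer
)
print("test accuracy:", two_layer.fit(X_tr, y_tr).score(X_te, y_te).round(3))
```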
... The instructors can use and apply educational interventions to reduce the failure rate (Sandoval et al., 2018). Students' academic records are stored in the offices of the engineering faculties, and these records include each student's performance in different subjects as well as information regarding the student's origin, age, and previous studies. All this information should be sufficient to help categorize the class of students we are dealing with. ...
Research
Full-text available
This paper aims to analyze and evaluate student performance in the State Polytechnic. The data were collected from two (2) years of intake, from July 2016 to June 2018, and contain students' previous academic records such as course code, course name, and the marks obtained by each student, analyzed by applying classification algorithms in the RapidMiner tool. Data mining provides good and powerful methods for education and other fields of study, which is valuable given the vast amount of student data that can be mined for information about student success. In this paper, a classification task was used for the prediction, and a decision tree model was applied during the experiment. The results indicate that it is possible to predict graduation performance; in addition, a procedure for evaluating the performance of each course was identified.
... Organizations around the world recognize that there is a large amount of supporting content that is often available, but it is ignored [16]. Successful use of this content requires a curator who uses specialized knowledge to combine relevant learning tools and pathways for higher education applicants [29]. The use of audiovisual materials increases exponentially during training. ...
Article
Full-text available
The article presents the implementation of a training technology for future agricultural engineers in the informational and educational environment. To train future agricultural engineers, it is advisable to create tutorials for the study of each discipline in the conditions of the informational and educational environment. Such tutorials assist in mastering both the theoretical material and course navigation, and they present interactive electronic learning tools for performing tasks in the informational and educational environment. Higher education applicants perform such tasks directly in the classroom with the help of gadgets or personal computers. The final grade is formed from the scores obtained in the classroom and the rating of higher education applicants while studying in the informational and educational environment. The outlined approach can help improve the quality of learning content. The use of interactive audiovisual online tools allows theoretical, practical, and experimental material to be presented clearly, which is important for the training of future agricultural engineers. At the end of the experiment, it can be argued that the developed technology increases the level of motivation and self-incentive to work in the informational and educational environment. The application of the presented technology provides an opportunity to combine the educational process in the classroom with learning in the informational and educational environment and forms analytical abilities and competencies in professional activity. The reliability of the obtained results was checked using the Kolmogorov-Smirnov criterion. When using this technology in the educational process, the indicators in the experimental group increased, which demonstrates the effectiveness of training bachelors in agricultural engineering in the conditions of an informational and educational environment.
... Baneres et al. [119] apply naive Bayes, decision trees, k-Nearest Neighbours, and support vector machines to performance features of students from a fully online university in order to identify at-risk students as soon as possible. Sandoval et al. [130] use background and academic records of students to predict a final score with implementation of linear regression, robust linear regression and random forest algorithms. In [131] authors apply principal component regression to features related to internal assessment and video viewing to predict final academic performance. ...
Thesis
Educational institutions seek to design effective mechanisms that improve academic results, enhance the learning process, and avoid dropout. The performance analysis and performance prediction of students in their studies may show drawbacks in the educational formations and detect students with learning problems. This induces the task of developing techniques and data-based models which aim to enhance teaching and learning. Classical models usually ignore the students-outliers with uncommon and inconsistent characteristics although they may show significant information to domain experts and affect the prediction models. The outliers in education are barely explored and their impact on the prediction models has not been studied yet in the literature. Thus, the thesis aims to investigate the outliers in educational data and extend the existing knowledge about them. The thesis presents three case studies of outlier detection for different educational contexts and ways of data representation (numerical dataset for the German University, numerical dataset for the Russian University, sequential dataset for French nurse schools). For each case, the data preprocessing approach is proposed regarding the dataset peculiarities. The prepared data has been used to detect outliers in conditions of unknown ground truth. The characteristics of detected outliers have been explored and analysed, which allowed extending the comprehension of students' behaviour in a learning process. One of the main tasks in the educational domain is to develop essential tools which will help to improve academic results and reduce attrition. Thus, plenty of studies aim to build models of performance prediction which can detect students with learning problems that need special help. The second goal of the thesis is to study the impact of outliers on prediction models. The two most common prediction tasks in the educational field have been considered: (i) dropout prediction, (ii) the final score prediction. The prediction models have been compared in terms of different prediction algorithms and the presence of outliers in the training data. This thesis opens new avenues to investigate the students' performance in educational environments. The understanding of outliers and the reasons for their appearance can help domain experts to extract valuable information from the data. Outlier detection might be a part of the pipeline in the early warning systems of detecting students with a high risk of dropouts. Furthermore, the behavioral tendencies of outliers can serve as a basis for providing recommendations for students in their studies or making decisions about improving the educational process.
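A small sketch of the kind of experiment the thesis describes, checking how removing detected outliers from the training data changes a grade-prediction model; the outlier fraction, the use of IsolationForest, and the synthetic data are assumptions.

```python
# Sketch: how removing detected outliers changes a grade-prediction model.
# The outlier fraction, detector, and data are illustrative assumptions.
import numpy as np
from sklearn.ensemble import IsolationForest, RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(9)
X = rng.normal(size=(1000, 6))                      # student features
y = 2 * X[:, 0] + X[:, 1] + rng.normal(0, 0.5, 1000)
X[:30] += rng.normal(0, 6, size=(30, 6))            # atypical behaviour profiles
y[:30] = rng.uniform(y.min(), y.max(), 30)          # inconsistent outcomes for them

mask = IsolationForest(contamination=0.03, random_state=9).fit_predict(X) == 1
model = RandomForestRegressor(n_estimators=100, random_state=9)
print("with outliers   :", cross_val_score(model, X, y, cv=5, scoring="r2").mean().round(3))
print("without outliers:", cross_val_score(model, X[mask], y[mask], cv=5, scoring="r2").mean().round(3))
```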
... Random Forest is used to find the most influential factors for predicting students' exam performance. The study found that students' CGPA from previous semesters and their interaction with learning resources are the best predictors of students' final results (Sandoval et al., 2018). Another research study combined Relief-F and Random Forest models for selecting the most significant attributes for predicting students' final exam scores. ...
Article
Full-text available
Educational data mining is an emerging interdisciplinary research area involving both education and informatics. It has become an imperative research area due to the many advantages that educational institutions can achieve. Along these lines, various data mining techniques have been used to improve learning outcomes by exploring large-scale data that come from educational settings. One of the main problems is predicting the future achievements of students before they take final exams, so we can proactively help students achieve better performance and prevent dropouts. Therefore, many efforts have been made to solve the problem of student performance prediction in the context of educational data mining. In this paper, we provide readers with a comprehensive understanding of student performance prediction and compare approximately 260 studies from the last 20 years with respect to i) major factors highly affecting student performance prediction, ii) kinds of data mining techniques, including prediction and feature selection algorithms, and iii) frequently used data mining tools. The findings of the comprehensive analysis show that ANN and Random Forest are the most used data mining algorithms, while WEKA is found to be a trending tool for student performance prediction. Students' academic records and demographic factors are the best attributes to predict performance. The study shows that irrelevant features in the dataset reduce the prediction results and increase model processing time; therefore, almost half of the studies used feature selection techniques before building prediction models. This study attempts to provide useful and valuable information to researchers interested in advancing educational data mining. The study directs future researchers towards achieving highly accurate prediction results in different scenarios using different available inputs or techniques. The study also helps institutions apply data mining techniques to predict and improve student outcomes by providing additional assistance in time.
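A minimal sketch of the practice highlighted above, applying feature selection before building a prediction model; the selection threshold, classifier, and synthetic data are assumptions.

```python
# Sketch: drop low-importance features before training, as the survey suggests.
# Threshold, classifier, and synthetic data are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=40, n_informative=8,
                           random_state=6)
selector = SelectFromModel(RandomForestClassifier(n_estimators=200, random_state=6),
                           threshold="median")        # keep the top half of features
X_reduced = selector.fit_transform(X, y)

clf = RandomForestClassifier(n_estimators=200, random_state=6)
print("all features :", cross_val_score(clf, X, y, cv=5).mean().round(3))
print("selected only:", cross_val_score(clf, X_reduced, y, cv=5).mean().round(3))
```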
... Lara et al. (2014) predict student performance to provide teachers with inferences from past student performance. Sandoval et al. (2018) present a prediction model based on predicting early which students attending large classes (with more than 50 enrollments) are at risk of failing a course. ...
Article
Full-text available
Data mining is one of the important and beneficial technological developments in education, and its usage area is becoming more widespread day by day as it includes applications that contribute positively to teaching activities. By making raw data in the field of education meaningful using data mining techniques, teaching activities can be made more effective and efficient. Studies carried out in the field of education between 2014 and 2020 with data mining methods were retrieved from the Science Direct database. As a result of this search, 60 papers were found to be directly related to data mining in education. The studies include issues such as the development of e-learning systems, pedagogical support, clustering of educational data, and student performance prediction. These selected articles were analyzed in terms of purpose, application area, method, and contribution to the literature. This study aims to group the studies conducted in the field of education using data mining under certain headings, evaluate their methods and goals, and present the needs in this field to researchers who will work in it.
... Learning in the conditions of an informational and educational environment helps to develop analytical thinking and an understanding of underlying issues [16]. An increasing number of higher education institutions have deployed learning management systems (LMS) to support learning and teaching processes [17][18][19]. But implementing a learning management system in higher education institutions requires a range of special online tools [20]. ...
Article
Full-text available
The article presents a technology for applying competence-based educational simulators in the informational and educational environment for learning general technical disciplines. A classification of competence-based educational simulators for learning general technical disciplines was designed. The types of educational simulators are presented, and the professional competencies of general technical disciplines supported by the developed types of simulators are outlined. On the basis of completing the educational simulators, not only a qualitative indicator of educational results is formed, but also an indicator of the formation of competencies in the course and curriculum. The method was tested using an experimental group and a control group (a total of 1301 students of the specialties 'Agricultural Engineering', 'Electrical Power, Electrical Engineering and Electrical Mechanics', and 'Professional Education' studying general technical disciplines) by systematically measuring the achievement of professional competencies in the conditions of the informational and educational environment using educational simulators. The results show that higher education applicants in the experimental group achieve better results in acquiring professional competencies.
... Obviously, the majority of these studies either ignore that students have no control over such factors, and that informing them about such predictors may have destructive effects on them, or do not consider that such indicators might be unavailable for multiple reasons (e.g., data privacy) [17,18,32]. More research should alternatively concentrate on using data related to students' online learning behavior, which is logically the best predictor of their performance in courses. ...
Article
Full-text available
While modelling students’ learning behavior or preferences has been found as a crucial indicator for their course achievement, very few studies have considered it in predicting achievement of students in online courses. This study aims to model students’ online learning behavior and accordingly predict their course achievement. First, feature vectors are developed using their aggregated action logs during a course. Second, some of these feature vectors are quantified into three numeric values that are used to model students’ learning behavior, namely, accessing learning resources (content access), engaging with peers (engagement), and taking assessment tests (assessment). Both students’ feature vectors and behavior model constitute a comprehensive students’ learning behavioral pattern which is later used for prediction of their course achievement. Lastly, using a multiple criteria decision-making method (i.e., TOPSIS), the best classification methods were identified for courses with different sizes. Our findings revealed that the proposed generalizable approach could successfully predict students’ achievement in courses with different numbers of students and features, showing the stability of the approach. Decision Tree and AdaBoost classification methods appeared to outperform other existing methods on different datasets. Moreover, our results provide evidence that it is feasible to predict students’ course achievement with a high accuracy through modelling their learning behavior during online courses.
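For readers unfamiliar with TOPSIS, the multiple criteria decision-making method mentioned above, here is a hedged sketch of its standard steps applied to a small decision matrix of candidate classifiers; the candidate names, criteria, scores, and weights are illustrative assumptions, not the study's values.

```python
# Sketch: TOPSIS ranking of candidate classifiers from a small decision matrix.
# Candidate names, criteria, scores, and weights are illustrative assumptions.
import numpy as np

# Rows: classifiers; columns: accuracy, F1, training time (seconds).
scores = np.array([[0.86, 0.84, 12.0],    # Decision Tree
                   [0.88, 0.85, 90.0],    # AdaBoost
                   [0.83, 0.80, 5.0]])    # Naive Bayes
weights = np.array([0.5, 0.3, 0.2])
benefit = np.array([True, True, False])   # training time is a cost criterion

norm = scores / np.linalg.norm(scores, axis=0)      # vector normalization
weighted = norm * weights
ideal = np.where(benefit, weighted.max(axis=0), weighted.min(axis=0))
anti = np.where(benefit, weighted.min(axis=0), weighted.max(axis=0))
d_best = np.linalg.norm(weighted - ideal, axis=1)
d_worst = np.linalg.norm(weighted - anti, axis=1)
closeness = d_worst / (d_best + d_worst)             # higher is better
print("closeness scores:", closeness.round(3))
```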
... Because of the efforts of higher education institutions to carry out the digitization of their students' data, access to them has been facilitated, generating new opportunities for their analysis (Sandoval et al., 2018). In this sense, data mining becomes important and emerges as an interesting tool to answer complex questions in education such as learning, the prediction of academic performance and SA (Mduma et al., 2019). ...
Article
Purpose: The prediction of student attrition is critical to facilitate retention mechanisms. This study aims to focus on implementing a method to predict student attrition in the upper years of a physiotherapy program.

Design/methodology/approach: Machine learning is a computer tool that can recognize patterns and generate predictive models. Using a quantitative research methodology, a database of 336 university students in their upper-year courses was accessed. The participants' data were collected from the Financial Academic Management and Administration System and a platform of Universidad Autónoma de Chile. Five quantitative and 11 qualitative variables associated with university student attrition were chosen. With this database, 23 classifiers based on supervised machine learning were tested.

Findings: About 23.58% of males and 17.39% of females were in the attrition student group. The mean accuracy of the classifiers increased with the number of variables used for training. The best accuracy level was obtained using the "Subspace KNN" algorithm (86.3%). The "RUSBoosted trees" classifier yielded the lowest number of false negatives and the highest sensitivity of the algorithms used (78%), as well as a specificity of 86%.

Practical implications: This predictive method identifies attrition students in the university program and could be used to improve student retention in the higher years.

Originality/value: The study has developed a novel predictive model of student attrition from upper-year courses, useful for unbalanced databases with a lower number of attrition students.
... All established models were evaluated for their ability to make accurate predictions on the validation data set, and the prediction accuracy of the various models was compared. Since our outcome variable is measured as a continuous variable, the prediction accuracy and performance of the predictive models were assessed with several continuous error metrics, namely root mean square error (RMSE), mean absolute error (MAE), and R² (Sandoval et al., 2018). The RMSE refers to the square root of the average squared difference between the predicted and actual values. ...
Preprint
Full-text available
Predicting students' academic performance has long been an important area of research in education. Most existing literature has made use of traditional statistical methods that run into the problems of overfitted models, inability to effectively handle large numbers of participants and predictors, and inability to pick out non-linearities that may be present. Regression-based ML methods, which can produce highly interpretable yet accurate models for new predictions, are able to provide some solutions to the aforementioned problems. The present study is the first to develop and compare traditional MLR methods and regression-based ML methods (i.e., ridge regression, LASSO regression, elastic net, and regression trees) to predict students' science performance in PISA 2015. A total of 198,712 students from 60 countries, and 66 student- and school-related predictors, were used to develop the predictive models. The predictive accuracy of the various models built was not very different; however, there were significant differences in the predictors identified as most important by the different methods. Although regression-based ML techniques did not outperform traditional MLR, significant advantages of using ML methods were noted and discussed. Moving forward, we strongly believe that there is merit in using such regression-based ML methods in educational research. Educational research can benefit from adopting ML practices and methods to produce models that can be used not only for explaining factors that influence academic performance but also for making more accurate predictions on unseen data.
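A hedged sketch of comparing MLR with the regression-based ML methods named in the abstract (ridge, LASSO, elastic net, regression trees) by cross-validated R²; the synthetic data merely mimic the "66 predictors" setting and are not the PISA 2015 data.

```python
# Sketch: MLR vs. regularized regression and a regression tree, compared by CV R^2.
# Synthetic data stand in for the PISA predictors; this is not the study's setup.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)
X = rng.normal(size=(2000, 66))                        # 66 student/school predictors
coef = np.zeros(66)
coef[:10] = rng.uniform(0.2, 0.8, 10)                  # only a few predictors matter
y = X @ coef + rng.normal(0, 1.0, 2000)

models = {"MLR": LinearRegression(), "ridge": Ridge(alpha=1.0),
          "LASSO": Lasso(alpha=0.05),
          "elastic net": ElasticNet(alpha=0.05, l1_ratio=0.5),
          "regression tree": DecisionTreeRegressor(max_depth=5)}
for name, model in models.items():
    r2 = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"{name:15s} CV R^2 = {r2:.3f}")
```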
... Prior research has pinpointed pre-service teachers' difficulties in implementing adaptive teaching in their lessons, possibly due in part to pre-service teachers' difficulties in noticing students' behaviors that manifest students' difficulties in understanding and learning. As described in the current literature (e.g., Blomberg et al. 2014; Sandoval et al. 2018a, 2018b; Stürmer et al. 2013a, 2013b), teachers need to notice and interpret student behavior as part of their everyday classroom work. Considering that few teacher education programs to date have included explicit instruction in how to engage in analysis of student behaviors to usefully promote adaptive teaching, the current study findings suggest a means to help pre-service teachers develop their facility for adaptive teaching practice, by incorporating noticing of meaningful student behaviors at an early stage in their teacher education programs. ...
Article
Full-text available
Teachers need to notice and interpret student behavior as part of their everyday classroom work. Current teacher education programs often do not explicitly focus on helping pre-service teachers learn to analyze and interpret student behavior and understand how it may influence teachers’ teaching behaviors, which in turn may affect students’ thinking and achievements. Using a quasi-experimental design, the current study examined a systematic reflective approach promoting dual learning from both teacher and student perspectives in authentic videotaped classrooms. More specifically, the study examined how this dual reflective “professional vision” framework influenced pre-service teachers’ actual ability to explicitly teach meta-strategic knowledge (MSK) to students. Results indicated that pre-service teachers whose video-analysis reflected on both teachers’ and students’ behaviors demonstrated greater improvement in their MSK-teaching, and their students showed better MSK achievements, compared to pre-service teachers whose video-analysis reflected only on teachers’ behaviors. The current study suggests the need to integrate systematic dual reflective professional vision approaches – that analyze not only teachers’ but also students’ behaviors – into teacher preparation programs as a means for developing pre-service teachers’ capacity to promote students’ MSK.
... In the LR model, the two-dimensional data is represented as dots falling along a straight line, where the X-axis is the predictor and the Y-axis is the target [39]. The performance of the regression model is evaluated based on four of the most popular metrics: Mean Square Error (MSE), Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and the coefficient of determination (R-squared) [40]. The MSE, RMSE, MAE, and R-squared are presented below, from Equation (5) to Equation (8). ...
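The excerpt refers to Equations (5) through (8) of the cited work, which are not reproduced in this listing; the standard definitions of these metrics, for n observations with actual values y_i, predictions ŷ_i, and mean ȳ, are:

\begin{align}
\mathrm{MSE} &= \frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2 \\
\mathrm{RMSE} &= \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2} \\
\mathrm{MAE} &= \frac{1}{n}\sum_{i=1}^{n}\left|y_i-\hat{y}_i\right| \\
R^2 &= 1-\frac{\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^2}
\end{align}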
Article
Full-text available
Educational Data Mining (EDM) helps to recognize student performance and predict academic achievement, covering successes as well as failures, negative aspects, and challenges. Educational systems have collected a massive amount of student data, which has made it difficult for officials to search through the data and obtain, with traditional methods, the knowledge required to discover the challenges facing students and universities. Therefore, the underlying problem is how to dive into these data and discover the real challenges facing both students and universities. The main aim of this research is to extract hidden, significant patterns and new insights from students' historical data, which can solve current problems, help to enhance the educational process, and improve academic performance. The data mining tools used for this task are classification, regression, and association rules for frequent pattern generation. The research data sets were gathered from the College of Business and Economics (CBE). The findings of this research can help in making appropriate decisions for specific circumstances and provide better suggestions for overcoming students' weaknesses and failures. Through the findings, numerous problems related to student performance were discovered at different levels and in various courses. The research findings indicated that there are many important problems, and suitable solutions are suggested that can be presented to the relevant authorities for the benefit of improving student performance and activating academic advising.
... Regression is a useful statistical method when data are not very complex, and the number of observations is not large. Linear regression has been widely used in predicting student performance (e.g., 27). In this article, multiple linear regression analyses are employed to predict student course performance. ...
Article
Full-text available
Over the past decade, the field of education has seen stark changes in the way that data are collected and leveraged to support high-stakes decision-making. Utilizing big data as a meaningful lens to inform teaching and learning can increase academic success. Data-driven research has been conducted to understand student learning performance, such as predicting at-risk students at an early stage and recommending tailored interventions to support services. However, few studies in veterinary education have adopted Learning Analytics. This article examines the adoption of Learning Analytics by using the retrospective data from the first-year professional Doctor of Veterinary Medicine program. The article gives detailed examples of predicting six courses from week 0 (i.e., before the classes started) to week 14 in the semester of Spring 2018. The weekly models for each course showed the change of prediction results as well as the comparison between the prediction results and students' actual performance. From the prediction models, at-risk students were successfully identified at the early stage, which would help inform instructors to pay more attention to them at this point.
... As is apparent, most of the studies surprisingly focus on students' past performance or non-academic data (e.g., gender, race, and socioeconomic status) in their predictive models, largely neglecting data logged from students' activity (e.g., Sandoval et al. 2018). Such predictive models simply ignore the fact that many of these variables fall outside the control of students and teachers alike. ...
Article
Full-text available
A significant amount of educational data mining (EDM) research consider students’ past performance or non-academic factors to build predictive models, paying less attention to students’ activity data. While procrastination has been found as a crucial indicator which negatively affects performance of students, no research has investigated this underlying factor in predicting achievement of students in online courses. In this study, we aim to predict students’ course achievement in Moodle through their procrastination behaviour using their homework submission data. We first build feature vectors of students’ procrastination tendencies by considering their active, inactive, and spare time for homework, along with homework grades. Accordingly, we then use clustering and classification methods to optimally sort and put students into various categories of course achievement. We use a Moodle course from the University of Tartu in Estonia which includes 242 students to assess the efficacy of our proposed approach. Our findings show that our approach successfully predicts course achievement for students through their procrastination behaviour with precision and accuracy of 87% and 84% with L-SVM outperforming other classification methods. Furthermore, we found that students who procrastinate more are less successful and are potentially going to do poorly in a course, leading to lower achievement in courses. Finally, our results show that it is viable to use a less complex approach that is easy to implement, interpret, and use by practitioners to predict students’ course achievement with a high accuracy, and possibly take remedial actions in the semester.
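A minimal sketch of this kind of pipeline, assuming per-homework submission logs with open, submission, and deadline timestamps (all file and column names are hypothetical), might build procrastination features per student and feed them to a linear SVM:

# Illustrative sketch, not the paper's pipeline: procrastination features + linear SVM.
import pandas as pd
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

logs = pd.read_csv("homework_submissions.csv",
                   parse_dates=["open_time", "submit_time", "deadline"])
logs["active_time"] = (logs["submit_time"] - logs["open_time"]).dt.total_seconds() / 3600
logs["spare_time"] = (logs["deadline"] - logs["submit_time"]).dt.total_seconds() / 3600

features = logs.groupby("student_id").agg(
    mean_active=("active_time", "mean"),
    mean_spare=("spare_time", "mean"),
    mean_grade=("grade", "mean"),
).reset_index()

labels = pd.read_csv("achievement.csv")          # hypothetical per-student achievement category
data = features.merge(labels, on="student_id")
X, y = data[["mean_active", "mean_spare", "mean_grade"]], data["achievement"]
print(cross_val_score(LinearSVC(max_iter=5000), X, y, cv=5).mean())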
... Instructors can apply educational interventions to reduce the failure rate (Sandoval et al., 2018). Students' academic records are stored in the offices of the engineering faculties, and these records include each student's performance in different subjects as well as information regarding the student's origin, age, and previous studies. All this information should be enough to help categorize the class of students we are dealing with. ...
Article
Full-text available
Abstract This paper aims to analyze and evaluate student performance in the Department of Computer Science, Jigawa State Polytechnic. The data were collected for two (2) years of intake, from July 2016 to June 2018, and contain students' previous academic records such as course code, course name, and the marks obtained by each student; a classification algorithm was applied in the RapidMiner tool. Data mining provides good and powerful methods for education and other fields of study, given the vast amount of student data that can be used to uncover valuable information for determining student success. In this paper a classification task was used for the prediction, and a decision tree model was applied during the experiment. The results indicate that it is possible to predict graduation performance; in addition, a procedure for evaluating the performance in each course was identified.
... Less than 83% of the activity in this dataset was passive. Sandoval et al. (2018) found that over 95% of the activity in their dataset was passive. In this study, although there was little activity indicating interaction between students, there was proportionally more activity indicating interaction with the system (e.g. ...
Article
Full-text available
Increasingly, educational providers are being challenged to use their data stores to improve teaching and learning outcomes for their students. A common source of such data is learning management systems, which enable providers to manage a virtual platform or space where learning materials and activities can be provided for students to engage with. This study investigated whether data from the learning management system Moodle can be used to predict academic performance of students in a blended learning further education setting. This was achieved by constructing measures of student activity from Moodle logs of further education courses. These were used to predict alphabetic student grade and whether a student would pass or fail the course. A key focus was classifiers that could predict likelihood of failure from data available early in the term. The results showed that classifiers built on all course data predicted student grade moderately well (accuracy = 60.5%, kappa = 0.43) and whether a student would pass or fail very well (accuracy = 92.2%, kappa = 0.79). However, classifiers built on the first six weeks of data did not predict failing students well. Classifiers trained on the first ten weeks of data improved significantly on a no-information rate (p < 0.008), though more than half of failing students were still misclassified. The evidence indicates that measures of Moodle activity on further education courses could be useful as part of an early-warning system at ten weeks.
Article
Full-text available
Many stakeholders, including students, teachers, and educational institutions, benefit from accurately predicting student performance and facilitating data-driven policies. In this field, providing users with accurate and understandable predictions is challenging, but equally important. The goals of this study are multifaceted: to identify at-risk students; to identify differences in assessment across different environments and methods for assessing students; and to determine the relationship between teacher employment status and student achievement. This study performs an empirical comparison of the performance and efficiency of ensemble classification methods based on bagging, boosting, stacking, and voting for successful predictions. An ensemble model is developed and validated using double, triple, and quadruple combinations of classification algorithms using Naive Bayes, J48 decision trees, AdaBoost, logistic regression, and multilayer perceptron. This study uses primary quantitative data from the learning management system of a university in Pakistan to analyze the performance of these models. The boosted tree method outperforms bagged trees when the standard deviation is higher and the data size is large, while stacking is best for smaller datasets. Based on behavioral analysis results of students, academic advice can be given for selected case studies. These will help educational administrators and policymakers working in education to introduce new policies and curricula accordingly.
Article
There is a high failure rate and low academic performance observed in programming courses. To address these issues, it is crucial to predict student performance at an early stage. This allows teachers to provide timely support and interventions to help students achieve their learning objectives. The prediction of student performance has gained significant attention, with researchers focusing on machine learning features and algorithms to improve predictions. This article proposes a model for predicting student performance in a 16-week CS1 programming course, specifically in weeks 3, 5, and 7. The model utilizes three key factors: grades, delivery time, and the number of attempts made by students in programming labs and an exam. Eight classification algorithms were employed to train and evaluate the model, with performance assessed using metrics such as accuracy, recall, F1 score, and AUC. In week 3, the gradient boosting classifier (GBC) achieved the best results with an F1 score of 86%, followed closely by the random forest classifier (RFC) with 83%. These findings demonstrate the potential of the proposed model in accurately predicting student performance.
Article
Educators' loss of ability to read students' comprehension level during the class through quick questions or nonverbal communication is one of the main challenges of online and blended learning. Many researchers recently tackled this problem by proposing different frameworks for predicting students' academic performance. However, previous work relies heavily on feature engineering. Feature engineering is the process of selecting, transforming, manipulating, and constructing new variables from raw data using domain knowledge. A disadvantage of feature engineering is that the features are tailored to a specific dataset, making the constructed models inflexible when used in new datasets. A direct consequence is that features need to be rebuilt for each course. This paper proposes a more flexible framework to predict the students' academic performance. In this framework, the raw data is used directly to construct the prediction model without the feature engineering step. The feature selection is instead based on model interpretability. The framework is applied to the Open University Learning Analytics Dataset (OULAD) with two different types of classifiers: random forest and artificial neural networks. Obtained results show that the feature engineering step can be abandoned without affecting the models' prediction performance. The prediction results of the flexible feature selection framework either outperform or have a difference of less than 1% accuracy compared to other work in the literature that relies on a manual feature engineering step. Both random forest and artificial neural networks without feature engineering accomplish a high prediction accuracy for the case of students at risk of failing, with 86% and 88% compared to all students with pass grades and students with distinction grades, respectively. Also, the prediction models have the highest accuracy rate of 93% in predicting drop-out students. Yet, the prediction models in the proposed framework and previous research work perform poorly in predicting high-achieving students, with a maximum accuracy of 81%, a precision of 69%, and a recall of 57%.
Article
With the onset of online education via technology-enhanced learning platforms, a large amount of educational data is being generated in the form of logs, clickstreams, performance records, etc. These Virtual Learning Environments give researchers an opportunity to apply educational data mining and learning analytics to mine students' learning behavior. This further helps them in data-driven decision making through timely intervention via early warning systems (EWS), reflecting on and optimizing educational environments and refining pedagogical designs. Here, the role of an EWS is to identify at-risk students in a timely manner. This study proposes a modeling methodology deploying an interpretable Hidden Markov Model for mining sequential learning behavior, built upon performance features derived from lightweight assessments. The public OULA dataset, with diversified courses and 32,593 student records, is used for validation. The results on the unseen test data achieve a classification accuracy ranging from 87.67% to 94.83% and AUC from 0.927 to 0.989, and outperform other baseline models. For implementation of the EWS, the study also predicts the optimal time period, during the first and second quarter of the course, with a sufficient number of lightweight assessments in place. With these outcomes, this study tries to establish an efficient generalized modeling framework that may lead higher educational institutes towards sustainable development.
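One plausible way to realize such sequence-based early warning, sketched here under the assumption that per-student sequences of weekly assessment features are already available as lists of arrays, is to fit one Gaussian HMM per outcome class with the hmmlearn library and flag a student whose sequence is more likely under the "fail" model; this is an illustration, not the paper's exact model:

# Illustrative HMM-based sequence scoring for early warning (not the paper's model).
import numpy as np
from hmmlearn.hmm import GaussianHMM

def fit_class_hmm(sequences, n_states=3):
    # sequences: list of (T_i, n_features) arrays for students in one outcome class
    X = np.vstack(sequences)
    lengths = [len(s) for s in sequences]
    model = GaussianHMM(n_components=n_states, covariance_type="diag",
                        n_iter=100, random_state=0)
    model.fit(X, lengths)
    return model

hmm_pass = fit_class_hmm(pass_sequences)     # hypothetical lists of per-student sequences
hmm_fail = fit_class_hmm(fail_sequences)

def predict_at_risk(seq):
    # Higher log-likelihood under the "fail" model: flag the student as at-risk
    return hmm_fail.score(seq) > hmm_pass.score(seq)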
Article
The aim of this study is to contribute to improving the quality of digital transformation in education in the post-COVID period by examining the midterm and final exam performance, perceptions of e-assessment design, and overall learning experience of groups with differing analytics-based learning performance in an online course where assessment was spread across the whole process. A learning analytics process was carried out in the study, using a descriptive analytics approach. This process involved collecting and analyzing metrics that could be associated with learning performance in the periods up to the midterm and up to the final exam. The study group consisted of 285 students enrolled in distance education programs and taking the Information and Communication Technologies course. Data were collected through a pre-test for each topic, student monitoring tools within MOODLE (live class attendance, online study time, activity completion percentage, access to learning resources), an e-assessment scale covering perceptions of e-assessment design and overall learning experience, and online exams (midterm and final). Cluster analysis (k-means and hierarchical) was used to describe analytics-based learning performance. Differences between the clusters in midterm and final performance, perceptions of e-assessment design, and overall learning experience were analyzed with t-tests. As a result, students with high analytics-based performance were found to have higher academic achievement. However, it is argued that a fair assessment process cannot be guaranteed because of limitations in institutions' distance education regulations. Accordingly, it may be useful to focus on how achievement criteria can be better determined and to expand the range of practical examples that can capture learning performance more accurately.
Conference Paper
Full-text available
Student dropout remains a critical problem in education. Educational Data Mining (EDM) can have a substantial impact in supporting an academic institution's goals when making academic decisions, such as regulation renewal, rule enforcement, or academic process improvement. The sooner at-risk students can be identified, the earlier institution members can provide the necessary treatments, thus preventing dropout and increasing the student retention rate. This study performs a comprehensive literature review of student performance prediction using EDM techniques, covering research from 2002 to 2021. Our study aims to provide a comprehensive review of recent studies in terms of student performance prediction tasks, predictor variables, methods, accuracy, and tools used in previous works on student performance prediction. Performing student performance prediction in an academic institution can help provide a mitigation mechanism, because at-risk students can be managed earlier by the institution to decrease the dropout rate.
Article
Purpose This study aims to explore Chilean students’ digital technology usage patterns and approaches to learning. Design/Approach/Methods We conducted this study in two stages. In the first stage, we worked with one semester of learning management system (LMS), library, and student records data. We performed a k-means cluster analysis to identify groups with similar usage patterns. In the second stage, we invited students from the emerging clusters to participate in group interviews. Thematic analysis was employed to analyze them. Findings Three groups were identified: 1) Digital library users/high performers, who adopted deeper approaches to learning, obtained higher marks, and used learning resources to integrate materials and expand understanding; 2) LMS and physical library users/mid-performers, who adopted mainly strategic approaches, obtained marks close to average, and used learning resources for studying in an organized manner to get good marks; and 3) Lower users of LMS and library/mid-low performers, who adopted mainly a surface approach, obtained mid-to-lower-than-average marks, and used learning resources for minimum content understanding. Originality/Value We demonstrated the importance of combining learning analytics data with qualitative methods to make sense of digital technology usage patterns: approaches to learning are associated with learning resources use. Practical recommendations are presented.
Article
Over the past decade, the field of education has seen stark changes in the way that data are collected and leveraged to support high-stakes decision-making. Utilizing big data as a meaningful lens to inform teaching and learning can increase academic success. Data-driven research has been conducted to understand student learning performance, such as predicting at-risk students at an early stage and recommending tailored interventions to support services. However, few studies in veterinary education have adopted Learning Analytics. This article examines the adoption of Learning Analytics by using the retrospective data from the first-year professional Doctor of Veterinary Medicine program. The article gives detailed examples of predicting six courses from week 0 (i.e., before the classes started) to week 14 in the semester of Spring 2018. The weekly models for each course showed the change of prediction results as well as the comparison between the prediction results and students’ actual performance. From the prediction models, at-risk students were successfully identified at the early stage, which would help inform instructors to pay more attention to them at this point.
Preprint
Full-text available
Predicting the performance of students early and as accurately as possible is one of the biggest challenges of educational institutions. Analyzing the performance of students early can help in finding the strengths and weaknesses of students and help them perform better in examinations. Using machine learning, students' performance can be predicted with the help of data collected from Learning Management Systems (LMSs). The data collected from LMSs can provide insights about the student behaviors that lead to good or bad performance in examinations, which can then be studied and used to help students who are performing poorly to improve.
Article
Audience Response Systems such as clickers are gaining much attention for the early identification of at-risk students, as quality education, student success rates, and retention are major areas of concern, as evidenced in the COVID scenario. Usage of this active learning strategy across classrooms of varying strength has been found to be effective in retaining students' attention, retention, and learning power. However, implementing clickers for large classrooms incurs overhead costs on the instructor's part. As a result, educational researchers are experimenting with various lightweight alternatives. This paper discusses one such alternative: lightweight formative assessments for blended learning environments. It discusses their implementation and effectiveness in the early identification of at-risk students. This study validates the usage of lightweight assessments for three core, pedagogically different courses of large computer science engineering classrooms. It uses a voting ensemble classifier for effective predictions. With the usage of lightweight assessments in the early identification of at-risk students, an accuracy range of 87%–94.7% has been achieved, along with high ROC-AUC values. The study also proposes a generalized pedagogical architecture for fitting these lightweight assessments within the course curriculum of pedagogically different courses. With these constructive outcomes, lightweight assessments seem promising for efficiently handling scaling technical classrooms.
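A voting ensemble of the kind mentioned above can be sketched with scikit-learn as follows, assuming a feature matrix X built from lightweight assessments and binary at-risk labels y (both hypothetical placeholders):

# Minimal sketch of a soft-voting ensemble over lightweight assessment features.
# X: assessment features; y: 1 = at-risk, 0 = not at-risk (assumed to exist already).
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("nb", GaussianNB()),
    ],
    voting="soft",                 # average the predicted probabilities
)
print(cross_val_score(ensemble, X, y, cv=5, scoring="roc_auc").mean())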
Article
Next-term grade prediction is a challenging problem. The objective is to predict students' grades in new courses, given their grades in courses they have previously taken. Adopting various machine learning algorithms is a very common and straightforward approach to tackling this problem. However, such models are very difficult to interpret. That is, it is difficult to explain to a student (or a teacher) why the model predicted grade B for a given student, for example. In this work, we shed light on the importance of building interpretable models for educational data mining tasks. Specifically, we propose a novel interpretable framework for multi-class grade prediction that is based on an optimal rule-list mining algorithm. Additionally, we evaluate our proposed framework on two private datasets and compare our results with baseline models. Our findings show that our proposed framework is capable of achieving higher prediction and interpretability values when compared to black-box models.
Article
Full-text available
The global explosion of COVID-19 has brought unprecedented challenges to traditional higher education, especially for freshmen who have no major; they cannot determine what their real talents are. Thus, it is difficult for them to make correct choices based on their skills. Generally, existing methods mainly mine isomorphic information, ignoring relationships among heterogeneous information. Therefore, this paper proposes a new framework to give freshmen appropriate recommendations by mining heterogeneous educational information. This framework is composed of five stages: after data preprocessing, a weighted heterogeneous educational network (WHEN) is constructed according to heterogeneous information in student historical data. Then, the WHEN is projected into different subnets, on which metapaths are defined. Next, a WHEN-based embedding method is proposed, which helps mine the weighted heterogeneous information on multiple extended metapaths. Finally, with the information mined, a matrix factorization algorithm is used to recommend learning resources and majors for freshmen. A large number of experimental results show that the proposed framework can achieve better results than other baseline methods. This indicates that the proposed method is effective and can provide great help to freshmen during the COVID-19 storm.
Article
Full-text available
With the advancement of data mining and artificial intelligence technologies, it has become possible to predict students' future behaviors from their actions in learning management systems. In particular, early warning systems, which rest on the logic of identifying at-risk students in advance and issuing warnings, have been developed and provide information to institutions offering distance education. The aim of our study is to reveal the current state of research on early warning systems by examining the publication characteristics of these studies and the characteristics of their data-mining-based analysis methods. For this purpose, data obtained from the Google Scholar database were examined using content analysis, and the results are presented as frequency tables. When studies on early warning systems were examined, it was observed that work on this topic began after 2014 and that the number of studies increased in 2018. It was found that these studies were mostly conducted in the USA, were published as articles and conference papers, and preferred quantitative methods. In the experimental studies, data were collected from system logs obtained from learning management systems and analyzed using various data mining techniques. The most important finding of this study is that a model of proven applicability for early warning systems has not yet been developed.
Article
Full-text available
In this decade, e-learning systems provide more interactivity to instructors and students than traditional systems and make a completely online (CO) education possible. However, instructors cannot tell whether a CO student is engaged in the course, and they cannot predict his or her academic performance in courses. This work provides a collection of models (exploratory factor analysis, multiple linear regressions, cluster analysis, and correlation) to predict the academic performance of students early. These models are constructed using Moodle interaction data, characteristics, and grades of 802 undergraduate students from a CO university. The models' results indicated that the major contribution to the prediction of academic student performance is made by four factors: Access, Questionnaire, Task, and Age. The Access factor is composed of variables related to students' accesses in Moodle, including visits to forums and glossaries. The Questionnaire factor summarizes variables related to visits and attempts in questionnaires. The Task factor is composed of variables related to consulted and submitted tasks. The Age factor contains the student's age. It is also remarkable that Age was identified as a negative predictor of student performance, indicating that performance is inversely proportional to age. In addition, cluster analysis found five groups and confirmed that the number of interactions with Moodle is closely related to student performance.
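The combination of factor analysis, regression, and clustering described above could be sketched roughly as follows, assuming a matrix of Moodle interaction counts and a vector of grades (both hypothetical placeholders):

# Sketch combining factor analysis, regression, and clustering on LMS interaction data.
# "interactions" (students x interaction variables) and grades "y" are assumed to exist.
from sklearn.decomposition import FactorAnalysis
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans

fa = FactorAnalysis(n_components=4, random_state=0)   # four latent factors, as in the abstract
factors = fa.fit_transform(interactions)

reg = LinearRegression().fit(factors, y)               # predict performance from factor scores
clusters = KMeans(n_clusters=5, random_state=0, n_init=10).fit_predict(factors)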
Article
Prediction models that underlie “early warning systems” need improvement. Some predict outcomes using entrenched, unchangeable characteristics (e.g., socioeconomic status) and others rely on performance on early assignments to predict the final grades to which they contribute. Behavioral predictors of learning outcomes often accrue slowly, to the point that time needed to produce accurate predictions leaves little time for intervention. We aimed to improve on these methods by testing whether we could predict performance in a large lecture course using only students’ digital behaviors in weeks prior to the first exam. Early prediction based only on malleable behaviors provides time and opportunity to advise students on ways to alter study and improve performance. Thereafter, we took the not-yet-common step of applying the model and testing whether providing digital learning support to those predicted to perform poorly can improve their achievement. Using learning management system log data, we tested models composed of theory-aligned behaviors using multiple algorithms and obtained a model that accurately predicted poor grades. Our algorithm correctly identified 75% of students who failed to earn the grade of B or better needed to advance to the next course. We applied this model the next semester to predict achievement levels and provided a digital learning strategy intervention to students predicted to perform poorly. Those who accessed advice outperformed classmates on subsequent exams, and more students who accessed the advice achieved the B needed to move forward in their major than those who did not access advice.
Article
The aim of this paper is to survey recent research publications that use Soft Computing methods to answer education-related problems based on the analysis of educational data ‘mined’ mainly from interactive/e-learning systems. Such systems are known to generate and store large volumes of data that can be exploited to assess the learner, the system and the quality of the interaction between them. Educational Data Mining (EDM) and Learning Analytics (LA) are two distinct and yet closely related research areas that focus on this data aiming to address open education-related questions or issues. Besides ‘classic’ data analysis methods such as clustering, classification, identification or regression/analysis of variances, soft computing methods are often employed by EDM and LA researchers to achieve their various tasks. Their very nature as iterative optimization algorithms that avoid the exhaustive search of the solutions space and go for possibly suboptimal solutions yet at realistic time and effort, along with their heavy reliance on rich data sets for training, make soft computing methods ideal tools for the EDM or LA type of problems. Decision trees, random forests, artificial neural networks, fuzzy logic, support vector machines and genetic/evolutionary algorithms are a few examples of soft computing approaches that, given enough data, can successfully deal with uncertainty, qualitatively stated problems and incomplete, imprecise or even contradictory data sets – features that the field of education shares with all humanities/social sciences fields. The present review focuses, therefore, on recent EDM and LA research that employs at least one soft computing method, and aims to identify (i) the major education problems/issues addressed and, consequently, research goals/objectives set, (ii) the learning contexts/settings within which relevant research and educational interventions take place, (iii) the relation between classic and soft computing methods employed to solve specific problems/issues, and (iv) the means of dissemination (publication journals) of the relevant research results. Selection and analysis of a body of 300 journal publications reveals that top research questions in education today seeking answers through soft computing methods refer directly to the issue of quality – a critical issue given the currently dominant educational/pedagogical models that favor e-learning or computer- or technology-mediated learning contexts. Moreover, results identify the most frequently used methods and tools within EDM/LA research and, comparatively, within their soft computing subsets, along with the major journals relevant research is being published worldwide. Weaknesses and issues that need further attention in order to fully exploit the benefits of research results to improve both the learning experience and the learning outcomes are discussed in the conclusions.
Chapter
This chapter focuses on the key practical aspects to be considered when facing the task of developing predictive models for student learning outcomes. It is based on the authors' experience building and delivering dropout prediction models within higher education contexts. The chapter presents the information used to generate the predictive models, how this information is treated, how the models are fed, which types of algorithms have been used, and why and how the obtained results have been evaluated. It recommends best practices for building, training, and evaluating predictive models. It is hoped that readers will find these recommendations useful for the design, development, deployment, and use of early warning systems.
Article
Full-text available
Using predictive modeling methods, it is possible to identify at-risk students early and inform both the instructors and the students. While some universities have started to use standards-based grading, which has educational advantages over common score-based grading, at-risk prediction models have not been adapted to reap the benefits of standards-based grading in courses that utilize this grading. In this paper, we compare predictive methods to identify at-risk students in a course that used standards-based grading. Only in-semester performance data that were available to the course instructors were used in the prediction methods. When identifying at-risk students, it is important to minimize false negative (i.e., type II) error while not increasing false positive (i.e., type I) error significantly. To increase the generalizability of the models and accuracy of the predictions, we used a feature selection method to reduce the number of variables used in each model. The Naive Bayes Classifier model and an Ensemble model using a sequence of models (i.e., Support Vector Machine, K-Nearest Neighbors, and Naive Bayes Classifier) had the best results among the seven tested modeling methods.
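A simplified sketch of feature selection plus a Naive Bayes classifier, with a lowered decision threshold to keep false negatives (missed at-risk students) down, is shown below; X, y, the number of selected features, and the 0.3 threshold are illustrative assumptions rather than the paper's values:

# Sketch: select in-semester performance variables, train Naive Bayes, lower the threshold.
# X: in-semester performance features; y: 1 = at-risk, 0 = not (assumed to exist).
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)
selector = SelectKBest(f_classif, k=10).fit(X_train, y_train)
clf = GaussianNB().fit(selector.transform(X_train), y_train)

proba_at_risk = clf.predict_proba(selector.transform(X_test))[:, 1]
y_pred = (proba_at_risk >= 0.3).astype(int)   # lower threshold: favor recall of at-risk students
print(confusion_matrix(y_test, y_pred))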
Article
Full-text available
This paper presents a dialogical tool for the advancement of learning analytics implementation for student retention in Higher Education institutions. The framework was developed as an outcome of a project commissioned and funded by the Australian Government's Office for Learning and Teaching. The project took a mixed-method approach including a survey at the institutional level (n = 24), a survey of individual teaching staff and other academics with an interest in student retention (n = 353), and a series of interviews (n = 23). Following the collection and analysis of these data an initial version of the framework was developed and presented at a National Forum attended by 148 colleagues from 43 different institutions. Participants at the forum were invited to provide commentary on the usefulness and composition of the framework which was subsequently updated to reflect this feedback. Ultimately, it is envisaged that such a framework might offer institutions an accessible and concise tool to structure and systematize discussion about how learning analytics might be implemented for student retention in their own context.
Article
Full-text available
This study examined the extent to which instructional conditions influence the prediction of academic success in nine undergraduate courses offered in a blended learning model (n = 4134). The study illustrates the differences in predictive power and significant predictors between course-specific models and generalized predictive models. The results suggest that it is imperative for learning analytics research to account for the diverse ways technology is adopted and applied in course-specific contexts. The differences in technology use, especially those related to whether and how learners use the learning management system, require consideration before the log-data can be merged to create a generalized model for predicting academic success. A lack of attention to instructional conditions can lead to an over or under estimation of the effects of LMS features on students' academic success. These findings have broader implications for institutions seeking generalized and portable models for identifying students at risk of academic failure.
Article
Full-text available
We evaluate 179 classifiers arising from 17 families (discriminant analysis, Bayesian, neural networks, support vector machines, decision trees, rule-based classifiers, boosting, bagging, stacking, random forests and other ensembles, generalized linear models, nearest neighbors, partial least squares and principal component regression, logistic and multinomial regression, multiple adaptive regression splines and other methods), implemented in Weka, R (with and without the caret package), C and Matlab, including all the relevant classifiers available today. We use 121 data sets, which represent the whole UCI database (excluding the large-scale problems) and other own real problems, in order to achieve significant conclusions about classifier behavior that do not depend on the data set collection. The classifiers most likely to be the best are the random forest (RF) versions, the best of which (implemented in R and accessed via caret) achieves 94.1% of the maximum accuracy, exceeding 90% in 84.3% of the data sets. However, the difference is not statistically significant with the second best, the SVM with Gaussian kernel implemented in C using LibSVM, which achieves 92.3% of the maximum accuracy. A few models are clearly better than the remaining ones: random forest, SVM with Gaussian and polynomial kernels, extreme learning machine with Gaussian kernel, C5.0 and avNNet (a committee of multi-layer perceptrons implemented in R with the caret package). The random forest is clearly the best family of classifiers (3 out of the 5 best classifiers are RF), followed by SVM (4 classifiers in the top-10), neural networks and boosting ensembles (5 and 3 members in the top-20, respectively).
Article
Full-text available
This paper focuses on comparing the communication tools of six open source learning management systems (LMSs). It compares the whiteboard/video services, discussion forums, file exchange/internal mail, online journal mail, and real-time chat features of each of the LMSs. There are so many open source LMSs available that it can be tedious to find a suitable one that meets an instructor's needs. This paper seeks to make it easier for instructors to make the best choice when choosing a learning management system by revealing which one has the best communication tools. It focuses on six popular LMSs: ATutor, Claroline, Dokeos, Ilias, Moodle, and Sakai. The comparison of the six open source LMSs showed that Moodle and ATutor have the best communication tools with user-friendly interfaces.
Conference Paper
Full-text available
All forms of learning take time. There is a large body of research suggesting that the amount of time spent on learning can improve the quality of learning, as represented by academic performance. The widespread adoption of learning technologies such as learning management systems (LMSs) has resulted in large amounts of data about student learning being readily accessible to educational researchers. One common use of this data is to measure the time that students have spent on different learning tasks (i.e., time-on-task). Given that LMSs typically only capture times when students executed various actions, time-on-task measures are estimated based on the recorded trace data. LMS trace data has been extensively used in many studies in the field of learning analytics, yet the problem of time-on-task estimation is rarely described in detail and the consequences that it entails are not fully examined. This paper presents the results of a study that examined the effects of different time-on-task estimation methods on the results of commonly adopted analytical models. The primary goal of this paper is to raise awareness of the issue of accuracy and appropriateness surrounding time-estimation within the broader learning analytics community, and to initiate a debate about the challenges of this process. Furthermore, the paper provides an overview of time-on-task estimation methods in educational and related research fields.
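A common heuristic of the kind this paper examines estimates time-on-task as the gap between consecutive logged events, capped to limit inflation from idle periods; the sketch below assumes a simple event log and a 30-minute cap, both illustrative choices rather than recommendations from the paper:

# Sketch of a simple time-on-task estimate from LMS click logs.
import pandas as pd

events = pd.read_csv("lms_events.csv", parse_dates=["timestamp"])   # hypothetical log file
events = events.sort_values(["student_id", "timestamp"])

# Gap to the next event by the same student, in minutes, capped at 30 to bound idle time.
events["next_ts"] = events.groupby("student_id")["timestamp"].shift(-1)
gap_min = (events["next_ts"] - events["timestamp"]).dt.total_seconds() / 60
events["time_on_task_min"] = gap_min.clip(upper=30).fillna(0)

time_per_student = events.groupby("student_id")["time_on_task_min"].sum()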
Article
Full-text available
With digitisation and the rise of e-learning have come a range of computational tools and approaches that have allowed educators to better support the learners' experience in schools, colleges and universities. The move away from traditional paper-based course materials, registration, admissions and support services to the mobile, always-on and always accessible data has driven demand for information and generated new forms of data observable through consumption behaviours. These changes have led to a plethora of data sets that store learning content and track user behaviours. Most recently, new data analytics approaches are creating new ways of understanding trends and behaviours in students that can be used to improve learning design, strengthen student retention, provide early warning signals concerning individual students and help to personalise the learner's experience. This paper proposes a foundational learning analytics model (LAM) for higher education that focuses on the dynamic interaction of stakeholders with their data supported by visual analytics, such as self-organising maps, to generate conversations, shared inquiry and solution-seeking. The model can be applied for other educational institutions interested in using learning analytics processes to support personalised learning and support services. Further work is testing its efficacy in increasing student retention rates.
Article
Full-text available
Both the root mean square error (RMSE) and the mean absolute error (MAE) are regularly employed in model evaluation studies. Willmott and Matsuura (2005) have suggested that the RMSE is not a good indicator of average model performance and might be a misleading indicator of average error and thus the MAE would be a better metric for that purpose. Their paper has been widely cited and may have influenced many researchers in choosing MAE when presenting their model evaluation statistics. However, we contend that the proposed avoidance of RMSE and the use of MAE is not the solution to the problem. In this technical note, we demonstrate that the RMSE is not ambiguous in its meaning, contrary to what was claimed by Willmott et al. (2009). The RMSE is more appropriate to represent model performance than the MAE when the error distribution is expected to be Gaussian. In addition, we show that the RMSE satisfies the triangle inequality requirement for a distance metric.
Conference Paper
Full-text available
How do we identify students who are at risk of failing our courses? Waiting to accumulate sufficient assessed work incurs a substantial lag in identifying students who need assistance. We want to provide students with support and guidance as soon as possible to reduce the risk of failure or disengagement. In small classes we can monitor students more directly and mark graded assessments to provide feedback in a relatively short time, but large class sizes, where it is easiest for students to disappear and ultimately drop out, pose a much greater challenge. We need reliable and scalable mechanisms for identifying at-risk students as quickly as possible, before they disengage, drop out or fail. The volumes of student information retained in data warehouse and business intelligence systems are often not available to lecturing staff, who can only observe the course-level marks for previous study and participation behaviour in the current course, based on attendance and assignment submission. We have identified a measure of "at-risk" behaviour that depends upon the timeliness of initial submissions of any marked activity. By analysing four years of electronic submissions over our school's student body we have extracted over 220,000 individual records, spanning over 1900 students, to establish that early electronic submission behaviour can provide a reliable indicator of future behaviour. By measuring the impact on a student's Grade Point Average (GPA) we can show that knowledge of assignment submission and current course level provides a reliable guide to student performance.
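The timeliness measure could be approximated along the following lines, assuming hypothetical files with assignment release and first-submission timestamps and per-student GPAs:

# Sketch of the timeliness idea: days from release to first submission, related to GPA.
import pandas as pd

subs = pd.read_csv("submissions.csv", parse_dates=["released", "first_submitted"])
subs["days_to_first_submit"] = (subs["first_submitted"] - subs["released"]).dt.days
timeliness = subs.groupby("student_id")["days_to_first_submit"].mean()

gpa = pd.read_csv("gpa.csv").set_index("student_id")["gpa"]
# A negative correlation would support late first submissions as an at-risk signal.
print(timeliness.corr(gpa))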
Article
Full-text available
Applying data mining (DM) in education is an emerging interdisciplinary research field also known as educational data mining (EDM). It is concerned with developing methods for exploring the unique types of data that come from educational environments. Its goal is to better understand how students learn and identify the settings in which they learn to improve educational outcomes and to gain insights into and explain educational phenomena. Educational information systems can store a huge amount of potential data from multiple sources coming in different formats and at different granularity levels. Each particular educational problem has a specific objective with special characteristics that require a different treatment of the mining problem. The issues mean that traditional DM techniques cannot be applied directly to these types of data and problems. As a consequence, the knowledge discovery process has to be adapted and some specific DM techniques are needed. This paper introduces and reviews key milestones and the current state of affairs in the field of EDM, together with specific applications, tools, and future insights.
Article
Full-text available
With the increasing diversity of students attending university, there is a growing interest in the factors predicting academic performance. This study is a prospective investigation of the academic, psychosocial, cognitive, and demographic predictors of academic performance of first year Australian university students. Questionnaires were distributed to 197 first year students 4 to 8 weeks prior to the end of semester exams and overall grade point averages were collected at semester completion. Previous academic performance was identified as the most significant predictor of university performance. Integration into university, self efficacy, and employment responsibilities were also predictive of university grades. Identifying the factors that influence academic performance can improve the targeting of interventions and support services for students at risk of academic problems.
Conference Paper
Full-text available
Recent research has indicated that misuse of intelligent tutoring software is correlated with substantially lower learning. Students who frequently engage in behavior termed "gaming the system" (behavior aimed at obtaining correct answers and advancing within the tutoring curriculum by systematically taking advantage of regularities in the software's feedback and help) learn only 2/3 as much as similar students who do not engage in such behaviors. We present a machine-learned Latent Response Model that can identify if a student is gaming the system in a way that leads to poor learning. We believe this model will be useful both for re-designing tutors to respond appropriately to gaming, and for understanding the phenomenon of gaming better.
Conference Paper
Full-text available
The learners’ motivation has an impact on the quality of learning, especially in e-Learning environments. Most of these environments store data about the learner’s actions in log files. Logging the users’ interactions in educational systems makes it possible to track their actions at a refined level of detail. Data mining and machine learning techniques can “give meaning” to these data and provide valuable information for learning improvement. An area where improvement is absolutely necessary and of great importance is motivation, known to be an essential factor for preventing attrition in e-Learning. In this paper we investigate whether log file data analysis can be used to estimate the motivational level of the learner. A decision tree is built from a limited number of log files from a web-based learning environment. The results suggest that time spent reading is an important factor for predicting motivation; also, performance in tests was found to be a relevant indicator of the motivational level.
Article
Full-text available
The main objective of higher education institutions is to provide quality education to their students. One way to achieve the highest level of quality in a higher education system is by discovering knowledge for prediction regarding the enrollment of students in a particular course, alienation from the traditional classroom teaching model, detection of unfair means used in online examinations, detection of abnormal values in students' result sheets, prediction of students' performance, and so on. This knowledge is hidden within the educational data set and is extractable through data mining techniques. The present paper is designed to justify the capabilities of data mining techniques in the context of higher education by offering a data mining model for the higher education system in the university. In this research, the classification task is used to evaluate students' performance, and as there are many approaches to data classification, the decision tree method is used here. Through this task we extract knowledge that describes students' performance in the end-semester examination. It helps in identifying dropouts and students who need special attention early, and allows the teacher to provide appropriate advising/counseling. Keywords: Educational Data Mining (EDM); Classification; Knowledge Discovery in Databases (KDD); ID3 Algorithm.
Article
Full-text available
In matters of great importance that have financial, medical, social, or other implications, we often seek a second opinion before making a decision, sometimes a third, and sometimes many more. In doing so, we weigh the individual opinions and combine them through some thought process to reach a final decision that is presumably the most informed one. The process of consulting "several experts" before making a final decision is perhaps second nature to us; yet, the extensive benefits of such a process in automated decision-making applications have only recently been discovered by the computational intelligence community. Also known under various other names, such as multiple classifier systems, committee of classifiers, or mixture of experts, ensemble-based systems have been shown to produce favorable results compared to those of single-expert systems for a broad range of applications and under a variety of scenarios. Design, implementation and application of such systems are the main topics of this article. Specifically, this paper reviews conditions under which ensemble-based systems may be more beneficial than their single-classifier counterparts, algorithms for generating individual components of the ensemble systems, and various procedures through which the individual classifiers can be combined. We discuss popular ensemble-based algorithms, such as bagging, boosting, AdaBoost, stacked generalization, and hierarchical mixture of experts; as well as commonly used combination rules, including algebraic combination of outputs, voting-based techniques, behavior knowledge space, and decision templates. Finally, we look at current and future research directions for novel applications of ensemble systems. Such applications include incremental learning, data fusion, feature selection, learning with missing features, confidence estimation, and error-correcting output codes; all areas in which ensemble systems have shown great promise.
Conference Paper
The accurate estimation of students’ grades in future courses is important as it can inform the selection of next term’s courses and create personalized degree pathways to facilitate successful and timely graduation. This paper presents future-course grade prediction methods based on sparse linear models and low-rank matrix factorizations that are specific to each course or student-course tuple. These methods identify the predictive subsets of prior courses on a course-by-course basis and better address problems associated with the not-missing-at-random nature of the student-course historical grade data. The methods were evaluated on a dataset obtained from the University of Minnesota. This evaluation showed that the course-specific models outperformed various competing schemes, with the best-performing scheme achieving an RMSE across the different courses of 0.632 vs 0.661 for the best competing method.
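As a generic illustration of the low-rank idea (not the paper's course-specific models), the following sketch factorizes a student-by-course grade matrix with observed entries only, using plain alternating least squares; G is an assumed NumPy array with np.nan for courses a student has not taken:

# Illustrative low-rank factorization of a (student x course) grade matrix.
import numpy as np

def factorize(G, rank=5, n_iters=50, reg=0.1, seed=0):
    rng = np.random.default_rng(seed)
    mask = ~np.isnan(G)
    U = rng.normal(scale=0.1, size=(G.shape[0], rank))   # student factors
    V = rng.normal(scale=0.1, size=(G.shape[1], rank))   # course factors
    I = reg * np.eye(rank)
    for _ in range(n_iters):
        for i in range(G.shape[0]):                       # update each student's factors
            j = mask[i]
            U[i] = np.linalg.solve(V[j].T @ V[j] + I, V[j].T @ G[i, j])
        for k in range(G.shape[1]):                       # update each course's factors
            i = mask[:, k]
            V[k] = np.linalg.solve(U[i].T @ U[i] + I, U[i].T @ G[i, k])
    return U @ V.T                                        # predicted grades, including unseen pairs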
Article
Every day, teachers design and test new ways of teaching, using learning technology to help their students. Sadly, their discoveries often remain local. By representing and communicating their best ideas as structured pedagogical patterns, teachers could develop this vital professional knowledge collectively
Article
Blended learning (BL) is recognized as one of the major trends in higher education today. To identify how BL has actually been adopted, this study employed a data-driven approach instead of model-driven methods. The Latent Class Analysis method, a clustering approach in educational data mining, was employed to extract common activity features of 612 courses in a large private university located in South Korea, using online behavior data tracked from the Learning Management System and the institution's course database. Four unique subtypes were identified. Approximately 50% of the courses manifested inactive utilization of the LMS or an immature stage of blended learning implementation, labeled Type I. Other subtypes included Type C - Communication or Collaboration (24.3%), Type D - Delivery or Discussion (18.0%), and Type S - Sharing or Submission (7.2%). We discussed the implications of BL based on data-driven decisions to provide strategic institutional initiatives.
Article
This study aimed to develop a practical model for predicting students at risk of performing poorly in blended learning courses. Previous research suggests that analyzing usage data stored in the log files of modern Learning Management Systems (LMSs) would allow teachers to develop timely, evidence-based interventions to support at risk or struggling students. The analysis of students' tracking data from a Moodle LMS-supported blended learning course was the focus of this research in an effort to identify significant correlations between different online activities and course grade. Out of 29 LMS usage variables, 14 were found to be significant and were input in a stepwise multivariate regression which revealed that only four variables – Reading and posting messages, Content creation contribution, Quiz efforts and Number of files viewed – predicted 52% of the variance in the final student grade.
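A forward-stepwise-style selection of a small set of LMS usage variables followed by an ordinary least squares fit can be sketched with scikit-learn as below; X (a DataFrame of usage variables) and y (final grades) are assumed placeholders, and selecting four features simply mirrors the abstract:

# Sketch of forward feature selection over LMS usage variables, then an OLS fit.
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

selector = SequentialFeatureSelector(LinearRegression(),
                                     n_features_to_select=4, direction="forward")
selector.fit(X, y)
chosen = X.columns[selector.get_support()]
model = LinearRegression().fit(X[chosen], y)
print(chosen.tolist(), model.score(X[chosen], y))   # selected variables and in-sample R^2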
Article
Building a student performance prediction model that is both practical and understandable for users is a challenging task fraught with confounding factors to collect and measure. Most current prediction models are difficult for teachers to interpret. This poses significant problems for model use (e.g. personalizing education and intervention) as well as model evaluation. In this paper, we synthesize learning analytics approaches, educational data mining (EDM) and HCI theory to explore the development of more usable prediction models and prediction model representations using data from a collaborative geometry problem solving environment: Virtual Math Teams with Geogebra (VMTwG). First, based on theory proposed by Hrastinski (2009) establishing online learning as online participation, we operationalized activity theory to holistically quantify students’ participation in the CSCL (Computer-supported Collaborative Learning) course. As a result, 6 variables, Subject, Rules, Tools, Division of Labor, Community, and Object, are constructed. This analysis of variables prior to the application of a model distinguishes our approach from prior approaches (feature selection, Ad-hoc guesswork etc.). The approach described diminishes data dimensionality and systematically contextualizes data in a semantic background. Secondly, an advanced modeling technique, Genetic Programming (GP), underlies the developed prediction model. We demonstrate how connecting the structure of VMTwG trace data to a theoretical framework and processing that data using the GP algorithmic approach outperforms traditional models in prediction rate and interpretability. Theoretical and practical implications are then discussed.
Article
This study extends prior research on approaches to teaching and perceptions of the teaching situation by investigating these elements when e-learning is involved. In this study, approaches to teaching ranged from a focus on the teacher and the taught content to a focus on the student and their learning, resembling those reported in previous investigations. Approaches to e-teaching ranged from a focus on information transmission to a focus on communication and collaboration. An analysis of perceptions of the teaching situation in relation to e-learning identified key themes influencing the adopted approaches: control of teaching, institutional strategy, pedagogical and technological support, time required, teacher skills in using e-learning, and student abilities and willingness to use learning technology. Associations between these elements revealed three groups of teachers: one focusing on the transmission of information both face-to-face and online while holding a generally negative perception of the teaching situation in relation to e-learning; a second focusing on student learning both face-to-face and online while holding a generally positive perception; and a third presenting unexpected patterns of associations. These results may help support different groups of teachers in employing e-learning in their on-campus units of study. At the same time, further research is proposed to inquire into specific approaches in different disciplines and different university contexts.
Article
Technology adoption is usually modeled as a process with dynamic transitions between costs and benefits. Nevertheless, school teachers do not generally make effective use of technology in their teaching. This article describes a study designed to exhibit the interplay between two variables: the type of technology, in terms of its complexity of use, and the type of teacher, in terms of attitude towards innovation. The results from this study include: (a) elaboration of a characteristic teacher technology adoption process, based on an existing learning curve for new technology proposed for software development; and (b) presentation of exit points during the technology adoption process. This paper concludes that teachers who are early technology adopters and commit a significant portion of their time to incorporating educational technology into their teaching are more likely to adopt new technology, regardless of its complexity. However, teachers who are not early technology adopters and commit a small portion of their time to integrating educational technology are less likely to adopt new technology and are prone to abandoning the adoption at identified points in the process.
Article
The relative abilities of two dimensioned statistics, the root-mean-square error (RMSE) and the mean absolute error (MAE), to describe average model-performance error are examined. The RMSE is of special interest because it is widely reported in the climatic and environmental literature; nevertheless, it is an inappropriate and misinterpreted measure of average error. RMSE is inappropriate because it is a function of three characteristics of a set of errors, rather than of one (the average error). RMSE varies with the variability within the distribution of error magnitudes and with the square root of the number of errors (n^(1/2)), as well as with the average-error magnitude (MAE). Our findings indicate that MAE is a more natural measure of average error, and (unlike RMSE) is unambiguous. Dimensioned evaluations and inter-comparisons of average model-performance error should therefore be based on MAE.
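For a set of $n$ model errors $e_i = p_i - o_i$ (predicted minus observed), the two statistics are defined as

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\lvert e_i\rvert, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} e_i^{2}},$$

and they satisfy $\mathrm{MAE} \le \mathrm{RMSE} \le \sqrt{n}\,\mathrm{MAE}$. The upper bound illustrates the point above: RMSE reflects the variability of the error magnitudes and the number of errors, not only their average size.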
Book
Handbook of Educational Data Mining (EDM) provides a thorough overview of the current state of knowledge in this area. The first part of the book includes nine surveys and tutorials on the principal data mining techniques that have been applied in education. The second part presents a set of 25 case studies that give a rich overview of the problems that EDM has addressed. With contributions by well-known researchers from a variety of fields, the book reflects the multidisciplinary nature of the EDM community. It brings the educational and data mining communities together, helping education experts understand what types of questions EDM can address and helping data miners understand what types of questions are important to educational design and educational decision making. Encouraging readers to integrate EDM into their research and practice, this timely handbook offers a broad, accessible treatment of essential EDM techniques and applications. It provides an excellent first step for newcomers to the EDM community and for active researchers to keep abreast of recent developments in the field.
Article
In this paper, a dropout prediction method for e-learning courses, based on three popular machine learning techniques and detailed student data, is proposed. The machine learning techniques used are feed-forward neural networks, support vector machines and probabilistic ensemble simplified fuzzy ARTMAP. Since a single technique may fail to accurately classify some e-learning students, whereas another may succeed, three decision schemes, which combine in different ways the results of the three machine learning techniques, were also tested. The method was examined in terms of overall accuracy, sensitivity and precision and its results were found to be significantly better than those reported in relevant literature.
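A sketch of one simple decision scheme, majority voting over three classifiers, in the spirit of the method above. Fuzzy ARTMAP has no scikit-learn implementation, so a random forest stands in for the third learner; the data file and label column are hypothetical.

    # Combine a neural network, an SVM and a third classifier by majority vote.
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.neural_network import MLPClassifier
    from sklearn.svm import SVC
    from sklearn.ensemble import RandomForestClassifier, VotingClassifier
    from sklearn.metrics import classification_report

    data = pd.read_csv("elearning_students.csv")            # hypothetical per-student features
    X, y = data.drop(columns=["dropped_out"]), data["dropped_out"]
    X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

    ensemble = VotingClassifier(estimators=[
        ("nn",  make_pipeline(StandardScaler(), MLPClassifier(max_iter=1000, random_state=0))),
        ("svm", make_pipeline(StandardScaler(), SVC(random_state=0))),
        ("rf",  RandomForestClassifier(random_state=0)),     # stand-in for fuzzy ARTMAP
    ], voting="hard")                                        # majority vote as one decision scheme
    ensemble.fit(X_train, y_train)

    # Report accuracy, sensitivity (recall) and precision, as in the study above.
    print(classification_report(y_test, ensemble.predict(X_test)))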
Article
This study evaluates student use of an online study environment. Its purposes were to (1) determine if college students will voluntarily use online study tools, (2) identify characteristics of users and nonusers of the tools, and (3) determine if the use of online study tools relates to course achievement. Approximately 25% of students used the online tools for more than one hour before each of three examinations. In comparing use of the study tools provided, the largest number of students made use of the online lecture notes and the greatest amount of online study time was devoted to reviewing multiple choice questions. The perceived ease of access to the Internet differentiated tool users from nonusers. Study tool users scored higher on course examinations after accounting for measures of ability and study skill.
Article
This paper presents a methodological approach based on Bayesian Networks for modelling the behaviour of the students of a bachelor course in computers at an Open University that deploys distance educational methods. It describes the structure of the model, its application to modelling the behaviour of student groups in the Informatics Course of the Hellenic Open University, as well as the advantages of the presented method under conditions of uncertainty. The application of this model produced promising results as regards both prediction of student behaviour, based on modelled past experience, and assessment (i.e., identification of the reasons that led students to a given "current" state). The method presented in this paper offers an effective way to model past experience, which can significantly aid decision-making regarding the educational procedure. It can also be used for assessment purposes regarding a current state, enabling tutors to identify mistakes or bad practices so as to avoid them in the future, as well as to identify successful practices that are worth repeating. The paper concludes that modelling is feasible and that the presented method is useful, especially in cases of large amounts of data from which conclusions are hard to draw without any modelling. It is emphasised that the presented method does not make any predictions and assessments by itself; it is a valuable tool for modelling the educational experience of its user and exploiting past data or data resulting from its use.
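A hedged sketch of a small discrete Bayesian network, using the third-party pgmpy package (class names vary slightly across pgmpy versions); the network structure, variable names and data file are illustrative assumptions, not the model described above.

    # Learn conditional probability tables from past student records and query the network.
    import pandas as pd
    from pgmpy.models import BayesianNetwork
    from pgmpy.estimators import MaximumLikelihoodEstimator
    from pgmpy.inference import VariableElimination

    data = pd.read_csv("student_behaviour.csv")            # discrete records, e.g. 0/1 per variable

    # Assumed structure: assignment and forum activity both influence the exam outcome.
    model = BayesianNetwork([("assignments", "exam"), ("forum", "exam")])
    model.fit(data, estimator=MaximumLikelihoodEstimator)   # model past experience

    infer = VariableElimination(model)
    # Prediction: distribution over exam outcomes given observed behaviour.
    print(infer.query(variables=["exam"], evidence={"assignments": 1, "forum": 0}))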
Article
This chapter introduces a study which focuses on predicting college success as measured by students’ grade point averages (GPAs). The chapter also reviews prior research related to various types of predictors. Specifically, two categories of predictors are identified: ability measures and non-cognitive variables. Finally, an overview of the study is presented.
Conference Paper
We present a machine-learned model that can automatically detect when a student using an intelligent tutoring system is off-task, i.e., engaged in behavior which does not involve the system or a learning task. This model was developed using only log files of system usage (i.e. no screen capture or audio/video data). We show that this model can both accurately identify each student's prevalence of off-task behavior and can distinguish off-task behavior from when the student is talking to the teacher or another student about the subject matter. We use this model in combination with motivational and attitudinal instruments, developing a profile of the attitudes and motivations associated with off-task behavior, and compare this profile to the attitudes and motivations associated with other behaviors in intelligent tutoring systems. We discuss how the model of off-task behavior can be used within interactive learning environments which respond to when students are off-task.
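Purely as an illustration of a log-only signal such a detector might draw on, the sketch below computes each student's share of long pauses between consecutive logged actions; the published detector is a trained model over richer log features, and the threshold, file and column names here are assumptions.

    # Fraction of long pauses per student, derived only from action timestamps.
    import pandas as pd

    logs = pd.read_csv("tutor_logs.csv", parse_dates=["timestamp"])   # one row per logged action
    logs = logs.sort_values(["student_id", "timestamp"])

    # Seconds between consecutive actions by the same student.
    logs["gap"] = logs.groupby("student_id")["timestamp"].diff().dt.total_seconds()

    LONG_PAUSE = 80        # assumed cut-off (seconds) for a pause suggesting off-task behaviour
    off_task_rate = (logs.assign(long_pause=logs["gap"] > LONG_PAUSE)
                         .groupby("student_id")["long_pause"].mean())

    print(off_task_rate.sort_values(ascending=False).head())          # most off-task-prone students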
Article
Data mining and knowledge discovery in databases have been attracting a significant amount of research, industry, and media attention of late. What is all the excitement about? This article provides an overview of this emerging field, clarifying how data mining and knowledge discovery in databases are related both to each other and to related fields, such as machine learning, statistics, and databases. The article mentions particular real-world applications, specific data-mining techniques, challenges involved in real-world applications of knowledge discovery, and current and future research directions in the field.
Article
Earlier studies have suggested that higher education institutions could harness the predictive power of Learning Management System (LMS) data to develop reporting tools that identify at-risk students and allow for more timely pedagogical interventions. This paper confirms and extends this proposition by providing data from an international research project investigating which student online activities accurately predict academic achievement. Analysis of LMS tracking data from a Blackboard Vista-supported course identified 15 variables demonstrating a significant simple correlation with student final grade. Regression modelling generated a best-fit predictive model for this course which incorporates key variables such as total number of discussion messages posted, total number of mail messages sent, and total number of assessments completed and which explains more than 30% of the variation in student final grade. Logistic modelling demonstrated the predictive power of this model, which correctly identified 81% of students who achieved a failing grade. Moreover, network analysis of course discussion forums afforded insight into the development of the student learning community by identifying disconnected students, patterns of student-to-student communication, and instructor positioning within the network. This study affirms that pedagogically meaningful information can be extracted from LMS-generated student tracking data, and discusses how these findings are informing the development of a customizable dashboard-like reporting tool for educators that will extract and visualize real-time data on student engagement and likelihood of success.
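A minimal sketch of the logistic-modelling step described above, classifying students as at risk of a failing grade from a few LMS tracking variables; the pass mark, file name and column names are hypothetical.

    # Logistic regression flagging students likely to receive a failing grade.
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import recall_score

    data = pd.read_csv("lms_tracking.csv")                 # hypothetical per-student export
    predictors = ["discussion_msgs_posted", "mail_msgs_sent", "assessments_completed"]
    X = data[predictors]
    y = (data["final_grade"] < 50).astype(int)             # 1 = failing grade (assumed pass mark)

    X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)
    clf = make_pipeline(StandardScaler(), LogisticRegression()).fit(X_train, y_train)

    # Recall on the failing class is the share of at-risk students correctly identified
    # (81% in the course analysed above).
    print("Recall on failing students:", round(recall_score(y_test, clf.predict(X_test)), 2))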
Book
During the past decade there has been an explosion in computation and information technology. With it have come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics, and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics. It is a valuable resource for statisticians and anyone interested in data mining in science or industry. The book's coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees and boosting, the first comprehensive treatment of this topic in any book.