Article

Abstract

An increasing number of higher education institutions have deployed learning management systems (LMSs) to support learning and teaching processes. Accordingly, data-driven research has been conducted to understand the impact of student participation within these systems on student outcomes. However, most research has focused on small samples or has used variables that are expensive to measure, which limits its generalizability. This article presents a prediction model, based on low-cost variables and a sophisticated algorithm, to predict early which students attending large classes (with more than 50 enrollments) are at risk of failing a course. It thereby enables instructors and educational managers to carry out early interventions to prevent course failure. The results outperform other approaches in terms of accuracy, cost, and generalization. Moreover, LMS usage information improved the model by up to 12.28% in terms of root-mean-square error, enabling better early identification of at-risk students.
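The headline comparison can be sketched in a few lines; the following is a minimal illustration (not the authors' code), using scikit-learn on synthetic data with hypothetical feature names, of how a random forest grade predictor trained on low-cost administrative variables can be compared, by RMSE, against one that also sees LMS usage counts:

```python
# Sketch: compare RMSE of a grade predictor with and without LMS usage features.
# All data are synthetic and the feature names are hypothetical.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n = 500                                          # stand-in for real enrollment records
admin = rng.normal(size=(n, 3))                  # e.g., prior GPA, credits, age
lms = rng.poisson(5, size=(n, 2)).astype(float)  # e.g., weekly logins, resource views
grade = admin @ np.array([1.5, 0.5, -0.3]) + 0.4 * lms.sum(axis=1) + rng.normal(size=n)

def rmse_for(X):
    X_tr, X_te, y_tr, y_te = train_test_split(X, grade, random_state=0)
    model = RandomForestRegressor(random_state=0).fit(X_tr, y_tr)
    return mean_squared_error(y_te, model.predict(X_te)) ** 0.5

rmse_admin = rmse_for(admin)                    # low-cost variables only
rmse_full = rmse_for(np.hstack([admin, lms]))   # plus LMS usage information
print(f"RMSE improvement from LMS data: {100 * (rmse_admin - rmse_full) / rmse_admin:.2f}%")
```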


... Some studies use data on student demographics and socio-economic factors along with internal assessment as well (Kotsiantis and ...). In addition to predictors, researchers have explored various methods for predicting student performance. Several methods (Hämäläinen and Vinni 2006; Hong Yu et al. 2018), including support vector machine (Santana et al. 2017), random forest (Ahmed and Sadiq 2018; Chen et al. 2018; Hasan et al. 2018; Sandoval et al. 2018), deep learning (Kim et al. 2018), and linear regression (Yang et al. 2018), are evident in the literature. Apart from the application of traditional data mining approaches, researchers have also proposed specialised algorithms to predict student performance (Hasheminejad and Sarvmili 2018; Márquez-Vera et al. 2013; Meier et al. 2016; Uddin and Lee 2017; Xu et al. 2017; Zollanvari et al. 2017). ...
... As mentioned earlier, some researchers have even tried to predict actual marks (Huang and Fang 2013; Lu et al. 2018; Romero et al. 2008; Sandoval et al. 2018; Yang et al. 2018). The existing studies have used both regression and classification techniques for this purpose. ...
... For example, Hong et al. (2017), Lu et al. (2018), and Yang et al. (2018) have used video viewing behaviour for estimating student performance. Student behaviour in an online forum (Mueen et al. 2016; Ornelas and Ordonez 2017; Romero et al. 2008; Widyahastuti and Tjhin 2017; Yoo and Kim 2014; Yu et al. 2018), the learning management system (Conijn et al. 2017; Kim et al. 2018; Ostrow et al. 2015; Sandoval et al. 2018; Xing et al. 2015), their movement pattern, and activity during web browsing (Chaturvedi and Ezeife 2017) also help in predicting student performance. Thai-Nghe et al. (2009) and Zollanvari et al. (2017) have used teaching quality and psychological factors of students for classifying student performance. ...
Article
Full-text available
Student performance modelling is one of the challenging and popular research topics in educational data mining (EDM). Multiple factors influence performance in non-linear ways, making this field more attractive to researchers. The widespread availability of educational datasets further catalyses this interest, especially in online learning. Although several EDM surveys are available in the literature, we could find only a few specific surveys on student performance analysis and prediction. These specific surveys are limited in nature and primarily focus on studies that try to identify possible predictors or to model student performance. However, the previous works do not address the temporal aspect of prediction. Moreover, we could not find any such specific survey which focuses only on classroom-based education. In this paper, we present a systematic review of EDM studies on student performance in classroom learning. It focuses on identifying the predictors, the methods used for such identification, and the time and aim of prediction. It is notably the first systematic survey of EDM studies that considers only classroom learning and focuses on the temporal aspect as well. This paper presents a review of 140 studies in this area. The meta-analysis indicates that researchers achieve significant prediction efficiency during the tenure of the course. However, performance prediction before course commencement needs special attention.
... In all cases, an early forecast is desired to enable proactive teaching actions aimed at providing students with sufficient support to improve their performance and avoid their attrition [6], [7]. To this respect, some recent experiments have proven that tools such as intelligent tutoring systems, early warning systems (EWSs), and recommender systems can be very useful in higher education [8]. ...
... It should also be noted that most predictive student-related attributes in each CP were manually derived, because previous works have suggested that student's performance predictions can be maximized when domain knowledge is used as support to select the best performing set of input data [27]. Moreover, a broad variety of previous works have also made use of this approach [8], [15], [29], [51]. Nonetheless, automatic selection of features in each CP was also explored. ...
Article
Early warning systems (EWSs) have proven to be useful in identifying students at risk of failing both online and conventional courses. Although some general systems have reported acceptable ability to work in modules with different characteristics, those designed from a course-specific perspective have recently provided better outcomes. Hence, the main goal of this work is to design a tailored EWS for a conventional course in power electronic circuits. For that purpose, the effectiveness of some common classifiers in predicting at-risk students has been analyzed. Although only slight differences in their performance have been noticed, an ensemble classifier combining outputs from several of them has proved to be the best performer. As a major contribution, a novel weighted voting combination strategy has been proposed to exploit global information about how basic prediction algorithms perform at several time points during the semester and on diverse subsets of student-related features. Predictions at five critical points have been analyzed, revealing that the end of the fourth week is the optimal time to identify students at risk of failing the course. At that moment, accuracies of about 85-90% have been reached. Moreover, several scenarios with different subsets of student-related attributes have been considered at every time point. Besides common parameters from students' background and continuous assessment, novel features estimating students' performance progression on weekly assignments have been introduced. The proposal of this set of new input variables is another key contribution, because they have improved predictions of at-risk students by more than 5% at every time point.
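A compact way to see the flavor of such a weighted vote (the paper's actual strategy also conditions weights on time point and feature subset, which this sketch omits) is to weight each base classifier's vote by its validation accuracy; scikit-learn and synthetic data are assumed:

```python
# Sketch: accuracy-weighted voting over heterogeneous base classifiers.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

bases = [LogisticRegression(max_iter=1000), KNeighborsClassifier(),
         DecisionTreeClassifier(random_state=0)]
weights = []
for clf in bases:
    clf.fit(X_tr, y_tr)
    weights.append(clf.score(X_val, y_val))  # validation accuracy becomes the vote weight

def weighted_vote(X_new):
    votes = np.array([clf.predict(X_new) for clf in bases], dtype=float)
    score = np.average(votes, axis=0, weights=weights)  # weighted fraction voting "at risk"
    return (score >= 0.5).astype(int)

print(weighted_vote(X_val[:5]), y_val[:5])
```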
... In a university context, a model to early predict students who are at risk of failing was presented by Sandoval et al. [31]. The data come from the university's LMS (activity logs for each user) and from the administrative information system called DARA (past and current academic status and demographic data). ...
... Considering that our interest is to predict at-risk students early, we measured the performance of the models until the middle of the semester (8 weeks). The models achieved performances that can be considered satisfactory (with AUC ROC values of 90% already in the first week), similar to the results found in the literature, for example, Detoni et al. [25], Howard et al. [46], Sandoval et al. [31], and Lu et al. [47]. These results were obtained with pre-processing of the datasets using SMOTE to balance the classes. ...
Article
Full-text available
Algorithms and programming are among the most challenging topics faced by students during undergraduate programs. Dropout and failure rates in courses involving such topics are usually high, which has raised attention towards the development of strategies to attenuate this situation. Machine learning techniques can help in this direction by providing models able to detect at-risk students earlier, so that lecturers, tutors, or staff can pedagogically try to mitigate this problem. To predict at-risk students early in introductory programming courses, we present a comparative study aiming to find the best combination of datasets (sets of variables) and classification algorithms. The data collected from Moodle were used to generate 13 distinct datasets based on different aspects of student interactions (cognitive presence, social presence, and teaching presence) inside the virtual environment. Results show there is no statistically significant difference among models generated from the different datasets and that the counts of interactions, together with derived attributes, are sufficient for the task. The performance of the models varied for each semester, with the best of them able to detect at-risk students in the first week of the course with AUC ROC from 0.7 to 0.9. Moreover, the use of SMOTE to balance the datasets did not improve the performance of the models.
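The balancing step mentioned in both this abstract and the excerpt above is typically done as below; a minimal sketch assuming the imbalanced-learn package and synthetic data, with SMOTE applied only to the training split so the test AUC stays honest:

```python
# Sketch: SMOTE oversampling of the minority (at-risk) class, scored by AUC ROC.
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, weights=[0.85], random_state=0)  # ~15% at-risk
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_tr, y_tr)  # balance training data only
clf = RandomForestClassifier(random_state=0).fit(X_bal, y_bal)
print("AUC ROC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```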
... These researchers argue that some demographic factors can affect the academic performance of students at different study levels (Ali et al., 2013; Shum & Crick, 2012; Tempelaar et al., 2015). Other demographic characteristics that are used in the literature are family income, socio-economic status, race, and ethnicity (Aguiar et al., 2014; Costa et al., 2017; Miguéis et al., 2018; Sandoval et al., 2018; Wolff et al., 2013). ...
... Other metrics include Mean Absolute Error (MAE) and Root Mean Square Error (RMSE; Chai & Draxler, 2014; Howard et al., 2018; Lykourentzou et al., 2009; Kotsiantis, 2012). The average accuracy of a model, for both classification and regression problems, is measured using Average Prediction Accuracy (PAP) and Average Accurate Prediction (APA; Huang & Fang, 2013; Sandoval et al., 2018). Details of the evaluation measures used in the literature are shown in Table 3. ...
Article
Predictive models on students’ academic performance can be built by using historical data for modelling students’ learning behaviour. Such models can be employed in educational settings to determine how new students will perform and in predicting whether these students should be classed as at-risk of failing a course. Stakeholders can use predictive models to detect learning difficulties faced by students and thereby plan effective interventions to support students. In this paper, we present a systematic literature review on how predictive analytics have been applied in the higher education domain. Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines, we conducted a literature search from 2008 to 2018 and explored current trends in building data-driven predictive models to gauge students’ performance. Machine learning techniques and strategies used to build predictive models in prior studies are discussed. Furthermore, limitations encountered in interpreting data are stated and future research directions proposed.
... The reviewed studies address a wide variety of issues. A common theme is identifying struggling students (Gray et al., 2016; Jayaprakash et al., 2014; Lopez Guarin et al., 2015; Sandoval et al., 2018). As shown in Table 6, different data analysis methods were used for the studies reported in the publications. ...
... Lopez Guarin et al. (2015) detail a data integration technique of joining database tables, which implies the use of a relational database for storage, with the relational model as the underlying format. Sandoval et al. (2018) also integrate data into a relational database. Di Mitri et al. (2017) and Mangaroska et al. (2019) format data using the xAPI specification. ...
Article
Full-text available
Learning analytics (LA) promises understanding and optimization of learning and learning environments. To enable richer insights regarding questions related to learning and education, LA solutions should be able to integrate data coming from many different data sources, which may be stored in different formats and have varying levels of structure. Data integration also plays a role for the scalability of LA, an important challenge in itself. The objective of this review is to assess the current state of LA in terms of data integration in the context of higher education. The initial search of six academic databases and common venues for publishing LA research resulted in 115 publications, out of which 20 were included in the final analysis. The results show that a few data sources (e.g., LMS) appear repeatedly in the research studies; the number of data sources used in LA studies in higher education tends to be limited; when data are integrated, similar data formats are often combined (a low-hanging fruit in terms of technical challenges); the research literature tends to lack details about data integration in the implemented systems; and, despite being a good starting point for data integration, educational data specifications (e.g., xAPI) seem to seldom be used. In addition, the results indicate a lack of stakeholder (e.g., teachers/instructors, technology vendors) involvement in the research studies. The review concludes by offering recommendations to address limitations and gaps in the research reported in the literature.
... Researchers focus on LMS-recorded behavioural data for prediction, based on the assumption that records in the LMS can represent certain behaviours or traits of the user. These behaviours or traits are associated with their academic performance (Conijn et al., 2016; Dominguez et al., 2016; Shruthi and Chaitra, 2016; Adejo and Connolly, 2018; Helal et al., 2018; Sandoval et al., 2018; Akçapınar et al., 2019; Liao et al., 2019; Sukhbaatar et al., 2019; Mubarak et al., 2020b; Waheed et al., 2020). Different studies are concerned with different issues. ...
... The experimental results verify the effectiveness of its algorithm. From a methodological point of view, many studies belong to the hybrid-model type of research (Sandoval et al., 2018; Yu et al., 2018a; Zhou et al., 2018; Akçapınar et al., 2019; Baneres et al., 2019; Hassan et al., 2019; Hung et al., 2019; Polyzou and Karypis, 2019). The underlying logic of this type of research is that the algorithms differ in their optimization search logic, and the most suitable algorithm for course failure prediction is found by comparison. ...
Article
Full-text available
Anomalies in education affect the personal careers of students and universities' retention rates. Understanding the laws behind educational anomalies promotes the development of individual students and improves the overall quality of education. However, the inaccessibility of educational data hinders the development of the field. Previous research in this field used questionnaires, which are time- and cost-consuming and hardly applicable to large-scale student cohorts. With the popularity of educational management systems and the rise of online education during the prevalence of COVID-19, a large amount of educational data is available online and offline, providing an unprecedented opportunity to explore educational anomalies from a data-driven perspective. As an emerging field, educational anomaly analytics rapidly attracts scholars from a variety of fields, including education, psychology, sociology, and computer science. This paper intends to provide a comprehensive review of data-driven analytics of educational anomalies from a methodological standpoint. We focus on the following five types of research that received the most attention: course failure prediction, dropout prediction, mental health problems detection, prediction of difficulty in graduation, and prediction of difficulty in employment. Then, we discuss the challenges of current related research. This study aims to provide references for educational policymaking while promoting the development of educational anomaly analytics as a growing field.
... Hamsa et al. [10] utilized internal and external assessments. Sandoval et al. [11] used CGPA, external assessments, student demographics, and internet and social network interaction, while Cortez and Silva [12] utilized internal assessments, internet activity, extracurricular activities, and student demographics. ...
... Fuzzy-based methods such as Neuro-Fuzzy (ANFIS) [13] and Fuzzy Association Rule Mining [14] have also been employed to predict students' performance. Random Forest (RF), employed by [11], [12], had promising results on predicting students' performance. Decision Tree (DT) based methods had promising results in [5], [8]-[10], [12]. ...
Article
Full-text available
Students' performance is the most important value for the competitiveness of educational institutes. In order to improve it, they need to predict students' performance so that they can give special treatment to students predicted to be low performers. In this paper, we propose 3 boosting algorithms (C5.0, adaBoost.M1, and adaBoost.SAMME) to build classifiers for predicting students' performance. This research used the UCI student performance datasets. There are 3 evaluation scenarios. The first scenario employed 10-fold cross-validation to compare the performance of the boosting algorithms; its results showed that adaBoost.SAMME and adaBoost.M1 outperform the baseline method in binary classification. The second scenario evaluated the boosting algorithms under different amounts of training data; here, adaBoost.M1 outperformed the other boosting algorithms and the baseline method on binary classification. In the third scenario, we built models from one subject's dataset and tested them using another subject's dataset; the results indicate that it is possible to build a prediction model using one subject to predict another subject.
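The first evaluation scenario can be approximated as follows; a sketch on synthetic data (scikit-learn ships no C5.0, so a plain decision tree stands in as the baseline, and scikit-learn's AdaBoost plays the role of the boosting methods):

```python
# Sketch: 10-fold cross-validation comparing boosting against a tree baseline.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_informative=8, random_state=0)
models = {
    "baseline tree": DecisionTreeClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(n_estimators=100, random_state=0),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=10)  # 10-fold CV, as in scenario one
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```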
... Less than 83% of the activity in this dataset was passive. Sandoval et al. (2018) found over 95% of the activity in their dataset was passive. In this study, although there was little activity indicating interaction between students, there was proportionally more activity indicating interaction with the system (e.g. ...
Article
Full-text available
Increasingly, educational providers are being challenged to use their data stores to improve teaching and learning outcomes for their students. A common source of such data is learning management systems, which enable providers to manage a virtual platform or space where learning materials and activities can be provided for students to engage with. This study investigated whether data from the learning management system Moodle can be used to predict academic performance of students in a blended learning further education setting. This was achieved by constructing measures of student activity from Moodle logs of further education courses. These were used to predict alphabetic student grade and whether a student would pass or fail the course. A key focus was classifiers that could predict likelihood of failure from data available early in the term. The results showed that classifiers built on all course data predicted student grade moderately well (accuracy = 60.5%, kappa = 0.43) and whether a student would pass or fail very well (accuracy = 92.2%, kappa = 0.79). However, classifiers built on the first six weeks of data did not predict failing students well. Classifiers trained on the first ten weeks of data improved significantly on a no-information rate (p < 0.008), though more than half of failing students were still misclassified. The evidence indicates that measures of Moodle activity on further education courses could be useful as part of an early-warning system at ten weeks.
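Reproducing the study's two reported metrics is straightforward; a sketch with synthetic pass/fail data showing accuracy alongside Cohen's kappa, which corrects for chance agreement on imbalanced labels:

```python
# Sketch: evaluate a pass/fail classifier with accuracy and Cohen's kappa.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, cohen_kappa_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, weights=[0.8], random_state=1)  # ~20% fail
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)
pred = RandomForestClassifier(random_state=1).fit(X_tr, y_tr).predict(X_te)
print(f"accuracy = {accuracy_score(y_te, pred):.3f}, kappa = {cohen_kappa_score(y_te, pred):.3f}")
```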
... Regression is a useful statistical method when data are not very complex, and the number of observations is not large. Linear regression has been widely used in predicting student performance (e.g., 27). In this article, multiple linear regression analyses are employed to predict student course performance. ...
Article
Over the past decade, the field of education has seen stark changes in the way that data are collected and leveraged to support high-stakes decision-making. Utilizing big data as a meaningful lens to inform teaching and learning can increase academic success. Data-driven research has been conducted to understand student learning performance, such as predicting at-risk students at an early stage and recommending tailored interventions to support services. However, few studies in veterinary education have adopted Learning Analytics. This article examines the adoption of Learning Analytics by using the retrospective data from the first-year professional Doctor of Veterinary Medicine program. The article gives detailed examples of predicting six courses from week 0 (i.e., before the classes started) to week 14 in the semester of Spring 2018. The weekly models for each course showed the change of prediction results as well as the comparison between the prediction results and students' actual performance. From the prediction models, at-risk students were successfully identified at the early stage, which would help inform instructors to pay more attention to them at this point.
... In the LR model, the two-dimensional data are represented as dots falling along a straight line, where the X-axis is the predictor and the Y-axis is the target [39]. The performance of the regression model is evaluated based on four of the most popular metrics: Mean Square Error (MSE), Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and the coefficient of determination (R-squared) [40]. The MSE, RMSE, MAE, and R-squared are presented below, from Equation (5) to Equation (8). ...
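The four equations did not survive into this excerpt; their standard definitions, for actual values $y_i$, predictions $\hat{y}_i$, mean $\bar{y}$, and $n$ observations, are:

```latex
\begin{align*}
\mathrm{MSE}  &= \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 \tag{5} \\
\mathrm{RMSE} &= \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \tag{6} \\
\mathrm{MAE}  &= \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right| \tag{7} \\
R^2           &= 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2} \tag{8}
\end{align*}
```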
Article
Full-text available
Educational Data Mining (EDM) helps to recognize the performance of students and predict their academic achievements, including aspects of success and failure, negative aspects, and challenges. In educational systems, a massive amount of student data has been collected, which has made it difficult for officials to search through it and obtain, by traditional methods, the knowledge required to discover the challenges facing students and universities. Therefore, the rooted problem is how to dive into these data and discover the real challenges facing both students and universities. The main aim of this research is to extract hidden, significant patterns and new insights from students' historical data, which can solve current problems, help to enhance the educational process, and improve academic performance. The data mining tools used for this task are classification, regression, and association rules for frequent pattern generation. The research datasets were gathered from the College of Business and Economics (CBE). The findings of this research can help in making appropriate decisions for certain circumstances and provide better suggestions for overcoming students' weaknesses and failures. Through the findings, numerous problems related to students' performance were discovered at different levels and in various courses. The research findings indicated that there are many important problems; consequently, suitable solutions are suggested, which can be presented to the relevant authorities for the benefit of improving student performance and activating academic advising.
... Lara et al. (2014) predict student performance to provide teachers with inferences from past student performance. Sandoval et al. (2018) present a prediction model to predict early which students attending large classes (with more than 50 enrollments) are at risk of failing a course. ...
Article
Full-text available
Data mining is one of the important and beneficial technological developments in education, and its usage area is becoming more widespread day by day as it includes applications that contribute positively to teaching activities. By making raw data in the field of education meaningful using data mining techniques, teaching activities can be made more effective and efficient. Studies carried out in the field of education between 2014 and 2020 with data mining methods were retrieved from the "Science Direct" database. As a result of this scan, 60 papers were found to be directly related to data mining in education. The studies include issues such as the development of e-learning systems, pedagogical support, clustering of educational data, and student performance prediction. These selected articles were analyzed in terms of purpose, application area, method, and contribution to the literature. This study aims to group the studies conducted in the field of education using data mining methods under certain headings, evaluate their methods and goals, and present the needs of this field to researchers who will work in it.
... The instructors can use and apply educational interventions to reduce the failure rate (Sandoval et al., 2018). Students' academic records are stored in the offices of the engineering faculties, and these records include the performance of students in different subjects as well as information regarding the student's origin, age, and previous studies. All this information should be enough to help categorize the class of students we are dealing with. ...
Article
Full-text available
This paper aims to analyze and evaluate student performance in the Department of Computer Science, Jigawa State Polytechnic. The data were collected for two (2) years of intake, from July 2016 to June 2018, and contain students' previous academic records, such as course code, course name, and marks obtained by each student, analyzed by applying classification algorithms in the RapidMiner tool. Data mining provides good and powerful methods for education and other fields of study, given the vast amount of student data from which valuable information can be extracted to determine student success. In this paper a classification task was used for the prediction, and a decision tree model was applied during the experiment. The results indicate that it is possible to predict graduation performance; in addition, a procedure for evaluating the performance of each course was identified.
... All established models were evaluated for their ability to make accurate predictions on the validation data set, and a comparison of the prediction accuracy of the various models was made. Since our outcome variable is measured as a continuous variable, prediction accuracy and performance of the predictive models were assessed with several continuous error metrics, namely root mean square error (RMSE), mean absolute error (MAE), and R² (Sandoval et al., 2018). The RMSE refers to the square root of the average squared difference between the predicted and actual values. ...
Preprint
Full-text available
Predicting students' academic performance has long been an important area of research in education. Most existing literature has made use of traditional statistical methods that run into the problems of overfitted models, inability to effectively handle large numbers of participants and predictors, and inability to pick out non-linearities that may be present. Regression-based ML methods, which can produce highly interpretable yet accurate models for new predictions, are able to provide some solutions to the aforementioned problems. The present study is the first to develop and compare traditional MLR methods and regression-based ML methods (i.e., ridge regression, LASSO regression, elastic net, and regression trees) to predict students' science performance in PISA 2015. A total of 198,712 students from 60 countries, and 66 student- and school-related predictors, were used to develop the predictive models. The predictive accuracy of the various models built was not that different; however, there were significant differences in the predictors identified as most important by the different methods. Although regression-based ML techniques did not outperform traditional MLR, significant advantages of using ML methods were noted and discussed. Moving forward, we strongly believe that there is merit in using such regression-based ML methods in educational research. Educational research can benefit from adopting ML practices and methods to produce models that can not only be used for explaining factors that influence academic performance prediction, but also for making more accurate predictions on unseen data.
... Because of the efforts of higher education institutions to carry out the digitization of their students' data, access to them has been facilitated, generating new opportunities for their analysis (Sandoval et al., 2018). In this sense, data mining becomes important and emerges as an interesting tool to answer complex questions in education such as learning, the prediction of academic performance and SA (Mduma et al., 2019). ...
Article
Purpose: The prediction of student attrition is critical to facilitate retention mechanisms. This study aims to focus on implementing a method to predict student attrition in the upper years of a physiotherapy program. Design/methodology/approach: Machine learning is a computer tool that can recognize patterns and generate predictive models. Using a quantitative research methodology, a database of 336 university students in their upper-year courses was accessed. The participants' data were collected from the Financial Academic Management and Administration System and a platform of Universidad Autónoma de Chile. Five quantitative and 11 qualitative variables were chosen, associated with university student attrition. With this database, 23 classifiers were tested based on supervised machine learning. Findings: About 23.58% of males and 17.39% of females were in the attrition student group. The mean accuracy of the classifiers increased with the number of variables used for training. The best accuracy level was obtained using the "Subspace KNN" algorithm (86.3%). The classifier "RUSBoosted trees" yielded the lowest number of false negatives and the highest sensitivity of the algorithms used (78%), as well as a specificity of 86%. Practical implications: This predictive method identifies attrition students in the university program and could be used to improve student retention in higher grades. Originality/value: The study has developed a novel predictive model of student attrition from upper-year courses, useful for unbalanced databases with a lower number of attrition students.
... He collected information from 21,314 undergraduate students over three semesters, from the second semester of 2013 to the second semester of 2014. After comparing three models, Random Forest (RF), Linear Regression (LR), and Robust Linear Regression (RLR), it was found that Random Forest gives the best performance [3]. ...
... Most of these studies either ignore that students have no control over such factors, and that informing them about such predictors may have destructive effects on them, or do not consider that such indicators might be unavailable for multiple reasons (e.g., data privacy) [17,18,32]. More research should instead concentrate on using data related to students' online learning behavior, which are logically the best predictors of their performance in courses. ...
Article
Full-text available
While modelling students’ learning behavior or preferences has been found as a crucial indicator for their course achievement, very few studies have considered it in predicting achievement of students in online courses. This study aims to model students’ online learning behavior and accordingly predict their course achievement. First, feature vectors are developed using their aggregated action logs during a course. Second, some of these feature vectors are quantified into three numeric values that are used to model students’ learning behavior, namely, accessing learning resources (content access), engaging with peers (engagement), and taking assessment tests (assessment). Both students’ feature vectors and behavior model constitute a comprehensive students’ learning behavioral pattern which is later used for prediction of their course achievement. Lastly, using a multiple criteria decision-making method (i.e., TOPSIS), the best classification methods were identified for courses with different sizes. Our findings revealed that the proposed generalizable approach could successfully predict students’ achievement in courses with different numbers of students and features, showing the stability of the approach. Decision Tree and AdaBoost classification methods appeared to outperform other existing methods on different datasets. Moreover, our results provide evidence that it is feasible to predict students’ course achievement with a high accuracy through modelling their learning behavior during online courses.
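TOPSIS itself is short enough to sketch generically (this is not the paper's code); it ranks alternatives, here classifiers scored on several criteria, by closeness to an ideal solution:

```python
# Sketch: generic TOPSIS for ranking classifiers on multiple criteria at once.
import numpy as np

def topsis(matrix, weights, benefit):
    """matrix: alternatives x criteria; benefit[j] is True if higher is better."""
    M = np.asarray(matrix, dtype=float)
    M = M / np.linalg.norm(M, axis=0)         # vector-normalize each criterion
    M = M * np.asarray(weights, dtype=float)  # apply criterion weights
    ideal = np.where(benefit, M.max(axis=0), M.min(axis=0))
    anti = np.where(benefit, M.min(axis=0), M.max(axis=0))
    d_pos = np.linalg.norm(M - ideal, axis=1)
    d_neg = np.linalg.norm(M - anti, axis=1)
    return d_neg / (d_pos + d_neg)            # closeness score: higher is better

# Hypothetical rows: Decision Tree, AdaBoost, SVM; columns: accuracy, F1, training time.
scores = topsis([[0.84, 0.82, 3.0], [0.87, 0.86, 9.0], [0.85, 0.83, 20.0]],
                weights=[0.4, 0.4, 0.2], benefit=[True, True, False])
print(scores.argmax(), scores)  # index of the best-ranked classifier
```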
... As is apparent, surprisingly, most of the studies focus on students' past performance or non-academic data (e.g., gender, race, and socioeconomic status) in their predictive models, largely neglecting data logged from students' activity (e.g., Sandoval et al. 2018). Such predictive models simply ignore the fact that many of these variables fall outside the control of students and teachers alike. ...
Article
Full-text available
A significant amount of educational data mining (EDM) research consider students’ past performance or non-academic factors to build predictive models, paying less attention to students’ activity data. While procrastination has been found as a crucial indicator which negatively affects performance of students, no research has investigated this underlying factor in predicting achievement of students in online courses. In this study, we aim to predict students’ course achievement in Moodle through their procrastination behaviour using their homework submission data. We first build feature vectors of students’ procrastination tendencies by considering their active, inactive, and spare time for homework, along with homework grades. Accordingly, we then use clustering and classification methods to optimally sort and put students into various categories of course achievement. We use a Moodle course from the University of Tartu in Estonia which includes 242 students to assess the efficacy of our proposed approach. Our findings show that our approach successfully predicts course achievement for students through their procrastination behaviour with precision and accuracy of 87% and 84% with L-SVM outperforming other classification methods. Furthermore, we found that students who procrastinate more are less successful and are potentially going to do poorly in a course, leading to lower achievement in courses. Finally, our results show that it is viable to use a less complex approach that is easy to implement, interpret, and use by practitioners to predict students’ course achievement with a high accuracy, and possibly take remedial actions in the semester.
... Learning in the conditions of an informational and educational environment helps to develop analytical thinking and an understanding of underlying issues [16]. An increasing number of higher education institutions have deployed learning management systems (LMS) to support learning and teaching processes [17][18][19]. But implementing a learning management system in higher education institutions requires a range of special online tools [20]. ...
... Research work and projects have been deployed with Learning Analytics tools that address several learning issues across different platforms. Sandoval et al. (2018) and Gray et al. (2016) [6][7] use learning analytics to understand the impact of student participation in learning systems on outcomes. They predict which students may fail a course or have difficulty in the academic course [8]. ...
... To predict students' performance, they have used Convolutional Neural Networks (CNN) and Support Vector Machines (SVM) [1]. Sandoval et al. used LR, RLR, and Random Forest to predict students' performance and concluded that RF performed better than LR and RLR [2]. Okubo et al. used Recurrent Neural Networks (RNN) for predicting final grades of students. ...
... Prior research has pinpointed pre-service teachers' difficulties in implementing adaptive teaching in their lessons, possibly due in part to pre-service teachers' difficulties in noticing students' behaviors that manifest students' difficulties in understanding and learning. As described in the current literature (e.g., Blomberg et al. 2014; Sandoval et al. 2018a, 2018b; Stürmer et al. 2013a, 2013b), teachers need to notice and interpret student behavior as part of their everyday classroom work. Considering that few teacher education programs to date have included explicit instruction in how to engage in analysis of student behaviors to usefully promote adaptive teaching, the current study findings suggest a means to help pre-service teachers develop their facility for adaptive teaching practice, by incorporating noticing of meaningful student behaviors at an early stage in their teacher education programs. ...
Article
Full-text available
Teachers need to notice and interpret student behavior as part of their everyday classroom work. Current teacher education programs often do not explicitly focus on helping pre-service teachers learn to analyze and interpret student behavior and understand how it may influence teachers’ teaching behaviors, which in turn may affect students’ thinking and achievements. Using a quasi-experimental design, the current study examined a systematic reflective approach promoting dual learning from both teacher and student perspectives in authentic videotaped classrooms. More specifically, the study examined how this dual reflective “professional vision” framework influenced pre-service teachers’ actual ability to explicitly teach meta-strategic knowledge (MSK) to students. Results indicated that pre-service teachers whose video-analysis reflected on both teachers’ and students’ behaviors demonstrated greater improvement in their MSK-teaching, and their students showed better MSK achievements, compared to pre-service teachers whose video-analysis reflected only on teachers’ behaviors. The current study suggests the need to integrate systematic dual reflective professional vision approaches – that analyze not only teachers’ but also students’ behaviors – into teacher preparation programs as a means for developing pre-service teachers’ capacity to promote students’ MSK.
Chapter
Creating learning environments where students, parents, and teachers are linked to a learning process helps study their overall impact on the students' performance. Data mining can analyze these inter-relationships and thus enable the prediction of academic performance to improve the student's academic level. The main factors that affect the student's performance were selected using feature selection methods, and an analysis of the crucial features was conducted to better understand the data. One of the main outcomes found is the impact of the behavioral features on the students' academic performance. Moreover, the gender and relation demographic features are other important features found. It was evident that there is an academic disparity between genders, as females constitute the most outstanding students. Furthermore, mothers have a clear role in student academic excellence. Six machine learning methods were used and tested to predict the student's performance, namely random forest, logistic regression, XGBoost, MLP, and ensemble learning using bagging and voting. Of all the methods, the random forest achieved the highest accuracy, reaching 77% with the 10 best selected features. Overfitting was addressed successfully by tuning the hyper-parameters. The results show that data mining can accurately predict the students' performance level, as well as highlight the most influential features.
Chapter
This chapter focuses on the key practical aspects to be considered when facing the task of developing predictive models for student learning outcomes. It is based on the authors' experience building and delivering dropout prediction models within higher education contexts. The chapter presents the information used to generate the predictive models, how this information is treated, how the models are fed, which types of algorithms have been used, and why and how the obtained results have been evaluated. It recommends best practices for building, training, and evaluating predictive models. It is hoped that readers will find these recommendations useful for the design, development, deployment, and use of early warning systems.
Article
The aim of this paper is to survey recent research publications that use Soft Computing methods to answer education-related problems based on the analysis of educational data ‘mined’ mainly from interactive/e-learning systems. Such systems are known to generate and store large volumes of data that can be exploited to assess the learner, the system and the quality of the interaction between them. Educational Data Mining (EDM) and Learning Analytics (LA) are two distinct and yet closely related research areas that focus on this data aiming to address open education-related questions or issues. Besides ‘classic’ data analysis methods such as clustering, classification, identification or regression/analysis of variances, soft computing methods are often employed by EDM and LA researchers to achieve their various tasks. Their very nature as iterative optimization algorithms that avoid the exhaustive search of the solutions space and go for possibly suboptimal solutions yet at realistic time and effort, along with their heavy reliance on rich data sets for training, make soft computing methods ideal tools for the EDM or LA type of problems. Decision trees, random forests, artificial neural networks, fuzzy logic, support vector machines and genetic/evolutionary algorithms are a few examples of soft computing approaches that, given enough data, can successfully deal with uncertainty, qualitatively stated problems and incomplete, imprecise or even contradictory data sets – features that the field of education shares with all humanities/social sciences fields. The present review focuses, therefore, on recent EDM and LA research that employs at least one soft computing method, and aims to identify (i) the major education problems/issues addressed and, consequently, research goals/objectives set, (ii) the learning contexts/settings within which relevant research and educational interventions take place, (iii) the relation between classic and soft computing methods employed to solve specific problems/issues, and (iv) the means of dissemination (publication journals) of the relevant research results. Selection and analysis of a body of 300 journal publications reveals that top research questions in education today seeking answers through soft computing methods refer directly to the issue of quality – a critical issue given the currently dominant educational/pedagogical models that favor e-learning or computer- or technology-mediated learning contexts. Moreover, results identify the most frequently used methods and tools within EDM/LA research and, comparatively, within their soft computing subsets, along with the major journals relevant research is being published worldwide. Weaknesses and issues that need further attention in order to fully exploit the benefits of research results to improve both the learning experience and the learning outcomes are discussed in the conclusions.
Article
Prediction models that underlie “early warning systems” need improvement. Some predict outcomes using entrenched, unchangeable characteristics (e.g., socioeconomic status) and others rely on performance on early assignments to predict the final grades to which they contribute. Behavioral predictors of learning outcomes often accrue slowly, to the point that time needed to produce accurate predictions leaves little time for intervention. We aimed to improve on these methods by testing whether we could predict performance in a large lecture course using only students’ digital behaviors in weeks prior to the first exam. Early prediction based only on malleable behaviors provides time and opportunity to advise students on ways to alter study and improve performance. Thereafter, we took the not-yet-common step of applying the model and testing whether providing digital learning support to those predicted to perform poorly can improve their achievement. Using learning management system log data, we tested models composed of theory-aligned behaviors using multiple algorithms and obtained a model that accurately predicted poor grades. Our algorithm correctly identified 75% of students who failed to earn the grade of B or better needed to advance to the next course. We applied this model the next semester to predict achievement levels and provided a digital learning strategy intervention to students predicted to perform poorly. Those who accessed advice outperformed classmates on subsequent exams, and more students who accessed the advice achieved the B needed to move forward in their major than those who did not access advice.
Article
Full-text available
This decade, e-learning systems provide more interactivity to instructors and students than traditional systems and make possible a completely online (CO) education. However, instructors cannot tell whether a CO student is engaged in the course, and they cannot predict his or her academic performance. This work provides a collection of models (exploratory factor analysis, multiple linear regression, cluster analysis, and correlation) to predict the academic performance of students early. These models are constructed using Moodle interaction data, characteristics, and grades of 802 undergraduate students from a CO university. The model results indicated that the major contribution to the prediction of academic student performance is made by four factors: Access, Questionnaire, Task, and Age. The Access factor is composed of variables related to students' accesses in Moodle, including visits to forums and glossaries. The Questionnaire factor summarizes variables related to visits and attempts in questionnaires. The Task factor is composed of variables related to consulted and submitted tasks. The Age factor contains the student's age. It is also remarkable that Age was identified as a negative predictor of student performance, indicating that student performance is inversely proportional to age. In addition, cluster analysis found five groups and sustained that the number of interactions with Moodle is closely related to student performance.
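A hypothetical rendering of that pipeline (synthetic data, scikit-learn assumed, invented variable names) reduces the interaction counts to a few latent factors and regresses grades on the factor scores plus age:

```python
# Sketch: factor analysis on Moodle interaction counts, then linear regression.
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 802                                                      # cohort size reported above
interactions = rng.poisson(10, size=(n, 12)).astype(float)   # visits, attempts, submissions...
age = rng.integers(18, 50, size=n).astype(float)

fa = FactorAnalysis(n_components=3, random_state=0)  # stand-ins for Access/Questionnaire/Task
factors = fa.fit_transform(interactions)
X = np.column_stack([factors, age])

# Synthetic target wired so that age hurts performance, mirroring the reported finding.
grade = 5.0 - 0.05 * (age - 18) + factors @ np.array([0.5, 0.3, 0.2]) + rng.normal(0, 0.5, n)
reg = LinearRegression().fit(X, grade)
print("coefficients (3 factors, age):", reg.coef_.round(3))
```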
Article
Full-text available
The global explosion of COVID-19 has brought unprecedented challenges to traditional higher education, especially for freshmen who have no major; they cannot determine what their real talents are. Thus, it is difficult for them to make correct choices based on their skills. Generally, existing methods mainly mine isomorphic information, ignoring relationships among heterogeneous information. Therefore, this paper proposes a new framework to give freshmen appropriate recommendations by mining heterogeneous educational information. This framework is composed of five stages: after data preprocessing, a weighted heterogeneous educational network (WHEN) is constructed according to heterogeneous information in student historical data. Then, the WHEN is projected into different subnets, on which metapaths are defined. Next, a WHEN-based embedding method is proposed, which helps mine the weighted heterogeneous information on multiple extended metapaths. Finally, with the information mined, a matrix factorization algorithm is used to recommend learning resources and majors for freshmen. A large number of experimental results show that the proposed framework can achieve better results than other baseline methods. This indicates that the proposed method is effective and can provide great help to freshmen during the COVID-19 storm.
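The final recommendation stage of such a framework is commonly a matrix factorization; a bare-bones SGD version on a synthetic student-item matrix (the WHEN embedding stage is omitted here) looks like this:

```python
# Sketch: matrix factorization by SGD over observed student-item scores.
import numpy as np

rng = np.random.default_rng(0)
R = rng.integers(0, 6, size=(20, 8)).astype(float)  # student x item scores; 0 = unobserved
mask = R > 0

k, lr, reg = 4, 0.01, 0.05                       # latent dimension, learning rate, L2 penalty
P = rng.normal(scale=0.1, size=(R.shape[0], k))  # student latent factors
Q = rng.normal(scale=0.1, size=(R.shape[1], k))  # item latent factors
for _ in range(200):
    for u, i in zip(*np.nonzero(mask)):
        err = R[u, i] - P[u] @ Q[i]
        P[u] += lr * (err * Q[i] - reg * P[u])
        Q[i] += lr * (err * P[u] - reg * Q[i])

pred = P @ Q.T                       # predicted scores for every student-item pair
print(pred[0].argsort()[::-1][:3])   # top-3 recommendations for the first student
```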
Article
Next-term grade prediction is a challenging problem. The objective is to predict students' grades in new courses, given their grades in courses they have previously taken. Adopting various machine learning algorithms is a very common and straightforward approach to tackling this problem. However, such models are very difficult to interpret; that is, it is difficult to explain to a student (or a teacher) why the model predicted grade B for a given student, for example. In this work, we shed light on the importance of building interpretable models for educational data mining tasks. Specifically, we propose a novel interpretable framework for multi-class grade prediction that is based on an optimal rule-list mining algorithm. Additionally, we evaluate our proposed framework on two private datasets and compare our results with baseline models. Our findings show that our proposed framework is capable of achieving higher prediction and interpretability values when compared to black-box models.
Article
Audience Response Systems like clickers are gaining much attention for early identification of at-risk students, as quality education, student success rate, and retention are major areas of concern, as evidenced in this COVID scenario. Usage of this active learning strategy across classrooms of varying strength is found to be very effective in retaining the attention, retention, and learning power of the students. However, implementing clickers for large classrooms incurs overhead costs on the instructor's part. As a result, educational researchers are experimenting with various lightweight alternatives. This paper discusses one such alternative: lightweight formative assessments for blended learning environments. It discusses their implementation and effectiveness in early identification of at-risk students. This study validates the usage of lightweight assessments for three core, pedagogically different courses of large computer science engineering classrooms. It uses a voting ensemble classifier for effective predictions. With the usage of lightweight assessments in early identification of at-risk students, an accuracy range of 87%-94.7% has been achieved, along with high ROC-AUC values. The study also proposes a generalized pedagogical architecture for fitting these lightweight assessments within the course curriculum of pedagogically different courses. With these constructive outcomes, lightweight assessments seem promising for efficient handling of scaling technical classrooms.
Preprint
Full-text available
Predicting the performance of students early and as accurately as possible is one of the biggest challenges of educational institutions. Analyzing the performance of students early can help in finding their strengths and weaknesses and help them perform better in examinations. Using machine learning, student performance can be predicted with the help of student data collected from Learning Management Systems (LMS). The data collected from LMSs can provide insights about student behaviors that result in good or bad performance in examinations, which can then be studied and used to help students who are performing poorly to perform better.
Article
Purpose: This study aims to explore Chilean students' digital technology usage patterns and approaches to learning. Design/Approach/Methods: We conducted this study in two stages. In the first, we worked with one semester of learning management system (LMS), library, and student records data. We performed a k-means cluster analysis to identify groups with similar usage patterns. In the second stage, we invited students from the emerging clusters to participate in group interviews. Thematic analysis was employed to analyze them. Findings: Three groups were identified: 1) Digital library users/high performers, who adopted deeper approaches to learning, obtained higher marks, and used learning resources to integrate materials and expand understanding; 2) LMS and physical library users/mid-performers, who adopted mainly strategic approaches, obtained marks close to average, and used learning resources for studying in an organized manner to get good marks; and 3) Lower users of LMS and library/mid-low performers, who adopted mainly a surface approach, obtained mid-to-lower-than-average marks, and used learning resources for minimum content understanding. Originality/Value: We demonstrated the importance of combining learning analytics data with qualitative methods to make sense of digital technology usage patterns: approaches to learning are associated with learning resources use. Practical recommendations are presented.
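The first-stage clustering can be sketched as below (synthetic counts, hypothetical feature names, k = 3 mirroring the three reported groups); standardizing first keeps high-volume LMS hits from dominating the distance metric:

```python
# Sketch: k-means on standardized LMS/library usage features.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
usage = rng.poisson([20, 5, 3], size=(300, 3)).astype(float)  # LMS hits, e-library, loans

X = StandardScaler().fit_transform(usage)  # put counts on a common scale
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
for c in range(3):
    print(f"cluster {c}: mean usage {usage[km.labels_ == c].mean(axis=0).round(1)}")
```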
Article
Full-text available
Using predictive modeling methods, it is possible to identify at-risk students early and inform both the instructors and the students. While some universities have started to use standards-based grading, which has educational advantages over common score-based grading, at-risk prediction models have not been adapted to reap the benefits of standards-based grading in courses that utilize this grading. In this paper, we compare predictive methods to identify at-risk students in a course that used standards-based grading. Only in-semester performance data that were available to the course instructors were used in the prediction methods. When identifying at-risk students, it is important to minimize false negative (i.e., type II) error while not increasing false positive (i.e., type I) error significantly. To increase the generalizability of the models and accuracy of the predictions, we used a feature selection method to reduce the number of variables used in each model. The Naive Bayes Classifier model and an Ensemble model using a sequence of models (i.e., Support Vector Machine, K-Nearest Neighbors, and Naive Bayes Classifier) had the best results among the seven tested modeling methods.
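One plausible reading of that recipe, sketched on synthetic data: univariate feature selection feeding a Naive Bayes classifier, with the confusion matrix split out so the false negatives (missed at-risk students, the type II error the abstract worries about) are visible:

```python
# Sketch: feature selection + Naive Bayes, monitoring type I/II errors.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=400, n_features=30, n_informative=6,
                           weights=[0.75], random_state=0)  # label 1 = at-risk
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

model = make_pipeline(SelectKBest(f_classif, k=8), GaussianNB()).fit(X_tr, y_tr)
tn, fp, fn, tp = confusion_matrix(y_te, model.predict(X_te)).ravel()
print(f"false negatives (missed at-risk): {fn}, false positives: {fp}")
```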
Article
Full-text available
This paper presents a dialogical tool for the advancement of learning analytics implementation for student retention in Higher Education institutions. The framework was developed as an outcome of a project commissioned and funded by the Australian Government's Office for Learning and Teaching. The project took a mixed-method approach including a survey at the institutional level (n = 24), a survey of individual teaching staff and other academics with an interest in student retention (n = 353), and a series of interviews (n = 23). Following the collection and analysis of these data an initial version of the framework was developed and presented at a National Forum attended by 148 colleagues from 43 different institutions. Participants at the forum were invited to provide commentary on the usefulness and composition of the framework which was subsequently updated to reflect this feedback. Ultimately, it is envisaged that such a framework might offer institutions an accessible and concise tool to structure and systematize discussion about how learning analytics might be implemented for student retention in their own context.
Article
Full-text available
This study examined the extent to which instructional conditions influence the prediction of academic success in nine undergraduate courses offered in a blended learning model (n = 4134). The study illustrates the differences in predictive power and significant predictors between course-specific models and generalized predictive models. The results suggest that it is imperative for learning analytics research to account for the diverse ways technology is adopted and applied in course-specific contexts. The differences in technology use, especially those related to whether and how learners use the learning management system, require consideration before the log-data can be merged to create a generalized model for predicting academic success. A lack of attention to instructional conditions can lead to an over- or underestimation of the effects of LMS features on students' academic success. These findings have broader implications for institutions seeking generalized and portable models for identifying students at risk of academic failure.
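The course-specific vs. generalized contrast can be illustrated with a toy experiment in which an LMS feature predicts success in only one course, so a pooled model dilutes its effect (all data synthetic):

```python
# One pooled model vs. one model per course on the same synthetic log features.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
course = rng.integers(0, 3, 300)               # 3 hypothetical courses
X = rng.normal(size=(300, 4))                  # e.g. logins, forum posts, ...
# LMS use matters in course 0 only, so a pooled model blurs the effect.
y = ((X[:, 0] * (course == 0) + rng.normal(scale=0.5, size=300)) > 0).astype(int)

pooled = LogisticRegression().fit(X, y)
print("pooled accuracy:", pooled.score(X, y))
for c in range(3):                             # course-specific models
    m = course == c
    clf = LogisticRegression().fit(X[m], y[m])
    print(f"course {c} accuracy:", clf.score(X[m], y[m]))
```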
Article
Full-text available
We evaluate 179 classifiers arising from 17 families (discriminant analysis, Bayesian, neural networks, support vector machines, decision trees, rule-based classifiers, boosting, bagging, stacking, random forests and other ensembles, generalized linear models, nearest neighbors, partial least squares and principal component regression, logistic and multinomial regression, multiple adaptive regression splines and other methods), implemented in Weka, R (with and without the caret package), C and Matlab, including all the relevant classifiers available today. We use 121 data sets, which represent the whole UCI database (excluding the large-scale problems) and other real problems of our own, in order to reach significant conclusions about classifier behavior that do not depend on the data set collection. The classifiers most likely to be the best are the random forest (RF) versions, the best of which (implemented in R and accessed via caret) achieves 94.1% of the maximum accuracy, exceeding 90% in 84.3% of the data sets. However, the difference is not statistically significant with the second best, the SVM with Gaussian kernel implemented in C using LibSVM, which achieves 92.3% of the maximum accuracy. A few models are clearly better than the remaining ones: random forest, SVM with Gaussian and polynomial kernels, extreme learning machine with Gaussian kernel, C5.0 and avNNet (a committee of multi-layer perceptrons implemented in R with the caret package). The random forest is clearly the best family of classifiers (3 of the 5 best classifiers are RF), followed by SVM (4 classifiers in the top 10), neural networks and boosting ensembles (5 and 3 members in the top 20, respectively).
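A tiny analogue of this benchmark, comparing cross-validated accuracy of a random forest and an RBF-kernel SVM on a single dataset (the paper evaluates 121), might look like:

```python
# RF vs. RBF-SVM, cross-validated on one built-in dataset.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
rf = RandomForestClassifier(n_estimators=500, random_state=0)
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))

print("RF :", cross_val_score(rf, X, y, cv=5).mean())
print("SVM:", cross_val_score(svm, X, y, cv=5).mean())
```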
Article
Full-text available
This paper compares the communication tools of six popular open source learning management systems (LMSs): ATutor, Claroline, Dokeos, Ilias, Moodle, and Sakai. It compares the whiteboard/video services, discussion forums, file exchange/internal mail, online journal mail, and real-time chat features of each LMS. Because there are so many open source LMSs available, finding a suitable one that meets an instructor's needs can be tedious. This paper seeks to make that choice easier by revealing which LMS has the best communication tools. The comparison of the six open source LMSs showed that Moodle and ATutor have the best communication tools with user-friendly interfaces.
Conference Paper
Full-text available
All forms of learning take time. There is a large body of research suggesting that the amount of time spent on learning can improve the quality of learning, as represented by academic performance. The widespread adoption of learning technologies such as learning management systems (LMSs) has resulted in large amounts of data about student learning being readily accessible to educational researchers. One common use of this data is to measure the time that students have spent on different learning tasks (i.e., time-on-task). Given that LMSs typically capture only the times when students executed various actions, time-on-task measures are estimated from the recorded trace data. LMS trace data has been used extensively in many studies in the field of learning analytics, yet the problem of time-on-task estimation is rarely described in detail and the consequences it entails are not fully examined. This paper presents the results of a study that examined the effects of different time-on-task estimation methods on the results of commonly adopted analytical models. The primary goal of this paper is to raise awareness of the issues of accuracy and appropriateness surrounding time estimation within the broader learning analytics community, and to initiate a debate about the challenges of this process. Furthermore, the paper provides an overview of time-on-task estimation methods in educational and related research fields.
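One common estimation heuristic the paper surveys treats time-on-task as the gap to the next logged action, capped so that long idle gaps do not inflate the estimate; a minimal sketch, assuming a 30-minute cutoff (the field names and events are invented):

```python
# Time-on-task = gap to the next logged action, capped at an assumed idle cutoff.
import pandas as pd

log = pd.DataFrame({
    "student": ["s1", "s1", "s1", "s1"],
    "action":  ["open_quiz", "view_page", "post_forum", "logout"],
    "ts": pd.to_datetime(["2024-03-01 10:00", "2024-03-01 10:12",
                          "2024-03-01 11:30", "2024-03-01 11:35"]),
})

CAP = pd.Timedelta(minutes=30)                               # assumed idle cutoff
log["gap_to_next"] = log.groupby("student")["ts"].shift(-1) - log["ts"]
log["time_on_task"] = log["gap_to_next"].clip(upper=CAP)     # 78-min gap capped at 30
print(log)
```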
Article
Full-text available
With digitisation and the rise of e-learning have come a range of computational tools and approaches that have allowed educators to better support the learners' experience in schools, colleges and universities. The move away from traditional paper-based course materials, registration, admissions and support services to the mobile, always-on and always accessible data has driven demand for information and generated new forms of data observable through consumption behaviours. These changes have led to a plethora of data sets that store learning content and track user behaviours. Most recently, new data analytics approaches are creating new ways of understanding trends and behaviours in students that can be used to improve learning design, strengthen student retention, provide early warning signals concerning individual students and help to personalise the learner's experience. This paper proposes a foundational learning analytics model (LAM) for higher education that focuses on the dynamic interaction of stakeholders with their data supported by visual analytics, such as self-organising maps, to generate conversations, shared inquiry and solution-seeking. The model can be applied for other educational institutions interested in using learning analytics processes to support personalised learning and support services. Further work is testing its efficacy in increasing student retention rates.
Article
Full-text available
Both the root mean square error (RMSE) and the mean absolute error (MAE) are regularly employed in model evaluation studies. Willmott and Matsuura (2005) have suggested that the RMSE is not a good indicator of average model performance and might be a misleading indicator of average error and thus the MAE would be a better metric for that purpose. Their paper has been widely cited and may have influenced many researchers in choosing MAE when presenting their model evaluation statistics. However, we contend that the proposed avoidance of RMSE and the use of MAE is not the solution to the problem. In this technical note, we demonstrate that the RMSE is not ambiguous in its meaning, contrary to what was claimed by Willmott et al. (2009). The RMSE is more appropriate to represent model performance than the MAE when the error distribution is expected to be Gaussian. In addition, we show that the RMSE satisfies the triangle inequality requirement for a distance metric.
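A small numeric example (values invented) makes the contrast concrete: one large error among small ones pulls RMSE well above MAE, which is exactly the sensitivity that matters under Gaussian error assumptions:

```python
# MAE vs. RMSE on a toy error vector with one outlier.
import numpy as np

errors = np.array([1.0, 1.0, 1.0, 1.0, 6.0])   # one large error among small ones
mae  = np.mean(np.abs(errors))                  # 2.0
rmse = np.sqrt(np.mean(errors ** 2))            # ~2.83, pulled up by the outlier
print(mae, rmse)
```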
Conference Paper
Full-text available
How do we identify students who are at risk of failing our courses? Waiting to accumulate sufficient assessed work incurs a substantial lag in identifying students who need assistance. We want to provide students with support and guidance as soon as possible to reduce the risk of failure or disengagement. In small classes we can monitor students more directly and mark graded assessments to provide feedback in a relatively short time, but large class sizes, where it is easiest for students to disappear and ultimately drop out, pose a much greater challenge. We need reliable and scalable mechanisms for identifying at-risk students as quickly as possible, before they disengage, drop out or fail. The volumes of student information retained in data warehouse and business intelligence systems are often not available to lecturing staff, who can only observe the course-level marks for previous study and participation behaviour in the current course, based on attendance and assignment submission. We have identified a measure of "at-risk" behaviour that depends upon the timeliness of initial submissions of any marked activity. By analysing four years of electronic submissions across our school's student body, we have extracted over 220,000 individual records, spanning over 1900 students, to establish that early electronic submission behaviour can provide a reliable indicator of future behaviour. By measuring the impact on a student's Grade Point Average (GPA), we can show that knowledge of assignment submission and current course level provides a reliable guide to student performance.
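The timeliness signal described here might be derived as follows; this is a hypothetical sketch computing how early each student's first submission lands before the deadline, with an assumed one-hour cutoff standing in for the paper's empirically derived measure:

```python
# How early does the first submission land before the deadline? (illustrative data)
import pandas as pd

subs = pd.DataFrame({
    "student":   ["s1", "s2", "s3"],
    "submitted": pd.to_datetime(["2024-03-09 21:00",    # a day early
                                 "2024-03-10 22:55",    # 5 minutes before deadline
                                 "2024-03-11 02:00"]),  # late
    "deadline":  pd.to_datetime(["2024-03-10 23:00"] * 3),
})

subs["hours_early"] = (subs["deadline"] - subs["submitted"]).dt.total_seconds() / 3600
subs["at_risk_flag"] = subs["hours_early"] < 1   # assumed cutoff, for illustration only
print(subs[["student", "hours_early", "at_risk_flag"]])
```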
Article
Full-text available
Applying data mining (DM) in education is an emerging interdisciplinary research field also known as educational data mining (EDM). It is concerned with developing methods for exploring the unique types of data that come from educational environments. Its goal is to better understand how students learn and to identify the settings in which they learn, in order to improve educational outcomes and to gain insights into and explain educational phenomena. Educational information systems can store a huge amount of potential data from multiple sources, coming in different formats and at different granularity levels. Each particular educational problem has a specific objective with special characteristics that require a different treatment of the mining problem. These issues mean that traditional DM techniques cannot be applied directly to these types of data and problems. As a consequence, the knowledge discovery process has to be adapted and some specific DM techniques are needed. This paper introduces and reviews key milestones and the current state of affairs in the field of EDM, together with specific applications, tools, and future insights.
Article
Full-text available
With the increasing diversity of students attending university, there is a growing interest in the factors predicting academic performance. This study is a prospective investigation of the academic, psychosocial, cognitive, and demographic predictors of the academic performance of first-year Australian university students. Questionnaires were distributed to 197 first-year students 4 to 8 weeks prior to the end-of-semester exams, and overall grade point averages were collected at semester completion. Previous academic performance was identified as the most significant predictor of university performance. Integration into university, self-efficacy, and employment responsibilities were also predictive of university grades. Identifying the factors that influence academic performance can improve the targeting of interventions and support services for students at risk of academic problems.
Conference Paper
Full-text available
Recent research has indicated that misuse of intelligent tutoring software is correlated with substantially lower learning. Students who frequently engage in behavior termed "gaming the system" (behavior aimed at obtaining correct answers and advancing within the tutoring curriculum by systematically taking advantage of regularities in the software's feedback and help) learn only 2/3 as much as similar students who do not engage in such behaviors. We present a machine-learned Latent Response Model that can identify whether a student is gaming the system in a way that leads to poor learning. We believe this model will be useful both for re-designing tutors to respond appropriately to gaming, and for understanding the phenomenon of gaming better.
Conference Paper
Full-text available
The learners’ motivation has an impact on the quality of learning, especially in e-learning environments. Most of these environments store data about the learners’ actions in log files. Logging the users’ interactions in educational systems makes it possible to track their actions at a refined level of detail. Data mining and machine learning techniques can "give meaning" to these data and provide valuable information for learning improvement. An area where improvement is absolutely necessary and of great importance is motivation, known to be an essential factor for preventing attrition in e-learning. In this paper we investigate whether analysis of log file data can be used to estimate the motivational level of the learner. A decision tree is built from a limited number of log files from a web-based learning environment. The results suggest that time spent reading is an important factor for predicting motivation; performance in tests was also found to be a relevant indicator of the motivational level.
Article
Full-text available
The main objective of higher education institutions is to provide quality education to their students. One way to achieve the highest level of quality in a higher education system is by discovering knowledge for prediction regarding enrolment of students in a particular course, alienation of traditional classroom teaching models, detection of unfair means used in online examinations, detection of abnormal values in student result sheets, prediction of students' performance, and so on. This knowledge is hidden within educational data sets and is extractable through data mining techniques. The present paper is designed to justify the capabilities of data mining techniques in the context of higher education by offering a data mining model for a university's higher education system. In this research, the classification task is used to evaluate students' performance; as there are many approaches to data classification, the decision tree method is used here. Through this task we extract knowledge that describes students' performance in the end-semester examination. It helps in identifying dropouts and students who need special attention early, and allows the teacher to provide appropriate advising/counseling. Keywords: Educational Data Mining (EDM); Classification; Knowledge Discovery in Databases (KDD); ID3 Algorithm.
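Since scikit-learn does not implement ID3 directly, an entropy-based CART tree can stand in for it; a hedged sketch of the classification task over illustrative student attributes (all values invented):

```python
# Entropy-based decision tree (CART standing in for ID3) over student attributes.
from sklearn.tree import DecisionTreeClassifier, export_text

# columns: attendance %, internal marks, assignments submitted (illustrative)
X = [[90, 75, 8], [40, 30, 2], [85, 60, 7], [35, 45, 3], [95, 80, 9], [50, 35, 1]]
y = [1, 0, 1, 0, 1, 0]   # 1 = passed end-semester exam, 0 = needs special attention

tree = DecisionTreeClassifier(criterion="entropy", max_depth=2).fit(X, y)
print(export_text(tree, feature_names=["attendance", "marks", "assignments"]))
```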
Article
Full-text available
In matters of great importance that have financial, medical, social, or other implications, we often seek a second opinion before making a decision, sometimes a third, and sometimes many more. In doing so, we weigh the individual opinions and combine them through some thought process to reach a final decision that is presumably the most informed one. The process of consulting "several experts" before making a final decision is perhaps second nature to us; yet the extensive benefits of such a process in automated decision-making applications have only recently been discovered by the computational intelligence community. Also known under various other names, such as multiple classifier systems, committee of classifiers, or mixture of experts, ensemble-based systems have been shown to produce favorable results compared to those of single-expert systems for a broad range of applications and under a variety of scenarios. Design, implementation, and application of such systems are the main topics of this article. Specifically, this paper reviews conditions under which ensemble-based systems may be more beneficial than their single-classifier counterparts, algorithms for generating individual components of the ensemble systems, and various procedures through which the individual classifiers can be combined. We discuss popular ensemble-based algorithms, such as bagging, boosting, AdaBoost, stacked generalization, and hierarchical mixture of experts, as well as commonly used combination rules, including algebraic combination of outputs, voting-based techniques, behavior knowledge space, and decision templates. Finally, we look at current and future research directions for novel applications of ensemble systems. Such applications include incremental learning, data fusion, feature selection, learning with missing features, confidence estimation, and error-correcting output codes: all areas in which ensemble systems have shown great promise.
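Two of the reviewed schemes, bagging and AdaBoost, can be sketched by combining many decision stumps (synthetic data, illustrative settings):

```python
# Bagging vs. AdaBoost, each an ensemble of weak decision trees.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=1)
stump = DecisionTreeClassifier(max_depth=1)          # a weak learner

bag = BaggingClassifier(stump, n_estimators=100, random_state=1)
ada = AdaBoostClassifier(n_estimators=100, random_state=1)

print("bagging :", cross_val_score(bag, X, y, cv=5).mean())
print("AdaBoost:", cross_val_score(ada, X, y, cv=5).mean())
```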
Conference Paper
The accurate estimation of students’ grades in future courses is important as it can inform the selection of next term’s courses and create personalized degree pathways to facilitate successful and timely graduation. This paper presents future-course grade prediction methods based on sparse linear models and low-rank matrix factorizations that are specific to each course or student-course tuple. These methods identify the predictive subsets of prior courses on a course-by-course basis and better address problems associated with the not-missing-at-random nature of the student-course historical grade data. The methods were evaluated on a dataset obtained from the University of Minnesota. This evaluation showed that the course-specific models outperformed various competing schemes, with the best-performing scheme achieving an RMSE across the different courses of 0.632 vs. 0.661 for the best competing method.
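A minimal sketch of the low-rank idea, factorizing a student × course grade matrix by SGD over the observed entries; this is a generic stand-in, not the paper's course-specific formulation:

```python
# Low-rank factorization of a student x course grade matrix (NaN = not taken).
import numpy as np

G = np.array([[4.0, 3.0, np.nan],
              [3.5, np.nan, 2.5],
              [np.nan, 3.5, 3.0]])
mask = ~np.isnan(G)
k, lr, reg = 2, 0.01, 0.1
rng = np.random.default_rng(0)
P = rng.normal(scale=0.1, size=(G.shape[0], k))    # student factors
Q = rng.normal(scale=0.1, size=(G.shape[1], k))    # course factors
mu = np.nanmean(G)                                 # global mean grade

for _ in range(2000):                              # SGD over observed grades only
    for i, j in zip(*np.where(mask)):
        err = G[i, j] - (mu + P[i] @ Q[j])
        pi = P[i].copy()
        P[i] += lr * (err * Q[j] - reg * P[i])
        Q[j] += lr * (err * pi - reg * Q[j])

pred = mu + P @ Q.T
print(np.round(pred, 2))                           # fills unseen (student, course) cells
```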
Article
Every day, teachers design and test new ways of teaching, using learning technology to help their students. Sadly, their discoveries often remain local. By representing and communicating their best ideas as structured pedagogical patterns, teachers could develop this vital professional knowledge collectively
Article
Blended learning (BL) is recognized as one of the major trends in higher education today. To identify how BL has actually been adopted, this study employed a data-driven approach instead of model-driven methods. The Latent Class Analysis method, a clustering approach from educational data mining, was employed to extract common activity features of 612 courses in a large private university located in South Korea, using online behavior data tracked from the Learning Management System and the institution's course database. Four unique subtypes were identified. Approximately 50% of the courses manifested inactive utilization of the LMS or an immature stage of blended learning implementation, labeled Type I. The other subtypes were Type C - Communication or Collaboration (24.3%), Type D - Delivery or Discussion (18.0%), and Type S - Sharing or Submission (7.2%). We discussed the implications of BL based on data-driven decisions to provide strategic institutional initiatives.
Article
This study aimed to develop a practical model for predicting students at risk of performing poorly in blended learning courses. Previous research suggests that analyzing usage data stored in the log files of modern Learning Management Systems (LMSs) would allow teachers to develop timely, evidence-based interventions to support at-risk or struggling students. The analysis of students' tracking data from a Moodle LMS-supported blended learning course was the focus of this research, in an effort to identify significant correlations between different online activities and course grade. Out of 29 LMS usage variables, 14 were found to be significant and were input into a stepwise multivariate regression, which revealed that only four variables – reading and posting messages, content creation contribution, quiz efforts and number of files viewed – predicted 52% of the variance in the final student grade.
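The stepwise reduction might be sketched with forward feature selection for a linear model of final grade; the variable names below echo the abstract, but the data and count of candidate variables are synthetic stand-ins:

```python
# Forward selection of usage variables for a linear model of final grade.
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
names = ["messages", "content", "quiz_efforts", "files_viewed",
         "logins", "time_online"]                 # 6 stand-ins for the 29 variables
X = rng.normal(size=(200, len(names)))
y = X[:, :4] @ np.array([0.5, 0.4, 0.3, 0.2]) + rng.normal(scale=0.7, size=200)

sfs = SequentialFeatureSelector(LinearRegression(), n_features_to_select=4,
                                direction="forward").fit(X, y)
kept = [n for n, keep in zip(names, sfs.get_support()) if keep]
lm = LinearRegression().fit(X[:, sfs.get_support()], y)
print(kept, "R^2 =", round(lm.score(X[:, sfs.get_support()], y), 2))
```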
Article
Building a student performance prediction model that is both practical and understandable for users is a challenging task fraught with confounding factors to collect and measure. Most current prediction models are difficult for teachers to interpret. This poses significant problems for model use (e.g., personalizing education and intervention) as well as model evaluation. In this paper, we synthesize learning analytics approaches, educational data mining (EDM), and HCI theory to explore the development of more usable prediction models and prediction model representations, using data from a collaborative geometry problem-solving environment: Virtual Math Teams with GeoGebra (VMTwG). First, based on theory proposed by Hrastinski (2009) establishing online learning as online participation, we operationalized activity theory to holistically quantify students' participation in the CSCL (computer-supported collaborative learning) course. As a result, six variables (Subject, Rules, Tools, Division of Labor, Community, and Object) are constructed. This analysis of variables prior to the application of a model distinguishes our approach from prior approaches (feature selection, ad-hoc guesswork, etc.). The approach described diminishes data dimensionality and systematically contextualizes data in a semantic background. Second, an advanced modeling technique, Genetic Programming (GP), underlies the developed prediction model. We demonstrate how connecting the structure of VMTwG trace data to a theoretical framework and processing that data using the GP algorithmic approach outperforms traditional models in prediction rate and interpretability. Theoretical and practical implications are then discussed.
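A hedged GP sketch using the third-party gplearn package (an assumption; the paper does not specify its implementation), with the six activity-theory variables reduced to synthetic columns:

```python
# Genetic-programming regression via gplearn (assumed library, not the paper's own).
import numpy as np
from gplearn.genetic import SymbolicRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(300, 6))       # Subject, Rules, Tools, Labor, Community, Object
y = 2 * X[:, 0] + X[:, 1] * X[:, 2]  # hidden "true" participation-performance rule

gp = SymbolicRegressor(population_size=500, generations=10, random_state=0)
gp.fit(X, y)
print(gp._program)                   # the evolved, human-readable expression
```

The printed expression tree is what gives GP its interpretability advantage over black-box models in this setting.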
Article
This study extends prior research on approaches to teaching and perceptions of the teaching situation by investigating these elements when e-learning is involved. In this study, approaches to teaching ranged from a focus on the teacher and the taught content to a focus on the student and their learning, resembling those reported in previous investigations. Approaches to e-teaching ranged from a focus on information transmission to a focus on communication and collaboration. An analysis of perceptions of the teaching situation in relation to e-learning identified key themes influencing adopted approaches: control of teaching, institutional strategy, pedagogical and technological support, time required, teacher skills for using e-learning, and student abilities and willingness for using learning technology. Associations between these elements showed three groups of teachers: one focusing on transmission of information teaching both face-to-face and online while having a general negative perception of the teaching situation in relation to e-learning; a second focusing on student learning both face-to-face and online while having a general positive perception; and a third presenting unexpected patterns of associations. These results may be helpful for supporting different groups of teachers in employing e-learning in their on-campus units of study. At the same time, further research is proposed for inquiring into specific approaches in different disciplines and different university contexts.
Article
Technology adoption is usually modeled as a process with dynamic transitions between costs and benefits. Nevertheless, school teachers do not generally make effective use of technology in their teaching. This article describes a study designed to exhibit the interplay between two variables: the type of technology, in terms of its complexity of use, and the type of teacher, in terms of attitude towards innovation. The results from this study include: (a) elaboration of a characteristic teacher technology adoption process, based on an existing learning curve for new technology proposed for software development; and (b) presentation of exit points during the technology adoption process. This paper concludes that teachers who are early technology adopters and commit a significant portion of their time to incorporating educational technology into their teaching are more likely to adopt new technology, regardless of its complexity. However, teachers who are not early technology adopters and commit a small portion of their time to integrating educational technology are less likely to adopt new technology and are prone to abandoning the adoption at identified points in the process.
Article
The relative abilities of two dimensioned statistics – the root-mean-square error (RMSE) and the mean absolute error (MAE) – to describe average model-performance error are examined. The RMSE is of special interest because it is widely reported in the climatic and environmental literature; nevertheless, it is an inappropriate and misinterpreted measure of average error. RMSE is inappropriate because it is a function of three characteristics of a set of errors, rather than of one (the average error). RMSE varies with the variability within the distribution of error magnitudes and with the square root of the number of errors (√n), as well as with the average-error magnitude (MAE). Our findings indicate that MAE is a more natural measure of average error and (unlike RMSE) is unambiguous. Dimensioned evaluations and inter-comparisons of average model-performance error, therefore, should be based on MAE.
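The abstract's claim that RMSE mixes average magnitude, variability, and n can be stated precisely with standard identities (these bounds are not from the paper itself), for errors e_1, …, e_n and population variance:

```latex
\mathrm{MAE} \;\le\; \mathrm{RMSE} \;\le\; \sqrt{n}\,\mathrm{MAE},
\qquad
\mathrm{RMSE}^{2} \;=\; \mathrm{MAE}^{2} + \operatorname{Var}\!\bigl(\lvert e_{i}\rvert\bigr)
```

The identity shows RMSE exceeds MAE exactly when the error magnitudes vary, which is the dependence on error variability the authors object to.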
Book
Handbook of Educational Data Mining (EDM) provides a thorough overview of the current state of knowledge in this area. The first part of the book includes nine surveys and tutorials on the principal data mining techniques that have been applied in education. The second part presents a set of 25 case studies that give a rich overview of the problems that EDM has addressed. Researchers at the Forefront of the Field Discuss Essential Topics and the Latest Advances With contributions by well-known researchers from a variety of fields, the book reflects the multidisciplinary nature of the EDM community. It brings the educational and data mining communities together, helping education experts understand what types of questions EDM can address and helping data miners understand what types of questions are important to educational design and educational decision making. Encouraging readers to integrate EDM into their research and practice, this timely handbook offers a broad, accessible treatment of essential EDM techniques and applications. It provides an excellent first step for newcomers to the EDM community and for active researchers to keep abreast of recent developments in the field.
Article
In this paper, a dropout prediction method for e-learning courses, based on three popular machine learning techniques and detailed student data, is proposed. The machine learning techniques used are feed-forward neural networks, support vector machines and probabilistic ensemble simplified fuzzy ARTMAP. Since a single technique may fail to accurately classify some e-learning students, whereas another may succeed, three decision schemes, which combine in different ways the results of the three machine learning techniques, were also tested. The method was examined in terms of overall accuracy, sensitivity and precision and its results were found to be significantly better than those reported in relevant literature.
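The decision-scheme idea, merging probability outputs from three heterogeneous classifiers, can be sketched with soft voting; scikit-learn has no fuzzy ARTMAP, so Naive Bayes stands in for the third learner in this illustration:

```python
# Three different classifiers merged by one decision scheme (soft voting).
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=400, random_state=2)
scheme = VotingClassifier(
    estimators=[("nn", MLPClassifier(max_iter=1000, random_state=2)),
                ("svm", SVC(probability=True, random_state=2)),
                ("nb", GaussianNB())],       # stand-in for fuzzy ARTMAP
    voting="soft")                           # average the predicted probabilities
print("combined accuracy:", cross_val_score(scheme, X, y, cv=5).mean())
```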
Article
This study evaluates student use of an online study environment. Its purposes were to (1) determine if college students will voluntarily use online study tools, (2) identify characteristics of users and nonusers of the tools, and (3) determine if the use of online study tools relates to course achievement. Approximately 25% of students used the online tools for more than one hour before each of three examinations. In comparing use of the study tools provided, the largest number of students made use of the online lecture notes and the greatest amount of online study time was devoted to reviewing multiple choice questions. The perceived ease of access to the Internet differentiated tool users from nonusers. Study tool users scored higher on course examinations after accounting for measures of ability and study skill.
Article
This paper presents a methodological approach based on Bayesian Networks for modelling the behaviour of the students of a bachelor course in computers in an Open University that deploys distance educational methods. It describes the structure of the model, its application for modelling the behaviour of student groups in the Informatics Course of the Hellenic Open University, as well as the advantages of the presented method under conditions of uncertainty. The application of this model resulted in promising results as regards both prediction of student behaviour, based on modelled past experience, and assessment (i.e., identification of the reasons that led students to a given "current" state). The method presented in this paper offers an effective way to model past experience, which can significantly aid in decision-making regarding the educational procedure. It can also be used for assessment purposes regarding a current state, enabling tutors to identify mistakes or bad practices so as to avoid them in the future as well as identify successful practices that are worth repeating. The paper concludes that modelling is feasible and that the presented method is useful especially in cases of large amounts of data that are hard to draw conclusions from without any modelling. It is emphasised that the presented method does not make any predictions and assessments by itself; it is a valuable tool for modelling the educational experience of its user and exploiting the past data or data resulting from its use.
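The prediction/assessment duality can be shown with a toy two-node network in plain Python, with all probabilities invented for illustration (no Bayesian-network library is assumed):

```python
# Toy Bayesian network: one parent node (submits) and one child node (pass).
p_submits = 0.8                          # P(student submits assignments on time)
p_pass_given = {True: 0.9, False: 0.4}   # CPT: P(pass | submits)

# Prediction: marginal P(pass), summing over the parent node's states.
p_pass = sum(p_pass_given[s] * (p_submits if s else 1 - p_submits)
             for s in (True, False))

# Assessment: P(submits | pass) by Bayes' rule, i.e. explaining a "current" state.
p_submits_given_pass = p_pass_given[True] * p_submits / p_pass
print(round(p_pass, 3), round(p_submits_given_pass, 3))   # 0.8, 0.9
```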
Article
This chapter introduces a study which focuses on predicting college success as measured by students’ grade point averages (GPAs). The chapter also reviews prior research related to various types of predictors. Specifically, two categories of predictors are identified: ability measures and non-cognitive variables. Finally, an overview of the study is presented.
Conference Paper
We present a machine-learned model that can automatically detect when a student using an intelligent tutoring system is off-task, i.e., engaged in behavior which does not involve the system or a learning task. This model was developed using only log files of system usage (i.e. no screen capture or audio/video data). We show that this model can both accurately identify each student's prevalence of off-task behavior and can distinguish off-task behavior from when the student is talking to the teacher or another student about the subject matter. We use this model in combination with motivational and attitudinal instruments, developing a profile of the attitudes and motivations associated with off-task behavior, and compare this profile to the attitudes and motivations associated with other behaviors in intelligent tutoring systems. We discuss how the model of off-task behavior can be used within interactive learning environments which respond to when students are off-task.
Article
Data mining and knowledge discovery in databases have been attracting a significant amount of research, industry, and media attention of late. What is all the excitement about? This article provides an overview of this emerging field, clarifying how data mining and knowledge discovery in databases are related both to each other and to related fields, such as machine learning, statistics, and databases. The article mentions particular real-world applications, specific data-mining techniques, challenges involved in real-world applications of knowledge discovery, and current and future research directions in the field.
Article
Earlier studies have suggested that higher education institutions could harness the predictive power of Learning Management System (LMS) data to develop reporting tools that identify at-risk students and allow for more timely pedagogical interventions. This paper confirms and extends this proposition by providing data from an international research project investigating which student online activities accurately predict academic achievement. Analysis of LMS tracking data from a Blackboard Vista-supported course identified 15 variables demonstrating a significant simple correlation with student final grade. Regression modelling generated a best-fit predictive model for this course which incorporates key variables such as total number of discussion messages posted, total number of mail messages sent, and total number of assessments completed and which explains more than 30% of the variation in student final grade. Logistic modelling demonstrated the predictive power of this model, which correctly identified 81% of students who achieved a failing grade. Moreover, network analysis of course discussion forums afforded insight into the development of the student learning community by identifying disconnected students, patterns of student-to-student communication, and instructor positioning within the network. This study affirms that pedagogically meaningful information can be extracted from LMS-generated student tracking data, and discusses how these findings are informing the development of a customizable dashboard-like reporting tool for educators that will extract and visualize real-time data on student engagement and likelihood of success.
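A logistic model in this spirit, built on the abstract's example variables (message, mail, and assessment counts, here entirely synthetic), with recall as the share of failing students correctly identified:

```python
# Logistic regression: LMS tracking counts -> probability of a failing grade.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score

rng = np.random.default_rng(3)
n = 500
X = rng.poisson(lam=[20, 10, 5], size=(n, 3))    # messages, mails, assessments
risk = -0.08 * X[:, 0] - 0.05 * X[:, 1] - 0.2 * X[:, 2] + 3   # low activity -> risk
y_fail = (risk + rng.normal(scale=1.0, size=n) > 0).astype(int)

clf = LogisticRegression().fit(X, y_fail)
print("failing students correctly identified:",
      round(recall_score(y_fail, clf.predict(X)), 2))   # the paper reports 81%
```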
Book
During the past decade there has been an explosion in computation and information technology. With it have come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics, and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics. It is a valuable resource for statisticians and anyone interested in data mining in science or industry. The book's coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees and boosting – the first comprehensive treatment of this topic in any book. This major new edition features many topics not covered in the original, including graphical models, random forests, ensemble methods, least angle regression and path algorithms for the lasso, non-negative matrix factorization, and spectral clustering. There is also a chapter on methods for "wide" data (p bigger than n), including multiple testing and false discovery rates. Trevor Hastie, Robert Tibshirani, and Jerome Friedman are professors of statistics at Stanford University. They are prominent researchers in this area: Hastie and Tibshirani developed generalized additive models and wrote a popular book of that title. Hastie co-developed much of the statistical modeling software and environment in R/S-PLUS and invented principal curves and surfaces. Tibshirani proposed the lasso and is co-author of the very successful An Introduction to the Bootstrap. Friedman is the co-inventor of many data-mining tools including CART, MARS, projection pursuit and gradient boosting.
Saarela, M., & Kärkkäinen, T. (2015). Analysing student performance using sparse data of core bachelor courses. Journal of Educational Data Mining, 7(1).
Arnold, K. E., & Pistilli, M. D. (2012). Course signals at Purdue: Using learning analytics to increase student success. In Proceedings of the 2nd International Conference on Learning Analytics and Knowledge (pp. 267-270). ACM.
Elbadrawy, A., Studham, R. S., & Karypis, G. (2015). Collaborative multiregression models for predicting students' performance in course activities.
Wright, S. (1921). Correlation and causation. Journal of Agricultural Research, 20, 557-585.