Poster

Mining Frequent Learning Pathways from a Large Educational Dataset

Authors:
  • Playpower Labs

Abstract

In this paper, we describe data mining techniques used to extract frequent learning pathways from a large educational dataset. These pathways were extracted as a directed graph that encodes student learning processes. Our dataset contains more than 800 million interactions of over 3 million anonymized students in an online learning platform. Process mining on large and complex datasets often yields incomprehensible process models. However, if we cluster the data into groups that follow similar processes, we can greatly improve process mining results. To this end, we developed a sequence clustering algorithm that groups students who followed similar learning pathways. To extract frequent learning pathways from these clusters, we developed a graph-based process discovery algorithm that reveals the sequences of learning activities that many students followed. These sequences represent highways of student learning.
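The graph-based step described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify. It assumes each student is a list of activity names, counts direct transitions as weighted directed edges, and keeps the edges that meet a support threshold. The names `build_transition_graph`, `frequent_edges`, and `min_support`, and the sample activities, are hypothetical.

```python
from collections import Counter


def build_transition_graph(sequences):
    """Count how often each activity is directly followed by another.

    Returns a Counter mapping (source, target) edges to their frequency,
    which is one simple way to encode a directed weighted graph.
    """
    edges = Counter()
    for seq in sequences:
        for src, dst in zip(seq, seq[1:]):
            edges[(src, dst)] += 1
    return edges


def frequent_edges(edges, min_support):
    """Keep only transitions taken at least `min_support` times."""
    return {edge: n for edge, n in edges.items() if n >= min_support}


# Hypothetical activity sequences for three students in one cluster.
sequences = [
    ["intro", "video", "quiz", "review"],
    ["intro", "video", "quiz"],
    ["intro", "quiz", "review"],
]
graph = build_transition_graph(sequences)
highways = frequent_edges(graph, min_support=2)
print(highways)
```

Raising `min_support` prunes rare transitions, which is one simple way to keep the resulting graph readable on large logs.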

... Next, a graph-based process discovery algorithm is developed [16]. It can reveal the sequence of learning activities engaged by several students. ...
... Some research also specializes in comparing these methods [7,9]. Varying data sets are also used, such as those related to scores awarded in the early years and complex ones acquired from various fields of study [16]. The dataset used is the scores awarded to the academic and extracurricular activities of students from the Business Management and Architecture study programs in 2012. ...
... In previous studies, the research topics vary and are related to research from various perspectives. Most of the previous research focuses only on clustering and data mining methods [5,7,11,13,14,16], including those that strengthen clustering with ensemble learning [8]. These studies show that K-Means is the most effective and widely used clustering method. ...
Article
Educational data mining is a technique to evaluate the educational process of university students, especially in their early stages. Most preliminary studies focus on observing courses undertaken by students from one semester to the next to predict their success rate. However, besides studying, many students are also involved in non-academic activities, which tend to affect their grades. Therefore, the research aims to determine the effect of student activities on grades while taking into account their academic activities. The method used for clustering is K-Means. Data are collected by observing students’ activity patterns in lectures. The research is conducted in two study programs at Petra Christian University: Business Management and Architecture. The results show that the K-Means method gives good results. The clusters formed from the data show non-homogeneous groups and produce insights from several groups. The results show a tendency for students’ performance to increase along with the number of activities and points earned. Most students have increased activities during busy times in the third, fourth, fifth, and sixth semesters. The peak is between the fifth and sixth semesters. Then, it starts to decrease in the seventh and eighth semesters. Therefore, students’ activities in the Business Management study program affect performance significantly. Meanwhile, in the Architecture study program, they have an insignificant effect on performance.
... Reference
  • Graphical behavior models: [Okamoto et al. 2014], [Brown et al. 2015], [Kumar 2014], [Jiang et al. 2014]
  • Graph-based log analysis: [Gitinabard et al. 2017], [Dawson 2008], [Gardner and Brooks 2018], [Grawemeyer et al. 2017], [Kovanovic et al. 2014], [Mostafavi and Barnes 2014]
  • Graphical solution representations: [Li et al. 2015], [Gašević et al. 2013], [Eckles and Stradley 2012], [Fire et al. 2012], [Dekel and Gal 2014]
  • Novel graph analysis techniques: [Buluz and Yilmaz 2017], [Patel et al. 2017], [Costa et al. 2019], [London and Németh 2014], [Zheng et al. 2017]
  • Tools and technologies for automatic concept hierarchy extraction: [Chen et al. 2018], [Cateté et al. 2014], [Yang et al. 2010]
  • Tools and technologies for graph grammar (pattern) recognition: [Lynch 2014]
  • Relevant analytical tools and standard problems: [Sheshadri et al. 2014]
  • Social network analysis: [Wejnert 2010], [Kovanovic et al. 2014], [Garcia-Saiz et al. 2014]
The Directed Acyclic Graph is the most widely used representation (34%), followed by the Directed Acyclic Multigraph with 23% (see Table 2). In general, Directed Acyclic Graphs are often used in the analysis of user interactions in virtual environments. ...
...
  • Directed Acyclic Graph: [Buluz and Yilmaz 2017], [Chen et al. 2018], [Kumar 2014], [Cateté et al. 2014], [Costa et al. 2019], [Wejnert 2010], [London and Németh 2014], [Zheng et al. 2017], [Gitinabard et al. 2017]
  • Directed Cyclic Graph: [Yang et al. 2010]
  • Directed Acyclic Multigraph: [Dawson 2008], [Li et al. 2015], [Brown et al. 2015], [Patel et al. 2017], [Kovanovic et al. 2014], [Ramos et al. 2016]
  • Simple Acyclic Graph: [Gašević et al. 2013], [Grawemeyer et al. 2017], [Lynch 2014], [Jiang et al. 2014]
  • Tree: [Gardner and Brooks 2018], [Dekel and Gal 2014], [Mostafavi and Barnes 2014], [Sheshadri et al. 2014]
... [Zheng et al. 2017], [London and Németh 2014], [Dekel and Gal 2014]. One focused on grammar models for learning [Lynch 2014], [Kumar 2014] and the others on tools [Ramos et al. 2016], [Chen et al. 2018], process learning and Social Network analysis [Fire et al. 2012], [Yang et al. 2010], [Gardner and Brooks 2018], [Eckles and Stradley 2012], [Wejnert 2010]. ...
... Python NetworkX [Grawemeyer et al. 2017], [Fire et al. 2012] is a Python package for graph analysis and visualization, and offers one of the largest varieties of algorithms for graph analysis. The Rgraph plugin for R [Fire et al. 2012], [Patel et al. 2017] presents some algorithms and metrics for visualization and extraction of information in graphs. ...
... By examining students' learning behavior patterns, we can assess the curriculum and evaluate students' performance during the study period. Some works on EPM have been applied in academic institutions to predict students' dropout [2], to recommend a correct path to students [3], to identify, examine and facilitate the evaluation of learning processes [4]-[6], and to enhance the curriculum design [3], [7], [8]. ...
... The curriculum guideline is designed for eight (8) consecutive semesters, which is the standard length of study for bachelor's degrees. During the eight (8) semesters, there are two phases, i.e., the preparation phase (1st semester to 4th semester) and the bachelor's phase (5th semester to 8th semester). ...
Article
Educational process mining is one of the research domains that utilizes students’ learning behavior to match students’ actual courses taken and the designed curriculum. While most works attempt to deal with the case perspective (i.e., traces of the cases), the temporal case perspective has not been discussed. The temporal case perspective aims to understand the temporal patterns of cases (e.g., students’ learning behavior in a semester). This study proposes an extension of cluster evolution analysis, called profile-based cluster evolution analysis, for students’ learning behavior based on profiles. The results show three salient features: (1) cluster generation; (2) within-cluster generation; and (3) time-based between-cluster generation. The cluster evolution phase extends the existing cluster evolution analysis with a dynamic profiler. The model was tested on actual educational data of the Information System Department in Indonesia. The results showed the learning behavior of students who graduated on time, the learning behavior of students who graduated late, and the learning behavior of students who dropped out. Observing the migration of students from cluster to cluster in each semester showed that students changed their learning behavior. Furthermore, there were distinct learning behavior migration patterns for each category of students based on their performance. The migration pattern can indicate to academic stakeholders which students are likely to drop out, graduate on time, or graduate late. These results can be used as recommendations to academic stakeholders for curriculum assessment and development and dropout prevention.
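The idea of tracking student migration between clusters from one semester to the next can be sketched as a simple tally. This is only an illustration, not the profile-based cluster evolution analysis proposed in the study. It assumes a per-student list of cluster labels, one per semester, and every name and value below is hypothetical.

```python
from collections import Counter


def migration_counts(assignments):
    """Count moves between clusters across consecutive semesters.

    `assignments` maps a student id to a list of cluster labels, one per
    semester. Keys of the result are (semester, from_cluster, to_cluster),
    where `semester` is the 1-based semester the student is leaving.
    """
    moves = Counter()
    for clusters in assignments.values():
        for sem, (src, dst) in enumerate(zip(clusters, clusters[1:]), start=1):
            moves[(sem, src, dst)] += 1
    return moves


# Hypothetical cluster labels for three students over three semesters.
assignments = {
    "s1": ["A", "A", "B"],
    "s2": ["A", "B", "B"],
    "s3": ["A", "A", "B"],
}
moves = migration_counts(assignments)
print(moves[(2, "A", "B")])  # students moving from A to B after semester 2
```

Grouping these counts by student outcome (on time, late, dropped out) would be one way to compare migration patterns between categories.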
... However, most sequential pattern mining algorithms produce large outputs with many similar patterns that make interpretation of the resulting sequences difficult. To resolve this, we developed a sequence summarization approach, based on Patel et al.'s sequence clustering technique (Patel et al., 2017). A directed weighted graph was constructed to represent a set of sequences as a cluster, with each annotated behavior represented by a node, and the number of times each multiple-node behavior sequence occurred represented by the weights of the edges, similar to (Patel et al., 2017). ...
... To resolve this, we developed a sequence summarization approach, based on Patel et al.'s sequence clustering technique (Patel et al., 2017). A directed weighted graph was constructed to represent a set of sequences as a cluster, with each annotated behavior represented by a node, and the number of times each multiple-node behavior sequence occurred represented by the weights of the edges, similar to (Patel et al., 2017). We differ from Patel et al. by then summarizing each sequence cluster into a single "core sequence" which reflects the most representative behavior sequences for that cluster. ...
Conference Paper
Prior work has found benefits of interpersonal closeness, or rapport, on student learning, but has primarily investigated its impact on learning outcomes, not learning processes. Moreover, such work often analyzes the direct impact of dyadic features like rapport on learning, without considering the role played by individual factors, such as learners' prior knowledge and self-efficacy. In this paper, we investigate the intertwined impact that rapport, self-efficacy, and prior knowledge have on the process and outcomes of peer tutoring. We find that peer tutors in high-rapport dyads offer more help and prompt their tutees to explain their reasoning more than low-rapport dyads, with tutees in such dyads verbalizing their problem-solving process and proposing more steps and answers. Meanwhile, rapport is associated with increased procedural performance, but tutees' self-efficacy and prior knowledge moderate the effect of rapport on tutees' conceptual performance.
... Patel et al. [8] analyse the LPs commonly used by students. The analysis uses sequence clustering and graph-based process mining on educational data. ...
Chapter
This paper promotes the idea of learning process management in the e-learning system. A personalized adaptive e-learning system is used in this research that comprises three developed topic acquisition sequences: teacher, learner, or optimal topic sequences. The learner has the ability to switch between the aforementioned topic sequences. The system stores data about the course acquisition process. The analysis of the stored data demonstrated that slightly more than half of the students used the teacher topic sequence; students who chose the learner or optimal topic sequence got higher grades in topics; and the grades of half of the students who used the optimal and teacher topic sequences were at the same level. The obtained results were used as the justification for the improvement of the existing optimal topic sequence development method. As a result, an algorithm for the recommended learning path development is proposed in this paper. The topics of the course and the links between them are described using a weighted directed graph. The weight of every edge and vertex of the graph is calculated based on the parameter values describing the topic. Afterwards, the recommended learning path is taken to be the lowest-weight path found in the weighted directed graph using a search.
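A lowest-weight path search over a weighted directed graph can be sketched with Dijkstra's algorithm. The chapter does not say which search it uses, so this is a swapped-in standard technique, not the authors' method. The sketch assumes non-negative edge weights only; vertex weights, which the chapter also mentions, could be folded into the weights of incoming edges. The topic names and weights are hypothetical.

```python
import heapq


def lowest_weight_path(graph, start, goal):
    """Dijkstra's algorithm over a dict of the form {node: {neighbor: weight}}.

    Weights must be non-negative. Returns (total_weight, path) or None
    when the goal cannot be reached from the start.
    """
    queue = [(0, start, [start])]
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for neighbor, weight in graph.get(node, {}).items():
            if neighbor not in settled:
                heapq.heappush(queue, (cost + weight, neighbor, path + [neighbor]))
    return None


# Hypothetical course topics; a lower weight means a cheaper step.
topics = {
    "A": {"B": 2, "C": 5},
    "B": {"C": 1, "D": 4},
    "C": {"D": 1},
    "D": {},
}
result = lowest_weight_path(topics, "A", "D")
print(result)  # (4, ['A', 'B', 'C', 'D'])
```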
... In the paper [11] Patel N. et al. describe data mining techniques used to extract frequent learning pathways from a large educational dataset. A graph-based process discovery algorithm was developed for extracting these frequent learning pathways from the clusters of data and revealed the sequences of learning activities that many students followed. ...
Conference Paper
The article presents a graph model for building learning pathways between academic courses based on the relations between keywords. In this study, we use a two-layer graph model in which the first layer is represented by academic courses and the second by keywords. Relations between courses are built based on the relations between the keywords. An approach to validating an individual learning pathway against the syllabus keyword graph is proposed.
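One way to picture a two-layer model is to derive course-to-course links from keyword-to-keyword links. The sketch below rests on assumptions the abstract does not confirm: keyword relations are treated as undirected, identical keywords count as related, and the link weight is the number of related keyword pairs. The function name `course_links` and all data are hypothetical.

```python
from itertools import combinations


def course_links(course_keywords, keyword_relations):
    """Link two courses when any of their keywords are equal or related.

    `course_keywords` maps a course name to a set of keywords (first layer
    to second layer). `keyword_relations` lists undirected keyword pairs.
    Returns {(course_a, course_b): number_of_related_keyword_pairs}.
    """
    related = {frozenset(pair) for pair in keyword_relations}
    links = {}
    for a, b in combinations(sorted(course_keywords), 2):
        shared = {
            (ka, kb)
            for ka in course_keywords[a]
            for kb in course_keywords[b]
            if ka == kb or frozenset((ka, kb)) in related
        }
        if shared:
            links[(a, b)] = len(shared)
    return links


# Hypothetical syllabus keywords and one keyword-level relation.
course_keywords = {
    "Algebra": {"matrix", "vector"},
    "Graphics": {"vector", "rendering"},
    "Statistics": {"probability"},
    "ML": {"regression", "matrix"},
}
keyword_relations = [("probability", "regression")]
links = course_links(course_keywords, keyword_relations)
print(links)
```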
... [8,10,14]), while others have clustered these sequences to find groups of similarly behaving students in classes (e.g. [4,11,7,17,18,19,25,30]). These studies have often been able to identify relevant clusters among the students such as "confirmers" and "non-confirmers" [11] or "behind", "on-track", "auditing", and "out" [17]. ...
Preprint
Students' interactions with online tools can provide us with insights into their study and work habits. Prior research has shown that these habits, even as simple as the number of actions or the time spent on online platforms can distinguish between the higher performing students and low-performers. These habits are also often used to predict students' performance in classes. One key feature of these actions that is often overlooked is how and when the students transition between different online platforms. In this work, we study sequences of student transitions between online tools in blended courses and identify which habits make the most difference between the higher and lower performing groups. While our results showed that most of the time students focus on a single tool, we were able to find patterns in their transitions to differentiate high and low performing groups. These findings can help instructors to provide procedural guidance to the students, as well as to identify harmful habits and make timely interventions.
Conference Paper
Students' interactions with online tools can provide us with insights into their study and work habits. Prior research has shown that these habits, even as simple as the number of actions or the time spent on online platforms can distinguish between the higher performing students and low-performers. These habits are also often used to predict students' performance in classes. One key feature of these actions that is often overlooked is how and when the students transition between different online platforms. In this work, we study sequences of student transitions between online tools in blended courses and identify which habits make the most difference between the higher and lower performing groups. These findings can help instructors to provide procedural guidance to the students, as well as to identify harmful habits and make timely interventions.
... Although these complex models can be used for predicting student actions over time more precisely, they have little interpretability. In this case, rather than analyzing sequence data directly, we can use clustering methods to group similar sequences together and analyze them separately [3,8]. ...
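Grouping similar sequences before analyzing them can be sketched with an edit distance and a simple leader-style clustering pass. This is a generic illustration under stated assumptions, not the method of any paper cited here. Levenshtein distance is a swapped-in similarity measure, and `leader_clustering`, `max_distance`, and the sample sequences are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of activities."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def leader_clustering(sequences, max_distance):
    """Put each sequence in the first cluster whose leader is close enough.

    The leader of a cluster is its first member. The result depends on
    input order, which is a known limitation of this simple scheme.
    """
    clusters = []  # list of (leader, members) pairs
    for seq in sequences:
        for leader, members in clusters:
            if edit_distance(seq, leader) <= max_distance:
                members.append(seq)
                break
        else:
            clusters.append((seq, [seq]))
    return [members for _, members in clusters]


# Hypothetical tool-usage sequences for four students.
sequences = [
    ["video", "quiz", "forum"],
    ["video", "quiz"],
    ["forum", "forum", "wiki"],
    ["forum", "wiki"],
]
groups = leader_clustering(sequences, max_distance=1)
print(len(groups))  # 2
```

Each resulting group can then be analyzed separately, for example by counting its most common transitions.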
Chapter
This paper examines the use of “pacing plots” to represent variations in student learning sequences within a digital curriculum. Pacing plots are intuitive and flexible data visualizations that have the potential to reveal the diversity of blended classroom instructional models. By using curriculum pacing plots, we identified several common implementation patterns in real-world classrooms. After analyzing two years’ worth of data from over 150,000 students in a digital math curriculum, we found that a PCA and K-Means clustering approach was able to discover pedagogically relevant instructional practices.
Chapter
A detailed illustration of how large scale digital learning systems can incrementally reduce the poverty-achievement gap
Chapter
Learning to read is one of the most important achievements of early childhood, and sets the stage for future success. Even prior to school entry, children’s foundational literacy skills predict their later academic trajectories (Duncan et al., 2007; La Paro & Pianta, 2000; Lloyd, 1969; Lloyd, 1978). Children learn to read with differing levels of ease, with an estimated 5-17% of school-age children who struggle with reading acquisition (Shaywitz, 1998). The individual variation in children’s reading skills can be predicted by genetic, environmental, academic and socio-demographic factors (for review, see Peterson & Pennington, 2015). This chapter focuses on the relationship between reading development and socioeconomic status (SES), with attention to both cognitive outcomes and neural mechanisms. First, we describe SES and its relation to academic achievement in general, and reading development in particular. Second, we examine environmental factors that can potentially give rise to socioeconomic disparities in reading, such as early language/literacy exposure and access to books. Next, we explore the link between SES and reading disability (RD), including a focus on intervention approaches and treatment response. Finally, we summarize remaining questions and propose future research priorities.
Book
Over the last two decades, research in the study of poverty has begun to provide evidence that advances our understanding of how early adversity associated with material, social, and cultural deprivation modulates brain development. When such evidence is used in other disciplinary contexts, early brain development is frequently referred to as a predictor of adaptive behavior and economic productivity in adult life, or of the impossibility of such achievements because of the supposed immutability of the long-term negative impacts of childhood poverty. Claims of this kind, which have political as well as scientific implications, need to be analyzed properly in light of the available evidence, since they could lead to misconceptions and overgeneralizations with the potential to affect the criteria for investment, design, implementation, and evaluation of actions in the area of early childhood. This book is offered as a contribution in this direction. The chapters, written by leading figures in the neuroscientific and cognitive study of poverty, provide evidence that feeds hypotheses and reflections in line with the main questions of this area of study.
Conference Paper
Capturing students' behavioral patterns through analysis of sequential interaction logs is an important task in educational data mining and could enable more effective and personalized support during the learning processes. This study aims at discovery and temporal analysis of learners' study patterns in MOOC assessment periods. We propose two different methods to achieve this goal. First, following a hypothesis-driven approach, we identify learners' study patterns based on their interaction with lectures and assignments. Through clustering of study pattern sequences, we capture different longitudinal activity profiles among learners and describe their properties. Second, we propose a temporal clustering pipeline for unsupervised discovery of latent patterns in learners' interaction data. We model and cluster activity sequences at each time step and perform cluster matching to enable tracking learning behaviours over time. Our proposed pipeline is general and applicable in different learning environments such as MOOC and ITS. Moreover, it allows for modeling and temporal analysis of interaction data at different levels of actions granularity and time resolution. We demonstrate the application of this method for detecting latent study patterns in a MOOC course.
Conference Paper
Information systems supporting business processes generate event data which provide the starting point for a range of process mining techniques. The lion's share of real-life processes is complex and ad hoc, which creates problems for traditional process mining techniques that cannot deal with such unstructured processes. Finding mainstream and deviating cases in such data is problematic, since most cases are unique and therefore determining what is normal or exceptional may depend on many factors. Trace clustering aims to group similar cases in order to find variations of the process and to gain novel insights into the process at hand. However, few trace clustering techniques take the context of the process into account; most focus on the control-flow perspective only. Outlier detection techniques provide only a binary distinction between normal and exceptional behavior, or depend on a normative process model being present. As a result, existing techniques are less suited for processes with a high degree of variability. In this paper, we introduce a novel trace clustering technique that is able to find process variants as well as deviating behavior based on a set of selected perspectives. Evaluation on both artificial and real-life event data reveals that additional insights can consequently be achieved.
Conference Paper
Process Mining is a technique for extracting process models from execution logs. This is particularly useful in situations where people have an idealized view of reality. Real-life processes turn out to be less structured than people tend to believe. Unfortunately, traditional process mining approaches have problems dealing with unstructured processes. The discovered models are often "spaghetti-like", showing all details without distinguishing what is important and what is not. This paper proposes a new process mining approach to overcome this problem. The approach is configurable and allows for different faithfully simplified views of a particular process. To do this, the concept of a roadmap is used as a metaphor. Just like different roadmaps provide suitable abstractions of reality, process models should provide meaningful abstractions of operational processes encountered in domains ranging from healthcare and logistics to web services and public administration.
Book
This is the second edition of Wil van der Aalst’s seminal book on process mining, which now discusses the field also in the broader context of data science and big data approaches. It includes several additions and updates, e.g. on inductive mining techniques, the notion of alignments, a considerably expanded section on software tools and a completely new chapter of process mining in the large. It is self-contained, while at the same time covering the entire process-mining spectrum from process discovery to predictive analytics. After a general introduction to data science and process mining in Part I, Part II provides the basics of business process modeling and data mining necessary to understand the remainder of the book. Next, Part III focuses on process discovery as the most important process mining task, while Part IV moves beyond discovering the control flow of processes, highlighting conformance checking, and organizational and time perspectives. Part V offers a guide to successfully applying process mining in practice, including an introduction to the widely used open-source tool ProM and several commercial products. Lastly, Part VI takes a step back, reflecting on the material presented and the key open challenges. Overall, this book provides a comprehensive overview of the state of the art in process mining. It is intended for business process analysts, business consultants, process managers, graduate students, and BPM researchers.
Conference Paper
The extraction of student behavior is an important task in educational data mining. A common approach to detect similar behavior patterns is to cluster sequential data. Standard approaches identify clusters at each time step separately and typically show low performance for data that inherently suffer from noise, resulting in temporally inconsistent clusters. We propose an evolutionary clustering pipeline that can be applied to learning data, aiming at improving cluster stability over multiple training sessions in the presence of noise. Our model selection is designed such that relevant cluster evolution effects can be captured. The pipeline can be used as a black box for any intelligent tutoring system (ITS). We show that our method outperforms previous work regarding clustering performance and stability on synthetic data. Using log data from two ITS, we demonstrate that the proposed pipeline is able to detect interesting student behavior and properties of learning environments.
Chapter
Learning management systems (LMSs) can offer a great variety of channels and workspaces to facilitate information sharing and communication among participants in a course. They let educators distribute information to students, produce content material, prepare assignments and tests, engage in discussions, manage distance classes, and enable collaborative learning with forums, chats, file storage areas, news services, and so on. Some examples of commercial systems are Blackboard [1], WebCT [2], and Top-Class [3], while some examples of free systems are Moodle [4], Ilias [5], and Claroline [6]. One of the most commonly used is Moodle (modular object oriented developmental learning environment), a free learning management system enabling the creation of powerful, flexible, and engaging online courses and experiences [42]. These e-learning systems accumulate a vast amount of information that is very valuable for analyzing students' behavior and could create a gold mine of educational data [7]. They can record any student activities involved, such as reading, writing, taking tests, performing various tasks, and even communicating with peers. They normally also provide a database that stores all the system's information: personal information about the users (profile), academic results, and users' interaction data. However, due to the vast quantities of data these systems can generate daily, it is very difficult to manage data analysis manually.
Article
In modern education, various information systems are used to support educational processes. In the majority of cases, these systems have logging capabilities to audit and monitor the processes they support. At the level of a university, administrative information systems collect information about students, their enrollment in particular programs and courses, and performance such as examination grades. In addition, information about lectures, instructors, study programs, courses, and prerequisites is typically available as well. These data can be analyzed at various levels and from various perspectives, showing different aspects of organization and giving us more insight into the overall educational system. At the level of an individual course, we can consider participation in lectures, accomplishing assignments, and enrolling in midterm and final exams. However, with the development and increasing popularity of blended learning and e-learning, information systems enable us to capture activities also at different levels of granularity. Besides more traditional tasks like overall student performance or dropout prediction [1], it becomes possible to track how different learning resources (video lectures, handouts, wikis, hypermedia, quizzes) are used [2], how students progress with (software) project assignments (svn commits) [3], and how they use self-assessment tests and questionnaires [4].
P. Mukala, J. Buijs, and W. van der Aalst. Exploring students' learning behaviour in MOOCs using process mining techniques. Technical report, BPMcenter.org, 2015.