A Method for Finding Prerequisites Within a Curriculum.

Annalies Vuong, Tristan Nixon, and Brendon Towle1, Carnegie Learning
Creating an educational curriculum is a difficult task involving many variables and constraints [Wang 2005].
In any curriculum, the order of the instructional units is partly based on which units teach prerequisite
knowledge for later units. Historically, psychologists and cognitive scientists have studied the dependency
structure of information in various domains [Bergan and Jeska 1980; Griffiths and Grant 1985; Chi and
Koeske 1983]; however, many of these studies have been hampered by statistical issues such as the difficulty
of removing instructional effects when using small samples [Horne 1983]. We hypothesize that large-scale
assessment data can be analyzed to determine the dependency relationships between units in a curriculum.
This structure could then be used in generating and evaluating alternate unit sequences to test whether
omitting or re-ordering units would undermine necessary foundational knowledge building. Our method
incorporates all possible pair-wise dependency relationships in a curriculum and, for each such candidate
dependency, compares performance of students who used the potential prerequisite unit to performance of
students who did not. We implemented the method on a random sample of schools from across the U.S. that
use Carnegie Learning’s Cognitive Tutor software and its associated curricula, a sample that far exceeds
those used in previous studies both in size and in scope. The resulting structure is compared to a pre-existing
list of prerequisites created by Carnegie Learning based on student skill models. We discuss extensions of
this method, issues in interpreting the results, and possible applications. We hope that this work serves as
a step toward developing a data-driven model of curriculum design.
Additional Key Words and Phrases: data mining, curriculum, intelligent tutor
1. INTRODUCTION
Curriculum is a fundamental part of education at every scale, from a one-day class to
a four-year degree. Designing a curriculum involves balancing many competing con-
straints, not least of which is prerequisite knowledge. The method we present applies
to any scale of curricula, though our data comes from mathematics curricula spanning
one school year.
Prerequisite knowledge is here defined as the skills and information necessary to
succeed in a given instructional unit within a curriculum. This knowledge can be ac-
quired inside or outside the curriculum, giving rise to three important questions. What
prerequisite knowledge is required to successfully learn each topic in the curriculum?
Which units in the curriculum teach this knowledge? Finally, have students already
acquired this knowledge outside of the curriculum? The third question will have an
answer unique to each instructional situation. However, the first and second ques-
tions depend only on the content of the curriculum, and we believe that they can be
answered empirically.
In fact, we believe they can be combined into one question. Given an instructional
unit – call it Unit B – which prior units significantly influence students’ success in this
unit? It seems reasonable to conclude that these prior units cover some prerequisite
knowledge. This paper lays out a method, given sufficient user data, for finding the
prerequisite units for each instructional unit in a curriculum.
Creating a curriculum always involves defining prerequisites implicitly or explic-
itly. However, these definitions are usually based on expert opinion or on a theoretical
model [Bergan and Jeska 1980; Griffiths and Grant 1985; Chi and Koeske 1983] and
are rarely tested empirically. Even algorithms for designing optimal curricula may
take prerequisite relations as a given [Wang 2005]. This lack of empirical testing is
understandable, as it can be difficult to assess causal relationships or remove instructional
effects, especially in small observational studies [Horne 1983]. Using a small
sample also runs the risk of answering only the third question: do these students already
have sufficient prior knowledge? A further difficulty in determining dependency
structure is that a curriculum of multiple units will have a significant number of possible
prerequisites to test.
1Authors’ addresses: A. Vuong, T. Nixon, and B. Towle, Carnegie Learning, Inc., 437 Grant St., Pittsburgh,
PA 15219. Correspondence can be sent to avuong@carnegielearning.com.
ACM Journal Name, Vol. V, No. N, Article A, Publication date: January YYYY.
Many of these problems relate to having small sample sizes. Our data are collected
from students using Carnegie Learning’s Cognitive Tutor, by far the most widely used
intelligent tutoring system in the United States: it is currently used by over five hun-
dred thousand students in more than twenty-five hundred schools. For high school
mathematics, Carnegie Learning offers four Cognitive Tutor curricula. Each of the
curricula consists of a sequence of units; each unit consists of a sequence of sections.
Units cover distinct mathematical topics; sections cover distinct sets of problems on
that topic, with a distinct student skill model for each section. Teachers and school
administrators can create customized variations of the standard curricula by omit-
ting units, reordering units, or adding units from another curriculum, though they
cannot customize the skill models or problem sets within each unit. In school year
2008-2009, 85% of teachers used a customized curriculum. The great variety of curriculum
variations thus created provided us with a natural opportunity to compare the performance
of students as they progressed through subtly different unit sequences.
That all students used the same software was very helpful in avoiding issues raised
in previous studies: it reduced instructional effects in the data, reduced the chance of
content-based false positives via the coverage of a large number of topics, and provided
sufficiently uniform data to test almost all possible prerequisites.
2. METHOD
2.1. Data
Our data are drawn from a random one-fifth sample of all schools that used Cognitive Tutor
in the 2008-2009 school year and from which detailed logs of students’ activity in the
software were collected. This sample comprises 20,577 students from 888 schools across
the United States.
The standard Carnegie Learning high school curricula – Bridge to Algebra, Alge-
bra I, Geometry, and Algebra II – contain 175 total units, including cross-listed units.
Every unit prior to a given unit within each curriculum was considered a possible pre-
requisite; “true” prerequisites of a unit are defined to be those which have a significant
effect on the success of students in completing the target unit. The list of all possible
prerequisite relationships can be represented as a list of pairs of the form (Unit A, Unit
B), where A is prior to B in one of the four curricula. For each pair of units, Sample
A comprised all students in the data set whose curricula included both Unit A and
Unit B and who attempted at least one problem in both units. Sample B contained all
students whose curricula included Unit B but omitted Unit A and who attempted at
least one problem in Unit B. We tested all such pairs for which there was at least one
student in each sample, resulting in 3,832 tested pairs. Lacking enough information to
compare a pair of units was rare: only 30 pairs were not tested. The average Sample A
size was 325.5 students, with average Sample B size at 506.8 students.
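The pair enumeration and sample construction described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the record layout (`assigned` and `attempted` sets of unit names per student) and the function names are assumptions made for the example.

```python
from itertools import combinations


def candidate_pairs(standard_curricula):
    """Return all (Unit A, Unit B) pairs where A precedes B in some standard curriculum.

    `standard_curricula` maps a curriculum name to its ordered list of units
    (a hypothetical layout chosen for this sketch).
    """
    pairs = set()
    for units in standard_curricula.values():
        # combinations() preserves list order, so the first unit always precedes the second.
        pairs.update(combinations(units, 2))
    return pairs


def split_samples(students, unit_a, unit_b):
    """Build Sample A and Sample B for one candidate pair.

    Sample A: curriculum includes both units, and the student attempted both.
    Sample B: curriculum includes Unit B but omits Unit A, and the student attempted Unit B.
    """
    sample_a, sample_b = [], []
    for s in students:
        if unit_b not in s["assigned"] or unit_b not in s["attempted"]:
            continue  # student never worked on Unit B
        if unit_a in s["assigned"]:
            if unit_a in s["attempted"]:
                sample_a.append(s["id"])
            # assigned Unit A but never attempted it: belongs to neither sample
        else:
            sample_b.append(s["id"])
    return sample_a, sample_b
```

A pair would then be tested only when both returned lists are non-empty, matching the "at least one student in each sample" condition.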
2.2. Testing
For any given Unit B, student performance in the first section of the unit is more
likely to be affected by prior knowledge from other units than performance in the later
sections, since later sections would likely rely strongly on knowledge from earlier in
the unit. To avoid this confound, we decided to evaluate success in Unit B by looking
at student performance on the first section of the unit. Our hypothesis was that if
Unit A truly provides prerequisite knowledge for Unit B, we should see an increased
graduation rate on the first section of Unit B for students who have had some exposure
to the material in Unit A. We therefore calculated the overall graduation rate from
the first section of Unit B in each of the two samples.
In the Cognitive Tutor, a student graduates from a section only if they master all
of the section’s skills over the course of a reasonable number of problems. The system
will automatically promote them to the next section if they hit the problem limit with-
out mastering all skills. Additionally, a student may simply stop working on a section
and never return, for reasons such as leaving the class. Finally, a teacher may move
a student forward out of a section if they are not keeping pace with the rest of class.
Given the ways in which a student may leave a section without graduating, it is rea-
sonable to assume that in general the performance of students who graduated from
the first section of Unit B was better than the performance of those who did not and
thus reasonable to use graduation rate as a performance metric.
To compare graduation rates for each pair of units in our list of possible prerequisites,
we used a binomial test with α = 0.01. The binomial test for each pair looked for
a significant difference in the average graduation rate for Sample A as compared to
the rate for Sample B. If a significant difference was found, Unit A was deemed a true
prerequisite for Unit B.
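The text does not spell out the exact form of the binomial test, so the following is only one plausible reading, written as a sketch. It assumes that the graduation rate observed in Sample B serves as the null probability and that the number of graduates in Sample A is tested against it with a two-sided exact binomial test. The function names are hypothetical.

```python
from math import exp, lgamma, log


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), computed in log space to avoid overflow."""
    if p <= 0.0:
        return 1.0 if k == 0 else 0.0
    if p >= 1.0:
        return 1.0 if k == n else 0.0
    log_coeff = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coeff + k * log(p) + (n - k) * log(1.0 - p))


def binomial_test_two_sided(k, n, p0):
    """Two-sided exact p-value: total probability of outcomes no more likely than k."""
    observed = binom_pmf(k, n, p0)
    tail = sum(
        pm
        for pm in (binom_pmf(i, n, p0) for i in range(n + 1))
        if pm <= observed * (1.0 + 1e-7)  # small tolerance for floating-point ties
    )
    return min(1.0, tail)


def is_true_prerequisite(grads_a, n_a, grads_b, n_b, alpha=0.01):
    """Assumed decision rule: Sample B's graduation rate is the null probability."""
    p0 = grads_b / n_b
    return binomial_test_two_sided(grads_a, n_a, p0) < alpha
```

Under this reading, `is_true_prerequisite(90, 100, 60, 100)` flags the pair as significant, while a Sample A rate of 61 out of 100 against the same Sample B does not. A two-proportion test that also accounts for sampling error in Sample B would be a reasonable alternative.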
3. RESULTS
The average number of true prerequisites for each unit was approximately 9.6 out
of an average of 21.2 possible prerequisites. Overall, a little over 43% of all possible
prerequisites were found to be true prerequisites.
As part of our analysis, we compared the data-driven prerequisites to the list of
prerequisite relationships which is already included in the Cognitive Tutor, a list gen-
erated primarily from shared skills in the cognitive models for different units. Figure 1
shows all possible (Unit A, Unit B) pairs across the four curricula as elements in a 175
by 175 matrix, with Unit A as the row and Unit B as the column. The order of the units
within each curriculum and the order of curricula (Bridge to Algebra, Algebra I, Geom-
etry, Algebra II) was preserved in each axis of the table, hence the triangular regions
indicate the standard curricula. In Figure 1, red indicates a non-significant relation-
ship between Units A and B, with green indicating a significant relationship. Colored
blocks outside the triangular regions display the results of cross-listed units, and white
blocks indicate inter-curricular pairs not on our list of possible prerequisites.2 Yellow
blocks mark the small number of cases where there was insufficient data.
Figure 2 shows the difference between the set of empirically-derived prerequisite
relationships and the set of prerequisite relationships given by the Cognitive Tutor
for the same list of possible prerequisites. Dark green indicates a true prerequisite
relationship found in both sets; light green a relationship deemed not prerequisite by
both. Orange indicates a prerequisite relationship listed by the Cognitive Tutor that
was not found in the data. Yellow shows a relationship found in the data but not in
the Cognitive Tutor set. Overall, there was 56% agreement between the two methods
(percentage of green blocks) and 14% agreement on true prerequisites (percentage of
dark green blocks). The next section gives possible reasons for these differences.
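The agreement figures above can be illustrated with a short sketch. It assumes both prerequisite lists are available as sets of (Unit A, Unit B) pairs and that percentages are taken over all tested pairs. The names `data_prereqs` and `tutor_prereqs` are hypothetical.

```python
def agreement_summary(tested_pairs, data_prereqs, tutor_prereqs):
    """Tally the four Figure 2 categories and the two agreement rates.

    both       -> dark green (prerequisite in both sets)
    neither    -> light green (prerequisite in neither set)
    tutor_only -> orange (listed by the Cognitive Tutor, not found in the data)
    data_only  -> yellow (found in the data, not listed by the Cognitive Tutor)
    """
    counts = {"both": 0, "neither": 0, "tutor_only": 0, "data_only": 0}
    for pair in tested_pairs:
        in_data = pair in data_prereqs
        in_tutor = pair in tutor_prereqs
        if in_data and in_tutor:
            counts["both"] += 1
        elif in_data:
            counts["data_only"] += 1
        elif in_tutor:
            counts["tutor_only"] += 1
        else:
            counts["neither"] += 1
    total = len(tested_pairs)
    if total == 0:
        raise ValueError("no tested pairs")
    overall_agreement = (counts["both"] + counts["neither"]) / total
    true_prereq_agreement = counts["both"] / total
    return counts, overall_agreement, true_prereq_agreement
```

With these definitions, the reported 56% corresponds to `overall_agreement` and the 14% to `true_prereq_agreement`, provided the denominator assumption holds.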
4. DISCUSSION
Our goal was to devise a method which would empirically determine the prerequisite
relationships in a given curriculum. It is important to note that this method is distinctly
different from methods to derive student skill models from data (e.g. [Cen et al.
2006; Barnes et al. 2005]). We feel that our method is complementary to such work.
Our analysis takes place at a larger grain-size, comparing the relationships between
units of instruction that consist of distinct skill models. The similarity or difference between
these skill models may play an important role in the prerequisite relationship
between the units.
2Units were compared only within each curriculum. A study allowing for cross-curricular prerequisites could
yield further pertinent results.
Fig. 1. Heatmap of data-based prerequisite relationships. Red indicates a non-significant relationship between
Units A and B; green, a significant relationship; yellow, insufficient data to compare; and white, a
pair not on our list of possible prerequisites.
Fig. 2. Heatmap comparing data-based and cognitive model-based prerequisite relationships. Dark green
indicates a true prerequisite relationship found in both sets; light green, a relationship deemed not prerequisite
by both; orange, a prerequisite relationship listed by the Cognitive Tutor that was not found in the
data; and yellow, a relationship found in the data but not in the Cognitive Tutor.
It is interesting to note how the empirically derived prerequisite structure deviates
from that determined by domain experts. Where the empirical evidence shows a pre-
requisite not identified by a domain expert, we can imagine mathematical skills prac-
ticed in both units but not part of the focus for the units, and thus easily overlooked
in the expert analysis. Expert judgment is more suspect when it has identified a pre-
requisite that is not borne out by the data (marked in orange). However, as we see
from Figure 2, this type of disagreement between expert and empirical data was con-
fined mainly to the earlier units in each standard curriculum. It may be that a large
number of students were already sufficiently proficient on this preliminary material,
to the point where any practice on prerequisite material would make no appreciable
improvement to their performance. In this case, these could be true prerequisites but
for our population it would make more sense to treat them as false prerequisites.
These results do raise many new questions, especially where this analysis disagrees
with expert assessment, and further investigation of such units is warranted. There
are many possible hypotheses for any given prerequisite relationship. It could be that
the skills practiced in Unit A are truly foundational for those in Unit B. Conversely, it could
also be the case that the units share some subset of skills, and that it is simply
the previous practice with these skills which provides the improved performance
on Unit B. In such a case, the curriculum structure is effectively loading the learning
of those common skills onto whichever unit occurs first. It is also possible that another
metric of student performance might reveal a difference not revealed by graduation
rates. Although our method does not answer such questions, it does provide a useful
framework for focusing attention on those unit pairs which deserve more investigation.
Since there is not already an agreed-upon way to objectively determine prerequi-
sites, it is difficult to assess the validity of our method. A possible assessment method
would be to see if different measurements of student performance yield the same pre-
requisite relationships. One such measure of performance is the number of problems
done by students before graduating. Preliminary data exploration suggests that the
result with this metric will be similar.
The dependency structure of a curriculum can be a powerful piece of information.
For instance, Ohland et al. [2004] found that removing a gateway course for their
engineering major improved graduation rates for the major as a whole, suggesting that
the course was not a true prerequisite. Every curriculum is subject to time constraints,
and knowing which units can be safely omitted if necessary is important. Given this
information, we could help teachers to customize their curricula without removing
necessary prerequisite units.
Overall we feel that data-driven course design is a fruitful topic for research, which
could yield relevant and useful information in many different areas of education.
ACKNOWLEDGMENTS
The authors would like to thank Dr. Steve Ritter of Carnegie Learning for his guidance and advice.
REFERENCES
BARNES, T., BITZER, D., AND VOUK, M. 2005. Experimental analysis of the Q-matrix method in knowledge
discovery. Foundations of Intelligent Systems, 603–611.
BERGAN, J. AND JESKA, P. 1980. An examination of prerequisite relations, positive transfer among learning
tasks, and variations in instruction for a seriation hierarchy. Contemporary Educational Psychology 5, 3,
203–215.
CEN, H., KOEDINGER, K., AND JUNKER, B. 2006. Learning Factors Analysis–A general method for cognitive
model evaluation and improvement. Intelligent Tutoring Systems, 164–175.
CHI, M. AND KOESKE, R. 1983. Network representation of a child’s dinosaur knowledge. Developmental
Psychology 19, 1, 29–39.
GRIFFITHS, A. AND GRANT, B. 1985. High school students’ understanding of food webs: Identification of a
learning hierarchy and related misconceptions. Journal of Research in Science Teaching 22, 5, 421–436.
HORNE, S. 1983. Learning Hierarchies: a critique. Educational Psychology 3, 1, 63–77.
OHLAND, M., YUHASZ, A., AND SILL, B. 2004. Identifying and removing a calculus prerequisite as a bottleneck
in Clemson’s General Engineering curriculum. Journal of Engineering Education 93, 253–258.
WANG, Y. 2005. A GA-based methodology to determine an optimal curriculum for schools. Expert Systems
with Applications 28, 1, 163–174.
ACM Journal Name, Vol. V, No. N, Article A, Publication date: January YYYY.
... Prerequisites are defined as the necessary contexts that enable downstream activity or state in human cognitive processes [16]. In certain domains -especially education [1,24,34] -such requisites are an important consideration that constrains item selection. Context-aware recommendation systems have integrated the use of collaborative filtering with auxiliary metadata about users' current background or state, such as time sand location [22,32]. ...
... It is often also brittle, as items and their relationships with underlying contextual concepts can evolve over time. Assuming manually-labeled prerequisites [33,34] is often unrealistic due to the heavy cost of human annotation. An automatic means of inferring prerequisites is called for. ...
... An early study by Vuong et al. [34] examined the effect of learning curriculum units in various orders. Chen et al. [6] treated prerequisite relations as a Bayesian network, which requires a mapping of courses to fine-grained skill and relevant student performance data. ...
Preprint
Prerequisites can play a crucial role in users' decision-making yet recommendation systems have not fully utilized such contextual background knowledge. Traditional recommendation systems (RS) mostly enrich user-item interactions where the context consists of static user profiles and item descriptions, ignoring the contextual logic and constraints that underlie them. For example, an RS may recommend an item on the condition that the user has interacted with another item as its prerequisite. Modeling prerequisite context from conceptual side information can overcome this weakness. We propose Prerequisite Driven Recommendation (PDR), a generic context-aware framework where prerequisite context is explicitly modeled to facilitate recommendation. We first design a Prerequisite Knowledge Linking (PKL) algorithm, to curate datasets facilitating PDR research. Employing it, we build a 75k+ high-quality prerequisite concept dataset which spans three domains. We then contribute PDRS, a neural instantiation of PDR. By jointly optimizing both the prerequisite learning and recommendation tasks through multi-layer perceptrons, we find PDRS consistently outperforms baseline models in all three domains, by an average margin of 7.41%. Importantly, PDRS performs especially well in cold-start scenarios with improvements of up to 17.65%.
... This paper focuses on the concept prerequisite learning problem (Talukdar and Cohen 2012;Liang et al. 2015), where the goal is to predict whether a concept A is a prerequisite of a concept B given the pair (A, B). Although there has been research on learning prerequisites (Vuong, Nixon, and Towle 2011;Talukdar and Cohen 2012;Liang et al. 2015;Wang et al. 2016;Scheines, Silver, and Goldin 2014;Liu et al. 2016;Pan et al. 2017), the lack of large scale prerequisite labels remains a major obstacle for effective machine learning-based solutions. A possible solution for learning a good classifier given limited labeled instances is active learning (Angluin 1988;Cohn, Ghahramani, and Jordan 1996;Settles 2010), since it is designed to learn classifiers with significantly fewer labels by actively directing the query to the most "valuable" examples. ...
... Regardless of being a relatively new research area, datadriven methods for learning concept prerequisite relations have been explored in multiple works. Established methods in educational data mining have been devoted to analyzing student assessment data which records the performance of students on different items (Vuong, Nixon, and Towle 2011;Scheines, Silver, and Goldin 2014;Chen, Wuillemin, and Labat 2015;Chen, González-Brenes, and Tian 2016). Such methods require that the association between test items and handcrafted knowledge components is set beforehand and are not applicable for processing a large concept set. ...
Article
Concept prerequisite learning focuses on machine learning methods for measuring the prerequisite relation among concepts. With the importance of prerequisites for education, it has recently become a promising research direction. A major obstacle to extracting prerequisites at scale is the lack of large-scale labels which will enable effective data-driven solutions. We investigate the applicability of active learning to concept prerequisite learning.We propose a novel set of features tailored for prerequisite classification and compare the effectiveness of four widely used query strategies. Experimental results for domains including data mining, geometry, physics, and precalculus show that active learning can be used to reduce the amount of training data required. Given the proposed features, the query-by-committee strategy outperforms other compared query strategies.
... in educational data mining has been devoted to analyzing student assessment data which records the performance of students on different items (e.g. units, sections, etc.) (Vuong, Nixon, and Towle 2011). Existing approaches aim to discover prerequisite relations among certain performance variables such as handcrafted knowledge components and skills. ...
... Design of data-driven methods for automatically discovering prerequisite relations has been explored in multiple works. Established methods in educational data mining have been developed based on the automatic analysis of the assessment data acquired by students' performance (Vuong, Nixon, and Towle 2011). In addition, Liu et al. (2011) proposed a classification method for mining learning dependencies between knowledge units in text books. ...
Article
Prerequisite relations among concepts play an important role in many educational applications such as intelligent tutoring system and curriculum planning. With the increasing amount of educational data available, automatic discovery of concept prerequisite relations has become both an emerging research opportunity and an open challenge. Here, we investigate how to recover concept prerequisite relations from course dependencies and propose an optimization based framework to address the problem. We create the first real dataset for empirically studying this problem, which consists of the listings of computer science courses from 11 U.S. universities and their concept pairs with prerequisite labels. Experiment results on a synthetic dataset and the real course dataset both show that our method outperforms existing baselines.
... Item-to-skill mappings (also called Q-matrix) are desirable because they allow more interpretable diagnostic information. They also allow for discovering prerequisites among items based on their skills mapping [3,4]. They are standard representations used to specify the relationships between individual test items and target skills. ...
... However, with a large number of participants, the other possible factors can be minimized. For example, Vuong, Nixon, and Towle [25] analyzed prerequisites within a curriculum of 888 schools in the US involving 20,577 students using a binomial test to look for possible correlations and to compare the performance of the control and experimental groups. ...
Article
Full-text available
Determining prerequisite requirements is vital for successful curriculum development and student on-schedule completion of the course of study. This study adapts the Receiver Operating Characteristic (ROC) curve analysis to determine a threshold grade in a prerequisite course necessary for passing the next course in a sequence. This method was tested on a dataset of Calculus 1 and Calculus 2 grades of 164 undergraduate students majoring in mathematics at a private university in Kazakhstan. The results showed that while the currently used practice of setting prerequisite grade requirements is accurately identifying successful completions of Calculus 2, the ROC method is more accurate in identifying possible failures in Calculus 2. The findings also indicate that prior completion of Calculus 1 is positively associated with success in a Calculus 2 course. Thus, this study contributes to the field of mathematics education by providing a new data-driven methodology for determining the optimal threshold grade for mathematics prerequisite courses.
... In recent years, the mining of prerequisite relations among concepts has become a focus of researchers. Prerequisite relations among concepts have played a significant role in many applied fields of education, such as curriculum planning and design [1,2], student knowledge tracking [3], concept map building [4,5], learner ranking [6,7,8], document reading list generation [9,10], evaluation of students' learning status [11], and so on. ...
Article
Full-text available
Nowadays, online learning is becoming more and more popular. Various online learning platforms provide a huge amount of learning resources for learners around the world. When choosing or sorting learning resources, learners often need to know what important knowledge concepts are addressed in each learning resource. Exploring the prerequisite relations among concepts is of great significance to educational planning. In this paper, we extracted concepts from the content of course descriptions and proposed a new approach that uses both course-based features and Wikipedia-based features to discover the prerequisite relations between knowledge concepts. Experiments on both English and Chinese datasets show that the proposed method outperforms existing baselines.
Article
The interdisciplinary field of the learning sciences encompasses educational psychology, cognitive science, computer science, and anthropology, among other disciplines. The Cambridge Handbook of the Learning Sciences, first published in 2006, is the definitive introduction to this innovative approach to teaching, learning, and educational technology. In this significantly revised third edition, leading scholars incorporate the latest research to provide seminal overviews of the field. This research is essential in developing effective innovations that enhance student learning - including how to write textbooks, design educational software, prepare effective teachers, and organize classrooms. The chapters illustrate the importance of creating productive learning environments both inside and outside school, including after school clubs, libraries, and museums. The Handbook has proven to be an essential resource for graduate students, researchers, consultants, software designers, and policy makers on a global scale.
Article
In recent years, the use of analytics and data mining – methodologies that extract useful information from large datasets – has become commonplace in science and business. When these methods are used in education, they are referred to as learning analytics (LA) and educational data mining (EDM). For example, adaptive learning platforms – those that respond uniquely to each learner – require learning analytics to model the learner’s current state of knowledge. The researcher can conduct second-by-second analyses of phenomena that occur over long periods of time or in an individual learning session. Large datasets are required for these analyses. In most cases, the data are gathered automatically – such as keystrokes, eye movement, or assessments – and are analyzed using algorithms based in learning sciences research. This chapter reviews prediction methods, structure discovery, relationship mining, and discovery with models.
Chapter
Educational data mining techniques are very useful to analyze learner performance in purpose to optimize the approach of item-to-skill mapping. Therefore computing a degree of similarity between items using different measures based on the performance of the learner toward items, enhance the clustering of different items into knowledge components. This paper proposes a computational framework to group the elements of the corresponding knowledge component. The first phase of the framework represents a variation of Pearson coefficient to measure item similarity by applying a penalty score that is calculated from the number of hints taken by the learner during solving two items. The second phase applies a dimensionality reduction using deep auto encoders to improve the clustering accuracy. The experimental results show that clustering based on the penalized Pearson coefficient and the deep dimensionality reduction (PPC+DDR) outperforms basic clustering based on different similarity methods, with approximately +0.2 in Mean silhouette coefficient.KeywordsEducational data miningLearner modelMachine learningDeep learningItem-to-skill mappingClustering
Article
To use educational resources efficiently and uncover the relations among MOOCs (massive open online courses), a knowledge graph was built for MOOCs on four major platforms: Coursera, EDX, XuetangX, and ICourse. This paper demonstrates the whole process of educational knowledge graph construction for reference. This knowledge graph, the largest knowledge graph of MOOC resources at present, stores and represents five classes, 11 kinds of relations, and 52 779 entities with their corresponding properties, amounting to more than 300 000 triples. Notably, 24 188 concepts are extracted from text attributes of MOOCs and linked directly to the corresponding Wikipedia entries or to the closest entries calculated semantically, which provides a normalized representation of knowledge and a more precise description of MOOCs, going well beyond enriching words with explanatory links. In addition, prerequisites discovered by direct extraction are viewed as an essential supplement that augments the connectivity of the knowledge graph. This knowledge graph can serve as a collection of unified MOOC resources for learners and as abundant data for researchers working on MOOC-related applications, such as prerequisite mining.
Conference Paper
A cognitive model is a set of production rules or skills encoded in intelligent tutors to model how students solve problems. It is usually generated by brainstorming and iterative refinement between subject experts, cognitive scientists and programmers. In this paper we propose a semi-automated method for improving a cognitive model called Learning Factors Analysis that combines a statistical model, human expertise and a combinatorial search. We use this method to evaluate an existing cognitive model and to generate and evaluate alternative models. We present improved cognitive models and make suggestions for improving the intelligent tutor based on those models.
Conference Paper
The q-matrix method, a new method for data mining and knowledge discovery, is compared with factor analysis and cluster analysis in analyzing fourteen experimental data sets. This method creates a matrix-based model that extracts latent relationships among observed binary variables. Results show that the q-matrix method offers several advantages over factor analysis and cluster analysis for knowledge discovery. The q-matrix method can perform fully unsupervised clustering, where the number of clusters is not known in advance. It also yields better error rates than factor analysis, and is comparable in error to cluster analysis. The q-matrix method also allows for automatic interpretation of the data sets. These results suggest that the q-matrix method can be an important tool in automated knowledge discovery.
Article
This study investigated the hypothesis that prerequisite skills in a seriation learning hierarchy mediate positive transfer for superordinate skills. In addition, the effect of instructional conditions involving modeling combined with variations in feedback on skill acquisition at different levels in the seriation sequence was examined. Application of the White and Clark test of inclusion revealed a close correspondence between hypothesized and observed prerequisite relations in the seriation hierarchy. However, structural analysis of the data revealed that positive transfer did not occur in many instances in which it was expected. In addition, instructional variations were found not to be uniformly effective at all levels in the sequence.
Article
Learning hierarchies have received much attention from developmental and instructional psychologists. This article notes that conceptual confusions and methodological deficiencies occur in much of the research so far published. The conceptual confusions concern the terminology used; the ‘likelihood’ or ‘causal’ relationships between elements in the hierarchy; the distinction between ‘prerequisition’ and ‘positive transfer’; the distinction between single pieces of learning and classes of learning; the inclusivity of hierarchical relationships. The methodological deficiencies arise from an inability to measure ‘causal’ relationships; the omission of measurements of ‘positive transfer’; the difficulty of measuring the range of possible relationships within a hierarchy; the need to remove instructional effects from hierarchy validation studies. It is concluded that these confusions and deficiencies preclude data from learning hierarchy studies from being used to diagnose learning failure and in test construction. Suggestions for alternatives to, and improvements on, current methods are made.
Article
A review of prerequisites often reveals that reasons for requiring a prerequisite may no longer prevail due to curriculum or course changes. Based on a study of a curriculum bottleneck unrelated to required mastery, the prerequisite structure in Clemson University's General Engineering curriculum (the common first-year curriculum for all engineering students) was changed so that Calculus I could be taken in the second semester. Student record analysis shows both the magnitude of the bottleneck prior to the policy change and the effect on student enrollment practices after the policy change. Longitudinal studies show a statistically significant improvement in retention in engineering, adding to the body of evidence that it is important to retention that students start college mathematics at a level for which they are prepared.
Article
A 4-yr-old male's knowledge of 40 dinosaurs was elicited from 2 tasks. The data gathered from these knowledge-production protocols were used to map 2 interrelated semantic networks of dinosaurs, viewed as concept nodes connected by links. The 2 mappings corresponded to 2 sets of dinosaurs (20 each), partitioned on the basis of external criteria: mother's subjective judgment of the S's knowledge of each dinosaur and the frequency of mention in the S's dinosaur books. Comparisons of the structure of the 2 mappings were based on 3 attributes: (a) number of links, (b) strength of links, and (c) the internal cohesion of the network in terms of higher-order groupings and specific patterns of interlinkages. The validity of the differential structures of the 2 mappings was verified by the corresponding differential memory performance. The better structured set of dinosaurs was more easily remembered and retained by the S over a year than the less structured set of dinosaurs.
Article
Developing an understanding of the nature of food webs is an important topic in today's biology curricula. The relationships represented in a food web are rule-like in nature. Hence, it should be possible to construct a learning hierarchy for this concept. A hierarchy leading to the ability to determine how a change in the size of one population can affect another population in the same web but not on the same chain was hypothesized. Data from 200 subjects were extremely consistent with the hierarchy. A second major focus related to the identification of specific misconceptions held by subjects for food webs. The need to identify students' misconceptions of important concepts has been expressed widely in the recent science education literature. In the present article, an argument is presented for the usefulness of learning hierarchies in this work. Specific misconceptions and the frequencies of their occurrence are reported.
Article
Arranging an optimal curriculum at universities is a difficult problem that must be solved under multiple constraints. The arrangement of the curriculum can be divided into two sub-issues: curriculum evaluation and curriculum scheduling. Each sub-issue involves its own set of parameters. Traditional linear programming methods do not obtain satisfactory results for this complex problem. This study developed a methodology that uses genetic algorithms, a promising tool for solving complex optimization problems, to deal with the multiple constraints. The results of this study indicated that the methodology was feasible and that it significantly reduced the amount of time required for arranging the optimal curriculum.
BARNES, T., BITZER, D., AND VOUK, M. 2005. Experimental analysis of the Q-matrix method in knowledge discovery. Foundations of Intelligent Systems, 603-611.