Modeling Individualization in a Bayesian Networks
Implementation of Knowledge Tracing
Zachary A. Pardos1, Neil T. Heffernan
Worcester Polytechnic Institute
Department of Computer Science
zpardos@wpi.edu, nth@wpi.edu
1 National Science Foundation funded GK-12 Fellow
Abstract. The field of intelligent tutoring systems has been using the well
known knowledge tracing model, popularized by Corbett and Anderson (1995),
to track student knowledge for over a decade. Surprisingly, models currently in
use do not allow for individual learning rates nor individualized estimates of
student initial knowledge. Corbett and Anderson, in their original articles, were
interested in trying to add individualization to their model which they
accomplished but with mixed results. Since their original work, the field has not
made significant progress towards individualization of knowledge tracing
models in fitting data. In this work, we introduce an elegant way of formulating
the individualization problem entirely within a Bayesian networks framework
that fits individualized as well as skill specific parameters simultaneously, in a
single step. With this new individualization technique we are able to show a
reliable improvement in prediction of real world data by individualizing the
initial knowledge parameter. We explore three different strategies for setting
the initial individualized knowledge parameters and report that the best strategy
is one in which information from multiple skills is used to inform each
student’s prior. Using this strategy we achieved lower prediction error in 33 of
the 42 problem sets evaluated. The implication of this work is the ability to
enhance existing intelligent tutoring systems to more accurately estimate when
a student has reached mastery of a skill. Adaptation of instruction based on
individualized knowledge and learning speed is discussed as well as open
research questions facing those that wish to exploit student and skill
information in their user models.
Keywords: Knowledge Tracing, Individualization, Bayesian Networks, Data
Mining, Prediction, Intelligent Tutoring Systems
1 Introduction
Our initial goal was simple: to show that with more data about students’ prior
knowledge, we should be able to achieve a better fitting model and more accurate
prediction of student data. The problem to solve was that there existed no Bayesian
network model to exploit per user prior knowledge information. Knowledge tracing
(KT) is the predominant method used to model student knowledge and learning over
time. This model, however, assumes that all students share the same initial prior
knowledge and does not allow for per student prior information to be incorporated.
The model we have engineered is a modification to knowledge tracing that increases
its generality by allowing for multiple prior knowledge parameters to be specified and
lets the Bayesian network determine which prior parameter value a student belongs to
if that information is not known beforehand. The improvements we see in predicting
real world data sets are palpable, with the new model predicting student responses
better than standard knowledge tracing in 33 out of the 42 problem sets with the use
of information from other skills to inform a prior per student that applied to all
problem sets. Equally encouraging was that the individualized model predicted better
than knowledge tracing in 30 out of 42 problem sets without the use of any external
data. Correlation between actual and predicted responses also improved significantly
with the individualized model.
1.1 Inception of knowledge tracing
Knowledge tracing has become the dominant method of modeling student knowledge.
It is a variation on a model of learning first introduced by Atkinson in 1972 [1].
Knowledge tracing assumes that each skill has 4 parameters; two knowledge
parameters and two performance parameters. The two knowledge parameters are:
initial (or prior) knowledge and learn rate. The initial knowledge parameter is the
probability that a particular skill was known by the student before interacting with the
tutor. The learn rate is the probability that a student will transition between the
unlearned and the learned state after each learning opportunity (or question). The two
performance parameters are: guess rate and slip rate. The guess rate is the probability
that a student will answer correctly even if she does not know the skill associated with
the question. The slip rate is the probability that a student will answer incorrectly even
if she knows the required skill. Corbett and Anderson introduced this method to the
intelligent tutoring field in 1995 [2]. It is currently employed by the cognitive tutor,
used by hundreds of thousands of students, and many other intelligent tutoring
systems to predict performance and determine when a student has mastered a
particular skill.
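As a concrete illustration of how these four parameters interact, the following minimal Python sketch applies the standard knowledge tracing update to a single hypothetical response sequence; the parameter values are placeholders chosen for illustration, not values fitted in this paper.

```python
def kt_update(p_know, correct, learn, guess, slip):
    """One knowledge-tracing step: Bayes update on the observed response,
    then the learning transition to the next opportunity."""
    if correct:
        post = p_know * (1 - slip) / (p_know * (1 - slip) + (1 - p_know) * guess)
    else:
        post = p_know * slip / (p_know * slip + (1 - p_know) * (1 - guess))
    return post + (1 - post) * learn

# Trace one hypothetical student who missed the first question and then
# answered the next three correctly.
p_know = 0.4                                   # shared prior P(L0) in standard KT
for response in [0, 1, 1, 1]:
    p_know = kt_update(p_know, response, learn=0.2, guess=0.15, slip=0.1)
print(round(p_know, 3))                        # estimated P(skill known) after four opportunities
```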
It might strike the uninitiated as a surprise that the dominant method of modeling
student knowledge in intelligent tutoring systems, knowledge tracing, does not allow
for students to have different learn rates even though it seems likely that students
differ in this regard. Similarly, knowledge tracing assumes that all students have the
same probability of knowing a particular skill at their first opportunity.
In this paper we hope to reinvigorate the field to further explore and adopt models
that explicitly represent the assumption that students differ in their individual initial
knowledge, learning rate and possibly their propensity to guess or slip.
1.2 Previous approaches to predicting student data using knowledge tracing
Corbett and Anderson were interested in implementing the learning rate and prior
knowledge individualization that was originally described as part of Atkinson’s model
of learning. They accomplished this but with limited success. They created a two step
process for learning the parameters of their model where the four KT parameters were
learned for each skill in the first step and the individual weights were applied to those
parameters for each student in the second step. The second step used a form of
regression to fit student specific weights to the parameters of each skill. Various
factors were also identified for influencing the individual priors and learn rates [3].
The results [2] of their work showed that while the individualized model’s predictions
correlated better with the actual test results than the non-individualized model, their
individualized model did not show an improvement in the overall accuracy of the
predictions.
More recent work by Baker et al [4] has found utility in the contextualization of the
guess and slip parameters using a multi-stage machine-learning process that also
uses regression to fine tune parameter values. Baker’s work has shown an
improvement in the internal fit of their model versus other knowledge tracing
approaches when correlating inferred knowledge at a learning opportunity with the
actual student response at that opportunity but has yet to validate the model with an
external validity test.
One of the knowledge tracing approaches compared to the contextual guess and
slip method was the Dirichlet approach introduced by Beck et al [5]. The goal of this
method was not individualization or contextualization but rather to learn plausible
knowledge tracing model parameters by biasing the values of the initial knowledge
parameter. The investigators of this work engaged in predicting student data from a
reading tutor but found only a 1% increase in performance over standard knowledge
tracing (0.006 on the AUC scale). This improvement was achieved by setting model
parameters manually based on the authors’ understanding of the domain and not by
learning the parameters from data.
1.3 The ASSISTment System
Our dataset consisted of student responses from The ASSISTment System, a web
based math tutoring system for 7th-12th grade students that provides preparation for
the state standardized test by using released math problems from previous tests as
questions on the system. Tutorial help is given if a student answers a question
incorrectly or asks for help. The tutorial help guides the student toward the required
knowledge by breaking the problem into sub-questions, called scaffolding, or by giving
the student hints on how to solve the question.
2 The Model
Our model uses Bayesian networks to learn the parameters of the model and predict
performance. Reye [6] showed that the formulas used by Corbett and Anderson in
their knowledge tracing work could be derived from a Hidden Markov Model or
Dynamic Bayesian Network (DBN). Corbett and colleagues later released a toolkit [7]
using non-individualized Bayesian knowledge tracing to allow researchers to fit their
own data and student models with DBNs.
2.1 The Prior Per Student model vs. standard Knowledge Tracing
The model we present in this paper focuses only on individualizing the prior
knowledge parameter. We call it the Prior Per Student (PPS) model. The difference
between PPS and Knowledge Tracing (KT) is the ability to represent a different prior
knowledge parameter for each student. Knowledge Tracing is a special case of this
prior per student model and can be derived by fixing all the priors of the PPS model to
the same values or by specifying that there is only one shared student ID. This
equivalence was confirmed empirically.
Fig. 1. The topology and parameter description of Knowledge Tracing and PPS
The two model designs are shown in Figure 1. Initial knowledge and prior knowledge
are synonymous. The individualization of the prior is achieved by adding a student
node. The student node can take on values that range from one to the number of
students being considered. The conditional probability table of the initial knowledge
node is therefore conditioned upon the student node value. The student node itself
also has a conditional probability table associated with it which determines the
probability that a student will be of a particular ID. The parameters for this node are
fixed to be 1/N where N is the number of students. The parameter values set for this
node are not relevant since the student node is an observed node that corresponds to
the student ID and need never be inferred.
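The same idea can be written down directly in table form. The sketch below is a simplified, table-based rendering of the PPS topology (not the authors' Bayes net toolkit implementation), with hypothetical prior values; it shows that because the student node is always observed, conditioning the initial knowledge node on it amounts to a per-student lookup.

```python
import numpy as np

# Table-form sketch of the PPS topology: the student node S has a fixed uniform
# distribution and the initial-knowledge node is conditioned on S. The prior
# values below are hypothetical.
n_students = 3
p_student = np.full(n_students, 1.0 / n_students)   # P(S = s), fixed to 1/N and never inferred
p_L0_given_s = np.array([0.80, 0.35, 0.55])          # P(L0 = known | S = s): one prior per student

# Because the student node is always observed (it is just the student ID),
# conditioning reduces to selecting that student's entry of the table.
student_id = 1
print(p_L0_given_s[student_id])                      # 0.35, fed into the usual KT update
```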
This model can be easily changed to individualize learning rates instead of prior
knowledge by connecting the student node to the subsequent knowledge nodes thus
training an individualized P(T) conditioned upon student as shown in Figure 2.
Fig. 2. Graphical depiction of our individualization modeling technique applied to the
probability of learning parameter. This model is not evaluated in this paper but is presented to
demonstrate the simplicity in adapting our model to other parameters.
2.2 Parameter Learning and Inference
There are two distinct steps in knowledge tracing models. The first step is learning the
parameters of the model from all student data. The second step is tracing an individual
student’s knowledge given their respective data. All knowledge tracing models allow
for initial knowledge to be inferred per student in the second step. The original KT
work [2] that individualized parameters added an additional step between steps one and two
to fit individual weights to the general parameters learned in step one. The PPS model
allows for the individualized parameters to be learned along with the non-
individualized parameters of the model in a single step. Assuming there is variance
worth modeling in the individualization parameter, we believe that a single step
procedure allows for more accurate parameters to be learned since a global best fit to
the data can now be searched for instead of a best fit of the individual parameters after
the skill specific parameters are already learned.
In our model each student has a student ID represented in the student node. This
number is presented during step one to associate a student with his or her prior
parameter. In step two, the individual student knowledge tracing, this number is again
presented along with the student’s respective data in order to again associate that
student with the individualized parameters learned for that student in the first step.
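To make the notion of single-step fitting concrete, the sketch below expresses one likelihood over the shared learn, guess and slip parameters together with every student's prior, and hands it to a generic optimizer. The paper itself learns these parameters with EM in a Bayesian network; the direct likelihood maximization, the scipy optimizer, and the toy data shown here are illustrative assumptions rather than the authors' procedure.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical responses: one sequence per student for a single skill.
data = {750: [0, 1, 1, 1], 751: [0, 1, 1, 0], 752: [1, 1, 0, 0]}
students = sorted(data)

def seq_loglik(responses, prior, learn, guess, slip):
    """Log-likelihood of one response sequence under the two-state KT model."""
    p_know, ll = prior, 0.0
    for r in responses:
        p_correct = p_know * (1 - slip) + (1 - p_know) * guess
        ll += np.log(p_correct if r else 1 - p_correct)
        post = p_know * (1 - slip) / p_correct if r else p_know * slip / (1 - p_correct)
        p_know = post + (1 - post) * learn
    return ll

def neg_loglik(theta):
    learn, guess, slip = theta[:3]          # skill-specific parameters, shared by all students
    priors = theta[3:]                      # one prior per student, fit in the same step
    return -sum(seq_loglik(data[s], priors[i], learn, guess, slip)
                for i, s in enumerate(students))

x0 = np.concatenate(([0.2, 0.15, 0.1], np.full(len(students), 0.5)))
fit = minimize(neg_loglik, x0, bounds=[(0.01, 0.99)] * len(x0), method="L-BFGS-B")
print(fit.x[:3], fit.x[3:])                 # shared parameters, then per-student priors
```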
3 External Validity: Student Performance Prediction
In order to test the real world utility of the prior per student model, we used the last
question of each of our problem sets as the test question. For each problem set we
trained two separate models: the prior per student model and the standard knowledge
tracing model. Both models then made predictions of each student’s last question
responses, which could then be compared to the students’ actual responses.
3.1 Dataset description
Our dataset consisted of student responses to problem sets that satisfied the following
constraints:
- Items in the problem set must have been given in a random order
- A student must have answered all items in the problem set in one day
- The problem set must have data from at least 100 students
- There are at least four items in the problem set of the exact same skill
- Data is from Fall of 2008 to Spring of 2010
Forty-two problem sets matched these constraints. Only the items within the
problem set with the exact same skill tagging were used. 70% of the items in the 42
problem sets were multiple choice, 30% were fill in the blank (numeric). The size of
our resulting problem sets ranged from 4 items to 13. There were 4,354 unique
students in total, with each problem set having an average of 312 students (standard
deviation 201) and each student completing an average of 3.1 problem sets.
Table 1. Sample of the data from a five item problem set

Student ID   1st response   2nd response   3rd response   5th response
750          0              1              1              1
751          0              1              1              0
752          1              1              0              0
In Table 1, each response represents either a correct or incorrect answer to the
original question of the item. Scaffold responses are ignored in our analysis and
requests for help are marked as incorrect responses by the system.
3.2 Prediction procedure
Each problem set was evaluated individually by first constructing the appropriate
sized Bayesian network for that problem set. In the case of the individualized model,
the size of the constructed student node corresponded to the number of students with
data for that problem set. All the data for that problem set, except for responses to the
last question, was organized into an array to be used to train the parameters of the
network using the Expectation Maximization (EM) algorithm. The initial values for
the learn rate, guess and slip parameters were set to different values between 0.05 and
0.90 chosen at random. After EM had learned parameters for the network, student
performance was predicted. The prediction was done one student at a time by entering,
as evidence to the network, the responses of the particular student except for the
response to the last question. A static unrolled dynamic Bayesian network was used.
This enabled individual inferences of knowledge and performance to be made about
the student at each question including the last question. The probability of the student
answering the last question correctly was computed and saved to later be compared to
the actual response.
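A stripped-down sketch of this prediction step is given below: assuming parameters have already been learned by EM (the values shown are placeholders), the student's responses to all but the last question are entered as evidence and the probability of a correct last response is computed with the same per-step update sketched in Section 1.1.

```python
def predict_last(responses, prior, learn, guess, slip):
    """P(correct) on the held-out last question, given the earlier responses as evidence."""
    p_know = prior
    for r in responses:                                   # evidence: all but the last question
        p_correct = p_know * (1 - slip) + (1 - p_know) * guess
        post = p_know * (1 - slip) / p_correct if r else p_know * slip / (1 - p_correct)
        p_know = post + (1 - post) * learn                # learning transition to the next opportunity
    return p_know * (1 - slip) + (1 - p_know) * guess     # predicted P(correct) on the last question

# A student with an individualized prior of 0.35 who answered the first three questions 0, 1, 1:
print(round(predict_last([0, 1, 1], prior=0.35, learn=0.2, guess=0.15, slip=0.1), 3))
```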
3.3 Approaches to setting the individualized initial knowledge values
In the prediction procedure, due to the number of parameters in the model, care had to
be given to how the individualized priors would be set before the parameters of the
network were learned with EM. There were two decisions we focused on: a) what
initial values should the individualized priors be set to and b) whether or not those
values should be fixed or adjustable during the EM parameter learning process. Since
it was impossible to know the ground truth prior knowledge for each student for each
problem set, we generated three heuristic strategies for setting these values, each of
which will be evaluated in the results section.
3.3.1 Setting initial individualized knowledge to random values
One strategy was to treat the individualized priors exactly like the learn, guess and
slip parameters by setting them to random values to then be adjusted by EM during
the parameter learning process. This strategy effectively learns a prior per student per
skill. This is perhaps the most naïve strategy that assumes there is no means of
estimating a prior from other sources of information and no better heuristic for setting
prior values. To further clarify, if there are 600 students there will be 600 random
values between 0 and 1 set for P(L0) for each skill. EM will then have 600 parameters to
learn in addition to the learn, guess and slip parameters of each skill. For the non-
individualized model, the singular prior was set to a random value and was allowed to
be adjusted by EM.
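A short sketch of this initialization, assuming 600 students and the 0.05-0.90 starting range described in Section 3.2:

```python
import numpy as np

# Random-values strategy: every student's prior starts at a random value and is left
# free for EM to adjust, effectively a prior per student per skill.
rng = np.random.default_rng(0)
n_students = 600
init_priors = rng.uniform(0.0, 1.0, size=n_students)            # 600 additional free parameters
init_learn, init_guess, init_slip = rng.uniform(0.05, 0.90, size=3)  # shared starting values
```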
3.3.2 Setting initial individualized knowledge based on 1st response heuristic
This strategy was based on the idea that a student’s prior is largely a reflection of their
performance on the first question with guess and slip probabilities taken into account.
If a student answered the first question correctly, their prior was set to one minus an
ad-hoc guess value. If they answered the first question incorrectly, their prior was set
to an ad-hoc slip value. Ad-hoc guess and slip values are used because ground truth
guess and slip values cannot be known and because these values must be used before
parameters are learned. The accuracy of these values could largely impact the
effectiveness of this strategy. An ad-hoc guess value of 0.15 and slip value of 0.10
were used for this heuristic. Note that these guess and slip values are not learned by
EM and are separate from the performance parameters. The non-individualized prior
was set to the mean of the first responses and was allowed to be adjusted while the
individualized priors were fixed. This strategy will be referred to as the “cold start
heuristic” due to its bootstrapping approach.
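The heuristic itself reduces to a two-line rule, sketched here with the ad-hoc guess and slip values given above:

```python
AD_HOC_GUESS, AD_HOC_SLIP = 0.15, 0.10

def cold_start_prior(first_response_correct: bool) -> float:
    """Fix a student's prior from their first response using the ad-hoc guess/slip values."""
    return 1 - AD_HOC_GUESS if first_response_correct else AD_HOC_SLIP

print(cold_start_prior(True), cold_start_prior(False))   # 0.85, 0.10
```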
3.3.3 Setting initial individualized knowledge based on global percent correct
This last strategy was based on the assumption that there is a correlation between
student performance on one problem set to the next, or from one skill to the next. This
is also the closest strategy to a model that assumes there is a single prior per student
that is the same across all skills. For each student, a percent correct was computed,
averaged over each problem set they completed. This was calculated using data from
all of the problem sets they completed except the problem set being predicted. If a
student had only completed the problem set being predicted then her prior was set to
the average of the other students’ priors. The single KT prior was also set to the average
of the individualized priors for this strategy. The individualized priors were fixed
while the non-individualized prior was adjustable.
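A small sketch of this computation, on hypothetical data, where the held-out problem set is excluded and students with no other data fall back to the mean of the remaining priors:

```python
def percent_correct_priors(responses_by_set, held_out_set):
    """responses_by_set: {student_id: {problem_set_id: [0/1 responses]}}"""
    priors = {}
    for student, sets in responses_by_set.items():
        other = [r for ps, rs in sets.items() if ps != held_out_set for r in rs]
        if other:
            priors[student] = sum(other) / len(other)     # percent correct on other problem sets
    fallback = sum(priors.values()) / len(priors) if priors else 0.5
    for student in responses_by_set:
        priors.setdefault(student, fallback)              # no other data: use the average prior
    return priors

data = {750: {"A": [1, 1, 0, 1], "B": [1, 0, 1, 1]},
        751: {"B": [0, 1, 0, 0]},
        752: {"B": [1, 1, 1, 0]}}
print(percent_correct_priors(data, held_out_set="B"))     # 750 uses set A; the others use the fallback
```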
3.4 Performance prediction results
The prediction performance of the models was calculated in terms of mean absolute
error (MAE). The mean absolute error for a problem set was calculated by taking the
mean of the absolute difference between the predicted probability of correct on the
last question and the actual response for each student. This was calculated for each
model’s prediction of correct on the last question. The model with the lowest mean
absolute error for a problem set was deemed to be the more accurate predictor of that
problem set. Correlation was also calculated between actual and predicted responses.
Table 2. Prediction accuracy and correlation of each model and initial prior strategy

                             Most accurate predictor (of 42)   Avg. correlation
P(L0) Strategy                   PPS          KT                PPS       KT
Percent correct heuristic         33           8               0.3515    0.1933
Cold start heuristic              30          12               0.3014    0.1726
Random parameter values           26          16               0.2518    0.1726
Table 2 shows the number of problem sets that PPS predicted more accurately than
KT and vice versa in terms of MAE for each prior strategy. This metric was used
instead of average MAE to avoid taking an average of averages. With the percent
correct heuristic, the PPS model was able to better predict student data in 33 of the 42
problem sets. The binomial with p = 0.50 tells us that the probability of 33 or more
successes in 42 trials is << 0.05 (the cutoff is 27 to achieve statistical significance), indicating
a result that was not the product of random chance. In one problem set the MAE of
PPS and KT were equal resulting in a total other than 42 (33 + 8 = 41). The cold start
heuristic, which used the 1st response from the problem set and two ad-hoc parameter
values, also performed well, better predicting 30 of the 42 problem sets, which was
also a statistically reliable result. We recalculated MAE for PPS and KT for the
percent correct heuristic, this time taking the mean absolute difference between the
rounded probability of correct on the last question and actual response for each
student. The result was that PPS predicted better than KT in 28 out of the 42 problem
sets and tied KT in MAE in 10 of the problem sets leaving KT with 4 problem sets
predicted more accurately than PPS with the recalculated MAE. This demonstrates a
meaningful difference between PPS and KT in predicting actual student responses.
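The binomial computation above can be reproduced in one line; this is only a verification sketch, not part of the evaluation pipeline:

```python
from scipy.stats import binom

# Probability of 33 or more successes in 42 trials when p = 0.50.
p_value = binom.sf(32, 42, 0.5)    # P(X >= 33) = survival function evaluated at 32
print(p_value < 0.05, p_value)     # True; well below 0.05
```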
The correlation between the predicted probability of last response and actual last
response using the percent correct strategy was also evaluated for each problem set.
The PPS model had a higher correlation coefficient than the KT model in 32 out of 39
problem sets. A correlation coefficient was not able to be calculated for the KT model
in three of the problem sets due to a lack of variation in prediction across students.
This occurred in one problem set for the PPS model. The average correlation
coefficient across all problem sets was 0.1933 for KT and 0.3515 for PPS using the
percent correct heuristic. The MAE and correlation of the random parameter strategy
using PPS was better than KT. This was surprising since the PPS random parameter
strategy represents a prior per student per skill, which could be considered an
over-parameterization of the model. This is evidence to us that the PPS model may
outperform KT in prediction under a wide variety of conditions.
3.4.1 Response sequence analysis of results
We wanted to further inspect our models to see under what circumstances they
correctly and incorrectly predicted the data. To do this we looked at response
sequences and counted how many times each model’s prediction of the last question was right
or wrong (rounding predicted probability of correct). For example: student response
sequence [0 1 1 1] means that the student answered incorrectly on the first question
but then answered correctly on the following three. The PPS (using percent correct
heuristic) and KT models were given the first three responses in addition to the
parameters of the model to predict the fourth. If PPS predicted 0.68 and KT predicted
0.72 probability of correct for the last question, they would both be counted as
predicting that instance correctly. We conducted this analysis on the 11 problem sets
of length four. There were 4,448 total student response sequence instances among the
11 problem sets. Tables 3 and 4 show the top sequences in terms of number of
instances where both models predicted the last question correctly (Table 3) and
incorrectly (Table 4). Tables 5-6 show the top instances of sequences where one
model predicted the last question correctly but the other did not.
Table 3. Predicted correctly by both

# of Instances   Response sequence
1167             1 1 1 1
340              0 1 1 1
253              1 0 1 1
252              1 1 0 1

Table 4. Predicted incorrectly by both

# of Instances   Response sequence
251              1 1 1 0
154              0 1 1 0
135              1 1 0 0
106              1 0 1 0

Table 5. Predicted correctly by PPS only

# of Instances   Response sequence
175              0 0 0 0
84               0 1 0 0
72               0 0 1 0
61               1 0 0 0

Table 6. Predicted correctly by KT only

# of Instances   Response sequence
75               0 0 0 1
54               1 0 0 1
51               0 0 1 1
47               0 1 0 1
Table 3 shows the sequences most frequently predicted correctly by both models.
These happen to also be among the top 5 occurring sequences overall. The top
occurring sequence [1 1 1 1] accounts for more than 1/3 of the instances. Table 4
shows that the sequence where students answer all questions correctly except the last
question is most often predicted incorrectly by both models. Table 5 shows that PPS
is able to predict the sequence where no problems are answered correctly. In no
instances does KT predict sequences [0 1 1 0] or [1 1 1 0] correctly. This sequence
analysis may not generalize to other datasets but it provides a means to identify areas
the model can improve in and where it is most strong. Figure 3 shows a graphical
representation of the distribution of sequences predicted by KT and PPS versus the
actual distribution of sequences. This distribution combines the predicted sequences
from all 11 of the four item problem sets. The response sequences are sorted by
frequency of actual response sequences from left to right in descending order.
Fig. 3. Actual and predicted sequence distributions of PPS (percent correct heuristic) and KT
The average residual of PPS is smaller than that of KT but, as the chart shows, not by
much. This suggests that while PPS has been shown to provide reliably better
predictions, the increase in performance prediction accuracy may not be substantial.
4 Contribution
In this work we have shown how any Bayesian knowledge tracing model can easily
be extended to support individualization of any or all of the four KT parameters using
the simple technique of creating a student node and connecting it to the parameter
node or nodes to be individualized. The model we have presented allows for
individualized and skill specific parameters of the model to be learned simultaneously
in a single step, thus enabling global best-fit parameters to potentially be learned,
something that is not possible with multi-step parameter learning methods [2,4].
We have also shown the utility of using this technique to individualize the prior
parameter by demonstrating reliable improvement over standard knowledge tracing in
predicting real world student responses. The superior performance of the model that
uses PPS based on the student’s percent correct across all skills makes a significant
scientific suggestion that it may be more important to model a single prior per student
across skills rather than a single prior per skill across students, as is the norm.
5 Discussion and Future Work
We hope this paper is the beginning of a resurgence in attempting to better
individualize and thereby personalize students’ learning experiences in intelligent
tutoring systems.
We would like to know when using a prior per student is not beneficial. Certainly
if in reality all students had the same prior per skill then there would be no utility in
modeling an individualized prior. On the other hand, if student priors for a skill are
highly varied, which appears to be the case, then individualized priors will lead to a
better fitting model by allowing the variation in that parameter to be captured.
Is an individual parameter per student necessary or can the same or better
performance be achieved by grouping individual parameters into clusters? The
relatively high performance of our cold start heuristic model suggests that much can
be gained by grouping students into one of two priors based on their first response to
a given skill. While this heuristic worked, we suspect there are superior
representations and ones that allow for the value of the cluster prior to be learned
rather than set ad-hoc as we did. Ritter et al [8] recently showed that clustering of
similar skills can drastically reduce the number of parameters that need to be learned
when fitting hundreds of skills while still maintaining a high degree of fit to the data.
Perhaps a similar approach can be employed to find clusters of students and learn
their parameters instead of learning individualized parameters for every student.
Our work here has focused on just one of the four parameters in knowledge
tracing. We are particularly excited to see if by explicitly modeling the fact that
students have different rates of learning we can achieve higher levels of prediction
accuracy. The questions and tutorial feedback a student receives could be adapted to
his or learning rate. Student learning rates could also be reported to teachers allowing
them to more precisely or more quickly understand their classes of students. Guess
and slip individualization is also possible and a direct comparison to Baker’s
contextual guess and slip method would be an informative piece of future work.
We have shown that choosing a prior per student representation over the prior per
skill representation of knowledge tracing is beneficial in fitting our dataset; however,
a superior model is likely one that combines the attributes of the student with the
attributes of a skill. How to design this model that properly treats the interaction of
these two pieces of information is an open research question for the field. We believe
that in order to extend the benefit of individualization to new users of a system,
multiple problem sets must be linked in a single Bayesian network that uses evidence
from the multiple problem sets to help trace individual student knowledge and more
fully reap the benefits suggested by the percent correct heuristic.
This work has concentrated on knowledge tracing, however, we recognize there are
alternatives. Draney, Wilson and Pirolli [9] have introduced a model they argue is
more parsimonious than knowledge tracing due to having fewer parameters.
Additionally, Pavlik et al [10] have reported using different algorithms, as well as
brute force, for fitting the parameters of their models. We also point out that there
are more standard models that do not track knowledge, such as item response theory,
which have been used widely inside and outside of the ITS field for estimating individual
student and question parameters. We know there is value in these other approaches and strive as a
field to learn how best to exploit information about students, questions and skills
towards the goal of a truly effective, adaptive and intelligent tutoring system.
Acknowledgements
We would like to thank all of the people associated with creating the ASSISTment
system listed at www.ASSISTment.org. We would also like to acknowledge funding
from the US Department of Education, the National Science Foundation, the Office of
Naval Research and the Spencer Foundation. All of the opinions expressed in this
paper are those of the authors and do not necessarily reflect the views of our funders.
References
1. Atkinson, R. C., Paulson, J. A. An approach to the psychology of instruction.
Psychological Bulletin, 1972, 78, 49-61.
2. Corbett, A. T., & Anderson, J. R. (1995). Knowledge tracing: modeling the acquisition of
procedural knowledge. User Modeling and User-Adapted Interaction, 4, 253-278.
3. Corbett A. and Bhatnagar A. (1997). Student Modeling in the ACT Programming Tutor:
Adjusting a Procedural Learning Model with Declarative Knowledge. In User Modeling:
Proceedings of the 6th International Conference, pp. 243-254.
4. Baker, R.S.J.d., Corbett, A.T., Aleven, V.: More Accurate Student Modeling Through
Contextual Estimation of Slip and Guess Probabilities in Bayesian Knowledge Tracing. In:
Woolf, B., Aimeur, E., Nkambou, R., Lajoie, S. (Eds.) Intelligent Tutoring Systems. LNCS,
vol. 5091/2008, pp. 406-415. Springer Berlin (2008)
5. Beck, J.E., Chang, K.M.: Identifiability: A Fundamental Problem of Student Modeling. In:
Conati, C., McCoy, K., Paliouras, G. (Eds.) User Modeling 2007. LNCS, vol. 4511/2009,
pp. 137-146. Springer Berlin (2007)
6. Reye, J. (2004). Student modelling based on belief networks. International Journal of
Artificial Intelligence in Education: Vol. 14, 63-96.
7. Chang, K.M., Beck, J.E., Mostow, J., & Corbett, A.: A Bayes Net Toolkit for Student
Modeling in Intelligent Tutoring Systems. In: Ikeda, M., Ashley, K., Chan, T.W. (Eds.)
Intelligent Tutoring Systems. LNCS, vol. 4053/2006, pp. 104-113. Springer Berlin (2006)
8. Ritter, S., Harris, T., Nixon, T., Dickison, D., Murray, C., Towle, B.(2009). Reducing the
knowledge tracing space. In Proceedings of the 2nd International Conference on
Educational Data Mining. pp. 151-160. Cordoba, Spain.
9. Draney, K. L., Pirolli, P., & Wilson, M. (1995). A measurement model for a complex
cognitive skill. In P. D. Nichols, S. F. Chipman, & R. L. Brennan (Eds.), Cognitively
diagnostic assessment (pp. 103-125). Hillsdale, NJ: Erlbaum.
10. Pavlik, P.I., Cen, H., Koedinger, K.R. (2009). Performance Factors Analysis - A New
Alternative to Knowledge Tracing. In Proceedings of the 14th International Conference
on Artificial Intelligence in Education. Brighton, UK, 531-538.