ChapterPDF Available

Insights into Learning Competence Through Probabilistic Graphical Models


Abstract and Figures

One-digit multiplication problems is one of the major fields in learning mathematics at the level of primary school that has been studied over and over. However, the majority of related work is focusing on descriptive statistics on data from multiple surveys. The goal of our research is to gain insights into multiplication misconceptions by applying machine learning techniques. To reach this goal, we trained a probabilistic graphical model of the students’ misconceptions from data of an application for learning multiplication. The use of this model facilitates the exploration of insights into human learning competence and the personalization of tutoring according to individual learner’s knowledge states. The detection of all relevant causal factors of the erroneous students answers as well as their corresponding relative weight is a valuable insight for teachers. Furthermore, the similarity between different multiplication problems - according to the students behavior - is quantified and used for their grouping into clusters. Overall, the proposed model facilitates real-time learning insights that lead to more informed decisions.
Content may be subject to copyright.
Insights into Learning Competence through
Probabilistic Graphical Models?
Abstract. One-digit multiplication problems is one of the major fields
in learning mathematics at the level of primary school that has been
studied over and over. However, the majority of related work is focus-
ing on descriptive statistics on data from multiple surveys. The goal of
our research is to gain insights into multiplication misconceptions by
applying machine learning techniques. To reach this goal, we trained a
probabilistic graphical model of the students’ misconceptions from data
of an application for learning multiplication. The use of this model fa-
cilitates the exploration of insights into human learning competence and
the personalization of tutoring according to individual learner’s knowl-
edge states. The detection of all relevant causal factors of the erroneous
students answers as well as their corresponding relative weight is a valu-
able insight for teachers. Furthermore, the similarity between different
multiplication problems - according to the students behavior - is quan-
tified and used for their grouping into clusters. Overall, the proposed
model facilitates real-time learning insights that lead to more informed
Keywords: Bayesian networks Probabilistic graphical models Learn-
ing analytics
?The authors declare that there are no conflicts of interest and no ethical issues, no
particular funding was achieved.
Draft - originally published in: Saranti, A., Taraghi, B., Ebner, M., Holzinger, A. (2019) Insights into
Learning Competence Through Probabilistic Graphical Models. In: Machine Learning and Knowledge Extraction.
Third IFIP TC 5, TC 12, WG 8.4, WG 8.9, WG 12.9 International Cross-Domain Conference, CD-MAKE 2019,
Canterbury, UK, August 26–29, 2019, Proceedings. pp. 250-271
1 Introduction
1.1 Previous Work
The field of learning analytics is of increasing importance for educational research
[44], [17], [20]. Moreover, it aims to assist the learning process by providing teach-
ers a deeper insight into learning processes and learning results. Teachers play an
essential role because they are responsible for intervening in a pedagogical ade-
quate manner.Recently, the field makes heavy use of statistical machine learning
[40]. Whilst educational data mining targets on automating learning activities,
learning analytics supports educators in their daily routine.
The so called “1x1 trainer” 6is a learning analytics application developed by
the department Educational Technology of Graz University of Technology, Aus-
tria. It uses the benefits of both fields, learning analytics as well as educational
data mining [18], [19], [42].
The application poses exercises to students from the multiplication table
with one decimal digit operands. The algorithm of the “1x1 trainer” adapts the
sequence of given questions subject to the students answers individually, in order
to improve individual learning progress. However, technically, this needs to react
adaptively to the changes of the learning progress of a student. In that way, it
would support each student according to the user’s distinct learning progress
over the whole learning period. This underlying personalized adaptive learning
algorithm shall discover weak mathematical knowledge of single students and
alerts teachers just in time to adequately intervene.
The work is based on previous research that used data gathered by the “1x1
trainer”. Firstly, different mathematical questions were roughly classified accord-
ing to the learners answers. Questions were considered to be more difficult than
others when students required more attempts to answer them [47], [50]. Some
specific questions could be identified as difficult for the majority of the users.
The next step was to analyze the explanation of the errors made. Therefore,
error types were assigned to falsely answered questions, which correspond to
the innate cognitive and conceptual learning shortcomings of the users [49]. The
“relative difficulty” of those questions - 2 ×3 seems to be simpler than 7 ×8 -
played no role in identifying the error types. More explanation on error types
follows in section 2.
1.2 Bayesian Student Models
There is a plethora of learning applications that use probabilistic graphical mod-
els (also called Bayesian models/networks [38]) to model student’s knowledge.
These models have started making an impact in the research of causal learning
and inference generally, but there are good arguments that even children’s causal
learning could also be modeled in this way [11].
Most applications belong to the category of intelligent tutoring systems (ITS)
[14] or adaptive educational systems (AES) [4]. The main goal of an ITS is to
6, Last accessed 25 August 2018
Insights into Learning Competence through Probabilistic Graphical Models 3
provide personalized recommendations according to the different learning styles,
whereas AES adapt the learning content as well as its sequence according to
the student’s profile [41]. As explained in the literature review by Chrysafiadi &
Virvou [12], there are two classes of intelligent tutoring systems: Systems that
make diagnosis with the student’s knowledge, misconceptions, learning style or
cognitive state, and systems that plan a personalized strategy using diagnosis for
each learner individually. Student modelling is considered a subproblem of user
modelling which is of central importance to ITS since otherwise each student is
treated the same [36].
Primarily, we are interested in Bayesian networks because of their ability
to model uncertainty [35] and, at the same time, to support a decision making
process. The user modelling goals of a Bayesian network for knowledge modelling
is mainly to have an adaptive estimation of the knowledge itself, since it may
increase or decrease during the learning process [5]. Since scalar models and fuzzy
logic approaches [15] have lower precision, structural models are built with the
assumption that the knowledge is composed mainly by independent parts. On the
other hand, bug/perturbation models [12] represent errors and misconceptions
of the student. In this case, the Bayesian network is used to find the error that
most probably caused the observable behavior (also called evidence) [35] which
is called credit/blame assignment problem [37]. Bayesian networks can model
the assumption that a wrongly answered question having two potential causes is
most probably caused by the one that is more prevalent, according to the data
provided so far. Sometimes, random slips or typos are included in the model and
do not rely on assumptions as for example: A wrongly answered question does
not necessarily mean that the student does not know a concept completely, or a
correctly answered one wasn’t a guess. The structure in both cases constitutes
the qualitative model; its definition uses domain knowledge and (optionally)
data. The parametrization is learned from the data during a training phase and
constitutes the quantitative model.
The reason for the creation of the model is in some cases to assist the teachers
of large classes that suffer from a high dropout rate [51]. A model recognizes the
student’s knowledge faster and more accurate [35] which is primarily beneficial
when the class has a large number of students. In other cases that are summed
up in [6], the goal is to provide a personalized optimal sequence of the learning
material or even to sequence the curriculum according to the student’s individual
needs. And yet, further cases [46], [45] show that the learning application that is
based on the model provides long-term learning effects as opposed to traditional
methods. This was studied by a post-test that was made several weeks after the
learning sessions.
The issue of defining the prior beliefs, which consist the starting parameter-
ization, is often coupled with user clustering; demographics, longitudinal data
[46], pre-tests [25], [13], defining the prior beliefs as well as the starting group-
ings [5] with respect to the learners. In other cases, the teacher sets the prior
beliefs from his/her experience [36] or a uniform prior is used [13], [37]. Another
common characteristic is the definition of hidden structural elements that rep-
resent unobservable entities, which must be estimated from the observed ones.
The design of the structure must take correct assumptions into account, based
on a solid theoretical background, otherwise the model will not work correctly
In the work of Eva Mill´an et al. [32], the researchers draw a parallel between
medical diagnosis systems and student’s knowledge diagnosis [33]. Actually, this
is an important comparison as the development of clinical reasoning and decision-
making skills is very similar [3].
The student answers a set of questions that can only be answered correctly,
when several concepts are known. In this case the knowledge of the concepts is
the cause of the answer. The noise in the process, for instance when a student
knows the concept but answers wrongly and vice versa a correct guess, is also
modelled. The initialization of the model parameters is made by teachers-in-the-
loop; afterwards the parameters are learned from the data. The model is used
to efficiently determine those concepts the student knows less and the deductive
proposal of the next question.
The “eTeacher” is a web-based education system [22], [41] that recognizes
the learning style of a student according to the performance in exams as well
as email, chat and messaging usage. The number of different learning strategies
and their characteristics is the ”domain knowledge” defining the structure of
the Bayesian network. The initialization of the parameters uses in some cases
uniform priors and in others priors defined by experts. After that initial phase,
the parameters are continuously learned from the behaviour of the students.
After identifying the learning style, a recommendation engine proposes different
ways to learn the same material to each student according to his or her learning
“ANDES” is an ITS developed by Conati et al. [14], which mainly focuses on
knowledge tracing but also on recognition of the learning plan of the user. The
students solve Newtonian physics problems with different possible solution paths
that define the Bayesian network’s structure. Since each action may belong to
different solution paths and the user does not provide its reasoning explicitly,
the credit assignment problem is to find and quantify the most likely solution an
action belongs to. This triggers personalized help instructions and hints in two
cases: when a wrong answer is given or when the model predicts that the answer
might be wrong. The parameters of the network change in an online manner
while the student is solving the problems. Firstly, the evaluation was made by
simulating students that have different knowledge profiles and measuring the
accuracy of the predictions made by the model. In a second step, a post-test was
carried out to compare real students having used “ANDES” to students who
have not. Regression analysis was used to recognize the correlation between the
use of the program and the learning gain [7].
Specifically for mathematical problems there are several approaches that spe-
cialize in dealing with decimals misconceptions. In the work of Stacey et. al. [46],
[45] the misconceptions that define the structure of the model are provided by
two main factors: the domain knowledge and data of a Decimal Conception Test
Insights into Learning Competence through Probabilistic Graphical Models 5
(DCT) that students had to go through. Wrongly answered questions provided
by the students depend on their misconceptions. The researchers defined the
distinct misconception by computing which of them has the highest probability
according to the data. Although the model drives different question sequencing
strategies, some of the misconceptions were not correctly recognized. Therefore,
the researchers decided that the teacher and not the system should provide in-
Also, the research work of Goguadze concentrates on the modelling of dec-
imals misconceptions [25], [24]. The “AdaptErrEx” project selected the most
frequently occurring misconceptions and ordered them a taxonomy (higher and
lower level misconceptions), which is reflected in the dependencies of the Bayesian
network. As the previous application, a wrong answer may be caused by different
misconceptions. The prior beliefs are defined by a pre-test; the researchers assert
that sufficient training data diminish the role of the prior in the computation
of the posterior. This prior defines the typical/average student and then each
user’s parameters can be updated and individualized accordingly. One aspect
that has not been considered in this model yet, is the difficulty of each question:
easy questions will more likely be answered correctly than difficult ones, even if
there is a high probability of misconception.
Several student modelling models track the progress of knowledge through
time with Dynamic Bayesian Networks (DBN). The knowledge of the learner
at each time point can be considered to be dependent on the knowledge and
(optionally) the observed result of the interaction at the previous time point
[34]. The project “LISTEN” [10] represents the hidden knowledge state of the
student at each time point. The observable entities are the tutor interventions
and the student’s performance which are used to infer the knowledge state. In
the work of K¨aser et. al. [27] there is an overview and comparison of Bayesian
Knowledge Tracing (BKT), which is a technique for student modelling using a
Hidden Markov model (HMM) modelling and DBN for various learning topics,
such as number representation, mathematical operations, physics, algebra and
spelling. A HMM is a special case of a DBN, which, according to the researchers,
cannot represent dependencies that would lead to hierarchies of skills; in these
case DBNs create more adequate models.
All above described applications have a Bayesian network of the students
model at its architecture core. There are a number of other components that
either support the teacher or the student. One of them, for example, is the vi-
sualization of the model in the “VisMod” application [52], which is displayed in
(among other things) color and arrow intensity instead of number-filled tables.
This increases the readability of the model and enhances the tutor’s understand-
ing. Gamification elements can also be found in “Maths Garden” [28], an appli-
cation that lets users gain and loose points and coins depending on answering
correctly or wrongly. A coaching component that provides feedback and hints to
refresh the memory can be found in “ANDES” ’s architecture [7]. An overview
about the design and architectural elements of intelligent tutoring systems that
have a Bayesian network as user model is provided in the work of Gamboa et.
al. [21].
A detailed overview about intelligent techniques other than Bayesian net-
works, such as recommender systems for the computation of the learning path
as well as clustering and classification for learner profiles that are used in e-
learning systems, is provided in [31]. Specifically in [5], the demand for the most
appropriate activity proposed - neither too easy nor too difficult - can only be
fulfilled, if the used model is both accurate and adaptive.
1.3 Research Question
The main objective of this research work is to answer the research question,
whether Bayesian networks can quantify the defined misconceptions of one-digit
multiplication problems. In order to answer this question the “1x1 trainer” ap-
plication is taken as the underlying data provider. The application focuses on
the recognition of the current learning status. However learning aware appli-
cations maintain an adaptive learning model that represents the knowledge of
the learner/user with regard to the learned topic. The application is expected
to support individual learning needs and abilities as well as considering com-
mon characteristics in the learning process of different persons. The progress
of the learning model itself will be used to transform the learning application
into an adaptive one; that may change the content and sequence of assessments
constantly to improve the learning process and to maximize the learning efforts.
The “1x1 trainer” application has a current overall report that is accessi-
ble to teachers. It contains information about the actual number as well as the
proportion of correct and wrong answers of each posed question. It uses color en-
coding that helps distinguishing four sets of questions with similar proportion of
correct and wrong answers. The implementation of the Bayesian model provides
further insights to detailed cognitive information that enriches the information
content of the current report. Furthermore, the new report can concentrate on
individualized learning status, considering the causes of the wrong answers and
can be updated in real time after each action of the student.
1.4 Outline
This research work proposes a Bayesian model for the learning competence of
students using the “1x1 trainer” application. The first step is to specify the
error types that are relevant for this research; their detailed description is made
in section 2. Data analysis (specifically descriptive statistics) is used to guide
the necessary assumptions about the modelled entities and their independences.
Based on this information, the structure of the model and its parametrization
is defined. The personalized model of each student and the method by which
it adapts its parameters to new data is described in section 3. The usage of
the model and the insights that are provided to the teachers in the form of an
enhanced report are explained in section 4. Finally, a conclusion about future
research and improvement possibilities is in section 5.
Insights into Learning Competence through Probabilistic Graphical Models 7
2 Error types of one-digit multiplication and descriptive
2.1 Error types of one-digit multiplication problems
The bug library [12] of the proposed learning competence model contains six
error types: operand, intrusion, consistency, off-by, add/sub/div, and pattern.
Any false answer that does not belong to one of those six categories is assigned
to the unclassified category. The description of the error types is explained in
detail in [48]; a brief description follows here:
1. Operand error: It occurs, when the student mistakes at least one operand
for one of its neighbours [8]. In the implementation only a neighbourhood of
overall absolute distance of 2 from the correct operands was considered. One
example is the answer 48 to the question 7×8 since the user may mistakenly
multiplied 6 ×8. Research shows that this is the most frequently occurred
error, but it occurs with a different proportion in each posed question [9].
2. Operand intrusion (abbreviated intrusion) error: It happens, when the decades
digit and/or the unit digit of the result equals one of the two operands of
the posed question, for example 7 ×8 = 78. It is argued by [8] that the two
operands of the multiplication question are perceived as one number by the
student (the first operand corresponding to the decades digit and the second
to the unit digit).
3. Consistency: The student’s answer has either the unit digit or the decade
digit of the correct answer [43], [16]. For example, the answer 46 to the
question 7 ×8 indicates that the unit digit is correct, but the decades digit
is false.
4. Off-by-±1, Off-by-±2 : It occurs, when the answer of the student deviates
from the correct one by ±1 or ±2, for example, when the answer of the
question 5 8 is one of the following: {38,39,41,42}.
5. Add/Sub/Div: The student confuses the operation itself and performs for
example an addition instead of a multiplication; in that case the answer to
7×8 is 15.
6. Pattern: The student mistakes the order of the digits of the result, for ex-
ample, question 7 ×8 provides the answer 65 (the decades digit and the unit
digit are permuted).
7. Unclassified: Any answer that can not be matched to one of the above error
All questions that have a correct answer with value smaller than 10 do not
have consistency error. These are: 1 ×1,1×2,2×1,1×3,3×1,1×4,4×1,
One of the main reasons to use a probabilistic graphical model, is the fact
that a specific false answer can be classified to multiple error types. The identi-
fication of the most probable error type causing a wrong answer is called credit
assignment. The table 1 shows the possible false answers for the question 7 ×8.
One can see that for example the answer 72 could occur because of an operand
or an intrusion error.
Table 1. Answers for question 7 ×8 listed by error types
Error Type Answers
operand 40,42,48,49,54,63,64,72
intrusion 18,28,38,48,58,68,71,72,73,74,75,76,77,78,79,88,98
consistency 16,26,36,46,51,52,53,54,55,57,58,59,66,76,86,96
add/sub/div 1,15
pattern 65
unclassified 4,5,6,7,8,9,10,11,12,19,20,21,22,23,24,25,27,29,30,
correct 56
2.2 Data of the “1x1 trainer”
The data that were used for building the model were provided from the “1x1
trainer” application. The application is for both students and teachers. For this
work it was also used for a preliminary categorization of learners. Users of this
application are confronted with multiplication questions with both multiplicands
being one-digit integer numbers. The possible questions range from 1 ×1 up to
10 ×9 (a total of 90 questions) and are posed in a pre-specified order. The ap-
plication does not provide any means of help or hints to the students so far; the
only feedback users get, is whether their answer is correct or not. It is expected
that by repeated use of the application the students will learn and get better
through exercise. But there is no individualisation that takes care of the per-
sonal needs of the learning style and knowledge level of the users. Furthermore,
personal information such as age, gender, demographics, and educational level
were not collected.
The data were cleaned in the preprocessing phase. The answers that did not
lie in the interval [0 100) were considered invalid and were removed. Overall
there were 1179720 question-answer pairs with 1164786 valid. The number of
unique users that gave at least one valid answer is 9058.
3 Probabilistic Graphical Model of Learning Competence
The use of a learning-aware data-driven application cannot assume that the
user’s learning competence remains unchanged. Simple statistical descriptions
are not practical in representing a continuous change and do not effectively
capture the differences between the learning process of the users. Furthermore,
Insights into Learning Competence through Probabilistic Graphical Models 9
the purpose is to choose intelligent actions (also called “actionable information”
[1]) based on the data and this is not possible simply by one rigid and non-
adaptive analysis of the data.
The choice of a probabilistic graphical model has several benefits. Firstly, it
allows the representation of conditional dependencies (and independencies cor-
respondingly) in the graphical representation of the model of the data. Those
are assumed to be the same for all users and stay stable over the course of
application usage. Secondly, its parameters (that can be thought of as a config-
uration or instance) are adaptive and change with each new data sample that is
observed. They may be a temporary snapshot description that characterizes the
learning competence but unlike the statistics there is an effective way to adapt
those and not recompute them from scratch each time the model confronts new
data. Thirdly, they’ve already been extensively used for decision problems [1],
[29] which are the forefronts of reinforcement learning algorithms.
3.1 Introduction to Probabilistic Graphical Models
Probabilistic Graphical Models are representations of joint probability distri-
butions over random variables that have probabilistic relationships expressed
through a graph. The random variables involved can be discrete which have cat-
egorical values or continuous with real values. The set of possible values that a
random variable can take - sometimes also referred as the possible outcomes of
the experiment described by the random variable - is its domain. The random
variables can be either visible or hidden. The visible ones have outcomes that
can be directly observed and their values are contained in the dataset. The hid-
den variables are defined by human experts using the domain knowledge of the
problem, but their outcomes are not directly accessible. They usually represent
latent causes of visible random variables and can improve the accuracy and the
interpretability of the model [30].
To specify the dependencies of the variables in general, one needs to specify
their direction, type and intensity. This is made with the use of graphs which
provide the terminology and theory for understanding and reasoning about Prob-
abilistic Graphical Models. The nodes (also called vertices) of the graph represent
the random variables and the edges their dependencies which can be directed
or undirected. Undirected models - also called Markov networks - on the other
hand represent symmetric probabilistic interactions where there is no depen-
dency with direction, only factors that represent the degree of the strength of
the connection. In case where the dependencies are directed, the graph must be
a directed acyclic graph (DAG), otherwise circular reasoning would be possible.
These two categories are used in different applications.
3.2 Model Structure
Domain knowledge about the already described error types that are encoun-
tered in one-digit multiplication, as described in section 2, was used to define
the model. This is in accordance with the data-driven approach of model con-
struction [30] where the structure of the model is specified by the designer and
the parameters are learned from the data.
A question is either answered correctly or faulty. The student can make one
of the following errors: Operand, intrusion, consistency, off-by-±1 and off-by-±2,
pattern, confusion with addition, subtraction, division or an unclassified error
(meaning none of the above). Therefore a multinomial random variable called
Learning Stateq- individual for each question qwas chosen to represent the
proportion of each of these misconceptions of the user, when he or she is an-
swering a one-digit multiplication question. The variable follows the categorical
distribution; in this case the Learning Stateqhas eight possible outcomes and
the domain of this random variable is Val(Learning Stateq) = {operand, in-
trusion, consistency, pattern, confusion, unclassified, correct}(meaning that 1
is the operand error, 2 the intrusion error and so on). The Learning Stateqof
a specific user can be described for example 5% operand error, 4% consistency
error and 91% correct answering (the rest possible outcomes have 0%). This
parametrization must be learned from the data.
In the previous section it is shown that a specific faulty answer may be classi-
fied to more than one error types. Although in reality the model does not assume
that more than one error type created a particular answer, the model cannot
know a priori which error type was more prevalent and played the decisive role
in choosing the wrong answer. The Learning Stateqis hidden and the percent
of each error type is expected to be learned by the provided answers. Thereby, a
dominant error type (for a specific user) can be still discovered and weaken the
belief that multiple error types played a role for a specific faulty answer. In sec-
tion 4 the inference of the most probable error type (credit assignment problem)
of a specific wrong answer will be made after the learning of the parameters is
The proportion of correct and false answers is different for each question.
Even though each question is not posed the same number of times and the be-
lief about the possibility of correctly answering each question is different, this
was also taken into account. That means that the probability of answering cor-
rectly is not. Therefore, there are 90 random variables called Correctness1×1to
Correctness10×9(abbreviated by Correctnessq) that have each two possible
outcomes. Therefore the Bernoulli distribution was chosen, which is equivalent
to a categorical distribution with a domain of two values.
Each question has a distinct random variable, named accordingly as Answers1×1
to Answers10×9(abbreviated as Answesq), which is a child of the Learning Stateq
random variable. The arrows from the Learning Stateqto its children reflect
the dependency of the answer to a question from the misconception or correct
understanding of the user.
The conditional independence property of each Learning Competence model
is expressed by the following equation:
AnswesqCorrectnessq|Learning Stateq(1)
Insights into Learning Competence through Probabilistic Graphical Models 11
Fig. 1. The structure of all Probabilistic Graphical models for Learning Compe-
tence. The shaded Answesqnodes are the ones that are observed, whereas the
Correctnessq,Learning Stateqrandom variables remain unobserved.
The joint probability distribution for each question qhas the following fac-
P(Correctnessq,Learning Stateq,Answersq) =
P(Correctnessq)P(Learning Stateq|Correctnessq)
P(Answersq|Learning Stateq)
Each error type can only produce a specific subset of answers, so the others
will have zero probability of occurring given this particular error type. Every
row of the conditional probability tables of the Answersqrandom variables has
values that sum to one and the last row has only one entry with probability 1.0
at the column with the correct answer and 0.0 everywhere else. Figure 1 depicts
the described structure of Learning Competence models.
The model needs to express the following procedure: First knowing if the
question is answered correctly; this is provided by the Correctnessqrandom
variable. If this is true then there are no more steps to follow. In the case where
the answer is false, there must have been an error which belongs to the hid-
den Learning Stateq. One of the possible answers of this error, as seen and
quantified by Answersqwill be the actual answer of the user.
The model reflects our belief about the overall learning competence of the user. Its structure is the same for all users, but the conditional probability values (the entries in the conditional probability tables) differ for each individual user. Nevertheless, the model can also reveal similarities between users, in this case users whose models have similar parameter values.
3.3 Learning the Model's Parameters
The already gathered answers of the students comprise the data set, denoted by D. The goal of parameter learning is to estimate the densities of all random variables in the model. The joint probability distribution P_M defined by the model M with parameters Θ is expressed by equation 2. Parameter learning aims to increase the likelihood of the data given the model, P(D|M), or equivalently the log-likelihood, log P(D|M), with respect to the set of parameters Θ of the model. The likelihood expresses the probability of the data given a particular model; a model that assigns a higher likelihood to the data D better approximates the true distribution (the one that generated the data).
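This likelihood can be sketched concretely. The following Python fragment (an illustration, not the authors' implementation) computes the marginal probability of an observed answer by summing the factorization over both latent variables, and the log-likelihood of a tiny hypothetical answer sequence. The numeric parameters are the ones the chapter later reports for question 6 × 7 (figure 3), with CPT rows truncated to the entries needed here:

```python
import math

# Parameters from the paper's 6 x 7 example (figure 3); rows truncated.
p_correctness = {"wrong": 0.158, "correct": 0.842}
p_state_given = {                               # P(Learning State | Correctness)
    "wrong":   {"operand": 0.336, "off-by": 0.103},  # other error types omitted
    "correct": {"correct": 1.0},
}
p_answer_given = {                              # P(Answers | Learning State), partial
    "operand": {40: 0.035},
    "off-by":  {40: 0.202},
    "correct": {42: 1.0},
}

def marginal(answer):
    """P(Answers_q = a) = sum over c, ls of P(c) P(ls | c) P(a | ls)."""
    return sum(pc * pls * p_answer_given[ls].get(answer, 0.0)
               for c, pc in p_correctness.items()
               for ls, pls in p_state_given[c].items())

data = [42, 40, 42]                             # a tiny hypothetical answer sequence
log_likelihood = sum(math.log(marginal(a)) for a in data)
print(round(marginal(40), 5))                   # close to the 5.14e-3 computed later
```

Maximizing the sum of such log terms over all observed answers is exactly the objective that the EM algorithm below tackles.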
The algorithm used to estimate the parameters when some of the variables are hidden is expectation-maximization (EM). Since the latent variables Correctness_q and Learning State_q are not observed, direct maximization of the likelihood is not possible. The EM algorithm initializes all model parameters randomly and iteratively increases the likelihood by stepwise maximizing the expected complete-data log-likelihood under the currently estimated parameters [2]. If the increase of the likelihood or the change of the parameters is not significant compared to the previous iteration, the algorithm can be stopped. Updating the log-likelihood in this manner is guaranteed to converge to a stationary point, which can be a local minimum, local maximum or saddle point. Fortunately, by restarting the iterations from different starting parameters and injecting small perturbations into the parameters, local minima and saddle points can be avoided [30].
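For this three-node structure the E- and M-steps can be written out directly. The sketch below is assumed, not the authors' implementation: it runs four EM iterations on a toy answer sequence with invented starting parameters. The E-step computes posterior responsibilities over the latent pair (Correctness, Learning State) for each observed answer; the M-step re-estimates the tables from expected counts:

```python
from collections import defaultdict

# E-step: posterior responsibilities over the latent pair (c, ls) per answer.
def e_step(data, p_c, p_ls, p_a):
    resp = []
    for a in data:
        joint = {(c, ls): p_c[c] * p_ls[c][ls] * p_a[ls].get(a, 0.0)
                 for c in p_c for ls in p_ls[c]}
        z = sum(joint.values())
        resp.append({k: v / z for k, v in joint.items()})
    return resp

# M-step: re-estimate all conditional probability tables from expected counts.
def m_step(data, resp):
    n_c, n_ls, n_a = defaultdict(float), defaultdict(float), defaultdict(float)
    for a, gamma in zip(data, resp):
        for (c, ls), g in gamma.items():
            n_c[c] += g
            n_ls[(c, ls)] += g
            n_a[(ls, a)] += g
    p_c = {c: cnt / len(data) for c, cnt in n_c.items()}
    p_ls, p_a = defaultdict(dict), defaultdict(dict)
    for (c, ls), cnt in n_ls.items():
        p_ls[c][ls] = cnt / n_c[c]
    for (ls, a), cnt in n_a.items():
        p_a[ls][a] = cnt / sum(v for (l2, _), v in n_a.items() if l2 == ls)
    return p_c, dict(p_ls), dict(p_a)

# Invented starting parameters and a toy answer sequence for question 6 x 7.
p_c = {"wrong": 0.5, "correct": 0.5}
p_ls = {"wrong": {"operand": 0.5, "off-by": 0.5}, "correct": {"correct": 1.0}}
p_a = {"operand": {40: 0.7, 35: 0.3}, "off-by": {40: 0.3, 44: 0.7},
       "correct": {42: 1.0}}
data = [42, 42, 40, 44, 42]

for _ in range(4):                    # the paper stops after 4 iterations
    p_c, p_ls, p_a = m_step(data, e_step(data, p_c, p_ls, p_a))
print(round(p_c["correct"], 2))
```

On this toy data the responsibilities stabilize after the first pass, so further iterations leave the parameters unchanged; on real data the stopping criterion would be the likelihood change described above.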
The models are simple enough that each EM step has a straightforward analytical solution. The available data were divided into a training and a test set, using only the users that have answered all questions at least once (the number of users that have answered all questions exactly once is 2218). The models' parameters are computed by the EM algorithm on the training data for 4 iterations. Beyond 4 iterations the likelihood of the training set still increases, but the likelihood of the test set decreases, which is an indication of overfitting. The diagram in figure 2 depicts the main computational blocks of this process.
Fig. 2. Computational blocks diagram, from data preprocessing and splitting into training and test set to the computation of the learned model.
4 Insights
After the model of a particular student is learned - using the informed prior as starting point and the answers he or she has given so far as evidence - it can be queried. The better and more accurately the model captures the learning competence of the student, the better the performance of the answer predictions will be. In some probabilistic modelling frameworks such as Figaro 7, the parameter learning part is carried out by an offline component and the probability queries by an online component.
There are three types of reasoning one can make with probabilistic graphical models: causal, evidential and intercausal (explaining away). Causal reasoning (also called prediction) starts from knowledge of the causes as evidence and provides information about the effects. In our model this would be possible if Correctness_q and Learning State_q were known: the answer to a posed question could then be accurately determined. Causal reasoning in directed graphical models generally proceeds from parent to child variables ("downstream") and is used to predict future events.
Evidential reasoning (also called explanation), on the other hand, has the opposite direction and involves situations where effects lead to the specification of causes. This is the most important reasoning type in our case, because the answers of the students provide the information to perform evidential reasoning and infer the hidden variables Correctness_q and Learning State_q, which in turn can be used for causal reasoning to predict the future answers of each student. The difference between causal and evidential reasoning can be understood by considering the direction of time: evidential reasoning infers past probability distributions from the current data, whereas causal reasoning makes a prediction for the future given the data. The great benefit of graphical models over descriptive statistics is that the same model is used for both backward and forward reasoning (with respect to the perception of time).
Intercausal reasoning occurs when one random variable depends on two or more parents. In this case, observing the value of one parent influences the belief about the value of the other(s) (either strengthening or weakening it); one reason is then said to explain away the other. The Learning Competence model's structure does not contain such cases; further discussion of this reasoning type can be found in [26], [30], [2].
The upcoming sections proceed with an analytical implementation of probabilistic queries that is specific to the designed Learning Competence models. Personalized insights, computed from the latent explanations of each student's wrong answers, are made possible by exact and efficient inference as described in section 4.2.
4.1 Probability Queries
A conditional probability query P(Y | E = e) - also called probabilistic inference - computes the posterior of the subset of random variables Y (the target of the query) given observations e of the subset of evidence variables E (there may of course be a subset of variables Z in the model belonging to neither of these two subsets). Using the Bayes rule, the conditional probability is written as:

P(Y | E = e) = P(Y, e) / P(e)   (3)

7 Figaro, last accessed 25 August 2018.
The MAP query, also called most probable explanation (MPE) [30], [39], maximizes the posterior of the joint distribution of a subset of random variables Y:

MAP(Y | E = e) = argmax_y P(y, e)   (4)

In the case of the MAP query the whole set of random variables is X = {Y, E}. In other words, after observing (clamping) a subset of variables, the MPE computes the most likely joint values of all remaining variables.
A slightly different query is the marginal MAP, which is written as follows:

Marginal MAP(Y | E = e) = argmax_y P(y | e) = argmax_y Σ_Z P(y, Z | e)   (5)

which directly follows from the fact that X = {Y, E, Z}.
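Both queries can be answered by brute-force enumeration in a model this small. The sketch below uses invented parameters; note that dividing by P(e) does not change the argmax, so the unnormalized joint suffices:

```python
from itertools import product

# Invented parameters for one question; 42 is the correct answer.
p_c = {"wrong": 0.4, "correct": 0.6}
p_ls = {"wrong": {"operand": 0.6, "off-by": 0.4}, "correct": {"correct": 1.0}}
p_a = {"operand": {40: 0.1, 35: 0.9}, "off-by": {40: 0.8, 44: 0.2},
       "correct": {42: 1.0}}

def joint(c, ls, a):
    return p_c[c] * p_ls[c].get(ls, 0.0) * p_a[ls].get(a, 0.0)

evidence = 40                                   # the observed answer

# MPE: argmax over the *joint* assignment of all unobserved variables.
mpe = max(product(p_c, p_a), key=lambda s: joint(s[0], s[1], evidence))

# Marginal MAP over Learning State only: argmax_ls of sum_c P(c, ls, a).
marg = {ls: sum(joint(c, ls, evidence) for c in p_c) for ls in p_a}
mmap = max(marg, key=marg.get)
print(mpe, mmap)                                # → ('wrong', 'off-by') off-by
```

Here the two queries agree, but in general maximizing the joint and maximizing a marginal can pick different assignments.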
The result of such a query can be computed with the variable elimination algorithm described in section 4.2; in this case the exact value of equation 3 is computed by dividing P(y, e) = Σ_z P(y, e, z) by P(e) = Σ_y P(y, e). Alternatively, normalizing a vector containing all P(y_k, e) (where y_k are all possible outcomes of the variables Y) so that it sums to one also provides the desired result. For more complex Bayesian networks approximate inference algorithms are applied, since exact inference is NP-hard [29]; but even approximate inference can be NP-hard in the worst case [30]. The Learning Competence models are simple, and the variable elimination algorithm is fast enough.
4.2 Variable elimination in the Learning Competence Model
The probability query that is of high relevance for teachers is the probability of the error types regarded as causes of a specific wrong answer. The sum-of-products expression in equation 6 computes the distribution of Learning State_q by means of the joint distribution P(C_q, LS_q, A_q):

P(LS_q) = Σ_{C_q, A_q} P(C_q, LS_q, A_q) = Σ_{C_q} P(C_q) P(LS_q | C_q) Σ_{A_q} P(A_q | LS_q)   (6)
Fig. 3. Parameters of the Learning Competence model of question 6 × 7 that are relevant to the computation of the MAP query when the answer is 40.
The first step of the variable elimination algorithm, when applied with existing evidence, is to compute the unnormalized joint distribution P(C_{6×7}, LS_{6×7}, A_{6×7} = 40). The faulty answer 40 for the question 6 × 7 eliminates all cases for which the answer is not equal to 40; it can belong only to two potential error types: operand and off-by. The remaining rows of the joint distribution - those with an unnormalized proportion unequal to 0 - are listed in table 2. The computations use the corresponding parameters of the Learning Competence model of question 6 × 7 depicted in figure 3.
Table 2. Unnormalized joint distribution P(C_{6×7}, LS_{6×7}, A_{6×7} = 40)

C_{6×7} | LS_{6×7} | A_{6×7} | unnormalized proportion
wrong   | operand  | 40      | 0.158 · 0.336 · 0.035 = 1.85 · 10^-3
wrong   | off-by   | 40      | 0.158 · 0.103 · 0.202 = 3.28 · 10^-3

The sum of the unnormalized proportions, 1.85 · 10^-3 + 3.28 · 10^-3 = 5.14 · 10^-3 (which is the value of P(A_{6×7} = 40)), can be used to compute the normalized probabilities of the causes of answer 40, as depicted in table 3.
Table 3. Normalized joint distribution P(C_{6×7}, LS_{6×7}, A_{6×7} = 40)

C_{6×7} | LS_{6×7} | A_{6×7} | normalized probability
wrong   | operand  | 40      | 1.85 · 10^-3 / 5.14 · 10^-3 = 0.36
wrong   | off-by   | 40      | 3.28 · 10^-3 / 5.14 · 10^-3 = 0.64
The process eventually performs the computation in equation 7, which is in accordance with equation 3:

P(C_{6×7}, LS_{6×7} | A_{6×7} = 40) = P(C_{6×7}, LS_{6×7}, A_{6×7} = 40) / P(A_{6×7} = 40)   (7)
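The same computation can be reproduced in a few lines of Python, using the parameters reported in figure 3 and tables 2-3:

```python
# Parameters from figure 3; only the two error types with nonzero mass
# for the answer 40 are kept.
p_wrong = 0.158                                  # P(Correctness = wrong)
p_ls_given_wrong = {"operand": 0.336, "off-by": 0.103}
p_40_given_ls = {"operand": 0.035, "off-by": 0.202}

# Unnormalized joint P(wrong, ls, A = 40) for each surviving error type (table 2).
unnorm = {ls: p_wrong * p_ls_given_wrong[ls] * p_40_given_ls[ls]
          for ls in p_ls_given_wrong}
z = sum(unnorm.values())                         # P(A = 40), about 5.14e-3

# Normalizing yields the posterior over causes of the answer 40 (table 3).
posterior = {ls: p / z for ls, p in unnorm.items()}
print({ls: round(p, 2) for ls, p in posterior.items()})
# → {'operand': 0.36, 'off-by': 0.64}
```

Normalizing the vector of unnormalized proportions is exactly the alternative to the explicit division by P(e) mentioned in section 4.1.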
The distributions of Correctness_{6×7} and Learning State_{6×7} in the Learning Competence model are as follows:

Table 4. Correctness_{6×7} and Learning State_{6×7} distributions of question 6 × 7 before the user answers 40

wrong | correct
0.158 | 0.842

operand | intrusion | consistency | off-by | add/sub/div | pattern | unclassified
0.336   | 0.079     | 0.163       | 0.103  | 0.0014      | 0.072   | 0.243
After observing 40, the Explanations probability distribution is as follows:

Table 5. Explanations distribution of wrong answers in question 6 × 7 after the user answers 40

operand | off-by
0.36    | 0.64
The result of the MAP query (most probable explanation) is the joint assignment MAP(Correctness_{6×7}, Learning State_{6×7}) = (wrong, off-by). The result of the marginal MAP query over Learning State_{6×7} only states that the most probable cause of the answer is the off-by error, as seen in figure 4. This is an example of a case where one error type has a higher probability than another in the P(Learning State_q | Correctness_q = wrong) distribution, but the probability query states that the most probable cause of a particular answer is the other one.
The results of the probability queries depend on the parameters of the model, which in turn are influenced by the prior distribution and the number of EM iterations.
5 Future Work
The learned probabilistic model can be used in a generative scheme, where the learning application samples the model to predict the answer of the student. There are several algorithms that compute samples from such models, with different characteristics [39], [30]. Particularly for this model, where the dataset is highly unbalanced and the number of correctly answered questions is predominant, the metric used to measure prediction performance should particularly take
this fact into account. Although this feature does not provide an insight per se, it can be a starting point for other informative learning aspects. One aspect is explainable AI, which combines Bayesian learning approaches with classic logical approaches and ontologies, thereby making use of re-traceability and transparency [23].

Fig. 4. Learning State_{6×7} and explanations distribution of question 6 × 7 before and after the user answers 40.
Even though the proposed research extends the capabilities of the current learning application considerably, it cannot answer the fundamental question of which question would be most appropriate to pose to the student next. After the different learning competences are derived, the handling is delegated to the teacher, not the application itself. A further open question is whether the models of learning competences that could be grouped together are the ones where the students will follow the same learning path until they have learned to answer all questions correctly. The goal of this learning-aware application is not to group the learning competences by similarity of their parameters (which express the current situation), but to find which ones will lead to similar optimal learning paths. Such a learning-aware application could benefit from an answer prediction component that accurately simulates students' learning paths.
References

1. Barga, R., Fontama, V., Tok, W.H., Cabrera-Cordon, L.: Predictive analytics with Microsoft Azure machine learning. Springer (2015)
2. Bishop, C.: Pattern recognition and machine learning. Springer (2006)
3. Bloice, M., Simonic, K.M., Holzinger, A.: On the usage of health records for the
teaching of decision-making to students of medicine. In: Huang, R., Kinshuk, Chen,
N.S. (eds.) The New Development of Technology Enhanced Learning, pp. 185–201.
Springer Berlin Heidelberg (2014)
4. Brusilovsky, P., Millán, E.: User models for adaptive hypermedia and adaptive educational systems. In: The adaptive web, pp. 3–53. Springer, Heidelberg (2007)
5. Brusilovsky, P., Millán, E.: User models for adaptive hypermedia and adaptive educational systems. In: The adaptive web, pp. 3–53. Springer (2007)
6. Brusilovsky, P., Peylo, C.: Adaptive and intelligent web-based educational systems.
International Journal of Artificial Intelligence in Education (IJAIED) 13(2-4), 159–
172 (2003)
7. Bunt, A., Conati, C.: Probabilistic student modelling to improve exploratory be-
haviour. User Modeling and User-Adapted Interaction 13(3), 269–309 (2003)
8. Campbell, J.I.: Mechanisms of simple addition and multiplication: A modified
network-interference theory and simulation. Mathematical cognition 1(2), 121–164
9. Campbell, J.I.: On the relation between skilled performance of simple division
and multiplication. Journal of Experimental Psychology: Learning, Memory, and
Cognition 23(5), 1140–1159 (1997)
10. Chang, K.m., Beck, J., Mostow, J., Corbett, A.: A bayes net toolkit for student
modeling in intelligent tutoring systems. In: Proceedings of the 8th International
Conference on Intelligent Tutoring Systems. pp. 104–113. Springer (2006)
11. Chater, N., Tenenbaum, J.B., Yuille, A.: Probabilistic models of cognition: Con-
ceptual foundations. Trends in cognitive sciences 10(7), 287–291 (2006)
12. Chrysafiadi, K., Virvou, M.: Student modeling approaches: A literature review for
the last decade. Expert Systems with Applications 40(11), 4715–4729 (2013)
13. Conati, C., Gertner, A., Vanlehn, K.: Using bayesian networks to manage uncer-
tainty in student modeling. User modeling and user-adapted interaction 12(4),
371–417 (2002)
14. Conati, C., Gertner, A.S., VanLehn, K., Druzdzel, M.J.: On-line student modeling
for coached problem solving using bayesian networks. In: User Modeling UM 97.
pp. 231–242. Springer (1997)
15. Danaparamita, M., Gaol, F.L.: Comparing student model accuracy with bayesian
network and fuzzy logic in predicting student knowledge level. International Jour-
nal of Multimedia and Ubiquitous Engineering 9(4), 109–120 (2014)
16. Domahs, F., Delazer, M., Nuerk, H.C.: What makes multiplication facts difficult:
Problem size or neighborhood consistency? Experimental Psychology 53(4), 275–
282 (2006)
17. Ebner, M., Neuhold, B., Schön, M.: Learning analytics – wie Datenanalyse helfen kann, das Lernen gezielt zu verbessern. In: Wilbers, K., Hohenstein, A. (eds.) Handbuch E-Learning – Expertenwissen aus Wissenschaft und Praxis – Strategie, Instrumente, Fallstudien, pp. 1–20. Deutscher Wirtschaftsdienst (Wolters Kluwer Deutschland), 48. Erg.-Lfg. (2013)
18. Ebner, M., Schön, M.: Why learning analytics in primary education matters. Bulletin of the Technical Committee on Learning Technology 15(2), 14–17 (2013)
19. Ebner, M., Schön, M., Taraghi, B., Steyre, M.: Teachers little helper: Multi-math-coach. International Association for Development of the Information Society (2013)
20. Ebner, M., Taraghi, B., Saranti, A., Schön, S.: Seven features of smart learning analytics - lessons learned from four years of research with learning analytics. eLearning Papers 40, 51–55 (2015)
21. Gamboa, H., Fred, A.: Designing intelligent tutoring systems: a bayesian approach.
Enterprise information systems 3, 452–458 (2002)
22. García, P., Amandi, A., Schiaffino, S., Campo, M.: Evaluating bayesian networks
precision for detecting students learning styles. Computers & Education 49(3),
794–808 (2007)
23. Goebel, R., Chander, A., Holzinger, K., Lecue, F., Akata, Z., Stumpf, S.,
Kieseberg, P., Holzinger, A.: Explainable AI: the new 42? In: Springer Lecture Notes in Computer Science LNCS 11015, pp. 295–303. Springer (2018)
24. Goguadze, G., Sosnovsky, S., Isotani, S., McLaren, B.M.: Towards a bayesian stu-
dent model for detecting decimal misconceptions. In: Proceedings of the 19th Inter-
national Conference on Computers in Education. pp. 34–41. Chiang Mai, Thailand
25. Goguadze, G., Sosnovsky, S.A., Isotani, S., McLaren, B.M.: Evaluating a bayesian
student model of decimal misconceptions. In: Proceedings of the 4th International
Conference on Educational Data Mining. pp. 301–306. Citeseer (2011)
26. Karkera, K.R.: Building probabilistic graphical models with Python. Packt Pub-
lishing Ltd (2014)
27. Käser, T., Klingler, S., Schwing, A.G., Gross, M.: Dynamic bayesian networks for
student modeling. IEEE Transactions on Learning Technologies 10(4), 450–462
28. Klinkenberg, S., Straatemeier, M., van der Maas, H.L.: Computer adaptive practice
of maths ability using a new item response model for on the fly ability and difficulty
estimation. Computers & Education 57(2), 1813–1824 (2011)
29. Kochenderfer, M.J.: Decision making under uncertainty: theory and application.
MIT Press (2015)
30. Koller, D., Friedman, N.: Probabilistic graphical models: principles and techniques.
MIT Press (2009)
31. Markowska-Kaczmar, U., Kwasnicka, H., Paradowski, M.: Intelligent techniques in
personalization of learning in e-learning systems. In: Computational Intelligence
for Technology Enhanced Learning, pp. 1–23. Springer (2010)
32. Millán, E., Agosta, J.M., Pérez de la Cruz, J.L.: Bayesian student modeling and
the problem of parameter specification. British Journal of Educational Technology
32(2), 171–181 (2001)
33. Millán, E., Loboda, T., Pérez-De-La-Cruz, J.L.: Bayesian networks for student
model engineering. Computers & Education 55(4), 1663–1683 (2010)
34. Millán, E., Pérez-De-La-Cruz, J.L.: A bayesian diagnostic algorithm for student
modeling and its evaluation. User Modeling and User-Adapted Interaction 12(2-
3), 281–330 (2002)
35. Millán, E., Trella, M., Pérez-de-la-Cruz, J.L., Conejo, R.: Using bayesian networks
in computerized adaptive tests. In: Computers and Education in the 21st Century,
pp. 217–228. Springer (2000)
36. Nouh, Y., Karthikeyani, P., Nadarajan, R.: Intelligent tutoring system-bayesian
student model. In: 1st International Conference on Digital Information Manage-
ment. pp. 257–262. IEEE (2006)
37. Pardos, Z.A., Heffernan, N.T., Anderson, B., Heffernan, C.L.: Using fine-grained
skill models to fit student performance with bayesian networks. Handbook of edu-
cational data mining pp. 417–426 (2010)
38. Pearl, J.: Embracing causality in default reasoning. Artificial Intelligence 35(2),
259–271 (1988)
39. Pfeffer, A.: Practical Probabilistic Programming. Manning Publications (2016)
40. Romero, C., Ventura, S.: Educational data mining: a review of the state of the art.
IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and
Reviews) 40(6), 601–618 (2010)
41. Schiaffino, S., Garcia, P., Amandi, A.: eteacher: Providing personalized assistance
to e-learning students. Computers & Education 51(4), 1744–1754 (2008)
42. Schön, M., Ebner, M., Kothmeier, G.: It's just about learning the multiplication
table. In: Buckingham Shum, S., Gasevic, D., Ferguson, R. (eds.) Proceedings of
the 2nd international conference on learning analytics and knowledge. pp. 73–81.
ACM, New York, NY, USA (2012)
43. Seidenberg, M.S., McClelland, J.L.: A distributed, developmental model of word
recognition and naming. Psychological review 96(4), 523–568 (1989)
44. Siemens, G., Baker, R.S.J.d.: Learning analytics and educational data mining: towards communication and collaboration. In: Proceedings of the 2nd international conference on learning analytics and knowledge. pp. 252–254. ACM (2012)
45. Stacey, K., Flynn, J.: Evaluating an adaptive computer system for teaching about
decimals: Two case studies. In: AI-ED2003 Supplementary Proceedings of the 11th
International Conference on Artificial Intelligence in Education. pp. 454–460. Cite-
seer (2003)
46. Stacey, K., Sonenberg, E., Nicholson, A., Boneh, T., Steinle, V.: A teaching model
exploiting cognitive conflict driven by a bayesian network. In: International Con-
ference on User Modeling. pp. 352–362. Springer (2003)
47. Taraghi, B., Ebner, M., Saranti, A., Schön, M.: On using markov chain to evi-
dence the learning structures and difficulty levels of one digit multiplication. In:
Proceedings of the Fourth International Conference on Learning Analytics And
Knowledge. pp. 68–72. ACM (2014)
48. Taraghi, B., Frey, M., Saranti, A., Ebner, M., Müller, V., Großmann, A.: Determin-
ing the causing factors of errors for multiplication problems. In: European Summit
on Immersive Education. pp. 27–38. Springer (2014)
49. Taraghi, B., Saranti, A., Ebner, M., Mueller, V., Grossmann, A.: Towards a
learning-aware application guided by hierarchical classification of learner profiles.
J. UCS 21(1), 93–109 (2015)
50. Taraghi, B., Saranti, A., Ebner, M., Schön, M.: Markov chain and classification
of difficulty levels enhances the learning path in one digit multiplication. In: In-
ternational Conference on Learning and Collaboration Technologies. pp. 322–333.
Springer (2014)
51. Xenos, M.: Prediction and assessment of student behaviour in open and distance
education in computers using bayesian networks. Computers & Education 43(4),
345–359 (2004)
52. Zapata-Rivera, J.D., Greer, J.E.: Interacting with inspectable bayesian student
models. International Journal of Artificial Intelligence in Education 14(2), 127–
163 (2004)
... The copyright holder for this preprint this version posted January 12, 2022. ; doi: bioRxiv preprint for relevant subgraphs and their motifs [28], [29], walks [30] or even try to create Probabilistic Graphical Models [31], [32] which are causal structures, out of counterfactual examples that are computed by informed optimization problems [33]. ...
Full-text available
The tremendous success of graphical neural networks (GNNs) has already had a major impact on systems biology research. For example, GNNs are currently used for drug target recognition in protein-drug interaction networks as well as cancer gene discovery and more. Important aspects whose practical relevance is often underestimated are comprehensibility, interpretability, and explainability. In this work, we present a graph-based deep learning framework for disease subnetwork detection via explainable GNNs. In our framework, each patient is represented by the topology of a protein-protein network (PPI), and the nodes are enriched by molecular multimodal data, such as gene expression and DNA methylation. Therefore, our novel modification of the GNNexplainer for model-wide explanations can detect potential disease subnetworks, which is of high practical relevance. The proposed methods are implemented in the GNN-SubNet Python program, which we have made freely available on our GitHub for the international research community (
... 134). The probabilistic graphical models of Saranti et al. (2019) provide insights into human learning competence and can be used to personalize tutoring according to a learner's knowledge level. The factors causing incorrect student responses and the weights of these factors provide valuable insight for teachers. ...
This study aimed to categorize learning analytics (LA) models and identify their relevant components by analyzing LA-related articles published between 2011 and 2019 in international journals. A total of 101 articles discussing various LA models were selected. These models were characterized according to their goals and components. A qualitative content analysis approach was used to develop a coding scheme for analyzing the aforementioned models. The results reveal that the studied LA models belong to five categories, namely performance, meta-cognitive, interactivity, communication, and data models. The majority of the selected LA-related articles were data models, followed by performance models. This review also identified 16 components that were commonly used in the studied models. The results indicate that analytics was the most common component in the studied models (used in 10 LA models). Furthermore, visualization was the most relevant component in the studied communication models.
... More specifically, concept graphs [72] consist of an attempt to compute a graph from the concepts that are learned by trained neural network models and relations thereof. They draw inspiration from Bayesian models (which are also graphical models [15], [73]) that have interpretable random variables but lack performance and abilities to generalize to group the weights of trained deep neural networks with hierarchical clustering. The ultimate goal is to find active inference trails in the created graphical model, based on assumptions about the network weights and create visual trail descriptions that will be validated by medical professionals [74]. ...
Full-text available
Machine intelligence is very successful at standard recognition tasks when having high-quality training data. There is still a significant gap between machine-level pattern recognition and human-level concept learning. Humans can learn under uncertainty from only a few examples and generalize these concepts to solve new problems. The growing interest in explainable machine intelligence, requires experimental environments and diagnostic tests to analyze weaknesses in existing approaches to drive progress in the field. In this paper, we discuss existing diagnostic tests and test data sets such as CLEVR, CLEVERER, CLOSURE, CURI, Bongard-LOGO, V-PROM, and present our own experimental environment: The KANDINSKYPatterns, named after the Russian artist Wassily Kandinksy, who made theoretical contributions to compositivity, i.e. that all perceptions consist of geometrically elementary individual components. This was experimentally proven by Hubel &Wiesel in the 1960s and became the basis for machine learning approaches such as the Neocognitron and the even later Deep Learning. While KANDINSKYPatterns have computationally controllable properties on the one hand, bringing ground truth, they are also easily distinguishable by human observers, i.e., controlled patterns can be described by both humans and algorithms, making them another important contribution to international research in machine intelligence.
... Probabilistic graphical models [34] are used in interpretable description of images. Scene graphs are conditional random fields that are used to model objects, their attributes and relationships within an image, through random variables connected with edges [20]. ...
The study of visual concept learning methodologies has been developed over the last years, becoming the state-of-the art research that challenges the reasoning capabilities of deep learning methods. In this paper we discuss the evolution of those methods, starting from the captioning approaches that prepared the transition to current cutting-edge visual question answering systems. The emergence of specially designed datasets, distilled from visual complexity, but with properties and divisions that challenge abstract reasoning and generalization capabilities, encourages the development of AI systems that will support them by design. Explainability of the decision making process of AI systems, either built-in or as a by-product of the acquired reasoning capabilities, underpins the understanding of those systems robustness, their underlying logic and their improvement potential.
... Users answer 1-digit multiplication questions that are posed to them sequentially. Detailed information about the gathered data, student modelling and analysis can be found in [12,14]. The student model that was designed and provided valuable insights, is used in the forthcoming sections and its structure is depicted in Fig. 1. ...
Full-text available
Code quality is a requirement for successful and sustainable software development. The emergence of Artificial Intelligence and data driven Machine Learning in current applications makes customized solutions for both data as well as code quality a requirement. The diversity and the stochastic nature of Machine Learning algorithms require different test methods, each of which is suitable for a particular method. Conventional unit tests in test-automation environments provide the common, well-studied approach to tackle code quality issues, but Machine Learning applications pose new challenges and have different requirements, mostly as far the numerical computations are concerned. In this research work, a concrete use of property-based testing for quality assurance in the parameter learning algorithm of a probabilistic graphical model is described. The necessity and effectiveness of this method in comparison to unit tests is analyzed with concrete code examples for enhanced retraceability and interpretability, thus highly relevant for what is called explainable AI.
... The above-mentioned KT approaches model students' learning in an implicit manner by obtaining their (implicit) knowledge states through learning from sequences of multiple attempts. However, there are only a few studies in the field of KT that have addressed learning and forgetting explicitly and simultaneously [8,9,26,30,40,41], while either simplifying the forgetting behavior or just ignoring it. ...
Full-text available
Knowledge tracing (KT) is essential for adaptive learning to obtain learners’ current states of knowledge for the purpose of providing adaptive service. Generally, the knowledge construction procedure is constantly evolving because students dynamically learn and forget over time. Unfortunately, to the best of our knowledge most existing approaches consider only a fragment of the information that relates to learning or forgetting, and the problem of making use of rich information during learners’ learning interactions to achieve more precise prediction of learner performance in KT remains under-explored. Moreover, existing work either neglects the problem difficulty or assumes that it is constant, and this is unrealistic in the actual learning process as problem difficulty affects performance undoubtedly and also varies overtime in terms of the cognitive challenge it presents to individual learners. To this end, we herein propose a novel model, KTM-DLF (Knowledge Tracing Machine by modeling cognitive item Difficulty and Learning and Forgetting), to trace the evolution of each learner’s knowledge acquisition during exercise activities by modeling his or her dynamic knowledge construction procedure and cognitive item difficulty. Specifically, we first specify the concept of cognitive item difficulty and propose a method to model the cognitive item difficulty adaptively based on learners’ learning histories. Then, based on two classical theories (the learning curve theory and the Ebbinghaus forgetting curve theory), we propose a method for modeling learners’ learning and forgetting over time. Finally, the KTM-DLF model is proposed to incorporate learners’ abilities, the cognitive item difficulty, and the two dynamic procedures (learning and forgetting) together. We then use the factorization machine framework to embed features in high dimensions and model pairwise interactions to increase the model’s accuracy. 
Extensive experiments have been conducted on three public real-world datasets, and the results confirm that our proposed model outperforms the other state-of-the-art educational data mining models.
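The two classical dynamics that KTM-DLF combines can be sketched in a few lines. The function shapes follow the learning-curve and Ebbinghaus forgetting-curve theories named in the abstract, but the parameter names and values below (memory strength, learning rate, starting level) are illustrative assumptions, not the fitted model from the paper:

```python
import math

def retention(t_days, strength):
    """Ebbinghaus forgetting curve: retention decays exponentially with
    elapsed time t, moderated by a memory-strength parameter S
    (hypothetical units)."""
    return math.exp(-t_days / strength)

def learned_skill(n_attempts, rate=0.3, start=0.2):
    """Classic learning curve: success probability rises with practice
    opportunities and approaches 1 (rate and start are illustrative)."""
    return 1.0 - (1.0 - start) * math.exp(-rate * n_attempts)

# A learner practices, then pauses: mastery grows while retention decays.
for day, attempts in [(0, 0), (1, 5), (7, 5)]:
    p = learned_skill(attempts) * retention(day, strength=5.0)
    print(f"day {day}: expected success = {p:.2f}")
```

Combining the two curves multiplicatively is one simple modeling choice; KTM-DLF itself embeds such features in a factorization machine rather than multiplying them directly.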
Explainable Artificial Intelligence (xAI) is an established field with a vibrant community that has developed a variety of very successful approaches to explain and interpret predictions of complex machine learning models such as deep neural networks. In this article, we briefly introduce a few selected methods and discuss them in a short, clear and concise way. The goal of this article is to give beginners, especially application engineers and data scientists, a quick overview of the state of the art in this current topic. The following 17 methods are covered in this chapter: LIME, Anchors, GraphLIME, LRP, DTD, PDA, TCAV, XGNN, SHAP, ASV, Break-Down, Shapley Flow, Textual Explanations of Visual Models, Integrated Gradients, Causal Models, Meaningful Perturbations, and X-NeSyL.
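Several of the listed methods (SHAP, ASV, Shapley Flow) build on Shapley values. The core idea can be illustrated with an exact, brute-force Shapley computation for a single prediction; the toy model and baseline below are hypothetical, and real libraries such as SHAP use far more efficient approximations:

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values for one prediction by enumerating all feature
    coalitions (tractable only for a handful of features). Features absent
    from a coalition are set to their baseline value."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes among the other features
            for s in combinations(others, k):
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[i] += w * (value(set(s) | {i}) - value(set(s)))
    return phi

# Toy linear model: Shapley values reduce to w_i * (x_i - baseline_i),
# so the attribution is approximately [2.0, 3.0, -1.0].
model = lambda z: 2 * z[0] + 3 * z[1] - z[2]
print(shapley_values(model, x=[1, 1, 1], baseline=[0, 0, 0]))
```

For a linear model the exact values match the weighted feature deviations, which makes the toy case easy to verify by hand.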
Explainable AI is not a new field. Since at least the early exploitation of C.S. Peirce’s abductive reasoning in expert systems of the 1980s, there have been reasoning architectures to support an explanation function for complex AI systems, including applications in medical diagnosis, complex multi-component design, and reasoning about the real world. So explainability is at least as old as early AI, and a natural consequence of the design of AI systems. While early expert systems consisted of handcrafted knowledge bases that enabled reasoning over narrow, well-defined domains (e.g., INTERNIST, MYCIN), such systems had no learning capabilities and only primitive uncertainty handling. But the evolution of formal reasoning architectures to incorporate principled probabilistic reasoning helped address the capture and use of uncertain knowledge.
Conference Paper
Most Machine Learning (ML) researchers focus on automatic Machine Learning (aML), where great advances have been made, for example in speech recognition, recommender systems, or autonomous vehicles. Automatic approaches greatly benefit from the availability of “big data”. However, sometimes, for example in health informatics, we are confronted with a small number of data sets or rare events, and with complex problems where aML approaches fail or deliver unsatisfactory results. Here, interactive Machine Learning (iML) may be of help, and the “human-in-the-loop” approach may be beneficial in solving computationally hard problems, where human expertise can help to reduce an exponential search space through heuristics. In this paper, experiments are discussed which help to evaluate the effectiveness of the iML “human-in-the-loop” approach, particularly in opening the “black box”, thereby enabling a human to directly and indirectly manipulate and interact with an algorithm. For this purpose, we selected the Ant Colony Optimization (ACO) framework and use it on the Traveling Salesman Problem (TSP), which is of high importance in solving many practical problems in health informatics, e.g. in the study of proteins.
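The ACO-on-TSP setting from this paper can be sketched as a minimal, self-contained implementation; the human-in-the-loop interaction studied there is omitted, and all parameter values below are illustrative defaults rather than the paper's configuration:

```python
import math
import random

def aco_tsp(dist, n_ants=20, n_iters=100, alpha=1.0, beta=3.0,
            rho=0.5, q=1.0, seed=0):
    """Minimal Ant Colony Optimization sketch for the TSP.
    dist: symmetric distance matrix. Returns (best_tour, best_length)."""
    rng = random.Random(seed)
    n = len(dist)
    tau = [[1.0] * n for _ in range(n)]            # pheromone levels
    eta = [[0.0 if i == j else 1.0 / dist[i][j]    # heuristic desirability
            for j in range(n)] for i in range(n)]
    best_tour, best_len = None, float("inf")
    for _ in range(n_iters):
        tours = []
        for _ in range(n_ants):
            tour = [rng.randrange(n)]
            unvisited = set(range(n)) - {tour[0]}
            while unvisited:
                i = tour[-1]
                cand = list(unvisited)
                weights = [tau[i][j] ** alpha * eta[i][j] ** beta
                           for j in cand]
                j = rng.choices(cand, weights=weights)[0]
                tour.append(j)
                unvisited.remove(j)
            length = sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))
            tours.append((tour, length))
            if length < best_len:
                best_tour, best_len = tour, length
        # Evaporate, then deposit pheromone proportional to tour quality.
        for i in range(n):
            for j in range(n):
                tau[i][j] *= 1.0 - rho
        for tour, length in tours:
            for k in range(n):
                a, b = tour[k], tour[(k + 1) % n]
                tau[a][b] += q / length
                tau[b][a] += q / length
    return best_tour, best_len

# Four cities on a unit square: the optimal tour is the perimeter, length 4.
d = [[0, 1, math.sqrt(2), 1],
     [1, 0, 1, math.sqrt(2)],
     [math.sqrt(2), 1, 0, 1],
     [1, math.sqrt(2), 1, 0]]
tour, length = aco_tsp(d)
print(tour, length)
```

In the paper's iML variant, a human can additionally adjust pheromone levels between iterations; here the colony runs fully automatically.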
Data, data, data ... arise in today's world through the growing supply of web-based applications. The more the internet opens up, and the more users register on a platform, the more the underlying databases fill up. The buzzwords »Big Data« and »Data Mining« refer to the analysis of these data, their interpretation, and an often automated, user-dependent reaction. The implications of applying such an approach to teaching and learning can hardly be estimated today. In this contribution we give an example-based introduction to the research field of Learning Analytics (LA). The presentation is also meant to help distinguish LA from the conventional field of Educational Data Mining (EDM). As examples of LA, applications for learning arithmetic and for analysing activities in the Personal Learning Environment (PLE) of the Technische Universität Graz are presented, and it is shown how teachers can profit from the results. Characteristic of these applications is that vast amounts of data are collected, stored, and analysed in a way that would not be possible without the use of technology. We thus see opportunities for a substantial improvement in the quality of teaching and learning through individual, timely, precise, yet compact feedback for everyone involved in teaching processes. With regard to research on teaching and learning, we are just entering a space whose potential for insight can only be guessed at.
Conference Paper
Literature in the area of psychology and education provides domain knowledge to learning applications. This work detects the difficulty levels within a set of multiplication problems and analyses the dataset with respect to different error types as described and determined in several pedagogical surveys and investigations. Our research sheds light on the impact of each error type in simple multiplication problems and on the evolution of error rates for the different error types as problem size increases.
Predictive Analytics with Microsoft Azure Machine Learning, Second Edition is a practical tutorial introduction to the field of data science and machine learning, with a focus on building and deploying predictive models. The book provides a thorough overview of the Microsoft Azure Machine Learning service, released for general availability on February 18th, 2015, with practical guidance for building recommenders, propensity models, and churn and predictive maintenance models. The authors use task-oriented descriptions and concrete end-to-end examples to ensure that the reader can immediately begin using this new service. The book describes all aspects of the service, from data ingress to applying machine learning, evaluating the models, and deploying them as web services. Learn how you can quickly build and deploy sophisticated predictive models with the new Azure Machine Learning from Microsoft. What's new in the second edition? Five new chapters have been added with practical detailed coverage of: Python integration (a new feature announced in February 2015); data preparation and feature selection; data visualization with Power BI; recommendation engines; and selling your models on the Azure Marketplace. What you'll learn: a structured introduction to data science and its best practices; an introduction to the new Microsoft Azure Machine Learning service, explaining how to effectively build and deploy predictive models; practical skills such as how to solve typical predictive analytics problems like propensity modeling, churn analysis, product recommendation, and visualization with Power BI; and a practical way to sell your own predictive models on the Azure Marketplace. Who this book is for: data scientists, business analysts, BI professionals, and developers who are interested in expanding their repertoire of skills applied to machine learning and predictive analytics, as well as anyone interested in an in-depth explanation of the Microsoft Azure Machine Learning service through practical tasks and concrete applications. The reader is assumed to have basic knowledge of statistics and data analysis, but not deep experience in data science or data mining. Advanced programming skills are not required, although some experience with R programming would prove very useful.
Intelligent tutoring systems adapt the curriculum to the needs of the individual student. Therefore, an accurate representation and prediction of student knowledge is essential. Bayesian Knowledge Tracing (BKT) is a popular approach for student modeling. The structure of BKT models, however, makes it impossible to represent the hierarchy and relationships between the different skills of a learning domain. Dynamic Bayesian networks (DBN) on the other hand are able to represent multiple skills jointly within one model. In this work, we suggest the use of DBNs for student modeling. We introduce a constrained optimization algorithm for parameter learning of such models. We extensively evaluate and interpret the prediction accuracy of our approach on five large-scale data sets of different learning domains such as mathematics, spelling learning and physics. We furthermore provide comparisons to previous student modeling approaches and analyze the influence of the different student modeling techniques on instructional policies. We demonstrate that our approach outperforms previous techniques in prediction accuracy on unseen data across all learning domains and yields meaningful instructional policies.
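The BKT baseline this work compares against can be written down compactly: a Bayesian posterior over the hidden "skill known" state given the observed answer, followed by a learning transition. The parameter values below (learn, slip, and guess probabilities) are illustrative, not fitted:

```python
def bkt_update(p_know, correct, p_learn=0.1, p_slip=0.1, p_guess=0.2):
    """One Bayesian Knowledge Tracing step.
    First compute the posterior P(known | answer) via Bayes' rule,
    then apply the learning transition P(not known -> known) = p_learn."""
    if correct:
        evidence = p_know * (1 - p_slip) + (1 - p_know) * p_guess
        posterior = p_know * (1 - p_slip) / evidence
    else:
        evidence = p_know * p_slip + (1 - p_know) * (1 - p_guess)
        posterior = p_know * p_slip / evidence
    return posterior + (1 - posterior) * p_learn

p = 0.3  # prior probability that the skill is already mastered
for obs in [True, True, False, True]:
    p = bkt_update(p, obs)
    print(f"P(known) = {p:.3f}")
```

Because each skill is traced by an independent two-state chain like this one, BKT cannot express hierarchies or dependencies between skills, which is exactly the limitation the DBN formulation addresses.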
The development of clinical reasoning and decision-making skills in medicine is inextricably linked to experience. Yet students are failing to gain this experience before embarking on their first medical jobs. This is due to several factors, including advances in medical science resulting in patients that are less likely to be hospitalized and more likely to be treated in outpatient departments. This lack of experience within real-world scenarios has resulted in students feeling they are ill prepared for their first medical jobs. One way to counter such a lack of experience is through the use of software simulations such as Virtual Patients. However, simulations are extremely costly to develop, in terms of both financial outlay and the time required to create them. We report here on the development of an iPad-based Virtual Patient simulation system that uses annotated electronic patient data and health records for the creation of cases to enable students to learn critical decision-making skills. By basing these Virtual Patients on abundant patient records, cases can be more quickly and easily created, thus enabling pools of cases to be accumulated—essential for gaining the experience required for the development of sound clinical reasoning skills.