All content in this area was uploaded by Fabiana Sgobbi on Nov 11, 2016
DDA AV - Student Performance Detector
Andreia Rosangela Kessler Mühlbeier
UFSM – Universidade Federal de Santa Maria
Santa Maria - Brasil
andreiamuhlbeier@yahoo.com.br
Aderson de Carvalho
UFSM – Universidade Federal de Santa Maria
Santa Maria - Brasil
acarvalho@inf.ufsm.br
Fabiana Santiago Sgobbi
UFRGS – Universidade Federal do Rio Grande do Sul
Porto Alegre - Brasil
fabianasgobbi@gmail.com
Roseclea Duarte Medina
UFSM – Universidade Federal de Santa Maria
Santa Maria - Brasil
roseclea.medina@gmail.com
Liane Margarida Rockenbach Tarouco
UFRGS – Universidade Federal do Rio Grande do Sul
Porto Alegre - Brasil
liane@penta.ufrgs.br
Abstract - The accelerated development and increasing use of
virtual learning environments (VLE) motivate a change in
education. This article presents a study of data mining (DM)
techniques and tools aimed at investigating and analysing
student behaviour in the virtual environment while a course
is running, giving the professor feedback on the student's
academic performance. This feedback can stimulate student
participation and improve student development, as well as
help avoid course dropout. The results obtained from the
study of techniques and tools show that it is possible to
obtain these inferences over the period of the course.
Keywords – data mining; student performance; weka; knowledge
discovery in databases.
I. INTRODUCTION
Earlier achievements and the spread of new technologies
open new perspectives for teaching in face-to-face, blended
and distance education. Through specific tools and through
access from mobile devices, Virtual Learning Environments
(VLE) have an increasingly prominent role in teaching.
Virtual learning environments promote a change in education
and enable greater interaction among students, professors,
tutors, subjects and interfaces, which makes these
interactions an effective part of the learning process [1].
These environments record a large range of data, a very
important source of knowledge that often ends up being
discarded, sometimes for lack of knowledge of how to
interpret it [2].
In this context, the evaluation of student performance in
virtual learning environments is usually carried out at the
end of each subject and, in most cases, treats cognitive
performance as static, leaving practically no time for
corrective action. There is therefore a clear need to carry
out evaluation and monitoring during the course, in order to
propose alternatives for making better use of it and to
provide more support for the early identification of whether
the student is succeeding in learning.
This research aims to investigate and analyze student
performance in the virtual learning environment, applying
data mining (DM) techniques, within the Knowledge Discovery
in Databases (KDD) process, to the information recorded in
the database. This makes it possible to chart the student's
performance while the course is running. The chart gives
feedback to the teacher, who can then stimulate and improve
the student's participation in the course.
This paper is organized as follows: Section II presents
virtual learning environments. Section III describes the
theoretical basis for knowledge discovery in databases.
Section IV describes the WEKA tool and its relevant
characteristics. Section V presents the related work.
Section VI presents the development of the research
methodology. Section VII presents the Student Performance
Detector (DDA AV), considering its pedagogical and
technological aspects, and the results. Section VIII
presents the conclusions of this paper.
II. VIRTUAL LEARNING ENVIRONMENTS
Virtual Learning Environments (VLE) are software
applications that run on web servers and offer a set of
tools for creating courses and supporting learning. These
environments usually classify their users into three pre-set
profiles: Administrator, Teacher and Student. It is worth
pointing out that there is a further profile, the Tutor, who
works with the teacher and is responsible for pedagogical
mediation [3].
One environment widely used in education is the Modular
Object-Oriented Dynamic Learning Environment (MOODLE). It is
open source software whose development was started in the
1990s by Martin Dougiamas, based on the constructivist and
social constructivist philosophies of learning. It supports
the creation and administration of courses focused on
collaborative work, in an environment that is simple and
intuitive to use [4].
Among the free software packages that support the
teaching/learning process, we chose MOODLE for the present
research. The choice is due to the fact that the environment
has a large number of tools, is constantly updated, has a
large community of users who collaborate on its evolution,
and permits the integration of other techniques through its
repositories.
III. KNOWLEDGE DISCOVERY IN DATABASES
With advances in computer technology enabling the storage
and processing of large volumes of data, new technologies
have been developed to assist in extracting information
from these databases, through techniques such as Knowledge
Discovery in Databases (KDD) and Data Mining (DM) [5].
The process of Knowledge Discovery in Databases (KDD), as
presented by Fayyad [6], is a non-trivial process of
identifying valid, previously unknown, potentially useful
and interpretable patterns. It consists of discovering
useful knowledge in stored data through the application of
modern data mining techniques, the evaluation of the
patterns obtained and the interpretation of the results.
The KDD process includes complex phases that must be
executed carefully, since this is of paramount importance
for reaching the established aims and for the overall
success of the application. It is divided into:
Pre-processing, Data Mining and Post-processing [6].
A. Pre-processing
In this phase, the problem is identified and understood,
considering aspects such as the aims and the data sources
from which the knowledge is to be extracted. The next step
consists of selecting the data from the sources, according
to the aim of the process, and of preparing the data to be
submitted to methods and tools during the pattern extraction
phase.
B. Post-processing
In this phase, the extracted knowledge is assessed in terms
of its quality and usefulness, so that it can support a
decision-making process carried out either by a human expert
or by an expert system.
C. Data Mining
The Data Mining phase is the central phase, in which new
knowledge is actually discovered: its algorithms produce
knowledge in a semiautomatic way from the data prepared
earlier.
The DM process includes data selection, preparation, and the
application of tasks and/or techniques and their respective
algorithms, followed by the analysis of the results to
identify the extracted knowledge. DM tasks are divided into:
a) Association.
b) Classification.
c) Estimation.
d) Segmentation.
e) Summarization.
These tasks are performed by implementing algorithms based on
machine learning techniques such as:
a) Genetic algorithms.
b) Decision trees.
c) Association rule discovery.
d) Case-based reasoning.
e) Neural networks.
In this research, the classification task using a decision
tree built with the J48 algorithm [7] was chosen. The choice
reflects the aim of using both to identify the student's
performance on the basis of a set of parameters. A decision
tree applies a hierarchical sequence of tests organized as a
tree, with leaf nodes representing classes; each
classification rule corresponds to a path through the tree
from the root to a leaf node. J48 is a classification
algorithm, a Java implementation of the C4.5 release 8
algorithm; it builds a decision tree model from a set of
attributes and uses this model to classify the instances
into classes [7].
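To make the tree-building step more concrete, the sketch below computes the gain ratio, the criterion that C4.5 (and hence J48) uses to choose which attribute to test at a node. It is a minimal illustration for nominal attributes only, not the WEKA implementation; the records, the attribute "delivered" and the function names are invented for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gain_ratio(rows, attribute, target):
    """Gain ratio of splitting `rows` (a list of dicts) on a nominal attribute."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    # Weighted entropy that remains after the split.
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    # Split information penalizes attributes with many distinct values.
    split_info = entropy([r[attribute] for r in rows])
    gain = base - remainder
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical toy records: the values are invented for illustration.
students = [
    {"center": "A", "delivered": "yes", "situation": "Sufficient"},
    {"center": "A", "delivered": "yes", "situation": "Sufficient"},
    {"center": "B", "delivered": "no", "situation": "Insufficient"},
    {"center": "B", "delivered": "yes", "situation": "Sufficient"},
    {"center": "A", "delivered": "no", "situation": "Insufficient"},
    {"center": "B", "delivered": "no", "situation": "Insufficient"},
]

# The attribute with the highest gain ratio becomes the test at the node.
best = max(["center", "delivered"], key=lambda a: gain_ratio(students, a, "situation"))
print(best)
```

In this toy data the attribute "delivered" separates the two classes perfectly, so it is selected as the root test. The full algorithm also handles numeric attributes, missing values and pruning, which are omitted here.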
IV. WEKA TOOL
The constant growth of digital information leads to great
interest in discovering the implicit knowledge it contains.
According to [8][9], some aspects should be considered when
choosing a tool for knowledge discovery:
a) Ability to access a range of data sources, online and
offline.
b) Capacity for including object-oriented or non-standard
data.
c) Processing capacity in terms of the maximum number of
tables, records or attributes.
d) Range of attribute types the tool can handle.
e) Type of query language.
The Weka tool was developed at the Department of Computer
Science of the University of Waikato, New Zealand. The tool
offers techniques for executing the selected data mining
tasks, such as [10]: association, classification and
clustering.
Mining starts by reading the data from a file formatted
specifically for the tool, the ARFF (Attribute-Relation File
Format). An ARFF file is a text file that describes a list
of instances sharing a set of attributes [10].
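As an illustration of the ARFF layout, the sketch below renders a few records as ARFF text: a header with the relation name and one declaration per attribute, followed by a data section with one comma-separated instance per line. The relation name, attribute names and values are hypothetical and are not the file used in this work.

```python
def to_arff(relation, attributes, rows):
    """Render rows as ARFF text.

    `attributes` is a list of (name, type) pairs, where type is either the
    string "numeric" or a list of the allowed nominal values.
    """
    lines = [f"@relation {relation}", ""]
    for name, kind in attributes:
        spec = kind if isinstance(kind, str) else "{" + ",".join(kind) + "}"
        lines.append(f"@attribute {name} {spec}")
    lines += ["", "@data"]
    lines += [",".join(str(v) for v in row) for row in rows]
    return "\n".join(lines) + "\n"


# Hypothetical relation and values, for illustration only.
attributes = [
    ("center", ["CachoeiraDoSul", "CruzAlta"]),
    ("grade", "numeric"),
    ("situation", ["Sufficient", "Insufficient"]),
]
rows = [("CruzAlta", 8.5, "Sufficient"), ("CachoeiraDoSul", 4.0, "Insufficient")]
arff_text = to_arff("student_performance", attributes, rows)
print(arff_text)
```

The sketch assumes values without spaces or commas; values containing them would need to be quoted before being written.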
The choice of the WEKA tool for this work is justified by
the fact that it is portable and is written in an
object-oriented, multiplatform language. The portability of
this language enables the tool to run on different
platforms, and its object orientation brings advantages such
as modularity, polymorphism, encapsulation and code reuse,
among others [11].
V. RELATED WORK
The work of Maia et al. [12] focuses on predicting the
future performance of students in the subjects of an
undergraduate degree from the grades achieved in subjects
already taken. In this model, students and course subjects
were modeled as the nodes of a graph and the relationships
between them as its edges. The authors reported a large
variation among subjects in the average errors analyzed,
ranging from 3.6% to 100%. They conclude that a significant
mean error for a subject could indicate either that it has
little relationship with the other subjects in the
curriculum, or that its assessment is to some degree
disconnected from the results obtained in other subjects.
In [13], in view of the high dropout rates in distance
courses, fieldwork was carried out through an interview with
a distance education professional to identify evidence of
course dropout. Based on the attributes identified, a
prototype was designed to retrieve information about these
students from the user log records stored in the database.
The work follows the KDD process and used the WEKA tool, in
particular the J48 algorithm, which predicts behavior by
means of decision trees. The author concludes that, from the
accesses to the VLE, it is possible to identify usage
patterns and diagnose evidence of dropout, and thus to
propose corrective measures so that the student comes to use
the VLE appropriately.
According to [14], VLEs used to support classroom courses
are characterized by storing a large volume of data, and
these environments need tools to filter useful information
in order to detect student performance. The research
investigated the data stored in the VLE to extract
information related to student performance. To detect this
information, it was necessary to select a set of attributes
covering three dimensions: VLE usage profile,
student-student interaction and two-way student-teacher
interaction. The study used the RandomForest [7] and
MultilayerPerceptron [7] classification algorithms available
in the Weka tool, and in all the experiments the k-fold
cross-validation method [7] was used to partition the data.
The results of applying DM techniques to the selected set of
attributes showed that it is possible to obtain inferences
about student performance with overall accuracy rates
ranging from 72% to 80%. The author points out, however,
that the accuracy rate may be insufficient to evaluate the
quality of the classification model, since the number of
instances per class is unbalanced in the case study and
differs between scenarios.
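The caveat about unbalanced classes can be illustrated with a small sketch. The labels below are invented and are not the data from [14]: a classifier that always predicts the majority class reaches a high accuracy while recovering none of the minority-class students, which is why a per-class measure such as recall is worth reporting alongside accuracy.

```python
def accuracy(actual, predicted):
    """Fraction of instances whose predicted label matches the actual one."""
    hits = sum(a == p for a, p in zip(actual, predicted))
    return hits / len(actual)


def recall(actual, predicted, label):
    """Fraction of the instances of `label` that the classifier recovered."""
    relevant = [p for a, p in zip(actual, predicted) if a == label]
    return sum(p == label for p in relevant) / len(relevant)


# Invented, deliberately unbalanced labels: 8 passing students, 2 failing.
actual = ["pass"] * 8 + ["fail"] * 2
# A trivial classifier that always predicts the majority class.
predicted = ["pass"] * 10

print(accuracy(actual, predicted))        # 0.8
print(recall(actual, predicted, "fail"))  # 0.0
```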
No analyses focusing on student performance in the virtual
learning environment while the course is running were
identified in the current literature. However, there are
indications that this type of analysis is important to help
the teacher stimulate participation and improve students'
learning performance in MOODLE.
VI. RESEARCH METHODOLOGY
In terms of its nature, this research is classified as
qualitative, descriptive field research. According to Lakatos
and de Marconi [15], field research aims both to obtain data
about a problem and to reach a result for it, the two being
closely related.
In the first phase of development, for the application of
data mining in a VLE, bibliographic research was carried out
in order to learn how knowledge discovery in databases works
and to understand and analyze the steps of the data mining
operation (tasks and techniques), as well as the
functionality of the available data mining tools.
The second phase involved two steps. First, a hardware
structure was assembled to support the installation,
development and implementation of this work. It consisted of
a Dell PowerEdge T300 server with an Intel Xeon quad-core
X3363 2.83 GHz processor (4 physical and 4 virtual cores),
8 GB of RAM, two 500 MB hard disks and the 64-bit Windows
Server 2008 operating system. Then, the following programs
were installed on this server: WampServer version 2.2, whose
package provides the software required to run MOODLE, namely
Apache server version 2.2.22, MySQL database version 5.5.24,
PHP version 5.2.13 and phpMyAdmin version 3.4.10.1.
Following this, MOODLE VLE version 2.5.2 was installed. For
the development, editing and manipulation of the
environment, a Philco notebook computer with an Intel
Pentium Dual-Core SU4100 1.3 GHz processor, 2 GB of RAM, a
320 GB hard drive and 64-bit Windows 7 was used.
Once the MOODLE environment was installed, the source of
research was the database of the discipline Introduction to
the Integration of Media in Education, from the postgraduate
course in Media in Education of the Federal University of
Santa Maria (UFSM), offered to students in distance
education mode during the second half of 2012. In that
semester, there were 134 (one hundred thirty-four) students
in five university centers (Cachoeira do Sul, Cruz Alta,
Panambi, Restinga Seca and Santana do Livramento) and 10
activities.
In the third phase, the modeling of the block's operation
began. The modeling was done with the Astah Community tool,
which supports the construction of Unified Modeling Language
(UML) diagrams [16] such as use case diagrams and activity
diagrams, among others. Astah Community [16] is free
modeling software for object-oriented design, based on UML
diagrams and notation, and it can generate Java code.
The fourth phase was the installation of the WEKA tool,
version 3.7.8, developed in the Java programming language,
which offers a range of algorithms for data preprocessing
and for the analysis of results. In this software, the ARFF
files (*.arff) and their respective rules for the J48
algorithm were generated. This algorithm builds decision
trees that classify the instances and present in their
branches the most relevant attributes, such as: name,
center, discipline, grades of the performed activities and
situation.
In the fifth phase, the rules generated in the WEKA software
from the ARFF file (*.arff) were translated into the PHP
language. The information (grade, center and situation) was
extracted from the MOODLE database into an Excel
spreadsheet. The information was then processed in the WEKA
tool, generating a Notepad text file with the .arff
extension. After this, the generated file was converted into
the PHP programming language [17] using the PHP Editor
software.
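The translation of a decision tree into program rules can be sketched as follows. This is an illustration in Python rather than the PHP code of this work, and the one-split tree, the attribute name "grade" and the threshold 6.0 are hypothetical. Each root-to-leaf path becomes one IF ... THEN rule, and classifying a record amounts to following the matching path.

```python
# A hypothetical one-split tree in the spirit of J48 output; the threshold 6.0
# and the attribute name "grade" are illustrative, not taken from the paper.
TREE = {
    "attribute": "grade",
    "threshold": 6.0,
    "low": "Insufficient",   # branch taken when value <= threshold
    "high": "Sufficient",    # branch taken when value > threshold
}


def classify(node, record):
    """Walk the tree from the root until a leaf (a plain string) is reached."""
    while isinstance(node, dict):
        branch = "low" if record[node["attribute"]] <= node["threshold"] else "high"
        node = node[branch]
    return node


def to_rules(node, conditions=()):
    """Flatten the tree into one IF ... THEN rule per root-to-leaf path."""
    if not isinstance(node, dict):
        premise = " AND ".join(conditions) or "TRUE"
        return [f"IF {premise} THEN {node}"]
    attr, thr = node["attribute"], node["threshold"]
    return (to_rules(node["low"], conditions + (f"{attr} <= {thr}",))
            + to_rules(node["high"], conditions + (f"{attr} > {thr}",)))


for rule in to_rules(TREE):
    print(rule)
print(classify(TREE, {"grade": 4.5}))
```

Rules in this flat form map directly onto if/else statements in any target language, which is what makes a manual or scripted translation feasible.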
In the sixth phase, there were two integrated actions. The
first was the construction of the block. This block takes
the number of any activity proposed in the discipline that
is to be analysed. The second was the integration of this
block into the MOODLE learning environment. The developed
block works as a plugin attached through an API that makes
it available in the environment's interface.
In the seventh phase, validation tests for each phase of
development were carried out through white box testing
(performed by the web developer). According to Sommerville
[18], these tests are derived from knowledge of the
structure and implementation of the software, so that the
developer aims to test and to know all the system code,
examining its logical paths to verify that the tool works.
During development, the following tests were used: basis
path testing, which consists of verifying whether the
system's instructions were each executed at least once
during the test; and condition testing, which consists of
verifying all the logical conditions of the system for the
most common errors, such as errors in parentheses,
relational operators and arithmetic expressions [19].
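A minimal sketch of how these two kinds of test might look for a single rule is given below. The function and its cutoff of 6.0 are hypothetical and are not the code tested in this work. The three cases together execute both branches (basis path testing) and probe the boundary of the relational operator (condition testing).

```python
def is_low_performing(grade, cutoff=6.0):
    """Return True when the grade is at or below the (hypothetical) cutoff."""
    if grade <= cutoff:
        return True
    return False


def run_tests():
    """Exercise both paths and the boundary of the relational operator."""
    cases = [
        (5.9, True),   # just below the cutoff: true branch
        (6.0, True),   # exactly at the cutoff: catches "<" written for "<="
        (6.1, False),  # just above the cutoff: false branch
    ]
    results = [is_low_performing(grade) == expected for grade, expected in cases]
    return all(results)


print(run_tests())
```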
The first test was made after generating the rules in ARFF
(*.arff) format, to check the consistency of the J48
algorithm. The second test was made after translating the
rules into the PHP language. In the final test, after
integration into the MOODLE system, the block was validated.
Once the plugin is activated, the teacher enters the number
of the activity proposed in the discipline and receives
feedback on the web page listing only the low-performing
students. The result, as well as the information about the
activities carried out, is recorded in MOODLE's SQL-based
database.
To end the process, in the eighth phase monitoring reports
on the students' development (in the PHP language), as well
as the decision tree and graphics (in the WEKA software),
were generated.
VII. DDA AV - STUDENT PERFORMANCE DETECTOR
The environment aims to detect the student's performance
during the course, for each activity performed and
evaluated, using the data mining technique.
The source of the research was the discipline Introduction
to the Integration of Media in Education, with 134 students
in five (5) university centers and ten (10) activities
related to the main subject.
The most relevant attributes extracted from the environment
are: name, center, discipline, and the grades of the
student's activities that had already been recorded by the
teacher. Considering the context in which the subject was
developed, performance may mean the evaluation of the
student's interaction with the environment, the level of
learning in the proposed activities, the level of
participation and the level of difficulty in carrying out
the task.
Figure 1 illustrates the environment, showing the proposed
subjects and activities and the block "Student Performance",
which allows the number of any activity proposed in the
discipline to be submitted.
After the activities and situations were processed by the
J48 algorithm, the WEKA tool presented as output the
attributes name, center and discipline, some of which are
more relevant than others for the classification in this
work. The selected data amounted to 134 instances,
representing 100% of the records in the database. Of these,
123 were reported as "Correctly Classified Instances", an
accuracy of 91.79%, and only 11 as "Incorrectly Classified
Instances", or 8.21%.
As seen in Figure 2, the students' performance is presented
through a MOODLE report. In this report, it is possible to
see both individual performance (by student name) and group
performance (by discipline), for all the activities carried
out.
The relevance of this approach lies both in allowing the
professor to monitor students throughout the running of the
course, avoiding a posteriori analysis or the use of a
separate analysis environment, and in allowing the professor
to review the results in order to help students with
persistent learning difficulties.
Figure 1. Environment in progress and block.
Figure 2. Performance report of students.
VIII. CONCLUSION
Currently, teachers and higher education institutions face
the huge challenge of offering both high-quality and more
individualized teaching to an increasing number of students
in different modes (face-to-face, blended and distance
education). To help in this process, VLEs have been used
frequently because they allow better control, different
kinds of interaction and the adoption of different methods
and strategies. However, the large volume of complex data
makes monitoring and evaluating student performance
difficult.
In this sense, the KDD process, which aims to discover new
knowledge, assists the search through large amounts of data
and detects useful information through the application of
tasks and techniques implemented by DM algorithms.
This research aimed to apply data mining techniques to a
VLE, presenting the professor with a report on the student's
academic performance during the course, and thereby helping
to avoid the student failure that frequently leads to course
dropout. The report was produced by integrating the rules
from the J48 algorithm with the relevant attributes of the
environment's database.
The research environment chosen for the DDA AV was the VLE
MOODLE, which involves pedagogical processes between
teachers and VLE students. This research showed the
difficulty of analyzing the large amount of data available
in the VLE's database and pointed out the importance of
tools that assist the teacher in monitoring the student's
path and performance in the course.
In the research conducted, what distinguishes the DDA AV
among virtual learning environments is the advantage of
unifying, in a single report, information for the teacher
about the student's path. This constitutes a relevant
dataset with which the teacher can elaborate pedagogical
strategies to address the student's individual needs. In
addition, the DDA AV derives in a semiautomatic way the
values "Sufficient" and "Insufficient", which characterize
the student's performance and support the teacher's
inferences.
REFERENCES
[1] R. Donnelly, “Interaction analysis in a 'learning by doing'
problem-based professional development context”. Computers
& Education, vol. 55 no. 3, p. 1357-1366, 2010.
[2] C. Romero, S. Ventura, M. Pechenizkiy, R. S. J. d. Baker,
“Handbook of Educational Data Mining”, CRC Press, p. 535,
2012.
[3] A. P. Rodrigues, "Virtual Environment Integration with
Digital Learning Repository" 2012. Thesis (Ph.D. in Education.)
- Federal University of Rio Grande do Sul - UFRGS, Porto
Alegre, p.188.
[4] MOODLE. “Statistics Documentation Moodle”. 2011.
Available at: <http://docs.MOODLE.org/22/en/Statistics>.
Accessed: Mar 2015.
[5] R. Goldschmidt, E. L. Passos, “Data Mining: um guia
prático”. Rio de Janeiro: Elsevier, 2005. 2nd reprint.
[6] U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, “The KDD
process for extracting useful knowledge from volumes of data”.
Communications of the ACM, New York, vol. 39, no. 11, p.27-
34, 1996.
[7] WEKA. Waikato environment for knowledge analysis. 2013.
Available at: <http://www.cs.waikato.ac.nz/ml/weka/>.
Accessed: Apr 2015.
[8] M. Goebel, L. Gruenwald, “A survey of data mining and
knowledge discovery software tools”. In: SIGKDD
Explorations, June, 1999.
[9] L. A. Vieira, “Tools to estimate missing values in a database
in the Pre-Processing Step of a KDD”. Work and Conclusion
Course (Computer Science), University of Vale do Itajaí, 2008.
[10] I. H. Witten, E. Frank, M. A. Hall, “Data mining: Practical
machine learning tools and techniques”. San Francisco: Morgan
Kaufmann, 3 ed., 2011.
[11] D. Jacomini, “Entrants of Base Analysis in UNIDAVI”.
Work Completion Course in Information Systems. New South
Wales, in 2008.
[12] R. F. Maia, E. M. Spina, S. S. Shimizu, “System Student
Performance Forecast for Assisted Learning and Course
Rating”. Proceedings of the XXI SBIE -XVI WIE, 2010.
[13] C. S. de Afiune, “Educational Data Mining: Prediction
Behavior in Distance Education Environments (DE)”. Term
paper. State University of Goias, Anapolis, 2012, p. 108
[14] E. Gottardo, “Academic Performance estimation Students
in A AVA using Data Mining Techniques”. Dissertation (Master
of Applied Computing), Federal Technological University of
Paraná (UTFPR), 2012, p.85.
[15] E. M. Lakatos, M. A. de Marconi, “Scientific Methodology
fundamentals”. 5th . Ed . Editora Atlas . Faculty of Arts, 2003.
[16] ASTAH, “Astah Community”. 2010. Available at:
<http://astah.change-vision.com/en/product/stah-community.html>.
Accessed: Mar. 2014.
[17] C. A. J. Oliviero, “Make a site with PHP 5.2 MySQL 5.0,
E-Commerce Driven project”. 1st Edition. Ed. Erica Ltda. São
Paulo, p.412, 2013.
[18] I. Sommerville, “Software Engineering”. 6th Edition.
Addison-Wesley, 2003.
[19] R. Pressman, “Software Engineering - A Professional
Approach”. 7th Edition, 2011.