The Multimodal Learning Analytics Pipeline
Daniele Di Mitri1, Jan Schneider2, Marcus Specht1,3, Hendrik Drachsler1,2
1 Open University of The Netherlands, The Netherlands, daniele.dimitri@ou.nl
2 German Institute for International Educational Research, Germany, schneider.jan@dipf.de
3 Delft University of Technology, The Netherlands
ABSTRACT
We introduce the Multimodal Learning Analytics (MMLA)
Pipeline, a generic approach for collecting and exploiting
multimodal data to support learning activities across physical
and digital spaces. The MMLA Pipeline helps researchers set up
their multimodal experiments, reducing the setup and
configuration time required to collect meaningful datasets.
Using the MMLA Pipeline, researchers can select a set of custom
sensors to track different modalities, including behavioural cues
or affective states. Hence, researchers can quickly obtain
multimodal sessions consisting of synchronised sensor data and
video recordings. They can analyse and annotate the recorded
sessions and train machine learning algorithms to classify or
predict the patterns investigated.
I. INTRODUCTION
Learning researchers are increasingly employing multimodal
data and multi-sensor interfaces in a variety of learning
activities. Two main factors drive this interest. First, the
emergence and wide diffusion of new, seamless technologies for
data capture, such as smartphones, wearable sensors and
Internet of Things (IoT) devices; research shows that some of
these technologies can be employed in formal and non-formal
learning settings [1]. Second, learning activities in both the
academic and vocational education sectors are becoming
increasingly blended, as they take place across digital platforms
and physical, co-located settings such as group activities or
individual exercises. The related literature features several
empirical and theoretical studies that fall under the umbrella
term of Multimodal Learning Analytics (MMLA) [2]. The MMLA field
looks primarily at learning scenarios beyond the learner
seated in front of a laptop.
II. PROBLEM STATEMENT
In most of the research conducted in MMLA or contiguous
fields using multimodal data, the techniques for data collection,
synchronisation, annotation and analysis rely on tailor-made
solutions rather than standardised approaches. This is due to the
lack of established technological and methodological practices,
which leaves the field of MMLA fragmented and unwelcoming for
newcomers. We consider this a significant drawback for the field
of learning analytics, and we aim to address it with this
research.
III. PROPOSED SOLUTION
With the Multimodal Learning Analytics Pipeline, we aim to
address the lack of tools and support for MMLA researchers. The
MMLA Pipeline provides an approach for collecting and
exploiting multimodal data to support learning activities across
physical and digital spaces. It helps researchers set up their
multimodal experiments, reducing the setup and configuration
time required to collect meaningful datasets. The multimodal
data collected can support researchers in designing more
accurate student modelling, learning analytics and intelligent
machine tutoring. Using the MMLA Pipeline, researchers can
select a set of custom sensors to track different modalities,
including behavioural cues or affective states. Hence,
researchers can quickly obtain multimodal sessions consisting of
synchronised sensor data and video recordings. They can analyse
and annotate the recorded sessions and train machine learning
algorithms to classify or predict the patterns investigated.
A comprehensive overview of the MMLA Pipeline is given
in Figure 1. The MMLA Pipeline is a cycle consisting of five
steps, each of which proposes a solution to one of the five main
MMLA challenges [3]:
(1) Data collection: the techniques used for capturing,
aggregating and synchronising data from multiple
modalities and sensor streams;
(2) Data storing: the approach used for organising
multimodal data, which come in multiple formats and
large sizes, so that they can be stored and retrieved later;
(3) Data annotation: how to attach meaning to portions
of multimodal recordings and to collect human
interpretations through expert reports or self-reports;
Fig. 1 – Graphical representation of the MMLA Pipeline: the five steps (1. data collection, 2. data storing, 3. data annotation, 4. data processing, 5. data exploitation) connect sensor data, expert reports and prediction models across a research phase and a production phase, feeding dashboards, intelligent tutors and corrective feedback.
(4) Data processing: the approach for cleaning, aligning and
integrating the ‘raw’ multimodal data, extracting relevant
features and transforming them into a new data
representation suitable for exploitation;
(5) Data exploitation: the approach for ultimately supporting
the learner during the learning process with the
predictions and insights obtained from the multimodal
data (a minimal code sketch of the full cycle follows this list).
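The paper does not prescribe concrete interfaces for these five steps. The following minimal Python sketch shows how the cycle could be wired together; every function and field name in it (Session, store, annotate, process, exploit, timestamp, startTime, stopTime) is an illustrative assumption, not the actual LearningHub or Visual Inspection Tool API.

```python
# Minimal, hypothetical sketch of the five-step cycle. All function and
# field names are illustrative assumptions, not the actual LearningHub
# or Visual Inspection Tool interfaces.
import json
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Session:
    sensor_streams: Dict[str, List[dict]]                   # 1. data collection
    video_file: str                                          # evidence for annotation
    annotations: List[dict] = field(default_factory=list)   # human-made labels


def store(session: Session, path: str) -> None:
    # 2. data storing: persist the whole session as one JSON document
    with open(path, "w") as fp:
        json.dump(session.__dict__, fp)


def annotate(session: Session, intervals: List[dict]) -> None:
    # 3. data annotation: attach human interpretations to time intervals
    session.annotations.extend(intervals)


def label_of(session: Session, timestamp: float) -> Optional[str]:
    # Helper: find the annotation interval covering a timestamp, if any.
    for a in session.annotations:
        if a["startTime"] <= timestamp < a["stopTime"]:
            return a["label"]
    return None


def process(session: Session) -> List[dict]:
    # 4. data processing: align sensor frames with annotations into examples
    return [{"features": frame, "label": label_of(session, frame["timestamp"])}
            for stream in session.sensor_streams.values() for frame in stream]


def exploit(examples: List[dict]) -> str:
    # 5. data exploitation: here a fitted model would turn new frames into
    # predictions and feedback; a placeholder summary is returned instead.
    return f"{len(examples)} labelled examples ready for model fitting"
```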
The MMLA Pipeline offers a bird's-eye view of the lifecycle
of multimodal data, which are collected from the learner and used
to support the learner. We imagine the MMLA Pipeline in two
phases, a ‘research’ phase and a ‘production’ phase. The first
includes several expert-driven operations, such as sensor
selection, annotation, model training and parameter tuning. The
resulting configurations are reused in the later ‘production’ phase,
in which the MMLA Pipeline serves as the multimodal data
backbone infrastructure for collecting learning data and using
them to improve the learning activities.
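How the expert-driven choices of the research phase are handed over to the production phase is not specified here; the sketch below illustrates one possible handover through a plain configuration file, where every field name (sensors, annotation_scheme, model_path, parameters) is a hypothetical example.

```python
# Hypothetical handover of research-phase choices to the production phase;
# every field name below is an assumption used only for illustration.
import json

research_config = {
    "sensors": ["wristband_imu", "webcam", "microphone"],   # sensor selection
    "annotation_scheme": ["correct_posture", "wrong_posture"],
    "model_path": "models/posture_classifier.bin",          # trained during research
    "parameters": {"window_seconds": 2.0, "overlap": 0.5},  # tuned parameters
}

# Research phase: persist the expert-made choices once they are validated.
with open("pipeline_config.json", "w") as fp:
    json.dump(research_config, fp, indent=2)

# Production phase: reload the same configuration so the pipeline can run
# as the backbone that collects data and feeds insights back to learners.
with open("pipeline_config.json") as fp:
    config = json.load(fp)
print("Tracking modalities:", ", ".join(config["sensors"]))
```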
In real-life learning activities, multimodal data can support
the learner in different ways; we call these the exploitation
strategies. For example, an Intelligent Tutor using the MMLA
Pipeline can prompt instantaneous feedback, nudging the learner
towards the desired behaviour. Alternatively, the learner data
can be used for retrospective feedback in the form of an
analytics dashboard.
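As an illustration of these two exploitation strategies, the sketch below routes a single hypothetical prediction to an instantaneous nudge and to a dashboard log; the prediction fields (label, confidence) and the 0.8 threshold are assumptions, not part of the MMLA Pipeline specification.

```python
# Illustrative routing of one model prediction to the two exploitation
# strategies; the prediction fields and the 0.8 threshold are assumptions.
from datetime import datetime

dashboard_log = []   # retrospective feedback: aggregated later in a dashboard


def exploit_prediction(prediction: dict) -> None:
    """Send instantaneous feedback when warranted and always log for review."""
    if prediction["label"] == "off_task" and prediction["confidence"] > 0.8:
        # Instantaneous feedback: nudge the learner towards the desired behaviour.
        print("Nudge: try to refocus on the current exercise.")
    # Retrospective feedback: keep every prediction for the analytics dashboard.
    dashboard_log.append({**prediction, "logged_at": datetime.now().isoformat()})


exploit_prediction({"label": "off_task", "confidence": 0.91})
```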
IV. CURRENT PROTOTYPES
At the current stage, we have developed two main prototypes as
implementations of the MMLA Pipeline. The prototypes were
presented in two recent studies: A) the Multimodal Learning
Hub [4] and B) the Visual Inspection Tool [5].
A. Multimodal Learning Hub
The Multimodal Learning Hub (LearningHub) is a system
that focuses on the data collection and data storing of multimodal
learning experiences [4]. It builds on the concept of the Meaningful
Learning Task (MLT) and uses a custom data format (the MLT session
file) for data storing and exchange. At the current stage of
development, the LearningHub uses a set of specifications that
shape it for learning activities. Several libraries compatible with
the LearningHub have been coded to work with commercial
devices and sensors. The LearningHub focuses on short and
meaningful learning activities (~10 minutes) and uses a
distributed, client-server architecture with a master node
controlling and receiving updates from multiple data-provider
applications. It also handles video and audio recordings, with the
primary purpose of supporting the human annotation process. The
expected output of the LearningHub is one (or multiple) MLT-
JSON session files including 1) one-to-n multimodal, time-
synchronised sensor recordings; and 2) a video/audio file providing
evidence for retrospective annotations. The LearningHub is open
source and developed in C#.
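The exact schema of an MLT-JSON session file is not given in this paper; the following sketch shows what such a file could plausibly contain (a task identifier, a video reference and one-to-n sensor recordings) and how it could be read back. All field names (task, video, recordings, application, frames) are assumptions for illustration only.

```python
# The exact MLT-JSON schema is not specified here; this structure and these
# field names are assumptions meant only to show how a time-synchronised
# session file could be written and read back.
import json

example_session = {
    "task": "meaningful_learning_task_001",   # hypothetical MLT identifier
    "video": "session_001.mp4",               # evidence for retrospective annotation
    "recordings": [                            # one-to-n sensor recordings
        {"application": "wristband",
         "frames": [{"timestamp": 0.0, "heart_rate": 71},
                    {"timestamp": 0.5, "heart_rate": 73}]},
        {"application": "depth_camera",
         "frames": [{"timestamp": 0.0, "elbow_angle": 162.0}]},
    ],
}

with open("session_001.json", "w") as fp:
    json.dump(example_session, fp, indent=2)

with open("session_001.json") as fp:
    session = json.load(fp)
for recording in session["recordings"]:
    print(recording["application"], "->", len(recording["frames"]), "frames")
```

Keeping each session self-contained in a single file is one way to simplify exchanging recordings between the data-provider applications and the later annotation step, in line with the storing-and-exchange purpose of the MLT session file.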
B. Visual Inspection Tool
The Visual Inspection Tool (VIT) allows the manual and
semi-automatic annotation of psychomotor learning tasks
captured with a set of sensors. The VIT enables the user
to 1) triangulate multimodal data with video recordings; 2)
segment the multimodal data into time intervals and add
annotations to those intervals; and 3) download the annotated
dataset and use the annotations as labels for machine learning
predictions. The annotations created with the VIT are saved in the
MLT-JSON data format, like the other sensor files. The
annotations are treated as an additional sensor application, where
each frame is a time interval with a relative startTime and
stopTime instead of a single timestamp. Using the standard
MLT-JSON data format, the user of the VIT can either define
custom annotation schemes or load existing annotation files.
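The startTime and stopTime interval attributes come from the description above; everything else in the following sketch (the annotation labels, the sensor frame fields and the helper function) is a hypothetical illustration of how interval annotations can be turned into per-frame labels for supervised learning.

```python
# startTime and stopTime come from the description above; the labels, frame
# fields and helper function are hypothetical, illustrating how interval
# annotations become per-frame labels for supervised learning.
from typing import Optional

annotations = [
    {"startTime": 0.0, "stopTime": 4.0, "label": "correct_technique"},
    {"startTime": 4.0, "stopTime": 7.5, "label": "wrong_technique"},
]

# Synthetic sensor frames sampled every 0.5 seconds.
sensor_frames = [{"timestamp": t / 2, "value": t * 0.1} for t in range(16)]


def label_for(timestamp: float) -> Optional[str]:
    """Return the label of the annotation interval covering the timestamp."""
    for interval in annotations:
        if interval["startTime"] <= timestamp < interval["stopTime"]:
            return interval["label"]
    return None


labelled = [{**frame, "label": label_for(frame["timestamp"])} for frame in sensor_frames]
print(labelled[0], labelled[-1])
```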
V. FUTURE WORK
The two prototypes described are a first step in implementing part
of the MMLA Pipeline, proposing a solution for the first three
challenges of data collection, storing and annotation. Although
still prototypical, the tools described are available under open-
source licences and were created with extensibility in
mind. As future work, we want to focus on data processing and
exploitation, improving the current feedback mechanisms to
produce real-time feedback based on both expert-defined and
machine-learned rules. We are planning to extend the
LearningHub with a Runtime Feedback System, which would
allow the expert to set the type of feedback message, which
sensor device to send the messages to, which learner to address
with the feedback, and under which conditions the feedback should
be prompted.
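Since the Runtime Feedback System is planned rather than implemented, the sketch below is only a speculative illustration of how such an expert-defined rule could be expressed, mirroring the four choices listed above; the rule fields, the compression_rate signal and the run_feedback helper are all assumptions.

```python
# The Runtime Feedback System is planned, not implemented; this is only a
# speculative sketch of an expert-defined rule covering the four choices
# named above. The rule fields and the compression_rate signal are assumptions.
feedback_rules = [
    {
        "message": "Slow down and check your posture.",  # type of feedback message
        "target_device": "smartwatch",                   # device the message is sent to
        "learner_id": "learner_042",                     # learner to address
        "condition": lambda frame: frame.get("compression_rate", 0) > 120,
    },
]


def run_feedback(frame: dict, learner_id: str) -> None:
    """Prompt feedback when an incoming sensor frame satisfies a rule."""
    for rule in feedback_rules:
        if rule["learner_id"] == learner_id and rule["condition"](frame):
            print(f"[{rule['target_device']}] {rule['message']}")


run_feedback({"compression_rate": 131}, "learner_042")
```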
REFERENCES
[1] Worsley, M. (2018). Multimodal learning analytics’ past, present, and
potential futures. In CEUR Workshop Proceedings (Vol. 2163, pp. 1–16).
Aachen, Germany: CEUR Workshop Proceedings. Retrieved from
http://crossmmla.org/wp-content/uploads/2018/02/CrossMMLA2018_paper_8.pdf
[2] Schneider, J., Börner, D., van Rosmalen, P., & Specht, M. (2015).
Augmenting the Senses: A Review on Sensor-Based Learning Support.
Sensors, 15(2), 4097–4133. http://doi.org/10.3390/s150204097
[3] Di Mitri, D., Schneider, J., Specht, M., & Drachsler, H. (2018). The Big
Five: Addressing Recurrent Multimodal Learning Data Challenges. In
Martinez-Maldonado, R. (Ed.), Proceedings of the Second
Multimodal Learning Analytics Across (Physical and Digital) Spaces
(CrossMMLA) (p. 6). Aachen: CEUR Workshop Proceedings. Retrieved
from http://ceur-ws.org/Vol-2163/paper6.pdf
[4] Schneider, J., Di Mitri, D., Limbu, B., & Drachsler, H. (2018).
Multimodal Learning Hub: A Tool for Capturing Customizable
Multimodal Learning Experiences. In Lecture Notes in Computer Science
(including subseries Lecture Notes in Artificial Intelligence and Lecture
Notes in Bioinformatics) (Vol. 11082 LNCS, pp. 45–58). Cham,
Switzerland: Springer. http://doi.org/10.1007/978-3-319-98572-5_4
[5] Di Mitri, D., Schneider, J., Klemke, R., Specht, M., & Drachsler, H.
(2019). Read Between the Lines: An Annotation Tool for Multimodal
Data for Learning. In Proceedings of the 9th International Conference on
Learning Analytics & Knowledge - LAK19 (pp. 51–60). New York, NY,
USA: ACM. http://doi.org/10.1145/3303772.3303776