Quantifying Collaboration Quality in Face-to-Face Classroom Settings Using MMLA
Anonymized for review
Anonymized for review
Abstract. The estimation of collaboration quality through manual observation and coding is a tedious and difficult task. Researchers have proposed automating this process by estimating collaboration within a few categories (e.g., high vs. low collaboration). However, such categorical estimation lacks depth and actionability, which can be critical for practitioners. We present a case study that evaluates the feasibility of quantifying collaboration quality and its multiple sub-dimensions (e.g., collaboration flow) in an authentic classroom setting. We collected multimodal data (audio and logs) from two groups collaborating face-to-face on a collaborative writing task. The paper describes our exploration of different machine learning models and compares their performance with that of human coders in the task of estimating collaboration quality along a continuum. Our results show that it is feasible to quantitatively estimate collaboration quality and its sub-dimensions, even from simple features of audio and log data, using machine learning. For instance, our support vector regression model covered approximately 50% of the performance gap between randomness and human-level performance. These findings open possibilities for in-depth automated quantification of collaboration quality, and for the use of more advanced features and algorithms to bring their performance closer to that of human coders.
Keywords: Computer-Supported Collaborative Learning · Multimodal Learning Analytics · Collaboration Quality
1 Introduction
Collaboration has traditionally been studied using observation, interviews and ethnographic methods [11]. Although these methods offer detailed information, they also demand a lot of human effort and time, and are therefore difficult to scale up [11]. The use of technology to mediate collaboration has provided researchers with large amounts of learner activity data (in the form of logs), offering an alternative to traditional analyses of collaboration. Researchers have used a variety of data (e.g., system logs, chats, discussion forums) to understand the underlying process of collaboration using Learning Analytics (LA) methods such as content analysis and interaction analysis [4]. The results of these analyses have been employed to develop various kinds of feedback systems, from mirroring to guiding support [6]. While Computer-Supported Collaborative Learning (CSCL)
often involves both face-to-face and computer-mediated interactions, collaborative LA support often relies on digital logs alone, thus offering only a partial picture of the interactions. Aware of this limitation, the field of Multimodal Learning Analytics (MMLA) [2] emerged with the goal of understanding learning through multimodal data from digital and physical spaces. Recent MMLA studies have shown that it is feasible to estimate aspects of collaboration categorically (e.g., high vs. low collaboration) in face-to-face settings by combining physical and digital activity traces [15, 14]. In addition, researchers have found verbal interaction and speaking activity features to be important indicators of collaboration behavior [8, 1]. However, most of these studies were conducted in laboratory settings [3], so their results might not hold under authentic classroom constraints (e.g., noisy data). Moreover, the categorical estimation of collaboration quality into a few classes provides end users (e.g., a teacher) with little information about the underlying problem or reason why collaboration quality is high or low.
In order to estimate the quality of collaboration in a more fine-grained fashion, this paper explores regression models to quantitatively estimate collaboration quality in a classroom setting from audio and log data. To reach that goal, we carried out a case study in which we collected data from two groups (each with four participants) in an authentic classroom setting. The learning activity involved face-to-face discussion and collaborative writing using digital means. We applied various regression models and compared their performance with that of human coders in the task of coding collaboration quality and its sub-dimensions along a continuum.
2 Related Work
Researchers have investigated the problem of estimating collaboration into a limited set of categories in various settings: pair programming [5], project-based learning [14], and tabletop-based collaborative learning [9]. These studies collected data through different means (audio [15, 7, 1], Kinect sensors [5], system logs [8, 15], and video [15]) and extracted a wide variety of features from them, e.g., non-verbal features like intensity, pitch, or speaking rate [7], or spatial and dynamic features like hand movement or distance between learners [14]. These features were in turn used to estimate different aspects of the collaboration process: detecting rapport [7], collaboration quality [15, 1, 9], or success in collaboration [14]. Certain studies [5, 15, 9] have included data from both physical and digital spaces to investigate collaboration behavior.
While most of these studies devised their own coding schemes to annotate or classify collaboration quality, others [9] have used collaboration rating schemes that are widely used in the collaborative learning sciences (e.g., [10]). Although these rating schemes often output quantitative scores (e.g., collaboration quality [9], grading of collaborative work [14]), such scores have often been mapped onto two or three categories (e.g., high vs. low collaboration), as binary classification is an easier problem (from an information theory point of view) and often results in
better performance when using statistical analysis [7] or machine learning models [15, 14]. However, this "flattening" of the scores also takes away much of the nuance and of the different aspects that contribute to high-quality collaboration. In terms of performance, reported classification accuracies range from modest (48% [5]) to moderate (69% [8]) and high (80% [14], 96% [15]).
A number of gaps emerge from the aforementioned state of the art. First, MMLA researchers have mostly built models to estimate collaboration quality categorically, which offers limited information about the reasons or underlying structure of that judgement (i.e., limited explainability and actionability). Consequently, there is still a lack of understanding regarding whether (or to what extent) we can estimate collaboration quality along a continuum. Second, MMLA studies often report their results without frames of reference that could help the community understand how far (or how close) we are from developing solutions of practical relevance to our classrooms (e.g., how they compare with human-level classification or quantification of collaboration quality).
3 Methodology
To address the gaps identified in the previous section, we set up a study to explore the following research questions: RQ1. How well can we estimate collaboration quality using machine learning, based on audio and log data from an authentic classroom setting in upper secondary school? RQ2. How well can we estimate the various sub-dimensions of collaboration quality with machine learning, using audio and log data from an authentic classroom setting?
To start addressing these questions, we conducted a first case study [16] in an authentic classroom setting, where learners performed collaborative discussion and writing tasks as part of their normal classes. This case-study methodology allowed us to understand the situation in depth and to explore multiple aspects of the research questions (e.g., data fusion and regression models, collaboration sub-dimensions).
4 Case Study
A 30-minute collaboration activity was co-designed by a researcher and a teacher, in which the students had to discuss and fill in a worksheet about genetic mutations. The activity was enacted in a secondary education biology course with 10 students in autumn 2019.
Two researchers were present in the classroom for data collection purposes and technical support. The students were briefly introduced to the aim of the study, and their written consent for data collection was obtained before the activity. In the case study below, the data from two groups (four students each) are analyzed.
4.1 Data Collection
The students used an audio-capturing prototype (CoTrack) with an omni-directional microphone placed in the center of the group's table, and Etherpad (an open-source, real-time collaborative text editor; see https://etherpad.org) for the collaborative writing. CoTrack uses a voice activity detection (VAD) algorithm to detect the presence of voice, and a direction-of-arrival (DOA) algorithm to identify the direction of the sound. The prototype then maps each direction to a particular learner and extracts various features (e.g., speaking time, IP addresses, number of characters added or deleted) from the audio and Etherpad logs. Our analyses below use a total of 12 features: three features (speaking time, number of characters added, and number of characters deleted) for each of the four students in the group, for every 30-second time window (see below).
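To make this feature set concrete, the following Python sketch illustrates how per-student VAD and Etherpad events could be aggregated into the 12 features per 30-second window described above. It is not the actual CoTrack implementation; the event-log layout and all column names (timestamp, student, speaking_secs, chars_added, chars_deleted) are assumptions made for illustration.

    import pandas as pd

    def window_features(events: pd.DataFrame, window: str = "30s") -> pd.DataFrame:
        """Aggregate per-student events into fixed 30-second windows
        (hypothetical columns: timestamp, student, speaking_secs, chars_added, chars_deleted)."""
        agg = (events
               .set_index("timestamp")            # "timestamp" assumed to be a datetime column
               .groupby("student")
               .resample(window)
               .agg({"speaking_secs": "sum",
                     "chars_added": "sum",
                     "chars_deleted": "sum"}))
        # One row per window, 3 features x 4 students = 12 columns
        wide = agg.unstack(level="student")
        wide.columns = [f"{feat}_s{stu}" for feat, stu in wide.columns]
        return wide.fillna(0)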
4.2 Data Annotation
We used the collaboration quality rating scheme by Rummel et al. [13] (itself adapted from [10]), which assigns a collaboration quality score along seven dimensions: Sustaining Mutual Understanding (SMU), Collaboration Flow (CF), Knowledge Exchange (KE), Cooperative Orientation (CO), Argumentation (ARG), Structuring Problem Solving Process and Time Management (SPST), and Individual Task Orientation (ITO). Two raters coded these dimensions at the group level (except ITO, which is coded at the individual level and averaged to obtain a group-level score). Following the recommendations by Martínez et al. [8], we used time windows of 30 seconds, in which each of the aforementioned sub-dimensions was assigned a score between −2 (very bad) and +2 (very good). The sub-dimension scores at the group level were then added up to obtain the overall collaboration quality score of the group for that time window (which can theoretically range from −14 to +14). This resulted in a dataset with 121 data points from the collaboration of the two learner groups. The two raters went through four iterations of coding before reaching substantial agreement on each sub-dimension in terms of Cohen's kappa (Table 1).
Table 1: Inter-rater agreement of human coders in each collaboration quality
sub-dimension (Cohen’s kappa)
SMU CF KE ARG SPST CO ITO-1 ITO-2 ITO-3 ITO-4
0.71 0.91 0.74 0.80 0.65 0.68 0.72 0.76 0.75 0.78
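As a small illustration of the bookkeeping behind this annotation scheme (the rating itself is done manually by the coders), the sketch below shows how the seven sub-dimension scores of one window could be summed into the overall quality score, and how inter-rater agreement could be computed with scikit-learn. The variable names and the toy rating values are ours, not the study's data.

    import numpy as np
    from sklearn.metrics import cohen_kappa_score

    SUB_DIMS = ["SMU", "CF", "KE", "CO", "ARG", "SPST", "ITO"]  # ITO already averaged over students

    def overall_quality(window_scores: dict) -> int:
        """Sum the seven sub-dimension scores (each in [-2, +2]); theoretical range [-14, +14]."""
        return sum(window_scores[d] for d in SUB_DIMS)

    # Agreement between two raters on one sub-dimension over the coded windows (toy values)
    rater1_cf = np.array([2, 1, 0, -1, 2])
    rater2_cf = np.array([2, 1, 1, -1, 2])
    kappa = cohen_kappa_score(rater1_cf, rater2_cf)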
4.3 Data Analysis
To map the individual student audio and log features to group-level features, we explored three different approaches: simple averaging of the individual features, dimensionality reduction, and entropy-based fusion. In the dimensionality reduction approach, we applied principal component analysis (PCA) to all individual-level features and extracted the four components that explained the most variance. The entropy-based approach has been used to map individual features to group-level features in previous research [1]: each individual feature is converted into a proportion p(x), by dividing its value by the sum of that feature over the group in the corresponding time window. These proportions are then used to compute a group-level feature as the Shannon entropy:

H = -\sum_{x \in X} p(x) \log_2 p(x)

where x ranges over the members of the group X.
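A minimal sketch of the three fusion approaches follows, assuming a feature matrix X with one row per 30-second window and the 12 individual-level columns grouped by feature (this column ordering, and the function names, are illustrative assumptions rather than our exact code).

    import numpy as np
    from sklearn.decomposition import PCA

    def average_fusion(X: np.ndarray, n_students: int = 4) -> np.ndarray:
        """Average each feature over the group members (assumes columns grouped by feature)."""
        n_feats = X.shape[1] // n_students
        return X.reshape(len(X), n_feats, n_students).mean(axis=2)

    def pca_fusion(X: np.ndarray, n_components: int = 4) -> np.ndarray:
        """Keep the components explaining the most variance."""
        return PCA(n_components=n_components).fit_transform(X)

    def entropy_fusion(feature: np.ndarray) -> np.ndarray:
        """Shannon entropy of one feature across group members, per window.
        feature: array of shape (n_windows, n_students), e.g. speaking time of the 4 students."""
        totals = feature.sum(axis=1, keepdims=True)
        p = np.divide(feature, totals, out=np.zeros_like(feature, dtype=float),
                      where=totals > 0)
        logs = np.log2(np.where(p > 0, p, 1.0))   # log2(1) = 0, so zero proportions contribute 0
        return -(p * logs).sum(axis=1)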
For the training and evaluation of the machine learning models, we used Python's Scikit-learn library [12] (the data analysis source code is available at an address anonymized for review). We randomly divided our dataset into training and test sets using a 70:30 ratio. We trained regression models of different kinds on the training set and evaluated their performance on the test set. Concretely, the model families explored included k-nearest neighbors, Random Forest, AdaBoost, Gradient Boosting, XGBoost, support vector regression (SVR), neural networks, and an ensemble (voting) regressor combining SVR, Random Forest and AdaBoost. We used GridSearchCV (from Scikit-learn) with 3-fold cross-validation to tune each model's hyperparameters.
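The model selection step described above could look roughly as follows in scikit-learn; the parameter grid, the feature scaling step, and the variable names X_fused and y_quality are illustrative assumptions, not our exact configuration.

    from sklearn.model_selection import train_test_split, GridSearchCV
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVR

    # 70:30 random split of the 121 windows into training and test sets
    X_train, X_test, y_train, y_test = train_test_split(
        X_fused, y_quality, test_size=0.30, random_state=42)

    svr = make_pipeline(StandardScaler(), SVR())
    grid = GridSearchCV(
        svr,
        param_grid={"svr__C": [0.1, 1, 10],
                    "svr__epsilon": [0.1, 0.5, 1.0],
                    "svr__kernel": ["rbf", "linear"]},
        cv=3,                                   # 3-fold cross-validation on the training set
        scoring="neg_root_mean_squared_error")
    grid.fit(X_train, y_train)
    best_svr = grid.best_estimator_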
4.4 Results
We used RMSE (Root Mean Square Error) as the performance metric to compare the different regression models. As frames of reference, we computed the RMSE that the human coders had achieved in their last round of manual collaboration quality scoring. We also computed the RMSE of two "no-information" regressors: one that simply estimates random values within the range of possible quality scores, and one whose estimate always equals the average collaboration quality (mean quality = 1.93 for this dataset).
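For concreteness, the two no-information baselines and the model RMSE can be computed as in the sketch below, which continues the hypothetical variables (y_train, y_test, X_test, best_svr) from the previous snippet.

    import numpy as np
    from sklearn.metrics import mean_squared_error

    def rmse(y_true, y_pred):
        return float(np.sqrt(mean_squared_error(y_true, y_pred)))

    rng = np.random.default_rng(0)
    random_pred = rng.uniform(-14, 14, size=len(y_test))   # random within the possible score range
    mean_pred = np.full(len(y_test), np.mean(y_train))     # constant mean-quality estimator

    rmse_random = rmse(y_test, random_pred)
    rmse_mean = rmse(y_test, mean_pred)
    rmse_svr = rmse(y_test, best_svr.predict(X_test))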
From our analysis of the three fusion approaches, we found PCA-based fusion to be a better option than entropy-based and average-based fusion, in terms of the performance of the different regression models on the test data. Figure 1(a) shows that regression models using PCA-based fusion achieved lower RMSE on the test data than those using entropy-based or average-based fusion.
For the comparative analysis among regression models, we used PCA-based fusion, trained the different kinds of regression models, and computed the RMSE for both the training and the test data. All regression models (except Gradient Boosting) performed better than the average-estimation model (Figure 1(b)). The XGBoost and neural network regression models showed the largest gaps between training and test errors, which can probably be explained by these models over-fitting the small dataset available. The support vector regression (SVR) model performed better than the other models, both in terms of a lower RMSE and a smaller difference between training and test error. Comparing the performance of this SVR model (which, let us remember, used only very basic audio and log features) with that of the no-information models (average and random) and with the human coders' own RMSE values, we find that SVR covered about 50% of the gap between the best no-information predictor and human-level performance.

Fig. 1: Performance scores of the regression models: (a) RMSE for the different fusion approaches; (b) RMSE of the regression models
Fig. 2: SVR performance on various sub-dimensions of collaboration quality
We also applied similar regression models to estimate the seven sub-dimensions of collaboration quality. Again, the support vector regression models performed better than the other models in estimating the majority of the sub-dimensions. Figure 2 shows the RMSE scores of the support vector regression models, compared with the no-information and human-level frames of reference. For these dimensions, SVR covered 50% or more of the gap between no-information and human-level performance.
5 Conclusions and Future Work
This paper investigated the feasibility of estimating the quality of collaboration in face-to-face classroom settings using simple features from audio and log data, and machine learning regression models. Our results suggest that it is feasible to quantitatively estimate collaboration quality along a continuum, and even open the door to more in-depth estimation of the different collaboration sub-dimensions (e.g., collaboration flow, knowledge exchange), which can be of greater value to practitioners. We also provided three frames of reference (average and random no-information estimators, as well as the human coders' performance) with the aim of offering a more interpretable view of model performance. We suggest that future MMLA researchers analyze their models' performance using such frames of reference, to help our research community better understand how far our models and solutions have to go to achieve human-level performance.
This work is not without limitations. The small size of our dataset is probably the main weakness of our results so far, greatly limiting the generalizability of the particular models and of the performance claims made. This issue can also explain the discrepancy between training and test errors of some of the regression models (e.g., Gradient Boosting, AdaBoost, neural networks) due to over-fitting. The expansion of this dataset with data from group work performed in different kinds of authentic classroom settings is one of our most important avenues of future work.
Moreover, in the current case study we only used simple audio and log features and a limited set of machine learning models, and we considered all dataset samples independently (i.e., without looking at their sequence). The use of more complex features (e.g., MFCCs for audio data, or conversion of voice to text and subsequent analysis of content), of different data fusion approaches, and the exploration of time-dependent machine learning models (e.g., Hidden Markov Models, sequence analysis) will also be an important way to expand our work towards automated estimation of collaboration quality that is close to human-level performance.
Acknowledgement
Anonymized for review
References
1. Bassiou, N., Tsiartas, A., Smith, J., Bratt, H., Richey, C., Shriberg, E., D'Angelo, C., Alozie, N.: Privacy-preserving speech analytics for automatic assessment of student collaboration. In: Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 888–892 (2016)
2. Blikstein, P., Worsley, M.: Multimodal learning analytics and education data mining: using computational technologies to measure complex learning tasks. Journal of Learning Analytics 3(2), 220–238 (2016)
3. Chua, Y.H.V., Dauwels, J., Tan, S.C.: Technologies for automated analysis of co-located, real-life, physical learning spaces: Where are we now? In: Proceedings of the 9th International Conference on Learning Analytics & Knowledge (LAK19), pp. 11–20. ACM, NY, USA (2019)
4. Dillenbourg, P., Järvelä, S., Fischer, F.: The evolution of research on computer-supported collaborative learning. In: Balacheff, N., Ludvigsen, S., de Jong, T., Lazonder, A., Barnes, S. (eds.) Technology-Enhanced Learning: Principles and Products, pp. 3–19. Springer Netherlands, Dordrecht (2009)
5. Grover, S., Bienkowski, M., Tamrakar, A., Siddiquie, B., Salter, D., Divakaran, A.: Multimodal analytics to study collaborative problem solving in pair programming. In: Proceedings of the Sixth International Conference on Learning Analytics & Knowledge (LAK '16), pp. 516–517. ACM, NY, USA (2016)
6. Jermann, P., Soller, A., Muehlenbrock, M.: From mirroring to guiding: A review of state of the art technology for supporting collaborative learning. International Journal of Artificial Intelligence in Education (IJAIED) 15, 261–290 (2005)
7. Lubold, N., Pon-Barry, H.: Acoustic-prosodic entrainment and rapport in collaborative learning dialogues. In: Proceedings of the 2014 ACM Workshop on Multimodal Learning Analytics Workshop and Grand Challenge (MLA '14), pp. 5–12. ACM, NY, USA (2014)
8. Martinez, R., Wallace, J.R., Kay, J., Yacef, K.: Modelling and identifying collaborative situations in a collocated multi-display groupware setting. In: Biswas, G., Bull, S., Kay, J., Mitrovic, A. (eds.) Artificial Intelligence in Education, pp. 196–204. Springer, Berlin, Heidelberg (2011)
9. Martinez-Maldonado, R., Dimitriadis, Y., Martínez-Monés, A., Kay, J., Yacef, K.: Capturing and analyzing verbal and physical collaborative learning interactions at an enriched interactive tabletop. International Journal of Computer-Supported Collaborative Learning 8(4), 455–485 (2013)
10. Meier, A., Spada, H., Rummel, N.: A rating scheme for assessing the quality of computer-supported collaboration processes. International Journal of Computer-Supported Collaborative Learning 2(1), 63–86 (2007)
11. Mercer, N., Littleton, K., Wegerif, R.: Methods for studying the processes of interaction and collaborative activity in computer-based educational activities. Technology, Pedagogy and Education 13(2), 195–212 (2004)
12. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12, 2825–2830 (2011)
13. Rummel, N., Deiglmayr, A., Spada, H., Kahrimanis, G., Avouris, N.: Analyzing collaborative interactions across domains and settings: an adaptable rating scheme. In: Puntambekar, S., Erkens, G., Hmelo-Silver, C. (eds.) Analyzing Interactions in CSCL: Methods, Approaches and Issues, pp. 367–390. Springer US, Boston, MA (2011)
14. Spikol, D., Ruffaldi, E., Dabisias, G., Cukurova, M.: Supervised machine learning in multimodal learning analytics for estimating success in project-based learning. Journal of Computer Assisted Learning 34(4), 366–377 (2018)
15. Viswanathan, S.A., VanLehn, K.: Using the tablet gestures and speech of pairs of students to classify their collaboration. IEEE Transactions on Learning Technologies 11(2), 230–242 (2018)
16. Yin, R.K.: Case study methods. APA Handbooks in Psychology. American Psychological Association, Washington, DC, US (2012)