Artificial Intelligence in Multimodal Learning Analytics: A
Systematic Literature Review
Mehrnoush Mohammadi, Elham Tajik, Roberto Martinez-Maldonado,
Shazia Sadiq, Wojtek Tomaszewski, Hassan Khosravi (h.khosravi@uq.edu.au)
Abstract
The proliferation of educational technologies has generated unprecedented volumes of diverse,
multimodal learner data, offering rich insights into learning processes and outcomes. However,
leveraging this complex, multimodal data requires advanced analytical methods. While Mul-
timodal Learning Analytics (MMLA) offers promise for exploring this data, the potential of
Artificial Intelligence (AI) to enhance MMLA remains largely unexplored. This paper bridges
these two evolving domains by conducting the first systematic literature review at the intersec-
tion of AI and MMLA, analyzing 43 peer-reviewed English articles from 11 reputable databases
published between 2019 and 2024. The findings indicate a growing trend in AI-based MMLA
studies with a predominant focus on tertiary education and a diverse range of stakeholders.
Guided by a novel conceptual framework, our analysis highlights the transformative role of AI
across the MMLA process, particularly in model learning and feature engineering. However, it
also uncovers significant gaps, including limited application of advanced AI techniques, under-
utilization in certain MMLA phases, and a lack of large-scale studies in the field. The review identifies key
benefits, such as enhanced personalization and real-time feedback, while also addressing chal-
lenges related to ethical considerations, data integration, and scalability. Our study contributes
by offering comprehensive recommendations for future research, emphasizing international col-
laboration, multi-level studies, and ethical AI implementation. These findings advance the
theoretical understanding of AI’s role in education, providing a foundation for developing so-
phisticated, interpretable, and scalable AI-based MMLA approaches, potentially revolutionizing
personalized learning across diverse educational settings.
Keywords: AI, Multimodal data, Learning Analytics, Multimodal learning analytics, Systematic
review.
1 Introduction
The increasing integration of educational tools, sensing technologies, interaction devices, and ad-
vanced artificial intelligence (AI) algorithms into learning environments has enabled a more exten-
sive collection of behavioral data across the various modalities through which learners demonstrate
their learning processes—commonly referred to as multimodal data (Sharma and Giannakos, 2020).
Analyzing learner activity across multiple modalities acknowledges that students often display and
communicate knowledge, interests, and intent in diverse ways, extending beyond mere interactions
with personal computers (Worsley et al., 2021). The emerging research field of multimodal learn-
ing analytics (MMLA) leverages this rich tapestry of multimodal educational data, encompassing
everything from students’ digital footprints captured in log files and insights into attention spans
derived from eye-tracking (e.g., Schneider et al., 2021), to emotional analysis via camera recordings
(e.g., Emerson et al., 2020a; Chettaoui et al., 2023), automated speech content analysis of learn-
ers working in groups (e.g., D’Angelo and Rajarathinam, 2024), socio-spatial and motion behavior
analysis of both students and teachers in physical classrooms (e.g., Yan et al., 2022b,a), engagement
measurement across various in-class activities (e.g., Sumer et al., 2023; Aslan et al., 2019), and even
analysis of physiological indicators of learners that may be hard to detect with the naked eye but
can reveal critical aspects such as stress (e.g., Alfredo et al., 2023). MMLA thus has the potential to
expand our understanding of learners, contributing to a deeper comprehension of human learning,
transcending the traditional one-size-fits-all educational paradigm, and fostering a more nuanced
appreciation of each student’s individual learning path (Ochoa et al., 2022).
MMLA has seen a significant surge in interest, particularly highlighted by the exponential growth
in research output since the early discussions around its potential in 2012. Several systematic
literature reviews in the area of MMLA have consistently reported on this growth as evidenced
by an increasing number of publications across various academic venues, indicating a broad and
interdisciplinary effort to harness multimodal data for advancing personalized and effective learning
experiences (e.g. Mu et al., 2020; Ouhaichi et al., 2023). However, despite the significant growth
in MMLA research and advancements in sensing technologies and computational methods, the field
has yet to reach its full potential due to prevailing ethical, methodological, and technical challenges
(Yan et al., 2022c).
Giannakos and Cukurova (2023) suggested that integrating AI advancements with multimodal
interaction data can unlock the full potential of MMLA systems, driving significant advancements in
their research and development. This integration can broaden the range of multimodal data sources
that can be analyzed using rapidly evolving multimodal analysis methodologies and tools. Accord-
ing to Davenport (2018), while simple analytics can enhance understanding and support decision-
making through rule-based methods and descriptive visualizations, such as dashboards, AI and
machine learning-powered predictive and prescriptive analytics enable more sophisticated interac-
tions. These advancements can both facilitate the use of intelligent agents to create automated or
semi-automated actions, elevating the potential of analytics beyond descriptive insights, and also
raise critical implications for our current pedagogical practices (Buckingham Shum and Luckin,
2019).
Indeed, the application of AI in education, beyond MMLA, has seen substantial growth, intro-
ducing promising approaches to developing explainable models of student learning (Khosravi et al.,
2022b), identifying students at risk of course failure (Foster and Siddle, 2020), automating and
scaffolding learning activities (Lim et al., 2023), and providing personalized nudges (Payne et al.,
2023) as well as real-time, rich feedback (Jin et al., 2024). These AI-driven innovations also help
educators devise more targeted and effective intervention strategies, thereby enhancing the overall
quality of education. Furthermore, recent advancements in Generative AI (GenAI) are opening new
opportunities to explore the creation of agents capable of continuously monitoring and analyzing
learners’ activities or intervening when necessary. A key question remains: to what extent have
AI-powered developments influenced MMLA? Although some previous systematic literature reviews
have indirectly touched on the methodologies frequently used in MMLA (e.g. Noroozi et al., 2020;
Mu et al., 2020), none have specifically focused on exploring the current state at the intersection of
AI and MMLA.
1.1 Conceptualizing the Review and Research Questions
This paper serves as a conduit between these two evolving research fields by undertaking a systematic
literature review at the nexus of AI and MMLA where MMLA has the potential to benefit from the
latest AI technological advancements, and conversely, AI applications in education can be enriched by
the theoretical underpinnings of MMLA. This paper specifically aims to explore the degree and ways
in which AI has enhanced the MMLA process, particularly focusing on the critical pipeline of tasks
including data collection and processing, data storage, fusion, annotation, modelling, sense-making,
and the formulation of interventions. This examination seeks to illuminate AI’s role in refining these
essential components, thereby advancing the efficacy and impact of MMLA in educational contexts.
To structure our research on the intersection of AI and MMLA, we adopted the framework pro-
posed by Chatti et al. (2012), which covers four main dimensions: Who, What, How, and Why. Who
questions are vital for identifying stakeholders, understanding different perspectives, and determin-
ing the relevance and impact on various individuals or groups. What questions are essential for
defining subjects, providing detailed descriptions and characteristics, and breaking down topics into
their essential components. How questions are crucial for explaining methods, outlining solutions,
and detailing the steps and procedures required for the implementation of a process. Finally, Why
questions are fundamental for exploring reasons and motivations, understanding the significance of
the topic, and identifying the objectives and purposes behind actions and decisions.
RQ1 An important aspect of the study is to explore the current landscape and state of research in AI within MMLA studies. For that, we leverage the what dimension to better understand the volume of studies, their geographical distribution, and collaboration patterns; this is crucial for identifying key trends, emerging areas of focus, and potential gaps in the field, which in turn can inform and guide future research efforts and collaborations. Our research here will be guided by the following questions.

What is the current state of research in AI-based MMLA?
- What is the evolution of publication volume and types of AI-based MMLA studies over time?
- What is the geographical distribution of AI-based MMLA studies?
- What are the patterns of authors' collaborations, and how have they shaped the geographical spread of AI-based MMLA studies?
RQ2 Another important aspect is to explore the current context of AI-based MMLA studies. For that, we use the what and how dimensions to investigate the educational levels targeted, the stakeholders involved, and the modalities utilised in these studies. By identifying these aspects, we can gauge the reach and inclusiveness of current research, ensuring that AI-enhanced MMLA solutions cater to diverse educational needs and contexts. This knowledge helps identify underrepresented areas or groups, promoting more balanced and equitable research efforts, and fostering the development of tailored, effective educational interventions. Our research here will be guided by the following questions.

In what contexts are AI-based MMLA being applied?
- What educational levels are the primary focus of AI-based MMLA papers?
- Who are the main stakeholders targeted by AI-based MMLA papers?
- What modalities are used in AI-enhanced MMLA papers?
RQ3 A critical aim of this study is to explore the role of AI in AI-based MMLA studies. To achieve this, we leverage the what and how dimensions to investigate at which steps of the MMLA cycle AI has been utilized, and to explore how (i.e., using what techniques) AI has been incorporated in these studies. This dual approach will provide a comprehensive understanding of the integration of AI into MMLA studies. Our research here will be guided by the following questions.

How has AI been implemented in MMLA studies?
- At which stages of the MMLA process has AI been implemented?
- What AI techniques are commonly used in MMLA studies?
RQ4 Another important aim of this study is to examine the experimental designs and settings employed in AI-enhanced MMLA studies. Here, we leverage the what dimension to investigate the types of experimental designs used, the settings in which these experiments are conducted, and the methodologies applied. This approach will provide a detailed understanding of the experimental frameworks that underpin AI-enhanced MMLA research, helping to identify best practices and areas for improvement in future studies. Our research here will be guided by the following questions.

What experimental designs and settings are employed in AI-enhanced MMLA studies?
- What are the typical sample sizes used in AI-enhanced MMLA studies?
- What types of experimental settings are commonly employed in AI-enhanced MMLA studies?
- What kind of ethical implications are considered in AI-enhanced MMLA studies?
RQ5 Finally, the study aims to understand the motivations for using AI in MMLA studies, as manifested by reported benefits, to identify the driving factors behind its adoption and the positive impacts it brings to educational contexts. At the same time, we are interested in investigating the challenges encountered in the implementation of AI within these studies. We leverage the why and what dimensions to explore benefits and challenges. This comprehensive understanding will contribute to advancing the field of MMLA by optimizing AI technologies and proactively addressing potential issues.

Why is AI adopted in MMLA studies, and what are some of the underlying challenges?
- What are the reported benefits of AI-enhanced MMLA studies?
- What are the reported challenges of AI-enhanced MMLA studies?
We specifically synthesize peer-reviewed journal articles and full conference papers from the last five years, from 1 January 2019 to 13 March 2024, which report on primary research utilizing AI within the scope of MMLA, sourced from 11 reputable databases: Web of Science (WoS), Scopus, ACM Digital Library, IEEE Xplore, Education Resources Information Center (ERIC), SpringerLink, ProQuest Education Databases, SAGE Journals, Wiley, EBSCO HOST, and Taylor & Francis Online.
2 Background
Significant growth and unprecedented advancement in MMLA have led to the publication of numer-
ous review papers that explore this evolving field. In this section, we review the existing perspec-
tives, underscoring the need for our proposed systematic review, which focuses on the intersection
of MMLA and AI.
Out of the various existing MMLA reviews, three systematic review papers have specifically ad-
dressed the critical role of multimodal data as a pivotal element for success in this domain. Sharma
and Giannakos (2020) presented a comprehensive review of the empirical evidence surrounding the
capabilities of multimodal data in understanding and supporting complex learning phenomena, par-
ticularly addressing underexplored areas such as supporting accessibility in learning environments.
Similarly, Noroozi et al. (2020) explored the optimal identification and integration of various data
modalities to gain comprehensive insights into the cognitive, motivational, and emotional aspects of
learning. Furthermore, Crescenzi-Lanna (2020) examined the complexity of collecting multimodal
data, focusing on identifying information sources and employing tools and strategies suitable for
assessing the progress and behavior of children under six years old. These reviews collectively em-
phasize the essential role of multimodal data in advancing MMLA, highlighting its potential impact
on understanding and improving educational outcomes.
The concept of data fusion in MMLA has been thoroughly examined, with a significant focus
on how different data types and learning indicators are integrated to enhance educational insights.
Mu et al. (2020) conducted a comprehensive literature review on multimodal data fusion in learn-
ing analytics, introducing a conceptual model that categorizes data types into five distinct spaces:
digital, physical, physiological, psychometric, and environmental. Alongside these data types, the
review identifies five key learning indicators—behaviour, cognition, emotion, collaboration, and en-
gagement—that are crucial for understanding learning processes. The study then explores the rela-
tionships between these data types and learning indicators, highlighting various data fusion methods
used in MMLA. By systematically investigating these relationships, the review proposes a framework
that positions multimodal data fusion as a central component of the MMLA process, facilitating a
deeper and more holistic understanding of learning dynamics through the integration of diverse data
sources. Furthermore, Wang and Gu (2024) focused on systematically exploring the complex rela-
tionship between learning indicators, multimodal data types, and data fusion strategies to improve
the decision-making process in classroom environments. This study highlighted engagement as the
leading learning indicator and addressed the fusion strategies employed at three levels of integration:
data, features, and decisions.
The topic of ethics has also been extensively studied in the context of MMLA, with several liter-
ature reviews delving into the ethical implications and challenges that arise in this rapidly evolving
field. Prinsloo et al. (2023) explores the delicate balance between utilizing MMLA for educational
insights and safeguarding student privacy, particularly within higher education. This review iden-
tifies MMLA as existing in an ’in-between’ space—straddling the line between laboratory research
and classroom application, and between the encroachment on and respect for student data privacy.
Alwahaby et al. (2022) further contributes by examining the real-world impacts and ethical consider-
ations of MMLA, highlighting the often-overlooked ethical challenges associated with implementing
MMLA in actual educational settings. Finally, Yan et al. (2022c) addresses the ethical, scalability,
and sustainability concerns tied to the use of sensor-based technologies in MMLA, underscoring the
need for careful consideration of these factors to ensure the responsible adoption of MMLA inno-
vations in education. These reviews collectively underscore the critical importance of addressing
ethical issues in the deployment and development of MMLA.
On the other hand, Ouhaichi et al. (2023) conducted a systematic mapping study of the MMLA
literature to review and classify the research methods, trending research themes, and technologies to
provide guidelines for developing and evaluating the MMLA tools. Additionally, three other MMLA
reviews have focused on concerns regarding the practical adaptation and feasibility of deploying
MMLA in real-world and authentic settings. For instance, Shankar et al. (2018) explored the tech-
nical challenges and complexities associated with different stakeholders, multiple data sources, and
data processing activities by analyzing proposed designs for MMLA software architectures. This
study investigated the infrastructures of these software architectures in terms of their capabilities in
MMLA, according to the analytics Data Value Chain. Bin Qushem (2020) systematically reviewed
the studies on MMLA to investigate how it can enhance learning support without relying solely
on technology-mediated tools, by exploring current MMLA methods and approaches and highlight-
ing key challenges (i.e., data-related challenges, environmental challenges, learning and pedagogi-
cal challenges, and technical challenges) in adoption and implementation. Furthermore, Ouhaichi
et al. (2024) addressed the challenges of consistency and effective development of MMLA systems,
emphasizing the lack of standardized models and frameworks and the scattered nature of design
recommendations in MMLA studies through a systematic review.
In addition to these systematic reviews, several non-systematic review papers have provided valu-
able patterns, perspectives, and theoretical insights, guiding and enhancing the direction of MMLA
research. For instance, Giannakos and Cukurova (2023) conducted a semi-systematic literature re-
view to investigate the role of learning theory in MMLA studies. Di Mitri et al. (2018) offered new
insights into the modalities for learning, learning theories, and their practical application in various
learning scenarios. The review paper (Chango et al., 2022) explored the various data types and
modalities utilized, their potential for learning, and the fusion strategies employed to integrate them
within learning environments. Cukurova et al. (2020a) addressed the methodological, practical, and
ethical challenges in MMLA research. Oviatt (2018) provided a summary and critical reflection on
key opportunities and challenges in developing MMLA, focusing on technology trends, participatory
design, and data privacy. Additionally, Worsley and Martinez-Maldonado (2018) conducted a pre-
liminary review of the MMLA research, distinguishing between empirical and non-empirical works,
analyzing publication trends, and identifying key opportunities for future development. Furthermore, Pei et al. (2023) examined the applications and academic development of MMLA by analyzing publication and citation trends, identifying prolific contributors, exploring collaborative
patterns, and tracing the evolution of research topics over time.
While there has been a wealth of research and numerous reviews conducted on various aspects of
MMLA including the critical role of multimodal data, the integration of diverse data sources, ethical
considerations, and the practical adaptation of MMLA in real-world settings, the specific intersection
of MMLA with AI remains largely underexplored. This gap highlights the need for a systematic
review that specifically addresses how AI can further enhance and transform the capabilities of
MMLA in educational contexts.
3 Methodology
This paper aims to provide a credible synthesis and structured review of the findings from studies
on AI in MMLA. To achieve this objective, we conducted a systematic review and meta-analysis
following the principles outlined in the Preferred Reporting Items for Systematic Reviews and Meta-
Analyses (PRISMA) guidelines proposed by Page et al. (2021). This protocol involves two primary
systematic phases: data collection and data extraction. The data collection procedure attempts to
gather relevant studies to address our research questions by designing strong search strings tailored
to research questions, extracting studies from reputable databases, screening the titles and abstracts
of imported studies based on the predefined inclusion criteria, and subsequently refining the collected
studies through full-text screening using a set of stringent constraints known as exclusion criteria. In
the second phase, each included paper is systematically analyzed to extract the essential information
for synthesis and analysis.
Fig. 1 illustrates the data collection procedure in our study. Notably, we utilized
Covidence, an online platform designed for systematic review studies, to manage our data collection
and extraction processes, as well as facilitate collaboration among our reviewers.
[Flow summary: 686 studies identified from 11 databases (262 SpringerLink, 126 Scopus, 102 Web of Science, 48 ERIC, 46 ACM Digital Library, 35 IEEE Xplore, 31 ProQuest, 18 Wiley, 10 EBSCO, 5 Taylor & Francis Online, 3 SAGE); 220 duplicates removed by Covidence; 466 papers screened on title and abstract; 298 excluded; 168 papers screened in full text; exclusions at full text: 2 duplicated papers, 7 secondary research papers, 41 using fewer than two modalities, 35 with no contribution to the use of AI in MMLA, 40 reporting no empirical results; 43 papers included for synthesis.]
Figure 1: PRISMA flowchart depicting the data collection process, including a search across eleven
reputable databases, study selection based on inclusion and exclusion criteria, and data management
using Covidence.
3.1 Data collection procedure
The data collection process is fundamental to ensuring the integrity and comprehensiveness of a
systematic review. In this subsection, we outline the methodological steps of this process in three main subsections, systematically identifying, extracting, and refining studies from reputable databases as follows:
3.1.1 Search terms
To ensure comprehensive coverage of relevant literature while remaining aligned with our research questions, rigorous design of the search strings is imperative. Hence, we initially reviewed primary and systematic review studies addressing AI in Education and MMLA, published in prestigious academic journals and conferences, to find a combination of keywords, phrases, and controlled vocabulary terms related to the main concepts of our study. Through this synthesis, we identified AI, Multimodal, and Learning Analytics as the three main keywords in our research. We created a codebook for each keyword using the 'OR' (|) operator to ensure a comprehensive search. By concatenating these codebooks with the 'AND' (&) operator, we formulated our search strings to capture studies on AI in MMLA within the context of education, as presented in Table 1. It is important to note that truncation (*) allows for the inclusion of different word terminations.
Table 1: Designed search strings.

AI Applications: Intelligent System | Machine Intelligence | Intelligent Agent* | Intelligent Tutoring System | Expert System | Predictive Analytics | Detection | Feedback | Visualization | Pattern Recognition | Computer Vision.
AI Techniques: Artificial Intelligence | AI | Machine learning | ML | Supervised learning | Unsupervised learning | Semi-supervised learning | Self-supervised learning | Reinforcement learning | Predict* | Analy* | Ensemble Learning | Transfer Learning | Feature Selection | Feature Extraction | Fusion | Integration | Clustering | Classifi* | Natural Language Processing | NLP | Generative AI | GenAI | Deep Learning | Neural Networks | Decision Tree | K-means | Random Forest | Support Vector Machines | SVM | Logistic Regression | Naive Bayes | k-Nearest Neighbors | k-NN | Fuzzy-Logic | Bayesian network | latent Dirichlet allocation | genetic algorithm | genetic programming | Principal Component Analysis | PCA.
&
Multimodal: Multi*modal | Multi mode | Multimodality | Multi*sourced | Multi*view | Multi*Channel | Multiple datasets | Multiple data | Triangulat*.
&
Learning Analytics: Learning analytics | Educational Data Analytics.
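To make this construction concrete, the sketch below shows one way the codebooks above can be assembled into a boolean query: terms within a codebook are joined with OR, and the three codebooks are joined with AND. This is a minimal illustration with abbreviated term lists; the exact operator and truncation syntax varies across the 11 databases we searched.

# A minimal sketch of assembling the Table 1 codebooks into a search string.
# Term lists are abbreviated here for illustration.
ai_terms = ["Artificial Intelligence", "AI", "Machine learning",
            "Deep Learning", "Predict*", "Intelligent Tutoring System"]
multimodal_terms = ["Multi*modal", "Multimodality", "Multiple data", "Triangulat*"]
la_terms = ["Learning analytics", "Educational Data Analytics"]

def or_block(terms):
    # Join a codebook's terms with OR, quoting multi-word phrases.
    return "(" + " OR ".join(f'"{t}"' if " " in t else t for t in terms) + ")"

# Codebooks are concatenated with AND so a match must touch all three concepts.
query = " AND ".join(or_block(t) for t in (ai_terms, multimodal_terms, la_terms))
print(query)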
3.1.2 Database search
In this step, we independently searched eleven reputable and relevant bibliographic databases, in-
cluding Web of Science (WoS), Scopus, ACM Digital Library, IEEE Xplore, Education Resources
Information Center (ERIC), SpringerLink, ProQuest Education Databases, SAGE Journals, Wi-
ley, EBSCO HOST, and Taylor & Francis Online, to identify a comprehensive set of high-quality
studies that included our key search strings in their titles or abstracts. During this process, we
restricted the identified studies by searching among published conference and journal papers from
January 1, 2019, to March 13, 2024 to capture the most relevant and recent advancements in AI and
MMLA, consistent with the approach of other systematic reviews (e.g., D´ıaz and Nussbaum, 2024;
Crescenzi-Lanna, 2020). This period represents a crucial phase of rapid technological development
and methodological progress in both fields, ensuring that our analysis reflects the latest innovations
and trends in AI applications within MMLA. As a result, an initial set of 686 papers, comprising
102 papers from WoS, 126 papers from Scopus, 46 papers from ACM Digital Library, 35 papers
7
from IEEE Xplore, 48 papers from ERIC, 262 papers from SpringerLink, 31 papers from ProQuest
Education Databases, 3 papers from SAGE Journals, 18 papers from Wiley, 10 papers from
EBSCO HOST, and 5 papers from Taylor & Francis Online, was produced to be refined in the
next steps.
3.1.3 Study screening and selection
In this subsection, we describe the process of study screening and selection steps for our systematic
review. Firstly, we imported our identified 686 papers from databases (exported as .RIS files) to
the Covidence platform. It automatically identified and removed 220 duplicated papers, resulting
in 466 studies for the title and abstract screening (see Fig.1).
Next, an initial screening of the titles and abstracts was performed. In this incremental process, we kept a paper if it met the inclusion criteria designed to address our research questions (see Table 2). Regarding these inclusion criteria, we first limited the publication period to between 2019 and 2024 to focus on recent advancements in AI in MMLA. Then, to ensure that reviewers could readily access and assess papers within the project's scope, we applied the constraints written in English, full text available, and published in journals or conferences. Furthermore, restricting the candidate papers to primary research ensures the depth and originality of their findings. Finally, to maintain relevance to our research questions, only studies that involve more than one modality and at least one AI application or technique within the context of education were considered for the selection step.
To enhance the reliability of the abstract screening step, this process was performed by three reviewers. Before beginning, the three reviewers screened ten papers together to develop a shared understanding of the inclusion criteria. Then, to increase the reliability of the screening, two reviewers, MM and ET, independently screened a random set of 30 papers. Their inter-rater reliability (Cohen's kappa) was 0.861, confirming the high consistency and reliability of the screening. Any conflicts or disagreements were resolved through discussion with a third reviewer, HKH, to ensure consensus on the final candidate papers. After resolving disagreements on two papers, the reviewers screened the remaining 431 papers, resulting in 168 papers included for full-text screening. To ensure inter-rater reliability during the full-text screening stage, the two reviewers screened a random set of 30 papers, achieving an inter-rater reliability (Cohen's kappa) of 0.7887. Three conflicting papers were resolved by HKH through a comprehensive analysis with the reviewers. After reaching this agreement, MM and ET screened the remaining 138 papers. Exclusions were made as follows: two duplicated papers, seven secondary papers, 41 papers with fewer than two modalities, 35 papers not contributing to the use of AI in MMLA, and 40 papers lacking empirical results. As a result, we identified the 43 most relevant papers for further analysis.
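As a minimal sketch of the agreement check described above, Cohen's kappa can be computed with scikit-learn; the include/exclude votes below are illustrative placeholders, not the reviewers' actual screening decisions.

from sklearn.metrics import cohen_kappa_score

reviewer_mm = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]  # 1 = include, 0 = exclude
reviewer_et = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]

# Kappa corrects raw agreement for the agreement expected by chance.
kappa = cohen_kappa_score(reviewer_mm, reviewer_et)
print(f"Cohen's kappa: {kappa:.3f}")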
Table 2: The inclusion and exclusion criteria used to screen papers.

Inclusion criteria: Published between 2019 and 2024; Written in English; Full text available; Journal or conference paper; Primary research paper; Conducted in the education field; Uses more than one modality; Employs at least one AI application or technique.

Exclusion criteria: Non-English paper; Duplicated paper; Secondary research paper; Fewer than two modalities used; No contribution to the use of AI in MMLA; No empirical results reported.
3.2 Analysis
This subsection aims to identify and extract key information from selected papers to address our
research questions. To achieve this, we coded the included papers as follows:
RQ1. To address this research question, we conducted our analysis from two perspectives: 1) the growth rate of publication in the AI-enhanced MMLA field, and 2) the geographic spread and co-authorship networks of the authors of these papers. For the former, we collected data on the publication year and paper type (conference or journal) from the included papers. This information allowed us to track the temporal progression of research in the field and understand any trends or shifts over time. For the latter, each author's university affiliation and country were recorded from each included paper to gain insights into the distribution of research efforts and potential patterns of collaboration.
RQ2. In response to this research question, we developed a coding scheme to identify and cate-
gorize the multifaceted objectives driving research efforts in the AI-enhanced MMLA literature. In
this coding scheme, the pivotal variables, including the educational level under investigation, main
stakeholders involved, and sources and modalities of input data were extracted from included papers.
These codewords allow us to uncover the context, trends, and priorities of the AI-enhanced MMLA.
RQ3. Focusing on RQ3, we conducted a detailed data extraction process to capture key aspects of AI utilization in MMLA. This involved extracting critical details regarding the specific phases of the MMLA process where AI was deployed, the particular AI techniques employed, and the role of AI in each phase. Coding and analyzing these facets across our selected papers yields invaluable insights into the current landscape of AI in MMLA research, which is essential for advancing the field.
To make such systematic coding within our included papers, we developed a framework outlining
the main phases, steps, and sub-components of the MMLA process. This framework consists of
seven key phases: Data Collection, Data Storage, Pre-processing, Annotation, Fusion, Modelling,
and Analysis (see Figure 2).
The Data Collection phase includes two essential steps: source identification and data acquisi-
tion. During the source identification step, researchers aim to identify and select specific information
Figure 2: A developed MMLA framework outlining the seven key phases—Data Collection, Data
Storage, Pre-processing, Annotation, Fusion, Modelling, and Analysis—along with their respective
steps, sub-components, and interconnections, providing a comprehensive guide to systematically
integrate AI in MMLA studies.
sources to exploit the multifaceted nature of learning environments. These sources, such as physical
activities, are crucial for creating an enriched multimodal dataset that reflects the diverse and latent
dimensions of the learning process (Blikstein and Worsley, 2016). After the source identification,
various techniques and tools, such as face tracking and Kinect cameras, are employed in the Data
Acquisition step to collect massive amounts of real-time data from these sources (Peng et al., 2021;
Di Mitri et al., 2022). The high quality of the collected data in this phase is necessary for the success
of MMLA, as it directly impacts the reliability and interpretability of the insights obtained from the
next phases.
Then, the collected data from different modalities is sent to a learning record database (Eradze et al., 2017). The Data Storage phase aims to optimize the storage process so as to handle the volume, velocity, and heterogeneity of the data through two key sub-components: Data Warehousing
and Data Management. During the Data Warehousing process, collected multimodal data is stored
and organized in a structured manner within the database or data warehouses. Then, the Data
Management sub-component focuses on ensuring the privacy and accessibility of stored data by
finding proper data indexing strategies, which is essential for efficient retrieval and analysis in the
next phases.
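As a minimal sketch of these two sub-components, assuming a simple relational store, multimodal records can be warehoused in one table and indexed by learner and timestamp so later phases can retrieve them efficiently; the table and column names are illustrative, not drawn from the reviewed studies.

import sqlite3

# Data Warehousing: one structured table holding records from all modalities.
conn = sqlite3.connect("mmla_records.db")
conn.execute("""CREATE TABLE IF NOT EXISTS observations (
    learner_id TEXT, modality TEXT, timestamp REAL, payload BLOB)""")

# Data Management: an index supporting fast retrieval by learner and time window.
conn.execute("""CREATE INDEX IF NOT EXISTS idx_learner_time
                ON observations (learner_id, timestamp)""")
conn.commit()
conn.close()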
The data pre-processing phase aims to convert raw captured data, including text, image,
video, audio, and sensor data, into a consistent and well-structured format suitable for modeling and
analysis. Anonymizing, Synchronization, Cleaning, Transformation, and Augmentation are the key
sub-components in this phase. The Anonymizing sub-component addresses privacy concerns regarding the collected data, such as sensitive content or incidental details and patterns unrelated to the learning task (Alwahaby and Cukurova, 2024). Removing personal information (for instance, addresses) and applying blurring or masking techniques to the collected data are two popular privacy-preserving techniques in this step. Data Cleaning aims to enhance
the dataset’s quality and completeness while reducing redundancy by removing noise, duplicates,
and irrelevant data, handling missing values, and correcting errors in the collected data (Rahul and
Katarya, 2023). Furthermore, the diversity of employed tools in the Data Acquisition necessitates
the Synchronization step in the MMLA process to establish the alignment and coordination among
the collected modalities in various timestamps (Shankar et al., 2023; Ochoa and Worsley, 2016).
In the Transformation sub-component, the prepared raw data structures are converted into the
required specific format (Shankar et al., 2023). One of the most well-known techniques in this step
is extracting feature vectors from each data type. Lastly, in Data Augmentation, the dataset is expanded through various augmentation techniques, such as resampling, to increase its robustness, diversity, and class balance, making the subsequent analytical and modeling phases more accurate and generalizable.
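As a minimal sketch of the Synchronization and Transformation steps, assuming two modality streams sampled at different rates, a nearest-timestamp join in pandas aligns them on a shared timeline; the stream contents and column names are illustrative.

import pandas as pd

# Two modality streams with different sampling rates (times in seconds).
eye = pd.DataFrame({"t": [0.00, 0.25, 0.50, 0.75],
                    "fixation_len": [0.2, 0.3, 0.1, 0.4]})
audio = pd.DataFrame({"t": [0.0, 0.4, 0.8], "speech_on": [1, 0, 1]})

# Synchronization: match each eye-tracking sample with the nearest audio state.
aligned = pd.merge_asof(eye.sort_values("t"), audio.sort_values("t"),
                        on="t", direction="nearest")
print(aligned)  # one aligned feature row per eye-tracking timestamp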
The Annotation phase consists of two sub-components: Labelling and Metadata Generation.
In the Labelling sub-component, meaningful learning labels are assigned to the time intervals of the
collected data, typically by experts or self-reports (Di Mitri et al., 2018). These labels are then
used to train and test supervised learning models. On the other hand, Metadata Generation provides supplementary context and information about the collected data to facilitate understanding and using it effectively (Mangaroska and Giannakos, 2018).
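A minimal sketch of the Labelling step, assuming illustrative expert-coded intervals: labels covering time intervals are mapped onto fixed-length analysis windows so they can later serve as targets for supervised models.

# Expert-coded intervals: (start_s, end_s, label); values are illustrative.
intervals = [(0, 30, "on-task"), (30, 45, "off-task"), (45, 90, "on-task")]

def label_for(t):
    # Return the expert label covering time t (in seconds).
    for start, end, label in intervals:
        if start <= t < end:
            return label
    return "unlabelled"

# Assign a label to each 10-second analysis window of the multimodal data.
windows = range(0, 90, 10)
print([(w, label_for(w)) for w in windows])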
The Fusion phase focuses on unifying and increasing the effectiveness of the available data
from different modalities through Feature Engineering, Standardization, and Feature Integration.
The Feature Engineering process seeks to condense multimodal data into a lower-dimensional space while preserving critical features as much as possible, extracting hidden information, and dropping noise and redundancy (Ayesha et al., 2020). Dimension reduction techniques, namely feature extraction and feature selection, are the two main techniques used in this step. In the Standardization sub-component, discovered features are scaled (for instance, using the min-max scaling strategy) to ensure that each modality contributes equally to the unified feature space. Finally, Feature Integration
provides a unified and enriched representation of collected data by combining complementary and
consensus information from different modalities.
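A minimal sketch of this phase under illustrative assumptions: per-modality features are condensed with PCA (Feature Engineering), min-max scaled so each modality contributes equally (Standardization), and concatenated into one representation (Feature Integration); the random arrays stand in for real modality features.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
video_feats = rng.normal(size=(100, 64))  # e.g., facial-expression features
audio_feats = rng.normal(size=(100, 32))  # e.g., speech features

def engineer(x, dims):
    # Feature Engineering: PCA condenses each modality to `dims` dimensions.
    reduced = PCA(n_components=dims).fit_transform(x)
    # Standardization: min-max scaling equalizes each modality's contribution.
    return MinMaxScaler().fit_transform(reduced)

# Feature Integration: concatenate into one unified representation per sample.
fused = np.hstack([engineer(video_feats, 8), engineer(audio_feats, 8)])
print(fused.shape)  # (100, 16)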
The Modelling phase in MMLA encompasses two key sub-components: Exploratory Data Anal-
ysis (EDA) and Model Learning. During the EDA process, initial examinations, often statistical
analyses, are performed to understand the data distribution, patterns, and relationships, usually
through visualizations like histograms, box plots, and scatter plots (Chan et al., 2023). The insights
gained from this step are then used in the Model Learning step to select, develop, or build accurate
models. In the Model Learning step, various AI models, including supervised, unsupervised, semi-
supervised, and reinforcement learning models, are utilized to learn complex and meaningful information from multimodal data, aligned with the specific problem objectives.
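A minimal sketch of the Model Learning step, assuming synthetic data in place of the fused features and annotation labels: a supervised classifier is trained and evaluated on a held-out split.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 16))    # stand-in for fused multimodal features
y = rng.integers(0, 2, size=100)  # stand-in label, e.g., engaged vs. not

# Hold out a test split, fit a supervised model, and evaluate it.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)
model = RandomForestClassifier(random_state=1).fit(X_tr, y_tr)
print(f"accuracy: {accuracy_score(y_te, model.predict(X_te)):.2f}")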
The Analysis phase includes Statistical Analysis, Visualization, and Insight Generation sub-
components. The Statistical Analysis component aims to evaluate the initial assumptions or theories
about the multimodal data, typically through statistical tests. In the Visualization process, a visual representation of the data and discovered insights is generated to facilitate the decision-making process for stakeholders. Finally, actionable insights are derived from the analysis results in the Insight Generation component. This task usually involves interpreting the results to specify the contributions of each modality and their relationships, and providing recommendations for stakeholders' interventions to improve the MMLA and learning process.
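A minimal sketch of the Statistical Analysis step, assuming illustrative engagement scores for two conditions: a t-test evaluates the initial assumption that the conditions differ.

from scipy import stats

# Illustrative engagement scores under two learning conditions.
condition_a = [0.71, 0.65, 0.80, 0.74, 0.69, 0.77]
condition_b = [0.58, 0.62, 0.55, 0.66, 0.60, 0.59]

# An independent-samples t-test checks whether the group means differ.
t_stat, p_value = stats.ttest_ind(condition_a, condition_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")  # a small p supports the assumption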
RQ4. To address this research question, we systematically coded key details of experimental design,
namely sample size, experimental environment, and any noted ethical considerations in the included
papers. By analyzing these aspects, we aimed to assess the quality of AI-enhanced MMLA studies
in terms of their reliability and generalizability and investigate their ethical accountability.
RQ5. The considered codewords regarding RQ5 involved extracting relevant information on the
benefits and challenges of AI-enhanced MMLA approaches reported in the included papers. Sys-
tematically synthesizing such information can provide valuable insight for researchers, educators,
policymakers, and other stakeholders in making informed decisions about the adoption, implemen-
tation, and optimization of AI-enhanced MMLA research.
4 Research findings
4.1 RQ1: What is the current state of research in AI-enhanced MMLA?
This research question analyzes the growth rate, geographic spread and patterns of national and
international collaboration in AI-enhanced MMLA.
4.1.1 What is the evolution of publication volume and types of AI-enhanced MMLA
studies over time?
Fig. 3 outlines the publication trend of our included studies, comparing journal and conference outputs from 2019 to 2024. The figure shows irregular trends in the publication of the included papers over this period. In 2019, journals and conferences each had two publications. In 2020, a significant shift occurred, with five conference papers and three journal publications. However, 2021 saw a reduction, with journal publications dropping back to two and conference publications decreasing to four. This reduction continued into 2022, with conference publications dropping to two papers. Significant growth occurred in 2023, with journal and conference publications increasing to ten and eight papers, respectively, confirming renewed interest and activity in the field. Additionally, journals and conferences each published one paper during the first months of 2024 (up to our final search on March 13), indicating the growth potential of AI-enhanced MMLA studies. Overall, the figure highlights the publication patterns in AI-enhanced MMLA research from 2019 to 2024, marked by initial growth, a mid-period decrease, and a recovery. This trend reflects the various factors influencing research and publication activities, particularly the peaks of COVID-19 in 2021 and 2022.
4.1.2 What is the geographical distribution of AI-enhanced MMLA studies?
Fig.4 illustrates the geographical distribution of the contributing authors, distributed across 24
countries on six continents. Europe emerged as the most prolific region, contributing to 27 papers.
In this continent, Germany, Spain, the United Kingdom (UK), and Estonia led with contributions to
Figure 3: Chronological trend in the publication of the AI-enhanced MMLA studies: Journals vs.
Conferences.
4 papers each. Norway and the Netherlands followed with three papers each. Finland contributed to two papers, while Italy, Sweden, and Switzerland each contributed to one paper. The Americas, the second-ranked region, appeared in 19 papers, with the USA leading at 16 papers, followed by Canada, Mexico, and Ecuador with one paper each. Asian authors contributed to 12 papers, placing the continent third. China was the primary contributor, appearing in four papers, followed by Japan in three, India in two, and Taiwan, Malaysia, and Turkey in one paper each. Africa's contribution cannot be ignored, with Nigeria, Egypt, and Tunisia each contributing to one paper, reflecting the continent's emerging presence in this subject. Furthermore, Australian authors contributed to two papers. From a global perspective, the USA leads in AI-enhanced MMLA publications, followed by China, Germany, Spain, the UK, and Estonia.
4.1.3 What are the patterns of authors' collaborations and their impact on the geographical spread of AI-enhanced MMLA studies?
For this question, we focused on two levels of geographical spread in the collaboration patterns within the selected studies, highlighting international and national cooperation. Hence, we first classified the included studies by tagging them with two types: International for papers conducted by authors from multiple countries, and National for those written by authors from different universities within a single country. This analysis resulted in 17 National, 14 International, and 16 unlabeled papers. Notably, each unlabeled paper was conducted by authors from a single university and was not included in our analysis in this subsection. Additionally, papers that contributed to spreading the field at both the international and national levels were considered to involve both types of collaboration and were consequently labeled with both tags.
Next, we constructed a weighted hypergraph to visualize these patterns and identify the contributions of different countries to the geographical spread of AI-enhanced MMLA studies. A hypergraph is a type of network that allows an edge to connect more than two vertices. This property makes it a popular tool for modeling complex relationships and interactions, such as collaborations among multiple countries or universities (Gao et al., 2020). Our hypergraph, visualized in Fig. 5, consists of two key components: countries as nodes and International papers as hyperedges. Each hyperedge, marked by an ellipse, denotes an International paper, and its members (the nodes inside it) represent the countries that contributed to the publication of that paper. Also, in this graph, we incorporated the internal collaborative spread of countries by weighting the nodes based on the number of National papers they conducted. Notably, we indicated the weight of each node
Figure 4: Geographical distribution of the AI-enhanced MMLA studies
by the size and the number displayed on it.
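A minimal sketch of this structure, assuming illustrative paper and country data rather than our actual coding: hyperedges are sets of countries (one per International paper), node weights count National papers, and a country's hypergraph degree counts the International papers it joins.

from collections import Counter

# Hyperedges: one set of contributing countries per International paper.
hyperedges = [
    {"USA", "China"},
    {"Spain", "Estonia", "UK"},
    {"Germany", "Netherlands"},
]
# Node weights: number of National papers per country.
national_counts = Counter({"USA": 5, "Spain": 2, "Germany": 1})

# Degree in the hypergraph = number of International papers a country joins.
degree = Counter(c for edge in hyperedges for c in edge)
for country in sorted(set(degree) | set(national_counts)):
    print(country, "International:", degree[country],
          "National:", national_counts[country])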
From Fig. 5, it is observable that the USA plays a critical role in spreading the field, collaborating on four International and five National papers. Spain, with three International and two National papers, ranks as the second spreader, followed by China and Germany, which each contributed to three International papers and one National paper. Estonia and the UK each shared AI-enhanced MMLA knowledge across three International papers, and Japan published two International papers and one National paper; hence, these countries are strong candidates for the third position. Continuing our analysis, Canada, Egypt, and the Netherlands were recognized as the next top geographical spreaders, each conducting one International and one National paper. Norway also ranked alongside them by collaborating on two International papers. Among the remaining 13 countries involved in the included papers, 12 collaborated on only one paper that could play a role in the geographical spread of the field, and Australian papers did not involve collaboration across countries or universities.
4.2 RQ2: In what contexts are AI-enhanced MMLA being applied?
In this section, we explore the context of AI-enhanced MMLA studies based on three key aspects: the level of education, the stakeholders, and the modalities used.
4.2.1 What educational levels are the primary focus of AI-based MMLA papers?
The included papers spanned a wide range of educational levels, as outlined in Fig. 6. The majority of them, 22 studies, focused on enhancing the learning process at the tertiary level. Among them, the authors of seven papers conducted their studies at the undergraduate level in universities. For instance, one paper sought to improve nursing students' learning and patient outcomes (Vatral et al., 2023). Another critical tertiary group was graduate students, who were the focus of three studies. In one of these (Peng et al., 2021), the authors aimed to improve instructors' understanding and students' learning outcomes. Two included papers (Emerson
Figure 5: Collaboration network among countries and universities in the AI-enhanced MMLA studies. Each country, represented by a node in this network, is weighted by the number of collaborative studies between its universities (National papers), and each ellipse (a hyperedge) encompasses a group of countries involved in an International paper.
et al., 2023, 2020a) specifically focused on enhancing the performance and interest of college students in game-based learning environments. Finally, one study targeted postgraduate education to improve collaborative learning outcomes (Zhou et al., 2024). It is worth noting that the largest group of papers in tertiary education, nine, did not specify the education level from which data were collected. For example, Sabuncuoglu and Sezgin (2023) assisted higher education students in evaluating their engagement over time. In another study, Di Mitri et al. (2022) presented a CPR tutor providing real-time feedback for training students.
Secondary education, with ten papers, was the second most addressed level of education in our included studies. Three of these papers focused on middle school students to enhance their learning process; for instance, improving the performance of middle school students during educational gameplay was the aim of Moon et al. (2022). Additionally, two papers supported high school students in achieving better collaborative learning outcomes, exemplified by Olsen et al. (2020). Notably, the remaining five papers, such as that by Chejara et al. (2023b), conducted their experiments at unspecified levels of secondary education to improve students' collaborative learning quality.
Three papers conducted their experiments at the primary level of education to improve students' engagement or learning performance, exemplified by Emerson et al. (2020b) and Chettaoui et al. (2023), respectively. Two papers addressed learning process enhancement at the level of early childhood education; for instance, one of them concentrated on students with special education needs to enhance existing Applied Behaviour Analysis (ABA) therapy (Chan et al., 2023). Moreover, one paper conducted its study on in-service teachers to develop their technological pedagogical content knowledge (Huang et al., 2023).
Furthermore, a key observation is that the educational level of participants in five of the included
studies was unknown. These papers have mainly focused on improving learning outcomes (e.g., Nandi
et al., 2021), facilitating management and implementation of the lesson plan in offline classrooms
(Akila et al., 2023), developing tools that capture 21st-century skills (e.g., Huang et al., 2019), and
improving learners’ self-regulated learning (Yun et al., 2020).
Figure 6: Distribution of the AI-enhanced MMLA studies across academic levels.
4.2.2 Who are the main stakeholders targeted by AI-enhanced MMLA papers?
In this subsection, we categorized the papers based on their target audiences to understand how
they addressed the needs of various stakeholder groups. We identified four types of stakeholders:
Learners, Instructors, Researchers, and Technology Developers. Fig.7 depicts the distribution of
papers across these groups.
Figure 7: Distribution of the AI-enhanced MMLA studies across identified target audiences.
Our analysis revealed a significant subset of papers with learners as their stakeholders (19 pa-
pers). They sought to enhance the learning experience, improve academic performance, or enrich
students’ engagement. For instance, Sabuncuoglu and Sezgin (2023) presented the engagement level
of students on dashboards to facilitate their self-evaluation skills. Another set of papers, tailored for
researchers (16 papers), endeavored to enrich and expand the knowledge of MMLA by suggesting
new insights, methods, and frameworks and discovering hidden aspects of this field. As an example, Chejara et al. (2023b) addressed the generalizability of proposed methods for collaboration quality estimation by focusing on differences in the time scales of multimodal data. The papers with Instructors as their stakeholders (12 papers) aimed to foster instructors' insights into the learning process, highlight the diverse needs of learners, and provide real-time decision support for instructors. As a sample of this category, Peng et al. (2021) emphasized students' mental states as an essential construct for teachers to monitor, guiding them in adapting learning materials to improve students' learning outcomes. Furthermore, our analysis showed that technology developers were the target audience of 11 papers. These papers aimed to develop assistive tools and technological infrastructures for facilitating and augmenting the learning process. For example, in Cebral-Loureda and Torres-Huitzil (2021), the smart infrastructure of a digital humanities laboratory was enhanced with a novel deep learning-based model to capture the emotional states of students.
It is worth noting that our analysis showed some overlaps among these categories. Six papers targeted both Learners and Researchers; for example, Moon et al. (2022) proposed new insights into supporting the cognitive-affective states of students during educational gameplay. Three papers addressed the needs of both Instructors and Technology Developers; for instance, Chango et al. (2021) developed an Intelligent Tutoring System (ITS) to predict student performance, providing teachers with clear explanations of the predictions. Two papers supported both Researchers and Technology Developers; for example, Huang et al. (2019) targeted these stakeholders by developing tools to capture multimodal data for discovering different collaborative learning insights. Learners and Instructors were both the target audiences of two papers; for example, Kawamura et al. (2021) sought to support both learner and teacher feedback in MMLA. Finally, we found Learners and Technology Developers as the stakeholders in two papers; for example, Di Mitri et al. (2022) designed and developed a cardiopulmonary resuscitation (CPR) tutor, improving learners' CPR skills by providing audio feedback.
4.2.3 What modalities are used in AI-enhanced MMLA papers?
We used the categories outlined by Di Mitri et al. (2018) and Mu et al. (2020) for grouping selected
papers into different information spaces and modalities. Fig.8 provides an overview of the results.
Physical space. In this space, authors seek to identify meaningful physical activities of learners that reflect or influence different dimensions of their learning process. Authors of 18 of the 43 included papers, exemplified by Israel et al. (2021), incorporated facial expression features in their learning analysis, confirming the significant role of this modality in exploring meaningful insights into the learning process. Speech is another prominent physical modality, emerging in 17 studies such as Lin et al. (2023), followed by eye-tracking modalities in 12 papers (e.g., Emerson et al., 2020a). Motion and gesture modalities were each used in seven papers, exemplified by Chng et al. (2020) and Closser et al. (2022), respectively. Additionally, visual features were collected in six papers, such as Cebral-Loureda and Torres-Huitzil (2021). Head position and posture each fed into the learning analytics of two papers, for instance, Akila et al. (2023) and Yusuf et al. (2023). Hand movements, eyebrow movements, and the mouth region were further physical activities, each considered in one paper; Lee et al. (2023) and Chejara et al. (2023a) serve as examples.
Digital space. This space encompasses the diverse digital traces left on the platforms engaged in the learning process, revealing the state of that process. System logs are informative digital traces collected by the authors of 12 included studies, such as Chango et al. (2021). Textual information such as learners' chats, posts, and comments is the second most frequently used digital data among the included papers, appearing in six studies, exemplified by Ouyang et al. (2023). Performance data, appearing in five papers (e.g., Chango et al., 2021), stood in third place. Furthermore, screen recordings were another digital trace, considered in two included studies, such as Ouyang et al. (2023).
[Figure 8 data: Physical space: Facial Expression 18, Speech 17, Eye-tracking 12, Motion 7, Gesture/Pose 7, Visual 6, Posture 2, Head Position 2, Mouth Region 1, Hand Movements 1, Eyebrow Movements 1; Digital: Logs 12, Textual 6, Performance 5, Screen Recording 2; Physiological: Electrodermal Activity 7, Electrocardiogram 4, Electroencephalogram 2, Skin temperature 2, Respiratory Belt 2, Blood Volume Pulse 1, Electromyographic 1; Psychometric: Questionnaires 9; Environmental: Location/Spatial 4, Humidity 1, Temperature 1, Light intensity 1, Seat pressure 1, Carbon dioxide concentration 1]
Figure 8: Frequency of modalities utilized in the AI-enhanced MMLA studies, categorized by physical, digital, physiological, psychometric, and environmental spaces (Mu et al., 2020).
Physiological Space. The factors in this space reflect learners' mental and health states during learning. Electrodermal Activity (EDA), the most frequently used physiological data among the included papers, exemplified by Reilly and Schneider (2019), emerged in 12 studies, followed by Electrocardiogram (ECG) data in four papers, such as Yun et al. (2020). Skin temperature, Respiratory Belt (RB), and Electroencephalogram (EEG) data were each utilized in two papers, exemplified by Chan et al. (2023), Nandi et al. (2021), and Sharma et al. (2019), respectively. Furthermore, Blood Volume Pulse (BVP), used in Sharma et al. (2019), and Electromyographic (EMG) data, employed in Di Mitri et al. (2022), were the other physiological modalities involved.
Psychometric Space. This information source, which appeared in nine papers (for instance, Lin et al. (2023)), primarily captured learners' mental states through self-report questionnaires.
Environmental Space. Here, authors aimed to identify key factors in the physical environment that significantly influenced students' learning processes. Among the 43 analyzed papers, four, such as Li et al. (2023), involved the location or spatial information of learners in their analyses. Data on humidity, temperature, light intensity, and indoor carbon dioxide, collected together in Chan et al. (2023), along with the seat pressure data used in Kawamura et al. (2021), emerged as the other environmental modalities in our analyses.
It is crucial to note that integrating information from these spaces allows authors to explore
unknown aspects of learning and gain holistic and accurate insights into the learning process. Fig.9
depicts the distribution of included papers across information spaces. This figure confirms the
Figure 9: Distribution of the AI-enhanced MMLA studies across information spaces.
high usage of information modalities from physical space in the AI-enhanced MMLA studies, which
appeared in 41 papers. Among these, nine papers collected their multimodal data exclusively from
this space. For instance, the authors of Ivleva et al. (2023) combined audio (speech) and facial expression modalities to recognize the emotional states of students and teachers. Digital space, with
19 appearances in the papers, ranked as the second most used informative space. In 15 of these
papers, modalities were from both Physical and Digital spaces. For example, one study utilized gaze
(eye-tracking), speech, and log data (from an intelligent tutoring system) to predict collaborative
learning outcomes (Olsen et al., 2020).
Modalities from the physiological space, appearing in 10 papers, were the next most frequently used. In four of these papers, the authors integrated this information with physical modalities to explore diverse learning dimensions. For example, in one paper (Huang et al., 2019), collaborative learning states were estimated by integrating EDA data with eye-tracking and motion data. It is worth noting that the authors of two papers focused particularly on fusing different modalities from this space; specifically, the authors of Nandi et al. (2021) collected EEG, EDA, and RB data to estimate learners' emotions in an e-learning context. Questionnaires from the psychometric space were the next most utilized modalities among the included papers, appearing in nine studies. Among them, four papers examined different aspects of learning by analyzing modalities from both physical and psychometric spaces. Take the case of one study where the
authors collected gaze, speech, and questionnaire data to interpret effective collaborative learning
interactions (Zhou et al., 2024). Moreover, three papers enriched their data collection by integrating
modalities from psychometric space with both physical and digital spaces. For instance, in one study,
facial expressions, eye tracking, system logs, performance data, and questionnaires were investigated
to predict student performance and interest after interacting with a game-based learning environment
(Emerson et al., 2020a).
Lastly, environmental modalities appeared in six papers, combined with modalities from other
spaces. Two papers utilized these modalities alongside physical and physiological information. For
example, the authors of (Kawamura et al., 2021) utilized students’ heart rates, seat pressure, and
facial expressions to model their level of wakefulness. One paper integrated them with physical
modalities (i.e., Zhao et al., 2024), and another (i.e., Chan et al., 2023) combined them with modal-
ities from physical, digital, and physiological spaces. Additionally, one paper contained multimodal
data collected from environmental, physiological, physical, and psychometric spaces. In particular,
its authors enhanced their analyses by investigating the 3D positions of body joints alongside the
collected questionnaires from teachers and the gathered EMG and visual data from students to
develop a CPR tutor system with real-time feedback generation (Di Mitri et al., 2022). Finally, in
another study, spatial data was combined with visual and questionnaire information to estimate the
position of participants within the learning environment (Li et al., 2023).
4.3 RQ3: How has AI been implemented in MMLA studies?
Our analysis categorized the implementation of AI into several main phases based on the framework presented in Fig. 2 above.
4.3.1 At which stages of the MMLA process has AI been implemented?
[Figure 10 data: Collection: Data Acquisition 8; Pre-processing: Data Anonymization 1, Data Cleaning 4, Data Transformation 26, Data Augmentation 3; Annotation: Data Labelling 1; Fusion: Feature Engineering 15; Modelling: Model Learning 36; Analysis: Insight Generation 2]
Figure 10: Frequency of AI utilization across various phases and subcomponents of the MMLA
process in the AI-enhanced MMLA studies.
Fig. 10 provides a visual representation of the distribution of AI roles throughout the various phases of MMLA in the analyzed papers. Notably, a significant portion of these papers, totalling 36, leveraged AI techniques in the Model Learning step to extract valuable information from multimodal datasets, aiding decision-making processes. The Pre-processing phase ranks second in AI usage, with 34 occurrences across the 43 papers. In this phase, one paper utilized AI techniques to preserve learner privacy, and four papers employed AI methods to address noise, missing values, and erroneous data. Furthermore, in the Data Transformation sub-component, AI techniques played pivotal roles in converting data into formats suitable for analysis, as observed in 26 studies. Additionally, three papers improved the class balance of multimodal data through popular AI data augmentation techniques.
Furthermore, various AI techniques emerged in 15 papers for feature engineering, from both feature selection and feature extraction perspectives, positioning the Fusion phase as the third most AI-enhanced phase in MMLA. In the Data Collection phase, eight papers employed AI techniques to enhance the accuracy and richness of the collected data during the Data Acquisition step. The authors of two papers applied AI techniques to derive actionable insights from the modeling results in the Analysis phase, which ranked after Data Collection. Lastly, an AI technique was used to annotate data with relevant labels in the Labelling step, making Data Annotation the least AI-enhanced phase in MMLA. Detailed descriptions of the AI-enhanced steps for each included paper are reported in Appendix 1.
4.3.2 What AI techniques are commonly used in MMLA studies?
The following details outline the AI techniques employed across the various AI-enhanced phases of
MMLA in the included papers.
AI-based approaches in the Data Collection phase
Concerning this phase, in the Data Acquisition step of eight included studies, advanced AI techniques were applied to ensure the quality and comprehensiveness of the collected data. These methods effectively minimized redundancy, facilitating the subsequent pre-processing and storage steps. They were tailored to five diverse tasks: Face Recognition, Facial Expression, Speech Recognition, Spatial Orientation Identification, and Gaze Behavior Identification. These tasks aligned with the specific goals of the papers, and each contributed to a more robust data collection framework. Fig. 11 illustrates the distribution of the included studies across these tasks; the AI-based tools employed in each category are as follows:
Face Recognition: In paper (Chng et al., 2020), the OpenFace toolkit was used to identify
students and instructors by labeling individuals in each data collection instance. This applica-
tion was crucial for detecting episodes of collaboration. In another study (Ivleva et al., 2023), the authors generated their data with the Google Image Search API, and FER2013 automatically detected, centered, resized, and cropped the facial regions, ensuring that each face was roughly
centered and occupied a similar amount of space in each image. Furthermore, BlazeFace, a
neural network model for real-time face detection, was used by (Akre et al., 2023) to confirm
the presence of faces in the frames during the data acquisition process.
Facial Expression: Authors in paper (Peng et al., 2021) employed ARKit packages for face
tracking on the iPhone, involving AI techniques to process depth sensor data and generate
a facial mesh. Additionally, they utilized ARKit packages to detect various facial attributes
and calculate blend shape coefficients that rely on AI algorithms. In another study (Moon
et al., 2022), student facial data was collected using two well-known facial-expression detec-
tion toolkits: OpenFace and Facial Expression Recognition (FER-2013). OpenFace generates
AU data by tracking facial-muscle movements, while the FER-2013 data-driven open-source
toolkit computes the probabilities of the “big five” emotions based on image-based emotion
classification data.
Speech Recognition: CoTrack, an audio-capturing prototype system, was utilized to perform voice detection, directional analysis, and feature extraction from the audio during the data acquisition process of Chejara et al. (2020).
[Figure 11 data: Face Recognition 3; Facial Expression 2; Speech Recognition 1; Gaze Behavior Identification 1; Spatial Orientation Identification 1]
Figure 11: Frequency of AI techniques utilized in the Data Acquisition step of AI-enhanced MMLA studies, categorized by Face Recognition, Facial Expression, Speech Recognition, Gaze Behavior Identification, and Spatial Orientation Identification tasks.
Spatial Orientation Identification: Authors in (Di Mitri et al., 2022) used the Microsoft
Kinect v2 depth camera tool for collecting data in the CPR Tutor, a C# application running
on a Windows 10 computer. This tool utilizes depth-sensing technology, which often involves
machine-learning algorithms for depth estimation and body tracking. These algorithms enable
the camera to capture 3D kinematic data of body joints.
Gaze Behavior Identification: In the data acquisition phase of (Chettaoui et al., 2023),
the gaze module, an AI-driven gaze tracking system, was employed to track students’ visual
attention on specified Areas of Interest (AOIs) in real-time. This algorithm leverages the
Dlib library’s facial key points predictor, which combines facial detection with gaze-tracking
capabilities. The Dlib library was chosen due to its pre-trained detector based on the IBUG
300-W face landmark dataset, employing an ensemble of regression trees to identify 68 facial
landmarks.
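To make the Dlib-based pipeline above concrete, the following is a minimal sketch of detecting a face and extracting the 68 facial landmarks with Dlib, of the kind such gaze modules build on; the image path and the downloaded predictor file are assumptions, not the authors' actual code.

```python
import dlib
import cv2

# The predictor file must be downloaded separately; it is trained on the
# iBUG 300-W dataset referenced above.
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

frame = cv2.imread("student_frame.jpg")            # hypothetical video frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

for face in detector(gray):
    landmarks = predictor(gray, face)              # 68 (x, y) facial landmarks
    # Eye landmarks (points 36-47) could feed a simple gaze heuristic,
    # e.g., the pupil's horizontal position within the eye region.
    left_eye = [(landmarks.part(i).x, landmarks.part(i).y) for i in range(36, 42)]
    right_eye = [(landmarks.part(i).x, landmarks.part(i).y) for i in range(42, 48)]
    print(left_eye, right_eye)
```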
AI-based approaches in the Pre-processing phase
The AI techniques discovered in the different steps of the Pre-processing phase, based on the analysis
of 43 papers, are as follows:
Data Anonymization. In the study by Li et al. (2023), the authors utilized OpenCV and the MMTracking algorithm to anonymize students' facial identities. They achieved this by hiding the faces with black boxes placed over the top 20% region of the tracking box generated by these techniques.
Data Cleaning. In this step, Ivleva et al. (2023) utilized a face recognition library to validate and filter out non-face images from the dataset. Additionally, the Deepface framework was employed to identify images with high emotion recognition rates and to eliminate incorrectly labeled images from the dataset. Sabuncuoglu and Sezgin (2023) employed Dlib's feature extractor to promote data uniformity by centering and cropping faces to a standardized resolution. Moreover, in another paper (Moon et al., 2022), data bias and missing data problems in multimodal data were addressed with a KNN-based imputation technique. Also, the authors of Chng et al. (2020) handled duplicate records arising from the simultaneous use of two Kinect sensors with the aid of a Decision Tree method.
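As an illustration of the KNN-based imputation idea used by Moon et al. (2022), the following is a minimal sketch using scikit-learn's KNNImputer; the toy matrix stands in for fused multimodal features and is not drawn from any included study.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Toy multimodal feature matrix with missing values (np.nan);
# rows are learners, columns are features.
X = np.array([[0.9, 70.0, np.nan],
              [0.8, np.nan, 0.31],
              [0.4, 61.0, 0.27],
              [0.5, 64.0, 0.25]])

imputer = KNNImputer(n_neighbors=2)   # each gap is filled from the 2 nearest rows
X_filled = imputer.fit_transform(X)
print(X_filled)
```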
Data Transformation. In most cases of our included papers, this step involved extracting task-
specific features from raw data. We categorized the utilized AI techniques in this step of the included
paper based on the used strategies, as illustrated in Fig.12, as follows:
Facial Expression: During our analysis, five studies (e.g., Ma et al., 2022; Sabuncuoglu and
Sezgin, 2023) utilized the OpenFace toolkit to extract facial Action Units (AUs) or 3DFace
Landmarks of the frames captured from videos directly related to facial expression. Addition-
ally, this toolkit was employed to generate the facial expression recognition modules in the (Lin
et al., 2023). On the other hand, FACET is another well-known toolkit that uses computer
vision and machine learning techniques to analyze facial expressions and emotions from video
streams or images. This toolkit was used to extract and analyze AUs from video frames during
the Data Transformation step of two papers (Emerson et al., 2023, 2020a).
Concerning the Data Transformation step of the remaining included papers, in the study by
Peng et al. (2019), a well-trained universal model (haarcascade-frontalface-default) in OpenCV
was employed to detect and extract facial expressions from recorded videos (screenshots per
second), while Cebral-Loureda and Torres-Huitzil (2021) combined OpenCV with a face recog-
nition library to identify the critical points of the user’s face. Additionally, they utilized the
Py-Feat library to identify facial expression features, including action units, emotions, and
landmarks, from images and videos. In the next paper (Chango et al., 2021), the authors
analyzed the videos using the Microsoft Emotion API (2019 Automatic Facial Recognition
Software), which involved processing the video frames to transform them into structured and
categorized emotional data suitable for further analysis. Also, to analyze the captured frames
from students’ face videos, an open-source Python package that contains a CNN was performed
by Israel et al. (2021). Finally, in the paper by Akila et al. (2023), a deep-learning model was trained and fine-tuned offline with facial images captured from video frames to identify the target person's facial features.
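To ground the OpenCV-based approach of Peng et al. (2019) described above, here is a minimal sketch of Haar-cascade face detection with OpenCV's bundled haarcascade_frontalface_default model; the video file name is hypothetical, and this is an illustration rather than the authors' pipeline.

```python
import cv2

# Load OpenCV's bundled frontal-face Haar cascade.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture("lecture_recording.mp4")  # hypothetical recorded video
ok, frame = cap.read()                           # e.g., one screenshot per second
if ok:
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        face_crop = frame[y:y + h, x:x + w]      # region passed on to expression analysis
cap.release()
```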
Speech Recognition: The OpenAI Whisper speech recognition package was used by au-
thors of two studies, Lin et al. (2023) and Zhao et al. (2024), for automatically generating
transcriptions from audio data. Among them, the paper by Zhao et al. (2024) focused on
the Whisper-large model. Additionally, both Chejara et al. (2023b) and Zhao et al. (2024)
employed Voice Activity Detection (VAD) methods to extract utterance timing (i.e., speaking
time and turn-taking) from continuous audio data. In another study (Zhou et al., 2024), the
audio information, including the content, speaker ID, and time stamps of the start and end
time of each speech, were automatically detected by Amazon Transcribe, an automatic speech
recognition service, and saved in as.json files.
As further examples in this category of methods, in the paper by Ivleva et al. (2023), the Librosa library was used to extract key audio features, including Root Mean Square Energy, Zero-Crossing Rate, and Mel Frequency Cepstral Coefficients (MFCCs), which were concatenated and returned as a NumPy array tailored to the emotion detection task. In the paper by Ma et al. (2022), the authors used online transcription services to generate the textual transcript for each
dyad, openSMILE for extracting acoustic-prosodic features, and VGGish for generating em-
beddings from audio spectrograms in its Data Transformation step. Additionally, the authors
of (Vatral et al., 2023) employed a trained deep-learning model to extract specific features
from audio data. Finally, the Automatic Speech Recognition (ASR) models have been applied
for student categorization by authors of paper (Akila et al., 2023) in this step.
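The Librosa-style feature extraction described for Ivleva et al. (2023) can be sketched as follows; the audio file name is hypothetical, and this is a minimal illustration of the RMS/ZCR/MFCC pipeline, not the study's actual code.

```python
import numpy as np
import librosa

y, sr = librosa.load("utterance.wav", sr=None)      # hypothetical audio clip

rms = librosa.feature.rms(y=y)                      # root mean square energy
zcr = librosa.feature.zero_crossing_rate(y)         # zero-crossing rate
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # 13 MFCCs per frame

# Concatenate per-frame means into one fixed-length NumPy vector,
# mirroring the emotion-detection feature vector described above.
features = np.concatenate([rms.mean(axis=1), zcr.mean(axis=1), mfcc.mean(axis=1)])
print(features.shape)   # (15,)
```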
Spatial Orientation Identification: In three papers placed in this category, e.g., Kawa-
mura et al. (2021) and Ma et al. (2022), the OpenFace toolkit was employed to estimate the
head poses, resulting in a vector containing the location of the head concerning the camera.
Additionally, the authors of (Sabuncuoglu and Sezgin, 2023) and (Cukurova et al., 2020b)
utilized OpenPose, a powerful deep learning-based library, to extract pose features from videos.
[Figure 12 data: Facial expression 13; Speech recognition 8; Spatial orientation 7; Gaze Behavior Identification 6; Natural language processing 3; Face Recognition 3; Attention Behavior Identification 1]
Figure 12: Frequency of AI techniques utilized in the Transformation step of AI-enhanced MMLA studies, categorized by Facial Expression, Speech Recognition, Spatial Orientation, Gaze Behavior Identification, Natural Language Processing, Face Recognition, and Attention Behavior Identification tasks.
In the paper by Akila et al. (2023), the Transforming Eyesight with Retina face model was
employed to extract the head-pose parameters from video data. In another paper (Cebral-
Loureda and Torres-Huitzil, 2021), MediaPipe was used for gesture recognition. MediaPipe is
an open-source framework that provides artificial vision technologies, a combination of deep-
learning-based models, to developers for quickly creating visual perception applications, such
as gesture recognition, object tracking, face detection, and multi-hand tracking, with real-
time performance. Finally, the authors of Li et al. (2023) adopted the multi-object tracking function implemented in MMTracking to extract positions and motions from video data.
Gaze Behavior Identification: Among the 43 included papers, two papers, (Ma et al.,
2022) and (Sabuncuoglu and Sezgin, 2023), utilized OpenFace as a gaze behavior identification
toolkit for analyzing video data. In the former, the authors used this toolkit for identifying
specific gaze behaviors, including eye landmarks, eye direction vectors, and eye directions in
radius. In the other case, it was employed to extract Gaze Directions from video frames.
Additionally, the authors of Lin et al. (2023) employed OpenFace 2.0 to build their eye gaze tracking modules; they also utilized YOLOv7-tiny to identify the objects students were looking at. In the paper by Zhou et al. (2024), the authors utilized computer vision techniques, known for their high accuracy (Zhou et al., 2023), and the Computer Vision Annotation Tool (CVAT) (cvat.org) for identifying and labeling gaze behaviors (i.e., gazing at peers, laptops, tutors, or other objects) in video data. In another study (Vatral et al., 2023), the authors utilized the YOLOv5L model to generate person-class bounding boxes, enabling the computation of a specific eye gaze feature called PersonGaze by measuring the overlap between these boxes and Tobii gaze coordinates. Finally, in the paper by Peng et al. (2021), eye-blink frequency, an eye-related feature, was estimated by applying peak detection methods to the coefficient time series data.
Natural Language Processing: The studies in this category employed AI-based models to
transform the text into numerical representations (vectors) that capture semantic meaning. In
the paper by Peng et al. (2019), the authors used Baidu NLP to analyze and classify the emotional tendency of each 'Danmaku' comment. Jieba, an NLP tool specifically designed for Chinese text segmentation, was then adopted to segment sentences into individual words. Finally, they used Gensim, a library for topic modeling and document similarity analysis using unsupervised machine learning, to convert words into vector representations. The paper by Ma et al. (2022) employed three language models, Word2Vec, fine-tuned BERT, and speaker-aware fine-tuned BERT, to extract semantic features from the textual transcripts generated from audio data. In the paper by Emerson et al. (2023), two well-known word embedding techniques, 300-dimensional GloVe embeddings and 1024-dimensional ELMo embeddings, were applied to encode students' written reflection responses.
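As a minimal sketch of the Word2Vec-style semantic encoding named above, the following trains a tiny embedding model with Gensim; the two-sentence corpus and the 50-dimensional size are illustrative assumptions, not drawn from any included study.

```python
from gensim.models import Word2Vec

# Toy tokenized transcript; real studies used full dialogue transcripts.
sentences = [["i", "think", "the", "gear", "ratio", "is", "wrong"],
             ["let", "us", "test", "the", "gear", "again"]]

model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, epochs=20)

vec = model.wv["gear"]                 # 50-dimensional embedding for one token
# One simple utterance-level feature: the mean of its token embeddings.
utterance_vec = sum(model.wv[t] for t in sentences[0]) / len(sentences[0])
print(vec.shape, utterance_vec.shape)  # (50,) (50,)
```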
Face Recognition: In the paper by Akila et al. (2023), the Transforming Eyesight with
Retina face (TER) model was utilized to identify faces from video data collected using Internet
of Things (IoT) technologies in the classroom. In another study (Chejara et al., 2023a), faces
were detected using the face detection framework proposed by Viola and Jones (2004), which
contains image processing and AdaBoost learning methods, to identify mouth regions as a
vision-based speaking activity feature. Additionally, in the paper by Cukurova et al. (2020b), FaceNet, an open-source library, was involved in the feature extraction process from videos by recognizing students and matching their actual IDs with assigned random identifiers.
Attention Regulator Behavior Identification: Lee et al. (2022) utilized ResNet archi-
tectures (ResNet-18, ResNet-50, ResNet-101), pre-trained on ImageNet, to extract frame-level
attention regulator behaviors from video data. Additionally, they employed a CNN-RNN
model to identify the video level of these behaviors.
Data Augmentation. Imbalanced classification is a common problem in machine learning where the training dataset has an imbalanced distribution of class instances. This imbalance leads to poor predictive performance, particularly for the minority class, and was addressed in the Data Augmentation steps of three included studies (e.g., Moon et al., 2022; Lee et al., 2023). Moon et al. (2022) and Lee et al. (2023) generated synthetic samples for the minority class using SMOTE, which is based on the k-Nearest Neighbour (kNN) algorithm. In the paper by Chan et al. (2023), by contrast, APIs from the Python imbalanced-learn toolbox were chosen to perform several data resampling techniques, including RUS, AllKNN, Tomek, ROS, SMOTE, SMOTENC, SVMSMOTE, and SMOTETomek.
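The SMOTE oversampling described above can be sketched with imbalanced-learn as follows; the synthetic dataset is a stand-in for real multimodal features, not data from any included study.

```python
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE

# Stand-in imbalanced dataset (roughly 9:1 class ratio).
X, y = make_classification(n_samples=200, n_features=10,
                           weights=[0.9, 0.1], random_state=0)
print(Counter(y))                        # majority class dominates

# SMOTE interpolates between each minority sample and its k nearest
# minority neighbours to synthesize new minority-class samples.
X_res, y_res = SMOTE(k_neighbors=5, random_state=0).fit_resample(X, y)
print(Counter(y_res))                    # classes balanced after resampling
```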
AI-based approaches in the Fusion phase
In this phase of MMLA, the feature engineering process plays a pivotal role in preparing multimodal
data for analysis. In the following, we outline the AI methods utilized in this sub-component of
included papers from the perspective of their strategies, including feature extraction and feature
selection. Also, Fig.13 illustrates the distribution of the used methods across these strategies.
[Figure 13 data: Feature Extraction: PCA 5, Optimal Matching 2, Hidden Markov Model 2, SVD 1, K-means 1, Decision Tree 1, Ward's Clustering 1, k-Nearest Neighbour 1; Feature Selection: Random Forest 2, RELIEF-F 1, CfsSubsetEval 1, Support Vector Machine 1, Univariate Linear Regression 1, Chi-Square Distribution Function 1]
Figure 13: Frequency of AI methods utilized in the Feature Engineering step of AI-enhanced MMLA
studies, categorized by Feature Extraction and Feature Selection strategies.
Feature Extraction. Feature extraction methods focus on discovering informative task-
specific features by extracting hidden features or patterns from data that enhance model
performance or interpretability (Guyon and Elisseeff, 2006). Principal Component Analysis (PCA) is a well-known unsupervised feature extractor used in five included papers (e.g., Sharma et al., 2019; Yun et al., 2020); notably, PCA in these papers significantly reduced the chance of model overfitting. Singular Value Decomposition (SVD) is another unsupervised method, employed by Moon et al. (2022) to extract features from data. Closser et al. (2022) applied k-means clustering to identify behavior profiles that emerged across participants and to examine verbal strategies. Furthermore, in the paper by Ouyang et al. (2023), Optimal Matching (OM) and Ward's Clustering (WC) methods were combined to find collaborative pattern types among collaborative problem-solving (CPS) activities. The transitional characteristics between hidden states (sequence features) of the collaborative patterns were then
Moreover, Yusuf et al. (2023) employed HMM to estimate the transition matrix from the
observed behavioral features. The OM was then used to identify representative sequences of
the hidden states.
On the other hand, we found two included papers that adapted supervised machine learning
methods in this step, consequently making the extracted features more specific for their appli-
cations. Cukurova et al. (2020b) utilized Decision Tree methods (DT) to initialize indicators
for making, watching, speaking, and listening from the data. Then, these indicators were used
as new features and fed into another model to predict competence in the modeling phase. Lee
et al. (2022) adopted a kNN classifier to identify Attention regulation behaviors, which then
were used as input for some classification models for Attentional state recognition.
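To illustrate the unsupervised extraction step described above, here is a minimal sketch of PCA with scikit-learn; the random matrix stands in for a fused multimodal feature matrix (rows = learners, columns = features) and the 95% variance threshold is an illustrative choice.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 120))    # stand-in: 40 learners, 120 fused features

# Keep as many principal components as needed to explain 95% of the variance,
# shrinking the feature space and, as noted above, the risk of overfitting.
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)
print(X.shape, "->", X_reduced.shape)
```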
Feature Selection. Feature selection methods directly identify and select the most relevant features from the data for modeling. By eliminating irrelevant, redundant, or noisy features, these techniques help to improve model performance and reduce overfitting. Additionally, they enhance interpretability by preserving the physical meanings of the original features (Li et al., 2017). Random Forest (RF), a popular machine learning method, was used for feature selection in two papers, by Sharma et al. (2019) and Yun et al. (2020), owing to its ability to provide feature importance scores. As another method, Ma et al. (2022) employed a Support Vector Machine (SVM) to specify the predictive unimodal features for impasse detection; SVM is known for its robust classification performance, particularly on small datasets. We further found that the CfsSubsetEval method, a correlation-based feature selection technique implemented in the WEKA tool, was used by Chango et al. (2021). Peng et al. (2021) adopted the RELIEF-F method (Urbanowicz et al., 2018) to select a set of relevant features from the data. Emerson et al. (2020b) utilized univariate linear regression tests to identify features with p-values less than or equal to 0.15 in the training set. In the final case (Chettaoui et al., 2023), the chi-square distribution function was used to select the K best features. It is worth noting that the methods identified in this step involved ground truth labels during feature selection, indicating a supervised approach.
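Two of the supervised strategies named above, Random Forest importance ranking and chi-square K-best selection, can be sketched with scikit-learn as follows; the synthetic data and the choice of k=5 are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.preprocessing import MinMaxScaler

X, y = make_classification(n_samples=100, n_features=20, random_state=0)

# 1) Rank features by Random Forest importance scores.
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
top = rf.feature_importances_.argsort()[::-1][:5]
print("RF top-5 feature indices:", top)

# 2) Chi-square selection requires non-negative inputs, hence the scaling.
X_pos = MinMaxScaler().fit_transform(X)
X_best = SelectKBest(chi2, k=5).fit_transform(X_pos, y)
print("chi2-selected shape:", X_best.shape)
```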
AI-based approaches in the Annotation phase
Across the 43 analyzed papers, we identified only one that employed AI-based methods for annotation. Akre et al. (2023) involved the MobileNet architecture in their labeling process. This neural network model was fine-tuned on the DAiSEE dataset and used to generate pseudo-labels for the multimodal data.
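The pseudo-labeling pattern behind this approach can be sketched as follows in PyTorch; the toy model, input size, and the 0.9 confidence threshold are all assumptions standing in for a network fine-tuned on a labeled dataset such as DAiSEE.

```python
import torch

# Stand-in for a fine-tuned engagement classifier (4 classes).
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 4))
model.eval()
unlabeled_batch = torch.rand(16, 3, 32, 32)   # hypothetical unlabeled frames

with torch.no_grad():
    probs = torch.softmax(model(unlabeled_batch), dim=1)
conf, pseudo_labels = probs.max(dim=1)

keep = conf > 0.9          # keep only high-confidence predictions as labels
print(pseudo_labels[keep], conf[keep])
```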
AI-based approaches in the Modelling phase
Among the included papers, the Model Learning step of this phase has received the most attention for
utilizing AI techniques. We categorized employed methods in this step in terms of their highlighted
characteristics in their papers, illustrated in 14, as follows:
Traditional Classification Models. These methods were employed to classify data into predefined classes or categories using traditional techniques. SVM emerged as the most frequently used traditional classification model, contributing to eight studies (e.g., Reilly and Schneider, 2019; Sharma et al., 2019). DT and kNN models each appeared in five studies, exemplified by Chettaoui et al. (2023) and Chan et al. (2023), respectively, while the Naive Bayes Classifier (NBC) was used in four papers (e.g., Yusuf et al., 2023; Reilly and Schneider, 2019). In two other studies, e.g., Peng et al. (2021), the Multi-Layer Perceptron (MLP) appeared as a classification model. Rule Induction and Bayesian Network models were each employed in one study (Chango et al., 2021; Lee et al., 2022).
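A minimal sketch of the most common setup above, an SVM classifier evaluated with cross-validation, follows; the synthetic features stand in for fused multimodal data, and the kernel and fold count are illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=120, n_features=15, random_state=0)

# Scaling matters for SVMs; bundle it with the classifier in one pipeline.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
scores = cross_val_score(clf, X, y, cv=5)     # 5-fold cross-validation
print(scores.mean())
```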
[Figure 14 data: Traditional Classification: Support Vector Machine 8, Decision Tree 5, K-Nearest Neighbor 5, MLP 4, Naive Bayes Classifier 4, Rule Induction 1, Bayesian Network 1; Traditional Regression: MLP 2, Linear Regression 2, Logistic Regression 2, Support Vector Regression 2, Lasso Regression 1, Fuzzy Logic Approach 1, Gaussian Process Regression 1; Traditional Clustering: K-means 2; Ensemble-Based: Random Forest 12, Gradient Boosted Regression 3, XGBoost 2, Adaptive Boosting 2, CatBoost 1; Deep Learning-Based: CNN 4, LSTM 4, DNN 3, Deep MLP 2]
Figure 14: Frequency of AI models utilized in the Model Learning step of AI-enhanced MMLA studies, categorized by Traditional Classification, Traditional Regression, Traditional Clustering, Ensemble Learning, and Deep Learning techniques.
Traditional Regression Models. The models in this category were designed to address tasks with continuous target variables by predicting their values. Support Vector Regression (SVR),
Logistic Regression, Linear Regression, and MLP were each utilized in two papers, e.g., Emerson et al. (2020b) and Closser et al. (2022), making them, together, the most frequently employed regression methods. Likewise, Lasso Regression, Gaussian Process Regression, and a Fuzzy Logic Approach each appeared in one study (e.g., Emerson et al., 2020b; Sharma et al., 2019). These results indicate a diverse use of regression techniques to fit various multimodal data and modeling requirements.
Traditional Clustering Models. Models in this category employ unsupervised learning
techniques to organize similar samples into groups based on specific metrics. Huang et al.
(2019) and Chejara et al. (2023a) utilized the K-Means model to identify collaborative states
from different perspectives. K-Means is a metric-based clustering algorithm well-known for its
effectiveness in handling data with convex-shaped clusters (linear data).
Ensemble Models. These models train multiple machine-learning models and then combine their results to make a decision, typically in supervised machine-learning tasks. This combination improves the performance and robustness of models and significantly decreases overfitting (Sagi and Rokach, 2018). RF, a well-known model in this category, was the most frequently used in the included papers, appearing in 13 studies, exemplified by Kawamura et al. (2021) and Som et al. (2020). Gradient Boosted Regression (GBR) was utilized in three studies, such as Vatral et al. (2023), while XGBoost and Adaptive Boosting models were each used in two studies, exemplified by Chejara et al. (2020). Additionally, CatBoost Regression was employed in one study (Kawamura et al., 2021). These results illustrate the wide use of RF in the Model Learning step of AI-enhanced MMLA studies, owing to its precision, resilience to non-normally distributed data, and ability to handle both continuous and categorical variables, which make it particularly effective for small sample sizes and high-dimensional feature spaces (Huang et al., 2023).
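To make the Random Forest setup concrete, here is a minimal sketch mirroring the small-sample, high-dimensional regime noted above; the synthetic data, split, and tree count are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Stand-in: 80 learners, 30 fused features (small n, relatively large p).
X, y = make_classification(n_samples=80, n_features=30, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Many decision trees on bootstrap samples, combined by majority vote.
rf = RandomForestClassifier(n_estimators=300, random_state=0)
rf.fit(X_tr, y_tr)
print("held-out accuracy:", rf.score(X_te, y_te))
```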
Deep Learning-based Models. These models are particularly well-known for their ability to automatically extract features and patterns from raw data and their proficiency in capturing temporal and nonlinear dependencies within high-dimensional multivariate data (Li et al., 2022). Convolutional Neural Networks (CNN) were the most frequently used deep learning-based model in the Model Learning step, appearing in four of the 43 analyzed studies. Nguyen et al. (2023) classified sequences of regulatory and physiological activities using a CNN to predict collaborative learning success. Another study (Cebral-Loureda and Torres-Huitzil, 2021) employed CNNs to classify emotion and identify attention-degree levels from images and gaze behavior data. Furthermore, Ivleva et al. (2023) utilized a CNN to predict emotional states from speech and facial features. Finally, in the last paper within this class of models (Peng et al., 2019), a CNN was involved in classifying learning status. Additionally, the Long Short-Term Memory network (LSTM) is another deep learning-based model, employed in four included studies. In the paper by Järvelä et al. (2023), it was utilized for sequence prediction from regulatory activities in a collaborative learning environment. Moreover, Peng et al. (2019) used an LSTM to classify comment data, and it was also used to predict performance and Normalized Learning Gain (NLG) by Olsen et al. (2020). Di Mitri et al. (2022) classified chest compressions using this model. In addition to the models mentioned, two papers explored the application of a Deep MLP architecture in this step. The first, authored by Som et al. (2020), utilized a 5-layer MLP model to evaluate group collaboration quality based on the individual roles and behaviors exhibited by group members. Meanwhile, another study (Ma et al., 2022) tackled impasse detection during the collaborative problem-solving process by employing a Deep MLP classifier consisting of two feed-forward layers and two dropout layers to mitigate overfitting. Furthermore, in the papers by Chan et al. (2023) and Lee et al. (2023), the authors employed sequential deep neural network (DNN) architectures to achieve strong performance in their classification tasks. Finally, DL-SARF, a deep learning-based student attention recognition framework, was developed to assess the offline classroom, with a particular focus on students' engagement, in the paper by Akila et al. (2023).
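For illustration, the following is a minimal PyTorch sketch of an LSTM sequence classifier of the kind described above (e.g., classifying sequences of regulatory activities); the dimensions, class count, and data are assumptions, not any study's architecture.

```python
import torch
import torch.nn as nn

class SeqClassifier(nn.Module):
    def __init__(self, n_features=8, hidden=32, n_classes=3):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                 # x: (batch, time, features)
        _, (h, _) = self.lstm(x)          # h: (1, batch, hidden), last hidden state
        return self.head(h[-1])           # class logits per sequence

model = SeqClassifier()
batch = torch.rand(4, 20, 8)              # 4 sequences, 20 time steps, 8 features
print(model(batch).shape)                 # torch.Size([4, 3])
```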
AI-based approaches in the Analysis phase
Focusing on this phase, two studies involved machine learning techniques in their Insight Generation step to interpret the results of their respective models by identifying and leveraging key features. Sabuncuoglu and Sezgin (2023) utilized InterpretML with LIME to explain the behavior of their classification model and generate interpretable rules. Meanwhile, Chejara et al. (2023a) employed an RF model to identify the essential features that clarify the clustering results concerning collaborative quality scores.
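As a minimal sketch of LIME-style insight generation, the following explains a single prediction of a fitted classifier; the synthetic data, feature names, and the lime package usage shown here are illustrative, not the pipeline of either study above.

```python
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=100, n_features=6, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X, y)

# LIME fits a simple local surrogate model around one instance to show
# which features drove that particular prediction.
explainer = LimeTabularExplainer(
    X, feature_names=[f"f{i}" for i in range(6)], mode="classification")
exp = explainer.explain_instance(X[0], clf.predict_proba, num_features=4)
print(exp.as_list())   # top features for this single prediction
```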
4.4 RQ4: What experimental designs and settings are employed in AI-
enhanced MMLA studies?
4.4.1 What were the types of settings employed in the studies?
We grouped the types of settings in the studies into two categories: lab and classroom. A significant proportion of the studies, totaling 34 papers, were conducted in lab situations (e.g., Li et al., 2023; Reilly and Schneider, 2019). These studies involve controlled environments, simulations, and structured tasks, and focus on particular groups of participants that do not correspond to real contexts, indicating a potential limitation in their ecological validity. For example, Zhou et al. (2024) described working with 34 postgraduate students at a UK
tertiary institution. The students were assigned to groups of four or five, considering diverse first
languages, interdisciplinary backgrounds, and mixed-gender composition. Over a 10-week course,
each group collaborated for approximately 60 minutes each week to design technological solutions
Figure 15: Sample size distribution across the AI-enhanced MMLA studies. The blue dots represent
individual included papers.
for educational challenges. Another study randomly assigned participants to three groups: a control group with no learning analytics system, experimental group 1 (EG1) using unimodal data, and experimental group 2 (EG2) utilizing multimodal data (Lin et al., 2023). These studies exemplify controlled situations; where participants were not randomly assigned, the designs are quasi-experimental rather than true experimental.
The second portion of the studies, comprising 7 papers, used real settings resembling the classroom (e.g., Järvelä et al., 2023; Yusuf et al., 2023); these studies were embedded in regular learning practices and spanned multiple sessions or weeks, indicating that their settings align with real classroom environments. For example, Nguyen et al. (2023) highlighted
that data collection was carried out in real science classrooms with weekly lessons involving 94
secondary school students, aged 13 (36 males and 58 females). Similarly, Akre et al. (2023) describe
their dataset, which reflects ”in the wild” conditions (i.e., real context), consisting of video recordings
of people in a virtual learning environment tagged with labels indicating attentiveness, frustration,
confusion, and boredom. It also includes log data of their actions on the learning management
system and the results of quizzes they attempted.
The remaining 2 papers did not specify their type of setting, whether a controlled situation or a real context. Overall, our analysis indicates that the majority of MMLA research tends to focus on controlled lab settings rather than authentic, real-world contexts.
4.4.2 What was the range of sample size used in the studies?
We examined the range of sample sizes employed across 43 studies to determine the extent to
which the results of a study are applicable to other samples. As indicated by Fig.15, the sample
sizes of different studies varied significantly, ranging from 4 participants (Peng et al., 2021) to 304
participants (Li et al., 2023). The median sample size was 40 participants (Chango et al., 2021)
and the standard deviation of sample sizes across studies was 54.61. The most frequently reported
sample sizes ranged from 4 to 30 (e.g., Chng et al., 2020; Chejara et al., 2020), suggesting a common tendency among researchers to recruit relatively small samples. For
example, Peng et al. (2019) indicated that fifteen university students, including 7 men and 8 women
aged 21 to 27, were recruited for their research focused on aggregating multimodal data, including
facial expressions and timeline-anchored comments, to design a tool for instructors that shows how
students’ status changes along the lecture video timeline. Additionally, Vatral et al. (2023) involved
14 student nurses who used Tobii 3 eye-tracking glasses during their training in a simulated hospital
environment. In this simulation, the students conducted evaluations and administered treatments
to a manikin patient.
In contrast, a few studies used larger sample sizes (e.g., Chettaoui et al., 2023; Emerson et al., 2023), reflecting potentially higher generalizability to different contexts. For instance, Lin et al. (2023) worked with a total of 60 voluntary participants (32 females and 28 males), all of whom
were senior high school students, to examine the impact of different learning analytics diagnostic sys-
tems on students’ learning behaviors, learning performance, and motivation in STEM collaborative
learning activities. Another study involved 70 students from two higher education institutions who
were exposed to emotional stimulants to investigate how wearable technology, through monitoring
physiological data, could potentially support learners’ self-regulation (Yun et al., 2020).
[Figure 16 data: Informed Consent 20; Ethics Approval 15; Unknown 14; Privacy Protection 11; Algorithmic Bias 5]
Figure 16: Reported Ethical Considerations in AI-enhanced MMLA studies, including key issues
and frequencies.
4.4.3 What kind of ethical implications were considered in the studies?
As shown in Fig. 16, various ethical implications, including informed consent, ethics approval,
privacy protection, and algorithmic bias, were examined in different studies. Informed consent was
the most frequently reported ethical consideration, specified in 20 studies. For example, Sharma
et al. (2019) reported that, prior to starting the research, students were asked to sign consent forms
informing them about the study’s procedures and granting permission to the researchers to use the
collected data for research purposes. Similarly, Yun et al. (2020) noted that they explained the study's purpose and procedures to the participants and received both verbal and written consent from them. Fifteen studies (e.g., Chango et al., 2021; Kawamura et al., 2021) pointed out ethics approval, indicating that these studies secured approval from an Institutional Review Board (IRB) or ethics committee to adhere to ethical regulations and standards.
A total of 11 studies mentioned privacy protection, indicating attempts to protect participants’
personal data and enhance confidentiality. For instance, Sabuncuoglu and Sezgin (2023) addressed
privacy concerns by avoiding the collection of demographic data and ensuring anonymous use. Only
teachers have access to student-specific data, and engagement data for policymakers is aggregated
to protect personal identifiers. To minimize surveillance concerns, they point out that the platform
does not store data and restricts data access to third parties and students. Similarly, Chettaoui et al. (2023) highlighted that they safeguarded participant identities when storing IDs, test scores, and eye-tracking data.
Five studies addressed algorithmic bias, i.e., mitigating biases in data analysis algorithms. For example, Lee et al. (2023) balanced their data using SMOTE to prevent an imbalance between distracted and attentive states, ensuring neither state dominated the distribution. This provided sufficient
data points for training and reduced biases from different data ranges. Likewise, Cukurova et al.
(2020b) mentioned that to reduce biases, they selected participants with similar levels of knowledge
to ensure that differences in knowledge and skills did not impact their collaborative problem-solving
(CPS) competence. In contrast, a significant proportion of studies (14) did not draw attention to
any ethical considerations, raising concerns about the transparency, fairness, accountability, and
ethics of the studies.
4.5 RQ5: Why is AI adopted in MMLA studies, and what are some of the underlying challenges?
In RQ5, we discuss the primary motivations for using AI in MMLA studies, as evidenced by the
reported benefits, and explore the underlying challenges.
4.5.1 What are the reported benefits of AI-enhanced MMLA studies?
A total of 11 benefits were identified across the 43 analyzed AI-enhanced studies, as shown in Table 3, of which the top five are discussed below. Increasing insight into the student learning process was the most commonly reported, appearing in 36 studies. These studies emphasized how MMLA enables educators to better understand how students engage with learning (e.g., Akila et al., 2023; Emerson et al., 2020a). For instance, a study by Israel et al. (2021) explores how combining data from facial expressions and gameplay logs can help understand students' emotional and problem-solving states during computational thinking (CT) game-based learning. Similarly, Akre et al. (2023) introduced "EngageDat-vL," a novel learning analytics dataset that combines emotional, cognitive, and behavioral data, providing deep insight into student engagement in e-learning settings.
Table 3: Reported benefits across AI-enhanced MMLA studies
#  Benefit                                                 Number of Studies
1  Increasing insight into the student learning process    36
2  Personalized learning                                   18
3  Providing real-time feedback                            16
4  Supporting the learning experience                      14
5  Enhancing the accuracy of student behaviour prediction  10
6  Positive effect on learning outcomes                     6
7  Detecting students' performance                          6
8  Feasibility and practicality                             5
9  Automated estimation of collaboration                    4
10 Development of teaching strategies                       3
11 Reduction in the burden for teachers                     2
Furthermore, after gaining deeper insight into the learning process, personalized learning was fre-
quently reported in 18 studies. For instance, Lin et al. (2023) demonstrated that using multimodal
data analytics supports adaptive learning by providing personalized guidance through the identi-
fication and addressing of student learning challenges in STEM collaborative learning activities.
Additionally, Yun et al. (2020) studied data from physiological sensors, specifically Electrodermal
Activity (EDA) and Electrocardiogram (ECG), to create a context-aware personal learning support
system aimed at supporting self-regulation in higher education.
Additionally, providing real-time feedback, as mentioned in 16 studies, plays a key role in fostering
21st-century skills (e.g., collaboration - Huang et al., 2019) and helps learners maintain longer
attention spans and reduce distractions (Lee et al., 2023). This is evidenced by a real-time feedback
system for cardiopulmonary resuscitation (CPR) training (Di Mitri et al., 2022), which demonstrated
that real-time feedback has a short-term positive impact on CPR performance. Furthermore, Peng
et al. (2019) indicate that collecting multimodal data, such as facial expressions and comment text
from students watching teaching videos online, allows instructors to offer timely feedback on student
engagement and comprehension, which enhances online education.
Fourteen studies focused on supporting the learning experience, highlighting that MMLA en-
hances learning by promptly identifying students’ difficulties and offering customized interventions
(e.g., Ma et al., 2022; Emerson et al., 2023). For example, Ma et al. (2022) demonstrated that
using multimodal learning analytics supports learning experiences among middle school learners by
detecting impasses during collaborative problem-solving. Another example, by Vatral et al. (2023), shows that predictive models for self-confidence, based on multimodal data from eye gaze and speech patterns, could be part of a larger assessment framework. This framework would provide instructors with additional tools to support and improve student learning and patient outcomes, thereby enhancing the learning experience in simulation-based nurse training programs.
Ten studies underlined the importance of increasing the accuracy of student behavior prediction by leveraging multiple data sources, such as eye-gaze features and academic data, aiding instructors in identifying indicators of students' learning gains more accurately (e.g., Emerson et al., 2020a; Olsen et al., 2020). For example, Chettaoui et al. (2023) indicated that combining eye-gaze tracking with other learning and behavioral data can provide accurate predictions of learning outcomes. Another example, as indicated by Huang et al. (2019), is that combining two different modalities, such as eye-tracking and physiological activity data, could lead to better predictions of the quality of collaboration.
4.5.2 What are the reported challenges of AI-enhanced MMLA studies?
Table 4 outlines the challenges raised in the included studies. One of the most prominent is the issue of small sample sizes, discussed extensively across 13 studies (e.g., Ouyang et al., 2023; Chng et al., 2020). Additionally, limited generalizability, although ranked third in the table with 10 studies, is closely related to sample size, so we address both challenges together. For instance, Ma et al. (2022) reported that because their data was obtained from a limited group of 46 middle school students, the predictive features identified would not generalize to different groups of students in other educational contexts (e.g., online learning environments). Furthermore, Moon et al. (2022) pointed out that the dataset they worked with was too limited to create a widely applicable predictive model; they therefore emphasized the importance of expanding the dataset to validate and improve the prediction model.
Table 4: Reported challenges across AI-enhanced MMLA studies
#  Challenge                                                Number of Studies
1  Sample size                                              13
2  Model selection and over-fitting challenges              11
3  Limited generalizability                                 10
4  Limited practical implementation                          8
5  Data quality and emotion/behaviour detection challenges   6
6  Technical issues                                          4
7  Unreliable self-reported metrics                          4
8  Feature selection/extraction and redundancy issues        4
9  Complexity in data fusion                                 3
10 Data accessibility, privacy, and context challenges       3
11 Annotation, representation, and data depth challenges     3
12 Challenges in modelling temporal processes                1
13 AI dataset balancing issues                               1
14 Costly measurement hardware                               1
Model selection and overfitting challenges, ranked second, were reported in 11 studies. Choosing
the right method for analysis is particularly challenging. In this case, Closser et al. (2022) highlighted
that to determine if actions, verbal communication, or gestures are related to student performance,
simple regression is suitable for analyzing such data. However, understanding the complexity of
student speech and language requires advanced techniques. Additionally, Reilly and Schneider (2019)
reported overfitting in their study’s models, which they attributed to the lack of regularization of
model complexity and the small sample size. They also suggested that it might be more appropriate
to use a regression model instead of classifying the scores, considering the range of -2 to 2 used for
evaluating the quality of collaboration.
Eight studies reported the issue of limited practical implementation as one of the challenges of
MMLA. Lin et al. (2023) mentioned that the limitation of practical implementation stems from the
short duration of instructional learning activities impeding learning and assimilation of knowledge.
Additionally, Zhou et al. (2024) acknowledged that while modern computer vision techniques are
capable of automatically identifying students’ behaviors, there are challenges in applying these AI-
enabled technologies in real-world contexts, resulting in the need for manual work in some parts of
the process.
Data quality and emotion/behavior detection challenges were identified in six studies. For example, data quality decreases due to occlusion, poor illumination, and the failure of facial detection when learners are not directly facing the camera (Ma et al., 2022). Identifying emotions was another challenge: Cebral-Loureda and Torres-Huitzil (2021) noted the difficulty of detecting negative emotions during task performance, and capturing emotions is also very challenging due to variations occurring in video datasets (Akre et al., 2023).
5 Summary and Future Research Directions
In this section, we analyze the key findings from the results, highlighting the patterns and insights observed across the included papers. We then explore the challenges and opportunities these findings present for the field. This analysis provides a deep reflection on the overall impact of AI in MMLA studies and directions for future research.
5.1 Evolution, distribution, and collaboration in AI-based MMLA Stud-
ies.
Findings from RQ1 provide novel insights into the state of AI-based MMLA research up to 2024, re-
vealing patterns previously unexplored in the field. Our analysis shows that these studies have expe-
rienced varied growth, unique geographic distribution, and evolving collaborative patterns. Initially,
the publication of AI-based MMLA studies increased, followed by a decline during the COVID-19
pandemic, and then a significant recovery in 2023. This recovery underscores the resilience and
adaptability of the research community, as well as the growing recognition of the importance of
integrating AI in MMLA. From a pedagogical perspective, this trend indicates a shifting paradigm in how researchers and educators perceive the role of AI in MMLA, and it demonstrates that computers equipped with AI are increasingly capable of processing diverse educational data streams, leading to more comprehensive and adaptive learning environments. It points to an evolving theory
of technology integration in education (Boyraz and Ocak, 2021), where AI is a crucial component
in enhancing the depth of learning analytics. This evolution is pushing the boundaries of how
technology-enhanced learning environments are conceptualized and implemented, offering new
insights into personalized and adaptive educational experiences.
Geographically, the research is spread across 24 countries on six continents, with Europe leading
in contributions, particularly from Germany, Spain, the UK, and Estonia, revealing the high potential
of Europe in advancing the field. This finding significantly extends our understanding compared to
previous reviews like Noroozi et al. (2020), which showed concentrated MMLA efforts in a few
countries, such as the US and Australia. Moreover, our results demonstrate a substantial expansion
into diverse nations, including Egypt, Tunisia, and Nigeria. This marks a significant shift from
Tahiru (2021), who reported no AI-in-education publications from developing countries between
2010 and 2019, highlighting the rapid advancements now emerging in these countries. The presence
of developing countries in this field marks a shift towards a more inclusive and globally representative
model of AI integration in educational technologies. Moreover, from the perspective of Vygotsky’s
Sociocultural Theory (Gauvain, 2020), this geographical diversity suggests that AI-based MMLA
studies can be adapted to diverse cultural contexts, potentially resulting in more globally applicable
and culturally sensitive educational technologies. The theory emphasizes the culturally organized
and socially mediated nature of human cognitive processes, aligning with our findings of diverse
global contributions to AI-enhanced MMLA research.
The analysis of collaboration patterns among countries reveals significant insights into the global
landscape of AI-based MMLA research. Our findings show that the USA continues to lead in in-
ternational and national publications, consistent with earlier studies by Deng and Zhao (2022) and
Pei et al. (2023). Spain has emerged as another key player, indicating a shift in the global research
dynamics. This leadership pattern suggests an evolution in how advanced technological infrastruc-
ture influences the development and adoption of AI in educational settings (Deng and Zhao, 2022).
However, our study also uncovers that some countries, such as India, Turkey, Mexico, Taiwan, and
Australia, have contributed to the field only at the national and local levels. This finding suggests
a focused but limited engagement with AI in MMLA within these countries, highlighting their
potential to accelerate the progress of the AI-enhanced MMLA field across boundaries.
From a pedagogical perspective, these collaboration patterns have important implications for the
global development of AI-enhanced learning tools. The leadership of technologically advanced coun-
tries may accelerate the creation of sophisticated MMLA systems, while the localized focus of other
nations could lead to more culturally adapted applications. This diversity in approach highlights the
need for a more inclusive model of AI integration in educational technology that can bridge global
innovations with local educational needs and contexts.
A call for increased collaboration. In terms of future research, the findings from RQ1 under-
score a need for increased collaboration across geographical and institutional boundaries to address
the imbalances in geographic representation. The analysis revealed that while research efforts are
concentrated in regions like Europe and North America, significant contributions are emerging from
diverse locations such as China, Japan, and various countries in Africa and South America. However,
the distribution of studies shows that many regions remain underrepresented, indicating a potential
for broader collaborative efforts. Expanding collaboration between countries with advanced infras-
tructure and those with expertise in both AI and LA fields is essential to accelerate and foster
progress in AI-enhanced MMLA studies. Advanced countries (like the USA and Spain), with their
key roles and robust resources, must strengthen collaboration with other countries to fully leverage
global expertise and capabilities. Each country has a set of unique strengths, and integrating these
with the potential of countries yet to contribute can lead to significant breakthroughs in the field.
This collaboration could result in more sophisticated, culturally sensitive MMLA systems that
combine global technological advancements with local educational expertise, potentially revolution-
izing personalized learning on a global scale. In the context of Vygotsky’s Sociocultural Theory, this
call for increased collaboration can be seen as an effort to create a global Zone of Proximal Develop-
ment (ZPD) in AI-based MMLA research (Gauvain, 2020). By fostering partnerships between more
experienced researchers and emerging contributors, we can scaffold the development of expertise
in this field across diverse cultural contexts, leading to more effective and culturally appropriate
AI-based learning tools.
5.2 Application contexts, educational levels, stakeholders and modalities
in AI-Enhanced MMLA Studies.
Findings from RQ2 emphasize a predominant focus on tertiary education, specifically undergraduate
students, consistent with the results reported by Noroozi et al. (2020) and Prinsloo et al. (2023),
with a substantial number of studies targeting learners, researchers, instructors, and technology
developers.
This result not only underscores the critical importance of enhancing learning processes and outcomes
in universities and colleges but also highlights the advanced infrastructure now available for collecting
multimodal data at the higher education level (Samuelsen et al., 2019). Additionally, this finding
reflects a growing body of research recognizing the transformative potential of AI-enhanced MMLA
in improving educational practices, especially in post-secondary institutions.
Despite the emphasis on higher education, there is a notable inclusion of secondary education,
addressing the needs of learners, researchers, and instructors, reflecting a growing interest in un-
derstanding and improving learning outcomes across these educational levels. The lack of studies
targeting technology developers at the secondary level suggests a gap in research and innovation,
reflecting a missed opportunity to engage technology developers in creating or adapting tools specif-
ically designed for the unique needs of secondary education. This gap highlights the need for a more
comprehensive application of Design-Based Research approaches (Anderson and Shattuck, 2012) in
AI-based MMLA, ensuring that technological developments are rooted in educational theory and
practice. Primary and early childhood education was less frequently addressed, with a few studies
focusing on improving engagement and learning outcomes for younger students, underscoring the
difficulty of identifying and capturing various dimensions of learning at these levels, consistent
with the findings of Crescenzi-Lanna (2020) and Noroozi et al. (2020). This gap
presents an opportunity to extend theories of early childhood development, such as Piaget’s Cog-
nitive Development Theory (McLelland, 2024; Cerovac and Keane, 2024), by leveraging AI-based
MMLA to provide more data-driven insights into young children’s learning processes. Furthermore,
five studies did not specify the educational levels in which their experiments were conducted, raising
concerns about the generalizability of their findings across different educational contexts. By not
accounting for the distinct pedagogical challenges, sensitivities, and requirements of various educa-
tional levels, these studies may risk sacrificing reliability and failing to capture the full potential of
AI in enhancing learning outcomes. This limitation underscores the importance of contextualizing
AI-enhanced MMLA within educational theories, such as Situated Learning Theory (Lave, 1991),
to ensure that findings are theory-based and practically applicable.
In terms of the modalities utilized, recent technological advancements, the growth of online learn-
ing environments, and the adoption of gameplay-based learning processes have significantly enhanced
the opportunity for collecting physical and digital data through tools used by learners (Samuelsen
et al., 2019; Mu et al., 2020). Our results confirm this trend, indicating that the most frequently
employed modalities are physical activities (such as facial expressions, speech, eye movements, and
motion) recorded by embedded tools like cameras, eye trackers, and microphones, consistent with
the finding by Mu et al. (2020), while digital traces (including system logs and textual data from
comments, posts, and online chats) captured by digital learning platforms are the second most fre-
quently utilized modalities. This focus aligns with the findings of Giannakos and Cukurova (2023),
who identified three dominant theories in MMLA research, including Embodied Cognition (EC),
Cognitive Load Theory (CLT), and Control-Value Theory of Achievement Emotions (CVTAE). Ad-
ditionally, our analysis reveals a growing use of physiological signals and psychometric data, such
as EDA, Heart rate, and questionaries, which provides insights into learners’ emotional, stress re-
sponses, and cognitive load. In contrast, modalities from environmental space, such as lighting
conditions, temperature, and physical space configuration, were less frequently utilized (Chan et al.,
2023; Kawamura et al., 2021). Despite its potential to offer valuable insights into how the learning
environment impacts learner engagement and performance, environmental data remains an under-
explored area in the current studies. The results also indicate that most of the included papers
integrated modalities from only two information sources, revealing a prevalent focus on combining
a limited range of data types. However, only eight out of 43 studies employed modalities from more
than two information spaces, underscoring the challenges of handling large and complex data. This
trend suggests that while dual-modality approaches are common, there may be opportunities to
enhance the richness of analyses by incorporating additional sources of information.
Opportunity for better utilisation of data across modalities in information spaces. According
to Connectivism learning theory, which holds that learning occurs through connections within
networks of information (Alam, 2023), the integration of multiple modalities in MMLA provides a
rich, multimodal understanding of learner interactions and behaviors. This networked approach
enables deep insights into the learning process (Ochoa et al., 2017). Our analysis reveals that a
significant number of studies have integrated modalities from physical and digital spaces, leveraging
recent advancements in online learning environments and gameplay-based learning. This focus, while valuable,
overlooks the rich possibilities offered by other nodes in the learning network, such as environmen-
tal, physiological, and psychometric data sources. Despite the clear potential of these underutilized
data nodes (sources) to provide deeper insights into learners’ emotional states, stress responses, and
cognitive load, as the critical elements in the Connectivist view of learning as a process of forming
meaningful connections, they remain underexplored in the current research landscape. For instance,
environmental data, such as lighting conditions, temperature, and physical space configuration, could
offer valuable insights into how the learning environment impacts learner engagement and perfor-
mance. Hence, there is a significant opportunity to explore and integrate these underutilized data
sources, expanding our understanding of the complex, interconnected nature of learning networks.
This opportunity extends the work of Giannakos and Cukurova (2023), who noted MMLA’s poten-
tial as a new information source for understanding and supporting learning processes. While they
focused on embodied cognition, cognitive load, and emotions, our review suggests that including
environmental factors could further enhance MMLA’s ability to capture the full spectrum of the
learning experience.
Nevertheless, increasing the volume and heterogeneity of the modalities requires more complex
and time-intensive storage and analysis, which, without the appropriate tools and insights, makes
traditional analytical methods unusable (Sharma and Giannakos, 2020). Recently, the advancements
in AI, particularly in areas such as machine vision, machine learning, and explainable AI, have made
it feasible to manage and analyze such complex data effectively (Slupczynski and Klamma, 2021;
Blikstein and Worsley, 2016). This need for advanced analytical methods reflects the observations
of Giannakos and Cukurova (2023), who highlighted MMLA’s ability to index cognitive load in an
unobtrusive and temporal manner. Our analysis extends this observation, suggesting that AI-driven
techniques can seamlessly fetch, condense, and integrate diverse modalities, uncovering intricate
patterns and relationships that might otherwise remain hidden. By leveraging such technologies,
researchers can push the boundaries of MMLA, moving beyond dual-modality approaches to explore
richer, more diverse information spaces. This exploration can uncover deeper insights into the com-
plex dynamics of learning, potentially leading to knowledge breakthroughs in our understanding of
cognitive processes, emotional changes in learning, and the impact of environmental factors on edu-
cational outcomes. For example, AI-driven analysis of combined physiological, environmental, and
digital interaction data could reveal new patterns in how mental effort changes under various condi-
tions, leading to more effective strategies for managing information presentation in digital learning
environments. This approach not only advances our theoretical understanding of the AI application
in MMLA but also has significant practical implications for the design and implementation of adap-
tive learning technologies. It showcases the potential of AI to revolutionize MMLA by enabling the
integration and analysis of diverse data sources, thereby enhancing our ability to gain deep insights
into learning processes.
A call for broader inclusion and multi-level studies. Our results suggest that there is a
significant gap in research that covers different educational levels. While most AI-enhanced MMLA
research focuses on higher education, there is insufficient attention to primary, secondary, and early
childhood education. Moreover, the tendency for studies to focus on single educational levels limits
their broader applicability. This finding highlights a critical need for multilevel studies that employ
adaptive strategies to integrate analyses across multiple educational contexts. From a theoretical
perspective, such studies should leverage adaptive AI methodologies to ensure that research is reliable
and context-specific. By tailoring AI approaches to the unique characteristics of each educational
level, researchers can achieve more accurate findings, potentially extending current learning theories
to account for technological interventions across different ages. At the same time, they should aim for
generalizability by identifying cross-level patterns and trends that can inform broader educational
practices and policies, potentially leading to new theoretical frameworks that span all levels
of education. Pedagogically, implementing adaptive AI within a multilevel framework allows us
to generate deeper and more wide-ranging insights than previously possible. Our call for such
comprehensive research aligns with the need for methodological diversity highlighted in Martin
et al. (2020), emphasizing the importance of gaining deeper insights into technology use in
education. This approach offers valuable contributions to understanding learning processes from
early childhood to higher education, potentially transforming how we conceptualize and support
learning across different ages.
Compared to the current literature (e.g. Moon et al., 2022; Emerson et al., 2023), this integrated
approach will not only fill the existing gaps but also enhance the effectiveness of AI in supporting
diverse educational needs. It represents a significant advancement in the use of technology for
educational purposes, offering new ways to personalize learning experiences and inform educational
policies based on comprehensive, data-driven insights.
5.3 Use of AI in MMLA studies.
To address RQ3, we developed a conceptual framework for systematically analyzing AI integra-
tion across the MMLA process. This comprehensive framework, spanning from data collection to
intervention, represents a theoretical advancement in understanding the role of AI in the MMLA
field. It enables a structured mapping of AI techniques onto specific MMLA phases, providing
valuable theoretical insights and practical pedagogical implications. Our analysis revealed that AI
techniques were utilized in all phases, except for data storage, but were limited to specific sub-
components of the MMLA process. This uneven integration of AI across the analytical pipeline
highlights both the potential and current limitations of AI in MMLA. Notably, AI techniques were
predominantly applied in the modeling phase of MMLA studies, playing a critical role in extracting
valuable information from multimodal data and significantly aiding decision-making processes. In
this phase, the most frequently employed techniques are traditional and ensemble classification meth-
ods, particularly RF, SVM, and Decision Trees, within the model learning sub-component. However,
despite the potential of deep learning models to uncover complex patterns in large-scale data, they
have been employed in only a few studies. Similarly, unsupervised techniques, such as K-means
clustering, have seen limited application, with only two papers utilizing these methods to extract
valuable information from multimodal data. These results reveal that most studies employed
traditional supervised techniques within their model learning sub-components, which inherently
require extensive annotation. Despite the critical role of accurate labeling,
our analysis reveals that only one paper attempted to incorporate automated labeling methods,
while the majority relied on manual processes for data annotation. This reliance on manual labeling not only
limits scalability but also highlights a significant area for innovation in future research. Furthermore,
only two papers attempted to generate interpretations of the model learning results, underscoring a
lack of focus on explainability and transparency in current MMLA studies. For educators, this gap
means fewer opportunities to make data-driven decisions, while learners miss personalized insights
that could enhance their educational experience. These limitations underscore a serious gap between
the theoretical potential of MMLA, described by Giannakos and Cukurova (2023), and its practical
application. While MMLA is highlighted as a tool for uncovering hidden learning processes and
providing deeper insights through advanced AI techniques, the actual implementation falls short
of this promise. The over-reliance on manual data labeling and the lack of model interpretation
represent missed opportunities to leverage the full capabilities of AI.
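To ground this observation, the sketch below (synthetic data; it assumes scikit-learn and is not
drawn from any included study) illustrates the dominant pattern our review identified: a Random
Forest classifier trained on manually labeled, fused multimodal features:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import classification_report

    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 20))             # e.g., fused gaze, audio, and log features
    y = (X[:, 0] + X[:, 5] > 0).astype(int)    # hypothetical manually annotated labels

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=1)
    clf = RandomForestClassifier(n_estimators=200, random_state=1).fit(X_tr, y_tr)
    print(classification_report(y_te, clf.predict(X_te)))

The pipeline’s dependence on the annotated label vector y is exactly where the manual-labeling
bottleneck described above enters.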
On the other hand, the complexity of managing heterogeneous modalities (i.e., images, text,
audio, and video types) collected by different tools and technologies placed the pre-processing step
as the second most critical phase for AI application in MMLA. This phase of included studies
involves sub-components such as data anonymization, cleaning, transformation, and augmentation,
with AI techniques like image processing, speech recognition, and resampling (e.g., SMOTE) to
transform data into a protected, accurate, and consistent format, as well as to address data imbalances
and improve data quality. The transformation sub-component, at the center of this phase, is crucial
for extracting valuable and consistent features from diverse modalities. Given its complex nature
and the challenge of uncovering latent features, it requires a substantial shift from human to AI
capabilities. This necessity is supported by the fact that more than half of the included studies have
leveraged AI techniques for tasks like facial expression analysis, speech recognition, and spatial
orientation detection within this sub-component, highlighting the pivotal role of AI in achieving accurate and comprehensive
data transformation.
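As a concrete illustration of the resampling step mentioned above, the following sketch (assuming
the imbalanced-learn package and synthetic data) applies SMOTE to balance an under-represented
class before model learning:

    import numpy as np
    from collections import Counter
    from imblearn.over_sampling import SMOTE

    rng = np.random.default_rng(2)
    X = rng.normal(size=(120, 8))          # hypothetical pre-processed multimodal features
    y = np.array([0] * 100 + [1] * 20)     # imbalanced labels (e.g., rare confusion events)

    # SMOTE synthesizes new minority-class examples by interpolating
    # between existing minority samples and their nearest neighbors.
    X_res, y_res = SMOTE(random_state=2).fit_resample(X, y)
    print(Counter(y), "->", Counter(y_res))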
AI techniques were also extensively utilized across other phases of the MMLA process, includ-
ing Fusion and Acquisition. The fusion phase, which involves integrating features from multiple
modalities to create a unified representation through feature engineering, standardization, and fea-
ture integration sub-components, is essential for enhancing the comprehensiveness and richness of
the dataset. Notably, among these sub-components, AI techniques were only applied in the feature
engineering sub-component of 15 studies. PCA, RF, and HMM were the most frequently used meth-
ods, demonstrating their effectiveness in dimension reduction and feature extraction. Similarly, the
data collection phase, encompassing source identification and acquisition, is critical for ensuring the
quality and relevance of the data used in MMLA. In this phase, AI applications were leveraged to
automate data collection through advanced tools and sensors capable of handling diverse modalities,
from text to multimedia. Despite its importance, only a few studies have focused on utilizing AI
technologies, with their application limited to specific tasks such as face recognition, facial expression
analysis, speech recognition, gaze behavior tracking, and spatial orientation identification. These
targeted AI-driven approaches aim to enhance the data acquisition process and reduce redundancy
by focusing on the collection of specific relevant features.
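To illustrate the feature-engineering role of PCA noted above, the following sketch (synthetic data,
scikit-learn) standardizes features from two modalities, concatenates them, and projects the fused
matrix onto a compact joint representation:

    import numpy as np
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(3)
    gaze = rng.normal(size=(150, 30))    # hypothetical eye-tracking features
    audio = rng.normal(size=(150, 40))   # hypothetical speech features

    # Standardize each modality so no single data source dominates, then fuse.
    fused = np.hstack([StandardScaler().fit_transform(m) for m in (gaze, audio)])
    reduced = PCA(n_components=10).fit_transform(fused)   # compact fused representation
    print(fused.shape, "->", reduced.shape)               # (150, 70) -> (150, 10)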
Opportunity for expanding AI applications across MMLA phases. Our review highlights
a significant opportunity to enhance the MMLA process by expanding AI applications across un-
derutilized sub-components, including source identification, data warehousing, data management,
synchronization, metadata generation, feature integration, EDA, statistical analysis, and visualiza-
tion. Leveraging AI technologies in these areas could lead to more efficient and robust data handling,
improved accuracy in data integration, and deeper insights through advanced analytical techniques
(Noroozi et al., 2019). For instance, from the connectivist learning theory perspective, AI-enhanced
source identification and data management facilitate more accurate and efficient aggregation of
relevant data. These approaches help map the complex networks of information that learners
navigate in modern educational environments, while minimizing manual effort and reducing the
risk of missing critical information. An AI-based EDA approach, viewed through a constructivist lens, could signifi-
cantly improve the understanding of complex multimodal data by automatically uncovering latent
patterns, correlations, and anomalies that traditional methods may overlook. This aligns with the
constructivist emphasis on building knowledge through active exploration and interpretation of in-
formation, potentially leading to more accurate insights into learning processes. Also, integrating
AI into visualization techniques could produce more dynamic and interactive data representations,
aligning with cognitive load theory. These advanced visualizations could reduce extraneous cogni-
tive load, facilitating better understanding and decision-making for learners and educators. They
represent a theoretical advancement in the conceptualization, presentation, and interpretation of
learning analytics data. By addressing these gaps, researchers can significantly improve the effec-
tiveness and impact of MMLA studies, ultimately leading to more informed and actionable insights
for educational stakeholders.
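As one concrete possibility for such AI-assisted EDA, the sketch below (synthetic session-level
features, scikit-learn) uses unsupervised anomaly detection to surface atypical learner sessions that
manual inspection might overlook:

    import numpy as np
    from sklearn.ensemble import IsolationForest

    rng = np.random.default_rng(4)
    sessions = rng.normal(size=(300, 6))   # e.g., heart rate, gaze, and log-count indicators
    sessions[:5] += 6                      # inject a few atypical sessions for illustration

    # IsolationForest flags observations that are easy to isolate, i.e., anomalies.
    flags = IsolationForest(contamination=0.02, random_state=4).fit_predict(sessions)
    print("flagged sessions:", np.where(flags == -1)[0])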
From a pedagogical perspective, these AI-based advancements offer transformative potential for
teaching and learning practices. For instance, AI-enhanced source identification and data man-
agement could enable educators to more efficiently tailor learning experiences to individual learner
needs, supporting personalized learning approaches. AI-enhanced EDA could help teachers identify
learning patterns and efforts, facilitating real-time interventions. Moreover, AI-based visualizations
could make complex multimodal learning data more accessible to both educators and students,
fostering decision-making and self-regulated learning. These advancements not only push the
boundaries of current MMLA research but also offer promising avenues for revolutionizing classroom practices
and assessment methods. By bridging the gap between advanced analytics and practical pedagogy,
AI-enhanced MMLA has the potential to significantly improve learning outcomes and educational
experiences across diverse learning environments.
On the other hand, by employing AI technologies across most phases of the MMLA process,
researchers can significantly enhance their ability to develop more automated and adaptive learning
environments. The integration of cutting-edge AI technologies offers a transformative opportunity
to streamline and automate each phase of the MMLA process, from data acquisition to analysis.
For instance, in the data acquisition phase, generative AI can create synthetic data to augment
existing datasets, enhancing their diversity and robustness (Eigenschink et al., 2023). During the
pre-processing and fusion phases, generative AI can assist in data cleaning and transformation, pro-
ducing high-quality, standardized data ready for analysis. Generative AI is particularly well-suited
for metadata generation, improving the richness and context of data (Asthana et al., 2023). Fur-
thermore, in the modeling phase, generative AI techniques can aid in developing more sophisticated
models by simulating complex learning scenarios and generating predictive insights. For analysis,
generative AI can also assist in visualizing data patterns and generating comprehensive reports that
translate complex analytics into actionable insights (Narayanan, 2024).
A call for advanced AI techniques in MMLA studies. Recent advances in AI present a
transformative opportunity for researchers in the MMLA field to enhance their methodologies and
outcomes. While AI has been integrated into some phases of the MMLA process, the lack of cutting-
edge AI techniques, such as deep learning, federated learning, generative models, transformers,
self-supervised learning, few-shot learning, active learning, meta-learning, and reinforcement
learning, highlights a notable gap and a substantial opportunity for innovation. These advanced
techniques promise to address current limitations, refine analytical capabilities, and drive significant
progress in the field, ultimately leading to more insightful and impactful educational technologies.
From the theoretical perspective, they offer new ways to conceptualize and model learning pro-
cesses in complex, multimodal learning environments. For example, integrating Explainable AI in
the modeling phase can enhance the interpretability of complex models, fostering transparency and
trust among educators and researchers (Tiukhova et al., 2024; Khosravi et al., 2022a). It aligns
with learning theories such as constructivism and cognitive load theory, emphasizing the importance
of understanding underlying processes in knowledge construction and information processing. By
making AI decision-making processes more transparent, Explainable AI supports these theories in
the context of learning analytics, ultimately leading to more meaningful and actionable insights for
educators. This approach represents an advancement over previous ’black box’ AI models used in
education, potentially transforming how educators interact with and apply AI-based analytics in
their teaching practices. The few-shot learning technique offers a valuable opportunity to enhance
data labeling in MMLA by enabling models to learn from only a few annotated examples (Song
et al., 2023; Carpenter et al., 2024). This addresses a critical challenge in educational data min-
ing and potentially changes how we accomplish data collection and annotation in diverse learning
contexts. By enabling decentralized model training, federated learning allows data integration from
multiple sources while maintaining privacy and security (Tan et al., 2022; van Haastrecht et al.,
2024). This technique not only enhances data diversity but also facilitates collaborative learning
among institutions, enabling a community-oriented approach to education. This approach theoret-
ically aligns with and extends social learning theories by establishing a new form of collaborative
knowledge construction at an institutional level. Reinforcement learning offers the potential to de-
velop adaptive and personalized learning environments by continuously optimizing strategies based
on real-time feedback (Deeva et al., 2021; Díaz and Nussbaum, 2024), aligning closely with be-
haviorist learning theories. It enables dynamic, real-time adaptations to individual learner needs,
potentially transforming how we implement adaptive education. Generative models can augment
data sets and generate synthetic examples, enhancing the robustness of training data and enabling
more sophisticated simulations (van Breugel et al., 2024; Mozafari et al., 2024). Additionally, Gen-
erative Pre-trained Transformers (GPT) offer a novel approach to personalized learning in MMLA,
generating context-aware feedback based on multimodal data analysis (Hou et al., 2024; Yan et al.,
2023). It aligns with constructivist learning theory, enabling adaptive learning experiences and
extending our understanding of AI-supported cognitive processes. Therefore, the MMLA research
community is called upon to utilize these advancements to push the boundaries of what is possible
in MMLA and improve learning outcomes on a broader scale.
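To ground the federated learning idea referenced above, the following framework-free sketch
(synthetic data, simple linear models; real deployments would use dedicated federated learning
frameworks) illustrates the core federated-averaging step, in which each institution fits a model
locally and shares only parameters, never raw learner data:

    import numpy as np

    rng = np.random.default_rng(5)
    true_w = np.array([0.5, -1.0, 2.0])

    def local_update(n):
        # Each client fits a linear model on its own private multimodal features.
        X = rng.normal(size=(n, 3))
        y = X @ true_w + rng.normal(scale=0.1, size=n)
        return np.linalg.lstsq(X, y, rcond=None)[0]

    sizes = [50, 80, 120]                                # per-institution sample sizes
    client_weights = [local_update(n) for n in sizes]
    global_w = np.average(client_weights, axis=0, weights=sizes)   # FedAvg-style aggregation
    print("aggregated weights:", np.round(global_w, 2))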
5.4 Experimental settings and ethical considerations in AI-enhanced MMLA
studies
The findings from RQ4 reveal a diverse range of experimental designs and settings employed in
AI-enhanced MMLA studies. Sample sizes vary significantly, reflecting the heterogeneity in study
designs, from small-scale experiments with tens of participants to large-scale studies involving hun-
dreds of learners. The experimental settings also differ widely, including controlled laboratory en-
vironments, real-world classrooms, and online learning platforms. This diversity underscores the
adaptability of AI techniques across different educational contexts, a crucial advancement in the
field. However, our analysis reveals a critical gap: the predominance of small-scale, laboratory-based
studies that fail to capture the complexities of authentic learning environments. This finding ad-
vances the theory of ecological validity in MMLA research, emphasizing the importance of real-world
applications to ensure the generalizability and practical relevance of findings.
Ethical considerations are also a critical component of these studies, with many reporting in-
formed consent and ethics approval. However, there is a need for greater emphasis on privacy protec-
tion and addressing algorithmic bias to ensure ethical integrity. This is consistent with other studies
and reviews (e.g., Prinsloo et al., 2023; Alwahaby and Cukurova, 2024), which highlight that sensor data,
such as eye-tracking and facial recognition, can uncover personal feelings and health information,
which are susceptible to misuse outside of learning environments, thereby undermining privacy
protection. Hence, it is essential to carefully consider the context, the level of intrusive-
ness of the tools, and the type of data being collected in MMLA studies (Mangaroska et al., 2021).
This careful consideration, which helps improve the understanding, interpretation, and validity of
the collected data, was often inadequately addressed in current research studies.
Furthermore, Alwahaby and Cukurova (2024) argue that traditional informed consent forms fail
to adequately explain how sensor data is collected and how algorithms work. This shortcoming raises sig-
nificant ethical concerns in MMLA, such as anxiety, discomfort, and even simulator sickness among
participants (OECD, 2023). To address this, Alwahaby and Cukurova (2024) suggest enhancing
participant awareness through visuals, videos, and pre-sessions to demonstrate how MMLA opti-
mizes learning experiences. On the other hand, due to the lack of large-scale datasets in MMLA, the
machine learning models currently in use may develop algorithmic biases (Yan et al., 2022c). Fur-
thermore, while AI tools are widely used in MMLA, they often fail to identify essential differences
in disabled individuals, such as atypical facial expressions, motion, speech patterns, or cognitive
variations, which increases these biases (Guo et al., 2020). Another area susceptible to bias is the
labeling process, where subjective human judgment can skew results. Although replacing subjec-
tive labels with objective ones where feasible can help, certain concepts, like 21st-century skills and
emotions, inherently require subjective assessment (Baker and Hawn, 2022). Additionally, the man-
ual annotation process is both time-consuming and labor-intensive, often resulting in small, biased
datasets (D’mello and Kory, 2015). Thus, to advance AI-enhanced MMLA, the MMLA community
must address a range of complex challenges, some of which are emphasized in this study.
A call for increased ethical considerations. As AI-enhanced MMLA continues to evolve,
and even though scholars (Alwahaby and Cukurova, 2024; Mangaroska et al., 2021) have examined
the ethical implications of MMLA, it remains imperative to address the ethical considerations that
accompany its implementation. The integration of AI into educational contexts introduces complex
issues related to privacy, data security, and the potential for biased outcomes. Current studies often
overlook these ethical dimensions, focusing primarily on technological advancements and educational
outcomes. To ensure the responsible use of AI in education, there must be a concerted effort to
incorporate ethical guidelines into research and practice. This includes developing robust data
protection measures, ensuring transparency in AI models, and actively working to mitigate biases
that could disadvantage certain groups of students.
By prioritizing ethical considerations, researchers and educators can foster trust in AI technolo-
gies, ensuring their application supports fair and equitable educational opportunities for all learners.
As the field of MMLA advances in both technological capability and ethical integrity, it is crucial
to safeguard students’ rights and well-being. Our work provides a foundation for developing more
informed policies on the use of AI and data analytics in education, ultimately leading to more re-
sponsible and effective implementation of computer-based learning tools in educational institutions.
A call for large-scale in-the-field studies. The results indicate that most of the included
studies were conducted in inauthentic situations (i.e., labs) on a small scale, failing to capture the
complexities of real classrooms (e.g., dynamic interactions between participants). This limitation is
particularly significant when viewed through the lens of Situated Learning Theory, which empha-
sizes that learning is inherently tied to authentic activity, context, and culture (binti Pengiran and
Besar, 2018; Lave, 1991). The prevalence of lab-based studies potentially undermines our ability to
understand how AI-based MMLA can support authentic learning processes. This gap is consistent
with a study review by Prinsloo et al. (2023), underlining that it remains unclear how MMLA can
be applied, scaled, and replicated in actual classroom environments. Also, Yan et al. (2022c) ac-
knowledge that approximately 71 percent of predictive analytics studies in MMLA employed small
sample sizes (i.e., fewer than 50 participants), jeopardizing their ecological validity. This trend not
only limits the generalizability of findings but also constrains our ability to develop AI-based MMLA
systems. While Martinez-Maldonado et al. (2023) emphasized that MMLA can be employed in real-
world situations, they also highlighted a set of practical and logistical challenges, such as technology
deployment and sustainability, that need to be addressed.
Thus, given the lack of MMLA deployment in authentic contexts, future researchers must bridge
the gap between experimental studies and real learning environments. This
could involve collaborative partnerships between researchers, educators, and technology developers
to facilitate large-scale implementations, and the development of scalable, user-friendly MMLA tools
that can be easily integrated into existing educational technologies.
5.5 Benefits and challenges of AI-enhanced MMLA studies
The findings from RQ5 highlight both the benefits and challenges of AI-enhanced MMLA studies.
On the positive side, AI has significantly improved the ability to monitor and analyze complex
learning behaviors, leading to more personalized and adaptive learning experiences (Sharma and
Giannakos, 2020). The integration of AI allows for real-time feedback, automated assessment, and
enhanced decision-making for educators, ultimately contributing to better student engagement and
performance, consistent with the findings of other studies (Ez-Zaouia and Lavoué, 2017;
Thomas, 2018). However, several challenges persist, as highlighted in previous studies (e.g., Al-
wahaby and Cukurova, 2024; Cukurova et al., 2020a), including ethical concerns related to privacy
and the potential for algorithmic bias. Additionally, the implementation of AI in real-world educa-
tional settings often requires large sample sizes and sophisticated technical infrastructure, which can
be resource-intensive and demand expert knowledge. Thus, while educators can benefit remarkably
from the deployment of AI-enhanced MMLA in a contextualized learning environment, it is essential
to increase their awareness of its existing hurdles.
Interestingly, the top five reported benefits of AI-enhanced MMLA appear to be interrelated,
together optimizing the learning environment and improving learning gains among individual
learners, as indicated by several studies (e.g., Emerson et al., 2020a; Reilly and Schneider, 2019).
Collectively, the results show that the key strength of MMLA lies in offering deeper insights,
personalized learning paths, real-time feedback, enhanced learning experiences, and accurate
behavior predictions.
6 Limitations
As the first SLR at the intersection of the rapidly evolving fields of AI and MMLA, this paper
serves as a foundational starting point, but it is not without limitations. Many of these limitations
arise from the methodological decisions that followed the PRISMA protocol to collect and analyze
literature. Designing search terms and a set of criteria, along with the reliance on a limited number of
databases, might introduce certain biases that could influence the comprehensiveness of the results.
To reduce such potential biases, we conducted our literature search across 11 reputable databases,
ensuring a broader and more representative sample of studies. However, by searching in the title
and abstract of studies across these databases for initial selection, we may have overlooked a subset
of MMLA studies that employed AI techniques but did not highlight them as primary contributions,
as well as studies that utilized multimodal data without explicitly using terms like “multi*modal”
or “multi*sourced” in their title or abstract. To mitigate these issues, we considered a broad set
of search terms for each main code word. Furthermore, selecting only papers published in English
should be acknowledged as another limitation, which may have introduced bias in our findings about
the geographic distribution of AI-based MMLA studies. To partially mitigate this bias, our analysis
focused primarily on international collaborations.
Another limitation that should be acknowledged is that in our analysis, we made a methodological
decision to consider all included studies with equal weight, regardless of the depth or centrality of AI
integration in their MMLA approaches. This approach allowed us to capture a broad overview of AI
applications in MMLA, encompassing both studies that focus on developing AI techniques for MMLA
and those that utilize existing AI methods in supporting roles. While this approach provided a
comprehensive view, it may not fully capture the nuances in AI’s role and significance across different
studies. Future research could benefit from a more granular analysis that distinguishes between these
different types of AI integration in MMLA. This could involve developing a classification system for
AI roles in MMLA studies, potentially leading to a deeper understanding of AI’s contributions to
the field. By highlighting this methodological consideration, we hope to encourage more detailed
classifications in future reviews, ultimately contributing to a richer understanding of AI’s evolving
role in MMLA.
References
Akila, D., Garg, H., Pal, S., Jeyalaksshmi, S., 2023. Research on recognition of students attention
in offline classroom-based on deep learning. Education and Information Technologies, 1–29.
Akre, S., Palandurkar, N., Iyengar, A., Chayande, G., Kumar, P., 2023. Engagedat-vl: A mul-
timodal engagement dataset comprising of emotional, cognitive, and behavioral cues in virtual
learning environment, in: International Conference on Pattern Recognition and Machine Intelli-
gence, Springer. pp. 270–278.
Alam, A., 2023. Connectivism learning theory and connectivist approach in teaching and learning:
a review of literature. Bhartiyam International Journal Of Education & Research 12.
Alfredo, R.D., Nie, L., Kennedy, P., Power, T., Hayes, C., Chen, H., McGregor, C., Swiecki, Z.,
Gašević, D., Martinez-Maldonado, R., 2023. “that student should be a lion tamer!” stressviz:
Designing a stress analytics dashboard for teachers, in: LAK23: 13th International Learning
Analytics and Knowledge Conference, Association for Computing Machinery, New York, NY,
USA. p. 57–67.
Alwahaby, H., Cukurova, M., 2024. Navigating the ethical landscape of multimodal learning ana-
lytics: a guiding framework, in: Ethics in Online AI-based Systems. Elsevier, pp. 25–53.
Alwahaby, H., Cukurova, M., Papamitsiou, Z., Giannakos, M., 2022. The evidence of impact and
ethical considerations of multimodal learning analytics: A systematic literature review. The
multimodal learning analytics handbook, 289–325.
Anderson, T., Shattuck, J., 2012. Design-based research: A decade of progress in education research?
Educational researcher 41, 16–25.
Aslan, S., Alyuz, N., Tanriover, C., Mete, S.E., Okur, E., D’Mello, S.K., Arslan Esme, A., 2019.
Investigating the impact of a real-time, multimodal student engagement analytics technology in
authentic classrooms, in: Proceedings of the 2019 chi conference on human factors in computing
systems, pp. 1–12.
Asthana, S., Arif, T., Thompson, K.C., 2023. Field experiences and reflections on using llms to
generate comprehensive lecture metadata.
Ayesha, S., Hanif, M.K., Talib, R., 2020. Overview and comparative study of dimensionality reduc-
tion techniques for high dimensional data. Information Fusion 59, 44–58.
Baker, R.S., Hawn, A., 2022. Algorithmic bias in education. International Journal of Artificial
Intelligence in Education, 1–41.
Bin Qushem, U., 2020. Trends of multimodal learning analytics: A systematic literature review.
Blikstein, P., Worsley, M., 2016. Multimodal learning analytics and education data mining: Using
computational technologies to measure complex learning tasks. Journal of Learning Analytics 3,
220–238.
Boyraz, S., Ocak, G., 2021. Connectivism: A literature review for the new pathway of pandemic
driven education. Online Submission 6, 1122–1129.
van Breugel, B., Seedat, N., Imrie, F., van der Schaar, M., 2024. Can you rely on your model
evaluation? improving model evaluation with synthetic test data. Advances in Neural Information
Processing Systems 36.
Buckingham Shum, S.J., Luckin, R., 2019. Learning analytics and ai: Politics, pedagogy and
practices. British Journal of Educational Technology 50, 2785–2793.
Carpenter, D., Min, W., Lee, S., Ozogul, G., Zheng, X., Lester, J., 2024. Assessing student explana-
tions with large language models using fine-tuning and few-shot learning, in: Proceedings of the
19th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2024), pp.
403–413.
Cebral-Loureda, M., Torres-Huitzil, C., 2021. Neural deep learning models for learning analytics
in a digital humanities laboratory, in: 2021 Machine Learning-Driven Digital Technologies for
Educational Innovation Workshop, IEEE. pp. 1–8.
Cerovac, M., Keane, T., 2024. Early insights into piaget’s cognitive development model through the
lens of the technologies curriculum. International Journal of Technology and Design Education,
1–21.
Chan, R.Y.Y., Wong, C.M.V., Yum, Y.N., 2023. Predicting behaviour change in students with
special education needs using multimodal learning analytics. IEEE Access.
Chango, W., Cerezo, R., Sanchez-Santillan, M., Azevedo, R., Romero, C., 2021. Improving pre-
diction of students’ performance in intelligent tutoring systems using attribute selection and en-
sembles of different multimodal data sources. Journal of Computing in Higher Education 33,
614–634.
Chango, W., Lara, J.A., Cerezo, R., Romero, C., 2022. A review on data fusion in multimodal
learning analytics and educational data mining. Wiley Interdisciplinary Reviews: Data Mining
and Knowledge Discovery 12, e1458.
Chatti, M.A., Dyckhoff, A.L., Schroeder, U., Thüs, H., 2012. A reference model for learning analytics.
International Journal of Technology Enhanced Learning 4, 318–331.
Chejara, P., Prieto, L.P., Rodríguez-Triana, M.J., Ruiz-Calleja, A., Kasepalu, R., Chounta, I.A.,
Schneider, B., 2023a. Exploring indicators for collaboration quality and its dimensions in classroom
settings using multimodal learning analytics, in: European Conference on Technology Enhanced
Learning, Springer. pp. 60–74.
Chejara, P., Prieto, L.P., Rodriguez-Triana, M.J., Ruiz-Calleja, A., Khalil, M., 2023b. Impact of
window size on the generalizability of collaboration quality estimation models developed using
multimodal learning analytics, in: LAK23: 13th International Learning Analytics and Knowledge
Conference, pp. 559–565.
Chejara, P., Prieto, L.P., Ruiz-Calleja, A., Rodríguez-Triana, M.J., Shankar, S.K., Kasepalu, R.,
2020. Quantifying collaboration quality in face-to-face classroom settings using mmla, in: Col-
laboration Technologies and Social Computing: 26th International Conference, CollabTech 2020,
Tartu, Estonia, September 8–11, 2020, Proceedings 26, Springer. pp. 159–166.
Chettaoui, N., Atia, A., Bouhlel, M.S., 2023. Student performance prediction with eye-gaze data in
embodied educational context. Education and Information Technologies 28, 833–855.
Chng, E., Seyam, M.R., Yao, W., Schneider, B., 2020. Using motion sensors to understand col-
laborative interactions in digital fabrication labs, in: Artificial Intelligence in Education: 21st
International Conference, AIED 2020, Ifrane, Morocco, July 6–10, 2020, Proceedings, Part I 21,
Springer. pp. 118–128.
Closser, A.H., Erickson, J.A., Smith, H., Varatharaj, A., Botelho, A.F., 2022. Blending learning
analytics and embodied design to model students’ comprehension of measurement using their
actions, speech, and gestures. International Journal of Child-Computer Interaction 32, 100391.
Crescenzi-Lanna, L., 2020. Multimodal learning analytics research with young children: A systematic
review. British Journal of Educational Technology 51, 1485–1504.
Cukurova, M., Giannakos, M., Martinez-Maldonado, R., 2020a. The promise and challenges of
multimodal learning analytics. British Journal of Educational Technology 51, 1441–1449.
Cukurova, M., Zhou, Q., Spikol, D., Landolfi, L., 2020b. Modelling collaborative problem-solving
competence with transparent learning analytics: is video data enough?, in: Proceedings of the
tenth international conference on learning analytics & knowledge, pp. 270–275.
D’Angelo, C.M., Rajarathinam, R.J., 2024. Speech analysis of teaching assistant interventions
in small group collaborative problem solving with undergraduate engineering students. British
Journal of Educational Technology 55, 1583–1601.
Davenport, T.H., 2018. From analytics to artificial intelligence. Journal of Business Analytics 1,
73–80.
Deeva, G., Bogdanova, D., Serral, E., Snoeck, M., De Weerdt, J., 2021. A review of automated
feedback systems for learners: Classification framework, challenges and opportunities. Computers
& Education 162, 104094.
Deng, J.H., Zhao, Y., 2022. A literature review of data-driven multimodal learning analytics in edu-
cation based on citespace, in: Proceedings of the 2022 5th International Conference on Education
Technology Management, pp. 390–397.
Di Mitri, D., Schneider, J., Drachsler, H., 2022. Keep me in the loop: Real-time feedback with
multimodal data. International Journal of Artificial Intelligence in Education 32, 1093–1118.
Di Mitri, D., Schneider, J., Specht, M., Drachsler, H., 2018. From signals to knowledge: A conceptual
model for multimodal learning analytics. Journal of Computer Assisted Learning 34, 338–349.
Díaz, B., Nussbaum, M., 2024. Artificial intelligence for teaching and learning in schools: The need
for pedagogical intelligence. Computers & Education, 105071.
D’mello, S.K., Kory, J., 2015. A review and meta-analysis of multimodal affect detection systems.
ACM computing surveys (CSUR) 47, 1–36.
Eigenschink, P., Reutterer, T., Vamosi, S., Vamosi, R., Sun, C., Kalcher, K., 2023. Deep generative
models for synthetic data: A survey. IEEE Access 11, 47304–47320.
Emerson, A., Cloude, E.B., Azevedo, R., Lester, J., 2020a. Multimodal learning analytics for game-
based learning. British Journal of Educational Technology 51, 1505–1526.
Emerson, A., Henderson, N., Rowe, J., Min, W., Lee, S., Minogue, J., Lester, J., 2020b. Early predic-
tion of visitor engagement in science museums with multimodal learning analytics, in: Proceedings
of the 2020 international conference on multimodal interaction, pp. 107–116.
Emerson, A., Min, W., Rowe, J., Azevedo, R., Lester, J., 2023. Multimodal predictive student
modeling with multi-task transfer learning, in: LAK23: 13th International Learning Analytics
and Knowledge Conference, pp. 333–344.
Eradze, M., Rodriguez Triana, M.J., Laanpere, M., 2017. How to aggregate lesson observation data
into learning analytics datasets?, in: Joint Proceedings of the 6th Multimodal Learning Analytics
(MMLA) Workshop and the 2nd Cross-LAK Workshop co-located with 7th International Learning
Analytics and Knowledge Conference (LAK 2017), CEUR. pp. 74–81.
Ez-Zaouia, M., Lavoué, E., 2017. Emoda: A tutor oriented multimodal and contextual emotional
dashboard, in: Proceedings of the seventh international learning analytics & knowledge conference,
pp. 429–438.
Foster, E., Siddle, R., 2020. The effectiveness of learning analytics for identifying at-risk
students in higher education. Assessment & Evaluation in Higher Education 45, 842–854.
doi:10.1080/02602938.2019.1682118.
Gao, Y., Zhang, Z., Lin, H., Zhao, X., Du, S., Zou, C., 2020. Hypergraph learning: Methods and
practices. IEEE Transactions on Pattern Analysis and Machine Intelligence 44, 2548–2566.
Gauvain, M., 2020. Vygotsky’s sociocultural theory, in: Benson, J.B. (Ed.), Encyclopedia of Infant
and Early Childhood Development (Second Edition), second ed. Elsevier, Oxford, pp.
446–454.
Giannakos, M., Cukurova, M., 2023. The role of learning theory in multimodal learning analytics.
British Journal of Educational Technology 54, 1246–1267.
Guo, A., Kamar, E., Vaughan, J.W., Wallach, H., Morris, M.R., 2020. Toward fairness in ai for
people with disabilities: sbg@a research roadmap. ACM SIGACCESS Accessibility and Computing, 1–1.
Guyon, I., Elisseeff, A., 2006. An introduction to feature extraction, in: Feature extraction: foun-
dations and applications. Springer, pp. 1–25.
van Haastrecht, M., Brinkhuis, M., Spruit, M., 2024. Federated learning analytics: Investigating
the privacy-performance trade-off in machine learning for educational analytics, in: International
Conference on Artificial Intelligence in Education, Springer. pp. 62–74.
Hou, C., Zhu, G., Zheng, J., Zhang, L., Huang, X., Zhong, T., Li, S., Du, H., Ker, C.L., 2024.
Prompt-based and fine-tuned gpt models for context-dependent and-independent deductive coding
in social annotation, in: Proceedings of the 14th Learning Analytics and Knowledge Conference,
pp. 518–528.
Huang, K., Bryant, T., Schneider, B., 2019. Identifying collaborative learning states using unsu-
pervised machine learning on eye-tracking, physiological and motion sensor data. International
Educational Data Mining Society.
Huang, L., Doleck, T., Chen, B., Huang, X., Tan, C., Lajoie, S.P., Wang, M., 2023. Multimodal
learning analytics for assessing teachers’ self-regulated learning in planning technology-integrated
lessons in a computer-based environment. Education and Information Technologies 28, 15823–
15843.
Israel, M., Liu, T., Moon, J., Ke, F., Dahlstrom-Hakki, I., 2021. Methodological considerations for
understanding students’ problem solving processes and affective trajectories during game-based
learning: A data fusion approach, in: International Conference on Human-Computer Interaction,
Springer. pp. 201–215.
Ivleva, N., Pentel, A., Dunajeva, O., Juštšenko, V., 2023. Deep learning based audio-visual emo-
tion recognition in a smart learning environment, in: International Conference on Interactive
Collaborative Learning, Springer. pp. 420–431.
Järvelä, S., Nguyen, A., Vuorenmaa, E., Malmberg, J., Järvenoja, H., 2023. Predicting regulatory
activities for socially shared regulation to optimize collaborative learning. Computers in Human
Behavior 144, 107737.
Jin, F., Maheshi, B., Martinez-Maldonado, R., Gašević, D., Tsai, Y.S., 2024. Scaffolding feedback
literacy: Designing a feedback analytics tool with students. Journal of Learning Analytics 11,
123–137. doi:10.18608/jla.2024.8339.
Kawamura, R., Shirai, S., Takemura, N., Alizadeh, M., Cukurova, M., Takemura, H., Nagahara, H.,
2021. Detecting drowsy learners at the wheel of e-learning platforms with multimodal learning
analytics. IEEE Access 9, 115165–115174.
Khosravi, H., Shum, S.B., Chen, G., Conati, C., Tsai, Y.S., Kay, J., Knight, S., Martinez-Maldonado,
R., Sadiq, S., Gašević, D., 2022a. Explainable artificial intelligence in education. Computers and
Education: Artificial Intelligence 3, 100074.
Khosravi, H., Shum, S.B., Chen, G., Conati, C., Tsai, Y.S., Kay, J., Knight, S., Martinez-Maldonado,
R., Sadiq, S., Gašević, D., 2022b. Explainable artificial intelligence in education. Computers and
Education: Artificial Intelligence 3, 100074.
Lave, J., 1991. Situated learning: Legitimate peripheral participation. Cambridge university press.
Lee, Y., Chen, H., Zhao, G., Specht, M., 2022. Wedar: Webcam-based attention analysis via
attention regulator behavior recognition with a novel e-reading dataset, in: Proceedings of the
2022 International Conference on Multimodal Interaction, pp. 319–328.
Lee, Y., Migut, G., Specht, M., 2023. Behavior-based feedback loop for attentive e-reading (bflae):
A real-time computer vision approach. Micro-gesture Analysis for Hidden Emotion Understanding
2023, 12.
Li, J., Cheng, K., Wang, S., Morstatter, F., Trevino, R.P., Tang, J., Liu, H., 2017. Feature selection:
A data perspective. ACM computing surveys (CSUR) 50, 1–45.
Li, Q., Peng, H., Li, J., Xia, C., Yang, R., Sun, L., Yu, P.S., He, L., 2022. A survey on text
classification: From traditional to deep learning. ACM Transactions on Intelligent Systems and
Technology (TIST) 13, 1–41.
Li, X., Yan, L., Zhao, L., Martinez-Maldonado, R., Gašević, D., 2023. Cvpe: A computer vi-
sion approach for scalable and privacy-preserving socio-spatial, multimodal learning analytics, in:
LAK23: 13th International Learning Analytics and Knowledge Conference, pp. 175–185.
Lim, L., Bannert, M., van der Graaf, J., Singh, S., Fan, Y., Surendrannair, S., Rakovic, M., Molenaar, I., Moore, J., Gašević, D., 2023. Effects of real-time analytics-based personalized scaffolds on students' self-regulated learning. Computers in Human Behavior 139, 107547.
Lin, C.J., Wang, W.S., Lee, H.Y., Huang, Y.M., Wu, T.T., 2023. Recognitions of image and speech to improve learning diagnosis on STEM collaborative activity for precision education. Education and Information Technologies, 1–26.
Ma, Y., Celepkolu, M., Boyer, K.E., 2022. Detecting impasse during collaborative problem solv-
ing with multimodal learning analytics, in: LAK22: 12th International Learning Analytics and
Knowledge Conference, pp. 45–55.
Mangaroska, K., Giannakos, M., 2018. Learning analytics for learning design: A systematic literature
review of analytics-driven design to enhance learning. IEEE Transactions on Learning Technologies
12, 516–534.
Mangaroska, K., Martinez-Maldonado, R., Vesin, B., Gašević, D., 2021. Challenges and opportunities of multimodal data in human learning: The computer science students' perspective. Journal of Computer Assisted Learning 37, 1030–1047.
Martin, F., Chen, Y., Moore, R.L., Westine, C.D., 2020. Systematic review of adaptive learning
research designs, context, strategies, and technologies from 2009 to 2018. Educational Technology
Research and Development 68, 1903–1929.
Martinez-Maldonado, R., Echeverria, V., Fernandez-Nieto, G., Yan, L., Zhao, L., Alfredo, R., Li,
X., Dix, S., Jaggard, H., Wotherspoon, R., et al., 2023. Lessons learnt from a multimodal learning
analytics deployment in-the-wild. ACM Transactions on Computer-Human Interaction 31, 1–41.
McLelland, J., 2024. Connecting Piaget's cognitive development theory to technology in the early years. He Kupu 8.
Moon, J., Ke, F., Sokolikj, Z., Dahlstrom-Hakki, I., 2022. Multimodal data fusion to track students’
distress during educational gameplay. Journal of Learning Analytics 9, 75–87.
Mozafari, J., Jangra, A., Jatowt, A., 2024. TriviaHG: A dataset for automatic hint generation from factoid questions, in: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 2060–2070.
Mu, S., Cui, M., Huang, X., 2020. Multimodal learning analytics for game-based learning. Sensors
20, 6856.
Nandi, A., Xhafa, F., Subirats, L., Fort, S., 2021. Real-time multimodal emotion classification
system in e-learning context, in: International Conference on Engineering Applications of Neural
Networks, Springer. pp. 423–435.
Narayanan, N., 2024. The era of generative AI: Transforming academic libraries, education, and research.
Nguyen, A., Järvelä, S., Rosé, C., Järvenoja, H., Malmberg, J., 2023. Examining socially shared regulation and shared physiological arousal events with multimodal learning analytics. British Journal of Educational Technology 54, 293–312.
Noroozi, O., Alikhani, I., Järvelä, S., Kirschner, P.A., Juuso, I., Seppänen, T., 2019. Multimodal data to design visual learning analytics for understanding regulation of learning. Computers in Human Behavior 100, 298–304.
Noroozi, O., Pijeira-Díaz, H.J., Sobocinski, M., Dindar, M., Järvelä, S., Kirschner, P.A., 2020. Multimodal data indicators for capturing cognitive, motivational, and emotional learning processes: A systematic literature review. Education and Information Technologies 25, 5499–5547.
Ochoa, X., Lang, A.C., Siemens, G., 2017. Multimodal learning analytics. The handbook of learning
analytics 1, 129–141.
Ochoa, X., Lang, C., Siemens, G., Wise, A., Gašević, D., Merceron, A., 2022. Multimodal learning analytics: Rationale, process, examples, and direction. The Handbook of Learning Analytics, 54–65.
Ochoa, X., Worsley, M., 2016. Augmenting learning analytics with multimodal sensory data. Journal
of Learning Analytics 3, 213–219.
OECD, 2023. Algorithmic bias, equity and data protection. YouTube.
Olsen, J.K., Sharma, K., Rummel, N., Aleven, V., 2020. Temporal analysis of multimodal data to
predict collaborative learning outcomes. British Journal of Educational Technology 51, 1527–1547.
Ouhaichi, H., Spikol, D., Vogel, B., 2023. Research trends in multimodal learning analytics: A
systematic mapping study. Computers and Education: Artificial Intelligence 4, 100136.
Ouhaichi, H., Spikol, D., Vogel, B., 2024. A systematic review of multimodal learning analytics
design models and frameworks, in: 16th International Conference on Education and New Learning
Technologies.
Ouyang, F., Xu, W., Cukurova, M., 2023. An artificial intelligence-driven learning analytics method
to examine the collaborative problem-solving process from the complex adaptive systems perspec-
tive. International Journal of Computer-Supported Collaborative Learning 18, 39–66.
Oviatt, S., 2018. Ten opportunities and challenges for advancing student-centered multimodal learn-
ing analytics, in: Proceedings of the 20th ACM International Conference on Multimodal Interac-
tion, pp. 87–94.
Page, M.J., McKenzie, J.E., Bossuyt, P.M., Boutron, I., Hoffmann, T.C., Mulrow, C.D., Shamseer, L., Tetzlaff, J.M., Akl, E.A., Brennan, S.E., et al., 2021. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ 372.
Payne, A.L., Compton, M., Kennedy, S., 2023. Supporting and humanising behavioural change
without the behaviourism: Digital footprints, learning analytics and nudges, in: Human Data
Interaction, Disadvantage and Skills in the Community: Enabling Cross-Sector Environments for
Postdigital Inclusion. Springer, pp. 111–131.
Pei, B., Xing, W., Wang, M., 2023. Academic development of multimodal learning analytics: A
bibliometric analysis. Interactive Learning Environments 31, 3543–3561.
Peng, Q., Qie, N., Yuan, L., Chen, Y., Gao, Q., 2019. Design of an online education evaluation
system based on multimodal data of learners, in: Cross-Cultural Design. Culture and Society:
11th International Conference, CCD 2019, Held as Part of the 21st HCI International Conference,
HCII 2019, Orlando, FL, USA, July 26–31, 2019, Proceedings, Part II 21, Springer. pp. 458–468.
Peng, S., Ohira, S., Nagao, K., 2021. Recognition of students’ multiple mental states in conversation
based on multimodal cues, in: Computer Supported Education: 12th International Conference,
CSEDU 2020, Virtual Event, May 2–4, 2020, Revised Selected Papers 12, Springer. pp. 468–479.
binti Pengiran, P.H.S.N., Besar, H., 2018. Situated learning theory: the key to effective classroom
teaching? HONAI 1.
Prinsloo, P., Slade, S., Khalil, M., 2023. Multimodal learning analytics—in-between student privacy
and encroachment: A systematic review. British Journal of Educational Technology 54, 1566–
1586.
Rahul, Katarya, R., 2023. Deep auto encoder based on a transient search capsule network for student
performance prediction. Multimedia Tools and Applications 82, 23427–23451.
Reilly, J.M., Schneider, B., 2019. Predicting the quality of collaborative problem solving through linguistic analysis of discourse. International Educational Data Mining Society.
Sabuncuoglu, A., Sezgin, T.M., 2023. Developing a multimodal classroom engagement analysis
dashboard for higher-education. Proceedings of the ACM on Human-Computer Interaction 7,
1–23.
Sagi, O., Rokach, L., 2018. Ensemble learning: A survey. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 8, e1249.
Samuelsen, J., Chen, W., Wasson, B., 2019. Integrating multiple data sources for learning analytics—review of literature. Research and Practice in Technology Enhanced Learning 14, Article 11.
Schneider, B., Worsley, M., Martinez-Maldonado, R., 2021. Gesture and gaze: Multimodal data in dyadic interactions. International Handbook of Computer-Supported Collaborative Learning, 625–641.
Shankar, S.K., Prieto, L.P., Rodríguez-Triana, M.J., Ruiz-Calleja, A., 2018. A review of multimodal learning analytics architectures, in: 2018 IEEE 18th International Conference on Advanced Learning Technologies (ICALT), IEEE. pp. 212–214.
Shankar, S.K., Ruiz-Calleja, A., Prieto, L.P., Rodríguez-Triana, M.J., Chejara, P., Tripathi, S., 2023. CIMLA: A modular and modifiable data preparation, organization, and fusion infrastructure to partially support the development of context-aware MMLA solutions. JUCS: Journal of Universal Computer Science.
Sharma, K., Giannakos, M., 2020. Multimodal data capabilities for learning: What can multimodal
data tell us about learning? British Journal of Educational Technology 51, 1450–1484.
Sharma, K., Papamitsiou, Z., Giannakos, M., 2019. Building pipelines for educational data using AI and multimodal analytics: A "grey-box" approach. British Journal of Educational Technology 50, 3004–3031.
Slupczynski, M., Klamma, R., 2021. MILKI-PSY Cloud: Facilitating multimodal learning analytics by explainable AI and blockchain, in: MILeS@EC-TEL, pp. 22–28.
Som, A., Kim, S., Lopez-Prado, B., Dhamija, S., Alozie, N., Tamrakar, A., 2020. A machine
learning approach to assess student group collaboration using individual level behavioral cues, in:
Computer Vision–ECCV 2020 Workshops: Glasgow, UK, August 23–28, 2020, Proceedings, Part
VI 16, Springer. pp. 79–94.
Song, Y., Wang, T., Cai, P., Mondal, S.K., Sahoo, J.P., 2023. A comprehensive survey of few-shot
learning: Evolution, applications, challenges, and opportunities. ACM Computing Surveys 55,
1–40.
Sümer, Ö., Goldberg, P., D'Mello, S., Gerjets, P., Trautwein, U., Kasneci, E., 2023. Multimodal engagement analysis from facial videos in the classroom. IEEE Transactions on Affective Computing 14, 1012–1027. doi:10.1109/TAFFC.2021.3127692.
Tahiru, F., 2021. AI in education: A systematic literature review. Journal of Cases on Information Technology (JCIT) 23, 1–20.
Tan, A.Z., Yu, H., Cui, L., Yang, Q., 2022. Towards personalized federated learning. IEEE Transactions on Neural Networks and Learning Systems 34, 9587–9603.
Thomas, C., 2018. Multimodal teaching and learning analytics for classroom and online educational
settings, in: Proceedings of the 20th ACM International Conference on Multimodal Interaction,
pp. 542–545.
Tiukhova, E., Vemuri, P., Flores, N.L., Islind, A.S., Óskarsdóttir, M., Poelmans, S., Baesens, B., Snoeck, M., 2024. Explainable learning analytics: Assessing the stability of student success prediction models by means of explainable AI. Decision Support Systems 182, 114229.
Urbanowicz, R.J., Olson, R.S., Schmitt, P., Meeker, M., Moore, J.H., 2018. Benchmarking relief-based feature selection methods for bioinformatics data mining. Journal of Biomedical Informatics 85, 168–188.
Vatral, C., Lee, M., Cohn, C., Davalos, E., Levin, D., Biswas, G., 2023. Prediction of students’
self-confidence using multimodal features in an experiential nurse training environment, in: Inter-
national Conference on Artificial Intelligence in Education, Springer. pp. 266–271.
Viola, P., Jones, M.J., 2004. Robust real-time face detection. International Journal of Computer Vision 57, 137–154.
Wang, Y., Gu, X., 2024. Data fusion in classroom-based multimodal learning analytics: A systematic literature review, in: Proceedings of the 18th International Conference of the Learning Sciences (ICLS 2024), International Society of the Learning Sciences, pp. 951–954.
Worsley, M., Martinez-Maldonado, R., 2018. Multimodal learning analytics' past, present, and potential futures. CrossMMLA@LAK 2.
Worsley, M., Martinez-Maldonado, R., D'Angelo, C., 2021. A new era in multimodal learning analytics: Twelve core commitments to ground and grow MMLA. Journal of Learning Analytics 8, 10–27.
Yan, L., Martinez-Maldonado, R., Gallo Cordoba, B., Deppeler, J., Corrigan, D., Gašević, D., 2022a. Mapping from proximity traces to socio-spatial behaviours and student progression at the school. British Journal of Educational Technology 53, 1645–1664.
Yan, L., Martinez-Maldonado, R., Gašević, D., 2023. Generative artificial intelligence in learning analytics: Contextualising opportunities and challenges through the learning analytics cycle. arXiv preprint arXiv:2312.00087.
Yan, L., Martinez-Maldonado, R., Zhao, L., Deppeler, J., Corrigan, D., Gašević, D., 2022b. How do teachers use open learning spaces? Mapping from teachers' socio-spatial data to spatial pedagogy, Association for Computing Machinery. pp. 87–97.
Yan, L., Zhao, L., Gašević, D., Martinez-Maldonado, R., 2022c. Scalability, sustainability, and ethicality of multimodal learning analytics, in: LAK22: 12th International Learning Analytics and Knowledge Conference, pp. 13–23.
Yun, H., Fortenbacher, A., Helbig, R., Geißler, S., Pinkwart, N., 2020. Emotion recognition from
physiological sensor data to support self-regulated learning, in: Computer Supported Education:
11th International Conference, CSEDU 2019, Heraklion, Crete, Greece, May 2-4, 2019, Revised
Selected Papers 11, Springer. pp. 155–173.
Yusuf, A., Noor, N.M., Bello, S., 2023. Using multimodal learning analytics to model students' learning behavior in animated programming classroom. Education and Information Technologies, 1–44.
Zhao, L., Echeverria, V., Swiecki, Z., Yan, L., Alfredo, R., Li, X., Gašević, D., Martinez-Maldonado, R., 2024. Epistemic network analysis for end-users: Closing the loop in the context of multimodal analytics for collaborative team learning, in: Proceedings of the 14th Learning Analytics and Knowledge Conference, pp. 90–100.
Zhou, Q., Bhattacharya, A., Suraworachet, W., Nagahara, H., Cukurova, M., 2023. Automated
detection of students’ gaze interactions in collaborative learning videos: A novel approach, in:
European Conference on Technology Enhanced Learning, Springer. pp. 504–517.
Zhou, Q., Suraworachet, W., Cukurova, M., 2024. Detecting non-verbal speech and gaze behaviours
with multimodal data and computer vision to interpret effective collaborative learning interactions.
Education and Information Technologies 29, 1071–1098.
A Appendix
Table 5: AI-enhanced steps in each included study: Overview of the specific phases, steps, and
sub-components within MMLA that were augmented by AI techniques across the selected studies.
Columns — phases and their steps: Collection (Acquisition); Pre-processing (Anonymizing, Cleaning, Transformation, Augmentation); Annotation (Labelling); Fusion; Modelling (F-Engineering, M-Learning); Analysis (I-Generation).
1 (Reilly and Schneider, 2019)
2 (Sabuncuoglu and Sezgin, 2023)
3 (Huang et al., 2019)
4 (Sharma et al., 2019)
5 (Chejara et al., 2020)
6 (Yun et al., 2020)
7 (Olsen et al., 2020)
8 (Emerson et al., 2020b)
9 (Emerson et al., 2020a)
10 (Cukurova et al., 2020b)
11 (Som et al., 2020)
12 (Chng et al., 2020)
13 (Chango et al., 2021)
14 (Kawamura et al., 2021)
15 (Israel et al., 2021)
16 (Cebral-Loureda and Torres-Huitzil, 2021)
17 (Peng et al., 2021)
18 (Nandi et al., 2021)
19 (Closser et al., 2022)
20 (Lee et al., 2022)
21 (Di Mitri et al., 2022)
22 (Chettaoui et al., 2023)
23 (Moon et al., 2022)
24 (Ma et al., 2022)
25 (Chejara et al., 2023b)
26 (Chan et al., 2023)
27 (Akre et al., 2023)
28 (Akila et al., 2023)
29 (Yusuf et al., 2023)
30 (Vatral et al., 2023)
31 (Sabuncuoglu and Sezgin, 2023)
32 (Ouyang et al., 2023)
33 (Nguyen et al., 2023)
34 (Lin et al., 2023)
35 (Lee et al., 2023)
36 (Järvelä et al., 2023)
37 (Ivleva et al., 2023)
38 (Huang et al., 2023)
39 (Emerson et al., 2023)
40 (Li et al., 2023)
41 (Chejara et al., 2023a)
42 (Zhou et al., 2024)
43 (Zhao et al., 2024)