Approximating eye gaze with head pose in a virtual reality microteaching
scenario for pre-service teachers.
Ivan Moser1, Martin Hlosta1, Per Bergamin1, Umesh Ramnarain2, Christo Van der
Westhuizen2, Mafor Penn2, Noluthando Mdlalose2, Koketso Pila2, and Ayodele Ogegbo2
1Swiss Distance University of Applied Sciences, Brig, Switzerland
2University of Johannesburg, Johannesburg, South Africa
Author Note
Ivan Moser https://orcid.org/0000-0003-2139-2421
Martin Hlosta https://orcid.org/0000-0002-7053-7052
Per Bergamin https://orcid.org/0000-0002-2551-9058
Umesh Ramnarain https://orcid.org/0000-0003-4548-5913
Christo Van der Westhuizen https://orcid.org/0000-0002-4762-8538
Mafor Penn https://orcid.org/0000-0001-6217-328X
Noluthando Mdlalose https://orcid.org/0000-0002-5094-1074
Koketso Pila https://orcid.org/0000-0002-8539-0348
Ayodele Ogegbo https://orcid.org/0000-0002-4680-6689
Correspondence concerning this article should be addressed to Ivan Moser, Swiss Distance University of Applied Sciences, Institute for Research in Open, Distance and eLearning (IFeL), Schinerstrasse 18, 3900 Brig, Switzerland. E-mail: ivan.moser@ffhs.ch
Abstract
Although immersive virtual reality (IVR) technology is becoming increasingly accessible,
head-mounted displays with eye tracking capability are more costly and therefore rarely
used in educational settings outside of research. This is unfortunate, since combining IVR
with eye tracking can reveal crucial information about the learners’ behavior and cognitive
processes. To overcome this issue, we investigated whether the positional tracking of
learners during a short teaching exercise in IVR (i.e., microteaching) may predict the
actual fixation on a given set of classroom objects. We analyzed the positional data of
pre-service teachers from 23 microlessons by means of a random forest and compared it to
two baseline models. The algorithm was able to predict the correct eye fixation with an
F1-score of .8637, an improvement of .5770 over inferring eye fixations based on the
forward direction of the IVR headset (head gaze). The head gaze itself was a .1754
improvement compared to predicting the most frequent class (i.e., Floor). Our results
indicate that the positional tracking data can successfully approximate eye gaze in an IVR
teaching scenario, making it a promising candidate for investigating the pre-service
teachers’ ability to direct students’ and their own attentional focus during a lesson.
Keywords: virtual reality, eye gaze, eye tracking, positional tracking, teacher
education, microteaching, multimodal learning analytics
Approximating eye gaze with head pose in a virtual reality microteaching
scenario for pre-service teachers.
Introduction
Immersive virtual reality (IVR) enables the delivery of educational content in
situations where traditional in-person instruction would be dangerous, impossible,
counterproductive, or simply too expensive (Bailenson, 2018). Not surprisingly, there has been a steady increase in research interest in the promise and pitfalls of VR in education (Mayer et al., 2022).
Besides the situational benefits, another important strength of IVR is hardware-related. Modern consumer IVR headsets are equipped with an array of built-in sensors. Originally designed to enable and enhance the experience of immersive games, these sensors can also be exploited to gather real-time user data that can be related to the learning process and outcome. For example, positional data can provide insights into learning outcomes (Moore et al., 2020), cognitive load (Moser et al., 2022),
and social interactions (Yaremych & Persky, 2019).
One particular sensor that was previously hardly accessible but is now finding its way into consumer devices is eye tracking. Put simply, video-based eye trackers emit
infrared/near-infrared light and utilize the resulting corneal reflections and their spatial
relation to the center of the pupil to estimate eye gaze vectors (Skaramagkas et al., 2023).
In combination with IVR, eye tracking offers unprecedented opportunities to study human
behavior and cognition (Clay et al., 2019). IVR allows creating highly realistic and
controlled environments, and modern game engines make it relatively easy to record gaze
directions and areas of interest (AOI) compared to mobile eye tracking systems that track
gaze in the real world. It has also been demonstrated that eye trackers integrated into IVR
headsets achieve sufficient levels of accuracy to reliably identify the current fixation location, provided
that the gaze targets of interest are not in close proximity (Schuetz & Fiehler, 2022).
Consequently, the value of eye tracking in IVR has been demonstrated across a wide
range of tasks. More specifically, eye tracking was shown to enhance user interactions in
IVR, for example object selection (Wang & Kopper, 2021) or typing on a virtual keyboard
(Zhao et al., 2023). In the context of education and training, it is important to note that
eye tracking can be used to infer cognitive load (Bozkir et al., 2019), joint attention of
learners (Jing et al., 2022), and the distribution of teachers’ visual attention in the
classroom (Keskin et al., 2024). This opens up many possibilities, ranging from
personalized IVR learning experiences to enhanced performance feedback for learners and
teachers.
However, despite the promising research findings, eye tracking is still
underrepresented in practical settings outside a scientific context. It is conceivable that the
higher cost of IVR headsets with integrated eye trackers makes these devices less accessible
for educational use cases. This is even more relevant in the case of collaborative learning,
where a classroom would need to be equipped with a higher number of head-mounted
displays.
Therefore, this study set out to investigate whether the position and orientation (i.e.
pose) of an IVR headset offers a viable approximation of eye gaze. The research question
was driven by the idea that, provided head pose (hereafter referred to as head gaze) and
eye gaze align sufficiently well, the former could be used to substitute for the latter, thereby
offering a low-cost alternative to IVR headsets with integrated eye trackers.
Related Work
Despite the high practical relevance, little research to date has studied whether head gaze can sufficiently approximate eye gaze in IVR. However, a recent study argued that head gaze can indeed serve as a proxy for eye gaze in the context of
human-robot interaction (Higgins et al., 2022) when the aim is to teach a (virtual) robot
about a person’s intent, i.e. what object a person is intending to interact with. Similarly,
head gaze has proven useful in a scenario involving the collaboration with a virtual agent
(Andrist et al., 2017). In this study, the use of bidirectional head gaze between human
participants and a virtual character was shown to have a similar positive effect on task
performance as bidirectional gaze using eye tracking.
In the same vein, a few studies from the field of social psychology have utilized head
gaze as a proxy for social eye contact. For example, one study investigated how
participants interacted with a virtual physician during a simulated clinical visit (Persky
et al., 2016). The authors reported that the emotional state of the participants influenced
the amount of eye contact they made during the conversation with the physician. Another
IVR study tracked nonverbal behavior of participants in a virtual classroom and found
different patterns of head movement depending on the level of self-reported social anxiety.
Participants with higher levels of anxiety exhibited more lateral head movement, indicating
increased room scanning behavior compared to participants with low levels of anxiety
(Won et al., 2016).
Both studies made the implicit assumption that users are mostly looking straight
ahead when wearing an IVR headset, thus exhibiting little eye-in-head motion range.
Although approximating eye tracking with head tracking has been shown to be useful
(Andrist et al., 2017; Higgins et al., 2022), it is noteworthy that users’ eye movements in
IVR can show quite substantial deviations from the forward direction of the head pose.
Sidenmark and Gellersen investigated the coordination of eye, head, and body movements
during gaze shifts (Sidenmark & Gellersen, 2019). They found that smaller gaze shifts of
25° visual angle or less are predominantly performed with the eyes and without much contribution from the head or torso. However, they also reported large inter-individual differences between users in terms of the eyes' motion range, varying from 20° to 70° visual
angle. In line with these findings, another study recently found a high correspondence
between eye and head movements in IVR, leading to an accuracy of 75% for AOI with an
angular size of 25°, with a substantial drop in accuracy when the AOI were smaller
(Llanes-Jurado et al., 2021).
Taken together, the existing literature shows initial evidence that head gaze can be
successfully utilized to approximate eye gaze in IVR, provided that careful attention is
directed towards the design of the virtual objects (i.e., AOI). However, we are not aware of
studies that investigated the practicability of these findings in applied settings of learning
or training. Therefore, the aim of the study was twofold. First, we aimed to evaluate the
similarity between eye and head gaze in a dynamic virtual teaching scenario. Based on the
previous findings, we hypothesized that we would observe a high correspondence of head
pose and eye gaze in a sparsely furnished IVR training environment (i.e., a scene with
predominantly large AOI). Second, we investigated whether we could use a machine
learning algorithm to successfully predict the correct eye gaze targets based on the head
gaze plus additional positional tracking data recorded from the IVR headset and
corresponding hand controllers.
Methods
Participants and context
Forty-five pre-service teachers (PSTs) at a large metropolitan university in South
Africa participated in the study. The sample consisted of third-year undergraduates from
the Department of Science and Technology Education.
As an integral part of their third-year curriculum, PSTs practice their
teaching skills by conducting several microlessons throughout their studies. Microlessons
are defined as short lesson presentations, typically revolving around a single, tightly defined
topic (Banga, 2014). The goal of microlessons is to develop the PSTs’ pedagogical skills in
a safe environment and to teach them how to reflect on their own behavior.
In the context of this study, PSTs chose one of sixteen topics from the subjects of
biology, physics and chemistry. Their task was to prepare the lesson and deliver it using a
learner-centered, inquiry-based teaching strategy inside the IVR environment. In an
inquiry-based science classroom, the teacher is seen as a facilitator, who provides ample
opportunities for learners to actively engage in the learning process (Duran & Duran, 2004;
Turan, 2021). The study was approved by the local ethics committee of the University of
Johannesburg.
IVR Learning Environment
The IVR application was co-designed with five teacher educators to ensure the
alignment with inquiry-based teaching. Hence, the IVR classroom was set up in the Unity
Game Engine with two types of tables: 1) a main table to accommodate students during the
introductory and closing phases of the microlessons, and 2) three separate breakout tables,
where groups of two to three students could collaborate on a given task. The classroom
was also equipped with a whiteboard for slide presentations and drawings, and a flipchart
for displaying quiz results. Furthermore, there was a teacher’s podium that hosted a
control panel to manipulate various classroom functions (e.g., controlling the slides,
starting a quiz, etc.). Importantly, it could also be used to select, spawn, and move 3D
objects as well as students between tables. For illustration, a screenshot of the IVR
environment is depicted in Figure 1.
Procedure
Before the participants delivered their microteaching lesson, they received a brief training on the IVR classroom, including a short hands-on experience to familiarize them with the available tools and objects. Then, the PSTs carried
out the teaching exercise while changing roles after each microlesson. For example, in a
group of four PSTs A, B, C, and D, PST A would first take on the role of teacher, while
the other three PSTs would assume the role of students. After a maximum of 15 minutes
allocated for the microlesson, they would change roles and repeat the procedure until each
PST had completed their lesson. Later, the PSTs received individual feedback on their
teaching behavior from their educator based on a recording of the lesson and a learning
analytics dashboard.
Figure 1
Screenshot from the VR microlesson application showing two students in a lesson with the human heart 3D model.
Data collection and preparation
From a total of 51 microlessons held, we selected 23 based on the following criteria: a functional version of the application that recorded the eye rotation data (11 sessions excluded), a minimum duration of 5 minutes (17 sessions excluded), and no reuse of the same login for a teacher and a student within the same microlesson (2 lessons excluded). This filtering excluded 6 of the 24 participants who acted as teachers from the dataset. Eye
gaze data was only collected for the teacher role because these participants wore a Meta Quest Pro headset with an integrated eye tracker, as opposed to the Meta Quest 2 headsets used for the student role. PSTs in the teacher role received feedback on their eye gaze behavior via the learning analytics dashboard after the lesson. Eye tracking was not available for the student roles due to the limited availability and higher cost of eye-tracking-enabled IVR headsets.
Raw data was collected automatically from each device during the microlessons. The collected sensor streams included pose data (position and rotation) from the IVR headset and both hand controllers, the rotation of both eyes and the head, and the objects the user was gazing at. These are summarized in Table 1. The sensor data were collected independently for each user and sensor, with different sampling rates: 10 Hz for the positional data and 20 Hz for the eye tracking data. Hence, the sensor data needed to be synchronized first. This was done by creating 50 ms time windows and taking a) the minimum value inside each time window for the positional and rotational data and b) the union of all objects present in the eye and head gaze data. Each resulting row represents one 50 ms window from a microlesson. Concatenating the data from all microlessons generated N=439,749 rows.
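To make the windowing step concrete, the following is a minimal sketch in R (the language later used for the modeling); the data frame and column names (pose_raw, gaze_raw, session_id, t_ms, etc.) are illustrative rather than the actual log schema.

```r
# Illustrative sketch of the 50 ms synchronization step (hypothetical column names).
library(dplyr)

window_ms <- 50

# Positional/rotational stream (~10 Hz): minimum value per 50 ms window.
pose_windows <- pose_raw %>%                      # columns: session_id, t_ms, head_x, ..., hand_l_x, ...
  mutate(window = floor(t_ms / window_ms)) %>%
  group_by(session_id, window) %>%
  summarise(across(where(is.numeric), min), .groups = "drop")

# Eye/head gaze stream (~20 Hz): union of all objects gazed at within each window.
gaze_windows <- gaze_raw %>%                      # columns: session_id, t_ms, gaze_target_eye, gaze_target_head
  mutate(window = floor(t_ms / window_ms)) %>%
  group_by(session_id, window) %>%
  summarise(gaze_target_eye  = list(unique(gaze_target_eye)),
            gaze_target_head = list(unique(gaze_target_head)),
            .groups = "drop")

# One row per 50 ms window of a microlesson, combining both streams.
synced <- inner_join(pose_windows, gaze_windows, by = c("session_id", "window"))
```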
We modeled the problem as a multi-class classification. The target variable was the
object that was detected as being gazed at by the user’s eyes. Before training the model,
the dataset had to be cleaned. This included the removal of eye rotation values outside the
reported field-of-view of the IVR headset (representing recording failures) and saccades.
Several approaches exist to identify fixations and saccades. We used an
Area-of-Interest Identification algorithm, which defines a fixation as a group of consecutive
gaze points that fall within the same target area (Salvucci & Goldberg, 2000). Groups that
did not span a minimum duration of 100 ms were regarded as saccades and excluded from
the analysis. This filtering resulted in a reduced dataset of N=186,864 rows.
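A sketch of this fixation filter is given below, assuming one row per synchronized 50 ms window with a single eye gaze target per row (column names are again illustrative); a group must then span at least two consecutive windows on the same target to reach the 100 ms minimum.

```r
# Illustrative I-AOI style fixation filter: consecutive 50 ms windows on the same
# eye gaze target form a group; groups shorter than 100 ms are treated as saccades.
library(dplyr)

window_ms <- 50
min_fixation_ms <- 100

fixations <- windows %>%               # one row per 50 ms window, single gaze_target_eye per row
  arrange(session_id, window) %>%
  group_by(session_id) %>%
  # start a new group whenever the gazed object changes
  mutate(new_target = gaze_target_eye != lag(gaze_target_eye, default = first(gaze_target_eye)),
         group_id   = cumsum(new_target)) %>%
  group_by(session_id, group_id) %>%
  filter(n() * window_ms >= min_fixation_ms) %>%  # keep only groups spanning >= 100 ms
  ungroup()
```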
The gaze targets could distinguish between specific users, but since these users differed across sessions, a single class User was created to represent all of them. This processing resulted in a dataset of N=338,040 rows with 13 different classes¹, recorded for 18 users in 23 microlessons, with an average duration of 16 minutes (min=5, max=28) and, on average, four other students present apart from the teacher (min=1, max=5).
¹ 3D Object, User, Object other, Flipchart, Room LessonState, Whiteboard, Podium, Room Chair, Room Table, Room Ceiling, Room Floor, Room Wall
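For illustration only, such a relabeling could look as follows; the "User_" prefix is a hypothetical naming convention for individual students in the logs.

```r
# Illustrative sketch: collapse user-specific gaze targets (e.g., "User_A", "User_B")
# into a single "User" class; the "User_" prefix is a hypothetical naming convention.
fixations$gaze_target_eye <- ifelse(grepl("^User_", fixations$gaze_target_eye),
                                    "User", fixations$gaze_target_eye)
```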
Analysis
For the modeling, we utilized a Random Forest classifier. This choice was motivated by Random Forest often being among the best-performing classifiers on learning analytics data (Imhof et al., 2022) as well as in a recent study on VR data collected from an educational application (Santamaría-Bonfil et al., 2020). We used the implementation from the R ranger package (Wright & Ziegler, 2015). Due to the large dataset, we did not perform hyperparameter tuning and, for the same reason, trained without replacement on a .632 fraction of the data, setting the seed to 123 and leaving all other parameters at their defaults.
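Under these settings, the model fit reduces to a single call to ranger(); the sketch below assumes a training data frame with one row per fixation window and the eye gaze target stored as a factor, with illustrative names.

```r
# Sketch of the Random Forest fit as described: ranger with sampling without
# replacement on a .632 fraction of the data, seed 123, all other defaults.
library(ranger)

train_data$gaze_target_eye <- as.factor(train_data$gaze_target_eye)  # classification target

rf_model <- ranger(
  formula         = gaze_target_eye ~ .,  # predictors: head/hand pose features and head gaze target
  data            = train_data,           # one row per 50 ms fixation window (illustrative name)
  replace         = FALSE,                # training without replacement
  sample.fraction = 0.632,                # fraction of rows sampled per tree
  seed            = 123
)

# Predicted eye gaze targets for held-out data.
pred_classifier <- predict(rf_model, data = test_data)$predictions
```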
We used 5-fold cross-validation (i.e., always 80% of the dataset for training with 20% left for testing), with the split stratified by the class distribution. The random forest model (further referred to as "CLASSIFIER") was compared to two baseline models: 1) model "FLOOR", a naive classifier that assigns all instances to the majority class, and 2) model "HEAD GAZE", which uses the gazed object as derived from the head gaze. All metrics are reported as mean and standard deviation across the 5 testing folds.
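The evaluation loop can be sketched as follows, with a simplified stratified fold assignment and the two baselines; the metric computation is omitted and all data frame and column names are illustrative.

```r
# Sketch of the stratified 5-fold evaluation with the two baseline models.
library(ranger)

set.seed(123)
k <- 5
# Stratified fold assignment: within each eye gaze class, deal rows into k folds at random.
folds <- ave(seq_len(nrow(dataset)), dataset$gaze_target_eye,
             FUN = function(idx) sample(rep_len(1:k, length(idx))))

for (i in 1:k) {
  train <- dataset[folds != i, ]
  test  <- dataset[folds == i, ]

  # "CLASSIFIER": identifier columns (e.g., session_id) assumed dropped from predictors.
  rf <- ranger(gaze_target_eye ~ ., data = train,
               replace = FALSE, sample.fraction = 0.632, seed = 123)
  pred_classifier <- predict(rf, data = test)$predictions
  # "FLOOR": predict the majority class of the training fold for every test row.
  pred_floor <- rep(names(which.max(table(train$gaze_target_eye))), nrow(test))
  # "HEAD GAZE": use the object derived from the head's forward direction.
  pred_head <- test$gaze_target_head

  # ...compute per-class precision, recall, and F1 for each predictor here, then
  # report the mean and standard deviation across the k folds.
}
```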
Results
For both the machine learning model and the baselines, we report the usual metrics for multiclass classification, i.e., precision, recall, and F1-score, averaged across all classes using a) the weighted and b) the macro average. We focus more on the weighted average because we consider it important to take the class distribution into account.
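For K classes with per-class F1-scores F1_k and class frequencies n_k out of N instances in total, the two averages are computed as follows (precision and recall are averaged analogously):

\text{macro-F1} = \frac{1}{K}\sum_{k=1}^{K}\mathrm{F1}_k,
\qquad
\text{weighted-F1} = \sum_{k=1}^{K}\frac{n_k}{N}\,\mathrm{F1}_k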
The results for both average types are depicted in Figure 2. The performance of the baseline model "FLOOR" is very poor, both for the weighted and the macro average. The only value above .20 is the weighted-average recall, because always predicting the largest class affects the weighted average more than the macro average. The "HEAD GAZE" baseline performs better than "FLOOR" on all metrics, indicating a promising direction. However, all values remain below .35, which is still very poor performance.
On the other hand, both averages reveal a steep increase in performance for the "CLASSIFIER" over both baselines, with an F1-score of .7568 for the macro average and .8637 for the weighted average. The higher values of the weighted average are driven by better performance on the larger classes. This is expected, as some of the minority classes might not be sufficiently represented in the dataset to produce good results.
Figure 2
Precision, recall, and F1-score for all three models using the weighted average (left) and the macro average (center), and a table with the results for both average types (right).
A similar picture of the improvement of the machine learning model over the baseline emerges from Figure 3, which depicts the heatmaps for the "HEAD GAZE" and "CLASSIFIER" predictors. While the baseline model's matrix is quite scattered and full of misclassifications for almost every class, the "CLASSIFIER" model reveals a more pronounced diagonal, indicating higher precision across all classes. For example, it is apparent that "HEAD GAZE" misclassified many objects as "User"; relying on head gaze alone would thus suggest that the teacher is paying closer attention to students (a laudable feature of student-centered teaching) more often than the eye tracking data confirms. These false positives for "User" are substantially reduced for the "CLASSIFIER" model. Still, the classifier is far from perfect, especially because of the many misclassifications for the two largest classes, "Room Wall" and "Room Floor".
Figure 3
Two heatmaps depicting the correspondence of gaze targets as determined by the eye
tracking ("Gaze Target Eye") with a) detected objects from the IVR headset’s forward
orientation ("Gaze Target Head") and b) the predicted gaze targets of the random forest.
Darker shades on the diagonal represent higher classification performance.
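These heatmaps correspond to confusion matrices of the true eye gaze targets against each predictor; a minimal sketch of how they can be tabulated from held-out predictions is given below (names as in the sketches above; the row-wise normalization is an assumption about the plotted shading).

```r
# Confusion matrices of true eye gaze targets vs. head gaze and vs. the classifier.
conf_head       <- table(truth = test$gaze_target_eye, predicted = test$gaze_target_head)
conf_classifier <- table(truth = test$gaze_target_eye, predicted = pred_classifier)

# Optional row-normalization so each row shows how one true class is distributed
# over the predicted classes (assumed for the shading in Figure 3).
conf_classifier_norm <- prop.table(conf_classifier, margin = 1)
```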
Discussion
Investigating the use of positional tracking in a microteaching scenario, we found a low correspondence between gaze targets inferred from eye tracking and those inferred from the forward orientation of the IVR headset, that is, between eye gaze and head gaze. This result suggests that head gaze alone does not sufficiently approximate eye gaze, which is in contrast to previous reports claiming that the former can be used as a proxy for the latter (Andrist et al., 2017;
Higgins et al., 2022). However, our finding is consistent with the notion that people exhibit
substantial eye gaze deviations from the head forward direction with large inter-individual
differences regarding the eye-in-head motion range (Sidenmark & Gellersen, 2019).
Moreover, it is noteworthy that, in contrast to controlled experimental studies, we
investigated positional tracking in an applied training scenario. Teaching a microlesson in
the IVR environment entailed dynamic motion in terms of changing between different
locations ("teleporting") and the handling of various interactive tools and objects.
Nevertheless, we could demonstrate the usefulness of positional tracking data in IVR. Although head gaze matched eye gaze only poorly, a Random Forest classifier fed with the positional data of the IVR headset and hand controllers was able to predict the eye tracking fixations with high precision and recall. More specifically, the random forest achieved a weighted-average F1-score of .8637 ± .0020. Compared to the F1-score of .2867 ± .0017 for the head gaze baseline model, this represents an improvement of .5770.
Despite the promising results regarding the usefulness of positional tracking data to
predict actual eye gaze during teaching in IVR, it is important to discuss potential
limitations of our approach to the data analysis. Although we trained and evaluated our
classifier on different data samples, both datasets contained data from the same
individuals. It is therefore conceivable that the resulting predictive performance is inflated,
i.e., higher than if the model had been evaluated on new participants. This holds
particularly true as people show significant inter-individual differences in eye movement
behavior (Sidenmark & Gellersen, 2019), which would make the prediction of new
participants’ behavior challenging. However, it is also important to emphasize that the
PSTs rotate roles in the teaching exercise. Therefore, it can be considered adequate to train
an algorithm on the PSTs' data from a session in which they wear an eye-tracking-enabled
headset, and use that model to make inferences about their gaze in the other sessions.
Another potential limitation to note is that classical random forest classifiers do not
generate high-quality models on correlated data (Ngufor et al., 2019). This stems from the
violation of the assumption of independent and identically distributed observations when dealing with longitudinal data. Therefore, a future direction of our research is to use a more computationally intensive model (e.g., a long short-term memory (LSTM) recurrent neural
network) designed to handle time-series data.
Finally, we would like to point out that, to our knowledge, no established,
independent estimates of the Meta Quest Pro’s eye tracking performance exist to date. A
preliminary study found an accuracy of 1.652° and a precision of 0.699° standard deviation, which is comparable to other IVR devices with integrated eye tracking (Wei et al., 2023). However, the authors of the study also advise caution when interpreting fixation results. Their word of caution relates to findings that the validity and reliability of eye tracking in IVR are influenced by many interacting factors, e.g., the placement of visual targets close to or far from the periphery, or vision correction (Schuetz & Fiehler, 2022). Generally speaking, a certain degree of uncertainty is always involved in eye tracking research
in the absence of an external reference measurement. This is not a specific limitation of
this study but rather a general problem of eye tracking research.
As future directions, we plan to corroborate and validate our findings by a) employing a more adequate machine learning model (see above) and b) investigating how well our results transfer from the teacher to the student role. For this purpose, we plan to equip students with eye-tracking-enabled devices as well. This would allow us to train and test an ML algorithm on different sessions of the same PST in different roles. Showing that the good predictive power of the positional data generalizes across sessions could have far-reaching practical implications for teacher education. It would equip PSTs and their educators with sophisticated, non-obtrusive ways to measure the attentional focus of teachers and students during a teaching exercise. For example, it
could be used to make inferences about the PSTs’ ability to distribute their attention to all
students equally, and to make them aware of how their behavior compares to that of
experienced teachers (Keskin et al., 2024). Generally, visualizing the attentional focus can
greatly contribute to augmenting the feedback the PSTs receive from their peers and
educators, thereby improving this central component of teacher training (Banga, 2014).
Conclusion
In this study, we investigated to what extent the objects gazed at with the eyes can be approximated by a) head gaze and b) a random forest classifier trained on a combination of position and rotation data, in the context of an IVR classroom in which PSTs practiced their lessons. We found a clear added benefit of the machine learning model, which showed good performance, as opposed to the rather poor results of the pure head gaze approximation. These results are promising, as they suggest that in some contexts cheaper devices might be sufficient to estimate the eye gaze of IVR users, enabling analytics that are currently possible only on expensive devices with eye tracking.
Acknowledgments
We would like to thank Lucas Martinic and Ferhan Özkan from XR Bootcamp
GmbH for programming the VR application and Deian Popic for creating the 3D assets
used in the study. The study was partially funded by a Higher Ed XR Innovation grant
from the Tides Foundation.
References
Andrist, S., Gleicher, M., & Mutlu, B. (2017). Looking Coordinated: Bidirectional Gaze
Mechanisms for Collaborative Interaction with Virtual Characters. Proc. of the 2017
CHI Conf. on Human Factors in Computing Systems, 2571–2582.
https://doi.org/10.1145/3025453.3026033
Bailenson, J. (2018). Experience on demand: What virtual reality is, how it works, and
what it can do. W. W. Norton & Company.
Banga, C. L. (2014). Microteaching, an efficient technique for learning effective teaching.
Scholarly Research Journal for Interdisciplinary Studies, 15(2), 2206–2211.
Bozkir, E., Geisler, D., & Kasneci, E. (2019). Person Independent, Privacy Preserving, and
Real Time Assessment of Cognitive Load using Eye Tracking in a Virtual Reality
Setup. 2019 IEEE Conf. on Virtual Reality and 3D User Interfaces (VR),
1834–1837. https://doi.org/10.1109/VR.2019.8797758
Clay, V., König, P., & König, S. U. (2019). Eye tracking in virtual reality. Journal of Eye
Movement Research,12 (1). https://doi.org/10.16910/jemr.12.1.3
Duran, L. B., & Duran, E. (2004). The 5E instructional model: A learning cycle approach
for inquiry-based science teaching. Science Education Review,
3(2), 49–58.
Higgins, P., Barron, R., & Matuszek, C. (2022). Head pose as a proxy for gaze in virtual
reality. 5th international workshop on virtual, augmented, and mixed reality for
HRI. https://openreview.net/forum?id=ShGeRZBcp19
Imhof, C., Comsa, I.-S., Hlosta, M., Parsaeifard, B., Moser, I., & Bergamin, P. (2022).
Prediction of dilatory behavior in elearning: A comparison of multiple machine
learning models. IEEE Transactions on Learning Technologies.
Jing, A., May, K., Matthews, B., Lee, G., & Billinghurst, M. (2022). The Impact of Sharing
Gaze Behaviours in Collaborative Mixed Reality. Proceedings of the ACM on
Human-Computer Interaction,6(CSCW2), 1–27. https://doi.org/10.1145/3555564
Keskin, O., Seidel, T., Stürmer, K., & Gegenfurtner, A. (2024). Eye-tracking research on
teacher professional vision: A meta-analytic review. Educational Research Review,
42, 100586. https://doi.org/10.1016/j.edurev.2023.100586
Llanes-Jurado, J., Marín-Morales, J., Moghaddasi, M., Khatri, J., Guixeres, J., &
Alcañiz, M. (2021). Comparing Eye Tracking and Head Tracking During a Visual
Attention Task in Immersive Virtual Reality. In M. Kurosu (Ed.), Human-Computer
Interaction. Interaction Techniques and Novel Applications (pp. 32–43). Springer
International Publishing. https://doi.org/10.1007/978-3-030-78465-2_3
Mayer, R. E., Makransky, G., & Parong, J. (2022). The Promise and Pitfalls of Learning in
Immersive Virtual Reality. Int. Journal of Human–Computer Interaction, 1–10.
https://doi.org/10.1080/10447318.2022.2108563
Moore, A. G., McMahan, R. P., Dong, H., & Ruozzi, N. (2020). Extracting Velocity-Based
User-Tracking Features to Predict Learning Gains in a Virtual Reality Training
Application. 2020 IEEE Int. Symposium on Mixed and Augmented Reality
(ISMAR), 694–703. https://doi.org/10.1109/ISMAR50242.2020.00099
Moser, I., Comsa, I.-S., Parsaeifard, B., & Bergamin, P. (2022). Work-in-Progress–Motion
Tracking Data as a Proxy for Cognitive Load in Immersive Learning. 2022 8th
International Conference of the Immersive Learning Research Network (iLRN), 1–3.
https://doi.org/10.23919/iLRN55037.2022.9815894
Ngufor, C., Van Houten, H., Caffo, B. S., Shah, N. D., & McCoy, R. G. (2019). Mixed
effect machine learning: A framework for predicting longitudinal change in
hemoglobin a1c. Journal of Biomedical Informatics,89, 56–67.
https://doi.org/10.1016/j.jbi.2018.09.001
Persky, S., Ferrer, R. A., & Klein, W. M. P. (2016). Nonverbal and paraverbal behavior in
(simulated) medical visits related to genomics and weight: A role for emotion and
race. Journal of Behavioral Medicine,39 (5), 804–814.
https://doi.org/10.1007/s10865-016-9747-5
Salvucci, D. D., & Goldberg, J. H. (2000). Identifying fixations and saccades in
eye-tracking protocols. Proc. of the symposium on Eye tracking research &
applications - ETRA ’00, 71–78. https://doi.org/10.1145/355017.355028
Santamaría-Bonfil, G., Ibáñez, M. B., Pérez-Ramírez, M., Arroyo-Figueroa, G., &
Martínez-Álvarez, F. (2020). Learning analytics for student modeling in virtual
reality training systems: Lineworkers case. Computers & Education,151, 103871.
Schuetz, I., & Fiehler, K. (2022). Eye tracking in virtual reality: Vive pro eye spatial
accuracy, precision, and calibration reliability. Journal of Eye Movement Research,
15 (3). https://doi.org/10.16910/jemr.15.3.3
Sidenmark, L., & Gellersen, H. (2019). Eye, Head and Torso Coordination During Gaze
Shifts in Virtual Reality. ACM Trans. on Computer-Human Interaction,27 (1),
1–40. https://doi.org/10.1145/3361218
Skaramagkas, V., Giannakakis, G., Ktistakis, E., Manousos, D., Karatzanis, I., Tachos, N.,
Tripoliti, E., Marias, K., Fotiadis, D. I., & Tsiknakis, M. (2023). Review of Eye
Tracking Metrics Involved in Emotional and Cognitive Processes. IEEE Reviews in
Biomedical Engineering,16, 260–277. https://doi.org/10.1109/RBME.2021.3066072
Turan, S. (2021). Pre-Service Teacher Experiences of the 5E Instructional Model: A
Systematic Review of Qualitative Studies. Eurasia Journal of Mathematics, Science
and Technology Education,17 (8), em1994. https://doi.org/10.29333/ejmste/11102
Wang, Y., & Kopper, R. (2021). Efficient and Accurate Object 3D Selection With Eye
Tracking-Based Progressive Refinement. Frontiers in Virtual Reality,2, 607165.
https://doi.org/10.3389/frvir.2021.607165
Wei, S., Bloemers, D., & Rovira, A. (2023). A Preliminary Study of the Eye Tracker in the
Meta Quest Pro. Proceedings of the 2023 ACM International Conference on
Interactive Media Experiences, 216–221. https://doi.org/10.1145/3573381.3596467
Won, A. S., Perone, B., Friend, M., & Bailenson, J. N. (2016). Identifying Anxiety Through
Tracked Head Movements in a Virtual Classroom. Cyberpsychology, Behavior, and
Social Networking,19 (6), 380–387. https://doi.org/10.1089/cyber.2015.0326
Wright, M. N., & Ziegler, A. (2015). Ranger: A fast implementation of random forests for
high dimensional data in C++ and R. arXiv preprint arXiv:1508.04409.
Yaremych, H. E., & Persky, S. (2019). Tracing physical behavior in virtual reality: A
narrative review of applications to social psychology. Journal of Experimental Social Psychology, 85, 103845.
https://doi.org/10.1016/j.jesp.2019.103845
Zhao, M., Pierce, A. M., Tan, R., Zhang, T., Wang, T., Jonker, T. R., Benko, H., &
Gupta, A. (2023). Gaze Speedup: Eye Gaze Assisted Gesture Typing in Virtual
Reality. Proc. of the 28th Int. Conf. on Intelligent User Interfaces, 595–606.
https://doi.org/10.1145/3581641.3584072
Table 1
Features extracted from the microlessons

name | description | type
head position | x, y, z coordinates of the IVR headset, relative to the environment | input feature
hands position | x, y, z coordinates from both hand controllers, relative to the head position | input feature
eye rotation | x, y, z components of the rotation vector (Euler angles) for the eye (average for left and right eye) | support
head rotation | x, y, z components of the rotation vector (Euler angles) for the head | input feature
gaze target head | object in the classroom that intersected with the forward vector (ray cast) from the user's head (computed in the game engine) | input feature
gaze target eye | object in the classroom that intersected with the eye gaze vector of the user (computed in the game engine) | target variable