© 2018, Jiawen Zhang, Marie-Luce Bourguet, Gentiane
Venture. Published by BCS Learning and Development Ltd.
Proceedings of British HCI 2018. Belfast, UK.
The Effects of Video Instructor’s Body
Language on Students’ Distribution of Visual
Attention: an Eye-tracking Study
Jiawen Zhang
Beijing University of Posts and
Telecommunications
zhangjiawen@bupt.edu.cn
Marie-Luce Bourguet
Queen Mary University of
London
marie-luce.bourguet@qmul.ac.uk
Gentiane Venture
Tokyo University of Agriculture and
Technology
venture@cc.tuat.ac.jp
Previous studies have shown that the instructor’s presence in video lectures has a positive effect
on learners’ experience. However, it increases the cost of video production and may increase
learners’ cognitive load. An alternative to the instructor’s presence is the use of embodied
pedagogical agents that display limited but appropriate social signals. In this extended abstract, we
report a small experimental study into the effects of a video instructor’s behaviour on students’
learning experience, with the long-term aim of better understanding which of the instructor’s social
signals should be applied to pedagogical agents. We used eye-tracking technology and data
visualisation techniques to collect and analyse students’ distribution of visual attention in relation
to the instructor’s speech and body language. Participants also answered questions about their
attitudes towards the instructor. The results suggest that the instructor’s gaze directed towards the
lecture’s slides, or a pointing gesture towards the slides, is not enough on its own to shift viewers’
attention; however, the combination of both is effective. An embodied pedagogical agent should
therefore be able to display multimodal behaviour, combining gaze and gestures, to effectively
direct the learners’ visual attention towards the relevant material. Furthermore, to make learners
pay attention to the lecturer’s speech, the instructional agent should make use of pauses and
emphasis.
Video lectures. Social signals. Eye tracking. Embodied pedagogical agents.
1. INTRODUCTION
In remote learning, videos have the potential to offer
many of the advantages of a classroom-like
experience and, in addition, they give students
control over the pace of their learning (Yousef et al.,
2014). Various studies have looked at the effects of
different video-based instruction designs on
students’ engagement, attention, emotion,
cognitive load, knowledge transfer and recall (Chen
& Wu, 2015; Guo et al., 2014). Based on the eye-
mind assumption that eye fixation locations reflect
attention distributions (Just & Carpenter, 1980), an
increasing number of studies are using eye-tracking
techniques to understand how students learn using
videos (Lai et al., 2013; Sharma et al., 2014) and
especially how the instructor’s presence in the video
affects students’ distribution of visual attention
(Garrett, 2015; Kizilcec et al., 2014).
A general positive effect of instructor’s presence in
instructional videos has been found (Wang &
Antonenko, 2017). For example, it contributes to
increasing students’ “with-me-ness”, which is the
extent to which the learner succeeds in following the
content that is being explained (Sharma et al.,
2016). Moreover, as lecturers’ hand gestures and
facial expressions are often linked to their
pedagogical intentions (Tian & Bourguet, 2016;
Zhang, 2012), the availability of social signals such
as the instructor’s pointing gestures and gaze can
improve learning experience and performance
(Ouwehand et al., 2015; Pi et al., 2017).
However, including the lecturer’s presence in videos
entails a high production cost (Hollands & Tirthali,
2014). Moreover, there is a concern that it may
contribute to increasing the learners’ cognitive load
(Chandler & Sweller, 1991; Mayer, 2001) by
inducing a split attention effect (when learners must
divide their attention across multiple information
sources). For example, it has been found that
learners are looking at the instructor’s face up to
65% of the time in average, and that they switch
between the lecturer’s face and the instructional
material up to every 2.4 seconds, depending on the
multimedia design (Garett, 2015).
A low-cost and accessible alternative to the instructor’s
presence in videos is the use of embodied
pedagogical agents (Li et al., 2015). Agents that
display limited but appropriate social signals may
also incur less cognitive load than their human
models. In this work-in-progress paper, we report
the results of an experimental study into the effects
of video instructor’s behaviour on students’ learning
experience, with the long-term aim of better
understanding which instructor’s social signals
should be applied to pedagogical agents. The scale
of the study is small (8 participants), but at this early
stage of the research, the intention is to capture
some of the instructor’s important social signals in
order to build a first prototype that can be used for
further studies. We briefly describe our pedagogical
agent prototype in the conclusion of the paper.
2. EYE-TRACKING EXPERIMENT
2.1 Method
We used eye tracking technology and data
visualisation techniques (Bojko, 2009) to collect and
analyse students’ distribution of visual attention in
relation to the video instructor’s speech and body
language. Participants in the experiment also
answered questions about their attitudes toward the
instructor.
2.1.1. Video Stimulus
All participants watched the same video (duration of
4 minutes and 13 seconds) on the topic of “Design
Techniques” (covering brainstorming, mind maps
and storyboards), extracted from a 3rd year
undergraduate telecommunications engineering
course. The video showed the instructor’s head and
upper body on the right side of the lecture’s slides,
all within the same frame (see Figure 1).
Figure 1: The Areas of Interest (AOI).
Prior to conducting the experiment, the video was
manually annotated with instructor behaviour
markers, using the ANVIL annotation tool (Kipp,
2014). Behaviour markers included three markers
for gaze (looking towards the camera, i.e. the
viewer; looking towards the slides; looking
elsewhere); seven markers for hand gestures
(pointing towards slide; waving hands; clasping
hands; unfolding hands; ball; other gesture; no
gesture); and three markers for speech (speaking
with direct reference to the slide’s content; not
directly referring to slide’s content; no speech). The
annotations were not displayed to the participants.
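As an illustration of how such annotation tracks can be processed, the sketch below (our own code, not the authors’ pipeline) stores each marker as a (start, end, label) interval and totals the duration of each behaviour, as needed for the per-behaviour durations reported in Table 1. The interval values and label names are hypothetical.

```python
# Sketch: ANVIL-style behaviour annotations as (start, end, label) intervals.
# Tracks (gaze, hand, speech) may overlap, so totals can exceed video length.
from collections import defaultdict

def total_durations(annotations):
    """Sum the duration (in seconds) of each behaviour marker."""
    totals = defaultdict(float)
    for start, end, label in annotations:
        totals[label] += end - start
    return dict(totals)

# Hypothetical intervals for illustration only
annotations = [
    (0.0, 4.2, "gaze_camera"),
    (4.2, 6.5, "gaze_slide"),
    (3.0, 6.0, "hand_pointing"),   # overlaps both gaze markers
]
print({k: round(v, 2) for k, v in total_durations(annotations).items()})
# → {'gaze_camera': 4.2, 'gaze_slide': 2.3, 'hand_pointing': 3.0}
```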
2.1.2. Participants
Ten undergraduate students were recruited from an
international Bachelor’s degree programme in
Electronic Engineering in China, delivered in
English. Prior to the study, each participant was
asked to complete a background questionnaire to
ensure that all participants shared a similar level of
prior domain knowledge (all of them had taken the
module of the video in the previous semester) and
English comprehension (CET6 level). Two
participants had to be excluded due to problems with
their eye tracking data, leaving a sample of eight
participants, three males and five females, aged 20
to 22. None of them had abnormal vision or
abnormal hearing.
2.1.3. Procedure and Equipment
The experiment was conducted in individual
sessions of approximately 10 minutes. Before the
video stimulus started, the experimenter gave
participants a brief introduction to the experiment
and to the eye-tracking equipment, and each
participant was asked to follow a simple procedure
for equipment calibration purposes. The participants
were then asked to watch the instructional video
without being able to pause or stop it. To ensure that
they were paying attention and trying to learn from
the video, they were told that they would have to
write a summary of the video content immediately
after watching it.
The participants’ eye position was measured using
the Tobii 4C eye tracker. The device was mounted
to the bottom of the computer monitor on which the
lecture video was displayed. Tobii 4C operates at a
distance of 50-95cm and has a high accuracy of 0.4
degrees. The sampling frequency is 90 Hz.
The computer’s screen size was 13.3 inches, and the
resolution of the monitor was 1440 x 900 pixels.
2.2 Measurements
Visual attention is typically measured in the form of
fixations, which (in our study) are periods of at least
200 ms during which a viewer looks at a small area
of the screen (an area no larger than 10 pixels on a
side). Fixations are connected by saccades, and a
sequence of fixations and saccades is called a
scanpath.
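A fixation filter matching this definition can be sketched with a dispersion-threshold (I-DT) algorithm. The version below is our own illustration, assuming 90 Hz gaze samples as (x, y) pixel pairs; it is not the filter actually used by the Tobii software.

```python
# Sketch of a dispersion-threshold (I-DT) fixation filter matching the
# paper's definition: >= 200 ms within an area at most 10 px on a side.
def detect_fixations(samples, hz=90, min_dur_s=0.2, max_disp_px=10):
    def dispersion(window):               # spatial extent of a gaze window
        xs, ys = zip(*window)
        return max(max(xs) - min(xs), max(ys) - min(ys))

    min_len = int(min_dur_s * hz)         # 18 samples at 90 Hz
    fixations, i = [], 0
    while i + min_len <= len(samples):
        j = i + min_len
        if dispersion(samples[i:j]) <= max_disp_px:
            # grow the fixation while the window stays compact
            while j < len(samples) and dispersion(samples[i:j + 1]) <= max_disp_px:
                j += 1
            cx = sum(x for x, _ in samples[i:j]) / (j - i)
            cy = sum(y for _, y in samples[i:j]) / (j - i)
            fixations.append((i / hz, (j - i) / hz, (cx, cy)))
            i = j                         # resume after the fixation
        else:
            i += 1                        # slide the window by one sample
    return fixations                      # (onset_s, duration_s, centroid)
```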
2.2.1. Areas of Interest
Areas of Interest (AOIs) are parts of the video frame
that are of high importance for the hypothesis of the
study. Two non-overlapping AOIs were determined:
the instructor area and the slide area (see Figure 1).
We found that, on average, participants spent
95.33% of their time (percentage of gaze point
distribution) watching one of the two AOIs. They
spent slightly more time on the instructor AOI
(M=49.09%, SD=14.12) than on the slide AOI
(M=46.23%, SD=13.58), although the difference is
not significant (paired sample t-test: t(7) = 0.29,
p > 0.05, n.s.). After dividing the instructor AOI into
a face area and a body area, we observed that
students look more at the instructor’s face than at
the body gestures (M=75.66%, SD=11.28).
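The paired comparison above can be reproduced in a few lines; the sketch below (our own, with hypothetical per-participant AOI shares, not the study’s data) computes the paired-sample t statistic as the mean difference over its standard error, with n-1 degrees of freedom.

```python
# Sketch: paired-sample t statistic for per-participant AOI dwell shares.
import math

def paired_t(a, b):
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)                     # t with n-1 df

# Hypothetical dwell shares for 8 participants (illustration only)
instructor = [0.55, 0.40, 0.62, 0.48, 0.51, 0.45, 0.58, 0.34]
slide      = [0.42, 0.55, 0.33, 0.47, 0.44, 0.50, 0.37, 0.62]
t = paired_t(instructor, slide)  # compare |t| against the t(7) critical value 2.365
```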
Table 1 shows, for each AOI and each instructor
behaviour, the average fixation rate, i.e. the
average fixation count divided by the total duration
of the behaviour (note that the gaze, hand and
speech behaviours are not mutually exclusive).
Surprisingly, behaviours that are meant to attract
attention to the lecture’s slides (e.g. gaze towards
slide, hand pointing and speech with reference)
have a higher fixation rate on the instructor AOI than
on the slide AOI. This could be explained by a delay
between the behaviour and its effect on the
students’ visual attention, or by a higher rate of
transitions between the two AOIs (see next section).
For a better explanation, combinations of
behaviours should be scrutinised (see the
visualisation section).
Table 1: Average fixation rate (count / duration) on each
AOI in relation to instructor behaviour; and total duration
(in seconds) of the observed behaviours.
2.2.2. Transitions
A transition is a movement from one AOI to another.
The typical measure related to transitions is the
transition count, i.e. the number of transitions
between two AOIs.
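Under the assumption that each gaze sample has been labelled with the AOI it falls in, the two measures can be sketched as follows (illustrative code, not the authors’ implementation):

```python
# Sketch: transition count and transition rate over an AOI label sequence.
def transition_count(aoi_labels):
    """Count changes of AOI in a per-sample label sequence."""
    return sum(1 for prev, cur in zip(aoi_labels, aoi_labels[1:])
               if prev != cur)

def transition_rate(aoi_labels, duration_s):
    """Transitions per second over the duration of a behaviour."""
    return transition_count(aoi_labels) / duration_s

labels = ["instructor", "instructor", "slide", "slide", "instructor"]
transition_count(labels)      # 2 transitions
transition_rate(labels, 4.0)  # 0.5 transitions per second
```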
Table 2 shows average transition counts across
participants in relation to different instructor
behaviours. Given that the total duration of each
behaviour varies, we also computed an average
transition rate (transition count/duration) for each
behaviour. We can see that when the instructor is
looking at the slide, the transition rate is relatively
high (1.195), which contributes to lowering the
fixation rate on the slide AOI and corroborates the
findings of Table 1.
Table 2: Average transition count [standard deviation]
and transition rate (count/duration) between the two AOIs
in relation to instructor behaviour.
2.3 Visualisation
2.3.1 Attention maps
An attention map (or heat map) is a graphical
representation of the attention distribution. Different
kinds of attention maps have been proposed (Bojko,
2009), e.g. the “fixation count heat map”, which
aggregates fixation counts across time and
participants (also called a bee swarm), and the
“absolute gaze duration heat map”, which
aggregates absolute gaze durations across time
and participants.
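A fixation count heat map reduces to accumulating fixation centroids on a grid over the video frame. The sketch below is our own illustration (the 20-pixel cell size is an assumption, and a real map would additionally be smoothed and colour-mapped):

```python
# Sketch: accumulate fixation centroids on a grid over the 1440 x 900 frame.
def fixation_count_map(fixations, width=1440, height=900, cell=20):
    cols, rows = width // cell, height // cell
    grid = [[0] * cols for _ in range(rows)]
    for x, y in fixations:                 # fixation centroids in pixels
        if 0 <= x < width and 0 <= y < height:
            grid[int(y) // cell][int(x) // cell] += 1
    return grid

# An "absolute gaze duration heat map" is the same loop accumulating each
# fixation's duration instead of a count of 1.
```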
Figure 2 (left) shows a fixation count heat map
calculated on a 5.32-second clip during which the
instructor performs a pointing gesture, looks at
the slide, and delivers speech that directly refers
to the slide content. With the three behaviours
combined, the students’ visual attention is clearly
directed towards the slide AOI, where gaze
duration is also longer.
Figure 2: Fixation count heat maps where the instructor
looks at the slide (left) versus the camera (right).
Figure 2 (right) shows a fixation count heat map
calculated on a 4.52-second clip during which the
instructor performs a pointing gesture and delivers
speech that directly refers to the slide content, but
is looking at the camera. The students’ visual
attention is scattered over the slide, as if the
gesture alone did not allow them to find the relevant
information, and the gaze duration is actually longer
on the instructor’s face.

Table 1:

Behaviour               Instructor AOI     Duration
                        fixation rate      (seconds)
Gaze towards camera     0.413              226.8
Gaze towards slide      0.62               10.88
Gaze other              0.681              14.32
Hand pointing           0.560              26.36
Hand waving             0.338              70.60
Unfolding hands         0.248              31.72
Other hand gesture      0.247              99.32
No hand gesture         0.354              24.00
Speech with reference   0.508              58.04
Speech no reference     0.369              166.92
Silence                 0.579              30.24

Table 2:

Behaviour               Avg. transition    Avg. transition
                        count [SD]         rate
Gaze towards camera     207.38 [70.96]     0.914
Gaze towards slide      13.00 [7.05]       1.195
Gaze other              8.25 [4.29]        0.576
Hand pointing           24.88 [10.15]      0.944
Hand waving             64.63 [21.81]      0.915
Unfolding hands         32.25 [11.79]      1.017
Other hand gesture      78.38 [25.63]      0.789
No hand gesture         24.63 [13.35]      1.026
Speech with reference   51.63 [17.07]      0.890
Speech no reference     152.88 [54.39]     0.916
Silence                 23.25 [9.48]       0.769
2.3.2 Temporal Evolution of Scanpaths
Figure 3 shows horizontal fixation positions on the
vertical axis and time on the horizontal axis. Each
line corresponds to a different participant. The top
image was calculated on the clip of Figure 2 (left)
(slightly extended to a 6-second duration), during
which the instructor points at the slide while
looking towards it. The bottom image was
calculated on the clip of Figure 2 (right) (also
extended to the same 6-second duration), during
which the instructor points at the slide while
looking at the audience.
We can clearly observe fewer transitions between
the two AOIs in the top image, showing that the
instructor’s gaze towards the slide, when combined
with a pointing gesture, helps students maintain
their attention on the slide. The pointing gesture
alone does not prevent students from shifting their
attention back and forth between the slide and the
instructor’s face, hence potentially increasing their
cognitive load. In the bottom image, some students
keep staring at the instructor’s face, whereas in the
top image the opposite can be observed: some
students keep staring at the slide without shifting
their attention back to the instructor.
Figure 3: Scanpaths when the instructor is looking at the
slide (top image) versus the camera (bottom image). The
instructor AOI is the top dark grey area.
3. QUESTIONNAIRE RESULTS
After watching the video, the participants answered
a short questionnaire about their attitude towards
the video instructor.
The results show that all participants considered the
instructor’s presence useful, and 75% of them
thought that the instructor’s behaviour helped them
understand the lecture’s content. Two behaviours,
‘Hand pointing’ and ‘Speech with emphasis’, were
regarded as particularly important. When the
instructor performed a pointing gesture, 87.5% of
the participants thought that there must be
something worthy of attention in the slide, which
corroborates the results of the eye-tracking
experiment. Similarly, 62.5% of the participants
believed that ‘Speech with emphasis’ meant that
the instructor was saying something important.
Further results show that participants feel most
concerned with the instructor’s speech, followed by
the slides area, and finally the instructor’s body.
Indeed, we already know from the eye-tracking
experiment that participants spend much more time
looking at the instructor’s face area than at the body
area. The main function of the instructor’s behaviour
is to help shift the students’ visual attention
between the teaching material and the teacher’s
face, i.e. the speech.
4. CONCLUSION AND FURTHER WORK
In this paper, we reported a small experimental
study into the effects of video instructor’s behaviour
on students’ distribution of visual attention. The
results suggest that pointing gestures combined with
gaze constitute an important and useful social
signal. An embodied pedagogical agent should be
able to display a multimodal behaviour, combining
gaze and gestures, to effectively direct the learners’
visual attention towards the relevant material.
Furthermore, to make learners pay attention to the
speech, the instructional agent should make use of
pauses and emphasis.
We have implemented a prototype of an embodied
pedagogical agent for further studies on what social
signals such an agent should display (Figure 4). We
chose the social robot Pepper (SoftBank Robotics,
2017) because of its neutrality (e.g. it is non-
gendered), because a robot looks playful and non-
judgmental (Clark & Mayer, 2011), and because it is
not expected to display the complex, but not always
useful, behaviour of a human instructor. Pepper’s
main social signals for now include gaze (head
direction) and pointing gestures. Further studies
using Pepper are being conducted to test the
acceptability of a robot as instructor, and the social
signals it should display to support the learners.
Figure 4: Pepper the virtual social robot and embodied
pedagogical agent.
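As a purely illustrative sketch of the recommendation above (the Behaviour structure and the action names are hypothetical, not Pepper’s API), an agent attention cue could always pair gaze with a pointing gesture, and insert a pause before emphasised speech:

```python
# Hypothetical sketch of a multimodal attention cue for an embodied agent:
# gaze and pointing start together; speech follows after a short pause.
from dataclasses import dataclass

@dataclass
class Behaviour:
    channel: str   # "gaze" | "gesture" | "speech"
    action: str
    onset_s: float

def cue_to_slide(t0, pause_s=0.5):
    """Combined gaze + pointing cue at t0, emphasised speech after a pause."""
    return [
        Behaviour("gaze", "look_at_slide", t0),
        Behaviour("gesture", "point_at_slide", t0),  # combined, per the results
        Behaviour("speech", "refer_to_slide", t0 + pause_s),
    ]
```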
5. REFERENCES
Bojko, A. (2009). Informative or Misleading?
Heatmaps Deconstructed. Human-Computer
Interaction. New Trends. Springer Berlin
Heidelberg.
Chandler, P., & Sweller, J. (1991). Cognitive load
theory and the format of instruction. Cognition &
Instruction, 8(4), 293-332.
Chen, C. M., & Wu, C. H. (2015). Effects of different
video lecture types on sustained attention,
emotion, cognitive load, and learning
performance. Computers & Education, 80(5),
108-121.
Clark, R. C., & Mayer, R. E. (2011). E-learning and
the science of instruction: proven guidelines for
consumers and designers of multimedia learning.
Pfeiffer.
Garrett, N. (2015). Eye-Tracking Analytics in
Instructional Videos. ISECON.
Guo, P. J., Kim, J., & Rubin, R. (2014). How video
production affects student engagement: an
empirical study of MOOC videos. ACM
Conference on Learning @ Scale
Conference (Vol.43, pp.41-50). ACM.
Hollands, F.M., & Tirthali, D. (2014). MOOCs:
Expectations and reality. Full report. Center for
Benefit-Cost Studies of Education, Teachers
College, Columbia University, NY.
Just, M. A., & Carpenter, P. A. (1980). A theory of
reading: from eye fixations to
comprehension. Psychological Review, 87(4),
329.
Kipp, M. (2014). ANVIL: A Universal Video Research
Tool. In J. Durand, U. Gut, G. Kristofferson
(Eds.) Handbook of Corpus Phonology, Oxford
University Press, 420-436.
Kizilcec, R. F., Papadopoulos, K., & Sritanyaratana,
L. (2014). Showing face in video instruction:
effects on information retention, visual attention,
and affect. Proceedings of the SIGCHI Conference
on Human Factors in Computing Systems (CHI ’14)
(pp. 2095-2102). ACM.
Lai, M. L., Tsai, M. J., Yang, F. Y., Hsu, C. Y., Liu,
T. C., & Lee, W. Y., et al. (2013). A review of using
eye-tracking technology in exploring learning from
2000 to 2012. Educational Research
Review, 10(4), 90-115.
Li, J., Kizilcec, R., Bailenson, J., & Ju, W. (2015).
Social robots and virtual agents as lecturers for
video instruction. Computers in Human Behavior,
55, 1222-1230.
Mayer, R. E. (2001). Multimedia Learning.
Cambridge University Press.
Ouwehand, K., Van Gog, T., & Paas, F. (2015).
Designing effective video-based modeling
examples using gaze and gesture
cues. Educational Technology & Society, 18.
Pi, Z., Hong, J., & Yang, J. (2017). Effects of the
instructor's pointing gestures on learning
performance in video lectures. British Journal of
Educational Technology, 48(4), 1020-1029.
Sharma, K., Jermann, P., & Dillenbourg, P. (2014).
How Students Learn using MOOCs: An Eye-
tracking Insight. EMOOCs 2014, the Second
MOOC European Stakeholders Summit.
Sharma, K., Alavi, H. S., Jermann, P., & Dillenbourg,
P. (2016). A gaze-based learning analytics
model: in-video visual feedback to improve
learner's attention in MOOCs. 417-421.
SoftBank Robotics (2017) "Find out more about
Pepper". [Online] Available from:
https://www.ald.softbankrobotics.com/en/robots/
pepper [accessed 28 March 2018].
Tian, Y., & Bourguet, M. L. (2016). Lecturers' Hand
Gestures as Clues to Detect Pedagogical
Significance in Video Lectures. European
Conference on Cognitive Ergonomics (p. 2).
ACM.
Wang, J., & Antonenko, P. D. (2017). Instructor
presence in instructional video: effects on visual
attention, recall, and perceived
learning. Computers in Human Behavior, 71, 79-
89.
Yousef, A. M. F., Chatti, M. A., & Schroeder, U.
(2014). Video-Based Learning: A Critical Analysis
of The Research Published in 2003-2013 and
Future Visions. eLmL 2014 : The Sixth
International Conference on Mobile, Hybrid, and
On-line Learning (pp.112-119).
Zhang, J. R. (2012). Upper body gestures in lecture
videos: indexing and correlating to pedagogical
significance. ACM International Conference on
Multimedia (pp. 1389-1392). ACM.
... With regard to visual presence and presentation style, Shi et al. examined how instructors' visual attention and lecture delivery styles influence students' perceived engagement and academic performance across various instructional formats [30]. Similarly, Zhang et al. employed eyetracking and visualization technologies to investigate the effects of different instructional delivery styles on student viewing behavior [18]. Their analysis showed that students were more responsive to auditory cues-such as pauses and vocal emphasis-than to visual elements like gestures or slide transitions. ...
... According to the previous study by Zhang et al. [18], the behavior of the instructor influences the attention of students. Therefore, we obtained the instructor's action in the archive segments by the optical flow [19], the pattern of apparent motion of objects, surfaces, and edges in each segment caused by the relative motion between observer and scene. ...
Preprint
This study proposes a multimodal neural network-based approach to predict segment access frequency in lecture archives. These archives, widely used as supplementary resources in modern education, often consist of long, unedited recordings that make it difficult to keep students engaged. Captured directly from face-to-face lectures without post-processing, they lack visual appeal. Meanwhile, the increasing volume of recorded material renders manual editing and annotation impractical. Automatically detecting high-engagement segments is thus crucial for improving accessibility and maintaining learning effectiveness. Our research focuses on real classroom lecture archives, characterized by unedited footage, no additional hardware (e.g., eye-tracking), and limited student numbers. We approximate student engagement using segment access frequency as a proxy. Our model integrates multimodal features from teachers' actions (via OpenPose and optical flow), audio spectrograms, and slide page progression. These features are deliberately chosen for their non-semantic nature, making the approach applicable regardless of lecture language. Experiments show that our best model achieves a Pearson correlation of 0.5143 in 7-fold cross-validation and 69.32 percent average accuracy in a downstream three-class classification task. The results, obtained with high computational efficiency and a small dataset, demonstrate the practical feasibility of our system in real-world educational contexts.
... Jelasnya, isyarat gerakan tangan pensyarah boleh dilihat amat penting dalam membantu pelajar untuk mengikuti sesi pembelajaran dengan baik. Hasil kajian ini menyokong hasil kajian Tian dan Bourguet (2016) dan Zhang et al. (2018) yang mengkaji tentang isyarat gerakan tangan yang digunakan oleh pensyarah. ...
... Penemuan ini telah menyokong kajian Zhang et al. (2018) dan Wang et al. (2018) berkaitan perasaan suka untuk memahami topik semasa sesi pembelajaran dalam talian. Kombinasi kinesik iaitu gerakan tangan dan paralinguistik yang diamalkan oleh tenaga pengajar membantu pelajar untuk merasa tertarik dengan apa yang disampaikan. ...
Article
Full-text available
Studies related to non-verbal communication in virtual space need to be explored as a result of changes in communication processes that largely rely on online interaction due to the COVID-19 pandemic. Undoubtedly, it has extended the scope of understanding an individual’s virtual presence and the effectiveness of non-verbal communication practices. In this study, non-verbal communication is explored in the process of online teaching and learning. Social Presence Theory has been used in understanding the practice of lecturers to establish relationships through their virtual presence and build closeness with students during online teaching and learning. In-depth interviews were conducted with a total of ten students of Universiti Utara Malaysia (UUM). This study has found that lecturers can establish relationships and closeness with students through non-verbal communication cues such as kinesic, proxemic, chronemic, and paralinguistic. Lecturers who practice effective non-verbal communication enable a positive effect on students in terms of motivation to learn, focus in learning sessions, create interest in understanding topics, and feel at ease in learning. However, the lecturers' non-verbal communication has had a negative effect if the lecturer is unable to build a good relationship especially in terms of chronemic and facial expressions cues. It will cause students to be unmotivated and experience emotional stress. In conclusion, non-verbal communication is still vital in the process of establishing a social presence and building relationships even online. The practice of non-verbal communication during the individual social presence in virtual space needs to be explored in other contexts such as in organisations. Keywords: Non-verbal communication, online learning, qualitative, Social Presence Theory, Covid-19.
... In this study, a highly concentrated fixation point is found on the face of the instructor, which can be justified by the renowned theory of the human face attraction effect. These results are consistent with prior findings that the learner was attracted to human or human-like faces (Louwerse et al., 2009;Wang, Antonenko, et al., 2020;Yee et al., 2007;Zhang et al., 2018). ...
Article
Full-text available
Background Examining student attention in physical classrooms is crucial, but it faces challenges due to the lack of accurate monitoring. Constraints posed by device limitations and the design of educational materials impede the integration of eye‐tracking technology in these settings. Objectives This study aims to (1) develop a wearable eye‐tracking system specifically designed to monitor students' eye movements and gaze points on the projector screen within a physical classroom setting; (2) explore the impact of instructor gestures (by compare live instruction by an instructor and video‐recorded instruction) on student attention and examine the effectiveness of directing students' attention from text to image through instructor intervention. Methods An innovative wearable eye‐tracking system was developed to monitor learners' eye movements within the physical classroom. Twenty‐five students participated in the experiment, which included two approaches: classroom lectured by the instructor and by a video presentation. Results and Conclusions The results indicate that participants exhibit a stronger inclination to allocate additional time to text content than image content when receiving instruction through video presentations with a laser pointer in the physical classroom. This tendency can be attributed to the participants' requirement for longer reading and comprehension time in the absence of an instructor. Additionally, the instructor's gestures and body movements significantly impacted participants' fixation on text slides compared to the image slides. The heatmap analyses support these findings and further indicate that participants focus on the instructor's face rather than other body parts. Takeaways The wearable eye‐tracking technology developed in this study holds promise for future educational research, offering further exploration and analysis opportunities.
... However, another opposite prediction has been supported by the split-attention effect, which occurs if learners need to process multiple pieces of novel information [34]. In videotaped lectures, learning materials coupled with a teacher's face, particularly an emotional face, can lead to visual split attention because students must choose the more prominent content to focus on [35]. The stronger split-attention phenomenon has harmful effects on learners' performance. ...
Article
Full-text available
Although the discussion about the influence of the instructor-present videos has become a hot issue in recent years, the potential moderators on the effectiveness of an on-screen instructor have not been thoroughly synthesized. The present review systematically retrieves 47 empirical studies on how the instructors’ behaviors moderated online education quality as measured by learning performance via a bibliographic study using VOSviewer and meta-analysis using Stata/MP 14.0. The bibliographic networks illustrate instructors’ eye gaze, gestures, and facial expressions attract more researchers’ attention. The meta-analysis results further reveal that better learning performance can be realized by integrating the instructor’s gestures, eye guidance, and expressive faces with their speech in video lectures. Future studies can further explore the impact of instructors’ other characteristics on learning perception and visual attention including voice, gender, age, etc. The underlying neural mechanism should also be considered via more objective technologies.
... In fact, it accounted for a 41% average of learners' time during instruction (Kizilcec et al., 2014). The result is aligned with findings by Zhang et al. (2018), who reported that learners paid more attention to the face of the instructor (75.66%). Wang and Antonenko (2017) examined whether teacher presence can positively or negatively influence students' learning perceptions and performance when watching video lectures that feature varying content and difficulty levels. ...
Article
Full-text available
As the rampant pandemic witnessed significant growth in online learning, numerous studies focused on designing attractive videos to improve the quality of teaching. For an effective instructional video, whether a teacher should be present on the screen remained controversial. Therefore, the study conducted bibliometric analyses to review the previous evidence on the relationships between teacher presentation types and learning performance, attention distribution, and learning perceptions. The clustering results quantitatively proved that researchers paid more attention to the empirical studies on the effect of teacher presence. Both positive and negative influence of teacher presence on learning outcomes and perceptions has also been presented. Additionally, eye-tracking results showed that teachers attracted relatively more attention in teacher-present videos. Future studies should resort to more objective methods to address the controversy and emphasize the value of the individual difference to avoid one-size-fits-all presentation types, thus enhancing the quality of videos.
Chapter
This research proposes a deep neural network architecture for detecting focal periods for online students in lecture archives. Due to the COVID-19 pandemic, most universities turned to online education instead of traditional classrooms. However, long lecture archives, which are simply recordings of face-to-face lectures, make it difficult for students to sustain attention. Hence, identifying the focal periods of lecture archives is essential to maintaining educational effectiveness in such a situation. This research divides high-quality, fixed-camera-angle lecture archives into one-minute segments, counts how many times students accessed each segment through the LMS as label data, and uses these counts to define the students' focal periods. We then demonstrate deep neural network architectures with combined features to improve detection reliability. Our experiments showed that the proposed method could detect focal periods with 56.8% accuracy. Although there is room for improvement in accuracy, the method can detect certain focal periods with a small amount of computation and without using semantic features.
Chapter
Gestures and speech modalities play potent roles in social learning, especially in educational settings. Enabling artificial learning companions (i.e., humanoid robots) to perform human-like gestures and speech will facilitate interactive social learning in classrooms. In this paper, we present the implementation of human-generated gestures and speech on the Pepper robot to build a robotic teacher. To this end, we transferred a human teacher's gestures to a humanoid robot using a web camera and a Kinect camera, applying a video-based markerless motion capture technology and an observation-based motion mirroring method. To evaluate the retargeting methods, we presented different types of humanoid robotic teacher to six teachers and collected their impressions of the practical usage of a robotic teacher in the classroom. Our results show that the presented AI-based open-source gesture retargeting technology was found attractive, as it gives teachers the agency to design and employ the Pepper robot in their classes. Future work entails evaluating the usability of our solution with the stakeholders (i.e., teachers).
Conference Paper
Full-text available
This paper describes a small experimental study into the relationship between the hand gestures performed by lecturers and the pedagogical significance of the corresponding parts of the lecture. Body movements have long been known to play an important role in communication, especially when teaching. The characterisation of such a relationship could be important to predict the pedagogical significance of parts of lectures in order to support a more effective delivery of online lectures (e.g., video summarisation). Videos of five lectures from different subjects were collected and the occurrence and frequency of the lecturers' hand gestures were carefully annotated together with the meaning of the gestures. A survey among 82 students was then conducted in order to establish a relationship between the annotated gestures and the pedagogical significance of their accompanying speech. It was found that three types of gesture ("pointing", "circle" and "ball") indicate that the corresponding lecture part is of particular pedagogical significance.
Article
Full-text available
Research suggests that learners will likely spend a substantial amount of time looking at the model's face when it is visible in a video-based modeling example. Consequently, in this study we hypothesized that learners might not attend in a timely manner to the task areas the model is referring to, unless their attention is guided to such areas by the model's gaze or gestures. Results showed that the students in all conditions looked more at the female model than at the task area she referred to. However, the data did show a gradual decline in the difference between attention toward the model and the task as a function of cueing: students who observed the model gazing and gesturing at the task looked the least at the model and the most at the task area she referred to, while those who observed the model looking straight into the camera looked the most at the model and the least at the task area she referred to. Students who observed a human model only gazing at the task fell in between. In conclusion, gesture cues in combination with gaze cues effectively help to distribute attention between the model and the task display in our video-based modeling example.
Conference Paper
Full-text available
While instructional videos are an increasingly popular way to deliver content, the use of video analytics is in its infancy. This project used an eye tracker to record learners while they were shown a slide presentation alongside a video of the instructor. Learners frequently switched between the video and the slides, on average every 2.4 seconds. Speakers who gestured, repeated keywords, or began new points often sparked these switches. The switch patterns were evaluated using Multimedia Learning Theory to determine what they might reveal about the cognitive load the presentation places on the learner. The switches appear to show synchronization attempts (or failures) between the speaker and the multimedia, and yield insight into managing optimal cognitive load. Interviews clarified these eye-tracking results, and together they confirm and extend several helpful guidelines for multimedia use in IT-related instructional videos.
Conference Paper
Full-text available
Video-Based Learning (VBL) has a long history in educational design research. In the past decade, interest in VBL has increased as a result of new forms of online education, such as flipped classrooms and, most prominently, MOOCs. VBL has unique features that make it an effective Technology-Enhanced Learning (TEL) approach. This study critically analyzed VBL research published in 2003-2013 to build a deep understanding of the educational benefits and effectiveness of VBL for teaching and learning. 67 peer-reviewed papers were selected for this review and categorized into four main dimensions: effectiveness, teaching methods, design, and reflection. In light of the discussion of current research in terms of these categories, we present a future vision and research opportunities for VBL that support self-organized and networked learning.
Article
Full-text available
Over the past few years, observers of higher education have speculated about dramatic changes that must occur to accommodate more learners at lower costs and to facilitate a shift away from the accumulation of knowledge to the acquisition of a variety of cognitive and non-cognitive skills. All scenarios feature a major role for technology and online learning. Massive open online courses (MOOCs) are the most recent candidates being pushed forward to fulfill these ambitious goals. To date, there has been little evidence collected that would allow an assessment of whether MOOCs do indeed provide a cost-effective mechanism for producing desirable educational outcomes at scale. It is not even clear that these are the goals of those institutions offering MOOCs. This report investigates the actual goals of institutions creating MOOCs or integrating them into their programs, and reviews the current evidence regarding whether and how these goals are being achieved, and at what cost.
Article
Recent research on video lectures has indicated that the instructor's pointing gestures facilitate learning performance. This study examined, first, whether the instructor's pointing gestures were superior to nonhuman cues in enhancing learning from video lectures and, second, if there was a positive effect, what its underlying mechanisms might be. There were three kinds of video lectures in the study: one with the instructor's pointing gestures, one with nonhuman cues, and one without any cues. Eighty-four Chinese undergraduates were randomly assigned to view one of the three video lectures in a laboratory. As hypothesized, analyses of variance showed that the instructor's pointing gestures improved learning performance more than the nonhuman-cues and no-cues conditions did. The pointing gestures directed the learners' visual attention to the relevant learning content of the PowerPoint (PPT) slides in the video lecture. This suggests that the instructor's pointing gestures can be a valuable means of improving learning performance in video lectures, particularly for PPT slides containing a large amount of information.
Conference Paper
In the context of MOOCs, "With-me-ness" refers to the extent to which the learner succeeds in following the teacher, specifically in terms of looking at the area in the video that the teacher is explaining. In our previous work, we employed eye-tracking methods to quantify learners' With-me-ness and showed that it is positively correlated with their learning gains. In this contribution, we describe a tool designed to improve With-me-ness by providing a visual aid superimposed on the video. The position of the visual aid is suggested by the teacher's dialogue and deixis, and it is displayed when the learner's With-me-ness falls below the average value computed from the other students' gaze behavior. We report on a user study that examines the effectiveness of the proposed tool. The results show that it significantly improves learning gain and significantly increases the extent to which the students follow the teacher. Finally, we demonstrate how With-me-ness can provide a complete theoretical framework for conducting gaze-based learning analytics in the context of MOOCs.
Article
For hundreds of years verbal messages - such as lectures and printed lessons - have been the primary means of explaining ideas to learners. In Multimedia Learning Richard Mayer explores ways of going beyond the purely verbal by combining words and pictures for effective teaching. Multimedia encyclopedias have become the latest addition to students' reference tools, and the world wide web is full of messages that combine words and pictures. Do these forms of presentation help learners? If so, what is the best way to design multimedia messages for optimal learning? Drawing upon 10 years of research, the author provides seven principles for the design of multimedia messages and a cognitive theory of multimedia learning. In short, this book summarizes research aimed at realizing the promise of multimedia learning - that is, the potential of using words and pictures together to promote human understanding.