Towards Understanding Perceptual Differences between Genuine and Face-Swapped Videos
Leslie Wöhler
woehler@cg.cs.tu-bs.de
Institut für Computergraphik, TU Braunschweig
Braunschweig, Germany
Susana Castillo
castillo@cg.cs.tu-bs.de
Institut für Computergraphik, TU Braunschweig
Braunschweig, Germany
Martin Zembaty
m.zembaty@cg.cs.tu-bs.de
Institut für Computergraphik, TU Braunschweig
Braunschweig, Germany
Marcus Magnor
magnor@cg.cs.tu-bs.de
Institut für Computergraphik, TU Braunschweig
Braunschweig, Germany
Figure 1: We estimate perceptual differences between genuine and manipulated videos. Towards that goal, we use face swaps as stimuli and perform three types of experiments focusing on eye tracking, different video durations, and the assessment of emotions.
ABSTRACT
In this paper, we report on perceptual experiments indicating that there are distinct and quantitatively measurable differences in the way we visually perceive genuine versus face-swapped videos. Recent progress in deep learning has made face-swapping techniques a powerful tool for creative purposes, but also a means for unethical forgeries. Currently, it remains unclear why people are misled, and which indicators they use to recognize potential manipulations. Here, we conduct three perceptual experiments focusing on a wide range of aspects: the conspicuousness of artifacts, the viewing behavior using eye tracking, the recognition accuracy for different video lengths, and the assessment of emotions.
Our experiments show that responses differ distinctly when watching manipulated as opposed to original faces, from which we derive perceptual cues to recognize face swaps. By investigating physiologically measurable signals, our findings yield valuable insights that may also be useful for advanced algorithmic detection.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
CHI ’21, May 8–13, 2021, Yokohama, Japan
© 2021 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-1-4503-8096-6/21/05...$15.00
https://doi.org/10.1145/3411764.3445627
CCS CONCEPTS
• Human-centered computing → Human computer interaction (HCI); User studies; • Computing methodologies → Perception; Image manipulation.
KEYWORDS
video manipulation, human perception, eye tracking, face swapping
ACM Reference Format:
Leslie Wöhler, Martin Zembaty, Susana Castillo, and Marcus Magnor. 2021. Towards Understanding Perceptual Differences between Genuine and Face-Swapped Videos. In CHI Conference on Human Factors in Computing Systems (CHI ’21), May 8–13, 2021, Yokohama, Japan. ACM, New York, NY, USA, 13 pages. https://doi.org/10.1145/3411764.3445627
1 INTRODUCTION
Following recent technological advances, facial manipulations in videos are becoming ubiquitously integrated into our everyday lives, appearing in advertisements, movies, and on social media platforms. Especially face swap videos, where faces of celebrities have been exchanged, have recently received media attention [15]. While these face swaps are an extremely powerful tool in creative fields and the entertainment sector, they also pose a potential threat to society. By exchanging the original face in a video with that of a different target person, offensive actions or illicit behaviour can be attributed to arbitrary people. One field with enormous potential to be abused is politics, where face swaps are feared to increase mistrust in politicians and parties, which may drive voters towards more radical groups [18, 63, 73]. Furthermore, due to the ease of use and public availability of face-swapping frameworks [16, 17, 23], their application is not limited to specialized users. Therefore, realistic
face swaps can be generated by any individual and applied to social
contacts, e.g., in the scenario of cyber-bullying. Considering these
negative social implications, a fundamental understanding of human perception of face swaps as well as reliable detection methods for
facial manipulations are required.
The Computer Vision community is constantly in pursuit of new
methods to improve the generation and detection of manipulated
video content. While most of the detection systems are specically
tailored towards image analysis and the detection of artifacts [
44
,
67
,
70
], it is still unclear why and when people are misled by or
successfully able to recognize face swaps. As humans are inherently
very sensitive to faces and usually excel at facial recognition tasks,
a deeper understanding of this phenomenon would be an important
extension to current detectors.
In this paper, we aim to gain knowledge on the perception of face swap videos and investigate which features or artifacts are important for humans to recognize the forgeries. The resulting insights can help to raise the awareness of viewers regarding common artifacts in face swaps. Moreover, we are looking into measurable physiological responses which can be used to aid automatic detection tools. To achieve this, we conduct different experiments in order to understand the perception of face swaps. First, we investigate eye tracking, which can be used to assess different types of artifacts [22, 62] and is a measurable response on various devices [28, 40, 72]. Therefore, it could be integrated into face swap detection frameworks. In our experiment, we aim to obtain insights into differences in viewing behaviour between real and face-swapped videos. We further want to assess which facial areas are most important for the visual detection of manipulations, as this may give hints towards noticeable and distracting artifacts. As a second step, we look at the influence of video duration on participants' detection accuracy. We hypothesize that the longer a video is, the more information the viewer gets, which potentially leads to a higher chance to notice manipulations. Finally, we move to a field that is still challenging to assess for computers but easy for humans: conveyed emotions and expressions [71]. We are mainly interested in whether face swaps are able to convey the same message as the corresponding real videos used to generate the facial movements. We assess this by looking at differences in the recognition as well as intensity and sincerity ratings of emotions and expressions between face swaps and real videos. Based on this data, we obtain a better estimate of the possible impact of face swaps than by just looking at their conspicuousness. We further investigate if participants' ratings indicate a general mismatch between the conveyed emotions of real and face swap videos.
In summary, we contribute answers to the following research questions:
• How is gaze behaviour impacted by face swaps? Can eye tracking be used to detect facial manipulations?
• Does the length of video clips influence participants' assessment accuracy?
• Are conveyed emotions and expressions different between face swaps and genuine videos?
2 RELATED WORK
In this section, we briefly discuss current techniques for the creation and detection of facial manipulations as well as perceptual factors in the context of facial processing.
Facial Manipulation and Detection Techniques. Among the existing facial manipulation techniques, this paper focuses on face-swapping. This technique applies a face from one video to another video while keeping the original body and expressions. Many algorithms have been proposed for face-swapping using deep learning techniques. While most approaches require training on both target and source subjects [39, 54], recent work also introduced a face-agnostic method [53].
The fast progress of face-swapping techniques has led to a high interest in facial manipulation detection methods. Many methods focus on detecting errors and artifacts produced by deep learning-based facial manipulation techniques [2, 27, 44, 68, 70]. Another line of work has analyzed the specific properties of human facial motion and physiological measurements to detect mismatches in tampered videos. This way, unnatural blinking behaviour [43] and the heart rate of actors [24] were analyzed to detect manipulations. Further, it was proposed to learn person-specific facial motion cues to reliably detect manipulations of specific subjects [3]. While all these methods are based on human behaviour and biology, none of them directly uses feedback from the observer of the manipulated videos. Thus, in this paper we aim to incorporate perceptual insights into the detection pipeline using eye tracking.
Perception of Faces. Faces gather most of our attention in social situations, as we rely heavily on them to recognize and assess information and emotions. Therefore, humans are highly specialized in processing and analyzing faces [60].
In order to gain insights into the underlying processes of facial recognition and exploration, eye tracking has been applied to facial images and portrait videos. Early research showed that the eye, nose, and mouth regions are fixated in facial images [34, 48]. Especially the eyes were found to draw the viewer's attention [6]. The gazing behaviour in facial images is, however, influenced by various factors like the gender [46], the presence of artifacts [7], or the familiarity with the shown face [4, 64]. Similarly, the emotions displayed on a face [11, 20, 42] and the task can influence the viewing behaviour of participants [8, 9, 41, 56]. It was additionally found that viewing behaviour is aligned to motion [50] and varies based on the performed actions like talking or establishing eye contact [61]. Finally, with the availability of speech and audio, more fixations occur on the mouth region, in contrast to muted videos [65].
Eye tracking. Analyzing gaze has not only been employed in facial processing research, but also to assess artifacts in videos. Thereby, previous research [10, 13, 22, 62] found that artifacts attract the gaze of observers. As face swapping introduces artifacts in different areas of the face, we hypothesize that participants will look at artifacts and that their gaze differs between real and swap conditions. Recently, the idea of exploring participants' gaze in deepfake videos has arisen, indicating general interest in this topic [29]. In contrast to our work, the authors only use partial face swaps in unrestricted environments. Furthermore, their analysis focuses only on eye
Figure 2: Close-up frames for the stimuli of PEFS [69] and three examples for the stimuli from FaceForensics [58] (bottom right) used in our experiments. Here, each frame contains an annotation of the gender (F or M) of the source actors, noted as body/face.
tracking statistics, like the number of fixations and their duration, without consideration of the fixated facial areas.
Considering the highly specialized viewing behaviour for faces and previous insights into the detection of artifacts via eye tracking, we investigate whether original and manipulated videos evoke differences in gaze which could be used in facial manipulation detection.
Emotions and Expressions. Next to the general recognition of faces, humans are also very good at recognizing emotions and expressions. During conversations, information is not only conveyed through speech but largely through facial expressions [47]. If the message conveyed by the words and the perceived emotion do not match, more emphasis is placed on the expression of the speaker [12]. In the case of face swaps, this means that it is of high importance that the expressions and emotions in the original video are retained to evoke the desired effect in the viewer. However, emotion perception is not only based on facial movements but also heavily influenced by contextual cues like the situation, body language, or cultural factors [5]. Moreover, people base their assessment of emotions and expressions on previous knowledge of the speaker [57]. For face swaps, this could produce a strong mismatch for viewers watching manipulated videos of people they know. As a first step, we set out to investigate whether facial expressions and emotions are perceived differently between untampered videos and their corresponding fake counterparts for unknown actors.
Face Swap Datasets. In order to unify research on the detection of AI-synthesized face swap videos, numerous facial manipulation datasets have been introduced. Among these, one contains short clips (<10 s) in a controlled environment [39], one uses publicly available videos of celebrities [45], and others focus on news shows [26, 58, 59]. As an interesting concept for perceptual research, Jiang et al. [35] included some recordings in a controlled environment in their dataset. However, this dataset is still clearly oriented towards the training and evaluation of neural networks for manipulation detection. In contrast to these works, the PEFS dataset [69] was specifically designed for perceptual research and contains partial human annotations towards the realism and artifacts of the stimuli. The videos from this dataset were recorded in a controlled environment with three camera angles, and feature various induced emotions and expressions as well as long video durations and different quality levels for the obtained face swaps. Therefore, we mainly use stimuli from the PEFS dataset in our experiments.
3 EXPERIMENTAL DESIGN
We first formulate our research hypotheses and then detail the design and procedure of our three experiments.
3.1 Hypotheses
Based on our initial research questions, we formalize several hypotheses regarding the perception of face swaps:
H1: Even without knowing about the occurrence of manipulations, participants are able to pick up artifacts in state-of-the-art face swap videos.
H2: Eye tracking data differs between real and manipulated videos.
H3: The length of videos has an effect on the recognition accuracy, as manipulations are easier to detect in longer videos.
H4: While face swaps can retain the recognizability of emotions, their intensity and sincerity can differ.
We discuss these hypotheses in detail in the following sections.
3.2 Experiment E1: Eye Tracking
In experiment E1, we recorded participants' eye movements while they watched real videos and face swaps in order to assess their viewing behaviour. After each trial, we additionally asked participants to report whether they noticed something regarding the video quality, like artifacts. As we are interested in their unbiased impressions and viewing behaviour, we did not inform them about the face swaps. In order to investigate the effect of artifacts and their conspicuousness, we used high quality and low quality face swaps.
Stimuli. For this experiment, it is crucial that the stimuli have uniform background and illumination (to keep the attention of the viewer on the actor), and that all used stimuli are consistent enough to compare viewing behaviour between videos. As previously mentioned, the restricted setup used in the recordings from the PEFS dataset [69] ensures the satisfaction of both conditions, in contrast to other available datasets. This dataset consists of muted
Figure 3: Artifacts reported by participants for high and low quality face swaps.
video portraits taken in a controlled environment where the actors are seated, one at a time, in front of a white wall and recorded while talking. We used the same 11 annotated videos of 60 seconds length (25 Hz) as in the experiments in the original paper, consisting of 2 female-female, 2 inter-gender, and 7 male-male face swaps. An overview of the selected face swaps can be seen in Fig. 2. We used the real videos as well as their corresponding high-quality and low-quality manipulations. Some differences between the quality of the stimuli are highlighted in Fig. 3.
Apparatus. We conducted the experiment using an EyeLink 1000 eye tracker by SR Research Ltd. with a sampling frequency of 1000 Hz, which was placed at 65 cm distance from the participants and performed monocular tracking of the right eye. The videos were displayed on a 47-inch screen (100 Hz, 1920 × 1080 pixels) positioned at a distance of 90 cm from a chin rest where participants placed their head. Before each session, the eye tracker was calibrated using a 9-point calibration and adjusted via drift correction between trials. The participants sat behind black curtains which prevented direct contact between them and the experiment conductor and ensured a darkened environment, avoiding external distractions.
Participants. We invited 40 participants to the experiment. The participants consisted of 22 females and 18 males with ages between 18 and 35, an average age of 23.15, and a standard deviation (SD) of 3.98. All of the participants were university students and came from the fields of computer science, psychology, engineering, and business studies. They could choose to either receive one course credit or 10 EUR as compensation for their participation. Participants were mostly of German nationality, with one Russian, one South African, and two Vietnamese nationals. Every participant reported normal or corrected-to-normal vision.
Procedure. The experiment used a counterbalanced design with full randomization, with the real-manipulation pairs and the video quality as between-participants factors.
First, participants lled out a demographic questionnaire and an
informed consent form. Afterwards, they were instructed about the
general experimental setup. We briey explained the eye tracker
workings and that videos would be displayed during the experiment.
Participants were informed that we were especially interested in
observations regarding the video quality. They were, however, not
informed of the purpose of the experiment and the usage of face-
swapping.
Participants were instructed to sit on a chair with as little movement as possible so as not to compromise the eye tracking. Additionally, we asked them not to speak while viewing the videos. Before a video was displayed, a fixation cross appeared in the center of the screen for three seconds. After each video, the question 'Did you notice anything in the video?' was displayed, which the participants answered via oral free description; the answers were transcribed by the conductor. The procedure was repeated for 11 trials, randomly selecting either five real and six swapped videos or vice versa. During randomization, we made sure that no participant would see a manipulated video and its original counterpart.
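The constraint that no participant sees both versions of a real-manipulation pair can be sketched as a short randomization routine. This is a minimal illustration with hypothetical names, not the actual study scripts:

```python
import random

def assign_trials(video_pairs, n_swaps, rng):
    """Select the trial videos for one participant such that each
    real-manipulation pair contributes exactly one version.

    video_pairs -- list of (real_video, swap_video) tuples (hypothetical names)
    n_swaps     -- number of swapped videos for this participant (5 or 6)
    rng         -- a random.Random instance for reproducible shuffling
    """
    indices = list(range(len(video_pairs)))
    rng.shuffle(indices)
    # The first n_swaps pairs are shown as swaps, the rest as originals;
    # since each pair index appears only once, no participant can see
    # both a manipulated video and its original counterpart.
    swap_indices = set(indices[:n_swaps])
    return [video_pairs[i][1] if i in swap_indices else video_pairs[i][0]
            for i in indices]
```

The trial order is given by the shuffled indices themselves, so the real/swap decision and the presentation order come from a single permutation.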
After all trials, we performed a debriefing with the participants. We first asked about their general impressions of the faces in the videos in order to gain further knowledge about suspected manipulations and conspicuous artifacts. Finally, we explained the concept of face-swapping, asked whether they knew about this concept, and
Figure 4: Example frames for the stimuli used in experiment E3 to assess the recognition accuracy as well as intensity and
sincerity ratings for original and face swap videos.
informed them about the purpose of our experiment. The average
duration of the experiment was around 30 minutes.
3.3 Experiment E2: Video Length
In our next experiment, we focused on the accuracy of our participants at detecting face swap videos of different durations.
Stimuli. Given the more general and less restrictive nature of the research question we want to address in this experiment, we increased the number of stimuli and considered several sources. This allows a more general look at the quality of state-of-the-art face swaps. Next to the 11 face swaps used in E1, we selected four more swaps from PEFS [69]. For each swap, we included sequences from both the frontal and the right viewing angle. For this experiment, we only used high-quality manipulations and their matching real videos. We further included 16 face swaps from FaceForensics [58], which contain short clips of news shows with one news anchor. We chose these stimuli manually, aiming to include only face swaps of high quality. Example frames for the stimuli are shown in Fig. 2. Finally, we cut each video to a length of either 3, 5, 10, 30, or 60 seconds to obtain a good sample of time spans. Hereby, the same length is used for each face swap and the corresponding real video.
Participants. We performed the experiment using Amazon Mechanical Turk. Participants were compensated with 1 USD and were from the United States, India, and Brazil. Overall, 40 participants (age range: 21-53) took part in this experiment.
Procedure. Before the experiment, participants were educated about face swaps and their task of detecting the face-swapped videos. Each trial consisted of a video stimulus and one question with two possible answers. The video playback started automatically and could not be paused or repeated. Afterwards, the video was removed and instead the question 'Was this video manipulated?' was displayed. It appeared together with the two answer options real/manipulated as a two-alternative forced-choice task (2AFC). We performed this experiment using a counterbalanced design with full randomization and the real-manipulation pairs as the between-participants factor, making sure that no participant saw both a face swap and its corresponding real version. Each participant completed a total of 44 trials which were selected at random. The experiment took around 15 minutes to complete.
3.4 Experiment E3: Emotion Assessment
In experiment E3, we aim to assess whether the conveyed emotions and expressions of face swaps differ from their real counterparts by looking at their recognition accuracy and ratings for intensity and sincerity.
Stimuli. In contrast to other datasets, one part of the recordings in PEFS [69] focuses on evoking emotions in the actors using a method acting protocol [37]. We used a set of these recordings in which the emotions and expressions have been isolated into short clips. Each of the clips is labeled with the emotion which was evoked in the actor. The set consists of 9 face swaps from the PEFS dataset (2 female-female, 2 inter-gender, 5 male-male), see Fig. 2. Our selection of stimuli was based on primary emotions and conversational expressions, using 5 emotions proposed by Ekman [21] (Happiness [Hap], Sadness [Sad], Anger [Ang], Disgust [Disg], Surprise [Sur]) and 4 conversational expressions [14] (Agree [Agr], Disagree [Disa], Thinking [Thi], and Clueless [Clu]). We showed these together with Neutral [Neu] for reference, totaling 10 expressions. This way, our main consideration was on expressions and emotions likely to occur in everyday conversations [14].
In this experiment, we only used high quality face swaps, as these are highly relevant due to their presence in modern entertainment media and their usage potential for defamatory content. Additionally, the
Figure 5: E1: Average correct assessment percentages among videos and participants for each condition (left). A high rate indicates that videos were correctly reported as either artifact-free (Real) or containing artifacts/manipulations (Swap). Error bars represent the standard error of the mean (SEM). Reported artifact occurrences for all videos (middle). Reported face manipulations included face swaps, partial face alterations, and beauty filters. Facial areas reported to be affected by artifacts over all videos (right). Please note that the color legend is common to all plots.
high quality stimuli contain fewer distracting artifacts, allowing participants to focus on the expressions and movements in the videos. Overall, our stimuli set consisted of a variety of 10 emotions/expressions per video for 9 face swaps and 9 corresponding real videos, leading to a total of 180 stimuli with an average duration of 3 seconds per video. Example frames of the stimuli can be found in Fig. 4.
Participants. We performed the experiment online, gathering participants via university mailing lists. We recruited 21 students who received a compensation of 10 EUR for the completion of the experiment. Participants were between 19 and 33 years old with an average age of 25 years (SD = 3.81). One participant was of Vietnamese nationality, the rest were German. The balance of genders was nearly equal, with 10 male and 11 female participants.
Procedure. The experiment began with an explanation of the task of assessing emotions in video portraits. We further explained that facial manipulations were applied in some of the videos; however, this should not be taken into account when rating sincerity. After participants read the instructions, we asked for their nationality, gender, and age. During the main experiment, participants started the playback of a video by the press of a button. The video was only played once and no interaction (pausing/rewinding) was possible. In order to avoid the analysis of a still frame, we removed the video after playback. The participants had to choose one option from a list containing all emotions and expressions included in the experiment (10AFC task). Afterwards, the participants had to rate the intensity and sincerity of the shown emotion/expression on a 7-point Likert scale (1 indicating extremely low and 7 extremely high). All trials were chosen in a fully randomized manner. In contrast to the previous experiments, here we used a full within-participants design regarding stimuli. Overall, the experiment took around 45 minutes.
4 ANALYSIS
In this section, we evaluate our hypotheses based on the data obtained in the experiments. In the following, we refer to the different conditions as follows: HQ for high-quality stimuli, with RealHQ and SwapHQ for the real and manipulated videos; and LQ for the low-quality stimuli, with RealLQ and SwapLQ. Note that the real videos are identical; however, participants' responses may be biased by the difference in manipulated stimuli and, therefore, we analyze them independently.
4.1 H1: Detection of Artifacts in Face-Swapped Videos
In our first hypothesis, H1, we pose that participants are able to spot artifacts in state-of-the-art face swap videos, even if they are not informed about the applied manipulations beforehand. We used a free description task in E1, asking participants to report anything they noticed about the video quality. Thereby, we avoided biasing the participants and can explore whether uninformed participants suspect facial manipulations and which kinds of artifacts are noticeable without pointing the participants towards them. Please note that the nature of the task determines that there are no fixed answers, and no fixed number of possible occurrences for an answer aside from the maximum of number of videos × number of participants per condition = 110.
For our analysis, we exclusively consider artifacts reported in the facial region, as only these could be introduced by the utilized face-swapping approach. Participants sometimes stated which areas of the face were affected by artifacts, as shown in Fig. 5 (right). The most commonly reported artifacts were blur, facial manipulations (including face swaps, partial face alterations, and beauty filters), unnatural expressions or eye movements, and contour artifacts, see Fig. 5 (middle). Two participants did not report any artifacts on the faces.
Figure 6: Exemplar frame with the areas of interest (Eyes, Mouth, Nose, Contour) used to analyze the eye tracking fixations.
Using this data, we compute the rate of correctly assessed videos for each condition, as shown in Fig. 5 (left). Real videos were seldom reported to contain artifacts and therefore have a high correct assessment rate. Meanwhile, it is noticeable that the correct assessment rate of SwapHQ trials is under 50% of the videos (Mean = 44.44%, standard error of the mean (SEM) = 4.78). As participants therefore only reported artifacts in about half of the videos, face swaps created by publicly available frameworks seem to mislead many viewers. In contrast, 83.49% (SEM = 3.56) of the SwapLQ trials were reported to contain artifacts, demonstrating their noticeably lower quality. This is further confirmed by an analysis of the correct reports yielding significant differences between conditions (ANOVA F(3, 2636) = 189.934, p < 0.00001).
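The per-condition assessment rates and the one-way ANOVA over trial-level correctness can be computed along the following lines. The correctness codes below are made-up placeholders standing in for the transcribed responses, and SciPy's `f_oneway` is assumed for the test statistic:

```python
from scipy import stats

def correct_rate(assessments):
    """Percentage of trials correctly assessed (1 = correct, 0 = incorrect)."""
    return 100.0 * sum(assessments) / len(assessments)

# Hypothetical per-trial correctness codes for each condition; the real
# values come from the transcribed free-description answers.
conditions = {
    "RealHQ": [1, 1, 1, 0, 1, 1, 1, 1],
    "SwapHQ": [0, 1, 0, 0, 1, 0, 1, 0],
    "RealLQ": [1, 1, 0, 1, 1, 1, 1, 1],
    "SwapLQ": [1, 1, 1, 0, 1, 1, 1, 0],
}
rates = {name: correct_rate(codes) for name, codes in conditions.items()}

# One-way ANOVA over the trial-level correctness codes, analogous to the
# F(3, 2636) test reported above (with the real data, the groups hold all
# trials per condition rather than eight placeholders).
f_stat, p_value = stats.f_oneway(*conditions.values())
```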
These results are especially interesting in comparison with the annotations of the PEFS stimuli [69]. In their experiment, the authors showed participants the same 11 face swaps and real videos used in our experiment, but informed them beforehand of the existence of face swaps and had them decide whether a video was real or manipulated (2AFC task). While the assessments of their participants are overall similar to ours, they differ in the SwapLQ condition. In their experiment, 47% of participants reported these videos to be face swaps, but in our experiment 83.49% (SEM = 3.56) noticed facial artifacts. This may indicate that even though participants consciously perceive artifacts in low quality face swaps, they still believe them to stem from the video quality and do not attribute them to face-swapping. In our experiment, full or partial exchange or manipulation of facial features was suspected only seven times for SwapHQ and three times for RealHQ videos. In contrast, manipulated faces were reported 31 times for SwapLQ and six times for RealLQ (out of 110 possible occurrences). These results indicate that participants are able to detect facial artifacts in high quality face-swapped videos; however, they only seldom become suspicious of the face swaps.
4.2 H2: Dierences in Viewing Behaviour
between Real and Swapped Videos
In our second hypothesis, H2, we posed that the viewing behaviour
of participants diers between originals and face swaps as gaze is af-
fected by artifacts. As these are not bound to a specic location [
69
],
H2 is exploratory.
We base our evaluation of the eye tracking data on areas of interest (AOIs). Similar to previous research [30], we generate the AOIs automatically using facial landmark detection. We first extract 68 facial landmarks for each frame using Python's Dlib library [38] and afterwards group these landmarks into facial areas (Eyes, Mouth, Nose, Contour), see Fig. 6. Each fixation was then assigned to the nearest landmark, to the background, or to the face outside of the AOIs. Afterwards, we compute the cumulative duration of fixations that fall inside each AOI, i.e., the time spent looking at each area. A visualization of the average time spent looking at each facial area, over all videos and participants, is shown in Fig. 7 (left).
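The AOI pipeline described above (landmark positions, nearest-landmark assignment, cumulative dwell times) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the exact grouping of the 68 Dlib landmarks into the four areas and the background-distance threshold are assumptions, and the landmark positions are taken as precomputed input (in practice they would come from Dlib's shape predictor [38]).

```python
import numpy as np

# Landmark indices of the 68-point Dlib scheme grouped into the four
# areas of interest. The grouping (e.g., counting the brows as part of
# the Eyes area) is an assumption, not the paper's exact split.
AOIS = {
    "contour": list(range(0, 17)),
    "eyes": list(range(17, 27)) + list(range(36, 48)),  # brows + eyes
    "nose": list(range(27, 36)),
    "mouth": list(range(48, 68)),
}

def fixation_dwell_times(fixations, landmarks, max_dist=80.0):
    """Assign each fixation to the AOI of its nearest landmark and
    accumulate fixation durations per AOI.

    fixations: iterable of (x, y, duration_ms) tuples for one frame
    landmarks: (68, 2) array of landmark positions for that frame
    max_dist:  fixations farther than this (in pixels) from every
               landmark count as 'background' (threshold is assumed)
    """
    dwell = {name: 0.0 for name in AOIS}
    dwell["background"] = 0.0
    pts = np.asarray(landmarks, dtype=float)
    for x, y, dur in fixations:
        d = np.linalg.norm(pts - np.array([x, y]), axis=1)
        idx = int(np.argmin(d))
        if d[idx] > max_dist:
            dwell["background"] += dur
            continue
        for name, members in AOIS.items():
            if idx in members:
                dwell[name] += dur
                break
    return dwell
```

Summing these per-trial dictionaries over all frames of a video yields the cumulative time spent looking at each area, as plotted in Fig. 7 (left).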
During our statistical analysis, we use a repeated measures ANOVA (RMANOVA) to compare fixations within one condition, with the AOIs as the within-subject factor. We check for sphericity using Mauchly's test and correct the results with Greenhouse-Geisser. Afterwards, we use pair-wise Bonferroni-corrected t-tests for the post hoc analysis. For comparisons between conditions, we use a Welch's ANOVA, as the assumption of homogeneity of variance was violated (Levene's test), and Tukey as a post hoc test to investigate all pairwise differences with Bonferroni-corrected p-values.
Gaze Behaviour based on Areas of Interest. We first analyze the distribution of fixations on eyes, mouth, and nose within each condition, as these are most important for facial processing in images and videos [34, 48, 65]. For both types of real videos as well as for SwapLQ, the attention of participants seems to be equally balanced between nose, mouth, and eyes, with no significant differences in their distribution (RMANOVA all F's < 0.5, all p's > 0.05).
In contrast to this, fixations are not equally distributed for the SwapHQ condition. In this condition, participants focused more on the mouth and nose and less on the eyes (RMANOVA F(2, 175) = 3.56, p = 0.04). Testing all pairs of fixation areas for the SwapHQ condition using pair-wise Bonferroni-corrected t-tests yields no significant differences (all p's > 0.087). Based on these results, we can conclude that the distribution of fixations in videos differs between face swaps and original videos. In particular, a lower amount of fixations on the eyes indicates manipulations in high quality videos.
As a next step, we want to assess whether participants exhibit a different viewing behaviour between manipulated and original videos. For this, we include fixations on the contour, as we observed artifact reports for this area, see Fig. 5 (right). In this scenario, the video conditions as well as the AOIs are dependent variables; therefore, we first assess the data with a multivariate ANOVA, which yields a significant result (F(15, 1302) = 2.48, p = 0.0013). To afterwards analyze differences between the conditions on AOIs, we perform a Welch's ANOVA. We find significant differences for the fixations on the mouth between all conditions (F(3, 240) = 3.25, p = 0.022). This also holds true for the fixations on the contour region (F(3, 242) = 3.86, p = 0.01).
This result is also visible in Fig. 7 (left), where contours are focused more for SwapLQ and the mouth for SwapHQ trials. The bias of fixation towards the facial contours in SwapLQ could be attributed to contour artifacts (example shown in Fig. 3), which were reported more often in this condition than in the others. While contour artifacts were less often reported than eye and mouth artifacts, they seem to be more salient. This may indicate that not all types of artifacts are equally fixated. Therefore, the saliency of artifacts occurring in face swaps may not only be based on their visibility but also on their spatial occurrence.
Viewing behaviour based on reported artifacts. We next look at the fixation times based on the self-assessments in order to analyze which regions contribute most to the report of artifacts, see Fig. 7. The right plot shows trials without artifact reports.
Figure 7: E1: Mean fixation times per area of interest (left). Fixations for trials in which participants reported artifacts (middle). Fixations for trials without artifact reports (right). Error bars represent the SEM, and the color legend applies to all plots.
Following the results on the analysis of fixations, we compare the change in fixations between trials with and without artifact reports with a Tukey test. For the RealLQ condition, if artifacts were reported, participants focused more on the face (p = 0.0142; reported: M = 5297.26, SEM = 495.89; unreported: M = 860.0, SEM = 256.9). This may indicate that the face was analyzed more as participants were looking for artifacts previously encountered in the low quality manipulations. For the SwapLQ condition, trials with artifact reports show more fixations on the contours (p = 0.04; reported: M = 15531.3, SEM = 1121.56; unreported: M = 9740.0, SEM = 2631.97) and mouth (p = 0.045; reported: M = 11848.26, SEM = 663.48; unreported: M = 8368.89, SEM = 1894.8). This seems reasonable, as artifacts were reported often for these areas. The eyes are not focused more in reported trials, even though they were as often reported to show artifacts as the mouth region. Furthermore, participants generally looked less at the face for unreported trials, indicating that they missed artifacts while watching different areas of the video (p = 0.000225; unreported: M = 3651.3, SEM = 352.98; reported: M = 7062.22, SEM = 911.96). In contrast, no significant difference between reported and unreported trials is found for the SwapHQ condition (all p's > 0.09). This is aligned with the overall reduced number of reported artifacts.
Considering the distribution of fixations on nose, mouth, and eyes for SwapHQ separately for reported and unreported artifacts yields a significant difference only for trials with artifact reports (Welch's F(2, 95) = 3.71, p = 0.028). This means we can only detect a shift in viewing behaviour in trials where artifacts were reported consciously.
4.3 H3: Higher Detection Accuracy for Longer Videos
We formulated the directed hypothesis H3, stating that a video's length and correct assessment should be positively correlated, as the amount of artifacts increases with the length of face swaps. This is based on previous research stating that the duration of stimuli affects both attention [19, 31] and performance on artifact detection [49].
Therefore, we visualize the assessment accuracy of E2 based on the length of stimuli in Fig. 8. The plot shows that the overall recognition rate for original videos of both datasets is rather constant, while face swaps show slight variations. (Please note that FaceForensics consists only of short clips under 60 seconds and was therefore excluded from the condition with that length.) However, we find no significant differences in correct assessments between the different video lengths (Welch's F(3, 15.6) = 1.1, p = 0.379). This may indicate that participants decide early whether a video is real or not.
Figure 8: E2: Assessment accuracy (in percent) for all video durations for both datasets. Error bars indicate the SEM. Please note that FaceForensics focuses on short video clips and was therefore not included in the 60 second stimuli.
As this was an interesting observation for us, we also decided to look into the eye tracking data of E1 for different time spans. To investigate if there is a specific gaze pattern early in the videos, we look at the fixations in the first 3, 5, 10, and 30 seconds as well as the full 60 seconds, based on AOIs. The bias we found in the SwapHQ condition (see Sec. 4.2), where participants focused more on mouth and nose and less on the eyes, is present in all of these time frames (RMANOVA 3s: F(2, 202) = 6.78, p = 0.002; 5s: F(2, 195) = 5.2, p = 0.008; 10s: F(2, 189) = 5.64, p = 0.006; 30s: F(2, 176) = 4.02, p = 0.027; 60s: F(2, 175) = 3.56, p = 0.04). This indicates that the bias may be independent of the video length.
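The windowed analysis can be reproduced by clipping fixations to the first few seconds of a trial before accumulating dwell times. A small sketch of this step (our own illustration; the fixation tuple format and the choice to truncate fixations straddling the window edge, rather than dropping them, are assumptions):

```python
def dwell_in_window(fixations, window_s):
    """Cumulative dwell time per AOI within the first `window_s`
    seconds of a trial.

    fixations: iterable of (onset_ms, duration_ms, aoi_name) tuples
    Fixations starting after the window are ignored; fixations that
    straddle the window edge are truncated at the edge.
    """
    limit = window_s * 1000.0
    dwell = {}
    for onset, dur, aoi in fixations:
        if onset >= limit:
            continue
        clipped = min(dur, limit - onset)
        dwell[aoi] = dwell.get(aoi, 0.0) + clipped
    return dwell
```

Running this for window lengths of 3, 5, 10, 30, and 60 seconds and feeding each result into the RMANOVA gives the per-window comparisons reported above.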
Figure 9: E3: Correct recognition percentages (left), intensity (middle), and sincerity (right) rates, all averaged among participants and videos. Error bars represent the SEM, the chance line for recognition is drawn in black, and the color legend is common to all graphs.
4.4 H4: Recognition, Intensity and Sincerity Differences
Our fourth hypothesis (H4) is that the recognition accuracy of emotions and expressions is the same for real videos and face swaps; however, the perceived intensity and sincerity differ.
For face swap videos, the face of a target is applied to a video of another individual while keeping the second person's movements and facial expressions. Therefore, a general goal of face swaps is to stay as close as possible to the targeted facial expressions in order to convey the same message as the untampered video. Another area that focuses on how the conveyed message is impacted by facial alterations is the stylization of videos. We follow a well-known analysis in this field and assess emotions and expressions by examining the recognized emotion along with its perceived intensity and sincerity [66]. To check the results for significance, we use Welch's ANOVA (as inhomogeneous variances were detected by Levene's test) and do the post hoc analysis with Tukey tests with Bonferroni-corrected p-values.
The rst step is to look at how well our participants were able
to recognize the emotions and expressions and which of them were
confused easily Looking at the overall recognition accuracy in
Fig. 9 (left), we see that participants were overall able to recognize
the emotions similarly between real videos and face swaps. We
further visualize the recognized emotions as heatmaps in Fig. 10.
As seen from these gures, participants mainly confused clueless
and thinking in both conditions. For face swaps, there was also
some confusion between disgust and sadness. During our analysis
we nd that the recognition is signicantly dierent in real and
swap videos (Welch’s F(1,3778) = 11, p = 0.000898). A post hoc test
only reveals dierences for recognising the emotion disgust (p =
0.000028,

= 0.64, SEM = 0.04,

= 0.40, SEM =
0.03). This indicates that overall
the recognition of emotions
and expressions in face swaps and real videos is similar.
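The confusion analysis underlying Fig. 10 boils down to a shown-versus-voted tally. A minimal sketch with NumPy (the label set below covers only the expressions named in the text; the study's full set may differ):

```python
import numpy as np

# Subset of the expressions discussed in the text (assumed label set).
LABELS = ["clueless", "thinking", "disgust", "sadness", "surprise", "disagree"]
INDEX = {name: i for i, name in enumerate(LABELS)}

def confusion_matrix(shown, voted):
    """Tally votes: rows index the shown expression, columns the
    voted one, matching the layout of the paper's Fig. 10."""
    cm = np.zeros((len(LABELS), len(LABELS)), dtype=int)
    for s, v in zip(shown, voted):
        cm[INDEX[s], INDEX[v]] += 1
    return cm

def recognition_accuracy(cm):
    """Fraction of trials on the diagonal, i.e., correct votes."""
    return np.trace(cm) / cm.sum()
```

Building one matrix per condition (real vs. swap) and normalizing each row yields the heatmaps of Fig. 10, with the diagonal giving the per-expression recognition rates of Fig. 9 (left).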
Next, we perform the same analysis for the intensity and sincerity ratings of participants, as shown in the middle and right plots of Fig. 9. As a first impression from these plots, real videos are generally rated with higher intensity and sincerity; however, the assessments do not differ too strongly. Assessing the ratings between conditions, we obtain highly significant results for intensity (Welch's F(1, 3773) = 29.4, p = 0.0000000613) and sincerity (Welch's F(1, 3776) = 12.8, p = 0.000352). A post hoc test reveals high significance for intensity ratings on clueless (p = 0.000005; real: M = 5.18, SEM = 0.16; swap: M = 4.46, SEM = 0.19) and disgust (p = 0.00004; real: M = 5.56, SEM = 0.1; swap: M = 4.91, SEM = 0.1) and significance for surprise (p = 0.04; real: M = 5.62, SEM = 0.11; swap: M = 5.32, SEM = 0.15). Differences in sincerity ratings are significant for disagree (p = 0.025; real: M = 4.71, SEM = 0.09; swap: M = 4.31, SEM = 0.09) and thinking (p = 0.004; real: M = 4.59, SEM = 0.1; swap: M = 4.18, SEM = 0.1). This indicates that certain emotions and expressions are rated less intense and sincere in face swaps than in real videos.
Figure 10: E3: Confusion matrices for the recognition of expressions in real (left) or swap (right) videos. Rows index the shown expression while columns indicate the voted ones.
5 DISCUSSION OF FINDINGS
In this section, we first answer our initial research questions formulated in the introduction. Afterwards, we discuss interesting observations, limitations, and new ideas for future research.
5.1 Answers to Our Research Questions
Based on our analysis, we are now able to address our initial research questions.
How is gaze behaviour impacted by face swaps? Can eye tracking be used to detect facial manipulations? Our results indicate that viewing behaviour is impacted by face swaps. We find that fixations are more prominent on the nose and mouth than on the eyes for high quality face swaps in trials where participants reported artifacts. These artifacts, however, do not necessarily lead the participants to assess a video as a face swap. As eye tracking is easily measurable and integrable with common displays [28, 40, 72], it could be a more reliable and less intrusive substitute for self reports in the debunking of facial manipulations. As we find significant differences within the distribution of fixations of high quality face swaps, the corresponding real video would not be necessary for the classification of a video as a face swap.
Furthermore, we nd that the location of artifacts in face swaps
inuences their saliency. Especially artifacts on facial contours,
which were reported less often than artifacts on eyes and mouth,
still inuenced the xation behaviour. Additionally, we nd that
artifacts are generally more xated, when they are also reported
by the participants indicating that participants may consciously
explore artifacts when they notice them.
Does the length of video clips influence participants' assessment accuracy? We find no significant difference in manipulation recognition even for clips as short as three seconds. This contrasts our initial hypothesis, as we assumed that longer videos would give participants more time to explore the face and notice unnatural expressions and artifacts. However, the results indicate that the conscious debunking of the video, if it happens, occurs rather early and thus the first impression has an impact on the decision of participants.
Are the conveyed emotions and expressions different between face swaps and genuine videos? While the recognition accuracy, as well as intensity and sincerity ratings for emotions and expressions, are similar in both conditions, real videos generally seem to obtain higher ratings. We further found significant differences for disgust, surprise, disagree, and thinking, which therefore do not match the target video. Despite this, the assessment of emotions and expressions in face swaps already matches the corresponding real videos surprisingly well. This indicates that face swaps are overall able to convey the intended emotions and expressions, making them even more powerful but also potentially more dangerous.
5.2 General Discussion
In the following, we discuss interesting observations based on our data and analysis.
Discussion on the Fixation Distribution. We observe a nearly equal amount of fixations on the eye, nose, and mouth regions for real videos and low quality face swaps. While eye tracking experiments for static images often found a higher number of fixations on the eyes than on the mouth [6], previous research on facial videos suggests that the mouth is more often fixated if the actor is talking [41, 65]. A high number of fixations on the nose can be attributed to a central bias, as participants tend to fixate on a region that allows them to quickly change their attention to other areas [32]. Thus, the general distribution of fixations in our experiment aligns with previous research.
However, we do detect a difference in this distribution for high quality swaps. Here, our participants looked less often at the eyes and instead focused more on the mouth and nose. As the actors in the videos speak and therefore constantly move their mouth, the increased focus on this area in high quality face swaps may be further amplified by unnatural movements which may not match those of natural speech. This mismatch may trigger the interest of observers and lead to more fixations.
Quality of State-of-the-Art Face Swaps. Overall, the high quality stimuli of the PEFS dataset [69] were often mistaken for real videos in our experiment focusing on the influence of video length (see Sec. 4.3). They were reported to be real nearly as often as the genuine videos (74% vs. 71%). This contrasts not only the number of artifact reports in our eye tracking experiment (see Sec. 4.1) but also the realness assessment reported in the PEFS paper (Real 80%, Swap 65%) [69]. Therefore, the ratings of our participants are probably influenced by the varying length and diversity of the stimuli. As this reduces the ability to successfully distinguish between real videos and face swaps, it may also mean that in real world scenarios, when users are leisurely watching video clips from different sources and with varying length, their ability to detect face swaps may be even further reduced. We believe that, soon, the visual difference between original and manipulated videos will disappear, as classifiers can directly be used to improve face swaps [25]. Thus, it is crucial to also discuss ethical and legal regulations on the handling of face swaps, as they may become ubiquitous.
Conspicuousness of Face Swaps. In our experiments, we find several hints towards conspicuous artifacts and features of face swaps. We highlighted some artifacts reported by participants in Fig. 3. To our surprise, the contour artifacts that appear when the shapes of two faces do not match were only seldom reported for high quality manipulations (3 out of 20 participants). In contrast, those seem to be a good cue for low quality manipulations, where 10 of the 20 participants noticed them. Additionally, visible blur on the faces was often attributed to other types of manipulations like beauty filters. Instead of clear artifacts, participants often discussed the unnatural movements of eyes and mouth as noticeable for high quality face swaps. Some even stated that the eyes of face swaps seemed to be lifeless or suspected the actors to be blind. These findings indicate that for high quality manipulations artifacts are less notable; however, viewers can use behavioural cues like unnatural expressions to debunk face swaps.
In one of our experiments (E1), we did not inform the participants about the manipulated videos or face swaps beforehand. However, we explained the manipulations to them in a debriefing and asked them whether they knew about the concept of face swaps and whether they had some suspicion towards this technique which they did not mention before. During the debriefing, 11 of our 40 participants stated that they thought some form of manipulation was applied to the videos but were not able to tell exactly what kind of manipulation. Six other participants would not have suspected face swaps, but rather thought videos were post-processed by techniques like beauty filters. The remaining 23 participants did not suspect this level of manipulation or post-processing and rather attributed artifacts to the recording, limited video quality, or compression. This is interesting, as it shows how convincing face swaps are and that, even when participants report artifacts, they usually do not assume to be confronted with a strongly manipulated video.
Finally, even though the participants of the experiment were rather young and mostly university students (including nearly 50% computer science students), eight of them had never heard about face swaps. This emphasizes not only the need to establish reliable detection methods but also the urgency to educate people about possible manipulations and modern face-swapping techniques.
Observations on the Assessment of Emotions. Our findings from E3 do not indicate a strong mismatch between the intended and conveyed message in face swaps. The assessment of participants is, however, not matching real videos for disgust, surprise, disagree, and thinking. Previous research found that these expressions are among the ones relying the most on a perfect balance of rigid head motion, body language, and the coordinated movement of several facial areas. If either of these factors is slightly off, the recognition rate drops instantly [55]. Current face swaps, therefore, seem to reproduce the overall expressions of body and face correctly; however, the balance between movements is not fully consistent with the original video, as some of the micro-expressions do not transfer perfectly. This can impact the recognition as well as the intensity and sincerity impression for more complex expressions.
Demographic Considerations. Eye tracking and emotion recognition literature considers age-dependent factors [1, 33, 51]. However, we assume young people are more likely to come into contact and concern themselves with face swaps, making them a challenging demographic. Nevertheless, a first analysis of participants' backgrounds points towards familiarity with technology not being a deciding factor for face swap recognition. Following that line, it could also be interesting to analyze the impact of gender-based differences (e.g., [46]) to further extend our comprehension of gaze patterns on face swaps. We refer the reader to the supplemental material for our preliminary research in those directions (see Supplemental Material, Sec. 4). Overall, fully analysing the effect of face swaps on different demographic groups may be an interesting line for future work.
Limitations. Our main goal was to investigate differences between face swaps and original videos; therefore, we used stimuli recorded in a controlled environment which show only one actor at a time in a seated pose and without audio. This way, we could make sure that the responses of our participants mainly stem from the face swaps and that there was no distraction by other objects in the scene. However, real world videos will only seldom be this restricted. In order to transfer our findings to real world scenarios, further experiments focusing on more varied scenes could be conducted. Possibilities for this include videos with unrestricted environments, free movements, more than one person, or audio. Furthermore, the participants throughout all our experiments are from a rather young demographic; therefore, our results may not reflect the impact of face swaps on society as a whole. As the stimuli of the used dataset only contain young actors [69], the results may also be influenced by the age of the actors.
Actionable Insights. The current state of the art in forgery detection relies on specific artifacts inherent to CNN-based generator models [67]. While currently performing well, researchers foresee this changing in the near future, requiring new perspectives [52]. Humans excel at facial processing; thus, we believe that insights into the perception of face swaps can offer a new perspective to detect forged videos. Following our findings, we could use the gathered eye tracking data to design a face swap classifier, similarly to frameworks for saliency prediction [10, 36, 68]. Such a perceptually driven classifier could outperform the human eye, being able to detect those artifacts on the pixel level. This is necessary as face swaps are already very close to being visually indistinguishable from real videos.
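To make the idea of a fixation-driven classifier concrete, a first baseline could operate on per-trial dwell-time proportions over the AOIs. The sketch below is our own deliberately simple nearest-centroid illustration, not the learned saliency-style classifiers cited above; all names and the feature layout are hypothetical. It only demonstrates that the observed shift (fewer fixations on the eyes for high quality swaps) is, in principle, machine-exploitable.

```python
import numpy as np

def dwell_features(dwell, aois=("eyes", "mouth", "nose", "contour")):
    """Turn per-AOI dwell times into a proportion vector."""
    v = np.array([dwell.get(a, 0.0) for a in aois], dtype=float)
    total = v.sum()
    return v / total if total > 0 else v

class CentroidSwapDetector:
    """Nearest-centroid classifier over dwell proportions.

    fit() stores the mean feature vector per class label;
    predict() assigns each sample to the closest centroid.
    """
    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.centroids_ = {c: X[y == c].mean(axis=0) for c in np.unique(y)}
        return self

    def predict(self, X):
        X = np.atleast_2d(np.asarray(X, dtype=float))
        labels = list(self.centroids_)
        d = np.stack([np.linalg.norm(X - self.centroids_[c], axis=1)
                      for c in labels], axis=1)
        return np.array([labels[i] for i in d.argmin(axis=1)])
```

With real data, each row of X would be the `dwell_features` vector of one trial and y the ground-truth real/swap label; stronger models and pixel-level features could then replace the centroid rule.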
6 CONCLUSION
In this paper, we investigated the perception of face swaps using three experiments. We found that participants are able to recognize artifacts in state-of-the-art face swaps but only seldom attribute them to manipulations when they are not previously informed about them. Furthermore, we investigated eye tracking patterns and found significant differences in fixation behaviour, with participants focusing less on the eyes for high quality swaps. Interestingly, the length of video clips did not influence participants' assessment accuracy. Finally, conveyed emotions and expressions in face swaps are not yet completely indistinguishable from reality. However, these generated results are already very close to being visually undetectable.
Our ndings yield valuable insights towards better understand-
ing the perceptual dierences between genuine and face-swapped
videos. We found valuable indications of physiologically measur-
able signals to debunk face swap videos that could assist and guide
future algorithmic detection tools.
As future work, it seems interesting to investigate whether our
ndings are also applicable to other types of facial manipulation
techniques and to assess familiarity eects in face swap videos.
ACKNOWLEDGMENTS
The authors gratefully acknowledge funding by the German Science
Foundation (DFG MA2555/15-1 “Immersive Digital Reality”), and
the L3S Research Center, Hanover, Germany.
REFERENCES
[1] Laura Abbruzzese, Nadia Magnani, Ian Hamilton Robertson, and Mauro Mancuso. 2019. Age and gender differences in emotion recognition. Frontiers in psychology 10 (2019), 2371. https://doi.org/10.3389/fpsyg.2019.02371
[2] Darius Afchar, Vincent Nozick, Junichi Yamagishi, and Isao Echizen. 2018. Mesonet: a compact facial video forgery detection network. In IEEE International Workshop on Information Forensics and Security (WIFS). IEEE, New York, NY, USA, 1–7. https://doi.org/10.1109/WIFS.2018.8630761
[3] Shruti Agarwal, Hany Farid, Yuming Gu, Mingming He, Koki Nagano, and Hao Li. 2019. Protecting World Leaders Against Deep Fakes. In IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshop (CVPRW). IEEE, New York, NY, USA, 38–45.
[4] Robert R. Althoff and Neal J. Cohen. 1999. Eye-Movement-Based Memory Effect: A Reprocessing Effect in Face Perception. Journal of Experimental Psychology: Learning, Memory, and Cognition 25, 4 (7 1999), 997–1010. https://doi.org/10.1037/0278-7393.25.4.997
[5] Lisa Feldman Barrett, Batja Mesquita, and Maria Gendron. 2011. Context in emotion perception. Current Directions in Psychological Science 20, 5 (2011), 286–290. https://doi.org/10.1177/0963721411422522
[6] Elina Birmingham and Alan Kingstone. 2009. Human social attention. Progress in brain research 176 (2009), 309–320. https://doi.org/10.1016/S0079-6123(09)17618-5
[7] Dario Bombari, Fred W Mast, and Janek S Lobmaier. 2009. Featural, Configural, and Holistic Face-Processing Strategies Evoke Different Scan Patterns. Perception 38, 10 (2009), 1508–1521. https://doi.org/10.1068/p6117
[8] Isabelle Boutet, Chantal L Lemieux, Marc-André Goulet, and Charles A Collin. 2017. Faces elicit different scanning patterns depending on task demands. Attention, Perception, & Psychophysics 79, 4 (2017), 1050–1063. https://doi.org/10.3758/s13414-017-1284-y
[9] Julie N Buchan, Martin Paré, and Kevin G Munhall. 2007. Spatial statistics of gaze fixations during dynamic face processing. Social Neuroscience 2, 1 (2007), 1–13. https://doi.org/10.1080/17470910601043644
[10] Martin Čadík, Robert Herzog, Rafał Mantiuk, Radosław Mantiuk, Karol Myszkowski, and Hans-Peter Seidel. 2013. Learning to predict localized distortions in rendered images. In Computer Graphics Forum, Vol. 32. Wiley Online Library, Hoboken, New Jersey, USA, 401–410. https://doi.org/10.1111/cgf.12248
[11] Manuel G. Calvo and Lauri Nummenmaa. 2009. Eye-movement assessment of the time course in facial expression recognition: Neurophysiological implications. Cognitive, Affective, & Behavioral Neuroscience 9, 4 (2009), 398–411. https://doi.org/10.3758/CABN.9.4.398
[12] Pilar Carrera-Levillain and Jose-Miguel Fernandez-Dols. 1994. Neutral faces in context: Their emotional meaning and their function. Journal of Nonverbal Behavior 18, 4 (1994), 281–299. https://doi.org/10.1007/BF02172290
[13] Susana Castillo, Tilke Judd, and Diego Gutierrez. 2011. Using eye-tracking to assess different image retargeting methods. In ACM SIGGRAPH Symposium on Applied Perception in Graphics and Visualization (APGV). ACM, New York, NY, USA, 7–14. https://doi.org/10.1145/2077451.2077453
[14] Susana Castillo, Christian Wallraven, and Douglas Cunningham. 2014. The Semantic Space for Facial Communication. Computer Animation and Virtual Worlds 25, 3-4 (May 2014), 223–231. https://doi.org/10.1002/cav.1593
[15] Daniel Thomas. 2020. Deepfakes: A threat to democracy or just a bit of fun? BBC News. Retrieved September 16, 2020 from https://www.bbc.com/news/business-51204954
[16] DeepFaceLab. 2019. DeepFaceLab. https://github.com/iperov/DeepFaceLab Accessed: 2020-01-06.
[17] DeepFakes. 2019. DeepFakes. https://github.com/deepfakes/faceswap Accessed: 2020-01-06.
[18] Tom Dobber, Nadia Metoui, Damian Trilling, Natali Helberger, and Claes de Vreese. 2020. Do (Microtargeted) Deepfakes Have Real Effects on Political Attitudes? The International Journal of Press/Politics 0 (2020), 1940161220944364. https://doi.org/10.1177/1940161220944364
[19] Howard E Egeth and Steven Yantis. 1997. Visual attention: Control, representation, and time course. Annual review of psychology 48, 1 (1997), 269–297. https://doi.org/10.1146/annurev.psych.48.1.269
[20] Hedwig Eisenbarth and Georg W Alpers. 2011. Happy mouth and sad eyes: scanning emotional facial expressions. Emotion 11, 4 (2011), 860. https://doi.org/10.1037/a0022758
[21] Paul Ekman, E Richard Sorenson, and Wallace V Friesen. 1969. Pan-cultural elements in facial displays of emotion. Science 164, 3875 (1969), 86–88. https://doi.org/10.1126/science.164.3875.86
[22] Ulrich Engelke, Daniel P Darcy, Grant H Mulliken, Sebastian Bosse, Maria G Martini, Sebastian Arndt, Jan-Niklas Antons, Kit Yan Chan, Naeem Ramzan, and Kjell Brunnström. 2016. Psychophysiology-based QoE assessment: A survey. IEEE Journal of Selected Topics in Signal Processing 11, 1 (2016), 6–21. https://doi.org/10.1109/JSTSP.2016.2609843
[23] FaceSwap. 2019. FaceSwap. https://github.com/MarekKowalski/FaceSwap Accessed: 2020-01-06.
[24] Steven Fernandes, Sunny Raj, Eddy Ortiz, Iustina Vintila, Margaret Salter, Gordana Urosevic, and Sumit Jha. 2019. Predicting Heart Rate Variations of Deepfake Videos using Neural ODE. In IEEE International Conference on Computer Vision Workshops (ICCVW). IEEE, New York, NY, USA, 1721–1729. https://doi.org/10.1109/ICCVW.2019.00213
[25]
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley,
Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial
nets. In Advances in Neural Information Processing Systems (NeurIPS). ACM, New
York, NY, USA, 2672–2680. https://doi.org/10.5555/2969033.2969125
[26]
GoogleAIBlog. 2019. Contributing Data to Deepfake Detection Research. https:
//ai.googleblog.com/2019/09/contributing-data- to-deepfake-detection.html
[27]
D. Güera and E. J. Delp. 2018. Deepfake video detection using recurrent neural
networks. In IEEE International Conference on Advanced Video and Signal Based
Surveillance (AVSS). IEEE, New York, NY, USA, 1–6. https://doi.org/10.1109/AVSS.
2018.8639163
[28]
Tianchu Guo, Yongchao Liu, Hui Zhang, Xiabing Liu, Youngjun Kwak, Byung
In Yoo, Jae-Joon Han, and Changkyu Choi. 2019. A Generalized and Robust
Method Towards Practical Gaze Estimation on Smart Phone. In IEEE International
Conference on Computer Vision Workshops (ICCVW). IEEE, New York, NY, USA,
1131–1139. https://doi.org/10.1109/ICCVW.2019.00144
[29]
Parul Gupta, Komal Chugh, Abhinav Dhall, and Ramanathan Subramanian. 2020.
The eyes know it: FakeET-An Eye-tracking Database to Understand Deepfake
Perception. In International Conference on Multimodal Interaction (ICIM). ACM,
New York, NY, USA, 519–527. https://doi.org/10.1145/3382507.3418857
[30]
Roy S Hessels, Jeroen S Benjamins, Tim HW Cornelissen, and Ignace TC Hooge.
2018. A validation of automatically-generated Areas-of-Interest in videos of a
face for eye-tracking research. Frontiers in psychology 9 (2018), 1367. https:
//doi.org/10.3389/fpsyg.2018.01367
[31]
Stephen J Hinde, Tim J Smith, and Iain D Gilchrist. 2018. Does narrative drive
dynamic attention to a prolonged stimulus? Cognitive Research: Principles and
Implications 3, 1 (2018), 45. https://doi.org/10.1186/s41235-018-0140-5
[32]
Janet Hui-wen Hsiao and Garrison Cottrell. 2008. Two fixations suffice in face
recognition. Psychological Science 19, 10 (2008), 998–1006. https://doi.org/10.1111/j.1467-9280.2008.02191.x
[33]
Derek M Isaacowitz, Corinna E Löckenhoff, Richard D Lane, Ron Wright, Lee
Sechrest, Robert Riedel, and Paul T Costa. 2007. Age differences in recognition
of emotion in lexical stimuli and facial expressions. Psychology and aging 22, 1
(2007), 147. https://doi.org/10.1037/0882-7974.22.1.147
[34]
Stephen W Janik, A Rodney Wellens, Myron L Goldberg, and Louis F Dell’Osso.
1978. Eyes as the center of focus in the visual examination of human faces.
Perceptual and Motor Skills 47, 3 (1978), 857–858. https://doi.org/10.2466/pms.1978.47.3.857
[35]
Liming Jiang, Ren Li, Wayne Wu, Chen Qian, and Chen Change Loy. 2020.
DeeperForensics-1.0: A Large-Scale Dataset for Real-World Face Forgery Detection.
In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
IEEE, New York, NY, USA, 2886–2895. https://doi.org/10.1109/CVPR42600.2020.00296
[36]
Tilke Judd, Krista Ehinger, Frédo Durand, and Antonio Torralba. 2009. Learning
to predict where humans look. In IEEE International Conference on Computer
Vision (ICCV). IEEE, New York, NY, USA, 2106–2113. https://doi.org/10.1109/ICCV.2009.5459462
[37]
Kathrin Kaulard, Douglas W. Cunningham, Heinrich H. Bülthoff, and Christian
Wallraven. 2012. The MPI Facial Expression Database – A Validated Database
of Emotional and Conversational Facial Expressions. PLoS ONE 7, 3 (03 2012),
e32321. https://doi.org/10.1371/journal.pone.0032321
[38]
Davis E. King. 2009. Dlib-ml: A Machine Learning Toolkit. Journal of Machine
Learning Research 10 (2009), 1755–1758. http://www.dlib.net
[39]
I. Korshunova, W. Shi, J. Dambre, and L. Theis. 2017. Fast face-swap using
convolutional neural networks. In IEEE International Conference on Computer
Vision (ICCV). IEEE, New York, NY, USA, 3677–3685. https://doi.org/10.1109/ICCV.2017.397
[40]
Kyle Krafka, Aditya Khosla, Petr Kellnhofer, Harini Kannan, Suchendra
Bhandarkar, Wojciech Matusik, and Antonio Torralba. 2016. Eye Tracking for
Everyone. In IEEE/CVF Conference on Computer Vision and Pattern Recognition
(CVPR). IEEE, New York, NY, USA, 2176–2184. https://doi.org/10.1109/CVPR.2016.239
[41]
Charissa R Lansing and George W McConkie. 2003. Word identification and eye
fixation locations in visual and visual-plus-auditory presentations of spoken
sentences. Perception & Psychophysics 65, 4 (2003), 536–552. https://doi.org/10.3758/BF03194581
[42]
Laura J. Wells, Steven M. Gillespie, and Pia Rotshtein. 2016. Identification of
emotional facial expressions: Effects of expression, intensity, and sex on eye gaze.
PLoS ONE 11, 12 (2016). https://doi.org/10.1371/journal.pone.0168307
[43]
Yuezun Li, Ming-Ching Chang, and Siwei Lyu. 2018. In ictu oculi: Exposing AI
created fake videos by detecting eye blinking. In IEEE International Workshop
on Information Forensics and Security (WIFS). IEEE, New York, NY, USA, 1–7.
https://doi.org/10.1109/WIFS.2018.8630787
[44]
Yuezun Li and Siwei Lyu. 2019. Exposing DeepFake Videos By Detecting Face
Warping Artifacts. In IEEE Conference on Computer Vision and Pattern Recognition
Workshops (CVPRW), Vol. 2. IEEE, New York, NY, USA, 46–52.
[45]
Yuezun Li, Xin Yang, Pu Sun, Honggang Qi, and Siwei Lyu. 2020. Celeb-DF: A
Large-Scale Challenging Dataset for DeepFake Forensics. In IEEE/CVF Conference
on Computer Vision and Pattern Recognition (CVPR). IEEE, New York, NY, USA,
3207–3216. https://doi.org/10.1109/CVPR42600.2020.00327
[46]
Tsoey Wun Man and Peter J Hills. 2016. Eye-tracking the own-gender bias in
face recognition: Other-gender faces are viewed differently to own-gender faces.
Visual Cognition 24, 9-10 (2016), 447–458. https://doi.org/10.1080/13506285.2017.1301614
[47]
Albert Mehrabian. 2008. Communication without words. Communication theory
6 (2008), 193–200.
[48]
I Mertens, H Siegmund, and O-J Grüsser. 1993. Gaze motor asymmetries in the
perception of faces during a memory task. Neuropsychologia 31, 9 (1993), 989–998.
https://doi.org/10.1016/0028-3932(93)90154-R
[49]
Xiongkuo Min, Guangtao Zhai, Zhongpai Gao, and Chunjia Hu. 2014. Influence
of compression artifacts on visual attention. In IEEE International Conference on
Multimedia and Expo (ICME). IEEE, New York, NY, USA, 1–6. https://doi.org/10.1109/ICME.2014.6890189
[50]
Parag K Mital, Tim J Smith, Robin L Hill, and John M Henderson. 2011.
Clustering of gaze during dynamic scene viewing is predicted by motion. Cognitive
Computation 3, 1 (2011), 5–24. https://doi.org/10.1007/s12559-010-9074-z
[51]
Nora A Murphy and Derek M Isaacowitz. 2010. Age eects and gaze patterns
in recognising emotional expressions: An in-depth look at gaze measures and
covariates. Cognition and Emotion 24, 3 (2010), 436–452. https://doi.org/10.1080/02699930802664623
[52]
Joao C Neves, Ruben Tolosana, Ruben Vera-Rodriguez, Vasco Lopes, Hugo
Proença, and Julian Fierrez. 2020. GANprintR: Improved Fakes and Evaluation
of the State of the Art in Face Manipulation Detection. IEEE Journal of
Selected Topics in Signal Processing 14, 5 (2020), 1038–1048. https://doi.org/10.1109/JSTSP.2020.3007250
[53]
Yuval Nirkin, Yosi Keller, and Tal Hassner. 2019. FSGAN: Subject Agnostic Face
Swapping and Reenactment. In IEEE International Conference on Computer Vision
(ICCV). IEEE, New York, NY, USA, 7183–7192. https://doi.org/10.1109/ICCV.2019.00728
[54]
Y. Nirkin, I. Masi, A. T. Tuan, T. Hassner, and G. Medioni. 2018. On face
segmentation, face swapping, and face perception. In IEEE International Conference
on Automatic Face & Gesture Recognition (FG 2018). IEEE, New York, NY, USA,
98–105. https://doi.org/10.1109/FG.2018.00024
[55]
Manfred Nusseck, Douglas W. Cunningham, Christian Wallraven, and
Heinrich H. Bülthoff. 2008. The contribution of different facial regions to the
recognition of conversational expressions. Journal of Vision 8, 8 (06 2008), 1–1.
https://doi.org/10.1167/8.8.1
[56]
Ee J Pereira, Elina Birmingham, and Jelena Ristic. 2020. The eyes do not
have it after all? Attention is not automatically biased towards faces and eyes.
Psychological research 84, 5 (2020), 1407–1423. https://doi.org/10.1007/s00426-
018-1130- 4
[57]
Rista C Plate, Adrienne Wood, Kristina Woodard, and Seth D Pollak. 2019.
Probabilistic learning of emotion categories. Journal of Experimental Psychology:
General 148, 10 (2019), 1814. https://doi.org/10.1037/xge0000529
[58]
Andreas Rössler, Davide Cozzolino, Luisa Verdoliva, Christian Riess, Justus Thies,
and Matthias Nießner. 2018. FaceForensics: A Large-scale Video Dataset for
Forgery Detection in Human Faces. CoRR abs/1803.09179 (2018). arXiv:1803.09179
http://arxiv.org/abs/1803.09179
[59]
Andreas Rössler, Davide Cozzolino, Luisa Verdoliva, Christian Riess, Justus Thies,
and Matthias Nießner. 2019. FaceForensics++: Learning to Detect Manipulated
Facial Images. In IEEE International Conference on Computer Vision (ICCV). IEEE,
New York, NY, USA, 1–11. https://doi.org/10.1109/ICCV.2019.00009
[60]
Guillaume A Rousselet, Marc J-M Macé, and Michèle Fabre-Thorpe. 2003. Is it
an animal? Is it a human face? Fast processing in upright and inverted natural
scenes. Journal of Vision 3, 6 (2003), 5–5. https://doi.org/10.1167/3.6.5
[61]
Hannah Scott, Jonathan P Batten, and Gustav Kuhn. 2019. Why are you looking
at me? It’s because I’m talking, but mostly because I’m staring or not doing
much. Attention, Perception, & Psychophysics 81, 1 (2019), 109–118. https://doi.org/10.3758/s13414-018-1588-6
[62]
Jan-Philipp Tauscher, Maryam Mustafa, and Marcus Magnor. 2017. Comparative
analysis of three different modalities for perception of artifacts in videos. ACM
Transactions on Applied Perception (TAP) 14, 4 (Sep 2017), 1–12. https://doi.org/10.1145/3129289
[63]
Cristian Vaccari and Andrew Chadwick. 2020. Deepfakes and disinformation:
exploring the impact of synthetic political video on deception, uncertainty, and
trust in news. Social Media + Society 6, 1 (2020), 2056305120903408. https://doi.org/10.1177/2056305120903408
[64]
Goedele Van Belle, Meike Ramon, Philippe Lefèvre, and Bruno Rossion. 2010.
Fixation patterns during recognition of personally familiar and unfamiliar faces.
Frontiers in Psychology 1 (2010), 20. https://doi.org/10.3389/fpsyg.2010.00020
[65]
Melissa L-H Võ, Tim J Smith, Parag K Mital, and John M Henderson. 2012. Do
the eyes really have it? Dynamic allocation of attention when viewing moving
faces. Journal of Vision 12, 13 (2012), 3–3. https://doi.org/10.1167/12.13.3
[66]
Christian Wallraven, Heinrich H. Bülthoff, Douglas W. Cunningham, Jan Fischer,
and Dirk Bartz. 2007. Evaluation of Real-World and Computer-Generated Stylized
Facial Expressions. ACM Transactions on Applied Perception (TAP) 4, 3 (Nov. 2007),
16–es. https://doi.org/10.1145/1278387.1278390
[67]
Sheng-Yu Wang, Oliver Wang, Richard Zhang, Andrew Owens, and Alexei A Efros.
2020. CNN-generated images are surprisingly easy to spot... for now. In IEEE/CVF
Conference on Computer Vision and Pattern Recognition (CVPR), Vol. 7. IEEE, New
York, NY, USA, 8692–8701. https://doi.org/10.1109/CVPR42600.2020.00872
[68]
Wenguan Wang, Jianbing Shen, Jianwen Xie, Ming-Ming Cheng, Haibin Ling,
and Ali Borji. 2019. Revisiting video saliency prediction in the deep learning
era. IEEE Transactions on Pattern Analysis and Machine Intelligence 43, 1 (2019),
220–237. https://doi.org/10.1109/TPAMI.2019.2924417
[69]
Leslie Wöhler, Jann-Ole Henningson, Susana Castillo, and Marcus Magnor. 2020.
PEFS: A Validated Dataset for Perceptual Experiments on Face Swap Portrait
Videos. In International Conference on Computer Animation and Social Agents
(CASA). Springer, Cham, 120–127. https://doi.org/10.1007/978-3-030-63426-1_13
[70]
Xin Yang, Yuezun Li, and Siwei Lyu. 2019. Exposing deep fakes using inconsistent
head poses. In IEEE International Conference on Acoustics, Speech and Signal
Processing (ICASSP). IEEE, New York, NY, USA, 8261–8265. https://doi.org/10.1109/ICASSP.2019.8683164
CHI ’21, May 8–13, 2021, Yokohama, Japan
[71]
Gregory Zelinsky. 2013. Understanding scene understanding. Frontiers in
Psychology 4 (2013), 954. https://doi.org/10.3389/fpsyg.2013.00954
[72]
Xucong Zhang, Yusuke Sugano, Mario Fritz, and Andreas Bulling. 2017. It’s written
all over your face: Full-face appearance-based gaze estimation. In IEEE/CVF
Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). IEEE,
New York, NY, USA, 51–60. https://doi.org/10.1109/CVPRW.2017.284
[73]
Fabian Zimmermann and Matthias Kohring. 2020. Mistrust, disinforming news,
and vote choice: A panel survey on the origins and consequences of believing
disinformation in the 2017 German parliamentary election. Political Communication
37, 2 (2020), 215–237. https://doi.org/10.1080/10584609.2019.1686095