BRIEF REPORT
PSYCHOLOGICAL AND
COGNITIVE SCIENCES
AI-synthesized faces are indistinguishable from real
faces and more trustworthy
Sophie J. Nightingalea,1 and Hany Faridb
aDepartment of Psychology, Lancaster University, Lancaster LA1 4YW, United Kingdom; and bDepartment of Electrical Engineering and Computer Sciences,
University of California, Berkeley, CA 94720
Edited by William Press, Computer Sciences and Integrative Biology, University of Texas at Austin, Austin, TX; received November 11, 2021; accepted
December 20, 2021

Author contributions: S.J.N. and H.F. designed research, performed research, contributed new reagents/analytic tools, analyzed data, and wrote the paper.
The authors declare no competing interest.
This open access article is distributed under Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND).
1To whom correspondence may be addressed. Email: s.nightingale1@lancaster.ac.uk.
Published February 14, 2022.
Articial intelligence (AI)–synthesized text, audio, image, and
video are being weaponized for the purposes of nonconsensual
intimate imagery, nancial fraud, and disinformation campaigns.
Our evaluation of the photorealism of AI-synthesized faces indi-
cates that synthesis engines have passed through the uncanny val-
ley and are capable of creating faces that are indistinguishable
from—and more trustworthy than—real faces.
deep fakes | face perception
Artificial intelligence (AI)–powered audio, image, and video
synthesis—so-called deep fakes—has democratized access
to previously exclusive Hollywood-grade, special effects technol-
ogy. From synthesizing speech in anyone’s voice (1) to synthesiz-
ing an image of a fictional person (2) and swapping one person’s
identity with another or altering what they are saying in a video
(3), AI-synthesized content holds the power to entertain but also
deceive.
Generative adversarial networks (GANs) are popular mech-
anisms for synthesizing content. A GAN pits two neural
networks—a generator and discriminator—against each other.
To synthesize an image of a fictional person, the generator starts
with a random array of pixels and iteratively learns to synthesize
a realistic face. On each iteration, the discriminator learns to
distinguish the synthesized face from a corpus of real faces; if the
synthesized face is distinguishable from the real faces, then the
discriminator penalizes the generator. Over multiple iterations,
the generator learns to synthesize increasingly realistic
faces until the discriminator is unable to distinguish them from real
faces (see Fig. 1 for example real and synthetic faces).
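To make the adversarial loop described above concrete, the following is a minimal, illustrative sketch in Python (PyTorch). It is not StyleGAN2; the tiny multilayer perceptrons, toy image size, and random stand-in batch of "real" images are assumptions made only to show how the generator and discriminator push against each other.

```python
# Minimal GAN training sketch (illustrative only; StyleGAN2 is far more elaborate).
# The toy networks and the random stand-in "real" batch are placeholders.
import torch
import torch.nn as nn

latent_dim, img_dim = 64, 32 * 32  # toy sizes, not StyleGAN2's
G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, img_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(img_dim, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

for step in range(1000):
    real = torch.randn(128, img_dim)        # stand-in for a batch of real face photographs
    fake = G(torch.randn(128, latent_dim))  # faces synthesized from random latent vectors

    # Discriminator: learn to separate real faces from synthesized ones.
    d_loss = bce(D(real), torch.ones(128, 1)) + bce(D(fake.detach()), torch.zeros(128, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: penalized whenever the discriminator spots its fakes.
    g_loss = bce(D(fake), torch.ones(128, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

StyleGAN2 optimizes essentially this same adversarial objective, but with a large style-based convolutional generator trained on the FFHQ corpus of 70,000 real face photographs.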
Much has been written in the popular press about the potential
threats of deep fakes, including the creation of nonconsensual
intimate imagery (more commonly referred to by the misnomer
“revenge porn”), small- to large-scale fraud, and adding jet fuel
to already dangerous disinformation campaigns. Perhaps most
pernicious is the consequence that, in a digital world in which any
image or video can be faked, the authenticity of any inconvenient
or unwelcome recording can be called into question.
Although progress has been made in developing automatic
techniques to detect deep-fake content (e.g., refs. 4–6), current
techniques are not efficient or accurate enough to contend with
the torrent of daily uploads (7). The average consumer of online
content, therefore, must contend with sorting out the real from
the fake. We performed a series of perceptual studies to deter-
mine whether human participants can distinguish state-of-the-art
GAN-synthesized faces from real faces and what level of trust the
faces evoked.
Results
Experiment 1. In this study, 315 participants classified, one at
a time, 128 of the 800 faces as real or synthesized. Shown in
Fig. 2A is the distribution of participant accuracy (blue bars).
The average accuracy is 48.2% (95% CI [47.1%, 49.2%]),
close to chance performance of 50%, with no response bias:
d = 0.09; β = 0.99. Two repeated-measures binary logistic
regression analyses were conducted—one for real and one for
synthetic faces—to examine the effect of stimuli gender and race
on accuracy. For real faces, there was a significant gender ×
race interaction, χ²(3, N = 315) = 95.03, P < 0.001. Post hoc
Bonferroni-corrected comparisons revealed that mean accuracy
was higher for male East Asian faces than female East Asian
faces and higher for male White faces than female White faces.
For synthetic faces, there was also a significant gender × race
interaction, χ²(3, N = 315) = 68.41, P < 0.001. For both male
and female synthetic faces, White faces were the least accurately
classified, and male White faces were less accurately classified
than female White faces. We hypothesize that White faces are
more difficult to classify because they are overrepresented in the
StyleGAN2 training dataset and are therefore more realistic.
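The d and β values reported above are most naturally read as the sensitivity and bias measures of signal detection theory, computed from hit and false-alarm rates (treating, say, "real" as the signal class). The sketch below gives the textbook calculation; the example rates are invented for illustration, and this is not the authors' analysis code.

```python
# Textbook signal-detection measures from hit and false-alarm rates.
# Illustrative only; the example rates below are invented.
from scipy.stats import norm

def dprime_beta(hit_rate: float, false_alarm_rate: float):
    z_h, z_fa = norm.ppf(hit_rate), norm.ppf(false_alarm_rate)
    d_prime = z_h - z_fa                   # sensitivity: 0 means no discrimination
    beta = norm.pdf(z_h) / norm.pdf(z_fa)  # likelihood-ratio bias: 1 means no bias
    return d_prime, beta

# Rates near chance performance yield d close to 0 and beta close to 1:
print(dprime_beta(hit_rate=0.50, false_alarm_rate=0.47))
```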
Experiment 2. In this study, 219 new participants, with training
and trial-by-trial feedback, classified 128 faces taken from the
same set of 800 faces as in experiment 1. Shown in Fig. 2A
is the distribution of participant accuracy (orange bars). The
average accuracy improved slightly to 59.0% (95% CI [57.7%,
60.4%]), with no response bias: d = 0.46; β = 0.99. Despite
providing trial-by-trial feedback, there was no improvement in
accuracy over time, with an average accuracy of 59.3% (95% CI
[57.8%, 60.7%]) for the first set of 64 faces and 58.8% (95% CI
[57.4%, 60.3%]) for the second set of 64 faces. Further analyses
to examine the effect of gender and race on accuracy replicated
the primary findings of experiment 1. This analysis again revealed
that, for both male and female synthetic faces, White faces were
the most difficult to classify.

Fig. 2. The distribution of participant accuracy for (A) experiment 1 and experiment 2 (chance performance is 50%), and (B) trustworthy ratings for experiment 3 (a rating of 1 corresponds to the lowest trust).
When participants were made aware of rendering artifacts and
given feedback, accuracy improved reliably; however, overall
performance remained only slightly above chance. The lack of
improvement over time suggests that the impact of feedback is
limited, presumably because some synthetic faces simply do not
contain perceptually detectable artifacts.

Fig. 1. The most (Top and Upper Middle) and least (Bottom and Lower Middle) accurately classified real (R) and synthetic (S) faces.
Experiment 3. Faces provide a rich source of information, with
exposure of just milliseconds sufficient to make implicit infer-
ences about individual traits such as trustworthiness (8). We
wondered whether synthetic faces activate the same judgements
of trustworthiness. If not, then a perception of trustworthiness
could help distinguish real from synthetic faces.
In this study, 223 participants rated the trustworthiness of 128
faces taken from the same set of 800 faces on a scale of 1 (very
untrustworthy) to 7 (very trustworthy) (9). Shown in Fig. 2B is the
distribution of average ratings (by averaging the ordinal ratings,
we are assuming a linear rating scale). The average rating for real
faces (blue bars) of 4.48 is less than the rating of 4.82 for synthetic
faces (orange bars). Although only 7.7% more trustworthy, this
difference is significant [t(222) = 14.6, P < 0.001, d = 0.49].
Although a small effect, Black faces were rated more trustworthy
than South Asian faces, but, otherwise, there was no effect
across race. Women were rated as significantly more trustworthy
than men, 4.94 as compared to 4.36 [t(222) = 19.5, P < 0.001,
d = 0.82].
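The reported t(222) is consistent with a paired comparison, across the 223 participants, of each participant's mean rating of real versus synthetic faces. A plausible reconstruction of that calculation, run on simulated per-participant means rather than the actual data, is sketched below.

```python
# Paired comparison of per-participant mean trust ratings (simulated data for illustration).
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(0)
n = 223                                             # number of participants in experiment 3
real_mean = rng.normal(4.48, 0.7, n)                # simulated mean rating of real faces
synth_mean = real_mean + rng.normal(0.34, 0.35, n)  # slightly higher for synthetic faces

t, p = ttest_rel(synth_mean, real_mean)
diff = synth_mean - real_mean
cohens_d = diff.mean() / diff.std(ddof=1)           # Cohen's d for paired samples
print(f"t({n - 1}) = {t:.1f}, p = {p:.2g}, d = {cohens_d:.2f}")
```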
Shown in Fig. 3 are the four most (Fig. 3, Top) and four
least (Fig. 3, Bottom) trustworthy faces. The top three most
trustworthy faces are synthetic (S), while the bottom four least
trustworthy faces are real (R). A smiling face is more likely to
be rated as trustworthy, but 65.5% of our real faces and 58.8%
of synthetic faces are smiling, so facial expression alone cannot
explain why synthetic faces are rated as more trustworthy.
Discussion
Synthetically generated faces are not just highly photorealistic;
they are nearly indistinguishable from real faces and are judged
more trustworthy. This hyperphotorealism is consistent with re-
cent findings (10, 11). These two studies did not contain the
same diversity of race and gender as ours, nor did they match
the real and synthetic faces as we did to minimize the chance of
inadvertent cues. While it is less surprising that White male faces
are highly realistic—because these faces dominate the neural
network training—we find that the realism of synthetic faces
extends across race and gender. Perhaps most interestingly, we
find that synthetically generated faces are more trustworthy than
real faces. This may be because synthesized faces tend to look
more like average faces, which themselves are deemed more
trustworthy (12). Regardless of the underlying reason, synthet-
ically generated faces have emerged on the other side of the
uncanny valley. This should be considered a success for the fields
of computer graphics and vision. At the same time, easy access
(https://thispersondoesnotexist.com) to such high-quality fake
imagery has led and will continue to lead to various problems,
including more convincing online fake profiles and—as synthetic
audio and video generation continues to improve—problems of
nonconsensual intimate imagery (13), fraud, and disinformation
campaigns, with serious implications for individuals, societies,
and democracies.
We, therefore, encourage those developing these technologies
to consider whether the associated risks are greater than their
benefits. If so, then we discourage the development of technology
simply because it is possible. If not, then we encourage the par-
allel development of reasonable safeguards to help mitigate the
inevitable harms from the resulting synthetic media. Safeguards
could include, for example, incorporating robust watermarks into
the image and video synthesis networks that would provide a
downstream mechanism for reliable identification (14). Because
it is the democratization of access to this powerful technology
that poses the most significant threat, we also encourage recon-
sideration of the often laissez-faire approach to the public and
unrestricted release of code for anyone to incorporate into any
application.
Fig. 3. The four most (Top) and four least (Bottom) trustworthy faces and
their trustworthy rating on a scale of 1 (very untrustworthy) to 7 (very
trustworthy). Synthetic faces (S) are, on average, more trustworthy than real
faces (R).
Fig. 4. A representative set of matched real and synthetic faces.
At this pivotal moment, and as other scientific and engineering
fields have done, we encourage the graphics and vision commu-
nity to develop guidelines for the creation and distribution of
synthetic media technologies that incorporate ethical guidelines
for researchers, publishers, and media distributors.
Materials and Methods
Synthetic Faces. We selected 400 faces synthesized using the state-of-the-
art StyleGAN2 (2), ensuring diversity across gender (200 women; 200 men),
estimated age (ensuring a range of ages from children to older adults), and
race (100 African American or Black, 100 Caucasian, 100 East Asian, and 100
South Asian). To reduce extraneous cues, we only included images with a
mostly uniform background and devoid of any obvious rendering artifacts.
This culling of obvious artifacts makes the perceptual task harder. Because
the synthesis process is so easy, however, it is reasonable to assume that any
intentionally deceptive use of a synthetic face will not contain obvious visual
artifacts.
Real Faces. For each synthesized face, we collected a matching real face (in
terms of gender, age, race, and overall appearance) from the underlying face
database used in the StyleGAN2 learning stage. A standard convolutional
neural network descriptor (15) was used to extract a low-dimensional,
perceptually meaningful (16) representation of each synthetic face. The
extracted representation for each synthetic face—a 4,096-D real-valued
vector v_s—was compared with all other facial representations in the dataset
of 70,000 real faces to find the most similar face. The real face with
representation v_r with minimal Euclidean distance to v_s, and satisfying our
qualitative selection criteria, is selected as the matching face. As with the
synthetic faces, to reduce extraneous cues, we only included images 1) with
a mostly uniform background, 2) with unobstructed faces (e.g., no hats or
hands in front of face), 3) in focus and high resolution, and 4) with no
obvious writing or logos on clothing. We visually inspected up to 50 of the
best matched faces and selected the one that met the above criteria and was
also matched in terms of overall face position, posture, and expression, and
presence of glasses and jewelry. Shown in Fig. 4 are representative examples
of these matched real and synthetic faces.
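The matching procedure, embed every face with a CNN descriptor and take the Euclidean nearest neighbor among the real faces, can be sketched as follows. The paper uses the VGG face descriptor of ref. 15; the snippet below substitutes a generic ImageNet-pretrained VGG-16 from torchvision purely for illustration, and the function names and file paths are invented.

```python
# Nearest-neighbor face-matching sketch. A generic ImageNet VGG-16 stands in for the
# VGG face descriptor of ref. 15; this illustrates the embed-and-match step only.
import numpy as np
import torch
from PIL import Image
from torchvision import models, transforms

vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).eval()
embedder = torch.nn.Sequential(            # up to the first 4,096-D fully connected layer
    vgg.features, vgg.avgpool, torch.nn.Flatten(), *list(vgg.classifier[:2])
)

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

def embed(path: str) -> np.ndarray:
    """Return a 4,096-D descriptor for the face image at `path`."""
    with torch.no_grad():
        x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
        return embedder(x).squeeze(0).numpy()

def closest_real_face(synthetic_path: str, real_paths: list[str]) -> str:
    """Return the real image whose descriptor is nearest (Euclidean) to the synthetic one."""
    v_s = embed(synthetic_path)
    dists = [np.linalg.norm(v_s - embed(p)) for p in real_paths]
    return real_paths[int(np.argmin(dists))]   # candidate match; then vetted manually
```

In practice the 70,000 real-face descriptors would be computed once and cached, and the closest candidates (up to 50 per synthetic face, as described above) inspected by eye against the qualitative criteria.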
Perceptual Ratings. For experiment 1 (baseline), we recruited 315 partici-
pants from Amazon's Mechanical Turk Master Workers. Each participant first
read a brief introduction explaining the purpose of the study and a brief
explanation of what a synthetic face is. Before beginning, each participant
was informed they would be paid $5 for their time, and an extra $5 if their
overall accuracy was in the top 20% of response accuracies. Participants were
also informed they would see 10 catch trials of obviously synthetic faces
with glaring rendering errors. A failure to respond correctly to at least nine
of these trials led to the participants not being paid and their data being
excluded from our study. Each participant then saw 128 images, one at a
time, and specied whether the image was real or synthetic. Participants
had an unlimited amount of time to respond and were not provided with
feedback after each response.
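The catch-trial exclusion rule (pay and retain a participant only if at least 9 of the 10 obvious catch trials were answered correctly) is simple to express in code. The sketch below is hypothetical; the DataFrame column names are invented.

```python
# Hypothetical catch-trial exclusion rule; column names are invented for illustration.
import pandas as pd

def keep_participant(trials: pd.DataFrame) -> bool:
    """Keep a participant only if they got at least 9 of the 10 catch trials right."""
    catch = trials[trials["is_catch"]]
    n_correct = int((catch["response"] == catch["correct_answer"]).sum())
    return n_correct >= 9
```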
For experiment 2 (training and feedback), we recruited another 219
Mechanical Turk Master Workers (we had fewer participants in this study
because we excluded any participants who completed the first study). Each
participant first read a brief introduction explaining the purpose of the study
and a brief explanation of what a synthetic face is. Participants were then
shown a short tutorial describing examples of specific rendering artifacts
that can be used to identify synthetic faces. All other experimental condi-
tions were the same as in experiment 1, except that participants received
feedback after each response.
For experiment 3 (trustworthiness), we recruited 223 Mechanical Turk
Master Workers. Each participant first read a brief introduction explaining
that the purpose of the study was to assess the trustworthiness of a face on
a scale of 1 (very untrustworthy) to 7 (very trustworthy). Because there was
no correct answer here, no trial-by-trial feedback was provided. Participants
were also informed they would see 10 catch trials of faces in which the
numeric trustworthy rating was directly overlaid atop the face. A failure
to correctly report the specified rating on at least nine of these trials led
to the participants not being paid and their data being excluded from our
study. Each participant then saw 128 images, one at a time, and was asked
to rate the trustworthiness. Participants had an unlimited amount of time
to respond.
All experiments were carried out with the approval of the University of
California, Berkeley's Office for Protection of Human Subjects (Protocol ID
2019-07-12422) and Lancaster University’s Faculty of Science and Technology
Research Ethics Committee (Protocol ID FST20076). Participants gave fully
informed consent prior to taking part in the study.
Data Availability. Images have been deposited in GitHub (https://
github.com/NVlabs/stylegan2 and https://github.com/NVlabs/ffhq-dataset).
Anonymized experimental stimuli and data have been deposited in the
Open Science Framework (https://osf.io/ru36d/).
ACKNOWLEDGMENTS. We thank Erik Härkönen, Jaakko Lehtinen, and
David Luebke for their masterful synthesis of faces.
1. A. Oord et al., WaveNet: A generative model for raw audio. arXiv [Preprint] (2016).
https://arxiv.org/abs/1609.03499 (Accessed 17 January 2022).
2. T. Karras et al., “Analyzing and improving the image quality of StyleGAN” in IEEE
Conference on Computer Vision and Pattern Recognition (Institute of Electrical and
Electronics Engineers, 2020), pp. 8110–8119.
3. S. Suwajanakorn, S. M. Seitz, I. Kemelmacher-Shlizerman, Synthesizing Obama:
Learning lip sync from audio. ACM Trans. Graph. 36, 95 (2017).
4. L. Li et al., Face X-ray for more general face forgery detection. arXiv [Preprint]
(2019). https://arxiv.org/abs/1912.13458 (Accessed 17 January 2022).
5. S. Agarwal et al., “Protecting world leaders against deep fakes” in Proceedings of
the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops
(Institute of Electrical and Electronics Engineers, 2019), pp. 38–45.
6. S. Y. Wang, O. Wang, R. Zhang, A. Owens, A. A. Efros, “CNN-generated images
are surprisingly easy to spot... for now” in IEEE Conference on Computer Vision
and Pattern Recognition (Institute of Electrical and Electronics Engineers, 2020), pp.
8695–8704.
7. H. Farid, Commentary: Digital forensics in a post-truth age. Forensic Sci. Int. 289,
268–269 (2018).
8. J. Willis, A. Todorov, First impressions: Making up your mind after a 100-ms exposure
to a face. Psychol. Sci. 17, 592–598 (2006).
9. R. M. Stolier, E. Hehman, M. D. Keller, M. Walker, J. B. Freeman, The conceptual
structure of face impressions. Proc. Natl. Acad. Sci. U.S.A. 115, 9210–9215 (2018).
10. F. Lago et al., More real than real: A study on human visual perception of synthetic
faces. arXiv [Preprint] (2021). https://arxiv.org/abs/2106.07226 (Accessed 17 January
2022).
11. N. Hulzebosch, S. Ibrahimi, M. Worring, “Detecting CNN-generated facial images
in real-world scenarios” in Proceedings of the IEEE/CVF Conference on Computer
Vision and Pattern Recognition Workshops (Institute of Electrical and Electronics
Engineers, 2020), pp. 642–643.
12. C. Sofer, R. Dotsch, D. H. Wigboldus, A. Todorov, What is typical is good: The
influence of face typicality on perceived trustworthiness. Psychol. Sci. 26, 39–47
(2015).
13. D. K. Citron, M. A. Franks, Criminalizing revenge porn. Wake For. Law Rev. 49, 345
(2014).
14. N. Yu, V. Skripniuk, S. Abdelnabi, M. Fritz, “Artificial fingerprinting for generative
models: Rooting deepfake attribution in training data” in Proceedings of the
IEEE/CVF International Conference on Computer Vision (Institute of Electrical and
Electronics Engineers, 2021), pp. 14448–14457.
15. O. M. Parkhi, A. Vedaldi, A. Zisserman, Deep Face Recognition (British Machine
Vision Association, 2015).
16. T. Tariq, O. T. Tursun, M. Kim, P. Didyk, “Why are deep representations good per-
ceptual quality features?” in European Conference on Computer Vision (Springer,
2020), pp. 445–461.