Guoying Zhao’s research while affiliated with University of Oulu and other places


Publications (8)


Active Defense Against Voice Conversion Through Generative Adversarial Network
  • Article

January 2024 · 21 Reads · 4 Citations

Signal Processing Letters, IEEE

Shihang Dong · Beijing Chen · Kaijie Ma · Guoying Zhao

Active defense is an important approach to countering speech deepfakes that threaten individuals' privacy, property, and reputation. However, existing works in this field suffer from issues such as high time cost and limited defense effectiveness. This letter proposes a Generative Adversarial Network (GAN) framework for adversarial attacks as a defense against malicious voice conversion. The proposed method uses a generator to produce adversarial perturbations and adds them to the mel-spectrogram of the target audio to craft an adversarial example. In addition, to enhance the defense effectiveness, a spectrogram waveform conversion simulation module (SWCSM) is designed to simulate the process of reconstructing the waveform from the adversarial mel-spectrogram example and re-extracting the mel-spectrogram from the reconstructed waveform. Experiments on four state-of-the-art voice conversion models show that our method achieves the overall best performance among five compared methods in both white-box and black-box scenarios in terms of defense effectiveness and generation time. The source code is available on GitHub at https://github.com/imagecbj/Initiative-Defense-against-Voice-Conversion-through-Generative-Adversarial-Network.
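
The core perturbation step described above can be pictured with a minimal PyTorch sketch: a small generator maps a mel-spectrogram to a bounded additive perturbation, which is then added to the target audio's mel-spectrogram. The layer layout, the perturbation budget eps, and the tensor shapes are illustrative assumptions, not details taken from the released code.

```python
# Minimal sketch (not the paper's implementation): a generator produces a
# bounded perturbation that is added to a target mel-spectrogram.
import torch
import torch.nn as nn

class PerturbationGenerator(nn.Module):
    """Maps a mel-spectrogram to a small additive perturbation."""
    def __init__(self, channels: int = 1, eps: float = 0.05):
        super().__init__()
        self.eps = eps  # hypothetical perturbation budget
        self.net = nn.Sequential(
            nn.Conv2d(channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, channels, 3, padding=1), nn.Tanh(),  # output in [-1, 1]
        )

    def forward(self, mel: torch.Tensor) -> torch.Tensor:
        return self.eps * self.net(mel)  # scale to the perturbation budget

mel = torch.randn(1, 1, 80, 128)   # (batch, channel, mel bins, frames), placeholder shape
generator = PerturbationGenerator()
adv_mel = mel + generator(mel)     # adversarial example in the mel domain
print(adv_mel.shape)
```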



End-to-End Dual-Branch Network Towards Synthetic Speech Detection

January 2023 · 40 Reads · 28 Citations

Signal Processing Letters, IEEE

Synthetic speech attacks pose a growing threat to Automatic Speaker Verification (ASV) systems, and many synthetic speech detection (SSD) systems have therefore been proposed to help ASV systems resist such attacks. However, existing SSD systems still lack generalization ability for attacks generated by unknown synthesis algorithms. This paper proposes an end-to-end ensemble system, namely the Dual-Branch Network, in which linear frequency cepstral coefficients (LFCC) and the constant Q transform (CQT) are used as the inputs of the two branches, respectively. In addition, four fusion strategies are compared to find the optimal way of combining the two branches; multi-task learning and the convolutional block attention module (CBAM) are introduced into the Dual-Branch Network to help it learn common forgery features across different forgery types of speech and to enhance the representation power of the learned features. Experimental results on the ASVspoof 2019 logical access (LA) dataset demonstrate that the proposed system outperforms existing state-of-the-art systems on both t-DCF and EER scores and generalizes well to unknown forgery types of synthetic speech. The source code is available at https://github.com/imagecbj/End-to-End-Dual-Branch-Network-Towards-Synthetic-Speech-Detection.
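
As a rough illustration of the dual-branch design (not the authors' implementation), the sketch below passes precomputed LFCC and CQT features through two small CNN encoders and fuses their embeddings by concatenation; the encoder layout, the feature shapes, and the choice of concatenation are assumptions, while the paper itself compares four fusion strategies.

```python
# Illustrative dual-branch classifier fused at the embedding level.
import torch
import torch.nn as nn

def make_branch() -> nn.Module:
    # A small 2-D CNN encoder standing in for each branch's backbone.
    return nn.Sequential(
        nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d((4, 4)),
        nn.Flatten(), nn.Linear(16 * 4 * 4, 64), nn.ReLU(),
    )

class DualBranchSSD(nn.Module):
    def __init__(self):
        super().__init__()
        self.lfcc_branch = make_branch()
        self.cqt_branch = make_branch()
        self.classifier = nn.Linear(64 * 2, 2)  # bona fide vs. spoof

    def forward(self, lfcc: torch.Tensor, cqt: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([self.lfcc_branch(lfcc), self.cqt_branch(cqt)], dim=1)
        return self.classifier(fused)

# Precomputed time-frequency features (shapes are placeholders).
lfcc = torch.randn(2, 1, 60, 400)   # LFCC: (batch, 1, coefficients, frames)
cqt = torch.randn(2, 1, 84, 400)    # CQT magnitude: (batch, 1, bins, frames)
logits = DualBranchSSD()(lfcc, cqt)
print(logits.shape)  # (2, 2)
```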



Figures: extracting the specified face from a frame; structure of the dual-branch neural network; structure of the EfficientNet-B0 network; MBConv architecture; heatmaps extracted with Grad-CAM from a frame sequence (the last row shows the softmax score of each frame); and 9 more.

A dual-branch neural network for DeepFake video detection by detecting spatial and temporal inconsistencies
  • Article
  • Full-text available

July 2022 · 337 Reads · 12 Citations

Liang Kuang · Tian Hang · [...] · Guoying Zhao

It has become a research hotspot to detect whether a video is natural or a DeepFake. However, almost all existing works focus on detecting inconsistency in either the spatial or the temporal domain alone. In this paper, a dual-branch (spatial branch and temporal branch) neural network is proposed to detect both spatial and temporal inconsistency for DeepFake video detection. The spatial branch detects spatial inconsistency with the effective EfficientNet model. The temporal branch focuses on temporal inconsistency detection with a new network model that takes optical flow as input, uses EfficientNet to extract optical flow features, and applies a Bidirectional Long Short-Term Memory (Bi-LSTM) network to capture the temporal inconsistency of the optical flow; the optical flow frames are stacked before being fed into EfficientNet. Finally, the softmax scores of the two branches are combined with a binary-class linear SVM classifier. Experimental results on the compressed FaceForensics++ dataset and the Celeb-DF dataset show that: (a) the proposed dual-branch network performs better than some recent spatial and temporal models on the Celeb-DF dataset and on all four manipulation methods in FaceForensics++, since the two branches complement each other; and (b) ablation experiments show that the use of optical flow inputs, Bi-LSTM, and the dual branches greatly improves detection performance.
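
The final fusion step can be illustrated with a short scikit-learn sketch: each video yields one softmax score from the spatial branch and one from the temporal branch, and a binary linear SVM is trained on these two-dimensional score vectors. The scores and labels below are synthetic placeholders, not data from the paper.

```python
# Sketch of score-level fusion with a binary linear SVM.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n = 200
# Columns: [spatial branch softmax score, temporal branch softmax score]
scores = rng.uniform(0.0, 1.0, size=(n, 2))
labels = (scores.mean(axis=1) > 0.5).astype(int)  # toy labels: 1 = DeepFake

fusion_svm = LinearSVC(C=1.0)
fusion_svm.fit(scores, labels)
pred = fusion_svm.predict(np.array([[0.9, 0.8]]))  # fuse one video's two scores
print("DeepFake" if pred[0] == 1 else "natural")
```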


Distinguishing Between Natural and GAN‐Generated Face Images by Combining Global and Local Features

January 2022 · 140 Reads · 23 Citations

Chinese Journal of Electronics

With the development of face image synthesis and generation technology based on generative adversarial networks (GANs), it has become a research hotspot to determine whether a given face image is natural or generated. However, the generalization capability of existing algorithms still needs improvement. Therefore, this paper proposes a general algorithm. Firstly, learning on important local areas containing many facial key points is strengthened by combining global and local features. Secondly, metric learning based on the ArcFace loss is applied to extract common and discriminative features. Finally, the extracted features are fed into the classification module to detect GAN-generated faces. Experiments are conducted on two publicly available natural datasets (CelebA and FFHQ) and seven GAN-generated datasets. Experimental results demonstrate that the proposed algorithm achieves better generalization performance than the state-of-the-art algorithms, with an average detection accuracy above 0.99. Moreover, the proposed algorithm is robust against additional attacks, such as Gaussian blurring and Gaussian noise addition.
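
The metric-learning component can be sketched as an ArcFace-style margin head in PyTorch (an additive angular margin on the target class followed by cross-entropy); the scale s and margin m below are common defaults rather than values reported in the paper, and the head is only a stand-in for the full pipeline.

```python
# ArcFace-style margin head: cos(theta + m) on the target class, scaled by s.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ArcFaceHead(nn.Module):
    def __init__(self, feat_dim: int, num_classes: int, s: float = 30.0, m: float = 0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.s, self.m = s, m

    def forward(self, features: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # Cosine similarity between L2-normalized features and class weights.
        cosine = F.linear(F.normalize(features), F.normalize(self.weight))
        theta = torch.acos(cosine.clamp(-1 + 1e-7, 1 - 1e-7))
        # Add the angular margin only to the ground-truth class.
        one_hot = F.one_hot(labels, cosine.size(1)).float()
        logits = self.s * torch.cos(theta + self.m * one_hot)
        return F.cross_entropy(logits, labels)

features = torch.randn(8, 128)       # embeddings from a backbone (placeholder)
labels = torch.randint(0, 2, (8,))   # 0 = natural, 1 = GAN-generated
loss = ArcFaceHead(128, 2)(features, labels)
loss.backward()
```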


A Local Perturbation Generation Method for GAN-Generated Face Anti-Forensics

January 2022 · 28 Reads · 31 Citations

IEEE Transactions on Circuits and Systems for Video Technology

Although current generative adversarial network (GAN)-generated face forensic detectors based on deep neural networks (DNNs) have achieved considerable performance, they are vulnerable to adversarial attacks. In this paper, an effective local perturbation generation method is proposed to expose the vulnerability of state-of-the-art forensic detectors. The main idea is to mine the areas of fake faces that multiple detectors commonly rely on in their decision-making, and then use GANs to generate local anti-forensic perturbations in these areas to enhance the visual quality and transferability of the anti-forensic faces. Meanwhile, to improve the anti-forensic effect, a double-mask (soft mask and hard mask) strategy and a three-part loss (the GAN training loss, the adversarial loss consisting of an ensemble classification loss and an ensemble feature loss, and the regularization loss) are designed for training the generator. Experiments conducted on fake faces generated by StyleGAN demonstrate the proposed method's advantage over state-of-the-art methods in terms of anti-forensic success rate, imperceptibility, and transferability. The source code is available at https://github.com/imagecbj/A-Local-Perturbation-Generation-Method-for-GAN-generated-Face-Anti-forensics.
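
A hedged sketch of the masking idea follows: the generator's perturbation is confined to local areas selected by a soft mask and a hard mask before being added to the fake face, and the training objective sums the GAN loss, the adversarial loss, and a regularization term. All masks, weights, and loss values below are placeholders rather than the paper's actual components.

```python
# Double-mask perturbation and a combined three-part loss (all values illustrative).
import torch

fake_face = torch.rand(1, 3, 256, 256)             # GAN-generated face in [0, 1]
perturbation = 0.03 * torch.randn_like(fake_face)  # stand-in for a generator output
soft_mask = torch.rand(1, 1, 256, 256)             # attention-like weights in [0, 1]
hard_mask = (soft_mask > 0.5).float()              # binarized region of interest

# The perturbation is confined to the masked local areas.
anti_forensic_face = (fake_face + soft_mask * hard_mask * perturbation).clamp(0, 1)

# Combined objective in the spirit of the three-part loss (weights are made up):
adversarial_loss = torch.tensor(0.7)   # would come from the ensemble of detectors
gan_loss = torch.tensor(0.2)           # would come from the GAN discriminator
regularization = perturbation.abs().mean()
total_loss = gan_loss + 1.0 * adversarial_loss + 10.0 * regularization
print(total_loss.item())
```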


A Robust GAN-Generated Face Detection Method Based on Dual-Color Spaces and an Improved Xception

September 2021 · 139 Reads · 98 Citations

IEEE Transactions on Circuits and Systems for Video Technology

In recent years, generative adversarial networks (GANs) have been widely used to generate realistic fake face images, which can easily deceive human beings. To detect these images, some methods have been proposed. However, their detection performance will be degraded greatly when the testing samples are post-processed. In this paper, some experimental studies on detecting post-processed GAN-generated face images find that (a) both the luminance component and chrominance components play an important role, and (b) the RGB and YCbCr color spaces achieve better performance than the HSV and Lab color spaces. Therefore, to enhance the robustness, both the luminance component and chrominance components of dual-color spaces (RGB and YCbCr) are considered to utilize color information effectively. In addition, the convolutional block attention module and multilayer feature aggregation module are introduced into the Xception model to enhance its feature representation power and aggregate multilayer features, respectively. Finally, a robust dual-stream network is designed by integrating dual-color spaces RGB and YCbCr and using an improved Xception model. Experimental results demonstrate that our method outperforms some existing methods, especially in its robustness against different types of post-processing operations, such as JPEG compression, Gaussian blurring, gamma correction, and median filtering.
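
The dual-color-space input can be prepared as in the sketch below, which converts an RGB batch to YCbCr (ITU-R BT.601, full range) and stacks the two representations; stacking six channels is one simple option and not necessarily the exact dual-stream design used in the paper.

```python
# RGB -> YCbCr conversion (BT.601, full range) and a simple dual-space stack.
import torch

def rgb_to_ycbcr(rgb: torch.Tensor) -> torch.Tensor:
    """rgb: (batch, 3, H, W) in [0, 1] -> YCbCr in [0, 1]."""
    r, g, b = rgb[:, 0], rgb[:, 1], rgb[:, 2]
    y  = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 0.5
    cr = 0.5 * r - 0.418688 * g - 0.081312 * b + 0.5
    return torch.stack([y, cb, cr], dim=1)

rgb = torch.rand(4, 3, 299, 299)                     # Xception's usual input size
ycbcr = rgb_to_ycbcr(rgb)
dual_stream_input = torch.cat([rgb, ycbcr], dim=1)   # (4, 6, 299, 299)
print(dual_stream_input.shape)
```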

Citations (8)


... These methods effectively destroy the authenticity of the fake speech by adding tiny adversarial perturbations to the audio that are imperceptible to humans, making it impossible for the speech conversion model to accurately imitate the target speaker. In addition, Dong et al. [44] proposed a framework based on generative adversarial networks (GANs) that aims to quickly generate adversarial perturbations to defend against speech conversion models. ...

Reference:

VocalCrypt: Novel Active Defense Against Deepfake Voice Based on Masking Effect
Active Defense Against Voice Conversion Through Generative Adversarial Network
  • Citing Article
  • January 2024

Signal Processing Letters, IEEE

... Raza et al. [31] proposed a three-stream network utilizing temporal, spatial, and spatiotemporal features for deepfake detection. Moreover, security techniques for deepfake detection on untrusted servers were introduced by Chen B. et al. [34]; their method enables distant servers to detect deepfake videos without understanding the content. ...

Privacy-Preserving DeepFake Face Image Detection
  • Citing Article
  • October 2023

Digital Signal Processing

... These results set solid baselines for comparison, highlighting the performance of the top approaches in the field (EER % / t-DCF): [name truncated] 1.98 / 0.0469; LFCC+LCNN-LSTM-sum [19] 1.92 / 0.0520; SE-Res2Net50* [14] 1.89 / 0.0452; LFCC+GMM-ResNet [21] 1.80 / 0.0498; CQT+MCG-Res2Net50 [15] 1.78 / 0.0520; Raw PC-DARTS [40] 1.77 / 0.0517; FastAudio-Tri+X-vector [41] 1.73 / 0.0491; LPS+SENet [32] 1.14 / 0.0368; Capsule* [12] 1.07 / 0.0328; LFCC+OCT [23] 1.06 / 0.0345; scDenseNet* [20] 0.98 / 0.0320; PA-SE-ResNet* [22] 0.96 / 0.0307; AASIST [35] 0.83 / 0.0275; Dual-Branch Network* [42] 0.80 / 0.0214; RW-ResNet [25] 2.98 / 0.0817; Res-TSSDNet [26] 1.64 / 0.0482; Ours: DDFNet 0.69 / 0.0203 ...

End-to-End Dual-Branch Network Towards Synthetic Speech Detection
  • Citing Article
  • January 2023

Signal Processing Letters, IEEE

... However, with the continuous innovation of GANs, the difference between generated images and natural images in the spatial domain becomes increasingly difficult to detect [68]. As a consequence, [6,9,13,14,17,37,43,68] turned their attention to the frequency domain, which improves detection generalization performance by fusing features from the spatial and frequency domains. But their methods cannot adaptively capture the most discriminative features. ...

GAN-Generated Face Detection with Strong Generalization Ability Based on Quaternions
  • Citing Article
  • December 2022

Journal of Computer-Aided Design & Computer Graphics

... As cyberspace activities become increasingly frequent, cyberspace portraits raise many security issues. Therefore, it is crucial to protect the privacy of portraits [1,2]. Image-to-image (I2I) translation has become a popular research topic in recent years, and it aims to learn image-projecting functions from source to target domains. ...

A Local Perturbation Generation Method for GAN-Generated Face Anti-Forensics
  • Citing Article
  • January 2022

IEEE Transactions on Circuits and Systems for Video Technology

... The rapid development and distribution of DeepFake technology has issued an urgent call for the research community to present Face Image Manipulation Detection (FIMD) techniques that can check the authenticity of face images and detect manipulations if they exist. In the last few years, the research community has dedicated a lot of effort to developing FIMD techniques that can generally serve media forensics and security systems [5][6][7][8][9][10][11][12][13][14]. ...

A dual-branch neural network for DeepFake video detection by detecting spatial and temporal inconsistencies

... For instance, McCloskey et al. [6] leveraged red-green bivariate histograms and abnormal pixel exposure ratios for detection, while Agarwal et al. [7] exploited high-frequency artifacts stemming from GANs' upsampling processes. Chen et al. [8] integrated both global and local image features, utilizing metric learning to enhance the overall detection performance of the model. In terms of network architectures, Convolutional Neural Networks (CNNs) remain a predominant choice for deepfake detection tasks due to their capacity to extract semantic, color, and texture information [9], as seen in Fu et al.'s [10] dual-channel CNN architecture, which is capable of concurrently processing both high- and low-frequency image components. ...

Distinguishing Between Natural and GAN‐Generated Face Images by Combining Global and Local Features

Chinese Journal of Electronics

... However, these methods often rely on fixed filters, limiting adaptability to unseen models and post-processing effects. Feature Fusion integrates multiple complementary features for robust AI-synthesized image detection [5,30,32,46]. Techniques include dual-color fusion (RGB and YCbCr) [5] and frequency-spatial feature fusion [32]. Unlike previous methods, our approach introduces LLM-based detection, improving generalization against advanced generative models and diverse forgery types. ...

A Robust GAN-Generated Face Detection Method Based on Dual-Color Spaces and an Improved Xception
  • Citing Article
  • September 2021

IEEE Transactions on Circuits and Systems for Video Technology