Stefanos Zafeiriou
Imperial College London | Imperial · Department of Computing
PhD
About
445
Publications
123,787
Reads
33,230
Citations
Introduction
Additional affiliations
Position
- Research Assistant
Position
- Research Associate/Postdoc
Position
- Imperial College London (Research Fellow)
Education
January 2004 - July 2007
September 1999 - June 2003
Publications (445)
Sign language is a visual language that encompasses all linguistic features of natural languages and serves as the primary communication method for the deaf and hard-of-hearing communities. While many studies have successfully adapted pretrained language models (LMs) for sign language translation (sign-to-text), drawing inspiration from its linguis...
Foundation Models have recently emerged as powerful tools in various domains of AI, showing potential for significant advancements in Brain-Computer Interfaces (BCIs). However, the initial implementations of Large Brainwave Models (LBMs) face significant challenges when applied to real-world scenarios, primarily due to various distribution shifts....
The use of Generative AI (GenAI) in developing large brainwave foundation models for Brain-Computer Interfaces (BCIs) offers enormous potential but also comes with several key safety and ethical concerns. This work identifies these challenges and highlights cases of potential misuse of GenAI in BCIs, including synthetic neural activity, behaviour p...
In recent years, 3D hand pose estimation methods have garnered significant attention due to their extensive applications in human-computer interaction, virtual reality, and robotics. In contrast, there has been a notable gap in hand detection pipelines, posing significant challenges in constructing effective real-world multi-hand reconstruction sys...
This paper describes the 7th Affective Behavior Analysis in-the-wild (ABAW) Competition, which is part of the respective Workshop held in conjunction with ECCV 2024. The 7th ABAW Competition addresses novel challenges in understanding human expressions and behaviors, crucial for the development of human-centered technologies. The Competition compri...
We propose ID-to-3D, a method to generate identity- and text-guided 3D human heads with disentangled expressions, starting from even a single casually captured in-the-wild image of a subject. The foundation of our approach is anchored in compositionality, alongside the use of task-specific 2D diffusion models as priors for optimization. First, we e...
Recent advancements in text-to-image generation using diffusion models have significantly improved the quality of generated images and expanded the ability to depict a wide range of objects. However, ensuring that these models adhere closely to the text prompts remains a considerable challenge. This issue is particularly pronounced when trying to g...
Objective: The patterns of brain activity associated with different brain processes can be used to identify different brain states and make behavioral predictions. However, the relevant features are not readily apparent and accessible. Our aim is to design a system for learning informative latent representations from multichannel recordings of ongo...
Machine learning models have opened up enormous opportunities in the field of Brain-Computer Interfaces (BCIs). Despite their great success, they usually face severe limitations when they are employed in real-life applications outside a controlled laboratory setting. Mixing causal reasoning, identifying causal relationships between variables of int...
Multi-Task Learning (MTL) is a framework, where multiple related tasks are learned jointly and benefit from a shared representation space, or parameter transfer. To provide sufficient learning support, modern MTL uses annotated data with full, or sufficiently large overlap across tasks, i.e., each input sample is annotated for all, or most of the t...
Personalized electroencephalogram (EEG) decoders hold a distinct preference in healthcare applications, especially in the context of Motor-Imagery (MI) Brain-Computer Interfaces (BCIs), owing to their inherent capability to effectively tackle inter-subject variability. This study introduces a novel subject selection framework that blends ideas from...
The long-tailed distribution is a common phenomenon in the real world. Extracted large scale image datasets inevitably demonstrate the long-tailed property and models trained with imbalanced data can obtain high performance for the over-represented categories, but struggle for the under-represented categories, leading to biased predictions and perf...
Brain-computer interfaces (BCIs) enable a direct communication of the brain with the external world, using one's neural activity, measured by electroencephalography (EEG) signals. In recent years, Convolutional Neural Networks (CNNs) have been widely used to perform automatic feature extraction and classification in various EEG-based tasks. However...
Perspective distortions and crowd variations make crowd counting a challenging task in computer vision. To tackle it, many previous works have used multi-scale architecture in deep neural networks (DNNs). Multi-scale branches can be either directly merged (e.g. by concatenation) or merged through the guidance of proxies (e.g. attentions) in t...
In this paper, we introduce FitMe, a facial reflectance model and a differentiable rendering optimization pipeline, that can be used to acquire high-fidelity renderable human avatars from single or multiple images. The model consists of a multi-modal style-based generator, that captures facial appearance in terms of diffuse and specular reflectance...
Following the remarkable success of diffusion models on image generation, recent works have also demonstrated their impressive ability to address a number of inverse problems in an unsupervised way, by properly constraining the sampling process based on a conditioning input. Motivated by this, in this paper, we present the first approach to use dif...
Deep Convolutional Neural Networks (CNNs) have recently demonstrated impressive results in electroencephalogram (EEG) decoding for several Brain-Computer Interface (BCI) paradigms, including Motor-Imagery (MI). However, neurophysiological processes underpinning EEG signals vary across subjects causing covariate shifts in data distributions and henc...
Self-attention mechanisms and non-local blocks have become crucial building blocks for state-of-the-art neural architectures thanks to their unparalleled ability to capture long-range dependencies in the input. However, their cost is quadratic in the number of spatial positions, making their use impractical in many real-world applications. I...
We present Free-HeadGAN, a person-generic neural talking head synthesis system. We show that modeling faces with sparse 3D facial landmarks is sufficient for achieving state-of-the-art generative performance, without relying on strong statistical priors of the face, such as 3D Morphable Models. Apart from 3D pose and facial expressions, our method...
The fifth Affective Behavior Analysis in-the-wild (ABAW) Competition is part of the respective ABAW Workshop which will be held in conjunction with IEEE Computer Vision and Pattern Recognition Conference (CVPR), 2023. The 5th ABAW Competition is a continuation of the Competitions held at ECCV 2022, IEEE CVPR 2022, ICCV 2021, IEEE FG 2020 and CVPR 2...
In this paper we introduce the Temporo-Spatial Vision Transformer (TSViT), a fully-attentional model for general Satellite Image Time Series (SITS) processing based on the Vision Transformer (ViT). TSViT splits a SITS record into non-overlapping patches in space and time which are tokenized and subsequently processed by a factorized temporo-spatial...
3D gaze estimation is most often tackled as learning a direct mapping between input images and the gaze vector or its spherical coordinates. Recently, it has been shown that pose estimation of the face, body and hands benefits from revising the learning target from few pose parameters to dense 3D coordinates. In this work, we leverage this observat...
We present Dynamic Neural Portraits, a novel approach to the problem of full-head reenactment. Our method generates photo-realistic video portraits by explicitly controlling head pose, facial expressions and eye gaze. Our proposed architecture is different from existing methods that rely on GAN-based image-to-image translation networks for transfor...
Recently, Deep Neural Networks (DNNs) have been shown to outperform traditional methods in many disciplines such as computer vision, speech recognition and natural language processing. A prerequisite for the successful application of DNNs is a large amount of data. Even though various facial datasets exist for the case of 2D images, there is a rema...
The recent advances in 3D sensing technology have made possible the capture of point clouds in significantly high resolution. However, increased detail usually comes at the expense of high storage, as well as computational costs in terms of processing and visualization operations. Mesh and Point Cloud simplification methods aim to reduce the comple...
Near infrared (NIR) to Visible (VIS) face matching is challenging due to the significant domain gaps as well as a lack of sufficient data for cross-modality model training. To overcome this problem, we propose a novel method for paired NIR-VIS facial image generation. Specifically, we reconstruct 3D face shape and reflectance from a large 2D facial...
Face Restoration (FR) aims to restore High-Quality (HQ) faces from Low-Quality (LQ) input images, which is a domain-specific image restoration problem in the low-level computer vision area. The early face restoration methods mainly use statistic priors and degradation models, which are difficult to meet the requirements of real-world applications i...
Facial 3D Morphable Models are a main computer vision subject with countless applications and have been highly optimized in the last two decades. The tremendous improvements of deep generative networks have created various possibilities for improving such models and have attracted wide interest. Moreover, the recent advances in neural radiance fiel...
Perspective distortions and crowd variations make crowd counting a challenging task in computer vision. To tackle it, many previous works have used multi-scale architecture in deep neural networks (DNNs). Multi-scale branches can be either directly merged (e.g. by concatenation) or merged through the guidance of proxies (e.g. attentions) in the DNN...
We present Free-HeadGAN, a person-generic neural talking head synthesis system. We show that modeling faces with sparse 3D facial landmarks is sufficient for achieving state-of-the-art generative performance, without relying on strong statistical priors of the face, such as 3D Morphable Models. Apart from 3D pose and facial expressions, our method...
Recovering a low-rank matrix from highly corrupted measurements arises in compressed sensing of structured high-dimensional signals (e.g., videos and hyperspectral images among others). Robust principal component analysis (RPCA), solved via principal component pursuit (PCP), recovers a low-rank matrix from sparse corruptions that are of unknown val...
Recent deep face hallucination methods show stunning performance in super-resolving severely degraded facial images, even surpassing human ability. However, these algorithms are mainly evaluated on non-public synthetic datasets. It is thus unclear how these algorithms perform on public face hallucination datasets. Meanwhile, most of the existing da...
Geodesic paths and distances are among the most popular intrinsic properties of 3D surfaces. Traditionally, geodesic paths on discrete polygon surfaces were computed using shortest path algorithms, such as Dijkstra. However, such algorithms have two major limitations. They are non-differentiable which limits their direct usage in learnable pipeline...
In this work, we employ causal reasoning to break down and analyze important challenges in the decoding of Motor-Imagery (MI) electroencephalography (EEG) signals. Furthermore, we present a framework consisting of dynamic convolutions that addresses one of the issues that arise through this causal investigation, namely the subject distribution shif...
This paper probes intrinsic factors behind typical failure cases (e.g. spatial inconsistency and boundary confusion) produced by the existing state-of-the-art method in face parsing. To tackle these problems, we propose a novel Decoupled Multi-task Learning with Cyclical Self-Regulation (DML-CSR) for face parsing. Specifically, DML-CSR designs a mu...
Learning a dense 3D model with fine-scale details from a single facial image is highly challenging and ill-posed. To address this problem, many approaches fit smooth geometries through facial prior while learning details as additional displacement maps or personalized basis. However, these techniques typically require vast datasets of paired multi-...
In training machine learning models for land cover semantic segmentation there is a stark contrast between the availability of satellite imagery to be used as inputs and ground truth data to enable supervised learning. While thousands of new satellite images become freely available on a daily basis, getting ground truth data is still very challengi...
Background/aim: To develop a 3D morphable model of the normal paediatric mandible to analyse shape development and growth patterns for males and females. Methods: Computed tomography (CT) data was collected for 242 healthy children referred for CT scan between 2011 and 2018 aged between 0 and 47 months (mean, 20.6 ± 13.4 months, 59.9% male). Thresh...
While Graph Neural Networks (GNNs) have achieved remarkable results in a variety of applications, recent studies exposed important shortcomings in their ability to capture the structure of the underlying graph. It has been shown that the expressive power of standard GNNs is bounded by the Weisfeiler-Leman (WL) graph isomorphism test, from which the...
Transfer learning and meta-learning offer some of the most promising avenues to unlock the scalability of healthcare and consumer technologies driven by biosignal data. This is because current methods cannot generalise well across human subjects' data and handle learning from different heterogeneously collected data sets, thus limiting the scale of...
Clinical diagnosis of craniofacial anomalies requires expert knowledge. Recent studies have shown that artificial intelligence (AI) based facial analysis can match the diagnostic capabilities of expert clinicians in syndrome identification. In general, these systems use 2D images and analyse texture and colour. They are powerful tools for photograp...
Over the past decade, deep learning has had a revolutionary impact on a broad range of fields such as computer vision and image processing, computational photography, medical imaging and speech and language analysis and synthesis etc. Deep learning technologies are estimated to have added billions in business value, created new markets, and transfo...
Building subject-independent deep learning models for EEG decoding faces the challenge of strong covariate-shift across different datasets, subjects and recording sessions. Our approach to address this difficulty is to explicitly align feature distributions at various layers of the deep learning model, using both simple statistical techniques as we...
Micro-expressions (MEs) are a special form of facial expression that may occur when people try to hide their true feelings for some reason. MEs are important clues to people’s true feelings, but they are difficult or impossible for ordinary people to capture with the naked eye because they are very short and subtle. It is expected that robust compu...
In this article, we propose a fully supervised pretraining scheme based on contrastive learning particularly tailored to dense classification tasks. The proposed context-self contrastive loss (CSCL) learns an embedding space that makes semantic boundaries pop-up by use of a similarity metric between every location in a training sample and its local...
Over the last years, many face analysis tasks have accomplished astounding performance, with applications including face generation and 3D face reconstruction from a single "in-the-wild" image. Nevertheless, to the best of our knowledge, there is no method which can produce render-ready high-resolution 3D faces from "in-the-wild" images and this ca...
Over the last years, many face analysis tasks have accomplished astounding performance, with applications including face generation and 3D face reconstruction from a single "in-the-wild" image. Nevertheless, to the best of our knowledge, there is no method which can produce render-ready high-resolution 3D faces from "in-the-wild" images and thi...
Background: This study aims to capture the 3D shape of the human skull in a healthy paediatric population (0–4 years old) and construct a generative statistical shape model. Methods: The skull bones of 178 healthy children (55% male, 20.8 ± 12.9 months) were reconstructed from computed tomography (CT) images. 29 anatomical landmarks were placed on t...
Patterns of brain activity are associated with different brain processes and can be used to identify different brain states and make behavioral predictions. However, the relevant features are not readily apparent and accessible. To mine informative latent representations from multichannel EEG recordings, we propose a novel differentiable EEG decodi...