
Shubhajit BasakFotoNation · ML
Shubhajit Basak
Doctor of Engineering
3D computer vision, synthetic data, computer graphics
About
19
Publications
7,067
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
108
Citations
Introduction
Computer Vision, Deep Learning, Synthetic Data, Natural Language Processing
Skills and Expertise
Publications
Publications (19)
While current research predominantly focuses on image-based colorization, the domain of video-based colorization remains relatively unexplored. Most existing video colorization techniques operate on a frame-by-frame basis, often overlooking the critical aspect of temporal coherence between successive frames. This approach can result in inconsistenc...
The incorporation of 3D data in facial analysis tasks has gained popularity in recent years. Though it provides a more accurate and detailed representation of the human face, accruing 3D face data is more complex and expensive than 2D face images. Either one has to rely on expensive 3D scanners or depth sensors which are prone to noise. An alternat...
This article presents C3I-SynFace: a large-scale synthetic human face dataset with corresponding ground truth annotations of head pose and face depth generated using the iClone 7 Character Creator "Realistic Human 100" toolkit with variations in ethnicity, gender, race, age, and clothing. The data is generated from 15 female and 15 male synthetic 3...
In this paper we propose a method for end-to-end speech driven video editing using a denoising diffusion model. Given a video of a person speaking, we aim to re-synchronise the lip and jaw motion of the person in response to a separate auditory speech recording without relying on intermediate structural representations such as facial landmarks or a...
Due to the real-time acquisition and reasonable cost of consumer cameras, monocular depth maps have been employed in a variety of visual applications. Regarding ongoing research in depth estimation, they continue to suffer from low accuracy and enormous sensor noise. To improve the prediction of depth maps, this paper proposed a lightweight neural...
Convolutional Neural Networks (CNN) have gained popularity as the de-facto model for any computer vision task. However, CNN have drawbacks, i.e. they fail to extract long-range perceptions in images. Due to their ability to capture long-range dependencies, transformer networks are adopted in computer vision applications, where they show state-of-th...
This article contains all of the information needed to conduct a study on monocular facial depth estimation problems. A brief literature review and applications on facial depth map research were offered first, followed by a comprehensive evaluation of publicly available facial depth datasets and widely used loss functions. The key properties and ch...
In many applications, such as robotic perception, scene understanding, augmented reality, 3D reconstruction, and medical image analysis, depth from images is a fundamentally ill-posed problem. The success of depth estimation models relies on assembling a suitably large and diverse training dataset and on the selection of appropriate loss functions....
Depth estimation from a single image frame is a fundamental challenge in computer vision, with many applications such as augmented reality, action recognition, image understanding, and autonomous driving. Large and diverse training sets are required for accurate depth estimation from a single image frame. Due to challenges in obtaining dense ground...
Accurate head pose estimation from 2D image data is an essential component of applications such as driver monitoring systems, virtual reality technology, and human-computer interaction. It enables a better determination of user engagement and attentiveness. The most accurate head pose estimators are based on Deep Neural Networks that are trained wi...
Recent advances in deep learning methods have increased the performance of face detection and recognition systems. The accuracy of these models relies on the range of variation provided in the training data. Creating a dataset that represents all variations of real-world faces is not feasible as the control over the quality of the data decreases wi...
In this paper, we explore how synthetically generated 3D face models can be used to construct a high-accuracy ground truth for depth. This allows us to train the Convolutional Neural Networks (CNN) to solve facial depth estimation problems. These models provide sophisticated controls over image variations including pose, illumination, facial expres...
In this paper, we explore how synthetically generated 3D face models can be used to construct a high accuracy ground truth for depth. This allows us to train the Convolutional Neural Networks (CNN) to solve facial depth estimation problems. These models provide sophisticated controls over image variations including pose, illumination, facial expres...