
Sanjay Singh
- M.Tech., Ph.D.
- Senior Principal Scientist & Group Head at CSIR – Central Electronics Engineering Research Institute (CSIR-CEERI)
Research & Development: AI, Computer Vision, Intelligent Systems
About
108
Publications
36,423
Reads
1,104
Citations
Introduction
Sr. Principal Scientist & Group Head @ CSIR-CEERI | Professor @ AcSIR India | Visiting Scientist @ TUM Germany | Visiting Scientist & Faculty @ Hiroshima University Japan | Visiting Scientist @ Nagasaki University Japan
Skills and Expertise
Current institution
CSIR – Central Electronics Engineering Research Institute (CSIR-CEERI)
Current position
- Senior Principal Scientist & Group Head
Additional affiliations
March 2025 - present
Technical University of Munich (TUM), Germany
Position
- Visiting Scientist
Description
- Visiting Scientist (under Raman Research Fellowship) at Chair of Biological Imaging, Technical University of Munich (TUM), Germany with Prof. Dr. Vasilis Ntziachristos [Chair of Biological Imaging (TUM) and Director of Institute of Biological and Medical Imaging at the Helmholtz Zentrum München]
June 2017 - September 2019
Hiroshima University, Japan
Position
- Visiting Scientist & Designated Associate Professor
Description
- Visiting Researcher [02 Sept. 2019 – 18 Sept. 2019] in Department of System Cybernetics
- Visiting Researcher [11 Jun. 2017 – 01 Jul. 2017] in Department of System Cybernetics
- Designated Associate Professor [18 Dec. 2017 – 14 Mar. 2018] in Graduate School of Engineering
September 2022 - September 2022
Nagasaki University, Japan
Position
- Visiting Scientist
Description
- Visiting Scientist under Japan Science and Technology Agency (JST) Funded Sakura Science Researcher Exchange Program
Education
March 2009 - November 2015
CSIR - Central Electronics Engineering Research Institute (CSIR-CEERI) & Kurukshetra University
Field of study
- VLSI Architectures, Computer Vision, Real-time Image Processing
August 2005 - July 2007
Publications (108)
Detecting and interpreting operator actions, engagement, and object interactions in dynamic industrial workflows remains a significant challenge in human-robot collaboration research, especially within complex, real-world environments. Traditional unimodal methods often fall short of capturing the intricacies of these unstructured industrial settin...
Monitoring complex assembly processes is critical for maintaining productivity and ensuring compliance with assembly standards. However, variability in human actions and subjective task preferences complicate accurate task anticipation and guidance. To address these challenges, we introduce the Multi-Modal Transformer Fusion and Recurrent Units (MM...
The field of computational imaging has witnessed a promising paradigm shift with the emergence of untrained neural networks, offering novel solutions to inverse computational imaging problems. While existing techniques have demonstrated impressive results, they often operate either in the high-data regime, leveraging Generative Adversarial Networks...
Lensless imaging has emerged as a promising field within inverse imaging, offering compact, cost-effective solutions with the potential to revolutionize the computational camera market. By circumventing traditional optical components like lenses and mirrors, novel approaches like mask-based lensless imaging eliminate the need for conventional hardw...
Detecting anomalies in videos presents a significant challenge in the field of video surveillance. The primary goal is identifying and detecting uncommon actions or events within a video sequence. The difficulty arises from the limited availability of video frames depicting anomalies and the ambiguous definition of anomaly. Based on extensive appli...
Facial expression recognition (FER) in real-world unconstrained conditions is a challenging and active field of research among the pattern recognition and computer vision community. FER systems have immense use in advanced applications based on human-computer interaction (HCI) and human-robot interaction (HRI). Most of these applications heavily re...
Lack of proper maintenance of power line infrastructures is one of the main reasons behind power shortages and major blackouts. Current inspection methods are human-dependent, which is time-consuming and expensive. Recent progress in Unmanned Aerial Vehicles (UAVs) and digital cameras enforces the use of UAVs for power line inspection, reducing the...
Yoga has become an essential part of modern life, and hence, there has been a tremendous demand for self-training yoga platforms for trainer-less yoga practice. Robust and efficient recognition of yoga poses in video stream is the first requirement of such systems. However, the existing techniques for yoga pose recognition are compute-intensive and...
Automatic detection of abnormal behavior in video sequences is a fundamental and challenging problem for intelligent video surveillance systems. However, the existing state-of-the-art Video Anomaly Detection (VAD) methods are computationally expensive and lack the desired robustness in real-world scenarios. The contemporary VAD methods cannot detec...
Industry 5.0 and increased industrial automation have driven the demand for systems recognizing human activities in industrial environments. Vision-based systems for human activity recognition at industrial sites may be helpful in ergonomic studies. Besides, these systems may help identify possible deviations in assembly line standard operating pro...
Anomaly detection in video data plays a crucial role in numerous applications, such as industrial monitoring and automated surveillance. This paper presents a novel method for video anomaly detection (VAD) using Generative Adversarial Networks (GANs). The proposed method called VALT-GAN combines two separate branches, one for spatial information an...
The most crucial and difficult challenge for intelligent video surveillance is to identify anomalies in a video that comprises anomalous behavior or occurrences. The ambiguous definition of the anomaly makes the detection of it a challenging task. Inspired by the wide adoption of generative adversarial networks (GANs), we proposed video anomaly det...
Automatic detection and interpretation of abnormal events have become crucial tasks in large-scale video surveillance systems. The challenges arise from the lack of a clear definition of abnormality, which restricts the usage of supervised methods. To this end, we propose a novel unsupervised anomaly detection method, Spatio-Temporal Generative Adv...
Image colorization is a fascinating application of AI for information restoration. The inherently ill-posed nature of the problem increases the challenge since the outputs could be multimodal. Existing learning-based methods produce acceptable results for straightforward cases but usually fail to restore the contextual information without clear fig...
Spoof detection in complex real-world conditions has always been challenging for the face anti-spoofing research community. Most existing datasets need more practical variations for spoof detection in the wild and thus generate the need for a more complex dataset encompassing the required diversities. The single image-based anti-spoofing solutions...
Convolutional neural networks (CNNs) have achieved human-level performance in various computer vision tasks, such as image classification, object detection & segmentation, etc. However, efficient CNN training requires a large amount of annotated data. Also, the CNNs, without explicit data augmentation, are bad at handling rotation and scale invaria...
Facial expression recognition (FER) in the wild is an active and challenging field of research. A system for automatic FER finds use in a wide range of applications related to advanced human–computer interaction (HCI), human–robot interaction (HRI), human behavioral analysis, gaming and entertainment, etc. Since their inception, convolutional neura...
The human face is one of the most widely available biometric methods of identification and verification. In the age of Industry 4.0, one can find digital cameras everywhere, making a face recognition-based digital identity system much more viable. The face is vulnerable to spoofing attacks because it is the most accessible and commonly used biometr...
The three-dimensional convolutional neural network (3D-CNN) and long short-term memory (LSTM) have consistently outperformed many approaches in video-based facial expression recognition (VFER). The image is unrolled to a one-dimensional vector by the vanilla version of the fully-connected LSTM (FC-LSTM), which leads to the loss of crucial spatial i...
Lensless image reconstruction is an ill-posed inverse problem in computational imaging, having several applications in machine vision. Existing approaches rely on large datasets for learning to perform deconvolution and are often specific to the point spread function of a particular lensless imager. Generating pairs of lensless images and their cor...
Systems for automatic facial expression recognition (FER) have an enormous need in advanced human-computer interaction (HCI) and human-robot interaction (HRI) applications. Over the years, researchers developed many handcrafted feature descriptors for the FER task. These descriptors delivered good accuracy on publicly available FER benchmark datase...
This article presents an online local path planning approach for autonomous drone navigating a 2D plane in an unknown, indoor corridor-like environment. The proposed method utilizes a reinforcement learning approach for training a local path planner for navigation in the said environment. With a continuous actor-critic learning automaton (CACLA) ap...
Early diagnosis of brain tumors using magnetic resonance imaging (MRI) is vital for timely medication and effective treatment. However, most people living in remote areas do not have access to medical experts and diagnostic facilities. Nevertheless, recent advances in the Internet of Things and artificial intelligence are transforming the healthcare sys...
Grayscale image colorization is a fascinating application of AI for information restoration. The inherently ill-posed nature of the problem makes it even more challenging since the outputs could be multi-modal. The learning-based methods currently in use produce acceptable results for straightforward cases but usually fail to restore the contextual...
Over the past few years, there has been a significant improvement in the domain of few-shot learning. This learning paradigm has shown promising results for the challenging problem of anomaly detection, where the general task is to deal with heavy class imbalance. Our paper presents a new approach to few-shot classification, where we employ the kno...
Clinical diagnostics for SARS-CoV-2 infection usually comprises the sampling of throat or nasopharyngeal swabs that are invasive and create patient discomfort. Hence, saliva is attempted as a sample of choice for the management of COVID-19 outbreaks that cripple the global healthcare system. Although limited by the risk of eliciting false-negative...
Automatic recognition of the eye states is essential for diverse computer vision applications related to drowsiness detection, facial emotion recognition (FER), human–computer interaction (HCI), etc. Existing solutions for eye state detection are either parameter intensive or suffer from a low recognition rate. This paper presents the design and im...
Alarming cases of falls in the elderly have triggered the rise of robust and cost-efficient systems for automated fall detection in humans. Although several potential solutions exist, they still have not achieved the desired level of robustness and acceptability. Lately, the proliferation of low-cost cameras coupled with deep learning techniques ha...
Today, due to the widespread outbreak of the deadly coronavirus, popularly known as COVID-19, the traditional classroom education has been shifted to computer-based learning. Students of various cognitive and psychological abilities participate in the learning process. However, most students are hesitant to provide regular and honest feedback on th...
In our day-to-day social interactions, non-verbal cues such as facial emotions play a vital role. These cues assist people in understanding and inferring the hidden emotional state of the individuals. However, blind and visually impaired persons (VIPs) sadly lack access to such cues, which results in impaired interpersonal communication. To allevia...
The SARS-CoV-2 pandemic exposed the limitations of artificial intelligence-based medical imaging systems. Earlier in the pandemic, the absence of sufficient training data prevented effective deep learning (DL) solutions for the diagnosis of COVID-19 based on X-Ray data. Here, addressing the lacunae in existing literature and algorithms with the paucity...
Purpose: The electronic nose is an array of chemical or gas sensors associated with a pattern-recognition framework capable of identifying and classifying odorant or non-odorant and simple or complex gases. Despite more than 30 years of research, robust e-nose devices are still limited. Most of the challenges towards reliable e-nose devices...
Globally, human falls are the second leading cause of deaths induced due to unintentional injuries. These fatalities, in most cases, arise due to a lack of timely medication. Therefore, over the years, there has been an immense demand for systems that can quickly send fall-related information to the caretakers so that the medical relief team can re...
This work proposes a hybrid 3D Convolutional Neural Network and Restricted Boltzmann Machine (Hybrid 3DCNN-RBM) architecture tailored for gas concentration estimation. The immense success of deep learning in computer vision and natural language processing inspired us to design a deep-learning-based gas concentration estimation network. The proposed...
Existing techniques for Yoga pose recognition build classifiers based on sophisticated handcrafted features computed from the raw inputs captured in a controlled environment. These techniques often fail in complex real-world situations and thus, pose limitations on the practical applicability of existing Yoga pose recognition systems. This paper pr...
Automatic recognition of facial expressions in the wild is a challenging problem and has drawn a lot of attention from the computer vision and pattern recognition community. Since their emergence, the deep learning techniques have proved their efficacy in facial expression recognition (FER) tasks. However, these techniques are parameter intensive,...
In the past decade, facial emotion recognition (FER) research saw tremendous progress, which led to the development of novel convolutional neural network (CNN) architectures for automatic recognition of facial emotions in static images. Although these networks have achieved good recognition accuracy, they incur high computational costs and memory u...
This study is an attempt towards improving the accuracy and execution time of a facial expression recognition (FER) system. The algorithmic pipeline consists of a face detector block, followed by a facial alignment and registration, feature extraction, feature selection, and classification blocks. The proposed method utilizes histograms of oriented...
Rapid growth in advanced human-computer interaction (HCI) based applications has led to the immense popularity of facial expression recognition (FER) research among computer vision and pattern recognition researchers. Lately, a robust texture descriptor named Dynamic Local Ternary Pattern (DLTP) developed for face liveness detection has proved to b...
Fall detection holds immense importance in the field of health-care, where timely detection allows for instant medical assistance. In this context, we propose a 3D ConvNet architecture which consists of 3D Inception modules for fall detection. The proposed architecture is a custom version of Inflated 3D (I3D) architecture, that takes compressed mea...
The coronavirus disease of 2019 (COVID-19) pandemic exposed a limitation of artificial intelligence (AI) based medical image interpretation systems. Early in the pandemic, when need was greatest, the absence of sufficient training data prevented effective deep learning (DL) solutions. Even now, there is a need for Chest-X-ray (CxR) screening tools...
Driver’s drowsiness is one of the major causes of increase in the number of road accidents. Therefore, design and implementation of a real-time driver’s drowsiness detection system are considered as a crucial component of the Advanced Driver Assistance System (ADAS). Along with other physiological parameters, yawn is often considered as one of the...
Drowsiness of drivers is a critical problem and has recently attracted a lot of attention from both academia and industry. A real-time driver’s drowsiness detection system is often considered a crucial component of an Advanced Driver Assistance System (ADAS). Although there are a number of physical parameters associated with drowsiness like bli...
Fall detection holds immense importance in the field of healthcare, where timely detection allows for instant medical assistance. In this context, we propose a 3D ConvNet architecture which consists of 3D Inception modules for fall detection. The proposed architecture is a custom version of Inflated 3D (I3D) architecture, that takes compressed meas...
Over the past few years, Convolutional Neural Networks (CNNs) have provided major breakthroughs in fields such as computer vision and natural language processing, resulting in a rise in the adoption of CNNs with increased levels of complexity. Consequently, the need for fast and power efficient processing of such networks has become critically impo...
Recently, there has been a huge demand for assistive technology for industrial, commercial, automobile, and societal applications. In some of these applications, there is a requirement of an efficient and accurate system for automatic facial expression recognition (FER). Therefore, FER has gained enormous interest among computer vision researchers....
Human activity recognition (HAR) targets the methodologies to recognize the different actions from a sequence of observations. Vision-based activity recognition is among the most popular unobtrusive technique for activity recognition. Caring for the elderly who are living alone from a remote location is one of the biggest challenges of modern human...
Detection of falls of elderly people is a trivial yet immediate problem due to the growing age of the population. This creates the need for autonomous self-care systems that provide quick assistance. The three basic approaches used for fall detection include non-invasive vision based devices, ambient based devices and wearable devices. The pa...
In this study, the novel approach of real-time video stabilization system using a high-frame-rate (HFR) jitter sensing device is demonstrated to realize the computationally efficient technique of digital video stabilization for high-resolution image sequences. This system consists of a high-speed camera to extract and track feature points in gray-l...
Automatic facial expression recognition (FER) has gained enormous interest among the computer vision researchers in recent years because of its potential deployment in many industrial, consumer, automobile, and societal applications. There are a number of techniques available in the literature for FER; among them, many appearance-based methods such...
Detection of objects in aerial images has gained significant attention in recent years, due to its extensive needs in civilian and military reconnaissance and surveillance applications. With the advent of Unmanned Aerial Vehicles (UAV), the scope of performing such surveillance task has increased. The small size of the objects in aerial images make...
Visual inspection of transmission and distribution networks is often carried out by various electricity companies on a regular basis to maintain the reliability, availability, and sustainability of electricity supply. To date, the most widely used inspection technique is manual, using either foot patrol and/or helicopter opera...
Automated video surveillance is a rapidly evolving area and has been gaining importance in the research community in recent years due to its capabilities of performing more efficient and effective surveillance by employing smart cameras. In this article, we present the design and implementation of an FPGA-based smart camera system for automated vid...
Scene change detection, one of the fundamental and most important problems of computer vision, plays a very important role in the realization of a complete industrial vision system as well as an automated video surveillance system, for automatic scene analysis, monitoring, and generation of alerts based on relevant changes in a video stream. Therefore...
Motion detection is the heart of a potentially complex automated video surveillance system, intended to be used as a standalone system. Therefore, in addition to being accurate and robust, a successful motion detection technique must also be economical in the use of computational resources on selected FPGA development platform. This is because many...
The design of smart video surveillance systems is an active research field among the computer vision community because of their ability to perform automatic scene analysis by selecting and tracking the objects of interest. In this paper, we present the design and implementation of an FPGA-based standalone working prototype system for real-time trac...
In this paper, we present a hardware accelerator for Facial Expression Classification using a One-Versus-All (OVA) linear Support Vector Machine (SVM) classifier. The motivation behind this work is to perform real-time classification of facial expressions into three different classes: neutral, happy and pain, which could be used in an embedded system t...
Design of automated video surveillance systems is one of the exigent missions in computer vision community because of their ability to automatically select frames of interest in incoming video streams based on motion detection. This research paper focuses on the real-time hardware implementation of a motion detection algorithm for such vision based...
A new resource efficient FPGA-based hardware architecture for real-time edge detection using Sobel operator for video surveillance applications has been proposed. The choice of Sobel operator is due to its property to counteract the noise sensitivity of the simple gradient operator. FPGA is chosen for this implementation due to its flexibility to p...
In this paper we present a prototype FPGA design for Saliency detection based on image signature technique to support embedded vision application. Visual attention supports biological vision to restrict our gaze only to the region of interest of a visual scene. We propose a pipelined architecture using Gaussian filter, Discrete Cosine Transform, In...
Tracking of objects of interest is of great significance for video based automated surveillance systems. This research presents the design and implementation of Xilinx ML510 (Virtex-5 FXT) FPGA platform based vision system for real-time object tracking in a video sequence. Modified particle filtering and sum of absolute differences (SAD) based sche...
An accurate, hardware-efficient, and fast image rescaling unit is a crucial part of any real-time image processing system. Although a number of image scaling algorithms exist in the literature, the Bicubic and Bilinear interpolation algorithms are the most widely used. In recent years, numerous algorithms have been proposed that aim to b...
This paper presents the design of a dedicated VLSI architecture for focused region extraction in a video sequence and its implementation on Virtex-5 (ML510) FPGA platform. Edge width based scheme is used for focused region extraction. The proposed architecture is designed to meet the real-time requirements of video surveillance applications. It is...
This paper presents a comprehensive review and a comparative study of various hardware/FPGA implementations of the Sobel edge detector and explores different architectures for the Sobel gradient computation unit in order to show the various trade-offs involved in choosing one over another. The different architectures using pipelining and/or parallelism (ke...
Advances in FPGA technology have dramatically increased the use of FPGAs for computer vision applications. The primary task in developing such FPGA-based systems is interfacing the analog camera with the FPGA board. This paper describes the design and implementation of the camera interface module required for connecting an analog camera with Xili...
Advances in FPGA technology have dramatically increased the use of FPGAs for computer vision applications. Availability of on-chip processor (like PowerPC) made it possible to design embedded systems using FPGAs for video processing applications. The objective of this research is to evaluate the performance of different memory components available...
This research paper presents a fast and efficient hardware implementation of a pseudo-random number generator based on the Lehmer linear congruential method. We demonstrate in this paper how the introduction of application specificity in the architecture can deliver huge performance in terms of area and speed. The design has been specified in VHDL...
A new area optimized VLSI architecture for color edge detection using Sobel operator is designed and implemented on Virtex-5 FPGA Platform. The proposed architecture uses only one processing element for computing gradients for all three R, G, and B color components and aims at reducing the FPGA resources usages. The FPGA resource usage is reduced m...
Image scaling, a fundamental task of numerous image processing and computer vision applications, is the process of resizing an image by pixel interpolation. Image scaling leads to a number of undesirable image artifacts such as aliasing, blurring and moiré. However, with an increase in the number of pixels considered for interpolation, the image qua...
Change detection is one of the several important problems in the design of any automated video surveillance system. Appropriate selection of frames of significant changes can minimize the communication and processing overheads for such systems. This research presents the design of a VLSI architecture for change detection in a video sequence and its...
Questions
Question (1)
Which polymers make a uniform film?