Sanjay Singh

  • M.Tech., Ph.D.
  • Senior Principal Scientist & Group Head at CSIR – Central Electronics Engineering Research Institute (CSIR-CEERI)

Research & Development: AI, Computer Vision, Intelligent Systems

About

Publications: 108
Reads: 36,423
Citations: 1,104
Introduction
Sr. Principal Scientist & Group Head @ CSIR-CEERI | Professor @ AcSIR India | Visiting Scientist @ TUM Germany | Visiting Scientist & Faculty @ Hiroshima University Japan | Visiting Scientist @ Nagasaki University Japan
Current institution
CSIR – Central Electronics Engineering Research Institute (CSIR-CEERI)
Current position
  • Senior Principal Scientist & Group Head
Additional affiliations
March 2025 - present
Technical University of Munich (TUM), Germany
Position
  • Visiting Scientist
Description
  • Visiting Scientist (under Raman Research Fellowship) at Chair of Biological Imaging, Technical University of Munich (TUM), Germany with Prof. Dr. Vasilis Ntziachristos [Chair of Biological Imaging (TUM) and Director of Institute of Biological and Medical Imaging at the Helmholtz Zentrum München]
June 2017 - September 2019
Hiroshima University
Position
  • Visiting Scientist & Designated Associate Professor
Description
  • Visiting Researcher [02 Sept. 2019 – 18 Sept. 2019] in Department of System Cybernetics
  • Visiting Researcher [11 Jun. 2017 – 01 Jul. 2017] in Department of System Cybernetics
  • Designated Associate Professor [18 Dec. 2017 – 14 Mar. 2018] in Graduate School of Engineering
September 2022 - September 2022
Nagasaki University, Japan
Position
  • Visiting Scientist
Description
  • Visiting Scientist under Japan Science and Technology Agency (JST) Funded Sakura Science Researcher Exchange Program
Education
March 2009 - November 2015
CSIR - Central Electronics Engineering Research Institute (CSIR-CEERI) & Kurukshetra University
Field of study
  • VLSI Architectures, Computer Vision, Real-time Image Processing
August 2005 - July 2007
Kurukshetra University
Field of study
  • Microelectronics & VLSI Design

Publications (108)
Preprint
Full-text available
Detecting and interpreting operator actions, engagement, and object interactions in dynamic industrial workflows remains a significant challenge in human-robot collaboration research, especially within complex, real-world environments. Traditional unimodal methods often fall short of capturing the intricacies of these unstructured industrial settin...
Preprint
Full-text available
Monitoring complex assembly processes is critical for maintaining productivity and ensuring compliance with assembly standards. However, variability in human actions and subjective task preferences complicate accurate task anticipation and guidance. To address these challenges, we introduce the Multi-Modal Transformer Fusion and Recurrent Units (MM...
Preprint
Full-text available
The field of computational imaging has witnessed a promising paradigm shift with the emergence of untrained neural networks, offering novel solutions to inverse computational imaging problems. While existing techniques have demonstrated impressive results, they often operate either in the high-data regime, leveraging Generative Adversarial Networks...
Preprint
Full-text available
Lensless imaging has emerged as a promising field within inverse imaging, offering compact, cost-effective solutions with the potential to revolutionize the computational camera market. By circumventing traditional optical components like lenses and mirrors, novel approaches like mask-based lensless imaging eliminate the need for conventional hardw...
Article
Detecting anomalies in videos presents a significant challenge in the field of video surveillance. The primary goal is identifying and detecting uncommon actions or events within a video sequence. The difficulty arises from the limited availability of video frames depicting anomalies and the ambiguous definition of anomaly. Based on extensive appli...
Article
Full-text available
Facial expression recognition (FER) in real-world unconstrained conditions is a challenging and active field of research among the pattern recognition and computer vision community. FER systems have immense use in advanced applications based on human-computer interaction (HCI) and human-robot interaction (HRI). Most of these applications heavily re...
Article
Lack of proper maintenance of power line infrastructures is one of the main reasons behind power shortages and major blackouts. Current inspection methods are human-dependent, which is time-consuming and expensive. Recent progress in Unmanned Aerial Vehicles (UAVs) and digital cameras enforces the use of UAVs for power line inspection, reducing the...
Article
Full-text available
Yoga has become an essential part of modern life, and hence, there has been a tremendous demand for self-training yoga platforms for trainer-less yoga practice. Robust and efficient recognition of yoga poses in video stream is the first requirement of such systems. However, the existing techniques for yoga pose recognition are compute-intensive and...
Article
Automatic detection of abnormal behavior in video sequences is a fundamental and challenging problem for intelligent video surveillance systems. However, the existing state-of-the-art Video Anomaly Detection (VAD) methods are computationally expensive and lack the desired robustness in real-world scenarios. The contemporary VAD methods cannot detec...
Article
Industry 5.0 and increased industrial automation have driven the demand for systems recognizing human activities in industrial environments. Vision-based systems for human activity recognition at industrial sites may be helpful in ergonomic studies. Besides, these systems may help identify possible deviations in assembly line standard operating pro...
Conference Paper
Anomaly detection in video data plays a crucial role in numerous applications, such as industrial monitoring and automated surveillance. This paper presents a novel method for video anomaly detection (VAD) using Generative Adversarial Networks (GANs). The proposed method called VALT-GAN combines two separate branches, one for spatial information an...
Article
Full-text available
The most crucial and difficult challenge for intelligent video surveillance is to identify anomalies in a video that comprises anomalous behavior or occurrences. The ambiguous definition of the anomaly makes the detection of it a challenging task. Inspired by the wide adoption of generative adversarial networks (GANs), we proposed video anomaly det...
Article
Full-text available
Automatic detection and interpretation of abnormal events have become crucial tasks in large-scale video surveillance systems. The challenges arise from the lack of a clear definition of abnormality, which restricts the usage of supervised methods. To this end, we propose a novel unsupervised anomaly detection method, Spatio-Temporal Generative Adv...
Article
Full-text available
Image colorization is a fascinating application of AI for information restoration. The inherently ill-posed nature of the problem increases the challenge since the outputs could be multimodal. Existing learning-based methods produce acceptable results for straightforward cases but usually fail to restore the contextual information without clear fig...
Article
Spoof detection in complex real-world conditions has always been challenging for the face anti-spoofing research community. Most existing datasets need more practical variations for spoof detection in the wild and thus generate the need for a more complex dataset encompassing the required diversities. The single image-based anti-spoofing solutions...
Article
Full-text available
Convolutional neural networks (CNNs) have achieved human-level performance in various computer vision tasks, such as image classification, object detection & segmentation, etc. However, efficient CNN training requires a large amount of annotated data. Also, the CNNs, without explicit data augmentation, are bad at handling rotation and scale invaria...
Article
Facial expression recognition (FER) in the wild is an active and challenging field of research. A system for automatic FER finds use in a wide range of applications related to advanced human–computer interaction (HCI), human–robot interaction (HRI), human behavioral analysis, gaming and entertainment, etc. Since their inception, convolutional neura...
Chapter
The human face is one of the most widely available biometric methods of identification and verification. In the age of Industry 4.0, one can find digital cameras everywhere, making a face recognition-based digital identity system much more viable. The face is vulnerable to spoofing attacks because it is the most accessible and commonly used biometr...
Article
The three-dimensional convolutional neural network (3D-CNN) and long short-term memory (LSTM) have consistently outperformed many approaches in video-based facial expression recognition (VFER). The image is unrolled to a one-dimensional vector by the vanilla version of the fully-connected LSTM (FC-LSTM), which leads to the loss of crucial spatial i...
Chapter
Lensless image reconstruction is an ill-posed inverse problem in computational imaging, having several applications in machine vision. Existing approaches rely on large datasets for learning to perform deconvolution and are often specific to the point spread function of a particular lensless imager. Generating pairs of lensless images and their cor...
Article
Full-text available
Systems for automatic facial expression recognition (FER) have an enormous need in advanced human-computer interaction (HCI) and human-robot interaction (HRI) applications. Over the years, researchers developed many handcrafted feature descriptors for the FER task. These descriptors delivered good accuracy on publicly available FER benchmark datase...
Article
This article presents an online local path planning approach for autonomous drone navigating a 2D plane in an unknown, indoor corridor-like environment. The proposed method utilizes a reinforcement learning approach for training a local path planner for navigation in the said environment. With a continuous actor-critic learning automaton (CACLA) ap...
Article
Full-text available
Early diagnosis of brain tumors using magnetic resonance imaging (MRI) is vital for timely medication and effective treatment. However, most people living in remote areas do not have access to medical experts and diagnostic facilities. Nevertheless, recent advances in the Internet of Things and artificial intelligence are transforming the healthcare sys...
Preprint
Full-text available
Grayscale image colorization is a fascinating application of AI for information restoration. The inherently ill-posed nature of the problem makes it even more challenging since the outputs could be multi-modal. The learning-based methods currently in use produce acceptable results for straightforward cases but usually fail to restore the contextual...
Preprint
Full-text available
Over the past few years, there has been a significant improvement in the domain of few-shot learning. This learning paradigm has shown promising results for the challenging problem of anomaly detection, where the general task is to deal with heavy class imbalance. Our paper presents a new approach to few-shot classification, where we employ the kno...
Article
Clinical diagnostics for SARS-CoV-2 infection usually comprises the sampling of throat or nasopharyngeal swabs that are invasive and create patient discomfort. Hence, saliva is attempted as a sample of choice for the management of COVID-19 outbreaks that cripples the global healthcare system. Although limited by the risk of eliciting false-negative...
Article
Full-text available
Automatic recognition of the eye states is essential for diverse computer vision applications related to drowsiness detection, facial emotion recognition (FER), human–computer interaction (HCI), etc. Existing solutions for eye state detection are either parameter intensive or suffer from a low recognition rate. This paper presents the design and im...
Article
Full-text available
Alarming cases of falls in the elderly have triggered the rise of robust and cost-efficient systems for automated fall detection in humans. Although several potential solutions exist, they still have not achieved the desired level of robustness and acceptability. Lately, the proliferation of low-cost cameras coupled with deep learning techniques ha...
Article
Full-text available
Today, due to the widespread outbreak of the deadly coronavirus, popularly known as COVID-19, the traditional classroom education has been shifted to computer-based learning. Students of various cognitive and psychological abilities participate in the learning process. However, most students are hesitant to provide regular and honest feedback on th...
Article
Full-text available
In our day-to-day social interactions, non-verbal cues such as facial emotions play a vital role. These cues assist people in understanding and inferring the hidden emotional state of the individuals. However, blind and visually impaired persons (VIPs) sadly lack access to such cues, which results in impaired interpersonal communication. To allevia...
Article
Full-text available
The SARS-CoV-2 pandemic exposed the limitations of artificial-intelligence-based medical imaging systems. Early in the pandemic, the absence of sufficient training data prevented effective deep learning (DL) solutions for the diagnosis of COVID-19 based on X-ray data. Here, addressing the lacunae in existing literature and algorithms with the paucity...
Article
Purpose: The electronic nose is an array of chemical or gas sensors associated with a pattern-recognition framework capable of identifying and classifying odorant or non-odorant and simple or complex gases. Despite more than 30 years of research, robust e-nose devices remain limited. Most of the challenges towards reliable e-nose devices...
Article
Full-text available
Globally, human falls are the second leading cause of deaths induced due to unintentional injuries. These fatalities, in most cases, arise due to a lack of timely medication. Therefore, over the years, there has been an immense demand for systems that can quickly send fall-related information to the caretakers so that the medical relief team can re...
Article
This work proposes a hybrid 3D Convolutional Neural Network and Restricted Boltzmann Machine (Hybrid 3DCNN-RBM) architecture tailored for gas concentration estimation. The immense success of deep learning in computer vision and natural language processing inspired us to design a deep-learning-based gas concentration estimation network. The proposed...
Article
Full-text available
Existing techniques for Yoga pose recognition build classifiers based on sophisticated handcrafted features computed from the raw inputs captured in a controlled environment. These techniques often fail in complex real-world situations and thus, pose limitations on the practical applicability of existing Yoga pose recognition systems. This paper pr...
Article
Full-text available
Automatic recognition of facial expressions in the wild is a challenging problem and has drawn a lot of attention from the computer vision and pattern recognition community. Since their emergence, the deep learning techniques have proved their efficacy in facial expression recognition (FER) tasks. However, these techniques are parameter intensive,...
Article
Full-text available
In the past decade, facial emotion recognition (FER) research saw tremendous progress, which led to the development of novel convolutional neural network (CNN) architectures for automatic recognition of facial emotions in static images. Although these networks have achieved good recognition accuracy, they incur high computational costs and memory u...
Chapter
This study is an attempt towards improving the accuracy and execution time of a facial expression recognition (FER) system. The algorithmic pipeline consists of a face detector block, followed by a facial alignment and registration, feature extraction, feature selection, and classification blocks. The proposed method utilizes histograms of oriented...
Article
Full-text available
Rapid growth in advanced human-computer interaction (HCI) based applications has led to the immense popularity of facial expression recognition (FER) research among computer vision and pattern recognition researchers. Lately, a robust texture descriptor named Dynamic Local Ternary Pattern (DLTP) developed for face liveness detection has proved to b...
Chapter
Fall detection holds immense importance in the field of health-care, where timely detection allows for instant medical assistance. In this context, we propose a 3D ConvNet architecture which consists of 3D Inception modules for fall detection. The proposed architecture is a custom version of Inflated 3D (I3D) architecture, that takes compressed mea...
Preprint
Full-text available
The coronavirus disease of 2019 (COVID-19) pandemic exposed a limitation of artificial intelligence (AI) based medical image interpretation systems. Early in the pandemic, when need was greatest, the absence of sufficient training data prevented effective deep learning (DL) solutions. Even now, there is a need for Chest-X-ray (CxR) screening tools...
Chapter
Driver’s drowsiness is one of the major causes of increase in the number of road accidents. Therefore, design and implementation of a real-time driver’s drowsiness detection system are considered as a crucial component of the Advanced Driver Assistance System (ADAS). Along with other physiological parameters, yawn is often considered as one of the...
Chapter
Drowsiness of drivers is a critical problem and has recently attracted a lot of attention from both academia and industry. A real-time driver’s drowsiness detection system is often considered a crucial component of an Advanced Driver Assistance System (ADAS). Although there are a number of physical parameters associated with drowsiness like bli...
Preprint
Fall detection holds immense importance in the field of healthcare, where timely detection allows for instant medical assistance. In this context, we propose a 3D ConvNet architecture which consists of 3D Inception modules for fall detection. The proposed architecture is a custom version of Inflated 3D (I3D) architecture, that takes compressed meas...
Chapter
Over the past few years, Convolutional Neural Networks (CNNs) have provided major breakthroughs in fields such as computer vision and natural language processing, resulting in a rise in the adoption of CNNs with increased levels of complexity. Consequently, the need for fast and power efficient processing of such networks has become critically impo...
Chapter
Recently, there has been a huge demand for assistive technology for industrial, commercial, automobile, and societal applications. In some of these applications, there is a requirement of an efficient and accurate system for automatic facial expression recognition (FER). Therefore, FER has gained enormous interest among computer vision researchers....
Chapter
Human activity recognition (HAR) targets the methodologies to recognize different actions from a sequence of observations. Vision-based activity recognition is among the most popular unobtrusive techniques for activity recognition. Caring for the elderly who are living alone from a remote location is one of the biggest challenges of modern human...
Chapter
Detection of falls of elderly people is a trivial yet immediate problem due to the aging of the population. This creates a need for autonomous self-care systems that provide quick assistance. The three basic approaches used for fall detection include non-invasive vision-based devices, ambient-based devices, and wearable devices. The pa...
Article
Full-text available
In this study, the novel approach of real-time video stabilization system using a high-frame-rate (HFR) jitter sensing device is demonstrated to realize the computationally efficient technique of digital video stabilization for high-resolution image sequences. This system consists of a high-speed camera to extract and track feature points in gray-l...
Chapter
Automatic facial expression recognition (FER) has gained enormous interest among the computer vision researchers in recent years because of its potential deployment in many industrial, consumer, automobile, and societal applications. There are a number of techniques available in the literature for FER; among them, many appearance-based methods such...
Chapter
Detection of objects in aerial images has gained significant attention in recent years, due to its extensive needs in civilian and military reconnaissance and surveillance applications. With the advent of Unmanned Aerial Vehicles (UAV), the scope of performing such surveillance task has increased. The small size of the objects in aerial images make...
Chapter
Visual inspection of transmission and distribution networks is often carried out by various electricity companies on a regular basis to maintain the reliability, availability, and sustainability of electricity supply. To date, the most widely used inspection technique is manual, using foot patrol and/or helicopter opera...
Chapter
Automated video surveillance is a rapidly evolving area and has been gaining importance in the research community in recent years due to its capabilities of performing more efficient and effective surveillance by employing smart cameras. In this article, we present the design and implementation of an FPGA-based smart camera system for automated vid...
Conference Paper
Scene change detection, one of the fundamental and most important problems in computer vision, plays a very important role in the realization of complete industrial vision systems as well as automated video surveillance systems, enabling automatic scene analysis, monitoring, and generation of alerts based on relevant changes in a video stream. Therefore...
Article
Full-text available
Motion detection is the heart of a potentially complex automated video surveillance system, intended to be used as a standalone system. Therefore, in addition to being accurate and robust, a successful motion detection technique must also be economical in the use of computational resources on selected FPGA development platform. This is because many...
Article
Full-text available
The design of smart video surveillance systems is an active research field among the computer vision community because of their ability to perform automatic scene analysis by selecting and tracking the objects of interest. In this paper, we present the design and implementation of an FPGA-based standalone working prototype system for real-time trac...
Chapter
In this paper, we present hardware accelerator for Facial Expression Classification using One-Versus-All (OVA) linear Support Vector Machine (SVM) classifier. The motivation behind this work is to perform real-time classification of facial expressions into three different classes: neutral, happy and pain, which could be used in an embedded system t...
Article
Full-text available
Design of automated video surveillance systems is one of the exigent missions in computer vision community because of their ability to automatically select frames of interest in incoming video streams based on motion detection. This research paper focuses on the real-time hardware implementation of a motion detection algorithm for such vision based...
Article
A new resource efficient FPGA-based hardware architecture for real-time edge detection using Sobel operator for video surveillance applications has been proposed. The choice of Sobel operator is due to its property to counteract the noise sensitivity of the simple gradient operator. FPGA is chosen for this implementation due to its flexibility to p...
Conference Paper
In this paper we present a prototype FPGA design for Saliency detection based on image signature technique to support embedded vision application. Visual attention supports biological vision to restrict our gaze only to the region of interest of a visual scene. We propose a pipelined architecture using Gaussian filter, Discrete Cosine Transform, In...
Conference Paper
Tracking of objects of interest is of great significance for video based automated surveillance systems. This research presents the design and implementation of Xilinx ML510 (Virtex-5 FXT) FPGA platform based vision system for real-time object tracking in a video sequence. Modified particle filtering and sum of absolute differences (SAD) based sche...
Conference Paper
An accurate, hardware-efficient, and fast image rescaling unit is a crucial part of any real-time image processing system. Although a number of image scaling algorithms exist in the literature, Bicubic and Bilinear interpolation algorithms are the most widely used. In recent years, numerous algorithms have been proposed that aim to b...
Conference Paper
This paper presents the design of a dedicated VLSI architecture for focused region extraction in a video sequence and its implementation on Virtex-5 (ML510) FPGA platform. Edge width based scheme is used for focused region extraction. The proposed architecture is designed to meet the real-time requirements of video surveillance applications. It is...
Article
Full-text available
This paper presents a comprehensive review and a comparative study of various hardware/FPGA implementations of the Sobel edge detector and explores different architectures for the Sobel gradient computation unit in order to show the various trade-offs involved in choosing one over another. The different architectures using pipelining and/or parallelism (ke...
Article
Full-text available
Advances in FPGA technology have dramatically increased the use of FPGAs for computer vision applications. The primary task in developing such FPGA-based systems is interfacing the analog camera with the FPGA board. This paper describes the design and implementation of the camera interface module required for connecting an analog camera with Xili...
Article
Full-text available
Advances in FPGA technology have dramatically increased the use of FPGAs for computer vision applications. Availability of on-chip processor (like PowerPC) made it possible to design embedded systems using FPGAs for video processing applications. The objective of this research is to evaluate the performance of different memory components available...
Conference Paper
This research paper presents a fast and efficient hardware implementation of a pseudo-random number generator based on the Lehmer linear congruential method. We demonstrate in this paper how the introduction of application specificity in the architecture can deliver huge performance in terms of area and speed. The design has been specified in VHDL...
Conference Paper
A new area optimized VLSI architecture for color edge detection using Sobel operator is designed and implemented on Virtex-5 FPGA Platform. The proposed architecture uses only one processing element for computing gradients for all three R, G, and B color components and aims at reducing the FPGA resources usages. The FPGA resource usage is reduced m...
Article
Image scaling, a fundamental task in numerous image processing and computer vision applications, is the process of resizing an image by pixel interpolation. Image scaling leads to a number of undesirable image artifacts such as aliasing, blurring, and moiré. However, with an increase in the number of pixels considered for interpolation, the image qua...
Article
Full-text available
Change detection is one of the several important problems in the design of any automated video surveillance system. Appropriate selection of frames of significant changes can minimize the communication and processing overheads for such systems. This research presents the design of a VLSI architecture for change detection in a video sequence and its...
