Source publication
The ability of computers to recognise hand gestures visually is essential for progress in human–computer interaction.
Gesture recognition has applications ranging from sign language to medical assistance to virtual reality. However, gesture
recognition is extremely challenging not only because of its diverse contexts, multiple interpretations, and...
Similar publications
In this paper we present our bio-responsive virtual reality (VR) experience that explores visual forms of entrainment through amorphous nature-inspired phenomena that evolve and react to a tightly coupled real-virtual immersive space controlled through two immersants' physiological data. Multiple layers of real-time visuals inspired by nature phe...
Citations
... By analyzing muscle electrical activity during gestures, EMG systems can predict hand and wrist movements in real-time [5]. Compared to other modalities like computer vision [6], EMG offers several advantages: it is robust to lighting and field-of-view challenges, it sidesteps privacy concerns associated with environmental recording, and it is uniquely suited for operating prosthetic devices via residual muscle activity. ...
Electromyography (EMG)-based gesture recognition is a promising approach for designing intuitive human-computer interfaces. However, while these systems typically perform well in controlled laboratory settings, their usability in real-world applications is compromised by declining performance during real-time control. This decline is largely due to goal-directed behaviors that are not captured in static, offline scenarios. To address this issue, we use Context Informed Incremental Learning (CIIL) - marking its first deployment in an object-manipulation scenario - to continuously adapt the classifier using contextual cues. Nine participants without upper limb differences completed a functional task in a virtual reality (VR) environment involving transporting objects with life-like grips. We compared two scenarios: one where the classifier was adapted in real-time using contextual information, and the other using a traditional open-loop approach without adaptation. The CIIL-based approach not only enhanced task success rates and efficiency, but also reduced the perceived workload by 7.1%, despite causing a 5.8% reduction in offline classification accuracy. This study highlights the potential of real-time contextualized adaptation to enhance user experience and usability of EMG-based systems for practical, goal-oriented applications, crucial elements towards their long-term adoption. The source code for this study is available at: https://github.com/BiomedicalITS/ciil-emg-vr.
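A minimal sketch of the adaptation loop this abstract describes, under stated assumptions: the EMG feature stream and the contextual cue are simulated with random data, the label set and helper functions (emg_features, context_label) are illustrative stand-ins rather than the authors' API, and an sklearn SGDClassifier plays the role of the adaptable gesture classifier. The authors' actual implementation is in the linked repository.

```python
# Sketch of context-informed incremental learning (CIIL) for EMG gesture control.
# EMG features and contextual cues are simulated; names are illustrative only.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
N_FEAT, GESTURES = 8, ["rest", "power_grip", "pinch", "point"]   # assumed label set

def emg_features():
    """Stand-in for a feature vector computed from one EMG window."""
    return rng.normal(size=N_FEAT)

def context_label():
    """Stand-in for a contextual cue, e.g. the grip the grasped object affords."""
    return rng.integers(len(GESTURES)) if rng.random() < 0.3 else None

clf = SGDClassifier(loss="log_loss")                  # supports online updates
X0 = rng.normal(size=(40, N_FEAT))                    # offline calibration data
y0 = rng.integers(len(GESTURES), size=40)
clf.partial_fit(X0, y0, classes=np.arange(len(GESTURES)))

for _ in range(100):                                  # real-time control loop
    x = emg_features().reshape(1, -1)
    pred = clf.predict(x)[0]                          # drive the prosthesis / VR hand
    ctx = context_label()
    if ctx is not None:                               # adapt only when context is known
        clf.partial_fit(x, [ctx])
```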
... The main aim of hand gesture recognition is to interpret the meaning conveyed by the hand's location and posture [5]. Hand gesture recognition has applications in many different fields and is used in gaming, IoT devices, virtual reality (VR), augmented reality (AR), robotics, sign language interpretation, assistive technologies, sports activity assistance, underwater rescue, firefighting assistance, etc., as shown in Figure 1 [6][7][8][9]. Static hand gestures are defined by a single, fixed hand form or position; dynamic gestures, on the other hand, incorporate movement across time, as shown in Figure 2. Our research focuses on static hand gesture detection, which is crucial in domains such as computer vision, human-computer interaction, and sign language interpretation. ...
Human gesture image recognition is the process of identifying, deciphering, and classifying human gestures in images or video frames using computer vision algorithms. These gestures can vary from the simplest hand motions, body positions, and facial emotions to complicated gestures. Two significant problems affecting the performance of human gesture image recognition methods are ambiguity and invariance. Ambiguity occurs when gestures have the same shape but different orientations, while invariance guarantees that gestures are correctly classified even when scale, lighting, or orientation varies. To overcome these issues, hand-crafted features can be combined with deep learning to greatly improve the performance of hand gesture image recognition models. This combination improves the model's overall accuracy and dependability in identifying a variety of hand movements by enhancing its capacity to capture both shape and texture properties. Thus, in this study, we propose a hand gesture recognition method that combines ResNet50 feature extraction with the Tamura texture descriptor and uses the adaptability of a generalized additive model (GAM) to represent intricate interactions between the features. Experiments were carried out on publicly available datasets containing images of American Sign Language (ASL) gestures. As Tamura-ResNet50-OptimizedGAM achieved the highest accuracy rate on the ASL datasets, it is believed to be the best option for human gesture image recognition. According to the experimental results, the accuracy rate was 96%, which is higher than the overall accuracy of the state-of-the-art techniques currently in use.
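The following sketch illustrates the hybrid-feature idea this abstract describes (deep features concatenated with a hand-crafted texture descriptor feeding a downstream model), with two simplifications that are assumptions of the example rather than the paper's method: the Tamura descriptor is reduced to a single contrast statistic, and a plain logistic regression stands in for the optimized GAM.

```python
# Sketch of a hybrid feature pipeline: ResNet-50 embedding + a Tamura-style
# contrast statistic, then a simple classifier (stand-in for the paper's GAM).
import numpy as np
import torch
import torchvision.models as models
from sklearn.linear_model import LogisticRegression

resnet = models.resnet50(weights="IMAGENET1K_V1")
resnet.fc = torch.nn.Identity()          # 2048-d embedding instead of class logits
resnet.eval()

def deep_features(img):                  # img: float tensor, shape (3, 224, 224)
    with torch.no_grad():
        return resnet(img.unsqueeze(0)).squeeze(0).numpy()

def tamura_contrast(gray):               # gray: 2-D numpy array in [0, 1]
    mu, sigma2 = gray.mean(), gray.var()
    kurtosis = ((gray - mu) ** 4).mean() / (sigma2 ** 2 + 1e-8)
    return np.array([sigma2 ** 0.5 / (kurtosis ** 0.25 + 1e-8)])

def hybrid_features(img):                # concatenate deep + texture descriptors
    gray = img.mean(dim=0).numpy()
    return np.concatenate([deep_features(img), tamura_contrast(gray)])

# X: list of (3, 224, 224) tensors of ASL gesture images, y: integer labels
# clf = LogisticRegression(max_iter=1000).fit([hybrid_features(x) for x in X], y)
```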
... Within this domain, unsupervised video object segmentation (UVOS) aims to automatically identify and segment visually salient and semantically meaningful objects from the background without manual annotations or prior knowledge [3,4]. This process mirrors human cognition in visual perception, laying the foundation for various high-level vision tasks [5] and enabling diverse applications [6]. However, videos often contain rich motion dynamics and complex background distractions, leading to challenges such as occlusion, rapid motion, and appearance changes, which pose significant obstacles in achieving precise and robust object segmentation. ...
Unsupervised video object segmentation (UVOS) aims to automatically segment the most salient and semantically meaningful objects in a video without relying on manual annotations. Existing methods often focus on direct feature fusion without fully exploiting the inherent advantages of individual features, leading to limited performance when handling scenes with diverse motion patterns or similar foreground-background appearances. To address these challenges, we propose MAHC (motion-appearance hierarchical clustering), a novel unsupervised video object segmentation framework that effectively integrates motion and appearance cues through hierarchical feature learning and progressive clustering refinement. Our framework employs a hierarchical interleaved attention mechanism within an autoencoder structure to enhance motion feature representation, and utilizes a multi-level clustering strategy that progressively integrates different clustering techniques to achieve comprehensive segmentation from global patterns to fine-grained details. Additionally, we improve foreground-background discrimination by combining multi-view subspace analysis with motion intensity information from reconstructed optical flow. Extensive experiments on four challenging benchmark datasets (DAVIS16, FBMS, SegTrackv2, and YouTube-Objects) demonstrate that MAHC significantly outperforms existing unsupervised methods.
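A toy sketch of the fusion-then-cluster step underlying this kind of approach, using simulated data: per-pixel appearance (RGB) and motion (optical-flow) features are stacked, normalized, and split into two clusters. MAHC's attention-based feature learning, multi-level clustering, and subspace analysis are not reproduced here; this only illustrates how motion and appearance cues can be combined before clustering.

```python
# Toy motion-appearance fusion and clustering for foreground/background splitting.
# All data is simulated; this is an illustration, not the MAHC pipeline.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

H, W = 40, 60
rng = np.random.default_rng(0)
rgb  = rng.random((H, W, 3))                        # appearance cue (a video frame)
flow = rng.random((H, W, 2))                        # motion cue (optical flow u, v)
flow[20:40, 30:60] += 2.0                           # a fast-moving "object" region

feats = np.concatenate([rgb, flow], axis=-1).reshape(-1, 5)
feats = (feats - feats.mean(0)) / (feats.std(0) + 1e-8)   # balance the two cues

labels = AgglomerativeClustering(n_clusters=2).fit_predict(feats)
mask = labels.reshape(H, W)                         # coarse object/background split
```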
... There already exist surveys about most topics mentioned in the introduction (see Section 1), but none have specifically discussed HGR on edge devices with a focus on sensor technologies, processing hardware, and algorithms. Most previous surveys deal with vision-based systems or vision enhanced with depth information, such as [6], an early survey of gesture recognition in vision-based systems. It already describes the general structure of gesture recognition systems, which still applies today, and shows what kinds of gestures were recognized before 2019, as well as which research areas gesture recognition relates to. ...
Hand gesture recognition (HGR) is a convenient and natural form of human–computer interaction suitable for various applications. Much research has already focused on wearable-device-based HGR. By contrast, this paper gives an overview focused on device-free HGR, that is, HGR systems that do not require the user to wear something like a data glove or hold a device. HGR systems are explored regarding technology, hardware, and algorithms. The interconnectedness of timing and power requirements with hardware, pre-processing algorithms, classification, and sensing technology, and how these choices permit more or less granularity, accuracy, and numbers of gestures, is clearly demonstrated. The sensor modalities evaluated are WiFi, vision, radar, mobile networks, and ultrasound. The pre-processing technologies stereo vision, multiple-input multiple-output (MIMO), spectrogram, phased array, range-Doppler map, range-angle map, Doppler-angle map, and multilateration are explored. Classification approaches with and without ML are studied; among those with ML, the assessed algorithms range from simple tree structures to transformers. All applications are evaluated taking into account their level of integration: whether the application presented is suitable for edge integration, its real-time capability, whether continuous learning is implemented, what robustness was achieved, whether ML is applied, and the accuracy level. Our survey aims to provide a thorough understanding of the current state of the art in device-free HGR on edge devices and in general. Finally, on the basis of present-day challenges and opportunities in this field, we outline the further research we suggest for HGR improvement. Our goal is to promote the development of efficient and accurate gesture recognition systems.
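As a concrete illustration of one pre-processing technology named in this survey, the sketch below computes a range-Doppler map from a simulated FMCW radar data block via two FFTs; the block sizes and the injected point target are assumptions made for the example, not values from the survey.

```python
# Range-Doppler map sketch: FFT an FMCW data block along fast time (range) and
# slow time (Doppler). The data block and the moving target are simulated.
import numpy as np

N_CHIRPS, N_SAMPLES = 64, 128                   # slow-time x fast-time block
t = np.arange(N_SAMPLES) / N_SAMPLES
block = 0.05 * np.random.default_rng(0).normal(size=(N_CHIRPS, N_SAMPLES))

# Inject a point target: a fixed beat frequency (range bin 20) whose phase
# advances chirp-to-chirp (Doppler bin ~8), as a moving hand would produce.
for c in range(N_CHIRPS):
    block[c] += np.cos(2 * np.pi * 20 * t + 2 * np.pi * 8 * c / N_CHIRPS)

range_fft = np.fft.fft(block, axis=1)                        # fast-time FFT -> range
rd_map = np.fft.fftshift(np.fft.fft(range_fft, axis=0), 0)   # slow-time FFT -> Doppler
power = 20 * np.log10(np.abs(rd_map) + 1e-12)                # dB magnitude

peak = np.unravel_index(power.argmax(), power.shape)
print("strongest return at (Doppler bin, range bin):", peak)
```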
... Vision-based techniques, such as edge tracking, POI tracking, optical flow, and template matching, struggle to achieve comprehensive mobile reality-virtuality matching, even when using a priori or a posteriori approaches (Suenaga et al., 2015; Thangarajah et al., 2015). This is because visual methods are highly sensitive to challenging scenes, such as bright light irradiation, repetitive textures, or a lack of texture (Chakraborty et al., 2018). In addition, integrating sensors and visual information has become an effective way to address the challenges of MAR geo-registration. ...
... A virtual illumination model could affect real-time shadow depth cues and might enhance control efficiency as well, as mentioned in previous studies [3,25]. Findings from previous studies [2,7] have shown evidence of the potential of illumination in VR, emphasizing an in-depth understanding of human perception and of how different illumination affects human behaviour across a variety of settings and environments. ...
With the advent of affordable hand gesture sensors and intriguing interaction techniques like touchless hand gesture interaction, a plethora of virtual reality (VR) learning applications have been developed recently with the aim of being user-friendly. In computer vision, image depth is particularly crucial when imitating real-world objects in a virtual environment (VE). Although touchless hand gestures enable users to communicate without physically touching objects, creating precise interaction in a VE, particularly effective object control such as grasping, moving and locating under varied illumination, is still challenging. Moreover, visual cues for touchless hand movements are crucial. Inappropriate visual cues produced by inadequate illumination, such as poor real-time shadow depth cues, hinder users' ability to reach and grasp objects because it becomes difficult to estimate the 3D hand position over the object. Even after several modifications, users still had trouble grasping objects; in fact, objects fell easily when being moved, and users also found them difficult to locate. Therefore, this study investigates the use of the fLight illumination model to improve real-time shadow depth cues in touchless hand gesture interactions using the Leap Motion Controller (LMC) in a desktop VR (dVR) environment. A series of iterative experiments were conducted to acquire qualitative and quantitative findings. Results indicate that precise real-time shadow depth cues significantly enhance user control in grasping, moving and locating virtual objects under different ambient physical illumination, rather than merely improving visibility.
... Specifically, flexible strain sensors offer a direct means of measuring hand motion, either by affixing them onto fingers or integrating them into data gloves [4]- [6]. Compared to alternatives such as vision-based [7], ultrasound, radio frequency (RF) sensing [8], or electrical impedance tomography (EIT) [9], flexible strain sensors stand out for their advantages, including high accuracy, cost-effectiveness, direct measurement, and immunity to environmental noise [10]. ...
Supercoiled polymer artificial muscles have shown great potential for achieving simultaneous flexible sensing and actuation. However, the lack of clarity regarding the sensing mechanism, coupled with a manufacturing procedure primarily optimized for actuation, has posed challenges. This paper introduces a novel flexible supercoiled polymer sensor, denoted SCPS, with a streamlined manufacturing process aimed at significantly enhancing sensing capabilities. The SCPS's sensing mechanism is revealed by analyzing its mechanical and electrical reactions under tension, which is attributed to resistance change resulting from increased current along the helical fiber direction. The microstructure of the proposed SCPS is analyzed using scanning electron microscopy. A comprehensive set of experiments is undertaken to evaluate the sensor's performance, revealing a remarkably fast response time (< 100 ms) and high linearity (R² = 0.96) within a strain range of 25%. Furthermore, the experiments demonstrate the SCPS's low hysteresis (5.3%) and exceptional repeatability (5.9%). A gesture recognition data glove is developed using four SCPSs, and a recognition accuracy of 95% is achieved for eight distinct gestures.
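A toy sketch of the kind of pipeline a four-sensor data glove enables: windowed readings from four channels are reduced to simple per-channel statistics and classified into eight gestures. The sensor data here is simulated, and the 95% accuracy reported above refers to the authors' hardware, not to this sketch.

```python
# Toy gesture classification from a simulated four-channel strain-sensor glove.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
N_GESTURES, N_CH, WIN = 8, 4, 50                    # 8 gestures, 4 sensors, 50 samples/window

def fake_window(gesture):                           # simulated strain readings
    base = np.linspace(0, 1, WIN)[None, :] * (gesture + 1) / N_GESTURES
    return base + 0.05 * rng.normal(size=(N_CH, WIN))

def features(win):                                  # mean, range, slope per channel
    slope = win[:, -1] - win[:, 0]
    return np.concatenate([win.mean(1), np.ptp(win, axis=1), slope])

X, y = [], []
for g in range(N_GESTURES):
    for _ in range(60):
        X.append(features(fake_window(g))); y.append(g)

Xtr, Xte, ytr, yte = train_test_split(np.array(X), np.array(y), test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(Xtr, ytr)
print("toy accuracy:", clf.score(Xte, yte))
```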
... The role of Human-Computer Interaction (HCI) has become increasingly prominent, yet it lacks a clear philosophical foundation with coherent goals (Ren et al., 2019; Shu et al., 2016). The ability of computers to recognize hand signals is crucial for advancing HCI (Chakraborty et al., 2018; Huang, 2009; Ziad et al., 2016). This paper reviews current HCI interface design approaches in modern information systems to determine their effectiveness (Nguyen & Le, 2020; Mohammed & Karagozlu, 2021). ...
... The article also details multimodal UIs that process heterogeneous user input simultaneously (Dodd et al., 2017; Karpov & Yusupov, 2018). It highlights the efficiency of speech input over text typing and discusses vision-based gesture recognition in HCI, essential for advancements in human-computer interaction (Ortega, 2021; Chakraborty et al., 2018). The study outlines challenges in gesture recognition due to varied contexts, interpretations, and hand movements, noting limitations in current classifiers (Punchoojit & Hongwarittorrn, 2018; Chakraborty et al., 2018). This article uses an interdisciplinary approach to explore arts, psychology, and cultures (Nguyen & Le, 2020). ...
In the realm of human-computer interaction (HCI), while the integration of computerized systems with humans (e.g., robots) is crucial, the focus often remains on the technology itself rather than on user acceptance and interaction. This creates a significant research gap, as future advancements in digital systems will rely heavily on effective HCI. This article reviews the literature through a framework emphasizing User Experience (UX), which focuses on enhancing interactions between people and technology. UX is described as a method for creating desirable, accessible, and useful technology experiences. The paper concludes with recommendations for future UX research, particularly in wireless and emerging technologies, highlighting the role of User Experience Strategy (UXS) in addressing consumer needs and developing practical, engaging solutions. The general aim of this systematic literature review is to investigate studies published in the past 15 years on the use of AI strategies in conventional technology. This overall goal is divided into specific research questions (RQs) to provide a more detailed and comprehensive view of the subject.
... There has been a lot of interest in face recognition technology in the past several decades, and researchers all over the world have been working hard to perfect the technology [1][2][3][4][5]. Considerable gains have been achieved in this domain through the development of technology and artificial intelligence [6][7]. ...
For public health and safety reasons, face masks were required worldwide during the COVID-19 epidemic. However, this poses challenges for face recognition systems, as the face is partially covered. Face recognition is a widely used and cost-effective biometric security system, but it has difficulty accurately identifying individuals wearing masks. Existing face recognition algorithms have struggled to maintain efficiency, accuracy, and performance in the context of masked faces. To address these challenges and improve cost-effectiveness, a new machine learning model is required. This manuscript describes a lightweight deep learning methodology that is flexible and efficient in recognizing masked faces. The HSTU Masked Face Dataset (HMFD) is utilized, comprising frontal and lateral faces with various colored masks. Our proposed method involves a lightweight CNN model designed to enhance the accuracy of masked face identification. To improve operational efficiency, techniques such as batch normalization, dropout, and depth-wise normalization are integrated, each tailored to the task's requirements, to optimize overall performance. These techniques improve the efficiency and accuracy of the model while minimizing overall complexity. In this research, the accuracy of the model is evaluated in comparison to other well-established deep learning models, including VGG16, VGG19, Extended VGG19, MobileNet, and MobileNetV2. The results demonstrate that our lightweight deep learning model outperforms these models, achieving a high recognition accuracy of 97%. By considering the needs of the task and carefully optimizing the model architecture, our proposed method offers an effective and efficient solution for recognizing masked faces in real-world scenarios.
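A minimal PyTorch sketch in the spirit of the lightweight model described above, combining batch normalization, dropout, and depth-wise separable convolutions (used here as an assumed interpretation of the depth-wise operation mentioned); the layer sizes, input resolution, and number of identities are illustrative, not the paper's architecture.

```python
# Lightweight masked-face classifier sketch: depth-wise separable convolutions,
# batch normalization, and dropout. Sizes are illustrative assumptions.
import torch
import torch.nn as nn

def ds_block(cin, cout):                 # depth-wise + point-wise convolution
    return nn.Sequential(
        nn.Conv2d(cin, cin, 3, padding=1, groups=cin, bias=False),  # depth-wise
        nn.BatchNorm2d(cin), nn.ReLU(inplace=True),
        nn.Conv2d(cin, cout, 1, bias=False),                        # point-wise
        nn.BatchNorm2d(cout), nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )

class LightMaskedFaceNet(nn.Module):
    def __init__(self, n_identities):
        super().__init__()
        self.features = nn.Sequential(ds_block(3, 32), ds_block(32, 64), ds_block(64, 128))
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Dropout(0.3), nn.Linear(128, n_identities),
        )

    def forward(self, x):                # x: (batch, 3, 112, 112) face crops
        return self.head(self.features(x))

model = LightMaskedFaceNet(n_identities=30)
logits = model(torch.randn(2, 3, 112, 112))          # smoke test: (2, 30) logits
```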
... These systems use various image-capturing devices, including video cameras, webcams, stereo cameras, infrared cameras, and more sophisticated active devices such as the Kinect and the Leap Motion Controller (LMC). Vision-based approaches analyze the visual information captured by these devices to interpret and recognize gestures [53]. Stereo cameras, the Kinect, and the LMC are 3D cameras that capture depth as well as visual data. ...
Hand gestures are the main method of communication for people who are hearing-impaired, which poses a difficulty for millions of individuals worldwide when engaging with those who do not have hearing impairments. The significance of technology in enhancing accessibility, and thereby the quality of life, for individuals with hearing impairments is universally recognized. Therefore, this study conducts a systematic review of the existing literature on hand gesture recognition, with a particular focus on vision-based, sensor-based, and hybrid methods. This systematic review covers the period from 2018 to 2023, making use of prominent databases including IEEE Xplore, Science Direct, Scopus, and Web of Science. The chosen articles were carefully examined according to predetermined inclusion and exclusion criteria. Our main focus was on evaluating the hand gesture representation, data acquisition, and accuracy of vision-based, sensor-based, and hybrid methods for recognizing hand gestures. In signer-dependent scenarios, recognition accuracy varies from 64% to 98%, with an average of 87.9% among the studies analyzed; in signer-independent scenarios, it ranges from 52% to 98%, with an average of 79%. The problems observed in continuous gesture identification highlight the need for further research to improve the practical feasibility of vision-based gesture recognition systems. The findings also indicate that dataset size continues to be a significant obstacle to hand gesture recognition. Hence, this study seeks to guide future research by examining the academic motivations, challenges, and recommendations in the developing field of sign language recognition.