VOL. 2, NO. 2, JUNE 2018 3500904
Microwave/millimeter wave sensors
Gesture Recognition Using mm-Wave Sensor for Human-Car Interface
Karly A. Smith1, Clément Csech2, David Murdoch3, and George Shaker3,4
1Department of Systems Design Engineering, University of Waterloo, Waterloo, ON N2L 3G1, Canada
2Department of Biomechanics and Bioengineering, Université de Technologie de Compiègne, Compiègne 60200, France
3Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON N2L 3G1, Canada
4Department of Mechanical and Mechatronics Engineering, University of Waterloo, Waterloo, ON N2L 3G1, Canada
Senior Member, IEEE
Manuscript received December 13, 2017; revised February 5, 2018; accepted February 21, 2018. Date of publication February 27, 2018; date of current
version May 4, 2018.
Abstract— This article details the development of a gesture recognition technique using a mm-wave radar sensor for in-car
infotainment control. Gesture recognition is becoming a more prominent form of human–computer interaction and can be
used in the automotive industry to provide a safe and intuitive control interface that will limit driver distraction. We use
a 60 GHz mm-wave radar sensor to detect precise features of fine motion. Specific gesture features are extracted and
used to build a machine learning engine that can perform real-time gesture recognition. This article discusses the user
requirements and in-car environmental constraints that influenced design decisions. Accuracy results of the technique are
presented, and recommendations for further research and improvements are made.
Index Terms—Microwave/millimeter wave sensors, human-car interface, 60 GHz mm-wave radar, gesture sensing, random forest classifier, machine learning.
I. INTRODUCTION
In the automotive industry, vehicular infotainment systems have
grown in popularity and complexity over the past several years.
Mainstream car manufacturers now offer up to 700 infotainment and
environmental controls for the driver and passengers to manipulate
[1]. However, this increased functionality within the vehicle has also increased the potential causes of driver distraction. The main causes of driver
distraction are categorized in [2] as visual, cognitive, manual, and
auditory. Studies have shown that visual and manual distractions
when combined have the most impact on driving performance [2].
This paper presents a gesture detection system using a mm-wave
radar sensor for intuitive human-vehicular interaction (HVI). Many
different gesture sensing and processing techniques have been
developed in recent years. Previous gesture detection systems have
used camera-based sensors (IR, color, etc.), depth-based sensors, and wearable sensors such as gloves embedded with 3-D tracking technology [2]–[11]. However, these systems all have significant drawbacks that affect their usability. Camera-based sensors are susceptible to changes in light, color, and background, and have high computational costs due to extensive image processing [3], [4]. Depth-based sensors are very good at detecting changes in position; however, they cannot detect orientation or specific hand shapes [5]. Wearable technology may interfere with other tasks the user performs in daily life and limits system input to whoever is wearing the input device.
Alternatively, we believe that radar sensors present a viable system solution. Radars are not affected by variable lighting inside a car and are able to detect specific hand and finger orientations with precision. The radar system described in this paper provides real-time visionless infotainment control to the driver and passengers without wearable components, decreasing the risk of driver distraction and allowing multiple-user input. Previous work has been done on in-car gesture sensing that combines short-range radar, time-of-flight depth sensors, and color cameras for gesture detection [6]. That system used an FMCW monopulse 25 GHz radar in conjunction with camera-based data for detection and a convolutional neural network for near real-time recognition [6]. In comparison, the system presented in this paper uses 60 GHz radar for finer spatial resolution and a random forest classifier algorithm for real-time recognition.

Corresponding author: Karly A. Smith (e-mail: k62smith@edu.uwaterloo.ca).
Associate Editor: Y. Duroc.
Digital Object Identifier 10.1109/LSENS.2018.2810093
II. SYSTEM DESIGN
Using a wireless radar sensor for detection and recognition of ges-
tures offers several advantages over other systems currently in use.
Automobile manufacturers currently offer touchscreens, voice con-
trol, Bluetooth phone connection, and other methods of infotainment
control. Interfaces that require tactile manual input, such as a touchscreen, also require small amounts of visual attention to navigate, taking the driver's eyes off the road. Voice control requires neither manual nor visual input; however, if the in-car environment is noisy with music or conversation, it is not a viable option. As
highlighted earlier, there are several other systems for gesture and
posture detection in development that use IR and depth cameras for
sensing [2]–[11]; however, cameras are affected by light conditions
and obstacles in the field of view. As well, issues of privacy and user
compliance arise when cameras are in use. Radar is advantageous because it is not affected by light or sound in the environment, can be embedded in devices, has very precise resolution, offers real-time recognition, and does not require recording an image of the user [7].
Fig. 1. Radar chip photo showing size comparable to a nickel.
Fig. 2. Timestamped images of gesture progression with corresponding range-Doppler signature progressions.
A. Gesture Detection and Recognition
In this article, we utilized a 60 GHz frequency modulated continuous
wave (FMCW) mm-wavelength radar sensor. The sensor hardware
consists of an 8 mm × 11 mm radar chip with two Tx and four Rx antennas, shown in Fig. 1.
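Fig. 2 visualizes gestures as range-Doppler signatures. As a minimal illustrative sketch only (not the vendor's processing chain; the frame dimensions and windowing are assumptions), a range-Doppler map can be formed from one frame of de-chirped FMCW samples with two FFTs:

```python
# Illustrative sketch: range-Doppler map from one FMCW frame, as visualized in Fig. 2.
# Assumes a frame of de-chirped beat samples with shape (num_chirps, samples_per_chirp).
import numpy as np

def range_doppler_map(frame: np.ndarray) -> np.ndarray:
    """Return the magnitude range-Doppler map for one radar frame."""
    # Range FFT along fast time (samples within each chirp), with a Hann window
    range_fft = np.fft.fft(frame * np.hanning(frame.shape[1]), axis=1)
    # Doppler FFT along slow time (chirp index), shifted so zero velocity is centered
    doppler_fft = np.fft.fftshift(np.fft.fft(range_fft, axis=0), axes=0)
    return np.abs(doppler_fft)

# Example with placeholder data: 32 chirps of 64 samples each
rd_map = range_doppler_map(np.random.randn(32, 64))  # -> (32 Doppler bins, 64 range bins)
```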
The radar board was fitted in the car as an after-market addition. Machine learning was used to record the radar signature of each set of hand gestures (as shown in Fig. 2(a) and (b)), train a model using a random forest classifier algorithm, and then perform recognition using that model. The random forest classifier was chosen because of its higher success rate, as highlighted in [15]. Features of the received signal were processed and made easily accessible for manipulation in the C language. The features used in this project were range, acceleration, energy total, energy moving, velocity, velocity dispersion, spatial dispersion, energy strongest component, and movement index [12].
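As an illustration only (the project itself handles these values in C; the field names below simply mirror the feature list, and the dictionary input is an assumption rather than the radar SDK's API), each frame's features can be packed into a fixed-order vector before classification:

```python
# Illustrative sketch: flattening one frame's radar features into a classifier input vector.
import numpy as np

FEATURE_ORDER = [
    "range", "acceleration", "energy_total", "energy_moving", "velocity",
    "velocity_dispersion", "spatial_dispersion",
    "energy_strongest_component", "movement_index",
]

def frame_to_vector(frame_features: dict) -> np.ndarray:
    """Return the frame's features as a fixed-order vector for the classifier."""
    return np.array([frame_features[name] for name in FEATURE_ORDER], dtype=float)
```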
Fig. 3. Two placements of the radar sensor. (a) For the driver and front-seat passenger. (b) For the backseat passengers.

Each gesture was assigned a classifier number; then, twenty samples of each gesture were recorded and timestamped for data collection. Twenty samples of the background were also recorded and assigned a classifier, so that the system would accurately recognize the absence of a gesture. The collected data were then used to create a random forest classifier, allowing real-time recognition of future gestures. The parameters of the random forest classifier were set as follows:
1. Forest Size = 10
2. Forest Max Depth = 10
3. Forest Min Samples = 50
4. Classifier Min Count = 30
5. Classifier Buffer Size = 50.
Increasing the number of trees within the classifier increased robustness, and altering the ratio of classifier min count to classifier buffer size changed the precision of recognition. A ratio of 3:5 between classifier min count and classifier buffer size ensures that, for a classification to occur, at least 30 of the 50 buffered samples must be classified in the same category. Forest size and forest depth were both set to 10 to ensure adequate forest size and average tree depth for classification, while minimizing computational cost on the system.
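The following is a minimal sketch of this configuration and of the 30-of-50 voting buffer, written with scikit-learn rather than the authors' toolchain; the mapping of "forest min samples" to min_samples_split and the per-frame voting loop are assumptions made for illustration:

```python
# Illustrative sketch: the random forest settings above plus the 30-of-50 voting buffer.
from collections import Counter, deque

import numpy as np
from sklearn.ensemble import RandomForestClassifier

FOREST_SIZE = 10             # number of trees
FOREST_MAX_DEPTH = 10        # maximum tree depth
FOREST_MIN_SAMPLES = 50      # assumed to correspond to the minimum samples per split
CLASSIFIER_MIN_COUNT = 30    # frames that must agree before a gesture is reported
CLASSIFIER_BUFFER_SIZE = 50  # sliding window of recent per-frame predictions

model = RandomForestClassifier(
    n_estimators=FOREST_SIZE,
    max_depth=FOREST_MAX_DEPTH,
    min_samples_split=FOREST_MIN_SAMPLES,
)

buffer = deque(maxlen=CLASSIFIER_BUFFER_SIZE)

def train(feature_vectors: np.ndarray, labels: np.ndarray) -> None:
    """Fit the forest on labelled per-frame feature vectors (gestures and background)."""
    model.fit(feature_vectors, labels)

def classify_frame(vector: np.ndarray):
    """Add one frame's prediction to the buffer; return a gesture label only
    when at least 30 of the last 50 predictions fall in the same class."""
    buffer.append(model.predict(vector.reshape(1, -1))[0])
    if len(buffer) == CLASSIFIER_BUFFER_SIZE:
        label, count = Counter(buffer).most_common(1)[0]
        if count >= CLASSIFIER_MIN_COUNT:
            return label
    return None
```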
B. Sensor Placement
Specific environmental and user constraints were considered when designing the system and gesture sets to ensure usability and robustness. The interior of a car is spatially complex, which created detection challenges for the sensor. If placed too close to detectable objects, such as a gear shift, the sensor often produced false positives or failed to recognize gestures within its field. Placing the sensor where the majority of the radar beam spread into free space, and recording a robust background case, mitigated detection of such objects and identified those that did intrude into the field as non-targets. This spatial constraint, combined with the spatial constraints of the user position, led to placement of the sensor on the center console (for use by the front-seat passenger and the driver) and between the backs of the front seats (for use by the back-seat passengers). The radar beam spreads upward into free space, where little movement from passengers naturally occurs.
With these two placements (as shown in Fig. 3), the radar is within a
comfortable arm’s length reach for all passengers.
Fig. 4. Flowchart of the system from gesture recognition to infotainment output.
C. Connections to In-Car Infotainment System
When a gesture was detected, its classifier number was added to the body of a POST request. This POST request was then sent to an Android phone (paired to the car in older models) or an Android Auto system (available in newer models), which, acting as a server, parsed the request to retrieve the classifier number. Each classifier number was associated with an action, which the system could then undertake using either Android intents for phone actions or the Spotify API for media actions. This allowed both the streaming of audio over the car speakers and the use of the physical car infotainment system to display relevant information. A visual representation of the entire system can be seen in Fig. 4.
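As a rough sketch of this hand-off (the server address and JSON field name are hypothetical, not taken from the paper), the radar-side sender might look like the following; the Android side then maps the received classifier number to an intent or a Spotify API call:

```python
# Illustrative sketch: POSTing the recognized gesture's classifier number
# to the Android phone / Android Auto unit acting as a server.
import requests

ANDROID_SERVER = "http://192.168.43.1:8080/gesture"  # hypothetical address of the in-car server

def send_gesture(classifier_number: int) -> None:
    """Send the classifier number of a recognized gesture to the infotainment server."""
    requests.post(ANDROID_SERVER, json={"classifier": classifier_number}, timeout=1.0)
```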
D. Gesture Set Design
The needs of the driver and characteristics of the system were
the main considerations when designing the gesture sets. The 60 GHz sensor in use was designed to detect fine motion, with spatial resolution as fine as 0.3 mm, a considerable improvement over the 3.75 cm range resolution of the 25 GHz radar system presented in [13]. This gives the 60 GHz radar greater precision and accuracy and a smaller footprint, allowing spatial and temporal differences to be used to make gestures distinguishable from one another. Practical system implementation dictated the use of gestures that tolerate large margins of error, to reduce driver distraction. By creating larger spatial zones for each gesture and allowing gestures to be performed at varying speeds, the driver has more room for non-exact gestures and thus needs to devote fewer visual resources to the motion. Designing more robust gestures also allows multiple users to operate the system, as it will be more tolerant of the natural spatial and temporal variations of each user and the various gesture motions within different vehicle designs.
Two demonstration videos [14] of the system at work were recorded for the proof-of-concept system, the details of which are shown in Table 1.

Table 1. Summary of Gesture Demonstration Characteristics
Demo | Gesture | Function
Phone (Driver) | Wiggle phone sign | Call/Hang up
Music (Backseat Passenger) | Wiggle fingers | Pause/Play
| Hold out one finger | Select Playlist 1
| Wiggle 2 fingers | Select Playlist 2
| Wave 3 fingers | Select Playlist 3

Only the phone function and the playlist music function of the car infotainment system were included in the demonstration videos; however, the system can control much more. As much as possible, intuitive gestures related to their corresponding functions were used to limit the user's cognitive load. Using universal signs, such as a telephone gesture and the numbers 1, 2, and 3, makes gestures easier to recall and natural to perform. Vehicular infotainment systems often use menu functions to navigate and organize all the options of control, which allows reuse of several gestures. The same gestures
can be used in all menus for back, forward, select, etc., which will
provide extensive functionality to the user while limiting the number
of gestures they need to remember. Only main gestures that will be
used to select which menu to access (i.e., contact list, map, music, etc.)
will not be reusable. Reuse of gestures also increases the accuracy of
the system, as fewer classification options have been shown to increase recognition accuracy when using a random forest classifier [15].
III. TESTING THE SYSTEM
For each demonstration, the gesture sets were recorded by a single individual. It took significantly less time to record with one individual; however, gesture sets created by one individual may not work for other users. Recording with multiple individuals captures the natural temporal and spatial variations introduced by each new user, creating a more robust system. Another two gesture sets, comprising three gestures each, were therefore recorded by multiple individuals to test performance with more human variation. Five individuals recorded each gesture set, with each individual recording 20 samples of each gesture. The gesture sets were later tested by the five recorders as well as by three individuals who did not record; the results are shown in Table 2. During testing, each participant performed each gesture 30 times, and the accuracy percentage was recorded. Before testing the system, all participants had the opportunity to practice and find the correct range and speed of each gesture. Instructions were provided during this practice to ensure each gesture was performed correctly; these results therefore reflect user accuracy with a strong understanding of each movement.
The participants' ability to learn the gestures on their own was not evaluated; the learnability of each gesture when only written or video instruction is provided should be explored further. On average, the system performed above 90% accuracy for all gestures. The gestures with the lowest accuracy were gestures 1 and 4, which may be attributed to human error; both had several incorrectly timed recordings. Gesture set 2 showed a slight decrease in accuracy going from the recorders to the new users, which may be attributed to the spatial design of the gesture set. Gesture set 1 had more distinct spatial zones for each gesture, whereas the spatial zones for gesture set 2 had some overlap; those who recorded the gesture set had more familiarity with the spatial zones than the users who did not record. To improve the accuracy of the system, more participants should be used for recording, with each participant recording more gestures. It may also be valuable to distribute the gestures evenly during testing and recording, rather than completing all samples for one gesture at once; it was observed that as participants fatigued, variations were introduced that were not present when a gesture was performed naturally only once.
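For concreteness, a small sketch of how the entries in Table 2 can be derived, assuming each tester's accuracy is the fraction of 30 attempts recognized correctly and the Average column is the mean over the eight testers; the correct counts below are hypothetical values chosen only to reproduce the reported percentages for gesture 1:

```python
# Illustrative sketch: per-tester accuracy and row average as reported in Table 2.
def accuracy_percent(correct: int, trials: int = 30) -> float:
    """Percentage of correctly recognized attempts for one tester and one gesture."""
    return 100.0 * correct / trials

# Hypothetical correct counts for gesture 1 ("Low wiggle"): 5 recorders then 3 users
correct_counts = [30, 30, 26, 27, 28, 28, 24, 25]
per_tester = [accuracy_percent(c) for c in correct_counts]  # 100%, 100%, ~87%, ...
row_average = sum(per_tester) / len(per_tester)             # ~91% for this row
```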
Table 2. Summary of Recognition Accuracy Results for Two Gesture Sets
Number | Gesture | Recorder 1 | Recorder 2 | Recorder 3 | Recorder 4 | Recorder 5 | User 1 | User 2 | User 3 | Average
Set 1:
1 | Low wiggle | 100% | 100% | 87% | 90% | 93% | 93% | 80% | 83% | 91%
2 | Turn over | 100% | 100% | 100% | 100% | 100% | 93% | 100% | 100% | 99%
3 | High grab | 93% | 100% | 90% | 100% | 87% | 100% | 90% | 100% | 95%
Set 2:
4 | Swipe | 100% | 93% | 87% | 97% | 93% | 87% | 87% | 83% | 91%
5 | Large circle | 100% | 100% | 100% | 100% | 97% | 93% | 93% | 90% | 97%
6 | Small circle | 100% | 100% | 93% | 100% | 100% | 93% | 93% | 87% | 96%
IV. CONCLUSION
We have presented a gesture detection system using mm-wave radar
for vehicular infotainment control. The gestures were designed to be
distinguishable by the radar, be intuitive and memorable for the user,
and fit the constraints of the environment. Specific decisions were
made to maximize ease of use by drivers; further testing should be
done to validate these usability design decisions. Demonstrations of
the system in use were filmed and presented to showcase the use of
intuitive gestures [14], ease of use by both driver and passengers,
and the use of large spatial zones for more robust recognition. The
accuracy of the system was tested with multiple users; it was found
that involving more participants when recording a gesture set increased
accuracy and robustness. In the future, studies are needed to define
and optimize the required user input for the system training stages.
ACKNOWLEDGMENT
This work was supported by the Natural Sciences and Engineering Research Council
of Canada. The radar transceiver used in this work was provided by Google’s Advanced
Technology and Projects group (ATAP) through the Project Soli Alpha DevKit Early
Access Grant.
REFERENCES
[1] C. Pickering, K. Burnham, and M. Richardson, “A research study of hand gesture
recognition technologies and applications for human vehicle interaction,” in Proc.
IET 3rd Inst. Eng. Technol. Conf. Automotive Electron., Warwick, U.K., 2007,
pp. 1–15.
[2] K. Young and M. Regan, “Driver distraction: A review of the literature,” Monash
Univ. Accident Res. Centre, Clayton, Vic, Australia, 2007, pp. 379–405.
[3] J. P. Wachs, M. Kolsch, H. Stern, and Y. Edan, “Vision based hand gesture applications,” Commun. ACM, vol. 54, no. 2, pp. 60–71, 2011. [Online]. Available: http://dx.doi.org/10.1145/1897816.1897838
[4] P. Breuer, C. Eckes, and S. Müller, “Hand gesture recognition with a novel IR time-of-flight range camera—A pilot study,” in Proc. Int. Conf. Vis./Comput. Techn. Appl., 2007, pp. 247–260. [Online]. Available: http://dx.doi.org/10.1007/978-3-540-71457-6_23
[5] C. Keskin, F. Kırac, Y. E. Kara, and L. Akarun, “Real time hand pose estimation using
depth sensors,” in Proc. IEEE Int. Conf. Comput. Vis. Workshops, 2011, pp. 1228–
1234. [Online]. Available: http://dx.doi.org/10.1109/ICCVW.2011.6130391
[6] P. Molchanov, S. Gupta, K. Kim, and K. Pulli, “Multi-sensor system for driver’s
hand-gesture recognition,” in Proc. IEEE 11th Int. Conf. Workshops Automat.
Face Gesture Recognit., Ljubljana, Slovenia, 2015, pp. 1–8. [Online]. Available:
http://dx.doi.org/10.1109/FG.2015.7163132
[7] A. Riener, M. Rossbory, and M. Ferscha, “Natural DVI based on intuitive
hand gestures,” in Proc. INTERACT Workshop User Experience Cars, 2011,
pp. 62–66.
[8] P. Molchanov, S. Gupta, K. Kim, and J. Kautz, “Hand gesture recognition
with 3D convolutional networks,” in Proc. IEEE Conf. Comput. Vis. Pattern
Recognit. Workshops, Boston, MA, USA, 2015, pp. 1–7. [Online]. Available:
http://dx.doi.org/10.1109/CVPRW.2015.7301342
[9] E. Ohn-Bar and M. M. Trivedi, “Hand gesture recognition in real time for automotive interfaces: A multimodal vision based approach and evaluations,” IEEE Trans. Intell. Transp. Syst., vol. 15, no. 6, pp. 2368–2377, Dec. 2014. [Online]. Available: http://dx.doi.org/10.1109/TITS.2014.2337331
[10] Y. Jacob, F. Manitsaris, G. Lele, and L. Pradere, “Hand gesture recognition for driver
vehicle interaction,” in Proc. IEEE Comput. Soc. Workshop Observing Understand.
Hands Action 28th IEEE Conf. Comput. Vis. Pattern Recognit., Boston, MA, USA,
2015, pp. 41–44.
[11] U. Reissner, “Gestures and speech in cars,” Dept. Informat., Technische Univ. München, Munich, Germany, 2007.
[12] N. Gillian, “Gesture Recognition Toolkit, ver. 1.0,” Oct. 2017. [Online]. Available:
http://www.nickgillian.com/grt/
[13] P. Molchanov, S. Gupta, K. Kim, and K. Pulli, “Short-range FMCW monopulse radar for hand-gesture sensing,” in Proc. IEEE Radar Conf., Johannesburg, South Africa, 2015, pp. 1491–1496. [Online]. Available: http://dx.doi.org/10.1109/RADAR.2015.7131232
[14] G. Shaker, “Gesture recognition using mm-waves,” Waterloo Artif. Intell. Inst., Nov
2017. [Online]. Available: https://goo.gl/iRqkJC
[15] J. Lien, “Soli: Ubiquitous gesture sensing with mm-wave radar,” ACM Trans. Graphics, vol. 35, no. 4, 2016, Art. no. 142. [Online]. Available: http://dx.doi.org/10.1145/2897824.2925953
[16] S. Naik, H. R. Abhishek, K. N. Ashwal, and S. P. Balasubramanya, “A study on automotive human vehicle interaction using gesture recognition technology,” Int. J. Multidisciplinary Cryptol. Inf. Secur., vol. 1, no. 2, pp. 6–12, 2012.
[17] M. Alpern and K. Minardo, “Developing a car gesture interface for use as
a secondary task,” in Proc. ACM Extended Abstracts Hum. Factors Com-
put. Syst., Fort Lauderdale, FL, USA, 2003, pp. 932–933. [Online]. Available:
http://dx.doi.org/10.1145/765891.766078