VOL. 2, NO. 2, JUNE 2018 3500904
Microwave/millimeter wave sensors
Gesture Recognition Using mm-Wave Sensor for Human-Car Interface
Karly A. Smith1, Clément Csech2, David Murdoch3, and George Shaker3,4
1Department of Systems Design Engineering, University of Waterloo, Waterloo, ON N2L 3G1, Canada
2Department of Biomechanics and Bioengineering, Université de Technologie de Compiègne, Compiègne 60200, France
3Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON N2L 3G1, Canada
4Department of Mechanical and Mechatronics Engineering, University of Waterloo, Waterloo, ON N2L 3G1, Canada
Senior Member, IEEE
Manuscript received December 13, 2017; revised February 5, 2018; accepted February 21, 2018. Date of publication February 27, 2018; date of current version May 4, 2018.
Corresponding author: Karly A. Smith (e-mail: k62smith@edu.uwaterloo.ca).
Associate Editor: Y. Duroc.
Digital Object Identifier 10.1109/LSENS.2018.2810093
Abstract— This article details the development of a gesture recognition technique using a mm-wave radar sensor for in-car
infotainment control. Gesture recognition is becoming a more prominent form of human–computer interaction and can be
used in the automotive industry to provide a safe and intuitive control interface that will limit driver distraction. We use
a 60 GHz mm-wave radar sensor to detect precise features of fine motion. Specific gesture features are extracted and
used to build a machine learning engine that can perform real-time gesture recognition. This article discusses the user
requirements and in-car environmental constraints that influenced design decisions. Accuracy results of the technique are
presented, and recommendations for further research and improvements are made.
Index Terms—Microwave/millimeter wave sensors, human-car interface, 60 GHz mm-wave radar, gesture sensing, random forest clas-
sifier, machine learning.
I. INTRODUCTION
In the automotive industry, vehicular infotainment systems have
grown in popularity and complexity over the past several years.
Mainstream car manufacturers now offer up to 700 infotainment and
environmental controls for the driver and passengers to manipulate
[1]. However, increased functionality within the vehicle has also increased the potential sources of driver distraction. The main causes of driver
distraction are categorized in [2] as visual, cognitive, manual, and
auditory. Studies have shown that visual and manual distractions
when combined have the most impact on driving performance [2].
This paper presents a gesture detection system using a mm-wave
radar sensor for intuitive human-vehicular interaction (HVI). Many
different gesture sensing and processing techniques have been
developed in recent years. Previous gesture detection systems have
used camera based sensors (IR, color, etc.), depth based sensors,
and wearable sensors such as gloves embedded with 3-D tracking
technology [2]–[11]. However, these systems all have significant
drawbacks that affect their usability. Camera-based sensors are susceptible to changes in light, color, and background, and have high computational costs due to extensive image processing [3], [4]. Depth-based sensors are very good at detecting changes in position; however, they cannot detect orientation or specific hand shapes [5]. Wearable
technology may interfere with other tasks the user does in daily life
and limits system input to whoever is wearing the input device.
Alternatively, we believe that radar sensors present a viable sys-
tem solution. Radars are not affected by variable lighting changes
inside a car and are able to detect specific hand and finger ori-
entations with precision. The radar system described in this paper
provides real-time, eyes-free infotainment control to the driver and passengers without wearable components, decreasing the risk of driver distraction and allowing input from multiple users. Previous work has been
done on in-car gesture sensing that combines short-range radar, time
of flight depth sensors, and color cameras for gesture detection [6].
That system used an FMCW monopulse 25 GHz radar in conjunc-
tion with camera based data for detection and a convolutional neu-
ral network for near real-time recognition [6]. In comparison, the
system presented in this paper uses 60 GHz radar for finer spatial
resolution, and a random forest classifier algorithm for real time
recognition.
II. SYSTEM DESIGN
Using a wireless radar sensor for detection and recognition of ges-
tures offers several advantages over other systems currently in use.
Automobile manufacturers currently offer touchscreens, voice con-
trol, Bluetooth phone connection, and other methods of infotainment
control. Interfaces that require tactile manual input such as a touch-
screen also require small amounts of visual attention to navigate,
taking the driver’s eyes off the road. Voice control requires neither manual nor visual attention; however, it is not a viable option when the in-car environment is noisy with music or conversation. As
highlighted earlier, there are several other systems for gesture and
posture detection in development that use IR and depth cameras for
sensing [2]–[11]; however, cameras are affected by light conditions
and obstacles in the field of view. As well, issues of privacy and user
compliance arise when cameras are in use. Radar is advantageous
because it is not affected by light or sound in the environment, it
can be embedded in devices, has very precise resolution, offers real
time recognition, and does not require recording of an image of the
user [7].
Fig. 1. Radar chip photo showing size comparable to a nickel.
Fig. 2. Timestamped images of gesture progression with corresponding range-Doppler signature progressions.
A. Gesture Detection and Recognition
In this article, we utilized a 60 GHz frequency modulated continuous
wave (FMCW) mm-wavelength radar sensor. The sensor hardware
consists of an 8 mm × 11 mm radar chip with two Tx and four Rx
antennas shown in Fig. 1.
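The range-Doppler signatures in Fig. 2 are the result of standard FMCW processing: a range FFT across the samples of each chirp, followed by a Doppler FFT across the chirps of a frame. The short Python sketch below illustrates that pipeline for one frame; the frame dimensions, the window functions, and the mean-chirp clutter removal are assumptions for illustration only, not the DevKit's actual configuration.

import numpy as np

# Illustrative range-Doppler processing for one FMCW frame.
# Parameters below are assumed for illustration, not the radar's real settings.
num_chirps = 64          # chirps per frame (slow time)
samples_per_chirp = 128  # ADC samples per chirp (fast time)

# frame: complex baseband samples, shape (num_chirps, samples_per_chirp);
# random data stands in for the real ADC output.
frame = np.random.randn(num_chirps, samples_per_chirp) \
      + 1j * np.random.randn(num_chirps, samples_per_chirp)

# Remove static clutter (e.g., dashboard reflections) by subtracting the mean chirp.
frame = frame - frame.mean(axis=0, keepdims=True)

# Range FFT along fast time, then Doppler FFT along slow time.
range_profile = np.fft.fft(frame * np.hanning(samples_per_chirp), axis=1)
range_doppler = np.fft.fftshift(
    np.fft.fft(range_profile * np.hanning(num_chirps)[:, None], axis=0), axes=0)

# Magnitude map used as the per-frame gesture signature (cf. Fig. 2).
signature = 20 * np.log10(np.abs(range_doppler) + 1e-12)
print(signature.shape)  # (64, 128): Doppler bins x range bins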
The radar board was fitted inside an after-market car. Machine learn-
ing was used to record the radar signature of each set of hand gestures
(as shown in Fig. 2(a) and (b)), train a model using a random forest
classifier algorithm, and then perform recognition using that model.
The random forest classifier was chosen because of the high recognition success rate reported for it in [15]. Features of the received signal were processed and made easily accessible for manipulation in C.
The features used in this project were range, acceleration, energy to-
tal, energy moving, velocity, velocity dispersion, spatial dispersion,
energy strongest component, and movement index [12].
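The exact definitions of these features are internal to the radar SDK's C interface; the following Python sketch gives rough, illustrative approximations of such per-frame features computed from a range-Doppler magnitude map such as the one produced above. The formulas, axis ranges, and frame interval here are assumptions, not the SDK's definitions.

import numpy as np

def frame_features(rd_mag, range_axis, velocity_axis, prev_velocity=0.0, frame_dt=0.02):
    """Illustrative per-frame features from a range-Doppler magnitude map.
    rd_mag: (num_doppler_bins, num_range_bins) linear magnitudes.
    The definitions below are plausible approximations only."""
    power = rd_mag ** 2
    energy_total = power.sum()
    weights = power / (energy_total + 1e-12)

    # Energy-weighted centroids over range and velocity.
    range_centroid = (weights.sum(axis=0) * range_axis).sum()
    velocity_centroid = (weights.sum(axis=1) * velocity_axis).sum()

    # Energy-weighted dispersions (spatial and velocity spread).
    spatial_dispersion = np.sqrt((weights.sum(axis=0) * (range_axis - range_centroid) ** 2).sum())
    velocity_dispersion = np.sqrt((weights.sum(axis=1) * (velocity_axis - velocity_centroid) ** 2).sum())

    # "Moving" energy excludes the zero-Doppler (static clutter) bin.
    zero_bin = int(np.argmin(np.abs(velocity_axis)))
    energy_moving = energy_total - power[zero_bin, :].sum()

    return {
        "range": range_centroid,
        "acceleration": (velocity_centroid - prev_velocity) / frame_dt,
        "energy_total": energy_total,
        "energy_moving": energy_moving,
        "velocity": velocity_centroid,
        "velocity_dispersion": velocity_dispersion,
        "spatial_dispersion": spatial_dispersion,
        "energy_strongest_component": power.max(),
        "movement_index": energy_moving / (energy_total + 1e-12),
    }

# Example with placeholder axes and a random map.
rd_mag = np.abs(np.random.randn(64, 128))
range_axis = np.linspace(0.0, 0.6, 128)       # metres, assumed
velocity_axis = np.linspace(-2.0, 2.0, 64)    # m/s, assumed
print(frame_features(rd_mag, range_axis, velocity_axis))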
Fig. 3. Two placements of the radar sensor. (a) For the driver and front-seat passenger. (b) For the backseat passengers.
Each gesture was assigned a classifier number, then twenty of each were recorded and timestamped for data collection. Twenty samples of the background were also recorded and assigned a classifier, so the system would accurately recognize the absence of a gesture. The collected data was then used to create a random forest classifier, allowing real-time recognition of future gestures. The parameters of the random forest classifier were set as follows:
1. Forest Size = 10
2. Forest Max Depth = 10
3. Forest Min Samples = 50
4. Classifier Min Count = 30
5. Classifier Buffer Size = 50.
Increasing the number of trees within the classifier increased robust-
ness, and altering the ratio of classifier min count to classifier buffer
size changed the precision of recognition. A ratio of 3:5 between classifier min count and classifier buffer size ensures that, for a classification to occur, at least 30 of the 50 samples in the buffer must be assigned to the same category. Forest size and forest depth were both set to 10
to ensure adequate forest size and average tree depth for classification,
while minimizing computational cost on the system.
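As an illustration of how the recorded training data, the forest parameters listed above, and the buffered decision rule fit together, here is a minimal Python sketch using scikit-learn. The authors' implementation runs in C on the radar platform; the use of scikit-learn, the mapping of Forest Min Samples onto min_samples_split, the placeholder feature data, and the class numbering are assumptions for illustration only.

from collections import Counter, deque
import numpy as np
from sklearn.ensemble import RandomForestClassifier

NUM_FEATURES = 9     # the nine per-frame features listed in Section II-A
BACKGROUND = 0       # classifier number reserved for "no gesture"
GESTURES = [1, 2, 3] # hypothetical classifier numbers for one three-gesture set

# Training data: 20 recordings per gesture plus 20 background recordings.
# Each recording is reduced to per-frame feature vectors; random data stands
# in for the real radar features.
rng = np.random.default_rng(0)
X, y = [], []
for label in [BACKGROUND] + GESTURES:
    for _ in range(20):                                           # 20 recordings per class
        frames = rng.normal(loc=label, size=(50, NUM_FEATURES))   # placeholder features
        X.append(frames)
        y.append(np.full(len(frames), label))
X, y = np.vstack(X), np.concatenate(y)

# Random forest with the parameters reported above.
forest = RandomForestClassifier(
    n_estimators=10,       # Forest Size = 10
    max_depth=10,          # Forest Max Depth = 10
    min_samples_split=50,  # approximates Forest Min Samples = 50
    random_state=0,
)
forest.fit(X, y)

# Real-time decision rule: Classifier Min Count = 30 out of Buffer Size = 50.
buffer = deque(maxlen=50)

def classify_frame(feature_vector):
    """Report a gesture only when 30 of the last 50 per-frame predictions
    agree on the same non-background class."""
    buffer.append(int(forest.predict(feature_vector.reshape(1, -1))[0]))
    if len(buffer) < 50:
        return None
    label, count = Counter(buffer).most_common(1)[0]
    if count >= 30 and label != BACKGROUND:
        return label
    return None

# Example: stream placeholder frames through the decision rule.
for frame in rng.normal(loc=2, size=(60, NUM_FEATURES)):
    detected = classify_frame(frame)
    if detected is not None:
        print("Detected gesture classifier number:", detected)
        break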
B. Sensor Placement
Specific environmental and user constraints were considered when
designing the system and gesture sets to ensure usability and robust-
ness. The interior of a car is spatially complex, which created detection challenges for the sensor. If placed too close to detectable objects, such as a gear shift, the sensor often produced false positives or failed to recognize gestures within its field. Placing the sensor where the majority of the radar beam spread into free space, and recording a robust background case, mitigated the detection of such objects and identified those that did intrude into the field as non-targets. This spatial constraint, combined with the spatial constraints of the user position, led to placement of the sensor on the center console (for use by the front-seat passenger and the driver) and between the backs of the front seats (for use by the back-seat passengers). The radar beam spreads upward into
free space, where little movement from passengers naturally occurs.
With these two placements (as shown in Fig. 3), the radar is within a
comfortable arm’s length reach for all passengers.
Fig. 4. Flowchart of the system from gesture recognition to infotain-
ment output.
C. Connections to In-Car Infotainment System
When a gesture was detected, its classifier number was added to the body of a POST request. This request was then sent to an Android phone (paired to the car in older models) or to an Android Auto system (available in newer models), which, acting as a server, parsed the request for the classifier number.
Each classifier number was associated with some action, which the
system could then undertake, using either Android intents for phone
actions, or the Spotify API for media actions. This allowed for both
the streaming of audio over the car speakers and the use of the physi-
cal car infotainment system to display relevant information. A visual
representation of the entire system can be seen in Fig. 4.
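A minimal sketch of the message flow described above: the recognizer posts the classifier number over HTTP, and the receiving side maps that number to an infotainment action. The endpoint URL, JSON field name, gesture-to-number assignments, and action names are all hypothetical; the actual server runs on the Android device and invokes Android intents or the Spotify API.

import json
import urllib.request

# Hypothetical endpoint and payload field; the real server runs on the Android
# device and its URL and field names are not specified in the letter.
SERVER_URL = "http://192.168.1.50:8080/gesture"

def send_gesture(classifier_number):
    """Client side: report a recognized gesture to the in-car server via HTTP POST."""
    body = json.dumps({"classifier": classifier_number}).encode("utf-8")
    req = urllib.request.Request(SERVER_URL, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=1.0) as resp:
        return resp.status

# Server side (conceptually, on the Android device): map each classifier number
# to an infotainment action, e.g., an Android intent or a Spotify API call.
ACTIONS = {
    1: "toggle_call",        # wiggle phone sign   -> call / hang up
    2: "toggle_playback",    # wiggle fingers      -> pause / play
    3: "select_playlist_1",  # hold out one finger
    4: "select_playlist_2",  # wiggle two fingers
    5: "select_playlist_3",  # wave three fingers
}

def dispatch(post_body: bytes):
    """Parse the POST body and return the action for the classifier number."""
    classifier_number = json.loads(post_body)["classifier"]
    return ACTIONS.get(classifier_number, "ignore")  # unknown numbers are ignored

print(dispatch(json.dumps({"classifier": 2}).encode("utf-8")))  # -> toggle_playback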
D. Gesture Set Design
The needs of the driver and characteristics of the system were
the main considerations when designing the gesture sets. The 60 GHz sensor in use was designed to detect fine motion, with spatial resolution as fine as 0.3 mm, a marked improvement over the 3.75 cm range resolution of the 25 GHz radar system presented in [13]. This gives the 60 GHz radar greater precision and accuracy in a smaller footprint, allowing spatial and temporal differences to be used to make gestures distinguishable from one another. Practical system implementation dictated the use of gestures that tolerate large margins of error, to reduce driver distraction. By creating larger spatial zones for each gesture and having gestures performed at varying speeds, the driver would have more room for non-exact gestures and thus would need to devote fewer visual resources to the motion. Designing more robust gestures also allows multiple users to
operate the system, as it will be more tolerant to the natural spatial
and temporal variations of each user and the various gesture motions
within different vehicle designs.
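For context, the range resolution of an FMCW radar is set by its sweep bandwidth B. The 3.75 cm figure quoted above corresponds, via

\[
\Delta R \;=\; \frac{c}{2B} \;=\; \frac{3\times 10^{8}\ \mathrm{m/s}}{2\times\left(4\times 10^{9}\ \mathrm{Hz}\right)} \;\approx\; 3.75\ \mathrm{cm},
\]

to a sweep bandwidth of roughly 4 GHz for the 25 GHz system of [13]; the bandwidth value is inferred here from the quoted resolution rather than stated in the letter. The 0.3 mm figure for the 60 GHz sensor is best read as its fine displacement sensitivity within a range cell, rather than this coarse FFT range-resolution bound.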
Two demonstration videos [14] of the system at work were recorded
for the proof of concept system, the details of which are shown in
Table 1. Only the phone function and playlist music function of the
car infotainment system were included in the demonstration videos; however, the system can control much more. As much as possible, intuitive gestures that relate to their corresponding functions were used to limit the cognitive load on the user. Using universal signs such
as a telephone gesture and numbers 1, 2, and 3 will make gestures
easier to recall and natural to perform. Vehicular infotainment systems
often use menu functions to navigate and organize all the options of
Table 1. Summary of Gesture Demonstration Characteristics
Demo                          Gesture               Function
Phone (Driver)                Wiggle phone sign     Call/Hang up
Music (Backseat Passenger)    Wiggle fingers        Pause/Play
                              Hold out one finger   Select Playlist 1
                              Wiggle 2 fingers      Select Playlist 2
                              Wave 3 fingers        Select Playlist 3
control, which allows reuse of several gestures. The same gestures
can be used in all menus for back, forward, select, etc., which will
provide extensive functionality to the user while limiting the number
of gestures they need to remember. Only main gestures that will be
used to select which menu to access (i.e., contact list, map, music, etc.)
will not be reusable. Reuse of gestures also increases the accuracy of
the system, as fewer classification options have been shown to increase
accuracy of recognition when using a random forest classifier [15].
III. TESTING THE SYSTEM
For each demonstration video, the gesture sets were recorded by a single individual. It took significantly less time to record
with one individual; however, gesture sets created by one individual
may not work for other users. Recording with multiple individuals will
capture natural temporal and spatial variations introduced with each
new user to create a more robust system. Another two gesture sets
comprising three gestures each were recorded by multiple individ-
uals to test performance with more human variation. Five individuals
were used to record each gesture set, each individual recording 20 of
each gesture. The gesture set was later tested by the five recorders as
well as three individuals who did not record, the results of which are
shown in Table 2. To test, everyone performed each gesture 30 times
and the accuracy percentage was recorded. Before testing the system,
all participants had the opportunity to practice and find the correct range
and speed of each gesture. Instructions were provided during this
practice to ensure each gesture was performed correctly; these results
reflect user accuracy with a strong understanding of each movement.
The participants’ ability to learn the gestures on their own was not evaluated; the learnability of each gesture when only written or video instruction is provided should be explored further. The
system performed above 90% accuracy for all gestures on average.
The gestures with the lowest accuracy were gestures 1 and 4, which
may be attributed to human error; both had several incorrectly timed
recordings. Gesture set 2 showed a slight decrease in accuracy going
from the recorders to the users, which may be attributed to the spatial
design of the gesture set. Gesture set 1 had more distinct spatial zones
for each gesture, whereas the spatial zones for gesture set 2 had some
overlap; those who recorded the gesture set had more familiarity with
the spatial zones than the users who did not record. To improve the
accuracy of the system, more participants should be used to record,
with each participant recording more gestures. It also may be valuable
to distribute the gestures evenly when testing and recording rather
than complete all samples for a gesture at once; it was observed that as
participants fatigued, variations were introduced that were not present
when the gesture was done naturally only once.
Table 2. Summary of Recognition Accuracy Results for Two Gesture Sets
Number  Gesture        Rec. 1  Rec. 2  Rec. 3  Rec. 4  Rec. 5  User 1  User 2  User 3  Average
Set 1
1       Low wiggle     100%    100%    87%     90%     93%     93%     80%     83%     91%
2       Turn over      100%    100%    100%    100%    100%    93%     100%    100%    99%
3       High grab      93%     100%    90%     100%    87%     100%    90%     100%    95%
Set 2
4       Swipe          100%    93%     87%     97%     93%     87%     87%     83%     91%
5       Large circle   100%    100%    100%    100%    97%     93%     93%     90%     97%
6       Small circle   100%    100%    93%     100%    100%    93%     93%     87%     96%
IV. CONCLUSION
We have presented a gesture detection system using mm-wave radar
for vehicular infotainment control. The gestures were designed to be
distinguishable by the radar, be intuitive and memorable for the user,
and fit the constraints of the environment. Specific decisions were
made to maximize ease of use by drivers; further testing should be
done to validate these usability design decisions. Demonstrations of
the system in use were filmed and presented to showcase the use of
intuitive gestures [14], ease of use by both driver and passengers,
and the use of large spatial zones for more robust recognition. The
accuracy of the system was tested with multiple users; it was found
that involving more participants when recording a gesture set increased
accuracy and robustness. In the future, studies are needed to define
and optimize the required user input for the system training stages.
ACKNOWLEDGMENT
This work was supported by the Natural Sciences and Engineering Research Council
of Canada. The radar transceiver used in this work was provided by Google’s Advanced
Technology and Projects group (ATAP) through the Project Soli Alpha DevKit Early
Access Grant.
REFERENCES
[1] C. Pickering, K. Burnham, and M. Richardson, “A research study of hand gesture
recognition technologies and applications for human vehicle interaction,” in Proc.
IET 3rd Inst. Eng. Technol. Conf. Automotive Electron., Warwick, U.K., 2007,
pp. 1–15.
[2] K. Young and M. Regan, “Driver distraction: A review of the literature,” Monash
Univ. Accident Res. Centre, Clayton, Vic, Australia, 2007, pp. 379–405.
[3] J. P. Wachs, M. Kolsch, H. Stern, and Y. Edan, “Vision based hand gesture ap-
plications,” Commun. ACM, vol. 54, no. 2, pp. 60–71, 2011. [Online]. Available:
http://dx.doi.org/10.1145/1897816.1897838
[4] P. Breuer, C. Eckes, and S. Müller, “Hand gesture recognition with a novel IR time-of-flight range camera—A pilot study,” in Proc. Int. Conf. Vis./Comput. Techn. Appl., 2007, pp. 247–260. [Online]. Available: http://dx.doi.org/10.1007/978-3-540-71457-6_23
[5] C. Keskin, F. Kırac, Y. E. Kara, and L. Akarun, “Real time hand pose estimation using
depth sensors,” in Proc. IEEE Int. Conf. Comput. Vis. Workshops, 2011, pp. 1228–
1234. [Online]. Available: http://dx.doi.org/10.1109/ICCVW.2011.6130391
[6] P. Molchanov, S. Gupta, K. Kim, and K. Pulli, “Multi-sensor system for driver’s
hand-gesture recognition,” in Proc. IEEE 11th Int. Conf. Workshops Automat.
Face Gesture Recognit., Ljubljana, Slovenia, 2015, pp. 1–8. [Online]. Available:
http://dx.doi.org/10.1109/FG.2015.7163132
[7] A. Riener, M. Rossbory, and M. Ferscha, “Natural DVI based on intuitive
hand gestures,” in Proc. INTERACT Workshop User Experience Cars, 2011,
pp. 62–66.
[8] P. Molchanov, S. Gupta, K. Kim, and J. Kautz, “Hand gesture recognition
with 3D convolutional networks,” in Proc. IEEE Conf. Comput. Vis. Pattern
Recognit. Workshops, Boston, MA, USA, 2015, pp. 1–7. [Online]. Available:
http://dx.doi.org/10.1109/CVPRW.2015.7301342
[9] E. Ohn-Bar and M. M. Trivedi, “Hand gesture recognition in real time for automotive
interfaces: A multimodal vision based approach and evaluations,” IEEE Trans. Intell.
Transportation Syst., vol. 15, no. 6, pp. 2368–2377, Dec. 2014. [Online]. Available:
http://dx.doi.org/10.1109/TITS.2014.2337331
[10] Y. Jacob, F. Manitsaris, G. Lele, and L. Pradere, “Hand gesture recognition for driver
vehicle interaction,” in Proc. IEEE Comput. Soc. Workshop Observing Understand.
Hands Action 28th IEEE Conf. Comput. Vis. Pattern Recognit., Boston, MA, USA,
2015, pp. 41–44.
[11] U. Reissner, “Gestures and speech in cars,” Dept. Informat., Technische Univ.
Munchen, Munchen, Germany, 2007.
[12] N. Gillian, “Gesture Recognition Toolkit, ver. 1.0,” Oct. 2017. [Online]. Available:
http://www.nickgillian.com/grt/
[13] P. Molchanov, S. Gupta, K. Kim, and K. Pulli, “Short-range FMCW
monopulse radar for hand-gesture sensing,” in Proc. IEEE Radar Conf.,
Johannesburg, South Africa, 2015, pp. 1491–1496. [Online]. Available:
http://dx.doi.org/10.1109/RADAR.2015.7131232
[14] G. Shaker, “Gesture recognition using mm-waves,” Waterloo Artif. Intell. Inst., Nov
2017. [Online]. Available: https://goo.gl/iRqkJC
[15] J. Lien, “Soli: Ubiquitous gesture sensing with mm-wave radar,” ACM
Trans. Graphics, vol. 35, no. 4, 2016, Art. no. 142. [Online]. Available:
http://dx.doi.org/10.1145/2897824.2925953
[16] S. Naik, H. R. Abhishek, K. N. Ashwal, and S. P. Balasubramanya, “A study
on automotive human vehicle interaction using gesture recognition technol-
ogy,” Int. J. Multidisciplinary Cryptol. Inf. Secur., vol. 1, no. 2, pp. 6–12,
2012.
[17] M. Alpern and K. Minardo, “Developing a car gesture interface for use as
a secondary task,” in Proc. ACM Extended Abstracts Hum. Factors Com-
put. Syst., Fort Lauderdale, FL, USA, 2003, pp. 932–933. [Online]. Available:
http://dx.doi.org/10.1145/765891.766078