VOL. 2, NO. 2, JUNE 2018 3500904
Microwave/millimeter wave sensors
Gesture Recognition Using mm-Wave Sensor for Human-Car Interface
Karly A. Smith1, Clément Csech2, David Murdoch3, and George Shaker3,4∗
1Department of Systems Design Engineering, University of Waterloo, Waterloo, ON N2L 3G1, Canada
2Department of Biomechanics and Bioengineering, Université de Technologie de Compiègne, Compiègne 60200, France
3Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON N2L 3G1, Canada
4Department of Mechanical and Mechatronics Engineering, University of Waterloo, Waterloo, ON N2L 3G1, Canada
∗Senior Member, IEEE
Manuscript received December 13, 2017; revised February 5, 2018; accepted February 21, 2018. Date of publication February 27, 2018; date of current
version May 4, 2018.
Abstract— This article details the development of a gesture recognition technique using a mm-wave radar sensor for in-car
infotainment control. Gesture recognition is becoming a more prominent form of human–computer interaction and can be
used in the automotive industry to provide a safe and intuitive control interface that will limit driver distraction. We use
a 60 GHz mm-wave radar sensor to detect precise features of fine motion. Specific gesture features are extracted and
used to build a machine learning engine that can perform real-time gesture recognition. This article discusses the user
requirements and in-car environmental constraints that influenced design decisions. Accuracy results of the technique are
presented, and recommendations for further research and improvements are made.
Index Terms—Microwave/millimeter wave sensors, human-car interface, 60 GHz mm-wave radar, gesture sensing, random forest clas-
sifier, machine learning.
I. INTRODUCTION
In the automotive industry, vehicular infotainment systems have
grown in popularity and complexity over the past several years.
Mainstream car manufacturers now offer up to 700 infotainment and
environmental controls for the driver and passengers to manipulate
[1]. However, increased functionality within the vehicle has increased
potential causes for driver distraction. The main causes of driver
distraction are categorized in [2] as visual, cognitive, manual, and
auditory. Studies have shown that visual and manual distractions
when combined have the most impact on driving performance [2].
This paper presents a gesture detection system using a mm-wave
radar sensor for intuitive human-vehicular interaction (HVI). Many
different gesture sensing and processing techniques have been
developed in recent years. Previous gesture detection systems have
used camera-based sensors (IR, color, etc.), depth-based sensors,
and wearable sensors such as gloves embedded with 3-D tracking
technology [2]–[11]. However, these systems all have significant
drawbacks that affect their usability. Camera-based sensors are
susceptible to changes in light, color, and background, and have high
computational costs due to extensive image processing [3], [4]. Depth-
based sensors are very good at detecting changes in position; however,
they cannot detect orientation or specific hand shapes [5]. Wearable
technology may interfere with other tasks the user does in daily life
and limits system input to whoever is wearing the input device.
Alternatively, we believe that radar sensors present a viable sys-
tem solution. Radars are not affected by variable lighting changes
inside a car and are able to detect specific hand and finger ori-
entations with precision. The radar system described in this paper
Corresponding author: Karly A. Smith (e-mail: k62smith@edu.uwaterloo.ca).
Associate Editor: Y. Duroc.
Digital Object Identifier 10.1109/LSENS.2018.2810093
provides real-time, visionless infotainment control to the driver and pas-
sengers without wearable components, decreasing the risk of driver dis-
traction and allowing input from multiple users. Previous work has been
done on in-car gesture sensing that combines short-range radar, time
of flight depth sensors, and color cameras for gesture detection [6].
That system used an FMCW monopulse 25 GHz radar in conjunc-
tion with camera based data for detection and a convolutional neu-
ral network for near real-time recognition [6]. In comparison, the
system presented in this paper uses 60 GHz radar for finer spatial
resolution and a random forest classifier algorithm for real-time
recognition.
II. SYSTEM DESIGN
Using a wireless radar sensor for detection and recognition of ges-
tures offers several advantages over other systems currently in use.
Automobile manufacturers currently offer touchscreens, voice con-
trol, Bluetooth phone connection, and other methods of infotainment
control. Interfaces that require tactile manual input such as a touch-
screen also require small amounts of visual attention to navigate,
taking the driver’s eyes off the road. Voice control requires neither
manual nor visual attention; however, if the in-car environment is noisy
with music or conversation, it is not a viable option. As
highlighted earlier, there are several other systems for gesture and
posture detection in development that use IR and depth cameras for
sensing [2]–[11]; however, cameras are affected by light conditions
and obstacles in the field of view. In addition, issues of privacy and user
compliance arise when cameras are in use. Radar is advantageous
because it is not affected by light or sound in the environment, can
be embedded in devices, has very precise resolution, offers real-time
recognition, and does not require recording an image of the
user [7].
Fig. 1. Radar chip photo showing size comparable to a nickel.
Fig. 2. Timestamped images of gesture progression with corresponding range-Doppler signature progressions.
A. Gesture Detection and Recognition
In this article, we utilized a 60 GHz frequency-modulated continuous-
wave (FMCW) mm-wave radar sensor. The sensor hardware
consists of an 8 mm × 11 mm radar chip with two Tx and four Rx
antennas, shown in Fig. 1.
The radar board was fitted inside a car as an after-market addition. Machine learn-
ing was used to record the radar signature of each set of hand gestures
(as shown in Fig. 2(a) and (b)), train a model using a random forest
classifier algorithm, and then perform recognition using that model.
The random forest classifier was chosen because of the higher success
rate highlighted in [15]. Features of the received signal were
processed and made easily accessible for manipulation in the C language.
The features used in this project were range, acceleration, energy to-
tal, energy moving, velocity, velocity dispersion, spatial dispersion,
energy strongest component, and movement index [12].
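To make this feature interface concrete, the sketch below gathers the nine features above into a single per-frame vector. The structure, field names, and sample values are illustrative assumptions for the discussion in this letter, not the radar SDK's actual data types (which were accessed in C in this work).

```python
# Illustrative sketch only: a per-frame feature vector built from the nine
# radar features named above. Field names and values are assumptions for
# illustration; they do not reproduce the actual radar SDK structures.
from dataclasses import dataclass, astuple

@dataclass
class RadarFrameFeatures:
    range_m: float              # radial distance of the dominant target
    acceleration: float         # change in radial velocity between frames
    energy_total: float         # total reflected energy in the frame
    energy_moving: float        # energy attributed to moving scatterers
    velocity: float             # mean radial (Doppler) velocity
    velocity_dispersion: float  # spread of velocities across the target
    spatial_dispersion: float   # spread of the target across range bins
    energy_strongest: float     # energy of the strongest single component
    movement_index: float       # overall amount of motion in the frame

    def as_vector(self):
        """Return the features as a flat list for the classifier."""
        return list(astuple(self))

# Example: one frame of a hypothetical "wiggle fingers" gesture.
frame = RadarFrameFeatures(0.25, 0.4, 1.8e-3, 1.1e-3, 0.12, 0.05, 0.02, 9.0e-4, 0.7)
print(frame.as_vector())
```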
Each gesture was assigned a classifier number, then twenty of each
were recorded and timestamped for data collection. Twenty samples
of the background were also recorded and assigned a classifier, so
the system would accurately recognize the absence of a gesture. The
collected data were then used to create a random forest classifier, al-
lowing real-time recognition of future gestures. The parameters of the
random forest classifier were set as follows:
1. Forest Size = 10
2. Forest Max Depth = 10
3. Forest Min Samples = 50
4. Classifier Min Count = 30
5. Classifier Buffer Size = 50.
Increasing the number of trees within the classifier increased robust-
ness, and altering the ratio of classifier min count to classifier buffer
size changed the precision of recognition. A ratio of 3:5 between clas-
sifier min count and classifier buffer size ensures that, for classification
to occur, at least 30 of the 50 samples in the buffer must be classified
in the same category. Forest size and forest depth were both set to 10
to ensure adequate forest size and average tree depth for classification,
while minimizing computational cost on the system.
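To illustrate how these parameters could interact in practice, the following sketch configures a random forest with analogous settings and applies the 30-of-50 voting rule over a sliding buffer of per-frame predictions. It uses scikit-learn and synthetic data purely for illustration; the mapping of Forest Min Samples onto min_samples_split and the placeholder training data are assumptions, and this does not reproduce the C-based pipeline used in this work.

```python
# Illustrative sketch: a random forest with parameters analogous to those above,
# plus a 30-of-50 voting buffer over per-frame predictions. Uses scikit-learn
# rather than the toolkit used in the paper; training data are synthetic.
from collections import Counter, deque
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(400, 9))      # placeholder 9-D feature vectors
y_train = rng.integers(0, 4, size=400)   # classes 0..3 (0 = background)

forest = RandomForestClassifier(
    n_estimators=10,       # Forest Size = 10
    max_depth=10,          # Forest Max Depth = 10
    min_samples_split=50,  # Forest Min Samples = 50 (assumed mapping)
)
forest.fit(X_train, y_train)

BUFFER_SIZE = 50   # Classifier Buffer Size
MIN_COUNT = 30     # Classifier Min Count
buffer = deque(maxlen=BUFFER_SIZE)

def classify_frame(features):
    """Push one frame's prediction into the buffer and report a gesture only
    when at least MIN_COUNT of the buffered predictions agree."""
    buffer.append(int(forest.predict([features])[0]))
    label, count = Counter(buffer).most_common(1)[0]
    return label if count >= MIN_COUNT else None

for frame in rng.normal(size=(60, 9)):   # simulated stream of frames
    decision = classify_frame(frame)
    if decision is not None and decision != 0:
        print("recognized gesture class:", decision)
```

With this buffering scheme, increasing MIN_COUNT relative to BUFFER_SIZE makes recognition stricter (fewer false triggers, slower response), which mirrors the precision trade-off described above.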
B. Sensor Placement
Specific environmental and user constraints were considered when
designing the system and gesture sets to ensure usability and robust-
ness. The interior of a car is spatially complex, which created detection
challenges for the sensor. If placed too close to detectable objects, such
as a gear shift, the sensor often produced false positives or failed to recognize
gestures within the field. Placing the sensor where the majority of the
radar beam spread into free space, and recording a robust background
case, mitigated detection of such objects and identified objects that did
intrude into the field as non-targets. This spatial constraint, combined
with the spatial constraints of the user position, led to placement
of the sensor on the center console (for use by the driver and front-seat
passenger) and between the backs of the front seats (for use by
the back-seat passengers). The radar beam spreads upward into
free space, where little movement from passengers naturally occurs.
With these two placements (as shown in Fig. 3), the radar is within a
comfortable arm’s length reach for all passengers.

Fig. 3. Two placements of the radar sensor. (a) For the driver and front-seat passenger. (b) For the backseat passengers.
Fig. 4. Flowchart of the system from gesture recognition to infotain-
ment output.
C. Connections to In-Car Infotainment System
When a gesture was detected, its classifier number was added to
the body of a POST request. This POST request was then sent to an
Android phone (paired to the car in older models) or an Android Auto
system (available in newer models), which, acting as a server,
parsed the request to extract the classifier number. Each classifier
number was associated with an action, which the system could then
undertake using either Android intents for phone actions or the
Spotify API for media actions. This allowed for both
the streaming of audio over the car speakers and the use of the physi-
cal car infotainment system to display relevant information. A visual
representation of the entire system can be seen in Fig. 4.
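A minimal sketch of this request flow is given below, with a hypothetical endpoint, JSON field name, and action table; the real system used an Android server with intents and the Spotify API, not the placeholder print statements shown here.

```python
# Illustrative sketch of the gesture-to-infotainment flow described above.
# The endpoint URL, the JSON field name, and the action functions are all
# hypothetical stand-ins for the Android/Spotify integration.
import json
import urllib.request

GESTURE_SERVER = "http://192.168.1.50:8080/gesture"  # assumed in-car server address

def send_gesture(classifier_number: int) -> None:
    """POST the recognized gesture's classifier number to the in-car server."""
    body = json.dumps({"classifier": classifier_number}).encode("utf-8")
    req = urllib.request.Request(
        GESTURE_SERVER, data=body,
        headers={"Content-Type": "application/json"}, method="POST")
    urllib.request.urlopen(req, timeout=2)

# Server-side dispatch table: classifier number -> infotainment action.
ACTIONS = {
    1: lambda: print("toggle call / hang up"),   # phone gesture
    2: lambda: print("pause / play music"),
    3: lambda: print("select playlist 1"),
}

def handle_request(body: bytes) -> None:
    """Parse the POST body and run the associated action, if any."""
    number = json.loads(body).get("classifier")
    ACTIONS.get(number, lambda: None)()

# Local demonstration of the dispatch step (no network needed):
handle_request(json.dumps({"classifier": 2}).encode("utf-8"))
```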
D. Gesture Set Design
The needs of the driver and characteristics of the system were
the main considerations when designing the gesture sets. The 60 GHz
sensor in use was designed to detect fine motion, with spatial resolution
as fine as 0.3 mm, a considerable improvement over the 3.75 cm
range resolution of the 25 GHz radar system presented in [13]. This
gives the 60 GHz radar greater precision and accuracy in a smaller
footprint, and allows spatial and temporal differences to be
used to make gestures distinguishable from one another. Practical
system implementation dictated the use of gestures that tolerate
large margins of error, to create less driver distraction. By creating
larger spatial zones for each gesture, and by having gestures performed
at varying speeds, the driver would have more room for non-exact
gestures and thus would need to devote fewer visual resources to the
motion. Designing more robust gestures also allows multiple users to
operate the system, as it will be more tolerant to the natural spatial
and temporal variations of each user and the various gesture motions
within different vehicle designs.
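For context, the 3.75 cm figure follows from the standard FMCW range-resolution relation; assuming a sweep bandwidth of about 4 GHz for the 25 GHz system (an assumption consistent with the value reported in [13]):

$$\Delta R = \frac{c}{2B} \approx \frac{3\times 10^{8}\ \text{m/s}}{2 \times 4\ \text{GHz}} \approx 3.75\ \text{cm}.$$

The sub-millimeter sensitivity quoted for the 60 GHz sensor reflects fine motion tracking through phase and Doppler processing within a range bin rather than the raw range-bin spacing.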
Two demonstration videos [14] of the proof-of-concept system at work
were recorded; their details are shown in Table 1. Only the phone
function and the playlist music function of the car infotainment system
were included in the demonstration videos; however, the system can
control much more. As much as possible, intuitive gestures related to
their corresponding functions were used to limit the user's cognitive
load. Using universal signs such
as a telephone gesture and numbers 1, 2, and 3 will make gestures
easier to recall and natural to perform. Vehicular infotainment systems
often use menu functions to navigate and organize all the options of
Table 1. Summary of Gesture Demonstration Characteristics
Demo | Gesture | Function
Phone (Driver) | Wiggle phone sign | Call/Hang up
Music (Backseat Passenger) | Wiggle fingers | Pause/Play
Music (Backseat Passenger) | Hold out one finger | Select Playlist 1
Music (Backseat Passenger) | Wiggle 2 fingers | Select Playlist 2
Music (Backseat Passenger) | Wave 3 fingers | Select Playlist 3
control, which allows reuse of several gestures. The same gestures
can be used in all menus for back, forward, select, etc., which will
provide extensive functionality to the user while limiting the number
of gestures they need to remember. Only the main gestures used to
select which menu to access (e.g., contact list, map, music) will not
be reusable. Reuse of gestures also increases the accuracy of the
system, as fewer classification options have been shown to increase
recognition accuracy when using a random forest classifier [15].
III. TESTING THE SYSTEM
For each demonstration, the gesture set was recorded by a single individ-
ual. Recording with one individual took significantly less time;
however, gesture sets created by one individual
may not work for other users. Recording with multiple individuals will
capture natural temporal and spatial variations introduced with each
new user, creating a more robust system. Another two gesture sets,
each comprising three gestures, were recorded by multiple individ-
uals to test performance with more human variation. Five individuals
were used to record each gesture set, each individual recording 20 of
each gesture. The gesture set was later tested by the five recorders as
well as three individuals who did not record, the results of which are
shown in Table 2. During testing, each participant performed each gesture 30 times
and the accuracy percentage was recorded. Before testing the system,
all participants had the opportunity to practice and find the correct range
and speed of each gesture. Instructions were provided during this
practice to ensure each gesture was performed correctly; these results
reflect user accuracy with a strong understanding of each movement.
The participants’ ability to learn the gestures alone was not evalu-
ated and should be explored further to evaluate learnability of each
gesture when only provided with written or video instruction. The
system achieved above 90% accuracy for all gestures on average.
The gestures with the lowest accuracy were gestures 1 and 4, which
may be attributed to human error; both had several incorrectly timed
recordings. Gesture set 2 showed a slight decrease in accuracy going
from the recorders to the users, which may be attributed to the spatial
design of the gesture set. Gesture set 1 had more distinct spatial zones
for each gesture, whereas the spatial zones for gesture set 2 had some
overlap; those who recorded the gesture set had more familiarity with
the spatial zones than the users who did not record. To improve the
accuracy of the system more participants should be used to record,
with each participant recording more gestures. It also may be valuable
to distribute the gestures evenly when testing and recording rather
than complete all samples for a gesture at once; it was observed that as
participants fatigued, variations were introduced that were not present
when the gesture was done naturally only once.
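As a small illustration of how the per-gesture accuracies in Table 2 are obtained, the sketch below converts correct-recognition counts out of 30 attempts per participant into percentages and averages them across participants; the counts themselves are invented placeholders, not the study's raw data.

```python
# Illustrative sketch: per-gesture recognition accuracy as reported in Table 2.
# Each participant performed a gesture 30 times; accuracy = correct / 30.
# The counts below are invented placeholders, not the study's raw data.
ATTEMPTS_PER_GESTURE = 30

# Correct-recognition counts per participant for one hypothetical gesture.
correct_counts = {
    "Recorder 1": 30, "Recorder 2": 30, "Recorder 3": 26, "Recorder 4": 27,
    "Recorder 5": 28, "User 1": 28, "User 2": 24, "User 3": 25,
}

accuracies = {who: n / ATTEMPTS_PER_GESTURE for who, n in correct_counts.items()}
average = sum(accuracies.values()) / len(accuracies)

for who, acc in accuracies.items():
    print(f"{who}: {acc:.0%}")
print(f"Average: {average:.0%}")
```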
Table 2. Summary of Recognition Accuracy Results for Two Gesture Sets
Number | Gesture | Recorder 1 | Recorder 2 | Recorder 3 | Recorder 4 | Recorder 5 | User 1 | User 2 | User 3 | Average
Set 1:
1 | Low wiggle | 100% | 100% | 87% | 90% | 93% | 93% | 80% | 83% | 91%
2 | Turn over | 100% | 100% | 100% | 100% | 100% | 93% | 100% | 100% | 99%
3 | High grab | 93% | 100% | 90% | 100% | 87% | 100% | 90% | 100% | 95%
Set 2:
4 | Swipe | 100% | 93% | 87% | 97% | 93% | 87% | 87% | 83% | 91%
5 | Large circle | 100% | 100% | 100% | 100% | 97% | 93% | 93% | 90% | 97%
6 | Small circle | 100% | 100% | 93% | 100% | 100% | 93% | 93% | 87% | 96%
IV. CONCLUSION
We have presented a gesture detection system using mm-wave radar
for vehicular infotainment control. The gestures were designed to be
distinguishable by the radar, be intuitive and memorable for the user,
and fit the constraints of the environment. Specific decisions were
made to maximize ease of use by drivers; further testing should be
done to validate these usability design decisions. Demonstrations of
the system in use were filmed and presented to showcase the use of
intuitive gestures [14], ease of use by both driver and passengers,
and the use of large spatial zones for more robust recognition. The
accuracy of the system was tested with multiple users; it was found
that involving more participants when recording a gesture set increased
accuracy and robustness. In the future, studies are needed to define
and optimize the required user input for the system training stages.
ACKNOWLEDGMENT
This work was supported by the Natural Sciences and Engineering Research Council
of Canada. The radar transceiver used in this work was provided by Google’s Advanced
Technology and Projects group (ATAP) through the Project Soli Alpha DevKit Early
Access Grant.
REFERENCES
[1] C. Pickering, K. Burnham, and M. Richardson, “A research study of hand gesture
recognition technologies and applications for human vehicle interaction,” in Proc.
IET 3rd Inst. Eng. Technol. Conf. Automotive Electron., Warwick, U.K., 2007,
pp. 1–15.
[2] K. Young and M. Regan, “Driver distraction: A review of the literature,” Monash
Univ. Accident Res. Centre, Clayton, Vic, Australia, 2007, pp. 379–405.
[3] J. P. Wachs, M. Kolsch, H. Stern, and Y. Edan, “Vision based hand gesture ap-
plications,” Commun. ACM, vol. 54, no. 2, pp. 60–71, 2011. [Online]. Available:
http://dx.doi.org/10.1145/1897816.1897838
[4] P. Breuer, C. Eckes, and S. Müller, “Hand gesture recognition with a novel IR
time-of-flight range camera—A pilot study,” in Proc. Int. Conf. Vis./Comput. Techn.
Appl., 2007, pp. 247–260. [Online]. Available: http://dx.doi.org/10.1007/978-3-540-
71457-6_23
[5] C. Keskin, F. Kırac, Y. E. Kara, and L. Akarun, “Real time hand pose estimation using
depth sensors,” in Proc. IEEE Int. Conf. Comput. Vis. Workshops, 2011, pp. 1228–
1234. [Online]. Available: http://dx.doi.org/10.1109/ICCVW.2011.6130391
[6] P. Molchanov, S. Gupta, K. Kim, and K. Pulli, “Multi-sensor system for driver’s
hand-gesture recognition,” in Proc. IEEE 11th Int. Conf. Workshops Automat.
Face Gesture Recognit., Ljubljana, Slovenia, 2015, pp. 1–8. [Online]. Available:
http://dx.doi.org/10.1109/FG.2015.7163132
[7] A. Riener, M. Rossbory, and M. Ferscha, “Natural DVI based on intuitive
hand gestures,” in Proc. INTERACT Workshop User Experience Cars, 2011,
pp. 62–66.
[8] P. Molchanov, S. Gupta, K. Kim, and J. Kautz, “Hand gesture recognition
with 3D convolutional networks,” in Proc. IEEE Conf. Comput. Vis. Pattern
Recognit. Workshops, Boston, MA, USA, 2015, pp. 1–7. [Online]. Available:
http://dx.doi.org/10.1109/CVPRW.2015.7301342
[9] E. Ohn-Bar and M. M. Trivedi, “Hand gesture recognition in real time for automotive
interfaces: A multimodal vision based approach and evaluations,” IEEE Trans. Intell.
Transportation Syst., vol. 15, no. 6, pp. 2368–2377, Dec. 2014. [Online]. Available:
http://dx.doi.org/10.1109/TITS.2014.2337331
[10] Y. Jacob, F. Manitsaris, G. Lele, and L. Pradere, “Hand gesture recognition for driver
vehicle interaction,” in Proc. IEEE Comput. Soc. Workshop Observing Understand.
Hands Action 28th IEEE Conf. Comput. Vis. Pattern Recognit., Boston, MA, USA,
2015, pp. 41–44.
[11] U. Reissner, “Gestures and speech in cars,” Dept. Informat., Technische Univ.
München, München, Germany, 2007.
[12] N. Gillian, “Gesture Recognition Toolkit, ver. 1.0,” Oct. 2017. [Online]. Available:
http://www.nickgillian.com/grt/
[13] P. Molchanov, S. Gupta, K. Kim, and K. Pulli, “Short-range FMCW
monopulse radar for hand-gesture sensing,” in Proc. IEEE Radar Conf.,
Johannesburg, South Africa, 2015, pp. 1491–1496. [Online]. Available:
http://dx.doi.org/10.1109/RADAR.2015.7131232
[14] G. Shaker, “Gesture recognition using mm-waves,” Waterloo Artif. Intell. Inst., Nov.
2017. [Online]. Available: https://goo.gl/iRqkJC
[15] J. Lien, “Soli: Ubiquitous gesture sensing with mm-wave radar,” ACM
Trans. Graphics, vol. 35, no. 4, 2016, Art. no. 142. [Online]. Available:
http://dx.doi.org/10.1145/2897824.2925953
[16] S. Naik, H. R. Abhishek, K. N. Ashwal, and S. P. Balasubramanya, “A study
on automotive human vehicle interaction using gesture recognition technol-
ogy,” Int. J. Multidisciplinary Cryptol. Inf. Secur., vol. 1, no. 2, pp. 6–12,
2012.
[17] M. Alpern and K. Minardo, “Developing a car gesture interface for use as
a secondary task,” in Proc. ACM Extended Abstracts Hum. Factors Com-
put. Syst., Fort Lauderdale, FL, USA, 2003, pp. 932–933. [Online]. Available:
http://dx.doi.org/10.1145/765891.766078