3D Ultrasonic Gesture Recognition
ABSTRACT An ultrasonic 3D gesture recognition system uses an array of AlN MEMS transducers and custom readout electronics to localize targets over a ±45° field of view up to 1m away. The rms localization error at 0.5m range is 0.4mm, 0.2°, and 0.8° for the range, x-angle, and y-angle axes, respectively. The 0.18μm CMOS ASIC comprises 10 independent channels with separate high-voltage transmitters, readout amplifiers, and ADCs. Power dissipation is 400μW at 30fps, and scales to 5μW/ch at 10fps.
2014 IEEE International Solid-State Circuits Conference
ISSCC 2014 / SESSION 12 / SENSORS, MEMS, AND DISPLAYS / 12.1
12.1 3D Ultrasonic Gesture Recognition
Richard J. Przybyla1, Hao-Yen Tang1, Stefon E. Shelton2,
David A. Horsley2, Bernhard E. Boser1
1University of California, Berkeley, CA, 2University of California, Davis, CA
Optical 3D imagers for gesture recognition suffer from large size and high power
consumption. Their performance depends on ambient illumination and they
generally cannot operate in sunlight. These factors have prevented widespread
adoption of gesture interfaces in energy- and volume-limited environments such
as tablets and smartphones. Wearable mobile devices, too small to incorporate
a touchscreen more than a few fingers wide, would benefit from a small,
low-power gestural interface. Gesture recognition using sound is an attractive
alternative to overcome these difficulties due to the potential for chip-scale size,
low power consumption, and ambient light insensitivity. Using pulse-echo
time-of-flight, MEMS ultrasonic rangers work over distances of up to a meter
and achieve sub-mm ranging accuracy [1,2]. Using a 2-dimensional array of
transducers, objects can be localized in 3 dimensions.
This paper presents an ultrasonic 3D gesture-recognition system that uses a
custom transducer chip and an ASIC to sense the location of targets such as
hands. The system block diagram is shown in Fig. 12.1.1. Targets are localized
using pulse-echo time-of-flight methods. Each of the 10 transceiver channels
interfaces with a MEMS transducer, and each includes a transmitter and a
readout circuit. Echoes from off-axis targets arrive with different phase shifts for
each element in the array. The off-chip digital beamformer realigns the signal
phase to maximize the SNR and determine target location.
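The phase-realignment step can be sketched as a narrowband phase-shift (delay-and-sum) beamformer. This is a minimal illustration rather than the authors' implementation: the uniform 7-element line array at half-wavelength pitch, the function names, and the 0.5° scan step are assumptions made for the example (the actual array layout is the one shown in Fig. 12.1.1).

```python
import cmath
import math

def steering_vector(theta_rad, n_elements, spacing_wavelengths):
    """Per-element phasors expected for a plane wave arriving from theta_rad.

    Assumes a uniform line array; spacing is given in wavelengths.
    """
    return [
        cmath.exp(2j * math.pi * spacing_wavelengths * n * math.sin(theta_rad))
        for n in range(n_elements)
    ]

def beam_power(snapshot, theta_rad, spacing_wavelengths):
    """Realign the element phases for one look direction and sum coherently."""
    weights = steering_vector(theta_rad, len(snapshot), spacing_wavelengths)
    total = sum(w.conjugate() * x for w, x in zip(weights, snapshot))
    return abs(total) ** 2

def estimate_angle_deg(snapshot, spacing_wavelengths, step_deg=0.5, fov_deg=45.0):
    """Scan the field of view and return the look angle with maximum power."""
    n_steps = int(round(2 * fov_deg / step_deg))
    candidates = [-fov_deg + i * step_deg for i in range(n_steps + 1)]
    return max(
        candidates,
        key=lambda a: beam_power(snapshot, math.radians(a), spacing_wavelengths),
    )

# Synthetic, noise-free echo from a target 20 degrees off axis (hypothetical data).
snapshot = steering_vector(math.radians(20.0), 7, 0.5)
estimated_deg = estimate_angle_deg(snapshot, 0.5)
```

When the look direction matches the arrival direction, all element phasors add in phase, which is why the same operation both maximizes SNR and indicates the target angle.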
The 450μm diameter piezoelectric micromachined ultrasound transducers
(pMUTs) used in this work are made up of a 2.2μm thick AlN/Mo/AlN/Al stack
deposited on a Si wafer and released with a back-side through-wafer etch. The
bottom electrode is continuous, while each pMUT has a top electrode
lithographically defined to actuate the trampoline mode. Each pMUT can
transmit and receive sound waves, and is operated at its resonance of
217kHz ± 2kHz with a bandwidth of 12kHz. The impedance of the transducers is
dominated by the 10pF transducer capacitance, and the motional resistance at
resonance is ~2.4MΩ. The resonant frequencies of the pMUTs vary due to
fabrication, temperature, and packaging stress, so online frequency tracking is
used to maintain maximum SNR during operation.
Two pMUTs are used for transmission and seven for reception as illustrated in
Fig. 12.1.1. The receive array is 3.5 wavelengths wide in the x-angle axis,
allowing targets separated by more than 15° to be distinguished. In the y-angle
axis the array is only 0.16 wavelengths wide, sufficient to determine the y-angle
to the target by measuring the average phase difference along the y axis of the
array. The center element of the receive array and the element 900μm above it
are used to launch a 138μs ≈ 24mm long pulse of sound into the environment.
The transmit configuration illuminates a wide field of view, permitting the
capture of an entire scene in a single measurement. Applications requiring better
target resolution or greater maximum range can also use transmit beamforming
at the expense of reduced measurement rate.
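The figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below assumes a speed of sound of 343 m/s (room-temperature air, not stated in the text), uses the 30-cycle transmit burst described for the transmitter, and takes the angular resolution as roughly one wavelength divided by the aperture, which is a rule of thumb rather than the authors' exact criterion.

```python
import math

SPEED_OF_SOUND_M_S = 343.0   # assumed: air at about 20 degrees C
F_TX_HZ = 217e3              # pMUT resonance from the text
BURST_CYCLES = 30            # transmit burst length from the text
APERTURE_WAVELENGTHS = 3.5   # x-axis receive aperture from the text

wavelength_m = SPEED_OF_SOUND_M_S / F_TX_HZ
pulse_duration_s = BURST_CYCLES / F_TX_HZ
# A pulse of duration T occupies c*T/2 in range because the sound travels
# out and back.
pulse_range_extent_m = SPEED_OF_SOUND_M_S * pulse_duration_s / 2
# Rule-of-thumb angular resolution: lambda / aperture, in radians.
resolution_deg = math.degrees(1.0 / APERTURE_WAVELENGTHS)

print(f"wavelength        : {wavelength_m * 1e3:.2f} mm")
print(f"pulse duration    : {pulse_duration_s * 1e6:.0f} us")
print(f"pulse range extent: {pulse_range_extent_m * 1e3:.1f} mm")
print(f"x-angle resolution: {resolution_deg:.1f} deg")
```

Under these assumptions the pulse lasts about 138μs and spans about 24mm of range, and the estimated resolution of roughly 16° is in line with the 15° quoted.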
Each cycle begins with the launch of an acoustic pulse. Figure 12.1.2 shows the
schematic of a single channel. High-voltage level shifters actuate the STX transmit
switches, setting the transducer’s bottom electrode to 16V to permit bipolar
actuation of the transducer. The transmitter then excites the transducer with a
32Vpp square wave for 30 cycles at the transmit frequency fTX, which is locked to
1/16th of the sampling frequency fs. At the end of the transmit phase, the
mechanical energy stored in the inertia of the pMUT dissipates and the pMUT
rings down at its natural frequency. The SRX receiver isolation switches are
turned on, and a resistor converts the ringdown current to a voltage that is
subsequently amplified and digitized by the normal receive path. The ringdown
signal is then I/Q demodulated with fTX. The slope of the phase signal during the
ringdown indicates the frequency offset and is used to update the fs and fTX used
in the next measurement. Figure 12.1.3 shows the offset measured by the
frequency autotuning loop as it is enabled. An initial 57kHz offset frequency is
nulled to 1kHz within 30 measurement cycles.
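The ringdown-based frequency estimate can be illustrated as follows. This is a sketch under stated assumptions, not the authors' circuit or firmware: the ringdown is modeled as an exponentially decaying sinusoid, the 26.5μs decay constant and 2kHz offset are hypothetical test values, the low-pass filter is a plain average over one carrier cycle, and the function name is invented for the example.

```python
import cmath
import math

def ringdown_offset_hz(samples, samples_per_cycle, f_tx_hz):
    """Estimate (resonance - f_tx) from the phase slope of the demodulated ringdown."""
    # I/Q demodulate with f_tx and average over one carrier cycle (crude low-pass).
    baseband = []
    for start in range(0, len(samples) - samples_per_cycle + 1, samples_per_cycle):
        acc = 0j
        for k in range(samples_per_cycle):
            acc += samples[start + k] * cmath.exp(-2j * math.pi * k / samples_per_cycle)
        baseband.append(acc)
    # Unwrap the phase by accumulating block-to-block phase increments.
    phases = [cmath.phase(baseband[0])]
    for prev, cur in zip(baseband, baseband[1:]):
        phases.append(phases[-1] + cmath.phase(cur * prev.conjugate()))
    # Least-squares slope of phase versus time, in rad/s.
    block_period_s = 1.0 / f_tx_hz
    times = [i * block_period_s for i in range(len(phases))]
    t_mean = sum(times) / len(times)
    p_mean = sum(phases) / len(phases)
    num = sum((t - t_mean) * (p - p_mean) for t, p in zip(times, phases))
    den = sum((t - t_mean) ** 2 for t in times)
    return (num / den) / (2 * math.pi)

# Synthetic ringdown: resonance sits 2 kHz above the transmit frequency.
F_TX_HZ = 217e3
FS_HZ = 16 * F_TX_HZ            # sampling locked to 16x f_tx, as in the text
TRUE_OFFSET_HZ = 2000.0         # hypothetical offset
DECAY_S = 26.5e-6               # hypothetical decay time constant
samples = [
    math.exp(-(n / FS_HZ) / DECAY_S)
    * math.cos(2 * math.pi * (F_TX_HZ + TRUE_OFFSET_HZ) * n / FS_HZ)
    for n in range(18 * 16)     # about 83 us of ringdown
]
estimated_offset_hz = ringdown_offset_hz(samples, 16, F_TX_HZ)
```

A positive slope means the transducer resonates above fTX, so a tuning loop built on this estimate would raise fs (and with it fTX) for the next measurement.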
After 86μs, the ringdown signal has decayed sufficiently for the Sring switch to be
opened, beginning the processing of received echoes. At this point, the signal
from the transducer is integrated on the transducer’s capacitance, and the
front-end measures a voltage that is proportional to the displacement of the
transducer.
The front-end amplifier consists of an open-loop current-reuse OTA with both
NMOS and PMOS differential pairs biased near subthreshold for current
efficiency. The front-end current is integrated onto the integrating capacitor of
the second stage, which also makes up an integrator in the first of two switched
capacitor resonators. Although the second stage is a switched capacitor
integrator, the front-end current is processed in a continuous-time fashion
before it is sampled at the output of the second integrator. As a result, the
second integrator acts as an anti-aliasing filter for the wideband noise generated
by the front-end and prevents this noise, the dominant noise source in the
receiver architecture, from being aliased into the band of interest.
The signal then passes through a second switched-capacitor resonator and is
quantized by a comparator. The high in-band gain provided by the 4th-order
bandpass filter shapes the wideband quantization noise to be away from the
signal at fTX. The SC resonators are designed to resonate at 1/16 of the sampling
frequency fs, which is locked to the transducer’s resonance by the ringdown
autotuning circuit. This centers the bandpass ΔΣ’s noise notch on the signal at
fTX.
The output of each ΔΣ ADC is I/Q demodulated, filtered, and downsampled
off-chip. A digital beamformer [3] processes the received signals to maximize the
receive SNR and determine the x-angle location of the target. This process can
be repeated in the orthogonal angle axis to implement 3D beamforming; in this
work we forgo 3D beamforming since the tiny y-axis aperture does not provide
any y-axis resolution.
Thermal noise in the front-end amplifier and the thermal motion of air limit the
minimum detectable echo. The input-referred noise of the amplifier is 11nV/√Hz,
and the noise voltage of the transducer is 6nV/√Hz at resonance. Figure 12.1.4
shows the measured signal-to-noise ratio vs. range for a 127mm×181mm flat
rectangular target. Figure 12.1.4 also shows the rms error in the range and
direction measurement. Amplitude noise in the received signal limits the
accuracy of the time-of-flight estimate. Figure 12.1.5 shows the output of the
digital beamformer from a single measurement, which captures the echoes from
a user’s hands and head as he poses as shown. The system tracks objects
between 45mm and 1m away and over an angular range of ±45°. Echoes from
targets at a range of 1m return after 5.8ms, and this sets the maximum
measurement rate of the system at 172 frames per second (fps).
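The frame-rate limit follows directly from the round-trip time of flight. The short check below assumes a speed of sound of 343 m/s, which is not stated in the text; with that value the result agrees with the quoted 5.8ms and 172fps to within rounding.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: air at about 20 degrees C
MAX_RANGE_M = 1.0           # maximum range from the text

# The echo must travel to the target and back before the next pulse is sent.
round_trip_s = 2 * MAX_RANGE_M / SPEED_OF_SOUND_M_S
max_frame_rate_fps = 1.0 / round_trip_s

print(f"round trip: {round_trip_s * 1e3:.2f} ms")
print(f"max rate  : {max_frame_rate_fps:.1f} fps")
```

Halving the maximum range halves the wait for the last echo and therefore doubles the achievable measurement rate.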
Figure 12.1.7 shows a micrograph of the readout IC, which is fabricated in a
0.18μm CMOS process with 32V transistors. For a 1m maximum range, the
system presented here uses 13.6μJ per measurement. At 30fps, the receive
power consumption is 335μW and the transmit power consumption is 66μW.
The energy consumption scales roughly linearly with maximum range. For a
maximum range of 0.3m, the energy per frame is reduced to <0.5μJ per channel
per frame. Single-element range measurements can be conducted at 10fps using
5μW.
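The energy and power figures can be reconciled with simple arithmetic. The sketch below only multiplies numbers quoted in the text and the abstract; treating the 0.5μJ per-channel figure as an upper bound for a single element at 10fps is an interpretation, not a statement from the authors.

```python
ENERGY_PER_FRAME_J = 13.6e-6      # 1 m maximum range, from the text
FRAME_RATE_FPS = 30.0
RX_POWER_W = 335e-6               # receive power at 30 fps, from the text
TX_POWER_W = 66e-6                # transmit power at 30 fps, from the text

# Average power implied by the per-measurement energy.
power_from_energy_w = ENERGY_PER_FRAME_J * FRAME_RATE_FPS
# Average power from the separately quoted receive and transmit figures.
power_from_rx_tx_w = RX_POWER_W + TX_POWER_W

# Short-range, single-channel case: at most 0.5 uJ per channel per frame.
SHORT_RANGE_ENERGY_PER_CH_J = 0.5e-6
single_channel_power_w = SHORT_RANGE_ENERGY_PER_CH_J * 10.0  # at 10 fps

print(f"13.6 uJ x 30 fps : {power_from_energy_w * 1e6:.0f} uW")
print(f"RX + TX          : {power_from_rx_tx_w * 1e6:.0f} uW")
print(f"0.5 uJ x 10 fps  : {single_channel_power_w * 1e6:.0f} uW per channel")
```

Both routes give about 400μW at 30fps, and the short-range case gives about 5μW per channel at 10fps, matching the abstract.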
Figure 12.1.6 compares the performance of this system to an earlier MEMS
ultrasonic 1D rangefinder [4] and two recent optical 3D rangers [5,6]. This
ultrasonic 3D rangefinder offers dramatically reduced energy consumption
compared to optical methods while permitting 3D target tracking. The energy
consumption trades off with performance, permitting continuous operation in
even tiny mobile devices. These characteristics enable energy-efficient gestural
interfaces in applications such as smartphones and tablets, and permit gestural
user interfaces in tiny mobile devices too small to accommodate a conventional
touchscreen.
[1] R. Przybyla, et al., “In-air Ultrasonic Rangefinding and Angle Estimation
using an Array of AlN Micromachined Transducers,” in Proc. Hilton Head
Workshop, pp. 50-53, 2012.
[2] R. Przybyla, et al., “A Micromechanical Ultrasonic Distance Sensor With >1
Meter Range,” in Transducers Dig. Tech. Papers, pp. 2070-2073, 2011.
[3] M. Skolnik, Introduction to Radar Systems, 3rd edition, McGraw-Hill, 2001.
[4] C. Kuratli and Q. Huang, “A CMOS Ultrasound Range-Finder Microsystem,”
IEEE J. Solid-State Circuits, vol. 35, no. 12, pp. 2005-2017, Dec. 2000.
[5] W. Kim, et al., “A 1.5Mpixel RGBZ CMOS Image Sensor for Simultaneous
Color and Range Image Capture,” ISSCC Dig. Tech. Papers, pp. 392-393, Feb.
[6] O. Shcherbakova, et al., “3D Camera Based on Linear-Mode Gain-Modulated
Avalanche Photodiodes,” ISSCC Dig. Tech. Papers, pp. 490-491, Feb. 2013.
978-1-4799-0920-9/14/$31.00 ©2014 IEEE
ISSCC 2014 / February 11, 2014 / 8:30 AM
Figure 12.1.1: System block diagram.
Figure 12.1.2: Readout circuit with mixed CT/SC architecture for inherent
antialiasing. All structures are implemented differentially.
Figure 12.1.3: Ringdown frequency offset measurement and tuning loop
Figure 12.1.4: Signal-to-noise ratio and target localization accuracy vs. range
for 127mm×181mm flat rectangular target.
Figure 12.1.5: Echo from user’s hands and head when posing as shown. Color
axis shows y-angle position of the targets. Beamformed data is thresholded at
Figure 12.1.6: Comparison table.
Figure 12.1.7: CMOS die photo and MEMS ultrasound die photo.