Acoustic Software Defined Platform: A Versatile
Sensing and General Benchmarking Platform
Chao Cai, Henglin Pu, Menglan Hu, Rong Zheng, Senior Member, IEEE, and Jun Luo, Senior
Member, IEEE
Abstract—Acoustic sensing has attracted significant attention recently, thanks to the pervasive availability of device support. However, adopting consumer-grade devices (e.g., smartphones) to deploy acoustic sensing applications faces the challenge of device/OS heterogeneity. Researchers have to devote tremendous effort to tackling platform-dependent details even in simply accessing raw audio samples, thus losing focus on innovating sensing algorithms. To this end, this paper presents the first Acoustic Software Defined Platform (ASDP): a versatile sensing and general benchmarking platform. ASDP encompasses several customized acoustic modules running on a ubiquitous computing board, backed by a dedicated software framework. It is superior to commodity devices in controlling and reconfiguring physical layer settings, thus offering much better usability. The tailored software framework abstracts platform details and provides a user-friendly interface for fast prototyping, while maintaining adequate programmability. To demonstrate the usefulness of ASDP, we showcase several relevant applications based on it. The promising outcomes make us believe that the release of our ASDP could greatly advance acoustic sensing research.
Index Terms—Acoustic sensing, platform, context-aware sensing, aerial acoustic communication.
1 Introduction
The rise of Internet-of-Things (IoT) technologies fuels the proliferation of various emerging systems that exploit rich on-board sensors [1], [2], [3], [4], [5]. Among these emerging technologies, acoustic sensing has attracted significant attention due to the widespread adoption of acoustic sensors. Thanks to the relentless efforts of researchers over the past decades, acoustic sensing has brought diversified applications including aerial acoustic communication [1], [6], [7], interactive sensing [3], [4], [8], and context-aware computing [9], [10], [11].
Nonetheless, unlike the wireless networking community, which has RF Software Defined Radios (RF-SDRs) to facilitate flexible designs [12], [13], unifying acoustic communication and sensing under distinct channel conditions is far more challenging. As a result, existing Software Defined Modems (SDMs) for under-water communications (e.g., [14]) cannot readily serve aerial acoustic purposes. In particular, sensing functions (e.g., reliable preamble detection) cannot be readily achieved by SDMs under more severe path loss, Doppler, near-far, and multipath effects. Meanwhile, there is no tailored software framework supporting aerial acoustic sensing. All these facts have forced researchers to resort to mobile devices [15]. However, extracting audio samples on mobile devices, crucial in developing acoustic sensing applications, can be very discouraging, because
C. Cai is with the College of Life Science and Engineering, Huazhong
University of Science and Technology. This work was carried out when
the author was a postdoc at Nanyang Technological University.
J. Luo is with the School of Computer Engineering, Nanyang Technologi-
cal University, Singapore 639798.
H. Pu, and M. Hu are with the School of Electronic Information and
Communications, Huazhong University of Science and Technology.
R. Zheng is with the Dept. of Computing and Software, McMaster University, Canada.
Fig. 1: Our Acoustic Software Defined Platform [16] consists
of hardware and software parts: three hardware modules
(left) and four software layers on the Raspbian Operating
System (OS) (middle). Acoustic sensing tasks can be per-
formed either locally or remotely on another host (right).
one has to spend tremendous effort on platform-dependent details such as programming languages and development frameworks. Additionally, acoustic sensing applications often involve tedious parameter tuning, further complicated by hardware heterogeneity [15]. Most importantly, the lack of common testbed support makes it difficult to fairly compare the performance of different techniques.
To address the above issues, researchers have made some initial attempts. Proposals in [17] and [18] are among the first to build general sensing platforms on commodity mobile devices. Later on, the authors in [15] released cross-platform support for ubiquitous acoustic sensing. This latter proposal develops an agent to simplify the process of extracting audio samples, so that one only needs to focus on algorithm design. Despite their partial successes, these platforms are built upon existing general-purpose mobile devices and hence still lack sufficient flexibility in controlling and reconfiguring hardware. For instance, disabling Automatic Gain Control (AGC) for fingerprinting-based sensing needs to go through a tedious test-and-verify process from device to device, wasting significant effort. Moreover, these software-oriented platforms only support a limited number of devices due to compatibility issues caused by hardware heterogeneity and host environment diversity. Finally, the lack of sufficient device compatibility renders these software platforms incompetent to serve a common benchmark purpose.
The above concerns make it very difficult to build a versatile platform on general-purpose devices. Though resorting to customized hardware may seem to be a straightforward solution, designing a general platform for both novices and experienced developers is highly non-trivial. Whereas novices often prefer to use it as a black box, an experienced developer may opt to deploy code at the kernel layer for higher efficiency. Therefore, a platform suitable for the former would inevitably hide too many details, thus losing the edge in programmability to cater to the latter. Consequently, striking a balance between usability and programmability is challenging. In addition, maintaining adequate portability of applications developed on ASDP to commodity devices is highly desirable yet very challenging, as it allows end-users to quickly benefit from a particular application, whereas this portability is largely hampered by the diversity of OSs and toolkits.
To answer these challenges, we build an Acoustic Software Defined Platform (ASDP) based on customized hardware and a ubiquitous computing platform. As shown in Fig. 1, ASDP is built on cost-effective acoustic hardware, customized in a Plug-and-Play (PnP) manner and driven by a Raspberry Pi. The acoustic module serves as a baseband unit; it utilizes a PnP interface to communicate with the acoustic front-ends (a.k.a. the microphone modules). These front-ends offer alternative circuits and a programmable interface so as to enable flexible control; such a modular design also yields hardware reconfigurability. Based on the Raspbian OS, we develop a layered software framework with balanced usability and programmability to enable fast prototyping and deliver adequate portability. In summary, this paper makes the following major contributions:
• We release the first unified ASDP to facilitate fast prototyping and meaningful benchmarking. It contains both a customized hardware platform and a tailored software framework.
• We elaborate the designs of ASDP, exposing the detailed construction of the acoustic subsystem and the supportive software framework.
• We showcase several application examples on our ASDP platform, demonstrating its versatile capabilities in supporting acoustic sensing developments, while providing guidelines on effectively exploring these capabilities.
• We conduct extensive experiments to evaluate the performance of representative hardware and software modules. The results demonstrate sufficient processing capability and better performance than commodity devices.
Unlike RF-SDR, ASDP is meant not only to deliver a common platform but also to provide insights for smartphone manufacturers to improve their acoustic subsystems, thus better cultivating acoustic sensing applications. In the following, we first briefly survey the literature in Sec. 2, then present ASDP hardware and software designs in Sec. 3. Relevant applications of ASDP are further presented in Sec. 4. We report performance evaluations of ASDP in Sec. 5, and summarize the benefits and lessons of ASDP in Sec. 6, before concluding our paper in Sec. 7.
2 Related Work
In the past decades, acoustic sensing has attracted great attention and witnessed a surge of applications in aerial acoustic communication [19], [1], [7], [2], context-aware computing [20], [21], [10], [11], [22], [9], [23], [24], [25], as well as interactive sensing [26], [4], [3], [27], [28], [29], [30], [31]. In this section, we first introduce some of these applications, then discuss the need for a common platform and elaborate on existing developments.
2.1 Acoustic Sensing Applications
Aerial acoustic communication enables devices with acoustic transducers, i.e., microphones and speakers, to exchange information. This type of application usually treats acoustic signals as a wireless medium, so algorithms and designs from the RF domain can be applied with necessary modifications and customizations to account for the special properties of acoustic signals. While long-range communications favor Chirp Spread Spectrum (CSS) [19] and its extensions [2] due to their resilience to the Doppler effect, multipath fading, and synchronization errors, a higher throughput can be achieved by more efficient modulations such as Orthogonal Frequency Division Multiplexing (OFDM) [7], [1]. State-of-the-art approaches have achieved a maximum of 2.4 kbps in proximity and at most a 25 m communication range with a 16 bps data rate. Therefore, aerial acoustic communications have become competent alternatives to traditional RF-based short-range communications [1].
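To make the CSS idea concrete, a linear chirp symbol, the basic CSS building block, can be generated as sketched below; the 18–20 kHz band, the 50 ms duration, and the function names are illustrative choices of ours, not parameters from the cited systems.

```python
import numpy as np

def chirp(f0, f1, duration, fs=48000):
    """Linear chirp sweeping from f0 to f1 Hz over `duration` seconds."""
    t = np.arange(int(duration * fs)) / fs
    # instantaneous frequency sweeps linearly from f0 to f1
    phase = 2 * np.pi * (f0 * t + (f1 - f0) / (2 * duration) * t ** 2)
    return np.sin(phase)

# An up-chirp and a down-chirp can encode different symbols
up = chirp(18000, 20000, 0.05)
down = chirp(20000, 18000, 0.05)
```

Chirps with different slopes remain nearly orthogonal under correlation, which is what makes CSS robust to Doppler shifts and synchronization errors.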
Context-aware sensing refers to applications that infer
distance/range [20], [22], [32], location [21], [10], [11], [9],
[33], [25], [34], [35], or motion trajectory [23], [24] from
acoustic signals. This category of applications takes advantage of the low propagation speed of acoustic signals to
measure Time-of-Arrival (ToA) and/or Time-Difference-of-
Arrival (TDoA), so as to enable tracking or localization. By
far, centimeter-level localization [9], [33] and sub-centimeter
tracking [23], [24] have both been achieved. Interactive
sensing applications range from recognizing body-scale ac-
tivities [26], [27] to finger-scale gestures [4], [3], [28], [29].
The underlying principle is that physical interactions can
affect acoustic channels to a detectable extent. Therefore, by
actively transmitting a modulated signal and analyzing the
properties of the received waveforms, one can identify the
associated physical interactions. Cutting-edge approaches
can decipher millimeter-scale finger gestures in a device-
free manner.
2.2 Acoustic Sensing Platforms
The aforementioned applications have motivated an increas-
ing research interest in acoustic sensing, but the lack of a
common platform is still discouraging many researchers.
One may believe that the community already has a plethora
of SDMs [36], [14], [37] that could act as general-purpose
sensing platforms. However, existing SDMs only serve
under-water communication purposes; converting them to
support aerial acoustic applications requires a total revamp
of the software framework, not to mention the portability
and cost issues of replacing the front-ends on these SDMs for
aerial acoustics. These reasons leave researchers no choice but to resort to commodity mobile devices.
Therefore, the need for building a common acoustic sensing
platform arises, and some initial attempts have been made
in this direction.
LibAS [15] is a recent cross-platform framework dedi-
cated to mobile acoustic sensing. It simplifies the process
to obtain audio samples by building a wrapper function,
enabling fast prototyping. However, due to device/OS
heterogeneity, LibAS only supports a limited number of
devices, confining its wide-scale adoption. Also, hardware
configurations are not easy to achieve; for instance, AGC
manipulation has to go through a tedious test-and-verify
process from device to device. Going beyond LibAS, ASDP not only provides salient flexibility in fast prototyping but also delivers extra flexibility in hardware controllability and reconfigurability.
Other acoustic sensing frameworks are rather task-
specific and hence applicable only to limited application
scenarios. For instance, DSP.Ear [38] aims to reduce the
computational time cost by using the GPU of a device,
while DeepEar [39] provides a deep learning framework
pertaining to acoustic sensing. CAreDroid [17] is a frame-
work dedicated to general context-aware computing, hence
not specific enough to handle acoustic sensing. Code In the
Air (CITA) [18] is similar to LibAS but only provides lim-
ited real-time support. Auditeur [40] and SoundSense [41]
are confined to passive acoustic sensing, and are thereby
inadequate to become a common platform.
Besides their respective limitations, these platforms also
have compatibility issues. For instance, to successfully ex-
tract audio samples from Android platforms, one has to deal
with a class called AudioTrack. It streams audio samples to a
ring buffer, whose length is hardware-dependent and has to
be tuned from device to device. This hardware heterogene-
ity problem partly explains why LibAS only supports a lim-
ited number of devices. Moreover, if the host environment
including SDKs or OS undergoes upgrades, these platforms
can easily fail to operate. The aforementioned reasons have
become the major hurdle in building a common platform on
existing devices, thus calling for a standalone design similar
to the SDRs in RF domain [12], [13].
3 ASDP Design
In this section, we present detailed hardware and software
designs of our proposed ASDP. As shown in Fig. 1, the
ASDP hardware physically has three parts: Raspberry Pi,
acoustic baseband, and acoustic front-end. They can be as-
sembled together via PnP sockets. To activate the hardware,
we develop a layered software framework mainly consisting
of four layers: driver, hardware abstraction, middleware,
and user space. In the following, we first explain the design
Fig. 2: Function blocks of ASDP hardware.
rationales, followed by elaborations on the critical design details.
3.1 Hardware Design and Channel Model
We hereby present hardware design details and acoustic
channel model. As shown in Fig. 1, ASDP adopts a modular
design that separates the baseband from the front-end. Its
hardware consists of three components, whose correspond-
ing function blocks are shown in Fig. 2. Before diving into
the details, we first explain the major design considerations.
3.1.1 Design Considerations
Choosing appropriate hardware modules for ASDP is of
significant importance since they affect the overall system
performance. In ASDP hardware design, we aim to meet
the following requirements:
Sensing: The ASDP hardware should be sufficient for acoustic sensing. Specifically, a 16-bit resolution, a minimum
48 kHz sampling rate (common standards for commodity
mobile devices), and at least two audio input channels
should be supported. Moreover, there are two optional yet
desirable requirements: i) turbocharging the receiver gain at
the inaudible frequency range (>16 kHz [23], [28]), and ii)
alleviating the notorious frequency selectivity problem [23].
Computation: ASDP should be powerful enough to handle
audio samples in a fluent manner. Consequently, compu-
tationally heavy operations such as frequently used cross-
correlation or FFT (Fast Fourier Transform) should only
introduce acceptable latency.
Reconfigurability: Reconfigurablity allows for hardware ex-
tensions, enabling more flexible setup for experimentation.
This likely implies a modular design that makes use of PnP
connections to wire different modules.
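As a rough sketch of why the computation requirement is tractable on a modest board, the frequently used cross-correlation can be computed in O(N log N) time via the FFT correlation theorem; this is a generic illustration of the technique, not the platform's actual implementation.

```python
import numpy as np

def xcorr_fft(x, template):
    """Cross-correlate a captured signal x with a known template via the
    FFT, far cheaper than the direct O(N*M) sliding computation."""
    n = len(x) + len(template) - 1
    nfft = 1 << (n - 1).bit_length()           # next power of two
    X = np.fft.rfft(x, nfft)
    T = np.fft.rfft(template, nfft)
    corr = np.fft.irfft(X * np.conj(T), nfft)  # correlation theorem
    return corr[: len(x)]
```

The index of the correlation peak then reveals where the template appears in the capture, which is the core of preamble detection and ranging.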
TABLE 1: Hardware components and parameters.
Name      Function                     Major parameters
WM8731    Codec                        Stereo, 96 kHz, 32-bit
TPS5420   Power management             5.5–36 V input, 3 A
INMP411   Microphone                   61 dBA, 60 Hz–22 kHz
PAM8406   Power amplifier              Stereo, 5 V, 5 W
MAX9814   AGC                          40–60 dB
LM431     Adjustable shunt regulator   50 ppm/°C
LM386     Amplifier                    20–200× gain
Fig. 3: ASDP hardware implementation: a snapshot of the
acoustic front-end and baseband.
3.1.2 Hardware Details
The aforementioned considerations lead to the hardware design in Fig. 2, where the devices and their corresponding parameters are listed in TABLE 1. We also provide examples of our actual hardware implementation in Fig. 3.
Acoustic Baseband: The baseband module manages all au-
dio signal flows. The core part of the baseband module is the
codec chip WM8731 [42]. It supports 16- to 32-bit resolution and a programmable sampling rate ranging from 8 kHz to 96 kHz, sufficient for most acoustic sensing applications.
The codec drives its stereo output to a PAM8406 low-power Class-D audio power amplifier. The stereo output is also routed to a 3.5 mm audio jack, which can be used to connect external speakers and power amplifiers. The inputs to the WM8731 codec are first amplified by an LM386 amplifier, whose default gain is set to 1. The purpose of placing the LM386 here is to provide proper reconfigurability, e.g., turbocharging the receiver gain. Another unit on the baseband module is the TPS5420, a DC-to-DC converter that supplies power.
The acoustic baseband interfaces with the control unit
via a 40-pin PnP socket. The audio inputs of the acoustic
baseband are connected to two 4-pin sockets, which are used
to hook up an acoustic front-end.
Acoustic Front-end: The acoustic front-end is used to convert sound waves into analog signals. The crucial part is the microphone sensor INMP411, which exhibits remarkable performance. In particular, the INMP411 has a flat frequency response in the audible frequency range (e.g., 100 Hz to 10 kHz) and a higher sensitivity in the inaudible range (above 16 kHz), as shown in Fig. 4 (red curve). The first property helps mitigate the frequency selectivity problem [24], while the second one offers higher channel gains in the inaudible range, thereby boosting performance. In addition, the second property can compensate for the high speaker-side attenuation in the inaudible range and hence render the frequency response at the receiver side almost flat over the entire frequency range (the green curve in Fig. 4).
For easy control and reconfiguration, we design two
different acoustic front-ends. One utilizes 2N3904 transistors
for amplification and hence bears a fixed gain. The other
exploits the MAX9814 AGC for 40–60 dB dynamic gain. On
the front-end with AGC, a logic switch links the acoustic
front-end through the baseband to the control unit, so as
to enable a programmable maximum gain. To stabilize the
power source and in turn to gain a better Signal-to-Noise
Ratio (SNR), we use LM431, a voltage reference chip, to
power up INMP411.
Control Unit: The control unit, i.e., the Raspberry Pi, is responsible for firing up the above acoustic modules and handling all signal processing tasks. The Raspberry Pi is a versatile platform for ubiquitous computing; it is available worldwide and enjoys extensive community support. In ASDP, we use a Raspberry Pi 3 Model B+ board [43] that contains a powerful 1.4 GHz 64-bit quad-core processor, a graphics processing unit (GPU), and many peripherals. Specifically, the core processor supports the Neon technology [44], a packed Single Instruction Multiple Data (SIMD) hardware structure that can be used for acceleration, in addition to the GPU. Though unlikely to be as competent as up-to-date smartphones, this control unit is sufficiently powerful for supporting our unified acoustic sensing platform.
ASDP does not seek to involve high-performance hardware units, such as a codec supporting a 192 kHz sampling rate or a control unit adopting an FPGA for sophisticated signal processing, but relies on economical and ubiquitous components that are both sufficient for acoustic sensing and easy to acquire. Currently, one control unit only supports one baseband that allows two acoustic front-end connections.
Though there are acoustic subsystems available on the
market [45], they do not provide flexible control and easy
reconfigurability, and most of them do not have on-board
AGC. These shortcomings make them even less appealing
than consumer-grade devices.
3.1.3 Channel Properties
We hereby briefly discuss the acoustic channel proper-
ties. After a process of transmission, propagation, and
finally reaching a receiver end, acoustic signals suffer
from channel distortions including frequency selectivity,
speaker diaphragm inertia, time-variant dynamic gains, se-
vere Doppler effect, etc. The above channel effects can be
characterized by the following equation:
H = (A_p + A_a) A_d A_r A_t A_g + σ_s,        (1)
Fig. 4: Frequency responses of the microphone.
Fig. 5: ASDP software framework. The red box contains two major components supporting user application developments.
where H denotes the channel effects. A_p, A_a, A_d, A_r, A_t, and A_g represent propagation loss, absorption loss, Doppler effect, frequency selectivity on the receiver side, frequency selectivity on the transmitter side, and the time-variant gain introduced by AGC, respectively. σ_s is the possible noise caused by transmitting specific signals. Among these channel effects, A_p = kP_0/r², where P_0 denotes the power density at zero distance, k is a coefficient, and r is the distance [46]. As a function of temperature T and frequency f, A_a = g(T, f) [46]. A_d = β(f, v) indicates that the Doppler effect introduces a frequency shift affected by frequency f and velocity v. The travel speed v of acoustic signals is a function of temperature T [47]: v = c + mT, where c = 331 m/s (at 0 °C) is a constant and m ≈ 0.606 is a coefficient. The above channel effects are the basis for our demonstrative applications.
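The terms above can be illustrated numerically as follows; the helper names are ours, and the Doppler expression is only a first-order approximation rather than the full β(f, v) model.

```python
def sound_speed(temp_c):
    """v = c + m*T with c = 331 m/s (at 0 degC) and m ~ 0.606 [47]."""
    return 331.0 + 0.606 * temp_c

def propagation_loss(r, p0=1.0, k=1.0):
    """A_p = k * P0 / r^2: inverse-square spreading loss [46]."""
    return k * p0 / r ** 2

def doppler_shift(f, v_rel, temp_c=20.0):
    """First-order frequency shift of a tone at f Hz for a relative
    radial speed v_rel (m/s); positive v_rel means approaching."""
    return f * v_rel / sound_speed(temp_c)
```

For example, at 20 °C a 20 kHz tone shifts by roughly 58 Hz per 1 m/s of relative motion, which explains why even slow gestures are observable in the inaudible band.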
3.2 Software Design
We discuss software design in a similar vein as we present
the hardware design.
3.2.1 Design Considerations
Considerations for ASDP software design are taken primar-
ily towards facilitating fast prototyping.
Ease of Configuration: It is desirable that a platform provides high-level configuration interfaces, especially for low-layer hardware settings that are cumbersome to access on most existing platforms. Both a Graphical User Interface (GUI) and readable scripts are feasible means to serve this purpose.
Compatibility: The framework should not be prone to
compatibility issues when the host environment undergoes
changes. This consideration places prerequisites on the
choice of host environments, drivers, and dependencies,
etc. In addition, it is plausible to maintain compatibility
with existing software utilities, enriching the choice for
developments as well as attracting more developers for
improvements and maintenance.
Balanced Usability and Programmability: As a general framework, it should not only offer adequate usability for novices to readily grasp, but also maintain sufficient programmability for experienced users to conduct efficient development.
3.2.2 Software Framework
Motivated by the above design considerations, we hereby
present our ASDP software framework. As shown in Fig. 5,
this framework consists of four layers: driver, hardware
abstraction, middleware, and user space.
Driver Layer: This layer involves a specific device driver to
fire up the codec chip WM8731. Since this driver is available
in kernel sources [48], we simply build a script to configure
it. The driver interfaces with the wrapper module in the
hardware abstraction layer. Also, it communicates with shell
scripts in the middleware via existing utilities including
aplay and arecord [49], [50].
Hardware Abstraction Layer: This layer has a wrapper module to interface with the upper layers. It loads a readable JavaScript Object Notation (JSON) formatted configuration script from user space to configure the audio hardware. This configuration script can specify the properties of the audio hardware, including sampling rate, number of channels, bit resolution, etc.
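For illustration, such a configuration script might look like the snippet below; the field names are hypothetical, as the actual keys are defined by ASDP's release.

```python
import json

# Hypothetical JSON configuration in the spirit of the one the wrapper
# module loads (the actual key names used by ASDP may differ).
config_text = """
{
    "sample_rate": 48000,
    "channels": 2,
    "bit_depth": 16,
    "agc": {"enabled": true, "max_gain_db": 40}
}
"""

config = json.loads(config_text)
print(config["sample_rate"], config["channels"], config["bit_depth"])
```

Because the script is plain JSON, both the GUI and advanced users editing it by hand manipulate the same human-readable representation.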
Middleware: This layer contains three major components, namely shell scripts, communication modules, and a numerical library (see Fig. 5). The shell scripts serve as communication channels to dispatch commands from the GUI to the hardware. They allow for playing and recording audio profiles by invoking existing utilities such as aplay and arecord through system calls. In this way, the shell scripts help enable acoustic sensing in an offline manner, as well as debug the hardware.
The communication modules in the middleware offer
three interfaces to bridge the gaps between different pro-
gramming languages or to delegate the processing tasks
to other hosts. First, one can utilize the socket to interface
with other target hosts or programming languages. In par-
ticular, a client-side MATLAB script, either running locally
or remotely, together with a local server-side socket script
are provided, assisting the visualization of the processing
results in real-time. Since the above socket based scripts
accomplish the most cumbersome task of extracting raw
audio samples, one can focus on algorithm designs and
save effort on platform details. Second, we build a hybrid C and Java programming environment via the Java Native Interface (establishing another channel for the GUI to communicate with the hardware, as shown in Fig. 5), emulating Android and assisting in debugging Native Development Kit APIs (infeasible to achieve on Android hosts). Third, we
build a FIFO (First In First Out) data buffer to retrieve (push)
audio samples from (to) the hardware. To access this FIFO
buffer, one only needs to include certain C header files.
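A minimal sketch of the socket path, assuming raw little-endian 16-bit PCM framing (the actual wire format of ASDP's server-side script may differ):

```python
import socket
import struct

def read_samples(sock, n_samples, bit_depth=16):
    """Block until n_samples little-endian PCM samples arrive on a
    connected socket and return them as a tuple of ints."""
    n_bytes = n_samples * bit_depth // 8
    buf = b""
    while len(buf) < n_bytes:
        chunk = sock.recv(n_bytes - len(buf))
        if not chunk:
            raise ConnectionError("audio stream closed")
        buf += chunk
    return struct.unpack("<%dh" % n_samples, buf)
```

A client (the provided MATLAB script, or Python as here) connects to the server-side socket and repeatedly calls such a reader, then runs its sensing algorithm on the returned frames; the cumbersome raw-sample extraction stays hidden behind the socket.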
The numerical library contains utility functions for fast
prototyping. It has basic signal processing methods such as
i) waveform reshaping to mitigate audible noise when trans-
mitting signals in inaudible frequency range, ii) preamble
detection algorithms crucial for most acoustic sensing appli-
cations, and iii) robust decoding strategies. Specifically, we
propose a new preamble detection algorithm to effectively
handle the near-far effect, hardware heterogeneity, and mul-
tipath problem (we refer to Sec. 5.3 for further details). In ad-
dition, the library also includes efficient numerical utilities
built upon two special hardware architectures, namely Neon
and GPU, to speed up the program. Both the communication
modules and numerical library serve as major components
to support user application developments.
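As an example of the first utility, waveform reshaping can be as simple as applying raised-cosine fade-in/out ramps so that abrupt onsets of inaudible-band signals do not leak audible clicks; this is a generic sketch of the idea, not necessarily the library's exact method.

```python
import numpy as np

def reshape_waveform(sig, ramp_s, fs=48000):
    """Apply raised-cosine fade-in and fade-out ramps of ramp_s seconds
    to suppress the audible click caused by abrupt signal onsets."""
    n = int(ramp_s * fs)
    win = 0.5 * (1 - np.cos(np.pi * np.arange(n) / n))  # rises 0 -> ~1
    out = np.asarray(sig, dtype=float).copy()
    out[:n] *= win          # fade in
    out[-n:] *= win[::-1]   # fade out
    return out
```

A ramp of a few milliseconds is typically enough to push the onset's spectral splatter out of the audible band while leaving the payload untouched.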
User Space: In the user space, ASDP offers a user-friendly
Java-based GUI and a readable JSON-format script as shown
in Fig. 5. The GUI offers various ways for acoustic sensing
developments. For instance, it allows recording and playing
audio signals via shell scripts by invoking arecord and
aplay, respectively, for offline processing. This can also be
achieved by running audacity, a compatible professional
audio utility. In addition, it provides a function generator
for modulation and an oscilloscope for display. Specifically,
it can generate popular signals such as pure tone or chirp
signals, and it can also visualize the FFT results or spec-
trogram of received signals in real-time. These functions
provided by the GUI can help debug the ASDP hardware
and also serve as good utilities for beginners. Moreover,
the GUI can read/write the JSON-format script that offers
another way for advanced developers to easily access and
configure the hardware. It should be noted that the offline processing method cannot be readily enabled by running available applications on commodity smartphones: those applications mostly have “filters” inside, introducing equalization effects [51] that further aggravate the frequency selectivity problem, rendering them unsuitable for acoustic sensing development.
3.2.3 Preamble Detection as an Example
As an important instance of the signal processing functions,
our novel preamble detection algorithm is hereby elabo-
rated. Preamble detection is a crucial building block for
many acoustic sensing applications such as synchroniza-
tion [19], [24] and packet detection [4], [3]. It often involves
determining both the presence of a particular signal and
the exact time when the signal appears. However, reliable
preamble detection is very challenging due to hardware
heterogeneity and the near-far problem [52].
To explain the challenges behind reliable preamble detec-
tion, we start with basic knowledge of common approaches.
Normally, existing approaches correlate the captured sam-
ples with a known reference template (often a chirp signal)
and indicate the presence of the reference signal if the
maximum correlation coefficient exceeds a preset thresh-
old [53]. However, to set an appropriate threshold is non-
trivial because acoustic channels are dynamic as explained
in Eqn. (1). More specifically, to perform preamble detection
at different transceiver distances, if the threshold were set
(a) Common method. (b) Our novel method.
Fig. 6: The proposed preamble detection algorithm can mitigate the multipath problem. To detect the time of a signal's appearance, the index of the maximum peak in the correlation results is commonly used. Without multipath effect, this maximum peak, caused by Line-of-Sight (LOS) signals, should appear first. However, in a multipath-rich environment, multiple Non-Line-of-Sight (NLOS) signals can add up constructively, leading to stronger peaks that come later as shown in (a), thereby degrading detection performance. To remedy this, many advanced processing techniques [20], [21] are needed. Nonetheless, after normalizing within a proper sliding window, the maximum peak index again becomes usable, as shown in (b).
too high, the preamble transmitted by a distant user would
be missed due to large propagation loss; otherwise a nearby
interference could be falsely identified. The hardware heterogeneity, resulting in diverse gains, could further compromise the detection performance in a similar fashion.
Our reliable preamble detection algorithm is based on
a simple yet effective normalization method; it differs from
common correlation coefficient computations (e.g., [53]) that
suffer from a smoothing effect. Taking the correlation coefficients {c(k)}_{k=1,···} resulting from common approaches, we further divide each coefficient by the average of its w preceding ones (w is empirically set to 200):

c′(k) = c(k) / ( (1/w) · Σ_{i=k−w}^{k−1} c(i) ).
This sliding-window based normalization (hereafter, we de-
note it by normalization for brevity) operation can effectively
handle the near-far problem and hardware heterogeneity,
while adequately mitigating the multipath effect. As demon-
strated in Fig. 6, our algorithm substantially enhances LOS
signals and largely suppresses NLOS ones, making the
preamble detection more efficient and robust against chal-
lenging multipath scenarios.
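The normalization step can be sketched as follows; this is a minimal NumPy sketch with a toy correlation profile, where the helper name and test signal are illustrative, not the ASDP implementation:

```python
import numpy as np

def normalize_correlation(c, w=200):
    """Sliding-window normalization of correlation coefficients.

    Each coefficient c[k] is divided by the mean of its (up to) w
    preceding coefficients, suppressing slow gain variations caused
    by the near-far problem and hardware heterogeneity.
    """
    c = np.asarray(c, dtype=float)
    out = np.zeros_like(c)
    for k in range(1, len(c)):
        lo = max(0, k - w)
        out[k] = c[k] / (np.mean(c[lo:k]) + 1e-12)  # avoid divide-by-zero
    return out

# Toy demo: a weak early (LOS-like) peak followed by a moderately
# stronger late (NLOS-like) peak on top of a noise floor.
rng = np.random.default_rng(0)
c = np.abs(rng.normal(0.0, 0.005, 1000))
c[400] = 0.50   # LOS peak
c[420] = 0.55   # stronger NLOS peak, which fools a plain argmax
n = normalize_correlation(c)
```

On this toy profile, the plain maximum sits on the later NLOS peak, while after normalization the earlier LOS peak dominates, because the NLOS peak's denominator is inflated by the LOS energy inside its window.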
Remark: In the design of above software framework, we
use plain APIs to construct high-level user interfaces as
far as possible, so as to reduce the possible conflicts when
using different versions of programming language. This also
makes our software framework generally adaptable to any
acoustic subsystem running on Pi, except for some minor
changes on the device driver. To verify the aforementioned compatibility of our software framework, we have specifically tested our MATLAB toolkits from MATLAB-2016 up to MATLAB-2019; the results have indicated no sign of compatibility issues.
4 Applications

For demonstrative purposes, we discuss five applications
built upon our ASDP platform. First, we present an interac-
tive sensing application (gesture recognition in particular)
to demonstrate the efficiency in fast prototyping. Second,
we showcase the way to evaluate aerial acoustic communi-
cation systems in the offline mode by using existing com-
patible utilities. Third, we report a challenging indoor local-
ization project to prove the ability in real-time processing.
Fourth, we substantiate the controllability by a novel ther-
mal sensing application. Finally, we exhibit reconfigurability
with an ultrasonic sensing application. These applications
are purposefully chosen to be relatively preliminary for
easy exposition within page limit, but ASDP is capable of
emulating all state-of-the-art developments (e.g., [1], [7], [2],
[21], [10], [11], [22], [9], [23], [24], [25], [26], [4], [3], [27], [30]).
To obtain the SNR in our experiment settings, we first record the transmitted signals in a quiet place (less than 35 dB) as ground-truth clean signals. We then record noisy signals in our test environment (instantaneous noise could be up to 68 dB). Given these measurements, the measured SNR is around 10–30 dB depending on the type of transmitted signals. The transmit power is around 0.16 W (unless otherwise noted) and the maximum value is 5 W.
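The SNR computation implied above can be sketched as follows; this is an illustrative Python sketch in which the alignment of the clean and noisy recordings is an assumption:

```python
import numpy as np

def estimate_snr_db(clean, noisy):
    """Estimate SNR (dB) from a clean reference recording and a noisy
    one, assuming both are time-aligned and of equal length, so that
    noise = noisy - clean."""
    clean = np.asarray(clean, dtype=float)
    noise = np.asarray(noisy, dtype=float) - clean
    p_sig = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2) + 1e-20  # guard against division by zero
    return 10.0 * np.log10(p_sig / p_noise)

# Toy demo: a 1 kHz tone with additive white noise at a known power ratio.
fs = 48000
t = np.arange(fs) / fs
sig = np.sin(2 * np.pi * 1000 * t)
rng = np.random.default_rng(1)
snr = estimate_snr_db(sig, sig + rng.normal(0.0, 0.05, fs))
```

For the tone above (signal power 0.5, noise power 0.0025), the estimate lands near 23 dB, within the 10–30 dB range reported for our settings.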
4.1 Interactive Sensing
In this section, we demonstrate an online gesture recog-
nition application [54]; it leverages the Doppler effect ex-
plained in Eqn. (1) to identify hand gestures. The intuition
is that hand gestures cause perceivable Doppler shifts in the
resulting acoustic reflections. Thus, by sensing the received
spectrum, one can decipher the performed gestures. Moreover, more complex gestures such as double taps can be recognized by composing the above primitives.
We hereby present the basic routines to implement the
above application, which can serve as guidelines to develop
similar online acoustic sensing applications. The steps are
listed as follows:
i Boot up the Pi and connect it to a local network.
ii Open the following MATLAB script.
iii Add “plot(abs(fft(data, 48000)))” in the TODO area.
iv Use the Java GUI to generate an 18 kHz pure tone
signal and choose the online streaming mode.
v Run the MATLAB script on any local machine.
initial_work; % preparation
while 1
    % read 2048 ’int16’ samples
    data = data_rdy(2048, ’int16’);
    plot(abs(fft(data, 48000))); % TODO
end
After the above steps, one can visualize the FFT results of
captured signals on a GUI window and observe a frequency
peak at 18 kHz. If any gestures are performed nearby the
Fig. 7: Gesture recognition demonstration. Different per-
formed gestures lead to distinct spectrum profiles.
speaker, the FFT results would exhibit distinctive features.
Fig. 7 shows the snapshots of the FFT results corresponding to different gestures: they indeed result in distinct FFT profiles.
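The spectrum inspection in step iii can be extended into a minimal Doppler-shift estimator; the following Python sketch is illustrative, with the pilot frequency, frame size, and search window all being assumptions:

```python
import numpy as np

FS = 48000        # sampling rate (Hz)
PILOT = 18000     # transmitted pure-tone frequency (Hz)
N = 4096          # frame / FFT size (assumed), ~11.7 Hz bin resolution

def doppler_shift_hz(frame, fs=FS, pilot=PILOT):
    """Offset (Hz) between the strongest spectral peak near the pilot
    tone and the pilot itself; motion toward the microphone yields a
    positive shift."""
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    band = (freqs > pilot - 500) & (freqs < pilot + 500)  # search window
    return freqs[band][np.argmax(spec[band])] - pilot

# Toy demo: a reflection shifted by about +100 Hz, as a hand moving
# toward the microphone might produce.
t = np.arange(N) / FS
frame = np.sin(2 * np.pi * (PILOT + 100) * t)
shift = doppler_shift_hz(frame)
```

The recovered shift is accurate only up to the FFT bin width here; a real recognizer would track the sign and magnitude of such shifts over consecutive frames to classify gestures.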
To summarize, when deploying an online acoustic sens-
ing application, one can directly access raw audio samples
and hence focus solely on algorithm designs. In compari-
son, on popular Android platforms, one has to deal with
two basic audio management classes: AudioTrack and Au-
dioRecord. Besides, it is imperative to deeply understand
the Android framework, multi-thread programming, data
sharing mechanisms, widgets, etc. Such a tedious learning procedure can be totally eliminated by adopting our ASDP platform.
4.2 Aerial Acoustic Communication
This section showcases how to evaluate aerial acoustic com-
munication technologies. A typical aerial acoustic commu-
nication system involves a transmitter and a receiver. On the
transmitter side, we use MATLAB APIs to generate packets
in WAV format and play them by an existing media player
such as mplayer, emulating signal transmissions. On the
receiver side, we capture the packets using audacity and
demodulate them offline using, for example, our APIs. The
transmission power in this experiment is around 0.62W.
In this example, we present a commonly used modula-
tion technique called Chirp Spectrum Spread (CSS) [19], [55]
or Binary Orthogonal Keying (BOK) [56], [57] as a particular
implementation. It leverages the modulation coefficients of
noise-resilient chirp signals to represent information. An ex-
ample CSS frame is shown in Fig. 8, which can be generated
by exploiting our MATLAB APIs as below:
clc; clear all;
parameters;  % utilities and configuration parameters
css = [preamble_up, guard_interval, ...
       symbol_up, guard_interval, ...    % bit ‘0’
       symbol_down, guard_interval, ...];% bit ‘1’
audiowrite(’transmit.wav’, css, fs);
In the above code snippet, the “parameters” in the
second line provides a collection of useful utilities and con-
figuration parameters to compose commonly used signals
such as chirp signals. The third line generates a desired
packet by piecing available signal segments together. The
last line converts the packet into a local audio file. In the
end, the packet can be transmitted by playing the audio
Fig. 8: An example CSS frame, where slopes of line segments
refer to the corresponding modulation coefficients.
file. At the receiver end, we use audacity to capture the
packets, store them into local files, and demodulate them
with our APIs. These codes can be readily adapted for online processing with minor modifications, as ASDP supports multiple programming languages. In fact, the MATLAB compiler can produce code with significantly better run-time performance. Such flexible migration from offline to online processing is infeasible on consumer-grade devices, yet this process is indeed crucial for evaluating acoustic sensing systems.
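The BOK/CSS encode-decode round trip can be sketched as follows; this is an illustrative Python sketch with assumed band, symbol duration, and channel, not the ASDP MATLAB APIs:

```python
import numpy as np

FS = 48000                          # sampling rate (Hz), assumed
SYM = 0.04                          # symbol duration (s), assumed
t = np.arange(int(FS * SYM)) / FS

def chirp(f0, f1):
    """Linear chirp sweeping from f0 to f1 over one symbol."""
    k = (f1 - f0) / SYM
    return np.sin(2 * np.pi * (f0 * t + 0.5 * k * t ** 2))

up = chirp(2000, 4000)              # bit '1': up-chirp
down = chirp(4000, 2000)            # bit '0': down-chirp

def modulate(bits):
    return np.concatenate([up if b else down for b in bits])

def demodulate(samples):
    """BOK decision: per symbol, pick the template with the stronger
    zero-lag correlation."""
    bits = []
    for i in range(0, len(samples), len(t)):
        sym = samples[i:i + len(t)]
        bits.append(1 if abs(np.dot(sym, up)) > abs(np.dot(sym, down)) else 0)
    return bits

bits = [1, 0, 1, 1, 0]
tx = modulate(bits)
rng = np.random.default_rng(2)
rx = 0.2 * tx + rng.normal(0.0, 0.1, len(tx))   # attenuated, noisy channel
recovered = demodulate(rx)
```

Because up- and down-chirps over the same band are quasi-orthogonal, the per-symbol decision survives heavy attenuation and noise, which is why CSS sustains a low BER at long distances in Table 2.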
For comparison, we have tested other modulation tech-
niques including OCDM (Orthogonal Chirp Division Mul-
tiplexing [58]) and OFDM using the offline mode. These
two techniques are more bandwidth efficient and hence
achieve higher transmission rates, yet sacrifice transmission
ranges due to higher Bit Error Rates (BERs). Table 2 shows
a performance comparison of these modulation techniques,
where CSS significantly outperforms OFDM and OCDM in
BER. The reason behind the high BERs of OFDM and OCDM is that the aerial acoustic channel exhibits high dynamics, as shown in Eqn. (1).
TABLE 2: BERs of different modulations at various distances

BER (%)   <0.2 m   5 m    10 m    15 m    20 m
OCDM      0        3.7    26.17   36.42   49.8
OFDM      0.025    7.7    13.13   24.05   32.6
CSS       0        0      0       0       0
4.3 Context-aware Sensing
In this section, we present a challenging localization project
to demonstrate the ability of ASDP in handling complex
tasks as a standalone application in real-time.1 For the sake of efficiency, the code in this project is developed in the C language. The transmission power is around 0.73 W.
This project can be deemed as an acoustic implemen-
tation of a Two-Way Ranging based localization technique
described in Decawave user manuals [59]. To account for
the special acoustic properties, we modify the encoding
mechanism to BOK. In this localization system, at least four acoustic anchors are deployed at several places of interest. They are scheduled by a remote central server and take turns to transmit beacon messages in a loose time-division manner. The beacon messages contain several components:
1. It should be noted that the Raspbian OS is not a truly real-
time OS; it has latency in scheduling multiple tasks. The real-
time concept here means that the processing latency is smaller
than a scheduling period in a specific task.
a preamble (a chirp signal with longer duration compared
to symbols) to extract TDoA information, 2-bit symbols
to encode anchor IDs, and 2-bit symbols to label beacon
sequences. The format of the beacon message is similar to
that in Fig. 8. A target, as well as all the anchor nodes,
decode the information contained in the beacon messages
and upload them to a server. Upon collecting enough beacon
messages, the server computes the location of the target.
The project requires anchor nodes to tackle multiple time-critical tasks, accomplished by the following five concurrent threads:
Recording: It fills a ring buffer with recorded samples; once the buffer is full, it moves the content to the FIFO.
Playing: Upon receiving a valid schedule command, this thread generates and transmits specific beacon messages.
Broadcast receiver: It creates a UDP socket to monitor incoming messages. Upon receiving activation commands, it signals the playing thread to transmit specific beacon messages.
Decoding: It extracts timestamps, anchor IDs, and sequence numbers from received beacon messages.
Uploading: This thread uploads decoded results from the decoding thread.
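The producer-consumer hand-off between the recording and decoding threads can be sketched in Python as a simplified stand-in for the project's C implementation; thread bodies and block sizes are illustrative:

```python
import queue
import threading

# Recorded blocks flow from the recording thread to the decoding thread
# through a bounded FIFO, mirroring the anchor-node pipeline.
fifo = queue.Queue(maxsize=8)
decoded = []

def recording(n_blocks, block_size=2048):
    """Stand-in for the recording thread: produce fixed-size sample blocks."""
    for i in range(n_blocks):
        block = [i] * block_size          # placeholder for audio samples
        fifo.put(block)                   # blocks when the FIFO is full
    fifo.put(None)                        # sentinel: no more data

def decoding():
    """Stand-in for the decoding thread: consume and 'decode' blocks."""
    while True:
        block = fifo.get()
        if block is None:
            break
        decoded.append(block[0])          # real code would correlate here

rec = threading.Thread(target=recording, args=(5,))
dec = threading.Thread(target=decoding)
rec.start(); dec.start()
rec.join(); dec.join()
```

The bounded queue provides back-pressure: if decoding ever falls behind, the recording side blocks instead of silently dropping samples.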
Among these threads, the most computationally intensive
one is the decoding thread that performs correlation and
extensive floating-point multiplication. To speed them up,
we adopt efficient implementations such as computing cor-
relation via FFT and multiplication, while exploiting GPU
and Neon for hardware acceleration. To ease configuration efforts, we use the JSON script to configure runtime parameters such as the preamble detection threshold.
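A hypothetical runtime-parameter script might look as follows; the schema shown is an assumption for illustration, not the actual ASDP JSON format:

```python
import json

# Hypothetical runtime parameters as they might appear in the JSON
# script; field names and values are illustrative assumptions.
script = '''
{
    "sample_rate": 48000,
    "preamble": {"f0": 18000, "f1": 22000, "duration_ms": 40},
    "detection_threshold": 3.5
}
'''
cfg = json.loads(script)
threshold = cfg["detection_threshold"]   # e.g., the preamble detection threshold
```

Keeping such parameters in a human-readable script lets one retune a deployed anchor without recompiling the C code.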
We deploy 4 anchors in an 8 × 12 m² indoor scenario and run the above localization project. We use a Leica DISTO S910 [60], a laser distance meter with 1 mm resolution, to measure the ground truth; the performance evaluation results are shown in Fig. 9. It can be observed that the average ranging error is below 2 cm, and 80th-percentile localization errors are within 20 cm. The localization errors are obtained without accounting for the board size and the temperature effect on the propagation speed explained in Sec. 3.1.3. Therefore, we believe that the actual localization performance given a more careful implementation could be much better. When we check the workload at runtime, the Pi shows only 13% CPU usage, and it takes about 100 ms
(a) Ranging errors. (b) Localization errors.
Fig. 9: Performance of acoustic localization indicated by
Cumulative Distribution Functions (CDFs) of (a) ranging
errors between a pair of devices and (b) the localization
errors for a target device.
Fig. 10: Testbed setup for thermal sensing.
to decode all the content in a beacon message, which is fast
enough to support real-time tracking.
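Once enough ranges are available, the server-side position fix can be sketched as a standard Gauss-Newton multilateration; this is an illustrative Python sketch, not the project's C code, and the anchor layout and noise level are assumptions:

```python
import numpy as np

def localize(anchors, ranges, iters=20):
    """Gauss-Newton multilateration: estimate the 2-D position whose
    distances to the anchors best match the measured ranges."""
    anchors = np.asarray(anchors, dtype=float)
    ranges = np.asarray(ranges, dtype=float)
    p = anchors.mean(axis=0)              # start from the anchor centroid
    for _ in range(iters):
        d = np.linalg.norm(anchors - p, axis=1)
        J = (p - anchors) / d[:, None]    # Jacobian of distances w.r.t. p
        r = d - ranges                    # residuals
        step, *_ = np.linalg.lstsq(J, r, rcond=None)
        p = p - step
    return p

# Toy demo: four anchors at the corners of an 8 x 12 m room (as in the
# deployment above) and ranges to a known target with ~2 cm noise.
anchors = [(0, 0), (8, 0), (0, 12), (8, 12)]
target = np.array([3.0, 5.0])
rng = np.random.default_rng(3)
ranges = [np.linalg.norm(np.array(a) - target) + rng.normal(0, 0.02)
          for a in anchors]
est = localize(anchors, ranges)
```

With four well-spread anchors and centimeter-level ranging noise, the fix converges to within a few centimeters of the true position, consistent with the errors reported in Fig. 9.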
4.4 Thermal Sensing
This section demonstrates the controllability of ASDP plat-
form via a novel acoustic sensing application, namely mea-
suring ambient air temperature. In meteorology, there have
been many similar proposals leveraging acoustic signals
to measure ambient air temperature or wind speed [47],
[61], [62]. These technologies exploit the relationship be-
tween sound speed and temperature explained in Sec. 3.1.3.
However, these solutions often target high-end devices and
require either high sampling rates (>48 kHz) or ultrasonic
signals, which are infeasible on commodity devices. To circumvent these limitations, we employ the chirp mixing approach [23], [24] to derive a relationship between frequency peaks and temperatures as T = αd/(mf) − c, where α denotes the modulation coefficient, d is the distance difference from the speaker to the two microphones, f refers to the value of the peak frequency, and m and c are constants derived from the linear relation between sound speed and temperature.
To perform thermal sensing on ASDP, we need two
recording channels. Also, disabling AGC is crucial, as AGC introduces dynamic channel gains, leading to a more severe frequency selectivity problem and thus unstable readings [24]. Meeting these requirements on commodity mobile devices is almost infeasible, as it requires rebuilding the entire OS image. In contrast, ASDP only requires one to appropriately set up two transistor-based acoustic front-ends for sensing. After that, one can resort to the ASDP software to gather data for analysis.
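Assuming the common linear sound-speed model c = 331.3 + 0.606·T (m/s, T in °C), the inversion from peak frequency to temperature can be sketched as follows; the chirp slope and microphone spacing below are illustrative assumptions, not ASDP's calibrated values:

```python
# Hypothetical parameter values for illustration; only the structure of
# the computation follows the text.
alpha = 4000.0 / 0.04   # chirp slope (Hz/s): a 4 kHz sweep over 40 ms (assumed)
d = 0.30                # distance difference between the two microphones (m, assumed)

def temperature_c(f_peak):
    """Invert the chirp-mixing relation: the mixed peak frequency is
    f_peak = alpha * (d / c), so c = alpha * d / f_peak, and the linear
    model c = 331.3 + 0.606 * T then yields T."""
    c = alpha * d / f_peak
    return (c - 331.3) / 0.606

# Round-trip check: at 20 C the model predicts c = 343.42 m/s.
f = alpha * d / (331.3 + 0.606 * 20.0)
t_est = temperature_c(f)
```

The round trip recovers the assumed 20 °C exactly; in practice the accuracy hinges on how precisely the peak frequency f can be located, which is why stable (AGC-free) gains matter.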
(a) Model verification.
(b) Sensing errors.
Fig. 11: Thermal sensing results: (a) The relationship be-
tween frequency peaks and temperatures from measure-
ments are highly consistent with analytical ground truth.
(b) CDF of the temperature measurement errors.
On our ASDP hardware platform, the two 4-pin interfaces on the baseband for plugging acoustic front-ends are in close proximity, which limits the achievable resolution. Therefore, we build a hub to increase their distance; the setup is shown in Fig. 10. To measure ground-truth temperatures, we use a dedicated temperature module that achieves 0.05 °C accuracy. For comparison, we also implement the sensing algorithm on a Samsung S5 smartphone. Fig. 11 depicts the results. It can be observed that the measurement errors achieved by ASDP are smaller than those by the smartphone. ASDP's better performance is attributed to a milder frequency selectivity problem and the absence of AGC distortions. To summarize, this project clearly demonstrates ASDP's superior performance, adequate controllability in AGC manipulation, and salient extensibility in flexible design.
4.5 Ultrasonic Sensing
This section demonstrates how to reconfigure ASDP for
ultrasonic sensing, an infeasible task for commodity devices.
We have developed an ultrasonic Frequency Modulated
Continuous Wave (FMCW) radar [63] using ASDP and the
corresponding hardware diagram is shown in Fig. 12. In this
FMCW radar, the acoustic front-ends are replaced by three ultrasonic sensors: two of them are transmitters, attached to the stereo output channels directly, and the remaining one serves as the receiver, connected to the baseband input. Additionally, the gain of the on-board LM386 on the baseband is adjusted to increase the operation range. After that, transmitter A is placed close to the receiver for synchronization and transmitter B is used for ranging. Note that the above process only requires wiring sensors to I/Os; no hardware modification is needed.
Fig. 12: Diagram for ultrasonic sensing.
To use the above setup for ultrasonic ranging, we take the following steps. First, we use our MATLAB APIs to synthesize an audio file containing FMCW signals. The FMCW signal has a bandwidth of B = 8 kHz, starting from f0 = 38 kHz, with a duration of T = 1 s and a sampling rate of 96 kHz, achieving a ranging resolution of Δd = c/(2B) ≈ 2 cm (which can be improved by involving phase). Second, we initiate the radar by playing the audio file and recording the radar signals, using mplayer and audacity, respectively. Finally, we analyze the mixed spectrum from recorded radar profiles to inspect range data.
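Under the stated parameters, the range recovery from the mixed spectrum can be sketched as follows; this is a simulated Python sketch in which the speed of sound and the idealized mixing are simplifying assumptions:

```python
import numpy as np

# FMCW parameters from the text: B = 8 kHz sweep starting at 38 kHz,
# T = 1 s duration, fs = 96 kHz; the speed of sound is assumed for ~20 C.
FS, B, F0, T = 96000, 8000.0, 38000.0, 1.0
C = 343.0

def beat_to_range(f_beat):
    """Round-trip FMCW relation: R = c * f_beat * T / (2 * B)."""
    return C * f_beat * T / (2.0 * B)

# Simulate the transmit chirp and an echo from an object at 95 cm,
# then mix (multiply) them to expose the beat tone at f_beat = k * tau.
t = np.arange(int(FS * T)) / FS
k = B / T                                 # sweep rate (Hz/s)
tx = np.cos(2 * np.pi * (F0 * t + 0.5 * k * t ** 2))
tau = 2 * 0.95 / C                        # round-trip delay of the object
rx = np.cos(2 * np.pi * (F0 * (t - tau) + 0.5 * k * (t - tau) ** 2))
mixed = tx * rx
spec = np.abs(np.fft.rfft(mixed))
spec[0] = 0.0                             # ignore the DC term
f_beat = np.fft.rfftfreq(len(t), 1.0 / FS)[np.argmax(spec)]
est_range = beat_to_range(f_beat)
```

With T = 1 s the beat spectrum has 1 Hz resolution, so the estimate lands within the ~2 cm resolution bound around the true 95 cm.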
Fig. 13 depicts the sensing results. When we place an object at 95 cm, one can clearly identify its existence through the spectrum of the mixed signals, as shown in Fig. 13(a). We have also conducted extensive experiments to verify the ranging accuracy in Fig. 13(b). It can be observed that the average ranging error is around 2 cm.
Remark: The above demonstrative applications successfully
show ASDP’s efficiency in developing acoustic sensing
(a) Mixed spectrum. (b) Ranging errors.
Fig. 13: Ultrasonic sensing results. (a) Mixed spectrum of the sensing results in ultrasonic ranging. An object at 95 cm distance generates a frequency spike at 93.7 cm (not counting the 1–2 cm transducer size) in the spectrum. (b) CDF of absolute ultrasonic ranging errors.
applications. Also, they confirm ASDP’s better flexibility
in controlling and reconfiguring hardware than consumer-
grade devices. We envision that more interesting applica-
tions could be developed using ASDP.
5 Evaluation

In this section, we report a thorough evaluation of several
components of ASDP. We first present some measurement
studies on its acoustic hardware to verify the sensing
performance. Then, we report the overhead of common
computation tasks including FFT, correlation, and floating-
point multiplication, demonstrating ASDP’s competent pro-
cessing capability. Finally, we showcase a representative
software implementation to exhibit its salient features in
preamble detection.
5.1 Acoustic Hardware Measurements
In this section, we conduct measurements on the acoustic hardware. We first verify the two designs of the acoustic front-ends. Specifically, we check the fixed gain of the one using a transistor-based circuit and inspect the dynamic behavior of the one regulated by AGC. In the first experiment, to obtain the fixed gain of the transistor-based acoustic front-end, we put a smartphone playing pure tone signals in close proximity, and use a Tektronix TDS 2024C oscilloscope to probe the input and output of the amplifying circuit. The waveforms of the input and output are shown in Fig. 14(a), demonstrating a gain of 55 (equivalent to 35 dB), which is sufficient for most sensing applications. In a separate experiment (not reported here for brevity), the front-end with AGC almost reached the maximum gain stated in its specification.
We carry out another experiment to further investigate
the behavior of the acoustic front-end with AGC, with
the same setup as the previous experiment except that
the transmitted sound intensity increases linearly and we
program the maximum AGC gain to three different values.
We capture the transmitted signals in a fixed location and
quantify the received signal intensity by its amplitude.
Fig. 14 (b) shows the received signal intensity versus the
transmitted one. Evidently, AGC governs the control as the
received signal intensity is dynamic, as opposed to the linear
increase in the transmitted one. These measurements also
confirm that the AGC-based acoustic front-end indeed has
its maximum gain tunable through program instructions.
In addition, we conduct measurements to check the
frequency response of the sound recording and playback
system thoroughly, so as to get a better understanding of the
frequency selectivity problem. In the experiment to inspect
the frequency response of the sound recording subsystem,
we remove the microphone sensor and use a RIGOL DG4162 function generator to excite the recording channel with a sweeping chirp signal in the 0.1–23 kHz frequency range. Then, we run a recording application to measure
the corresponding frequency response. Likewise, we take
the speakers away from the playback subsystem, play the
sweeping signal, and probe its output using an oscilloscope.
The results are shown in Fig. 14(c). It is observable that the frequency responses of the acoustic hardware without transducers (i.e., speakers and microphones in our case) are flat, indicating that acoustic transducers are the major cause of the frequency selectivity problem, which is equally true for commodity smartphones.
As a follow-up of the previous experiments, we inves-
tigate whether the notorious frequency selectivity problem
is mitigated and the receiver gain of the inaudible range is
turbocharged on ASDP. Due to the lack of dedicated acoustic
instruments, we perform a self-recording experiment on our
ASDP platform and exploit the frequency response to verify
the above issues. We use a chirp signal with a frequency range of 50 Hz–24 kHz and a duration of 5 s. We play it
via an external speaker that has a relatively flat frequency
response and let ASDP record it. The same measurement
(a) Input/output waveforms. (b) AGC behaviors.
(c) Channel property.
(d) Frequency response.
Fig. 14: Experiments results: (a) Input and output of the am-
plification circuit on the acoustic front-end using transistors.
(b) The AGC outputs increase linearly with inputs when
the maximum gain is set to be relatively low (e.g., 40 dB),
but they may get quickly saturated under higher maximum
gains. (c) The channel responses of sound recording and
playback systems without acoustic transducers are flat. (d)
Frequency response comparison between ASDP and a com-
modity smartphone (Samsung Galaxy S9).
TABLE 3: Computation times for N-point operations on the ASDP platform

Time (µs)                      | N=256 |  512 | 1024 | 2048 | 4096 | 8192 | 16384 | 32768 | 65536 | 131072 | 262144
Normal-FFT (int)               |   119 |  199 |  342 |  684 | 1346 | 2730 |  5903 | 11914 | 26208 |  69887 | 126178
GPU-FFT (int)                  |    27 |   41 |   58 |  111 |  265 |  715 |  1366 |  3447 |  6650 |  16386 |  43636
Neon-FFT (int)                 |     6 |   13 |   32 |   80 |  190 |  452 |  1612 |  5478 | 15008 |  36621 |  82066
Normal-FFT (float)             |   133 |  204 |  398 |  761 | 1606 | 3309 |  7187 | 14609 | 31488 |  84738 | 149660
GPU-FFT (float)                |    27 |   42 |   59 |  114 |  266 |  715 |  1376 |  3411 |  6744 |  16395 |  43417
Neon-FFT (float)               |     6 |   13 |   32 |   80 |  214 |  556 |  1645 |  5605 | 16694 |  38436 |  91689
Normal multiplications         |     8 |   16 |   33 |   67 |  135 |  270 |   540 |  1092 |  2164 |   4394 |   8948
Neon-optimized multiplications |     1 |    3 |    7 |   15 |   31 |   63 |   127 |   255 |   515 |   1066 |   2088
has also been carried out using a smartphone and Fig. 14(d)
shows the results. Evidently, the frequency response of
ASDP is remarkably flatter on the entire frequency range
than that of the commodity smartphone. Meanwhile, the
gain of the inaudible range from ASDP is several orders
higher than that from the smartphone. It should be noted
that the frequency selectivity problem cannot be readily
mitigated via reciprocal filters [24], [64] if AGC is adopted
as is the case with most commodity smartphones.
5.2 Overhead
This section presents the overhead of operations including FFT, correlation, and multiplication, as they are frequently used yet among the most time-consuming operations. Consequently, the overhead of these operations is the major concern when deciding whether to run an application on a local host or on a more powerful remote machine. To offer hands-on experience with the above issues, we quantify the overhead by recording the time it takes to accomplish these operations with and without acceleration. Table 3 shows the results as the average time cost over 10,000 trials.
Table 3 reveals that GPU and Neon are very effective
in speeding up computation as any implementations using
either GPU or Neon cost much less time than the cor-
responding normal implementations. Upon a closer look,
we can find that the advantage of utilizing GPU becomes
more and more prominent with increasing N, whereas it is more efficient to use Neon when N is relatively small, say N < 16384. Another observation is that implementations
using GPU or Neon seem to be largely independent of data
types (i.e., int or float) as operations using different data
types consume almost the same amount of time. In contrast,
time costs of normal implementations in float data type are
noticeably higher than those in int data type.
The time cost of correlation can be derived from the data in Table 3, given our efficient implementation via FFT and multiplication. Specifically, the correlation between two sequences x and y can be computed as xcorr(x, y) = F⁻¹(F(x) · F*(y)), where F stands for the FFT, F* for its complex conjugate, and F⁻¹ for the IFFT. To detect a 40 ms preamble (equivalent to 1920 samples at a 48 kHz sampling rate, commonly adopted in many applications [23], [24], [30], [10], [11], [26]), we need to invoke a 2048-point float FFT three times2 and a 4096-point multiplication once (omitting additions of negligible cost), which consumes 3 × 114 + 31 = 373 µs (GPU implementation; 271 µs for the Neon version). In contrast, a commodity smartphone (e.g., Samsung Galaxy S5) with NDK APIs requires about 1,500 µs. The above analysis reveals that the processing time (around 300 µs) is less than the signal duration (40,000 µs in this case), indicating the competent capability of ASDP in handling real-time operations.
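The FFT-based correlation described above can be sketched in NumPy as follows; this is illustrative only, not the GPU/Neon-accelerated ASDP kernels:

```python
import numpy as np

def xcorr_fft(x, y):
    """Cross-correlation via the frequency domain:
    xcorr(x, y) = IFFT( FFT(x) * conj(FFT(y)) ), zero-padded to the
    next power of two to avoid circular wrap-around."""
    n = len(x) + len(y) - 1
    nfft = 1 << (n - 1).bit_length()      # next power of two
    X = np.fft.rfft(x, nfft)
    Y = np.fft.rfft(y, nfft)
    return np.fft.irfft(X * np.conj(Y), nfft)

# Toy demo: find a 1920-sample preamble (40 ms at 48 kHz) buried at a
# known offset inside a noisy capture.
rng = np.random.default_rng(4)
preamble = rng.normal(0.0, 1.0, 1920)
capture = rng.normal(0.0, 0.1, 4096)
capture[1000:1000 + 1920] += preamble
lag = int(np.argmax(xcorr_fft(capture, preamble)))
```

Note that the sketch performs exactly the operations counted in the text: two forward FFTs, one element-wise multiplication, and one inverse FFT.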
5.3 Reliable Preamble Detection
We evaluate the proposed preamble detection algorithm in
this section. We first test the detection rate in typical indoor scenarios. The preamble is a chirp signal that occupies a bandwidth of 4 kHz (18–22 kHz) with a duration of 40 ms. We use our MATLAB APIs to synthesize an audio file containing a repetition of 1000 preambles with a guard interval of 50 ms. We let a smartphone play this audio file and let
ASDP and another smartphone perform detection at various
distances in real-time. ASDP adopts the transistor-based
acoustic front-end that bears a fixed gain. For comparison,
we have also implemented the common detection method
in LibAS [15]. The results are shown in Fig. 15(a). It can be
observed that our algorithm outperforms the common ap-
proach especially at longer distances. Another observation
is that the same algorithm performs slightly better on ASDP
than on the smartphone. We believe that ASDP’s higher
sensitivity in the inaudible range is the reason.
We carry out further experiments to find out whether
our preamble detection algorithm can handle the near-far
problem and hardware heterogeneity. For verification, we
record the Normalized Maximum Peak Values (NMPV)
of the correlation results and report their changes under
different distances and across various devices. As shown
in Fig. 15(b) and (c), NMPVs almost always remain con-
stant under different circumstances. Such a desirable feature
makes it easier to set an appropriate threshold. In contrast, as shown in Fig. 15(b) and (c), the plain Maximum Peak Value (MPV) from the common method [15] can exhibit a sharp 64× attenuation at long distances and 10× variation across heterogeneous test devices. As a result, to work properly
2. IFFT has the same time complexity as FFT.
(a) Detection rate comparison. (b) Impact of distances. (c) Impact of devices.
Fig. 15: Preamble detection results. (a) Detection rates of different algorithms under different test conditions. The detection
rate of the common method (phone without normalization) almost cuts down by half as distances increase. However, after
normalization (phone with normalization), the detection rate can be significantly improved. The high sensitivity at the
inaudible range on ASDP contributes to a better detection rate (ASDP with normalization) than smartphones (phone with
normalization). (b) MPVs gradually decrease as distances increase while NMPVs remain almost a constant. (c) MPVs from
various devices exhibit sharp differences (up to 10×) while NMPVs are barely affected by hardware heterogeneity.
TABLE 4: Comparison between ASDP and existing work

Platform   Control
ASDP       GUI and readable scripts; hot-plug   ✓  ✓
Others     Software tuning                      ×  ×
under distance and device changes, it requires a tedious and
discouraging threshold calibration process. In summary, our
preamble detection algorithm resolves the near-far problem
and hardware heterogeneity problem elegantly. Note that
our preamble detection algorithm is immune to different configurations of chirp signals: tuning B makes no impact on detection performance, and changing T only affects the maximum value of the peak frequency.
6 Discussion

We highlight the benefits brought by ASDP and summarize
its distinctive features in Table 4.
Easy Controllability and Reconfigurability: ASDP pro-
vides flexible ways to control and reconfigure hardware
settings, making it convenient to try out various implemen-
tations and extensions for experimentation.
Good Usability and Programmability: ASDP offers a soft-
ware framework with a collection of utilities, maintaining
both good usability and programmability for fast prototyp-
ing, clearly demonstrated by all applications in Sec. 4.
The major limitation of ASDP is that it is built on customized hardware, making it less appealing for commercial solutions. However, this is a small price to pay for the aforementioned benefits (similar to the case of RF-SDRs). After all,
ASDP is intended for fast prototyping and fair benchmark-
ing, so as to inspire novel designs of acoustic solutions on
commodity platforms.
Our ASDP design indeed reveals several lessons for phone manufacturers to better support acoustic sensing applications. By improving upon current mobile systems, including releasing more control over the physical layer, allowing configurations via scripting languages, or simply adopting better sensors, more innovative sensing applications would emerge. These improvements, incurring no extra cost, can in turn add diversified functionalities to smartphones and thus promote their value.
7 Conclusion

In this paper, we have thoroughly presented ASDP, in both
hardware and software designs, as well as in its potential ap-
plications and performance evaluations. ASDP significantly
outperforms commodity devices for its tailored hardware
design, and its software framework offers a rich set of
utilities for fast prototyping. As a unified platform, ASDP eliminates the hardware heterogeneity problem and hence incurs fewer compatibility issues, rendering it competent for
meaningful benchmarking. It offers user-friendly interfaces
to control and reconfigure the physical layer settings, a
tedious or almost infeasible task on commodity devices. For
demonstrative purpose, we have presented five different ap-
plications on ASDP platform, showcasing its salient features
in acoustic sensing development. We have also conducted a
comprehensive performance evaluation on individual com-
ponents of ASDP. We believe that, like RF-SDRs, the release
of our ASDP can be a strong driving force behind innovative
acoustic sensing research.
[1] R. Nandakumar, K. K. Chintalapudi, V. Padmanabhan, and
R. Venkatesan, “Dhwani: Secure Peer-to-Peer Acoustic NFC,” in
Proceedings of the ACM SIGCOMM 2013 Conference on SIGCOMM,
ser. SIGCOMM ’13, 2013.
[2] S. Ka, T. H. Kim, J. Y. Ha, S. H. Lim, S. C. Shin, J. W. Choi, C. Kwak, and S. Choi, “Near-ultrasound Communication for TV’s 2nd Screen Services,” in Proceedings of the 22nd Annual International Conference on Mobile Computing and Networking, ser. MobiCom ’16, 2016.
[3] S. Yun, Y.-C. Chen, H. Zheng, L. Qiu, and W. Mao, “Strata: Fine-
Grained Acoustic-based Device-Free Tracking,” in Proceedings of
the 15th Annual International Conference on Mobile Systems, Applica-
tions, and Services, ser. MobiSys ’17, 2017.
[4] R. Nandakumar, V. Iyer, D. Tan, and S. Gollakota, “FingerIO: Using
Active Sonar for Fine-Grained Finger Tracking,” in Proceedings of
the 2016 CHI Conference on Human Factors in Computing Systems,
ser. CHI ’16, 2016.
[5] C. Cai, Z. Chen, H. Pu, L. Ye, M. Hu, and J. Luo, “AcuTe: Acoustic
Thermometer Empowered by a Single Smartphone,” in Proceedings
of the 18th Conference on Embedded Networked Sensor Systems, 2020,
pp. 28–41.
[6] C. Cai, Z. Chen, J. Luo, H. Pu, M. Hu, and R. Zheng, “Boosting
chirp signal based aerial acoustic communication under dynamic
channel conditions,” IEEE Transactions on Mobile Computing, pp.
1–12, 2021.
[7] Q. Wang, K. Ren, M. Zhou, T. Lei, D. Koutsonikolas, and L. Su,
“Messages Behind the Sound: Real-time Hidden Acoustic Signal
Capture with Smartphones,” in Proceedings of the 22nd Annual
International Conference on Mobile Computing and Networking, ser.
MobiCom ’16, 2016.
[8] W. Wang, A. X. Liu, and K. Sun, “Device-free gesture tracking us-
ing acoustic signals,” in Proceedings of the 22nd Annual International
Conference on Mobile Computing and Networking, ser. MobiCom ’16, 2016.
[9] K. Liu, X. Liu, and X. Li, “Guoguo: Enabling Fine-Grained Smart-
phone Localization via Acoustic Anchors,” IEEE Transactions on
Mobile Computing, vol. 15, no. 5, pp. 1144–1156, 2016.
[10] P. Lazik and A. Rowe, “Indoor Pseudo-ranging of Mobile Devices
Using Ultrasonic Chirps,” in Proceedings of the 10th ACM Conference
on Embedded Network Sensor Systems, ser. SenSys ’12, 2012.
[11] P. Lazik, N. Rajagopal, O. Shih, B. Sinopoli, and A. Rowe, “ALPS:
A Bluetooth and Ultrasound Platform for Mapping and Local-
ization,” in Proceedings of the 13th ACM Conference on Embedded
Networked Sensor Systems, ser. SenSys ’15, 2015.
[12] Ettus Research. (2019) USRP Network Series. https:
[13] (2019) WARP: Wireless Open Access Research Platform. http://
[14] H. S. Dol, P. Casari, T. van der Zwan, and R. Otnes, “Software-
Defined Underwater Acoustic Modems: Historical Review and the
NILUS Approach,” IEEE Journal of Oceanic Engineering, vol. 42,
no. 3, pp. 722–737, 2017.
[15] Y.-C. Tung, D. Bui, and K. G. Shin, “Cross-Platform Support for
Rapid Development of Mobile Acoustic Sensing Applications,”
in Proceedings of the 16th Annual International Conference on Mobile
Systems, Applications, and Services, ser. MobiSys ’18, 2018, pp. 455–
[16] C. Cai. (2021) Acoustic Software Defined Platform. https://
[17] S. Elmalaki, L. Wanner, and M. Srivastava, “CAreDroid: Adap-
tation Framework for Android Context-Aware Applications,” in
Proceedings of the 21st Annual International Conference on Mobile
Computing and Networking, ser. MobiCom ’15, 2015, pp. 386–399.
[18] L. Ravindranath, A. Thiagarajan, H. Balakrishnan, and S. Madden,
“Code in the Air: Simplifying Sensing and Coordination Tasks
on Smartphones,” in Proceedings of the Twelfth Workshop on Mobile
Computing Systems and Applications, ser. HotMobile ’12, 2012, pp.
[19] H. Lee, T. H. Kim, J. W. Choi, and S. Choi, “Chirp signal-based
aerial acoustic communication for smart devices,” in 2015 IEEE
Conference on Computer Communications, ser. INFOCOM ’15, 2015.
[20] C. Peng, G. Shen, Y. Zhang, Y. Li, and K. Tan, “BeepBeep: A High
Accuracy Acoustic Ranging System Using COTS Mobile Devices,” in
Proceedings of the 5th International Conference on Embedded Networked
Sensor Systems, ser. SenSys ’07, 2007.
[21] R. Nandakumar, K. K. Chintalapudi, and V. N. Padmanabhan,
“Centaur: Locating Devices in an Office Environment,” in Proceed-
ings of the 18th Annual International Conference on Mobile Computing
and Networking, ser. Mobicom ’12, 2012.
[22] B. Zhou, M. Elbadry, R. Gao, and F. Ye, “BatMapper: Acous-
tic Sensing Based Indoor Floor Plan Construction Using Smart-
phones,” in Proceedings of the 15th Annual International Conference
on Mobile Systems, Applications, and Services, ser. MobiSys ’17, 2017.
[23] W. Mao, J. He, H. Zheng, Z. Zhang, and L. Qiu, “High-Precision
Acoustic Motion Tracking: Demo,” in Proceedings of the 22nd An-
nual International Conference on Mobile Computing and Networking,
ser. MobiCom ’16, 2016.
[24] W. Mao, Z. Zhang, L. Qiu, J. He, Y. Cui, and S. Yun, “Indoor Follow
Me Drone,” in Proceedings of the 15th Annual International Conference
on Mobile Systems, Applications, and Services, ser. MobiSys ’17, 2017.
[25] Y.-C. Tung and K. G. Shin, “EchoTag: Accurate Infrastructure-
Free Indoor Location Tagging with Smartphones,” in Proceedings
of the 21st Annual International Conference on Mobile Computing and
Networking, ser. MobiCom ’15, 2015.
[26] R. Nandakumar, S. Gollakota, and N. Watson, “Contactless Sleep
Apnea Detection on Smartphones,” in Proceedings of the 13th
Annual International Conference on Mobile Systems, Applications, and
Services, ser. MobiSys ’15, 2015.
[27] J. Wang, K. Zhao, X. Zhang, and C. Peng, “Ubiquitous Keyboard
for Small Mobile Devices: Harnessing Multipath Fading for Fine-
grained Keystroke Localization,” in Proceedings of the 12th Annual
International Conference on Mobile Systems, Applications, and Services,
ser. MobiSys ’14, 2014.
[28] K. Sun, T. Zhao, W. Wang, and L. Xie, “VSkin: Sensing Touch Gestures
on Surfaces of Mobile Devices Using Acoustic Signals,” in Proceed-
ings of the 24th Annual International Conference on Mobile Computing
and Networking, ser. MobiCom ’18, 2018.
[29] K. Sun, W. Wang, A. X. Liu, and H. Dai, “Depth Aware Finger
Tapping on Virtual Displays,” in Proceedings of the 16th Annual
International Conference on Mobile Systems, Applications, and Services,
ser. MobiSys ’18, 2018.
[30] Y.-C. Tung and K. G. Shin, “Expansion of Human-Phone Interface
By Sensing Structure-Borne Sound Propagation,” in Proceedings of
the 14th Annual International Conference on Mobile Systems, Applica-
tions, and Services, ser. MobiSys ’16, 2016.
[31] C. Cai, H. Pu, M. Hu, R. Zheng, and J. Luo, “SST: Software Sonic
Thermometer on Acoustic-enabled IoT Devices,” IEEE Transactions
on Mobile Computing, vol. 20, no. 5, pp. 2067–2079, 2021.
[32] C. Cai, M. Hu, X. Ma, K. Peng, and J. Liu, “Accurate Ranging
on Acoustic-enabled IoT Devices,” IEEE Internet of Things Journal,
vol. 6, no. 2, pp. 3164–3174, April 2019.
[33] Y. Wang, J. Li, R. Zheng, and D. Zhao, “ARABIS: an Asynchronous
Acoustic Indoor Positioning System for Mobile Devices,” in 2017
IEEE International Conference on Indoor Positioning and Indoor Navi-
gation (IPIN 2017), 2017.
[34] D. B. Haddad, W. A. Martins, M. d. V. M. da Costa, L. W. P. Bis-
cainho, L. O. Nunes, and B. Lee, “Robust acoustic self-localization
of mobile devices,” IEEE Transactions on Mobile Computing, vol. 15,
no. 4, pp. 982–995, 2016.
[35] W. Huang, Y. Xiong, X. Li, H. Lin, X. Mao, P. Yang, Y. Liu, and
X. Wang, “Swadloon: Direction finding and indoor localization
using acoustic signal by shaking smartphones,” IEEE Transactions
on Mobile Computing, vol. 14, no. 10, pp. 2145–2157, 2015.
[36] E. Demirors, G. Sklivanitis, G. E. Santagati, T. Melodia, and S. N.
Batalama, “A High-Rate Software-Defined Underwater Acoustic
Modem With Real-Time Adaptation Capabilities,” IEEE Access,
vol. 6, pp. 18 602–18 615, 2018.
[37] M. Chitre, R. Bhatnagar, and W. Soh, “UnetStack: An agent-based
software stack and simulator for underwater networks,” in 2014
Oceans - St. John’s, 2014, pp. 1–10.
[38] P. Georgiev, N. D. Lane, K. K. Rachuri, and C. Mascolo, “DSP.Ear:
Leveraging Co-processor Support for Continuous Audio Sensing
on Smartphones,” in Proceedings of the 12th ACM Conference on
Embedded Network Sensor Systems, ser. SenSys ’14, 2014, pp. 295–
[39] N. D. Lane, P. Georgiev, and L. Qendro, “DeepEar: Robust Smart-
phone Audio Sensing in Unconstrained Acoustic Environments
Using Deep Learning,” in Proceedings of the 2015 ACM International
Joint Conference on Pervasive and Ubiquitous Computing, ser. Ubi-
Comp ’15, 2015, pp. 283–294.
[40] S. Nirjon, R. F. Dickerson, P. Asare, Q. Li, D. Hong, J. A. Stankovic,
P. Hu, G. Shen, and X. Jiang, “Auditeur: A Mobile-Cloud Service
Platform for Acoustic Event Detection on Smartphones,” in Pro-
ceeding of the 11th Annual International Conference on Mobile Systems,
Applications, and Services, ser. MobiSys ’13, 2013.
[41] H. Lu, W. Pan, N. D. Lane, T. Choudhury, and A. T. Campbell,
“SoundSense: Scalable Sound Sensing for People-centric Appli-
cations on Mobile Phones,” in Proceedings of the 7th International
Conference on Mobile Systems, Applications, and Services, ser. Mo-
biSys ’09, 2009, pp. 165–178.
[42] (2019) WM8731 Datasheet.
pub/Main/DataSheets/WM8731 8731L.pdf.
[43] (2019) Raspberry Pi 3 Model B+.
[44] (2019) Neon Technology.
[45] (2020) Element14 Releases a $33 Sound Card for the Raspberry Pi.
33-sound-card-for-the-raspberry-pi/.
[46] J. Defrance, E. Salomons, I. Noordhoek, D. Heimann, B. Plovsing,
G. Watts, H. Jonasson, X. Zhang, E. Premat, I. Schmich, F.-E.
Aballa, M. Baulac, and F. de Roo, “Outdoor Sound Propaga-
tion Reference Model Developed in the European Harmonoise
Project,” Acta Acustica united with Acustica, vol. 93, no. 2, pp. 213–
227, 2007.
[47] J. C. Kaimal and J. E. Gaynor, “Another Look at Sonic Thermom-
etry,” Boundary-Layer Meteorology, vol. 56, no. 4, pp. 401–410, Sep
[48] (2019) WM8731 Linux Driver.
[49] (2019) Linux Audio Utility: aplay.
[50] (2019) Linux Audio Utility: arecord.
[51] V. Valimaki and J. D. Reiss, “All About Audio Equalization:
Solutions and Frontiers,” Applied Sciences, vol. 6, no. 5, 2016.
[52] B. G. Agee, Solving the Near-Far Problem: Exploitation of Spatial
and Spectral Diversity in Wireless Personal Communication Networks,
1994, pp. 69–80.
[53] R. Diamant, “Closed Form Analysis of the Normalized Matched
Filter With a Test Case for Detection of Underwater Acoustic
Signals,” IEEE Access, vol. PP, pp. 1–1, Nov. 2016.
[54] S. Gupta, D. Morris, S. Patel, and D. Tan, “SoundWave: Using the
Doppler Effect to Sense Gestures,” in Proceedings of the SIGCHI
Conference on Human Factors in Computing Systems, ser. CHI ’12, 2012.
[55] S. Kim and J.-W. Chong, “Chirp Spread Spectrum Transceiver
Design and Implementation for Real Time Locating System,”
International Journal of Distributed Sensor Networks, vol. 11, no. 8,
p. 572861, 2015.
[56] A. Berni and W. Gregg, “On the Utility of Chirp Modulation for
Digital Signaling,” IEEE Transactions on Communications, vol. 21,
no. 6, pp. 748–751, Jun 1973.
[57] S. E. El-Khamy, S. E. Shaaban, and E. A. Tabet, “Efficient
Multiple-Access Communications Using Multi-User Chirp Mod-
ulation Signals,” in Spread Spectrum Techniques and Applications
Proceedings, 1996., IEEE 4th International Symposium on, vol. 3, Sep
1996, pp. 1209–1213.
[58] X. Ouyang and J. Zhao, “Orthogonal Chirp Division Multiplex-
ing,” IEEE Transactions on Communications, vol. 64, no. 9, pp. 3946–
3957, Sep. 2016.
[59] (2019) Decawave DW1000 User Manual. https://
[60] (2019) Leica S910 Laser Distance Meter. https://shop.leica-
[61] D. Cruette, A. Marillier, J. L. Dufresne, and J. Y. Grandpeix, “Fast
Temperature and True Airspeed Measurements with the Airborne
Ultrasonic Anemometer-Thermometer (AUSAT),” Journal of At-
mospheric and Oceanic Technology, vol. 17, no. 1003, pp. 1020 – 1039,
[62] “Sonic Anemometry and Thermometry: Theoretical Basis and
Data-processing Software,” Environmental Software, vol. 11, no. 4,
pp. 259 – 270, 1996.
[63] B. Dekker, S. Jacobs, A. S. Kossen, M. C. Kruithof, A. G. Huizing,
and M. Geurts, “Gesture Recognition with A Low Power FMCW
Radar and A Deep Convolutional Neural Network,” in 2017
European Radar Conference (EURAD), Oct 2017.
[64] N. Roy, H. Hassanieh, and R. Roy Choudhury, “BackDoor: Making
Microphones Hear Inaudible Sounds,” in Proceedings of the 15th
Annual International Conference on Mobile Systems, Applications, and
Services, ser. MobiSys ’17, 2017.
Chao Cai is an Associate Professor at the College of Life Science and Technology, Huazhong University of Science and Technology. He received his Ph.D. degree from the School of Electronic Information and Engineering at Huazhong University of Science and Technology, Wuhan, China. He worked as a postdoctoral fellow at the School of Computer Science and Engineering, Nanyang Technological University, Singapore. His current research interests include mobile computing, acoustic sensing, wireless sensing, embedded systems, digital signal processing, and deep learning.
Henglin Pu is an undergraduate student at
Huazhong University of Science and Tech-
nology. His current research interests include
acoustic sensing, wireless sensing, and digital
signal processing.
Menglan Hu received the BE degree in
electronic and information engineering from
Huazhong University of Science and Technol-
ogy, China, in 2007, and the PhD degree in
electrical and computer engineering from the
National University of Singapore, Singapore, in
2012. He is currently an Associate Professor at
the School of Electronic Information and Com-
munications, Huazhong University of Science
and Technology, China. His research interests
include cloud computing, mobile computing,
parallel and distributed systems, scheduling and resource management,
as well as wireless networking.
Rong Zheng received her Ph.D. degree from
Dept. of Computer Science, University of Illi-
nois at Urbana-Champaign and earned her M.E.
and B.E. in Electrical Engineering from Tsinghua
University, P.R. China. She is now a Professor
in the Dept. of Computing and Software at McMaster University, Canada. Between 2004 and 2012, she was on the faculty of the Department of Computer Science, University of Houston. Rong Zheng’s research interests include wireless networking, mobile computing, and machine learning. She serves as an editor of IEEE Transactions on Mobile Computing, IEEE Transactions on Network Science and Engineering, and IEEE Transactions on Wireless Communications. Rong Zheng received a Discovery Accelerator Supplement from the Natural Sciences and Engineering Research Council (NSERC) of Canada in 2019 and the National Science Foundation CAREER Award in 2006. She was a Joseph Ip Engineering Fellow from 2015 to 2018.
Jun Luo received his BS and MS degrees in
Electrical Engineering from Tsinghua Univer-
sity, China, and the Ph.D. degree in Computer
Science from EPFL (Swiss Federal Institute of
Technology in Lausanne), Lausanne, Switzer-
land. From 2006 to 2008, he worked as a postdoctoral research fellow in the Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, Canada. In 2008, he joined the faculty of the School of Computer Science and Engineering, Nanyang Technological
University in Singapore, where he is currently an Associate Professor.
His research interests include mobile and pervasive computing, wireless
networking, applied operations research, as well as network security.
More information can be found at
For AR/VR systems, tapping-in-the-air is a user-friendly solution for interactions. Most prior in-air tapping schemes use customized depth-cameras and therefore have the limitations of low accuracy and high latency. In this paper, we propose a fine-grained depth-aware tapping scheme that can provide high accuracy tapping detection. Our basic idea is to use light-weight ultrasound based sensing, along with one COTS mono-camera, to enable 3D tracking of user's fingers. The mono-camera is used to track user's fingers in the 2D space and ultrasound based sensing is used to get the depth information of user's fingers in the 3D space. Using speakers and microphones that already exist on most AR/VR devices, we emit ultrasound, which is inaudible to humans, and capture the signal reflected by the finger with the microphone. From the phase changes of the ultrasound signal, we accurately measure small finger movements in the depth direction. With fast and light-weight ultrasound signal processing algorithms, our scheme can accurately track finger movements and measure the bending angle of the finger between two video frames. In our experiments on eight users, our scheme achieves a 98.4% finger tapping detection accuracy with FPR of 1.6% and FNR of 1.4%, and a detection latency of 17.69ms, which is 57.7ms less than video-only schemes. The power consumption overhead of our scheme is 48.4% more than video-only schemes.
Conference Paper
LibAS is a cross-platform framework to facilitate the rapid development of mobile acoustic sensing apps. It helps developers quickly realize their ideas by using a high-level Matlab script, and test them on various OS platforms, such as Android, iOS, Tizen, and Linux/Win. LibAS simplifies the development of acoustic sensing apps by hiding the platform-dependent details. For example, developers need not learn Objective-C/SWIFT or the audio buffer management in the CoreAudio framework when they want to implement acoustic sensing algorithms on an iPhone. Instead, developers only need to decide on the sensing signals and the callback function to handle each repetition of sensing signals. We have implemented apps covering three major acoustic sensing categories to demonstrate the benefits and simplicity of developing apps with LibAS. Our evaluation results show the adaptability of LibAS in supporting various acoustic sensing apps and tuning/improving their performance efficiently. Developers have reported that LibAS saves them a significant amount of time/effort and can reduce up to 90% lines of code in their acoustic sensing apps.