Unsupervised Representation Learning of Structured Radio Communication Signals

Timothy J. O'Shea
Bradley Department of Electrical and Computer Engineering
Virginia Tech, Arlington, VA
http://www.oshearesearch.com

Johnathan Corgan
Corgan Labs
San Jose, CA
http://corganlabs.com/

T. Charles Clancy
Bradley Department of Electrical and Computer Engineering
Virginia Tech, Arlington, VA
http://www.stochasticresearch.com/

Abstract—We explore unsupervised representation learning of radio communication signals in raw sampled time series representation. We demonstrate that we can learn modulation basis functions using convolutional autoencoders and visually recognize their relationship to the analytic bases used in digital communications. We also propose and evaluate quantitative metrics for quality of encoding using domain-relevant performance metrics.

Index Terms—Radio communications, Software Radio, Cognitive Radio, Deep Learning, Convolutional Autoencoders, Neural Networks, Machine Learning

I. INTRODUCTION

Radio signals are all around us and serve as a key enabler for both communications and sensing in an increasingly interconnected and automated world. Much effort has gone into expert system design and optimization for both radio and radar systems over the past 75 years, considering exactly how to represent, shape, adapt, and recover these signals through a lossy, non-linear, distorted, and often interference-heavy channel environment. Meanwhile, in recent years, heavily expert-tuned basis functions such as Gabor filters in the vision domain have been largely discarded due to the speed at which they can be naively learned and adapted using feature learning approaches in deep neural networks.

Here we explore making the same transition from relatively simple expert-designed representation and coding to emergent, learned encoding. We expect to better optimize for channel capacity, to translate information between channel and compact representations, and to better reason about what kind of information is in the radio spectrum, allowing less-supervised classification, anomaly detection, and numerous other applications.

This paper provides the first step towards that goal by demonstrating that common radio communications signal bases emerge relatively easily using existing unsupervised learning methods. We outline a number of techniques which enable this to work, to provide insight for continued investigation into this domain. This work extends the promising supervised feature learning work we have already begun in this domain [12].

A. Basis Functions for Radio Data

Widely used single-carrier radio signal time series modulation schemes today still use a relatively simple set of supporting basis functions to modulate information into the radio spectrum. Digital modulations typically use a set of sine wave basis functions with orthogonal or pseudo-orthogonal properties in phase, amplitude, and/or frequency. Information bits can then be used to map a symbol value s_i to a location in this space φ_j, φ_k, .... In Figure 1 we show three potential basis functions, where φ_0 and φ_1 form phase-orthogonal bases used in Phase Shift Keying (PSK) and Quadrature Amplitude Modulation (QAM), while φ_0 and φ_2 show frequency-orthogonal bases used in Frequency Shift Keying (FSK). In the final panel of Figure 1 we show a common mapping of constellation points into this space as typically used in Quadrature Phase Shift Keying (QPSK), where each symbol value encodes two bits of information.
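As a concrete illustration, the QPSK symbol mapping described above can be sketched in a few lines of Python. This is a hedged sketch: the Gray-coded constellation below is one common convention, not necessarily the exact mapping used in the paper's dataset.

```python
import numpy as np

# Gray-coded QPSK: each symbol carries two bits, mapped to one of four
# phase-orthogonal points in the (phi_0, phi_1) plane, all unit-energy.
QPSK = {
    (0, 0): (1 + 1j) / np.sqrt(2),
    (0, 1): (-1 + 1j) / np.sqrt(2),
    (1, 1): (-1 - 1j) / np.sqrt(2),
    (1, 0): (1 - 1j) / np.sqrt(2),
}

def modulate_qpsk(bits):
    """Map an even-length bit sequence to complex QPSK symbols."""
    pairs = zip(bits[0::2], bits[1::2])
    return np.array([QPSK[p] for p in pairs])

symbols = modulate_qpsk([0, 0, 1, 1, 0, 1])
# All points lie on the unit circle (constant modulus).
assert np.allclose(np.abs(symbols), 1.0)
```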

Digital modulation theory in communications is a rich subject explored in much greater depth in numerous excellent texts such as [3].

Figure 1. Example Radio Communications Basis Functions

arXiv:1604.07078v1 [cs.LG] 24 Apr 2016

B. Radio Signal Structure

Once basis functions have been selected, the data to transmit is divided into symbols, and each symbol period for transmission occupies a sequential time slot. To avoid creating wideband signal energy associated with rapid transitions between symbols, a pulse-shaping envelope such as a root-raised cosine or sinc filter is typically used to provide a smoothed transition between discrete symbol values in adjacent time slots [1]. Three such adjacent symbol time slots can be seen in Figure 2. Ultimately, a sequence of pulse-shaped symbols with different values is summed together to form the transmit signal time series, s(t).

Figure 2. Discrete Symbol Envelopes in Time
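The pulse-shaping step can be sketched as follows. This is an illustrative sketch only: it uses a plain raised-cosine filter for brevity rather than the root-raised cosine typically used in practice, and the parameter values are assumptions.

```python
import numpy as np

def raised_cosine(beta, sps, span):
    """Raised-cosine pulse taps (a simpler stand-in for the root-raised
    cosine): beta = roll-off, sps = samples/symbol, span = half-length
    in symbols."""
    t = np.arange(-span * sps, span * sps + 1) / sps
    num = np.sinc(t) * np.cos(np.pi * beta * t)
    den = 1.0 - (2.0 * beta * t) ** 2
    # Fill the removable singularity at |t| = 1/(2*beta) with its limit.
    h = np.full_like(t, (np.pi / 4) * np.sinc(1.0 / (2.0 * beta)))
    np.divide(num, den, out=h, where=np.abs(den) > 1e-8)
    return h / np.max(h)

def pulse_shape(symbols, h, sps):
    """Upsample symbols by sps and convolve with the shaping filter,
    summing the overlapping symbol envelopes into the series s(t)."""
    up = np.zeros(len(symbols) * sps, dtype=complex)
    up[::sps] = symbols
    return np.convolve(up, h)

h = raised_cosine(beta=0.35, sps=4, span=4)
s = pulse_shape(np.array([1 + 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2), h, sps=4)
```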

C. Radio Channel Effects

The transmitted signal, s(t), passes through a number of channel effects over the air before being received as r(t) at the receiver. These include time delay, time scaling, phase rotation, frequency offset, additive thermal noise, and channel impulse responses convolved with the signal, all as random, unknown, time-varying processes. A closed form of all these effects might look roughly like this:

r(t) = e^{j n_{LO}(t)} \int_{\tau=0}^{\tau_0} s\left(n_{Clk}(t - \tau)\right) h(\tau)\, d\tau + n_{Add}(t) \quad (1)

This significantly complicates the data representation from its original straightforward encoding at the transmitter when considering the effects of wireless channels as they exist in the real world.
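A toy simulation of these channel effects can make Eq. (1) concrete. This sketch applies impulse-response convolution, an integer-sample delay, a carrier frequency/phase offset, and additive white Gaussian noise; sample-clock drift (the n_Clk term) is omitted for brevity, and the function name and parameters are our own, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def apply_channel(s, freq_offset, phase, delay, snr_db, h=None):
    """Toy version of Eq. (1): channel impulse response, time delay,
    LO frequency/phase offset, and AWGN. freq_offset is in
    cycles/sample; delay is in whole samples."""
    if h is None:
        h = np.array([1.0 + 0j])            # trivial impulse response
    r = np.convolve(s, h)                   # multipath convolution
    r = np.concatenate([np.zeros(delay, dtype=complex), r])
    n = np.arange(len(r))
    r = r * np.exp(1j * (2 * np.pi * freq_offset * n + phase))
    p_sig = np.mean(np.abs(r) ** 2)
    sigma = np.sqrt(p_sig / 10 ** (snr_db / 10) / 2)
    noise = sigma * (rng.standard_normal(len(r))
                     + 1j * rng.standard_normal(len(r)))
    return r + noise

tone = np.exp(2j * np.pi * 0.1 * np.arange(64))
r = apply_channel(tone, freq_offset=0.01, phase=0.5, delay=3, snr_db=20)
```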

II. LEARNING FROM RADIO SIGNALS

We focus initially on attempting to learn symbol basis functions from existing modulation schemes in wide use today. We focus on Quadrature Phase Shift Keying (QPSK) and Gaussian Binary Frequency Shift Keying (GFSK) as our modulations of interest in this work and hope to demonstrate naively learning the analytical basis functions for these.

A. Building a Dataset

We leverage the dataset from [12] and focus on learning only a single modulation basis set at a time in this work. This dataset includes the QPSK and GFSK modulations passed through realistic but relatively benign wireless channels, sampled in 88 complex-valued sample times per training example.

B. Unsupervised Learning

Autoencoders [2] have become a powerful and widely

used unsupervised learning tool. We review the autoencoder

and several relevant improvements on the autoencoder with

application to this domain which we leverage.

1) Autoencoder Architectures: Autoencoders (AE) learn an intermediate, possibly lower-dimensional encoding of an input by using reconstruction cost as their optimization criterion, typically attempting to minimize Mean Squared Error (MSE). They consist of an encoder, which encodes raw inputs into a lower-dimensional sparse hidden representation, and a decoder, which reconstructs an estimate of the input vector as the output.
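The encode/decode/reconstruction-cost loop can be sketched with a minimal linear autoencoder trained by gradient descent. This is a toy sketch under assumed data and hyperparameters, not the paper's network; it only shows that minimizing reconstruction MSE through a bottleneck learns a compressed code.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: three sinusoid "basis" rows in R^16, lightly noised.
base = np.array([np.sin(2 * np.pi * f * np.arange(16) / 16) for f in (1, 2, 3)])
X = np.tile(base, (50, 1)) + 0.05 * rng.standard_normal((150, 16))

W_enc = 0.1 * rng.standard_normal((16, 4))   # encoder: 16 -> 4 bottleneck
W_dec = 0.1 * rng.standard_normal((4, 16))   # decoder: 4 -> 16

def mse(a, b):
    return np.mean((a - b) ** 2)

mse_before = mse(X @ W_enc @ W_dec, X)
lr = 0.1
for _ in range(500):
    H = X @ W_enc            # encode to the bottleneck code
    X_hat = H @ W_dec        # decode back to signal space
    err = X_hat - X          # reconstruction error drives both gradients
    W_dec -= lr * H.T @ err / len(X)
    W_enc -= lr * X.T @ (err @ W_dec.T) / len(X)
mse_after = mse(X @ W_enc @ W_dec, X)
```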

A number of improvements have been made on autoencoders which we leverage below.

2) Denoising Autoencoders: By introducing noise into the input during AE training, but evaluating reconstruction against the unmodified input, Denoising Autoencoders [6] provide an additional input-noise regularization effect which is extremely well suited to the communications domain, where we always have additive Gaussian thermal noise applied to our input vectors.
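Constructing the (noisy input, clean target) training pairs for a denoising AE is straightforward to sketch. The helper name and the target SNR are our own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)

def denoising_pairs(clean, snr_db):
    """Build (noisy input, clean target) pairs for denoising-AE
    training: the network sees the AWGN-corrupted example but its
    reconstruction MSE is measured against the unmodified example."""
    power = np.mean(np.abs(clean) ** 2, axis=-1, keepdims=True)
    sigma = np.sqrt(power / 10 ** (snr_db / 10) / 2)
    noise = sigma * (rng.standard_normal(clean.shape)
                     + 1j * rng.standard_normal(clean.shape))
    return clean + noise, clean

# 100 examples of 88 complex samples, matching the dataset's framing.
clean = np.exp(2j * np.pi * 0.05 * np.arange(88))[None, :].repeat(100, axis=0)
noisy, target = denoising_pairs(clean, snr_db=10)
```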

3) Convolutional Autoencoders: Convolutional Autoencoders [7] are simply autoencoders leveraging convolutional weight configurations in their encoder and decoder stages. By leveraging convolutional layers rather than fully connected layers, we force time-shift invariance into our learned features and reduce the number of parameters required to fit. Since our channel model involves random time shifting of the input signal, this is an important property for the radio application domain, which we feel is extremely well suited to this approach.

4) Regularization: We leverage heavy L2 (||W||_2) weight regularization and L1 (||h||_1) activity regularization in our AE to attempt to force it to learn orthogonal basis functions with minimal energy [4]. Strong L1 activity regularization is especially important in the narrow hidden-layer representation between encoder and decoder, where we would like to learn a maximally sparse, compact basis representation of the signal through symbols of interest occurring at specific times. Dropout [10] is also used as a form of regularization between intermediate layers, forcing the network to leverage all available weight bases to span the representation space.
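The combined objective described above can be written down directly. A minimal sketch, with illustrative penalty coefficients of our own choosing:

```python
import numpy as np

def regularized_loss(x, x_hat, h, weights, l1=1e-3, l2=1e-4):
    """Reconstruction MSE plus an L1 activity penalty on the hidden
    code h (encouraging a sparse code) and an L2 penalty on every
    weight matrix (encouraging minimal-energy basis functions)."""
    recon = np.mean((x - x_hat) ** 2)
    activity = l1 * np.sum(np.abs(h))
    weight_decay = l2 * sum(np.sum(W ** 2) for W in weights)
    return recon + activity + weight_decay

# Perfect reconstruction: only the penalty terms remain.
loss = regularized_loss(np.zeros(8), np.zeros(8), np.ones(4), [np.ones((2, 2))])
```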

C. Test Neural Network Architecture

Our goal in this effort was to obtain a minimum-complexity network which allows us to convincingly reconstruct the signals of interest with a significant amount of information compression. By using convolutional layers with only one or two filters, we seek to achieve a maximally matched small set of time-basis filters with some equivalence to the expert features used to construct the signal. Dense layers with non-linear activations then sit between these to provide some estimation of the logic for what the representation and reconstruction should be for those basis filters occurring at different times. The basic network architecture is shown below in Figure 3.

Figure 3. Convolutional Autoencoder Architecture Used

D. Evaluation Methods for Reconstruction

For the scope of this work we use MSE as our reconstruction metric for optimization. We seek to evaluate reconstructed signals in terms of bit error rate (BER) and signal-to-noise ratio (SNR), but in the interest of space defer this to later work.

E. Visual Inspection of Learned Representations

Given a relatively informed view of what a smooth band-limited QPSK signal looks like in reality, visual inspection of the reconstruction versus the noisy input signal is an important way to assess the quality of the representation and reconstruction we have learned. The sparse representation is especially interesting: by selecting hard-sigmoid dense-layer activations, we have effectively forced the network to learn a binary representation of the continuous signal. Ideally, a direct GF(2) relationship exists between the encoded bits and the coded symbol bits of interest here. Figures 4 and 5 illustrate this reconstruction and the sparse binary representation learned.

For GFSK, we show reconstructions and sparse representations in Figure 6. In this case, the AE architecture converges even faster to a low reconstruction error, but unfortunately the sparse representations are not saturated into discrete values as was the case for the constant-modulus signal.

III. RESULTS

We consider the significance of these results below in the context of the network complexity required for representation and the compression ratio obtained.

Figure 4. QPSK Reconstruction 1 through Conv-AE

Figure 5. QPSK Reconstruction 2 through Conv-AE

Figure 6. GFSK Reconstruction 1 through Conv-AE

A. Learned Network Parameters

We use Adam [9] (a momentum-based method of SGD) to train our network parameters as implemented in the Keras [11] library. Evaluating our weight complexity, we have two 2D convolutional layers, 2x1x1x40 and 1x1x1x81, making a total of only 161 parameters learned in these layers to fit the translation-invariant filter features which form the primary input and output for our network. The dense layers, which provide mappings from occurrences of these filter weights to a sparse code and back to a wide representation, consist of weight matrices of 516x44 and 44x176 respectively, making a total of 30448 dense floating-point weight values.

Training is relatively trivial with this size of network and dataset; we converge on a solution after about 2 minutes of training, 25 epochs on 20,000 training examples using a Titan X GPU.
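The parameter counts quoted above can be checked directly from the stated layer shapes:

```python
# Convolutional layers: filter tensors of shape 2x1x1x40 and 1x1x1x81.
conv_params = 2 * 1 * 1 * 40 + 1 * 1 * 1 * 81   # -> 161

# Dense layers: 516x44 down to the sparse code, 44x176 back out.
dense_params = 516 * 44 + 44 * 176               # -> 30448
```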

Figure 7. QPSK Encoder Convolutional Weights

Figure 8. QPSK Decoder Convolutional Weights

In Figure 7 we show the learned convolutional weight vectors in the encoder's first layer. We can clearly see a sinusoid occurring at varying time offsets to form detections, and a second sinusoid at double the frequency, both with some minimal pulse shaping apparent on them.

In the decoder convolutional weight vectors in Figure 8, we can clearly see the pulse-shaping filter shape emerge.

Figure 9. First Four Sparse Representation Dense Weights

In Figure 9 we display the learned dense-layer weight mappings of various symbol values and offset regions as represented by the convolutional filters. It is important to note that the 1x516 input is a linearized dimension of zero-padded I and Q inputs through two separate filters (2x2x129). We see that a single sparse hidden-layer value equates to two pulses representing sinusoidal convolutional filter occurrences in time in the I and Q channels, with roughly a sinc or root-raised-cosine window roll-off visibly represented at this time scale.

B. Radio Signal Representation Complexity

To measure the compression we have achieved, we compare the effective number of bits required to represent the dynamic range of the input and output continuous signal domains with the number of bits required to store the signal in the hidden layer [8].

If we consider that our input signal contains roughly 20 dB of signal-to-noise ratio, we can approximate the number of bits required to represent each continuous value as follows.

N_{eff} = \left\lceil \frac{20\,\mathrm{dB} - 1.76}{6.02} \right\rceil = 4\ \text{bits} \quad (2)

Given that we have 88*2 inputs at 4-bit resolution, compressed to 44 intermediate binary values, we get a compression ratio of 16x = 88*2*4/44.

Given that we are learning roughly 4 to 5 symbols per example, with 4 samples per symbol, something like 10 bits is the most compact possible form of the data-information representation. However, in the current encoder we are also encoding timing offset information, phase error, and generally all channel information needed to reconstruct the data symbols in their specific arrival mode. Given that this is on the order of 4x the most compact representation possible for the data symbols alone, it is not a bad starting point.

IV. CONCLUSIONS

We are able to obtain relatively good compression with autoencoders for radio communications signals; however, these must encode both the data bits and the channel state information, which limits attainable compression.

Hard-sigmoid activations surrounding the hidden layer seem effective, for constant-modulus modulations, in saturating the representation into compact binary vectors, allowing us to encode 88 x 64-bit complex values into 44 bits of information without significant degradation.

Convolutional autoencoders are well suited for reducing parameter space, forcing time-shift-invariant features, and forming a compact front end for radio data. We look forward to evaluating more quantitative metrics on reconstructed data, evaluating additional multi-level binary or hard-sigmoid representations for multi-level non-constant-modulus signals, and investigating the use of attention models to remove channel variance from compact data representation requirements.

ACKNOWLEDGMENTS

The authors would like to thank the Bradley Department

of Electrical and Computer Engineering at the Virginia Poly-

technic Institute and State University, the Hume Center, and

DARPA all for their generous support in this work.

This research was developed with funding from the Defense

Advanced Research Projects Agency’s (DARPA) MTO Ofﬁce

under grant HR0011-16-1-0002. The views, opinions, and/or

ﬁndings expressed are those of the author and should not be

interpreted as representing the ofﬁcial views or policies of the

Department of Defense or the U.S. Government.

REFERENCES

[1] E. S. Sousa and S. Pasupathy, "Pulse shape design for teletext data transmission", IEEE Transactions on Communications, vol. 31, no. 7, pp. 871–878, 1983.
[2] G. E. Hinton and R. S. Zemel, "Autoencoders, minimum description length, and Helmholtz free energy", Advances in Neural Information Processing Systems, pp. 3–3, 1994.
[3] B. Sklar, Digital Communications. Prentice Hall, NJ, 2001, vol. 2.
[4] H. Lee, A. Battle, R. Raina, and A. Y. Ng, "Efficient sparse coding algorithms", in Advances in Neural Information Processing Systems, 2006, pp. 801–808.
[5] C. Clancy, J. Hecker, E. Stuntebeck, and T. O'Shea, "Applications of machine learning to cognitive radio networks", IEEE Wireless Communications, vol. 14, no. 4, pp. 47–52, 2007.
[6] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol, "Extracting and composing robust features with denoising autoencoders", in Proceedings of the 25th International Conference on Machine Learning, ACM, 2008, pp. 1096–1103.
[7] J. Masci, U. Meier, D. Cireşan, and J. Schmidhuber, "Stacked convolutional auto-encoders for hierarchical feature extraction", in Artificial Neural Networks and Machine Learning–ICANN 2011, Springer, 2011, pp. 52–59.
[8] T. M. Cover and J. A. Thomas, Elements of Information Theory. John Wiley & Sons, 2012.
[9] D. Kingma and J. Ba, "Adam: a method for stochastic optimization", arXiv preprint arXiv:1412.6980, 2014.
[10] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: a simple way to prevent neural networks from overfitting", The Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.
[11] F. Chollet, Keras, https://github.com/fchollet/keras, 2015.
[12] T. J. O'Shea and J. Corgan, "Convolutional radio modulation recognition networks", CoRR, vol. abs/1602.04105, 2016. [Online]. Available: http://arxiv.org/abs/1602.04105.