Unsupervised Representation Learning of Structured
Radio Communication Signals
Timothy J. O’Shea
Bradley Department of Electrical
and Computer Engineering
Virginia Tech
Arlington, VA
http://www.oshearesearch.com
Johnathan Corgan
Corgan Labs
San Jose, CA
http://corganlabs.com/
T. Charles Clancy
Bradley Department of Electrical
and Computer Engineering
Virginia Tech
Arlington, VA
http://www.stochasticresearch.com/
Abstract—We explore unsupervised representation learning
of radio communication signals in raw sampled time series
representation. We demonstrate that we can learn modulation
basis functions using convolutional autoencoders and visually
recognize their relationship to the analytic bases used in digital
communications. We also propose and evaluate quantitative metrics for quality of encoding using domain relevant performance metrics.
Index Terms—Radio communications, Software Radio, Cogni-
tive Radio, Deep Learning, Convolutional Autoencoders, Neural
Networks, Machine Learning
I. INTRODUCTION
Radio signals are all around us and serve as a key enabler for both communications and sensing in an increasingly interconnected and automated world. Much effort has gone into expert system design and optimization for both radio and radar systems over the past 75 years, considering exactly how to represent, shape, adapt, and recover these signals through a lossy, non-linear, distorted, and often interference-heavy channel environment.
Meanwhile, in recent years, heavily expert-tuned basis functions such as Gabor filters in the vision domain have been largely discarded in favor of features that can be naively learned and adapted using feature-learning approaches in deep neural networks.
Here we explore making the same transition from relatively simple expert-designed representation and coding to emergent, learned encoding. We expect to better optimize for channel capacity, to be able to translate information between channel and compact representations, and to better reason about what kind of information is in the radio spectrum, allowing less-supervised classification, anomaly detection, and numerous other applications.
This paper provides the first step towards that goal by
demonstrating that common radio communications signal
bases emerge relatively easily using existing unsupervised
learning methods. We outline a number of techniques which enable this to work, providing insight for continued investigation into this domain. This work extends the promising supervised feature learning work we have already begun in this domain [12].
A. Basis Functions for Radio Data
Widely used single-carrier radio signal time series modulation schemes today still use a relatively simple set of supporting basis functions to modulate information into the radio spectrum. Digital modulations typically use a set of sine wave basis functions with orthogonal or pseudo-orthogonal properties in phase, amplitude, and/or frequency. Information bits can then be used to map a symbol value $s_i$ to a location in this space $\phi_j, \phi_k, \ldots$. In figure 1 we show three potential basis functions, where $\phi_0$ and $\phi_1$ form phase-orthogonal bases used in Phase Shift Keying (PSK) and Quadrature Amplitude Modulation (QAM), while $\phi_0$ and $\phi_2$ show frequency-orthogonal bases used in Frequency Shift Keying (FSK). In the final panel of figure 1 we show a common mapping of constellation points into this space as typically used in Quadrature Phase Shift Keying (QPSK), where each symbol value encodes two bits of information.
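As a concrete illustration, the following sketch constructs such bases and a QPSK symbol mapping in NumPy; the sample rate, base frequency, and Gray-coded bit mapping are illustrative assumptions, not values from the paper.

```python
import numpy as np

fs, f0, T = 40, 4.0, 1.0                  # sample rate, base frequency, symbol period (illustrative)
t = np.arange(0, T, 1 / fs)

phi0 = np.cos(2 * np.pi * f0 * t)         # in-phase basis
phi1 = np.sin(2 * np.pi * f0 * t)         # phase-orthogonal basis (PSK/QAM)
phi2 = np.cos(2 * np.pi * 2 * f0 * t)     # frequency-orthogonal basis (FSK)

# Orthogonality over one symbol period: both inner products vanish.
assert abs(np.dot(phi0, phi1)) < 1e-9 and abs(np.dot(phi0, phi2)) < 1e-9

# QPSK: each 2-bit symbol selects one of four points in the (phi0, phi1) plane.
gray_map = {0b00: (1, 1), 0b01: (-1, 1), 0b11: (-1, -1), 0b10: (1, -1)}

def qpsk_waveform(bits2):
    i, q = gray_map[bits2]
    return (i * phi0 + q * phi1) / np.sqrt(2)
```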
Digital modulation theory in communications is a rich
subject explored in much greater depth in numerous great texts
such as [3].
Figure 1. Example Radio Communications Basis Functions
B. Radio Signal Structure
Once basis functions have been selected, data to transmit is
divided into symbols and each symbol period for transmission
occupies a sequential time slot. To avoid creating wideband
signal energy associated with rapid transitions in symbols,
a pulse shaping envelope such as a root-raised cosine or
sinc filter is typically used to provide a smoothed transition
between discrete symbol values in adjacent time-slots [1].
Three such adjacent symbol time slots can be seen in figure 2.
Ultimately, a sequence of pulse-shaped symbols with different values is summed to form the transmit signal time series, $s(t)$.
Figure 2. Discrete Symbols Envelopes in Time
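To make this symbol-and-envelope structure concrete, here is a minimal sketch that zero-stuffs a random QPSK symbol train and convolves it with a truncated sinc pulse (one of the shaping filters mentioned above); the oversampling rate matches the dataset's 4 samples per symbol, while the pulse truncation length is an assumption.

```python
import numpy as np

sps = 4          # samples per symbol, matching the dataset used later
num_syms = 22    # ~88 samples per example
span = 6         # pulse truncation length in symbols (an assumption)

# Random QPSK symbols: four unit-magnitude phase points.
rng = np.random.default_rng(0)
symbols = np.exp(1j * (np.pi / 4 + (np.pi / 2) * rng.integers(0, 4, num_syms)))

# Truncated sinc pulse as the shaping envelope.
t = np.arange(-span * sps, span * sps + 1) / sps
pulse = np.sinc(t)

# Zero-stuff the symbol train, then convolve: overlapping shaped symbols
# sum together to form the transmit time series s(t).
upsampled = np.zeros(num_syms * sps, dtype=complex)
upsampled[::sps] = symbols
s = np.convolve(upsampled, pulse, mode="same")
```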
C. Radio Channel Effects
The transmitted signal, $s(t)$, passes through a number of channel effects over the air before being received as $r(t)$ at the receiver. These include time delay, time scaling, phase rotation, frequency offset, additive thermal noise, and channel impulse responses convolved with the signal, all as random, unknown, time-varying processes. A closed form of all these effects might take roughly the following form:

$$ r(t) = e^{j n_{Lo}(t)} \int_{\tau=0}^{\tau_0} s\left(n_{Clk}(t-\tau)\right) h(\tau)\, d\tau + n_{Add}(t) \qquad (1) $$
This significantly complicates the data representation from
its original straightforward encoding at the transmitter when
considering the effects of wireless channels as they exist in
the real world.
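A rough numerical sketch of equation (1) follows; the impairment values (offset, delay, toy impulse response, SNR) are illustrative assumptions, and the continuous integral is approximated by a discrete convolution.

```python
import numpy as np

def apply_channel(s, fs, snr_db=20.0, cfo_hz=100.0, delay=3):
    """Apply a rough discrete-time version of eq. (1) to a transmit signal s."""
    rng = np.random.default_rng(1)
    n = np.arange(len(s))
    s = s * np.exp(2j * np.pi * cfo_hz * n / fs)   # e^{j n_Lo(t)}: oscillator/frequency offset
    h = np.array([1.0, 0.0, 0.2 + 0.1j])           # toy channel impulse response h(tau)
    s = np.convolve(s, h)[: len(n)]                # convolution with the channel
    s = np.roll(s, delay)                          # crude time delay (stand-in for n_Clk)
    p = np.mean(np.abs(s) ** 2)                    # additive thermal noise n_Add(t) at snr_db
    sigma = np.sqrt(p / (2 * 10 ** (snr_db / 10)))
    return s + sigma * (rng.standard_normal(len(s)) + 1j * rng.standard_normal(len(s)))
```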
II. LEARNING FROM RADIO SIGNALS
We focus initially on attempting to learn symbol basis
functions from existing modulation schemes in wide use
today. We focus on Quadrature Phase Shift Keying (QPSK) and Gaussian Binary Frequency Shift Keying (GFSK) as our modulations of interest in this work and hope to demonstrate learning their analytical basis functions naively.
A. Building a Dataset
We leverage the dataset from [12] and focus on learning
only a single modulation basis set at a time in this work.
This dataset includes the QPSK and GFSK modulations passed through realistic but relatively benign wireless channels, sampled as 88 complex-valued sample times per training example.
B. Unsupervised Learning
Autoencoders [2] have become a powerful and widely
used unsupervised learning tool. We review the autoencoder and several relevant improvements upon it which we leverage in this domain.
1) Autoencoder Architectures: Autoencoders (AE) learn an intermediate, possibly lower dimensional encoding of an input by using reconstruction cost as their optimization criterion, typically attempting to minimize Mean Squared Error (MSE). They consist of an encoder, which encodes raw inputs into a lower-dimensional sparse hidden representation, and a decoder, which reconstructs an estimate of the input vector as the output.
A number of improvements have been made on autoen-
coders which we leverage below.
2) Denoising Autoencoders: By introducing noise into the input of an AE during training, but evaluating its reconstruction against the unmodified input, Denoising Autoencoders [6] provide an additional input-noise regularization effect which is extremely well suited to the communications domain, where additive Gaussian thermal noise is always applied to our input vectors.
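In practice the DAE recipe reduces to corrupting inputs while keeping clean reconstruction targets. A minimal sketch, using AWGN as the corruption since thermal noise is additive Gaussian; the SNR value is an assumption and the arrays are assumed real-valued I/Q tensors:

```python
import numpy as np

def make_denoising_pair(clean, snr_db=10.0, seed=0):
    """Return (noisy input, clean target) pairs for denoising-AE training [6]."""
    rng = np.random.default_rng(seed)
    p = np.mean(np.abs(clean) ** 2)
    sigma = np.sqrt(p / 10 ** (snr_db / 10))
    return clean + sigma * rng.standard_normal(clean.shape), clean

# Usage: x_noisy, x_clean = make_denoising_pair(x_train)
```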
3) Convolutional Autoencoders: Convolutional Autoen-
coders [7] are simply autoencoders leveraging convolutional
weight configurations in their encoder and decoder stages.
By leveraging convolutional layers rather than fully connected
layers, we force time-shift invariance learning in our features
and reduce the number of parameters required to fit. Since our channel model involves random time shifting of the input signal, this is an important property for the radio application domain, which we feel is extremely well suited to this approach.
4) Regularization: We leverage heavy $L_2$ ($\|W\|_2$) weight regularization and $L_1$ ($\|h\|_1$) activity regularization in our AE to attempt to force it to learn orthogonal basis functions with minimal energy [4]. Strong $L_1$ activation regularization is especially important in the narrow hidden layer representation between encoder and decoder, where we would like to learn a maximally sparse, compact basis representation of the signal through symbols of interest occurring at specific times. Dropout [10] is also used as a form of regularization between intermediate layers, forcing the network to leverage all available weight bases to span the representation space.
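In Keras terms this recipe looks roughly as follows; the regularizer coefficients and dropout rate are assumptions, not the paper's reported settings:

```python
from keras import layers, regularizers

# Narrow bottleneck: L2 penalty on weights, strong L1 penalty on activations
# to encourage a sparse code; hard-sigmoid pushes activations toward {0, 1}.
bottleneck = layers.Dense(
    44,
    activation="hard_sigmoid",
    kernel_regularizer=regularizers.l2(1e-4),     # L2 weight regularization
    activity_regularizer=regularizers.l1(1e-3),   # L1 activity regularization
)
dropout = layers.Dropout(0.5)                     # dropout between intermediate layers
```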
C. Test Neural Network Architecture
Our goal in this effort was to obtain a minimum complexity
network which allows us to convincingly reconstruct the
signals of interest with a significant amount of information
compression. By using convolutional layers with only one
or two filters, we seek to achieve a maximally matched
small set of time-basis filters with some equivalence to the
expert features used to construct the signal. Dense layers with non-linear activations then sit between these to estimate the mapping between representation and reconstruction for those basis filters occurring at different times. The basic network architecture is shown below
in figure 3.
Figure 3. Convolutional Autoencoder Architecture Used
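A minimal Keras sketch of this architecture follows. The paper does not fully specify padding and layer plumbing, so the exact shapes (and thus the 516-wide flattened input to the first dense layer) are only approximated here; regularizer coefficients are assumptions.

```python
from keras import layers, models, regularizers

inp = layers.Input(shape=(88, 2))               # 88 complex samples as I/Q channels
x = layers.Conv1D(2, 40, padding="same")(inp)   # small set of learned time-basis filters
x = layers.Flatten()(x)
x = layers.Dropout(0.5)(x)
code = layers.Dense(44, activation="hard_sigmoid", name="code",
                    kernel_regularizer=regularizers.l2(1e-4),
                    activity_regularizer=regularizers.l1(1e-3))(x)  # sparse binary bottleneck
x = layers.Dense(176, activation="relu")(code)  # expand back to a wide representation
x = layers.Reshape((88, 2))(x)
out = layers.Conv1D(2, 81, padding="same")(x)   # decoder filter reconstructs I/Q
ae = models.Model(inp, out)
```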
D. Evaluation Methods for Reconstruction
For the scope of this work we use MSE as our reconstruction
metric for optimization. We seek to evaluate reconstructed signals in terms of bit error rate (BER) and signal-to-noise ratio (SNR), but we defer this to later work in the interest of space.
E. Visual Inspection of Learned Representations
Given a relatively informed view of what a smooth band-
limited QPSK signal looks like in reality, visual inspection of the reconstruction versus the noisy input signal is an important way to assess the quality of the representation and reconstruction we have learned. The sparse representation is
especially interesting as by selecting hard-sigmoid dense layer
activations we have effectively forced the network to learn a
binary representation of the continuous signal. Ideally there
exists a direct GF(2) relationship between the encoded bits
and the coded symbol bits of interest here. Figures 4 and 5
illustrate this reconstruction and sparse binary representation
learned.
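Reading this binary code out is then a matter of thresholding the bottleneck activations. A sketch reusing the `ae` model from the architecture sketch above (`x_test` stands in for a batch of held-out examples):

```python
import numpy as np
from keras import models

# Hard-sigmoid activations saturate toward 0/1, so thresholding recovers
# the learned 44-bit binary representation for each example.
encoder = models.Model(ae.input, ae.get_layer("code").output)
codes = (encoder.predict(x_test) > 0.5).astype(np.uint8)
```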
For GFSK, we show reconstructions and sparse representa-
tions in figure 6. In this case, the AE architecture converges
even faster to a low reconstruction error, but unfortunately the
sparse representations are not saturated into discrete values as
was the case for the constant modulus signal.
III. RESULTS
We consider the significance of these results below in the
context of the network complexity required for representation
and the compression ratio obtained.
Figure 4. QPSK Reconstruction 1 through Conv-AE
Figure 5. QPSK Reconstruction 2 through Conv-AE
Figure 6. GFSK Reconstruction 1 through Conv-AE
A. Learned Network Parameters
We use Adam [9] (a momentum method of SGD) to train
our network parameters as implemented in the Keras [11]
library. Evaluating our weight complexity, we have two 2D
convolutional layers, 2x1x1x40 and 1x1x1x81, making a total
of only 161 parameters learned in these layers to fit the
translation invariant filter features which form the primary
input and output for our network. The Dense layers which
provide mappings from occurrences of these filter weights to
a sparse code and back to a wide representation, consist of
weight matrices of 516x44 and 44x176 respectively, making
a total of 30448 dense floating point weight values.
Training is relatively trivial with a network and dataset of this size; we converge on a solution after about 2 minutes of training, 25 epochs on 20,000 training examples, using a Titan X GPU.
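The corresponding training call, under the same assumptions as the sketches above, with `x_noisy` and `x_clean` from the denoising sketch; batch size and validation split are guesses, not reported values:

```python
# MSE reconstruction loss optimized with Adam [9], 25 epochs over 20,000 examples.
ae.compile(optimizer="adam", loss="mse")
ae.fit(x_noisy, x_clean, epochs=25, batch_size=128, validation_split=0.1)
```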
Figure 7. QPSK Encoder Convolutional Weights
Figure 8. QPSK Decoder Convolutional Weights
In figure 7 we show the learned convolutional weight vectors in the encoder's first layer. We can clearly see a sinusoid occurring at varying time offsets to form detections, and a second sinusoid at double the frequency, both with some minimal pulse shaping apparent on them.
In the decoder convolutional weight vector in figure 8 we can clearly see the pulse shaping filter shape emerge in the learned weights.
In figure 9 we display the learned dense layer weight mappings of various symbol value and offset areas as represented by the convolutional filters. It is important to note that the 1x516 input is a linearized dimension of zero-padded I and Q inputs through two separate filters (2x2x129). We see that a single sparse hidden layer value equates to two pulses representing sinusoidal convolutional filter occurrences in time in the I and Q channels, with roughly a sinc or root-raised cosine window roll-off visibly represented at this time scale.
Figure 9. First Four Sparse Representation Dense Weights
B. Radio Signal Representation Complexity
To measure the compression we have achieved, we compare the effective number of bits required to represent the dynamic range of the continuous input and output signal domains with the number of bits required to store the signal in the hidden layer [8].
If we consider that our input signal contains roughly 20 dB of signal-to-noise ratio, we can approximate the number of bits required to represent each continuous value as follows:

$$ N_{eff} = \left\lceil \frac{20\,\text{dB} - 1.76}{6.02} \right\rceil = 4 \text{ bits} \qquad (2) $$
Given that we have 88*2 inputs at 4-bit resolution, compressed to 44 intermediate binary values, we get a compression ratio of 88*2*4/44 = 16x.
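The arithmetic behind equation (2) and the compression ratio, worked out explicitly:

```python
import math

n_eff = math.ceil((20 - 1.76) / 6.02)   # eq. (2): bits per real-valued sample at 20 dB SNR
raw_bits = 88 * 2 * n_eff               # 88 complex samples = 176 real values at 4 bits each
ratio = raw_bits / 44                   # versus the 44-bit hidden code
print(n_eff, raw_bits, ratio)           # -> 4 704 16.0
```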
Given that we are learning roughly 4 to 5 symbols per example, with 4 samples per symbol, the most compact possible representation of the data is on the order of 10 bits. However, the current encoder also encodes timing offset information, phase error, and generally all channel information needed to reconstruct the data symbols in their specific arrival mode. Given that this is on the order of 4x the most compact representation possible for the data symbols alone, it is not a bad starting point.
IV. CONCLUSIONS
We are able to obtain relatively good compression with autoencoders for radio communications signals; however, these must encode both the data bits and the channel state information, which limits attainable compression.
Hard-sigmoid activations surrounding the hidden layer seem effective, for constant modulus modulations, at saturating the representation into compact binary vectors, allowing us to encode 88 x 64-bit complex values into 44 bits of information without significant degradation.
Convolutional autoencoders are well suited for reducing
parameter space, forcing time-invariance features, and forming
a compact front-end for radio data. We look forward to evaluating more quantitative metrics on reconstructed data, exploring multi-level binary or hard-sigmoid representations for non-constant-modulus signals, and investigating the use of attention models to remove channel variance from compact data representation requirements.
ACKNOWLEDGMENTS
The authors would like to thank the Bradley Department
of Electrical and Computer Engineering at the Virginia Poly-
technic Institute and State University, the Hume Center, and
DARPA all for their generous support in this work.
This research was developed with funding from the Defense
Advanced Research Projects Agency’s (DARPA) MTO Office
under grant HR0011-16-1-0002. The views, opinions, and/or
findings expressed are those of the author and should not be
interpreted as representing the official views or policies of the
Department of Defense or the U.S. Government.
REFERENCES
[1] E. S. Sousa and S. Pasupathy, "Pulse shape design for teletext data transmission", IEEE Transactions on Communications, vol. 31, no. 7, pp. 871–878, 1983.
[2] G. E. Hinton and R. S. Zemel, "Autoencoders, minimum description length, and Helmholtz free energy", Advances in Neural Information Processing Systems, pp. 3–3, 1994.
[3] B. Sklar, Digital Communications. Prentice Hall, NJ, 2001, vol. 2.
[4] H. Lee, A. Battle, R. Raina, and A. Y. Ng, "Efficient sparse coding algorithms", in Advances in Neural Information Processing Systems, 2006, pp. 801–808.
[5] C. Clancy, J. Hecker, E. Stuntebeck, and T. O'Shea, "Applications of machine learning to cognitive radio networks", IEEE Wireless Communications, vol. 14, no. 4, pp. 47–52, 2007.
[6] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol, "Extracting and composing robust features with denoising autoencoders", in Proceedings of the 25th International Conference on Machine Learning, ACM, 2008, pp. 1096–1103.
[7] J. Masci, U. Meier, D. Cireşan, and J. Schmidhuber, "Stacked convolutional auto-encoders for hierarchical feature extraction", in Artificial Neural Networks and Machine Learning – ICANN 2011, Springer, 2011, pp. 52–59.
[8] T. M. Cover and J. A. Thomas, Elements of Information Theory. John Wiley & Sons, 2012.
[9] D. Kingma and J. Ba, "Adam: A method for stochastic optimization", arXiv preprint arXiv:1412.6980, 2014.
[10] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: A simple way to prevent neural networks from overfitting", The Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.
[11] F. Chollet, Keras, https://github.com/fchollet/keras, 2015.
[12] T. J. O'Shea and J. Corgan, "Convolutional radio modulation recognition networks", CoRR, vol. abs/1602.04105, 2016. [Online]. Available: http://arxiv.org/abs/1602.04105.
... O'Shea et al. have studied end-to-end communication systems based on unsupervised learning in depth. [4][5][6][7][8] Specifically, in Reference 5, By introducing the attention mechanism into modulation recognition, spatial transformer network (STN) 9 was utilized to normalize the received signal and two-dimensional affine transformation was used to complete the timing synchronization and symbol recovery at first. Then, the synchronization of frequency and phase is converted into the estimation of frequency difference and phase difference parameters by introducing complex convolution and coordinates transform of Cartesian to polar. ...
... The PIRD layer D w, 4 ...
Article
Full-text available
Aiming at the difficulty of the deep neural network (DNN) adapting to channel changes in communication systems, a channel synchronization deep neural network (CSDNN) based on deep learning (DL) is designed for realizing carrier synchronization, bit timing synchronization, and automatic gain control (AGC). By introducing a frequency‐domain cyclic convolution (FDCC) layer, the network transformed the time‐domain triangle activation into frequency domain linear activation taking FFT and IFFT matrixes as the activation function, solved the reverse gradient transmission‐blocking problems in training the time‐domain carrier synchronization neural network, effectively overcome the FFT inherent “fence” effect, and accurately compensated carrier frequency offset; By introducing a time‐domain cyclic convolution (TDCC) layer and the special frame structure design containing repetitive training sequence, the network training was completed to realize bit timing synchronization under the condition of the uncertain corresponding relationship between training data and labels. Combining phase inverse rotation dense (PIRD) layer, the network can be trained with very little training data to complete fast carrier synchronization and timing synchronization, at the same time adjust the received signal gain and suppress the jamming, which makes it is possible to train the channel synchronization deep neural network online under jamming environment, and provide a feasible way of realizing the intelligent communication system.
... These advantages have motivated researchers to extend ML applications to communications and signal processing. The applications of ML in communications include belief propagation algorithm to improve channel decoding [4], [5], learning of cryptography schemes [6], one-shot channel decoding [7], as well as compression [8]. One of the interesting applications is learning an end-to-end communications system using autoencoder proposed first in [1]. ...
Preprint
Full-text available
The traditional communication model based on chain of multiple independent processing blocks is constraint to efficiency and introduces artificial barriers. Thus, each individually optimized block does not guarantee end-to-end performance of the system. Recently, end-to-end learning of communications systems through machine learning (ML) have been proposed to optimize the system metrics jointly over all components. These methods show performance improvements but has a limitation that it requires a differentiable channel model. In this study, we have summarized the existing approaches that alleviates this problem. We believe that this study will provide better understanding of the topic and an insight into future research in this field.
... For example, Timothy et al proposed the use of convolutional neural network (CNN), breaking the limitation of traditional machine learning. [9][10][11][12] So far, many network architectures have emerged to deal with the identification of radio signals such as long short-term memory (LSTM), 13 CLDNN, 14,15 ResNet, 15,16 GANs, [17][18][19] AlexNet and GoogleNet 20 and so forth. ...
Article
Full-text available
Recently, deep leaning has been making great progress in automatic modulation classification, just like its success in computer vision. However, radio signals with harsh impairments (oscillator drift, clock drift, noise) would significantly degrade the performance of the existing classifiers. To overcome the problem and explore the depth reason, a hybrid attention convolution network is proposed to enhance the capability of feature extraction. First, a spatial transformer network module with long short‐term memory is introduced to synchronize and normalize radio signals. Second, a channel attention module is constructed to weight and assemble feature maps, exploring global feature representations with more context‐relevant information. By combining these two modules, a relatively lightweight classifier with complex convolution layer for final classification is further researched through visualization. Moreover, different structures of attention module are compared and optimized in detail. Experimental result shows that our proposed hybrid model achieves the best performance among all compared models when SNR is upper than 7 dB, and it peaks at 93.448 at 0 dB, 2.7% higher than that of CLDNN and 97.560 at 20 dB, 8.2% higher than that of ResNet. And our model can be more efficient after a trade‐off between accuracy and model size. In this work, we proposed a hybrid attention convolution network to classify different modulations to develop a robust automatically learning method for radio signals with harsh impairments (oscillator drift, clock drift, noise) and the lightweight network is further optimized.
Conference Paper
Full-text available
We present a neural network architecture able to efficiently detect modulation scheme in a portion of I/Q signals. This network is lighter by up to two orders of magnitude than other state-of-the-art architectures working on the same or similar tasks. Moreover, the number of parameters does not depend on the signal duration, which allows processing stream of data, and results in a signal-length invariant network. In addition, we have generated a dataset based on the simulation of impairments that the propagation channel and the demodulator can bring to recorded I/Q signals: random phase shifts, delays, roll-off, sampling rates, and frequency offsets. We benefit from this dataset to train our neural network to be invariant to impairments and quantify its accuracy at disentangling between modulations under realistic real-life conditions. Data and code to reproduce the results are made publicly available.
Article
Deep learning has been fully verified and accepted in the field of electromagnetic signal classification. However, in many specific scenarios, such as radio resource management for aircraft communications, labeled data are difficult to obtain, which makes the best deep learning methods at present seem almost powerless, because these methods need a large amount of labeled data for training. When the training dataset is small, it is highly possible to fall into overfitting, which causes performance degradation of the deep neural network. For few-shot electromagnetic signal classification, data augmentation is one of the most intuitive countermeasures. In this work, a generative adversarial network based on the data augmentation method is proposed to achieve better classification performance for electromagnetic signals. Based on the similarity principle, a screening mechanism is established to obtain high-quality generated signals. Then, a data union augmentation algorithm is designed by introducing spatiotemporally flipped shapes of the signal. To verify the effectiveness of the proposed data augmentation algorithm, experiments are conducted on the RADIOML 2016.04C dataset and real-world ACARS dataset. The experimental results show that the proposed method significantly improves the performance of few-shot electromagnetic signal classification.
Article
We numerically perform the classification of IQ-modulated radiofrequency signals using reservoir computing based on narrowband optoelectronic oscillators (OEOs) driven by a continuous-wave semiconductor laser. In general, the OEOs used for reservoir computing are wideband and are processing analog signals in the baseband. However, their hardware architecture is inherently inadequate to directly process radiotelecom or radar signals, which are modulated carriers. On the other hand, the high- QQ OEOs that have been developed for ultra-low phase noise microwave generation have the adequate hardware architecture to process such multi-GHz modulated signals, but they have never been investigated as possible reservoir computing platforms. In this article, we show that these high- QQ OEOs are indeed suitable for reservoir computing with modulated carriers. Our dataset (DeepSig RadioML) is composed with 11 analog and digital formats of IQ-modulated radio signals (BPSK, QAM64, WBFM, etc.), and the task of the high- QQ OEO reservoir computer is to recognize and classify them. Our numerical simulations show that with a simpler architecture, a smaller training set, fewer nodes and fewer layers than their neural network counterparts, high- QQ OEO-based reservoir computers perform this classification task with an accuracy better than the state-of-the-art, for a wide range of parameters. We also investigate in detail the effects of reducing the size of the training sets on the classification performance.
Article
Signal recognition is one of the significant and challenging tasks in the signal processing and communications field. It is often a common situation that there's no training data accessible for some signal classes to perform a recognition task. Hence, as widely-used in image processing field, zero-shot learning (ZSL) is also very important for signal recognition. Unfortunately, ZSL regarding this field has hardly been studied due to inexplicable signal semantics. This paper proposes a ZSL framework, signal recognition and reconstruction convolutional neural networks (SR2CNN), to address relevant problems in this situation. The key idea behind SR2CNN is to learn the representation of signal semantic feature space by introducing a proper combination of cross entropy loss, center loss and reconstruction loss, as well as adopting a suitable distance metric space such that semantic features have greater minimal inter-class distance than maximal intra-class distance. The proposed SR2CNN can discriminate signals even if no training data is available for some signal class. Moreover, SR2CNN can gradually improve itself in the aid of signal detection, because of constantly refined class center vectors in semantic feature space. These merits are all verified by extensive experiments with ablation studies.
Conference Paper
Full-text available
Sparse coding provides a class of algorithms for finding succinct representations of stimuli; given only unlabeled input data, it discovers basis functions that cap- ture higher-level features in the data. However, finding sparse codes remains a very difficult computational problem. In this paper, we present efficient sparse coding algorithms that are based on iteratively solving two convex optimization problems: an L1-regularized least squares problem and an L2-constrained least squares problem. We propose novel algorithms to solve both of these optimiza- tion problems. Our algorithms result in a significant speedup for sparse coding, allowing us to learn larger sparse codes than possible with previously described algorithms. We apply these algorithms to natural images and demonstrate that the inferred sparse codes exhibit end-stopping and non-classical receptive field sur- round suppression and, therefore, may provide a partial explanation for these two phenomena in V1 neurons.
Conference Paper
Full-text available
Previous work has shown that the dicul- ties in learning deep generative or discrim- inative models can be overcome by an ini- tial unsupervised learning step that maps in- puts to useful intermediate representations. We introduce and motivate a new training principle for unsupervised learning of a rep- resentation based on the idea of making the learned representations robust to partial cor- ruption of the input pattern. This approach can be used to train autoencoders, and these denoising autoencoders can be stacked to ini- tialize deep architectures. The algorithm can be motivated from a manifold learning and information theoretic perspective or from a generative model perspective. Comparative experiments clearly show the surprising ad- vantage of corrupting the input of autoen- coders on a pattern classification benchmark suite.
Conference Paper
Full-text available
We present a novel convolutional auto-encoder (CAE) for unsupervised feature learning. A stack of CAEs forms a convolutional neural network (CNN). Each CAE is trained using conventional on-line gradient descent without additional regularization terms. A max-pooling layer is essential to learn biologically plausible features consistent with those found by previous approaches. Initializing a CNN with filters of a trained CAE stack yields superior performance on a digit (MNIST) and an object recognition (CIFAR10) benchmark.
Conference Paper
We study the adaptation of convolutional neural networks to the complex temporal radio signal domain. We compare the efficacy of radio modulation classification using naively learned features against using expert features, which are currently used widely and well regarded in the field and we show significant performance improvements. We show that blind temporal learning on large and densely encoded time series using deep convolutional neural networks is viable and a strong candidate approach for this task.
Article
Deep neural nets with a large number of parameters are very powerful machine learning systems. However, overfitting is a serious problem in such networks. Large networks are also slow to use, making it difficult to deal with overfitting by combining the predictions of many different large neural nets at test time. Dropout is a technique for addressing this problem. The key idea is to randomly drop units (along with their connections) from the neural network during training. This prevents units from co-adapting too much. During training, dropout samples from an exponential number of different "thinned" networks. At test time, it is easy to approximate the effect of averaging the predictions of all these thinned networks by simply using a single unthinned network that has smaller weights. This significantly reduces overfitting and gives major improvements over other regularization methods. We show that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets. © 2014 Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever and Ruslan Salakhutdinov.
Article
We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions. The method is straightforward to implement and is based an adaptive estimates of lower-order moments of the gradients. The method is computationally efficient, has little memory requirements and is well suited for problems that are large in terms of data and/or parameters. The method is also ap- propriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The method exhibits invariance to diagonal rescaling of the gradients by adapting to the geometry of the objective function. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, on which Adam was inspired, are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. We demonstrate that Adam works well in practice when experimentally compared to other stochastic optimization methods.
Chapter
Half-title pageSeries pageTitle pageCopyright pageDedicationPrefaceAcknowledgementsContentsList of figuresHalf-title pageIndex
Chapter
Information theory answers two fundamental questions in communication theory: what is the ultimate data compression (answer: the entropy H), and what is the ultimate transmission rate of communication (answer: the channel capacity C). For this reason some consider information theory to be a subset of communication theory. We will argue that it is much more. Indeed, it has fundamental contributions to make in statistical physics (thermodynamics), computer science (Kolmogorov complexity or algorithmic complexity), statistical inference (Occam's Razor: “The simplest explanation is best”) and to probability and statistics (error rates for optimal hypothesis testing and estimation). The relationship of information theory to other fields is discussed. Information theory intersects physics (statistical mechanics), mathematics (probability theory), electrical engineering (communication theory) and computer science (algorithmic complexity). We describe these areas of intersection in detail.
Article
This paper studies the problem of designing a suitable pulse shape for teletext data transmission. The following four criteria are used: 1) Nyquist I criterion, 2) Nyquist II criterion, 3) degree of overshoots in the channel signal, and 4) robustness to sampling phase jitter. For system bandwidths less than the inverse-baud rate, it is not possible to satisfy all these criteria simultaneously; tradeoffs that have to be made are illustrated. Several candidate pulse shapes are given and a composite criterion developed. A pulse shape, which satisfies the Nyquist I criterion and is closest to satisfying the Nyquist II criterion, in a sum-of-squares-of-deviations sense, is recommended.