
A Survey on Machine Learning-Based Mobile Big Data Analysis: Challenges and Applications


Abstract

This paper attempts to identify the requirements and the development of machine learning-based mobile big data (MBD) analysis by discussing the challenges in mobile big data. Furthermore, it reviews state-of-the-art applications of data analysis in the area of MBD. Firstly, we introduce the development of MBD. Secondly, the frequently applied data analysis methods are reviewed. Three typical applications of MBD analysis, namely, wireless channel modeling, human online and offline behavior analysis, and speech recognition in the Internet of Vehicles, are then introduced. Finally, we summarize the main challenges and future development directions of mobile big data analysis.
Review Article
Jiyang Xie,1 Zeyu Song,1 Yupeng Li,2 Yanting Zhang,3 Hong Yu,1 Jinnan Zhan,2
Zhanyu Ma,1 Yuanyuan Qiao,3 Jianhua Zhang,2 and Jun Guo1
1Pattern Recognition and Intelligent Systems Lab., Beijing University of Posts and Telecommunications, Beijing, China
2State Key Lab. of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China
3Center for Data Science, Beijing University of Posts and Telecommunications, Beijing, China
Correspondence should be addressed to Zhanyu Ma and Jianhua Zhang.
Received 9 April 2018; Accepted 7 June 2018; Published 1 August 2018
Academic Editor: Liu Liu
Copyright © 2018 Jiyang Xie et al. This is an open access article distributed under the Creative Commons Attribution License, which
permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1. Introduction
With the success of wireless local area network (WLAN) technology (a.k.a. Wi-Fi) and second/third/fourth generation (2G/3G/4G) mobile networks, the number of mobile phones, which is . billion, . per  inhabitants all over the world in , is rising dramatically []. Nowadays, mobile phones can not only send voice and text messages but also access the Internet easily and conveniently, which has been recognized as the most revolutionary development of the mobile Internet (M-Internet). Meanwhile, worldwide active mobile-broadband subscriptions in  have increased to . billion, which is .% higher than that in  [].
Figure 1 shows the numbers of mobile-cellular telephone subscriptions and active mobile-broadband subscriptions (million) in the world and its main districts from 2010 to 2017; both numbers increase every year. With the M-Internet, various kinds of content (image, voice, video, etc.) can be sent and received anywhere, and related applications have emerged to satisfy people's requirements in work, study, daily life, entertainment, education, and healthcare. In China, the mobile application giants, i.e., Baidu, Alibaba, and Tencent, held % of the daily M-Internet online time spent in apps, which was about , minutes in  []. This figure indicates that the M-Internet has entered a rapid growth stage.
Nowadays, more than  billion smartphones are in use, producing a great quantity of data every day. This situation has far-reaching impacts on society and social interaction and creates great opportunities for business. Meanwhile, with the rapid development of the Internet of Things (IoT), much more data are automatically generated by millions of machine nodes with growing mobility, for example, sensors carried by moving objects or vehicles. The volume, velocity, and variety of these data are increasing extremely fast, and they will soon become the new criterion for data analytics by enterprises and researchers. Therefore, mobile big data (MBD) are already part of our lives and are being enriched rapidly. The explosive growth of data volume, together with the increasing bandwidth and data rates of the M-Internet, has followed the same exponential trend as Moore's Law for semiconductors []. According to the prediction in [], the global data volume will grow to  zettabytes (1 zettabyte = $1 \times 10^{21}$ bytes) by  and  zettabytes by . For the M-Internet, . exabytes (1 exabyte = $1 \times 10^{18}$ bytes) of mobile data traffic were generated per month in  [] and . exabytes in  [], and monthly traffic is forecast to reach  exabytes by  [] and  exabytes by  []. Based on these statistics and predictions, the concept of MBD has emerged.
Wireless Communications and Mobile Computing
Volume 2018, Article ID 8738613, 19 pages
Figure 1: Mobile-cellular telephone subscriptions (million) in (a) and active mobile-broadband subscriptions (million) in (b) of the world and main districts, 2010 to 2017 []. Legend of (b): the Americas, Asia & Pacific, other districts.
MBD can be considered a huge quantity of mobile data that are generated by a massive number of mobile devices and cannot be processed and analyzed by a single machine [, ]. MBD is playing, and will continue to play, a more important role than ever before owing to the popularization of mobile devices, including smartphones and IoT gadgets, especially in the era of 4G and the forthcoming fifth generation (5G) [, ].
With the rapid development of information technologies, the data generated in different technical fields are growing explosively []. Big data has broad application prospects in many fields and has become an important national strategic resource []. In the era of big data, many data analysis systems face big challenges as the volume of data increases. Therefore, MBD analysis is currently a highly focused topic. The importance of MBD analysis is determined by its role in developing complex mobile systems that support a variety of intelligent interactive services, for example, healthcare, intelligent energy networks, smart buildings, and online entertainment []. MBD analysis can be defined as mining terabyte-level or petabyte-level data collected from mobile users and wireless devices at the network level or the app level to discover unknown, latent, and meaningful patterns and knowledge with large-scale machine learning methods [].
Present MBD requirements are based on software-defined approaches in order to be more scalable and flexible. The future M-Internet environment will be even more complex and interconnected []. For this purpose, MBD data centers need to collect statistical information about millions of users and obtain meaningful results with proper MBD analysis methods. Owing to the decreasing price of data storage and widely accessible high-performance computers, machine learning has expanded not only in theoretical research but also into various application areas of big data. Even so, machine learning-based MBD analysis still has a long way to go.
Machine learning technology has been used by many Internet companies in their services: from web searches [, ] to content filtering [] and recommendation [, ] on online social communities, shopping websites, and content distribution platforms. Furthermore, it is also frequently
ers, and smart furniture. Machine learning systems are used to detect and classify objects, return the most relevant search results, understand voice commands, and analyze usage habits. In recent years, big data machine learning has become a research hot spot []. Some conventional machine learning methods based on the Bayesian framework [–], distributed optimization [–], and matrix factorization [] can be applied to the aforementioned applications and have obtained good per-
have always been trying to fill their machine learning models with more and more data []. Furthermore, the data we
dynamic and sparse value; these features make it harder to analyze MBD with conventional machine learning methods. Therefore, the aforementioned applications implemented with conventional machine learning methods have reached a bottleneck owing to low accuracy and poor generalization. Recently, a class of novel techniques, called deep learning,
and has obtained good performance []. Machine learning, especially deep learning, has become an essential technique for using big data effectively.
Most conventional machine learning methods are shallow learning structures with one hidden layer or none. These methods perform well in practice and have been analyzed precisely in theory. However, shallow machine learning methods show their weakness when dealing with high-dimensional or complicated data. Deep learning methods were developed to learn better representations automatically with deep structures, using supervised or unsupervised strategies [, ]. The features extracted by deep hidden layers are used for regression, classification, or visualization. Deep learning uses more hidden layers and parameters to fit functions that can extract high-level features from complex data. The parameters are set automatically using large amounts of unsupervised data [, ]. The hidden layers of deep learning algorithms help the model learn better representations of data: the higher layers learn specific and abstract features from the global features learned by the lower layers. Many surveys show that stacked nonlinear feature extractors, such as deep learning methods, usually perform better in machine learning tasks, for example, more accurate classification [], better learning of probabilistic data models [], and the extraction of robust features []. Deep learning methods have proved useful in data mining, natural language processing, and computer vision applications. A more detailed introduction to deep learning is presented in Section 3.1.4.
Articial Intelligence (AI) is a technology that develops
theories, methods, techniques, and applications that simulate
or extend human brain abilities. e research of observ-
ing, learning, and decision-making process in human brain
motivates the development of deep learning, which was
rst designed aiming to emulate the human brain’s neural
structures. Further observation on neural signals processing
and the eect on brain mechanisms [–] inspired the
architecture design of deep learning network, using layers
and neuron connections to generalize globally. Conventional
methods such as support vector machines, decision trees, and
case-based reasoning which are based on statistics or logic
knowledge of human may fall short when facing complex
structure or relationships of data. Deep learning methods
can learn patterns and relationships from hidden layers and
may benet the signal processing study in human brain
with visualization methods of neural network. Deep learning
has attracted much attention from AI researchers recently
because of its state-of-the-art performance in machine learn-
ing domains including no only the aforementioned natural
language processing (NLP), but also speech recognition [,
], collaborative ltering [], and computer vision [, ].
Deep learning has been successfully used in industrial products that have access to big data from users. Companies in the United States such as Google, Apple, and Facebook, and Chinese companies such as Baidu, Alibaba, and Tencent, have been collecting and analyzing data from millions of users and pushing forward deep learning-based applications. For example, Tencent YouTu Lab has developed identification (ID) card recognition and bank card recognition systems. These systems read information from card images to check user information during registration and bank information during purchases. The identification systems are
data provided by Tencent. Apple developed Siri, a virtual intelligent assistant on iPhones, which answers questions about weather, location, and news in response to voice commands and can dial numbers or send text messages. Siri also utilizes deep learning methods and uses data from Apple services []. Google uses deep learning in its Google translation service, with massive data collected by the Google search engine.
MBD contains a large variety of information from offline data and online real-time data streams generated by smart mobile terminals, sensors, and services. With advances in data analysis technologies, it has accelerated various applications, such as collaborative filtering-based recommendation [, ], user social behavior characteristics analysis [–], vehicle communications in the Internet of Vehicles (IoV) [], online smart healthcare [], and city residents' activity analysis []. Although machine learning-based methods are widely applied in MBD fields and perform well on real data, the present methods still need further development. Therefore, five main challenges facing machine learning-based MBD analysis should be considered: large-scale and high-speed M-Internet, overfitting and underfitting problems, the generalization problem, cross-modal learning, and extended channel dimensions.
This paper attempts to identify the requirements and the development of machine learning-based mobile big data analysis by discussing the challenges of MBD and reviewing state-of-the-art applications of data analysis. The remainder of this paper is organized as follows. Section 2 introduces the development of data collection and the properties of MBD. The frequently adopted methods of data analysis and typical applications are reviewed in Section 3. Section 4 summarizes the future challenges of MBD analysis and provides suggestions.
2. Development and Collection of the Mobile Big Data
2.1. Data Collection. Data collection is the foundation of a data processing and analysis system. Data are collected from mobile smart terminals and Internet services, generally called mobile Internet devices (MIDs). These are multimedia-capable mobile devices that provide wireless Internet access and include smartphones, wearable computers, laptop computers, wireless sensors, etc. [].
MBD can be divided into two hierarchical data forms, from bottom to top: transmission data and application data. The transmission data focus on solving channel modeling [, ] and user access problems corresponding to the physical transmission system of the M-Internet. On this foundation, the application data focus on applications based on MBD, including social network analysis [–], user behavior analysis [, , ], speech analysis and decision-making in IoV [–
finance services [, ], etc.
Due to the heterogeneity of the M-Internet and the variety of access devices, the collected data are unstructured and usually come in many categories and formats. This makes data preprocessing an essential part of a data processing and analysis system, needed to ensure that the input data are complete and reliable []. Data preprocessing can be divided into three steps: data cleaning, generation of implicit ratings, and data integration [].
Figure 2: The procedures of data collection and preprocessing.
(1) Data Cleaning. Because of possible equipment failures, transmission errors, or human factors, raw data are generally "dirty data" that cannot be used directly []. Therefore, data cleaning methods, including outlier detection and denoising, are applied during data preprocessing to obtain data of the required quality. Manually removing erroneous data is difficult and, given the massive volume of MBD, practically impossible. Common data cleaning methods can alleviate the dirty data problem to some extent by training support vector regression (SVR) models [], multiple linear regression models [], autoencoders [], Bayesian methods [–], unsupervised methods [], or information-theoretic models [].
(2) Generation of Implicit Ratings. Generation of implicit
of rating data increases rapidly by analyzing specific user behaviors to solve the data sparsity problem with machine learning algorithms, for example, neural networks and decision trees [].
(3) Data Integration. Data integration is the step that combines data from different sources with different formats and categories and handles missing data fields [].
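Below is a minimal sketch of integration. It assumes two hypothetical sources, a GPS log and an app-usage log, that use different field names and time units. Each record is mapped onto one shared schema, and any field a source does not provide is filled with `None` so that missing data can be handled downstream. The schema and converter names are illustrative.

```python
# Hypothetical common schema onto which every source is mapped.
SCHEMA = ("user_id", "timestamp", "lat", "lon", "app")


def from_gps_log(rec):
    # Hypothetical source A: {"uid": str, "ts": seconds, "position": (lat, lon)}
    lat, lon = rec.get("position") or (None, None)
    return {"user_id": rec["uid"], "timestamp": float(rec["ts"]), "lat": lat, "lon": lon}


def from_app_log(rec):
    # Hypothetical source B: {"user": str, "time_ms": milliseconds, "package": str}
    return {"user_id": rec["user"], "timestamp": rec["time_ms"] / 1000.0,
            "app": rec.get("package")}


def integrate(sources):
    """Merge (converter, records) pairs into one time-ordered list of schema rows."""
    rows = []
    for convert, records in sources:
        for rec in records:
            mapped = convert(rec)
            # Every row carries every schema field; fields a source lacks become None.
            rows.append({field: mapped.get(field) for field in SCHEMA})
    return sorted(rows, key=lambda row: row["timestamp"])
```

Keeping one small converter per source keeps format knowledge local, so adding a new source does not touch the merge logic.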
Figure 2 represents the procedures of data collection and preprocessing.
2.2. Properties of Mobile Big Data. The MBD brings a massive
methods for its high dimensionality, heterogeneity, and other complex features from applications, such as planning, operation and maintenance, optimization, and marketing []. This section discusses the five Vs (short for volume, velocity, variety, value, and veracity) features [] deriving
been improved in the M-Internet, while it enables users to access the Internet anytime and anywhere [].
(1) Volume: Large Number of MIDs, Exabyte-Level Data, and High-Dimensional Data Space. Volume is the most obvious feature of MBD. In the forthcoming 5G network and the era of MBD, conventional storage and analysis methods are incapable of processing the x or more wireless traffic volume [, ]. It is therefore urgent to improve present MBD analysis methods and to propose new ones. These methods should be simple and cost-effective to implement for MBD processing and analysis. Moreover, they should also be effective enough without requiring a massive amount of data
various fields [].
(2) Velocity: Real-Time Data Streams and Efficiency Requirement. Velocity can be considered the speed at which
continuously streaming into the servers in real time and makes the original batch process break down []. Because MBD is generated at such a high rate, velocity becomes an efficiency requirement of MBD analysis. Real-time data processing and analysis are extremely important for maximizing the value of MBD streams [].
(3) Variety: Heterogeneous and Nonstructured Mobile Multimedia Contents. MBD is heterogeneous: mobile data traffic comes from spatially distributed data resources (i.e., MIDs). This heterogeneity gives rise to the variety of MBD and makes it more complex []. Meanwhile, nonstructured MBD also contributes to its variety. MBD can be divided into structured data, semistructured data, and unstructured data. Here, unstructured data are usually
contents []; therefore, they are difficult to analyze before data cleaning and integration.
(4) Value: Mining Hidden Knowledge and Patterns from Low Density Value Data. Value, or the low density value of MBD, is caused by a large amount of useless or repeated information
MBD analysis, which is the extraction of hidden knowledge and patterns. The purified data can provide comprehensive information that yields more effective analysis of user demands, behaviors, and habits []. They also support better system management and more accurate demand prediction and decision-making [].
(5) Veracity: Consistency, Trustworthiness, and Security of MBD. The veracity of MBD includes two parts: data consistency and trustworthiness []. It can also be summarized as data quality. MBD quality is not guaranteed because of transmission channel noise, equipment malfunction,
(for instance, malicious invasion) resulting in low-quality data points []. Veracity of MBD ensures that the data used in the analysis process are authentic and protected from unauthorized access and modification [].
3. Applications of Machine Learning Methods in the Mobile Big Data Analysis
3.1. Development of Data Analysis Methods. In this section, we present some recent achievements in data analysis from four different perspectives.
3.1.1. Divide-and-Conquer Strategy and Sampling of Big Data. The divide-and-conquer strategy is a computing paradigm for dealing with big data problems. The development of distributed and parallel computing makes the divide-and-conquer strategy particularly important.
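The sketch below applies divide-and-conquer in its simplest form. The data are partitioned, each part is summarized independently (the step that could run on separate workers), and the summaries are merged exactly. Here each summary is (count, mean, sum of squared deviations), combined with the standard pairwise merge formula. The function names are hypothetical.

```python
import math


def summarize(chunk):
    """Conquer step for one non-empty partition: (count, mean, M2), M2 = sum of squared deviations."""
    n = len(chunk)
    mean = sum(chunk) / n
    m2 = sum((x - mean) ** 2 for x in chunk)
    return n, mean, m2


def combine(a, b):
    """Merge two partial summaries exactly (pairwise update formula)."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta ** 2 * n_a * n_b / n
    return n, mean, m2


def partitioned_mean_var(data, num_parts):
    size = math.ceil(len(data) / num_parts)
    parts = [data[i:i + size] for i in range(0, len(data), size)]
    # Each summarize() call is independent and could be dispatched to a separate worker.
    partials = [summarize(p) for p in parts]
    total = partials[0]
    for p in partials[1:]:
        total = combine(total, p)
    n, mean, m2 = total
    return mean, m2 / (n - 1)  # mean and sample variance
```

The merge is exact, so the result does not depend on how the data are split. This is the property that makes partitioned processing safe for such statistics.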
Generally speaking, greater diversity of samples in the learning data does not necessarily benefit the training results. Redundant and noisy data can incur a large storage cost, reduce the efficiency of the learning algorithm, and affect the learning accuracy. Therefore, it is preferable to select representative samples to form a subset of the original sample space according to a certain performance standard, such as maintaining the distribution of samples and the topological structure, or keeping the classification accuracy.
formed subset to finish the learning task. In this way, we can maintain or even improve the performance of big data analysis algorithms with minimal computing and storage resources. The need to learn with big data demands on sample selection
suitable for smaller data sets, such as the traditional condensed nearest neighbor [], the reduced nearest neighbor [], and the edited nearest neighbor []. The core concept of these methods is to find the minimum consistent subset. Finding this subset requires testing every sample, and the result is very sensitive to the initialization of the subset and the order of the samples. Li et al. [] proposed a method to select the classification and edge boundary
They keep the spatial information of the original data but need to calculate k-means for each sample. Angiulli et al. [, ] proposed a fast condensed nearest neighbor (FCNN) algorithm based on the condensed nearest neighbor rule, which tends to choose samples on the classification boundary.
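The condensed nearest neighbor rule mentioned above can be sketched in a few lines. It grows a subset until 1-NN on the subset classifies every training sample correctly. As the text notes, the result depends on the seed sample and on the visiting order; the subset is consistent (for data without conflicting duplicates) but not guaranteed to be minimal. This is a plain reimplementation for illustration, not the FCNN algorithm of Angiulli et al.

```python
import numpy as np


def condensed_nearest_neighbor(X, y):
    """Hart-style condensed nearest neighbor: indices of a subset S such that
    1-NN on S classifies every training sample correctly."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    store = [0]          # seed with the first sample: the outcome depends on this choice
    in_store = {0}
    changed = True
    while changed:       # repeat passes until nothing new is absorbed
        changed = False
        for i in range(len(X)):
            if i in in_store:
                continue
            dists = np.linalg.norm(X[store] - X[i], axis=1)
            if y[store[int(np.argmin(dists))]] != y[i]:
                store.append(i)   # misclassified by the current subset: keep it
                in_store.add(i)
                changed = True
    return np.array(store)
```

On two well-separated clusters, the loop keeps only one sample per cluster. Each pass, however, compares every sample with the whole store, which is why faster variants matter for big data.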
Jordan [] proposed statistical inference method for big
data. When dealing with statistical inference with divide-
and-conquer algorithm, we need to get condence intervals
from huge data sets. By data resampling and then calculating
condence interval, the Bootstrap theory aims to obtain the
uctuation of the evaluation value. But it does not t big
data. e incomplete sampling of data can lead to erroneous
range uctuations. Data sampling should be correct in order
to provide statistical inference calibration. An algorithm
named Bag of Little Bootstraps was proposed, which can
not only avoid this problem, but also has many advantages
on computation. Another problem discussed in [] is
massive matrix calculation. e divide-and-conquer strategy
is heuristic, which has a good eect in practical application.
However, new theoretical problems arise when trying to
describe the statistical properties of partition algorithm. To
this end, the support concentration theorem based on the
theory of random matrices has been proposed.
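Below is a sketch of the Bag of Little Bootstraps for the standard error of a mean, assuming the usual formulation. Draw s small subsets of size b = n^γ without replacement. From each subset, simulate full-size (n-point) resamples using multinomial counts, then average the per-subset error estimates. The parameter values (γ = 0.6, s = 10, r = 50) and the function name are illustrative choices, not values from the cited work.

```python
import numpy as np


def blb_stderr_of_mean(data, gamma=0.6, s=10, r=50, seed=0):
    """Bag of Little Bootstraps estimate of the standard error of the sample mean."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n = len(data)
    b = int(n ** gamma)                      # little subset size, b = n^gamma << n
    per_subset = []
    for _ in range(s):
        subset = rng.choice(data, size=b, replace=False)
        means = []
        for _ in range(r):
            # A full-size resample of n points drawn from only b distinct values,
            # represented by multinomial counts instead of n materialized points.
            counts = rng.multinomial(n, np.full(b, 1.0 / b))
            means.append(np.dot(counts, subset) / n)
        per_subset.append(np.std(means, ddof=1))  # error assessment on this subset
    return float(np.mean(per_subset))            # average over subsets
```

Each resample touches only b distinct points, so memory per subset scales with b rather than n, and the s subsets can be processed in parallel.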
In conclusion, data partitioning and parallel processing is the basic strategy for dealing with big data. However, current partitioning and parallel processing strategies use little knowledge of the data distribution, which hurts the load balancing and computational efficiency of big data processing. Hence, there is an urgent need to learn the distribution of big data in order to optimize load balancing.
3.1.2. Feature Selection of Big Data. In data mining tasks such as document classification and indexing, the dataset is usually large and contains many records and features, which makes algorithms inefficient. Feature selection eliminates irrelevant features and speeds up the analysis task, producing a better-performing model with less running time.
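As a simple illustration of filter-style feature selection, the sketch ranks features by absolute Pearson correlation with the target and keeps the top k. This is a generic baseline, not one of the specific methods reviewed below, and the function name is hypothetical.

```python
import numpy as np


def select_top_k_by_correlation(X, y, k):
    """Filter-style selection: indices of the k columns of X most correlated with y."""
    Xc = X - X.mean(axis=0)                     # center every feature
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
    corr = np.abs(Xc.T @ yc) / denom            # |Pearson r| for every feature
    return np.argsort(corr)[::-1][:k]           # largest first
```

Filters like this cost one pass over the data but ignore feature interactions. Wrapper and hybrid searches, such as those discussed below, trade more computation for better subsets.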
deal with high-dimensional and sparse data. Traffic networks, smartphone communication records, and information shared on the Internet provide a large amount of high-dimensional data, for which a tensor (i.e., a multidimensional array) is a natural representation. Tensor decomposition therefore becomes an important tool for summarization and analysis. Kolda [] proposed a memory-efficient form of the Tucker decomposition, named memory-efficient Tucker (MET) decomposition, which reduces time and space costs beyond what traditional tensor decomposition algorithms can achieve. During decomposition, MET adaptively selects an execution strategy based on the available memory, maximizing computation speed within that memory. MET avoids handling the large number of sporadic intermediate results produced during the calculation. Its adaptive selection of the operation sequence not only eliminates the intermediate overflow problem but also saves memory without reducing precision.
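MET itself is not reproduced here. To show what a Tucker decomposition computes, the sketch below uses the truncated higher-order SVD (HOSVD), a standard way to obtain Tucker factors. Note that it materializes full unfoldings, so it is exactly the kind of memory-hungry baseline that MET is designed to avoid. The helper names are our own.

```python
import numpy as np


def unfold(T, mode):
    """Mode-n unfolding: move axis `mode` to the front and flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)


def mode_dot(T, M, mode):
    """n-mode product T x_mode M, where M has shape (new_size, T.shape[mode])."""
    return np.moveaxis(np.tensordot(M, T, axes=(1, mode)), 0, mode)


def hosvd(T, ranks):
    """Truncated HOSVD: Tucker factors from each unfolding's leading singular vectors."""
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        factors.append(U[:, :r])
    core = T
    for mode, U in enumerate(factors):
        core = mode_dot(core, U.T, mode)   # project onto each factor subspace
    return core, factors


def reconstruct(core, factors):
    T = core
    for mode, U in enumerate(factors):
        T = mode_dot(T, U, mode)
    return T
```

For a tensor whose multilinear rank is at most `ranks`, the reconstruction is exact up to rounding. Otherwise, it gives a quasi-optimal low-rank approximation.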
On the other hand, Wahba [] proposed two approaches to statistical machine learning models that involve discrete, noisy, and incomplete data: regularized kernel estimation (RKE) and robust manifold unfolding (RMU). These methods use the dissimilarity between training information to obtain a low-rank nonnegative definite matrix. The matrix is then embedded into a low-dimensional
of various learning modes. Similarly, most online learning research needs to access all features of training instances. This classic scenario does not always suit practical applications with high-dimensional data instances or expensive feature sets. To overcome this limitation, Hoi et al. [] proposed an efficient algorithm that solves the online feature selection problem by predicting with a small number of active features, based on their study of sparse regularization and truncation
public data sets for feature selection performance.
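The truncation idea can be sketched as follows, assuming a linear classifier with labels in {-1, +1} and a budget of nonzero weights. On a margin violation, the algorithm takes a gradient step, projects onto an L2 ball, and keeps only the largest-magnitude weights. This follows the general shape of online feature selection by truncation as we understand it. The step size, λ, budget, and function name are assumptions, not values from Hoi et al.

```python
import numpy as np


def online_feature_selection(stream, dim, budget, eta=0.2, lam=0.01):
    """Online linear classifier that keeps at most `budget` nonzero weights.

    stream: iterable of (x, y) with x a length-`dim` array and y in {-1, +1}.
    """
    w = np.zeros(dim)
    radius = 1.0 / np.sqrt(lam)
    for x, y in stream:
        if y * np.dot(w, x) <= 1:                    # hinge-loss margin violated
            w = w + eta * y * x                      # gradient step
            norm = np.linalg.norm(w)
            if norm > radius:
                w *= radius / norm                   # project onto the L2 ball
            keep = np.argsort(np.abs(w))[-budget:]   # indices of the largest |w_j|
            truncated = np.zeros(dim)
            truncated[keep] = w[keep]                # all other weights are zeroed
            w = truncated
    return w
```

Only the features with nonzero weights are needed at prediction time. This lets such methods avoid acquiring expensive features for every instance.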
The traditional self-organizing map (SOM) can be used for feature extraction, but its low speed limits its use on large data sets. Sagheer [] proposed a fast self-organizing map (FSOM) to solve this problem. The goal of this method is to find the region of the feature space in which the data are mainly distributed. If such a region exists, features can be extracted from that region instead of from the whole feature space. In this way, we can greatly reduce extraction
Anaraki [] proposed a threshold method of fuzzy
rough set feature selection based on fuzzy lower approxima-
tion. is method adds a threshold to limit the QuickReduct
feature selection. e results of the experiment prove that this
method can also help the accuracy of feature extraction with
lower running time.
Gheyas et al. [] proposed a hybrid algorithm of sim-
ulated annealing and genetic algorithm (SAGA), combining
the advantages of simulated annealing algorithm, genetic
algorithm, greedy algorithm, and neural network algorithm,
to solve the NP-hard problem of selecting optimal feature
subset. e experiment shows that this algorithm can nd
better optimal feature subset, reducing the time cost sharply.
Gheyas pointed in as conclusion that there is seldom a single
algorithm which can solve all the problems; the combination
of algorithms can eectively raise the overall aect.
To sum up, because of the complexity, high dimen-
sionality, and uncertain characteristics of big data, it is an
data processing by using dimension reduction and feature
selection technology.
3.1.3. Big Data Classification. Supervised learning (classification) faces the new challenge of dealing with big data. Classification problems involving large-scale data are now ubiquitous, but traditional classification algorithms are poorly suited to big data processing.
(1) Support Vector Machine (SVM). Traditional statistical
facing big data. () Traditional statistical machine learning methods always involve intensive computing, which
of model that fits the robust and nonparametric confidence interval is unknown. Lau et al. [] proposed an online support vector machine (SVM) learning algorithm for classifying sequentially provided input data. The resulting classifier is faster, uses fewer support vectors, and generalizes better. Laskov et al. [] proposed a rapid, stable, and robust numerical incremental SVM learning method. Chang
as a library for SVM code implementation.
In addition, Huang et al. [] presented a large margin classifier, M4. Unlike other large margin classifiers, which construct the separating hyperplane either locally or globally, this model can learn both local and global decision boundaries. SVM and the minimax probability machine (MPM) are closely connected with this model. The model has important theoretical significance. Furthermore, the optimization problem of the maxi-min margin machine (M4) can be solved in polynomial time.
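Online SVM training can be illustrated with Pegasos-style stochastic sub-gradient descent for a linear SVM, which processes one sample at a time. This is a well-known stand-in, not the algorithm of Lau et al. or Laskov et al. The function name, λ, and epoch count are illustrative, and an established SVM library is the safer choice in practice.

```python
import numpy as np


def pegasos_svm(X, y, lam=0.01, epochs=5, seed=0):
    """Linear SVM (no bias term) trained by stochastic sub-gradient descent.

    Labels must be in {-1, +1}; samples are visited one at a time, so the same
    update can be applied to data that arrive sequentially.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)                  # decreasing step size
            if y[i] * (X[i] @ w) < 1:              # hinge loss active: move toward sample
                w = (1 - eta * lam) * w + eta * y[i] * X[i]
            else:                                  # only the regularizer shrinks w
                w = (1 - eta * lam) * w
    return w
```

Each update costs O(d) and touches a single sample, which is what makes this family attractive for streaming or very large training sets.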
(2) Decision Tree (DT). The traditional decision tree (DT), a classic classification learning algorithm, requires a large amount of memory when processing big data. Franco-Arcega et al. [] put forward a method of constructing DTs from big data that overcomes some weaknesses of the algorithms in use. Furthermore, it can use all the training data without storing them in memory. Experimental results showed that this method is faster than current decision tree algorithms on large-scale problems. Yang et al. [] proposed a fast incremental optimization decision tree algorithm for processing large, noisy data. Compared with earlier decision tree data mining algorithms, this method has a major advantage in real-time mining speed, which makes it well suited to continuous data from mobile devices. The most valuable feature of this model is that it prevents explosive growth of the tree size and a drop in prediction accuracy when the data packets contain noise. The model can generate compact decision trees and predict accurately even with highly noisy data. Ben-Haim et al. [] proposed an algorithm for building a parallel decision tree classifier. The algorithm runs in a distributed environment
efficiency under the premise of accuracy error approxima-
(3) Neural Network and Extreme Learning Machine (ELM). Traditional feedforward neural networks usually tune their weight parameters with gradient descent. Generally speaking, slow learning and poor generalization are the bottlenecks that restrict the application of feedforward neural networks. Huang et al. [] discarded the iterative adjustment strategy of gradient descent and proposed the extreme learning machine (ELM). This method randomly assigns the input weights and the biases of a single-hidden-layer neural network. It can analyze
Traditional training algorithms for feedforward networks determine the network weights over multiple iterations. Compared with them, ELM trains significantly faster.
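A minimal ELM sketch following the description above: the input weights and hidden biases are random and stay fixed. The output weights are obtained in one step as the least-squares solution via the Moore-Penrose pseudoinverse. The tanh activation, 50 hidden units, and function names are illustrative assumptions.

```python
import numpy as np


def train_elm(X, T, n_hidden=50, seed=0):
    """Extreme learning machine for a single-hidden-layer network.

    Input weights and biases are drawn at random and never updated; only the
    output weights are solved for, in one shot, by least squares.
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))   # random input weights
    b = rng.normal(size=n_hidden)                  # random hidden biases
    H = np.tanh(X @ W + b)                         # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ T                   # Moore-Penrose solution
    return W, b, beta


def predict_elm(model, X):
    W, b, beta = model
    return np.tanh(X @ W + b) @ beta
```

Training reduces to one matrix pseudoinverse with no iterative weight tuning, which is the source of ELM's speed advantage.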
However, due to limited computing resources and high computational complexity, it is difficult to train a single ELM on big data. Two strategies address this problem: (1) training ELM [] based on a divide-and-conquer strategy; (2) introducing a parallel mechanism [] to train a single ELM. It is shown in [, ] that a single ELM has strong function approximation ability. Whether this approximation capability extends to ELM based on the divide-and-conquer strategy is a key criterion for evaluating whether ELM can be applied to big data.
Wireless Communications and Mobile Computing
Some of the related studies also include eective learning to
solve such problem [].
In summary, the traditional classication method of
machine learning is dicult to apply to the analysis of big
data directly. e study of parallel or improved strategies
of dierent classication algorithms has become the new
3.1.4. Big Data Deep Learning. With the unprecedentedly large and rapidly growing volumes of data, it is hard to extract hidden information from big data with ordinary machine learning methods. The shallow learning architectures of most conventional methods are not fit for the complex structures and relationships in such input data. Big data deep learning algorithms, with their deep architectures and global feature extraction ability, can learn complex patterns and hidden connections in big data [, ]. They have achieved state-of-the-art performance on many benchmarks and have also been applied in industrial products. In this section, we introduce some deep learning methods for big data analytics.
Big data deep learning has some problems: () the hidden
layers of deep network make it dicult to learn from a given
data vector, () the gradient descent method for parameters
learning makes the initialization time increasing sharply as
the number of parameters arises, and () the approximations
at the deepest hidden layer may be poor. Hinton et al. []
proposed a deep architecture: deep belief network (DBN)
which can learn from both labeled and unlabeled data by
using unsupervised pretraining method to learn unlabeled
data distributions and a supervised ne-tune method to
construct the models, and solved part of the aforementioned
problems. Meanwhile, subsequent researches, for example,
[], improved the DBN trying to solve the problems.
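A DBN is pretrained greedily as a stack of restricted Boltzmann machines (RBMs), each trained on unlabeled data. The sketch below shows one binary RBM layer trained with one-step contrastive divergence (CD-1) in NumPy. The toy data, layer sizes, learning rate, and function names are illustrative assumptions; stacking further layers and the supervised fine-tuning stage are omitted.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def rbm_cd1_epoch(V, W, a, b, lr, rng):
    """One full-batch CD-1 update of a binary RBM.

    V: (n_samples, n_visible) binary data, W: (n_visible, n_hidden) weights,
    a: visible biases, b: hidden biases. W, a and b are updated in place.
    Returns the mean squared reconstruction error of this epoch.
    """
    ph0 = sigmoid(V @ W + b)                          # P(h = 1 | v0), positive phase
    h0 = (rng.random(ph0.shape) < ph0).astype(float)  # sampled hidden states
    pv1 = sigmoid(h0 @ W.T + a)                       # one-step reconstruction
    ph1 = sigmoid(pv1 @ W + b)                        # P(h = 1 | v1), negative phase
    n = V.shape[0]
    W += lr * (V.T @ ph0 - pv1.T @ ph1) / n
    a += lr * (V - pv1).mean(axis=0)
    b += lr * (ph0 - ph1).mean(axis=0)
    return float(np.mean((V - pv1) ** 2))


rng = np.random.default_rng(0)
# Unlabeled toy data: two binary prototypes with 5% random bit flips.
protos = np.array([[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]], dtype=float)
V = protos[rng.integers(0, 2, size=500)]
V = np.abs(V - (rng.random(V.shape) < 0.05))
W = 0.01 * rng.standard_normal((6, 3))
a, b = np.zeros(6), np.zeros(3)
errors = [rbm_cd1_epoch(V, W, a, b, lr=0.5, rng=rng) for _ in range(300)]
print(f"reconstruction error: {errors[0]:.3f} -> {errors[-1]:.3f}")
```

In a DBN, the hidden probabilities of a trained layer would become the input data of the next RBM, and the whole stack would then be fine-tuned with labels.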
Convolutional neural network (CNN) [] is another
popular deep learning network structure for big data ana-
lyzing. A CNN has three common features including local
receptive elds, shared weights, and spatial or temporal sub-
sampling, and two typical types of layers [, ]. Con-
volutional layers are key parts of CNN structure aiming to
extract features from image. Subsampling layers, which are
also called pooling layers, adjust outputs from convolutional
layer to get translation invariance. CNN is mainly applied
in computer vision eld for big data, for example, image
classication [, ] and image segmentation [].
Document (or textual) representation, also part of NLP, is the basic method for information retrieval and is important for understanding natural language. Document representation finds specific or important information in documents by analyzing document structure and content. This information could be the document topic or a set of labels highly related to the document. Shallow models for document representation only focus on a small part of the text and capture simple connections between words and sentences. Deep learning can obtain a global representation of the document because of its large receptive field and hidden layers, which can extract more meaningful information. Deep learning methods for document representation make it possible to obtain features from high-dimensional textual data. Hinton et al. [] proposed a deep generative model to learn binary codes for documents, which makes documents easy to store. Socher et al. [] proposed a recursive neural network for analyzing natural language and contexts, achieving state-of-the-art results on segmentation and understanding in natural language processing. Kumer et al. [] proposed recurrent neural networks (RNN) which construct a search space from large amounts of textual data.
With the rapid growth and complexity of academic and industrial data sets, how to train deep learning models with a large number of parameters has become a major problem. The …ble parameter updating methods for training deep models. Researchers focus on large-scale deep learning that can be implemented in parallel, including improved optimizers [] and new structures [, –].
In conclusion, big data deep learning methods are key methods of data mining. They use complex structures to learn patterns from big data sets and multimodal data. The development of data storage and computing technology promotes the development of deep learning methods and makes them easier to use in practical situations.
3.2. Wireless Channel Modeling. As is well known, wireless communication transmits information through electromagnetic waves between a transmitting antenna and a receiving antenna, and the propagation medium between them is regarded as the wireless channel. In the past few decades, the channel dimensions have been extended to space, time, and frequency, which means that the channel properties have been comprehensively explored. Another development is that channel characteristics can be accurately described by different methods, such as channel modeling [].
Liang et al. [] used machine learning to predict
channel state information so as to decease the pilot overhead.
Especially for G, wireless big data emerges and its related
technologies are employed to traditional communication
research to meet the demand of G. However, the wireless
channel is essentially a physical electromagnetic wave, and
the current G channel model research follows the traditional
way. Zhang [] proposed an interdisciplinary study of
big data and wireless channels, which is a cluster-based
channel model. In the cluster-nuclei based channel model,
the multipath components (MPCs) are aggregated into a
traditional stochastically channel model. At the same time,
the scene is discerned by the computer and the environment
is rebuilt by machine learning methods. en, by matching
the real propagation objects with the clusters, the cluster-
nuclei, which are the key factors in contacting deterministic
environment and stochastic clusters, can be easily found.
ere are two main steps employing the machine learning
methods in the cluster-nuclei based channel model. e
recent progress is shown as follows.
3.2.1. A Gaussian Mixture Model (GMM) Based Channel MPCs Clustering Method. The MPCs are clustered with the Gaussian mixture model (GMM) [, ]. Using the sufficient statistics of the channel multipaths, the GMM can obtain clusters corresponding to the multipath propagation characteristics. The GMM assumes that all the MPCs consist of several Gaussian distributions in varying proportions.
Figure: Clustering results of GMM [] (axes: delay, AOD (rad), AOA (rad)).
Given a set of channel multipaths $X = \{x_1, \dots, x_N\}$, the log-likelihood of the Gaussian mixture model is

$$\ln p(X \mid \Theta) = \sum_{i=1}^{N} \ln \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k),$$

where $\Theta = \{\pi_k, \mu_k, \Sigma_k;\ k = 1, \dots, K\}$ is the set of all the parameters and $\pi_k \in [0, 1]$ is the prior probability satisfying the constraint $\sum_{k=1}^{K} \pi_k = 1$. To estimate the GMM parameters, the expectation maximization (EM) algorithm is employed to maximize the log-likelihood function of the GMM []. Figure  illustrates the simulation result of the GMM clustering algorithm.
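The EM procedure referred to above alternates an E-step, which computes each MPC's posterior responsibility for every Gaussian component, and an M-step, which updates the mixture weights, means, and covariances in closed form. The NumPy sketch below is a minimal illustration under stated assumptions: the three-dimensional features stand in for (delay, AOA, AOD), the data are synthetic rather than channel measurements, and the farthest-point initialization and the small covariance regularizer `reg` are our own choices.

```python
import numpy as np


def log_gauss(X, mu, cov):
    """Row-wise log N(x | mu, cov) for a full covariance matrix."""
    d = X.shape[1]
    diff = X - mu
    _, logdet = np.linalg.slogdet(cov)
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def gmm_em(X, K, n_iter=100, reg=1e-6):
    """Fit a K-component GMM with EM; returns (pi, mu, cov, labels, loglik)."""
    n, d = X.shape
    # Farthest-point initialization of the means (an illustrative choice).
    mu = [X[0]]
    for _ in range(1, K):
        dist = np.min([np.sum((X - m) ** 2, axis=1) for m in mu], axis=0)
        mu.append(X[dist.argmax()])
    mu = np.array(mu)
    cov = np.array([np.cov(X.T) + reg * np.eye(d) for _ in range(K)])
    pi = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: responsibilities gamma[i, k] = P(component k | x_i).
        log_p = np.stack([np.log(pi[k]) + log_gauss(X, mu[k], cov[k])
                          for k in range(K)], axis=1)
        log_norm = np.logaddexp.reduce(log_p, axis=1, keepdims=True)
        gamma = np.exp(log_p - log_norm)
        # M-step: closed-form updates of the weights, means, and covariances.
        nk = gamma.sum(axis=0)
        pi = nk / n
        mu = gamma.T @ X / nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            cov[k] = (gamma[:, k, None] * diff).T @ diff / nk[k] + reg * np.eye(d)
    # The log-likelihood is the one evaluated at the last E-step.
    return pi, mu, cov, gamma.argmax(axis=1), float(log_norm.sum())


# Hypothetical MPC features (delay, AOA, AOD), three well-separated clusters.
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0, 0.0], [5.0, 2.0, -2.0], [-4.0, 3.0, 2.0]])
X = np.vstack([c + 0.5 * rng.standard_normal((100, 3)) for c in centers])
pi, mu, cov, labels, loglik = gmm_em(X, K=3)
print(np.round(pi, 2))
```

Working in the log domain (`logaddexp.reduce`) keeps the E-step stable when some MPCs are far from every component, which is common with real multipath data.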
As seen in Figure , the GMM clustering obtains clearly
compact clusters. As scattering property of the channel
multipath obeys Gaussian distribution, the compact clusters
can accord with the multipath scattering property. Moreover,
corresponding to the clustering mechanism of GMM, paper
[] proposed a compact index (CI) to evaluate the clustering
results shown as follows:
= tr ()/(−1)
tr ()/(−)𝐾
𝑘, ()
where 2
𝑘is the variance of the kth cluster and tr()and tr()
are given as
tr ()=𝐾
tr ()=𝐾
where 𝑘is the number of multipaths corresponding to
the kth cluster. Both the means and variances of the clus-
ters are considered in CI. Considering sucient statistics
characteristics, CI can uncover the inherent information of
multipath parameters and provide appropriate explanation to
the clustering result. Besides, considering sucient statistics
characteristics, the CI can evaluate the clustering results more
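As an illustration only, the function below computes a compact index of the form [tr(B)/(K − 1)] / ([tr(W)/(N − K)] · Σ_k σ_k²), i.e., a Calinski–Harabasz-type ratio of between-cluster to within-cluster scatter that is additionally divided by the sum of the cluster variances. Both this exact form and the definition of σ_k² used here (mean squared distance of a cluster's members to its mean) are assumptions made for the sketch, not a verified reproduction of the index in the cited paper.

```python
import numpy as np


def compact_index(X, labels):
    """CI sketch: [tr(B)/(K-1)] / ([tr(W)/(N-K)] * sum_k sigma_k^2).

    sigma_k^2 is taken here as the mean squared distance of cluster k's
    members to the cluster mean (an assumption about the exact definition).
    """
    ks = np.unique(labels)
    K, N = len(ks), len(X)
    mu = X.mean(axis=0)
    tr_B = tr_W = sigma2_sum = 0.0
    for k in ks:
        Xk = X[labels == k]
        mu_k = Xk.mean(axis=0)
        sq = np.sum((Xk - mu_k) ** 2, axis=1)        # squared distances to mu_k
        tr_B += len(Xk) * np.sum((mu_k - mu) ** 2)   # between-cluster scatter
        tr_W += sq.sum()                             # within-cluster scatter
        sigma2_sum += sq.mean()
    return (tr_B / (K - 1)) / ((tr_W / (N - K)) * sigma2_sum)


rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
labels = np.repeat([0, 1, 2], 50)
tight = centers[labels] + 0.3 * rng.standard_normal((150, 2))
loose = centers[labels] + 1.5 * rng.standard_normal((150, 2))
print(compact_index(tight, labels), compact_index(loose, labels))
```

With identical cluster centers, the tighter clustering yields the larger index, which is the behavior a compactness measure should have.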
3.2.2. Identifying the Scatters with the Simultaneous Localization and Mapping Algorithm (SLAM). In order to reconstruct the three-dimensional (3D) propagation environment and to find the main deterministic objects, the simultaneous localization and mapping (SLAM) algorithm is used to identify the texture from pictures of the measurement scenario [, ]. Figure  illustrates our indoor reconstruction result with the SLAM algorithm. The texture of the propagation environment can be used to search for the main scatters in the propagation environment. Then, the three-dimensional propagation environment can be reconstructed with the deep learning method.
Then the mechanism to form the cluster-nuclei is clear. The channel impulse response can be produced by machine learning with a limited number of cluster-nuclei, e.g., decision trees [], neural networks [], and mixture models []. Based on the database from various scenarios, antenna configurations, and frequencies, channel changing rules can be explored and then input into the cluster-nuclei based modeling. Finally, the prediction of channel impulse responses in various scenarios and configurations can be realized [].
3.3. Analyses of Human Online and Oine Behavior Based
on Mobile Big Data. e advances of wireless networks and
increasing mobile applications bring about explosion of
mobile trac data. It is a good source of knowledge to obtain
the individuals’ movement regularity and acquire the mobil-
ity dynamics of populations of millions []. Previous
researches have described how individuals visit geographical
locations and employed mobile trac data to analyze human
oine mobility patterns. Representative works like [, ]
explore the mobility of users in terms of the number of
base stations they visited, which turned out to be a heavy
tail distribution. Authors in [, , ] also reveal that
a few important locations are frequently visited by users.
In particular, these preferred locations are usually related to
home and work places. Moreover, through dening a measure
of entropy, Song et al. [] believe that % of individual
movements are potentially predictable. us, various models
have been applied to describe the human oine mobility
behavior []. Passively collecting human mobile trac data
advantages like low energy consumption. In general, the
mobile big data covers a wide range and a great number
of populations with ne time granularity, which gives us an
data sources are very hard to reach []. Novel oine user
mobility models developed based on the mobile big data are
expected to benet many elds, including urban planning,
road trac engineering, telecommunication network con-
struction, and human sociology [].
Figure: Recognition of multiobjects with SLAM algorithm: (a) real indoor scene and (b) reconstruction result with SLAM algorithm.
Figure: App usage behavior in daily life: (a) the app usage behavior of an individual (panel title: app usage behavior of Bob in temporal and spatial dimension) and (b) app usage behavior of crowds at crowd gathering places [].
Online browsing behavior is another important facet of user behavior when it comes to network resource consumption. A variety of applications are now available on smart devices, covering all aspects of our daily life and providing convenience. For example, we can order taxis, shop, and book hotels using mobile phones. Yang et al.
[] provide a comprehensive study on user behaviors in
exploiting the mobile Internet. It has been found that many
factors, such as data usage and mobility pattern, may impact
people’s online behavior on mobile devices. It was also discovered that the more distinct cells a user visits, the more diverse the applications the user uses. Zheng et al. [] analyze the longitudinal impact of proximity density, personality, and location on smartphone traffic consumption. In particular, location has been shown to strongly influence what kinds of apps users prefer to use [, ]. The aforementioned observations point out that there is a close relationship between online browsing behavior and offline mobility behavior.
Figure (a) is an example of how browsed applications
and current location related to each other from the view
of temporal and spatial regularity. It has been found that
the mobility behaviors have strong inuences on online
observed for crowds at crowd gathering places, as is shown
in Figure (b); i.e., certain apps are favored at places that
e authors in [] tried to measure the relationship between
human mobility and app usage behavior. In particular, the
authors proposed a rating framework which can forecast
the online app usage behavior for individuals and crowds.
Building the bridge between human oine mobility and
online mobile Internet behavior can tell us what people
really need in daily life. Content providers can leverage this
knowledge to appropriately recommend content for mobile
users. At the same time, Internet service providers (ISPs) can
use this knowledge to optimize networks for better end-user
In order to make full use of users’ online and oine
information, some researchers begin to quantize the interplay
between online social network and oine social network
and investigate network dynamics from the view of mobile
trac data [–]. Specically, the online and oine
social networks are, respectively, constructed based on online
interest based and location based social network among
mobile users. e two dierent networks are grouped into
layers of a multilayer social network ={
shown in Figure . 𝑜𝑓𝑓 and 𝑜𝑛 depict oine and online
 Wireless Communications and Mobile Computing
Iff }
F : Multilayer model of a network [].
social network separately. In each layer, the graph is described
as G =V,E,whereand , respectively, represent node
sets and edge sets. Nodes, such as 1,...,4, represent users.
Edges exist among users when users share similar object-
based interests []. Combining information from manifold
networks in a multilayer structure provides a new insight
into user interactions between virtual and physical worlds.
It sheds light on the link generation process from multiple
views, which will improve social bootstrapping and friend
recommendations in various valuable applications by a large
margin [].
So far, we have summarized some representative works related to human online and offline behaviors. It is worth noting that, owing to the highly spatial-temporal and nonhomogeneous nature of mobile traffic data, a pervasive framework is challenging yet indispensable to realize the collection, processing, and analysis of massive data while reducing resource consumption and improving Quality of Experience … framework for MBD (FMBD). It provides comprehensive functions for data collection, storage, processing, analysis, and management to monitor and analyze the massive data. Figure (a) displays the architecture of FMBD, while Figure (b) shows the considered mobile network framework. With the interaction between user equipment and the G/G/G network, … monitoring equipment (TME). The implementation modules provide a secure environment and an easy-to-use platform for both operators and data analysts, showing good performance in energy efficiency, portability, extensibility, usability, security, and stability. In order to meet the increasing demands on traffic monitoring and analysis, the framework provides a solution for dealing with large-scale mobile big data.
In conclusion, the prosperity of continuously emerging mobile applications and users’ increasing demands for Internet access bring about challenges for current and future mobile networks. This section surveys the literature on analyses of human online and offline behavior based on mobile traffic data. Moreover, a framework has also been investigated in order to meet the higher requirements of dealing with dramatically increased mobile traffic data. … information for the ISPs on network deployment, resource management, and the design of future mobile networks.
3.4. Speech Recognition and Verication for the Internet of
Vehi cles . With the signicant development of smart vehicle
technologies have received widespread attention of many
giant Internet businesses [–]. e IoV technologies
include the communication between dierent vehicles and
vehicles to sensors, roads, and humans. ese communica-
tions can help the IoV system sharing and the gathering
information on vehicles and their surrounds.
One of the challenges in the real-life applications of smart vehicles and IoV systems is how to design a robust interactive method between drivers and the IoV system []. The driver’s level of concentration directly affects the safety of the driver and passengers; hence, drivers should keep their attention on the complex road situation in order to avoid accidents during intense driving. Therefore, using voice to transfer information to the IoV system is an effective solution for assisted and cooperative driving. By building a speech recognition interactive system, the driver can check traffic jams near the destination or order lunch at a restaurant near the rest stop through the IoV system by using voice-based interaction. The speech recognition interactive system for the IoV system can reduce the risk of vehicle accidents, since the drivers do not need to touch the control panels or any buttons. A useful speech recognition system in IoV can simplify the life of the drivers and passengers in vehicles []. In the IoV system, drivers want to use their own voice commands to control their vehicles, and the IoV system must distinguish between authorized and unauthorized users. Therefore, an automatic speaker verification system, which can protect the vehicle from impostors, is necessary in IoV.
Recently, many deep learning methods have been applied in speech recognition and speaker verification systems [, –], and published results show that speech processing methods driven by MBD and deep learning can clearly improve the performance of existing speech recognition and speaker verification systems [, , ]. In the IoV systems, millions of sensors collect abundant vehicle … and environmental noises from engines and streets will significantly reduce the accuracy of speech processing systems, while traditional speech enhancement methods, for example, Wiener filtering [] and minimum mean-square error (MMSE) estimation [], which focus on improving the signal-to-noise ratio (SNR), do not take full advantage of the a priori distribution of the noises around vehicles. With the help of machine learning and deep learning methods, we can use a priori knowledge of the noises to improve the robustness of speech processing systems.
Figure: The overall architecture of framework for mobile big data (FMBD) and our considered mobile networks architecture [].
Figure: Multitraining DNN [] (blocks: clean speech, noise recordings, noise series, corrupted speech, DNN training).
For speech recognition tasks, a deep-neural-network (DNN) … classifier is used instead of the traditional GMM based classifier. Moreover, the deep-neural-network hidden Markov model (DNN-HMM) speech recognition model can significantly outperform Gaussian mixture model hidden Markov model (GMM-HMM) models [–]. As shown in Figure , making full use of the self-adaptation power of DNNs, we can use multitraining methods to improve the robustness of the DNN monophone classifier by adding noise to the training data []. The experimental results in [, ] show that the multitraining method can build matched training and testing conditions, which improve the accuracy of noisy speech recognition, especially when prior knowledge of the noise types, which we can easily obtain, is available.
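Multitraining amounts to generating corrupted copies of clean training utterances with known noise types and SNRs. The sketch below scales a noise recording so that the mixture reaches a target SNR, defined as 10 log10(P_clean / P_noise), and builds one copy per (noise, SNR) pair. The sine waves and random signals stand in for real speech and vehicle noise, and all names are hypothetical; training the DNN monophone classifier itself is not shown.

```python
import numpy as np


def mix_at_snr(clean, noise, snr_db):
    """Add noise to clean speech so that 10*log10(P_clean / P_noise) == snr_db."""
    noise = np.resize(noise, clean.shape)            # tile or trim to the utterance length
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return clean + scale * noise


def multitrain_set(clean_utts, noise_recs, snrs_db, rng):
    """One corrupted copy of every utterance per (noise type, SNR) pair."""
    corrupted = []
    for utt in clean_utts:
        for noise in noise_recs:
            for snr in snrs_db:
                start = rng.integers(0, max(1, len(noise) - len(utt)))  # random noise segment
                corrupted.append(mix_at_snr(utt, noise[start:start + len(utt)], snr))
    return corrupted


rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clean_utts = [np.sin(2 * np.pi * 220 * t), np.sin(2 * np.pi * 440 * t)]  # stand-ins for speech
noise_recs = [rng.standard_normal(48000), rng.uniform(-1, 1, 40000)]      # stand-ins for vehicle noise
train = multitrain_set(clean_utts, noise_recs, snrs_db=[0, 5, 10], rng=rng)

noisy = mix_at_snr(clean_utts[0], noise_recs[0], 5)
measured = 10 * np.log10(np.mean(clean_utts[0] ** 2) / np.mean((noisy - clean_utts[0]) ** 2))
print(len(train), round(float(measured), 6))  # prints: 12 5.0
```

Covering several SNRs per noise type is what produces the "matched" training and testing conditions described above, since the test-time SNR is rarely known exactly.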
A DNN can also be used as a feature mapping network (FMN), which uses noisy features as input and the corresponding clean features as the training target. Enhanced features extracted by the FMN can improve the performance of speech recognition systems. Han et al. [] used an FMN to extract one enhanced Mel-frequency cepstral coefficient (MFCC) frame from  noisy MFCC frames. Xu et al. [] built an FMN that learned the mapping from a log spectrogram to a log Mel filter bank. The enhanced features can remarkably reduce the word error rate in speech recognition.
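The input of such a mapping network is usually a window of neighboring noisy frames, and the target is the corresponding clean frame. The sketch below builds these spliced windows and, to stay short, replaces the DNN with a linear least-squares map; this substitution, the synthetic features, the context size, and the function name `splice` are assumptions. Because the center frame is part of the window, the linear map can never do worse than leaving the noisy features unchanged on the training data; a real FMN would be evaluated on held-out data.

```python
import numpy as np


def splice(frames, context):
    """Stack each frame with `context` neighboring frames on each side.

    frames: (T, D) feature matrix; returns (T, (2*context + 1) * D).
    The first/last frame is repeated to pad the edges.
    """
    padded = np.concatenate([np.repeat(frames[:1], context, axis=0),
                             frames,
                             np.repeat(frames[-1:], context, axis=0)])
    return np.stack([padded[t:t + 2 * context + 1].reshape(-1)
                     for t in range(len(frames))])


# Hypothetical features: temporally smooth "clean" frames plus additive noise.
rng = np.random.default_rng(0)
clean = np.cumsum(rng.standard_normal((500, 13)), axis=0) * 0.1
noisy = clean + 0.5 * rng.standard_normal(clean.shape)

X = splice(noisy, context=3)                    # noisy context windows as input
A, *_ = np.linalg.lstsq(X, clean, rcond=None)   # linear stand-in for the mapping DNN
enhanced = X @ A                                # one enhanced frame per window
mse_noisy = float(np.mean((noisy - clean) ** 2))
mse_enhanced = float(np.mean((enhanced - clean) ** 2))
print(f"MSE noisy: {mse_noisy:.3f}, MSE after mapping: {mse_enhanced:.3f}")
```

The context window lets the mapping exploit the temporal smoothness of speech features, which a frame-by-frame mapping cannot do.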
Besides obtaining the mapped features directly, the DNN can also be used to estimate an ideal binary mask (IBM), which can be used to separate the clean speech from the background noise, as shown in Figure  [, , ]. With a priori knowledge of the noise types and SNR, we can generate IBMs as training targets and use the noisy power spectra as training data. In the test phase, we can use the learned IBMs to obtain enhanced features, which can improve the robustness of speech recognition.
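The IBM training targets mentioned above are computed from the clean speech and noise powers of each time-frequency unit: a unit is kept (mask value 1) when its local SNR exceeds a local criterion and discarded otherwise. The sketch uses a local criterion of 0 dB and a tiny hypothetical spectrogram; the function name is ours, and the DNN that would predict the mask from noisy features at test time is not shown.

```python
import numpy as np


def ideal_binary_mask(clean_power, noise_power, lc_db=0.0):
    """IBM target: 1 for time-frequency units whose local SNR exceeds lc_db, else 0."""
    eps = 1e-12                                   # guards against log(0) and division by zero
    local_snr_db = 10.0 * np.log10((clean_power + eps) / (noise_power + eps))
    return (local_snr_db > lc_db).astype(float)


# Tiny hypothetical power spectrogram: 2 frames x 2 frequency bins.
clean_power = np.array([[4.0, 0.1], [1.0, 1.0]])
noise_power = np.array([[1.0, 1.0], [2.0, 0.5]])
mask = ideal_binary_mask(clean_power, noise_power)
noisy_power = clean_power + noise_power           # additive-power approximation
print(mask)
print(mask * noisy_power)                         # masked spectrum passed on to recognition
```

Only the units where speech dominates survive masking; here the local SNRs are about 6, −10, −3, and 3 dB, so two of the four units are kept.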
In speaker verication tasks, the classical GMM based
methods, for example, Gaussian mixture model universal
background model (GMM-UBM) [] and i-vector systems
[], need to build a background GMM, rstly, using a large
quantity of speaker independent speeches. en, by comput-
ing the statistics information on each GMM component of
enrollment speakers, we can get speaker models or speaker i-
vectors. However, a trained monophone classication DNN
can replace the function of GMM by computing the statis-
tics information on each monophone instead of on GMM
components. Many published papers [–] show that the
DNN-i-vector based speaker verication systems work better
than the GMM-i-vector method on detection accuracy and
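The statistics referred to above are the zeroth- and first-order (Baum-Welch) statistics of an utterance, accumulated with frame posteriors. The sketch below shows that the computation is the same whether the posteriors come from GMM components or from a DNN's monophone outputs; only the source of the posteriors changes. The toy values and function names are hypothetical, and the subsequent i-vector extraction is omitted.

```python
import numpy as np


def baum_welch_stats(features, posteriors):
    """Zeroth- and first-order statistics accumulated per class.

    features:   (T, D) acoustic frames of one utterance
    posteriors: (T, C) frame posteriors, from GMM-UBM components or,
                in the DNN-based approach, from monophone outputs of a DNN
    """
    N = posteriors.sum(axis=0)        # (C,)   N_c = sum_t gamma_t(c)
    F = posteriors.T @ features       # (C, D) F_c = sum_t gamma_t(c) x_t
    return N, F


def center_stats(N, F, means):
    """Center first-order statistics on the class means, as used in i-vector extraction."""
    return F - N[:, None] * means


features = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]])
posteriors = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.0, 1.0]])
N, F = baum_welch_stats(features, posteriors)
print(N, F, sep="\n")
```

In the DNN-based system, each column of `posteriors` corresponds to a monophone class, so speaker statistics are aligned to phonetic units rather than to unsupervised GMM components.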
Unlike in speech recognition tasks, where the DNNs …, researchers prefer to use a DNN or convolutional neural network (CNN) to generate noise-robust bottleneck features directly in speaker verification tasks [–]. As shown in Figure , acoustic features or feature maps are used to train a DNN/CNN with a bottleneck layer, which has fewer nodes and is close to the output layer. Speaker IDs, noise types, monophone labels, or combinations of these labels are used as training targets. The outputs of the bottleneck layer contain abundant discriminative information and can be used as speaker verification features, which improve the performance of classical speaker verification methods such as the aforementioned GMM-UBM and i-vector. Similar to the multitraining method, adding noisy speech to the training data can also improve the robustness of the extracted bottleneck features [, ].
Recently, some adversarial training methods have been introduced to extract noise-invariant bottleneck features [, ]. As shown in Figure , the adversarial network includes two parts, i.e., an encoding network (EN), which extracts noise-invariant features, and a discriminative network (DN), which judges the noise type of the features generated by the EN. Therefore, by adversarially training these two parts in turn, we can obtain robust noise-invariant features from the EN, which improve the performance of the speaker verification system [, ].
Figure: DNN used for feature mapping [] (blocks: noisy speech signal, feature extraction, noisy feature, mapping network (training), enhanced feature, speech recognition, word sequence).
Figure: IBMs learning (blocks: power spectral, training, IBMs learning, feature extraction, speech recognition, word sequence).
Figure: DNN/CNN used for extracting bottleneck feature [] (training labels: speaker ID, noise type, monophone label; layers of 2048 and 64 nodes; inputs: acoustic feature or acoustic feature maps).
In conclusion, DNN and machine learning methods can make full use of the MBD collected from IoV systems. Moreover, they improve the performance of the speech recognition and speaker verification methods applied in voice interactive systems.
4. Conclusions and Future Challenges
Although the machine learning-based methods introduced in Section 3 are widely applied in the MBD fields and obtain good performance in real data tests, the present methods still need to be further developed. Therefore, five main challenges facing MBD analysis regarding machine learning-based methods should be considered as follows.
(1) Large-Scale and High-Speed M-Internet. Due to the growth of MIDs and the high speed of the M-Internet, increasingly diverse mobile data traffic is introduced, resulting in a heavy load on the wireless transmission system, which requires us to improve wireless communication technologies including WLAN and cellular mobile communication. In addition, the requirement of real-time services and applications depends … methods towards high efficiency and precision.
Figure: Adversarial training network for noise invariant bottleneck feature extraction [] (blocks: acoustic feature, encoding network (EN), noise invariant bottleneck feature, discriminative network (DN), training label: noise type).
(2) Overtting and Undertting Problems. A benet of MBD
to machine learning and deep learning lies in the fact that the
risk of overtting becomes smaller with more and more data
available for training []. However, undertting is another
problem for the oversize data volume. In this condition, a
larger model might be a better selection, while the model can
express more hidden information of the data. Nevertheless,
larger model which generally implies a deeper structure
increases runtime of the model which aects the real-time
performance. erefore, the model size in machine learning
and deep learning, which represents number of parameters,
should be balanced to model performance and runtime.
(3) Generalization Problem. Owing to the massive scale of MBD, it is impossible to obtain the entire data, even within a specific field. Therefore, the generalization ability of a trained machine learning or deep learning model, which can be defined as its suitability for different data subspaces (also called scalability), is of great importance for evaluating its performance.
(4) Cross-Modal Learning. e variety of MBD causes multi-
ple modalities of data (for example, images, audios, personal
location, web documents, and temperature) generated from
multiple sensors (correspondingly, cameras, sound recorders,
position sensor, and temperature sensor). Multimodal learn-
ing should learn from multimodal and heterogeneous input
data with machine learning and deep learning [, ] and
obtain hidden knowledge and meaningful patterns; however,
it is quite dicult to discover.
(5) Extended Channel Dimensions. e channel dimensions
have been extended to three domains, i.e., space, time, and
frequency, which means that the channel property is com-
prehensively discovered. Meanwhile, the increasing antenna
number, high bandwidth, and various application scenarios
bring the big data of channel measurements and estimations,
especially for G. e nding channel characteristics need to
be precisely described by more advanced channel modeling
In this paper, the applications and challenges of machine learning-based MBD analysis in the M-Internet have been reviewed and discussed. The development of MBD in various application scenarios requires more advanced data analysis technologies, especially machine learning-based methods. Three typical applications of MBD analysis focus on wireless channel modeling, human online and offline behavior analysis, and speech recognition and verification in the Internet of Vehicles, respectively, and the machine learning-based methods used are widely applied in many other fields. In order to meet the aforementioned future challenges, three main study aims, i.e., accuracy, feasibility, and scalability [], are highlighted for present and future MBD analysis research. In future work, improving accuracy will remain the primary task on the basis of a feasible architecture for MBD analysis. In addition, as discussed for the generalization problem, scalability has attracted more and more attention, especially in classification or recognition problems, where scalability also includes the increase in the number of inferred classes. It is of great importance to improve the scalability of the methods while maintaining high accuracy and feasibility in order to meet the analysis requirements of MBD.
Conflicts of Interest
e authors declare that they have no conicts of interest.
is paper was supported in part by the National Natural
Science Foundation of China (NSFC) [Grant no. ];
in part by the Beijing Nova Program Interdisciplinary Coop-
eration Project [Grant no. Z]; in part by
the Beijing Nova Program [Grant no. Z];
in part by the Beijing Natural Science Foundation (BNSF)
[Grant no. ]; in part by the Funds of Beijing Lab-
oratory of Advanced Information Networks of BUPT; in
part by the Funds of Beijing Key Laboratory of Network
System Architecture and Convergence of BUPT; and in part
by BUPT Excellent Ph.D. Students Foundation [Grant no.
[] International Telecommunication Union (ITU), “ICT Facts and Figures ,” facts/default.aspx, .
[] Meeker, “Internet Trend ,” net-trends, .
[] G. Fettweis and S. Alamouti, “5G: personal mobile internet beyond what cellular did to telephony,” IEEE Communications
[] M. A. Alsheikh, D. Niyato, S. Lin, H.-P. Tan, and Z. Han, “Mobile big data analytics using deep learning and apache spark,” IEEE
[] Cisco, “Cisco Visual Networking Index: Global Mobile Data
Trac Forecast Update, - White Paper,” https://www
the city residents’ activity information through mobile big data
mining,” in Proceedings of the Joint 15th IEEE International Con-
ference on Trust, Security and Privacy in Computing and Com-
munications, 10th IEEE International Conference on Big Data
Science and Engineering and 14th IEEE International Symposium
on Parallel and Distributed Processing with Applications, IEEE
TrustCom/BigDataSE/ISPA 2016, pp. –, China, August
[] Z. Liao, Q. Yin, Y. Huang, and L. Sheng, “Management and
application of mobile big data,International Journal of Embed-
ded Systems,vol.,no.,pp.,.
[] M. Agiwal, A. Roy, and N. Saxena, “Next generation G wireless
networks: a comprehensive survey,IEEE Communications Sur-
veys & Tutorials,vol.,no.,pp.,.
[] W. Li and Z. Zhou, “Learning to hash for big data: current status
and future trends,Chinese Science Bulletin (Chinese Version),
vol. , no. -, p. , .
[] V. Mayersch¨
onberger and K. Cukier, Big Data: A Revolution
lan/Houghton Miin Harcourt, Boston, .
[] D. Z. Yazti and S. Krishnaswamy, “Mobile big data analytics:
research, practice, and opportunities,” in Proceedings of the 15th
IEEE International Conference on Mobile Data Management,
IEEE MDM 2014,pp.-,Australia,July.
[] E. Zeydan, E. Bastug, M. Bennis et al., “Big data caching for
networking: moving fromcloud to edge,I EEE Communications
large scale web-based Chinese short text,” in Proceedings of the
International Conference on Computer Science and Application
Engineering (CSAE),pp.,.
[] Z. Wang, Y. Qi, J. Liu, and Z. Ma, “User intention understanding
from scratch,” in Proceedings of the 1st International Workshop
on Sensing, Processing and Learning for Intelligent Machines,
SPLINE 2016,Denmark,July.
[] C. Zhang, Z. Si, Z. Ma, X. Xi, and Y. Yin, “Mining sequential
update summarization with hierarchical text analysis,Mobile
Information Systems,vol.,ArticleID,pages,
[] C. Zhang, Y. Zhang, W. Xu, Z. Ma, Y. Leng, and J. Guo, “Min-
ing activation force dened dependency patterns for relation
extraction,Knowledge-Based Systems,vol.,pp.,
[] C. Zhang, W. Xu, Z. Ma, S. Gao, Q. Li, and J. Guo, “Construction of semantic bootstrapping models for relation extraction,” Knowledge-Based Systems, vol. , pp. –, .
[] M. Jordan, “Message from the president: the era of big data,” ISBA Bull., vol. , pp. –, .
[] W. Chen, D. Wipf, Y. Wang, Y. Liu, and I. J. Wassell, “Simultaneous Bayesian sparse approximation with structured sparse models,” IEEE Transactions on Signal Processing, vol. , no. ,
[] W. Chen, M. R. D. Rodrigues, and I. J. Wassell, “Projection design for statistical compressive sensing: a tight frame based approach,” IEEE Transactions on Signal Processing, vol. , no. ,
[] H. Yong, D. Meng, W. Zuo, and L. Zhang, “Robust online matrix factorization for dynamic background subtraction,” IEEE Transactions on Pattern Analysis and Machine Intelligence, .
[] Q. Xie, D. Zeng, Q. Zhao et al., “Robust low-dose CT sinogram preprocessing via exploiting noise-generating mechanism,” IEEE Transactions on Medical Imaging, vol. , no. , pp. –,
“Function splitting and quadratic approximation of the primal-dual method of multipliers for distributed optimization over graphs,” IEEE Transactions on Signal and Information Processing over Networks, pp. –, .
[] G. Zhang and R. Heusdens, “Distributed optimization using the primal-dual method of multipliers,” IEEE Transactions on Signal and Information Processing over Networks, vol. , no. , pp. –, .
[] G. Zhang and R. Heusdens, “Linear coordinate-descent message passing for quadratic optimization,” in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, pp. –, .
[] G. Zhang, R. Heusdens, and W. B. Kleijn, “Large scale LP decoding with low complexity,” IEEE Communications Letters, vol. , no. , pp. –, .
[] Z. Ma, A. E. Teschendorff, A. Leijon, Y. Qiao, H. Zhang, and J. Guo, “Variational Bayesian matrix factorization for bounded support data,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. , no. , pp. –, .
[] Z.-H. Zhou, N. V. Chawla, Y. Jin, and G. J. Williams, “Big data opportunities and challenges: discussions from data analytics perspectives,” IEEE Computational Intelligence Magazine, vol. , no. , pp. –, .
[] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,Nature,
[] Y. Bengio and S. Bengio, “Modeling high-dimensional discrete data with multi-layer neural networks,” in Proceedings of the 13th Annual Neural Information Processing Systems Conference, NIPS 1999, pp. –, USA, December .
[] M. Ranzato, Y.-L. Boureau, and Y. LeCun, “Sparse feature learning for deep belief networks,” in Advances in Neural Information Processing Systems, pp. –, .
[] G. E. Hinton, S. Osindero, and Y. Teh, “A fast learning algorithm for deep belief nets,” Neural Computation, vol. , no. , pp. –, .
[] Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle, “Greedy layer-wise training of deep networks,” in Proceedings of the 20th Annual Conference on Neural Information Processing Systems (NIPS ’06), pp. –, Cambridge, Mass, USA, December
[] H. Larochelle, Y. Bengio, J. Louradour, and P. Lamblin, “Exploring strategies for training deep neural networks,” Journal of Machine Learning Research, vol. , pp. –, .
[] R. Salakhutdinov and G. Hinton, “Deep Boltzmann machines,” in Proceedings of the International Conference on Artificial Intelligence and Statistics, vol. , pp. –, .
[] I. Goodfellow, H. Lee, and Q. V. Le, “Measuring invariances in deep networks,” Neural Information Processing Systems, pp. –, .
[] Y. Bengio and Y. LeCun, “Scaling learning algorithms towards AI,” Large Scale Kernel Machines, vol. , pp. –, .
[] Y. Bengio, A. Courville, and P. Vincent, “Representation learning: a review and new perspectives,” IEEE Transactions on
Pattern Analysis and Machine Intelligence, vol. , no. , pp. –, .
learning—a new frontier in artificial intelligence research,” IEEE Computational Intelligence Magazine, vol. , no. , pp. –,
[] G. E. Dahl, D. Yu, L. Deng et al., “Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition,” IEEE Transactions on Audio, Speech, and Language
[] G. Hinton, L. Deng, D. Yu et al., “Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups,” IEEE Signal Processing Magazine, vol. , no. , pp. –, .
[] R. Salakhutdinov, A. Mnih, and G. Hinton, “Restricted Boltzmann machines for collaborative filtering,” in Proceedings of the 24th International Conference on Machine Learning (ICML ’07), vol. , pp. –, Corvallis, Oregon, June .
[] D. C. Cireşan, U. Meier, L. M. Gambardella, and J. Schmidhuber, “Deep, big, simple neural nets for handwritten digit recognition,” Neural Computation, vol. , no. , pp. –, .
[] M. D. Zeiler, G. W. Taylor, and R. Fergus, “Adaptive deconvolutional networks for mid and high level feature learning,” in Proceedings of the 2011 IEEE International Conference on Computer Vision, ICCV 2011, pp. –, Spain, November
[] A. Efrati, “How deep learning works at Apple, beyond,” https://
ple-Beyond, .
[] Z. Yang, B. Wu, K. Zheng, X. Wang, and L. Lei, “A survey of collaborative filtering-based recommender systems for mobile internet applications,” IEEE Access, vol. , pp. –, .
[] K. Zhu, L. Zhang, and A. Pattavina, “Learning geographical and mobility factors for mobile application recommendation,” IEEE Intelligent Systems, vol. , no. , pp. –, .
[] S. Jiang, B. Wei, T. Wang, Z. Zhao, and X. Zhang, “Big data enabled user behavior characteristics in mobile internet,” in Proceedings of the 2017 9th International Conference on Wireless Communications and Signal Processing (WCSP), pp. –, Nanjing, October .
[] J. Yang, Y. Qiao, X. Zhang, H. He, F. Liu, and G. Cheng, “Characterizing user behavior in mobile internet,” IEEE Transactions on Emerging Topics in Computing, vol. , no. , pp. –, .
[] Y. Qiao, X. Zhao, J. Yang, and J. Liu, “Mobile big-data-driven rating framework: measuring the relationship between human mobility and app usage behavior,” IEEE Network, vol. , no. , pp. –, .
[] Y. Qiao, J. Yang, H. He, Y. Cheng, and Z. Ma, “User location prediction with energy efficiency model in the Long Term-Evolution network,” International Journal of Communication
[] M. Gerla and L. Kleinrock, “Vehicular networks and the future of the mobile internet,” Computer Networks, vol. , no. , pp. –, .
[] M. M. Islam, M. A. Razzaque, M. M. Hassan, W. N. Ismail, and B. Song, “Mobile cloud-based big healthcare data processing in smart cities,” IEEE Access, vol. , pp. –, .
[] Texas Instruments, “Wireless Handset Solutions: Mobile Internet Device,” smartphone,
wireless channel modeling method: motivation, principle and performance,” Journal of Communications and Information
[] X. Ma, J. Zhang, Y. Zhang, Z. Ma, and Y. Zhang, “A PCA-based modeling method for wireless MIMO channel,” in Proceedings of the 2017 IEEE Conference on Computer Communications: Workshops (INFOCOM WKSHPS), pp. –, Atlanta, GA, May .
[] X. Zhang, Z. Yi, Z. Yan et al., “Social computing for mobile big data,” The Computer Journal, vol. , no. , pp. –, .
cascading and community-cascading in social networks: comparative analysis and its implications to edge caching,” Information Sciences, vol. -, pp. –, .
[] S. Gao, H. Luo, D. Chen et al., “A cross-domain recommendation model for cyber-physical systems,” IEEE Transactions on Emerging Topics in Computing, vol. , no. , pp. –, .
[] Y. Qiao, Z. Xing, Z. M. Fadlullah, J. Yang, and N. Kato, “Characterizing flow, application, and user behavior in mobile networks: a framework for mobile big data,” IEEE Wireless Communications Magazine, vol. , no. , pp. –, .
[] H. Yu, Z. Tan, Z. Ma, R. Martin, and J. Guo, “Spoofing detection in automatic speaker verification systems using DNN classifiers and dynamic acoustic features,” IEEE Transactions on Neural Networks and Learning Systems, pp. –.
[] H. Yu, Z.-H. Tan, Y. Zhang, Z. Ma, and J. Guo, “DNN filter bank cepstral coefficients for spoofing detection,” IEEE Access, vol. , pp. –, .
[] Z. Ma, H. Yu, Z.-H. Tan, and J. Guo, “Text-independent speaker identification using the histogram transform model,” IEEE
[] H. Yu, Z.-H. Tan, Z. Ma, and J. Guo, “Adversarial network bottleneck features for noise robust speaker verification,” in Proceedings of the 18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017, pp. –, Sweden, August .
[] H. Yu, A. Sarkar, D. A. L. Thomsen, Z.-H. Tan, Z. Ma, and J. Guo, “Effect of multi-condition training and speech enhancement methods on spoofing detection,” in Proceedings of the 1st International Workshop on Sensing, Processing and Learning for Intelligent Machines, SPLINE 2016, Denmark, July .
[] H. Yu, Z. Ma, and M. Li, “Histogram transform model using MFCC features for text-independent speaker identification,” in Proceedings of the IEEE Asilomar Conference on Signals, Systems,
development of intelligent energy networks,” IEEE Network, vol. , no. , pp. –, .
[] Z. Ma, H. Li, Q. Sun, C. Wang, A. Yan, and F. Starfelt, “Statistical analysis of energy consumption patterns on the heat demand of buildings in district heating systems,” Energy and Buildings, vol. , pp. –, .
[] D. West, “How mobile devices are transforming healthcare,” Issues in Technology Innovation, vol. , no. , pp. –, .
[] L. A. Tawalbeh, R. Mehmood, E. Benkhlifa, and H. Song, “Mobile cloud computing model and big data analysis for healthcare applications,” IEEE Access, vol. , pp. –, .
[] S. Sagiroglu and D. Sinanc, “Big data: a review,” in Proceedings of the International Conference on Collaboration Technologies and Systems (CTS ’13), pp. –, IEEE, San Diego, Calif, USA, May
“So-dened heterogeneous vehicular network: architecture
and challenges,” IEEE Network,vol.,no.,pp.,.
[] H. Hsieh, V. Klyuev, Q. Zhao, and S. Wu, “SVR-based outlier detection and its application to hotel ranking,” in Proceedings of the 2014 IEEE 6th International Conference on Awareness Science and Technology (iCAST), pp. –, Paris, France, October .
[] S. Rahman, M. Sathik, and K. Kannan, “Multiple linear regression models in outlier detection,” International Journal of Research in Computer Science, vol. , no. , pp. –, .
[] H. A. Dau, V. Ciesielski, and A. Song, “Anomaly detection using replicator neural networks trained on examples of one class,” in Simulated Evolution and Learning, vol.  of Lecture Notes in Computer Science, pp. –, Springer International Publishing, Cham, .
[] Z. Ma, J.-H. Xue, A. Leijon, Z.-H. Tan, Z. Yang, and J. Guo, “Decorrelation of neutral vector variables: theory and applications,” IEEE Transactions on Neural Networks and Learning
[] Z. Ma, S. Chatterjee, W. B. Kleijn, and J. Guo, “Dirichlet mixture modeling to estimate an empirical lower bound for LSF quantization,” Signal Processing, vol. , pp. –, .
[] Z. Ma and A. Leijon, “Bayesian estimation of beta mixture models with variational inference,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. , no. , pp. –,
[] C. C. Aggarwal, “Outlier analysis,” in Data Mining, Springer,
[] Y. Demchenko, P. Grosso, C. de Laat, and P. Membrey, “Addressing big data issues in scientific data infrastructure,” in Proceedings of the IEEE International Conference on Collaboration Technologies and Systems (CTS ’13), pp. –, May .
acquisition by adding home and work related contexts on mobile big data analysis,” in Proceedings of the IEEE INFOCOM 2016 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), pp. –, San Francisco, CA, USA, April .
[] X. Ge, H. Cheng, M. Guizani, and T. Han, “G wireless backhaul networks: challenges and research advances,” IEEE Network, vol. , no. , pp. –, .
[] S. Landset, T. M. Khoshgoftaar, A. N. Richter, and T. Hasanin, “A survey of open source tools for machine learning with big data in the Hadoop ecosystem,” Journal of Big Data, vol. , no.
[] D. Soubra, “The Vs that define Big Data,” http://ww w.datas-vs-that-dene-big-data,
[] L. Ma, F. Nie, and Q. Lu, “An analysis of supply chain restruc-
warehouse-type supermarkets,” in Proceedings of the IEEE International Conference on Grey Systems and Intelligent Services, GSIS 2015, pp. –, UK, August .
[] A. McAfee and E. Brynjolfsson, “Big data: the management revolution,” Harvard Business Review, vol. , no. , pp. –,
[] Y. Li, J. Zhang, and Z. Ma, “Clustering in wireless propagation channel with a statistics-based framework,” in Proceedings of the 2018 IEEE Wireless Communications and Networking Conference (WCNC), pp. –, Barcelona, April .
[] P. Kazienko, K. Musiał, and T. Kajdanowicz, “Multidimensional social network in the social recommender system,” IEEE Trans-
pp. –, .
[] A. Abe, K. Yamamoto, and S. Nakagawa, “Robust speech recognition using DNN-HMM acoustic model combining noise-aware training with spectral subtraction,” in Proceedings of the 16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015, pp. –, Germany, September .
[] Y. Xu, J. Du, L.-R. Dai, and C.-H. Lee, “An experimental study on speech enhancement based on deep neural networks,” IEEE Signal Processing Letters, vol. , no. , pp. –, .
[] A. Narayanan and D. Wang, “Ideal ratio mask estimation using deep neural networks for robust speech recognition,” in Proceedings of the 2013 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013, pp. –, Canada, May .
[] D. Serdyuk, K. Audhkhasi, and P. Brakel, “Invariant representations for noisy speech recognition,” Computation and Language, , arXiv:..
[] P. E. Hart, “The condensed nearest neighbor rule,” IEEE Transactions on Information Theory, vol. , no. , pp. –, .
[] G. Gates, “The reduced nearest neighbor rule,” IEEE Transactions on Information Theory, vol. , no. , pp. –, .
[] H. Brighton and C. Mellish, “Advances in instance selection for instance-based learning algorithms,” Data Mining and Knowledge Discovery, vol. , no. , pp. –, .
[] Y. Li and L. Maguire, “Selecting critical patterns based on local geometrical and statistical information,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. , no. , pp. –, .
[] F. Angiulli, “Fast nearest neighbor condensation for large data sets classification,” IEEE Transactions on Knowledge and Data
[] F. Angiulli and G. Folino, “Distributed nearest neighbor-based condensation of very large data sets,” IEEE Transactions on Knowledge and Data Engineering, vol. , no. , pp. –,
[] M. I. Jordan, “Divide-and-conquer and statistical inference for big data,” in Proceedings of the 18th ACM SIGKDD International Conference, p. , Beijing, China, August .
multi-aspect data mining,” in Proceedings of the 8th IEEE International Conference on Data Mining, ICDM 2008, pp. –, Italy, December .
[] G. Wahba, “Dissimilarity data in statistical model building and machine learning,” in Proceedings of the 5th International Congress of Chinese Mathematicians, pp. –, .
[] S. C. Hoi, J. Wang, P. Zhao, and R. Jin, “Online feature selection for mining big data,” in Proceedings of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications, pp. –, Beijing, China, August .
[] A. Sagheer, N. Tsuruta, R.-I. Taniguchi, D. Arita, and S. Maeda, “Fast feature extraction approach for multi-dimension feature space problems,” in Proceedings of the 18th International Conference on Pattern Recognition, ICPR 2006, pp. –, China, August .
[] J. R. Anaraki and M. Eftekhari, “Improving fuzzy-rough quick reduct for feature selection,” in Proceedings of the 2011 19th
May .
[] I. A. Gheyas and L. S. Smith, “Feature subset selection in large dimensionality domains,” Pattern Recognition, vol. , no. , pp. –, .
[] K. W. Lau and Q. H. Wu, “Online training of support vector classifier,” Pattern Recognition, vol. , no. , pp. –,
[] P. Laskov, C. Gehl, S. Krüger, and K.-R. Müller, “Incremental support vector learning: analysis, implementation and applications,” Journal of Machine Learning Research, vol. , pp. –, .
[] C. Chang and C. Lin, “LIBSVM: a Library for support vector
nology, vol. , no. , article , .
[] K. Huang, H. Yang, I. King, and M. R. Lyu, “Maxi-min margin machine: learning large margin classifiers locally and globally,” IEEE Transactions on Neural Networks and Learning Systems, vol. , no. , pp. –, .
[] A. Franco-Arcega, J. A. Carrasco-Ochoa, G. Sánchez-Díaz et al., “Building fast decision trees from large training sets,” Intelligent Data Analysis, vol. , no. , pp. –, .
[] H. Yang and S. Fong, “Incrementally optimized decision tree for noisy big data,” in Proceedings of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: