
A survey for user behavior analysis based on machine learning techniques: current models and applications


Abstract and Figures

Significant research has been carried out in the field of User Behavior Analysis, focused on understanding, modeling and predicting past, present and future behaviors of users. However, the heterogeneity of the approaches makes their comprehension very complicated. Thus, domain and Machine Learning experts have to work together to achieve their objectives. The main motivation for this work is to obtain an understanding of this field by providing a categorization of state-of-the-art works, grouping them based on specific features. This paper presents a comprehensive survey of the existing literature in the areas of Cybersecurity, Networks, Safety and Health, and Service Delivery Improvement. The survey is organized based on four different topic-based features which categorize existing works: keywords, application domain, Machine Learning algorithm, and data type. This paper aims to thoroughly analyze the existing references, to promote the dissemination of state-of-the-art approaches by discussing their strong and weak points, and to identify open challenges and prospective future research directions. In addition, 127 discussed papers have been scored and ranked according to relevance-based features: paper reputation, maximum author reputation, novelty, innovation and data quality. Both types of features, topic-based and relevance-based, have been combined to build a similarity metric enabling a rich visualization of all considered publications. The obtained graphic representation provides a guide to recent advancements in User Behavior Analysis by topic, highlighting the most relevant ones.
https://doi.org/10.1007/s10489-020-02160-x
Alejandro G. Martín¹ · Alberto Fernández-Isabel¹ · Isaac Martín de Diego¹ · Marta Beltrán¹
Accepted: 16 December 2020
©The Author(s), under exclusive licence to Springer Science+Business Media, LLC part of Springer Nature 2021
Keywords: User behavior analysis · Behavioral analytics · Survey · Machine learning · Topic-based features · Relevance-based features
1 Introduction
The foundations of Behavior Analysis were first introduced in 1953 [154]. Behavior Analysis initially focused on studying human behavior and was closely related to psychology: scientific techniques were applied to understand human behaviors [41]. Seven years later, the principal components of human behavior analysis were fixed [151]. However, with the fast growth of technologies, the appearance of data mining [64] and the possibility of implementing Machine Learning (ML) algorithms [122], this behavioral science has evolved. Terms like Behavior Informatics [27], Customer Behavior Analysis [149] or User Behavior Analytics (UBA) [139] are used on a daily basis.
The evolution of Behavior Analysis is often called Behavioral Analytics [28], a scientific discipline focused on modeling specific behaviors to understand interactions with a system in order to achieve a set of business goals. To address this modeling, Behavioral Analytics tries to understand, analyze and predict past, present and future behaviors using ML techniques. This field has proven to have a plethora of significant applications in different domains, such as the detection of suspicious interactions within a corporate network [145], the recommendation of products to potential customers [125], the optimization of bus line design [83] or helping older people with their routines [14], to mention only some examples.
Alberto Fernández-Isabel
alberto.fernandez.isabel@urjc.es
Alejandro G. Martín
alejandro.garciam@urjc.es
Isaac Martín de Diego
isaac.martin@urjc.es
Marta Beltrán
marta.beltran@urjc.es
¹ Data Science Laboratory, Rey Juan Carlos University, C/ Tulipán, s/n, 28933, Móstoles, Spain
Published online: 26 January 2021
Applied Intelligence (2021) 51:6029–6055
... The access of a large number of users and devices increases the potential risk of network attacks, bringing great challenges to network security [4][5][6]. The Trusted Protocol (TP) can effectively reduce the attacks launched by malicious users on the network by controlling and managing user behaviors, which is one of the important methods to improve network security [7][8][9]. How to construct a TP to detect malicious behaviors in 6G networks with massive connections is an urgent problem to be solved. However, traditional TPs (such as identity authentication, access control, and traffic detection) are mostly deployed in centralized networks and are difficult to be applied directly to 6G networks with dynamic changes in user behaviors and heterogeneous network structures. ...
... A user with a trusted identity can access the network resources only after obtaining the legitimate access authorization. The ACM can be modeled as shown in (7). g() is the trusted access control protocol. ...
Article
Full-text available
The access of massive users and devices in the 6G networks increases the risk of network attacks. Designing a trusted protocol to control user behavior can effectively improve the security capability of the network. However, most of the existing trusted protocols focus on unilateral user behavior and lack effective control over the whole process of user behavior. In this paper, we design a blockchain-enabled trusted protocol based on the whole-process user behavior. At first, we describe the Whole-Process User Behavior (WPUB) after the user accesses the network, and model the whole-process trusted control process. The proposed model establishes a trusted chain between user identity, access action, and communication traffic, and realizes the control of WPUB. Then, based on the proposed model, we design a whole-process trusted protocol with smart agents and smart contracts in combination with blockchain. Finally, we evaluate the designed protocol in the HyperLedger Fabric-based prototype system. Evaluations show that the proposed protocol can control the WPUB and reduce the risk of the network being attacked.
... It brings increased focus to the study and offers information on the needs and understanding the patterns and variability in road user behaviour (Crawford et al., 2018). Also, extensive user behaviour research was performed recently, concentrating on the realisation, modelling and prediction of user behaviour in the past, present and future (Martín, Fernández-Isabel, de Diego & Beltrán, 2021). ...
... It strengthens the study's focus and gives information on the requirements and differences in road user behaviour patterns (Crawford et al., 2018). Recent studies have concentrated on identifying, modeling, and forecasting past, present, and future user behaviour (Martín et al., 2021). The information gained in this study depicted that the progression of the publication reflects evolutionary nuances, and bibliometric analysis enables us to collect and graphically represent evolutionary nuances across temporal and spatial dimensions. ...
Article
Full-text available
Every single person is entitled to equal space on the roads or sidewalks, so they rely on each other's empathy and compassion and not be self-centered. Therefore, it is essential to promote the ethics of road safety and road users' exemplary behavior upmost. This review analyzed the publication trends and thematic evolution of road user behaviour over 47 years from 1973 to 2020. The assessment uses the Scopus database and various bibliometric indicators, such as output growth trends, eminent countries, research hotspots, and author keywords. Also, this study presented a graphical visualization of bibliometric indicators using a VOSviewer. Another bibliometric software tool, known as SciMAT, was used to inspect road user behaviour research's thematic evolution. The verdicts revealed that the number of publications increased exponentially, starting in 2005 with a hike in publications in 2020. Road user behaviour researches were diverse by examining the various research hotspots. This review also focuses on several themes and dimensions of road user behaviour research. The essential motor theme during the first period (2005-2012) was "schools". Other motor themes, such as "cross-sectional studies," "car", and "space-temporal-analysis", became the most significant number of publications in the second period (2013-2020). These four themes may be beneficial as a benchmark for researchers focusing on the art of road user behaviour. This bibliometric study provides a comprehensive and in-depth view of road users' behaviour that may help future researchers advance potential knowledge in this field.
... The mainstream English teaching still focuses on the traditional offline face-to-face teaching, supplemented by online network teaching, so the teaching mode is relatively single [1]. Since the teaching model innovation policies are promoted in different regions at this stage, the vigorous development of a variety of data analysis technologies has also triggered the innovative application of English Classroom Teaching [2]. Manufacturing computer-integrated manufacturing system CIMS, system scheduling, communication network system, database management system, military C3I system, and other systems are typical discrete event systems [3]. ...
... As can be seen from Figure 5, with the increase in data analysis and processing groups, it can be found that within a certain range (1)(2)(3)(4)(5)(6)(7)(8)(9)(10)(11)(12), the longer the unified standard processing time is, and it is relatively stable. ...
Article
Full-text available
The current English teaching mode focuses on the traditional offline teaching and online teaching. In order to solve the problems that some students are inefficient and cannot teach students according to their aptitude in the teaching process, this paper uses the big data analysis strategy based on a neural network algorithm. This paper studies the discrete dynamic modeling method of learner behavior analysis in English teaching. Firstly, it summarizes the current situation of English teaching and the research status of the hybrid application of discrete dynamic modeling technology. Secondly, combined with English teaching content and teaching objectives, through the analysis of various data of students’ learning behavior, this paper evaluates students’ English teaching quality from five aspects that affect the students’ English teaching quality and puts forward a personalized English teaching quality evaluation model based on discrete dynamic modeling technology and learners’ behavior analysis. Finally, through the practical teaching application in a university, the feasibility of the discrete dynamic English teaching model is verified. The results show that compared with the current innovative English teaching methods based on a dynamic iterative decision algorithm, the personalized discrete dynamic English teaching model based on learner behavior analysis significantly improves the quality of English teaching and students’ academic performance.
... Most often, phishing attacks exploit network bandwidth and connectivity, downgrading performance in systems they are compromising, whether during network-based attacks or host-based attacks, as shown in Figure 1. Consequently, phishing attacks are successful at halting, interrupting, and demoting the real-time performance of the system by draining all its resources [3]. Figure 1 shows multiple-incident level attacks that can occur on smartphones. ...
... Avoiding any one of these tenets deliberately or inadvertently might lead to open security breaches and unresolved vulnerabilities, consequently leading to a loss of credentials, reputations, and financial gains. Hence, the focus on web security against phishing attacks is necessary and must counter the latest exploitation techniques [3]. Among all cyberattacks, phishing attacks appear friendly but target financial transactions and highly confidential data. ...
Article
Full-text available
Different types of connectivity are available on smartphones such as WiFi, infrared, Bluetooth, GPRS, GPS, and GSM. The ubiquitous computing features of smartphones make them a vital part of our lives. The boom in smartphone technology has unfortunately attracted hackers and crackers as well. Smartphones have become the ideal hub for malware, gray ware, and spyware writers to exploit smartphone vulnerabilities and insecure communication channels. For every security service introduced, there is simultaneously a counterattack to breach the security and vice versa. Until a new mechanism is discovered, the diverse classifications of technology mean that one security contrivance cannot be a remedy for phishing attacks in all circumstances. Therefore, a novel architecture for antiphishing is mandatory that can compensate web page protection and authentication from falsified web pages on smartphones. In this paper, we developed a cluster-based antiphishing (CAP) model, which is a lightweight scheme specifically for smartphones to save energy in portable devices. The model is significant in identifying, clustering, and preventing phishing attacks on smartphone platforms. Our CAP model detects and prevents illegal access to smartphones based on clustering data to legitimate/normal and illegitimate/abnormal. First, we evaluated our scheme with mathematical and algorithmic methods. Next, we conducted a real test bed to identify and counter phishing attacks on smartphones which provided 90% accuracy in the detection system as true positives and less than 9% of the results as true negative.
... Over the past several years, DNNs have been exploited in various general [19,20] and specific [21,22] visual tasks. In the image denoising community, DNNs have also become a highly examined topic, and various network architectures and learning strategies have been exploited. ...
Article
Full-text available
This paper proposes feature-level full-reference image denoising quality metrics based on a joint sparse representation model. By decomposing a denoised image and its clean reference jointly and sparsely with a specific learning dictionary, our method measures the denoising quality from two contradictory perspectives, i.e., the detail preservation capability and noise suppression capability, which determine the denoising quality together, in an image feature space. This novel multiperspective method can not only measure the performance of denoising algorithms accurately but also provide a unique means for investigating denoising characteristics in a learning feature space. In the experiments, nine representative denoising methods and six widely used full-reference objective metrics were employed to verify the effectiveness of our method. In addition, the denoising influences exerted on dictionary atoms are investigated in depth, and several statistical conclusions are reported. Furthermore, our work also provides a new feasible assessment framework for other image recovery and generation tasks.
... Data mining approaches can be applied to automatically dig information with the goal to find out some habits and tendencies [9, 10, 11]. In that case, the web access log file is analysed in order to divide users on the basis of a selected algorithm into several clusters. Users within the same cluster should have similar properties from the point of view of their activity in cyberspace. ...
Article
Full-text available
Cyber security is one of the prominent global challenges due to the significant increase in the number of cyberattacks over the last few decades. The amount of transferred data is growing, and a quick reaction to cyber incidents is needed. The paper is a contribution to this effort. There is a possibility to save time and resources by concentrating only on a subgroup of potential threats caused by a specific group of users. The main source of information about a selected group of users is the web access log file, where all the necessary data is stored. The contribution also presents the concept of preprocessing data from the log files to a form useful for clustering. In the next step, a density based spatial clustering algorithm is applied to create the clusters. Clustering algorithms have been applied to many fields (marketing, business, etc.), but not for the purposes of cyber defence. The created clusters were analysed according to our definition of risky behaviour. After analysis of the clustering results, it was possible to select a potentially dangerous group of users in the specific cluster. The presented method has potential use in different areas of cyber defence and other applications where intelligent classification is required.
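The clustering step described above can be sketched as follows. This is a minimal, dependency-free version of DBSCAN, the best-known density-based spatial clustering algorithm; the abstract does not say which variant or parameters the authors used. The two per-user features (for example, a scaled request rate and a scaled share of failed requests) and the values of `eps` and `min_pts` are assumptions made purely for illustration.

```python
from math import dist

NOISE = -1


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already visited
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1  # point i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # j is also a core point: keep expanding
                queue.extend(nbrs)
    return labels


# Hypothetical, already-scaled feature vectors, one per user.
users = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1),
         (5.0, 5.0), (5.1, 4.9), (4.9, 5.2),
         (9.0, 0.5)]
labels = dbscan(users, eps=0.5, min_pts=3)
```

Users labelled `NOISE` belong to no dense group, so under a definition of risky behaviour like the one the abstract mentions they could be reviewed separately from the regular clusters.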
... A further issue is that the cost associated with running multiple conditions typically means reducing the number of conditions for reasons of pragmatism. There is a need to reduce the cost of user studies, not only in IR but also in other fields, where multiple user-related aspects are often studied [51,65]. Even if power analysis is effectively employed to estimate the necessary number of participants and individual experiments required prior to the studies taking place, there can still be many instances where more resources are actually used than is necessary. ...
Article
Full-text available
Two major barriers to conducting user studies are the costs involved in recruiting participants and researcher time in performing studies. Typical solutions are to study convenience samples or design studies that can be deployed on crowdsourcing platforms. Both solutions have benefits but also drawbacks. Even in cases where these approaches make sense, it is still reasonable to ask whether we are using our resources – participants’ and our time – efficiently and whether we can do better. Typically user studies compare randomly-assigned experimental conditions, such that a uniform number of opportunities are assigned to each condition. This sampling approach, as has been demonstrated in clinical trials, is sub-optimal. The goal of many Information Retrieval (IR) user studies is to determine which strategy (e.g., behaviour or system) performs the best. In such a setup, it is not wise to waste participant and researcher time and money on conditions that are obviously inferior. In this work we explore whether Best Arm Identification (BAI) algorithms provide a natural solution to this problem. BAI methods are a class of Multi-armed Bandits (MABs) where the only goal is to output a recommended arm and the algorithms are evaluated by the average payoff of the recommended arm. Using three datasets associated with previously published IR-related user studies and a series of simulations, we test the extent to which the cost required to run user studies can be reduced by employing BAI methods. Our results suggest that some BAI instances (racing algorithms) are promising devices to reduce the cost of user studies. One of the racing algorithms studied, Hoeffding, holds particular promise. This algorithm offered consistent savings across both the real and simulated data sets and only extremely rarely returned a result inconsistent with the result of the full trial. We believe the results can have an important impact on the way research is performed in this field. 
The results show that the conditions assigned to participants could be dynamically changed, automatically, to make efficient use of participant and experimenter time.
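The idea of dropping clearly inferior conditions early can be illustrated with a simplified Hoeffding race. This is a sketch under stated assumptions, not the procedure evaluated in the paper: rewards are assumed to lie in [0, `reward_range`], every surviving condition is sampled once per round, and a fixed `delta` is used in each round without the union-bound correction over arms and rounds that a rigorous race applies. The function name and parameters are hypothetical.

```python
import math


def hoeffding_race(arms, delta=0.05, reward_range=1.0, max_rounds=1000):
    """Return the index of the recommended arm (experimental condition).

    `arms` is a list of zero-argument callables; each call returns one
    reward in [0, reward_range], e.g. one participant's score.
    """
    alive = list(range(len(arms)))
    totals = [0.0] * len(arms)
    n = 0
    for n in range(1, max_rounds + 1):
        # Sample every surviving condition once per round.
        for i in alive:
            totals[i] += arms[i]()
        # Hoeffding confidence radius after n samples per surviving arm.
        radius = reward_range * math.sqrt(math.log(2 / delta) / (2 * n))
        best_lower = max(totals[i] / n for i in alive) - radius
        # Drop arms whose upper bound falls below the best lower bound.
        alive = [i for i in alive if totals[i] / n + radius >= best_lower]
        if len(alive) == 1:
            break
    # If several arms survive the budget, recommend the best empirical mean.
    return max(alive, key=lambda i: totals[i] / n)


# Toy usage with deterministic "conditions" so the outcome is predictable.
best = hoeffding_race([lambda: 0.2, lambda: 0.9, lambda: 0.5])
```

Because eliminated conditions stop receiving samples, the participants they would have consumed are saved, which is the cost reduction the abstract refers to.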
Article
Full-text available
Institutions wrestle to protect their information from threats and cybercrime. Therefore, they are dedicating a great deal of their concern to improving the information security infrastructure. Users’ behaviors were explored by applying a traditional questionnaire as a research instrument in the data collection process. But researchers usually suffer from a lack of respondents' credibility when asking someone to fill out a questionnaire, and the credibility may decline further if the research topic relates to aspects of the use and implementation of information security policies. Therefore, there is insufficient reliability of the respondent's answers to the questionnaire’s questions, and the responses might not reflect the actual behavior based on the human bias when facing the problems theoretically. The current study creates a new idea to track and study the behavior of the respondents by building a tracking game system aligned with the questionnaire whose results are required to be known. The system will allow the respondent to answer the survey questions related to the compliance with the information security policies by tracking their behavior while using the system.
Article
The scientific and business communities are proposing new authentication methods more robust than traditional solutions relying on a single security point such as passwords (i.e. “something you know”). User and Entity Behavior Analysis (UEBA) has been postulated as an excellent solution to improve authentication systems by performing continuous authentication to extend the authentication process over time. UEBA is based on detecting anomalies in the intrinsic behaviour of each user or entity (i.e. it is based on “something you are/do”). This paper presents a method for performing continuous authentication using UEBA techniques that allows combining information from multiple sources at the feature level. This combination is achieved through a novel Symbolic Aggregate approximation (SAX) using Random Trees Embeddings for each information source, producing a sequence of symbols. Then, these sequences of symbols are combined into a single sequence using temporal information. The resulting sequence of symbols feeds a density-based clustering model that uses a distance based on DNA sequence alignment techniques to extract behavioural cores. Finally, new samples are compared against these cores to detect anomalies using a risk model that evaluates if a behaviour is anomalous (suspected user impersonation). The model has been extensively tested and evaluated against well-known state-of-the-art datasets.
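For readers unfamiliar with how a numeric signal becomes "a sequence of symbols", the sketch below shows the classic SAX transform: z-normalisation, piecewise aggregate approximation, then discretisation with Gaussian breakpoints. It is not the Random Trees Embeddings variant proposed in the paper, whose details are not given in the abstract. The function name, the alphabet, and the assumptions that the series is non-constant and that its length is divisible by `n_segments` are illustrative simplifications.

```python
from bisect import bisect_left
from statistics import NormalDist, fmean, pstdev


def sax(series, n_segments, alphabet="abcd"):
    """Convert a numeric series into a SAX word of n_segments symbols."""
    # 1. Z-normalise (assumes the series is not constant).
    mu, sigma = fmean(series), pstdev(series)
    z = [(x - mu) / sigma for x in series]
    # 2. Piecewise aggregate approximation: mean of each equal-width segment
    #    (assumes len(series) is divisible by n_segments).
    width = len(z) // n_segments
    paa = [fmean(z[k * width:(k + 1) * width]) for k in range(n_segments)]
    # 3. Breakpoints that split the standard normal into equiprobable bins.
    a = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(k / a) for k in range(1, a)]
    # 4. Map each segment mean to the symbol of the bin it falls into.
    return "".join(alphabet[bisect_left(breakpoints, v)] for v in paa)


word = sax([1, 2, 3, 4, 5, 6, 7, 8], n_segments=4)  # a steadily rising signal
```

Once every information source is expressed as such a word, the words can be merged and compared with string-oriented distances, which is the role the sequence-alignment distance plays in the method described above.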
Article
Full-text available
Click-through rate (CTR) prediction, whose goal is to estimate the probability of a user clicking on the item, has become one of the core tasks in the advertising system. For CTR prediction model, it is necessary to capture the latent user interest behind the user behavior data. Besides, considering the changing of the external environment and the internal cognition, user interest evolves over time dynamically. There are several CTR prediction methods for interest modeling, while most of them regard the representation of behavior as the interest directly, and lack specially modeling for latent interest behind the concrete behavior. Moreover, little work considers the changing trend of the interest. In this paper, we propose a novel model, named Deep Interest Evolution Network (DIEN), for CTR prediction. Specifically, we design interest extractor layer to capture temporal interests from history behavior sequence. At this layer, we introduce an auxiliary loss to supervise interest extracting at each step. As user interests are diverse, especially in the e-commerce system, we propose interest evolving layer to capture interest evolving process that is relative to the target item. At interest evolving layer, attention mechanism is embedded into the sequential structure novelly, and the effects of relative interests are strengthened during interest evolution. In the experiments on both public and industrial datasets, DIEN significantly outperforms the state-of-the-art solutions. Notably, DIEN has been deployed in the display advertisement system of Taobao, and obtained 20.7% improvement on CTR.
Article
Full-text available
There are two key characteristics of users in trust relationships that have been well studied: (1) users trust their friends with different trust strengths and (2) users play multiple roles of trusters and trustees in trust relationships. However, few studies have considered both of these factors. Indeed, it is quite common for someone to respond to his/her friend that they trusted him/her, which indicates that there exist two kinds of information between each pair of users: the trust influence of trustee on truster and the feedback influence of truster on trustee. Considering this problem, we propose a novel adaptive method to learn the trust influence between users with multiple roles of truster and trustee for recommendation. First, we propose to introduce the concept of latent trust strength to learn adaptive role-based trust strength with limited values for each trust relationship between users. Second, because there is only one training example to learn each parameter of latent trust strength, we further propose two regularization methods by building relations between latent trust strength and user preferences to guide the training process of latent trust strength. After that, we develop a new recommendation method, RoleTS, by integrating the role-based trust strength into a previous recommendation model, TrustSVD, which considers both explicit and implicit information of trust and ratings. We also conduct a series of experiments to study the performance of the proposed method. Experimental results on two public real datasets demonstrate that the proposed method performs better than several state-of-the-art algorithms.
Article
Full-text available
People may do the same activity in many different ways hence, modeling and recognizing that activity based on data gathered through simple sensors like motion sensor is a complex task. In this paper, we propose an approach for activity mining and activity tracking which identifies frequent normal and interleaved activities that individuals perform. With this capability, we can track the occurrence of regular activities to monitor users and detect changes in an individual’s behavioral pattern and lifestyle. We have tested the proposed method using the datasets of Washington State University CASAS and the Massachusetts Institute of Technology (MIT) smart home projects. The obtained results show considerable improvements compared with existing methods.
Article
This research investigates electric vehicle (EV) charging behavior and aims to find the best method for its prediction in order to optimize the EV charging schedule. This paper discusses several commonly used machine learning algorithms to predict charging behavior, including stay duration and energy consumption based on historical charging records. It is noted that prediction error increases along with the rise of data entropy or the decrease of data sparsity. Thus, this paper accounts for both indicators by defining the entropy/sparsity ratio (R). When R is low, support vector regression (SVR) and random forest (RF) regression show better accuracy for stay duration and energy consumption predictions, respectively. While R is high, a diffusion-based kernel density estimator (DKDE) performs better for both predictions. The three methods are assembled as the proposed Ensemble Predicting Algorithm (EPA) to improve predicting performance by decreasing 11% of the duration and 22% of the energy consumption prediction errors. The prediction results are then applied to an optimal EV charging scheduling algorithm to minimize load variance while reducing the EV charging cost. A numerical simulation using real charging data is conducted to show the effectiveness of improved predictions and EV load management. The results show that the charging scheduling combined with EPA prediction can reduce 27% of peak load, 10% of load variation, and 4% cost reduction, compared to uncoordinated charging.
Conference Paper
Nowadays, judging the current transaction based on user history transactions is an important detection method. However, different users have different transaction behaviors, when all users use the same limit to judge whether the transaction is abnormal, it will result in higher misjudgment for some users. Aiming at the above problems, this paper proposes an individual behavior transaction detection method based on hypersphere model. In this model, considering multiple dimensions of normal historical transaction records, the characteristics of user's transaction behavior is generated with the trend of transaction. Then, the user optimal risk threshold algorithm is proposed to determine the optimal risk threshold for each user. Finally combining the transaction behavior and the optimal risk threshold, the user behavior benchmark is formed, which is used to construct the multidimensional hypersphere model. On this basis, a mapping method for transforming transaction detection into midpoint in multidimensional space is proposed. The experiment proves that the proposed method is superior to other models, and it is found that the characterization effect of user behavior is related to the frequency of users' transactions.
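The contrast between one global limit and a per-user benchmark can be illustrated with a deliberately simplified hypersphere check. Everything specific here is an assumption: the two transaction dimensions (say, amount and hour of day, already scaled), the centre taken as the mean of the user's normal history, and the radius taken as the largest historical distance times a hypothetical `margin`. The paper's own optimal risk threshold algorithm is not described in the abstract and is not reproduced.

```python
from math import dist
from statistics import fmean


def fit_user_sphere(history, margin=1.2):
    """Build a (center, radius) pair from one user's normal transactions."""
    # Center: per-dimension mean of the user's historical transactions.
    center = tuple(fmean(dim) for dim in zip(*history))
    # Radius: placeholder threshold, NOT the paper's optimal risk threshold.
    radius = margin * max(dist(center, t) for t in history)
    return center, radius


def is_abnormal(transaction, sphere):
    """Flag a transaction that falls outside the user's hypersphere."""
    center, radius = sphere
    return dist(center, transaction) > radius


# Hypothetical scaled history for a single user.
history = [(10.0, 1.0), (12.0, 1.0), (11.0, 2.0), (11.0, 0.0)]
sphere = fit_user_sphere(history)
```

Because each user gets an individual centre and radius, a transaction that is routine for one user can still be flagged for another, which is the misjudgment problem the abstract sets out to reduce.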