Conference Paper

User Behavior Analytics for Anomaly Detection Using LSTM Autoencoder - Insider Threat Detection


Abstract

Identifying anomalies in log data for insider threat detection is a highly challenging task for security analysts, and user behavior modeling is central to it. This paper presents an unsupervised user behavior modeling approach for anomaly detection. The proposed approach uses an LSTM-based autoencoder to model user behavior from session activities and thereby identify anomalous data points. The method follows a two-step process: first, the reconstruction error is computed with the autoencoder on the non-anomalous dataset; this error is then used to define a threshold that separates outliers from normal data points, and the identified outliers are classified as anomalies. The CERT insider threat dataset was used for this work. For each user, feature vectors are prepared by extracting key information from the corresponding raw events and aggregating the data points by user actions within each user's sessions. An LSTM autoencoder was implemented for behavior learning and anomaly detection; for any unseen behavior or anomalous pattern, the model produces a high reconstruction error, which indicates an anomaly. The experimental results show that, in the best case, the model achieved an accuracy of 90.17%, a true positive rate of 91.03%, and a false positive rate of 9.84%. These results suggest that the proposed approach can be used effectively for automatic anomaly detection.
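The two-step process described in the abstract (fit on normal data, then threshold the reconstruction error) can be sketched as follows. The quantile choice and the error values are illustrative assumptions, not the paper's exact settings; the autoencoder itself is assumed to have already produced the per-session errors.

```python
import numpy as np

def fit_threshold(normal_errors, quantile=0.99):
    # Step 1: the autoencoder is trained on non-anomalous sessions;
    # its reconstruction errors on that data define the threshold.
    # Taking a high quantile is one common choice (an assumption here).
    return float(np.quantile(normal_errors, quantile))

def flag_anomalies(errors, threshold):
    # Step 2: any session whose reconstruction error exceeds the
    # threshold is classified as an anomaly.
    return errors > threshold

# Illustrative reconstruction errors (not real model output).
train_errors = np.array([0.010, 0.020, 0.015, 0.030, 0.025])
test_errors = np.array([0.020, 0.500, 0.010])

threshold = fit_threshold(train_errors)
print(flag_anomalies(test_errors, threshold))  # → [False  True False]
```

Because the threshold is derived only from normal data, no anomaly labels are needed at training time, which is what makes the approach unsupervised.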


... Baseline Tools. To evaluate the effectiveness of our proposed ASG-ITD method, we compare it with six representative anomaly detection models: DBN-OCSVM [28], CNN [29], LSTM [30], LSTM-CNN [11], LSTM-Autoencoder (LSTM-AE) [31], and GCN. These comparative models can be divided into two categories, i.e., fixed-time and session-based approaches. ...
... The statistical experimental results of each method are shown in Table 4. Compared with the other methods, ASG-ITD performs well on all evaluation metrics. Table 4 shows that the PR and F1 of [31] are much lower than ours, mainly due to the unbalanced dataset, with only 0.03% anomalous instances and 99.97% normal instances. Moreover, it is clear that existing methods have a high FPR, which may stem from two causes: first, several methods [15,28,31] only focus on numerical and categorical features from various log data, e.g., number of external emails received, number of web accesses during the weekend, user role, etc., while ignoring the relationship between user activities; second, other approaches [11,32] tend to apply temporal information in the data representation. ...
... This type of method uses deep learning, e.g., LSTM, to learn from past behavior and predict the next behavior. ...
Article
Full-text available
Insider threats pose significant risks to organizational security, often leading to severe data breaches and operational disruptions. While foundational, traditional detection methods suffer from limitations such as labor-intensive rule creation, lack of scalability, and vulnerability to evasion by sophisticated attackers. Recent advancements in graph-based approaches have shown promise by leveraging behavior analysis for threat detection. However, existing methods frequently oversimplify session behaviors and fail to extract fine-grained features, which are critical for identifying subtle malicious activities. In this paper, we propose a novel approach that integrates session graphs to capture multi-level fine-grained behavioral features. First, seven heuristic rules are defined to transform user activities across different hosts and sessions into an associated session graph while extracting features at both the activity and session levels. Furthermore, to highlight critical nodes in the associated session graph, we introduce a graph node elimination technique to normalize the graph. Finally, a graph convolutional network is employed to extract features from the normalized graph and generate behavior detection results. Extensive experiments on the CERT insider threat dataset demonstrate the superiority of our approach, achieving an accuracy of 99% and an F1-score of 99%, significantly outperforming state-of-the-art models. The ASG method also reduces false positive rates and enhances the detection of subtle malicious behaviors, addressing key limitations of existing graph-based methods. These findings highlight the potential of ASG for real-world applications such as enterprise network monitoring and anomaly detection, and suggest avenues for future research into adaptive learning mechanisms and real-time detection capabilities.
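The graph convolutional network step this abstract relies on reduces, in its simplest Kipf-style form, to the propagation rule sketched below. The adjacency matrix, node features, and weights are toy illustrations, not the paper's ASG construction or normalization.

```python
import numpy as np

def gcn_layer(adj, x, w):
    # One graph-convolution step: add self-loops, symmetrically
    # normalise the adjacency, propagate features, apply ReLU.
    a = adj + np.eye(adj.shape[0])
    d = a.sum(axis=1)
    a_norm = a / np.sqrt(np.outer(d, d))   # D^{-1/2} (A + I) D^{-1/2}
    return np.maximum(a_norm @ x @ w, 0.0)

# Toy 3-node session graph with 4-dimensional node features.
adj = np.array([[0., 1., 0.],
                [1., 0., 1.],
                [0., 1., 0.]])
x = np.ones((3, 4))
w = np.full((4, 2), 0.5)
print(gcn_layer(adj, x, w).shape)  # → (3, 2)
```

Each layer mixes a node's features with those of its neighbours, which is how session-level context reaches individual activity nodes.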
... Deep learning algorithms, such as long short-term memory (LSTM) [127][128][129], are used to automatically select user and device behaviour features due to their ability to efficiently capture time-series features. Singh et al. [127] proposed an anomaly detection method for internal user behaviour in a network based on a hybrid machine learning algorithm. ...
... However, Singh et al. [127] also noted that existing insider detection methods had problems such as a high false alarm rate and insufficient feature selection. Therefore, Singh et al. [128] went on to propose a user-behaviour-based insider threat detection method for critical infrastructure to improve feature extraction performance. Compared with [127], Singh et al. [128] used bi-directional long short-term memory (Bi-LSTM) for efficient feature extraction and an SVM classifier to classify user behaviours as normal or malicious. ...
... Singh et al. [128] achieved an accuracy of 87.5%, which is higher than LSTM+CNN (75.3%). ...
Article
Full-text available
Zero trust architecture (ZTA) is a paradigm shift in how we protect data, stay connected and access resources. ZTA is a non-perimeter-based defence that has been emerging as a promising revolution in the cyber security field. It can be used to continuously maintain security by safeguarding against attacks both from inside and outside of the network system. However, the automation and orchestration of ZTA, towards seamless deployment on real-world networks, has received limited attention in the existing literature. In this paper, we first identify the bottlenecks, discuss the background of ZTA and compare it with traditional perimeter-based security architectures. More importantly, we provide an in-depth analysis of state-of-the-art AI techniques that have potential in the automation and orchestration of ZTA. Overall, in this review paper, we develop a foundational view of the challenges and potential enablers for the automation and orchestration of ZTA.
... According to the proposed approach, the baseline behavior is identified using legitimate behavior, and behavior is flagged if it crosses the baseline threshold. Balaram Sharma et al. [19] proposed a user behavior analytics approach for anomaly detection using an LSTM autoencoder. In the proposed solution, the dataset used for training was the CERT v4.3 dataset. ...
... The session-based time window approach was proposed in [19], where all the events in a session were recorded and represented in a single feature vector for a given session. A user can log off and log on many times in a day, which creates many sessions in a day. ...
... Even though the model is trained well, the deciding factor is the threshold value, because the threshold always helps in defining a normal use case. Unlike other approaches such as those presented in [19], in the proposed approach, the threshold is calculated differently. For calculating the threshold, the reconstruction errors were collected for all attributes, by passing the normal data into the network, and the calculated minimum MSE and maximum MSE of passed data defined the thresholds. ...
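The alternative thresholding described in this snippet (a band between the minimum and maximum MSE observed on normal data, rather than a single cutoff) can be sketched as follows. Function names and sample error values are illustrative.

```python
import numpy as np

def mse_band(normal_errors):
    # Pass normal data through the network, collect reconstruction
    # errors, and keep their minimum and maximum as the normal band.
    return float(np.min(normal_errors)), float(np.max(normal_errors))

def is_anomalous(error, lo, hi):
    # Anything outside the band of errors seen on normal data
    # is flagged as anomalous.
    return error < lo or error > hi

lo, hi = mse_band(np.array([0.02, 0.05, 0.03, 0.04]))
print(is_anomalous(0.30, lo, hi))  # → True
print(is_anomalous(0.03, lo, hi))  # → False
```

Using both a lower and an upper bound also catches sequences reconstructed suspiciously *too* well, which a single upper threshold would miss.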
Article
Full-text available
The COVID-19 pandemic made all organizations and enterprises work on cloud platforms from home, which greatly facilitates cyberattacks; employees who work remotely on cloud-based platforms have become prime targets. For that reason, cyber security is a growing concern: it is now incorporated into almost every smart gadget and has become a prerequisite in every software product and service. Various mitigations exist for external cyber security attacks, but hardly any for insider security threats, as they are difficult to detect and mitigate. Thus, insider cyber security threat detection has become a serious concern in recent years. Hence, this paper proposes an unsupervised deep learning approach that employs an artificial neural network (ANN)-based autoencoder to detect anomalies in an insider cyber security attack scenario. The proposed approach analyzes the behavior patterns of users and machines for anomalies and sends an alert when a set security threshold is crossed. The threshold value for detection is calculated from the reconstruction errors obtained by testing on normal data. When the proposed model reconstructs the user behavior without generating reconstruction errors above the threshold, the user is flagged as normal; otherwise, the user is flagged as a security intruder. The proposed approach performed well, with an accuracy of 94.3% for security threat detection, a false positive rate of 11.1%, and a precision of 89.1%. The experimental results indicate that the proposed method outperforms existing methods in performance reliability, owing to the ANN-based autoencoder's use of a larger number of features in the threat detection process.
... Furthermore, the research of Sharma and colleagues [37] focuses on modeling user behavior to detect deviations that may indicate potential insider threats, using an unsupervised user behavior modeling approach based on an LSTM-based autoencoder. ...
Article
Full-text available
Insider threat refers to those threats which are malicious and perpetrated from within by people and employees of an organization who have direct and legitimate access to its network and computing systems, and are knowledgeable about its security architecture and mode of operation. However, every successful malfeasance or benign behavior and incident often originates from malicious and subtle intents hidden in digital footprints and these can serve as forensics and precursor to every insider attack. Examining extensive datasets can be overwhelming and require significant computational resources for human analysts and conventional machine learning models. Advanced deep learning methods are capable of extracting insights from intricate data. Also, it offers a new paradigm to overcome traditional machine learning limitations such as unlabeled data, sparsity, high-dimensionality, complexity, heterogeneity, and the dynamic nature of typical malicious insiders. This paper presents a review of recent literature on deep learning applications in insider threat research.
... • LSTM (Sharma et al. 2020): The model adopts a twolayer stacked bidirectional LSTM structure, with each layer containing 128 hidden units. It captures contextual information from both forward and backward directions of the input sequences. ...
Article
Full-text available
Insider threats pose significant challenges to network security due to their destructive and covert nature, often resulting in substantial losses for enterprises. Traditional methods mainly analyze user behavior patterns or convert behaviors into time sequences for further analysis. However, existing detection methods primarily focus on identifying abnormal users or behaviors, lacking the capability to pinpoint specific threats. Additionally, these methods struggle to accurately identify long-distance dependencies in behavior sequences, frequently increasing false positives. To address these issues, we introduce a scenario-oriented insider threat detection model. This model targets three specific threat scenarios, namely privilege abuse, identity theft, and data leakage, by analyzing user behavior patterns, extracting detailed behavioral characteristics, and constructing behavior sequences. Firstly, this paper serializes user behavior daily and vectorizes it using one-hot encoding. Then, it introduces contextual characteristic information and reconstructs the background of abnormal behavior through behavior vectorization, providing a comprehensive description of user behavior characteristics. This approach addresses the issue of behavior isolation, thereby improving the accuracy and robustness of anomaly detection. Subsequently, a time series analysis model based on a multi-head attention mechanism is employed to analyze long-distance dependencies in behavior sequences. The multi-head attention mechanism simultaneously attends to multiple positions in the behavior sequence, capturing potential correlations between behaviors and user behavior patterns. This mechanism can analyze local information and obtain long-distance dependencies, providing depth feature representation for anomaly detection. Ultimately, we achieve the goal of classifying abnormal behavior sequences.
We conduct comprehensive tests on the CERT dataset, demonstrating that our method outperforms traditional deep learning approaches (LSTM, GNN, and GCN) in detecting abnormal sequences. Compared to the best results among the baseline methods, it shows an improvement in accuracy of approximately 2% for privilege abuse, 5% for identity theft, and 2% for data leakage.
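The multi-head attention mechanism this abstract builds on is composed of scaled dot-product attention heads. A single-head, unmasked NumPy sketch follows; the shapes and inputs are illustrative, not the paper's configuration.

```python
import numpy as np

def attention(q, k, v):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)   # row-wise softmax
    return w @ v

# With all-zero queries every position is attended to equally,
# so each output row is the mean of the value rows.
v = np.array([[1.0, 3.0],
              [3.0, 5.0]])
out = attention(np.zeros((2, 2)), np.zeros((2, 2)), v)
print(out)  # → each row equals [2. 4.]
```

Because every position attends to every other in one step, long-distance dependencies in a behavior sequence do not have to survive many recurrent updates, which is the advantage over LSTM-style models claimed above.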
... Examples include the Artificial Immune System (AIS), such as the Negative Selection Algorithm [7], which mimics immune system responses to recognize and predict anomalous behavior. Autoencoders, particularly in combination with Gated Recurrent Units (GRU) [8] and Long Short-Term Memory (LSTM) [9] networks, are commonly used for unsupervised learning of behavior patterns, enabling the detection of deviations without labeled data. Statistical and probabilistic models offer another approach to anomaly detection. ...
Article
Full-text available
Given a series of user action sequences, Contextual Sequence-Based User Behavior Anomaly Detection (CS-UBAD) identifies anomalous sequences that deviate from normal behavior patterns. The CS-UBAD problem is important for detecting insider threats, such as unauthorized access, intellectual property theft, or other malicious activities within an organization’s systems. In this paper, we propose a novel approach called Contiguous, Contextual, and Classifying Pipeline (C3P), which integrates pattern mining and the ABC (Antecedent-Behavior-Consequence) model to calculate anomaly scores without requiring human intervention. Our method reduces the computational complexity while accurately detecting anomalous sequences.
... As we increasingly depend on the cloud and cyberattacks become more sophisticated, many approaches to strengthening healthcare cybersecurity have been developed. [5][6][7][8] This section reviews prior work in three critical areas: how zero-day threats can be detected in the cloud and how to mitigate cybersecurity challenges using Guidewire. We then identify gaps in the literature to inform existing research and opportunities for further exploration in the realm of Guidewire cloud implementations. ...
... LSTM has been used as an unsupervised anomaly detection approach by several researchers [13][14][15] . This method attempts to fix issues that recurrent neural networks (RNNs) have, namely vanishing gradients over long data sequences. ...
Article
Full-text available
Insider threats pose a significant challenge in cybersecurity, demanding advanced detection methods for effective risk mitigation. This paper presents a comparative evaluation of data imbalance addressing techniques for CNN-based insider threat detection. Specifically, we integrate Convolutional Neural Networks (CNN) with three popular data imbalance addressing techniques: Synthetic Minority Over-sampling Technique (SMOTE), Borderline-SMOTE, and Adaptive Synthetic Sampling (ADASYN). The objective is to enhance insider threat detection accuracy and robustness in imbalanced datasets common to cybersecurity domains. Our study addresses the lack of consensus in the literature regarding the superiority of data imbalance addressing techniques in this field. We analyze a human behavior-based dataset (i.e., CERT) that reports users’ Information Technology (IT) activities with a substantial number of samples to provide a clear conclusion on the effectiveness of these balancing techniques when coupled with CNN. Experimental results demonstrate that ADASYN, in conjunction with CNN, achieves a ROC curve of 96%, surpassing SMOTE and Borderline-SMOTE in enhancing detection accuracy in imbalanced datasets. We compare the results of these three hybrid models (CNN + imbalance addressing techniques) with state-of-the-art selective studies focusing on ROC, recall, and accuracy measures. Our findings contribute to the advancement of insider threat detection methodologies.
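SMOTE, Borderline-SMOTE, and ADASYN all generate synthetic minority samples by interpolating between a minority point and one of its minority-class neighbours; they differ in how candidate points are weighted. A minimal SMOTE-style sketch of that shared interpolation step (the `k`, the seed, and the toy data are assumptions, not the paper's settings):

```python
import numpy as np

rng = np.random.default_rng(0)

def smote_like(minority, n_new, k=2):
    # For each synthetic sample: pick a minority point, pick one of
    # its k nearest minority neighbours, and interpolate between them.
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(minority))
        dists = np.linalg.norm(minority - minority[i], axis=1)
        neighbours = np.argsort(dists)[1:k + 1]  # skip the point itself
        j = rng.choice(neighbours)
        gap = rng.random()
        synthetic.append(minority[i] + gap * (minority[j] - minority[i]))
    return np.array(synthetic)

minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
new = smote_like(minority, n_new=4)
print(new.shape)  # → (4, 2)
```

Every synthetic point is a convex combination of two real minority points, so the oversampled set stays inside the minority region rather than duplicating existing samples.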
... A variation of the recurrent neural network (RNN) architecture is the long short-term memory neural network (LSTM). LSTM has been used as an unsupervised anomaly detection technique by numerous researchers, including 49,50 . The purpose of this method is to get around some of the problems with RNNs, namely the vanishing of gradients over long data sequences. ...
Article
Full-text available
This study examines the formidable and complex challenge of insider threats to organizational security, addressing risks such as ransomware incidents, data breaches, and extortion attempts. The research involves six experiments utilizing email, HTTP, and file content data. To combat insider threats, emerging Natural Language Processing techniques are employed in conjunction with powerful Machine Learning classifiers, specifically XGBoost and AdaBoost. The focus is on recognizing the sentiment and context of malicious actions, which are considered less prone to change compared to commonly tracked metrics like location and time of access. To enhance detection, a term frequency-inverse document frequency-based approach is introduced, providing a more robust, adaptable, and maintainable method. Moreover, the study acknowledges the significant impact of hyperparameter selection on classifier performance and employs various contemporary optimizers, including a modified version of the red fox optimization algorithm. The proposed approach undergoes testing in three simulated scenarios using a public dataset, showcasing commendable outcomes.
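The term frequency-inverse document frequency weighting that underpins the feature extraction described above can be computed directly. A minimal sketch over already-tokenised documents (tokenisation and the sample tokens are illustrative):

```python
import math
from collections import Counter

def tfidf(docs):
    # docs: list of token lists. Weight of term t in document d is
    # tf(t, d) * log(N / df(t)), a common tf-idf variant.
    df = Counter()
    for doc in docs:
        df.update(set(doc))          # document frequency per term
    n = len(docs)
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: (tf[t] / len(doc)) * math.log(n / df[t])
                        for t in tf})
    return weights

w = tfidf([["access", "file"], ["access", "mail"]])
print(w[0]["access"])  # → 0.0 (term appears in every document)
```

Terms that appear everywhere receive zero weight, which is why the resulting features emphasise distinctive wording rather than common vocabulary.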
... If the input sequence contains an anomaly, the reconstructed sequence will be different from the original, indicating that an anomaly has been detected. The LSTM autoencoder has been commonly used for anomaly detection of time series data in fields such as medical diagnosis [21], user behavior modeling [22], business management [23], and aerospace [24]. ...
Article
Full-text available
This study presents an application of Long Short-Term Memory Autoencoder (LSTM AE) for the detection of broken rails based on laser Doppler Vibrometer (LDV) measurements. This work is part of an ongoing project aimed at developing a non-contact damage detection system using LDV measurements. The damage detection system consists of two laser Doppler vibrometers (LDV) mounted on a moving rail car to measure vibrations induced on the rail head. Field tests were carried out at the Transportation Technology Center (TTC) in Pueblo, CO, to collect the vibrational data. This study focused on the detection of broken rails. To simulate the reflected and transmitted waves induced by the broken rail, a welded joint was used. The data was collected from moving LDV measurements, in which the train was operating at three different speeds: 16km/h (10mph), 32km/h (20mph), and 48km/h (30mph). After obtaining the data, filtering and signal processing were applied to obtain the signal features in time and frequency domains. Next, correlation analysis and principal component analysis were carried out for feature selection and dimension reduction to determine the input used to train and test the LSTM AE model. In this study, the LSTM AE models were trained based on different data sets for anomaly detection. Consequently, an automatic anomaly detection approach for anomaly detection based on the LSTM AE model was evaluated. The results show that the LSTM AE model can efficiently detect the anomaly based on the selected features at three different speeds.
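The principal component analysis step this abstract uses for dimension reduction before training the LSTM AE can be sketched via the SVD of the centred data matrix; the random data and component count below are illustrative stand-ins for the LDV signal features.

```python
import numpy as np

def pca_reduce(x, k):
    # Centre the data and project onto the top-k principal
    # components obtained from the SVD of the centred matrix.
    xc = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(xc, full_matrices=False)
    return xc @ vt[:k].T

rng = np.random.default_rng(1)
signals = rng.normal(size=(100, 6))   # 100 samples, 6 features
reduced = pca_reduce(signals, k=2)
print(reduced.shape)  # → (100, 2)
```

Reducing the feature dimension this way keeps the directions of greatest variance, which shortens the input vectors the autoencoder has to reconstruct.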
... User and Entity Behavior Analytics (UEBA) is an advanced security solution that leverages machine learning and analytics to detect anomalies and potential security threats by analyzing the behavior of users and entities within a network. UEBA is particularly effective in cloud computing environments due to its ability to identify subtle and complex threats that traditional security measures might miss [43,44]. Recent research highlights the effectiveness of UEBA in identifying complex and subtle threats, enhancing cloud security through advanced anomaly detection techniques [45]. ...
Article
Full-text available
With the continued development of cloud computing environments, security measures have become more important than ever. Intrusion detection systems (IDS) are considered one of the most critical security measures in cloud computing. Researchers aim to find effective technologies for detecting intrusions in cloud computing. This paper presents a comprehensive survey of the techniques used for Intrusion Detection in Cloud Computing and their classification. Specifically, it covers a range of techniques such as machine learning, and provides insights for researchers looking to develop more flexible and effective techniques for intrusion detection in cloud computing.
... Previous research in this domain has utilized conventional machine learning techniques, such as SVM [27] and clustering [28], as well as deep learning methods, such as LSTM [29,30,31], autoencoders [32], transformers [33], and graph embedding [34]. Other studies modeled logs generically [29,35] or modeled specific activities like network traffic [36], user activity [37], or other domains [38]. ...
Preprint
Serverless computing is an emerging cloud paradigm with serverless functions at its core. While serverless environments enable software developers to focus on developing applications without the need to actively manage the underlying runtime infrastructure, they open the door to a wide variety of security threats that can be challenging to mitigate with existing methods. Existing security solutions do not apply to all serverless architectures, since they require significant modifications to the serverless infrastructure or rely on third-party services for the collection of more detailed data. In this paper, we present an extendable serverless security threat detection model that leverages cloud providers' native monitoring tools to detect anomalous behavior in serverless applications. Our model aims to detect compromised serverless functions by identifying post-exploitation abnormal behavior related to different types of attacks on serverless functions, and therefore, it is a last line of defense. Our approach is not tied to any specific serverless application, is agnostic to the type of threats, and is adaptable through model adjustments. To evaluate our model's performance, we developed a serverless cybersecurity testbed in an AWS cloud environment, which includes two different serverless applications and simulates a variety of attack scenarios that cover the main security threats faced by serverless functions. Our evaluation demonstrates our model's ability to detect all implemented attacks while maintaining a negligible false alarm rate.
... An Artificial Intelligence/Machine Learning (AI/ML) driven solution can analyze large data sets with a high degree of accuracy to identify the most subtle Indicators of Behavior (IoBs) at a scale that manual human analysis can never match. The goal of behavior analytics is to detect anomalous user behavior that indicates potential threats such as malicious insiders, compromised accounts, data exfiltration, ransomware, and other threats, through machine learning and statistical analysis [10]. ...
Article
Ransomware, a form of malicious software originating from cryptovirology, poses a serious threat by coercing victims to pay a ransom under the risk of having their data exposed or access permanently restricted. While basic ransomware may lock a system without harming files, more sophisticated variants use cryptoviral extortion techniques. The danger is significant, with new strains and families continually discovered on the internet and dark web, and recovery from infection is challenging due to the complex encryption schemes employed. Exploring machine learning and deep learning methods for ransomware detection is therefore crucial, as these technologies can identify zero-day threats. This survey reviews research contributions on ransomware detection using deep learning algorithms. With deep learning gaining prominence in cybersecurity, we aimed to explore techniques for ransomware detection, assess weaknesses in existing deep learning approaches, and propose enhancements to those algorithms. More broadly, machine learning algorithms can be employed against worldwide computer security challenges, encompassing malware detection, ransomware recognition, fraud detection, and the identification of spoofing attempts; they are instrumental in identifying and mitigating attacks, conducting vulnerability scans, and evaluating the risks associated with the public internet, thereby fortifying systems against potential vulnerabilities and enhancing the overall security posture. Research in this field also investigates the use of cyber training in both defensive and offensive contexts, offering insights into the intersection of cyber threats and machine learning techniques.
... This is true regardless of the origin of the actions they carry out. Attackers from the outside find it difficult to conceal their hacking tracks, but it is similarly difficult to detect hostile insiders using signature-based profiles of the users [3]. ...
Article
Full-text available
The detection of malicious user conduct that does not trigger an access-violation or data-breach alert can be a difficult task in itself. With stolen login credentials, an intruder conducting espionage will initially attempt to stealthily acquire data from the company network that he is authorised to access, while trying to avoid discovery. This article describes a User Behaviour Analytics Platform designed to collect logs, extract features, and detect atypical users who may represent insider threats, together with a multi-algorithm ensemble that combines OCSVM, RNN, and Isolation Forest. Under the experimental conditions, it was demonstrated that the system, composed of a collection of unsupervised anomaly detection algorithms, is able to recognise unusual patterns of user behaviour. The proposed study attempts to identify insider-threat behaviours and to monitor any behaviour the model deems unexpected or suspicious; such behaviour is treated as anomalous because it produces a high reconstruction error in the model. During training, feature vectors derived from user log activities within a predetermined daily time frame are used. The approach employs an autoencoder based on Gated Recurrent Units (GRU) to model daily user behaviour and identify abnormal insider threat spots. Errors on normal data are negligible, since the model is fitted closely to normal data, whereas for the malevolent category of aberrant data the autoencoder produces a large error. The dataset used in this study is Computer Emergency Response Team (CERT) r4.2.
The feature vectors are computed from the daily occurrences of specific actions by the users, and the GRU autoencoder is utilised for behaviour learning.
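The daily feature construction described here (counting each user's occurrences of each action per day) can be sketched as follows. The event tuples and action names are illustrative, not the CERT r4.2 schema.

```python
from collections import Counter, defaultdict

def daily_feature_vectors(events, actions):
    # events: (user, day, action) records. Returns one count vector
    # per (user, day), ordered by the given action list.
    counts = defaultdict(Counter)
    for user, day, action in events:
        counts[(user, day)][action] += 1
    return {key: [c[a] for a in actions] for key, c in counts.items()}

events = [("u1", "d1", "logon"), ("u1", "d1", "email"),
          ("u1", "d1", "logon"), ("u2", "d1", "usb")]
vectors = daily_feature_vectors(events, ["logon", "email", "usb"])
print(vectors[("u1", "d1")])  # → [2, 1, 0]
```

Each (user, day) vector then becomes one input the GRU autoencoder learns to reconstruct.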
... Then, such a baseline is transmitted to a deep learning (DL)-based anomaly detection system that employs a Long Short-Term Memory (LSTM) model. Similarly, an LSTM model is leveraged in the study by Sharma et al. (2020) as an autoencoder to learn the user behavioral pattern (known in advance, i.e., the method works on labeled data) and to determine anomalies according to a threshold, computed on the basis of a reconstruction error defined on the legitimate data subset. ...
Article
Full-text available
Introduction Government agencies are now encouraging industries to enhance their security systems to detect and respond proactively to cybersecurity incidents. Consequently, equipping with a security operation center that combines the analytical capabilities of human experts with systems based on Machine Learning (ML) plays a critical role. In this setting, Security Information and Event Management (SIEM) platforms can effectively handle network-related events to trigger cybersecurity alerts. Furthermore, a SIEM may include a User and Entity Behavior Analytics (UEBA) engine that examines the behavior of both users and devices, or entities, within a corporate network. Methods In recent literature, several contributions have employed ML algorithms for UEBA, especially those based on the unsupervised learning paradigm, because anomalous behaviors are usually not known in advance. However, to shorten the gap between research advances and practice, it is necessary to comprehensively analyze the effectiveness of these methodologies. This paper proposes a thorough investigation of traditional and emerging clustering algorithms for UEBA, considering multiple application contexts, i.e., different user-entity interaction scenarios. Results and discussion Our study involves three datasets sourced from the existing literature and fifteen clustering algorithms. Among the compared techniques, HDBSCAN and DenMune showed promising performance on the state-of-the-art CERT behavior-related dataset, producing groups with a density very close to the number of users.
... In Table 11, we compare the results of the proposed work with existing work using performance evaluation metrics, including accuracy, precision, true negative rate, area under the curve, and false positive rate. These results are compared with those of four established approaches: DNN [58], OCSVM based on DBN [59], LSTM Autoencoder [60], and User Behavior Analysis [61]. The results demonstrate that supervised learning on a balanced dataset with RF achieves the highest accuracy and F1-score, 95.9%, compared to the existing works. ...
Article
Full-text available
Insider threats refer to abnormal actions taken by individuals with privileged access, compromising the confidentiality, integrity, and availability of system data. They pose significant cybersecurity risks, leading to substantial losses for several organizations. Detecting insider threats is crucial due to the imbalance in their datasets. Moreover, the performance of existing works has been evaluated on various datasets and problem settings, making it challenging to compare the effectiveness of different algorithms and offer recommendations to decision-makers. Furthermore, no existing work investigates the impact of changing hyperparameters. This paper aims to objectively assess the performance of various supervised machine learning algorithms for detecting insider threats under the same setting. We precisely evaluate the performance of various supervised machine learning algorithms on a balanced dataset using the same feature extraction method. Additionally, we explore the impact of hyperparameter tuning on performance within the balanced dataset. Finally, we investigate the performance of different algorithms in the context of imbalanced datasets under various conditions. We conduct all the experiments on the publicly available CERT r4.2 dataset. The results show that supervised learning on a balanced dataset with RF obtains the best accuracy and F1-score of 95.9% compared with existing works, such as DNN, LSTM Autoencoder, and User Behavior Analysis.
... An LSTM-based autoencoder model was employed to model user behavior and session activities, with the objective of identifying data points that deviate significantly from the norm. The dataset used for training and validation is the CERT r4.2 dataset, and accuracy, recall, and false positive rate were the evaluation metrics [46]. ...
Article
Computers are crucial instruments providing a competitive edge to organizations that have adopted them. Their pervasive presence has presented a novel challenge to information security, specifically threats emanating from privileged employees. Various solutions have been tried to address the problem, but none has proven exhaustive. Due to their elusive nature, proactive strategies have been proposed, of which detection using machine learning models has been favoured. The choice of algorithm, datasets, and metrics are cornerstones of model performance and hence need to be addressed. Although multiple studies on ML for insider threat detection have been done, none has provided a comprehensive analysis of algorithms, datasets, and metrics for the development of insider threat detection models. This study conducts a comprehensive systematic literature review using reputable databases to answer the research questions posed. Search strings and inclusion and exclusion criteria were set to determine the eligibility of articles published in the last decade.
Article
Full-text available
The increasing complexity of organizational systems creates new opportunities for insider threats to exploit vulnerabilities and cause significant damage. Insider threat detection (ITD) has become a critical first line of defense for organizations to prevent security breaches. Researchers have developed numerous methodologies targeting specific types of network activities, such as file transfers, login attempts, and network traffic patterns, to address these threats. User behavioral-based insider threat detection (UBITD) is a critical research and development direction in cybersecurity. Despite the abundance of research on ITD methods, there is a notable scarcity of systematic reviews focusing on the latest advancements and the data used to train them. Although numerous review papers have explored various ITD approaches, most adopt a non-systematic approach, merely comparing existing techniques without providing a comprehensive analytical synthesis of methodologies and performance outcomes. Consequently, these reviews fall short of delivering a holistic understanding of the current ITD landscape, as much of the existing literature emphasizes signature-based ITD with a focus on machine learning and deep learning models, while UBITD remains minimally explored. This paper presents an in-depth analysis of UBITD by systematically reviewing 101 of the most influential research papers published on the topic. Our analysis rigorously examines the technical advancements, data preprocessing techniques, detection approaches, evaluation metrics, researcher collaborations, datasets, and future trends in this field. The findings reveal unsolved research challenges and uncharted research areas within each of these perspectives. By outlining several high-impact future research endeavors, this study aims to strengthen the role of ITD in cybersecurity, contributing to the development of more robust and proactive defenses against insider threats.
Article
Logs record crucial information about the runtime status of a software system, which can be utilized for anomaly detection and fault diagnosis. However, existing techniques struggle to perform effectively when dealing with interleaved logs and entities that influence each other. Although manually specifying a grouping field for each dataset can handle the single-grouping scenario, the problems of multiple and heterogeneous grouping still remain unsolved. To break through these limitations, we first design a log semantic association mining approach to convert log sequences into a Log-Entity Graph, and then propose a novel log anomaly detection model named Lograph. The semantic association can be utilized to implicitly group the logs and sort out complex dependencies between entities, which have been overlooked in existing literature. Also, a Heterogeneous Graph Attention Network is utilized to effectively capture anomalous patterns of both logs and entities, where the Log-Entity Graph serves as a data management and feature engineering module. We evaluate our model on real-world log datasets, comparing with nine baseline models. The experimental results demonstrate that Lograph can improve the accuracy of anomaly detection, especially on datasets where entity relationships are intricate and grouping strategies are not applicable.
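The core idea of implicit grouping — logs that mention the same entity are linked, with no manually specified grouping field — can be illustrated with a minimal sketch. The entity patterns below (IPv4 addresses and a hypothetical `svc-*` service-name convention) are assumptions for illustration, not Lograph's actual mining method:

```python
import re
from collections import defaultdict

def build_log_entity_graph(logs):
    """Link each log line to the entities it mentions, implicitly grouping
    logs that share an entity. The entity regex (IPv4 addresses and a
    hypothetical 'svc-*' naming convention) is illustrative only."""
    entity_re = re.compile(r"(?:\d{1,3}\.){3}\d{1,3}|\bsvc-\w+")
    edges = defaultdict(set)              # entity -> indices of related logs
    for i, line in enumerate(logs):
        for entity in entity_re.findall(line):
            edges[entity].add(i)
    return dict(edges)

logs = [
    "connect from 10.0.0.5 to svc-auth",
    "svc-auth rejected token for session 41",
    "10.0.0.5 opened interactive session",
]
graph = build_log_entity_graph(logs)
print(sorted(graph["svc-auth"]))  # → [0, 1]
```

In the bipartite structure produced here, interleaved logs from different sessions fall into the right groups automatically whenever they share an entity mention.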
Article
In the current digital age, users store their personal information in corporate databases to access services, making data security and sensitive information protection central to enterprise security management. Given the extensive attack surface, system assets continuously face cyber security challenges such as weak authentication, exploitation of system vulnerabilities, and malicious software. Through specific vulnerabilities, attackers may gain unauthorized system access, masquerade as legitimate users, and remain hidden. Successful attacks can lead to the leakage of user privacy, disruption of business operations, significant financial losses, and damage to corporate reputation. The increasing complexity of attack vectors is blurring the boundaries between insider and external threats. To address this issue, this paper introduces the IDU-Detector, an innovative threat detection framework that strategically integrates Intrusion Detection Systems (IDS) with User and Entity Behavior Analytics (UEBA). This integration aims to monitor unauthorized access and malicious attacks within systems, bridging functional gaps between existing systems, ensuring continuous monitoring and real-time response of the network environment, and enhancing their collective effectiveness in identifying security threats. Additionally, the existing insider threat datasets exhibit significant deficiencies in both depth and comprehensiveness, lacking sufficient coverage of diverse attack vectors. This limitation hinders the ability of insider threat detection technologies to effectively address the growing complexity and expanding scope of sophisticated attack surfaces. To address these gaps, we propose new, more enriched and diverse datasets that include a wider range of attack scenarios, thereby enhancing the adaptability and effectiveness of detection technologies in complex threat environments.
We tested our framework on different datasets; the IDU-Detector achieved average accuracy rates of 98.96% and 99.12%. These results demonstrate the method's effectiveness in detecting masquerader attacks and other malicious activities, significantly improving security protection and incident response speed, and providing a higher level of security assurance for asset safety.
Preprint
Full-text available
In the digital age, users store personal data in corporate databases, making data security central to enterprise management. Given the extensive attack surface, assets face challenges like weak authentication, vulnerabilities, and malware. Attackers may exploit vulnerabilities to gain unauthorized access, masquerading as legitimate users. Such attacks can lead to privacy breaches, business disruption, financial losses, and reputational damage. Complex attack vectors blur lines between insider and external threats. To address this, we introduce the IDU-Detector, integrating Intrusion Detection Systems (IDS) with User and Entity Behavior Analytics (UEBA). This integration monitors unauthorized access, bridges system gaps, ensures continuous monitoring, and enhances threat identification. Existing insider threat datasets lack depth and coverage of diverse attack vectors. This hinders detection technologies from addressing complex attack surfaces. We propose new, diverse datasets covering more attack scenarios, enhancing detection technologies. Testing our framework, the IDU-Detector achieved average accuracies of 98.96% and 99.12%. These results show effectiveness in detecting attacks, improving security and response speed, and providing higher asset safety assurance.
Article
Full-text available
This study delineates a suite of architectural views and a security perspective tailored to guide the deployment and integration of Social Robots in Public Spaces (SRPS). It commences with a business context view that utilizes the customer-producer-supplier model, underscoring the value of SRPS to various stakeholders and illustrating how robots can enhance user experiences and drive economic benefits. The system context view details the intricate interactions among the social robot, stakeholders, public spaces, and external systems, highlighting essential considerations for successful deployment, from technical configurations to stakeholder engagement. The functional view elaborates on the operational dynamics of the robot within its environment, focusing on user interaction and data management capabilities. Additionally, the security perspective delves into security considerations vital for safeguarding the SRPS across various domains, including identity and access management, application and network security, and data privacy. The paper also contextualizes these views through a city ferry use case, demonstrating their practical application and reinforcing the importance of multifaceted planning and analysis in real-world settings. This approach provides a strategic framework of views for developing SRPS that are viable, efficient, and secure, fostering successful adoption in diverse public environments.
Article
Full-text available
Insider threats are profoundly damaging and pose serious security challenges. These threats, perpetrated by insiders, may arise from delinquency, retaliation, or motives such as ambition for success, recognition, financial gain, or knowledge acquisition. They manifest in various forms; for example, an insider might disrupt systems by inserting a malicious script or engage in intellectual property theft. Due to their diverse nature, detection is highly complex and challenging, as standard security devices such as intrusion detection systems, firewalls, or antivirus software cannot detect it; hence, it entails careful and diligent work. This survey reviews existing research between 2010 and 2024 on detecting insider threats. It not only expounds on the novel taxonomy based on previous works and diverse motivations for insider threats but also identifies challenges and gaps in detecting malicious insiders. It highlights the state-of-the-art tools, techniques, and methodologies and also discusses the limitations of the same. Finally, the paper provides an overview of identifying optimal solutions and discusses future research directions that could lead to new methods for detecting insider threats.
Article
Security threats have been the major challenge for any organization. This has become even more pressing now that most organizational data are in digital form, and digital data are easy to access and alter if not properly secured. While most of the threats considered are external, such as viruses, worms, DoS, DDoS, and hacking, internal threats cannot be ignored. Many frauds, especially in organizations that perform financial transactions, are committed by misusing internal access to data. Internal threats come from users who have some privileged access to the data. Finding such a threat is not only difficult but also more challenging than detecting one from an external source. Most organizations do not give internal threats much consideration, but lately much work has been done in the field of internal threat detection.
Article
Full-text available
Businesses face an ever-growing problem of how to identify and guard against insider threats. Users with legitimate access to sensitive organizational data are placed in a position of power that can be abused and could harm an enterprise. This can range from monetary and intellectual property theft to the destruction of assets and enterprise reputation. Traditional intrusion detection systems are neither designed for nor able to identify those who act maliciously within a business enterprise. In this paper, we describe an automated system capable of detecting insider threats within an enterprise. We outline a tree-structure profiling technique that incorporates information on the activities conducted by each user and each task, which we then use to obtain a consistent representation of features that richly describe the user's behavior. Deviation can be assessed based on the amount of variance each user exhibits across multiple attributes, compared against their peers. The primary function of User and Entity Behavior Analytics (UEBA) is to track normal user behaviors. UEBA defines a baseline for each entity in the environment, and actions are evaluated by comparing them with predefined baselines.
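The peer-comparison idea — score each behavioural attribute by how far a user deviates from the peer group's variance — can be sketched as a per-attribute z-score. The attribute names and numbers are hypothetical, not taken from the paper:

```python
import numpy as np

def peer_deviation(user_features, peer_matrix):
    """Score each behavioural attribute by how far the user sits from the
    peer group, in peer standard deviations (a simple z-score)."""
    mu = peer_matrix.mean(axis=0)
    sd = peer_matrix.std(axis=0) + 1e-9   # guard against zero variance
    return np.abs((user_features - mu) / sd)

# Hypothetical attributes per day: [logons, emails sent, USB events].
peers = np.array([[10, 2, 0], [12, 3, 1], [9, 2, 0], [11, 3, 0]])
suspect = np.array([10, 2, 14])           # normal logons/email, unusual USB use
print(int(peer_deviation(suspect, peers).argmax()))  # → 2
```

A tree-structured profile as described in the abstract would compute such scores per node of the activity hierarchy rather than on one flat vector.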
Preprint
Full-text available
In the realm of healthcare, the continuous evolution of monitoring systems demands innovative solutions to ensure heightened reliability and accuracy. This paper introduces a pioneering approach to healthcare monitoring through a hybrid deep learning model that combines the advantages of recurrent neural networks (RNN) and deep neural networks (DNN). Focused on enhancing connectivity in Software Defined Networking (SDN), our framework places a significant emphasis on anomaly detection for improved predictive accuracy. The proposed Hybrid Deep Learning model is meticulously designed to harness the complementary features of DNN and RNN, enabling the system to capture both spatial and temporal dependencies in healthcare data. This integration enhances the precision of anomaly detection, allowing for the identification of subtle deviations from normal patterns with unprecedented accuracy. Key to our methodology is the adaptability of Software Defined Networking, providing a flexible and programmable infrastructure. The Hybrid Deep Learning model operates seamlessly within this SDN framework, dynamically optimizing resource allocation and traffic patterns to accommodate the unique demands of healthcare monitoring. Through extensive experimentation and validation, our framework demonstrates remarkable predictive accuracy in identifying anomalies within healthcare data streams. Comparative analyses against traditional anomaly detection methods underscore the superiority of our approach, showcasing its efficacy in real-world healthcare scenarios. In conclusion, our research contributes to the advancement of healthcare monitoring by introducing a Hybrid Deep Learning model, combining DNN and RNN architectures, within the context of Software Defined Networking. 
The achieved high prediction accuracy in anomaly detection signifies a significant leap forward in the reliability and precision of healthcare monitoring systems, paving the way for more robust and responsive healthcare networks.
Article
Full-text available
Insider threats pose a significant risk to organizations, necessitating robust detection mechanisms to safeguard against potential damage. Traditional methods struggle to detect insider threats operating within authorized access. Therefore, the use of Artificial Intelligence (AI) techniques is essential. This study aimed to provide valuable insights for insider threat research by synthesizing advanced AI methodologies that offer promising avenues to enhance organizational cybersecurity defenses. For this purpose, this paper explores the intersection of AI and insider threat detection by acknowledging organizations' challenges in identifying and preventing malicious activities by insiders. In this context, the limitations of traditional methods are recognized, and AI techniques, including user behavior analytics, Natural Language Processing (NLP), Large Language Models (LLMs), and Graph-based approaches, are investigated as potential solutions to provide more effective detection mechanisms. For this purpose, this paper addresses challenges such as the scarcity of insider threat datasets, privacy concerns, and the evolving nature of employee behavior. This study contributes to the field by investigating the feasibility of AI techniques to detect insider threats and presents feasible approaches to strengthening organizational cybersecurity defenses against them. In addition, the paper outlines future research directions in the field by focusing on the importance of multimodal data analysis, human-centric approaches, privacy-preserving techniques, and explainable AI.
Article
Cybersecurity has become an increasingly vital concern for numerous institutions, organizations, and governments. Many studies have been carried out to prevent external attacks, but there are not enough studies on detecting insider malicious actions. Given the damage inflicted by internal threats on corporate reputations and financial situations, the absence of work in this field is a significant disadvantage. In this study, several deep learning models using fully connected layers, convolutional neural networks, and long short-term memory were developed for user and entity behavior analysis. The hyper-parameters of the models were optimized using Bayesian optimization techniques. Experimental analysis was performed using version 4.2 of the Computer Emergency Response Team (CERT) dataset. Two types of features, personal information and numerical features, were extracted with respect to users' daily activities. The dataset was divided by user or role, and the experiment results showed that user-based models perform better than role-based models. In addition, the models developed using long short-term memory were more accurate than the others. Accuracy, detection rate, F1-score, false discovery rate, and negative predictive value were used as metrics to compare model performance fairly with state-of-the-art models. According to these metrics, our model obtained better scores than the state-of-the-art models, and the performance improvements were statistically significant according to the two-tailed Z test. The study is anticipated to significantly contribute to the literature, as the deep learning approaches developed within its scope have not been previously employed in internal threat detection. Moreover, these approaches have demonstrated superior performance compared to previous studies.
Preprint
Full-text available
Fraud detection in the fintech sector is a critical area of concern as financial transactions increasingly shift to digital platforms. This paper presents a comprehensive analysis of enhancing fraud detection in fintech by combining machine learning techniques, leveraging behavioral analytics, and adopting RegTech solutions. The objective is to develop a holistic approach that strengthens fraud prevention strategies, ensures regulatory compliance, and safeguards the interests of customers and financial institutions. The paper begins with an introduction that sets the context by highlighting the growing importance of fraud detection in the digital financial landscape. It outlines the research objectives, scope, and structure of the paper. Subsequently, the methodology section details the data collection process, the selection and comparative analysis of machine learning models, the integration of behavioral analytics, and the implementation of RegTech solutions. The paper concludes with a summary of findings and contributions, emphasizing the significance of adopting a holistic approach to fraud detection in the fintech industry. It underscores the need for financial institutions to embrace advanced technologies, comply with data privacy regulations, and collaborate within the industry to combat financial crimes effectively.
Conference Paper
Insider threats involving authorised individuals exploiting their access privileges within an organisation can yield substantial damage compared to external threats. Conventional detection approaches analyse user behaviours from logs, using binary classifiers to distinguish between malicious and non-malicious users. However, existing methods focus solely on standalone or sequential activities. To enhance the detection of malicious insiders, we propose a novel approach: bilateral insider threat detection combining RNNs to incorporate standalone and sequential activities. Initially, we extract behavioural traits from log files representing standalone activities. Subsequently, RNN models capture features of sequential activities. Concatenating these features, we employ binary classification to detect insider threats effectively. Experiments on the CERT 4.2 dataset showcase the approach’s superiority, significantly enhancing insider threat detection using features from both standalone and sequential activities.
Article
Full-text available
Cyber-attacks are becoming more sophisticated and thereby presenting increasing challenges in accurately detecting intrusions. Failure to prevent the intrusions could degrade the credibility of security services, e.g. data confidentiality, integrity, and availability. Numerous intrusion detection methods have been proposed in the literature to tackle computer security threats, which can be broadly classified into Signature-based Intrusion Detection Systems (SIDS) and Anomaly-based Intrusion Detection Systems (AIDS). This survey paper presents a taxonomy of contemporary IDS, a comprehensive review of notable recent works, and an overview of the datasets commonly used for evaluation purposes. It also presents evasion techniques used by attackers to avoid detection and discusses future research challenges to counter such techniques so as to make computer systems more secure.
Article
Full-text available
Insider threats are a considerable problem within cyber security, and it is often difficult to detect these threats using signature detection. Increasingly, machine learning can provide a solution, but these methods often fail to take into account changes in user behaviour. This work builds on a published method of detecting insider threats, applies the Hidden Markov method to a CERT dataset (CERT r4.2), and analyses a number of distance vector methods (Damerau-Levenshtein distance, cosine distance, and Jaccard distance) in order to detect changes of behaviour, which are shown to have success in determining different insider threats.
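Two of the distance measures named above can be sketched directly on activity sequences: Jaccard distance compares the sets of actions performed, while cosine distance compares action-frequency vectors. The weekly sequences are illustrative, not CERT data:

```python
from collections import Counter
import math

def jaccard_distance(a, b):
    """1 - |A∩B|/|A∪B| over the sets of actions in two activity sequences."""
    sa, sb = set(a), set(b)
    return 1.0 - len(sa & sb) / len(sa | sb)

def cosine_distance(a, b):
    """1 - cosine similarity between action-frequency vectors."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[k] * cb[k] for k in set(ca) | set(cb))
    norm = math.sqrt(sum(v * v for v in ca.values())) * \
           math.sqrt(sum(v * v for v in cb.values()))
    return 1.0 - dot / norm

# Two weekly activity sequences for the same user.
week1 = ["logon", "email", "http", "logoff"]
week2 = ["logon", "usb", "file_copy", "logoff"]
print(round(jaccard_distance(week1, week2), 3))  # → 0.667
```

Damerau-Levenshtein distance, unlike these two, is order-sensitive (an edit distance over the sequences themselves), which is why the paper evaluates all three.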
Conference Paper
Full-text available
Security is one of the top concerns of any enterprise. Most security practitioners in enterprises rely on correlation rules to detect potential threats. While the rules are intuitive to design, each rule is independently defined per log source, unable to collectively address the heterogeneity of data from a myriad of enterprise networking and security logs. Furthermore, correlation rules do not look for data events beyond a short time range. To complement the conventional correlation rules-based system, we propose a user activity anomaly detection method. The method first addresses data heterogeneity of multi-source logs by designing a metadata extraction step for event normalization. It then builds user-specific models to flag alerts for users whose currently observed event patterns are sufficiently different from their own patterns in the past.
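The two steps described — normalize heterogeneous log lines into common events, then compare each user's current event pattern against their own history — can be sketched as follows. The comma-separated log layout and the L1 distance are illustrative assumptions, not the paper's method:

```python
from collections import Counter

def normalize_event(raw_line):
    """Metadata-extraction step: reduce a heterogeneous log line to a
    common (user, action) pair. The comma-separated layout is hypothetical."""
    user, action = raw_line.split(",")[:2]
    return user.strip(), action.strip()

def pattern_distance(past_actions, current_actions):
    """L1 distance between a user's past and current action-frequency
    distributions; a large value flags a shift from the user's own history."""
    pa, ca = Counter(past_actions), Counter(current_actions)
    tp, tc = sum(pa.values()), sum(ca.values())
    return sum(abs(pa[k] / tp - ca[k] / tc) for k in set(pa) | set(ca))

past = ["http"] * 8 + ["email"] * 2
current = ["http"] * 2 + ["usb"] * 8
print(round(pattern_distance(past, current), 3))  # → 1.6
```

Because each user is compared only against their own baseline, the scheme complements correlation rules rather than replacing them.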
Conference Paper
Full-text available
The threat that malicious insiders pose towards organisations is a significant problem. In this paper, we investigate the task of detecting such insiders through a novel method of modelling a user's normal behaviour in order to detect anomalies in that behaviour which may be indicative of an attack. Specifically, we make use of Hidden Markov Models to learn what constitutes normal behaviour, and then use them to detect significant deviations from that behaviour. Our results show that this approach is indeed successful at detecting insider threats, and in particular is able to accurately learn a user's behaviour. These initial tests improve on existing research and may provide a useful approach in addressing this part of the insider-threat challenge.
Article
Full-text available
Cyber security is vital to the success of today’s digital economy. The major security threats are coming from within, as opposed to outside forces. Insider threat detection and prediction are important mitigation techniques. This study addresses the following research questions: 1) what are the research trends in insider threat detection and prediction nowadays? 2) What are the challenges associated with insider threat detection and prediction? 3) What are the best-to-date insider threat detection and prediction algorithms? We conduct a systematic review of 37 articles published in peer-reviewed journals, conference proceedings and edited books for the period of 1950–2015 to address the first two questions. Our survey suggests that game theoretic approach (GTA) is a popular source of insider threat data; the insiders’ online activities are the most widely used features in insider threat detection and prediction; most of the papers use single point estimates of threat likelihood; and graph algorithms are the most widely used tools for detecting and predicting insider threats. The key challenges facing the insider threat detection and prediction system include unbounded patterns, uneven time lags between activities, data nonstationarity, individuality, collusion attacks, high false alarm rates, class imbalance problem, undetected insider attacks, uncertainty, and the large number of free parameters in the model. To identify the best-to-date insider threat detection and prediction algorithms, our meta-analysis study excludes theoretical papers proposing conceptual algorithms from the 37 selected papers resulting in the selection of 13 papers. We rank the insider threat detection and prediction algorithms presented in the 13 selected papers based on the theoretical merits and the transparency of information. 
To determine the significance of rank sums, we perform “the Friedman two-way analysis of variance by ranks” test and “multiple comparisons between groups or conditions” tests.
Article
Full-text available
Mechanical devices such as engines, vehicles, and aircraft are typically instrumented with numerous sensors to capture the behavior and health of the machine. However, there are often external factors or variables which are not captured by sensors, leading to time-series which are inherently unpredictable. For instance, manual controls and/or unmonitored environmental conditions or load may lead to inherently unpredictable time-series. Detecting anomalies in such scenarios becomes challenging using standard approaches based on mathematical models that rely on stationarity, or prediction models that utilize prediction errors to detect anomalies. We propose a Long Short Term Memory Networks based Encoder-Decoder scheme for Anomaly Detection (EncDec-AD) that learns to reconstruct 'normal' time-series behavior, and thereafter uses reconstruction error to detect anomalies. We experiment with three publicly available quasi-predictable time-series datasets: power demand, space shuttle, and ECG, and two real-world engine datasets with both predictable and unpredictable behavior. We show that EncDec-AD is robust and can detect anomalies from predictable, unpredictable, periodic, aperiodic, and quasi-periodic time-series. Further, we show that EncDec-AD is able to detect anomalies from short time-series (length as small as 30) as well as long time-series (length as large as 500).
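Once an encoder-decoder produces reconstruction-error vectors, EncDec-AD scores a point by its Mahalanobis distance under a Gaussian fitted to errors from normal data, a = (e − μ)ᵀ Σ⁻¹ (e − μ). A minimal numpy sketch of that scoring step, with toy error vectors standing in for real reconstruction errors:

```python
import numpy as np

def fit_error_model(errors):
    """Fit a Gaussian to reconstruction-error vectors from normal data."""
    mu = errors.mean(axis=0)
    sigma_inv = np.linalg.inv(np.cov(errors, rowvar=False))
    return mu, sigma_inv

def anomaly_score(e, mu, sigma_inv):
    """a = (e - mu)^T Sigma^{-1} (e - mu): high scores mark anomalies."""
    d = e - mu
    return float(d @ sigma_inv @ d)

# Toy 2-D reconstruction-error vectors from 'normal' windows.
normal_errors = np.array([[0.10, 0.20], [0.12, 0.18],
                          [0.09, 0.22], [0.11, 0.19]])
mu, sigma_inv = fit_error_model(normal_errors)
near = anomaly_score(np.array([0.11, 0.19]), mu, sigma_inv)
far = anomaly_score(np.array([0.50, 0.50]), mu, sigma_inv)
print(far > near)  # → True
```

A threshold on this score (chosen on a validation set) then separates anomalous windows from normal ones, mirroring the thresholding used in the LSTM-autoencoder paper this page belongs to.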
Chapter
Insider threat detection has attracted considerable attention from researchers and industry. Existing work mainly focused on applying machine-learning techniques to detect insider threats. However, this work requires "feature engineering", which is difficult and time-consuming. As we know, deep learning techniques can automatically learn powerful features. In this paper, we present a novel insider threat detection method with a Deep Neural Network (DNN) based on user behavior. Specifically, we use an LSTM-CNN framework to find a user's anomalous behavior. First, similar to natural language modeling, we use Long Short-Term Memory (LSTM) to learn the language of user behavior through user actions and extract abstract temporal features. Second, the extracted features are converted to fixed-size feature matrices, and a Convolutional Neural Network (CNN) uses these fixed-size feature matrices to detect insider threats. We conduct experiments on a public dataset of insider threats. Experimental results show that our method can successfully detect insider threats, and we obtained AUC = 0.9449 in the best case.
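The "fixed-size feature matrix" requirement of the CNN stage can be illustrated with the simplest possible encoding: one-hot action sequences padded or truncated to a fixed length. The vocabulary below is a hypothetical stand-in for the action types derived from CERT logs:

```python
import numpy as np

# Hypothetical action vocabulary; real work derives it from the log data.
VOCAB = {"logon": 0, "email": 1, "http": 2, "usb": 3, "logoff": 4}

def to_fixed_matrix(actions, seq_len=8):
    """One-hot encode an action sequence into a fixed-size matrix:
    truncate beyond seq_len, zero-pad shorter sequences, so the CNN
    always sees the same input shape."""
    m = np.zeros((seq_len, len(VOCAB)), dtype=np.float32)
    for i, a in enumerate(actions[:seq_len]):
        m[i, VOCAB[a]] = 1.0
    return m

mat = to_fixed_matrix(["logon", "http", "email", "logoff"])
print(mat.shape)  # → (8, 5)
```

In the LSTM-CNN pipeline the rows would instead be LSTM hidden states, but the reshaping-to-fixed-size step is the same idea.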
The Essential Guide to User Behavior Analytics
BALABIT Publication. The Essential Guide to User Behavior Analytics. Retrieved March 6, 2020 from https://www.ciosummits.com/Online_Assets_Balabit_Essential_Guide_to_User_Behavior_Analytics.pdf
Anomalous User Activity Detection in Enterprise Multi-source Logs
Qiaona Hu, Baoming Tang, and Derek Lin. 2017. Anomalous User Activity Detection in Enterprise Multi-source Logs. 2017 IEEE International Conference on Data Mining Workshops (ICDMW) (2017).