Technical Report

An MBS Model For Enterprise Security Using User and Entity Behavior Analytics

Authors:

Abstract

Businesses face an ever-growing problem of how to identify and defend against insider threats. Users with authorized access to sensitive organizational data hold a position of power that can be abused to harm an enterprise. The harm can range from financial and intellectual property theft to the destruction of assets and damage to the enterprise's reputation. Traditional intrusion detection systems are neither designed nor able to identify those who act maliciously inside an enterprise. In this paper, we describe an automated system capable of detecting insider threats within an enterprise. We outline a tree-structured profiling technique that incorporates the details of the activities conducted by each user and each job role, and we then use it to obtain a consistent feature representation that provides a rich description of each user's behavior. Deviation can be assessed based on the amount of variance that each user exhibits across multiple attributes, compared against their peers. The primary function of User and Entity Behavior Analytics (UEBA) is to track normal user behavior: UEBA defines a baseline for each entity in the environment, and actions are evaluated by comparing them with these predefined baselines.
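The baseline-and-peer comparison described above can be illustrated with a minimal Python sketch. It assumes that each user's behavior has already been summarized as numeric attribute counts (the attribute names `logons` and `usb_events` are hypothetical) and scores deviation as a per-attribute z-score against a reference group, which may be either the user's own history (a self-baseline) or the user's peers. The z-score and the mean-absolute aggregation are illustrative choices, not the method of the paper.

```python
from statistics import mean, pstdev


def deviation_from_baseline(current, reference):
    """Per-attribute z-scores of `current` against a reference group.

    current:   dict mapping attribute name -> observed value for one user
    reference: list of dicts with the same attributes, either the user's own
               past observations (self-baseline) or the user's peers
    """
    scores = {}
    for attr, value in current.items():
        ref_values = [r[attr] for r in reference]
        mu = mean(ref_values)
        sigma = pstdev(ref_values)
        # Simplification: a zero-variance attribute is scored as no deviation;
        # a real system might instead flag any change from a constant baseline.
        scores[attr] = 0.0 if sigma == 0 else (value - mu) / sigma
    return scores


def aggregate_score(scores):
    """Combine per-attribute deviations into one score (mean absolute z-score)."""
    return sum(abs(z) for z in scores.values()) / len(scores)


# Hypothetical peer group and one user under review.
peers = [{"logons": 4, "usb_events": 0},
         {"logons": 5, "usb_events": 1},
         {"logons": 6, "usb_events": 2}]
user = {"logons": 5, "usb_events": 9}

scores = deviation_from_baseline(user, peers)
print(scores, aggregate_score(scores))
```

With these inputs the user's logon count equals the peer mean (z = 0), while the USB count lies about 9.8 standard deviations above it, so that single attribute dominates the aggregate score.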

Article
The increasing penetration of renewable energy sources introduces complex uncertainties into the power system. To capture such uncertainties in power system planning, an important step is to generate representative scenarios. In this work, a long short-term memory (LSTM) autoencoder-based approach is proposed to generate representative scenarios in an integrated hydro-photovoltaic (PV) power generation system. The approach consists of feature extraction by an LSTM encoder, scenario clustering in the feature domain by combining the gap statistic method and K-means++, and representative scenario reconstruction using an LSTM decoder. Compared with traditional scenario selection and generation methods, the proposed method better captures the patterns of multivariate time-series data in both the temporal and spatial dimensions. A case study in southwest China demonstrates the effectiveness of the proposed method, which outperforms other existing methods by achieving the lowest SSE and DBI indices of 0.89 and 0.12, respectively, and the best SIL and CHI scores of 0.93 and 2.30, respectively. In addition, the case study shows that the proposed model setup is more stable for scenario generation.
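The abstract evaluates clusters with SSE and the Davies-Bouldin index (DBI), both of which are lower for better clusterings. The sketch below shows how these two indices are commonly computed for clusters given as lists of points, assuming Euclidean distance; it is not the authors' evaluation code, and it omits the silhouette (SIL) and Calinski-Harabasz (CHI) scores.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a list of equal-length points, as a tuple."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def sse(clusters):
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    total = 0.0
    for pts in clusters:
        c = centroid(pts)
        total += sum(math.dist(p, c) ** 2 for p in pts)
    return total


def davies_bouldin(clusters):
    """Mean over clusters of the worst-case (scatter_i + scatter_j) / centroid distance."""
    cents = [centroid(pts) for pts in clusters]
    # Scatter: average distance of a cluster's points to its centroid.
    scatter = [sum(math.dist(p, c) for p in pts) / len(pts)
               for pts, c in zip(clusters, cents)]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        total += max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                     for j in range(k) if j != i)
    return total / k


clusters = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
print(sse(clusters), davies_bouldin(clusters))  # 4.0 0.2
```

Two tight, well-separated clusters give a small DBI; pulling the clusters closer together or spreading their points would raise both indices.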
Conference Paper
Organizations are using advanced security solutions to protect their information resources. However, despite such high investments, traditional security approaches have failed to protect networks against state-of-the-art attacks. New proactive approaches to security, such as User and Entity Behavior Analytics (UEBA), are on the rise. UEBA is a cybersecurity process that uses machine learning, algorithms, and statistical analyses to detect network attacks in real time. This paper aims to assess the value and success of using behavior analytics to secure the network against previously unseen attacks such as zero-day attacks. It uses a systematic literature review, a self-administered survey, and interviews with a convenience sample of high-profile network users and top security vendors. The survey and the interviews with various security experts are used to verify the practical effectiveness of solutions based on behavior analytics. While collecting the primary data via the survey, the researchers conduct structured interviews with vendors who sell such solutions to understand the performance of behavior analytics-based solutions and their distinguishing features. The results of the literature review, survey, interviews, and focus groups will be used to assess the value and success of behavior analytics in securing the network against previously unseen attacks such as zero-day attacks. The goal of this paper is to highlight the strengths and weaknesses of different UEBA solutions and their effectiveness at detecting network attacks in real time. This research compares the top fifteen UEBA technologies based on use cases and capabilities and highlights common usage scenarios. Based on the evidence, recommendations will be given.
Conference Paper
The detection of anomalous structures in natural image data is of utmost importance for numerous tasks in the field of computer vision. The development of methods for unsupervised anomaly detection requires data on which to train and evaluate new approaches and ideas. We introduce the MVTec Anomaly Detection (MVTec AD) dataset containing 5354 high-resolution color images of different object and texture categories. It contains normal, i.e., defect-free, images intended for training and images with anomalies intended for testing. The anomalies manifest themselves in the form of over 70 different types of defects such as scratches, dents, contaminations, and various structural changes. In addition, we provide pixel-precise ground truth regions for all anomalies. We also conduct a thorough evaluation of current state-of-the-art unsupervised anomaly detection methods based on deep architectures such as convolutional autoencoders, generative adversarial networks, and feature descriptors using pre-trained convolutional neural networks, as well as classical computer vision methods. This initial benchmark indicates that there is considerable room for improvement. To the best of our knowledge, this is the first comprehensive, multi-object, multi-defect dataset for anomaly detection that provides pixel-accurate ground truth regions and focuses on real-world applications.
Chapter
Anomaly detection is a classical problem in computer vision, namely the determination of the normal from the abnormal when datasets are highly biased towards one class (normal) due to the insufficient sample size of the other class (abnormal). While this can be addressed as a supervised learning problem, a significantly more challenging problem is that of detecting the unknown/unseen anomaly case that takes us instead into the space of a one-class, semi-supervised learning paradigm. We introduce such a novel anomaly detection model, by using a conditional generative adversarial network that jointly learns the generation of high-dimensional image space and the inference of latent space. Employing encoder-decoder-encoder sub-networks in the generator network enables the model to map the input image to a lower dimension vector, which is then used to reconstruct the generated output image. The use of the additional encoder network maps this generated image to its latent representation. Minimizing the distance between these images and the latent vectors during training aids in learning the data distribution for the normal samples. As a result, a larger distance metric from this learned data distribution at inference time is indicative of an outlier from that distribution—an anomaly. Experimentation over several benchmark datasets, from varying domains, shows the model efficacy and superiority over previous state-of-the-art approaches.
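The inference step described above scores an input by the distance between the latent vector of the input and the latent vector of its reconstruction. A minimal sketch of that scoring step follows, assuming the two latent vectors per test image have already been produced by trained encoder networks (not shown). The L1 distance and the min-max rescaling of scores over a test set are assumptions here, chosen as one common way to make scores comparable.

```python
def latent_distance(z, z_hat):
    """L1 distance between the input's latent vector and that of its reconstruction."""
    return sum(abs(a - b) for a, b in zip(z, z_hat))


def anomaly_scores(latent_pairs):
    """Score each (z, z_hat) pair and rescale the scores to [0, 1] over the set."""
    raw = [latent_distance(z, z_hat) for z, z_hat in latent_pairs]
    lo, hi = min(raw), max(raw)
    if hi == lo:
        # All distances are equal: no input stands out from the others.
        return [0.0] * len(raw)
    return [(s - lo) / (hi - lo) for s in raw]


# Hypothetical latent vectors for three test images.
pairs = [([0.0, 0.0], [0.0, 0.0]),
         ([1.0, 0.0], [0.0, 0.0]),
         ([1.0, 1.0], [0.0, 0.0])]
print(anomaly_scores(pairs))  # [0.0, 0.5, 1.0]
```

Higher scores mark inputs that lie farther from the learned distribution of normal samples; a threshold on the score would then separate anomalies from normal inputs.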
Conference Paper
We evaluate transfer representation-learning for anomaly detection using convolutional neural networks by: (i) transfer learning from pretrained networks, and (ii) transfer learning from an auxiliary task by defining sub-categories of the normal class. We empirically show that both approaches offer viable representations for the task of anomaly detection, without explicitly imposing a prior on the data.
Article
Cyber security is vital to the success of today’s digital economy. The major security threats are coming from within, as opposed to outside forces. Insider threat detection and prediction are important mitigation techniques. This study addresses the following research questions: 1) what are the research trends in insider threat detection and prediction nowadays? 2) What are the challenges associated with insider threat detection and prediction? 3) What are the best-to-date insider threat detection and prediction algorithms? We conduct a systematic review of 37 articles published in peer-reviewed journals, conference proceedings and edited books for the period of 1950–2015 to address the first two questions. Our survey suggests that game theoretic approach (GTA) is a popular source of insider threat data; the insiders’ online activities are the most widely used features in insider threat detection and prediction; most of the papers use single point estimates of threat likelihood; and graph algorithms are the most widely used tools for detecting and predicting insider threats. The key challenges facing the insider threat detection and prediction system include unbounded patterns, uneven time lags between activities, data nonstationarity, individuality, collusion attacks, high false alarm rates, class imbalance problem, undetected insider attacks, uncertainty, and the large number of free parameters in the model. To identify the best-to-date insider threat detection and prediction algorithms, our meta-analysis study excludes theoretical papers proposing conceptual algorithms from the 37 selected papers resulting in the selection of 13 papers. We rank the insider threat detection and prediction algorithms presented in the 13 selected papers based on the theoretical merits and the transparency of information. 
To determine the significance of rank sums, we perform “the Friedman two-way analysis of variance by ranks” test and “multiple comparisons between groups or conditions” tests.
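The Friedman test named above compares k treatments (here, algorithms) across n blocks using ranks assigned within each block. The sketch below computes the standard statistic Q = 12 / (n k (k + 1)) * sum_j R_j^2 - 3 n (k + 1), where R_j is the rank sum of treatment j; ties receive average ranks, but no tie correction is applied. How the reviewed study arranged its blocks is not stated in the abstract, so the table layout is an assumption. For moderately large n, Q is compared with a chi-square distribution with k - 1 degrees of freedom.

```python
def rank_within_block(values):
    """Ascending ranks 1..k; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg_rank
        i = j + 1
    return ranks


def friedman_statistic(table):
    """Friedman Q for table[b][t] = score of treatment t in block b (no tie correction)."""
    n, k = len(table), len(table[0])
    rank_sums = [0.0] * k
    for row in table:
        for t, r in enumerate(rank_within_block(row)):
            rank_sums[t] += r
    q = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    return q, rank_sums


# Hypothetical scores: 3 blocks (rows) x 3 algorithms (columns).
table = [[1, 2, 3],
         [1, 3, 2],
         [2, 1, 3]]
q, rank_sums = friedman_statistic(table)
print(rank_sums, round(q, 3))  # [4.0, 6.0, 8.0] 2.667
```

Ranking in ascending rather than descending order does not change Q, because reversing every rank leaves the sum of squared rank sums unchanged.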
Article
Risk assessment and management was established as a scientific field some 30-40 years ago. Principles and methods were developed for how to conceptualise, assess and manage risk. These principles and methods still represent to a large extent the foundation of this field today, but many advances have been made, linked to both the theoretical platform and practical models and procedures. The purpose of the present invited paper is to perform a review of these advances, with a special focus on the fundamental ideas and thinking on which these are based. We have looked for trends in perspectives and approaches, and we also reflect on where further development of the risk field is needed and should be encouraged. The paper is written for readers with different types of background, not only for experts on risk.
Article
Organizations are experiencing an ever-growing concern of how to identify and defend against insider threats. Those who have authorized access to sensitive organizational data are placed in a position of power that could well be abused and could cause significant damage to an organization. This could range from financial theft and intellectual property theft to the destruction of property and business reputation. Traditional intrusion detection systems are neither designed nor capable of identifying those who act maliciously within an organization. In this paper, we describe an automated system that is capable of detecting insider threats within an organization. We define a tree-structure profiling approach that incorporates the details of activities conducted by each user and each job role and then use this to obtain a consistent representation of features that provide a rich description of the user's behavior. Deviation can be assessed based on the amount of variance that each user exhibits across multiple attributes, compared against their peers. We have performed experimentation using ten synthetic data-driven scenarios and found that the system can identify anomalous behavior that may be indicative of a potential threat. We also show how our detection system can be combined with visual analytics tools to support further investigation by an analyst.
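A tree-structured profile of the kind described above can be sketched with nested dictionaries. The hierarchy used here (user, then device, then activity type, with event counts at the leaves) and the field names are assumptions for illustration; the paper's actual tree may contain more levels and attributes. Flattening every user's tree against the same ordered list of leaf paths yields the consistent feature representation needed to compare users. Role-level profiles could then be formed by aggregating the vectors of users who share a job role.

```python
from collections import defaultdict


def build_profiles(events):
    """Nest event counts as tree[user][device][activity] -> count."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for user, device, activity in events:
        tree[user][device][activity] += 1
    return tree


def leaf_paths(tree):
    """Sorted union of (device, activity) leaves seen for any user."""
    return sorted({(device, activity)
                   for devices in tree.values()
                   for device, activities in devices.items()
                   for activity in activities})


def feature_vector(tree, user, paths):
    """Fixed-order vector of counts for one user; absent branches count as 0."""
    # .get() is used so that lookups do not insert new keys into the defaultdicts.
    branch = tree.get(user, {})
    return [branch.get(device, {}).get(activity, 0) for device, activity in paths]


# Hypothetical log events: (user, device, activity).
events = [("alice", "pc1", "logon"), ("alice", "pc1", "logon"),
          ("alice", "pc1", "usb"), ("bob", "pc2", "logon")]
tree = build_profiles(events)
paths = leaf_paths(tree)
print(paths)
print(feature_vector(tree, "alice", paths), feature_vector(tree, "bob", paths))
```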
Article
The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for AI is motivating the design of more powerful representation-learning algorithms implementing such priors. This paper reviews recent work in the area of unsupervised feature learning and deep learning, covering advances in probabilistic models, autoencoders, manifold learning, and deep networks. This motivates longer term unanswered questions about the appropriate objectives for learning good representations, for computing representations (i.e., inference), and the geometrical connections between representation learning, density estimation, and manifold learning.
Article
The outlying property detection problem is the problem of discovering the properties that distinguish a given object, known in advance to be an outlier in a database, from the other database objects. In this paper, we analyze the problem in a context where numerical attributes are taken into account, which is a relevant case left open in the literature. We introduce a measure to quantify the degree of outlierness of an object, which is associated with the relative likelihood of its value compared to the relative likelihood of the other objects in the database. As a major contribution, we present an efficient algorithm to compute the outlierness relative to significant subsets of the data. These subsets are characterized in a "rule-based" fashion and hence form the basis for the underlying explanation of the outlierness.
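The abstract does not define the outlierness measure precisely, so the following is only one simple reading of "relative likelihood compared to other objects" for a single numerical attribute, not the paper's measure. It estimates the density of each value with a Gaussian kernel density estimate (the `bandwidth` parameter is a hypothetical choice) and reports the fraction of the other objects whose values are more likely than the given object's value; a score near 1 means almost every other object is more typical. The paper's rule-based subsets and efficient algorithm are not reproduced.

```python
import math


def kde_density(x, data, bandwidth):
    """Gaussian kernel density estimate at x from the values in data."""
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in data)


def outlierness(index, data, bandwidth=1.0):
    """Fraction of other objects whose value has higher estimated density than data[index]."""
    target = kde_density(data[index], data, bandwidth)
    others = [kde_density(v, data, bandwidth) for j, v in enumerate(data) if j != index]
    return sum(1 for d in others if d > target) / len(others)


# Hypothetical attribute values; the last object is the known outlier.
values = [10.0, 10.5, 11.0, 10.2, 10.8, 30.0]
print(outlierness(5, values), outlierness(1, values))  # 1.0 0.0
```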
Conference Paper
In this paper we propose a new definition of distance-based outlier that considers, for each point, the sum of the distances from its k nearest neighbors, called its weight. Outliers are the points with the largest weights. To compute these weights, we find the k nearest neighbors of each point quickly and efficiently by linearizing the search space through the Hilbert space-filling curve. The algorithm consists of two phases. The first provides an approximate solution, within a small factor, after executing at most d + 1 scans of the data set at low time complexity, where d is the number of dimensions of the data set; during each scan, the number of candidate points for the solution set is considerably reduced. The second phase returns the exact solution with a single scan that examines a small additional fraction of the data set. Experimental results show that the algorithm always finds the exact solution during the first phase after d̃ ≪ d + 1 steps and that it scales linearly in both the dimensionality and the size of the data set.
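The weight-based outlier definition above can be pinned down with a brute-force sketch. It computes each point's weight (the sum of the Euclidean distances to its k nearest neighbors) by comparing every pair of points, which costs O(N^2) distance evaluations. It deliberately omits the Hilbert space-filling curve linearization and the two-phase approximation that make the paper's algorithm efficient, so it serves only as a reference for the definition.

```python
import math


def knn_weights(points, k):
    """Weight of each point: sum of Euclidean distances to its k nearest neighbors."""
    weights = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        weights.append(sum(dists[:k]))
    return weights


def top_n_outliers(points, k, n):
    """Indices of the n points with the largest weights."""
    w = knn_weights(points, k)
    return sorted(range(len(points)), key=lambda i: w[i], reverse=True)[:n]


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
print(top_n_outliers(points, k=2, n=1))  # [4]
```

Summing over k neighbors, rather than using only the k-th distance, makes the weight less sensitive to a single close neighbor of an otherwise isolated point.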
Article
Melody generation from lyrics has been a challenging research issue in the field of artificial intelligence and music, which enables us to learn and discover latent relationships between interesting lyrics and accompanying melodies. Unfortunately, the limited availability of a paired lyrics–melody dataset with alignment information has hindered the research progress. To address this problem, we create a large dataset consisting of 12,197 MIDI songs each with paired lyrics and melody alignment through leveraging different music sources where alignment relationship between syllables and music attributes is extracted. Most importantly, we propose a novel deep generative model, conditional Long Short-Term Memory (LSTM)–Generative Adversarial Network for melody generation from lyrics, which contains a deep LSTM generator and a deep LSTM discriminator both conditioned on lyrics. In particular, lyrics-conditioned melody and alignment relationship between syllables of given lyrics and notes of predicted melody are generated simultaneously. Extensive experimental results have proved the effectiveness of our proposed lyrics-to-melody generative model, where plausible and tuneful sequences can be inferred from lyrics.
Conference Paper
Identifying anomalies in log data for insider threat detection is a very challenging task in practice for security analysts. User behavior modeling is very important for identifying these anomalies. This paper presents unsupervised user behavior modeling for anomaly detection. The proposed approach uses an LSTM-based autoencoder to model user behavior based on session activities and thus identify anomalous data points. The proposed method follows a two-step process. First, it calculates the reconstruction error of the autoencoder on the non-anomalous dataset; this error is then used to define the threshold that separates outliers from normal data points. The identified outliers are then classified as anomalies. The CERT insider threat dataset was used for this research. For each user, feature vectors are prepared by extracting key information from the corresponding raw events and aggregating the data points based on the user's actions within their respective sessions. An LSTM autoencoder was implemented for behavior learning and anomaly detection. For any unseen behavior or anomalous pattern, the model produces a high reconstruction error, which indicates an anomaly. The experimental results show that, in the best case, the model achieved an accuracy of 90.17%, true positives of 91.03%, and false positives of 9.84%. These results suggest that the proposed approach can be used effectively for automatic anomaly detection.
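The two-step thresholding described above can be sketched independently of the network itself. The code assumes that session vectors and their reconstructions are already available from a trained LSTM autoencoder (not shown), uses mean squared error as the reconstruction error, and sets the threshold to the mean plus c standard deviations of the errors on non-anomalous sessions. The abstract does not state the exact threshold rule, so this rule and the default c = 3 are hypothetical. A helper also computes accuracy, true positive rate, and false positive rate, the kind of figures the abstract reports.

```python
from statistics import mean, pstdev


def reconstruction_error(x, x_hat):
    """Mean squared error between a session feature vector and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def fit_threshold(normal_errors, c=3.0):
    """Assumed rule: mean + c * std of reconstruction errors on non-anomalous sessions."""
    return mean(normal_errors) + c * pstdev(normal_errors)


def classify(errors, threshold):
    """Flag a session as anomalous when its error exceeds the threshold."""
    return [e > threshold for e in errors]


def evaluate(predicted, actual):
    """Return (accuracy, true positive rate, false positive rate)."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    tn = sum(not p and not a for p, a in zip(predicted, actual))
    fn = sum(not p and a for p, a in zip(predicted, actual))
    accuracy = (tp + tn) / len(actual)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return accuracy, tpr, fpr


# Hypothetical errors: first from normal training sessions, then from test sessions.
threshold = fit_threshold([0.10, 0.12, 0.11, 0.09, 0.10])
test_errors = [0.10, 0.50, 0.40, 0.12]
labels = [False, True, False, False]  # ground truth: only the second session is malicious
predicted = classify(test_errors, threshold)
print(predicted, evaluate(predicted, labels))
```

Raising c lowers the false positive rate at the cost of missing more subtle anomalies; a percentile of the normal errors is a common alternative rule.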
Article
In recent years, deep neural networks (including recurrent ones) have won numerous contests in pattern recognition and machine learning. This historical survey compactly summarises relevant work, much of it from the previous millennium. Shallow and deep learners are distinguished by the depth of their credit assignment paths, which are chains of possibly learnable, causal links between actions and effects. I review deep supervised learning (also recapitulating the history of backpropagation), unsupervised learning, reinforcement learning & evolutionary computation, and indirect search for short programs encoding deep and large networks.
Article
Detecting anomalies in data is a vital task, with numerous high-impact applications in areas such as security, finance, health care, and law enforcement. While numerous techniques have been developed in past years for spotting outliers and anomalies in unstructured collections of multi-dimensional points, with graph data becoming ubiquitous, techniques for structured graph data have recently become a focus. As objects in graphs have long-range correlations, a suite of novel techniques has been developed for anomaly detection in graph data. This survey aims to provide a general, comprehensive, and structured overview of the state-of-the-art methods for anomaly detection in data represented as graphs. As a key contribution, we provide a comprehensive exploration of both data mining and machine learning algorithms for these detection tasks. We give a general framework for the algorithms, categorized under various settings: unsupervised vs. (semi-)supervised approaches, static vs. dynamic graphs, and attributed vs. plain graphs. We highlight the effectiveness, scalability, generality, and robustness aspects of the methods. Moreover, we stress the importance of anomaly attribution and highlight the major techniques that facilitate digging out the root cause, or the 'why', of the detected anomalies for further analysis and sense-making. Finally, we present several real-world applications of graph-based anomaly detection in diverse domains, including financial, auction, computer traffic, and social networks. We conclude our survey with a discussion of open theoretical and practical challenges in the field.
Conference Paper
Effective outlier detection requires the data to be described by a set of features that captures the behavior of normal data while emphasizing those characteristics of outliers that make them different from normal data. In this work, we present a novel non-parametric evaluation criterion for filter-based feature selection that caters to outlier detection problems. The proposed method seeks the subset of features that represents the inherent characteristics of the normal dataset while forcing outliers to stand out, making them more easily distinguished by outlier detection algorithms. Experimental results on real datasets show the advantage of our feature selection algorithm compared to popular and state-of-the-art methods. We also show that the proposed algorithm is able to overcome the small sample space problem and performs well on highly imbalanced datasets. Furthermore, because the feature selection is highly parallelizable, we implement the algorithm on a graphics processing unit (GPU) to gain a significant speedup over the serial version. The benefits of the GPU implementation are two-fold, as its performance scales very well in terms of both the number of features and the number of data points.
Conference Paper
In network intrusion detection research, one popular strategy for finding attacks is monitoring a network's activity for anomalies: deviations from profiles of normality previously learned from benign traffic, typically identified using tools borrowed from the machine learning community. However, despite extensive academic research one finds a striking gap in terms of actual deployments of such systems: compared with other intrusion detection approaches, machine learning is rarely employed in operational "real world" settings. We examine the differences between the network intrusion detection problem and other areas where machine learning regularly finds much more success. Our main claim is that the task of finding attacks is fundamentally different from these other applications, making it significantly harder for the intrusion detection community to employ machine learning effectively. We support this claim by identifying challenges particular to network intrusion detection, and provide a set of guidelines meant to strengthen future research on anomaly detection.
Article
Assume you are given a data population characterized by a certain number of attributes. Assume, moreover, that you are told that one of the individuals in this data population is abnormal, but you are given no reason whatsoever as to why this particular individual should be considered abnormal. In many cases, you will indeed be interested in discovering such reasons. This article is concerned precisely with this problem of discovering sets of attributes that account for the (a priori stated) abnormality of an individual within a given dataset. A criterion is presented to measure the abnormality of combinations of attribute values featured by the given abnormal individual with respect to the reference population. In this respect, each subset of attributes is intended to represent a “property” of individuals. We distinguish between global and local properties. Global properties are subsets of attributes explaining the given abnormality with respect to the entire data population. With local ones, instead, two subsets of attributes are singled out: the former justifies the abnormality within the data subpopulation selected using the values taken by the exceptional individual on the attributes included in the latter. The problem of identifying abnormal properties with associated explanations is formally stated and analyzed. This formal characterization is then exploited to devise efficient algorithms for detecting both global and local forms of the most abnormal properties. The experimental evidence reported in the article shows that the algorithms are able both to mine meaningful information and to accomplish the computational task by examining a negligible fraction of the search space.
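To make the global/local distinction concrete, the sketch below uses a deliberately simple stand-in for the article's abnormality criterion: the support of a property, that is, the fraction of a population that shares the abnormal individual's values on a subset of categorical attributes (lower support means more abnormal). Global properties are measured against the whole population; a local property measures an explaining subset within the subpopulation that matches the individual on a conditioning subset. The criterion, the attribute names, and the exhaustive enumeration are all assumptions; the article's efficient algorithms avoid examining most of the search space.

```python
from itertools import combinations


def support(population, individual, attrs):
    """Fraction of the population matching the individual on every attribute in attrs."""
    if not population:
        return 0.0
    matches = sum(all(row[a] == individual[a] for a in attrs) for row in population)
    return matches / len(population)


def rank_global_properties(population, individual, max_size=2):
    """All attribute subsets up to max_size, most abnormal (lowest support) first."""
    attrs = sorted(individual)
    subsets = [c for size in range(1, max_size + 1) for c in combinations(attrs, size)]
    return sorted(subsets, key=lambda s: support(population, individual, s))


def local_support(population, individual, explain, condition):
    """Support of `explain` within the subpopulation matching the individual on `condition`."""
    sub = [row for row in population if all(row[a] == individual[a] for a in condition)]
    return support(sub, individual, explain)


# Hypothetical population; the last row is the individual known to be abnormal.
population = [
    {"role": "dev", "site": "A", "shift": "day"},
    {"role": "dev", "site": "A", "shift": "day"},
    {"role": "ops", "site": "B", "shift": "night"},
    {"role": "ops", "site": "B", "shift": "night"},
    {"role": "dev", "site": "B", "shift": "night"},
]
odd = population[-1]
print(rank_global_properties(population, odd)[:2])
print(support(population, odd, ("role",)), local_support(population, odd, ("role",), ("site",)))
```

Here being a developer is common overall (support 0.6) but rare among site-B staff (support of about 0.33), which is the kind of local explanation the article describes.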
M. Shashanka, M.-Y. Shen, and J. Wang, "User and entity behavior analytics for enterprise security," 2016 IEEE International Conference on Big Data (Big Data), 2016, pp. 1867-1874, doi: 10.1109/BigData.2016.7840805.
H. Liu, J. Phys.: Conf. Ser., vol. 1994, 012021, 2021, doi: 10.1088/1742-6596/1994/1/012021.
R. Yousef and M. Jazzar, "Measuring the Effectiveness of User and Entity Behavior Analytics for the Prevention of Insider Threats," Xi'an Jianzhu Keji Daxue Xuebao/Journal of Xi'an University of Architecture and Technology, vol. XIII, pp. 175-181, 2021, doi: 10.37896/JXAT13.10/313918.
"Investigators from Dalhousie University Release New Data on Machine Learning (Analyzing Data Granularity Levels for Insider Threat Detection Using Machine Learning)," Journal of Engineering, 2020.
P. Malhotra, A. Ramakrishnan, G. Anand, L. Vig, P. Agarwal, and G. Shroff, "LSTM-based encoder-decoder for multi-sensor anomaly detection," arXiv preprint arXiv:1607.00148, 2016.
F. Yuan, Y. Cao, Y. Shang, Y. Liu, J. Tan, and B. Fang, "Insider Threat Detection with Deep Neural Network," in Computational Science - ICCS 2018, Lecture Notes in Computer Science, 2018, pp. 43-54.
G. Pang, C. Shen, L. Cao, and A. V. D. Hengel, "Deep Learning for Anomaly Detection," ACM Computing Surveys, vol. 54, no. 2, pp. 1-38, 2021, doi: 10.1145/3439950.
E. Aljalbout, V. Golkov, Y. Siddiqui, M. Strobel, and D. Cremers, "Clustering with deep learning: Taxonomy and new methods," arXiv:1801.07648, 2018.
P. Goyal and K. Gupta, "Supervised Learning Approach for User and Entity Behavior Analytics," 2020 3rd International Conference on Computing and Communications Technologies (ICCCT), 2020, pp. 1-6, doi: 10.1109/ICCCT49228.2020.9269224.
N. R. Pandit and V. M. Thakare, "An Overview of Supervised Learning Algorithms for User and Entity Behavior Analytics," 2021 7th International Conference on Advanced Computing and Communication Systems (ICACCS), 2021, pp. 782-787, doi: 10.1109/ICACCS51225.2021.9375201.
R. Zhang, J. Zhang, and Y. Jia, "A Survey on Unsupervised Learning for User and Entity Behavior Analytics," 2021 IEEE 5th Information Technology and Mechatronics Engineering Conference (ITOEC), 2021, pp. 237-242, doi: 10.1109/ITOEC51615.2021.00055.