Conference Paper

A Modular Approach to Automatic Cyber Threat Attribution using Opinion Pools

Abstract

Cyber threat attribution can play an important role in increasing resilience against digital threats. Recent research focuses on automating the threat attribution process and on integrating it with other efforts, such as threat hunting. To support increasing automation of the cyber threat attribution process, this paper proposes a modular architecture as an alternative to current monolithic automated approaches. The modular architecture can utilize opinion pools to combine the output of concrete attributors. The proposed solution increases the tractability of the threat attribution problem and offers better usability and interpretability than monolithic alternatives. In addition, a Pairing Aggregator is proposed as an aggregation method that forms pairs of attributors based on distinct features to produce intermediary results before finally producing a single Probability Mass Function (PMF) as output. The Pairing Aggregator sequentially applies both the logarithmic opinion pool and the linear opinion pool. An experimental validation suggests that the modular approach does not result in decreased performance and can even enhance precision and recall compared to monolithic alternatives. The results also suggest that the Pairing Aggregator can improve precision over the linear and logarithmic opinion pools. Furthermore, the improved k-accuracy observed in the experiment suggests that forensic experts can leverage the resulting PMF during their manual attribution processes to enhance their efficiency.
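The opinion pools named in the abstract are standard combination rules, and the description of the Pairing Aggregator suggests a two-stage scheme. The sketch below is only an illustrative reading of that description, not the paper's implementation: the attributor outputs, threat-actor names, equal weights, and the choice to pair every combination of attributors are all invented for the example.

from itertools import combinations

def linear_pool(pmfs, weights=None):
    """Weighted arithmetic mean of PMFs defined over the same set of threat actors."""
    weights = weights or [1 / len(pmfs)] * len(pmfs)
    actors = pmfs[0].keys()
    return {a: sum(w * p[a] for w, p in zip(weights, pmfs)) for a in actors}

def log_pool(pmfs, weights=None):
    """Weighted geometric mean of PMFs, renormalised so the result sums to 1."""
    weights = weights or [1 / len(pmfs)] * len(pmfs)
    actors = pmfs[0].keys()
    raw = {a: 1.0 for a in actors}
    for w, p in zip(weights, pmfs):
        for a in actors:
            raw[a] *= p[a] ** w
    total = sum(raw.values())
    return {a: v / total for a, v in raw.items()}

def pairing_aggregator(pmfs):
    """Hypothetical reading of the Pairing Aggregator: log-pool attributor outputs
    pairwise, then linear-pool the intermediate results into a single PMF.
    (The paper pairs attributors based on distinct features; here we simply pair all.)"""
    pair_results = [log_pool([p, q]) for p, q in combinations(pmfs, 2)]
    return linear_pool(pair_results)

# Invented outputs of three attributors over two candidate threat actors.
attributor_pmfs = [
    {"APT-X": 0.7, "APT-Y": 0.3},   # e.g. a malware-similarity attributor
    {"APT-X": 0.6, "APT-Y": 0.4},   # e.g. an infrastructure attributor
    {"APT-X": 0.2, "APT-Y": 0.8},   # e.g. a TTP-based attributor
]
print(pairing_aggregator(attributor_pmfs))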

References
Article
Full-text available
In today’s cyber warfare realm, every stakeholder in cyberspace is becoming more potent by developing advanced cyber weapons. They are equipped with the most advanced malware and maintain hidden attribution. These precocious cyber weapons, targeted and motivated by some specific intention, are called Advanced Persistent Threats (APTs). Developing defense mechanisms and performing attribution analysis of such advanced attacks are extremely difficult due to the intricate design of the attack vector and the sophisticated malware employed, with high stealth and evasive techniques. These attacks also include advanced zero-day and negative-day exploits and payloads. This paper provides a comprehensive survey on the evolution of advanced malware design paradigms, the APT attack vector and its anatomy, APT attack Tactics, Techniques, and Procedures (TTPs), and specific case studies on open-ended APT attacks. The survey covers a detailed discussion of APT attack phases and a comparative study of threat life-cycle specifications by various organizations. This work also addresses APT attack attribution and countermeasures against these attacks, from classical signature- and heuristic-based detection to modern machine learning and genetics-based detection mechanisms, along with countermeasures against sophisticated zero-day and negative-day malware by various techniques such as monitoring of network traffic and DNS logs, moving-target-based defense, and attack-graph-based defenses. Furthermore, the survey addresses various research scopes in the domain of APT cyber-attacks.
Article
Full-text available
Systems that rely on Machine Learning (ML systems) have differing demands on quality—known as non-functional requirements (NFRs)—from traditional systems. NFRs for ML systems may differ in their definition, measurement, scope, and comparative importance. Despite the importance of NFRs in ensuring the quality of ML systems, our understanding of all of these aspects is lacking compared to our understanding of NFRs in traditional domains. We have conducted interviews and a survey to understand how NFRs for ML systems are perceived among practitioners from both industry and academia. We have identified the degree of importance that practitioners place on different NFRs, including cases where practitioners are in agreement or have differences of opinion. We explore how NFRs are defined and measured over different aspects of a ML system (i.e., model, data, or whole system). We also identify challenges associated with NFR definition and measurement. Finally, we explore differences in perspective between practitioners in industry, academia, or a blended context. This knowledge illustrates how NFRs for ML systems are treated in current practice, and helps to guide future requirements engineering (RE) for ML efforts.
Article
Full-text available
During the past decade, mobile attacks have become established as an indispensable attack vector adopted by Advanced Persistent Threat (APT) groups. The ubiquitous nature of the smartphone has allowed users to make mobile payments and store private or sensitive data (e.g., login credentials). Consequently, various APT groups have focused on exploiting these vulnerabilities. Past studies have proposed automated classification and detection methods, while few studies have covered cyber attribution. Our study introduces an automated system that focuses on cyber attribution. Adopting MITRE’s ATT&CK for Mobile, we performed our study using tactics, techniques, and procedures (TTPs). By comparing indicators of compromise (IoCs), we were able to reduce false flags during our experiment. Moreover, we examined 12 threat actors and 120 malware samples using the automated method for determining cyber attribution.
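TTP-based attribution of this kind usually reduces to comparing the techniques observed in a sample with the known ATT&CK profile of each candidate group. The fragment below is a toy sketch of that idea (not the cited system); the technique IDs and group profiles are placeholders.

# Toy sketch: score candidate groups by the overlap between observed TTPs
# and each group's known ATT&CK technique profile (all values are placeholders).
observed_ttps = {"T1476", "T1411", "T1430"}

group_profiles = {
    "GroupA": {"T1476", "T1411", "T1412"},
    "GroupB": {"T1430", "T1437"},
}

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity between two TTP sets; 0.0 when both are empty."""
    return len(a & b) / len(a | b) if a | b else 0.0

scores = {g: jaccard(observed_ttps, ttps) for g, ttps in group_profiles.items()}
best = max(scores, key=scores.get)
print(scores, "-> most likely:", best)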
Conference Paper
Full-text available
Multi-year digital forensic backlogs have become commonplace in law enforcement agencies throughout the globe. Digital forensic investigators are overloaded with the volume of cases requiring their expertise, compounded by the volume of data to be processed. Artificial intelligence is often seen as the solution to many big data problems. This paper summarises existing artificial intelligence based tools and approaches in digital forensics. Automated evidence processing leveraging artificial intelligence based techniques shows great promise in expediting the digital forensic analysis process while increasing case processing capacities. For each application of artificial intelligence highlighted, a number of current challenges and the potential future impact are discussed.
Article
Full-text available
The attribution of cyber attacks is often neglected. The consensus still is that little can be done to prosecute the perpetrators, and unfortunately this might be right in many cases. However, what is of only limited interest to private industry is at the center of interest for nation states. Investigating whether an attack was carried out in the name of a nation state is a crucial task for secret services. Many methods, tools and processes exist for network and computer forensics that allow the collection of traces and evidence. They are the basis for associating adversarial actions with threat actors. However, a serious problem which has not yet received appropriate attention from research is false flag campaigns: cyber attacks which apply covert tactics to deceive or misguide attribution attempts, either to hide traces or to blame others. In this paper we provide an overview of prominent attack techniques along the cyber kill chain. We investigate the traces left by attack techniques and which questions in the course of the attribution process are answered by investigating these traces. Eventually, we assess how easily traces can be spoofed and rate their relevance with respect to identifying false flag campaigns.
Article
Full-text available
The structure common to a class of divide and conquer algorithms is represented by a program scheme. A theorem is presented which relates the functionality of a divide and conquer algorithm to its structure and the functionalities of its subalgorithms. Several strategies for designing divide and conquer algorithms arise from this theorem and they are used to formally derive algorithms for sorting a list of numbers, forming the cartesian product of two sets, and finding the convex hull of a set of planar points.
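The program scheme described here can be expressed as a higher-order function. The sketch below is an illustrative instance of that idea in code (not the paper's formal derivation), instantiated as merge sort; the function and parameter names are chosen for this example.

def divide_and_conquer(problem, is_primitive, solve_directly, decompose, compose):
    """Generic scheme: solve primitive cases directly, otherwise split the problem,
    recurse on the parts, and compose the sub-solutions."""
    if is_primitive(problem):
        return solve_directly(problem)
    parts = decompose(problem)
    return compose([divide_and_conquer(p, is_primitive, solve_directly, decompose, compose)
                    for p in parts])

def merge(parts):
    """Compose step for sorting: merge two sorted lists."""
    left, right = parts
    out = []
    while left and right:
        out.append(left.pop(0) if left[0] <= right[0] else right.pop(0))
    return out + left + right

# Instantiation of the scheme as merge sort.
mergesort = lambda xs: divide_and_conquer(
    xs,
    is_primitive=lambda p: len(p) <= 1,
    solve_directly=lambda p: list(p),
    decompose=lambda p: (p[:len(p) // 2], p[len(p) // 2:]),
    compose=merge,
)
print(mergesort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]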
Article
With the rapidly evolving technological landscape, the huge development of the Internet of Things, and the embracing of digital transformation, the world is witnessing an explosion in data generation and a rapid evolution of new applications that lead to new, wider, and more sophisticated threats that are complex and hard to detect. Advanced persistent threats use continuous, clandestine, and sophisticated techniques to gain access to a system and remain hidden for a prolonged period of time, with potentially destructive consequences. These stealthy attacks are often not detectable by advanced intrusion detection systems (e.g., the LightBasin attack was detected in 2022 and had been active since 2016). Indeed, threat actors are able to quickly and intelligently alter their tactics to avoid being detected by security defense lines (e.g., prevention and detection mechanisms). In response to these evolving threats, organizations need to adopt new proactive defense approaches. Threat hunting is a proactive security line exercised to uncover stealthy attacks, malicious activities, and suspicious entities that could circumvent standard detection mechanisms. Additionally, threat hunting is an iterative approach to generate and revise threat hypotheses, endeavoring to provide early attack detection in a proactive way. The proactiveness consists of testing and validating the initial hypothesis using various manual and automated tools/techniques with the objective of confirming/refuting the existence of an attack. This survey studies the threat hunting concept and provides a comprehensive review of the existing solutions for enterprise networks. In particular, we provide a threat hunting taxonomy based on the technique used and a sub-classification based on the detailed approach. Furthermore, we discuss the existing standardization efforts. Finally, we provide a qualitative discussion on current advances and identify various research gaps and challenges that may be considered by the research community to design concrete and efficient threat hunting solutions.
Article
Fusing probabilistic information is a fundamental task in signal and data processing with relevance to many fields of technology and science. In this work, we investigate the fusion of multiple probability density functions (pdfs) of a continuous random variable or vector. Although the case of continuous random variables and the problem of pdf fusion frequently arise in multisensor signal processing, statistical inference, and machine learning, a universally accepted method for pdf fusion does not exist. The diversity of approaches, perspectives, and solutions related to pdf fusion motivates a unified presentation of the theory and methodology of the field. We discuss three different approaches to fusing pdfs. In the axiomatic approach, the fusion rule is defined indirectly by a set of properties (axioms). In the optimization approach, it is the result of minimizing an objective function that involves an information-theoretic divergence or a distance measure. In the supra-Bayesian approach, the fusion center interprets the pdfs to be fused as random observations. Our work is partly a survey, reviewing in a structured and coherent fashion many of the concepts and methods that have been developed in the literature. In addition, we present new results for each of the three approaches. Our original contributions include new fusion rules, axioms, and axiomatic and optimization-based characterizations; a new formulation of supra-Bayesian fusion in terms of finite-dimensional parametrizations; and a study of supra-Bayesian fusion of posterior pdfs for linear Gaussian models.
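Two of the most common fusion rules discussed in this literature are the linear and the log-linear (logarithmic) opinion pools. For reference, with weights $w_i \geq 0$, $\sum_{i=1}^{K} w_i = 1$, and input pdfs $p_1, \dots, p_K$ (notation chosen here for illustration), they are
$$p_{\mathrm{lin}}(x) = \sum_{i=1}^{K} w_i\, p_i(x), \qquad p_{\mathrm{log}}(x) = \frac{\prod_{i=1}^{K} p_i(x)^{w_i}}{\int \prod_{i=1}^{K} p_i(x')^{w_i}\, dx'},$$
where the denominator of the log-linear pool renormalises the weighted geometric mean so that the fused pdf integrates to one.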
Article
A key element in solving real-life data science problems is selecting the types of models to use. Tree ensemble models (such as XGBoost) are usually recommended for classification and regression problems with tabular data. However, several deep learning models for tabular data have recently been proposed, claiming to outperform XGBoost for some use cases. This paper explores whether these deep models should be a recommended option for tabular data by rigorously comparing the new deep models to XGBoost on various datasets. In addition to systematically comparing their performance, we consider the tuning and computation they require. Our study shows that XGBoost outperforms these deep models across the datasets, including the datasets used in the papers that proposed the deep models. We also demonstrate that XGBoost requires much less tuning. On the positive side, we show that an ensemble of deep models and XGBoost performs better on these datasets than XGBoost alone.
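One simple way to realise the kind of ensemble reported here is to average the predicted class probabilities of an XGBoost classifier and a neural network. The sketch below uses made-up data and model sizes and is not the paper's exact setup.

# Sketch only: equal-weight probability averaging of XGBoost and a small MLP.
# Dataset, hyperparameters, and ensemble weights are placeholders.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from xgboost import XGBClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

xgb_model = XGBClassifier(n_estimators=200, max_depth=4).fit(X_tr, y_tr)
mlp_model = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0).fit(X_tr, y_tr)

# Average the class probabilities; the weights could also be tuned on a validation set.
proba = 0.5 * xgb_model.predict_proba(X_te) + 0.5 * mlp_model.predict_proba(X_te)
accuracy = (proba.argmax(axis=1) == y_te).mean()
print(f"ensemble accuracy: {accuracy:.3f}")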
Chapter
Effective and impactful design science research requires appropriate research conduct and an appropriate research problem. Scholars in the DSR community continue to clarify the foundations for appropriate research conduct. In contrast, few guidelines or guardrails have been proposed to identify and develop research problems. Without such guidance, DSR efforts run the risk of pursuing problems that are either unimportant or not well formulated. In this paper, I draw on writings beyond the DSR community to develop considerations that DSR scholars can use to identify and develop research problems, acknowledging their evolving ontological status. The paper describes an approach, articulates these considerations, and develops arguments that can help the identification and development of research problems for DSR efforts.
Article
Deep learning as represented by artificial deep neural networks (DNNs) has achieved great success recently in many important areas that deal with text, images, videos, graphs, and so on. However, the black-box nature of DNNs has become one of the primary obstacles to their wide adoption in mission-critical applications such as medical diagnosis and therapy. Because of the huge potential of deep learning, increasing the interpretability of deep neural networks has recently attracted much research attention. In this paper, we propose a simple but comprehensive taxonomy for interpretability, systematically review recent studies on improving the interpretability of neural networks, describe applications of interpretability in medicine, and discuss possible future research directions for interpretability, such as in relation to fuzzy logic and brain science.
Book
An increasing number of countries develop capabilities for cyber-espionage and sabotage. The sheer number of reported network compromises suggests that some of these countries view cyber-means as integral and well-established elements of their strategic toolbox. At the same time the relevance of such attacks for society and politics is also increasing. Digital means were used to influence the US presidential election in 2016, repeatedly led to power outages in Ukraine, and caused economic losses of hundreds of millions of dollars with malfunctioning ransomware. In all these cases the question of who was behind the attacks is not only relevant from a legal perspective, but also has a political and social dimension. Attribution is the process of tracking and identifying the actors behind these cyber-attacks. Often it is considered an art, not a science. This book systematically analyses how hackers operate, which mistakes they make, and which traces they leave behind. Using examples from real cases, the author explains the analytic methods used to ascertain the origin of Advanced Persistent Threats.
The Content
  • Advanced Persistent Threats
  • The attribution process
  • Analysis of malware
  • Attack infrastructure
  • Analysis of control servers
  • Geopolitical analysis
  • Telemetry - data from security products
  • Methods of intelligence agencies
  • Doxing
  • False flags
  • Group set-ups
  • Communication
  • Ethics of attribution
The Target Audience
  • IT-security professionals
  • International relations researchers
  • Technical journalists
  • Employees of organizations that are targeted by Advanced Persistent Threats
The Author
Dr. Timo Steffens was involved in the analysis of many of the most spectacular cyber-espionage cases in Germany. He has been tracking the activities and techniques of sophisticated hacker groups for almost a decade.
Article
Today's cyber attacks require a new line of security defenses. The static approach of traditional security, based on heuristics and signatures, does not match the dynamic nature of the new generation of threats, which are known to be evasive, resilient and complex. Organizations need to gather and share real-time cyber threat information and to transform it into threat intelligence in order to prevent attacks or at least execute timely disaster recovery. Threat Intelligence (TI) means evidence-based knowledge representing threats that can inform decisions. There is a general awareness of the need for threat intelligence, while vendors today are rushing to provide a diverse array of threat intelligence products, specifically focusing on Technical Threat Intelligence (TTI). Although threat intelligence is being increasingly adopted, there is little consensus on what it actually is or how to use it. Without any real understanding of this need, organizations risk investing large amounts of time and money without solving existing security problems. Our paper aims to classify and make distinctions among existing threat intelligence types. We focus particularly on TTI issues, emerging research, trends and standards. Our paper also explains why there is a reluctance among organizations to share threat intelligence. We provide sharing strategies based on trust and anonymity, so participating organizations can do away with the risks of business leaks. We also show in this paper why having a standardized representation of threat information can improve the quality of TTI, thus providing better automated analytics solutions on large volumes of TTI, which are often non-uniform and redundant. Finally, we evaluate the most popular open source/free threat intelligence tools and compare their features with those of a new AlliaCERT TI tool.
Conference Paper
Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges. We propose a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning. More importantly, we provide insights on cache access patterns, data compression and sharding to build a scalable tree boosting system. By combining these insights, XGBoost scales beyond billions of examples using far fewer resources than existing systems.
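A minimal usage sketch of the library described here is shown below (it is not taken from the paper); passing a scipy sparse matrix to DMatrix exercises the sparsity-aware split finding mentioned in the abstract, and the data is randomly generated for illustration.

# Minimal XGBoost usage sketch with sparse (mostly-missing) features.
import numpy as np
import scipy.sparse as sp
import xgboost as xgb

rng = np.random.default_rng(0)
X = sp.random(1000, 50, density=0.1, random_state=0, format="csr")  # sparse feature matrix
y = rng.integers(0, 2, size=1000)                                   # random binary labels

dtrain = xgb.DMatrix(X, label=y)
params = {"objective": "binary:logistic", "max_depth": 4, "eta": 0.3}
booster = xgb.train(params, dtrain, num_boost_round=50)
print(booster.predict(dtrain)[:5])  # predicted probabilities for the first rows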
Article
This survey paper describes a focused literature survey of machine learning (ML) and data mining (DM) methods for cyber analytics in support of intrusion detection. Short tutorial descriptions of each ML/DM method are provided. Based on the number of citations or the relevance of an emerging method, papers representing each method were identified, read, and summarized. Because data are so important in ML/DM approaches, some well-known cyber data sets used in ML/DM are described. The complexity of ML/DM algorithms is addressed, discussion of challenges for using ML/DM for cyber security is presented, and some recommendations on when to use a given method are provided.
Article
Who did it? Attribution is fundamental. Human lives and the security of the state may depend on ascribing agency to an agent. In the context of computer network intrusions, attribution is commonly seen as one of the most intractable technical problems, as either solvable or not solvable, and as dependent mainly on the available forensic evidence. But is it? Is this a productive understanding of attribution? — This article argues that attribution is what states make of it. To show how, we introduce the Q Model: designed to explain, guide, and improve the making of attribution. Matching an offender to an offence is an exercise in minimising uncertainty on three levels: tactically, attribution is an art as well as a science; operationally, attribution is a nuanced process not a black-and-white problem; and strategically, attribution is a function of what is at stake politically. Successful attribution requires a range of skills on all levels, careful management, time, leadership, stress-testing, prudent communication, and recognising limitations and challenges.
Conference Paper
Our objectives in the DARPA Spoken Language Program are: (1) to provide a central role in speech database design, development and distribution; and (2) to design, coordinate implementation of, and analyze results of performance tests.
Article
The field of dataset shift has received a growing amount of interest in the last few years. The fact that most real-world applications have to cope with some form of shift makes its study highly relevant. The literature on the topic is mostly scattered, and different authors use different names to refer to the same concepts, or use the same name for different concepts. With this work, we attempt to present a unifying framework through the review and comparison of some of the most important works in the literature.
Article
When a group of $k$ individuals is required to make a joint decision, it occasionally happens that there is agreement on a utility function for the problem but that opinions differ on the probabilities of the relevant states of nature. When the latter are indexed by a parameter $\theta$, to which probability density functions on some measure $\mu(\theta)$ may be attributed, suppose the $k$ opinions are given by probability density functions $p_{s1}(\theta), \cdots, p_{sk}(\theta)$. Suppose that $D$ is the set of available decisions $d$ and that the utility of $d$, when the state of nature is $\theta$, is $u(d, \theta)$. For a probability density function $p(\theta)$, write $u[d \mid p(\theta)] = \int u(d, \theta)\, p(\theta)\, d\mu(\theta)$. The Group Minimax Rule of Savage [1] would have the group select that $d$ minimising $\max_{i=1,\cdots,k}\{\max_{d' \in D} u[d' \mid p_{si}(\theta)] - u[d \mid p_{si}(\theta)]\}$. As Savage remarks ([1], p. 175), this rule is undemocratic in that it depends only on the different distributions for $\theta$ represented in those put forward by the group and not on the number of members of the group supporting each different representative. An alternative rule for choosing $d$ may be stated as follows: "Choose weights $\lambda_1, \cdots, \lambda_k$ ($\lambda_i \geq 0$, $i = 1, \cdots, k$, and $\sum_1^k \lambda_i = 1$); construct the pooled density function $p_{s\lambda}(\theta) = \sum_1^k \lambda_i p_{si}(\theta)$; choose the $d$, say $d_{s\lambda}$, maximising $u[d \mid p_{s\lambda}(\theta)]$." This rule, which may be called the Opinion Pool, can be made democratic by setting $\lambda_1 = \cdots = \lambda_k = 1/k$. Where it is reasonable to suppose that there is an actual, operative probability distribution, represented by an 'unknown' density function $p_a(\theta)$, it is clear that the group is then acting as if $p_a(\theta)$ were known to be $p_{s\lambda}(\theta)$. If $p_a(\theta)$ were known, it would be possible to calculate $u[d_{s\lambda} \mid p_a(\theta)]$ and $u[d_{si} \mid p_a(\theta)]$, where $d_{si}$ is the $d$ maximising $u[d \mid p_{si}(\theta)]$, $i = 1, \cdots, k$, and then to use these quantities to assess the effect of adopting the Opinion Pool for any given choice of $\lambda_1, \cdots, \lambda_k$. It is of general theoretical interest to examine the conditions under which
$$u[d_{s\lambda} \mid p_a(\theta)] \geq \min_{i=1,\cdots,k} u[d_{si} \mid p_a(\theta)]. \tag{1.1}$$
Theorems 2.1 and 3.1 provide different sets of sufficient conditions for (1.1) to hold. Theorem 2.1 requires $k = 2$ and places a restriction on $p_a(\theta)$ (or, equivalently, on $p_{s1}(\theta)$ and $p_{s2}(\theta)$); Theorem 3.1 puts conditions on $D$ and $u(d, \theta)$ instead.
Article
Software reuse enables developers to leverage past accomplishments and facilitates significant improvements in software productivity and quality. Software reuse catalyzes improvements in productivity by avoiding redevelopment and improvements in quality by incorporating components whose reliability has already been established. This study addresses a pivotal research issue that underlies software reuse - what factors characterize successful software reuse in large-scale systems. The research approach is to investigate, analyze, and evaluate software reuse empirically by mining software repositories from a NASA software development environment that actively reuses software. This software environment successfully follows principles of reuse-based software development in order to achieve an average reuse of 32 percent per project, which is the average amount of software either reused or modified from previous systems. We examine the repositories for 25 software systems ranging from 3,000 to 112,000 source lines from this software environment. We analyze four classes of software modules: modules reused without revision, modules reused with slight revision (
Semantic cyberthreat modelling
  • S Bromander
  • A Jøsang
  • M Eian
The MITRE Corporation
Glossary
  • National Institute of Standards and Technology (NIST)
Cybercrime-as-a-service: identifying control points to disrupt
  • K Huang
  • M Siegel
  • S Madnick
Cyber attribution 2.0: Capture the false flag
  • T Pahi
  • F Skopik
Automating generation of cyber threat intelligence content through threat attribution of botnet incidents
  • K T W Teuwen