Research

Applying Threat Intelligence Metrics on the Dark Web data

Abstract

The exponential growth in data and technology has opened the door to increasingly destructive cyber-attacks. Because traditional security controls struggle to match the sophistication of cybercriminal tools and methods, organizations have shifted towards threat intelligence (TI). Among the various sources of TI, hacker forums deliver rich metadata and thousands of Tactics, Techniques, and Procedures (TTPs). Today, many public and commercial sources distribute dark web threat intelligence data feeds to support this purpose. However, our understanding of this data, its characterization, and the extent to which it can meaningfully support its intended uses is still quite limited. This research addresses these gaps by defining a set of metrics for characterizing dark web threat intelligence data feeds. Our measurement results give consumers some guidance on the purchase and optimal use of dark web threat intelligence data feeds.

Conference Paper
The deep and darkweb (d2web) refers to limited access web sites that require registration, authentication, or more complex encryption protocols to access them. These web sites serve as hubs for a variety of illicit activities: to trade drugs, stolen user credentials, hacking tools, and to coordinate attacks and manipulation campaigns. Despite its importance to cyber crime, the d2web has not been systematically investigated. In this paper, we study a large corpus of messages posted to 80 d2web forums over a period of more than a year. We identify topics of discussion using LDA and use a non-parametric HMM to model the evolution of topics across forums. Then, we examine the dynamic patterns of discussion and identify forums with similar patterns. We show that our approach surfaces hidden similarities across different forums and can help identify anomalous events in this rich, heterogeneous data.
Article
Cyber attacks cost the global economy approximately $445 billion per year. To mitigate attacks, many companies rely on cyber threat intelligence (CTI), or threat intelligence related to computers, networks, and information technology (IT). However, CTI traditionally analyzes attacks after they have already happened, resulting in reactive advice. While useful, researchers and practitioners have been seeking to develop proactive CTI by better understanding the threats present in hacker communities. This study contributes a novel CTI framework by leveraging an automated and principled web, data, and text mining approach to collect and analyze vast amounts of malicious hacker tools directly from large, international underground hacker communities. By using this framework, we identified many freely available malicious assets such as crypters, keyloggers, web, and database exploits. Some of these tools may have been the cause of recent breaches against organizations such as the Office of Personnel Management (OPM). The study contributes to our understanding and practice of the timely proactive identification of cyber threats.
Conference Paper
Underground forums are widely used by criminals to buy and sell a host of stolen items, datasets, resources, and criminal services. These forums contain important resources for understanding cybercrime. However, the number of forums, their size, and the domain expertise required to understand the markets make manual exploration of these forums unscalable. In this work, we propose an automated, top-down approach for analyzing underground forums. Our approach uses natural language processing and machine learning to automatically generate high-level information about underground forums, first identifying posts related to transactions, and then extracting products and prices. We also demonstrate, via a pair of case studies, how an analyst can use these automated approaches to investigate other categories of products and transactions. We use eight distinct forums to assess our tools: Antichat, Blackhat World, Carders, Darkode, Hack Forums, Hell, L33tCrew and Nulled. Our automated approach is fast and accurate, achieving over 80% accuracy in detecting post category, product, and prices.
Article
We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities. In the context of text modeling, the topic probabilities provide an explicit representation of a document. We present efficient approximate inference techniques based on variational methods and an EM algorithm for empirical Bayes parameter estimation. We report results in document modeling, text classification, and collaborative filtering, comparing to a mixture of unigrams model and the probabilistic LSI model.
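The three levels described in this abstract (corpus-level parameters, per-document topic proportions, per-word topic assignments) can be illustrated with a minimal sketch of the generative process. This is not the variational inference procedure the abstract mentions; the function name `generate_document`, the fixed document length, and the toy values for `alpha` and `beta` are assumptions chosen only for illustration.

```python
import numpy as np

def generate_document(alpha, beta, doc_length, rng):
    """Sample one document from the LDA generative process (illustrative).

    alpha: Dirichlet prior over topics, shape (K,)        -- corpus level
    beta:  per-topic word distributions, shape (K, V),
           each row sums to 1                             -- corpus level
    doc_length: number of words to draw (assumed fixed here)
    """
    # Document level: topic proportions for this document.
    theta = rng.dirichlet(alpha)
    # Word level: a topic assignment for every word position.
    topics = rng.choice(len(alpha), size=doc_length, p=theta)
    # Each word is drawn from the word distribution of its assigned topic.
    words = np.array([rng.choice(beta.shape[1], p=beta[z]) for z in topics])
    return theta, topics, words

# Toy setup (hypothetical values): 3 topics, vocabulary of 8 word ids.
rng = np.random.default_rng(0)
alpha = np.array([0.5, 0.5, 0.5])
beta = rng.dirichlet(np.ones(8), size=3)
theta, topics, words = generate_document(alpha, beta, 20, rng)
```

Fitting LDA to real text runs this process in reverse: given only the observed words, inference estimates the topic proportions and the per-topic word distributions.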
Li, V. G., Dunn, M., Pearce, P., McCoy, D., Voelker, G. M., Savage, S., & Levchenko, K. (2019, July). Reading the Tea Leaves: A Comparative Analysis of Threat Intelligence. In 28th USENIX Security Symposium.
Samtani, S., Chinn, K., Larson, C., & Chen, H. (2016, September). AZSecure hacker assets portal: Cyber threat intelligence and malware analysis. In 2016 IEEE Conference on Intelligence and Security Informatics (ISI) (pp. 19-24). IEEE.
Kusner, M., Sun, Y., Kolkin, N., & Weinberger, K. (2015, June). From word embeddings to document distances. In International conference on machine learning (pp. 957-966). PMLR.
Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805.