
Dark-Web Cyber Threat Intelligence: From Data to Intelligence to Prediction

Information
Editorial
Paulo Shakarian
School of Computing, Informatics, and Decision Support Engineering, Arizona State University, Tempe,
AZ 85281, USA; shak@asu.edu
Received: 29 November 2018; Accepted: 29 November 2018; Published: 1 December 2018


Scientific work that leverages information about communities on the deep and dark web has
opened up new angles in the field of security informatics. The presence of online communities
operating with relative impunity allows for data-driven approaches to various forms of adversarial
reasoning. Outside of this space, such techniques would require data that are either classified or
law-enforcement sensitive.
The pioneering work on dark-web mining by Hsinchun Chen and his group [1] laid the
foundations for how dark-web data could impact cyber threat intelligence in a very broad way.
We laid out a vision in early 2016 [2] on how this type of data could be leveraged to impact cyber threat
intelligence in a variety of ways, from adversarial models to understanding hacker communities,
risk assessment, and data-driven prediction of cyberattacks. We were thrilled at the initial response to
some of our early work in this area (e.g., [3]), which coincided with government grants, new scientific
studies, and commercial efforts that have only served to help the field.
As the title of this volume suggests, there is an evolution in how the dark web can be used to impact
cyber threat intelligence. Simply put, the information must be obtained, analyzed, and potentially
used for prediction purposes, and each of these steps poses significant challenges.
First, gathering information from dark-web communities poses a unique set of challenges.
Implementing crawlers to gather such information is a complex process. Furthermore, the adversarial
nature of the communities from which such data are collected poses a conundrum for researchers:
how much detail should they publish? They run the risk that their techniques will cease to be
viable once exposed to potentially malicious hackers. These conversations have often
taken place at conferences such as ASONAM and IEEE Intelligence and Security Informatics (ISI).
For example, Richard Frank’s seminal work on dark-web mining [4], which was named best paper at
ASONAM/FOSINT-SI in 2015, led me to engage in a series of conversations with him on many of
the challenges he had faced while conducting that research. In this Special Issue, ‘A Framework for
More Effective Dark Web Marketplace Investigations’ provides perhaps the most detailed description
of scraping dark-web sites available to date, offering an in-depth case study of the kind that researchers
could previously obtain only through offline conversations.
While gathering information is important, data alone cannot address real-world cybersecurity
problems. Threat intelligence organizations at major companies worldwide currently sift through these
data on a regular basis. They map out threat actors, conduct searches relevant to their organization,
and synthesize the information across multiple sources. Criminologist Tom Holt was a pioneer in this
area (e.g., [5]), which has gained importance recently as Chief Information Security Officers (CISOs)
are increasingly hiring intelligence professionals. This has led to widespread use of counter-terrorism
and law enforcement techniques within operational cybersecurity elements. Techniques such as link
analysis are now commonplace within cyber threat intelligence organizations. Research that applies
data mining techniques to data obtained from the dark web will enable these threat intelligence teams
to create an accurate picture of the threat more quickly. One key challenge is the reconciliation of
threat actor identities across multiple sources. In this Special Issue, ‘First Steps towards Data-Driven
Adversarial Deduplication’ addresses this problem head-on.
Information 2018, 9, 305; doi:10.3390/info9120305; www.mdpi.com/journal/information
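To make the identity-reconciliation challenge concrete, the following is a minimal Python sketch of one naive approach: flagging profiles from different sources whose handles are nearly identical after normalization. It is not the method of the Special Issue paper. The function names, the 0.85 threshold, and the sample profiles are all hypothetical, and handle similarity alone is a weak signal that a real system would combine with other evidence, such as writing style or activity patterns.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(handle: str) -> str:
    """Lowercase a handle and drop every non-alphanumeric character."""
    return "".join(ch for ch in handle.lower() if ch.isalnum())


def handle_similarity(a: str, b: str) -> float:
    """Return a similarity score in [0, 1] for two normalized handles."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def candidate_matches(profiles, threshold=0.85):
    """Return cross-source profile pairs whose handles look alike.

    profiles: list of (source, handle) tuples.
    threshold: hypothetical cut-off; it would need tuning on labeled data.
    """
    pairs = []
    for (src_a, h_a), (src_b, h_b) in combinations(profiles, 2):
        if src_a == src_b:
            continue  # only reconcile identities across different sources
        score = handle_similarity(h_a, h_b)
        if score >= threshold:
            pairs.append((h_a, h_b, round(score, 2)))
    return pairs


# Synthetic, made-up profiles for illustration only.
profiles = [
    ("forum_a", "Dark_Coder"),
    ("market_b", "darkcoder"),
    ("forum_c", "zero.day.bob"),
    ("market_b", "alice_x"),
]
matches = candidate_matches(profiles)
print(matches)
```

Pairs returned by such a filter are only candidates for an analyst to review; an adversary who deliberately varies handles across sites would evade it, which is part of what makes the problem adversarial.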
The current use of dark-web information to support real-world cybersecurity practices has been
focused on augmenting intelligence practices. However, with significant industry advances
in security information and event management (SIEM) technology, recent work has shown that
dark-web indicators can be correlated with event data and used for the prediction of cyberattacks [6].
The paper ‘Predicting Cyber-Events by Leveraging Hacker Sentiment’ included in this Issue takes the
next step in prediction by adding sentiment mining as a prediction element (originally introduced as
a way to identify interesting hacker conversations in Reference [4]).
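As a rough illustration of what correlating dark-web indicators with event data can mean, the Python sketch below estimates how often an indicator day is followed by an attack within a fixed window and compares that with the rate for an arbitrary day. This is a simplified stand-in, not the temporal-logic approach of [6] or the sentiment-based method of the Special Issue paper. The function names, the three-day window, and all dates are hypothetical and synthetic.

```python
from datetime import date, timedelta


def rule_confidence(indicator_days, attack_days, window_days=3):
    """Fraction of indicator days followed by an attack within the window.

    An indicator on day d counts as a hit if any attack falls on
    d + 1 through d + window_days (inclusive).
    """
    if not indicator_days:
        return 0.0
    attacks = set(attack_days)
    hits = 0
    for d in indicator_days:
        if any(d + timedelta(days=k) in attacks for k in range(1, window_days + 1)):
            hits += 1
    return hits / len(indicator_days)


def base_rate(attack_days, start, end, window_days=3):
    """Same fraction computed over every day in [start, end], as a baseline."""
    all_days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    return rule_confidence(all_days, attack_days, window_days)


# Synthetic dates for illustration only; they are not drawn from any dataset.
indicator_days = [date(2018, 1, 2), date(2018, 1, 10), date(2018, 1, 20)]
attack_days = [date(2018, 1, 4), date(2018, 1, 12), date(2018, 1, 28)]

confidence = rule_confidence(indicator_days, attack_days)
baseline = base_rate(attack_days, date(2018, 1, 1), date(2018, 1, 31))
print(f"confidence={confidence:.2f} baseline={baseline:.2f}")
```

An indicator is only interesting for prediction if its confidence is clearly above the baseline; with so few synthetic points the comparison here is purely illustrative.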
The use of information from hacker communities, such as those on the dark web, has great promise in
leading to a more threat-focused approach to cybersecurity. The key to further progress in this area is continued
evolution and automation so that such threat intelligence can be made available to a wide variety of
organizations to drive security decisions and protect their infrastructure more effectively.
Conflicts of Interest: The author declares no conflict of interest.
References
1. Chen, H. Dark Web: Exploring and Data Mining the Dark Side of the Web; Springer Science & Business Media:
   New York, NY, USA, 2011; Volume 30.
2. Shakarian, P.; Shakarian, J. Socio-Cultural Modeling for Cyber Threat Actors. In Proceedings of the AAAI
   Workshop: Artificial Intelligence for Cyber Security, Phoenix, AZ, USA, 12–13 February 2016.
3. Nunes, E.; Diab, A.; Gunn, A.; Marin, E.; Mishra, V.; Paliath, V.; Robertson, J.; Shakarian, J.; Thart, A.;
   Shakarian, P. Darknet and deepnet mining for proactive cybersecurity threat intelligence. In Proceedings
   of the 2016 IEEE International Conference on Intelligence and Security Informatics (ISI), The University of
   Arizona, Tucson, AZ, USA, 27–30 September 2016; pp. 7–12.
4. Macdonald, M.; Frank, R.; Mei, J.; Monk, B. Identifying Digital Threats in a Hacker Web Forum.
   In Proceedings of the 2015 International Symposium on Foundations of Open Source Intelligence and
   Security Informatics (FOSINT), Paris, France, 26–27 August 2015.
5. Holt, T.J.; Lampke, E. Exploring stolen data markets online: Products and market forces.
   Crim. Justice Stud. 2010, 23, 33–50.
6. Almukaynizi, M.; Paliath, V.; Shah, M.; Shah, M.; Shakarian, P. Finding Cryptocurrency Attack Indicators
   Using Temporal Logic and Darkweb Data. In Proceedings of the 2018 IEEE Conference on Intelligence and
   Security Informatics (ISI-18), Florida International University, Miami, FL, USA, 8–10 November 2018.
© 2018 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).