Conference Paper

Hacker Forum Exploit and Classification for Proactive Cyber Threat Intelligence

Abstract and Figures

The exponential growth in data and technology has opened the door to progressively more destructive cyber-attacks. Traditional security controls are struggling to keep pace with the intricacy of cybercriminal tools and methods, so organizations are now looking for better approaches to strengthen their cyber security capabilities. Real-time Cyber Threat Intelligence (CTI) is one such proactive approach, which ensures that deployed appliances, security solutions, and strategies are continually evaluated or optimized. Among the various platforms for threat intelligence, hacker forums deliver rich metadata and thousands of Tools, Techniques, and Procedures (TTP). This research paper employs machine learning and deep learning approaches using neural networks to automatically classify hacker forum data into predefined categories, and develops interactive visualizations that enable CTI practitioners to probe the collected data for proactive and timely CTI. The results show that, among all the models, the deep learning model RNN GRU gives the best classification results, with 99.025% accuracy and 96.56% precision.
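The abstract reports accuracy and precision as separate figures, and the two can diverge when classes are imbalanced. The following is a minimal sketch of how each is computed, assuming a simplified binary labeling (threat-relevant vs. irrelevant) rather than the paper's predefined categories. The labels are invented for illustration and are not the paper's data, and `accuracy` and `precision` are hypothetical helper names.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of all posts whose predicted label matches the true label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def precision(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Fraction of posts predicted as `positive` that truly are `positive`."""
    # Keep the true labels of only those posts the model flagged as positive.
    flagged = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not flagged:
        return 0.0  # assumption: report 0.0 when nothing was flagged
    return sum(t == positive for t in flagged) / len(flagged)


# Illustrative labels only: 1 = threat-relevant post, 0 = irrelevant post.
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]

acc = accuracy(y_true, y_pred)    # 9 of 10 posts labeled correctly
prec = precision(y_true, y_pred)  # 3 of 4 flagged posts are truly relevant
print(f"accuracy={acc:.3f} precision={prec:.3f}")
```

In this toy case a single false positive leaves accuracy high while precision drops noticeably, which is why the two figures are worth reporting side by side.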
... Cybersecurity is one of the key concerns among users [1,2,3]. Cybercriminals are constantly delivering stronger and more advanced security threats to gain control over networks, i.e. ...
... Cybercriminals are constantly delivering stronger and more advanced security threats to gain control over networks, i.e. the Internet [1,3,4]. In previous decades, computer security was limited to preventing computer viruses, and the other most common security issue was email spamming [5,6]. ...
... But in recent years, cybercriminals have been making targeted attacks so that they can take full control from users. In online communities, topics related to computer security are among the most popular, and community members discuss past and possible security threats on social media and hacker forums [1,2,7,8]. Hacker forums are the most valuable source of computer security related posts and blogs, and these posts usually contain vital information about possible computer and cybersecurity threats or holes [1,2]. ...
... In [8], a classification of CTI data originating from hacker forums was performed using two different variants of Recurrent Neural Networks (RNNs). The data were classified, using specialised web crawlers, into relevant or irrelevant. ...
... This attack allows attackers to steal items, sensitive company data, user lists, or private customer details [10]. ...
Article
This short study discusses cyber-attacks and well-known attack methods. It also compares the number of attacks in 2021 with the number in the previous year to determine how fast this malicious activity is growing, and it examines the reasons that motivate such cyber-attacks. Risk measurement methods are discussed on the basis of previous research. Conclusions about suitable solutions to cyber-attacks are drawn from the points of view of different studies.
... Machine learning techniques have been applied to CTI research recently. Most past research focused on classifying security and non-security related documents or extracting vulnerabilities [2][3][4][5], but rarely on extracting attack tactics to fill in the information needed by APT incidents to outline attack processes. Some previous work [6][7][8] manually built a TTP ontology, which is labor-intensive and must be kept updated as new attack vectors emerge. ...
Conference Paper
To gain insight into potential cyber threats, this research proposes a novel automatic threat action retrieval system that collects and analyzes various data sources, including security news, incident analysis reports, and darknet hacker forums. It develops an improved data preprocessing method to reduce feature dimension and a novel query match algorithm to capture effective threat actions automatically, without the manually predefined ontology used in past research. The experimental results show that the proposed method achieves an accuracy of 94.7% and a recall rate of 95.8% and outperforms the previous research. The proposed solution can extract effective threat actions automatically and efficiently.
... Cybersecurity researchers [5], [6], [7] have focused on mining CTI that can automatically extract CTI information from unstructured reports. Mining CTI facilitates extracting the TTPs, the structure of cyberattacks known as the cyber kill chain [8], and the artifacts of operating systems, applications, programs, and networks whose use indicates compromise, intrusion, or breach of a computing system, collectively referred to as indicators of compromise [9]. Researchers [4], [10] are also working on transforming the extracted CTI to structured formats, such as CVE [11], STIX [12] and MITRE ATT&CK [13]. ...
Article
Underground hacker forums serve as an online social platform for hackers to communicate and spread hacking techniques and tools. Much of the latest information in these forums directly or indirectly affects cyberspace security, thereby threatening the assets of enterprises and individuals. Social media such as hacker forums and Twitter therefore have a great impact on the cybersecurity area. In recent years, analyzing hacker forum data to explore hacking activities and cybersecurity situational awareness has aroused widespread interest among researchers. Automatically identifying cybersecurity words and extracting neologisms from open source social platforms has been less successful and still requires further research. To provide early warning of cyber attack incidents, we propose NEDetector, a novel method to automatically identify cybersecurity words and extract neologisms from unstructured content, focusing mainly on attack groups and hacking tools. NEDetector first analyzes cybersecurity words and proposes four groups of features to build a cybersecurity word identification model based on the Bidirectional LSTM algorithm. Second, NEDetector introduces four sets of features to identify cybersecurity neologisms based on the RandomForest algorithm. The experimental results show that the whole NEDetector system achieves an identification precision of 89.11%. Furthermore, when performing predictions on Twitter data, the proposed neologism extraction system often identifies terms before Google Trends has enough data, which demonstrates the validity and timeliness of the presented system.
Article
The objectives of cyberattacks are becoming more sophisticated, and attackers are concealing their identity by masquerading as other attackers. Cyber threat intelligence (CTI) is gaining attention as a way to collect meaningful knowledge to better understand the intention of an attacker and eventually predict future attacks. A systematic threat analysis based on data acquired from actual cyber incidents is a useful approach to generating intelligence for such an objective. Developing an analysis technique requires high-volume, high-quality data. However, researchers can become discouraged by the inaccessibility of data, because organizations rarely release their data to the research community. Owing to this data inaccessibility issue, academic research tends to be biased toward techniques that develop steps of the CTI process other than analysis and production. In this paper, we propose an automated dataset generation system called CTIMiner. The system collects threat data from publicly available security reports and malware repositories. The data are stored in a structured format. We released the source code and dataset to the public, including approximately 640,000 records from 612 security reports published from January 2008 to June 2019. In addition, we present statistical features of the dataset and techniques that can be developed using it. Moreover, we demonstrate an application example of the dataset that analyzes the correlation and characteristics of an incident. We believe our dataset will promote collaborative research on threat analysis for the generation of CTI.
Conference Paper
Threat actors can be persistent, motivated, and agile, and leverage a diversified and extensive set of tactics and techniques to attain their goals. In response, defenders establish threat intelligence programs to stay threat-informed and lower risk. Actionable threat intelligence is integrated into security information and event management (SIEM) systems or is accessed via more dedicated tools like threat intelligence platforms. A threat intelligence platform gives access to contextual threat information by aggregating, processing, correlating, and analyzing real-time data and information from multiple sources, and in many cases it provides centralized analysis and reporting of an organization's security events. Sysmon logs are a data source that has received considerable attention for endpoint visibility. Approaches for threat detection using Sysmon have been proposed, mainly focusing on search engine technologies like NoSQL database systems. This paper demonstrates one of the many use cases of Sysmon and cyber threat intelligence. In particular, we present a threat assessment system that relies on a cyber threat intelligence ontology to automatically classify executed software into different threat levels by analyzing Sysmon log streams. The presented system and approach augment cyber defensive capabilities through situational awareness, prediction, and automated courses of action.
Conference Paper
Hacker forums and other social platforms may contain vital information about cyber security threats. But using manual analysis to extract relevant threat information from these sources is a time-consuming and error-prone process that requires a significant allocation of resources. In this paper, we explore the potential of Machine Learning methods to rapidly sift through hacker forums for relevant threat intelligence. Utilizing text data from a real hacker forum, we compared the text classification performance of Convolutional Neural Network methods against more traditional Machine Learning approaches. We found that traditional machine learning methods, such as Support Vector Machines, can yield high levels of performance that are on par with Convolutional Neural Network algorithms.
Article
Cyber attacks cost the global economy approximately $445 billion per year. To mitigate attacks, many companies rely on cyber threat intelligence (CTI), or threat intelligence related to computers, networks, and information technology (IT). However, CTI traditionally analyzes attacks after they have already happened, resulting in reactive advice. While this is useful, researchers and practitioners have been seeking to develop proactive CTI by better understanding the threats present in hacker communities. This study contributes a novel CTI framework by leveraging an automated and principled web, data, and text mining approach to collect and analyze vast amounts of malicious hacker tools directly from large, international underground hacker communities. By using this framework, we identified many freely available malicious assets such as crypters, keyloggers, and web and database exploits. Some of these tools may have been the cause of recent breaches against organizations such as the Office of Personnel Management (OPM). The study contributes to our understanding and practice of the timely proactive identification of cyber threats.
Conference Paper
Widespread adoption of networking technologies has brought about tremendous economic and social growth, but has also exposed individuals and organizations to new threats from malicious cyber actors. Recent attacks by the WannaCry and NotPetya ransomware crypto-worms infected hundreds of thousands of computer systems worldwide, compromising data and critical infrastructure. To limit their impact, it is therefore critical to detect, and even predict, cyber attacks before they spread. Here, we introduce DISCOVER, an early cyber threat warning system that mines online chatter from cyber actors on social media, security blogs, and darkweb forums to identify words that signal potential cyber attacks. We evaluate DISCOVER and find that it can identify terms related to emerging cyber threats with precision above 80%. DISCOVER also generates a timeline of related online discussions on different Web sources that can be useful for analyzing emerging cyber threats.
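The idea of surfacing words that signal an emerging threat can be illustrated with a simple novelty heuristic. The sketch below is not DISCOVER's algorithm, which the abstract does not specify. It assumes a threshold-based approach that flags any term appearing at least `min_count` times in recent posts and never in a baseline set. The function names, the threshold, and the sample posts are all hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def emerging_terms(
    baseline_posts: Iterable[str],
    recent_posts: Iterable[str],
    min_count: int = 2,  # assumed threshold, chosen only for illustration
) -> List[str]:
    """Return terms seen at least `min_count` times in recent posts
    that never appear in the baseline posts, sorted alphabetically."""
    baseline_vocab = {tok for post in baseline_posts for tok in tokenize(post)}
    recent_counts = Counter(tok for post in recent_posts for tok in tokenize(post))
    return sorted(
        term
        for term, count in recent_counts.items()
        if count >= min_count and term not in baseline_vocab
    )


# Invented sample posts, used only to exercise the heuristic.
baseline = ["new phishing kit for sale", "selling keylogger and crypter"]
recent = [
    "wannacry spreading fast",
    "anyone seen wannacry sample",
    "new keylogger for sale",
]

flagged = emerging_terms(baseline, recent)
print(flagged)
```

A real system would also need to filter common words, handle multi-word terms, and weigh sources, none of which this sketch attempts.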