May 2021
Journal of Information Security and Applications
Underground hacker forums serve as online social platforms where hackers communicate and share hacking techniques and tools. Much of the latest information posted in these forums directly or indirectly affects cyberspace security and threatens the assets of enterprises and individuals. Social media such as hacker forums and Twitter therefore have a considerable impact on cybersecurity. In recent years, analyzing hacker forum data to explore hacking activities and support cybersecurity situational awareness has attracted widespread interest among researchers. However, automatically identifying cybersecurity words and extracting neologisms from open source social platforms has been less successful and requires further research. To provide early warning of cyberattack incidents, we propose NEDetector, a novel method that automatically identifies cybersecurity words and extracts neologisms from unstructured content, focusing mainly on attack groups and hacking tools. First, NEDetector analyzes cybersecurity words and proposes four groups of features to build a cybersecurity word identification model based on the Bidirectional LSTM algorithm. Second, NEDetector introduces four sets of features to identify cybersecurity neologisms based on the RandomForest algorithm. The experimental results show that the complete NEDetector system achieves an identification precision of 89.11%. Furthermore, when performing predictions on Twitter data, the proposed neologism extraction system often detects new terms before Google Trends has enough data on them, which demonstrates the validity and timeliness of the presented system.
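The two-stage flow described in the abstract (first identify cybersecurity words, then pick out the neologisms among them) can be illustrated with a minimal sketch. This is not the NEDetector implementation: the abstract does not list the features, so the Bidirectional LSTM and RandomForest stages are replaced here by simple rule-based stand-ins, and every name (`SECURITY_CONTEXT`, `security_words`, `neologisms`, `min_posts`) and the sample posts are hypothetical, chosen only to show how data might move through such a pipeline.

```python
from collections import Counter

# Hypothetical seed terms. In the paper, stage 1 is a Bidirectional LSTM
# classifier; this keyword rule is only a stand-in for illustration.
SECURITY_CONTEXT = frozenset({"exploit", "malware", "ransomware", "botnet", "payload"})


def tokenize(post: str) -> list[str]:
    """Lowercase a post and strip surrounding punctuation from each token."""
    tokens = (raw.strip(".,!?;:()\"'").lower() for raw in post.split())
    return [t for t in tokens if t]


def security_words(posts: list[str]) -> Counter:
    """Stage 1 stand-in: count, per word, the security-related posts containing it.

    A post is treated as security-related if it contains any seed term.
    """
    counts = Counter()
    for post in posts:
        tokens = set(tokenize(post))
        if tokens & SECURITY_CONTEXT:
            counts.update(tokens)
    return counts


def neologisms(counts: Counter, known_vocabulary: set[str], min_posts: int = 2) -> list[str]:
    """Stage 2 stand-in: keep words unseen in the vocabulary that recur often enough.

    The paper uses a RandomForest over four feature sets; this threshold rule
    is an assumption made for the sketch.
    """
    return sorted(
        word
        for word, n in counts.items()
        if n >= min_posts and word not in known_vocabulary
    )


if __name__ == "__main__":
    posts = [
        "New ransomware DarkVoid drops a payload via email",
        "DarkVoid malware is spreading",
        "The weather is nice today",
    ]
    known = {"new", "ransomware", "drops", "a", "payload", "via", "email",
             "malware", "is", "spreading"}
    print(neologisms(security_words(posts), known))  # prints ['darkvoid']
```

Keeping the two stages as separate functions mirrors the structure in the abstract, so either stand-in could be swapped for a trained model without changing the other.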