Ying Li’s research while affiliated with Sichuan University and other places


Publications (3)


NEDetector: Automatically extracting cybersecurity neologisms from hacker forums
  • Article

May 2021 · 139 Reads · 20 Citations

Journal of Information Security and Applications

Ying Li · [...] · Weina Niu

Underground hacker forums serve as online social platforms where hackers communicate and spread hacking techniques and tools. Much of the latest information in these forums directly or indirectly affects cyberspace security, threatening the assets of enterprises and individuals. Social media such as hacker forums and Twitter therefore have a great impact on cybersecurity. In recent years, analyzing hacker forum data to explore hacking activities and support cybersecurity situational awareness has aroused widespread interest among researchers. However, automatically identifying cybersecurity words and extracting neologisms from open-source social platforms has been less successful and still requires further research. To provide early warning of cyber attack incidents, we propose NEDetector, a novel method that automatically identifies cybersecurity words and extracts neologisms from unstructured content, focusing mainly on attack groups and hacking tools. First, NEDetector analyzes cybersecurity words and proposes four groups of features to build a cybersecurity word identification model based on the Bidirectional LSTM algorithm. Second, NEDetector introduces four sets of features to identify cybersecurity neologisms based on the Random Forest algorithm. Experimental results show that the whole NEDetector system achieves an identification precision of 89.11%. Furthermore, when performing predictions on Twitter data, the proposed neologism extraction system often identifies terms before Google Trends has enough data on them, which demonstrates the validity and timeliness of the presented system.


Figure 1. The framework of HackerRank.
Figure 2. Social network graph.
Figure 4. LDA model's coherence of different number of topics.
Figure 5. LDA model's perplexity of different number of topics.
Underground forum data sets.


HackerRank: Identifying key hackers in underground forums
  • Article
  • Full-text available

May 2021 · 981 Reads · 32 Citations

With the rapid development of the Internet, the cybersecurity situation is becoming more and more complex. At present, the surface web and the dark web contain numerous underground forums and markets, which play an important role in the cybercrime ecosystem. Cybersecurity researchers therefore often take a hacker-centered approach to cybercrime, trying to find key hackers and extract credible cyber threat intelligence from them. The scale of underground forum data is tremendous, and key hackers represent only a small fraction of underground forum users. Manually analyzing key hackers takes a lot of time as well as expertise, so a method or tool is needed to automatically analyze underground forums and identify the key hackers involved. In this work, we present HackerRank, an automatic method for identifying key hackers. HackerRank combines the advantages of content analysis and social network analysis. First, comprehensive evaluations and topic preferences are extracted separately using content analysis. Then, an improved Topic-specific PageRank combines the results of content analysis with social network analysis. Finally, HackerRank obtains a ranking of users, with higher-ranked users being considered key hackers. To demonstrate the validity of the proposed method, we applied HackerRank to five different underground forums separately. Compared with using social network analysis or content analysis alone, HackerRank increases the coverage rate across the five underground forums by 3.14% and 16.19% on average, respectively. In addition, we performed a manual analysis of the identified key hackers. The results show that the method is effective in identifying key hackers in underground forums.
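The abstract does not say how its "improved" Topic-specific PageRank differs from the standard algorithm, so the following is only a minimal sketch of plain topic-specific (personalized) PageRank, not the authors' implementation. It assumes that a reply from one user to another forms a directed edge, and that content analysis yields one non-negative score per user that biases the teleport step. The function name, the parameters, and the toy forum data are all hypothetical.

```python
from typing import Dict, List


def topic_specific_pagerank(
    edges: Dict[str, List[str]],
    content_score: Dict[str, float],
    damping: float = 0.85,
    iterations: int = 100,
    tol: float = 1e-10,
) -> Dict[str, float]:
    """Power iteration for PageRank with a content-biased teleport vector.

    edges[u] lists the users that u replied to (assumed edge direction).
    content_score[u] is a hypothetical content-analysis score for user u.
    """
    nodes = sorted(
        set(edges)
        | {dst for targets in edges.values() for dst in targets}
        | set(content_score)
    )
    # Normalize content scores into a teleport distribution.
    clipped = {n: max(content_score.get(n, 0.0), 0.0) for n in nodes}
    total = sum(clipped.values())
    if total > 0:
        teleport = {n: clipped[n] / total for n in nodes}
    else:
        # Fall back to uniform teleport when no content scores exist.
        teleport = {n: 1.0 / len(nodes) for n in nodes}

    rank = dict(teleport)
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for src in nodes:
            targets = edges.get(src, [])
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
            else:
                # Dangling user: spread its rank along the teleport vector.
                for n in nodes:
                    new_rank[n] += damping * rank[src] * teleport[n]
        delta = sum(abs(new_rank[n] - rank[n]) for n in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Toy forum (made-up data): who replied to whom, plus content scores.
demo_edges = {
    "bob": ["alice"],
    "carol": ["alice"],
    "dave": ["alice", "bob"],
    "alice": ["bob"],
}
demo_scores = {"alice": 0.9, "bob": 0.5, "carol": 0.1, "dave": 0.1}
demo_ranks = topic_specific_pagerank(demo_edges, demo_scores)
for user in sorted(demo_ranks, key=demo_ranks.get, reverse=True):
    print(f"{user}: {demo_ranks[user]:.4f}")
```

In this toy graph, a user who both receives many replies and has a high content score ends up at the top of the ranking, which mirrors the idea of combining social network analysis with content analysis. How the paper actually weights the two signals is not stated in the abstract.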


ACER: detecting Shadowsocks server based on active probe technology

September 2020 · 2,534 Reads · 7 Citations

Journal of Computer Virology and Hacking Techniques

Anonymous servers, such as Tor and Shadowsocks, are created to hide information about hosts while they browse the Internet. These servers are quite difficult to identify, which gives potential criminals opportunities to commit crimes. Hackers can also use these servers to threaten public network security, for example through DDoS and phishing attacks. Hence, the study of identifying these servers is crucial. Current work on detecting Shadowsocks servers is mostly based on features of the servers' data streams combined with machine learning. However, these are passive methods, because they work only while the servers are in a connected state. We therefore propose a new system named ACER, where AC stands for active and ER for expert, to detect these servers. In addition, we introduce the XGBoost algorithm to process the data stream and optimize detection. The method can actively recognize more Shadowsocks servers instead of passively monitoring the communication tunnel to identify them. In experiments, the proposed framework achieves an accuracy of 94.63%, which is 1.20% higher than other existing solutions. We hope to provide a novel solution for those conducting research in this area, and at the same time a detection scheme that network censors can use to block illegal servers.

Citations (3)


... Other approaches use dark web data but are built using singular LLM-based agents. V. Varghese, et al. [7] build on previous dark web scraping research that used traditional technologies, such as PageRank [8], TF-IDF [9], SVM, and CNN models [10]. They choose to implement the CyBERT [11] model that is optimized for Named Entity Recognition (NER). ...

Reference:

MAD-CTI: Cyber Threat Intelligence Analysis of the Dark Web Using a Multi-Agent Framework
HackerRank: Identifying key hackers in underground forums

... Such data are for the most part not available [116], at least up until now (the creation of synthetic data may contribute to progress in this aspect). • Although state-of-the-art NLP techniques can facilitate the understanding and extraction of information in cybersecurity in the form of textual reports, advisories, and logs, these texts also often require more sophisticated, tailored NLP approaches that can handle jargon, abbreviations, context-specific meaning, and neologisms [117]. • As taxonomies can potentially be used in the scope of critical applications, human experts need to trust and understand automated taxonomies; this commands the application of Explainable AI (XAI) [118]. ...

NEDetector: Automatically extracting cybersecurity neologisms from hacker forums
  • Citing Article
  • May 2021

Journal of Information Security and Applications

... They detect Shadowsocks traffic using the Random Forest algorithm. Cheng et al. [13] propose an active method for Shadowsocks servers detection. They collect the IP and port of the server as a dataset, and then classify servers of the Shadowsocks using machine learning algorithm XGBoost. ...

ACER: detecting Shadowsocks server based on active probe technology

Journal of Computer Virology and Hacking Techniques