Fig 2 - available via license: CC BY
The overlap of software vulnerabilities discussed across the multiple social platforms. https://doi.org/10.1371/journal.pone.0230250.g002

Source publication
Article
Full-text available
The awareness about software vulnerabilities is crucial to ensure effective cybersecurity practices, the development of high-quality software, and, ultimately, national security. This awareness can be better understood by studying the spread, structure and evolution of software vulnerability discussions across online communities. This work is the f...

Context in source publication

Context 1
... number of different vulnerabilities discussed in the three platforms is not uniform. Fig 2 shows the number of vulnerabilities for each platform and the overlap between platforms. GitHub has the highest variety of CVE IDs being discussed (12,928) and this makes sense since GitHub is a platform geared towards software development. ...
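The per-platform counts and overlaps in Fig 2 amount to set operations over CVE identifiers. A minimal sketch of that computation, assuming each platform's discussions have already been reduced to a set of CVE IDs (the sample sets below are hypothetical stand-ins for the real data):

```python
# Compute per-platform CVE-ID variety and the overlaps between platforms,
# as in Fig 2. The CVE sets here are hypothetical examples.
from itertools import combinations

platform_cves = {
    "GitHub":  {"CVE-2017-0144", "CVE-2014-0160", "CVE-2019-0708"},
    "Twitter": {"CVE-2014-0160", "CVE-2019-0708"},
    "Reddit":  {"CVE-2014-0160", "CVE-2021-44228"},
}

# Variety of CVE IDs discussed on each platform
counts = {name: len(cves) for name, cves in platform_cves.items()}

# Pairwise overlap between platforms
overlaps = {
    (a, b): len(platform_cves[a] & platform_cves[b])
    for a, b in combinations(platform_cves, 2)
}

# CVE IDs discussed on all three platforms
shared_all = set.intersection(*platform_cves.values())
```

With the real data, `counts["GitHub"]` would be the 12,928 CVE IDs reported in the source publication.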

Citations

... As a result, software faults that can be exploited maliciously are categorized as vulnerabilities. Because vulnerabilities are often overlooked by users and programmers during regular system operation, they require a fundamentally different identification strategy than flaws, which can be found more easily and naturally [5]. Compared with ordinary faults, these properties make vulnerabilities considerably more difficult to tackle. ...
Article
Full-text available
The protection of industrial control systems and related network protocols depends on vulnerability mining. Existing vulnerability mining strategies suffer from inadequate receiving efficiency and insufficient mining capacity. This study therefore analyzes network protocol vulnerability mining using fuzz testing combined with deep learning (DL). Modbus TCP is employed as the target network protocol for vulnerability mining. This paper presents a unique threshold-sample-driven deep neural network (T-s DNN) framework. Based on the T-s DNN, we construct a fuzzing framework (T-s DNN Fuzzer) for the Modbus TCP protocol. The DNN is first trained to understand the meaning of the protocol's data unit. The likelihood distribution of every value in the information is quantified using the softmax mechanism. The technique then compares the highest likelihood against a random-variable threshold to decide whether to replace the existing information value with the most likely one. The MBAP header is completed according to the protocol standard. Fuzz tests demonstrate that, in addition to increasing sample receipt levels and exploitability, the fuzzer can identify protocol vulnerabilities rapidly. Experiments with the T-s DNN fuzzer demonstrate that it detects industrial control protocol vulnerabilities more effectively while also increasing test case reception scores and exploitability.
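The mutation step of the T-s DNN fuzzer can be sketched as a softmax-then-threshold decision: the model scores candidate values for a protocol field, and the most likely candidate replaces the current value only if its probability clears a threshold. This is a simplified sketch; the scores, candidate values, and threshold below are hypothetical, and the real framework derives the scores from a DNN trained on the protocol's data unit.

```python
# Sketch of a threshold-sample-driven mutation decision: softmax over
# model scores, replace the field value only when the top probability
# exceeds a threshold. All inputs here are hypothetical examples.
import math

def softmax(scores):
    # Numerically stable softmax over a list of raw scores
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mutate_field(current_value, candidates, scores, threshold=0.5):
    """Replace current_value with the most likely candidate if its
    softmax probability exceeds the threshold; otherwise keep it."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] > threshold:
        return candidates[best]
    return current_value
```

When all candidates score equally, no probability clears the threshold and the existing value is kept, which preserves randomness in the generated test cases.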
... Through the integration of data from social network platforms, researchers have previously offered important information by studying user interactions with security issues/vulnerabilities [30] and by associating conversations with specific weaknesses [31,32]. Regarding user awareness and system protection, prior studies make use of social network data to produce user alerts [10,16,33], assess exploitability indicators [15,18], and evaluate threat evidence [19] using machine learning algorithms, statistical analysis, networks, and text mining approaches. Moreover, other studies aim to protect social network users and software applications as well as prevent potential incidents by detecting malicious behavior [34][35][36]. ...
Article
Full-text available
Ensuring the privacy and safety of platform users has become a complex objective due to the emerging threats that surround any type of network, software, and hardware. Scams, malware, hackers, and security vulnerabilities form the epicenter of cyber threats, causing severe damage to the affected systems and sensitive data of users. Thus, users turn to online social networks to report cyber threats, discuss topics of their interest, and obtain knowledge concerning the various perspectives of information security. In this study, we aim to address the concepts of social interactions surrounding information security-related content by retrieving and analyzing Reddit posts from 45 relevant subreddits. In this regard, a word clustering approach is employed, based on the Affinity Propagation algorithm, that leads to the extraction and interpretation of 54 concepts. These concepts are relevant to information security and some more generic areas of interest including social media, software vendors, and labor. Furthermore, to provide a more comprehensive overview of users’ activity in the different Reddit communities/subreddits, a knowledge map associating subreddits and concepts based on their conceptual similarities is also established. The analysis shows that the descriptions of the examined subreddits are strongly related to their underlying concepts. At the same time, the outcomes also assess the conceptual associations between the different subreddits, offering knowledge related to similar and distant communities. Ultimately, two post metrics are utilized to explore how the concepts may impact user interactions. This allows us to differentiate between concepts associated with posts typically endorsed by communities, resulting in increased information exchange (via comments), or contributing as news/announcements.
Overall, the findings of this study can be used as a knowledge basis in determining user interests, opinions, perspectives, and responsiveness, when it comes to cyber threats, attacks, and malicious activities. Also, the respective outcomes can contribute as a guide for identifying similar communities/subreddits and themes. Regarding the methodological contributions of this study, the proposed framework can be adapted to similar datasets and research goals as it does not depend on the special characteristics of the imported data, offering, in turn, a practical approach for future research.
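The clustering approach described above can be sketched with scikit-learn's Affinity Propagation, which picks exemplar items without a preset number of clusters. The toy 2-D "embeddings" below are hypothetical stand-ins for real word vectors, and the word list is illustrative only:

```python
# Sketch: grouping security-related words with Affinity Propagation.
# The embeddings are hypothetical 2-D vectors; real pipelines would use
# learned word vectors of much higher dimensionality.
import numpy as np
from sklearn.cluster import AffinityPropagation

words = ["malware", "virus", "trojan", "password", "login", "credential"]
embeddings = np.array([
    [0.90, 0.10], [0.85, 0.15], [0.80, 0.20],   # malware-like words
    [0.10, 0.90], [0.15, 0.85], [0.20, 0.80],   # authentication-like words
])

model = AffinityPropagation(random_state=0).fit(embeddings)

# Group words by cluster label; each cluster's exemplar word can serve
# as the interpretable "concept" for that group
clusters = {}
for word, label in zip(words, model.labels_):
    clusters.setdefault(label, []).append(word)
```

Because Affinity Propagation infers the number of clusters from the data (via the `preference` parameter), it suits exploratory concept extraction where the concept count is unknown in advance.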
... We complement prior survey-based research by analyzing large-scale data from two popular platforms: Twitter and Reddit. These platforms offer diverse perspectives on current sociotechnical issues and have been used in numerous studies on privacy and security (e.g., [7,64,65,78,98,101]). From Twitter and relevant subreddits (e.g., /r/teachers, /r/students, and /r/edtechhelp), we collected 11M tweets and 0.5M Reddit posts that contained EdTech-related keywords; these posts were made between January 2008 and February 2022. ...
... Twitter provides an open platform for researchers, academics, and practitioners to discuss various aspects of research and teaching [57], and Reddit offers role-specific forums for different EdTech stakeholders. These platforms were instrumental in acquiring threat intelligence and understanding public attitudes toward security/privacy issues at scale (e.g., [64,78,98]). Furthermore, non-security experts also participate online in such discussions [114], and critically, large-scale public discussions have led to desirable outcomes such as enhancing product security [62] and banning invasive remote proctoring apps at educational institutes [23]. ...
... In this paper, we take a quantitative approach and analyze largescale data from Twitter and Reddit (Section 5) to identify privacy and security concerns regarding EdTech. This approach has been adopted in numerous prior research investigating such concerns in different domains (e.g., [56,64,78,90,93,98,125]). While interviewbased studies allow one to identify rich and nuanced content, collecting data from a large sample is not usually possible in those settings. ...
Article
Full-text available
A clear and well-documented LaTeX document is presented as an article formatted for publication by ACM in a conference proceedings or journal publication. Based on the "acmart" document class, this article presents and explains many of the common variations, as well as many of the formatting elements an author may use in the preparation of the documentation of their work.
... Other studies have come to similar conclusions that although news shared on a "news aggregator website" featuring images positively affects the readership, images negatively affect the number of comments an article receives due to their distracting character [106]. Since Reddit is a social media website known for encouraging the posting of comments and the discussion amongst users [107], it operates entirely differently than news aggregator websites. Thus, we consider our results as complementary to previous research. ...
Article
Full-text available
Portrayals of violence are common in contemporary media reporting; they attract public attention and influence the reader's opinion. In the particular context of a social movement such as Black Lives Matter (BLM), the portrayal of violence in news coverage attracts public attention and can affect the movement's development, support, and public perception. Research on the relationship between digital news content featuring violence and user attention on social media has been scarce. This paper analyzes the relationship between violence in online reporting on BLM and its effect on user attention on the social media platform Reddit. The analysis focuses on the portrayal of violence in images used in BLM-related digital news coverage shared on Reddit. The dataset is comprised of 5,873 news articles with images. The classification of violent images is based on a VGG19 convolutional neural network (CNN) trained on a comprehensive dataset. The results suggest that it is not the display of violence in images that significantly affects user attention in digital news content; rather, it is negative article titles, the news outlet's political leaning and level of factual reporting, and platform affordances. Thus, this paper adds to the understanding of user attention distributions online and paves the way for future research in this field.
... Developer Communication that Impacts Contributions. The first area concerns how discussions between developers can affect contributions. Developer discussions have been widely explored on GitHub (e.g., issues and PRs) (Bosu and Carver 2014; Tsay et al. 2014), issue tracking systems (Correa and Sureka 2013; Bertram et al. 2010), StackOverflow (Barua et al. 2014), Twitter (Bougie et al. 2011), and Reddit (Iqbal et al. 2021; Shrestha et al. 2020). As discussed by Tsay et al. (2014), developer discussion can encourage further contributions to OSS projects. ...
Article
Full-text available
Open Source Software (OSS) projects rely on a continuous stream of new contributors for their livelihood. Recent studies reported that new contributors experience many barriers in their first contribution, with the social barrier being critical. Although a number of studies investigated the social barriers to new contributors, we hypothesize that negative first responses may cause an unpleasant feeling and subsequently lead to the discontinuation of future contributions. We execute protocols of a registered report to analyze 2,765,917 first contributions as Pull Requests (PRs) with 642,841 first responses. We characterize most first responses as positive but less responsive, exhibiting sentiments of fear, joy, and love. Results also indicate that negative first responses are literally intended to be either constructive (50.71%) or criticizing (37.68%) in nature. Running different machine learning models, we find that performance in predicting future interactions is low (F1 score of 0.6171), though better than the baselines. Furthermore, an analysis of these models shows that interactions are positively correlated with a future contribution, with other dimensions (i.e., project, contributor, contribution) having a large effect.
... Similar concerns have arisen with regard to recruitment through Twitter, GitHub, and Reddit [8]. ...
Article
This study was a multistage process of recruiting participants through Reddit with the intent of increasing data integrity when facing an infiltration of Internet bots. Approaches to increase data integrity centered around preventing the occurrence of Internet bots from the onset and increasing the ability to identify Internet bot responses. We attempted to detect bots in a study focused on understanding social factors related to autism and suicide risk. Four recruitment rounds occurred through Reddit on mental health-related subreddits, with one post made on each subreddit per recruitment round. We found a high presence of bots in the initial rounds; indeed, using location data, one third of the total responses (33.4 percent; 118/353) came from just eight locations (i.e., 4.7 percent of all locations). The proportion of detected bots was significantly different across the rounds of recruitment (χ2 = 150.22, df = 3, p < 0.001). In round 4, language advertising compensation was removed from recruitment posts. This round had significantly lower proportions of detected bots compared with round 1 (χ2 = 33.01, df = 1, p < 0.001), round 2 (χ2 = 129.14, df = 1, p < 0.001), and round 3 (χ2 = 46.6, df = 1, p < 0.001). Through a multistage recruitment process, we were able to increase the integrity of our collected data, as determined by a low percentage of fraudulent responses. Only once we removed advertisement of compensation from recruitment posts did we see a significant decrease in the quantity and percentage of Internet bot responses. This multistage recruitment study provides valuable information regarding how to adapt when an online survey study is infiltrated with Internet bots.
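The round-over-round comparison above is a chi-square test of independence on detected-bot counts per recruitment round. A minimal sketch with SciPy, using hypothetical counts (the study reports only the test statistics, not the underlying table):

```python
# Sketch: chi-square test of whether bot proportion differs across
# recruitment rounds. The observed counts below are hypothetical.
from scipy.stats import chi2_contingency

# rows: recruitment rounds 1-4; columns: [bot responses, human responses]
observed = [
    [60, 40],   # round 1
    [80, 30],   # round 2
    [55, 45],   # round 3
    [5,  95],   # round 4: compensation language removed from posts
]

chi2, p, dof, expected = chi2_contingency(observed)
# dof = (rows - 1) * (cols - 1) = 3, matching the four-round comparison
```

A small p-value here indicates the bot proportion is not uniform across rounds, the same conclusion the study draws from its χ2 = 150.22, df = 3 result.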
... Previous work by Allodi et al. [22] focuses on multiple deficiencies in CVSS version 2 as a metric for predicting whether or not a vulnerability will be exploited in the wild, specifically because predicting the small fraction of vulnerabilities exploited in the wild is not one of the design goals of CVSS [23]. Its strong community structure enhances information diffusion. ...
Article
Full-text available
Malicious actors often utilize publicly available software vulnerabilities and exploit codes to attack vulnerable targets. Exploit codes are shared across several platforms, including exploit databases, hacker communities, and social media platforms. Public exploit code information is a type of cyber threat intelligence. It can help security experts analyze which vulnerabilities are available to malicious actors and need to be prioritized for patching. In this paper, we propose an intelligent framework to automatically extract public exploit code information from social media. Social media sites are capable of aggregating numerous cybersecurity-related items of information due to their timeliness and volume. First, we present a convolutional neural network classifier to identify tweets that disclose exploit codes in their content or in corresponding linked web pages, which achieved 0.989 AUC and a 0.939 F1-score. The model shows better prediction accuracy than the baseline approaches. Second, we present a Bert-BiLSTM-CRF entity recognition method to identify the target entity that may be affected by the exploit code. The Bert-BiLSTM-CRF model reached an F1-score of 0.959, which performed better than the 0.927 and 0.922 obtained by the same neural network using Word2vec and GloVe word embeddings, respectively. Finally, the experimental results show that, compared with the exploit database, the proposed method provides enriched supplementary information and earlier intelligence on the appearance of open exploit codes on the Internet.
... These open-source projects share code and allow community members to create and build together. Open strategies can yield higher returns (Barge-Gil 2013), while modern software implementations rely entirely on open-source libraries and components (Shrestha et al. 2020). Community plays a unique role in creating new things (Soos and Leazer 2020); it can be seen that the open-source community is the leading practice frontier of open innovation. ...
Preprint
Full-text available
COVID-19 has had a profound impact on the lives of all human beings. Emerging technologies have made significant contributions to the fight against the pandemic. An extensive review of the application of technology will help facilitate future research and technology development to provide better solutions for future pandemics. In contrast to the extensive surveys of academic communities that have already been conducted, this study explores the IT community of practice. Using GitHub as the study target, we analyzed the main functionalities of the projects submitted during the pandemic. This study examines trends in projects with different functionalities and the relationship between functionalities and technologies. The study results show an imbalance in the number of projects with varying functionalities in the GitHub community, i.e., applications account for more than half of the projects. In contrast, other data analysis and AI projects account for a smaller share. This differs significantly from the survey of the academic community, where the findings focus more on cutting-edge technologies while projects in the community of practice use more mature technologies. The spontaneous behavior of developers may lack organization and make it challenging to target needs.
... Version control systems like GitHub 23 provide details about how developers addressed past SVs in real-world projects. Shrestha et al. [165] found developers sometimes discuss/disclose SV-related information on GitHub discussions even before the studied social media such as Twitter or Reddit. These findings show the potential of using GitHub discussions to complement the current sources for earlier SV assessment and prioritization. ...
Preprint
Full-text available
Software Vulnerabilities (SVs) are increasing in complexity and scale, posing great security risks to many software systems. Given the limited resources in practice, SV assessment and prioritization help practitioners devise optimal SV mitigation plans based on various SV characteristics. The surge in SV data sources and data-driven techniques such as Machine Learning and Deep Learning have taken SV assessment and prioritization to the next level. Our survey provides a taxonomy of the past research efforts and highlights the best practices for data-driven SV assessment and prioritization. We also discuss the current limitations and propose potential solutions to address such issues.
... research shows the NVD is not always updated in a timely manner, and that information about vulnerabilities is often openly discussed on social media platforms months before it is published by the NVD [16,17,1]. These social media platforms, such as Twitter, Reddit, and GitHub, have become prolific sources of information due to the wide access and reach of these mediums [19,7]. These platforms can be leveraged as open source threat intelligence (OSINT) for use by cyber operators. ...
Preprint
Full-text available
Many cyber network defense tools rely on the National Vulnerability Database (NVD) to provide timely information on known vulnerabilities that exist within systems on a given network. However, recent studies have indicated that the NVD is not always up to date, with known vulnerabilities being discussed publicly on social media platforms, like Twitter and Reddit, months before they are published to the NVD. To that end, we present a framework for unsupervised classification to filter tweets for relevance to cyber security. We consider and evaluate two unsupervised machine learning techniques for inclusion in our framework, and show that zero-shot classification using a Bidirectional and Auto-Regressive Transformers (BART) model outperforms the other technique with 83.52% accuracy and an F1 score of 83.88, allowing for accurate filtering of tweets without human intervention or labelled data for training. Additionally, we discuss different insights that can be derived from these cyber-relevant tweets, such as trending topics of tweets and the counts of Twitter mentions for Common Vulnerabilities and Exposures (CVEs), that can be used in an alert or report to augment current NVD-based risk assessment tools.