Fig 2 - available via license: CC BY
The overlap of software vulnerabilities discussed across the multiple social platforms. https://doi.org/10.1371/journal.pone.0230250.g002

Source publication
Article
Full-text available
The awareness about software vulnerabilities is crucial to ensure effective cybersecurity practices, the development of high-quality software, and, ultimately, national security. This awareness can be better understood by studying the spread, structure and evolution of software vulnerability discussions across online communities. This work is the f...

Context in source publication

Context 1
... number of different vulnerabilities discussed in the three platforms is not uniform. Fig 2 shows the number of vulnerabilities for each platform and the overlap between platforms. GitHub has the highest variety of CVE IDs being discussed (12,928) and this makes sense since GitHub is a platform geared towards software development. ...
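The per-platform counts and overlaps in Fig 2 amount to set operations over CVE identifiers. A minimal sketch of that computation, assuming each platform's discussions have already been reduced to a set of CVE IDs (the sample sets below are hypothetical stand-ins for the real data):

```python
# Compute per-platform CVE-ID variety and the overlaps between platforms,
# as in Fig 2. The CVE sets here are hypothetical examples.
from itertools import combinations

platform_cves = {
    "GitHub":  {"CVE-2017-0144", "CVE-2014-0160", "CVE-2019-0708"},
    "Twitter": {"CVE-2014-0160", "CVE-2019-0708"},
    "Reddit":  {"CVE-2014-0160", "CVE-2021-44228"},
}

# Variety of CVE IDs discussed on each platform
counts = {name: len(cves) for name, cves in platform_cves.items()}

# Pairwise overlap between platforms
overlaps = {
    (a, b): len(platform_cves[a] & platform_cves[b])
    for a, b in combinations(platform_cves, 2)
}

# CVE IDs discussed on all three platforms
shared_all = set.intersection(*platform_cves.values())
```

With the real data, `counts["GitHub"]` would be the 12,928 CVE IDs reported in the source publication.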

Citations

... As a result, software faults that can be exploited maliciously are categorized as vulnerabilities. Because vulnerabilities are often overlooked by users and programmers during regular system operation, they require a fundamentally different identification strategy than flaws, which can be found more easily and naturally [5]. Compared with ordinary faults, these properties make vulnerabilities considerably more difficult to tackle. ...
Article
Full-text available
The protection of industrial control systems and related network protocols depends on vulnerability mining. Existing vulnerability mining strategies suffer from inadequate receiving efficiency and insufficient mining capacity. This study therefore analyzes network protocol vulnerability mining using fuzz testing combined with deep learning (DL). Modbus TCP is employed as the target network protocol for vulnerability mining. This paper presents a unique threshold-sample-driven deep neural network (T-s DNN) framework. Based on the T-s DNN, we construct a fuzzing framework (T-s DNN Fuzzer) for the Modbus TCP protocol. The DNN is first trained to understand the meaning of the protocol's data unit. The likelihood distribution of every value in the information is quantified using the softmax mechanism. The technique then compares the highest likelihood against a random-variable threshold to decide whether to replace the existing information value with the most likely one. The MBAP header is completed according to the protocol standard. Fuzz tests demonstrate that, in addition to increasing sample receipt levels and exploitability, the fuzzer can identify protocol vulnerabilities rapidly. Experiments with the T-s DNN fuzzer demonstrate that it detects industrial control protocol vulnerabilities more effectively while also increasing test case reception scores and exploitability.
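The mutation step of the T-s DNN fuzzer can be sketched as a softmax-then-threshold decision: the model scores candidate values for a protocol field, and the most likely candidate replaces the current value only if its probability clears a threshold. This is a simplified sketch; the scores, candidate values, and threshold below are hypothetical, and the real framework derives the scores from a DNN trained on the protocol's data unit.

```python
# Sketch of a threshold-sample-driven mutation decision: softmax over
# model scores, replace the field value only when the top probability
# exceeds a threshold. All inputs here are hypothetical examples.
import math

def softmax(scores):
    # Numerically stable softmax over a list of raw scores
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mutate_field(current_value, candidates, scores, threshold=0.5):
    """Replace current_value with the most likely candidate if its
    softmax probability exceeds the threshold; otherwise keep it."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] > threshold:
        return candidates[best]
    return current_value
```

When all candidates score equally, no probability clears the threshold and the existing value is kept, which preserves randomness in the generated test cases.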
... Through the integration of data from social network platforms, researchers have previously offered important information by studying user interactions with security issues/vulnerabilities [30] and by associating conversations with specific weaknesses [31,32]. Regarding user awareness and system protection, prior studies make use of social network data to produce user alerts [10,16,33], assess exploitability indicators [15,18], and evaluate threat evidence [19] using machine learning algorithms, statistical analysis, networks, and text mining approaches. Moreover, other studies aim to protect social network users and software applications as well as prevent potential incidents by detecting malicious behavior [34][35][36]. ...
Article
Full-text available
Ensuring the privacy and safety of platform users has become a complex objective due to the emerging threats that surround any type of network, software, and hardware. Scams, malware, hackers, and security vulnerabilities form the epicenter of cyber threats, causing severe damage to the affected systems and sensitive data of users. Thus, users turn to online social networks to report cyber threats, discuss topics of their interest, and obtain knowledge concerning the various perspectives of information security. In this study, we aim to address the concepts of social interactions surrounding information security-related content by retrieving and analyzing Reddit posts from 45 relevant subreddits. In this regard, a word clustering approach is employed, based on the Affinity Propagation algorithm, that leads to the extraction and interpretation of 54 concepts. These concepts are relevant to information security and some more generic areas of interest including social media, software vendors, and labor. Furthermore, to provide a more comprehensive overview of users’ activity in the different Reddit communities/subreddits, a knowledge map associating subreddits and concepts based on their conceptual similarities is also established. The analysis shows that the descriptions of the examined subreddits are strongly related to their underlying concepts. At the same time, the outcomes also assess the conceptual associations between the different subreddits, offering knowledge related to similar and distant communities. Ultimately, two post metrics are utilized to explore how the concepts may impact user interactions. This allows us to differentiate between concepts associated with posts typically endorsed by communities, resulting in increased information exchange (via comments), or contributing as news/announcements.
Overall, the findings of this study can be used as a knowledge basis in determining user interests, opinions, perspectives, and responsiveness, when it comes to cyber threats, attacks, and malicious activities. Also, the respective outcomes can contribute as a guide for identifying similar communities/subreddits and themes. Regarding the methodological contributions of this study, the proposed framework can be adapted to similar datasets and research goals as it does not depend on the special characteristics of the imported data, offering, in turn, a practical approach for future research.
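The clustering approach described above can be sketched with scikit-learn's Affinity Propagation, which picks exemplar items without a preset number of clusters. The toy 2-D "embeddings" below are hypothetical stand-ins for real word vectors, and the word list is illustrative only:

```python
# Sketch: grouping security-related words with Affinity Propagation.
# The embeddings are hypothetical 2-D vectors; real pipelines would use
# learned word vectors of much higher dimensionality.
import numpy as np
from sklearn.cluster import AffinityPropagation

words = ["malware", "virus", "trojan", "password", "login", "credential"]
embeddings = np.array([
    [0.90, 0.10], [0.85, 0.15], [0.80, 0.20],   # malware-like words
    [0.10, 0.90], [0.15, 0.85], [0.20, 0.80],   # authentication-like words
])

model = AffinityPropagation(random_state=0).fit(embeddings)

# Group words by cluster label; each cluster's exemplar word can serve
# as the interpretable "concept" for that group
clusters = {}
for word, label in zip(words, model.labels_):
    clusters.setdefault(label, []).append(word)
```

Because Affinity Propagation infers the number of clusters from the data (via the `preference` parameter), it suits exploratory concept extraction where the concept count is unknown in advance.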
... We complement prior survey-based research by analyzing large-scale data from two popular platforms: Twitter and Reddit. These platforms offer diverse perspectives on current sociotechnical issues and have been used in numerous studies on privacy and security (e.g., [7,64,65,78,98,101]). From Twitter and relevant subreddits (e.g., /r/teachers, /r/students, and /r/edtechhelp), we collected 11M tweets and 0.5M Reddit posts that contained EdTech-related keywords; these posts were made between January 2008 and February 2022. ...
... Twitter provides an open platform for researchers, academics, and practitioners to discuss various aspects of research and teaching [57], and Reddit offers role-specific forums for different EdTech stakeholders. These platforms were instrumental in acquiring threat intelligence and understanding public attitudes toward security/privacy issues at scale (e.g., [64,78,98]). Furthermore, non-security experts also participate online in such discussions [114], and critically, large-scale public discussions have led to desirable outcomes such as enhancing product security [62] and banning invasive remote proctoring apps at educational institutes [23]. ...
... In this paper, we take a quantitative approach and analyze largescale data from Twitter and Reddit (Section 5) to identify privacy and security concerns regarding EdTech. This approach has been adopted in numerous prior research investigating such concerns in different domains (e.g., [56,64,78,90,93,98,125]). While interviewbased studies allow one to identify rich and nuanced content, collecting data from a large sample is not usually possible in those settings. ...
Article
Full-text available
A clear and well-documented LaTeX document is presented as an article formatted for publication by ACM in a conference proceedings or journal publication. Based on the "acmart" document class, this article presents and explains many of the common variations, as well as many of the formatting elements an author may use in the preparation of the documentation of their work.
... Other studies have come to similar conclusions that although news shared on a "news aggregator website" featuring images positively affects the readership, images negatively affect the number of comments an article receives due to their distracting character [106]. Since Reddit is a social media website known for encouraging the posting of comments and the discussion amongst users [107], it operates entirely differently than news aggregator websites. Thus, we consider our results as complementary to previous research. ...
Article
Full-text available
Portrayals of violence are common in contemporary media reporting; they attract public attention and influence the reader's opinion. In the particular context of a social movement such as Black Lives Matter (BLM), the portrayal of violence in news coverage attracts public attention and can affect the movement's development, support, and public perception. Research on the relationship between digital news content featuring violence and user attention on social media has been scarce. This paper analyzes the relationship between violence in online reporting on BLM and its effect on user attention on the social media platform Reddit. The analysis focuses on the portrayal of violence in images used in BLM-related digital news coverage shared on Reddit. The dataset is comprised of 5,873 news articles with images. The classification of violent images is based on a VGG19 convolutional neural network (CNN) trained on a comprehensive dataset. The results suggest that it is not the display of violence in images that significantly affects user attention in digital news content; rather, it is negative article titles, the news outlet's political leaning and level of factual reporting, and platform affordances. Thus, this paper adds to the understanding of user attention distributions online and paves the way for future research in this field.
... Developer Communication that Impacts Contributions. The first area concerns how discussions between developers can affect contributions. Developer discussions have been widely explored on GitHub (e.g., issues and PRs) (Bosu and Carver 2014; Tsay et al. 2014), issue tracking systems (Correa and Sureka 2013; Bertram et al. 2010), StackOverflow (Barua et al. 2014), Twitter (Bougie et al. 2011), and Reddit (Iqbal et al. 2021; Shrestha et al. 2020). As discussed by Tsay et al. (2014), developer discussion can encourage further contributions to OSS projects. ...
Article
Full-text available
Open Source Software (OSS) projects rely on a continuous stream of new contributors for their livelihood. Recent studies reported that new contributors experience many barriers in their first contribution, with the social barrier being critical. Although a number of studies investigated the social barriers to new contributors, we hypothesize that negative first responses may cause an unpleasant feeling and subsequently lead to the discontinuation of future contributions. We execute protocols of a registered report to analyze 2,765,917 first contributions as Pull Requests (PRs) with 642,841 first responses. We characterize most first responses as positive but less responsive, exhibiting sentiments of fear, joy, and love. Results also indicate that negative first responses are literally intended to be either constructive (50.71%) or criticizing (37.68%) in nature. Running different machine learning models, we find that performance in predicting future interactions is low (F1 score of 0.6171), though better than the baselines. Furthermore, an analysis of these models shows that interactions are positively correlated with a future contribution, with other dimensions (i.e., project, contributor, contribution) having a large effect.
... Similar concerns have arisen with regard to recruitment through Twitter, GitHub, and Reddit [8]. ...
Article
This study was a multistage process of recruiting participants through Reddit with the intent of increasing data integrity when facing an infiltration of Internet bots. Approaches to increase data integrity centered around preventing the occurrence of Internet bots from the onset and increasing the ability to identify Internet bot responses. We attempted to detect bots in a study focused on understanding social factors related to autism and suicide risk. Four recruitment rounds occurred through Reddit on mental health-related subreddits, with one post made on each subreddit per recruitment round. We found a high presence of bots in the initial rounds; indeed, using location data, one third of the total responses (33.4 percent; 118/353) came from just eight locations (i.e., 4.7 percent of all locations). The proportion of detected bots was significantly different across the rounds of recruitment (χ2 = 150.22, df = 3, p < 0.001). In round 4, language advertising compensation was removed from recruitment posts. This round had significantly lower proportions of detected bots compared with round 1 (χ2 = 33.01, df = 1, p < 0.001), round 2 (χ2 = 129.14, df = 1, p < 0.001), and round 3 (χ2 = 46.6, df = 1, p < 0.001). Through a multistage recruitment process, we were able to increase the integrity of our collected data, as determined by a low percentage of fraudulent responses. Only once we removed advertisement of compensation from recruitment posts did we see a significant decrease in the quantity and percentage of Internet bot responses. This multistage recruitment study provides valuable information regarding how to adapt when an online survey study is infiltrated with Internet bots.
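The round-over-round comparison above is a chi-square test of independence on detected-bot counts per recruitment round. A minimal sketch with SciPy, using hypothetical counts (the study reports only the test statistics, not the underlying table):

```python
# Sketch: chi-square test of whether bot proportion differs across
# recruitment rounds. The observed counts below are hypothetical.
from scipy.stats import chi2_contingency

# rows: recruitment rounds 1-4; columns: [bot responses, human responses]
observed = [
    [60, 40],   # round 1
    [80, 30],   # round 2
    [55, 45],   # round 3
    [5,  95],   # round 4: compensation language removed from posts
]

chi2, p, dof, expected = chi2_contingency(observed)
# dof = (rows - 1) * (cols - 1) = 3, matching the four-round comparison
```

A small p-value here indicates the bot proportion is not uniform across rounds, the same conclusion the study draws from its χ2 = 150.22, df = 3 result.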
... Previous work by Allodi et al. [22] focuses on multiple deficiencies in CVSS version 2 as a metric for predicting whether or not a vulnerability will be exploited in the wild, specifically because predicting the small fraction of vulnerabilities exploited in the wild is not one of the design goals of CVSS [23]. Its strong community structure enhances information diffusion. ...
Article
Full-text available
Malicious actors often utilize publicly available software vulnerabilities and exploit codes to attack vulnerable targets. Exploit codes are shared across several platforms, including exploit databases, hacker communities, and social media platforms. Public exploit code information is a type of cyber threat intelligence. It can help security experts analyze which vulnerabilities are available to malicious actors and need to be prioritized for patching. In this paper, we propose an intelligent framework to automatically extract public exploit code information from social media. Social media sites are capable of aggregating numerous cybersecurity-related items of information due to their timeliness and volume. First, we present a convolutional neural network classifier to identify tweets that disclose exploit codes in their content or in corresponding linked web pages, which achieved 0.989 AUC and a 0.939 F1-score. The model shows better prediction accuracy than the baseline approaches. Second, we present a Bert-BiLSTM-CRF entity recognition method to identify the target entity that may be affected by the exploit code. The Bert-BiLSTM-CRF model reached an F1-score of 0.959, which performed better than the 0.927 and 0.922 obtained by the same neural network using Word2vec and GloVe word embeddings, respectively. Finally, the experimental results show that, compared with the exploit database, the proposed method provides enriched supplementary information and earlier intelligence on the appearance of open exploit codes on the Internet.
... These open-source projects share code and allow community members to create and build together. Open strategies can yield higher returns (Barge-Gil 2013), while modern software implementations rely entirely on open-source libraries and components (Shrestha et al. 2020). Community plays a unique role in creating new things (Soos and Leazer 2020); it can be seen that the open-source community is the leading practice frontier of open innovation. ...
Preprint
Full-text available
COVID-19 has had a profound impact on the lives of all human beings. Emerging technologies have made significant contributions to the fight against the pandemic. An extensive review of the application of technology will help facilitate future research and technology development to provide better solutions for future pandemics. In contrast to the extensive surveys of academic communities that have already been conducted, this study explores the IT community of practice. Using GitHub as the study target, we analyzed the main functionalities of the projects submitted during the pandemic. This study examines trends in projects with different functionalities and the relationship between functionalities and technologies. The study results show an imbalance in the number of projects with varying functionalities in the GitHub community, i.e., applications account for more than half of the projects. In contrast, other data analysis and AI projects account for a smaller share. This differs significantly from the survey of the academic community, where the findings focus more on cutting-edge technologies while projects in the community of practice use more mature technologies. The spontaneous behavior of developers may lack organization and make it challenging to target needs.
... Version control systems like GitHub 23 provide details about how developers addressed past SVs in real-world projects. Shrestha et al. [165] found developers sometimes discuss/disclose SV-related information on GitHub discussions even before the studied social media such as Twitter or Reddit. These findings show the potential of using GitHub discussions to complement the current sources for earlier SV assessment and prioritization. ...
Preprint
Full-text available
Software Vulnerabilities (SVs) are increasing in complexity and scale, posing great security risks to many software systems. Given the limited resources in practice, SV assessment and prioritization help practitioners devise optimal SV mitigation plans based on various SV characteristics. The surge in SV data sources and data-driven techniques such as Machine Learning and Deep Learning have taken SV assessment and prioritization to the next level. Our survey provides a taxonomy of the past research efforts and highlights the best practices for data-driven SV assessment and prioritization. We also discuss the current limitations and propose potential solutions to address such issues.
... research shows the NVD is not always updated in a timely manner, and that information about vulnerabilities is often openly discussed on social media platforms months before it is published by the NVD [16,17,1]. These social media platforms, such as Twitter, Reddit, and GitHub, have become prolific sources of information due to the wide access and reach of these mediums [19,7]. These platforms can be leveraged as open source threat intelligence (OSINT) for use by cyber operators. ...
Preprint
Full-text available
Many cyber network defense tools rely on the National Vulnerability Database (NVD) to provide timely information on known vulnerabilities that exist within systems on a given network. However, recent studies have indicated that the NVD is not always up to date, with known vulnerabilities being discussed publicly on social media platforms, like Twitter and Reddit, months before they are published to the NVD. To that end, we present a framework for unsupervised classification to filter tweets for relevance to cyber security. We consider and evaluate two unsupervised machine learning techniques for inclusion in our framework, and show that zero-shot classification using a Bidirectional and Auto-Regressive Transformers (BART) model outperforms the other technique with 83.52% accuracy and an F1 score of 83.88, allowing for accurate filtering of tweets without human intervention or labelled data for training. Additionally, we discuss different insights that can be derived from these cyber-relevant tweets, such as trending topics of tweets and the counts of Twitter mentions for Common Vulnerabilities and Exposures (CVEs), that can be used in an alert or report to augment current NVD-based risk assessment tools.