Bradley Reaves’s research while affiliated with North Carolina State University and other places


Publications (49)


RiskHarvester: A Risk-based Tool to Prioritize Secret Removal Efforts in Software Artifacts
  • Preprint

February 2025

·

1 Read

·

Tanmay Pardeshi

·

Bradley Reaves

·

Laurie Williams

Since 2020, GitGuardian has been detecting checked-in hard-coded secrets in GitHub repositories. From 2020 to 2023, GitGuardian observed an upward annual trend and a four-fold increase in hard-coded secrets, with 12.8 million exposed in 2023. However, removing all the secrets from software artifacts is not feasible due to time constraints and technical challenges. Additionally, the security risks of the secrets are not equal: they protect assets ranging from obsolete databases to sensitive medical data. Thus, secret removal should be prioritized by security risk reduction, which existing secret detection tools do not support. The goal of this research is to aid software practitioners in prioritizing secret removal efforts through our security risk-based tool. We present RiskHarvester, a risk-based tool to compute a security risk score based on the value of the asset and ease of attack on a database. We calculated the value of the asset by identifying the sensitive data categories present in a database from the database keywords in the source code. We utilized data flow analysis, SQL, and ORM parsing to identify the database keywords. To calculate the ease of attack, we utilized passive network analysis to retrieve the database host information. To evaluate RiskHarvester, we curated RiskBench, a benchmark of 1,791 database secret-asset pairs with sensitive data categories and host information manually retrieved from 188 GitHub repositories. RiskHarvester demonstrates 95% precision and 90% recall in detecting database keywords for the value of the asset, and 96% precision and 94% recall in detecting valid hosts for ease of attack. Finally, we conducted a survey (52 respondents) to understand whether developers prioritize secret removal based on security risk score. We found that 86% of the developers prioritized the secrets for removal in descending order of security risk score.



Figure 5: Distribution of the top 15 categories of tracebacks of illegal robocalls used in PPoNE enforcement actions
Figure 6: Robocall Observatory's monthly call volume
Figure 7: Calls received by RRAPTOR signed with STIR/SHAKEN in 2024 (normalized by total calls per day)
5.1.2. Role of Honeypots in Characterizing Robocalls. Honeypots serve as valuable vantage points to study robocalls. We discuss and quantify the impact of adding interactivity to honeypots and how that can improve our visibility into the robocalling ecosystem.
Finding 4: Large-scale and long-running honeypots are reliable vantage points to measure broader robocalling phenomena. To study the extent to which honeypots can be used to measure broader phenomena within the robocalling landscape, we compare two phenomena inherent to the robocalling ecosystem: the fraction of unsolicited phone calls signed with STIR/SHAKEN and the distribution of languages used by robocalling campaigns. The fraction of unsolicited phone calls signed with STIR/SHAKEN is consistent across three distinct vantage points: RRAPTOR, Robocall Observatory, and independent industry reports. As seen in Figure 7 and Figure 8, the fraction of signed calls is about 80% across both honeypots as of May 2024, while following a similar trend throughout 2024. Furthermore, A-attested calls are the most common category, followed by C-attested calls and then B-attested calls. Finally, in both honeypots, the largest fraction of robocall campaigns use English. Spanish and Mandarin are the second and third most common languages, respectively. By comparing the signing rates and language distribution, we find that broader phenomena measured using Robocall Observatory and RRAPTOR as two different vantage points are consistent. This demonstrates that honeypots operating at the scale of Robocall Observatory and RRAPTOR are reliable sources of data for measuring broader robocalling phenomena. They can independently report and provide valuable insights into the robocalling ecosystem.
Honeypots can differ in their operational characteristics by being more or less interactive, as described in Section 2 [32]. However, developing, deploying, and operating an interactive honeypot (versus a non-interactive honeypot) is substantially more challenging. It requires well-timed interactive prompts, more computational resources, and a nontrivial design overhaul to detect and play audio in near-real time. Such fundamental changes to a honeypot's design are not justified if they yield only marginal benefits.
Figure 8: Calls received by Robocall Observatory signed with STIR/SHAKEN in 2024 (normalized by total daily calls)
Figure 13: Heatmap of Caller ID Unique To Specific Feeds Relative to Other Feeds
Characterizing Robocalls with Multiple Vantage Points
  • Preprint
  • File available

October 2024

·

119 Reads

Telephone spam has been among the highest network security concerns for users for many years. In response, industry and government have deployed new technologies and regulations to curb the problem, and academic and industry researchers have provided methods and measurements to characterize robocalls. Have these efforts borne fruit? Are the research characterizations reliable, and have the prevention and deterrence mechanisms succeeded? In this paper, we address these questions through analysis of data from several independently-operated vantage points, ranging from industry and academic voice honeypots to public enforcement and consumer complaints, some with over 5 years of historic data. We first describe how we address the non-trivial methodological challenges of comparing disparate data sources, including comparing audio and transcripts from about 3 million voice calls. We also detail the substantial coherency of these diverse perspectives, which dramatically strengthens the evidence for the conclusions we draw about robocall characterization and mitigation while highlighting advantages of each approach. Among our many findings, we find that unsolicited calls are in slow decline, though complaints and call volumes remain high. We also find that robocallers have managed to adapt to STIR/SHAKEN, a mandatory call authentication scheme. In total, our findings highlight the most promising directions for future efforts to characterize and stop telephone spam.



Figure 8: Private Traceback functionality
Minimum system requirements for each component in Jäger, assuming the network processes 10,000 calls per second.
Performance for protocols measured in milliseconds.
Jäger: Automated Telephone Call Traceback

September 2024

·

17 Reads

Unsolicited telephone calls that facilitate fraud or unlawful telemarketing continue to overwhelm network users and the regulators who prosecute them. The first step in prosecuting phone abuse is traceback: identifying the call originator. This fundamental investigative task currently requires hours of manual effort per call. In this paper, we introduce Jäger, a distributed secure call traceback system. Jäger can trace a call in a few seconds, even with partial deployment, while cryptographically preserving the privacy of call parties, carrier trade secrets like peers and call volume, and limiting the threat of bulk analysis. We establish definitions and requirements of secure traceback, then develop a suite of protocols that meet these requirements using witness encryption, oblivious pseudorandom functions, and group signatures. We prove these protocols secure in the universal composability framework. We then demonstrate that Jäger has low compute and bandwidth costs per call, and these costs scale linearly with call volume. Jäger provides an efficient, secure, privacy-preserving system to revolutionize telephone abuse investigation with minimal costs to operators.





A Comparative Study of Software Secrets Reporting by Secret Detection Tools

July 2023

·

215 Reads

·

2 Citations

Background: According to GitGuardian's monitoring of public GitHub repositories, secrets sprawl continued accelerating in 2022 by 67% compared to 2021, exposing over 10 million secrets (API keys and other credentials). Though many open-source and proprietary secret detection tools are available, these tools output many false positives, making it difficult for developers to take action and for teams to choose one tool out of many. To our knowledge, secret detection tools have not yet been compared and evaluated. Aims: The goal of our study is to aid developers in choosing a secret detection tool to reduce the exposure of secrets through an empirical investigation of existing secret detection tools. Method: We present an evaluation of five open-source and four proprietary tools against a benchmark dataset. Results: The top three tools based on precision are GitHub Secret Scanner (75%), Gitleaks (46%), and Commercial X (25%), and based on recall are Gitleaks (88%), SpectralOps (67%), and TruffleHog (52%). Our manual analysis of reported secrets reveals that false positives are due to employing generic regular expressions and ineffective entropy calculation. In contrast, false negatives are due to faulty regular expressions, skipping specific file types, and insufficient rulesets. Conclusions: We recommend developers choose tools based on secret types present in their projects to prevent missing secrets. In addition, we recommend tool vendors update detection rules periodically and correctly employ secret verification mechanisms by collaborating with API vendors to improve accuracy.




Citations (38)


... Contemporary and subsequent research efforts provided alternatives that modified existing legacy protocols [7], [8] or used in-band [9] or out-of-band [10], [11], [12] protocols to provide cryptographic mutual call authentication. Other efforts have explored preventing spoofing [13], [14], fingerprinting call audio or data features [15], [16], automating tracebacks [17], intercepting calls with voice assistants [18], [19], or detecting or preemptively blocking calls based solely on metadata or calling patterns [20], [21], [22], [23], [24], [25], [26]. Finally, researchers have studied the vulnerability of humans to phone scams [27], [28] and how they could be helped [29], [30]. ...

Reference:

Characterizing Robocalls with Multiple Vantage Points
Jäger: Automated Telephone Call Traceback
  • Citing Conference Paper
  • December 2024

... In the current practice of 5G, there is no authentication mechanism for the user equipment to authenticate or verify the integrity of the control communications. In fact, the current 5G networking implementation and practice does not involve or utilize the base station's public key, which enables cryptographic mechanisms, although there have been recent proposals in research [3], [4], [6], [7] and standardization/development [5] to introduce and use the base station's public key. In contrast to the current practice, our scheme builds the base station's credentials (binding the ID and location to the public key) and authenticates the base station. ...

Fixing Insecure Cellular System Information Broadcasts For Good
  • Citing Conference Paper
  • September 2024

... Smishing campaigns traditionally focus on the volume of messages with low quality, diversity, and personalization in individual messages [35]. Evidently, many studies have demonstrated excellent binary classification performance on smish classification tasks [20,27,36]. ...

On SMS Phishing Tactics and Infrastructure
  • Citing Conference Paper
  • May 2024

... Our pre-order breadth-first-traversal is similar to Mir et al. [20]'s analysis of reachable vulnerable call chains. Reachability analysis is a program analysis concept to determine whether functions containing vulnerable code are called within dependencies [21,22]. After this reachable method identification step, we can find the reachable functions and sensitive APIs from the vulnerable functions (RQ1). ...

Pairing Security Advisories with Vulnerable Functions Using Open-Source LLMs
  • Citing Chapter
  • July 2024

... One critical task in vulnerability management is tracing the commits for fixing a vulnerability [14,30,45,48,49,54,62,64]. By locating the patch, security stakeholders can more accurately determine the affected version [6], identify affected software components [9,10,21,34], and improve the severity assessment [29]. ...

VFCFinder: Pairing Security Advisories and Patches
  • Citing Conference Paper
  • July 2024

... Step 4. We used well-known automated static analysis tools suggested by OWASP [54] to scan the collected code snippets. The reason why we chose a static rather than dynamic approach is that the static analysis has higher coverage and is able to analyze programs without executing them [16]. While dynamic analysis offers more precise insights by reasoning about program behavior, it suffers from limited coverage and can be expensive [19,72,73]. ...

Finding Fixed Vulnerabilities with Off-the-Shelf Static Analysis
  • Citing Conference Paper
  • July 2023

... For community tools, we used the six community tools selected by Setu et al. [26] as they conducted a thorough selection process to perform an analysis of checked-in secret detection tools on the open-source repositories. ...

A Comparative Study of Software Secrets Reporting by Secret Detection Tools

... Network access control. Several approaches have been proposed for enforcing access control in organizational networks [6,7,10,13,24,33,46]. Nayak et al. [33] proposed Resonance, a framework for the specification and enforcement of access control policies. ...

MSNetViews: Geographically Distributed Management of Enterprise Network Security Policy
  • Citing Conference Paper
  • May 2023

... A complete validation of a checked-in secret requires us to access the services associated with the checked-in secret, which would raise ethical concerns. As a result, we decide whether a checked-in secret is valid mainly by manually analyzing its format and contextual surroundings, which is a commonly adopted technique for checked-in secret related research [26,27]. For example, in Listing 2, the presence of some functions from Google Cloud Library is an indicator of checked-in secret. ...

SecretBench: A Dataset of Software Secrets

... We propose conducting more research on how to improve privacy-compliant implementation. While Privacy by Design [63] was proposed to be a guiding framework, more research is needed with tools suggesting to support developers with privacy-compliant implementation [64]-[70]. ...

Actions Speak Louder than Words: Entity-Sensitive Privacy Policy and Data Flow Analysis with POLICHECK