Stefan Savage’s research while affiliated with University of California, San Diego and other places


Publications (200)


Figure 2: Proportion of sites that fully disallow any AI-related user agent, broken down by site rank
Figure 3: Proportion of top 100k sites that explicitly impose restrictions on AI-related crawlers in robots.txt over time. The vertical line indicates when the EU AI Act was released.
Figure 6: Example of a GPT app (WebG) that can retrieve information from the web through a third-party infrastructure (mixerbox.com). Upon clicking "Allow", this GPT app can retrieve information from the web using a service provided by mixerbox.com.
Figure 7: Flowchart for inferring the AI blocking setting on websites using Cloudflare.
Somesite I Used To Crawl: Awareness, Agency and Efficacy in Protecting Content Creators From AI Crawlers
  • Preprint

November 2024 · 14 Reads

Enze Liu · Elisa Luo · Shawn Shan · [...] · Stefan Savage

The success of generative AI relies heavily on training on data scraped through extensive crawling of the Internet, a practice that has raised significant copyright, privacy, and ethical concerns. While few measures are designed to resist a resource-rich adversary determined to scrape a site, crawlers can be impacted by a range of existing tools such as robots.txt, NoAI meta tags, and active crawler blocking by reverse proxies. In this work, we seek to understand the ability and efficacy of today's networking tools to protect content creators against AI-related crawling. For targeted populations like human artists, do they have the technical knowledge and agency to utilize crawler-blocking tools such as robots.txt, and can such tools be effective? Using large-scale measurements and a targeted user study of 182 professional artists, we find strong demand for tools like robots.txt, but that demand is constrained by significant hurdles: limited technical awareness, limited agency in deploying the tools, and limited efficacy against unresponsive crawlers. We further test and evaluate network-level crawler blocking by reverse proxies, and find that despite very limited deployment today, their reliable and comprehensive blocking of AI crawlers makes them the strongest protection for artists moving forward.
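As a rough illustration of the kind of robots.txt check the abstract refers to, the sketch below uses Python's standard `urllib.robotparser` to list which AI-related user agents a robots.txt file fully disallows. The agent list and the sample file are assumptions made for illustration; the preprint's actual user-agent list and measurement pipeline are not given on this page.

```python
from urllib.robotparser import RobotFileParser

# Illustrative (assumed) list of AI-related crawler user agents.
# The preprint's own list is not reproduced on this page.
AI_USER_AGENTS = ["GPTBot", "CCBot", "Google-Extended"]


def fully_disallowed_agents(robots_txt, agents=AI_USER_AGENTS):
    """Return the agents that may not fetch the site root ("/")."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return [ua for ua in agents if not parser.can_fetch(ua, "/")]


# Hypothetical robots.txt content, not taken from any real site.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    print(fully_disallowed_agents(SAMPLE))
```

A check like this only reports what the file asks for. As the abstract notes, robots.txt has limited efficacy against crawlers that do not respond to it.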



Figure 4: Total Inflow (ETH) - Total Outflow (SOL) Over Time: Wormhole Attack in Feb 2022.
Figure 5: The lifetime of the bridges in our retrospective study. Lines start with the bridge's first valid transaction and end with the last valid transaction in our data, corresponding to the bridge's closure or November 2023 if the bridge was still operating at the end of our data set. Diamonds indicate the dates of attack.
Figure 7: Announce-then-execute model for bridges.
Count of Monte Crypto: Accounting-based Defenses for Cross-Chain Bridges

October 2024 · 23 Reads

Between 2021 and 2023, crypto assets valued at over US$2.6 billion were stolen via attacks on "bridges" -- decentralized services designed to allow inter-blockchain exchange. While the individual exploits in each attack vary, a single design flaw underlies them all: the lack of end-to-end value accounting in cross-chain transactions. In this paper, we empirically analyze twenty million transactions used by key bridges during this period. We show that a simple invariant that balances cross-chain inflows and outflows is compatible with legitimate use, yet precisely identifies every known attack (and several likely attacks) in this data. Further, we show that this approach is not only sufficient for post-hoc audits, but can be implemented in-line in existing bridge designs to provide generic protection against a broad array of bridge vulnerabilities.
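The idea of a balance invariant can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not the paper's method: it assumes a single asset, a hypothetical `Transfer` record, and the simplest possible rule (a withdrawal may never exceed the value deposited so far and not yet withdrawn). The abstract does not specify the exact invariant or how values are normalized across chains.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Transfer:
    """Hypothetical record of one bridge transaction (names are assumptions)."""
    tx_id: str
    direction: str  # "in" = deposit into the bridge, "out" = withdrawal
    value: Decimal  # value in one common unit


def find_violations(transfers):
    """Return ids of withdrawals that exceed the running net inflow."""
    balance = Decimal("0")
    violations = []
    for t in transfers:
        if t.direction == "in":
            balance += t.value
        elif t.direction == "out":
            if t.value > balance:
                # Outflow not backed by prior inflow: flag it and leave the
                # balance unchanged, as if the withdrawal had been blocked.
                violations.append(t.tx_id)
            else:
                balance -= t.value
        else:
            raise ValueError(f"unknown direction: {t.direction!r}")
    return violations


if __name__ == "__main__":
    # Made-up ledger for illustration only.
    ledger = [
        Transfer("a", "in", Decimal("10")),
        Transfer("b", "out", Decimal("4")),
        Transfer("c", "out", Decimal("7")),  # only 6 remains, so flagged
        Transfer("d", "out", Decimal("6")),
    ]
    print(find_violations(ledger))
```

Run over a stored ledger, a check of this shape acts as a post-hoc audit. Applied before a withdrawal is released, the same comparison would correspond to the in-line use the abstract mentions.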



Figure 1: Example giveaway scam landing pages promoted via Twitter. Scammers impersonate popular personalities including Brad Garlinghouse (the Ripple CEO) and Elon Musk.
Figure 2: Example livestream containing a giveaway scam. The video playing is of Brad Garlinghouse and the scam website is linked to in both the chat and the embedded QR code.
Figure 5: Effectiveness of keywords
Give and Take: An End-To-End Investigation of Giveaway Scam Conversion Rates

May 2024 · 132 Reads

Scams -- fraudulent schemes designed to swindle money from victims -- have existed for as long as recorded history. However, the Internet's combination of low communication cost, global reach, and functional anonymity has allowed scam volumes to reach new heights. Designing effective interventions requires first understanding the context: how scammers reach potential victims, the earnings they make, and any potential bottlenecks for durable interventions. In this short paper, we focus on these questions in the context of cryptocurrency giveaway scams, where victims are tricked into irreversibly transferring funds to scammers under the pretense of even greater returns. Combining data from Twitter, YouTube and Twitch livestreams, landing pages, and cryptocurrency blockchains, we measure how giveaway scams operate at scale. We find that 1 in 1000 scam tweets, and 4 in 100,000 livestream views, net a victim, and that scammers managed to extract nearly $4.62 million from just hundreds of victims during our measurement window.






Forward Pass: On the Security Implications of Email Forwarding Mechanism and Policy

February 2023 · 314 Reads

The critical role played by email has led to a range of extension protocols (e.g., SPF, DKIM, DMARC) designed to protect against the spoofing of email sender domains. These protocols are complex as is, but are further complicated by automated email forwarding -- used by individual users to manage multiple accounts and by mailing lists to redistribute messages. In this paper, we explore how such email forwarding and its implementations can break the implicit assumptions in widely deployed anti-spoofing protocols. Using large-scale empirical measurements of 20 email forwarding services (16 leading email providers and four popular mailing list services), we identify a range of security issues rooted in forwarding behavior and show how they can be combined to reliably evade existing anti-spoofing controls. We show how this allows attackers to not only deliver spoofed email messages to prominent email providers (e.g., Gmail, Microsoft Outlook, and Zoho), but also reliably spoof email on behalf of tens of thousands of popular domains including sensitive domains used by organizations in government (e.g., state.gov), finance (e.g., transunion.com), law (e.g., perkinscoie.com) and news (e.g., washingtonpost.com) among others.


Citations (87)


... In 2023, Li et al. [10] developed CryptoScamTracker, a system for identifying cryptocurrency giveaway scams [61,62], where scammers deceive victims by promising to return a multiplied amount of cryptocurrency if they first send a small amount to a provided wallet, often posing as a donation event or falsely advertising giveaways endorsed by public figures. Comparing our results, we find that investment scams are more widespread, with approximately 8.5x more scam websites detected overall. ...

Reference:

The Poorest Man in Babylon: A Longitudinal Study of Cryptocurrency Investment Scams
Give and Take: An End-To-End Investigation of Giveaway Scam Conversion Rates
  • Citing Conference Paper
  • November 2024

... IP address geolocation plays a central role in networking and security: understanding the Internet's topology to improve last mile latency [1,9], filtering traffic to prevent cyber attacks [22], and identifying malicious activity to bring attackers to justice [14,19], all rely on geolocation. To geolocate an IP address, methodologies often rely upon a global, human-reported set of vantage points (i.e., probes) to measure from and verify accuracy against. ...

Using Honeybuckets to Characterize Cloud Storage Scanning in the Wild
  • Citing Conference Paper
  • July 2024


WaVe: a verifiably secure WebAssembly sandboxing runtime
  • Citing Conference Paper
  • May 2023

... While it is important to consider the impact of unique features, there has been limited research [7] investigating the possible misuses of Handshake, and the practical utilization of Handshake remains unclear due to its novelty. We pose the following research question: Can Handshake replace the current DNS in the future? ...

The Challenges of Blockchain-Based Naming Systems for Malware Defenders
  • Citing Conference Paper
  • November 2022

... The pandemic highlighted the importance of flexible yet robust data protection regulations to accommodate emergency public health measures while upholding strong privacy standards. It stressed maintaining a balance between public health security and individual privacy rights through transparent and respectful data handling practices (Campbell-Verduyn & Gstrein, 2024;Liu et al., 2023). During the pandemic, public compliance with health monitoring efforts heavily depended on trust, contingent upon transparency concerning data use, access, and protective measures (Houser & Bagby, 2023;Stalla-Bourdillon et al., 2020). ...

No Privacy Among Spies: Assessing the Functionality and Insecurity of Consumer Android Spyware Apps

Proceedings on Privacy Enhancing Technologies

... The effectiveness of such technologies has been investigated by the research community, most notably by means of web measurements [13,15,28]. Measurement studies perform large-scale analyses over the Web by automatically collecting data from popular websites and detecting interactions with web trackers to reason about privacy threats. ...

Measuring UID smuggling in the wild
  • Citing Conference Paper
  • October 2022

... Another set of work has examined web infrastructure within specific regions. For instance, Jonker et al. [34] describe Russian domain infrastructure. Fanou et al. [22,23] describe web infrastructure in Africa, including regional network infrastructure of major CDNs and trends of hosting abroad. ...

Where .ru?: assessing the impact of conflict on russian domain infrastructure
  • Citing Conference Paper
  • October 2022

... The issue of MAC address tracking and profiling underscores a broader debate about privacy in the digital age, highlighting the need for stronger regulatory frameworks and technology solutions to protect individuals from invasive tracking practices and ensure their right to privacy in increasingly connected environments. MAC address anonymization techniques help to mitigate MAC address tracking and profiling [172]-[176], as in Table 7 below. ...

Measuring security practices
  • Citing Article
  • September 2022

Communications of the ACM

... Among a random sample of .com domains, primarily comprised of less popular domains, GoDaddy serves as the email service provider for 29% of them [11]. Practices implemented by large providers, such as shadow banning IP addresses purportedly to combat spam, may contribute to this concentration [8]. ...

Who's got your mail?: characterizing mail service provider usage
  • Citing Conference Paper
  • November 2021