Kyle Soska’s research while affiliated with University of Illinois Urbana-Champaign and other places


Publications (14)


[Figures and tables from the preprint below: Fig. 2, attack mechanism; Fig. 4, number of poisoning transfers from each attack group over time (weekly); attack summary statistics; successful phishing attacks; attack group general statistics (notation follows Fig. 2; the number of distinct contracts is in parentheses).]

Blockchain Address Poisoning
  • Preprint

January 2025 · 30 Reads

Taro Tsuchiya · Jin-Dong Dong · Kyle Soska

In many blockchains, such as Ethereum and Binance Smart Chain (BSC), wallet addresses are primarily represented as hard-to-remember 40-digit hexadecimal strings. As a result, users often select addresses from their recent transaction history, which enables blockchain address poisoning. The adversary first generates lookalike addresses similar to one with which the victim has previously interacted, and then engages with the victim to "poison" their transaction history. The goal is to have the victim mistakenly send tokens to the lookalike address instead of the intended recipient. Compared to contemporary studies, this paper makes four notable contributions. First, we develop a detection system and perform measurements over two years on Ethereum and BSC. We identify 13 times as many attack attempts as previously reported, totaling 270M on-chain attacks targeting 17M victims. In total, 6,633 incidents have caused at least 83.8M USD in losses, which makes blockchain address poisoning one of the largest cryptocurrency phishing schemes observed in the wild. Second, we analyze a few large attack entities using improved clustering techniques, and model attacker profitability and competition. Third, we reveal attack strategies: targeted populations, success conditions (address similarity, timing), and cross-chain attacks. Fourth, we mathematically define and simulate the lookalike address-generation process across various software- and hardware-based implementations, and identify a large-scale attacker group that appears to use GPUs. We also discuss defensive countermeasures.
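To make the idea of a "lookalike" address concrete, here is a minimal sketch, assuming that similarity is measured as the number of matching leading and trailing hex digits and that generated addresses have uniformly random, independent digits. Under that assumption, matching k fixed digits takes 16^k random keypairs on average (the mean of a geometric distribution). The function names and the `min_prefix`/`min_suffix` thresholds are hypothetical illustrations, not the paper's detection system or its formal definition of the generation process.

```python
def hex_body(addr: str) -> str:
    """Normalize an address: drop an optional 0x prefix and lowercase the hex digits."""
    a = addr.lower()
    return a[2:] if a.startswith("0x") else a


def match_lengths(target: str, candidate: str) -> tuple[int, int]:
    """Count matching leading and trailing hex digits (never overlapping)."""
    t, c = hex_body(target), hex_body(candidate)
    limit = min(len(t), len(c))
    prefix = 0
    while prefix < limit and t[prefix] == c[prefix]:
        prefix += 1
    suffix = 0
    # Stop before the suffix would reuse digits already counted in the prefix.
    while suffix < limit - prefix and t[-1 - suffix] == c[-1 - suffix]:
        suffix += 1
    return prefix, suffix


def expected_attempts(prefix: int, suffix: int) -> int:
    """Expected random keypairs to hit a fixed prefix + suffix pattern,
    assuming each hex digit is uniform and independent."""
    return 16 ** (prefix + suffix)


def is_lookalike(target: str, candidate: str,
                 min_prefix: int = 3, min_suffix: int = 4) -> bool:
    """Hypothetical rule: a distinct address sharing enough leading and trailing digits."""
    if hex_body(target) == hex_body(candidate):
        return False
    prefix, suffix = match_lengths(target, candidate)
    return prefix >= min_prefix and suffix >= min_suffix


if __name__ == "__main__":
    target = "0x" + "a1b2" + "0" * 32 + "c3d4"
    candidate = "0x" + "a1b2" + "f" * 32 + "c3d4"
    print(match_lengths(target, candidate))  # (4, 4)
    print(expected_attempts(4, 4))           # 4294967296
    print(is_lookalike(target, candidate))   # True
```

Each additional matched digit multiplies the expected work by 16. This is why the number of digits an attacker can afford to match depends on the hardware used, consistent with the abstract's observation that one large attacker group appears to use GPUs.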


Anatomy of a Digital Bubble: Lessons Learned from the NFT and Metaverse Frenzy

January 2025 · 20 Reads

Daisuke Kawai · Kyle Soska · Bryan Routledge · [...]

In the past few years, "metaverse" and "non-fungible tokens (NFT)" have become buzzwords, and the prices of related assets have shown speculative bubble-like behavior. In this paper, we attempt to better understand the underlying economic dynamics. To do so, we look at Decentraland, a virtual world platform where land parcels are sold as NFT collections. We find that initially, land prices followed traditional real estate pricing models (in particular, value decreased with distance from the most desirable areas), suggesting Decentraland behaved much like a virtual city. However, these real estate pricing models stopped applying when both the metaverse and NFTs gained increased popular attention and enthusiasm in 2021, suggesting a new driving force for the underlying asset prices. At that time, following a substantial rise in NFT market values, short-term holders of multiple parcels began to take major selling positions in the Decentraland market, which hints that, rather than building a metaverse community, early Decentraland investors preferred to cash out when land valuations became overly inflated. Our analysis also shows that while the majority of buyers are new entrants to the market (many of whom joined during the bubble), liquidity (i.e., parcels) was mostly provided by early adopters selling, which caused stark differences in monetary gains. Early adopters made money (more than 10,000 USD on average per parcel sold), but users who joined later typically made no profit or even incurred losses on the order of 1,000 USD per parcel. Unlike established markets such as financial and real estate markets, newly emergent digital marketplaces are mostly self-regulated. As a result, the significant financial risks we identify indicate a strong need for establishing appropriate standards of business conduct and improving user awareness.
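The distance-decay pattern described above can be illustrated with a minimal sketch of one common real estate pricing specification, assumed here for illustration: a semi-log regression log(price) = a + b * distance, fitted by ordinary least squares. The paper's actual model, covariates, and choice of "desirable area" are not given in this abstract, so `fit_distance_decay`, `grid_distance`, and the center coordinate are hypothetical.

```python
import math


def grid_distance(parcel: tuple[int, int], center: tuple[int, int]) -> float:
    """Euclidean distance between two parcel coordinates on the land grid."""
    return math.dist(parcel, center)


def fit_distance_decay(distances: list[float], prices: list[float]) -> tuple[float, float]:
    """Ordinary least squares fit of log(price) = a + b * distance.

    Returns (a, b); b < 0 means prices fall as distance from the center grows.
    """
    if len(distances) != len(prices) or len(distances) < 2:
        raise ValueError("need at least two (distance, price) pairs")
    ys = [math.log(p) for p in prices]
    n = len(distances)
    mean_x = sum(distances) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b
```

Fitting the same specification separately on early sales and on 2021 sales, then comparing the slope `b` and the goodness of fit, is one simple way to probe whether distance still explains prices. This is a sketch, not the paper's procedure.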






Adversarial Matching of Dark Net Market Vendor Accounts

July 2019 · 122 Reads · 28 Citations

Many datasets feature seemingly disparate entries that actually refer to the same entity. Reconciling these entries, or "matching," is challenging, especially in situations where there are errors in the data. In certain contexts, the situation is even more complicated: an active adversary may have a vested interest in having the matching process fail. By leveraging eight years of data, we investigate one such adversarial context: matching different online anonymous marketplace vendor handles to unique sellers. Using a combination of random forest classifiers and hierarchical clustering on a set of features that would be hard for an adversary to forge or mimic, we obtain reasonable performance (over 75% precision and recall on labels generated using heuristics), despite generally lacking any ground truth for training. Our algorithm performs particularly well for the top 30% of accounts by sales volume, and suggests that 22,163 accounts with at least one confirmed sale map to 15,652 distinct sellers, of which 12,155 operate only one account and the remainder operate between 2 and 11 accounts. Case study analysis further confirms that our algorithm identifies non-trivial matches, as well as impersonation attempts.
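The clustering step can be sketched as follows, assuming a classifier has already produced a match probability for each pair of vendor accounts. This sketch uses single-linkage agglomerative clustering cut at a probability threshold. That is equivalent to taking connected components of the graph whose edges have probability at or above the threshold, and it is computed here with a union-find structure. The abstract does not specify the linkage criterion or threshold, so `cluster_accounts`, `threshold=0.5`, and the account names in the usage example are hypothetical.

```python
def cluster_accounts(accounts: list[str],
                     match_prob: dict[tuple[str, str], float],
                     threshold: float = 0.5) -> list[list[str]]:
    """Single-linkage clustering: merge any two accounts whose pairwise
    match probability is at least `threshold`, transitively."""
    parent = {a: a for a in accounts}

    def find(a: str) -> str:
        # Walk to the root, halving the path as we go.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for (a, b), p in match_prob.items():
        if p >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b

    clusters: dict[str, list[str]] = {}
    for a in accounts:
        clusters.setdefault(find(a), []).append(a)
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])
```

For example, with hypothetical accounts `alpha_mkt1`, `alpha_mkt2`, `beta_mkt1`, and `gamma_mkt3`, probabilities of 0.9, 0.2, and 0.6 on three of the pairs yield two sellers. Single linkage is simple but can chain weakly related accounts together. An adversarial setting like this one may therefore call for a stricter linkage or a higher threshold.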


Plug and Prey? Measuring the Commoditization of Cybercrime via Online Anonymous Markets

August 2018 · 386 Reads · 87 Citations

Researchers have observed the increasing commoditization of cybercrime, that is, the offering of capabilities, services, and resources as commodities by specialized suppliers in the underground economy. Commoditization enables outsourcing, thus lowering entry barriers for aspiring criminals, and potentially driving further growth in cybercrime. While there is evidence in the literature of specific examples of cybercrime commoditization, the overall phenomenon is much less understood. Which parts of cybercrime value chains are successfully commoditized, and which are not? What kind of revenue do criminal business-to-business (B2B) services generate and how fast are they growing? We use longitudinal data from eight online anonymous marketplaces over six years, from the original Silk Road to AlphaBay, and track the evolution of commoditization on these markets. We develop a conceptual model of the value chain components for dominant criminal business models. We then identify the market supply for these components over time. We find evidence of commoditization in most components, but the outsourcing options are highly restricted and transaction volume is often modest. Cash-out services feature the most listings and generate the largest revenue. Consistent with behavior observed in the context of narcotic sales, we also find a significant amount of revenue in retail cybercrime, i.e., business-to-consumer (B2C) rather than business-to-business. We conservatively estimate the overall revenue for cybercrime commodities on online anonymous markets to be at least US $15M between 2011 and 2017. While there is growth, commoditization is a spottier phenomenon than previously assumed.
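Tracking revenue per value-chain component over time reduces to a grouped sum once each observed sale is labeled with its component. The minimal sketch below assumes each sale record is a `(component, year, price_usd)` tuple. In practice, marketplace studies often infer sales from buyer feedback, but that inference is an assumption here and not stated in this abstract. The component labels other than "cash-out" and both helper names are hypothetical.

```python
from collections import defaultdict


def revenue_by_component(sales):
    """Sum revenue per (value-chain component, year).

    `sales` is an iterable of (component, year, price_usd) tuples, one per observed sale.
    """
    totals = defaultdict(float)
    for component, year, price_usd in sales:
        totals[(component, year)] += price_usd
    return dict(totals)


def year_over_year_growth(totals, component, year):
    """Relative revenue change for `component` from `year - 1` to `year`."""
    previous = totals.get((component, year - 1), 0.0)
    current = totals.get((component, year), 0.0)
    if previous == 0.0:
        return None  # growth is undefined without a prior-year baseline
    return current / previous - 1.0
```

Summing per component answers "which parts of the value chain generate revenue". The growth helper addresses "how fast are they growing", though with modest volumes a single large vendor can dominate either figure.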


An Empirical Analysis of Traceability in the Monero Blockchain

June 2018 · 6,945 Reads · 249 Citations

Proceedings on Privacy Enhancing Technologies

Monero is a privacy-centric cryptocurrency that allows users to obscure their transactions by including chaff coins, called “mixins,” along with the actual coins they spend. In this paper, we empirically evaluate two weaknesses in Monero’s mixin sampling strategy. First, about 62% of transaction inputs with one or more mixins are vulnerable to “chain-reaction” analysis; that is, the real input can be deduced by elimination. Second, Monero mixins are sampled in such a way that they can be easily distinguished from the real coins by their age distribution; in short, the real input is usually the “newest” input. We estimate that this heuristic can be used to guess the real input with 80% accuracy over all transactions with one or more mixins. Next, we turn to the Monero ecosystem and study the contribution of mining pools and the former anonymous marketplace AlphaBay to transaction volume. We find that after removing mining pool activity, there remains a large amount of potentially privacy-sensitive transactions that are affected by these weaknesses. We propose and evaluate two countermeasures that can improve the privacy of future transactions.
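Both weaknesses can be sketched in a few lines. The model is deliberately simplified: each transaction input is a ring of candidate output identifiers (the real spend plus mixins), and each output can be spent at most once. Chain-reaction analysis then works by elimination. An input with a single remaining candidate reveals its real spend, and that output can be struck from every other ring, which may collapse further rings. The guess-newest heuristic simply picks the ring member created in the latest block. Real analysis works on blockchain data, and the identifiers and block heights in the usage examples are hypothetical.

```python
def chain_reaction(rings: dict[str, set[str]]) -> dict[str, str]:
    """Deduce real spends by elimination.

    `rings` maps each transaction input to its candidate outputs. Because an
    output can be spent at most once, a resolved spend is removed from every
    other unresolved ring; repeat until nothing changes.
    """
    candidates = {inp: set(ring) for inp, ring in rings.items()}
    resolved: dict[str, str] = {}
    changed = True
    while changed:
        changed = False
        for inp, cands in candidates.items():
            if inp in resolved or len(cands) != 1:
                continue
            spent = next(iter(cands))
            resolved[inp] = spent
            changed = True
            for other, other_cands in candidates.items():
                if other not in resolved:
                    other_cands.discard(spent)
    return resolved


def guess_newest(ring: list[tuple[str, int]]) -> str:
    """Guess-newest heuristic: pick the (output_id, block_height) member with the latest height."""
    return max(ring, key=lambda member: member[1])[0]
```

For instance, a zero-mixin input spending `o1` resolves immediately. That removes `o1` from a ring `{o1, o2}`, which in turn resolves to `o2`, and so on. This cascading effect is what makes inputs with mixins vulnerable.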


Automatic Application Identification from Billions of Files

August 2017 · 42 Reads · 4 Citations

Understanding how to group a set of binary files into the piece of software they belong to is highly desirable for software profiling, malware detection, or enterprise audits, among many other applications. Unfortunately, it is also extremely challenging: there is absolutely no uniformity in the ways different applications rely on different files, in how binaries are signed, or in the versioning schemes used across different pieces of software. In this paper, we show that, by combining information gleaned from a large number of endpoints (millions of computers), we can accomplish large-scale application identification automatically and reliably. Our approach relies on collecting metadata on billions of files every day, summarizing it into much smaller "sketches", and performing approximate k-nearest neighbor clustering on non-metric space representations derived from these sketches. We design and implement our proposed system using Apache Spark, show that it can process billions of files in a matter of hours, and thus could be used for daily processing. We further show that our system identifies which files belong to which application with very high precision and adequate recall.
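As an illustration of sketch-then-neighbors, here is a minimal MinHash example. It assumes each file is characterized by the set of endpoints that report it, so that files shipped together by one application co-occur on the same machines. The abstract does not say what the paper's sketches contain, so this feature choice, MinHash itself, and all function names are assumptions. The sketch also uses exact brute-force nearest neighbors over estimated Jaccard similarity. It does not implement the approximate k-NN or the non-metric representations the paper describes.

```python
import hashlib


def _hash(seed: int, item: str) -> int:
    """Deterministic 64-bit hash of `item` under hash function number `seed`."""
    digest = hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(items: set[str], num_hashes: int = 64) -> list[int]:
    """MinHash sketch of a non-empty set: the minimum hash per hash function."""
    if not items:
        raise ValueError("cannot sketch an empty set")
    return [min(_hash(seed, item) for item in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """The fraction of agreeing positions estimates the Jaccard similarity of the sets."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def k_nearest(query: str, signatures: dict[str, list[int]], k: int) -> list[str]:
    """Brute-force k nearest neighbors of `query` by estimated Jaccard similarity."""
    scored = [
        (estimate_jaccard(signatures[query], sig), name)
        for name, sig in signatures.items()
        if name != query
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:k]]
```

The point of the sketch is size. Each file's endpoint set, which may span millions of machines, is compressed to a fixed `num_hashes` integers, and similarity between any two files can then be estimated from those integers alone.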


Citations (9)


... Additionally, ZKP-based systems rely on trust in a single entity to generate and verify the proofs without tampering [4]. This centralisation of trust introduces a potential vulnerability, as a compromised proof generator could invalidate the entire process, undermining the privacy guarantees [15]. Furthermore, many of these solutions require a complete setup restart whenever computational changes are needed, reducing system flexibility and increasing costs [21]. ...

Reference:

ZK-DPPS: A Zero-Knowledge Decentralised Data Sharing and Processing Middleware
Ratel: MPC-extensions for Smart Contracts
  • Citing Conference Paper
  • July 2024

... Public blockchains were originally introduced to realize a secure and transparent ledger system without relying on trusted third parties [49]. However, this technology is now supporting a large volume of mostly speculative cryptocurrency trading [11,39,58]. Besides cryptocurrencies, blockchains are also used to support non-fungible tokens (NFTs) [48]. ...

Is your digital neighbor a reliable investment advisor?
  • Citing Conference Paper
  • April 2023

... Examples include phishing websites [12], giveaway scams [19], counterfeit tokens and rug pulls [8,43,46], market manipulation [11,18,45], exchange/marketplace scams [20,35], and Ponzi schemes [39]. The above attacks exploit 1) the absence of an intermediary [22], 2) pseudonymous payments for scammers or criminals [5,30], and 3) large price movements attracting inexperienced investors [15,31] and inviting mischief [17,36]. ...

Towards Understanding Cryptocurrency Derivatives: A Case Study of BitMEX
  • Citing Conference Paper
  • April 2021

... Extensive research efforts have demonstrated that both content and personal attributes are inherently sensitive data in terms of individual privacy [6]. Indeed, techniques leveraging user-generated content [7,8] or personal profile characteristics such as age, gender or usernames [9,10,11,12] have shown notable efficacy in detecting profiles corresponding to the same individual across different domains. Furthermore, even methodologies applied to anonymized domains have exhibited remarkable cross-domain identity matching capabilities, highlighting the privacy implications of granting access to the network of interactions [13,14,15,16,17] and to individual metadata, such as profile trajectories [18]. ...

Adversarial Matching of Dark Net Market Vendor Accounts
  • Citing Conference Paper
  • July 2019

... For instance, Christin [9] collected and analysed data for eight months (between late 2011 and 2012) for a longitudinal study of the most notorious dark web market at the time, Silk Road. Van Wegberg et al. [32] analysed no less than six years of longitudinal data from eight dark web markets. Both works find that the business models of these markets are maturing. ...

Plug and Prey? Measuring the Commoditization of Cybercrime via Online Anonymous Markets

... Since Monero hides the identity of the spending key in a transaction, one cannot partition the output set into spent and unspent outputs. While some transaction graph analysis techniques have been able to categorize a large percentage of non-RingCT outputs as spent [31,39], it is not possible to know if any of the RingCT outputs (except for 5 of them) have been spent [52,60]. Consequently, the set of unspent outputs in Monero keeps on growing. ...

An Empirical Analysis of Traceability in the Monero Blockchain

Proceedings on Privacy Enhancing Technologies

... Most hazard regression approaches are based on Cox's proportional hazards model $\lambda(t \mid x) = \lambda_0(t)\exp(w^\top x)$ (Cox 1972), including parametric models, and nonparametric models with the baseline hazard rate $\lambda_0(t)$ unspecified. In this paper, we present a nonlinear parametric hazard regression model inspired by (Liu et al. 2017). ...

Attributing hacks with survival trend filtering
  • Citing Article
  • January 2017

Electronic Journal of Statistics
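The Cox model quoted in the excerpt above can be checked numerically. The hazard is a baseline rate λ_0(t) scaled by exp(wᵀx). As a result, the ratio of the hazards of two subjects does not depend on t or on the baseline, which is what "proportional hazards" means and why the baseline can be left unspecified. The baseline λ_0(t) = 0.1·t and the weights in the test are hypothetical values chosen for illustration.

```python
import math
from typing import Callable, Sequence


def hazard(t: float, x: Sequence[float], w: Sequence[float],
           baseline: Callable[[float], float]) -> float:
    """lambda(t | x) = lambda_0(t) * exp(w^T x) under proportional hazards."""
    linear = sum(wi * xi for wi, xi in zip(w, x))
    return baseline(t) * math.exp(linear)


def hazard_ratio(x1: Sequence[float], x2: Sequence[float], w: Sequence[float]) -> float:
    """lambda(t | x1) / lambda(t | x2); the baseline cancels, so no t is needed."""
    return math.exp(sum(wi * (a - b) for wi, a, b in zip(w, x1, x2)))
```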

... First, network activity seems to decrease after the two police interventions. Similar to past research (Horton-Eddison et al., 2021; Décary-Hétu & Giommoni, 2017; Soska & Christin, 2015), we observed less activity after the interventions, but the long-term conclusions we can draw differ from theirs. While others observed a return to the initial volume of activities, our interrupted time series analysis has shown that a significant decreasing trend settled post-intervention and lasted for the entire observation period. ...

Measuring the longitudinal evolution of the online anonymous marketplace ecosystem