Conference Paper

An Empirical Analysis of Pool Hopping Behavior in the Bitcoin Blockchain

Authors:

Abstract and Figures

We provide an empirical analysis of pool hopping behavior among 15 mining pools throughout Bitcoin’s history. Mining pools have emerged as major players to ensure that the Bitcoin system stays secure, valid, and stable. Individual miners join mining pools to benefit from a more predictable income. Many questions remain open regarding how mining pools have evolved throughout Bitcoin’s history and when and why miners join or leave mining pools. We propose a heuristic algorithm to extract the payout flow from mining pools and detect the pools’ migration of miners. Our results showed that payout schemes and pool fees influence miners’ decisions to join, change, or exit from a mining pool, thus affecting the dynamics of mining pool market shares. Our analysis provides evidence that mining activity becomes an industry as miners’ decisions follow classical economic rationale.
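The abstract does not spell out the heuristic, so the following is only a minimal sketch of the general idea of detecting migration, not the authors' algorithm. It assumes payout records have already been extracted as hypothetical `(address, pool, period)` tuples, and it flags a "hop" whenever the pool paying an address differs between two consecutive periods in which that address was paid.

```python
from collections import defaultdict


def detect_hops(payouts):
    """Return (address, from_pool, to_pool, period) for each pool change.

    payouts: iterable of (address, pool, period) tuples, where period is any
    sortable value such as a month index. This input format is an assumption
    made for illustration; it is not taken from the paper.
    """
    by_address = defaultdict(dict)
    for address, pool, period in payouts:
        # Keep one pool per address and period (the last record wins).
        by_address[address][period] = pool

    hops = []
    for address, history in by_address.items():
        periods = sorted(history)
        # Compare each paid period with the next paid period.
        for prev, curr in zip(periods, periods[1:]):
            if history[prev] != history[curr]:
                hops.append((address, history[prev], history[curr], curr))
    # Order by period, then by address, for stable output.
    return sorted(hops, key=lambda h: (h[3], h[0]))
```

A real analysis would also have to decide how to treat addresses paid by several pools in the same period, which this sketch resolves arbitrarily.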
... They mainly utilized methods based on economic activity patterns to explore the relationships between miner-owned addresses and pools. In another work, Tovanich et al. [10] studied pool hopping behaviors in 15 pools of Bitcoin transactions. Based on the empirical study and their proposed heuristic algorithm designed to describe the payout flows, they determined that pool fees and payout schemes are the two most important factors influencing the behaviors of miner-owned addresses. ...
... The time cost increases dramatically when hyperparameters exceed the limits, especially when k > 4. 2) From a local structure perspective, when k = 4 we take Fig. 2 as an example, there are eight figures of G k with different numbers of nodes generated from the same Ads. 10 Our experimental results from various k-hop subgraphs reveal that Fig. 2(a), Fig. 2(b), and Fig. 2(c) contain less structural information than the other five figures of G k . This observation aligns with the intuitive visual insights apparent in these figures. ...
Article
Cryptocurrencies have seen dramatically increased adoption in mainstream applications in fields such as financial and online services; however, some cryptocurrency transactions still involve illicit or criminal activities. It is essential to identify and monitor addresses associated with illegal behaviors to ensure the security and stability of the cryptocurrency ecosystem. In this paper, we propose a framework to build a dataset comprising Bitcoin transactions between 12 July 2019 and 26 May 2021. This dataset (hereafter referred to as BABD-13) contains 13 types of Bitcoin addresses, 5 categories of indicators with 148 features, and 544,462 labeled data points, which is the largest labeled Bitcoin address behavior dataset publicly available to our knowledge. We also propose a novel and efficient subgraph generation algorithm called BTC-SubGen to extract a k-hop subgraph, starting from a specific Bitcoin address node, from the entire Bitcoin transaction graph modeled as a directed heterogeneous multigraph. We then conduct 13-class classification tasks on BABD-13 using five machine learning models, namely k-nearest neighbors, decision tree, random forest, multilayer perceptron, and XGBoost; the results show accuracy rates between 93.24% and 97.13%. In addition, we study the relations and importance of the proposed features and analyze how they affect the performance of the machine learning models. Finally, we conduct a preliminary analysis of the behavior patterns of different types of Bitcoin addresses using concrete features and find several meaningful and explainable modes.
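BTC-SubGen itself is not described in this abstract. As a rough illustration of what extracting a k-hop subgraph means, the sketch below runs a plain breadth-first search over a simple directed graph, following outgoing edges only. The adjacency-dictionary representation and the function name are assumptions for the example; the paper's graph is a directed heterogeneous multigraph and its algorithm may differ substantially.

```python
from collections import deque


def k_hop_subgraph(adjacency, start, k):
    """Return the nodes within k hops of start and the edges among them.

    adjacency: dict mapping a node to an iterable of its successors
    (hypothetical representation of a simple directed graph).
    """
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == k:
            continue  # do not expand nodes that are already k hops away
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)

    nodes = set(distance)
    # Keep only edges whose two endpoints are both inside the subgraph.
    edges = {(u, v) for u in nodes for v in adjacency.get(u, ()) if v in nodes}
    return nodes, edges
```

Because each node is expanded at most once, the cost is linear in the size of the explored region, but that region can still grow quickly with k on a dense transaction graph.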
... Empirical analysis plays an important role in understanding the dynamics of the cryptocurrency ecosystem [27][28][29] and the behaviors of addresses/entities [14,30]. For instance, Tovanich et al. [31] and Hou et al. [32] reveal that factors such as payout schemes and pool fees influence miners' behaviors in Bitcoin mining pools, and then impact the overall system performance. Empirical analysis can also aid in identifying cryptocurrency scams. ...
Preprint
Cryptocurrencies are widely used, yet current methods for analyzing transactions heavily rely on opaque, black-box models. These lack interpretability and adaptability, failing to effectively capture behavioral patterns. Many researchers, including us, believe that Large Language Models (LLMs) could bridge this gap due to their robust reasoning abilities for complex tasks. In this paper, we test this hypothesis by applying LLMs to real-world cryptocurrency transaction graphs, specifically within the Bitcoin network. We introduce a three-tiered framework to assess LLM capabilities: foundational metrics, characteristic overview, and contextual interpretation. This includes a new, human-readable graph representation format, LLM4TG, and a connectivity-enhanced sampling algorithm, CETraS, which simplifies larger transaction graphs. Experimental results show that LLMs excel at foundational metrics and offer detailed characteristic overviews. Their effectiveness in contextual interpretation suggests they can provide useful explanations of transaction behaviors, even with limited labeled data.
... Pool members have the flexibility to leave or join other pools. A heuristic algorithm facilitates the extraction of payout flows from mining pools, enabling anyone to gather information about miners operating as pool members in specific pools [27]. Additionally, techniques involving block INV messages can be employed to identify mining nodes in the Bitcoin network [28]. ...
Article
In permissionless blockchain systems, Proof of Work (PoW) is utilized to address the issues of double-spending and transaction starvation. When an attacker acquires more than 50% of the hash power of the entire network, they gain the ability to engage in double-spending activities, posing a significant threat to the PoW consensus algorithm. This research focuses on the consensus algorithm employed in the Bitcoin system, explaining how it operates and the security challenges it faces. The proposed modification to the PoW algorithm imposes a restriction on miners: they are not allowed to accept consecutive blocks from the same miner into the final local blockchain to prevent the 51% attack problem. This modification supports transactions that require six confirmations. In the event an attacker attempts a 51% attack with a private chain that consists of fewer than 6 blocks, it becomes easier to detect a double-spending attack before accepting the attacker’s private chain. The modified algorithm introduces a "Safe Mode Detection Algorithm" that scrutinizes incoming blocks for adjustments at the top of the local blockchain. If inconsistencies are identified, the consensus algorithm proceeds cautiously by comparing the UTXO dictionaries from the attacker’s chain with those from the miner’s own blockchain. This meticulous comparison aims to detect instances of double-spending. If such instances are detected, the miner rejects the attacker’s chain, establishing a double-spend-free environment and thwarting 51% attacks.
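The two checks described above can be sketched as follows. This is an illustration under stated assumptions, not the paper's "Safe Mode Detection Algorithm": the abstract does not define the structure of the UTXO dictionaries, so the example assumes each chain is summarized by a hypothetical mapping from a spent outpoint `(txid, output_index)` to the id of the spending transaction, and treats a double spend as the same outpoint being spent by different transactions on the two chains.

```python
def has_consecutive_miner(miner_ids):
    """True if any two adjacent blocks were produced by the same miner."""
    return any(a == b for a, b in zip(miner_ids, miner_ids[1:]))


def find_double_spends(local_spends, incoming_spends):
    """Return outpoints spent by different transactions on the two chains.

    Each argument maps an outpoint (txid, output_index) to the id of the
    transaction that spends it. This representation is hypothetical.
    """
    return sorted(
        outpoint
        for outpoint, spender in incoming_spends.items()
        if outpoint in local_spends and local_spends[outpoint] != spender
    )


def accept_chain(miner_ids, local_spends, incoming_spends):
    """Accept only if no miner repeats consecutively and no spend conflicts."""
    if has_consecutive_miner(miner_ids):
        return False
    return not find_double_spends(local_spends, incoming_spends)
```

Here `miner_ids` stands for the sequence of block producers in the incoming chain; how a node would reliably identify the producer of a block is outside the scope of this sketch.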
... Specifically, they compare the Bitcoin network with other complex networks (e.g., social networks and citation networks). In addition, Wang et al. [23] and Tovanich et al. [24] study ransomware activities and pool hopping behaviors through different angles of empirical analysis. Both of them find several obvious phenomena in these two Bitcoin addresses that might be worthy of further research. ...
... A few studies examine specific types of Bitcoin address, including Ponzi, blackmail, money laundering, tumbler, exchange, and pool [5]-[11]. However, they do not clearly identify the features that distinguish each type from the others, owing to the variety of study perspectives and methods. ...
... However, Token Flow does not support arbitrary cross-chain use cases. On the academic side, we have several tools that allow on-chain analysis of smart contracts for security purposes [38], [62], [63], performance [64], [64], [65], compliance and anti-fraud [66], and others [67], [68]. However, such projects provide a sort of meta-view over user activity, do not provide specific information about interaction with protocols, and are not generalizable, contrary to this work. ...
Preprint
Ecosystems of multiple blockchains are now a reality. Multi-chain applications and protocols are perceived as necessary to enable scalability, privacy, and composability. Despite being a promising emerging research area, we recently have witnessed many attacks that have caused billions of dollars in losses. Attacks against bridges that connect chains are at the top of such attacks in terms of monetary cost, and no apparent solution seems to emerge from the ongoing chaos. In this paper, we present our contribution to minimizing bridge attacks. In particular, we explore the concepts of cross-chain transaction, cross-chain logic, and the cross-chain state as the enablers of the cross-chain model. We propose Hephaestus, the first cross-chain model generator that captures the operational complexity of cross-chain applications. Hephaestus can generate cross-chain models from local transactions on different ledgers realizing arbitrary use cases and allowing operators to monitor their cross-chain applications. Monitoring helps identify outliers and malicious behavior, which can help programmatically to stop bridge hacks and other attacks. We conduct a detailed evaluation of our system, where we implement a cross-chain bridge use case. Our experimental results show that Hephaestus can process 600 cross-chain transactions in less than 5.5 seconds in an environment with two blockchains and requires sublinear storage.
... However, we think that change detection would be more actionable for a slightly different task that we define here: the change detection of individual actors. We argue that in various applications, researchers and practitioners are mostly interested in analyzing one particular actor, or a subset of actors of interest (e.g., malicious actors [18,19], Mining Pools [20], Major exchanges [21], etc.). In this section, we investigate how our supervised machine learning approach, contrary to unsupervised ones, could be used for change detection on a particular actor, with the objective of better detecting the activity of that actor. ...
Article
Bitcoin is the most widely used crypto-currency, and one of the most studied. Thanks to the open nature of the Blockchain, transaction records are freely accessible and can be analyzed by anyone. The first step in most analytics work is to group anonymous addresses into a set of addresses, called aggregates, that are meant to correspond to unique actors. In this paper, we propose new methods to discover more accurate address aggregates using supervised learning. We introduce a way to create a labeled training set based on reliable heuristics and external information, and propose two methods. The first method automatically finds address aggregates from a set of transactions. The second one improves an address aggregate of a target actor by specializing the training for a single actor. We empirically validate our results on large-scale datasets. A striking result of our analysis is that training a model to recognize the change addresses of a particular actor is more efficient than using a larger dataset that does not target that particular actor. In doing so, we clearly show the feasibility and interest of supervised machine learning to identify Bitcoin actors.
Article
Ecosystems of multiple blockchains are now a reality. Multichain applications and protocols are perceived as necessary to enable scalability, privacy, and composability. Despite being a promising emerging area, we have been witnessing devastating attacks on cross-chain bridges that have caused billions of dollars in losses, and no apparent solution seems to emerge from the ongoing chaos. In this article, we present our contribution to minimizing bridge attacks, by monitoring a cross-chain model. In particular, we aggregate cross-chain events into cross-chain transactions, and verify if they follow a set of cross-chain rules, which then generate a model. We propose Hephaestus, the first cross-chain model generator that captures the operational complexity of cross-chain applications. Hephaestus can generate cross-chain models from local transactions in different ledgers, realizing arbitrary cross-chain use cases and allowing operators to monitor their applications. Monitoring helps identify outliers and malicious behavior, which can enable programmatically stopping attacks (“a circuit breaker”), including bridge hacks. We conduct a detailed evaluation of our system, where we implement a cross-chain bridge use case. Our experimental results show that Hephaestus can process 600 cross-chain transactions in less than 5.5 s in an environment with two blockchains using sublinear storage, paving the way for more resilient bridge designs.
Article
Bitcoin uses an unspent transaction output (UTXO) model for coin circulation, which is similar to banknotes. The transaction history is publicly available and allows cryptocurrency flows to be traced. Different users merge transactions into a single larger one to tangle flows. The merged transaction is called a shared send mixer (SSM). One can try to find the original subtransactions, that is, to solve an untangling problem. Based on the number of untanglings and their size, one extracts additional information about coin circulation. Theoretical analysis of the untangling problem is known from the literature. The paper aims to collect statistics of SSM usage by transaction type for the Bitcoin blockchain. We propose an algorithm to solve the problem, prove its correctness, and provide source code. We applied the algorithm to Bitcoin historical data: 15% of transactions are SSMs, and 90% of them allow a unique untangling. Future work includes applying the algorithm to other UTXO systems and adapting the results to address grouping.
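The abstract does not describe the proposed algorithm, so the following is only a brute-force illustration of the untangling problem in a simplified setting, not the authors' method. It assumes integer amounts, ignores fees (total inputs must equal total outputs), and looks only for splits into two balanced subtransactions; the function name and return format are hypothetical.

```python
from itertools import combinations


def two_way_untanglings(inputs, outputs):
    """List splits of a transaction into two balanced subtransactions.

    inputs, outputs: lists of integer amounts (e.g. satoshis). Fees are
    ignored, so sum(inputs) must equal sum(outputs). Returns a list of
    (input_indices, output_indices) describing the part that contains
    input 0; the other part is the complement and balances automatically.
    """
    if sum(inputs) != sum(outputs):
        raise ValueError("this sketch assumes a zero-fee transaction")

    rest = range(1, len(inputs))
    splits = []
    # Fix input 0 in the first part so each split is counted once, and
    # leave at least one input for the second part.
    for extra_count in range(0, len(inputs) - 1):
        for extra in combinations(rest, extra_count):
            in_part = (0,) + extra
            target = sum(inputs[i] for i in in_part)
            # Try every proper, non-empty subset of the outputs.
            for size in range(1, len(outputs)):
                for out_part in combinations(range(len(outputs)), size):
                    if sum(outputs[j] for j in out_part) == target:
                        splits.append((in_part, out_part))
    return splits
```

In this simplified setting, a result of length one corresponds to a unique untangling, an empty result means the transaction cannot be split this way, and several results mean the split is ambiguous. The search is exponential in the number of inputs and outputs, so it is suitable only for small examples.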
Article
We consider the “block withholding attack” as introduced by Eyal, where mining pools may infiltrate others to decrease their revenues. However, when two mining pools attack each other and neither controls a strict majority, the so-called miner’s dilemma arises. Both pools are worse off than without an attack. Knowing this, pools may make implicit non-attack agreements. Having said this, the miner’s dilemma is known to emerge only if no pool controls the majority of the mining power. In this work, we allow for miner migration and show that the miner’s dilemma emerges even for pools whose mining power exceeds 50%. We construct a game, where two mining pools attack each other and use simulation analysis methods to analyze the evolution of the pools’ mining power, infiltration preferences and revenue densities under the influence of different mining pool sizes and miner migration preferences. The results show that the underlying game experiences a phase transition fueled by miners’ migration preference. Without migration, it is profitable for a large mining pool to attack the other pool. The higher the migration preference of the miners, the more the game transitions into the miner’s dilemma and attacking makes both pools worse off. In a second step, we introduce solo-mining into the system. Introducing solo-mining cannot prevent the miner’s dilemma, however, it improves the efficiency of the mining process as the infiltration preferences of the mining pools are lowered. Thus, solo-mining has a control effect on the miner’s dilemma by keeping the infiltration preference below a certain threshold.
Article
Cryptocurrencies gain trust in users by publicly disclosing the full creation and transaction history. In return, the transaction history faithfully records the whole spectrum of cryptocurrency user behaviors. This article analyzes and summarizes the existing research on knowledge discovery in the cryptocurrency transactions using data mining techniques. Specifically, we classify the existing research into three aspects, i.e., transaction tracings and blockchain address linking, the analyses of collective user behaviors, and the study of individual user behaviors. For each aspect, we present the problems, summarize the methodologies, and discuss major findings in the literature. Furthermore, an enumeration of transaction data parsing and visualization tools and services is also provided. Finally, we outline several gaps and trends for future investigation in this research area.
Conference Paper
We present our work on visual analytics tools to support the analysis of Bitcoin mining pool evolution. Mining blocks are a critical component of the Bitcoin ecosystem, helping to keep the system secure, valid, and stable. At the same time, mining is a resource-intensive activity that continues to get more and more difficult. Mining pools have emerged to address this issue and to ensure a more stable and predictable income by sharing computing power. Yet, increased centralization of the mining power is also not without dangers (e.g., the 51% attack), and, thus, it is important to better understand and analyze mining pool activities in Bitcoin. Here, we report three contributions: our extensive data collection on Bitcoin mining pools, our development of two custom visualizations, and our first exploratory data analysis leading to hypotheses and documented activities about pools' main features such as market share, reward rules, or location.
Article
The Bitcoin network not only is vulnerable to cyber-attacks but currently represents the most frequently used cryptocurrency for concealing illicit activities. Typically, Bitcoin activity is monitored by decreasing anonymity of its entities using machine learning-based techniques, which consider the whole blockchain. This entails two issues: first, it increases the complexity of the analysis requiring higher efforts and, second, it may hide network micro-dynamics important for detecting short-term changes in entity behavioral patterns. The aim of this paper is to address both issues by performing a “temporal dissection” of the Bitcoin blockchain, i.e., dividing it into smaller temporal batches to achieve entity classification. The idea is that a machine learning model trained on a certain time-interval (batch) should achieve good classification performance when tested on another batch if entity behavioral patterns are similar. We apply cascading machine learning principles—a type of ensemble learning applying stacking techniques—introducing a “k-fold cross-testing” concept across batches of varying size. Results show that blockchain batch size used for entity classification could be reduced for certain classes (Exchange, Gambling, and eWallet) as classification rates did not vary significantly with batch size; suggesting that behavioral patterns did not change significantly over time. Mixer and Market class detection, however, can be negatively affected. A deeper analysis of Mining Pool behavior showed that models trained on recent data perform better than models trained on older data, suggesting that “typical” Mining Pool behavior may be represented better by recent data. This work provides a first step towards uncovering entity behavioral changes via temporal dissection of blockchain data.
Article
Cryptocurrencies represented by Bitcoin have fully demonstrated their advantages and great potential in payment and monetary systems during the last decade. The mining pool, which is considered the source of Bitcoin, is the cornerstone of market stability. The surveillance of the mining pool can help regulators effectively assess the overall health of Bitcoin and issues. However, the anonymity of mining-pool miners and the difficulty of analyzing large numbers of transactions limit in-depth analysis. It is also a challenge to achieve intuitive and comprehensive monitoring of multi-source heterogeneous data. In this study, we present SuPoolVisor, an interactive visual analytics system that supports surveillance of the mining pool and de-anonymization by visual reasoning. SuPoolVisor is divided into pool level and address level. At the pool level, we use a sorted stream graph to illustrate the evolution of computing power of pools over time, and glyphs are designed in two other views to demonstrate the influence scope of the mining pool and the migration of pool members. At the address level, we use a force-directed graph and a massive sequence view to present the dynamic address network in the mining pool. Particularly, these two views, together with the Radviz view, support an iterative visual reasoning process for de-anonymization of pool members and provide interactions for cross-view analysis and identity marking. Effectiveness and usability of SuPoolVisor are demonstrated using three cases, in which we cooperate closely with experts in this field.
Chapter
The first six months of 2018 have seen cryptocurrency thefts of $761 million, and the technology is also the latest and greatest tool for money laundering. This increase in crime has caused both researchers and law enforcement to look for ways to trace criminal proceeds. Although tracing algorithms have improved recently, they still yield an enormous amount of data of which very few datapoints are relevant or interesting to investigators, let alone ordinary bitcoin owners interested in provenance. In this work we describe efforts to visualize relevant data on a blockchain. To accomplish this we come up with a graphical model to represent the stolen coins and then implement this using a variety of visualization techniques.
Article
Since its deployment in 2009, Bitcoin has achieved remarkable success and spawned hundreds of other cryptocurrencies. The author traces the evolution of the hardware underlying the system, from early GPU-based homebrew machines to today’s datacenters powered by application-specific integrated circuits. These ASIC clouds provide a glimpse into planet-scale computing’s future.
Article
Analysis of blockchain data is useful for both scientific research and commercial applications. We present BlockSci, an open-source software platform for blockchain analysis. BlockSci is versatile in its support for different blockchains and analysis tasks. It incorporates an in-memory, analytical (rather than transactional) database, making it several hundred times faster than existing tools. We describe BlockSci's design and present four analyses that illustrate its capabilities. This is a working paper that accompanies the first public release of BlockSci, available at https://github.com/citp/BlockSci. We seek input from the community to further develop the software and explore other potential applications.