Token Spammers, Rug Pulls, and SniperBots: An Analysis of the
Ecosystem of Tokens in Ethereum and the Binance Smart Chain (BNB)
Federico Cernera
Sapienza University of Rome
Massimo La Morgia
Sapienza University of Rome
Alessandro Mei
Sapienza University of Rome
Francesco Sassi
Sapienza University of Rome
In this work, we perform a longitudinal analysis of the BNB Smart Chain and Ethereum blockchains from their inception to March 2022. We study the ecosystem of tokens and liquidity pools, highlighting analogies and differences between the two blockchains. We estimate the lifetime of the tokens, discovering that about 60% of them are active for less than one day. Moreover, we find that 1% of addresses create an anomalous number of tokens (between 20% and 25%). We present an exit scam fraud and quantify its prevalence on both blockchains. We find that token spammers use short-lived tokens as disposable tokens to perpetrate these frauds serially. Finally, we present a new kind of trading bot involved in these activities, and we detect their presence and quantify their activity in the exit scam operations.
1 Introduction
The cryptocurrency market is loosely regulated [7,33]. Although policymakers are moving towards building a safer environment for cryptocurrency investors [48], this is a complex task that takes time. Meanwhile, blockchain-related technologies evolve fast, and with the birth of DeFi [57], investors have begun to move from centralized exchanges (CEX) like Binance or FTX to decentralized exchanges (DEX). DEXes are decentralized applications (dApps) for trading that run on-chain, powered by smart contracts. While regulating the traditional cryptocurrency market is not easy, regulating on-chain trading platforms is even harder: even if the web interface of a DEX is shut down [5], its smart contracts remain reachable and working on the blockchain.
DEXes and DeFi dApps were born on Ethereum, but DeFi services rapidly appeared on all the blockchains that support smart contracts. Although Ethereum remains the main player in the DeFi world, with over 68 billion USD locked in its smart contracts, the BNB Smart Chain, or BSC (formerly Binance Smart Chain), presents itself as a faster and cheaper alternative to Ethereum. Uniswap and PancakeSwap are arguably the two most popular DEXes on Ethereum and BSC, respectively. They rely on the Automated Market Maker (AMM) model to handle trading. The AMM model is built on liquidity pools: smart contracts that hold two tokens (a trading pair) that users can swap for each other. Unlike on a CEX, where the platform defines the trading pairs, on a DEX any user can create a pair and let other users trade on it. However, as we will see in the following, some users abuse this freedom to carry out frauds.
In this work, we perform a longitudinal investigation of the tokens and liquidity pools that live on the Ethereum and BSC blockchains. We start by parsing over 3 billion transactions on both blockchains, searching for as many tokens and liquidity pools as possible. We identify more than 1.3 million tokens and 1 million liquidity pools. We then reconstruct their lifetime, discovering that approximately 60% of the tokens have a lifetime shorter than 1 day. Focusing on who creates the tokens, we observe that a tiny fraction of addresses, just 1%, is responsible for creating more than 20% of the tokens. Because of this overproduction of tokens, we call these addresses token spammers. Surprisingly, we also find that tokens with a very short life are actively traded on liquidity pools. Although this phenomenon is present on both blockchains, it is more widespread in BSC.
Diving into this subset of tokens, we observe that a large fraction of the liquidity pools used to trade 1-day tokens show a malicious pattern that we call exit scam. We analyze all the liquidity pools looking for exit scam patterns, and we find 266,340 potential frauds in BSC and 21,594 potential frauds in Ethereum. We estimate the cost of the operation and the gains of the scammers, finding that the success rate of this operation is not very high (between 40% and 60%). However, given the simplicity and very low cost of the operation, scammers can arrange the fraud serially and cover a series of unsuccessful operations with a single successful one.
Our key contributions are:
arXiv:2206.08202v1 [cs.CY] 16 Jun 2022
Analysis of BNB Smart Chain: To the best of our knowledge, we are the first to study this young but well-established blockchain, performing a longitudinal analysis from its inception to March 2022. We study the ecosystem of tokens and liquidity pools, highlighting analogies and differences with Ethereum.
1-day tokens and token spammers: We estimate the lifetime of the tokens on both blockchains, discovering that about 60% of tokens last less than one day. In particular, a significant fraction of them lasts just 1 block. Analyzing who creates the tokens, we observe that just 1% of addresses create an abnormal number of tokens (about 20-25% of the tokens on each blockchain).
Exit scam operations: We investigate the presence of the exit scam fraud pattern in 1-day tokens. We discover that in BSC, 81.2% of the 1-day tokens listed on PancakeSwap show this pattern. Moreover, we notice that 75.1% of the token spammers in BSC carry out at least one operation using "disposable" tokens. We estimate the scammers' gains, observing that, since the operation is very simple and cheap to arrange, it is profitable when performed serially.
Sniper bots: We detect sniper bots, a particular kind of trading bot that monitors the blockchain mempool to buy newly listed tokens almost instantly once they become tradable on a liquidity pool. To the best of our knowledge, we are the first to illustrate how this kind of trading bot works, detect their presence, and quantify their activity in exit scam operations.
2 Ethereum and BNB Smart Chain
2.1 Ethereum
Ethereum [12] is a proof-of-work blockchain. Its native coin is Ether (ETH), the second most popular cryptocurrency after Bitcoin (BTC), with a market cap of more than 210 billion US dollars. A key feature of Ethereum is smart contracts: pieces of code that execute in a decentralized way on-chain, making Ethereum a programmable blockchain. Smart contracts enable the creation of decentralized applications (dApps) and the so-called Web3.0 [3]. Through smart contracts, it is possible to create new digital assets like (fungible) tokens and NFTs (non-fungible tokens).
The tokens. Tokens, like coins, are cryptocurrencies that can be exchanged or traded. The main difference is that a coin is the native asset of the blockchain, whereas tokens are created on top of the blockchain, and their mechanisms are defined using smart contracts. In Ethereum, the ERC-20 [25] standard defines the main properties of tokens. ERC-20 was proposed in late 2015 to establish a standard interface for tokens. An ERC-20 compliant smart contract must implement a set of functions and events specified in the standard. These functions are reported in Tab. 5 in the Appendix. Some of them are optional, in particular the name(), symbol(), and decimals() functions. In Ethereum, tokens and other digital assets are held in Ethereum accounts.
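To make the interface concrete, the following Python sketch models the bookkeeping behind the mandatory ERC-20 functions (totalSupply, balanceOf, transfer, approve, allowance, transferFrom) as an in-memory ledger. It is a toy under our own assumptions, not the standard's reference code: the class and method names are hypothetical, Python snake_case stands in for the standard's camelCase, and the caller address is passed explicitly instead of being taken from the transaction sender as a real contract would.

```python
class ToyERC20:
    """In-memory model of ERC-20 semantics (illustrative; not a Solidity contract)."""

    def __init__(self, name: str, symbol: str, decimals: int, supply: int, owner: str):
        # Optional metadata: name(), symbol(), decimals()
        self.name, self.symbol, self.decimals = name, symbol, decimals
        self.total_supply = supply          # totalSupply()
        self.balances = {owner: supply}     # backs balanceOf()
        self.allowances = {}                # (owner, spender) -> amount, backs allowance()
        self.events = []                    # stands in for the Transfer/Approval event log

    def balance_of(self, account: str) -> int:
        return self.balances.get(account, 0)

    def allowance(self, owner: str, spender: str) -> int:
        return self.allowances.get((owner, spender), 0)

    def transfer(self, sender: str, to: str, value: int) -> bool:
        self._move(sender, to, value)
        return True

    def approve(self, owner: str, spender: str, value: int) -> bool:
        self.allowances[(owner, spender)] = value
        self.events.append(("Approval", owner, spender, value))
        return True

    def transfer_from(self, spender: str, src: str, to: str, value: int) -> bool:
        allowed = self.allowance(src, spender)
        if allowed < value:
            raise ValueError("allowance exceeded")
        self._move(src, to, value)          # raises before any state change if src is short
        self.allowances[(src, spender)] = allowed - value
        return True

    def _move(self, src: str, to: str, value: int) -> None:
        if self.balance_of(src) < value:
            raise ValueError("insufficient balance")
        self.balances[src] = self.balance_of(src) - value
        self.balances[to] = self.balance_of(to) + value
        self.events.append(("Transfer", src, to, value))
```

For example, after `t = ToyERC20("Demo Token", "DEMO", 18, 1_000, "alice")`, a `transfer` from alice to bob followed by an `approve` and a `transfer_from` moves balances while the total supply stays constant, and every movement appends a Transfer entry to the simulated event log.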
There are two kinds of accounts in Ethereum: externally owned accounts (EOAs) and contract accounts. An EOA is controlled by a pair of public and private keys used with the Elliptic Curve Digital Signature Algorithm (ECDSA) [30]. An account is represented by its public address, a 42-character hexadecimal string obtained by prefixing "0x" to the hexadecimal encoding of the last 20 bytes of the Keccak-256 [22] hash of the public key. Generally, users interact with an account through applications called wallets. Examples of wallets are MetaMask [40], TrustWallet [54], and MyEtherWallet [41]. A contract account, instead, is an account tied to a smart contract, and it is represented by an address with the same format as an EOA. A contract account is generated when a smart contract is deployed to the Ethereum blockchain. Both kinds of accounts can hold and send Ether. However, contract accounts can only send transactions in response to receiving a transaction.
Transactions and fee. A transaction is an action that updates the state of the Ethereum network. It can be used to move digital assets, deploy a smart contract, or invoke a smart contract. Executing a transaction has a cost, commonly called the transaction fee. The fee is variable and depends on two main factors: the state of the network (if the network is heavily loaded, the fee is usually higher) and the complexity of the operation that the transaction triggers. For instance, moving Ether from one EOA to another is the cheapest kind of transaction, while interacting with a smart contract can be very expensive. For the sake of simplicity, we can say that the transaction fee is governed by two parameters: the gas limit and the gas price. Gas is the unit that measures the computational effort required to execute specific operations. The gas limit is the maximum amount of gas the user is willing to spend on the operation, and it has to be high enough to cover the computational effort; otherwise, the transaction fails. The gas price, instead, is the amount of Gwei ($10^{-9}$ Ether) the user is willing to pay for each unit of gas, so the fee actually paid is the gas consumed multiplied by the gas price.
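The fee arithmetic under this simplified model can be sketched as follows. The helper names are hypothetical, the sketch ignores the base-fee and priority-fee pricing that Ethereum adopted with EIP-1559, and it assumes that a transaction running out of gas consumes its whole gas limit.

```python
from decimal import Decimal

WEI_PER_GWEI = 10**9      # 1 Gwei = 10^-9 Ether
WEI_PER_ETHER = 10**18


def legacy_fee(gas_needed: int, gas_limit: int, gas_price_gwei) -> tuple:
    """Return (succeeded, fee_in_wei) under the simplified gas-limit/gas-price model."""
    price_wei = int(Decimal(str(gas_price_gwei)) * WEI_PER_GWEI)
    if gas_needed > gas_limit:
        # Out of gas: the transaction fails and the gas up to the limit is still consumed.
        return False, gas_limit * price_wei
    return True, gas_needed * price_wei


def wei_to_ether(wei: int) -> Decimal:
    return Decimal(wei) / WEI_PER_ETHER
```

For instance, a plain Ether transfer needs 21,000 gas, so at a gas price of 50 Gwei it costs 1,050,000 Gwei, that is, 0.00105 ETH.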
Smart contract deployment. As we said, smart contracts are programs that run on the Ethereum blockchain. They are written in a high-level programming language (e.g., Solidity [20]) and compiled into bytecode that runs on the Ethereum Virtual Machine (EVM) [24]. A smart contract is deployed by sending a contract creation transaction from an EOA, that is, a transaction with an empty recipient (to) field. The transaction contains the bytecode of the smart contract. A smart contract can also create new smart contracts. In this case, the bytecode of the new smart contract has to be embedded in the bytecode of the smart contract that generates it. Since a smart contract cannot start a transaction by itself, but only in response to a transaction that triggers it, an EOA must ultimately trigger the generation of a new smart contract.
Events and logs. A smart contract has data associated with it, such as its Ether balance and the values of its variables. Transactions that call the smart contract's methods can modify those values, and hence the state of the smart contract itself. Knowing the internal state of a smart contract can be crucial, especially when it serves as the backend of a decentralized application (dApp). Ethereum provides Events and an Event log to track the internal state of smart contracts. Each time an action changes the internal state of a smart contract, the contract can emit an Event that notifies the change. All the events are written to the Event log. Thanks to it, users and developers can easily track the state of smart contracts on the blockchain.
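As an illustration of how events are read back from the log, the sketch below decodes an ERC-20 Transfer event as returned by a node's JSON-RPC interface, where each log entry carries the emitting contract address, a list of topics, and a data field. Indexed parameters (from, to) sit in the topics as 32-byte left-padded words, the non-indexed value sits in data, and the first topic is the Keccak-256 hash of the event signature Transfer(address,address,uint256). The function names are our own, and the sketch assumes hex-string fields rather than the byte objects some client libraries return.

```python
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"


def topic_to_address(topic: str) -> str:
    """An indexed address is left-padded to 32 bytes; keep the last 20 bytes."""
    return "0x" + topic[-40:].lower()


def decode_erc20_transfer(log: dict) -> dict:
    """Decode one ERC-20 Transfer log given as JSON-RPC hex strings."""
    topics = [t.lower() for t in log["topics"]]
    if len(topics) != 3 or topics[0] != TRANSFER_TOPIC:
        raise ValueError("not an ERC-20 Transfer log")
    return {
        "token": log["address"].lower(),     # the contract that emitted the event
        "from": topic_to_address(topics[1]),
        "to": topic_to_address(topics[2]),
        "value": int(log["data"], 16),       # non-indexed uint256 in the data field
    }
```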
EVM and EVM compliant. Ethereum is a distributed state machine that changes its state at each new block according to a predefined set of rules. The EVM is the entity that computes these state changes. The specifications of the EVM are described in the Ethereum Yellow Paper [55]. There are several standard implementations of the EVM in different programming languages (e.g., Python, JavaScript, C++). Besides Ethereum, other blockchains rely on the EVM (to name a few: BNB Smart Chain [9], Avalanche [46], Fantom [26], Cronos [18]), using either one of the standard EVM implementations or a completely custom one. These blockchains are called EVM compliant. They can run the same smart contracts written for Ethereum (unchanged or with minimal changes), use the same address convention, and handle state the same way as Ethereum.
2.2 The BNB Smart Chain
The BNB Smart Chain [9] (previously Binance Smart Chain), or BSC, is a blockchain born in 2020 as a chain parallel to the Beacon Chain (previously Binance Chain); together, they form the BNB Chain. Its consensus is based on PoSA [10] (Proof of Staked Authority). While the Beacon Chain handles the staking and the governance of the blockchain, BSC manages the consensus layer and provides EVM compatibility. The coin of both chains is BNB (Build and Build, previously Binance Coin), the third coin by market cap, with a capitalization of over 46 billion USD. Like Ether on Ethereum, the BNB coin fuels the transactions on the BNB Chain. Thanks to its EVM compatibility, it is possible to create tokens on BSC. However, in this case, tokens follow the BEP-20 standard instead of ERC-20.
3 The Automated Market Maker - AMM
Cryptocurrencies can be traded on centralized exchanges (CEXs), like Binance, Coinbase, or FTX, or on decentralized exchanges (DEXs). The name CEX stems from the fact that users who want to trade one cryptocurrency for another have to transfer it from their private wallet to a custodial wallet managed by a centralized entity, that is, the exchange. Usually, on a CEX, as on a traditional stock exchange, trades are performed following the Order Book Model. In this model, a system matches the users' trading orders, ensuring that each order is filled accordingly. In simple terms, if trader A wants to buy 1 Bitcoin at $40,000, the system performs the trade only if it finds a trader B willing to sell 1 Bitcoin at the same price.
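The matching step of the Order Book Model can be sketched in a few lines of Python. This is a deliberately simplified, hypothetical matcher, not how any specific exchange works: it fills one incoming limit buy order against resting sell orders sorted by price only, ignores time priority and fees, and returns any unfilled remainder to the caller instead of resting it on the book.

```python
def match_buy(limit_price: int, qty: int, asks: list) -> tuple:
    """Match a limit buy order against resting sell orders.

    asks: list of (price, quantity) sell orders.
    Returns (fills, unfilled_qty, remaining_asks).
    """
    book = sorted(asks)                     # cheapest sell orders first
    fills, remaining = [], qty
    for i, (price, available) in enumerate(book):
        if remaining == 0 or price > limit_price:
            break                           # no seller left at or below the limit price
        take = min(remaining, available)
        fills.append((price, take))
        remaining -= take
        book[i] = (price, available - take)
    return fills, remaining, [(p, q) for p, q in book if q > 0]
```

With trader B's sell order `(40_000, 1)` on the book, `match_buy(40_000, 1, ...)` fills trader A completely, whereas a buy limited to 39,000 finds no counterpart and nothing is traded.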
Unlike on a CEX, on a DEX there is no intermediary. The user interacts with the smart contracts deployed on the blockchain, and the user's cryptocurrencies leave their private wallet only when traded. Although some DEXs use the Order Book Model (EtherDelta, Binance DEX, Waves Exchange, dYdX), the most popular ones follow the Automated Market Maker model. This model relies on a mathematical formula to set the price of assets and on the concepts of liquidity pools and liquidity providers. A liquidity pool is a smart contract that contains two or more cryptocurrencies that users can swap for one another. A liquidity provider, instead, is a user who invests in the liquidity pool by providing cryptocurrencies to the smart contract. When a liquidity provider injects liquidity into the pool, the smart contract mints LP-tokens and gives them to the liquidity provider. An LP-token represents the share of the liquidity pool owned by the investor. Conversely, when liquidity providers want their cryptocurrencies back, they transfer the LP-tokens to the smart contract, which burns them and returns the corresponding cryptocurrencies to the investor. Usually, liquidity pools apply a trading fee to each swap operation and distribute a portion of the fees to the liquidity providers according to their LP-tokens.
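The LP-token accounting described above can be sketched as a small pro-rata ledger. The formulas follow our reading of Uniswap V2 (the first deposit mints the geometric mean of the two amounts minus a small amount that is locked forever; later deposits mint in proportion to the existing reserves; burning returns a proportional share of both reserves), but the class is hypothetical and omits swaps, fee accrual, and the optional protocol fee.

```python
import math

MINIMUM_LIQUIDITY = 1_000  # LP units locked forever on the first deposit (Uniswap V2 style)


class ToyPool:
    """Pro-rata LP-token accounting for a two-token pool (swaps and fees omitted)."""

    def __init__(self):
        self.reserve_a = 0
        self.reserve_b = 0
        self.lp_supply = 0
        self.lp_balances = {}

    def add_liquidity(self, provider: str, amount_a: int, amount_b: int) -> int:
        if self.lp_supply == 0:
            minted = math.isqrt(amount_a * amount_b) - MINIMUM_LIQUIDITY
            locked = MINIMUM_LIQUIDITY
        else:
            # Mint in proportion to the smaller relative contribution.
            minted = min(amount_a * self.lp_supply // self.reserve_a,
                         amount_b * self.lp_supply // self.reserve_b)
            locked = 0
        if minted <= 0:
            raise ValueError("insufficient liquidity minted")
        self.reserve_a += amount_a
        self.reserve_b += amount_b
        self.lp_supply += locked + minted
        self.lp_balances[provider] = self.lp_balances.get(provider, 0) + minted
        return minted

    def remove_liquidity(self, provider: str, lp_amount: int) -> tuple:
        if self.lp_balances.get(provider, 0) < lp_amount:
            raise ValueError("not enough LP-tokens")
        # Burning returns the provider's share of both reserves.
        out_a = lp_amount * self.reserve_a // self.lp_supply
        out_b = lp_amount * self.reserve_b // self.lp_supply
        self.lp_balances[provider] -= lp_amount
        self.lp_supply -= lp_amount
        self.reserve_a -= out_a
        self.reserve_b -= out_b
        return out_a, out_b
```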
3.1 Uniswap and its forks
Uniswap [1] is the first decentralized exchange (DEX) to use the AMM model successfully. According to DefiLlama [36], a popular DeFi statistics aggregator, Uniswap is the
dApp by TVL (Total Value Locked, the amount of money locked into smart contracts), with over 6 billion USD, while it is the
the AMMs. Uniswap was launched on Ethereum, but it is now also present on the Ethereum Layer 2 solutions Arbitrum and Optimism and on the Ethereum sidechain Polygon Matic.
Because of its popularity, its open-source smart contracts, and its copyleft license [51], more than 50 protocols have been born on several blockchains in recent years by forking Uniswap's smart contracts. Uniswap is in its third version, but all its forks derive from the second version, since the third one is released under a Business Source License [52]. For this reason, in this work, we focus on Uniswap V2 and its forks. One of the most popular forks of Uniswap is PancakeSwap, which lives on BSC, and it is the
dApp by TVL on this blockchain, with over 4 billion USD locked in its smart contracts.
In Uniswap V2, each pool consists of a pair of ERC-20 tokens. We can think of the liquidity pool as divided into two parts, each containing a single token, and both of equivalent value. Let a pool consist of $x$ units of token A and $y$ units of token B. At each swap, the pool preserves the constant product
$$x \cdot y = k.$$
When a user swaps token A for token B (the user adds token A to the pool and takes token B from the pool), $x$ increases by $\Delta x$ and $y$ decreases by $\Delta y$, where $\Delta y$ is computed so that $k$ does not change:
$$(x + \Delta x)(y - \Delta y) = x \cdot y \quad\Longrightarrow\quad \Delta y = \frac{y \, \Delta x}{x + \Delta x}.$$
The price of the exchange is determined by the ratio of $x$ and $y$ in the pool. Consequently, the swap operation changes the current exchange rate: the value of token A decreases while the value of token B increases, and the two parts maintain the same value.
Although Uniswap has some pools created directly by itself, anyone can leverage the protocol and create a new pool with a custom trading pair. This freedom allows the creator of the pool to establish the initial price of a new token by choosing the amounts of the two tokens initially deposited into the liquidity pool.
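The constant-product rule, together with the initial price chosen by the pool creator, can be checked numerically. The sketch below mirrors the integer arithmetic of a Uniswap V2-style output computation; the fee parameter is an assumption expressed in basis points (30 corresponds to the 0.3% fee of Uniswap V2, while forks may use different values), and with a zero fee the output reduces to $\Delta y = y\,\Delta x/(x + \Delta x)$ rounded down. Function names are hypothetical.

```python
def get_amount_out(amount_in: int, reserve_in: int, reserve_out: int, fee_bps: int = 30) -> int:
    """Output of a constant-product swap; the fee is taken from the input amount."""
    if amount_in <= 0 or reserve_in <= 0 or reserve_out <= 0:
        raise ValueError("amount and reserves must be positive")
    amount_in_with_fee = amount_in * (10_000 - fee_bps)
    numerator = amount_in_with_fee * reserve_out
    denominator = reserve_in * 10_000 + amount_in_with_fee
    return numerator // denominator   # rounds down, so x * y never decreases


def spot_price(reserve_a: int, reserve_b: int) -> float:
    """Current price of one unit of token A expressed in token B (ratio y / x)."""
    return reserve_b / reserve_a
```

Swapping 100 units of A into a pool holding 1,000 A and 1,000 B yields 90 units of B with or without the 0.3% fee (the fee lowers the exact amount from about 90.9 to about 90.7 before rounding down), and the spot price of A drops from 1.0 to 910/1,100. Likewise, a creator who deposits 1,000,000 units of a new token against 10 units of a quote token sets an initial spot price of 0.00001 quote units per token.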
4 The Datasets
For our investigation, we build two different datasets: the Token Dataset, which contains all the ERC-20 (resp. BEP-20) tokens created, and the Liquidity Pool Dataset, which contains data about liquidity pools. Each dataset has two versions, one with data from the Ethereum blockchain and the other from the BNB Smart Chain.
Given the large amount of data and the need to parse the entire blockchains multiple times, for performance reasons, and to avoid overloading public nodes (e.g., nodes provided by Binance [8] and Infura [29]) or services (e.g., BscScan or Etherscan), we host and run our own Ethereum and BNB Smart Chain nodes. To query the blockchains and process the data, we use the Web3 [42] and Ethereum-etl [39] Python libraries. Web3 is a library that allows interaction with a local or remote EVM-compliant node using HTTP, IPC, or WebSocket. Ethereum-etl allows the extraction of information from EVM-compliant blockchains and exports it to formats like CSV or JSON.
We consider the whole history of both blockchains from their inception to March 2022. For the Ethereum blockchain, we process all the blocks from block 0 (2015-07-30) to block 14340000 (2022-03-07). For the BSC blockchain, we process all the blocks from block 0 (2020-04-20) to block 15854000 (2022-03-07).
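A self-hosted node can be scanned with plain JSON-RPC calls. The sketch below uses only the Python standard library instead of the Web3 or Ethereum-etl helpers: it splits a block range into fixed windows and builds one eth_getLogs request per window. The window size and the node URL are assumptions (nodes differ in how many blocks or results a single eth_getLogs call may cover, and http://localhost:8545 is merely a common default), and the function names are hypothetical.

```python
import json
import urllib.request


def block_windows(start: int, end: int, size: int):
    """Yield inclusive (from_block, to_block) windows that cover [start, end]."""
    for lo in range(start, end + 1, size):
        yield lo, min(lo + size - 1, end)


def get_logs_request(request_id: int, from_block: int, to_block: int, topics: list) -> bytes:
    """JSON-RPC 2.0 body for eth_getLogs; block numbers are hex-encoded quantities."""
    body = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "eth_getLogs",
        "params": [{"fromBlock": hex(from_block), "toBlock": hex(to_block), "topics": topics}],
    }
    return json.dumps(body).encode()


def post(node_url: str, body: bytes) -> list:
    """Send one request to a self-hosted node and return the 'result' field."""
    req = urllib.request.Request(node_url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["result"]
```

Walking Ethereum blocks 0 to 14340000 with 5,000-block windows, for example, would issue 2,869 requests.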
4.1 The Token Dataset
To build the Token dataset, we perform multiple steps. We first collect all the contract creation transactions on both blockchains; then we expand our dataset by collecting all the contracts that emit a Transfer event. This step is necessary to find tokens generated by internal transactions. Finally, we refine our dataset by selecting only the smart contracts compliant with the ERC-20 (resp. BEP-20) standard.
4.1.1 Gathering Smart Contracts
As a first step in building the Token dataset, we collect all the contract creation transactions issued by EOAs. As mentioned in Section 2, EOAs deploy a smart contract by sending a contract creation transaction, that is, a transaction with an empty recipient field. We process all the transactions in the considered time frame on the BNB Smart Chain (2.6 billion transactions) and Ethereum (1.4 billion transactions), collecting 2,195,399 and 4,420,389 contract creation transactions, respectively.
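Identifying contract creation transactions can be sketched as follows: a creation transaction has no recipient (its to field is null in the JSON-RPC representation), and its receipt reports the address of the new contract in contractAddress. The input shapes and the function name are assumptions for illustration; receipts with status 0x0 (failed execution) are skipped, while older receipts that carry no status field are kept.

```python
def contract_creations(transactions: list, receipts: dict) -> list:
    """Pair contract creation transactions with the contracts they deployed.

    transactions: dicts with 'hash', 'from', 'to' ('to' is None for a creation).
    receipts: transaction hash -> receipt dict with 'contractAddress' and, if present, 'status'.
    """
    created = []
    for tx in transactions:
        if tx.get("to") is not None:
            continue                         # ordinary transfer or contract call
        receipt = receipts.get(tx["hash"])
        if receipt is None or receipt.get("contractAddress") is None:
            continue
        if receipt.get("status") == "0x0":
            continue                         # execution failed: nothing was deployed
        created.append({"creator": tx["from"], "contract": receipt["contractAddress"]})
    return created
```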
However, tokens can also be created by a smart contract itself. Indeed, an EOA may call a smart contract method whose execution generates a new ERC-20 (or BEP-20) compliant smart contract. In this case, the token is created with a so-called internal transaction. Despite the name, internal transactions are not real transactions but calls performed by smart contracts. They are not recorded as transactions in the blocks, so they cannot be found simply by parsing the blockchain.
To track tokens created by internal transactions, we can operate in two ways. The first is to re-execute all the transactions of the blockchain in the EVM and trace all the calls. This process is extremely expensive [49] from a computational point of view. The alternative is to scan the Event log looking for smart contracts that emit a Transfer event. The second way is much faster, and we estimate that it misses only 12% of the tokens created by internal transactions. Moreover, the missing tokens are tokens that have never been used, traded, or transferred, and are thus of little importance for our study (we discuss the impact of this choice in detail in Section 10). So, we parse all the logs of both blockchains, searching for smart contracts that emit a Transfer event compliant with the ERC-20 (resp. BEP-20) interface. Then, we use Etherscan [31] and BscScan [32] to retrieve the transactions that created these smart contracts and all the needed information.
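The log-based scan can be sketched as a filter over log entries. ERC-20 and ERC-721 share the same Transfer signature hash, but ERC-721 also indexes the token id, so its Transfer logs carry four topics instead of three; the sketch uses this to separate fungible-token candidates from other contracts before the bytecode check of Section 4.1.2. The function name is hypothetical, and the log entries are assumed to be JSON-RPC style dictionaries.

```python
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"


def candidate_token_contracts(logs) -> tuple:
    """Split contracts emitting Transfer(address,address,uint256) by the shape of their logs.

    Returns (erc20_like, others): contracts with 3-topic Transfer logs, and contracts
    whose Transfer logs never have 3 topics (e.g. ERC-721, which also indexes the token id).
    """
    erc20_like, others = set(), set()
    for log in logs:
        topics = [t.lower() for t in log["topics"]]
        if not topics or topics[0] != TRANSFER_TOPIC:
            continue                         # some other event
        address = log["address"].lower()
        if len(topics) == 3:
            erc20_like.add(address)
        else:
            others.add(address)
    return erc20_like, others - erc20_like
```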
At the end of these two steps, we have a collection of 3,087,274 and 4,534,599 smart contracts extracted from BSC and Ethereum, respectively. For each of them, we store the following information: the address of the contract; the block number in which the smart contract was created; the block in which the smart contract emitted its last event; the EOA that deployed the smart contract or, in the case of internal transactions, the EOA that triggered the first smart contract in the call chain; the amount of gas used; the cost of the gas unit (gas price); the bytecode of the smart contract; and whether the smart contract was deployed by an EOA or through an internal transaction.
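The per-contract record listed above maps naturally onto a small data class. The field names are our own choice, not those of the authors' dataset, and the derived deployment cost simply multiplies the gas used by the gas price.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ContractRecord:
    """One row of the smart contract collection (field names are illustrative)."""
    address: str
    creation_block: int
    last_event_block: Optional[int]   # None if no event was ever emitted (assumption)
    creator_eoa: str                  # deployer, or the EOA that started the call chain
    gas_used: int
    gas_price_wei: int
    bytecode: str
    via_internal_tx: bool             # True if created by another contract

    @property
    def deployment_cost_wei(self) -> int:
        return self.gas_used * self.gas_price_wei
```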
4.1.2 Token Identification
Smart contracts are not used only to create tokens, and not all smart contracts that emit a Transfer event are tokens (e.g., NFT contracts). Thus, we need to identify which of the retrieved smart contracts are compliant with ERC-20 (resp. BEP-20). Unfortunately, this is not a trivial task, and in recent years several works [14,15,21,27,53] have tackled this problem with different approaches, which we describe in Section 9. For our analysis, we follow the approach proposed by [15,53], which leverages the bytecode of smart contracts.
According to the Solidity specification [35], in the bytecode, the methods of a smart contract are identified by signatures that consist of the first 4 bytes of the Keccak-256 hash of the method name and parameter types. Thus, to verify whether the bytecode of a retrieved smart contract represents an ERC-20 (resp. BEP-20) compliant token, we check whether it contains at least all the signatures of the ERC-20 (resp. BEP-20) mandatory methods. Tab. 5 in the Appendix shows the signatures of the mandatory and optional methods of the ERC-20 and BEP-20 interfaces.
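The selector-based check can be sketched in pure Python. Because the standard library's hashlib.sha3_256 implements the final SHA-3 standard, whose padding differs from the original Keccak-256 used by Ethereum, the sketch includes a compact Keccak-256 implementation. The substring test on the bytecode is a coarse heuristic under our own assumptions (a selector could in principle appear elsewhere in the code), and the names are hypothetical; the list of mandatory signatures is the standard ERC-20 set.

```python
# Keccak-f[1600] round constants (iota step).
_RC = [
    0x0000000000000001, 0x0000000000008082, 0x800000000000808A, 0x8000000080008000,
    0x000000000000808B, 0x0000000080000001, 0x8000000080008081, 0x8000000000008009,
    0x000000000000008A, 0x0000000000000088, 0x0000000080008009, 0x000000008000000A,
    0x000000008000808B, 0x800000000000008B, 0x8000000000008089, 0x8000000000008003,
    0x8000000000008002, 0x8000000000000080, 0x000000000000800A, 0x800000008000000A,
    0x8000000080008081, 0x8000000000008080, 0x0000000080000001, 0x8000000080008008,
]
# Rotation offsets _ROT[x][y] (rho step).
_ROT = [[0, 36, 3, 41, 18], [1, 44, 10, 45, 2], [62, 6, 43, 15, 61],
        [28, 55, 25, 21, 56], [27, 20, 39, 8, 14]]
_MASK = (1 << 64) - 1


def _rotl(v: int, n: int) -> int:
    return ((v << n) | (v >> (64 - n))) & _MASK


def _keccak_f(a: list) -> None:
    """Apply the 24 rounds of Keccak-f[1600] in place; a[x][y] holds 64-bit lanes."""
    for rc in _RC:
        c = [a[x][0] ^ a[x][1] ^ a[x][2] ^ a[x][3] ^ a[x][4] for x in range(5)]
        d = [c[(x - 1) % 5] ^ _rotl(c[(x + 1) % 5], 1) for x in range(5)]
        b = [[0] * 5 for _ in range(5)]
        for x in range(5):
            for y in range(5):
                # theta (xor d[x]), then rho (rotate) and pi (move lane to (y, 2x + 3y))
                b[y][(2 * x + 3 * y) % 5] = _rotl(a[x][y] ^ d[x], _ROT[x][y])
        for x in range(5):
            for y in range(5):
                a[x][y] = b[x][y] ^ (~b[(x + 1) % 5][y] & b[(x + 2) % 5][y])  # chi
        a[0][0] ^= rc  # iota


def keccak256(data: bytes) -> bytes:
    """Original Keccak-256 (pad byte 0x01), as used by Ethereum; not FIPS-202 SHA3-256."""
    rate = 136  # bytes absorbed per block for a 256-bit output
    msg = bytearray(data) + b"\x01"
    msg += b"\x00" * (-len(msg) % rate)
    msg[-1] |= 0x80
    a = [[0] * 5 for _ in range(5)]
    for off in range(0, len(msg), rate):
        for i in range(rate // 8):
            a[i % 5][i // 5] ^= int.from_bytes(msg[off + 8 * i: off + 8 * i + 8], "little")
        _keccak_f(a)
    return b"".join(a[i % 5][i // 5].to_bytes(8, "little") for i in range(4))


def selector(signature: str) -> str:
    """4-byte method selector (hex) of a canonical signature like 'transfer(address,uint256)'."""
    return keccak256(signature.encode("ascii"))[:4].hex()


ERC20_MANDATORY = [
    "totalSupply()", "balanceOf(address)", "transfer(address,uint256)",
    "transferFrom(address,address,uint256)", "approve(address,uint256)",
    "allowance(address,address)",
]
ERC20_OPTIONAL = ["name()", "symbol()", "decimals()"]


def has_all_selectors(bytecode_hex: str, signatures: list) -> bool:
    """Coarse check: every selector appears somewhere in the bytecode."""
    code = bytecode_hex.lower().removeprefix("0x")
    return all(selector(sig) in code for sig in signatures)
```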
Of the 4,534,599 smart contract bytecodes retrieved from the Ethereum blockchain, we find that 389,348 (8.5%) are ERC-20 compliant tokens, and 381,551 (98%) of them also implement the optional functions of the ERC-20 interface. On the BNB Smart Chain, instead, we find that 1,887,484 out of 3,087,274 (61%) are BEP-20 compliant, and almost all of them also implement the optional methods of the BEP-20 interface. Although we found more smart contracts in Ethereum than in BSC (4,534,599 vs. 3,087,274), there are many more compliant tokens in BSC (1,887,484) than in Ethereum (389,348). This discrepancy suggests that BSC may be a more interesting environment to study tokens and, possibly, their misuse.
Lastly, we retrieve all the information about the identified tokens, such as the name, the symbol, the number of decimals, and the total supply. We use the Ethereum-etl library and the Contract Application Binary Interface (ABI) [23]. The ABI is an interface between two program modules: it specifies how to encode and decode method calls and data structures to interact with the machine code and interpret the results. Through the library, it is possible to instantiate smart contracts in an object-oriented manner and call their methods using an appropriate ABI. We instantiate the token contracts using an ABI that contains the specifications of the ERC-20 (resp. BEP-20) methods and call the name(), symbol(), decimals(), and totalSupply() methods.
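Under the hood, each of these getter calls is an eth_call whose data field is just the 4-byte selector, and the returned bytes follow the ABI encoding. The sketch below shows how such results could be decoded without a client library; the helper names are hypothetical, and the bytes32 fallback reflects the assumption that some early tokens return a fixed 32-byte value instead of a dynamic string.

```python
# 4-byte selectors of the ERC-20 getters (first 4 bytes of keccak256 of the signature).
NAME, SYMBOL, DECIMALS, TOTAL_SUPPLY = "0x06fdde03", "0x95d89b41", "0x313ce567", "0x18160ddd"


def eth_call_params(token: str, selector: str, block: str = "latest") -> list:
    """Params of a JSON-RPC eth_call invoking a zero-argument getter on `token`."""
    return [{"to": token, "data": selector}, block]


def _raw(result_hex: str) -> bytes:
    return bytes.fromhex(result_hex[2:] if result_hex.startswith("0x") else result_hex)


def decode_uint256(result_hex: str) -> int:
    """decimals() and totalSupply() return a single 32-byte big-endian word."""
    return int.from_bytes(_raw(result_hex)[:32], "big")


def decode_string(result_hex: str) -> str:
    """name() and symbol(): ABI dynamic string = offset word, length word, padded bytes."""
    raw = _raw(result_hex)
    if len(raw) == 32:
        # Assumed fallback for tokens that return bytes32 instead of string.
        return raw.rstrip(b"\x00").decode("utf-8", errors="replace")
    offset = int.from_bytes(raw[0:32], "big")
    length = int.from_bytes(raw[offset:offset + 32], "big")
    return raw[offset + 32: offset + 32 + length].decode("utf-8", errors="replace")
```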
At the end of the process, we have a dataset of ERC-20 (resp. BEP-20) tokens containing all the information about the smart contracts described in Section 4.1.1 and the related tokens. Table 1 shows the number of smart contracts on both blockchains.
4.2 Liquidity Pools Dataset
To create the Liquidity Pool dataset, we consider Uniswap, its forks, and other protocols that leverage its smart contracts. Uniswap has three main smart contracts: the Factory, the Pair, and the Router. The Factory contract is responsible for creating the smart contract that handles the liquidity pool and the LP-tokens. Note that since the same smart contract handles both the LP-token and the liquidity pool, they have the same contract address. The Pair contract keeps track of the balances of the tokens in the pool and implements the AMM logic explained in Sec. 3. The Router contract offers the entry point to interact with the liquidity pools: through the Router, it is possible to add or remove cryptocurrencies from a liquidity pool and to swap tokens. Each of these contracts implements a set of Events that notify their status changes.

Table 1: An overview of the Token dataset.
Contracts | Ethereum Total | Ethereum ERC-20 | BNB Smart Chain Total | BNB Smart Chain BEP-20
External | 4,420,389 | 293,688 | 2,195,399 | 1,021,427
Internal | 114,210 | 95,660 | 891,875 | 866,057
Total | 4,534,599 | 389,348 | 3,087,274 | 1,887,484
Total (w/o LP) | - | 323,863 | - | 1,078,016

Table 2: An overview of the Liquidity pools dataset.
Events | Ethereum Uniswap | Ethereum Others | BNB Smart Chain PancakeSwap | BNB Smart Chain Others
PairCreated | 65,098 | 5,483 | 941,220 | 30,907
Mint | 1,399,599 | 512,319 | 21,944,474 | 5,027,980
Burn | 824,359 | 243,482 | 7,339,286 | 2,481,023
Swap | 54M | 27M | 571M | 179M
To build our datasets, we parse the Event log of the Ethereum and BSC blockchains. The events we look for are the following:
PairCreated: This event is fired by the Factory contract each time a new liquidity pool is created. We find 972,127 and 70,581 PairCreated events emitted in BSC and Ethereum, respectively. From the event, we obtain the transaction hash, the block in which the liquidity pool was created, the address that created the liquidity pool, the address of the liquidity pool, the addresses of the two tokens (the pair of the liquidity pool), the gas used, and the price paid per gas unit.
Analyzing the address that fired the event and looking online for notable smart contract addresses, it is possible to get a rough idea of the diffusion of the Uniswap forks on the two blockchains. In BSC, we find that PancakeSwap created most liquidity pools, with 941,220 emitted events (96.8%), followed by ApeSwap [4] (3,265 events), BakerySwap [6] (2,418 events), and Mdex [38] (1,602 events). In Ethereum, Uniswap emitted 65,098 events (92.2%), while the Factory contract of SushiSwap [12], a popular alternative to Uniswap on Ethereum, emitted 2,637 (3.7%).
Mint and Burn: The Pair contract emits a Mint (or Burn) Event each time LP-tokens are minted (or burned). This occurs every time a liquidity provider adds (or removes) tokens to (or from) a liquidity pool. Analyzing these events, we obtain the transaction hash and the block of the Mint (Burn) Event, the address of the liquidity pool, the address that added (removed) the liquidity, the number of LP-tokens minted (burned), the gas used, and the price paid for the gas. We find 26,972,454 Mint events and 9,820,309 Burn events in BSC, and 1,911,918 Mint events and 1,067,841 Burn events in Ethereum.
Swap: This event is fired by the Pair contract each time a user swaps tokens in a liquidity pool. From the event, we obtain all the information related to the swap: the transaction hash, the block in which the swap occurs, the address of the liquidity pool used, the address that performs the swap, the number of tokens swapped, the gas used, and the gas price. We find 750,508,160 Swap events in BSC and 82,447,051 in Ethereum.
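Decoding two of these events can be sketched as follows, assuming the Uniswap V2 event declarations PairCreated(address indexed token0, address indexed token1, address pair, uint) and Swap(address indexed sender, uint amount0In, uint amount1In, uint amount0Out, uint amount1Out, address indexed to). Indexed parameters are read from topics[1] and topics[2], the rest from 32-byte words in data; selecting the logs by their first topic is left out, and the function names are hypothetical.

```python
def _words(data_hex: str) -> list:
    """Split the hex data field into 32-byte (64 hex character) words."""
    data = data_hex[2:] if data_hex.startswith("0x") else data_hex
    return [data[i:i + 64] for i in range(0, len(data), 64)]


def _addr(word: str) -> str:
    """Addresses are left-padded to 32 bytes; keep the last 20 bytes."""
    return "0x" + word[-40:].lower()


def decode_pair_created(log: dict) -> dict:
    w = _words(log["data"])
    return {
        "factory": log["address"].lower(),        # emitted by the Factory contract
        "token0": _addr(log["topics"][1]),
        "token1": _addr(log["topics"][2]),
        "pair": _addr(w[0]),                      # address of the new liquidity pool
        "all_pairs_length": int(w[1], 16),        # the trailing uint
    }


def decode_swap(log: dict) -> dict:
    a0_in, a1_in, a0_out, a1_out = (int(x, 16) for x in _words(log["data"])[:4])
    return {
        "pair": log["address"].lower(),           # emitted by the Pair contract
        "sender": _addr(log["topics"][1]),
        "to": _addr(log["topics"][2]),
        "amount0In": a0_in, "amount1In": a1_in,
        "amount0Out": a0_out, "amount1Out": a1_out,
    }
```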
Moreover, we complete our dataset by collecting, for each smart contract, the block number in which it emits its last event. Tab. 2 describes the final dataset.
LP-tokens are ERC-20 (resp. BEP-20) compliant tokens, so they are already present in our Token Dataset. However, our goal is to study standard tokens and liquidity pools separately. Thus, as a final step, we remove the LP-tokens from the Token Dataset. The last line of Tab. 1 reports the number of tokens after removing the LP-tokens.
5 The Lifetime of Tokens
Our data collection revealed a surprisingly high number of tokens and liquidity pools in Ethereum and BSC. Services like CoinGecko [16] or CoinMarketCap [17] list about 13,000 cryptocurrencies across 602 centralized and decentralized exchanges in total. So, it is unclear what role the large majority of tokens plays in the blockchain ecosystem.
To obtain a first insight into the characteristics of tokens and liquidity pools, we introduce the concept of lifetime. We define the lifetime of a token as follows: a token begins its lifetime at the block where its smart contract is deployed, and it ends its lifetime at the last block where it emits any Event. Similarly, a liquidity pool begins its lifetime at the block where its PairCreated event is emitted, and it ends its lifetime at the last block where the liquidity pool's smart contract emits any Event.
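The lifetime definition translates into a few lines of code once block timestamps are available. In this sketch, which is ours, a token with no event after its creation block is treated as living a single block, the 1-day flag includes 1-block tokens (consistent with the counts in Table 3), and block timestamps are assumed to be preloaded into a dictionary.

```python
ONE_DAY = 24 * 60 * 60  # seconds


def lifetime_seconds(created_block: int, last_event_block, block_timestamp: dict) -> int:
    """Seconds between the creation block and the last block with an event."""
    last = created_block if last_event_block is None else last_event_block
    return block_timestamp[last] - block_timestamp[created_block]


def classify(created_block: int, last_event_block, block_timestamp: dict) -> dict:
    """Lifetime flags; '1-day' also covers the 1-block tokens."""
    one_block = last_event_block is None or last_event_block == created_block
    return {
        "1-block": one_block,
        "1-day": lifetime_seconds(created_block, last_event_block, block_timestamp) < ONE_DAY,
        "active": not one_block,
    }
```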
Figure 1 shows the CDF of the lifetime of tokens and liquidity pools in Ethereum (blue lines) and BSC (yellow lines). Tokens and liquidity pools are shown with solid and dashed lines, respectively. The slope of the curves shows that the lifetime of tokens in BSC is generally shorter than in Ethereum. Consider that BSC is a young blockchain, with slightly more than two years of activity (released on 2020-04-20), while Ethereum is more than seven years old (released on 2015-07-30). The longevity of Ethereum is also visible in the long tail of its tokens in the CDF. Nonetheless, it is apparent that Ethereum hosts tokens that tend to be more solid and long-lasting. This difference is smaller when we look at liquidity pools. Indeed, PancakeSwap, which handles about 97% of the liquidity pools in BSC, was born only four months after the release of Uniswap V2. The reader can also observe that, in general, the lifetime of liquidity pools is shorter than the lifetime of tokens. This is not surprising: a liquidity pool requires both of its tokens to be created in advance, so the lifetime of a liquidity pool can be at most as long as the lifetime of the younger token.
[Figure 1: Lifetime of tokens and liquidity pools in BSC and Ethereum. CDF of the fraction of tokens (y-axis) against lifetime duration in (a) days (0-2000), (b) minutes (0-60), and (c) hours (0-24); series: BSC tokens, BSC pools, ETH tokens, ETH pools.]
From the CDF, we can also note a few additional interesting
facts, particularly when looking at the first 24 hours of the
life of tokens and liquidity pools.
A significant fraction of tokens are never active.
Looking at the zoomed image in the center of Figure 1(b), it is possible to see that a significant fraction of tokens have a lifetime of zero, meaning that the token is active only in the block in which it was created. This phenomenon is more common in Ethereum, where 104,836 out of 323,863 tokens (32.4%) belong to this category, against 167,318 out of 1,078,016 (15.5%) in BSC. In the following, we refer to the tokens that last only one block as 1-block tokens, and to all other tokens as active tokens. We find 910,698 and 219,027 active tokens in BSC and Ethereum, respectively. Table 3 summarizes these statistics.
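As a minimal sketch of these definitions, assume (hypothetically) that each token has already been reduced to the first and last block in which it is active, together with the timestamps of those blocks. The counts summarized in Table 3 can then be derived as follows; the record type and function names are illustrative, not the authors' pipeline.

```python
from dataclasses import dataclass

ONE_DAY = 24 * 60 * 60  # seconds


@dataclass
class TokenActivity:
    """Hypothetical per-token record: first and last block with any activity,
    plus the UNIX timestamps of those two blocks."""
    address: str
    first_block: int
    last_block: int
    first_ts: int
    last_ts: int


def lifetime_seconds(t: TokenActivity) -> int:
    """Lifetime = time between the first and the last block with activity."""
    return t.last_ts - t.first_ts


def summarize(tokens):
    """Count total, 1-block, active, and 1-day tokens as defined in the text."""
    one_block = sum(1 for t in tokens if t.first_block == t.last_block)
    # 1-block tokens have lifetime 0, so they are also counted as 1-day tokens
    one_day = sum(1 for t in tokens if lifetime_seconds(t) < ONE_DAY)
    return {
        "total": len(tokens),
        "1-block": one_block,
        "active": len(tokens) - one_block,
        "1-day": one_day,
        "active 1-day": one_day - one_block,
    }
```

Because every 1-block token also has a lifetime shorter than a day, the categories nest: with the BSC figures, 638,703 - 167,318 = 471,385 active 1-day tokens, matching the text.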
A large part of active tokens have an extremely short lifetime.
Figure 1(b) shows that about 60% of the tokens in BSC and Ethereum have a lifetime shorter than one day. We refer to these tokens as 1-day tokens. If we consider only active tokens, we find that 471,385 (51.7%) of the active BSC tokens and 82,542 (37.7%) of the active Ethereum tokens are 1-day tokens. Looking at the data at a higher granularity (Figure 1(b)), we note that the death rate of BSC tokens is surprisingly high. Compared with Ethereum, BSC has approximately half the proportion of 1-block tokens, about the same proportion of dead tokens after 60 minutes, and a significantly larger proportion of dead tokens after the first 4 hours. As we can see in Figure 1(c), the first four hours of a token's life are crucial in Ethereum as well.

Table 3: Summary of 1-day and 1-block tokens in the BNB Smart Chain and Ethereum.
Lifetime      | BNB Smart Chain   | Ethereum
1-day         | 638,703 (59.2%)   | 187,378 (57.8%)
1-block       | 167,318 (15.5%)   | 104,836 (32.4%)
Total tokens  | 1,078,016         | 323,863
Almost all the BSC tokens with short lifetime have a liquidity pool.
Here, we find one of the main differences between BSC and Ethereum. In BSC, 414,936 out of 471,385 (88%) active tokens with a lifetime shorter than one day have a liquidity pool. In Ethereum, only 33% (26,817) do. This suggests that in BSC, the liquidity pool is the main reason to create a token.
6 Who Creates Tokens?
In this section, we change perspective and explore who creates
tokens. Retrieving the list of creator addresses from our token
dataset, we find 144,795 and 464,095 different addresses that
create at least one token, respectively in Ethereum and BSC.
Comparing these numbers with the total number of cumula-
tive unique addresses in Ethereum (189,858,744) and BSC
, we see that they represent only a very small
fraction of the addresses: 0.07% in Ethereum and 0.33% in BSC. Figure 2 shows the distribution of the number of tokens
created by addresses in Ethereum and BSC. The first thing to
notice is that the two distributions are extremely similar. The
large majority of these addresses (70%) create only one token,
as we can see in the zoomed image on the bottom right corner
of Figure 2. 95% of addresses create five tokens or fewer, and just 1% of addresses create more than 18 tokens. However, we can gain further insight by plotting the same data differently.
A small fraction of addresses creates a disproportionate amount of tokens.
Figure 3 shows the CDF of tokens created by fraction of addresses. From the figure, we can see that although 70% of addresses create just 1 token, the tokens created by these addresses account for only 30% of the tokens on the two blockchains. More interestingly, we find that just 1% of the addresses create 24.3% (262,023) of the tokens in BSC and, similarly, 1% of the addresses in Ethereum create 20.1% (67,838) of the tokens. These addresses create an average of 51 and 61 tokens in Ethereum and BSC, respectively. We will refer to these addresses as token spammers.

2 Data retrieved from Etherscan and BSCscan, respectively.

[Figure 2: Distribution of the number of tokens created by the addresses that create at least one token in BSC and Ethereum (x-axis: # of tokens created; y-axis: fraction of addresses). For the sake of visualization, the CDF is cut at 100 tokens. The maximum number of tokens created is 17,936 in BSC and 1,740 in Ethereum.]

[Figure 3: Fraction of addresses that create at least one token with respect to the fraction of tokens that they create (x-axis: fraction of addresses; y-axis: fraction of tokens).]
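To make the notion concrete, the sketch below selects the top 1% of creators and the share of tokens they account for. It assumes, hypothetically, a flat list with one creator address per token. The paper does not state its exact cutoff or tie-breaking rule (its counts, 1,329 of 144,795 and 4,231 of 464,095 addresses, are slightly below 1%), so rounding the cutoff up is our assumption.

```python
import math
from collections import Counter


def token_spammers(creators, top_fraction=0.01):
    """Return the top `top_fraction` of creator addresses (by number of tokens
    created) and the share of all tokens they created.

    `creators` holds one creator address per token (hypothetical input format).
    Ties at the cutoff are broken by first appearance, an arbitrary choice.
    """
    counts = Counter(creators)
    total_tokens = sum(counts.values())
    k = max(1, math.ceil(len(counts) * top_fraction))
    top = counts.most_common(k)
    share = sum(n for _, n in top) / total_tokens
    return [addr for addr, _ in top], share
```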
6.1 The Token Spammers
In this subsection, we take a closer look at the token spammers. Here are a few interesting observations.
Token spammers are more prevalent in BSC.
Although the distribution of the number of tokens created by addresses in Ethereum and BSC is almost identical (Figure 3), the absolute numbers are different. In terms of raw numbers, BSC has more than three times as many token spammers as Ethereum (4,231 vs. 1,329), and the BSC spammers create almost four times as many tokens as the Ethereum ones (262,023 vs. 67,838).
Token spammers create tokens mainly with contract creation transactions.
As mentioned in Section 2, tokens can be created in two ways: by sending a contract creation transaction or by sending a transaction to a smart contract that generates the token. We find that 94.8% of the tokens in BSC
and 82.3% of the tokens in Ethereum are created directly
by sending a contract creation transaction. Focusing on the
small fraction created by sending a transaction to a smart
contract, we find that token spammers create 13,794 tokens
using 3,530 different contracts in BSC and 12,010 tokens
using 1,237 different contracts in Ethereum.
Token spammers create short lifetime tokens.
As we have
seen, a significant fraction of tokens have a lifetime shorter
than 1 day. Investigating the relationship between token spam-
mers and 1-day tokens, we discover that most of the tokens
created by the spammers have a lifetime shorter than one day: in BSC, the spammers created 170,768 1-day tokens out of 262,023 (65.1%), and in Ethereum 40,552 out of 67,838 (59.8%).
7 Malicious Activity
The top token spammer creates 17,936 tokens in the timeframe of our analysis. Looking at the names of these tokens, we find that they reuse a handful of names (there are only 76 unique names), the most used being 'Pornhub' with 605 occurrences. The median lifetime of these tokens is extremely short: 45 minutes. Lastly, almost all of the tokens (99.7%) created by this address have a liquidity pool. We now examine the life of one of these tokens in depth.
The anatomy of an operation.
We focus on OnlyFans, a token created by the top token spammer on block 8090747 (2021-06-07 01:40:34 PM UTC) by issuing a contract creation
transaction. This token has a supply of
and its symbol
is the Unicode character U+1F48B (the kiss mark emoji), followed by the string "OnlyFans". On block 8090751 (2021-06-07 01:40:46 PM UTC), four blocks after its creation, the token spammer creates a liquidity pool that contains the pair (OnlyFans, Wrapped BNB) and adds liquidity consisting of 20 Wrapped BNB (almost $7,180 at the time of the operation) and 44 trillion OnlyFans tokens.
Just 6 seconds later, on block 8090753 (2021-06-07 01:40:52 PM UTC), an address buys 4 billion OnlyFans tokens by swapping 0.002 Wrapped BNB ($0.718). This operation is followed by 12 swaps from 12 other addresses, for a total purchase of OnlyFans worth 2.67 Wrapped BNB ($958). About two hours after the creation of the token, at block 8093101 (2021-06-07 03:38:55 PM UTC), the token spammer removes all the liquidity from the liquidity pool, leaving it drained. Since the 12 addresses added Wrapped BNB to the pool by buying OnlyFans, the token spammer collects 22.67 Wrapped BNB, a profit of 2.67 Wrapped BNB ($958) before gas fees.
These kinds of operations are quite frequent and are commonly called rug pulls or exit scams [37,43].
7.1 The Rug Pull
We develop an approach to systematically find operations following the same pattern we explored in the case study. First of all, we formalize the kind of activity we are interested in:
1. Eve creates a new ERC-20 token $\tau$.
2. Eve creates a new liquidity pool with pair $(\tau, B)$, where $B$ is a valuable token, e.g., Wrapped BNB.
3. Eve adds liquidity to the liquidity pool. The reserves of the pool are now $(\mathit{reserve}_\tau, \mathit{reserve}_B)$.
4. At this point, Eve is the only one that owns token $\tau$. Investors can buy token $\tau$ by swapping their tokens $B$ with token $\tau$ in the liquidity pool.
5. Suppose that Bob buys a few $\tau$ by swapping them with $\delta_B$ tokens $B$. The new reserves of the liquidity pool are $(\mathit{reserve}_\tau - \delta_\tau, \mathit{reserve}_B + \delta_B)$, where $\delta_\tau$ is the amount of $\tau$ that Bob receives.
6. Lastly, Eve removes all the liquidity from the liquidity pool. The net gain of the operation is $\delta_B$ minus the gas fees to execute the transactions.
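The steps above can be checked with a toy constant-product (x · y = k) model, the pricing rule used by Uniswap V2-style AMMs. This is a simplified sketch, not the authors' code: it assumes a 0.25% swap fee that stays in the pool, ignores gas and any protocol fee, and uses floats instead of on-chain integer arithmetic.

```python
from dataclasses import dataclass


@dataclass
class ConstantProductPool:
    """Minimal x * y = k pool for the pair (tau, B). The swap fee stays in the
    pool, as in Uniswap V2-style AMMs; gas and protocol fees are ignored."""
    reserve_tau: float
    reserve_b: float
    fee: float = 0.0025  # assumption: 0.25% (Uniswap V2 charges 0.3%)

    def buy_tau(self, amount_b: float) -> float:
        """Swap `amount_b` of token B for tau and return the tau received."""
        effective_in = amount_b * (1 - self.fee)
        amount_tau = self.reserve_tau * effective_in / (self.reserve_b + effective_in)
        self.reserve_b += amount_b
        self.reserve_tau -= amount_tau
        return amount_tau


def simulate_rug_pull(initial_tau, initial_b, investor_buys, fee=0.0025):
    """Steps 3-6: Eve provides all the liquidity, investors swap B for tau,
    then Eve burns all LP-tokens and withdraws both reserves.

    Returns (tau received per buy, B withdrawn by Eve, delta_B before gas)."""
    pool = ConstantProductPool(initial_tau, initial_b, fee)
    received = [pool.buy_tau(b) for b in investor_buys]
    # Eve holds 100% of the LP-tokens, so removing liquidity returns all of B
    b_withdrawn = pool.reserve_b
    return received, b_withdrawn, b_withdrawn - initial_b
```

With the case study's figures (20 Wrapped BNB and 44 trillion tokens of liquidity, a first buy of 0.002 Wrapped BNB, and the remaining buys lumped into one swap for simplicity), `simulate_rug_pull(44e12, 20.0, [0.002, 2.668])` yields about 22.67 withdrawn and a $\delta_B$ of about 2.67, and the first buy receives roughly 4.4 billion tokens, in line with the numbers reported above.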
7.1.1 An Improved Version of the Fraud
The rug pull described above is the simple version of the fraud, in which the scammer never interacts with the liquidity pool until he removes the liquidity. To attract more investors, the scammer can manipulate some statistics of the liquidity pool, such as the number of swaps, the trading volume, or the price. The scammer can use wash trading [13], a well-known form of market manipulation. In this case, the creator of the pool tries to create the impression that the liquidity pool is active, faking the trading volume by repeatedly buying and selling tokens. The goal is to increase the liquidity pool's trading volume rather than the scam token's price.
Another way for scammers to drum up the attention of investors is to inflate the price by gradually buying the scam token. This action carries no risk for the scammer, even if the liquidity pool attracts no new investors. Indeed, since he owns all the LP-tokens, he can retrieve all the capital he injected when he removes all the liquidity. In the following, we refer to these operations as pump operations.
The scammer can also hedge his gains, eliminating the risk of holding an unrealized profit while the liquidity pool is still active. In this case, the scammer does not put all of the scam tokens in the pool but keeps a reserve in his wallet. When investors start to buy the scam token, raising the asset's price, the scammer can gradually sell the tokens he holds, taking profit from the operation. In the following, we refer to these operations as hedge operations. Of course, the scammer can use one or a combination of these practices.
7.2 Looking for Rug Pulls
We leverage our datasets to identify exit scam rug pulls systematically. Since we saw a considerable number of 1-day tokens, most of them created serially, we narrow our investigation to the 1-day tokens with a liquidity pool: 414,936 in BSC and 26,817 in Ethereum. We analyze all the Events emitted by the liquidity pools, looking for the liquidity pools that emitted exactly 1 Mint event and exactly 1 Burn event, in which the address that performs the transaction burns at least 99% of the minted LP-tokens (we do not use 100%, since a small fraction of tokens might remain stuck in the wallet).
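A minimal sketch of this filter follows. It assumes the pool's events have already been decoded into a hypothetical `LPEvent` record that carries LP-token amounts. Note that in Uniswap V2-style pairs the Mint and Burn events log the amounts of the two underlying tokens, so LP-token amounts would have to be recovered separately (e.g., from the pair's Transfer events).

```python
from dataclasses import dataclass


@dataclass
class LPEvent:
    """Hypothetical normalized pool event. `kind` is "Mint", "Burn" or "Swap";
    `lp_amount` is the number of LP-tokens minted or burned (0 for swaps)."""
    kind: str
    lp_amount: int
    sender: str


def has_exit_scam_pattern(events, min_burn_ratio=0.99):
    """Exactly one Mint, exactly one Burn, and the Burn removes at least
    `min_burn_ratio` of the LP-tokens minted. Swaps are allowed in between."""
    mints = [e for e in events if e.kind == "Mint"]
    burns = [e for e in events if e.kind == "Burn"]
    if len(mints) != 1 or len(burns) != 1:
        return False
    minted = mints[0].lp_amount
    return minted > 0 and burns[0].lp_amount >= min_burn_ratio * minted
```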
7.2.1 Estimating the Gains of the Fraud
The simple operation, where the scammer does not swap in
his liquidity pool, can be carried out by performing just four
transactions: A transaction that creates the token, one that cre-
ates the liquidity pool, one that adds the liquidity, and finally,
the last transaction to remove the liquidity. These transactions
can be performed individually, or they can be aggregated by
leveraging a smart contract; we consider both cases when computing the fees. If the scammer performs
swaps on the liquidity pool, we also consider the transaction
fees paid for each swap.
To perform our estimation, we use the following formulas:
$$\mathit{base\_gain} = \delta_B - \mathit{fees} \tag{1}$$
$$\mathit{net\_gain} = \mathit{base\_gain} - T_{in} + T_{out} - \mathit{fees}_{swap} \tag{2}$$
The estimation has two components. Formula 1 computes the gain in the case of the simple operation, which we explored in the case study and formalized in Section 7.1. Formula 2 takes into account the more refined version of the fraud described in Section 7.1.1, where the creator of the liquidity pool manipulates it by performing swap operations. In this case, we subtract from the gain $T_{in}$, that is, the amount of tokens that the manipulator artificially adds to the liquidity. We also add to the gain $T_{out}$, the quantity of tokens that the manipulator removes from the liquidity pool before the final removal of the liquidity. Finally, we subtract the fees paid to perform the swap operations ($\mathit{fees}_{swap}$).
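Formulas 1 and 2 translate directly into code. The sketch below is a plain transcription of the two equations plus the success criterion used in Section 7.3.1; the argument values in the usage note are made up for illustration.

```python
def base_gain(delta_b: float, fees: float) -> float:
    """Formula (1): B gained from removing the liquidity (delta_B) minus the gas
    fees of the create/add/remove transactions."""
    return delta_b - fees


def net_gain(delta_b: float, fees: float, t_in: float, t_out: float,
             fees_swap: float) -> float:
    """Formula (2): subtract the B the scammer injected through his own swaps
    (T_in), add the B he withdrew through swaps before the final removal
    (T_out), and subtract the gas fees paid for those swaps."""
    return base_gain(delta_b, fees) - t_in + t_out - fees_swap


def is_successful(gain: float) -> bool:
    """An operation is successful if its net gain is strictly positive."""
    return gain > 0
```

For example, with hypothetical values $\delta_B = 3.0$, fees $= 0.03$, $T_{in} = 1.0$, $T_{out} = 0.2$ and $\mathit{fees}_{swap} = 0.01$ (all in BNB), `net_gain(3.0, 0.03, 1.0, 0.2, 0.01)` gives about 2.16 BNB.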
7.3 Results
After processing our data, we discover that a strikingly large number of liquidity pools are actually rug pulls. In BSC, 266,340 out of 414,936 (65.6%) of the considered liquidity pools have an exit scam pattern, against 21,594 out of 26,817 (81.1%) in Ethereum. This result shows that scammers use most of the 1-day tokens as disposable tokens to carry out rug pulls.
These operations are arranged by 116,516 different ad-
dresses in BSC and 16,539 different addresses in Ethereum.
As expected from the previous analyses, most of the token spammers that operate in BSC are linked to this kind of activity. Indeed, in BSC, 3,179 out of 4,231 (75%) token spammers performed at least one rug pull. In Ethereum, instead, only 101 token spammers (7.6%) have been involved in this activity. We find 114 addresses that perform more than 100 exit scams in BSC, accounting for 19.1% of the operations, with the most active performing 16,102 operations. In Ethereum, by contrast, we find only two addresses performing more than 100 operations. Interestingly, combining the information in the BSC and Ethereum datasets, we find that two token spammers operated on both blockchains with the same address. They perform 78 exit scams in Ethereum and 194 in BSC.
Looking at the liquidity pools, we find that BNB (97.8% of the cases) is the token most often paired with the scam token, followed by USDT (0.67%) and BUSD (0.15%), two stablecoins pegged to the US dollar. In Ethereum, instead, ETH is paired with the scam token in all the liquidity pools with an exit scam pattern. As the next step, we want to estimate the number of potential users that fall prey to such activities. To do so, we exclude from this analysis the addresses that swap into liquidity pools they have created themselves. We collect 256,967 different addresses in BSC and 59,043 in Ethereum that interact with at least one liquidity pool with an exit scam pattern. These addresses performed 2,903,021 swaps on the considered liquidity pools in BSC and 317,305 in Ethereum.
We divide the swaps into buy (of the scam token) and sell operations. As expected, given the anatomy of the fraud, most of the operations are buy operations. In more detail, in BSC 2,286,056 (78.7%) are buy operations and 616,966 (21.3%) sell operations. In Ethereum, we find a similar pattern, with 254,061 (80.1%) buy operations and 63,196 (19.9%) sell operations.
As a final metric, we compute the average value of the swaps performed by the users. The average swap amount is almost identical for buy and sell operations on both blockchains: 0.01 BNB in BSC and 0.19 ETH in Ethereum. Interestingly, we notice a considerable difference in the average swap amount between the two blockchains: the average swap is approximately $3 in BSC and $360 in Ethereum, considering the current price of BNB and ETH.
7.3.1 The Gains
Before computing the gains of the scammers, we calculate the average amount a scammer has to invest to arrange the fraud. If the scammer does not perform any swap in the liquidity pool, the cost of the operation is on average 0.03 BNB in BSC and 0.2 ETH in Ethereum. Thus, the investment needed to perform such operations is low, although it can vary substantially when the blockchains are congested. For instance, some operations in our set of rug pulls cost as much as 1.1 BNB or even 3.3 ETH. The base cost to arrange the fraud is interesting because it represents a bound on the loss that the scammers have to bear for each operation.
We leverage our datasets to compute the gain of each operation using Formula 1. We classify the 266,340 operations in BSC and the 21,594 in Ethereum as successful or unsuccessful based on the operation's net gain. In particular, we define an operation as successful if its net gain is strictly positive.
Successful operations.
Among the liquidity pools with an exit scam pattern, 104,404 (39.1%) operations in BSC and 13,368 (61.9%) in Ethereum closed with a profit for the scammer. A possible reason for the higher success rate of rug pulls in Ethereum is that, as we saw, users there tend to invest more money on average. Indeed, attracting a single average investor is roughly enough to cover the operation's cost. To investigate what can affect the gains, we combine information on gains with information on manipulation. When the creator of the liquidity pool does not perform any manipulation, the net gain is on average 0.11 BNB in BSC and 1.34 ETH in Ethereum. Operations carried out on liquidity pools with wash-trading activity have an average gain of 0.25 BNB in BSC and 12 ETH in Ethereum, which is considerably higher than in the previous case. By contrast, we notice a negligible increase in gains for pump operations with respect to liquidity pools without manipulation. Moreover, we find that neither kind of manipulation has an impact on the success rate. This shows that operations involving wash trading are generally more profitable. However, there is a drawback: the scammer has to perform several swaps, which increases the cost of the operation and the loss in case it is unsuccessful.
Unsuccessful operations.
There are 161,936 (60.9%) liquidity pools in BSC and 8,226 (38.1%) in Ethereum for which the gains of the operation do not cover the transaction fees. For 13% (21,122) of these liquidity pools in BSC and 18.3% (1,506) in Ethereum, the operations were unsuccessful because nobody swapped in the liquidity pools. Considering these results, we conjecture that the aim of the scammers is not to succeed every time but to arrange frauds serially and profit in the long run. Indeed, the loss of an unsuccessful operation is minimal, and a streak of operations closed at a loss can be covered by a single profitable operation.
7.3.2 The Tokens’ Names
To further deepen our analysis of exit scams, we focus on the names used in the frauds. Analyzing the exit scams, we notice several tokens with the same name in BSC and Ethereum. We find that the tokens involved in exit scams have only 157,864 (57.9%) and 18,801 (86.4%) unique names in BSC and Ethereum, respectively. Table 4 shows the most used names and the number of occurrences for each of them. We then attempt to cluster the exit scam tokens into categories and enumerate them.

Table 4: Token names most frequently used in exit scams.
BNB Smart Chain name | # of tokens | Ethereum name       | # of tokens
Pornhub              | 1,023       |                     | 50
Galaxy               | 588         | Deriswap            | 32
Seedswap             | 502         | Shibaswap           | 28
Lionswap             | 429         | Apple core finance  | 17
                     | 421         |                     | 16
Spacex               | 419         | Yield farm rice     | 15
Onlyfans             | 398         | The sandbox         | 14
As a first category, we explore clones, i.e., tokens with the same name as an existing (and more popular) cryptocurrency. To systematically search for these cases, we need an authoritative source to discern which tokens are legitimate and which are clones. We leverage the CoinGecko APIs [16] to retrieve the names and addresses of all tokens created on BSC and Ethereum and verified with the indexer service. At the end of the process, we build a list of 5,325 BSC tokens and 5,172 Ethereum tokens. We complement this list with popular variations of some token names (e.g., we also consider ADA as a possible name for the Cardano token). Using our list, we discover 22,002 cloned tokens in BSC and 1,781 in Ethereum. The most cloned tokens in BSC are Berryswap (370 occurrences), ApeSwap (210 occurrences), Shiba Inu (191 occurrences), and SafeMoon (158 occurrences).
The second category we explore is the one of tokens that attempt to impersonate companies or websites. For companies, we retrieve the names of the companies in the Standard and Poor's 500 (S&P 500) stock market index to obtain a list of possible targets. For websites, we extract the second-level domain of the top 200 websites according to the Alexa ranking. Using these two lists together, we find 4,638 tokens of this category in BSC and only 95 in Ethereum. The most frequent companies and websites are Pornhub (1,023), Spacex (419), Onlyfans (398), Oracle (319), and Amazon (270).
We find several repetitions of names that contain popular meme-related words like "Doge", "Inu", or "Shiba". This is not surprising, since meme coins have become very popular after the events involving the "meme stocks" of GameStop (GME) and AMC Entertainment (AMC) in late 2020 [34]. Luckily, CoinMarketCap and CoinGecko offer a categorization of the tokens they list that also includes a "meme" category. We leverage these lists to extract the most frequent words and search for them in the names of tokens involved in exit scams. We find a huge number of tokens in this category: 54,229 in BSC and 4,835 in Ethereum.

5 Data retrieved Apr. 26, 2022.

[Figure 4: Each data point represents an address swapping inside liquidity pools with an exit scam pattern. On the y axis, we represent the number of different liquidity pools where the address swaps. On the x axis, we show the average time interval, in minutes, between the first time the liquidity is added to the liquidity pool and the swap operations of the address.]
As the last category of our investigation, we look for DeFi services (e.g., Deriswap, Shibaswap, Seedswap, and Eco Finance). In this case, we search for tokens whose names contain the keywords "swap", "defi", or "finance". With this approach, we find 24,197 tokens of this category in BSC and 3,548 in Ethereum.
With our simple categorization, we cover the names of 39% of the exit scam tokens in BSC and 47% in Ethereum. Even if we could not categorize all the tokens, we gain some insight into how scammers pick names for their frauds. In particular, we note a strong tendency to choose token names related to the meme category or that leverage the names of popular cryptocurrencies, services, and companies.
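The categorization boils down to exact-name and keyword matching. The sketch below uses tiny placeholder lists and assumes case-insensitive exact matching for clones and impersonations and substring matching for meme and DeFi keywords; the paper does not specify its exact matching rules, and a name can fall into more than one category.

```python
# Hypothetical, heavily abbreviated lists. The study instead uses CoinGecko-
# verified token names, S&P 500 companies, the Alexa top-200 second-level
# domains, and the words of the CoinMarketCap/CoinGecko "meme" categories.
KNOWN_TOKENS = {"shiba inu", "safemoon", "apeswap", "cardano", "ada"}
COMPANIES_AND_SITES = {"pornhub", "spacex", "onlyfans", "oracle", "amazon"}
MEME_WORDS = ("doge", "inu", "shiba")
DEFI_WORDS = ("swap", "defi", "finance")


def categorize(name: str) -> set:
    """Return the categories a token name falls into (possibly several)."""
    n = name.strip().lower()
    categories = set()
    if n in KNOWN_TOKENS:
        categories.add("clone")  # same name as an existing cryptocurrency
    if n in COMPANIES_AND_SITES:
        categories.add("company/website")
    if any(w in n for w in MEME_WORDS):
        categories.add("meme")  # substring match, so "Shibaswap" counts
    if any(w in n for w in DEFI_WORDS):
        categories.add("defi")
    return categories
```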
8 Sniper Bots 2.0
We find that many exit scams are successful without using fake tokens or wash trading. Since these zero-effort operations are very quick and simple, it is not obvious how they can be profitable. We analyze the operations carried out inside exit scams in more depth and find that their success may be due to the activity of a particular class of trading bots, which we analyze in the following.
8.1 Sniper Bots Description
Sniper bots are automated bots that monitor time-bound activities and perform an action before or after anyone else. An example is “auction snipers”, which are used to place last-second bids on auction sites such as eBay [47]. In this case, the goal is to secure the highest bid in the auction. Another example is “scalping bots”, which monitor the availability of target products on a website and buy them as soon as they are available (e.g., Nvidia GPUs) [11]. Using these bots, it is possible to buy a product as soon as it is available and sell it at a later time at a higher price.
With the birth and widespread adoption of AMMs, a new kind of sniper bot has been developed, which we call Sniper Bots 2.0. These sniper bots are programs that buy tokens on liquidity pools as soon as they are listed. The basic idea is to “snipe” a new coin by buying it before anyone else at its initial price. To do so as fast as possible, sniper bots can leverage the mempool, i.e., the set of transactions not yet included in a block.
We find examples of these bots distributed for free on GitHub [19,45,50] and for a price on several other websites [2,44]. Analyzing the code, we can infer how they work. As a first step, the bot connects to the blockchain network. Then, it continuously scans the mempool looking for transactions whose input data indicates that they are adding liquidity to a brand new liquidity pool. When it finds one, the bot sends a swap transaction to buy the token. If the gas price is set correctly, the swap is executed in the same block as, and immediately after, the transaction that adds the liquidity. Sniper bots typically execute only the buy operation: the user can freely decide when to sell the tokens and make a profit. However, we also found some variants that automatically sell the tokens when the price rises by a certain percentage.
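The filtering step of such a bot can be sketched as follows. This is an illustrative reconstruction, not code from the bots cited above: it assumes pending transactions are already available as JSON-RPC-style dicts (e.g., from a locally run node) and identifies add-liquidity calls by their 4-byte function selector. The selector values are the ones commonly listed by block explorers for the Uniswap V2 router and should be verified against the router ABI.

```python
# First 4 bytes of calldata for the Uniswap V2 router's liquidity functions;
# PancakeSwap's router, a fork, keeps the same function names and signatures.
# Selector values as commonly listed by block explorers; verify against the ABI.
ADD_LIQUIDITY_SELECTORS = {
    "0xe8e33700": "addLiquidity",
    "0xf305d719": "addLiquidityETH",
}


def is_add_liquidity(tx: dict, router: str) -> bool:
    """True if a pending transaction calls one of the router's add-liquidity
    functions. `tx` is a dict with hex-string "to" and "input" fields, as in
    the JSON-RPC transaction object ("to" is None for contract creations)."""
    if (tx.get("to") or "").lower() != router.lower():
        return False
    return tx.get("input", "")[:10].lower() in ADD_LIQUIDITY_SELECTORS


def snipe_candidates(pending_txs, router):
    """Yield the pending transactions a sniper bot would react to. A real bot
    would also check that the pool is brand new and then send a buy swap with
    a suitable gas price; both steps are omitted here."""
    for tx in pending_txs:
        if is_add_liquidity(tx, router):
            yield tx
```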
8.2 Identifying Sniper Bots
We conjecture that one of the reasons for the profitability of exit scam operations is the presence of sniper bots that buy tokens from every liquidity pool indiscriminately. Thus, we can use the liquidity pools involved in exit scams as “honey pots” to detect sniper bots. To verify our intuition, we focus on addresses that have swapped inside liquidity pools with an exit scam. Figure 4 shows the phenomenon: every dot is an address, and its position indicates the number of different liquidity pools where the address has swapped and the average delay from the pool creation. The figure shows a few addresses that swap into thousands of liquidity pools almost immediately after their creation. Since these addresses perform these operations serially and incredibly fast, we believe that they are sniper bots. We set two conservative thresholds to identify addresses used by sniper bots: we consider all the addresses that swap on average with a delay smaller than 5 blocks (15 seconds, since BSC blocks are mined every 3 seconds on average) and that swap in at least 100 different liquidity pools.
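The two thresholds can be written as a simple filter. This sketch assumes, hypothetically, that each swap has already been joined with the block in which its pool first received liquidity, and that the delay is averaged over individual swaps (our reading of the Figure 4 caption).

```python
from collections import defaultdict


def flag_sniper_bots(swaps, max_avg_delay_blocks=5, min_pools=100):
    """Flag addresses whose swaps into exit-scam pools happen, on average,
    fewer than `max_avg_delay_blocks` blocks after liquidity is first added,
    and that swap in at least `min_pools` distinct pools.

    `swaps` yields (address, pool, swap_block, first_liquidity_block) tuples
    (hypothetical input format)."""
    delays = defaultdict(list)
    pools = defaultdict(set)
    for address, pool, swap_block, liquidity_block in swaps:
        delays[address].append(swap_block - liquidity_block)
        pools[address].add(pool)
    return {
        address
        for address, d in delays.items()
        if sum(d) / len(d) < max_avg_delay_blocks
        and len(pools[address]) >= min_pools
    }
```

The defaults mirror the BSC thresholds; the Ethereum setting described later in this section corresponds to `flag_sniper_bots(swaps, max_avg_delay_blocks=3, min_pools=10)`.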
We flag 130 addresses as possible sniper bots. These ad-
dresses represent only 0.05% of all the addresses that swap
inside liquidity pools involved in exit scams. Strikingly, they swap in 235,777 liquidity pools, representing 86.5% of all the liquidity pools with an exit scam. Moreover, these addresses also perform an impressive number of swaps: 646,339, accounting for 24% of all the swaps performed in liquidity pools with an exit scam. We find that 31% of these
swaps are performed in the same block where the liquidity is
added for the first time in the liquidity pool. In these cases,
we can confirm that the sniper bots scanned the mempool to
swap in the same block where the liquidity is added. However,
we also find sniper bots that perform the swap operations a
few blocks after the liquidity is created.
From the code of the open-source sniper bots, we find
that many implementations do not leverage the mempool
but instead wait for the token to be listed on services like
BscScan or Etherscan. Indeed, the documentation of sniper
bots that use the mempool reports that it is necessary to deploy
an Ethereum or BSC node locally. This requires technical
knowledge and hardware that is not accessible to most users.
We find sniper bots to be less present in Ethereum. In this case, we pick two thresholds and consider all the addresses that swap on average with a delay lower than 3 blocks (45 seconds, since Ethereum blocks are mined on average every 15 seconds) and that swap in at least 10 liquidity pools. We find 64 possible sniper bots, only 0.1% of all the Ethereum addresses that swap in exit scam liquidity pools. These addresses swap in 30% of all the exit scam liquidity pools and perform a much smaller fraction of swaps than the BSC sniper bots (3.5% of the total). Interestingly, however, a higher percentage of their swaps (60%) are performed in the same block where the liquidity is added to the liquidity pools.
9 Related Work
Token identification.
In previous work, there are two main approaches to token identification: the behavior-based approach and the interface-based approach.
The behavior-based approach assumes that a token contract has a data structure that maps addresses to the quantity of tokens they own and a function to transfer tokens. Chen et al. [14] observe the EVM execution trace to identify the data structures in smart contracts that implement the bookkeeping of a token. They also detect inconsistencies between the actual behavior of the token and the actions indicated by standard interfaces and the behaviors suggested by standard events.
The interface-based approach, which is the one we take in this work, aims to identify tokens that are compliant with specific interfaces (e.g., the ERC-20 interface). The approach consists of locating in the smart contract bytecode the functions and events they implement. Several works use this approach [15,21,53]. Fröwis et al. [27] combine both approaches and compare their performance. They demonstrate that they are able to detect 99% of the tokens in their ground truth dataset using the interface-based approach.
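The interface-based idea can be sketched as a search of a contract's runtime bytecode for the ERC-20 function selectors and the Transfer event topic. This is a simplified illustration under our own assumptions, not the detection procedure of this work or of the cited papers; practical tools disassemble the bytecode rather than searching hex strings.

```python
# 4-byte selectors of the six mandatory ERC-20 functions.
ERC20_SELECTORS = {
    "18160ddd",  # totalSupply()
    "70a08231",  # balanceOf(address)
    "a9059cbb",  # transfer(address,uint256)
    "23b872dd",  # transferFrom(address,address,uint256)
    "095ea7b3",  # approve(address,uint256)
    "dd62ed3e",  # allowance(address,address)
}
# keccak256("Transfer(address,address,uint256)"), the Transfer event topic.
TRANSFER_TOPIC = "ddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"


def looks_like_erc20(runtime_bytecode_hex: str) -> bool:
    """Heuristic interface check: each selector appears as a PUSH4 operand
    (opcode 0x63) and the Transfer topic as a PUSH32 operand (opcode 0x7f).
    Plain substring search can match at odd nibble offsets or miss selectors
    emitted differently by some compilers, so this is only an approximation."""
    code = runtime_bytecode_hex.lower()
    if code.startswith("0x"):
        code = code[2:]
    has_functions = all(("63" + s) in code for s in ERC20_SELECTORS)
    has_transfer_event = ("7f" + TRANSFER_TOPIC) in code
    return has_functions and has_transfer_event
```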
ERC-20 tokens graphs.
Some studies on ERC-20 tokens fo-
cus on graph analysis. Victor et al. [53] introduce the concept
of Token Networks to study ERC-20 tokens from a network
perspective. Token Networks are graphs where the nodes are
addresses that have owned a specific ERC-20 token and they
are connected if there are transfers of such tokens between
them. They collect a dataset of 75,514 ERC-20 tokens and
build a Token network for 64,393 of them. They analyze the
top 1000 Token Networks and find out that they account for
85% of all the token transfers and that some of them barely
show any activity after the initial token distribution. Chen et
al. [15] expand this work by creating transaction graphs of
the whole token ecosystem, considering more than 160,000
tokens. Tokens have also been explored on other blockchains, although, to the best of our knowledge, no work focuses on BSC. Zheng et al. [58] analyzed the EOSIO token ecosystem via graph analysis, focusing on the graph of token creators, token contract creators, and token holders. Similarly to works on Ethereum, they found that most of the
tokens are “silent” and only 1% of the tokens cover more than
90% of the total token volume. They propose a fake-token de-
tection algorithm to detect anomalous manipulation activities,
finding three suspect cases.
Liquidity pool frauds.
Xia et al. [56] characterize scam tokens on Ethereum, focusing mainly on the Uniswap DEX. First, they leverage CoinMarketCap [17] to obtain a ground truth of official and scam tokens. They use The Graph [28], an indexing service for blockchain data, to obtain 21,778 tokens and 25,131 liquidity pools from May 2020 to December 2020. They expand the dataset using a guilt-by-association heuristic and then use it to train a machine-learning model. Finally, they run their classifier on the extended dataset and find more than 11,182 scam tokens. Mazorra et al. [37] extend the dataset of Xia et al., adding Uniswap data until 3 September 2021, and find another 18 thousand scam tokens. They provide a classification of three different types of rug pulls: simple, sell, and trap-door rug pulls. They also find that more than 97.7% of the tokens labeled as scams are involved in rug pull operations.
10 Discussion
What is the impact of not collecting all the internal transactions?
Unlike other works [15,53], we do not collect all smart contracts generated by internal transactions. We collect smart contracts created directly by EOAs and expand our dataset by adding contracts that emitted at least one Transfer event. This approach could lead to the loss of a small percentage of tokens. We can roughly estimate the ERC-20 tokens we miss by comparing the number of tokens we retrieved with the number of tokens retrieved by Chen et al. [15] at the same block height. Our approach retrieves 146,928 tokens instead of 165,955, approximately 12% fewer. However, it is important to note that, by design, our approach misses only tokens that are never used, traded, or transferred. So, the missing tokens do not represent interesting cases for our study.
Why does it appear that frauds and token spammers are
more frequent in BSC than in Ethereum?
From a technical point of view, the frauds work the same way on the two blockchains. Indeed, since BSC is EVM compliant and PancakeSwap is a fork of Uniswap, the same smart contracts can be used on both blockchains. However, the cost of the operation differs significantly. As we saw in Section 7.3.1, performing the fraud in BSC is cheaper (on average $10.5, with peaks of $600) than in Ethereum (on average $400, with peaks of over $2,000). These costs are fixed for the scammer, so breaking even or making a profit may be more difficult in Ethereum than in BSC.
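A back-of-the-envelope sketch of this argument, using only the average costs quoted above. `drained_usd` is a hypothetical amount the scammer extracts beyond the liquidity they provided; the example ignores peak costs and any other expenses.

```python
# Average per-fraud operating costs in USD, as reported in Section 7.3.1.
AVG_COST_USD = {"BSC": 10.5, "Ethereum": 400.0}


def net_gain(drained_usd, chain):
    """Scammer's net result for one exit scam, treating the cost as fixed."""
    return drained_usd - AVG_COST_USD[chain]


for chain in AVG_COST_USD:
    print(chain, net_gain(100.0, chain))
# BSC 89.5
# Ethereum -300.0

# On average, an Ethereum fraud costs about 38 times more than a BSC one.
print(round(AVG_COST_USD["Ethereum"] / AVG_COST_USD["BSC"], 1))  # 38.1
```

The same hypothetical $100 haul is profitable on BSC but a loss on Ethereum, where the scammer must drain at least $400 on average just to break even.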
Can different users coordinate to carry out the same op-
eration, or can a user use multiple addresses?
In this work, we considered each address as belonging to a distinct user, and we assumed that there is no coordination among addresses. Nonetheless, a user may switch the address used to perform each fraud, either occasionally or at regular intervals. It is also possible that a group of users coordinate to carry out the fraud. For example, one user can create a liquidity pool while the others perform wash trading. A possible approach to detect this malicious behavior is to gather all the transactions among the allegedly involved addresses and look for malicious patterns or communities (e.g., using graph analysis). We do not perform this analysis in this work, but we plan to explore more sophisticated frauds as an extension of it.
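A minimal sketch of a first step in such an analysis, assuming the transfers among suspected addresses are available as (sender, receiver) pairs. It groups addresses into connected components with a union-find structure; this is a coarse stand-in for proper community detection, and `address_clusters` is a hypothetical helper.

```python
from collections import defaultdict


def address_clusters(transfers, suspects):
    """Group suspected addresses that are linked by direct transfers.

    transfers: iterable of (sender, receiver) address pairs.
    suspects: set of addresses allegedly involved in the same operation.
    Returns the groups (sets) containing at least two suspects.
    """
    parent = {addr: addr for addr in suspects}

    def find(addr):
        # Path halving keeps the union-find trees shallow.
        while parent[addr] != addr:
            parent[addr] = parent[parent[addr]]
            addr = parent[addr]
        return addr

    for src, dst in transfers:
        # Only edges between two suspects are considered.
        if src in parent and dst in parent:
            root_src, root_dst = find(src), find(dst)
            if root_src != root_dst:
                parent[root_src] = root_dst

    groups = defaultdict(set)
    for addr in suspects:
        groups[find(addr)].add(addr)
    return [group for group in groups.values() if len(group) > 1]
```

For example, transfers a→b, b→c, and d→e among suspects {a, b, c, d, e, f} yield the groups {a, b, c} and {d, e}, while f stays isolated. A real analysis would also need weighting, time windows, and indirect paths through non-suspect addresses.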
Practical relevance of our work.
On the Uniswap and PancakeSwap websites, it is possible to select the tokens to swap using the token's name, symbol, or address. In our study, we find plenty of fake tokens that attempt to emulate the original ones. An unaware user who selects the pair using the token's name or symbol can erroneously select a fake pair. Both protocols already provide some protection to users by displaying a warning message when a user selects an unofficial pair, that is, a pair not created by the protocol itself. While many unofficial pairs are not fraudulent and have legitimate reasons to be traded, we have evidence that token spammers are also serial scammers, and it is better not to trust their tokens. In light of the results of this work, we believe that swap protocols should implement token spammer detection systems and flag the liquidity pools created by these addresses (or by new addresses linked to spammers) to increase users' awareness. Sniper bots, in turn, can build blocklists of untrusted addresses and avoid swapping into liquidity pools created by token spammers.
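The sketch below illustrates how a swap front end or a sniper bot might act on these observations. It is an illustration under stated assumptions, not a deployed system: token spammers are approximated as the top 1% of creator addresses by number of tokens created (mirroring the 1% figure reported in this work, although the paper's exact criterion may differ), and `official_by_symbol`, a hypothetical mapping from upper-case symbols to official token addresses, stands in for a curated token list. The function names and warning strings are likewise hypothetical.

```python
import math
from collections import Counter


def token_spammers(token_creators, top_fraction=0.01):
    """Approximate token spammers as the top `top_fraction` of creators.

    token_creators: dict mapping token address -> creator address.
    Ties at the cut-off are broken by first appearance (Counter.most_common).
    """
    counts = Counter(token_creators.values())
    k = max(1, math.ceil(len(counts) * top_fraction))
    return {creator for creator, _ in counts.most_common(k)}


def flag_pool(pool_creator, token_address, token_symbol, spammers, official_by_symbol):
    """Return warnings for a liquidity pool; an empty list means no warning."""
    warnings = []
    if pool_creator in spammers:
        warnings.append("pool created by a token spammer")
    official = official_by_symbol.get(token_symbol.upper())
    if official is not None and official != token_address:
        warnings.append("symbol imitates an official token")
    return warnings
```

A sniper bot using this blocklist approach would simply skip any pool for which `flag_pool` returns a non-empty list; a front end would instead show the warnings to the user.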
11 Ethical considerations
In this work, we analyzed two blockchains, Ethereum and BNB Smart Chain, and processed over 3 billion transactions. We looked at the addresses that created tokens on the two blockchains and attempted to understand how the tokens are used. By design, all the data are publicly stored in the blockchain ledger, and EOA addresses are pseudonymous. During our study, we never attempted to correlate users' addresses with external events in ways that could jeopardize their privacy. Moreover, we neither collected nor attempted to retrieve the IP addresses of the users. Consequently, according to our IRB's policy, we did not need any explicit authorization to perform our experiments.
12 Conclusion and Future Work
In this work, we thoroughly investigate the tokens and the
liquidity pools of the BNB Smart Chain and Ethereum. Al-
though the BSC is five years younger than Ethereum, we find
in BSC three times more tokens and liquidity pools than in
Ethereum. One of the most surprising differences between the two blockchains lies in the purpose of the deployed smart contracts. Indeed, in BSC 61% of smart
contracts are token contracts, while in Ethereum, only the
We studied the lifetime of the tokens and who generates them, and found two striking results: 60% of the total tokens of both blockchains do not survive their first day (1-day tokens), and a tiny fraction of addresses (1% of addresses), which we call token spammers, created more than 20% of the tokens. We explored the correlation between token spammers and 1-day tokens and found that token spammers strongly affect the prevalence of 1-day tokens. More interestingly, we find that token spammers use 1-day tokens as disposable tokens to arrange frauds exploiting the mechanism of liquidity pools.
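As an illustration of the two metrics summarized above, the sketch below approximates a token's lifetime as the time between its first and last Transfer events (an assumption; it may not match the exact definition used earlier in the paper) and computes both the share of 1-day tokens and the share of those created by token spammers. The input dictionaries and function names are hypothetical.

```python
ONE_DAY_SECONDS = 24 * 60 * 60


def lifetimes(transfer_timestamps):
    """Lifetime in seconds = last minus first Transfer timestamp (assumed definition).

    transfer_timestamps: dict mapping token address -> list of UNIX timestamps.
    Tokens with no Transfer events are skipped.
    """
    return {tok: max(ts) - min(ts) for tok, ts in transfer_timestamps.items() if ts}


def one_day_stats(lifetime_by_token, token_creators, spammers):
    """Return (share of 1-day tokens, share of 1-day tokens created by spammers)."""
    if not lifetime_by_token:
        return 0.0, 0.0
    one_day = {tok for tok, life in lifetime_by_token.items() if life < ONE_DAY_SECONDS}
    share_one_day = len(one_day) / len(lifetime_by_token)
    from_spammers = sum(1 for tok in one_day if token_creators.get(tok) in spammers)
    share_from_spammers = from_spammers / len(one_day) if one_day else 0.0
    return share_one_day, share_from_spammers
```

A token with a single Transfer event gets a lifetime of zero under this definition and therefore counts as a 1-day token.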
We selected from our dataset all the liquidity pools that show evidence of the fraud and used the retrieved data to dissect the frauds from several perspectives. Finally, we introduced sniper bots, trading bots that aim to buy tokens at their listing price. Because of the way they operate, however, they unwittingly become victims of the scam.
As future work, we believe it would be valuable to refine our results by verifying which addresses cooperate to perpetrate exit scams and including them in the analysis. It would also be possible to search for other, more sophisticated malicious patterns. Finally, another promising direction is to explore sniper bots further, to provide a more detailed analysis of their types and operations.
References
[1] Hayden Adams, Noah Zinsmeister, and Dan Robinson. Uniswap v2 core. 2020.
[2] adamsnipes. Pancakeswap bot & uniswap bot. //, 2022.
[3] Faten Adel Alabdulwahhab. Web 3.0: the decentralized web blockchain networks and protocol innovation. In 2018 1st International Conference on Computer Applications & Information Security (ICCAIS), pages 1–4. IEEE, 2018.
[4] ApeSwap. Apeswap.
[5] Osato Avan-Nomayo. Pancakeswap dex reportedly set to block users from iran. https://www.theblockcr rep ortedly-set- to-block-users- from-iran, 2022.
[6] BakerySwap. Bakeryswap. https://www.bakerysw, 2022.
[7] Massimo Bartoletti, Salvatore Carta, Tiziana Cimoli, and Roberto Saia. Dissecting ponzi schemes on ethereum: identification, analysis, and impact. Future Generation Computer Systems, 102:259–277, 2020.
[8] Binance. Binance chain docs - json-rpc endpoint. /rpc.html, 2022.
[9] Binance. Bnb chain documentation. https://docs.b, 2022.
[10] Binance. Proof of authority explained. https://acad autho rity-explained, 2022.
[11] Steven Brock. Scalping in ecommerce: Ethics and impacts. Available at SSRN 3793357, 2021.
[12] Vitalik Buterin et al. A next-generation smart contract and decentralized application platform. White paper, 3(37), 2014.
[13] Yi Cao, Yuhua Li, Sonya Coleman, Ammar Belatreche, and Thomas Martin McGinnity. Detecting wash trade in financial market using digraphs and dynamic programming. IEEE transactions on neural networks and learning systems, 27(11):2351–2363, 2015.
[14] Ting Chen, Yufei Zhang, Zihao Li, Xiapu Luo, Ting Wang, Rong Cao, Xiuzhuo Xiao, and Xiaosong Zhang. Tokenscope: Automatically detecting inconsistent behaviors of cryptocurrency tokens in ethereum. In Proceedings of the 2019 ACM SIGSAC conference on computer and communications security, pages 1503–1520, 2019.
[15] Weili Chen, Tuo Zhang, Zhiguang Chen, Zibin Zheng, and Yutong Lu. Traveling the token world: A graph analysis of ethereum erc20 token ecosystem. In Proceedings of The Web Conference 2020, pages 1411–1421, 2020.
[16] CoinGecko. Coingecko api. https://www.coingeck, 2022.
[17] CoinMarketCap. Coinmarketcap. https://coinmark, 2022.
[18] Cronos docs. s/getting-started/, 2022.
[19] damartripamungkas. Botdexdamar. .com/damartripamungkas/botdexdamar, 2022.
[20] Chris Dannen. Introducing Ethereum and solidity, volume 1. Springer, 2017.
[21] Monika Di Angelo and Gernot Salzer. Identification of token contracts on ethereum: standard compliance and beyond. International Journal of Data Science and Analytics, pages 1–20, 2021.
[22] Morris J Dworkin et al. Sha-3 standard: Permutation-based hash and extendable-output functions. 2015.
[23] Ethereum. Contract abi specification.
[24] Ethereum. Ethereum virtual machine (evm). https://, 2022.
[25] Fabian Vogelsteller and Vitalik Buterin. Eip-20: Token standard.
[26] Fantom Foundation. Fantom whitepaper. f, 2022.
[27] Michael Fröwis, Andreas Fuchs, and Rainer Böhme. Detecting token systems on ethereum. In International conference on financial cryptography and data security, pages 93–112. Springer, 2019.
[28] The Graph. The graph: Apis for a vibrant decentralized future., 2022.
[29] Infura. Infura., 2022.
[30] Don Johnson, Alfred Menezes, and Scott Vanstone. The elliptic curve digital signature algorithm (ecdsa). International journal of information security, 1(1):36–63, 2001.
[31] P.C. Kotsias. pcko1/etherscan-python. https://gith, 2020.
[32] P.C. Kotsias. pcko1/bscscan-python. .com/pcko1/bscscan-python, 2021.
[33] Massimo La Morgia, Alessandro Mei, Francesco Sassi, and Julinda Stefa. Pump and dumps in the bitcoin era: Real time detection of cryptocurrency market manipulations. In 2020 29th International Conference on Computer Communications and Networks (ICCCN), pages 1–9. IEEE, 2020.
[34] Massimo La Morgia, Alessandro Mei, Francesco Sassi, and Julinda Stefa. The doge of wall street: Analysis and detection of pump and dump cryptocurrency manipulations. arXiv preprint arXiv:2105.00733, 2021.
[35] Solidity Lang. Contract abi specification. l, 2022.
[36] Defi Llama. Defi llama.
[37] Bruno Mazorra, Victor Adan, and Vanesa Daza. Do not rug on me: Leveraging machine learning techniques for automated scam detection. Mathematics, 10(6):949, 2022.
[38] Mdex. Mdex., 2022.
[39] Evgeny Medvedev and the D5 team. Ethereum etl. s:// etl
[40] MetaMask. A crypto wallet & gateway to blockchain apps., 2022.
[41] MyEtherWallet. Myetherwallet. https://www.myet, 2022.
[42] Jason Carver and Piper Merriam. https://web3, 2022.
[43] Valerio Puggioni. Crypto rug pulls: What is a rug pull in crypto and 6 ways to spot it. https://cointelegrap what-is- a -rug- pull-in-crypto-and-6- ways-to-spot- it
[44] PumpBot. Sniper bot crypto: Chain sniper- the all in one sniper bot, dex bot, pinksale bot. https://pump-b bot-frontrunner- chainsniper-dexbot, 2022.
[45] saantiaguilera. Ax-50 liquidity sniper.
[46] Kevin Sekniqi, Daniel Laine, Stephen Buttolph, and Emin Gün Sirer. Avalanche Platform, volume 1. online,
[47] Srivats Srinivasan. Human or bot. PhD thesis, California State University, Sacramento, 2017.
[48] Eva Su. Digital Assets and SEC Regulation. Congressional Research Service, 2020.
[49] Martin Holst Swende and Marius van der Wijden. Eip-3155: Evm trace specification. https://eips.ether, 2022.
[50] Trading-Tiger. Pancakeswap bsc sniper bot. _Sniper_Bot, 2022.
[51] Uniswap. Uniswap v2 license. /Uniswap/v2-core/blob/master/LICENSE, 2022.
[52] Uniswap. Uniswap v3 license. /Uniswap/v3-core/blob/main/LICENSE, 2022.
[53] Friedhelm Victor and Bianca Katharina Lüders. Measuring ethereum-based erc20 token networks. In International Conference on Financial Cryptography and Data Security, pages 113–129. Springer, 2019.
[54] Trust Wallet. Trust wallet. m/, 2022.
[55] Gavin Wood. Ethereum yellow paper: A formal specification of ethereum, a programmable blockchain. 2018. URL https://github.com/ethereum/yellowpaper.
[56] Pengcheng Xia, Haoyu Wang, Bingyu Gao, Weihang Su, Zhou Yu, Xiapu Luo, Chao Zhang, Xusheng Xiao, and Guoai Xu. Trade or trick? detecting and characterizing scam tokens on uniswap decentralized exchange. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 5(3):1–26, 2021.
[57] Dirk A Zetzsche, Douglas W Arner, and Ross P Buckley. Decentralized finance. Journal of Financial Regulation, 6(2):172–203, 2020.
[58] Weilin Zheng, Bo Liu, Hong-Ning Dai, Zigui Jiang, Zibin Zheng, and Muhammad Imran. Unravelling token ecosystem of eosio blockchain. arXiv preprint arXiv:2202.11201, 2022.
A Appendix
Table 5: Functions and events of the ERC-20 (Ethereum) and BEP-20 (Binance Smart Chain) standard interfaces. We report in yellow the methods that are optional in the ERC-20 interface and in red the only method that is optional in both standards.

Function                                 Signature
name()                                   06fdde03
symbol()                                 95d89b41
decimals()                               313ce567
totalSupply()                            18160ddd
balanceOf(address)                       70a08231
transfer(address,uint256)                a9059cbb
transferFrom(address,address,uint256)    23b872dd
approve(address,uint256)                 095ea7b3
allowance(address,address)               dd62ed3e

Event                                    Signature
Transfer(address,address,uint256)        ddf252ad
Approval(address,address,uint256)        8c5be1e5
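The signatures in Table 5 are the first four bytes of the Keccak-256 hash of each canonical signature string; for events, the full hash is the first topic of the emitted log. The dependency-free sketch below recomputes them. It implements the original Keccak padding (pad byte 0x01) used by Ethereum; Python's `hashlib.sha3_256` implements FIPS 202 SHA3-256 [22], whose padding differs, so it yields different digests. This is an illustrative, unoptimized implementation, and the function names are our own.

```python
MASK64 = (1 << 64) - 1


def _rol(v, n):
    """Rotate a 64-bit lane left by n bits."""
    n %= 64
    return ((v << n) | (v >> (64 - n))) & MASK64


def _rc_bit(t):
    """Output bit of the Keccak round-constant LFSR after t steps."""
    r = 1
    for _ in range(t % 255):
        r <<= 1
        if r & 0x100:
            r ^= 0x171  # feedback polynomial x^8 + x^6 + x^5 + x^4 + 1
    return r & 1


# 24 round constants for the iota step (bits at positions 2^j - 1).
RC = [sum(1 << ((1 << j) - 1) for j in range(7) if _rc_bit(j + 7 * i)) for i in range(24)]

# Rotation offsets for the rho step, indexed ROT[x][y].
ROT = [[0] * 5 for _ in range(5)]
_x, _y = 1, 0
for _t in range(24):
    ROT[_x][_y] = ((_t + 1) * (_t + 2) // 2) % 64
    _x, _y = _y, (2 * _x + 3 * _y) % 5


def _keccak_f1600(a):
    """Apply the 24-round Keccak-f[1600] permutation to a 5x5 lane state a[x][y]."""
    for rnd in range(24):
        # theta: XOR each lane with the parities of two neighbouring columns
        c = [a[x][0] ^ a[x][1] ^ a[x][2] ^ a[x][3] ^ a[x][4] for x in range(5)]
        d = [c[(x - 1) % 5] ^ _rol(c[(x + 1) % 5], 1) for x in range(5)]
        a = [[a[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho + pi: rotate each lane and move it to a new position
        b = [[0] * 5 for _ in range(5)]
        for x in range(5):
            for y in range(5):
                b[y][(2 * x + 3 * y) % 5] = _rol(a[x][y], ROT[x][y])
        # chi: non-linear mixing along each row
        a = [[b[x][y] ^ (~b[(x + 1) % 5][y] & b[(x + 2) % 5][y]) for y in range(5)]
             for x in range(5)]
        # iota: break symmetry with the round constant
        a[0][0] ^= RC[rnd]
    return a


def keccak256(data: bytes) -> bytes:
    """Original Keccak-256 (rate 136 bytes, pad byte 0x01), as used by Ethereum."""
    rate = 136
    padded = bytearray(data) + b"\x01"
    padded += b"\x00" * (-len(padded) % rate)
    padded[-1] |= 0x80
    state = [[0] * 5 for _ in range(5)]
    for off in range(0, len(padded), rate):
        block = padded[off:off + rate]
        for i in range(rate // 8):
            state[i % 5][i // 5] ^= int.from_bytes(block[8 * i:8 * i + 8], "little")
        state = _keccak_f1600(state)
    return b"".join(state[i % 5][i // 5].to_bytes(8, "little") for i in range(4))


def signature_prefix(signature: str) -> str:
    """First 4 bytes (8 hex digits) of Keccak-256 over the canonical signature."""
    return keccak256(signature.encode("ascii"))[:4].hex()


print(signature_prefix("transfer(address,uint256)"))          # a9059cbb
print(signature_prefix("Approval(address,address,uint256)"))  # 8c5be1e5
```

Note that the Approval event hashes to 8c5be1e5, whereas 095ea7b3 is the selector of the approve(address,uint256) function; the two are easy to confuse because they share a name.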