Chapter

Analysis of Ethereum Smart Contracts and Opcodes


Abstract

Much attention has been paid in recent years to the use of smart contracts. A smart contract is a transaction protocol that executes the terms of an agreement. Ethereum is a widely used platform for executing smart contracts, defined by using a Turing-complete language. Various studies have been performed in order to analyse smart contract data from different perspectives. In our study we gather a wide range of verified smart contracts written by using the Solidity language and we analyse their code. A similar study is carried out on Solidity compilers. The aim of our investigation is the identification of the smart contract functionalities, i.e. opcodes, that play a crucial role in practice, and single out those functionalities that are not practically relevant.
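The kind of opcode analysis the abstract describes can be illustrated with a minimal sketch, which is not the authors' tooling. It decodes a hex string of EVM bytecode into mnemonics and counts how often each one occurs. The `OPCODES` table is a deliberately partial, hand-picked subset of the instruction set, and the function names `disassemble` and `opcode_histogram` are hypothetical.

```python
from collections import Counter

# Partial opcode table: an illustrative subset, not the full EVM instruction set.
OPCODES = {
    0x00: "STOP", 0x01: "ADD", 0x02: "MUL", 0x03: "SUB", 0x06: "MOD",
    0x10: "LT", 0x14: "EQ", 0x15: "ISZERO", 0x34: "CALLVALUE",
    0x35: "CALLDATALOAD", 0x50: "POP", 0x51: "MLOAD", 0x52: "MSTORE",
    0x54: "SLOAD", 0x55: "SSTORE", 0x56: "JUMP", 0x57: "JUMPI",
    0x5B: "JUMPDEST", 0xF3: "RETURN", 0xFD: "REVERT",
}


def disassemble(bytecode_hex):
    """Return the list of opcode mnemonics found in a hex bytecode string."""
    if bytecode_hex.startswith("0x"):
        bytecode_hex = bytecode_hex[2:]
    code = bytes.fromhex(bytecode_hex)
    mnemonics = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        if 0x60 <= op <= 0x7F:
            # PUSH1..PUSH32 are followed by 1..32 bytes of immediate data.
            width = op - 0x5F
            mnemonics.append(f"PUSH{width}")
            pc += width  # skip the immediate bytes so they are not decoded
        elif 0x80 <= op <= 0x8F:
            mnemonics.append(f"DUP{op - 0x7F}")
        elif 0x90 <= op <= 0x9F:
            mnemonics.append(f"SWAP{op - 0x8F}")
        else:
            # Bytes missing from the partial table are labelled as unknown.
            mnemonics.append(OPCODES.get(op, f"UNKNOWN_0x{op:02x}"))
        pc += 1
    return mnemonics


def opcode_histogram(bytecode_hex):
    """Count how often each mnemonic occurs in the bytecode."""
    return Counter(disassemble(bytecode_hex))
```

For example, `opcode_histogram("0x6080604052")` counts two `PUSH1` instructions and one `MSTORE`. Summing such histograms over many contracts would give a rough view of which opcodes dominate in practice.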

... As shown in Table 2, the opcodes comprise instruction mnemonics (e.g., ADD, SSTORE, JUMP, LT, MOD) as well as their operands. The use of operation codes has been successfully applied to various fundamental issues of contract accounts in previous studies [18,39]. Therefore, we extract the n-gram features from the opcode sequences of the contract accounts to detect abnormal schemes. ...
Article
Full-text available
Blockchain technology has allowed many abnormal schemes to hide behind smart contracts. This causes serious financial losses, which adversely affects the blockchain. Machine learning technology has mainly been utilized to enable automatic detection of abnormal contract accounts in recent years. In spite of this, previous machine learning methods have suffered from a number of disadvantages: first, it is extremely difficult to identify features that enable accurate detection of abnormal contracts, and based on these features, statistical analysis is also ineffective. Second, they ignore the imbalances and repeatability of smart contract accounts, which often results in overfitting of the model. In this paper, we propose a data-driven robust method for detecting abnormal contract accounts over the Ethereum Blockchain. This method comprises hybrid features set by integrating opcode n-grams, transaction features, and term frequency-inverse document frequency source code features to train an ensemble classifier. The extra-trees and gradient boosting algorithms based on weighted soft voting are used to create an ensemble classifier that balances the weaknesses of individual classifiers in a given dataset. The abnormal and normal contract data are collected by analyzing the open source etherscan.io, and the problem of the imbalanced dataset is solved by performing the adaptive synthetic sampling. The empirical results demonstrate that the proposed individual feature sets are useful for detecting abnormal contract accounts. Meanwhile, combining all the features enhances the detection of abnormal contracts with significant accuracy. The experimental and comparative results show that the proposed method can distinguish abnormal contract accounts for the data-driven security of blockchain Ethereum with satisfactory performance metrics.
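The opcode n-gram features mentioned in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the cited paper's pipeline: the input is assumed to be a list of mnemonic strings already extracted from a contract, and the function names `opcode_ngrams` and `ngram_frequencies` are hypothetical.

```python
from collections import Counter


def opcode_ngrams(mnemonics, n):
    """Count contiguous n-grams in a sequence of opcode mnemonics."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # zip over n shifted views of the sequence yields every window of length n.
    windows = zip(*(mnemonics[i:] for i in range(n)))
    return Counter(windows)


def ngram_frequencies(mnemonics, n):
    """Normalise n-gram counts to relative frequencies (empty dict if none)."""
    counts = opcode_ngrams(mnemonics, n)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: count / total for gram, count in counts.items()}
```

With the sequence `["PUSH1", "PUSH1", "ADD", "PUSH1", "PUSH1", "ADD"]` and `n = 2`, the bigram `("PUSH1", "PUSH1")` occurs twice out of five windows. Frequency vectors of this kind could then be fed to a classifier, but the choice of model is outside this sketch.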
... For SC data specifically, [22] classify SCs by analyzing the data field extracted from transactions collected from Etherscan and traces collected from the Parity Ethereum client. [4] generates histograms of OPCodes after collecting contracts from Etherscan. However, the literature survey conducted for this paper has not revealed a broad arbitrary media analysis done on the Ethereum BC. ...
Conference Paper
Since the proposal of Bitcoin in 2009 and with the inclusion of the first transaction in its genesis block, Blockchains (BC) have been used to store arbitrary data, including texts, images, and documents. However, such data is often not easily discoverable in BCs and is embedded within their binary data structures. Thus, this paper presents the design and implementation of a solution to analyze BC transactions searching for “media” content. This solution, called blockchain-parser, is capable of detecting ASCII strings and files (e.g., PDF, GIF, and SVG) embedded in BC's transactions. To evaluate such a solution, Bitcoin, Monero, and Ethereum cryptocurrencies were examined to find commonalities and differences between different BCs regarding their arbitrary data storage usage. Conclusions from such an evaluation indicate that Ethereum has been the most used BC for media data storage compared to Bitcoin and Monero.
... Blockchain is a system for storing data in a manner that makes system alterations, hacking, and cheating difficult or impossible [1]. A blockchain is a network of computer systems that duplicates and distributes a digital ledger of transactions across the entire network [2,3]. ...
Article
Full-text available
Blockchain may be an optimal solution when a detailed and transparent record of assets is necessary. It is imperative to manage and safeguard digital interactions or maintain a decentralized and shared system of records in applications, such as those used for electricity production, transmission, distribution, and consumption and those used for data sharing and secure payments. Such applications can benefit from blockchain technology to resolve these problems. In the proposed blockchain-based consumer electronics data sharing and safe payment framework, an innovative IoT meter detects monthly consumption and transmits the data to a decentralized application that is stored in the blockchain. This decentralized platform will generate the bill and provide incentives for legitimate consumers. Finally, the end-to-end latency and throughput were used to evaluate the performance of the proposed approach.
... Underpriced operations were changed afterwards in the Ethereum protocol. The execution frequency of single opcodes was analyzed [73] and correlated with source code features [46]. For gas cost profiling, the results are presented either as single opcodes [20], using the categories from the Ethereum yellow paper [11], or on source code line level [21]. ...
Conference Paper
Users must pay a fee depending on resource consumption when using smart contracts on the Ethereum blockchain. As even the most basic operations cost several dollars under moderate network load, developers may actively reduce user-paid fees by optimizing the smart contract resource consumption ('gas costs'). Previous works suggested patterns and tools supporting developers in gas cost optimization, but up to now a comprehensive analysis of their real-world impact is missing. Another gap is the maintenance and evolution support for smart contracts leveraging the publicly available usage data. We propose high-level gas cost profiles and review which profiles are considered in the existing literature. Additionally, we sampled around 68,000 smart contract interactions from three years, analyzed them using the gas cost profiles, and compared the findings to the current focus in the literature. In our data set, external code, storage, and the transaction base fee are first-level cost drivers in terms of absolute gas usage, but contract deployment also becomes costly when considering the average gas usage per transaction. Our analysis also shows that plenty of previous work focused on cost categories that barely influence resource consumption.
... These obtained features are fed to the model trained using machine learning algorithms to identify the smart contract either as a Ponzi contract or a non-Ponzi contract. Opcodes have been successfully used in previous studies to analyze different fundamental issues of smart contracts [15,16]. Thus, we extracted various types of opcode features to evaluate the performance of the proposed method. ...
Chapter
Full-text available
Blockchain-based currencies, e.g., Ethereum, have increased in popularity among followers since 2009. However, scammers have customized offline frauds to this new ecosystem depending on blockchain's anonymity. As a result, smart Ponzi contracts are circulating on Ethereum, which appear to be secure investment schemes. We employ data mining techniques to present an effective detection model for smart Ponzi contracts over the Ethereum blockchain. First, we extended the dataset of smart Ponzi contracts and eliminated the imbalanced dataset by performing adaptive synthetic sampling. Next, we defined four kinds of feature sets based on the operation codes (opcodes) of smart contracts such as opcode frequency, count vector, n-gram Term Frequency-Inverse Document Frequency (TF-IDF), and opcode sequence features. It is noteworthy that the feature sets are based on the opcodes of smart contracts, which makes our model more reliable once the smart contract is uploaded to the Ethereum Blockchain. Finally, we designed an ensemble classification model combining Bagging-Tree and XGBoost classifiers, compared to other methods, to increase the detection accuracy of smart Ponzi contracts. The empirical and comparative results show that the ensemble model with only n-gram based features presents the best performance and achieves high precision and recall.
Keywords: Ethereum blockchain; Ponzi contracts; Opcode features; Machine learning
... When it encounters "EXTERNALINFOEND," the external transaction ends. The information that can be obtained at this time is whether the transaction is successful and all the gas spent by the exchange, and then all the data is written [35]. Then the signs of the start of internal transactions are mainly "CALLSTART," "CREATESTART," "CREATE2START," "CALLCODESSTART," "DELEGATECALLSTART," "STATICCALLSTART," and then collect the transaction amount of both parties during this period. ...
Article
Full-text available
Since the Ethereum virtual machine is Turing complete, Ethereum can implement various complex logics such as mutual calls and nested calls between functions. Therefore, Ethereum has suffered a lot of attacks since its birth, and there are still many attackers active in Ethereum transactions. To this end, we propose a traceability method on Ethereum, using graph analysis to track attackers. We collected complete user transaction data to construct the graph and analyzed data on several harmful attacks, including reentry attacks, short address attacks, DDoS attacks, and Ponzi contracts. Through graph analysis, we found accounts that are strongly associated with these attacks and are still active. We have done a systematic analysis of these accounts to analyze their threats. Finally, we also analyzed the correlation between the information collected through RPC and these accounts and found that the IP addresses of some accounts can be identified.
... As such, they are very much suitable for the detection of latent problems in smart contracts. Previous studies on various fields like Ethereum smart contracts [28], [29] and malware analysis [30] have shown that opcode provides a reliable and accurate analysis method for security threat detection. Fig. 2 shows the number of opcodes having different lengths after conversion of bytecodes. ...
... Mohanta et al. (2018) introduced seven use cases for smart contracts, including supply chain, IoT, and healthcare systems. Many empirical studies also focus on the performance of smart contract tools (Perez and Livshits 2019; Parizi et al. 2018a), programming languages (Harz and Knottenbelt 2018; Schrans et al. 2018; Parizi et al. 2018b), ecosystem (Kiffer et al. 2018; He et al. 2019; Hegedűs 2019), permissions (Vukolić 2017), design patterns (Bartoletti and Pompianu 2017), life cycle (Di and Salzer 2019), and call relations (Bistarelli et al. 2019). Durieux et al. (2020) presented an empirical study of 9 state-of-the-art smart contract vulnerability analysis tools. ...
Article
Full-text available
Software development is a very broad activity that captures the entire life cycle of a software, which includes designing, programming, maintenance and so on. In this study, we focus on the maintenance-related concerns of the post-deployment of smart contracts. Smart contracts are self-executed programs that run on a blockchain. They cannot be modified once deployed and hence they bring unique maintenance challenges compared to conventional software. According to the definition of ISO/IEC 14764, there are four kinds of software maintenance, i.e., corrective, adaptive, perfective, and preventive maintenance. This study aims to answer (i) What kinds of issues will smart contract developers encounter for corrective, adaptive, perfective, and preventive maintenance after they are deployed to the Ethereum? (ii) What are the current maintenance-related methods used for smart contracts? To obtain the answers to these research questions, we first conducted a systematic literature review to analyze 131 smart contract related research papers published from 2014 to 2020. Since the Ethereum ecosystem is fast-growing, some results from previous publications might be out-of-date and there may be a gap between academia and industry. To address this, we performed an online survey of smart contract developers on Github to validate our findings and received 165 useful responses. Based on the survey feedback and literature review, we present the first empirical study on smart contract maintenance-related concerns. Our study can help smart contract developers better maintain their smart contract-based projects, and we highlight some key future research directions to improve the Ethereum ecosystem.
... 2) Ethereum Block Explorers: Ethereum block explorers are platforms that allow the users to explore and search the Ethereum blockchain for transactions, addresses, tokens and other activities taking place on the Ethereum blockchain (20). Unlike GitHub, the Ethereum block explorers allow accessing only Ethereum data used in the Ethereum blockchain and thus smart contracts' real use-cases. ...
Article
Full-text available
Many empirical software engineering studies show that there is a need for repositories where source codes are acquired, filtered and classified. During the last few years, Ethereum block explorer services have emerged as a popular project to explore and search for Ethereum blockchain data such as transactions, addresses, tokens, smart contracts’ source codes, prices and other activities taking place on the Ethereum blockchain. Despite the availability of this kind of service, retrieving specific information useful to empirical software engineering studies, such as the study of smart contracts’ software metrics, might require many subtasks, such as searching for specific transactions in a block, parsing files in HTML format, and filtering the smart contracts to remove duplicated code or unused smart contracts. In this paper, we address this problem by creating Smart Corpus, a corpus of smart contracts in an organized, reasoned and up-to-date repository where Solidity source code and other metadata about Ethereum smart contracts can easily and systematically be retrieved. We present Smart Corpus’s design and its initial implementation, and we show how the data set of smart contracts’ source codes in a variety of programming languages can be queried and processed to get useful information on smart contracts and their software metrics. Smart Corpus aims to create a smart-contract repository where smart-contract data (source code, application binary interface (ABI) and byte code) are freely and immediately available and are classified based on the main software metrics identified in the scientific literature. Smart contracts’ source codes have been validated by EtherScan, and each contract comes with its own associated software metrics as computed by the freely available software PASO. Moreover, Smart Corpus can be easily extended as the number of new smart contracts increases day by day.
... Ethereum block explorers are platforms that allow the users to explore and search the Ethereum blockchain for transactions, addresses, tokens and other activities taking place on the Ethereum blockchain (25). Unlike GitHub, the Ethereum block explorers allow accessing only Ethereum data used in the Ethereum blockchain and thus smart contracts' real use-cases. ...
Preprint
Many empirical software engineering studies show that there is a great need for repositories where source code is acquired, filtered and classified. During the last few years, Ethereum block explorer services have emerged as a popular project to explore and search Ethereum blockchain data such as transactions, addresses, tokens, smart contracts' source code, prices and other activities taking place on the Ethereum blockchain. Despite the availability of this kind of service, retrieving specific information useful to empirical software engineering studies, such as the study of smart contracts' software metrics, might require many sub-tasks, such as searching for specific transactions in a block, parsing files in HTML format and filtering the smart contracts to remove duplicated code or unused smart contracts. In this paper we address this problem by creating Smart Corpus, a corpus of smart contracts in an organized, reasoned and up-to-date repository where Solidity source code and other metadata about Ethereum smart contracts can easily and systematically be retrieved. We present the design of Smart Corpus and its initial implementation, and we show how the data set of smart contracts' source code in a variety of programming languages can be queried and processed to get useful information on smart contracts and their software metrics. Smart Corpus aims to create a smart-contract repository where smart-contract data (source code, ABI and bytecode) are freely and immediately available and are also classified based on the main software metrics identified in the scientific literature. The smart contracts' source code has been validated by EtherScan, and each contract comes with its own associated software metrics as computed by the freely available software PASO. Moreover, Smart Corpus can be easily extended, as the number of new smart contracts increases day by day.
Article
Blockchain is an up-and-coming technology designed to ease the process of verifying legitimate products without the need for a centralized system. Well-known examples of technologies using Blockchain are the Bitcoin and Ethereum cryptocurrencies. Blockchain technology will ensure that data residing within each block cannot be tampered with by anyone other than the owner. This paper uses Blockchain technology to develop a system where customers can validate a product's legitimacy without the need for a corresponding merchant. This system can be used by manufacturers and companies to ensure that their products are harder to counterfeit, so that they gain more trust from customers. The Ethereum blockchain is used to build the proposed model, which is capable of tracing every item's creation and transactions, ensuring the credibility of an item's genuineness. Our simulation results in an effective model and a cost analysis that shows that the model uses an average gas amount of 715,046.3 gwei.
Article
In the Ethereum blockchain, whenever a smart contract transaction is executed, a transaction fee is charged in Ether. To calculate the transaction fee, a computational unit, gas, is introduced in smart contracts. Gas consumption is calculated from the execution of the smart contract source code. The transaction initiator sets the gas price per unit of gas and the total gas limit. If the gas limit is sufficient, the transaction will be mined; otherwise it will be reverted. Smart contracts of Ethereum can be written in any high-level language such as Solidity, Vyper, Python, Java and so forth, but Solidity is widely used for smart contract creation. In this article, we have examined 5000 transactions of Solidity-based smart contracts from Etherscan and performed statistical analysis on opcodes and source code parameters used in these transactions to identify gas-costly patterns. Our statistical results (correlation and regression) analyze the relationship of Solidity parameters and opcodes with the gas consumption. Factors causing an increase or decrease in the gas consumption of smart contracts are highlighted in this article. The regression analysis showed that 87.8% of the variability in the response variable (gas consumption) is due to the parameters used in this analysis. Our results will help smart contract developers to write gas-optimized smart contracts. The results can also benefit end users, as they will have to pay the gas price for fewer gas units.
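The fee arithmetic summarised in this abstract can be sketched as follows. This is a simplified illustration that assumes the legacy pricing model, in which the fee is the gas consumed multiplied by a single sender-chosen gas price. The function names are hypothetical, and the gas price is assumed to be a whole number of gwei so that all arithmetic stays in integers.

```python
WEI_PER_GWEI = 10**9    # 1 gwei = 10^9 wei
WEI_PER_ETHER = 10**18  # 1 ether = 10^18 wei


def transaction_fee_wei(gas_used, gas_price_gwei):
    """Fee in wei for a transaction that consumed `gas_used` units of gas."""
    return gas_used * gas_price_gwei * WEI_PER_GWEI


def gas_limit_is_sufficient(gas_limit, gas_required):
    """True if the sender's gas limit covers the gas the execution needs."""
    return gas_limit >= gas_required


# A plain Ether transfer consumes 21,000 gas; at 20 gwei per unit of gas:
fee = transaction_fee_wei(21_000, 20)
print(fee)                  # fee in wei
print(fee / WEI_PER_ETHER)  # the same fee expressed in ether
```

Because the fee grows linearly with the gas consumed, any source-level pattern that removes costly opcodes from a frequently executed path lowers the amount the end user pays.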
Article
Blockchain provides a decentralized environment for applications and information systems in various fields. It is an innovative revolution for the traditional Internet. However, without proper regulatory mechanisms, blockchain technology has gradually become a hotbed of criminal activities, such as Ponzi schemes that bring huge economic losses to people. To maintain the security of the blockchain system, machine learning techniques, which can detect smart Ponzi schemes automatically, have recently received extensive attention. However, existing methods have potential target leakage and prediction shift problems when dealing with category features and calculating gradient estimates. Besides, they also ignore the imbalance and repeatability of smart contracts, which often causes the model to overfit. In this paper, we introduce a novel method for detecting smart Ponzi schemes in blockchain. Specifically, we first expand the dataset of smart Ponzi schemes and eliminate the unbalanced dataset via data enhancement. Then, we leverage ordered target statistics (TS) to handle the category features of smart contracts without target leakage. Finally, we propose an anti-leakage smart Ponzi schemes detection (Al-SPSD) model based on the idea of ordered boosting. Experimental results show that our proposal outperforms the competitive methods and is effective and reliable in detecting smart Ponzi schemes. Al-SPSD achieves 96% F-score and detects about 1,621 active smart Ponzi schemes in Ethereum.
Article
Full-text available
In Bitcoin, the most common kind of transaction is of the form “Bob pays Alice,” and it is based on the Pay-to-Public-Key-Hash (P2PKH) script, which is resolved by sending the public key and a digital signature created by the corresponding private key. P2PKH transactions are just one among many standard classes: a transaction is standard if it passes Bitcoin Core IsStandard() and IsStandardTx() tests. However, the creation of ad-hoc scripts to lock (and unlock) transactions also allows for generating non-standard transactions, which can nevertheless be broadcast and mined as well. In this work, we explore the Bitcoin block-chain with the aim of analyzing and classifying standard and non-standard transactions, understanding how much the standard behavior is respected.
Conference Paper
Full-text available
Bitcoin is a cryptocurrency and a peer-to-peer payment system, where transactions directly take place between pseudo-anonymous users, without any centralised authority. Since the block-chain (i.e., the public ledger where transactions are registered) is an example of Big Data, a straightforward visualisation is not very informative. For this reason, we employ techniques from Visual Analytics to filter out undesired information in order to obtain a tool to visually analyse the transactions and help its analysis. For instance, different views can highlight miners, or sources and leaves of bitcoin flows, together with the balance of each address and transaction. Moreover, the main view sees transactions as grouped into disconnected "islands", making it possible to focus on only one of them at once.
Article
Full-text available
Smart contracts are computer programs that can be consistently executed by a network of mutually distrusting nodes, without the arbitration of a trusted authority. Because of their resilience to tampering, smart contracts are appealing in many scenarios, especially in those which require transfers of money to respect certain agreed rules (like in financial services and in games). Over the last few years many platforms for smart contracts have been proposed, and some of them have been actually implemented and used. We study how the notion of smart contract is interpreted in some of these platforms. Focussing on the two most widespread ones, Bitcoin and Ethereum, we quantify the usage of smart contracts in relation to their application domain. We also analyse the most common programming patterns in Ethereum, where the source code of smart contracts is available.
Article
Full-text available
Half a decade after Bitcoin became the first widely used cryptocurrency, blockchains are receiving considerable interest from industry and the research community. Modern blockchains feature services such as name registration and smart contracts. Some employ new forms of consensus, such as proof-of-stake instead of proof-of-work. However, these blockchains are so far relatively poorly investigated, despite the fact that they move considerable assets. In this paper, we explore three representative, modern blockchains---Ethereum, Namecoin, and Peercoin. Our focus is on the features that set them apart from the pure currency use case of Bitcoin. We investigate the blockchains' activity in terms of transactions and usage patterns, identifying some curiosities in the process. For Ethereum, we are mostly interested in the smart contract functionality it offers. We also carry out a brief analysis of issues that are introduced by negligent design of smart contracts. In the case of Namecoin, our focus is how the name registration is used and has developed over time. For Peercoin, we are interested in the use of proof-of-stake, as this consensus algorithm is poorly understood yet used to move considerable value. Finally, we relate the above to the fundamental characteristics of the underlying peer-to-peer networks. We present a crawler for Ethereum and give statistics on the network size. For Peercoin and Namecoin, we identify the relatively small size of the networks and the weak bootstrapping process.
Conference Paper
Ethereum is the second most valuable cryptocurrency today, with a current market cap of over $68B. What sets Ethereum apart from other cryptocurrencies is that it uses the blockchain to not only store a record of transactions, but also smart contracts and a history of calls made to those contracts. Thus, Ethereum represents a new form of distributed system: one where users can implement contracts that can provide functionality such as voting protocols, crowdfunding projects, betting agreements, and many more. However, despite the massive investment, little is known about how contracts in Ethereum are actually created and used. In this paper, we examine how contracts in Ethereum are created, and how users and contracts interact with one another. We modify the geth client to log all such interactions, and find that contracts today are three times more likely to be created by other contracts than they are by users, and that over 60% of contracts have never been interacted with. Additionally, we obtain the bytecode of all contracts and look for similarity; we find that less than 10% of user-created contracts are unique, and less than 1% of contract-created contracts are so. Clustering the contracts based on code similarity reveals even further similarity. These results indicate that there is substantial code re-use in Ethereum, suggesting that bugs in such contracts could have wide-spread impact on the Ethereum user population.
Conference Paper
Smart contracts are computer programs that can be correctly executed by a network of mutually distrusting nodes, without the need of an external trusted authority. Since smart contracts handle and transfer assets of considerable value, besides their correct execution it is also crucial that their implementation is secure against attacks which aim at stealing or tampering the assets. We study this problem in Ethereum, the most well-known and used framework for smart contracts so far. We analyse the security vulnerabilities of Ethereum smart contracts, providing a taxonomy of common programming pitfalls which may lead to vulnerabilities. We show a series of attacks which exploit these vulnerabilities, allowing an adversary to steal money or cause other damage.
Chapter
In recent years, electronic contracts have gained attention, especially in the context of the blockchain technology. While public blockchains are considered secure, legally binding under certain circumstances, and without any centralized control, they are applicable to a wide range of application domains, such as smart contracts, public registries, registry of deeds, or virtual organizations. As one of the most prominent blockchain examples, the Bitcoin system has reached large public, financial industry-related, and research interest. Another prominent blockchain example, Ethereum, which is considered a general approach for smart contracts, has taken off too. Nevertheless, various different sets of functions, applications, and stakeholders are involved in this smart contract arena. These are highlighted and put into interrelated technical, economic, and legal perspectives.
Conference Paper
Ethereum is a framework for cryptocurrencies which uses blockchain technology to provide an open global computing platform, called the Ethereum Virtual Machine (EVM). EVM executes bytecode on a simple stack machine. Programmers do not usually write EVM code; instead, they can program in a JavaScript-like language, called Solidity, that compiles to bytecode. Since the main purpose of EVM is to execute smart contracts that manage and transfer digital assets (called Ether), security is of paramount importance. However, writing secure smart contracts can be extremely difficult: due to the openness of Ethereum, both programs and pseudonymous users can call into the public methods of other programs, leading to potentially dangerous compositions of trusted and untrusted code. This risk was recently illustrated by an attack on TheDAO contract that exploited subtle details of the EVM semantics to transfer roughly $50M worth of Ether into the control of an attacker. In this paper, we outline a framework to analyze and verify both the runtime safety and the functional correctness of Ethereum contracts by translation to F*, a functional programming language aimed at program verification.
Conference Paper
We document our experiences in teaching smart contract programming to undergraduate students at the University of Maryland, the first pedagogical attempt of its kind. Since smart contracts deal directly with the movement of valuable currency units between contractual parties, security of a contract program is of paramount importance. Our lab exposed numerous common pitfalls in designing safe and secure smart contracts. We document several typical classes of mistakes students made, suggest ways to fix/avoid them, and advocate best practices for programming smart contracts. Finally, our pedagogical efforts have also resulted in online open course materials for programming smart contracts, which may be of independent interest to the community.