Fan Long

Fan Long
  • University of Toronto

About

54
Publications
4,354
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
2,971
Citations
Introduction
Current institution
University of Toronto

Publications

Publications (54)
Preprint
Large language models (LLMs) have demonstrated remarkable progress in code generation, but many existing benchmarks are approaching saturation and offer little guarantee on the trustworthiness of the generated programs, offering limited insight into deeper reasoning capabilities. We introduce VerifyThisBench, a new benchmark designed to evaluate LL...
Preprint
Full-text available
Smart contracts power decentralized financial (DeFi) services but are vulnerable to complex security exploits that can lead to significant financial losses. Existing security measures often fail to adequately protect these contracts due to the composability of DeFi protocols and the increasing sophistication of attacks. Through a large-scale empiri...
Preprint
Full-text available
Smart contracts, self-executing programs on the blockchain, facilitate reliable value exchanges without centralized oversight. Despite the recent focus on dynamic analysis of their transaction histories in both industry and academia, no open-source tool currently offers comprehensive tracking of complete transaction information to extract user-desi...
Article
Full-text available
Smart contract transactions associated with security attacks often exhibit distinct behavioral patterns compared with historical benign transactions before the attacking events. While many runtime monitoring and guarding mechanisms have been proposed to validate invariants and stop anomalous transactions on the fly, the empirical effectiveness of t...
Preprint
Full-text available
Intermediate step methodologies like chain of thoughts (COT) have demonstrated effectiveness in enhancing the performance of Large Language Models (LLMs) on code generation. This study explores the utilization of intermediate languages, including various programming languages, natural language solutions, and pseudo-code, and systematically evaluate...
Article
Authenticated storage access is the performance bottleneck of a blockchain, because each access can be amplified to potentially O (log n ) disk I/O operations in the standard Merkle Patricia Trie (MPT) storage structure. In this paper, we propose a multi-Layer Versioned Multipoint Trie (LVMT), a novel high-performance blockchain storage with signif...
Article
We present the Layered Merkle Patricia Trie (LMPT), a performant storage data structure for processing transactions in high-throughput systems when compared to traditional Merkle Patricia Tries used in Ethereum clients. LMPTs keep smaller intermediary tries in memory to alleviate read and write amplification from high-latency disk storage. As an ad...
Article
This paper presents SigVM, the first blockchain virtual machine that extends EVM to support an event-driven execution model, enabling developers to build truly decentralized smart contracts. Contracts in SigVM can emit signal events, on which other contracts can listen. Once an event is triggered, corresponding handler functions are automatically e...
Preprint
We present automatic horizontal fusion, a novel optimization technique that complements the standard kernel fusion techniques for GPU programs. Unlike the standard fusion, whose goal is to eliminate intermediate data round trips, our horizontal fusion technique aims to increase the thread-level parallelism to hide instruction latencies. We also pre...
Preprint
We present Solythesis, a source to source Solidity compiler which takes a smart contract code and a user specified invariant as the input and produces an instrumented contract that rejects all transactions that violate the invariant. The design of Solythesis is driven by our observation that the consensus protocol and the storage layer are the prim...
Preprint
We present Aloes, a new technique and system for automatically detecting software errors in smart contracts. Given the Ethereum Virtual Machine byte code of a smart contract and a user specified constraint or invariant, Aloes symbolically executes the smart contract, explores all possible execution paths, and checks whether it is possible to initia...
Preprint
This paper presents Conflux, a fast, scalable and decentralized blockchain system that optimistically process concurrent blocks without discarding any as forks. The Conflux consensus protocol represents relationships between blocks as a direct acyclic graph and achieves consensus on a total order of the blocks. Conflux then, from the block order, d...
Article
We present a new system, Genesis, that processes human patches to automatically infer code transforms for automatic patch generation. We present results that characterize the effectiveness of the Genesis inference algorithms and the complete Genesis patch generation system working with real-world patches and defects collected from 372 Java projects...
Conference Paper
We present CodeCarbonCopy (CCC), a system for transferring code from a donor application into a recipient application. CCC starts with functionality identified by the developer to transfer into an insertion point (again identified by the developer) in the recipient. CCC uses paired executions of the donor and recipient on the same input file to obt...
Conference Paper
We present a new system, Genesis, that processes human patches to automatically infer code transforms for automatic patch generation. We present results that characterize the effectiveness of the Genesis inference algorithms and the complete Genesis patch generation system working with real-world patches and defects collected from 372 Java projects...
Article
We present CodeCarbonCopy (CCC), a system for transferring code from a donor application into a recipient application. CCC starts with functionality identified by the developer to transfer into an insertion point (again identified by the developer) in the recipient. CCC uses paired executions of the donor and recipient on the same input file to obt...
Article
We present a new system, Genesis, that processes sets of human patches to automatically infer code transforms and search spaces for automatic patch generation. We present results that characterize the effectiveness of the Genesis inference algorithms and the resulting complete Genesis patch generation system working with real-world patches and erro...
Article
We present the first systematic analysis of key characteristics of patch search spaces for automatic patch generation systems. We analyze sixteen different configurations of the patch search spaces of SPR and Prophet, two current state-of-the-art patch generation systems. The analysis shows that 1) correct patches are sparse in the search spaces (t...
Article
We present Prophet, a novel patch generation system that works with a set of successful human patches obtained from open-source software repositories to learn a probabilistic, application-independent model of correct code. It generates a space of candidate patches, uses the model to rank the candidate patches in order of likely correctness, and val...
Conference Paper
We present the first systematic analysis of key characteristics of patch search spaces for automatic patch generation systems. We analyze sixteen different configurations of the patch search spaces of SPR and Prophet, two current state-of-the-art patch generation systems. The analysis shows that 1) correct patches are sparse in the search spaces (t...
Preprint
We present the first systematic analysis of the characteristics of patch search spaces for automatic patch generation systems. We analyze the search spaces of two current state-of-the-art systems, SPR and Prophet, with 16 different search space configurations. Our results are derived from an analysis of 1104 different search spaces and 768 patch ge...
Article
We present Prophet, a novel patch generation system that works with a set of successful human patches obtained from open- source software repositories to learn a probabilistic, application-independent model of correct code. It generates a space of candidate patches, uses the model to rank the candidate patches in order of likely correctness, and va...
Conference Paper
Control flow integrity (CFI) has been proposed as an approach to defend against control-hijacking memory corruption attacks. CFI works by assigning tags to indirect branch targets statically and checking them at runtime. Coarse-grained enforcements of CFI that use a small number of tags to improve the performance overhead have been shown to be inef...
Article
Control flow integrity (CFI) has been proposed as an approach to defend against control-hijacking memory corruption attacks. CFI works by assigning tags to indirect branch targets statically and checking them at runtime. Coarse-grained enforcements of CFI that use a small number of tags to improve the performance overhead have been shown to be inef...
Conference Paper
We present SPR, a new program repair system that combines staged program repair and condition synthesis. These techniques enable SPR to work productively with a set of parameterized transformation schemas to generate and efficiently search a rich space of program repairs. Together these techniques enable SPR to generate correct repairs for over fiv...
Article
We present Prophet, a novel patch generation system that learns a probabilistic model over candidate patches from a large code database that contains many past successful human patches. It defines the probabilistic model as the combination of a distribution over program points based on error localization algorithms and a parameterized log-linear di...
Article
We present a new technique and system, DIODE, for auto- matically generating inputs that trigger overflows at memory allocation sites. DIODE is designed to identify relevant sanity checks that inputs must satisfy to trigger overflows at target memory allocation sites, then generate inputs that satisfy these sanity checks to successfully trigger the...
Article
We present Code Phage (CP), a system for automatically transferring correct code from donor applications into recipient applications that process the same inputs to successfully eliminate errors in the recipient. Experimental results using seven donor applications to eliminate ten errors in seven recipient applications highlight the ability of CP t...
Article
We present SPR, a new program repair system that uses condition synthesis to instantiate transformation schemas to repair program defects. SPR's staged repair strategy combines a rich space of potential repairs with a targeted search algorithm that makes this space viably searchable in practice. This strategy enables SPR to successfully find correc...
Article
We present PCR, a new automatic patch generation system. PCR uses a new condition synthesis technique to efficiently discover logical expressions that generate desired control- flow transfer patterns. Presented with a set of test cases, PCR deploys condition synthesis to find and repair incorrect if conditions that cause the application to produce...
Article
We analyze reported patches for three prior generate-and-validate patch generation systems (GenProg, RSRepair, and AE). Because of experimental error, the majority of the reported patches violate the basic principle behind the design of these systems -- they do not produce correct outputs even for the inputs in the test suite used to validate the p...
Article
We present pDNA, a system for automatically transfer- ring correct code from donor applications into recipient applications to successfully eliminate errors in the recipient. Experimental results using six donor applications to eliminate nine errors in six recipient applications highlight the ability of pDNA to transfer code across applications to...
Article
We present a system, RCV, for enabling software applications to survive divide-by-zero and null-dereference errors. RCV operates directly on off-the-shelf, production, stripped x86 binary executables. RCV implements recovery shepherding, which attaches to the application process when an error occurs, repairs the execution, tracks the repair effects...
Article
We present a system, SIFT, for generating input filters that nullify integer overflow errors associated with critical program sites such as memory allocation or block copy sites. SIFT uses a static pro- gram analysis to generate filters that discard inputs that may trigger integer overflow errors in the computations of the sizes of allocated memory...
Conference Paper
We present a system, SIFT, for generating input filters that nullify integer overflow errors associated with critical program sites such as memory allocation or block copy sites. SIFT uses a static pro- gram analysis to generate filters that discard inputs that may trigger integer overflow errors in the computations of the sizes of allocated memory...
Conference Paper
We present a method for automatically generating input parsers from English specifications of input file formats. We use a Bayesian generative model to capture relevant natural language phenomena and translate the English specification into a specification tree, which is then translated into a C++ input parser. We model the problem as a joint depen...
Conference Paper
We present a novel technique, automatic input rectification, and a prototype implementation, SOAP. SOAP learns a set of constraints characterizing typical inputs that an application is highly likely to process correctly. When given an atypical input that does not satisfy these constraints, SOAP automatically rectifies the input (i.e., changes the i...
Article
Full-text available
G2 is a graph processing system for diagnosing distributed systems. It works on execution graphs that model runtime events and their correlations in distributed systems. In G2, a diagnosis process involves a series of queries, expressed in a high-level declarative language that supports both relational and graph-based operators. Each query is compi...
Conference Paper
Full-text available
A replay tool aiming to reproduce a program's execution interposes itself at an appropriate replay interface between the program and the environment. During recording, it logs all non-deterministic side effects passing through the interface from the environment and feeds them back during replay. The replay interface is critical for correctness and...
Conference Paper
Full-text available
This paper presents a tool Altair that automatically gener- ates API function cross-references, which emphasizes reliable structural measures and does not depend on specic client code. Altair ranks related API functions for a given query according to pair-wise overlap, i.e., how they share state, and clusters tightly related ones into meaningful mo...
Conference Paper
MODIST is the first model checker designed for transparently checking unmodified distributed systems running on unmod- ified operating systems. It achieves this transparency via a novel architecture: a thin interposition layer exposes all ac- tions in a distributed system and a centralized, OS-independent model checking engine explores these action...

Network

Cited By