Siyuan Jiang's research while affiliated with University of Notre Dame and other places

Publications (18)

Preprint
Full-text available
Source code summarization -- creating natural language descriptions of source code behavior -- is a rapidly growing research topic with applications to automatic documentation generation, program comprehension, and software maintenance. Traditional techniques relied on heuristics and templates built manually by human experts. Recently, data-driven...
Article
Commit messages are a valuable resource in comprehension of software evolution, since they provide a record of changes such as feature additions and bug repairs. Unfortunately, programmers often neglect to write good commit messages. Different techniques have been proposed to help programmers by automatically writing these messages. These technique...
Article
Full-text available
Programmers need documentation to comprehend software, but they often lack the time to write it. Thus, programmers must prioritize their documentation effort to ensure that sections of code important to program comprehension are thoroughly explained. In this paper, we explore the possibility of automatically prioritizing documentation effort. We pe...
Article
Full-text available
“Change Impact Analysis” is the process of determining the consequences of a modification to software. In theory, change impact analysis should be done during software maintenance, to make sure changes do not introduce new bugs. Many approaches and techniques are proposed to help programmers do change impact analysis automatically. However, it is s...
Article
Committing to a version control system means submitting a software change to the system. Each commit can have a message to describe the submission. Several approaches have been proposed to automatically generate the content of such messages. However, the quality of the automatically generated messages falls far short of what humans write. In studyi...
Article
Full-text available
When learning to use an Application Programming Interface (API), programmers need to understand the inputs and outputs (I/O) of the API functions. Current documentation tools automatically document the static information of I/O, such as parameter types and names. What is missing from these tools is dynamic information, such as I/O examples---actual...
Article
Full-text available
Software is constantly changing. To ensure the quality of this process, when preparing to change a program, developers must first identify the main consequences and risks of modifying the program locations they intend to change. This activity is called change-impact analysis. However, existing impact analysis suffers from two major problems: coarse...
Article
Full-text available
Software constantly changes during its life cycle. This phenomenon is particularly prominent in modern software, whose complexity keeps growing and changes rapidly in response to market pressures and user demands. At the same time, developers must assure the quality of this software in a timely manner. Therefore, it is of critical importance to pro...
Conference Paper
Full-text available
Sensitivity analysis determines how a system responds to stimuli variations, which can benefit important software-engineering tasks such as change-impact analysis. We present SENSA, a novel dynamic-analysis technique and tool that combines sensitivity analysis and execution differencing to estimate the dependencies among statements that occur in pr...
Conference Paper
Full-text available
Dynamic slicing is a practical and popular analysis technique used in various software-engineering tasks. Dynamic slicing is known to be incomplete because it analyzes only a subset of all possible executions of a program. However, it is less known that its results may inaccurately represent the dependencies that occur in those executions. Some res...
Conference Paper
Dynamic program slicing attempts to find runtime dependencies among statements to support security, reliability, and quality tasks such as information-flow analysis, testing, and debugging. However, it is not known how accurately dynamic slices identify statements that really affect each other. We propose a new approach to estimate the accuracy of...
Conference Paper
Full-text available
We describe DUA-Forensics, our open-source Java-bytecode program analysis and instrumentation system built on top of Soot. DUA-Forensics has been in development for more than six years and has supported multiple research projects on efficient monitoring, test-suite augmentation, fault localization, symbolic execution, and change-impact analysis. Th...
Conference Paper
Full-text available
Program slicing is a popular but imprecise technique for identifying which parts of a program affect or are affected by a particular value. A major reason for this imprecision is that slicing reports all program statements possibly affected by a value, regardless of how relevant to that value they really are. In this paper, we introduce quantitativ...

Citations

... 2. Learning code representation from code structure: Code can be parsed into an Abstract Syntax Tree (AST), which depicts the code's syntactic structure, and this structure is also critical for code summarization (Hu et al. 2018a). Considering the structure of code, some works (Hu et al. 2018a, 2020; LeClair et al. 2019; Zhang et al. 2020) try to utilize AST structure by traversing the AST in some order. Other works (Alon et al., 2019) model the paths in the AST to learn code structure. ...
... We test our algorithm in two downstream tasks, namely, code search and code summarization. They are the most widely used software engineering tasks to demonstrate the capacity of NL-PL understanding [17,22,23,31,41,46]. Code Search. ...
... These can be classified into 2 categories: (1) tools to generate/recommend documentation and (2) empirical investigation of documentation usage and quality. Regarding automation for documentation, research has focused on either summarization or recommendation for bug reports [38,41], code [13,25,42], user stories [29], API usage examples [23,35,50,62], etc. Different from these, our tool AutoTSG focuses on automation that helps translate manual text documentation to executable workflows. Closer to our work in this space are the empirical studies on documentation. ...
... To avoid false-positive commits, we applied filtering to narrow down the commit messages, eliminating the ones that are less likely to be classified as one of the five motivations. We designed the filtering to help ensure that we only trained the algorithm on higher-quality commit messages [48]. ...
... The proposed summarization techniques fall into two categories. Extractive summarization techniques generate summaries by extracting information from the code components being summarized [23], [50], [64], [68]. On the other hand, abstractive summarization techniques aim at including in the summaries information not directly available in the source code [24], [28], [32], [46], [67]. ...
... Time series analysis is highly complex due to the large data dimension associated with it [20]. In order to reduce complexity, a time series is usually applied to some representation scheme such as discrete Fourier transform (DFT), discrete wavelet transform (DWT), piecewise aggregate approximation (PAA), trend extraction (TE), complexity-invariant distance measure (CID), temporal correlations (TC), etc. (for details, see references [21]-[25]). In this analysis, we compute the distance between two sets of vectors and apply trend extraction in conjunction with Euclidean distance as a similarity measure to cluster the time series database defined by ...
... A structural semantic slice (SSC) consists of a description of relational information of involved modules, their classes, methods and connected modules in a change instance, which is easy for a reviewer to understand [26]. With these SSCs, various layers of abstract design summary generation are possible [21]. For example, a high-level abstract summary can be automatically generated from such a slice as: "The ASBC module defines a sensitive STC method with new runtime dependency on ASC module for authorizing sensitive service access" (underlined text represents an architectural relation; acronyms are discussed in Section 3). ...
... We found that debugging episodes varied widely in duration, from a few seconds to more than a hundred minutes, with a skewed distribution. Most debugging time was spent in the longest [18] Controlled Observation 10 1 20 Strategies Jiang et al. [19] Controlled Observation 9 ...
... The EAS approach [18], which partially inspired D²ABS, is a performance optimization of its predecessor PATHIMPACT [17]. Many other dynamic impact analysis techniques also exist [64], [65], aiming at improving precision [21], [66], [67], recall [68], efficiency [69], and cost-effectiveness [12], [24] over PATHIMPACT and EAS. However, these techniques did not address distributed or multiprocess programs that we focus on in this work. ...