Siyuan Jiang's research while affiliated with University of Notre Dame and other places

Publications (21)

Preprint
Source code summarization is the task of writing natural language descriptions of source code behavior. Code summarization underpins software documentation for programmers. Short descriptions of code help programmers understand the program quickly without having to read the code itself. Lately, neural source code summarization has emerged as the fr...
Preprint
Source code summarization -- creating natural language descriptions of source code behavior -- is a rapidly-growing research topic with applications to automatic documentation generation, program comprehension, and software maintenance. Traditional techniques relied on heuristics and templates built manually by human experts. Recently, data-driven...
Conference Paper
Recently emerged phrase-level topic models are able to provide topics of phrases, which are easy for humans to read. But these models lack the ability to capture the correlation structure among the numerous discovered topics. We propose a novel topic model, PhraseCTM, and a two-stage method to find the correlated topics at the phrase level. In t...
Article
Commit messages are a valuable resource in comprehension of software evolution, since they provide a record of changes such as feature additions and bug repairs. Unfortunately, programmers often neglect to write good commit messages. Different techniques have been proposed to help programmers by automatically writing these messages. These technique...
Article
Programmers need documentation to comprehend software, but they often lack the time to write it. Thus, programmers must prioritize their documentation effort to ensure that sections of code important to program comprehension are thoroughly explained. In this paper, we explore the possibility of automatically prioritizing documentation effort. We pe...
Article
“Change Impact Analysis” is the process of determining the consequences of a modification to software. In theory, change impact analysis should be done during software maintenance, to make sure changes do not introduce new bugs. Many approaches and techniques are proposed to help programmers do change impact analysis automatically. However, it is s...
Article
Committing to a version control system means submitting a software change to the system. Each commit can have a message to describe the submission. Several approaches have been proposed to automatically generate the content of such messages. However, the quality of the automatically generated messages falls far short of what humans write. In studyi...
Article
When learning to use an Application Programming Interface (API), programmers need to understand the inputs and outputs (I/O) of the API functions. Current documentation tools automatically document the static information of I/O, such as parameter types and names. What is missing from these tools is dynamic information, such as I/O examples---actual...
Article
Software is constantly changing. To ensure the quality of this process, when preparing to change a program, developers must first identify the main consequences and risks of modifying the program locations they intend to change. This activity is called change-impact analysis. However, existing impact analysis suffers from two major problems: coarse...
Article
Software constantly changes during its life cycle. This phenomenon is particularly prominent in modern software, whose complexity keeps growing and changes rapidly in response to market pressures and user demands. At the same time, developers must assure the quality of this software in a timely manner. Therefore, it is of critical importance to pro...
Conference Paper
Sensitivity analysis determines how a system responds to stimuli variations, which can benefit important software-engineering tasks such as change-impact analysis. We present SENSA, a novel dynamic-analysis technique and tool that combines sensitivity analysis and execution differencing to estimate the dependencies among statements that occur in pr...
Conference Paper
Dynamic slicing is a practical and popular analysis technique used in various software-engineering tasks. Dynamic slicing is known to be incomplete because it analyzes only a subset of all possible executions of a program. However, it is less known that its results may inaccurately represent the dependencies that occur in those executions. Some res...
Conference Paper
Dynamic program slicing attempts to find runtime dependencies among statements to support security, reliability, and quality tasks such as information-flow analysis, testing, and debugging. However, it is not known how accurately dynamic slices identify statements that really affect each other. We propose a new approach to estimate the accuracy of...
Conference Paper
We describe DUA-Forensics, our open-source Java-bytecode program analysis and instrumentation system built on top of Soot. DUA-Forensics has been in development for more than six years and has supported multiple research projects on efficient monitoring, test-suite augmentation, fault localization, symbolic execution, and change-impact analysis. Th...
Conference Paper
Program slicing is a popular but imprecise technique for identifying which parts of a program affect or are affected by a particular value. A major reason for this imprecision is that slicing reports all program statements possibly affected by a value, regardless of how relevant to that value they really are. In this paper, we introduce quantitativ...

Citations

... 6) API/Web service: In this category, we found a tool called Docio which can generate API documents with I/O examples [38]. This tool is more like the popular REST API documentation generation tool Swagger but only supports the C programming language [79]. ...
... In the SBT representation, "type" signifies structural information, "value" represents lexical information, and brackets indicate the hierarchical structure. Previous studies [32][33][34] have illustrated the effectiveness of SBT in preserving both the code's structural and lexical aspects. Consequently, we treat the SBT sequence as a modality of the AST, employing the SBT method to encapsulate the overall semantic information, including both structural and lexical elements. ...
... Indeed, it is generally challenging to meaningfully convey a coherent and unified theme solely based on a list of unigrams. Such inference requires major feats of interpretation and often leads to ambiguous understanding of the topic due to lack of contextual information , Huang, 2018. Unigrams are often part of broader sentences, which are lost in a simple unigram representation. ...
... To facilitate the code reviewing process, various forms of explanations may be added to code diffs, such as commit messages [2], code comments, and pull request descriptions [3]. As manually crafting a high-quality explanation is time-consuming and may be neglected by developers [4], [5], [6], researchers have proposed techniques to automatically generate code diff explanations. ...
... These can be classified into 2 categories: (1) tools to generate/recommend documentation and (2) empirical investigation of documentation usage and quality. Regarding automation for documentation, research has focused on either summarization or recommendation for bug reports [38,41], code [13,25,42], user stories [29], API usage examples [23,35,50,62], etc. Different from these, our tool AutoTSG focuses on automation that helps translate manual text documentation to executable workflows. Closer to our work in this space are the empirical studies on documentation. ...
... In recent years, machine translation, especially neural machine translation (a.k.a., NMT) [17], has found numerous applications in several domains [18], [19]. NMT has also been used in different software engineering tasks including, but not limited to, code summarization [20], [21], code comment generation [22], [23], and commit message generation [24]-[27]. Traditional NMT models often consist of two items: encoder and decoder. ...
... Kelly [18] noted that stakeholders often have trouble seeing beyond the current situation and may be unwilling to be involved in the process. This lack of stakeholder involvement can exacerbate issues with developers who do not sufficiently understand the problem domain, and who may not even know what questions to ask stakeholders if they were to be involved [19]. Kelly [18] also noted that it can be difficult to iteratively develop requirements as the technology matures over each sprint. ...
... There is evidence for the restrictiveness of these filters. Jiang et al. [48] explored 1.6 million commit messages from the top 1,000 Java projects, and their findings show that 53% of messages do not follow the Verb-Direct Object structure and 18% of the messages have more than one sentence. In later work, Jiang et al. [8] employ the Message Length and Diff Length filters for 30 and 100 tokens, respectively. ...
... In fact, the lack of code examples has been identified as one of the most severe issues in documentation [35]. However, studies find that developers are often unwilling to write documentation as they find it less productive and less rewarding [23], [28], although developers do look for code examples when they seek to learn and reuse a software library [40]. Manual effort to generate code examples is often laborious and time-consuming. ...
... For example, Jadeite [22] allowed API users to share their aggregate experience and collaboratively add placeholders to API documents to indicate the expected classes and methods. In addition, Docio [23] is a system that helps API users understand the actual/dynamic input and output values of API functions. Furthermore, the recommendation of source code examples by submitting queries against API calls was also proposed to help both API developers [24] and users [25], [26]. ...