Paul W. McBurney's research while affiliated with University of Notre Dame and other places

Publications (9)

Article
Programs are, in essence, a collection of implemented features. Feature discovery in software engineering is the task of identifying key functionalities that a program implements. Manual feature discovery can be time consuming and expensive, leading to automatic feature discovery tools being developed. However, these approaches typically only descr...
Article
Source Code Summarization is an emerging technology for automatically generating brief descriptions of code. Current summarization techniques work by selecting a subset of the statements and keywords from the code, and then including information from those statements and keywords in the summary. The quality of the summary depends heavily on the pro...
Article
Source code summarization is the task of creating readable summaries that describe the functionality of software. Source code summarization is a critical component of documentation generation, for example as Javadocs formed from short paragraphs attached to each method in a Java program. At present, a majority of source code summarization is manual...
Article
Source code documentation often contains summaries of source code written by authors. Recently, automatic source code summarization tools have emerged that generate summaries without requiring author intervention. These summaries are designed for readers to be able to understand the high-level concepts of the source code. Unfortunately, there is no...
Article
In this paper, we present an emerging source code summarization technique that uses topic modeling to select keywords and topics as summaries for source code. Our approach organizes the topics in source code into a hierarchy, with more general topics near the top of the hierarchy. In this way, we present the software's highest-level functionality f...
Article
A documentation generator is a programming tool that creates documentation for software by analyzing the statements and comments in the software's source code. While many of these tools are manual, in that they require specially-formatted metadata written by programmers, new research has made inroads towards automatic generation of documentation. T...
Article
Source Code Summarization is an emerging technology for automatically generating brief descriptions of code. Current summarization techniques work by selecting a subset of the statements and keywords from the code, and then including information from those statements and keywords in the summary. The quality of the summary depends heavily on the pro...

Citations

... It is mostly assessed by counting the different types of errors. Adequacy (D'Haro et al., 2019; Ma and Sun, 2017; McBurney and McMillan, 2014; Arumae and Liu, 2019; Libovický et al., 2018) rates the amount of meaning expressed in the generated sample given a reference sample. Human participants and categorical scales dominate the assessment process. ...
... Yet a trend is immediately clear: A first generation of techniques relied on Information Retrieval (IR) or manually-defined heuristics to extract words from source code (for example, by using TF/IDF to pick the top-n words [10], [50]), and templates to place those words into readable sentences (exemplars in this category are presented in [11], [17]). An important strength in this first generation is that they tended to be built on a solid foundation of empirical evidence of what is important to programmers. For example, Moreno et al. [15] designed sentence templates for Java classes based on specific studies of programmers' needs in documentation. ...
[Table rows interleaved with the snippet above; column headings and cell marks not recoverable: [10]; Sridhara et al. (2011) [11]; Rastkar et al. (2011) [12]; DeLucia et al. (2012) [13]; Panichella et al. (2012) [14]; Moreno et al. (2013) [15]; [21]; Oda et al. (2015) [22]; Abid et al. (2015) [23]; Iyer et al. (2016) [24]; McBurney (2016) [25]; Zhang et al. (2016) [26]; Rodeghero et al. (2017) [27]; Fowkes et al. (2017) [28]; Badihi et al. (2017) [29]; Loyola et al. (2017) [30]; Lu et al. (2017) [31]; Jiang et al. (2017) [5]; Hu et al. (2018) [32]; Hu et al. (2018) [33]; Allamanis et al. (2018) [34]; Wan et al. (2018) [35]; Liang et al. (2018) [36]; Alon et al. (2019) [37], [38]; Gao et al. (2019) [39]; LeClair et al. (2019) [40]; Mesbah et al. (2019) [41]; Nie et al. (2019) [42]; Haque et al. (2020) [43]; Haldar et al. (2020) [44]; Ahmad et al. (2020) [3]]
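The first-generation idea described in this snippet (score words with TF/IDF, keep the top n, and drop them into a sentence template) can be illustrated with a minimal sketch. It is not the method of any cited paper: the tokenizer, the function names `tokenize`, `top_keywords`, and `summarize`, the template wording, and the two sample Java methods are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(code):
    """Split source text into lowercase words, breaking camelCase identifiers."""
    words = []
    for ident in re.findall(r"[A-Za-z]+", code):
        parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", ident)
        words.extend(p.lower() for p in parts)
    return words


def top_keywords(methods, name, n=3):
    """Rank the words of one method by TF-IDF, treating each method as a document."""
    docs = {m: tokenize(src) for m, src in methods.items()}
    # Document frequency: in how many methods does each word occur?
    df = Counter(w for words in docs.values() for w in set(words))
    tf = Counter(docs[name])
    total = len(docs[name])
    scores = {
        w: (count / total) * math.log(len(docs) / df[w])
        for w, count in tf.items()
    }
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:n]


def summarize(methods, name, n=3):
    """Place the selected keywords into a fixed (hypothetical) sentence template."""
    keywords = ", ".join(top_keywords(methods, name, n))
    return f"The method {name} is mainly concerned with: {keywords}."


# Hypothetical two-method corpus, used only to exercise the sketch.
methods = {
    "readFile": "String readFile(String path) { return Files.readString(Paths.get(path)); }",
    "sendMail": "void sendMail(String address, String body) { smtp.send(address, body); }",
}
print(summarize(methods, "readFile"))
```

Words that occur in every method (here `string`) receive an IDF of zero and drop out, which is the property that makes TF/IDF a plausible keyword selector in this setting.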
... As can be seen, the process of identifying SATD comments is a typical binary classification problem [42][43][44] in nature, which aims to determine whether a comment indicates SATD or not. For the classification result, there are a total of four situations (TP, FP, TN, and FN) shown in a confusion matrix in Table 6. ...
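The four outcomes named in this snippet can be counted directly from paired labels, as in the sketch below. The snippet does not say which metrics the citing paper derives from them; precision, recall, and F1 are shown here only because they are the usual choices for binary classification, and the function names and sample labels are illustrative assumptions.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (truthy = positive class)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred and truth:
            tp += 1
        elif pred and not truth:
            fp += 1
        elif not pred and not truth:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def precision_recall_f1(tp, fp, fn):
    """Derive precision, recall, and F1; return 0.0 where a ratio is undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical labels: 1 = comment flagged as SATD, 0 = not flagged.
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(tp, fp, tn, fn, precision_recall_f1(tp, fp, fn))
```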
... Extractive summarization techniques generate summaries by extracting information from the code components being summarized [23], [50], [64], [68]. On the other hand, abstractive summarization techniques aim at including in the summaries information not directly available in the source code [24], [28], [32], [46], [67]. DL techniques have been used to support the latter. ...
... The earliest approaches to source code summarization are traditional, which rely mainly on templates [10][11][12][13]. Sridhara et al. [14] applied a method for creating artificial templates where they used the Software Word Usage Model (SWUM) to generate comments for Java methods. ...
... Both extractive and abstractive techniques have been used to document code components at different granularity levels, such as method (e.g., [41], [42], [50], [52], [53]), method parameters (e.g., [54]), method usages (e.g., [55], [56]), class (e.g., [11], [49], [57]), unit tests [58], and code snippets (e.g., [59], [60]). ...
... These methods were designed to generate code comments at the class [2] or function level [1], [3]. Initial methods were usually designed based on handcrafting or information retrieval techniques. ...
... Code constructs represent the syntactic/structural meaning of each token. The code constructs are reported to be useful in code comprehension and are used in Software Engineering tasks such as bug detection and program repair [7,19,59,64]. We note that AST, which provides a tree structure for code, is another way to represent structural information for code. ...