Article

Learning a Metric for Code Readability


Abstract

In this paper, we explore the concept of code readability and investigate its relation to software quality. With data collected from 120 human annotators, we derive associations between a simple set of local code features and human notions of readability. Using those features, we construct an automated readability measure and show that it can be 80 percent effective and better than a human, on average, at predicting readability judgments. Furthermore, we show that this metric correlates strongly with three measures of software quality: code changes, automated defect reports, and defect log messages. We measure these correlations on over 2.2 million lines of code, as well as longitudinally, over many releases of selected projects. Finally, we discuss the implications of this study on programming language design and engineering practice. For example, our data suggest that comments, in and of themselves, are less important than simple blank lines to local judgments of readability.
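The "simple set of local code features" the abstract refers to can be illustrated with a short sketch. The feature names and extraction choices below are assumptions for illustration, not the paper's actual feature set:

```python
def local_features(snippet: str) -> dict:
    """Illustrative extraction of simple, local, per-snippet features of
    the kind such a readability metric is built from (names and choices
    here are assumptions, not the authors' exact feature set)."""
    lines = snippet.splitlines()
    n = max(len(lines), 1)
    return {
        # line-shape features
        "avg_line_length": sum(len(l) for l in lines) / n,
        "max_line_length": max((len(l) for l in lines), default=0),
        # the abstract notes blank lines matter more than comments
        "blank_line_frac": sum(1 for l in lines if not l.strip()) / n,
        "comment_frac": sum(1 for l in lines if l.lstrip().startswith("//")) / n,
    }
```

A trained model would then map such a feature vector to a readability score.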


... Thus, it is important to understand how Google would rank code examples in isolation, i.e., without any other page elements. In this case, we could query and assess the characteristics of top/bottom ranked code examples and verify their quality aspects, for instance, whether good coding practices [9,53,58,59,75,77] are found in higher ranked ones. This may provide the basis to detect the possible strengths and limitations of the search engine in dealing with code. ...
... 8 Java SE provides basic functionalities that are used by almost all Java applications, such as collections, date and time facilities, among others. 9 Apache Commons provides a large set of reusable components. 10 Lastly, the Spring Framework supports the creation of enterprise applications in many scenarios and architectures, such as web apps, microservices, and cloud. ...
... Readability. It is a human judgment of how easy a text is to understand [9,58]. We rely on the metric proposed by Scalabrino et al. [75,77] to evaluate the readability of a code example; it achieved a higher accuracy score than other state-of-the-art models in their evaluation. ...
Article
Full-text available
Developers often look for code examples on the web to improve learning and accelerate development. Google indexes millions of pages with code examples: pages with better content are likely to be top ranked. In practice, many factors may influence the rank: page reputation, content quality, etc. Consequently, the most relevant information on the page, i.e., the code example, may be overshadowed by the search engine. Thus, a better understanding of how Google would rank code examples in isolation may provide the basis to detect its strengths and limitations in dealing with such content. In this paper, we assess how the Google search engine ranks code examples. We build a website with 1,000 examples and submit it to Google. After being fully indexed, we query and analyze the returned examples. We find that pages with multiple code examples are more likely to be top ranked by Google. Overall, single code examples that are higher ranked are larger; however, they are not necessarily more readable and reusable. We predict top ranked examples with a good level of confidence, but generic factors have more importance than code quality ones. Based on our results, we provide insights for researchers and practitioners.
... Although it is consensus that code readability is a subjective matter, in the last decades, researchers have proposed several metrics, tools, and models to assess code readability [1,2,8,12,20,21]. In this work we leverage the model proposed by Posnett and colleagues [20]. ...
... In this work we leverage the model proposed by Posnett and colleagues [20]. This model is a simplified mix of two other approaches: the Buse model [1] and the Halstead metrics [7]. Posnett and colleagues proposed a simplification of these two other works that considers only three variables: lines of code, volume, and entropy. ...
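The three variables of a Posnett-style model can be sketched as follows; the tokenizer and the textbook Halstead-volume and Shannon-entropy formulas below are assumptions for illustration, not the authors' exact implementation:

```python
import math
import re

def posnett_style_features(code: str):
    """Sketch of the three inputs to a Posnett-style readability model:
    size (lines of code), Halstead volume, and token entropy. Tokenization
    and formulas are textbook approximations, not the original tooling."""
    loc = len([l for l in code.splitlines() if l.strip()])
    tokens = re.findall(r"\w+|[^\w\s]", code)
    n = len(tokens)                       # total token occurrences
    vocab = len(set(tokens))              # distinct tokens
    volume = n * math.log2(vocab) if vocab > 1 else 0.0  # Halstead volume
    # Shannon entropy of the token frequency distribution
    counts = {}
    for t in tokens:
        counts[t] = counts.get(t, 0) + 1
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return loc, volume, entropy
```

The appeal of the simplification is that all three quantities are cheap to compute from raw text, with no parsing required.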
... A survey was conducted with 55 students from the software engineering course of a Computer Science program, along with a smaller group of 7 professional programmers from a large Brazilian software company. Buse and Weimer's [1] practices and Scalabrino's model [21] were evaluated. The analysis identified that 8 out of 11 coding practices affected the readability perceived by the research participants. ...
Preprint
Full-text available
Cognitive-Driven Development (CDD) is a coding design technique that aims to reduce the cognitive effort developers spend understanding a given code unit (e.g., a class). By following CDD design practices, code units are expected to be smaller and thus easier to maintain and evolve. However, it is so far unknown whether these smaller code units coded using CDD standards are, indeed, easier to understand. In this work, we aim to assess to what extent CDD improves code readability. To achieve this goal, we conducted a two-phase study. We started by inviting professional software developers to vote (and justify their rationale) on the most readable of a pair of code snippets (from a set of 10 pairs); one snippet in each pair was coded using CDD practices. We received 133 answers. In the second phase, we applied a state-of-the-art readability model to the 10 pairs of CDD-guided refactorings. We observed some conflicting results. On the one hand, developers perceived that seven (out of 10) CDD-guided refactorings were more readable than their counterparts; for two others, developers were undecided, and only in one case did developers prefer the original code snippet. On the other hand, we noticed that only one CDD-guided refactoring scored as more readable according to state-of-the-art readability models. Our results provide initial evidence that CDD could be an interesting approach for software design.
... For example, since average line length is a feature, then the survey snippets should contain high, medium, and low average line length. Third, all selected snippets have a high degree of cohesiveness considering the functionality of the code [21]. Finally, the collection has a balanced mixture of readable and non-readable code. ...
... The input to our machine learning model consisted of static structural features of R code snippets, including indentation, comments, and line length. These structural features can play a major factor in one's judgment of code readability [21]. Since the snippets were between 12-18 lines of code and unrepresentative in size of real-world code uploaded to research repositories, all features were standardized per number of code lines. ...
... By doing this, we ensure that our machine learning model generalizes to longer pieces of code rather than just our selected snippets of code. Our list of features is modeled after Ref. [21] but modified to fit the R programming language use case (Tab. I). ...
Preprint
An essential part of research and scientific communication is researchers' ability to reproduce the results of others. While there have been increasing standards for authors to make data and code available, many of these files are hard to re-execute in practice, leading to a lack of research reproducibility. This poses a major problem for students and researchers in the same field who cannot leverage the previously published findings for study or further inquiry. To address this, we propose an open-source platform named RE3 that helps improve the reproducibility and readability of research projects involving R code. Our platform incorporates assessing code readability with a machine learning model trained on a code readability survey and an automatic containerization service that executes code files and warns users of reproducibility errors. This process helps ensure the reproducibility and readability of projects and therefore fast-track their verification and reuse.
... First, we investigate whether review changes can be automatically classified using a supervised machine learning approach, with the goal of solving the scalability issue. To this end, (1) we manually classify 1,504 review changes using Beller et al.'s taxonomy (Beller et al. 2014); (2) we select 30 features based on the analysis of prior work (Buse and Weimer 2010; Fluri et al. 2007) as well as the insight acquired constructing the dataset; ...
... The rationale behind each of the metrics selected for our investigation based on the literature and our observations during the creation of the dataset. We combined common code analysis metrics (e.g., LOC, LOCExec, or Cyclomatic Complexity) with a selection of code readability metrics: number of commas or number of cycles (Buse and Weimer 2010). The selection of these metrics is based on an analysis of the literature on what can characterize the type of a modification performed by developers (Fluri et al. 2007) and what can capture the nature of source code under different perspectives. ...
... Another potential threat is related to the selection of the independent variables used to build our automated approach. We exploited a set of well-known features presented in previous work and covering different aspects of source code quality, understandability, and textual coherence (Buse and Weimer 2010; Palomba et al. May 2016; McCabe 1976). ...
Article
Full-text available
Code reviewing is a widespread practice used by software engineers to maintain high code quality. To date, the knowledge on the effect of code review on source code is still limited. Some studies have addressed this problem by classifying the types of changes that take place during the review process (a.k.a. review changes), as this strategy can, for example, pinpoint the immediate effect of reviews on code. Nevertheless, this classification (1) is not scalable, as it was conducted manually, and (2) was not assessed in terms of how meaningful the provided information is for practitioners. This paper aims at addressing these limitations: First, we investigate to what extent a machine learning-based technique can automatically classify review changes. Then, we evaluate the relevance of information on review change types and its potential usefulness, by conducting (1) semi-structured interviews with 12 developers and (2) a qualitative study with 17 developers, who are asked to assess reports on the review changes of their project. Key results of the study show that not only is it possible to automatically classify code review changes, but this information is also perceived by practitioners as valuable to improve the code review process. Data and materials: https://doi.org/10.5281/zenodo.5592254
... The readability of the source code potentially enables easier maintenance and decreases the potential number of regression defects in the system [66]. Although this property of code is difficult to measure exactly, several attempts at such a quantification have been made [14], [66], [67]. ...
... Buse and Weimer's Model (BW) employs quantification of structural aspects (e.g., number of branches, loops, etc.) to express the code readability [66]. No simple formula can be presented as a definition; refer to the original study for details. ...
Article
Full-text available
Software code is present on multiple levels within the current Internet of Things (IoT) systems. The quality of this code impacts system reliability, safety, maintainability, and other quality aspects. In this paper, we provide a comprehensive overview of code quality-related metrics, specifically revised for the context of IoT systems. These metrics are divided into main code quality categories: Size, redundancy, complexity, coupling, unit test coverage and effectiveness, cohesion, code readability, security, and code heterogeneity. The metrics are then linked to selected general quality characteristics from the ISO/IEC 25010:2011 standard by their possible impact on the quality and reliability of an IoT system, the principal layer of the system, the code levels, and the main phases of the project to which they are relevant. This analysis is followed by a discussion of code smells and their relation to the presented metrics. The overview presented in the paper is the result of a thorough analysis and discussion of the author’s team with the involvement of external subject-matter experts in which a defined decision algorithm was followed. The primary result of the paper is an overview of the metrics accompanied by applicability notes related to the quality characteristics, the system layer, the level of the code, and the phase of the IoT project.
... This suite includes Depth of Inheritance Tree, Number of Children, and Coupling between Object Classes. Readability as a code metric was developed by Buse et al. [15], who modelled different code features, such as number of comments, and number of blank lines, to a readability score. This model was later improved by Posnett et al. [63] and Scalabrino et al. [72]. ...
... User studies are a widely adopted approach [1,7,11,15,21,22,36,43,71,72] for understanding maintenance effort. The maintenance effort of a code fragment can be understood by the cognitive effort and time required to understand the code by a selected group of users. ...
... One of the most frequent activities in software development and maintenance is code reading [72]. Readability is a measure of how easy (or difficult) it is to read and understand source code [13,15,42,63], which has a significant impact on code maintenance. In this paper, we have used the publicly available readability tool implemented by Buse et al. [15] to examine the relationship between method size and readability. ...
Preprint
Full-text available
Code metrics have been widely used to estimate software maintenance effort. Metrics have generally been used to guide developer effort to reduce or avoid future maintenance burdens. Size is the simplest and most widely deployed metric. The size metric is pervasive because size correlates with many other common metrics (e.g., McCabe complexity, readability, etc.). Given the ease of computing a method's size, and the ubiquity of these metrics in industrial settings, it is surprising that no systematic study has been performed to provide developers with meaningful method size guidelines with respect to future maintenance effort. In this paper we examine the evolution of around 785K Java methods and show that developers should strive to keep their Java methods under 24 lines in length. Additionally, we show that decomposing larger methods to smaller methods also decreases overall maintenance efforts. Taken together, these findings provide empirical guidelines to help developers design their systems in a way that can reduce future maintenance.
... They concluded that refactoring may lower developers' productivity in the short term in cases where the code reads differently from what developers have grown attached to. This observation makes it clear that one of the critical elements that might affect a developer's productivity after the application of refactoring operations is program/code readability, that is, the property describing how well developers can read the source code [39]. Readability can be considered the basic unit of program comprehension [40] and, if not preserved, may induce developers to waste more time/resources while evolving source code, besides increasing the risks connected to the introduction of defects [35]. ...
... Code readability refers to how easy a text is to understand and is an essential attribute for maintainability and, subsequently, code quality [39], [56]. Over the years, studies have investigated various ways to improve readability. ...
... More than 5,000 people manually evaluated the features of code snippets and concluded that their approach improved readability. Other models have been proposed in the literature to increase code readability [39], [58]. The impact of looping and nesting on readability was assessed by conducting an experiment with 275 participants on 32 Java methods [40]. ...
Conference Paper
Full-text available
Software refactoring is the activity associated with developers changing the internal structure of source code without modifying its external behavior. The literature argues that refactoring might have beneficial and harmful implications for software maintainability, primarily when performed without the support of automated tools. This paper continues the narrative on the effects of refactoring by exploring the dimension of program comprehension, namely the property that describes how easy it is for developers to understand source code. We start our investigation by assessing the basic unit of program comprehension, namely program readability. Next, we set up a large-scale empirical investigation-conducted on 156 open-source projects-to quantify the impact of refactoring on program readability. First, we mine refactoring data and, for each commit involving a refactoring, we compute (i) the amount and type(s) of refactoring actions performed and (ii) eight state-of-the-art program comprehension metrics. Afterwards, we build statistical models relating the various refactoring operations to each of the readability metrics considered to quantify the extent to which each refactoring impacts the metrics in either a positive or negative manner. The key results are that refactoring has a notable impact on most of the readability metrics considered.
... Code readability and complexity are both considered to be important aspects in software development as they directly link to code maintainability [1]- [3]. Code maintainability is one of the critical code quality characteristics defined by the standard model for assessing software quality in ISO/IEC 25010:2011 [4]. ...
... Buse and Weimer in [1] investigated the relationship between code readability and software quality. To this end, they first collected data using 120 human subjects rating short code snippets and then developed a mathematical model to estimate the readability of code, which we here refer to as the B&W model. ...
... In 2010 Buse and Weimer [1] presented the first readability metric for source code. They collected data by letting 120 human subjects rate the readability of 100 short code fragments. ...
Preprint
Full-text available
Well structured and readable source code is a prerequisite for maintainable software and successful collaboration among developers. Static analysis enables the automated extraction of code complexity and readability metrics, which can be leveraged to highlight potential improvements in code, both to attain software of high quality and to reinforce good practices for developers as an educational tool. This assumes reliable readability metrics, which are not trivial to obtain since code readability is somewhat subjective. Recent research has resulted in increasingly sophisticated models for predicting readability as perceived by humans, primarily with a procedural and object-oriented focus, even as functional and declarative languages and language extensions advance and are often said to lead to more concise and readable code. In this paper, we investigate whether the existing complexity and readability metrics reflect that wisdom or whether the notion of readability and its constituents requires overhaul in the light of programming language changes. We therefore compare traditional object-oriented and reactive programming in terms of code complexity and readability in a case study. Reactive programming is claimed to increase code quality, but few studies have substantiated these claims empirically. We refactored an object-oriented open-source project into a reactive candidate and compared readability with the original using cyclomatic complexity and two state-of-the-art readability metrics. More elaborate investigations are required, but our findings suggest that both cyclomatic complexity and readability decrease significantly at the same time in the reactive candidate, which seems counter-intuitive. We exemplify and substantiate why readability metrics may require adjustment to better suit popular programming styles other than imperative and object-oriented, to better match human expectations.
... What constitutes readable code and what does not, seems to be largely a matter of personal taste. Notwithstanding this, research by Buse and Weimer [47] suggests that code readability can, at least in part, be measured objectively. ...
... A relatively small number of papers [26], [47]- [50] examine models for automatic code readability estimation. Most recently, Scalabrino et al. [26], [50] compiled a dataset by letting 30 Computer Science students rate the readability of 200 methods, previously selected from well-known Java projects. ...
... Each method received 9 readability ratings: these ratings were then averaged and compared against a threshold value to assign a single binary readability label. Further, they developed a logistic regression model for code readability estimation by combining structural readability features proposed by Buse and Weimer [47] and Dorn [49] with novel textual features. We base all of our experiments on the above dataset; previous datasets by Buse and Weimer [47] and Dorn [49] were not included due to lack of baselines suitable for comparison. ...
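The averaging-and-thresholding step described above amounts to the following sketch; the 1-5 rating scale and the cutoff value are assumptions for illustration, not the values from the original study:

```python
def binary_readability_label(ratings, cutoff=3.5):
    """Collapse per-method human ratings (e.g., 9 ratings on a 1-5 scale)
    into one binary readability label by thresholding the mean.
    The cutoff value here is illustrative, not the study's actual one."""
    mean = sum(ratings) / len(ratings)
    return 1 if mean >= cutoff else 0
```

The resulting labels then serve as the ground truth for training a classifier such as logistic regression.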
Preprint
Full-text available
This paper provides a starting point for Software Engineering (SE) researchers and practitioners faced with the problem of training machine learning models on small datasets. Due to the high costs associated with labeling data, in Software Engineering there exist many small (< 1 000 samples) and medium-sized (< 100 000 samples) datasets. While deep learning has set the state of the art in many machine learning tasks, it is only recently that it has proven effective on small datasets, primarily thanks to pre-training, a semi-supervised learning technique that leverages abundant unlabelled data alongside scarce labelled data. In this work, we evaluate pre-trained Transformer models on a selection of 13 smaller datasets from the SE literature, covering both source code and natural language. Our results suggest that pre-trained Transformers are competitive and in some cases superior to previous models, especially for tasks involving natural language; whereas for source code tasks, in particular for very small datasets, traditional machine learning methods often have the edge. In addition, we experiment with several techniques that ought to aid training on small datasets, including active learning, data augmentation, soft labels, self-training, and intermediate-task fine-tuning, and issue recommendations on when they are effective. We also release all the data, scripts, and most importantly pre-trained models for the community to reuse on their own datasets.
... In software engineering, the terms readability, legibility, understandability, and comprehensibility have overlapping meanings. For example, Buse and Weimer [22] define "readability as a human judgment of how easy a text is to understand". In a similar vein, Almeida et al. [23] affirm that "legibility is fundamental to code maintenance; if source code is written in a complex way, understanding it will require much more effort". ...
... A paper is deprecated if it was extended by another paper that we selected. For instance, the work of Buse and Weimer published in 2008 [40] was extended in a subsequent paper [22]. In this case, we consider the former to be deprecated and only take the latter into account. ...
... On the other hand, some studies required the subjects to act on the code. [Flattened table residue; the recoverable rows are: answer questions about code characteristics (27 studies); remember part of the code (7 studies); act on the code (15 studies), comprising find and fix bugs in the code (10 studies), modify the code (8 studies), and write code (3 studies); provide a personal opinion (30 studies), comprising opinion about the code's readability or legibility (23 studies), answer whether they understood the code (4 studies), rate confidence in the answer (3 studies), and rate the task difficulty (7 studies).] In ten studies, subjects were asked to find and fix bugs in the code. Scanniello et al. [11] asked subjects to do so in two programs with different identifier styles. ...
Preprint
Reading code is an essential activity in software maintenance and evolution. Several studies with human subjects have investigated how different factors, such as the employed programming constructs and naming conventions, can impact code readability, i.e., what makes a program easier or harder to read and apprehend by developers, and code legibility, i.e., what influences the ease of identifying elements of a program. These studies evaluate readability and legibility by means of different comprehension tasks and response variables. In this paper, we examine these tasks and variables in studies that compare programming constructs, coding idioms, naming conventions, and formatting guidelines, e.g., recursive vs. iterative code. To that end, we have conducted a systematic literature review where we found 54 relevant papers. Most of these studies evaluate code readability and legibility by measuring the correctness of the subjects' results (83.3%) or simply asking their opinions (55.6%). Some studies (16.7%) rely exclusively on the latter variable. There are still few studies that monitor subjects' physical signs, such as brain activation regions (5%). Moreover, our study shows that some variables are multi-faceted. For instance, correctness can be measured as the ability to predict the output of a program, answer questions about its behavior, or recall parts of it. These results make it clear that different evaluation approaches require different competencies from subjects, e.g., tracing the program vs. summarizing its goal vs. memorizing its text. To assist researchers in the design of new studies and improve our comprehension of existing ones, we model program comprehension as a learning activity by adapting a preexisting learning taxonomy. This adaptation indicates that some competencies are often exercised in these evaluations whereas others are rarely targeted.
... As a side note, care should be taken that the comments do not directly interact with the task. For example, Fig. 1 in a paper by Buse and Weimer (2010) shows a short code snippet preceded by the comment "this is hard to read", from an experiment where subjects are asked to assess readability. Providing such overt hints regarding the expected answer should be avoided. ...
... While this terminology is not universal (for example, Smith and Taffler advocate distinguishing between "reading" and "understanding" (Smith and Taffler 1992)), it is often also adopted in studies on reading code. For example, the first sentence in Buse and Weimer (2010) is "we define readability as a human judgment of how easy a text is to understand". But code is actually somewhat different from text. ...
Article
Full-text available
Understanding program code is a complicated endeavor. As a result, studying code comprehension is also hard. The prevailing approach for such studies is to use controlled experiments, where the difference between treatments sheds light on factors which affect comprehension. But it is hard to conduct controlled experiments with human developers, and we also need to find a way to operationalize what “comprehension” actually means. In addition, myriad different factors can influence the outcome, and seemingly small nuances may be detrimental to the study’s validity. In order to promote the development and use of sound experimental methodology, we discuss both considerations which need to be applied and potential problems that might occur, with regard to the experimental subjects, the code they work on, the tasks they are asked to perform, and the metrics for their performance. A common thread is that decisions that were taken in an effort to avoid one threat to validity may pose a larger threat than the one they removed.
... Terms such as 'readability', 'efficiency', and 'performance' represent the developers' main focus, with 16.37%, 13.85%, and 11.52%, respectively. Although multiple studies (Pantiuchina et al. 2018; Fakhoury et al. 2019; Alrubaye et al. 2020) have analyzed code comprehension and used metrics to measure readability, there is no mention of these readability tools/models (i.e., Dorn 2012; Scalabrino et al. 2018; Buse and Weimer 2009; Posnett et al. 2011) in the questions. For instance, developers refactor the code to improve its reusability. ...
... Improve and extend the applicability of readability quality metrics Our findings show that improvements to readability are a critical concern for developers. While the research community has made considerable strides in producing readability metrics and models (Buse and Weimer 2009;Scalabrino et al. 2018), the community needs to better collaborate with established vendors in integrating their contributions with popular tools and IDEs to promote the usage of their artifacts. Additionally, our findings highlight specific avenues for readability research, such as optimizing/eliminating lengthy switch-case statements and conditional loops and understanding their influence on comprehension. ...
Article
Full-text available
An essential part of software maintenance and evolution, refactoring is performed by developers, regardless of technology or domain, to improve the internal quality of the system, and reduce its technical debt. However, choosing the appropriate refactoring strategy is not always straightforward, resulting in developers seeking assistance. Although research in refactoring is well-established, with several studies altering between the detection of refactoring opportunities and the recommendation of appropriate code changes, little is known about their adoption in practice. Analyzing the perception of developers is critical to understand better what developers consider to be problematic in their code and how they handle it. Additionally, there is a need for bridging the gap between refactoring, as research, and its adoption in practice, by extracting common refactoring intents that are more suitable for what developers face in reality. In this study, we analyze refactoring discussions on Stack Overflow through a series of quantitative and qualitative experiments. Our results show that Stack Overflow is utilized by a diverse set of developers for refactoring assistance for a variety of technologies. Our observations show five areas that developers typically require help with refactoring– Code Optimization, Tools and IDEs, Architecture and Design Patterns, Unit Testing, and Database. We envision our findings better bridge the support between traditional (or academic) aspects of refactoring and their real-world applicability, including better tool support.
... The definition depends on the context of the software, the nature of the bug, the current state of the project, the ratio of affected users, potential harm to users, and many other factors. The current state of the practice for identifying and recording the severity of bugs is a manual process in issue tracking systems, where the severity of a bug has its own field (with options such as Blocker, Critical, Major, Minor, and Trivial, or sometimes with numbers ranging from zero to 20). The field is manually populated by the person who documents the bug; however, it may be changed by the technical team during the bug report review process. ...
... Readability (R): This metric combines different code features to calculate a single value for estimating code readability. We used the readability metric proposed by Buse and Weimer [20], which generates a readability score for a given method. Readability scores range from 0 (least readable) to 1 (most readable). ...
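The snippet above describes a metric that maps a method to a single score in [0, 1]. A minimal sketch of such a feature-based scorer is shown below; this is NOT the actual Buse and Weimer model, and all features and weights here are invented for illustration, mimicking only its general shape (extract local features, penalize complexity, squash into the unit interval):

```python
import re


def toy_readability(method_source: str) -> float:
    """Combine a few local code features into a score in [0, 1].

    Illustrative stand-in only: the real learned model uses many more
    features and weights fitted to human annotator judgments.
    """
    lines = method_source.splitlines() or [""]
    avg_line_len = sum(len(l) for l in lines) / len(lines)
    max_line_len = max(len(l) for l in lines)
    blank_ratio = sum(1 for l in lines if not l.strip()) / len(lines)
    branch_count = len(re.findall(r"\b(if|for|while|switch)\b", method_source))

    # Hand-picked illustrative weights: long lines and branching hurt,
    # blank lines help (consistent with the paper's abstract).
    penalty = 0.01 * avg_line_len + 0.005 * max_line_len + 0.15 * branch_count
    bonus = 0.5 * blank_ratio
    return 1.0 / (1.0 + max(0.0, penalty - bonus))  # squash into (0, 1]
```

A sparse, branch-free snippet scores near 1, while a dense one-liner full of control flow scores much lower.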
Preprint
Full-text available
In the past couple of decades, significant research efforts have been devoted to the prediction of software bugs (i.e., defects). These works leverage a diverse set of metrics, tools, and techniques to predict which classes, methods, lines, or commits are buggy. However, most existing work in this domain treats all bugs the same, which is not the case in practice: the more severe a bug, the higher its consequences. Therefore, it is important for a defect prediction method to estimate the severity of the identified bugs, so that the higher-severity ones get immediate attention. In this paper, we provide a quantitative and qualitative study on two popular datasets (Defects4J and Bugs.jar), using 10 common source code metrics and two popular static analysis tools (SpotBugs and Infer), analyzing their capability to predict defects and their severity. We studied 3,358 buggy methods with different severity labels from 19 Java open-source projects. Results show that although code metrics are powerful in predicting buggy code (Lines of Code, Maintainability Index, FanOut, and Effort are the best metrics), they cannot estimate the severity level of the bugs. In addition, we observed that static analysis tools have weak performance in both predicting bugs (F1 score range of 3.1%-7.1%) and their severity label (F1 score under 2%). We also manually studied the characteristics of the severe bugs to identify possible reasons behind the weak performance of code metrics and static analysis tools in estimating severity. Our categorization shows that Security bugs have high severity in most cases, while Edge/Boundary faults have low severity. Finally, we show that code metrics and static analysis methods can be complementary in terms of estimating bug severity.
... Researchers have conducted extensive studies to assess the impact of different program constructs on code readability [16,17]. To propose the source code readability measure [2], the authors performed a thorough study: first, they selected 100 Java code snippets from five open-source projects. ...
... The authors in [4] improved on the work of Buse and Weimer [2]. They use an entropy-based predictive modeling approach to improve code readability prediction. ...
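The entropy-based modeling mentioned above builds on the Shannon entropy of a snippet's token distribution. A minimal sketch follows, with a crude regex tokenizer standing in for a real lexer; this illustrates the entropy feature only, not the actual predictive model from [4]:

```python
import math
import re
from collections import Counter


def token_entropy(code: str) -> float:
    """Shannon entropy (in bits) of a snippet's token distribution.

    Lower entropy loosely means more repetitive, and often more
    predictable, code; entropy-based readability models use a measure
    like this as one input feature.
    """
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", code)
    if not tokens:
        return 0.0
    total = len(tokens)
    counts = Counter(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

For example, four identical tokens carry zero entropy, while four distinct tokens carry two bits.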
Article
Full-text available
In software, code is the only part that remains up to date, which shows how important code is. Code readability is the capability of the code that makes it readable and understandable to professionals. The readability of code has been a great concern for programmers and other technical people on the development team because it can have a great influence on software maintenance. A lot of research has been done to measure the influence of program constructs on code readability, but none has placed the highly influential constructs together to predict the readability of a code snippet. In this article, we propose a novel framework using statistical modeling that extracts important features from the code that can help in estimating its readability. Besides that, using multiple correlation analysis, our proposed approach can measure dependencies among different program constructs. In addition, a multiple regression equation is proposed to predict code readability. We have automated the proposals in a tool that can perform the aforementioned estimations on input code. Using this tool, we have conducted various experiments. The results show that the calculated estimations match the original values, demonstrating the effectiveness of our proposed work. Finally, the results of the experiments are analyzed through statistical analysis in the SPSS tool to show their significance.
... Terms such as 'readability', 'efficiency', and 'performance' represent the developers' main focus, with 16.37%, 13.85%, and 11.52%, respectively. Although multiple studies [75,53,32] have analyzed code comprehension and used metrics to measure readability, there is no mention of these readability tools/models (i.e., [50,89,46,83]) in the questions. For instance, developers refactor the code to improve its reusability. ...
... Improve and extend the applicability of readability quality metrics Our findings show that improvements to readability are a critical concern for developers. While the research community has made considerable strides in producing readability metrics and models [46,89], the community needs to better collaborate with established vendors in integrating their contributions with popular tools and IDEs to promote the usage of their artifacts. Additionally, our findings highlight specific avenues for readability research, such as optimizing/eliminating lengthy switch-case statements and conditional loops and understanding their influence on comprehension. ...
Preprint
Full-text available
An essential part of software maintenance and evolution, refactoring is performed by developers, regardless of technology or domain, to improve the internal quality of the system and reduce its technical debt. However, choosing the appropriate refactoring strategy is not always straightforward, resulting in developers seeking assistance. Although research in refactoring is well-established, with several studies alternating between the detection of refactoring opportunities and the recommendation of appropriate code changes, little is known about their adoption in practice. Analyzing the perception of developers is critical to better understand what developers consider to be problematic in their code and how they handle it. Additionally, there is a need to bridge the gap between refactoring as research and its adoption in practice, by extracting common refactoring intents that are more suitable for what developers face in reality. In this study, we analyze refactoring discussions on Stack Overflow through a series of quantitative and qualitative experiments. Our results show that Stack Overflow is utilized by a diverse set of developers for refactoring assistance across a variety of technologies. Our observations show five areas in which developers typically require help with refactoring: Code Optimization, Tools and IDEs, Architecture and Design Patterns, Unit Testing, and Database. We envision that our findings will help bridge the gap between traditional (or academic) aspects of refactoring and their real-world applicability, including better tool support.
... A few other studies have also examined practitioners' views of snippet quality (Bai et al., 2019; Buse & Weimer, 2009; Tavakoli et al., 2020). While previous work has focused on quality aspects of code snippets, we did not find any studies that have reviewed and compared academics' views of code quality with practitioners'. Our work aims to comprehensively understand how views on software quality have shifted over time, examining early software models focused on software quality and more recent frameworks typically used to study the quality of shorter code/snippets used by practitioners seeking help online. ...
... On average, developers spend 70% of their time reading programs [14]. Code search engines often use readability metrics [21] [19] [3] to improve code snippet ranking [8] [15]. These metrics have been employed in recent research, for instance to recommend readable APIs in code snippets [8] or to evaluate readability changes across a project's history [18]. ...
... Moreno et al. developed the Muse approach to rank code snippets, producing an overall score from readability and reusability features. However, that research employed a different readability approach [3], while the other research mentioned used the readability approach of Scalabrino et al. [20]. ...
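The ranking step these search engines share can be sketched as sorting candidates by a readability score. The scoring model is left as a pluggable callable precisely because the cited works use different ones; this is an illustration, not any specific engine's implementation:

```python
def rank_snippets(snippets, readability):
    """Rank candidate code snippets by a readability score, descending.

    `readability` is any callable mapping a snippet to a score in
    [0, 1], e.g. a learned model in the style of Buse and Weimer or
    Scalabrino et al. Illustrative sketch only.
    """
    return sorted(snippets, key=readability, reverse=True)
```

Usage: with a toy model that penalizes length, shorter snippets surface first; plugging in a real readability model changes only the callable, not the ranking code.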
Preprint
Full-text available
Code search engines usually use a readability feature to rank code snippets. There are several metrics to calculate this feature, but developers may have different perceptions of readability. A correlation between readability and understandability has already been proposed: developers need to read and comprehend the code snippet's syntax, but also understand its semantics. This work investigates scores for understandability and readability, from the perspective of the possibly subjective perception of code snippet comprehension. We find that code snippets with a higher readability score are comprehended better than those with a lower one. The understandability score yields better comprehension in specific situations, e.g., nested loops or if-else chains. The developers also mentioned writability aspects as the principal characteristic when evaluating code snippet comprehension. These results provide insights for future work on code comprehension score optimization.
... However, there is still very little support for developers in terms of helping them craft high-quality identifier names. Research has examined the terms or structure of names [2], [7]- [10] and produced readability metrics and models [11]- [13] to try and address this problem. However, they still fall short of providing tangible advice for improving naming practices in developers' day-to-day activities. ...
Preprint
Full-text available
Developers must comprehend the code they will maintain, meaning that the code must be legible and reasonably self-descriptive. Unfortunately, there is still a lack of research and tooling that supports developers in understanding their naming practices: whether the names they choose make sense, whether they are consistent, and whether they convey the information required of them. In this paper, we present IDEAL, a tool that will provide feedback to developers about their identifier naming practices. Among its planned features, it will support linguistic anti-pattern detection, which is what will be discussed in this paper. IDEAL is designed to be, and will be, extended to cover further anti-patterns, naming structures, and practices in the near future. IDEAL is open-source and publicly available, with a demo video available at: https://youtu.be/fVoOYGe50zg
... Hora [10] constructed the API Sonar tool, ranking code snippets with the readability feature proposed by Scalabrino et al. [18]. Moreno et al. [13] developed the Muse approach to rank code examples using the readability feature proposed by Buse and Weimer [2]. These related works show that readability is a widely used feature for ranking code snippets. ...
Preprint
Developers often search for reusable code snippets on general-purpose web search engines like Google, Yahoo!, or Microsoft Bing. However, some of these code snippets may have poor quality in terms of readability or understandability. In this paper, we conduct an empirical analysis of the readability and understandability scores of snippets extracted from the web, using three independent variables: ranking, general-purpose web search engine, and recommended site. We collected the top-5 recommended sites and their respective code snippet recommendations using Google, Yahoo!, and Bing for 9,480 queries, and evaluated their readability and understandability scores. We found that some recommended sites have significantly better readability and understandability scores than others. A better-ranked code snippet is not necessarily more readable or understandable than a lower-ranked one for any of the general-purpose web search engines. Moreover, considering the readability score, Google returns better-ranked code snippets than Yahoo! or Microsoft Bing.
... While difficult to quantify, readability of source code intuitively describes how easy it is to understand it. Buse and Weimer [27] created a model of code readability based on subjective human judgements of given code snippets, and demonstrated that this metric strongly correlates with different aspects of code quality. The model is based on a collection of syntactic features such as line length or types of tokens used. ...
Preprint
Full-text available
Block-based programming languages like Scratch have become increasingly popular as introductory languages for novices. These languages are intended to be used with a "tinkering" approach which allows learners and teachers to quickly assemble working programs and games, but this often leads to low code quality. Such code can be hard to comprehend, changing it is error-prone, and learners may struggle and lose interest. The general solution to improve code quality is to refactor the code. However, Scratch lacks many of the common abstraction mechanisms used when refactoring programs written in higher programming languages. In order to improve Scratch code, we therefore propose a set of atomic code transformations to optimise readability by (1) rewriting control structures and (2) simplifying scripts using the inherently concurrent nature of Scratch programs. By automating these transformations it is possible to explore the space of possible variations of Scratch programs. In this paper, we describe a multi-objective search-based approach that determines sequences of code transformations which improve the readability of a given Scratch program and therefore form refactorings. Evaluation on a random sample of 1000 Scratch programs demonstrates that the generated refactorings reduce complexity and entropy in 70.4% of the cases, and 354 projects are improved in at least one metric without making any other metric worse. The refactored programs can help both novices and their teachers to improve their code.
... For this, we enforce the code quality through automatic code analysis and strict code reviews. While comprehensibility is hard to quantify [36], we track this requirement by using Hyrise not only as a research but also as a teaching platform. Since the rewrite, 93 master's students have participated in four iterations of our project seminar. ...
Thesis
Full-text available
A decade ago, it became feasible to store multi-terabyte databases in main memory. These in-memory databases (IMDBs) profit from DRAM's low latency and high throughput as well as from the removal of costly abstractions used in disk-based systems, such as the buffer cache. However, as the DRAM technology approaches physical limits, scaling these databases becomes difficult. Non-volatile memory (NVM) addresses this challenge. This new type of memory is persistent, has more capacity than DRAM (4x), and does not suffer from its density-inhibiting limitations. Yet, as NVM has a higher latency (5-15x) and a lower throughput (0.35x), it cannot fully replace DRAM. IMDBs thus need to navigate the trade-off between the two memory tiers. We present a solution to this optimization problem. Leveraging information about access frequencies and patterns, our solution utilizes NVM's additional capacity while minimizing the associated access costs. Unlike buffer cache-based implementations, our tiering abstraction does not add any costs when reading data from DRAM. As such, it can act as a drop-in replacement for existing IMDBs. Our contributions are as follows: (1) As the foundation for our research, we present Hyrise, an open-source, columnar IMDB that we re-engineered and re-wrote from scratch. Hyrise enables realistic end-to-end benchmarks of SQL workloads and offers query performance which is competitive with other research and commercial systems. At the same time, Hyrise is easy to understand and modify as repeatedly demonstrated by its uses in research and teaching. (2) We present a novel memory management framework for different memory and storage tiers. By encapsulating the allocation and access methods of these tiers, we enable existing data structures to be stored on different tiers with no modifications to their implementation. Besides DRAM and NVM, we also support and evaluate SSDs and have made provisions for upcoming technologies such as disaggregated memory. 
(3) To identify the parts of the data that can be moved to (s)lower tiers with little performance impact, we present a tracking method that identifies access skew both in the row and column dimensions and that detects patterns within consecutive accesses. Unlike existing methods that have substantial associated costs, our access counters exhibit no identifiable overhead in standard benchmarks despite their increased accuracy. (4) Finally, we introduce a tiering algorithm that optimizes the data placement for a given memory budget. In the TPC-H benchmark, this allows us to move 90% of the data to NVM while the throughput is reduced by only 10.8% and the query latency is increased by 11.6%. With this, we outperform approaches that ignore the workload's access skew and access patterns and increase the query latency by 20% or more. Individually, our contributions provide novel approaches to current challenges in systems engineering and database research. Combining them allows IMDBs to scale past the limits of DRAM while continuing to profit from the benefits of in-memory computing.
... The following are code complexity metrics, which, for ease of reference, we divided into two sub-categories: Literature and Complexity. In the "Literature" category, Table 2 presents major complexity metrics that received significant attention in the research literature, beginning with McCabe's almost half-century-old cyclomatic complexity [43] and Halstead's metrics [42], through the famous Chidamber and Kemerer [17] metrics of object-oriented design, and ending with the relatively recent Buse and Weimer [38] metric and Daka et al.'s "readability" metric. ...
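McCabe's cyclomatic complexity, mentioned above, is classically computed as the number of decision points in a routine plus one. A rough regex-based sketch over Java-like source follows; a real tool works on the parsed control-flow graph, so treat this as an approximation for illustration:

```python
import re


def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe cyclomatic complexity for a single routine.

    Counts branching keywords and short-circuit operators, then adds
    one (the number of linearly independent paths through straight-line
    code). Regex counting is a crude stand-in for CFG analysis.
    """
    decisions = len(re.findall(r"\b(if|for|while|case|catch)\b", source))
    decisions += len(re.findall(r"&&|\|\|", source))  # short-circuit ops add branches
    return decisions + 1
```

Straight-line code has complexity 1; each `if`, loop, `case`, or short-circuit operator adds one path.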
Preprint
Mutation score is widely accepted to be a reliable measurement for the effectiveness of software tests. Recent studies, however, show that mutation analysis is extremely costly and hard to use in practice. We present a novel direct prediction model of mutation score using neural networks. Relying solely on static code features that do not require generation of mutants or execution of the tests, we predict mutation score with an accuracy better than a quintile. When we include statement coverage as a feature, our accuracy rises to about a decile. Using a similar approach, we also improve the state-of-the-art results for binary test effectiveness prediction and introduce an intuitive, easy-to-calculate set of features superior to previously studied sets. We also publish the largest dataset of test-class level mutation score and static code features data to date, for future research. Finally, we discuss how our approach could be integrated into real-world systems, IDEs, CI tools, and testing frameworks.
... We consider different groups of metrics: a) Object-oriented metrics which include the metrics proposed by Chidamber and Kemerer [51] such as Weighted Methods per Class (WMC), Depth Inheritance Tree (DIT), Coupling between objects (CBO), Response for a Class (RFC). b) Readability metrics such as the number of loops and the number of comparisons [37,240] and c) other source code metrics like the number of Sources Line Of Code (SLOC). To extract these features, we used CK [13] that applies static analysis to calculate code metrics. ...
Thesis
In recent years, with more than 3 million applications on its official store, Google’s Android has dominated the market of mobile operating systems worldwide. Despite this success, Google has continued evolving its operating system and its toolkits to ease application development. In 2017 Google declared Kotlin an official Android programming language. More recently, during Google I/O 2019, Google announced that Android became ‘Kotlin-first’, which means that new APIs, libraries, and documentation will target Kotlin, making it the preferred language for creating new Android applications. Kotlin is a programming language that runs on the Java Virtual Machine (JVM) and is fully interoperable with Java because both languages are compiled to JVM bytecode. Due to this characteristic, Android developers do not need to migrate their existing applications to Kotlin to start using Kotlin in them. Moreover, Kotlin provides a different approach to writing applications because it combines object-oriented and functional features. Therefore, we hypothesize that the adoption of Kotlin by developers may affect different aspects of Android application development. However, one year after this first announcement, there were no studies in the literature about Kotlin. In this thesis, we conducted a series of empirical studies to address this gap and build a better understanding of creating high-quality Android applications using Kotlin. First, we carried out a study to measure the degree of adoption of Kotlin. Our results showed that 11% of the studied Android applications had adopted Kotlin. Then, we analyzed how the adoption of Kotlin impacted the quality of Android applications in terms of code smells. We found that the introduction of Kotlin in Android applications initially written in Java produces a rise in the quality scores from 50% to 80%, according to the code smell considered.
We analyzed the evolution of usage of features introduced by Kotlin, such as Smart cast, and how the amount of Kotlin code changes over applications’ evolution. We found that the number of instances of features tends to grow throughout applications’ evolution. Finally, we focused on the migration of Android applications from Java to Kotlin. We found that 25% of the open source applications that were initially written in Java have entirely migrated to Kotlin, and for 19%, the migration was done gradually, throughout several versions, thanks to the interoperability between Java and Kotlin. This migration activity is challenging because: a) each migrated piece of code must be exhaustively tested after the migration to ensure it preserves the expected behavior; b) a project can be large, composed of several candidate files to be migrated. In this thesis, we present an approach to support migration, which suggests, given a version of an application written in Java and eventually, in Kotlin, the most convenient files to migrate. We evaluated our approach’s feasibility by applying two different machine learning techniques: classification and learning-to-rank. Our results showed that both techniques modestly outperform random approaches. Nevertheless, our approach is the first that proposes the use of machine learning to recommend file-level migrations. Therefore, our results define a baseline for future work. Since the migration from Java to Kotlin may positively impact the application’s maintenance and that migration is time-consuming and challenging, developers may use our approach to select the files to be migrated first. Finally, we discuss several research perspectives opened by our results that can improve the experience of creating high-quality Android applications using Kotlin.
... It offers a complexity score, represented by the probability of the sentence belonging to one of the two classes. In the context of evaluating the readability of software code, in [35] many classifiers are tested to discover the code-writing features that affect software quality and to create a readability measure. The study presents a descriptive model of software readability that is strongly correlated with the judgments of 120 human annotators. ...
Article
Full-text available
Automatic Text Complexity Evaluation (ATE) is a research field that aims at creating new methodologies to make autonomous the process of the text complexity evaluation, that is the study of the text-linguistic features (e.g., lexical, syntactical, morphological) to measure the grade of comprehensibility of a text. ATE can affect positively several different contexts such as Finance, Health, and Education. Moreover, it can support the research on Automatic Text Simplification (ATS), a research area that deals with the study of new methods for transforming a text by changing its lexicon and structure to meet specific reader needs. In this paper, we illustrate an ATE approach named DeepEva, a Deep Learning based system capable of classifying both Italian and English sentences on the basis of their complexity. The system exploits the Treetagger annotation tool, two Long Short Term Memory (LSTM) neural unit layers, and a fully connected one. The last layer outputs the probability of a sentence belonging to the easy or complex class. The experimental results show the effectiveness of the approach for both languages, compared with several baselines such as Support Vector Machine, Gradient Boosting, and Random Forest.
... Recent important research is still concerned with process maturity, but CMM(I) is combined with DevOps and agile approaches [35][36][37]. • Metric-based software maintenance: Most cited papers related to this theme were published in the period 1993-2010 and are concerned with object-oriented metrics to predict maintainability [38], metrics-based refactoring [39], predicting faults [40] and code readability metrics [41]. Recent impactful papers deal with technical debt [42], code smells and refactoring [43] and test smells [44]. ...
Article
Interconnected computers and software systems have become an indispensable part of people’s lives in the period of digital transformation. Consequently, software quality research is becoming more and more critical. There have been multiple attempts to synthesise knowledge gained in software quality research; however, they were focused mainly on single aspects of software quality and did not structure the knowledge holistically. To fill this gap, we harvested software quality publications indexed in the Scopus bibliographic database. We analysed them using synthetic content analysis, which is a triangulation of bibliometrics and content analysis. The search resulted in 15,468 publications. The performance bibliometric analysis showed that the production of research publications relating to software quality is currently following an exponential growth trend and that the software quality research community is growing. The most productive country was the United States, followed by China. The synthetic content analysis revealed that the published knowledge can be structured into six themes, the most important being those regarding software quality improvement by enhancing software engineering, advanced software testing, and improved defect and fault prediction with machine learning and data mining.
... We seek participants' recommendations on tool design requirements. In addition, we offer a few tool support options (Table 7) and employ a 5-point Likert scale (i.e., 1 to 5) to estimate the participants' agreement with the tool options. In particular, we ask two questions as follows. ...
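Survey studies like the one above typically aggregate 5-point Likert responses into a central tendency plus an agreement rate. A minimal sketch of that aggregation (an illustration of the common practice, not the cited study's actual analysis):

```python
from statistics import mean, median


def summarize_likert(responses):
    """Summarize 5-point Likert responses (1 = strongly disagree,
    5 = strongly agree): central tendency plus the share of
    respondents who agree (4 or 5).
    """
    if any(r not in {1, 2, 3, 4, 5} for r in responses):
        raise ValueError("responses must be integers 1..5")
    agree = sum(1 for r in responses if r >= 4)
    return {
        "mean": mean(responses),
        "median": median(responses),
        "agreement_rate": agree / len(responses),
    }
```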
Preprint
Full-text available
Software developers often look for solutions to their code-level problems by submitting questions to technical Q&A websites like Stack Overflow (SO). They usually include example code segments with questions to describe the programming issues. SO users prefer to reproduce the reported issues using the given code segments when they attempt to answer the questions. Unfortunately, such code segments cannot always reproduce the issues due to several unmet challenges (e.g., an external library not found) that might prevent questions from receiving prompt and appropriate solutions. A previous study produced a catalog of potential challenges that hinder the reproducibility of issues reported in SO questions. However, it is unknown how practitioners (i.e., developers) perceive the challenge catalog. Understanding the developers' perspective is essential for introducing interactive tool support that promotes reproducibility. We thus attempt to understand developers' perspectives by surveying 53 users of SO. In particular, we attempt to (1) see developers' viewpoints on agreement with those challenges, (2) find the potential impact of those challenges, (3) see how developers address them, and (4) determine and prioritize tool support needs. Survey results show that about 90% of participants agree with the already exposed challenges. However, they report some additional challenges (e.g., a missing error log) that might prevent reproducibility. According to the participants, too-short code segments and the absence of a required Class/Interface/Method from code segments severely prevent reproducibility, followed by missing important parts of the code. To promote reproducibility, participants strongly recommend introducing tool support that interacts with question submitters, suggesting improvements to the code segments if the given code segments fail to reproduce the issues.
... Corazza et al. [7] performed a study where they manually analyzed code comments and predicted human ratings using TF-IDF as well. Buse and Weimer [2,3] developed a metric for code readability based on entropy within the code. Their model was later refined by Posnett et al. [36]. ...
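TF-IDF, as applied to code comments in the study above, weights a term by its frequency within a document against its rarity across the corpus. A minimal stdlib sketch of the standard scheme tf(t, d) * log(N / df(t)) follows; real pipelines (e.g. scikit-learn) add smoothing and normalization:

```python
import math
from collections import Counter


def tfidf(documents):
    """Compute TF-IDF weights for a list of token lists.

    Returns one {token: weight} dict per document, using raw term
    frequency and an unsmoothed inverse document frequency.
    """
    n_docs = len(documents)
    df = Counter()  # document frequency: in how many docs a token appears
    for doc in documents:
        df.update(set(doc))
    weights = []
    for doc in documents:
        tf = Counter(doc)
        weights.append({
            t: (count / len(doc)) * math.log(n_docs / df[t])
            for t, count in tf.items()
        })
    return weights
```

A token appearing in every document (like "the" below) gets weight zero, while document-specific tokens get positive weight.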
Chapter
Full-text available
Machine learning has emerged as a useful tool to aid software quality control. It can support identifying problematic code snippets or predicting maintenance efforts. The majority of these frameworks rely on code metrics as input. However, evidence suggests great potential for text- and image-based approaches to predict code quality as well. Using a manually labeled dataset, this preliminary study examines the use of five text- and two image-based algorithms to predict the readability, understandability, and complexity of source code. While the overall performance can still be improved, we find Support Vector Machines (SVM) outperform sophisticated text transformer models and image-based neural networks. Furthermore, text-based SVMs tend to perform well on predicting readability and understandability of code, while image-based SVMs can predict code complexity more accurately. Our study both shows the potential of text- and image-based algorithms for software quality prediction and outlines their weaknesses as a starting point for further research.
... Code snippets were collected from previous research (Mi et al., 2018). As shown in Table 1, based on a model proposed by Buse and Weimer (2009), "D_Buse" represents a set of local code attributes to symbolize code readability. Dorn's model "D_Dorn" (Dorn, 2012) focused on building a universal model for code readability. ...
... We seek participants' recommendations on tool design requirements. In addition, we offer a few tool support options (Table 7) and employ a 5-point Likert scale (i.e., 1 to 5) to estimate the participants' agreement with the tool options. In particular, we ask two questions as follows. ...
... Having readable code makes it easier for developers to understand and modify the code. Studies on code readability have been conducted extensively (Scalabrino et al. 2018; Scalabrino et al. 2017; Buse and Weimer 2010; Scalabrino et al. 2016; Posnett et al. 2011; Mi et al. 2018). These studies define the metrics and features that are considered important factors in determining code readability. ...
Article
Full-text available
The Android operating system is frequently updated, with each version bringing a new set of APIs. New versions may involve API deprecation; Android apps using deprecated APIs need to be updated to ensure the apps’ compatibility with old and new Android versions. Updating deprecated APIs is a time-consuming endeavor. Hence, automating the updates of Android APIs can be beneficial for developers. CocciEvolve is the state-of-the-art approach for this automation. However, it has several limitations, including its inability to resolve out-of-method variables and the low code readability of its updates due to the addition of temporary variables. In an attempt to further improve the performance of automated Android API update, we propose an approach named AndroEvolve, that addresses the limitations of CocciEvolve through the addition of data flow analysis and variable name denormalization. Data flow analysis enables AndroEvolve to resolve the value of any variable within the file scope. Variable name denormalization replaces temporary variables that may present in the CocciEvolve update with appropriate values in the target file. We have evaluated the performance of AndroEvolve and the readability of its updates on 372 target files containing 565 deprecated API usages. Each target file represents a file from an Android application that uses a deprecated API in its code. AndroEvolve successfully updates 481 out of 565 deprecated API invocations correctly, achieving an accuracy of 85.1%. Compared to CocciEvolve, AndroEvolve produces 32.9% more instances of correct updates. Moreover, our manual and automated evaluation shows that AndroEvolve updates are more readable than CocciEvolve updates.
Preprint
Full-text available
Marketplaces for distributing software products and services have been gaining popularity. GitHub, which is best known for its version control functionality through Git, launched its own marketplace in 2017. GitHub Marketplace hosts third-party apps and actions to automate workflows in software teams. Currently, this marketplace hosts 440 Apps and 7,878 Actions across 32 different categories. Overall, 419 third-party developers released their apps on this platform, which 111 distinct customers adopted. The popularity and accessibility of GitHub projects have made this platform and the projects hosted on it one of the most frequent subjects for experimentation in software engineering research. A simple Google Scholar search shows that 24,100 research papers have discussed GitHub within the software engineering field since 2017, but none have looked into the marketplace. The GitHub Marketplace provides a unique source of information on the tools used by practitioners in the Open Source Software (OSS) ecosystem for automating their projects' workflows. In this study, we (i) mine and provide a descriptive overview of the GitHub Marketplace, (ii) perform a systematic mapping of research studies in automation for open source software, and (iii) compare the state of the art with the state of the practice on the automation tools. We conclude the paper by discussing the potential of GitHub Marketplace for knowledge mobilization and collaboration within the field. This is the first study on the GitHub Marketplace in the field.
Article
Context Code readability, which correlates strongly with software quality, plays a critical role in software maintenance and evolution. Although existing deep learning-based code readability models have reached a rather high classification accuracy, only structural features are utilized, which inevitably limits their model performance. Objective To address this problem, we propose to extract readability-related features from visual, semantic, and structural aspects of source code in an attempt to further improve code readability classification. Method First, we convert a code snippet into an RGB matrix (for visual feature extraction), a token sequence (for semantic feature extraction) and a character matrix (for structural feature extraction). Then, we input them into a hybrid neural network that is composed of BERT, CNN, and BiLSTM for feature extraction. Finally, the extracted features are concatenated and input into a classifier to make a code readability classification. Result A series of experiments are conducted to evaluate our method. The results show that the average accuracy could reach 85.3%, which outperforms all existing models. Conclusion As an innovative work of extracting readability-related features automatically from visual, semantic, and structural aspects, our method is proved to be effective for the task of code readability classification.
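The structural representation described above (a fixed-size character matrix) can be sketched as follows. The dimensions and zero-padding convention are illustrative assumptions, not the exact encoding used by the cited model.

```python
def to_char_matrix(code, rows=8, cols=16):
    """Encode a snippet as a fixed rows x cols grid of character codes.

    Each source line is truncated or zero-padded to `cols` characters,
    and the snippet is truncated or padded to `rows` lines; 0 marks
    padding. Dimensions here are toy choices for illustration.
    """
    grid = [[0] * cols for _ in range(rows)]
    for i, line in enumerate(code.splitlines()[:rows]):
        for j, ch in enumerate(line[:cols]):
            grid[i][j] = ord(ch)
    return grid
```

Such a grid preserves indentation and line layout, which is exactly the kind of low-level structure a convolutional model can pick up on.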
Chapter
Software cost estimation is an integral part of project management in every software development organization, accounting for all the measurable effort required to develop software. This topic has been investigated consistently in software engineering over the last decade, with intermittent publication of research papers. A review of existing approaches shows that the problem is still open. This paper therefore introduces a machine learning-based approach in which a project manager computes the software cost from standard inputs; the manager's estimate is then fed to neural network processors subjected to multiple learning algorithms, in order to perform accurate software cost prediction under practical project management scenarios. The comparative study shows substantially better accuracy across the three stages of evaluation in the presence of multiple learning approaches.
Article
Full-text available
Software developers often look for solutions to their code-level problems using the Stack Overflow Q&A website. To receive help, developers frequently submit questions that contain sample code segments along with the description of the programming issue. Unfortunately, it is not always possible to reproduce the issues from the code segments they provide. Issues that are not easily reproducible may impede questions from receiving prompt and appropriate solutions. We conducted an exploratory study on the reproducibility of issues discussed in 400 Java and 400 Python questions. We parsed, compiled, executed, and carefully examined the code segments from these questions to reproduce the reported programming issues, expending 300 person-hours of effort. The outcomes of our study are three-fold. First, we can reproduce the issues for approximately 68% of Java and 71% of Python code segments. In contrast, we were unable to reproduce approximately 22% of Java and 19% of Python issues. Of the reproducible issues, approximately 67% of the Java and 20% of the Python code segments required minor or major modifications to reproduce the issues. Second, we carefully investigated why programming issues could not be reproduced and provided evidence-based guidelines to write effective code examples for Stack Overflow questions. Third, we investigated the correlation between the issue reproducibility status of questions and the corresponding answer meta-data, such as the presence of an accepted answer. According to our analysis, a reproducible question has at least two times higher chance of receiving an accepted answer than an irreproducible question. Besides, the median time delay in receiving accepted answers is double if the issues reported in questions could not be reproduced. We also investigated the confounding factors (e.g., user reputation) that can affect questions receiving answers besides reproducibility. 
We found that such factors do not hurt the correlation between reproducibility status and answer meta-data.
Preprint
Full-text available
The relevance of code comprehension in a developer's daily work was recognized more than 40 years ago. Over the years, several studies have gathered evidence that developers do indeed invest a considerable amount of their daily work in code comprehension. Consequently, many studies were conducted to find out how developers could be supported during code comprehension and which code characteristics contribute to better comprehension. Today, such experiments are more common than ever. While this is great for advancing the field, the number of publications makes it difficult to keep an overview. Additionally, designing rigorous experiments with human participants is a challenging task, and the multitude of design decisions and options can make it difficult for researchers to select a suitable design. We therefore conducted a systematic mapping study of 95 source code comprehension experiments published between 1979 and 2019. By systematically structuring the design characteristics of code comprehension studies, we provide a basis for subsequent discussion of the huge diversity of design options in the face of a lack of basic research on their consequences and comparability. We describe what topics have been studied, as well as how these studies have been designed, conducted, and reported. Frequently chosen design options and deficiencies are pointed out. We conclude with five concrete action items that we as a research community should address moving forward to improve publications of code comprehension experiments.
Chapter
Given the wide adoption of the agile software development paradigm, where efficient collaboration as well as effective maintenance are of utmost importance, the need to produce readable source code is evident. To that end, several research efforts aspire to assess the extent to which a software component is readable. Several metrics and evaluation criteria have been proposed; however, they are mostly empirical or rely on experts who are responsible for determining the ground truth and/or set custom thresholds, leading to results that are context-dependent and subjective. In this work, we employ a large set of static analysis metrics along with various coding violations towards interpreting readability as perceived by developers. Unlike already existing approaches, we refrain from using experts and we provide a fully automated and extendible methodology built upon data residing in online code hosting facilities. We perform static analysis at two levels (method and class) and construct a benchmark dataset that includes more than one million methods and classes covering diverse development scenarios. After performing clustering based on source code size, we employ Support Vector Regression in order to interpret the extent to which a software component is readable against the source code properties: cohesion, inheritance, complexity, coupling, and documentation. The evaluation of our methodology indicates that our models effectively interpret readability as perceived by developers against the above mentioned source code properties.
Preprint
Full-text available
Software developers often look for solutions to their code-level problems using the Stack Overflow Q&A website. To receive help, developers frequently submit questions containing sample code segments and the description of the programming issue. Unfortunately, it is not always possible to reproduce the issues from the code segments that may impede questions from receiving prompt and appropriate solutions. We conducted an exploratory study on the reproducibility of issues discussed in 400 Java and 400 Python questions. We parsed, compiled, executed, and carefully examined the code segments from these questions to reproduce the reported programming issues. The outcomes of our study are three-fold. First, we found that we can reproduce approximately 68% of Java and 71% of Python issues, whereas we were unable to reproduce approximately 22% of Java and 19% of Python issues using the code segments. Of the issues that were reproducible, approximately 67% of the Java code segments and 20% of the Python code segments required minor or major modifications to reproduce the issues. Second, we carefully investigated why programming issues could not be reproduced and provided evidence-based guidelines for writing effective code examples for Stack Overflow questions. Third, we investigated the correlation between the issue reproducibility status of questions and the corresponding answer meta-data, such as the presence of an accepted answer. According to our analysis, a reproducible question has at least two times higher chance of receiving an accepted answer than an irreproducible question. Besides, the median time delay in receiving accepted answers is double if the issues reported in questions could not be reproduced. We also investigate the confounding factors (e.g., reputation) and find that confounding factors do not hurt the correlation between reproducibility status and answer meta-data.
Article
Recently, Code Quality (CQ) has become critical in a wide range of organizations and in many areas from academia to industry. CQ, in terms of readability, security, and testability, is a major goal throughout the software development process because it affects overall Software Quality (SQ) in terms of subsequent releases, maintenance, and updates. It is particularly important for the development of safety-critical systems. Existing studies on CQ have several shortcomings in that they are based on incomplete information about the source code and tend to focus on only one feature, which is likely to determine the performance of the model. Moreover, these considerations often limit obtaining high accuracy because there is no strong relationship between the input data and the output data. Thus, it is necessary to design an effective and efficient SQ measurement system for measuring multiple quality factors. To that end, we propose a deep learning framework that employs Latent Dirichlet Allocation (LDA) with Convolutional Neural Networks (CNN), called CNN-LDA, to classify input data into topics that are related to CQ features and to identify hidden patterns and correlations in programming data. Three SQ metrics (i.e., readability, security, and testability) and machine learning techniques (e.g., random forest (RF) and support vector machine (SVM)) are taken into account to validate the proposed model. The proposed CNN-LDA outperformed its peers across the vast majority of datasets examined. The average overall F-measures for readability, security, and testability are 94%, 94%, and 93%; the average overall accuracies are 93%, 93%, and 92%. The superiority of CNN-LDA over the other classifiers was clear based on a Wilcoxon non-parametric statistical test (α=0.05).
Article
Predictive models are one of the most important techniques that are widely applied in many areas of software engineering. There have been a large number of primary studies that apply predictive models and that present well-performed studies in various research domains, including software requirements, software design and development, testing and debugging and software maintenance. This paper is a first attempt to systematically organize knowledge in this area by surveying a body of 421 papers on predictive models published between 2009 and 2020. We describe the key models and approaches used, classify the different models, summarize the range of key application areas, and analyze research results. Based on our findings, we also propose a set of current challenges that still need to be addressed in future work and provide a proposed research road map for these opportunities.
Article
Full-text available
These days, over three billion users rely on mobile applications (a.k.a. apps) on a daily basis to access high-speed connectivity and all kinds of services it enables, from social to emergency needs. Having high-quality apps is therefore a vital requirement for developers to keep staying on the market and acquire new users. For this reason, the research community has been devising automated strategies to better test these applications. Despite the effort spent so far, most developers write their test cases manually without the adoption of any tool. Nevertheless, we still observe a lack of knowledge on the quality of these manually written tests: an enhanced understanding of this aspect may provide evidence-based findings on the current status of testing in the wild and point out future research directions to better support the daily activities of mobile developers. We perform a large-scale empirical study targeting 1,693 open-source Android apps and aiming at assessing (1) the extent to which these apps are actually tested, (2) how well-designed are the available tests, (3) what is their effectiveness, and (4) how well manual tests can reduce the risk of having defects in production code. In addition, we conduct a focus group with 5 Android testing experts to discuss the findings achieved and gather insights into the next research avenues to undertake. The key results of our study show Android apps are poorly tested and the available tests have low (i) design quality, (ii) effectiveness, and (iii) ability to find defects in production code. Among the various suggestions, testing experts report the need for improved mechanisms to locate potential defects and deal with the complexity of creating tests that effectively exercise the production code.
Conference Paper
Blockchain is increasingly revolutionizing a variety of sectors, from finance to healthcare. Indeed, the availability of public blockchain platforms, such as Ethereum, has stimulated the development of hundreds of decentralized apps (dApps) that combine smart contract(s) and a front-end user interface. Smart contracts are software, as well, and, as traditional software, they require to be developed and maintained or evolved. Among all the quality properties that must be assessed and guaranteed, readability is a key aspect of source code: a highly readable code facilitates its maintainability, portability, and reusability. This is especially true when considering smart contracts, where code reuse is widely adopted. Indeed, smart contract developers often integrate code portions from other smart contracts in their artifacts. To help developers and researchers more easily estimating and monitoring the code readability of smart contracts, in this demo, we present iSCREAM. iSCREAM automatically inspects Solidity smart contracts and computes a set of metrics that previous research demonstrated being related to code readability. We evaluated iSCREAM on 90 real-world smart contract functions, showing that our tool correctly computes all the aforementioned metrics. Demo webpage: https://github.com/mfredella/iSCREAM
Conference Paper
Full-text available
We review accuracy estimation methods and compare the two most common methods: cross-validation and bootstrap. Recent experimental results on artificial data and theoretical results in restricted settings have shown that for selecting a good classifier from a set of classifiers (model selection), ten-fold cross-validation may be better than the more expensive leave-one-out cross-validation. We report on a large-scale experiment, over half a million runs of C4.5 and a Naive-Bayes algorithm, to estimate the effects of different parameters on these algorithms on real-world datasets. For cross-validation, we vary the number of folds and whether the folds are stratified or not; for bootstrap, we vary the number of bootstrap samples. Our results indicate that for real-world datasets similar to ours, the best method to use for model selection is ten-fold stratified cross-validation, even if computation power allows using more folds.
Article
Full-text available
This article argues that the general practice of describing interrater reliability as a single, unified concept is at best imprecise, and at worst potentially misleading. Rather than representing a single concept, different statistical methods for computing interrater reliability can be more accurately classified into one of three categories based upon the underlying goals of analysis. The three general categories introduced and described in this paper are: 1) consensus estimates, 2) consistency estimates, and 3) measurement estimates. The assumptions, interpretation, advantages, and disadvantages of estimates from each of these three categories are discussed, along with several popular methods of computing interrater reliability coefficients that fall under the umbrella of consensus, consistency, and measurement estimates. Researchers and practitioners should be aware that different approaches to estimating interrater reliability carry with them different implications for how ratings across multiple judges should be summarized, which may impact the validity of subsequent study results.
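The contrast between a raw consensus estimate and a chance-corrected one can be made concrete with percent agreement versus Cohen's kappa for two raters. This is an illustrative computation, not an example taken from the article itself.

```python
def percent_agreement(a, b):
    """Consensus estimate: fraction of items rated identically."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for the agreement
    expected by chance from each rater's marginal category frequencies."""
    categories = set(a) | set(b)
    n = len(a)
    p_o = percent_agreement(a, b)
    p_e = sum((a.count(c) / n) * (b.count(c) / n) for c in categories)
    return (p_o - p_e) / (1 - p_e)
```

Two raters can agree on 75% of items yet have a kappa of only 0.5, which is one way the choice of estimate changes how ratings should be summarized.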
Article
Full-text available
Many techniques have been developed over the years to automatically find bugs in software. Often, these techniques rely on formal methods and sophisticated program analysis. While these techniques are valuable, they can be difficult to apply, and they aren't always effective in finding real bugs. Bug patterns are code idioms that are often errors. We have implemented automatic detectors for a variety of bug patterns found in Java programs. In this extended abstract1, we describe how we have used bug pattern detectors to find serious bugs in several widely used Java applications and libraries. We have found that the effort required to implement a bug pattern detector tends to be low, and that even extremely simple detectors find bugs in real applications. From our experience applying bug pattern detectors to real programs, we have drawn several interesting conclusions. First, we have found that even well tested code written by experts contains a surprising number of obvious bugs. Second, Java (and similar languages) have many language features and APIs which are prone to misuse. Finally, that simple automatic techniques can be effective at countering the impact of both ordinary mistakes and misunderstood language features.
Conference Paper
Full-text available
The F-measure - the number of distinct test cases to detect the first program failure - is an effectiveness measure for debug testing strategies. We show that for random testing with replacement, the F-measure is distributed according to the geometric distribution. A simulation study examines the distribution of two adaptive random testing methods, to study how closely their sampling distributions approximate the geometric distribution, revealing that in the worst case scenario, the sampling distribution for adaptive random testing is very similar to random testing. Our results have provided an answer to a conjecture that adaptive random testing is always a more effective alternative to random testing, with reference to the F-measure. We consider the implications of our findings for previous studies conducted in the area, and make recommendations to future studies.
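The geometric-distribution result for random testing is easy to check by simulation. The sketch below assumes a fixed per-test failure probability theta (not the paper's experimental setup): for a geometric distribution the expected F-measure is 1/theta.

```python
import random

def f_measure(theta, rng):
    """Number of random tests (with replacement) until the first failure,
    where each independent test fails with probability theta."""
    count = 1
    while rng.random() >= theta:
        count += 1
    return count

rng = random.Random(0)
theta = 0.1
trials = [f_measure(theta, rng) for _ in range(100_000)]
mean = sum(trials) / len(trials)
# For a geometric distribution, the sample mean should approach 1/theta = 10.
```

The same harness could be pointed at an adaptive random testing strategy to compare its sampling distribution against the geometric baseline, as the paper does.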
Conference Paper
Full-text available
WEKA is a workbench for machine learning that is intended to aid in the application of machine learning techniques to a variety of real-world problems, in particular, those arising from agricultural and horticultural domains. Unlike other machine learning projects, the emphasis is on providing a working environment for the domain specialist rather than the machine learning expert. Lessons learned include the necessity of providing a wealth of interactive tools for data manipulation, result visualization, database linkage, and cross-validation and comparison of rule sets, to complement the basic machine learning tools
Article
Full-text available
A set of properties of syntactic software complexity measures is proposed to serve as a basis for the evaluation of such measures. Four known complexity measures are evaluated and compared using these criteria. This formalized evaluation clarifies the strengths and weaknesses of the examined complexity measures, which include the statement count, cyclomatic number, effort measure, and data flow complexity measures. None of these measures possesses all nine properties, and several fail to possess particularly fundamental properties; this failure calls into question their usefulness in measuring syntactic complexity.
Article
Full-text available
We review accuracy estimation methods and compare the two most common methods: cross-validation and bootstrap. Recent experimental results on artificial data and theoretical results in restricted settings have shown that for selecting a good classifier from a set of classifiers (model selection), ten-fold cross-validation may be better than the more expensive leave-one-out cross-validation. We report on a large-scale experiment, over half a million runs of C4.5 and a Naive-Bayes algorithm, to estimate the effects of different parameters on these algorithms on real-world datasets. For cross-validation, we vary the number of folds and whether the folds are stratified or not; for bootstrap, we vary the number of bootstrap samples. Our results indicate that for real-world datasets similar to ours, the best method to use for model selection is ten-fold stratified cross-validation, even if computation power allows using more folds.
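The stratified cross-validation recommended above depends on folds that preserve class proportions. A minimal pure-Python sketch of stratified fold assignment follows; it is illustrative only, and production tools such as scikit-learn's StratifiedKFold additionally shuffle and handle uneven class sizes.

```python
from collections import defaultdict

def stratified_folds(labels, k=10):
    """Assign each sample index to one of k folds so that each class's
    members are dealt round-robin across folds, keeping class
    proportions roughly equal in every fold."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    turn = 0
    for indices in by_class.values():
        for idx in indices:
            folds[turn % k].append(idx)
            turn += 1
    return folds
```

With 20 samples of one class and 10 of another and k=5, every fold receives exactly 4 of the first and 2 of the second, mirroring the 2:1 overall ratio.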
Article
The utility of technical materials is influenced to a marked extent by their reading level or readability. This article describes the derivation and validation of the Automated Readability Index (ARI) for use with technical materials. The method allows for the easy, automatic collection of data as narrative material is typed on a slightly modified electric typewriter. Data collected includes word length (a measure of word difficulty) and sentence length (a measure of sentence difficulty). Appropriate weightings of these factors in a multiple regression equation result in an index of reading difficulty. Uses of the index for evaluating and controlling the readability of large quantities of technical material are described.
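The index described above linearly combines a word-difficulty term (characters per word) and a sentence-difficulty term (words per sentence). The sketch below uses the commonly published ARI constants; treat them as an assumption here, since the article itself is where the weightings are derived.

```python
def automated_readability_index(characters, words, sentences):
    """ARI from aggregate counts: 4.71 * (chars/word)
    + 0.5 * (words/sentence) - 21.43 (commonly published constants).
    The result is interpreted as an approximate US grade level."""
    return 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43
```

For example, text averaging 5 characters per word and 10 words per sentence scores about 7.1, i.e., roughly seventh-grade reading level under this interpretation.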
Article
The consensus in the programming community is that indentation aids program comprehension, although many studies do not back this up. The authors tested program comprehension on a Pascal program. Two styles of indentation were used, blocked and nonblocked, in addition to four possible levels of indentation (0, 2, 4, 6 spaces). Both experienced and novice subjects were used. Although the blocking style made no difference, the level of indentation had a significant effect on program comprehension (2 to 4 spaces had the highest mean score). It is recommended that a moderate level of indentation be used to increase program comprehension and user satisfaction.
Article
The paper is based on the premise that the productivity and quality of software development and maintenance, particularly in large and long term projects, is related to software readability. Software readability depends on such things as coding conventions and system overview documentation. Existing mechanisms to ensure readability --- for example, peer reviews --- are not sufficient. The paper proposes that software organizations or projects institute a readability/documentation group, similar to a test or usability group. This group would be composed of programmers and would produce overview documentation and ensure code and documentation readability. The features and functions of this group are described. Its benefits and possible objections to it are discussed.
Article
It is argued that program reading is an important programmer activity and that reading skill should be taught in programming courses. Possible teaching methods are suggested. The use of program reading in test construction and as part of an overall teaching strategy is discussed. A classification of reading comprehension testing methods is provided in an appendix.
The problem of poorly written hyperdocuments has already been identified. Furthermore, there is no complete definition of hyperdocument quality, and the methodology and tools that will help in analysing and assessing the quality of hyperdocuments are missing. The ability to measure attributes of hyperdocuments is indispensable for the fields of hyperdocument authoring and hypertext engineering. Useful paradigms can be drawn from the practices used in the software engineering and software measurement fields. In this paper we define a hyperdocument quality model, based on the ideas of the well-known Factor-Criteria-Metric hierarchical model. The important factors of readability and maintainability are defined, as well as the corresponding criteria. Finally, structure metrics, which can be computed on the hypertext graph, are presented. Most of these metrics are derived from well-known software metrics. Experimentation is a key issue for the application of measurement, and flexible tools for the automatic collection of measures are needed to support it. Athena, a tool that was originally developed for software measurement and later tailored to meet hypertext measurement needs, is used for hyperdocument measurement.
Conference Paper
In this paper, we explore the concept of code readability and investigate its relation to software quality. With data collected from human annotators, we derive associations between a simple set of local code features and human notions of readability. Using those features, we construct an automated readability measure and show that it can be 80% effective, and better than a human on average, at predicting readability judgments. Furthermore, we show that this metric correlates strongly with two traditional measures of software quality, code changes and defect reports. Finally, we discuss the implications of this study on programming language design and engineering practice. For example, our data suggest that comments, in and of themselves, are less important than simple blank lines to local judgments of readability.
Conference Paper
Formal specifications can help with program testing, optimization, refactoring, documentation, and, most importantly, debugging and repair. Unfortunately, formal specifications are difficult to write manually, while techniques that infer specifications automatically suffer from 90–99% false positive rates. Consequently, neither option is currently practical for most software development projects. We present a novel technique that automatically infers partial correctness specifications with a very low false positive rate. We claim that existing specification miners yield false positives because they assign equal weight to all aspects of program behavior. By using additional information from the software engineering process, we are able to dramatically reduce this rate. For example, we grant less credence to duplicate code, infrequently-tested code, and code that exhibits high turnover in the version control system. We evaluate our technique in two ways: as a preprocessing step for an existing specification miner and as part of novel specification inference algorithms. Our technique identifies which input is most indicative of program behavior, which allows off-the-shelf techniques to learn the same number of specifications using only 60% of their original input. Our inference approach has few false positives in practice, while still finding useful specifications on over 800,000 lines of code. When minimizing false alarms, we obtain a 5% false positive rate, an order-of-magnitude improvement over previous work. When used to find bugs, our mined specifications locate over 250 policy violations. To the best of our knowledge, this is the first specification miner with such a low false positive rate, and thus a low associated burden of manual inspection.
Article
The list itself is based on a concise selection of empirical data and is in rough priority order: the first fact had the greatest effect on defect reduction in the empirical data used for evaluation, while the last was least important. The priority of the facts is debatable and depends on the context.
Article
Frequently, when circumstances require that a computer program be modified, the program is found to be extremely difficult to read and understand. In this case a new step to make the program more readable should be added at the beginning of the software modification cycle. A small investment will make (1) the specifications for the modifications easier to write, (2) the estimate of the cost of the modifications more accurate, (3) the design for the modifications simpler, and (4) the implementation of the modifications less error-prone.
Article
Treemaps, a space-filling method for visualizing large hierarchical data sets, are receiving increasing attention. Several algorithms have been previously proposed to create more useful displays by controlling the aspect ratios of the rectangles that make up a treemap. While these algorithms do improve visibility of small items in a single layout, they introduce instability over time in the display of dynamically changing data, fail to preserve order of the underlying data, and create layouts that are difficult to visually search. In addition, continuous treemap algorithms are not suitable for displaying fixed-sized objects within them, such as images. This paper introduces a new "strip" treemap algorithm which addresses these shortcomings, and analyzes other "pivot" algorithms we recently developed showing the trade-offs between them. These ordered treemap algorithms ensure that items near each other in the given order will be near each other in the treemap layout. Using experimental evidence from Monte Carlo trials and from actual stock market data, we show that, compared to other layout algorithms, ordered treemaps are more stable, while maintaining relatively favorable aspect ratios of the constituent rectangles. A user study with 20 participants clarifies the human performance benefits of the new algorithms. Finally, we present quantum treemap algorithms, which modify the layout of the continuous treemap algorithms to generate rectangles that are integral multiples of an input object size. The quantum treemap algorithm has been applied to PhotoMesa, an application that supports browsing of large numbers of images.
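The "strip" idea described above can be sketched as follows: items are placed, in order, into horizontal strips spanning the display width, and a new strip is started when adding the next item would worsen the current strip's average aspect ratio. This is a simplified reconstruction under our own assumptions, not the paper's exact algorithm.

```python
# Minimal sketch of an order-preserving "strip" layout: items fill
# horizontal strips in order; a new strip starts when adding an item
# would worsen the strip's average aspect ratio. Simplified heuristic,
# not the paper's exact algorithm.
def strip_treemap(areas, width):
    """Lay rectangles of the given areas (in order) into horizontal
    strips spanning `width`. Returns one (x, y, w, h) tuple per item."""
    def layout(strip, y):
        h = sum(strip) / width  # strip height so its areas fit exactly
        x, out = 0.0, []
        for a in strip:
            w = a / h
            out.append((x, y, w, h))
            x += w
        return out, h

    def avg_aspect(strip):
        h = sum(strip) / width
        return sum(max((a / h) / h, h / (a / h)) for a in strip) / len(strip)

    rects, y, strip = [], 0.0, []
    for a in areas:
        if strip and avg_aspect(strip + [a]) > avg_aspect(strip):
            placed, h = layout(strip, y)  # close out the current strip
            rects += placed
            y += h
            strip = [a]
        else:
            strip.append(a)
    if strip:
        placed, h = layout(strip, y)
        rects += placed
    return rects
```

Because items are consumed in their given order, neighbors in the input stay neighbors in the layout, which is the ordering property the abstract emphasizes.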
Article
This paper describes a graph-theoretic complexity measure and illustrates how it can be used to manage and control program complexity. The paper first explains how the graph-theory concepts apply and gives an intuitive explanation of the graph concepts in programming terms. The issue of using nonstructured control flow is also discussed. A characterization of nonstructured control graphs is given and a method of measuring the "structuredness" of a program is developed. The last section of this paper deals with a testing methodology used in conjunction with the complexity measure; a testing strategy is defined that dictates that a program can either admit of a certain minimal testing level or be structurally reduced.
Article
The project conceived in 1929 by Gardner Murphy and the writer aimed first to present a wide array of problems having to do with five major "attitude areas"--international relations, race relations, economic conflict, political conflict, and religion. The kind of questionnaire material falls into four classes: yes-no, multiple choice, propositions to be responded to by degrees of approval, and a series of brief newspaper narratives to be approved or disapproved in various degrees. The monograph aims to describe a technique rather than to give results. The appendix, covering ten pages, shows the method of constructing an attitude scale. A bibliography is also given.
Conference Paper
Software systems evolve over time due to changes in requirements, optimization of code, fixes for security and reliability bugs etc. Code churn, which measures the changes made to a component over a period of time, quantifies the extent of this change. We present a technique for early prediction of system defect density using a set of relative code churn measures that relate the amount of churn to other variables such as component size and the temporal extent of churn. Using statistical regression models, we show that while absolute measures of code churn are poor predictors of defect density, our set of relative measures of code churn is highly predictive of defect density. A case study performed on Windows Server 2003 indicates the validity of the relative code churn measures as early indicators of system defect density. Furthermore, our code churn metric suite is able to discriminate between fault-prone and non-fault-prone binaries with an accuracy of 89.0 percent.
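The distinction the abstract draws, absolute versus relative churn, can be shown with a tiny sketch: absolute churn is a raw line count, while relative measures normalize it by component size and by how long the churn was spread out. The specific measure names below are ours; the paper defines a richer set of relative measures.

```python
# Sketch of relative code churn: absolute churn (lines added/changed
# plus lines deleted) normalized by component size and by the temporal
# extent of the churn. Measure names are ours, for illustration only.
def relative_churn(churned_loc: int, deleted_loc: int,
                   total_loc: int, churn_weeks: int) -> dict:
    return {
        # churn normalized by component size
        "churn_per_loc": (churned_loc + deleted_loc) / total_loc,
        # how destructive the churn was
        "deleted_per_churned": deleted_loc / churned_loc if churned_loc else 0.0,
        # churn normalized by its temporal extent
        "churn_per_week": (churned_loc + deleted_loc) / max(churn_weeks, 1),
    }
```

Two components with the same absolute churn can thus get very different scores: 250 churned lines in a 1,000-line component are far more alarming than the same 250 lines in a 100,000-line one.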
Conference Paper
This paper describes an empirical study investigating whether programmers improve the readability of their source code if they have support from a source code editor that offers dynamic feedback on their identifier naming practices. An experiment, employing both students and professional software engineers, and requiring the maintenance and production of software, demonstrated a statistically significant improvement in source code readability over that of the control.
Conference Paper
For large software systems, the maintenance phase tends to last comparatively much longer than all the previous life-cycle phases taken together, resulting in much more effort. A good measure of software maintainability can help better manage the maintenance-phase effort. Software maintainability cannot be adequately measured from source code or documents alone; the readability and understandability of both source code and documentation should be considered. This paper proposes an integrated measure of software maintainability. The paper also proposes a new representation for the rule base of fuzzy models, which requires less storage space and finds results more efficiently in simulation. The proposed model measures software maintainability based on three important aspects of software: readability of source code (RSC), documentation quality (DOQ), and understandability of software (UOS). Keeping in view the nature of these parameters, a fuzzy approach has been used to integrate these three aspects. This integrated measurement of software maintainability, which to our knowledge is the first attempt to quantify integrated maintainability, is bound to be better than any single-parameter maintainability measurement approach. Thus the output of this model can advise software project managers in judging the maintenance effort of the software.
Article
This paper describes a graph-theoretic complexity measure and illustrates how it can be used to manage and control program complexity. The paper first explains how the graph-theory concepts apply and gives an intuitive explanation of the graph concepts in programming terms. The control graphs of several actual Fortran programs are then presented to illustrate the correlation between intuitive complexity and the graph-theoretic complexity. Several properties of the graph-theoretic complexity are then proved which show, for example, that complexity is independent of physical size (adding or subtracting functional statements leaves complexity unchanged) and complexity depends only on the decision structure of a program.
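The graph-theoretic measure described above is McCabe's cyclomatic complexity, V(G) = E - N + 2P for a control-flow graph with E edges, N nodes, and P connected components; for a single structured routine it equals the number of decisions plus one. The token-counting shortcut below is a crude approximation we add for illustration (real tools parse an AST, and the token list assumes C-style syntax).

```python
# McCabe's cyclomatic complexity V(G) = E - N + 2P for a control-flow
# graph with E edges, N nodes, P connected components.
def cyclomatic_from_graph(edges: int, nodes: int, components: int = 1) -> int:
    return edges - nodes + 2 * components

def cyclomatic_from_source(source: str) -> int:
    """Crude decision count for C-style code, for illustration only;
    real tools build the control-flow graph from a parsed AST."""
    decisions = ("if ", "for ", "while ", "case ", "&&", "||")
    return 1 + sum(source.count(tok) for tok in decisions)
```

For example, an if-then-else routine has a control-flow graph with 4 nodes and 4 edges, giving V(G) = 4 - 4 + 2 = 2, matching one decision plus one, and the abstract's point that adding straight-line statements leaves V(G) unchanged follows directly, since such statements add one node and one edge each.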
Article
A 3×2 factorial experiment was performed to compare the effects of procedure format (none, internal, or external) with those of comments (absent or present) on the readability of a PL/1 program. The readability of six editions of the program, each having a different combination of these factors, was inferred from the accuracy with which students could answer questions about the program after reading it. Both extremes in readability occurred in the program editions having no procedures: without comments the procedureless program was the least readable and with comments it was the most readable.
Article
Software's complexity and accelerated development schedules make avoiding defects difficult. We have found, however, that researchers have established objective and quantitative data, relationships, and predictive models that help software developers avoid predictable pitfalls and improve their ability to predict and control efficient software projects. The article presents 10 techniques that can help reduce the flaws in your code.
Article
Software is the key technology in applications as diverse as accounting, hospital management, aviation, and nuclear power. Application advances in different domains such as these-each with different requirements-have propelled software development from small batch programs to large, real-time programs with multimedia capabilities. To cope, software's enabling technologies have undergone tremendous improvement in hardware, communications, operating systems, compilers, databases, programming languages, and user interfaces, among others. In turn, those improvements have fueled even more advanced applications. Improvements in VLSI technology and multimedia, for example, have resulted in faster, more compact computers that significantly widened the range of software applications. Database and user interface enhancements, on the other hand, have spawned more interactive and collaborative development environments. Such changes have a ripple effect on software development processes as well as on software techniques and tools. In this article, we highlight software development's crucial methods and techniques of the past 30 years
Article
Program understanding is an essential part of all software maintenance and enhancement activities. As currently practiced, program understanding consists mainly of code reading. The few automated understanding tools that are actually used in industry provide helpful but relatively shallow information, such as the line numbers on which variable names occur or the calling structure possible among system components. These tools rely on analyses driven by the nature of the programming language used. As such, they are adequate to answer questions concerning implementation details, so called what questions. They are severely limited, however, when trying to relate a system to its purpose or requirements, the why questions. Application programs solve real-world problems. The part of the world with which a particular application is concerned is that application's domain. A model of an application's domain can serve as a supplement to programming-language-based analysis methods and tools....
C++ Coding Standards: 101 Rules, Guidelines, and Best Practices
  • H Sutter
  • A Alexandrescu
H. Sutter and A. Alexandrescu, C++ Coding Standards: 101 Rules, Guidelines, and Best Practices. Addison-Wesley Professional, 2004.
A readability metric for computer-generated mathematics
  • S Machaffie
  • R Mcleod
  • B Roberts
  • P Todd
  • L Anderson
S. MacHaffie, R. McLeod, B. Roberts, P. Todd, and L. Anderson, "A readability metric for computer-generated mathematics," Saltire Software, http://www.saltire.com/equation.html, Tech. Rep., retrieved 2007.
Java Coding Standards, Software Development
  • S Ambler
Smog Grading—A New Readability
  • G H Mclaughlin