Article

Learning a Metric for Code Readability


Abstract

In this paper, we explore the concept of code readability and investigate its relation to software quality. With data collected from 120 human annotators, we derive associations between a simple set of local code features and human notions of readability. Using those features, we construct an automated readability measure and show that it can be 80 percent effective and better than a human, on average, at predicting readability judgments. Furthermore, we show that this metric correlates strongly with three measures of software quality: code changes, automated defect reports, and defect log messages. We measure these correlations on over 2.2 million lines of code, as well as longitudinally, over many releases of selected projects. Finally, we discuss the implications of this study on programming language design and engineering practice. For example, our data suggest that comments, in and of themselves, are less important than simple blank lines to local judgments of readability.
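The kind of model the abstract describes can be illustrated with a short sketch: extract a few simple local features (line lengths, blank lines, comments) and combine them with a logistic function. Everything below is hypothetical — the feature set is a small subset, the weights are invented rather than the paper's learned parameters, and the sketch is in Python rather than targeting Java.

```python
import math

def local_features(snippet: str) -> dict:
    """A few simple local features of the kind the paper studies."""
    lines = snippet.splitlines() or [""]
    n = len(lines)
    return {
        "avg_line_length": sum(len(l) for l in lines) / n,
        "max_line_length": max(len(l) for l in lines),
        "blank_line_ratio": sum(1 for l in lines if not l.strip()) / n,
        "comment_ratio": sum(1 for l in lines if l.lstrip().startswith("//")) / n,
    }

def readability_score(snippet: str) -> float:
    """Combine features with a logistic function. The weights are invented
    for illustration; the actual model is learned from human judgments."""
    f = local_features(snippet)
    z = (2.0
         - 0.03 * f["avg_line_length"]
         - 0.01 * f["max_line_length"]
         + 1.5 * f["blank_line_ratio"]
         + 0.5 * f["comment_ratio"])
    return 1.0 / (1.0 + math.exp(-z))
```

A real model would fit the weights to the 12,000 annotator judgments rather than hard-coding them.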


... According to [41] "code readability measures the effort of the developer to access the information contained in the code." Poor code readability increases the difficulty of software maintenance tasks [41]- [43]. Therefore, code-readability score (the higher the score, the more readable the code) should be negatively correlated with the maintenance effort. ...
... We based this study on three publicly available datasets collected by Buse and Weimer [43] (D_b&w), Dorn [44] (D_dorn), and Scalabrino et al. [41] (D_scal). The datasets include code snippets written in Java for which readability was manually assessed by human annotators. ...
... We could also see that equality operators and assignment operators are consistently being used as nodes in the decision trees. According to Buse and Weimer [43] the number of brackets, and line length are very strong predictors of code readability. ...
Article
Full-text available
Context: Lines of code (LOC) is a fundamental software code measure that is widely used as a proxy for software development effort or as a normalization factor in many other software-related measures (e.g., defect density). Unfortunately, the problem is that it is not clear which lines of code should be counted: all of them or some specific ones depending on the project context and task in mind? Objective: To design a generator of task-specific LOC measures and their counters mined directly from data that optimize the correlation between the LOC measures and variables they proxy for (e.g., code-review duration). Method: We use Design Science Research as our research methodology to build and validate a generator of task-specific LOC measures and their counters. The generated LOC counters have a form of binary decision trees inferred from historical data using Genetic Programming. The proposed tool was validated based on three tasks, i.e., mining LOC measures to proxy for code readability, number of assertions in unit tests, and code-review duration. Results: Task-specific LOC measures showed a “strong” to “very strong” negative correlation with code-readability score (Kendall’s τ ranging from -0.83 to -0.76) compared to “weak” to “strong” negative correlation for the best among the standard LOC measures (τ ranging from -0.36 to -0.13). For the problem of proxying for the number of assertions in unit tests, correlation coefficients were also higher for task-specific LOC measures by ca. 11% to 21% (τ ranged from 0.31 to 0.34). Finally, task-specific LOC measures showed a stronger correlation with code-review duration than the best among the standard LOC measures (τ = 0.31, 0.36, and 0.37 compared to 0.11, 0.08, 0.16, respectively). Conclusions: Our study shows that it is possible to mine task-specific LOC counters from historical datasets using Genetic Programming. 
Task-specific LOC measures obtained that way show stronger correlations with the variables they proxy for than the standard LOC measures.
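The correlations reported in the abstract above are Kendall's τ. As a reminder of what that measures, here is a minimal pure-Python tau-a (no tie correction; published results typically use the tie-adjusted tau-b):

```python
from itertools import combinations

def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / all pairs.
    Tie correction is omitted for brevity (tau-b adjusts the denominator)."""
    n = len(xs)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)
```

A τ of -0.83, as reported for the task-specific LOC measures versus readability score, means the vast majority of snippet pairs are ordered oppositely by the two measures.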
... Automatic static analysis tools (ASAT), such as SonarQube (https://www.sonarsource.com/products/sonarqube/), offer a set of metrics that can identify potential readability violations in code. The readability models aim to measure the effort required to read code on single snapshots [7] [8] [9], or in code changes [10]. ...
... According to Buse & Weimer, code readability refers to how easily a human can understand code [9]. Reading code is often the most time-consuming task in software maintenance [22], making code readability an important aspect of software development. ...
... 6) Improve Formatting (Source Code Structure): (27 out of 370 samples). Previous readability models have already addressed code formatting as line length and indentation [9] [8] [42]. This improvement aims to visually group related code blocks and make them easier to distinguish from one another. ...
Preprint
Full-text available
Readability models and tools have been proposed to measure the effort to read code. However, these models are not completely able to capture the quality improvements in code as perceived by developers. To investigate possible features for new readability models and production-ready tools, we aim to better understand the types of readability improvements performed by developers when actually improving code readability, and identify discrepancies between suggestions of automatic static tools and the actual improvements performed by developers. We collected 370 code readability improvements from 284 Merged Pull Requests (PRs) under 109 GitHub repositories and produce a catalog with 26 different types of code readability improvements, where in most of the scenarios, the developers improved the code readability to be more intuitive, modular, and less verbose. Surprisingly, SonarQube only detected 26 out of the 370 code readability improvements. This suggests that some of the catalog produced has not yet been addressed by SonarQube rules, highlighting the potential for improvement in Automatic static analysis tools (ASAT) code readability rules as they are perceived by developers.
... Code with high readability and understandability can be also provided as examples to students in programming courses. Given these benefits, many studies have been conducted to predict the readability [4]- [7] and understandability [3], [8]. ...
... Buse and Weimer [4] constructed the first general model for predicting the readability of source code. Their model mainly relies on local code features (e.g., line length, number of identifiers, identifier length). ...
... Posnett et al. [5] built upon the work of Buse and Weimer [4] and presented a new model based on size and code entropy. Given a certain size, a higher entropy leads to higher readability, and vice versa. ...
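The "entropy" in a size/entropy model can be operationalized in several ways; one simple choice (an assumption here, not necessarily Posnett et al.'s exact formulation) is the Shannon entropy of the snippet's token distribution:

```python
import math
from collections import Counter

def token_entropy(tokens):
    """Shannon entropy (bits) of the empirical token distribution.
    Low entropy = a few tokens repeated; high entropy = many distinct tokens."""
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```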
Preprint
Full-text available
Context: Developers spend most of their time comprehending source code during software development. Automatically assessing how readable and understandable source code is can provide various benefits in different tasks, such as task triaging and code reviews. While several studies have proposed approaches to predict software readability and understandability, most of them only focus on local characteristics of source code. Besides, the performance of understandability prediction is far from satisfactory. Objective: In this study, we aim to assess readability and understandability from the perspective of language acquisition. More specifically, we would like to investigate whether code readability and understandability are correlated with the naturalness and vocabulary difficulty of source code. Method: To assess code naturalness, we adopted the cross-entropy metric, while we use a manually crafted list of code elements with their assigned advancement levels to assess the vocabulary difficulty. We will conduct a statistical analysis to understand their correlations and analyze whether code naturalness and vocabulary difficulty can be used to improve the performance of code readability and understandability prediction methods. The study will be conducted on existing datasets.
... Code readability is defined as a measure of how easily developers can read and understand the source code (Buse and Weimer 2010; Lee et al. 2013). It is an essential software quality attribute, which may well influence the source code's maintainability, reusability, portability, and reliability (Alawad et al. 2019; Sedano 2016). ...
... Existing classification models can be divided into two categories. The first category is traditional machine learning-based methods (Buse and Weimer 2010;Dorn 2012). These methods first handcraft readability-related features (e.g., the number of identifiers), and then apply machine learning models (e.g., logistic regression) for readability prediction. ...
... Then, we present the GNN-based code representation research. Buse and Weimer (2010) first established a readability model based on human readability judgments. They surveyed 120 human annotators, asking each to score 100 code snippets. ...
Article
Full-text available
Context Code readability is crucial for developers since it is closely related to code maintenance and affects developers’ work efficiency. Code readability classification refers to source code being classified into pre-defined levels according to its readability. So far, many code readability classification models have been proposed in existing studies, including deep learning networks that have achieved relatively high accuracy and good performance. Objective However, in terms of representation, these methods lack effective preservation of the syntactic and semantic structure of the source code. To extract these features, we propose a graph-based code representation method. Method Firstly, the source code is parsed into a graph containing its abstract syntax tree (AST) combined with control and data flow edges to preserve the semantic structural information, and then we convert the graph nodes’ source code and type information into vectors. Finally, we train our graph neural network model, composed of Graph Convolutional Network (GCN), DMoNPooling, and K-dimensional Graph Neural Networks (k-GNNs) layers, to extract these features from the program graph. Result We evaluate our approach on the task of code readability classification using a Java dataset provided by Scalabrino et al. (2016). The results show that our method achieves 72.5% and 88% accuracy in three-class and two-class classification, respectively. Conclusion We are the first to introduce graph-based representation into code readability classification. Our method outperforms state-of-the-art readability models, which suggests that the graph-based code representation method is effective in extracting syntactic and semantic information from source code, and ultimately improves code readability classification.
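The first step of such a pipeline, parsing code into a graph, can be sketched with Python's built-in ast module. The cited work parses Java and additionally adds control- and data-flow edges before feeding the graph to a GNN; only the AST edges are shown here:

```python
import ast

def ast_edges(source: str):
    """Parse Python source and return (parent, child) node-type edges —
    the AST skeleton of the program graph described above."""
    tree = ast.parse(source)
    return [
        (type(parent).__name__, type(child).__name__)
        for parent in ast.walk(tree)
        for child in ast.iter_child_nodes(parent)
    ]
```

The edge list (together with per-node feature vectors) is the typical input format for graph-convolution layers.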
... To compute the code readability, we use the tool developed by Buse and Weimer [11]. Their readability model is trained on human perception of readability or understandability. ...
... Multiple annotators were not involved since we only concentrated on the code snippets’ structure, completeness, and noise (e.g., stack traces). However, we first carefully analyzed the tool's specifications [11] to calculate the readability score. Then we checked the violations that hurt readability based on those specifications. ...
... We estimated the readability of the code snippets using an existing tool [11]. However, the perceived readability is something different from the actual understandability of the code. ...
Preprint
Full-text available
In Stack Overflow (SO), the quality of posts (i.e., questions and answers) is subjectively evaluated by users through a voting mechanism. The net votes (upvotes - downvotes) obtained by a post are often considered an approximation of its quality. However, about half of the questions that received working solutions got more downvotes than upvotes. Furthermore, about 18% of the accepted answers (i.e., verified solutions) also do not score the maximum votes. All these counter-intuitive findings cast doubts on the reliability of the evaluation mechanism employed at SO. Moreover, many users raise concerns against the evaluation, especially downvotes to their posts. Therefore, rigorous verification of the subjective evaluation is highly warranted to ensure a non-biased and reliable quality assessment mechanism. In this paper, we compare the subjective assessment of questions with their objective assessment using 2.5 million questions and ten text analysis metrics. According to our investigation, four objective metrics agree with the subjective evaluation, two do not agree, one either agrees or disagrees, and the remaining three neither agree nor disagree with the subjective evaluation. We then develop machine learning models to classify the promoted and discouraged questions. Our models outperform the state-of-the-art models with a maximum of about 76% - 87% accuracy.
... However, there have been few investigations of the relationship between software readability and complexity [7,8]. In the first paper, Buse and Weimer proposed an approach for constructing an automatic readability tool using local code features. ...
... Ref. [7] investigated the relationship between code readability and software quality and proposed an approach for constructing an automatic readability tool using local code features. A survey was administered to 120 human annotators, each rating 100 code snippets, giving a total of 12,000 human judgments. ...
... Some researchers have investigated the relation between software complexity and code readability as in [7,8,23]. ...
Article
Full-text available
Citation: Tashtoush, Y.; Abu-El-Rub, N.; Darwish, O.; Al-Eidi, S.; Darweesh, D.; Karajeh, O. A Notional Understanding of the Relationship between Code Readability and Software Complexity. Information 2023, 14, 81. Abstract: Code readability and software complexity are considered essential components of software quality. They significantly impact software metrics, such as reusability and maintenance. The maintainability process consumes a high percentage of the software lifecycle cost; it is considered a very costly phase and should be given more focus and attention. For this reason, the importance of code readability and software complexity is addressed by considering the most time-consuming component in all software maintenance activities. This paper empirically studies the relationship between code readability and software complexity using various readability and complexity metrics and machine learning algorithms. The results are derived from a dataset containing roughly 12,180 Java files, 25 readability features, and several complexity metric variables. Our study empirically shows how these two attributes affect each other. Code readability predicts software complexity with 90.15% accuracy using a decision tree classifier; in turn, software complexity predicts code readability with 90.01% accuracy using the same classifier.
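The decision-tree classifiers used in such studies repeatedly pick a feature and a threshold that best split the data. A one-feature "stump" shows the core idea (toy code; real trees recurse and use impurity measures such as Gini rather than raw accuracy):

```python
def best_stump(xs, ys):
    """Fit a one-split decision stump on a single numeric feature:
    choose the threshold t maximizing accuracy of 'predict 1 iff x <= t'."""
    best_t, best_acc = None, 0.0
    for t in sorted(set(xs)):
        acc = sum((x <= t) == bool(y) for x, y in zip(xs, ys)) / len(xs)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc
```

For instance, fitting the stump on a readability feature against binary complexity labels yields the single most predictive cut-off for that feature.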
... The latest research on code recommendation [4,51] uses an acceptance rate that distinguishes candidate answers by their popularity rate [52]. Popularity shows the proximity of a source code snippet to the frequent patterns that appear in a corpus of code snippets. ...
... Previous studies employed four code metrics for code search and code quality estimation [51]. Query-independent features are grouped into a set, which is termed as code metrics. ...
... Table 1 abridges the metrics that we used throughout the approach. Buse et al. [51] specified that the total number of LOC (lines of code) and the average number of identifiers associated with each LOC are used to determine the reliability of the code. Code reliability is the quality metric that is measured using the average number of identifiers on each line of the code. ...
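The two metrics the snippet above mentions are straightforward to compute. A rough sketch (the regex-based identifier matching is an approximation: a real tool would use a lexer and exclude keywords and literals):

```python
import re

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def loc_and_identifier_density(snippet: str):
    """Non-blank lines of code, and average identifier-like tokens per line.
    Regex matching is approximate: it also catches keywords and type names."""
    lines = [l for l in snippet.splitlines() if l.strip()]
    if not lines:
        return 0, 0.0
    identifiers = sum(len(IDENTIFIER.findall(l)) for l in lines)
    return len(lines), identifiers / len(lines)
```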
Article
Full-text available
The development of robotic applications necessitates the availability of useful, adaptable, and accessible programming frameworks. Robotic, IoT, and sensor-based systems open up new possibilities for the development of innovative applications, taking advantage of existing and new technologies. Despite much progress, the development of these applications remains a complex, time-consuming, and demanding activity. Development of these applications requires wide utilization of software components. In this paper, we propose a platform that efficiently searches and recommends code components for reuse. To locate and rank the source code snippets, our approach uses a machine learning approach to train the schema. Our platform uses trained schema to rank code snippets in the top k results. This platform facilitates the process of reuse by recommending suitable components for a given query. The platform provides a user-friendly interface where developers can enter queries (specifications) for code search. The evaluation shows that our platform effectively ranks the source code snippets and outperforms existing baselines. A survey is also conducted to affirm the viability of the proposed methodology.
... Given the importance of code readability, researchers have investigated the factors that could influence code readability [146,164,48]. Subsequently, these factors have been used to construct new models to automatically assess code readability [59,181,80,200,199], in particular, and code quality, in general. ...
... Previous works focused on automatic assessment of code readability [58,59,181,80,199]. All the state-of-the-art approaches use machine learning: they all define some features measured on the source code and they train a binary classifier to distinguish readable code from unreadable code. ...
... The authors obtained a positive correlation with software quality metrics, code changes and defect reports. Thus, Buse et al. [58,59] designed the first readability model based on structural features. Instead, the study of Dorn [80] extended the model created by the study of Posnett et al. [181]. ...
... In an effort to provide a holistic view of readability, we extracted different sets of metrics using the tool of Scalabrino et al. [30]. More specifically, the metrics extracted involve (a) the structural metrics identified by Buse and Weimer [31] (including, e.g., the number of identifiers, the number of loops, etc.), (b) the structural metrics defined by Posnett et al. [32] (lines of code, entropy, and Halstead's Volume), (c) the visual metrics of Dorn [33] (e.g., indentation length) (The Dorn metrics also include certain metrics that are relevant to spatial perception or natural language; however, we may also assume that these are relevant to the visual aspect of code.), and (d) the textual metrics of Scalabrino et al. [30] (e.g., Comments and Identifiers Consistency, Textual Coherence, etc.). ...
... Moreover, the data facilitate the design and development of readability models, either by using the source code directly [40,41] or the associated static analysis metrics [42][43][44]. When doing so, comparison with the state of the art is straightforward, as our dataset includes different metrics for readability [30][31][32][33]. ...
Article
Full-text available
The availability of code snippets in online repositories like GitHub has led to an uptick in code reuse, this way further supporting an open-source component-based development paradigm. The likelihood of code reuse rises when the code components or snippets are of high quality, especially in terms of readability, making their integration and upkeep simpler. Toward this direction, we have developed a dataset of code snippets that takes into account both the functional and the quality characteristics of the snippets. The dataset is based on the CodeSearchNet corpus and comprises additional information, including static analysis metrics, code violations, readability assessments, and source code similarity metrics. Thus, using this dataset, both software researchers and practitioners can conveniently find and employ code snippets that satisfy diverse functional needs while also demonstrating excellent readability and maintainability.
... Figure 2 shows a simple example of a SC reported in [4], which rewards anyone who solves a problem and submits the solution to the SC. This contract has been selected as an example of an old-style Solidity smart contract; in fact, many of the constructs it uses are now deprecated, but it is instructive since it also represents how the Solidity language and the metrics used in it changed over time. ...
... Moreover, we assume that in order to increase public trust, Solidity developers tend to write smart contracts that are easy to understand. Indeed, a program that is easy to understand should have a low cyclomatic complexity, although the literature shows that readability, as perceived by humans, correlates only weakly with cyclomatic metrics [75]. ...
Article
Full-text available
Smart contracts (SC) are software programs that reside and run over a blockchain. The code can be written in different languages with the common purpose of implementing various kinds of transactions onto the hosting blockchain. They are ruled by the blockchain infrastructure with the intent to automatically implement the typical conditions of traditional contracts. Programs must satisfy context-dependent constraints which are quite different from traditional software code. In particular, since the bytecode is uploaded in the hosting blockchain, the size, computational resources, interaction between different parts of the program are all limited. This is true even if the specific programming languages implement more or less the same constructs as that of traditional languages: there is not the same freedom as in normal software development. The working hypothesis used in this article is that Smart Contract specific constraints should be captured by specific software metrics (that may differ from traditional software metrics). We tested this hypothesis on 85K Smart Contracts written in Solidity and uploaded on the Ethereum blockchain. We analyzed Smart Contracts from two repositories "Etherscan" and "Smart Corpus" and we computed the statistics of a set of software metrics related to Smart Contracts and compared them to the metrics extracted from more traditional software projects. Our results show that generally, Smart Contract metrics have more restricted ranges than the corresponding metrics in traditional software systems. Some of the stylized facts, like power law in the tail of the distribution of some metrics, are only approximate but the lines of code follow a log-normal distribution which reminds us of the same behaviour already found in traditional software systems.
... Jizhou et al. [13] combined structural information with textual and non-textual features to extract high-quality pairs in discussion threads from an online discussion forum using a support vector machine. Using the same method, Buse and Weimer [18] also combined textual and non-textual features to predict the best answers in a Yahoo Q&A dataset. ...
... While code length [11] or features of code readability, such as the number of lines of code and the average number of identifiers per line [18], could be linked to an answer's quality, this might not always be true, as evidence shows that such predictors can be due to chance [42]. The time lag was seen to be the feature with the highest coefficient. ...
Conference Paper
Stack Overflow is used to solve programming issues during software development. Research efforts have looked to identify relevant content on this platform. In particular, researchers have proposed various modelling techniques to predict acceptable Stack Overflow answers. Less interest, however, has been dedicated to examining the performance and quality of typically used modelling methods with respect to the model and feature complexity. Such insights could be of practical significance to the many practitioners who develop models for Stack Overflow. This study examines the performance and quality of two modelling methods, of varying degree of complexity, used for predicting Java and JavaScript acceptable answers on Stack Overflow. Our dataset comprised 249,588 posts drawn from years 2014–2016. Outcomes reveal significant differences in models’ performances and quality given the type of features and complexity of models used. Researchers examining model performance and quality and feature complexity may leverage these findings in selecting suitable modelling approaches for Q&A prediction.
... -Subjective rating. This is the subjective perception of how well maintainers believe that they understood the code they are asked to maintain, usually on an ordinal scale (Börstler and Paech 2016; Buse and Weimer 2010; Scalabrino et al. 2021). Several studies (Floyd et al. 2017; Fucci et al. 2019; Ikutani and Uwano 2014; Peitek et al. 2020; Sharafi et al. 2021) investigated the physiological activities occurring in the human body when understanding software code, involving for instance the brain, heart, and skin. ...
Article
Full-text available
Context Insufficient code understandability makes software difficult to inspect and maintain and is a primary cause of software development cost. Several source code measures may be used to identify difficult-to-understand code, including well-known ones such as Lines of Code and McCabe’s Cyclomatic Complexity, and novel ones, such as Cognitive Complexity. Objective We investigate whether and to what extent source code measures, individually or together, are correlated with code understandability. Method We carried out an empirical study with students who were asked to carry out realistic maintenance tasks on methods from real-life Open Source Software projects. We collected several data items, including the time needed to correctly complete the maintenance tasks, which we used to quantify method understandability. We investigated the presence of correlations between the collected code measures and code understandability by using several Machine Learning techniques. Results We obtained models of code understandability using one or two code measures. However, the obtained models are not very accurate, the average prediction error being around 30%. Conclusions Based on our empirical study, it does not appear possible to build an understandability model based on structural code measures alone. Specifically, even the newly introduced Cognitive Complexity measure does not seem able to fulfill the promise of providing substantial improvements over existing measures, at least as far as code understandability prediction is concerned. It seems that, to obtain models of code understandability of acceptable accuracy, process measures should be used, possibly together with new source code measures that are better related to code understandability.
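McCabe's Cyclomatic Complexity, one of the measures studied above, is essentially one plus the number of decision points in the code. A simplified version for Python source follows; real tools handle more constructs (match/case, comprehension conditions) and count each boolean operator separately:

```python
import ast

BRANCHES = (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler, ast.BoolOp)

def cyclomatic_complexity(source: str) -> int:
    """McCabe-style complexity: 1 + the number of decision points.
    Simplified: one per BoolOp, and some branching constructs are skipped."""
    tree = ast.parse(source)
    return 1 + sum(isinstance(node, BRANCHES) for node in ast.walk(tree))
```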
... Buse and Weimer [25] built a readability metric for source code. A predictive model was developed by Daka et al. [26] to assess the readability of unit tests, which was applied to Evo-Suite to produce more readable tests by including readability as a secondary objective. ...
Preprint
Full-text available
Automatic unit test generators such as EvoSuite are able to automatically generate unit test suites with high coverage. This removes the burden of writing unit tests from developers, but the generated tests are often difficult to understand for them. In this paper, we introduce the MicroTestCarver approach that generates unit tests starting from manual or scripted end-to-end (E2E) tests. Using carved information from these E2E tests, we generate unit tests that have meaningful test scenarios and contain actual test data. When we apply our MicroTestCarver approach, we observe that 85% of the generated tests are executable. Through a user study involving 20 participants, we get indications that tests generated with MicroTestCarver are relatively easy to understand.
... More research is therefore necessary to (a) define indicators that capture code properties that matter for developers, to (b) develop software quality models that are relevant for developers and to (c) define metrics that reliably measure the quality attributes that are relevant for developers. Metrics for, e.g., readability (Buse and Weimer 2010) and comprehensibility (Scalabrino et al. 2019) have been defined and studied before. However, we see a need for further empirical research in professional software development contexts. ...
Article
Full-text available
There are many aspects of code quality, some of which are difficult to capture or to measure. Despite the importance of software quality, there is a lack of commonly accepted measures or indicators for code quality that can be linked to quality attributes. We investigate software developers’ perceptions of source code quality and the practices they recommend to achieve these qualities. We analyze data from semi-structured interviews with 34 professional software developers, programming teachers and students from Europe and the U.S. For the interviews, participants were asked to bring code examples to exemplify what they consider good and bad code, respectively. Readability and structure were used most commonly as defining properties for quality code. Together with documentation, they were also suggested as the most common target properties for quality improvement. When discussing actual code, developers focused on structure, comprehensibility and readability as quality properties. When analyzing relationships between properties, the most commonly talked about target property was comprehensibility. Documentation, structure and readability were named most frequently as source properties to achieve good comprehensibility. Some of the most important source code properties contributing to code quality as perceived by developers lack clear definitions and are difficult to capture. More research is therefore necessary to measure the structure, comprehensibility and readability of code in ways that matter for developers and to relate these measures of code structure, comprehensibility and readability to common software quality attributes.
... Readability (R) combines different code features to calculate a single value for estimating code readability. We used the readability metric proposed by Buse et al. [52] which generates a readability score for a given method. The readability scores range from 0 to 1 for specifying least readable code to most readable code, respectively. ...
Preprint
Full-text available
In the past couple of decades, significant research efforts are devoted to the prediction of software bugs. However, most existing work in this domain treats all bugs the same, which is not the case in practice. It is important for a defect prediction method to estimate the severity of the identified bugs so that the higher-severity ones get immediate attention. In this study, we investigate source code metrics, source code representation using large language models (LLMs), and their combination in predicting bug severity labels of two prominent datasets. We leverage several source metrics at method-level granularity to train eight different machine-learning models. Our results suggest that Decision Tree and Random Forest models outperform other models regarding our several evaluation metrics. We then use the pre-trained CodeBERT LLM to study the source code representations' effectiveness in predicting bug severity. CodeBERT finetuning improves the bug severity prediction results significantly in the range of 29%-140% for several evaluation metrics, compared to the best classic prediction model on source code metric. Finally, we integrate source code metrics into CodeBERT as an additional input, using our two proposed architectures, which both enhance the CodeBERT model effectiveness.
... This metric combines different code features to calculate a single value for estimating code readability. We used the readability metric proposed by Buse and Weimer (2009), which generates a readability score for a given method. The readability scores range from 0 (least readable code) to 1 (most readable code). The authors concluded that this metric has a significant level of correlation with defects, code churn, and self-reported stability. ...
... Readability scores [32][33][34][35][36], based on the frequency of words' occurrence, were used to evaluate the texts' readability and applied to code readability for software development [37,38]. However, these scores are not designed to capture the changes in the document structure, that is, the order of the information presentation. ...
Preprint
Full-text available
Textual documents need to be of good quality to ensure effective asynchronous communication in remote areas, especially during the COVID-19 pandemic. However, defining a preferred document structure (content and arrangement) for improving lay readers' decision-making is challenging. First, the types of useful content for various readers cannot be determined simply by gathering expert knowledge. Second, methodologies to evaluate the document's usefulness from the user's perspective have not been established. This study proposed the experimental framework to identify useful contents of documents by aggregating lay readers' insights. This study used 200 online recipes as research subjects and recruited 1,340 amateur cooks as lay readers. The proposed framework identified six useful contents of recipes. Multi-level modeling then showed that among the six identified contents, suitable ingredients or notes arranged with a subheading at the end of each cooking step significantly increased recipes' usefulness. Our framework contributes to the communication design via documents.
... While there are multiple studies on code readability models (e.g., [11], [15]), empirical code readability studies involving project/code repositories and developers/students (e.g., [16]- [18]), we limit our reporting of related work to only studies examining test code's readability. These studies include generating/evaluating test code identifier names, specifically test method names, and work that proposes test readability models or utilizes readability models on test code. ...
Preprint
Full-text available
Unit testing is a vital part of the software development process and involves developers writing code to verify or assert production code. Furthermore, to help comprehend the test case and troubleshoot issues, developers have the option to provide a message that explains the reason for the assertion failure. In this exploratory empirical study, we examine the characteristics of assertion messages contained in the test methods in 20 open-source Java systems. Our findings show that while developers rarely utilize the option of supplying a message, those who do, either compose it of only string literals, identifiers, or a combination of both types. Using standard English readability measuring techniques, we observe that a beginner's knowledge of English is required to understand messages containing only identifiers, while a 4th-grade education level is required to understand messages composed of string literals. We also discuss shortcomings with using such readability measuring techniques and common anti-patterns in assert message construction. We envision our results incorporated into code quality tools that appraise the understandability of assertion messages.
... However, to the best of our knowledge, no code recommender explicitly focuses on these aspects when deciding which recommendation to trigger. While this could be possible exploiting the readability metrics previously defined in the literature [10], [14], [32], [39], [45], it is still unclear to what extent such metrics work on artificial code. ...
Preprint
The automatic generation of source code is one of the long-lasting dreams in software engineering research. Several techniques have been proposed to speed up the writing of new code. For example, code completion techniques can recommend to developers the next few tokens they are likely to type, while retrieval-based approaches can suggest code snippets relevant for the task at hand. Also, deep learning has been used to automatically generate code statements starting from a natural language description. While research in this field is very active, there is no study investigating what the users of code recommender systems (i.e., software practitioners) actually need from these tools. We present a study involving 80 software developers to investigate the characteristics of code recommender systems they consider important. The output of our study is a taxonomy of 70 "requirements" that should be considered when designing code recommender systems. For example, developers would like the recommended code to use the same coding style of the code under development. Also, code recommenders being "aware" of the developers' knowledge (e.g., what are the framework/libraries they already used in the past) and able to customize the recommendations based on this knowledge would be appreciated by practitioners. The taxonomy output of our study points to a wide set of future research directions for code recommenders.
Article
Full-text available
Context Reading and understanding the source code are fundamental to supporting software programmers’ daily activities. Still, there is no agreement regarding the program attributes needed to achieve the readability and comprehensibility of source code. Objective To analyze the influence of comments presence, indentation spacing, identifiers length, and code size on the readability and comprehensibility of source code from the perspective of novice and experienced programmers. Method We performed three primary studies and collected quantitative (Likert) and qualitative data representing the programmers’ judgment regarding the readability and comprehensibility of code snippets. For each study, the influence of the four attributes on the readability and comprehensibility of source code was analyzed. The results were aggregated using the odds-ratio approach and analyzed concerning participants’ programming experience. Results The quality characteristics were not significantly affected (alpha = 5%) by either the indentation spacing or the code size, whereas the presence of comments and identifier length affect source code quality positively under such characteristics, according to both novices and experienced programmers. Conclusions Although the results presented findings with statistical significance, the controlled factors and participants’ stratification between novices and experienced were insufficient to explain the contradictory findings in the technical literature concerning the impact of the attributes under evaluation on the readability and comprehensibility of source code.
Article
Code readability models are typically based on the code's structural and textual features, considering code readability as an objective category. However, readability is inherently subjective and dependent on the knowledge and experience of the reader analyzing the code. This paper assesses the readability of Python code statements commonly used in undergraduate programming courses. Our readability model is based on tracking the reader's eye movement during the while‐read phase. It uses machine learning (ML) techniques and relies on a novel set of features—observational features—that capture how the readers read the code. We experimented by tracking the eye movement of 90 undergraduate students while assessing the readability of 48 Python code snippets. We trained an ML model that predicts readability based on the collected observational data and the code snippet's structural and textual features. In our experiments, the XGBoost classifier trained using observational features exclusively achieved the best results (0.85 F ‐measure). Using correlation analysis, we identified Python statements most affecting readability for undergraduate students and proposed implications for teaching Python programming. In line with findings for Java language, we found that constructs related to the code's size and complexity hurt the code's readability. Numerous comments also hindered readability, potentially due to their association with less readable code. Some Python‐specific statements (list comprehension, lambda function, and dictionary comprehension) harmed code readability, even though they were part of the curriculum. Tracking students' gaze indicated some additional factors, most notably nonlinearity introduced by if, for, while, try, and function call statements.
Article
The relevance of code comprehension in a developer’s daily work was recognized more than 40 years ago. Consequently, many experiments were conducted to find out how developers could be supported during code comprehension and which code characteristics contribute to better comprehension. Today, such studies are more common than ever. While this is great for advancing the field, the number of publications makes it difficult to keep an overview. Additionally, designing rigorous code comprehension experiments with human participants is a challenging task, and the multitude of design options can make it difficult for researchers, especially newcomers to the field, to select a suitable design. We therefore conducted a systematic mapping study of 95 source code comprehension experiments published between 1979 and 2019. By structuring the design characteristics of code comprehension studies, we provide a basis for subsequent discussion of the huge diversity of design options in the face of a lack of basic research on their consequences and comparability. We describe what topics have been studied, as well as how these studies have been designed, conducted, and reported. Frequently chosen design options and deficiencies are pointed out to support researchers of all levels of domain expertise in designing their own studies.
Conference Paper
Developers often rely on code search engines to find high-quality and reusable code snippets online, such as those available on Stack Overflow. Recently, ChatGPT, a language model trained for dialog tasks, has been gaining attention as a promising approach for code snippet generation. However, there is still a need for in-depth analysis of the quality of its recommendations. In this work, we propose the evaluation of the readability of code snippets generated by ChatGPT, comparing them with those recommended by CROKAGE, a state-of-the-art code search engine for Stack Overflow. We compare the recommended snippets of both approaches using readability issues raised by the automated static analysis tool (ASAT) SonarQube. Our results show that ChatGPT can generate cleaner code snippets and more consistent naming and code conventions than those written by humans and recommended by CROKAGE. However, in some cases, ChatGPT generates code that lacks recent features from the Java API, such as try-with-resources, lambdas, and others. Overall, our findings suggest that ChatGPT can provide valuable assistance to developers searching for didactic and high-quality code snippets online. However, it is still important for developers to review the generated code, either manually or assisted by an ASAT, to prevent potential readability issues and to verify the correctness of the generated code snippets.
Article
Full-text available
Readability is a measure of how easy a piece of text is to read. Readability assessment plays a crucial role in helping content writers and proofreaders gauge how easy or difficult a piece of text is, which can further be used in text simplification, resource recommendation, and curriculum planning. In the literature, classical readability measures, lexical measures, and deep-learning-based models have been proposed to assess text readability. However, readability assessment using machine and deep learning is a data-intensive task, which requires a reasonably sized dataset for accurate assessment. While several datasets, readability indices (RI), and assessment models have been proposed for military agency manuals, health documents, and early educational materials, studies related to the readability assessment of Computer Science literature are limited. To address this gap, we have contributed AGREE, a Computer Science (CS) literature dataset comprising 42,850 learning resources (LR). We assessed the readability of scientific content pertaining to the domains of Computer Science (CS), Machine Learning (ML), Software Engineering (SE), and Natural Language Processing (NLP). Learning Objects (LO) comprise research papers, lecture notes, and Wikipedia pages drawn from the topic lists of learning repositories for CS, NLP, SE, and ML in the English language. From a statistically significant sample of LOs extracted from the six selected datasets, in addition to our contributed Software Engineering Learning Resources dataset (SELRD), two annotators manually annotated each LO's text difficulty and established a gold standard. Text readability was computed using 14 readability indices and 12 lexical measures. The readability measures were ensembled; in addition, readability indices and lexical attributes were used to train a model for readability assessment. The results show the suitability of the Extra Trees classifier for the AGREE dataset, with better accuracy, F1 score, and efficiency.
Our study reveals that there is no consensus among the readability measures for smaller texts, i.e., those with fewer than 100 lexicons, but for LOs with larger texts, accuracy improves. Our contribution of the AGREE and SELRD datasets and associated readability measures is novel and would be useful in training deep learning models for readability assessment, recommender systems, and curriculum planning related to the CS domain. In future work, we plan to increase the dataset's size and diversity by adding more LRs from other subareas of the CS domain, along with deep learning methods for readability assessment.
Article
Understanding code depends not only on the code but also on the brain.
Article
At UC Davis, in late 2010, we noticed that developers frequently debated the topic of code readability and style. Practitioners often expressed strong, off-the-cuff judgments regarding the readability of code, but mostly disagreed on the precise attributes of readability. While developers clearly cared about the factors they believed related to readability, the concept was not well defined.
Thesis
Full-text available
With the growing relevance of non-financial reporting in the business context, creativity in communication mechanisms, as well as the way and clarity with which it is carried out, has globally become a factor of differentiation between companies. In this sense, the present investigation aims to analyze the potential influence of the economic and financial performance of listed companies in Portugal on the level of readability of the non-financial information they disclose. The sample, composed of 23 entities, had its economic and financial data collected from the SABI database, while the analysis of non-financial reporting focused on Sustainability Reports or, in their absence, on the section associated with this theme in the Annual Reports, for the period from 2016 to 2021. The non-financial information was processed with computer aid, namely the "Readable" software, and later, together with the economic and financial data, analyzed using univariate, bivariate and multivariate analysis. The results obtained indicate a negative influence of the capital structure on readability levels. Also, the size and the percentage of women on the Board of Directors reveal statistically significant relations, with smaller entities presenting higher readability and the presence of women showing a positive influence. This research contributes to an area where empirical evidence is not yet exhaustive and explores the Portuguese market, which is itself under-analyzed. Keywords: Non-Financial Reporting; Economic-Financial Performance; Readability; Sustainability Reports; Non-Financial Information.
Conference Paper
Full-text available
We review accuracy estimation methods and compare the two most common methods: cross-validation and bootstrap. Recent experimental results on artificial data and theoretical results in restricted settings have shown that for selecting a good classifier from a set of classifiers (model selection), ten-fold cross-validation may be better than the more expensive leave-one-out cross-validation. We report on a large-scale experiment, over half a million runs of C4.5 and a Naive-Bayes algorithm, to estimate the effects of different parameters on these algorithms on real-world datasets. For cross-validation, we vary the number of folds and whether the folds are stratified or not; for bootstrap, we vary the number of bootstrap samples. Our results indicate that for real-world datasets similar to ours, the best method to use for model selection is ten-fold stratified cross-validation, even if computation power allows using more folds.
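Ten-fold stratified cross-validation, as recommended above, can be sketched in a few lines. The round-robin fold assignment below is one simple way to preserve class proportions; library implementations such as scikit-learn's StratifiedKFold handle shuffling and edge cases more carefully:

```python
from collections import defaultdict

def stratified_folds(labels, k=10):
    """Assign sample indices to k folds while keeping class proportions similar."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for j, i in enumerate(indices):
            folds[j % k].append(i)  # round-robin within each class
    return folds
```

Each fold then serves once as the test set while the remaining k-1 folds form the training set.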
Article
Full-text available
This article argues that the general practice of describing interrater reliability as a single, unified concept is at best imprecise, and at worst potentially misleading. Rather than representing a single concept, different statistical methods for computing interrater reliability can be more accurately classified into one of three categories based upon the underlying goals of analysis. The three general categories introduced and described in this paper are: 1) consensus estimates, 2) consistency estimates, and 3) measurement estimates. The assumptions, interpretation, advantages, and disadvantages of estimates from each of these three categories are discussed, along with several popular methods of computing interrater reliability coefficients that fall under the umbrella of consensus, consistency, and measurement estimates. Researchers and practitioners should be aware that different approaches to estimating interrater reliability carry with them different implications for how ratings across multiple judges should be summarized, which may impact the validity of subsequent study results.
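As a concrete instance of a consensus estimate, Cohen's kappa measures agreement between two raters beyond what chance alone would produce. The sketch below assumes nominal categories and exactly two raters:

```python
def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two raters over nominal categories."""
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n     # raw agreement
    categories = set(rater1) | set(rater2)
    chance = sum((rater1.count(c) / n) * (rater2.count(c) / n)     # expected agreement
                 for c in categories)
    return (observed - chance) / (1 - chance)
```

Kappa is 1 for perfect agreement, 0 for chance-level agreement, and negative when raters agree less than chance would predict.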
Article
Full-text available
First Page of the Article
Article
Full-text available
Many techniques have been developed over the years to automatically find bugs in software. Often, these techniques rely on formal methods and sophisticated program analysis. While these techniques are valuable, they can be difficult to apply, and they aren't always effective in finding real bugs. Bug patterns are code idioms that are often errors. We have implemented automatic detectors for a variety of bug patterns found in Java programs. In this extended abstract, we describe how we have used bug pattern detectors to find serious bugs in several widely used Java applications and libraries. We have found that the effort required to implement a bug pattern detector tends to be low, and that even extremely simple detectors find bugs in real applications. From our experience applying bug pattern detectors to real programs, we have drawn several interesting conclusions. First, we have found that even well tested code written by experts contains a surprising number of obvious bugs. Second, Java (and similar languages) have many language features and APIs which are prone to misuse. Finally, that simple automatic techniques can be effective at countering the impact of both ordinary mistakes and misunderstood language features.
Conference Paper
Full-text available
The F-measure - the number of distinct test cases to detect the first program failure - is an effectiveness measure for debug testing strategies. We show that for random testing with replacement, the F-measure is distributed according to the geometric distribution. A simulation study examines the distribution of two adaptive random testing methods, to study how closely their sampling distributions approximate the geometric distribution, revealing that in the worst case scenario, the sampling distribution for adaptive random testing is very similar to random testing. Our results have provided an answer to a conjecture that adaptive random testing is always a more effective alternative to random testing, with reference to the F-measure. We consider the implications of our findings for previous studies conducted in the area, and make recommendations to future studies.
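The geometric-distribution claim is easy to check empirically: if each random test fails independently with probability theta, the expected F-measure is 1/theta. A small simulation (trial count and seed chosen arbitrarily for the sketch):

```python
import random

def mean_f_measure(theta, trials=20000, seed=1):
    """Average number of random tests (with replacement) until the first failure."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        f = 1
        while rng.random() >= theta:  # test passes with probability 1 - theta
            f += 1
        total += f
    return total / trials
```

For theta = 0.2 the simulated mean should be close to 1/0.2 = 5.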
Conference Paper
Full-text available
WEKA is a workbench for machine learning that is intended to aid in the application of machine learning techniques to a variety of real-world problems, in particular, those arising from agricultural and horticultural domains. Unlike other machine learning projects, the emphasis is on providing a working environment for the domain specialist rather than the machine learning expert. Lessons learned include the necessity of providing a wealth of interactive tools for data manipulation, result visualization, database linkage, and cross-validation and comparison of rule sets, to complement the basic machine learning tools
Article
Full-text available
A set of properties of syntactic software complexity measures is proposed to serve as a basis for the evaluation of such measures. Four known complexity measures are evaluated and compared using these criteria. This formalized evaluation clarifies the strengths and weaknesses of the examined complexity measures, which include the statement count, cyclomatic number, effort measure, and data flow complexity measures. None of these measures possesses all nine properties, and several are found to fail to possess particularly fundamental properties; this failure calls into question their usefulness in measuring syntactic complexity.
Article
Full-text available
We review accuracy estimation methods and compare the two most common methods: cross-validation and bootstrap. Recent experimental results on artificial data and theoretical results in restricted settings have shown that for selecting a good classifier from a set of classifiers (model selection), ten-fold cross-validation may be better than the more expensive leave-one-out cross-validation. We report on a large-scale experiment, over half a million runs of C4.5 and a Naive-Bayes algorithm, to estimate the effects of different parameters on these algorithms on real-world datasets. For cross-validation, we vary the number of folds and whether the folds are stratified or not; for bootstrap, we vary the number of bootstrap samples. Our results indicate that for real-world datasets similar to ours, the best method to use for model selection is ten-fold stratified cross-validation, even if computation power allows using more folds.
Article
The utility of technical materials is influenced to a marked extent by their reading level or readability. This article describes the derivation and validation of the Automated Readability Index (ARI) for use with technical materials. The method allows for the easy, automatic collection of data as narrative material is typed on a slightly modified electric typewriter. Data collected includes word length (a measure of word difficulty) and sentence length (a measure of sentence difficulty). Appropriate weightings of these factors in a multiple regression equation result in an index of reading difficulty. Uses of the index for evaluating and controlling the readability of large quantities of technical material are described.
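The ARI weightings are public, so the index is straightforward to compute from character, word, and sentence counts. The naive tokenization below (whitespace words, terminal punctuation as sentence markers) is an assumption of the sketch:

```python
def automated_readability_index(text: str) -> float:
    """ARI = 4.71*(chars/words) + 0.5*(words/sentences) - 21.43."""
    words = text.split()
    chars = sum(len(w.strip(".,;:!?\"'")) for w in words)  # roughly letters/digits only
    sentences = max(1, sum(text.count(p) for p in ".!?"))
    return 4.71 * (chars / len(words)) + 0.5 * (len(words) / sentences) - 21.43
```

Longer words and longer sentences both push the index, and thus the estimated reading grade level, upward.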
Article
The consensus in the programming community is that indentation aids program comprehension, although many studies do not back this up. The authors tested program comprehension on a Pascal program. Two styles of indentation were used - blocked and nonblocked - in addition to four possible levels of indentation (0, 2, 4, 6 spaces). Both experienced and novice subjects were used. Although the blocking style made no difference, the level of indentation had a significant effect on program comprehension. (2 - 4 spaces had the highest mean score for program comprehension. ) It is recommended that a moderate level of indentation be used to increase program comprehension and user satisfaction.
Article
The paper is based on the premise that the productivity and quality of software development and maintenance, particularly in large and long term projects, is related to software readability. Software readability depends on such things as coding conventions and system overview documentation. Existing mechanisms to ensure readability --- for example, peer reviews --- are not sufficient. The paper proposes that software organizations or projects institute a readability/documentation group, similar to a test or usability group. This group would be composed of programmers and would produce overview documentation and ensure code and documentation readability. The features and functions of this group are described. Its benefits and possible objections to it are discussed.
Article
It is argued that program reading is an important programmer activity and that reading skill should be taught in programming courses. Possible teaching methods are suggested. The use of program reading in test construction and as part of an overall teaching strategy is discussed. A classification of reading comprehension testing methods is provided in an appendix.
The problem of poorly written hyperdocuments has already been identified. Furthermore, there is no complete definition of hyperdocument quality, and the methodology and tools that will help in analysing and assessing the quality of hyperdocuments are missing. The ability to measure attributes of hyperdocuments is indispensable for the fields of hyperdocument authoring and hypertext engineering. Useful paradigms can be drawn from the practices used in the software engineering and software measurement fields. In this paper we define a hyperdocument quality model, based on the ideas of the well-known Factor-Criteria-Metric hierarchical model. The important factors of readability and maintainability are defined, as well as the corresponding criteria. Finally, structure metrics, which can be computed on the hypertext graph, are presented. Most of these metrics are derived from well-known software metrics. Experimentation is a key issue for the application of measurement, and flexible tools for the automatic collection of measures are needed to support it. Athena, a tool that was originally developed for software measurement and later tailored to meet hypertext measurement needs, is used for hyperdocument measurement.
Conference Paper
In this paper, we explore the concept of code readability and investigate its relation to software quality. With data collected from human annotators, we derive associations between a simple set of local code features and human notions of readability. Using those features, we construct an automated readability measure and show that it can be 80% effective, and better than a human on average, at predicting readability judgments. Furthermore, we show that this metric correlates strongly with two traditional measures of software quality, code changes and defect reports. Finally, we discuss the implications of this study on programming language design and engineering practice. For example, our data suggest that comments, in and of themselves, are less important than simple blank lines to local judgments of readability.
Conference Paper
Formal specifications can help with program testing, optimization, refactoring, documentation, and, most importantly, debugging and repair. Unfortunately, formal specifications are difficult to write manually, while techniques that infer specifications automatically suffer from 90–99% false positive rates. Consequently, neither option is currently practical for most software development projects. We present a novel technique that automatically infers partial correctness specifications with a very low false positive rate. We claim that existing specification miners yield false positives because they assign equal weight to all aspects of program behavior. By using additional information from the software engineering process, we are able to dramatically reduce this rate. For example, we grant less credence to duplicate code, infrequently-tested code, and code that exhibits high turnover in the version control system. We evaluate our technique in two ways: as a preprocessing step for an existing specification miner and as part of novel specification inference algorithms. Our technique identifies which input is most indicative of program behavior, which allows off-the-shelf techniques to learn the same number of specifications using only 60% of their original input. Our inference approach has few false positives in practice, while still finding useful specifications on over 800,000 lines of code. When minimizing false alarms, we obtain a 5% false positive rate, an order-of-magnitude improvement over previous work. When used to find bugs, our mined specifications locate over 250 policy violations. To the best of our knowledge, this is the first specification miner with such a low false positive rate, and thus a low associated burden of manual inspection.
Article
The list itself is based on a concise selection of empirical data and is in rough priority order. The first fact had the greatest effect on defect reduction in the empirical data used for evaluation, while the last fact was less important. The priority of the facts is debatable and depends on the context.
Article
Frequently, when circumstances require that a computer program be modified, the program is found to be extremely difficult to read and understand. In this case a new step to make the program more readable should be added at the beginning of the software modification cycle. A small investment will make (1) the specifications for the modifications easier to write, (2) the estimate of the cost of the modifications more accurate, (3) the design for the modifications simpler, and (4) the implementation of the modifications less error-prone.
Article
Treemaps, a space-filling method for visualizing large hierarchical data sets, are receiving increasing attention. Several algorithms have been previously proposed to create more useful displays by controlling the aspect ratios of the rectangles that make up a treemap. While these algorithms do improve visibility of small items in a single layout, they introduce instability over time in the display of dynamically changing data, fail to preserve the order of the underlying data, and create layouts that are difficult to visually search. In addition, continuous treemap algorithms are not suitable for displaying fixed-size objects within them, such as images. This paper introduces a new "strip" treemap algorithm which addresses these shortcomings, and analyzes other "pivot" algorithms we recently developed, showing the trade-offs between them. These ordered treemap algorithms ensure that items near each other in the given order will be near each other in the treemap layout. Using experimental evidence from Monte Carlo trials and from actual stock market data, we show that, compared to other layout algorithms, ordered treemaps are more stable, while maintaining relatively favorable aspect ratios of the constituent rectangles. A user study with 20 participants clarifies the human performance benefits of the new algorithms. Finally, we present quantum treemap algorithms, which modify the layout of the continuous treemap algorithms to generate rectangles that are integral multiples of an input object size. The quantum treemap algorithm has been applied to PhotoMesa, an application that supports browsing of large numbers of images.
Article
This paper describes a graph-theoretic complexity measure and illustrates how it can be used to manage and control program complexity. The paper first explains how the graph-theory concepts apply and gives an intuitive explanation of the graph concepts in programming terms. The issue of using nonstructured control flow is also discussed. A characterization of nonstructured control graphs is given and a method of measuring the "structuredness" of a program is developed. The last section of this paper deals with a testing methodology used in conjunction with the complexity measure; a testing strategy is defined which dictates that a program must either admit of a certain minimal testing level or be structurally reduced.
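For structured programs, the graph-theoretic measure V(G) = E - N + 2P reduces to counting decision points plus one. A minimal sketch of that shortcut, applied to Python source via the standard ast module (the original measure was defined on control-flow graphs, not any particular language; this adaptation is illustrative):

```python
import ast

# For structured code, cyclomatic complexity equals the number of
# decision points plus one.  This sketch counts Python branching
# constructs; the node selection below is an illustrative choice.
DECISION_NODES = (ast.If, ast.For, ast.While, ast.And, ast.Or,
                  ast.ExceptHandler, ast.IfExp)

def cyclomatic_complexity(source):
    tree = ast.parse(source)
    decisions = sum(isinstance(node, DECISION_NODES)
                    for node in ast.walk(tree))
    return decisions + 1

src = """
def classify(n):
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    return "positive"
"""
```

Here `classify` has two decision points (the `if` and the `elif`), giving a complexity of 3; note that adding straight-line statements leaves the value unchanged, matching the paper's independence-of-size property.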
Article
The project conceived in 1929 by Gardner Murphy and the writer aimed first to present a wide array of problems having to do with five major "attitude areas"--international relations, race relations, economic conflict, political conflict, and religion. The kind of questionnaire material falls into four classes: yes-no, multiple choice, propositions to be responded to by degrees of approval, and a series of brief newspaper narratives to be approved or disapproved in various degrees. The monograph aims to describe a technique rather than to give results. The appendix, covering ten pages, shows the method of constructing an attitude scale. A bibliography is also given.
Conference Paper
Software systems evolve over time due to changes in requirements, optimization of code, fixes for security and reliability bugs, etc. Code churn, which measures the changes made to a component over a period of time, quantifies the extent of this change. We present a technique for early prediction of system defect density using a set of relative code churn measures that relate the amount of churn to other variables such as component size and the temporal extent of churn. Using statistical regression models, we show that while absolute measures of code churn are poor predictors of defect density, our set of relative measures of code churn is highly predictive of defect density. A case study performed on Windows Server 2003 indicates the validity of the relative code churn measures as early indicators of system defect density. Furthermore, our code churn metric suite is able to discriminate between fault-prone and not fault-prone binaries with an accuracy of 89.0 percent.
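The distinction between absolute and relative churn can be sketched as follows; the exact published measures are not reproduced here, and the ratios below are illustrative assumptions in the same spirit (churn normalized by size and by time):

```python
# Illustrative relative churn ratios: absolute churned lines are
# normalized by component size and by the temporal extent of churn.
# These are assumed stand-ins, not the paper's published measures.

def relative_churn(churned_loc, deleted_loc, total_loc,
                   churn_count, weeks):
    """Return simple relative churn ratios for one component."""
    return {
        "churned_per_loc": churned_loc / total_loc,  # size-normalized
        "deleted_per_loc": deleted_loc / total_loc,
        "churn_rate": churn_count / weeks,           # time-normalized
    }
```

Two components with the same absolute churn (say, 200 churned lines) get very different relative scores if one is 1,000 lines and the other 50,000, which is exactly why the relative measures predict defect density better.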
Conference Paper
This paper describes an empirical study investigating whether programmers improve the readability of their source code if they have support from a source code editor that offers dynamic feedback on their identifier naming practices. An experiment, employing both students and professional software engineers, and requiring the maintenance and production of software, demonstrated a statistically significant improvement in source code readability over that of the control.
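A toy version of such dynamic naming feedback might look like the following; the study does not specify its rules, so every check below is an assumption chosen only to illustrate the editor-feedback idea:

```python
import re

# Hypothetical identifier checks, standing in for the unspecified
# rules of the study's editor support.  Each check returns a
# human-readable warning string.

def naming_warnings(identifier):
    warnings = []
    if len(identifier) <= 2 and identifier not in {"i", "j", "k", "x", "y"}:
        warnings.append("too short to convey meaning")
    if re.search(r"\d+$", identifier):
        warnings.append("numeric suffix suggests a vague name")
    if "_" in identifier and any(c.isupper() for c in identifier):
        warnings.append("mixes snake_case and camelCase")
    return warnings
```

An editor would run checks like these as the developer types and surface the warnings inline, which is the feedback loop the experiment evaluated.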
Conference Paper
For large software systems, the maintenance phase tends to last much longer than all the previous life-cycle phases taken together, and consequently consumes much more effort. A good measure of software maintainability can help better manage the maintenance-phase effort. Software maintainability cannot be adequately measured from source code or documents alone; the readability and understandability of both source code and documentation should be considered. This paper proposes an integrated measure of software maintainability based on three important aspects of software: readability of source code (RSC), documentation quality (DOQ), and understandability of software (UOS). Given the nature of these parameters, a fuzzy approach has been used to integrate the three aspects. The paper also proposes a new representation for the rule base of fuzzy models, which requires less storage space and finds results more efficiently during simulation. This integrated measurement of software maintainability, to our knowledge the first attempt to quantify integrated maintainability, should improve on any single-parameter maintainability measurement approach, and its output can help software project managers judge the maintenance effort of the software.
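A heavily simplified sketch of integrating the three aspects: the paper uses a fuzzy rule base, whereas the triangular memberships and weights below are assumptions, not the published model.

```python
# Assumed simplification of fuzzy integration: triangular membership
# of each aspect (RSC, DOQ, UOS, each on a 0..1 scale) followed by a
# weighted mean.  The real model evaluates a rule base instead.

def tri(x, a, b, c):
    """Triangular membership function rising on [a, b], falling on [b, c]."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def maintainability(rsc, doq, uos, weights=(0.4, 0.3, 0.3)):
    # Degree to which each aspect is "good" (membership peaks at 1.0).
    goodness = [tri(v, 0.0, 1.0, 2.0) for v in (rsc, doq, uos)]
    return sum(w * g for w, g in zip(weights, goodness))
```

A rule base replaces the fixed weights with statements like "if RSC is high and DOQ is medium then maintainability is high", which is what the paper's compact rule-base representation stores efficiently.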
Article
This paper describes a graph-theoretic complexity measure and illustrates how it can be used to manage and control program complexity. The paper first explains how the graph-theory concepts apply and gives an intuitive explanation of the graph concepts in programming terms. The control graphs of several actual Fortran programs are then presented to illustrate the correlation between intuitive complexity and the graph-theoretic complexity. Several properties of the graph-theoretic complexity are then proved which show, for example, that complexity is independent of physical size (adding or subtracting functional statements leaves complexity unchanged) and complexity depends only on the decision structure of a program.
Article
A 3×2 factorial experiment was performed to compare the effects of procedure format (none, internal, or external) with those of comments (absent or present) on the readability of a PL/1 program. The readability of six editions of the program, each having a different combination of these factors, was inferred from the accuracy with which students could answer questions about the program after reading it. Both extremes in readability occurred in the program editions having no procedures: without comments the procedureless program was the least readable and with comments it was the most readable.
Article
Software's complexity and accelerated development schedules make avoiding defects difficult. We have found, however, that researchers have established objective and quantitative data, relationships, and predictive models that help software developers avoid predictable pitfalls and improve their ability to predict and control efficient software projects. The article presents 10 techniques that can help reduce the flaws in your code.
Article
Software is the key technology in applications as diverse as accounting, hospital management, aviation, and nuclear power. Application advances in different domains such as these-each with different requirements-have propelled software development from small batch programs to large, real-time programs with multimedia capabilities. To cope, software's enabling technologies have undergone tremendous improvement in hardware, communications, operating systems, compilers, databases, programming languages, and user interfaces, among others. In turn, those improvements have fueled even more advanced applications. Improvements in VLSI technology and multimedia, for example, have resulted in faster, more compact computers that significantly widened the range of software applications. Database and user interface enhancements, on the other hand, have spawned more interactive and collaborative development environments. Such changes have a ripple effect on software development processes as well as on software techniques and tools. In this article, we highlight software development's crucial methods and techniques of the past 30 years
Article
Program understanding is an essential part of all software maintenance and enhancement activities. As currently practiced, program understanding consists mainly of code reading. The few automated understanding tools that are actually used in industry provide helpful but relatively shallow information, such as the line numbers on which variable names occur or the calling structure possible among system components. These tools rely on analyses driven by the nature of the programming language used. As such, they are adequate to answer questions concerning implementation details, the so-called what questions. They are severely limited, however, when trying to relate a system to its purpose or requirements, the why questions. Application programs solve real-world problems. The part of the world with which a particular application is concerned is that application's domain. A model of an application's domain can serve as a supplement to programming-language-based analysis methods and tools....
H. Sutter and A. Alexandrescu, C++ Coding Standards: 101 Rules, Guidelines, and Best Practices. Addison-Wesley Professional, 2004.
S. MacHaffie, R. McLeod, B. Roberts, P. Todd, and L. Anderson, "A readability metric for computer-generated mathematics," Saltire Software, http://www.saltire.com/equation.html, Tech. Rep., retrieved 2007.
S. Ambler, "Java Coding Standards," Software Development.
G. H. McLaughlin, "SMOG Grading&mdash;A New Readability Formula," Journal of Reading, 1969.
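For reference, McLaughlin's SMOG formula estimates the reading grade level of natural-language text from polysyllable and sentence counts; the evaluation below applies the published formula directly, while the word-level syllable counting it depends on is left outside this sketch:

```python
import math

# McLaughlin's SMOG formula:
#   grade = 3.1291 + 1.0430 * sqrt(30 * polysyllables / sentences)
# where "polysyllables" is the count of words with 3+ syllables in the
# sampled sentences.  Counting syllables is assumed to happen elsewhere.

def smog_grade(polysyllable_count, sentence_count):
    return 3.1291 + 1.0430 * math.sqrt(
        30.0 * polysyllable_count / sentence_count)
```

Work on code readability metrics, including the article above, borrows this idea of predicting a comprehension score from cheap surface features of the text.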