Figure 3 - uploaded by Tegawendé F. Bissyandé
Correlation between Test Cases per LOC and Lines of Code. We further examine the correlation between the number of lines of code (LOC) and the number of test cases per LOC, considering only the projects with test cases. For each project, we divide the number of test cases by the number of LOC. Figure 3 shows that as project size increases, i.e., as the number of lines of code grows, the number of test cases per LOC decreases. Spearman's rho for this distribution is -0.686 with a p-value < 2.2e-16, which indicates a negative correlation between lines of code and test cases per LOC. 90% of the projects have fewer than 100 test cases. Projects with test cases are larger than projects without test cases, and among the projects with test cases, the number of test cases per LOC decreases with increasing LOC.
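As a side note, the reported statistic is simple to compute on any comparable dataset. The sketch below is illustrative only: it assumes a hypothetical projects.csv with columns loc and test_cases (these names are our own, not part of the study), derives test cases per LOC, and computes Spearman's rho with SciPy.

import csv
from scipy.stats import spearmanr

loc, tests_per_loc = [], []
with open("projects.csv", newline="") as f:        # hypothetical input file
    for row in csv.DictReader(f):
        n_loc = int(row["loc"])
        n_tests = int(row["test_cases"])
        if n_tests > 0:                             # keep only projects that have test cases
            loc.append(n_loc)
            tests_per_loc.append(n_tests / n_loc)   # test cases per line of code

rho, p_value = spearmanr(loc, tests_per_loc)        # rank correlation and its p-value
print(f"Spearman's rho = {rho:.3f}, p-value = {p_value:.3g}")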

Source publication
Conference Paper
Full-text available
In software engineering, testing is a crucial activity designed to ensure the quality of program code. For this activity, development teams spend substantial resources constructing test cases to thoroughly assess the correctness of software functionality. What, however, is the proportion of open source projects that include test cases? What k...

Context in source publication

Context 1
... each project, we divide the number of test cases by the number of LOC. We observe from Figure 3 that as the project size increases, i.e., as the number of lines of code (LOC) grows, the number of tests per LOC decreases. Spearman's rho for the distribution is -0.686 with p-value < 2.2e-16, which shows that there is a negative correlation between the lines of code and the number of test cases per LOC. ...

Similar publications

Conference Paper
Full-text available
Testing is an indispensable part of software development efforts. It helps to improve the quality of software systems by finding bugs and errors during development and deployment. Huge amounts of resources are spent on testing efforts. However, to what extent is testing used in practice? In this study, we investigate the adoption of testing in open so...

Citations

... MUTAPI can work only on projects with an available test suite, while ALP is able to detect misuses in projects without test suites or with test suites with limited coverage. Existing research has also shown that many projects have poor code coverage and that developers do not always write test cases [88], [89]. ...
Preprint
Full-text available
A common cause of bugs and vulnerabilities is the violation of usage constraints associated with Application Programming Interfaces (APIs). API misuses are common in software projects, and while techniques have been proposed to detect such misuses, studies have shown that they fail to detect misuses reliably while reporting many false positives. One limitation of prior work is the inability to reliably identify correct patterns of usage. Many approaches mistake a usage pattern's frequency for correctness. Due to the variety of alternative usage patterns that may be uncommon but correct, anomaly detection-based techniques have limited success in identifying misuses. We address these challenges and propose ALP (Actively Learned Patterns), reformulating API misuse detection as a classification problem. After representing programs as graphs, ALP mines discriminative subgraphs. While still incorporating frequency information, through limited human supervision we reduce the reliance on the assumption relating frequency and correctness. The principles of active learning are incorporated to shift human attention away from the most frequent patterns. Instead, ALP samples informative and representative examples while minimizing labeling effort. In our empirical evaluation, ALP substantially outperforms prior approaches on both MUBench, an API misuse benchmark, and a new dataset that we constructed from real-world software projects.
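To make the active-learning idea concrete, here is a minimal uncertainty-sampling loop. It is not ALP's implementation: the feature vectors, labels, and classifier are illustrative stand-ins (in ALP the features would come from mined discriminative subgraphs).

import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy stand-in for mined usage patterns: random feature vectors with hidden labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
labels = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # hidden "correct use vs. misuse"

# Small labeled seed set containing both classes; everything else is unlabeled.
labeled = list(np.where(labels == 0)[0][:5]) + list(np.where(labels == 1)[0][:5])
unlabeled = [i for i in range(len(X)) if i not in labeled]

for _ in range(5):                                   # a few active-learning rounds
    clf = LogisticRegression().fit(X[labeled], labels[labeled])
    probs = clf.predict_proba(X[unlabeled])[:, 1]
    # Uncertainty sampling: query the example the classifier is least sure about.
    query = unlabeled[int(np.argmin(np.abs(probs - 0.5)))]
    labeled.append(query)                            # a human would provide this label
    unlabeled.remove(query)

print("labeled examples after querying:", len(labeled))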
... To help with this process, it is important to keep track of these features, their changes, and their prevalence [19,29,35]. Narrower topics can also be investigated, such as configuration settings [36], popularity trends [12], popular testing practices [24], etc. ...
... A lot of existing research has been dedicated to studying large corpora of code [12,15,19,24,29,35,36], which involved developing tools, frameworks, and platforms for their processing [17,29,30,35,38-40]. In this section, we list the most notable ones. ...
Preprint
Full-text available
In this paper, we present Lupa - a framework for large-scale analysis of programming language usage. Lupa is a command line tool that uses the IntelliJ Platform under the hood, which gives it access to the powerful static analysis tools used in modern IDEs. The tool supports custom analyzers that process the rich concrete syntax tree of the code and can compute various features: the presence of entities, their dependencies, definition-usage chains, etc. Currently, Lupa supports analyzing Python and Kotlin, but it can be extended to other languages supported by IntelliJ-based IDEs. We explain the internals of the tool, show how it can be extended and customized, and describe an example analysis that we carried out with its help: analyzing the syntax of ranges in Kotlin.
... We chose to classify changes at the test case level of granularity because this granularity is comparable across programming languages and is not influenced by the organization of test cases into test files or classes. In prior literature, test cases (Pinto et al. 2012; Beller et al. 2017; Kochhar et al. 2013) are also used more often than test files and test classes (Zaidman et al. 2008). ...
... It also finds that the inclusion of test code does not affect the time or the decision to merge a pull request. Kochhar et al. (2013) study the distribution of test cases across 50,000 GitHub projects. Pham et al. (2013) is a study of testing behaviors on GitHub and points out that developers' demand for test code in pull requests is influenced by the size of code changes, the types of contributions, and the estimated effort. ...
Article
Full-text available
Regular expressions cause string-related bugs and open security vulnerabilities to denial-of-service (DoS) attacks. However, beyond ReDoS (Regular expression Denial of Service), little is known about the extent to which regular expression issues affect software development and how these issues are addressed in practice. We conduct an empirical study of 356 regex-related bugs from merged pull requests in Apache, Mozilla, Facebook, and Google GitHub repositories. We identify and classify the nature of the regular expression problems, the fixes, and the related changes in the test code. The most important findings in this paper are as follows: 1) incorrect regular expression semantics is the dominant root cause of regular expression bugs (165/356, 46.3%); the remaining root causes are incorrect API usage (9.3%) and other code issues that require regular expression changes in the fix (29.5%); 2) fixing regular expression bugs is nontrivial, as it takes more time and more lines of code to fix them compared to general pull requests; 3) most (51%) of the regex-related pull requests do not contain test code changes, and certain regex bug types (e.g., compile errors, performance issues, regex representation) are less likely to include test code changes than others; and 4) the dominant type of test code change in regex-related pull requests is test case addition (75%). The results of this study contribute to a broader understanding of the practical problems faced by developers when using, fixing, and testing regular expressions.
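As background on ReDoS specifically, the short sketch below (our own illustration, not drawn from the study's bug dataset) shows how a regex with nested quantifiers triggers catastrophic backtracking in Python's re module.

import re
import time

pattern = re.compile(r"^(a+)+$")     # nested quantifiers: a classic ReDoS-prone pattern
payload = "a" * 24 + "!"             # almost matches, forcing exhaustive backtracking

start = time.perf_counter()
result = pattern.match(payload)      # returns None, but only after exponential backtracking
elapsed = time.perf_counter() - start

# Runtime roughly doubles with each extra "a"; an equivalent linear-time pattern is r"^a+$".
print(f"match: {result}, elapsed: {elapsed:.2f}s")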
... The quality and reliability of a project can be evaluated from several perspectives. In 2013, Kochhar et al. conducted very similar empirical studies [1,21] of the adoption of testing in open source projects on GitHub, independently of programming language. They found that 61% of the analyzed repositories include at least one test case, that projects with a larger number of developers have more test cases, and that the number of test cases has a weak correlation with the number of bug reporters. ...
Article
Full-text available
Automated tests are often considered an indicator of project quality. In this paper, we performed a large analysis of 6.3M public GitHub projects using Java as the primary programming language. We created an overview of test occurrence in publicly available GitHub projects and of the test frameworks used in them. The results showed that 52% of the projects contain at least one test case. However, there is a large number of example tests that do not represent relevant production code testing. It was also found that there is only a poor correlation between the number of occurrences of the word “test” in different parts of a project (e.g., file paths, file names, file content) and the number of test cases, creation date, date of the last commit, number of commits, or number of watchers. Testing framework analysis confirmed that JUnit is the most used testing framework, with a 48% share. TestNG, considered the second most popular Java unit testing framework, occurred in only 3% of the projects.
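The contrast between counting the word “test” and counting actual test cases can be illustrated with a simplified heuristic. This is not the paper's detection method: the directory path and the reliance on JUnit @Test annotations are assumptions made for the sketch.

import os
import re

def count_test_signals(repo_path):
    """Count JUnit-style test cases (@Test annotations) and occurrences of the
    word 'test' in file paths for a locally cloned repository."""
    test_cases = 0
    word_test_in_paths = 0
    for root, _dirs, files in os.walk(repo_path):
        for name in files:
            path = os.path.join(root, name)
            word_test_in_paths += len(re.findall(r"test", path, flags=re.IGNORECASE))
            if name.endswith(".java"):
                try:
                    with open(path, encoding="utf-8", errors="ignore") as f:
                        test_cases += len(re.findall(r"@Test\b", f.read()))
                except OSError:
                    pass
    return test_cases, word_test_in_paths

# Hypothetical usage with a local clone:
# print(count_test_signals("/path/to/cloned/repo"))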
... The software engineering community has taken up this challenge: researchers examine increasingly large numbers of projects in order to test hypotheses and derive knowledge about the software development process. Examples of such studies include investigations of testing practices [12], changes to licensing over time [18], popularity trends [4], and configuration settings [17]. These works use samples of GitHub ranging from 15K to 100K projects, filtered to exclude projects considered lacking in size, popularity, originality, or importance. ...
Conference Paper
Full-text available
Analyzing massive code bases is a staple of modern software engineering research – a welcome side-effect of the advent of large-scale software repositories such as GitHub. Selecting which projects one should analyze is a labor-intensive process, and a process that can lead to biased results if the selection is not representative of the population of interest. One issue faced by researchers is that the interface exposed by software repositories only allows the most basic of queries. CodeDJ is an infrastructure for querying repositories composed of a persistent datastore, constantly updated with data acquired from GitHub, and an in-memory database with a Rust query interface. CodeDJ supports reproducibility: historical queries are answered deterministically using past states of the datastore, so researchers can reproduce published results. To illustrate the benefits of CodeDJ, we identify biases in the data of a published study and, by repeating the analysis with new data, we demonstrate that the study’s conclusions were sensitive to the choice of projects.
... Once test files are identified, we can investigate the test code changes in more detail. We chose to classify changes at the test case level of granularity because it is used more often in test code studies [47,4,28] than test files and test classes [69]. ...
... It also finds that the inclusion of test code does not affect the time or the decision to merge a pull request. Kochhar et al. [28] study the distribution of test cases across 50,000 GitHub projects. Pham et al. [46] is a study of testing behaviors on GitHub and points out that developers' demand for test code in pull requests is influenced by the size of code changes, the types of contributions, and the estimated effort. ...
Preprint
Regular expressions cause string-related bugs and open security vulnerabilities to denial-of-service (DoS) attacks. However, beyond ReDoS (Regular expression Denial of Service), little is known about the extent to which regular expression issues affect software development and how these issues are addressed in practice. We conduct an empirical study of 356 merged regex-related pull request bugs from Apache, Mozilla, Facebook, and Google GitHub repositories. We identify and classify the nature of the regular expression problems, the fixes, and the related changes in the test code. The most important findings in this paper are as follows: 1) incorrect regular expression behavior is the dominant root cause of regular expression bugs (165/356, 46.3%). The remaining root causes are incorrect API usage (9.3%) and other code issues that require regular expression changes in the fix (29.5%), 2) fixing regular expression bugs is nontrivial as it takes more time and more lines of code to fix them compared to the general pull requests, 3) most (51%) of the regex-related pull requests do not contain test code changes. Certain regex bug types (e.g., compile error, performance issues, regex representation) are less likely to include test code changes than others, and 4) the dominant type of test code changes in regex-related pull requests is test case addition (75%). The results of this study contribute to a broader understanding of the practical problems faced by developers when using, fixing, and testing regular expressions.
... We have several reasons to believe that the combination of test amplification and delta debugging can expose resilience issues: (i) a significant amount of time is spent on software testing [40,43] and tests are therefore likely to capture domain-specific information; (ii) developers tend to test the most important features (i.e., "happy paths") [8,30] first due to timing and budget constraints [7,18]; (iii) previous work [48] found that the majority of catastrophic failures could have been prevented by performing simple testing on error handling code; and (iv) that many distributed system failures are caused by the untimely arrival of a single event [33]. ...
... A persistent actor is implemented by (i) inheriting from the trait PersistentActor (line 12), (ii) overriding receiveCommand to define the message handler (lines 14-18), (iii) overriding receiveRecover to define the handler that replays persisted events (lines 21-23), and (iv) defining a persistenceId to uniquely identify the entity in a journal where events are written to and read from (line 32). To persist an event, a developer must call persist (line 16) with the event to be persisted and a callback (i.e., updateState on lines 25-30) to be executed whenever the given event has been persisted asynchronously. ...
... Other works have empirically studied tests on open source software. Kochhar et al. (2013) studied the correlation between the presence of test cases and project development characteristics (Kochhar 2013; Kochhar et al. 2013). It was found that tests increase the number of lines of code and the size of development teams. ...
Article
Full-text available
Software testing is an important phase in the software development lifecycle because it helps identify bugs in a software system before it is shipped into the hands of its end users. There are numerous studies on how developers test general-purpose software applications. The idiosyncrasies of mobile software applications, however, set mobile apps apart from general-purpose systems (e.g., desktop, stand-alone applications, web services). This paper investigates the working habits and challenges of mobile software developers with respect to testing. A key finding of our exhaustive study of 1000 Android apps is that mobile apps are still tested in a very ad hoc way, if tested at all. However, we show that, as in other types of software, testing increases the quality of apps (demonstrated in user ratings and number of code issues). Furthermore, we find evidence that tests are essential when it comes to engaging the community to contribute to mobile open source software. We discuss reasons and potential directions to address our findings. Yet another relevant finding of our study is that Continuous Integration and Continuous Deployment (CI/CD) pipelines are rare in the mobile apps world (only 26% of the apps are developed in projects employing CI/CD) – we argue that one of the main reasons is the lack of exhaustive and automatic testing.
... Pham et al. (2013) and Tsay et al. (2012) studied the implications of such social behaviour on project success. Researchers have increasingly used tools like GHTorrent and Gitminer to study testing patterns (Kochhar et al., 2013), the programming languages used, issue reporting patterns, and project success. ...
Preprint
Contribution is fundamental to the concept of performance analysis. It is integral to judging the worth of a person in a team, an employee in an organization, and participants in any activity in general. With the industry emphasizing rightsizing and optimized workforce management, it is of paramount importance that the contribution of employees is comprehensively monitored and adequately rewarded. However, the subjective and bias-prone nature of existing processes has led to widespread employee dissatisfaction, especially in the software industry. Traditional LOC-based metrics fail to measure contribution as they do not consider the full range of activities performed by a software engineer in a project. Thus, there arises a need for a comprehensive metric to measure contribution. This research seeks to build a data model to measure the contribution of a software engineer in a software project by using data mined from GitHub and Gitter. A high-level model for contribution is constructed and then expanded, by a top-down approach, to create a model that considers both the amount of work done and its quality. This model represents an expert’s view of measuring contribution. Using the data mined from GitHub and Gitter, a data model is constructed, from the bottom up, that seeks to identify useful signals to quantify the concepts expressed by the high-level top-down model. This data model is conceptualized by applying the principles of Measurement Theory in software engineering and is constantly refined using defeasible reasoning to better quantify the concepts of the high-level model. Finally, an expert system is used to measure the contribution of a software engineer in a project. The proposed solution is then subjected to a sanity test against the manual evaluation methods proposed by the expert in the high-level model. The results indicate that the data model is a reasonably accurate representation of the model proposed by the expert and succeeds in providing a rank ordering of developers consistent with that obtained from a manual evaluation of contribution using the expert’s model, though some loss in conceptual fidelity is observed.
... GitHub contains information on millions of software projects. In previous works, data from GitHub was leveraged to conduct large-scale studies on the popularity of programming languages [9] and the adoption of software testing [10]. In this paper, we exploit the data to analyze the differences between Bug Issues and Feature Issues. ...