
... The methodology of the proposed retail resilience engine was developed to achieve both of these goals while maintaining a proper balance between them. This is made possible by integrating Test-Driven Development (TDD) principles [28] with Natural Language Processing (NLP)-based Artificial Intelligence (AI). This section presents a conceptual explanation of the complete, sequential methodological development process. ...
Article
Full-text available
System reliability and operational resilience are two critical success factors in the retail industry that are directly connected to customer satisfaction and business sustainability. Staying competitive in today’s dynamic and rapidly evolving market requires rapid adaptability, which, however, often conflicts with reliability and resilience. This paper proposes an innovative solution, the Retail Resilience Engine (RRE), to establish a balance between these success factors and market demand. It is a unique framework that combines Test-Driven Development (TDD) with a Large Language Model (LLM) and follows a state-of-the-art Agentic-AI architecture. It effectively evaluates the decision-making process in retail at rapid speed by incorporating diverse factors, including inventory management, demand forecasting, and customer feedback, significantly improving system reliability. The experimental analysis of the proposed framework shows that its decision-making closely matches that of human experts, with a similarity index of 97.5%, further supporting the system's reliability. The framework also scales effectively, maintaining high accuracy, precision, recall, and F1 scores across varying dataset sizes. The robustness analysis of the system demonstrates enhanced agility across diverse retail domains, with consistent performance and accuracy exceeding 90% in all tested scenarios. The integration of a creative filtering mechanism further enhances the performance of the RRE framework by filtering out 98.2% of irrelevant inputs. Overall, the proposed RRE framework demonstrates impressive potential to transform retail systems by enhancing reliability, scalability, and decision-making quality through an Agentic-AI approach.
... Extreme Programming (XP) is more suitable for projects that require high flexibility, rapid response to change, and intensive collaboration [16]. XP enables software development with short iterations, automated testing through Test-Driven Development (TDD), and practices such as pair programming to ensure quality and speed [17]. Compared to Waterfall, XP is more adaptive to change and risk [18], while Scrum provides a more structured iterative framework [19], but XP offers higher technical intensity and more responsive development speed [20]. ...
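To make the TDD practice referenced above concrete, here is a minimal, self-contained Python sketch of one red-green-refactor cycle; the `ShoppingCart` class and its API are hypothetical illustrations, not drawn from any of the cited studies.

```python
import unittest

# Step 1 (red): write the test first; it fails until the production code below exists.
class TestShoppingCart(unittest.TestCase):
    def test_total_sums_item_prices(self):
        cart = ShoppingCart()
        cart.add_item("apple", 0.50)
        cart.add_item("bread", 2.25)
        self.assertAlmostEqual(cart.total(), 2.75)

# Step 2 (green): write just enough production code to make the test pass.
class ShoppingCart:
    def __init__(self):
        self._prices = []

    def add_item(self, name, price):
        self._prices.append(price)

    def total(self):
        return sum(self._prices)

# Step 3 (refactor): clean up while keeping the test green, then repeat the cycle.
if __name__ == "__main__":
    unittest.main()
```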
Article
The employee attendance system is an important component of human resource management that records and monitors employee attendance efficiently. With the increasing reliance on information technology, it is essential for companies to ensure that this system performs optimally, functions responsively, and delivers a satisfying user experience. This study aims to evaluate the performance of an employee attendance system through automated testing using the website performance analysis tool GTMetrix. By applying automated testing with GTMetrix to measure and analyze the system's performance, this research offers an objective and efficient approach to identifying performance bottlenecks. The results can serve as a reference for information system developers in designing and implementing a more optimal employee attendance system. In addition, the research provides insights into the importance of optimizing website performance in the context of business applications. The performance and structure testing of this system using GTMetrix achieved an A grade, with a performance score of 100% and a structure score of 100%. This research emphasizes the importance of automated testing as a tool to enhance and improve the performance of information systems, thereby meeting user needs and supporting more effective human resource management.
... Furthermore, Test-Driven Development (TDD) has been widely studied and promoted as a method that encourages developers to think through requirements and edge cases before writing code. Research by Nascimento (2020) and Parsa, Zakeri-Nasrabadi, and Turhan (2025) demonstrated that TDD leads to more testable code, better modularization, and higher test coverage. However, there have been conflicting findings regarding its effect on software quality. ...
Article
Full-text available
This study evaluates the impact of Behavior-Driven Development (BDD) and Test-Driven Development (TDD) on software quality using machine learning models, including Random Forest, XGBoost, and LightGBM. Key metrics such as bug detection, test coverage, and development time were analyzed using a dataset from multiple software projects. Polynomial feature expansion captured non-linear interactions, while SHapley Additive exPlanations (SHAP) enhanced interpretability. Results indicate that Random Forest achieved the best predictive accuracy, with an average RMSE of 7.64 and MAE of 6.39, outperforming XGBoost (average RMSE: 8.63, MAE: 7.37) and LightGBM (average RMSE: 6.89, MAE: 5.38). However, negative values across all models reveal challenges in generalization. SHAP analysis highlights the critical influence of higher-order interactions, particularly between test coverage and development time. These findings underscore the complexity of predicting software quality and suggest the need for additional features and advanced techniques to enhance model performance. This study provides a comprehensive, interpretable framework for assessing the comparative effectiveness of BDD and TDD in improving software quality.
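The pipeline this abstract describes (polynomial feature expansion, a Random Forest regressor, and SHAP attributions) might look like the following sketch. The data is synthetic and the feature names are illustrative assumptions, not the study's dataset.

```python
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
# Hypothetical project-level features: test coverage (%) and development time (days).
X = rng.uniform([40, 5], [100, 90], size=(300, 2))
# Synthetic "bug count" target with a coverage x time interaction, plus noise.
y = 50 - 0.3 * X[:, 0] + 0.2 * X[:, 1] - 0.002 * X[:, 0] * X[:, 1] + rng.normal(0, 2, 300)

# Polynomial expansion captures non-linear interactions between the raw metrics.
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)

X_tr, X_te, y_tr, y_te = train_test_split(X_poly, y, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("MAE:", mean_absolute_error(y_te, model.predict(X_te)))

# SHAP attributes each prediction to the expanded features, exposing which
# interactions (e.g., coverage x time) drive the model's predictions.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_te)
print("mean |SHAP| per feature:", np.abs(shap_values).mean(axis=0))
```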
... The results can be seen in Figure 9. The average responsiveness was less than 0.0611 seconds. Further testing was performed using acceptance testing [35]. This test was conducted with a questionnaire based on usability testing, with the variables effectiveness, efficiency, and satisfaction [36]. ...
Article
Full-text available
Data integration in this era is necessary for building a valid information system. Data in an information system must have a concept that interacts with other systems. As information systems develop, data storage grows. Big data must be channeled through a supporting information system connected to the data-center information system. This research develops an API-integrated system with increased security, using Basic Authentication with cryptographic hashes, following the Linear Sequential Model method. Test results using the cURL library returned the expected data, and response-time testing yielded an average of 0.0611 seconds. Acceptance testing obtained a result of 78%, which falls into the excellent-functioning category. The research found that the REST API can integrate and validate data between information systems.
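The abstract does not specify the exact hashing scheme, so the following is only one plausible Python sketch of Basic Authentication hardened with a salted cryptographic hash: the server stores a PBKDF2 digest rather than the plaintext password and verifies decoded credentials against it. All names and parameters are illustrative assumptions.

```python
import base64
import hashlib
import hmac
import os

# Server side: store only a salted hash of the password, never the plaintext.
def make_credential_record(password: str) -> tuple[bytes, bytes]:
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest

def verify_basic_auth(header: str, username: str, salt: bytes, digest: bytes) -> bool:
    """Check an 'Authorization: Basic <b64>' header against the stored hash."""
    try:
        scheme, encoded = header.split(" ", 1)
        if scheme != "Basic":
            return False
        user, _, password = base64.b64decode(encoded).decode().partition(":")
    except ValueError:
        return False
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    # hmac.compare_digest avoids timing side channels on the comparison.
    return user == username and hmac.compare_digest(candidate, digest)

salt, digest = make_credential_record("s3cret")
header = "Basic " + base64.b64encode(b"api_user:s3cret").decode()
print(verify_basic_auth(header, "api_user", salt, digest))  # True
```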
... As small teams adapt to the new approach of writing tests before code, the learning curve associated with TDD may temporarily reduce productivity [12]. Although initial productivity may suffer, embracing TDD's iterative methodology can improve code quality and reduce defects in the long run. ...
Conference Paper
Full-text available
In the context of small software development teams, this research article gives a thorough investigation of the adoption of test-driven development (TDD) approaches. It aims to highlight the benefits that TDD offers, such as improved code quality through modularization and proactive defect detection, which results in more effective debugging and development processes. It also discusses the complex issues that arise when TDD is implemented in smaller teams, such as the learning curve and resource constraints. This study advances the understanding of how TDD can be used to optimize software development techniques in organizations or software houses with small development teams, and it explores the potential advantages and challenges.
Article
Full-text available
Refactoring is a critical but complex process for improving code quality by altering software structure without changing observable behavior. Search-based approaches have been proposed to recommend refactoring solutions. However, existing works tend to leverage all the sub-attributes in an objective while ignoring the relationships between those sub-attributes. Furthermore, the set of refactoring operation types covered by existing works can be further extended. To this end, this paper proposes a novel approach, called MIRROR, to recommend refactorings by employing multi-objective optimization across three objectives: (i) improving quality, (ii) removing code smells, and (iii) maximizing similarity to the refactoring history. Unlike previous works, MIRROR provides a way to further optimize the attributes in each objective. More specifically, given an objective, MIRROR investigates the possible correlations among attributes and selects those with low correlations as the representation of that objective. MIRROR is evaluated on 6 real-world projects by answering 6 research questions. The experimental results demonstrate that MIRROR recommends an average of 43 solutions per project. Furthermore, we compare MIRROR against the existing tools JMove and QMove and show that the F1 of MIRROR is 5.63% and 3.75% higher than that of JMove and QMove, respectively, demonstrating the effectiveness of MIRROR.
Article
Full-text available
The expenses associated with software maintenance and evolution constitute a significant portion, surpassing 80%, of the overall costs involved in software development. Refactoring, a widely embraced technique, plays a crucial role in streamlining and minimizing maintenance activities and expenses. However, the effect of refactoring techniques on quality attributes presents inconsistent and conflicting findings, making it challenging for software developers to enhance software quality effectively. Additionally, the absence of a comprehensive framework further complicates the decision-making process for developers in selecting appropriate refactoring techniques aligned with specific design objectives. In light of these considerations, this research aims to introduce a novel framework for classifying refactoring techniques based on their measurable influence on internal quality attributes. Initially, an exploratory study was conducted to identify commonly employed refactoring techniques, followed by an experimental analysis involving five case studies to evaluate the effects of these techniques on internal quality attributes. Subsequently, the framework was constructed based on the outcomes of the exploratory and experimental studies, further reinforced by a multi-case analysis. Comprising three key components, namely the methodology for applying refactoring techniques, the Quality Model for Object-Oriented Design (QMOOD), and the classification scheme for refactoring techniques, this proposed framework serves as a valuable guideline for developers. By comprehending the effect of each refactoring technique on internal quality attributes, developers can make informed decisions and select suitable techniques to enhance specific aspects of their software. Consequently, this framework optimizes developers’ time and effort by minimizing the need to weigh the pros and cons of different refactoring techniques, potentially leading to a reduction in maintenance activities and associated costs.
Article
Full-text available
Unlike most other software quality attributes, testability cannot be evaluated solely based on the characteristics of the source code. The effectiveness of the test suite and the budget assigned to the test highly impact the testability of the code under test. The size of a test suite determines the test effort and cost, while the coverage measure indicates the test effectiveness. Therefore, testability can be measured based on the coverage and number of test cases provided by a test suite, considering the test budget. This paper offers a new equation to estimate testability in terms of the size and coverage of a given test suite. The equation has been used to label 23,000 classes belonging to 110 Java projects with their testability measure. The labeled classes were vectorized using 262 metrics. The labeled vectors were fed into a family of supervised machine learning algorithms, regression, to predict testability in terms of the source code metrics. Regression models predicted testability with an R² of 0.68 and a mean squared error of 0.03, which is suitable in practice. Fifteen software metrics highly affecting testability prediction were identified using a feature importance analysis technique on the learned model. The proposed models have improved mean absolute error by 38% due to utilizing new criteria, metrics, and data compared with the relevant study on predicting branch coverage as a test criterion. As an application of testability prediction, it is demonstrated that automated refactoring of 42 smelly Java classes targeted at improving the 15 influential software metrics could elevate their testability by an average of 86.87%.
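The abstract does not reproduce the proposed equation, so the sketch below only illustrates the general idea under stated assumptions: testability rises with the coverage a suite achieves and falls with the test effort consumed relative to the budget. The functional form here is invented for illustration, not the paper's equation.

```python
def estimated_testability(coverage: float, n_tests: int, budget: int) -> float:
    """Illustrative testability estimate (NOT the paper's exact equation).

    Assumed form: testability grows with the coverage a suite achieves and
    shrinks with the test effort spent, capped by the available test budget.
    """
    if not 0.0 <= coverage <= 1.0 or budget <= 0:
        raise ValueError("coverage must be in [0, 1] and budget positive")
    effort = min(n_tests, budget) / budget  # normalized test effort in [0, 1]
    # High coverage at low effort -> easy to test; low coverage at high effort -> hard.
    return coverage * (1.0 - 0.5 * effort)

# A class covered 90% by a small suite scores higher than one needing the full budget.
print(estimated_testability(0.9, n_tests=10, budget=100))   # 0.855
print(estimated_testability(0.9, n_tests=100, budget=100))  # 0.45
```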
Article
Full-text available
The high cost of the test can be dramatically reduced, provided that the coverability as an inherent feature of the code under test is predictable. This article offers a machine learning model to predict the extent to which a test could cover a class, in terms of a new metric called Coverageability. The prediction model consists of an ensemble of four regression models. The learning samples consist of feature vectors, where features are source code metrics computed for a class. The samples are labeled by the Coverageability values computed for their corresponding classes. We offer a mathematical model to evaluate test effectiveness in terms of the size and coverage of the test suite generated automatically for each class. We extend the size of the feature space by introducing a new approach to define submetrics in terms of existing source code metrics. Using feature importance analysis on the learned prediction models, we sort source code metrics in order of their impact on test effectiveness. As a result, we found class strict cyclomatic complexity to be the most influential source code metric. Our experiments with our prediction models on a large corpus of Java projects containing about 23,000 classes demonstrate a Mean Absolute Error (MAE) of 0.032, Mean Squared Error (MSE) of 0.004, and an R² score of 0.855. Compared with the state-of-the-art coverage prediction models, our models improve MAE, MSE, and the R² score by 5.78%, 2.84%, and 20.71%, respectively.
Article
Full-text available
Context: Test-driven development (TDD) is an agile software development approach that has been widely claimed to improve software quality. However, the extent to which TDD improves quality appears to be largely dependent upon the characteristics of the study in which it is evaluated (e.g., the research method, participant type, programming environment, etc.). The particularities of each study make the aggregation of results untenable. Objectives: The goal of this paper is to: increase the accuracy and generalizability of the results achieved in isolated experiments on TDD, provide joint conclusions on the performance of TDD across different industrial and academic settings, and assess the extent to which the characteristics of the experiments affect the quality-related performance of TDD. Method: We conduct a family of 12 experiments on TDD in academia and industry. We aggregate their results by means of meta-analysis. We perform exploratory analyses to identify variables impacting the quality-related performance of TDD. Results: TDD novices achieve a slightly higher code quality with iterative test-last development (i.e., ITL, the reverse approach of TDD) than with TDD. The task being developed largely determines quality. The programming environment, the order in which TDD and ITL are applied, and the learning effects from one development approach to another do not appear to affect quality. The quality-related performance of professionals using TDD drops more than that of students. We hypothesize that this may be due to their being more resistant to change and potentially less motivated than students. Conclusion: Previous studies seem to provide conflicting results on TDD performance (i.e., positive vs. negative). We hypothesize that these conflicting results may be due to different study durations, experiment participants being unfamiliar with the TDD process, or case studies comparing the performance achieved by TDD vs. the control approach (e.g., the waterfall model), each applied to develop a different system. Further experiments with TDD experts are needed to validate these hypotheses.
Article
Full-text available
The article from this special issue was previously published in Software Testing, Verification and Reliability, Volume 29, Issue 4–5, 2019. For completeness we are including the title page of the article below. The full text of the article can be read in Issue 29:4–5 on Wiley Online Library: https://onlinelibrary.wiley.com/doi/10.1002/stvr.1701
Conference Paper
Full-text available
Comprehending the degree to which software components support testing is important to accurately schedule testing activities, train developers, and plan effective refactoring actions. Software testability estimates such property by relating code characteristics to the test effort. The main studies of testability reported in the literature investigate the relation between class metrics and test effort in terms of the size and complexity of the associated test suites. They report a moderate correlation of some class metrics to test-effort metrics, but suffer from two main limitations: (i) the results hardly generalize due to the small empirical evidence (datasets with no more than eight software projects); and (ii) they mostly ignore the quality of the tests. However, considering the quality of the tests is important. Indeed, a class may have a low test effort because the associated tests are of poor quality, and not because the class is easier to test. In this paper, we propose an approach to measure testability that normalizes the test effort with respect to the test quality, which we quantify in terms of code coverage and mutation score. We present the results of a set of experiments on a dataset of 9,861 Java classes, belonging to 1,186 open source projects, with around 1.5 million lines of code overall. The results confirm that normalizing the test effort with respect to the test quality largely improves the correlation between class metrics and the test effort. Better correlations result in better prediction power, and thus better prediction of the test effort.
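A minimal sketch of the normalization idea follows, assuming a simple average of coverage and mutation score as the quality term; the paper's exact weighting is not given in the abstract, so that choice is an assumption.

```python
def normalized_test_effort(loc_of_tests: int, coverage: float, mutation_score: float) -> float:
    """Illustrative normalization in the spirit of the paper: divide raw test
    effort by test quality so that a small but weak suite does not make a
    class look easy to test. The equal weighting below is an assumption."""
    quality = (coverage + mutation_score) / 2.0
    if quality <= 0:
        return float("inf")  # untested code: effort per unit of quality is unbounded
    return loc_of_tests / quality

# Two classes with equally small test suites, but very different test quality:
print(normalized_test_effort(120, coverage=0.85, mutation_score=0.75))  # 150.0
print(normalized_test_effort(120, coverage=0.30, mutation_score=0.20))  # 480.0
```

The second class looks cheap to test by raw effort alone, but the normalization reveals that its low effort reflects poor tests rather than genuine testability.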
Article
Full-text available
Nowadays, the continuously evolving open-source community and the increasing demands of end users are forming a new software development paradigm; developers rely more on reusing components from online sources to minimize the time and cost of software development. An important challenge in this context is to evaluate the degree to which a software component is suitable for reuse, i.e., its reusability. Contemporary approaches assess reusability using static analysis metrics by relying on the help of experts, who usually set metric thresholds or provide ground truth values so that estimation models are built. However, even when expert help is available, it may still be subjective or case-specific. In this work, we refrain from expert-based solutions and employ the actual reuse rate of source code components as ground truth for building a reusability estimation model. We initially build a benchmark dataset, harnessing the power of online repositories to determine the number of reuse occurrences for each component in the dataset. Subsequently, we build a model based on static analysis metrics to assess reusability along the following properties: complexity, cohesion, coupling, inheritance, documentation, and size. The evaluation of our methodology indicates that our system can effectively assess reusability as perceived by developers.
Conference Paper
Full-text available
Background: Code refactoring aims to improve code structures via code transformations. A single transformation rarely suffices to fully remove code smells that reveal poor code structures. Most transformations are applied in batches, i.e. sets of interrelated transformations, rather than in isolation. Nevertheless, empirical knowledge on batch application, or batch refactoring, is scarce. Such scarceness helps little to improve current refactoring practices. Aims: We analyzed 57 open and closed software projects. We aimed to understand batch application from two perspectives: characteristics that typically constitute a batch (e.g., the variety of transformation types employed), and the batch effect on smells. Method: We analyzed 19 smell types and 13 transformation types. We identified 4,607 batches, each applied by the same developer on the same code element (method or class); we expected to have batches whose transformations are closely interrelated. We computed (1) the frequency with which five batch characteristics manifest, (2) the probability of each batch characteristic removing smells, and (3) the frequency with which batches introduce and remove smells. Results: Most batches are quite simple: although most batches are applied on more than one method (90%), they are usually composed of the same transformation type (72%) and only two transformations (57%). Batches applied on a single method are 2.6 times more prone to fully remove smells than batches affecting more than one method. Surprisingly, batches mostly ended up introducing (51%) or not fully removing (38%) smells. Conclusions: The batch simplicity suggests that developers have sub-explored the combinations of transformations within a batch. We summarized some batches that may fully remove smells, so that developers can incorporate them into current refactoring practices.
Article
Full-text available
Search-based unit test generation, if effective at fault detection, can lower the cost of testing. Such techniques rely on fitness functions to guide the search. Ultimately, such functions represent test goals that approximate, but do not ensure, fault detection. The need to rely on approximations leads to two questions: can fitness functions produce effective tests and, if so, which should be used to generate tests? To answer these questions, we have assessed the fault-detection capabilities of unit test suites generated to satisfy eight white-box fitness functions on 597 real faults from the Defects4J database. Our analysis has found that the strongest indicators of effectiveness are a high level of code coverage over the targeted class and high satisfaction of a criterion's obligations. Consequently, the branch coverage fitness function is the most effective. Our findings indicate that fitness functions that thoroughly explore system structure should be used as primary generation objectives, supported by secondary fitness functions that explore orthogonal, supporting scenarios. Our results also provide further evidence that future approaches to test generation should focus on attaining higher coverage of private code and better initialization and manipulation of class dependencies.
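As a toy illustration of a branch-coverage fitness function of the kind evaluated here, the sketch below scores a suite by the fraction of branches it covers; real tools such as EvoSuite additionally use branch distances to smooth the search landscape, which is omitted in this sketch.

```python
def branch_coverage_fitness(covered: set[str], all_branches: set[str]) -> float:
    """Fitness of a test suite under a branch-coverage objective: the fraction
    of branch outcomes the suite executes (to be maximized by the search)."""
    return len(covered & all_branches) / len(all_branches)

# Hypothetical branch outcomes of the class under test (true/false per decision).
ALL = {"b1_true", "b1_false", "b2_true", "b2_false"}
suite_a = {"b1_true", "b2_true"}
suite_b = {"b1_true", "b1_false", "b2_true"}
print(branch_coverage_fitness(suite_a, ALL))  # 0.5
print(branch_coverage_fitness(suite_b, ALL))  # 0.75 -> the search prefers suite_b
```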
Article
Full-text available
Software testing is crucial in continuous integration (CI). Ideally, at every commit, all the test cases should be executed and, moreover, new test cases should be generated for the new source code. This is especially true in a Continuous Test Generation (CTG) environment, where the automatic generation of test cases is integrated into the continuous integration pipeline. In this context, developers want to achieve a certain minimum level of coverage for every software build. However, executing all the test cases and, moreover, generating new ones for all the classes at every commit is not feasible. As a consequence, developers have to select which subset of classes has to be tested and/or targeted by test-case generation. We argue that knowing a priori the branch coverage that can be achieved with test-data generation tools can help developers make informed decisions about those issues. In this paper, we investigate the possibility of using source-code metrics to predict the coverage achieved by test-data generation tools. We use four different categories of source-code features and assess the prediction on a large dataset involving more than 3,000 Java classes. We compare different machine learning algorithms and conduct a fine-grained feature analysis aimed at investigating the factors that most impact prediction accuracy. Moreover, we extend our investigation to four different search budgets. Our evaluation shows that the best model achieves an average MAE of 0.15 and 0.21 on nested cross-validation over the different budgets on EvoSuite and Randoop, respectively. Finally, the discussion of the results demonstrates the relevance of coupling-related features for prediction accuracy.
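The evaluation setup described here, predicting achievable coverage from source-code metrics and scoring it with MAE under nested cross-validation, might look like the following sketch on synthetic data; the metric names, model, and target are illustrative assumptions, not the paper's dataset or models.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, cross_val_score

rng = np.random.default_rng(1)
# Hypothetical per-class source-code metrics: LOC, cyclomatic complexity, coupling.
X = rng.uniform([20, 1, 0], [2000, 40, 25], size=(400, 3))
# Synthetic target: achievable branch coverage, degrading with complexity/coupling.
y = np.clip(1.0 - 0.012 * X[:, 1] - 0.01 * X[:, 2] + rng.normal(0, 0.05, 400), 0, 1)

# Nested cross-validation: the inner grid search tunes the model, while the
# outer loop yields an unbiased MAE estimate of the tuned pipeline.
inner = GridSearchCV(GradientBoostingRegressor(random_state=0),
                     {"n_estimators": [100, 300]},
                     scoring="neg_mean_absolute_error")
outer_scores = cross_val_score(inner, X, y, cv=5, scoring="neg_mean_absolute_error")
print("nested-CV MAE:", -outer_scores.mean())
```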
Article
Full-text available
Context: Software testability is the degree to which a software system or a unit under test supports its own testing. To predict and improve software testability, a large number of techniques and metrics have been proposed by both practitioners and researchers in the last several decades. Reviewing and getting an overview of the entire state-of-the-art and state-of-the-practice in this area is often challenging for a practitioner or a new researcher. Objective: Our objective is to summarize the body of knowledge in this area and to benefit the readers (both practitioners and researchers) in preparing, measuring and improving software testability. Method: To address the above need, the authors conducted a survey in the form of a systematic literature mapping (classification) to find out what we as a community know about this topic. After compiling an initial pool of 303 papers, and applying a set of inclusion/exclusion criteria, our final pool included 208 papers (published between 1982 and 2017). Results: The area of software testability has been comprehensively studied by researchers and practitioners. Approaches for measurement of testability and improvement of testability are the most-frequently addressed in the papers. The two most often mentioned factors affecting testability are observability and controllability. Common ways to improve testability are testability transformation, improving observability, adding assertions, and improving controllability. Conclusion: This paper serves for both researchers and practitioners as an “index” to the vast body of knowledge in the area of testability. The results could help practitioners measure and improve software testability in their projects. To assess potential benefits of this review paper, we shared its draft version with two of our industrial collaborators. They stated that they found the review useful and beneficial in their testing activities. Our results can also benefit researchers in observing the trends in this area and identify the topics that require further investigation.
Article
Full-text available
This paper examines the impact of Test Driven Development on different software parameters such as software quality, cost effectiveness, speed of development, test quality, the refactoring phenomenon and its impact, overall effort required, productivity, maintainability, and time required. The study is based primarily on research conducted over the last ten years. This work provides a detailed analysis of the effects of test-driven development and its applications, and it intends to help researchers gain a quick insight into its application areas, advantages, and pitfalls with respect to the above-mentioned parameters, as evaluated in academically controlled experiments and industrial case studies.
Article
Full-text available
Testing is an integral part of the software development lifecycle, approached with varying degrees of rigor by different process models. Agile process models recommend Test Driven Development (TDD) as a key practice for reducing costs and improving code quality. The objective of this work is to perform a cost-benefit analysis of this practice. To that end, we have conducted a comparative analysis of GitHub repositories that adopt TDD to a lesser or greater extent, in order to determine how TDD affects software development productivity and software quality. We classified GitHub repositories archived in 2015 in terms of how rigorously they practiced TDD, thus creating a TDD spectrum. We then matched and compared various subsets of these repositories on this TDD spectrum with control sets of equal size. The control sets were samples from all GitHub repositories that matched certain characteristics, and that contained at least one test file. We compared how the TDD sets differed from the control sets on the following characteristics: number of test files, average commit velocity, number of bug-referencing commits, number of issues recorded, usage of continuous integration, number of pull requests, and distribution of commits per author. We found that Java TDD projects were relatively rare. In addition, there were very few significant differences in any of the metrics we used to compare TDD-like and non-TDD projects; therefore, our results do not reveal any observable benefits from using TDD.
Article
Full-text available
Nowadays, the growth in size and complexity of object-oriented software systems brings new software quality assurance challenges. Applying equal testing (quality assurance) effort to all classes of a large and complex object-oriented software system is cost prohibitive and not realistic in practice. So, predicting early the different levels of the unit testing effort required for testing classes can help managers to: (1) better identify critical classes, which will involve a relatively high testing effort, on which developers and testers have to focus to ensure software quality, (2) plan testing activities, and (3) optimally allocate resources. In this paper, we investigate empirically the ability of a Quality Assurance Indicator (Qi), a synthetic metric that we proposed in a previous work, to predict different levels of the unit testing effort of classes in object-oriented software systems. The unit testing effort of classes is addressed from the perspective of unit test case construction. We focused particularly on the effort involved in writing the code of unit test cases. To capture the involved unit testing effort of classes, we used four metrics that quantify different characteristics related to the code of corresponding unit test cases. We used Means and K-Means-based categorizations to group software classes into five categories according to the involved unit testing effort. We performed an empirical analysis using data collected from eight open-source Java software systems from different domains, for which the JUnit test cases were available. To evaluate the ability of the Qi metric to predict different levels of the unit testing effort of classes, we used three modeling techniques: univariate logistic regression, univariate linear regression, and multinomial logistic regression. The performance of the models based on the Qi metric has been compared to the performance of the models based on various well-known object-oriented source code metrics. We used different evaluation criteria to compare the prediction models. Results indicate that the models based on the Qi metric have more promising prediction potential than those based on traditional object-oriented metrics.
Conference Paper
Full-text available
A number of criteria have been proposed to judge test suite adequacy. While search-based test generation has improved greatly at criteria coverage, the produced suites are still often ineffective at detecting faults. Efficacy may be limited by the single-minded application of one criterion at a time when generating suites, in sharp contrast to human testers, who simultaneously explore multiple testing strategies. We hypothesize that automated generation can be improved by selecting and simultaneously exploring multiple criteria.
Conference Paper
Full-text available
Background: Testing is an essential activity in safety-critical software development, following high standards in terms of code coverage. Mutation testing allows assessing the effectiveness of testing and helps to further improve test cases. However, mutation testing is not widely practiced due to scalability problems when applied to real-world systems. Objective: The objective of the study is to investigate the applicability and usefulness of mutation testing for improving the quality of unit testing in the context of safety-critical software systems. Method: A case study has been conducted together with an engineering company developing safety-critical systems. Mutation analysis has been applied to the studied system under test (60,000 LOC of C code), producing 75,043 mutants, of which 27,158 survived test execution. A sample of 200 live mutants has been reviewed by the engineers, who also improved the existing unit test suite based on their findings. Findings: The reviewed sample contained 24+ equivalent mutants and 12+ duplicated mutants. It revealed a weak spot in the testing approach and provided valuable guidance to improve the existing unit test suite. Two new faults were found in the code when improving the tests. Test execution against the mutants required over 4,000 hours of computing time. The overall effort was about half a person year.
Article
Full-text available
Search-based software engineering (SBSE) solutions are still not scalable enough to handle high-dimensional objective spaces. The majority of existing work treats software engineering problems from a single or bi-objective point of view, where the main goal is to maximize or minimize one or two objectives. However, most software engineering problems are naturally complex, in which many conflicting objectives need to be optimized. Software refactoring is one of these problems, involving finding a compromise between several quality attributes to improve the quality of the system while preserving its behavior. To this end, we propose a novel representation of the refactoring problem as a many-objective one, where every quality attribute to improve is considered as an independent objective to be optimized. In our approach, based on the recent NSGA-III algorithm, the refactoring solutions are evaluated using a set of 8 distinct objectives. We evaluated this approach on one industrial project and seven open source systems. We compared our findings to: several other many-objective techniques (IBEA, MOEA/D, GrEA, and DBEA-Eps), an existing multi-objective approach, a mono-objective technique, and an existing refactoring technique not based on heuristic search. Statistical analysis of our experiments over 31 runs shows the efficiency of our approach.
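The sketch below is not NSGA-III itself, but the Pareto-dominance core that such many-objective searches optimize; the candidate solutions and the three objectives are synthetic stand-ins for the paper's eight quality attributes.

```python
import random

random.seed(42)

# Hypothetical candidate refactoring solutions scored on three quality
# objectives (all to be maximized); the values here are synthetic.
candidates = [tuple(random.uniform(0, 1) for _ in range(3)) for _ in range(50)]

def dominates(a: tuple, b: tuple) -> bool:
    """a Pareto-dominates b if it is no worse on every objective and strictly
    better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

# The non-dominated front: the trade-off set that a many-objective search
# such as NSGA-III evolves toward, instead of a single "best" solution.
front = [c for c in candidates
         if not any(dominates(other, c) for other in candidates if other != c)]
print(f"{len(front)} non-dominated solutions out of {len(candidates)}")
```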
Article
Full-text available
Software refactoring is a collection of reengineering activities that aim to improve software quality. Refactorings are commonly used in agile software processes to improve software quality after significant software development or evolution. There is a belief that refactoring improves quality factors such as understandability, flexibility, and reusability. However, there is limited empirical evidence to support such assumptions. The aim of this study is to confirm such claims using a hierarchical quality model. We study the effect of software refactoring on software quality. We provide details of our findings as heuristics that can help software developers make more informed decisions about which refactorings to perform in order to improve a particular quality factor. We validate the proposed heuristics in an empirical setting on two open-source systems. We found that the majority of refactoring heuristics do improve quality; however, some heuristics do not have a positive impact on all software quality factors. In addition, we found that the impact analysis of refactorings divides software measures into two categories: highly and lowly impacted measures. These categories help in the endeavor to identify the best measures for finding refactoring candidates. We validated our findings on two open-source systems, Eclipse and Struts. For both systems, we found consistency between the heuristics and the actual refactorings.
Article
Full-text available
Scientific software developers are increasingly employing various software engineering practices. Specifically, scientists are beginning to use Test-Driven Development (TDD). Even with this increasing use of TDD, the effect of TDD on scientific software development is not fully understood. To help scientific developers determine whether TDD is appropriate for their scientific projects, we surveyed scientific developers who use TDD to understand: (1) TDD's effectiveness, (2) the benefits and challenges of using TDD, and (3) the use of refactoring practices (an important part of the TDD process). Some key positive results include: (1) TDD helps scientific developers increase software quality, in particular functionality and reliability; and (2) TDD helps scientific developers reduce the number of problems in the early phase of projects. Conversely, some key challenges include: (1) TDD may not be effective for all types of scientific projects; and (2) writing a good test is the most difficult task in TDD, particularly in a parallel computing environment. To summarize, TDD generally has a positive effect on the quality of scientific software, but it often requires a large effort investment. The results of this survey indicate the need for additional empirical evaluation of the use of TDD for the development of scientific software to help organizations make better decisions.
Conference Paper
Full-text available
One of the key challenges for developers testing code is determining a test suite's quality, that is, its ability to find faults. The most common approach is to use code coverage as a measure for test suite quality, and diminishing returns in coverage or high absolute coverage as a stopping rule. In testing research, suite quality is often evaluated by a suite's ability to kill mutants (artificially seeded potential faults). Determining which criteria best predict mutation kills is critical to practical estimation of test suite quality. Previous work has only used small sets of programs, and usually compares multiple suites for a single program. Practitioners, however, seldom compare suites; they evaluate one suite. Using suites (both manual and automatically generated) from a large set of real-world open-source projects shows that evaluation results differ from those for suite comparison: statement (not block, branch, or path) coverage predicts mutation kills best.
Article
Full-text available
In this paper, we investigate empirically the relationship between object-oriented design metrics and the testability of classes. We address testability from the point of view of unit testing effort. We collected data from three open source Java software systems for which JUnit test cases exist. To capture the testing effort of classes, we used metrics to quantify the corresponding JUnit test cases. Classes were classified, according to the required unit testing effort, into two categories: high and low. In order to evaluate the relationship between object-oriented design metrics and the unit testing effort of classes, we used logistic regression methods. We used univariate logistic regression analysis to evaluate the individual effect of each metric on the unit testing effort of classes. Multivariate logistic regression analysis was used to explore the combined effect of the metrics. The performance of the prediction models was evaluated using Receiver Operating Characteristic analysis. The results indicate that: (1) complexity, size, cohesion and (to some extent) coupling were found to be significant predictors of the unit testing effort of classes, and (2) multivariate regression models based on object-oriented design metrics are able to accurately predict the unit testing effort of classes.
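A sketch of the analysis style described here, fitting univariate and multivariate logistic regression models on class-level design metrics and comparing them by ROC analysis, is shown below on synthetic data; the metric names (WMC, LCOM, CBO) and the coefficients generating the labels are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
# Hypothetical class-level design metrics: WMC (complexity), LCOM (cohesion), CBO (coupling).
X = rng.uniform([1, 0, 0], [60, 1, 20], size=(500, 3))
# Synthetic label: 1 = high unit-testing effort, driven mostly by complexity and coupling.
logit = 0.08 * X[:, 0] + 1.5 * X[:, 1] + 0.12 * X[:, 2] - 4.0
y = (rng.uniform(size=500) < 1 / (1 + np.exp(-logit))).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
# Univariate model on complexity alone, mirroring the paper's univariate analysis...
uni = LogisticRegression(max_iter=1000).fit(X_tr[:, :1], y_tr)
# ...and a multivariate model combining all metrics.
multi = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("univariate AUC: ", roc_auc_score(y_te, uni.predict_proba(X_te[:, :1])[:, 1]))
print("multivariate AUC:", roc_auc_score(y_te, multi.predict_proba(X_te)[:, 1]))
```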
Article
Full-text available
Background: Test-First programming is regarded as one of the software development practices that can make unit tests more rigorous, thorough, and effective in fault detection. Code coverage measures can be useful as indicators of the thoroughness of unit test suites, while mutation testing has turned out to be effective at finding faults. Objective: This paper presents an experiment in which Test-First vs. Test-Last programming practices are examined with regard to branch coverage and the mutation score indicator of unit tests. Method: Student subjects were randomly assigned to Test-First and Test-Last groups. In order to further reduce pre-existing differences among subjects, and to get a more sensitive measure of our experimental effect, multivariate analysis of covariance was performed. Results: Multivariate test results indicate that there is no statistically significant difference between Test-First and Test-Last practices on the combined dependent variables, i.e., branch coverage and mutation score indicator (F(2, 9) = 0.52, p > 0.05), even if we control for the pre-test results and the subjects' experience, and when the subjects who showed deviations from the assigned programming technique are excluded from the analysis. Conclusion: According to the preliminary results presented in this paper, the benefits of the Test-First practice in this specific context can be considered minor.
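The two dependent variables in this experiment reduce to simple ratios; a minimal sketch follows, with equivalent-mutant handling omitted and the measurement values invented for illustration.

```python
def branch_coverage(covered_branches: int, total_branches: int) -> float:
    """Fraction of branch outcomes exercised by the test suite."""
    return covered_branches / total_branches

def mutation_score_indicator(killed: int, total_mutants: int) -> float:
    """Fraction of mutants the suite detects, a proxy for fault-detection
    effectiveness; equivalent-mutant handling is omitted in this sketch."""
    return killed / total_mutants

# Hypothetical measurements for one subject's test suite:
print(f"branch coverage: {branch_coverage(42, 56):.2f}")           # 0.75
print(f"mutation score:  {mutation_score_indicator(30, 40):.2f}")  # 0.75
```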
Article
Full-text available
The building of highly cohesive classes is an important objective in object-oriented design. Class cohesion refers to the relatedness of the class members, and it indicates one important aspect of the class design quality. A meaningful class cohesion metric helps object-oriented software developers detect class design weaknesses and refactor classes accordingly. Several class cohesion metrics have been proposed in the literature. Most of these metrics are applicable based on low-level design information such as attribute references in methods. Some of these metrics capture class cohesion by counting the number of method pairs that share common attributes. A few metrics measure cohesion more precisely by considering the degree of interaction, through attribute references, between each pair of methods. However, the formulas applied by these metrics to measure the degree of interaction cause the metrics to violate important mathematical properties, thus undermining their construct validity and leading to misleading cohesion measurement. In this paper, we propose a formula that precisely measures the degree of interaction between each pair of methods, and we use it as a basis to introduce a low-level design class cohesion metric (LSCC). We verify that the proposed formula does not cause the metric to violate important mathematical properties. In addition, we provide a mechanism to use this metric as a useful indicator for refactoring weakly cohesive classes, thus showing its usefulness in improving class cohesion. Finally, we empirically validate LSCC. Using four open source software systems and eleven cohesion metrics, we investigate the relationship between LSCC, other cohesion metrics, and fault occurrences in classes. Our results show that LSCC is one of three metrics that explains more accurately the presence of faults in classes. LSCC is the only one among the three metrics to comply with important mathematical properties, and statistical analysis shows it captures a measurement dimension of its own. This suggests that LSCC is a better alternative, when taking into account both theoretical and empirical results, as a measure to guide the refactoring of classes. From a more general standpoint, the results suggest that class quality, as measured in terms of fault occurrences, can be more accurately explained by cohesion metrics that account for the degree of interaction between each pair of methods.
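The degree-of-interaction idea behind LSCC can be illustrated with a simplified sketch that averages attribute-set overlap over method pairs; this is not the exact LSCC formula, which additionally satisfies the mathematical properties discussed in the paper.

```python
from itertools import combinations

def pairwise_cohesion(method_attrs: dict[str, set[str]]) -> float:
    """Simplified cohesion in the spirit of LSCC: the average degree of
    interaction between method pairs, measured as the overlap of the
    attributes each pair references. An illustration, not LSCC itself."""
    methods = list(method_attrs)
    if len(methods) < 2:
        return 1.0
    scores = []
    for m1, m2 in combinations(methods, 2):
        a, b = method_attrs[m1], method_attrs[m2]
        union = a | b
        # Degree of interaction: shared attributes relative to all attributes used.
        scores.append(len(a & b) / len(union) if union else 0.0)
    return sum(scores) / len(scores)

# A cohesive class (methods share state) vs. a weakly cohesive one:
cohesive = {"deposit": {"balance"}, "withdraw": {"balance"}, "audit": {"balance", "log"}}
scattered = {"parse": {"buffer"}, "render": {"canvas"}, "log": {"file"}}
print(pairwise_cohesion(cohesive))   # ~0.67, high interaction
print(pairwise_cohesion(scattered))  # 0.0, refactoring candidate
```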
Article
Full-text available
Current software practice places a strong emphasis on unit testing, to the extent that the amount of test code produced on a project can exceed the amount of actual application code required. This illustrates the importance of testability as a feature of software. In this paper we investigate whether it is possible to improve a program's testability using an automated refactoring approach. We conduct a quasi-experiment in which we create a small application that scores poorly on a proven cohesion metric, LSCC. Using our automated refactoring platform, Code-Imp, this application is automatically refactored, with the LSCC metric guiding the search for better solutions. To evaluate the results, a number of industrial software engineers were asked to write test cases for the application both before and after refactoring and to compare the relative difficulty involved. The results were interesting though inconclusive, and suggest that further work is required.
Article
Long Method is amongst the most common code smells in software systems. Despite various attempts to detect the long method code smell, few automated approaches are presented to refactor this smell. Extract Method refactoring is mainly applied to eliminate the Long Method smell. However, current approaches still face serious problems such as insufficient accuracy in detecting refactoring opportunities, limitations on correction types, the need for human intervention in the refactoring process, and lack of attention to object-oriented principles, mainly single responsibility and cohesion-coupling principles. This paper aims to automatically identify and refactor the long method smells in Java codes using advanced graph analysis techniques, addressing the aforementioned difficulties. First, a graph representing project entities is created. Then, long method smells are detected, considering the methods’ dependencies and sizes. All possible refactorings are then extracted and ranked by a modularity metric, emphasizing high cohesion and low coupling classes for the detected methods. Finally, a proper name is assigned to the extracted method based on its responsibility. Subsequently, the best destination class is determined such that design modularity is maximized. Experts’ opinion is used to evaluate the proposed approach on five different Java projects. The results show the applicability of the proposed method in establishing the single responsibility principle with a 21% improvement compared to the state-of-the-art extract method refactoring approaches.
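The graph intuition described here can be sketched minimally: statements that share variables form connected components, and a component weakly tied to the rest of a long method is an Extract Method candidate. Real tools build much richer program-dependence graphs and weigh cohesion and coupling; everything below, including the statement and variable names, is a toy illustration.

```python
from collections import defaultdict

statements = {  # hypothetical statement id -> variables it reads or writes
    "s1": {"order"}, "s2": {"order", "total"}, "s3": {"total"},
    "s4": {"report"}, "s5": {"report", "fmt"},
}

# Index statements by the variables they touch, so shared variables act as edges.
var_to_stmts = defaultdict(set)
for stmt, variables in statements.items():
    for v in variables:
        var_to_stmts[v].add(stmt)

def component(start: str) -> frozenset:
    """Collect all statements transitively connected to `start` via shared variables."""
    seen, stack = set(), [start]
    while stack:
        s = stack.pop()
        if s in seen:
            continue
        seen.add(s)
        for v in statements[s]:
            stack.extend(var_to_stmts[v] - seen)
    return frozenset(seen)

# Each component is a candidate fragment for Extract Method.
fragments = {component(s) for s in statements}
print(fragments)  # two components: {s1, s2, s3} and {s4, s5}
```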
Chapter
One of the most popular machine learning algorithms is gradient boosting over decision trees. This algorithm achieves high quality out of the box, combined with comparably low training and inference time. However, modern machine learning applications require algorithms that can achieve better quality in less inference time, which leads to an exploration of gradient boosting over other forms of base learners. One such advanced base learner is the piecewise linear tree, which has linear functions as predictions in its leaves. This paper introduces an efficient histogram-based algorithm for building gradient boosting ensembles of such trees. The algorithm was compared with modern gradient boosting libraries on publicly available datasets and achieved better quality with a decrease in ensemble size and inference time. It was also proven that the algorithm is invariant to linear transformations of individual features.
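Recent LightGBM releases expose an analogous option, piecewise linear trees via `linear_tree=True` (assuming a LightGBM version that supports it; the chapter presents its own histogram-based algorithm, not this library). A small sketch on a synthetic piecewise linear target:

```python
import lightgbm as lgb
import numpy as np

rng = np.random.default_rng(3)
X = rng.uniform(-3, 3, size=(1000, 2))
# Synthetic target that is piecewise linear in the first feature.
y = np.where(X[:, 0] > 0, 2.0 * X[:, 0], -1.0) + 0.5 * X[:, 1]

# linear_tree=True fits a linear model in each leaf instead of a constant,
# matching the piecewise linear base learners the chapter describes.
model = lgb.LGBMRegressor(n_estimators=50, linear_tree=True)
model.fit(X, y)
print(model.predict(X[:5]))
```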
Article
Context: Software maintenance is expensive, so anything that can be done to reduce its cost is potentially of huge benefit. However, it is recognised that some maintenance, especially refactoring, can be automated. Given the number of possible refactorings and combinations of refactorings, a search-based approach may provide the means to optimise refactorings. Objective: This paper describes the investigation of a many-objective genetic algorithm used to automate software refactoring, implemented as a Java tool, MultiRefactor. Method: The approach and tool are evaluated using a set of open source Java programs. The tool contains four separate measures of software, looking at software quality as well as measures of code priority, refactoring coverage, and element recentness. The many-objective algorithm combines the four objectives to improve the software in a holistic manner. An experiment has been constructed to compare the many-objective approach against a mono-objective approach that only uses a single objective to measure software quality. Different permutations of the objectives are also tested and compared to see how well the different objectives can work together in a multi-objective refactoring approach. The eight approaches are tested on six different open source Java programs. Results: The many-objective approach is found to give better objective scores on average than the mono-objective approach, and in less time. However, the priority and element recentness objectives are both found to be less successful in multi/many-objective setups when they are used together. Conclusion: A many-objective approach is suitable and effective for optimising automated refactoring to improve quality. Including other objectives does not unduly degrade the quality improvements, but is less effective for those objectives than if they were used in a mono-objective approach.
Article
Test cases are crucial to help developers prevent the introduction of software faults. Unfortunately, not all tests are properly designed or can effectively capture faults in production code. Some measures have been defined to assess test-case effectiveness; the most relevant one is the mutation score, which highlights the quality of a test by generating so-called mutants, i.e., variations of the production code that make it faulty and that the test is supposed to identify. However, previous studies revealed that mutation analysis is extremely costly and hard to use in practice. The approaches proposed by researchers so far have not been able to provide practical gains in terms of mutation testing efficiency. This leaves the problem of efficiently assessing test-case effectiveness still open. In this paper, we investigate a novel, orthogonal, and lightweight methodology to assess test-case effectiveness: in particular, we study the feasibility of exploiting production- and test-code-quality indicators to estimate the mutation score of a test case. We first select a set of 67 factors and study their relation to test-case effectiveness. Then, we devise a mutation score estimation model exploiting such factors and investigate its performance as well as its most relevant features. The key results of the study reveal that our estimation model, based only on static features, achieves 86% for both F-Measure and AUC-ROC. This means that we can estimate test-case effectiveness, using source-code-quality indicators, with high accuracy and without executing the tests. As a consequence, we can provide a practical approach that goes beyond the typical limitations of current mutation testing techniques.
Conference Paper
Testing is an integral part of the software development lifecycle, approached with varying degrees of rigor by different process models. Agile process models recommend Test Driven Development (TDD) as a key practice for reducing costs and improving code quality. The objective of this work is to perform a cost-benefit analysis of this practice. Previous work by Fucci et al. [2, 3] conducted laboratory studies of developers actively practicing test-driven development and found little difference between the test-first behaviour of TDD and test-later behaviour. We therefore opted to conduct a study of TDD behaviours in the "wild" rather than in the laboratory. Thus we have conducted a comparative analysis of GitHub repositories that adopt TDD to a lesser or greater extent, in order to determine how TDD affects software development productivity and software quality. We classified GitHub repositories archived in 2015 in terms of how rigorously they practiced TDD, thus creating a TDD spectrum. We then matched and compared various subsets of these repositories on this TDD spectrum with control sets of equal size. The control sets were samples from all GitHub repositories that matched certain characteristics, and that contained at least one test file. We compared how the TDD sets differed from the control sets on the following characteristics: number of test files, average commit velocity, number of bug-referencing commits, number of issues recorded, usage of continuous integration, number of pull requests, and distribution of commits per author. We found that Java TDD projects were relatively rare. In addition, there were very few significant differences in any of the metrics we used to compare TDD-like and non-TDD projects; therefore, our results do not reveal any observable benefits from using TDD.
Conference Paper
Background: Writing unit tests is one of the primary activities in test-driven development. Yet, the existing reviews report little evidence supporting or refuting the effect of this development approach on test case quality. Lack of ability and skills of developers to produce sufficiently good test cases are also reported as limitations of applying test-driven development in industrial practice. Objective: We investigate the impact of test-driven development on the effectiveness of unit test cases compared to an incremental test-last development in an industrial context. Method: We conducted an experiment in an industrial setting with 24 professionals. Professionals followed the two development approaches to implement the tasks. We measure unit test effectiveness in terms of mutation score. We also measure branch and method coverage of test suites to compare our results with the literature. Results: In terms of mutation score, we found that the test cases written for the test-driven development task have a higher defect-detection ability than test cases written for the incremental test-last development task. Subjects wrote test cases that cover more branches in the test-driven development task than in the other task. However, test cases written for the incremental test-last development task cover more methods than those written for the other task. Conclusion: Our findings differ from previous studies conducted in academic settings. Professionals were able to perform more effective unit testing with test-driven development. Furthermore, we observe that the coverage measures preferred in academic studies reveal different aspects of a development approach. Our results need to be validated in larger industrial contexts.
Article
Context: Test-driven development (TDD) is an iterative software development practice where unit tests are defined before production code. A number of quantitative empirical investigations have been conducted about this practice. The results are contrasting and inconclusive. In addition, previous studies fail to analyze the values, beliefs, and assumptions that inform and shape TDD. Objective: We present a study designed, and conducted to understand the values, beliefs, and assumptions about TDD. Participants were novice and professional software developers. Method: We conducted an ethnographically-informed study with 14 novice software developers, i.e., graduate students in Computer Science at the University of Basilicata, and six professional software developers (with one to 10 years work experience). The participants worked on the implementation of a new feature for an existing software written in Java. We immersed ourselves in the context of our study. We collected qualitative information by means of audio recordings, contemporaneous field notes, and other kinds of artifacts. We collected quantitative data from the integrated development environment to support or refute the ethnography results. Results: The main insights of our study can be summarized as follows: (i) refactoring (one of the phases of TDD) is not performed as often as the process requires and it is considered less important than other phases, (ii) the most important phase is implementation, (iii) unit tests are almost never up-to-date, and (iv) participants first build in their mind a sort of model of the source code to be implemented and only then write test cases. The analysis of the quantitative data supported the following qualitative findings: (i), (iii), and (iv). Conclusions: Developers write quick-and-dirty production code to pass the tests, do not update their tests often, and ignore refactoring.
Article
Background: Test-driven development (TDD) is a technique that repeats short coding cycles interleaved with testing. The developer first writes a unit test for the desired functionality, then the necessary production code, and finally refactors the code. Many empirical studies neglect unique process characteristics related to TDD's iterative nature. Aim: We formulate four process characteristics: sequencing, granularity, uniformity, and refactoring effort. We investigate how these characteristics impact quality and productivity in TDD and related variations. Method: We analyzed 82 data points collected from 39 professionals, each capturing the process used while performing a specific development task. We built regression models to assess the impact of the process characteristics on quality and productivity. Quality was measured by functional correctness. Results: Quality and productivity improvements were positively associated primarily with granularity and uniformity. Sequencing, the order in which test and production code are written, had no important influence. Refactoring effort was negatively associated with both outcomes. We explain the unexpected negative correlation with quality by the possible prevalence of mixed refactoring. Conclusion: The claimed benefits of TDD may not be due to its distinctive test-first dynamic, but rather to the fact that TDD-like processes encourage fine-grained, steady steps that improve focus and flow.
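To make the analysis concrete, here is a hedged sketch of the kind of regression model described: an outcome such as functional correctness regressed on the four process characteristics. The column names, values, and model specification below are illustrative assumptions, not the paper's actual data or model.

```python
# Hedged sketch: regressing a quality outcome on the four process
# characteristics named in the abstract. All values are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical observations: one row per (developer, task) data point.
data = pd.DataFrame({
    "quality":     [0.80, 0.60, 0.90, 0.70, 0.50, 0.85],
    "granularity": [5.0, 12.0, 4.0, 8.0, 15.0, 6.0],   # e.g., minutes per cycle
    "uniformity":  [0.9, 0.5, 0.8, 0.6, 0.4, 0.85],    # cycle-length consistency
    "sequencing":  [0.7, 0.3, 0.9, 0.5, 0.2, 0.8],     # fraction of test-first cycles
    "refactoring": [0.10, 0.30, 0.15, 0.25, 0.40, 0.10],  # share of refactoring effort
})

model = smf.ols("quality ~ granularity + uniformity + sequencing + refactoring",
                data=data).fit()
print(model.params)  # coefficient signs indicate the direction of each association
```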
Conference Paper
Anti-patterns are poor design choices that hinder code evolution and understandability. Practitioners perform refactorings, which are semantics-preserving code transformations, to correct anti-patterns and improve design quality. However, manual refactoring is a time-consuming task and a heavy burden for developers, who have to struggle to complete their coding tasks and maintain the design quality of the system at the same time. For that reason, researchers and practitioners have proposed several approaches to bring automated support to developers, with solutions that range from single anti-pattern correction to multiobjective solutions. The latter approaches attempt to reduce refactoring effort or to improve semantic similarity between classes and methods in addition to removing anti-patterns. To the best of our knowledge, none of the previous approaches have considered the impact of refactoring on another important aspect of software development: the testing effort. In this paper, we propose a novel search-based multiobjective approach for removing five well-known anti-patterns while minimizing testing effort. To assess the effectiveness of the proposed approach, we implement three different multiobjective metaheuristics (NSGA-II, SPEA2, MOCell) and apply them to a benchmark of four open-source systems. Results show that MOCell is the metaheuristic that provides the best performance.
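The multiobjective formulation rests on Pareto dominance, the comparison at the core of metaheuristics such as NSGA-II, SPEA2, and MOCell. The sketch below shows that comparison for two illustrative objectives, maximizing anti-patterns removed while minimizing estimated testing effort; the candidate values are hypothetical, not taken from the paper's benchmark.

```python
# Minimal sketch of the Pareto-dominance test underlying multiobjective
# metaheuristics. Objectives here are illustrative stand-ins.

def dominates(a, b):
    """a = (antipatterns_removed, testing_effort); a dominates b if it is
    no worse on both objectives and strictly better on at least one."""
    removed_a, effort_a = a
    removed_b, effort_b = b
    no_worse = removed_a >= removed_b and effort_a <= effort_b
    strictly_better = removed_a > removed_b or effort_a < effort_b
    return no_worse and strictly_better

def pareto_front(solutions):
    """Keep only the non-dominated candidate refactoring sequences."""
    return [s for s in solutions
            if not any(dominates(o, s) for o in solutions if o is not s)]

# Hypothetical refactoring sequences: (anti-patterns removed, testing effort)
candidates = [(5, 10.0), (3, 4.0), (5, 12.0), (2, 4.0), (4, 6.0)]
print(pareto_front(candidates))  # [(5, 10.0), (3, 4.0), (4, 6.0)]
```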
Article
Research on software testing produces many innovative automated techniques, but because software testing is by necessity incomplete and approximate, any new technique faces the challenge of an empirical assessment. In the past, we have demonstrated scientific advance in automated unit test generation with the EVOSUITE tool by evaluating it on manually selected open-source projects or examples that represent a particular problem addressed by the underlying technique. However, demonstrating scientific advance is not necessarily the same as demonstrating practical value; even if EVOSUITE worked well on the software projects we selected for evaluation, it might not scale up to the complexity of real systems. Ideally, one would use large “real-world” software systems to minimize the threats to external validity when evaluating research tools. However, neither choosing such software systems nor applying research prototypes to them are trivial tasks. In this article we present the results of a large experiment in unit test generation using the EVOSUITE tool on 100 randomly chosen open-source projects, the 10 most popular open-source projects according to the SourceForge Web site, seven industrial projects, and 11 automatically generated software projects. The study confirms that EVOSUITE can achieve good levels of branch coverage (on average, 71% per class) in practice. However, the study also exemplifies how the choice of software systems for an empirical study can influence the results of the experiments, which can serve to inform researchers to make more conscious choices in the selection of software system subjects. Furthermore, our experiments demonstrate how practical limitations interfere with scientific advances: branch coverage on an unbiased sample is affected by predominant environmental dependencies. The surprisingly large effect of such practical engineering problems in unit testing will hopefully lead to a larger appreciation of work in this area, thus supporting transfer of knowledge from software testing research to practice.
Article
The coverage of a test suite is often used as a proxy for its ability to detect faults. However, previous studies that investigated the correlation between code coverage and test suite effectiveness have failed to reach a consensus about the nature and strength of the relationship between these test suite characteristics. Moreover, many of the studies were done with small or synthetic programs, making it unclear whether their results generalize to larger programs, and some of the studies did not account for the confounding influence of test suite size. In addition, most of the studies were done with adequate suites, which are rare in practice, so the results may not generalize to typical test suites. We have extended these studies by evaluating the relationship between test suite size, coverage, and effectiveness for large Java programs. Our study is the largest to date in the literature: we generated 31,000 test suites for five systems consisting of up to 724,000 lines of source code. We measured the statement coverage, decision coverage, and modified condition coverage of these suites and used mutation testing to evaluate their fault detection effectiveness. We found that there is a low to moderate correlation between coverage and effectiveness when the number of test cases in the suite is controlled for. In addition, we found that stronger forms of coverage do not provide greater insight into the effectiveness of the suite. Our results suggest that coverage, while useful for identifying under-tested parts of a program, should not be used as a quality target because it is not a good indicator of test suite effectiveness.
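A minimal sketch of the kind of analysis the study performs: correlating coverage with mutation-based effectiveness across suites of a fixed size, which controls for the suite-size confound. The coverage and effectiveness values below are invented for illustration.

```python
# Hedged sketch: rank-correlate coverage with effectiveness for same-size
# suites. Values are toy data, not the paper's measurements.
from scipy.stats import spearmanr

# Hypothetical suites, all of equal size: (statement_coverage, mutants_killed)
suites = [(0.55, 0.40), (0.60, 0.42), (0.70, 0.50),
          (0.72, 0.47), (0.80, 0.58), (0.85, 0.56)]

coverage = [c for c, _ in suites]
effectiveness = [e for _, e in suites]

rho, p = spearmanr(coverage, effectiveness)
print(f"Spearman rho={rho:.2f}, p={p:.3f}")  # positive correlation in this toy data
```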
Article
Grid search and manual search are the most widely used strategies for hyper-parameter optimization. This paper shows empirically and theoretically that randomly chosen trials are more efficient for hyper-parameter optimization than trials on a grid. Empirical evidence comes from a comparison with a large previous study that used grid search and manual search to configure neural networks and deep belief networks. Compared with neural networks configured by a pure grid search, we find that random search over the same domain is able to find models that are as good or better within a small fraction of the computation time. Granting random search the same computational budget, random search finds better models by effectively searching a larger, less promising configuration space. Compared with deep belief networks configured by a thoughtful combination of manual search and grid search, purely random search over the same 32-dimensional configuration space found statistically equal performance on four of seven data sets, and superior performance on one of seven. A Gaussian process analysis of the function from hyper-parameters to validation set performance reveals that for most data sets only a few of the hyper-parameters really matter, but that different hyper-parameters are important on different data sets. This phenomenon makes grid search a poor choice for configuring algorithms for new data sets. Our analysis casts some light on why recent "High Throughput" methods achieve surprising success--they appear to search through a large number of hyper-parameters because most hyper-parameters do not matter much. We anticipate that growing interest in large hierarchical models will place an increasing burden on techniques for hyper-parameter optimization; this work shows that random search is a natural baseline against which to judge progress in the development of adaptive (sequential) hyper-parameter optimization algorithms.
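The paper's core argument, that random search beats grid search when only a few hyper-parameters actually matter, can be illustrated with a small sketch. The objective function and search space below are toy assumptions: with a budget of 16 trials, grid search tries only four distinct learning-rate values, while random search tries sixteen.

```python
# Toy contrast between grid search and random search over a 2-D space where
# only the learning rate matters much; the objective is a hypothetical
# stand-in for validation performance.
import itertools
import random

def validation_score(lr, momentum):
    # Performance depends mostly on the learning rate (optimum near 0.07).
    return -(lr - 0.07) ** 2 - 0.001 * (momentum - 0.9) ** 2

budget = 16

# Grid search: a 4 x 4 grid yields only 4 distinct learning-rate values.
grid_lr = [0.001, 0.01, 0.1, 1.0]
grid_mom = [0.0, 0.5, 0.9, 0.99]
best_grid = max(itertools.product(grid_lr, grid_mom),
                key=lambda p: validation_score(*p))

# Random search: the same budget yields 16 distinct learning-rate values,
# sampled log-uniformly over the same range.
random.seed(0)
trials = [(10 ** random.uniform(-3, 0), random.uniform(0.0, 0.99))
          for _ in range(budget)]
best_rand = max(trials, key=lambda p: validation_score(*p))

print("grid:  ", best_grid, validation_score(*best_grid))
print("random:", best_rand, validation_score(*best_rand))
```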
Conference Paper
Code prioritization for testing promises to achieve maximum testing coverage with the least cost. This paper presents an innovative method to provide hints on which parts of the code should be tested first to achieve the best code coverage. The method makes two major contributions. First, it takes into account a "global view" of the execution of the program under test by considering the impact of calling relationships among the methods/functions of complex software. Second, it relaxes the "guaranteed" condition of traditional dominator analysis to an "at least" relationship among dominating nodes, which makes dominator calculation much simpler without losing accuracy. It then extends this modified dominator analysis to include the global impact on code coverage, i.e., the coverage of the entire software system rather than just the current function. We implemented two versions of the code prioritization method, one based on the original dominator analysis and the other on the relaxed dominator analysis with a global view. Our comparison study shows that the latter is consistently better at identifying code whose testing increases code coverage.
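A hedged sketch of the prioritization idea, under simplifying assumptions: if an "at least" relation records which code is guaranteed to execute whenever a given function executes, functions can be ranked by how much coverage they imply. The graph below is hypothetical; the paper derives such relations from relaxed dominator analysis combined with calling relationships, which this sketch does not reproduce.

```python
# Hypothetical sketch: rank functions by the coverage they imply.
# Edge u -> v means: executing u guarantees v executes at least once.

def implied_coverage(implies, start):
    """Transitive closure of the 'covering start implies covering X' relation."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(implies.get(node, []))
    return seen

implies = {
    "main":      ["init", "run"],
    "run":       ["parse", "dispatch"],
    "dispatch":  ["handler_a"],
    "parse":     [],
    "handler_a": [],
    "init":      [],
}

ranking = sorted(implies, key=lambda f: len(implied_coverage(implies, f)),
                 reverse=True)
print(ranking)  # 'main' first: exercising it implies the largest covered set
```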
Conference Paper
Background: Test-first development (TF) is regarded as a development practice that can lead to better quality of software products as well as improved developer productivity. By implementing unit tests before the corresponding production code, the tests themselves are the main driver of such improvements. The role of tests in the effectiveness of TF was studied in a controlled experiment by Erdogmus et al. (i.e., the original study). Aim: Our goal is to examine the impact of test-first development on product quality and developer productivity, specifically the role that tests play in it. Method: We replicated the original study's controlled experiment by comparing an experimental group applying TF to a control group applying a test-last approach. We then carried out a correlation study to understand whether the number of tests is a good predictor of external quality and/or productivity. Results: Mann-Whitney tests did not show any significant difference between the two groups in terms of the number of tests written (W=114.5, p=0.38), developers' productivity (W=90, p=0.82), or external quality (W=81.55, p=0.53). In addition, while a significant correlation exists between the number of tests and productivity (Spearman's ρ = 0.57, p
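For illustration, the group comparison reported above could be reproduced with a Mann-Whitney U test as sketched below; the quality scores are invented placeholders, not the experiment's data.

```python
# Sketch of the abstract's between-groups comparison with hypothetical scores.
from scipy.stats import mannwhitneyu

test_first_quality = [0.70, 0.55, 0.80, 0.65, 0.75, 0.60]  # invented values
test_last_quality  = [0.60, 0.58, 0.72, 0.66, 0.70, 0.62]

stat, p = mannwhitneyu(test_first_quality, test_last_quality,
                       alternative="two-sided")
print(f"W={stat}, p={p:.2f}")  # with these toy values, no significant difference
```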
Conference Paper
There has been a recent surge of interest in the application of Artificial Intelligence (AI) techniques to Software Engineering (SE) problems. The work is typified by recent advances in Search-Based Software Engineering, but also by long-established work in probabilistic reasoning and machine learning for Software Engineering. This paper explores some of the relationships between these strands of closely related work, arguing that they have much in common, and sets out some future challenges in the area of AI for SE.
Article
Background: Test-Driven Development (TDD) is claimed to have positive effects on external code quality and programmers' productivity. The main driver for these possible improvements is the tests enforced by the test-first nature of TDD, as previously investigated in a controlled experiment (i.e., the original study). Aim: Our goal is to examine the nature of the relationship between tests and external code quality, as well as programmers' productivity, in order to verify/refute the results of the original study. Method: We conducted a differentiated and partial replication of the original setting and the related analyses, with a focus on the role of tests. Specifically, while the original study compared test-first vs. test-last, our replication employed the test-first treatment only. The replication involved 30 students, working in pairs or as individuals, in the context of a graduate course, and resulted in 16 developed software artifacts. We performed linear regression to test the original study's hypotheses, and analyses of covariance to test the additional hypotheses imposed by the changes in the replication settings. Results: We found a significant correlation (Spearman coefficient = 0.66, p-value = 0.004) between the number of tests and productivity, and a positive regression coefficient (p-value = 0.011). We found no significant correlation (Spearman coefficient = 0.41, p-value = 0.11) between the number of tests and external code quality (regression coefficient p-value = 0.0513). In both cases we observed no statistically significant interaction caused by the subject units being individuals or pairs. Furthermore, our results are consistent with the original study despite changes in the timing constraints for finishing the task and in the enforced development processes. Conclusions: This replication study confirms the results of the original study concerning the relationship between the number of tests and external code quality as well as programmer productivity. Moreover, this replication allows us to identify additional context variables for which the original results still hold, namely the subject unit, timing constraint, and isolation of the test-first process. Based on our findings, we recommend that practitioners implement as many tests as possible in order to achieve higher baselines for quality and productivity.
Article
This paper presents an extensive review of the testability of object-oriented software and puts forth relevant information about class-level testability. Testability has been identified as a key factor in software quality, and emphasis is placed on predicting class testability early in the software development life cycle. A Metrics Based Model for Object Oriented Design Testability (MTMOOD) is proposed. The relationship from design properties to testability is weighted in accordance with each property's anticipated influence and importance. A suite of adequate object-oriented metrics useful in determining the testability of a system is proposed, which may be used to locate parts of the design that could be error-prone. Identifying these parts early could significantly improve the quality of the final product and hence decrease the testing effort. The proposed model has been further empirically validated, and contextual interpretations have been drawn using industrial software projects.
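A hedged sketch of a metrics-based testability estimate in the spirit of the proposed model: a weighted combination of class-level design metrics. The metric names and weights below are placeholders; the paper derives its own weighting from each design property's anticipated influence, which this sketch does not reproduce.

```python
# Hypothetical weighted-score sketch; metric names, values, and weights are
# illustrative assumptions, not the MTMOOD model's actual definition.

def testability_score(metrics, weights):
    """Weighted combination of class-level design metrics; in this toy
    convention, a higher score flags a class as harder to test."""
    return sum(weights[name] * value for name, value in metrics.items())

# Invented design metrics for one class (e.g., coupling, inheritance, size).
class_metrics = {"coupling": 7, "inheritance_depth": 3, "methods": 12}
weights = {"coupling": 0.5, "inheritance_depth": 0.3, "methods": 0.2}

print(testability_score(class_metrics, weights))  # flag high scorers for review
```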
Article
Even bad code can function. But if code isn't clean, it can bring a development organization to its knees. Every year, countless hours and significant resources are lost because of poorly written code. But it doesn't have to be that way. Noted software expert Robert C. Martin presents a revolutionary paradigm with Clean Code: A Handbook of Agile Software Craftsmanship. Martin, who has helped bring agile principles from a practitioner's point of view to tens of thousands of programmers, has teamed up with his colleagues from Object Mentor to distill their best agile practice of cleaning code on the fly into a book that will instill within you the values of a software craftsman and make you a better programmer, but only if you work at it. What kind of work will you be doing? You'll be reading code, lots of code. And you will be challenged to think about what's right about that code and what's wrong with it. More importantly, you will be challenged to reassess your professional values and your commitment to your craft. Clean Code is divided into three parts. The first describes the principles, patterns, and practices of writing clean code. The second part consists of several case studies of increasing complexity. Each case study is an exercise in cleaning up code, transforming a code base that has some problems into one that is sound and efficient. The third part is the payoff: a single chapter containing a list of heuristics and smells gathered while creating the case studies. The result is a knowledge base that describes the way we think when we write, read, and clean code. Readers will come away from this book understanding: how to tell the difference between good and bad code; how to write good code and how to transform bad code into good code; how to create good names, good functions, good objects, and good classes; how to format code for maximum readability; how to implement complete error handling without obscuring code logic; how to unit test and practice test-driven development; and what smells and heuristics can help identify bad code. This book is a must for any developer, software engineer, project manager, team lead, or systems analyst with an interest in producing better code.