
Foundations of Empirical Software Engineering: The Legacy of Victor R. Basili

Authors: Barry Boehm, Hans Dieter Rombach, Marvin V. Zelkowitz (Eds.)

Abstract

This book captures the main scientific contributions of Victor R. Basili, who has significantly shaped the field of empirical software engineering from its very start. He was the first to claim that software engineering needed to follow the model of other physical sciences and develop an experimental paradigm. By working on this postulate, he developed concepts that today are well known and widely used, including the Goal-Question-Metric method, the Quality-Improvement paradigm, and the Experience Factory. He is one of the few software pioneers who can aver that their research results are not just scientifically acclaimed but are also used as industry standards. On the occasion of his 65th birthday, celebrated with a symposium in his honor at the International Conference on Software Engineering in St. Louis, MO, USA in May 2005, Barry Boehm, Hans Dieter Rombach, and Marvin V. Zelkowitz, each a long-time collaborator of Victor R. Basili, selected the 20 most important research papers of their friend, and arranged these according to subject field. They then invited renowned researchers to write topical introductions. The result is this commented collection of timeless cornerstones of software engineering, hitherto available only in scattered publications.
... Therefore, conventional NFR testing practices are primarily performed manually, which is neither efficient nor effective (Júnior 2020). In fact, faulty NFRs produce additional work, which can account for 40% to 50% of the total work done in some software projects (Wagner 2006; Boehm and Basili 2005). Given this, we note that many product development organizations lack a shared understanding of NFR validation (Werner et al. 2020) in a continuous software engineering context. ...
... Low priority or low awareness of internal NFRs (e.g., scalability) can cause them to be overlooked or left untested (Aljallabi and Mansour 2015), which can result in additional cost. For example, if a system (e.g., a banking service) is not built to easily incorporate the functionality required to accommodate different time zones, costly rework may be required (Boehm and Basili 2005). ...
Article
Full-text available
Context: Non-functional requirements (NFRs), also referred to as system qualities, are essential for developing high-quality software. Notwithstanding their importance, NFR testing remains challenging, especially in terms of automation. Compared to manual verification, automated testing shows the potential to improve the efficiency and effectiveness of quality assurance, especially in the context of Continuous Integration (CI). However, studies on how companies manage automated NFR testing through CI are limited. Objective: This study examines how automated NFR testing can be enabled and supported using CI environments in software development companies. Method: We performed a multi-case study at four companies by conducting 22 semi-structured interviews with industrial practitioners. Results: Maintainability, reliability, performance, security, and scalability were found to be evaluated with automated tests in CI environments. Testing practices, quality metrics, and challenges for measuring NFRs were reported. Conclusions: This study presents an empirically derived model that shows how data produced by CI environments can be used to evaluate and monitor implemented NFR quality. Additionally, the manuscript presents explicit metrics, CI components, tools, and challenges that should be considered when performing NFR testing in practice.
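As an illustration of the kind of automated NFR check the interviewees describe, the sketch below shows a minimal performance test that could run as a CI job: it samples the latency of an operation and fails the build if the 95th percentile exceeds a budget. The workload, sample size, and 200 ms budget are illustrative assumptions, not values taken from the study.

```python
# Minimal sketch of an automated performance (NFR) check suitable for a CI job.
# The workload, sample size, and latency budget are illustrative assumptions.
import time


def measure_latencies(operation, samples=50):
    """Run `operation` repeatedly and return the observed latencies in seconds."""
    latencies = []
    for _ in range(samples):
        start = time.perf_counter()
        operation()
        latencies.append(time.perf_counter() - start)
    return latencies


def test_p95_latency_within_budget():
    # Placeholder workload; in practice this would exercise the system under test.
    def operation():
        sum(i * i for i in range(10_000))

    latencies = sorted(measure_latencies(operation))
    p95 = latencies[int(0.95 * (len(latencies) - 1))]
    # Fail the CI build if the 95th-percentile latency exceeds the 200 ms budget.
    assert p95 < 0.200, f"p95 latency {p95:.3f} s exceeds 200 ms budget"


if __name__ == "__main__":
    test_p95_latency_within_budget()
    print("performance NFR check passed")
```

In the usage pattern the paper's model describes, the latencies measured on each build would also be recorded as quality metrics so that implemented NFR quality can be monitored over time, not only gated at release.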
... This is a highly demanding scenario. Considering that it has been estimated that software contains approximately 1-10 errors per thousand lines of code [135][136][137], methods are needed to alleviate the developers' work and improve the quality of the final product. ...
Article
Full-text available
The field of low-temperature plasmas (LTPs) excels by virtue of its broad intellectual diversity, interdisciplinarity and range of applications. This great diversity also challenges researchers in communicating the outcomes of their investigations, as common practices and expectations for reporting vary widely in the many disciplines that either fall under the LTP umbrella or interact closely with LTP topics. These challenges encompass comparing measurements made in different laboratories, exchanging and sharing computer models, enabling reproducibility in experiments and computations using traceable and transparent methods and data, establishing metrics for reliability and in translating fundamental findings to practice. In this paper, we address these challenges from the perspective of LTP standards for measurements, diagnostics, computations, reporting and plasma sources. This discussion on standards, or recommended best practices, and in some cases suggestions for standards or best practices, has as the goal improving communication, reproducibility and transparency within the LTP field and fields allied with LTPs. This discussion also acknowledges that standards and best practices, either recommended or at some point enforced, are ultimately a matter of judgment. These standards and recommended practices should not limit innovation nor prevent research breakthroughs from having real-time impact. Ultimately, the goal of our research community is to advance the entire LTP field and the many applications it touches through a shared set of expectations.
... Bug reports are an essential resource for long-term maintenance of software systems. Developers share information, discuss bugs, and fix associated bugs through bug reports [1]. Bug reports are managed by using bug tracking systems such as Trac and Bugzilla. ...
Article
Full-text available
During the maintenance phase of software development, bug reports provide important information for software developers. Developers share information, discuss bugs, and fix associated bugs through bug reports; however, bug reports often include complex and long discussions, and developers have difficulty obtaining the desired information. To address this issue, researchers proposed methods for summarizing bug reports; however, to select relevant sentences, existing methods rely solely on word frequencies or other factors that are dependent on the characteristics of a bug report, failing to produce high-quality summaries or resulting in limited applicability. In this paper, we propose a deep-learning-based bug report summarization method using sentence significance factors. When conducting experiments over a public dataset using believability, sentence-to-sentence cohesion, and topic association as sentence significance factors, the results show that our method outperforms the state-of-the-art method BugSum with respect to precision, recall, and F-score and that the application scope of the proposed method is wider than that of BugSum.
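To make the selection step concrete, here is a minimal, hypothetical sketch of combining per-sentence significance factors (believability, sentence-to-sentence cohesion, topic association) into a ranking and keeping the top-scoring sentences. The factor values and weights are placeholders; the method in the paper derives sentence importance with a deep-learning model rather than a fixed weighted sum.

```python
# Hypothetical sketch: rank bug-report sentences by a weighted combination of
# significance factors (believability, sentence-to-sentence cohesion, topic
# association) and keep the top-k as the summary. Scores and weights are
# placeholders; the paper learns sentence importance with a deep model.

def summarize(sentences, factor_scores, weights=(0.4, 0.3, 0.3), k=3):
    """sentences: list of str; factor_scores: one (believability, cohesion, topic) tuple per sentence."""
    scored = []
    for sentence, factors in zip(sentences, factor_scores):
        score = sum(w * f for w, f in zip(weights, factors))
        scored.append((score, sentence))
    top = sorted(scored, reverse=True)[:k]
    # Preserve the original order of the selected sentences for readability.
    selected = {s for _, s in top}
    return [s for s in sentences if s in selected]


if __name__ == "__main__":
    sents = ["Crash occurs when the input file is empty.",
             "Thanks for the report!",
             "The null check in parse() is missing.",
             "Fixed in revision 42."]
    scores = [(0.9, 0.7, 0.8), (0.2, 0.1, 0.1), (0.8, 0.9, 0.9), (0.7, 0.6, 0.7)]
    print(summarize(sents, scores, k=2))
```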
... The software debugging process involves detecting, locating, and correcting faults in software [3]. About 20% of all software faults account for 80% of the work required to analyze, isolate, and correct them [4]. Software failures are estimated to cost the US economy USD 60 billion annually [5]. ...
Article
Full-text available
In today’s fast-paced world of rapid technological change, software development teams need to constantly revise their work practices. Not surprisingly, regular reflection on how to become more effective is perceived as one of the most important principles of Agile Software Development. Nevertheless, running an effective and enjoyable retrospective meeting is still a challenge in real environments. As reported by several studies, the Sprint Retrospective is the agile practice most likely to be implemented improperly or sacrificed when teams perform under pressure to deliver. To facilitate the implementation of the practice, some agile coaches have proposed setting up retrospective meetings in the form of retrospective games. However, there has been little research-based evidence to support the positive effects of retrospective games. Our aim is to investigate whether the adoption of retrospective games can improve retrospective meetings in general and lead to positive societal outcomes. In this paper, we report on an Action Research project in which we implemented six retrospective games in six Scrum teams that had experienced common retrospective problems. The feedback received indicates that the approach helped the teams mitigate many of the “accidental difficulties” pertaining to the Sprint Retrospective, such as lack of structure, dullness, too many complaints, or unequal participation, and made the meetings more productive to some degree. Moreover, depending on their individual preferences, different participants perceived different games as having a positive impact on their communication, motivation and involvement, and/or creativity, even though others, less numerous, held the opposite view. The advantages and disadvantages of each game, as well as eight lessons learned, are presented in the paper.
Article
Full-text available
This paper reports a design science research (DSR) study that develops, demonstrates, and evaluates a set of design principles for information systems (IS) that utilise learning analytics to support learning and teaching in higher education. The initial set of design principles is created from theory-inspired conceptualisation based on the literature, and the principles are evaluated and revised through a DSR process of demonstration and evaluation. We evaluated the developed artefact in four courses with a total enrolment of 1,173 students. The resulting design principles for learning analytics information systems (LAIS) establish a foundation for further development and implementation of learning analytics to support learning and teaching in higher education.
Chapter
Full-text available
Empirical methods like experimentation have become a powerful means to drive the field of software engineering by creating scientific evidence on software development, operation, and maintenance, but also by supporting practitioners in their decision-making and learning. Today empirical methods are fully applied in software engineering. However, they have developed in several iterations since the 1960s. In this chapter we tell the history of empirical software engineering and present the evolution of its empirical methods in five iterations: (1) mid-1960s to mid-1970s, (2) mid-1970s to mid-1980s, (3) mid-1980s to the end of the 1990s, (4) the 2000s, and (5) the 2010s. We present these five iterations mainly from a methodological perspective and additionally take into account key papers, venues, and books, which are covered in chronological order in a separate section on recommended further reading. We complement this presentation with the current situation and an outlook (Sect. 4) and an overview of the available books on empirical software engineering. Furthermore, based on the chapters covered in this book, we discuss trends in contemporary empirical methods in software engineering related to the plurality of research methods, human factors, data collection and processing, aggregation and synthesis of evidence, and the impact of software engineering research.
Article
Remarkably little is known about the cognitive processes which are employed in the solution of clinical problems. This paucity of information is probably accounted for in large part by the lack of suitable analytic tools for the study of the physician's thought processes. Here we report on the use of the computer as a laboratory for the study of clinical cognition.
Conference Paper
Research in software metrics incorporated in a framework established for software quality measurement can potentially provide significant benefits to software quality assurance programs. The research described has been conducted by General Electric Company for the Air Force Systems Command Rome Air Development Center. The problems encountered defining software quality and the approach taken to establish a framework for the measurement of software quality are described in this paper.
Article
Experience from a dozen years of analyzing software engineering processes and products is summarized as a set of software engineering and measurement principles that argue for software engineering process models that integrate sound planning and analysis into the construction process. In the TAME (Tailoring A Measurement Environment) project at the University of Maryland, such an improvement-oriented software engineering process model was developed that uses the goal/question/metric paradigm to integrate the constructive and analytic aspects of software development. The model provides a mechanism for formalizing the characterization and planning tasks, controlling and improving projects based on quantitative analysis, learning in a deeper and more systematic way about the software process and product, and feeding the appropriate experience back into the current and future projects. The TAME system is an instantiation of the TAME software engineering process model as an ISEE (integrated software engineering environment). The first in a series of TAME system prototypes has been developed. An assessment of experience with this first limited prototype is presented, including a reassessment of its initial architecture.
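For readers unfamiliar with the goal/question/metric paradigm referenced here, the sketch below shows one hypothetical instantiation of the hierarchy in which goals are refined into questions and questions into metrics. The concrete goal, questions, and metrics are illustrative and are not taken from the TAME project.

```python
# Illustrative sketch of the goal/question/metric hierarchy. The concrete goal,
# questions, and metrics below are hypothetical examples, not TAME artifacts.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Metric:
    name: str
    unit: str


@dataclass
class Question:
    text: str
    metrics: List[Metric] = field(default_factory=list)


@dataclass
class Goal:
    purpose: str          # e.g., "improve"
    issue: str            # e.g., "reliability"
    object_of_study: str  # e.g., "the release process"
    viewpoint: str        # e.g., "project manager"
    questions: List[Question] = field(default_factory=list)


goal = Goal(
    purpose="improve", issue="reliability",
    object_of_study="the release process", viewpoint="project manager",
    questions=[
        Question("How many defects escape to the field per release?",
                 [Metric("field defects per KLOC", "defects/KLOC")]),
        Question("How long does defect repair take?",
                 [Metric("mean time to repair", "hours")]),
    ],
)

for q in goal.questions:
    print(q.text, "->", [m.name for m in q.metrics])
```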
Article
A description is given of a procedure for certifying the reliability of software before its release to users. The ingredients of this procedure are a life cycle of executable product increments, representative statistical testing, and a standard estimate of the MTTF (mean time to failure) of the product at the time of its release. The authors also discuss the development of certified software products and the derivation of a statistical model used for reliability projection. Available software test data are used to demonstrate the application of the model in the certification process.
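A minimal sketch of the certification gate under strong simplifying assumptions: MTTF is estimated from inter-failure execution times observed during representative statistical testing and compared against a release target. The plain-average estimator, the target, and the sample data are illustrative and do not reproduce the authors' statistical model.

```python
# Minimal sketch of an MTTF-based release gate. The plain-average estimator,
# the target, and the sample data are illustrative simplifications.

def estimated_mttf(interfailure_times):
    """Estimate MTTF as the mean of observed inter-failure execution times."""
    return sum(interfailure_times) / len(interfailure_times)


def certify(interfailure_times, target_mttf):
    """Return True if the estimated MTTF meets the certification target."""
    return estimated_mttf(interfailure_times) >= target_mttf


if __name__ == "__main__":
    observed = [12.0, 25.0, 41.0, 70.0, 130.0]  # hours of execution between failures
    print(estimated_mttf(observed), certify(observed, target_mttf=50.0))
```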
Article
This paper recommends the "iterative enhancement" technique as a practical means of using a top-down, stepwise refinement approach to software development. This technique begins with a simple initial implementation of a properly chosen (skeletal) subproject, which is followed by the gradual enhancement of successive implementations in order to build the full implementation. The development and quantitative analysis of a production compiler for the language SIMPL-T is used to demonstrate that the application of iterative enhancement to software development is practical and efficient, encourages the generation of an easily modifiable product, and facilitates reliability.
Article
The theory permits the estimation, in advance of a project, of the amount of testing in terms of execution time required to achieve a specified reliability goal [stated as a mean time to failure (MTTF)]. Execution time can then be related to calendar time, permitting a schedule to be developed. Estimates of execution time and calendar time remaining until the reliability goal is attained can be continually remade as testing proceeds, based only on the length of the execution time intervals between failures. The current MTTF and the number of errors remaining can also be estimated. Maximum likelihood estimation is employed, and confidence intervals are also established. The foregoing information is obviously very valuable in scheduling and monitoring the progress of program testing. A program has been implemented to compute the foregoing quantities. The reliability model that has been developed can be used in making system tradeoffs involving software or software and hardware components. It also provides a soundly based unit of measure for the comparative evaluation of various programming techniques that are expected to enhance reliability. The model has been applied to four medium-sized software development projects, all of which have completed their life cycles.
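To make the estimation step concrete, the following sketch fits a Goel-Okumoto style NHPP, the form underlying Musa's basic execution-time model, to cumulative failure times by maximum likelihood and reports the expected remaining failures and the current MTTF. This is an illustrative textbook re-derivation, not the authors' implementation; the failure data are invented, and confidence intervals and the calendar-time component are omitted.

```python
import math

# Illustrative maximum-likelihood fit of a Goel-Okumoto style NHPP,
# m(t) = a * (1 - exp(-b * t)), the mean-value form underlying Musa's basic
# execution-time model. Failure times and the observation window are invented.

def fit_goel_okumoto(failure_times, T):
    """Return (a_hat, b_hat) from cumulative failure times observed on [0, T]."""
    n = len(failure_times)
    s = sum(failure_times)

    # MLE condition for b: n/b - sum(t_i) - n*T*exp(-b*T)/(1 - exp(-b*T)) = 0
    def g(b):
        e = math.exp(-b * T)
        return n / b - s - n * T * e / (1.0 - e)

    lo, hi = 1e-9, 1.0
    while g(hi) > 0:          # expand the bracket until the root is enclosed
        hi *= 2.0
    for _ in range(200):      # bisection on the monotone score function
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    b_hat = 0.5 * (lo + hi)
    a_hat = n / (1.0 - math.exp(-b_hat * T))
    return a_hat, b_hat


if __name__ == "__main__":
    times = [3, 8, 15, 25, 40, 62, 95, 140, 200, 280]  # CPU-hours at each failure
    T = 300.0                                          # total execution time observed
    a, b = fit_goel_okumoto(times, T)
    intensity_now = a * b * math.exp(-b * T)           # current failure intensity
    print(f"expected total failures a = {a:.1f}")
    print(f"expected remaining failures = {a - len(times):.1f}")
    print(f"current MTTF = {1.0 / intensity_now:.1f} CPU-hours")
```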
Article
The language in which programs are written can have a substantial effect on their reliability. This paper discusses the design of programming languages to enhance reliability. It presents several general design principles and then applies them to particular language constructs. Since the validity of such design principles cannot be logically proved, empirical evidence is needed to support or discredit them. A major experiment to measure the effect of nine specific language-design decisions in one context has been performed. Analysis of the frequency and persistence of errors shows that several decisions had a significant impact on reliability.
Article
This experiment represents a new approach to the study of the psychology of programming, and demonstrates the feasibility of studying an isolated part of the programming process in the laboratory. Thirty experienced FORTRAN programmers debugged 12 one-page FORTRAN listings, each of which was syntactically correct but contained one non-syntactic error (bug). Three classes of bugs (Array bugs, Iteration bugs, and bugs in Assignment Statements) in each of four different programs were debugged. The programmers were divided into five groups, based upon the information, or debugging “aids”, given them. Key results were that debug times were short (median = 6 min.). The aids groups did not debug faster than the control group; programmers adopted their debugging strategies based upon the information available to them. The results suggest that programmers often identify the intended state of a program before they find the bug. Assignment bugs were more difficult to find than Array and Iteration bugs, probably because the latter could be detected from a high-level understanding of the programming language itself. Debugging was at least twice as efficient the second time programmers debugged a program (though with a different bug in it). A simple hierarchical description of debugging was suggested, and some possible “principles” of debugging were identified.