About
124
Publications
46,510
Reads
2,404
Citations
Introduction
Lech Madeyski currently works at the Faculty of Information and Communication Technology, Wroclaw University of Science and Technology. Lech does research in Software Engineering and Data Science including AI/ML in Software Engineering. His RG project is 'Reproducible Research, Modern Statistical Methods and Enhancing Credibility of Empirical Research (with focus on Software Engineering).'
Publications (124)
Context
Code smells are symptoms of wrong design decisions or coding shortcuts that may increase defect rate and decrease maintainability. Research on code smells is accelerating, focusing on code smell detection and using code smells as defect predictors. Recent research shows that even between software developers, agreement on what constitutes a...
Context: Recent papers have proposed the use of grey literature (GL) and multivocal reviews. These papers have raised issues about the practices used for systematic reviews (SRs) in software engineering (SE) and suggested that there should be changes to the current SR guidelines.
Objective: To investigate whether current SR guidelines need to be ch...
Context: Several tertiary studies have criticized the reporting of software engineering secondary studies.
Objective: Our objective is to identify guidelines for reporting software engineering (SE) secondary studies which would address problems observed in the reporting of software engineering systematic reviews (SRs).
Method: We review the criti...
Context
Code smells are patterns in source code associated with an increased defect rate and a higher maintenance effort than usual, but without a clear definition. Code smells are often detected using rules hard-coded in detection tools. Such rules are often set arbitrarily or derived from data sets tagged by reviewers without the necessary indust...
This experience paper describes thirteen considerations for implementing machine learning software defect prediction (ML SDP) in vivo. Specifically, we report on the most important observations and lessons learned during a large-scale research effort and the introduction of ML SDP to the system-level testing...
Continuous Build Outcome Prediction (CBOP) is a lightweight implementation of Continuous Defect Prediction (CDP). CBOP combines: 1) results of continuous integration (CI) and 2) the data mined from the version control system with 3) machine learning (ML) to form a practice that evolved from software defect prediction (SDP) where a failing build is...
The process of software code review is a well-established practice in software engineering. Previous research identified quality metrics for code review. However, to our knowledge, this paper is the first that uses those review smells and metrics as predictors in software defect prediction. We used review process metrics used in other studies as we...
Context: Code Smells—a concept not fully understood among programmers, crucial to the code quality, and yet unstandardized in the scientific literature. Objective: Goal (#1): To provide a widely accessible Catalog that can perform useful functions both for researchers as a unified data system, allowing immediate information extraction, and for prog...
Context
The ever-growing size and complexity of industrial software products pose significant quality assurance challenges to engineering researchers and practitioners, despite the constant effort to increase knowledge and improve the processes. 5G technology developed by Nokia is one example of such a grand and highly complex system with improveme...
The SZZ algorithm is one of the most important algorithms in mining software defects, as it allows data sets to be created for software defect prediction. Unfortunately, very few open source implementations of this algorithm exist. In recent years, two interesting open source implementations of the SZZ algorithm have been created, which...
Test cases are one of the cornerstones of software development, as they help assess the production code. As long as they are properly designed, they have the capacity to capture faults. To check whether tests are well made, different procedures, such as statement coverage or mutation testing, have been established to evaluate their per...
Context: Code smells in software systems are indications that usually correspond to deeper problems that can negatively influence software quality characteristics. This review is part of an R&D project aiming to improve the existing codebeat platform, which helps developers avoid code smells and deliver quality code. Objective: This study aims t...
The purpose of this paper is to analyze the security and scalability problems occurring in private permissionless blockchain systems. The consent management system (CMS) based upon Hyperledger Fabric (HLF), was implemented in the selected blockchain-as-a-service (BaaS), and therefore led to consent-as-a-service (CaaS) deployment. The experiments re...
Continuous Defect Prediction (CDP) is an assisting software development practice that combines Software Defect Prediction (SDP) with machine learning aided modelling and continuous developer feedback. Jaskier is a set of software tools developed under the supervision and with the participation of the authors of the article that implements a lightwe...
We explain the idea of Continuous Build Outcome Prediction (CBOP) practice that uses classification to label the possible build results (success or failure) based on historical data and metrics (features) derived from the software repository. Additionally, we present a preliminary empirical evaluation of CBOP in a real live software project. In a s...
Context: Although there are many tools for performing Systematic Literature Reviews (SLRs), none allows searching for articles using their full text across multiple digital libraries. Goal: This study aimed to show that searching the full text of articles is important for SLRs, and to provide a way to perform such searches in an automated and unifi...
Context: In empirical software engineering, crossover designs are popular for experiments comparing software engineering techniques that must be undertaken by human participants. However, their value depends on the correlation (r) between the outcome measures on the same participants. Software engineering theory emphasizes the importance of individ...
Sharing research data from public funding is an important topic, especially now, during times of global emergencies like the COVID-19 pandemic, when we need policies that enable rapid sharing of research data. Our aim is to discuss and review the revised Draft of the OECD Recommendation Concerning Access to Research Data from Public Funding. The Re...
There are inconsistencies between the formulas for the variance of standardized mean difference (SMD) in the Cochrane Handbook for Systematic Reviews and the variance reported in other sources. Instead of the variance appropriate for the SMD of a crossover experiment, the Cochrane Handbook uses the variance appropriate for a pre-test post-test expe...
Context: Research on code smells is accelerating, and many studies discuss them in the machine learning context. However, while data sets used by researchers vary in quality, all that we encountered share visible shortcomings: data sets are gathered from a rather small number of often outdated projects by single individuals whose profes...
Context: The Technical Debt metaphor has grown in popularity. More software is being created and has to be maintained. Agile methodologies, in particular Scrum, are widely used by development teams around the world. Estimation is an often practised step in sprint planning. The subject matter of this paper is the impact technical debt has on estimat...
Context
Previous studies have raised concerns about the analysis and meta-analysis of crossover experiments and we were aware of several families of experiments that used crossover designs and meta-analysis.
Objective
To identify families of experiments that used meta-analysis, to investigate their methods for effect size construction and aggregat...
BACKGROUND: Continuous Test-Driven Development (CTDD) is an enhancement, proposed by the authors, of the well-established Test-Driven Development (TDD) agile software development and design practice. CTDD combines TDD with continuous testing (CT), which essentially performs background testing. The idea is to eliminate the need to execute tests manually...
Background: Mutation testing is a widely explored technique used to evaluate the quality of software tests, but little attention has been given to its mathematical foundations.
Aim: We provide a formal description of the core concepts in mutation testing, relations between them and conclusions that can be drawn from the presented model.
Method:...
Background: Defining code smell is not a trivial task. Their recognition tends to be highly subjective. Nevertheless some code smells detection tools have been proposed. Other recent approaches incline towards machine learning (ML) techniques to overcome disadvantages of using automatic detection tools. Objectives: We aim to develop a research infr...
The challenge of effective refactoring in the software development cycle brought forward the need to develop automated defect prediction models. Among many existing indicators of bad code, code smells have attracted particular interest of both the research community and practitioners in recent years. In this paper, we describe the current state-of-...
Background Examples of questionable statistical practice, when published in high quality software engineering (SE) journals, may lead to novice researchers adopting incorrect statistical practices.
Objective Our goal is to highlight issues contributing to poor statistical practice in human-centric SE experiments.
Method We reviewed the statistical...
A proper estimation of time in user stories is a crucial task for both the IT team as well as for the customer, especially in agile projects. Estimating time of user story realisation provides clarity and the opportunity to control the project by the management, yet at the same time, it can increase pressure on software developers. Thus, incorrectl...
Vegas et al. IEEE Trans Softw Eng 42(2):120-135 (2016) raised concerns about the use of AB/BA crossover designs in empirical software engineering studies. This paper addresses issues related to calculating standardized effect sizes and their variances that were not addressed by Vegas et al.’s paper. In a repeated measures design such as an AB/B...
We addressed the issues related to repeated measures experimental design such as an AB/BA crossover design that have been neither discussed nor addressed in the software engineering literature.
Firstly, there are potentially two different standardized mean difference effect sizes that can be calculated, depending on whether the mean difference is s...
Software defect prediction is a promising approach aiming to increase software quality and, as a result, development pace. Unfortunately, the cost effectiveness of software defect prediction in industrial settings is not eagerly shared by the pioneering companies. In particular, this is the first attempt to investigate the cost effectiveness of usi...
Test-Driven Development (TDD) is an agile software development and design practice popularized by the eXtreme Programming methodology. Continuous Test-Driven Development (CTDD), proposed by the authors, is the recent enhancement of the TDD practice and combines TDD with the continuous testing (CT) practice that recommends background testing. Thus C...
This book reports on recent advances in software engineering research and practice. Divided into 15 chapters, it addresses: languages and tools; development processes; modelling, simulation and verification; and education.
In the first category, the book includes chapters on domain-specific languages, software complexity, testing and tools. In the...
We would like to present the idea of our Continuous Defect Prediction (CDP) research and a related dataset that we created and share. Our dataset is currently a set of more than 11 million data rows, representing files involved in Continuous Integration (CI) builds, that synthesize the results of CI builds with data we mine from software repositori...
Context: There have been many changes in statistical theory in the past 30 years, including increased evidence that non-robust methods may fail to detect important results. The statistical advice available to software engineering researchers needs to be updated to address these issues.
Objective: This paper aims both to explain the new results in t...
In this paper, we describe our experience implementing some classic software engineering metrics using Boa - a large-scale software repository mining platform - and its dedicated language. We also aim to take advantage of the Boa infrastructure to propose new software metrics and to characterize open source projects by software metrics to pro...
Background: Defect prediction in software can be highly beneficial for development projects, when prediction is highly effective and defect-prone areas are predicted correctly. One of the key elements to gain effective software defect prediction is proper selection of metrics used for dataset preparation. Objective: The purpose of this research is...
Traditional mutation testing is a powerful technique to evaluate the quality of test suites. Unfortunately, it is not yet widely used due to the problems of a large number of generated mutants, limited realism (mutants not necessarily reflect real software defects), and equivalent mutants problem. Higher order mutation (HOM) testing has been propos...
Researchers have identified problems with the validity of software engineering research findings. In particular, it is often impossible to reproduce data analyses, due to lack of raw data, or sufficient summary statistics, or undefined analysis procedures. The aim of this paper is to raise awareness of the problems caused by unreproducible research...
This article proposes a novel software engineering practice called Agile Experimentation. It targets mostly small experiments in a business-driven software engineering environment where a developer is a scarce resource and the impact of the experimentation on the return-on-investment-driven software project needs to be minimal. In such an environment the...
Software defect prediction is a promising, new approach to increase both software quality and development pace. Unfortunately, the cost effectiveness of software defect prediction in industrial settings is not eagerly shared by the pioneering companies. In particular, the cost effectiveness of using the DePress open source software measurement fra...
This book presents the proceedings of the KKIO Software Engineering Conference held in Wrocław, Poland on September 15-17, 2016. It contains the carefully reviewed and selected scientific outcome of the conference, which had the motto: “Better software = more efficient enterprise: challenges and solutions”. Following this mission, this book is a co...
A proper estimation of time in user stories is a crucial task for both the IT team as well as for the customer, especially in agile projects. Although agile practices offer a lot of flexibility and promote a culture of continuous change, there are always clearly defined timeboxed periods where an IT company has to commit to delivering working soft-...
Mutation testing, which includes first order mutation (FOM) testing and higher order mutation (HOM) testing, appeared as a powerful and effective technique to evaluate the quality of test suites. The live mutants, which cannot be killed by the given test suite, make up a significant part of generated mutants and may drive the development of new tes...
The goal of higher order mutation testing is to improve mutation testing effectiveness in particular and test effectiveness in general. There are different approaches which have been proposed in the area of second order mutation testing and higher order mutation testing with mutants order ranging from 2 to 70. Unfortunately, the empirical evidence...
First order mutation testing is used to evaluate the quality of a given set of test cases by inserting single changes into the program under test to produce first order mutants (FOMs) of the original program, and then checking whether tests are good enough to detect the artificially injected defects. However, mutation testing is not yet widely used...
Knowledge about the software metrics which serve as defect indicators is vital for the efficient allocation of resources for quality assurance. It is the process metrics, although sometimes difficult to collect, which have recently become popular with regard to defect prediction. However, in order to correctly identify the process metrics which a...
The paper presents an analysis of 83 versions of industrial, open-source and academic projects. We have empirically evaluated whether those project types constitute separate classes of projects with regard to defect prediction. Statistical tests proved that there exist significant differences between the models trained on the aforementioned project...
Case studies focused on software defect prediction in real, industrial software development projects are extremely rare. We report on a dedicated R&D project established in cooperation between Wroclaw University of Technology and one of the leading automotive software development companies to research possibilities of introduction of software defect...
Higher order mutation testing is considered a promising solution for overcoming the main limitations of first order mutation testing. Strongly subsuming higher order mutants (SSHOMs) are the most valuable among all kinds of higher order mutants (HOMs) generated by combining first order mutants (FOMs). They can be used to replace all of its constitu...
Context. Software data collection precedes analysis which, in turn, requires data science related skills. Software defect prediction is hardly used in industrial projects as a quality assurance and cost reduction means. Objectives. There are many studies and several tools which help in various data analysis tasks but there is still neither an open s...
Context. The equivalent mutant problem (EMP) is one of the crucial problems in mutation testing widely studied over decades. Objectives. The objectives are: to present a systematic literature review (SLR) in the field of EMP; to identify, classify and improve the existing, or implement new, methods which try to overcome EMP and evaluate them. Metho...
Since Mutation Testing was proposed in the 1970s, it has been considered as an effective technique of software testing process for evaluating the quality of the test data. In other words, Mutation Testing is used to evaluate the fault detection capability of the test data by inserting errors into the original program to generate mutations, and afte...
Defect Prediction in Software Systems (DePress) Extensible Framework allows building workflows in a graphical manner. DePress is based on the KNIME project. The main aim of the DePress Framework is support for empirical software analysis. It allows you to collect, combine and analyse data from various data sources like software repositories or soft...
http://madeyski.e-informatyka.pl/download/Madeyski13ENASE.pdf
Continuous testing is a technique in modern software development in which the source code is constantly unit tested in the background and there is no need for the developer to perform the tests manually. We propose an extension to this technique that combines it with well-established sof...
Process metrics appear to be an effective addition to software defect prediction models usually built upon product metrics. We present a review of research studies that investigate process metrics in defect prediction. The following process metrics are discussed: Number of Revisions, Number of Distinct Committers, Number of Modified Lines, Is New...
Background: This paper describes an analysis that was conducted on a newly collected repository with 92 versions of 38 proprietary, open-source and academic projects. A preliminary study performed earlier showed the need for a further in-depth analysis in order to identify project clusters. Aims: The goal of this research is to perform clustering on sof...
The purpose of this section is to present and interpret the findings gathered throughout the experimentation process, including meta-analysis, as well as to compare our findings to those of the previous researchers (Sect. 10.1), to derive some rules of thumb useful for practitioners involved in industrial projects (Sect. 10.2), to explain plausible...
This chapter describes experiments conducted at Wroclaw University of Technology since 2004. The description of the experiments contains the information on the context of the experiments, subjects (i.e. participants), experimental materials and tasks, hypotheses and variables and experimental designs and procedures chosen.
With the rising acceptance of XP, and agile methodologies in general, a growing number of software projects develop and maintain large test suites. Tests are considered a kind of a live documentation for the production code, because tests are always kept in sync with the code as opposed to typical text-based documentation which may not be in sync w...
Pursuing Goal 3.1, expressed in Sect. 3.1, and bearing in mind the selection of dependent variables made in Sect. 3.3.2.1, the aim of this chapter is to evaluate the impact of the TF software development practice on PATP (Percentage of Acceptance Tests Passed) which, in turn, is NATP (Number of Acceptance Tests Passed) Number of acceptance tests pa...
Empirical investigation seeks unambiguous and reliable conclusions. However, whenever the results of analysed empirical studies are different or even contradictory, arriving at a single, tenable conclusion becomes problematic. Moreover, personal commitment to the research conducted is another factor hindering any possibly unbiased summary or interpret...
In Experiment Accounting, the programming time was fixed for all the subjects, while in Experiments Submission and Smells&Library, the development time was measured by means of Eclipse plugin. One serious threat to the analysis performed in Chap. 5 is that the subjects in Experiments Submission and Smells&Library might have spent different times de...
This chapter presents the overview of the majority of empirical studies that have investigated the TF (Test-First) and PP (Pair Programming) practices versus the TL (Test-Last) and SP (Solo Programming) practices or closely related treatments. Some empirical studies were excluded due to the toy size of the delivered software products [266], or trea...
This chapter presents research goals and the high level conceptual model used to guide the research along with the independent, the dependent and possible confounding variables.
Popular code coverage measures, such as branch coverage, are indicators of the thoroughness rather than the fault detection capability of test suites. Mutation testing is a fault-based technique that measures the effectiveness of test suites for fault localisation. Unfortunately, use of mutation testing in the software industry is rare because gene...
Background: Test-First programming is regarded as one of the software development practices that can make unit tests more rigorous, thorough and effective in fault detection. Code coverage measures can be useful as indicators of the thoroughness of unit test suites, while mutation testing turned out to be effective at finding faults....
Agile methods are gaining more and more interest both in industry and in research. Many industries are transforming their way of working from traditional waterfall projects with long duration to more incremental, iterative and agile practices. At the same time, the need to evaluate and to obtain evidence for different processes, methods and tools h...
According to Bansiya [18], internal quality indicators influence external quality attributes and, therefore, evaluating a product’s internal characteristics is reasonable. As a result, some useful conclusions can be drawn about the product’s external quality attributes on the basis of its internal characteristics [18]. Relying on Briand et al. [31]...
Pair programming (PP) is regarded as one of the practices that can make testing more rigorous, thorough and effective. Therefore, we examined PP versus solo programming (SP) with respect to both thoroughness and fault detection effectiveness of test suites. Branch coverage (BC) and mutation score indicator (MSI) were used as measures of how thoroug...
The aspect-oriented programming (AOP) approach is supposed to enhance a system's features such as modularity, readability and simplicity. Owing to a better modularisation of cross-cutting concerns, the developed system implementation would be less complex, and more readable. Thus, software development efficiency would increase, so the system would...
Test-driven development (TDD) is entering the mainstream of software development. We examined the software development process for the purpose of evaluation of the TDD impact, with respect to software development productivity, in the context of a web-based system development. The design of the study is based on Goal-Question-Metric approach, and may...