Article

Comparison of software repositories for their usability in software process reconstruction

Abstract

Like any other business process, a software development process is composed of activities carried out by process participants in order to achieve a certain goal. In contrast to a typical business process, which is relatively deterministic and thus repeatable, software processes are much more dynamic in nature and depend on a number of circumstances. This explains why the actual software development practice in organizations differs from what these organizations prescribe in their adopted software development methods. The research reported in this paper analyzes the suitability of software repositories for supporting de facto software process reconstruction. We examine the most common utility tools used in software development and analyze the information they capture, for a number of open source and commercial projects. We then suggest a reasonable level of documentation for a software process, so that this information adequately supports project managers and developers in their work. Finally, based on our findings, we provide guidelines on how organizations should use software repositories to support process reconstruction.

... For unlinked commits and issues, techniques such as Frlink can be used [18]. Let us also note that linking commits with issues is a good development practice that has long been followed in the open source community [19] and should be enforced by companies that want to raise the quality of their development processes. ...
... It is important to note here that companies should require their developers to link commits with issues; otherwise, important information is missing. The open-source community seems to be aware of that: in MongoDB and Hibernate, for example, over 90% of all commits from 2014 are linked to at least one issue from Jira [19]. ...
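The linking ratio cited above can be estimated directly from commit messages. The sketch below is a minimal illustration, assuming that developers reference issues by Jira-style keys (an upper-case project key, a hyphen, and a number) in the commit message. The helper names and sample messages are hypothetical, and techniques such as Frlink go much further by recovering links that were never written down.

```python
import re

# Jira-style issue key: upper-case project key, hyphen, number (e.g. "HHH-1234").
# This is a heuristic: it also matches tokens such as "UTF-8" (false positives)
# and misses links recorded only in the issue tracker (false negatives).
ISSUE_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def linked_issue_keys(message: str) -> list[str]:
    """Return every Jira-style issue key mentioned in a commit message."""
    return ISSUE_KEY.findall(message)


def link_ratio(messages: list[str]) -> float:
    """Fraction of commit messages that reference at least one issue key."""
    if not messages:
        return 0.0
    linked = sum(1 for message in messages if linked_issue_keys(message))
    return linked / len(messages)


if __name__ == "__main__":
    # Hypothetical commit messages, for illustration only.
    commits = [
        "HHH-1234 Fix lazy loading of collections",
        "SERVER-987: add index build metrics",
        "Update README",
        "Refactor session handling (HHH-1250, HHH-1251)",
    ]
    print(f"{link_ratio(commits):.0%} of commits are linked")  # 75% of commits are linked
```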
Article
Software development is a complex process that requires disciplined engineering approaches. Empirical studies show that companies still don't document their development practice, or, if they do, the documentation is not up to date and does not reflect how they really develop software. The main objective of this paper is to propose an approach that can help companies document their real development practice. Compared to existing approaches, which require substantial effort on the part of project members, our approach extracts information on development practice directly from software repositories. Five companies have been studied to identify information that can be retrieved from software repositories. Based on this, an approach to reconstruct development practice has been developed. The approach has been evaluated on a real software repository shared by an additional company. The results confirm that software repository information suffices for reconstructing various aspects of the development process, i.e., disciplines, activities, roles, and artifacts.
... [19,32,39,40]
2. Software repositories, such as version control systems, source code configuration control systems, and bug tracking systems: mining software repositories reveals data about the real process executions [26,29,36,37,39].
3. ...
Chapter
Process mining is a process management technique that allows for the analysis of business processes based on event logs; its aim is to discover, monitor, and improve executed processes by extracting knowledge from event logs readily available in information systems. The popularity of agile software development methods has been increasing over the last two decades, and many software organizations develop software using agile methods. Process mining can provide complementary tools to agile organizations for process management. It can be used to discover the processes followed by agile teams, in order to establish baselines and determine process fidelity, or to obtain feedback for improving agility. Despite the potential benefit of using process mining for agile software development, there is a lack of research that systematically analyzes its usage in this context. This paper presents a systematic mapping study on the usage of process mining in agile software development approaches. The aim is to find out the usage areas of process mining in agile software development and to explore commonly used algorithms, data sources, data collection mechanisms, analysis techniques, and tools. The study shows that process mining is used in agile software development especially for process discovery from task tracking applications. We also observed that source code repositories are the main data sources for process mining, that a variety of algorithms are used to analyze the collected data, and that ProM is the most widely used process mining tool.
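Many process discovery algorithms start from the directly-follows relation of an event log: how often one activity immediately follows another within the same case. A minimal sketch, assuming a task-tracker export has already been flattened into (case id, activity, timestamp) triples; the function names and sample data are hypothetical, and tools such as ProM implement far richer discovery algorithms on top of relations like this one.

```python
from collections import Counter, defaultdict


def traces(log: list[tuple[str, str, int]]) -> dict[str, list[str]]:
    """Group events by case and order each case's activities by timestamp."""
    by_case: dict[str, list[tuple[int, str]]] = defaultdict(list)
    for case_id, activity, timestamp in log:
        by_case[case_id].append((timestamp, activity))
    # Sorting (timestamp, activity) pairs breaks timestamp ties by activity name.
    return {case: [a for _, a in sorted(events)] for case, events in by_case.items()}


def directly_follows(log: list[tuple[str, str, int]]) -> Counter:
    """Count how often activity a is immediately followed by activity b in a case."""
    dfg: Counter = Counter()
    for trace in traces(log).values():
        dfg.update(zip(trace, trace[1:]))
    return dfg


if __name__ == "__main__":
    # Hypothetical issue-tracker events: (issue key, status, timestamp).
    log = [
        ("ISSUE-1", "Open", 1), ("ISSUE-1", "In Progress", 2), ("ISSUE-1", "Resolved", 3),
        ("ISSUE-2", "Open", 1), ("ISSUE-2", "Resolved", 4),
    ]
    for (a, b), n in sorted(directly_follows(log).items()):
        print(f"{a} -> {b}: {n}")
```

The resulting counts are the edge weights of a directly-follows graph; filtering out rare edges is a common next step before visualizing it.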
... Banerjee and colleagues examined Mozilla and Eclipse repositories, finding that the maturity of a reporter reduces how often insignificant, poor-quality, and duplicate bugs are detected [7]. Jankovic and colleagues find that issues and commits can be used to reconstruct software processes; they classify issues as sequential or parallel depending on whether or not a "block" relationship link exists between them [11]. The work we present adds to these earlier efforts by identifying the prevalence and use of relationships between issues that are manually specified by developers. ...
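One reading of the sequential/parallel distinction attributed to Jankovic and colleagues can be sketched as follows, assuming the tracker's "blocks" links have been exported as (blocker, blocked) pairs. This simplification looks only at direct links; transitive chains (A blocks B, B blocks C) are not followed, and the function name and issue keys are hypothetical.

```python
from itertools import combinations


def classify_pairs(
    issues: list[str], blocks: set[tuple[str, str]]
) -> dict[tuple[str, str], str]:
    """Label each unordered pair of issues as 'sequential' or 'parallel'.

    A pair is 'sequential' if a direct "blocks" link exists in either
    direction, and 'parallel' otherwise.
    """
    labels: dict[tuple[str, str], str] = {}
    for a, b in combinations(sorted(issues), 2):
        linked = (a, b) in blocks or (b, a) in blocks
        labels[(a, b)] = "sequential" if linked else "parallel"
    return labels


if __name__ == "__main__":
    # Hypothetical issues: PROJ-1 blocks PROJ-2; PROJ-3 is independent.
    result = classify_pairs(["PROJ-1", "PROJ-2", "PROJ-3"], {("PROJ-1", "PROJ-2")})
    for pair, label in result.items():
        print(pair, label)
```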
Conference Paper
Software developers use issues as a means to describe a range of activities to be undertaken on a software system, including features to be added and defects that require fixing. When creating issues, software developers expend manual effort to specify relationships between issues, such as one issue blocking another or one issue being a sub-task of another. In particular, developers use a variety of relationships to express how work is to be broken down on a project. To better understand how software developers use work breakdown relationships between issues, we manually coded a sample of work breakdown relationships from three open source systems. We report on our findings and describe how the recognition of work breakdown relationships opens up new ways to improve software development techniques.
Conference Paper
Software development processes are often not explicitly modelled and are sometimes even chaotic. In order to keep track of the involved documents and files, engineers use software configuration management systems. Along the way, those systems collect and store information on the software development process itself. In this paper, we show how this information can be used for constructing explicit process models, which is called process mining; and we show how the Process Mining Framework ProM can help engineers in obtaining a process model and in analysing, optimising and better understanding their software processes.
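Before a configuration management log can be mined, it has to be turned into an event log with cases, activities, timestamps, and resources. A minimal sketch, assuming commits are available as records similar to `git log --name-status` output; treating each file as a case and the kind of change as the activity is an illustrative choice, not necessarily the mapping used with ProM in that paper. The `Commit` record and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Commit:
    revision: str
    timestamp: int
    author: str
    # (status, path) pairs; status letters as in `git log --name-status`,
    # where renames appear as e.g. "R100".
    changes: list[tuple[str, str]]


STATUS_TO_ACTIVITY = {"A": "add", "M": "modify", "D": "delete", "R": "rename"}


def to_event_log(commits: list[Commit]) -> list[tuple[str, str, int, str]]:
    """Flatten commits into (case, activity, timestamp, resource) events.

    Each file path is one case; events are sorted by case, then time.
    """
    events = []
    for commit in commits:
        for status, path in commit.changes:
            activity = STATUS_TO_ACTIVITY.get(status[:1], "other")
            events.append((path, activity, commit.timestamp, commit.author))
    return sorted(events, key=lambda e: (e[0], e[2]))


if __name__ == "__main__":
    history = [
        Commit("r1", 100, "alice", [("A", "docs/design.txt"), ("A", "src/main.c")]),
        Commit("r2", 200, "bob", [("M", "src/main.c")]),
    ]
    for event in to_event_log(history):
        print(event)
```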
Conference Paper
Enterprises today spend much effort on obtaining precise models of their system engineering processes in order to improve the process capability of the organization. The manual design of workflow models is complicated, time-consuming, and error-prone, and humans are rather limited in their ability to detect discrepancies between the actual process and the process model. Therefore, automatic techniques for deriving these models are becoming more and more important. In this paper, we present an idea that exploits user interaction with a version management system for the incremental automatic derivation, refinement, and analysis of process models. Though this idea is not fully worked out yet, we sketch the architecture of the solution and the algorithms for the main steps of the incremental automatic derivation of process models.
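The incremental idea can be illustrated with a model that is refined as each new version-management event arrives, instead of being rebuilt from the whole log. This is a minimal sketch under assumptions of our own (one case per document, events of a case arriving in time order, and directly-follows counts standing in for the "model"); it is not the derivation algorithm sketched in the paper, and the class name is hypothetical.

```python
from collections import Counter


class IncrementalProcessModel:
    """Directly-follows counts that are updated one event at a time."""

    def __init__(self) -> None:
        self.last_activity: dict[str, str] = {}  # case id -> most recent activity
        self.counts: Counter = Counter()         # (a, b) -> times b directly followed a

    def observe(self, case_id: str, activity: str) -> None:
        """Refine the model with one event; assumes per-case time order."""
        previous = self.last_activity.get(case_id)
        if previous is not None:
            self.counts[(previous, activity)] += 1
        self.last_activity[case_id] = activity

    def edges(self, min_count: int = 1) -> set[tuple[str, str]]:
        """Transitions observed at least `min_count` times so far."""
        return {edge for edge, n in self.counts.items() if n >= min_count}


if __name__ == "__main__":
    model = IncrementalProcessModel()
    # Hypothetical check-in stream: (document, activity).
    stream = [("spec.doc", "create"), ("spec.doc", "review"), ("code.c", "create"), ("spec.doc", "revise")]
    for doc, act in stream:
        model.observe(doc, act)
        print(sorted(model.edges()))
```

Because only the last activity per case is kept, each update costs constant time regardless of how long the history already is.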
Conference Paper
Software development processes are often viewed as a panacea for software quality: prescribe a process and a quality project will emerge. Unfortunately this has not been the case, as practitioners are prone to push back against processes that they do not perceive as helpful, often much to the dismay of stakeholders such as their managers. Yet practitioners still tend to follow some sort of software development process regardless of the prescribed one. Thus, if a team wants to recover the software development processes of a project, or if a team is trying to achieve a certification such as ISO 9000 or CMM, it will be tasked with describing its development processes. Previous research has tended to focus on modifying existing projects in order to extract process-related information. In contrast, our approach of software process recovery attempts to analyze software artifacts extracted from software repositories in order to infer the underlying software development processes visible within these repositories.
Conference Paper
In this paper we describe the application of process mining techniques to analyze a software development process. Software engineering practitioners often conduct quality audits of the development process to assure conformance with organizational standards. Although some works have explored process mining techniques for the conformance analysis of general business processes, we are not aware of any study that applies process mining to conformance checking of software development processes. From a practical perspective, this paper explores a real database with event logs generated over the past five years of execution of a software development process. The database was kindly provided by a Brazilian software house with annual revenue of more than US$ 500 million and includes more than 2,000 cases (process instances). The results show that process mining can be effectively employed as a supporting tool for the management of software development processes and for the improvement of the maturity level of software engineering organizations.
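Conformance checking asks whether the recorded executions fit the prescribed process. Established techniques (token replay, alignments) are considerably more involved. The sketch below is a deliberately simplified rule-based check, assuming the prescribed process is given as a set of allowed transitions with artificial start and end markers; all names and the example process are hypothetical.

```python
START, END = "<start>", "<end>"


def deviations(trace: list[str], allowed: set[tuple[str, str]]) -> list[tuple[str, str]]:
    """Transitions in `trace` (including start and end) that the model does not allow."""
    padded = [START, *trace, END]
    return [step for step in zip(padded, padded[1:]) if step not in allowed]


def conforming_share(traces: list[list[str]], allowed: set[tuple[str, str]]) -> float:
    """Fraction of traces without any deviation (0.0 for an empty log)."""
    if not traces:
        return 0.0
    return sum(1 for t in traces if not deviations(t, allowed)) / len(traces)


if __name__ == "__main__":
    # Hypothetical prescribed process: specify, implement, test (failed tests loop back).
    allowed = {
        (START, "Specify"), ("Specify", "Implement"), ("Implement", "Test"),
        ("Test", "Implement"), ("Test", END),
    }
    log = [["Specify", "Implement", "Test"], ["Implement", "Test", "Implement", "Test"]]
    print(conforming_share(log, allowed))
    print(deviations(log[1], allowed))
```

Reporting the offending transitions, not just a pass/fail flag, is what makes such a check useful in an audit: the second trace above skips the specification step.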
Article
Data mining is a laborious yet very useful task. It can help programmers understand the nature of the development process and reveal potential problems during the process, so that more reliable software can be created. As open source software development communities thrive, many software repositories with useful information about the development process become publicly available, enabling research on them. In this paper, the author explores an approach based on mapping the artifacts of the FreeBSD repository to process activities in order to study the control flow of the development process.
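Mapping repository artifacts to process activities can be as simple as classifying each changed path. The rules below are hypothetical placeholders; they are neither FreeBSD's actual layout nor the mapping used in the paper. Once each commit is labelled with activities, the resulting sequences can be fed to a control-flow discovery step.

```python
from fnmatch import fnmatchcase

# Hypothetical (pattern, activity) rules; the first matching rule wins.
ACTIVITY_RULES = [
    ("*/tests/*", "testing"),
    ("*.[1-9]", "documentation"),  # manual pages such as ls.1
    ("*Makefile", "build"),
    ("*.[ch]", "implementation"),
]


def activity_for(path: str) -> str:
    """Map one changed file path to a process activity."""
    for pattern, activity in ACTIVITY_RULES:
        if fnmatchcase(path, pattern):
            return activity
    return "other"


def commit_activities(paths: list[str]) -> list[str]:
    """Distinct activities touched by one commit, in first-seen order."""
    return list(dict.fromkeys(activity_for(p) for p in paths))


if __name__ == "__main__":
    commit = ["bin/ls/ls.c", "bin/ls/print.c", "bin/ls/ls.1", "bin/ls/tests/ls_test.sh"]
    print(commit_activities(commit))
```

`fnmatchcase` is used instead of `fnmatch` so that matching is case-sensitive on every platform.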
Conference Paper
Often stakeholders, such as developers, managers, or buyers, want to find out what software development processes are being followed within a software project. Their reasons include: CMM and ISO 9000 compliance, process validation, management, acquisitions, and business intelligence. Recovering the software development processes from an existing project is expensive if one must rely upon manual inspection of artifacts and interviews of developers and their managers. Researchers have suggested live observation and instrumentation of a project to allow for more measurement, but this is costly, invasive, and also requires a live running project. Instead, we propose an after-the-fact analysis: software process recovery. This approach analyzes version control systems, bug trackers and mailing list archives using a variety of supervised and unsupervised techniques from machine learning, topic analysis, natural language processing and statistics. We can combine all of these methods to recover process events that we map back to software development processes like the Unified Process. We can produce diagrams called Recovered Unified Process Views (RUPV) that are similar to the Unified Process diagram, a time-line of effort per parallel discipline occurring across time. We then validate these methods using case studies of multiple open source software systems.
Conference Paper
The development process for a given software system is a combination of an idealized, prescribed model and a messy set of ad hoc practices. To some degree, process compliance can be enforced by supporting tools that require various steps to be followed in order; however, this approach is often perceived as heavyweight and inflexible by developers, who generally prefer that tools support their desired work habits rather than limit their choices. An alternative approach to monitoring process compliance is to instrument the various tools and repositories that developers use (such as version control systems, bug-trackers, and mailing-list archives) and to build models of the de facto development process through observation, analysis, and inference. In this paper, we present a technique for recovering a project's software development processes from a variety of existing artifacts. We first apply unsupervised and supervised techniques (including word-bags, topic analysis, summary statistics, and Bayesian classifiers) to annotate software artifacts by related topics, maintenance types, and non-functional requirements. We map the analysis results onto a time-line based view of the Unified Process development model, which we call Recovered Unified Process Views. We demonstrate our approach for extracting these process views on two case studies: FreeBSD and SQLite.
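The Recovered Unified Process Views described above plot, per category, how much activity occurs over time. A minimal sketch of that aggregation, assuming commits are available as (date, message) pairs. The keyword rules are a crude, hypothetical stand-in for the word-bag and Bayesian classifiers the paper uses, and plain substring matching will mislabel some messages (for example, "port" also occurs in "report").

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical keyword rules, checked in order; a crude stand-in for a trained classifier.
MAINTENANCE_KEYWORDS = {
    "corrective": ("fix", "bug", "crash", "leak"),
    "adaptive": ("port", "upgrade", "compat"),
    "perfective": ("refactor", "cleanup", "optimi", "speed"),
}


def maintenance_type(message: str) -> str:
    """Label a commit message by the first rule whose keyword appears in it."""
    text = message.lower()
    for label, keywords in MAINTENANCE_KEYWORDS.items():
        if any(k in text for k in keywords):
            return label
    return "unclassified"


def effort_timeline(commits: list[tuple[date, str]]) -> dict[str, Counter]:
    """Commits per maintenance type per month ("YYYY-MM"): one timeline row per type."""
    timeline: dict[str, Counter] = defaultdict(Counter)
    for day, message in commits:
        timeline[maintenance_type(message)][day.strftime("%Y-%m")] += 1
    return dict(timeline)


if __name__ == "__main__":
    history = [
        (date(2009, 1, 5), "Fix crash in pager"),
        (date(2009, 1, 20), "Refactor btree code"),
        (date(2009, 2, 2), "Fix memory leak"),
    ]
    for label, months in effort_timeline(history).items():
        print(label, dict(sorted(months.items())))
```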
Conference Paper
Acquiring a general understanding of large software systems and the components from which they are built can be a time-consuming task, but such an understanding is an important prerequisite for adding features or fixing bugs. In this paper we propose a tool, TopicXP, to support developers during such software maintenance tasks by extracting and analyzing unstructured information in source code identifier names and comments using Latent Dirichlet Allocation. TopicXP enables developers to gain an overview of a software system under analysis by extracting and visualizing natural language topics, which generally correspond to concepts or features implemented in software classes. TopicXP is implemented as an open-source Eclipse plug-in that offers interactive visualization of topics along with structural dependencies between the underlying classes implementing these topics. The paper also presents the results of a preliminary user study aimed at evaluating TopicXP.
Conference Paper
Software developers' activities are in general recorded in software repositories such as version control systems, bug trackers and mail archives. While abundant information is usually present in such repositories, successful information extraction is often challenged by the necessity to simultaneously analyze different repositories and to combine the information obtained. We propose to apply process mining techniques, originally developed for business process analysis, to address this challenge. However, in order for process mining to become applicable, different software repositories should be combined, and “related” software development events should be matched: e.g., mails sent about a file, modifications of the file and bug reports that can be traced back to it. The combination and matching of events has been implemented in FRASR (Framework for Analyzing Software Repositories), augmenting the process mining framework ProM. FRASR has been successfully applied in a series of case studies addressing such aspects of the development process as roles of different developers and the way bug reports are handled.
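The matching step can be sketched as follows: events from version control, a mail archive, and a bug tracker are merged into one time-ordered trace per file whenever they concern that file. This is a minimal sketch, assuming a mail or bug report "concerns" a file if its text mentions the file's base name; FRASR's actual matching rules are richer, and the function name and data are hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def merge_by_file(commits, mails, bugs):
    """Build one time-ordered trace per file from several repositories.

    commits: (timestamp, path, author) tuples from version control.
    mails, bugs: (timestamp, text, person) tuples from an archive and a tracker.
    A mail or bug report is matched to a file when its text mentions the file's
    base name (a simplifying assumption); if two paths share a base name, only
    the last one seen is matched.
    """
    traces = defaultdict(list)
    names = {PurePosixPath(path).name: path for _, path, _ in commits}
    for timestamp, path, author in commits:
        traces[path].append((timestamp, "commit", author))
    for kind, records in (("mail", mails), ("bug report", bugs)):
        for timestamp, text, person in records:
            for name, path in names.items():
                if name in text:
                    traces[path].append((timestamp, kind, person))
    return {path: sorted(events) for path, events in traces.items()}


if __name__ == "__main__":
    commits = [(3, "src/parser.c", "alice")]
    mails = [(1, "Question about parser.c", "bob")]
    bugs = [(2, "Crash in parser.c on empty input", "carol"), (4, "Typo on web site", "dave")]
    print(merge_by_file(commits, mails, bugs))
```

Each merged trace is a case in the process-mining sense, so the output can be flattened into an event log for a tool such as ProM.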
Conference Paper
Software repositories, such as source code, email archives, and bug databases, contain unstructured and unlabeled text that is difficult to analyze with traditional techniques. We propose the use of statistical topic models to automatically discover structure in these textual repositories. This discovered structure has the potential to be used in software engineering tasks, such as bug prediction and traceability link recovery. Our research goal is to address the challenges of applying topic models to software repositories.
Conference Paper
Capstone projects are commonly carried out at the end of an undergraduate program of study in software engineering or computer science. While such projects traditionally focused solely on the software product to be developed, more recent work has stressed the importance of the development process. Currently, process quality assessment techniques are limited to reviews of intermediary artifacts and self- and peer evaluations. We advocate augmenting the assessment by mining the software repositories used by the students during development. We present the assessment methodology and illustrate it by applying it to a number of software engineering capstone projects.
Conference Paper
As development on a software project progresses, developers shift their focus between different topics and tasks many times. Managers and newcomer developers often seek ways of understanding what tasks have recently been worked on and how much effort has gone into each; for example, a manager might wonder what unexpected tasks occupied their team's attention during a period when they were supposed to have been implementing new features. Tools such as Latent Dirichlet Allocation (LDA) and Latent Semantic Indexing (LSI) can be used to extract a set of independent topics from a corpus of commit-log comments. Previous work in the area has created a single set of topics by analyzing comments from the entire lifetime of the project. In this paper, we propose windowing the topic analysis to give a more nuanced view of the system's evolution. By using a defined time window of, for example, one month, we can track which topics come and go over time, and which ones recur. We propose visualizations of this model that allow us to explore the evolving stream of development topics over time. We demonstrate that windowed topic analysis offers advantages over topic analysis applied to a project's entire lifetime, because many topics are quite local.
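To show what windowing changes, the sketch below groups commit-log comments into one-month windows and summarizes each window separately. For brevity it substitutes simple term frequencies for LDA or LSI (a faithful replication would fit a topic model per window); the stop list, function name, and sample commits are assumptions.

```python
import re
from collections import Counter, defaultdict
from datetime import date

# Hypothetical stop list; real studies use larger lists and stemming.
STOPWORDS = {"a", "an", "the", "to", "of", "in", "for", "and", "add", "fix", "update"}


def windowed_top_terms(commits: list[tuple[date, str]], k: int = 3) -> dict[str, list[str]]:
    """Top-k terms of commit-log comments per one-month window ("YYYY-MM")."""
    windows: dict[str, Counter] = defaultdict(Counter)
    for day, message in commits:
        words = re.findall(r"[a-z]+", message.lower())
        windows[day.strftime("%Y-%m")].update(w for w in words if w not in STOPWORDS)
    return {month: [term for term, _ in counts.most_common(k)]
            for month, counts in sorted(windows.items())}


if __name__ == "__main__":
    commits = [
        (date(2010, 1, 3), "Parser cleanup"),
        (date(2010, 1, 9), "parser speedup"),
        (date(2010, 2, 1), "Add network timeout"),
        (date(2010, 2, 7), "network retry"),
    ]
    print(windowed_top_terms(commits, k=1))
```

A whole-lifetime analysis of these four commits would mix "parser" and "network" into one summary; per-window results show that each theme is local to its month.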
Conference Paper
Process modeling is one of the most significant tasks software process improvement teams perform. Conventionally, modeling is performed by teams composed of domain experts and process engineers, and it takes considerable effort and time. In recent years there have been studies that use data extracted from the events that actually took place to determine process models. In this paper we present the results of a case study we conducted to determine the effectiveness of four process discovery and process mining algorithms. After applying the process discovery algorithms, we compare the results with the actual process and the process definitions. We discuss the discrepancies between the actual flow and the process definitions, as well as the strengths and weaknesses of the algorithms.
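One simple way to quantify discrepancies between a discovered model and a process definition is to compare their transitions as sets. The sketch below assumes both models have been reduced to (activity, next activity) pairs, which ignores concurrency and choice semantics; the precision/recall framing and the function name are illustrative choices, not the evaluation method of the paper.

```python
def compare_models(discovered: set[tuple[str, str]], defined: set[tuple[str, str]]) -> dict:
    """Edge-level agreement between a discovered model and a process definition."""
    common = discovered & defined
    return {
        # Share of discovered transitions that the definition also contains.
        "precision": len(common) / len(discovered) if discovered else 0.0,
        # Share of defined transitions that the discovered model also contains.
        "recall": len(common) / len(defined) if defined else 0.0,
        "undocumented": sorted(discovered - defined),  # observed, but not prescribed
        "unused": sorted(defined - discovered),        # prescribed, but not observed
    }


if __name__ == "__main__":
    # Hypothetical models.
    discovered = {("Design", "Code"), ("Code", "Test"), ("Code", "Deploy")}
    defined = {("Design", "Code"), ("Code", "Test"), ("Test", "Deploy")}
    print(compare_models(discovered, defined))
```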
Article
Many software process methods and tools presuppose the existence of a formal model of a process. Unfortunately, developing a formal model for an on-going, complex process can be difficult, costly, and error prone. This presents a practical barrier to the adoption of process technologies. The barrier would be lowered by automating the creation of formal models. We are currently exploring techniques that can use basic event data captured from an on-going process to generate a formal model of process behavior. We term this kind of data analysis process discovery. This paper describes and illustrates three methods with which we have been experimenting: algorithmic grammar inference, Markov models, and neural networks.
1 Introduction
The issues of managing and improving the process of developing and maintaining software have come to the forefront of software engineering research. In response, new methods and tools for supporting various aspects of the software process have been devised. M...
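The Markov-model idea can be sketched as estimating, from observed event sequences, the probability that one event type follows another, and keeping only the likely transitions as edges of the discovered model. This first-order version with a fixed threshold is a simplification under our own assumptions; the method described in the paper may differ (for example, in the order of the model and in how thresholds are chosen). Function names and sample traces are hypothetical.

```python
from collections import Counter, defaultdict


def transition_probabilities(traces: list[list[str]]) -> dict[str, dict[str, float]]:
    """Estimate first-order Markov transition probabilities P(next | current)."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for trace in traces:
        for current, following in zip(trace, trace[1:]):
            counts[current][following] += 1
    return {
        current: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for current, row in counts.items()
    }


def prune(probs: dict[str, dict[str, float]], threshold: float = 0.2) -> set[tuple[str, str]]:
    """Keep transitions whose probability is at least `threshold` as model edges."""
    return {(cur, nxt) for cur, row in probs.items() for nxt, p in row.items() if p >= threshold}


if __name__ == "__main__":
    # Hypothetical event sequences from two development sessions.
    sessions = [["edit", "compile", "test"], ["edit", "compile", "edit", "compile", "test"]]
    probs = transition_probabilities(sessions)
    print(probs)
    print(sorted(prune(probs, threshold=0.5)))
```

Raising the threshold trades completeness for robustness: rare transitions, which may be noise, disappear from the model first.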
Activity mining for discovering software process models
  • E Kindler
  • V Rubin
  • W Schäfer