ArticlePDF Available

Abstract

During software development and maintenance, developers spend a considerable amount of time on program comprehension activities. Previous studies show that program comprehension takes up as much as half of a developer's time. However, most of these studies are performed in a controlled setting, or with a small number of participants, and investigate the program comprehension activities only within the IDEs. However, developers' program comprehension activities go well beyond their IDE interactions. In this paper, we extend our ActivitySpace framework to collect and analyze Human-Computer Interaction (HCI) data across many applications (not just the IDEs). We follow Minelli et al.'s approach to assign developers' activities into four categories: navigation, editing, comprehension, and other. We then measure the comprehension time by calculating the time that developers spend on program comprehension, e.g. inspecting console and breakpoints in IDE, or reading and understanding tutorials in web browsers. Using this approach, we can perform a more realistic investigation of program comprehension activities, through a field study of program comprehension in practice across a total of seven real projects, on 78 professional developers, and amounting to 3,148 working hours. Our study leverages interaction data that is collected across many applications by the developers. Our study finds that on average developers spend ∼58% of their time on program comprehension activities, and that they frequently use web browsers and document editors to perform program comprehension activities. We also investigate the impact of programming language, developers' experience, and project phase on the time that is spent on program comprehension, and we find senior developers spend significantly less percentages of time on program comprehension than junior developers. Our study also highlights the importance of several research directions needed to reduce program comprehension time, e.g., building automatic detection and improvement of low quality code and documentation, construction of software-engineering-specific search engines, designing better IDEs that help developers navigate code and browse information more efficiently, etc.
Published in IEEE Transactions on Software Engineering, 2017 July, Volume PP, Issue 99, Pages 1-26
http://doi.org/10.1109/TSE.2017.2734091
0 20 40 60 80
Effective Working Hours
100
A B C D E F G
Java C#
Low Medium High
Maintenance
Development
Hengtian IGS

Supplementary resource (1)

... b) Reading and Programming: From documentation to code review to requirements elicitation to code summarization, many software engineering activities involve a significant reading component [66]- [68]. Experimentally, several studies report a correlation between overall programming ability and the ability to read a program and describe its function in natural language [69], [70] or posit natural language reading as a basis for code comprehension [71], [72]. ...
Preprint
Full-text available
Understanding how novices reason about coding at a neurological level has implications for training the next generation of software engineers. In recent years, medical imaging has been increasingly employed to investigate patterns of neural activity associated with coding activity. However, such studies have focused on advanced undergraduates and professionals. In a human study of 31 participants, we use functional near-infrared spectroscopy to measure the neural activity associated with introductory programming. In a controlled, contrast-based experiment, we relate brain activity when coding to that of reading natural language or mentally rotating objects (a spatial visualization task). Our primary result is that all three tasks -- coding, prose reading, and mental rotation -- are mentally distinct for novices. However, while those tasks are neurally distinct, we find more significant differences between prose and coding than between mental rotation and coding. Intriguingly, we generally find more activation in areas of the brain associated with spatial ability and task difficulty for novice coding compared to that reported in studies with more expert developers. Finally, in an exploratory analysis, we also find a neural activation pattern predictive of programming performance 11 weeks later. While preliminary, these findings both expand on previous results (e.g., relating expertise to a similarity between coding and prose reading) and also provide a new understanding of the cognitive processes underlying novice programming.
... The measurement of developers' elapsed time in program comprehension activities beyond their IDE interactions is described in [43] in a field study with professionals. Findings showed that, on average, developers spend 58% of their time on program comprehension activities, and they frequently use web browsers and document editors to perform program comprehension activities. ...
Preprint
Full-text available
Context: Profiling developers is challenging since many factors, such as their skills, experience, development environment and behaviors, may influence a detailed analysis and the delivery of coherent interpretations. Objective: We aim at profiling software developers by mining their software development process. To do so, we performed a controlled experiment where, in the realm of a Python programming contest, a group of developers had the same well-defined set of requirements specifications and a well-defined sprint schedule. Events were collected from the PyCharm IDE, and from the Mooshak automatic jury where subjects checked-in their code. Method: We used n-gram language models and text mining to characterize developers' profiles, and process mining algorithms to discover their overall workflows and extract the correspondent metrics for further evaluation. Results: Findings show that we can clearly characterize with a coherent rationale most developers, and distinguish the top performers from the ones with more challenging behaviors. This approach may lead ultimately to the creation of a catalog of software development process smells. Conclusions: The profile of a developer provides a software project manager a clue for the selection of appropriate tasks he/she should be assigned. With the increasing usage of low and no-code platforms, where coding is automatically generated from an upper abstraction layer, mining developer's actions in the development platforms is a promising approach to early detect not only behaviors but also assess project complexity and model effort.
... Software developers spend more than 50% of their time on activities related to program understanding [1], [2]. Development teams strive to make their code as understandable as possible-refactoring source code to make it more understandable is a central part of agile software methodologies [3], [4]-and base their activities on the results of static code analysis tools to identify areas of code that are still difficult to understand. 1 Preprint, before peer review. ...
Preprint
Full-text available
Static code analysis tools and integrated development environments present developers with quality-related software metrics, some of which are on how understandable their source code is. Software metrics influence overarching strategic decisions that impact the future of companies and the prioritization of everyday software development tasks. Several software metrics, however, lack in validation: we just choose to trust that they reflect what they are supposed to measure. Some of them were even shown to not measure the quality aspects they intend to measure. Yet, they influence us through biases in our cognitive-driven actions. In particular, they might anchor us in our decisions. Whether the anchoring effect exists with software metrics has not been studied yet. We conduct a randomized and double-blind experiment to investigate the extent to which a displayed metric value for source code comprehensibility anchors developers in their subjective rating of source code comprehensibility, whether performance is affected by the anchoring effect when working on comprehension tasks, and which individual characteristics might play a role in the anchoring effect. We found that the displayed value of a comprehensibility metric has a significant and large anchoring effect on a developer's code comprehensibility rating. The effect does not seem to affect the time or correctness when working on comprehension questions related to the code snippet under study. Since the anchoring effect is one of the most robust cognitive biases, and we have limited understanding of the consequences of the demonstrated manipulation of developers by non-validated metrics, we call for an increased awareness of the responsibility in code quality reporting and for corresponding tools to be based on scientific evidence.
... To make changes to the software, developers require to understand the software structure (software architecture). During software maintenance, software engineers spend a considerable amount of time on program comprehension activities [1]. Because of the complex structure and relationships, understanding the structure of a large software system and applying new changes is not an easy task. ...
Article
Full-text available
A software system evolves overtime to meet the needs of users. Understanding a program is the most important step to apply new requirements. Clustering techniques by dividing a program into small and meaningful parts make it possible to understand the program. In general, clustering algorithms are classified into two categories: hierarchical and non-hierarchical algorithms (such as search-based approaches). While clustering problems generally tend to be NP-hard, search-based algorithms produce acceptable clustering, but have time and space constraints and hence they are inefficient in large-scale software systems. Most algorithms currently used in software clustering fields do not scale well when applied to large and very large applications. In this paper, we present a new and fast clustering algorithm, named FCA, that can overcome space and time constraints of existing algorithms by performing operations on the dependency matrix and extracting other matrices based on a set of features. The experimental results on ten small-sized applications, ten folders with different functionalities from Mozilla Firefox, a large-sized application (namely ITK), and a very large-sized application (namely Chromium) demonstrate that the proposed algorithm achieves higher quality modularization compared with hierarchical algorithms. It can also compete with search-based algorithms and a clustering algorithm based on subsystem patterns. But the running time of the proposed algorithm is much shorter than that of the hierarchical and non-hierarchical algorithms. The source code of the proposed algorithm can be accessed at https://github.com/SoftwareMaintenanceLab.
... Code comments provide a clear natural language description for a piece of the source code, which can help software developers understand programs quickly and correctly [37]. Previous studies showed that during software maintenance, program comprehension takes more than half of the time [6,9,22,44]. Although proper comments are very helpful for software maintenance, they are absent or out-dated in many software projects [7]. ...
Preprint
Full-text available
Code comment generation which aims to automatically generate natural language descriptions for source code, is a crucial task in the field of automatic software development. Traditional comment generation methods use manually-crafted templates or information retrieval (IR) techniques to generate summaries for source code. In recent years, neural network-based methods which leveraged acclaimed encoder-decoder deep learning framework to learn comment generation patterns from a large-scale parallel code corpus, have achieved impressive results. However, these emerging methods only take code-related information as input. Software reuse is common in the process of software development, meaning that comments of similar code snippets are helpful for comment generation. Inspired by the IR-based and template-based approaches, in this paper, we propose a neural comment generation approach where we use the existing comments of similar code snippets as exemplars to guide comment generation. Specifically, given a piece of code, we first use an IR technique to retrieve a similar code snippet and treat its comment as an exemplar. Then we design a novel seq2seq neural network that takes the given code, its AST, its similar code, and its exemplar as input, and leverages the information from the exemplar to assist in the target comment generation based on the semantic similarity between the source code and the similar code. We evaluate our approach on a large-scale Java corpus, which contains about 2M samples, and experimental results demonstrate that our model outperforms the state-of-the-art methods by a substantial margin.
... It is difficult to identify developers' mental states during program comprehension based solely on external observations because the processes mainly take place in the brain. Behavior tracking systems capture developers' program comprehension processes from human computer interaction data [3]. However, the tracking systems get no behavioral data if developers spend time to think without any interaction with computers. ...
... Software comprehension is estimated to consume 58% of software developers' time [32]. A lot of research has been done on creating curricula for educating students in software comprehension, maintenance, and evolution. ...
Preprint
Full-text available
Context : Software comprehension and maintenance activities, such as refactoring, are said to be negatively impacted by software complexity. The methods used to measure software product and processes complexity have been thoroughly debated in the literature. However, the discernment about the possible links between these two dimensions, particularly on the benefits of using the process perspective, has a long journey ahead. Objective: To improve the understanding of the liaison of developers' activities and software complexity within a refactoring task, namely by evaluating if process metrics gathered from the IDE, using process mining methods and tools, are suitable to accurately classify different refactoring practices and the resulting software complexity. Method: We mined source code metrics from a software product after a quality improvement task was given in parallel to (117) software developers, organized in (71) teams. Simultaneously, we collected events from their IDE work sessions (320) and used process mining to model their processes and extract the correspondent metrics. Results: Most teams using a plugin for refactoring (JDeodorant) reduced software complexity more effectively and with simpler processes than the ones that performed refactoring using only Eclipse native features. We were able to find moderate correlations (43%) between software cyclomatic complexity and process cyclomatic complexity. The best models found for the refactoring method and cyclomatic complexity level predictions, had an accuracy of 92.95% and 94.36%, respectively. Conclusions: Our approach agnostic to programming languages, geographic location, or development practices. Initial findings are encouraging, and lead us to suggest practitioners may use our method in other development tasks, such as, defect analysis and unit or integration tests.
Article
Full-text available
Recent years have witnessed the increasing emphasis on human aspects in software engineering research and practices. Our survey of existing studies on human aspects in software engineering shows that screen-captured videos have been widely used to record developers’ behavior and study software engineering practices. The screen-captured videos provide direct information about which software tools the developers interact with and which content they access or generate during the task. Such Human-Computer Interaction (HCI) data can help researchers and practitioners understand and improve software engineering practices from human perspective. However, extracting time-series HCI data from screen-captured task videos requires manual transcribing and coding of videos, which is tedious and error-prone. In this paper we report a formative study to understand the challenges in manually transcribing screen-captured videos into time-series HCI data. We then present a computer-vision based video scraping technique to automatically extract time-series HCI data from screen-captured videos. We also present a case study of our scvRipper tool that implements the video scraping technique using 29-hours of task videos of 20 developers in two development tasks. The case study not only evaluates the runtime performance and robustness of the tool, but also performs a detailed quantitative analysis of the tool’s ability to extract time-series HCI data from screen-captured task videos. We also study the developer’s micro-level behavior patterns in software development from the quantitative analysis.
Article
Full-text available
Turnover is the phenomenon of continuous influx and retreat of human resources in a team. Despite being well-studied in many settings, turnover has not been characterized for open-source software projects. We study the source code repositories of five open-source projects to characterize patterns of turnover and to determine the effects of turnover on software quality. We define the base concepts of both external and internal turnover, which are the mobility of developers in and out of a project, and the mobility of developers inside a project, respectively. We provide a qualitative analysis of turnover patterns. We also found, in a quantitative analysis, that the activity of external newcomers negatively impact software quality.
Conference Paper
The utility of source code, as of other knowledge artifacts, is predicated on the existence of individuals skilled enough to derive value by using or improving it. Developers leaving a software project deprive the project of the knowledge of the decisions they have made. Previous research shows that the survivors and newcomers maintaining abandoned code have reduced productivity and are more likely to make mistakes. We focus on quantifying the extent of abandoned source files and adapt methods from financial risk analysis to assess the susceptibility of the project to developer turnover. In particular, we measure the historical loss distribution and find (1) that projects are susceptible to losses that are more than three times larger than the expected loss. Using historical simulations we find (2) that projects are susceptible to large losses that are over five times larger than the expected loss. We use Monte Carlo simulations of disaster loss scenarios and find (3) that simplistic estimates of the 'truck factor' exaggerate the potential for loss. To mitigate loss from developer turnover, we modify Cataldo et al.'s coordination requirements matrices. We find (4) that we can recommend the correct successor 34% to 48% of the time. We also find that having successors reduces the expected loss by as much as 15%. Our approach helps large projects assess the risk of turnover thereby making risk more transparent and manageable.
Article
As code search is a frequent developer activity in software development practices, improving the performance of code search is a critical task. In the text retrieval based search techniques employed in the code search, the term mismatch problem is a critical language issue for retrieval effectiveness. By reformulating the queries, query expansion provides effective ways to solve the term mismatch problem. In this paper, we propose Query Expansion based on Crowd Knowledge (QECK), a novel technique to improve the performance of code search algorithms. QECK identifies software-specific expansion words from the high quality pseudo relevance feedback question and answer pairs on Stack Overflow to automatically generate the expansion queries. Furthermore, we incorporate QECK in the classic Rocchio's model, and propose QECK based code search method QECKRocchio. We conduct three experiments to evaluate our QECK technique and investigate QECKRocchio in a large-scale corpus containing real-world code snippets and a question and answer pair collection. The results show that QECK improves the performance of three code search algorithms by up to 64 percent in Precision, and 35 percent in NDCG. Meanwhile, compared with the state-of-the-art query expansion method, the improvement of QECKRocchio is 22 percent in Precision, and 16 percent in NDCG.
Conference Paper
The number of software engineering research papers over the last few years has grown significantly. An important question here is: how relevant is software engineering research to practitioners in the field? To address this question, we conducted a survey at Microsoft where we invited 3,000 industry practitioners to rate the relevance of research ideas contained in 571 ICSE, ESEC/FSE and FSE papers that were published over a five year period. We received 17,913 ratings by 512 practitioners who labelled ideas as essential, worthwhile, unimportant, or unwise. The results from the survey suggest that practitioners are positive towards studies done by the software engineering research community: 71% of all ratings were essential or worthwhile. We found no correlation between the citation counts and the relevance scores of the papers. Through a qualitative analysis of free text responses, we identify several reasons why practitioners considered certain research ideas to be unwise. The survey approach described in this paper is lightweight: on average, a participant spent only 22.5 minutes to respond to the survey. At the same time, the results can provide useful insight to conference organizers, authors, and participating practitioners.