Conference Paper

Two Decades of Empirical Research on Developers' Information Needs: A Preliminary Analysis

... Various studies investigated what information programmers seek in documentation (Bouraffa and Maalej 2020). Sillito et al. (2008) and Duala-Ekoko and Robillard (2012), for example, observed participants during software maintenance and implementation tasks, respectively, to identify questions asked by programmers. ...
Article
Full-text available
Documentation for software programmers is often delivered online. Yet, its format seldom leverages interactive features supported by web technologies. Researchers have proposed techniques to include interactive elements in documents. We investigated one approach, Casdoc, to understand how programmers adapt their navigation behavior to interactive documents. During the study, participants completed programming tasks that involved an unfamiliar library. They used documents that presented information in two formats: one interactive and one traditional. We analyzed recordings of the participants’ usage of the documents and their answers to a post-study questionnaire to identify strengths and limitations of the two formats. We found that the interactive format helped find answers rapidly for short factual queries, whereas the traditional format was better suited to open-ended queries that required synthesizing more information. Overall, our results show the potential of leveraging alternative formats, even within the same document, to address a broader spectrum of information needs.
... At its core, software development is a knowledge-intensive process that requires collaboration to produce software artifacts [31]. The amount of information available and used by software developers in their daily activities has been steadily increasing during the last two decades, along with the research interest in mapping developer information needs [32]. Despite the increased research interest, the information needs of various practitioners participating in the development of software are not yet fully understood. ...
Article
Full-text available
Context: Agile software companies applying the DevOps approach require collaboration and information sharing between practitioners in various roles to produce value. Adopting new development practices affects how practitioners collaborate, requiring companies to form a closer connection between business strategy and software development. However, the types of information management, sales, and development needed to plan, evaluate features, and reconcile their expectations with each other need to be clarified. Objective: To support practitioners in collaborating and realizing changes to their practices, we investigated what information is needed and how it should be represented to support different stakeholders in their tasks. Compared to earlier research, we adopted a holistic approach—by including practitioners throughout the development process—to better understand the information needs from a broader viewpoint. Method: We conducted six workshops and 12 semi-structured interviews at three Finnish small and medium-sized enterprises from different software domains. Thematic analysis was used to identify information-related issues and information and visualization needs for daily tasks. Three themes were constructed as the result of our analysis. Results: Visual information representation catalyzes stakeholder discussion, and supporting information exchange between stakeholder groups is vital for efficient collaboration in software product development. Additionally, user-centric data collection practices are needed to understand how software products are used and to support practitioners’ daily information needs. We also found that a passive way of representing information, such as a dashboard that would disturb practitioners only when attention is needed, was preferred for daily information needs. 
Conclusion: The software engineering community should consider reviewing the information needs of practitioners from a more holistic view to better understand how tooling support can benefit information exchange between stakeholder groups when making product development decisions and how those tools should be built to accommodate different stakeholder views. Keywords Software engineering, Agile software development, DevOps, Information needs, Visualization
... Historically, research shows convenience sampling to be the most predominant sampling strategy within software engineering research and that many participants come from a few companies in industry [9]. These historical sampling strategies raise concerns for generalizability and overall appropriateness of sample size. ...
Preprint
Full-text available
Much of software engineering research focuses on tools, algorithms, and optimization of software. Recently, we, as a community, have come to acknowledge that there is a gap in meta-research and addressing the human-factors in software engineering research. Through meta research, we aim to deepen our understanding of online participant recruitment and human-subjects software engineering research. In this paper we motivate the need to consider the unique challenges that human studies pose in software engineering research. We present several challenges faced by our research team in several distinct research studies, how they affected research, and motivate how, as researchers, we can address these challenges. We present results from a pilot study and categorize issues faced into three broad categories including participant recruitment, community engagement, and data poisoning. We further discuss how we can address these challenges and outline the benefits a full-study could provide to the software engineering research community.
... In practice, sampling tends to be based on the practitioner's functional role, perhaps with the additional inclusion of experience or expertise, e.g., experienced software architects, or novice programmers. Also, researchers tend to use a convenience sample [3], often with self-selecting participants, e.g., respondents aware of and willing to participate in, for example, a survey. ...
Preprint
Full-text available
Context: Software practitioners are a primary provider of information for field studies in software engineering. Research typically recruits practitioners through some kind of sampling. But sampling may not in itself recruit credible participants. Objectives: To propose and demonstrate a framework for recruiting professional practitioners as credible participants in field studies of software engineering. Method: We review existing guidelines, checklists and other advisory sources on recruiting participants for field studies. We develop a framework, partly based on our prior research and on the research of others. We search for and select three exemplar studies (a case study, an interview study and a survey study) and use those to demonstrate the application of the framework. Results: Whilst existing guidelines etc. recognise the importance of recruiting participants, there is limited guidance on how to recruit the right participants. Our demonstration of the framework with three exemplars shows that at least some members of the research community are aware of the need to carefully recruit participants. Conclusions: The framework provides a new perspective for thinking about the recruitment of credible practitioners for field studies of software engineering. In particular, the framework identifies a number of characteristics not explicitly addressed by existing guidelines.
Article
Code reviews are central for software quality assurance. Ideally, reviewers should explain their feedback to enable authors of code changes to understand the feedback and act accordingly. Different developers might need different explanations in different contexts. Therefore, assisting this process first requires understanding the types of explanations reviewers usually provide. The goal of this paper is to study the types of explanations used in code reviews and explore the potential of Large Language Models (LLMs), specifically ChatGPT, in generating these specific types. We extracted 793 code review comments from Gerrit and manually labeled them based on whether they contained a suggestion, an explanation, or both. Our analysis shows that 42% of comments only include suggestions without explanations. We categorized the explanations into seven distinct types including rule or principle, similar examples, and future implications. When measuring their prevalence, we observed that some explanations are used differently by novice and experienced reviewers. Our manual evaluation shows that, when the explanation type is specified, ChatGPT can correctly generate the explanation in 88 out of 90 cases. This foundational work highlights the potential for future automation in code reviews, which can assist developers in sharing and obtaining different types of explanations as needed, thereby reducing back-and-forth communication.
Article
Documentation enables sharing knowledge between the developers of a technology and its users. Creating quality documents, however, is challenging: Documents must satisfy the needs of a large audience without being overwhelming for individuals. We address this challenge with a new document format, named Casdoc. Casdoc documents are interactive resources centered around code examples for programmers. Explanations of the code elements are presented as annotations that the readers reveal based on their needs. We evaluated Casdoc in a field study with over 300 participants who used 126 documents as part of a software design course. During the study, the majority of participants adopted Casdoc instead of a baseline format and used interactive annotations to reveal additional information about the code example. Although participants collectively viewed the majority of the documents’ content, they individually revealed a minority of the annotations they saw. We gathered insights into five aspects of Casdoc that can be applied to other formats, and highlighted five lessons learned to improve navigability in online documents.
Article
Context Software practitioners are a primary provider of information for field studies in software engineering. Research typically recruits practitioners through some kind of sampling. But sampling may not in itself recruit the “right” participants. Objective To assess existing guidance on participant recruitment, and to propose and illustrate a framework for recruiting professional practitioners as credible participants in field studies of software engineering. Methods We review existing guidelines, checklists and other advisory sources on recruiting participants for field studies. We develop a framework, partly based on our prior research and on the research of others. We search for and select three exemplar studies (a case study, an interview study and a survey study) and use those to illustrate the framework. Results Whilst existing guidance recognises the importance of recruiting participants, there is limited guidance on how to recruit the “right” participants. The framework suggests the conceptualisation of participants as “research instruments” or, alternatively, as a sampling frame for items of interest. The exemplars suggest that at least some members of the research community are aware of the need to carefully recruit the “right” participants. Conclusions The framework is intended to encourage researchers to think differently about the involvement of practitioners in field studies of software engineering. Also, the framework identifies a number of characteristics not explicitly addressed by existing guidelines.
Article
Full-text available
What is a good workday for a software developer? What is a typical workday? We seek to answer these two questions to learn how to make good days typical. Concretely, answering these questions will help to optimize development processes and select tools that increase job satisfaction and productivity. Our work adds to a large body of research on how software developers spend their time. We report the results from 5971 responses of professional developers at Microsoft, who reflected about what made their workdays good and typical, and self-reported about how they spent their time on various activities at work. We developed conceptual frameworks to help define and characterize developer workdays from two new perspectives: good and typical. Our analysis confirms some findings in previous work, including the fact that developers actually spend little time on development and developers' aversion for meetings and interruptions. It also discovered new findings, such as that only 1.7% of survey responses mentioned emails as a reason for a bad workday, and that meetings and interruptions are only unproductive during development phases; during phases of planning, specification and release, they are common and constructive. One key finding is the importance of agency, developers' control over their workday and whether it goes as planned or is disrupted by external factors. We present actionable recommendations for researchers and managers to prioritize process and tool improvements that make good workdays typical. For instance, in light of our finding on the importance of agency, we recommend that, where possible, managers empower developers to choose their tools and tasks.
Article
Full-text available
Technical question and answer (Q&A) platforms, such as Stack Overflow, provide a platform for users to ask and answer questions about a wide variety of programming topics. These platforms accumulate a large amount of knowledge, including hundreds of thousands of lines of source code. Developers can benefit from the source code that is attached to the questions and answers on Q&A platforms by copying or learning from (parts of) it. By understanding how developers utilize source code from Q&A platforms, we can provide insights for researchers which can be used to improve next-generation Q&A platforms to help developers reuse source code fast and easily. In this paper, we first conduct an exploratory study on 289 files from 182 open-source projects, which contain source code that has an explicit reference to a Stack Overflow post. Our goal is to understand how developers utilize code from Q&A platforms and to reveal barriers that may make code reuse more difficult. In 31.5% of the studied files, developers needed to modify source code from Stack Overflow to make it work in their own projects. The degree of required modification varied from simply renaming variables to rewriting the whole algorithm. Developers sometimes chose to implement an algorithm from scratch based on the descriptions from Stack Overflow answers, even if there was an implementation readily available in the post. In 35.5% of the studied files, developers used Stack Overflow posts as an information source for later reference. To further understand the barriers of reusing code and to obtain suggestions for improving the code reuse process on Q&A platforms, we conducted a survey with 453 open-source developers who are also on Stack Overflow. We found that the top 3 barriers that make it difficult for developers to reuse code from Stack Overflow are: (1) too much code modification required to fit in their projects, (2) incomprehensive code, and (3) low code quality.
We summarized and analyzed all survey responses and we identified that developers suggest improvements for future Q&A platforms along the following dimensions: code quality, information enhancement & management, data organization, license, and the human factor. For instance, developers suggest to improve the code quality by adding an integrated validator that can test source code online, and an outdated code detection mechanism. Our findings can be used as a roadmap for researchers and developers to improve code reuse.
Conference Paper
Full-text available
The happy-productive worker thesis states that happy workers are more productive. Recent research in software engineering supports the thesis, and the ideal of flourishing happiness among software developers is often expressed among industry practitioners. However, the literature suggests that a cost-effective way to foster happiness and productivity among workers could be to limit unhappiness. Psychological disorders such as job burnout and anxiety could also be reduced by limiting the negative experiences of software developers. Simultaneously, a baseline assessment of (un)happiness and knowledge about how developers experience it are missing. In this paper, we broaden the understanding of unhappiness among software developers in terms of (1) the software developer population distribution of (un)happiness, and (2) the causes of unhappiness while developing software. We conducted a large-scale quantitative and qualitative survey, incorporating a psychometrically validated instrument for measuring (un)happiness, with 2,220 developers, yielding a rich and balanced sample of 1,318 complete responses. Our results indicate that software developers are a slightly happy population, but the need for limiting the unhappiness of developers remains. We also identified 219 factors representing causes of unhappiness while developing software. Our results, which are available as open data, can act as guidelines for practitioners in management positions and developers in general for fostering happiness on the job. We suggest future research in software engineering should consider happiness in studies of human aspects and even in seemingly unrelated technical areas.
Conference Paper
Full-text available
Security tools can help developers answer questions about potential vulnerabilities in their code. A better understanding of the types of questions asked by developers may help toolsmiths design more effective tools. In this paper, we describe how we collected and categorized these questions by conducting an exploratory study with novice and experienced software developers. We equipped them with Find Security Bugs, a security-oriented static analysis tool, and observed their interactions with security vulnerabilities in an open-source system that they had previously contributed to. We found that they asked questions not only about security vulnerabilities, associated attacks, and fixes, but also questions about the software itself, the social ecosystem that built the software, and related resources and tools. For example, when participants asked questions about the source of tainted data, their tools forced them to make imperfect tradeoffs between systematic and ad hoc program navigation strategies.
Article
Full-text available
The popularity of mobile devices has been steadily growing in recent years. These devices heavily depend on software from the underlying operating systems to the applications they run. Prior research showed that mobile software is different than traditional, large software systems. However, to date most of our research has been conducted on traditional software systems. Very little work has focused on the issues that mobile developers face. Therefore, in this paper, we use data from the popular online Q&A site, Stack Overflow, and analyze 13,232,821 posts to examine what mobile developers ask about. We employ Latent Dirichlet allocation-based topic models to help us summarize the mobile-related questions. Our findings show that developers are asking about app distribution, mobile APIs, data management, sensors and context, mobile tools, and user interface development. We also determine what popular mobile-related issues are the most difficult, explore platform specific issues, and investigate the types (e.g., what, how, or why) of questions mobile developers ask. Our findings help highlight the challenges facing mobile developers that require more attention from the software engineering research and development communities in the future and establish a novel approach for analyzing questions asked on Q&A forums.
Article
Full-text available
We present the results of an investigation into the nature of information needs of software developers who work in projects that are part of larger ecosystems. This work is based on a quantitative survey of 75 professional software developers. We corroborate the results identified in the survey with needs and motivations proposed in a previous survey and discover that tool support for developers working in an ecosystem context is even more meager than we thought: mailing lists and internet search are the most popular tools developers use to satisfy their ecosystem-related information needs.
Conference Paper
Full-text available
We present the results of an investigation into the nature of the information needs of software developers who work in projects that are part of larger ecosystems. In an open-question survey we asked framework and library developers about their information needs with respect to both their upstream and downstream projects. We investigated what kind of information is required, why is it necessary, and how the developers obtain this information. The results show that the downstream needs are grouped into three categories roughly corresponding to the different stages in their relation with an upstream: selection, adoption, and co-evolution. The less numerous upstream needs are grouped into two categories: project statistics and code usage. The current practices part of the study shows that to satisfy many of these needs developers use non-specific tools and ad hoc methods. We believe that this is a largely unexplored area of research.
Conference Paper
Full-text available
Software evolves with continuous source-code changes. These code changes usually need to be understood by software engineers when performing their daily development and maintenance tasks. However, despite its high importance, such change-understanding practice has not been systematically studied. Such lack of empirical knowledge hinders attempts to evaluate this fundamental practice and improve the corresponding tool support. To address this issue, in this paper, we present a large-scale quantitative and qualitative study at Microsoft. The study investigates the role of understanding code changes during the software-development process, and explores engineers' information needs for understanding changes and their requirements for the corresponding tool support. The study results reinforce our beliefs that understanding code changes is an indispensable task performed by engineers in the software-development process. A number of insufficiencies in the current practice also emerge from the study results. For example, it is difficult to acquire important information such as a change's completeness, consistency, and especially the risk imposed by it on other software components. In addition, for understanding a composite change, it is valuable to decompose it into sub-changes that are aligned with individual development issues; however, currently such decomposition lacks tool support.
Conference Paper
Full-text available
Software development often requires knowledge beyond what developers already possess. In such cases, developers have to seek help from different sources of information. As a metacognitive skill, help seeking influences software developers' efficiency and success in many situations. However, there has been little research to provide a systematic investigation of the general process of help seeking activities in software engineering and human and system factors affecting help seeking. This paper reports our empirical study aiming to fill this gap. Our study includes two human experiments, involving 24 developers and two typical software development tasks. Our study gathers empirical data that allows us to provide an in-depth analysis of help-seeking task structures, task strategies, information sources, process model, and developers' information needs and behaviors in seeking and using help information and in managing information during help seeking. Our study provides a detailed understanding of help seeking activities in software engineering, the challenges that software developers face, and the limitations of existing tool support. This can lead to the design and development of more efficient and usable help seeking support that helps developers become better help seekers.
Article
Full-text available
Recent tools have been designed to help developers understand the potential runtime structure of objects in a system at compile time. Such tools let developers interactively explore diagrams of object structure. But do developers ask questions about object structure? If so, when? We conducted a small pilot study of developers working on coding tasks designed to require thinking about relationships between objects. Developers did indeed ask a number of questions about various types of relationships such as containment, ownership, object identities and aliasing. Finally, some of our results revealed usability challenges tools should address to more effectively answer these questions.
Article
Full-text available
Starting from the belief that software development is a human activity, this paper tries to conceptualize software development as a knowledge-intensive design and distributed cognitive activity. This conceptualization leads to the argument that providing support for software developers to engage in knowledge collaboration with external knowledge repositories and peers is essential for software development environments. Technical and social challenges in providing such support are identified, and an illustrative system support that we have been developing is briefly described.
Conference Paper
Full-text available
Code reviews have proven to be an effective means of improving overall software quality. During the review, there is an exchange of knowledge between the code author and reviewer that concerns the code being reviewed. We performed a study that looked at the code review practices of software product teams at Microsoft. The study results indicated that code reviews are a point at which design rationale is explicitly stated, but that retention and recovery of this information is not well supported in the current environment. The results also indicated that code reviews in collocated development environments such as Microsoft use a mix of face-to-face and electronic communication.
Conference Paper
Full-text available
In recent years, the software engineering community has begun to study program navigation and tools to support it. Some of these navigation tools are very useful, but they lack a theoretical basis that could reduce the need for ad hoc tool building approaches by explaining what is fundamentally necessary in such tools. In this paper, we present PFIS (Programmer Flow by Information Scent), a model and algorithm of programmer navigation during software maintenance. We also describe an experimental study of expert programmers debugging real bugs described in real bug reports for a real Java application. We found that PFIS’ performance was close to aggregated human decisions as to where to navigate, and was significantly better than individual programmers’ decisions. Author Keywords: Information foraging, debugging, software maintenance
Conference Paper
Full-text available
Several authors have proposed information seeking as an appropriate perspective for studying software maintenance, and have characterized information seeking empirically in commercial software evolution settings. This paper addresses the parallel issue of information seeking in Open Source software evolution. Open Source software evolution exacerbates information-seeking problems, as team members are typically delocalized from the other members of their team. This paper employs an analysis schema from our previous study (Sharif et al. 2008), generated through open-coding, to characterize information seeking in Open Source programmers' mailing lists, the medium they predominantly use for communication. A preliminary study using this schema had several interesting conclusions. Specifically, the analysis has shown that Open Source programmers rely somewhat on documentation, that many of their information seeking activities are process orientated and that their information seeking goals change over time.
Conference Paper
Full-text available
For many software projects, bug tracking systems play a central role in supporting collaboration between the developers and the users of the software. To better understand this collaboration and how tool support can be improved, we have quantitatively and qualitatively analysed the questions asked in a sample of 600 bug reports from the MOZILLA and ECLIPSE projects. We categorised the questions and analysed response rates and times by category and project. Our results show that the role of users goes beyond simply reporting bugs: their active and ongoing participation is important for making progress on the bugs they report. Based on the results, we suggest four ways in which bug tracking systems can be improved.
Article
Full-text available
We provide an assessment of the status of empirical software research by analyzing all refereed articles that appeared in the Journal of Empirical Software Engineering from its first issue in January 1996 through June 2006. The journal publishes empirical software research exclusively and it is the only journal to do so. The main findings are: 1. The dominant empirical methods are experiments and case studies. Other methods (correlational studies, meta analysis, surveys, descriptive approaches, ex post facto studies) occur infrequently; long-term studies are missing. About a quarter of the experiments are replications. 2. Professionals are used somewhat more frequently than students as subjects. 3. The dominant topics studied are measurement/metrics and tools/methods/frameworks. Metrics research is dominated by correlational and case studies without any experiments. 4. Important topics are underrepresented or absent, for example: programming languages, model driven development, formal methods, and others. The narrow focus on a few empirically researched topics is in contrast to the broad scope of software research.
Conference Paper
Full-text available
Previous research has documented the fragmented nature of software development work. To explain this in more detail, we analyzed software developers' day-to-day information needs. We observed seventeen developers at a large software company and transcribed their activities in 90-minute sessions. We analyzed these logs for the information that developers sought, the sources that they used, and the situations that prevented information from being acquired. We identified twenty-one information types and cataloged the outcome and source when each type of information was sought. The most frequently sought information included awareness about artifacts and coworkers. The most often deferred searches included knowledge about design and program behavior, such as why code was written a particular way, what a program was supposed to do, and the cause of a program state. Developers often had to defer tasks because the only source of knowledge was unavailable coworkers.
Article
Full-text available
Little is known about the specific kinds of questions programmers ask when evolving a code base and how well existing tools support those questions. To better support the activity of programming, answers are needed to three broad research questions: (1) what does a programmer need to know about a code base when evolving a software system? (2) how does a programmer go about finding that information? and (3) how well do existing tools support programmers in answering those questions? We undertook two qualitative studies of programmers performing change tasks to provide answers to these questions. In this paper, we report on an analysis of the data from these two user studies. This paper makes three key contributions. The first contribution is a catalog of 44 types of questions programmers ask during software evolution tasks. The second contribution is a description of the observed behavior around answering those questions. The third contribution is a description of how existing deployed and proposed tools do, and do not, support answering programmers' questions.
Article
Full-text available
Much of software developers' time is spent understanding unfamiliar code. To better understand how developers gain this understanding and how software development environments might be involved, a study was performed in which developers were given an unfamiliar program and asked to work on two debugging tasks and three enhancement tasks for 70 minutes. The study found that developers interleaved three activities. They began by searching for relevant code both manually and using search tools; however, they based their searches on limited and misrepresentative cues in the code, environment, and executing program, often leading to failed searches. When developers found relevant code, they followed its incoming and outgoing dependencies, often returning to it and navigating its other dependencies; while doing so, however, Eclipse's navigational tools caused significant overhead. Developers collected code and other information that they believed would be necessary to edit, duplicate, or otherwise refer to later by encoding it in the interactive state of Eclipse's package explorer, file tabs, and scroll bars. However, developers lost track of relevant code as these interfaces were used for other tasks, and developers were forced to find it again. These issues caused developers to spend, on average, 35 percent of their time performing the mechanics of navigation within and between source files. These observations suggest a new model of program understanding grounded in theories of information foraging and suggest ideas for tools that help developers seek, relate, and collect information in a more effective and explicit manner.
Article
Full-text available
The classical method for identifying cause-effect relationships is to conduct controlled experiments. This paper reports upon the present state of how controlled experiments in software engineering are conducted and the extent to which relevant information is reported. Among the 5,453 scientific articles published in 12 leading software engineering journals and conferences in the decade from 1993 to 2002, 103 articles (1.9 percent) reported controlled experiments in which individuals or teams performed one or more software engineering tasks. This survey quantitatively characterizes the topics of the experiments and their subjects (number of subjects, students versus professionals, recruitment, and rewards for participation), tasks (type of task, duration, and type and size of application) and environments (location, development tools). Furthermore, the survey reports on how internal and external validity is addressed and the extent to which experiments are replicated. The gathered data reflects the relevance of software engineering experiments to industrial practice and the scientific maturity of software engineering research.
Article
Full-text available
Code cognition models examine how programmers understand program code. The authors survey the current knowledge in this area by comparing six program comprehension models: the Letovsky (1986) model; the Shneiderman and Mayer (1979) model; the Brooks (1983) model; Soloway, Adelson and Ehrlich's (1988) top-down model; Pennington's (1987) bottom-up model; and the integrated metamodel of von Mayrhauser and Vans (1994). While these general models can foster a complete understanding of a piece of code, they may not always apply to specialized tasks that more efficiently employ strategies geared toward partial understanding. We identify open questions, particularly considering the maintenance and evolution of large-scale code. These questions relate to the scalability of existing experimental results with small programs, the validity and credibility of results based on experimental procedures, and the challenges of data availability.
Article
Contemporary code review is a widespread practice used by software engineers to maintain high software quality and share project knowledge. However, conducting proper code review takes time and developers often have limited time for review. In this paper, we aim at investigating the information that reviewers need to conduct a proper code review, to better understand this process and how research and tool support can make developers become more effective and efficient reviewers. Previous work has provided evidence that a successful code review process is one in which reviewers and authors actively participate and collaborate. In these cases, the threads of discussions that are saved by code review tools are a precious source of information that can be later exploited for research and practice. In this paper, we focus on this source of information as a way to gather reliable data on the aforementioned reviewers' needs. We manually analyze 900 code review comments from three large open-source projects and organize them in categories by means of a card sort. Our results highlight the presence of seven high-level information needs, such as knowing the uses of methods and variables declared/modified in the code under review. Based on these results we suggest ways in which future code review tools can better support collaboration and the reviewing task.
Preprint
I welcome the contribution from Falessi et al. [1], hereafter referred to as F++, and the ensuing debate. Experimentation is an important tool within empirical software engineering, so how we select participants is clearly a relevant question. Moreover, as F++ point out, the question is considerably more nuanced than the simple dichotomy it might appear to be at first sight. This commentary is structured as follows. In Section 2 I briefly summarise the arguments of F++ and comment on their approach. Next, in Section 3, I take a step back to consider the nature of representativeness in inferential arguments and the need for careful definition. Then I give three examples of using different types of participant to consider impact. I conclude by arguing, largely in agreement with F++, that the question of whether student participants are representative or not depends on the target population. However, we need to give careful consideration to defining that population and, in particular, not to overlook the representativeness of tasks and environment. This is facilitated by explicit description of the target populations.
Article
Empirical research is playing a significant role in software engineering (SE), and it has been applied to evaluate software artifacts and technologies. There have been a great number of empirical research articles published recently. There is also a large research community in empirical software engineering (ESE). In this paper, we identify both the overall landscape and detailed implementations of ESE, and investigate frequently applied empirical methods, targeted research purposes, used data sources, and applied data processing approaches and tools in ESE. The aim is to identify new trends and obtain interesting observations of empirical software engineering across different sub-fields of software engineering. We conduct a mapping study on 538 selected articles from January 2013 to November 2017, with four research questions. We observe that the trend of applying empirical methods in software engineering is continuously increasing and the most commonly applied methods are experiment, case study and survey. Moreover, open source projects are the most frequently used data sources. We also observe that most researchers have paid attention to the validity and the possibility to replicate their studies. These observations are carefully analyzed and presented as carefully designed diagrams. We also reveal shortcomings and demanded knowledge/strategies in ESE and propose recommendations for researchers.
Conference Paper
Representative sampling is considered crucial for predominately quantitative, positivist research. Researchers typically argue that a sample is representative when items are selected randomly from a population. However, random sampling is rare in empirical software engineering research because there are no credible sampling frames (population lists) for the units of analysis software engineering researchers study (e.g. software projects, code libraries, developers, projects). This means that most software engineering research does not support statistical generalization, but rejecting any particular study for lack of random sampling is capricious.
Conference Paper
Previous studies focus on the specific questions software engineers ask when evolving a codebase. Though these studies observe developers using statically typed languages, little is known about the developer questions using dynamically typed languages. Dynamically typed languages present new challenges to understanding and navigating in a codebase and could affect results reported by previous studies. This paper replicates a previous study and presents the analysis of six programming sessions made in Pharo, a dynamically typed language. We found a similar result when comparing sessions on an unfamiliar codebase with the previous work. Our result on the familiar code greatly deviates from the replicated study, likely caused by different tasks and development strategies. Both missing type information and test driven development affected participant behavior and prudence on codebase understanding, where some participants made changes based on assumptions. We provide a set of questions that are useful in characterizing activity related to the use of a dynamically typed language and test-driven development -- questions not explicitly considered in previous research. We also present a number of issues that we would like to discuss during the PLATEAU workshop.
Article
In this paper, we present the results from two surveys related to data science applied to software engineering. The first survey solicited questions that software engineers would like data scientists to investigate about software, about software processes and practices, and about software engineers. Our analyses resulted in a list of 145 questions grouped into 12 categories. The second survey asked a different pool of software engineers to rate these 145 questions and identify the most important ones to work on first. Respondents favored questions that focus on how customers typically use their applications. We also saw opposition to questions that assess the performance of individual employees or compare them with one another. Our categorization and catalog of 145 questions can help researchers, practitioners, and educators to more easily focus their efforts on topics that are important to the software industry.
Article
Context: Several authors have proposed information seeking as an appropriate perspective for studying software evolution. Empirical evidence in this area suggests that substantial time delays can accrue, due to the unavailability of required information, particularly when this information must travel across geographically distributed sites. Objective: As a first step in addressing the time delays that can occur in information seeking for distributed Open Source (OS) programmers during software evolution, this research characterizes the information seeking of OS developers through their mailing lists. Method: A longitudinal study that analyses 17 years of developer mailing list activity in total, over 6 different OS projects is performed, identifying the prevalent information types sought by developers, from a qualitative, grounded analysis of this data. Quantitative analysis of the number-of-responses and response time-lag is also performed. Results: The analysis shows that Open Source developers are particularly implementation centric and team focused in their use of mailing lists, mirroring similar findings that have been reported in the literature. However, novel findings include the suggestion that OS developers often require support regarding the technology they use during development, that they refer to documentation fairly frequently and that they seek implementation-oriented specifics based on system design principles that they anticipate in advance. In addition, response analysis suggests a large variability in the response rates for different types of questions, and particularly that participants have difficulty ascertaining information on other developers’ activities.
Conclusion: The findings provide insights for those interested in supporting the information needs of OS developer communities: They suggest that the tools and techniques developed in support of co-located developers should be largely mirrored for these communities: that they should be implementation centric, and directed at illustrating “how” the system achieves its functional goals and states. Likewise they should be directed at determining the reason for system bugs: a type of question frequently posed by OS developers but less frequently responded to.
Conference Paper
The increasing size of APIs and the increase in the number of APIs available imply developers must frequently learn how to use unfamiliar APIs. To identify the types of questions developers want answered when working with unfamiliar APIs and to understand the difficulty they may encounter answering those questions, we conducted a study involving twenty programmers working on different programming tasks, using unfamiliar APIs. Based on the screen captured videos and the verbalization of the participants, we identified twenty different types of questions programmers ask when working with unfamiliar APIs, and provide new insights to the cause of the difficulties programmers encounter when answering questions about the use of APIs. The questions we have identified and the difficulties we observed can be used for evaluating tools aimed at improving API learning, and in identifying areas of the API learning process where tool support is missing, or could be improved.
Conference Paper
Software development is a data rich activity with many sophisticated metrics. Yet engineers often lack the tools and techniques necessary to leverage these potentially powerful information resources toward decision making. In this paper, we present the data and analysis needs of professional software engineers, which we identified among 110 developers and managers in a survey. We asked about their decision making process, their needs for artifacts and indicators, and scenarios in which they would use analytics. The survey responses lead us to propose several guidelines for analytics tools in software development including: Engineers do not necessarily have much expertise in data analysis; thus tools should be easy to use, fast, and produce concise output. Engineers have diverse analysis needs and consider most indicators to be important; thus tools should at the same time support many different types of artifacts and many indicators. In addition, engineers want to drill down into data based on time, organizational structure, and system architecture.
Conference Paper
Quantitative studies in Software Engineering are frequently dependent on primary studies in which the population is usually small and established by convenience. This brings several limitations for the analysis and strength of results due to sampling issues. Therefore, when these studies are reapplied, different and non-clustered populations are established, making evidence generalization unfeasible and contributing to an imbalance between research and practice. Aiming at investigating ways to overcome the absence of large sampling frames in Software Engineering studies, this short paper presents the results of an initial experience concerned with the systematic recruitment of subjects for a survey regarding software requirements effort factors by using social networks compared with recruitment by convenience. We have observed in this particular case that using social networks technology does not guarantee sample enlargement by just posting invitations in specific forums. However, its usage can contribute to increase the subjects' heterogeneity and to increase the level of confidence of the sample, which consequently improves our capacity of observing the object under study, probably strengthening the results.
Conference Paper
We know surprisingly little about how professional developers define debugging and the challenges they face in industrial environments. To begin exploring professional debugging challenges and needs, we conducted and analyzed interviews with 15 professional software engineers at Microsoft. The goals of this study are: 1) to understand how professional developers currently use information and tools to debug, 2) to identify new challenges in debugging in contemporary software development domains (web services, multithreaded/multicore programming), and 3) to identify the improvements in debugging support desired by these professionals that are needed from research. The interviews were coded to identify the most common information resources, techniques, challenges, and needs for debugging as articulated by the developers. The study reveals several debugging challenges faced by professionals, including: 1) the interaction of hypothesis instrumentation and software environment as a source of debugging difficulty, 2) the impact of log file information on accurate debugging of web services, and 3) the mismatch between the sequential human thought process and the non-sequential execution of multithreaded environments as source of difficulty. The interviewees also describe desired improvements to tools to support debugging, many of which have been discussed in research but not transitioned to practice.
Article
Source code search is an important activity for programmers working on a change task to a software system. We are at the early stages of a research program that is aiming to answer three research questions: (1) How effectively can programmers express (using today's tools) the information they are seeking? (2) How effectively can programmers determine which of the matches returned from their searches are relevant to their task? and (3) In what ways can tools be improved to support programmers in more effectively expressing their information needs and exploring the results of searches? To begin answering these questions we have conducted a study in which we gathered both qualitative and quantitative data about programmers' search activities. Our analysis of this data is still incomplete, however this paper presents several of our initial observations about how programmers interact with the results from their searches.
Article
As software engineers collaboratively develop software, they need to often analyze past and present program modifications implemented by other developers. While several techniques focus on tool support for investigating past and present software modifications, do these techniques indeed address developers' awareness interests that are important to them? We conducted an initial focus group study and a web survey to understand in which task contexts and how often particular types of awareness-interests arise. Our preliminary study results indicate that developers have daily information needs about code changes that affect or interfere with their code, yet it is extremely challenging for them to identify relevant events out of a large number of change-events.
Conference Paper
Question and Answer (Q&A) websites, such as Stack Overflow, use social media to facilitate knowledge exchange between programmers and fill archives with millions of entries that contribute to the body of knowledge in software development. Understanding the role of Q&A websites in the documentation landscape will enable us to make recommendations on how individuals and companies can leverage this knowledge effectively. In this paper, we analyze data from Stack Overflow to categorize the kinds of questions that are asked, and to explore which questions are answered well and which ones remain unanswered. Our preliminary findings indicate that Q&A websites are particularly effective at code reviews and conceptual questions. We pose research questions and suggest future work to explore the motivations of programmers that contribute to Q&A websites, and to understand the implications of turning Q&A exchanges into technical mini-blogs through the editing of questions and answers.
Conference Paper
Difficulties understanding update paths while understanding code cause developers to waste time and insert bugs. A detailed investigation of these difficulties suggests that a wide variety of problems could be addressed by more easily answering questions about update paths that existing tools do not answer. We are designing a feasible update path static analysis to compute these paths and a visualization for asking questions and displaying results. In addition to grounding the questions we answer and tailoring the program analysis in data, we will also evaluate the usefulness of our tool using lab and field studies.
Conference Paper
Building tools to help software developers communicate effectively requires a deep understanding of their communication dynamics. To date we do not have good comprehension of why developers talk to each other as a result of some events in the life of their projects, and not of others. This lack of knowledge makes it difficult to design useful communication models and support systems. In this paper, we narrow down the study of communication behaviour to focus on interactions that occur as a result of a particular kind of project event: the submission of a changeset to the project repository. In a case study with the IBM® Rational® Team Concert™ development team we investigate which factors influence developers to request information about a changeset to their product. We identify several such factors, including the development mode in which the team is operating, the background and recent performance of the author of the changeset, and the risk that the changeset poses to the stability of the product. Incorporating these factors into recommender systems may lead to improvements in their performance.
Conference Paper
Version control branching allows an organization to parallelize its development efforts. Releasing a software system developed in this manner requires release managers, and other project stakeholders, to make decisions about how to integrate the branched work. This group decision-making process becomes very complex in the case of large-scale parallel development. To better understand the information needs of release managers in this context, we conducted an interview study at a large software company. Our analysis of the interviews provides a view into how release managers make integration decisions, organized around ten key factors. Based on these factors, we discuss specific information needs for release managers and how the needs can be met in future work.
Article
One of the main goals of an applied research field such as software engineering is the transfer and widespread use of research results in industry. To impact industry, researchers developing technologies in academia need to provide tangible evidence of the advantages of using them. This can be done through step-wise validation, enabling researchers to gradually test and evaluate technologies to finally try them in real settings with real users and applications. The evidence obtained, together with detailed information on how the validation was conducted, offers rich decision support material for industry practitioners seeking to adopt new technologies and researchers looking for an empirical basis on which to build new or refined technologies. This paper presents a model for evaluating the rigor and industrial relevance of technology evaluations in software engineering. The model is applied and validated in a comprehensive systematic literature review of evaluations of requirements engineering technologies published in software engineering journals. The aim is to show the applicability of the model and to characterize how evaluations are carried out and reported to evaluate the state-of-research. The review shows that the model can be applied to characterize evaluations in requirements engineering. The findings from applying the model also show that the majority of technology evaluations in requirements engineering lack both industrial relevance and rigor. In addition, the research field does not show any improvements in terms of industrial relevance over time.
Conference Paper
To understand developers' typical tools, activities, and practices and their satisfaction with each, we conducted two surveys and eleven interviews. Many problems arose from developers forced to invest great effort recovering implicit knowledge by exploring code and interrupting teammates only to persist this knowledge in their memory. Contrary to expectations that email and IM prevent expensive task switches caused by face-to-face interruptions, we found that face-to-face communication enjoys many advantages. Contrary to expectations that documentation makes understanding design rationale easy, we found that current design documents are inadequate. Contrary to expectations that code duplication involves the copy and paste of code snippets, developers reported several types of duplication. We use data to characterize these and other problems and draw implications for the design of tools for their solution.
Conference Paper
This paper proposes an empirical approach, called content analysis, to identify the information sought and obtained by programmers as they maintain software systems. Using this information, researchers have the potential to determine the information bottlenecks that programmers encounter as they view representations of their software systems. As such, this work promises both direction and an evaluation framework for those involved in creating software visualization tools.
Article
Technology transfer, and thus industry-relevant research, involves more than merely producing research results and delivering them in publications and technical reports. It demands close cooperation and collaboration between industry and academia throughout the entire research process. During research conducted in a partnership between Blekinge Institute of Technology and two companies, Danaher Motion Saro AB (DHR) and ABB, we devised a technology transfer model that embodies this philosophy. We initiated this partnership to conduct industry-relevant research in requirements engineering and product management. Technology transfer in this context is a prerequisite: it validates academic research results in a real setting, and it provides a way to improve industry development and business processes.