Article

How have views on Software Quality differed over time? Research and practice viewpoints


Abstract

Context: Over the years, there has been debate about what constitutes software quality and how it should be measured. This controversy has caused uncertainty across the software engineering community, affecting levels of commitment to the many potential determinants of quality among developers. An up-to-date catalogue of software quality views could provide developers with contemporary guidelines and templates. In fact, it is necessary to learn about views on the quality of code on frequently used online collaboration platforms (e.g., Stack Overflow), given that the quality of code snippets can affect the quality of software products developed. If quality models are unsuitable for aiding developers because they lack relevance, developers will hold relaxed or inappropriate views of software quality, thereby lacking awareness and commitment to such practices.

Objective: We aim to explore differences in interest in quality characteristics across research and practice. We also seek to identify quality characteristics practitioners consider important when judging code snippet quality. First, we examine the literature for quality characteristics used frequently for judging software quality, followed by the quality characteristics commonly used by researchers to study code snippet quality. Finally, we investigate quality characteristics used by practitioners to judge the quality of code snippets.

Methods: We conducted two systematic literature reviews followed by semi-structured interviews of 50 practitioners to address this gap.

Results: The outcomes of the semi-structured interviews revealed that most practitioners judged the quality of code snippets using five quality dimensions: Functionality, Readability, Efficiency, Security and Reliability. However, other dimensions were also considered (i.e., Reusability, Maintainability, Usability, Compatibility and Completeness). This outcome differed from how the researchers judged code snippet quality.

Conclusion: Practitioners today mainly rely on code snippets from online code resources, and specific models or quality characteristics are emphasised based on their need to address distinct concerns (e.g., mobile vs web vs standalone applications, regular vs machine learning applications, or open vs closed source applications). Consequently, software quality models should be adapted for the domain of consideration and not seen as one-size-fits-all. This study will lead to targeted support for various clusters of the software development community.




... - Improving Understanding and Application: Some articles [4,8,12] offer guidance and a better understanding of how to use AI tools in educational settings, develop effective curricula, and improve software quality. - New Experiments and Models: Some studies [10,15] propose new models and methods for predicting software defects using a combined approach over natural-language and programming data; they also explore error patterns in software projects. - Knowledge Synthesis: Other articles [16,17] synthesize knowledge in specific areas, such as teaching software testing; in addition, they offer community references and expert insight across multiple disciplines. ...
... Understand and evaluate problems or bad practices in Python code detectable through metrics and static analysis tools [13]; focus on the detection and validation of code clones in software programs using machine learning techniques, which can be valuable in improving software quality and maintainability by addressing code duplication issues [14]; analyze how perspectives on software quality have evolved over time, focusing on both research and practical perspectives [15]; investigate and address errors within a deep learning library, understanding their symptoms and causes and proposing effective solutions [16]. Otherwise, one article [16] mentions biases in the search terms used in the systematic mapping of the literature, which could have excluded important articles from consideration in the research. ...
... In the same way, for I. Ndukwe in Ref. [15], the methodology involved two systematic literature reviews followed by semi-structured interviews with 50 professionals, conducted to address this gap. Equally important, the contribution was a catalogue of software quality models and a synthesis of how researchers study and view the quality of code fragments. ...
... To keep our project scope narrow and focused, we decided to evaluate the tasks on only one metric: functionality (i.e., correctness of the code), as it was the highest-rated metric for code evaluation by software developers [13]. Functionality covers both syntactic and semantic correctness; both must hold for the code to be considered functional. ...
... Construct Validity. The metrics used to evaluate the correctness and functionality of the generated code (passing the unit tests generated by the research team) might not capture all aspects of code quality, such as readability, maintainability, or efficiency [13]. While we tried to generate unit tests (using assertions) for basic input-output pairs, more tests could be required in future studies. ...
Preprint
Full-text available
Large Language Models (LLMs), such as GPT models, are increasingly used in software engineering for various tasks, such as code generation, requirements management, and debugging. While automating these tasks has garnered significant attention, a systematic study on the impact of varying hyperparameters on code generation outcomes remains unexplored. This study aims to assess LLMs' code generation performance by exhaustively exploring the impact of various hyperparameters. Hyperparameters for LLMs are adjustable settings that affect the model's behaviour and performance. Specifically, we investigated how changes to the hyperparameters temperature, top probability (top_p), frequency penalty, and presence penalty affect code generation outcomes. We systematically adjusted all hyperparameters together, exploring every possible combination by making small increments to each hyperparameter at a time. This exhaustive approach was applied to 13 Python code generation tasks, yielding one of four outcomes for each hyperparameter combination: no output from the LLM, non-executable code, code that fails unit tests, or correct and functional code. We analysed these outcomes for a total of 14,742 generated Python code segments, focusing on correctness, to determine how the hyperparameters influence the LLM to arrive at each outcome. Using correlation coefficient and regression tree analyses, we ascertained which hyperparameters influence which aspect of the LLM. Our results indicate that optimal performance is achieved with a temperature below 0.5, top probability below 0.75, frequency penalty above -1 and below 1.5, and presence penalty above -1. We make our dataset and results available to facilitate replication.
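As a rough illustration of the sweep described above, the sketch below enumerates combinations of temperature, top_p, frequency penalty and presence penalty, generates code for one task, and buckets each result into the study's four outcomes. The OpenAI client call, model name, grid increments and the tiny input-output test are illustrative assumptions, not the authors' actual setup.

```python
import itertools
from openai import OpenAI  # assumption: any chat-completion client would do

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def generate_code(prompt, temperature, top_p, frequency_penalty, presence_penalty):
    """One generation with a fixed prompt and a specific hyperparameter combination."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",          # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
        top_p=top_p,
        frequency_penalty=frequency_penalty,
        presence_penalty=presence_penalty,
    )
    return response.choices[0].message.content or ""


def classify(code, func_name, io_pairs):
    """Map one generated snippet to one of the study's four outcomes."""
    if not code.strip():
        return "no_output"
    namespace = {}
    try:
        exec(code, namespace)                      # non-executable code fails here
        func = namespace[func_name]
    except Exception:
        return "non_executable"
    try:
        passed = all(func(*args) == expected for args, expected in io_pairs)
    except Exception:
        return "fails_unit_tests"
    return "correct_and_functional" if passed else "fails_unit_tests"


def frange(lo, hi, step):
    """Inclusive float range helper for the hyperparameter grid."""
    vals, v = [], lo
    while v <= hi + 1e-9:
        vals.append(round(v, 2))
        v += step
    return vals


# Illustrative grid; the paper's exact increments may differ.
grid = itertools.product(
    frange(0.0, 2.0, 0.5),     # temperature
    frange(0.0, 1.0, 0.25),    # top_p
    frange(-2.0, 2.0, 1.0),    # frequency_penalty
    frange(-2.0, 2.0, 1.0),    # presence_penalty
)

prompt = "Write a Python function add(a, b) that returns the sum of a and b."
io_pairs = [((1, 2), 3), ((-1, 1), 0)]   # basic input-output unit tests
results = {
    combo: classify(generate_code(prompt, *combo), "add", io_pairs)
    for combo in grid                    # note: one API call per combination
}
```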
... Indeed, rigorously designed academic work may help with the utility and validity of CQA portals. While previous studies have investigated buggy code snippets on CQA portals such as Stack Overflow [6], [7], [8], we have also explored how practitioners judge code snippet quality on such portals [9]. This study goes one step further and focuses on practitioners' experiences and challenges using CQA portals. ...
... The study was publicised, and its purpose explained, having received human and behavioural ethics approval from the university through which the study was conducted (D21/218). A total of 50 practitioners agreed to participate in the study, of which forty-one (41) were male and nine (9) were female. The sample size is deemed adequate for the chosen (purposive) sampling method as the possible pool of participants is already restricted. ...
Article
Software developers make use of crowdsourcing during development. Beyond learning from others, developers use online portals such as Stack Overflow as a vehicle for collaboration. However, little is known about developers' experiences on such platforms, particularly around problems that are encountered online. Such insights could benefit software developers in terms of recommendations for pitfalls to avoid, ways to exploit crowdsourced knowledge, and the provision of insights to improve online code sharing communities. We interviewed 50 practitioners to fill this gap, where outcomes show that software developers' use of online portals is targeted, and such portals are a lifeline to modern software development. Practitioners are facilitated with code solutions and debugging, often in a very timely fashion. While these experiences are largely positive, practitioners also encounter negative experiences online, some of which could be significantly deleterious to the community. We discuss the implications of these findings, such as creating awareness of the quality and reliability of code snippets, improving code search, code validation, outdated code detection, and attribution of code snippets.
... As noted above, the software teams studied used Scrum, and were rewarded for the degree of conformance to recommended processes (guidelines), see below. The software engineering body of knowledge identifies these software development processes (e.g., requirements scoping) around the analysis, design, construct, test pipeline [42], and the most established software quality models emphasise that the quality of a software product should cover functionality, reliability, usability, efficiency, maintainability and portability [43]. Of note here is that these latter dimensions not only consider the software features and their appropriate scope and working order (e.g., functionality and usability), but also the appropriateness of the software to facilitate future maintenance and easy portability. ...
Article
Agile methods and associated practices have been held to deliver value to software developers and customers. Research studies have reported team productivity and software quality benefits. While such insights are helpful for understanding how agile methods add value during software development, there is a need to understand the intersection of useful practices and outcomes over project duration. This study addressed this opportunity through an observation study of student projects, complemented by the analysis of demographics data and open responses about the challenges encountered during the use of agile practices. Data from 22 student teams comprising 85 responses were analyzed using quantitative and qualitative approaches, where among our findings we observed that the use of good coding practices and quality management techniques was positively correlated with all dimensions of product quality (e.g., functionality scope and software packaging). Outcomes also reveal that software product quality was predicted by requirements scoping, team planning and communication, and coding practice. However, high levels of team planning and communication were not necessary for all software development activities. When examining project challenges, it was observed that lack of technical skills and poor time management present the most challenges to project success. While these challenges may be mitigated by agile practices, such practices may themselves create unease, requiring balance during project implementation.
... When evaluating a product's efficiency, the FURPS quality model only takes the customer's perspective into account [18]. The majority of practitioners used five quality dimensions, Functionality, Readability, Efficiency, Security, and Reliability, to evaluate the quality of code snippets [19]. This underlines the importance of assessing NFRs using the FURPS quality model. ...
Article
In this research paper we have attempted to elicit Non-Functional Requirements (NFRs) which may or may not have been explicitly added to the Request for Proposal (RFP) in the housing industry but are important for the success of the program. We have used real-world RFPs for the elicitation of NFRs. The sales proposal process, also known as the response-to-RFP process, refers to the methodical steps a vendor takes when developing a proposal in response to a buyer's RFP, which will have details of integrating systems, software platforms, users, business outcomes, limitations, functional requirements, and possibly non-functional requirements (NFRs). There is some work by researchers on assessing the opaqueness of NFRs and the traceability of NFRs, but there has been no work on a complexity scoring model for NFRs that enables a vendor to respond to an RFP with the best price and schedule. The paper proposes a novel complexity scoring model for NFRs in RFPs in the housing industry by using the FURPS (Functionality, Usability, Reliability, Performance, Supportability) quality attribute model in conjunction with the MoSCoW (Must Have, Should Have, Could Have, Won't Have) priority model through a mathematical formula. The working model is broken down into steps for the sales and pre-sales teams of the vendor and is ready for adoption.
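A hypothetical sketch of how FURPS categories and MoSCoW priorities might be combined into a single complexity score is shown below. The weights and the weighted-sum formula are assumptions made purely for illustration; the paper defines its own scoring model.

```python
# Hypothetical illustration of combining FURPS categories with MoSCoW
# priorities into a single complexity score for the NFRs in an RFP.
# The weights and the weighted-sum formula are assumptions for this sketch.

FURPS_WEIGHT = {          # assumed relative implementation complexity
    "Functionality": 3,
    "Usability": 2,
    "Reliability": 4,
    "Performance": 4,
    "Supportability": 2,
}

MOSCOW_WEIGHT = {         # assumed priority multiplier
    "Must Have": 1.0,
    "Should Have": 0.7,
    "Could Have": 0.4,
    "Won't Have": 0.0,
}


def rfp_complexity(nfrs):
    """nfrs: iterable of (description, furps_category, moscow_priority) tuples."""
    return sum(FURPS_WEIGHT[cat] * MOSCOW_WEIGHT[prio] for _, cat, prio in nfrs)


nfrs = [
    ("System must support 10,000 concurrent users", "Performance", "Must Have"),
    ("Admin portal should follow corporate branding", "Usability", "Should Have"),
    ("Nightly backups could be self-service", "Supportability", "Could Have"),
]
print(rfp_complexity(nfrs))   # 4*1.0 + 2*0.7 + 2*0.4 = 6.2
```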
... The efficiency aspect is the ability of the software to deliver performance in line with the standard while using a proportionate amount of resources under the given conditions. The reliability aspect is the ability of the software to maintain a specified level of performance [8]. The portability aspect is the ability of the software to be transferred from one environment to another. ...
Article
Full-text available
The State-Owned Asset Borrowing and Return Information System of Politeknik Negeri Subang (SIP2 BMN POLSUB) is a web-based information system used to provide borrowing and return services for state-owned assets (BMN) to the entire academic community of Politeknik Negeri Subang (POLSUB). A newly developed information system inevitably still contains defects, so testing of the system is necessary. The aim is to determine the quality of the information system that has been built. A standard is needed as a basis for deciding whether the information system is fit for use, so that its suitability can be measured. Quality assessment in this study used the International Organization for Standardization (ISO) 9126 standard with six aspects: functionality, reliability, usability, efficiency, maintainability and portability. The results show that the developed SIP2 BMN POLSUB meets the ISO 9126 standard on the aspects of functionality, usability, efficiency, reliability, portability and maintainability. Testing was carried out with the POLSUB academic community, with data collected using questionnaires. The first aspect, functionality, scored 100%, indicating that all functionality in the system matches user needs. The usability aspect scored 90%, indicating the information system is easy to use. Based on GTMetrix results, the efficiency aspect achieved grade B with a performance score of 80% and a structure score of 85%, meaning the information system can maintain its level of performance. For the reliability aspect the percentage was 99%, indicating the information system can be accessed by users concurrently; likewise, the portability aspect reached 100%, meaning SIP2 BMN POLSUB can be used in various browsers. The final aspect, maintainability, reached 100% and met all three criteria: instrumentation, consistency, and simplicity. The final conclusion is that SIP2 BMN POLSUB is fit for use at POLSUB.
... Systematic research on the quality of software systems has been started since the 70s of the 20th century, and even now many researchers focus on this problem (Musa & Everett, 1990;Al-Qutaish, 2010;Ndukwe et al, 2023). The Consortium for Information and Software Quality published a study entitled "The Cost of Poor Software Quality in the US: A 2022 Report" (2023). ...
Article
Full-text available
In the Industry 4.0 environment, the software systems development methodology is rapidly evolving, flexible technologies and new programming languages are being applied. The development of the software industry has made the issue of the quality of software systems an urgent problem. A number of quality models have been proposed for determining the quality of software systems so far, and these models specify the parameters and criteria for evaluating quality. Software reliability is one of the key indicators among the quality parameters of software systems, as it quantifies software crashes which can bring down even the most powerful system, ensuring that software systems run correctly and unexpected incidents do not occur. The increasing difficulty of the software system, the expansion of the scope of issues assigned on them, and as a result, the significant increase in the volume and complexity of the software system have made the problem of the reliability of the software system even more urgent. The essence of the issue is to reveal the main factors affecting the reliability of software systems, demonstrate existing problems in this area and develop mathematical models for assessing reliability. Mathematical models estimate the number of errors remaining in the software system before commissioning, predict the time of occurrence of the next crash and when the testing process will end. It is necessary to comprehensively approach the issue of ensuring reliability at all stages of the life cycle of the software system. This paper proposes a conceptual model to solve this problem.
... objectives presents its own challenges: business objectives can vary significantly from one organisation to another, making it difficult to develop a generic set of criteria that would apply to all software. It is difficult to measure the alignment of software with business objectives, a complex task with increased subjectivity that foremost requires a deep understanding of the business context [12]. ...
... Attention to quality during testing is, of course, intended to ensure that the needs of the application are met. High or excellent quality is achieved if the inherent characteristics meet all requirements [20]. Software developers put more effort into developing and testing the quality of the software and verifying its reliability before it is released [21]. ...
... The assessment of quality, based on the factors perceived by users, partly depends on the type of services provided, which, in turn, rely on the technology implementing these services. Based on this reasoning, a complex information system, which offers services organized in (software) modules and technologically supported by distinct software segments, is even harder to evaluate [2]. ...
Article
Various clone detection methods have been proposed, with results varying depending on the combination of the methods and hyperparameters used (i.e., configurations). To help select a suitable clone detection configuration, we propose two Bandit Algorithm (BA) based methods that can help evaluate the configurations dynamically while the detection methods are in use. Our analysis showed that the two proposed methods, the naïve method and BANC (BA considering Negative Cases), identified the best configurations from four code clone detection methods with high probability.
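The sketch below shows the general idea of treating each clone-detection configuration as a bandit arm, using a plain epsilon-greedy strategy. The paper's naïve method and BANC differ in how they account for negative cases, so this is only an assumed baseline, not their algorithm; the tool names and thresholds in the usage example are illustrative.

```python
import random

# Generic epsilon-greedy bandit over clone-detection configurations:
# treat each configuration as an arm and learn, while detection is running,
# which one most often yields confirmed (true) clone pairs.


class ConfigurationBandit:
    def __init__(self, configurations, epsilon=0.1):
        self.configs = list(configurations)
        self.epsilon = epsilon
        self.pulls = {c: 0 for c in self.configs}
        self.reward_sum = {c: 0.0 for c in self.configs}

    def select(self):
        if random.random() < self.epsilon:
            return random.choice(self.configs)                   # explore
        return max(self.configs, key=self._mean)                 # exploit

    def update(self, config, reward):
        """reward: 1.0 if the reported clone pair was a true clone, else 0.0."""
        self.pulls[config] += 1
        self.reward_sum[config] += reward

    def _mean(self, config):
        pulls = self.pulls[config]
        return self.reward_sum[config] / pulls if pulls else 0.0


# Usage: each "configuration" could be a (tool, similarity-threshold) pair.
bandit = ConfigurationBandit([("NiCad", 0.7), ("NiCad", 0.9), ("SourcererCC", 0.8)])
for _ in range(100):
    config = bandit.select()
    reward = float(random.random() < 0.5)   # stand-in for manual/oracle validation
    bandit.update(config, reward)
```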
Article
Full-text available
This research aimed to determine the feasibility of web-based digital Arabic language gamification media from experts and users. Specifically, the research investigated the feasibility analysis of web media with the URL: https://belajararabonline14.wordpress.com/ containing Plotagon, Wordwalls, and Genially apps in learning Arabic at the Junior High School level. Generally, Arabic language instruction remains conventional and lacks interactivity. Consequently, students may experience boredom and reduced motivation to learn. Creating interactive materials using gaming principles could be one approach to solving this issue. This research was a type of quantitative descriptive research. This research involved 37 users and 4 validators. Data collection was carried out by distributing questionnaires to validators and users of this media and was analysed using descriptive statistics. The findings showed that web-based digital Arabic language gamification media received very valid responses from experts and users with details: a score of 4.53 from material experts, a score of 4.41 from media experts, a score of 4.93 from teacher responses, and a score of 4.06 from students. These findings suggest to Arabic teachers the potential benefit of incorporating this media as an auxiliary resource in Junior High School-level Arabic language instruction.
Article
Full-text available
Software quality is a critical aspect of software development that significantly impacts business performance and customer satisfaction. However, defining software quality can be challenging, as different sources provide various definitions and perspectives. The article presents a literature review of software quality, acknowledging an ongoing debate over the years regarding the definition of software quality and the methods used for its assessment. Among all the different ideas about software quality, the article highlights key concepts that are crucial in understanding software quality: meeting requirements, satisfying users, using software features, and spotting defects. The article also examines international standards such as ISO/IEC 25010:2011 and ISO/IEC 5055:2021, introducing terms such as "Quality in use" and "Structural Quality." Unveiling a tripartite perspective elucidated in international standards (internal quality, external quality, and quality in use), the article underscores the intricate interplay between subjectivity and objectivity. The subjective dimension, influenced by user perception and contextual factors, is juxtaposed with more objective criteria such as conformance to requirements and the absence of defects. The standards provide helpful perspectives, but human factors, such as user perception and specific contexts, make a universal definition elusive. The pivotal role of business analysis and requirements engineering in ensuring software quality is underscored. Business requirements, stakeholder needs, and the quality of functional and non-functional requirements emerge as integral components. The article argues that software quality is intricately tied to the quality of its requirements, presenting a dual perspective: compliance with quality criteria and alignment with stakeholders' expectations and business goals. Practical software quality assessment is built upon the foundational understanding of contextual nuances, user needs, and operational conditions, all discerned through business analysis.
Article
Full-text available
Context: Architecture Tactics (ATs) are architectural building blocks that provide general architectural solutions for addressing Quality Attribute (QA) issues. Mining and analyzing QA-AT knowledge can help the software architecture community better understand architecture design. However, manually capturing and mining this knowledge is labor-intensive and difficult. Objective: Using Stack Overflow (SO) as our source, our main goals are to effectively mine such knowledge, and to gain some sense of how developers use ATs with respect to QA concerns from related discussions. Methods: We applied a semi-automatic dictionary-based mining approach to extract the QA-AT posts in SO. With the mined QA-AT posts, we identified the relationships between ATs and QAs. Results: Our approach allows us to mine QA-AT knowledge effectively with an F-measure of 0.865 and performance of 82.2%. Using this mining approach, we are able to discover architectural synonyms of QAs and ATs used by designers, from which we discover how developers apply ATs to address quality requirements. Conclusions: We make two contributions in this work: first, we demonstrate a semi-automatic approach to mine ATs and QAs from SO posts; second, we identify little-known design relationships between QAs and ATs and group architectural design considerations to aid architects in making architecture tactic design decisions.
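A minimal sketch of dictionary-based QA-AT mining is given below, assuming small hand-made term lists for quality attributes and tactics; the authors' curated dictionaries, synonym handling and post-processing are not reproduced.

```python
import re
from collections import Counter

# Minimal dictionary-based mining of QA-AT co-occurrences in post text.
# The term lists are tiny illustrative samples, not the curated dictionaries
# used in the study.

QUALITY_ATTRIBUTES = {
    "performance": ["performance", "latency", "throughput"],
    "security": ["security", "authentication", "authorization"],
    "availability": ["availability", "uptime", "failover"],
}
ARCHITECTURE_TACTICS = {
    "caching": ["cache", "caching", "memoization"],
    "load balancing": ["load balancer", "load balancing"],
    "heartbeat": ["heartbeat", "health check"],
}


def find_terms(text, dictionary):
    """Return the dictionary entries whose synonyms occur in the text."""
    text = text.lower()
    return {name for name, synonyms in dictionary.items()
            if any(re.search(r"\b" + re.escape(s) + r"\b", text) for s in synonyms)}


def mine_qa_at_pairs(posts):
    """posts: iterable of Stack Overflow post bodies (title + text)."""
    pairs = Counter()
    for post in posts:
        for qa in find_terms(post, QUALITY_ATTRIBUTES):
            for at in find_terms(post, ARCHITECTURE_TACTICS):
                pairs[(qa, at)] += 1
    return pairs


posts = [
    "How do I add caching to reduce latency in my REST API?",
    "Using a heartbeat and failover to keep the service available.",
]
print(mine_qa_at_pairs(posts).most_common())
```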
Article
Full-text available
Stack Overflow hosts millions of solutions that aim to solve developers' programming issues. In this crowdsourced question answering process, Stack Overflow becomes a code hosting website where developers actively share their code. However, code snippets on Stack Overflow may contain security vulnerabilities, and if shared carelessly, such snippets can introduce security problems in software systems. In this paper, we empirically study the prevalence of the Common Weakness Enumeration (CWE) in code snippets of C/C++ related answers. We explore the characteristics of Codew, i.e., code snippets that have CWE instances, in terms of the types of weaknesses, the evolution of Codew, and who contributed such code snippets. We find that: 1) 36% (i.e., 32 out of 89) CWE types are detected in Codew on Stack Overflow. In particular, CWE-119, i.e., improper restriction of operations within the bounds of a memory buffer, is common in both answer code snippets and real-world software systems. Furthermore, the proportion of Codew doubled from 2008 to 2018 after normalizing by the total number of C/C++ snippets in each year. 2) In general, code revisions are associated with a reduction in the number of code weaknesses. However, the majority of Codew had weaknesses introduced in the first version of the code, and these Codew were never revised since then. Only 7.5% of users who contributed C/C++ code snippets posted or edited code with weaknesses. Users contributed less code with CWE weaknesses when they were more active (i.e., they either revised more code snippets or had a higher reputation). We also find that some users tended to have the same CWE type repeatedly in their various code snippets. Our empirical study provides insights to users who share code snippets on Stack Overflow so that they are aware of the potential security issues. To understand the community feedback about improving code weaknesses by answer revisions, we also conduct a qualitative study and find that 62.5% of our suggested revisions are adopted by the community. Stack Overflow can perform CWE scanning for all the code that is hosted on its platform. Further research is needed to improve the quality of the crowdsourced knowledge on Stack Overflow.
Article
Full-text available
In the current competitive world, producing quality products has become a prominent factor to succeed in business. In this respect, defining and following the software product quality metrics (SPQM) to detect the current quality situation and continuous improvement of systems have gained tremendous importance. Therefore, it is necessary to review the present studies in this area to allow for the analysis of the situation at hand, as well as to enable us to make predictions regarding the future research areas. The present research aims to analyze the active research areas and trends on this topic appearing in the literature during the last decade. A Systematic Mapping (SM) study was carried out on 70 articles and conference papers published between 2009 and 2019 on SPQM as indicated in their titles and abstract. The result is presented through graphics, explanations, and the mind mapping method. The outputs include the trend map between the years 2009 and 2019, knowledge about this area and measurement tools, issues determined to be open to development in this area, and conformity between conference papers, articles and internationally valid quality models. This study may serve as a foundation for future studies that aim to contribute to the development in this crucial field. Future SM studies might focus on this subject for measuring the quality of network performance and new technologies such as Artificial Intelligence (AI), Internet of things (IoT), Cloud of Things (CoT), Machine Learning, and Robotics.
Conference Paper
Full-text available
Software reuse is an important and crucial quality attribute in modern software engineering, where almost all software projects, open-source or commercial, small or ultra-large, reuse source code in one way or another. Although software reuse has experienced increased adoption throughout the years with the exponentially growing number of available third-party libraries, frameworks, and APIs, little is known about what aspects of code reuse developers discuss. In this study, we look into bridging this gap by examining Stack Overflow to understand the challenges developers encounter when trying to reuse code. Using the Stack Overflow tags "code-reuse" and "reusability", we extracted and analyzed 1,409 posts, composed of questions and answers. Our findings indicate that despite being popular, reuse questions take relatively longer than other typical questions to receive an accepted answer. From these posts, we identified 9 categories that group the different ways developers discuss software reuse. We found Java and ASP.NET MVC to be the most discussed programming language and framework, respectively. Based on the programming languages and frameworks mentioned in the posts, we noted that Web software development is the most frequently targeted environment. This study can be utilized to further analyze aspects of software reuse and develop guidelines to be practiced in the industry and taught when forming new developers.
Article
Full-text available
Software developers share programming solutions in Q&A sites like Stack Overflow, Stack Exchange, Android forum, and so on. The reuse of crowd-sourced code snippets can facilitate rapid prototyping. However, recent research shows that the shared code snippets may be of low quality and can even contain vulnerabilities. This paper aims to understand the nature and the prevalence of security vulnerabilities in crowd-sourced code examples. To achieve this goal, we investigate security vulnerabilities in the C++ code snippets shared on Stack Overflow over a period of 10 years. In collaborative sessions involving multiple human coders, we manually assessed each code snippet for security vulnerabilities following CWE (Common Weakness Enumeration) guidelines. From the 72,483 reviewed code snippets used in at least one project hosted on GitHub, we found a total of 99 vulnerable code snippets categorized into 31 types. Many of the investigated code snippets are still not corrected on Stack Overflow. The 99 vulnerable code snippets found in Stack Overflow were reused in a total of 2859 GitHub projects. To help improve the quality of code snippets shared on Stack Overflow, we developed a browser extension that allows Stack Overflow users to be notified of vulnerabilities in code snippets when they see them on the platform.
Article
Full-text available
Community Question and Answer (CQA) platforms use the power of online groups to solve problems, or gain information. While these websites host useful information, it is critical that the details provided on these platforms are of high quality, and that users can trust the information. This is particularly necessary for software development, given the ubiquitous use of software across all sections of contemporary society. Stack Overflow is the leading CQA platform for programmers, with a community comprising over 10 million contributors. While research confirms the popularity of Stack Overflow, concerns have been raised about the quality of answers that are provided to questions on Stack Overflow. Code snippets often contained in these answers have been investigated; however, the quality of these artefacts remains unclear. This could be problematic for the software engineering community, as evidence has shown that Stack Overflow snippets are frequently used in both open source and commercial software. This research fills this gap by evaluating the quality of code snippets on Stack Overflow. We explored various aspects of code snippet quality, including reliability and conformance to programming rules, readability, performance and security. Outcomes show variation in the quality of Stack Overflow code snippets for the different dimensions; however, overall, quality issues in Stack Overflow snippets were not always severe. Vigilance is encouraged for those reusing Stack Overflow code snippets.
Article
Full-text available
Context: APIs play a central role in software development. The seminal research of Carroll et al. [15] on the minimal manual and subsequent studies by Shull et al. [79] showed that developers prefer task-based API documentation instead of traditional hierarchical official documentation (e.g., Javadoc). The Q&A format in Stack Overflow offers developers an interface to ask and answer questions related to their development tasks. Objective: With a view to producing API documentation, we study automated techniques to mine API usage scenarios from Stack Overflow. Method: We propose a framework to mine API usage scenarios from Stack Overflow. Each task consists of a code example, the task description, and the reactions of developers towards the code example. First, we present an algorithm to automatically link a code example in a forum post to an API mentioned in the textual contents of the forum post. Second, we generate a natural language description of the task by summarizing the discussions around the code example. Third, we automatically associate developers' reactions (i.e., positive and negative opinions) towards the code example to offer information about code quality. Results: We evaluate the algorithms using three benchmarks. We compared the algorithms against seven baselines. Our algorithms outperformed each baseline. We developed an online tool by automatically mining API usage scenarios from Stack Overflow. A user study of 31 software developers shows that the participants preferred the mined usage scenarios in Opiner over the official API documentation. The tool is available online at: http://opiner.polymtl.ca/. Conclusion: With a view to producing API documentation, we propose a framework to automatically mine API usage scenarios from Stack Overflow, supported by three novel algorithms. We evaluated the algorithms against a total of eight state-of-the-art baselines. We implement and deploy the framework in our proof-of-concept online tool, Opiner.
Conference Paper
Full-text available
Research demonstrates that code snippets listed on programming-oriented online forums (e.g., Stack Overflow)-including snippets containing security mistakes-make their way into production code. Prior work also shows that software developers who reference Stack Overflow in their development cycle produce less secure code. While there are many plausible explanations for why developers propagate insecure code in this manner, there is little or no empirical evidence. To address this question, we identify Stack Overflow code snippets that contain security errors and find clones of these snippets in open source GitHub repositories. We then survey (n=133) and interview (n=15) the authors of these GitHub repositories to explore how and why these errors were introduced. We find that some developers (perhaps mistakenly) trust their security skills to validate the code they import, but the majority admit they would need to learn more about security before they could properly perform such validation. Further, although some prioritize functionality over security, others believe that ensuring security is not, or should not be, their responsibility. Our results have implications for attempts to ameliorate the propagation of this insecure code.
Conference Paper
Full-text available
Generating source code API sequences from an English query using Machine Translation (MT) has gained much interest in recent years. For any kind of MT, the model needs to be trained on a parallel corpus. In this paper we clean STACKOVERFLOW, one of the most popular online discussion forums for programmers, to generate a parallel English-Code corpus from Android posts. We contrast three data cleaning approaches: standard NLP, title only, and software task extraction. We evaluate the quality of the each corpus for MT. To provide indicators of how useful each corpus will be for machine translation, we provide researchers with measurements of the corpus size, percentage of unique tokens, and per-word maximum likelihood alignment entropy. We have used these corpus cleaning approaches to translate between English and Code [22, 23], to compare existing SMT approaches from word mapping to neural networks [24], and to reexamine the "natural software" hypothesis [29]. After cleaning and aligning the data, we create a simple maximum likelihood MT model to show that English words in the corpus map to a small number of specific code elements. This model provides a basis for the success of using StackOverflow for search and other tasks in the software engineering literature and paves the way for MT. Our scripts and corpora are publicly available on GitHub [1] as well as at https://search.datacite.org/works/10.5281/zenodo.2558551.
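The sketch below shows one textbook way to compute a per-word maximum likelihood mapping from English words to code tokens and its entropy (low entropy means the word maps to few code elements). It is a naive co-occurrence-based stand-in rather than a proper alignment model, and the exact estimator and preprocessing used in the paper may differ.

```python
import math
from collections import Counter, defaultdict

# Per-word maximum likelihood mapping p(code_token | english_word) estimated
# from co-occurrence counts in an aligned English-code corpus, plus the
# entropy of that distribution for each English word. Textbook sketch only;
# the paper's estimator may differ.


def word_code_entropy(parallel_corpus):
    """parallel_corpus: iterable of (english_tokens, code_tokens) pairs."""
    cooc = defaultdict(Counter)
    for english_tokens, code_tokens in parallel_corpus:
        for w in set(english_tokens):
            for c in set(code_tokens):
                cooc[w][c] += 1
    entropy = {}
    for w, counts in cooc.items():
        total = sum(counts.values())
        probs = [n / total for n in counts.values()]   # ML estimate of p(c|w)
        entropy[w] = -sum(p * math.log2(p) for p in probs)
    return entropy


corpus = [
    (["sort", "a", "list"], ["sorted", "(", "items", ")"]),
    (["sort", "descending"], ["sorted", "(", "items", ",", "reverse=True", ")"]),
]
print(word_code_entropy(corpus)["sort"])
```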
Conference Paper
Full-text available
Programming code snippets readily available on platforms such as StackOverflow are undoubtedly useful for software engineers. Unfortunately, these code snippets might contain issues such as deprecated, misused, or even buggy code. These issues could pass unattended if developers do not have adequate knowledge, time, or tool support to catch them. In this work we expand the understanding of such issues (or the so called "violations") hidden in code snippets written in JavaScript, the programming language with the highest number of questions on StackOverflow. To characterize the violations, we extracted 336k code snippets from answers to JavaScript questions on StackOverflow and statically analyzed them using ESLint, a JavaScript linter. We discovered that there is no single JavaScript code snippet without a rule violation. On average, our studied code snippets have 11 violations, but we found instances of more than 200 violations. In particular, rules related to stylistic issues are by far the most violated ones (82.9% of the violations pertain to this category). Possible errors, which developers might be more interested in, represent only 0.1% of the violations. Finally, we found a small fraction of code snippets flagged with possible errors being reused in actual GitHub software projects. Indeed, one single code snippet with possible errors was reused 1,261 times.
Conference Paper
Full-text available
Despite being the most popular question and answer website for software developers, answers posted on Stack Overflow (SO) are susceptible to contain Python-related insecure coding practices. A systematic analysis on how frequently insecure coding practices appear in SO answers can help the SO community assess the prevalence of insecure Python code blocks in SO. An insecure coding practice is recurrent use of insecure coding patterns in Python. We conduct an empirical study using 529,054 code blocks collected from Python-related 44,966 answers posted on SO. We observe 7.1% of the 44,966 Python-related answers to include at least one insecure coding practice. The most frequently occurring insecure coding practice is code injection. We observe 9.8% of the 7,444 accepted answers to include at least one insecure code block. We also find user reputation not to relate with the presence of insecure code blocks, suggesting that both high and low-reputed users are likely to introduce insecure code blocks.
Article
Full-text available
Software developers around the globe actively ask questions and share solutions to problems related to software development on Stack Overflow, a social question and answer (Q&A) website. The knowledge shared by software developers on Stack Overflow contains useful information related to software development such as feature requests (functional/non-functional), code snippets, bug reports, and sentiments. How to extract the functional and non-functional requirements shared by mobile application developers on the social/programming Q&A website Stack Overflow has become a challenge and a less researched area. To understand the problems, needs, and trends in iOS mobile application development, we evaluated the quality requirements, or non-functional requirements (NFRs), in Stack Overflow posts. To this end, we applied Latent Dirichlet Allocation (LDA) topic models to identify the main topics in iOS posts on Stack Overflow. In addition, we labeled the extracted topics with quality requirements or NFRs by using wordlists to evaluate the trend, evolution, and hot and unresolved NFRs in all iOS discussions. Our findings revealed that the most frequently discussed topics among iOS developers are related to usability, reliability, and functionality, followed by efficiency. Interestingly, the most problematic unresolved areas are also usability, reliability, and functionality, though followed by portability. In addition, the evolution trend of each of the six different quality requirements or NFRs over time is depicted through comprehensive visualization.
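A small sketch of this kind of pipeline is shown below, assuming scikit-learn's LDA implementation and tiny illustrative NFR wordlists rather than the study's actual tooling, corpus and lists.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# LDA topics over post text, then label each topic with the NFR whose
# wordlist overlaps its top words the most. Wordlists and posts are
# illustrative samples only.

NFR_WORDLISTS = {
    "usability": {"ui", "screen", "button", "layout", "gesture"},
    "reliability": {"crash", "crashes", "exception", "error", "freeze"},
    "efficiency": {"memory", "slow", "battery", "performance"},
}

posts = [
    "App crashes with an uncaught exception when rotating the screen",
    "UIButton layout breaks on smaller screens",
    "High memory usage makes scrolling slow on older iPhones",
]

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(posts)
lda = LatentDirichletAllocation(n_components=3, random_state=0).fit(X)

terms = vectorizer.get_feature_names_out()
for topic_idx, weights in enumerate(lda.components_):
    top_words = {terms[i] for i in weights.argsort()[-5:]}     # top 5 terms
    label = max(NFR_WORDLISTS, key=lambda nfr: len(top_words & NFR_WORDLISTS[nfr]))
    print(f"topic {topic_idx}: {sorted(top_words)} -> {label}")
```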
Conference Paper
Full-text available
Software developers frequently solve development issues with the help of question and answer web forums, such as Stack Overflow (SO). While tags exist to support question searching and browsing, they are more related to technological aspects than to the question purposes. Tagging questions with their purpose can add a new dimension to the investigation of topics discussed in posts on SO. In this paper, we aim to automate such a classification of SO posts into seven question categories. As a first step, we have manually created a curated data set of 500 SO posts, classified into the seven categories. Using this data set, we apply machine learning algorithms (Random Forest and Support Vector Machines) to build a classification model for SO questions. We then experiment with 82 different configurations regarding the preprocessing of the text and representation of the input data. The results of the best performing models show that our models can classify posts into the correct question category with an average precision and recall of 0.88 and 0.87 when using Random Forest and the phrases indicating a question category as input data for the training. The obtained model can be used to aid developers in browsing SO discussions or researchers in building recommenders based on SO.
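One illustrative configuration of such a classifier is sketched below, using TF-IDF features and a Random Forest from scikit-learn. The seven-category label set shown is an assumption, and the curated 500-post data set and the 82 compared configurations from the study are not reproduced here.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# One possible configuration: TF-IDF over the post text feeding a Random
# Forest that predicts a question category. Labels and training posts are
# illustrative assumptions, not the study's data.

CATEGORIES = ["API usage", "Conceptual", "Discrepancy", "Errors",
              "Review", "API change", "Learning"]   # assumed label set

train_posts = [
    "How do I parse JSON with Jackson?",               # API usage
    "What is the difference between REST and SOAP?",   # Conceptual
]
train_labels = ["API usage", "Conceptual"]

model = make_pipeline(
    TfidfVectorizer(lowercase=True, stop_words="english"),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
model.fit(train_posts, train_labels)
print(model.predict(["Why does my HttpClient call return a 403 error?"]))
```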
Chapter
Full-text available
Reliability is an important issue for deciding the quality of software. Reliability prediction is a statistical procedure that aims to estimate future reliability values based on information known during development. It is considered a basic function of software development. A review-based study has been carried out in this work to evaluate previously established methodologies for reliability prediction. In this paper, the authors give a critical review of successful reliability prediction research. The paper also discusses the main challenges and keys of reliability estimation during the software development process. Further, it gives a critical discussion of previous work and identifies factors that are important for software reliability but are still ignored. This work helps developers predict the reliability of software with minimal risk.
Article
Full-text available
The Java platform and third-party libraries provide various security features to facilitate secure coding. However, misusing these features can cost developers tremendous time and effort or cause security vulnerabilities in software. Prior research focused on the misuse of cryptography and SSL APIs, but did not explore the key fundamental research question: what are the biggest challenges and vulnerabilities in secure coding practices? In this paper, we conducted a comprehensive empirical study on StackOverflow posts to understand developers' concerns about Java secure coding, their programming obstacles, and potential vulnerabilities in their code. We observed that developers have shifted their effort to the usage of authentication and authorization features provided by Spring Security, a third-party framework designed to secure enterprise applications. Multiple programming challenges are related to APIs or libraries, including the complicated cross-language data handling of cryptography APIs, and the complex Java-based or XML-based approaches to configure Spring Security. More interestingly, we identified security vulnerabilities in the suggested code of accepted answers. The vulnerabilities included using insecure hash functions such as MD5, breaking SSL/TLS security through bypassing certificate validation, and insecurely disabling the default protection against Cross Site Request Forgery (CSRF) attacks. Our findings reveal the insufficiency of secure coding assistance and education, and the gap between security theory and coding practices.
Conference Paper
Full-text available
Platforms such as Stack Overflow are available for software practitioners to solicit help and solutions to their challenges and knowledge needs. However, this community's practices have recently caused quality-related concerns. Academic work tends to provide validation for the practices and processes of these forums; however, previous work has not reviewed the scale of scientific attention given to this cause. We conducted a Systematic Mapping study involving 266 papers from six relevant databases to address this gap. In this preliminary work we explored the level of academic interest Stack Overflow has generated, the publication venues, the topics studied and the approaches used. Outcomes show that Stack Overflow has attracted increasing research interest, with topics relating to both community dynamics and human factors, and technical issues. In addition, research studies have been largely evaluative or have proposed solutions, though this latter approach tends to lack validation. This signals the need for future work to explore the nature of Stack Overflow research contributions and their quality. We outline our research agenda for continuing with such efforts.
Article
Full-text available
When programmers look for how to achieve certain programming tasks, Stack Overflow is a popular destination in search engine results. Over the years, Stack Overflow has accumulated an impressive knowledge base of snippets of code that are amply documented. We are interested in studying how programmers use these snippets of code in their projects. Can we find Stack Overflow snippets in real projects? When snippets are used, is this copy literal or does it suffer adaptations? And are these adaptations specializations required by the idiosyncrasies of the target artifact, or are they motivated by specific requirements of the programmer? The large-scale study presented in this paper analyzes 909k non-fork Python projects hosted on GitHub, which contain 290M function definitions, and 1.9M Python snippets captured in Stack Overflow. Results are presented as a quantitative analysis of block-level code cloning within and across Stack Overflow and GitHub, and as an analysis of programming behaviors through the qualitative analysis of our findings.
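A naive sketch of block-level matching between snippets and project functions is shown below, using exact hashing of normalised Python function bodies (Python 3.9+ for ast.unparse). The study's clone detection also captures adapted copies, so this is only the simplest variant of the idea.

```python
import ast
import hashlib

# Naive block-level matching: normalise each Python function body, hash it,
# and look for Stack Overflow snippet hashes among a project's functions.
# Exact-match only; real clone detection is far more tolerant.


def normalised_functions(source):
    """Yield (name, normalised_body_hash) for each function in a Python source."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            body = ast.unparse(ast.Module(body=node.body, type_ignores=[]))
            digest = hashlib.sha1(" ".join(body.split()).encode()).hexdigest()
            yield node.name, digest


snippet = "def flatten(xs):\n    return [x for sub in xs for x in sub]\n"
project_file = """
def flatten(xs):
    return [x for sub in xs for x in sub]

def other():
    pass
"""

snippet_hashes = {h for _, h in normalised_functions(snippet)}
clones = [name for name, h in normalised_functions(project_file) if h in snippet_hashes]
print(clones)   # ['flatten']
```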
Article
Every organization wants to run at a profit, and it is a common trend to adopt new technologies, methods and models to enhance the quality of a software product. Product quality is directly proportional to the value of the product and the profit of the organization. Different quality models are used by different organizations, depending on their requirements. This survey evaluates and examines different views of software quality attributes across various software quality models, and also performs a comparative study of the software quality models used by different organizations. Building software from open source software (OSS) and commercial off-the-shelf (COTS) components is very useful for any organization, and quality models can be used to check the quality of existing COTS and OSS software components. The purpose of this research is to compare software quality prediction using COTS and OSS components. Surveying more than 100 papers, the researchers identified OSS as a better development approach than COTS. Because COTS development is black box in nature, some traditional quality metrics cannot be applied, whereas for OSS the source code is available, so all software quality metrics can be used to assess OSS quality. However, there is still no adequate technique, model or tool for evaluating the quality of OSS. In future, researchers can improve software quality by creating tools and models for estimating quality characteristics according to a modified ISO-9126 quality assurance model for OSS.
Preprint
Monthly, 50 million users visit Stack Overflow, a popular Q&A forum used by software developers, to share and gather knowledge and help with coding problems. Although Q&A forums serve as a good resource for seeking help from developers beyond the local team, the abundance of information can cause developers, especially novice software engineers, to spend considerable time in identifying relevant answers and suitable suggested fixes. This exploratory study aims to understand how novice software engineers direct their efforts and what kinds of information they focus on within a post selected from the results returned in response to a search query on Stack Overflow. The results can be leveraged to improve the Q&A forum interface, guide tools for mining forums, and potentially improve granularity of traceability mappings involving forum posts. We qualitatively analyze the novice software engineers’ perceptions from a survey as well as their annotations of a set of Stack Overflow posts. Our results indicate that novice software engineers pay attention to only 27% of code and 15-21% of text in a Stack Overflow post to understand and determine how to apply the relevant information to their context. Our results also discern the kinds of information prominent in that focus.
Conference Paper
Software developers all over the world use Stack Overflow (SO) to interact and exchange code snippets. Research also uses SO to harvest code snippets for use with recommendation systems. However, previous work has shown that code on SO may have quality issues, such as security or license problems. We analyse Python code on SO to determine its coding style compliance. From 1,962,535 code snippets tagged with 'python', we extracted 407,097 snippets of at least 6 statements of Python code. Surprisingly, 93.87% of the extracted snippets contain style violations, with an average of 0.7 violations per statement and a huge number of snippets with a considerably higher ratio. Researchers and developers should, therefore, be aware that code snippets on SO may not be representative of good coding style. Furthermore, while user reputation seems to be unrelated to coding style compliance, for posts with vote scores in the range between -10 and 20, we found a strong correlation (r = -0.87, p < 10^-7) between the vote score a post received and the average number of violations per statement for snippets in such posts.
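As a rough illustration, the sketch below runs pycodestyle over a snippet and reports violations per (approximately counted) statement. The study's extraction pipeline, linter configuration and statement counting are assumptions here; pycodestyle simply stands in for a PEP 8 style checker.

```python
import tempfile
import pycodestyle  # PEP 8 checker, used here as a stand-in for the study's tooling

# Write a Stack Overflow snippet to a temporary file, count its style
# violations, and divide by a rough statement count.


def violations_per_statement(snippet):
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(snippet)
        path = f.name
    checker = pycodestyle.StyleGuide(quiet=True)
    total_errors = checker.check_files([path]).total_errors
    # Rough statement count: non-blank, non-comment lines.
    statements = [line for line in snippet.splitlines()
                  if line.strip() and not line.strip().startswith("#")]
    return total_errors / max(len(statements), 1)


snippet = "import os,sys\nx=1\nif x==1 :\n    print( x )\n"
print(violations_per_statement(snippet))
```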
Conference Paper
Internet of Things (IoT) systems have a significant impact on different aspects of our lives. For that reason, IoT systems should be of high quality and free of defects. The quality measurements for IoT systems vary according to the type of the IoT system and its applications. Therefore, the quality of IoT systems should be measured differently, considering the heterogeneous objects bound together to build the IoT system. This diversity leads to a variety of quality measurement models, which makes the process of measuring quality more challenging, less accurate, and less applicable. In this research, different quality models for IoT systems have been studied and compared regarding their quality factors. In addition, a new quality model for IoT is proposed. The new model covers the characteristics related to IoT systems by introducing quality factors that measure them.
Conference Paper
Software reuse is a well-established software engineering process that aims at improving development productivity. Although reuse can be performed in a very systematic way (e.g., through product lines), in practice it is in many cases performed opportunistically, i.e., by copying small code chunks either from the web or from in-house developed projects. Knowledge-sharing communities, and especially StackOverflow, constitute the primary source of code-related information for amateur and professional software developers. Despite the obvious benefit of increased productivity, reuse can have a mixed effect on the quality of the resulting code depending on the properties of the reused solutions. An efficient concept for capturing a wide range of internal software qualities is the metaphor of Technical Debt, which expresses the impact of shortcuts in software development on its maintenance costs. In this paper, we present the results of an empirical study on the effect of code retrieved from StackOverflow on the technical debt of the target system. In particular, we study several open-source projects and identify non-trivial pieces of code that exhibit a perfect or near-perfect match with code provided in the context of answers on StackOverflow. Then, we compare the technical debt density of the reused fragments, obtained as the ratio of inefficiencies identified by SonarQube over the lines of reused code, to the technical debt density of the target codebase. The results provide insight into the potential impact of code reuse on technical debt and highlight the benefits of assessing code quality before committing changes to a repository.
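The debt-density comparison at the heart of that study reduces to a simple ratio. The sketch below assumes the issue counts have already been exported from a static analyser such as SonarQube; the numbers are hypothetical:

def debt_density(issue_count: int, lines_of_code: int) -> float:
    """Technical debt density: inefficiencies per line of code."""
    return issue_count / lines_of_code if lines_of_code else 0.0

# Hypothetical figures for one reused Stack Overflow fragment vs. the host codebase.
reused_fragment = debt_density(issue_count=4, lines_of_code=35)
whole_codebase = debt_density(issue_count=620, lines_of_code=48_000)
print(f"reused fragment: {reused_fragment:.3f}, codebase: {whole_codebase:.3f}")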
Article
Online code clones are code fragments that are copied from software projects or online sources to Stack Overflow as examples. Due to an absence of a checking mechanism after the code has been copied to Stack Overflow, they can become toxic code snippets, e.g., they suffer from being outdated or violating the original software license. We present a study of online code clones on Stack Overflow and their toxicity by incorporating two developer surveys and a large-scale code clone detection. A survey of 201 high-reputation Stack Overflow answerers (33% response rate) showed that 131 participants (65%) have ever been notified of outdated code and 26 of them (20%) rarely or never fix the code. 138 answerers (69%) never check for licensing conflicts between their copied code snippets and Stack Overflow's CC BY-SA 3.0. A survey of 87 Stack Overflow visitors shows that they experienced several issues from Stack Overflow answers: mismatched solutions, outdated solutions, incorrect solutions, and buggy code. 85% of them are not aware of the CC BY-SA 3.0 license enforced by Stack Overflow, and 66% never check for license conflicts when reusing code snippets. Our clone detection found online clone pairs between 72,365 Java code snippets on Stack Overflow and 111 open source projects in the curated Qualitas corpus. We analysed 2,289 non-trivial online clone candidates. Our investigation revealed strong evidence that 153 clones have been copied from a Qualitas project to Stack Overflow. We found 100 of them (66%) to be outdated, of which 10 were buggy and harmful for reuse. Furthermore, we found 214 code snippets that could potentially violate the license of their original software and appear 7,112 times in 2,427 GitHub projects.
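The clone detection in that study was done with a dedicated large-scale detector; purely as an illustration of the idea, a token-shingle similarity can flag candidate clone pairs between a Stack Overflow snippet and project source for manual investigation (the 0.8 threshold below is an arbitrary, assumed value):

import re

def shingles(code: str, n: int = 5) -> set:
    # Tokenise into identifiers and single symbols, then form n-token shingles.
    tokens = re.findall(r"[A-Za-z_]\w*|\S", code)
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def similarity(snippet: str, source: str) -> float:
    a, b = shingles(snippet), shingles(source)
    return len(a & b) / len(a | b) if a | b else 0.0

# Pairs scoring above, say, 0.8 would then be inspected manually, mirroring the
# candidate-then-investigate workflow described in the abstract.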
Conference Paper
The Java platform and its third-party libraries provide useful features to facilitate secure coding. However, misusing them can cost developers time and effort, as well as introduce security vulnerabilities in software. We conducted an empirical study on StackOverflow posts, aiming to understand developers' concerns on Java secure coding, their programming obstacles, and insecure coding practices. We observed a wide adoption of the authentication and authorization features provided by Spring Security---a third-party framework designed to secure enterprise applications. We found that programming challenges are usually related to APIs or libraries, including the complicated cross-language data handling of cryptography APIs, and the complex Java-based or XML-based approaches to configure Spring Security. In addition, we reported multiple security vulnerabilities in the suggested code of accepted answers on the StackOverflow forum. The vulnerabilities included disabling the default protection against Cross-Site Request Forgery (CSRF) attacks, breaking SSL/TLS security through bypassing certificate validation, and using insecure cryptographic hash functions. Our findings reveal the insufficiency of secure coding assistance and documentation, as well as the huge gap between security theory and coding practices.
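One of the vulnerability classes mentioned above, use of insecure cryptographic hash functions, can be illustrated in a language-agnostic way (shown here in Python rather than Java for brevity):

import hashlib, os

password = b"correct horse battery staple"

# Insecure: MD5 is cryptographically broken and the digest is unsalted.
weak_digest = hashlib.md5(password).hexdigest()

# Safer: a salted, deliberately slow key-derivation function such as PBKDF2.
salt = os.urandom(16)
strong_digest = hashlib.pbkdf2_hmac("sha256", password, salt, iterations=600_000)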
Conference Paper
As the popularity of modern social coding platforms such as Stack Overflow grows, their potential security risks increase as well (e.g., insecure code can easily be embedded and distributed). To address this largely overlooked issue, in this paper we bring a new insight: exploiting social coding properties in addition to code content for the automatic detection of insecure code snippets on Stack Overflow. To determine whether given code snippets are insecure, we not only analyze the code content but also utilize various kinds of relations among users, badges, questions, answers, code snippets and keywords on Stack Overflow. To model these rich semantic relationships, we first introduce a structured heterogeneous information network (HIN) for representation and then use a meta-path-based approach to incorporate higher-level semantics and build up relatedness over code snippets. We then propose a novel network embedding model named snippet2vec for representation learning in the HIN, in which both the HIN structure and its semantics are maximally preserved. After that, a multi-view fusion classifier is constructed for insecure code snippet detection. To the best of our knowledge, this is the first work utilizing both code content and social coding properties to address code security issues on modern software coding platforms. Comprehensive experiments on data collected from Stack Overflow are conducted to validate the effectiveness of the developed system, ICSD, which integrates the proposed method for insecure code snippet detection, by comparison with alternative approaches.
Conference Paper
This paper studies software documentation quality on Stack Overflow from two perspectives: that of the questioners, who accept answers, and that of the community, which votes on answers. We show what developers can do to increase the chance that their questions or answers get accepted by the community or by the questioners. We found differing expectations about what information, such as code or images, should be included in a question or an answer. We evaluated six different quality indicators (such as Flesch Reading Ease or the presence of images) that a developer should consider before posting a question or an answer. In addition, we found different quality indicators for different types of questions, in particular error, discrepancy and how-to questions. Finally, we use a supervised machine-learning algorithm to predict whether an answer will be accepted or voted for.
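Flesch Reading Ease is a standard readability formula; the sketch below computes it with a crude vowel-group syllable heuristic rather than a dictionary lookup, so it is only an approximation of the indicator evaluated in the study:

import re

def flesch_reading_ease(text: str) -> float:
    # FRE = 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(max(1, len(re.findall(r"[aeiouyAEIOUY]+", w))) for w in words)
    n_words = max(1, len(words))
    return 206.835 - 1.015 * (n_words / sentences) - 84.6 * (syllables / n_words)

print(round(flesch_reading_ease("Call close() after the stream is used. Otherwise the handle leaks."), 1))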
Conference Paper
Programmers often consult an online Q&A forum such as Stack Overflow to learn new APIs. This paper presents an empirical study on the prevalence and severity of API misuse on Stack Overflow. To reduce manual assessment effort, we design ExampleCheck, an API usage mining framework that extracts patterns from over 380K Java repositories on GitHub and subsequently reports potential API usage violations in Stack Overflow posts. We analyze 217,818 Stack Overflow posts using ExampleCheck and find that 31% may have potential API usage violations that could produce unexpected behavior such as program crashes and resource leaks. Such API misuse is caused by three main reasons: missing control constructs, missing or incorrect order of API calls, and incorrect guard conditions. Even the posts that are accepted as correct answers or upvoted by other programmers are not necessarily more reliable than other posts in terms of API misuse. These results call for a new approach to augment Stack Overflow with alternative API usage details that are not typically shown in curated examples.
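As a hypothetical illustration of the "incorrect guard condition" misuse category (shown in Python rather than Java), the unguarded variant below mirrors the kind of snippet a mined API usage pattern would flag:

import re

def first_number(text: str) -> int:
    match = re.search(r"\d+", text)
    # Misuse pattern: calling match.group() directly raises AttributeError when
    # no digit is present; the mined pattern requires this guard first.
    if match is None:
        raise ValueError("no number found")
    return int(match.group())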
Article
Unreadable code can compromise program comprehension and lead to the introduction of bugs. Code consists mostly of natural-language text, in both identifiers and comments, and is a particular form of text. Nevertheless, the models proposed to estimate code readability take into account only structural aspects and visual nuances of source code, such as line length and the alignment of characters. In this paper, we extend our previous work in which we used textual features to improve code readability models. We introduce two new textual features, and we reassess the readability prediction power of readability models on more than 600 code snippets manually evaluated, in terms of readability, by more than 5,000 people. We also replicate a study by Buse and Weimer on the correlation between readability and FindBugs warnings, evaluating different models on 20 software systems, for a total of 3 million lines of code. The results demonstrate that (1) textual features complement other features and (2) a model containing all the features achieves significantly higher accuracy compared with all the other state-of-the-art models. Also, readability estimation from the more accurate combined model is able to predict FindBugs warnings more accurately.
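The structural and textual feature families described above can be sketched as follows; the specific features and the way they are computed here are illustrative and are not the authors' model:

import re

def readability_features(code: str) -> dict:
    lines = [l for l in code.splitlines() if l.strip()]
    identifiers = re.findall(r"[A-Za-z_]\w*", code)
    comment_like = [l for l in lines if l.strip().startswith(("#", "//"))]
    return {
        "avg_line_length": sum(map(len, lines)) / max(1, len(lines)),                 # structural
        "avg_identifier_length": sum(map(len, identifiers)) / max(1, len(identifiers)),  # textual
        "comment_ratio": len(comment_like) / max(1, len(lines)),                      # textual
    }

# These feature vectors would then feed a classifier trained on human readability ratings.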
Article
This synthesis study examined the reported use of credibility techniques in higher education evaluation articles that use qualitative methods. The sample included 118 articles published in six leading higher education evaluation journals from 2003 to 2012. Mixed methods approaches were used to identify key credibility techniques reported across the articles, document the frequency of these techniques, and describe their use and properties. Two broad sets of techniques were of interest: primary design techniques (i.e., basic), such as sampling/participant recruitment strategies, data collection methods, analytic details, and additional qualitative credibility techniques (e.g., member checking, negative case analyses, peer debriefing). The majority of evaluation articles reported use of primary techniques although there was wide variation in the amount of supporting detail; most of the articles did not describe the use of additional credibility techniques. This suggests that editors of evaluation journals should encourage the reporting of qualitative design details and authors should develop strategies yielding fuller methodological description.
Article
Online programming discussion platforms such as Stack Overflow serve as a rich source of information for software developers. The available information includes vibrant discussions and oftentimes ready-to-use code snippets. Anecdotes report that software developers copy and paste code snippets from these information sources for convenience. Such behavior results in a constant flow of community-provided code snippets into production software. To date, the impact of this behavior on code security has been unknown. We answer this highly important question by quantifying the proliferation of security-related code snippets from Stack Overflow in Android applications available on Google Play. Access to the rich source of information available on Stack Overflow, including ready-to-use code snippets, provides huge benefits for software developers. However, when it comes to code security there are some caveats to bear in mind: due to the complex nature of code security, it is very difficult to provide ready-to-use and secure solutions for every problem. Hence, integrating a security-related code snippet from Stack Overflow into production software requires caution and expertise. Unsurprisingly, we observed insecure code snippets being copied into Android applications that millions of users install from Google Play every day. To quantitatively evaluate the extent of this observation, we scanned Stack Overflow for code snippets and evaluated their security score using a stochastic gradient descent classifier. To identify code reuse in Android applications, we applied state-of-the-art static analysis. Our results are alarming: 15.4% of the 1.3 million Android applications we analyzed contained security-related code snippets from Stack Overflow, and of these, 97.9% contained at least one insecure code snippet.
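The classification step can be sketched with scikit-learn's SGDClassifier over simple token features; the snippets and labels below are hypothetical stand-ins for the labelled training data used in the study:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline

# Hypothetical labelled snippets standing in for the real training corpus.
train_snippets = [
    "TrustManager checkServerTrusted return",       # insecure: trust-all certificate check
    "SSLContext getInstance TLSv1.2 init default",  # secure: standard TLS setup
]
labels = ["insecure", "secure"]

model = make_pipeline(TfidfVectorizer(token_pattern=r"\w+"), SGDClassifier())
model.fit(train_snippets, labels)
print(model.predict(["HostnameVerifier verify return true"]))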
Conference Paper
Code fragments posted in answers on Q&A forums can form an important source of developer knowledge. However, effective reuse of code fragments found online often requires information other than the code fragment alone. We report on the results of a survey-based study to investigate to what extent developers perceive Stack Overflow code fragments to be self-explanatory. As part of the study, we also investigated the types of information missing from fragments that were not self-explanatory. We find that less than half of the Stack Overflow code fragments in our sample are considered to be self-explanatory by the 321 participants who answered our survey, and that the main issues that negatively affect code fragment understandability include incomplete fragments, code quality, missing rationale, code organization, clutter, naming issues, and missing domain information. This study is a step towards understanding developers' information needs as they relate to code fragments, and how these needs can be addressed.
Article
Context: Source code reuse has been widely accepted as a fundamental activity in software development. Recent studies showed that StackOverflow has emerged as one of the most popular resources for code reuse. Therefore, a plethora of work has proposed ways to optimally ask questions, search for answers and find relevant code on StackOverflow. However, little work studies the impact of code reuse from StackOverflow. Objective: To better understand the impact of code reuse from StackOverflow, we perform an exploratory study focusing on code reuse from StackOverflow in the context of mobile apps. Specifically, we investigate how much, why, when, and who reuses code. Moreover, to understand the potential implications of code reuse, we examine the percentage of bugs in files that reuse StackOverflow code. Method: We perform our study on 22 open source Android apps. For each project, we mine their source code and use clone detection techniques to identify code that is reused from StackOverflow. We then apply different quantitative and qualitative methods to answer our research questions. Results: Our findings indicate that 1) the amount of reused StackOverflow code varies for different mobile apps, 2) feature additions and enhancements in apps are the main reasons for code reuse from StackOverflow, 3) mid-age and older apps reuse StackOverflow code mostly later on in their project lifetime, and 4) in smaller teams/apps, more experienced developers reuse code, whereas in larger teams/apps, the less experienced developers reuse code the most. Additionally, we found that the percentage of bugs is higher in files after reusing code from StackOverflow. Conclusion: Our results provide insights into the potential impact of code reuse from StackOverflow on mobile apps. Furthermore, these results can benefit the research community in developing new techniques and tools to facilitate and improve code reuse from StackOverflow.