Conference Paper

Ranking Software Components for Pragmatic Reuse


Abstract

Pragmatic software reuse, in which existing software components are invasively adapted for use in new projects, involves three main activities: selection, adaptation and integration. Most of the academic research into pragmatic reuse to date has focused on the second of these activities, adaptation, especially the definition of reuse plans and the verification of invasive changes, even though the selection activity is arguably the most important and effort-intensive of the three. There is therefore a great deal of scope for improving the level of support provided by software search engines and recommendation tools to pragmatic reusers of software components. Test-driven search engines are particularly promising in this regard since they possess the inherent ability to "evaluate" components from the perspective of users' reuse scenarios. In this paper we discuss some of the main issues involved in improving the selection support for pragmatic reuse provided by test-driven search engines, describe some new metrics that can help address these issues, and present the outline of an approach for ranking software components in search results.


... Software metrics have been shown to be useful for characterizing different software properties [12,24,25,28]. For instance, some studies [24,25] apply software metrics to assess the component reusability of either single software systems or SPLs. Another study [12] uses metrics to guide the management of component-based software systems. ...
Conference Paper
Full-text available
A component-based software product line (SPL) consists of a set of software products that share common components. For proper SPL product composition, each component has to follow three principles: encapsulating a single feature, restricting data access, and being replaceable. However, it is known that developers usually introduce anomalous structures, i.e., code smells, while implementing components. These code smells might violate one or more component principles and hinder SPL product composition. Thus, developers should identify code smells in component-based SPLs, especially those affecting highly interconnected components, which are called critical components. Nevertheless, there is limited evidence of how smelly these critical components tend to be in component-based SPLs. To address this limitation, this paper presents a survey with developers of three SPLs. We ask these developers about their perceptions of a critical component. Then, we characterize critical components per SPL, and identify nine recurring types of code smells. Finally, we quantitatively assess the smelliness of the critical components. Our results suggest that: (i) critical components are ten times more prone to have code smells than non-critical ones; (ii) the most frequent code smell types affecting critical components violate several component principles together; and (iii) these smell types affect multiple SPL components.
... Develop-for-reuse is the process of producing software components that could be used in the development of software systems in the future [33], [34], [35]. Develop-by-reuse is the process of using existing software components in the development of a software system [36], which includes three main activities: selection, adaptation and integration [37]. ...
... Kessel and Atkinson [37] discussed the main issues involved in improving selection support for pragmatic reuse using test-driven search engines. Moreover, [9] proposed a set of metrics to address these issues and a new approach for ranking components in search results. ...
Article
Full-text available
For many decades, cost, time and quality have been the main concerns of software engineering. The main objective of any software organization is to produce a high-quality software product within a shorter time and at minimum cost. Software reuse is one of the main strategies concerned with using available resources to enhance the productivity of software development and the quality of software products. It aims at using existing software products and components in the development of new software systems. However, various types of software components, available from different sources, are used in the reuse strategy. This makes the reuse strategy confusing and its efficiency and effectiveness debatable. Selecting an unsuitable component or scenario makes reuse inefficient and ineffective. This study discusses the types of software components, their sources, characteristics and applicable scenarios for developing and reusing these components. A dataset from the literature is used to calculate and compare the cost of reuse processes. The results show that software reuse is an efficient strategy compared with normal development. Although considering the reusability of software components adds extra cost to normal development, it can efficiently save the cost of developing a new software system. Moreover, using existing software components in the development of a new reusable component is the most efficient strategy, requiring even less than the cost of developing a normal component.
... In this phase, the ability to adapt the developed asset so that it works with different systems in different environments is considered. Develop-by-reuse concerns adapting existing software assets in new systems in order to achieve specific requirements [20], [21]. In this phase, the ability to adapt and use the existing software asset is considered, in addition to its ability to achieve its intended functionality. ...
... CBSD has been recognized to play various roles in addressing the needs of particular applications [9]. While the reduced cost of component-based development comes from the relative ease with which components can be assembled, enabling reuse, such assembly may involve complexities of analysis, design, testing, and maintenance processes common in conventional software development [3], [6,7]. In addition, prior literature on software product line engineering [10] has established the link between the reusability of components in the product platform and the ability to increase product variety. ...
... A previous study [Vale et al. 2016] suggests a lack of specific metrics for component-based development, which could be helpful to characterize SPL critical components. Despite that, previous studies [Gill 2006, Her et al. 2007, Kessel and Atkinson 2015] often associate SPL critical components with conventional software metrics, mostly at the class level, such as high coupling between components. Therefore, instead of proposing novel metrics specific to component-based SPL, can we use conventional metrics, like coupling metrics, to help characterize critical components in component-based SPL? ...
Conference Paper
Full-text available
In component-based software product lines (SPL), each component has to encapsulate features, restrict data access, and be replaceable. For critical components, with multiple features and dependencies, these criteria are fundamental for flexible product configuration. Previous work assumes that coupling metrics help characterize critical components, but we lack empirical evidence for that assumption. By characterizing critical components, we could help developers identify components that require careful maintenance and evolution. This paper relies on five well-known coupling metrics to compose a strategy for characterizing critical components in component-based SPLs. Our results suggest the strategy has reasonable accuracy, but also indicate the need for additional metrics.
... In other work on components, Kessel and Atkinson discuss reuse of software components [12] focussing on partial matches for suitability in situations where a component's intended use differs from its ultimate use. For Event-B, in the development of high-integrity systems, it will be very important to fully understand the behaviour of a component and underspecification must be judiciously applied to accommodate unforeseen variability. ...
Conference Paper
Efficient reuse is a goal of many software engineering strategies and is useful in the safety-critical domain where formal development is required. Event-B can be used to develop safety-critical systems, but could be improved by a component-based reuse strategy. In this paper, we outline a component-based reuse methodology for Event-B. It provides a means for bottom-up scalability, and can also be used with the existing top-down approach. We describe the process of creating library components, their composition, and the specification of new properties (involving the composed elements). We introduce Event-B component interfaces and propose to use a diagrammatic representation of component instances (based on iUML-B) which can be used to describe the relationships between the composed elements. We also discuss the specification of communication flow across component boundaries and describe the additional proof obligations that are required.
... The research further reports that studies on maturity models of software reuse are limited and more was needed to be done in this area to help organizations in proper auditing of their maturity reuse level. Kessel and Atkinson (2015) discuss some of the main issues involved in improving the selection support for pragmatic reuse provided by test-driven search engines. It also describes some new metrics that could help address the issues and presents an approach for ranking components in search results. ...
Chapter
In the present world of innovation, where different software applications, simply called apps, flood the markets, application stability remains an important phenomenon in the software business. End-users, amongst other things, often want to know which release is the stable product of a version and how stable the product is compared to other releases and related products in the market. Declaration of a specific release as stable by the software developers may not sufficiently address the stability issues raised by end-users, as it does not state the percentage stability level of the product. This article presents a technique that utilizes the software maturity index (SMI) together with component ranking schemes to measure and determine an application's stability before it is finally released to the public as the stable product. A low stability percentage from the assessment indicates product immaturity and imminent changes to its behavior, functionality and API specifications. It could also provide reasonable guidance on the changes needed to further enhance its stability, whereas a high stability percentage could boost the confidence of the developers and the end-users in the product.
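The technique above builds on the software maturity index defined in IEEE Std 982.1, which treats the fraction of modules left untouched between releases as a stability measure. A minimal sketch of deriving a percentage stability level from it; the release counts below are hypothetical:

```python
def software_maturity_index(total_modules, added, changed, deleted):
    """IEEE 982.1 SMI: fraction of modules unchanged since the last release."""
    if total_modules <= 0:
        raise ValueError("total_modules must be positive")
    return (total_modules - (added + changed + deleted)) / total_modules

# Hypothetical release history: (total modules, added, changed, deleted)
releases = {
    "v1.0": (120, 30, 25, 5),
    "v1.1": (125, 10, 12, 3),
    "v1.2": (126, 2, 4, 1),
}

for name, counts in releases.items():
    # A rising SMI across releases signals the product is stabilizing.
    print(f"{name}: stability {software_maturity_index(*counts):.0%}")
```

An SMI approaching 1.0 (100%) indicates few modules changed since the prior release, matching the chapter's notion of a release mature enough to declare stable.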
Article
Full-text available
One of the biggest obstacles to software reuse is the cost involved in evaluating the suitability of possible reusable components. In recent years, code search engines have made significant progress in establishing the semantic suitability of components for new usage scenarios, but the problem of ranking components according to their non-functional suitability has largely been neglected. The main difficulty is that a component’s non-functional suitability for a specific reuse scenario is usually influenced by multiple, “soft” criteria, but the relative weighting of metrics for these criteria is rarely known quantitatively. What is required, therefore, is an effective and reliable strategy for ranking software components based on their non-functional properties without requiring users to provide quantitative weighting information. In this paper we present a novel approach for achieving this based on the non-dominated sorting of components driven by a specification of the relative importance of non-functional properties as a partial ordering. After describing the ranking algorithm and its implementation in a component search engine, we provide an explorative study of its properties on a sample set of components harvested from Maven Central.
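The core of the ranking approach described in this abstract is non-dominated (Pareto) sorting over non-functional properties. A simplified sketch, assuming just two hypothetical properties (test coverage, higher is better; complexity, lower is better) and omitting the paper's partial-ordering refinement of property importance:

```python
def dominates(a, b, higher_better):
    """a dominates b if it is no worse on every property and better on one."""
    no_worse = all(
        (a[p] >= b[p]) if hb else (a[p] <= b[p])
        for p, hb in higher_better.items()
    )
    strictly_better = any(
        (a[p] > b[p]) if hb else (a[p] < b[p])
        for p, hb in higher_better.items()
    )
    return no_worse and strictly_better

def non_dominated_fronts(components, higher_better):
    """Partition components into successive Pareto fronts; front 0 ranks first."""
    remaining = dict(components)
    fronts = []
    while remaining:
        front = sorted(
            name for name, props in remaining.items()
            if not any(dominates(other, props, higher_better)
                       for o_name, other in remaining.items() if o_name != name)
        )
        fronts.append(front)
        for name in front:
            del remaining[name]
    return fronts

# Hypothetical metric values for three candidate components
comps = {
    "A": {"coverage": 0.9, "complexity": 12},
    "B": {"coverage": 0.7, "complexity": 20},
    "C": {"coverage": 0.8, "complexity": 8},
}
print(non_dominated_fronts(comps, {"coverage": True, "complexity": False}))
# → [['A', 'C'], ['B']]: A and C are incomparable and both dominate B
```

The appeal for component ranking is exactly what the abstract notes: no quantitative weights are needed, only a direction of preference per property.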
Article
Full-text available
In the past two decades both the industry and the research community have proposed hundreds of metrics to track software projects, evaluate quality or estimate effort. Unfortunately, it is not always clear which metric works best in a particular context. Even worse, for some metrics there is little evidence whether the metric measures the attribute it was designed to measure. In this paper we propose a catalog format for software metrics as a first step towards a consolidated overview of available software metrics. This format is designed to provide an overview of the status of a metric in a glance, while providing enough information to make an informed decision about the use of the metric. We envision this format to be implemented in a (semantic) wiki to ensure that relationships between metrics can be followed with ease.
Conference Paper
Full-text available
A large number of software metrics have been proposed in the literature, but there is little understanding of how these metrics relate to one another. We propose a novel experimental technique, based on search-based refactoring, to assess software metrics and to explore relationships between them. Our goal is not to improve the program being refactored, but to assess the software metrics that guide the automated refactoring through repeated refactoring experiments. We apply our approach to five popular cohesion metrics using eight real-world Java systems, involving 300,000 lines of code and over 3,000 refactorings. Our results demonstrate that cohesion metrics disagree with each other in 55% of cases, and show how our approach can be used to reveal novel and surprising insights into the software metrics under investigation.
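The disagreement figure in this study can be illustrated with a small sketch: for two cohesion metrics, count the fraction of class pairs that the metrics order oppositely. The scores below are hypothetical, and this simple sign test ignores ties:

```python
from itertools import combinations

def disagreement(metric_a, metric_b):
    """Fraction of class pairs on which two metrics rank the classes
    in opposite orders (a Kendall-style discordance count)."""
    classes = list(metric_a)
    pairs = list(combinations(classes, 2))
    discordant = sum(
        1 for x, y in pairs
        # A negative product means the metrics order this pair oppositely.
        if (metric_a[x] - metric_a[y]) * (metric_b[x] - metric_b[y]) < 0
    )
    return discordant / len(pairs)

# Hypothetical cohesion scores from two metrics for four classes
lcom = {"C1": 0.2, "C2": 0.8, "C3": 0.5, "C4": 0.9}
tcc  = {"C1": 0.7, "C2": 0.4, "C3": 0.6, "C4": 0.8}
print(f"disagreement: {disagreement(lcom, tcc):.0%}")   # → disagreement: 50%
```

A value near the paper's reported 55% would mean the two metrics agree on which class is more cohesive only about as often as a coin flip.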
Article
Full-text available
A large amount of open source code is now available online, presenting a great potential resource for software developers. This has motivated software engineering researchers to develop tools and techniques to allow developers to reap the benefits of these billions of lines of source code. However, collecting and analyzing such a large quantity of source code presents a number of challenges. Although the current generation of open source code search engines provides access to the source code in an aggregated repository, they generally fail to take advantage of the rich structural information contained in the code they index. This makes them significantly less useful than Sourcerer for building state-of-the-art software engineering tools, as these tools often require access to both the structural and textual information available in source code. We have developed Sourcerer, an infrastructure for large-scale collection and analysis of open source code. By taking full advantage of the structural information extracted from source code in its repository, Sourcerer provides a foundation upon which state-of-the-art search engines and related tools can easily be built. We describe the Sourcerer infrastructure, present the applications that we have built on top of it, and discuss how existing tools could benefit from using Sourcerer.
Article
Full-text available
Background: Many papers are published on the topic of software metrics, but it is difficult to assess the current status of metrics research. Aim: This paper aims to identify trends in influential software metrics papers and assess the possibility of using secondary studies to integrate research results. Method: Search facilities in the SCOPUS tool were used to identify the most cited papers in the years 2000–2005 inclusive. Less cited papers were also selected from 2005. The selected papers were classified according to factors such as main topic, goal and type (empirical, theoretical or mixed). Papers classified as "Evaluation studies" were assessed to investigate the extent to which results could be synthesized. Results: Compared with less cited papers, the most cited papers were more frequently journal papers, and empirical validation or data analysis studies. However, there were problems with some empirical validation studies. For example, they sometimes attempted to evaluate theoretically invalid metrics and failed to appreciate the importance of the context in which data are collected. Conclusions: This paper, together with other similar papers, confirms that there is a large body of research related to software metrics. However, software metrics researchers may need to refine their empirical methodology before they can answer useful empirical questions.
Conference Paper
Full-text available
Our goal is to use the vast repositories of available open source code to generate specific functions or classes that meet a user's specifications. The key words here are specifications and generate. We let users specify what they are looking for as precisely as possible using keywords, class or method signatures, test cases, contracts, and security constraints. Our system then uses an open set of program transformations to map retrieved code into what the user asked for. This approach is implemented in a prototype system for Java with a web interface.
Conference Paper
Full-text available
This paper examines the reliability of implicit feedback generated from clickthrough data in WWW search. Analyzing the users' decision process using eyetracking and comparing implicit feedback against manual relevance judgments, we conclude that clicks are informative but biased. While this makes the interpretation of clicks as absolute relevance judgments difficult, we show that relative preferences derived from clicks are reasonably accurate on average.
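The finding that "relative preferences derived from clicks are reasonably accurate" underlies Joachims' click-versus-skip-above heuristic: a clicked result is preferred over any unclicked result ranked above it. A minimal sketch with a hypothetical search session:

```python
def preferences_from_clicks(ranking, clicked):
    """Derive pairwise relative preferences from clickthrough data:
    each clicked result is preferred over every unclicked result
    that was ranked above it (which the user presumably examined and skipped)."""
    clicked = set(clicked)
    prefs = []
    for i, doc in enumerate(ranking):
        if doc in clicked:
            prefs.extend(
                (doc, skipped)            # doc is preferred over skipped
                for skipped in ranking[:i]
                if skipped not in clicked
            )
    return prefs

# Hypothetical session: four results shown, the user clicked only d3
print(preferences_from_clicks(["d1", "d2", "d3", "d4"], {"d3"}))
# → [('d3', 'd1'), ('d3', 'd2')]: d3 preferred over the skipped d1 and d2
```

Note what the heuristic deliberately avoids: it never treats a click as an absolute relevance judgment, only as a preference relative to results the user scanned past, which is exactly the bias correction the abstract describes.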
Article
Full-text available
Construct validity is about the question of how we know that we're measuring the attribute we think we're measuring. This is discussed in formal, theoretical ways in the computing literature (in terms of the representational theory of measurement) but rarely in simpler ways that foster application by practitioners. Construct validity starts with a thorough analysis of the construct, the attribute we are attempting to measure. In IEEE Standard 1061, direct measures need not be validated. "Direct" measurement of an attribute involves a metric that depends only on the value of the attribute, but few or no software engineering attributes or tasks are so simple that measures of them can be direct. Thus, all metrics should be validated. The paper continues with a framework for evaluating proposed metrics, and applies it to two uses of bug counts. Bug counts capture only a small part of the meaning of the attributes they are being used to measure. Multidimensional analyses of attributes appear promising as a means of capturing the quality of the attribute in question. Analysis fragments run throughout the paper, illustrating the breakdown of an attribute or task of interest into sub-attributes for grouped study.
Conference Paper
One of the drawbacks of a pragmatic, white box approach to the reuse of software is that reusable components often have more built-in functionality than is needed for a particular (re)usage scenario. This functionality either has to be invasively removed by changing the source code of the component, with the corresponding risk of errors, or has to be incorporated into a new project where it essentially pollutes the code base. While this may not be an immediate problem, over time such unneeded, polluting functionality can decrease the understandability of software and make it harder to maintain. The degree of superfluous functionality built into a component, when considered for a new use for which it was not initially intended, is therefore a useful metric which should be taken into account when choosing components to reuse. For example, it can be used as an input for a code search engine's ranking algorithm. In this paper we present a family of metrics for measuring the superfluous functionality within software components from the perspective of specific (re)use scenarios, describe how these metrics can be calculated, and investigate their utility as a differentiating measure to help developers choose which components to reuse.
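The paper above defines a family of superfluity metrics. As an illustrative assumption (not the paper's exact definition), the simplest variant can be sketched as the fraction of a component's public methods that a given reuse scenario never touches:

```python
def superfluity(component_methods, scenario_methods):
    """Fraction of the component's methods unused by the reuse scenario:
    0.0 = perfect fit, values near 1.0 = mostly superfluous functionality."""
    component = set(component_methods)
    if not component:
        return 0.0
    used = component & set(scenario_methods)
    return 1.0 - len(used) / len(component)

# Hypothetical candidate: 8 public methods, the scenario exercises only 2
candidate = {"push", "pop", "peek", "size", "clear", "sort", "toXml", "clone"}
scenario = {"push", "pop"}
print(superfluity(candidate, scenario))   # → 0.75
```

A search engine's ranker could then prefer, among semantically matching candidates, the one with the lowest superfluity, since reusing it pollutes the host project with the least unneeded code.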
Article
In this introductory chapter, we map out code retrieval on the web as a research area and the organization of this book. Code retrieval on the web is concerned with the algorithms, systems, and tools that allow programmers to search for source code on the web, and with empirical studies of these inventions and practices. It is a label that we apply to a set of related research from software engineering, information retrieval, human-computer interaction, and management, as well as to commercial products. The division of code retrieval on the web into snippet remixing and component reuse is driven both by empirical data and by analysis of existing search engines and tools. © Springer Science+Business Media New York 2013. All rights reserved.
Chapter
The applicability of software reuse approaches in practice has long suffered from a lack of reusable material, but this situation has changed virtually overnight: the rise of the open source movement has made millions of software artifacts available on the Internet. Suddenly, the existing (largely text-based) software search solutions did not suffer from a lack of reusable material anymore, but rather from a lack of precision, as a query now might return thousands of potential results. In a reuse context, however, precisely matching results are the key to integrating reusable material into a given environment with as little effort as possible. Therefore a better way of formulating and executing queries is a core requirement for a broad application of software search and reuse. Inspired by the recent trend towards test-first software development approaches, we found test cases to be a practical vehicle for reuse-driven software retrieval and developed a test-driven code search system utilizing simple unit tests as semantic descriptions of desired artifacts. In this chapter we describe our approach and present an evaluation that underlines its superior precision when it comes to retrieving reusable artifacts.
Article
Traditional industrial practice often involves the ad hoc reuse of source code that was not designed for that reuse. Such pragmatic reuse tasks play an important role in disciplined software development. Pragmatic reuse has been seen as problematic due to a lack of systematic support, and an inability to validate that the reused code continues to operate correctly within the target system. Although recent work has successfully systematized support for pragmatic reuse tasks, the issue of validation remains unaddressed. In this paper, we present a novel approach and tool to semi‐automatically reuse and transform relevant portions of the test suite associated with pragmatically reused code, as a means to validate that the relevant constraints from the originating system continue to hold, while minimizing the burden on the developer. We conduct a formal experiment with experienced developers, to compare the application of our approach versus the use of a standard IDE (the ‘manual approach’). We find that, relative to the manual approach, our approach: reduces task completion time; improves instruction coverage by the reused test cases; and improves the correctness of the reused test cases. Copyright © 2012 John Wiley & Sons, Ltd.
Article
Many software reuse tasks involve reusing source code that was not designed in a manner conducive to those tasks, requiring that ad hoc modifications be applied. Such pragmatic reuse tasks are a reality in disciplined industrial practice; they arise for a variety of organizational and technical reasons. To investigate a pragmatic reuse task, a developer must navigate through, and reason about, source code dependencies in order to identify program elements that are relevant to the task and to decide how those elements should be reused. The developer must then convert his mental model of the task into a set of actions that he can perform. These steps are poorly supported by modern development tools and practices. We provide a model for the process involved in performing a pragmatic reuse task, including the need to capture (mentally or otherwise) the developer's decisions about how each program element should be treated: this is a pragmatic-reuse plan. We provide partial support for this model via a tool suite, called Gilligan; other parts of the model are supported via standard IDE tools. Using a pragmatic-reuse plan, Gilligan can semiautomatically transform the selected source code from its originating system and integrate it into the developer's system. We have evaluated Gilligan through a series of case studies and experiments (each involving industrial developers) using a variety of source systems and tasks; we report in particular on a previously unpublished, formal experiment. 
The results show that pragmatic-reuse plans are a robust metaphor for capturing pragmatic reuse intent and that, relative to standard IDE tools, Gilligan can (1) significantly decrease the time that developers require to perform pragmatic reuse tasks, (2) increase the likelihood that developers will successfully complete pragmatic reuse tasks, (3) decrease the time required by developers to identify infeasible reuse tasks, and (4) improve developers' sense of their ability to manage the risk in such tasks.
Test-driven reuse: Key to improving precision of search engines for software reuse
  • O Hummel
  • W Janjic
Semantic component retrieval in software engineering
  • O Hummel