User-Perceived Reusability Estimation Based on
Analysis of Software Repositories
Michail Papamichail, Themistoklis Diamantopoulos, Ilias Chrysovergis, Philippos Samlidis, Andreas Symeonidis
Electrical and Computer Engineering Dept., Aristotle University of Thessaloniki
Thessaloniki, Greece
mpapamic@issel.ee.auth.gr, thdiaman@issel.ee.auth.gr, iliachry@ece.auth.gr, filippas@ece.auth.gr, asymeon@eng.auth.gr
Abstract—The popularity of open-source software repositories has led to a new reuse paradigm, where online resources can be thoroughly analyzed to identify reusable software components. Obviously, assessing the quality and specifically the reusability potential of source code residing in open software repositories poses a major challenge for the research community. Although several systems have been designed towards this direction, most of them do not focus on reusability. In this paper, we define and formulate a reusability score by employing information from GitHub stars and forks, which indicate the extent to which software components are adopted/accepted by developers. Our methodology involves applying and assessing different state-of-the-practice machine learning algorithms, in order to construct models for reusability estimation at both class and package levels. Preliminary evaluation of our methodology indicates that our approach can successfully assess reusability, as perceived by developers.
Index Terms—source code quality, reusability, static analysis, user-perceived quality
I. INTRODUCTION
The concept of software reuse is not new; developers have
always tried to reuse standalone sections of (own or others’)
code, by taking advantage of assets that have already been
implemented and released, either wrapped within software
libraries or even as components of applications. The benefits of
this paradigm include the reduction of time and effort required
for software development. Up until recently, the quest for
high-quality code to reuse was restricted within organizations,
or even within groups, which obviously led to suboptimal
searches, given the limited number of projects considered.
During the last decade, however, the popularity of the open-
source software paradigm has created the critical mass of
online software projects needed, and this deluge of source
code lying in online repositories has altered the main challenge
of software reuse. Finding software components that satisfy
certain functional requirements has become easy. Instead, what
is now important is to identify components that are suitable for
reuse. Assessing the reusability of a component before using
it is crucial, since components of poor quality are usually hard
to integrate and in some cases they even introduce faults.
The concept of reusability is linked to software quality,
and assessing the quality of a component is a challenging
task. Quality is multi-faceted and evaluated differently by
different people, while it also depends on the scope and the
requirements of each software project [1]. Should one follow
the ISO/IEC 25010:2011 [2] and ISO/IEC 9126 [3] standards,
software reusability, i.e. the extent to which a software component is reusable, is related to four major quality properties: Understandability, Learnability, Operability and Attractiveness. These properties are directly associated with Usability, and further affect Functional Suitability, Maintainability and Portability, thus covering all four quality attributes related to reusability [4], [5] (the remaining characteristics are Reliability, Performance and Efficiency, Security, and Compatibility).
Some of the aforementioned attributes can be effectively
defined by using static analysis metrics, such as the well-known CK metrics [6], which have been extensively used for estimating
software quality [7], [8]. However, one should mention that
current research efforts largely focus on quality characteristics
such as maintainability or security, and assessing reusability
has not yet been extensively addressed. Furthermore, most
quality models depend on fixed thresholds for the static
analysis metrics, usually defined by an expert [9], and are
not efficient enough for quality estimation in all cases. On the
other hand, adaptable threshold models suffer more or less
from the same limitations, as their ground truth is again an
expert-defined quality value [10]. To this end, we argue that
an interesting alternative involves employing user-perceived
quality as a measure of the quality of a software component,
an approach initially explored in [11].
In this work, we employ the concepts proposed in [11] so
that we associate the extent to which a software component
is adopted (or preferred) by developers, i.e. its popularity,
with the extent to which it is reusable, i.e. its reusability.
Typically, popularity can be determined by the number of
stars and forks of GitHub repos. Thus, an important challenge
involves relating this information to reusability. We argue that
addressing this challenge requires constructing a methodology that quantifies the influence of various static analysis metrics on the reusability degree of software. Unlike various systems
that employ expert-based approaches, our methodology uses
GitHub stars and forks as ground truth information towards
identifying the static analysis metrics that influence reusability.
Upon computing a large set of metrics both at class and
package level, we model their behavior to translate their values
into a reusability score. Metric behaviors are then used to train
a set of reusability estimation models using different state-
of-the-practice machine learning algorithms. Those models
estimate the reusability degree of components at both class
and package level, as perceived by developers.
II. RELATED WORK
ISO/IEC 25010:2011 [2] defines reusability as a quality
characteristic that refers to the degree to which an asset
can be used in more than one system, or for building other
assets. There are several approaches that aspire to assess the
reusability of source code components using static analysis
metrics [12], [13]. This assessment, however, is a non-trivial
task, and often requires the aid of quality experts to examine
and evaluate the source code. Since, however, the manual
examination of source code can be very tedious, a common
practice involves creating a benchmarking code repository us-
ing representative software projects and then applying machine
learning techniques in order to calculate thresholds and define
the acceptable metric intervals [14]–[16].
Further advancing the aforementioned systems, some re-
search efforts attempt to derive reusability by setting thresh-
olds for quality metrics. Diamantopoulos et al. [4] proposed
a metric-based reusability scorer that calculates the degree
of reusability of software components based on the values
of eight static analysis metrics. The assessment depends on
whether the values of the metrics exceed certain thresholds,
as defined in current literature. When several thresholds are
exceeded, the returned reusability score is lower.
Since using predefined metric thresholds is also limited by
expert knowledge and may not be applicable on the specifics
of different software projects (e.g. the scope of an application),
several approaches have been proposed to overcome the neces-
sity of using thresholds [17]–[20]. These approaches involve
quantifying reusability through reuse-related information such
as reuse frequency [17]. Then, machine learning techniques
are employed in order to train reusability evaluation models
using as input the values of static analysis metrics [18]–[20].
Although the aforementioned approaches can be effective in certain cases, their applicability in real-world scenarios is limited. First, using predefined metric thresholds [4] leads to models that are unable to incorporate the different characteristics of software projects. Automated reusability evaluation systems seem to overcome this issue [18]–[20]; however, they are still confined by the ground truth knowledge of quality experts for evaluating the source code and determining whether the degree of reuse is acceptable. Apart from its cost in both time and resources, this process may also lead to subjective evaluations, as each expert may prioritize the importance of each quality characteristic differently [21].
In this work, we build a reusability estimation system to pro-
vide a single score for every class and every package of a given
software component. The estimation is based on the values of
a large set of static analysis metrics and measures the extent to
which the component is preferred by developers. We adopt the
paradigm proposed in [11] and further extend it in the context
of reusability. The authors in [11] employ the information of
GitHub stars and forks, and through expert knowledge formu-
late and subsequently estimate software quality as perceived
by developers. Instead, we initially evaluate the impact of each individual metric on the reusability degree of software components, and then aggregate the outcome of our analysis in
order to construct a final reusability score. Finally, we quantify
the reusability degree of software components both at class
and package level by training reusability estimation models
that effectively estimate the degree to which a component is
reusable as perceived by software developers.
III. REUSABILITY MODELLING
In this section, we present an analysis of reusability from
a quality attributes perspective and design a reusability score
using GitHub information and static analysis metrics.
A. GitHub Popularity as Reusability Indicator
We associate reusability with the following major properties described in ISO/IEC 25010 [2] and ISO/IEC 9126 [3]: Understandability, Learnability, Operability and Attractiveness. According to research performed by ARiSA [22], various static analysis metrics are highly related to these properties. Table I summarizes the relations between the six main categories of static analysis metrics and the aforementioned reusability-related quality properties, where "P" denotes a proportional relation and "IP" an inverse-proportional relation.
TABLE I
CATEGORIES OF STATIC ANALYSIS METRICS RELATED TO REUSABILITY

Source Code Property | Understandability (forks) | Learnability (forks) | Operability (forks) | Attractiveness (stars)
Complexity           | IP | IP | IP | P
Coupling             | IP | IP | IP | P
Cohesion             | P  | P  | P  | P
Documentation        | P  | P  |    |
Inheritance          | IP | IP | –  | P
Size                 | IP | IP | IP | P

P: Proportional Relation, IP: Inverse-Proportional Relation
As already mentioned, we employ GitHub stars and forks in
order to quantify reusability and subsequently associate it with
the aforementioned properties. As forks measure how many
times the software repository has been cloned, they can be as-
sociated with Understandability,Learnability and Operability,
as those properties formulate the degree to which a software
component is [re]usable. Stars, on the other hand, reflect the
number of developers that found the repository interesting,
thus we may use them a measure of its Attractiveness.
B. Benchmark Dataset
We created a dataset that includes the values of the static
analysis metrics shown in Table II, for the 100 most starred
and 100 most forked GitHub Java projects (137 in total).
These projects amount to more than 12M lines of code spread
in almost 15K packages and 150K classes. All metrics were
extracted at class and package level using SourceMeter¹.
¹ https://www.sourcemeter.com/
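For illustration, a dataset of this kind can be assembled through the public GitHub search API, as in the following sketch. The endpoint and parameters come from the GitHub REST API; the deduplication logic mirrors the 100+100 selection yielding 137 unique projects, but the code is an illustrative assumption rather than the exact crawler used in this work.

```python
# Illustrative sketch (not the actual crawler): retrieve the 100 most starred
# and the 100 most forked Java repositories via the GitHub search API.
import requests

def top_java_repos(sort_key, count=100):
    """Return (full_name, stars, forks) for the top Java repos by sort_key."""
    resp = requests.get(
        "https://api.github.com/search/repositories",
        params={"q": "language:java", "sort": sort_key,
                "order": "desc", "per_page": count},
    )
    resp.raise_for_status()
    return [(item["full_name"], item["stargazers_count"], item["forks_count"])
            for item in resp.json()["items"]]

# Union of the two lists; projects appearing in both are counted once,
# which is how 100 + 100 selections collapse to 137 unique repositories.
dataset = {name: (stars, forks)
           for name, stars, forks in top_java_repos("stars") + top_java_repos("forks")}
```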
TABLE II
OVERVIEW OF STATIC METRICS AND THEIR APPLICABILITY ON DIFFERENT LEVELS

Type          | Name            | Description                                             | Class | Package
Complexity    | NL{·,E}         | Nesting Level {Else-If}                                 |   ×   |
              | WMC             | Weighted Methods per Class                              |   ×   |
Coupling      | CBO{·,I}        | Coupling Between Object classes {Inverse}               |   ×   |
              | N{I,O}I         | Number of {Incoming, Outgoing} Invocations              |   ×   |
              | RFC             | Response set For Class                                  |   ×   |
Cohesion      | LCOM5           | Lack of Cohesion in Methods 5                           |   ×   |
Documentation | AD              | API Documentation                                       |       |    ×
              | {·,T}CD         | {Total} Comment Density                                 |   ×   |    ×
              | {·,T}CLOC       | {Total} Comment Lines of Code                           |   ×   |    ×
              | DLOC            | Documentation Lines of Code                             |   ×   |
              | P{D,U}A         | Public {Documented, Undocumented} API                   |   ×   |    ×
              | TAD             | Total API Documentation                                 |       |    ×
              | TP{D,U}A        | Total Public {Documented, Undocumented} API             |       |    ×
Inheritance   | DIT             | Depth of Inheritance Tree                               |   ×   |
              | NO{A,C,D,P}     | Number of {Ancestors, Children, Descendants, Parents}   |   ×   |
Size          | {·,T}{·,L}LOC   | {Total} {Logical} Lines of Code                         |   ×   |    ×
              | N{A,G,M,S}      | Number of {Attributes, Getters, Methods, Setters}       |   ×   |    ×
              | N{CL,EN,IN,P}   | Number of {Classes, Enums, Interfaces, Packages}        |       |    ×
              | NL{A,G,M,S}     | Number of Local {Attributes, Getters, Methods, Setters} |   ×   |
              | NLP{A,M}        | Number of Local Public {Attributes, Methods}            |   ×   |
              | NP{A,M}         | Number of Public {Attributes, Methods}                  |   ×   |    ×
              | NOS             | Number of Statements                                    |   ×   |
              | TNP{CL,EN,IN}   | Total Number of Public {Classes, Enums, Interfaces}     |       |    ×
              | TN{CL,DI,EN,FI} | Total Number of {Classes, Directories, Enums, Files}    |       |    ×
C. Evaluation of Metrics’ Influence on Reusability
As GitHub stars and forks refer to the repository level, they are not adequate on their own for estimating the reusability of class-level and package-level components. Thus, we estimate
reusability using static analysis metrics. For each metric, we
first perform distribution-based binning and then relate its
values to those of the stars/forks to incorporate reusability
information. The final reusability estimation is computed by
aggregating over the estimations derived by each metric.
1) Binning Based on the Distribution of Values: As the
values of each metric are distributed differently among the
repositories of our dataset, we first define a set of intervals
(bins), unified across repositories, that approximate the actual
distribution of the values. Thus, we use values from all
packages (or classes) of our dataset to formulate a generic
distribution, and then determine the optimal bin size that
results in the minimum information loss. We use the Doane
formula [23] for selecting the bin size in order to account for
the skewness of the data. Figure 1 depicts the histogram of
the Comments Density (CD) metric at package level, which
will be used as an example throughout this section. Following
our binning strategy, 20 bins are produced. CD values appear
to have positive skewness, while their highest frequency is in
the interval [0.13, 0.39]. After having selected the appropriate
bins for each metric, we construct the histograms for each of the 137 repositories. An example for the histograms of two repositories is shown in Figure 2. The two distributions differ, which is expected since each repository has its own characteristics (different scope, contributors, etc.).
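As a minimal sketch of this binning step, note that NumPy implements the Doane estimator directly, so a single set of bin edges derived from the pooled distribution can be reused for every repository; the file and variable names below are hypothetical.

```python
import numpy as np

# All package-level values of a metric (here CD), pooled across
# repositories; the file name is hypothetical.
cd_all = np.loadtxt("cd_package_values.txt")

# One shared set of bin edges computed with the Doane formula [23];
# for CD this yields the 20 bins of Figure 1.
edges = np.histogram_bin_edges(cd_all, bins="doane")

def repo_histogram(repo_values):
    """Histogram of a single repository over the shared bins (cf. Figure 2)."""
    counts, _ = np.histogram(repo_values, bins=edges)
    return counts
```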
[Figure 1 shows the pooled histogram: x-axis, Comments Density (CD) in 20 bins from [0.0, 0.04) to [0.82, 0.87); y-axis, frequency (0–1000).]
Fig. 1. Package-level Distribution of Comments Density for all Repositories.
[Figure 2 overlays the CD histograms of Repository 1 (1751 stars) and Repository 2 (4550 stars) over the same 20 bins; y-axis, frequency (0–50).]
Fig. 2. Package-level Distribution of Comments Density for two Repositories.
2) Relating Bin Values to GitHub Stars and Forks: In this step, we use the produced histograms (one for each repository, built over the bins calculated in the previous step) to construct a set of data instances that relate each metric bin value (here the CD) to a GitHub stars (or forks) value. That is, for metric X and bin 1, we gather all stars (or forks) values that correspond to packages (or classes) for which the metric value lies in bin 1 and aggregate them using an averaging method. This process is repeated for every metric and every bin.
Practically, for each bin value of the metric, we gather all
relevant data instances and calculate the weighted average of
their stars (or forks) count, which represents the stars-based
(or forks-based) reusability value for the specific bin. The
reusability scores are defined using the following equations:
$$RS_{Metric}(i) = \sum_{repo=1}^{N} freq_{p.u.}(i) \cdot \log\big(S(repo)\big) \quad (1)$$

$$RF_{Metric}(i) = \sum_{repo=1}^{N} freq_{p.u.}(i) \cdot \log\big(F(repo)\big) \quad (2)$$
where $RS_{Metric}(i)$ and $RF_{Metric}(i)$ refer to the stars-based and the forks-based reusability score of the i-th bin for the metric under evaluation, respectively. $S(repo)$ and $F(repo)$ refer to the number of stars and the number of forks of the repository, respectively, while the logarithm acts as a smoothing factor for the large differences in the number of stars and forks among the repositories. Finally, the term $freq_{p.u.}(i)$ is the normalized (relative) frequency of the metric value of the i-th bin, defined as $F_i / \sum_{i=1}^{N}(F_i)$, where $F_i$ is the absolute frequency (i.e. count) of the values lying in the i-th bin. For example, if a repository had 3 bins with 5 CD values in bin 1, 8 values in bin 2, and 7 values in bin 3, then the normalized frequency for bin 1 would be 5/(5 + 8 + 7) = 0.25, for bin 2 it would be 8/(5 + 8 + 7) = 0.4, and for bin 3 it would be 7/(5 + 8 + 7) = 0.35. The use of normalized frequency was chosen to eliminate any biases caused by the high variance in the number of packages among the different repositories.
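A minimal sketch of equations (1) and (2) follows, assuming histograms maps each repository to its per-bin counts from the previous step and stars maps each repository to its star count; the final rescaling to [0, 1] is our assumption, made so that the per-bin scores are comparable across metrics as in Figure 3.

```python
import numpy as np

def stars_based_scores(histograms, stars):
    """Per-bin stars-based reusability scores RS_Metric(i) of equation (1)."""
    n_bins = len(next(iter(histograms.values())))
    rs = np.zeros(n_bins)
    for repo, counts in histograms.items():
        freq_pu = counts / counts.sum()       # normalized (relative) frequency
        rs += freq_pu * np.log(stars[repo])   # log smooths star-count differences
    # Rescaling to [0, 1] is an assumption, for cross-metric comparability.
    return (rs - rs.min()) / (rs.max() - rs.min())

# Equation (2) is identical, with forks counts in place of stars.
```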
Figure 3 illustrates the results of applying (1), i.e. the stars-based reusability score, to the AD metric values at package level. As shown in the figure, the reusability score based on AD is maximized for AD values in the interval [0.3, 0.4].
[Figure 3 plots the stars-based score (y-axis, 0–1) against Api Documentation (AD) values (x-axis, 0–1) at package level.]
Fig. 3. Api Documentation versus Star-based Reusability Score.
Finally, based on the fact that we compute a forks-based
and a stars-based reusability score for each metric, the final
reusability score for each source code component (class or
package) is given by the following equations:
$$RS_{Final} = \frac{\sum_{j=1}^{k} RS_{Metric}(j) \cdot corr(metric_j, stars)}{\sum_{j=1}^{k} corr(metric_j, stars)} \quad (3)$$

$$RF_{Final} = \frac{\sum_{j=1}^{k} RF_{Metric}(j) \cdot corr(metric_j, forks)}{\sum_{j=1}^{k} corr(metric_j, forks)} \quad (4)$$

$$ReusabilityScore = \frac{3 \cdot RF_{Final} + RS_{Final}}{4} \quad (5)$$
where k is the number of metrics at each level. $RS_{Final}$ and $RF_{Final}$ correspond to the final stars-based and forks-based scores. $RS_{Metric}(j)$ and $RF_{Metric}(j)$ refer to the scores for the j-th metric as given by equations (1) and (2), while $corr(metric_j, stars)$ and $corr(metric_j, forks)$ correspond to the Pearson correlation coefficients between the values of the j-th metric and the stars and forks, respectively. Finally, ReusabilityScore is the final reusability score, computed as the weighted average of the final stars-based and forks-based scores. More weight (3 vs 1) is given to the forks-based score, as it is associated with more reusability-related quality attributes (see Table I).
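The aggregation of equations (3)-(5) can be sketched as follows, assuming the per-metric scores of a component have already been obtained; variable names are illustrative, and metric_values holds one array of raw metric values per metric, aligned with the star/fork counts of the components' repositories.

```python
import numpy as np
from scipy.stats import pearsonr

def reusability_score(rs_per_metric, rf_per_metric, metric_values, stars, forks):
    """Combine per-metric scores into the final score of equation (5)."""
    # Correlation-based weights; pearsonr returns (coefficient, p-value).
    w_s = np.array([pearsonr(m, stars)[0] for m in metric_values])
    w_f = np.array([pearsonr(m, forks)[0] for m in metric_values])
    rs_final = np.dot(rs_per_metric, w_s) / w_s.sum()    # equation (3)
    rf_final = np.dot(rf_per_metric, w_f) / w_f.sum()    # equation (4)
    return (3 * rf_final + rs_final) / 4                 # equation (5)
```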
IV. REUSABILITY ESTIMATION
In this section, we devise a methodology that receives as input the values of static analysis metrics and estimates software reusability at class and package level. Specifically, we use the calculated bins (see Section III) to train one model for each individual metric at each level. The output of each model is a reusability score derived from the value of the class (or package) under evaluation for the corresponding metric. All scores are then aggregated into a final reusability score that represents the degree to which the class (or package) is adopted by developers.
We evaluate three techniques in order to select the optimal one for fitting the metrics' behavior: Support Vector Regression (SVR) with a radial basis function (RBF) kernel, Random Forest using a bagging ensemble, and Polynomial Regression, where the optimal degree is determined by applying the elbow method to the residual sum of squares. To account for cases where the number of bins, and consequently the number of training data points, is low, we used linear interpolation up to the point where the dataset for each model contained 60 instances.
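A sketch of this fitting step with scikit-learn is shown below; the bin values (X) and per-bin reusability scores (y) come from Section III, while the polynomial degree and forest size are illustrative placeholders (the degree is selected with the elbow method in practice).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.svm import SVR

def fit_metric_models(X, y):
    """Fit the three candidate regressors on one metric's (bin, score) pairs.
    X must be sorted in increasing order for the interpolation step."""
    # Linear interpolation up to 60 instances when the number of bins is low.
    if len(X) < 60:
        X_dense = np.linspace(X.min(), X.max(), 60)
        y = np.interp(X_dense, X, y)
        X = X_dense
    models = {
        "SVR (RBF)": SVR(kernel="rbf"),
        "Random Forest": RandomForestRegressor(n_estimators=100),  # bagging ensemble
        "Polynomial": make_pipeline(PolynomialFeatures(degree=5),  # degree via elbow
                                    LinearRegression()),
    }
    for model in models.values():
        model.fit(X.reshape(-1, 1), y)  # one model per metric and per level
    return models
```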
Figure 4 illustrates the fitting procedure for the case of
the forks-based reusability score based on the values of the
API Documentation (AD) metric at package level. The figure
depicts four lines: one corresponding to the actual behavior of the metric, and three more, one for each model. Random Forest clearly outperforms the other two models (SVR and Polynomial Regression). This is also reflected in the Mean Absolute Error values for the AD metric, which are 0.0688, 0.1201 and 0.1228, respectively.
[Figure 4 plots the actual forks-based reusability score against Api Documentation (AD) values (x-axis, 0.1–0.9) at package level, together with the fitted curves of the three models; y-axis, reusability score (forks, package), 0–1.]
Fig. 4. Fitting procedure for the Api Documentation metric at Package level.
We further compare the three models using the Normalized
Root Mean Squared Error (NRMSE) metric. Given the actual
scores $y_1, y_2, \ldots, y_N$, the predicted scores $\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_N$, and the mean actual score $\bar{y}$, the NRMSE is calculated as follows:

$$NRMSE = \sqrt{\frac{\frac{1}{N}\sum_{i=1}^{N}(\hat{y}_i - y_i)^2}{\sum_{i=1}^{N}(\bar{y} - y_i)^2}} \quad (6)$$
where N is the number of samples in the dataset. NRMSE was selected as it not only takes into account the average difference between the actual and the predicted values, but also provides a comprehensible result on a fixed scale.
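Equation (6) translates directly into code; a minimal sketch:

```python
import numpy as np

def nrmse(y_true, y_pred):
    """Normalized Root Mean Squared Error of equation (6)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    num = np.mean((y_pred - y_true) ** 2)           # (1/N) * sum of squared errors
    den = np.sum((np.mean(y_true) - y_true) ** 2)   # spread around the mean score
    return np.sqrt(num / den)
```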
Figure 5 presents the mean NRMSE for the three algorithms
regarding the reusability scores (forks/stars based) at both class
and package levels. The Random Forest clearly outperforms
the other two algorithms in all four categories.
[Figure 5 shows grouped bars of the average NRMSE (y-axis, 0–0.1) of Random Forest, Polynomial Regression and SVR for the forks-based (RF_Score) and stars-based (RS_Score) reusability scores, at package level and at class level.]
Fig. 5. Average NRMSE for all three machine learning algorithms.
Figure 6 depicts the distribution of the reusability score at class and package levels. As expected, the score in both cases follows an approximately normal distribution, with the majority of instances accumulated evenly around 0.5. For the score at class level, we observe a left-sided skewness. Manual inspection of the classes with scores in [0.2, 0.25] shows that they contain little valuable information (e.g. most of them have LOC < 10) and are thus given a low score.
[Figure 6 shows histograms of the ReusabilityScore over [0, 1]: (a) class level, with frequencies up to 16000; (b) package level, with frequencies up to 2000.]
Fig. 6. Reusability score distribution at (a) Class and (b) Package level.
V. EVALUATION
To evaluate our approach, we used the 5 projects shown in Table III, out of which 3 were retrieved from GitHub and 2 were automatically generated using the tools of S-CASE².
Human-generated projects are expected to exhibit high devia-
tions in their reusability score, as they include variable sets of
components. By contrast, auto-generated projects are RESTful
services and, given their automated structure generation, are
expected to have high reusability scores and low deviations.
² http://s-case.github.io/
TABLE III
DATASET STATISTICS

# | Project Name    | Type            | # Packages | # Classes
1 | realm-java      | Human-generated |        137 |      3859
2 | liferay-portal  | Human-generated |       1670 |      3941
3 | spring-security | Human-generated |        543 |      7099
4 | Webmarks        | Auto-generated  |          9 |        27
5 | WSAT            | Auto-generated  |         20 |       127
A. Reusability Estimation Evaluation
Figures 7a and 7b depict the distributions of the reusability
score for all projects at class level and package level, re-
spectively. The boxplots in blue refer to the auto-generated
projects, while the ones in red refer to the human-generated
ones. First, it is evident that the variance of the reusability scores is higher in the human-generated projects than in the auto-generated ones. This is expected, since the automatically generated projects have proper architecture and high abstraction levels. The two auto-generated projects also have similar reusability values, which is due to the fact that projects generated by the same tool ought to share similar characteristics that are reflected in the values of the static analysis metrics.
The high variance for the human-generated projects in-
dicates that our methodology is capable of distinguishing
components with both high and low degrees of reusability.
Finally, given the reusability score distribution for all projects,
we conclude that the results are consistent regardless of the
project size. Despite the fact that the number of classes and
the number of packages vary from very low values (e.g. only 9
packages and 27 classes) to very high values (e.g. 1670 pack-
ages and 7099 classes), the reusability score is not affected.
B. Example Reusability Estimation
To further assess the validity of the reusability scores, we manually examined the static analysis metrics of sample classes and packages in order to check whether they align with the estimation. Table IV provides an overview of a subset of the computed metrics for representative examples of classes and packages that received high and low reusability scores.
Concerning the components that received a high reusability score (Class 1 and Package 1), they appear to be well documented (their Comments Density (CD) values are 20.31% and 41.5%, respectively), which indicates that they are suitable for reuse. In addition, their values for the Lines of Code (LOC) metric, which is highly correlated with understandability and thus reusability, are typical. Thus, the scores for those components are rational. On the other hand, the class that received a low score (Class 2) appears to have low cohesion (high LCOM5 value) and high coupling (high RFC value), while the package with a low score (Package 2) appears to contain little valuable information (only 38 LOC). These code properties are crucial for the reusability-related quality attributes, and thus the low scores are expected.
[Figure 7 shows per-project boxplots of the ReusabilityScore over [0, 1] at class level (a) and package level (b).]
Fig. 7. Boxplots depicting Reusability Distributions for 3 Human-generated (red) and 2 Auto-generated (blue) projects, (a) at Class level and (b) at Package level.
TABLE IV
METRICS FOR CLASSES AND PACKAGES WITH DIFFERENT REUSABILITY

Metrics           | Class 1 | Class 2 | Package 1 | Package 2
WMC               |      14 |      12 |         – |         –
CBO               |      12 |       3 |         – |         –
LCOM5             |       2 |      11 |         – |         –
CD (%)            |   20.31 |    10.2 |      41.5 |       0.0
RFC               |      12 |      30 |         – |         –
LOC               |      84 |     199 |      2435 |        38
TNCL              |       – |       – |         8 |         2
Reusability Score |  95.78% |   10.8% |    90.59% |    16.75%
VI. CONCLUSION AND FUTURE WORK
In this work, we proposed a novel software reusability estimation approach based on the rationale that the reusability of a component is associated with the way the component is perceived by developers; as such, it could be useful for assessing the reusability of a component before integrating it into one's source code. Our evaluation indicates that our approach can be effective for estimating reusability at class and package level.
Concerning future work, an interesting idea would be to
investigate and possibly redesign the reusability score in a
domain-specific context. Finally, evaluating our system under
realistic reuse scenarios, possibly also involving software
developers, would be useful to further validate our approach.
ACKNOWLEDGMENT
Parts of this work have been supported by the European
Union’s Horizon 2020 research and innovation programme
(grant agreement No 693319).
REFERENCES
[1] S. Pfleeger and B. Kitchenham, “Software quality: The elusive target,” IEEE Software, pp. 12–21, 1996.
[2] “ISO/IEC 25010:2011,” https://www.iso.org/obp/ui/#iso:std:iso-iec:25010:ed-1:v1:en, 2011, [Online; accessed October 2017].
[3] “ISO/IEC 9126-1:2001,” https://www.iso.org/standard/22749.html,
2001, [Online; accessed October 2017].
[4] T. Diamantopoulos, K. Thomopoulos, and A. Symeonidis, “QualBoa:
reusability-aware recommendations of source code components,” in
IEEE/ACM 13th Working Conference on Mining Software Repositories
(MSR), 2016. IEEE, 2016, pp. 488–491.
[5] F. Taibi, “Empirical Analysis of the Reusability of Object-Oriented
Program Code in Open-Source Software,” International Journal of
Computer, Information, System and Control Engineering, vol. 8, no. 1,
pp. 114–120, 2014.
[6] S. R. Chidamber and C. F. Kemerer, “A metrics suite for object oriented
design,” IEEE Transactions on Software Engineering, vol. 20, no. 6, pp.
476–493, 1994.
[7] C. Le Goues and W. Weimer, “Measuring code quality to improve
specification mining,” IEEE Transactions on Software Engineering,
vol. 38, no. 1, pp. 175–190, 2012.
[8] H. Washizaki, R. Namiki, T. Fukuoka, Y. Harada, and H. Watanabe, “A framework for measuring and evaluating program source code quality,” in Proceedings of the 8th International Conference on Product-Focused Software Process Improvement, 2007, pp. 284–299.
[9] S. Zhong, T. M. Khoshgoftaar, and N. Seliya, “Unsupervised Learning
for Expert-Based Software Quality Estimation,” in Proceedings of the
Eighth IEEE International Conference on High Assurance Systems
Engineering, ser. HASE’04, Washington, DC, USA, 2004, pp. 149–155.
[10] T. Cai, M. R. Lyu, K.-F. Wong, and M. Wong, “ComPARE: A generic quality assessment environment for component-based software systems,” in Intern. Symposium on Information Systems and Engineering, 2001.
[11] M. Papamichail, T. Diamantopoulos, and A. Symeonidis, “User-
Perceived Source Code Quality Estimation Based on Static Analysis
Metrics,” in IEEE International Conference on Software Quality, Relia-
bility and Security (QRS), 2016. IEEE, 2016, pp. 100–107.
[12] A. P. Singh and P. Tomar, “Estimation of Component Reusability through
Reusability Metrics,” International Journal of Computer, Electrical,
Automation, Control and Information Engineering, vol. 8, no. 11, pp.
1965–1972, 2014.
[13] P. S. Sandhu and H. Singh, “A reusability evaluation model for OO-
based software components,” International Journal of Computer Sci-
ence, vol. 1, no. 4, pp. 259–264, 2006.
[14] K. A. Ferreira, M. A. Bigonha, R. S. Bigonha, L. F. Mendes, and H. C.
Almeida, “Identifying thresholds for object-oriented software metrics,”
Journal of Systems and Software, vol. 85, no. 2, pp. 244–257, 2012.
[15] T. L. Alves, C. Ypma, and J. Visser, “Deriving metric thresholds
from benchmark data,” in IEEE International Conference on Software
Maintenance (ICSM). IEEE, 2010, pp. 1–10.
[16] P. Oliveira, M. T. Valente, and F. P. Lima, “Extracting relative thresholds
for source code metrics,” in IEEE Conference on Software Maintenance,
Reengineering and Reverse Engineering. IEEE, 2014, pp. 254–263.
[17] T. G. Bay and K. Pauls, “Reuse Frequency as Metric for Component Assessment,” ETH, Department of Computer Science, Zurich, Tech. Rep. D-INFK, 2004.
[18] A. Kaur, H. Monga, M. Kaur, and P. S. Sandhu, “Identification and
performance evaluation of reusable software components based neural
network,” International Journal of Research in Engineering and Tech-
nology, vol. 1, no. 2, pp. 100–104, 2012.
[19] S. Manhas, R. Vashisht, P. S. Sandhu, and N. Neeru, “Reusability Evaluation Model for Procedure-Based Software Systems,” International Journal of Computer and Electrical Engineering, vol. 2, no. 6, 2010.
[20] A. Kumar, “Measuring Software reusability using SVM based classifier
approach,” International Journal of Information Technology and Knowl-
edge Management, vol. 5, no. 1, pp. 205–209, 2012.
[21] T. Bakota, P. Hegedűs, P. Körtvélyesi, R. Ferenc, and T. Gyimóthy, “A probabilistic software quality model,” in 27th IEEE International Conference on Software Maintenance (ICSM), 2011, pp. 243–252.
[22] “ARiSA - Reusability related metrics,” http://www.arisa.se/compendium/
node38.html, [Online; accessed September 2017].
[23] D. P. Doane and L. E. Seward, “Measuring skewness: a forgotten
statistic,” Journal of Statistics Education, vol. 19, no. 2, pp. 1–18, 2011.