A Graph-Based Feature Location Approach Using Set Theory
Richard Müller
Leipzig University
Leipzig, Germany
rmueller@wifa.uni-leipzig.de
Ulrich Eisenecker
Leipzig University
Leipzig, Germany
eisenecker@wifa.uni-leipzig.de
ABSTRACT
The ArgoUML SPL benchmark addresses feature location in Software Product Lines (SPLs), where single features as well as feature combinations and feature negations have to be identified. We present a solution for this challenge using a graph-based approach and set theory. The results are promising. Set theory allows us to define exactly which parts of feature locations can be computed and which precision and recall can be achieved. This has to be complemented by a reliable identification of feature-dependent class and method traces as well as refinements. The application of our solution to one scenario of the benchmark supports this claim.
CCS CONCEPTS
• Software and its engineering → Software product lines; Software reverse engineering. • Information systems → Graph-based database models.
KEYWORDS
Feature location, Software Product Lines, Benchmark, Reverse Engi-
neering, Extractive Software Product Line Adoption, ArgoUML, Set
theory, Static analysis, Graph database, Neo4j, Cypher, jQAssistant
ACM Reference Format:
Richard Müller and Ulrich Eisenecker. 2019. A Graph-Based Feature Location Approach Using Set Theory. In 23rd International Systems and Software Product Line Conference - Volume A (SPLC '19), September 9–13, 2019, Paris, France. ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/3336294.3342358
1 INTRODUCTION
Feature location techniques aim at identifying a mapping from features to source code elements. A feature is "a prominent or distinctive and user visible aspect, quality, or characteristic of a software system or systems" [2]. Locating features plays an important role during software maintenance of single systems [1] and of Software Product Lines (SPLs) [5]. In case of SPLs, single features as well as feature combinations and feature negations have to be identified.
Martinez et al. [3] introduced the ArgoUML SPL benchmark to foster research in feature location for SPLs. It provides a list of features with names and descriptions, a set of scenarios each containing a set of ArgoUML variants, a common ground-truth, and a Java program to automatically calculate precision, recall, and F1-score based on the feature location results and the ground-truth. The challenge is to develop a technique that locates single features, feature combinations, and feature negations in 15 different scenarios derived from the ArgoUML SPL.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
SPLC '19, September 9–13, 2019, Paris, France
© 2019 Association for Computing Machinery.
ACM ISBN 978-1-4503-7138-4/19/09...$15.00
https://doi.org/10.1145/3336294.3342358
The contribution of this paper is a solution for this challenge. According to the feature location taxonomy of Dit et al. [1], we use a static analysis taking Java source code artifacts as input and extracting a software graph for each ArgoUML variant. The software graphs act as data sources. These graphs are created with jQAssistant, a tool that scans software artifacts and stores them as graphs in a Neo4j database.1,2 It can be extended with plugins to support certain types of software artifacts [4]. For this challenge, we use the Java source code plugin.3 Next, a trace graph based on the software graphs of the given scenario is created using the graph query language Cypher.4 We use set theory to define all elementary subsets for the features and their combinations specific to the given scenario. Based on this, feature traces are computed for the actually existing feature-specific subsets using the trace graph. The output is a text file for each feature with class and method traces. The technique is evaluated with the traditional scenario of the ArgoUML SPL benchmark [3].
2 APPROACH
Our proposed solution comprises a conceptual and a technical part. In the first part, we map the given problem to set theory, which facilitates finding a principal solution. In the second part, we use the graph database Neo4j and its query language Cypher, which provide a proper representation for sets of traces and methods for set operations.
2.1 Concept
First, let us introduce the core system S0 of the SPL S. S0 implements only the core functionality, but no additional feature. Let T0 be the set of software traces used to implement S0. Next, we assume that there are three systems S1, S2, and S3, also instances of S. S1 implements both features F1 and F2. S2 solely implements feature F1 and S3 solely F2. T1 is the set of software traces implementing S1. The same applies to T2 and S2 as well as T3 and S3. Table 1 summarizes these sets.
We denote the set F1pure as the exclusive traces required for F1, but no other feature. The same applies to F2pure and F2. In the case of two features F1 and F2, we have to care for two special cases.
1 https://jqassistant.org/
2 https://neo4j.com/
3 https://github.com/softvis-research/jqa-javasrc-plugin/tree/splc-challenge
4 https://www.opencypher.org
Table 1: Relationships between sets of system, implemented features, and software traces with two features

System | Features | Traces
S0     | none     | T0
S1     | F1, F2   | T1
S2     | F1       | T2
S3     | F2       | T3
First, there may be software traces which are common to both features, that is, traces included for the condition //#if defined(F1) or //#if defined(F2). This set is denoted as F1or2. Now, we can determine the following sets.

F1 := F1pure ∪ F1or2 := T2 \ T0    (1)
F2 := F2pure ∪ F1or2 := T3 \ T0    (2)
F1 contains all software traces which are relevant for this feature, including exclusive (pure) and common (or) traces. The same applies to F2. The operator ∪ denotes the union of two sets and the operator \ denotes the result of diminishing the left set by the right set.
Second, there may be software traces which are only included if both features are present, that is, traces included for the condition //#if defined(F1) and //#if defined(F2). This set is denoted as F1and2. Please note that F1and2 only exists if both features, F1 and F2, are present; otherwise it is non-existent.

F1pure ∪ F1and2 := (T1 \ T0) \ T3    (3)
F2pure ∪ F1and2 := (T1 \ T0) \ T2    (4)
F1and2 := (F1pure ∪ F1and2) ∩ (F2pure ∪ F1and2)    (5)
T1 \ T0 contains all software traces of all features without the core traces. If we subtract the set T3 from this set, we get the exclusive (pure) traces for feature F1 together with the traces included if both features F1 and F2 are present (and). The same applies to T2 for F2. The intersection of both sets results in the traces of the feature combination F1and2.
Given this, the sets T0, F1pure, F2pure, F1or2, and F1and2 are a complete dissection of the software system S1, implementing both features F1 and F2. Complete dissection means that each pairwise intersection of the sets T0, F1pure, F2pure, F1or2, and F1and2 is empty. Furthermore, no software trace exists which is not an element of one of the aforementioned sets.
The preceding approach can be generalized to any number of
features belonging to a software product line and the corresponding
sets. We abstain from doing that because of space restrictions.
Now, the problem of locating features in a given scenario of the ArgoUML SPL benchmark can be solved by computing the sets of feature-specific traces depending on the definition of the corresponding elementary feature subsets. When writing the initial version of this paper, we had to define the elementary feature subsets manually. Meanwhile, we have designed and implemented an appropriate algorithm, which automates feature location in the context of the ArgoUML SPL benchmark.5
Because we initially lacked an adequate algorithm, we manually determined the elementary feature subsets of the traditional scenario. This scenario has 10 variants: one system without any of the optional features, one with all the features, and then, for each feature, one system with all the features enabled and this feature disabled [3]. Table 2 summarizes the given sets.
Table 2: Relationships between sets of system, implemented features, and software traces in the traditional scenario

System | Features                       | Traces
S0     | none                           | T0
S1     | F1, F2, F3, F4, F5, F6, F7, F8 | T1
S2     | F2, F3, F4, F5, F6, F7, F8     | T2
S3     | F1, F3, F4, F5, F6, F7, F8     | T3
S4     | F1, F2, F4, F5, F6, F7, F8     | T4
S5     | F1, F2, F3, F5, F6, F7, F8     | T5
S6     | F1, F2, F3, F4, F6, F7, F8     | T6
S7     | F1, F2, F3, F4, F5, F7, F8     | T7
S8     | F1, F2, F3, F4, F5, F6, F8     | T8
S9     | F1, F2, F3, F4, F5, F6, F7     | T9
Due to the given congurations, it is not possible to determine
the sets of common (
or
) feature traces in the traditional scenario.
However, we can determine exclusive (
pure
) feature traces and
feature combination (
and
) traces. For example, to get the exclusive
traces for
F1
and
F2
as well as the traces for the feature combination
F1and 2, we apply the following set operations.
F1pur e :=((T1\T0) \ T2) \
8
Ø
i=1
8
Ø
j=i+1
Fiand j(6)
F2pur e :=((T1\T0) \ T3) \
8
Ø
i=1
8
Ø
j=i+1
Fiand j(7)
F1and 2:=((T1\T0) \ T2) ∩ ((T1\T0) \ T3)(8)
To get the traces of the other features F3 to F8 and feature combinations, these operations are repeated analogously. Computing these sets, we expect a precision of 1.0. The recall will be lower because we cannot determine the sets of common (or) traces for two or more features in this scenario. Thus, they will result in values of 0.0 for both precision and recall. While this may sound disappointing at first, it precisely reflects the maximum information which can be computed for the given information (provided the identification of relevant software traces is perfect). Hence, the described procedure can be considered a principal approach for feature location.
5 https://github.com/softvis-research/featurelocation
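The set operations (6)–(8) for the traditional scenario can be sketched analogously. The model below uses made-up trace names (not the real ArgoUML ground truth) and covers only pairwise feature interactions, matching the pairwise equations; T_not[i] plays the role of the variant in Table 2 where feature Fi is disabled:

```python
from itertools import combinations

# Hypothetical traces: "Fi" marks an exclusive trace of feature i,
# "Fi&Fj" a trace present only when features i and j are both enabled.
features = range(1, 9)
pure = {i: {f"F{i}"} for i in features}
pair = {(i, j): {f"F{i}&F{j}"} for i, j in combinations(features, 2)}

T0 = {"Core"}
# T1: all features enabled; T_not[i]: all features except Fi enabled.
T1 = T0 | set().union(*pure.values()) | set().union(*pair.values())
T_not = {i: T1 - pure[i] - set().union(*(pair[p] for p in pair if i in p))
         for i in features}

# (8) generalized: Fi_and_j := ((T1\T0)\T_not[i]) ∩ ((T1\T0)\T_not[j])
and_sets = {(i, j): ((T1 - T0) - T_not[i]) & ((T1 - T0) - T_not[j])
            for i, j in combinations(features, 2)}
all_and = set().union(*and_sets.values())

# (6)/(7) generalized: Fi_pure := ((T1\T0)\T_not[i]) minus all and-sets
pure_located = {i: ((T1 - T0) - T_not[i]) - all_and for i in features}

assert pure_located[1] == {"F1"}
assert and_sets[(1, 2)] == {"F1&F2"}
```

As in the paper, the common (or) traces cannot be recovered from these nine configurations; only the pure and pairwise and-sets are computable.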
Figure 1: Flowchart of the graph-based feature location approach using set theory (Start → do software graphs exist? → if not, create software graphs → create trace graph → compute feature traces → End)
2.2 Implementation
We have implemented the feature location technique in Java using the provided Eclipse workspace. The source code is available on GitHub.6 The flowchart in Figure 1 shows the process of applying the technique.
For each variant it is checked whether a corresponding software graph exists. If not, it is created by the command-line version of jQAssistant using the Java source code plugin. Thus, each variant is scanned exactly once and the resulting software graph is stored in an embedded Neo4j database. Figure 2 shows a part of the software graph including the nodes Type and Method, the relationship DECLARES, as well as the node properties name, fqn (fully qualified name), and signature.
Figure 2: Part of the software graph created for each variant: (:Type {name: String, fqn: String})-[:DECLARES]->(:Method {name: String, signature: String})
Listing 1 shows the corresponding Cypher queries for getting the traces to complete classes and methods. The first query returns all types whose fqn starts with org.argouml, that are not inner or anonymous classes, and whose name consists of more than one letter. The second query returns all methods whose declaring type meets the above-mentioned requirements.
Listing 1: Cypher queries for getting the traces to complete classes and methods

MATCH (type:Type)
WHERE type.fqn STARTS WITH 'org.argouml'
  AND NOT (:Type)-[:DECLARES]->(type)
  AND NOT type.fqn CONTAINS '$'
  AND NOT type.fqn CONTAINS 'Anonymous'
  AND NOT size(type.name) = 1
RETURN DISTINCT type.fqn AS type

MATCH (type:Type)-[:DECLARES]->(method:Method)
// same conditions as above
RETURN DISTINCT type.fqn AS type, method.name AS method,
       method.signature AS signature
6 https://github.com/softvis-research/argouml-spl-benchmark
Based on these queries, the trace graph for the scenario is created. Figure 3 shows this graph including the nodes Feature, Configuration, Trace:Class, and Trace:Method, the relationships HAS and DECLARES, as well as their properties name and value. For each configuration the active features are added as Feature nodes, the queried types are added as traces labeled Trace:Class, and the queried methods are added as traces labeled Trace:Method to the trace graph.
Figure 3: The trace graph created for a scenario (nodes Configuration and Feature with property name; nodes Trace:Class and Trace:Method with property value; HAS relationships from Configuration to Feature, Trace:Class, and Trace:Method; a DECLARES relationship from Trace:Class to Trace:Method)
Now, we can apply set operations on the trace graph using Cypher including Awesome Procedures On Cypher (APOC).7 Please note that this implementation is specific to the traditional scenario. Other scenario sets have to be computed differently.
Listing 2 shows the Cypher queries for the necessary set operations to compute the sets for the exclusive traces F1pure to F8pure and the feature combinations. First, we query all traces from the configuration where all features are disabled and label them with Core. These traces correspond to T0. Second, we query T1 from the configuration where all features are enabled, apply the set difference T1 \ T0 using the APOC procedure apoc.coll.subtract(), and label the resulting traces with FeatureTrace. Third, we query T2, the traces where F1 is disabled, and apply the same APOC procedure to compute the set difference (T1 \ T0) \ T2, resulting in the traces of F1pure and of all feature combinations. For each identified trace a relationship between Feature and Trace is created and its value is set to pure. These operations are repeated for all other features F2 to F8. Fourth, we query all traces of each feature and apply the APOC procedure apoc.coll.intersection() on every pair of features. For every identified feature combination, relationships are created and their value is set to and. If there is a relationship with the value pure, it is deleted. Finally, the identified traces are written to a text file for each feature and feature combination.
7 https://neo4j-contrib.github.io/neo4j-apoc-procedures/
Listing 2: Cypher queries to compute feature traces

MATCH (c:Configuration)-[:HAS]->(ct:Trace)
WHERE c.name = 'P01_AllDisabled.config'
SET ct:Core
// ...
MATCH (c:Configuration)-[:HAS]->(fct:Trace)
WHERE c.name = 'P02_AllEnabled.config'
WITH cts, collect(fct) AS fcts
WITH apoc.coll.subtract(fcts, cts) AS fts
FOREACH (f IN fts | SET f:FeatureTrace)
// ...
MATCH (notf:Feature)<-[:HAS]-()-[:HAS]->(notft:FeatureTrace)
// ...
WITH apoc.coll.subtract(allfts, notfts) AS fts
// ...
FOREACH (ft IN ffts | CREATE (f)-[:HAS {value:'pure'}]->(ft))
// ...
MATCH (f1:Feature)-[:HAS]->(dt:FeatureTrace)
MATCH (f1)-[:HAS]->(:FeatureTrace)-[:DECLARES]->(idt)
// ...
WITH f1, f2, apoc.coll.intersection(fts1, fts2) AS andts
// ...
MERGE (f1)-[:HAS {value:'and'}]->(fandt)
MERGE (f2)-[:HAS {value:'and'}]->(fandt)
DELETE r1, r2
3 EVALUATION
We have located traces to complete classes and methods for 12 out of 24 features and feature combinations in the traditional scenario. The benchmark metrics are summarized in Table 3. The underlying ground-truth is pruned, that is, all traces of class and method refinements have been removed.
Table 3: Benchmark metrics for 12 of 24 features and feature combinations in the traditional scenario based on a pruned ground-truth without refinement traces
Name Precision Recall F1-score
ACTIVITYDIAGRAM 1.00 0.49 0.66
ACTIVITYDIAGRAM_and_STATEDIAGRAM 1.00 1.00 1.00
COGNITIVE 1.00 1.00 1.00
COGNITIVE_and_DEPLOYMENTDIAGRAM 1.00 0.93 0.96
COGNITIVE_and_SEQUENCEDIAGRAM 1.00 1.00 1.00
COLLABORATIONDIAGRAM 1.00 0.95 0.98
COLLABORATIONDIAGRAM_and_SEQUENCEDIAGRAM 1.00 1.00 1.00
DEPLOYMENTDIAGRAM 1.00 1.00 1.00
LOGGING 1.00 0.67 0.80
SEQUENCEDIAGRAM 1.00 0.96 0.98
STATEDIAGRAM 1.00 0.63 0.77
USECASEDIAGRAM 1.00 1.00 1.00
Average 1.00 0.89 0.93
As expected, we have identied all exclusive (
pure
) traces of
single features and all traces of feature combinations (
and
). The
recall values below 1.0 are due to the missing common (
or
) traces
as described in Section 2.1. The remaining 12 features, feature com-
binations, and feature negations have a precision and recall of 0.0
because they only contain traces of class and method renements
that are currently not available in the software graph.
We have measured the time for creating the software graphs and the trace graph on a Zotac MAGNUS EN1070 (Intel Core i5-6400T, 16 GB RAM) running Windows 10 with JDK 12. In the traditional scenario, it takes 60 min (3,606,903 ms) to create the 10 software graphs and 45 s (44,987 ms) to create the trace graph and compute the feature traces.
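The metrics in Table 3 follow the standard definitions of precision, recall, and F1-score over located traces versus ground-truth traces. A minimal sketch with a hypothetical helper (not the benchmark's actual Java implementation):

```python
def benchmark_metrics(located: set, ground_truth: set):
    """Precision, recall, and F1-score over sets of traces."""
    tp = len(located & ground_truth)  # correctly located traces
    precision = tp / len(located) if located else 0.0
    recall = tp / len(ground_truth) if ground_truth else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Example matching the LOGGING row: all located traces are correct
# (precision 1.0) but only part of the ground truth is covered.
p, r, f1 = benchmark_metrics({"a", "b"}, {"a", "b", "c"})
print(round(p, 2), round(r, 2), round(f1, 2))  # → 1.0 0.67 0.8
```

With an empty located set, as for the 12 features containing only refinement traces, the helper yields 0.0 for all three metrics.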
4 DISCUSSION
Next, we discuss the strengths and weaknesses of the feature location technique. We suggest possible improvements mitigating these weaknesses in Section 5.

4.1 Strengths
The proposed approach locates features, feature combinations, and feature negations with a precision of 1.0 and a recall of 1.0, provided that the required sets of software traces are available. We demonstrated this for exclusive feature traces and traces of feature combinations to complete classes and methods in the traditional scenario.
Furthermore, the implementation of the technique is very compact. This is mainly due to the sensible integration of existing tools, such as jQAssistant and Neo4j.
4.2 Weaknesses
Until recently, we had to determine ourselves which elementary feature subsets can be computed for a given scenario and which questions can be answered accordingly. It is obvious that for the scenario with all possible variants, that is, 256, all the information is given to compute any possible subset.
The software graphs are created with the Java source code plugin for jQAssistant. At the moment this plugin does not consider imports, fields, and method statements. For these reasons, only traces to complete classes and methods are supported.
Moreover, the creation of the software graph for one variant takes approximately 6 minutes. In case of the scenario with 256 variants, the scan process to create the software graphs takes almost 26 hours.
5 CONCLUSION AND FUTURE WORK
In this paper we outline a principal solution to the feature location benchmark with ArgoUML SPL [3]. We have mapped the given problem to set theory and developed a graph-based solution for the traditional scenario with Java, Neo4j, and Cypher. Our technique locates exclusive traces and pair-wise feature interaction traces to complete classes and methods of 12 out of 24 features and feature combinations with a precision of 1.0 and a recall of 1.0.
After extending the Java source code plugin for jQAssistant to scan imports, fields, and method statements, it will be possible to detect class and method refinements for locating the remaining features, feature combinations, and feature negations. Including the implementation of the algorithm for defining all elementary feature subsets for a given scenario will make it possible to automate feature location in the context of the ArgoUML SPL benchmark. Furthermore, performance bottlenecks have to be identified and removed by adequate optimizations. Finally, if a specific subset cannot be exactly computed because of missing necessary variants in a given scenario, alternative techniques to identify feature-specific traces could be applied, for example, matching feature names as substrings of given identifiers.
ACKNOWLEDGMENTS
We especially acknowledge the work of Dirk Mahler, the main developer of jQAssistant, and Michael Hunger, the main developer of APOC.
REFERENCES
[1] Bogdan Dit, Meghan Revelle, Malcom Gethers, and Denys Poshyvanyk. 2013. Feature location in source code: a taxonomy and survey. J. Softw. Evol. Process 25, 1 (2013), 53–95. https://doi.org/10.1002/smr.567
[2] Kyo C. Kang, Sholom G. Cohen, James A. Hess, William E. Novak, and A. Spencer Peterson. 1990. Feature-Oriented Domain Analysis (FODA) Feasibility Study. Technical Report CMU/SEI-90-TR-21. Carnegie Mellon University, Software Engineering Institute.
[3] Jabier Martinez, Nicolas Ordoñez, Xhevahire Tërnava, Tewfik Ziadi, Jairo Aponte, Eduardo Figueiredo, and Marco Tulio Valente. 2018. Feature Location Benchmark with ArgoUML SPL. In Proc. 22nd Int. Syst. Softw. Prod. Line Conf. - Vol. 1 (SPLC '18). ACM, New York, NY, USA, 257–263. https://doi.org/10.1145/3233027.3236402
[4] Richard Müller, Dirk Mahler, Michael Hunger, Jens Nerche, and Markus Harrer. 2018. Towards an Open Source Stack to Create a Unified Data Source for Software Analysis and Visualization. In Proc. 6th IEEE Work. Conf. Softw. Vis. IEEE, Madrid, Spain. https://doi.org/10.1109/VISSOFT.2018.00019
[5] Julia Rubin and Marsha Chechik. 2013. A Survey of Feature Location Techniques. In Domain Eng.: Prod. Lines, Lang., Concept. Model. Springer, 29–58.