Figure 4 - uploaded by Chun Yong Chong
(a) Method used to select project releases and generate the ground truth. (b) Method used to evaluate clustering results against the ground truth.


Source publication
Preprint
Maintaining existing software requires a large amount of time spent comprehending the source code. The architecture of a software system, however, may not be clear to maintainers if up-to-date documentation is not available. Software clustering is often used as a remodularisation and architecture recovery technique to help recover a semantic representa...

Contexts in source publication

Context 1
... any good clustering algorithm, small changes in the target software (the clustering algorithm applied to multiple small incremental releases of the same software) should not alter the clustering results significantly. Because of the way we create the ground truth, 10 prior releases of the examined software are needed to identify the common directory structure, as shown in Figure 4. As such, requiring 21 releases of the chosen project forms one of our selection criteria. ...
Context 2
... selected projects are shown in Table 4 and the complete set of datasets can be found on our GitHub page (footnote 3). The columns firstRelease and lastRelease in Figure 4 indicate the versions of the software that we examined and used for our experiments. Note that we use 21 incremental releases between the stated firstRelease and lastRelease to ensure the stability of the clustering algorithms and to generate the ground truth. ...
Context 3
... the example shown in Figure 4a, we use the common directories across apache_spark-1.0 to apache_spark-1.9 to generate the ground truth. Subsequently, this ground truth comprising the common directory from releases 1.0 to 1.9 will be used to evaluate against the clustering results that we produce in the next release, which is apache_spark-2.0, ...
Context 4
... this ground truth comprising the common directory from releases 1.0 to 1.9 will be used to evaluate against the clustering results that we produce in the next release, which is apache_spark-2.0, as shown in Figure 4b. To illustrate another simple example, given the following directory structure of a software in 3 in- For example, given that the following directory paths are extracted ...
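
The idea described in Contexts 3 and 4 (take the directories common to a run of prior releases, then use them as the reference for the next release) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the file paths, and the rule of assigning each file to its nearest common ancestor directory are all assumptions made for the example, and only two prior releases are used to keep it short.

```python
from collections import defaultdict
from posixpath import dirname


def common_directories(releases):
    """Return the directories that appear in every release (lists of file paths)."""
    dir_sets = [{dirname(path) for path in paths} for paths in releases]
    return set.intersection(*dir_sets) if dir_sets else set()


def nearest_common_dir(path, common_dirs):
    """Walk up from a file's directory until a common directory is found."""
    current = dirname(path)
    while current not in common_dirs:
        parent = dirname(current)
        if parent == current:  # reached the top without a match
            return None
        current = parent
    return current


def ground_truth_clusters(paths, common_dirs):
    """Group files by nearest common directory (assumed rule, not from the paper)."""
    clusters = defaultdict(set)
    for path in paths:
        owner = nearest_common_dir(path, common_dirs)
        if owner is not None:
            clusters[owner].add(path)
    return dict(clusters)


# Hypothetical file listings; real releases would contain many more files.
prior_releases = [
    ["core/rdd.scala", "sql/parser.scala", "mllib/tree.scala"],
    ["core/rdd.scala", "core/util/io.scala", "sql/parser.scala"],
]
next_release = [
    "core/rdd.scala",
    "core/util/io.scala",
    "sql/parser.scala",
    "graphx/graph.scala",
]

common = common_directories(prior_releases)
truth = ground_truth_clusters(next_release, common)
```

In this sketch, files under directories that are not shared by all prior releases (here the hypothetical `graphx/`) are simply left out of the reference clustering; whether the source publication treats them that way is not stated in the excerpts above.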
