Fig 1 - uploaded by Alexander LeClair
An overview of this paper. 

Source publication
Preprint
Software Categorization is the task of organizing software into groups that broadly describe the behavior of the software, such as "editors" or "science." Categorization plays an important role in several maintenance tasks, such as repository navigation and feature elicitation. Current approaches attempt to cast the problem as text classification,...

Contexts in source publication

Context 1
... argue that they are for the problem of categorization, especially given their ability to model embeddings. This paper has four major components as shown in the overview in Figure 1. First, in Section IV, we prepare a corpus comprised of C/C++ projects from the Debian packages repository, totaling over 1.5 million files and 6.6 million functions. ...
Context 2
... solution to this problem would have several applications in the short run, as discussed in the Introduction. But in the long run, we are positioning this paper to advance the state-of-the-art in automatic program comprehension generally: much effort in the software maintenance subarea has been dedicated to automated understanding of code changes, artifacts, etc., with the hope of "teaching the computer" to recognize high-level rationale similar to what a human might, rather than low-level details only. In addition, this paper contributes to an ongoing debate in Software Engineering research as to whether neural architectures are an appropriate tool given the unique constraints present in SE data [21], [22]. We argue that they are for the problem of categorization, especially given their ability to model embeddings. This paper has four major components as shown in the overview in Figure 1. First, in Section IV, we prepare a corpus comprised of C/C++ projects from the Debian packages repository, totaling over 1.5 million files and 6.6 million functions. The repository contains a special category labeled "libs", which we remove and annotate manually for a separate evaluation. Next, in Section V, we describe our custom classification approach that is based on neural text classification algorithms described in relevant NLP literature. Third, in Sections VI and VII, we present our evaluation of our classification approach in comparison with an alternate SE-specific software classification approach, as well as recent work from the area of text classification. Finally, in Sections VIII and IX, we present an example illustrating the intuition behind our results, and information for reproducing our ...
