Conference Proceeding

# TopicXP: Exploring topics in source code using Latent Dirichlet Allocation

Dept. of Comput. Sci., Coll. of William & Mary, Williamsburg, VA, USA
10/2010; DOI: 10.1109/ICSM.2010.5609654. In: Proceedings of the 2010 IEEE International Conference on Software Maintenance (ICSM)
Source: IEEE Xplore

ABSTRACT Acquiring a general understanding of a large software system and the components from which it is built can be a time-consuming task, but such an understanding is an important prerequisite to adding features or fixing bugs. In this paper we propose a tool, TopicXP, that supports developers during such software maintenance tasks by extracting and analyzing unstructured information in source code identifier names and comments using Latent Dirichlet Allocation (LDA). TopicXP enables developers to gain an overview of a software system under analysis by extracting and visualizing natural language topics, which generally correspond to concepts or features implemented in software classes. TopicXP is implemented as an open-source Eclipse plug-in that provides interactive visualization of topics along with the structural dependencies between the classes implementing these topics. The paper also presents the results of a preliminary user study aimed at evaluating TopicXP.
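
The abstract does not describe how TopicXP preprocesses identifiers or fits LDA. The sketch below is a minimal, assumed pipeline and not TopicXP's implementation. Each class's identifiers and comments are split into lowercase terms by breaking camelCase and snake_case. A standard LDA fitting method, collapsed Gibbs sampling, is then run over those per-class term lists. The names `split_identifiers`, `lda_gibbs` and `STOP_WORDS`, the stop list itself, and the default hyperparameters are all hypothetical.

```python
import random
import re

# Hypothetical stop list: a few Java keywords/types and English function words.
STOP_WORDS = {"public", "private", "static", "void", "return", "new", "int",
              "string", "the", "a", "an", "of", "to", "is"}

def split_identifiers(text):
    """Split source text into lowercase terms, breaking camelCase and snake_case."""
    terms = []
    for chunk in re.split(r"[^A-Za-z0-9]+", text):
        # 'HTTPServer' -> HTTP, Server; 'getUserName' -> get, User, Name
        for part in re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", chunk):
            word = part.lower()
            if len(word) > 1 and not word.isdigit() and word not in STOP_WORDS:
                terms.append(word)
    return terms

def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    docs: one term list per class. Returns (theta, phi, vocab) where
    theta[d][k] estimates P(topic k | doc d) and phi[k][v] estimates P(term v | topic k).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    doc_topic = [[0] * K for _ in docs]        # n_{d,k}
    topic_term = [[0] * V for _ in range(K)]   # n_{k,v}
    topic_total = [0] * K                      # n_k
    assignments = []
    for d, doc in enumerate(docs):             # random initial topic per token
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_term[k][index[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = index[w], assignments[d][i]
                # Remove the current token from all counts.
                doc_topic[d][k] -= 1
                topic_term[k][v] -= 1
                topic_total[k] -= 1
                # Full conditional P(z = t | all other assignments), unnormalized.
                weights = [(doc_topic[d][t] + alpha)
                           * (topic_term[t][v] + beta) / (topic_total[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_term[k][v] += 1
                topic_total[k] += 1
    theta = [[(doc_topic[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    phi = [[(topic_term[k][v] + beta) / (topic_total[k] + V * beta) for v in range(V)]
           for k in range(K)]
    return theta, phi, vocab
```

Call `lda_gibbs([split_identifiers(src) for src in class_sources], num_topics=10)`, where `class_sources` is a hypothetical list of per-class source strings. Sorting each row of `phi` then yields the highest-probability terms that label a topic, and `theta` shows which classes a topic is spread over.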

##### Conference Proceeding: Consistent Layout for Thematic Software Maps.
ABSTRACT: Software visualizations can provide a concise overview of a complex software system. Unfortunately, since software has no physical shape, there is no "natural" mapping of software to a two-dimensional space. As a consequence, most visualizations use a layout in which position and distance have no meaning, so the layout typically differs from one visualization to another. We propose a consistent layout for software maps in which the position of a software artifact reflects its *vocabulary*, and distance corresponds to similarity of vocabulary. We use Latent Semantic Indexing (LSI) to map software artifacts to a vector space, and then use Multidimensional Scaling (MDS) to map this vector space down to two dimensions. The resulting consistent layout allows us to develop a variety of thematic software maps that express very different aspects of software while making it easy to compare them. The approach is especially suitable for comparing views of evolving software, since the vocabulary of software artifacts tends to be stable over time.
WCRE 2008, Proceedings of the 15th Working Conference on Reverse Engineering, Antwerp, Belgium, October 15-18, 2008; 01/2008
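
The abstract above names the pipeline (LSI into a vector space, then MDS down to two dimensions) but not its implementation. What follows is a stdlib-only sketch of the second step, classical (Torgerson) MDS. It assumes a precomputed symmetric matrix of vocabulary distances between artifacts, for example one minus the cosine similarity of their LSI vectors. The LSI step itself is omitted. Power iteration with deflation stands in for a library eigensolver, and it assumes the dominant remaining eigenvalues are positive, which holds when the distances are Euclidean. The function names are hypothetical.

```python
import math
import random

def _top_eigenpair(b, rng, iterations=1000, tol=1e-12):
    """Power iteration: dominant eigenvalue and unit eigenvector of symmetric b."""
    n = len(b)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]  # random start avoids degenerate cases
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < tol:
            return 0.0, v  # remaining spectrum is numerically zero
        w = [x / norm for x in w]
        done = max(abs(x - y) for x, y in zip(w, v)) < tol
        v = w
        if done:
            break
    # Rayleigh quotient v^T b v gives the eigenvalue estimate.
    lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
    return lam, v

def classical_mds(dist, dims=2, seed=0):
    """Place n items in `dims` dimensions so that Euclidean distances
    approximate the symmetric n x n distance matrix `dist`."""
    n = len(dist)
    rng = random.Random(seed)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # Double centering: B = -1/2 * J D^2 J with J = I - (1/n) 1 1^T.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        lam, v = _top_eigenpair(b, rng)
        if lam <= 1e-12:
            break  # no positive eigenvalue left; remaining axes stay at 0
        scale = math.sqrt(lam)
        for i in range(n):
            coords[i][k] = v[i] * scale
        # Deflation: subtract lam * v v^T so the next pass finds the next eigenpair.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords
```

For distances that really come from points in the plane, the 2-D embedding reproduces them up to rotation and reflection. For cosine-based vocabulary distances the result is only a best low-rank approximation.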
##### Article: Feature Location Using Probabilistic Ranking of Methods Based on Execution Scenarios and Information Retrieval
ABSTRACT: This paper recasts the problem of feature location in source code as a decision-making problem in the presence of uncertainty. The solution is formulated as a combination of the opinions of different experts. The experts in this work are two existing feature location techniques: a scenario-based probabilistic ranking of events and an information-retrieval-based technique that uses latent semantic indexing. The combination of these two experts is evaluated empirically through several case studies that use the source code of the Mozilla Web browser and the Eclipse integrated development environment. The results show that combining the experts significantly improves the effectiveness of feature location compared with either expert used independently.
IEEE Transactions on Software Engineering 07/2007; 33(6):420-432. · 2.59 Impact Factor
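
The abstract states only that two experts' opinions are combined, not how. One simple way to do this is a weighted linear opinion pool, sketched below as an assumption. Each expert's method scores are min-max normalized to [0, 1], then mixed with a weight `weight` for the scenario-based expert and `1 - weight` for the IR-based one. The normalization, the default weight of 0.5, and the function and method names are hypothetical.

```python
def normalize(scores):
    """Min-max normalize a {method: score} dict to [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {m: 1.0 for m in scores}
    return {m: (s - lo) / (hi - lo) for m, s in scores.items()}

def combine_experts(scenario_scores, ir_scores, weight=0.5):
    """Rank methods by a weighted sum of two experts' normalized scores.

    A method missing from one expert receives 0 from that expert.
    Returns (method, combined_score) pairs, best first.
    """
    a, b = normalize(scenario_scores), normalize(ir_scores)
    combined = {m: weight * a.get(m, 0.0) + (1 - weight) * b.get(m, 0.0)
                for m in set(a) | set(b)}
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))
```

Setting `weight` to 1 or 0 recovers either expert alone, which gives a direct way to compare the combination against each expert used independently.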
##### Conference Proceeding: A theory of aspects as latent topics.
Proceedings of the 23rd Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2008, October 19-23, 2008, Nashville, TN, USA; 01/2008