https://doi.org/10.1007/s10664-020-09863-2
CROKAGE: effective solution recommendation for programming tasks by leveraging crowd knowledge
Rodrigo Fernandes Gomes da Silva¹ · Chanchal K. Roy² · Mohammad Masudur Rahman² · Kevin A. Schneider² · Klérisson Paixão¹ · Carlos Eduardo de Carvalho Dantas¹ · Marcelo de Almeida Maia¹
© Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract
Developers often search the web for relevant code examples for their programming tasks. Unfortunately, they face three major problems. First, they frequently need to read and analyse multiple results from the search engines to obtain a satisfactory solution. Second, the search is impaired by a lexical gap between the query (task description) and the information associated with the solution (e.g., a code example). Third, the retrieved solution may not be comprehensible, i.e., the code segment might lack a succinct explanation. To address these three problems, we propose CROKAGE (Crowd Knowledge Answer Generator), a tool that takes the description of a programming task (the query) as input and delivers a comprehensible solution for the task. Our solutions contain not only relevant code examples but also their succinct explanations written by human developers. The search for code examples is modeled as an Information Retrieval (IR) problem. We first leverage the crowd knowledge stored in Stack Overflow to retrieve the candidate answers for a programming task. For this, we use a fine-tuned IR technique, chosen after comparing the performance of 11 IR techniques. We then use a multi-factor relevance mechanism to mitigate the lexical gap problem and to select the top-quality answers related to the task. Finally, we perform natural language processing on the top-quality answers and, unlike earlier studies, deliver comprehensible solutions containing both code examples and code explanations. We evaluate and compare our approach against ten baselines, including the state-of-the-art. We show that CROKAGE outperforms the ten baselines in suggesting relevant solutions for 902 programming tasks (i.e., queries) in three popular programming languages: Java, Python, and PHP. Furthermore, we use 24 programming tasks (queries) to evaluate our solutions with 29 developers and confirm that CROKAGE outperforms the state-of-the-art tool in terms of the relevance of the suggested code examples, the benefit of the code explanations, and the overall solution quality (code + explanation).
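To make the pipeline described above concrete, the sketch below mirrors its three steps in Java: retrieve candidate Stack Overflow answers for a query, score them with a multi-factor relevance function, and emit a solution that pairs a code example with its explanation. Everything here is illustrative rather than CROKAGE's actual implementation: the Answer record, the Jaccard token overlap (a crude stand-in for the fine-tuned IR technique selected among 11 alternatives), and the 0.7/0.3 factor weights are all assumptions made for the example.

```java
import java.util.*;
import java.util.stream.*;

// A minimal, self-contained sketch of the retrieve-rank-compose pipeline the
// abstract describes. All names, similarity measures, and weights below are
// illustrative stand-ins, not CROKAGE's actual components.
public class CrowdKnowledgePipelineSketch {

    // A candidate Stack Overflow answer: code, explanation, and upvote count.
    record Answer(String code, String explanation, int upvotes) {}

    // Tokenize text into a lowercase word set (stand-in for real preprocessing).
    static Set<String> tokens(String text) {
        return Arrays.stream(text.toLowerCase().split("\\W+"))
                     .filter(t -> !t.isEmpty())
                     .collect(Collectors.toSet());
    }

    // Jaccard overlap between the query and the answer text: a crude stand-in
    // for the fine-tuned IR technique the paper selects.
    static double lexicalSimilarity(String query, Answer a) {
        Set<String> q = tokens(query);
        Set<String> d = tokens(a.code() + " " + a.explanation());
        Set<String> inter = new HashSet<>(q);
        inter.retainAll(d);
        Set<String> union = new HashSet<>(q);
        union.addAll(d);
        return union.isEmpty() ? 0.0 : (double) inter.size() / union.size();
    }

    // Multi-factor relevance: combine lexical similarity with answer quality.
    // The 0.7/0.3 weights are arbitrary, chosen only for illustration.
    static double relevance(String query, Answer a, int maxUpvotes) {
        double quality = maxUpvotes == 0 ? 0.0 : (double) a.upvotes() / maxUpvotes;
        return 0.7 * lexicalSimilarity(query, a) + 0.3 * quality;
    }

    public static void main(String[] args) {
        String query = "how to read a text file line by line in java";
        List<Answer> candidates = List.of(
            new Answer("Files.lines(Path.of(\"in.txt\")).forEach(System.out::println);",
                       "Streams every line of a text file in Java using java.nio.", 120),
            new Answer("open('in.txt').readlines()",
                       "Reads all lines of a file in Python.", 45));
        int maxUpvotes = candidates.stream().mapToInt(Answer::upvotes).max().orElse(0);
        // Rank candidates and emit the top solution: code plus its explanation.
        candidates.stream()
            .sorted(Comparator.comparingDouble(
                (Answer a) -> relevance(query, a, maxUpvotes)).reversed())
            .limit(1)
            .forEach(a -> System.out.println(a.code() + "\n-- " + a.explanation()));
    }
}
```

Run against the hypothetical two-answer corpus, the sketch prints the Java answer together with its human-written explanation, which is the shape of output the paper's evaluation asks developers to judge.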
Communicated by: Tim Menzies
Marcelo de Almeida Maia
marcelo.maia@ufu.br
Extended author information available on the last page of the article.
Empirical Software Engineering (2020) 25:4707–4758
Published online: 2 September 2020