Figure 1 - uploaded by Alexander LeClair
Word count histogram for code, comments, and book summaries. About 22% of words occur only once across all Java methods, versus 35% in the book summaries.
Source publication
Source code summarization is the task of writing short, natural language descriptions of source code. The main use for these descriptions is in software documentation, e.g., the one-sentence Java method descriptions in JavaDocs. Code summarization is rapidly becoming a popular research problem, but progress is restrained due to a lack of suitable dat...
Contexts in source publication
Context 1
... make three observations about the dataset that, in our view, are likely to affect how researchers design source code summarization algorithms. First, as depicted in Figure 1, words appear to be repeated more often in code than in natural language: fewer words are used only one or two times, and in general more are used 3+ times. At the same time (Figure 2), the pattern for word occurrences per document appears similar, implying that although words in code are repeated, they tend to be repeated within the same method rather than across methods. ...
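The two histograms contrasted above can be approximated with a short sketch. This is not the authors' tooling: the function names `occurrence_counts` and `fraction_with_count` are hypothetical, and whitespace tokenization with lowercasing is an assumption made only to keep the example small. The first counter tallies every occurrence of a word across the corpus (the Figure 1 view); the second tallies the number of documents a word appears in (the Figure 2 view).

```python
from collections import Counter


def occurrence_counts(documents):
    """Return (corpus_counts, doc_counts) for a list of text documents.

    corpus_counts[w]: total occurrences of w across all documents.
    doc_counts[w]:    number of documents in which w appears at least once.
    Tokenization (lowercase + whitespace split) is an illustrative assumption.
    """
    corpus_counts = Counter()
    doc_counts = Counter()
    for doc in documents:
        tokens = doc.lower().split()
        corpus_counts.update(tokens)    # every occurrence counts
        doc_counts.update(set(tokens))  # at most once per document
    return corpus_counts, doc_counts


def fraction_with_count(counts, n):
    """Fraction of distinct words whose count equals n."""
    if not counts:
        return 0.0
    return sum(1 for c in counts.values() if c == n) / len(counts)


# Toy corpus: "i" is repeated, but only inside a single method.
methods = ["i i i i", "return x"]
corpus_counts, doc_counts = occurrence_counts(methods)
once_in_corpus = fraction_with_count(corpus_counts, 1)
once_per_doc = fraction_with_count(doc_counts, 1)
```

In the toy corpus, two of the three distinct words occur once overall, while all three appear in exactly one method. That gap between the two views is the kind of within-method repetition the excerpt describes.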