Code Readability Testing, an Empirical Study
Todd Sedano
Carnegie Mellon University
Silicon Valley Campus
Moffett Field, CA 94035, USA
Email: professor@gmail.com
Abstract—Context: One of the factors that leads to improved
code maintainability is its readability. When code is difficult to
read, it is difficult for subsequent developers to understand its
flow and its side effects. They are likely to introduce new bugs
while trying to fix old bugs or adding new features. But how do
software developers know they have written readable code?
Objective: This paper presents a new technique, Code Read-
ability Testing, to determine whether code is readable and
evaluates whether the technique increases programmers’ ability
to write readable code.
Method: The researcher conducted a field study using 21
software engineering master students and followed the Code
Readability Testing with each student in four separate sessions
evaluating different “production ready” software. After the obser-
vations, a questionnaire evaluated the programmer’s perspective.
Results: By following Code Readability Testing, half of the pro-
grammers writing “unreadable” code started writing “readable”
code after four sessions. Programmers writing “readable” code
also improved their ability to write readable code. The study
reveals that the most frequent suggestions for increasing code
readability are improving variable names, improving method
names, creating new methods in order to reduce code duplication,
simplifying if conditions and structures, and simplifying loop
conditions. The programmers report that readability testing is
worth their time. They observe increases in their ability to
write readable code. When programmers experience a reader
struggling to understand their code, they become motivated to
write readable code.
Conclusion: This paper defines code readability, demonstrates
that Code Readability Testing improves programmers’ ability
to write readable code, and identifies frequent fixes needed to
improve code readability.
I. INTRODUCTION
Writing readable code reduces the costs of development and
maintenance of software systems. A considerable portion of
the software development cost is ongoing maintenance to add
new features and fix defects [1]. Even in the early stages of the
software’s evolution, the ability to read and quickly understand
existing code is a key factor that affects the code’s ability to
change.
While creating programmers who write readable code is not
a new problem for the software industry, the previous work
focuses around what code should look like, not how to train
programmers to write readable code. Developers realize the
importance of writing code that is readable by their peers, but
they often do not receive feedback on whether their code is
readable. Programming constructs that are clear to the author
can confuse the next developer. Some programmers bemoan
that they can’t read their own code six months later. If the
code works, clearly the computer can understand it, but can
anyone else on the team?
Teaching this skill is not a top priority in computer sci-
ence and software engineering curricula. The Computer Sci-
ence Curriculum promotes understanding the programming
paradigms of a particular language (e.g. functional vs. non-
functional), not how to write readable code [2]. The Software
Engineering Body of Knowledge (SWEBOK) does make one
reference to writing “understandable code” in the Coding Prac-
tical Considerations for the Software Construction knowledge
area [3]. This is just one out of 229 subtopics of SWE-
BOK. The Graduate Software Engineering reference curricu-
lum (GSwE) does not prescribe any further recommendations
beyond SWEBOK for this topic [4]. Some undergraduate
courses briefly cover the issues of programming style. A
few courses will penalize students for producing unreadable
code. In rare courses, students swap assignments simulating
the experience of inheriting someone else’s code. While this
sensitizes students to the needs of writing readable code, the
experience lacks concrete steps to increase their skill. The
emphasis of a computer science curriculum or a software
engineering curriculum is on the substantial topics in the
reference curriculum.
Companies tend to assume programmers arrive with this
skill or will learn it through on-the-job training. Project teams
may have code style guidelines or best practices around writ-
ing code, e.g., when a programmer opens a database connection,
immediately write the close statement.
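As a hypothetical sketch of such a guideline (this listing is illustrative and not drawn from the study; the JDBC URL parameter and the "users" table are assumptions), the Java fragment below opens a database connection and expresses its closing in the same construct, so a later edit cannot forget the close statement:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class ConnectionGuideline {
        // Hypothetical illustration: the connection, statement, and result set
        // are opened in a try-with-resources block, so their closing is written
        // at the same moment the connection is opened.
        static int countUsers(String jdbcUrl) throws SQLException {
            try (Connection connection = DriverManager.getConnection(jdbcUrl);
                 Statement statement = connection.createStatement();
                 ResultSet rows = statement.executeQuery("SELECT COUNT(*) FROM users")) {
                rows.next();
                return rows.getInt(1);
            } // all three resources are closed here, even when an exception is thrown
        }
    }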
There is strong empirical evidence that supports the effec-
tiveness of software inspections and code reviews for uncov-
ering bugs. While these techniques can identify readability
issues, they are not designed to teach developers how to write
readable code. When an author receives a list of defects, the
author loses the opportunity to learn how the code confuses
the reader.
Code Readability Testing reveals areas where the code is not
readable, and enables a dialogue between coder and reader.
Feedback is instantaneous, as the author sees exactly how
the reader interprets the code.
A. Research Objectives
Using the goal template from Goal Question Metric (GQM),
the goal is to... Analyze Code Readability Testing for the
purpose of determining its effectiveness in improving pro-
grammers’ ability to write readable code with respect to their
effectiveness from the point of view of the researcher in
the context of the “craft of software development” course at
Carnegie Mellon University.
This paper decomposes this goal into four questions:
Research Question 1: Would programmers who repeatedly
follow Code Readability Testing increase the readability of
their code?
Research Question 2: What kinds of issues does Code
Readability Testing detect?
Research Question 3: How time-consuming is readability
testing?
Research Question 4: How did programmers perceive Code
Readability Testing?
II. BACKGROUND AND RELATED WORK
Improving code readability and programming style is not a
new topic for the software industry.
Kernighan and Plauger, in their 1974 seminal book
The Elements of Programming Style, document heuristics for
improving coding practices and code readability by rewrit-
ing code used in computer science textbooks [5]. In the
1982 book, Understanding the Professional Programmer, Ger-
ald Weinberg emphasizes that the programmer is a more
important reader of the code than the computer’s compiler or
interpreter. He suggests that just like the writing process for
English text, code needs to be rewritten several times before
it becomes exemplary code. He encourages programmers to
spend time reworking code that will be frequently read in the
future [6].
In recent books aimed at professional programmers, Andrew
Hunt, David Thomas, Kent Beck, and Robert Martin tackle
coding style in a variety of ways. In The Pragmatic Programmer,
Hunt and Thomas examine the tools, processes, and tricks
that help programmers master their craft [7]. In Clean Code,
Robert Martin addresses techniques to help a developer be-
come a better programmer [8]. In Implementation Patterns,
Kent Beck addresses good software development design pat-
terns [9]. In short, they distill their lifelong experiences into
best practices, some of which address code readability.
In recent studies, researchers examine code readability
from different approaches: automated improvement tech-
niques, naming of identifiers, syntax, and automated metrics.
Several studies attempt to automate techniques to improve
code readability. Wang examines the automatic insertion of
blank lines in code to improve readability [10] whereas Sasaki
reorders programming statements to improve readability by
declaring variables immediately before their utilization [11].
Several researchers examine the naming of identifiers [12],
[13], [14], [15]. Relf’s tool encourages the developer to
improve variable and method names [15]. Binkley observes
that camel case is easier to read than underscore variables
[16]. Jones looks at the issues with operator precedence in
code readability [17].
While human assessment remains the gold standard of code
readability, automated metrics often serve as a proxy. Several
studies strive to create code readability metrics so that a
computer program determines the readability [18], [19], [20].
Incorporating these metrics into static analysis tools, de-
velopment environments, and IDEs provides an inexpensive
assessment. Substituting the computer for a human produces
problems. Metrics using character counts or dictionary words
might score a variable named “something confusing” or
“something vague” as equally readable as a variable that
is “exactly what i mean.” While a statistical approach to
readability metrics is helpful, these measures do not reveal
the programmer’s intention.
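To make this limitation concrete, the toy sketch below (an illustration for this argument, not a metric from the cited studies; the 20-character cap and the example names are arbitrary assumptions) scores identifiers purely by length, so a vague name and an intention-revealing name of similar length receive nearly identical scores:

    public class NaiveReadabilityScore {
        // Toy, length-based identifier score: longer names score closer to 1.0.
        static double score(String identifierName) {
            return Math.min(1.0, identifierName.length() / 20.0);
        }

        public static void main(String[] args) {
            // Both names land near the top of this toy scale, even though only
            // the second one reveals the programmer's intention.
            System.out.println(score("somethingConfusing")); // 0.9
            System.out.println(score("exactlyWhatIMean"));   // 0.8
        }
    }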
A. Comparison to other techniques
Fagan Inspections, Code Reviews, and Pair Programming
are other techniques that improve code quality as summarized
in Table I. Fagan Inspections are a proven, time intensive
process for finding defects where a committee of developers
reviews code [21]. Inspections often include programming
style guides and coding standards. While the author is present,
the emphasis is on defect identification, not revealing why
reviewers might be confused by the code. Code Reviews are
a popular, light-weight process where one developer reviews
code before it is committed to the master branch or trunk of a
source code management system [22]. The author receives a
list of suggested changes or issues to fix. Since the author is
not present, the author does not see the process the reviewer
goes through to understand the code. Developers primarily
use code reviews for bug detection [22], [23], not for training
developers how to write readable code. Pair programming
occurs when two developers write the code at the same time.
Pair programming enables continuous reviewing of code, but
doesn’t provide a fresh perspective to reveal issues to which
the authors are blind [24]. Resistance to adoption
comes either from management who sees it as more expensive
than solo programming or from programmers who do not like
the social implications of the process.
Note: Bacchelli and Bird report that programmers thought
the purpose of code reviews was to find defects, when in reality
the programmers were increasing their understanding of the code
[23]. If this is the main benefit of code reviews, it is possible
to design other mechanisms to increase code understandability
more efficiently than the code review technique.
Perspective-Based-Reading reviews requirements docu-
ments from prescribed roles such as user, developer, and
tester [25]. A developer will convert requirements into a
design and a tester converts requirements into a test plan in
order to determine if there are omissions and defects in the
requirements.
Yet the question remains, “Can the relevant community un-
derstand and maintain the code?” Thus we can ask ourselves,
“how do we know if our code is readable?”
III. CODE READABILITY TESTING

TABLE I
COMPARISON TO OTHER TECHNIQUES
Code Readability Testing: purpose is to understand code; roles are author and reader; feedback to the author is synchronous.
Code Review: purpose is to find defects and to understand code; roles are author and reviewer; feedback to the author is asynchronous.
Fagan Inspection: purpose is to find defects; roles are author, moderator, inspectors (2+), recorder, and reader/timekeeper; feedback to the author is asynchronous.
Pair Programming: purpose is high-quality code; roles are author and author; feedback to the author is not applicable.

The technique proposed here uses an experienced program-
mer to read code samples by thinking out loud and expressing
the reader’s thought process in understanding the code. During
the session, the author of the code observes if and where diffi-
culties emerge. At the end of the session, the two programmers
discuss approaches to improve code readability. This process
reveals to the code author how another programmer parses and
understands the author’s code [26].
1) The author tells the reader the main use case, story card,
or functionality produced. The author does not explain
the design or the code.
2) The author indicates which files were added or modified.
Starting with test cases helps the reader understand how
the code is used by client code.
3) The reader reads the code aloud and explains the reader’s
mental thought process. If the code is unclear, the
reader speculates on the intention of the code. The
reader verbally describes how he or she thinks the code
works and explains his or her thought process. Voicing
questions helps focus the reader and author. If the reader
does not understand a line of code due to unfamiliar
programming syntax, the reader asks the author what
the operation does.
4) The author does not respond to what the reader is
thinking or asking. The author can take notes about what
makes particular sections confusing.
5) At the end, the reader confirms with the author that
the reader properly understands the code. The author
then asks the reader any clarifying questions about the
experience.
6) The author and the reader discuss how to improve the
code.
The Usability Testing technique [27] from Human Com-
puter Interaction serves as a model for this process. In usability
testing, user experience designers watch representative users
attempt tasks on a prototype or the actual interface of a
product. The researcher observes the user to determine what
is obvious and what confuses the user. In particular, the user’s
natural interaction with the system informs natural affordances
for the user experience design. The system deviating from user
expectations indicates opportunities for improved design. In
Code Readability Testing, the product is the source code, and
the user is another developer.
The ideal reader represents future developers and those who
will maintain the system. For the typical team, developers on
the same team serve as ideal readers. For an open source
project, core developers and contributors serve as ideal readers.
The ideal reader possesses experiences similar to those of
the author, and is proficient with the programming language,
framework, and libraries used. If programmers expect their
code to be routinely read by less experienced programmers,
then novices would be ideal readers.
IV. FIELD STUDY
The researcher followed the Code Readability Testing with
each programmer in four separate one-on-one sessions to
assess effectiveness and observe improvements over time. The
programmers were 21 software engineering graduate students
enrolled in the “Craft of Software Development” course at
Carnegie Mellon University in Silicon Valley during the Spring
2013 semester.
The researcher scheduled each session for thirty minutes,
spaced three weeks apart, thus producing 84 data points. For
each session, the researcher asked the students to bring “pro-
duction ready” code, software that was ready to be released
on a real project. The students selected their own projects to
work on. At the end of each session, the researcher recorded
the review’s duration, the number and type of issues detected,
and assessment of the overall readability score.
The student’s professional development experience ranged
from zero to eight years. The average number of years of
experience was three years.
A. Readability Score
This paper defines code readability as the amount of mental
effort required to understand the code. After examining the
code, the researcher assigned a readability score following this
scale:
4) Easy to read
3) Pretty easy to read
2) Medium difficulty
1) Very challenging
In existing studies [10], [11], [20], [28], [29], there is no
standard readability definition or score. In both the Buse and
Dorn studies, participants rate code on a Likert scale from
“very unreadable” (1) to “very readable” (5), or from “unreadable”
to “readable” [18], [19]. The participants define their own
meaning for readable.
In using this scale, the researcher noticed that the duration
of the review correlated with the amount of effort required.
For example, reviewing “easy to read” code took little time:
the average length was 8 minutes with 4 minutes of variance.
Reviewing “very challenging” code
often consumed the whole session. The correlation between
readability score and the time to review was 0.77.
Typically “easy to read” code presents the reader with a
simple to follow narrative, keeping a few items in short term
memory. “Very challenging” code obscures the programmer’s
intention. When the reader grabs a sheet of paper and manually
executes the computer program by writing down variable
values in order to understand the program logic, then the code
is “very challenging” to read.
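As a hypothetical illustration of the difference (neither version is taken from the study’s code samples), the first Java method below forces the reader to trace two accumulator variables by hand, while the second expresses the same computation, the sum of the two largest non-negative values, with intention-revealing names:

    public class TracingExample {
        // Hypothetical "very challenging" fragment: the reader must trace x
        // and y by hand to discover what the loop computes. Both versions
        // assume the input values are non-negative.
        static int f(int[] a) {
            int x = 0, y = 0;
            for (int i = 0; i < a.length; i++) {
                if (a[i] > x) { y = x; x = a[i]; }
                else if (a[i] > y) { y = a[i]; }
            }
            return x + y;
        }

        // The same logic rewritten so the names carry the narrative.
        static int sumOfTwoLargest(int[] values) {
            int largest = 0;
            int secondLargest = 0;
            for (int value : values) {
                if (value > largest) {
                    secondLargest = largest;
                    largest = value;
                } else if (value > secondLargest) {
                    secondLargest = value;
                }
            }
            return largest + secondLargest;
        }
    }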
There are common solutions to many programming prob-
lems. “Very challenging” code might avoid typical solutions
or typical constructs for a solution. When the code’s solution
is different from the reader’s expectation for the solution, the
reader finds the code “very challenging.
After reviewing the data, it became clear that the data
could be collapsed into two distinct groups, “readable” and
“unreadable” code. The researcher grouped “Pretty easy to
read” and “Easy to read” code samples as “readable” code
and grouped “Very challenging” and “Medium difficulty” code
samples as “unreadable” code. For unreadable code, the code
clearly required rework before submission on a project. When
comparing these two groups, the code samples were indeed
night and day.
V. RESULTS
Research Question 1: Would programmers who repeatedly
follow Code Readability Testing increase the readability of
their code?
After graphing trends in the data, the researcher lumped
the data into four groups: programmers who initially wrote
readable code and made small improvements, programmers
who initially wrote unreadable code and made large improve-
ments, programmers who initially wrote unreadable code and
continued to do so, and programmers whose results are not
clear.
Result                                               Count
Readable to readable (with small improvements)          11
Unreadable to readable (with large improvements)         5
Unreadable to unreadable                                  1
Results are not clear                                     4
Total                                                    21
Starting from the first session, 11 of the programmers wrote
readable code consistently. While small improvements can be
made to the code, the reader easily understood the code. Of
these 11, five progressed from “pretty easy to read” to “easy
to read” as represented by Figure 1. The process did not hurt
the programmer’s ability to write code.
Fig. 1. Programmer #16 consistently wrote “readable” code with small
improvements

Five programmers initially produced “unreadable code” but
over time started improving and finished by writing “readable
code” as illustrated by Figure 2. For some, immediate changes
occurred, whereas for one programmer, the change required a
few sessions.

Fig. 2. Programmer #11 started by writing “unreadable” code and progressed
to “readable” code
One programmer consistently wrote “unreadable code” dur-
ing each session as shown in Figure 3. While the programmer
improved variable and method naming, the programmer ig-
nored feedback such as breaking up multiple nested for loops
and if statements. Instead of taking the time to increase
readability, the participant reasoned, “I want my code to be as
efficient as possible.” (Ironically, by only making readability
improvements, the readable code was more efficient than the
original code.)
Four of the data plots were “all over the place.” While two
of them trended towards more “readable code,” the researcher
classified them as outliers. Considering the entire sample
size, this means that 16 of the 21 programmers improved
their ability to write readable code. When considering the 10
programmers whose code could most benefit from improved
readability, five achieved large improvements.
Looking only at the first and last sessions, an interest-
ing result emerged. During the first session, 13 programmers
wrote readable code and all still wrote readable code at
the end. During the first session, eight programmers wrote
unreadable code; at the end, two wrote unreadable code and
six wrote readable code.

Fig. 3. Programmer #1 continued to write “unreadable” code
Result 1: Most programmers who write “unreadable” code
significantly improve and start writing “readable” code after
four sessions. Programmers who initially write “readable”
code also improve their ability to write readable code.
Research Question 2: What kinds of issues does Code
Readability Testing detect?
In reviewing the notes on the 84 sessions, the researcher
classified suggestions and feedback based upon feedback type.
The researcher relied on unstructured interview notes, not
an inspection checklist. The following table prioritizes the
feedback by the frequency of each feedback type across all
84 sessions. For example, 45 of the 84 reviews mentioned
altering the name of variables as a means to improve readability.
Improve code readability by...                     Number of reviews
Improving variable names                                     45 / 84
Improving method names                                       25 / 84
Extract method to reduce code duplication                    26 / 84
Simplifying if conditions                                    10 / 84
Reducing if nesting                                          11 / 84
Simplifying loop conditions                                  11 / 84
Reducing loop structures                                      5 / 84
Improving class names                                         3 / 84
Re-sequencing method arguments                                1 / 84
Simplifying data structures                                   1 / 84
Although not a specific goal, readability testing found nine
defects in eight of the code samples.
Result 2: Code readability testing detects readability
issues that are solved by improvements to variable names,
improvements to method names, the creation of new methods
to reduce code duplication, simplifying if conditions and
nesting of if statements, and simplifying loop conditions.
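As a hypothetical sketch of what these fixes look like in practice (the names and the small formatting task below are invented for illustration and do not come from the observed sessions), the first Java method combines vague names, a compound if condition, and duplicated formatting, while the second applies the kinds of suggestions listed above:

    import java.util.List;

    public class LabelFormatter {
        // Hypothetical "before" version: vague names, a compound if condition,
        // and the bracket formatting written twice.
        static String chk(List<String> l, String s) {
            if (s != null && !s.trim().isEmpty() && l != null && !l.isEmpty()) {
                return "[" + s.trim().toUpperCase() + "]";
            }
            return "[" + "UNKNOWN" + "]";
        }

        // "After" version: renamed variables and methods, the condition
        // extracted into a named helper, and the duplication pulled into
        // one formatting method.
        static String formatLabel(List<String> tags, String rawLabel) {
            if (hasUsableLabel(tags, rawLabel)) {
                return bracketed(rawLabel.trim().toUpperCase());
            }
            return bracketed("UNKNOWN");
        }

        static boolean hasUsableLabel(List<String> tags, String rawLabel) {
            boolean labelPresent = rawLabel != null && !rawLabel.trim().isEmpty();
            boolean tagsPresent = tags != null && !tags.isEmpty();
            return labelPresent && tagsPresent;
        }

        static String bracketed(String text) {
            return "[" + text + "]";
        }
    }

Each change mirrors one of the frequent feedback types above: renaming identifiers, extracting a well-named method for the compound condition, and removing the duplication.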
Research Question 3: How time-consuming is readability
testing?
The reader’s subjective experience was that processing
“easy to read” code was not time consuming. If a system is
composed entirely of “easy to read” code, then the overhead
of this process is small. If a system has “very challenging”
sections of code, then it is worth reviewing. When the reviewer
detects unreadable code, terminating the process allows a
discussion of ways to improve code readability.
Readability Score        Median time on review
Very challenging *       30 minutes
Medium difficulty        20 minutes
Pretty easy to read      11 minutes
Easy to read              8 minutes
Note: the sessions were limited to 30 minutes, the length
of the meeting. Often another session was scheduled after any
given session. If the reader could not understand the code after
30 minutes, the session was ended.
Result 3: For readable code, readability testing is
straightforward. For unreadable code, the process takes
significant time. Once unreadable code is detected, the reader
and the author can agree that the code needs rework and end
the session early.
Research Question 4: How did programmers perceive Code
Readability Testing?
At the end of the four sessions, the programmers answered
an anonymous survey about their experience; 20 of the
21 participants completed the survey. The self-assessment
exposes the programmers’ perception of the technique.
Question: “Was it worth your time or not worth your time?”
20 out of 20 say that following the process was worth their
time.
Question: “Why was it worth or why was it not worth your
time?”
The free-text responses were grouped according to themes.
If participants mentioned multiple reasons, then each reason
counts in each theme.
Code Readability Testing...                                          Count
allows me to see areas of improvement to increase code readability      9
allows me to see a different perspective on my code                     7
provides guidance by someone with more experience                       4
motivates me to improve the readability of my code                      3
allows me to know if my code was understandable                         3
allows me to improve my programming speed                               1
increases collaboration of the software development process             1
Question: “Did you learn how another developer reads and
understands your code?”
Out of the 20 participants, 18 participants said yes, and two
skipped the question.
Question: “How has this affected the way you write soft-
ware?”
See Table II for responses.
TABLE II
HOW HAS THIS AFFECTED THE WAY YOU WRITE SOFTWARE?
I now... Count
choose clearer variable and method names 9
consider the needs of future readers 7
think about the code narrative 5
write shorter methods 2
don’t repeat yourself (DRY) 2
avoid deep nested if-else logic 1
re-read code before committing 1
isolate complex logic into a method 1
Question: “Did you see the reader struggle with under-
standing your code?”
Out of the 20 participants, 10 participants said yes.
Question: “If so, how did it make you feel?”
I am... Count
motivated to write more readable code 5
inspired as it was revealing and insightful 4
Result 4: Programmers think following readability testing
is worth their time. Their ability to write readable code
increases. They articulate concrete improvements to the way
they write code. When programmers see a reader struggle to
understand their code, the programmers become motivated to
write readable code and are inspired by another developer’s
point of view.
VI. THREATS TO VALIDITY
A. Construct Validity
1) Code Readability Testing has the reviewer “think aloud”
as they read through the code. The “think aloud” activity
might not mirror the process a programmer uses when
they read code to themselves.
B. Internal Validity
1) The selection of the reviewer: in order to remove the
difficulty of inter-reviewer reliability, there is only one
reviewer in this study. The reviewer is the researcher,
which leads to possible researcher bias. The results
might change with a different reviewer. Another re-
viewer might find more or fewer issues. Another re-
viewer might be more or less experienced at reading
other people’s code.
The reviewer has professional experience in C, C++,
Java, and Ruby. The reviewer is able to read and under-
stand the provided C#, Javascript, Objective C, Python,
and Dart code. When the reviewer did not understand
programming language syntax or idioms, the reviewer
asked the author for clarification. While the reviewer is
able to understand Javascript code, a more experienced
Javascript programmer might find issues not detected.
2) The selection of programming assignments: the pro-
grammers selected what to work on. The difficulty level
of each session might not be consistent.
3) The selection of programming languages: this study
verifies that the approach works within a variety of pro-
gramming languages and problem domains. For future
research, constraining to a particular language may yield
stronger insights.
4) Influence from other graduate courses: discussions in
the concurrent metrics course and the craft of software
development course about code quality might affect
the results by sensitizing students to the need to write
readable code.
C. External Validity
1) The participants were software engineering students
enrolled in a master’s program. Their professional devel-
opment experience ranged from zero to eight years. The
average number of years of experience was three years.
The correlation between years of industry experience
and improvement was 0.31 showing little relationship
between improvement and years of industry experience.
In fact, the two participants with the most industry ex-
perience (seven years and eight years) both dramatically
improved their ability to write readable code. Since all
the students were still at the beginning of their careers,
the drastic improvements in writing readable code might
not transfer to more experienced programmers.
VII. FUTURE RESEARCH
Several of the programmers appreciated the value of a more
experienced developer providing feedback. Future work could
reveal the results when the reader and the author possess
similar expertise, or if the reader possesses less expertise than
the author. If code needs to be readable by less experienced
peers, then learning how less experienced programmers read
code should provide valuable feedback.
Removing the researcher from the reader role would remove
researcher bias. Perhaps students could act as readers for each
other if they’re given training.
Future work could entail a direct comparison between code
reviews and readability testing. Next time, all the programmers
could finish the same programming exercise and the researcher
could directly compare the results from the two techniques.
One subject persistently wrote “unreadable” code. The sub-
ject defended his strategy because “I want my code to be as
efficient as possible.” Future work could examine how preva-
lent is this attitude of writing “efficient” but unreadable code,
determine where its origins, and suggest possible mitigation
steps. In 1974, Knuth proclaimed that premature optimization
is the root of all evil [30], yet the problem remains today.
VIII. CONCLUSIONS
Code readability testing addresses the question, “Is my
code readable?” by exposing the thought process of a peer
reading the code. In this study, 21 programmers followed
Code Readability Testing in four sessions. Most programmers
writing “difficult to read” code became programmers writing
“easy to read” code after three sessions. Programmers writing
“easy to read” code improved their skill. This study identifies
several common fixes to unreadable code including improve-
ments to variable names, improvements to method names,
the creation of new methods to reduce code duplication,
simplifying if conditions and structures, and simplifying loop
conditions. The programmers reported that the technique is
worth their time and articulated how readability testing alters
their programming habits.
REFERENCES
[1] F. P. Brooks Jr., The mythical man-month. Addison-Wesley, 1975.
[2] Computer Science Curriculum 2008: An Interim Revision of CS 2001.
ACM and the IEEE Computer Society, 2008.
[3] P. Bourque and R. Dupuis, Guide to the software engineering body of
knowledge. IEEE Computer Society Press, 2004.
[4] A. Pyster et al., Graduate Software Engineering (GSwE2009)
Curriculum Guidelines for Graduate Degree Programs in Software
Engineering. Stevens Institute, 2009.
[5] B. W. Kernighan and P. J. Plauger, The Elements of Programming Style,
1982.
[6] G. M. Weinberg, Understanding the Professional Programmer. Dorset
House, 1982.
[7] A. Hunt and D. Thomas, The pragmatic programmer: from journeyman
to master. Addison-Wesley Longman Publishing Co., Inc., 2000.
[8] R. C. Martin, Clean Code: A Handbook of Agile Software
Craftsmanship. Prentice Hall PTR, 2008.
[9] K. Beck, Implementation Patterns. Addison-Wesley Professional, 2006.
[10] X. Wang, L. Pollock, and K. Vijay-Shanker, “Automatic segmenta-
tion of method code into meaningful blocks to improve readability,”
in Proceedings of the 2011 18th Working Conference on Reverse
Engineering.
[11] Y. Sasaki, Y. Higo, and S. Kusumoto, “Reordering program statements
for improving readability,” in Software Maintenance and Reengineering
(CSMR), 2013 17th European Conference on, March 2013.
[12] D. Binkley, D. Lawrie, S. Maex, and C. Morrell, “Identifier length
and limited programmer memory,” Science of Computer Programming,
2009.
[13] S. Butler, M. Wermelinger, Y. Yu, and H. Sharp, “Exploring the influence
of identifier names on code quality: An empirical study,” in Proceedings
of the 2010 14th European Conference on Software Maintenance and
Reengineering. IEEE Computer Society, Conference Proceedings.
[14] B. Liblit, A. Begel, and E. Sweeser, “Cognitive perspectives on the role
of naming in computer programs,” in Proceedings of the 18th Annual
Psychology of Programming Workshop, Conference Proceedings.
[15] P. A. Relf, “Tool assisted identifier naming for improved software
readability: an empirical study,” in 2005 International Symposium on
Empirical Software Engineering.
[16] D. Binkley, M. Davis, D. Lawrie, and C. Morrell, “To camelcase or
under score,” in ICPC ’09. IEEE 17th International Conference on
Program Comprehension, Conference Proceedings, pp. 158–167.
[17] D. M. Jones, “Operand names influence operator precedence decisions,”
CVu, 2008.
[18] R. P. Buse and W. R. Weimer, “Learning a metric for code readability,”
IEEE Computer Society, 2010.
[19] J. Dorn, “A general software readability model,” 2012.
[20] D. Posnett, A. Hindle, and P. Devanbu, “A simpler model of software
readability,” in Proceedings of the 8th Working Conference on Mining
Software Repositories.
[21] M. E. Fagan, “Advances in software inspections,” IEEE Transactions in
Software Engineering, 1986.
[22] J. Cohen, S. Teleki, and E. Brown, Best Kept Secrets of Peer Code
Review. Smart Bear, 2006.
[23] A. Bacchelli and C. Bird, “Expectations, outcomes, and challenges
of modern code review,” in Proceedings of the 2013 International
Conference on Software Engineering. IEEE Press, Conference Pro-
ceedings.
[24] J. Cohen. (2013) Does pair programming obviate the need for code re-
view? [Online]. Available: http://blog.smartbear.com/programming/does-
pair-programming-obviate-the-need-for-code-review/
[25] V. R. Basili, S. Green, O. Laitenberger, F. Lanubile, F. Shull,
S. Sørumgård, and M. V. Zelkowitz, “The empirical investigation of
perspective-based reading,” Empirical Software Engineering, 1996.
[26] T. Sedano. (2011) Code readability testing process. [Online]. Available:
http://sedano.org/journal/2011/3/30/code-readability-process.html
[27] J. Nielsen, Usability Engineering. Morgan Kaufmann Publishers Inc.,
1993.
[28] M. Hansen, R. L. Goldstone, and A. Lumsdaine, “What makes code
hard to understand?” 2013.
[29] D. Crookes, “Generating readable software,” Software Engineering
Journal, vol. 2, no. 3, pp. 64–70, 1987.
[30] D. E. Knuth, “Computer programming as an art,” Communications of
the ACM, vol. 17, no. 12, pp. 667–673, 1974.
Even bad code can function. But if code isnt clean, it can bring a development organization to its knees. Every year, countless hours and significant resources are lost because of poorly written code. But it doesnt have to be that way.Noted software expert Robert C. Martin, presents a revolutionary paradigm with Clean Code: A Handbook of Agile Software Craftsmanship. Martin, who has helped bring agile principles from a practitioners point of view to tens of thousands of programmers, has teamed up with his colleagues from Object Mentor to distill their best agile practice of cleaning code on the fly into a book that will instill within you the values of software craftsman, and make you a better programmerbut only if you work at it.What kind of work will you be doing? Youll be reading codelots of code. And you will be challenged to think about whats right about that code, and whats wrong with it. More importantly you will be challenged to reassess your professional values and your commitment to your craft. Clean Code is divided into three parts. The first describes the principles, patterns, and practices of writing clean code. The second part consists of several case studies of increasing complexity. Each case study is an exercise in cleaning up codeof transforming a code base that has some problems into one that is sound and efficient. The third part is the payoff: a single chapter containing a list of heuristics and smells gathered while creating the case studies. The result is a knowledge base that describes the way we think when we write, read, and clean code.Readers will come away from this book understandingHow to tell the difference between good and bad codeHow to write good code and how to transform bad code into good codeHow to create good names, good functions, good objects, and good classesHow to format code for maximum readability How to implement complete error handling without obscuring code logicHow to unit test and practice test-driven developmentWhat smells and heuristics can help you identify bad codeThis book is a must for any developer, software engineer, project manager, team lead, or systems analyst with an interest in producing better code.