PySStuBs: Characterizing Single-Statement Bugs in
Popular Open-Source Python Projects
Arthur V. Kamienski, Luisa Palechor, Cor-Paul Bezemer, Abram Hindle
University of Alberta
Edmonton, Alberta, Canada
Abstract—Single-statement bugs (SStuBs) can have a severe
impact on developer productivity. Despite usually being simple
and not offering much of a challenge to ﬁx, these bugs may still
disturb a developer’s workﬂow and waste precious development
time. However, few studies have paid attention to these simple
bugs, focusing instead on bugs of any size and complexity. In
this study, we explore the occurrence of SStuBs in some of
the most popular open-source Python projects on GitHub, while
also characterizing their patterns and distribution. We further
compare these bugs to SStuBs found in a previous study on
Java Maven projects. We ﬁnd that these Python projects have
different SStuB patterns than the ones in Java Maven projects
and identify 7 new SStuB patterns. Our results may help uncover
the importance of understanding these bugs for the Python
programming language, and how developers can handle them.
Index Terms—Single-statement bugs, Python, open-source
I. INTRODUCTION
All software developers have to deal with bugs at some
point in their careers, whether working on toy projects for
leisure or developing enterprise-grade software for industry.
These bugs occur for countless reasons, from syntax
errors to issues with programming logic. Bugs also
vary in size and complexity, ranging from a single wrong token
to many lines of code spread across different components.
Tricky bugs that span multiple functions and statements can
be a great challenge for developers to unravel, costing them
precious hours of work time. Single-statement
bugs (also known as simple stupid bugs, or SStuBs)
also jeopardize developer productivity by interrupting their
workflow, despite being easier to fix. Frequently occurring
SStuBs may significantly impact workflows by continuously
making developers switch contexts to fix problems.
Several studies have analyzed the impact of bugs on
developer productivity and projects’ lifecycles. Many
of those have focused on automatically identifying bugs to
relieve developers of the burden of manually searching for and
fixing them. However, researchers have paid little
attention to SStuBs and their relevance to software
development. Studying and characterizing SStuBs can help
developers identify them sooner, reducing the amount
of time they invest in solving the problem.
Recently, a study by Karampatsis and Sutton identified
and analyzed SStuBs in 1,000 open-source Java Maven
projects. The authors characterize 16 SStuB patterns and
discuss their frequency in those projects. While that study
sheds light on how SStuBs occur and the importance of
studying them, it only focuses on Java Maven projects.
With that in mind, this paper identifies and analyzes the
occurrence of SStuBs in a different programming language,
Python, using 1,000 of the most popular Python projects on GitHub.
The Python programming language differs in several ways
from other languages such as Java, which may be
reflected in the types and number of occurrences of SStuBs.
By collecting data from these Python projects and using
an approach similar to the one used by Karampatsis and
Sutton, we seek to understand the differences in SStuBs
between the Java and Python projects. More specifically, we
answer the following Research Questions (RQs):
RQ1. What are the most common single-statement bugs
in the most popular open-source Python projects?
The differences between Python and other programming
languages may result in the occurrence of different types
of single-statement bugs in Python projects. In this research
question, we discuss the types of SStuBs we identiﬁed in
the studied Python projects. We identify the 16 most frequent
patterns, and characterize 7 new patterns not found among the
patterns presented by Karampatsis and Sutton.
RQ2. How do the single-statement bugs we identiﬁed
compare to the ones found in Java Maven projects?
While Python projects might contain new SStuB patterns,
we still expect to ﬁnd some of the patterns described for Java
Maven projects due to the similarities between the syntaxes
of the two languages (e.g., control structures and arithmetic
operators). In this research question, we compare the types
of Python SStuBs to the ones in Java Maven projects as
described by Karampatsis and Sutton. We find that some of
the SStuBs are unique to each programming language, which
affects their frequency in the projects.
II. METHODOLOGY
In this section, we describe the methodology we used to
select the studied projects, gather their data, and identify
SStuBs. Figure 1 shows an overview of the steps we took.
A. Selecting Python projects
We selected the 1,000 most popular Python projects on
GitHub as measured by their number of stars in January 2021.
We used GitHub’s search engine to obtain a list of Python
projects (i.e., projects which have most of their content written
in Python) ordered by their number of stars. We chose this
specific number of projects to provide a fairer comparison with
the 1,000 Java Maven projects collected by Karampatsis and
Sutton for their “SStuBs L” dataset.
Fig. 1. Overview of the steps taken in our methodology.
B. Gathering data
We gathered data from the projects we selected in
Section II-A using World of Code (WoC), an infrastructure
for mining open-source software and its version control data,
which is updated on a monthly basis. Starting from the project
IDs on GitHub, we used the WoC API to collect the commits
associated with them. We used a similar process to gather the
files containing source code associated with each commit. The
data was collected in January 2021.
We note that the data we collected from WoC is not an exact
representation of the projects, as the API could not retrieve
some of the entities belonging to the projects. For example,
we could not collect some of the commits referenced by the
projects. However, we measured an overall loss of data of less
than 10%, which should not affect our overall results.
We collected 6,062,534 commits from the 1,000 projects
we selected. We identified bug-fixing commits using the same
methodology described by Karampatsis and Sutton, i.e., by
selecting commits whose messages contain at least one of the following
keywords: ‘error’, ‘bug’, ‘fix’, ‘issue’, ‘mistake’, ‘incorrect’,
‘fault’, ‘defect’, ‘flaw’, and ‘type’. Using this method, we
obtained 1,844,369 bug-fixing commits. In addition, we excluded
commits whose messages included the ‘refactor’ keyword to help
reduce the number of false positives in our sample, but those
only accounted for 1% (18,156) of the commits. Lastly, we
filtered out any commits that added or deleted files.
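A minimal sketch of this keyword-based selection is shown below. The function name is_bug_fixing is hypothetical, and the matching strategy (case-insensitive substring search) is an assumption, since the text only lists the keywords.

```python
# Keywords listed in the text above.
BUG_KEYWORDS = ("error", "bug", "fix", "issue", "mistake", "incorrect",
                "fault", "defect", "flaw", "type")


def is_bug_fixing(message: str) -> bool:
    """Return True if a commit message looks like a bug fix.

    Assumption: keywords are matched as case-insensitive substrings.
    """
    text = message.lower()
    # Messages mentioning 'refactor' are excluded to reduce false positives.
    if "refactor" in text:
        return False
    return any(keyword in text for keyword in BUG_KEYWORDS)


messages = ["Fix off-by-one in pagination", "Refactor and fix parser", "Add dark mode"]
bug_fixing = [m for m in messages if is_bug_fixing(m)]
```

Substring matching is deliberately permissive: a message containing ‘prototype’ would match ‘type’, which is one way such a filter can produce false positives.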
We then gathered the files referenced by each commit.
In this step, we collected the files before and after they were
modified by the commit. As the projects can also contain files
that are not written in Python, we filtered out any files whose
names do not have the ‘.py’ suffix. We also
excluded any files that are not valid Python 3.6 code,
as those could not be parsed into Abstract Syntax Trees (ASTs)
in later steps using our version of Python 3.
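The file filter can be sketched as follows. The helper name is_parsable_python is hypothetical. Note one assumption: ast.parse checks the source against the grammar of the interpreter running the sketch, which is not necessarily Python 3.6.

```python
import ast


def is_parsable_python(file_name: str, source: str) -> bool:
    """Keep only '.py' files whose content parses into an AST."""
    if not file_name.endswith(".py"):
        return False
    try:
        ast.parse(source)
    except (SyntaxError, ValueError):
        # SyntaxError covers invalid code (e.g., Python 2 print statements);
        # some interpreter versions raise ValueError for null bytes.
        return False
    return True
```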
C. Identifying single-statement bugs
We followed an approach similar to that of Karampatsis and
Sutton to identify SStuBs. First, we used the Unix ‘diff’
command to identify the line changes between each file pair.
We selected only the pairs which showed single-statement
changes, while discarding those containing line deletions and
additions. We also discarded all of the ﬁles from commits that
showed multiple-statement changes in any position in any ﬁle.
After this step, we were left with 148,450 ﬁle pairs.
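The selection of file pairs can be approximated with the sketch below. It uses Python's difflib as a stand-in for the Unix ‘diff’ command, and it treats a statement as a single line, which is a simplifying assumption. The function name single_line_changes is hypothetical.

```python
import difflib


def single_line_changes(before: str, after: str):
    """Return (old_index, new_index) pairs of modified lines.

    Returns None if the files differ by anything other than one-line
    replacements (e.g., added lines, deleted lines, multi-line edits).
    """
    old_lines, new_lines = before.splitlines(), after.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # Anything but a one-line-for-one-line replacement discards the pair.
        if tag != "replace" or i2 - i1 != 1 or j2 - j1 != 1:
            return None
        changes.append((i1, j1))
    return changes
```

For a pair that differs only in its second line, the function returns [(1, 1)]; a pair in which a line was appended yields None.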
For ﬁle pairs that contain multiple single-statement changes,
we derived new pairs by applying each of the changes to the
original ﬁle, one at a time. Thus, each new pair contained only
one change. We parsed the ﬁles of each pair using Python’s
ast library, yielding an Abstract Syntax Tree (AST) for each of
them. The ast library ignores comments and whitespace, and
we therefore do not consider changes to those in our analysis.
We also ignore changes to class and function docstrings.
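The derivation of one-change pairs, and the effect of comparing ASTs instead of text, can be illustrated as follows. The function names are hypothetical, changes are assumed to be (old_index, new_index) line pairs, and docstring handling is omitted.

```python
import ast


def same_ast(before: str, after: str) -> bool:
    """True if both sources parse to the same tree.

    ast.dump omits line and column attributes by default, so comment
    and whitespace edits do not show up as differences.
    """
    return ast.dump(ast.parse(before)) == ast.dump(ast.parse(after))


def derive_single_change_pairs(before: str, after: str, changes):
    """Apply each changed line to the original file, one at a time."""
    old_lines, new_lines = before.splitlines(), after.splitlines()
    pairs = []
    for old_index, new_index in changes:
        patched = list(old_lines)
        patched[old_index] = new_lines[new_index]
        candidate = "\n".join(patched)
        # Drop derived pairs whose change does not alter the AST.
        if not same_ast(before, candidate):
            pairs.append(("\n".join(old_lines), candidate))
    return pairs


before = "a = 1  # one\nb = f(a)\n"
after = "a = 1  # uno\nb = g(a)\n"
pairs = derive_single_change_pairs(before, after, [(0, 0), (1, 1)])
```

Here the first change only touches a comment and is dropped, so a single derived pair remains.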
We wrote a custom Python script to compare the resulting
AST pairs. Using the script, we performed a simultaneous
depth-first traversal of the two trees in each pair and located the first pair of
nodes in which the trees differ. Each pair of diverging nodes
thus corresponds to a single-statement change between the
files. We manually analyzed each type of node difference to
identify whether it matched any of the SStuB patterns described
by Karampatsis and Sutton. However, unlike Karampatsis
and Sutton, our method only matches the first pattern found
for a statement and not all of them. Furthermore, we analyzed
the most common types of node differences to identify the
16 most common SStuBs, and found 7 new SStuB patterns
(described in Section III-A).
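A simultaneous depth-first traversal of two ASTs can be sketched as below. This is an illustrative reimplementation and not the script used in the study; the function name first_difference and the choice to report the enclosing node when a primitive field or a list length differs are assumptions.

```python
import ast


def first_difference(before: ast.AST, after: ast.AST):
    """Walk two ASTs in lockstep and return the first diverging node pair.

    Returns None when the trees are structurally identical.
    """
    if type(before) is not type(after):
        return before, after
    for field in before._fields:
        old = getattr(before, field, None)
        new = getattr(after, field, None)
        old_items = old if isinstance(old, list) else [old]
        new_items = new if isinstance(new, list) else [new]
        # Added or removed children: the enclosing nodes are the divergence.
        if len(old_items) != len(new_items):
            return before, after
        for o, n in zip(old_items, new_items):
            if isinstance(o, ast.AST) and isinstance(n, ast.AST):
                difference = first_difference(o, n)
                if difference is not None:
                    return difference
            elif o != n or type(o) is not type(n):
                # A primitive field (identifier, constant value) changed.
                return before, after
    return None


old_node, new_node = first_difference(ast.parse("model.train(x)"),
                                      ast.parse("model.fit(x)"))
```

For this example, the diverging pair consists of the two Attribute nodes, whose attr fields are train and fit.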
D. Removing refactoring changes
We noticed that some of the patterns we identified in
Section II-C described changes to function and class definitions,
such as their names and arguments. As those patterns likely
relate to the refactoring of code and not to bug-ﬁxing changes,
we decided to exclude them from our analysis. We also
excluded changes made to statements that referenced those
refactored entities in any of the ﬁles that originated from the
same commit. For example, if a commit changed the name of
a class in one of its ﬁles, we excluded all of the changes made
to that class’ usages across all of the ﬁles in the commit.
We also observed that many of the single-statement changes
we identiﬁed describe changes to the values of string constants.
Strings in Python serve a large number of purposes, from
indexing values in dictionaries to storing data, and developers
may need to frequently change them to account for new
code versions. Strings also have a ﬂexible length and can
contain any type of written text, and are therefore prone to
errors and misspellings. However, not all of the changes to
those string values can be considered bugs, and including
them in our analysis may introduce many false positives.
For example, developers frequently use hard-coded strings
in natural language to write messages that describe errors,
program functionalities or interactions with users, and changes
to those strings may not change the behaviour of the program.
We therefore excluded these changes from our analysis.
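The two exclusion rules can be expressed as a check on a pair of diverging nodes, as in the sketch below. The names is_excluded_change and DEFINITION_NODES are hypothetical, and the sketch does not cover the exclusion of usages of renamed entities across the files of a commit.

```python
import ast

# Node types that describe function and class definitions and their arguments.
DEFINITION_NODES = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef,
                    ast.arguments, ast.arg)


def is_excluded_change(old_node: ast.AST, new_node: ast.AST) -> bool:
    """True for definition changes and for string-to-string constant changes."""
    if isinstance(old_node, DEFINITION_NODES) or isinstance(new_node, DEFINITION_NODES):
        return True
    return (isinstance(old_node, ast.Constant)
            and isinstance(new_node, ast.Constant)
            and isinstance(old_node.value, str)
            and isinstance(new_node.value, str))


def expr(source: str) -> ast.AST:
    """Parse a single expression and return its root node."""
    return ast.parse(source, mode="eval").body
```

A change from "File not found" to "File was not found" is excluded under this rule, while a change from "1" to 1 is kept because the constant's type changes.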
We note that we only identified trivial refactoring changes, and we
did not remove more complex refactorings. In the end, we
were left with 126,912 single-statement changes that altered
the ASTs, 58% (73,013) of which belong to the 23 SStuB
patterns we used. The remaining changes did not fit any of
our patterns. Our final dataset with 73,013 SStuBs is publicly
available online.
III. RESULTS
In this section we answer our two RQs by describing the
results of our analysis of the 73,013 SStuBs and 23 patterns
we obtained from Section II.
A. RQ1. What are the most common single-statement bugs in
the most popular open-source Python projects?
We found 7 new SStuB patterns in Python projects. Out
of the 23 SStuB patterns we identified, 7 were not previously defined
by Karampatsis and Sutton. While some of these patterns
may occur in Java, others only occur due to the difference in
syntax between the languages. We give a brief description of
each of these new patterns below. The number of occurrences
of each pattern can be seen in Table I.
• Change Attribute Used - When developers change
the attribute accessed from an object. For example,
person.name changes to person.age.
• Add Function Around Expression - When developers put
an expression inside a function call, often for modifying
the returned value. For example, human = person
changes to human = is_human(person).
• Add Elements to Iterable - When developers add an
element to a hard-coded iterable, such as a list or a
tuple. For example, info = (name, age) changes
to info = (name, age, height).
• Change Keyword Argument Used - When developers
change the keyword argument used in a function call or
object instantiation. For example, Person(name=20)
changes to Person(age=20).
• Add Method Call - When developers add a method call
to an expression which references an object, changing the
return value. For example, year = person changes to
year = person.birth_year().
• Change Constant Type - When developers
change the type of a hard-coded constant. For
example, person.age = '10' changes to
person.age = 10.
• Add Attribute Access - When developers access the
attribute of an object instead of the object itself.
For example, the argument in say_hello_to(person) changes to
an attribute of person.
TABLE I
COUNTS OF SSTUB PATTERNS IN PYTHON AND JAVA MAVEN PROJECTS.
PATTERNS IN BOLD INDICATE THE NEW PATTERNS WE IDENTIFIED.
NUMBERS IN BOLD SHOW THE PATTERNS THAT OCCUR OVER TWO TIMES
MORE IN JAVA.
Pattern name                     Python    %    Java    %
Same Function More Args           9,958   14   5,100    8
Wrong Function/Method Name        9,091   12  10,179   16
Change Identifier Used            8,973   12  22,668   35
Add Function Around Expression    6,363    9       0    0
Change Attribute Used             5,229    7       0    0
Change Numeric Literal            4,775    7   5,447    8
Change Operand                    4,657    6     807    1
Same Function Less Args           3,381    5   1,588    2
Add Method Call                   3,338    5       0    0
Add Elements to Iterable          2,541    3       0    0
More Specific If                  2,443    3   2,381    4
Change Constant Type              2,199    3       0    0
Change Unary Operator             2,187    3   1,016    2
Change Keyword Argument Used      1,554    2       0    0
Change Boolean Literal            1,466    2   1,842    3
Add Attribute Access              1,439    2       0    0
Same Function Wrong Caller        1,163    2   1,504    2
Change Binary Operator              976    1   2,241    5
Less Specific If                    943    1   2,813    4
Same Function Swap Args             336   <1     612    1
Change Modifier                       0    0   5,011    8
Delete Throws Exception               0    0     508    1
Missing Throws Exception              0    0     206   <1
Total                            73,013  100  63,923  100
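The seven new patterns described before Table I can be illustrated with a small rule-based classifier over a pair of expressions. The rules below are hypothetical simplifications written for this illustration; they are not the matching rules of the study, and the function name classify_new_pattern is an assumption.

```python
import ast


def _same(a: ast.AST, b: ast.AST) -> bool:
    return ast.dump(a) == ast.dump(b)


def classify_new_pattern(old_src: str, new_src: str):
    """Map an (old, new) expression pair to one of the 7 new pattern names."""
    old = ast.parse(old_src, mode="eval").body
    new = ast.parse(new_src, mode="eval").body

    if isinstance(old, ast.Constant) and isinstance(new, ast.Constant):
        if type(old.value) is not type(new.value):
            return "Change Constant Type"
        return None
    if (isinstance(old, ast.Attribute) and isinstance(new, ast.Attribute)
            and _same(old.value, new.value) and old.attr != new.attr):
        return "Change Attribute Used"
    if isinstance(new, ast.Attribute) and _same(new.value, old):
        return "Add Attribute Access"
    if (isinstance(old, (ast.List, ast.Tuple, ast.Set))
            and type(new) is type(old)
            and len(new.elts) > len(old.elts)
            and all(any(_same(o, n) for n in new.elts) for o in old.elts)):
        return "Add Elements to Iterable"
    if isinstance(new, ast.Call):
        # The old expression became the receiver of a method call.
        if isinstance(new.func, ast.Attribute) and _same(new.func.value, old):
            return "Add Method Call"
        # The old expression became an argument of a function call.
        if any(_same(arg, old) for arg in new.args):
            return "Add Function Around Expression"
        # Same call and values, but a keyword argument was renamed.
        if (isinstance(old, ast.Call) and _same(old.func, new.func)
                and [ast.dump(a) for a in old.args] == [ast.dump(a) for a in new.args]
                and len(old.keywords) == len(new.keywords)
                and [k.arg for k in old.keywords] != [k.arg for k in new.keywords]
                and all(_same(o.value, n.value)
                        for o, n in zip(old.keywords, new.keywords))):
            return "Change Keyword Argument Used"
    return None
```

Applied to the examples from the pattern descriptions, classify_new_pattern("person.name", "person.age") yields Change Attribute Used and classify_new_pattern("person", "person.birth_year()") yields Add Method Call.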
B. RQ2. How do the single-statement bugs we identiﬁed
compare to the ones found in Java Maven projects?
The studied Python and Java Maven projects share most
of the 16 original SStuBs. We found 13 of the 16 SStuBs
identified in Java Maven projects in the Python projects, although
in different proportions. We applied a Chi-squared (χ²) test
to the SStuB categories found in both Java and Python and
found that the difference in the proportions of SStuB types was
statistically significant (p < .001).
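As an illustration, the test statistic can be recomputed from the counts in Table I. The sketch assumes a test of independence on a 13 × 2 contingency table (shared patterns by language); the exact test setup is not detailed in the text. The critical value 32.909 is the standard χ² table value for 12 degrees of freedom at the 0.001 level.

```python
# (Python count, Java count) for the 13 shared patterns, copied from Table I.
SHARED_COUNTS = {
    "Same Function More Args": (9958, 5100),
    "Wrong Function/Method Name": (9091, 10179),
    "Change Identifier Used": (8973, 22668),
    "Change Numeric Literal": (4775, 5447),
    "Change Operand": (4657, 807),
    "Same Function Less Args": (3381, 1588),
    "More Specific If": (2443, 2381),
    "Change Unary Operator": (2187, 1016),
    "Change Boolean Literal": (1466, 1842),
    "Same Function Wrong Caller": (1163, 1504),
    "Change Binary Operator": (976, 2241),
    "Less Specific If": (943, 2813),
    "Same Function Swap Args": (336, 612),
}


def chi_squared(table):
    """Pearson chi-squared statistic and degrees of freedom for a count table."""
    rows = list(table.values())
    column_totals = [sum(column) for column in zip(*rows)]
    grand_total = sum(column_totals)
    statistic = 0.0
    for row in rows:
        row_total = sum(row)
        for observed, column_total in zip(row, column_totals):
            expected = row_total * column_total / grand_total
            statistic += (observed - expected) ** 2 / expected
    degrees_of_freedom = (len(rows) - 1) * (len(column_totals) - 1)
    return statistic, degrees_of_freedom


CRITICAL_VALUE = 32.909  # chi-squared table value for 12 d.o.f., alpha = 0.001
statistic, dof = chi_squared(SHARED_COUNTS)
significant = statistic > CRITICAL_VALUE
```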
We observed this difference in patterns such as Wrong
Function/Method Name (as seen in a commit from the
Keras project with a change from model.train to
model.fit), which comprised 16% (10,179) of the bugs
in Java and 12% (9,091) of the bugs in Python.
While shared patterns can occur in both languages, dif-
ferences between the syntax and type system of the two
programming languages make it impossible for the other three
patterns to occur in Python. Therefore, we did not observe
any Change Modifier, Missing Throws Exception or Delete
Throws Exception SStuBs. The Python programming language
does not have access level modiﬁers (e.g., public and private)
and developers instead use naming conventions to simulate
the access restriction behaviour. Similarly, there is no way to
explicitly denote that a function throws an exception.
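The short example below illustrates both points. The class Account and its members are hypothetical and only serve to show the naming conventions and the absence of a throws clause.

```python
class Account:
    def __init__(self, owner, balance):
        self.owner = owner            # public by convention
        self._balance = balance       # single underscore: internal by convention only
        self.__pin = "0000"           # double underscore: name-mangled to _Account__pin

    def withdraw(self, amount):
        """Withdraw money.

        The ValueError below is not part of the signature; it can only
        be documented, not declared.
        """
        if amount > self._balance:
            raise ValueError("insufficient funds")
        self._balance -= amount
        return self._balance


account = Account("alice", 10)
# Neither convention is enforced by the language:
visible_balance = account._balance
visible_pin = account._Account__pin
```

Because there is no modifier or throws clause in the source code, there is nothing for a Change Modifier or Missing Throws Exception fix to edit.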
Many of the new SStuBs we identiﬁed relate to Python’s
dynamic type system. We observed that many of the SStuB
patterns we identiﬁed in Python can be linked to the ﬂexible
way in which it allows developers to work with data types.
In Python, variables can be created and assigned a value
without an explicit type declaration, and then later be reas-
signed a new value of a different type. This dynamic typing
system allows for patterns such as Change Constant Type,
exemplified by a commit in the Django project in which the
value assigned to a variable was changed from a string to an
integer (param = "1" to param = 1).
Other examples are Add Method Call and Add Attribute
Access, which can only occur if the change from
an object reference to a return value is allowed. We
observed this pattern in a commit in the Ansible project,
when base.group_upgrade(group) was changed to
base.group_upgrade(group.id). Similarly, the Add
Function Around Expression pattern is usually related to
changing the value and type of an expression. We
observed such a change in the Scipy project, in which a
variable was cast to an integer when being returned from a
function (return nnz changed to return int(nnz)).
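The following sketch shows why such changes are only detected at run time in Python. The functions next_id and count_nonzero are hypothetical and loosely modeled on the Change Constant Type and Add Function Around Expression examples above.

```python
def next_id(current):
    # No declared parameter type: any object is accepted at call time.
    return current + 1


# Change Constant Type: passing "1" instead of 1 fails only when executed.
error = None
try:
    next_id("1")
except TypeError as exc:
    error = str(exc)
fixed = next_id(1)


def count_nonzero(values):
    # The count is accumulated as a float by accident.
    nnz = sum(1.0 for value in values if value != 0)
    # Add Function Around Expression: 'return nnz' becomes 'return int(nnz)'.
    return int(nnz)
```

In a statically typed language, the call next_id("1") would typically be rejected by the compiler before the program runs.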
In contrast, Java has a static type system that checks for
type inconsistencies during program compilation. While this
system still allows for changes in the value of variables and
constants of the same type (described by patterns such as
Change Identifier Used and Change Numeric Literal), it
prevents the occurrence of the patterns mentioned above.
IV. THREATS TO VALIDITY
Internal validity. We selected the 1,000 Python projects
based on their number of stars as a measure of popularity.
However, there are other ways to measure the popularity of
a project (e.g., the number of forks and contributors) which
could lead to the selection of a different set of projects.
Construct validity. We are limited by the accuracy of the
data provided by the GitHub project search. This may have
excluded some relevant projects from our analysis, including
projects that are more popular than the ones we selected.
Despite checking commits and ﬁles for refactoring changes,
we could not detect all of them and the number of SStuBs
may be overestimated. For example, we did not check for
the renaming of entities such as variables. Others have shown
that identifying refactorings in Python is complicated due to
Python’s dynamic nature. As Python is a dynamically
typed language, refactoring Python code tends to cause more
errors than in statically typed languages like Java.
External validity. Our results are limited to popular open-
source Python projects and may not generalize to other pro-
gramming languages, or even Python code from other sources.
However, many of our ﬁndings overlap with the ones from
Karampatsis and Sutton, which may indicate general trends
for other programming languages and projects.
V. RELATED WORK
Prior work has also focused on detecting bugs using AST
representations. Karampatsis and Sutton found
single-statement bugs by mining two sets of 100 and 1,000
open-source Java projects. The authors used ASTs extracted from
the modified files before and after the bugs were fixed,
finding that around 33% of the fixes could be described with
their patterns. Martinez and Durieux developed repair
tools for Python, presenting an empirical study to repair the
QuixBugs benchmark, even though the authors focus on Java
implementations. Xu et al. proposed a predictive
analysis of Python projects by collecting traces and detecting
bugs. They evaluated their prototype on 11 Python projects and
found 46 bug types. Chen and Lin used ASTs to study
fine-grained source code changes in Python and, later on, analyzed
the dynamic features of Python code when fixing bugs.
Other studies investigated program repair patterns, and some
of the patterns we use in our work have been used by Le Goues
et al., Kim et al., and Karampatsis and Sutton.
This research differs from those mentioned above in that
we detect single-statement bugs in 1,000 Python projects and
compare them with the single-statement bugs from Maven projects
found by Karampatsis and Sutton.
VI. CONCLUSION
In this paper we analyze the most common single-statement
bugs in Python code using data from some of the most popular
open-source Python projects on GitHub. We selected projects
based on their number of stars and used World of Code (WoC)
to collect commit messages and ﬁles. After preprocessing the
data, we compared the Abstract Syntax Trees (ASTs) for pairs
of ﬁles before and after the bug ﬁxes. As a result, we identiﬁed
23 “Simple Stupid Bug” (SStuB) patterns and 73,013 changes
that matched those patterns. Additionally, we characterize 7
new SStuB patterns found in the studied Python projects. We
moved on to compare the SStuBs we found to the ones found
by Karampatsis and Sutton, showing that differences in
the programming languages and their typing styles (dynamic
versus static) change the types of SStuBs identified. Our
findings may be used as a way of understanding these types
of bugs occurring in Python code, and may help developers
by improving the way they handle them. We also share our
dataset online, allowing its use in future research.
REFERENCES
 “Commit on GitHub: BUG: Homogenezie nnz type to be int for
all sparse matrix types.” [Online]. Available: https://github.com/scipy/
 “Commit on GitHub: Fix using DNF group upgrade/remove
api.” [Online]. Available: https://github.com/ansible/ansible/commit/
 “Commit on GitHub: Fixed #23434 – Coerce Oracle bool params
to int.” [Online]. Available: https://github.com/django/django/commit/
 “Commit on GitHub: Fixed import errors with six.moves.cPickle
and model.train typo in the skipgram embeddings exam-
ple.” [Online]. Available: https://github.com/keras-team/keras/commit/
 Z. Chen, W. Ma, W. Lin, L. Chen, Y. Li, and B. Xu, “A study on the
changes of dynamic feature code when ﬁxing bugs: towards the beneﬁts
and costs of python dynamic features,” Science China Information
Sciences, vol. 61, 01 2018.
 G. Destefanis, M. Ortu, S. Porru, S. Swift, and M. Marchesi, “A
statistical comparison of java and python software metric properties,”
in Proceedings of the 7th International Workshop on Emerging Trends
in Software Metrics, 2016, pp. 22–28.
 L. A. F. Gomes, R. da Silva Torres, and M. L. Côrtes, “Bug report
severity level prediction in open source software: A survey and research
opportunities,” Information and Software Technology, vol. 115, pp. 58–
 A. V. Kamienski, L. Palechor, C.-P. Bezemer, and A. Hindle, “PySStuBs:
Single-Statement Bugs in Popular Open-Source Python Projects,” Jan.
2021. [Online]. Available: https://doi.org/10.5281/zenodo.4589607
 R.-M. Karampatsis and C. Sutton, “How often do single-statement
bugs occur? the manysstubs4j dataset,” in Proceedings of the 17th
International Conference on Mining Software Repositories, 2020, pp.
 D. Kim, J. Nam, J. Song, and S. Kim, “Automatic patch generation
learned from human-written patches,” in 2013 35th International Con-
ference on Software Engineering (ICSE). IEEE, 2013, pp. 802–811.
 A. J. Ko and B. A. Myers, “A framework and methodology for studying
the causes of software errors in programming systems,” Journal of Visual
Languages & Computing, vol. 16, no. 1-2, pp. 41–84, 2005.
 C. Le Goues, T. Nguyen, S. Forrest, and W. Weimer, “Genprog: A
generic method for automatic software repair,” IEEE Transactions on
Software Engineering, vol. 38, no. 1, pp. 54–72, 2011.
 W. Lin, Z. Chen, W. Ma, L. Chen, L. Xu, and B. Xu, “An empirical study
on the characteristics of python ﬁne-grained source code change types,”
in 2016 IEEE International Conference on Software Maintenance and
Evolution (ICSME), 2016, pp. 188–199.
 Y. Ma, C. Bogart, S. Amreen, R. Zaretzki, and A. Mockus, “World of
code: an infrastructure for mining the universe of open source vcs data,”
in 2019 IEEE/ACM 16th International Conference on Mining Software
Repositories (MSR). IEEE, 2019, pp. 143–154.
 S. Planning, “The economic impacts of inadequate infrastructure for
software testing,” National Institute of Standards and Technology, 2002.
 L. Prechelt, “An empirical comparison of c, c++, java, perl, python, rexx
and tcl,” IEEE Computer, vol. 33, no. 10, pp. 23–29, 2000.
 M. Schäfer, “Refactoring tools for dynamic languages,” in Proceedings
of the Fifth Workshop on Refactoring Tools, ser. WRT ’12. New York,
NY, USA: Association for Computing Machinery, 2012, p. 59–62.
[Online]. Available: https://doi- org.login.ezproxy.library.ualberta.ca/10.
 J. Uddin, R. Ghazali, M. M. Deris, R. Naseem, and H. Shah, “A survey
on bug prioritization,” Artiﬁcial Intelligence Review, vol. 47, no. 2, pp.
 Z. Xu, P. Liu, X. Zhang, and B. Xu, “Python predictive analysis for bug
detection,” in Proceedings of the 2016 24th ACM SIGSOFT International
Symposium on Foundations of Software Engineering, ser. FSE 2016.
New York, NY, USA: Association for Computing Machinery, 2016, p.
121–132. [Online]. Available: https://doi.org/10.1145/2950290.2950357
 H. Ye, M. Martinez, T. Durieux, and M. Monperrus, “A comprehensive
study of automatic program repair on the quixbugs benchmark,” in 2019
IEEE 1st International Workshop on Intelligent Bug Fixing (IBF), 2019,
 T. Zhang, H. Jiang, X. Luo, and A. T. Chan, “A Literature Review of
Research in Bug Resolution: Tasks, Challenges and Future Directions,”
The Computer Journal, vol. 59, no. 5, pp. 741–773, 05 2016. [Online].
 W. Zhou, Y. Zhao, G. Zhang, and X. Shen, “Harp: Holistic analysis
for refactoring python-based analytics programs,” in 2020 IEEE/ACM
42nd International Conference on Software Engineering (ICSE), 2020,