Characterizing the effect of retractions on scientific careers
Shahan Ali Memon, Kinga Makovi*, Bedoor AlShebli*
Social Science Division, New York University Abu Dhabi
*Corresponding author emails: km2537@nyu.edu, bedoor@nyu.edu
Abstract
Retracting academic papers is a fundamental tool of quality control when the validity of papers or the integrity of authors is questioned post-publication. While retractions do not eliminate papers from the record, they have far-reaching consequences for retracted authors and their careers, serving as a visible and permanent signal of potential transgressions. Previous studies have highlighted the adverse effects of retractions on citation counts and coauthors’ citations; however, the broader impacts beyond these metrics have not been fully explored. We address this gap by leveraging Retraction Watch, the most extensive data set on retractions, and linking it to Microsoft Academic Graph, a comprehensive data set of scientific publications and their citation networks, and to Altmetric, which monitors online attention to scientific output. Our investigation focuses on: 1) the likelihood of authors exiting scientific publishing following a retraction, and 2) the evolution of collaboration networks among authors who continue publishing after a retraction. Our empirical analysis reveals that retracted authors, particularly those with less experience, tend to leave scientific publishing in the aftermath of retraction, especially if their retractions attract widespread attention. We also find that retracted authors who remain active in publishing maintain and establish more collaborations than their similar non-retracted counterparts. Nevertheless, retracted authors with less than a decade of publishing experience retain less senior, less productive, and less impactful coauthors, and gain less senior coauthors post-retraction. Taken together, notwithstanding the indispensable role of retractions in upholding the integrity of the academic community, our findings shed light on the disproportionate impact that retractions impose on early-career authors.
Introduction
Reputation is a crucial factor in building status, particularly when quality is uncertain or unobservable [1], and when it is produced through highly technical and complex processes. This characterizes creative fields, medicine, and science alike. Therefore, when a scientist’s reputation is challenged, the consequences can be severe [2, 3, 4, 5], with long-lasting effects on their career outcomes. The credibility of a scientist, a crucial currency of their reputation, is established over the course of their career based on the quality of their publications [6], among other factors. Therefore, when the quality of one’s work is called into question, the stakes are high, and the consequences can extend well beyond the outcome of a single project. While positive signals, such as citations and grants, have been linked to reputation-building [7, 6], little is understood about the relationship between a scientist’s challenged reputation and their subsequent career progression and collaborations. Further research is needed to fully comprehend the impact of such challenges on scientific collaborations and career trajectories. Retractions of scientific papers give us a window through which to study this question.
When the integrity of a scientific paper is disputed, editors and authors may choose to remove the work from the canon, either jointly or independently. While the article may still be accessible, it will be accompanied by a retraction notice that explains the reason(s) behind its removal, such as misconduct, plagiarism, mistake, or other considerations. This creates a clear and visible signal, associated with the authors of the paper, that the quality of their work has come under scrutiny. Retractions have been used for this purpose since 1756 [8], and contemporary journals have formal procedures to execute when authors or readers highlight problematic content.
Prior work investigates the impact of retractions on scientific careers, examining productivity [3], citations for retracted papers [4], citations for papers published prior to retraction [2], and post-retraction citations of pre-retraction collaborators [5], generally finding negative effects. However, some work also demonstrates that these effects are heterogeneous and might vary based on the reason for retraction and/or the prominence of the retracted author [2, 9, 4]. While recent research is breaking new ground in the quantitative analysis of the impact of retractions, it often focuses on specific fields or compares different retracted authors to one another (e.g., those who have experienced a single versus multiple retractions). Therefore, a comprehensive analysis of retractions across fields and over time, in which retracted authors are compared to otherwise similar non-retracted authors, has yet to be undertaken. Beyond documenting the impact of retraction on careers, it is essential to examine the mechanisms that might bring these effects about. We therefore focus both on the continuity of post-retraction careers and on the development of the collaboration networks that retracted authors need to succeed in publishing careers [10, 11, 12].
Retractions can attract significant attention, particularly when they expose egregious misconduct. Such instances call into question not only the authors’ reputations but also the public’s trust in science, scientific findings, and institutions of science, such as universities, internal review boards, journals, and the peer-review process. Some retractions, therefore, cast a long shadow that extends far beyond the scrutinized work. For instance, the retraction of a study on political persuasion and gay marriage in Science in 2015 [13], which was likely based on fabricated data, led to questions about the impartiality of reviewers¹. Similarly, when a paper in Nature was retracted due to falsified images [14], criticism extended beyond the conduct of the first author to the male-dominated Japanese academy and its culture of fierce pressure and competition². In both cases, the first authors left scientific publishing careers after receiving extreme levels of attention (Altmetric scores above 1000). How systematic the impact of such attention is, however, has yet to be fully understood. This question is closely tied to how retracted scientists might rebuild their collaboration networks, as future collaborators may or may not learn about past events depending on the level of attention those events received.
¹ https://www.newyorker.com/science/maria-konnikova/how-a-gay-marriage-study-went-wrong, Accessed: 12/20/2022
² https://www.nytimes.com/2014/07/07/world/asia/academic-scandal-shakes-japan.html, Accessed: 12/20/2022
The value of social relationships, and the theory that the resources embedded in them can be leveraged, are longstanding ideas [15]. The specific assumption that larger collaboration networks are beneficial is rooted in prior work documenting the benefits of research collaborations and of larger collaboration networks. Qualitative evidence suggests that researchers collaborate for both instrumental and strategic reasons, such as access to specialized expertise, equipment, and other resources, visibility for professional advancement, and enhanced research productivity, as well as for emotional reasons, since many regard collaborative work as energizing and fun [16]. These self-reports are reflected in empirical evidence, such as the association between the size of collaboration networks and citations [17], in addition to future productivity [18, 19]. Importantly, Ductor and colleagues suggest that the quality of one’s coauthor network signals important information about a researcher’s quality, and that these signals are crucial for assessing research potential, especially at the beginning of a career [19]. Additionally, prior work reveals that coauthor networks show higher levels of triadic closure than expected by chance; that is, authors of scientific papers tend to work with former coauthors of their coauthors in the future [20, 21]. Such regularity is based on similarity, but also on strategic considerations: a scientist brokers relationships among their unconnected coauthors, thereby communicating qualities of those they connect that are otherwise challenging to observe [22], such as their skill or integrity in the context of scientific publishing. Extending these arguments to authors who experience a retraction, the collaboration networks they maintain or build could be crucial for recovering from a negative signal about the quality of their work, and processes such as triadic closure might be particularly helpful in doing so.
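The triadic-closure outcome used later in the Results (the share of open triads an author closes) can be made concrete with a small sketch. It assumes a pre-retraction coauthor graph given as undirected edges and a list of post-retraction coauthors; the function name `share_open_triads_closed` is hypothetical, and the paper's exact operationalization (e.g., its time windows) may differ.

```python
from collections import defaultdict


def share_open_triads_closed(focal, past_edges, new_coauthors):
    """Share of the focal author's open triads that are later closed (hypothetical helper).

    An open triad is focal - c - t, where c is a past coauthor of the focal
    author and t is a coauthor of c whom the focal author has never worked with.
    It counts as closed if t appears among the focal author's new coauthors.
    Returns None when the focal author has no open triads.
    """
    neighbors = defaultdict(set)
    for u, v in past_edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    direct = neighbors[focal]
    # One entry per (bridging coauthor, target) pair, so a target reachable
    # through two coauthors contributes two open triads.
    triads = [
        (c, t)
        for c in direct
        for t in neighbors[c]
        if t != focal and t not in direct
    ]
    if not triads:
        return None
    new = set(new_coauthors)
    closed = sum(1 for _, t in triads if t in new)
    return closed / len(triads)


edges = [("F", "A"), ("F", "B"), ("A", "B"), ("A", "X"), ("B", "X"), ("B", "Y")]
# Open triads: F-A-X, F-B-X, F-B-Y; coauthoring with X closes two of the three.
print(share_open_triads_closed("F", edges, ["X", "Z"]))
```

Counting (bridge, target) pairs rather than distinct targets is a design choice; counting distinct targets instead would give 1/2 in this toy example.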
Drawing on retractions as a (potentially stigmatizing) signal that challenges authors’ reputations, we offer four key empirical observations. First, we find that the extent of attention received by a retraction is positively associated with the likelihood of retracted authors leaving publishing careers. That is, the more public the retraction, the more profound its consequences appear to be for authors’ careers. This finding is especially significant since most of the attention received by papers extends beyond the content of the science and involves discussions of great societal importance about the context within which scientific findings are produced. Second, perhaps counterintuitively, we demonstrate that, conditional on staying in scientific publishing, retracted authors retain and gain more collaborators than otherwise similar authors without retractions. Third, while these larger collaboration networks may benefit retracted authors, retracted authors with less than a decade of publishing experience build qualitatively different and weaker networks than their similar counterparts in terms of their collaborators’ seniority, productivity, and impact post-retraction. Fourth, retractions prompted by misconduct have much more severe consequences for the quality of coauthor networks post-retraction than retractions resulting from a mistake.
Results
The consequences of retractions for authors’ careers can be severe, resulting in them leaving scientific publishing entirely. In this context, studying the timing of authors’ last publications after their papers have been retracted can provide valuable insights. To generate these insights, we analyze two main data sets: Retraction Watch (RW) [23], the most extensive publicly available database of retracted papers, with over 26,000 publications in around 5,800 venues, and Microsoft Academic Graph (MAG) [24, 25], which provides comprehensive records and citation networks for over 263 million scientific publications and collaboration networks for over 271 million scientists. By merging MAG and RW, we identify over 23,000 retracted papers involving almost 73,000 authors (see the “Merging RW and MAG” section in Materials and Methods for more information). Linking these two data sets gives us an opportunity to describe who retracted scientists are and what their pre-retraction careers look like. We exclude bulk retractions (e.g., when all papers in a conference proceeding are retracted as a result of questionable peer review [26]), as well as authors with multiple retractions, which usually cluster in time (see Supplementary Figure S1). Full details of our pre-processing steps and justifications can be found in Supplementary Note S1. Our final “filtered sample” consists of 4,672 retracted papers and 15,614 authors.
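A minimal sketch of the two exclusion rules above, assuming each retraction-author pair is a record with a hypothetical `is_bulk` flag. The paper's full pre-processing (Supplementary Note S1) involves more steps than shown; counting an author's retracted papers before dropping bulk retractions is one possible reading of the rule, not a confirmed detail.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Retraction:
    """Hypothetical record linking one author to one retracted paper."""
    paper_id: str
    author_id: str
    retraction_year: int
    is_bulk: bool  # hypothetical flag, e.g. a whole proceedings volume retracted at once


def filter_sample(records):
    """Drop bulk retractions and every author with more than one retracted paper."""
    # Count distinct retracted papers per author over all records (bulk included),
    # so an author with any second retraction is removed entirely.
    author_paper_pairs = {(r.author_id, r.paper_id) for r in records}
    papers_per_author = Counter(author for author, _ in author_paper_pairs)
    return [r for r in records if not r.is_bulk and papers_per_author[r.author_id] == 1]


records = [
    Retraction("p1", "a1", 2015, False),
    Retraction("p1", "a2", 2015, False),
    Retraction("p2", "a2", 2018, False),  # a2 has two retractions
    Retraction("p3", "a3", 2016, True),   # bulk retraction
]
kept = filter_sample(records)
print([(r.paper_id, r.author_id) for r in kept])  # [('p1', 'a1')]
```

Note that paper p1 survives through its remaining author a1, which is how a paper can stay in the sample while one of its authors is excluded.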
Interestingly, the distribution of the year of the author’s last paper relative to the retraction year reveals a noteworthy trend, with approximately 25% of authors leaving their publishing careers around the time of retraction (Figure 1a). Specifically, 16.2% leave in the year of retraction (year 0), 4.6% depart shortly before (year -1), and 4.2% leave in the year following their retraction (year 1). In addition to exploring these aggregate patterns, we investigate the probability of authors remaining in scientific publishing across different academic ages (Supplementary Figure S2), replicating Figure 1a disaggregated by age. We find that early-career authors, specifically those within 0-3 years of their first publication, are significantly more likely to leave publishing. Furthermore, we examine whether author order, affiliation rank, and retraction reason are associated with authors’ continuity in publishing, as illustrated in Supplementary Figures S3 to S5. These figures provide insights into possible factors associated with authors persisting, or not, in publishing careers.
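The quantity in Figure 1a is the year of an author's last paper minus their retraction year. The sketch below shows how the per-offset shares could be computed, assuming publication years per author are available as plain lists (the input format and function names are hypothetical). As an arithmetic check on the reported figures, 16.2% + 4.6% + 4.2% = 25.0%, matching the roughly 25% who leave around the time of retraction.

```python
from collections import Counter


def last_paper_offsets(pub_years, retraction_year):
    """Year of each author's last paper minus that author's retraction year."""
    return {a: max(years) - retraction_year[a] for a, years in pub_years.items()}


def offset_shares(offsets):
    """Percentage of authors whose last paper falls at each relative year."""
    counts = Counter(offsets.values())
    n = len(offsets)
    return {k: 100 * v / n for k, v in sorted(counts.items())}


# Hypothetical toy data: four authors all retracted in 2015.
pub_years = {
    "a": [2010, 2012, 2015],  # last paper in the retraction year (offset 0)
    "b": [2014, 2016, 2020],  # offset 5
    "c": [2011, 2014],        # offset -1
    "d": [2013, 2016],        # offset 1
}
retraction_year = {"a": 2015, "b": 2015, "c": 2015, "d": 2015}
shares = offset_shares(last_paper_offsets(pub_years, retraction_year))
print(shares)  # {-1: 25.0, 0: 25.0, 1: 25.0, 5: 25.0}
around_retraction = sum(s for k, s in shares.items() if -1 <= k <= 1)
print(around_retraction)  # 75.0
```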
For the purposes of our further analysis, we define attrition as leaving in the year of retraction or shortly before, as indicated by the absence of published papers in the five years after retraction. Consequently, we exclude all authors who had already departed from scientific publishing before year -1, accounting for 6.6% of the initial sample. This results in an “analytical sample” consisting of 14,583 retracted authors who authored 4,471 retracted papers. In this sample, descriptive statistics (Figure 1b) reveal that an overwhelming majority of authors of retracted papers belong to STEM fields such as Biology, Medicine, Chemistry, and Physics, with less than 4% originating from non-STEM fields. Descriptive comparisons also show that those who leave after a retraction are slightly more likely to be women. Furthermore, our analysis indicates that authors with more experience, measured by academic age, number of citations, number of papers, and the size of collaboration networks at the time of retraction, are less likely to leave scientific publishing after a retraction.
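One way to operationalize the attrition definition above, as a sketch: authors whose last paper precedes year -1 are excluded, and the remaining authors count as having left if they have no paper in the five years after the retraction year. The function name and the handling of edge cases (for example, a paper appearing only after year 5) are assumptions rather than the paper's code. The reported counts are consistent: removing 6.6% of 15,614 authors leaves 15,614 × 0.934 ≈ 14,583.

```python
def classify_author(pub_years, retraction_year, window=5):
    """Return 'excluded', 'left', or 'stayed' for one retracted author (hypothetical helper)."""
    last_offset = max(pub_years) - retraction_year
    if last_offset < -1:
        # Already gone before year -1: outside the analytical sample.
        return "excluded"
    # Any paper in years 1..window after the retraction year counts as staying.
    post = [y for y in pub_years if 0 < y - retraction_year <= window]
    return "stayed" if post else "left"


for years in ([2008, 2012], [2010, 2014], [2010, 2015], [2010, 2016]):
    print(years, classify_author(years, 2015))
# [2008, 2012] excluded, [2010, 2014] left, [2010, 2015] left, [2010, 2016] stayed
```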
Regarding the characteristics of the retracted papers themselves, the majority were published in journals rather than conferences and were published after 2010 (Figure 1c). Most of these papers fall within the top quartile in terms of journal ranking (journal ranking information is unavailable for approximately 30% of the papers). Retracted papers fall into three groups: those where all authors’ last publication is in the year of retraction (12%), those where some authors had their last paper in the year of retraction while others continued to publish (31%), and those where all authors have papers published after the year of retraction (56%). The reasons for retraction vary, with approximately 25% attributed to misconduct, 30% to plagiarism, 25% to mistakes, and the remaining 20% to other reasons. For further details on establishing author- and paper-level features, see the Materials and Methods section “Creating author and paper level features”, and for numerical comparisons between retracted and non-retracted authors, see Supplementary Table S1.
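The three paper-level groups can be derived from the authors' last-paper offsets. In this sketch, an offset of 0 or -1 (the two attrition years defined earlier) is treated as having stopped publishing, which is an assumption: the text describes the groups only in terms of the year of retraction. The reported shares (12%, 31%, 56%) sum to 99%, presumably because of rounding.

```python
def paper_group(last_offsets):
    """Group a retracted paper by its authors' last-paper offsets (hypothetical helper).

    Assumption: an offset <= 0 (year -1 or 0 in the analytical sample)
    means the author stopped publishing around the retraction.
    """
    stopped = [offset <= 0 for offset in last_offsets]
    if all(stopped):
        return "all authors left"
    if any(stopped):
        return "some authors left"
    return "all authors continued"


print(paper_group([0, 0]), paper_group([0, 3]), paper_group([2, 4]), sep=" | ")
```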
We next investigate the relationship between attrition and several key factors, such as the experience of authors at the time of retraction, the reasons for retraction, and the amount of attention received by the retraction. In particular, heightened levels of attention may bring authors into the spotlight, potentially influencing how they are perceived within the broader scientific community, including by individuals who may not have been previously familiar with their scholarly work. To investigate this, we use a third data set, Altmetric, a database of online mentions of publications containing a record of more than 191 million mentions for over 35 million research outputs, which we merge with RW. We measure attention using the Altmetric score, tracking it in the six-month periods before and after the retraction event, as per [27, 28] (see “Creating author and paper level features” in Materials and Methods for the calculation of the Altmetric score). In Figure 2, we present the logged distribution of the average attention received by the retracted papers during this time window, highlighting that attention peaks during the month of retraction. For a breakdown by social media, news media, blogs, and knowledge repositories, see Supplementary Figure S6. We find that while most retracted papers receive no attention, some gain worldwide publicity. More specifically, 61% of retracted papers receive no attention during their life course, and this share increases to 84% when considering only our time window (papers without attention are not shown in Figure 2).
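A sketch of the windowed attention measure: mentions are bucketed by calendar-month offset from the retraction month and summed within six months on either side. The per-source weights below are placeholders chosen purely for illustration; they are not the weights used by Altmetric or by the paper (see Materials and Methods), and the `(date, source)` mention format is hypothetical.

```python
from collections import defaultdict
from datetime import date

# Placeholder per-source weights for illustration only (hypothetical, not Altmetric's).
WEIGHTS = {"news": 3.0, "blog": 2.0, "social": 1.0}


def month_offset(d, ref):
    """Calendar-month distance from the month of `ref` to the month of `d`."""
    return (d.year - ref.year) * 12 + (d.month - ref.month)


def monthly_attention(mentions, retraction_date, months=6):
    """Weighted mention totals per month offset within [-months, +months]."""
    buckets = defaultdict(float)
    for mention_date, source in mentions:
        k = month_offset(mention_date, retraction_date)
        if -months <= k <= months:
            buckets[k] += WEIGHTS.get(source, 0.0)  # unknown sources count as 0
    return dict(sorted(buckets.items()))


def window_score(mentions, retraction_date, months=6):
    """Total weighted attention inside the window."""
    return sum(monthly_attention(mentions, retraction_date, months).values())


mentions = [
    (date(2016, 3, 10), "news"),
    (date(2016, 3, 20), "social"),
    (date(2016, 1, 5), "blog"),
    (date(2017, 1, 1), "news"),  # ten months after: outside the window
]
retracted_on = date(2016, 3, 1)
print(monthly_attention(mentions, retracted_on))  # {-2: 2.0, 0: 4.0}
print(window_score(mentions, retracted_on) > 20)  # False: not "high attention"
```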
Our linear probability regression models incorporate attention, where a high level of attention is defined as an Altmetric score above 20 computed within our window (see “Creating author and paper level features” in Materials and Methods for how this threshold was established), as well as author-specific characteristics in the year of retraction, such as authors’ discipline, gender, affiliation rank, and different measures of experience, namely academic age, number of papers, number of citations, and number of collaborators. We also consider paper-specific measures, such as the year of retraction, the reason for retraction, the type and ranking of the venue of the retracted paper, and the number and order of authors on the paper. Regression results (Table 1 and Supplementary Table S2) reveal that retracted authors who leave scientific publishing have had shorter pre-retraction careers, fewer pre-retraction citations, fewer pre-retraction collaborators, and fewer pre-retraction papers. Moreover, those who authored a paper that was retracted due to misconduct are more likely to leave scientific publishing after retraction than those whose paper was retracted due to a mistake (p < 0.05).
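A linear probability model is ordinary least squares with a 0/1 outcome (here, leaving publishing), so each coefficient reads directly as a change in probability; a coefficient of 0.054 on the high-attention indicator would correspond to 5.4 percentage points. The sketch solves the normal equations in plain Python with only an intercept and the high-attention flag (score above 20). The helper names are hypothetical, the toy data are invented, and the paper's models include many more controls; a statistics library would normally be used, not least for standard errors.

```python
def ols(X, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def high_attention(window_score, threshold=20):
    """Hypothetical helper: 1.0 if the windowed score exceeds the threshold."""
    return 1.0 if window_score > threshold else 0.0


# Toy data. With only an intercept and the flag, the LPM slope equals the
# difference in attrition rates between high- and low-attention authors.
scores = [35, 50, 22, 80, 0, 5, 12, 3]
left = [1, 1, 0, 0, 1, 0, 0, 0]
X = [[1.0, high_attention(s)] for s in scores]
intercept, slope = ols(X, left)
print(round(intercept, 3), round(slope, 3))  # 0.25 0.25
```

Further controls (academic age, discipline dummies, and so on) would be appended as extra columns of each row of `X`.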
Importantly, experiencing a retraction that garnered high levels of attention is positively associated with leaving publishing careers, across all measures of experience. Specifically, the estimated effect of high levels of attention is between 5.4 and 9.8 percentage points, depending on the regression specification. These results are robust to functional form, as logistic regression models yield similar substantive conclusions (see Supplementary Table S3). Since venue ranking information is missing for about 4,000 retracted authors, we repeat these analyses including these authors and excluding the variable that controls for the venue ranking of the retracted paper, and find substantively similar results (Supplementary Table S4). We also conduct a robustness analysis that includes authors who left the year after retraction, recognizing that their papers published the year after retraction may already have been going through the publication process at the time of retraction, and find substantively similar results (see Supplementary Tables S5 and S6). Furthermore, as attention metrics are likely to be less accurate for earlier retractions, we exclude retractions prior to 2005 and again find substantively similar results (see Supplementary Table S7). Lastly, for a smaller sample of about 4,000 authors whose retraction notes were manually annotated, we conduct an additional analysis that includes who initiated the retraction: the journal or the authors. We do not find a statistically significant association between this characteristic and the likelihood that a retracted author stopped publishing scientific papers (see Supplementary Table S8), while the other results remain similar.
Even when a retraction does not lead to leaving scientific publishing, it might still affect one’s career progression by damaging an author’s reputation. We therefore continue our analysis by exploring the effect that the retraction of a scientific publication has on an author’s career, conditional on staying in scientific publishing, with respect to three main outcomes: (i) the number of collaborators retained post-retraction, (ii) the number of new collaborators gained post-retraction, and (iii) the share of open triads closed (i.e., when an author coauthors with someone who previously worked with one of their coauthors [20]). We perform a matching experiment in which we match every retracted author with similar non-retracted authors who are not part of their collaboration network. We explicitly exclude from the possible matches all collaborators, not just past ones, to eliminate the negative spillover effect that a retraction may have on the future collaborators of the retracted authors, as studied in [29]. In the matching process, we ensure that every non-retracted author identified as a match for a retracted author shares the following exact characteristics with them: gender, academic age, affiliation rank at the start of their career, as well as affiliation rank and scientific discipline at the time of retraction. Furthermore, matches are similar (i.e., within 30% of the values of the retracted author) in terms of number of publications, number of citations, and number of collaborators, and the closest matches are selected based on a theoretically calibrated distance function. Since we focus on the career trajectories of scientists post-retraction, we ensure that both retracted authors and their matches have published at least one more paper post-retraction. This approach aligns the careers of retracted authors with those of their non-retracted matches and avoids survival bias [30]. This matching experiment yields 3,094 retracted authors, each matched to an average of 1.48 non-retracted authors. See the evaluation of the matched sample we analyze (“Analytical sample for the matching experiment” subsection of Materials and Methods) and Supplementary Note S3 for more details on calculations and binning decisions.
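The matching rules above can be sketched as a filter followed by a nearest-neighbor step: exact agreement on the categorical attributes, each count within 30% of the retracted author's value, no collaborators, and at least one post-retraction paper. The paper's "theoretically calibrated distance function" is not specified here, so the summed relative gap below is a stand-in assumption; returning every candidate tied at the minimum distance is one way an author can end up with more than one closest match. All field names are hypothetical.

```python
# Hypothetical field names for the exact and the within-30% attributes.
EXACT = ("gender", "academic_age", "start_rank", "rank", "field")
NUMERIC = ("papers", "citations", "collaborators")


def within(value, ref, tol):
    """True if value lies within tol * ref of ref."""
    return abs(value - ref) <= tol * ref


def distance(candidate, target):
    """Stand-in distance (assumption): summed relative gaps on the numeric attributes."""
    return sum(abs(candidate[f] - target[f]) / max(target[f], 1) for f in NUMERIC)


def closest_matches(target, candidates, collaborators, tol=0.30):
    """All eligible candidates tied at the minimum distance to the retracted author."""
    eligible = [
        c for c in candidates
        if c["id"] not in collaborators  # any past or future collaborator
        and c["publishes_after"]         # at least one post-retraction paper
        and all(c[f] == target[f] for f in EXACT)
        and all(within(c[f], target[f], tol) for f in NUMERIC)
    ]
    if not eligible:
        return []
    best = min(distance(c, target) for c in eligible)
    return [c for c in eligible if distance(c, target) == best]


target = {"id": "r", "gender": "f", "academic_age": 5, "start_rank": 2, "rank": 1,
          "field": "biology", "papers": 10, "citations": 100, "collaborators": 20,
          "publishes_after": True}
candidates = [
    dict(target, id="c1", papers=11),
    dict(target, id="c2", papers=9),
    dict(target, id="c3", field="chemistry"),  # fails the exact match
    dict(target, id="c4", papers=14),          # more than 30% away
    dict(target, id="c5"),                     # a collaborator
    dict(target, id="c6", citations=120),      # eligible but farther away
]
print([c["id"] for c in closest_matches(target, candidates, collaborators={"c5"})])
# ['c1', 'c2']
```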
Figure 3 summarizes our findings from the matching experiment (when comparing retracted authors to non-retracted ones, we create a synthetic non-retracted author by averaging over all closest matches when there are multiple). We find that authors who have experienced a retraction in their careers tend to gain significantly more new collaborators and retain significantly more of their past collaborators (Figures 3a–b; results are based on Welch t-tests, which correct for unequal variances, and are replicated using Kolmogorov-Smirnov and Wilcoxon tests in Supplementary Tables S10-S12). These results largely and consistently hold by gender, year of retraction, academic age, author order on the retracted paper, reason for retraction (mistake, plagiarism, misconduct), type of retraction (author-led vs. journal-led), discipline, and affiliation rank. We find substantively similar differences between retracted authors and their matches when restricting matches to a 20% difference in terms of number of papers, collaborators, and citations prior to retraction (Supplementary Figure S7). We do not find credible evidence that retracted and non-retracted authors close a different proportion of open triads with authors who had previously coauthored with their past collaborators (Figure 3c).
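Two steps described above, averaging over tied closest matches to form a synthetic non-retracted author and comparing the groups with a Welch t-test, can be sketched with the standard library. The helper names and toy numbers are hypothetical. The sketch returns the t statistic and the Welch-Satterthwaite degrees of freedom but not a p-value, which requires the t distribution; `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the full Welch test.

```python
from math import sqrt
from statistics import mean, variance


def synthetic_outcomes(match_outcomes):
    """One synthetic value per retracted author: the mean over their closest matches."""
    return [mean(values) for values in match_outcomes]


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


retracted = [1, 2, 3, 4]  # toy outcome, e.g. new collaborators gained
matched = synthetic_outcomes([[2], [3, 5], [6], [7, 9]])  # [2, 4, 6, 8]
t, df = welch_t(retracted, matched)
print(round(t, 4), round(df, 4))  # -1.7321 4.4118
```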
It is apparent that authors who survive a retraction, on average, maintain a greater number of previous collaborators and gain a higher number of new collaborators. However, it is essential to examine the characteristics of these retained and newly formed relationships among retracted authors in comparison to their matched counterparts. We therefore focus on three key characteristics of collaborators in both groups: academic age (seniority), number of papers (productivity), and number of citations (impact); when multiple closest matches exist, we average over them to calculate a single statistic. Additionally, we analyze these characteristics across different categories of retracted versus non-retracted authors. Specifically, we consider early-career (0-3 years of experience at the time of retraction), mid-career (4-9 years of experience), and senior (10 or more years of experience) authors separately. We also consider the different reasons for retraction (misconduct, plagiarism, and mistake) separately.
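The career-stage bins and per-author collaborator summaries described above can be sketched as follows. The dictionary keys are hypothetical, and averaging each match's profile and then averaging across matches is one reading of "we average over these to calculate a single statistic".

```python
from statistics import mean

# Hypothetical keys: seniority, productivity, impact.
ATTRS = ("academic_age", "papers", "citations")


def career_stage(academic_age):
    """Bins used in the text: 0-3 early, 4-9 mid, 10+ senior (years at retraction)."""
    if academic_age <= 3:
        return "early"
    if academic_age <= 9:
        return "mid"
    return "senior"


def collaborator_profile(collaborators):
    """Mean seniority, productivity, and impact over one author's collaborators."""
    return {k: mean(c[k] for c in collaborators) for k in ATTRS}


def synthetic_profile(match_profiles):
    """Average of the per-match profiles, for authors with several closest matches."""
    return {k: mean(p[k] for p in match_profiles) for k in ATTRS}


coauthors = [
    {"academic_age": 2, "papers": 4, "citations": 10},
    {"academic_age": 6, "papers": 8, "citations": 30},
]
print(career_stage(7), collaborator_profile(coauthors))
# mid {'academic_age': 4, 'papers': 6, 'citations': 20}
```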
We find that both early- and mid-career retracted authors develop qualitatively different collaboration networks post-retraction compared to their matched counterparts, and suffer a significant loss. Namely, they retain less senior, less productive, and less impactful collaborators than their matched counterparts (Figures 4a-c and Supplementary Table S13). We also find that even senior authors retain less senior collaborators post-retraction than their matched pairs (Figure 4a). In terms of collaborators gained, senior retracted authors show no significant difference in the academic age or citation counts of their new collaborators, but gain significantly more productive collaborators (Figures 4d-f and Supplementary Table S14), while early- and mid-career retracted authors gain significantly less senior new collaborators (Figure 4d).
We also perform a difference-in-differences analysis in which we examine the difference in characteristics between the average post-retraction coauthor and the average pre-retraction coauthor (for multiple closest matches, this represents an average of averages). This metric captures how much more senior, productive, and impactful the coauthors who are retained post-retraction are compared to those who are abandoned. We find that this difference among early-career authors is smaller than the difference among their matched pairs, meaning that they retain fewer of the senior collaborators they had pre-retraction. The same pattern can be observed for their collaborators’ productivity (Figures 4g-h and Supplementary Table S15). In addition, this difference is also smaller in terms of citation impact for mid-career retracted authors compared to their matched non-retracted counterparts (Figure 4i and Supplementary Table S15).
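A sketch of the difference-in-differences quantity, following the reading in the text that it compares retained with abandoned coauthors: for each author, take the mean attribute of retained pre-retraction coauthors minus that of the lost ones, then subtract the same gap averaged over the author's closest matches. A negative value means the retracted author's retained coauthors are, relative to those lost, less senior (or productive, or impactful) than is the case for their matches. Names and input formats are hypothetical.

```python
from statistics import mean


def retained_minus_lost(pre_coauthors, post_ids, attr):
    """Mean `attr` of retained pre-retraction coauthors minus that of lost ones.

    Returns None if either group is empty, since the gap is then undefined.
    """
    retained = [c[attr] for c in pre_coauthors if c["id"] in post_ids]
    lost = [c[attr] for c in pre_coauthors if c["id"] not in post_ids]
    if not retained or not lost:
        return None
    return mean(retained) - mean(lost)


def diff_in_diffs(retracted_gap, match_gaps):
    """Retracted author's gap minus the average gap over their closest matches."""
    return retracted_gap - mean(match_gaps)


# Toy data: a senior coauthor ("a") is lost, a junior one ("b") is retained.
pre = [
    {"id": "a", "academic_age": 20},
    {"id": "b", "academic_age": 4},
    {"id": "c", "academic_age": 6},
]
gap = retained_minus_lost(pre, post_ids={"b", "x"}, attr="academic_age")
print(gap, diff_in_diffs(gap, [-2, -4]))  # -9 -6
```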
We perform a similar analysis by categorizing the sample of retracted authors based on their reasons for retraction (Figures 4j-r and Supplementary Tables S16-S18). Not surprisingly, we find that those whose papers were retracted for misconduct retain less senior, less productive, and less impactful collaborators. Among collaborators gained, these differences hold only for seniority, whereas the difference-in-differences results hold for both seniority and impact. Those retracted for plagiarism experience similar changes in their collaboration networks to those retracted for misconduct, except that we find no credible evidence for a difference in retained coauthors’ impact, nor for the difference-in-differences. Authors retracted for a mistake do not develop qualitatively different collaboration networks compared to their matched pairs in terms of the seniority, productivity, and impact of their new collaborators. However, they retain less senior collaborators. Additionally, the difference between the coauthors they retained and those they lost is also larger than for their matched pairs; that is, relative to the collaborators they lost, they retain less productive ones.
In sum, the career impact of retractions may not be fully understood without considering attrition, i.e., leaving scientific careers in which publishing is an integral component of success. We find that experiencing a retraction is associated with leaving publishing earlier. However, the results of the matching experiment suggest that, among retracted authors who keep publishing, collaboration networks do not suffer in size. Those who keep publishing post-retraction in fact seem, on average, to build larger collaboration networks than their counterparts by retaining more collaborators and forming new collaborations. This difference in size, however, goes hand in hand with qualitative differences, especially among junior and mid-career authors, who retain and/or gain less senior, less productive, and less impactful coauthors. This may be due to junior authors’ reduced ability to move from postdoctoral positions into faculty positions, or to a shift into lower-status roles, repeatedly participating in larger teams rather than carving out a leading role on projects post-retraction.
Discussion
Retractions can have significant consequences for authors’ careers, leading to their departure from scientific publishing. In this study, we conducted an analysis using data from Retraction Watch, Microsoft Academic Graph, and Altmetric, identifying and examining an analytical sample of around 4,500 retracted papers involving over 14,500 authors. Our findings reveal that: 1) around 25% of authors left their publishing careers around the time of retraction, 2) authors who left had shorter pre-retraction careers and fewer citations, collaborators, and publications than those who stayed, 3) high attention following a retraction was associated with a higher likelihood of leaving, 4) retracted authors who stayed post-retraction formed larger collaboration networks, retaining more collaborators and gaining new ones, and 5) retracted authors with less than 10 years of pre-retraction experience who stayed built qualitatively weaker collaboration networks in terms of their coauthors’ seniority, productivity, and impact.
It is important to acknowledge that our study is not without limitations. First, online attention, as measured by the Altmetric score, captures the volume rather than the quality of attention and lacks a nuanced description of its specific sources beyond categories of platforms. Additionally, the score does not reveal how relevant the coverage may be to retracted authors, which, if explored, could yield further insights into our findings. Moreover, retained and new collaborators may differ qualitatively beyond the aspects we explore. While our approach documents differences in the size of collaboration networks and some aspects of their composition, it does not examine, e.g., changes in sub-fields, which might characterize the careers of retracted authors more so than those of their counterparts. Lastly, our matching analysis reveals that the retracted authors matched to similar non-retracted counterparts were, on average, more junior than the average retracted author. Therefore, our estimates may represent a lower bound of the difference, assuming that more senior authors possess greater resources to further develop their networks. Conversely, they may also represent an upper bound, assuming that more established authors receive less benefit of the doubt than their early-career counterparts when their culpability is assessed. While these limitations should be taken into account when interpreting our findings, the matching strategy employed provides a high-quality estimate of the differences observed among the matched groups.
Some considerations fall outside the scope of this paper. For instance, self-retraction might signal integrity, which could be one reason why some retracted authors develop larger collaboration networks. It is possible that these authors become more cautious in their future endeavors to avoid a second retraction, making them desirable collaborators. It is also plausible that they change how agentically they search for new collaborators and cultivate existing relationships to compensate for the assumed (and empirically documented) negative impact of retractions. The role of the scientific community is similarly under-explored. Specifically, some retracted authors might receive support from their colleagues, presumably in cases where their papers are retracted because of mistakes. Bringing these authors on as collaborators may be one way to show such support. We hope these conjectures will form the starting point of future work.
Our study serves as an initial step in documenting how important institutions of science, such as retractions, which play a key role in policing the content of the canon, impact the careers of scientists. Future research should complement our work by exploring how authors navigate retractions and the micro-mechanisms underlying the strategies employed by retracted and non-retracted authors when seeking collaborators. It would also be valuable to investigate whether retracted and non-retracted scientists are sought after for similar opportunities, particularly when retracted authors' work was not retracted due to misconduct. Furthermore, the role of online attention in these matters deserves further exploration, as such attention becomes intertwined with the names of the authors whose work is discussed, extending beyond the scientific content of papers and encompassing a broader set of issues.
Materials and Methods
Data sources
The analyses presented in this paper rely on three data sets:
1. Retraction Watch (RW) [23] is the largest publicly available data set of retracted articles, obtained on the 18th of May 2021. At the time, it contained 26,504 retracted papers published in 5,844 journals and conferences. The earliest publication record dates back to 1753, whereas the latest is from 2021. Each article is classified by a combination of 104 reasons for retraction.
2. Microsoft Academic Graph (MAG) [31] is one of the largest data sets of scientific publication records. We collected this data set on the 30th of July 2021. At the time, it contained approximately 263 million publications, authored by approximately 271.5 million scientists, with the earliest publication record in 1800.
3. Altmetric [32] is a database of online mentions of publications. It contains a record of more than 191 million mentions of over 35 million research outputs. It uses unique identifiers (e.g., the Digital Object Identifier, or DOI, and the PubMedID) to match attention to research across several social media platforms, blogs, news sites, and knowledge repositories.
Merging RW and MAG
We merge MAG and RW using a two-step approach. As both MAG and RW provide the DOI of each publication record, we start with these identifiers, since a DOI is a persistent identifier unique to each document on the web. However, as not all records in MAG and RW have a DOI, we are able to identify only 7,906 of the 26,504 retracted papers in RW in MAG using DOIs. To increase the size of our data set, we merge the remaining publication titles in RW using fuzzymatcher³, which employs probabilistic record linkage [33] to find similar titles based on Levenshtein distance. We validate the robustness of our fuzzy matching by randomly sampling 100 retracted papers and manually checking the accuracy of the merge. Of the 100 sampled papers in RW, 99 were linked to the correct entry in MAG. This second step merges an additional 15,363 retractions, resulting in a total of 23,269 (88%) retracted papers and 72,594 authors in RW linked to their corresponding entries in MAG.
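The two-step linkage can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the pipeline used in the paper: it replaces fuzzymatcher's probabilistic record linkage with a plain normalized Levenshtein similarity, and the title normalization, the 0.9 acceptance threshold, and the function and field names (`merge_records`, `doi`, `title`) are hypothetical.

```python
import re


def normalize_title(title):
    """Lowercase, replace punctuation with spaces and collapse whitespace (assumed)."""
    title = re.sub(r"[^a-z0-9 ]", " ", title.lower())
    return " ".join(title.split())


def levenshtein(a, b):
    """Dynamic-programming edit distance (insert/delete/substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def similarity(a, b):
    """1 - distance / longer length, so identical strings score 1.0."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def merge_records(rw, mag, threshold=0.9):
    """Map RW ids to MAG ids: exact DOI match first, then best fuzzy title match.

    rw, mag: lists of dicts with keys "id", "doi" (may be None) and "title".
    threshold: hypothetical minimum title similarity for accepting a fuzzy match.
    """
    mag_by_doi = {r["doi"].lower(): r["id"] for r in mag if r.get("doi")}
    mag_titles = [(normalize_title(r["title"]), r["id"]) for r in mag]
    links = {}
    for rec in rw:
        doi = (rec.get("doi") or "").lower()
        if doi in mag_by_doi:                      # step 1: persistent identifier
            links[rec["id"]] = mag_by_doi[doi]
            continue
        title = normalize_title(rec["title"])      # step 2: fuzzy title match
        best_score, best_id = max(
            ((similarity(title, t), mid) for t, mid in mag_titles),
            key=lambda pair: pair[0],
            default=(0.0, None),
        )
        if best_score >= threshold:
            links[rec["id"]] = best_id
    return links


rw = [{"id": "rw1", "doi": "10.1/ABC", "title": "A study"},
      {"id": "rw2", "doi": None, "title": "Effects of X on Y."}]
mag = [{"id": 11, "doi": "10.1/abc", "title": "A study"},
       {"id": 12, "doi": None, "title": "Effects of X on Y"}]
print(merge_records(rw, mag))  # {'rw1': 11, 'rw2': 12}
```

The exhaustive title comparison is quadratic in the number of records, so at the scale of MAG a practical version would first restrict the candidate set (for example, by blocking on shared title tokens or publication year), as dedicated record-linkage tools do.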
Merging RW and Altmetric
For each paper in RW, we use the associated DOI or PubMedID to query the Altmetric API. Of the 26,504 retracted papers in RW, we are able to identify 11,265 (42.5%) papers with an online presence based on their unique identifiers. There are 15,239 papers for which an Altmetric entry could not be located; however, these papers and their respective authors are also part of our analysis. Using Google and Twitter, we validate our approach by manually searching for the online presence of a random sample of 100 papers that were not found in Altmetric. We find that 96 of the 100 papers had no online mentions, i.e., these papers are missing from the Altmetric data set mainly because they did not receive attention online.
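A sketch of this lookup using only Python's standard library follows. The endpoint layout (`/v1/doi/<doi>` and `/v1/pmid/<pmid>` on `api.altmetric.com`) reflects Altmetric's public v1 API as we understand it; trying the DOI before the PubMedID, the timeout, and the function names are assumptions, and current access terms may require an API key. A 404 response is read as "no Altmetric record", which the analysis treats as no online presence.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Assumed public v1 base URL; check Altmetric's current API documentation.
API_BASE = "https://api.altmetric.com/v1"


def candidate_urls(doi=None, pmid=None):
    """Lookup URLs to try, DOI first and PubMedID second (assumed order)."""
    urls = []
    if doi:
        urls.append(f"{API_BASE}/doi/{urllib.parse.quote(doi.strip(), safe='/')}")
    if pmid:
        urls.append(f"{API_BASE}/pmid/{pmid}")
    return urls


def fetch_altmetric(doi=None, pmid=None, timeout=10):
    """Return the first Altmetric record found, or None if the paper is not indexed."""
    for url in candidate_urls(doi, pmid):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return json.load(resp)
        except urllib.error.HTTPError as err:
            if err.code != 404:  # 404 means "no record"; anything else is a real error
                raise
    return None  # treated downstream as no online presence (attention score 0)
```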
Creating author- and paper-level features
We use the three data sets to create features for authors and papers. Here, we discuss the features that require additional data collection and calculations, such as gender, scientific discipline, type of retraction, and Altmetric score.
Gender. To identify the perceived gender of authors, we use Genderize.io to map author first names to gender. Genderize.io returns a probability indicating the certainty of the assigned gender. We exclude all authors whose gender could not be identified with a probability greater than 0.5. We validate the name-based gender identified by Genderize.io by comparing the agreement (or concordance) of its labels with those of another classifier, Ethnea, a name-based gender and ethnicity classifier specifically designed for bibliographic records. We compare the labels of 31,907 authors in RW for whom "male" or "female" labels were available from both Genderize.io and Ethnea. The labels agreed for 31,028 (i.e., 97%) retracted authors, with a Cohen's κ score of 0.93, indicating an almost perfect level of agreement [34]. Our approach is in line with prior research that uses similar name-based gender classifiers [35, 36, 37, 38, 39, 40]; however, automated classifiers such as these have significant shortcomings. They do not rely on self-identification and can therefore misgender authors. Annotations are based on historical name–gender associations that assign male or female to an author, and we recognize that there are expansive identities beyond this limiting binary that our approach cannot explore.
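The confidence filter and the agreement check can be sketched as follows. Cohen's κ compares the observed agreement p_o with the agreement p_e expected by chance from each classifier's label frequencies, κ = (p_o − p_e)/(1 − p_e). The sketch assumes the Genderize.io label and probability for each name have already been retrieved; the function names and toy labels are illustrative.

```python
from collections import Counter


def confident_gender(label, probability, min_prob=0.5):
    """Keep a binary label only when its probability exceeds min_prob, else None."""
    if label in ("male", "female") and probability > min_prob:
        return label
    return None


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa = (p_o - p_e) / (1 - p_e) for two raters over the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(rater_a)
    p_o = sum(x == y for x, y in zip(rater_a, rater_b)) / n        # observed agreement
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / n ** 2   # chance agreement
    if p_e == 1.0:  # both raters always used the same single label
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


genderize = ["male", "male", "female", "female"]
ethnea = ["male", "male", "female", "male"]
print(cohens_kappa(genderize, ethnea))  # 0.5
```

In this toy example the classifiers agree on 3 of 4 names (p_o = 0.75), but half of that agreement is expected by chance (p_e = 0.5), so κ is only 0.5; this is why κ, rather than raw agreement, is reported.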
Scientific discipline. To assign a scientific discipline to every author, we use the fields in MAG, which span more than 520,000 hierarchically structured fields. For every paper $p$ and field $f$, MAG specifies a confidence score, denoted by $\mathrm{score}(p, f) \in [0, 1]$, which indicates the level of confidence that $p$ is associated with $f$. The hierarchy contains 19 top-level fields⁴, which we refer to as "disciplines." Almost every field $f$ has at least one ancestor that is a discipline. Let $D(f)$ denote the set of all disciplines that are ancestors of $f$. For any given paper $p$, the set of disciplines associated with $p$ is denoted by $D(p)$ and is computed as follows:

$$D(p) = \bigcup_{f \in \arg\max_{f'} \mathrm{score}(p, f')} D(f).$$

For any given author $a$, let $P(a)$ denote the set of papers authored by $a$. We compute the discipline(s) of $a$ as follows:

$$D(a) = \arg\max_{f} \left|\{\, p \in P(a) : f \in D(p) \,\}\right|,$$

where $f$ ranges over disciplines and $|\cdot|$ denotes set cardinality. In other words, $a$ is associated with the most frequent discipline(s) amongst all papers authored by $a$ up to and including the retraction year.

³ https://github.com/RobinL/fuzzymatcher
⁴ These fields are Art, Biology, Business, Chemistry, Computer Science, Economics, Engineering, Environmental Science, Geography, Geology, History, Materials Science, Mathematics, Medicine, Philosophy, Physics, Political Science, Psychology, and Sociology.
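The two definitions translate directly into code. The sketch below assumes each paper's MAG field scores and each field's set of top-level ancestors are already available as dictionaries (a hypothetical layout), keeps every field tied for the highest score, and keeps every discipline tied for the highest paper count.

```python
from collections import Counter


def paper_disciplines(field_scores, ancestors):
    """D(p): union of the disciplines above the field(s) with the highest score.

    field_scores: {field: confidence score in [0, 1]} for one paper.
    ancestors: {field: set of top-level disciplines that are ancestors of it}.
    """
    if not field_scores:
        return set()
    best = max(field_scores.values())
    top_fields = [f for f, s in field_scores.items() if s == best]
    return set().union(*(ancestors.get(f, set()) for f in top_fields))


def author_disciplines(papers, ancestors):
    """D(a): discipline(s) appearing in the most papers; ties are all kept.

    papers: list of field_scores dicts, one per paper up to the retraction year.
    """
    counts = Counter()
    for field_scores in papers:
        counts.update(paper_disciplines(field_scores, ancestors))
    if not counts:
        return set()
    top = max(counts.values())
    return {d for d, c in counts.items() if c == top}


# Toy hierarchy and papers (hypothetical field names and scores).
ancestors = {"genetics": {"Biology"}, "oncology": {"Medicine"},
             "biochemistry": {"Biology", "Chemistry"}}
papers = [{"genetics": 0.8, "oncology": 0.4},
          {"oncology": 0.9},
          {"biochemistry": 0.7, "genetics": 0.7}]
print(author_disciplines(papers, ancestors))  # {'Biology'}
```

Here Biology is the top discipline of two papers, while Medicine and Chemistry each appear once, so the author is assigned Biology.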
Retraction reasons. To identify the reason for retraction, we manually extracted the retraction notes of 1,250 retracted papers. The reasons for retraction can be classified into four broad categories: (i) misconduct; (ii) plagiarism (note that some prior research considers plagiarism a form of misconduct, for example [41]); (iii) mistake; and (iv) other. Every retraction note was evaluated by multiple annotators. We started with two annotators and assigned additional annotators, up to five, until a majority reason was reached. If no majority reason was reached, the reason was classified as "ambiguous." Finally, if no reason was provided for the retraction, it was classified as "unknown." The final distribution of the reasons for retraction among the annotated papers is: 251 (20%) misconduct, 311 (25%) plagiarism, 347 (28%) mistake, 170 (14%) other, 121 (10%) unknown, and 50 (4%) ambiguous. Note that plagiarism includes plagiarising others' work as well as the authors' own prior work. In a random sample of 100 such retraction notes, 50 referred to taking someone else's work without proper reference, 30 referred to lacking citations or quotes from the authors' own work, and 20 did not specify whose work had been plagiarised. Therefore, between 30% and 50% of this category is self-plagiarism, depending on how the unspecified cases are split.
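The escalating annotation protocol can be sketched as a majority vote that consults one more annotator at a time. The sketch assumes "majority" means more than half of the annotators consulted so far, and that annotations arrive in the order annotators were assigned; the function name is illustrative.

```python
from collections import Counter


def resolve_reason(annotations, start=2, max_annotators=5):
    """Escalating majority vote over retraction-note annotations.

    annotations: labels in the order annotators were assigned; at most
    `max_annotators` of them are consulted.
    Returns the majority label, or "ambiguous" if none emerges.
    """
    for k in range(start, min(len(annotations), max_annotators) + 1):
        label, votes = Counter(annotations[:k]).most_common(1)[0]
        if votes > k / 2:  # strict majority among the first k annotators
            return label
    return "ambiguous"


print(resolve_reason(["mistake", "plagiarism", "mistake"]))  # mistake
print(resolve_reason(["mistake", "plagiarism", "other", "misconduct", "unknown"]))  # ambiguous
```

With two annotators a majority requires both to agree; a disagreement triggers a third annotator, and so on up to five.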
We use the manually annotated papers to automatically code the reasons for retraction of the remaining papers in RW, using a label propagation algorithm. RW provides 104 unique reasons for retraction, and each retracted paper is associated with one or more of them. We map each of the 104 reasons to the coarse class (plagiarism, misconduct, mistake, or other) with which it is most often associated among the annotated papers. We then use this mapping to annotate the remaining, unlabelled papers with the majority class (see Supplementary Figure S8 for a more detailed visualization of the label propagation algorithm). The final distribution of the reasons for retraction after label propagation for the filtered sample is as follows: 1,153 (25%) misconduct, 1,507 (32%) plagiarism, 1,097 (23%) mistake, 520 (11%) unknown, 348 (7%) other, and 50 (1%) ambiguous. These numbers are comparable to those reported about a decade ago for a sample of papers indexed in PubMed [42]. In our analysis, we merge the other, unknown, and ambiguous categories into a single "other" category.
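One reading of this propagation step is sketched below: each fine-grained RW reason inherits the coarse class it most often carries among the hand-coded papers, and an unannotated paper takes the majority class over its mapped reasons. The reason names are hypothetical, and falling back to "other" for papers whose reasons never appear in the annotated set is an assumption.

```python
from collections import Counter, defaultdict


def learn_reason_map(annotated):
    """Map each fine-grained RW reason to the coarse class it most often carries.

    annotated: iterable of (rw_reasons, coarse_label) pairs from hand-coded papers.
    Ties between coarse classes are broken by first occurrence (Counter order).
    """
    votes = defaultdict(Counter)
    for reasons, label in annotated:
        for reason in reasons:
            votes[reason][label] += 1
    return {reason: counts.most_common(1)[0][0] for reason, counts in votes.items()}


def propagate(reasons, reason_map, default="other"):
    """Coarse label for an unannotated paper: majority over its mapped reasons."""
    mapped = [reason_map[r] for r in reasons if r in reason_map]
    if not mapped:
        return default  # none of the paper's reasons occurs in the annotated set
    return Counter(mapped).most_common(1)[0][0]


# Toy example with hypothetical reason names.
annotated = [
    (["fabrication"], "misconduct"),
    (["error_in_data"], "mistake"),
    (["error_in_data"], "mistake"),
    (["fabrication", "error_in_data"], "misconduct"),
    (["duplication"], "plagiarism"),
]
reason_map = learn_reason_map(annotated)
print(reason_map)
print(propagate(["error_in_data", "unseen_reason"], reason_map))  # mistake
```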
Type of retraction. Using the manually extracted retraction notes, we also identify whether each retraction was author-led or journal-led. The breakdown of retraction types is as follows: 604 (48%) author-led, 499 (40%) journal-led, 119 (10%) unknown, and 28 (2%) ambiguous. These data are only available for the manually annotated papers.
Journal and conference ranking. To identify the ranking of the venue (journal or conference) of each retracted paper, we use the SCImago Journal Rank (SJR) database⁵. For a given journal or conference and year, the SJR score is computed as the average number of weighted citations received by the articles published in the venue during the past three years [43]. Based on this score, each journal is also assigned a quartile within each subject area. SJR provides rankings from 1999 to 2020. We use the year of publication of the retracted article to identify the SJR score and quartile ranking of the venue. Of the 4,672 papers and 15,614 authors in the filtered sample, we were able to identify the rankings for 3,183 (68.1%) papers and 10,698 (68.5%) authors. Note that papers published before 1999 do not have this information, nor do papers whose venues are not covered by SJR.

⁵ https://www.scimagojr.com/journalrank.php
Altmetric score. The Altmetric score is a weighted count of the attention a research output receives from different online sources (e.g., Twitter, news). The Altmetric API, however, only provides the current cumulative Altmetric score for a given record and does not give a breakdown or a customized score for a given time window. Since we focus on the 6 months before and after retraction to isolate the attention that the retraction likely garnered, we compute this score ourselves using the methodology detailed by Altmetric on their webpage⁶. While the algorithm Altmetric uses to compute its score is proprietary [44], the description allows us to closely approximate it. To check this approximation, we compute the Spearman correlation between the available cumulative Altmetric score and our score computed over the complete time window. For the 11,265 retracted papers for which an Altmetric record (and score) is available, the Spearman correlation is high (ρ = 0.96). We compute the Altmetric score for both the retracted paper and its retraction note (most retraction notes and papers have separate DOIs) and aggregate the two by taking their sum. We assign papers that are not indexed in Altmetric an attention score of zero. We validate this choice by randomly sampling 100 of these papers and manually searching for mentions of them on Google and Twitter using the title and DOI of each paper. Of these 100, 96 had not received any attention, and the remaining four garnered only one mention on average. For this reason, we treat papers without mentions on Altmetric as having an attention score of zero.
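The windowed score and its validation can be sketched as follows. The per-source weights are assumptions loosely following Altmetric's public description (the figure captions note that one tweet counts 0.25); the real score applies further source-specific modifiers, and converting the ±6-month window to days with an average month length is a simplification. Spearman's ρ is computed as the Pearson correlation of average ranks, so tied scores (for example, many papers at zero) are handled.

```python
from datetime import date

# Assumed per-mention weights; the real Altmetric algorithm is proprietary.
WEIGHTS = {"news": 8.0, "blog": 5.0, "wikipedia": 3.0, "policy": 3.0,
           "twitter": 0.25, "facebook": 0.25}
DAYS_PER_MONTH = 30.44  # average month length, an approximation


def window_score(mentions, retraction_date, months=6):
    """Weighted mention count within +/- `months` of the retraction date.

    mentions: iterable of (source, mention_date); unknown sources weigh 0.
    """
    limit = months * DAYS_PER_MONTH
    return sum(WEIGHTS.get(source, 0.0)
               for source, when in mentions
               if abs((when - retraction_date).days) <= limit)


def attention_score(paper_mentions, note_mentions, retraction_date, months=6):
    """Sum of the paper's and the retraction note's windowed scores.

    Papers absent from Altmetric simply pass empty lists and score 0.
    """
    return (window_score(paper_mentions, retraction_date, months)
            + window_score(note_mentions, retraction_date, months))


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


retraction = date(2020, 1, 1)
paper = [("news", date(2019, 12, 1)), ("twitter", date(2020, 2, 1)),
         ("news", date(2021, 1, 1))]  # the last mention falls outside the window
note = [("blog", date(2020, 1, 15))]
print(attention_score(paper, note, retraction))  # 13.25
```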
Threshold for high attention. We determine the threshold through regression analysis, allowing the threshold a for high levels of attention to vary over 0 < a < 64. We examine four different specifications of the regression model, which differ in how we measure an author's level of experience (academic age, number of papers, number of citations, or number of past collaborators; these measures are highly correlated). We find that our data are consistent with a threshold in the range 12 < a < 23: in this region, we detect a positive and statistically significant effect of attention on attrition in at least three of the four specifications. Outside this region, there is no threshold for which more than one model yields a statistically significant estimate. We set our threshold for high attention at 20 in the 12-month window around the retraction. We also consider a non-linearity in the threshold but do not find strong evidence for it; for further details, see Supplementary Note S2.
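The threshold search can be sketched with NumPy as a sweep over candidate thresholds. For each threshold a, it fits one linear probability model per experience measure, attrition ~ 1 + 1[attention > a] + experience, and counts the specifications with a positive effect significant at α = 0.05. This is a simplified sketch: the paper's additional controls are omitted, p-values use a normal approximation to the t-statistic, and the data in the usage example are synthetic, with a true jump built in at 20.

```python
import math

import numpy as np


def ols_effect(X, y, col):
    """OLS coefficient and two-sided p-value (normal approximation) for column `col`."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = (resid @ resid) / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    t_stat = beta[col] / math.sqrt(cov[col, col])
    return float(beta[col]), math.erfc(abs(t_stat) / math.sqrt(2.0))


def sweep_thresholds(attention, attrition, experience_measures,
                     thresholds=range(1, 64), alpha=0.05):
    """Per threshold, count specifications with a positive, significant effect."""
    counts = {}
    for a in thresholds:
        high = (attention > a).astype(float)
        if high.min() == high.max():  # indicator is constant: effect not identified
            counts[a] = 0
            continue
        hits = 0
        for experience in experience_measures:
            X = np.column_stack([np.ones_like(high), high, experience])
            coef, p_value = ols_effect(X, attrition, col=1)
            hits += int(coef > 0 and p_value < alpha)
        counts[a] = hits
    return counts


def consistent_region(counts, min_hits=3):
    """Thresholds at which at least `min_hits` specifications agree."""
    return [a for a, hits in sorted(counts.items()) if hits >= min_hits]


# Synthetic illustration only: four correlated experience measures.
rng = np.random.default_rng(0)
n = 2000
attention = rng.exponential(15.0, n)
base = rng.normal(size=n)
experience = [base + 0.3 * rng.normal(size=n) for _ in range(4)]
p_leave = np.clip(0.2 + 0.15 * (attention > 20) - 0.05 * base, 0, 1)
attrition = (rng.random(n) < p_leave).astype(float)
counts = sweep_thresholds(attention, attrition, experience)
print(consistent_region(counts))
```

Because nearby thresholds produce nearly identical indicators, significance tends to persist over a band around the true cut-off, which is why the analysis reports a consistent region rather than a single value.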
Analytical sample for the matching experiment
We use our filtered sample of 4,672 papers and 15,614 authors to generate the analytical sample for the matching experiment. Of the 15,614 authors, we find suitable matches for 3,094 (20%). These matches are established through a three-step process. First, we perform exact matching on gender, academic age, affiliation rank at the start of the career, and affiliation rank and scientific discipline at the time of retraction. Second, we pick a threshold (30%) within which we accept matches on the remaining characteristics: number of papers, number of collaborators, and number of citations pre-retraction; i.e., each of these characteristics of the match must be within 30% of the corresponding value for the retracted author. Third, we use a calibrated distance function to achieve balance on these three characteristics, giving more weight to the number of pre-retraction collaborators, our most important confounder. Specifically, we identify the closest match for each author as the candidate with the lowest weighted Euclidean distance, with weights chosen to minimize the standardized mean difference in the number of papers, number of citations, and number of collaborators between the author and the match, calculated over the set of potential matches identified in the second step. We repeat these steps using a 20% threshold and present robustness analyses with this threshold in Supplementary Figure S7.

⁶ https://help.altmetric.com/support/solutions/articles/6000233311-how-is-the-altmetric-attention-score-calculated-
For every author $a$, let $p_a$, $c_a$, and $o_a$ denote the standardized number of papers, number of citations, and number of collaborators of $a$ by the year of retraction. We choose the closest match $m$ for each author by minimizing the following distance function:

$$D = \sqrt{w_{\mathrm{papers}}(p_a - p_m)^2 + w_{\mathrm{citations}}(c_a - c_m)^2 + w_{\mathrm{collaborators}}(o_a - o_m)^2},$$

where $w_{\mathrm{papers}} = 0.1$, $w_{\mathrm{citations}} = 0.1$, and $w_{\mathrm{collaborators}} = 0.8$ denote the weights, determined empirically by minimizing the standardized mean differences.
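The three matching steps and the distance function can be sketched together as below. The field names, the standardization statistics (a mean and standard deviation per covariate, assumed to be computed over the candidate pool), and matching with replacement are assumptions; only the exact-match covariates, the 30% tolerance, and the weights come from the text.

```python
import math

# Hypothetical field names for the matching covariates described above.
EXACT_KEYS = ("gender", "academic_age", "start_affiliation_rank",
              "retraction_affiliation_rank", "discipline")
CALIPER_KEYS = ("papers", "collaborators", "citations")
WEIGHTS = {"papers": 0.1, "citations": 0.1, "collaborators": 0.8}


def within_caliper(author, cand, tol=0.30):
    """True if every caliper count of `cand` is within tol * the author's value."""
    return all(abs(cand[k] - author[k]) <= tol * author[k] for k in CALIPER_KEYS)


def standardize(record, stats):
    """z-scores of the weighted covariates; stats maps key -> (mean, sd)."""
    return {k: (record[k] - stats[k][0]) / stats[k][1] for k in WEIGHTS}


def weighted_distance(za, zm):
    """D = sqrt(sum_k w_k * (z_a,k - z_m,k)^2) with the weights above."""
    return math.sqrt(sum(w * (za[k] - zm[k]) ** 2 for k, w in WEIGHTS.items()))


def best_match(author, candidates, stats, tol=0.30):
    """Exact match, caliper filter, then the closest candidate (or None)."""
    eligible = [c for c in candidates
                if all(c[k] == author[k] for k in EXACT_KEYS)   # step 1
                and within_caliper(author, c, tol)]             # step 2
    if not eligible:
        return None
    za = standardize(author, stats)
    return min(eligible,                                        # step 3
               key=lambda c: weighted_distance(za, standardize(c, stats)))


author = dict(id="r1", gender="female", academic_age=5, start_affiliation_rank=2,
              retraction_affiliation_rank=2, discipline="Biology",
              papers=20, citations=300, collaborators=40)
candidates = [
    {**author, "id": "c1", "papers": 22, "citations": 310, "collaborators": 50},
    {**author, "id": "c2", "papers": 25, "citations": 380, "collaborators": 41},
    {**author, "id": "c3", "gender": "male"},          # fails exact matching
    {**author, "id": "c4", "collaborators": 60},       # outside the 30% caliper
]
stats = {"papers": (20, 10), "citations": (300, 200), "collaborators": (40, 20)}
print(best_match(author, candidates, stats)["id"])  # c2
```

Candidate c2 wins despite larger gaps in papers and citations, because it is much closer on collaborators, the covariate carrying 80% of the weight.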
If the collaboration year of a collaborator is missing, we cannot place that collaboration on the author's career timeline; that is, we cannot identify whether it happened pre- or post-retraction. In such cases, we remove the author and their corresponding matches from our analysis altogether. We carry out our analysis on retracted authors who authored at least one paper after their retraction year; all matched authors meet the same criterion. The resulting "matched sample" of authors who stayed in scientific publishing differs from the filtered sample in the following important ways: on average, the matched sample is younger; has fewer papers, fewer citations, and fewer collaborators; is slightly more likely to be in the middle author position; and is more likely to be at institutions ranked 101-1000 (see Supplementary Figure S9). In sum, the matched authors are, on average, of lower status than the non-matched filtered sample. These differences are essential to consider when evaluating our inferences. We calculate standardized mean differences (SMDs) [45] for our matched sample and find values of 0.029, 0.023, and 0.046 for the number of papers, citations, and collaborators, respectively. These statistics give us confidence that retracted authors are matched to non-retracted authors with similar career trajectories up to the time of retraction.
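The balance statistic can be computed as below, using the common pooled-variance form SMD = (mean_t − mean_c) / √((s_t² + s_c²)/2). The field names are hypothetical, and the exact SMD variant used in the paper is not stated, so this is a sketch of the standard definition. A commonly used rule of thumb treats an absolute SMD below 0.1 as negligible imbalance.

```python
import statistics


def smd(treated, control):
    """Standardized mean difference with a pooled (average-variance) denominator."""
    mt, mc = statistics.mean(treated), statistics.mean(control)
    pooled_sd = ((statistics.variance(treated) + statistics.variance(control)) / 2) ** 0.5
    return (mt - mc) / pooled_sd if pooled_sd > 0 else 0.0


def balance_report(retracted, matched, keys=("papers", "citations", "collaborators")):
    """Absolute SMD per covariate for two lists of author dicts (hypothetical fields)."""
    return {k: abs(smd([r[k] for r in retracted], [m[k] for m in matched]))
            for k in keys}


print(smd([1, 2, 3], [2, 3, 4]))  # -1.0
```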
Acknowledgement
We gratefully acknowledge support and resources from the High Performance Computing Center at New York University Abu Dhabi. We thank Ivan Oransky and Adam Marcus, co-founders of Retraction Watch [46], as well as The Center For Scientific Integrity, the parent organization of RW, for diligently maintaining a curated list of scientific retractions and making it freely available to researchers. We also thank Altmetric.com and Microsoft Academic Graph for providing the data used in this study. We thank Sarhana Adhikari, Alex Chae, Aasharya Dutt, Rhiane Kall, Danish Khan, Rhythm Kukreja, Ritin Malhotra and Zaeem Shahzad for finding and annotating the retraction notices. We thank Peter Bearman, Sanjeev Goyal, Byungkyu Lee, Fengyuan (Michael) Liu, Minsu Park, and the participants of the Workshop on the Frontiers of Network Science 2023 for thoughtful comments and suggestions.
Data and Code Availability
The Microsoft Academic Graph (MAG) dataset can be downloaded from the following website. The Retraction Watch (RW) database can be accessed from their website. Access to the Altmetric API can be requested from their website. Finally, all data and code used to produce the figures and tables can be downloaded here.
Conflict of Interest
The authors B.A. and K.M. acknowledge that their study was inspired by a personal experience, the retraction of one of their papers, which galvanized the research question asked in this paper: how do retractions influence the careers of authors of scientific papers? While this shaped their inquiry, it had no impact on their choice of methods or the interpretation of their results. The authors declare no conflict of interest as defined by the Journal's policy.
Figures
[Figure 1 panels: (a) x-axis: year of last publication since retraction (year 0); y-axis: % of retracted authors who left; legend: left publishing / did not leave publishing. (b) author-level characteristics: discipline, affiliation rank, gender, academic age, number of papers, author order, number of citations, number of collaborators. (c) paper-level characteristics: number of papers with overlap, number of coauthors on the retracted paper, venue, year of retraction, reason of retraction, journal or conference ranking.]
Figure 1: Characteristics of retracted authors. (a) shows the percentage of retracted authors who left (blue) versus those who did not (red). (b) shows comparisons across different author-level characteristics between retracted authors who left scientific publishing at the time of retraction and those who did not. (c) Similar to (b), but for paper-level characteristics. Outliers are removed from box plots for presentation purposes.
*** p<.001
Figure 2: Raincloud plot [47] showing the distribution of the logged Altmetric score 6 months pre- and post-retraction. The x-axis represents monthly windows relative to the retraction: 0 is the day of the retraction (not displayed), so -1 is the month right before the retraction and 1 is the month right after. The y-axis shows the logged Altmetric score of a given paper in the given month. Note that Altmetric scores in [0, 1] are frequent; e.g., a single tweet results in a score of 0.25. The black line shows the average logged Altmetric score. We exclude all papers that do not receive any attention within the 12-month window, which was chosen based on prior work that characterized attention to retracted papers [28]. Comparison across months shows that retracted papers receive the most attention within 1 month of the retraction.
Table 1: Linear probability models of attrition. Models differ in how authors' experience is measured, using (1) academic age, (2) number of papers by the time of retraction, (3) logged number of citations by the time of retraction, and (4) logged number of collaborators by the time of retraction. Additional control variables (not shown) include authors' discipline, gender, affiliation rank at the time of retraction, number of coauthors on the retracted paper, author order, retraction year, venue, and journal/conference rank. For the complete regression table, see Supplementary Table S2.

Dependent variable: Attrition
                                       (1)         (2)         (3)         (4)
High Attention (>20 Altmetric score)   0.067       0.054       0.098∗∗∗    0.063∗∗
                                       (0.027)     (0.027)     (0.027)     (0.024)
Academic Age                           0.008∗∗∗
                                       (0.000)
Papers                                             0.001∗∗∗
                                                   (0.000)
Log(Citations)                                                 0.055∗∗∗
                                                               (0.002)
Log(Collaborators)                                                         0.113∗∗∗
                                                                           (0.004)
Reason: Misconduct                     0.037∗∗     0.038∗∗     0.026       0.028
                                       (0.013)     (0.013)     (0.013)     (0.012)
Reason: Plagiarism                     0.013       0.021       0.005       0.009
                                       (0.013)     (0.013)     (0.013)     (0.012)
Reason: Other                          0.020       0.015       0.037∗∗     0.019
                                       (0.013)     (0.013)     (0.013)     (0.012)
Constant                               10.641∗∗∗   11.619∗∗∗   10.804∗∗∗   12.657∗∗∗
                                       (3.025)     (3.054)     (2.959)     (2.805)
Observations                           10698       10698       10698       10698
R²                                     0.208       0.189       0.251       0.283
Adjusted R²                            0.205       0.187       0.249       0.281
F Statistic                            52.313∗∗∗   44.729∗∗∗   59.423∗∗∗   75.255∗∗∗
∗p<0.05; ∗∗p<0.01; ∗∗∗p<0.001
[Figure 3 panels: (a) number of collaborators retained 5 years post-retraction; (b) number of collaborators gained 5 years post-retraction; (c) ratio of triads closed 5 years post-retraction. Rows: overall, gender, year of retraction, author academic age, author affiliation rank, author order, reason of retraction, type of retraction, discipline. Legend: retracted / non-retracted; significant / not significant.]
Figure 3: Analyzing collaborator retention, gain, and triadic closure among retracted authors who stayed in scientific publishing post-retraction. The figure shows the differences in (a) the number of collaborators retained, (b) the number of collaborators gained, and (c) the proportion of triads closed 5 years post-retraction between retracted authors (red circle) and their matched non-retracted pairs (green square). These are further stratified by gender, year of retraction, academic age, affiliation rank, author order, reason of retraction, type of retraction, and discipline.
[Figure 4 panels: (a-i) stratified by seniority (early-career, mid-career, senior); (j-r) stratified by reason of retraction (misconduct, plagiarism, mistake). Rows: collaborators retained, collaborators gained, difference-in-difference; columns: academic age, number of papers, number of citations. Legend: retracted / non-retracted.]
Figure 4: Comparison of characteristics of collaborators retained and gained by retracted and non-
retracted authors, stratified by seniority and by reasons of retraction. (a-c) display the comparison
based on academic age, number of papers, and number of citations for retained collaborators of retracted
and non-retracted authors of different age groups. (d-f) show the comparison for collaborators gained. (g-i)
illustrate the results of the difference-in-difference analysis comparing the difference of collaborators re-
tained and collaborators lost. (j-r) show similar comparison for collaborators of retracted and non-retracted
authors stratified by reasons of retraction.
Supplementary Materials for
Characterizing the effect of retractions on scientific careers
Shahan Ali Memon, Kinga Makovi*, Bedoor AlShebli*
* Joint corresponding authors. E-mail: km2537@nyu.edu; bedoor@nyu.edu;
Supplementary Figures
[Figure S1: x-axis: time interval between consecutive retractions (0-1 month to > 10 years); y-axis: percentage of authors with multiple retractions.]
Figure S1: Percentage distribution of the time interval between consecutive retractions for authors with multiple retractions.
[Figure S2 panels: (a) early-career authors (age 0-3); (b) mid-career authors (age 4-9); (c) senior authors (age 10 or greater). x-axis: year of last publication since retraction (year 0); y-axis: % of retracted authors who left; legend: left publishing / did not leave publishing.]
Figure S2: Percentage of retracted authors who left (blue) versus those who did not (red), stratified by age. (a) shows early-career authors (age 0-3), (b) shows mid-career authors (age 4-9), and (c) shows senior authors (age 10 or greater).
[Figure S3 panels: (a) first author; (b) middle author; (c) last or only author. x-axis: year of last publication since retraction (year 0); y-axis: % of retracted authors who left; legend: left publishing / did not leave publishing.]
Figure S3: Percentage of retracted authors who left (blue) versus those who did not (red) stratified by
author order in the paper. (a) shows first authors, (b) shows middle authors, and (c) shows last (or solo)
authors.
[Figure S4 panels: (a) authors retracted due to misconduct; (b) authors retracted due to plagiarism; (c) authors retracted due to mistake. x-axis: year of last publication since retraction (year 0); y-axis: % of retracted authors who left; legend: left publishing / did not leave publishing.]
Figure S4: Percentage of retracted authors who left (blue) versus those who did not (red) stratified
by reason of retraction. (a) shows authors retracted due to misconduct, (b) shows authors retracted due
to plagiarism, and (c) shows authors retracted due to mistake.
[Figure S5 panels: (a) Authors with affiliation rank 1-100; (b) Authors with affiliation rank 101-500; (c) Authors with affiliation rank 501-1000; (d) Authors with affiliation rank >1000. x-axis: year of last publication since retraction (year 0), from -3 to 18; y-axis: % of retracted authors who left. Legend: left publishing; did not leave publishing.]
Figure S5: Percentage of retracted authors who left (blue) versus those who did not (red) stratified
by affiliation rank at the time of retraction. (a) shows authors with affiliation rank between 1 and 100,
(b) with affiliation rank between 101 and 500, (c) with affiliation rank between 501 and 1000, and (d) with
affiliation rank greater than 1000.
[Figure S6 panels: Social Media; News Media; Blogs; Knowledge Repositories. x-axis: months since retraction (-6 to 6); y-axis: percentage mentions.]
Figure S6: Raincloud plot [47] showing the normalized distribution of mentions in the 6 months before and
after retraction across different types of platforms. The x-axis represents time in months relative to the month
of retraction (0). The y-axis shows the percentage of a paper's mentions that occur in the given month, where the
denominator is all of that paper's mentions in the displayed 12-month window [28]. The black line shows the average.
Papers that receive no attention within the 12-month window are excluded. Papers receive the most attention
within 1 month of the retraction across a variety of platforms: (i) social media (Twitter, Facebook, Google+,
LinkedIn, Pinterest, Reddit, and videos); (ii) news media; (iii) blogs; and (iv) knowledge repositories (Wikipedia,
patents, F1000, Q&A, and peer reviews). This classification follows [27], which studies the dynamics of
cross-platform attention to retracted papers.
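The y-axis normalization described in the caption can be written as a short function. This is a minimal sketch under two assumptions. First, mention counts are keyed by month offset relative to the retraction. Second, the 12-month window is months -6 to -1 and 1 to 6, matching the plot's x-axis ticks. The function names are hypothetical.

```python
# Assumed window: the 12 months shown on the x-axis (-6..-1 and 1..6; no month 0).
WINDOW = [m for m in range(-6, 7) if m != 0]


def normalize_mentions(counts):
    """Percentage of one paper's window mentions that fall in each month.

    counts: dict month offset -> number of mentions on one platform type.
    Returns None for papers with no mentions in the window (they are excluded).
    """
    total = sum(counts.get(m, 0) for m in WINDOW)
    if total == 0:
        return None
    return {m: 100.0 * counts.get(m, 0) / total for m in WINDOW}


def average_profile(papers):
    """Mean normalized percentage per month (the black line) across papers."""
    profiles = [p for p in map(normalize_mentions, papers) if p is not None]
    if not profiles:
        return {}
    return {m: sum(p[m] for p in profiles) / len(profiles) for m in WINDOW}


avg = average_profile([{1: 3, -1: 1}, {1: 1, 2: 1}, {3: 0}])
```

The third toy paper has no mentions, so it drops out before averaging, as the caption describes.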
[Figure S7 panels: (a) number of collaborators retained 5 years post-retraction; (b) number of collaborators gained 5 years post-retraction; (c) ratio of triads closed 5 years post-retraction. Rows: Overall; Gender (Female, Male); Year of retraction (1990-1995, 1996-2000, 2001-2005, 2006-2010, 2011-2015); Author academic age (1 year, 2 years, 3-5 years, 6 or more years); Author affiliation rank (1-100, 101-500, 501-1000, >1000); Author order (First or last author, Middle author); Reason of retraction (Misconduct, Plagiarism, Mistake, Other); Type of retraction (Author-led retraction, Journal-led retraction); Discipline (Biology, Chemistry, Medicine, Physics, Other STEM fields, Non-STEM fields). Legend: Retracted; Non-retracted; Significant; Not significant.]
Figure S7: Robustness analysis of collaborator retention, collaborator gain, and triadic closure among
retracted authors who stayed in scientific publishing post-retraction, using a 20% threshold for the
percentage difference in papers, citations, and collaborators between retracted and non-retracted scientists.
The figure compares (a) the number of collaborators retained, (b) the number of collaborators gained, and
(c) the proportion of triads closed 5 years post-retraction for the retracted authors (red circles) and their
matched non-retracted pairs (green squares). Results are further stratified by gender, year of retraction,
academic age, affiliation rank, author order, reason of retraction, type of retraction, and discipline.
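The three outcomes and the matching threshold in Figure S7 can be made concrete with a small sketch. Every definition below is an assumption for illustration, not the paper's exact operationalization:

- "Retained" collaborators are pre-retraction co-authors who co-author again within 5 years.
- "Gained" collaborators are co-authors in that window who never co-authored with the author before.
- An open triad is a pair of pre-retraction collaborators who had not worked together. It counts as closed if the two co-author within the window.
- A candidate passes the 20% threshold when a metric differs by at most 20% of the retracted author's value.

All function names are hypothetical.

```python
from itertools import combinations


def within_threshold(retracted_value, candidate_value, threshold=0.20):
    """Hypothetical matching rule: the candidate's value differs from the
    retracted author's value by at most `threshold` (as a fraction of it)."""
    if retracted_value == 0:
        return candidate_value == 0
    return abs(retracted_value - candidate_value) / retracted_value <= threshold


def collaborator_outcomes(before, after):
    """Return (retained, gained) from sets of co-author IDs before the
    retraction and within 5 years after it (assumed definitions)."""
    retained = len(before & after)  # earlier co-authors seen again
    gained = len(after - before)    # co-authors never seen before
    return retained, gained


def triad_closure_ratio(before, ties_before, ties_after):
    """Share of open triads (two unconnected pre-retraction collaborators)
    whose members co-author within the window (assumed definition).

    ties_before / ties_after: sets of frozenset({x, y}) co-authorship pairs.
    """
    open_pairs = [frozenset(p) for p in combinations(sorted(before), 2)
                  if frozenset(p) not in ties_before]
    if not open_pairs:
        return 0.0
    closed = sum(1 for p in open_pairs if p in ties_after)
    return closed / len(open_pairs)
```

In the toy test, collaborators A and B were already connected, so only A-C and B-C are open triads. B-C closes, which gives a ratio of 0.5.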
Figure S8: Annotating reasons of retraction using label propagation. This diagram shows the process
we employ to annotate reasons of retraction for the non-annotated papers. Step (1) shows the process used
to annotate papers manually; Steps (2) and (3) illustrate the label propagation algorithm.
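Steps (2) and (3) rely on label propagation. The caption does not specify the graph or the variant used. The sketch below assumes a weighted similarity graph between retraction notices, for example one built from text similarity. It uses the classic iterative scheme: each unlabelled notice repeatedly takes the weighted average of its neighbours' label distributions, while manually annotated notices stay clamped to their labels. Node names, weights, and the function name are illustrative.

```python
def label_propagation(graph, seeds, labels, max_iter=100, tol=1e-6):
    """Iterative label propagation with clamped seed labels (generic sketch).

    graph:  dict node -> dict neighbour -> edge weight (symmetric)
    seeds:  dict node -> label for manually annotated notices (kept fixed)
    labels: list of candidate labels (e.g. reasons of retraction)
    Returns dict node -> predicted label, or None if no label reaches the node.
    """
    # One-hot distributions for seeds, all-zero for unlabelled nodes.
    dist = {n: {lab: 1.0 if seeds.get(n) == lab else 0.0 for lab in labels}
            for n in graph}
    for _ in range(max_iter):
        new, change = {}, 0.0
        for n, nbrs in graph.items():
            if n in seeds:
                new[n] = dist[n]  # clamp manually annotated nodes
                continue
            total = sum(nbrs.values())
            # Weighted average of the neighbours' distributions from the last step.
            new[n] = {lab: sum(w * dist[m][lab] for m, w in nbrs.items()) / total
                      if total else 0.0 for lab in labels}
            change = max(change, *(abs(new[n][lab] - dist[n][lab]) for lab in labels))
        dist = new
        if change < tol:
            break
    result = {}
    for n, d in dist.items():
        best = max(labels, key=lambda lab: d[lab])
        result[n] = best if d[best] > 0 else None
    return result


graph = {
    "n1": {"u1": 1.0}, "n2": {"u2": 1.0},
    "u1": {"n1": 1.0, "u2": 0.2}, "u2": {"n2": 1.0, "u1": 0.2},
    "u3": {},  # isolated notice: no label can reach it
}
seeds = {"n1": "misconduct", "n2": "mistake"}
labels = ["misconduct", "plagiarism", "mistake", "other"]
pred = label_propagation(graph, seeds, labels)
```

Each unlabelled notice ends up with the label of the annotated notice it is most strongly tied to.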
Supplementary Tables
Table S1: Standardized mean differences between authors who left academic publishing and those
who stayed. p_s denotes the proportion of authors who stayed, and p_l denotes the proportion of authors who
left academic publishing. For continuous variables, p_s and p_l represent the mean.
Category p_s (n = 11,347) p_l (n = 3,236) Standardized Mean Difference
Attention High (>20 Altmetric score) 0.04 0.03 -0.07
Low 0.96 0.97 0.07
Academic Age 13.71 4.08 1.03
# Papers 70.0 7.02 0.75
# Citations 1618.23 132.15 0.44
# Collaborators 193.14 18.67 0.44
Gender Male 0.75 0.70 -0.24
Female 0.25 0.30 0.24
Affiliation Rank
1-100 0.20 0.15 -0.37
101-500 0.30 0.31 0.09
501-1000 0.14 0.15 0.07
>1000 0.36 0.39 0.12
Reason
Misconduct 0.22 0.25 0.16
Plagiarism 0.28 0.32 0.20
Mistake 0.30 0.24 -0.30
Other 0.21 0.19 -0.07
Author order First or last author 0.39 0.43 0.17
Middle author 0.61 0.57 -0.17
Year of retraction
1990-1995 0.02 0.02 0.38
1996-2000 0.03 0.02 -0.19
2001-2005 0.06 0.04 -0.42
2006-2010 0.26 0.26 -0.04
2011-2015 0.64 0.66 0.11
Venue Journal 0.99 0.96 -1.47
Conference 0.01 0.04 1.47
Journal Ranking
Q1 0.73 0.65 -0.37
Q2 0.20 0.24 0.24
Q3 0.06 0.10 0.49
Q4 0.01 0.01 0.23
Discipline
Biology 0.37 0.31 -0.30
Chemistry 0.15 0.20 0.32
Medicine 0.29 0.22 -0.37
Physics 0.06 0.08 0.21
Other STEM fields 0.09 0.14 0.44
Non-STEM fields 0.03 0.06 0.79
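The caption does not state which formula produced the standardized mean differences. A common convention, assumed here, divides the difference in group means by the pooled standard deviation. For a binary indicator this reduces to a function of the two proportions. The sketch illustrates that convention only and is not guaranteed to reproduce the table's values.

```python
import math
from statistics import mean, variance


def smd_continuous(x, y):
    """Assumed convention: (mean(x) - mean(y)) / sqrt((var(x) + var(y)) / 2)."""
    pooled = math.sqrt((variance(x) + variance(y)) / 2)
    return (mean(x) - mean(y)) / pooled


def smd_binary(p1, p2):
    """Same convention for proportions p1 and p2 of a binary indicator."""
    pooled = math.sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / 2)
    return (p1 - p2) / pooled
```

By a widely used rule of thumb, an absolute SMD below 0.1 indicates good balance between the groups.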
Table S2: Complete linear probability models of attrition. Models differ in how authors’ experience is
measured using (1) academic age, (2) number of papers by the time of retraction, (3) logged number of
citations by the time of retraction, and (4) logged number of collaborators by the time of retraction,
respectively. Controls for author’s scientific discipline are included as categorical variables, but are not
shown.
Dependent variable: Attrition
(1) (2) (3) (4)
High Attention (>20 Altmetric score) 0.067∗ 0.054∗ 0.098∗∗∗ 0.063∗∗
(0.027) (0.027) (0.027) (0.024)
Academic Age 0.008∗∗∗
(0.000)
Papers 0.001∗∗∗
(0.000)
Log(Citations) 0.055∗∗∗
(0.002)
Log(Collaborators) 0.113∗∗∗
(0.004)
Female 0.010 0.003 0.013 0.025∗∗
(0.009) (0.009) (0.009) (0.009)
Author Affiliation Rank 0.000 0.000 0.000∗ 0.000∗∗
(0.000) (0.000) (0.000) (0.000)
Reason: Misconduct 0.037∗∗ 0.038∗∗ 0.026∗ 0.028
(0.013) (0.013) (0.013) (0.012)
Reason: Plagiarism 0.013 0.021 0.005 0.009
(0.013) (0.013) (0.013) (0.012)
Reason: Other 0.020 0.015 0.037∗∗ 0.019
(0.013) (0.013) (0.013) (0.012)
Coauthors on Retracted Paper 0.003∗∗ 0.003∗∗ 0.000 0.006∗∗∗
(0.001) (0.001) (0.001) (0.001)
Author order: Middle Author 0.004 0.004 0.002 0.007
(0.008) (0.008) (0.008) (0.007)
Retraction Year 0.005∗∗∗ 0.006∗∗∗ 0.006∗∗∗ 0.007∗∗∗
(0.001) (0.002) (0.001) (0.001)
Venue: Journal 0.089 0.071 0.059 0.020
(0.216) (0.233) (0.223) (0.206)
Journal/Conference Rank 0.023∗∗ 0.027∗∗∗ 0.004 0.012
(0.008) (0.008) (0.007) (0.007)
Constant 10.641∗∗∗ 11.619∗∗∗ 10.804∗∗∗ 12.657∗∗∗
(3.025) (3.054) (2.959) (2.805)
Observations 10698 10698 10698 10698
R² 0.208 0.189 0.251 0.283
Adjusted R² 0.205 0.187 0.249 0.281
F Statistic 52.313∗∗∗ 44.729∗∗∗ 59.423∗∗∗ 75.255∗∗∗
∗p<0.05; ∗∗p<0.01; ∗∗∗p<0.001
Table S3: Logistic regression models of attrition. Models differ in how authors’ experience is measured
using (1) academic age, (2) number of papers by the time of retraction, (3) logged number of citations by
the time of retraction, and (4) logged number of collaborators by the time of retraction, respectively.
Controls for author’s scientific discipline are included as categorical variables, but are not shown.
Dependent variable: Attrition
(1) (2) (3) (4)
High Attention (>20 Altmetric score) 0.623∗∗ 0.524∗∗ 0.947∗∗∗ 0.678∗∗∗
(0.193) (0.192) (0.194) (0.191)
Academic Age 0.112∗∗∗
(0.008)
Papers 0.064∗∗∗
(0.008)
Log(Citations) 0.442∗∗∗
(0.018)
Log(Collaborators) 1.286∗∗∗
(0.044)
Female 0.054 0.142∗ 0.008 0.115
(0.061) (0.060) (0.064) (0.065)
Author Affiliation Rank 0.000 0.000 0.000 0.000
(0.000) (0.000) (0.000) (0.000)
Reason: Misconduct 0.287∗∗ 0.210∗ 0.180 0.211
(0.097) (0.096) (0.104) (0.105)
Reason: Plagiarism 0.105 0.089 0.025 0.066
(0.097) (0.096) (0.104) (0.107)
Reason: Other 0.151 0.166 0.321∗∗ 0.143
(0.103) (0.102) (0.108) (0.110)
Coauthors on Retracted Paper 0.021∗ 0.017 0.002 0.093∗∗∗
(0.009) (0.009) (0.011) (0.012)
Author order: Middle Author 0.001 0.033 0.033 0.212∗∗
(0.057) (0.058) (0.061) (0.065)
Retraction Year 0.037∗∗ 0.040∗∗∗ 0.034∗∗ 0.043∗∗∗
(0.012) (0.011) (0.012) (0.012)
Venue: Journal 0.423 0.643 0.522 0.346
(1.314) (1.343) (1.278) (1.043)
Journal/Conference Rank 0.116∗ 0.100∗ 0.063 0.066
(0.051) (0.049) (0.053) (0.055)
Constant 74.908∗∗ 81.394∗∗∗ 67.786∗∗ 85.123∗∗∗
(23.499) (22.552) (24.742) (24.334)
Observations 10698 10698 10698 10698
Pseudo R² 0.219 0.274 0.249 0.324
Log-Likelihood -4374.899 -4067.782 -4208.473 -3786.276
∗p<0.05; ∗∗p<0.01; ∗∗∗p<0.001
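Coefficients in Table S3 are on the log-odds scale. The sketch below converts a coefficient and its standard error into an odds ratio with a Wald 95% interval. It takes the High Attention estimate from model (1) as printed (0.623, SE 0.193) as its input.

```python
import math


def odds_ratio(coef, se, z=1.96):
    """Odds ratio and Wald 95% CI for a logistic-regression coefficient."""
    return math.exp(coef), math.exp(coef - z * se), math.exp(coef + z * se)


or_, lo, hi = odds_ratio(0.623, 0.193)
print(f"OR = {or_:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")  # OR = 1.86, 95% CI [1.28, 2.72]
```

Read this way, the odds of attrition for high-attention retractions are roughly 1.9 times those for low-attention ones, holding the other covariates in model (1) fixed.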
Table S4: Linear probability models of attrition without controlling for journal/conference rank.
Models differ in how authors’ experience is measured using (1) academic age, (2) number of papers by the
time of retraction, (3) logged number of citations by the time of retraction, and (4) logged number of
collaborators by the time of retraction, respectively. Controls for author’s scientific discipline are included
as categorical variables, but are not shown.
Dependent variable: Attrition
(1) (2) (3) (4)
High Attention (>20 Altmetric score) 0.052∗ 0.035 0.088∗∗∗ 0.050
(0.025) (0.026) (0.025) (0.023)
Academic Age 0.008∗∗∗
(0.000)
Papers 0.001∗∗∗
(0.000)
Log(Citations) 0.054∗∗∗
(0.002)
Log(Collaborators) 0.113∗∗∗
(0.003)
Female 0.014 0.006 0.015∗ 0.029∗∗∗
(0.008) (0.008) (0.007) (0.007)
Author Affiliation Rank 0.000 0.000 0.000∗ 0.000∗∗
(0.000) (0.000) (0.000) (0.000)
Reason: Misconduct 0.035∗∗ 0.035∗∗ 0.022∗ 0.027
(0.011) (0.011) (0.011) (0.011)
Reason: Plagiarism 0.020 0.027∗ 0.005 0.014
(0.011) (0.011) (0.011) (0.010)
Reason: Other 0.011 0.005 0.039∗∗∗ 0.018
(0.011) (0.011) (0.011) (0.011)
Coauthors on Retracted Paper 0.004∗∗∗ 0.004∗∗∗ 0.001 0.005∗∗∗
(0.001) (0.001) (0.001) (0.001)
Author order: Middle Author 0.012 0.012 0.010 0.002
(0.007) (0.007) (0.006) (0.006)
Retraction Year 0.003∗∗ 0.003∗∗∗ 0.003∗∗ 0.004∗∗∗
(0.001) (0.001) (0.001) (0.001)
Venue: Journal 0.167∗∗ 0.190∗∗ 0.121∗ 0.148∗∗
(0.056) (0.058) (0.054) (0.052)
Constant 5.669∗∗ 6.258∗∗∗ 4.928∗∗ 7.353∗∗∗
(1.834) (1.878) (1.799) (1.701)
Observations 14583 14583 14583 14583
R² 0.215 0.194 0.258 0.290
Adjusted R² 0.213 0.192 0.256 0.289
F Statistic 78.405∗∗∗ 63.738∗∗∗ 90.875∗∗∗ 115.305∗∗∗
∗p<0.05; ∗∗p<0.01; ∗∗∗p<0.001
Table S5: Linear probability models of attrition, classifying as attrited those authors who left scientific
publishing in year -1, 0, or 1. Models differ in how authors’ experience is measured using (1)
academic age, (2) number of papers by the time of retraction, (3) logged number of citations by the
time of retraction, and (4) logged number of collaborators by the time of retraction, respectively. Controls
for author’s scientific discipline are included as categorical variables, but are not shown.
Dependent variable: Attrition
(1) (2) (3) (4)
High Attention (>20 Altmetric score) 0.052∗ 0.037 0.087∗∗∗ 0.047
(0.026) (0.027) (0.027) (0.024)
Academic Age 0.009∗∗∗
(0.000)
Papers 0.001∗∗∗
(0.000)
Log(Citations) 0.062∗∗∗
(0.002)
Log(Collaborators) 0.128∗∗∗
(0.004)
Female 0.000 0.008 0.002 0.017
(0.009) (0.010) (0.009) (0.009)
Author Affiliation Rank 0.000 0.000 0.000∗ 0.000
(0.000) (0.000) (0.000) (0.000)
Reason: Misconduct 0.037∗∗ 0.039∗∗ 0.025 0.028
(0.013) (0.013) (0.013) (0.013)
Reason: Plagiarism 0.018 0.027∗ 0.003 0.014
(0.013) (0.013) (0.013) (0.012)
Reason: Other 0.001 0.007 0.019 0.002
(0.013) (0.014) (0.013) (0.012)
Coauthors on Retracted Paper 0.003∗∗ 0.003∗∗ 0.000 0.006∗∗∗
(0.001) (0.001) (0.001) (0.001)
Author order: Middle Author 0.009 0.009 0.006 0.004
(0.008) (0.008) (0.008) (0.008)
Retraction Year 0.008∗∗∗ 0.008∗∗∗ 0.008∗∗∗ 0.009∗∗∗
(0.002) (0.002) (0.001) (0.001)
Venue: Journal 0.016 0.006 0.019 0.063
(0.188) (0.207) (0.196) (0.176)
Journal/Conference Rank 0.018∗ 0.022∗∗ 0.013 0.005
(0.008) (0.008) (0.008) (0.007)
Constant 15.867∗∗∗ 17.054∗∗∗ 16.045∗∗∗ 18.151∗∗∗
(3.074) (3.100) (2.967) (2.828)
Observations 10698 10698 10698 10698
R² 0.227 0.206 0.274 0.310
Adjusted R² 0.225 0.203 0.272 0.308
F Statistic 66.930∗∗∗ 55.524∗∗∗ 82.685∗∗∗ 104.242∗∗∗
∗p<0.05; ∗∗p<0.01; ∗∗∗p<0.001
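Tables S5 and S6 change which authors count as attrited. A minimal sketch of that robustness rule follows. It assumes each author is described by the calendar years of their last publication and of the retraction. The function name and input format are hypothetical.

```python
def is_attrited(last_pub_year, retraction_year, window=(-1, 0, 1)):
    """Robustness-check rule (Tables S5-S6): attrited if the last publication
    falls in year -1, 0, or 1 relative to the retraction year."""
    return (last_pub_year - retraction_year) in window


# Hypothetical authors: (last publication year, retraction year)
authors = {"a": (2011, 2012), "b": (2012, 2012), "c": (2016, 2012)}
flags = {name: is_attrited(*years) for name, years in authors.items()}
```

The resulting 0/1 flag is the dependent variable of the linear probability and logistic models.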
Table S6: Logistic regression models of attrition, classifying as attrited those authors who left scientific
publishing in year -1, 0, or 1. Models differ in how authors’ experience is measured using (1)
academic age, (2) number of papers by the time of retraction, (3) logged number of citations by the
time of retraction, and (4) logged number of collaborators by the time of retraction, respectively. Controls
for author’s scientific discipline are included as categorical variables, but are not shown.
Dependent variable: Attrition
(1) (2) (3) (4)
High Attention (>20 Altmetric score) 0.445∗ 0.324 0.754∗∗∗ 0.474∗∗
(0.175) (0.175) (0.179) (0.179)
Academic Age 0.104∗∗∗
(0.007)
Papers 0.048∗∗∗
(0.006)
Log(Citations) 0.437∗∗∗
(0.016)
Log(Collaborators) 1.224∗∗∗
(0.039)
Female 0.013 0.070 0.057 0.049
(0.058) (0.057) (0.061) (0.062)
Author Affiliation Rank 0.000 0.000 0.000 0.000
(0.000) (0.000) (0.000) (0.000)
Reason: Misconduct 0.269∗∗ 0.201∗ 0.162 0.188
(0.091) (0.089) (0.098) (0.099)
Reason: Plagiarism 0.127 0.123 0.002 0.092
(0.090) (0.087) (0.096) (0.098)
Reason: Other 0.019 0.014 0.143 0.044
(0.093) (0.091) (0.098) (0.099)
Coauthors on Retracted Paper 0.020∗ 0.016 0.003 0.085∗∗∗
(0.008) (0.009) (0.010) (0.011)
Author order: Middle Author 0.035 0.068 0.006 0.146
(0.055) (0.056) (0.059) (0.063)
Retraction Year 0.052∗∗∗ 0.056∗∗∗ 0.051∗∗∗ 0.061∗∗∗
(0.011) (0.010) (0.011) (0.011)
Venue: Journal 0.172 0.037 0.040 0.305
(1.208) (1.239) (1.207) (1.004)
Journal/Conference Rank 0.073 0.060 0.112∗ 0.013
(0.048) (0.047) (0.050) (0.052)
Constant 105.900∗∗∗ 114.431∗∗∗ 103.699∗∗∗ 122.171∗∗∗
(21.731) (20.716) (22.838) (22.691)
Observations 10698 10698 10698 10698
Pseudo R² 0.222 0.266 0.254 0.325
Log-Likelihood -4374.899 -4496.267 -4571.522 -4133.949
∗p<0.05; ∗∗p<0.01; ∗∗∗p<0.001
Table S7: Linear probability models of attrition for retractions between 2005 and 2015. Models differ in
how authors’ experience is measured using (1) academic age, (2) number of papers by the time of retraction,
(3) logged number of citations by the time of retraction, and (4) logged number of collaborators by
the time of retraction, respectively. Controls for author’s scientific discipline are included as categorical
variables, but are not shown.
Dependent variable: Attrition
(1) (2) (3) (4)
High Attention (>20 Altmetric score) 0.067∗ 0.053 0.099∗∗∗ 0.063∗∗
(0.027) (0.027) (0.027) (0.024)
Academic Age 0.008∗∗∗
(0.000)
Papers 0.001∗∗∗
(0.000)
Log(Citations) 0.055∗∗∗
(0.002)
Log(Collaborators) 0.113∗∗∗
(0.004)
Female 0.012 0.005 0.015 0.027∗∗
(0.009) (0.009) (0.009) (0.009)
Author Affiliation Rank 0.000 0.000 0.000∗∗ 0.000∗∗
(0.000) (0.000) (0.000) (0.000)
Reason: Misconduct 0.040∗∗ 0.042∗∗ 0.029∗ 0.030
(0.013) (0.013) (0.013) (0.012)
Reason: Plagiarism 0.019 0.027∗ 0.001 0.014
(0.013) (0.013) (0.013) (0.012)
Reason: Other 0.016 0.012 0.034∗∗ 0.017
(0.013) (0.013) (0.013) (0.012)
Coauthors on Retracted Paper 0.003∗∗ 0.003∗∗ 0.000 0.006∗∗∗
(0.001) (0.001) (0.001) (0.001)
Author order: Middle Author 0.002 0.002 0.000 0.009
(0.008) (0.008) (0.008) (0.008)
Retraction Year 0.007∗∗∗ 0.007∗∗∗ 0.007∗∗∗ 0.008∗∗∗
(0.002) (0.002) (0.002) (0.002)
Venue: Journal 0.091 0.073 0.061 0.023
(0.216) (0.233) (0.223) (0.206)
Journal/Conference Rank 0.024∗∗ 0.029∗∗∗ 0.003 0.013
(0.008) (0.008) (0.007) (0.007)
Constant 13.698∗∗∗ 14.301∗∗∗ 13.429∗∗∗ 15.360∗∗∗
(3.607) (3.682) (3.481) (3.364)
Observations 10222 10222 10222 10222
R² 0.212 0.193 0.256 0.287
Adjusted R² 0.210 0.190 0.254 0.285
F Statistic 51.467∗∗∗ 43.920∗∗∗ 58.212∗∗∗ 73.245∗∗∗
∗p<0.05; ∗∗p<0.01; ∗∗∗p<0.001
Table S8: Linear probability models of attrition including a control for who led the retraction. Models
differ in how authors’ experience is measured using (1) academic age, (2) number of papers by the
time of retraction, (3) logged number of citations by the time of retraction, and (4) logged number of
collaborators by the time of retraction, respectively. Controls for author’s scientific discipline are included
as categorical variables, but are not shown. Note that these regression models only include authors whose
retraction notices were manually annotated.
Dependent variable: Attrition
(1) (2) (3) (4)
High Attention (>20 Altmetric score) 0.117∗ 0.111∗ 0.164∗∗ 0.109
(0.054) (0.054) (0.056) (0.049)
Academic Age 0.009∗∗∗
(0.001)
Papers 0.001∗∗∗
(0.000)
Log(Citations) 0.059∗∗∗
(0.004)
Log(Collaborators) 0.128∗∗∗
(0.006)
Female 0.029 0.023 0.028 0.046∗∗
(0.015) (0.015) (0.015) (0.014)
Author Affiliation Rank 0.000 0.000 0.000 0.000
(0.000) (0.000) (0.000) (0.000)
Reason: Misconduct 0.005 0.002 0.012 0.007
(0.026) (0.026) (0.026) (0.025)
Reason: Plagiarism 0.048 0.068∗ 0.020 0.031
(0.029) (0.030) (0.028) (0.027)
Reason: Other 0.043∗ 0.035 0.053∗ 0.039
(0.021) (0.022) (0.021) (0.020)
Retracted by: Journal 0.031 0.032 0.019 0.013
(0.021) (0.022) (0.021) (0.020)
Retracted by: Other 0.031 0.037 0.009 0.003
(0.031) (0.031) (0.030) (0.029)
Coauthors on Retracted Paper 0.004∗∗ 0.004∗∗ 0.001 0.005∗∗
(0.001) (0.001) (0.002) (0.002)
Author order: Middle Author 0.003 0.004 0.004 0.010
(0.013) (0.013) (0.013) (0.012)
Retraction Year 0.008∗∗ 0.009∗∗∗ 0.008∗∗ 0.010∗∗∗
(0.003) (0.003) (0.003) (0.002)
Journal/Conference Rank 0.028 0.034∗ 0.006 0.027
(0.014) (0.015) (0.014) (0.014)
Constant 16.638∗∗ 18.185∗∗∗ 15.733∗∗ 18.686∗∗∗
(5.240) (5.301) (5.179) (4.866)
Observations 3888 3888 3888 3888
R² 0.238 0.219 0.280 0.312
Adjusted R² 0.231 0.213 0.274 0.306
F Statistic 31.609∗∗∗ 26.657∗∗∗ 35.534∗∗∗ 43.009∗∗∗
∗p<0.05; ∗∗p<0.01; ∗∗∗p<0.001
Table S9: Standardized mean differences between authors who stayed within academic publishing
and those who were matched. p denotes the proportion of authors who stayed within the filtered sample,
and m denotes the proportion of authors who were matched. For continuous variables, p and m represent
the mean.
Category p (n = 11,347) m (n = 3,094) Standardized Mean Difference
Academic Age 13.71 8.43 0.54
# Papers 70.0 23.49 0.55
# Citations 1618.23 287.01 0.4
# Collaborators 193.14 53.07 0.35
Gender Male 0.75 0.71 -0.18
Female 0.25 0.29 0.18
Affiliation Rank
1-100 0.20 0.18 -0.13
101-500 0.30 0.46 0.73
501-1000 0.14 0.21 0.54
>1000 0.36 0.14 -1.27
Reason
Misconduct 0.22 0.23 0.08
Plagiarism 0.28 0.27 -0.07
Mistake 0.30 0.27 -0.12
Other 0.21 0.23 0.14
Author order First or last author 0.39 0.35 -0.18
Middle author 0.61 0.65 0.18
Year of retraction
1990-1995 0.02 0.01 -0.07
1996-2000 0.03 0.02 -0.19
2001-2005 0.06 0.05 -0.15
2006-2010 0.26 0.26 -0.02
2011-2015 0.64 0.65 0.07
Discipline
Biology 0.37 0.37 -0.03
Chemistry 0.15 0.18 0.19
Medicine 0.29 0.30 0.03
Physics 0.06 0.06 -0.11
Other STEM fields 0.09 0.08 -0.21
Non-STEM fields 0.03 0.02 -0.12
Table S10: Results of the matching analysis on collaborator retention for the authors who stayed. r and r̄
represent retracted and non-retracted authors, respectively, and n_r is the number of retracted authors.
Each retracted author is compared with the average of their closest matched non-retracted authors in terms
of the collaborators retained 5 years post-retraction. µ_r and µ_r̄ are the average numbers of collaborators
retained by retracted and non-retracted authors, respectively. µ_δ is the average gain of r over r̄
(µ_r - µ_r̄), and CI95% is the 95% confidence interval for µ_δ. p_t, p_ks, and p_w are the p-values of the
Welch t-test, the Kolmogorov–Smirnov test, and the Wilcoxon signed-rank test, respectively.
Group n_r µ_r µ_r̄ µ_δ CI95% p_t p_ks p_w
Overall 3,094 9.438 8.099 1.339 [0.957, 1.720] < .001 0.004 < .001
Gender
Male 2,204 10.034 8.490 1.544 [1.056, 2.032] < .001 0.003 < .001
Female 890 7.963 7.132 0.831 [0.282, 1.379] 0.033 0.653 0.019
Year of retraction
1990-1995 46 3.500 3.779 -0.279 [-1.862, 1.304] 0.789 0.494 0.377
1996-2000 68 5.118 3.559 1.559 [-0.098, 3.216] 0.175 0.594 0.127
2001-2005 159 8.767 6.613 2.154 [0.419, 3.889] 0.033 0.664 0.276
2006-2010 803 8.731 7.509 1.222 [0.452, 1.993] 0.020 0.272 0.004
2011-2015 2,018 10.053 8.703 1.350 [0.875, 1.826] < .001 0.019 < .001
Author academic age
1 year 445 4.629 4.282 0.347 [-0.028, 0.723] 0.184 0.021 0.069
2 years 306 5.275 5.484 -0.209 [-0.834, 0.416] 0.609 0.106 0.223
3-5 years 714 6.899 6.581 0.318 [-0.207, 0.843] 0.377 0.554 0.094
6 or more years 1,629 12.646 10.299 2.348 [1.682, 3.014] < .001 0.001 < .001
Author affiliation rank
1-100 577 8.659 7.527 1.132 [0.314, 1.950] 0.049 0.143 0.015
101-500 1,568 10.464 8.841 1.623 [1.028, 2.219] < .001 0.034 < .001
501-1000 706 8.644 7.494 1.151 [0.494, 1.807] 0.014 0.351 0.003
>1000 253 7.229 6.573 0.656 [-0.530, 1.841] 0.358 0.409 0.054
Author order
First or last author 1,075 8.315 7.428 0.887 [0.315, 1.459] 0.038 0.234 0.002
Middle author 2,019 10.036 8.456 1.579 [1.080, 2.078] < .001 0.008 < .001
Reason of retraction
Misconduct 714 9.224 8.128 1.096 [0.311, 1.880] 0.038 0.045 0.004
Plagiarism 823 9.036 8.370 0.666 [0.006, 1.326] 0.194 0.768 0.156
Mistake 847 10.331 8.171 2.160 [1.450, 2.870] < .001 0.002 < .001
Other 710 9.054 7.670 1.384 [0.467, 2.300] 0.019 0.089 < .001
Type of retraction
Author-led retraction 813 9.568 7.984 1.584 [0.778, 2.390] 0.004 0.063 < .001
Journal-led retraction 559 8.676 7.858 0.818 [0.017, 1.620] 0.138 0.732 0.145
Discipline
Biology 1,437 8.682 7.512 1.170 [0.651, 1.689] < .001 0.080 < .001
Chemistry 553 6.278 5.381 0.897 [0.365, 1.429] 0.018 0.037 0.005
Medicine 1,126 10.708 8.919 1.789 [1.107, 2.470] < .001 0.112 < .001
Physics 210 8.952 8.116 0.836 [-1.105, 2.778] 0.497 0.577 0.159
Other STEM fields 244 4.955 4.485 0.470 [-0.260, 1.200] 0.347 0.451 0.176
Non-STEM fields 47 5.064 4.546 0.518 [-0.551, 1.587] 0.754 0.956 0.289
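For each stratum, Table S10 reports the mean difference between retracted authors and their matched counterparts, together with three tests. The sketch below covers the mean difference, its interval, and the Welch t-test using only the standard library, under stated assumptions:

- There is one matched (averaged) counterpart value per retracted author.
- The interval is a normal approximation for the mean paired difference. The caption does not say how CI95% was computed.
- The Welch p-value also uses a normal approximation, which is reasonable for large strata.

The other two tests are available as `scipy.stats.ks_2samp` and `scipy.stats.wilcoxon`. The function name is hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def compare_matched(retracted, matched, alpha=0.05):
    """Mean paired difference, its normal-approximation CI, and a Welch t-test.

    retracted[i] and matched[i] are outcomes (e.g. collaborators retained)
    for the i-th retracted author and their matched counterpart.
    """
    diffs = [r - m for r, m in zip(retracted, matched)]
    mu_delta = mean(diffs)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * stdev(diffs) / math.sqrt(len(diffs))
    # Welch t statistic: unequal variances, two independent samples.
    se = math.sqrt(stdev(retracted) ** 2 / len(retracted)
                   + stdev(matched) ** 2 / len(matched))
    t = (mean(retracted) - mean(matched)) / se
    # Two-sided p-value via the normal approximation to the t distribution.
    p_t = 2 * (1 - NormalDist().cdf(abs(t)))
    return mu_delta, (mu_delta - half, mu_delta + half), t, p_t


mu, (lo, hi), t, p_t = compare_matched([5, 7, 9, 11, 13], [4, 6, 7, 10, 11])
```

Pairing the differences keeps each retracted author aligned with their own match, whereas the Welch test treats the two groups as independent samples.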
Table S11: Results of the matching analysis on collaborators gained for the authors who stayed. r and r̄
represent retracted and non-retracted authors, respectively, and n_r is the number of retracted authors.
Each retracted author is compared with the average of their closest matched non-retracted authors in terms
of the collaborators gained 5 years post-retraction. µ_r and µ_r̄ are the average numbers of collaborators
gained by retracted and non-retracted authors, respectively. µ_δ is the average gain of r over r̄
(µ_r - µ_r̄), and CI95% is the 95% confidence interval for µ_δ. p_t, p_ks, and p_w are the p-values of the
Welch t-test, the Kolmogorov–Smirnov test, and the Wilcoxon signed-rank test, respectively.
Group n_r µ_r µ_r̄ µ_δ CI95% p_t p_ks p_w
Overall 3,094 41.736 28.379 13.357 [8.971, 17.742] < .001 0.002 < .001
Gender
Male 2,204 42.184 29.725 12.460 [7.832, 17.088] < .001 0.003 < .001
Female 890 40.625 25.047 15.578 [5.508, 25.647] 0.003 0.009 0.847
Year of retraction
1990-1995 46 20.109 15.264 4.844 [-7.159, 16.847] 0.434 0.087 0.053
1996-2000 68 18.132 13.368 4.765 [-1.670, 11.200] 0.262 0.868 0.425
2001-2005 159 30.692 25.036 5.656 [-1.438, 12.749] 0.140 0.398 0.412
2006-2010 803 38.400 26.831 11.569 [2.373, 20.764] 0.017 0.246 0.322
2011-2015 2,018 45.222 30.063 15.158 [9.551, 20.765] < .001 0.010 0.001
Author academic age
1 year 445 15.789 17.793 -2.004 [-6.785, 2.778] 0.417 < .001 < .001
2 years 306 18.095 20.785 -2.691 [-7.259, 1.877] 0.254 0.006 0.062
3-5 years 714 30.507 24.532 5.975 [-1.006, 12.957] 0.123 0.470 0.637
6 or