Figure: Review fairness

Source publication
Article
Full-text available
Computer Science researchers rely on peer-reviewed conferences to publish their work and to receive feedback. The impact of these peer-reviewed papers on researchers’ careers can hardly be overstated. Yet conference organizers can make inconsistent choices for their review process, even in the same subfield. These choices are rarely reviewed critic...

Similar publications

Conference Paper
Full-text available
Ubiquitous computing is an emerging technology in computer science. It enhances computer usage by making many small computer systems available throughout the physical environment. The main concept is to embed smart objects and computer systems in the real world so that the computers themselves disappear from view. Disappearance of the computer system imp...

Citations

... In this work, we focus on author rebuttals due to their widespread use and frequently raised questions about their efficacy.) In computer science conferences, rebuttal stages are a widely adopted practice, with a large number of recent conferences having instituted such periods [1,2]. ...
Article
Full-text available
Objective: Peer review frequently follows a process where reviewers first provide initial reviews, authors respond to these reviews, then reviewers update their reviews based on the authors’ response. There is mixed evidence regarding whether this process is useful, including frequent anecdotal complaints that reviewers insufficiently update their scores. In this study, we aim to investigate whether reviewers anchor to their original scores when updating their reviews, which serves as a potential explanation for the lack of updates in reviewer scores.
Design: We design a novel randomized controlled trial to test if reviewers exhibit anchoring. In the experimental condition, participants initially see a flawed version of a paper that is corrected after they submit their initial review, while in the control condition, participants only see the correct version. We take various measures to ensure that in the absence of anchoring, reviewers in the experimental group should revise their scores to be identically distributed to the scores from the control group. Furthermore, we construct the reviewed paper to maximize the difference between the flawed and corrected versions, and employ deception to hide the true experiment purpose.
Results: Our randomized controlled trial consists of 108 researchers as participants. First, we find that our intervention was successful at creating a difference in perceived paper quality between the flawed and corrected versions: Using a permutation test with the Mann-Whitney U statistic, we find that the experimental group’s initial scores are lower than the control group’s scores in both the Evaluation category (Vargha-Delaney A = 0.64, p = 0.0096) and Overall score (A = 0.59, p = 0.058). Next, we test for anchoring by comparing the experimental group’s revised scores with the control group’s scores. We find no significant evidence of anchoring in either the Overall (A = 0.50, p = 0.61) or Evaluation category (A = 0.49, p = 0.61). The Mann-Whitney U represents the number of individual pairwise comparisons across groups in which the value from the specified group is stochastically greater, while the Vargha-Delaney A is the normalized version in [0, 1].
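The definition at the end of this abstract maps onto a simple computation: A = U / (n1 * n2), where U counts cross-group pairs in which the first group's score is higher (ties counted as one half). The following is a minimal, hypothetical Python sketch of such a comparison together with a permutation test that uses U as the test statistic; the score arrays are invented for illustration and are not the study's data.

# Hypothetical sketch of the statistics described above; illustrative data only.
import numpy as np

def mann_whitney_u(x, y):
    # U = number of (x_i, y_j) pairs with x_i > y_j, counting ties as 0.5
    x, y = np.asarray(x, float), np.asarray(y, float)
    greater = (x[:, None] > y[None, :]).sum()
    ties = (x[:, None] == y[None, :]).sum()
    return greater + 0.5 * ties

def vargha_delaney_a(x, y):
    # A = U / (n_x * n_y), i.e. P(X > Y) + 0.5 * P(X == Y), in [0, 1]
    return mann_whitney_u(x, y) / (len(x) * len(y))

def permutation_pvalue(x, y, n_perm=10_000, seed=0):
    # Two-sided permutation test using U as the test statistic
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x, y])
    observed = mann_whitney_u(x, y)
    expected = len(x) * len(y) / 2          # mean of U under the null
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        u = mann_whitney_u(perm[:len(x)], perm[len(x):])
        if abs(u - expected) >= abs(observed - expected):
            count += 1
    return count / n_perm

# Illustrative review scores (not the study's data):
control = [6, 7, 5, 8, 6, 7]
experimental = [5, 6, 4, 6, 5, 7]
print(vargha_delaney_a(control, experimental))   # A > 0.5: control tends higher
print(permutation_pvalue(control, experimental))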
... There are a number of papers in the literature that conduct surveys of authors. [17] survey authors of accepted papers from 56 computer science conferences. The survey was conducted after these papers were published. ...
... [31] conduct multiple surveys: an anonymous survey in the ICML 2021 and EC 2021 conferences had response rates of 16% and 51%, respectively; a second, non-anonymous opt-in survey in EC 2021 had a response rate of 55.78%. [17] survey authors of accepted papers in 56 computer systems conferences, with response rates ranging from 0% to 59% across these conferences. The survey by [29] was opt-in in 2011, and their response rate was 28%. ...
... Furthermore, even among rejected papers, over 30% of responses mentioned that the reviews made their perception more positive. While past studies [11,17–19] document whether the review process helps improve the paper, the results in Fig 6 show that it also results in a change in authors' perception of their papers about half the time. ...
Article
Full-text available
How do author perceptions match up to the outcomes of the peer-review process and perceptions of others? In a top-tier computer science conference (NeurIPS 2021) with more than 23,000 submitting authors and 9,000 submitted papers, we surveyed the authors on three questions: (i) their predicted probability of acceptance for each of their papers, (ii) their perceived ranking of their own papers based on scientific contribution, and (iii) the change in their perception about their own papers after seeing the reviews. The salient results are: (1) Authors had roughly a three-fold overestimate of the acceptance probability of their papers: The median prediction was 70% for an approximately 25% acceptance rate. (2) Female authors exhibited a marginally higher (statistically significant) miscalibration than male authors; predictions of authors invited to serve as meta-reviewers or reviewers were similarly calibrated, but better than authors who were not invited to review. (3) Authors’ relative ranking of scientific contribution of two submissions they made generally agreed with their predicted acceptance probabilities (93% agreement), but there was a notable 7% responses where authors predicted a worse outcome for their better paper. (4) The author-provided rankings disagreed with the peer-review decisions about a third of the time; when co-authors ranked their jointly authored papers, co-authors disagreed at a similar rate—about a third of the time. (5) At least 30% of respondents of both accepted and rejected papers said that their perception of their own paper improved after the review process. The stakeholders in peer review should take these findings into account in setting their expectations from peer review.
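Finding (1) is essentially an arithmetic claim: a median predicted acceptance probability of about 70% against an acceptance rate of roughly 25% gives 0.70 / 0.25 ≈ 2.8, i.e. close to a three-fold overestimate. A tiny hypothetical sketch of that calibration check follows; the predictions are invented, not actual survey responses.

# Hypothetical illustration of the miscalibration ratio described above.
import statistics

predicted = [0.9, 0.7, 0.8, 0.5, 0.7, 0.6, 0.75]   # authors' predicted P(accept), invented
actual_acceptance_rate = 0.25                       # approximate conference-wide rate

median_prediction = statistics.median(predicted)    # ~0.7, as in the reported median
overestimate = median_prediction / actual_acceptance_rate
print(f"median prediction {median_prediction:.2f}, "
      f"about {overestimate:.1f}x the acceptance rate")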
... In this work, we focus on author rebuttals due to their widespread use and frequently raised questions about their efficacy.) In computer science conferences, rebuttal stages are a widely adopted practice, with a large number of recent conferences having instituted such periods (Dershowitz and Verma, 2022; Frachtenberg and Koster, 2020). ...
Preprint
Peer review frequently follows a process where reviewers first provide initial reviews, authors respond to these reviews, then reviewers update their reviews based on the authors' response. There is mixed evidence regarding whether this process is useful, including frequent anecdotal complaints that reviewers insufficiently update their scores. In this study, we aim to investigate whether reviewers anchor to their original scores when updating their reviews, which serves as a potential explanation for the lack of updates in reviewer scores. We design a novel randomized controlled trial to test if reviewers exhibit anchoring. In the experimental condition, participants initially see a flawed version of a paper that is later corrected, while in the control condition, participants only see the correct version. We take various measures to ensure that in the absence of anchoring, reviewers in the experimental group should revise their scores to be identically distributed to the scores from the control group. Furthermore, we construct the reviewed paper to maximize the difference between the flawed and corrected versions, and employ deception to hide the true experiment purpose. Our randomized controlled trial consists of 108 researchers as participants. First, we find that our intervention was successful at creating a difference in perceived paper quality between the flawed and corrected versions: Using a permutation test with the Mann-Whitney U statistic, we find that the experimental group's initial scores are lower than the control group's scores in both the Evaluation category (Vargha-Delaney A=0.64, p=0.0096) and Overall score (A=0.59, p=0.058). Next, we test for anchoring by comparing the experimental group's revised scores with the control group's scores. We find no significant evidence of anchoring in either the Overall (A=0.50, p=0.61) or Evaluation category (A=0.49, p=0.61).
... This peer-review perspective segues naturally to examining aspects of the review process itself. In an author survey we conducted in 2019 [48], authors from eight different SIGMETRICS'17 papers shared details about their reviews and the reviewing process. Although survey responses remain confidential, and the low number of responses is insufficient to draw statistically significant conclusions, we can still observe four trends in the responses. ...
Article
Full-text available
Performance evaluation is a broad discipline within computer science, combining deep technical work in experimentation, simulation, and modeling. The field’s subjects encompass all aspects of computer systems, including computer architecture, networking, energy efficiency, and machine learning. This wide methodological and topical focus can make it difficult to discern what attracts the community’s attention and how this attention evolves over time. As a first attempt to quantify and qualify this attention, using the proxy metric of paper citations, this study looks at the premier conference in the field, SIGMETRICS. We analyze citation frequencies at monthly intervals over a five-year period and examine possible associations with myriad other factors, such as time since publication, comparable conferences, peer review, self-citations, author demographics, and textual properties of the papers. We found that in several ways, SIGMETRICS is distinctive not only in its scope, but also in its citation phenomena: papers generally exhibit a strongly linear rate of citation growth over time, few if any uncited papers, a large gamut of topics of interest, and a possible disconnect between peer-review outcomes and eventual citations. The two most-cited papers in the dataset also exhibit larger author teams, higher than typical self-citations, and distinctive citation growth curves. These two papers, sharing some coauthors and a research focus, could either signal the area where SIGMETRICS had the most research impact, or they could represent outliers; their omission from the analysis reduces some of the otherwise distinctive observed metrics to nonsignificant levels.
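The "strongly linear rate of citation growth" described above can be illustrated by regressing cumulative citations on months since publication and inspecting the fit. A hedged sketch with invented monthly counts, not the SIGMETRICS data:

# Hypothetical sketch: fit a line to cumulative citations over months since
# publication and report R^2. The counts below are invented, not the dataset.
import numpy as np
from scipy import stats

months = np.arange(1, 61)                                        # five years of monthly intervals
monthly_cites = np.random.default_rng(1).poisson(2.0, size=60)   # invented monthly citation counts
cumulative = np.cumsum(monthly_cites)

fit = stats.linregress(months, cumulative)
print(f"slope = {fit.slope:.2f} citations/month, R^2 = {fit.rvalue**2:.3f}")
# An R^2 close to 1 across many papers would be consistent with the
# 'strongly linear' growth the abstract describes.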
... For example, a gender assigned based on an outdated biography paragraph with pronouns may no longer agree with the researcher's self-identification. To verify the validity of our approach, we compared our manually assigned genders to self-assigned binary genders in a separate survey we conducted among 918 of the authors [28]. We found no disagreements for these authors, which suggests that the likelihood of disagreements among the remaining authors is low. ...
... Nevertheless, the resulting statistics are directly comparable to other studies employing the same approach. Moreover, our survey results indicate that such peer-review bias may be limited [28]. ...
... Mirroring studies from other fields that found no evidence of gender bias in the peer-review process [6,27,43], we found that women's papers were actually accepted at slightly higher rates when their identity was visible to reviewers (in 24 single-blind conferences) or when it was prominent in the first author position (11.1% of papers). An author survey also found that the reviews women received in the single-blind conferences in our dataset showed similar or higher grades than men's [28]. ...
Article
Full-text available
The gender gap in computer science (CS) research is a well-studied problem, with an estimated ratio of 15%–30% women researchers. However, far less is known about gender representation in specific fields within CS. Here, we investigate the gender gap in one large field, computer systems. To this end, we collected data from 72 leading peer-reviewed CS conferences, totalling 6,949 accepted papers and 19,829 unique authors (2,946 women, 16,307 men, the rest unknown). We combined these data with external demographic and bibliometric data to evaluate the ratio of women authors and the factors that might affect this ratio. Our main findings are that women represent only about 10% of systems researchers, and that this ratio is not associated with various conference factors such as size, prestige, double-blind reviewing, and inclusivity policies. Author research experience also does not significantly affect this ratio, although author country and work sector do. The 10% ratio of women authors is significantly lower than the 16% in the rest of CS. Our findings suggest that focusing on inclusivity policies alone cannot address this large gap. Increasing women’s participation in systems research will require addressing the systemic causes of their exclusion, which are even more pronounced in systems than in the rest of CS.
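The association tests this abstract refers to (e.g. whether double-blind reviewing is associated with the women-author ratio) can be illustrated with a standard contingency-table test. A hypothetical sketch with invented counts, not the paper's dataset or necessarily its exact method:

# Hypothetical sketch: test whether the women-author ratio differs between two
# groups of conferences (e.g. double-blind vs. single-blind). Counts are invented.
import numpy as np
from scipy.stats import chi2_contingency

# rows: conference group; columns: women authors, men authors
table = np.array([[310, 2790],
                  [280, 2520]])

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")   # a large p suggests no detected association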
... Choice of conferences. Our dataset evolved from our previous study of conferences related to one major field, computer systems [41]. The conferences we selected include some of the most prestigious systems conferences (based on indirect measurements such as Google Scholar's metrics), as well as several smaller or less-competitive conferences for contrast. ...
... For example, a gender assigned based on an outdated biography paragraph with pronouns may no longer agree with the researcher's self-identification. To verify the validity of our approach, we compared our manually assigned genders to self-assigned binary genders in a separate survey we conducted among 918 of the authors [41]. We found no disagreements for these authors, which suggests that the likelihood of disagreements among the remaining authors is low. ...
... Another example is the field of systems, which also generally exhibits a very low FAR. Systems is a large and influential field, with many industrial and technological applications [41]. It is therefore of particular interest to try to explain and reduce the gender gap, as this could have far-reaching societal impact [2]. ...
Article
Full-text available
The research discipline of computer science (CS) has a well-publicized gender disparity. Multiple studies estimate the ratio of women among publishing researchers to be around 15–30%. Many explanatory factors have been studied in association with this gender gap, including differences in collaboration patterns. Here, we extend this body of knowledge by looking at differences in collaboration patterns specific to various fields and subfields of CS. We curated a dataset of nearly 20,000 unique authors of some 7000 top conference papers from a single year. We manually assigned a field and subfield to each conference and a gender to most researchers. We then measured the gender gap in each subfield as well as five other collaboration metrics, which we compared to the gender gap. Our main findings are that the gender gap varies greatly by field, ranging from 6% female authors in theoretical CS to 42% in CS education; subfields with a higher gender gap also tend to exhibit lower female productivity, larger coauthor groups, and higher gender homophily. Although women published fewer single-author papers, we did not find an association between single-author papers and the ratio of female researchers in a subfield.
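One of the collaboration metrics mentioned above, gender homophily, can be illustrated as the share of same-gender coauthor pairs compared with a random-mixing baseline. A toy sketch with invented author lists, not the curated dataset and not necessarily the paper's exact definition:

# Hypothetical sketch of a simple gender-homophily measure for one subfield.
from itertools import combinations

papers = [                      # each entry: genders of one paper's authors (invented)
    ["F", "M", "M"],
    ["M", "M"],
    ["F", "F", "M"],
]

pairs = [p for paper in papers for p in combinations(paper, 2)]
same = sum(a == b for a, b in pairs) / len(pairs)          # observed same-gender share

authors = [g for paper in papers for g in paper]
f_ratio = authors.count("F") / len(authors)
expected_same = f_ratio**2 + (1 - f_ratio)**2              # random-mixing baseline

print(f"observed same-gender pair share: {same:.2f}, expected: {expected_same:.2f}")
# observed > expected would indicate homophily in this (toy) subfield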
... For example, a gender assigned based on an outdated biography paragraph with pronouns may no longer agree with the researcher's self-identification. To verify the validity of our approach, we compared our manually assigned genders to self-assigned binary genders in a separate survey we conducted among 918 of the authors [25]. We found no disagreements for these authors, which suggests that the likelihood of disagreements among the remaining authors is low. ...
... Nevertheless, the resulting statistics are directly comparable to other studies employing the same approach. Moreover, our survey results indicate that such peer-review bias may be limited [25]. ...
... Mirroring studies from other fields that found no evidence of gender bias in the peer-review process [4,24,39], we found that women's papers were actually accepted at slightly higher rates when their identity was visible to reviewers (in 24 single-blind conferences) or when it was prominent in the first author position (11.1% of papers). An author survey also found that the reviews women received in the single-blind conferences in our dataset showed similar or higher grades than men's [25]. ...
Preprint
Full-text available
The gender gap in computer science (CS) research is a well-studied problem, with an estimated ratio of 15%--30% women researchers. However, far less is known about gender representation in specific fields within CS. Here, we investigate the gender gap in one large field, computer systems. To this end, we combined data from 53 leading systems conferences with external demographic and bibliometric data to evaluate the ratio of women authors and the factors that might affect this ratio. Our main findings are that women represent only about 10% of systems researchers, and that this ratio is not associated with various conference factors such as size, prestige, double-blind reviewing, and inclusivity policies. Author research experience also does not significantly affect this ratio, although author country and work sector do. The 10% ratio of women authors is significantly lower than that of CS as a whole. Our findings suggest that focusing on inclusivity policies alone cannot address this large gap. Increasing women's participation in systems research will require addressing the systemic causes of their exclusion, which are even more pronounced in systems than in the rest of CS.
... That said, employing the same approach as other studies implies that the resulting statistics are directly comparable to theirs. Moreover, our survey results indicate that such peer-review bias may be limited [17]. ...